astrophot.fit package

astrophot.fit package#

Subpackages#

Submodules#

astrophot.fit.base module#

class astrophot.fit.base.BaseOptimizer(model: Model, initial_state: Sequence = None, relative_tolerance: float = 0.001, verbose: int = 1, max_iter: int = None, save_steps: str | None = None, fit_valid: bool = True)[source]#

Bases: object

Base optimizer object that other optimizers inherit from. Ensures consistent signature for the classes.

Parameters:
  • model – an AstroPhot_Model object that will have its (unlocked) parameters optimized [AstroPhot_Model]

  • initial_state – optional initialization for the parameters as a 1D Array [Array]

  • relative_tolerance – tolerance for counting success steps as: $0 < (chi_2^2 - chi_1^2)/chi_1^2 < text{tol}$ [float]

  • verbose – verbosity level for the optimizer [int]

  • max_iter – maximum allowed number of iterations [int]

  • save_steps – optional string for path to save the model at each step (fitter dependent), e.g. “model_step_{step}.hdf5” [str]

  • fit_valid – whether to fit while forcing parameters into valid range, or allow any value for each parameter. Default True [bool]

chi2min() float[source]#

Returns the minimum value of chi^2 loss in the loss history.

fit() BaseOptimizer[source]#
res() ndarray[source]#

Returns the value of lambda (state parameters) at which minimum loss was achieved.

res_loss()[source]#

returns the minimum value from the loss history.

step(current_state: Annotated[Tensor, 'One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.'] = None) None[source]#

astrophot.fit.gradient module#

class astrophot.fit.gradient.Grad(model: Model, initial_state: Sequence = None, likelihood='gaussian', method='NAdam', optim_kwargs={}, patience: int = 10, report_freq=10, **kwargs)[source]#

Bases: BaseOptimizer

A gradient descent optimization wrapper for AstroPhot Model objects.

The default method is “NAdam”, a variant of the Adam optimization algorithm. This optimizer uses a combination of gradient descent and Nesterov momentum for faster convergence. The optimizer is instantiated with a set of initial parameters and optimization options provided by the user. The fit method performs the optimization, taking a series of gradient steps until a stopping criteria is met.

BaseOptimizer

Base optimizer object that other optimizers inherit from. Ensures consistent signature for the classes.

Parameters:
  • model – an AstroPhot_Model object that will have its (unlocked) parameters optimized [AstroPhot_Model]

  • initial_state – optional initialization for the parameters as a 1D Array [Array]

  • relative_tolerance – tolerance for counting success steps as: $0 < (chi_2^2 - chi_1^2)/chi_1^2 < text{tol}$ [float]

  • verbose – verbosity level for the optimizer [int]

  • max_iter – maximum allowed number of iterations [int]

  • save_steps – optional string for path to save the model at each step (fitter dependent), e.g. “model_step_{step}.hdf5” [str]

  • fit_valid – whether to fit while forcing parameters into valid range, or allow any value for each parameter. Default True [bool]

  • likelihood – The likelihood function to use for the optimization. Defaults to “gaussian”.

  • method – the optimization method to use for the update step. Defaults to “NAdam”.

  • optim_kwargs – a dictionary of keyword arguments to pass to the pytorch optimizer.

  • patience – number of steps with no improvement before stopping the optimization. Defaults to 10.

  • report_freq – frequency of reporting the optimization progress. Defaults to 10 steps.

density(state: Tensor) Tensor[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state. Based on self.likelihood, will be either the Gaussian or Poisson negative log likelihood.

fit() BaseOptimizer[source]#

Perform an iterative fit of the model parameters using the specified optimizer.

The fit procedure continues until a stopping criteria is met, such as the maximum number of iterations being reached, or no improvement being made after a specified number of iterations.

step() None[source]#

Take a single gradient step.

Computes the loss function of the model, computes the gradient of the parameters using automatic differentiation, and takes a step with the PyTorch optimizer.

astrophot.fit.hmc module#

class astrophot.fit.hmc.HMC(model: Model, initial_state: Sequence | None = None, max_iter: int = 1000, inv_mass: Tensor | None = None, epsilon: float = 0.0001, leapfrog_steps: int = 10, progress_bar: bool = True, prior: Distribution | None = None, warmup: int = 100, hmc_kwargs: dict = {}, mcmc_kwargs: dict = {}, likelihood: str = 'gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Hamiltonian Monte-Carlo sampler wrapper for the Pyro package.

This MCMC algorithm uses gradients of the $chi^2$ to more efficiently explore the probability distribution.

More information on HMC can be found at: https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo, https://arxiv.org/abs/1701.02434, and http://www.mcmchandbook.net/HandbookChapter5.pdf

Parameters:
  • max_iter (int, optional) – The number of sampling steps to perform. Defaults to 1000.

  • epsilon (float, optional) – The length of the integration step to perform for each leapfrog iteration. The momentum update will be of order epsilon * score. Defaults to 1e-5.

  • leapfrog_steps (int, optional) – Number of steps to perform with leapfrog integrator per sample of the HMC. Defaults to 10.

  • inv_mass (float or array, optional) – Inverse Mass matrix (covariance matrix) which can tune the behavior in each dimension to ensure better mixing when sampling. Defaults to the identity.

  • progress_bar (bool, optional) – Whether to display a progress bar during sampling. Defaults to True.

  • prior (distribution, optional) – Prior distribution for the parameters. Defaults to None.

  • warmup (int, optional) – Number of warmup steps before actual sampling begins. Defaults to 100.

  • hmc_kwargs (dict, optional) – Additional keyword arguments for the HMC sampler. Defaults to {}.

  • mcmc_kwargs (dict, optional) – Additional keyword arguments for the MCMC process. Defaults to {}.

fit(state: Tensor | None = None)[source]#

Performs MCMC sampling using Hamiltonian Monte-Carlo step.

Records the chain for later examination.

Parameters:

state (Array, optional) – Model parameters as a 1D Array.

astrophot.fit.iterative module#

class astrophot.fit.iterative.Iter(model: Model, initial_state: ndarray = None, max_iter: int = 100, lm_kwargs: Dict[str, Any] = {'verbose': 0}, **kwargs: Dict[str, Any])[source]#

Bases: BaseOptimizer

Optimizer wrapper that performs optimization iteratively.

This optimizer applies the LM optimizer to a group model iteratively one model at a time. It can be used for complex fits or when the number of models to fit is too large to fit in memory. Note that it will iterate through the group model, but if models within the group are themselves group models, then they will be optimized as a whole. This gives some flexibility to structure the models in a useful way.

If not given, the lm_kwargs will be set to a relative tolerance of 1e-3 and a maximum of 15 iterations. This is to allow for faster convergence, it is not worthwhile for a single model to spend lots of time optimizing when its neighbors havent converged.

Parameters:
  • max_iter – Maximum number of iterations, defaults to 100.

  • lm_kwargs – Keyword arguments to pass to LM optimizer.

fit() BaseOptimizer[source]#

Perform the iterative fitting process until convergence or maximum iterations reached.

step()[source]#

Perform a single iteration of optimization.

sub_step(model: Model, update_uncertainty=False)[source]#

Perform optimization for a single model.

astrophot.fit.lm module#

class astrophot.fit.lm.LM(model, initial_state: Sequence = None, max_iter: int = 100, relative_tolerance: float = 1e-05, Lup=11.0, Ldn=9.0, L0=1.0, max_step_iter: int = 10, ndf=None, likelihood='gaussian', constraint: LMConstraint | None = None, forward=None, jacobian=None, **kwargs)[source]#

Bases: BaseOptimizer

The LM class is an implementation of the Levenberg-Marquardt optimization algorithm. This method is used to solve non-linear least squares problems and is known for its robustness and efficiency.

The Levenberg-Marquardt (LM) algorithm is an iterative method used for solving non-linear least squares problems. It can be seen as a combination of the Gauss-Newton method and the gradient descent method: it works like the Gauss-Newton method when the parameters are approximately near a minimum, and like the gradient descent method when the parameters are far from their optimal values.

The cost function that the LM algorithm tries to minimize is of the form:

$$f(boldsymbol{beta}) = frac{1}{2}sum_{i=1}^{N} r_i(boldsymbol{beta})^2$$

where $boldsymbol{beta}$ is the vector of parameters, $r_i$ are the residuals, and $N$ is the number of observations.

The LM algorithm iteratively performs the following update to the parameters:

$$boldsymbol{beta}_{n+1} = boldsymbol{beta}_{n} - (J^T J + lambda diag(J^T J))^{-1} J^T boldsymbol{r}$$

where: - $J$ is the Jacobian matrix whose elements are $J_{ij} = frac{partial r_i}{partial beta_j}$, - $boldsymbol{r}$ is the vector of residuals $r_i(boldsymbol{beta})$, - $lambda$ is a damping factor which is adjusted at each iteration.

When $lambda = 0$ this can be seen as the Gauss-Newton method. In the limit that $lambda$ is large, the $J^T J$ matrix (an approximation of the Hessian) becomes subdominant and the update essentially points along $J^T boldsymbol{r}$ which is the gradient. In this scenario the gradient descent direction is also modified by the $lambda diag(J^T J)$ scaling which in some sense makes each gradient unitless and further improves the step. Note as well that as $lambda$ gets larger the step taken will be smaller, which helps to ensure convergence when the initial guess of the parameters are far from the optimal solution.

Note that the residuals $r_i$ are typically also scaled by the variance of the pixels, but this does not change the equations above. For a detailed explanation of the LM method see the article by Henri Gavin on which much of the AstroPhot LM implementation is based:

.. code-block:: text
@article{Gavin2019,

title={The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems}, author={Gavin, Henri P}, journal={Department of Civil and Environmental Engineering, Duke University}, volume={19}, year={2019}

}

as well as the paper on LM geodesic acceleration by Mark Transtrum:

.. code-block:: text
@article{Tanstrum2012,
author = {{Transtrum}, Mark K. and {Sethna}, James P.},
title = “{Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization}”,
year = 2012,

doi = {10.48550/arXiv.1201.5885},

adsurl = {https://ui.adsabs.harvard.edu/abs/2012arXiv1201.5885T},

}

The damping factor $lambda$ is adjusted at each iteration: it is effectively increased when we are far from the solution, and decreased when we are close to it. In practice, the algorithm attempts to pick the smallest $lambda$ that is can while making sure that the $chi^2$ decreases at each step.

The main advantage of the LM algorithm is its adaptability. When the current estimate is far from the optimum, the algorithm behaves like a gradient descent to ensure global convergence. However, when it is close to the optimum, it behaves like Gauss-Newton, which provides fast local convergence.

In practice, the algorithm is typically implemented with various enhancements to improve its performance. For example, the Jacobian may be approximated with finite differences, geodesic acceleration can be used to speed up convergence, and more sophisticated strategies can be used to adjust the damping factor $lambda$.

The exact performance of the LM algorithm will depend on the nature of the problem, including the complexity of the function f(x), the quality of the initial estimate x0, and the distribution of the data.

The LM class implemented for AstroPhot takes a model, initial state, and various other optional parameters as inputs and seeks to find the parameters that minimize the cost function.

check_convergence() bool[source]#

Check if the optimization has converged based on the last iteration’s chi^2 and the relative tolerance.

chi2_ndf()[source]#
property covariance_matrix: Annotated[Tensor, 'One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.']#

The covariance matrix for the model at the current parameters. This can be used to construct a full Gaussian PDF for the parameters using: $mathcal{N}(mu,Sigma)$ where $mu$ is the optimized parameters and $Sigma$ is the covariance matrix.

fit(update_uncertainty=True) BaseOptimizer[source]#

This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.

poisson_2nll_ndf()[source]#
update_uncertainty() None[source]#

Call this function after optimization to set the uncertainties for the parameters. This will use the diagonal of the covariance matrix to update the uncertainties. See the covariance_matrix function for the full representation of the uncertainties.

class astrophot.fit.lm.LMConstraint(model, constraint, sigma, name=None)[source]#

Bases: Module

jacobian(model, params)[source]#

astrophot.fit.mala module#

class astrophot.fit.mala.MALA(model: Model, initial_state: Sequence | None = None, chains=4, epsilon: float = 0.01, mass_matrix: ndarray | None = None, max_iter: int = 1000, progress_bar: bool = True, likelihood='gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Metropolis-Adjusted Langevin Algorithm (MALA) sampler, based on: https://en.wikipedia.org/wiki/Metropolis-adjusted_Langevin_algorithm . This is a gradient-based MCMC sampler that uses the gradient of the log-likelihood to propose new samples. These gradient based proposals can lead to more efficient sampling of the parameter space. This is especially true when the mass_matrix is set well. A good guess for the mass matrix is the covariance matrix of the likelihood at the maximum likelihood point. Which can be found fairly easily with the LM optimizer (see the fitting methods tutorial).

Parameters:
  • chains – The number of MCMC chains to run in parallel. Default is 4.

  • epsilon – The step size for the MALA sampler. Default is 1e-2.

  • mass_matrix – The mass matrix for the MALA sampler. If None, the identity matrix is used.

  • progress_bar – Whether to show a progress bar during sampling. Default is True.

  • likelihood – The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

density_func()[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

density_grad_func()[source]#

Returns the gradient of the density of the model at the given state vector. This is used to calculate the gradient of the likelihood of the model at the given state.

fit()[source]#

astrophot.fit.mhmcmc module#

class astrophot.fit.mhmcmc.MHMCMC(model: Model, initial_state: Sequence | None = None, max_iter: int = 1000, likelihood='gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Metropolis-Hastings Markov-Chain Monte-Carlo sampler, based on: https://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm . This is simply a thin wrapper for the Emcee package, which is a well-known MCMC sampler.

Note that the Emcee sampler requires multiple walkers to sample the parameter space efficiently. The number of walkers is set to twice the number of parameters by default, but can be made higher (not lower) if desired. This is done by passing a 2D array of shape (nwalkers, ndim) to the fit method.

Parameters:

likelihood – The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

density()[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

fit(state: ndarray | None = None, nsamples: int | None = None, restart_chain: bool = True, skip_initial_state_check: bool = True, flat_chain: bool = True)[source]#

Performs the MCMC sampling using a Metropolis Hastings acceptance step and records the chain for later examination.

astrophot.fit.scipy_fit module#

class astrophot.fit.scipy_fit.ScipyFit(model, initial_state: Sequence = None, method: Literal['Nelder-Mead', 'L-BFGS-B', 'TNC', 'SLSQP', 'Powell', 'trust-constr'] = 'Nelder-Mead', likelihood: Literal['gaussian', 'poisson'] = 'gaussian', ndf=None, **kwargs)[source]#

Bases: BaseOptimizer

Scipy-based optimizer for fitting models to data using various optimization methods.

The optimizer uses the scipy.optimize.minimize function to perform the fitting. The Scipy package is widely used and well tested for optimization tasks. It supports a variety of methods, however only a subset allow users to define boundaries for the parameters. This wrapper is only for those methods.

Parameters:
  • model – The model to fit, which should be an instance of Model.

  • initial_state – Initial guess for the model parameters as a 1D Array.

  • method – The optimization method to use. Default is “Nelder-Mead”, but can be set to any of: “Nelder-Mead”, “L-BFGS-B”, “TNC”, “SLSQP”, “Powell”, or “trust-constr”.

  • ndf – Optional number of degrees of freedom for the fit. If not provided, it is calculated as the number of data points minus the number of parameters.

density(state: Sequence) float[source]#
fit()[source]#
numpy_bounds()[source]#

Convert the model’s parameter bounds to a format suitable for scipy.optimize.

Module contents#

class astrophot.fit.Grad(model: Model, initial_state: Sequence = None, likelihood='gaussian', method='NAdam', optim_kwargs={}, patience: int = 10, report_freq=10, **kwargs)[source]#

Bases: BaseOptimizer

A gradient descent optimization wrapper for AstroPhot Model objects.

The default method is “NAdam”, a variant of the Adam optimization algorithm. This optimizer uses a combination of gradient descent and Nesterov momentum for faster convergence. The optimizer is instantiated with a set of initial parameters and optimization options provided by the user. The fit method performs the optimization, taking a series of gradient steps until a stopping criteria is met.

BaseOptimizer

Base optimizer object that other optimizers inherit from. Ensures consistent signature for the classes.

Parameters:
  • model – an AstroPhot_Model object that will have its (unlocked) parameters optimized [AstroPhot_Model]

  • initial_state – optional initialization for the parameters as a 1D Array [Array]

  • relative_tolerance – tolerance for counting success steps as: $0 < (chi_2^2 - chi_1^2)/chi_1^2 < text{tol}$ [float]

  • verbose – verbosity level for the optimizer [int]

  • max_iter – maximum allowed number of iterations [int]

  • save_steps – optional string for path to save the model at each step (fitter dependent), e.g. “model_step_{step}.hdf5” [str]

  • fit_valid – whether to fit while forcing parameters into valid range, or allow any value for each parameter. Default True [bool]

  • likelihood – The likelihood function to use for the optimization. Defaults to “gaussian”.

  • method – the optimization method to use for the update step. Defaults to “NAdam”.

  • optim_kwargs – a dictionary of keyword arguments to pass to the pytorch optimizer.

  • patience – number of steps with no improvement before stopping the optimization. Defaults to 10.

  • report_freq – frequency of reporting the optimization progress. Defaults to 10 steps.

density(state: Tensor) Tensor[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state. Based on self.likelihood, will be either the Gaussian or Poisson negative log likelihood.

fit() BaseOptimizer[source]#

Perform an iterative fit of the model parameters using the specified optimizer.

The fit procedure continues until a stopping criteria is met, such as the maximum number of iterations being reached, or no improvement being made after a specified number of iterations.

step() None[source]#

Take a single gradient step.

Computes the loss function of the model, computes the gradient of the parameters using automatic differentiation, and takes a step with the PyTorch optimizer.

class astrophot.fit.HMC(model: Model, initial_state: Sequence | None = None, max_iter: int = 1000, inv_mass: Tensor | None = None, epsilon: float = 0.0001, leapfrog_steps: int = 10, progress_bar: bool = True, prior: Distribution | None = None, warmup: int = 100, hmc_kwargs: dict = {}, mcmc_kwargs: dict = {}, likelihood: str = 'gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Hamiltonian Monte-Carlo sampler wrapper for the Pyro package.

This MCMC algorithm uses gradients of the $chi^2$ to more efficiently explore the probability distribution.

More information on HMC can be found at: https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo, https://arxiv.org/abs/1701.02434, and http://www.mcmchandbook.net/HandbookChapter5.pdf

Parameters:
  • max_iter (int, optional) – The number of sampling steps to perform. Defaults to 1000.

  • epsilon (float, optional) – The length of the integration step to perform for each leapfrog iteration. The momentum update will be of order epsilon * score. Defaults to 1e-5.

  • leapfrog_steps (int, optional) – Number of steps to perform with leapfrog integrator per sample of the HMC. Defaults to 10.

  • inv_mass (float or array, optional) – Inverse Mass matrix (covariance matrix) which can tune the behavior in each dimension to ensure better mixing when sampling. Defaults to the identity.

  • progress_bar (bool, optional) – Whether to display a progress bar during sampling. Defaults to True.

  • prior (distribution, optional) – Prior distribution for the parameters. Defaults to None.

  • warmup (int, optional) – Number of warmup steps before actual sampling begins. Defaults to 100.

  • hmc_kwargs (dict, optional) – Additional keyword arguments for the HMC sampler. Defaults to {}.

  • mcmc_kwargs (dict, optional) – Additional keyword arguments for the MCMC process. Defaults to {}.

fit(state: Tensor | None = None)[source]#

Performs MCMC sampling using Hamiltonian Monte-Carlo step.

Records the chain for later examination.

Parameters:

state (Array, optional) – Model parameters as a 1D Array.

class astrophot.fit.Iter(model: Model, initial_state: ndarray = None, max_iter: int = 100, lm_kwargs: Dict[str, Any] = {'verbose': 0}, **kwargs: Dict[str, Any])[source]#

Bases: BaseOptimizer

Optimizer wrapper that performs optimization iteratively.

This optimizer applies the LM optimizer to a group model iteratively one model at a time. It can be used for complex fits or when the number of models to fit is too large to fit in memory. Note that it will iterate through the group model, but if models within the group are themselves group models, then they will be optimized as a whole. This gives some flexibility to structure the models in a useful way.

If not given, the lm_kwargs will be set to a relative tolerance of 1e-3 and a maximum of 15 iterations. This is to allow for faster convergence, it is not worthwhile for a single model to spend lots of time optimizing when its neighbors havent converged.

Parameters:
  • max_iter – Maximum number of iterations, defaults to 100.

  • lm_kwargs – Keyword arguments to pass to LM optimizer.

fit() BaseOptimizer[source]#

Perform the iterative fitting process until convergence or maximum iterations reached.

step()[source]#

Perform a single iteration of optimization.

sub_step(model: Model, update_uncertainty=False)[source]#

Perform optimization for a single model.

class astrophot.fit.IterParam(model: Model, initial_state: Sequence = None, chunks: int | tuple = 50, chunk_order: Literal['random', 'sequential'] = 'sequential', max_iter: int = 100, relative_tolerance: float = 1e-05, Lup=11.0, Ldn=9.0, L0=1.0, max_step_iter: int = 10, ndf=None, W=None, likelihood='gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Optimization wrapper that call LM optimizer on subsets of variables.

IterParam takes the full set of parameters for a model and breaks them down into chunks as specified by the user. It then calls Levenberg-Marquardt optimization on the subset of parameters, and iterates through all subsets until every parameter has been optimized. It cycles through these chunks until convergence. This method is very powerful in situations where the full optimization problem cannot fit in memory, or where the optimization problem is too complex to tackle as a single large problem. In full LM optimization a single problematic parameter can ripple into issues with every other parameter, so breaking the problem down can sometimes make an otherwise intractable problem easier. For small problems with only a few models, it is likely better to optimize the full problem with LM as, when it works, LM is faster than the IterParam method.

Parameters:
  • chunks (Union[int, tuple]) – Specify how to break down the model parameters. If an integer, at each iteration the algorithm will break the parameters into groups of that size. If a tuple, it should be a tuple of arrays of length num_dimensions which act as selectors for the parameters to fit (1 to include, 0 to exclude). Default: 50.

  • chunk_order (str) – How to iterate through the chunks. Should be one of: "random", "sequential". Default: "sequential".

check_convergence() bool[source]#

Check if the optimization has converged based on the last iteration’s chi^2 and the relative tolerance.

chi2_ndf()[source]#
property covariance_matrix#

The covariance matrix for the model at the current parameters. This can be used to construct a full Gaussian PDF for the parameters using: $mathcal{N}(mu,Sigma)$ where $mu$ is the optimized parameters and $Sigma$ is the covariance matrix.

fit(update_uncertainty=True) BaseOptimizer[source]#

This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.

iter_chunks()[source]#
make_chunks(chunks)[source]#
poisson_2nll_ndf()[source]#
update_uncertainty() None[source]#

Call this function after optimization to set the uncertainties for the parameters. This will use the diagonal of the covariance matrix to update the uncertainties. See the covariance_matrix function for the full representation of the uncertainties.

class astrophot.fit.LM(model, initial_state: Sequence = None, max_iter: int = 100, relative_tolerance: float = 1e-05, Lup=11.0, Ldn=9.0, L0=1.0, max_step_iter: int = 10, ndf=None, likelihood='gaussian', constraint: LMConstraint | None = None, forward=None, jacobian=None, **kwargs)[source]#

Bases: BaseOptimizer

The LM class is an implementation of the Levenberg-Marquardt optimization algorithm. This method is used to solve non-linear least squares problems and is known for its robustness and efficiency.

The Levenberg-Marquardt (LM) algorithm is an iterative method used for solving non-linear least squares problems. It can be seen as a combination of the Gauss-Newton method and the gradient descent method: it works like the Gauss-Newton method when the parameters are approximately near a minimum, and like the gradient descent method when the parameters are far from their optimal values.

The cost function that the LM algorithm tries to minimize is of the form:

$$f(boldsymbol{beta}) = frac{1}{2}sum_{i=1}^{N} r_i(boldsymbol{beta})^2$$

where $boldsymbol{beta}$ is the vector of parameters, $r_i$ are the residuals, and $N$ is the number of observations.

The LM algorithm iteratively performs the following update to the parameters:

$$boldsymbol{beta}_{n+1} = boldsymbol{beta}_{n} - (J^T J + lambda diag(J^T J))^{-1} J^T boldsymbol{r}$$

where: - $J$ is the Jacobian matrix whose elements are $J_{ij} = frac{partial r_i}{partial beta_j}$, - $boldsymbol{r}$ is the vector of residuals $r_i(boldsymbol{beta})$, - $lambda$ is a damping factor which is adjusted at each iteration.

When $lambda = 0$ this can be seen as the Gauss-Newton method. In the limit that $lambda$ is large, the $J^T J$ matrix (an approximation of the Hessian) becomes subdominant and the update essentially points along $J^T boldsymbol{r}$ which is the gradient. In this scenario the gradient descent direction is also modified by the $lambda diag(J^T J)$ scaling which in some sense makes each gradient unitless and further improves the step. Note as well that as $lambda$ gets larger the step taken will be smaller, which helps to ensure convergence when the initial guess of the parameters are far from the optimal solution.

Note that the residuals $r_i$ are typically also scaled by the variance of the pixels, but this does not change the equations above. For a detailed explanation of the LM method see the article by Henri Gavin on which much of the AstroPhot LM implementation is based:

.. code-block:: text
@article{Gavin2019,

title={The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems}, author={Gavin, Henri P}, journal={Department of Civil and Environmental Engineering, Duke University}, volume={19}, year={2019}

}

as well as the paper on LM geodesic acceleration by Mark Transtrum:

.. code-block:: text
@article{Tanstrum2012,
author = {{Transtrum}, Mark K. and {Sethna}, James P.},
title = “{Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization}”,
year = 2012,

doi = {10.48550/arXiv.1201.5885},

adsurl = {https://ui.adsabs.harvard.edu/abs/2012arXiv1201.5885T},

}

The damping factor $lambda$ is adjusted at each iteration: it is effectively increased when we are far from the solution, and decreased when we are close to it. In practice, the algorithm attempts to pick the smallest $lambda$ that is can while making sure that the $chi^2$ decreases at each step.

The main advantage of the LM algorithm is its adaptability. When the current estimate is far from the optimum, the algorithm behaves like a gradient descent to ensure global convergence. However, when it is close to the optimum, it behaves like Gauss-Newton, which provides fast local convergence.

In practice, the algorithm is typically implemented with various enhancements to improve its performance. For example, the Jacobian may be approximated with finite differences, geodesic acceleration can be used to speed up convergence, and more sophisticated strategies can be used to adjust the damping factor $lambda$.

The exact performance of the LM algorithm will depend on the nature of the problem, including the complexity of the function f(x), the quality of the initial estimate x0, and the distribution of the data.

The LM class implemented for AstroPhot takes a model, initial state, and various other optional parameters as inputs and seeks to find the parameters that minimize the cost function.

check_convergence() bool[source]#

Check if the optimization has converged based on the last iteration’s chi^2 and the relative tolerance.

chi2_ndf()[source]#
property covariance_matrix: Annotated[Tensor, 'One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.']#

The covariance matrix for the model at the current parameters. This can be used to construct a full Gaussian PDF for the parameters using: $mathcal{N}(mu,Sigma)$ where $mu$ is the optimized parameters and $Sigma$ is the covariance matrix.

fit(update_uncertainty=True) BaseOptimizer[source]#

This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.

poisson_2nll_ndf()[source]#
update_uncertainty() None[source]#

Call this function after optimization to set the uncertainties for the parameters. This will use the diagonal of the covariance matrix to update the uncertainties. See the covariance_matrix function for the full representation of the uncertainties.

class astrophot.fit.LMConstraint(model, constraint, sigma, name=None)[source]#

Bases: Module

jacobian(model, params)[source]#
class astrophot.fit.MALA(model: Model, initial_state: Sequence | None = None, chains=4, epsilon: float = 0.01, mass_matrix: ndarray | None = None, max_iter: int = 1000, progress_bar: bool = True, likelihood='gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Metropolis-Adjusted Langevin Algorithm (MALA) sampler, based on: https://en.wikipedia.org/wiki/Metropolis-adjusted_Langevin_algorithm . This is a gradient-based MCMC sampler that uses the gradient of the log-likelihood to propose new samples. These gradient based proposals can lead to more efficient sampling of the parameter space. This is especially true when the mass_matrix is set well. A good guess for the mass matrix is the covariance matrix of the likelihood at the maximum likelihood point. Which can be found fairly easily with the LM optimizer (see the fitting methods tutorial).

Parameters:
  • chains – The number of MCMC chains to run in parallel. Default is 4.

  • epsilon – The step size for the MALA sampler. Default is 1e-2.

  • mass_matrix – The mass matrix for the MALA sampler. If None, the identity matrix is used.

  • progress_bar – Whether to show a progress bar during sampling. Default is True.

  • likelihood – The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

density_func()[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

density_grad_func()[source]#

Returns the gradient of the density of the model at the given state vector. This is used to calculate the gradient of the likelihood of the model at the given state.

fit()[source]#
class astrophot.fit.MHMCMC(model: Model, initial_state: Sequence | None = None, max_iter: int = 1000, likelihood='gaussian', **kwargs)[source]#

Bases: BaseOptimizer

Metropolis-Hastings Markov-Chain Monte-Carlo sampler, based on: https://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm . This is simply a thin wrapper for the Emcee package, which is a well-known MCMC sampler.

Note that the Emcee sampler requires multiple walkers to sample the parameter space efficiently. The number of walkers is set to twice the number of parameters by default, but can be made higher (not lower) if desired. This is done by passing a 2D array of shape (nwalkers, ndim) to the fit method.

Parameters:

likelihood – The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

density()[source]#

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

fit(state: ndarray | None = None, nsamples: int | None = None, restart_chain: bool = True, skip_initial_state_check: bool = True, flat_chain: bool = True)[source]#

Performs the MCMC sampling using a Metropolis Hastings acceptance step and records the chain for later examination.

class astrophot.fit.ScipyFit(model, initial_state: Sequence = None, method: Literal['Nelder-Mead', 'L-BFGS-B', 'TNC', 'SLSQP', 'Powell', 'trust-constr'] = 'Nelder-Mead', likelihood: Literal['gaussian', 'poisson'] = 'gaussian', ndf=None, **kwargs)[source]#

Bases: BaseOptimizer

Scipy-based optimizer for fitting models to data using various optimization methods.

The optimizer uses the scipy.optimize.minimize function to perform the fitting. The Scipy package is widely used and well tested for optimization tasks. It supports a variety of methods, however only a subset allow users to define boundaries for the parameters. This wrapper is only for those methods.

Parameters:
  • model – The model to fit, which should be an instance of Model.

  • initial_state – Initial guess for the model parameters as a 1D Array.

  • method – The optimization method to use. Default is “Nelder-Mead”, but can be set to any of: “Nelder-Mead”, “L-BFGS-B”, “TNC”, “SLSQP”, “Powell”, or “trust-constr”.

  • ndf – Optional number of degrees of freedom for the fit. If not provided, it is calculated as the number of data points minus the number of parameters.

density(state: Sequence) float[source]#
fit()[source]#
numpy_bounds()[source]#

Convert the model’s parameter bounds to a format suitable for scipy.optimize.

class astrophot.fit.Slalom(model: Model, initial_state: Sequence = None, S=0.0001, likelihood: str = 'gaussian', report_freq: int = 10, relative_tolerance: float = 0.0001, momentum: float = 0.5, max_iter: int = 1000, **kwargs)[source]#

Bases: BaseOptimizer

Slalom optimizer for Model objects.

Slalom is a gradient descent optimization algorithm that uses a few evaluations along the direction of the gradient to find the optimal step size. This is done by assuming that the posterior density is a parabola and then finding the minimum.

The optimizer quickly finds the minimum of the posterior density along the gradient direction, then updates the gradient at the new position and repeats. This continues until it reaches a set of 5 steps which collectively improve the posterior density by an amount smaller than the relative_tolerance threshold, indicating that convergence has been achieved. Note that this convergence criteria is not a guarantee, simply a heuristic. The default tolerance was such that the optimizer will substantially improve from the starting point, and do so quickly, but may not reach all the way to the minimum of the posterior density. Like other gradient descent algorithms, Slalom slows down considerably when trying to achieve very high precision.

Parameters:
  • S (float, optional) – The initial step size for the Slalom optimizer. Defaults to 1e-4.

  • likelihood (str, optional) – The likelihood function to use for the optimization. Defaults to “gaussian”.

  • report_freq (int, optional) – Frequency of reporting the optimization progress. Defaults to 10 steps.

  • relative_tolerance (float, optional) – The relative tolerance for convergence. Defaults to 1e-4.

  • momentum (float, optional) – The momentum factor for the Slalom optimizer. Defaults to 0.5.

  • max_iter (int, optional) – The maximum number of iterations for the optimizer. Defaults to 1000.

density(state: Annotated[Tensor, 'One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.']) Annotated[Tensor, 'One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.'][source]#

Calculate the density of the model at the given state. Based on self.likelihood, will be either the Gaussian or Poisson negative log likelihood.

fit() BaseOptimizer[source]#

Perform the Slalom optimization.