fit#

astrophot.fit.LM#

The LM class is an implementation of the Levenberg-Marquardt optimization algorithm. This method is used to solve non-linear least squares problems and is known for its robustness and efficiency.

The Levenberg-Marquardt (LM) algorithm is an iterative method used for solving non-linear least squares problems. It can be seen as a combination of the Gauss-Newton method and the gradient descent method: it works like the Gauss-Newton method when the parameters are approximately near a minimum, and like the gradient descent method when the parameters are far from their optimal values.

The cost function that the LM algorithm tries to minimize is of the form:

\[f(\boldsymbol{\beta}) = \frac{1}{2}\sum_{i=1}^{N} r_i(\boldsymbol{\beta})^2\]

where \(\boldsymbol{\beta}\) is the vector of parameters, \(r_i\) are the residuals, and \(N\) is the number of observations.

The LM algorithm iteratively performs the following update to the parameters:

\[\boldsymbol{\beta}_{n+1} = \boldsymbol{\beta}_{n} - (J^T J + \lambda diag(J^T J))^{-1} J^T \boldsymbol{r}\]

where:

  • \(J\) is the Jacobian matrix whose elements are \(J_{ij} = \frac{\partial r_i}{\partial \beta_j}\),

  • \(\boldsymbol{r}\) is the vector of residuals \(r_i(\boldsymbol{\beta})\),

  • \(\lambda\) is a damping factor which is adjusted at each iteration.

When \(\lambda = 0\) this can be seen as the Gauss-Newton method. In the limit that \(\lambda\) is large, the \(J^T J\) matrix (an approximation of the Hessian) becomes subdominant and the update essentially points along \(J^T \boldsymbol{r}\) which is the gradient. In this scenario the gradient descent direction is also modified by the \(\lambda diag(J^T J)\) scaling which in some sense makes each gradient unitless and further improves the step. Note as well that as \(\lambda\) gets larger the step taken will be smaller, which helps to ensure convergence when the initial guess of the parameters are far from the optimal solution.

Note that the residuals \(r_i\) are typically also scaled by the variance of the pixels, but this does not change the equations above. For a detailed explanation of the LM method see the article by Henri Gavin on which much of the AstroPhot LM implementation is based::

as well as the paper on LM geodesic acceleration by Mark Transtrum::

The damping factor \(\lambda\) is adjusted at each iteration: it is effectively increased when we are far from the solution, and decreased when we are close to it. In practice, the algorithm attempts to pick the smallest \(\lambda\) that is can while making sure that the \(\chi^2\) decreases at each step.

The main advantage of the LM algorithm is its adaptability. When the current estimate is far from the optimum, the algorithm behaves like a gradient descent to ensure global convergence. However, when it is close to the optimum, it behaves like Gauss-Newton, which provides fast local convergence.

In practice, the algorithm is typically implemented with various enhancements to improve its performance. For example, the Jacobian may be approximated with finite differences, geodesic acceleration can be used to speed up convergence, and more sophisticated strategies can be used to adjust the damping factor \(\lambda\).

The exact performance of the LM algorithm will depend on the nature of the problem, including the complexity of the function f(x), the quality of the initial estimate x0, and the distribution of the data.

The LM class implemented for AstroPhot takes a model, initial state, and various other optional parameters as inputs and seeks to find the parameters that minimize the cost function.

method: check_convergence() -> bool

Check if the optimization has converged based on the last iteration’s chi^2 and the relative tolerance.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: fit( update_uncertainty=True) -> astrophot.fit.base.BaseOptimizer

This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

method: update_uncertainty() -> None

Call this function after optimization to set the uncertainties for the parameters. This will use the diagonal of the covariance matrix to update the uncertainties. See the covariance_matrix function for the full representation of the uncertainties.

astrophot.fit.Grad#

A gradient descent optimization wrapper for AstroPhot Model objects.

The default method is “NAdam”, a variant of the Adam optimization algorithm. This optimizer uses a combination of gradient descent and Nesterov momentum for faster convergence. The optimizer is instantiated with a set of initial parameters and optimization options provided by the user. The fit method performs the optimization, taking a series of gradient steps until a stopping criteria is met.

Args:

  • likelihood (str, optional): The likelihood function to use for the optimization. Defaults to “gaussian”.

  • method (str, optional): the optimization method to use for the update step. Defaults to “NAdam”.

  • optim_kwargs (dict, optional): a dictionary of keyword arguments to pass to the pytorch optimizer.

  • patience (int, optional): number of steps with no improvement before stopping the optimization. Defaults to 10.

  • report_freq (int, optional): frequency of reporting the optimization progress. Defaults to 10 steps.

SUBUNIT BaseOptimizer

Base optimizer object that other optimizers inherit from. Ensures consistent signature for the classes.

Args:

  • model: an AstroPhot_Model object that will have its (unlocked) parameters optimized [AstroPhot_Model]

  • initial_state: optional initialization for the parameters as a 1D tensor [tensor]

  • relative_tolerance: tolerance for counting success steps as: \(0 < (\chi_2^2 - \chi_1^2)/\chi_1^2 < \text{tol}\) [float]

  • verbose: verbosity level for the optimizer [int]

  • max_iter: maximum allowed number of iterations [int]

  • save_steps: optional string for path to save the model at each step (fitter dependent), e.g. “model_step_{step}.hdf5” [str]

  • fit_valid: whether to fit while forcing parameters into valid range, or allow any value for each parameter. Default True [bool]

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: density( state: torch.Tensor) -> torch.Tensor

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state. Based on self.likelihood, will be either the Gaussian or Poisson negative log likelihood.

method: fit() -> astrophot.fit.base.BaseOptimizer

Perform an iterative fit of the model parameters using the specified optimizer.

The fit procedure continues until a stopping criteria is met, such as the maximum number of iterations being reached, or no improvement being made after a specified number of iterations.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

method: step() -> None

Take a single gradient step.

Computes the loss function of the model, computes the gradient of the parameters using automatic differentiation, and takes a step with the PyTorch optimizer.

astrophot.fit.Iter#

Optimizer wrapper that performs optimization iteratively.

This optimizer applies the LM optimizer to a group model iteratively one model at a time. It can be used for complex fits or when the number of models to fit is too large to fit in memory. Note that it will iterate through the group model, but if models within the group are themselves group models, then they will be optimized as a whole. This gives some flexibility to structure the models in a useful way.

If not given, the lm_kwargs will be set to a relative tolerance of 1e-3 and a maximum of 15 iterations. This is to allow for faster convergence, it is not worthwhile for a single model to spend lots of time optimizing when its neighbors havent converged.

Args:

  • max_iter: Maximum number of iterations, defaults to 100.

  • lm_kwargs: Keyword arguments to pass to LM optimizer.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: fit() -> astrophot.fit.base.BaseOptimizer

Perform the iterative fitting process until convergence or maximum iterations reached.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

method: step()

Perform a single iteration of optimization.

method: sub_step( model: astrophot.models.base.Model, update_uncertainty=False)

Perform optimization for a single model.

astrophot.fit.MALA#

Metropolis-Adjusted Langevin Algorithm (MALA) sampler, based on: https://en.wikipedia.org/wiki/Metropolis-adjusted_Langevin_algorithm . This is a gradient-based MCMC sampler that uses the gradient of the log-likelihood to propose new samples. These gradient based proposals can lead to more efficient sampling of the parameter space. This is especially true when the mass_matrix is set well. A good guess for the mass matrix is the covariance matrix of the likelihood at the maximum likelihood point. Which can be found fairly easily with the LM optimizer (see the fitting methods tutorial).

Args:

  • chains: The number of MCMC chains to run in parallel. Default is 4.

  • epsilon: The step size for the MALA sampler. Default is 1e-2.

  • mass_matrix: The mass matrix for the MALA sampler. If None, the identity matrix is used.

  • progress_bar: Whether to show a progress bar during sampling. Default is True.

  • likelihood: The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: density_func()

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

method: density_grad_func()

Returns the gradient of the density of the model at the given state vector. This is used to calculate the gradient of the likelihood of the model at the given state.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

astrophot.fit.IterParam#

Optimization wrapper that call LM optimizer on subsets of variables.

IterParam takes the full set of parameters for a model and breaks them down into chunks as specified by the user. It then calls Levenberg-Marquardt optimization on the subset of parameters, and iterates through all subsets until every parameter has been optimized. It cycles through these chunks until convergence. This method is very powerful in situations where the full optimization problem cannot fit in memory, or where the optimization problem is too complex to tackle as a single large problem. In full LM optimization a single problematic parameter can ripple into issues with every other parameter, so breaking the problem down can sometimes make an otherwise intractable problem easier. For small problems with only a few models, it is likely better to optimize the full problem with LM as, when it works, LM is faster than the IterParam method.

Args: chunks (Union[int, tuple]): Specify how to break down the model parameters. If an integer, at each iteration the algorithm will break the parameters into groups of that size. If a tuple, should be a tuple of arrays of length num_dimensions which act as selectors for the parameters to fit (1 to include, 0 to exclude). Default: 50 chunk_order (str): How to iterate through the chunks. Should be one of: random, sequential. Default: sequential

method: check_convergence() -> bool

Check if the optimization has converged based on the last iteration’s chi^2 and the relative tolerance.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: fit( update_uncertainty=True) -> astrophot.fit.base.BaseOptimizer

This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

method: update_uncertainty() -> None

Call this function after optimization to set the uncertainties for the parameters. This will use the diagonal of the covariance matrix to update the uncertainties. See the covariance_matrix function for the full representation of the uncertainties.

astrophot.fit.ScipyFit#

Scipy-based optimizer for fitting models to data using various optimization methods.

The optimizer uses the scipy.optimize.minimize function to perform the fitting. The Scipy package is widely used and well tested for optimization tasks. It supports a variety of methods, however only a subset allow users to define boundaries for the parameters. This wrapper is only for those methods.

Args:

  • model: The model to fit, which should be an instance of Model.

  • initial_state: Initial guess for the model parameters as a 1D tensor.

  • method: The optimization method to use. Default is “Nelder-Mead”, but can be set to any of: “Nelder-Mead”, “L-BFGS-B”, “TNC”, “SLSQP”, “Powell”, or “trust-constr”.

  • ndf: Optional number of degrees of freedom for the fit. If not provided, it is calculated as the number of data points minus the number of parameters.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: numpy_bounds()

Convert the model’s parameter bounds to a format suitable for scipy.optimize.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

astrophot.fit.HMC#

Hamiltonian Monte-Carlo sampler wrapper for the Pyro package.

This MCMC algorithm uses gradients of the \(\chi^2\) to more efficiently explore the probability distribution.

More information on HMC can be found at: https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo, https://arxiv.org/abs/1701.02434, and http://www.mcmchandbook.net/HandbookChapter5.pdf

Args:

  • max_iter (int, optional): The number of sampling steps to perform. Defaults to 1000.

  • epsilon (float, optional): The length of the integration step to perform for each leapfrog iteration. The momentum update will be of order epsilon * score. Defaults to 1e-5.

  • leapfrog_steps (int, optional): Number of steps to perform with leapfrog integrator per sample of the HMC. Defaults to 10.

  • inv_mass (float or array, optional): Inverse Mass matrix (covariance matrix) which can tune the behavior in each dimension to ensure better mixing when sampling. Defaults to the identity.

  • progress_bar (bool, optional): Whether to display a progress bar during sampling. Defaults to True.

  • prior (distribution, optional): Prior distribution for the parameters. Defaults to None.

  • warmup (int, optional): Number of warmup steps before actual sampling begins. Defaults to 100.

  • hmc_kwargs (dict, optional): Additional keyword arguments for the HMC sampler. Defaults to {}.

  • mcmc_kwargs (dict, optional): Additional keyword arguments for the MCMC process. Defaults to {}.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: fit( state: Optional[torch.Tensor] = None)

Performs MCMC sampling using Hamiltonian Monte-Carlo step.

Records the chain for later examination.

Args: state (torch.Tensor, optional): Model parameters as a 1D tensor.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

astrophot.fit.MHMCMC#

Metropolis-Hastings Markov-Chain Monte-Carlo sampler, based on: https://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm . This is simply a thin wrapper for the Emcee package, which is a well-known MCMC sampler.

Note that the Emcee sampler requires multiple walkers to sample the parameter space efficiently. The number of walkers is set to twice the number of parameters by default, but can be made higher (not lower) if desired. This is done by passing a 2D array of shape (nwalkers, ndim) to the fit method.

Args:

  • likelihood: The likelihood function to use for the MCMC sampling. Can be “gaussian” or “poisson”. Default is “gaussian”.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: density()

Returns the density of the model at the given state vector. This is used to calculate the likelihood of the model at the given state.

method: fit( state: Optional[numpy.ndarray] = None, nsamples: Optional[int] = None, restart_chain: bool = True, skip_initial_state_check: bool = True, flat_chain: bool = True)

Performs the MCMC sampling using a Metropolis Hastings acceptance step and records the chain for later examination.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.

astrophot.fit.Slalom#

Slalom optimizer for Model objects.

Slalom is a gradient descent optimization algorithm that uses a few evaluations along the direction of the gradient to find the optimal step size. This is done by assuming that the posterior density is a parabola and then finding the minimum.

The optimizer quickly finds the minimum of the posterior density along the gradient direction, then updates the gradient at the new position and repeats. This continues until it reaches a set of 5 steps which collectively improve the posterior density by an amount smaller than the relative_tolerance threshold, indicating that convergence has been achieved. Note that this convergence criteria is not a guarantee, simply a heuristic. The default tolerance was such that the optimizer will substantially improve from the starting point, and do so quickly, but may not reach all the way to the minimum of the posterior density. Like other gradient descent algorithms, Slalom slows down considerably when trying to achieve very high precision.

Args:

  • S (float, optional): The initial step size for the Slalom optimizer. Defaults to 1e-4.

  • likelihood (str, optional): The likelihood function to use for the optimization. Defaults to “gaussian”.

  • report_freq (int, optional): Frequency of reporting the optimization progress. Defaults to 10 steps.

  • relative_tolerance (float, optional): The relative tolerance for convergence. Defaults to 1e-4.

  • momentum (float, optional): The momentum factor for the Slalom optimizer. Defaults to 0.5.

  • max_iter (int, optional): The maximum number of iterations for the optimizer. Defaults to 1000.

method: chi2min() -> float

Returns the minimum value of chi^2 loss in the loss history.

method: density( state: Annotated[torch.Tensor, ‘One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.’]) -> Annotated[torch.Tensor, ‘One of: torch.Tensor or jax.numpy.ndarray depending on the chosen backend.’]

Calculate the density of the model at the given state. Based on self.likelihood, will be either the Gaussian or Poisson negative log likelihood.

method: fit() -> astrophot.fit.base.BaseOptimizer

Perform the Slalom optimization.

method: res() -> numpy.ndarray

Returns the value of lambda (state parameters) at which minimum loss was achieved.

method: res_loss()

returns the minimum value from the loss history.