ALFA: automated line fitting algorithm: manual

NAME

alfa − automated line fitting algorithm

SYNOPSIS

alfa [OPTION] [VALUE]... [FILE]

DESCRIPTION

alfa rapidly fits emission line spectra, using a genetic algorithm to optimise fitting parameters. It is intended to be entirely automated, but while the default values should work well in many situations, a good fit to your observed spectrum may require some adjustments to the input parameters. It is optimised for optical spectra, but can be applied to any wavelength range if a suitable line catalogue is provided.

alfa reads one-dimensional spectra in either plain text or FITS format. Plain text input should consist of two columns, giving wavelength and flux. It can also read data cubes and row-stacked spectra in FITS format. Results are written out in plain text, to files containing the fit (total fit, continuum-subtracted original, continuum, sky lines and residuals), and the line flux measurements.

OPTIONS

−b, −−bad-data [REAL]

If all values in a spectrum are below the value specified, alfa will not fit it. Most useful for avoiding wasting time on low signal regions of data cubes.

−−citation

Prints out the bibliographic details of the paper to cite if you use alfa in your research.

−−collapse

Sums all spectra in multi-dimensional FITS files into a single spectrum. Has no effect on 1D data.

−el, −−exclude-line [REAL]

When reading in the line catalogues, any wavelengths indicated with this option will be ignored. For example, if H alpha were saturated, it could be excluded from the fit with --exclude-line 6562.77. Any number of lines can be excluded by repeating the option with the appropriate wavelengths.

−g, −−generations [INTEGER]

The number of generations used in the genetic algorithm. Default: 500

−n, −−normalise [VALUE]

Normalise to Hb=100 assuming that F(Hb)=VALUE. If VALUE is zero, no normalisation is applied. If this option is not specified, fluxes are normalised using the measured value of Hb if it is detected, and not normalised otherwise.

−o, −−output−dir [DIRECTORY]

The directory in which to put the output files. Default: current working directory.

−pr, −−pressure [REAL]

The fraction of the population retained from each generation. The product of the pressure and the population size should be an integer. Default: 0.3

−ps, −−populationsize [INTEGER]

The size of the population used in the genetic algorithm. Default: 30

−rp, −−rebin [INTEGER]

Rebin the input spectrum by the specified factor. Useful for high resolution spectra where line profiles are not instrumental but kinematic. This option is currently only implemented for 1D spectra.

−rg, −−resolution−guess [VALUE]

Initial guess for the resolution [lambda/delta lambda]. Default: estimated using the sampling of the input spectrum, assuming that it is Nyquist sampled.

−rtol1, −−resolution−tolerance−1 [VALUE]

Variation allowed in resolution in first pass. Default: equal to 0.9 x resolution guess.

−rtol2, −−resolution−tolerance−2 [VALUE]

Variation allowed in resolution in second pass. Default: 500.

−skyc, --sky-catalogue; −sc, --strong-catalogue; −dc, --deep-catalogue [FILENAME]

The files containing the line catalogues to be used for the removal of sky lines, the estimation of velocity and resolution, and the full line fitting. The default catalogues are stored in /usr/share/alfa. If you wish to create your own catalogue, each line of the file should be 85 characters wide, with a wavelength in the first column. The remaining characters are not used by the code but are transferred to the output files. They can thus be used, as in the supplied catalogues, for line transition data. To use the default catalogues but exclude some lines, the --exclude-line option can be used.

−ss, −−subtract−sky

Fit and subtract night sky emission lines before fitting nebular emission lines.

−ul, −−upper-limits

Write out upper limits for all lines searched for and not detected.

−vg, −−velocity−guess [VALUE]

Initial guess for the velocity of the object [km/s]. Default: 0.

−vtol1, −−velocity−tolerance−1 [VALUE]

Variation allowed in velocity in first pass of the fitting. Default: 900 km/s

−vtol2, −−velocity−tolerance−2 [VALUE]

Variation allowed in velocity in second pass of the fitting. Default: 60 km/s

−ws, −−wavelength−scaling [VALUE]

alfa checks the units of FITS file headers to set the wavelength scale, defaulting to assuming Angstroms if no keyword is present. If your input spectrum is not in Angstroms, use this option to set the value by which wavelengths should be multiplied to convert them to Angstroms. For example, −ws 10.0 would apply if your spectra have wavelengths in nm.

ALGORITHM

alfa works in three stages. First, it estimates and subtracts the continuum. Second, it estimates the resolution of the spectra and the velocity of the object. Third, it fits the emission lines. These stages work as follows:

Continuum subtraction

alfa estimates the continuum using a percentile filter, taking the 25th percentile in a moving window 101 pixels wide. Currently these values are hard-coded, but will be user-configurable in a future release. Regions such as the Balmer and Paschen jumps may be poorly fitted by this method if the spectral resolution is low and the continuum gradient is changing fast. Broad stellar emission lines and telluric absorption features may also not be well fitted by this method. Inspection of the fitted continuum is recommended.
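
The percentile filter described above can be sketched in a few lines of Python. This is an illustration only, not alfa's own code: the function name, the truncation of the window at the ends of the spectrum, and the nearest-rank definition of the percentile are all assumptions made for the example.

```python
def percentile_continuum(flux, window=101, percentile=25.0):
    """Estimate a continuum as a running percentile of the flux.

    Assumptions (not taken from alfa itself): the window is truncated
    at the ends of the spectrum, and the percentile is picked by
    nearest rank in the sorted window.
    """
    half = window // 2
    n = len(flux)
    continuum = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        values = sorted(flux[lo:hi])
        rank = int(round(percentile / 100.0 * (len(values) - 1)))
        continuum.append(values[rank])
    return continuum


# A flat continuum of 10 with one narrow emission line on top of it.
flux = [10.0] * 300
flux[150] = 110.0
continuum = percentile_continuum(flux)
subtracted = [f - c for f, c in zip(flux, continuum)]
```

Because a narrow line occupies only a few of the 101 pixels in the window, it barely affects the 25th percentile, which is why this kind of filter follows the continuum rather than the lines.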

Estimation of the resolution and velocity

If no relevant command line options are specified, alfa begins by assuming that the velocity of the object is zero, and that the spectrum is Nyquist-sampled. It then carries out a fit on a subset of strong lines, using the genetic algorithm described below, to obtain an overall estimate for the velocity and the resolution. If necessary, the initial guesses can be specified with the -vg and -rg options described above, and the parameter space for the fine tuning can be specified with -vtol1 and -rtol1.
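
As a rough illustration of what the Nyquist-sampling assumption implies, the sketch below takes one resolution element to span two pixels and derives a resolution from the wavelength sampling. The function name, the use of the mean pixel width and the central wavelength are assumptions for the example; alfa's actual estimate may be computed differently.

```python
def nyquist_resolution_guess(wavelengths):
    """Guess R = lambda / delta-lambda for a Nyquist-sampled spectrum.

    Assumption: delta-lambda is two pixels wide, evaluated with the
    mean pixel width at the central wavelength of the spectrum.
    """
    if len(wavelengths) < 2:
        raise ValueError("need at least two wavelength points")
    n = len(wavelengths)
    pixel_width = (wavelengths[-1] - wavelengths[0]) / (n - 1)
    central = 0.5 * (wavelengths[0] + wavelengths[-1])
    return central / (2.0 * pixel_width)


# 4000 to 5000 Angstroms sampled every 0.5 Angstroms.
grid = [4000.0 + 0.5 * i for i in range(2001)]
guess = nyquist_resolution_guess(grid)
```

If the spectrum is oversampled, a guess of this kind will be too high, which is the situation in which supplying -rg by hand is likely to help.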

Fitting of the emission lines

With the continuum subtracted and the resolution and velocity estimated, alfa divides the spectrum up into chunks 440 pixels wide, with 20 pixels at either end overlapping with adjacent chunks. Then the genetic algorithm fits all lines from the deep catalogue that fall within the central 400 pixels, with the overlap regions providing the full line profile for lines close to the edge of the chunk. The initial guesses for the resolution and velocity are taken from the global estimate for the first chunk, and from the preceding chunk’s fine-tuned values for all succeeding chunks.
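
The chunking scheme can be pictured with the following sketch, which lists the pixel ranges of each chunk and of its central fitting region. It is one possible reading of the description above, with zero-based, half-open ranges and clipping at the ends of the spectrum chosen for the example; the function name is hypothetical.

```python
def chunk_bounds(n_pixels, core=400, overlap=20):
    """List (start, stop, core_start, core_stop) for each chunk.

    Ranges are zero-based and half-open. Lines are fitted in
    [core_start, core_stop); the extra `overlap` pixels on either side
    only supply the wings of lines near the edge. Clipping at the ends
    of the spectrum is an assumption of this sketch.
    """
    chunks = []
    for core_start in range(0, n_pixels, core):
        core_stop = min(core_start + core, n_pixels)
        start = max(core_start - overlap, 0)
        stop = min(core_stop + overlap, n_pixels)
        chunks.append((start, stop, core_start, core_stop))
    return chunks


for bounds in chunk_bounds(1000):
    print(bounds)
```

For a 1000-pixel spectrum, the middle chunk spans pixels 380 to 820, which is the full 440-pixel width, while the first and last chunks are shorter because there is nothing beyond the ends of the spectrum to overlap with.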

With the parameters optimised in each chunk, uncertainties are estimated using the root mean square of the residuals in a 20 pixel window, excluding the two largest residuals to avoid overestimating uncertainties in the neighbourhood of bad pixels or strong lines.
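
A minimal sketch of this uncertainty estimate is given below. The function name is hypothetical, and two details are assumptions rather than statements about alfa: the window is centred on the pixel of interest, and "largest" is taken to mean largest in absolute value.

```python
import math


def local_rms(residuals, centre, window=20, n_excluded=2):
    """RMS of the residuals around `centre`, ignoring the largest ones.

    Assumptions: the window is centred on `centre` and truncated at
    the ends of the spectrum, and the excluded residuals are the
    largest in absolute value.
    """
    half = window // 2
    lo = max(0, centre - half)
    hi = min(len(residuals), centre + half)
    squared = sorted(r * r for r in residuals[lo:hi])
    if not squared:
        raise ValueError("window contains no pixels")
    if len(squared) > n_excluded:
        squared = squared[:len(squared) - n_excluded]
    return math.sqrt(sum(squared) / len(squared))


# Unit-amplitude noise with two bad pixels near pixel 20.
residuals = [1.0] * 40
residuals[18] = 50.0
residuals[22] = -30.0
sigma = local_rms(residuals, 20)
```

In this example the two bad pixels are discarded, so the estimate reflects the noise level of the remaining pixels rather than being dominated by the outliers.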

INPUT FILES

alfa can read either plain text files or FITS format files. A plain text file should contain two columns, wavelength and flux, with the wavelength in the same units as the line catalogues (the default catalogues have wavelengths in Angstroms). FITS files are read using the CFITSIO library, so any FITS-compliant file should be fine. However, a surprisingly large fraction of all FITS files do not comply with the standard, so in case of problems, try using fitsverify to check your FITS file.

The FITS file can have one, two or three dimensions. If it has two, it is assumed to be in Row-Stacked Spectra (RSS) format, while if it has three, it is assumed to be a data cube with two axes representing spatial dimensions and the third representing the spectral dimension.

If you don’t want to fit the whole dataset, you can specify the range of pixels on each axis that you want alfa to read in. This functionality is part of the CFITSIO library, and the format is described at https://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/node94.html. alfa itself does not read in the coordinates of the section, and so the output file numbering starts from 1 on each axis regardless of where the image section actually started. The next release of alfa will have improved support for image sections.
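
Since the output numbering restarts at 1, the position of a fitted pixel in the full dataset has to be recovered by hand. The helper below is a hypothetical sketch of that bookkeeping, assuming a section read with a pixel increment of 1, for example a section such as [101:150,201:250,*] appended to the filename.

```python
def original_pixel(output_index, section_start):
    """Map a 1-based output index back to the 1-based pixel number
    in the full image, for a section read with an increment of 1."""
    if output_index < 1 or section_start < 1:
        raise ValueError("indices are 1-based")
    return section_start + output_index - 1


# For a section starting at x=101, y=201, output pixel (1, 1)
# corresponds to pixel (101, 201) of the full cube.
x = original_pixel(1, 101)
y = original_pixel(1, 201)
```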

OUTPUT FILES

For single spectra, alfa writes out three text files containing its results. Their filenames are the input file suffixed with _fit, _lines, and _lines.tex.

The fit file (filename_fit):

The fit file contains the best fitting synthesised spectrum. It contains seven columns, representing the wavelength, the input spectrum, the fitted spectrum, the original after continuum subtraction, the estimated continuum, the fluxes of sky lines, and the residuals. Thus, to see the fitted spectrum, you need to plot columns 1 and 3 of this file. In gnuplot, one can compare the input and fitted spectra using this command:

plot 'filename_fit' w l, 'filename_fit' using 1:3 w l
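
For further processing outside gnuplot, the seven columns can be read into named fields. The sketch below assumes the columns are separated by whitespace and skips any line that does not hold exactly seven numbers, such as a header; both points are assumptions about the file layout, and the field names are chosen for the example.

```python
from collections import namedtuple

# Field names follow the column order described above.
FitRow = namedtuple(
    "FitRow",
    "wavelength observed fit continuum_subtracted continuum sky residual",
)


def read_fit_rows(lines):
    """Parse lines of a fit file into FitRow tuples.

    Lines that do not contain exactly seven numeric fields are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue
        try:
            rows.append(FitRow(*(float(f) for f in fields)))
        except ValueError:
            continue  # non-numeric content, e.g. a header line
    return rows


# Typical use, assuming a file called filename_fit exists:
#     with open("filename_fit") as handle:
#         rows = read_fit_rows(handle)
#     fitted = [(row.wavelength, row.fit) for row in rows]
```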

The plain text lines file (filename_lines):

This file contains four columns with parameters of the fitted lines: the observed wavelength, the rest wavelength, the flux, and the uncertainty estimated from the residuals. This file can be read directly by neat, which determines abundances for photoionised nebulae.

The LaTeX lines file (filename_lines.tex):

This file can be used in publications. It contains the information in the plain text lines file, as well as the line identification and atomic transition data.

For RSS files and data cubes, alfa currently produces two files per pixel, these being the fit file and the plain text lines file. Thus, for a data cube you may end up with tens of thousands of files in the output directory. FITS output will be supported in the next release of alfa.

USAGE NOTES

alfa’s default parameters are supposed to work in most cases, but sometimes you might find that it does not converge on the correct wavelength solution. It searches initially for velocities in the range +/-900 km/s, which is very large for Galactic objects. So, running the code with --velocity-tolerance-1 100 or so may improve your results.
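
To see how wide the default velocity search is in wavelength terms, the sketch below converts a velocity tolerance into a wavelength shift using the non-relativistic Doppler formula. The function name is hypothetical and the calculation is a back-of-the-envelope illustration, not alfa's internal code.

```python
SPEED_OF_LIGHT_KM_S = 299792.458


def wavelength_shift(rest_wavelength, velocity_km_s):
    """Non-relativistic Doppler shift, in the units of rest_wavelength."""
    return rest_wavelength * velocity_km_s / SPEED_OF_LIGHT_KM_S


# Shift of H alpha (6562.77 Angstroms) at the default first-pass
# tolerance and at a narrower one.
wide = wavelength_shift(6562.77, 900.0)
narrow = wavelength_shift(6562.77, 100.0)
print(round(wide, 1), round(narrow, 1))
```

At H alpha, 900 km/s corresponds to a shift of nearly 20 Angstroms in either direction, compared with about 2 Angstroms for 100 km/s, so a narrower tolerance greatly reduces the chance of locking on to the wrong line.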

The genetic parameters (population size, number of generations, pressure) are likely to be suitable for most cases. There is no algorithm yet known for optimising these parameters in a genetic algorithm, so changing them requires trial and error. In spectra of regions with lots of emission lines, such as 4000-4500 Angstrom, increasing the number of generations can result in a better fit.
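
When experimenting with these parameters, it is worth checking the constraint noted under the pressure option, that the product of the pressure and the population size should be an integer. A hypothetical helper for that check might look like this:

```python
import math


def retained_count(population_size, pressure):
    """Return pressure * population_size as an integer.

    Raises ValueError if the product is not a whole number, allowing
    for floating-point rounding.
    """
    product = population_size * pressure
    nearest = round(product)
    if not math.isclose(product, nearest, abs_tol=1e-9):
        raise ValueError(
            "pressure * population size = %g is not an integer" % product
        )
    return nearest


# The defaults (population 30, pressure 0.3) retain 9 members.
print(retained_count(30, 0.3))
```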

SEE ALSO

neat

BUGS

No known bugs. If reporting one, please state which version of alfa you were using, and include input and any output files produced if possible.

AUTHOR

Roger Wesson