Note
This document captures the redesign work of the lsst::afw::math::Statistics functionality.
Warning
This is an initial draft. A partially working proof of concept implementation, which sketches out the general design and has been benchmarked to be equivalent to the current Statistics performance on 4k x 4k noise images, is available at:
3 Requirements¶
3.1 Critical requirements From LDM-151 (chapter 9.7)¶
- Compute various robust statistics for central tendency and distribution widths, measured on 2-d and 1-d arrays.
- Needs to be able to make use of mask and uncertainty arrays.
- Needs to work on 2-D Images and MaskedImages.
- Needs to work on stacks of aligned pixels for coaddition.
3.2 Keep from current implementation¶
- The implementation will be in C++.
- Equivalent performance (but some micro optimizations may be dropped).
- For efficiency multiple statistics will be computed together in a single pass wherever possible.
- Enable statistics to be calculated for afw::image::Image<T>,afw::image::MaskedImage<T>andstd::vector<T>objects.
- Support all currently supported statistics. I.e.
- number of sample points
- sample mean
- sample standard deviation
- sample variance
- sample median
- sample inter-quartile range
- sample N-sigma clipped mean
- sample N-sigma clipped stdev
- sample N-sigma clipped variance
- sample minimum
- sample maximum
- find sum of pixels in the image
- find mean value of square of pixel values
- or-mask of all pixels used
- errors of requested quantities
 
 
- Support statistics of afw::image::Mask<T>.
- Allow specification with bitmask which pixels to include / exclude in statistic.
3.3 Additional functionality in new implementation¶
- Change to ndarray::Array(instead ofafw::image::Image<T>for type to operate on).
- Configuration using pex::config.
- Allow for adding new statistics to calculate (at compile time, not (necessarily) run time), and enable current statistics to be calculated differently.
- Add ability to see which pixels were clipped in output (e.g. as bitmask or as list of indices).
- Ensure design is compatible with afw::detection::SpanSet::applyFunctor.
3.4 Out of scope¶
- Adding new statistics.
4 Interfaces¶
All statistics requests are handled through the afw::math::computeStatistics function
(which roughly mimics scipy.stats.describe).
In Python its signature is as follows:
computeStatistics(img,
                   msk = None,
                   wgt = None,
                   var = None,
                   errors = False,
                   npoint = False,
                   mean = False,
                   stdev = False,
                   variance = False,
                   median = False,
                   iqrange = False,
                   meanclip = False,
                   stdevclip = False,
                   varianceclip = False,
                   min = False,
                   max = False,
                   sum = False,
                   meansquare = False,
                   ormask = False,
                   sctrl = StatisticsControl())
Where img, msk, wgt and var are typically a 1 dimensional numpy.ndarray, or any of the other supported overloads (see below).
In C++ the signature is:
template <typename ImageT, typename MaskT, typename WeightT, typename VarianceT>
StatisticsResult computeStatistics(ImageT const &img,
                          MaskT const *msk,     // (nullptr if not used)
                          WeightT const *wgt,   // (nullptr if not used)
                          VarianceT const *var, // (nullptr if not used)
                          int const flags,
                          StatisticsControl const &sctrl = StatisticsControl());
Where img, msk, wgt and var are typically ndarray::Array<T, 1, 1> of the appropriate type T, and where flags is a bitwise combination of:
enum Property {
    NOTHING = 0x0,         ///< We don't want anything
    ERRORS = 0x1,          ///< Include errors of requested quantities
    NPOINT = 0x2,          ///< number of sample points
    MEAN = 0x4,            ///< estimate sample mean
    STDEV = 0x8,           ///< estimate sample standard deviation
    VARIANCE = 0x10,       ///< estimate sample variance
    MEDIAN = 0x20,         ///< estimate sample median
    IQRANGE = 0x40,        ///< estimate sample inter-quartile range
    MEANCLIP = 0x80,       ///< estimate sample N-sigma clipped mean (N set in StatisticsControl, default=3)
    STDEVCLIP = 0x100,     ///< estimate sample N-sigma clipped stdev (N set in StatisticsControl, default=3)
    VARIANCECLIP = 0x200,  ///< estimate sample N-sigma clipped variance
                           ///<  (N set in StatisticsControl, default=3)
    MIN = 0x400,           ///< estimate sample minimum
    MAX = 0x800,           ///< estimate sample maximum
    SUM = 0x1000,          ///< find sum of pixels in the image
    MEANSQUARE = 0x2000,   ///< find mean value of square of pixel values
    ORMASK = 0x4000        ///< get the or-mask of all pixels used.
};
This is more natural in C++ for multiple options. An alternative would be to absorb the requested properties into StatisticsControl.
4.1 Overloads¶
The following overloads are present (also available from Python with the same syntax as above).
- Handle image::Imagearguments with an optionalimage::Mask.
template <typename ImagePixelT, typename MaskPixelT, typename WeightPixelT>
StatisticsResult computeStatistics(lsst::afw::image::Image<ImagePixelT> const &img,
                          lsst::afw::image::Mask<MaskPixelT> const *msk,
                          lsst::afw::image::Image<WeightPixelT> const *wgt,
                          int const flags,
                          StatisticsControl const &sctrl = StatisticsControl());
- Handle image::MaskedImage.
template <typename ImagePixelT, typename WeightPixelT>
StatisticsResult computeStatistics(image::MaskedImage<ImagePixelT> const &mimg,
                          lsst::afw::image::Image<WeightPixelT> const *wgt,
                          int const flags,
                          StatisticsControl const &sctrl = StatisticsControl());
- Handle image::Mask.
template <typename MaskPixelT>
StatisticsResult computeStatistics(image::Mask<MaskPixelT> const &msk,
                          int const flags,
                          statisticscontrol const &sctrl = statisticscontrol());
These overloads mostly just pass through the respective ndarrays obtained with .getArray().
4.2 Configuration¶
Statistics to compute are selected either with keyword arguments (Python) or a bitmask (C++).
Additional configuration settings are provided through an (optional) StatisticsControl object
using standard functionality provided by pex_config.
class NewStatisticsControl {
public:
    NewStatisticsControl()
            : numSigmaClip(3),
              numIter(3),
              andMask(0x0),
              noGoodPixelsMask(0x0),
              isNanSafe(true),
              calcErrorFromInputVariance(true),
              baseCaseSize(100),
              maskPropagationThresholds(16) {}
    LSST_CONTROL_FIELD(numSigmaClip, double, "Number of standard deviations to clip at");
    LSST_CONTROL_FIELD(numIter, int, "Number of iterations");
    LSST_CONTROL_FIELD(andMask, typename image::MaskPixel, "and-Mask to specify which mask planes to ignore");
    LSST_CONTROL_FIELD(noGoodPixelsMask, typename image::MaskPixel, "mask to set if no values are acceptable");
    LSST_CONTROL_FIELD(isNanSafe, bool, "Check for NaNs & Infs before running (slower)");
    LSST_CONTROL_FIELD(calcErrorFromInputVariance, bool,
                       "Calculate errors from the input variances, if available");
    LSST_CONTROL_FIELD(baseCaseSize, int, "Size of base case in partial sum for numerical stability");
    LSST_CONTROL_FIELD(
            maskPropagationThresholds, std::vector<double>,
            "Thresholds for when to propagate mask bits, treated like a dict (unset bits are set to 1.0)");
};
4.3 Results¶
Results are returned in the form of a single StatisticsResult object.
class StatisticsResult {
public:
    size_t getNpoint() const;
    std::pair<double, double> getMean() const;
    std::pair<double, double> getStdev() const;
    std::pair<double, double> getVariance() const;
    std::pair<double, double> getMedian() const;
    std::pair<double, double> getIqrange() const;
    std::pair<double, double> getMeanClip() const;
    std::pair<double, double> getStdevClip() const;
    std::pair<double, double> getVarianceClip() const;
    double getMin() const;
    double getMax() const;
    double getSum() const;
    typename image::MaskPixel getOrMask() const;
};
Note
Non-computed results are indicated with NaN.
Note
In Python all getters are also available as properties.
5 Implementation¶
The above requirements lead to the following design guidelines:
- Use compile-time polymorphism (through templated types) exclusively (at least in the inner loop) for abstraction, configurability and speed.
- Translate runtime options to compile-time constants (such that branches in the inner loop are compiled away) as much as possible.
- Enable pairwise summation for numerical stability.
- Separate accumulation of intermediate products from calculating of final values (to enable interoperability with
afw::detection::SpanSet::applyFunctor).
5.1 Design overview¶
The general design uses a Strategy like structure but with compile-time (instead of run-time) polymorphism. It has the following components:
Algorithm: calculates (collect) and combines (combineWith) intermediate values and produces the finalStatisticsResult(reduce). Relies on external state (ExternalData) for nested runs. OnlyStandardStatisticsis implemented, but can be swapped out later.
Runner<Algorithm, Validator>: performs (operator()) a single pass run of the algorithm on the data. Holds a (unique) instance ofExternalData. OnlySinglePassStatisticsis implemented, but can be swapped out later.
Validator: checks if pixels should be included or excluded. Are composable with various simple building blocks available.
5.2 Internal API¶
5.2.1 computeStatistics¶
The externally visible computeStatistics function calls an internal detail::translateOptions,
which has a series of overloads that reduce runtime function arguments to compile-time template parameters.
At the end of this series this results in a call to detail::computeStatistics.
namespace detail {
template <bool useMask, bool useWeight, bool useVariance, bool computeRange, bool computeMedian,
          bool sigmaClipped, typename ImageT, typename MaskT, typename WeightT, typename VarianceT>
StatisticsResult computeStatistics(ImageT const &img, MaskT const &msk, WeightT const &wgt,
                          VarianceT const &var, NewStatisticsControl const &sctrl);
}  // namespace detail
At this point one run (or multiple runs in case of sigma clipping) of SinglePassStatistics are performed
and the results are returned.
Note
Currently the code assumes StandardStatistics as the underlying algorithm, but this can be
changed to a template parameter if needed.
5.2.2 SinglePassStatistics¶
The default runner is SinglePassStatistics. This implementation uses pairwise summation (combination)
down to a baseCaseSize after which it performs a simple linear run through the data.
Internally it keeps one instance of Algorithm::ExternalData which is passed to the Algorithm::collect
and Algorithm::reduce functions (by reference) where needed.
It takes the following template parameters:
Validator: A functor that takes image, mask, weight and variance arguments and returnstrueif the value is to be clipped, orfalseotherwise.
Algorithm: The statistics algorithm to execute.
Constructors:
explicit SinglePassStatistics(const Validator &validator = Validator(), size_t baseCaseSize = 100)Copies of both input parameters are held as private members.
defaultcopy and move constructors (as well as copy and move assignment operators).
Member functions and operators:
operator(): RunAlgorithmonce on providedimg,msk,wgtandvarusing pairwise summation / combination.template <typename ImageT, typename MaskT, typename WeightT, typename VarianceT, typename... AlgorithmArgs> StatisticsResult operator()(ImageT const &img, MaskT const &msk, WeightT const &wgt, VarianceT const &var);
5.2.3 StandardStatistics¶
The default algorithm is represented by the StandardStatistics class which has the following API
Member type:
ExternalData: Algorithm specific data that has to be stored across different instances ofStandardStatistics, to be provided by the runner.
Template Parameters:
useMask,bool: Use the provided mask (assumemskisUnusedParameterif not)
useWeight,bool: Use the provided weights (assumewgtisUnusedParameterif not)
useVariance,bool: Use the provided variance (assumevarisUnusedParameterif not)
computeRange,bool: Computeminandmaxvalues (can be split if desired)
computeMedian,bool: Compute themedian(requiresExternalDatato store a copy of good pixel values)
Constructors:
- Default constructor,
defaultcopy and move constructors (as well as copy and move assignment operators).
Member functions:
collect: Gather intermediate products, can be run multiple times to accumulate one or more values.template <typename ImageIter, typename MaskIter, typename WeightIter, typename VarianceIter, typename Validator> void collect(ImageIter img, MaskIter msk, WeightIter wgt, VarianceIter var, size_t n, Validator const &validator, ExternalData &externalData);
reduce: Compute intermediate results (e.g.mean,median, orvariance, but notstdevwhich is computed lazily from the variance)StatisticsResult reduce(ExternalData &externalData);
combineWith: Combine intermediate products.void combineWith(const StandardStatistics &rhs);
5.2.4 Validators¶
During a run of Algorithm::collect each set of img, msk, wgt and var values
is passed to the provided Validator functor. If its return value is false the pixel is clipped.
Validators are composable. The makeCombinedChecker variadic function creates a single validator out of its provided arguments.
It does this by nesting instances of CheckBoth which itself takes two template arguments First and Second and has an operator() that returns the logical and of First::operator() and Second::operator().
The following building blocks are provided:
AlwaysTrue: always returnstrue;
AlwaysFalse: always returnsfalse;
CheckMask: returns bitwise and ofmskvalue withmaskargument provided at construction;
CheckFinite: returnstrueifimgis finite (e.g. notNaNor infinity);
CheckRange: returnstrueifimgis within the intervalcenter +/- limitwherecenterandlimitare provided at construction.
5.2.5 Unused parameters¶
Unused parameters (for msk, wgt or var) are given as nullptr (C++) or None (Python) in the API, but for lower layers are given as instances of UnusedParameter.
A fake vector / iterator that does nothing.
This makes the inner loop in Algorithm::collect straightforward while compiling away into non-existence..