An important feature of flavio is its treatment of theoretical as well as
experimental uncertainties and the construction of likelihoods to fit parameters
or Wilson coefficients to existing measurements. To quantify uncertainties,
the flavio.statistics submodule contains classes for probability
distributions that can be associated with parameters or measurements.
In flavio, theoretical uncertainties in predictions of observables always result from uncertainties associated with parameters. Even formally non-parametric uncertainties, such as those stemming from unknown higher-order contributions, are parameterised in terms of such parameters, e.g. by multiplying the prediction by a “fudge factor”.
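As a minimal sketch of the “fudge factor” idea, the following toy example multiplies a central prediction by a parameter with a Gaussian constraint centred at 1 and propagates it by sampling. It does not call flavio; the function names, the central value 2.5 and the 10% relative uncertainty are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def prediction(params):
    """Toy observable: a central prediction multiplied by a fudge factor."""
    return params["central"] * params["fudge"]


def sample_prediction(n_samples, rel_uncertainty, seed=0):
    """Propagate the fudge-factor uncertainty by Monte Carlo sampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(n_samples):
        params = {
            "central": 2.5,  # hypothetical central prediction
            "fudge": rng.gauss(1.0, rel_uncertainty),  # Gaussian around 1
        }
        values.append(prediction(params))
    return statistics.mean(values), statistics.stdev(values)


mean, std = sample_prediction(20000, rel_uncertainty=0.1)
```

With this construction, the relative spread of the sampled predictions approaches the standard deviation assigned to the fudge factor, so the non-parametric uncertainty is handled exactly like any other parameter uncertainty.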
In Bayesian statistics, the uncertainties of the parameters can be quantified in terms of a (prior) probability distribution function (PDF). In frequentist statistics, this is not the case; nevertheless, one can view a constraint on a parameter as an “auxiliary measurement” of the parameter, which again has the form of a probability distribution. Since the statistical interpretation occurs at a later step, no distinction between these two cases is made at the fundamental level in flavio.
The connection between parameters and PDFs (sticking to the Bayesian wording
from now on) is made through the flavio.classes.ParameterConstraints class.
flavio.default_parameters is a predefined instance of this class containing
the default constraints for all parameters (see also Modifying defaults).
In addition to parameters, measurements also have uncertainties and are thus
associated with PDFs. The flavio.classes.Measurement class derives from the
same parent as ParameterConstraints and is very similar. The main difference
is that, while ParameterConstraints contains constraints on all parameters,
a Measurement instance typically contains a constraint on only one or a few
observables of a decay channel (e.g. from a single experimental analysis).
The predefined measurements are all read from the file measurements.yml.
YAML files can be used to set the constraints on both parameters and measurements. Consult the following files,
containing the default constraints, for examples. The following examples show in which format the constraints can be written for different choices of PDFs.
Usual, symmetric Gaussian constraints can simply be written as (showing several equivalent possibilities)
An asymmetric constraint (defined as a continuous PDF made of two half-Gaussians) can be written as
When several uncertainties are given, e.g.
the individual PDFs are convolved numerically to obtain a combined PDF.
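To illustrate what a numerical convolution of several uncertainties amounts to, here is a minimal sketch in plain Python. It is not flavio's implementation: the grid, the helper names and the uncertainty values (two symmetric Gaussians with 0.3 and 0.2, plus an asymmetric +0.3/-0.1 term) are assumptions made for this example. The asymmetric PDF follows the description above, i.e. two half-Gaussians joined continuously at the mode.

```python
import math


def normal_pdf(x, sigma):
    """Zero-centred Gaussian PDF."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def asymmetric_normal_pdf(x, sigma_left, sigma_right):
    """Zero-centred PDF made of two half-Gaussians, continuous at x = 0."""
    sigma = sigma_left if x < 0 else sigma_right
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right))
    return norm * math.exp(-0.5 * (x / sigma) ** 2)


def make_grid(half_width, n_points):
    """Symmetric grid with an odd number of points, so 0 lies on the grid."""
    assert n_points % 2 == 1
    step = 2.0 * half_width / (n_points - 1)
    return [-half_width + i * step for i in range(n_points)], step


def convolve(values_a, values_b, step):
    """Riemann-sum convolution of two PDFs sampled on the same symmetric grid."""
    n = len(values_a)
    mid = (n - 1) // 2  # index of the grid point x = 0
    result = []
    for k in range(n):
        total = 0.0
        for i in range(n):
            j = k - i + mid  # index of the grid point x_k - x_i
            if 0 <= j < n:
                total += values_a[i] * values_b[j]
        result.append(total * step)
    return result


def moments(grid, values, step):
    """Integral, mean and variance of a tabulated PDF."""
    norm = sum(values) * step
    mean = sum(x * v for x, v in zip(grid, values)) * step / norm
    var = sum((x - mean) ** 2 * v for x, v in zip(grid, values)) * step / norm
    return norm, mean, var


grid, step = make_grid(half_width=3.0, n_points=601)
gauss_a = [normal_pdf(x, 0.3) for x in grid]
gauss_b = [normal_pdf(x, 0.2) for x in grid]
asym = [asymmetric_normal_pdf(x, sigma_left=0.1, sigma_right=0.3) for x in grid]

symmetric = convolve(gauss_a, gauss_b, step)
sym_norm, sym_mean, sym_var = moments(grid, symmetric, step)

combined = convolve(symmetric, asym, step)
comb_norm, comb_mean, comb_var = moments(grid, combined, step)
```

For the two symmetric Gaussians the convolution reproduces the familiar rule of adding uncertainties in quadrature. Once the asymmetric term is included, the combined PDF is skewed and its mean is shifted away from the mode, which is why a numerical treatment is needed in the general case.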
For all distributions, there is an additional low-level interface that can be used instead:
is equivalent to the previous constraint above. This method is useful particularly for less common PDFs that cannot be accessed by shorthand notation. See the API docs for details on the available distributions.
A uniform distribution, i.e. a fixed probability within a range and zero probability outside, can be specified by giving the range boundaries in angle brackets, e.g. (equivalently)
For positive quantities that have not yet been measured but for which an upper bound exists, you can use a half-Gaussian with mode at 0 by specifying, e.g.
where the standard deviation is derived automatically by requiring the cumulative probability below this limit to be, in this case, 0.95. Arbitrary percentages in the open interval (0, 100) are allowed.
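The derivation of the standard deviation can be sketched as follows. For a half-Gaussian with mode at 0, the cumulative probability below a limit $L$ is $\mathrm{erf}\!\left(L / (\sigma\sqrt{2})\right) = 2\Phi(L/\sigma) - 1$, so requiring this to equal the confidence level fixes $\sigma$. The function names and the limit of $4.2\times 10^{-7}$ are hypothetical, and the confidence level is passed as a fraction rather than a percentage; this is a standalone illustration, not flavio's code.

```python
import math
from statistics import NormalDist


def half_normal_sigma(limit, confidence_level):
    """Sigma of a half-Gaussian (mode 0) with P(x < limit) = confidence_level."""
    if limit <= 0.0:
        raise ValueError("limit must be positive")
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie in the open interval (0, 1)")
    # Solve 2 * Phi(limit / sigma) - 1 = confidence_level for sigma.
    z = NormalDist().inv_cdf(0.5 * (1.0 + confidence_level))
    return limit / z


def half_normal_cdf(x, sigma):
    """Cumulative probability of the half-Gaussian below x."""
    return math.erf(x / (sigma * math.sqrt(2.0))) if x > 0.0 else 0.0


sigma = half_normal_sigma(limit=4.2e-7, confidence_level=0.95)
```

Evaluating `half_normal_cdf(4.2e-7, sigma)` recovers the requested confidence level, which is a convenient consistency check for any other choice of limit and percentage.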
For upper limits on positive quantities that arise from low-statistics counting experiments, a Gamma distribution is more appropriate than a normal distribution. Suppose a counting experiment recorded 10 events in total while 7 background events were expected, and the experiment quotes an upper limit of $3\times 10^{-9}$ at 95% confidence level on the quantity of interest, which is proportional to the number of signal events (e.g. a signal rate). The constraint can then be specified (using the low-level interface) as
If the number of expected background events is itself uncertain, the appropriate distribution would be a Gamma distribution convolved with a normal distribution for the background uncertainty. This can be used with, e.g.,
corresponding to $7\pm2$ expected background events.
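For the simpler case of a fixed background expectation, the construction can be sketched in plain Python. The sketch rests on an assumption about the statistical model that the text does not spell out: the PDF of the signal count $s \ge 0$ is taken proportional to $(s+b)^n e^{-(s+b)}$ for $n$ observed and $b$ expected background events, and the proportionality factor between the quantity of interest and $s$ is fixed by requiring the quoted limit to sit at the quoted confidence level. All function names are hypothetical, and flavio's own implementation may differ in detail.

```python
import math


def poisson_tail_sum(n, x):
    """exp(-x) * sum_{k=0}^{n} x**k / k!, i.e. the regularised upper
    incomplete gamma function Q(n + 1, x) for integer n."""
    term = 1.0
    total = 1.0
    for k in range(1, n + 1):
        term *= x / k
        total += term
    return math.exp(-x) * total


def signal_cdf(s, counts_total, counts_background):
    """CDF of the signal count s >= 0 for the assumed shifted Gamma shape."""
    if s <= 0.0:
        return 0.0
    q_at_zero = poisson_tail_sum(counts_total, counts_background)
    q_at_s = poisson_tail_sum(counts_total, s + counts_background)
    return 1.0 - q_at_s / q_at_zero


def signal_count_limit(counts_total, counts_background, confidence_level):
    """Signal count below which the CDF reaches the confidence level (bisection)."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie in the open interval (0, 1)")
    lo, hi = 0.0, 1.0
    while signal_cdf(hi, counts_total, counts_background) < confidence_level:
        hi *= 2.0  # expand the bracket until it contains the solution
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if signal_cdf(mid, counts_total, counts_background) < confidence_level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Numbers from the example in the text: 10 events, 7 expected background,
# upper limit 3e-9 at 95% confidence level.
s_limit = signal_count_limit(10, 7, 0.95)
scale = 3e-9 / s_limit  # quantity of interest per signal event


def quantity_cdf(x):
    """CDF of the quantity of interest, obtained by rescaling the signal count."""
    return signal_cdf(x / scale, 10, 7)
```

By construction, `quantity_cdf(3e-9)` returns the confidence level of 0.95, so the rescaled distribution encodes exactly the limit quoted by the experiment.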
Univariate constraints can also be defined by directly specifying the values of the PDF at a set of points. In this way, arbitrary probability distributions can be used. In YAML, the form is
where y is the value of the PDF with arbitrary normalisation.
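What “arbitrary normalisation” means in practice can be sketched as follows: the tabulated values are rescaled so that the PDF integrates to one, and values between grid points are obtained by interpolation. The trapezoidal rule, the linear interpolation and the choice of zero probability outside the tabulated range are assumptions of this sketch, not a statement about flavio's internals; the sample points are made up.

```python
from bisect import bisect_right


def normalise(xs, ys):
    """Rescale tabulated PDF values so that the trapezoid-rule integral is 1."""
    area = sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)
    )
    if area <= 0.0:
        raise ValueError("the tabulated PDF must have positive area")
    return [y / area for y in ys]


def interpolate_pdf(x, xs, ys):
    """Linearly interpolate the PDF; assume zero outside the tabulated range."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_raw = [0.0, 2.0, 4.0, 2.0, 0.0]  # arbitrary normalisation
ys = normalise(xs, ys_raw)
```

Because only the shape matters, multiplying `ys_raw` by any positive constant leaves the normalised values unchanged.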
There are two important caveats when using numerical distributions:
The easiest way to include a multivariate Gaussian constraint differs for parameters and measurements.
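Independently of how the constraint is entered, a multivariate Gaussian is characterised by central values and a covariance matrix, which is commonly built from individual standard deviations and correlation coefficients. The following two-dimensional sketch shows this relation in plain Python; the function names are hypothetical and nothing here uses flavio's API.

```python
import math


def covariance_2d(sigma_1, sigma_2, rho):
    """Covariance matrix from two standard deviations and their correlation."""
    off_diagonal = rho * sigma_1 * sigma_2
    return [[sigma_1 ** 2, off_diagonal], [off_diagonal, sigma_2 ** 2]]


def bivariate_normal_logpdf(x, mean, cov):
    """Log-PDF of a two-dimensional Gaussian with the given covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    chi2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * chi2 - math.log(2.0 * math.pi) - 0.5 * math.log(det)


cov = covariance_2d(sigma_1=0.3, sigma_2=0.2, rho=0.5)
logp_at_mean = bivariate_normal_logpdf([4.2, 1.0], [4.2, 1.0], cov)
```

A correlation coefficient of zero reduces the log-PDF to the sum of two independent univariate Gaussian log-PDFs.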
Just as in the univariate case, multivariate constraints can also be defined by directly specifying the values of the PDF on a grid of points. This works for arbitrary dimensions. In YAML, a two-dimensional example would be
The same caveats as in the univariate case apply.