API Reference¶
Quick Summary¶
These are the user facing classes and methods:
|
Produce sweights for a dataset given component pdfs. |
|
Produce weights using COWs. |
|
Helper function to convert a RooFit::RooAbsPdf object into a python callable that can be used by either the sweight or cow classes |
SWeighter¶
- class sweights.sweight(data, pdfs=[], yields=[], discvarranges=None, method='summation', compnames=None, alphas=None, rfobs=[], verbose=True, checks=True)¶
Produce sweights for a dataset given component pdfs.
Initialize sweight object.
This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function getWeight.
- Parameters
data (ndarray) – The dataset in the discriminating variable, should have shape (nevents,ndiscvars) so for one discriminating variable will just be (N,) or (N,1).
pdfs (list of callable) – A list of the component pdfs. These should be simple python callable functions which are passed the same number of parameters as there are discriminating variables. If running with the ‘roofit’ method (see method) then this can be a list of ROOT.RooAbsPdf objects. If you only have your pdfs defined as RooFit objects then you can use the wrapper function convertRooAbsPdf to convert them into the appropriate object type for this call and then use other methods.
yields (list of float) – A list of the component yields.
discvarranges (list of tuple or tuple of tuple, optional) – The ranges for each discriminating variable, as a tuple or list of two element tuples specifying the lower and upper bound for the range (the default is None which will use minus to plus infinity)
method (str, optional) – The sweights method to use. Must be one of ‘summation’, ‘integration’, ‘subhess’, ‘tsplot’, ‘rootfit’. The recommended default is summation corresponding to Variant B of arXiv:2112.04574
compnames (list of str, optional) – A list of the component names. Only used for the legend entries when making a plot of the weight functions with makeWeightPlot
alphas (ndarray, optional) – If using the ‘subhess’ method then the covariance matrix of a fit to the disciminanting variable(s) in which only the yields float must also be passed. This matrix is inverted to produce the W-matrix.
rfobs (list, optional) – If using the ‘roofit’ method then the discriminating variables (of type RooRealVar) on which the pdfs depend must also be passed
verbose (bool, optional) – Print output
checks (bool, optional) – Perform some checks that the derived weights are self consistent, in particular check that the sum of all component weights for a given value of the discriminating variable(s) is unity, and check that the sum of all weights for a given component reproduces the yields. This will print additional output to the screen and issue warnings if checks are not passed.
Notes
Many of the arguments passed and methods implemented are not strictly necessary. In particular the ‘tsplot’ and ‘roofit’ methods are just wrappers for implementations elsewhere that were originally used as cross-checks. In future versions these two methods should be removed to simplify this call.
See also
- computeWMatrix()¶
Compute the W matrix
- getWeight(icomp=0, *args)¶
Get the weights for a given component and set of discriminating variable values
- Parameters
icomp (int, optional) – Get the weight function for the ith component
*args (ndarray) – The values of the discriminating variables (one argument for each) as numpy arrays
- Returns
An array of the weights
- Return type
ndarray
- makeWeightPlot(axis=None, dopts=['r', 'b', 'g', 'm', 'c', 'y'], labels=None)¶
Make a plot of the weight functions
- Parameters
axis (optional) – matplotlib axis to draw will default to use plt.gca()
dopts (list of str) – List of colors for the different components
labels (list of str) – List of legend labels. Default will use those passed in compnames with __init__
- nll(pars)¶
Define NLL function to minimise if ‘refit’ method is used
- printChecks()¶
Print checks
- runRooFit()¶
Run the RooFit specific method
- runTSPlot()¶
Run the TSPlot specific method
- solveAlphas()¶
Compute the A (or alpha) matrix
Normally this is done by simply inverting the Wkl matrix
COWs¶
- class sweights.cow(mrange, gs, gb, Im=1, obs=None, renorm=True, verbose=True)¶
Produce weights using COWs.
Initialize cow object.
This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function getWeight.
- Parameters
mrange (tuple) – A two element tuple for the integration range in the discriminant variable
gs (callable) – The function for the signal pdf (numerator) must accept a single argument in this case the discriminant variable
gb (callable or list of callable) – A function or list of functions for the backgrond pdfs (numerator) which must each accept a single argument in this case the discriminant variable
Im (int or callable, optional) – The function for the “variance function” or I(m) (denominator) which must accept a single argument in this case the discriminant variable. Can also pass 1 for a uniform variance function (the default)
obs (tuple of ndarray, optional) – You can instead pass the observed distribution to evaluate Im instead. This expects the entries and bin edges in a two element tuple like the return value of np.histogram
renorm (bool, optional) – Renormalise passed functions to unity (you can override this if you already know it’s true)
Notes
For more details see arXiv:2112.04574
See also
- compWkl()¶
Compute the W matrix
- normalise(f)¶
Return a normalised pdf
- Parameters
f (callable) – Input function
- Returns
Normalised output function
- Return type
callable
- wk(k, m)¶
Return the weights
- Parameters
k (int) – Index of the component
m (ndarray) – Values of the discriminating variable to compute weights for
- Returns
Values of the weights
- Return type
ndarray
Covariance Correction¶
- sweights.cov_correct(hs, gxs, hdata, gdata, weights, Nxs, fvals, fcov, dw_dW_fs, Wvals, verbose=False)¶
Perform a second order covariance correction for a fit to weighted data
- Parameters
hs (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters
gxs (list of callable) – A list of the disciminant variable pdfs which must take a single argument (x) where x is the observable (shape parameters of gxs must be known in this case)
hdata (ndarray) – The data values of the control variable observable at which the hs pdf is evaluated
gdata (ndarray) – The data values of the disciminant variable observable at which the gxs pdfs are evaluated
weights (ndarray) – The values of the weights
Nxs (list of float or tuple of float) – A list of the fitted yield values for the components gxs
fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn
fcov (ndarray) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)
dw_dW_fs (list of callable) – A list of the functions describing the partial derivate of the weight function with respect to the W matrix elements (see the tests/example.py tutorial to see this passed for sweights and COWs)
Wvals (list of float or tuple of float) – A list of the W matrix elements
verbose (bool, optional) – Print some output
- Returns
The corrected covariance matrix
- Return type
ndarray
Notes
This function corresponds to the weighted score function with sweights (both terms of Eq.51 in arXiv:2112.04574). If the shape parameters of the gxs are not known then the full sandwich estimate must be used which is not yet implemented in this package.
- sweights.approx_cov_correct(pdf, data, wts, fvals, fcov, verbose=False)¶
Perform a first order covariance correction for a fit to weighted data
- Parameters
pdf (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters
data (ndarray) – The data values of the observable at which the pdf is evaluated
wts (ndarray) – The values of the weights
fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn
fcov (ndarray) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)
verbose (bool, optional) – Print some output
- Returns
The corrected covariance matrix
- Return type
ndarray
Notes
This function corresponds to the weighted score function with independent weights (first term of Eq.51 in arXiv:2112.04574). This is often a good approximation for sweights although will be an overestimate. For a better correction for sweights use cov_correct (which is only in general accurate when the shape parameters in the discriminating variable are known)
See also
Utilities¶
- sweights.kendall_tau(x, y)¶
Return kendall tau correlation coefficient.
Useful for ascertainting the extent to which two samples are independent. In particular for sweights and COWs one wants to know the extent to which the discriminant and control variables factorise.
- Parameters
x (ndarray) – Values in the first dimension - must have the same shape as y
y (ndarray) – Values in the second dimension - must have the same shape as x
- Returns
Two element tuple with the coefficient and its uncertainty
- Return type
tuple
Notes
x and y must have the same dimension. This function now uses scipy.stats.kendalltau for the coefficent calcualtion (the uncertainty calculation is trivial) which makes a few optimisations. See the scipy documentation for more information.
- sweights.convertRooAbsPdf(pdf, obs, npoints=400, forcenorm=False)¶
Helper function to convert a RooFit::RooAbsPdf object into a python callable that can be used by either the sweight or cow classes
- Parameters
pdf (RooAbsPdf) – The pdf, must inherit from RooAbsPdf (e.g. RooGaussian, RooExponential, RooAddPdf etc.)
obs (RooRealVar) – The observable, must inherit from RooRealVarLValue but will usually be a RooRealVar
npoints (int, optional) – The number of points to use for the interpolation
forcenorm (bool, optional) – Force the return function to be normalised by performing a numerical integration of it (the function should in most cases be normalised properly anyway so this shouldn’t be needed)
- Returns
A callable function representing a normalised pdf which can then be passed to the sweight or cow classes
- Return type
callable