Public Types | Public Member Functions | Static Public Attributes | Protected Member Functions | Static Protected Member Functions | Protected Attributes | Private Member Functions | Private Attributes | Static Private Attributes | List of all members
ana::BayesianMarginal Class Reference

The result of marginalizing over a set of Bayesian MCMC samples down to a few dimensions. More...

#include "/cvmfs/nova-development.opensciencegrid.org/novasoft/releases/N20-11-28/CAFAna/Fit/BayesianMarginal.h"

Inheritance diagram for ana::BayesianMarginal:
ana::Bayesian1DMarginal ana::BayesianSurface

Public Types

enum  MarginalMode { MarginalMode::kHistogram, MarginalMode::kLLWgtdHistogram, MarginalMode::kKNN }
 How to construct marginal distributions. More...
 

Public Member Functions

 BayesianMarginal (const MCMCSamples &samples, const std::vector< IFitVarOrISyst > &orderedNames, MarginalMode mode=MarginalMode::kHistogram)
 
double QuantileThreshold (Quantile quantile, const TH1 *pdf=nullptr) const
 
double QuantileThreshold (double quantile, const MCMCSamples &samples, const TH1 *pdf=nullptr) const
 
std::vector< const ISyst * > Systs () const
 
std::vector< const IFitVar * > Vars () const
 

Static Public Attributes

static const std::vector< std::pair< Quantile, double > > QuantileUpVals
 

Protected Member Functions

 BayesianMarginal ()
 
double EstimateLLFromKNN (const std::vector< float > &vals) const
 
void SaveTo (TDirectory *dir) const
 
std::unique_ptr< TH1 > ToHistogram (const std::vector< Binning > &bins) const
 

Static Protected Member Functions

static double ThresholdFromTH1 (const TH1 *pdf, double quantile)
 Determine the threshold for the 'histogram' mode from the given histogram with a brute force method. More...
 
static void LoadInto (BayesianMarginal *marg, TDirectory *dir)
 

Protected Attributes

MarginalMode fMode
 What mode are we using? More...
 
const MCMCSamplesfMCMCSamples
 Ref to the MCMCSamples. Will not be persisted. More...
 
std::vector< std::stringfOrderedBrNames
 the variables/systs to be marginalized over, in axis order (x-axis, y-axis, ...) More...
 

Private Member Functions

void SetupKNN (TMVA::DataLoader *loader=nullptr)
 

Private Attributes

std::map< Quantile, MCMCSamplefQuantileSampleMap
 The samples that are the boundaries for various commonly-used quantiles. More...
 
std::unique_ptr< TMVA::MethodKNN > fkNN
 the kNN used for approximating the surface from the points when in kNN mode More...
 
std::unique_ptr< TMVA::DataLoader > fdataLoader
 Needed for kNN. More...
 
std::vector< const IFitVar * > fVars
 Variables we want to marginalize to (see also fSysts) More...
 
std::vector< const ISyst * > fSysts
 Systs we want to marginalize to (see also fVars) More...
 

Static Private Attributes

static const std::string sKNNName = "bayes_marginal_knn"
 

Detailed Description

The result of marginalizing over a set of Bayesian MCMC samples down to a few dimensions.

Definition at line 41 of file BayesianMarginal.h.

Member Enumeration Documentation

How to construct marginal distributions.

Enumerator
kHistogram 
kLLWgtdHistogram 
kKNN 

Definition at line 45 of file BayesianMarginal.h.

46  {
47  kHistogram,
48  kLLWgtdHistogram,
49  kKNN,
50  };

Constructor & Destructor Documentation

ana::BayesianMarginal::BayesianMarginal ( const MCMCSamples samples,
const std::vector< IFitVarOrISyst > &  orderedNames,
MarginalMode  mode = MarginalMode::kHistogram 
)

Definition at line 39 of file BayesianMarginal.cxx.

References ana::assert(), om::cerr, om::cout, allTimeWatchdog::endl, fdataLoader, fkNN, fMode, fOrderedBrNames, fQuantileSampleMap, fSysts, fVars, MECModelEnuComparisons::i, kKNN, ana::MCMCSamples::NumSamples(), ana::MCMCSamples::QuantileLL(), QuantileUpVals, ana::MCMCSamples::Sample(), SetupKNN(), sKNNName, ana::MCMCSamples::Systs(), ana::MCMCSamples::ToTTree(), PandAna.Demos.pi0_spectra::transform, PandAna.Demos.tute_pid_validation::var, and ana::MCMCSamples::Vars().

42  : fMode(mode), fMCMCSamples(&samples)
43  {
44  for (const auto & registerable : toFit)
45  {
46  if(registerable.ifitvar){
47  fVars.push_back(registerable.ifitvar);
48  fOrderedBrNames.push_back(registerable.ifitvar->ShortName());
49  }
50  else if (registerable.isyst){
51  fSysts.push_back(registerable.isyst);
52  fOrderedBrNames.push_back(registerable.isyst->ShortName());
53  }
54  else{
55  assert(false && "BayesianMarginal can only marginalize over IFitVars or ISysts");
56  }
57  }
58 
59  if (mode == MarginalMode::kKNN && samples.NumSamples() < 100)
60  {
61  // the kNN requires at least 100 points
62  std::cerr << "BayesianMarginal requires at least 100 MCMC samples to create a marginal distribution from them." << std::endl;
63  abort();
64  }
65 
66  for (const auto & var : fVars)
67  {
68  if (std::find(samples.Vars().begin(), samples.Vars().end(), var) == samples.Vars().end())
69  {
70  std::cerr << "Error: Requesting to marginalize over IFitVar not in MCMCSamples: " << var->ShortName() << std::endl;
71  abort();
72  }
73  }
74  for (const auto & syst : fSysts)
75  {
76  if (std::find(samples.Systs().begin(), samples.Systs().end(), syst) == samples.Systs().end())
77  {
78  std::cerr << "Error: Requesting to marginalize over ISyst not in MCMCSamples: " << syst->ShortName() << std::endl;
79  abort();
80  }
81  }
82 
84  {
85  // load the data & train...
86  // Use TMVA to build an k-nearest-neightbor estimator for the LL surface
87  std::cout << " Building a kNN from the MCMC points to estimate marginals..." << std::flush;
88  bool oldSilent = TMVA::gConfig().IsSilent();
89  TMVA::gConfig().SetSilent(true);
90  fdataLoader = std::make_unique<TMVA::DataLoader>(sKNNName.c_str());
91 
92  // we'd rather not mess around with the original tree,
93  // but passing a tree to AddRegressionTree() implicitly
94  // gives the kNN builder permission to modify the tree
95  // because it needs to be able to SetBranchAddress() etc.
96  // unfortunately, there is no 'const' method for cloning a tree
97  // because the clone comes back 'connected' (with all the same
98  // branch addresses etc.) as the original.
99  // so, we use some const_cast<>s just long enough to make a copy
100  // and then disconnect it in the next line.
101  // (that's what the second line is doing, despite the confusing method name,
102  // and the fact that it looks like it's called on the wrong object;
103  // see https://root.cern/doc/master/classTTree.html#a67af1e2908bc275c03268f84f1f2b282)
104  // stuff into a unique_ptr<> so taht this copy disappears as soon as training is over.
105  auto trainTree = std::unique_ptr<TTree>(const_cast<TTree *>(samples.ToTTree())->CloneTree());
106  const_cast<TTree *>(samples.ToTTree())->CopyAddresses(trainTree.get(), true);
107  const_cast<TTree *>(samples.ToTTree())->GetListOfClones()->Remove(trainTree.get());
108 
109  fdataLoader->AddRegressionTree(trainTree.get());
110  SetupKNN(fdataLoader.get());
111 
112  fkNN->Train();
113  TMVA::gConfig().SetSilent(oldSilent);
114  std::cout << " ... done." << std::endl;
115  }
116 
117 
118  // this is bit painful, but we want to only sort the LLs once in case there are many
119  std::vector<double> qVals;
121  std::back_inserter(qVals),
122  [](const std::pair<Quantile, double>& p){return p.second;});
123  // this is a map from quantile -> {idx, LL}
124  std::map<double, std::pair<std::size_t, double>> vals = samples.QuantileLL(qVals);
125 
126  for (std::size_t i = 0; i < QuantileUpVals.size(); ++i)
127  fQuantileSampleMap.emplace(QuantileUpVals[i].first, samples.Sample(vals[QuantileUpVals[i].second].first));
128  }
std::vector< const ISyst * > fSysts
Systs we want to marginalize to (see also fVars)
std::map< Quantile, MCMCSample > fQuantileSampleMap
The samples that are the boundaries for various commonly-used quantiles.
void SetupKNN(TMVA::DataLoader *loader=nullptr)
const char * p
Definition: xmltok.h:285
std::unique_ptr< TMVA::MethodKNN > fkNN
the kNN used for approximating the surface from the points when in kNN mode
OStream cerr
Definition: OStream.cxx:7
std::vector< std::string > fOrderedBrNames
the variables/systs to be marginalized over, in axis order (x-axis, y-axis, ...)
static const std::vector< std::pair< Quantile, double > > QuantileUpVals
std::unique_ptr< TMVA::DataLoader > fdataLoader
Needed for kNN.
OStream cout
Definition: OStream.cxx:6
MarginalMode fMode
What mode are we using?
const MCMCSamples * fMCMCSamples
Ref to the MCMCSamples. Will not be persisted.
static const std::string sKNNName
assert(nhit_max >=nhit_nbins)
std::vector< const IFitVar * > fVars
Variables we want to marginalize to (see also fSysts)
ana::BayesianMarginal::BayesianMarginal ( )
protected

Definition at line 131 of file BayesianMarginal.cxx.

133  {}
MarginalMode fMode
What mode are we using?
const MCMCSamples * fMCMCSamples
Ref to the MCMCSamples. Will not be persisted.

Member Function Documentation

double ana::BayesianMarginal::EstimateLLFromKNN ( const std::vector< float > &  vals) const
protected

Definition at line 136 of file BayesianMarginal.cxx.

References ana::assert(), fkNN, fMode, and kKNN.

Referenced by ana::BayesianSurface::BuildHist(), ana::Bayesian1DMarginal::QuantileRanges(), and ana::Bayesian1DMarginal::ToTH1().

137  {
139 
140  assert(vals.size() == fkNN->DataInfo().GetNVariables());
141 
142  TMVA::Event ev(vals, fkNN->DataInfo().GetNVariables());
143  return dynamic_cast<TMVA::MethodBase*>(fkNN.get())->GetRegressionValues(&ev)[0];
144  }
std::unique_ptr< TMVA::MethodKNN > fkNN
the kNN used for approximating the surface from the points when in kNN mode
MarginalMode fMode
What mode are we using?
assert(nhit_max >=nhit_nbins)
void ana::BayesianMarginal::LoadInto ( BayesianMarginal marg,
TDirectory *  dir 
)
staticprotected

Note that the signature here is different from usual. This version expects to be passed a derived class instance created by a derived LoadFrom() as its argument...

Definition at line 147 of file BayesianMarginal.cxx.

References fdataLoader, fkNN, fMode, fQuantileSampleMap, fSysts, fVars, kKNN, submit_nova_art::mode, SetupKNN(), sKNNName, systNames, Systs(), compareCafs::tree, and Vars().

Referenced by ana::BayesianSurface::LoadFrom().

148  {
149  auto varNames = dynamic_cast<TList*>(dir->Get("vars"));
150  for (const auto & obj : *varNames)
151  marg->fVars.push_back(Registry<IFitVar>::ShortNameToPtr(obj->GetName()));
152  auto systNames = dynamic_cast<TList*>(dir->Get("systs"));
153  for (const auto & obj : *systNames)
154  marg->fSysts.push_back(Registry<ISyst>::ShortNameToPtr(obj->GetName()));
155 
156  auto mode = dynamic_cast<TParameter<int>*>(dir->Get("mode"));
157  marg->fMode = MarginalMode(mode->GetVal());
158 
159  if (marg->fMode == MarginalMode::kKNN)
160  {
161  // again we're forced to make a temporary file (see SaveTo()). c'est la vie...
162  marg->fdataLoader = std::make_unique<TMVA::DataLoader>(sKNNName.c_str());
163  marg->SetupKNN(marg->fdataLoader.get());
164  auto tree = dynamic_cast<TTree *>(dir->Get("kNN_weights"));
165  tree->SetDirectory(nullptr);
166  TFile tmpFile("tmp_surface.root", "recreate");
167  tmpFile.cd();
168  tree->Write("knn");
169  marg->fkNN->ReadWeightsFromStream(tmpFile);
170  tmpFile.Close();
171  gSystem->Unlink("tmp_surface.root");
172  }
173 
174  auto quantMap = dynamic_cast<TMap*>(dir->Get("quantileSamples"));
175  for (const auto & quantPair : *quantMap)
176  {
177  auto pair = dynamic_cast<TPair*>(quantPair);
178  auto quant = Quantile(dynamic_cast<TParameter<int>*>(pair->Key())->GetVal());
179  marg->fQuantileSampleMap.emplace(std::piecewise_construct,
180  std::forward_as_tuple(quant),
181  std::forward_as_tuple(*dynamic_cast<TMap*>(pair->Value()),
182  marg->Vars(), marg->Systs()));
183  }
184 
185  }
MarginalMode
How to construct marginal distributions.
Quantile
Type to specify quantile values, so that we don&#39;t run into floating-point matching issues...
static const std::string sKNNName
TDirectory * dir
Definition: macro.C:5
std::map< std::string, std::string > systNames
Definition: PlotUnfolding.C:32
static const T * ShortNameToPtr(const std::string &s, bool allowFail=false)
Definition: Registry.cxx:60
double ana::BayesianMarginal::QuantileThreshold ( ana::Quantile  quantile,
const TH1 *  pdf = nullptr 
) const

Retrieve the value above which the remaining points comprise the given quantile.

Parameters
quantileThe quantile value desired
pdfIf fMode==MarginalMode::kHistogram, the returned value depends on the histogram that results from the marginalization (see on return val, below). Thus the relevant histogram must be supplied for that mode. For the other modes it may be left as nullptr.
Returns
Interpretation of return value depends on the MarginalMode fMode:

Definition at line 188 of file BayesianMarginal.cxx.

References ana::assert(), srt_file_template::find_if, fMode, fQuantileSampleMap, kHistogram, quantile(), QuantileUpVals, moon_position_table_new3::second, and ThresholdFromTH1().

Referenced by ana::Bayesian1DMarginal::QuantileRanges(), and ana::BayesianSurface::QuantileSurface().

189  {
190  // in "histogram" mode, the output isn't weighted by LL;
191  // what's returned is simply the best-guess pdf
192  // approximated by the histogram itself.
193  // so instead of simply yielding the LL itself,
194  // we need to find the pdf value corresponding to the right breakpoint.
196  {
197  assert(pdf && "In 'histogram' marginal mode, must supply the histogram for which the quantile threshold is to be calculated");
199  QuantileUpVals.end(),
200  [quantile](const auto & pair) { return pair.first == quantile;})->second);
201  }
202 
203  return fQuantileSampleMap.at(quantile).LL();
204  }
std::map< Quantile, MCMCSample > fQuantileSampleMap
The samples that are the boundaries for various commonly-used quantiles.
static const std::vector< std::pair< Quantile, double > > QuantileUpVals
static double ThresholdFromTH1(const TH1 *pdf, double quantile)
Determine the threshold for the &#39;histogram&#39; mode from the given histogram with a brute force method...
unsigned int quantile(std::vector< std::vector< double >> v, float E, float hadEfrac)
Definition: Toy_analyses.C:480
MarginalMode fMode
What mode are we using?
assert(nhit_max >=nhit_nbins)
double ana::BayesianMarginal::QuantileThreshold ( double  quantile,
const MCMCSamples samples,
const TH1 *  pdf = nullptr 
) const

See other variant of QuantileThreshold() for discussion. This version computes the threshold for an arbitrary quantile, rather than the prestored ones, and so requires the original MCMCSamples.

Definition at line 207 of file BayesianMarginal.cxx.

References ana::assert(), fMode, kHistogram, ana::MCMCSamples::QuantileLL(), and ThresholdFromTH1().

208  {
209  if (quantile == -1)
210  quantile = 0;
211  assert(quantile >= 0 && quantile <= 1 && "QuantileSurface(): quantile must be between 0 and 1");
212 
213  auto quantilePair = samples.QuantileLL(quantile);
214 
216  return ThresholdFromTH1(pdf, quantile);
217  return quantilePair.second;
218  }
static double ThresholdFromTH1(const TH1 *pdf, double quantile)
Determine the threshold for the &#39;histogram&#39; mode from the given histogram with a brute force method...
unsigned int quantile(std::vector< std::vector< double >> v, float E, float hadEfrac)
Definition: Toy_analyses.C:480
MarginalMode fMode
What mode are we using?
assert(nhit_max >=nhit_nbins)
void ana::BayesianMarginal::SaveTo ( TDirectory *  dir) const
protected

Definition at line 221 of file BayesianMarginal.cxx.

References fkNN, fMode, fQuantileSampleMap, fSysts, fVars, kKNN, submit_nova_art::mode, systNames, compareCafs::tree, and PandAna.Demos.tute_pid_validation::var.

Referenced by ana::BayesianSurface::SaveTo().

222  {
223  auto oldDir = gDirectory;
224  dir->cd();
225 
226  TParameter<int> mode("mode", int(fMode));
227  mode.Write();
228 
229  if (fMode == MarginalMode::kKNN)
230  {
231  // the fact that the tree gets saved into the file with name 'knn' is an unadvertised implementation detail...
232  // but do they need to save it as its own TFile so badly?
233  // we want to save it into our own file to keep the bits from getting too spread out.
234  TFile tmpFile("tmp_surface.root", "recreate");
235  fkNN->WriteWeightsToStream(tmpFile);
236  auto tree = dynamic_cast<TTree *>(tmpFile.Get("knn"));
237  tree->SetDirectory(nullptr);
238  dir->cd();
239  tree->Write("kNN_weights");
240  tmpFile.Close();
241  gSystem->Unlink("tmp_surface.root");
242  }
243 
244  TMap quantMap;
245  for (const auto & quantPair : fQuantileSampleMap)
246  quantMap.Add(new TParameter<int>("", int(quantPair.first)), quantPair.second.ToTMap().release());
247  quantMap.Write("quantileSamples", TObject::kSingleKey);
248 
249  TList varNames;
250  for (const auto & var: fVars)
251  varNames.Add(new TObjString(var->ShortName().c_str()));
252  varNames.Write("vars", TObject::kSingleKey);
253 
254  TList systNames;
255  for (const auto & syst : fSysts)
256  systNames.Add(new TObjString(syst->ShortName().c_str()));
257  systNames.Write("systs", TObject::kSingleKey);
258 
259  if (oldDir)
260  oldDir->cd();
261  }
std::vector< const ISyst * > fSysts
Systs we want to marginalize to (see also fVars)
std::map< Quantile, MCMCSample > fQuantileSampleMap
The samples that are the boundaries for various commonly-used quantiles.
std::unique_ptr< TMVA::MethodKNN > fkNN
the kNN used for approximating the surface from the points when in kNN mode
MarginalMode fMode
What mode are we using?
TDirectory * dir
Definition: macro.C:5
std::map< std::string, std::string > systNames
Definition: PlotUnfolding.C:32
std::vector< const IFitVar * > fVars
Variables we want to marginalize to (see also fSysts)
void ana::BayesianMarginal::SetupKNN ( TMVA::DataLoader *  loader = nullptr)
private

Definition at line 264 of file BayesianMarginal.cxx.

References fkNN, fSysts, fVars, ana::MCMCSamples::LOGLIKELIHOOD_BRANCH_NAME, MECModelEnuComparisons::opt, sKNNName, and PandAna.Demos.tute_pid_validation::var.

Referenced by BayesianMarginal(), and LoadInto().

265  {
266  int entries = 0;
267  if (loader)
268  {
269  entries = loader->DataInput().GetEntries();
270  auto opt = Form("KNN,nkNN=%d", entries > 20 ? 20 : entries-1); // 20 neighbors is kNN default
271  fkNN = std::make_unique<TMVA::MethodKNN>(loader->GetDataSetInfo(), opt);
272  }
273  else
274  {
275  TMVA::DataSetInfo info(sKNNName.c_str());
276  fkNN = std::make_unique<TMVA::MethodKNN>(info, "KNN");
277  }
278 
279  auto & dsInfo = fkNN->DataInfo();
280  for (const auto & var : fVars)
281  dsInfo.AddVariable(var->ShortName());
282  for (const auto & syst : fSysts)
283  dsInfo.AddVariable(syst->ShortName());
284  dsInfo.AddTarget(MCMCSamples::LOGLIKELIHOOD_BRANCH_NAME, "", "", 0, 0);
285  if (loader)
286  dsInfo.SetSplitOptions(Form("nTrain_Regression=%d,nTest_Regression=0", entries));
287 
288  fkNN->SetAnalysisType( TMVA::Types::kRegression );
289  fkNN->SetupMethod();
290  fkNN->ParseOptions();
291  fkNN->ProcessSetup();
292  }
const XML_Char XML_Encoding * info
Definition: expat.h:530
std::vector< const ISyst * > fSysts
Systs we want to marginalize to (see also fVars)
std::unique_ptr< TMVA::MethodKNN > fkNN
the kNN used for approximating the surface from the points when in kNN mode
loader
Definition: demo0.py:10
static const std::string sKNNName
static const std::string LOGLIKELIHOOD_BRANCH_NAME
Definition: MCMCSamples.h:37
std::vector< const IFitVar * > fVars
Variables we want to marginalize to (see also fSysts)
std::vector<const ISyst*> ana::BayesianMarginal::Systs ( ) const
inline

Definition at line 76 of file BayesianMarginal.h.

Referenced by LoadInto(), and ana::Bayesian1DMarginal::ToTH1().

76 { return fSysts; }
std::vector< const ISyst * > fSysts
Systs we want to marginalize to (see also fVars)
double ana::BayesianMarginal::ThresholdFromTH1 ( const TH1 *  pdf,
double  quantile 
)
staticprotected

Determine the threshold for the 'histogram' mode from the given histogram with a brute force method.

Definition at line 294 of file BayesianMarginal.cxx.

References std::abs(), om::cerr, allTimeWatchdog::endl, compare_h5_caf::idx, it, RunSnowGlobes::pdf, quantile(), and Scale().

Referenced by QuantileThreshold().

295  {
296  if (pdf->Integral() <= 0)
297  return std::numeric_limits<double>::signaling_NaN();
298 
299  // easy cases
300  if (quantile == 0)
301  return 1.0;
302  else if (quantile == 1)
303  return 0.0;
304  else if (quantile < 0 || quantile > 1)
305  {
306  std::cerr << "BayesianMarginal::ThresholdFromTH1(): requested quantile = " << quantile << " is not between 0 and 1. Abort" << std::endl;
307  abort();
308  }
309 
310  // check if the histogram I was passed was already unit normalized.
311  // if not, make a copy that is
312  if (std::abs(pdf->Integral() - 1) > 0.00001)
313  {
314  pdf = dynamic_cast<TH1*>(pdf->Clone());
315  const_cast<TH1*>(pdf)->Scale(1./pdf->Integral());
316  }
317 
318  // first make a list of the fractional bin contents.
319  std::vector<double> contents(pdf->GetNcells());
320  double overflowC = 0;
321  double underflowC = 0;
322  for (int binIdx = 0; binIdx <= pdf->GetNcells(); binIdx++)
323  {
324  double binC = pdf->GetBinContent(binIdx);
325  if (binC == 0)
326  continue; // don't go inserting useless 0s into the vector to make it much longer and slower to sort
327 
328  if (pdf->IsBinOverflow(binIdx))
329  overflowC += binC;
330  else if (pdf->IsBinUnderflow(binIdx))
331  underflowC += binC;
332 
333  contents.push_back(binC);
334  }
335  if ((1 - quantile) < underflowC || (1 - quantile) < overflowC)
336  {
337  std::cerr << "Warning: Histogram underflow or overflow contains more than requested quantile (" << quantile << ") of events." << std::endl;
338  std::cerr << " Bayesian credible region will be incorrect." << std::endl;
339  }
340 
341  // then do a search to find the value above which the integral corresponds to the desired quantile.
342  // begin at the specified quantile index as a first guess
343 
344  std::sort(contents.begin(), contents.end());
345  std::size_t idx = (1 - quantile) * (contents.size() - 1);
346  auto it = contents.begin();
347  decltype(it) oldIt = it;
348  std::advance(it, idx);
349  double oldSum = std::accumulate(it, contents.end(), 0.0);
350  // i.e., are we including too much, so we need to shrink the interval by moving towards the end?
351  bool forward = oldSum > quantile;
352  while (it != contents.begin() && it != contents.end())
353  {
354  // if we're exactly right, then we're good
355  if (oldSum == quantile)
356  return *it;
357 
358  std::advance(it, forward ? +1 : -1);
359  double newSum = std::accumulate(it, contents.end(), 0.0);
360 
361  // if we just crossed over the quantile boundary, we're also done.
362  if ((newSum - quantile) / (oldSum - quantile) < 0)
363  break;
364 
365  oldSum = newSum;
366  oldIt = it;
367  }
368 
369  // the average is not unbiased (it depends on the steepness of the distribution near the breakpoint)
370  // but it's probably closer than the values on either side?
371  return (*it + *oldIt) / 2;
372  }
set< int >::iterator it
OStream cerr
Definition: OStream.cxx:7
float abs(float number)
Definition: d0nt_math.hpp:39
unsigned int quantile(std::vector< std::vector< double >> v, float E, float hadEfrac)
Definition: Toy_analyses.C:480
while(!feof(fp))
simulatedPeak Scale(1/simulationNormalisationFactor)
std::unique_ptr< TH1 > ana::BayesianMarginal::ToHistogram ( const std::vector< Binning > &  bins) const
protected

Definition at line 375 of file BayesianMarginal.cxx.

References ana::assert(), b, om::cerr, allTimeWatchdog::endl, stan::math::exp(), Fill(), fMCMCSamples, fMode, fOrderedBrNames, fSysts, fVars, std::isnan(), kHistogram, kLLWgtdHistogram, ana::MakeTH1D(), ana::MakeTH2D(), ana::MCMCSamples::NumSamples(), runNovaSAM::ret, ana::MCMCSamples::SampleLL(), ana::MCMCSamples::SampleValue(), ana::MCMCSamples::SampleValues(), string, ana::UniqueName(), and PandAna.Demos.tute_pid_validation::var.

Referenced by ana::BayesianSurface::BuildHist(), ana::Bayesian1DMarginal::QuantileRanges(), and ana::Bayesian1DMarginal::ToTH1().

376  {
377  // idea:
378  // * for mode==kHistogram:
379  // simply histogram the MCMC samples.
380  // * for mode==kLLWgtdHistogram:
381  // find the bin-averaged probability distribution
382  // by making the histogram weighted by exp(LL)
383  // and dividing it by the simple histogram of the events
385  assert(bins.size() == fOrderedBrNames.size());
386  assert(bins.size() > 0 && bins.size() < 3); // could port NOvA CAFAna 3D hist stuff if needed
387 
388  auto axisNameStr = std::accumulate(fOrderedBrNames.begin(), fOrderedBrNames.end(),
389  std::string(""),
390  [this](std::string s, const std::string & b)
391  {
392 
395  name = var->LatexName();
396  else if (auto syst = Registry<ISyst>::ShortNameToPtr(fOrderedBrNames[0], true))
397  name = syst->LatexName();
398  return s + ";" + name;
399  });
400  std::unique_ptr<TH1> ret;
401  if (bins.size() == 1)
402  ret.reset(MakeTH1D(UniqueName().c_str(), axisNameStr.c_str(), bins[0]));
403  else if (bins.size() == 2)
404  ret.reset(MakeTH2D(UniqueName().c_str(), axisNameStr.c_str(), bins[0], bins[1]));
405  // we haven't implemented MakeTH3D here. but won't worry about it for now
406  //else if (bins.size() == 3)
407  // ret.reset(MakeTH3D(UniqueName().c_str(), axisNameStr.c_str(), bins[0], bins[1], bins[2]));
408 
409  std::unique_ptr<TH1> denom;
410  if (bins.size() == 1)
411  denom = std::make_unique<TH1D>(*dynamic_cast<TH1D*>(ret.get()));
412  else if (bins.size() == 2)
413  denom = std::make_unique<TH2D>(*dynamic_cast<TH2D*>(ret.get()));
414  //else if (bins.size() == 3)
415  // denom = std::make_unique<TH3D>(*dynamic_cast<TH3D*>(ret.get()));
416  denom->SetName(UniqueName().c_str());
417 
418  // this could certainly be done less sloppily
419  // but I wanted to ensure we don't do lots of expensive lookups in std::maps
420  // inside the loop over samples, which could run into the millions of iterations
421  union VarOrSyst
422  {
423  const IFitVar * var;
424  const ISyst * syst;
425  };
426  std::vector<bool> isVar(fVars.size() + fSysts.size(), true);
427  std::vector<VarOrSyst> varsOrSysts(fVars.size() + fSysts.size());
428  for (std::size_t brIdx = 0; brIdx < fOrderedBrNames.size(); brIdx++)
429  {
430  const auto & brName = fOrderedBrNames[brIdx];
431  if (auto var = Registry<IFitVar>::ShortNameToPtr(brName, true))
432  varsOrSysts[brIdx].var = var;
433  else if (auto syst = Registry<ISyst>::ShortNameToPtr(brName, true))
434  {
435  isVar[brIdx] = false;
436  varsOrSysts[brIdx].syst = syst;
437  }
438  }
439  for (std::size_t sample = 0; sample < fMCMCSamples->NumSamples(); ++sample)
440  {
441  double logprob = fMCMCSamples->SampleLL(sample);
442  if (std::isnan(logprob))
443  std::cerr << "Warning: Encountered NaN log-probability in an MCMC sample. Other things will probably go wrong..." << std::endl;
444  std::map<const IFitVar*, double> varVals;
445  std::map<const ISyst*, double> systVals;
446  fMCMCSamples->SampleValues(sample, fVars, varVals, fSysts, systVals);
447  std::vector<double> vals;
448  for (std::size_t brIdx = 0; brIdx < varsOrSysts.size(); ++brIdx)
449  vals.push_back(
450  isVar[brIdx] ? fMCMCSamples->SampleValue(varsOrSysts[brIdx].var, sample)
451  : fMCMCSamples->SampleValue(varsOrSysts[brIdx].syst, sample));
452 
453  // could template this, but probably not worth the effort?
454  double numWgt = (fMode == MarginalMode::kHistogram) ? 1.0 : logprob;
455  if (bins.size() == 1)
456  ret->Fill(vals[0], numWgt);
457  else if (bins.size() == 2)
458  {
459  dynamic_cast<TH2*>(ret.get())->Fill(vals[0], vals[1], numWgt);
460  dynamic_cast<TH2*>(denom.get())->Fill(vals[0], vals[1]);
461  }
462  else if (bins.size() == 3)
463  {
464  dynamic_cast<TH3*>(ret.get())->Fill(vals[0], vals[1], vals[2], numWgt);
465  dynamic_cast<TH3*>(denom.get())->Fill(vals[0], vals[1], vals[2]);
466  }
467  }
468 
469  // now exponentiate the log-probs to get probabilities
471  {
472  ret->Divide(denom.get());
473  for (int binIdx = 0; binIdx < ret->GetNcells(); binIdx++)
474  {
475  // if it's *exactly* 0, this is a region where there were no samples altogether. just leave empty
476  if (ret->GetBinContent(binIdx) != 0.0)
477  ret->SetBinContent(binIdx, exp(ret->GetBinContent(binIdx)));
478  }
479  }
480  else if (fMode == MarginalMode::kHistogram)
481  {
482  if (ret->Integral() > 0)
483  ret->Scale(1. / ret->Integral());
484  }
485 
486  return ret;
487  }
const XML_Char * name
Definition: expat.h:151
std::vector< const ISyst * > fSysts
Systs we want to marginalize to (see also fVars)
std::size_t NumSamples() const
How many samples do we have?
Definition: MCMCSamples.h:135
TH1D * MakeTH1D(const char *name, const char *title, const Binning &bins)
Utility function to avoid need to switch on bins.IsSimple()
Definition: UtilsExt.cxx:74
OStream cerr
Definition: OStream.cxx:7
double SampleValue(const IFitVar *var, std::size_t idx) const
Get the value of Var var for sample number .
Definition: MCMCSamples.h:169
std::vector< std::string > fOrderedBrNames
the variables/systs to be marginalized over, in axis order (x-axis, y-axis, ...)
int isnan(const stan::math::var &a)
Definition: std_isnan.hpp:18
void SampleValues(std::size_t idx, const std::vector< const ana::IFitVar * > &vars, std::map< const ana::IFitVar *, double > &varVals) const
Get the values of FitVars vars for sample number idx.
const XML_Char * s
Definition: expat.h:262
correl_yv Fill(-(dy[iP-1][iC-1]), hyv->GetBinContent(iP, iC))
fvar< T > exp(const fvar< T > &x)
Definition: exp.hpp:10
MarginalMode fMode
What mode are we using?
TH2D * MakeTH2D(const char *name, const char *title, const Binning &binsx, const Binning &binsy)
Utility function to avoid 4-way combinatorial explosion on the bin types.
Definition: UtilsExt.cxx:86
double SampleLL(std::size_t idx) const
Get the LL for sample number .
Definition: MCMCSamples.h:162
const Binning bins
Definition: NumuCC_CPiBin.h:8
const MCMCSamples * fMCMCSamples
Ref to the MCMCSamples. Will not be persisted.
const hit & b
Definition: hits.cxx:21
assert(nhit_max >=nhit_nbins)
std::vector< const IFitVar * > fVars
Variables we want to marginalize to (see also fSysts)
std::string UniqueName()
Return a different string each time, for creating histograms.
Definition: Utilities.cxx:29
static const T * ShortNameToPtr(const std::string &s, bool allowFail=false)
Definition: Registry.cxx:60
enum BeamMode string
std::vector<const IFitVar*> ana::BayesianMarginal::Vars ( ) const
inline

Definition at line 77 of file BayesianMarginal.h.

References ana::bins, dir, RunSnowGlobes::pdf, quantile(), and ana::SaveTo().

Referenced by LoadInto(), and ana::Bayesian1DMarginal::ToTH1().

77 { return fVars; }
std::vector< const IFitVar * > fVars
Variables we want to marginalize to (see also fSysts)

Member Data Documentation

std::unique_ptr<TMVA::DataLoader> ana::BayesianMarginal::fdataLoader
mutableprivate

Needed for kNN.

Definition at line 113 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), and LoadInto().

std::unique_ptr<TMVA::MethodKNN> ana::BayesianMarginal::fkNN
mutableprivate

the kNN used for approximating the surface from the points when in kNN mode

Definition at line 111 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), EstimateLLFromKNN(), LoadInto(), SaveTo(), and SetupKNN().

const MCMCSamples* ana::BayesianMarginal::fMCMCSamples
protected

Ref to the MCMCSamples. Will not be persisted.

Definition at line 99 of file BayesianMarginal.h.

Referenced by ToHistogram().

MarginalMode ana::BayesianMarginal::fMode
protected
std::vector<std::string> ana::BayesianMarginal::fOrderedBrNames
protected

the variables/systs to be marginalized over, in axis order (x-axis, y-axis, ...)

Definition at line 102 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), and ToHistogram().

std::map<Quantile, MCMCSample> ana::BayesianMarginal::fQuantileSampleMap
private

The samples that are the boundaries for various commonly-used quantiles.

Definition at line 108 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), LoadInto(), QuantileThreshold(), and SaveTo().

std::vector<const ISyst*> ana::BayesianMarginal::fSysts
private

Systs we want to marginalize to (see also fVars)

Definition at line 118 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), LoadInto(), SaveTo(), SetupKNN(), and ToHistogram().

std::vector<const IFitVar*> ana::BayesianMarginal::fVars
private

Variables we want to marginalize to (see also fSysts)

Definition at line 116 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), LoadInto(), SaveTo(), SetupKNN(), and ToHistogram().

const std::vector< std::pair< Quantile, double > > ana::BayesianMarginal::QuantileUpVals
static
Initial value:

Definition at line 53 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), and QuantileThreshold().

const std::string ana::BayesianMarginal::sKNNName = "bayes_marginal_knn"
staticprivate

Definition at line 121 of file BayesianMarginal.h.

Referenced by BayesianMarginal(), LoadInto(), and SetupKNN().


The documentation for this class was generated from the following files: