Machine Learning algorithms for the Belle II experiment and their validation on Belle data

Thomas Keck (contact@tkeck.de)

KIT

SuperKEKB

SuperKEKB / Belle II

Belle II

First Collisions at Belle II (26.05.2018)

B Mesons

$$ \Upsilon(\mathrm{4S}) \rightarrow \mathrm{B} \bar{\mathrm{B}} $$

B Meson Decay

B Meson

Simple Analogy: Uranium

Machine Learning at Belle II

Data Acquisition

Event Reconstruction

Data Analysis

Full Event Interpretation

Idea

Each event contains exactly two B mesons
$$ \Upsilon(\mathrm{4S}) \rightarrow \mathrm{B} \bar{\mathrm{B}} $$

Idea

Automatically reconstruct one of the two B mesons
$$ \mathrm{B}^- \rightarrow \mathrm{D}^{*0} \mathrm{D}^- \mathrm{K}^0_\mathrm{S} \rightarrow \dots \rightarrow 2 \mathrm{K}\ 7 \pi\ 4 \gamma $$
Over 10000 possible chains are considered!

Idea

The rest of the event is the other B meson!
$$ \mathrm{B}^+ \rightarrow \tau^+ \nu \rightarrow \mu^+ \nu \nu \nu $$
Rare decay, only accessible at a B factory!

Hierarchical Reconstruction

One BDT for each decay channel $\rightarrow$ currently 177 BDTs
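The hierarchical scheme can be sketched as a toy, not the Belle II implementation. Each stage combines daughter candidates from earlier stages, rejects combinations that use the same track or cluster twice, and assigns every new candidate a signal probability from its channel's classifier. The `Candidate` structure, the channel tables, and `score` are hypothetical; `score` merely multiplies daughter probabilities as a stand-in for a trained per-channel BDT.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Candidate:
    """A reconstructed particle candidate (hypothetical structure)."""
    name: str
    objects: frozenset  # IDs of the tracks/clusters used by this candidate
    probability: float  # signal probability from its channel's classifier


def score(channel, daughters):
    """Stand-in for the per-channel BDT selected by `channel` (unused here).

    A real classifier would also use kinematic features; this toy only
    multiplies the daughters' signal probabilities.
    """
    return prod(d.probability for d in daughters)


def reconstruct_stage(candidates, channels):
    """Build mother candidates for every decay channel of one stage."""
    result = dict(candidates)
    for mother, channel_list in channels.items():
        built = []
        for daughter_names in channel_list:
            pools = [candidates.get(n, []) for n in daughter_names]
            for combo in product(*pools):
                used = [c.objects for c in combo]
                merged = frozenset().union(*used)
                # Reject combinations that use the same track/cluster twice.
                if len(merged) != sum(len(u) for u in used):
                    continue
                channel = f"{mother} -> {' '.join(daughter_names)}"
                built.append(Candidate(mother, merged, score(channel, combo)))
        result[mother] = result.get(mother, []) + built
    return result


# Toy final-state particles; object IDs and probabilities are invented.
final_state = {
    "K-": [Candidate("K-", frozenset({1}), 0.9)],
    "pi+": [Candidate("pi+", frozenset({2}), 0.8),
            Candidate("pi+", frozenset({3}), 0.5)],
    "pi-": [Candidate("pi-", frozenset({4}), 0.7)],
}
# Stage 1: D0 -> K- pi+; stage 2: B- -> D0 pi-.
stage1 = reconstruct_stage(final_state, {"D0": [("K-", "pi+")]})
stage2 = reconstruct_stage(stage1, {"B-": [("D0", "pi-")]})
```

Feeding each stage's output into the next is what makes the reconstruction hierarchical: the classifiers of later stages see the probabilities of earlier ones.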

Simple Analogy: Image Recognition

Pixels
Objects

Simple Analogy: Image Recognition

Data
Decays
  • Probability 0.8: tracks (1, 4, 6, ...) and clusters (3, 5, 10, ...)
    $ \mathrm{B}^- \rightarrow \mathrm{D}^{*0} \mathrm{D}^- \mathrm{K}^0_\mathrm{S} \rightarrow \dots \rightarrow 2 \mathrm{K}\ 7 \pi\ 4 \gamma $
  • Probability 0.6: tracks (2, 4, 7, ...) and clusters (4, 5, 6, ...)
    $ \mathrm{B}^- \rightarrow \mathrm{D^0} \pi^+ \pi^- \pi^- \rightarrow \dots \rightarrow 1 \mathrm{K}\ 4 \pi\ 2 \gamma $
  • $\dots$
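Given candidates ranked by probability, one simple way to choose the tag-side B and collect the rest of the event is sketched below. The `TagCandidate` structure, the helper names, and all object IDs are hypothetical toy values, not taken from the slide. Everything not used by the selected candidate is attributed to the other B meson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagCandidate:
    """Hypothetical tag-side B candidate: probability plus used objects."""
    decay: str
    probability: float
    tracks: frozenset
    clusters: frozenset


def best_candidate(candidates):
    """Select the candidate with the highest signal probability."""
    return max(candidates, key=lambda c: c.probability)


def rest_of_event(all_tracks, all_clusters, tag):
    """Tracks and clusters not used by the tag-side B belong to the other B."""
    return set(all_tracks) - tag.tracks, set(all_clusters) - tag.clusters


# Toy event with illustrative IDs only.
candidates = [
    TagCandidate("B- -> D*0 D- K0S", 0.8, frozenset({1, 2, 3}), frozenset({20, 21})),
    TagCandidate("B- -> D0 pi+ pi- pi-", 0.6, frozenset({2, 4, 5}), frozenset({21, 22})),
]
tag = best_candidate(candidates)
roe_tracks, roe_clusters = rest_of_event(range(1, 7), range(20, 24), tag)
```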

Result

Usable efficiency is significantly improved, and it is faster in direct comparison!

Runtime: FastBDT

Inference

Custom BDT implementation: fast, robust, HEP-specific features
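A common way to make BDT inference fast is to store each fixed-depth tree as flat arrays and walk it with index arithmetic instead of pointers. The sketch below assumes such a layout (node `i` has children `2i+1` and `2i+2`, features already binned to integers); `FlatTree` and `forest_score` are hypothetical names, and this illustrates the idea rather than FastBDT's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class FlatTree:
    """Complete binary tree of fixed depth stored in flat arrays (hypothetical layout)."""
    depth: int
    features: list  # feature index tested at each inner node (2**depth - 1 entries)
    cuts: list      # integer bin threshold at each inner node
    leaves: list    # output value of each leaf (2**depth entries)

    def evaluate(self, binned_event):
        node = 0
        for _ in range(self.depth):
            # Go right if the binned feature is at or above the cut, left otherwise.
            go_right = binned_event[self.features[node]] >= self.cuts[node]
            node = 2 * node + (2 if go_right else 1)
        # Inner nodes occupy indices 0 .. 2**depth - 2; the leaves follow.
        return self.leaves[node - (2 ** self.depth - 1)]


def forest_score(trees, binned_event, shrinkage=0.1):
    """Sum of shrunk tree outputs, as in gradient boosting."""
    return sum(shrinkage * t.evaluate(binned_event) for t in trees)


# Toy depth-2 tree with invented cuts and leaf values.
tree = FlatTree(depth=2, features=[0, 1, 1], cuts=[5, 3, 7],
                leaves=[-1.0, -0.5, 0.5, 1.0])
score = forest_score([tree], [6, 8])
```

Because every event takes exactly `depth` comparisons and the arrays are contiguous, the loop is short and predictable, which is one reason binned, fixed-depth trees evaluate quickly.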

Runtime: FastBDT

Fitting

Custom BDT implementation is (originally) one order of magnitude faster
$\rightarrow$ efficient caching and equal-frequency binning
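Equal-frequency binning can be sketched as follows: the inner bin edges are quantiles of the training sample, so each bin holds roughly the same number of events and every feature value becomes a small integer. A cut search then needs only one histogram pass per feature, after which all candidate cuts are scored with running sums. The function names and the Gini-like separation gain are assumptions for illustration, not FastBDT's exact criterion.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Inner bin edges at quantiles, so each bin holds ~len(values)/n_bins entries."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_bin(value, edges):
    """Map a raw feature value to an integer bin index in 0 .. len(edges)."""
    return bisect_right(edges, value)


def best_cut(bins, labels, weights, n_bins):
    """Best threshold (events with bin >= cut go right) from histograms only."""
    sig = [0.0] * n_bins
    bkg = [0.0] * n_bins
    for b, y, w in zip(bins, labels, weights):
        (sig if y == 1 else bkg)[b] += w
    total_s, total_b = sum(sig), sum(bkg)

    def impurity(s, b):
        # Gini-like impurity weighted by the node's total weight.
        return s * b / (s + b) if s + b > 0 else 0.0

    best, best_gain = None, 0.0
    left_s = left_b = 0.0
    for cut in range(1, n_bins):
        left_s += sig[cut - 1]
        left_b += bkg[cut - 1]
        gain = (impurity(total_s, total_b) - impurity(left_s, left_b)
                - impurity(total_s - left_s, total_b - left_b))
        if gain > best_gain:
            best, best_gain = cut, gain
    return best, best_gain


values = [i ** 3 for i in range(100)]  # strongly skewed toy feature
labels = [1 if i >= 60 else 0 for i in range(100)]
edges = equal_frequency_edges(values, 10)
bins = [to_bin(v, edges) for v in values]
counts = [bins.count(k) for k in range(10)]
cut, gain = best_cut(bins, labels, [1.0] * len(values), 10)
```

For this skewed toy feature, equal-width bins would put most entries into the first bin, whereas the quantile edges give ten bins of ten entries each.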

Deep Learning in HEP

Data Driven Algorithms

  1. Train classifier on simulated events
  2. Apply classifier to recorded events
Simulated and recorded events are NOT compatible

Data Driven Algorithms

  1. Train classifier A: recorded vs. simulated events
  2. Apply classifier A to simulated events
  3. Reweight simulated events with $ w = \frac{p}{1-p} $, where $p$ is the output of classifier A
  4. Train classifier B: simulated signal vs. background events
  5. Apply classifier B to recorded events
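The reweighting steps can be sketched in plain Python. As assumptions for illustration, classifier A is a one-feature logistic regression fitted with Newton's method (any probabilistic classifier could take its place), and the toy "recorded" and "simulated" samples are Gaussians with shifted means. With equally sized samples, $p/(1-p)$ estimates the ratio of the recorded and simulated densities, so the weighted simulation should reproduce the recorded mean.

```python
import math
import random


def fit_logistic(xs, ys, iterations=10):
    """Fit p(y=1|x) = sigmoid(a + b*x) with Newton's method (classifier A)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1.0 - p)      # entries of the negative Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


random.seed(42)
recorded = [random.gauss(0.5, 1.0) for _ in range(5000)]   # label 1
simulated = [random.gauss(0.0, 1.0) for _ in range(5000)]  # label 0

# Step 1: classifier A separates recorded from simulated events.
a, b = fit_logistic(recorded + simulated, [1] * len(recorded) + [0] * len(simulated))

# Steps 2-3: apply A to simulated events and reweight with w = p / (1 - p).
# For the logistic model p / (1 - p) = exp(a + b*x), computed directly.
weights = [math.exp(a + b * x) for x in simulated]
weighted_mean = sum(w * x for w, x in zip(weights, simulated)) / sum(weights)
```

If the two samples differ in size, the weights also carry the constant ratio of sample sizes, which normalizing the weights removes. Classifier B is then trained on the weighted simulated events.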

Uniformity Constraints

  1. Train classifier using low-level features
  2. Apply classifier to increase signal-to-noise ratio
  3. Measure the resonance sub-structure
Plots: resonance substructure, selection efficiency, feature correlation matrix

Non-uniform selection efficiency creates artificial signals!
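A toy illustration of the problem: background that is flat in the invariant mass acquires a bump if the selection is correlated with the mass. Below, a hypothetical feature is constructed to peak near a mass of 1.5, a cut on it plays the role of the classifier, and the efficiency is measured per mass bin. All distributions and numbers are invented. Multiplying the flat background by the non-uniform efficiency yields a peaked distribution where no resonance exists.

```python
import math
import random


def efficiency_per_bin(masses, passed, lo, hi, n_bins):
    """Fraction of events passing the selection in each mass bin."""
    total = [0] * n_bins
    selected = [0] * n_bins
    width = (hi - lo) / n_bins
    for m, ok in zip(masses, passed):
        b = min(int((m - lo) / width), n_bins - 1)
        total[b] += 1
        selected[b] += ok
    return [s / t if t else 0.0 for s, t in zip(selected, total)]


random.seed(7)
# Flat background in a hypothetical mass window [1.0, 2.0].
masses = [random.uniform(1.0, 2.0) for _ in range(50000)]

# Feature correlated with the mass near 1.5: cutting on it mimics a
# classifier that has learned the mass (hypothetical toy model).
correlated = [random.gauss(math.exp(-((m - 1.5) / 0.1) ** 2), 0.3) for m in masses]
uncorrelated = [random.gauss(0.0, 0.3) for _ in masses]

eff_corr = efficiency_per_bin(masses, [f > 0.3 for f in correlated], 1.0, 2.0, 10)
eff_flat = efficiency_per_bin(masses, [f > 0.3 for f in uncorrelated], 1.0, 2.0, 10)
```

The uncorrelated cut keeps the efficiency flat across the window, while the correlated one is several times higher around 1.5 than at the edges.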

Adversarial Networks

G. Louppe, M. Kagan, K. Cranmer, Learning to Pivot with Adversarial Networks (NIPS 2017)

Adversarial Networks
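The adversarial training of Louppe, Kagan and Cranmer can be written as a min-max problem. Here the classifier $f$ has parameters $\theta_f$ and loss $\mathcal{L}_f$. The adversary $r$, with parameters $\theta_r$ and loss $\mathcal{L}_r$, tries to predict from the output of $f$ the quantity the classifier should be independent of, for example the resonance mass. The hyperparameter $\lambda$ sets the trade-off between classification performance and independence:

$$ \hat{\theta}_f, \hat{\theta}_r = \arg \min_{\theta_f} \max_{\theta_r} E_\lambda(\theta_f, \theta_r), \qquad E_\lambda(\theta_f, \theta_r) = \mathcal{L}_f(\theta_f) - \lambda \, \mathcal{L}_r(\theta_f, \theta_r) $$

A classifier that leaks mass information lets the adversary reduce $\mathcal{L}_r$, which increases $E_\lambda$, so minimizing over $\theta_f$ pushes the classifier output towards independence from the mass.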

Questions and Answers

References

FastBDT

Efficient caching (spatial, temporal)
Equal-frequency binning

Runtime: FastFit

Simple vertex fitter based on Eigen is one order of magnitude faster
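A vertex fit finds the common point the tracks originate from. As a heavily simplified stand-in (not FastFit's method, and without Eigen), the sketch below treats tracks as straight lines, ignoring their curvature in the magnetic field. It finds the point with the smallest summed squared distance to all lines, which reduces to a 3x3 linear system solved with Cramer's rule. The function names are hypothetical.

```python
import math


def fit_vertex(tracks):
    """Least-squares point closest to a set of straight-line tracks.

    Each track is (point, direction). Minimises sum_i |P_i (x - a_i)|^2 with
    the projector P_i = I - d_i d_i^T, which gives (sum_i P_i) x = sum_i P_i a_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for point, direction in tracks:
        norm = math.sqrt(sum(c * c for c in direction))
        d = [c / norm for c in direction]
        for r in range(3):
            for c in range(3):
                p_rc = (1.0 if r == c else 0.0) - d[r] * d[c]
                A[r][c] += p_rc
                rhs[r] += p_rc * point[c]
    return solve3(A, rhs)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve A x = b with Cramer's rule (adequate for a 3x3 system)."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("tracks are parallel: vertex is not constrained")
    solution = []
    for k in range(3):
        # Replace column k of A with b.
        Ak = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]
        solution.append(det3(Ak) / d)
    return solution


# Three toy tracks that all pass through (1, 2, 3).
tracks = [((-1.0, 1.0, 2.0), (2.0, 1.0, 1.0)),
          ((1.0, 0.0, 3.0), (0.0, 1.0, 0.0)),
          ((4.0, 5.0, 0.0), (1.0, 1.0, -1.0))]
vertex = fit_vertex(tracks)
```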

Neuro Z Trigger

  • Virtex 6 vhx380 FPGA on Custom Board
  • Identifies tracks originating in the interaction region
  • 280 ns per decision
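Conceptually, the decision can be pictured as a small neural network that estimates the z position of a track's origin, followed by a window cut. The sketch below uses one hidden tanh layer. The input features, layer sizes, weights, and the 10 cm window are all hypothetical placeholders, and the real trigger runs on the FPGA, not in Python.

```python
import math


def mlp_z_estimate(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a linear output: estimated z (cm)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def from_interaction_region(inputs, params, z_window=10.0):
    """Accept the track if |z| lies within the (hypothetical) window."""
    return abs(mlp_z_estimate(inputs, *params)) < z_window


# Toy network with a single input and hidden node: z = 30 * tanh(x).
params = ([[1.0]], [0.0], [30.0], 0.0)
```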

Traditional Flavour Tagger

Deep Flavour Tagger

Superior to the previous traditional algorithm out of the box

SuperKEKB