# General purpose analysis software for SIDIS at the EIC
This repository provides a set of common tools for the analysis of both full and fast simulations, including the following features:
- kinematics reconstruction methods (e.g., leptonic, hadronic, Jacquet-Blondel)
- calculation of SIDIS variables, such as `PhiH` and `qT`, for single particles, as well as jet variables
- application of a common set of cuts
- ability to specify arbitrary multi-dimensional binning schemes
- outputs include binned histograms, tables, and other data structures such as `TTree`s
- an analysis is primarily driven by macros, used to set up the binning and other settings
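For orientation, the standard definitions behind two of these reconstruction methods are sketched below; these are the textbook formulas, not an excerpt of this repository's code, so consult the reconstruction classes under `src/` for the exact conventions used here:

```math
\begin{aligned}
\text{leptonic:} \quad & Q^2 = -(k-k')^2, \qquad
  y = \frac{P\cdot q}{P\cdot k}, \qquad
  x = \frac{Q^2}{2\,P\cdot q} \\
\text{Jacquet-Blondel:} \quad & y_{JB} = \frac{\sum_h \left(E_h - p_{z,h}\right)}{2E_e}, \qquad
  Q^2_{JB} = \frac{\left(\sum_h \vec p_{T,h}\right)^2}{1-y_{JB}}, \qquad
  x_{JB} = \frac{Q^2_{JB}}{s\,y_{JB}}
\end{aligned}
```

where `k` and `k'` are the incoming and scattered lepton momenta, `P` is the ion beam momentum, `q = k - k'`, the sums run over final-state hadrons, `E_e` is the lepton beam energy, and `s` is the squared center-of-mass energy.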
If you prefer to use your own analysis code, but would still like to make use of the common tools provided in this repository (e.g., kinematics reconstruction), this is also possible; you only need to stream the data structure you need, most likely within the event loop. In this situation, it is recommended you fork the repository (pull requests are also welcome).
## Setup

- To minimize setup effort, and to provide a consistent development environment, a Singularity image is available, which contains all the dependencies pre-built, as well as sample ROOT files
  - First run `container/install.sh` to download and build the Singularity image
    - with no arguments, a usage guide will be printed
    - the default image file location is `container/img/`
    - note that the image size is about 2 GB
    - images are hosted on Docker Hub (the Docker image is hosted, but Singularity can pull it too)
  - Then run `container/shell.sh` to start a shell in the container
    - this will automatically call `source env.sh` upon shell startup, which sets environment variables
  - Proceed with the Building section below (just type `make`)
- Alternatively, if you prefer to use Docker:
  - obtain the image with `docker pull cjdilks/largex-eic:dev`
  - start the container using a standard `docker run` command; you can also use the script `container/devscripts/dockerShell.sh`, if you find it useful
  - the Docker image was built assuming a default user ID (UID) of 1000; if your UID is different (check with the `id` command), your user name in the container may be `I have no name!`, but you should still have read/write permission for the present working directory; we have not tested working in this condition, due to our preference for Singularity, but suggestions for improvement are welcome
    - Dockerfiles are also provided; you can follow `container/devscripts/README.md` for instructions on how to build your own image (which would allow you to change the default UID, or anything else you want)
  - once you are in the Docker container, proceed with the Building section below
- The other option is to manually set up your environment, by downloading and/or building all of the necessary dependencies
- Once you have all the dependencies, proceed with the Building section below
## Dependencies

- ROOT: version 6.24.02 or later is preferred
- Delphes:
  - the analysis code is capable of reading `delphes` fast simulation output, and also provides a simple wrapper for `delphes` to help keep input `hepmc` and output `root` files organized
    - it is not required to use the `delphes` wrapper, but `delphes` libraries are needed for the analysis of fast simulation data
  - first, make sure you have a build of `delphes` somewhere, preferably in a separate directory
  - set environment variables before doing anything, so this repository knows where your `delphes` build is: `source env.sh /path/to/delphes/repository`
    - if you do not specify a path to the `delphes` repository, a default path given in `env.sh` will be used; it is useful to edit this default path for your own convenience
    - this will also symlink `delphes` external code, so analysis macros will not complain
## Building

- First make sure environment variables are set by calling `source env.sh`
- Build the analysis code with `make`
  - this requires a `root` build as well as `delphes` (see above)
  - all classes are found in the `src/` directory
- If you are ready to try the software hands-on, follow the tutorials in the `tutorial/` directory
## Delphes Fast Simulation

- for convenience, the wrapper script `exeDelphes.sh` is provided, which runs `delphes` on a given `hepmc` or `hepmc.gz` file, and sets the output file names and the appropriate configuration card
  - configuration cards are stored in the `cards/` directory as a submodule
    - clone this `largex-eic` repository with `--recurse-submodules`, or if you have already cloned without submodules, execute `git submodule update --init` to obtain them
  - the environment must be set first (`source env.sh`)
  - run `exeDelphes.sh` with no arguments for a usage guide
  - in the script, you may need to change `exeDelphes` to the proper executable, e.g., `DelphesHepMC2` or `DelphesHepMC3`, depending on the format of your generator input
  - if reading a gunzipped file (`*.hepmc.gz`), this script will automatically stream it through `gunzip`, so there is no need to decompress beforehand
- the output will be a `TTree` stored in a `root` file
  - output files will be placed in `datarec/`
  - input `hepmc(.gz)` files can be kept in `datagen/`
## Full Simulation

- full simulation files are stored on S3; follow the s3tools documentation for scripts and guidance
- in general, everything that can be done in fast simulation can also be done in full simulation; just replace your usage of `AnalysisDelphes` with `AnalysisDD4hep`
  - in practice, the implementations may sometimes be slightly out of sync, where some features that exist in fast simulation do not exist in full simulation, or vice versa
- TODO: more details
## Analysis Procedure

After simulation, this repository separates the analysis procedure into two stages: (1) the Analysis stage includes the event loop, which processes either fast or full simulation output, the kinematics reconstruction, and your specified binning scheme, while (2) the Post-processing stage includes histogram drawing, comparisons, table printouts, and any other feature you would like to add.

The two stages are driven by macros. Example macros will eventually be added; for now, you can assume that any macro named `analysis_*.C` or `postprocess_*.C` is a macro for the corresponding stage.

- Note: most macros stored in this repository must be executed from the `largex-eic` top directory, not from within their subdirectory, e.g., run `root -b -q tutorial/analysis_template.C`; this is because certain library and data directory paths are given as relative paths
- the `Analysis` class is the main class that performs the analysis; it is controlled at the macro level
  - a typical analysis macro must do the following (a sketch is given after this list):
    - instantiate `Analysis` (with file names, beam energies, crossing angle)
    - set up bin schemes and bins (arbitrary specification, see below)
    - set any other settings (e.g., a maximum number of events to process, useful for quick tests)
    - execute the analysis
    - see `src/Analysis.h` for further documentation in comments
  - the output will be a `root` file, filled with `TObjArray`s of histograms
    - each `TObjArray` can be for a different subset of events (bin), e.g., different minimum `y` cuts, so that their histograms can be compared and divided; you can open the `root` file in a `TBrowser` to browse the histograms
    - the `Histos` class is a container for the histograms, and instances of `Histos` will also be streamed to `root` files, along with the binning scheme (handled by the `BinSet` class); downstream post-processing code makes use of these streamed objects, rather than the `TObjArray`s
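The following is a minimal sketch of such a macro; the constructor arguments, switch names, and binning methods are assumptions used only to illustrate the flow, so check `src/Analysis.h` and the example macros for the actual interface:

```cpp
// analysis_example.C -- illustrative sketch only; the constructor signature,
// switch names, and binning methods shown here are assumptions, not the
// verified interface (see src/Analysis.h and the example macros)
void analysis_example() {

  // instantiate the analysis for fast simulation input: file name, beam
  // energies, crossing angle, and an output file prefix (all hypothetical)
  AnalysisDelphes *A = new AnalysisDelphes(
      "datarec/example.root", // delphes output TTree (hypothetical file name)
      10, 100,                // electron and ion beam energies [GeV]
      -25,                    // crossing angle [mrad]
      "example"               // output file prefix
      );

  // other settings, e.g., limit the number of events for a quick test
  A->maxEvents = 10000; // switch name is an assumption

  // set up a simple binning scheme (see the binning documentation below)
  A->AddBinScheme("q2"); A->BinScheme("q2")->BuildBin("Min", 1.0);         // Q2 > 1 GeV2
  A->AddBinScheme("x");  A->BinScheme("x")->BuildBins(3, 0.01, 1.0, true); // 3 log-width x bins

  // run the event loop, kinematics reconstruction, and histogram filling
  A->Execute();
}
```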
- The bins may be specified arbitrarily, using the `BinSet` and `CutDef` classes (a sketch is given after this list)
  - see the example `analysis_*.C` macros
  - `CutDef` can store and apply an arbitrary cut for a single variable, such as:
    - ranges: `a<x<b` or `|x-a|<b`
    - minimum or maximum: `x>a` or `x<a`
    - no cut (useful for "full" bins)
  - the set of bins for a variable is defined by `BinSet`
    - these bins can be defined arbitrarily, with the help of the `CutDef` class; you can either:
      - automatically define a set of bins, e.g., `N` bins between `a` and `b`, with
        - equal width in linear scale
        - equal width in log scale (useful for `x` and `Q2`)
        - any custom `TAxis`
      - manually define each bin
        - example: specific bins in `z` and `pT`:
          - `|z-0.3|<0.1` and `|pT-0.2|<0.05`
          - `|z-0.7|<0.1` and `|pT-0.5|<0.05`
        - example: 3 different `y` minima:
          - `y>0.05`
          - `y>0.03`
          - `y>0` (no cut)
        - note that the arbitrary specification permits bins to overlap, e.g., an event with `y=0.1` will appear in all three of these bins
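Here is a hedged sketch of how bins like those above might be declared in an analysis macro; the method names and cut-type strings are assumptions, so consult the example `analysis_*.C` macros and the `BinSet`/`CutDef` headers in `src/` for the actual calls:

```cpp
// illustrative only: bin definitions inside an analysis macro, assuming an
// Analysis instance "A"; method names and cut-type strings are assumptions
A->AddBinScheme("x");
A->BinScheme("x")->BuildBins(4, 0.01, 1.0, true);     // 4 automatic bins, equal width in log scale

A->AddBinScheme("z");
A->BinScheme("z")->BuildBin("CenterDelta", 0.3, 0.1); // manual bin: |z-0.3|<0.1
A->BinScheme("z")->BuildBin("CenterDelta", 0.7, 0.1); // manual bin: |z-0.7|<0.1

A->AddBinScheme("y");
A->BinScheme("y")->BuildBin("Min", 0.05);             // y > 0.05
A->BinScheme("y")->BuildBin("Min", 0.03);             // y > 0.03 (bins may overlap)
A->BinScheme("y")->BuildBin("Full");                  // no cut

// defining several bin schemes at once (e.g., x, Q2, z) produces the
// multi-dimensional binning described below
```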
- Multi-dimensional binning
  - binning in multiple dimensions is allowed, e.g., 3D binning in `x`, `Q2`, `z`
  - see the Adage documentation for more information on how multi-dimensional binning is handled, as well as for the Adage syntax reference
  - be careful of the curse of dimensionality
    - you can restrict the binning in certain dimensions by taking only the diagonal elements of a matrix of bins (see the `diagonal` settings in `src/Analysis.h`)
- The `Analysis` class is capable of producing a simple `TTree`, handled by the `SimpleTree` class, which can also be useful for analysis
  - as the name suggests, it is a flat tree with a minimal set of variables, specifically those needed for asymmetry analysis
  - the tree branches are configured to be compatible with asymmetry analysis code, built on the BruFit framework
  - there is a switch in `Analysis` to enable/disable whether this tree is written
## Post-Processing

- results processing is handled by the `PostProcessor` class, which performs tasks such as printing tables of average values and drawing ratios of histograms
  - this class is steered by `postprocess_*.C` macros, which do the following (a sketch is given after this list):
    - instantiate `Analysis`, needed for bin loops and settings
    - instantiate `PostProcessor`, with the specified `root` file that contains output from the analysis macro
    - loop over bins and perform actions
  - see `src/PostProcessor.h` and `src/PostProcessor.cxx` for the available post-processing routines; you are welcome to add your own
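The overall shape of such a macro is sketched below; the constructor arguments and method names are placeholders (assumptions), so see `src/PostProcessor.h` and the example `postprocess_*.C` macros for what is actually available:

```cpp
// postprocess_example.C -- illustrative sketch only; constructor arguments and
// method names are assumptions (see src/PostProcessor.h for the real interface)
void postprocess_example() {

  // the root file produced by the analysis macro (hypothetical name)
  TString infile = "out/example.root";

  // an Analysis instance may also be instantiated here, for access to the
  // bin loops and settings, as described in the list above

  // instantiate PostProcessor with the analysis-stage output; it provides
  // access to the streamed Histos objects and the binning scheme
  PostProcessor *P = new PostProcessor(infile);

  // loop over bins and perform actions, e.g., print tables of average values
  // or draw ratios of histograms; the available routines are listed in
  // src/PostProcessor.h

  // finalize and write the output (tables, images, etc.)
  P->Finish(); // method name is an assumption
}
```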
## Asymmetry

- the `SimpleTree` output is compatible with asymmetry code, included here as a submodule in `asym/`
  - clone this `largex-eic` repository with `--recurse-submodules`, to get `largex-eic-asym` and its main dependency `brufit`
  - follow `asym/README.md`
## Contributions

- This repository is in an early stage of development, so bugs and issues are likely
- Contributions are welcome via pull requests and issue reports; you may also find it useful to fork the repository for your own purposes, so that you do not feel limited by the existing code (you can still send pull requests from a fork)
- Continuous Integration (CI) will trigger on pull requests, which will build and test your contribution; see the `Actions` tab for the workflow details
- It is recommended to keep up to date with developments by browsing the pull requests and issues, and by viewing the latest commits: go to the `Insights` tab and click `Network` to show the branch topology