Reporting Tool

After a training run has completed, the reporting tool in report_utils.py generates a concise experiment report with aggregated metrics and metadata. It parses the TFEvent files written during training (e.g. by a tf.summary.SummaryWriter) and simulates early stopping on the logged metrics. The logic for parsing TFEvent files directly is implemented in tfevent_utils.py.

Main Features

Complementary: This tool is not meant to replace the TensorBoard interface for visualizing the time series stored in TFEvents files. Rather, it provides a reliable way to create summarized reports (a snapshot of the metrics at the simulated early stopping step) that can easily be loaded into tables or dataframes for comparing experiments at scale.
Lightweight: Gathers raw metrics directly from TFEvents files written to a directory during training.
Fast, scalable analysis: Reports are usually generated at the end of training, so loading existing reports is fast and no aggregation needs to be performed at analysis time. This makes it practical to load thousands of reports and analyze them together.
Flexible: Can be used for different numeric metrics and machine learning tasks (e.g. translation, image recognition).
Customizable: Add more fields to the report to save the information you need for experiment analysis.
All reports in one place: Creating small aggregated report files and saving them to a centralized directory makes it easier to document and find experiments and to compare them against all previously run experiments. The tool provides a function to load all reports saved to a directory.
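The save and load functions of report_utils are not documented in this README. As a rough illustration of the "all reports in one place" idea, here is a stdlib-only sketch that writes each report as a JSON file into a shared directory and loads them all back. MiniReport, save_report and load_all_reports are hypothetical names, and MiniReport is a reduced stand-in for report_utils.ExperimentReport, not its real definition.

```python
import dataclasses
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class MiniReport:
    """Hypothetical, reduced stand-in for report_utils.ExperimentReport."""
    experiment_name: str
    early_stop_step: int
    metrics: Dict[str, Any]
    user_name: Optional[str] = None


def save_report(report: MiniReport, report_dir: str) -> Path:
    """Writes one report as <experiment_name>.json into a shared directory."""
    path = Path(report_dir) / f"{report.experiment_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dataclasses.asdict(report), sort_keys=True))
    return path


def load_all_reports(report_dir: str) -> List[Dict[str, Any]]:
    """Loads every *.json report in report_dir, ordered by file name."""
    return [json.loads(p.read_text())
            for p in sorted(Path(report_dir).glob("*.json"))]
```

Because no aggregation happens at load time, the resulting list of dictionaries can be passed straight to pandas.DataFrame to compare many experiments side by side.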

Smoothing, Simulated Early Stopping and Aggregated Metrics

The reporting tool performs the following steps to extract the aggregated metrics:

Apply smoothing as specified to the early stopping metric, e.g. eval loss
Find early stopping step on the smoothed series, e.g. where eval loss is minimized
Extract aggregated smoothed and unsmoothed values for all metrics at the early stopping step
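A minimal plain-Python sketch of these three steps. The triangular kernel here (weights falling linearly from 1 at the current step to 0 at half the window size) is an assumption about what SmoothingKernel.TRIANGULAR does, as is treating start_step as inclusive and looking up the closest logged step. All helper names are hypothetical and not part of report_utils.

```python
from typing import Dict, List, Tuple

# A series is a list of (step, value) pairs sorted by step.
Series = List[Tuple[int, float]]


def triangular_smooth(series: Series, window_size_in_steps: int) -> Series:
    """Assumed triangular kernel: weight = 1 - |step distance| / (window / 2)."""
    if window_size_in_steps <= 0:
        raise ValueError("window_size_in_steps must be positive")
    half = window_size_in_steps / 2
    smoothed = []
    for step, _ in series:
        num = den = 0.0
        for other_step, value in series:
            weight = 1.0 - abs(other_step - step) / half
            if weight > 0:  # events outside the window get no weight
                num += weight * value
                den += weight
        smoothed.append((step, num / den))  # den >= 1: the step itself has weight 1
    return smoothed


def early_stop_step(series: Series, start_step: int = 0, minimize: bool = True) -> int:
    """Step at or after start_step where the series is minimized (or maximized)."""
    candidates = [(value, step) for step, value in series if step >= start_step]
    if not candidates:
        raise ValueError("no events at or after start_step")
    best = min(candidates) if minimize else max(candidates)
    return best[1]


def value_at(series: Series, step: int) -> float:
    """Value of the event closest to step (other series may log at other steps)."""
    return min(series, key=lambda event: abs(event[0] - step))[1]


def snapshot(all_series: Dict[str, Series], step: int,
             window_size_in_steps: int) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Smoothed and unsmoothed values of every metric at the early stopping step."""
    smoothed = {tag: value_at(triangular_smooth(s, window_size_in_steps), step)
                for tag, s in all_series.items()}
    unsmoothed = {tag: value_at(s, step) for tag, s in all_series.items()}
    return smoothed, unsmoothed


def events_in_window(window_size_in_steps: int, eval_freq: int) -> int:
    """Evaluation steps at most window/2 away from a central evaluation step."""
    return 2 * (window_size_in_steps // 2 // eval_freq) + 1
```

With this centered-window reading, events_in_window(6000, 1087) evaluates to 5, consistent with the "roughly 5 events" estimate for the smoothing configuration used in Example A.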

Handling of NaN values

In some cases, an experiment contains NaN (not a number) values in its event series, but the user may still want to produce a report for it by stopping early on a step where the early stopping metric is not NaN.

Smoothing is not well-defined for series containing NaN values. Because of that, we decided not to apply smoothing to experiments that contain NaN values in any of their metrics.

Before computing the early stopping step and aggregating the metrics, the tool checks for NaNs in all event series. If any NaNs are found, no smoothing will be applied for early stopping and metric aggregation. Only unsmoothed metrics will be included in the report.
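The report's first_nan_step field suggests that the tool records where NaNs first appear. The sketch below shows one way to implement the check described above: find the first NaN step across all series, disable smoothing if there is one, and stop early only on non-NaN events. The function names are hypothetical, and whether report_utils uses exactly this logic is an assumption.

```python
import math
from typing import Dict, List, Optional, Tuple

Series = List[Tuple[int, float]]  # (step, value) pairs sorted by step


def first_nan_step(all_series: Dict[str, Series]) -> Optional[int]:
    """Earliest step at which any series holds a NaN, or None if none does."""
    nan_steps = [step for series in all_series.values()
                 for step, value in series if math.isnan(value)]
    return min(nan_steps) if nan_steps else None


def should_smooth(all_series: Dict[str, Series]) -> bool:
    """Smoothing is skipped entirely if any metric contains a NaN."""
    return first_nan_step(all_series) is None


def nan_safe_early_stop_step(series: Series, start_step: int = 0) -> int:
    """Minimizes the unsmoothed series, skipping NaN events."""
    candidates = [(value, step) for step, value in series
                  if step >= start_step and not math.isnan(value)]
    if not candidates:
        raise ValueError("no non-NaN events at or after start_step")
    return min(candidates)[1]
```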

Usage

Ensure that your TFEvent files are saved in a directory in one of the following structures:

A: Subdirectories for each dataset

├── <model_dir>
│   ├── <train_ds_dir>
│   │   ├── events.out.tfevents.[...]
│   │   ├── events.out.tfevents.[...]
│   │   ├── (Any number of tfevent files is ok, as long as they are generated by a single run)
│   ├── <eval_ds1_dir>
│   │   ├── events.out.tfevents.[...]
│   ├── <eval_ds2_dir>
│   │   ├── events.out.tfevents.[...]
│   ├── <eval_ds3_dir>
│   │   ├── events.out.tfevents.[...]
│   ├── ...
│   └── other files or dirs (non-TFEvent files are ignored)

B: All TFEvents files stored in top-level directory

├── <model_dir>
│   ├── events.out.tfevents.[...]
│   ├── events.out.tfevents.[...]
│   ├── events.out.tfevents.[...]
│   ├── ...
│   └── other files or dirs (non-TFEvent files are ignored)
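To check which of the two layouts a model_dir follows, a small stdlib sketch can group the TFEvent files by their directory relative to model_dir: layout A yields one key per dataset subdirectory, layout B yields the single key '.'. find_event_files is a hypothetical helper, and matching on the events.out.tfevents. file name prefix is an assumption based on the file names shown above.

```python
import os
from collections import defaultdict
from typing import Dict, List


def find_event_files(model_dir: str) -> Dict[str, List[str]]:
    """Groups TFEvent files by directory relative to model_dir.

    Layout A gives keys such as 'train_ds' or 'eval_ds1'; layout B gives '.'.
    Files whose names do not start with 'events.out.tfevents.' are ignored.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for root, _, files in os.walk(model_dir):
        for name in sorted(files):
            if name.startswith("events.out.tfevents."):
                rel_dir = os.path.relpath(root, model_dir)
                groups[rel_dir].append(os.path.join(root, name))
    return dict(groups)
```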

Report Creation

After training has completed, call report_utils.create_end_of_training_report(). The function returns an instance of the report_utils.ExperimentReport dataclass. Please refer to the docstrings for details on the arguments.

Early stopping: You can specify which attribute/tag, which aggregation function and, optionally, which subdirectory to use for determining the early stopping step. You can also specify the step after which early stopping is considered. This is useful when you modify your model or data during training and only want to consider the model after the modification (e.g. quantization).

In Example A, the call simulates early stopping at the step where the loss (early_stop_attr='loss') in the eval_ds1 subdirectory (early_stop_ds_dir='eval_ds1') is minimized (early_stop_agg=report_utils.MinOrMax.MIN), considering only steps after 20000 (start_step=20000).
Tags to include: When you use the TensorFlow summary writer, you can write values under different tags, such as loss. Use the tags_to_include arg to specify which tags to include in your report.

In the example, the accuracy, loss and l2_loss values at the early stopping step on eval_ds1 (early_stop_ds_dir), train_ds and eval_ds2 (other_ds_dirs) are included in the report. If a tag does not exist in a specified ds_dir, it won't show up in the report under that ds_dir. In the example, only train_ds has l2_loss.
Smoothing: Optionally, you can smooth the metrics, which can help when evaluation is done on subsets of a dataset and the values are noisy. To configure smoothing, specify smoothing_kernel and window_size_in_steps, which defines the size of the smoothing window.

In the example, we use a triangular smoothing kernel (smoothing_kernel=report_utils.SmoothingKernel.TRIANGULAR) and a window size of 6000 steps (window_size_in_steps=6000), which should include roughly 5 evaluation events given an evaluation frequency of one evaluation every 1087 steps (eval_freq=1087).
Additional experiment details: Optionally, you can include experiment_name, user_name, launch_time and tensorboard_id in your report.
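The per-ds_dir tag filtering described under "Tags to include" can be illustrated with a small sketch. select_tags is a hypothetical helper, and its input (a mapping from each ds_dir to the tag values found at the early stopping step) is an assumed intermediate, not the actual data structure used inside report_utils.

```python
from typing import Dict, List


def select_tags(values_by_dir: Dict[str, Dict[str, float]],
                ds_dirs: List[str],
                tags_to_include: List[str]) -> Dict[str, Dict[str, float]]:
    """Keeps only the requested ds_dirs and tags; missing tags are omitted."""
    selected: Dict[str, Dict[str, float]] = {}
    for ds_dir in ds_dirs:
        available = values_by_dir.get(ds_dir, {})
        # A tag that was never logged for this ds_dir simply does not appear.
        selected[ds_dir] = {tag: available[tag]
                            for tag in tags_to_include if tag in available}
    return selected
```

With tags_to_include=['accuracy', 'loss', 'l2_loss'], a ds_dir that only logged accuracy and loss ends up with just those two entries, which matches how l2_loss appears only under train_ds in Example A.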

Example A: If your TFEvent files are stored in subdirectories corresponding to different datasets

import report_utils  # Adjust this import to wherever report_utils.py lives in your project.

report = report_utils.create_end_of_training_report_oss(
    model_dir='/lisa/test_experiment',
    early_stop_ds_dir='eval_ds1',
    early_stop_attr='loss',
    early_stop_agg=report_utils.MinOrMax.MIN,
    other_ds_dirs=['train_ds', 'eval_ds2'],
    tags_to_include=['accuracy', 'loss', 'l2_loss'],
    smoothing_kernel=report_utils.SmoothingKernel.TRIANGULAR,
    eval_freq=1087,
    window_size_in_steps=6000,
    start_step=20000,
    num_train_steps=200000,
    experiment_name='test_experiment',
    user_name='lisa',
    launch_time='20210216T071237',
    tensorboard_id='<tensorboard_dev_url>',
)

The resulting report (shown as a dictionary) would look like this:

{
 'early_stop_step': 179355,
 'eval_freq': 1087,
 'experiment_name': 'test_experiment',
 'first_nan_step': None,
 'launch_time': '20210216T071237',
 'model_dir': '/lisa/test_experiment',
 'num_train_steps': 200000,
 'report_query_args': {'early_stop_agg': 'MIN',
                       'early_stop_attr': 'loss',
                       'early_stop_ds_dir': 'eval_ds1',
                       'other_ds_dirs': ['train_ds',
                                         'eval_ds2'],
                       'smoothing_kernel': 'TRIANGULAR',
                       'start_step': 20000,
                       'tags_to_include': ['accuracy',
                                           'loss',
                                           'l2_loss'],
                       'window_size_in_steps': 6000},
 'tensorboard_id': '8773780292699030478',
 'metrics': {'eval_ds1': {'accuracy': 0.680,
                          'loss': 1.387},
             'eval_ds2': {'accuracy': 0.576,
                          'loss': 1.691},
             'train_ds': {'accuracy': 0.624,
                          'loss': 1.591,
                          'l2_loss': 5.125}},
 'unsmoothed_metrics': {'eval_ds1': {'accuracy': 0.675,
                                     'loss': 1.390},
                        'eval_ds2': {'accuracy': 0.573,
                                     'loss': 1.695},
                        'train_ds': {'accuracy': 0.625,
                                     'loss': 1.590,
                                     'l2_loss': 5.122}},
 'user_name': 'lisa'}
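To load the nested metrics and unsmoothed_metrics dictionaries of Example A into a flat table, one option is to join the ds_dir and the tag into a single column name. flatten_metrics is a hypothetical helper, not part of report_utils, and the '/' separator is an arbitrary choice. Flat dictionaries, such as those in Example B, pass through unchanged.

```python
from typing import Any, Dict


def flatten_metrics(metrics: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turns {'eval_ds1': {'loss': 1.387}} into {'eval_ds1/loss': 1.387}."""
    flat: Dict[str, Any] = {}
    for key, value in metrics.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into per-ds_dir dictionaries, extending the column name.
            flat.update(flatten_metrics(value, name))
        else:
            flat[name] = value
    return flat
```

Passing prefix='unsmoothed' for the unsmoothed_metrics field keeps its columns distinct from the smoothed ones when both are placed in the same row.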

Example B: If your TFEvent files are stored in the top-level model_dir, without subdirectories

report = report_utils.create_end_of_training_report_oss(
    model_dir='/lisa/test_experiment',
    early_stop_attr='eval1_loss',
    early_stop_agg=report_utils.MinOrMax.MIN,
    tags_to_include=['train_loss', 'train_acc', 'eval1_loss', 'eval1_acc'],
    smoothing_kernel=report_utils.SmoothingKernel.TRIANGULAR,
    eval_freq=1087,
    window_size_in_steps=6000,
    start_step=20000,
    num_train_steps=200000,
    experiment_name='test_experiment',
    user_name='lisa',
    launch_time='20210216T071237',
    tensorboard_id='<tensorboard_dev_url>',
)

The resulting report (shown as a dictionary) would look like this:

{
 'early_stop_step': 179355,
 'eval_freq': 1087,
 'experiment_name': 'test_experiment',
 'first_nan_step': None,
 'launch_time': '20210216T071237',
 'model_dir': '/lisa/test_experiment',
 'num_train_steps': 200000,
 'report_query_args': {'early_stop_agg': 'MIN',
                       'early_stop_attr': 'eval1_loss',
                       'early_stop_ds_dir': None,
                       'other_ds_dirs': None,
                       'smoothing_kernel': 'TRIANGULAR',
                       'start_step': 20000,
                       'tags_to_include': ['train_loss',
                                           'train_acc',
                                           'eval1_loss',
                                           'eval1_acc'],
                       'window_size_in_steps': 6000},
 'tensorboard_id': '8773780292699030478',
 'metrics': {'eval1_loss': 1.387,
             'eval1_acc': 0.576,
             'train_loss': 1.591,
             'train_acc': 0.624},
 'unsmoothed_metrics': {'eval1_loss': 1.390,
                        'eval1_acc': 0.675,
                        'train_loss': 1.590,
                        'train_acc': 0.625},
 'user_name': 'lisa'}
