10 minutes to Evidently

This is a short introduction to Evidently.

Input Data

Prepare the data as pandas DataFrames. You can pass two datasets: reference data and current production data. You can also pass a single dataset; in that case, you will need to identify which rows refer to the reference data and which to the production data. If you want to generate a report with no comparison performed, one dataset is also enough. If you deal with a large dataset, you can take a sample from it:

df.sample(1000, random_state=0) 
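
For instance, here is a minimal sketch of preparing reference and current DataFrames (the file name "data.csv" and the 50/50 time-ordered split are assumptions for illustration):

import pandas as pd

# Hypothetical input file; replace with your own data
df = pd.read_csv("data.csv")

# Split a time-ordered dataset into reference (earlier rows) and current (later rows)
reference_data = df.iloc[: len(df) // 2]
current_data = df.iloc[len(df) // 2 :]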

Column mapping

To create column mapping, you need to specify the following parameters:

  • target - the name of the column with the target function
  • prediction - the name of the column(s) with model predictions
  • id - ID column in the dataset
  • datetime - the name of the column with datetime
  • numerical_features - list of numerical features
  • categorical_features - list of categorical features

If the column_mapping is not specified or set as None, we use the default mapping strategy:

  • All features will be treated as numerical.
  • The column with 'id' name will be treated as an ID column.
  • The column with 'datetime' name will be treated as a datetime column.
  • The column with 'target' name will be treated as a target function.
  • The column with 'prediction' name will be treated as a model prediction.

Example

from evidently.pipeline.column_mapping import ColumnMapping

column_mapping = ColumnMapping()

column_mapping.target = 'y'
column_mapping.prediction = 'pred' # predictions
column_mapping.id = None 
column_mapping.datetime = 'date' 
column_mapping.numerical_features = ['temp', 'atemp', 'humidity'] 
column_mapping.categorical_features = ['season', 'holiday'] 

Choose The Tabs

You can choose one or several of the following Tabs.

  • DataDriftTab to estimate the data drift
  • NumTargetDriftTab to estimate target drift for the numerical target (for problem statements with the numerical target function: regression, probabilistic classification or ranking, etc.)
  • CatTargetDriftTab to estimate target drift for the categorical target (for problem statements with the categorical target function: binary classification, multi-class classification, etc.)
  • RegressionPerformanceTab to explore the performance of a regression model
  • ClassificationPerformanceTab to explore the performance of a classification model
  • ProbClassificationPerformanceTab to explore the performance of a probabilistic classification model and the quality of the model calibration

For each Tab, you can specify the following parameters:

  • verbose_level - get the short version of the Tab (verbose_level == 0) or the full version (verbose_level == 1)
  • include_widgets - list the widgets from the particular report you want to include; they will appear in the specified order

You can explore short and long versions of the custom Evidently report in the Examples section.
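
For instance, a minimal sketch of setting these parameters on the Data Drift Tab (the widget title passed to include_widgets is an assumption for illustration; check the full report for the exact widget titles available):

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab

# Short version of the Data Drift tab
short_dashboard = Dashboard(tabs=[DataDriftTab(verbose_level=0)])

# Full version, keeping only the listed widgets in the given order
# (the widget title below is illustrative)
custom_dashboard = Dashboard(tabs=[DataDriftTab(verbose_level=1, include_widgets=["Data Drift Table"])])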

Create Your Dashboard

To generate the report and explore it in the Jupyter notebook run these commands:

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab

my_dashboard = Dashboard(tabs=[DataDriftTab()])
my_dashboard.calculate(reference_data, current_data)
my_dashboard.show()
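
If you defined a column_mapping earlier, pass it to calculate as well so the columns are interpreted correctly (a sketch reusing the mapping from the Example above):

my_dashboard.calculate(reference_data, current_data, column_mapping=column_mapping)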

You can set the custom options for the following Reports:

  • num_target_drift_tab (Numerical Target Drift)
  • cat_target_drift_tab (Categorical Target Drift)
  • data_drift_tab (Data Drift)

See the example here.
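
A minimal sketch of passing custom options for the Data Drift report (this assumes the DataDriftOptions class from evidently.options; the exact parameter names may differ between versions):

from evidently.options import DataDriftOptions

# Assumed parameters: confidence level for the statistical tests and
# the share of drifted features required to declare dataset drift
options = DataDriftOptions(confidence=0.99, drift_share=0.5)

options_dashboard = Dashboard(tabs=[DataDriftTab()], options=[options])
options_dashboard.calculate(reference_data, current_data)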

Export the report as an HTML file

To save the Data Drift report as HTML, run:

my_dashboard.save("reports/my_report.html")

Profiles

You can generate JSON profiles if you want to integrate the calculated metrics and statistical test results into external pipelines and visualization tools. You can include several analyses in a single JSON output by specifying each one as a section in the profile, just like you choose Tabs in the visual dashboards. You can choose one or several of the following sections:

  • DataDriftProfileSection to estimate the data drift
  • NumTargetDriftProfileSection to estimate target drift for the numerical target (for problem statements with the numerical target function: regression, probabilistic classification or ranking, etc.)
  • CatTargetDriftProfileSection to estimate target drift for the categorical target (for problem statements with the categorical target function: binary classification, multi-class classification, etc.)
  • RegressionPerformanceProfileSection to explore the performance of a regression model.
  • ClassificationPerformanceProfileSection to explore the performance of a classification model
  • ProbClassificationPerformanceProfileSection to explore the performance of a probabilistic classification model and the quality of the model calibration

To generate the Data Drift profile, run:

from evidently.model_profile import Profile
from evidently.model_profile.sections import DataDriftProfileSection

my_profile = Profile(sections=[DataDriftProfileSection()])
my_profile.calculate(reference_data, current_data) 
my_profile.json()
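
The json() method returns a string. For instance, you can parse it back into a dictionary or write it to a file for downstream tools (a minimal sketch; the output path is an assumption):

import json

profile_json = my_profile.json()
profile_dict = json.loads(profile_json)  # work with the metrics programmatically

with open("my_profile.json", "w") as f:
    f.write(profile_json)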

Examples

Sample notebooks

Here you can find simple examples on toy datasets to quickly explore what Evidently can do right out of the box.

Report | Jupyter notebook | Colab notebook | Data source
Data Drift + Categorical Target Drift (Multiclass) | link | link | Iris plants sklearn.datasets
Data Drift + Categorical Target Drift (Binary) | link | link | Breast cancer sklearn.datasets
Data Drift + Numerical Target Drift | link | link | California housing sklearn.datasets
Regression Performance | link | link | Bike sharing UCI: link
Classification Performance (Multiclass) | link | link | Iris plants sklearn.datasets
Probabilistic Classification Performance (Multiclass) | link | link | Iris plants sklearn.datasets
Classification Performance (Binary) | link | link | Breast cancer sklearn.datasets
Probabilistic Classification Performance (Binary) | link | link | Breast cancer sklearn.datasets
Data Quality | link | link | Bike sharing UCI: link

How-to notebooks

These examples answer “how-to” questions: they help you adjust Evidently to your needs.

How to | Jupyter notebook | Colab notebook | Data source
How to customize drift dashboards? (set confidence level, number of bins in a histogram and statistical test) | link | link | California housing sklearn.datasets
How to change the classification threshold? How to cut outliers from the histogram plot? How to define the width of the confidence interval depicted on plots? | link | link | Wine Quality openml
How to add your own widget or create your own report? | link | link | California housing sklearn.datasets
How to specify a colour scheme for the Dashboard? | link | link | Iris plants sklearn.datasets
How to create a text annotation in the Dashboard? | link | link | Iris plants sklearn.datasets
How to assign a particular stattest from the evidently library for a feature or features? | link | link | Adult data set openml

Data Stories

To better understand potential use cases (such as model evaluation and monitoring), refer to the detailed tutorials accompanied by the blog posts.

Title | Jupyter notebook | Colab notebook | Blog post | Data source
Monitor production models | link | link | How to break a model in 20 days | Bike sharing UCI: link
Compare two models | link | link | What Is Your Model Hiding? | HR Employee Attrition: link
Custom tab and PSI widget | link | link | --- | California housing sklearn.datasets

Integrations

To see how to integrate Evidently in your prediction pipelines and use it with other tools, refer to the integrations.

Title | Link to tutorial
Real-time ML monitoring with Grafana | Evidently + Grafana
Batch ML monitoring with Airflow | Evidently + Airflow
Log Evidently metrics in MLflow UI | Evidently + MLflow