All notable changes to this project will be documented in this file.
- Post-release `2.1`, a bug was reported that prevented the `predict_model` function from working in the `regression` module in a new notebook session when `transform_target` was set to `False` during model training. This issue has been fixed in PyCaret release `2.1.2`. To learn more about the issue: pycaret#525
- Post-release `2.1`, a bug was identified in the MLFlow back-end. The error occurs only when `log_experiment` in the `setup` function is set to True, and it applies to all modules. The cause of the error has been identified and an issue has been opened with `MLFlow`. The error is caused by the `infer_signature` function in `mlflow.sklearn.log_model` and is only raised when there are missing values in the dataset. This issue has been fixed in PyCaret release `2.1.1` by skipping the signature in cases where `MLFlow` raises an exception.
- Model Deployment: Model deployment support for `gcp` and `azure` has been added in the `deploy_model` function for all modules. See the documentation for details.
- Compare Models Budget Time: New parameter `budget_time` added in the `compare_models` function. Use `budget_time` to set an upper limit on `compare_models` training time.
- Feature Selection: New feature selection method `boruta` has been added. By default, the `feature_selection_method` parameter in the `setup` function is set to `classic`, but it can be set to `boruta` for feature selection using the Boruta algorithm. This change applies to `pycaret.classification` and `pycaret.regression`.
- Numeric Imputation: New method `zero` has been added to `numeric_imputation` in the `setup` function. When the method is set to `zero`, missing values are replaced with the constant 0. The default behavior of `numeric_imputation` is unchanged.
- Plot Model: New parameter `scale` has been added in `plot_model` for all modules to enable high-quality images for research publications.
- User Defined Loss Function: You can now pass `custom_scorer` to optimize a user-defined loss function in `tune_model` for `pycaret.classification` and `pycaret.regression`. You must use `make_scorer` from `sklearn` to create the custom loss function that is passed into `custom_scorer` for the `tune_model` function.
- Change in Pipeline Behavior: When using `save_model`, the `model` object is appended to the `Pipeline`; as such, the behavior of `Pipeline` and `predict_model` has changed. Instead of saving a `list`, `save_model` now saves a `Pipeline` object in which the trained model is in the last position. The user-facing functionality of `predict_model` remains the same.
- Compare Models: Parameters `blacklist` and `whitelist` are now renamed to `exclude` and `include`, with no change in functionality.
- Predict Model Labels: The `Label` column returned by the `predict_model` function in `pycaret.classification` now returns the original label instead of the encoded value. This change makes the output of `predict_model` more human-readable. A new parameter `encoded_labels` is added, which is `False` by default. When set to `True`, it returns encoded labels.
- Model Logging: Model persistence in the backend when `log_experiment` is set to `True` has changed. Instead of using the internal `save_model` functionality, it now uses `mlflow.sklearn.save_model` to allow the use of Model Registry and `MLFlow` native deployment functionalities.
- CatBoost Compatibility: `CatBoostClassifier` is now compatible with `blend_models` in `pycaret.classification`. As such, `blend_models` without any `estimator_list` now blends a total of `15` estimators, including `CatBoostClassifier`.
- Stack Models: `stack_models` in `pycaret.classification` and `pycaret.regression` now adopts `StackingClassifier()` and `StackingRegressor` from `sklearn`. As such, the `stack_models` function now returns an `sklearn` object instead of the custom `list` returned in previous versions.
- Create Stacknet: `create_stacknet` in `pycaret.classification` and `pycaret.regression` is now removed.
- Tune Model: `tune_model` in `pycaret.classification` and `pycaret.regression` now inherits params from the input `estimator`. As such, if you have trained `xgboost`, `lightgbm` or `catboost` on GPU, it will not inherit the training method from `estimator`.
- Interpret Model: `**kwargs` argument now added in `interpret_model`.
- Pandas Categorical Type: All modules are now compatible with the `pandas.Categorical` object. Internally, such columns are converted to object and are treated the same way as `object` or `bool` columns.
- use_gpu: A new parameter added in the `setup` function for `pycaret.classification` and `pycaret.regression`. In `2.1` it was added to prepare for the backend work required to make this change in future releases. As such, using the `use_gpu` param in `2.1` has no impact.
- Unit Tests: Unit testing enhanced. Continuous improvement in progress: https://github.com/pycaret/pycaret/tree/master/pycaret/tests
- Automated Documentation Added: Automated documentation now added. Documentation on the website will only update for `major` releases 0.X. For all minor monthly releases, documentation will be available on: https://pycaret.readthedocs.io/en/latest/
- Introduction of GitHub Actions: CI/CD build testing is now moved from `travis-ci` to `github-actions`. `pycaret-nightly` is now being published every 24 hours automatically.
- Tutorials: All tutorials are now updated using `pycaret==2.0`. https://github.com/pycaret/pycaret/tree/master/tutorials
- Resources: New resources added under `/pycaret/resources/`. https://github.com/pycaret/pycaret/tree/master/resources
- Example Notebook: Many example notebooks added under `/pycaret/examples/`. https://github.com/pycaret/pycaret/tree/master/examples
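
The `budget_time` entry earlier in this list says only that the parameter caps the training time of `compare_models`. One way to picture such a cap is a loop that stops starting new candidates once the budget is spent. The sketch below is a plain-Python illustration under stated assumptions, not PyCaret's implementation: `train_within_budget`, `budget_seconds` and the injectable `clock` are hypothetical names, and both the time unit and the point at which PyCaret checks the budget are assumptions.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def train_within_budget(
    candidates: Iterable[Tuple[str, Callable[[], Any]]],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, Any]]:
    """Train candidates in order until the time budget is used up.

    Hypothetical helper: the budget is checked before each candidate
    starts, so a candidate that is already running is never interrupted.
    """
    start = clock()
    results = []
    for name, train in candidates:
        if clock() - start >= budget_seconds:
            break  # budget spent: skip the remaining candidates
        results.append((name, train()))
    return results


# Usage with stand-in "training" functions that return a score.
candidates = [("lr", lambda: 0.81), ("rf", lambda: 0.85), ("knn", lambda: 0.78)]
finished = train_within_budget(candidates, budget_seconds=60.0)
```

Passing a custom `clock` makes the stopping rule easy to check without waiting for real time to pass.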
- Experiment Logging: MLFlow logging backend added. New parameters `log_experiment`, `experiment_name`, `log_profile` and `log_data` added in `setup`. Available in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- Save / Load Experiment: The `save_experiment` and `load_experiment` functions from `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp` are removed in PyCaret 2.0.
- System Logging: System log files are now generated when `setup` is executed. A `logs.log` file is saved in the current working directory. The function `get_system_logs` can be used to access the log file in a notebook.
- Command Line Support: When using PyCaret 2.0 outside of a notebook, the `html` parameter in `setup` must be set to False.
- Imbalance Dataset: `fix_imbalance` and `fix_imbalance_method` parameters added in `setup` for `pycaret.classification`. When set to True, SMOTE is applied by default to create synthetic datapoints for the minority class. To change the method, pass any class from `imblearn` that supports the `fit_resample` method in the `fix_imbalance_method` parameter.
- Save Plot: `save` parameter added in `plot_model`. When set to True, it saves the plot as `png` or `html` in the current working directory.
- kwargs: `**kwargs` added in `create_model` for `pycaret.classification`, `pycaret.regression`, `pycaret.clustering` and `pycaret.anomaly`.
- choose_better: `choose_better` and `optimize` parameters added in `tune_model`, `ensemble_model`, `blend_models`, `stack_models` and `create_stacknet` in `pycaret.classification` and `pycaret.regression`. Read the details below to learn more about this.
- Training Time: `TT (Sec)` added in the `compare_models` function for `pycaret.classification` and `pycaret.regression`.
- New Metric: `MCC` metric added in the score grid for `pycaret.classification`.
- NEW FUNCTION: `automl()` added in `pycaret.classification` and `pycaret.regression`.
- NEW FUNCTION: `pull()` added in `pycaret.classification` and `pycaret.regression`.
- NEW FUNCTION: `models()` added in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- NEW FUNCTION: `get_logs()` added in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- NEW FUNCTION: `get_config()` added in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- NEW FUNCTION: `set_config()` added in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- NEW FUNCTION: `get_system_logs()` added in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`.
- CHANGE IN BEHAVIOR: `compare_models` now returns the top_n models defined by the `n_select` parameter, which is set to 1 by default.
- CHANGE IN BEHAVIOR: The `tune_model` function in `pycaret.classification` and `pycaret.regression` now requires a trained model object to be passed as `estimator` instead of a string abbreviation / ID.
- REMOVED DEPENDENCIES: `awscli` and `shap` removed from requirements.txt. To use the `interpret_model` function in `pycaret.classification` and `pycaret.regression`, and the `deploy_model` function in `pycaret.classification`, `pycaret.regression`, `pycaret.clustering` and `pycaret.anomaly`, these libraries will have to be installed separately.
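
For readers unfamiliar with the `MCC` metric added to the classification score grid above, the following sketch computes the Matthews correlation coefficient for a binary problem from the standard confusion-matrix formula. It is a plain-Python illustration, not the code PyCaret uses; the function name `mcc` and the 0/1 label encoding are assumptions, and returning 0.0 when the denominator is zero is a common convention rather than something stated in this changelog.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Perfect agreement gives 1.0, perfect disagreement gives -1.0.
perfect = mcc([1, 1, 0, 0], [1, 1, 0, 0])
inverted = mcc([1, 1, 0, 0], [0, 0, 1, 1])
```

Unlike accuracy, the score uses all four cells of the confusion matrix, which is why it is often preferred on imbalanced classes.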
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- `remove_perfect_collinearity` parameter added in `setup()`. Default set to False. When set to True, perfect collinearity (features with correlation = 1) is removed from the dataset. When two features are 100% correlated, one of them is randomly dropped from the dataset.
- `fix_imbalance` parameter added in `setup()`. Default set to False. When the dataset has an unequal distribution of the target class, it can be fixed using the fix_imbalance parameter. When set to True, SMOTE (Synthetic Minority Over-sampling Technique) is applied by default to create synthetic datapoints for the minority class.
- `fix_imbalance_method` parameter added in `setup()`. Default set to None. When fix_imbalance is set to True and fix_imbalance_method is None, 'smote' is applied by default to oversample the minority class during cross validation. This parameter accepts any module from 'imblearn' that supports the 'fit_resample' method.
- `data_split_shuffle` parameter added in `setup()`. Default set to True. If set to False, prevents shuffling of rows when splitting data.
- `folds_shuffle` parameter added in `setup()`. Default set to False. If set to False, prevents shuffling of rows when using cross validation.
- `n_jobs` parameter added in `setup()`. Default set to -1. The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor, set n_jobs to None.
- `html` parameter added in `setup()`. Default set to True. If set to False, prevents runtime display of the monitor. This must be set to False when using an environment that doesn't support HTML.
- `log_experiment` parameter added in `setup()`. Default set to False. When set to True, all metrics and parameters are logged on the MLFlow server.
- `experiment_name` parameter added in `setup()`. Default set to None. Name of the experiment for logging. When set to None, 'clf' is used by default as the alias for the experiment name.
- `log_plots` parameter added in `setup()`. Default set to False. When set to True, specific plots are logged in MLflow as a png file.
- `log_profile` parameter added in `setup()`. Default set to False. When set to True, the data profile is also logged on MLflow as an html file.
- `log_data` parameter added in `setup()`. Default set to False. When set to True, the train and test datasets are logged as csv.
- `verbose` parameter added in `setup()`. Default set to True. The information grid is not printed when verbose is set to False.
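
To make the `remove_perfect_collinearity` description above concrete, the sketch below drops any feature whose Pearson correlation with an already-kept feature equals 1. It is a plain-Python illustration under assumptions, not PyCaret's preprocessing code: `pearson` and `drop_perfectly_collinear` are hypothetical helpers, and the sketch keeps the first feature of a correlated pair for reproducibility, whereas the changelog says the dropped feature is chosen randomly.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_perfectly_collinear(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep a feature only if it is not perfectly correlated with a kept one."""
    kept: Dict[str, List[float]] = {}
    for name, values in columns.items():
        duplicate = any(
            math.isclose(pearson(values, other), 1.0) for other in kept.values()
        )
        if not duplicate:
            kept[name] = values
    return kept


data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * a, so correlation is 1
    "c": [4.0, 1.0, 3.0, 2.0],
}
reduced = drop_perfectly_collinear(data)
```

Here `b` is removed because it is a scaled copy of `a`, while `c` survives.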
`pycaret.classification`, `pycaret.regression`
- `whitelist` parameter added in `compare_models`. Default set to None. To run only certain models in the comparison, pass the model IDs as a list of strings in the whitelist param.
- `n_select` parameter added in `compare_models`. Default set to 1. Number of top_n models to return. Use a negative argument for bottom selection. For example, n_select = -3 means the bottom 3 models.
- `verbose` parameter added in `compare_models`. Default set to True. The score grid is not printed when verbose is set to False.
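
The sign convention of `n_select` described above maps naturally onto list slicing. The sketch below assumes a list of model IDs already ranked from best to worst; `select_models` is a hypothetical helper written for illustration, not part of PyCaret, and it always returns a list, which may differ from what `compare_models` returns.

```python
from typing import List


def select_models(ranked: List[str], n_select: int = 1) -> List[str]:
    """Pick models from a best-to-worst ranking.

    A positive n_select takes that many from the top; a negative one
    takes that many from the bottom, mirroring the description above.
    """
    if n_select >= 0:
        return ranked[:n_select]
    return ranked[n_select:]


ranking = ["lr", "rf", "xgboost", "knn", "nb"]  # best first
top_one = select_models(ranking)        # default n_select = 1
top_three = select_models(ranking, 3)
bottom_three = select_models(ranking, -3)
```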
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`
- `cross_validation` parameter added in `create_model`. Default set to True. When cross_validation is set to False, the fold parameter is ignored and the model is trained on the entire training dataset. No metric evaluation is returned. Only applicable in `pycaret.classification` and `pycaret.regression`.
- `system` parameter added in `create_model`. Default set to True. Must remain True at all times. Only to be changed by internal functions.
- `ground_truth` parameter added in `create_model`. Default set to None. When ground_truth is provided, Homogeneity Score, Rand Index, and Completeness Score are evaluated and printed along with other metrics. This is only available in `pycaret.clustering`.
- `kwargs` parameter added in `create_model`. Additional keyword arguments to pass to the estimator.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- `custom_grid` parameter added in `tune_model`. Default set to None. To use custom hyperparameters for tuning, pass a dictionary with parameter names and the values to be iterated. When set to None, it uses the pre-defined tuning grid. For `pycaret.clustering`, `pycaret.anomaly` and `pycaret.nlp`, the custom_grid param must be a list of values to iterate over.
- `choose_better` parameter added in `tune_model`. Default set to False. When set to True, the base estimator is returned when the performance doesn't improve by tune_model. This guarantees the returned object performs at least as well as the base estimator created using create_model or the model returned by compare_models.
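
The `choose_better` rule described above amounts to a guarded replacement: keep the new model only if its score beats the base estimator's. The sketch below illustrates that rule in plain Python. `choose_better` here is a hypothetical stand-alone function, not PyCaret's internal code, and the `higher_is_better` flag is an assumption added so that error metrics, where a lower value is preferable, can be handled.

```python
from typing import Any


def choose_better(
    base_model: Any,
    base_score: float,
    candidate_model: Any,
    candidate_score: float,
    higher_is_better: bool = True,
) -> Any:
    """Return the candidate only if it strictly improves on the base model."""
    if higher_is_better:
        improved = candidate_score > base_score
    else:
        improved = candidate_score < base_score
    # On a tie or a regression the base estimator is kept.
    return candidate_model if improved else base_model


# Accuracy went down after tuning, so the base model is returned.
kept = choose_better("base_rf", 0.86, "tuned_rf", 0.84)
# For an error metric (lower is better) the tuned model wins here.
swapped = choose_better("base_rf", 3.2, "tuned_rf", 2.9, higher_is_better=False)
```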
`pycaret.classification`, `pycaret.regression`
- `choose_better` parameter added in `ensemble_model`. Default set to False. When set to True, the base estimator is returned when the performance doesn't improve by ensemble_model. This guarantees the returned object performs at least as well as the base estimator created using create_model or the model returned by compare_models.
- `optimize` parameter added in `ensemble_model`. Default set to `Accuracy` for `pycaret.classification` and `R2` for `pycaret.regression`. Only used when choose_better is set to True. The optimize parameter is used to compare the ensembled model with the base estimator. Values accepted in the optimize parameter for `pycaret.classification` are 'Accuracy', 'AUC', 'Recall', 'Precision', 'F1', 'Kappa', 'MCC' and for `pycaret.regression` are 'MAE', 'MSE', 'RMSE', 'R2', 'RMSLE' and 'MAPE'.
`pycaret.classification`, `pycaret.regression`
- `choose_better` parameter added in `blend_models`. Default set to False. When set to True, the base estimator is returned when the performance doesn't improve by blend_models. This guarantees the returned object performs at least as well as the base estimator created using create_model or the model returned by compare_models.
- `optimize` parameter added in `blend_models`. Default set to `Accuracy` for `pycaret.classification` and `R2` for `pycaret.regression`. Only used when choose_better is set to True. The optimize parameter is used to compare the ensembled model with the base estimator. Values accepted in the optimize parameter for `pycaret.classification` are 'Accuracy', 'AUC', 'Recall', 'Precision', 'F1', 'Kappa', 'MCC' and for `pycaret.regression` are 'MAE', 'MSE', 'RMSE', 'R2', 'RMSLE' and 'MAPE'.
`pycaret.classification`, `pycaret.regression`
- `choose_better` parameter added in `stack_models`. Default set to False. When set to True, the base estimator is returned when the performance doesn't improve by stack_models. This guarantees the returned object performs at least as well as the base estimator created using create_model or the model returned by compare_models.
- `optimize` parameter added in `stack_models`. Default set to `Accuracy` for `pycaret.classification` and `R2` for `pycaret.regression`. Only used when choose_better is set to True. The optimize parameter is used to compare the ensembled model with the base estimator. Values accepted in the optimize parameter for `pycaret.classification` are 'Accuracy', 'AUC', 'Recall', 'Precision', 'F1', 'Kappa', 'MCC' and for `pycaret.regression` are 'MAE', 'MSE', 'RMSE', 'R2', 'RMSLE' and 'MAPE'.
`pycaret.classification`, `pycaret.regression`
- `choose_better` parameter added in `create_stacknet`. Default set to False. When set to True, the base estimator is returned when the performance doesn't improve by create_stacknet. This guarantees the returned object performs at least as well as the base estimator created using create_model or the model returned by compare_models.
- `optimize` parameter added in `create_stacknet`. Default set to `Accuracy` for `pycaret.classification` and `R2` for `pycaret.regression`. Only used when choose_better is set to True. The optimize parameter is used to compare the ensembled model with the base estimator. Values accepted in the optimize parameter for `pycaret.classification` are 'Accuracy', 'AUC', 'Recall', 'Precision', 'F1', 'Kappa', 'MCC' and for `pycaret.regression` are 'MAE', 'MSE', 'RMSE', 'R2', 'RMSLE' and 'MAPE'.
`pycaret.classification`, `pycaret.regression`
- `verbose` parameter added in `predict_model`. Default set to True. The holdout score grid is not printed when verbose is set to False.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- `save` parameter added in `plot_model`. Default set to False. When set to True, the plot is saved as a 'png' file in the current working directory.
- `verbose` parameter added in `plot_model`. Default set to True. The progress bar is not shown when verbose is set to False.
- `system` parameter added in `plot_model`. Default set to True. Must remain True at all times. Only to be changed by internal functions.
`pycaret.classification`, `pycaret.regression`
- This function returns the best model out of all models created in the current active environment, based on the metric defined in the optimize parameter.
- `optimize` string, default = 'Accuracy' for `pycaret.classification` and 'R2' for `pycaret.regression`. Other values you can pass in the optimize param are 'AUC', 'Recall', 'Precision', 'F1', 'Kappa', and 'MCC' for `pycaret.classification` and 'MAE', 'MSE', 'RMSE', 'R2', 'RMSLE', and 'MAPE' for `pycaret.regression`.
- `use_holdout` bool, default = False. When set to True, metrics are evaluated on the holdout set instead of CV.
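
The selection rule described above, returning the best model according to the metric named in `optimize`, can be pictured as a lookup over a table of scores. The sketch below is a plain-Python illustration and not the function's actual implementation: `best_model`, `ERROR_METRICS` and the example score table are hypothetical, and treating 'MAE', 'MSE', 'RMSE', 'RMSLE' and 'MAPE' as lower-is-better is an assumption based on what those metrics measure.

```python
from typing import Dict

# Error metrics, where a smaller value indicates a better model (assumption).
ERROR_METRICS = {"MAE", "MSE", "RMSE", "RMSLE", "MAPE"}


def best_model(scores: Dict[str, Dict[str, float]], optimize: str = "Accuracy") -> str:
    """Return the name of the model with the best value of `optimize`."""
    pick = min if optimize in ERROR_METRICS else max
    return pick(scores, key=lambda name: scores[name][optimize])


# Made-up scores purely for illustration.
classification_scores = {
    "lr": {"Accuracy": 0.81, "AUC": 0.90},
    "rf": {"Accuracy": 0.85, "AUC": 0.88},
}
regression_scores = {
    "lasso": {"MAE": 3.2, "R2": 0.70},
    "gbr": {"MAE": 2.9, "R2": 0.75},
}

by_accuracy = best_model(classification_scores)          # default metric
by_auc = best_model(classification_scores, "AUC")
by_mae = best_model(regression_scores, "MAE")
```

Changing `optimize` can change which model is returned, as the accuracy and AUC lookups show.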
`pycaret.classification`, `pycaret.regression`
- This function returns the last printed score grid as a pandas dataframe.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- This function returns the table of models available in the model library.
- `type` string, default = None
  - linear : filters and only returns linear models
  - tree : filters and only returns tree based models
  - ensemble : filters and only returns ensemble models

  The `type` parameter is only available in `pycaret.classification` and `pycaret.regression`.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- This function returns a table with experiment logs consisting of run details, parameters, metrics and tags.
- `experiment_name` string, default = None. When set to None, the current active run is used.
- `save` bool, default = False. When set to True, a csv file is saved in the current directory.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- This function is used to access global environment variables. Check the docstring for the list of accessible global variables.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- This function is used to reset global environment variables. Check the docstring for the list of accessible global variables.
`pycaret.classification`, `pycaret.regression`, `pycaret.clustering`, `pycaret.anomaly`, `pycaret.nlp`
- This function reads and prints the 'logs.log' file from the current active directory. logs.log is generated when `setup` is initialized in any module.