sivikt/dstrials
===============

Some DS "keep repeating" exercises.
Short-Answer Portion
--------------------

Your models have been implemented and customers are now using them in production.

Q1. Imagine the credit risk use case above. You have two models: a logistic regression model with an F1-score of 0.60 and a neural network with an F1-score of 0.63. Which model would you recommend for the bank and why?

A1. In general, I would recommend logistic regression for the bank, since its results are easier to interpret: a prediction can be explained, for example, from the learned weights together with the components of the input feature vector. In the case above the logistic regression F1-score (0.60) does not differ much from the NN F1-score (0.63), and for a bank this metric is unlikely to be the primary success criterion. In a Bad/Good loan setting, banks are more interested in a high Recall on the Bad class, even at the cost of lower Precision: they can then double-check the applications classified as Bad using human reviewers, and that trade-off is acceptable for them.

Solving F1 = 2 * Precision * Recall / (Precision + Recall) for either metric gives:

    Recall    = (Precision * F1) / (2 * Precision - F1)
    Precision = (Recall * F1) / (2 * Recall - F1)

where (2 * Precision - F1) > 0 and 0 <= Precision <= 1,
and   (2 * Recall - F1) > 0 and 0 <= Recall <= 1.

So:

    for F1 = 0.60, 0.300 < Precision <= 1
    for F1 = 0.63, 0.315 < Precision <= 1

and the same bounds hold for Recall. Then, if we want Recall to be, say, 0.99 (so that only 1% of Bad applications are classified as Good), we get:

    for F1 = 0.60, Precision = (0.99 * 0.60) / (2 * 0.99 - 0.60) ≈ 0.43
    for F1 = 0.63, Precision = (0.99 * 0.63) / (2 * 0.99 - 0.63) ≈ 0.46

This way Type II errors (false negatives) constitute only 1%, while bank workers need to manually check roughly 54% of the incoming loan requests (the exact share depends on the proportion of Bad applications), which is almost half of the load they would face reviewing everything without model assistance.

--------

Q2. A customer wants to know which features matter for their dataset. They have several models created in the DataRobot platform, such as a random forest, a linear regression, and a neural network regressor. They are also pretty good at coding in Python, so they wouldn't mind using another library or function. How do you suggest this customer gets feature importance for their data?

A2. The DataRobot platform can try to get feature importance from the model itself if the model supports such functionality. For example, feature importance is available for random forest and linear regression models, since they expose it internally (impurity-based importances and coefficient magnitudes, respectively), and this information can be shown to the customer (see the first sketch below).

If the model does not provide such information, the customer can use the following approaches to estimate feature importance in Python:

1) Permute a particular feature across all test objects and evaluate the predictions of the DataRobot model on this perturbed test set. Compare the performance scores and record the delta. Repeat this process for every feature, then sort the features by their deltas (see the second sketch below).
2) Drop a feature, refit the models, and look at the difference in performance scores. Repeat for the other features.
3) Compute the correlation matrix, remove collinear features, and try to fit the models again.
4) Use PCA or SVD to find a new, smaller basis for the initial dataset, fit new models on it, and compare their performance scores with those the customer got initially.
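As a minimal sketch of the built-in route, assuming scikit-learn counterparts of the customer's random forest and linear regression models; the dataset here is synthetic and the feature names are made up purely for illustration:

```python
# Sketch: feature importance from models that expose it internally.
# Assumption: scikit-learn stand-ins for the customer's models; data is synthetic.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
lr = LinearRegression().fit(X, y)

# Random forest: impurity-based importances, normalized to sum to 1.
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"RF  {name}: {imp:.3f}")

# Linear regression: coefficient magnitudes (features should be on
# comparable scales for these to be comparable across features).
for name, coef in sorted(zip(feature_names, np.abs(lr.coef_)),
                         key=lambda p: -p[1]):
    print(f"LR  {name}: {coef:.3f}")
```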
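And a minimal sketch of approach 1), permutation importance, written against a bare `predict()` callable so the same loop could wrap a deployed model's scoring endpoint; the data, model, and choice of R^2 as the score are assumptions for illustration:

```python
# Sketch: model-agnostic permutation importance (approach 1).
# Only a predict() callable is needed, so this also works against a deployed
# model wrapped in a function. Data and model here are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

rng = np.random.default_rng(0)
baseline = r2_score(y_test, model.predict(X_test))

deltas = []
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, j])           # break this feature's link to the target
    permuted = r2_score(y_test, model.predict(X_perm))
    deltas.append(baseline - permuted)  # larger drop => more important feature

for j in np.argsort(deltas)[::-1]:
    print(f"f{j}: score drop {deltas[j]:.3f}")
```

scikit-learn also ships this procedure as `sklearn.inspection.permutation_importance`, which the customer could call directly instead of writing the loop by hand.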