Commit e9098df

Author: xhlulu (committed)
ML Docs: Added coefficient MLR example
1 parent 1af0416 commit e9098df

doc/python/ml-regression.md

Lines changed: 68 additions & 33 deletions
@@ -39,7 +39,7 @@ jupyter:
### Ordinary Least Squares (OLS) with `plotly.express`

- This example shows how to use `plotly.express` to train a simply Ordinary Least Square (OLS) that can predict the tips servers will receive based on the value of the total bill.
+ This example shows how to use `plotly.express`'s `trendline` parameter to train a simple Ordinary Least Squares (OLS) model for predicting the tips servers will receive based on the value of the total bill.

```python
import plotly.express as px
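# A minimal sketch of the usage described above (assuming the built-in tips
# dataset and the `trendline="ols"` option; not taken verbatim from this diff):
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()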
@@ -108,7 +108,7 @@ fig.show()

## Comparing different kNN model parameters

- Compare the performance of two different models on the same dataset. This can be easily combined with discrete color legends from `px`.
+ Compare the performance of two different models on the same dataset. This can easily be combined with discrete color legends from `px`, such as coloring the points by the `sex` column.

```python
import numpy as np
@@ -136,9 +136,51 @@ fig.add_traces(go.Scatter(x=x_range, y=y_dist, name='Weights: Distance'))
fig.show()
```
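For reference, here is a minimal sketch of the comparison this section describes: two `KNeighborsRegressor` models that differ only in their `weights` setting, overlaid on a scatter plot colored by `sex`. The dataset, the number of neighbors, and the variable names below are assumptions; only the trace names echo the fragment shown above.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.neighbors import KNeighborsRegressor

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

# Two kNN regressors that differ only in how neighbors are weighted
knn_uni = KNeighborsRegressor(10, weights='uniform').fit(X, df.tip)
knn_dist = KNeighborsRegressor(10, weights='distance').fit(X, df.tip)

fig = px.scatter(df, x='total_bill', y='tip', color='sex', opacity=0.65)
fig.add_traces(go.Scatter(x=x_range.ravel(), y=knn_uni.predict(x_range), name='Weights: Uniform'))
fig.add_traces(go.Scatter(x=x_range.ravel(), y=knn_dist.predict(x_range), name='Weights: Distance'))
fig.show()
```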

## Displaying `PolynomialFeatures` using $\LaTeX$

It's easy to display $\LaTeX$ equations in legends and titles by simply adding `$` before and after your equation.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def format_coefs(coefs):
    equation_list = [f"{coef}x^{i}" for i, coef in enumerate(coefs)]
    equation = "$" + " + ".join(equation_list) + "$"

    replace_map = {"x^0": "", "x^1": "x", '+ -': '- '}
    for old, new in replace_map.items():
        equation = equation.replace(old, new)

    return equation

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

fig = px.scatter(df, x='total_bill', y='tip', opacity=0.65)
for n_features in [1, 2, 3, 4]:
    poly = PolynomialFeatures(n_features)
    poly.fit(X)
    X_poly = poly.transform(X)
    x_range_poly = poly.transform(x_range)

    model = LinearRegression(fit_intercept=False)
    model.fit(X_poly, df.tip)
    y_poly = model.predict(x_range_poly)

    equation = format_coefs(model.coef_.round(2))
    fig.add_traces(go.Scatter(x=x_range.squeeze(), y=y_poly, name=equation))

fig.show()
```

## 3D regression surface with `px.scatter_3d` and `go.Surface`

- Visualize the decision plane of your model whenever you have more than one variable in your `X`.
+ Visualize the decision plane of your model whenever you have more than one variable in your input data.

```python
import numpy as np
@@ -176,53 +218,44 @@ fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
fig.show()
```
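For reference, a minimal sketch of the technique this section describes: fit a model on two input features, evaluate it over a mesh grid spanning their ranges, and draw the predictions with `go.Surface`. The dataset, the choice of `LinearRegression`, and the mesh step below are assumptions; only the final `go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface')` call mirrors the fragment shown in the hunk above.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression

df = px.data.iris()
X = df[['sepal_width', 'sepal_length']]
model = LinearRegression().fit(X.values, df['petal_width'])

# Build a mesh grid covering the range of both input features
xrange = np.arange(X.sepal_width.min(), X.sepal_width.max(), 0.02)
yrange = np.arange(X.sepal_length.min(), X.sepal_length.max(), 0.02)
xx, yy = np.meshgrid(xrange, yrange)

# Predict on every grid point and reshape back into the grid's shape
pred = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig = px.scatter_3d(df, x='sepal_width', y='sepal_length', z='petal_width', opacity=0.65)
fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
fig.show()
```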

- ## Displaying `PolynomialFeatures` using $\LaTeX$
+ ## Visualizing coefficients for multiple linear regression (MLR)

- It's easy to diplay latex equations in legend and titles by simply adding `$` before and after your equation.
+ When you are fitting a linear regression, you often want to know which features matter most for the regression's output.


```python
import pandas as pd  # needed for pd.get_dummies below
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression

df = px.data.iris()

X = df.drop(columns=['petal_width', 'species_id'])
X = pd.get_dummies(X, columns=['species'], prefix_sep='=')
y = df['petal_width']

model = LinearRegression()
model.fit(X, y)

colors = ['Positive' if c > 0 else 'Negative' for c in model.coef_]

fig = px.bar(
    x=X.columns, y=model.coef_, color=colors,
    color_discrete_sequence=['red', 'blue'],
    labels=dict(x='Feature', y='Linear coefficient'),
    title='Weight of each feature for predicting petal width'
)
fig.show()
```

## Prediction Error Plots

When you are working with very high-dimensional data, it is inconvenient to plot every dimension against your output `y`. Instead, you can use methods such as prediction error plots, which let you visualize how well your model does compared to the ground truth.

### Simple actual vs predicted plot

This example shows the simplest way to compare the predicted output with the actual output. A good model will have most of the scatter dots near the diagonal black line.

```python
import plotly.express as px
import plotly.graph_objects as go
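# A minimal sketch of the comparison described above (the dataset, the model,
# and the absence of a train/test split here are assumptions): scatter the
# ground-truth values against the predictions and add a y = x reference line
# that a perfect model would follow.
from sklearn.linear_model import LinearRegression

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
y = df.tip

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

fig = px.scatter(x=y, y=y_pred, labels={'x': 'ground truth', 'y': 'prediction'})
fig.add_shape(
    type="line", line=dict(dash="dash", color="black"),
    x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max()
)
fig.show()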
@@ -323,10 +356,10 @@ fig = px.scatter(
fig.show()
```

- ## Regularization visualization
+ ## Visualize regularization across different cross-validation folds

- ### Plot alphas for individual folds
+ In this example, we show how to plot the error for various $\alpha$ penalization values obtained from cross-validation with scikit-learn's `LassoCV`. This is useful for seeing how much the error at the optimal alpha actually varies across CV folds.

```python
import pandas as pd
@@ -335,6 +368,8 @@ import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LassoCV

N_FOLD = 6

# Load and preprocess the data
df = px.data.gapminder()
X = df.drop(columns=['lifeExp', 'iso_num'])
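# A sketch of the remaining steps (the one-hot encoding choices below are
# assumptions, not taken from this diff): encode the categorical columns,
# fit LassoCV across N_FOLD folds, then plot each fold's MSE path against
# the alpha values that were tried.
X = pd.get_dummies(X, columns=['country', 'continent', 'iso_alpha'])
y = df['lifeExp']

# model.mse_path_ has shape (n_alphas, n_folds); model.alphas_ lists the alphas
model = LassoCV(cv=N_FOLD)
model.fit(X, y)

fig = go.Figure()
for i in range(N_FOLD):
    fig.add_trace(go.Scatter(
        x=model.alphas_, y=model.mse_path_[:, i],
        mode='lines', name=f"Fold {i + 1}"
    ))
fig.add_trace(go.Scatter(
    x=model.alphas_, y=model.mse_path_.mean(axis=-1),
    name='Mean MSE', line=dict(color='black', dash='dash')
))
fig.update_layout(xaxis_title='alpha', yaxis_title='Mean squared error')
fig.show()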
