Commit e9098df

Author: xhlulu (committed)
ML Docs: Added coefficient MLR example
1 parent 1af0416 commit e9098df

doc/python/ml-regression.md

Lines changed: 68 additions & 33 deletions
@@ -39,7 +39,7 @@ jupyter:
### Ordinary Least Squares (OLS) with `plotly.express`

- This example shows how to use `plotly.express` to train a simply Ordinary Least Square (OLS) that can predict the tips servers will receive based on the value of the total bill.
+ This example shows how to use `plotly.express`'s `trendline` parameter to train a simple Ordinary Least Squares (OLS) model for predicting the tips servers will receive based on the value of the total bill.

```python
import plotly.express as px
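# A minimal sketch of the usage described above (assuming the built-in tips
# dataset and the `trendline="ols"` option; not taken verbatim from this diff):
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()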
@@ -108,7 +108,7 @@ fig.show()

## Comparing different kNN model parameters

- Compare the performance of two different models on the same dataset. This can be easily combined with discrete color legends from `px`.
+ Compare the performance of two different models on the same dataset. This can easily be combined with discrete color legends from `px`, such as coloring the points by the `sex` column.

```python
import numpy as np
@@ -136,9 +136,51 @@ fig.add_traces(go.Scatter(x=x_range, y=y_dist, name='Weights: Distance'))
fig.show()
```
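For reference, here is a minimal sketch of the comparison this section describes: two `KNeighborsRegressor` models that differ only in their `weights` setting, overlaid on a scatter plot colored by `sex`. The dataset, the number of neighbors, and the variable names below are assumptions; only the trace names echo the fragment shown above.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.neighbors import KNeighborsRegressor

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

# Two kNN regressors that differ only in how neighbors are weighted
knn_uni = KNeighborsRegressor(10, weights='uniform').fit(X, df.tip)
knn_dist = KNeighborsRegressor(10, weights='distance').fit(X, df.tip)

fig = px.scatter(df, x='total_bill', y='tip', color='sex', opacity=0.65)
fig.add_traces(go.Scatter(x=x_range.ravel(), y=knn_uni.predict(x_range), name='Weights: Uniform'))
fig.add_traces(go.Scatter(x=x_range.ravel(), y=knn_dist.predict(x_range), name='Weights: Distance'))
fig.show()
```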

## Displaying `PolynomialFeatures` using $\LaTeX$

It's easy to display $\LaTeX$ equations in legends and titles by simply adding `$` before and after your equation.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def format_coefs(coefs):
    equation_list = [f"{coef}x^{i}" for i, coef in enumerate(coefs)]
    equation = "$" + " + ".join(equation_list) + "$"

    replace_map = {"x^0": "", "x^1": "x", '+ -': '- '}
    for old, new in replace_map.items():
        equation = equation.replace(old, new)

    return equation

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
x_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

fig = px.scatter(df, x='total_bill', y='tip', opacity=0.65)
for n_features in [1, 2, 3, 4]:
    poly = PolynomialFeatures(n_features)
    poly.fit(X)
    X_poly = poly.transform(X)
    x_range_poly = poly.transform(x_range)

    model = LinearRegression(fit_intercept=False)
    model.fit(X_poly, df.tip)
    y_poly = model.predict(x_range_poly)

    equation = format_coefs(model.coef_.round(2))
    fig.add_traces(go.Scatter(x=x_range.squeeze(), y=y_poly, name=equation))

fig.show()
```

## 3D regression surface with `px.scatter_3d` and `go.Surface`

- Visualize the decision plane of your model whenever you have more than one variable in your `X`.
+ Visualize the decision plane of your model whenever you have more than one variable in your input data.

```python
import numpy as np
@@ -176,53 +218,44 @@ fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
fig.show()
```
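For reference, a minimal sketch of the technique this section describes: fit a model on two input features, evaluate it over a mesh grid spanning their ranges, and draw the predictions with `go.Surface`. The dataset, the choice of `LinearRegression`, and the mesh step below are assumptions; only the final `go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface')` call mirrors the fragment shown in the hunk above.

```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression

df = px.data.iris()
X = df[['sepal_width', 'sepal_length']]
model = LinearRegression().fit(X.values, df['petal_width'])

# Build a mesh grid covering the range of both input features
xrange = np.arange(X.sepal_width.min(), X.sepal_width.max(), 0.02)
yrange = np.arange(X.sepal_length.min(), X.sepal_length.max(), 0.02)
xx, yy = np.meshgrid(xrange, yrange)

# Predict on every grid point and reshape back into the grid's shape
pred = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig = px.scatter_3d(df, x='sepal_width', y='sepal_length', z='petal_width', opacity=0.65)
fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
fig.show()
```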

- ## Displaying `PolynomialFeatures` using $\LaTeX$
+ ## Visualizing coefficients for multiple linear regression (MLR)

- It's easy to diplay latex equations in legend and titles by simply adding `$` before and after your equation.
+ When you are fitting a linear regression, you often want to know which features matter most for the regression's output.


```python
import pandas as pd  # needed for pd.get_dummies below
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression

df = px.data.iris()

X = df.drop(columns=['petal_width', 'species_id'])
X = pd.get_dummies(X, columns=['species'], prefix_sep='=')
y = df['petal_width']

model = LinearRegression()
model.fit(X, y)

colors = ['Positive' if c > 0 else 'Negative' for c in model.coef_]

fig = px.bar(
    x=X.columns, y=model.coef_, color=colors,
    color_discrete_sequence=['red', 'blue'],
    labels=dict(x='Feature', y='Linear coefficient'),
    title='Weight of each feature for predicting petal width'
)
fig.show()
```

## Prediction Error Plots

When you are working with very high-dimensional data, it is inconvenient to plot every dimension against your output `y`. Instead, you can use methods such as prediction error plots, which let you visualize how well your model does compared to the ground truth.

### Simple actual vs predicted plot

This example shows the simplest way to compare the predicted output with the actual output. A good model will have most of the scatter dots near the diagonal black line.

```python
import plotly.express as px
import plotly.graph_objects as go
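# A minimal sketch of the comparison described above (the dataset, the model,
# and the absence of a train/test split here are assumptions): scatter the
# ground-truth values against the predictions and add a y = x reference line
# that a perfect model would follow.
from sklearn.linear_model import LinearRegression

df = px.data.tips()
X = df.total_bill.values.reshape(-1, 1)
y = df.tip

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

fig = px.scatter(x=y, y=y_pred, labels={'x': 'ground truth', 'y': 'prediction'})
fig.add_shape(
    type="line", line=dict(dash="dash", color="black"),
    x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max()
)
fig.show()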
@@ -323,10 +356,10 @@ fig = px.scatter(
fig.show()
```

- ## Regularization visualization
+ ## Visualize regularization across different cross-validation folds

- ### Plot alphas for individual folds
+ In this example, we show how to plot the error for various $\alpha$ penalization values obtained from cross-validation with scikit-learn's `LassoCV`. This is useful for seeing how much the error at the optimal alpha actually varies across CV folds.

```python
import pandas as pd
@@ -335,6 +368,8 @@ import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LassoCV

N_FOLD = 6

# Load and preprocess the data
df = px.data.gapminder()
X = df.drop(columns=['lifeExp', 'iso_num'])
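# A sketch of the remaining steps (the one-hot encoding choices below are
# assumptions, not taken from this diff): encode the categorical columns,
# fit LassoCV across N_FOLD folds, then plot each fold's MSE path against
# the alpha values that were tried.
X = pd.get_dummies(X, columns=['country', 'continent', 'iso_alpha'])
y = df['lifeExp']

# model.mse_path_ has shape (n_alphas, n_folds); model.alphas_ lists the alphas
model = LassoCV(cv=N_FOLD)
model.fit(X, y)

fig = go.Figure()
for i in range(N_FOLD):
    fig.add_trace(go.Scatter(
        x=model.alphas_, y=model.mse_path_[:, i],
        mode='lines', name=f"Fold {i + 1}"
    ))
fig.add_trace(go.Scatter(
    x=model.alphas_, y=model.mse_path_.mean(axis=-1),
    name='Mean MSE', line=dict(color='black', dash='dash')
))
fig.update_layout(xaxis_title='alpha', yaxis_title='Mean squared error')
fig.show()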
