Create a virtual environment (or install the packages from requirements.txt globally):
python3 -m venv [path]
source [path]/bin/activate
pip install -r requirements.txt
python ./training
It plots the cost, the distribution of the data, and the data on the different scales,
and prints the performance score of the model.
python ./predict
>[int mileage ]
> it prints the result of the prediction
Linear regression is a statistical method for modeling the relationship between a dependent variable and an independent variable.
In simple terms, it aims to establish a straight line that best fits the observed data points in a coordinate space.
> y^= theta0 + theta1 * x
Gradient descent is an optimization algorithm that learns by itself from a number of iterations and a learning rate. At each iteration it adjusts theta0 and theta1 to move toward the minimum error of the linear regression.
t0 = learning_rate * (1 / n_samples) * sum(y^ - y)
t1 = learning_rate * (1 / n_samples) * sum((y^ - y) * x)    (x is the mileage)
After each iteration, theta0 = theta0 - t0 and theta1 = theta1 - t1
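The update rules above can be sketched in plain Python as follows. This is only an illustration, not the project's actual training code: the function names, the default learning rate and iteration count, and the toy data are assumptions, and it assumes the two steps are subtracted from theta0 and theta1 at the end of each iteration.

```python
def estimate_price(theta0, theta1, mileage):
    """Hypothesis: y^ = theta0 + theta1 * x."""
    return theta0 + theta1 * mileage


def gradient_descent(mileages, prices, learning_rate=0.1, n_iterations=1000):
    """Fit theta0 and theta1 with batch gradient descent (hypothetical helper)."""
    theta0, theta1 = 0.0, 0.0
    n_samples = len(mileages)
    for _ in range(n_iterations):
        # Prediction error (y^ - y) for every sample, using the current thetas.
        errors = [
            estimate_price(theta0, theta1, x) - y
            for x, y in zip(mileages, prices)
        ]
        # Both steps are computed from the same thetas before either is updated.
        t0 = learning_rate * (1 / n_samples) * sum(errors)
        t1 = learning_rate * (1 / n_samples) * sum(
            e * x for e, x in zip(errors, mileages)
        )
        theta0 -= t0
        theta1 -= t1
    return theta0, theta1


# Toy data that follows y = 1 + 2x exactly (made up for the example).
theta0, theta1 = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this small, already well-scaled data set the thetas approach 1 and 2. With raw mileages in the tens of thousands, the same learning rate would most likely diverge, which is why scaling the data matters.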
To measure the error between the predictions and the real prices, a few formulas tell us whether the linear regression performs well or whether the learning rate and the number of iterations need to be changed.
We can use:
- R2 : 1 - sum((prices - predict)^2) / sum((prices - mean_prices)^2)
- it gives a score of at most 1: 1 is a perfect fit, 0 is no better than always predicting the mean price (a badly fitted model can even score below 0)
- MSE (mean squared error) : (1 / n_samples) * sum((prices - predict)^2)
- it gives a value that we want to reduce (the closer the MSE is to 0, the better)
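Both metrics can be computed in a few lines of Python. This is a minimal sketch: the function names and the sample values are made up for illustration and are not taken from the project.

```python
def r2_score(prices, predictions):
    """R2 = 1 - sum((y - y^)^2) / sum((y - mean(y))^2)."""
    mean_price = sum(prices) / len(prices)
    ss_res = sum((y - p) ** 2 for y, p in zip(prices, predictions))
    ss_tot = sum((y - mean_price) ** 2 for y in prices)
    return 1 - ss_res / ss_tot


def mean_squared_error(prices, predictions):
    """MSE = (1 / n_samples) * sum((y - y^)^2)."""
    return sum((y - p) ** 2 for y, p in zip(prices, predictions)) / len(prices)


# Made-up values: every prediction is off by 0.5.
prices = [1.0, 3.0, 5.0, 7.0]
predictions = [1.5, 2.5, 5.5, 6.5]

r2 = r2_score(prices, predictions)              # about 0.95
mse = mean_squared_error(prices, predictions)   # 0.25
```

Note that `r2_score` divides by zero if all prices are identical; a real implementation would need to handle that case.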
When the data are on different scales, it is crucial to normalize or standardize them so that the difference in scale cannot ruin the model.
There are two ways to transform the data:
- We can use Z-scores (standardization)
- (x - mean) / std
- We can use min-max (normalization)
- (x - min_val) / (max_val - min_val)
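The two transformations can be sketched like this. The function names and the sample mileages are hypothetical, and the standard deviation here is the population one (dividing by n), which is an assumption since the text does not say which variant is used.

```python
def standardize(values):
    """Z-score: (x - mean) / std, using the population standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Min-max: (x - min_val) / (max_val - min_val), mapped into [0, 1]."""
    min_val, max_val = min(values), max(values)
    return [(v - min_val) / (max_val - min_val) for v in values]


# Made-up mileages for illustration.
mileages = [10000.0, 20000.0, 30000.0]

z_scores = standardize(mileages)          # centered on 0
normalized = min_max_normalize(mileages)  # [0.0, 0.5, 1.0]
```

Both functions fail with a division by zero when every value is the same, so constant columns need separate handling.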
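Don't forget to convert theta0 and theta1 back to the original scale if you train with normalization or standardization (or apply the same transformation to the mileage before predicting).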
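One way to do that conversion is to substitute the scaling formula into y^ = theta0 + theta1 * x and regroup the terms. The sketch below assumes that only the mileage was scaled and the prices were left untouched; the function names and the numbers are made up for illustration.

```python
def unscale_min_max(theta0_n, theta1_n, min_val, max_val):
    """Convert thetas learned on min-max scaled mileages to raw mileages.

    y^ = theta0_n + theta1_n * (x - min_val) / (max_val - min_val)
    """
    theta1 = theta1_n / (max_val - min_val)
    theta0 = theta0_n - theta1 * min_val
    return theta0, theta1


def unscale_z_score(theta0_s, theta1_s, mean, std):
    """Convert thetas learned on standardized mileages to raw mileages.

    y^ = theta0_s + theta1_s * (x - mean) / std
    """
    theta1 = theta1_s / std
    theta0 = theta0_s - theta1 * mean
    return theta0, theta1


# Made-up thetas learned on mileages scaled from the range [10000, 30000].
theta0, theta1 = unscale_min_max(8000.0, -4000.0, 10000.0, 30000.0)

# The same raw mileage gives the same price either way.
raw_prediction = theta0 + theta1 * 20000.0
scaled_prediction = 8000.0 + -4000.0 * 0.5
```

If the prices were scaled as well, the prediction would also have to be mapped back to the price scale, which this sketch does not cover.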