Data Processing

Data Smoothing

Savitzky-Golay Filter

A Savitzky–Golay filter is a digital filter that can be applied to a set of digital data points to smooth the data, that is, to increase the precision of the data without distorting the underlying trend of the signal. This is achieved, in a process known as convolution, by fitting successive subsets of adjacent data points with a low-degree polynomial using the method of linear least squares. When the data points are equally spaced, an analytical solution to the least-squares equations can be found in the form of a single set of "convolution coefficients". These coefficients can be applied to all data subsets to give estimates of the smoothed signal (or of its derivatives) at the central point of each subset.

In every step, the window moves and a different part of the original dataset is used. Then, the local polynomial function is fitted to the data in the window, and a new data point is calculated using the polynomial function. After that, the window moves to the next part of the dataset, and the process repeats.
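The sliding-window procedure described above can be sketched directly with NumPy. This is a minimal illustration, not how SciPy implements the filter: the function name `savgol_sketch` and the sample data are hypothetical, an odd `window_length` is assumed, and the points near the edges are simply left unsmoothed.

```python
import numpy as np

def savgol_sketch(y, window_length=5, polyorder=2):
    """Smooth y by fitting a polynomial to each sliding window (illustrative only)."""
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    offsets = np.arange(-half, half + 1)  # x positions relative to the window center
    smoothed = y.copy()                   # the first/last `half` points stay unchanged
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]                # current part of the dataset
        coeffs = np.polyfit(offsets, window, polyorder)  # local least-squares fit
        smoothed[i] = np.polyval(coeffs, 0.0)            # fitted value at the center
    return smoothed

# Hypothetical sample data: a roughly linear trend with small deviations.
noisy = np.array([1.0, 2.2, 2.9, 4.1, 5.0, 5.8, 7.1, 8.0])
print(savgol_sketch(noisy))
```

Because each window is fitted independently, data that already follows a polynomial of degree `polyorder` or lower passes through unchanged.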

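from scipy.signal import savgol_filter

# Signature: savgol_filter(input_data, window_length, polyorder)

# df_time_series is assumed to be a DataFrame with a numeric 'prediction' column.
# Window length 5, polynomial order 2.
df_time_series['savgol'] = df_time_series['prediction'].transform(lambda x: savgol_filter(x, 5, 2))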

The window length specifies how many data points are used to fit each local polynomial. The polynomial order specifies the degree of the fitted polynomial (with degree 1, each window is fitted by a straight line, i.e. linear regression).
The larger the window, the stronger the smoothing and the less closely the result follows the original signal, because each fit has to represent a greater portion of the signal.
For the Savitzky-Golay filter to work properly, choose an odd window length, so that the window has a central point, and a polynomial order lower than the window length.
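The effect of the window length can be checked on synthetic data. The sketch below assumes a noisy sine wave as input; the `roughness` helper (the standard deviation of point-to-point differences) is a hypothetical measure chosen only for this illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic data (an assumption for this example): a sine wave plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.2, size=t.size)

narrow = savgol_filter(noisy, window_length=5, polyorder=2)
wide = savgol_filter(noisy, window_length=31, polyorder=2)

def roughness(signal):
    """Hypothetical roughness measure: spread of point-to-point differences."""
    return np.std(np.diff(signal))

# A wider window gives a smoother (less rough) result.
print(roughness(noisy), roughness(narrow), roughness(wide))

# The polynomial order must be lower than the window length.
try:
    savgol_filter(noisy, window_length=5, polyorder=5)
except ValueError as err:
    print("rejected:", err)
```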

Whittaker Smoother

The Whittaker smoother fits a curve that represents the raw data but is penalized if consecutive points vary too much. It balances two goals: keeping the residual to the original data small and keeping the fitted curve smooth.
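One common way to write this balance is as a penalized least-squares problem: find the curve z that minimizes sum((y - z)^2) + lam * sum((difference of order d of z)^2). The sketch below assumes that formulation; the function name `whittaker_smooth`, the parameter names `lam` and `order`, and the sample data are hypothetical. It uses dense NumPy matrices for clarity, so it is only suitable for short series.

```python
import numpy as np

def whittaker_smooth(y, lam=10.0, order=2):
    """Penalized least-squares smoother (dense, illustrative sketch).

    lam   -- weight of the smoothness penalty (larger means smoother)
    order -- order of the differences being penalized
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Difference matrix: D @ z gives the order-th differences of z.
    D = np.diff(np.eye(n), n=order, axis=0)
    # Setting the gradient of the objective to zero gives (I + lam * D'D) z = y.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Hypothetical noisy data.
rng = np.random.default_rng(1)
raw = np.linspace(0.0, 1.0, 50) ** 2 + rng.normal(scale=0.05, size=50)
smooth = whittaker_smooth(raw, lam=100.0)
print(np.sum(np.diff(raw, n=2) ** 2), np.sum(np.diff(smooth, n=2) ** 2))
```

With `order=2`, a perfectly straight line has zero penalty and is returned unchanged, which is a quick sanity check for the implementation.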

Batch vs Stream

Batch	Stream
Larger datasets	Simpler analysis: aggregation / filtering
More complex analysis	Individual records / micro batches
Slower-moving data (hours, days)	Fast-moving data
Data is collected into a batch before being sent for processing	Data is sent for analysis piece by piece and processed in real time
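The difference can be illustrated with a small sketch that computes an average both ways. The function names and the sample readings are hypothetical; in a real stream the records would arrive over time rather than from a list.

```python
def batch_mean(records):
    """Batch: the whole dataset is collected first, then processed in one pass."""
    return sum(records) / len(records)

def stream_means(records):
    """Stream: each record updates the result as soon as it arrives."""
    count, total = 0, 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date result after every record

# Hypothetical sensor readings.
readings = [10.0, 12.0, 11.0, 13.0]
print(batch_mean(readings))          # one result once all data is in
print(list(stream_means(readings)))  # one updated result per record
```

The streaming version never holds more than a running count and total, which is why simple aggregations suit stream processing well.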

Stream

Roles vs Users

Roles	Users
Live in IAM	Live in IAM
Have permissions policies	Have permissions policies
Are assumed by services or other identities	Can act on their own
Have no long-term keys or login; use temporary credentials	Can have keys or login

Convert Pandas DataFrame to bytes-like object

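import io

# df is assumed to be an existing DataFrame; to_excel needs an Excel engine such as openpyxl.
towrite = io.BytesIO()
df.to_excel(towrite)  # write to BytesIO buffer
towrite.seek(0)  # rewind so the buffer can be read from the start

print(towrite)
> <_io.BytesIO object at 0x...>
print(type(towrite))
> <class '_io.BytesIO'>

To see the bytes-like object itself, use getvalue: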

print(towrite.getvalue())
> b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x00\x00!\x00<\xb
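The same idea works for other formats. The sketch below swaps Excel for CSV, so it needs no Excel engine, and reads the buffer back to confirm that nothing was lost. The sample DataFrame and variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# to_csv without a path returns a string; encode it to get bytes.
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Wrap the bytes in a file-like buffer, e.g. to pass to an upload API.
buffer = io.BytesIO(csv_bytes)

# Round trip: read the buffer back into a DataFrame.
restored = pd.read_csv(buffer)
print(restored.equals(df))
```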
