A Savitzky–Golay filter is a digital filter that can be applied to a set of digital data points to smooth the data, that is, to increase the precision of the data without distorting the signal tendency. This is achieved, in a process known as convolution, by fitting successive subsets of adjacent data points with a low-degree polynomial by the method of linear least squares. When the data points are equally spaced, an analytical solution to the least-squares equations can be found, in the form of a single set of "convolution coefficients" that can be applied to all data subsets to give estimates of the smoothed signal (or derivatives of the smoothed signal) at the central point of each subset.
In every step, the window moves and a different part of the original dataset is used. Then, the local polynomial function is fitted to the data in the window, and a new data point is calculated using the polynomial function. After that, the window moves to the next part of the dataset, and the process repeats.
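The window-by-window procedure described above can be sketched directly with NumPy. This is only an illustration of the idea, not how SciPy implements the filter: the function name `sliding_polyfit_smooth` and the choice to leave the edge points unchanged are assumptions made for this example.

```python
import numpy as np

def sliding_polyfit_smooth(y, window_length=5, polyorder=2):
    """Fit a polynomial to each window and keep its value at the window centre."""
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    smoothed = y.copy()              # edge points without a full window stay unchanged
    x = np.arange(-half, half + 1)   # local coordinates; 0 is the window centre
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        coeffs = np.polyfit(x, window, polyorder)  # least-squares polynomial fit
        smoothed[i] = np.polyval(coeffs, 0)        # new data point at the centre
    return smoothed

# Noisy sine wave as example input
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
noisy = np.sin(t) + rng.normal(scale=0.1, size=t.size)
smooth = sliding_polyfit_smooth(noisy, window_length=7, polyorder=2)
```

Refitting a polynomial for every window is slow; with equally spaced points the same result at the window centre can be obtained from one fixed set of convolution coefficients, which is what makes the filter efficient in practice.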
from scipy.signal import savgol_filter
# Signature: savgol_filter(x, window_length, polyorder)
# Smooth the 'prediction' column with a 5-point window and a degree-2 polynomial
df_time_series['savgol'] = savgol_filter(df_time_series['prediction'], window_length=5, polyorder=2)
- The `window_length` parameter specifies how many data points are used to fit each local polynomial. The `polyorder` parameter specifies the degree of the fitted polynomial (with a degree of 1, each window is fitted with a straight line, i.e. a linear regression).
- The larger the window, the stronger the smoothing and the less closely the fit follows local features of the signal, because each polynomial has to approximate a greater portion of it.
- For the Savitzky-Golay filter to work properly, choose an odd window size, so that each window has a central point, and a polynomial order lower than the window size.
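The effect of the window size and the constraint on the polynomial order can be checked with a small experiment. The synthetic signal and the `roughness` helper (sum of squared second differences) are assumptions introduced for this sketch; they are just one simple way to quantify how smooth a curve is.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal (illustrative data only)
rng = np.random.default_rng(42)
t = np.linspace(0, 4 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.2, size=t.size)

def roughness(y):
    """Sum of squared second differences: lower means smoother."""
    return float(np.sum(np.diff(y, n=2) ** 2))

small_window = savgol_filter(noisy, window_length=5, polyorder=2)
large_window = savgol_filter(noisy, window_length=31, polyorder=2)

# A larger window averages over more of the signal and gives a smoother curve
print(roughness(noisy), roughness(small_window), roughness(large_window))

# polyorder must be lower than window_length, otherwise SciPy raises ValueError
try:
    savgol_filter(noisy, window_length=5, polyorder=5)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```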
The Whittaker smoother fits a curve that follows the raw data but is penalized if successive points vary too much. It balances the residual to the original data against the “smoothness” of the fitted curve.
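A minimal sketch of this trade-off, assuming the common penalized least-squares formulation: minimize `sum((y - z)**2) + lam * sum((D @ z)**2)`, where `D` takes d-th order differences of the smoothed curve `z`. The function name `whittaker_smooth` and the parameter names `lam` and `d` are chosen for this example only.

```python
import numpy as np

def whittaker_smooth(y, lam=10.0, d=2):
    """Penalized least-squares smoother: data fidelity vs. roughness penalty."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # d-th order difference matrix, shape (n - d, n): D @ z gives the differences of z
    D = np.diff(np.eye(n), n=d, axis=0)
    # Setting the gradient of the objective to zero gives (I + lam * D.T @ D) z = y
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Noisy sine wave as example input
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 100)
noisy = np.sin(t) + rng.normal(scale=0.2, size=t.size)
smooth = whittaker_smooth(noisy, lam=50.0)
```

A larger `lam` puts more weight on smoothness; a smaller one keeps the curve closer to the raw data. The dense matrices used here are fine for short series, while longer ones would likely call for a sparse solver.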
| Batch | Stream |
| --- | --- |
| Larger datasets | Simpler analysis: aggregation / filtering |
| More complex analysis | Individual records / micro batches |
| Slower moving data (hours, days) | Data moves FAST |
| With batch processing, a batch of information is collected before being sent in for processing | With streaming, data is sent for analysis piece-by-piece, and processed in real time |
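The difference between the two models can be illustrated with a toy example: computing a mean over sensor readings. The function names and the sample data are hypothetical; real batch and streaming systems involve far more than this, so treat it only as a sketch of the processing pattern.

```python
from typing import Iterable, Iterator

def batch_mean(records: list[float]) -> float:
    """Batch: collect every record first, then process them in one go."""
    return sum(records) / len(records)

def streaming_mean(records: Iterable[float]) -> Iterator[float]:
    """Stream: update the result as each record arrives."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after every record

readings = [2.0, 4.0, 6.0, 8.0]
print(batch_mean(readings))            # 5.0
print(list(streaming_mean(readings)))  # [2.0, 3.0, 4.0, 5.0]
```

The batch version needs the whole dataset before it can answer, while the streaming version keeps only a running count and total and produces a result after each record.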

| Roles | Users |
| --- | --- |
| Live in IAM | Live in IAM |
| Have permissions policies | Have permissions policies |
| Only attach to other services | Can act on their own |
| Have neither keys nor login | Can have keys or login |
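import io
towrite = io.BytesIO()
df.to_excel(towrite)  # write the DataFrame (df) to the in-memory BytesIO buffer
towrite.read()  # the stream position is at the end after writing, so nothing is read
> b''
type(towrite)
> _io.BytesIO
If you want to see the bytes-like object, use getvalue():
towrite.getvalue()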
> b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x00\x00!\x00<\xb
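The empty `b''` result shown earlier is consistent with how the stream position of a `BytesIO` buffer behaves after a write. The sketch below uses a plain `write` instead of `to_excel` so that it runs without pandas; the variable names are chosen for illustration.

```python
import io

buffer = io.BytesIO()
buffer.write(b"hello")        # writing moves the stream position to the end

at_end = buffer.read()        # nothing left to read from the current position
whole = buffer.getvalue()     # full contents, regardless of the position

buffer.seek(0)                # rewind to the start
after_rewind = buffer.read()  # now read() returns everything

print(at_end, whole, after_rewind)  # b'' b'hello' b'hello'
```

Rewinding with `seek(0)` matters whenever the buffer is handed to code that calls `read()`, for example when passing it on as a file-like object.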