`ZScoreDetector`: `fit_score` returns `None` values #17

stefanDeveloper · 2024-08-30T15:20:20Z

Describe the bug
Fitting the example data [0. 0. 0. 0. 0. 1. 0. 1. 4. 0.] with the ZScoreDetector returns None values. I would expect -1 or some other numeric value while the window length has not been reached. Is there a reason to return None?

To Reproduce
Steps to reproduce the behavior:

Run example code
See score results:
[None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Example

from streamad.model import ZScoreDetector
from streamad.util import StreamGenerator, CustomDS
import numpy as np

data = {
    "start": "2024-07-02T12:52:45.000Z",
    "end": "2024-07-02T12:52:55.000Z",
    "data": [
        {
            "timestamp": "2024-07-02T12:52:50.988Z",
        },
        {
            "timestamp": "2024-07-02T12:52:52.092Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
    ],
}

# Convert timestamps to numpy datetime64
timestamps = np.array([
    np.datetime64(item["timestamp"])
    for item in data["data"]
])

# Sort timestamps
sorted_indices = np.argsort(timestamps)
timestamps = timestamps[sorted_indices]

# Set min_date and max_date
min_date = np.datetime64(data["start"])
max_date = np.datetime64(data["end"])

# Generate the time range from min_date to max_date with a 1 s interval
time_range = np.arange(min_date, max_date, np.timedelta64(1, 's'))

# Initialize an array to hold counts for each timestamp in the range
counts = np.zeros(time_range.shape, dtype=np.float64)

# Count occurrences of timestamps and fill the corresponding index in the counts array
unique_times, unique_indices, unique_counts = np.unique(timestamps, return_index=True, return_counts=True)
time_indices = ((unique_times - min_date)//1).astype('timedelta64[s]').astype(int)
counts[time_indices] = unique_counts

# Reshape into the required shape (n, 1)
X = counts.reshape(-1, 1).astype(np.float64)

ds = CustomDS(X, X)
stream = StreamGenerator(ds.data)
model = ZScoreDetector(window_len=1)

scores = []

for x in stream.iter_item():
    score = model.fit_score(x)
    scores.append(score)
    
print(scores)
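
A note on the counting step in the example: assigning `unique_counts` by index works for this data, but it would overwrite rather than accumulate if two distinct sub-second timestamps fell into the same one-second bin. A minimal sketch of an accumulating variant with `np.add.at`, assuming the same timestamps as above (the trailing `Z` is dropped here only to sidestep NumPy's timezone parsing; that is a choice of this sketch, not part of the original report):

```python
import numpy as np

# Same event times as in the report, without the trailing "Z".
raw = [
    "2024-07-02T12:52:50.988",
    "2024-07-02T12:52:52.092",
    "2024-07-02T12:52:53.095",
    "2024-07-02T12:52:53.095",
    "2024-07-02T12:52:53.095",
    "2024-07-02T12:52:53.095",
]
timestamps = np.array(raw, dtype="datetime64[ms]")

min_date = np.datetime64("2024-07-02T12:52:45")
max_date = np.datetime64("2024-07-02T12:52:55")

# One bin per second in [min_date, max_date).
time_range = np.arange(min_date, max_date, np.timedelta64(1, "s"))
counts = np.zeros(time_range.shape, dtype=np.float64)

# Whole seconds elapsed since min_date give the bin index of each event.
bin_idx = (timestamps - min_date).astype("timedelta64[s]").astype(int)

# np.add.at accumulates repeated indices instead of overwriting them.
np.add.at(counts, bin_idx, 1.0)

X = counts.reshape(-1, 1)
print(counts)
```

For the data in this report, both variants produce the same counts.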

Desktop (please complete the following information):

OS: Linux 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Package version (please complete the following information):

Version 0.3.1


Fengrui-Liu · 2024-09-02T13:44:14Z

Hi, @stefanDeveloper

-1 is a reserved design. We can use the plus and minus signs to indicate "spike" or "drop".

Of course, you can also modify the code to replace None with -1 if you prefer.
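
To illustrate why a numeric sentinel such as -1 can collide with a sign convention, here is a minimal sketch. It assumes scores are signed, with positive values read as a "spike" and negative values as a "drop", as described in the comment above. The helper `describe_score` and its `threshold` value are hypothetical and not part of StreamAD.

```python
from typing import Optional


def describe_score(score: Optional[float], threshold: float = 3.0) -> str:
    """Label a signed score; `threshold` is an illustrative cutoff only."""
    if score is None:
        # No score yet, e.g. the window has not been filled.
        return "warm-up"
    if score >= threshold:
        return "spike"
    if score <= -threshold:
        return "drop"
    return "normal"


labels = [describe_score(s) for s in [None, 0.0, 4.2, -1.0, -3.5]]
print(labels)
```

In this sketch a score of -1.0 is an ordinary, mildly negative value, so using -1 to mean "no score yet" would make the two cases indistinguishable, while None stays unambiguous.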

stefanDeveloper · 2024-09-18T07:47:32Z

> -1 is a reserved design. We can use the plus and minus signs to indicate "spike" or "drop".

Currently, we do something like:

for x in stream.iter_item():
    score = self.model.fit_score(x)
    if score is not None:
        self.anomalies.append(score)
    else:
        self.anomalies.append(0)  # or -1, -inf, ...

It works for our case. However, I think it would be nicer if this were handled directly in StreamAD.

> Of course, you can also modify the code to replace None with -1 if you prefer.

I see. Could we log a warning that the window length hasn't been reached? That way, users would not be surprised by None values when they start working with the scores. If so, I would send a pull request with this relatively small change.

stefanDeveloper added the bug and enhancement labels on Aug 30, 2024

stefanDeveloper assigned Fengrui-Liu on Aug 30, 2024
