ZScoreDetector: fit_score returns None values #17

Open
stefanDeveloper opened this issue Aug 30, 2024 · 2 comments

Comments

@stefanDeveloper

Describe the bug
Fitting the ZScoreDetector on the example data [0. 0. 0. 0. 0. 1. 0. 1. 4. 0.] returns None values. However, I would expect -1 or some other numeric value while the window length has not been reached. Is there a reason to return None?

To Reproduce
Steps to reproduce the behavior:

  1. Run example code
  2. See score results:
    [None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Example

from streamad.model import ZScoreDetector
from streamad.util import StreamGenerator, CustomDS
import numpy as np

data = {
    "start": "2024-07-02T12:52:45.000Z",
    "end": "2024-07-02T12:52:55.000Z",
    "data": [
        {
            "timestamp": "2024-07-02T12:52:50.988Z",
        },
        {
            "timestamp": "2024-07-02T12:52:52.092Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
        {
            "timestamp": "2024-07-02T12:52:53.095Z",
        },
    ],
}

# Convert timestamps to numpy datetime64
timestamps = np.array([
    np.datetime64(item["timestamp"])
    for item in data["data"]
])

# Sort timestamps chronologically
sorted_indices = np.argsort(timestamps)
timestamps = timestamps[sorted_indices]

# Set min_date and max_date
min_date = np.datetime64(data["start"])
max_date = np.datetime64(data["end"])

# Generate the time range from min_date to max_date with a 1 s interval
time_range = np.arange(min_date, max_date, np.timedelta64(1, 's'))

# Initialize an array to hold counts for each timestamp in the range
counts = np.zeros(time_range.shape, dtype=np.float64)

# Count timestamps per one-second bin; np.add.at accumulates correctly
# even when several distinct timestamps fall into the same second
time_indices = (timestamps - min_date).astype('timedelta64[s]').astype(int)
np.add.at(counts, time_indices, 1)

# Reshape into the required shape (n, 1)
X = counts.reshape(-1, 1).astype(np.float64)

ds = CustomDS(X, X)
stream = StreamGenerator(ds.data)
model = ZScoreDetector(window_len=1)

scores = []

for x in stream.iter_item():
    score = model.fit_score(x)
    scores.append(score)
    
print(scores)

Desktop (please complete the following information):

  • OS: Linux 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Package version (please complete the following information):

  • Version 0.3.1
stefanDeveloper added the bug (Something isn't working) and enhancement (New feature or request) labels on Aug 30, 2024
@Fengrui-Liu
Owner

Hi, @stefanDeveloper

-1 is a reserved design. We can use the plus and minus signs to indicate "spike" or "drop".

Of course, you can also modify the code to replace None with -1 if you prefer.
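
To illustrate the sign convention mentioned above, here is a minimal sketch, assuming a plain signed z-score computed against a window of past values. The helpers `signed_zscore` and `describe` are hypothetical and are not part of StreamAD; they only show why a negative value such as -1 would be ambiguous as a "no score yet" placeholder.

```python
import math
from typing import Optional, Sequence


def signed_zscore(window: Sequence[float], x: float) -> Optional[float]:
    """Hypothetical helper: signed z-score of x against past values.

    Returns None while there are too few values to estimate a spread.
    """
    if len(window) < 2:
        return None
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    if var == 0:
        # Simplifying assumption for this sketch: a flat window scores 0.0.
        return 0.0
    return (x - mean) / math.sqrt(var)


def describe(score: Optional[float]) -> str:
    """Map a signed score to a label; None is kept distinct from any number."""
    if score is None:
        return "warm-up"
    if score > 0:
        return "spike"
    if score < 0:
        return "drop"
    return "flat"


print(describe(signed_zscore([1, 1, 1, 3], 5)))  # value above the window mean
print(describe(signed_zscore([1, 2, 3], 0)))     # value below the window mean
print(describe(signed_zscore([], 1)))            # not enough history yet
```

Under this convention a score of -1 already means "a drop of one standard deviation", so it cannot also signal a missing score.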

@stefanDeveloper
Author

> -1 is a reserved design. We can use the plus and minus signs to indicate "spike" or "drop".

Currently, we do something like:

for x in stream.iter_item():
    score = self.model.fit_score(x)
    if score is not None:
        self.anomalies.append(score)
    else:
        self.anomalies.append(0) # or -1, -inf, ...

It works for our case. However, I think it would be nicer if StreamAD handled it directly.
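
A possible variant of the workaround above, as a sketch only: replace None after the fact with NaN instead of 0 or -1, so the placeholder cannot be mistaken for a real score or for a "drop". The function `fill_missing_scores` is a hypothetical helper, not a StreamAD function, and the `raw` list is the output reported in this issue.

```python
import math
from typing import Iterable, List, Optional


def fill_missing_scores(
    scores: Iterable[Optional[float]], fill: float = math.nan
) -> List[float]:
    """Replace None entries with `fill` and return a list of floats."""
    return [fill if s is None else float(s) for s in scores]


# Scores as reported in this issue.
raw = [None, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

filled = fill_missing_scores(raw)
# NaN entries can be filtered out before thresholding or plotting.
valid = [s for s in filled if not math.isnan(s)]
print(len(valid))
```

NaN keeps the result a homogeneous float sequence while staying distinguishable from every real score; passing `fill=0.0` reproduces the behavior of the loop above.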

> Of course, you can also modify the code to replace None with -1 if you prefer.

I see. Could we log a warning when the window length hasn't been reached? That way, users would not be surprised by None values when they start working with the scores. If so, I would send a pull request with this relatively small change.
