feat(nodestore): A file system-based node storage backend, with added support for S3 and GCS storage #76250
base: master
Conversation
Great job! But I worry a little about the performance of reading multiple objects as a batch (I haven't dug into it, just an intuitive guess).
I'm not sure how frequently the node storage is accessed or how large the stored files are. I don't think object storage services (such as S3 or GCS) are designed for frequent reads and writes of very small files, which may lead to performance issues or higher costs.
Based on our recorded data, each node payload is approximately 15 to 76 KB in size, with the majority around 15 KB.
I investigated the places where nodestore is written to and read from and found that there are only two scenarios that trigger these actions:
Writing is essentially offline data stream processing, so a slower speed is acceptable. The QPS (queries per second) during reading is very low, so it is not a significant issue either. Additionally, object storage costs are definitely the lowest, which is why many projects in recent years (like Paimon and OpenObserve) support separating storage (S3 and other object stores) from computation.
@klboke Awesome. I was concerned that reading a large number of tiny files from object storage services might incur higher costs than reading them from a table store service. However, according to your analysis, the files are not that small and the access frequency is not that high, so my concern may be unnecessary.
Another point worth noting is that Sentry stores a large amount of highly repetitive ASCII text, and table store services can achieve high compression rates for it, thereby reducing storage costs. Object storage services, however, usually do not support transparent compression. Each piece of data can be compressed separately, but that incurs additional computational overhead, and the compression ratio may be hard to match compared to a table store. Anyway, having one more option is always good. I'm just pointing out some potential drawbacks for discussion, not to undermine your work. Thank you for your contribution.
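As a rough sketch of the per-object compression idea mentioned above (not part of this PR; zlib and the function names are illustrative assumptions), each node payload could be compressed before upload and decompressed on read:

    # Illustrative only: per-object compression before writing to object storage.
    import zlib

    def compress_node(data: bytes) -> bytes:
        # Node payloads are repetitive JSON/ASCII text, so zlib shrinks them well.
        return zlib.compress(data, 6)

    def decompress_node(blob: bytes) -> bytes:
        return zlib.decompress(blob)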
Thank you very much for your reminder and suggestions. If we switch to S3, considering our 3TB nodestore (which is actually more than most people have), cost won't be an issue at all. I would prefer storing plain JSON files directly, as it would be more convenient for observing and debugging issues.
if not settings.DEBUG and options_store.get("filestore.backend") == "filesystem":
    raise ValueError("Local filesystem should only be used in development!")
This would break existing self-hosted users that don't use (or don't have access to) S3 compatible services. This should just log a warning instead of throwing a ValueError.
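A minimal sketch of that suggestion, reusing the names from the diff above (settings and options_store come from the surrounding PR code):

    # Sketch: warn instead of refusing to start outside DEBUG.
    import logging

    logger = logging.getLogger("sentry.nodestore")

    if not settings.DEBUG and options_store.get("filestore.backend") == "filesystem":
        logger.warning(
            "The filesystem filestore backend is intended for development; "
            "consider an S3- or GCS-backed filestore in production."
        )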
I don't think so. The default NodeStore is sentry.nodestore.django.DjangoNodeStorage rather than this one, so it won't break any existing users. If you want to configure this FileSystemNodeStorage in a non-debug environment, you must also configure the FileStore.
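For context, a configuration sketch of what that pairing might look like in sentry.conf.py; SENTRY_NODESTORE and the filestore.* options are existing Sentry settings, while the nodestore class path and the option values below are assumptions for illustration:

    # sentry.conf.py (sketch; the nodestore class path is an assumed name from this PR)
    SENTRY_NODESTORE = "sentry.nodestore.filesystem.FileSystemNodeStorage"

    # The node store delegates blob I/O to the configured FileStore backend.
    SENTRY_OPTIONS["filestore.backend"] = "s3"
    SENTRY_OPTIONS["filestore.options"] = {
        "bucket_name": "my-sentry-nodestore",  # placeholder values
        "access_key": "<access-key>",
        "secret_key": "<secret-key>",
    }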
Ping to bump this PR.
I would really like to see built-in support for an S3 node store (even if its performance isn't officially endorsed).
I think the concern here is: FileSystemNodeStorage, by its name, is supposed to use the file system. It is not intuitive that enabling the "S3 File Store" makes FileSystemNodeStore push things to S3. I think a better way would be to explicitly create a new S3NodeStore class with that logic.
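For what it's worth, a rough sketch of what such a dedicated class could look like (not this PR's code; the _get_bytes/_set_bytes/delete hook names are assumed from Sentry's NodeStorage base class, and the S3 calls are plain boto3):

    import boto3
    from botocore.exceptions import ClientError

    from sentry.nodestore.base import NodeStorage


    class S3NodeStorage(NodeStorage):
        def __init__(self, bucket_name, prefix="nodestore/", **client_kwargs):
            self.bucket_name = bucket_name
            self.prefix = prefix
            self.client = boto3.client("s3", **client_kwargs)

        def _key(self, id):
            return f"{self.prefix}{id}"

        def _get_bytes(self, id):
            try:
                response = self.client.get_object(Bucket=self.bucket_name, Key=self._key(id))
            except ClientError as exc:
                if exc.response["Error"]["Code"] == "NoSuchKey":
                    return None
                raise
            return response["Body"].read()

        def _set_bytes(self, id, data, ttl=None):
            # TTL is ignored here; S3 lifecycle rules would handle expiry instead.
            self.client.put_object(Bucket=self.bucket_name, Key=self._key(id), Body=data)

        def delete(self, id):
            self.client.delete_object(Bucket=self.bucket_name, Key=self._key(id))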
See if #82126 makes sense to everyone.
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you add the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
A file system-based node storage backend, with added support for S3 and GCS storage.
My starting point is as follows:
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.