
Be explicit about scraping, streaming and push #41

Closed
mattbostock opened this issue Oct 9, 2017 · 15 comments

@mattbostock
Contributor

In #11 we mention scraping but I can imagine that someone might want to use the exposition format for streaming (or pushing) metrics.

We should consider such use cases, their implications and whether the specification should allow for them.

@mattbostock changed the title from "Be explicit about scraping vs streaming vs push" to "Be explicit about scraping, streaming and push" on Oct 9, 2017
@manolama

I think the spec should be completely agnostic of the source method.

@brian-brazil
Contributor

I don't think we get to be agnostic, but we should cover both push and pull.

@RichiH
Member

RichiH commented Oct 10, 2017

I would like to have it decoupled if at all possible, but we will find out either way. The result needs to support both, and ideally show a way to translate one into the other.

@pimvanpelt

I strongly support pushing and pulling, and we should be explicit about that in the API spec.
Regarding pushing, my best examples are instrumentation agents behind NAT (for example, cell phones and IoT devices) and system interconnects (for example, Prometheus pushing to StackDriver).

@DirectXMan12

cc @dashpole

Standards around the actual transport, at least in the context of pull and streaming, would be nice for making it easy on the developer end to use a library and get compatibility with any pull- or streaming-compatible OpenMetrics storage/TSDB/etc.

For instance, on-and-off over the past few months, @dashpole and I have been looking at replacing the bespoke metrics API that the Kubelet serves in Kubernetes (part of a broader effort to overhaul the internal Kubelet --> in-cluster storage --> Kubernetes controller metrics pipeline). Ideally, whatever we decide on would also be easily consumable by other monitoring pipelines as well, which makes OpenMetrics a good candidate :-).

We want streaming support for metrics collection at rapid intervals, and wanted to see if anyone else had any experience/feedback/thoughts from the OpenMetrics side. Long-term, it would be awesome if whatever we proposed/adopted for Kubernetes ended up being/converging with a standard for streaming OpenMetrics, so it "just worked" with other solutions too.

@SuperQ
Member

SuperQ commented Nov 28, 2018

@DirectXMan12 "Rapid" is relative, it would help to talk about specifics. Specific intervals, how many metrics per interval, etc.

@dashpole

We have had requests for intervals as low as 100ms. Given a 100 pods/node limit, 2 metrics/container, and assuming 2 containers/pod, we would ideally need to be able to push ~400 metrics in that time interval from each node. This is all in an ideal world, so take it with a grain of salt.

@SuperQ
Member

SuperQ commented Nov 29, 2018

We've done a lot of work to make the Go implementation low impact, especially when it comes to the cost of instrumentation (15 CPU nanoseconds to increment a counter, for example). But we haven't done as much optimization of the collection, as we mostly target 15s intervals.

With the current Go implementation, I grabbed a typical app. This app exposes ~4700 metrics in ~35ms. This yields 7.8us per metric, so for your example it would take ~3.1ms to gather 400 metrics. That seems reasonable to me for a 100ms interval. There has been some recent work to cut the scrape cost; I'm not sure if my above example includes that work, so I'll have to do some more digging.

In an ideal world, each container/pod would have a separate metrics endpoint, exposing data directly like my above example does. This means that if an application has 100 metrics itself, it can be scraped in under 1ms, more than good enough for 100ms intervals. Pollers (like Prometheus) can then parallelize the work of data collection.
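
For anyone wanting to reproduce numbers like the above, here is a minimal, hypothetical sketch (synthetic registry, made-up metric names) that times Gather plus text encoding using client_golang and prometheus/common. It only approximates the scrape path discussed here and says nothing about what the spec should require:

```go
// Rough sketch: time how long it takes to gather and text-encode a registry
// of ~4700 synthetic counters, approximating the app described above.
package main

import (
	"bytes"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/common/expfmt"
)

func main() {
	reg := prometheus.NewRegistry()

	// Synthetic workload: one counter per metric family; the names are made up.
	for i := 0; i < 4700; i++ {
		c := prometheus.NewCounter(prometheus.CounterOpts{
			Name: fmt.Sprintf("demo_metric_%d_total", i),
			Help: "Synthetic counter for measuring gather/encode cost.",
		})
		reg.MustRegister(c)
		c.Inc()
	}

	start := time.Now()
	mfs, err := reg.Gather()
	if err != nil {
		panic(err)
	}
	var buf bytes.Buffer
	enc := expfmt.NewEncoder(&buf, expfmt.FmtText)
	for _, mf := range mfs {
		if err := enc.Encode(mf); err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("gathered and encoded %d metric families in %v (%v per family)\n",
		len(mfs), elapsed, elapsed/time.Duration(len(mfs)))
}
```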

@RichiH
Member

RichiH commented Nov 29, 2018

Please note that push/streaming will probably look the same on the wire, but be called by a different name to avoid confusion.
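
To illustrate the "same on the wire" point, here is a hypothetical sketch (the /stream endpoint name and 100ms interval are made up, and this is not part of any spec) that repeatedly writes the ordinary text exposition format over one long-lived HTTP response; only the initiation and lifetime of the connection differ from a normal scrape:

```go
// Hypothetical streaming sketch: push the same text exposition format that a
// scrape would return, but repeatedly over one long-lived HTTP response.
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/common/expfmt"
)

func main() {
	reg := prometheus.NewRegistry()
	ticks := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "demo_ticks_total",
		Help: "Synthetic counter incremented once per streamed interval.",
	})
	reg.MustRegister(ticks)

	// Made-up endpoint; a real proposal would still need to specify framing,
	// content negotiation, backpressure, and so on.
	http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		enc := expfmt.NewEncoder(w, expfmt.FmtText)
		ticker := time.NewTicker(100 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-r.Context().Done():
				return
			case <-ticker.C:
				ticks.Inc()
				mfs, err := reg.Gather()
				if err != nil {
					return
				}
				for _, mf := range mfs {
					if err := enc.Encode(mf); err != nil {
						return
					}
				}
				flusher.Flush()
			}
		}
	})

	log.Fatal(http.ListenAndServe(":9100", nil))
}
```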

@dashpole

dashpole commented Dec 1, 2018

We also have in-cluster storage for these metrics, which is currently the metrics-server. It would be expected to ingest ~400 metrics / 100ms from up to 5000 nodes. In its current state (scraping a JSON HTTP endpoint every 30 seconds), nearly all of its CPU time is spent serializing/deserializing, and we are hoping to move to a more efficient format.

@SuperQ
Member

SuperQ commented Dec 1, 2018

@dashpole

400 metrics / 100ms from up to 5000 nodes

This is confusing; are you saying 4000 samples/second, or 20M samples/second?

Prometheus used to support JSON, but that format was dropped back in 2013 due to CPU overhead. With the current format we can ingest ~200,000 samples/second/cpu.
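
For context on the decode side, here is a minimal sketch of parsing the text format with the TextParser from prometheus/common. Note that Prometheus's own ingestion path uses a more heavily optimized parser, so this only illustrates the format, not the ~200,000 samples/second/cpu figure:

```go
// Sketch: parse a small text-format exposition into metric families.
package main

import (
	"fmt"
	"strings"

	"github.com/prometheus/common/expfmt"
)

func main() {
	exposition := `# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
`
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(strings.NewReader(exposition))
	if err != nil {
		panic(err)
	}
	for name, mf := range families {
		fmt.Printf("%s: %d series\n", name, len(mf.Metric))
	}
}
```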

@RichiH
Member

RichiH commented Dec 1, 2018

NB: ITYM samples/(second*core) (hard to write without latex).

@DirectXMan12

In an ideal world, each container/pod would have a separate metrics endpoint

That's not necessarily true for monitoring system-level metrics around the container runtimes -- it generally makes more sense there to have the container-runtime do the monitoring and report those aspects together from a single endpoint.

Prometheus used to support json, but was replaced due to CPU overhead back in 2013

Exactly our problem :-)

With the current format we can ingest ~200,000 samples/second/cpu.

Given:

  • mpc means "metrics per container"
  • cpp means "containers per pod"
  • ppn means "pods per node"

we want (2 mpc * 2 cpp * 100 ppn * 5000 nodes) = 400 metrics * 5000 nodes = 2,000,000 metrics / 100ms, parallelized in 400-metric chunks (since there are 400 metrics per node). I think @dashpole was able to achieve something close to the desired load using some custom proto + streaming, but he can probably fill in the details a bit better. With the current text format, without the Prometheus LRU cache trick (which takes some digging in the Prometheus codebase to find and extract), the CPU load was a lot higher, IIRC.
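
Spelling the arithmetic out (a back-of-the-envelope sketch only, with the constants taken from the figures above):

```go
// Back-of-the-envelope check of the load figures quoted above.
package main

import "fmt"

func main() {
	const (
		metricsPerContainer = 2
		containersPerPod    = 2
		podsPerNode         = 100
		nodes               = 5000
		intervalsPerSecond  = 10 // one collection every 100ms
	)

	perNodePerInterval := metricsPerContainer * containersPerPod * podsPerNode
	clusterPerInterval := perNodePerInterval * nodes
	clusterPerSecond := clusterPerInterval * intervalsPerSecond

	fmt.Println(perNodePerInterval, "metrics per node per 100ms")    // 400
	fmt.Println(clusterPerInterval, "metrics cluster-wide per 100ms") // 2,000,000
	fmt.Println(clusterPerSecond, "samples/second cluster-wide")      // 20,000,000
}
```

Which, if every node really is collected every 100ms, works out to ~4,000 samples/second per node and ~20M samples/second cluster-wide, answering the earlier question.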

@SuperQ
Member

SuperQ commented Dec 4, 2018

That's not necessarily true for monitoring system-level metrics around the container runtimes -- it generally makes more sense there to have the container-runtime do the monitoring and report those aspects together from a single endpoint.

Right, I'm talking about the applications in the containers. It is not a good idea to have the container runtime deal with these, as you can have thousands of metrics per container. The applications themselves already have metrics endpoints declared. Adding this to the container runtime is also going to create a SPoF/bottleneck. This is exactly why Borgmon and Prometheus work the way they do.

If you want to improve cluster monitoring efficiency, why not contribute directly to the Prometheus project?

@DirectXMan12

Right, I'm talking about the applications in the containers. It is not a good idea to have the container runtime deal with these, as you can have thousands of metrics per container

Sure, agreed, for application-specific metrics. Those are exposed by the application, and are scraped directly by Prometheus or what-have-you. That wasn't what the example provided above was about.

In Kubernetes, there are also system-level metrics determined by inspecting cgroups from outside the containers, etc., which the apps don't know about. Those are computed by the container runtimes or the kubelet (depending on the metric) and exposed via the kubelet or by the container runtime directly. The applications have no idea how to monitor those metrics, and they shouldn't have to. The example we were talking about was those metrics.

@RichiH closed this as completed Nov 14, 2020