Skip to content

Commit

Permalink
doc: add a page on profiling (near#11393)
Browse files Browse the repository at this point in the history
This includes both the somewhat well known information on how to use
perf as well as text on bytehound, which is the only tool that seems to
work for memory profiling neard.

Fixes near#11389
  • Loading branch information
nagisa authored May 29, 2024
1 parent 47cd2ea commit ccfbdeb
Show file tree
Hide file tree
Showing 2 changed files with 136 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@
- [Run Gas Estimations](./practices/workflows/gas_estimations.md)
- [Localnet on many machines](./practices/workflows/localnet_on_many_machines.md)
- [IO tracing](./practices/workflows/io_trace.md)
- [Profiling](./practices/workflows/profiling.md)
- [Working with OpenTelemetry Traces](./practices/workflows/otel_traces.md)
- [Code Style](./practices/style.md)
- [Documentation](./practices/docs.md)
Expand Down
135 changes: 135 additions & 0 deletions docs/practices/workflows/profiling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# Profiling neard

## Sampling performance profiling

It is a common task to need to look where `neard` is spending time. Outside of instrumentation
we've also been successfully using sampling profilers to gain an intuition over how the code works
and where it spends time.

Linux's `perf` has been a tool of choice in most cases. In order to use it, first prepare your
system:

```command
$ sudo sysctl kernel.perf_event_paranoid=0
$ sudo sysctl kernel.restrict_kptr=0
```

<blockquote style="background: rgba(255, 200, 0, 0.1); border: 5px solid rgba(255, 200, 0, 0.4);">

Beware that this gives access to certain kernel state and environment to the unprivileged user.
Once investigation is over either set these properties back to the more restricted settings or,
better yet, reboot.

Definitely do not run untrusted code after running these commands.

</blockquote>

Then collect a profile as such:

```command
$ perf record -e cpu-clock -F1000 -g --call-graph dwarf,65528 YOUR_COMMAND_HERE
```

(you may omit `-e cpu-clock` in certain environments for a more precise sampling timer.)

This will produce a profile file in the current working directory. Although you can inspect the
profile already with `perf report`, we've had much better experience with using [Firefox
Profiler](https://profiler.firefox.com/) as the viewer. Although Firefox Profiler supports `perf`
and many other different data formats, for `perf` in particular a conversion step is necessary:

```command
$ perf script -F +pid > mylittleprofile.script
```

Then, load this `mylittleprofile.script` with the tool.

## Memory profiling

`neard` is a pretty memory-intensive application with many allocations occurring constantly.
Although Rust makes it pretty hard to introduce memory problems, it is still possible to leak
memory or to inadvertently retain too much of it.

Unfortunately, “just” throwing a random profiler at neard does not work for many reasons. Valgrind
for example is introducing enough slowdown to significantly alter the behaviour of the run, not to
mention that to run it successfully and without crashing it will be necessary to comment out
`neard`’s use of `jemalloc` for yet another substantial slowdown.

So far the only took that worked out well out of the box was
[`bytehound`](https://github.com/koute/bytehound). Using it is quite straightforward, but needs
Linux, and ability to execute privileged commands.

First, checkout and build the profiler (you will need to have nodejs `yarn` thing available as
well):

```command
$ git clone [email protected]:koute/bytehound.git
$ cargo build --release -p bytehound-preload
$ cargo build --release -p bytehound-cli
```

You will also need a build of your `neard`, once you have that, give it some ambient cabapilities
necessary for profiling:

```command
$ sudo sysctl kernel.perf_event_paranoid=0
$ sudo setcap 'CAP_SYS_ADMIN+ep' /path/to/neard
$ sudo setcap 'CAP_SYS_ADMIN+ep' /path/to/libbytehound.so
```

And finally run the program with the profiler enabled (in this case `neard run` command is used):

```command
$ /lib64/ld-linux-x86-64.so.2 --preload /path/to/libbytehound.so /path/to/neard run
```

### Viewing the profile

<blockquote style="background: rgba(255, 200, 0, 0.1); border: 5px solid rgba(255, 200, 0, 0.4);">

Do note that you will need about twice the amount of RAM as the size of the input file in order to
load it successfully.

</blockquote>

Once enough profiling data has been gathered, terminate the program. Use the `bytehound` CLI tool
to operate on the profile. I recommend `bytehound server` over directly converting to e.g. heaptrack
format using other subcommands as each invocation will read and parse the profile data from
scratch. This process can take quite some time. `server` parses the inputs once and makes
conversions and other introspection available as interactive steps.

You can use `server` interface to inspect the profile in one of the few ways, download a flamegraph
or a heaptrack file. Heaptrack in particular provides some interesting additional visualizations
and has an ability to show memory use over time from different allocation sources.

I personally found it a bit troublesome to figure out how to open the heaptrack file from the GUI.
However, `heaptrack myexportedfile` worked perfectly. I recommend opening the file exactly this way.

### Troubleshooting

#### No output file

1. Set a higher profiler logging level. Verify that the profiler gets loaded at all. If you're not
seeing any log messages, then something about your working environment is preventing the loader
from including the profiler library.
2. Try specifying an exact output file with e.g. environment variables that the profiler reads.

#### Crashes

If the profiled `neard` crashes in your tests, there are a couple things you can try to get past
it. First, makes sure your binary has the necessary ambient capabilities (`setcap` command above
needs to be executed every time binary is replaced!)

Another thing to try is disabling `jemalloc`. Comment out this code in `neard/src/main.rs`:

```rust
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
```

The other thing you can try is different profilers, different versions of the profilers or
different options made available (in particular disabling the shadow stack in bytehound), although
I don't have specific recommendations here.

We don't know what exactly it is about neard that leads to it crashing under the profiler as easily
as it does. I have seen valgrind reporting that we have libraries that are deallocating with a
wrong size class, so that might be the reason? Do definitely look into this if you have time.

0 comments on commit ccfbdeb

Please sign in to comment.