Add perf uncore events
Signed-off-by: Paweł Szulik <[email protected]>
Paweł Szulik committed Jun 23, 2020
1 parent 0ac6b77 commit 5641a0f
Showing 26 changed files with 1,153 additions and 165 deletions.
131 changes: 124 additions & 7 deletions docs/runtime_options.md
@@ -158,23 +158,124 @@ automatically.
* `grouping` - when the measured events are used to calculate derivative metrics, it is reasonable to
measure them in a transactional manner: all the events in a group must be accounted for in the same period of time. Keep
in mind that it is impossible to group more events than there are counters available.
* `uncore events` - events which can be counted by PMUs outside the core.
* `PMU` - Performance Monitoring Unit

#### Getting config values
Using perf tools:
* Identify the event in `perf list` output.
* Execute command: `perf stat -I 5000 -vvv -e EVENT_NAME`
* Find the `perf_event_attr` section in the `perf stat` output, then copy the `config` and `type` fields to the configuration file.

```
------------------------------------------------------------
perf_event_attr:
type 18
size 112
config 0x304
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
```
* The configuration file should look like:
```json
{
  "core": {
    "events": [
      ["EVENT_NAME"]
    ],
    "custom_events": [
      {
        "type": 18,
        "config": [
          "0x304"
        ],
        "name": "EVENT_NAME"
      }
    ]
  },
  "uncore": {
    "events": [
      ["EVENT_NAME"]
    ],
    "custom_events": [
      {
        "type": 18,
        "config": [
          "0x304"
        ],
        "name": "EVENT_NAME"
      }
    ]
  }
}
```
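
As a rough illustration of how such a file could be consumed, the sketch below decodes the JSON and converts the hexadecimal `config` strings to integers. The type and function names (`perfConfig`, `eventGroup`, `customEvent`, `loadPerfConfig`, `parseConfigValues`) are hypothetical and chosen for this example only; they are not taken from cAdvisor.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// customEvent mirrors one entry of "custom_events" in the JSON above.
// Hypothetical type for illustration; not cAdvisor's own definition.
type customEvent struct {
	Type   uint32   `json:"type,omitempty"`
	Config []string `json:"config"`
	Name   string   `json:"name"`
}

// eventGroup mirrors the "core" and "uncore" objects (hypothetical type).
type eventGroup struct {
	Events       [][]string    `json:"events"`
	CustomEvents []customEvent `json:"custom_events"`
}

// perfConfig mirrors the whole configuration file (hypothetical type).
type perfConfig struct {
	Core   eventGroup `json:"core"`
	Uncore eventGroup `json:"uncore"`
}

// loadPerfConfig decodes the JSON document into perfConfig.
func loadPerfConfig(data []byte) (perfConfig, error) {
	var cfg perfConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// parseConfigValues converts strings such as "0x304" into integers.
// Base 0 lets strconv infer the base from the "0x" prefix.
func parseConfigValues(values []string) ([]uint64, error) {
	out := make([]uint64, 0, len(values))
	for _, v := range values {
		n, err := strconv.ParseUint(v, 0, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid config value %q: %w", v, err)
		}
		out = append(out, n)
	}
	return out, nil
}

// sampleConfig reuses the values copied from the perf stat output above.
const sampleConfig = `{
  "uncore": {
    "events": [["EVENT_NAME"]],
    "custom_events": [
      {"type": 18, "config": ["0x304"], "name": "EVENT_NAME"}
    ]
  }
}`

func main() {
	cfg, err := loadPerfConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	for _, ev := range cfg.Uncore.CustomEvents {
		values, err := parseConfigValues(ev.Config)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: type=%d config=%#x\n", ev.Name, ev.Type, values)
	}
}
```

Keeping `config` as strings in the file and parsing them with base 0 means the values can be pasted from `perf stat` output unchanged.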

Config values can also be obtained from:
* [Intel® 64 and IA32 Architectures Performance Monitoring Events](https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia32-architectures-performance-monitoring-events.html)


##### Uncore Events configuration
An uncore event name should have the form `PMU_PREFIX/event_name`, where **PMU_PREFIX** means
that statistics are counted on all PMUs whose names start with that prefix.

Let's explain this by example:

```json
{
  "uncore": {
    "events": [
      ["uncore_imc/cas_count_read"],
      ["uncore_imc_0/cas_count_write"],
      ["cas_count_all"]
    ],
    "custom_events": [
      {
        "config": [
          "0x304"
        ],
        "name": "uncore_imc_0/cas_count_write"
      },
      {
        "type": 19,
        "config": [
          "0x304"
        ],
        "name": "cas_count_all"
      }
    ]
  }
}
```

- `uncore_imc/cas_count_read` - because of the `uncore_imc` prefix and no entry in custom events,
it is counted by **all** Integrated Memory Controller PMUs, with the config provided by the libpfm package
(using this function: https://man7.org/linux/man-pages/man3/pfm_get_os_event_encoding.3.html).

- `uncore_imc_0/cas_count_write` - because of the `uncore_imc_0` prefix and an entry in custom events, it is counted by the `uncore_imc_0` PMU with the provided config.

- `cas_count_all` - because its entry in custom events has a `type` field, the event is counted by the PMU of type **19** with the provided config.
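
The prefix rule described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `splitUncoreEvent` and `matchPMUs` are hypothetical helpers, and the list of PMU names is passed in directly rather than discovered on the system.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUncoreEvent splits "PMU_PREFIX/event_name" into its two parts.
// For a name without "/", ok is false and the whole name is the event.
// Hypothetical helper for illustration.
func splitUncoreEvent(name string) (prefix, event string, ok bool) {
	i := strings.Index(name, "/")
	if i < 0 {
		return "", name, false
	}
	return name[:i], name[i+1:], true
}

// matchPMUs returns every PMU whose name starts with prefix.
// Plain prefix matching means "uncore_imc_1" would also match a PMU
// named "uncore_imc_10", if one existed.
func matchPMUs(prefix string, available []string) []string {
	var matched []string
	for _, pmu := range available {
		if strings.HasPrefix(pmu, prefix) {
			matched = append(matched, pmu)
		}
	}
	return matched
}

func main() {
	// Assumed set of PMU names, for the example only.
	available := []string{"uncore_imc_0", "uncore_imc_1", "uncore_cbox_0"}

	for _, name := range []string{
		"uncore_imc/cas_count_read",
		"uncore_imc_0/cas_count_write",
		"cas_count_all",
	} {
		prefix, event, ok := splitUncoreEvent(name)
		if !ok {
			fmt.Printf("%s: no PMU prefix, a custom event with a type field is needed\n", event)
			continue
		}
		fmt.Printf("%s: counted on %v\n", event, matchPMUs(prefix, available))
	}
}
```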

### Further reading

* [perf Examples](http://www.brendangregg.com/perf.html) on Brendan Gregg's blog
* [Kernel Perf Wiki](https://perf.wiki.kernel.org/index.php/Main_Page)
* `man perf_event_open`
* [perf subsystem](https://github.com/torvalds/linux/tree/v5.6/kernel/events) in Linux kernel
* [Uncore Performance Monitoring Reference Manuals](https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html#uncore)

See example configuration below:
```json
{
"events": [
["instructions"],
["instructions_retired"]
],
"custom_events": [
[
"core": {
"events": [
["instructions"],
["instructions_retired"]
],
"custom_events": [
{
"type": 4,
"config": [
@@ -183,7 +284,20 @@ See example configuration below:
"name": "instructions_retired"
}
]
]
},
"uncore": {
"events": [
["uncore_imc/cas_count_read"]
],
"custom_events": [
{
"config": [
"0xc04"
],
"name": "uncore_imc/cas_count_read"
}
]
}
}
```

@@ -194,6 +308,9 @@ interface that majority of users will rely on.
* `instructions_retired` will be measured as non-grouped event and is specified using an advanced API that allows
to specify any perf event available (some of them are not named and can't be specified with plain string). Event name
should be a human readable string that will become a metric name.
* `cas_count_read` will be measured as a non-grouped uncore event on all Integrated Memory Controller Performance Monitoring Units, because the `type` field is unset and
the event name has the `uncore_imc` prefix.


## Storage driver specific instructions:

30 changes: 30 additions & 0 deletions info/v1/container.go
@@ -874,6 +874,32 @@ type ResctrlStats struct {
Cache []CacheStats `json:"cache,omitempty"`
}

// PerfUncoreStat represents the value of a single monitored perf uncore event.
type PerfUncoreStat struct {
	// Indicates scaling ratio for an event: time_running/time_enabled
	// (amount of time that the event was being measured divided by
	// amount of time that the event was enabled for).
	// A value of 1.0 indicates that no multiplexing occurred. A value close
	// to 0 indicates that the event was measured for a short time and the
	// event's value might be inaccurate.
	// See: https://lwn.net/Articles/324756/
	ScalingRatio float64 `json:"scaling_ratio"`

	// Value represents the value of the perf event retrieved from the OS. It is
	// normalized against ScalingRatio and takes multiplexing into
	// consideration.
	Value uint64 `json:"value"`

	// Name is the human-readable name of an event.
	Name string `json:"name"`

	// Socket that the perf event was measured on.
	Socket int `json:"socket"`

	// PMU is the Performance Monitoring Unit which collected these stats.
	PMU string `json:"pmu"`
}
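
The relationship between `ScalingRatio` and `Value` described in the comments above can be illustrated with a small sketch. It shows the usual extrapolation for multiplexed counters, assuming a raw count plus `time_enabled` and `time_running` readings. The function name `scaleCount` and its handling of zero times are assumptions for this example, not the code from this commit.

```go
package main

import "fmt"

// scaleCount extrapolates a raw counter reading to the full enabled
// window and returns it together with the scaling ratio
// (timeRunning / timeEnabled). Hypothetical helper for illustration.
//
// If either time is zero the event was never measured, so the raw
// value is returned unchanged with a ratio of 0.
func scaleCount(raw, timeEnabled, timeRunning uint64) (uint64, float64) {
	if timeEnabled == 0 || timeRunning == 0 {
		return raw, 0
	}
	ratio := float64(timeRunning) / float64(timeEnabled)
	return uint64(float64(raw) / ratio), ratio
}

func main() {
	// The event was scheduled on a counter for half of the time it was
	// enabled, so the raw count is doubled.
	value, ratio := scaleCount(500, 1000, 500)
	fmt.Printf("value=%d scaling_ratio=%.2f\n", value, ratio)
	// prints "value=1000 scaling_ratio=0.50"
}
```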

type UlimitSpec struct {
Name string `json:"name"`
SoftLimit int64 `json:"soft_limit"`
@@ -926,6 +952,10 @@ type ContainerStats struct {
// Statistics originating from perf events
PerfStats []PerfStat `json:"perf_stats,omitempty"`

// Statistics originating from perf uncore events.
// Applies only to the root container.
PerfUncoreStats []PerfUncoreStat `json:"perf_uncore_stats,omitempty"`

// Referenced memory
ReferencedMemory uint64 `json:"referenced_memory,omitempty"`

13 changes: 13 additions & 0 deletions info/v1/machine.go
@@ -71,6 +71,19 @@ func (n *Node) FindCore(id int) (bool, int) {
return false, -1
}

// FindCoreByThread reports whether the Node has a Core that contains the provided thread,
// and returns that Core's index in the Node's Cores slice.
// If no such Core is found, it returns false and -1.
func (n *Node) FindCoreByThread(thread int) (bool, int) {
	// The loop variable is named core so that it does not shadow the receiver n.
	for i, core := range n.Cores {
		for _, t := range core.Threads {
			if t == thread {
				return true, i
			}
		}
	}
	return false, -1
}

func (n *Node) AddThread(thread int, core int) {
var coreIdx int
if core == -1 {
6 changes: 6 additions & 0 deletions info/v2/container.go
@@ -139,6 +139,9 @@ type DeprecatedContainerStats struct {
CustomMetrics map[string][]v1.MetricVal `json:"custom_metrics,omitempty"`
// Perf events counters
PerfStats []v1.PerfStat `json:"perf_stats,omitempty"`
// Statistics originating from perf uncore events.
// Applies only to the root container.
PerfUncoreStats []v1.PerfUncoreStat `json:"perf_uncore_stats,omitempty"`
// Referenced memory
ReferencedMemory uint64 `json:"referenced_memory,omitempty"`
// Resource Control (resctrl) statistics
@@ -173,6 +176,9 @@ type ContainerStats struct {
CustomMetrics map[string][]v1.MetricVal `json:"custom_metrics,omitempty"`
// Perf events counters
PerfStats []v1.PerfStat `json:"perf_stats,omitempty"`
// Statistics originating from perf uncore events.
// Applies only to the root container.
PerfUncoreStats []v1.PerfUncoreStat `json:"perf_uncore_stats,omitempty"`
// Referenced memory
ReferencedMemory uint64 `json:"referenced_memory,omitempty"`
// Resource Control (resctrl) statistics
6 changes: 6 additions & 0 deletions info/v2/conversion.go
@@ -155,6 +155,9 @@ func ContainerStatsFromV1(containerName string, spec *v1.ContainerSpec, stats []
if len(val.PerfStats) > 0 {
stat.PerfStats = val.PerfStats
}
if len(val.PerfUncoreStats) > 0 {
stat.PerfUncoreStats = val.PerfUncoreStats
}
if len(val.Resctrl.MemoryBandwidth) > 0 || len(val.Resctrl.Cache) > 0 {
stat.Resctrl = val.Resctrl
}
@@ -213,6 +216,9 @@ func DeprecatedStatsFromV1(cont *v1.ContainerInfo) []DeprecatedContainerStats {
if len(val.PerfStats) > 0 {
stat.PerfStats = val.PerfStats
}
if len(val.PerfUncoreStats) > 0 {
stat.PerfUncoreStats = val.PerfUncoreStats
}
if len(val.Resctrl.MemoryBandwidth) > 0 || len(val.Resctrl.Cache) > 0 {
stat.Resctrl = val.Resctrl
}
17 changes: 17 additions & 0 deletions info/v2/conversion_test.go
@@ -208,6 +208,22 @@ func TestContainerStatsFromV1(t *testing.T) {
Name: "cycles",
},
},
PerfUncoreStats: []v1.PerfUncoreStat{
{
ScalingRatio: 1.0,
Value: 123456,
Name: "uncore_imc_0/cas_count_write",
Socket: 0,
PMU: "17",
},
{
ScalingRatio: 1.0,
Value: 654321,
Name: "uncore_imc_0/cas_count_write",
Socket: 1,
PMU: "17",
},
},
ReferencedMemory: uint64(1234),
Resctrl: v1.ResctrlStats{
MemoryBandwidth: []v1.MemoryBandwidthStats{
@@ -247,6 +263,7 @@ func TestContainerStatsFromV1(t *testing.T) {
},
Accelerators: v1Stats.Accelerators,
PerfStats: v1Stats.PerfStats,
PerfUncoreStats: v1Stats.PerfUncoreStats,
ReferencedMemory: v1Stats.ReferencedMemory,
Resctrl: v1Stats.Resctrl,
}
4 changes: 2 additions & 2 deletions machine/info.go
@@ -22,6 +22,8 @@ import (
"strings"
"time"

"golang.org/x/sys/unix"

"github.com/google/cadvisor/fs"
info "github.com/google/cadvisor/info/v1"
"github.com/google/cadvisor/nvm"
@@ -30,8 +32,6 @@ import (
"github.com/google/cadvisor/utils/sysinfo"

"k8s.io/klog/v2"

"golang.org/x/sys/unix"
)

const hugepagesDirectory = "/sys/kernel/mm/hugepages/"
2 changes: 1 addition & 1 deletion manager/manager.go
@@ -212,7 +212,7 @@ func New(memoryCache *memory.InMemoryCache, sysfs sysfs.SysFs, houskeepingConfig
newManager.machineInfo = *machineInfo
klog.V(1).Infof("Machine: %+v", newManager.machineInfo)

newManager.perfManager, err = perf.NewManager(perfEventsFile, machineInfo.NumCores)
newManager.perfManager, err = perf.NewManager(perfEventsFile, machineInfo.NumCores, machineInfo.Topology)
if err != nil {
return nil, err
}
44 changes: 39 additions & 5 deletions metrics/prometheus.go
@@ -1545,11 +1545,11 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
},
}...)
}
if c.includedMetrics.Has(container.PerfMetrics) {
if includedMetrics.Has(container.PerfMetrics) {
c.containerMetrics = append(c.containerMetrics, []containerMetric{
{
name: "container_perf_metric",
help: "Perf event metric",
name: "container_perf_events_total",
help: "Perf event metric.",
valueType: prometheus.CounterValue,
extraLabels: []string{"cpu", "event"},
getValues: func(s *info.ContainerStats) metricValues {
@@ -1565,8 +1565,8 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
},
},
{
name: "container_perf_metric_scaling_ratio",
help: "Perf event metric scaling ratio",
name: "container_perf_events_scaling_ratio",
help: "Perf event metric scaling ratio.",
valueType: prometheus.GaugeValue,
extraLabels: []string{"cpu", "event"},
getValues: func(s *info.ContainerStats) metricValues {
@@ -1581,6 +1581,40 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
return values
},
},
{
name: "container_perf_uncore_events_total",
help: "Perf uncore event metric.",
valueType: prometheus.CounterValue,
extraLabels: []string{"socket", "event", "pmu"},
getValues: func(s *info.ContainerStats) metricValues {
values := make(metricValues, 0, len(s.PerfUncoreStats))
for _, metric := range s.PerfUncoreStats {
values = append(values, metricValue{
value: float64(metric.Value),
labels: []string{strconv.Itoa(metric.Socket), metric.Name, metric.PMU},
timestamp: s.Timestamp,
})
}
return values
},
},
{
name: "container_perf_uncore_events_scaling_ratio",
help: "Perf uncore event metric scaling ratio.",
valueType: prometheus.GaugeValue,
extraLabels: []string{"socket", "event", "pmu"},
getValues: func(s *info.ContainerStats) metricValues {
values := make(metricValues, 0, len(s.PerfUncoreStats))
for _, metric := range s.PerfUncoreStats {
values = append(values, metricValue{
value: metric.ScalingRatio,
labels: []string{strconv.Itoa(metric.Socket), metric.Name, metric.PMU},
timestamp: s.Timestamp,
})
}
return values
},
},
}...)
}
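
To show roughly what the new uncore counter looks like once exported, the sketch below renders one sample per stat using the `socket`, `event` and `pmu` labels declared above. It is an approximation for illustration: `uncoreStat` and `formatUncoreSample` are hypothetical, and any additional container labels or timestamps that the real collector attaches are omitted.

```go
package main

import (
	"fmt"
	"strconv"
)

// uncoreStat is a trimmed, hypothetical stand-in for v1.PerfUncoreStat.
type uncoreStat struct {
	Value  uint64
	Name   string
	Socket int
	PMU    string
}

// formatUncoreSample renders one sample in a Prometheus-like text form,
// using the same label order as extraLabels: socket, event, pmu.
func formatUncoreSample(s uncoreStat) string {
	return fmt.Sprintf("container_perf_uncore_events_total{socket=%q,event=%q,pmu=%q} %d",
		strconv.Itoa(s.Socket), s.Name, s.PMU, s.Value)
}

func main() {
	// Values borrowed from the test data in info/v2/conversion_test.go.
	stats := []uncoreStat{
		{Value: 123456, Name: "uncore_imc_0/cas_count_write", Socket: 0, PMU: "17"},
		{Value: 654321, Name: "uncore_imc_0/cas_count_write", Socket: 1, PMU: "17"},
	}
	for _, s := range stats {
		fmt.Println(formatUncoreSample(s))
	}
}
```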
if includedMetrics.Has(container.ReferencedMemoryMetrics) {