Hostfrequencyscaling #140

Merged
merged 87 commits into from
Jan 8, 2020

Conversation

ghazanfarttu

This module performs mutations that scale CPU frequency in order to control the CPU thermal conditions of an HPC node, using an in-band mechanism.

The module sets the HPC node to the "schedutil" scaling governor on boot. Whenever the CPU temperature reaches the high (warning) condition, the module mutates the scaling governor to "powersave". The current implementation handles a critical CPU temperature the same way as a high CPU temperature.

Additionally, other mutations intended for different use cases (e.g. switching back to "schedutil" after "powersave") are under consideration and investigation.
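
The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the module's actual code: the `thermalState` type and the `governorFor`, `setGovernor`, and `currentGovernor` helpers are hypothetical names. On a real Linux node the directory would typically be `/sys/devices/system/cpu/cpu0/cpufreq`; the sketch uses a temporary directory instead so that it runs anywhere.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// thermalState is a hypothetical enumeration of CPU thermal conditions.
type thermalState int

const (
	thermalNormal thermalState = iota
	thermalHigh
	thermalCritical
)

// governorFor maps a thermal state to a scaling governor.
// High and critical are handled identically, as the description states.
func governorFor(s thermalState) string {
	switch s {
	case thermalHigh, thermalCritical:
		return "powersave"
	default:
		return "schedutil"
	}
}

// setGovernor writes the governor name to scaling_governor in cpufreqDir.
func setGovernor(cpufreqDir, governor string) error {
	path := filepath.Join(cpufreqDir, "scaling_governor")
	return os.WriteFile(path, []byte(governor+"\n"), 0o644)
}

// currentGovernor reads the active governor back from cpufreqDir.
func currentGovernor(cpufreqDir string) (string, error) {
	b, err := os.ReadFile(filepath.Join(cpufreqDir, "scaling_governor"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	// A temporary directory stands in for the real sysfs location (assumption).
	dir, err := os.MkdirTemp("", "cpufreq")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	for _, s := range []thermalState{thermalNormal, thermalHigh, thermalCritical} {
		if err := setGovernor(dir, governorFor(s)); err != nil {
			fmt.Println("write failed:", err)
			return
		}
		g, err := currentGovernor(dir)
		if err != nil {
			fmt.Println("read failed:", err)
			return
		}
		fmt.Printf("thermal state %d -> governor %s\n", s, g)
	}
}
```

Keeping the state-to-governor mapping in a pure function makes the policy easy to extend if the "switch back to schedutil" mutation is added later.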

kpelzel left a comment

Looks good. Most of the changes I suggested are just small things to clean up from when we were debugging.

config/pipxe.yaml
core/StateDifferenceEngine.go
core/StateDifferenceEngine.go
core/StateMutationEngine.go
core/StateMutationEngine.go
modules/hostfrequencyscaling/hostfrequencyscaling.go
modules/hostthermaldiscovery/hostthermaldiscovery.go
core/StateSyncEngine.go
Comment on lines 200 to 206
k.Sme.Thaw()
// Thaw if full state
if len(parents) == 0 {
	k.Sme.Thaw()
}

We might want this in a separate PR, but ask Lowell.

Comment on lines 66 to 76
// PxeURL refers to PXE object
PxeURL string = "type.googleapis.com/proto.RPi3/Pxe"

// ModuleStateURL refers to module state
ModuleStateURL string = "/Services/hostfrequencyscaling/State"

// HostThermalStateURL points to Thermal extension
HostThermalStateURL string = "type.googleapis.com/proto.HostThermal/State"

// NodeIPURL provides node IP address
NodeIPURL string = "type.googleapis.com/proto.IPv4OverEthernet/Ifaces/0/Ip/Ip"

These constants probably don't need to be exported, though it likely doesn't matter much. Ask Lowell.
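
If the constants are only used inside the module's own package, an unexported form could look like the minimal sketch below. The lower-case names are hypothetical; the string values are copied from the snippet under review.

```go
package main

import "fmt"

// Identifiers that start with a lower-case letter are visible only
// within their own package, so these are not part of the public API.
const (
	// pxeURL refers to the PXE object
	pxeURL = "type.googleapis.com/proto.RPi3/Pxe"

	// moduleStateURL refers to the module state
	moduleStateURL = "/Services/hostfrequencyscaling/State"

	// hostThermalStateURL points to the Thermal extension
	hostThermalStateURL = "type.googleapis.com/proto.HostThermal/State"

	// nodeIPURL provides the node IP address
	nodeIPURL = "type.googleapis.com/proto.IPv4OverEthernet/Ifaces/0/Ip/Ip"
)

func main() {
	fmt.Println(pxeURL)
	fmt.Println(moduleStateURL)
	fmt.Println(hostThermalStateURL)
	fmt.Println(nodeIPURL)
}
```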

kpelzel commented Dec 12, 2019

I just realized a lot of this PR contains changes from #112. We should make sure that gets merged first, then do a rebase.

kpelzel left a comment

A few things need to be cleaned up, but otherwise looks good.

Comment on lines 1 to 28
# kraken-build.go: describes a build for a BitScope Raspberry Pi cluster
targets:
  'linux-arm64': # this identifies the build, will be appended to the binary name
    os: 'linux' # os must match a supported GOOS
    arch: 'arm64' # arch must match a supported GOARCH
  'linux-amd64':
    os: 'linux'
    arch: 'amd64'
  'darwin-amd64':
    os: 'darwin'
    arch: 'amd64'

# included extensions
extensions:
  - github.com/hpc/kraken/extensions/IPv4
  - github.com/hpc/kraken/extensions/RPi3
  - github.com/hpc/kraken/extensions/HostThermal
  - github.com/hpc/kraken/extensions/HostFrequencyScaler
# included modules
modules:
  - github.com/hpc/kraken/modules/restapi
  - github.com/hpc/kraken/modules/rfpipower
  - github.com/hpc/kraken/modules/pipxe
  - github.com/hpc/kraken/modules/hostthermaldiscovery
  - github.com/hpc/kraken/modules/hostfrequencyscaling
  - github.com/hpc/kraken/modules/cpuburn
  - github.com/hpc/kraken/modules/websocket

This file looks like it's the same as pipxe so it can probably be removed.

Comment on lines 416 to 434
// // setup a ticker for checking whether PS is enforced in Thermal bound scenario
// if hfs.cfg.GetThermalBoundScaler() == true {

// dur, _ := time.ParseDuration("1s")
// thermalCheckTick := time.NewTicker(dur)

// // thermal ticker
// for {
// select {
// case <-thermalCheckTick.C:
// if hfs.psEnforced == true {
// go hfs.CheckThermalThreshold()
// }

// break
// }
// }

// }

remove commented code

Comment on lines 511 to 532
// if currentThermal >= thresholdThermal {
// hfs.mutex.Lock()
// hfs.psEnforced = true
// hfs.mutex.Unlock()
// } else
if (currentThermal / 1000) < thresholdThermal {
hfs.mutex.Lock()
hfs.psEnforced = false
hfs.mutex.Unlock()

// url := lib.NodeURLJoin(node.ID().String(), hostFreqScalerURL)
// ev := core.NewEvent(
// lib.Event_DISCOVERY,
// url,
// &core.DiscoveryEvent{
// URL: url,
// ValueID: profileMap["performance"],
// },
// )
// hfs.dchan <- ev
}
//hfs.api.Logf(lib.LLERROR, "*** T E M P ***: %v", currentThermal/1000)

remove commented code
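
With the commented-out lines dropped, the remaining check reduces to a few lines. The sketch below is a self-contained illustration, not the module's code: the `scaler` type and its `updateEnforcement` and `enforced` methods are hypothetical. It assumes, based on the division by 1000 in the snippet, that the current reading is in millidegrees Celsius (the unit Linux sysfs thermal zones report) and that the threshold is in degrees Celsius.

```go
package main

import (
	"fmt"
	"sync"
)

// scaler is a hypothetical stand-in for the module's state.
type scaler struct {
	mutex      sync.Mutex
	psEnforced bool // true while "powersave" is being enforced
}

// updateEnforcement clears psEnforced once the temperature drops below
// the threshold. currentMilliC is in millidegrees Celsius (assumption),
// thresholdC is in degrees Celsius.
func (s *scaler) updateEnforcement(currentMilliC, thresholdC int) {
	if currentMilliC/1000 < thresholdC {
		s.mutex.Lock()
		s.psEnforced = false
		s.mutex.Unlock()
	}
}

// enforced reports the flag under the same lock that guards writes.
func (s *scaler) enforced() bool {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	return s.psEnforced
}

func main() {
	s := &scaler{psEnforced: true}

	s.updateEnforcement(82000, 80) // 82 C is not below 80 C: flag stays set
	fmt.Println("enforced after 82 C:", s.enforced())

	s.updateEnforcement(65000, 80) // 65 C is below 80 C: flag is cleared
	fmt.Println("enforced after 65 C:", s.enforced())
}
```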

Comment on lines 643 to 654
// } else {
// url := lib.NodeURLJoin(node.ID().String(), hostHightoLowFreqScalerURL)
// ev := core.NewEvent(
// lib.Event_DISCOVERY,
// url,
// &core.DiscoveryEvent{
// URL: url,
// ValueID: currentScalingConfig.CurScalingGovernor,
// },
// )
// hfs.dchan <- ev
// }

remove commented code

Comment on lines 216 to 219
// if hostFreqScaler == hostDisc.preFreqScaler {
// // no change in frequency scaler so no need to generate discovery event
// return
// }

remove commented code


vid := profileMap[hostFreqScaler]

//hostDisc.api.Logf(lib., "PRINTSCALER: %s", vid)

remove commented code

hostDisc.api.Logf(lib.LLERROR, "Reading CPU thermal sensor failed: %v", err)
return ""
}
//fscalingGovernor := strings.TrimSuffix(string(bscalingGovernor), "\n")

remove commented code

Comment on lines 353 to 361
// cpuThermalThresholds := hostDisc.cfg.GetThermalThresholds()
// lowerNormal := cpuThermalThresholds["CPUThermalThresholds"].GetLowerNormal()
// upperNormal := cpuThermalThresholds["CPUThermalThresholds"].GetUpperNormal()

lowerHigh := cpuThermalThresholds["CPUThermalThresholds"].GetLowerHigh()
upperHigh := cpuThermalThresholds["CPUThermalThresholds"].GetUpperHigh()
// lowerHigh := cpuThermalThresholds["CPUThermalThresholds"].GetLowerHigh()
// upperHigh := cpuThermalThresholds["CPUThermalThresholds"].GetUpperHigh()

lowerCritical := cpuThermalThresholds["CPUThermalThresholds"].GetLowerCritical()
upperCritical := cpuThermalThresholds["CPUThermalThresholds"].GetUpperCritical()
// lowerCritical := cpuThermalThresholds["CPUThermalThresholds"].GetLowerCritical()
// upperCritical := cpuThermalThresholds["CPUThermalThresholds"].GetUpperCritical()

remove commented code

kpelzel force-pushed the hostfrequencyscaling branch from c07fc7d to 0502246 on January 8, 2020 23:52
jlowellwofford merged commit 4f8d3ad into kraken-hpc:master on Jan 8, 2020