Hostfrequencyscaling #140
Looks good. Most of the changes I suggested are just small things to clean up from when we were debugging stuff.
kraken/main.go.tpl
Outdated
k.Sme.Thaw()
// Thaw if full state
if len(parents) == 0 {
	k.Sme.Thaw()
}
We might want this in a separate PR, but ask Lowell.
// PxeURL refers to PXE object
PxeURL string = "type.googleapis.com/proto.RPi3/Pxe"

// ModuleStateURL refers to module state
ModuleStateURL string = "/Services/hostfrequencyscaling/State"

// HostThermalStateURL points to Thermal extension
HostThermalStateURL string = "type.googleapis.com/proto.HostThermal/State"

// NodeIPURL provides node IP address
NodeIPURL string = "type.googleapis.com/proto.IPv4OverEthernet/Ifaces/0/Ip/Ip"
These constant values probably don't need to be exported, but it probably doesn't really matter. Ask Lowell.
I just realized a lot of this PR contains changes from #112. We should make sure that gets merged first, then do a rebase.
7e0076f to 83f6e48
93f15ed to a95b77b
A few things need to be cleaned up, but otherwise looks good.
config/pipxe-child.yaml
Outdated
# kraken-build.go: describes a build for a BitScope Raspberry Pi cluster
targets:
  'linux-arm64': # this identifies the build, will be appended to the binary name
    os: 'linux' # os must match a supported GOOS
    arch: 'arm64' # arch must match a supported GOARCH
  'linux-amd64':
    os: 'linux'
    arch: 'amd64'
  'darwin-amd64':
    os: 'darwin'
    arch: 'amd64'

# included extensions
extensions:
  - github.com/hpc/kraken/extensions/IPv4
  - github.com/hpc/kraken/extensions/RPi3
  - github.com/hpc/kraken/extensions/HostThermal
  - github.com/hpc/kraken/extensions/HostFrequencyScaler
# included modules
modules:
  - github.com/hpc/kraken/modules/restapi
  - github.com/hpc/kraken/modules/rfpipower
  - github.com/hpc/kraken/modules/pipxe
  - github.com/hpc/kraken/modules/hostthermaldiscovery
  - github.com/hpc/kraken/modules/hostfrequencyscaling
  - github.com/hpc/kraken/modules/cpuburn
  - github.com/hpc/kraken/modules/websocket
This file looks like it's the same as pipxe so it can probably be removed.
// // setup a ticker for checking whether PS is enforced in Thermal bound scenario
// if hfs.cfg.GetThermalBoundScaler() == true {

//     dur, _ := time.ParseDuration("1s")
//     thermalCheckTick := time.NewTicker(dur)

//     // thermal ticker
//     for {
//         select {
//         case <-thermalCheckTick.C:
//             if hfs.psEnforced == true {
//                 go hfs.CheckThermalThreshold()
//             }

//             break
//         }
//     }

// }
remove commented code
// if currentThermal >= thresholdThermal {
//     hfs.mutex.Lock()
//     hfs.psEnforced = true
//     hfs.mutex.Unlock()
// } else
if (currentThermal / 1000) < thresholdThermal {
	hfs.mutex.Lock()
	hfs.psEnforced = false
	hfs.mutex.Unlock()

	// url := lib.NodeURLJoin(node.ID().String(), hostFreqScalerURL)
	// ev := core.NewEvent(
	//     lib.Event_DISCOVERY,
	//     url,
	//     &core.DiscoveryEvent{
	//         URL: url,
	//         ValueID: profileMap["performance"],
	//     },
	// )
	// hfs.dchan <- ev
}
//hfs.api.Logf(lib.LLERROR, "*** T E M P ***: %v", currentThermal/1000)
remove commented code
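For reference, the `(currentThermal / 1000) < thresholdThermal` check in the hunk above suggests the sensor value is in millidegrees Celsius, which is what Linux sysfs thermal zones report. A minimal sketch of that conversion and comparison, kept separate from the locking, might look like the following. The helper names `parseMilliCelsius` and `belowThreshold` are hypothetical and not part of this PR, and the threshold is assumed to be in whole degrees Celsius.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMilliCelsius converts a raw sensor reading such as "48312\n"
// (assumed to be millidegrees Celsius) into whole degrees Celsius.
// Hypothetical helper; not taken from the PR.
func parseMilliCelsius(raw string) (int, error) {
	milli, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, fmt.Errorf("parsing thermal reading %q: %w", raw, err)
	}
	return milli / 1000, nil
}

// belowThreshold reports whether the reading is strictly below the
// threshold, mirroring the comparison in the reviewed hunk.
func belowThreshold(raw string, thresholdC int) (bool, error) {
	c, err := parseMilliCelsius(raw)
	if err != nil {
		return false, err
	}
	return c < thresholdC, nil
}

func main() {
	ok, err := belowThreshold("48312\n", 60)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("below threshold:", ok)
}
```

Pulling the unit conversion into one place would also make it easier to keep the log line and the comparison consistent.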
// } else {
//     url := lib.NodeURLJoin(node.ID().String(), hostHightoLowFreqScalerURL)
//     ev := core.NewEvent(
//         lib.Event_DISCOVERY,
//         url,
//         &core.DiscoveryEvent{
//             URL: url,
//             ValueID: currentScalingConfig.CurScalingGovernor,
//         },
//     )
//     hfs.dchan <- ev
// }
remove commented code
// if hostFreqScaler == hostDisc.preFreqScaler {
//     // no change in frequency scaler so no need to generate discovery event
//     return
// }
remove commented code
vid := profileMap[hostFreqScaler]

//hostDisc.api.Logf(lib., "PRINTSCALER: %s", vid)
remove commented code
	hostDisc.api.Logf(lib.LLERROR, "Reading CPU thermal sensor failed: %v", err)
	return ""
}
//fscalingGovernor := strings.TrimSuffix(string(bscalingGovernor), "\n")
remove commented code
// cpuThermalThresholds := hostDisc.cfg.GetThermalThresholds()
// lowerNormal := cpuThermalThresholds["CPUThermalThresholds"].GetLowerNormal()
// upperNormal := cpuThermalThresholds["CPUThermalThresholds"].GetUpperNormal()

lowerHigh := cpuThermalThresholds["CPUThermalThresholds"].GetLowerHigh()
upperHigh := cpuThermalThresholds["CPUThermalThresholds"].GetUpperHigh()
// lowerHigh := cpuThermalThresholds["CPUThermalThresholds"].GetLowerHigh()
// upperHigh := cpuThermalThresholds["CPUThermalThresholds"].GetUpperHigh()

lowerCritical := cpuThermalThresholds["CPUThermalThresholds"].GetLowerCritical()
upperCritical := cpuThermalThresholds["CPUThermalThresholds"].GetUpperCritical()
// lowerCritical := cpuThermalThresholds["CPUThermalThresholds"].GetLowerCritical()
// upperCritical := cpuThermalThresholds["CPUThermalThresholds"].GetUpperCritical()
remove commented code
c07fc7d to 0502246
This module performs mutations that scale CPU frequency to control the CPU thermal conditions of an HPC node using an in-band mechanism.
On boot, the module sets the HPC node to the "schedutil" scaling governor. Whenever the CPU temperature reaches the high (warning) condition, the module mutates the scaling governor to "powersave". The current implementation handles a critical CPU temperature the same way as a high CPU temperature.
Additional mutations for other use cases (e.g. switching back to "schedutil" after "powersave") are under consideration and investigation.
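The governor choice described above can be sketched as a small mapping from thermal state to governor name. This is only an illustration of the described behaviour, not the module's code: the `ThermalState` type, its constants, and `governorFor` are hypothetical names, and the sketch does not model the state transitions or discovery events the module actually uses.

```go
package main

import "fmt"

// ThermalState is a hypothetical stand-in for the state reported by the
// thermal extension; the real module reads this from the HostThermal state.
type ThermalState int

const (
	ThermalNormal ThermalState = iota
	ThermalHigh
	ThermalCritical
)

// governorFor returns the scaling governor implied by the description:
// "schedutil" by default (as set on boot), and "powersave" for both the
// high and critical conditions, which are currently handled the same way.
func governorFor(s ThermalState) string {
	switch s {
	case ThermalHigh, ThermalCritical:
		return "powersave"
	default:
		return "schedutil"
	}
}

func main() {
	states := []ThermalState{ThermalNormal, ThermalHigh, ThermalCritical}
	for _, s := range states {
		fmt.Printf("state %d -> %s\n", s, governorFor(s))
	}
}
```

Note that the sketch maps a normal state straight to "schedutil", whereas the description says switching back from "powersave" is still under investigation, so that part should be read as the boot-time default only.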