[alarm] refactor new alarm #2902

Merged: 77 commits, Jan 4, 2025
Commits
efba1f6
[improve] update backend monitor tags to label
tomsun28 Dec 17, 2024
62e84cf
[improve] update alarm fra
tomsun28 Dec 19, 2024
c8eec91
Merge branch 'master' into new-alarm
tomsun28 Dec 24, 2024
ac5f1b3
[improve] update alarm stru
tomsun28 Dec 24, 2024
6b781c4
[improve] update alarm
tomsun28 Dec 25, 2024
e15418e
[improve] update alarm
tomsun28 Dec 25, 2024
469ec93
[improve] update alarm
tomsun28 Dec 25, 2024
5a790f4
[improve] update alarm
tomsun28 Dec 25, 2024
557fa93
[improve] update alarm
tomsun28 Dec 25, 2024
5bd9d21
[improve] update alarm
tomsun28 Dec 25, 2024
bf3f6d6
[improve] update alarm
tomsun28 Dec 25, 2024
78db1b1
[improve] update alarm
tomsun28 Dec 25, 2024
4741cc9
[improve] update alarm
tomsun28 Dec 25, 2024
ef49b72
[webapp] update alert pojo
tomsun28 Dec 25, 2024
604de18
[webapp] update alert pojo
tomsun28 Dec 25, 2024
a7438fb
[improve] update alarm
tomsun28 Dec 25, 2024
d7a4e92
[improve] update alarm
tomsun28 Dec 25, 2024
f5acbfe
[improve] update alarm
tomsun28 Dec 25, 2024
d97830c
[improve] update alarm
tomsun28 Dec 25, 2024
dc247d0
[improve] update alarm
tomsun28 Dec 25, 2024
cae868d
[improve] update alarm
tomsun28 Dec 25, 2024
bcd8e76
[improve] update alarm
tomsun28 Dec 25, 2024
e3b5de7
[improve] update alarm
tomsun28 Dec 25, 2024
107581a
[improve] update alarm
tomsun28 Dec 25, 2024
d7794f5
[improve] update alarm
tomsun28 Dec 25, 2024
e7122b9
[improve] update alarm
tomsun28 Dec 25, 2024
81934a0
[improve] update alarm
tomsun28 Dec 25, 2024
17798de
[improve] update alarm
tomsun28 Dec 25, 2024
46c5271
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/cal…
tomsun28 Dec 25, 2024
ff4b4fa
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/red…
tomsun28 Dec 25, 2024
fa3d43b
Update hertzbeat-common/src/main/java/org/apache/hertzbeat/common/ent…
tomsun28 Dec 25, 2024
8f321c0
Update hertzbeat-base/pom.xml
tomsun28 Dec 25, 2024
7cbd5ce
Update hertzbeat-alerter/src/test/java/org/apache/hertzbeat/alert/red…
tomsun28 Dec 25, 2024
79196c6
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/cal…
tomsun28 Dec 25, 2024
a8b6204
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/ser…
tomsun28 Dec 25, 2024
da6be30
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/ser…
tomsun28 Dec 25, 2024
be6724e
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/ser…
tomsun28 Dec 25, 2024
1555cf1
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/ser…
tomsun28 Dec 25, 2024
abd9dd1
Update hertzbeat-alerter/src/test/java/org/apache/hertzbeat/alert/red…
tomsun28 Dec 25, 2024
48ba387
Update hertzbeat-base/pom.xml
tomsun28 Dec 25, 2024
0c3e0bb
Update hertzbeat-common/src/main/java/org/apache/hertzbeat/common/ent…
tomsun28 Dec 25, 2024
0753a7e
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/cal…
tomsun28 Dec 25, 2024
fdc667d
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/cal…
tomsun28 Dec 25, 2024
0e1901a
Update hertzbeat-alerter/src/main/java/org/apache/hertzbeat/alert/red…
tomsun28 Dec 25, 2024
0089821
Merge branch 'master' into new-alarm
tomsun28 Dec 25, 2024
201c6cf
[improve] update alarm
tomsun28 Dec 26, 2024
5e36d75
[improve] update alarm
tomsun28 Dec 26, 2024
2df0abb
[improve] update alarm
tomsun28 Dec 26, 2024
238a2b8
[improve] update alarm
tomsun28 Dec 26, 2024
8333382
Merge branch 'master' into new-alarm
Calvin979 Dec 27, 2024
2a3a61c
Merge branch 'master' into new-alarm
tomsun28 Dec 28, 2024
022bbcb
Merge branch 'master' into new-alarm
tomsun28 Dec 30, 2024
45dcb3d
[improve] update alarm
tomsun28 Dec 30, 2024
b85eb1a
[feature] update alert define and add realtime, periodic threshold (#…
tomsun28 Jan 1, 2025
6c29b2e
Merge branch 'master' into new-alarm
tomsun28 Jan 1, 2025
cfaef28
[refactor] support alarm threshold bind monitors (#2933)
tomsun28 Jan 1, 2025
65d3c41
Merge branch 'master' into new-alarm
tomsun28 Jan 1, 2025
d73c09c
[improve] update alarm define
tomsun28 Jan 2, 2025
798edcb
[alarm] update alarm center ui and group alarm config (#2938)
tomsun28 Jan 2, 2025
7682cf4
[alarm] support alarm inhibit web ui (#2940)
tomsun28 Jan 2, 2025
9224d31
[alarm] combine common labels and update ui (#2944)
tomsun28 Jan 3, 2025
9344d3f
Merge branch 'master' into new-alarm
tomsun28 Jan 3, 2025
1755dbe
[improve] update labels
tomsun28 Jan 3, 2025
3d9a124
[improve] fix license
tomsun28 Jan 3, 2025
4f8e352
[improve] fix test
tomsun28 Jan 3, 2025
b529b26
Merge branch 'master' into new-alarm
tomsun28 Jan 3, 2025
3c4bcfe
[improve] fix test
tomsun28 Jan 3, 2025
9c33d8b
[improve] fix test
tomsun28 Jan 3, 2025
5f1ada4
AlarmInhibitReduce
a-little-fool Jan 3, 2025
6d604cc
AlarmInhibitReduce
a-little-fool Jan 3, 2025
e66040d
[improve] fix annotation
a-little-fool Jan 3, 2025
61c8bad
[improve] fix
tomsun28 Jan 3, 2025
7e19ed8
[improve] update group reduce
tomsun28 Jan 4, 2025
fa81d56
[improve] update group reduce
tomsun28 Jan 4, 2025
18b60eb
[impove] update alarm and labels relate ui (#2946)
tomsun28 Jan 4, 2025
c91a098
Merge branch 'master' into new-alarm
tomsun28 Jan 4, 2025
1fa25cf
Merge branch 'master' into new-alarm
zqr10159 Jan 4, 2025
AlarmInhibitReduce
a-little-fool committed Jan 3, 2025
commit 5f1ada47d2b0ee243e3378199d83e9d7ecc016a3
@@ -23,12 +23,14 @@
import org.apache.hertzbeat.common.entity.alerter.GroupAlert;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.Collections;
import java.util.stream.Collectors;
import java.util.HashMap;

import lombok.Data;
import lombok.AllArgsConstructor;

@@ -46,9 +48,9 @@ public class AlarmInhibitReduce {
private static final long SOURCE_ALERT_TTL = 4 * 60 * 60 * 1000L;

private final AlarmSilenceReduce alarmSilenceReduce;

private final Map<Long, AlertInhibit> inhibitRules;

/**
* Cache for source alerts
* key: ruleId
@@ -63,177 +65,256 @@ public AlarmInhibitReduce(AlarmSilenceReduce alarmSilenceReduce, AlertInhibitDao
List<AlertInhibit> inhibits = alertInhibitDao.findAlertInhibitsByEnableIsTrue();
refreshInhibitRules(inhibits);
}

/**
* Configure inhibit rules
* @param rules Inhibit rule list
*/
public void refreshInhibitRules(List<AlertInhibit> rules) {
this.inhibitRules.clear();
rules.forEach(rule -> this.inhibitRules.put(rule.getId(), rule));
if (rules == null) {
log.warn("Attempted to refresh inhibit rules with null list.");
return;
}
try {
this.inhibitRules.clear();
rules.forEach(rule -> this.inhibitRules.put(rule.getId(), rule));
} catch (Exception e) {
log.error("Error refreshing inhibit rules", e);
}
}

/**
* Process alert with inhibit rules
* If alert is inhibited, it will not be forwarded
* @param groupAlert Grouped and pending alerts to be processed
*/
public void inhibitAlarm(GroupAlert groupAlert) {
if (inhibitRules.isEmpty()) {
// No inhibit rules, forward directly
alarmSilenceReduce.silenceAlarm(groupAlert);
if (groupAlert == null) {
log.warn("Received null GroupAlert. Skipping processing.");
return;
}
try {
if (inhibitRules.isEmpty()) {
alarmSilenceReduce.silenceAlarm(groupAlert);
return;
}

// Check if this alert can be a source alert that inhibits others
for (AlertInhibit rule : inhibitRules.values()) {
if (isSourceAlert(groupAlert, rule)) {
// Cache this alert as active source
cacheSourceAlert(groupAlert, rule);
for (AlertInhibit rule : inhibitRules.values()) {
if (isSourceAlert(groupAlert, rule)) {
cacheSourceAlert(groupAlert, rule);
}
}
}

// Check if this alert should be inhibited
if (shouldInhibit(groupAlert)) {
log.debug("Alert {} is inhibited", groupAlert);
return;
}
if (shouldInhibit(groupAlert)) {
log.debug("Alert {} is inhibited", groupAlert);
return;
}

// Forward if not inhibited
alarmSilenceReduce.silenceAlarm(groupAlert);
alarmSilenceReduce.silenceAlarm(groupAlert);
} catch (Exception e) {
log.error("Error inhibiting alarm for {}", groupAlert, e);
}
}

/**
* Check if alert matches inhibit rule source labels
* @param alert Grouped and pending alerts to be processed
* @param rule The rule of inhibition
*/
private boolean isSourceAlert(GroupAlert alert, AlertInhibit rule) {
// Must be firing status
if (!"firing".equals(alert.getStatus())) {
if (alert == null || rule == null) {
log.warn("Received null alert or rule in isSourceAlert");
return false;
}
try {
if (!"firing".equals(alert.getStatus())) {
return false;
}
return matchLabels(alert.getCommonLabels(), rule.getSourceLabels());
} catch (Exception e) {
log.error("Error checking if alert is source alert", e);
return false;
}

// Check if labels match
return matchLabels(alert.getCommonLabels(), rule.getSourceLabels());
}

/**
* Check if alert should be inhibited by any active source alerts
* @param alert Grouped and pending alerts to be processed
*/
private boolean shouldInhibit(GroupAlert alert) {
// Resolved alerts are never inhibited
if ("resolved".equals(alert.getStatus())) {
if (alert == null) {
log.warn("Received null alert in shouldInhibit");
return false;
}

for (AlertInhibit rule : inhibitRules.values()) {
// Check if alert matches target labels
if (!matchLabels(alert.getCommonLabels(), rule.getTargetLabels())) {
continue;
try {
if ("resolved".equals(alert.getStatus())) {
return false;
}

// Check if there are active source alerts for this rule
List<GroupAlert> sourceAlerts = getActiveSourceAlerts(rule);
if (sourceAlerts.isEmpty()) {
continue;
}
for (AlertInhibit rule : inhibitRules.values()) {
if (!matchLabels(alert.getCommonLabels(), rule.getTargetLabels())) {
continue;
}

// Check equal labels
for (GroupAlert source : sourceAlerts) {
if (matchEqualLabels(source, alert, rule.getEqualLabels())) {
return true;
List<GroupAlert> sourceAlerts = getActiveSourceAlerts(rule);
if (sourceAlerts.isEmpty()) {
continue;
}

for (GroupAlert source : sourceAlerts) {
if (matchEqualLabels(source, alert, rule.getEqualLabels())) {
return true;
}
}
}
return false;
} catch (Exception e) {
log.error("Error checking if alert should be inhibited", e);
return false;
}
return false;
}

/**
* Check if all required labels match
* @param alertLabels The label of the alarm
* @param requiredLabels Labels to be matched
*/
private boolean matchLabels(Map<String, String> alertLabels, Map<String, String> requiredLabels) {
if (alertLabels == null || requiredLabels == null) {
log.warn("Received null alertLabels or requiredLabels in matchLabels");
return false;
}
try {
return requiredLabels.entrySet().stream()
.allMatch(entry -> entry.getValue().equals(alertLabels.get(entry.getKey())));
} catch (Exception e) {
log.error("Error matching labels", e);
return false;
}
return requiredLabels.entrySet().stream()
.allMatch(entry -> entry.getValue().equals(alertLabels.get(entry.getKey())));
}

/**
* Check if equal labels have same values in both alerts
* @param source Alarm used to suppress other alarms
* @param target Alarm that may be suppressed
* @param equalLabels Need to be equal labels
*/
private boolean matchEqualLabels(GroupAlert source, GroupAlert target, List<String> equalLabels) {
if (equalLabels == null || equalLabels.isEmpty()) {
return true;
if (source == null || target == null) {
log.warn("Received null source or target in matchEqualLabels");
return false;
}
try {
if (equalLabels == null || equalLabels.isEmpty()) {
return true;
}
Map<String, String> sourceLabels = source.getCommonLabels();
Map<String, String> targetLabels = target.getCommonLabels();

return equalLabels.stream().allMatch(label -> {
String sourceValue = sourceLabels.get(label);
String targetValue = targetLabels.get(label);
return sourceValue != null && sourceValue.equals(targetValue);
});
} catch (Exception e) {
log.error("Error matching equal labels", e);
return false;
}
Map<String, String> sourceLabels = source.getCommonLabels();
Map<String, String> targetLabels = target.getCommonLabels();

return equalLabels.stream().allMatch(label -> {
String sourceValue = sourceLabels.get(label);
String targetValue = targetLabels.get(label);
return sourceValue != null && sourceValue.equals(targetValue);
});
}

/**
* Cache source alert for inhibit rule
* @param alert Grouped and pending alerts to be processed
* @param rule The rule of inhibition
*/
private void cacheSourceAlert(GroupAlert alert, AlertInhibit rule) {
// Get or create cache for this rule
Map<String, SourceAlertEntry> ruleCache = sourceAlertCache.computeIfAbsent(
rule.getId(),
k -> new ConcurrentHashMap<>()
);

// Generate fingerprint for deduplication
String fingerprint = generateAlertFingerprint(alert);

// Update or add cache entry
SourceAlertEntry entry = new SourceAlertEntry(
alert,
System.currentTimeMillis(),
System.currentTimeMillis() + SOURCE_ALERT_TTL
);
ruleCache.put(fingerprint, entry);

// Cleanup expired entries
cleanupExpiredEntries(ruleCache);
if (alert == null || rule == null) {
log.warn("Received null alert or rule in cacheSourceAlert");
return;
}
try {
Map<String, SourceAlertEntry> ruleCache = sourceAlertCache.computeIfAbsent(
rule.getId(),
k -> new ConcurrentHashMap<>()
);

String fingerprint = generateAlertFingerprint(alert);
SourceAlertEntry entry = new SourceAlertEntry(
alert,
System.currentTimeMillis(),
System.currentTimeMillis() + SOURCE_ALERT_TTL
);
ruleCache.put(fingerprint, entry);
cleanupExpiredEntries(ruleCache);
} catch (Exception e) {
log.error("Error caching source alert", e);
}
}

/**
* Get active source alerts for inhibit rule
* @param rule The rule of inhibition
*/
private List<GroupAlert> getActiveSourceAlerts(AlertInhibit rule) {
Map<String, SourceAlertEntry> ruleCache = sourceAlertCache.get(rule.getId());
if (ruleCache == null || ruleCache.isEmpty()) {
if (rule == null) {
log.warn("Received null rule in getActiveSourceAlerts");
return Collections.emptyList();
}
try {
Map<String, SourceAlertEntry> ruleCache = sourceAlertCache.get(rule.getId());
if (ruleCache == null || ruleCache.isEmpty()) {
return Collections.emptyList();
}

long now = System.currentTimeMillis();
return ruleCache.values().stream()
.filter(entry -> entry.getExpiryTime() > now)
.map(SourceAlertEntry::getAlert)
.collect(Collectors.toList());
} catch (Exception e) {
log.error("Error getting active source alerts", e);
return Collections.emptyList();
}

long now = System.currentTimeMillis();
return ruleCache.values().stream()
.filter(entry -> entry.getExpiryTime() > now)
.map(SourceAlertEntry::getAlert)
.collect(Collectors.toList());
}

/**
* Generate fingerprint for alert deduplication
* @param alert Grouped and pending alerts to be processed
*/
private String generateAlertFingerprint(GroupAlert alert) {
Map<String, String> labels = new HashMap<>(alert.getCommonLabels());
// Remove timestamp related fields
labels.remove("timestamp");

return labels.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(e -> e.getKey() + ":" + e.getValue())
.collect(Collectors.joining(","));
if (alert == null) {
log.warn("Received null alert in generateAlertFingerprint");
return "";
}
try {
Map<String, String> labels = new HashMap<>(alert.getCommonLabels());
labels.remove("timestamp");

return labels.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(e -> e.getKey() + ":" + e.getValue())
.collect(Collectors.joining(","));
} catch (Exception e) {
log.error("Error generating alert fingerprint", e);
return "";
}
}

/**
* Remove expired entries from cache
* @param cache Source alert cache entry map
*/
private void cleanupExpiredEntries(Map<String, SourceAlertEntry> cache) {
long now = System.currentTimeMillis();
cache.entrySet().removeIf(entry -> entry.getValue().getExpiryTime() <= now);
if (cache == null) {
log.warn("Received null cache in cleanupExpiredEntries");
return;
}
try {
long now = System.currentTimeMillis();
cache.entrySet().removeIf(entry -> entry.getValue().getExpiryTime() <= now);
} catch (Exception e) {
log.error("Error cleaning up expired entries", e);
}
}

/**
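For reviewers who want to try the matching semantics outside the service, here is a minimal standalone sketch of the three label operations touched by this commit: required-label matching, equal-label comparison between a source and a target alert, and fingerprint generation. It assumes plain `Map<String, String>` labels in place of `GroupAlert.getCommonLabels()`. The class name `InhibitSketch` and its method signatures are hypothetical and not part of HertzBeat. The fingerprint uses a `TreeMap` for ordering, which should give the same key order as the sorted stream in the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical standalone sketch; not the HertzBeat AlarmInhibitReduce class.
final class InhibitSketch {

    // True when every required label is present on the alert with the same value.
    static boolean matchLabels(Map<String, String> alertLabels, Map<String, String> requiredLabels) {
        if (alertLabels == null || requiredLabels == null) {
            return false;
        }
        return requiredLabels.entrySet().stream()
                .allMatch(e -> e.getValue().equals(alertLabels.get(e.getKey())));
    }

    // True when each listed label has the same non-null value in both label sets.
    // An empty or null list means there is nothing to compare, so it matches.
    static boolean matchEqualLabels(Map<String, String> source, Map<String, String> target,
                                    List<String> equalLabels) {
        if (source == null || target == null) {
            return false;
        }
        if (equalLabels == null || equalLabels.isEmpty()) {
            return true;
        }
        return equalLabels.stream().allMatch(label -> {
            String sourceValue = source.get(label);
            return sourceValue != null && sourceValue.equals(target.get(label));
        });
    }

    // Order-independent fingerprint: labels sorted by key, "timestamp" dropped.
    static String fingerprint(Map<String, String> labels) {
        Map<String, String> sorted = new TreeMap<>(labels);
        sorted.remove("timestamp");
        return sorted.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Illustrative labels only; the values are made up for this sketch.
        Map<String, String> source = Map.of("severity", "critical", "instance", "db-1");
        Map<String, String> target = Map.of("severity", "warning", "instance", "db-1");

        boolean isSource = matchLabels(source, Map.of("severity", "critical"));
        boolean isTarget = matchLabels(target, Map.of("severity", "warning"));
        boolean sameInstance = matchEqualLabels(source, target, List.of("instance"));

        System.out.println("inhibited = " + (isSource && isTarget && sameInstance));
        System.out.println("fingerprint = " + fingerprint(source));
    }
}
```

Under these assumptions, a target alert is inhibited only when it matches the rule's target labels, some cached source alert matches the source labels, and the two agree on every equal label. The sketch leaves out the firing/resolved status checks and the TTL-based source cache that the real class also applies.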