Avoid duplicates with alias BACKEND #3685
This commit introduces a new functionality that addresses the issue of duplicate vulnerabilities by comparing the priority of sources and the aliases attached to a component. The implementation required adding new database rows to support the changes. To ensure the correctness of the implementation, tests have been added to validate the behavior of the updated functionality. In addition, a new API endpoint has been added to get the actual Enabled Sources. It is important to note that this update specifically affects the addVulnerability function and is not able to delete a vulnerability in any case. Signed-off-by: Andres Tito <[email protected]>
Not sure if I fully understand the PR. Does it keep only one vulnerability, the one from the source with the highest priority? It feels to me that this would be a "bolt on" solution to a database model that should be changed. Wouldn't it be better to have a data model with one vulnerability that can have multiple aliases (possibly from different sources)? Currently, with multiple aliases/sources, there can be (and most often are) multiple rows of vulnerabilities. This makes a lot of things harder and more complicated, for example determining the number of affected projects for a vulnerability, or something "as simple as" sorting the list of vulnerabilities by number of affected projects.
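To make the data-model idea above concrete, here is a minimal in-memory sketch of "one vulnerability, many aliases". It is not Dependency-Track code: the class `AliasGroupSketch`, its methods, and the IDs used with it are hypothetical, and a real change would live in the persistence model rather than in a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: IDs from different sources grouped into one logical vulnerability. */
final class AliasGroupSketch {
    // Every known ID points at the shared, mutable set of IDs in its group.
    private final Map<String, Set<String>> groupById = new HashMap<>();

    /** Records that two IDs name the same vulnerability and merges their groups. */
    void link(String idA, String idB) {
        Set<String> a = groupOf(idA);
        Set<String> b = groupOf(idB);
        if (a == b) {
            return; // already in the same group
        }
        a.addAll(b);
        for (String id : b) {
            groupById.put(id, a); // re-point members of the absorbed group
        }
    }

    /** Returns all IDs known to refer to the same vulnerability as the given ID. */
    Set<String> aliasesOf(String id) {
        return Set.copyOf(groupOf(id));
    }

    /** Counts distinct projects affected through any ID in the group, each project once. */
    int affectedProjectCount(String id, Map<String, Set<String>> projectsByVulnId) {
        Set<String> projects = new TreeSet<>();
        for (String alias : groupOf(id)) {
            projects.addAll(projectsByVulnId.getOrDefault(alias, Set.of()));
        }
        return projects.size();
    }

    // Looks up the group for an ID, creating a singleton group for unknown IDs.
    private Set<String> groupOf(String id) {
        return groupById.computeIfAbsent(id, k -> {
            Set<String> s = new TreeSet<>();
            s.add(k);
            return s;
        });
    }
}
```

With a grouping like this, "number of affected projects" is computed once per group instead of once per source row, which is the simplification the comment is asking for.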
PR description updated. I hope the example images explain the PR better, @valentijnscholten.
… I remove it for now, issues with isEmpty, better use != null Signed-off-by: Andres Tito <[email protected]>
…y-track into AvoidDuplicatesWithAlias Signed-off-by: Andres Tito <[email protected]>
I think this might also solve #2181, which would be greatly appreciated :) 🎉
Thanks for the PR @LaVibeX!
I added a few comments. I wasn't able to test this since it doesn't build for me locally, which is unfortunately caused by a recent refactoring in Alpine and DT (#3730).
Also I have to say that I do agree with @valentijnscholten in that we really should change the underlying data model to make aliases more useful in general. The problem with the priority approach is that you get inconsistent outcomes across projects, depending on which vulnerability came first, and / or the order in which they were processed, which can become quite confusing.
On a related note, during the last community meeting we briefly mentioned that we're considering building a pre-compiled vulnerability database: https://youtu.be/9harG5GcV_E?t=2799. One of the things that it would make easier is the correlation of aliases across vulnerability sources. Perhaps give that a watch and let us know your thoughts?
It would go along quite well with a change in the data model as proposed by @valentijnscholten:
Wouldn't it be better to have data model that has one vulnerability that can have multiple aliases (from different sources possibly)?
I am happy to provide a feature branch so people can test the approach in this PR out, if you're interested to continue working on it.
setVulnerabilityAliasesIfNull(vulnerability);
boolean vulnerabilityExists = checkVulnerabilityExists(vulnerability, component);

if (!vulnerabilityExists){
Will doing this here not cause it all to depend on the order in which vulnerabilities are processed?
For example, if my priority list has:
- CVE
- GHSA
And the GHSA is processed before the corresponding CVE, de-duplication won't happen until the next time the component is analyzed.
Ideally, given the same set of vulnerabilities being reported across all sources, we should get the same consistent outcome, regardless of the order in which they were processed.
IMO, if we end up doing this sort of de-duplication, we should do it after we have the results from all scanners, so possibly somewhere here:
dependency-track/src/main/java/org/dependencytrack/tasks/VulnerabilityAnalysisTask.java
Lines 96 to 130 in ad5e911
private void analyzeComponents(final QueryManager qm, final List<Component> components, final Event event) {
    /*
      When this task is processing events that specify the components to scan,
      separate them out into 'candidates' so that we can fire off multiple events
      in hopes of perform parallel analysis using different analyzers.
    */
    final InternalAnalysisTask internalAnalysisTask = new InternalAnalysisTask();
    final OssIndexAnalysisTask ossIndexAnalysisTask = new OssIndexAnalysisTask();
    final VulnDbAnalysisTask vulnDbAnalysisTask = new VulnDbAnalysisTask();
    final SnykAnalysisTask snykAnalysisTask = new SnykAnalysisTask();
    final TrivyAnalysisTask trivyAnalysisTask = new TrivyAnalysisTask();
    final List<Component> internalCandidates = new ArrayList<>();
    final List<Component> ossIndexCandidates = new ArrayList<>();
    final List<Component> vulnDbCandidates = new ArrayList<>();
    final List<Component> snykCandidates = new ArrayList<>();
    final List<Component> trivyCandidates = new ArrayList<>();
    for (final Component component : components) {
        inspectComponentReadiness(component, internalAnalysisTask, internalCandidates);
        inspectComponentReadiness(component, ossIndexAnalysisTask, ossIndexCandidates);
        inspectComponentReadiness(component, vulnDbAnalysisTask, vulnDbCandidates);
        inspectComponentReadiness(component, snykAnalysisTask, snykCandidates);
        inspectComponentReadiness(component, trivyAnalysisTask, trivyCandidates);
    }
    qm.detach(components);
    // Do not call individual async events when processing a known list of components.
    // Call each analyzer task sequentially and catch any exceptions as to prevent one analyzer
    // from interrupting the successful execution of all analyzers.
    performAnalysis(internalAnalysisTask, new InternalAnalysisEvent(internalCandidates), internalAnalysisTask.getAnalyzerIdentity(), event);
    performAnalysis(ossIndexAnalysisTask, new OssIndexAnalysisEvent(ossIndexCandidates), ossIndexAnalysisTask.getAnalyzerIdentity(), event);
    performAnalysis(snykAnalysisTask, new SnykAnalysisEvent(snykCandidates), snykAnalysisTask.getAnalyzerIdentity(), event);
    performAnalysis(trivyAnalysisTask, new TrivyAnalysisEvent(trivyCandidates), trivyAnalysisTask.getAnalyzerIdentity(), event);
    performAnalysis(vulnDbAnalysisTask, new VulnDbAnalysisEvent(vulnDbCandidates), vulnDbAnalysisTask.getAnalyzerIdentity(), event);
}
The problem is that currently, vulnerabilities are "persisted", and notifications are sent, as soon as they are found. De-duplication is supposed to reduce the noise, so we'd need to refactor the scanning such that these things only happen at the very end, when all scanners completed their work.
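As a rough illustration of de-duplicating only after all scanners have reported, the sketch below groups findings that are connected through aliases and keeps one finding per group, chosen by source priority. Everything in it is an assumption made for illustration (`PostAnalysisDedupSketch`, the `Finding` record, the source names); it does not reflect how Dependency-Track stores or processes findings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: de-duplicate once, after every analyzer has reported. */
final class PostAnalysisDedupSketch {
    /** One finding from one analyzer; aliasIds are the IDs it claims as aliases. */
    record Finding(String source, String vulnId, Set<String> aliasIds) {}

    /** Keeps one finding per alias group: the one whose source ranks highest. */
    static List<Finding> deduplicate(Collection<Finding> findings, List<String> priority) {
        // Union-find over IDs: IDs connected through aliases end up in one group.
        Map<String, String> parent = new HashMap<>();
        for (Finding f : findings) {
            find(parent, f.vulnId());
            for (String alias : f.aliasIds()) {
                union(parent, f.vulnId(), alias);
            }
        }
        // Order by source priority, then by ID so that ties are deterministic.
        Comparator<Finding> byPriority = Comparator
                .comparingInt((Finding f) -> rank(priority, f.source()))
                .thenComparing(Finding::vulnId);
        Map<String, Finding> bestByGroup = new TreeMap<>();
        for (Finding f : findings) {
            bestByGroup.merge(find(parent, f.vulnId()), f,
                    (x, y) -> byPriority.compare(x, y) <= 0 ? x : y);
        }
        return new ArrayList<>(bestByGroup.values());
    }

    // Sources missing from the priority list rank last.
    private static int rank(List<String> priority, String source) {
        int idx = priority.indexOf(source);
        return idx < 0 ? Integer.MAX_VALUE : idx;
    }

    // Returns the group representative of an ID, registering unknown IDs.
    private static String find(Map<String, String> parent, String id) {
        parent.putIfAbsent(id, id);
        String p = parent.get(id);
        if (p.equals(id)) {
            return id;
        }
        String root = find(parent, p);
        parent.put(id, root); // path compression
        return root;
    }

    // Merges two groups; the lexicographically smaller root wins, so the
    // representative does not depend on the order in which findings arrive.
    private static void union(Map<String, String> parent, String a, String b) {
        String ra = find(parent, a);
        String rb = find(parent, b);
        if (ra.equals(rb)) {
            return;
        }
        if (ra.compareTo(rb) < 0) {
            parent.put(rb, ra);
        } else {
            parent.put(ra, rb);
        }
    }
}
```

Because the selection runs over the complete result set, a GHSA processed before its CVE and a CVE processed before its GHSA give the same outcome. Persisting and notifying would then happen on the returned list rather than per analyzer.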
The functionality first checks the priority order, then examines aliases in the context of the vulnerability. Aliases are loaded at the time the vulnerability is created, separate from the vulnerability's processing order.
Suppose VULNDB has an alias [NVD], and NVD has a higher priority. If, for some reason, vulndb is processed before NVD, the system will check if the vulnerability contains any NVD alias. If it does, the system will not add that vulnerability to avoid duplication. Subsequently, when NVD is processed, considering the priority configurations, it will be added.
*I will leave more cases on the PR Description.
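For readers following along, the add-time check described in this reply can be read roughly as the sketch below. It is a reconstruction from the explanation here and from Case 3 in the PR description, not the PR's actual code; `AddTimeAliasCheckSketch`, `Candidate`, and `shouldAdd` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the add-time check as described in the comment above. */
final class AddTimeAliasCheckSketch {
    /** aliasIdBySource maps the source of an alias to its ID, e.g. "NVD" to a CVE ID. */
    record Candidate(String source, String vulnId, Map<String, String> aliasIdBySource) {}

    /** Decides whether a newly reported vulnerability should be attached to a component. */
    static boolean shouldAdd(Candidate c, Set<String> idsOnComponent, List<String> priority) {
        if (idsOnComponent.contains(c.vulnId())) {
            return false; // already attached
        }
        for (Map.Entry<String, String> alias : c.aliasIdBySource().entrySet()) {
            if (idsOnComponent.contains(alias.getValue())) {
                return false; // an alias is already attached (Case 3 in the description)
            }
            if (rank(priority, alias.getKey()) < rank(priority, c.source())) {
                return false; // a higher-priority source is expected to report this one
            }
        }
        return true;
    }

    // Sources missing from the priority list rank last.
    private static int rank(List<String> priority, String source) {
        int idx = priority.indexOf(source);
        return idx < 0 ? Integer.MAX_VALUE : idx;
    }
}
```

Read this way, the decision depends on which aliases the candidate carries at the moment it is processed, which is the assumption the rest of this thread discusses.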
I've read and looked at it three times, and I am not sure I understand it. It seems to rely on the list of aliases being complete and reliable 100% of the time, no matter which source published the vulnerability. For example, when NVD publishes a vulnerability, I am not sure the aliases field in OSV will contain the correct GitHub alias straight away. You would need a lot of test cases to make sure it behaves as expected. You don't want to create false negatives, and you want consistent behaviour: if you have 10 projects all using the exact same component, you want all 10 projects to have the same vulnerabilities from the same source/analyzer attached to that component.
I have observed that vulnerabilities often appear in other sources before NVD assigns a CVE ID. It is essential to select sources like GitHub or VulnDB, which provide the most up-to-date vulnerabilities without waiting for a CVE ID. When NVD releases a CVE ID and DT updates any changes in GitHub or VulnDB, these sources will map the CVE ID to their corresponding VulnID, adding a new alias to the table without creating duplicate "new vulnerabilities" in the component audit vulnerabilities view.
The test cases I provided are reliable and cover all possible outcomes. However, I am open to implementing a consistent check and reporting back on its behavior.
Also I would like to know if we agree on this approach or you still believe that an internal id would be better to tackle this problem?
#1994 (comment)
try {
    priorityList = configPropertyQueryManager.parsePriorityList();
} catch (Exception ex) {
    LOGGER.warn("An unexpected error occurred while retrieving the preference list for alias duplicates", ex);
}
This code will be executed by multiple threads in parallel, hence it's not a good idea to work with class-level fields like this. Also we will want to avoid loading it for every single vulnerability over and over again, so some sort of caching would be necessary.
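One possible shape for this, sketched under assumptions: keep the parsed list in an immutable snapshot behind a `volatile` reference and reload it only after a time-to-live has passed. `CachedPriorityListSketch`, its constructor parameters, and the TTL are hypothetical; the loader stands in for whatever reads the configured priority list.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: a thread-safe, time-bounded cache for the priority list. */
final class CachedPriorityListSketch {
    // Immutable snapshot, so it can be shared between threads once published.
    private record Entry(List<String> value, long loadedAtMillis) {}

    private final Supplier<List<String>> loader; // e.g. reads the config property
    private final LongSupplier clock;            // injected so tests can control time
    private final long ttlMillis;
    private volatile Entry entry;

    CachedPriorityListSketch(Supplier<List<String>> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached list, reloading it at most once per TTL window. */
    List<String> get() {
        Entry e = entry;
        long now = clock.getAsLong();
        if (e == null || now - e.loadedAtMillis() >= ttlMillis) {
            synchronized (this) {
                e = entry; // re-check: another thread may have reloaded already
                if (e == null || now - e.loadedAtMillis() >= ttlMillis) {
                    e = new Entry(List.copyOf(loader.get()), now);
                    entry = e;
                }
            }
        }
        return e.value();
    }
}
```

Callers get an unmodifiable list and never touch a shared mutable field, so parallel analysis threads cannot observe a half-updated list. In production the clock would typically be `System::currentTimeMillis`.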
I will take a look at this.
//Update PriorityList with new Enabled/Disabled Sources to avoid conflicts
ConfigPropertyQueryManager configPropertyQueryManager = new ConfigPropertyQueryManager();
if(configPropertyQueryManager.isDedupEnabled()){
    configPropertyQueryManager.updatePropertiesFromEnabledSources();
}
Seems out-of-place here, perhaps used for testing?
I am seeking a method to update and retrieve the enabled sources before creating a new component. It would be ideal to keep this check in the component creation process to ensure that the most up-to-date data is used and to maintain the correct priority logic.
Should I keep it there or should I move it somewhere else?
Thank you for your comments on the PR. You're right that a more ideal solution would be to modify the data model to have one vulnerability with multiple aliases from different sources. This change would indeed make it easier to determine the number of affected projects for a vulnerability and to sort the list of vulnerabilities by the number of affected projects.
However, I would like to present the current PR as a reasonable approach to solving the issue of duplicate vulnerabilities. The implementation compares the priority of sources and the aliases attached to a component, which helps in reducing duplicates. I understand your concerns about inconsistent outcomes across projects, but this solution is a step towards addressing the issue while we wait for the new database that can accommodate the required changes in the data model.
The issue of duplicate vulnerabilities is persistent, and the noise from duplicates makes it difficult for most of our teams to activate other sources. This is unfortunate, because support for multiple sources is one of the best aspects of Dependency-Track.
In the meantime, I hope you find the current solution helpful. I am open to any feedback or suggestions to improve it further. I will address and answer your reviews and questions, @nscuro.
Best,
Hi. With your feature we could try to distinguish between publisher advisories that arrive through OSV and CVEs, which are the unique IDs of vulnerabilities. For example, the Debian advisory "[SECURITY] [DSA 5759-1] python3.11 security update" (debian.org), which arrives from OSV as DSA-5759-1, is about 3 vulnerabilities (NVD - CVE-2024-8088, NVD - CVE-2024-4032, NVD - CVE-2024-8088).
Description
This PR introduces a new functionality that addresses the issue of duplicate vulnerabilities by comparing the priority of sources and the aliases attached to a component.
The implementation required adding new database rows to support the changes.
To ensure the correctness of the implementation, tests have been added to validate the behavior of the updated functionality.
In addition, a new API endpoint has been added to get the actual Enabled Sources.
It is important to note that this update specifically affects the addVulnerability function and is not able to delete a vulnerability in any case.
Frontend changes: DependencyTrack/frontend#838
I'm open to discussing any changes or improvements👍🏽
Examples:
Alias Deduplication Disabled
Vulnerability source with highest priority: NVD
Vulnerability source with highest priority: GITHUB
Flow Charts for better understanding:
Case 3: (The vulnerability source has a higher priority, but the alias is in the component):
This case can occur only if a higher-priority source vulnerability does not exist at that time. A lower-priority source vulnerability will be added. Later, upon alias mapping and reanalysis, the higher-priority source vulnerability will not be added because the alias is already present.
Addressed Issue
This PR fixes #1994 and #2181
Additional Details
Adds a new file named ConfigPropertyQueryManager.java to manage functions related to the EnabledSources.
Checklist