Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SBOMs from Ubuntu with purls seam not to work with Google OSV #4445

Open
2 tasks done
Andre-85 opened this issue Dec 6, 2024 · 15 comments
Open
2 tasks done

SBOMs from Ubuntu with purls seam not to work with Google OSV #4445

Andre-85 opened this issue Dec 6, 2024 · 15 comments
Labels
defect Something isn't working in triage

Comments

@Andre-85
Copy link

Andre-85 commented Dec 6, 2024

Current Behavior

Hello together,
I've noticed that in DependencyTrack in version 4.12.1 supports fetching vulnerabilities from Google's OSV service. So wanted to test DependencyTrack with an SBOM containing PURLs i can also use for CVE-fetching via Google API. As example i choose the ubuntu package atftp in version 0.7.git20120829-3.1~0.18.04.1 which has the known CVE'S CVE-2020-6097, CVE-2021-46671 and CVE-2021-41054. But DependencyTrack reports no issues at all.

Since i was a bit unsure about the supported debian/ubuntu PURL format, i made different variants containing arch and/or distro and completely without arch/distro:
purl_test_name_version.json
purl_test_name_version_arch_armhf.json
purl_test_name_version_arch_armhf_distro_bionic.json
purl_test_name_version_arch_src.json
purl_test_name_version_arch_src_distro_bionic.json
purl_test_name_version_distro_bionic.json

Steps to Reproduce

Upload the SBOMs which i uploaded to this ticket and you will find no vulnerabilities.

Expected Behavior

Just last week Google closed issue #2842 (reported by me) on OSV services and made CVE-fetching via purl possible.
So right now it is possible to fetch the CVE's (CVE-2020-6097, CVE-2021-46671 and CVE-2021-41054) belonging to atftp for the given version from Google OSV with:
curl -d '{"package": {"purl": "pkg:deb/ubuntu/[email protected]~0.18.04.1"}}' https://api.osv.dev/v1/query

I would be great if DependencyTrack could fetch the CVE's with a SBOM with this PURL in it.

Let me know how I can help to figure this problem out.

Dependency-Track Version

4.12.1

Dependency-Track Distribution

Container Image

Database Server

PostgreSQL

Database Server Version

No response

Browser

Mozilla Firefox

Checklist

@valentijnscholten
Copy link
Contributor

Just last week Google closed issue #2842 (reported by me)

The correct link (I assume) is: google/osv.dev#2842 :-)

@Andre-85
Copy link
Author

I updated to Version 4.12.2 but still the same

@Andre-85
Copy link
Author

While looking for a solution i found #1374. Also this problem is caused by a insuffient regex in version parsing. While fixing this bug i developed a theory about how to parse versions, including semantic versioning 2.0.0 and debian versions in general (also see in the issue).

My plan is to develop an enhanced parsing algorithm based on the current one.

For testing I thought about picking random version pairs (because there are so many) from a list of all ubuntu version numbers (they are a superset of debian version numbers and are ~130000 single entries so ~8.5 billion possible pairs), compare each with the enhanced algorithm and dpkg --compare-versions and check if the verdict is in all tested cases the same. From the most representative samples i want to create unittests.

Same i want to do with semantic version pairs (but from what list?) and with a "offical" (which one?) comparing tool.

Further it would be great if this could be done also with version numbers from other ecosystems on Google OSV. But also here would be the question from which version lists and with what offical comparision tool.

@Andre-85
Copy link
Author

Andre-85 commented Jan 2, 2025

Hi @ALL,
i got a working comparison algorithm for generic version parsing with an a test for debian/ubuntu versions.

It works in following steps:

  1. Ecosystem needs to be defined, meaning the sort priority of version number elements. I use three lists for that. In list1 are the elements which sort lower than the empty string (in debian : 1.0.0rc1 < 1.0.0), which sort same as empty string (#: 1.0.0#build1 = 1.0.0) and which sort higher than empty string (in debian anything else), so i got:
 this.eco = new Ecosystem("deb", List.of("~"), List.of("#"), List.of("\\d+", "[a-z]+", "\\+", "-", "\\.", ":"));

which equivalent to ~ < # = "" < \d+ < [a-z]+ < + < - < . < :, and what the debian version algorithm does: https://salsa.debian.org/dpkg-team/dpkg/-/blob/main/scripts/Dpkg/Version.pm?ref_type=heads#L351
2. Tokenize: Concat all the list elements by logical or with in accending order. Matched group number is Token sort priority, the match is the Token value:

String regex = this.eco.getElements().stream().map(e -> "(" + e + ")").reduce((e1, e2) -> e1 + "|" + e2).orElse("");
while (matcher.find()) {
             for (int i = 1; i <= this.eco.getElements().size(); i++) {
                if (matcher.group(i) != null) {
                    String value = matcher.group(i);
                    if(debug) {
                        System.out.print(" '"+value+"' ");
                    }
                    fields.add(new Token(i - 1, value));
                    break;
                }
             }
          }

3.Compare one-by-one: First check priority, then check value. if value is a numeric add leading zeros that both strings got same length. Stop on first difference:

while(true) {
            version1_field = version1_iterator.hasNext() ? version1_iterator.next() : null;
            version2_field = version2_iterator.hasNext() ? version2_iterator.next() : null;


            Integer priority1 = (version1_field != null) ? version1_field.getPriority() : this.eco.getEndOfStringPriority();
            Integer priority2 = (version2_field != null) ? version2_field.getPriority() : this.eco.getEndOfStringPriority();

            if((result_code = Integer.compare(priority1, priority2)) != 0) {
                break;
            }


            if (version1_field == null || version2_field == null) {
                break;
            }

            String value1 = version1_field.getValue();
            String value2 = version2_field.getValue();

            if(Character.isDigit(value1.charAt(0)) && Character.isDigit(value2.charAt(0))) {
                if(value1.length() > value2.length()) {
                    value2 = "0".repeat(value1.length() - value2.length()) + value2;
                }
                else if(value1.length() < value2.length()) {
                    value1 = "0".repeat(value2.length() - value1.length()) + value1;
                }
            }

            if((result_code = value1.compareTo(value2)) != 0) {
                break;
            }
        }

@Andre-85
Copy link
Author

Andre-85 commented Jan 2, 2025

The algorithm it tested with the mentioned testdata set with 130000 version pairs and i discovered some corner cases which are not covered by other implementations:

  1. Some packages use as a version number the datetime in the format YYYYMMDDhhmm which exceeds Integer and crashes many implementations.
  2. Some version number have more than one dash, e.g. 1.0-0-1. According to debian the last dash is the splitter between original version and debian version, so original version should be 1.0-0 and debian version 1. The problem is now that assuming the original version as a semver than the sorting order is 1.0-0 < 1.0 (since - sorts before empty version in semver), but in fact it is sorted as debian version 1.0-0 > 1.0. For accieving the same in in debian order you need the write 1.0~0 . So in general all Debian-Packages should replace - from the original version number with ~ to preserve the version order. Do this consequently, all debian version should contain one dash maximal, but so in fact we need to handle the 2+ dash cases, even if they should IMHO considered be as faulty
  3. The epoch which defaults to 0 if not set

To circumvent these problems i did the design as followed:

  1. Do not parse numbers at all. For comparing it is suffient to pad the number with the less digits with leading zeros to the length of the longer number and do an alphabetic sort then. So the number of the digits maximum allowed is the maximum allowed numbers of characters in a string.
  2. Search for the last dash before tokenizing in the version string an replace it by a \n, assign the \n the next higher priority to EndOfString. This will cause the the versions to be parsed separated and avoids a very complex parser which needs to do a lot of look-ahead to find the last dash.
  3. Just search for epoch separator :, if not set add '0:' to the front. Also this helps to keep the tokenizer simple.

@Andre-85
Copy link
Author

Andre-85 commented Jan 2, 2025

In the next days i will create an fork and add my algorithm. For who's interessted, the algorithm so far:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.*;

public class vers {
    private static boolean debug = true;

    public class Token {
        private Integer priority;
        private String value;

        public Token(Integer priority, String value) {
            this.priority = priority;
            this.value = value;
        }

        public Integer getPriority() {
            return priority;
        }

        public String getValue() {
            return value;
        }
    }

    public class Ecosystem {
        private String name;
        private final Integer equalToEmptyStringIndex;
        private List<String> elements;

        public Ecosystem(String name, List<String> pre_elements, List<String> ignore_elements, List<String> post_elements) {
            this.name = name;
            this.equalToEmptyStringIndex = pre_elements.size();
            this.elements = new ArrayList<>();
            this.elements.addAll(pre_elements);
            this.elements.addAll(ignore_elements);
            // This acts as a splitter between two different version blocks which are compared separatly
            this.elements.add("\n");
            this.elements.addAll(post_elements);
        }

        public String getName() {
            return name;
        }
        
        public Integer getEndOfStringPriority() {
            return equalToEmptyStringIndex;
        }

        public List<String> getElements() {
            return elements;
        }
    }

    public Ecosystem eco;

    public vers() {
       this.eco = new Ecosystem("deb", List.of("~"), List.of("#"), List.of("\\d+", "[a-z]+", "\\+", "-", "\\.", ":"));
    }

    public List<Token> tokenize(String version) {
        List<Token> fields = new ArrayList<>();
        String regex = this.eco.getElements().stream().map(e -> "(" + e + ")").reduce((e1, e2) -> e1 + "|" + e2).orElse("");
        System.out.println("regex: "+regex);
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // Debian/Ubuntu specific
        if(this.eco.name=="deb") {
            // When no epoch is given, use default epoch 0
            if (!version.contains(":")) {
               version = "0:" + version;
            }

           // So we replace '-' which acts a blocks splitter (split between upstream and debian version) by the block-splitter "\n" since
           // also upstream versions uses sometimes '-'. But to follow semver with debian sorting it should be debian pre-splitter '~'
           int debianSplitterIndex = version.lastIndexOf("-");
           if(debianSplitterIndex > 0) {
               version = version.substring(0, debianSplitterIndex) + "\n" + version.substring(debianSplitterIndex + 1);
           }
        }

        Matcher matcher = pattern.matcher(version);

         if(debug) {
             System.out.print("Token: ");
         }
         while (matcher.find()) {
             for (int i = 1; i <= this.eco.getElements().size(); i++) {
                if (matcher.group(i) != null) {
                    String value = matcher.group(i);
                    if(debug) {
                        System.out.print(" '"+value+"' ");
                    }
                    fields.add(new Token(i - 1, value));
                    break;
                }
             }
          }

        if(debug) {
            System.out.println("");
        }

        return fields;
    }

    public Integer compareTo(List<Token> version1, List<Token> version2) {
        final Integer version1_code = 1;
        final Integer version2_code = -1;
        Integer result_code = 0;

        Iterator<Token> version1_iterator = version1.iterator();
        Iterator<Token> version2_iterator = version2.iterator();

        Token version1_field;
        Token version2_field;

        while(true) {
            version1_field = version1_iterator.hasNext() ? version1_iterator.next() : null;
            version2_field = version2_iterator.hasNext() ? version2_iterator.next() : null;


            Integer priority1 = (version1_field != null) ? version1_field.getPriority() : this.eco.getEndOfStringPriority();
            Integer priority2 = (version2_field != null) ? version2_field.getPriority() : this.eco.getEndOfStringPriority();

            if((result_code = Integer.compare(priority1, priority2)) != 0) {
                break;
            }


            if (version1_field == null || version2_field == null) {
                break;
            }

            String value1 = version1_field.getValue();
            String value2 = version2_field.getValue();

            if(Character.isDigit(value1.charAt(0)) && Character.isDigit(value2.charAt(0))) {
                if(value1.length() > value2.length()) {
                    value2 = "0".repeat(value1.length() - value2.length()) + value2;
                }
                else if(value1.length() < value2.length()) {
                    value1 = "0".repeat(value2.length() - value1.length()) + value1;
                }
            }

            if((result_code = value1.compareTo(value2)) != 0) {
                break;
            }
        }

        if(debug) {
            if(result_code > 0)
            {
                System.out.println("version1 >> version2");
            }
            else if(result_code < 0)
            {
                System.out.println("version1 << version2");
            }
            else{
                System.out.println("version1 = version2");
            }
        }

        return result_code;
    }


    public void printToken(Token token) {
        Integer priority = token.getPriority();
        String key = this.eco.getElements().get(priority);
        System.out.print("'c=" + key + ";p=" + priority + ";v="+ token.getValue()+"' ");
    }

    public void printTokens(List<Token> fields) {
        System.out.print("Token eval: ");
        Iterator<Token> fields_iterator = fields.iterator();
        while(fields_iterator.hasNext())
        {
            Token field = fields_iterator.next();
            printToken(field);
            System.out.print(" ");
        }
        System.out.println("");
    }

    public static void main(String[] args) {
        // Check if the user provided exactly two versions
        if (args.length != 2) {
            System.out.println("Usage: java vers <version1> <version2>");
            return;
        }

        // Retrieve the versions from the command line
        String version1 = args[0];
        String version2 = args[1];

        // Perform some operation with the versions
        if(debug) {
            System.out.println("Version 1: " + version1);
            System.out.println("Version 2: " + version2);
        }
      
        vers v = new vers();

        List<Token> token1 = v.tokenize(version1);
        if(debug) {
            v.printTokens(token1);
        }

        List<Token> token2 = v.tokenize(version2);
        if(debug) {
            v.printTokens(token2);
        }

        Integer result = v.compareTo(token1, token2);

        if(result < 0)
            result = -1;
        else if(result > 0)
            result = 1;

        System.exit(result);

    }
}

@nickrmc83
Copy link

👋 We're very much enjoying DT. Thanks for your efforts in building such a great tool!

We're also experiencing issues with identifying vulnerabilities against Ubuntu packages and it'd be great to see this issue addressed as it'd allow us to move entirely to DT to provide vulnerability detection.

Tangentially, during our investigations that have lead us to this issue, we've identified an upstream issue here relating to invalid PURLs being published to OSV from Canonical. These look like they'll be addressed quickly.

@nscuro
Copy link
Member

nscuro commented Jan 8, 2025

@nickrmc83 I think this effort could be greatly accelerated by assisting @Andre-85 in reviewing and testing their implementation against realistic datasets. Waiting for me to have capacity to do so will delay this, I fear.

@Andre-85 Once you have something you're comfortable with sharing, we could ask others in the community (i.e. via Slack) to help here. We could publish container images to Docker Hub to make it easier for folks to kick the tires. Let me know if I can help facilitate anything.

@nickrmc83
Copy link

I'm happy to provide some test cases and real-world data that can be used to verify @Andre-85's changes. It's unlikely I'll be able to commit to thoroughly testing a new version outside of a quick validation for at least a couple of months.

@Andre-85
Copy link
Author

Andre-85 commented Jan 9, 2025

Hi @ALL,
thank you very much for your support offers! Right now I'm fitting my new algorithm in the DependencyTrack-Code and formatting my test code as regular unittests. As soon as this is finished I would this pack this in a container (this i can do by myself) and publish them for testing and discussing (here i could need help).

@nickrmc83 What kind of test data can you provide and for which ecosystem? Right now my testdata are random version numbers picked from a file (one per line in a file ubuntu_versions.txt). With these random version numbers I run the algorithm (right now its not in the unittest format of course) and compare it with the official tool.

#!/usr/bin/python3

import random
import pathlib
import subprocess
import sys

operators = { 255:"<<", 0:"=", 1:">>" }

def run_dpkg(version1, operator, version2):
    return subprocess.run(['dpkg', '--compare-versions', version1, operator, version2], check=False, shell=False).returncode

def run_java_vers(version1, version2):
    return operators[subprocess.run(['java','vers', version1, version2], check=False, shell=False).returncode]

testset_path = pathlib.Path("tests").joinpath("testsets","ubuntu_versions.txt")
#operators = [ "<<", "=", ">>" ]

# Read all lines from the file
if testset_path.exists():
    testset = testset_path.read_text().splitlines()  # Read lines and split by newline
    # Select two random lines
    if len(testset) >= 2:

        while True:
            versions = random.choices(testset, k=2)  # Randomly pick 2 lines without replacement
            for op in operators.values():
                code = run_dpkg(versions[0], op, versions[1])
                if code == 0:
                    relation = run_java_vers(versions[0], versions[1])
                    check = (relation == op)

                    print("Run: "+versions[0]+" "+op+" "+versions[1]+" "+str(check))
                    if not check:
                        sys.exit(0)
    else:
        print("The file doesn't have enough lines to select two.")
else:
    print(f"File '{testset_path}' does not exist.")

But a testset with a file with version1 RELATION version2 per line would also be ok.

@nickrmc83
Copy link

nickrmc83 commented Jan 10, 2025

@Andre-85 the relatively simple test data I can provide would be a real-world SBOM and associated PURLs and a list of CVEs we'd expect to see. This may allow for confirmation for end-to-end functionality as opposed to your Debian and Ubuntu version parsing logic.

@Andre-85
Copy link
Author

@nickrmc83 This would great if you could send this testdata to me. Even more interesting it would be to have testdata which is for other Ecosystems than Debian or Ubuntu. The issue you filed (canonical/ubuntu-security-notices#4) affects also DependencyTrack abilities i guess?

@nickrmc83
Copy link

nickrmc83 commented Jan 12, 2025

@nickrmc83 This would great if you could send this testdata to me. Even more interesting it would be to have testdata which is for other Ecosystems than Debian or Ubuntu. The issue you filed (canonical/ubuntu-security-notices#4) affects also DependencyTrack abilities i guess?

I can share an SBOM for Debian and Ubuntu only. Whilst the SBOMs I'll share are non-sensitive and will have any redactions made, I'd prefer to DM them to you.

You can compare the SBOMs I provide against findings identified by other tools using the same input. We've been using grype to this point.

I'm not sure how impacted dependency-track is by the Canonical issue, I didn't dive in deep on the impact, but I imagine if DT depends on qualifiers from OSV-published PURLs, there's a chance it would include either false-positive or false-negatives as a result.

@Andre-85
Copy link
Author

Hi @ALL,
i managed to make the test runs more efficient. From my giant ubuntu version dataset of ~180000 entries, i extracted the one which fit to truely different regexes. This gives ~1200 entries. These entries i presorted with sort -n and run then a bubble sort with dpkg --compare-version x '<<' y on them, so the list is now sorted from the oldest to the latest version. (Link: https://github.com/Andre-85/dependency-track/blob/feature/improved-version-compare/src/test/resources/version/ubuntu2.txt)

So running a test on the implementation requires now only picking a version from the list, pick a second one more down in the list and compare them (https://github.com/Andre-85/dependency-track/blob/feature/improved-version-compare/src/test/java/org/dependencytrack/util/ComponentVersionTest.java). The result needs to be always <0.

For the ones interested how i did the bubble sort:

import subprocess

def dpkg_compare_versions(version1, version2, operator):
    """Compare two versions using dpkg --compare-versions."""
    result = subprocess.run([
        'dpkg', '--compare-versions', version1, operator, version2
    ])
    return result.returncode == 0

def bubble_sort_versions(versions):
    """Sort a list of versions using bubble sort and dpkg --compare-versions."""
    n = len(versions)
    for i in range(n):
        swapped = False
        for j in range(0, n-i-1):
            # Use dpkg --compare-versions to compare versions
            if dpkg_compare_versions(versions[j], versions[j+1], '>>'):
                # Swap if the current version is greater than the next
                versions[j], versions[j+1] = versions[j+1], versions[j]
                print("Swapped " +  versions[j+1] + " and " + versions[j])
                swapped = True

        # If no two elements were swapped, the list is sorted
        if not swapped:
            break
    return versions

def main():
    # Input file with one version per line
    input_file = 'ubuntu2.txt'  # Replace with your file name

    # Read the file and get the versions as a list
    with open(input_file, 'r') as file:
        versions = [line.strip() for line in file if line.strip()]

    print("Initial list of versions:")
    print(versions)

    # Sort the versions using bubble sort
    sorted_versions = bubble_sort_versions(versions)

    # Write the sorted versions back to a file
    output_file = 'sorted_versions.txt'
    with open(output_file, 'w') as file:
        file.write('\n'.join(sorted_versions))

    print("\nSorted versions:")
    print(sorted_versions)
    print(f"\nSorted versions saved to {output_file}")

if __name__ == '__main__':
    main()

@Andre-85
Copy link
Author

@nickrmc83 It would be interesting to convert the SBOM entries and the CVE entries to expect in a unittest (https://github.com/Andre-85/dependency-track/blob/feature/improved-version-compare/src/test/java/org/dependencytrack/util/ComponentVersionTest.java). Then just a run of mvn test -Dtest=ComponentVersionTest should be suffient to see if your SBOM purl versions will be processed correctly.

DM the SBOM to me is ok. But howto do it github without posting my email address here in public?

You said you were using grype before. What was your reason to switch to DependencyTrack? And how to grype compare to here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Something isn't working in triage
Projects
None yet
Development

No branches or pull requests

4 participants