In this approach, user agent strings are encoded as numerical vectors for a vector database, enabling a model-assisted threat-hunting workflow.
This methodology allows us to:
- (0) Parse the user agent string into a set of tokens (string elements).
- (1) Hash each token and convert the hash to a number.
- (2) Produce a pseudo-random number specific to that token.
- (3) Build a numerical signature from the resulting list of numbers. This signature can then be used to compare user agents.
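The steps above can be sketched in Python. This is a minimal MinHash-style illustration, which is one common way to realise steps (0) to (3). It is not necessarily how the original Splunk implementation works. It makes four assumptions:

- A simple delimiter-based tokenizer for step (0).
- SHA-1 truncated to 32 bits for step (1).
- 64 seeded linear hash functions standing in for the per-token pseudo-random numbers in step (2).
- The minimum value per hash function forming the signature in step (3).

All names (`tokenize`, `token_to_int`, `signature`, `estimated_similarity`, `NUM_PERMUTATIONS`) are hypothetical.

```python
import hashlib
import random
import re

# Hypothetical MinHash-style sketch; not the original author's implementation.
NUM_PERMUTATIONS = 64            # assumed signature length
MERSENNE_PRIME = (1 << 61) - 1   # modulus for the linear hash functions
MAX_HASH = (1 << 32) - 1         # keep every signature value within 32 bits


def tokenize(user_agent: str) -> set[str]:
    """Step 0: split a user agent string into lowercase tokens (assumed delimiters)."""
    return {t for t in re.split(r"[\s/;()\[\],]+", user_agent.lower()) if t}


def token_to_int(token: str) -> int:
    """Step 1: hash a token with SHA-1 and turn the first 4 bytes into an integer."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


# A fixed seed means every signature uses the same family of hash functions,
# so signatures computed at different times remain comparable.
_rng = random.Random(42)
_HASH_PARAMS = [
    (_rng.randint(1, MERSENNE_PRIME - 1), _rng.randint(0, MERSENNE_PRIME - 1))
    for _ in range(NUM_PERMUTATIONS)
]


def signature(user_agent: str) -> list[int]:
    """Steps 2-3: for each hash function (a, b), map every token to a
    pseudo-random value and keep the minimum; the minima form the signature."""
    values = [token_to_int(t) for t in tokenize(user_agent)]
    if not values:
        return [MAX_HASH] * NUM_PERMUTATIONS
    return [
        min(((a * v + b) % MERSENNE_PRIME) & MAX_HASH for v in values)
        for a, b in _HASH_PARAMS
    ]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching positions estimates the Jaccard similarity
    of the two user agents' token sets."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    chrome_a = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
    chrome_b = chrome_a.replace("Chrome/120.0", "Chrome/121.0")
    curl = "curl/8.4.0"
    print(estimated_similarity(signature(chrome_a), signature(chrome_b)))
    print(estimated_similarity(signature(chrome_a), signature(curl)))
```

Two near-identical browser strings should score close to their true token overlap. An unrelated client such as `curl` should score near zero. Because each signature is a fixed-length list of integers, it could be stored as a vector for similarity lookups. How a particular vector database indexes such signatures is outside the scope of this sketch.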
Blog: "Text Vectorisation, Clustering and Similarity Analysis With Splunk: Exploring User Agent Strings at Scale"
Author: Josh Cowling
Code: