README.md

Text Vectorization of User Agent Strings

In this approach, user agent strings are encoded as numerical vectors and stored in a vector database to enable a model-assisted threat hunting workflow.

The method works in four steps:

(0) Parse the user agent string into a set of tokens (string elements).
(1) Hash each token, converting it to a number.
(2) Produce a pseudo-random number specific to that token.
(3) Combine the resulting list of numbers into a numerical signature that can be used to compare user agents.
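The four steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the blog's Splunk implementation. The tokenization regex, SHA-256 as the token hash, `random.Random` seeded with that hash as the pseudo-random generator, the 64-value signature length, and the MinHash-style combination (element-wise minimum across tokens) are all illustrative choices. The names `tokenize`, `signature`, `similarity` and `NUM_PERM` are hypothetical.

```python
import hashlib
import random
import re

NUM_PERM = 64         # hypothetical signature length (values per token)
EMPTY_SENTINEL = 2**32  # larger than any 32-bit value; used when there are no tokens


def tokenize(user_agent: str) -> set[str]:
    # Step (0): split on any run of characters that is not a letter, digit,
    # dot or underscore (an assumed rule), lowercase, and drop empty strings.
    return {t.lower() for t in re.split(r"[^A-Za-z0-9._]+", user_agent) if t}


def token_hash(token: str) -> int:
    # Step (1): stable hash -> integer. Python's built-in hash() is salted
    # per process, so SHA-256 is used to keep signatures reproducible.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def token_values(token: str, num_perm: int = NUM_PERM) -> list[int]:
    # Step (2): seed a PRNG with the token's hash, so the same token always
    # yields the same sequence of pseudo-random 32-bit numbers.
    rng = random.Random(token_hash(token))
    return [rng.getrandbits(32) for _ in range(num_perm)]


def signature(user_agent: str, num_perm: int = NUM_PERM) -> list[int]:
    # Step (3), MinHash-style (assumed): keep the element-wise minimum of the
    # per-token values, giving a fixed-length signature per user agent.
    tokens = tokenize(user_agent)
    if not tokens:
        return [EMPTY_SENTINEL] * num_perm
    per_token = [token_values(t, num_perm) for t in tokens]
    return [min(column) for column in zip(*per_token)]


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of matching positions estimates the Jaccard similarity
    # of the two token sets.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


ua_a = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
ua_b = ua_a.replace("Chrome/120.0.0.0", "Chrome/119.0.0.0")
ua_c = "curl/8.4.0"

if __name__ == "__main__":
    sig_a, sig_b, sig_c = signature(ua_a), signature(ua_b), signature(ua_c)
    # a and b share 14 of 16 distinct tokens, so their estimate should be near 0.875.
    print(f"a vs b: {similarity(sig_a, sig_b):.3f}")
    print(f"a vs c: {similarity(sig_a, sig_c):.3f}")
```

Because each signature has a fixed length regardless of how many tokens a user agent contains, signatures can be compared position by position or stored as fixed-size vectors. A longer signature reduces the variance of the similarity estimate at the cost of more storage.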

Blog: "Text Vectorisation, Clustering and Similarity Analysis With Splunk: Exploring User Agent Strings at Scale"
Author: Josh Cowling
Code: