Text Vectorization of User Agent Strings

In this approach, user agent strings are encoded via a vector database to enable a model-assisted threat hunting workflow.

This methodology allows us to:

  • (0) Parse the user agent string into a set of tokens (string elements).
  • (1) Hash each token and convert the hash to a number.
  • (2) Produce a pseudo-random number specific to each token.
  • (3) Create a numerical signature from this list of numbers; the signature can then be used to compare user agents.
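
The four steps above can be sketched in plain Python. This is a minimal illustration of one possible reading of the steps, not the implementation from the blog or the linked code. The delimiter set in `tokenize`, the use of SHA-256, the signature length `DIMENSIONS`, the choice to draw a list of pseudo-random numbers per token and sum them, and cosine similarity as the comparison are all assumptions made for this example.

```python
import hashlib
import math
import random
import re

DIMENSIONS = 64  # assumed signature length, chosen only for this sketch


def tokenize(user_agent):
    """Step 0: split the user agent string into tokens (assumed delimiters)."""
    return [t for t in re.split(r"[\s/;(),]+", user_agent) if t]


def token_to_number(token):
    """Step 1: hash a token and convert part of the digest to an integer."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def token_vector(token, dimensions=DIMENSIONS):
    """Step 2: pseudo-random numbers that depend only on the token."""
    rng = random.Random(token_to_number(token))  # hash value used as the seed
    return [rng.gauss(0.0, 1.0) for _ in range(dimensions)]


def signature(user_agent, dimensions=DIMENSIONS):
    """Step 3: combine the per-token numbers into one numerical signature."""
    total = [0.0] * dimensions
    for token in tokenize(user_agent):
        for i, value in enumerate(token_vector(token, dimensions)):
            total[i] += value
    return total


def cosine_similarity(a, b):
    """Compare two signatures; values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because each token always maps to the same pseudo-random numbers, two user agents that share most of their tokens end up with signatures pointing in nearly the same direction, while unrelated user agents tend to score close to zero. For example, `cosine_similarity(signature(ua_a), signature(ua_b))` could be used to rank user agents by how closely they resemble a known one.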

Blog: "Text Vectorisation, Clustering and Similarity Analysis With Splunk: Exploring User Agent Strings at Scale"
Author: Josh Cowling
Code: GitHub