## Introduction
This project contains a collection of functions for manipulating and analyzing malware feature vectors.
From Wikipedia: “… a feature vector is an n-dimensional vector of numerical features that represent some object.”
Malware features numerically represent various aspects of a sample's (mostly dynamic) analysis and can be used to determine the relational proximity between malware samples and families.
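As a minimal illustration, suppose a sample's (mostly dynamic) analysis has been reduced to counts of named behaviors. Mapping every sample onto one shared, ordered feature list turns each sample into an n-dimensional numeric vector whose positions line up across samples. The feature names and the `to_vector` helper are hypothetical, not the project's actual feature set.

```python
# Hypothetical feature names; the project's real feature set may differ.
FEATURE_NAMES = ["files_opened", "registry_keys_read", "mutexes_created", "dns_queries"]


def to_vector(behavior_counts, feature_names=FEATURE_NAMES):
    """Map a {feature_name: count} dict onto a fixed feature order.

    Features the sample never exhibited are filled with 0.0 so that every
    vector has the same length n == len(feature_names).
    """
    return [float(behavior_counts.get(name, 0)) for name in feature_names]


sample_a = {"files_opened": 12, "mutexes_created": 2}
sample_b = {"files_opened": 10, "mutexes_created": 3, "dns_queries": 4}

print(to_vector(sample_a))  # [12.0, 0.0, 2.0, 0.0]
print(to_vector(sample_b))  # [10.0, 0.0, 3.0, 4.0]
```

Fixing the feature order is what makes comparisons between samples meaningful: position i refers to the same behavior in every vector.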
Disclaimer: this project is very much an alpha prototype. Your mileage may vary.
For an in-depth overview of the project, you may want to check out my DerbyCon presentation:
## Requirements
- Cuckoo Sandbox set up on a Linux host (cuckoosandbox.org)
- A MySQL database containing the tables defined in mfv_tables.sql, which can be created with:
  `mysql -u <mysql_user> -p <mfv_db_name> < mfv_tables.sql`
- Python 2.7.x and pip
- The required Python packages, which can be installed with:
  `sudo pip install -r requirements.txt`
- config.py must be updated with your credentials; see config.py.example
- If using Plotly, your account credentials must be stored in ~/.plotly; see plot.ly for more info
mfv.py:
- Core module of the project; defines the FeatureVector class
- Defines functions to manipulate vectors, perform statistical analysis, and display plots
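The actual FeatureVector class lives in mfv.py; what follows is only a hypothetical sketch of what such a container might hold, assuming each vector carries a sample identifier, a set of tags (family, filetype, source) and a mapping of feature names to numeric values. The attribute and method names are illustrative, not the project's API.

```python
class FeatureVector(object):
    """Hypothetical container for one sample's features and tags."""

    def __init__(self, sample_id, features, tags=None):
        self.sample_id = sample_id
        # {feature_name: numeric value}
        self.features = dict(features)
        # e.g. {"family": "family_a", "filetype": "exe", "source": "feed_x"}
        self.tags = dict(tags or {})

    def get_tag(self, key):
        """Return the tag value for key (e.g. "family"), or None if unset."""
        return self.tags.get(key)

    def values(self, feature_names):
        """Return feature values in the given order, 0.0 for missing features."""
        return [float(self.features.get(name, 0.0)) for name in feature_names]

    def __repr__(self):
        return "FeatureVector(%r, family=%r)" % (self.sample_id, self.get_tag("family"))


fv = FeatureVector("sample-001", {"files_opened": 12, "dns_queries": 4},
                   tags={"family": "family_a", "source": "feed_x"})
print(fv.values(["files_opened", "mutexes_created", "dns_queries"]))  # [12.0, 0.0, 4.0]
print(fv)  # FeatureVector('sample-001', family='family_a')
```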
- Groups vectors based on shared tags (family, filetype, source)
- Normalizes vectors and creates “archetypes”
- Plots all vectors in each family along with the family archetype
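The grouping and archetype steps above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each sample is a (tags, features) pair of plain dicts with non-negative counts, that "normalizing" means dividing a vector by its largest value, and that an archetype is the feature-by-feature mean of a family's normalized vectors. The function names are hypothetical.

```python
from collections import defaultdict


def normalize(features):
    """Scale a {name: value} dict so its largest value becomes 1.0.

    Assumption: per-vector max scaling of non-negative counts; the project
    may normalize differently.
    """
    peak = max(features.values()) if features else 0.0
    if peak == 0:
        return dict((name, 0.0) for name in features)
    return dict((name, float(value) / peak) for name, value in features.items())


def group_by_tag(samples, tag_key):
    """Group the feature dicts of (tags, features) pairs by one tag, e.g. "family"."""
    groups = defaultdict(list)
    for tags, features in samples:
        if tag_key in tags:
            groups[tags[tag_key]].append(features)
    return dict(groups)


def make_archetype(member_features):
    """Average the normalized member vectors feature by feature.

    A feature missing from a member counts as 0.0 for that member.
    """
    normalized = [normalize(f) for f in member_features]
    names = set()
    for f in normalized:
        names.update(f)
    count = float(len(normalized))
    return dict((name, sum(f.get(name, 0.0) for f in normalized) / count)
                for name in names)


samples = [
    ({"family": "family_a"}, {"files_opened": 10, "dns_queries": 5}),
    ({"family": "family_a"}, {"files_opened": 4, "dns_queries": 4}),
    ({"family": "family_b"}, {"mutexes_created": 8, "files_opened": 2}),
]
archetypes = dict((family, make_archetype(members))
                  for family, members in group_by_tag(samples, "family").items())
print(sorted(archetypes["family_a"].items()))  # [('dns_queries', 0.75), ('files_opened', 1.0)]
```

Normalizing each vector before averaging keeps one very noisy sample from dominating its family's archetype simply because it produced larger raw counts.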
- Similar to autogen_families.py
- Uses the database instead of creating archetypes
- Compares a test vector to each stored archetype
- Normalizes and prunes the sample under test to fit each archetype
- Finds the Euclidean distance from the test vector to each archetype
- Suggests the likely family based on the previously calculated distances
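The comparison steps above can be sketched as follows. It is a hedged illustration only: the max-value normalization scheme, normalizing before pruning, and the function names are all assumptions, and archetypes are taken to be {feature_name: value} dicts such as might be loaded from the database.

```python
import math


def normalize(features):
    """Scale a {name: value} dict so its largest value becomes 1.0 (assumed scheme)."""
    peak = max(features.values()) if features else 0.0
    if peak == 0:
        return dict((name, 0.0) for name in features)
    return dict((name, float(value) / peak) for name, value in features.items())


def prune_to(features, archetype):
    """Keep only the archetype's features; fill any the test vector lacks with 0.0."""
    return dict((name, features.get(name, 0.0)) for name in archetype)


def euclidean(a, b):
    """Euclidean distance between two dicts that share the same keys."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


def best_guess(test_features, archetypes):
    """Return (family, distance) pairs sorted from closest to farthest."""
    normalized = normalize(test_features)
    distances = []
    for family, archetype in archetypes.items():
        pruned = prune_to(normalized, archetype)
        distances.append((family, euclidean(pruned, archetype)))
    return sorted(distances, key=lambda pair: pair[1])


archetypes = {
    "family_a": {"files_opened": 1.0, "dns_queries": 0.75},
    "family_b": {"mutexes_created": 1.0, "files_opened": 0.25},
}
test = {"files_opened": 8, "dns_queries": 6, "mutexes_created": 1}
print(best_guess(test, archetypes)[0])  # ('family_a', 0.0)
```

After normalization and pruning, this test sample matches family_a's archetype exactly, so family_a is ranked first. Pruning discards any behavior an archetype does not model, which is why every candidate family is compared on its own feature set.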
- Plots the test vector against all archetypes
- A graphical way to verify the output of best_guess.py
- add_tags.py - imports tags (e.g., malware family, source) into the database
- create_feature_vectors.py - creates a CSV of feature vectors using the Cuckoo Sandbox REST API
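A sketch of the CSV-building step, assuming the Cuckoo JSON report for each task has already been retrieved (Cuckoo's REST API serves reports from its /tasks/report/<id> endpoint) and that behavior counts are read from report["behavior"]["summary"], as in Cuckoo 2.x-style reports. The chosen summary keys, the CSV layout, and the function names are assumptions, not the script's actual output format. Written for Python 3 (io.StringIO with the csv module).

```python
import csv
import io


def summary_counts(report, keys):
    """Count entries under report["behavior"]["summary"][key] for each key.

    Assumes a Cuckoo 2.x-style JSON report; other versions lay out the
    behavior summary differently, and missing keys count as 0.
    """
    summary = report.get("behavior", {}).get("summary", {})
    return [len(summary.get(key, [])) for key in keys]


def write_vectors_csv(reports, keys, out):
    """Write a header row, then one row per (sample_id, report): the id followed by counts."""
    writer = csv.writer(out)
    writer.writerow(["sample_id"] + list(keys))
    for sample_id, report in reports:
        writer.writerow([sample_id] + summary_counts(report, keys))


# Hypothetical feature keys and a hand-built stand-in for a real report.
KEYS = ["file_opened", "regkey_read", "mutex"]
fake_report = {"behavior": {"summary": {
    "file_opened": ["C:\\a.txt", "C:\\b.txt"],
    "mutex": ["Global\\xyz"],
}}}
buffer = io.StringIO()
write_vectors_csv([("sample-001", fake_report)], KEYS, buffer)
print(buffer.getvalue())
```

Working from an already-parsed report keeps the feature extraction testable without a running Cuckoo instance; fetching the JSON is a separate concern.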
- add_vectors.py - imports the feature vector CSV into the database