##Introduction :
-
This project contains a collection of functions for manipulating and analyzing malware feature vectors.
-
From Wikipedia: “… a feature vector is an n-dimensional vector of numerical features that represent some object.”
-
Malware features numerically represent various aspects of (mostly) dynamic analysis and can be used to determine the relational proximity between malware samples and families.
-
Disclaimer : This project is very alpha / prototypical. Your mileage may vary.
-
For an in-depth overview of the project, you may want to check out my DerbyCon presentation:
https://www.youtube.com/watch?v=f74w4sOlQ5A
- Cuckoo Sandbox setup (Linux host) -- cuckoosandbox.org
- MySQL database with the tables defined in mfv_tables.sql, which can be created with
$ mysql -u <mysql_user> -p <mfv_db_name> < mfv_tables.sql
- Python 2.7.x and pip
- Required python packages can be installed with
$ sudo pip install -r requirements.txt
- config.py must be modified with your credentials -- see config.py.example
- If using Plotly, your account credentials must be stored in ~/.plotly -- see plot.ly for more info
-
mfv.py :
- Core resource of the project. Defines FeatureVector class.
- Defines functions to manipulate vectors, perform statistical analysis, and display plots
-
examples/autogen_families.py
- Groups vectors based on shared tags (family, filetype, source)
- Normalizes vectors and creates “archetypes”
- Plots all vectors in each family alongside the family archetype
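
The grouping and archetype steps above can be sketched roughly as follows. This is only an illustration, not the code in autogen_families.py: it assumes that "normalizing" means scaling each vector to unit length and that an archetype is the element-wise mean of a family's normalized vectors. The function names `normalize` and `build_archetypes`, the family tags, and the sample values are all hypothetical.

```python
from collections import defaultdict
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def build_archetypes(samples):
    """Group (family_tag, vector) pairs by tag and average each group."""
    groups = defaultdict(list)
    for family, vector in samples:
        groups[family].append(normalize(vector))

    archetypes = {}
    for family, vectors in groups.items():
        count = float(len(vectors))
        # zip(*vectors) walks the vectors column by column (feature by feature).
        archetypes[family] = [sum(column) / count for column in zip(*vectors)]
    return archetypes


# Hypothetical samples: (family tag, raw feature vector)
samples = [
    ("family_a", [3.0, 4.0]),
    ("family_a", [6.0, 8.0]),
    ("family_b", [0.0, 5.0]),
]
archetypes = build_archetypes(samples)
```

Averaging is just one plausible way to summarize a family; a median or a per-feature range might suit noisy data better.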
-
examples/plot_archetypes.py
- Similar to autogen_families.py
- Uses database instead of creating archetypes
-
examples/best_guess.py
- Compares test vector to each stored archetype
- Normalizes & prunes sample under test to fit archetype
- Finds Euclidean distance from test vector to archetype
- Suggests likely family based on the previously calculated distances
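
As a rough sketch of the steps above (not the actual best_guess.py), the comparison could look like the following. It assumes vectors are stored as dictionaries mapping feature names to values that are already normalized, and that "pruning" means keeping only the features an archetype defines. The helper names `prune`, `euclidean_distance`, and `best_guess`, and all feature names and values, are hypothetical.

```python
import math


def prune(test_vector, archetype):
    """Keep only the features the archetype defines; missing ones become 0."""
    return {name: test_vector.get(name, 0.0) for name in archetype}


def euclidean_distance(a, b):
    """Euclidean distance between two dicts that share the same keys."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


def best_guess(test_vector, archetypes):
    """Return (family, distance) for the archetype closest to the test vector."""
    distances = {}
    for family, archetype in archetypes.items():
        pruned = prune(test_vector, archetype)
        distances[family] = euclidean_distance(pruned, archetype)
    family = min(distances, key=distances.get)
    return family, distances[family]


# Hypothetical archetypes and test sample (values assumed already normalized)
archetypes = {
    "family_a": {"files_written": 0.9, "registry_keys": 0.1},
    "family_b": {"files_written": 0.2, "dns_requests": 0.8},
}
test_vector = {"files_written": 0.8, "registry_keys": 0.2, "dns_requests": 0.1}

family, distance = best_guess(test_vector, archetypes)
print("Likely family: %s (distance %.3f)" % (family, distance))
```

Because each archetype may define a different set of features, distances computed this way are not strictly comparable across families, so the suggestion is best treated as a hint.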
-
examples/compare_to_archetypes.py
- Normalizes & prunes sample under test to fit archetype
- Plots test vector against all archetypes
- Graphical way to verify the results of best_guess.py
- add_tags.py - imports tags (e.g. malware family, source, etc.) into database
- create_feature_vectors.py - creates CSV of feature vectors using Cuckoo Sandbox REST API
- add_vectors.py - imports feature vector CSV into database