Update README.

VT-Magnum-Research · Sep 26, 2014 · 3124358
1 parent 70d8423

Showing 3 changed files with 60 additions and 44 deletions.
diff --git a/README.md b/README.md
@@ -1,15 +1,31 @@
-# Deprecated
-__This project has been deprecated and remains online
-for historical archiving__
-
 # Android Antimalware
 
-## About
-This is a set of shell scripts and a sample feature vector collection
-application to help automate the training and testing of dynamic machine
-learning malware classifiers. Arch Linux is the main testing platform.
+Malware detection is an important concern on the growing Android platform.
+Detection techniques on Android are similar to those used on any other
+platform. Detection falls into three categories: static
+analysis, which examines a compiled file; dynamic analysis, which examines
+runtime behavior, such as battery, memory, and network utilization of the
+device; and hybrid analysis, which combines static and dynamic techniques.
+Static analysis is advantageous on memory-limited Android devices because the
+malware is not executed, only analyzed. However, dynamic analysis provides
+additional protection, particularly against polymorphic malware that changes
+form during execution.
+**This project provides a framework to profile applications to obtain
+feature vectors for dynamic analysis.**
+
+This work was presented at the
+[International Wireless Communications and Mobile Computing Conference (IWCMC) 2013][iwcmc-2013], and the paper is available [here][doi].
+The feature vectors and classifiers are available for further
+analysis in the `Results/IWCMC-2013` directory.
+
+As projects mature, design decisions are tested. Our decision to build the
+framework from shell scripts did not deliver a reliable mechanism for
+controlling error-prone emulators on a distributed system.
+**Therefore, this project has been deprecated and remains online
+for historical archiving.**
+We are actively designing a new framework in Scala.
 
-## Usage
+# Usage
 1. Populate `TestSuite/Training` and/or `TestSuite/Testing` with APK files
 with the naming format `<M/B><Number>-<Name>.apk`, where
  + `<M/B>` represents the classification of the application (malicious
@@ -24,44 +40,44 @@ and collecting feature vectors.
 4. Feature vectors will be saved to `arff/`, and the machine learning
 classifiers will then be trained and tested with `arff/weka.sh`.
 
-## Feature Vector Collection Application
-The feature vector collection application called
-Antimalware and is an Eclipse project. See below for a short section
-on increasing Eclipse's memory if you are trying to load it in Eclipse.
-The collected data is stored on an sdcard on the device.
+# Experiment: Malware Classifier Performance
+STREAM resides on the Android Tactical Application Assessment & Knowledge
+(ATAACK) Cloud, a hardware platform designed to provide a testbed
+for cloud-based analysis of mobile applications. The ATAACK Cloud currently
+uses a 34-node cluster in which each machine is a Dell PowerEdge
+M610 blade running CentOS 6.3. Each node has 2 Intel Xeon 564 processors
+with 12 cores each, along with 36 GB of DDR3 ECC memory.
+
+We used STREAM to send 10,000 input events to each application in the data set
+and collect a feature vector every 5 seconds. We collected the following
+set of features.
+
+![Collected features](https://raw.githubusercontent.com/VT-Magnum-Research/antimalware/master/images/feature-vectors.png)
+
+Feature vectors collected from the
+training set of applications were used to create classifiers, and feature
+vectors from the testing set were then used to evaluate those
+classifiers. Classification rates for the testing set are based on the 47
+testing applications used. Future work includes increasing the testing set size
+to increase confidence in these results.
 
-## Directory Structure
- .
- ├── Antimalware - The data collection application
- │   ├── libs - The modified Weka library
- │   └── src
- ├── arff - Collected feature vectors and classifiers
- ├── Results
- └── TestSuite
-  ├── AVDs
-  ├── Device-Images
-  ├── logs
-  ├── Testing - Applications
-  └── Training - Applications
+The following table describes the metrics used to evaluate
+classifiers.
 
-## Increasing Eclipse's Default Memory
-Importing the Antimalware Android project into Eclipse is simple. However,
-Eclipse's memory needs to be increased to load the Weka library used.
-First, find eclipse.ini in your system.
+![Metric definitions](https://raw.githubusercontent.com/VT-Magnum-Research/antimalware/master/images/definitions.png)
 
- $ sudo find / -name 'eclipse.ini'
- /usr/share/eclipse/eclipse.ini 
+The overall results of training and testing six machine learning algorithms with
+STREAM are shown in the following table.
 
-Then edit it and increase the memory settings:
+![Classifier results](https://raw.githubusercontent.com/VT-Magnum-Research/antimalware/master/images/classifier-results.png)
 
- $ vim /usr/share/eclipse/eclipse.ini
+There is a clear difference in the correct
+classification percentage of the cross validation set (made up of applications
+used in training) versus the testing set (made up of applications never used in
+training). Feature vectors from the training set are classified quite well,
+typically over 85% correct, whereas new feature vectors from the testing set
+are often classified only 70% correctly. Classifier performance cannot be judged
+by cross validation alone, as it is prone to inflated accuracy results.
 
- [...]
- --launcher.XXMaxPermSize
- 2048m
- [...]
- --launcher.defaultAction
- [...]
- -Xms1024m
- -Xmx2028m
- [...]
+[iwcmc-2013]: http://iwcmc.org/2013/
+[doi]: http://dx.doi.org/10.1109/IWCMC.2013.6583806
diff --git a/images/classifier-results.png b/images/classifier-results.png
diff --git a/images/definitions.png b/images/definitions.png
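
Note on the Usage section of `README.md`: the APK naming format `<M/B><Number>-<Name>.apk` can be checked mechanically before files are dropped into `TestSuite/Training` or `TestSuite/Testing`. The following is a minimal sketch in Python, not code from this repository. It assumes that `B` marks a benign application (the diff context cuts off before the README says so), that `<Number>` is a run of decimal digits, and that `<Name>` may itself contain hyphens. `ApkName` and `parse_apk_name` are hypothetical names introduced only for illustration.

```python
import re
from typing import NamedTuple, Optional


class ApkName(NamedTuple):
    """Parsed parts of a test-suite APK file name (hypothetical helper)."""
    malicious: bool  # True for the "M" prefix; "B" is assumed to mean benign
    number: int      # the <Number> field, assumed to be decimal digits
    name: str        # the <Name> field, without the ".apk" extension


# <M/B><Number>-<Name>.apk, e.g. "M12-Example.apk"
_APK_NAME = re.compile(r"(?P<label>[MB])(?P<number>\d+)-(?P<name>.+)\.apk")


def parse_apk_name(filename: str) -> Optional[ApkName]:
    """Return the parsed parts, or None if the name does not fit the format."""
    match = _APK_NAME.fullmatch(filename)
    if match is None:
        return None
    return ApkName(
        malicious=match.group("label") == "M",
        number=int(match.group("number")),
        name=match.group("name"),
    )
```

A file that does not parse (for example, one with a missing number or an unknown prefix) yields `None`, so a caller can skip or report it before starting a profiling run.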