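mRMR (minimum Redundancy Maximum Relevance) is a feature selection method: it aims to pick features that are maximally relevant to the class label while being minimally redundant with one another. For the full source code, see the original mRMR link; this post focuses on how to use the tool. For the mathematical derivation behind the software, see the paper [Peng et al. mRMR.pdf]( et al_mRMR.pdf).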
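As a quick reminder of what the two selection methods of the tool (MID and MIQ) optimize, the incremental criteria from the paper can be written roughly as follows. The symbols are my own notation, not taken from the software: $S$ is the set of features already selected, $X$ the full feature set, $c$ the class label, and $I(\cdot;\cdot)$ mutual information.

```latex
% MID (mutual information difference): relevance minus mean redundancy
\max_{x_j \in X \setminus S}
  \left[ I(x_j; c) - \frac{1}{|S|} \sum_{x_i \in S} I(x_j; x_i) \right]

% MIQ (mutual information quotient): relevance divided by mean redundancy
\max_{x_j \in X \setminus S}
  \left[ I(x_j; c) \Big/ \frac{1}{|S|} \sum_{x_i \in S} I(x_j; x_i) \right]
```

At each step the feature that maximizes the chosen expression is added to $S$, so the selection is greedy rather than an exhaustive search.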
[root@master mrmr_c_src]# ./mrmr
Usage: mrmr_osx -i <dataset> -t <threshold> [optional arguments]
-i <dataset> .CSV file containing M rows and N columns, row - sample, column - variable/attribute.
-t <threshold> a float number of the discretization threshold; non-specifying this parameter means no discretizaton (i.e. data is already integer); 0 to make binarization.
-n <number of features> a natural number, default is 50.
-m <selection method> either "MID" or "MIQ" (Capital case), default is MID.
-s <MAX number of samples> a natural number, default is 1000. Note that if you don't have or don't need big memory, set this value small, as this program will use this value to pre-allocate memory in data file reading.
-v <MAX number of variables/attibutes in data> a natural number, default is 10000. Note that if you don't have or don't need big memory, set this value small, as this program will use this value to pre-allocate memory in data file reading.
[-h] print this message.
*** This program and the respective minimum Redundancy Maximum Relevance (mRMR)
algorithm were developed by Hanchuan Peng <[email protected]>for
the paper
"Feature selection based on mutual information: criteria of
max-dependency, max-relevance, and min-redundancy,"
Hanchuan Peng, Fuhui Long, and Chris Ding,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 27, No. 8, pp.1226-1238, 2005.
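The usage text does not say how the `-t` threshold is applied. The sketch below shows one plausible reading, which should be treated as an assumption rather than the tool's documented behavior: each column is standardized, and values more than `threshold` standard deviations above or below the mean map to 1 or -1, with everything else mapping to 0. The function name `discretize_column` is hypothetical.

```python
import statistics


def discretize_column(values, threshold):
    """Map one numeric column to the states -1, 0, 1 around its mean.

    Assumed rule (not stated in the usage text): a value becomes 1 if it lies
    more than `threshold` standard deviations above the column mean, -1 if it
    lies more than `threshold` standard deviations below it, and 0 otherwise.
    """
    mean = statistics.mean(values)
    # Sample standard deviation; a single-value column has no spread.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    if std == 0.0:
        # A constant column carries no information; keep it in one state.
        return [0 for _ in values]

    states = []
    for v in values:
        z = (v - mean) / std  # standardized value
        if z > threshold:
            states.append(1)
        elif z < -threshold:
            states.append(-1)
        else:
            states.append(0)
    return states
```

Under this reading, a threshold of 0 splits every value that differs from the mean into one of two states, which would match the "0 to make binarization" remark in the usage message.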
[root@master mrmr_c_src]# nohup ./mrmr -i /home/liudiwei/data/featselect/ -t 0.0001 -s 1000000 -v 1000 -n 42 > result.out 2>&1 &
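If you want to launch the same kind of run from a script, a minimal Python sketch might look like the following. It only assembles the flags listed in the usage message (`-i`, `-t`, `-n`, `-m`, `-s`, `-v`); the helper names `build_mrmr_command` and `run_mrmr` are hypothetical, and any dataset path you pass in is your own.

```python
import subprocess


def build_mrmr_command(binary, dataset, threshold=None, n_features=50,
                       method="MID", max_samples=1000, max_vars=10000):
    """Build the argument list for the mrmr binary from the documented flags."""
    if method not in ("MID", "MIQ"):
        raise ValueError('method must be "MID" or "MIQ"')
    cmd = [binary, "-i", dataset]
    # Omitting -t means "no discretization" according to the usage text.
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    cmd += ["-n", str(n_features), "-m", method,
            "-s", str(max_samples), "-v", str(max_vars)]
    return cmd


def run_mrmr(cmd, out_path="result.out"):
    """Run the command, sending stdout and stderr to out_path (like `> result.out 2>&1`)."""
    with open(out_path, "w") as out:
        return subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT).returncode
```

For example, `build_mrmr_command("./mrmr", "data.csv", threshold=0.0001, n_features=42, max_samples=1000000, max_vars=1000)` reproduces the flags of the command above, with `data.csv` standing in as a placeholder file name. Unlike `nohup ... &`, `run_mrmr` blocks until the program exits.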
- The result.out file