Preprocessing

Before running any method in this package, run the preprocessing script to eliminate noise from the sequences (e.g., letters other than the standard ones, such as N, K, ...). To use this script, follow the example below:

Important: The methods in this package accept only sequence files in FASTA format as input.

To run the tool (Example): $ python3.7 preprocessing/preprocessing.py -i input -o output


Where:

-h = help

-i = Input - FASTA format file, e.g., test.fasta

-o = Output - FASTA format file, e.g., output.fasta

Running:

$ python3.7 preprocessing/preprocessing.py -i dataset.fasta -o preprocessing.fasta
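The kind of filtering described above can be sketched in a few lines of Python. This is an illustration only, not the code in preprocessing/preprocessing.py: it assumes DNA input and assumes that "noise" means any record containing a letter outside A, C, G, T, which is then dropped entirely. The names `read_fasta` and `drop_noisy` are hypothetical.

```python
import re

# Assumption: a "clean" DNA sequence contains only A, C, G and T.
VALID_DNA = re.compile(r"[ACGT]+")

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def drop_noisy(records):
    """Keep only records whose whole sequence matches the valid alphabet."""
    return [(h, s) for h, s in records if VALID_DNA.fullmatch(s)]

example = """>seq1
ACGTAC
>seq2
ACNNKT
""".splitlines()

clean = drop_noisy(read_fasta(example))
print(clean)  # seq2 is dropped because it contains N and K
```

For RNA or protein input, the pattern in `VALID_DNA` would need a different alphabet.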

Xmer k-Spaced Ymer Composition Frequency (kGap).

To use this model, follow the example below:

To run the code (Example): $ python3.7 methods/Kgap.py -i input -o output -l label -k kgap -bef before -aft after -seq type


Where:

-i = Input - FASTA format file, e.g., test.fasta

-o = Output - CSV format file, e.g., test.csv

-l = Label - label for the dataset, e.g., lncRNA, circRNA...

-k = Gap - number of gap positions between the Xmer and the Ymer, e.g., 1 = A_A, 2 = A__A, 3 = A___A...

-bef = Before - length of the Xmer before the gap, e.g., 1 = A_A, 2 = AA_A, 3 = AAA_A...

-aft = After - length of the Ymer after the gap, e.g., 1 = A_A, 2 = A_AA, 3 = A_AAA...

-seq = Type of sequence, e.g., 1 = DNA, 2 = RNA and 3 = Protein
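To see how -k, -bef and -aft combine, the short sketch below builds the pattern shape from the three values, following the examples in the list above. The helper names `pattern_shape` and `n_patterns` are hypothetical and are not part of Kgap.py. The pattern count assumes every Xmer/Ymer combination over the alphabet is treated as one feature, which this document does not confirm.

```python
def pattern_shape(k, bef, aft, letter="A"):
    """Return the shape of a pattern: bef letters, k gap positions, aft letters."""
    return letter * bef + "_" * k + letter * aft

def n_patterns(bef, aft, alphabet_size=4):
    """Count the distinct Xmer/Ymer combinations for a given alphabet size."""
    return alphabet_size ** (bef + aft)

print(pattern_shape(k=1, bef=1, aft=2))  # A_AA
print(pattern_shape(k=2, bef=1, aft=1))  # A__A
print(n_patterns(bef=1, aft=2))          # 64 combinations for a 4-letter alphabet
```

Under that assumption, the number of combinations grows exponentially with bef + aft, so large values on a protein alphabet (20 letters) would yield very wide feature vectors.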

Running:

$ python3.7 methods/Kgap.py -i sequences.fasta -o sequences.csv -l test -k 1 -bef 1 -aft 2 -seq 1

Note: Input sequences for feature extraction must be in FASTA format.

Note: This example generates a CSV file with the extracted features.
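As a rough illustration of what a kGap feature vector contains, the sketch below counts every Xmer/Ymer pair separated by k positions in a single sequence and divides by the number of windows. It is a minimal sketch, not the implementation in methods/Kgap.py: the function name `kgap_frequencies`, the normalization by window count, and the feature naming are all assumptions.

```python
from collections import Counter
from itertools import product

def kgap_frequencies(seq, k, bef, aft, alphabet="ACGT"):
    """Relative frequency of each Xmer/Ymer pair separated by k positions.

    Assumes seq was already preprocessed and contains only letters
    from `alphabet`.
    """
    window = bef + k + aft
    counts = Counter()
    for i in range(len(seq) - window + 1):
        xmer = seq[i:i + bef]                   # letters before the gap
        ymer = seq[i + bef + k:i + window]      # letters after the gap
        counts[(xmer, ymer)] += 1
    total = sum(counts.values())

    features = {}
    for x in product(alphabet, repeat=bef):
        for y in product(alphabet, repeat=aft):
            key = ("".join(x), "".join(y))
            name = key[0] + "_" * k + key[1]    # e.g. "A_G" for k=1
            features[name] = counts[key] / total if total else 0.0
    return features

feats = kgap_frequencies("ACGTACGT", k=1, bef=1, aft=1)
print(len(feats))    # one feature per Xmer/Ymer combination
print(feats["A_G"])  # A_G occurs in 2 of the 6 windows
```

Passing a different `alphabet` (for example "ACGU" for RNA) covers the other sequence types in the same way.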

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors
