morse-dataset

Generate synthetic datasets for Morse code symbol classification for machine learning algotihms such as artificial neural networks.

Compute the inherent difficulty of the classification problem on these datasets using different metrics.

This research paper has more details. Please consider citing it if you use or benefit from this work:
S. Dey, K. M. Chugg and P. A. Beerel, “Morse Code Datasets for Machine Learning,” in 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-7, Jul 2018. (Won Best Paper Award)
Available on IEEE and arXiv (copyright owned by IEEE).

For a short description of dataset generation, see Sourya Dey's blog post
For a guide to Morse code symbols, see morse_tree. Go left for dots and right for dashes.

This family of datasets now has its own IEEEDataPort page and competition (open till October 14, 2020!)

Requirements: Python 3, numpy, scipy

Description of generate_morse_dataset.py:

Create morse code datasets:
    Style='BW': Black and white having 0s for spaces and 1s for dot or dash. Noise means bit flips
    Style='GRAY': Gaussian grayscale levels. Noise is additive Gaussian
    save_filename: Leave as '' to save in preset path inside

    Framelen: Total length of frame for 1 character
    Classes: How many characters
    TReach,VAeach,TEeach: No. of training, validation and test cases FOR EACH class
    minlendot,maxlendot,... : min and max length of dots, dashes and intermediate spaces
    leadingsp_rand: Set to 0 to have no leading spaces, otherwise 1 to have random number of leading spaces
    dilation: If >1, all lengths are increased by this factor

    Black and white only:
        maxflip: Noise measure. Max how many bits to flip
        SET maxflip=0 for NO NOISE
    Grayscale only:
        levels: How many levels for dots and dashes. Will be normalized at end
        symbmean: Mean level for any symbol (dot or dash)
        symbsd: Standard deviation for symbol levels
        noisemean: Mean level for noise
        noisesd: Standard deviation for noise
        SET noisesd=0 for NO NOISE

2 already generated datasets are included:

baseline.npz : Uses default parameters
difficult.npz : Uses noisesd=4, leadingsp_rand=1, minlendash=3

Use load_data to extract the data and labels into training, validation and test:

xtrain, ytrain, xval, yval, xtest, ytest = load_data(filename = './baseline.npz')

Run dataset_metrics to test dataset difficulty, for example:

L, U, D, T = dataset_metrics('./baseline.npz')

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
Codebook.npy		Codebook.npy
LICENSE		LICENSE
README.md		README.md
baseline.npz		baseline.npz
dataset_metrics.py		dataset_metrics.py
difficult.npz		difficult.npz
generate_morse_dataset.py		generate_morse_dataset.py
load_data.py		load_data.py
morse_code.png		morse_code.png
morse_tree.png		morse_tree.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

morse-dataset

Generate synthetic datasets for Morse code symbol classification for machine learning algotihms such as artificial neural networks.

Compute the inherent difficulty of the classification problem on these datasets using different metrics.

About

Releases

Packages

Languages

License

souryadey/morse-dataset

Folders and files

Latest commit

History

Repository files navigation

morse-dataset

Generate synthetic datasets for Morse code symbol classification for machine learning algotihms such as artificial neural networks.

Compute the inherent difficulty of the classification problem on these datasets using different metrics.

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages