(1) Original data creation
- Script: harfbuzz_and_crop.rb
- Output data: data
- "Friend," "affect," "work," and "power" glyphs are rasterized for each weight of Kozuka Gothic.
- Crop with ImageMagick.
- Glyphs are drawn in white on a black background because skeletonization and FFT are applied to them later.
(2) Skeletonization
- Script: skeletonization.sh + skeletonization.py
- Output data: skeletonization
Performed with the OpenCV API. The threshold was picked to suit this data. The heavier the weight, the worse the skeleton quality, and there seems to be no single threshold that works for every weight.
(3) Magnitude spectrum calculation by fast Fourier transform
- Script: fft_and_magnitude_spectrum.py
- Output data: fft
- Magnitude spectrum: absolute value of the frequency-decomposition coefficients obtained by the Fourier transform.
- Power spectrum: squared value of the magnitude spectrum.
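The two definitions above can be sketched with NumPy as follows. This is an illustration under assumptions, not the content of fft_and_magnitude_spectrum.py: the function name is hypothetical, and shifting the zero-frequency term to the center with `fftshift` is a display convention that the original script may or may not follow.

```python
import numpy as np

def magnitude_and_power_spectrum(image):
    """Return (magnitude, power) spectra of a 2-D grayscale image."""
    # 2-D FFT, then move the zero-frequency (DC) term to the center.
    coeffs = np.fft.fftshift(np.fft.fft2(image))
    # Magnitude spectrum: absolute value of each complex coefficient.
    magnitude = np.abs(coeffs)
    # Power spectrum: square of the magnitude spectrum.
    power = magnitude ** 2
    return magnitude, power
```

For viewing, the magnitude is often compressed with a logarithm, for example `np.log1p(magnitude)`, because the DC term usually dominates the other coefficients.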
(4) Character recognition by OCR (Tesseract)
- Script: harfbuzz_and_crop_for_tess.rb, tesseract_test.rb, check_tess_result.rb
- Output data: data4tess
(4-1) Original data creation: harfbuzz_and_crop_for_tess.rb
(4-2) Recognition: tesseract_test.rb
(4-3) Result evaluation: check_tess_result.rb
- Clustering with machine learning (unsupervised).