
yahyaqowle/Afsoomali


No copyright applies among Somalis

Free to use

Here we are preparing projects to develop the Somali language.

This project is free and volunteer-run (nonprofit).

Step 1. This Somali dictionary will consist of 969 pages, taken from the book Qaamuska Afka-Soomaliga by Annarita Puglielli and Cabdalla Cumar Mansuur. Our source is the file Qaamuska Afka-Soomali.pdf.

Step 1.1 Extracting = We extract the words and their meanings from the PDF file into Somali Dictionary.txt, in alphabetical order from A to Z. Example: once we finish writing the A words, we move on to B, then C, then D, and so on.

Step 1.2 Categorising = After Step 1.1 is done, we move to the next phase, called categorising. We categorise the text extracted in Step 1.1 and put it in a folder called Alphabets (alifbeeto). Example: in Step 1.1 we extracted the A words from the PDF, organised them, and saved them in Somali-Dictionary.txt. In Step 1.2 we copy only the A words from Somali-Dictionary.txt into a new text file and save it as A.txt in the Alphabets folder. We repeat this for every letter of the alphabet.
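The categorising step can be sketched in Python as follows. This is only a minimal sketch, not the project's actual script: it assumes Somali-Dictionary.txt holds one entry per line with the word first, and the function names `group_by_first_letter` and `write_alphabet_files` are hypothetical. Entries are grouped by their first character only, so Somali digraphs such as DH, KH and SH are not treated as separate letters here.

```python
from collections import defaultdict
from pathlib import Path


def group_by_first_letter(lines):
    """Group entries by the upper-cased first letter of each line.

    Assumption: one dictionary entry per line, starting with the word.
    """
    groups = defaultdict(list)
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        first = entry[0].upper()
        if first.isalpha():  # ignore lines that start with a digit or symbol
            groups[first].append(entry)
    return dict(groups)


def write_alphabet_files(groups, folder="Alphabets"):
    """Write each group to <folder>/<LETTER>.txt, for example Alphabets/A.txt."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for letter, entries in groups.items():
        path = out_dir / f"{letter}.txt"
        path.write_text("\n".join(entries) + "\n", encoding="utf-8")


if __name__ == "__main__":
    source = Path("Somali-Dictionary.txt")
    if source.exists():
        lines = source.read_text(encoding="utf-8").splitlines()
        write_alphabet_files(group_by_first_letter(lines))
```

Running the script next to Somali-Dictionary.txt would create the Alphabets folder and one text file per starting letter found in the source.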

Step 2. We are preparing files of the words we extracted from the dictionary, in different formats such as .txt and .csv. The goal is to produce a plain list of vocabulary words taken from the dictionary we wrote.

Step 2.1 We will do the same as in Step 1.1, but instead of copying from the PDF we will use Python (it's a simple method).

Step 2.2 Categorising = We will apply the same method as in Step 1.2.
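A possible shape for the Python part of Step 2 is sketched below. It rests on assumptions that the project does not state: each entry in the dictionary text is taken to be one line of the form `word: meaning`, the `:` separator is a guess, and the names `extract_headwords`, `save_words`, Vocabulary.txt and Vocabulary.csv are hypothetical.

```python
import csv
from pathlib import Path


def extract_headwords(lines, separator=":"):
    """Return the word part of each entry, in order, without duplicates.

    Assumption: entries look like "word: meaning". Lines that do not
    contain the separator are skipped.
    """
    seen = set()
    words = []
    for line in lines:
        if separator not in line:
            continue
        head = line.split(separator, 1)[0].strip()
        if head and head not in seen:
            seen.add(head)
            words.append(head)
    return words


def save_words(words, txt_path, csv_path):
    """Save the words as plain text (one per line) and as a one-column CSV."""
    Path(txt_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word"])  # header row
        writer.writerows([w] for w in words)


if __name__ == "__main__":
    source = Path("Somali-Dictionary.txt")
    if source.exists():
        lines = source.read_text(encoding="utf-8").splitlines()
        save_words(extract_headwords(lines), "Vocabulary.txt", "Vocabulary.csv")
```

The resulting word list can then be split by letter in the same way as in Step 1.2.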

Resources and files that can be used to improve this project, directly or indirectly, can be found in this repository.

A new update is on the way.

Our focus now is machine learning (ML), using training data we gathered from various sources.
