Akkademia is a tool for automatically transliterating Unicode cuneiform glyphs. It is written in Python and uses HMM, MEMM, and BiLSTM models to determine appropriate sign readings and segmentation.
We trained these algorithms on the RINAP corpora (Royal Inscriptions of the Neo-Assyrian Period), which are available in JSON and XML/TEI formats here thanks to the efforts of the Official Inscriptions of the Middle East in Antiquity (OIMEA) Munich Project of Karen Radner and Jamie Novotny, funded by the Alexander von Humboldt Foundation. We achieve accuracy rates of 89.5% with HMM, 94% with MEMM, and 96.7% with BiLSTM on the corpora the models were trained on. Our models can also be used on texts from other periods and genres, with varying levels of success.
Akkademia can be accessed in three different ways:
- Website
- Python package
- Github clone
The website and python package are meant to be accessible to people without advanced programming knowledge.
Go to the Babylonian Engine website (under development)
Go to the "Akkademia" tab and follow the instructions there for transliterating your signs.
Our python package "akkadian" will enable you to use Akkademia on your local machine.
You will need Python 3.7.x installed. Our package currently does not work with other versions of Python. You can follow the installation instructions here or go straight to Python's downloads page and pick an appropriate version.
Mac comes preinstalled with Python 2.7, which may remain the default Python version even after installing 3.7.x. To check, type python --version into Terminal. If the running version is Python 2.7, the simplest short-term solution is to type python3 or pip3 in Terminal throughout instead of python and pip as in the instructions below.
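The same check can be made from inside Python, which helps when several interpreters are installed side by side. The following is a minimal sketch; the helper name is_supported is ours and is not part of the akkadian package.

```python
import sys

REQUIRED = (3, 7)  # the package is stated to work only with Python 3.7.x


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    running = "{}.{}.{}".format(*sys.version_info[:3])
    if is_supported():
        print("Python " + running + " matches the required 3.7.x series.")
    else:
        print("Python " + running + " is running; akkadian expects 3.7.x.")
```

Run it with the same command you intend to use for the package (python or python3), since each command may point to a different interpreter.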
You can install the package using the pip install command. If you do not have pip installed on your computer, or you are not sure whether it is installed, you can follow the instructions here.
Before installing the package akkadian, you will need to install the torch package. For Windows, copy the following into Command Prompt (CMD):
pip install torch==1.0.0 torchvision==0.2.1 -f https://download.pytorch.org/whl/torch_stable.html
For Mac and Linux copy the following into Terminal:
pip install torch torchvision
Then, type the following in Command Prompt (Windows), or Terminal (Mac and Linux):
pip install akkadian
Your installation should then run. This will take several minutes.
Open a Python IDE (integrated development environment) in which Python code can be run. There are many possible IDEs; see realpython's guide or wiki python's list. For beginners, we recommend using Jupyter Notebook: see downloading instructions here, or see downloading instructions and a beginners' tutorial here.
First, import akkadian.transliterate into your coding environment:
import akkadian.transliterate as akk
Then, you can use HMM, MEMM, or BiLSTM to transliterate the signs. The functions are:
akk.transliterate_hmm("Unicode_signs_here")
akk.transliterate_memm("Unicode_signs_here")
akk.transliterate_bilstm("Unicode_signs_here")
akk.transliterate_bilstm_top3("Unicode_signs_here")
akk.transliterate_bilstm_top3 gives the top three BiLSTM options, while akk.transliterate_bilstm gives only the top one.
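To compare the models on the same input, the functions can be collected in a dictionary and called in a loop. This is only a sketch: run_models is a hypothetical helper of ours, not part of the package, and it assumes that each of the three functions returns a single string, as the printed examples in this README suggest.

```python
def run_models(signs, models):
    """Apply each transliteration function to `signs`; collect results by name."""
    return {name: transliterate(signs) for name, transliterate in models.items()}


if __name__ == "__main__":
    try:
        import akkadian.transliterate as akk
    except ImportError:
        print("The akkadian package is not installed in this environment.")
    else:
        models = {
            "HMM": akk.transliterate_hmm,
            "MEMM": akk.transliterate_memm,
            "BiLSTM": akk.transliterate_bilstm,
        }
        signs = "𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"
        for name, reading in run_models(signs, models).items():
            print(name + ": " + reading)
```

akk.transliterate_bilstm_top3 is left out of the loop because it returns three candidates rather than one.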
For an immediate output of the results, wrap the call to the transliteration function in the print() function. Here are some examples with their output:
print(akk.transliterate_hmm("𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"))
ša₂ nak-ba-i-mu-ru iš-di-ma-a-ti
print(akk.transliterate_memm("𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"))
ša₂ SILIM ba-i-mu-ru-iš-di-ma-a-ti
print(akk.transliterate_bilstm("𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"))
ša₂ nak-ba-i-mu-ru iš-di-ma-a-ti
print(akk.transliterate_bilstm_top3("𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"))
('ša₂ nak-ba-i-mu-ru iš-di-ma-a-ti ', 'ša₂-di-ba i mu ru-iš di ma tukul-tu ', 'MUN kis BA še-MU-šub-šah-ṭi-nab-nu-ti-')
This line was taken from the first line of the Epic of Gilgamesh: ša₂ naq-ba i-mu-ru iš-di ma-a-ti; "He who saw the Deep, the foundation of the country" (George, A.R. 2003. The Babylonian Gilgamesh Epic: Introduction, Critical Edition and Cuneiform Texts. 2 vols. Oxford: Oxford University Press). Although the algorithms were not trained on this text genre, they show promising, useful results.
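To see how close a model's output is to the published reading, the two transliterations can be compared sign by sign while ignoring segmentation (spaces and hyphens). The sketch below is illustrative only: the helpers readings and reading_agreement are ours, and this is not necessarily the metric behind the accuracy rates reported above.

```python
import re


def readings(transliteration):
    """Split a transliteration into sign readings, ignoring spaces and hyphens."""
    return [token for token in re.split(r"[\s-]+", transliteration) if token]


def reading_agreement(predicted, reference):
    """Return (matches, total) over aligned signs; assumes equal sign counts."""
    pred, ref = readings(predicted), readings(reference)
    if len(pred) != len(ref):
        raise ValueError("sign counts differ; readings cannot be aligned one to one")
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches, len(ref)


hmm_output = "ša₂ nak-ba-i-mu-ru iš-di-ma-a-ti"
reference = "ša₂ naq-ba i-mu-ru iš-di ma-a-ti"
matches, total = reading_agreement(hmm_output, reference)
print(str(matches) + " of " + str(total) + " readings agree")
# prints: 10 of 11 readings agree
```

The only differing reading is nak versus naq. The remaining differences between the HMM output and the published line are in word segmentation, which this comparison deliberately leaves out.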
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
You will need Python 3.7.x installed. Our package currently does not work with other versions of Python. Go to Python's downloads page and pick an appropriate version.
If you don't have git installed, install git here (Choose the appropriate operating system).
If you don't have a GitHub account, create one here.
Copy the following into Command Prompt (with windows) or Terminal (with mac) to clone the project:
git clone https://github.com/gaigutherz/Akkademia.git
Then download your preferred version of conda, such as Miniconda. We use conda because we want to create specific environments where we can manage dependencies without impacting the overall system.
Then create your environment with the right version of python:
conda create -n akkadian python=3.7
You can then activate it as follows:
conda activate akkadian
To install the dependencies, run the following command. It assumes that the repository was cloned into a folder named 'GitHub' in your home directory, so adjust the path if yours is stored elsewhere:
pip install -r ~/GitHub/Akkademia/requirements.txt
You will then have a specific environment that meets all requirements of this project without otherwise affecting operations on your Linux or Mac system. For a more customized install, see below.
In order to run the code, you will need the torch and allennlp libraries. If you have already installed the package akkadian, these were installed on your computer and you can skip to the next step.
Install torch: For Windows, copy the following to Command Prompt:
pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
For Mac and Linux, copy the following to Terminal:
pip install torch torchvision
Install overrides and allennlp: copy the following to Command Prompt (with windows) or Terminal (with mac):
pip install overrides==3.1.0
pip install allennlp==0.8.5
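After these installs, a quick way to confirm that the libraries are visible to the interpreter you are using is to look them up without importing them. This is a small sketch; missing_modules is a hypothetical helper of ours, and it checks only that the modules can be found, not their versions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "torchvision", "overrides", "allennlp"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("torch, torchvision, overrides and allennlp can all be found.")
```

If a module is reported as missing even though pip succeeded, pip and python probably belong to different Python installations.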
Copy the following into Command Prompt (with windows) or Terminal (with mac) to clone the project:
git clone https://github.com/gaigutherz/Akkademia.git
Now you can develop the Akkademia repository and add your improvements!
Use the file train.py to train the models using the datasets. For each model there is a function that trains it, stores the pickle, and tests its performance on the specified corpora.
The functions are as follows:
hmm_train_and_test(corpora)
memm_train_and_test(corpora)
biLSTM_train_and_test(corpora)
Use the file transliterate.py to transliterate using the models. For each model there is a function that takes Unicode cuneiform signs as a parameter and returns their transliteration.
Example of usage:
cuneiform_signs = "𒃻𒅘𒁀𒄿𒈬𒊒𒅖𒁲𒈠𒀀𒋾"
print(transliterate(cuneiform_signs))
print(transliterate_bilstm(cuneiform_signs))
print(transliterate_bilstm_top3(cuneiform_signs))
print(transliterate_hmm(cuneiform_signs))
print(transliterate_memm(cuneiform_signs))
For training the algorithms, we used the RINAP corpora (Royal Inscriptions of the Neo-Assyrian Period), which are available in JSON and XML/TEI formats thanks to the efforts of the Humboldt Foundation-funded Official Inscriptions of the Middle East in Antiquity (OIMEA) Munich Project led by Karen Radner and Jamie Novotny, available here. The current output in our website, package and code is based on training done on these corpora alone.
For additional future training, we added the following corpora (in JSON file format) to the repository:
- RIAO
- RIBO
- SAAO
- SUHU
These corpora were all prepared by the Munich Open-access Cuneiform Corpus Initiative (MOCCI) and OIMEA project teams, both led by Karen Radner and Jamie Novotny, and are fully accessible for download in JSON or XML/TEI format in their respective project webpages (see left side-panel on project webpages and look for project-name downloads).
We also included a separate dataset that contains all the corpora in XML/TEI format.
All the datasets are taken from their respective project webpages (see the left side-panel on the project webpages and look for project-name downloads) and are fully accessible from there.
In our repository the datasets are located in the "raw_data" directory. They can also be downloaded from the Github repository using git clone or zip download.
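Once the repository is on your machine, a short script can give an overview of what the "raw_data" directory contains. The sketch below counts JSON files per top-level subfolder. It assumes that each corpus sits in its own subfolder, which we have not verified here, and the path "Akkademia/raw_data" is a hypothetical location that depends on where you cloned the project.

```python
from collections import Counter
from pathlib import Path


def count_json_files(raw_data_dir):
    """Count .json files under each top-level subdirectory of `raw_data_dir`."""
    root = Path(raw_data_dir)
    counts = Counter()
    for path in root.rglob("*.json"):
        # The first path component below the root is treated as the corpus folder;
        # files lying directly in the root are grouped under ".".
        relative = path.relative_to(root)
        corpus = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[corpus] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the repository was cloned.
    for corpus, n in sorted(count_json_files("Akkademia/raw_data").items()):
        print(corpus + ": " + str(n) + " JSON files")
```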
BiLSTM_input:
Contains dictionaries used for transliteration by BiLSTM.
NMT_input:
Contains dictionaries used for neural machine translation.
akkadian.egg-info:
Information and settings for akkadian python package.
akkadian:
Sources and training output.
output: Training output for HMM, MEMM and BiLSTM, mostly pickles.
__init__.py: Init script for akkadian python package. Initializes global variables.
bilstm.py: Class for BiLSTM train and prediction using AllenNLP implementation.
build_data.py: Code for organizing the data in dictionaries.
check_translation.py: Code for translation accuracy checking.
combine_algorithms.py: Code for prediction using HMM, MEMM and BiLSTM together.
data.py: Utils for accuracy checks and dictionary interpretation.
full_translation_build_data.py: Code for organizing the data for full translation task.
get_texts_details.py: Util for getting more information about the text.
hmm.py: Implementation of HMM for train and prediction.
memm.py: Implementation of MEMM for train and prediction.
parse_json: JSON parsing used for data organizing.
parse_xml.py: XML parsing used for data organizing.
train.py: API for training all 3 algorithms and storing the output.
translation_tokenize.py: Code for tokenization of translation task.
transliterate.py: API for transliterating using all 3 algorithms.
build/lib/akkadian:
Information and settings for akkadian python package.
dist:
Akkadian python package - wheel and tar.
raw_data:
Databases used for training the models:
RINAP 1, 3-5
Additional databases for future training:
RIAO
RIBO
SAAO
SUHU
Miscellanea:
tei - the same databases (RINAP, RIAO, RIBO, SAAO, SUHU) in XML/TEI format.
random - 4 texts used for testing texts outside of the training corpora. They were randomly selected from RIAO and RIBO.
This repository is made freely available under the Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. This means you are free to share and adapt the code and datasets, under the conditions that you cite the project appropriately, note any changes you have made to the original code and datasets, and if you are redistributing the project or a part thereof, you must release it under the same license or a similar one.
For more information about the license, see here.
If you are experiencing any issues with the website, the python package akkadian or the git repository, please contact us at [email protected], and we would gladly assist you. We would also much appreciate feedback about using the code via the website or the python package, or about the repository itself, so please send us any comments or suggestions.
- Gai Gutherz
- Ariel Elazary
- Avital Romach
- Shai Gordin