Translating Akkadian signs to transliteration using NLP algorithms

Translating-Akkadian-using-NLP

Translating Akkadian signs to transliteration using NLP algorithms such as hidden Markov models (HMM), maximum-entropy Markov models (MEMM) and bidirectional LSTM (BiLSTM) neural networks.

Getting Started

There are three main ways to use the project:

Website
Python package
GitHub clone

Website

Use this link to access the website: https://babylonian.herokuapp.com/#/

Go to the "Translit" tab and enter signs to see their transliteration.

Python Package

These instructions let you transliterate on your local machine with the "akkadian" Python package, which is based on this project.

Prerequisites

Install Python 3.6 or 3.7. For example, version 3.7.1 is available at https://www.python.org/downloads/release/python-371/.

Installing

Install the akkadian package, for example with pip:

pip install akkadian

Running

Here are a few example sessions.

Transliterating Akkadian signs:

import akkadian.transliterate as akk
print(akk.transliterate("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Top three transliteration options for Akkadian signs using BiLSTM:

import akkadian.transliterate as akk
print(akk.transliterate_bilstm_top3("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using MEMM:

import akkadian.transliterate as akk
print(akk.transliterate_memm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))

Transliterating Akkadian signs using HMM:

import akkadian.transliterate as akk
print(akk.transliterate_hmm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
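Each transliteration function above takes a string of cuneiform signs. The signs in the example string are code points from the Unicode Cuneiform block (U+12000 to U+123FF), and Python 3 strings index by code point, so iterating over the string yields one sign at a time. The helper below is a hypothetical illustration that uses only the standard-library unicodedata module to inspect an input before transliterating it; it does not call the akkadian package and is not part of its API.

```python
import unicodedata


def describe_signs(text):
    """Return (sign, code point, Unicode name) for each sign in text.

    Hypothetical helper for inspecting input; not part of the akkadian package.
    """
    rows = []
    for sign in text:
        # Cuneiform lies outside the Basic Multilingual Plane, but a Python 3
        # str is indexed by code point, so each sign is a single character.
        code_point = f"U+{ord(sign):05X}"
        rows.append((sign, code_point, unicodedata.name(sign, "UNKNOWN")))
    return rows


if __name__ == "__main__":
    for sign, code_point, name in describe_signs("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"):
        print(sign, code_point, name)
```

Checking the signs this way can help when pasted input contains characters from outside the Cuneiform block, such as spaces or transliterated Latin text.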

GitHub

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Install Python 3.6 or 3.7. For example, version 3.7.1 is available at https://www.python.org/downloads/release/python-371/.

If you don't have Git installed, install it from https://git-scm.com/downloads (choose your operating system).

If you don't have a GitHub account, create one at https://github.com/join?source=header-home.

Installing the Python dependencies

Install torch. On Windows:

pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html

On Linux and macOS:

pip install torch torchvision

Install allennlp:

pip install allennlp==0.8.5

Cloning the project

Clone the project:

git clone https://github.com/gaigutherz/Translating-Akkadian-using-NLP.git

Running

Now you can develop in the Translating-Akkadian-using-NLP repository and add your improvements!

Training

Use the file train.py to train the models on the datasets. Each model has a function that trains it, stores the result as a pickle and tests its performance on the given corpora.

The functions are as follows:

hmm_train_and_test(corpora)
memm_train_and_test(corpora)
biLSTM_train_and_test(corpora)
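The repository's hmm.py is not reproduced here. As a minimal sketch of what a train, store and test cycle can look like for the HMM case, the block below counts tag transitions and sign emissions, decodes with the Viterbi algorithm and reports per-sign accuracy. It assumes training data shaped as lists of (sign, transliteration) pairs and uses add-one smoothing; hmm_train, hmm_viterbi, sign_accuracy and the toy data are hypothetical and are not the project's API or its exact method.

```python
import math
import pickle
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start tag


def hmm_train(sentences):
    """Count tag-to-tag transitions and tag-to-sign emissions.

    sentences: lists of (sign, transliteration) pairs (assumed input format).
    """
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted sign
    for sentence in sentences:
        prev = START
        for sign, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][sign] += 1
            prev = tag
    return {"transitions": transitions, "emissions": emissions}


def _log_prob(counts, key, vocab_size):
    # Add-one (Laplace) smoothing keeps unseen events at a small nonzero probability.
    return math.log((counts[key] + 1) / (sum(counts.values()) + vocab_size))


def hmm_viterbi(model, signs):
    """Return the most probable tag sequence for a list of signs (Viterbi decoding)."""
    if not signs:
        return []
    transitions, emissions = model["transitions"], model["emissions"]
    tags = list(emissions)
    n_tags = len(tags)
    n_signs = len({s for counts in emissions.values() for s in counts}) + 1  # +1: unknown sign
    # best[tag] = (log score, path) of the best path that ends in tag
    best = {
        t: (_log_prob(transitions[START], t, n_tags)
            + _log_prob(emissions[t], signs[0], n_signs), [t])
        for t in tags
    }
    for sign in signs[1:]:
        new_best = {}
        for t in tags:
            emit = _log_prob(emissions[t], sign, n_signs)
            new_best[t] = max(
                (score + _log_prob(transitions[p], t, n_tags) + emit, path + [t])
                for p, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


def sign_accuracy(model, sentences):
    """Fraction of signs whose predicted transliteration equals the gold one."""
    correct = total = 0
    for sentence in sentences:
        signs = [sign for sign, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = hmm_viterbi(model, signs)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy, hypothetical data; real training would read the corpora under raw_data.
    train = [[("𒁹", "{m}"), ("𒀭", "{d}"), ("𒌷", "uru")]]
    model = hmm_train(train)
    with open("hmm_toy.pkl", "wb") as f:  # persist the trained model as a pickle
        pickle.dump(model, f)
    print(hmm_viterbi(model, ["𒁹", "𒀭", "𒌷"]))
    print(sign_accuracy(model, train))
```

Working in log space avoids numerical underflow on long lines, and smoothing lets the decoder still produce an answer for signs it never saw during training.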

Transliterating

Use the file transliterate.py to transliterate with the trained models. Each model has a function that takes a sentence of Akkadian signs as a parameter and returns its transliteration.

Example of usage:

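# Assumes the repository root is on the Python path (or the akkadian package is installed).
from akkadian.transliterate import (transliterate, transliterate_bilstm,
    transliterate_bilstm_top3, transliterate_hmm, transliterate_memm)

akkadian_signs = "𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"
print(transliterate(akkadian_signs))
print(transliterate_bilstm(akkadian_signs))
print(transliterate_bilstm_top3(akkadian_signs))
print(transliterate_hmm(akkadian_signs))
print(transliterate_memm(akkadian_signs))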
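transliterate_bilstm_top3 returns the top three options, but this README does not say whether the ranking is done per sign or per whole sentence. Assuming a per-sign ranking over candidate probabilities (for example, a tagger's softmax outputs), picking the three best candidates can be sketched with the standard-library heapq.nlargest. The function name and input format below are hypothetical and are not the package's implementation.

```python
import heapq


def top_k_per_sign(scores, k=3):
    """Return the k highest-scoring transliterations for each sign position.

    scores: one dict per sign, mapping candidate transliteration -> probability
    (a hypothetical input format, e.g. softmax outputs of a tagger).
    """
    return [heapq.nlargest(k, candidates, key=candidates.get) for candidates in scores]


if __name__ == "__main__":
    # Toy, hypothetical probabilities for two signs.
    toy_scores = [
        {"{m}": 0.7, "diš": 0.2, "1": 0.1},
        {"{d}": 0.6, "an": 0.35, "il": 0.05},
    ]
    print(top_k_per_sign(toy_scores))
```

heapq.nlargest avoids sorting every candidate when the label set is large and only a few options are needed.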

Datasets

The main datasets used for training and testing are:

Dataset	King(s)	Period	Number of Lines	Share of Corpus
RINAP 1	Tiglath-pileser III and Shalmaneser V	744-722 BC	1125	4.78%
RINAP 3	Sennacherib	704-681 BC	7131	30.31%
RINAP 4	Esarhaddon	680-669 BC	6018	25.58%
RINAP 5	Ashurbanipal and Successors	668-612 BC	9252	39.33%
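The share column is each dataset's line count divided by the total number of lines in the four RINAP datasets (1125 + 7131 + 6018 + 9252 = 23526). A short check of that arithmetic, using only the counts from the table:

```python
# Line counts per dataset, copied from the table above.
LINE_COUNTS = {"RINAP 1": 1125, "RINAP 3": 7131, "RINAP 4": 6018, "RINAP 5": 9252}


def corpus_shares(counts):
    """Return each dataset's share of all lines as a percentage rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * lines / total, 2) for name, lines in counts.items()}


if __name__ == "__main__":
    print(sum(LINE_COUNTS.values()))  # 23526
    print(corpus_shares(LINE_COUNTS))
```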

Additional datasets:

RIAO - This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.
RIBO - This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).
SAAO - The online counterpart to the State Archives of Assyria series.
SUHU - This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.
TEI - Databases used for full translation.

Datasets deployment

The datasets are taken from the ORACC project and can be downloaded from the following link: http://oracc.museum.upenn.edu/rinap/rinapdownloads/index.html.

In our repository, the datasets are located in the "raw_data" directory. They can also be downloaded from the GitHub repository using git clone or as a zip download.
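After cloning, a quick way to confirm that the corpora arrived is to count the files in each sub-directory of raw_data. The sketch below assumes the current directory is the root of a clone; it makes no assumption about file formats, and summarize_raw_data is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def summarize_raw_data(root="raw_data"):
    """Count files in each sub-corpus directory (e.g. rinap, riao, saao) under root."""
    root = Path(root)
    return {
        sub.name: sum(1 for path in sub.rglob("*") if path.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    # Assumes the current directory is the root of a clone of the repository.
    for corpus, n_files in summarize_raw_data().items():
        print(f"{corpus}: {n_files} files")
```

rglob("*") walks nested folders too, so sub-corpora that keep their texts in deeper directories are still counted.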

Project structure

BiLSTM_input:

Contains dictionaries used for transliteration by BiLSTM.

NMT_input:

Contains dictionaries used for neural machine translation (NMT).

akkadian.egg-info:

Information and settings for the akkadian Python package.

akkadian:

Source code and training output.

output:	Training output for HMM, MEMM and BiLSTM, mostly pickles.

__init__.py: Init script for akkadian python package. Initializes global variables.

bilstm.py: Class for BiLSTM training and prediction using the AllenNLP implementation.

build_data.py: Code for organizing the data in dictionaries.

check_translation.py: Code for translation accuracy checking.

combine_algorithms.py: Code for prediction that combines HMM, MEMM and BiLSTM.

data.py: Utilities for accuracy checks and for interpreting the dictionaries.

full_translation_build_data.py: Code for organizing the data for full translation task.

get_texts_details.py: Util for getting more information about the text.

hmm.py: Implementation of HMM for train and prediction.

memm.py: Implementation of MEMM for train and prediction.

parse_json: JSON parsing used for data organizing.

parse_xml.py: XML parsing used for data organizing.

train.py: API for training all 3 algorithms and storing the output.

translation_tokenize.py: Tokenization code for the translation task.

transliterate.py: API for transliterating using all 3 algorithms.
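combine_algorithms.py is described only as prediction that uses HMM, MEMM and BiLSTM together; its actual combination strategy is not documented here. One common way to combine sequence taggers is a per-sign majority vote, sketched below under the assumption that each model returns a list with one transliteration per sign. combine_by_vote is a hypothetical helper, not the repository's code.

```python
from collections import Counter


def combine_by_vote(*model_outputs):
    """Majority vote per sign; on a tie, the earliest model in the argument list wins.

    model_outputs: one list per model, each with one transliteration per sign.
    """
    if len({len(output) for output in model_outputs}) > 1:
        raise ValueError("all models must return one transliteration per sign")
    combined = []
    for candidates in zip(*model_outputs):
        votes = Counter(candidates)
        best = max(votes.values())
        # Walk the candidates in model order so ties go to the earlier model.
        combined.append(next(c for c in candidates if votes[c] == best))
    return combined


if __name__ == "__main__":
    # Toy, hypothetical per-sign outputs from three models.
    print(combine_by_vote(["{m}", "{d}", "u"], ["{m}", "an", "u"], ["diš", "an", "u"]))
```

Passing the strongest model first (for example BiLSTM) makes it the tie-breaker when all three models disagree.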

build/lib/akkadian:

Information and settings for the akkadian Python package.

dist:

The akkadian Python package as a wheel and a source tarball.

raw_data:

Databases used for training the models.

random: Four texts used for cross-era testing.

riao: This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.

ribo: This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).

rinap: Presents fully searchable, annotated editions of the royal inscriptions of Neo-Assyrian kings Tiglath-pileser III (744-727 BC), Shalmaneser V (726-722 BC), Sennacherib (704-681 BC), Esarhaddon (680-669 BC), Ashurbanipal (668-631 BC), Aššur-etel-ilāni (630-627 BC), and Sîn-šarra-iškun (626-612 BC).

saao: The online counterpart to the State Archives of Assyria series.

suhu: This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.

tei: Databases used for full translation.

Authors

Gai Gutherz
Ariel Elazary

Release history

Version	Release date
1.0.12	May 10, 2023
1.0.11	Sep 9, 2020
1.0.10	Aug 31, 2020
1.0.9	Aug 22, 2020
1.0.8	Aug 22, 2020
1.0.7	Aug 15, 2020
1.0.6	Aug 12, 2020
1.0.5	Aug 11, 2020
1.0.4 (this version)	Aug 8, 2020
1.0.3	Aug 8, 2020
1.0.2	Jul 25, 2020
1.0.1	Jul 25, 2020
1.0.0	Jul 24, 2020
0.1.11	Jul 24, 2020
0.1.10	Jul 24, 2020
0.1.9	Jul 24, 2020
0.1.8	Jul 24, 2020
0.1.7	Jul 24, 2020
0.1.6	May 28, 2020
0.1.5	May 28, 2020
0.1.4	May 28, 2020
0.1.3	May 28, 2020
0.1.2	May 28, 2020
0.1.1	May 28, 2020
0.1	May 28, 2020

Download files


Source distribution

akkadian-1.0.4.tar.gz (33.8 kB), uploaded Aug 8, 2020

Built distribution

akkadian-1.0.4-py3-none-any.whl (33.1 MB), uploaded Aug 8, 2020, Python 3

Hashes for akkadian-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`d76938548ad4f8f7edb9bfd4397231751696687b81cfefc198ada495f2675779`
MD5	`7b062d1cdaefbdb299d2b39cba6e037c`
BLAKE2b-256	`cbfb6d52e0e86eb82ac26c482c9e942ede67ec9b402697eb5c3bbabcd2e332df`

Hashes for akkadian-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a94511ae1bc071af2dae2620681c6db9ace231466b966a8704503037a6f271c`
MD5	`243eaaa80a8a06797c5c42988093ca38`
BLAKE2b-256	`efb7bc8ab55a5e6744e85da41190a9c8a2fbc4069e8c8cb86e7244788c2801ca`