betacode

Betacode to Unicode converter.

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- Greek
Operating System
- OS Independent
Programming Language
- Python
- Python :: 3
Topic
- Text Processing :: Linguistic

Project description

betacode

Convert betacode to Unicode and vice versa. Tested on Python 3.4, 3.5, and 3.6. The definition used is based on the TLG Beta Code Manual. Only the Greek sections are covered.

Install

Installation is easy. Use pip or your preferred method to download from PyPI.

pip install betacode

Usage

Note that in all examples, strings are Unicode encoded. Input can be in upper or lower case: the official TLG definition uses only uppercase, but many resources, such as the Perseus catalog, are encoded in lowercase, so this package accepts both.

This package also does not enforce the canonical order of Greek diacritics given in the official definition, because it is unnecessary. For betacode to be unambiguous, the only requirement is that each character begins with either a * or a letter. As long as that constraint is followed, breathing marks, accents, and other diacritics can go in any order. However, the canonical order is returned when going from Unicode to betacode.

Also note that currently only individual, non-combining characters are handled. This means that not every combination of letters and diacritics is supported.
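The rule above (each character starts with a * or a letter, and its diacritics may come in any order) can be illustrated with a small sketch. This is not the package's implementation: `split_units` is a hypothetical helper, and the set of diacritic symbols it recognises is an assumption that covers only the common marks.

```python
import re

# Assumed diacritic symbols: breathings ) (, accents / = \, diaeresis +, iota subscript |.
_MARKS = r"[()/\\=+|]*"

# A unit is either "*" + marks + letter + marks (uppercase) or letter + marks (lowercase).
_UNIT = re.compile(
    r"\*(" + _MARKS + r")([A-Za-z])(" + _MARKS + r")|([A-Za-z])(" + _MARKS + r")"
)


def split_units(beta):
    """Split a betacode string into (is_upper, letter, marks) units.

    Marks are stored as a frozenset, so their order in the input is irrelevant.
    Hypothetical helper, for illustration only.
    """
    units = []
    for m in _UNIT.finditer(beta):
        if m.group(2) is not None:
            # Uppercase form: marks may appear before or after the letter.
            marks = frozenset(m.group(1) + m.group(3))
            units.append((True, m.group(2).lower(), marks))
        else:
            # Lowercase form: marks follow the letter.
            units.append((False, m.group(4).lower(), frozenset(m.group(5))))
    return units


# Both spellings describe the same character: epsilon with rough breathing and acute.
print(split_units("e(/kaston") == split_units("e/(kaston"))
```

Because every unit is anchored on its * or letter, the marks attached to it can be collected as an unordered set without any ambiguity about where one character ends and the next begins.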

Betacode to unicode

import betacode.conv

# The backslash is doubled so that Python keeps a literal \ (the betacode grave accent).
beta = "analabo/ntes de\\ kaq' e(/kaston"
betacode.conv.beta_to_uni(beta) # αναλαβόντες δὲ καθ᾽ ἕκαστον

Note that polytonic accent marks are used, not monotonic ones. Both are de jure equivalent in Greece, and betacode was initially developed to encode classical works. In other words, the oxeîa is used rather than the tónos. The oxeîa form can be converted to the modern accent form easily, either through search and replace or through Unicode normalization.
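As a sketch of the normalization route: in Unicode, the oxeîa code points canonically decompose to their tónos counterparts, so NFC normalization with Python's standard `unicodedata` module rewrites them. NFC can also change a few other code points, so check that it suits the rest of your text before applying it wholesale.

```python
import unicodedata

# U+1F79 is GREEK SMALL LETTER OMICRON WITH OXIA (the polytonic form).
polytonic = "αναλαβ\u1f79ντες"

# NFC replaces the oxia code point with U+03CC, OMICRON WITH TONOS.
monotonic = unicodedata.normalize("NFC", polytonic)

# Print the Unicode names of the accented letter before and after.
print(unicodedata.name(polytonic[6]))
print(unicodedata.name(monotonic[6]))
```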

Unicode to betacode

import betacode.conv

uni = 'αναλαβόντες δὲ καθ᾽ ἕκαστον'
betacode.conv.uni_to_beta(uni) # analabo/ntes de\ kaq\' e(/kaston

The Unicode text should use only polytonic (oxeîa) accent marks.
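Text typed on a modern Greek keyboard may contain tónos code points instead. No Unicode normalization form produces the oxeîa forms, so one option is an explicit translation table applied before conversion. The sketch below uses only the standard library; `tonos_to_oxia` is a hypothetical helper, not part of betacode, and it covers only the precomposed vowels listed.

```python
import unicodedata

# Map each precomposed tonos vowel to the corresponding oxia code point.
_TONOS_TO_OXIA = str.maketrans({
    "\u03ac": "\u1f71",  # alpha
    "\u03ad": "\u1f73",  # epsilon
    "\u03ae": "\u1f75",  # eta
    "\u03af": "\u1f77",  # iota
    "\u03cc": "\u1f79",  # omicron
    "\u03cd": "\u1f7b",  # upsilon
    "\u03ce": "\u1f7d",  # omega
    "\u0386": "\u1fbb",  # capital alpha
    "\u0388": "\u1fc9",  # capital epsilon
    "\u0389": "\u1fcb",  # capital eta
    "\u038a": "\u1fdb",  # capital iota
    "\u038c": "\u1ff9",  # capital omicron
    "\u038e": "\u1feb",  # capital upsilon
    "\u038f": "\u1ffb",  # capital omega
})


def tonos_to_oxia(text):
    """Replace tonos vowels with their oxia equivalents (hypothetical helper)."""
    return text.translate(_TONOS_TO_OXIA)


modern = "αναλαβ\u03ccντες"        # accented omicron typed with tonos
polytonic = tonos_to_oxia(modern)  # same word, now with oxia (U+1F79)

# NFC maps oxia back to tonos, so the round trip recovers the input.
print(unicodedata.normalize("NFC", polytonic) == modern)
```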

Speed

The original implementation used a custom-made trie. I was not sure it was the fastest option, so I compared it against a third-party trie implementation, pygtrie. pygtrie has nicer prefix methods, which allow much faster processing of large texts. Switching reduced the time to convert all of Strabo or Herodotus in the Perseus catalog from many minutes to about 3-4 seconds.
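To illustrate why prefix lookups matter here, the sketch below converts text by repeatedly taking the longest betacode key that matches at the current position. It uses a minimal hand-rolled trie rather than pygtrie, and the small mapping is a toy table chosen for this example, not the package's real table. `Trie`, `longest_prefix`, and `convert` are hypothetical names.

```python
class Trie:
    """Minimal character trie; a node holds a value if a key ends there."""

    def __init__(self):
        self.children = {}
        self.value = None

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def longest_prefix(self, text, start):
        """Return (end, value) for the longest key found at text[start:], or None."""
        node = self
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.value is not None:
                best = (i + 1, node.value)
        return best


def convert(text, trie):
    """Replace each longest matching key; copy unmatched characters unchanged."""
    out = []
    i = 0
    while i < len(text):
        match = trie.longest_prefix(text, i)
        if match is None:
            out.append(text[i])
            i += 1
        else:
            i, value = match
            out.append(value)
    return "".join(out)


# Toy table (an assumption for this example): a few letters plus two accented forms.
trie = Trie()
for key, value in {
    "e": "ε", "e(": "\u1f11", "e(/": "\u1f15",
    "k": "κ", "a": "α", "s": "σ", "t": "τ", "o": "ο", "n": "ν",
}.items():
    trie.insert(key, value)

print(convert("e(/kaston", trie))
```

Each lookup walks at most as many nodes as the longest key, so conversion time grows roughly linearly with the length of the text.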

Modified Betacode

I have seen mentions of a modified betacode around the internet, but I have never been able to find a definitive definition of it, so I have not implemented it. Among the differences are the handling of word-final sigma, _ as the macron, and uppercase and lowercase Roman letters instead of *.

Development

I am no classicist, and this was done in my free time. It is quite possible that some letters are not accounted for, or that some punctuation is not handled properly. If so, please tell me, as it is easy to fix, or open a PR.

Release history

- 1.0 (Mar 9, 2020)
- 0.2 (May 25, 2018)
- 0.1.6 (May 24, 2018)
- 0.1.5 (May 24, 2018), this version
- 0.1.4 (Apr 18, 2018)
- 0.1.3 (Apr 17, 2018)
- 0.1.2 (Apr 17, 2018)
- 0.1.1 (Apr 9, 2018)
- 0.1 (Apr 9, 2018)
- 0.0.1 (Apr 6, 2018)

Download files

Source Distribution

betacode-0.1.5.tar.gz (7.4 kB)

Uploaded May 24, 2018 Source

Hashes for betacode-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`46a55a1d1361f17e180539d61bb03b551aded617ed2c74fab0fb1dfc1c52d833`
MD5	`32d4534094be00447f3ccf5eb0e31134`
BLAKE2b-256	`e9ad452195d4fb7d532aa542ef79ce2be409a3a0b0329665ee8a8aec59b90755`