Python package for Glottolog data curation

pyglottolog

Programmatic access to Glottolog data.

Install

To install pyglottolog you need a Python installation (version >3.7) on your system. Then run

pip install pyglottolog

This will also install the command line interface glottolog.

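Note: To make use of pyglottolog you also need a local copy of the Glottolog data. This can be a clone of the glottolog/glottolog repository, an export of the repository, or a download from ZENODO.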

Make sure you remember where this local copy of the data is located; you may have to pass this location as an option when using pyglottolog, e.g. via --repos on the command line.

A convenient way to clone the data repository, keep it updated and access it from pyglottolog is provided by cldfbench. See the cldfbench README for details.

Python API

Using pyglottolog, Glottolog data can be accessed programmatically from within Python programs. All functionality is mediated through an instance of pyglottolog.Glottolog, e.g.

>>> from pyglottolog import Glottolog
>>> glottolog = Glottolog('.')
>>> print(glottolog)
<Glottolog repos v0.2-259-g27ac0ef at /.../glottolog>

For details, refer to the API documentation at readthedocs.
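As a minimal sketch of what working with the API might look like, the helper below formats a languoid in the style used by the command line tools. It assumes that languoid objects expose name and id (the glottocode) attributes and that Glottolog.languoid() looks up a languoid by glottocode. The names describe and main are hypothetical, introduced only for this illustration, so check the API documentation before relying on any of this.

```python
def describe(languoid):
    """Format a languoid as 'Name [glottocode]'.

    Assumption: the object has `name` and `id` attributes.
    """
    return "{0} [{1}]".format(languoid.name, languoid.id)


def main(repos=".", glottocode="abip1241"):
    """Look up one languoid in a local copy of the data and print it."""
    # Imported lazily, so `describe` stays usable without the package installed.
    from pyglottolog import Glottolog

    glottolog = Glottolog(repos)
    # Assumption: `languoid()` accepts a glottocode.
    print(describe(glottolog.languoid(glottocode)))
```

Calling main('/path/to/glottolog') with the location of your local copy of the data would print a single line for the requested languoid.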

Command line interface

Command line functionality is implemented via sub-commands of glottolog. The list of available sub-commands can be inspected by running

$ glottolog -h
usage: glottolog [-h] [--log-level LOG_LEVEL] [--repos REPOS]
                 [--repos-version REPOS_VERSION]
                 COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)
  --repos REPOS         clone of glottolog/glottolog
  --repos-version REPOS_VERSION
                        version of repository data. Requires a git clone!
                        (default: None)

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    cldf                Dump Glottolog data as CLDF dataset
    create              Create a new languoid directory for a languoid
                        specified by name and level.
    edit                Open a languoid's INI file in a text editor.
    htmlmap             Create an HTML/Javascript map (using leaflet) of
                        Glottolog languoids.
    iso2codes           Map ISO codes to the list of all Glottolog languages
                        and dialects subsumed "under" it.
    langdatastats       List all metadata fields used in languoid INI files
                        and their frequency.
    langsearch          Search Glottolog languoids.
    languoids           Write languoids data to csv files
    refsearch           Search Glottolog references
    searchindex         Index
    show                Display details of a Glottolog object.
    tree                Print the classification tree starting at a specific
                        languoid.
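If the command line interface is to be driven from a Python script, one option is to assemble the argument list explicitly. The sketch below relies only on the global options shown in the help text above (--repos and --repos-version), which are placed before the sub-command. The helper names glottolog_argv and run_glottolog are hypothetical.

```python
import subprocess


def glottolog_argv(command, *args, repos=None, repos_version=None):
    """Build the argument list for a `glottolog` sub-command.

    Global options are placed before the sub-command name.
    """
    argv = ["glottolog"]
    if repos is not None:
        argv.append("--repos={0}".format(repos))
    if repos_version is not None:
        argv.append("--repos-version={0}".format(repos_version))
    argv.append(command)
    argv.extend(args)
    return argv


def run_glottolog(command, *args, **options):
    """Run the sub-command and return its standard output as text.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        glottolog_argv(command, *args, **options),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues for queries that contain spaces or quotes.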

Extracting languoid data

Glottolog data is often integrated with other data or incorporated as reference data in tools, e.g. as LanguageTable in a CLDF dataset.

To do this, the LanguageTable from glottolog/glottolog-cldf could be copied, or one may use glottolog's languoids subcommand, which dumps basic languoid data into a CSVW file with accompanying metadata:

glottolog languoids [--output=OUTDIR] [--version=VERSION]

This will create a CSVW package, i.e.

  • a CSV table glottolog-languoids-VERSION.csv
  • and a JSON description glottolog-languoids-VERSION.csv-metadata.json

where VERSION is the result of running git describe on the data repository, or the version string passed as --version=VERSION in case you are running the command on an export of the repository or a download from ZENODO.
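The resulting package can be read with the Python standard library alone. The following is a sketch under assumptions: it treats the files as UTF-8 encoded, expects the JSON description to follow the usual CSVW layout (a tableSchema with a list of columns, possibly wrapped in a tables list), and does not assume any particular column names. The function names are hypothetical.

```python
import csv
import json


def read_languoids(csv_path):
    """Return the rows of the CSV table as dicts keyed by column name."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def column_names(metadata_path):
    """Return the column names declared in the CSVW JSON description."""
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    # A CSVW description covers either a single table or a group of tables.
    tables = metadata.get("tables", [metadata])
    names = []
    for table in tables:
        for column in table.get("tableSchema", {}).get("columns", []):
            if "name" in column:
                names.append(column["name"])
    return names
```

Reading the column names from the description first makes it easier to see which fields are available before processing the rows.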

Languoid search

To allow convenient search across all languoid info files, pyglottolog comes with functionality to create and search a Whoosh index. To do so, run

glottolog searchindex

This will take several minutes (about 15 on a somewhat beefy laptop with an SSD) and build an index of about 800 MB at build/.

Now you can search the index, e.g. using alternative names as query:

$ glottolog langsearch "Abipónok"
1 matches
Abipon [abip1241] language
languoids/tree/guai1249/guai1250/abip1241/md.ini
Abipónok [hu]

1 matches

But you can also exploit the schema defined in pyglottolog.fts.get_langs_index; i.e. use fields in your query:

$ glottolog langsearch "country:PG"
...

Alamblak [alam1246] language
languoids/tree/sepi1257/sepi1258/east2496/alam1246/md.ini
Papua New Guinea (PG)

906 matches

$ glottolog --repos=. langsearch "iso:mal"
...

Malayalam [mala1464] language
languoids/tree/drav1251/sout3133/sout3138/tami1291/tami1292/tami1293/tami1294/tami1297/tami1298/mala1541/mala1464/md.ini

1 matches
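The search output shown above is regular enough to be post-processed. The sketch below is inferred only from the examples in this section, not from a documented output format: it assumes each hit starts with a line of the form "Name [glottocode] level", followed by the path of the languoid's md.ini file, whose directory names spell out the classification lineage. The name parse_langsearch is hypothetical.

```python
import re

# A hit header looks like "Abipon [abip1241] language"; a glottocode is
# four alphanumeric characters followed by four digits.
HEADER = re.compile(
    r"^(?P<name>.+) \[(?P<glottocode>[a-z0-9]{4}[0-9]{4})\] (?P<level>\w+)$"
)


def parse_langsearch(output):
    """Turn langsearch output into a list of dicts, one per hit."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            hits.append(dict(match.groupdict(), path=None, lineage=[]))
        elif hits and line.startswith("languoids/tree/") and line.endswith("/md.ini"):
            hits[-1]["path"] = line
            # Drop "languoids", "tree" and the trailing "md.ini".
            hits[-1]["lineage"] = line.split("/")[2:-1]
    return hits
```

Lines that match neither pattern, such as highlighted alternative names or the match count, are ignored.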

Reference search

The same can be done for reference data. To create a Whoosh index with all reference data, run

glottolog searchindex

Now you can query the index (using the fields described in the schema):

$ glottolog refsearch "author:Haspelmath AND title:Atlas"
...
(13 matches)
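Fielded queries like the ones above can also be composed programmatically. This sketch assumes the index is queried with Whoosh's default query language, where field:value terms are combined with AND and phrases are wrapped in double quotes. The helper name fielded_query is hypothetical, and it does not check field names against the schema.

```python
def fielded_query(**fields):
    """Combine field/value pairs into a query such as 'author:X AND title:Y'."""
    parts = []
    for field, value in fields.items():
        value = str(value)
        # Quote multi-word values so they are searched as a phrase.
        if " " in value:
            value = '"{0}"'.format(value)
        parts.append("{0}:{1}".format(field, value))
    return " AND ".join(parts)
```

The resulting string can be passed as the query argument of glottolog refsearch or glottolog langsearch.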
