A Python Library for the Processing of Cross-Linguistic Data

CL ToolKit

By Johann-Mattis List and Robert Forkel.

Overview

While pycldf provides a basic Python API to access cross-linguistic data encoded in CLDF datasets, cltoolkit goes one step further, turning the data into full-fledged Python objects rather than shallow proxies for rows in a CSV file. As with pycldf's ORM package, there is a trade-off: convenient access and a more Pythonic API come at the expense of performance (in particular memory footprint, but also data load time) and write access. Still, most of today's CLDF datasets (or aggregations of these) can be processed with cltoolkit on reasonable hardware in minutes rather than hours.

The main idea behind cltoolkit is to make (aggregated) CLDF data easily amenable to the computation of linguistic features in a general sense (e.g. typological features). This is done by

  • providing the data for processing code as Python objects,
  • providing a framework that makes feature computation as simple as writing a Python function acting on a cltoolkit.models.Language object.
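To illustrate the second point, here is a minimal sketch of what "a feature as a Python function acting on a language object" can look like. The `Language` and `Form` classes below are hypothetical stand-ins written for this example only; they are not cltoolkit's own classes, and the attribute names (`forms`, `value`) are assumptions rather than the library's documented API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    """Hypothetical stand-in for a word form (not cltoolkit's class)."""
    value: str


@dataclass
class Language:
    """Hypothetical stand-in for cltoolkit.models.Language."""
    name: str
    forms: List[Form] = field(default_factory=list)


def mean_form_length(language: Language) -> float:
    """Feature: average number of characters per form (0.0 if there are no forms)."""
    if not language.forms:
        return 0.0
    return sum(len(form.value) for form in language.forms) / len(language.forms)


# Toy data, invented for illustration.
demo = Language("Demo", [Form("ma"), Form("tata"), Form("papapa")])
print(mean_form_length(demo))  # 4.0
```

The point of the design is that the feature itself is an ordinary function: it needs no knowledge of CSV files or of how the data was loaded, only of the object it receives.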

In general, aggregated CLDF Wordlists provide limited (automated) comparability across datasets (e.g. one could compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference properties to link to reference catalogs.

cltoolkit objects exploit this extended comparability by distinguishing "senses" from "concepts" and "graphemes" from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation (see models.py).
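One way to picture the distinction, under the assumption that a "sense" is a dataset-specific gloss and a "concept" is a sense that has been linked to a shared catalog entry, is the plain-Python sketch below. It does not use cltoolkit; the dataset contents and the identifiers `C1`, `C2`, `C3` are invented for illustration.

```python
from typing import Dict, Optional, Set

# Each hypothetical dataset maps its own gloss (a "sense") to a shared
# catalog identifier (a "concept"), or to None if the sense is unlinked.
dataset_a: Dict[str, Optional[str]] = {
    "hand": "C1",
    "arm/hand": "C1",
    "tree": "C2",
    "canoe": None,
}
dataset_b: Dict[str, Optional[str]] = {
    "HAND": "C1",
    "TREE": "C2",
    "wood": "C3",
}


def linked_concepts(senses: Dict[str, Optional[str]]) -> Set[str]:
    """Collect the catalog identifiers that a dataset's senses link to."""
    return {concept for concept in senses.values() if concept is not None}


# Glosses differ between datasets, so senses cannot be compared directly;
# the linked concepts can.
shared = linked_concepts(dataset_a) & linked_concepts(dataset_b)
print(sorted(shared))  # ['C1', 'C2']
```

Unlinked senses such as "canoe" remain available within their own dataset but drop out of the comparable subset.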

See example.md for a walk-through of the typical workflow with cltoolkit.
