A Python Library for the Processing of Cross-Linguistic Data
CL ToolKit
By Johann-Mattis List and Robert Forkel.
Overview
While pycldf provides a basic Python API to access cross-linguistic data
encoded in CLDF datasets,
cltoolkit goes one step further, turning the data into full-fledged Python objects rather than
shallow proxies for rows in a CSV file. As with pycldf's ORM package, there is a trade-off:
convenient access and a more Pythonic API come at the expense of performance (in particular
memory footprint, but also data load time) and write access. Still, most of today's CLDF datasets (or aggregations
of these) can be processed with cltoolkit on reasonable hardware in minutes rather than hours.
The main idea behind cltoolkit is to make (aggregated) CLDF data easily amenable to the computation
of linguistic features in a general sense (e.g. typological features). This is done by
- providing the data for processing code as Python objects,
- providing a framework that makes feature computation
as simple as writing a Python function acting on a
cltoolkit.models.Language object.
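The idea of "a feature is a function acting on a language object" can be illustrated with a minimal sketch. It does not use cltoolkit itself: the `Language` dataclass below is a hypothetical stand-in for `cltoolkit.models.Language`, and `number_of_forms` and `compute_features` are names invented for this illustration, not part of the library's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Language:
    """Hypothetical stand-in for cltoolkit.models.Language (not the real class)."""
    id: str
    forms: List[str] = field(default_factory=list)


def number_of_forms(language: Language) -> int:
    """A feature: a plain function that maps a language object to a value."""
    return len(language.forms)


def compute_features(
    languages: Iterable[Language],
    features: Dict[str, Callable[[Language], object]],
) -> Dict[str, Dict[str, object]]:
    """Apply every feature function to every language; return a nested dict."""
    return {
        lang.id: {name: func(lang) for name, func in features.items()}
        for lang in languages
    }


# Toy data, made up for the example.
languages = [
    Language("lang_a", ["hand", "foot", "eye"]),
    Language("lang_b", ["mano"]),
]
table = compute_features(languages, {"NumberOfForms": number_of_forms})
```

Because a feature is only a callable, adding a new one means writing another function and registering it under a name; the real library's objects and feature framework may differ in detail.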
In general, aggregated CLDF Wordlists provide only limited (automated) comparability across datasets (e.g. one could compare the number of words per language in each dataset). A lot more can be done when datasets use CLDF reference properties to link to reference catalogs, i.e. when they
- link language varieties to Glottolog languoids,
- link senses to Concepticon concept sets,
- link sound segments to CLTS sounds.
cltoolkit objects exploit this extended comparability by distinguishing "senses" from "concepts" and "graphemes"
from "sounds", and by providing convenient access to comparable subsets of objects in an aggregation
(see models.py).
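The distinction between dataset-specific "senses" and cross-dataset "concepts" can be sketched in plain Python. This is an illustration of the idea only, not cltoolkit code: the records, the dataset names and the concept set identifiers `C1` and `C2` are made-up placeholders, and `group_by_concept` and `shared_concepts` are hypothetical helpers.

```python
from collections import defaultdict

# Toy records: (dataset, sense gloss, linked concept set ID or None).
# All values are placeholders invented for this example.
senses = [
    ("dataset_a", "hand", "C1"),
    ("dataset_b", "the hand", "C1"),
    ("dataset_a", "eye", "C2"),
    ("dataset_b", "arm/hand", None),  # sense without a catalog link
]


def group_by_concept(records):
    """Group senses under the concept set they link to; skip unlinked senses."""
    concepts = defaultdict(list)
    for dataset, gloss, concept_id in records:
        if concept_id is not None:
            concepts[concept_id].append((dataset, gloss))
    return dict(concepts)


def shared_concepts(concepts, datasets):
    """Return the concept IDs attested in every one of the given datasets."""
    wanted = set(datasets)
    return sorted(
        concept_id
        for concept_id, members in concepts.items()
        if wanted <= {dataset for dataset, _ in members}
    )


concepts = group_by_concept(senses)
comparable = shared_concepts(concepts, ["dataset_a", "dataset_b"])
```

Here the glosses "hand" and "the hand" differ as strings but become comparable through their shared link, while the unlinked sense stays available within its own dataset only.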
See example.md for a walk-through of the typical workflow with cltoolkit.
File details
Details for the file cltoolkit-0.2.0.tar.gz.
File metadata
- Download URL: cltoolkit-0.2.0.tar.gz
- Upload date:
- Size: 30.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29c6cc1be983ee52959d4e97379c6608d0f94d7f89f3db8d2b39a295d516f79f |
| MD5 | ae8d121527e8df2770ed0c1a50ae36f5 |
| BLAKE2b-256 | 9cc1af90a68f60765f81d214b83ad94cdc08d77952fe74aa3df6000c9d927630 |
File details
Details for the file cltoolkit-0.2.0-py2.py3-none-any.whl.
File metadata
- Download URL: cltoolkit-0.2.0-py2.py3-none-any.whl
- Upload date:
- Size: 25.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36447e5dbf1bd6ffbce8ce71e1e24be46e7a2619e8ce6fda433fb01c15fad68c |
| MD5 | 327b2127d9a5f91643f7629e19751dfb |
| BLAKE2b-256 | 5ee167e9c8b3b4bf45c74f8ca41a1d67d5e270f4389558fb6aca518fd525914b |