libhxl-python

Python support library for the Humanitarian Exchange Language (HXL) data standard. The library requires Python 3 (library versions prior to 4.6 also supported Python 2.7).

About HXL: http://hxlstandard.org

More-detailed documentation is available in the wiki: https://github.com/HXLStandard/libhxl-python/wiki

Usage

The hxl.data() function reads HXL from a file object, filename, URL, or list of arrays and makes it available for processing, much like $() in jQuery:

import sys
import hxl

dataset = hxl.data(sys.stdin)

You can chain additional methods to process the data. This example shows an identity transformation in a pipeline (see "Generators", below):

for line in hxl.data(sys.stdin).gen_csv():
    print(line)

This is the same transformation, but loading the entire dataset into memory as an intermediate step (see "Filters", below):

for line in hxl.data(sys.stdin).cache().gen_csv():
    print(line)

Filters

There are a number of filters that you can apply to a HXL dataset in a stream. This example uses the with_rows() filter to find every row that has a #sector of "WASH" and print the organisation mentioned in the row:

for row in hxl.data(sys.stdin).with_rows('#sector=WASH'):
    print('The organisation is {}'.format(row.get('#org')))

This example removes the WASH sector from the results, then counts the number of times each organisation appears in the remaining rows:

url = 'http://example.org/data.csv'
result = hxl.data(url).with_rows('#sector!=WASH').count('#org')

The following filters are available:

Filter method Description
.append(append_sources, add_columns=True, queries=[]) Append a second HXL dataset to the current one, lining up columns.
.cache() Cache an in-memory version of the dataset (for processing multiple times).
.dedup(patterns=[], queries=[]) Deduplicate the rows in a dataset, optionally looking only at specific columns.
.with_columns(whitelist) Include only columns that match the whitelisted tag pattern(s), e.g. "#org+impl".
.without_columns(blacklist) Include all columns except those that match the blacklisted tag patterns.
.with_rows(queries, mask=[]) Include only rows that match at least one of the queries, e.g. "#sector=WASH". Optionally ignore rows that don't match a mask pattern.
.without_rows(queries, mask=[]) Exclude rows that match at least one of the queries, e.g. "#sector=WASH". Optionally ignore rows that don't match a mask pattern.
.sort(keys=None, reverse=False) Sort the rows, optionally using the pattern(s) provided as sort keys. Set _reverse_ to True for a descending sort.
.count(patterns=[], aggregators=None, queries=[]) Count the number of value combinations that appear for the pattern(s), e.g. ['#sector', '#org']. Optionally perform other aggregations, such as sums or averages.
.replace_data(original, replacement, pattern=None, use_regex=False, queries=[]) Replace values in a HXL dataset.
.replace_data_map(map_source, queries=[]) Replace values in a HXL dataset using a replacement map in another HXL dataset.
.add_columns(specs, before=False) Add columns with fixed values to the dataset, e.g. "Country#country=Kenya" to add a new column #country with the text header "Country" and the value "Kenya" in every row.
.rename_columns(specs) Change the header text and HXL hashtags for one or more columns.
.clean_data(whitespace=[], upper=[], lower=[], date=[], number=[], queries=[]) Clean whitespace, normalise dates and numbers, etc., optionally limited to specific columns.
.merge_data(merge_source, keys, tags, replace=False, overwrite=False, queries=[]) Merge values horizontally from a second dataset, based on shared keys (like a SQL join).
.explode(header_attribute='header', value_attribute='value') Explode a "wide" dataset into a "long" dataset, using the HXL +label attribute.

Sinks

Sinks take a HXL stream and convert it into something that's not HXL.

Validation

To validate a HXL dataset against a schema (also in HXL), use the validate sink:

is_valid = hxl.data(url).validate('my-schema.csv')

If you don't specify a schema, the library will use a simple, built-in schema:

is_valid = hxl.data(url).validate()

If you include a callback, you can collect details about the errors and warnings:

import sys

def my_callback(error_info):
    # error_info is a HXLValidationException
    sys.stderr.write(str(error_info) + "\n")

is_valid = hxl.data(url).validate(schema='my-schema.csv', callback=my_callback)

Generators

Generators allow the re-serialising of HXL data, returning something that works like an iterator. Example:

for line in hxl.data(url).gen_csv():
    print(line)

The following generators are available (you can use the parameters to turn the text headers and HXL tags on or off):

Generator method Description
Dataset.gen_raw(show_headers=True, show_tags=True) Generate arrays of strings, one row at a time.
Dataset.gen_csv(show_headers=True, show_tags=True) Generate encoded CSV rows, one row at a time.
Dataset.gen_json(show_headers=True, show_tags=True) Generate encoded JSON rows, one row at a time.

Caching

libhxl uses the Python requests library for opening URLs. If you want to enable caching (for example, to avoid beating up on your source with repeated requests), your code can use the requests_cache plugin, like this:

import requests_cache
requests_cache.install_cache('demo_cache', expire_after=3600)

The default caching backend is a SQLite database at the specified location.

Installation

This repository includes a standard Python setup.py script for installing the library and scripts (applications) on your system. On a Unix-like operating system, you can install with the following command:

python setup.py install

If you don't need to install from source, you can simply run

pip install libhxl

Once you've installed, you will be able to import the HXL library from any Python application, and to call scripts like hxlvalidate from the command line.

Makefile

There is also a generic Makefile that automates many tasks, including setting up a Python virtual environment for testing. The Python3 venv module is required for most of the targets.

make build-venv

Set up a local Python virtual environment for testing, if it doesn't already exist. Will recreate the virtual environment if setup.py has changed.

make test

Set up a virtual environment (if missing) and run all the unit tests.

make test-install

Test a clean installation to verify there are no missing dependencies, etc.

make close-issue

Merge the current git issue branch into the dev branch and delete the issue branch.

make push-dev

Push the git dev branch to upstream.

make merge-test

Merge the git dev branch into the test branch and push to upstream.

make merge-master

Merge the git test branch into the master branch and push to upstream.

make etags

(Re)build the TAGS file that Emacs uses for global search and replace.
