Skip to main content

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3

Project description

LightRDF

PyPI PyPI - Downloads

A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3.

Contents

Features

  • Supports N-Triples, Turtle, and RDF/XML
  • Handles large-size RDF documents
  • Provides HDT-like interfaces

Install

pip install lightrdf

Basic Usage

Iterate over all triples

With Parser:

import lightrdf

parser = lightrdf.Parser()

for triple in parser.parse("./go.owl", base_iri=None):
    print(triple)

With RDFDocument:

import lightrdf

doc = lightrdf.RDFDocument("./go.owl")

# `None` matches arbitrary term
for triple in doc.search_triples(None, None, None):
    print(triple)

Search triples with a triple pattern

import lightrdf

doc = lightrdf.RDFDocument("./go.owl")

for triple in doc.search_triples("http://purl.obolibrary.org/obo/GO_0005840", None, None):
    print(triple)

# Output:
# ('<http://purl.obolibrary.org/obo/GO_0005840>', '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', '<http://www.w3.org/2002/07/owl#Class>')
# ('<http://purl.obolibrary.org/obo/GO_0005840>', '<http://www.w3.org/2000/01/rdf-schema#subClassOf>', '<http://purl.obolibrary.org/obo/GO_0043232>')
# ...
# ('<http://purl.obolibrary.org/obo/GO_0005840>', '<http://www.geneontology.org/formats/oboInOwl#inSubset>', '<http://purl.obolibrary.org/obo/go#goslim_yeast>')
# ('<http://purl.obolibrary.org/obo/GO_0005840>', '<http://www.w3.org/2000/01/rdf-schema#label>', '"ribosome"^^<http://www.w3.org/2001/XMLSchema#string>')

Search triples with a triple pattern (Regex)

import lightrdf
from lightrdf import Regex

doc = lightrdf.RDFDocument("./go.owl")

for triple in doc.search_triples(Regex("^<http://purl.obolibrary.org/obo/.*>$"), None, Regex(".*amino[\w]+?transferase")):
    print(triple)

# Output:
# ('<http://purl.obolibrary.org/obo/GO_0003961>', '<http://www.w3.org/2000/01/rdf-schema#label>', '"O-acetylhomoserine aminocarboxypropyltransferase activity"^^<http://www.w3.org/2001/XMLSchema#string>')
# ('<http://purl.obolibrary.org/obo/GO_0004047>', '<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>', '"S-aminomethyldihydrolipoylprotein:(6S)-tetrahydrofolate aminomethyltransferase (ammonia-forming) activity"^^<http://www.w3.org/2001/XMLSchema#string>')
# ...
# ('<http://purl.obolibrary.org/obo/GO_0050447>', '<http://www.w3.org/2000/01/rdf-schema#label>', '"zeatin 9-aminocarboxyethyltransferase activity"^^<http://www.w3.org/2001/XMLSchema#string>')
# ('<http://purl.obolibrary.org/obo/GO_0050514>', '<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>', '"spermidine:putrescine 4-aminobutyltransferase (propane-1,3-diamine-forming)"^^<http://www.w3.org/2001/XMLSchema#string>')

Load file objects / texts

Load file objects with Parser:

import lightrdf

parser = lightrdf.Parser()

with open("./go.owl", "rb") as f:
    for triple in parser.parse(f, format="owl", base_iri=None):
        print(triple)

Load file objects with RDFDocument:

import lightrdf

with open("./go.owl", "rb") as f:
    doc = lightrdf.RDFDocument(f, parser=lightrdf.xml.PatternParser)

    for triple in doc.search_triples("http://purl.obolibrary.org/obo/GO_0005840", None, None):
        print(triple)

Load texts:

import io
import lightrdf

data = """<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> .
_:subject1 <http://an.example/predicate1> "object1" .
_:subject2 <http://an.example/predicate2> "object2" ."""

doc = lightrdf.RDFDocument(io.BytesIO(data.encode()), parser=lightrdf.turtle.PatternParser)

for triple in doc.search_triples("http://one.example/subject1", None, None):
    print(triple)

Benchmark (WIP)

On MacBook Air (13-inch, 2017), 1.8 GHz Intel Core i5, 8 GB 1600 MHz DDR3

https://gist.github.com/ozekik/b2ae3be0fcaa59670d4dd4759cdffbed

$ wget -q http://purl.obolibrary.org/obo/go.owl
$ gtime python3 count_triples_rdflib_graph.py ./go.owl  # RDFLib 4.2.2
1436427
235.29user 2.30system 3:59.56elapsed 99%CPU (0avgtext+0avgdata 1055816maxresident)k
0inputs+0outputs (283major+347896minor)pagefaults 0swaps
$ gtime python3 count_triples_lightrdf_rdfdocument.py ./go.owl  # LightRDF 0.1.1
1436427
7.90user 0.22system 0:08.27elapsed 98%CPU (0avgtext+0avgdata 163760maxresident)k
0inputs+0outputs (106major+41389minor)pagefaults 0swaps
$ gtime python3 count_triples_lightrdf_parser.py ./go.owl  # LightRDF 0.1.1
1436427
8.00user 0.24system 0:08.47elapsed 97%CPU (0avgtext+0avgdata 163748maxresident)k
0inputs+0outputs (106major+41388minor)pagefaults 0swaps

https://gist.github.com/ozekik/636a8fb521401070e02e010ce591fa92

$ wget -q http://downloads.dbpedia.org/2016-10/dbpedia_2016-10.nt
$ gtime python3 count_triples_rdflib_ntparser.py dbpedia_2016-10.nt  # RDFLib 4.2.2
31050
1.63user 0.23system 0:02.47elapsed 75%CPU (0avgtext+0avgdata 26568maxresident)k
0inputs+0outputs (1140major+6118minor)pagefaults 0swaps
$ gtime python3 count_triples_lightrdf_ntparser.py dbpedia_2016-10.nt  # LightRDF 0.1.1
31050
0.21user 0.04system 0:00.36elapsed 71%CPU (0avgtext+0avgdata 7628maxresident)k
0inputs+0outputs (534major+1925minor)pagefaults 0swaps

Alternatives

  • RDFLib – (Pros) pure-Python, matured, feature-rich / (Cons) takes some time to load triples
  • pyHDT – (Pros) extremely fast and efficient / (Cons) requires pre-conversion into HDT

Todo

  • Push to PyPI
  • Adopt CI
  • Handle Base IRI
  • Add basic tests
  • Switch to maturin-action from cibuildwheel
  • Support NQuads and TriG
  • Add docs
  • Add tests for w3c/rdf-tests
  • Resume on error
  • Allow opening fp
  • Support and test on RDF-star

License

Rio and PyO3 are licensed under the Apache-2.0 license.

Copyright 2020 Kentaro Ozeki

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lightrdf-0.4.0.tar.gz (174.1 kB view hashes)

Uploaded Source

Built Distributions

lightrdf-0.4.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp311-none-win_amd64.whl (863.7 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

lightrdf-0.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp311-cp311-macosx_11_0_arm64.whl (1.0 MB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

lightrdf-0.4.0-cp311-cp311-macosx_10_7_x86_64.whl (1.1 MB view hashes)

Uploaded CPython 3.11 macOS 10.7+ x86-64

lightrdf-0.4.0-cp310-none-win_amd64.whl (863.7 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

lightrdf-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp310-cp310-macosx_11_0_arm64.whl (1.0 MB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

lightrdf-0.4.0-cp310-cp310-macosx_10_7_x86_64.whl (1.1 MB view hashes)

Uploaded CPython 3.10 macOS 10.7+ x86-64

lightrdf-0.4.0-cp39-none-win_amd64.whl (863.5 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

lightrdf-0.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp38-none-win_amd64.whl (863.3 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

lightrdf-0.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

lightrdf-0.4.0-cp37-none-win_amd64.whl (863.3 kB view hashes)

Uploaded CPython 3.7 Windows x86-64

lightrdf-0.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

lightrdf-0.4.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.1 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ ARM64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page