Reverse-engineer patterns for use with the spaCy DependencyTreeMatcher

SpaCy Pattern Builder

Use training examples to build and refine patterns for use with spaCy's DependencyTreeMatcher.

Motivation

Generating patterns programmatically from training data is more efficient than creating them manually.

Installation

With pip:

pip install spacy-pattern-builder

Usage

# Load a spaCy model and parse a string to create a Doc object
import en_core_web_sm

text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
nlp = en_core_web_sm.load()
doc = nlp(text)

from spacy_pattern_builder import build_dependency_pattern

# Provide a list of tokens we want to match.
match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]

# Note that these tokens must be fully connected: every token must have a
# path to every other token in the list without traversing tokens outside
# the list. Otherwise, spacy-pattern-builder raises a
# TokensNotFullyConnectedError. You can get a connected set that includes
# your tokens with the following:
from spacy_pattern_builder import util
connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
assert match_tokens == connected_tokens

# Specify the token attributes / features to use
feature_dict = {  # Same as the default feature_dict
    'DEP': 'dep_',
    'TAG': 'tag_'
}

# Build the pattern
pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)

from pprint import pprint
pprint(pattern)  # In the format consumed by spaCy's DependencyTreeMatcher:
'''
[{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
 {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
 {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]
'''

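# Create a matcher and add the newly generated pattern.
# Note: the add(key, on_match, pattern) call below is the spaCy 2.x-era API.
# spaCy 3 replaced it with DependencyMatcher, which uses a different pattern format.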
from spacy.matcher import DependencyTreeMatcher

matcher = DependencyTreeMatcher(doc.vocab)
matcher.add('pattern', None, pattern)

# And match away
matches = matcher(doc)
for match_id, token_idxs in matches:
    tokens = [doc[i] for i in token_idxs]
    tokens = sorted(tokens, key=lambda w: w.i)
    print(tokens)  # [We, introduce, methods]
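
The "fully connected" requirement above can be illustrated without spaCy. The following is a minimal sketch, not the library's own check: `is_fully_connected` and the `heads` list are hypothetical names introduced here, and the parse is hand-written for the first four tokens of the example sentence (assuming `efficient` attaches to `methods`). Each entry of `heads` gives the index of that token's head, with the root pointing to itself.

```python
from collections import deque


def is_fully_connected(heads, selected):
    """Return True if the selected token indices form one connected piece
    of the dependency tree, using only edges between selected tokens.

    heads[i] is the index of token i's head; the root is its own head.
    (Hypothetical helper for illustration, not part of spacy-pattern-builder.)
    """
    selected = set(selected)
    if not selected:
        return True

    # Build an undirected adjacency map restricted to the selected tokens.
    neighbours = {i: set() for i in selected}
    for i in selected:
        head = heads[i]
        if head != i and head in selected:
            neighbours[i].add(head)
            neighbours[head].add(i)

    # Breadth-first search from an arbitrary selected token.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in neighbours[node] - seen:
            seen.add(nb)
            queue.append(nb)

    return seen == selected


# Hand-written heads for: We(0) introduce(1) efficient(2) methods(3)
heads = [1, 1, 3, 1]

print(is_fully_connected(heads, [0, 1, 3]))  # True: We and methods both attach to introduce
print(is_fully_connected(heads, [0, 3]))     # False: the path runs through introduce
```

In the second call, adding index 1 (`introduce`) is what makes the set connected, which is the kind of gap `util.smallest_connected_subgraph` is described as filling.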

Acknowledgements

Uses:
