A grammatical distribution analyser for NLP datasets.
Project description
GraDiAn
The Grammatical Distribution Analyser (GraDiAn) is used for analysing grammatical distributions; particularly the distributions of popular NLP datasets.
At the moment, GraDiAn does this by providing two abstract data types: the Syntactic Dependency Counter and the SentTree.
SentTree
SentTree
represents a given sentence in a tree structure.
Importantly, the SentTree
can be used to analyse the parse-tree with regards to different properties of the text including part-of-speech tags, syntactic dependencies and (with the help of spaCyTextBlob) sentiment.
## Syntactic Dependency Counter (SDC)
An SDC
does what it says on the tin.
Inheriting from python's collections.Counter
class, it maintains a count of syntactic dependency labels.
Usage
Syntactic Dependency Counter
Syntactic Dependency Counter from text:
>>> from gradian import SDC
>>> sdc = SDC.from_string('This is a test sentence!')
>>> sdc
SDC({'nsubj': 1, 'ROOT': 1, 'det': 1, 'compound': 1, 'attr': 1, 'punct': 1})
Or from a series of texts:
>>> from gradian import SDC
>>> sdc = SDC.from_string_arr(['This is a test sentence!', 'This is another sentence',
'How about another?'])
>>> sdc
SDC({'ROOT': 3, 'nsubj': 2, 'det': 2, 'attr': 2, 'punct': 2, 'compound': 1, 'advmod': 1, 'pobj': 1}
SentTree
SentTree from text:
>>> from gradian import SentTree
>>> sent_trees = SentTree.from_string('This is a test sentence! But this is another!')
>>> # Sent_Tree.from_string produces a list of trees; one for each sentence
>>> sent_trees[0].attr_tree('pos') # Get the Tree with respect to the sentence's POS-Tags
Tree('AUX', ['DET', Tree('NOUN', ['DET', 'NOUN']), 'PUNCT'])
attr_tree
can be used with any attribute of the tree including syntactic dependencies, POS-tags and (if spaCyTextBlob is enabled) sentiment.
>>> sent_trees[0].attr_tree('dependency')
Tree('ROOT', ['nsubj', Tree('attr', ['det', 'compound']), 'punct'])
The function can be called with token=True
to see the attributes alongside the relevant tokens:
>>> # token is a positional argument so does not need to be explicitly provided by keyword
>>> sent_trees[0].attr_tree('pos', token=True)
Tree('is: AUX', ['This: DET', Tree('sentence: NOUN', ['a: DET', 'test: NOUN']), '!: PUNCT'])
SentTrees
also come with the ability to create multi-attribute trees.
>>> sent_trees[0].multi_attr_tree(['pos', 'dependency'], True)
Tree('is:AUX:ROOT', ['This:DET:nsubj', Tree('sentence:NOUN:attr', ['a:DET:det', 'test:NOUN:compound']), '!:PUNCT:punct'])
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for GraDiAn-0.0.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | deee6a98e31c59f1a0ec0788694f8784058bf65e611ba67ed5b109760d3cbdc6 |
|
MD5 | b4581b92880c7e476166cedfc8f582ca |
|
BLAKE2b-256 | 109fe42bc7f7cb605d4ea6f4f04d85c6b30bb96f43298c55ec56b9562eae52dc |