A grammatical distribution analyser for NLP datasets.

GraDiAn

The Grammatical Distribution Analyser (GraDiAn) analyses grammatical distributions, particularly those of popular NLP datasets.

At the moment, GraDiAn does this by providing two abstract data types: the Syntactic Dependency Counter and the SentTree.

SentTree

SentTree represents a given sentence as a tree. Importantly, the SentTree can be used to analyse the parse tree with regard to different properties of the text, including part-of-speech tags, syntactic dependencies and (with the help of spaCyTextBlob) sentiment.
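To make the tree structure concrete, here is a minimal sketch that does not use GraDiAn or spaCy. It assumes a hand-written parse of one sentence and a hypothetical helper, build_attr_tree, that nests each token's chosen attribute under its syntactic head. The row layout, the -1 root convention and the tuple-based node format are illustrative assumptions, not GraDiAn's internals.

```python
# Hand-written parse of "This is a test sentence!" (assumed, not parser output).
# Each row is (token, pos, dependency, head_index); the root uses head -1,
# which is a convention chosen for this sketch only.
PARSE = [
    ("This", "DET", "nsubj", 1),
    ("is", "AUX", "ROOT", -1),
    ("a", "DET", "det", 4),
    ("test", "NOUN", "compound", 4),
    ("sentence", "NOUN", "attr", 1),
    ("!", "PUNCT", "punct", 1),
]

# Column of each attribute inside a row.
ATTR_INDEX = {"token": 0, "pos": 1, "dependency": 2}


def build_attr_tree(parse, attr, index=None):
    """Hypothetical helper: leaves are strings, inner nodes are (label, children)."""
    if index is None:
        # Start from the root token, i.e. the row whose head is -1.
        index = next(i for i, row in enumerate(parse) if row[3] == -1)
    label = parse[index][ATTR_INDEX[attr]]
    # Children are the tokens whose head is the current token, in sentence order.
    children = [
        build_attr_tree(parse, attr, i)
        for i, row in enumerate(parse)
        if row[3] == index
    ]
    return (label, children) if children else label


print(build_attr_tree(PARSE, "pos"))
# ('AUX', ['DET', ('NOUN', ['DET', 'NOUN']), 'PUNCT'])
print(build_attr_tree(PARSE, "dependency"))
# ('ROOT', ['nsubj', ('attr', ['det', 'compound']), 'punct'])
```

The same head links produce both trees; only the attribute read from each token changes.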

Syntactic Dependency Counter (SDC)

An SDC does what it says on the tin. Inheriting from Python's collections.Counter class, it maintains a count of syntactic dependency labels.
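As a rough illustration of what inheriting from collections.Counter provides, the sketch below defines a hypothetical LabelCounter class (a stand-in, not GraDiAn's SDC) and feeds it labels directly instead of parsing text. The from_labels constructor is an assumption made for this example.

```python
from collections import Counter
from typing import Iterable


class LabelCounter(Counter):
    """Hypothetical stand-in for SDC: a Counter keyed by dependency label."""

    @classmethod
    def from_labels(cls, labels: Iterable[str]) -> "LabelCounter":
        # Counter's constructor already tallies an iterable of hashable items.
        return cls(labels)


# Labels copied from the README's example output for 'This is a test sentence!'.
counts = LabelCounter.from_labels(
    ["nsubj", "ROOT", "det", "compound", "attr", "punct"]
)

# Hand-written labels for a second sentence (illustrative only, not parser output).
counts.update(["nsubj", "ROOT", "det", "attr"])

print(counts["ROOT"])        # 2
print(counts["compound"])    # 1
print(counts["dobj"])        # 0: a missing label counts as zero, no KeyError
print(sum(counts.values()))  # 10
```

Everything a Counter offers (update, most_common, zero for missing keys) comes for free with this design.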

Usage

Syntactic Dependency Counter

Syntactic Dependency Counter from text:

>>> from gradian import SDC
>>> sdc = SDC.from_string('This is a test sentence!')
>>> sdc
SDC({'nsubj': 1, 'ROOT': 1, 'det': 1, 'compound': 1, 'attr': 1, 'punct': 1})

Or from a series of texts:

>>> from gradian import SDC
>>> sdc = SDC.from_string_arr(['This is a test sentence!', 'This is another sentence',
                               'How about another?'])
>>> sdc
SDC({'ROOT': 3, 'nsubj': 2, 'det': 2, 'attr': 2, 'punct': 2, 'compound': 1, 'advmod': 1, 'pobj': 1})
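Raw counts depend on how much text was parsed, so comparing datasets usually means comparing proportions. The sketch below is one possible way to do that, assuming the counts behave like an ordinary collections.Counter. It uses a plain Counter holding the numbers from the output above, and relative_frequencies is a hypothetical helper, not part of GraDiAn.

```python
from collections import Counter

# Counts copied from the three-sentence example output above.
counts = Counter({"ROOT": 3, "nsubj": 2, "det": 2, "attr": 2, "punct": 2,
                  "compound": 1, "advmod": 1, "pobj": 1})


def relative_frequencies(counter):
    """Hypothetical helper: map each label to its share of all counted labels."""
    total = sum(counter.values())
    if total == 0:
        return {}  # Avoid dividing by zero for an empty counter.
    return {label: n / total for label, n in counter.items()}


freqs = relative_frequencies(counts)

# Print labels from most to least frequent.
for label, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{label:<10}{share:.3f}")
```

Two corpora of different sizes can then be compared label by label on the same scale.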

SentTree

SentTree from text:

>>> from gradian import SentTree
>>> sent_trees = SentTree.from_string('This is a test sentence! But this is another!')
>>> # SentTree.from_string produces a list of trees, one for each sentence
>>> sent_trees[0].attr_tree('pos')  # Get the Tree with respect to the sentence's POS-Tags
Tree('AUX', ['DET', Tree('NOUN', ['DET', 'NOUN']), 'PUNCT'])

attr_tree accepts any attribute of the tree, including syntactic dependencies, POS tags and (if spaCyTextBlob is enabled) sentiment.

>>> sent_trees[0].attr_tree('dependency')
Tree('ROOT', ['nsubj', Tree('attr', ['det', 'compound']), 'punct'])

The function can be called with token=True to see the attributes alongside the relevant tokens:

>>> # token can also be passed positionally, so the keyword is optional
>>> sent_trees[0].attr_tree('pos', token=True)
Tree('is:  AUX', ['This: DET', Tree('sentence:  NOUN', ['a: DET', 'test: NOUN']), '!: PUNCT'])

SentTrees also come with the ability to create multi-attribute trees.

>>> sent_trees[0].multi_attr_tree(['pos', 'dependency'], True)
Tree('is:AUX:ROOT', ['This:DET:nsubj', Tree('sentence:NOUN:attr', ['a:DET:det', 'test:NOUN:compound']), '!:PUNCT:punct'])
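The node labels in the output above look like the token followed by each requested attribute, joined with colons. The sketch below reproduces that label format for a single token. The multi_label helper and the dictionary layout are hypothetical, and the label produced without the token is this sketch's own behaviour rather than confirmed GraDiAn output.

```python
def multi_label(row, attrs, include_token=False):
    """Hypothetical helper: join a token's attributes into one colon-separated label."""
    parts = [row["token"]] if include_token else []
    # Keep the attributes in the order the caller requested them.
    parts.extend(row[attr] for attr in attrs)
    return ":".join(parts)


# Attribute values for the root token of 'This is a test sentence!', taken from the README.
row = {"token": "is", "pos": "AUX", "dependency": "ROOT"}

print(multi_label(row, ["pos", "dependency"], include_token=True))  # is:AUX:ROOT
print(multi_label(row, ["pos", "dependency"]))                      # AUX:ROOT
```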
