A simple utility that takes in a sentence and outputs information about the Academic Word List (AWL) words in it.

Awlify

A very basic tool that takes in a sentence of text and outputs the same text, annotated with information about whether any of its words are in the Academic Word List.

installing

pip install awlify

If you haven't used spaCy on your system before, you'll also need to download the model this package uses:

python -m spacy download en_core_web_sm

tests

python -m unittest

command usage

from awlify import awlify

result = awlify('please inform me of the academic words in this sentence')

print(result)
{'data': {'sentence': 'please inform me of the academic words in this sentence', 'awl_words': [{'index': 5, 'word': 'academic', 'meta': {'head': 'academy', 'sublist': 5}}]}}

expected input / output

output format:

{
  'data': {
    'sentence': 'THIS IS THE ORIGINAL SENTENCE',
    'awl_words': [
      {
        'index': INDEX_OF_AWL_WORD_FOUND,
        'word': 'AWL_WORD_FOUND',
        'meta': {
          'head': 'THE_HEADWORD_FROM_THE_AWL',
          'sublist': THE_AWL_SUBLIST_OF_THE_WORD
        }
      }
    ]
  }
}
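
As a rough sketch of how a caller might consume this structure, the snippet below walks a result dictionary and collects the index, word, headword, and sublist of each match. The `result` literal is copied from the usage example above rather than produced by calling the package, and `summarize` is a hypothetical helper name, not part of awlify.

```python
# Hypothetical helper: not part of awlify, only a consumer of its documented output shape.
def summarize(result):
    """Return (index, word, headword, sublist) tuples for each AWL match."""
    rows = []
    for entry in result['data']['awl_words']:
        meta = entry['meta']
        rows.append((entry['index'], entry['word'], meta['head'], meta['sublist']))
    return rows


# Literal copied from the documented example output, so this runs without awlify installed.
result = {
    'data': {
        'sentence': 'please inform me of the academic words in this sentence',
        'awl_words': [
            {'index': 5, 'word': 'academic', 'meta': {'head': 'academy', 'sublist': 5}}
        ],
    }
}

for index, word, head, sublist in summarize(result):
    print(f'{index}: {word} (headword: {head}, sublist {sublist})')
```

An empty `awl_words` list simply yields no rows, so the same loop covers sentences without AWL words.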

example input for a simple sentence (no AWL words):

simple_sentence = awlify('this is a sentence')

example output for a simple sentence (no AWL words):

{
  'data': {
    'sentence': 'this is a sentence',
    'awl_words': []
  }
}

example input for a complex sentence (a few AWL words):

complex_sentence = awlify('the economic recovery is ongoing and potentially problematic')

example output for a complex sentence (a few AWL words):

{
  'data': {
    'sentence': 'the economic recovery is ongoing and potentially problematic',
    'awl_words': [
      {
        'index': 1,
        'word': 'economic',
        'meta': {
          'head': 'economy',
          'sublist': 1
        }
      },
      {
        'index': 2,
        'word': 'recovery',
        'meta': {
          'head': 'recover',
          'sublist': 6
        }
      },
      {
        'index': 6,
        'word': 'potentially',
        'meta': {
          'head': 'potential',
          'sublist': 2
        }
      }
    ]
  }
}
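
One possible way to turn this output back into annotated text is sketched below. It assumes that `index` is the zero-based position of the word when the sentence is split on whitespace; that matches the examples shown here but may not hold for sentences containing punctuation. The `annotate` function and the `word[head/sublist]` notation are hypothetical, not something awlify provides.

```python
# Hypothetical helper: awlify itself returns the dictionary only.
def annotate(result):
    """Mark each AWL word in the sentence as word[head/sublist].

    Assumption: 'index' is the zero-based whitespace-token position,
    which is consistent with the documented examples.
    """
    tokens = result['data']['sentence'].split()
    for entry in result['data']['awl_words']:
        i = entry['index']
        # Only annotate when the token at that position is the reported word.
        if 0 <= i < len(tokens) and tokens[i] == entry['word']:
            meta = entry['meta']
            tokens[i] = f"{tokens[i]}[{meta['head']}/{meta['sublist']}]"
    return ' '.join(tokens)


# Literal copied from the documented example output for the complex sentence.
complex_result = {
    'data': {
        'sentence': 'the economic recovery is ongoing and potentially problematic',
        'awl_words': [
            {'index': 1, 'word': 'economic', 'meta': {'head': 'economy', 'sublist': 1}},
            {'index': 2, 'word': 'recovery', 'meta': {'head': 'recover', 'sublist': 6}},
            {'index': 6, 'word': 'potentially', 'meta': {'head': 'potential', 'sublist': 2}},
        ],
    }
}

print(annotate(complex_result))
```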

NOTES

The current implementation tokenizes sentences with spaCy, which makes it heavier than strictly necessary, since it doesn't use any of the package's more advanced features.

In theory, a simple regex could probably perform about 98% as well, so I might add that as an option in the future if no real use case turns up that needs the full weight of spaCy.
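
To illustrate what a regex-based alternative might look like, here is a minimal sketch that is not the package's implementation. `regex_awlify`, `WORD_RE`, and `SAMPLE_AWL` are hypothetical names, and the lookup table holds only the four entries that appear in the examples above, not the real Academic Word List. Because the regex skips punctuation while spaCy emits punctuation as separate tokens, the `index` values could differ from awlify's on punctuated sentences.

```python
import re

# Tiny illustrative lookup; entries copied from the documented examples.
# The real Academic Word List is far larger and is NOT reproduced here.
SAMPLE_AWL = {
    'academic': {'head': 'academy', 'sublist': 5},
    'economic': {'head': 'economy', 'sublist': 1},
    'recovery': {'head': 'recover', 'sublist': 6},
    'potentially': {'head': 'potential', 'sublist': 2},
}

# A word is a run of letters, optionally joined by internal apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def regex_awlify(sentence, awl=SAMPLE_AWL):
    """Return an awlify-shaped dictionary using regex tokenization only."""
    tokens = WORD_RE.findall(sentence)
    matches = [
        {'index': i, 'word': tok, 'meta': dict(awl[tok.lower()])}
        for i, tok in enumerate(tokens)
        if tok.lower() in awl
    ]
    return {'data': {'sentence': sentence, 'awl_words': matches}}


print(regex_awlify('the economic recovery is ongoing and potentially problematic'))
```

The lookup is an exact, case-insensitive match on the surface form, so inflected forms missing from the table are not found.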

REFERENCES

Coxhead, Averil (2000) A New Academic Word List. TESOL Quarterly, 34(2): 213-238.
