Yoctol Utterance processing utilities
UTTUT
UTTerance UTilities for dialogue systems. This package provides general-purpose utilities for processing chatbot utterance data.
BERT Pipe
To create a pipe for BERT preprocessing, please take a look at BERT.
Installation
$ pip install uttut
Usage
Let's create a Pipe to preprocess a Datum that contains an English utterance.
Build a Pipe
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('IntTokenWithSpace')
>>> p.add('FloatTokenWithSpace')
>>> p.add('MergeWhiteSpaceCharacters')
>>> p.add('StripWhiteSpaceCharacters')
>>> p.add('EngTokenizer') # word-level (ref: BERT)
>>> p.add('AddSosEos', checkpoint='result_of_add_sos_eos')
>>> p.add('Pad', {'maxlen': 5})
>>> p.add(
...     'Token2Index',
...     {
...         'token2index': {
...             '<sos>': 0, '<eos>': 1,  # for AddSosEos
...             '<unk>': 2, '<pad>': 3,  # for Pad
...             '_int_': 4,  # for IntTokenWithSpace
...             '_float_': 5,  # for FloatTokenWithSpace
...             'I': 6,
...             'apples': 7,
...         },
...     },
... )
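To make the roles of the special vocabulary entries concrete, here is a minimal pure-Python sketch of padding followed by index lookup. It does not call uttut and is not its implementation: `pad_tokens` and `tokens_to_indices` are hypothetical helpers, the truncation rule is a simplification, and the pad length of 7 is chosen for this sketch only.

```python
from typing import Dict, List

# Same vocabulary as the Token2Index step above.
TOKEN2INDEX: Dict[str, int] = {
    '<sos>': 0, '<eos>': 1,
    '<unk>': 2, '<pad>': 3,
    '_int_': 4,
    '_float_': 5,
    'I': 6,
    'apples': 7,
}


def pad_tokens(tokens: List[str], maxlen: int, pad: str = '<pad>') -> List[str]:
    """Truncate or right-pad a token list to exactly maxlen items (hypothetical helper)."""
    return tokens[:maxlen] + [pad] * max(0, maxlen - len(tokens))


def tokens_to_indices(tokens: List[str], vocab: Dict[str, int], unk: str = '<unk>') -> List[int]:
    """Map each token to its index, falling back to the <unk> index (hypothetical helper)."""
    return [vocab.get(token, vocab[unk]) for token in tokens]


tokens = ['<sos>', 'I', 'like', 'apples', '<eos>']
indices = tokens_to_indices(pad_tokens(tokens, maxlen=7), TOKEN2INDEX)
print(indices)  # [0, 6, 2, 7, 1, 3, 3]
```

Here 'like' is not in the vocabulary, so it maps to the `<unk>` index 2, and the two trailing 3s are `<pad>` entries.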
transform
>>> from uttut.elements import Datum, Entity, Intent
>>> datum = Datum(
...     utterance='I like apples.',
...     intents=[Intent(label=1), Intent(label=2)],
...     entities=[Entity(start=7, end=13, value='apples', label=7)],
... )
>>> output_indices, intent_labels, entity_labels, label_aligner, intermediate = p.transform(datum)
>>> output_indices
[0, 6, 2, 7, 1, 3, 3]
>>> intent_labels
[1, 2]
>>> entity_labels
[0, 0, 0, 7, 0, 0, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
# label_aligner
>>> label_aligner.inverse_transform(entity_labels)
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
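The character-level list returned by `inverse_transform` has one label per character of the utterance. As a rough illustration of how an entity span relates to such a list, the sketch below expands (start, end, label) triples into per-character labels. `spans_to_char_labels` is a hypothetical helper written for this example, not part of uttut, and it assumes `end` is exclusive.

```python
from typing import List, Tuple


def spans_to_char_labels(
    utterance: str,
    spans: List[Tuple[int, int, int]],
    default: int = 0,
) -> List[int]:
    """Expand (start, end, label) spans into one label per character.

    `end` is treated as exclusive, matching Entity(start=7, end=13) for 'apples'.
    """
    labels = [default] * len(utterance)
    for start, end, label in spans:
        labels[start:end] = [label] * (end - start)
    return labels


char_labels = spans_to_char_labels('I like apples.', [(7, 13, 7)])
print(char_labels)  # [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```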
transform sequence
>>> output_sequence, label_aligner, intermediate = p.transform_sequence('I like apples.')
>>> output_sequence
[0, 6, 2, 7, 1, 3, 3]
# label_aligner
>>> label_aligner.transform([0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0])
[0, 0, 0, 7, 0, 0, 0]
>>> label_aligner.inverse_transform([0, 0, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
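The two `label_aligner` calls convert between character-level and token-level labels. The following pure-Python sketch shows one way such an alignment could work, assuming each token remembers the character span it came from and that added tokens (`<sos>`, `<eos>`, `<pad>`) have no span. The majority-vote rule and the helper names are assumptions of this sketch; uttut's actual strategy may differ.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (start, end) in the utterance, end exclusive; None for tokens added by the pipe.
Span = Optional[Tuple[int, int]]


def to_token_labels(char_labels: List[int], spans: List[Span], default: int = 0) -> List[int]:
    """Give each token the most common label among the characters it covers."""
    token_labels = []
    for span in spans:
        covered = [] if span is None else char_labels[span[0]:span[1]]
        if covered:
            token_labels.append(Counter(covered).most_common(1)[0][0])
        else:
            token_labels.append(default)
    return token_labels


def to_char_labels(token_labels: List[int], spans: List[Span], length: int, default: int = 0) -> List[int]:
    """Copy each token's label back onto the characters it covers."""
    char_labels = [default] * length
    for label, span in zip(token_labels, spans):
        if span is not None:
            start, end = span
            char_labels[start:end] = [label] * (end - start)
    return char_labels


# Spans for: <sos>, I, like, apples, <eos>, <pad>, <pad>
spans: List[Span] = [None, (0, 1), (2, 6), (7, 13), None, None, None]
char_level = [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]

token_level = to_token_labels(char_level, spans)
print(token_level)  # [0, 0, 0, 7, 0, 0, 0]
print(to_char_labels(token_level, spans, length=len(char_level)))
# [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```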
Serialization
Serialize
>>> serialized_str = p.serialize()
Deserialize
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe.deserialize(serialized_str)