TEA - Translation Engine Architect

A command-line tool to create translation engines.

Install

First install pipx, then install TEA with it:

pipx install pangeamt-tea

Usage

Step 1: Create a new project

tea new --customer customer --src_lang es --tgt_lang en --flavor automotion --version 2

This command will create the project directory structure:

├── customer_es_en_automotion_2
│   ├── config.yml
│   └── data

Then change into the project directory:

cd customer_es_en_automotion_2
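The directory name appears to be the tea new options joined with underscores. The sketch below reproduces the single example above. The helper project_dir_name is hypothetical, and the naming rule is inferred from that one example rather than taken from TEA's source.

```python
def project_dir_name(customer: str, src_lang: str, tgt_lang: str,
                     flavor: str, version: int) -> str:
    """Hypothetical helper: join the `tea new` options with underscores.

    The rule is inferred from one README example
    (customer_es_en_automotion_2); TEA's real logic may differ.
    """
    return "_".join([customer, src_lang, tgt_lang, flavor, str(version)])


if __name__ == "__main__":
    # Mirrors: tea new --customer customer --src_lang es --tgt_lang en
    #          --flavor automotion --version 2
    print(project_dir_name("customer", "es", "en", "automotion", 2))
```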

Step 2: Configuration

Tokenizer

A tokenizer can be applied to the source and the target text:

tea config tokenizer --src mecab --tgt moses

To list all available tokenizers:

tea config tokenizer --help

If you do not want to use tokenizers, run:

tea config tokenizer -s none -t none
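The sketch below illustrates what a tokenizer does: it splits text into word and punctuation tokens, and a detokenizer reverses the split. It is a toy regex stand-in, not Moses or MeCab. The Moses tokenizer handles many language-specific cases, and MeCab performs morphological analysis for Japanese, which does not separate words with spaces. The function names are hypothetical.

```python
import re

# Toy rule: a token is a run of word characters or a single non-space symbol.
# This only illustrates the idea; it is not the Moses or MeCab tokenizer.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def toy_tokenize(text: str) -> list[str]:
    return _TOKEN_RE.findall(text)


def toy_detokenize(tokens: list[str]) -> str:
    # Naive inverse: attach closing punctuation to the previous token.
    out = ""
    for tok in tokens:
        if out and tok not in ",.!?;:":
            out += " "
        out += tok
    return out
```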

Truecaser

tea config truecaser --src --tgt

If you do not want to use a truecaser, run:

tea config truecaser
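A truecaser learns the most frequent casing of each word. It then restores that casing where capitalization carries no information, typically on the first word of a sentence. The sketch below shows the idea in simplified form, assuming whitespace-tokenized input. It is not TEA's or Moses' implementation: it learns casings only from non-initial positions and changes only the first token.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences: list[str]) -> dict[str, str]:
    """Map each lowercased word to its most frequent surface form.

    Sentence-initial tokens are skipped because their capitalization is
    forced by position rather than by the word itself.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {key: forms.most_common(1)[0][0] for key, forms in counts.items()}


def truecase(sentence: str, model: dict[str, str]) -> str:
    tokens = sentence.split()
    if tokens:
        # Only the first token is changed; unknown words are left as-is.
        tokens[0] = model.get(tokens[0].lower(), tokens[0])
    return " ".join(tokens)
```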

BPE / SentencePiece

For joint BPE:

tea config bpe -j

For separate (non-joint) BPE on source and target:

tea config bpe -s -t

To use SentencePiece instead:

tea config bpe --sentencepiece

To change the SentencePiece defaults, use the options --model_type TEXT (default: unigram) and --vocab_size INTEGER (default: 8000).
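BPE learns a sequence of merge operations. At each step it finds the most frequent pair of adjacent symbols in the training vocabulary and merges that pair into a new symbol. The sketch below is a minimal version of the classic merge-learning loop, run on a toy word-frequency dictionary. It is an illustration, not the code TEA runs.

```python
from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge operations from a {word: frequency} dictionary."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        # Rewrite every word, replacing occurrences of `best` with one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges
```

With joint BPE, the merges would be learned once from the combined source and target word counts. Without it, learn_bpe would be called once per side.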

Processors

tea config processors -s "{processors}"

where {processors} is a list of preprocessors and postprocessors.

To list all available processors:

tea config processors --list

To test the processors that will be applied, run this script from the main TEA project directory:

debug_normalizers.py <config_file> <src_test> <tgt_test>

where config_file is the YAML configuration file, and src_test and tgt_test are the source and target segments to test.
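Conceptually, a processor list is an ordered pipeline of text transformations applied to each segment. The sketch below mimics the kind of before/after trace a debug run would give for a source and target segment. The processor names and functions are hypothetical examples, not TEA's processors; tea config processors --list shows the real ones.

```python
import unicodedata
from typing import Callable

# Hypothetical processors for illustration only.
PROCESSORS: dict[str, Callable[[str], str]] = {
    "nfc": lambda s: unicodedata.normalize("NFC", s),
    "collapse_spaces": lambda s: " ".join(s.split()),
    "normalize_quotes": lambda s: s.replace("«", '"').replace("»", '"'),
}


def run_pipeline(names: list[str], segment: str) -> list[tuple[str, str]]:
    """Apply processors in order and record the text after each one."""
    trace = []
    for name in names:
        segment = PROCESSORS[name](segment)
        trace.append((name, segment))
    return trace


def debug(names: list[str], src: str, tgt: str) -> None:
    # Print each side before and after every processing step.
    for label, text in (("src", src), ("tgt", tgt)):
        print(f"{label}: {text!r}")
        for name, result in run_pipeline(names, text):
            print(f"  after {name}: {result!r}")
```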

Prepare

tea config prepare --shard_size 100000 --src_seq_length 400 --tgt_seq_length 400
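These option names match those of OpenNMT-py's legacy preprocessing step. Read that way, pairs longer than the source or target length limit are dropped, and the remaining data is written in shards of shard_size examples. The sketch below shows that filtering and sharding logic in isolation, assuming lengths are counted in whitespace-separated tokens. It is an illustration, not TEA's code.

```python
from typing import Iterable, Iterator


def filter_by_length(pairs: Iterable[tuple[str, str]], src_seq_length: int,
                     tgt_seq_length: int) -> Iterator[tuple[str, str]]:
    """Keep pairs whose token counts are within both limits.

    Assumption: tokens are whitespace-separated.
    """
    for src, tgt in pairs:
        if len(src.split()) <= src_seq_length and len(tgt.split()) <= tgt_seq_length:
            yield src, tgt


def shard(items: Iterable, shard_size: int) -> Iterator[list]:
    """Group items into lists of at most shard_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == shard_size:
            yield current
            current = []
    if current:
        yield current  # last, possibly smaller, shard
```

With the values above, this would be shard(filter_by_length(pairs, 400, 400), 100000).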

Translation model

tea config translation-model -n onmt

Step 3: Add data

Copy your multilingual resources (.tmx files, bilingual files, .af files) into the 'data' directory.
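TMX is an XML format. Each translation unit (tu) holds one tuv element per language, identified by its xml:lang attribute, and the text sits in a seg element. The sketch below extracts source/target pairs from a TMX string using only the standard library. It shows the file structure only and is not how TEA reads its data. The function name read_tmx_pairs is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text: str, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs for two language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # "es-ES" -> "es"; join text split by any inline tags.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```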

Step 4: Run

Create the workflow:

tea workflow new

Clean the data by passing it through the normalizers and validators:

tea workflow clean -n {clean_th} -d

where {clean_th} is the number of threads.
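Cleaning normalizes each segment pair and then runs validators that decide whether to keep it. The -n option spreads that work over several workers. The sketch below shows the shape of this step with a toy whitespace normalizer and two hypothetical validators, run on a thread pool: non-empty segments, and a length ratio of at most 3. TEA's actual normalizers, validators and thresholds may differ. Because of Python's GIL, a real tool might use processes instead of threads for CPU-bound cleaning.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def not_empty(src: str, tgt: str) -> bool:
    return bool(src) and bool(tgt)


def length_ratio_ok(src: str, tgt: str, max_ratio: float = 3.0) -> bool:
    a, b = len(src.split()), len(tgt.split())
    return max(a, b) <= max_ratio * max(min(a, b), 1)


VALIDATORS = [not_empty, length_ratio_ok]  # hypothetical validators


def process(pair: tuple[str, str]) -> Optional[tuple[str, str]]:
    # Toy normalizer: collapse runs of whitespace and trim both ends.
    src, tgt = (" ".join(text.split()) for text in pair)
    if all(check(src, tgt) for check in VALIDATORS):
        return src, tgt
    return None  # rejected by a validator


def clean(pairs: list[tuple[str, str]], num_threads: int) -> list[tuple[str, str]]:
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(process, pairs))
    return [pair for pair in results if pair is not None]
```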

Preprocess the data (split it into train, dev and test sets, tokenize it and apply BPE):

tea workflow prepare -n {prepare_th} -s 3

where {prepare_th} is the number of threads.
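Splitting holds out some pairs as development (validation) and test sets before training. The sketch below does a seeded random split with caller-chosen dev and test sizes. It is a generic illustration: it does not reproduce TEA's split logic or model what the -s 3 option controls.

```python
import random


def split_corpus(pairs: list, dev_size: int, test_size: int, seed: int = 1234):
    """Shuffle a copy of the data and carve off dev and test sets."""
    if dev_size + test_size > len(pairs):
        raise ValueError("dev_size + test_size exceeds the number of pairs")
    shuffled = pairs[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    return train, dev, test
```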

Train the model:

tea workflow train --gpu 0

To train without a GPU, omit the --gpu parameter.

Evaluate the model:

tea workflow eval --step {step} --src file.src --ref file.tgt --log file.log --out file.out --gpu 0
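This README does not say which metric eval reports. BLEU is a common choice for scoring --out translations against --ref references. As an illustration only, the sketch below computes corpus-level BLEU: clipped n-gram precision up to 4-grams, a brevity penalty, one reference per segment and no smoothing, on whitespace tokens. For real evaluation, a maintained implementation such as sacreBLEU is preferable.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus BLEU in [0, 100] with one reference per segment, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match at all
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```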

Reset

First, you can check the current status of the workflow with:

tea workflow status

Then you can reset your workflow at any step (clean, prepare, train, eval) with:

tea workflow reset -s {step_name}

To fully reset the workflow, use:

tea workflow reset

For help with the reset command, run:

tea workflow reset --help
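The workflow steps run in a fixed order: clean, prepare, train, eval. Later steps depend on earlier output, so one plausible reading of resetting at a step is that the step and every step after it must be rerun. The sketch below models that assumption with a hypothetical status dictionary. It is not TEA's implementation, and TEA's actual reset semantics may differ; tea workflow reset --help is the authority.

```python
from typing import Optional

STEPS = ["clean", "prepare", "train", "eval"]


def reset(status: dict[str, bool], step: Optional[str] = None) -> dict[str, bool]:
    """Return a new status with `step` and all later steps marked not done.

    With no step, the whole workflow is reset. (Assumed semantics.)
    """
    start = 0 if step is None else STEPS.index(step)
    # A step stays done only if it was done and comes before the reset point.
    return {name: done and STEPS.index(name) < start for name, done in status.items()}
```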

Release history

0.2.34 (Jan 20, 2022)
0.2.33 (Nov 4, 2021)
0.2.32 (Feb 24, 2021)
0.2.31 (Feb 8, 2021)
0.2.30 (Feb 8, 2021)
0.2.29 (Feb 5, 2021)
0.2.28 (Feb 5, 2021)
0.2.27 (Jan 7, 2021)
0.2.26 (Jan 4, 2021)
0.2.25 (Dec 22, 2020)
0.2.24 (Oct 2, 2020)
0.2.23 (Oct 2, 2020)
0.2.22 (Sep 28, 2020)
0.2.21 (Sep 4, 2020)
0.2.20 (Jul 2, 2020)
0.2.19 (Jun 4, 2020)
0.2.18 (Jun 3, 2020)
0.2.17 (May 26, 2020)
0.2.16 (May 18, 2020)
0.2.15 (May 13, 2020)
0.2.14 (May 12, 2020)
0.2.13 (May 8, 2020)
0.2.12 (Apr 28, 2020)
0.2.11 (Apr 27, 2020)
0.2.10 (Apr 21, 2020)
0.2.9 (Apr 17, 2020)
0.2.8 (Apr 17, 2020)
0.2.7 (Apr 10, 2020)
0.2.6 (Apr 6, 2020)
0.2.5 (Apr 2, 2020)
0.2.4 (Apr 2, 2020)
0.2.3 (Apr 1, 2020)
0.2.2 (Mar 31, 2020)
0.2.1 (Mar 30, 2020)
0.2.0 (Mar 27, 2020)
0.1.1 (Feb 13, 2020)
0.1.0 (Feb 12, 2020)
0.0.103 (Jan 20, 2020)
0.0.102 (Jan 20, 2020)
0.0.101 (Jan 20, 2020)
0.0.100 (Jan 20, 2020)
0.0.99 (Nov 19, 2019)
0.0.98 (Nov 19, 2019)
0.0.97 (Nov 19, 2019)
0.0.96 (Nov 19, 2019)
0.0.95 (Nov 19, 2019)
0.0.94 (Nov 19, 2019)
0.0.93 (Nov 19, 2019)
0.0.92 (Nov 19, 2019)
0.0.91 (Nov 19, 2019)
0.0.90 (Nov 19, 2019)
0.0.89 (Nov 19, 2019)
0.0.88 (Nov 19, 2019)
0.0.87 (Nov 19, 2019)
0.0.86 (Nov 19, 2019)
0.0.85 (Nov 19, 2019)
0.0.84 (Nov 19, 2019)
0.0.83 (Nov 19, 2019)
0.0.82 (Nov 19, 2019)
0.0.81 (Nov 19, 2019)
0.0.80 (Nov 19, 2019)
0.0.79 (Nov 19, 2019)
0.0.78 (Nov 19, 2019)
0.0.77 (Nov 19, 2019)
0.0.76 (Nov 19, 2019)
0.0.75 (Nov 19, 2019)
0.0.74 (Nov 19, 2019)
0.0.73 (Nov 19, 2019)
0.0.72 (Nov 19, 2019)
0.0.71 (Nov 19, 2019)
0.0.70 (Nov 19, 2019)
0.0.69 (Nov 19, 2019)
0.0.68 (Nov 19, 2019)
0.0.67 (Nov 19, 2019)
0.0.66 (Nov 19, 2019)
0.0.65 (Nov 19, 2019)
0.0.64 (Nov 19, 2019)
0.0.63 (Nov 19, 2019)
0.0.61 (Nov 19, 2019)
0.0.60 (Nov 19, 2019)
0.0.59 (Nov 19, 2019)
0.0.58 (Nov 19, 2019)
0.0.57 (Nov 14, 2019)
0.0.56 (Nov 14, 2019)
0.0.54 (Nov 14, 2019)
0.0.52 (Nov 13, 2019)
0.0.51 (Nov 13, 2019)
0.0.50 (Nov 13, 2019)
0.0.49 (Nov 13, 2019)
0.0.48 (Nov 13, 2019)
0.0.47 (Nov 13, 2019)
0.0.46 (Nov 13, 2019)
0.0.45 (Nov 13, 2019)
0.0.44 (Nov 13, 2019)
0.0.43 (Nov 13, 2019)
0.0.42 (Nov 13, 2019)
0.0.41 (Nov 13, 2019)
0.0.40 (Nov 13, 2019)
0.0.39 (Nov 13, 2019)
0.0.38 (Nov 13, 2019)
0.0.37 (Nov 13, 2019)
0.0.36 (Nov 13, 2019)
0.0.35 (Nov 13, 2019)
0.0.34 (Nov 13, 2019)
0.0.33 (Nov 13, 2019)
0.0.32 (Nov 13, 2019)
0.0.31 (Nov 13, 2019)
0.0.30 (Nov 13, 2019)
0.0.29 (Nov 13, 2019)
0.0.28 (Nov 13, 2019)
0.0.27 (Nov 13, 2019)
0.0.26 (Nov 13, 2019)
0.0.25 (Nov 13, 2019)
0.0.24 (Nov 13, 2019)
0.0.23 (Nov 13, 2019)
0.0.22 (Nov 13, 2019)
0.0.21 (Nov 13, 2019)
0.0.20 (Nov 13, 2019)
0.0.19 (Nov 13, 2019)
0.0.18 (Nov 7, 2019)
0.0.17 (Nov 7, 2019)
0.0.16 (Nov 7, 2019)
0.0.15 (Nov 7, 2019)
0.0.14 (Nov 7, 2019)
0.0.13 (Nov 7, 2019)
0.0.12 (Nov 7, 2019)
0.0.11 (Nov 7, 2019)
0.0.10 (Nov 7, 2019)
0.0.9 (Nov 7, 2019)
0.0.8 (Nov 7, 2019)
0.0.6 (Nov 7, 2019)
0.0.4 (Nov 7, 2019)
0.0.3 (Nov 7, 2019)
0.0.2 (Nov 7, 2019)
0.0.1 (Nov 7, 2019)

Download files

Source Distribution

pangeamt-tea-0.2.34.tar.gz (20.4 kB)

Uploaded Jan 20, 2022 Source

Built Distribution

pangeamt_tea-0.2.34-py3-none-any.whl (25.7 kB)

Uploaded Jan 20, 2022 Python 3

Hashes for pangeamt-tea-0.2.34.tar.gz

Algorithm	Hash digest
SHA256	`fd8ce352eeb45d9c87acc933351efa82f4a5fa8d4c3a95268eb7a010102165fc`
MD5	`8789652a77c79657d8ade953547f32c1`
BLAKE2b-256	`e83bf639d4995a07b8762e1a0c7b5060d632bc317c3f0036787a2214893cef1d`

Hashes for pangeamt_tea-0.2.34-py3-none-any.whl

Algorithm	Hash digest
SHA256	`4194c978f12caeaef6ae7d191c1b5cd07f2d818db59d72ec71584e0688a98282`
MD5	`6799243b898470e1b04bbdba89b30d35`
BLAKE2b-256	`3c11df51c78741d987515e7f3df0534f2c8c9c3b9a9834b5a87a7ffbed79e3e3`