A supermarket receipt parser written in Python using tesseract OCR
A fuzzy receipt parser written in Python
This is a fuzzy receipt parser written in Python. It extracts information such as the shop, the date, and the total from scanned receipts. It can run as a standalone script or as part of our iOS and Android application.
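To illustrate what "fuzzy" extraction can mean here, the following is a minimal sketch and not the library's actual implementation. It assumes OCR output arrives as plain text with small misreadings and uses approximate string matching from the standard library to find the shop and the total line. The keyword lists, the regular expressions, and the names fuzzy_find and parse_receipt are all hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical keyword lists for illustration; not taken from the library.
SHOP_NAMES = ["rewe", "aldi", "lidl", "penny"]
TOTAL_KEYWORDS = ["total", "summe", "gesamt"]

# Assumed formats: dates like 12.03.2019, amounts like 12,49 or 12.49.
DATE_PATTERN = re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})\b")
AMOUNT_PATTERN = re.compile(r"(\d+[.,]\d{2})")


def fuzzy_find(words, candidates, cutoff=0.7):
    """Return the first candidate that approximately matches any word."""
    for word in words:
        matches = get_close_matches(word.lower(), candidates, n=1, cutoff=cutoff)
        if matches:
            return matches[0]
    return None


def parse_receipt(text):
    """Extract shop, date, and total from OCR text (illustrative only)."""
    result = {"shop": None, "date": None, "total": None}
    for line in text.splitlines():
        words = line.split()
        if result["shop"] is None:
            result["shop"] = fuzzy_find(words, SHOP_NAMES)
        if result["date"] is None:
            date_match = DATE_PATTERN.search(line)
            if date_match:
                result["date"] = date_match.group(1)
        # Only look for an amount on lines that contain a total-like keyword.
        if result["total"] is None and fuzzy_find(words, TOTAL_KEYWORDS):
            amount_match = AMOUNT_PATTERN.search(line)
            if amount_match:
                result["total"] = amount_match.group(1).replace(",", ".")
    return result
```

With this sketch, a noisy input such as "REWF Markt", "12.03.2019 14:22", "Milch 0,99", "T0TAL 12,49" (one entry per line) still resolves to the shop rewe, the date 12.03.2019, and the total 12.49, because the misread words stay above the similarity cutoff.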
Dependencies
The receipt-parser-core library depends on imagemagick. Please install imagemagick with your favorite package manager.
Usage
To convert all images from the data/img/ folder to text using tesseract and parse the resulting text files, run
make run
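The OCR step of this pipeline can be sketched in Python as follows. This is an assumption about what such a step looks like, not a copy of what the Makefile does: it calls the tesseract command-line tool once per image and relies on tesseract appending .txt to the output base name. The function names, the list of image suffixes, and the injectable run parameter are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed set of image types; adjust to whatever your scanner produces.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def tesseract_command(image_path, text_dir):
    """Build the CLI call; tesseract appends '.txt' to the output base."""
    output_base = Path(text_dir) / Path(image_path).stem
    return ["tesseract", str(image_path), str(output_base)]


def ocr_folder(image_dir, text_dir, run=subprocess.run):
    """Run tesseract on every image in image_dir and return the text paths."""
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    text_files = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        command = tesseract_command(image_path, text_dir)
        run(command, check=True)  # raises if tesseract exits with an error
        text_files.append(Path(command[2] + ".txt"))
    return text_files
```

Calling ocr_folder("data/img", "data/txt") would then produce one text file per image, ready for parsing. The data/txt output folder is an assumed name, and the call requires the tesseract binary to be on the PATH.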
Docker
A Dockerfile is available with all dependencies needed to run the program.
To build the image, run
make docker-build
To run it on the sample files, try
make docker-run
By default, running the image will execute the make run command. To use it with your own images, run the following:
docker run -v <path_to_input_images>:/usr/src/app/data/img mre0/receipt_parser
History
This project started as a hackathon idea. Read more about it on the trivago techblog, and see the comments on Hacker News. There's also a talk about the project. The library is now available on PyPI.
Hashes for receipt_parser_core-0.2.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | ad267da72acd5a23df796cbfcf6b46fbe89d7586f84ccb1e4560c72bdb84aeca
MD5 | 318faede1e2e40485eae41aa3a743d48
BLAKE2b-256 | 62519d7b83e3c0ffd0b5cc4d15913d4aeaeb9d650cb3bb371f14069f7f47125f
Hashes for receipt_parser_core-0.2.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d8eb80e24d3d524bfc2318c1e2626c91aa26b63eda8dcacf945834873ab09847
MD5 | 59881516f94eaa6a9c81db42570d7c49
BLAKE2b-256 | 97c7e43bc9dbe57f1d717b27c6e5840d62185e8e5afcf2be746d319230215e0e