Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
The motivation behind this library comes from a genuinely simple idea: "I don't want to use pytesseract or any other non-Amazon-specific OCR service, nor do I want to install executables just to solve a captcha. I want to get a solution within 1-2 lines of code, without any heavy add-ons, using pure Python."
Installation
pip install amazoncaptcha
Quick Snippet
from amazoncaptcha import AmazonCaptcha
captcha = AmazonCaptcha('captcha.jpg')
solution = captcha.solve()
# Or: solution = AmazonCaptcha('captcha.jpg').solve()
Usage
If you are a data extraction or web scraping specialist crawling Amazon with Selenium, the from_webdriver classmethod will do all the "dirty" work of extracting the image from the webpage for you. In practice, it takes a screenshot from your webdriver, crops the captcha, and stores it in a bytes array, which is then used to create an AmazonCaptcha instance. This also means nothing has to be saved to local disk.
from amazoncaptcha import AmazonCaptcha
from selenium import webdriver
driver = webdriver.Chrome() # This is a simplified example
driver.get('https://www.amazon.com/errors/validateCaptcha')
captcha = AmazonCaptcha.from_webdriver(driver)
solution = captcha.solve()
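The screenshot, crop, and bytes-array steps described above can be sketched with Pillow alone. This is only an illustration of the idea, not the library's actual implementation: the helper names crop_screenshot_to_bytes and captcha_bytes_from_driver are hypothetical, and the sketch assumes the screenshot pixels line up one-to-one with the element coordinates reported by the webdriver (no device pixel ratio scaling).

```python
from io import BytesIO

from PIL import Image


def crop_screenshot_to_bytes(screenshot_png, left, top, width, height):
    """Crop a region out of a PNG screenshot and return it as an in-memory PNG."""
    page = Image.open(BytesIO(screenshot_png))
    region = page.crop((left, top, left + width, top + height))
    buffer = BytesIO()
    region.save(buffer, format="PNG")
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


def captcha_bytes_from_driver(driver, image_element):
    """Hypothetical helper: screenshot the page and crop out one element.

    `driver` is a Selenium webdriver and `image_element` is the captcha
    <img> element, located however suits your page.
    """
    location = image_element.location  # dict with 'x' and 'y'
    size = image_element.size          # dict with 'width' and 'height'
    return crop_screenshot_to_bytes(
        driver.get_screenshot_as_png(),
        location["x"], location["y"], size["width"], size["height"],
    )
```

Nothing touches the disk here: the cropped image lives only in the returned BytesIO object. The changelog mentions captcha images taken from BytesIO, so such a buffer can presumably be handed to AmazonCaptcha directly, but treat that as an assumption and check it against the version you have installed.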
For Whom?
- Data extraction and web scraping specialists can use this tool to bypass the Amazon captcha.
- Machine learning developers can use the captchas folder (currently containing 13,000 unique solved captchas) as needed.
Issues
- If you constantly receive 'Not solved' as the output, feel free to create an issue and describe the details.
- If you receive an output other than the solution itself or 'Not solved', please create an issue or contact me.
- If you've somehow run into an Exception that you don't understand, you know what to do :)
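Since solve() can return 'Not solved', calling code may want to request a fresh captcha and try again before giving up. Below is a minimal sketch of such a retry loop. The function solve_with_retries and its get_captcha parameter are hypothetical names introduced for illustration, not part of the library.

```python
NOT_SOLVED = "Not solved"  # output string mentioned in the list above


def solve_with_retries(get_captcha, attempts=3):
    """Try up to `attempts` captchas and return the first real solution.

    `get_captcha` is a hypothetical zero-argument callable returning any
    object with a solve() method, for example
    lambda: AmazonCaptcha.from_webdriver(driver).
    Returns None if every attempt came back as 'Not solved'.
    """
    for _ in range(attempts):
        solution = get_captcha().solve()
        if solution != NOT_SOLVED:
            return solution
    return None
```

Returning None rather than the 'Not solved' string lets the caller tell "gave up" apart from a solution with a plain `is None` check. When scraping, get_captcha would typically also reload the page so that each attempt sees a new image.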
Changes
- Version 0.0.10:
  - Reached 10,000 training samples.
  - Reached 90%+ accuracy.
- Version 0.0.11:
  - Fixed an error with captcha images that were taken from BytesIO.
- Version 0.0.12:
  - Code adjustments and improvements.
  - Program can now solve images where the last letter is corrupted.
- Version 0.0.13:
  - Added and tested the 'from_webdriver' classmethod.
- Version 0.1.0:
  - Crash test on 100,000 captchas; accuracy is 98.5%.
- Version 0.1.1 - 0.1.5:
  - Code adjustments and improvements.
  - Added tests.
- Version 0.2.0:
  - Second crash test on 120k+ captchas.
  - Accuracy increased to 99.1%.
  - Code coverage is 100%.
- Version 0.3.0:
  - Program can now solve images where letters intersect.
  - Third crash test on 140k+ captchas.
  - Accuracy increased to 99.998%.
Hashes for amazoncaptcha-0.3.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 52f8bed1047d9f3bcb5196bc76d0d4d35fdb19bade77a42be1e7475df44c2cef
MD5 | 0dcbbbb92f3190f77b2fe54f8a9b397f
BLAKE2b-256 | b1136153e9ab1539029606493696c6650c98d2381e292956234081f3a8383002