
scrapy output testing framework

Project description

scrapy-test

scrapy-test is a validation/test framework for scrapy results. It can test both scrapy crawl output (items) and crawl stats output.

See the example project for a hackernews crawler with a full test suite.

Philosophy and Architecture

architecture illustration

scrapy-test mirrors the scrapy.Item definition, but instead of defining fields it defines a test for every field.
Tests are callables that return a failure message when a condition is not met. Example item specification:

class MyItem(Item):
    name = Field()
    url = Field()

class TestMyItem(ItemSpec):
    item_cls = MyItem

    # define tests
    name_test = Match('some-regex-pattern')
    url_test = lambda v: 'bad url' if 'cat' in v else ''

    # define coverage
    url_cov = 100  # 100% - every item should have url field
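
The coverage idea can be sketched as follows (a hypothetical helper for illustration, not the library's actual API): a field's declared coverage is the minimum percentage of scraped items that must contain that field.

```python
def check_coverage(items, field, required_pct):
    """Return a failure message when fewer than required_pct percent
    of scraped items contain the given field (illustrative sketch)."""
    if not items:
        return f'no items scraped, cannot check {field} coverage'
    have = sum(1 for item in items if item.get(field))
    pct = 100 * have / len(items)
    if pct < required_pct:
        return f'{field} coverage {pct:.0f}% is below required {required_pct}%'
    return ''
```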

scrapy-test also supports stats output validation. When scrapy finishes crawling, it outputs various stats such as error counts. A StatsSpec can be defined to validate these stats:

class MyStats(StatsSpec):
    # one or multiple spiders can be covered
    spider_cls = MySpider1, MySpider2
    validate = {  # stat_name_pattern: tests
        'item_scraped_count': MoreThan(1),
        'downloader/response_status_count/50\d': LessThan(1),
    }
    # required stat keys
    required = ['stat_pattern.+']

Finally, scrapy-test determines success or failure by checking whether any failure messages were generated by either the stat or item specifications: exit code 1 if there were any, 0 otherwise.
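
Conceptually, the final verdict reduces to collecting all failure messages and mapping them to an exit code (a sketch of the idea, not the library's internals):

```python
def exit_code(messages):
    """Print every collected failure message and return the exit code:
    1 if any spec produced a failure message, 0 otherwise (sketch)."""
    for msg in messages:
        print(msg)
    return 1 if messages else 0
```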

Usage

Setup

  1. Create a test.py module in the spider project directory:

    scrapy-test-example/
    ├── example
    │   ├── __init__.py
    │   └── test.py
    └── scrapy.cfg
    
  2. Add test file config to scrapy.cfg:

    [settings]
    default = example.settings
    [test]
    root = example.test 
    
  3. Define ItemSpec for item field validation:

    from scrapytest.tests import Match, Equal, Type, MoreThan, Map, Len, Required
    
    class TestPost(ItemSpec):
        # defining item that is being covered
        item_cls = PostItem
    
        # defining field tests
        title_test = Match('.{5,}')
        points_test = Type(int), MoreThan(0)
        author_test = Type(str), Match('.{3}')
        comments_test = Type(list), Required()
    
        # also supports methods!
        def url_test(self, value: str):
            if not value.startswith('http'):
                return f'Invalid url: {value}'
            return ''
    

    The ItemSpec class should contain attributes ending in _test. These attributes must be callables (functions, methods, etc.) that return message(s) when a failure is encountered. See the url_test example above.
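
    For illustration, a field test and the tuple-of-tests convention seen above can be sketched like this (hypothetical helpers that mirror, but are not taken from, scrapytest.tests):

```python
import re

class Match:
    """Regex field test: returns a failure message when the value
    does not match the given pattern (illustrative sketch)."""
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def __call__(self, value):
        if not self.pattern.search(str(value)):
            return f'{value!r} does not match {self.pattern.pattern!r}'
        return ''

def run_field_tests(tests, value):
    """A field may define a single test or a tuple of tests;
    run them all and collect every non-empty failure message."""
    if not isinstance(tests, tuple):
        tests = (tests,)
    return [msg for msg in (test(value) for test in tests) if msg]
```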

  4. Define StatSpec for crawl stats validation:

    class TestStats(StatsSpec):
        # stat pattern: test functions
        validate = {  # this is default
            'log_count/ERROR$': LessThan(1),
            'item_scraped_count': MoreThan(1),
            'finish_reason': Match('finished'),
        }
        # these stats should be required
        required = ['some_cool_stat']  
    

    The StatsSpec class should contain a validate attribute with a pattern: tests dictionary.
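
    The pattern-to-tests mapping can be understood as matching each stat key against every pattern and running the associated tests on its value. A sketch under assumed semantics (not the library's actual code):

```python
import re

def validate_stats(stats, validate):
    """For every stat key that matches a pattern, run the associated
    test and collect its failure messages (illustrative sketch)."""
    messages = []
    for pattern, test in validate.items():
        for key, value in stats.items():
            if re.match(pattern, key):
                msg = test(value)
                if msg:
                    messages.append(f'{key}: {msg}')
    return messages
```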

  5. Define Spider classes:

    from scrapy import Request
    from project.spiders import HackernewsSpider
    
    class TestHackernewsSpider(HackernewsSpider):
        test_urls = [
            "https://news.ycombinator.com/item?id=19187417",
        ]
    
        def start_requests(self):
            for url in self.test_urls:
                yield Request(url, self.parse_submission)
    

    This spider should extend your production spider and simply crawl the given urls without doing link discovery. Alternatively, you can extend nothing and write a standalone spider for live testing.

Running

$ scrapy-test --help                                                                                                 
Usage: scrapy-test [OPTIONS] [SPIDER_NAME]

  run scrapy-test tests and output messages and appropriate exit code (1 for
  failed, 0 for passed)

Options:
  --cache  enable HTTPCACHE_ENABLED setting for this run
  --help   Show this message and exit.

To run the tests, use the cli command:

$ scrapy-test <spider_name>

The spider name can be omitted to run tests for all spiders.

Notifications

scrapy-test supports notification hooks on either test failure or success:

  --notify-on-error TEXT    send notification on failure, choice from:
                            ['slack']
  --notify-on-all TEXT      send notification on failure or success, choice
                            from: ['slack']
  --notify-on-success TEXT  send notification on success, choice from:
                            ['slack']

Currently scrapy-test offers these notifiers:

* Slack - to configure Slack notifications, set up the slack [incoming webhooks](https://slack.com/apps/A0F7XDUAZ-incoming-webhooks) app and supply these settings in `scrapy.cfg`:

    slack_url = https://hooks.slack.com/services/AAA/BBB/CCC
    # where the message goes to
    slack_channel = #cats
    # bot's name
    slack_username = bender
    # bot's avatar
    slack_icon_emoji = :bender:
    # maintainer will be mentioned on error
    slack_maintainer = @bernard
