
Module for automatic summarization of text documents and HTML pages.



Here are some other summarizers:

Installation

Make sure you have Python 2.6+/3.2+ and pip (Windows, Linux) installed. The preferred way to install is simply to run:

$ [sudo] pip install sumy

Or, for the latest development version:

$ [sudo] pip install git+https://github.com/miso-belica/sumy.git

Or, if you have to, install manually from the source archive:

$ wget https://github.com/miso-belica/sumy/archive/master.zip # download the sources
$ unzip master.zip # extract the downloaded file
$ cd sumy-master/
$ [sudo] python setup.py install # install the package

Usage

Sumy contains a command-line utility for quick summarization of documents.

$ sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
$ sumy luhn --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --language=czech --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy --help # for more info
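The commands above pass --length either as a plain number (10) or as a percentage (3%). As a rough sketch of how such a value could be turned into a sentence count, the helper below handles both forms. The name resolve_length and the choice to round percentages up are assumptions made for illustration; this is not sumy's own code.

```python
import math


def resolve_length(spec, total_sentences):
    """Turn a length spec such as "10" or "3%" into a sentence count.

    Hypothetical helper: sumy may parse and round this value differently.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = float(spec[:-1])
        # Assumption: round up, so a small percentage of a short
        # document still yields at least one sentence.
        count = int(math.ceil(total_sentences * percent / 100.0))
    else:
        count = int(spec)
    # Clamp the result to the range [0, total_sentences].
    return max(0, min(count, total_sentences))
```

For a document of 200 sentences, "3%" resolves to 6 sentences under these assumptions, while "10" stays at 10.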

Various evaluation methods can be run against a given summarization method with the commands below:

$ sumy_eval lex-rank reference_summary.txt --url=http://en.wikipedia.org/wiki/Automatic_summarization
$ sumy_eval lsa reference_summary.txt --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --language=czech --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval --help # for more info
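To give an idea of what comparing a generated summary against reference_summary.txt can involve, here is a minimal sketch of one common family of measures: precision, recall and F-score over the sentences that the two summaries share. The function name precision_recall_f1 is hypothetical, and this is not necessarily the set of metrics that sumy_eval reports; see sumy_eval --help for those.

```python
def precision_recall_f1(evaluated_sentences, reference_sentences):
    """Compare two summaries by exact sentence overlap (illustrative only)."""
    evaluated = set(evaluated_sentences)
    reference = set(reference_sentences)
    if not evaluated or not reference:
        raise ValueError("Both summaries must contain at least one sentence.")

    common = len(evaluated & reference)
    # Precision: share of the generated sentences that are in the reference.
    precision = common / float(len(evaluated))
    # Recall: share of the reference sentences that were generated.
    recall = common / float(len(reference))
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Exact sentence matching is a deliberately strict choice here; it only makes sense when both summaries are extracted from the same source document.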

Python API

You can also use sumy as a library in your project.

# -*- coding: utf8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "czech"
SENTENCES_COUNT = 10


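if __name__ == "__main__":
    url = "http://www.zsstritezuct.estranky.cz/clanky/predmety/cteni/jak-naucit-dite-spravne-cist.html"
    # Fetch the page and build a parser that tokenizes its text for LANGUAGE.
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    # The stemmer normalizes word forms; stop words are ignored when ranking.
    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    # Print the SENTENCES_COUNT sentences selected for the summary.
    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)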
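The example above leaves all ranking to LsaSummarizer. As a rough illustration of what extractive summarization means in general, the toy function below scores sentences by the average frequency of their non-stop words and returns the best ones in document order. It uses plain word-frequency scoring rather than LSA, the name summarize and the naive regex tokenization are assumptions, and it is in no way sumy's implementation.

```python
import re
from collections import Counter


def summarize(text, sentences_count, stop_words=frozenset()):
    """Pick the highest-scoring sentences of `text` (toy sketch)."""
    # Naive sentence splitting: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Lowercased words per sentence, with stop words removed.
    tokenized = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in stop_words]
        for s in sentences
    ]
    frequencies = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average document-wide frequency of the words in one sentence.
        if not words:
            return 0.0
        return sum(frequencies[w] for w in words) / float(len(words))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    # Restore the original order of the selected sentences.
    chosen = sorted(ranked[:sentences_count])
    return [sentences[i] for i in chosen]
```

Like the sumy call above, the function takes the number of sentences to keep and a set of stop words; everything else is simplified.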

Tests

Run the tests with:

$ nosetests-2.6 && nosetests-3.2 && nosetests-2.7 && nosetests-3.3

Changelog

0.2.1 (2014-01-23)

  • Fixed installation of my own readability fork: breadability is now listed in the dependencies instead (#8). Thanks to @pratikpoddar.

0.2.0 (2014-01-18)

  • Removed the dependency on SciPy (#7); the numpy.linalg.svd implementation is used instead. Thanks to Shantanu.

0.1.0 (2013-10-20)

  • First public release.
