
Python framework for fast Vector Space Modelling

Project description


Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive HTML documentation and tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
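To make the "transformation API" item above more concrete, here is a minimal sketch in plain Python. It assumes gensim's convention of representing a document as a sparse list of (token_id, weight) pairs and of applying a transformation with square brackets. The class UnitLengthTransform and the helper apply_lazily are hypothetical names invented for this illustration; they are not part of gensim.

```python
import math


class UnitLengthTransform(object):
    """Hypothetical transformation: scale a sparse vector to unit Euclidean length."""

    def __getitem__(self, bow):
        # bow is a list of (token_id, weight) pairs
        norm = math.sqrt(sum(weight * weight for _, weight in bow))
        if norm == 0.0:
            return []  # an empty or all-zero document stays empty
        return [(token_id, weight / norm) for token_id, weight in bow]


def apply_lazily(transform, corpus):
    """Transform one document at a time, so the corpus is never held in RAM at once."""
    for bow in corpus:
        yield transform[bow]


# A toy corpus; in practice this could be any iterable that streams documents.
corpus = [[(0, 3.0), (1, 4.0)], [(2, 2.0)], []]
transform = UnitLengthTransform()
normalized = list(apply_lazily(transform, corpus))
```

Because the transformation only needs one document at a time, it composes naturally with a streamed corpus and with other transformations applied in sequence.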

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the documentation.

This version has been tested under Python 2.6, 2.7, 3.3, 3.4 and 3.5 (support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5). Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).
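As an illustration of what "expressed in terms of large matrix operations" can mean, the sketch below computes cosine similarities between a query and every document with a single NumPy matrix-vector product. The toy matrix and the variable names are made up for this example, and this is not gensim's own similarity code; whether the product actually reaches an optimized BLAS depends on how your NumPy was built.

```python
import numpy as np

# Rows are documents, columns are terms (a toy term-frequency matrix).
docs = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [2.0, 1.0, 0.0],
])
query = np.array([1.0, 0.0, 2.0])

# Scale every document and the query to unit Euclidean length.
doc_norms = np.linalg.norm(docs, axis=1)
unit_docs = docs / doc_norms[:, np.newaxis]
unit_query = query / np.linalg.norm(query)

# One matrix-vector product yields the cosine similarity of all documents at once;
# NumPy hands this off to compiled code instead of looping in Python.
similarities = unit_docs.dot(unit_query)
best = int(np.argmax(similarities))  # index of the most similar document
```

The same computation written as nested Python loops would touch every matrix entry from the interpreter, which is where most of the slowdown usually comes from.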

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
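The streaming idea can be sketched with an ordinary Python iterable that produces one document at a time. The class StreamedCorpus, the open_stream callable and the tiny token2id vocabulary are hypothetical names chosen for this example, not gensim classes. The sketch only assumes that a corpus is an iterable of sparse (token_id, count) vectors.

```python
import io


class StreamedCorpus(object):
    """Hypothetical corpus: yields one bag-of-words vector per line of text."""

    def __init__(self, open_stream, token2id):
        # open_stream is a callable returning a fresh iterator over lines,
        # so the corpus can be iterated more than once.
        self.open_stream = open_stream
        self.token2id = token2id

    def __iter__(self):
        for line in self.open_stream():
            counts = {}
            for token in line.lower().split():
                if token in self.token2id:  # tokens outside the vocabulary are skipped
                    token_id = self.token2id[token]
                    counts[token_id] = counts.get(token_id, 0) + 1
            # Only the current line's vector exists in memory at this point.
            yield sorted(counts.items())


# An in-memory stream stands in for a large file on disk.
text = u"human computer interface\ncomputer system computer\n"
token2id = {"human": 0, "computer": 1, "system": 2}
corpus = StreamedCorpus(lambda: io.StringIO(text), token2id)
vectors = list(corpus)
```

Replacing the lambda with one that opens a file on disk would keep memory use roughly constant regardless of how many lines the file contains, since each vector is built and discarded in turn.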

Documentation

The manual for the gensim package is available in HTML. It contains a walk-through of all its features and a complete reference section, and is also included in the source distribution package.

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      note={\url{http://is.muni.cz/publication/884893/en}},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek



