Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core).

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the install documentation.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you must use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5. Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.
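
The version notes above can be turned into a small lookup, which may be handy in an install script. This is only a sketch: the function name `gensim_requirement` and the two tables are hypothetical helpers made up for illustration, and the version numbers are simply the ones quoted in this description.

```python
import sys

# Last gensim release stated above to work on each dropped Python version.
LAST_SUPPORTING_RELEASE = {
    (2, 5): "0.9.1",
    (2, 6): "0.13.4",
    (3, 3): "0.13.4",
    (3, 4): "0.13.4",
}

# Python versions this release is stated to have been tested under.
TESTED_WITH_THIS_RELEASE = {(2, 7), (3, 5), (3, 6)}


def gensim_requirement(python_version, this_release="3.3.0"):
    """Return a pip requirement string for a (major, minor) version, or None if the text is silent."""
    key = tuple(python_version[:2])
    if key in TESTED_WITH_THIS_RELEASE:
        return "gensim==" + this_release
    if key in LAST_SUPPORTING_RELEASE:
        return "gensim==" + LAST_SUPPORTING_RELEASE[key]
    return None  # version not mentioned above; no recommendation is made


# Look up the interpreter running this script.
requirement = gensim_requirement(sys.version_info)
```

The returned string can be passed to pip as is, for example `pip install gensim==0.13.4` on Python 3.4.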

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
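
The streaming idea can be illustrated without gensim itself. The sketch below is not gensim code: the class name `StreamedCorpus`, the in-memory text and the fixed vocabulary are assumptions made for the example. It yields one sparse bag-of-words vector per document, as a list of (token id, count) pairs, so only a single document is held in memory at a time.

```python
import io
from collections import Counter


class StreamedCorpus:
    """Iterate over documents one at a time, yielding sparse bag-of-words vectors."""

    def __init__(self, open_stream, token2id):
        self.open_stream = open_stream  # zero-argument callable returning a fresh iterator of lines
        self.token2id = token2id        # hypothetical fixed vocabulary: token -> integer id

    def __iter__(self):
        # A new stream is opened on every pass, so the corpus can be iterated repeatedly.
        for line in self.open_stream():
            counts = Counter(line.lower().split())
            # Tokens missing from the vocabulary are skipped.
            yield sorted((self.token2id[t], c) for t, c in counts.items() if t in self.token2id)


# Stand-in for a large file on disk: one document per line.
text = "human machine interface\ngraph of trees\ngraph minors trees graph\n"
vocab = {"human": 0, "machine": 1, "interface": 2, "graph": 3, "trees": 4, "minors": 5}

corpus = StreamedCorpus(lambda: io.StringIO(text), vocab)
vectors = list(corpus)  # materialised only to inspect the result; real use would just loop
```

Passing a callable such as `lambda: open("corpus.txt")` (a hypothetical file name) instead of the `io.StringIO` stand-in gives the same behaviour over a file that does not fit in RAM.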

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

Project details


This version

3.3.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-3.3.0.tar.gz (21.9 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.
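
As a rough illustration of that format, a wheel file name consists of dash-separated fields: distribution, version, an optional build tag, Python tag, ABI tag and platform tag, where each tag may bundle several dot-separated values. The helper below is a minimal sketch under that assumption; `parse_wheel_filename` and `WheelName` are hypothetical names, not part of pip or gensim.

```python
from collections import namedtuple

WheelName = namedtuple(
    "WheelName", "distribution version build python_tags abi_tags platform_tags"
)


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    # Each tag field may hold several dot-separated values.
    return WheelName(
        distribution,
        version,
        build,
        python_tag.split("."),
        abi_tag.split("."),
        platform_tag.split("."),
    )


info = parse_wheel_filename("gensim-3.3.0-cp36-cp36m-manylinux1_x86_64.whl")
```

Here `info.python_tags` is `["cp36"]` and `info.platform_tags` is `["manylinux1_x86_64"]`; for the macOS wheels listed below, the platform field splits into five separate tags.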

gensim-3.3.0.win-amd64-py3.6.exe (22.7 MB view details)

Uploaded Source

gensim-3.3.0.win-amd64-py3.5.exe (22.7 MB view details)

Uploaded Source

gensim-3.3.0.win-amd64-py2.7.exe (22.4 MB view details)

Uploaded Source

gensim-3.3.0.win32-py3.6.exe (22.6 MB view details)

Uploaded Source

gensim-3.3.0.win32-py3.5.exe (22.6 MB view details)

Uploaded Source

gensim-3.3.0.win32-py2.7.exe (22.3 MB view details)

Uploaded Source

gensim-3.3.0-cp36-cp36m-win_amd64.whl (22.1 MB view details)

Uploaded CPython 3.6m; Windows x86-64

gensim-3.3.0-cp36-cp36m-win32.whl (22.1 MB view details)

Uploaded CPython 3.6m; Windows x86

gensim-3.3.0-cp36-cp36m-manylinux1_x86_64.whl (22.5 MB view details)

Uploaded CPython 3.6m

gensim-3.3.0-cp36-cp36m-manylinux1_i686.whl (22.4 MB view details)

Uploaded CPython 3.6m

gensim-3.3.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.3 MB view details)

Uploaded CPython 3.6m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.3.0-cp35-cp35m-win_amd64.whl (22.1 MB view details)

Uploaded CPython 3.5m; Windows x86-64

gensim-3.3.0-cp35-cp35m-win32.whl (22.1 MB view details)

Uploaded CPython 3.5m; Windows x86

gensim-3.3.0-cp35-cp35m-manylinux1_x86_64.whl (22.5 MB view details)

Uploaded CPython 3.5m

gensim-3.3.0-cp35-cp35m-manylinux1_i686.whl (22.4 MB view details)

Uploaded CPython 3.5m

gensim-3.3.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.3 MB view details)

Uploaded CPython 3.5m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.3.0-cp27-cp27mu-manylinux1_x86_64.whl (22.5 MB view details)

Uploaded CPython 2.7mu

gensim-3.3.0-cp27-cp27mu-manylinux1_i686.whl (22.4 MB view details)

Uploaded CPython 2.7mu

gensim-3.3.0-cp27-cp27m-win_amd64.whl (22.1 MB view details)

Uploaded CPython 2.7m; Windows x86-64

gensim-3.3.0-cp27-cp27m-win32.whl (22.1 MB view details)

Uploaded CPython 2.7m; Windows x86

gensim-3.3.0-cp27-cp27m-manylinux1_x86_64.whl (22.5 MB view details)

Uploaded CPython 2.7m

gensim-3.3.0-cp27-cp27m-manylinux1_i686.whl (22.4 MB view details)

Uploaded CPython 2.7m

gensim-3.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.3 MB view details)

Uploaded CPython 2.7m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

File details

Details for the file gensim-3.3.0.tar.gz.

File metadata

  • Download URL: gensim-3.3.0.tar.gz
  • Upload date:
  • Size: 21.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-3.3.0.tar.gz
Algorithm Hash digest
SHA256 6b2a813887583e63c8cedd26a91782e5f1e416a11f85394a92ae3ff908e0be03
MD5 699137080b5c0521f12cc6c71217c733
BLAKE2b-256 1f034758f26ca72f15d36b80b7b2d53486bb8a3e71aea262e2d6a5be689ded36

See more details on using hashes here.
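
A downloaded file can be checked against its published SHA256 digest using only the Python standard library. The sketch below is one way to do it; the function names `sha256_of` and `matches_published_hash` are hypothetical, and the file is read in chunks so large archives need not fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_published_hash("gensim-3.3.0.tar.gz", "6b2a813887583e63c8cedd26a91782e5f1e416a11f85394a92ae3ff908e0be03")`, assuming the archive sits in the current directory.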

File details

Details for the file gensim-3.3.0.win-amd64-py3.6.exe.

File hashes

Hashes for gensim-3.3.0.win-amd64-py3.6.exe
Algorithm Hash digest
SHA256 1153ba758c33b1b9db705f9ed89d77070110428bf6daa503b9e4cf3333a4a5b2
MD5 e47ad6ccaf236321ea8815b9c762fdbb
BLAKE2b-256 0c39d74e6af2a2b0e72afe3f9969c6f6379873bfc1c30e8fdcf22d58f461e758

File details

Details for the file gensim-3.3.0.win-amd64-py3.5.exe.

File hashes

Hashes for gensim-3.3.0.win-amd64-py3.5.exe
Algorithm Hash digest
SHA256 177b96425754ace0538101f515ce36117506b67771e4716422364d3aab5c57c0
MD5 28aa457d62f85cbb8d74d26bfb3d88dd
BLAKE2b-256 1350c37ababca0d46e9a43ab9e86c9d321c55d26ecbe7af51a178219f45b6d01

File details

Details for the file gensim-3.3.0.win-amd64-py2.7.exe.

File hashes

Hashes for gensim-3.3.0.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 e71d623a752cb5684b116a7a255896f6bd0d63540544a3a12858250045ae0226
MD5 926087b2d1891dd52ab5182b7432f4cd
BLAKE2b-256 2d380aed3afaefadace5f09d7c2c53569b023d285730118f7997f4e536db2be2

File details

Details for the file gensim-3.3.0.win32-py3.6.exe.

File hashes

Hashes for gensim-3.3.0.win32-py3.6.exe
Algorithm Hash digest
SHA256 8380d3976cbbd27654ce6ae67162743774a4d63f507c72ef11f3d1429cdd0149
MD5 947dd5764fbbfd1b2d8ed9ea9d0efabd
BLAKE2b-256 44fb960895cf6d1d070a720abbec18ad7538805df5f7d44a2dccd47b06d25e20

File details

Details for the file gensim-3.3.0.win32-py3.5.exe.

File hashes

Hashes for gensim-3.3.0.win32-py3.5.exe
Algorithm Hash digest
SHA256 1bc04ca46a08be6d6c0509b54e2943fd80bc78b08cf93c4ea4c33613710aea85
MD5 28e195f8ef595d48783a02af34c9a735
BLAKE2b-256 9a07a7c0dc076b121f6b16ef6664312d253394db33c8945f0e36353b404e7ff0

File details

Details for the file gensim-3.3.0.win32-py2.7.exe.

File hashes

Hashes for gensim-3.3.0.win32-py2.7.exe
Algorithm Hash digest
SHA256 8c703f709e1901a969dcd23a4e5698c09bb965344f683c9e89a09694c935ca3b
MD5 dda7bd7bd78c1c553962f31e485f1b87
BLAKE2b-256 aeb9281c4c8ddcfa6e2983e9d9923f17a25898edb63a2a9e9bb7cd8f1a81a637

File details

Details for the file gensim-3.3.0-cp36-cp36m-win_amd64.whl.

File hashes

Hashes for gensim-3.3.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 97efa71168d2eafb58a8c2c01b28dba64795363509d70f63b0e5166e8f91af19
MD5 a0e6c9db52ec6178f2a042fa005f6b38
BLAKE2b-256 e3952204144146b042b1a32b2f973420a2f1527542e2ae94e43381e6c0a4dbfb

File details

Details for the file gensim-3.3.0-cp36-cp36m-win32.whl.

File hashes

Hashes for gensim-3.3.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 d8c440d2de30b533b5f6c079787a60f4cc5396cf3b0cd566cdfad32481e399bd
MD5 3a4b48a942195e6754c3eec8f393970d
BLAKE2b-256 7679ffab36041dd33a6d39a9ccbb887b98a3dd064b8fd064ce40eab0c3e211fb

File details

Details for the file gensim-3.3.0-cp36-cp36m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8dfb1afddb3997d074289c88d29c73a7ebeb21d19adb97d2af59a3fed1ae63f7
MD5 2199c05c07920f8f3d43d7ee8292346f
BLAKE2b-256 0c50b3ea3db5ce57b215d88e58d64e32924596e78f35bdfa229735a1ee1e86cb

File details

Details for the file gensim-3.3.0-cp36-cp36m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.3.0-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 8e8843f092f449309d0a7de7cb47f1ccd4ed0a32ac0475e8cd4d5535bfbbc88a
MD5 1d5aff5820d0a8e9a814e7bac2ca1cc6
BLAKE2b-256 005554a518ce68cbc46d0c07be5517d466a0255bde8e0ff6751b96c618069577

File details

Details for the file gensim-3.3.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 1e8f1b8f12e18ab7efca3e6cc635df863184c231a4b16a6d704442c4cb5d0ee9
MD5 7567e36e191574605a638d6ea1f1990b
BLAKE2b-256 fa988b5639a53e27bd034dbc8b8940046313a05ad88fd4e45ae974b417f5bef9

File details

Details for the file gensim-3.3.0-cp35-cp35m-win_amd64.whl.

File hashes

Hashes for gensim-3.3.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 ad28869550008de2be14171e457a54b3eaa16ddd868f884cd2550811451d68d1
MD5 cbb0a4bb52168994ea3e0d1938299ee7
BLAKE2b-256 5423c5592511a2112edfd212898dd2c98c386215554f294c08c7f3e9df450734

File details

Details for the file gensim-3.3.0-cp35-cp35m-win32.whl.

File hashes

Hashes for gensim-3.3.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 e6130552d311717b8f6b97b43276aff76e58b6fc637447c9da67fe26d20054ea
MD5 7555985d75dfa648451a65966b21c05c
BLAKE2b-256 c6efcf59629daa97954c6fa560c213b985a872c6d5417449fdd6c7ce8e618cd5

File details

Details for the file gensim-3.3.0-cp35-cp35m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 fdc9cd5adcedbdca471dac8c740a49ed6531eb773df849513dc441ce60d517fe
MD5 5783a7ce7a8d9f66e579b4953c17e2d3
BLAKE2b-256 f66b693a0c3bc33c96a42964423e344caaa7f5314673c298f75e8ff28df4787c

File details

Details for the file gensim-3.3.0-cp35-cp35m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.3.0-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 afcebdc0cd787ca1b6c6bfbe7af3d82f831c8aef07a24efae1a77939c3719aed
MD5 d981562c26542d47f581718cf3e9b95d
BLAKE2b-256 110b23fe791afc6b13598ace11d99389afccc1a5a4f3c7a841162e146ae954ee

File details

Details for the file gensim-3.3.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 26ec993e6d615a4f365100b11315288947420a3c84abe06c2a387a64c734ebd3
MD5 394a44c9e323eeee41a4555e6eeecc1d
BLAKE2b-256 67b64205e215958b2d6df3ec19f37864f8750bc3b5f44819418f557dfc3743c1

File details

Details for the file gensim-3.3.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 603e652cdf92fb89a06974299369562eb0ca632ee1c256449aa4dd2ec8f28710
MD5 ad3ba1377072063ea15dd5ae2d97b4cb
BLAKE2b-256 63cc16caaaa485a8304a4db021c139f034edccbafba6df2d8c3af6426bbfe5e4

File details

Details for the file gensim-3.3.0-cp27-cp27mu-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 24b27be5f8454d829ceb264530fe898b1e790a3644e4bf92a330ba486c40ac4a
MD5 11fa15983f76839de367794c2b2dd978
BLAKE2b-256 1dd920a8ed07791c5a7234785bbd8254e5cc5477d9bd3efe15f2786203a7d705

File details

Details for the file gensim-3.3.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 ea7ddd5be2bc9b09197ea18a521b10f3b0ac3842d8af6d55425d9ef555936ba7
MD5 63e4ce6195dcf5deafcbb7edf460ded2
BLAKE2b-256 35c3227bb0c62b1df3389c27671beecb77b2b7332d852dd50db41de909083f81

File details

Details for the file gensim-3.3.0-cp27-cp27m-win32.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 c8c53f839b36498a6bd1bced0aba06ac7e752fa6c2ca90bf0c4cbdc17fc43be6
MD5 7e84a14f72fb382f54c7e93b50e1166a
BLAKE2b-256 657f5bd73eca49964ccc679aa5c82799f4e7a913485da0b0421e07e423aec8cf

File details

Details for the file gensim-3.3.0-cp27-cp27m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 9702f5a61a1308067233735447a9fe1ba2fecfe0174a4c4e5a5fe6bb551239b1
MD5 dc9c94ee4d785f40f4d697a1a8b2e010
BLAKE2b-256 7d8898e11cc379fa5b1ba86cfdaccfe8ac7c4bbbd3f6839cd151cc6b494865f7

File details

Details for the file gensim-3.3.0-cp27-cp27m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 7f30cbea43274ad446d2fd2c47f88690c047178473bb056a3e90b4222f01d272
MD5 7856587a240d21c7d654f966f34bd857
BLAKE2b-256 23acdbd59ac0c2d45815bee2554d681d8cebefab3a2fcacc7412feda58dafc76

File details

Details for the file gensim-3.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 9386f0338db5e37faecb67551cd8081d193a017496401f8d44ee2aa4d01defc3
MD5 fe7ead3899c4de521adee784eef083be
BLAKE2b-256 1a1bcceeacfa00125c6dd6a538302b35091fffc2f4a83bd32b513ee8f0563c81
