Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent with respect to corpus size: they can process input larger than RAM (streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
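
The streaming API comes down to "any re-iterable collection of documents". The following is a minimal, standard-library-only sketch of a corpus that streams a text file one line (one document) at a time. It assumes gensim's bag-of-words convention, where each document is a list of `(token_id, count)` tuples. The class name `StreamedCorpus`, the one-document-per-line file layout and the lowercase whitespace tokenizer are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# gensim-style sparse document: a list of (token_id, count) pairs
BowVector = List[Tuple[int, int]]


class StreamedCorpus:
    """Hypothetical re-iterable corpus: one document per line of a text file.

    Only the current document is held in memory; the vocabulary mapping is
    the only state that grows with the corpus.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.token2id: Dict[str, int] = {}
        # First streamed pass: give every distinct token an integer id,
        # in order of first appearance.
        for tokens in self._tokenized():
            for token in tokens:
                self.token2id.setdefault(token, len(self.token2id))

    def _tokenized(self) -> Iterator[List[str]]:
        # Reopen the file on every pass so the corpus can be iterated repeatedly.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.lower().split()

    def __iter__(self) -> Iterator[BowVector]:
        # Each pass yields one sparse bag-of-words vector per document.
        for tokens in self._tokenized():
            counts = Counter(self.token2id[token] for token in tokens)
            yield sorted(counts.items())
```

Because the object is re-iterable and yields sparse vectors in that format, something like it can be passed wherever gensim expects a corpus, for example `gensim.models.LsiModel(corpus, id2word={i: t for t, i in corpus.token2id.items()}, num_topics=2)`. In real code, `gensim.corpora.Dictionary` and its `doc2bow` method handle the vocabulary step.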

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up Apple's vecLib BLAS automatically, so you don't need to do anything special.

Install the latest version of gensim:

pip install --upgrade gensim

Alternatively, if you have downloaded and unpacked the source tar.gz package, run this from the unpacked directory:

python setup.py install

For alternative modes of installation, see the documentation.

Gensim is continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0; install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries by means of its dependency on NumPy. So while gensim's top-level code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is configured for it).

Memory-wise, gensim makes heavy use of Python's built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim's design goals: it is a central feature of the library, not something bolted on as an afterthought.
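
The same generator-based style carries over to transformations. Below is a hypothetical, simplified TF-IDF model in plain Python (weight = count * log2(N / df), with no vector normalization, so it is not a reimplementation of gensim's `TfidfModel`). Training makes a single streamed pass to collect document frequencies, and `stream()` returns a generator, so transformed documents are computed only as the caller iterates over them. The class name `TfidfSketch` and its methods are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# gensim-style sparse document: a list of (token_id, weight) pairs
BowVector = List[Tuple[int, float]]


class TfidfSketch:
    """Hypothetical streamed TF-IDF: idf = log2(num_docs / doc_freq), unnormalized."""

    def __init__(self, corpus: Iterable[BowVector]) -> None:
        doc_freq: Counter = Counter()
        num_docs = 0
        # One streamed pass: only per-term counters are kept in memory.
        for doc in corpus:
            num_docs += 1
            doc_freq.update(token_id for token_id, _ in doc)
        self.idf: Dict[int, float] = {
            token_id: math.log2(num_docs / df) for token_id, df in doc_freq.items()
        }

    def transform(self, doc: BowVector) -> BowVector:
        # Unseen terms get weight 0; terms present in every document get idf 0.
        # Zero weights are dropped so the output stays sparse.
        weighted = [(tid, cnt * self.idf.get(tid, 0.0)) for tid, cnt in doc]
        return [(tid, w) for tid, w in weighted if w != 0.0]

    def stream(self, corpus: Iterable[BowVector]) -> Iterator[BowVector]:
        # Lazy: nothing is computed until the caller iterates the generator.
        return (self.transform(doc) for doc in corpus)
```

gensim expresses the equivalent lazy step with its `model[corpus]` indexing syntax, which likewise wraps the corpus instead of transforming it all up front.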

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

This version

4.2.0

Download files

Source Distribution

  • gensim-4.2.0.tar.gz (23.2 MB)

Built Distributions

  • gensim-4.2.0-cp310-cp310-win_amd64.whl (23.9 MB): CPython 3.10, Windows x86-64

  • gensim-4.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (24.0 MB): CPython 3.10, manylinux glibc 2.17+ x86-64

  • gensim-4.2.0-cp310-cp310-macosx_10_9_universal2.whl (24.4 MB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)

  • gensim-4.2.0-cp39-cp39-win_amd64.whl (23.9 MB): CPython 3.9, Windows x86-64

  • gensim-4.2.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.9, manylinux glibc 2.17+ ARM64

  • gensim-4.2.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.9, manylinux glibc 2.12+ x86-64

  • gensim-4.2.0-cp39-cp39-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.9, macOS 10.9+ x86-64

  • gensim-4.2.0-cp38-cp38-win_amd64.whl (24.0 MB): CPython 3.8, Windows x86-64

  • gensim-4.2.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.8, manylinux glibc 2.17+ ARM64

  • gensim-4.2.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB): CPython 3.8, manylinux glibc 2.12+ x86-64

  • gensim-4.2.0-cp38-cp38-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.8, macOS 10.9+ x86-64

  • gensim-4.2.0-cp37-cp37m-win_amd64.whl (24.0 MB): CPython 3.7m, Windows x86-64

  • gensim-4.2.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.7m, manylinux glibc 2.17+ ARM64

  • gensim-4.2.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB): CPython 3.7m, manylinux glibc 2.12+ x86-64

  • gensim-4.2.0-cp37-cp37m-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.7m, macOS 10.9+ x86-64
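
Each built distribution above follows the wheel naming convention of PEP 427, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, where a single tag field may hold several dot-separated values (the compressed tag sets of PEP 425, e.g. `manylinux_2_17_aarch64.manylinux2014_aarch64`). A minimal standard-library parser sketch follows; `WheelName` and `parse_wheel_name` are hypothetical names, and for production use the `packaging` library provides `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # optional build tag present
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    # Compressed tag sets: one wheel may declare several tags joined by dots.
    return WheelName(dist, version, build, tuple(py.split(".")),
                     tuple(abi.split(".")), tuple(plat.split(".")))
```

An installer picks the wheel whose python, ABI and platform tags all match the running interpreter, which is why the list above has one file per CPython version and operating system.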

File details

Details for the file gensim-4.2.0.tar.gz.

File metadata

  • Download URL: gensim-4.2.0.tar.gz
  • Upload date:
  • Size: 23.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.9

File hashes

Hashes for gensim-4.2.0.tar.gz
Algorithm Hash digest
SHA256 995ebd2970a31d47c100aaac10212f47e2bf12e2b06536d38883c951ff34eef1
MD5 74ded52df283caa131d303d4350e6e1e
BLAKE2b-256 24972197f018ee9f8ce2f071b2d9c6711c76159aead710f8d24a2bf006082a28

See more details on using hashes here.
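
Checking a downloaded file against its published SHA256 digest takes a few lines of standard-library Python. This sketch reads the file in chunks, so memory use stays constant regardless of file size. `sha256_of` and `verify_download` are hypothetical helper names; the expected digest is whichever one is listed for the file you actually downloaded.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example (assumes the source archive was saved to the current directory):
# verify_download("gensim-4.2.0.tar.gz",
#                 "995ebd2970a31d47c100aaac10212f47e2bf12e2b06536d38883c951ff34eef1")
```

pip can perform the same check itself in hash-checking mode: put a line such as `gensim==4.2.0 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`. In that mode every requirement, including dependencies such as NumPy and SciPy, must be pinned with `==` and carry a hash.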

File details

Details for the file gensim-4.2.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: gensim-4.2.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.9

File hashes

Hashes for gensim-4.2.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 f1b59d3559fab86df1e046882531ecc9a653689cf6effb1b089cd3adfc7ec57c
MD5 88459840e30854d7df7b443fcbff4ea4
BLAKE2b-256 aa1ae3a87ee67ffa989438dcf4f113a710335aa95ce96c586c68cac0fa6af0ee

File details

Details for the file gensim-4.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4e06f177e537af25e219f6e42f50f78687754fd20aa3a8335fe3103636f747df
MD5 56d7f32814eabab1e2070467b4568bf5
BLAKE2b-256 7d9ab4a26ed8fd1c022c06b2fd8b3df43e0fc92dc868bd24b06f3d3048788090

File details

Details for the file gensim-4.2.0-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for gensim-4.2.0-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 de68cdee678e63500c832cc17ed768c0fc22550f63782201a50f319a4a6fc385
MD5 7d09f342544e732152fa328593956d1c
BLAKE2b-256 d63d0267ca76abbaed8ae54360de3021661549f26e16baa556734962499a4e2d

File details

Details for the file gensim-4.2.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: gensim-4.2.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.9

File hashes

Hashes for gensim-4.2.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 8ff2c9e8e7470701b26b994d29fdf4d77bb3b290b6303dee61859a776bfeeeee
MD5 8f40596bfc045cc4e0e1c9c83311df61
BLAKE2b-256 d215e5059a7ec03a5586fb10ceca957d7444a861a8b8a3ba913b69eafae4a79e

File details

Details for the file gensim-4.2.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for gensim-4.2.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 edb70625065c5d170205cbf108bb01a9fce9487182fdeb2724ce0674ab98a244
MD5 3fc9d49e5885ba52ed4166f7d348eec8
BLAKE2b-256 db0d0dd37261d2791034a68b04ac335d53df978fc7e33f5259398fc93f45d482

File details

Details for the file gensim-4.2.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 ec8c5917ec36cfffbe8688af60f444f9e8fa01a3032fb75c50df212772db257f
MD5 024c0a51f3d2a1715a6d5c67708aed3f
BLAKE2b-256 df8f0ab7661723685fc7535b7e9f8a3811be0bdbcab748b9ebbe6f82f0232733

File details

Details for the file gensim-4.2.0-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 4ebc8ec9cccae2d063cef240825658f050c5ad6fd2c81eecc84765f7a87b58ba
MD5 9f8393bd480a4490ffd497fb251c3a46
BLAKE2b-256 4a7554a61a3c66abcea42c7fc5e607498049384c4ff98654c9b7178340be3a77

File details

Details for the file gensim-4.2.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: gensim-4.2.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.9

File hashes

Hashes for gensim-4.2.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 f8199bc532ea041a35c360c65ce418a4730c813fd300a433fd9a3d89b4bb6f1d
MD5 6e53cdc1a5dbabaee046751aab693d5a
BLAKE2b-256 02d7ca19d4dac1722d0f21ce10e0a8551fc2c2f093263d288639370565445ca6

File details

Details for the file gensim-4.2.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for gensim-4.2.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 636cd376e647200bbc37694541c078a64ce2ddfa70a6cbaafd55de800731000b
MD5 1e3f6c3db0a98dcadae2ba2ae6c36d56
BLAKE2b-256 cd60c010f23cbfa6bc828be7d2019a271468003d0a1178f368cbf27806c023a5

File details

Details for the file gensim-4.2.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 db1bc49dccc3815fa58bddc9e1f953354a131ab510d72fd8af58e9c749bb695e
MD5 e20271d4a5140ef2bbd15ea0f2588f0e
BLAKE2b-256 29f309b90ba9c0db8d0d0ae2e559614790284da43e853025e16002c3eef7adae

File details

Details for the file gensim-4.2.0-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bad35d595e5a37b2f082985b31dbd01024949a9b3d4f69f80e54d990e90721ad
MD5 50f6cc5bf15cdd1a8c1c4b366c5fa6d0
BLAKE2b-256 1b6430d536802cc0a1fa30624e0bc306af4add630892f0f5af779f90c83aebbe

File details

Details for the file gensim-4.2.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: gensim-4.2.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.9

File hashes

Hashes for gensim-4.2.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 662e755a179a4042d1a3c3c39d80479f50fc09b7e1c5df2e40276ec75c8311d2
MD5 da323cd47f3de8ed2109acac7db6c268
BLAKE2b-256 8a6fa690547cb7089d4019465bfbfbbb8bea5b3e52969cd2d6005049e6678ec4

File details

Details for the file gensim-4.2.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for gensim-4.2.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 84a06822c0ad1285f4e8630df5e618e8e2f67eb5edb571fa907433856911f42b
MD5 ec8bfd1e1fbe5394901fb34028150a2b
BLAKE2b-256 9f100a737b7f935a14ac49d12187530952805978e1be506212b7c66d15962e27

File details

Details for the file gensim-4.2.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 cccf80d032dac1d6be8de06a8c8ed763496acda8c37e5c9a9d19c712f16d8fd3
MD5 d0aac6f6eae9e7804c176858db07939e
BLAKE2b-256 1c22fe12d98526b54890ca0338f8bc8061e1050181b9db281b783f389367f531

File details

Details for the file gensim-4.2.0-cp37-cp37m-macosx_10_9_x86_64.whl.

File hashes

Hashes for gensim-4.2.0-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1ae06b2d70ed09ba1051b7f37ee142b695c183fd18f2dddb3362379d6dc33d73
MD5 8eb3864f033dcf2db4839b4a1a99383d
BLAKE2b-256 8c7c4901e20d0c9b5f6f119282bf286d35b1f00d2b6fee6c029b9f1faa5939ce
