Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) communities.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.
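The streaming and transformation interfaces listed above can be illustrated without gensim itself. The following is a minimal sketch, not gensim's implementation: `StreamedCorpus` and `LazyTransform` are hypothetical names invented for this example. The only convention borrowed from gensim is that a document is a sparse list of `(token_id, count)` pairs and a corpus is anything that can be iterated over to produce such documents.

```python
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: yields one bag-of-words vector per line of text."""

    def __init__(self, lines, token2id):
        self.lines = lines        # any re-iterable source, e.g. a list of strings
        self.token2id = token2id  # fixed vocabulary: token -> integer id

    def __iter__(self):
        for line in self.lines:
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            # Sparse document vector: sorted (token_id, count) pairs.
            yield sorted(counts.items())


class LazyTransform:
    """Hypothetical transformation: rescales counts to relative frequencies,
    one document at a time, without materializing the corpus."""

    def __init__(self, corpus):
        self.corpus = corpus

    def __iter__(self):
        for doc in self.corpus:
            total = sum(count for _, count in doc)
            yield [(tid, count / total) for tid, count in doc] if total else []


lines = ["human computer interface", "graph of trees", "graph minors trees graph"]
token2id = {"human": 0, "computer": 1, "interface": 2, "graph": 3, "trees": 4, "minors": 5}

corpus = StreamedCorpus(lines, token2id)
normalized = LazyTransform(corpus)
for doc in normalized:
    print(doc)
```

Because both classes implement `__iter__` rather than returning a one-shot generator, the corpus can be traversed several times, which multi-pass algorithms need. Only one document is held in memory at any moment.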

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.

Install the latest version of gensim:

pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

python setup.py install

For alternative modes of installation, see the documentation.

Gensim is being continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries through its dependency on NumPy. So while the top-level gensim code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).
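To make the matrix-operation point concrete, here is a small sketch (not taken from gensim's code base) that scores a query against every document with a single matrix-vector product. The toy matrix and all variable names are made up for illustration. For floating-point arrays, NumPy typically hands the `@` product to whichever BLAS it was built against.

```python
import numpy as np

# Rows are documents, columns are terms (a tiny dense term-count matrix).
docs = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
])
query = np.array([0.0, 1.0, 1.0, 0.0])

# Scale rows and the query to unit length so dot products become cosine similarities.
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# One matrix-vector product replaces a Python-level loop over documents.
similarities = docs_unit @ query_unit

# The same scores computed row by row, for comparison only.
looped = [float(np.dot(row, query_unit)) for row in docs_unit]

print(similarities)
print(int(np.argmax(similarities)))  # index of the best-matching document
```

On three documents the two approaches are indistinguishable in speed. The difference shows up when the matrix has many thousands of rows and the loop overhead stays in Python while the product runs in compiled code.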

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
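The effect of generator-based streaming can be seen in a few lines. This is an illustrative sketch only: `read_documents` and `document_lengths` are hypothetical helper names, not gensim functions, and the `reads` list exists purely to make visible how many documents were actually pulled from the source.

```python
from itertools import islice


def read_documents(lines, log):
    """Yield tokenized documents one by one, recording each read in `log`."""
    for number, line in enumerate(lines):
        log.append(number)  # side effect only to make the laziness visible
        yield line.split()


def document_lengths(documents):
    """Generator stage: consumes one document, emits one number."""
    for tokens in documents:
        yield len(tokens)


# A million synthetic "documents", produced on demand and never stored together.
source = ("word " * n for n in range(1, 1_000_001))
reads = []
pipeline = document_lengths(read_documents(source, reads))

first_three = list(islice(pipeline, 3))
print(first_three)  # [1, 2, 3]
print(len(reads))   # 3
```

Only the three requested documents are ever generated. Each stage holds a single document at a time, so peak memory depends on the largest document rather than on the size of the corpus.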

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

Download files

Source Distribution

  • gensim-4.1.0.tar.gz (23.2 MB): Source

Built Distributions

  • gensim-4.1.0-cp39-cp39-win_amd64.whl (24.0 MB): CPython 3.9, Windows x86-64

  • gensim-4.1.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64

  • gensim-4.1.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.9, manylinux: glibc 2.12+ x86-64

  • gensim-4.1.0-cp39-cp39-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.9, macOS 10.9+ x86-64

  • gensim-4.1.0-cp38-cp38-win_amd64.whl (24.0 MB): CPython 3.8, Windows x86-64

  • gensim-4.1.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64

  • gensim-4.1.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.8, manylinux: glibc 2.12+ x86-64

  • gensim-4.1.0-cp38-cp38-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.8, macOS 10.9+ x86-64

  • gensim-4.1.0-cp37-cp37m-win_amd64.whl (23.9 MB): CPython 3.7m, Windows x86-64

  • gensim-4.1.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.7m, manylinux: glibc 2.17+ ARM64

  • gensim-4.1.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.7m, manylinux: glibc 2.12+ x86-64

  • gensim-4.1.0-cp37-cp37m-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.7m, macOS 10.9+ x86-64

  • gensim-4.1.0-cp36-cp36m-win_amd64.whl (23.9 MB): CPython 3.6m, Windows x86-64

  • gensim-4.1.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.6m, manylinux: glibc 2.17+ ARM64

  • gensim-4.1.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.6m, manylinux: glibc 2.12+ x86-64

  • gensim-4.1.0-cp36-cp36m-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.6m, macOS 10.9+ x86-64
