Python Framework for Topic Modeling

Project description

Gensim is a Python framework for unsupervised learning from raw, unstructured digital texts. It learns the hidden (latent) structure of a corpus; once that structure is found, each document can be expressed succinctly in terms of it and, for example, queried for topical similarity to other documents.

Gensim includes the following features:
  • Memory independence – there is no need for the whole text corpus (or any intermediate term-document matrices) to reside fully in RAM at any one time.

  • Provides implementations for several popular topic inference algorithms, including Latent Semantic Analysis (LSA, LSI) and Latent Dirichlet Allocation (LDA), and makes adding new ones simple.

  • Contains I/O wrappers and converters around several popular data formats.

  • Allows similarity queries across documents in their latent, topical representation.
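A similarity query of the kind listed above comes down to comparing a query vector against every indexed document vector, commonly with cosine similarity. The sketch below is plain Python, not gensim's own API: `cosine`, `rank_by_similarity` and the toy two-topic vectors are hypothetical stand-ins for documents that have already been projected into a latent space such as LSI or LDA topics.

```python
import math
from typing import Dict, List, Tuple

# Sparse vector: (feature_id, weight) pairs, e.g. a document's topic weights.
SparseVec = List[Tuple[int, float]]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    da: Dict[int, float] = dict(a)
    db: Dict[int, float] = dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: SparseVec, index: List[SparseVec]) -> List[Tuple[int, float]]:
    """Score every indexed document against the query, most similar first."""
    scores = [(pos, cosine(query, doc)) for pos, doc in enumerate(index)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical documents already expressed in a two-topic latent space.
index = [
    [(0, 0.9), (1, 0.1)],
    [(0, 0.1), (1, 0.95)],
    [(0, 0.7), (1, 0.3)],
]
query = [(0, 1.0)]  # a query that loads only on topic 0
ranking = rank_by_similarity(query, index)
```

Sparse `(id, weight)` pairs keep the comparison cheap when most topic weights are zero; a real index would typically normalize the stored vectors once up front rather than on every query.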

The principal design objectives behind gensim are:
  1. Straightforward interfaces and low API learning curve for developers, facilitating modifications and rapid prototyping.

  2. Memory independence with respect to the size of the input corpus; all intermediate steps and algorithms operate in a streaming fashion, processing one document at a time.
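The streaming objective above can be illustrated with a corpus object that re-reads its source on every iteration and yields one bag-of-words vector per document, so memory use depends on the vocabulary rather than on the corpus size. This is a minimal sketch under stated assumptions: `LineCorpus`, `document_frequencies`, the one-document-per-line file layout and the fixed toy vocabulary are all hypothetical and not part of gensim's API.

```python
import os
import tempfile
from collections import Counter
from typing import Dict, Iterator, List, Tuple

BowVec = List[Tuple[int, int]]  # (token_id, count) pairs for one document


class LineCorpus:
    """Hypothetical streaming corpus: one whitespace-tokenized document per line.

    Each pass re-opens the file, so only the current document is held in memory,
    and the corpus can be iterated any number of times (e.g. once per training pass).
    """

    def __init__(self, path: str, vocab: Dict[str, int]) -> None:
        self.path = path
        self.vocab = vocab

    def __iter__(self) -> Iterator[BowVec]:
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                counts = Counter(t for t in line.lower().split() if t in self.vocab)
                yield sorted((self.vocab[t], n) for t, n in counts.items())


def document_frequencies(corpus: LineCorpus) -> Counter:
    """One streaming pass: count how many documents contain each token id."""
    df: Counter = Counter()
    for bow in corpus:
        df.update(token_id for token_id, _ in bow)
    return df


# Demo: write a tiny corpus to a temporary file, then stream over it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("human computer interface\n")
    tmp.write("graph trees graph\n")
    tmp.write("computer graph\n")
    path = tmp.name

vocab = {"human": 0, "computer": 1, "interface": 2, "graph": 3, "trees": 4}
corpus = LineCorpus(path, vocab)
docs = list(corpus)  # materialized here only to inspect the tiny demo
df = document_frequencies(corpus)  # second full pass over the same file
os.remove(path)
```

Because `LineCorpus` is an iterable rather than a one-shot generator, an algorithm that needs several passes over the data can simply loop over it again.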

This version

0.2

Download files

Source Distribution

gensim-0.2.tar.gz (119.1 kB)

Uploaded Source

Built Distribution

gensim-0.2-py2.5.egg (130.7 kB)

Uploaded Egg

File details

Details for the file gensim-0.2.tar.gz.

File metadata

  • Download URL: gensim-0.2.tar.gz
  • Upload date:
  • Size: 119.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-0.2.tar.gz
Algorithm Hash digest
SHA256 df51d7a3e254e0d6e7d3998d46bd0da6f13352106594985b9bd277b000838d1b
MD5 4a34f3623134d21222faa8a4c5035d3e
BLAKE2b-256 b4c74813ac45df446d4fa92575e8c863cdae68a79d11775caafcda3cdd665904
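The digests above let you check that a downloaded file is intact. A minimal sketch using Python's standard `hashlib` module: the file is hashed in fixed-size chunks, and `EXPECTED_SHA256` is copied from the SHA256 row above. The helper names `file_digest` and `verify`, and the demo file, are illustrative only.

```python
import hashlib
import os
import tempfile

# Copied from the SHA256 row for gensim-0.2.tar.gz.
EXPECTED_SHA256 = "df51d7a3e254e0d6e7d3998d46bd0da6f13352106594985b9bd277b000838d1b"


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hex digest of a file, read in fixed-size chunks to keep memory use flat."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return file_digest(path) == expected.strip().lower()


# Demo on a throwaway file (hypothetical content, not the real tarball).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"gensim")
    demo_path = tmp.name
demo_ok = verify(demo_path, hashlib.sha256(b"gensim").hexdigest())
demo_mismatch = verify(demo_path)  # False: this file is not gensim-0.2.tar.gz
os.remove(demo_path)
```

To check the real archive, call `verify("gensim-0.2.tar.gz")` after downloading it (the path is an assumption); the same pattern works for the egg with its own SHA256 digest.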

File details

Details for the file gensim-0.2-py2.5.egg.

File metadata

  • Download URL: gensim-0.2-py2.5.egg
  • Upload date:
  • Size: 130.7 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-0.2-py2.5.egg
Algorithm Hash digest
SHA256 86aae1eab2409436b1a7790d402a7c6e816adda98082b22c69a61ff06d2147da
MD5 6cd22bc391fb8e7620b6d5aa0b316a5a
BLAKE2b-256 496980e1aa54ff72384ae13a4501d819d8eeb13e7fd6d36aad595998b22979ce
