Topic modeling with latent Dirichlet allocation

lda implements latent Dirichlet allocation (LDA) using collapsed Gibbs sampling. lda is fast and is tested on Linux, OS X, and Windows.

You can read more about lda in the documentation.

Installation

pip install lda

Getting started

lda.LDA implements latent Dirichlet allocation (LDA). The interface follows conventions found in scikit-learn.

The following demonstrates how to inspect a model of a subset of the Reuters news dataset. The input below, X, is a document-term matrix (sparse matrices are accepted).
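
For readers who have not built a document-term matrix before, the following is a minimal sketch of one, using only NumPy. The toy corpus, the `word_index` lookup, and the variable names are made up for illustration and are not part of lda; the only point is that row i holds the word counts of document i and column j corresponds to the j-th vocabulary entry.

```python
import numpy as np

# Hypothetical toy corpus; each document is already tokenized.
docs = [
    ["pope", "vatican", "rome"],
    ["charles", "diana", "royal", "diana"],
    ["pope", "rome", "rome"],
]

# Vocabulary: sorted unique tokens, plus a word -> column index lookup.
vocab = sorted({word for doc in docs for word in doc})
word_index = {word: j for j, word in enumerate(vocab)}

# X[i, j] counts how often vocab[j] occurs in document i.
X = np.zeros((len(docs), len(vocab)), dtype=np.int64)
for i, doc in enumerate(docs):
    for word in doc:
        X[i, word_index[word]] += 1

print(vocab)
print(X.shape)  # (number of documents, vocabulary size)
print(X.sum())  # total number of tokens in the corpus
```

In this layout, `X.shape` and `X.sum()` have the same meaning as in the Reuters example: documents by vocabulary size, and the total token count.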

>>> import numpy as np
>>> import lda
>>> import lda.datasets
>>> X = lda.datasets.load_reuters()
>>> vocab = lda.datasets.load_reuters_vocab()
>>> titles = lda.datasets.load_reuters_titles()
>>> X.shape
(395, 4258)
>>> X.sum()
84010
>>> model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)
>>> model.fit(X)  # model.fit_transform(X) is also available
>>> topic_word = model.topic_word_  # model.components_ also works
>>> n_top_words = 8
>>> for i, topic_dist in enumerate(topic_word):
...     topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
...     print('Topic {}: {}'.format(i, ' '.join(topic_words)))

Topic 0: british churchill sale million major letters west britain
Topic 1: church government political country state people party against
Topic 2: elvis king fans presley life concert young death
Topic 3: yeltsin russian russia president kremlin moscow michael operation
Topic 4: pope vatican paul john surgery hospital pontiff rome
Topic 5: family funeral police miami versace cunanan city service
Topic 6: simpson former years court president wife south church
Topic 7: order mother successor election nuns church nirmala head
Topic 8: charles prince diana royal king queen parker bowles
Topic 9: film french france against bardot paris poster animal
Topic 10: germany german war nazi letter christian book jews
Topic 11: east peace prize award timor quebec belo leader
Topic 12: n't life show told very love television father
Topic 13: years year time last church world people say
Topic 14: mother teresa heart calcutta charity nun hospital missionaries
Topic 15: city salonika capital buddhist cultural vietnam byzantine show
Topic 16: music tour opera singer israel people film israeli
Topic 17: church catholic bernardin cardinal bishop wright death cancer
Topic 18: harriman clinton u.s ambassador paris president churchill france
Topic 19: city museum art exhibition century million churches set
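
The indexing expression used above to pick the top words is compact, so here is a small sketch that unpacks it. The five-word vocabulary and the probabilities are invented for illustration; only `np.argsort` and ordinary slicing are used.

```python
import numpy as np

# Hypothetical vocabulary and one hypothetical topic-word distribution.
vocab = ("apple", "bank", "court", "dog", "election")
topic_dist = np.array([0.05, 0.40, 0.10, 0.15, 0.30])
n_top_words = 3

# argsort returns indices that sort the probabilities in ascending order.
order = np.argsort(topic_dist)

# Slicing with a negative step walks backwards from the last (largest)
# entry and stops after n_top_words items.
top_idx = order[:-(n_top_words + 1):-1]
top_words = np.array(vocab)[top_idx]

# The same selection, written as "sort descending, take the first n".
same_idx = np.argsort(-topic_dist)[:n_top_words]

print(top_words)
```

Both forms select the same indices here; the negated form may be easier to read, although the two can order exact ties differently.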

The document-topic distributions are available in model.doc_topic_.

>>> doc_topic = model.doc_topic_
>>> for i in range(10):
...     print("{} (top topic: {})".format(titles[i], doc_topic[i].argmax()))
0 UK: Prince Charles spearheads British royal revolution. LONDON 1996-08-20 (top topic: 8)
1 GERMANY: Historic Dresden church rising from WW2 ashes. DRESDEN, Germany 1996-08-21 (top topic: 13)
2 INDIA: Mother Teresa's condition said still unstable. CALCUTTA 1996-08-23 (top topic: 14)
3 UK: Palace warns British weekly over Charles pictures. LONDON 1996-08-25 (top topic: 8)
4 INDIA: Mother Teresa, slightly stronger, blesses nuns. CALCUTTA 1996-08-25 (top topic: 14)
5 INDIA: Mother Teresa's condition unchanged, thousands pray. CALCUTTA 1996-08-25 (top topic: 14)
6 INDIA: Mother Teresa shows signs of strength, blesses nuns. CALCUTTA 1996-08-26 (top topic: 14)
7 INDIA: Mother Teresa's condition improves, many pray. CALCUTTA, India 1996-08-25 (top topic: 14)
8 INDIA: Mother Teresa improves, nuns pray for "miracle". CALCUTTA 1996-08-26 (top topic: 14)
9 UK: Charles under fire over prospect of Queen Camilla. LONDON 1996-08-26 (top topic: 8)
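
As a sketch of what can be done with a document-topic matrix beyond `argmax`, the snippet below uses a small hand-written array in place of `model.doc_topic_`. The values are hypothetical; the assumption, consistent with the description above, is that each row is a probability distribution over topics.

```python
import numpy as np

# Hypothetical stand-in for model.doc_topic_: 3 documents, 3 topics.
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
])

# Each row is a distribution over topics, so it should sum to 1.
row_sums = doc_topic.sum(axis=1)

# Most probable topic for every document at once.
top_topic = doc_topic.argmax(axis=1)

# Documents ranked by how strongly they express topic 0.
ranked_for_topic_0 = np.argsort(doc_topic[:, 0])[::-1]

print(row_sums)
print(top_topic)
print(ranked_for_topic_0)
```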

Requirements

Python 2.7 or Python 3.3+ is required. The following packages are required

Caveat

lda aims for simplicity. (It happens to be fast, as essential parts are written in C via Cython.) If you are working with a very large corpus you may wish to use more sophisticated topic models such as those implemented in hca and MALLET. hca is written entirely in C and MALLET is written in Java. Unlike lda, hca can use more than one processor at a time. Both MALLET and hca implement topic models known to be more robust than standard latent Dirichlet allocation.

Notes

Latent Dirichlet allocation is described in Blei et al. (2003) and Pritchard et al. (2000). Inference using collapsed Gibbs sampling is described in Griffiths and Steyvers (2004).
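
To give a feel for what collapsed Gibbs sampling does, here is a toy sketch in plain NumPy. It is not lda's implementation (the essential parts of lda are written in C via Cython, and this loop is far slower); the function name `gibbs_lda`, the symmetric Dirichlet hyperparameters `alpha` and `eta`, and their values are assumptions chosen for illustration. Each token's topic is resampled in turn, conditional on the topic assignments of all other tokens.

```python
import numpy as np

def gibbs_lda(X, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA on a dense integer count matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape

    # Expand the count matrix into one (document, word) pair per token.
    doc_ids, word_ids = np.nonzero(X)
    counts = X[doc_ids, word_ids]
    doc_ids = np.repeat(doc_ids, counts)
    word_ids = np.repeat(word_ids, counts)

    # Random initial topic for every token, plus the count tables.
    z = rng.integers(n_topics, size=doc_ids.size)
    ndz = np.zeros((n_docs, n_topics))   # tokens per (document, topic)
    nzw = np.zeros((n_topics, n_words))  # tokens per (topic, word)
    nz = np.zeros(n_topics)              # tokens per topic
    np.add.at(ndz, (doc_ids, z), 1)
    np.add.at(nzw, (z, word_ids), 1)
    np.add.at(nz, z, 1)

    for _ in range(n_iter):
        for i in range(doc_ids.size):
            d, w, k = doc_ids[i], word_ids[i], z[i]
            # Remove this token's current assignment from the counts.
            ndz[d, k] -= 1
            nzw[k, w] -= 1
            nz[k] -= 1
            # Unnormalized conditional over topics for this token.
            p = (ndz[d] + alpha) * (nzw[:, w] + eta) / (nz + n_words * eta)
            k = rng.choice(n_topics, p=p / p.sum())
            # Record the newly sampled assignment.
            z[i] = k
            ndz[d, k] += 1
            nzw[k, w] += 1
            nz[k] += 1

    # Point estimates of both distributions from the final counts.
    topic_word = (nzw + eta) / (nz[:, None] + n_words * eta)
    doc_topic = (ndz + alpha) / (ndz.sum(axis=1, keepdims=True) + n_topics * alpha)
    return topic_word, doc_topic

# Hypothetical corpus: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
X = np.array([[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]])
topic_word, doc_topic = gibbs_lda(X, n_topics=2, n_iter=50)
print(topic_word.round(2))
print(doc_topic.round(2))
```

The returned arrays play the same roles as `topic_word_` and `doc_topic_` in the example above: each row of either one is a probability distribution.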

Other implementations

License

lda is licensed under Version 2.0 of the Mozilla Public License.
