Gensim is a Python library for topic modelling, document indexing and similarity
retrieval with large corpora. Its target audience is the natural language
processing (NLP) and information retrieval (IR) community.

Features:
* All algorithms are memory-independent w.r.t. the corpus size: they can process
  input larger than RAM (streamed, out-of-core).
* Intuitive interfaces:
  * easy to plug in your own input corpus/datastream (trivial streaming API)
  * easy to extend with other vector space algorithms (trivial transformation
    API)
* Efficient multicore implementations of popular algorithms, such as online
  Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA),
  Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep
  learning.
* Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet
  Allocation on a cluster of computers.
* Extensive documentation and Jupyter Notebook tutorials.

WWW: https://radimrehurek.com/gensim/