doc2vec

Distributed Representations of Sentences, Documents and Topics

CRAN Package

Learn vector representations of sentences, paragraphs or documents by using the 'Paragraph Vector' algorithms, namely the distributed bag of words ('PV-DBOW') and the distributed memory ('PV-DM') models. The techniques in the package are detailed in the paper "Distributed Representations of Sentences and Documents" by Le and Mikolov (2014), available at doi:10.48550/arXiv.1405.4053.

The package also provides an implementation to cluster documents based on these embeddings using a technique called top2vec. Top2vec finds clusters in text documents by combining document and word embeddings with density-based clustering. It first embeds documents in the semantic space defined by the 'doc2vec' algorithm. Next, it maps these document embeddings to a lower-dimensional space using the 'Uniform Manifold Approximation and Projection' (UMAP) dimensionality reduction algorithm and finds dense areas in that space using a 'Hierarchical Density-Based Clustering' technique (HDBSCAN). These dense areas are the topic clusters. Each cluster can be represented by a topic vector, which is an aggregate of the embeddings of the documents that belong to that cluster. Because words are embedded in the same semantic space, the words closest to a topic vector can be found and used to describe the topic. More details can be found in the paper 'Top2Vec: Distributed Representations of Topics' by D. Angelov, available at doi:10.48550/arXiv.2008.09470.
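The last two steps of the description (building a topic vector from a cluster and finding representative words) can be illustrated with a small sketch. This is not the package's API or implementation: it is plain Python on made-up 2-dimensional embeddings, it assumes the "aggregate" is a simple mean, it assumes the label -1 marks noise documents that belong to no cluster, and the cluster labels are written by hand rather than produced by UMAP and HDBSCAN. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_vectors(doc_embeddings, cluster_labels):
    """Average the document embeddings of each cluster.

    Assumption: label -1 marks noise documents, which are skipped.
    """
    clusters = {}
    for embedding, label in zip(doc_embeddings, cluster_labels):
        if label == -1:
            continue
        clusters.setdefault(label, []).append(embedding)
    return {label: mean_vector(members) for label, members in clusters.items()}


def nearest_words(topic_vector, word_embeddings, top_n=2):
    """Rank words by cosine similarity to the topic vector, most similar first."""
    ranked = sorted(
        word_embeddings,
        key=lambda word: cosine_similarity(topic_vector, word_embeddings[word]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy data (invented for illustration): five documents, two clusters, one noise document.
doc_embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8], [0.5, 0.5]]
cluster_labels = [0, 0, 1, 1, -1]
word_embeddings = {
    "goal": [1.0, 0.0],
    "match": [0.8, 0.2],
    "vote": [0.0, 1.0],
    "party": [0.2, 0.9],
}

topics = topic_vectors(doc_embeddings, cluster_labels)
top_words = {label: nearest_words(vec, word_embeddings) for label, vec in topics.items()}
print(top_words)
```

In this toy setup the words nearest to each topic vector are the ones pointing in roughly the same direction as the cluster's documents, which is the sense in which words in the shared space are "representative of the topic".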


Insights

Last 30 days

This package has been downloaded 684 times in the last 30 days. Not bad! The download count is somewhere between 'small-town buzz' and 'moderate academic conference'. The following heatmap shows the distribution of downloads per day. Yesterday, it was downloaded 12 times.

Downloads per day, by week (each row starts on Sunday):

Week starting   Sun  Mon  Tue  Wed  Thu  Fri  Sat
Mar 16, 2025      3   37   43    9   15    8    8
Mar 23, 2025     10   77    5    4    9   69    6
Mar 30, 2025      2    4   66    4    4    5   70
Apr 6, 2025      17   10   11   67    4   11   88
Apr 13, 2025      6   12    0    0    0    0    0

Heatmap color scale: 2 to 88 downloads.

Last 365 days

This package has been downloaded 9,722 times in the last 365 days. A solid achievement! Enough downloads to get noticed at department meetings. The day with the most downloads was Apr 12, 2025 with 88 downloads.

Data provided by CRAN


Dependencies

  • Imports: 1 package
  • Suggests: 5 packages
  • Linking To: 1 package