corpustools
Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, centered on a tokenized text format. This format preserves the position of every token, and each token can be annotated (e.g., with part-of-speech tags or dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, export to a document-term matrix (DTM) for compatibility with many text analysis packages, and reconstruction of the original text from tokens to facilitate interpretation.
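As a rough illustration of the workflow described above, the following minimal sketch builds a small tokenized corpus, runs a Lucene-like token query, and exports a DTM. The two example texts and the document ids are made up for illustration, and the calls reflect the package's documented interface as best understood here; check the package reference for the exact arguments in your installed version.

```r
library(corpustools)

# Two toy documents (illustrative data, not from the package)
texts <- c("The quick brown fox jumps over the lazy dog.",
           "A lazy dog sleeps while the fox runs away.")

# Build a tokenized corpus (tCorpus); token positions are kept per document
tc <- create_tcorpus(texts, doc_id = c("doc1", "doc2"))

# Inspect the token table: one row per token, with doc_id and token_id
head(tc$tokens)

# Lucene-like query for specific tokens; the wildcard matches "fox", "foxes", ...
hits <- search_features(tc, query = "fox*")
hits$hits

# Export a document-term matrix based on the 'token' column
dtm <- get_dtm(tc, feature = "token")
dim(dtm)
```

The token table is the central data structure: queries and the DTM export both operate on its columns, which is why annotations added as extra columns can be used in the same way.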
- Version: 0.5.1
- R version: ≥ 3.5.0
- License: GPL-3
- Needs compilation? Yes
- Last release: 05/08/2023
Team
Kasper Welbers
Kasper Welbers and Wouter van Atteveldt
Dependencies
- Depends: 1 package
- Imports: 15 packages
- Suggests: 5 packages
- Linking To: 2 packages
- Reverse Imports: 1 package
- Reverse Suggests: 1 package