tokenizers

Fast, Consistent Tokenization of Natural Language Text

CRAN Package

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers share a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
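To make two of the terms above concrete, here is a minimal, language-neutral sketch in Python of word tokenization, shingled n-grams, and splitting a text into equal-sized word chunks. It is not the package's R API or its implementation: the function names (`tokenize_words`, `shingled_ngrams`, `chunk_words`) and the simple `\w+` word rule are assumptions made only for illustration, and the real package relies on 'stringi' for Unicode-aware word boundaries.

```python
import re


def tokenize_words(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is any run of \\w characters. This is a
    simplification, not the rule the package itself applies.
    """
    return re.findall(r"\w+", text.lower())


def shingled_ngrams(words, n):
    """Return every run of n consecutive words (overlapping), space-joined."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def chunk_words(words, chunk_size):
    """Split a word list into consecutive chunks of chunk_size words.

    The final chunk is shorter when len(words) is not a multiple of
    chunk_size.
    """
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


words = tokenize_words("The quick brown fox jumps over the lazy dog")
bigrams = shingled_ngrams(words, 2)   # overlapping two-word shingles
chunks = chunk_words(words, 3)        # nine words -> three chunks of three
```

"Shingled" here means the n-grams overlap: each one starts one word after the previous one, so nine words yield eight bigrams.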


Insights

Last 30 days

This package has been downloaded 35,693 times in the last 30 days. The academic equivalent of having a dedicated subreddit. There are fans, and maybe even a few trolls! The daily counts follow. On Mar 13, 2025, the most recent day with data, it was downloaded 1,391 times.

Downloads per day, Feb 12 to Mar 13, 2025 (minimum 618, maximum 1,574). Each row is one week; a dash marks a day outside the 30-day window.

Week of    Sun     Mon     Tue     Wed     Thu     Fri     Sat
Feb 9        -       -       -   1,213   1,138   1,200     677
Feb 16     618   1,152   1,199   1,279   1,224   1,140     800
Feb 23     684   1,283   1,561   1,402   1,378   1,368     841
Mar 2      812   1,350   1,543   1,522   1,436   1,354     914
Mar 9      772   1,364   1,574   1,504   1,391       -       -

Last 365 days

This package has been downloaded 352,802 times in the last 365 days. This is the kind of download count that makes grant committees nod approvingly. A job well done; even the stoic reviewers might be impressed! The day with the most downloads was Apr 30, 2024, with 1,746 downloads.

Data provided by CRAN


Dependencies

  • Imports: 3 packages
  • Suggests: 5 packages
  • Linking To: 1 package
  • Reverse Imports: 11 packages
  • Reverse Suggests: 2 packages
  • Reverse Enhances: 1 package