morphemepiece

Morpheme Tokenization

About

Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
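
The lookup-then-fallback flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: the lookup table, the vocabulary, the `##` continuation marker, the `[UNK]` token, and the function names are all assumptions made for the example, and the fallback shown is plain greedy longest-match wordpiece rather than the package's modified variant.

```python
# Toy data, invented for illustration only; not the package's real tables.
LOOKUP = {
    "cats": ["cat", "s"],
    "unhappiness": ["un", "happy", "ness"],
}
VOCAB = {"play", "walk", "##ing", "##ed"}


def wordpiece_fallback(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of `word` into vocabulary pieces.

    Non-initial pieces carry a "##" prefix (a convention assumed here).
    If any part of the word cannot be matched, the whole word becomes `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, lookup, vocab):
    """Use the lookup table when a word is present, else fall back."""
    tokens = []
    for word in text.lower().split():
        if word in lookup:
            tokens.extend(lookup[word])
        else:
            tokens.extend(wordpiece_fallback(word, vocab))
    return tokens


print(tokenize("Cats playing", LOOKUP, VOCAB))
```

Here "cats" is resolved by the lookup table, while "playing" is absent from it and is split by the fallback into vocabulary pieces. A word that matches neither table nor vocabulary collapses to the unknown token in this sketch.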

github.com/macmillancontentscience/morphemepiece

Key Metrics

Version 1.2.3
Published 2022-04-16
Needs compilation? no
License Apache License (≥ 2)
CRAN checks morphemepiece results

Downloads

Period          Downloads   Change
Yesterday       1           -86%
Last 7 days     42          -48%
Last 30 days    252         -26%
Last 90 days    818         +8%
Last 365 days   3,253       +21%

Maintainer

Jonathan Bratt

Authors

Jonathan Bratt (aut, cre)
Jon Harmon (aut)
Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning (cph)

Material

README
NEWS
Reference manual
Package source

Vignettes

Testing the fall-through algorithm
Generating a Vocabulary and Lookup

macOS

r-release: arm64, x86_64
r-oldrel: arm64, x86_64

Windows

r-devel: x86_64
r-release: x86_64
r-oldrel: x86_64

Old Sources

morphemepiece archive

Imports

dlr ≥ 1.0.0
fastmatch
magrittr
memoise ≥ 2.0.0
morphemepiece.data
piecemaker ≥ 1.0.0
purrr ≥ 0.3.4
readr
rlang
stringr ≥ 1.4.0

Suggests

dplyr
fs
ggplot2
here
knitr
remotes
rmarkdown
testthat ≥ 3.0.0
utils