mirror of https://git.FreeBSD.org/ports.git synced 2025-01-28 10:08:24 +00:00
freebsd-ports/textproc/sentencepiece/pkg-descr

SentencePiece is an unsupervised text tokenizer and detokenizer, intended
mainly for neural network-based text generation systems in which the
vocabulary size is fixed before the neural model is trained. SentencePiece
implements subword units (e.g., byte-pair encoding (BPE) and the unigram
language model) and extends them with direct training from raw sentences.
This makes it possible to build a purely end-to-end system that does not
depend on language-specific pre- or postprocessing.
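
To make the idea of learning subword units directly from raw sentences
concrete, here is a toy byte-pair-encoding sketch in pure Python. It is not
SentencePiece's implementation or API: the function names (learn_bpe,
merge_pair, most_frequent_pair) are hypothetical, the merge loop is the
textbook greedy one, and treating whitespace as an ordinary symbol (written
here as U+2581) is a simplifying assumption of this sketch.

```python
from collections import Counter

SPACE = "\u2581"  # visible stand-in for whitespace (assumption of this sketch)


def most_frequent_pair(seqs):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in seqs.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(seqs, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in seqs.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw, untokenized sentences."""
    # Each sentence becomes one symbol sequence; no language-specific
    # word splitting is applied beforehand.
    seqs = dict(Counter(tuple(SPACE + s.replace(" ", SPACE)) for s in sentences))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges.append(pair)
        seqs = merge_pair(seqs, pair)
    return merges


if __name__ == "__main__":
    print(learn_bpe(["ab ab ab", "ac"], 2))
```

The number of merges plays the role of the predetermined vocabulary budget:
each merge adds exactly one new symbol, so the final vocabulary size is
known before any model training starts.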