mirror of
https://git.FreeBSD.org/ports.git
synced 2024-11-28 01:06:17 +00:00
14 lines
594 B
Plaintext
14 lines
594 B
Plaintext
This is a perl version of simplified Chinese word segmentation.
|
|
|
|
The algorithm for this segmenter is to search the longest word at each point
|
|
from both left and right directions, and choose the one with higher frequency
|
|
product.
|
|
|
|
The original program is from the CPAN module Lingua::ZH::WordSegment
|
|
(https://metacpan.org/author/CHENYR) I did the follwing changes: 1) make the
|
|
interface object oriented; 2) make the internal string into utf8; 3) using
|
|
sogou's dictionary (http://www.sogou.com/labs/dl/w.html) as the default
|
|
dictionary.
|
|
|
|
WWW: https://metacpan.org/release/Lingua-ZH-WordSegmenter
|