Mirror of https://git.FreeBSD.org/ports.git (synced 2024-12-13 03:03:15 +00:00)
14e6325838
Segmentation. PR: ports/113476 Submitted by: Gea-Suan Lin <gslin at gslin.org>
14 lines
591 B
Plaintext
This is a Perl version of simplified Chinese word segmentation.

The algorithm for this segmenter is to search for the longest word at
each point from both the left and right directions, and to choose the
candidate with the higher frequency product.
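The bidirectional longest-match idea described above can be sketched as
follows. This is only an illustration in Python, not the module's Perl
code: the toy dictionary FREQ, its frequency values, the UNKNOWN
fallback, and all function names are assumptions made for the example,
and the sketch compares the two candidates over the whole input string.

```python
import math

# Hypothetical toy dictionary: word -> relative frequency (made-up values).
FREQ = {
    "研究": 0.004,
    "研究生": 0.002,
    "生命": 0.003,
    "命": 0.001,
    "的": 0.05,
    "起源": 0.002,
}
MAX_LEN = max(len(w) for w in FREQ)
UNKNOWN = 1e-8  # assumed frequency for single characters missing from FREQ


def forward_match(text):
    """Greedy longest match, scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            cand = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or cand in FREQ:
                words.append(cand)
                i += size
                break
    return words


def backward_match(text):
    """Greedy longest match, scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            cand = text[j - size:j]
            if size == 1 or cand in FREQ:
                words.append(cand)
                j -= size
                break
    words.reverse()
    return words


def log_score(words):
    """Sum of log frequencies; ranks candidates like the raw product."""
    return sum(math.log(FREQ.get(w, UNKNOWN)) for w in words)


def segment(text):
    """Return whichever candidate has the higher frequency product."""
    fwd, bwd = forward_match(text), backward_match(text)
    return fwd if log_score(fwd) >= log_score(bwd) else bwd


print(segment("研究生命的起源"))
```

With this toy dictionary the forward pass greedily takes the three
character word first, while the backward pass yields a split whose
frequency product is higher, so segment() returns the backward result.
Summing logarithms instead of multiplying is a design choice to avoid
floating-point underflow on long inputs.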
The original program is from the CPAN module Lingua::ZH::WordSegment
(http://search.cpan.org/~chenyr/). I made the following changes:
1) made the interface object-oriented; 2) converted the internal
strings to UTF-8; 3) used Sogou's dictionary
(http://www.sogou.com/labs/dl/w.html) as the default dictionary.

WWW: http://search.cpan.org/dist/Lingua-ZH-WordSegmenter/