public abstract class AbstractSegmentation extends Object implements DictionaryBasedSegmentation
Modifier and Type | Field and Description |
---|---|
protected org.slf4j.Logger |
LOGGER |
Constructor and Description |
---|
AbstractSegmentation() |
Modifier and Type | Method and Description |
---|---|
protected void |
addWord(List<Word> result,
String text,
int start,
int len)
Adds a recognized word to the queue.
|
protected void |
addWord(Stack<Word> result,
String text,
int start,
int len)
Pushes a recognized word onto the stack.
|
Dictionary |
getDictionary()
Gets the dictionary operation interface.
|
int |
getInterceptLength()
The maximum length of the substring intercepted during segmentation.
|
protected Word |
getWord(String text,
int start,
int len)
Gets a word that has already been recognized.
|
boolean |
isParallelSeg() |
static void |
main(String[] args) |
Map<List<Word>,Float> |
ngram(List<Word>... sentences)
Scores the candidate segmentations using ngram.
|
boolean |
ngramEnabled()
Whether ngram is enabled.
|
List<Word> |
seg(String text)
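Default segmentation algorithm:
1. Split the text to be segmented at punctuation marks.
2. Segment each resulting piece.
3. Combine the segmentation results.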
|
abstract List<Word> |
segImpl(String text)
The concrete segmentation implementation, left to subclasses.
|
void |
setDictionary(Dictionary dictionary)
Specifies the dictionary operation interface for the dictionary-based Chinese word segmentation interface.
|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getSegmentationAlgorithm
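The three-step default flow that the summary gives for seg(String text) (split at punctuation, segment each piece, combine the results) can be pictured with the minimal sketch below. It is not the library's implementation: the class name SegPipelineSketch, the use of plain String in place of Word, the Function parameter standing in for segImpl, and the choice of `\p{P}` plus whitespace as the punctuation set are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-alone sketch; not the library's AbstractSegmentation.
class SegPipelineSketch {
    // Step 1: split the text into runs of non-punctuation characters.
    // Assumption: "punctuation" means Unicode category P plus whitespace.
    static List<String> splitByPunctuation(String text) {
        List<String> pieces = new ArrayList<>();
        for (String piece : text.split("[\\p{P}\\s]+")) {
            if (!piece.isEmpty()) { // a leading delimiter yields an empty piece
                pieces.add(piece);
            }
        }
        return pieces;
    }

    // Steps 2 and 3: segment every piece with segImpl and concatenate the results.
    static List<String> seg(String text, Function<String, List<String>> segImpl) {
        List<String> result = new ArrayList<>();
        for (String piece : splitByPunctuation(text)) {
            result.addAll(segImpl.apply(piece));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for a subclass's segImpl: one "word" per character.
        Function<String, List<String>> perChar = s -> {
            List<String> words = new ArrayList<>();
            for (int i = 0; i < s.length(); i++) {
                words.add(s.substring(i, i + 1));
            }
            return words;
        };
        System.out.println(seg("ab,cd", perChar));
    }
}
```

Keeping the punctuation split outside segImpl means a subclass only ever sees punctuation-free runs, which is one plausible reason for separating seg from segImpl.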
public boolean isParallelSeg()
public void setDictionary(Dictionary dictionary)
setDictionary
in interface DictionaryBasedSegmentation
dictionary - the dictionary operation interface
public Dictionary getDictionary()
getDictionary
in interface DictionaryBasedSegmentation
public abstract List<Word> segImpl(String text)
text - the text
public boolean ngramEnabled()
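When ngramEnabled() returns true, ngram(List<Word>... sentences) is documented as scoring several candidate segmentations and returning a Map<List<Word>,Float>. The page does not say how the score is computed, so the following is only a sketch of one common approach: summing bigram scores over adjacent word pairs. The class NgramScoreSketch, the BIGRAM table and its values, and the use of String in place of Word are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: words are plain Strings and the bigram table is made up.
class NgramScoreSketch {
    // Assumed bigram model: "first second" -> score. Not the library's data.
    static final Map<String, Float> BIGRAM = new HashMap<>();
    static {
        BIGRAM.put("new york", 2.0f);
        BIGRAM.put("york city", 1.0f);
    }

    // Score one candidate by summing the scores of its adjacent word pairs.
    static float score(List<String> sentence) {
        float total = 0f;
        for (int i = 0; i + 1 < sentence.size(); i++) {
            String key = sentence.get(i) + " " + sentence.get(i + 1);
            total += BIGRAM.getOrDefault(key, 0f);
        }
        return total;
    }

    // Mirror the documented shape: every candidate maps to its score.
    @SafeVarargs
    static Map<List<String>, Float> ngram(List<String>... sentences) {
        Map<List<String>, Float> scores = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            scores.put(sentence, score(sentence));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("new", "york", "city");
        List<String> b = Arrays.asList("newyork", "city");
        System.out.println(ngram(a, b));
    }
}
```

A caller would then typically keep the candidate with the highest score; that selection step is likewise an assumption, not something this page documents.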
public Map<List<Word>,Float> ngram(List<Word>... sentences)
sentences - multiple segmentation results
public int getInterceptLength()
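getInterceptLength() is described as the maximum length of the substring taken during segmentation. A minimal sketch of how a dictionary-based segImpl might use such a cap is forward maximum matching, shown below. This is an illustration rather than the library's code: ForwardMatchSketch, the DICT set standing in for Dictionary, and the constant INTERCEPT_LENGTH are hypothetical names and values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a dictionary-based segImpl; not the library's code.
class ForwardMatchSketch {
    // Stand-in for Dictionary: a plain set of known words (assumption).
    static final Set<String> DICT = new HashSet<>(Arrays.asList("ab", "abc", "d"));
    // Stand-in for getInterceptLength(): the longest window tried at each position.
    static final int INTERCEPT_LENGTH = 3;

    static List<String> segImpl(String text) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Never look past the end of the text or beyond the window cap.
            int len = Math.min(INTERCEPT_LENGTH, text.length() - start);
            // Shrink the window until it is a dictionary word or a single character.
            while (len > 1 && !DICT.contains(text.substring(start, start + len))) {
                len--;
            }
            result.add(text.substring(start, start + len));
            start += len;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(segImpl("abcdab")); // prints [abc, d, ab]
    }
}
```

The cap bounds the number of dictionary lookups per position, so a larger value can match longer words at the cost of more lookups.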
public List<Word> seg(String text)
seg
in interface Segmentation
text - the text
protected void addWord(List<Word> result, String text, int start, int len)
result - the queue
text - the text
start - the start index of the word
len - the length of the word
protected void addWord(Stack<Word> result, String text, int start, int len)
result - the stack
text - the text
start - the start index of the word
len - the length of the word
protected Word getWord(String text, int start, int len)
text - the text
start - the start index of the word
len - the length of the word
public static void main(String[] args)
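The two addWord overloads and getWord share the same (text, start, len) parameters. The sketch below shows one way to read them, assuming a word is the substring of length len that begins at index start, and assuming the Stack overload serves algorithms that scan from the end of the text. Neither assumption is confirmed by this page; AddWordSketch, drain, and the use of String in place of Word are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

// Hypothetical sketch: words are plain Strings instead of the library's Word type.
class AddWordSketch {
    // Assumption: a word is the substring of length len beginning at index start.
    static String getWord(String text, int start, int len) {
        return text.substring(start, start + len);
    }

    // Queue flavor: words found in a left-to-right scan are appended in order.
    static void addWord(List<String> result, String text, int start, int len) {
        result.add(getWord(text, start, len));
    }

    // Stack flavor: words found in a right-to-left scan are pushed as they are found.
    static void addWord(Stack<String> result, String text, int start, int len) {
        result.push(getWord(text, start, len));
    }

    // Popping the stack restores reading order after a right-to-left scan.
    static List<String> drain(Stack<String> stack) {
        List<String> ordered = new ArrayList<>();
        while (!stack.isEmpty()) {
            ordered.add(stack.pop());
        }
        return ordered;
    }

    public static void main(String[] args) {
        String text = "abcd";
        List<String> queue = new ArrayList<>();
        addWord(queue, text, 0, 2); // left to right: "ab" then "cd"
        addWord(queue, text, 2, 2);

        Stack<String> stack = new Stack<>();
        addWord(stack, text, 2, 2); // right to left: "cd" then "ab"
        addWord(stack, text, 0, 2);

        System.out.println(queue + " " + drain(stack)); // prints [ab, cd] [ab, cd]
    }
}
```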
Copyright © 2014–2015 APDPlat. All rights reserved.