public abstract class TextSimilarity extends Object implements Similarity, SimilarityRanker
Modifier and Type | Field and Description |
---|---|
protected boolean |
filterStopWord |
protected static org.slf4j.Logger |
LOGGER |
thresholdRate
Constructor and Description |
---|
TextSimilarity() |
Modifier and Type | Method and Description |
---|---|
protected abstract double |
scoreImpl(List<Word> words1, List<Word> words2)
Computes the similarity score.
|
void |
setSegmentationAlgorithm(SegmentationAlgorithm segmentationAlgorithm) |
double |
similarScore(List<Word> words1, List<Word> words2)
Returns the similarity score of word list 1 and word list 2.
|
double |
similarScore(String text1, String text2)
Returns the similarity score of text 1 and text 2.
|
protected void |
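taggingWeightWithWordFrequency(List<Word> words1, List<Word> words2)
If no weights have been specified, tags each word's weight with its word frequency by default.
The frequency is counted per list: a word that occurs n times in word list 1 has weight n in word list 1, and a word that occurs n times in word list 2 has weight n in word list 2.
The resulting weights are stored in the weight field of the Word class.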
|
protected Map<String,Float> |
toFastSearchMap(List<Word> words)
Builds a container for fast weight lookup.
|
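
The frequency-based weighting and the fast lookup map described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the library's implementation: the nested `Word` class is a hypothetical stand-in that models only a text and a weight, `tagWithFrequency` and `wordsOf` are illustrative names, each list is processed on its own rather than as a pair, and treating a `null` weight as "not specified" is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencyWeightSketch {
    // Hypothetical stand-in for the library's Word class; only text and weight are modeled.
    static class Word {
        final String text;
        Float weight; // assumption: null means "no weight specified"

        Word(String text) {
            this.text = text;
        }
    }

    // Tags every unweighted word with the number of times its text occurs in the same list.
    static void tagWithFrequency(List<Word> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (Word w : words) {
            counts.merge(w.text, 1, Integer::sum);
        }
        for (Word w : words) {
            if (w.weight == null) {
                w.weight = counts.get(w.text).floatValue();
            }
        }
    }

    // Builds a text -> weight map so that a word's weight can be looked up without scanning the list.
    static Map<String, Float> toFastSearchMap(List<Word> words) {
        Map<String, Float> map = new HashMap<>();
        for (Word w : words) {
            if (w.weight != null) {
                map.put(w.text, w.weight);
            }
        }
        return map;
    }

    // Convenience helper for the example: wraps plain strings in Word objects.
    static List<Word> wordsOf(String... texts) {
        List<Word> words = new ArrayList<>();
        for (String t : texts) {
            words.add(new Word(t));
        }
        return words;
    }

    public static void main(String[] args) {
        List<Word> words1 = wordsOf("I", "love", "love", "text");
        tagWithFrequency(words1);
        Map<String, Float> weights = toFastSearchMap(words1);
        System.out.println("love -> " + weights.get("love"));
        System.out.println("text -> " + weights.get("text"));
    }
}
```

In this sketch a repeated word appears once in the map, carrying its occurrence count as the weight, which is the shape a `scoreImpl` implementation would need in order to compare the weights of two word lists by key.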
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
main, rank, rank
isSimilar, isSimilar, isSimilar, isSimilar, similarScore, similarScore
protected static final org.slf4j.Logger LOGGER
protected boolean filterStopWord
public void setSegmentationAlgorithm(SegmentationAlgorithm segmentationAlgorithm)
public double similarScore(String text1, String text2)
similarScore
in interface Similarity
text1 - text 1
text2 - text 2
public double similarScore(List<Word> words1, List<Word> words2)
similarScore
in interface Similarity
words1 - word list 1
words2 - word list 2
protected abstract double scoreImpl(List<Word> words1, List<Word> words2)
words1 - word list 1
words2 - word list 2
protected void taggingWeightWithWordFrequency(List<Word> words1, List<Word> words2)
words1 - word list 1
words2 - word list 2
Copyright © 2014–2015 APDPlat. All rights reserved.