A B C D E F G H I J L M N O P Q R S T U V W 

A

AbstractSegmentation - Class in org.apdplat.word.segmentation.impl
Abstract base class for dictionary-based segmentation algorithms
AbstractSegmentation() - Constructor for class org.apdplat.word.segmentation.impl.AbstractSegmentation
 
add(String) - Method in interface org.apdplat.word.dictionary.Dictionary
Adds a single word to the dictionary
add(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
add(String) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
add(String) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
add(String) - Method in interface org.apdplat.word.util.ResourceLoader
Dynamically adds one line of data
addAll(List<String>) - Method in interface org.apdplat.word.dictionary.Dictionary
Adds a batch of words to the dictionary
addAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
addAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
addAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
addAndGet(float) - Method in class org.apdplat.word.util.AtomicFloat
 
addHit(Hit) - Method in class org.apdplat.word.analysis.Hits
 
addHits(List<Hit>) - Method in class org.apdplat.word.analysis.Hits
 
addWord(List<Word>, String, int, int) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Adds a recognized word to the queue
addWord(Stack<Word>, String, int, int) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Pushes a recognized word onto the stack
AhoCorasickDoubleArrayTrie<V> - Class in org.apdplat.word.dictionary.impl
An implementation of the Aho-Corasick algorithm based on a double-array trie
AhoCorasickDoubleArrayTrie() - Constructor for class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
AhoCorasickDoubleArrayTrie.Hit<V> - Class in org.apdplat.word.dictionary.impl
A result output
AhoCorasickDoubleArrayTrie.IHit<V> - Interface in org.apdplat.word.dictionary.impl
Processor that handles the output when a keyword is hit
AhoCorasickDoubleArrayTrie.IHitFull<V> - Interface in org.apdplat.word.dictionary.impl
Processor that handles the output when a keyword is hit, with more detail
AntonymTagging - Class in org.apdplat.word.tagging
Antonym tagging
AtomicFloat - Class in org.apdplat.word.util
A custom implementation, written because Java does not provide an AtomicFloat
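One common way to build such a class is to keep the float's raw bit pattern in an AtomicInteger and retry with compare-and-set. The sketch below illustrates that technique only; AtomicFloatSketch is a hypothetical name, and this is not claimed to be how org.apdplat.word.util.AtomicFloat is written.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the library's AtomicFloat source.
final class AtomicFloatSketch {
    // The float is stored as its raw IEEE 754 bit pattern.
    private final AtomicInteger bits;

    AtomicFloatSketch(float initial) {
        bits = new AtomicInteger(Float.floatToIntBits(initial));
    }

    float get() {
        return Float.intBitsToFloat(bits.get());
    }

    // Retries with compare-and-set until no other thread has interfered.
    float addAndGet(float delta) {
        while (true) {
            int current = bits.get();
            float next = Float.intBitsToFloat(current) + delta;
            if (bits.compareAndSet(current, Float.floatToIntBits(next))) {
                return next;
            }
        }
    }
}
```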
AtomicFloat() - Constructor for class org.apdplat.word.util.AtomicFloat
 
AtomicFloat(float) - Constructor for class org.apdplat.word.util.AtomicFloat
 
AutoDetector - Class in org.apdplat.word.util
Automatic detection of resource changes
AutoDetector() - Constructor for class org.apdplat.word.util.AutoDetector
 

B

base - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
base array of the Double Array Trie structure
begin - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.Hit
the beginning index, inclusive.
BidirectionalMaximumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based bidirectional maximum matching algorithm
BidirectionalMaximumMatching() - Constructor for class org.apdplat.word.segmentation.impl.BidirectionalMaximumMatching
 
BidirectionalMaximumMinimumMatching - Class in org.apdplat.word.segmentation.impl
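Dictionary-based bidirectional maximum minimum matching algorithm. Uses n-grams to select the best of four segmentation results: reverse maximum matching, forward maximum matching, reverse minimum matching, and forward minimum matching. If all scores are equal, reverse maximum matching is chosen. Experiments show that, for Chinese, reverse maximum matching is more effective than (forward) maximum matching.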
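For readers unfamiliar with the matching directions, here is a minimal sketch of forward and reverse maximum matching. It assumes a plain Set<String> as the dictionary and uses ASCII letters in place of Chinese characters. MaxMatchSketch and its methods are hypothetical names, and the n-gram scoring that chooses between results is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the library's implementation.
final class MaxMatchSketch {
    // Forward: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches.
    static List<String> forward(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int len = Math.min(maxLen, text.length() - start);
            while (len > 1 && !dict.contains(text.substring(start, start + len))) {
                len--;
            }
            out.add(text.substring(start, start + len));
            start += len;
        }
        return out;
    }

    // Reverse: the same idea, scanning from the end of the text backwards.
    static List<String> reverse(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int end = text.length();
        while (end > 0) {
            int len = Math.min(maxLen, end);
            while (len > 1 && !dict.contains(text.substring(end - len, end))) {
                len--;
            }
            out.add(text.substring(end - len, end));
            end -= len;
        }
        Collections.reverse(out); // words were collected right to left
        return out;
    }
}
```

With the dictionary {ab, abc, cd, de, cde}, the text "abcde" and maxLen 3, forward matching yields [abc, de] while reverse matching yields [ab, cde]. This shows how the two directions can disagree on the same input.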
BidirectionalMaximumMinimumMatching() - Constructor for class org.apdplat.word.segmentation.impl.BidirectionalMaximumMinimumMatching
 
BidirectionalMinimumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based bidirectional minimum matching algorithm
BidirectionalMinimumMatching() - Constructor for class org.apdplat.word.segmentation.impl.BidirectionalMinimumMatching
 
Bigram - Class in org.apdplat.word.corpus
Bigram language model
Bigram() - Constructor for class org.apdplat.word.corpus.Bigram
 
bigram(List<Word>...) - Static method in class org.apdplat.word.corpus.Bigram
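Context-aware bigram scoring algorithm. Computes scores for multiple segmentation results, then uses the bigram scores obtained to recompute the score of each result, compensating for cases where a fine-grained segmentation earned a score but a coarse-grained one did not.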
bigram(List<Word>) - Static method in class org.apdplat.word.corpus.Bigram
Computes the bigram score of a segmentation result
build(Map<String, V>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Build an AhoCorasickDoubleArrayTrie from a map

C

callback(Word) - Method in interface org.apdplat.word.util.Utils.FileSegmentationCallback
 
check - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
check array of the Double Array Trie structure
ChineseWordAnalysisBinderProcessor - Class in org.apdplat.word.elasticsearch
Registration of the Chinese word segmentation component
ChineseWordAnalysisBinderProcessor() - Constructor for class org.apdplat.word.elasticsearch.ChineseWordAnalysisBinderProcessor
 
ChineseWordAnalyzer - Class in org.apdplat.word.lucene
Lucene Chinese analyzer
ChineseWordAnalyzer() - Constructor for class org.apdplat.word.lucene.ChineseWordAnalyzer
 
ChineseWordAnalyzer(String) - Constructor for class org.apdplat.word.lucene.ChineseWordAnalyzer
 
ChineseWordAnalyzer(SegmentationAlgorithm) - Constructor for class org.apdplat.word.lucene.ChineseWordAnalyzer
 
ChineseWordAnalyzer(Segmentation) - Constructor for class org.apdplat.word.lucene.ChineseWordAnalyzer
 
ChineseWordAnalyzerProvider - Class in org.apdplat.word.elasticsearch
Chinese analyzer factory
ChineseWordAnalyzerProvider(Index, Settings, Environment, String, Settings) - Constructor for class org.apdplat.word.elasticsearch.ChineseWordAnalyzerProvider
 
ChineseWordIndicesAnalysis - Class in org.apdplat.word.elasticsearch
Indices analysis component for Chinese word segmentation
ChineseWordIndicesAnalysis(Settings, IndicesAnalysisService) - Constructor for class org.apdplat.word.elasticsearch.ChineseWordIndicesAnalysis
 
ChineseWordIndicesAnalysisModule - Class in org.apdplat.word.elasticsearch
Indices analysis module for Chinese word segmentation
ChineseWordIndicesAnalysisModule() - Constructor for class org.apdplat.word.elasticsearch.ChineseWordIndicesAnalysisModule
 
ChineseWordPlugin - Class in org.apdplat.word.elasticsearch
ElasticSearch plugin for the Chinese word segmentation component (word)
ChineseWordPlugin(Settings) - Constructor for class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
ChineseWordTokenizer - Class in org.apdplat.word.lucene
Lucene Chinese tokenizer
ChineseWordTokenizer() - Constructor for class org.apdplat.word.lucene.ChineseWordTokenizer
 
ChineseWordTokenizer(String) - Constructor for class org.apdplat.word.lucene.ChineseWordTokenizer
 
ChineseWordTokenizer(SegmentationAlgorithm) - Constructor for class org.apdplat.word.lucene.ChineseWordTokenizer
 
ChineseWordTokenizer(Segmentation) - Constructor for class org.apdplat.word.lucene.ChineseWordTokenizer
 
ChineseWordTokenizerFactory - Class in org.apdplat.word.elasticsearch
Chinese tokenizer factory
ChineseWordTokenizerFactory(Index, Settings, String, Settings) - Constructor for class org.apdplat.word.elasticsearch.ChineseWordTokenizerFactory
 
ChineseWordTokenizerFactory - Class in org.apdplat.word.solr
Lucene Chinese tokenizer factory
ChineseWordTokenizerFactory(Map<String, String>) - Constructor for class org.apdplat.word.solr.ChineseWordTokenizerFactory
 
clear() - Method in interface org.apdplat.word.dictionary.Dictionary
Removes all words from the dictionary
clear() - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
clear() - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
clear() - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
clear() - Method in class org.apdplat.word.util.DoubleArrayGenericTrie
 
clear() - Method in class org.apdplat.word.util.GenericTrie
 
clear() - Method in interface org.apdplat.word.util.ResourceLoader
Clears the data
close() - Method in class org.apdplat.word.util.DirectoryWatcher
Stops the monitoring thread
combine(List<Word>) - Static method in class org.apdplat.word.segmentation.WordRefiner
Merges multiple words into one; returns null if they cannot be merged
compareAndSet(float, float) - Method in class org.apdplat.word.util.AtomicFloat
 
compareTo(Object) - Method in class org.apdplat.word.analysis.Hit
 
compareTo(Object) - Method in class org.apdplat.word.corpus.EvaluationResult
 
compareTo(Object) - Method in class org.apdplat.word.segmentation.Word
 
compute(String, int) - Method in class org.apdplat.word.vector.Distance
 
configure() - Method in class org.apdplat.word.elasticsearch.ChineseWordIndicesAnalysisModule
 
contains(String, int, int) - Method in interface org.apdplat.word.dictionary.Dictionary
Determines whether the specified text is a word
contains(String) - Method in interface org.apdplat.word.dictionary.Dictionary
Determines whether the text is a word
contains(String, int, int) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
contains(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
contains(String) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
contains(String, int, int) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
contains(String, int, int) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
contains(String) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
CorpusMerge - Class in org.apdplat.word.corpus
Merges multiple corpus files into one
CorpusMerge() - Constructor for class org.apdplat.word.corpus.CorpusMerge
 
CorpusTools - Class in org.apdplat.word.corpus
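Corpus tools. Used to build bigram and trigram models and to perform further analysis; also adds new words found in the corpus to the dictionary.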
CorpusTools() - Constructor for class org.apdplat.word.corpus.CorpusTools
 
CosineTextSimilarity - Class in org.apdplat.word.analysis
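Text similarity computation. Method: cosine similarity, which evaluates the similarity of two vectors by computing the cosine of the angle between them. Principle: for vectors a=(x1,y1) and b=(x2,y2), similarity=a.b/(|a|*|b|), where a.b=x1x2+y1y2, |a|=sqrt((x1)^2+(y1)^2), |b|=sqrt((x2)^2+(y2)^2)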
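The two-dimensional formula extends to any number of dimensions. The following minimal sketch assumes each text has already been reduced to a map from word to weight. CosineSketch is a hypothetical name, and returning 0 for an all-zero vector is an assumption of this sketch, not documented library behavior.

```java
import java.util.Map;

// Hypothetical sketch, not the library's CosineTextSimilarity source.
final class CosineSketch {
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Dot product over the words both vectors share.
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) {
            return 0; // assumption: similarity with an empty vector is 0
        }
        return dot / (normA * normB);
    }

    // Euclidean length of the weight vector.
    private static double norm(Map<String, Double> v) {
        double sum = 0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }
}
```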
CosineTextSimilarity() - Constructor for class org.apdplat.word.analysis.CosineTextSimilarity
 
create() - Method in class org.apdplat.word.elasticsearch.ChineseWordTokenizerFactory
 
create(AttributeFactory) - Method in class org.apdplat.word.solr.ChineseWordTokenizerFactory
 
createComponents(String) - Method in class org.apdplat.word.lucene.ChineseWordAnalyzer
 

D

decrementAndGet() - Method in class org.apdplat.word.util.AtomicFloat
 
deleteDir(File) - Static method in class org.apdplat.word.util.Utils
Deletes a directory
description() - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
Dictionary - Interface in org.apdplat.word.dictionary
Dictionary operations interface
DictionaryBasedSegmentation - Interface in org.apdplat.word.segmentation
Dictionary-based Chinese word segmentation interface
DictionaryFactory - Class in org.apdplat.word.dictionary
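Dictionary factory. The dictionary implementation class (dic.class) and dictionary file (dic.path) are specified through system properties and configuration files. Option 1, programmatic (high priority): WordConfTools.set("dic.class", "org.apdplat.word.dictionary.impl.DictionaryTrie"); WordConfTools.set("dic.path", "classpath:dic.txt"); Option 2, JVM startup parameters (medium priority): java -Ddic.class=org.apdplat.word.dictionary.impl.DictionaryTrie -Ddic.path=classpath:dic.txt Option 3, configuration file (low priority): set the values in word.conf on the classpath: dic.class=org.apdplat.word.dictionary.impl.DictionaryTrie dic.path=classpath:dic.txt If nothing is specified, the defaults are the implementation class org.apdplat.word.dictionary.impl.DictionaryTrie and the dictionary file dic.txt on the classpath.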
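The three priority levels amount to a fall-through lookup. The sketch below uses only the standard library to illustrate that order: programmatic value, then JVM system property, then configuration file, then default. ConfLookupSketch is a hypothetical class and does not reproduce WordConfTools internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of a three-level configuration lookup.
final class ConfLookupSketch {
    private final Map<String, String> programmatic = new HashMap<>();
    private final Properties fileConf; // stands in for a parsed word.conf

    ConfLookupSketch(Properties fileConf) {
        this.fileConf = fileConf;
    }

    // High priority: values set in code.
    void set(String key, String value) {
        programmatic.put(key, value);
    }

    String get(String key, String defaultValue) {
        String value = programmatic.get(key);
        if (value == null) {
            value = System.getProperty(key); // medium priority: -Dkey=value
        }
        if (value == null) {
            value = fileConf.getProperty(key); // low priority: config file
        }
        return value != null ? value : defaultValue;
    }
}
```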
DictionaryTools - Class in org.apdplat.word.dictionary
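Dictionary tools. 1. Merges multiple dictionaries into one, then normalizes and cleans it. Word length: only words of length 2 to 4 inclusive are kept. Recognition: words that can already be recognized are removed. Non-Chinese words are removed to keep large numbers of meaningless or special words out of the dictionary. 2. Removes phrase structures from the dictionary.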
DictionaryTools() - Constructor for class org.apdplat.word.dictionary.DictionaryTools
 
DictionaryTrie - Class in org.apdplat.word.dictionary.impl
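Trie indexed by the first character of each word. A Java implementation of a prefix tree that builds an index over its first-level nodes (the first characters of words), which is faster than binary search. Used to check whether a given string is in the dictionary.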
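One way to index the first level is a direct-address array with one slot per char value, so the first step of every lookup is a single array access. The sketch below takes that approach and keeps deeper levels in hash maps for brevity. FirstCharIndexTrieSketch is a hypothetical name, and the layout is an assumption that may differ from what DictionaryTrie actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the library's DictionaryTrie source.
final class FirstCharIndexTrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true when a word ends at this node
    }

    // Direct-address index over first characters: one slot per char value.
    private final Node[] firstLevel = new Node[Character.MAX_VALUE + 1];

    void add(String word) {
        if (word.isEmpty()) {
            return;
        }
        char first = word.charAt(0);
        Node node = firstLevel[first];
        if (node == null) {
            node = new Node();
            firstLevel[first] = node;
        }
        for (int i = 1; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    boolean contains(String text) {
        if (text.isEmpty()) {
            return false;
        }
        Node node = firstLevel[text.charAt(0)];
        for (int i = 1; node != null && i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
        }
        return node != null && node.terminal;
    }
}
```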
DictionaryTrie() - Constructor for class org.apdplat.word.dictionary.impl.DictionaryTrie
 
DirectoryWatcher - Class in org.apdplat.word.util
File-system directory and file monitoring service
DirectoryWatcher.WatcherCallback - Interface in org.apdplat.word.util
 
Distance - Class in org.apdplat.word.vector
Computes the similarity between words
Distance(TextSimilarity, String) - Constructor for class org.apdplat.word.vector.Distance
 
DoubleArrayDictionaryTrie - Class in org.apdplat.word.dictionary.impl
A Java implementation of a double-array trie, used to check whether a given string is in the dictionary. An Implementation of Double-Array Trie: http://linux.thai.net/~thep/datrie/datrie.html
DoubleArrayDictionaryTrie() - Constructor for class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
DoubleArrayGenericTrie - Class in org.apdplat.word.util
A generic Java implementation of a double-array trie, used for fast retrieval of key-value pairs. An Implementation of Double-Array Trie: http://linux.thai.net/~thep/datrie/datrie.html
DoubleArrayGenericTrie(int) - Constructor for class org.apdplat.word.util.DoubleArrayGenericTrie
 
DoubleArrayGenericTrie() - Constructor for class org.apdplat.word.util.DoubleArrayGenericTrie
 
doubleValue() - Method in class org.apdplat.word.util.AtomicFloat
 
dump(Map<String, String>) - Static method in class org.apdplat.word.segmentation.SegmentationContrast
 
dump(String) - Method in class org.apdplat.word.WordFrequencyStatistics
Saves the word frequency statistics to a file
dump() - Method in class org.apdplat.word.WordFrequencyStatistics
Saves the word frequency statistics to a file

E

EditDistanceTextSimilarity - Class in org.apdplat.word.analysis
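Text similarity computation. Method: edit distance, the minimum number of edit operations needed to turn one string into another. The allowed operations are substituting one character for another, inserting a character, and deleting a character. For example, turning kitten into sitting: sitten (k→s) substitutes s for k; sittin (e→i) substitutes i for e; sitting (→g) inserts g. Because the algorithm was proposed by the Russian scientist Vladimir Levenshtein in 1965, edit distance is also known as Levenshtein distance.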
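The standard dynamic-programming formulation of this distance can be sketched as follows. EditDistanceSketch is a hypothetical name. The code computes only the raw distance, and how the library converts a distance into a similarity score is not shown here.

```java
// Hypothetical sketch of the classic two-row Levenshtein computation.
final class EditDistanceSketch {
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        // Turning the empty prefix of s into t[0..j) takes j insertions.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // deleting i characters of s yields the empty string
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

For the example in the description, distance("kitten", "sitting") is 3.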
EditDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.EditDistanceTextSimilarity
 
end - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.Hit
the ending index, exclusive.
equals(Object) - Method in class org.apdplat.word.segmentation.Word
 
EuclideanDistanceTextSimilarity - Class in org.apdplat.word.analysis
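Text similarity computation. Method: Euclidean distance, which evaluates similarity by computing the distance between two points. Principle: let A(x1, y1) and B(x2, y2) be any two points in the plane; the distance between them is dist(A,B)=sqrt((x1-x2)^2+(y1-y2)^2)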
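The planar formula generalizes to vectors of any equal length. A minimal sketch, assuming the texts have already been mapped to numeric vectors of the same dimension (EuclideanSketch is a hypothetical name):

```java
// Hypothetical sketch of n-dimensional Euclidean distance.
final class EuclideanSketch {
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // accumulate squared coordinate differences
        }
        return Math.sqrt(sum);
    }
}
```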
EuclideanDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.EuclideanDistanceTextSimilarity
 
Evaluation - Class in org.apdplat.word.corpus
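Evaluates segmentation algorithms against a manually annotated corpus. The test text used for evaluation has 2,533,709 lines and 28,374,490 characters in total. Evaluation results are written to the target/evaluation directory: corpus-text.txt is the manually annotated, segmented text, with words separated by spaces; test-text.txt is the test text, produced by splitting corpus-text.txt into multiple lines at punctuation marks; standard-text.txt is the manually annotated text corresponding to the test text, used as the standard for judging whether a segmentation is correct; result-text-***, where *** is the name of a segmentation algorithm, is the segmentation result produced by word; perfect-result-***, where *** is the name of a segmentation algorithm, is the text whose segmentation matches the manual annotation exactly; wrong-result-***, where *** is the name of a segmentation algorithm, is the text whose segmentation differs from the manual annotation.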
Evaluation() - Constructor for class org.apdplat.word.corpus.Evaluation
 
evaluation(String, String, String, String) - Static method in class org.apdplat.word.corpus.Evaluation
Evaluates segmentation quality
evaluation(String, String) - Static method in class org.apdplat.word.corpus.Evaluation
Evaluates segmentation quality
EvaluationResult - Class in org.apdplat.word.corpus
Evaluation result for Chinese word segmentation
EvaluationResult() - Constructor for class org.apdplat.word.corpus.EvaluationResult
 
exactMatchSearch(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
match exactly by a key
execute(WatchEvent.Kind<?>, String) - Method in interface org.apdplat.word.util.DirectoryWatcher.WatcherCallback
 
extractFromCorpus(String, String, boolean) - Static method in class org.apdplat.word.corpus.ExtractText
Extracts content from a corpus
ExtractText - Class in org.apdplat.word.corpus
Extracts text from a corpus
ExtractText() - Constructor for class org.apdplat.word.corpus.ExtractText
 

F

F - Class in org.apdplat.word.vector
Created by apple on 7/14/15.
F() - Constructor for class org.apdplat.word.vector.F
 
fail - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
fail table of the Aho Corasick automata
filterStopWord - Variable in class org.apdplat.word.analysis.TextSimilarity
 
filterStopWords(List<Word>) - Static method in class org.apdplat.word.recognition.StopWord
Stop word filtering; removes the stop words from the input list
floatValue() - Method in class org.apdplat.word.util.AtomicFloat
 
forceOverride(String) - Static method in class org.apdplat.word.util.WordConfTools
Forcibly overrides the default configuration
FullSegmentation - Class in org.apdplat.word.segmentation.impl
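Dictionary-based full segmentation algorithm. Uses n-grams to compute a score for every possible segmentation result. If several results have the same score, the one with the fewest words is chosen (the minimal word count principle).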
FullSegmentation() - Constructor for class org.apdplat.word.segmentation.impl.FullSegmentation
 

G

generateDataset(String, String, String) - Static method in class org.apdplat.word.corpus.Evaluation
Generates the test dataset and the standard dataset
GenericTrie<V> - Class in org.apdplat.word.util
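Generic trie indexed by the first character of each word, offering efficient storage and fast lookup. Builds an index over its first-level nodes (the first characters of words), which is faster than binary search.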
GenericTrie() - Constructor for class org.apdplat.word.util.GenericTrie
 
get(String, int) - Static method in class org.apdplat.word.analysis.HotWord
 
get(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Get value by a String key, just like a map.get() method
get(int) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Picks the value by index in the value array
Note that, for efficiency, this method does NOT check the parameter
get() - Method in class org.apdplat.word.elasticsearch.ChineseWordAnalyzerProvider
 
get() - Method in class org.apdplat.word.util.AtomicFloat
 
get(String, int, int) - Method in class org.apdplat.word.util.DoubleArrayGenericTrie
 
get(String) - Method in class org.apdplat.word.util.DoubleArrayGenericTrie
 
get(String) - Method in class org.apdplat.word.util.GenericTrie
 
get(String, int, int) - Method in class org.apdplat.word.util.GenericTrie
 
get(String, String) - Static method in class org.apdplat.word.util.WordConfTools
 
get(String) - Static method in class org.apdplat.word.util.WordConfTools
 
getAcronymPinYin() - Method in class org.apdplat.word.segmentation.Word
 
getAndAdd(float) - Method in class org.apdplat.word.util.AtomicFloat
 
getAndDecrement() - Method in class org.apdplat.word.util.AtomicFloat
 
getAndIncrement() - Method in class org.apdplat.word.util.AtomicFloat
 
getAndSet(float) - Method in class org.apdplat.word.util.AtomicFloat
 
getAntonym() - Method in class org.apdplat.word.segmentation.Word
 
getBoolean(String, boolean) - Static method in class org.apdplat.word.util.WordConfTools
 
getBoolean(String) - Static method in class org.apdplat.word.util.WordConfTools
 
getCharPerfectRate() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getCharWrongRate() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getDes() - Method in class org.apdplat.word.segmentation.PartOfSpeech
 
getDes() - Method in enum org.apdplat.word.segmentation.SegmentationAlgorithm
 
getDictionary() - Static method in class org.apdplat.word.dictionary.DictionaryFactory
 
getDictionary() - Method in interface org.apdplat.word.segmentation.DictionaryBasedSegmentation
Returns the dictionary operations interface
getDictionary() - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Returns the dictionary operations interface
getDirectoryWatcher(DirectoryWatcher.WatcherCallback, WatchEvent.Kind<?>...) - Static method in class org.apdplat.word.util.DirectoryWatcher
 
getFrequency(String, String) - Static method in class org.apdplat.word.corpus.Bigram
 
getFrequency(String, String, String) - Static method in class org.apdplat.word.corpus.Trigram
 
getFrequency() - Method in class org.apdplat.word.segmentation.Word
 
getFullPinYin() - Method in class org.apdplat.word.segmentation.Word
 
getHashBitCount() - Method in class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
 
getHits() - Method in class org.apdplat.word.analysis.Hits
 
getInt(String, int) - Static method in class org.apdplat.word.util.WordConfTools
 
getInt(String) - Static method in class org.apdplat.word.util.WordConfTools
 
getInterceptLength() - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Maximum length of the string intercepted during segmentation
getLinePerfectRate() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getLineWrongRate() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getMaxFrequency() - Static method in class org.apdplat.word.corpus.Bigram
 
getMaxFrequency() - Static method in class org.apdplat.word.corpus.Trigram
 
getMaxLength() - Method in interface org.apdplat.word.dictionary.Dictionary
Maximum length of the words in the dictionary, i.e. the number of characters
getMaxLength() - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
getMaxLength() - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
getMaxLength() - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
getPartOfSpeech() - Method in class org.apdplat.word.segmentation.Word
 
getPerfectCharCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getPerfectLineCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getPos() - Method in class org.apdplat.word.segmentation.PartOfSpeech
 
getResultPath() - Method in class org.apdplat.word.WordFrequencyStatistics
Returns the path where the word frequency statistics are saved
getScore() - Method in class org.apdplat.word.analysis.Hit
 
getScore(String, String) - Static method in class org.apdplat.word.corpus.Bigram
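Returns the score for two words occurring immediately next to each other, in order, in the corpus. The score is normalized: 0 if the pair never occurs, 1 for the most frequent pair.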
getScore(String, String, String) - Static method in class org.apdplat.word.corpus.Trigram
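Returns the score for three words occurring immediately next to one another, in order, in the corpus. The score is normalized: 0 if the triple never occurs, 1 for the most frequent triple.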
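The documentation fixes only the endpoints of the normalization (0 for unseen, 1 for the most frequent). Dividing each frequency by the maximum frequency is the simplest scheme that satisfies both, and the bigram sketch below assumes it. The library's actual formula may differ, and BigramScoreSketch with its key format is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: linear normalization of bigram frequencies.
final class BigramScoreSketch {
    private final Map<String, Integer> freq = new HashMap<>();
    private int maxFreq = 0;

    // Records one adjacent occurrence of (first, second).
    void observe(String first, String second) {
        int f = freq.merge(key(first, second), 1, Integer::sum);
        maxFreq = Math.max(maxFreq, f);
    }

    // 0 for an unseen pair, 1 for the most frequent pair.
    float score(String first, String second) {
        if (maxFreq == 0) {
            return 0f;
        }
        return freq.getOrDefault(key(first, second), 0) / (float) maxFreq;
    }

    // Assumption: words never contain the separator character.
    private static String key(String first, String second) {
        return first + ":" + second;
    }
}
```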
getSegmentation(SegmentationAlgorithm) - Static method in class org.apdplat.word.segmentation.SegmentationFactory
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMinimumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.BidirectionalMinimumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.FullSegmentation
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.MaximumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.MaxNgramScore
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.MinimalWordCount
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.MinimumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.PureEnglish
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.ReverseMaximumMatching
 
getSegmentationAlgorithm() - Method in class org.apdplat.word.segmentation.impl.ReverseMinimumMatching
 
getSegmentationAlgorithm() - Method in interface org.apdplat.word.segmentation.Segmentation
The algorithm used by the segmenter
getSegmentationAlgorithm() - Method in class org.apdplat.word.WordFrequencyStatistics
Returns the segmentation algorithm
getSegSpeed() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getSurname(String) - Static method in class org.apdplat.word.recognition.PersonName
Returns the surname if the text is a person name
getSurnames() - Static method in class org.apdplat.word.recognition.PersonName
Returns all surnames
getSynonym() - Method in class org.apdplat.word.segmentation.Word
 
getText() - Method in class org.apdplat.word.analysis.Hit
 
getText() - Method in class org.apdplat.word.segmentation.Word
 
getTimeDes(Long) - Static method in class org.apdplat.word.util.Utils
Converts a number of milliseconds into a natural-language time description
getTotalCharCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getTotalLineCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getWeight() - Method in class org.apdplat.word.segmentation.Word
 
getWord(String, int, int) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Returns a word that has already been recognized
getWrongCharCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 
getWrongLineCount() - Method in class org.apdplat.word.corpus.EvaluationResult
 

H

has(String) - Static method in class org.apdplat.word.recognition.Punctuation
Determines whether the text contains punctuation
hashCode() - Method in class org.apdplat.word.segmentation.Word
 
Hit - Class in org.apdplat.word.analysis
Similarity ranking result
Hit() - Constructor for class org.apdplat.word.analysis.Hit
 
Hit(int, int, V) - Constructor for class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.Hit
 
hit(int, int, V) - Method in interface org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.IHit
Called when a keyword is hit; code such as text.substring(begin, end) retrieves the keyword
hit(int, int, V, int) - Method in interface org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.IHitFull
Called when a keyword is hit; code such as text.substring(begin, end) retrieves the keyword
Hits - Class in org.apdplat.word.analysis
List of similarity ranking results
Hits() - Constructor for class org.apdplat.word.analysis.Hits
 
Hits(int) - Constructor for class org.apdplat.word.analysis.Hits
 
HotWord - Class in org.apdplat.word.analysis
Hot word analysis using n-grams
HotWord() - Constructor for class org.apdplat.word.analysis.HotWord
 

I

I - Static variable in class org.apdplat.word.segmentation.PartOfSpeech
 
incrementAndGet() - Method in class org.apdplat.word.util.AtomicFloat
 
incrementToken() - Method in class org.apdplat.word.lucene.ChineseWordTokenizer
 
indexModules(Settings) - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
intValue() - Method in class org.apdplat.word.util.AtomicFloat
 
is(String) - Static method in class org.apdplat.word.recognition.PersonName
Person name detection
is(char) - Static method in class org.apdplat.word.recognition.Punctuation
Determines whether a character is a punctuation mark
is(char) - Static method in class org.apdplat.word.recognition.Quantifier
 
is(String) - Static method in class org.apdplat.word.recognition.StopWord
Determines whether a word is a stop word
isChineseCharAndLengthAtLeastOne(String) - Static method in class org.apdplat.word.util.Utils
Contains at least one Chinese character, and starts and ends with a Chinese character
isChineseCharAndLengthAtLeastTwo(String) - Static method in class org.apdplat.word.util.Utils
Contains at least two Chinese characters, and starts and ends with a Chinese character
isChineseNumber(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
Chinese numeral recognition, covering both the formal (upper-case) and ordinary (lower-case) forms
isChineseNumber(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Chinese numeral recognition, covering both the formal (upper-case) and ordinary (lower-case) forms
isEnglish(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
English word recognition
isEnglish(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
English word recognition
isEnglish(char) - Static method in class org.apdplat.word.recognition.RecognitionTool
English character recognition, covering upper and lower case as well as full-width and half-width forms
isEnglishAndNumberMix(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Recognition of mixed English letters and digits; recognizes pure numbers, pure English words, and mixtures of the two
isFraction(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
Decimal and fraction recognition
isFraction(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Decimal and fraction recognition
isNumber(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
Number recognition
isNumber(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Number recognition
isNumber(char) - Static method in class org.apdplat.word.recognition.RecognitionTool
Arabic digit recognition, covering full-width and half-width forms
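Full-width digits occupy the Unicode range U+FF10 to U+FF19, alongside the ASCII digits '0' to '9'. A check covering both forms can be sketched as below. DigitSketch is a hypothetical name, and the library's own isNumber(char) may be written differently.

```java
// Hypothetical sketch of a half-width and full-width digit check.
final class DigitSketch {
    static boolean isDigit(char c) {
        boolean halfWidth = c >= '0' && c <= '9';
        boolean fullWidth = c >= '\uFF10' && c <= '\uFF19';
        return halfWidth || fullWidth;
    }
}
```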
isParallelSeg() - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
 
isPos(String) - Static method in class org.apdplat.word.segmentation.PartOfSpeech
 
isQuantifier(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
Quantifier recognition, e.g. dates, times, lengths, volumes, weights, areas, and so on
isQuantifier(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Quantifier recognition, e.g. dates, times, lengths, volumes, weights, areas, and so on
isRemoveStopWord() - Method in class org.apdplat.word.WordFrequencyStatistics
Whether stop words are removed
isSimilar(String, String) - Method in interface org.apdplat.word.analysis.Similarity
Whether object 1 and object 2 are similar
isSimilar(List<Word>, List<Word>) - Method in interface org.apdplat.word.analysis.Similarity
Whether word list 1 and word list 2 are similar
isSimilar(HashMap<Word, Float>, HashMap<Word, Float>) - Method in interface org.apdplat.word.analysis.Similarity
Whether word-to-weight map 1 and word-to-weight map 2 are similar
isSimilar(Map<String, Float>, Map<String, Float>) - Method in interface org.apdplat.word.analysis.Similarity
Whether word-to-weight map 1 and word-to-weight map 2 are similar
isSurname(String) - Static method in class org.apdplat.word.recognition.PersonName
Determines whether the text is one of the Hundred Family Surnames

J

JaccardTextSimilarity - Class in org.apdplat.word.analysis
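Text similarity computation. Method: Jaccard similarity coefficient, which evaluates similarity by dividing the size of the intersection of two sets by the size of their union. Algorithm steps: 1. Segment the texts into words. 2. Take the intersection (deduplicated) and count its distinct words, intersectionSize. 3. Take the union (deduplicated) and count its distinct words, unionSize. 4. Divide the value from step 2 by the value from step 3: intersectionSize/(double)unionSize. Complete formula: double score = intersectionSize/(double)unionSize;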
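Steps 2 to 4 map directly onto set operations. The minimal sketch below assumes step 1 (segmentation) has already produced two word lists. JaccardSketch is a hypothetical name, and returning 0 when both lists are empty is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of steps 2 to 4; segmentation is assumed done.
final class JaccardSketch {
    static double score(List<String> words1, List<String> words2) {
        // Step 2: distinct words common to both lists.
        Set<String> intersection = new HashSet<>(words1);
        intersection.retainAll(new HashSet<>(words2));
        // Step 3: distinct words appearing in either list.
        Set<String> union = new HashSet<>(words1);
        union.addAll(words2);
        if (union.isEmpty()) {
            return 0; // assumption: two empty texts score 0
        }
        // Step 4: ratio of the two sizes.
        int intersectionSize = intersection.size();
        int unionSize = union.size();
        return intersectionSize / (double) unionSize;
    }
}
```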
JaccardTextSimilarity() - Constructor for class org.apdplat.word.analysis.JaccardTextSimilarity
 
JaroDistanceTextSimilarity - Class in org.apdplat.word.analysis
Text similarity computation. Method: Jaro distance, a type of edit distance. Note that here the Jaro distance is itself the similarity score.
JaroDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.JaroDistanceTextSimilarity
 
JaroWinklerDistanceTextSimilarity - Class in org.apdplat.word.analysis
Text similarity computation. Method: Jaro–Winkler distance, an extension of Jaro by William E.
JaroWinklerDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.JaroWinklerDistanceTextSimilarity
 
JaroWinklerDistanceTextSimilarity(double) - Constructor for class org.apdplat.word.analysis.JaroWinklerDistanceTextSimilarity
The value of scalingFactor lies in the closed interval [0, 0.25]

L

l - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
the length of every key
load(ObjectInputStream) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Load
load(List<String>) - Method in interface org.apdplat.word.util.ResourceLoader
Initially loads all data
loadAndWatch(ResourceLoader, String) - Static method in class org.apdplat.word.util.AutoDetector
Loads resources and automatically detects changes to them; resources are reloaded automatically when they change
LOGGER - Static variable in class org.apdplat.word.analysis.TextSimilarity
 
LOGGER - Variable in class org.apdplat.word.segmentation.impl.AbstractSegmentation
 
longerText - Variable in class org.apdplat.word.analysis.JaroDistanceTextSimilarity
 
longValue() - Method in class org.apdplat.word.util.AtomicFloat
 
LRUCache<K,V> - Class in org.apdplat.word.vector
A Java implementation of the LRU (Least Recently Used) algorithm
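A compact way to get LRU behavior in Java is an access-ordered LinkedHashMap that evicts its eldest entry once a capacity is exceeded. The sketch below shows that standard-library technique. LruCacheSketch is a hypothetical name, and this is not claimed to be how org.apdplat.word.vector.LRUCache is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch built on LinkedHashMap's access ordering.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

This sketch is not thread-safe; concurrent use would need external synchronization.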
LRUCache(int) - Constructor for class org.apdplat.word.vector.LRUCache
 

M

main(String[]) - Static method in class org.apdplat.word.analysis.CosineTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.EditDistanceTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.EuclideanDistanceTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.HotWord
 
main(String[]) - Static method in class org.apdplat.word.analysis.JaccardTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.JaroDistanceTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.JaroWinklerDistanceTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.ManhattanDistanceTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
 
main(String[]) - Static method in interface org.apdplat.word.analysis.SimilarityRanker
 
main(String[]) - Static method in class org.apdplat.word.analysis.SimpleTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.analysis.SørensenDiceCoefficientTextSimilarity
 
main(String[]) - Static method in class org.apdplat.word.corpus.CorpusMerge
 
main(String[]) - Static method in class org.apdplat.word.corpus.CorpusTools
 
main(String[]) - Static method in class org.apdplat.word.corpus.Evaluation
 
main(String[]) - Static method in class org.apdplat.word.corpus.ExtractText
 
main(String[]) - Static method in class org.apdplat.word.dictionary.DictionaryFactory
 
main(String[]) - Static method in class org.apdplat.word.dictionary.DictionaryTools
 
main(String[]) - Static method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
main(String[]) - Static method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
main(String[]) - Static method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
main(String[]) - Static method in class org.apdplat.word.lucene.ChineseWordAnalyzer
 
main(String[]) - Static method in class org.apdplat.word.recognition.PersonName
 
main(String[]) - Static method in class org.apdplat.word.recognition.Punctuation
 
main(String[]) - Static method in class org.apdplat.word.recognition.Quantifier
 
main(String[]) - Static method in class org.apdplat.word.recognition.RecognitionTool
 
main(String[]) - Static method in class org.apdplat.word.recognition.StopWord
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMinimumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.BidirectionalMinimumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.FullSegmentation
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.MaximumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.MaxNgramScore
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.MinimalWordCount
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.MinimumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.PureEnglish
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.ReverseMaximumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.impl.ReverseMinimumMatching
 
main(String[]) - Static method in class org.apdplat.word.segmentation.PartOfSpeech
 
main(String[]) - Static method in class org.apdplat.word.segmentation.SegmentationContrast
 
main(String[]) - Static method in class org.apdplat.word.segmentation.WordRefiner
 
main(String[]) - Static method in class org.apdplat.word.tagging.AntonymTagging
 
main(String[]) - Static method in class org.apdplat.word.tagging.PartOfSpeechTagging
 
main(String[]) - Static method in class org.apdplat.word.tagging.PinyinTagging
 
main(String[]) - Static method in class org.apdplat.word.tagging.SynonymTagging
 
main(String[]) - Static method in class org.apdplat.word.util.AutoDetector
 
main(String[]) - Static method in class org.apdplat.word.util.DirectoryWatcher
 
main(String[]) - Static method in class org.apdplat.word.util.DoubleArrayGenericTrie
 
main(String[]) - Static method in class org.apdplat.word.util.GenericTrie
 
main(String[]) - Static method in class org.apdplat.word.util.TrieTest
 
main(String[]) - Static method in class org.apdplat.word.util.WordConfTools
 
main(String[]) - Static method in class org.apdplat.word.vector.Distance
 
main(String[]) - Static method in class org.apdplat.word.vector.F
 
main(String[]) - Static method in class org.apdplat.word.vector.LRUCache
 
main(String[]) - Static method in class org.apdplat.word.vector.T
 
main(String[]) - Static method in class org.apdplat.word.vector.Word2Vector
 
main(String[]) - Static method in class org.apdplat.word.WordFrequencyStatistics
 
main(String[]) - Static method in class org.apdplat.word.WordSegmenter
 
ManhattanDistanceTextSimilarity - Class in org.apdplat.word.analysis
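Text similarity calculation. Method: Manhattan distance, which scores similarity by summing the absolute axis-wise differences between two points in a standard coordinate system. Principle: let A(x1, y1) and B(x2, y2) be any two points in the plane; the distance between them is dist(A,B)=|x1-x2|+|y1-y2|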
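The distance formula in this entry can be sketched as follows. This is only an illustration of the arithmetic, not the library's code: the class name `ManhattanSketch` and the use of plain `double[]` weight vectors are assumptions, and how ManhattanDistanceTextSimilarity turns the distance into a similarity score is not stated in this index.

```java
final class ManhattanSketch {
    /** Sums |a[i] - b[i]| over all dimensions of two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A(1, 5) and B(4, 1): |1-4| + |5-1| = 7
        System.out.println(distance(new double[] {1, 5}, new double[] {4, 1}));
    }
}
```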
ManhattanDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.ManhattanDistanceTextSimilarity
 
MaximumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based forward maximum matching algorithm
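A minimal sketch of forward maximum matching in general, for orientation only. It does not reproduce MaximumMatching itself: the class name, the `Set<String>` dictionary and the `maxLen` parameter are hypothetical, and ASCII strings stand in for Chinese text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ForwardMaxMatchSketch {
    /** Scans left to right, always taking the longest dictionary word at the current position. */
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxLen);
            // Shrink the candidate from the right until it is a dictionary word
            // or only a single character is left.
            while (end > start + 1 && !dict.contains(text.substring(start, end))) {
                end--;
            }
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("ab", "abc", "cd", "d"));
        System.out.println(segment("abcd", dict, 3));
    }
}
```

With this toy dictionary the forward scan commits to "abc" first, which is why forward and reverse matching can disagree on the same input.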
MaximumMatching() - Constructor for class org.apdplat.word.segmentation.impl.MaximumMatching
 
MaxNgramScore - Class in org.apdplat.word.segmentation.impl
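Maximum ngram score algorithm (dictionary-based max ngram score segmentation algorithm). From the candidate segmentations, it selects the one whose words have the highest ngram score: each candidate is scored with ngram, the candidates are sorted by score in descending order, and the first one is chosen. If no candidate has an ngram score, the algorithm degrades to the minimal word count algorithm (org.apdplat.word.segmentation.impl.MinimalWordCount)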
MaxNgramScore() - Constructor for class org.apdplat.word.segmentation.impl.MaxNgramScore
 
merge(String, String) - Static method in class org.apdplat.word.corpus.CorpusMerge
Merges multiple corpus files into one
merge(List<String>, String) - Static method in class org.apdplat.word.dictionary.DictionaryTools
Merges multiple dictionaries into one
merge(String, String...) - Method in class org.apdplat.word.WordFrequencyStatistics
Merges multiple word frequency statistics result files
MinimalWordCount - Class in org.apdplat.word.segmentation.impl
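Minimal word count algorithm (dictionary-based minimal word count segmentation algorithm). From the candidate segmentations, it selects the one with the fewest words. If several candidates have the same word count, they are scored with ngram, sorted by score in descending order, and the first one is chosen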
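The selection rule described in this entry (fewest words first, higher ngram score as the tie-breaker) can be sketched with a comparator. The names here are hypothetical: candidates are plain `List<String>` values and `ngramScore` is a caller-supplied function standing in for the library's ngram scoring.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class MinimalWordCountSketch {
    /** Picks the candidate with the fewest words; ties go to the higher ngram score. */
    static List<String> pick(List<List<String>> candidates,
                             ToDoubleFunction<List<String>> ngramScore) {
        Comparator<List<String>> fewestWords = Comparator.comparingInt(List::size);
        Comparator<List<String>> byScore = Comparator.comparingDouble(ngramScore);
        return candidates.stream()
                // reversed() makes a higher score compare as "smaller", so min() prefers it
                .min(fewestWords.thenComparing(byScore.reversed()))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }
}
```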
MinimalWordCount() - Constructor for class org.apdplat.word.segmentation.impl.MinimalWordCount
 
MinimumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based forward minimum matching algorithm
MinimumMatching() - Constructor for class org.apdplat.word.segmentation.impl.MinimumMatching
 

N

name() - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
ngram(List<Word>...) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Scores segmentation results using ngram
ngramEnabled() - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Whether ngram is enabled
nodeModules() - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
nodeServices() - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 

O

onModule(AnalysisModule) - Method in class org.apdplat.word.elasticsearch.ChineseWordPlugin
 
org.apdplat.word - package org.apdplat.word
 
org.apdplat.word.analysis - package org.apdplat.word.analysis
 
org.apdplat.word.corpus - package org.apdplat.word.corpus
 
org.apdplat.word.dictionary - package org.apdplat.word.dictionary
 
org.apdplat.word.dictionary.impl - package org.apdplat.word.dictionary.impl
 
org.apdplat.word.elasticsearch - package org.apdplat.word.elasticsearch
 
org.apdplat.word.lucene - package org.apdplat.word.lucene
 
org.apdplat.word.recognition - package org.apdplat.word.recognition
 
org.apdplat.word.segmentation - package org.apdplat.word.segmentation
 
org.apdplat.word.segmentation.impl - package org.apdplat.word.segmentation.impl
 
org.apdplat.word.solr - package org.apdplat.word.solr
 
org.apdplat.word.tagging - package org.apdplat.word.tagging
 
org.apdplat.word.util - package org.apdplat.word.util
 
org.apdplat.word.vector - package org.apdplat.word.vector
 
output - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
output table of the Aho Corasick automata

P

parseText(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
parseText(String, int, int) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Parse text
parseText(String, AhoCorasickDoubleArrayTrie.IHit<V>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Parse text
parseText(char[], AhoCorasickDoubleArrayTrie.IHit<V>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Parse text
parseText(char[], AhoCorasickDoubleArrayTrie.IHitFull<V>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Parse text
PartOfSpeech - Class in org.apdplat.word.segmentation
Part of speech
PartOfSpeech(String, String) - Constructor for class org.apdplat.word.segmentation.PartOfSpeech
 
PartOfSpeechTagging - Class in org.apdplat.word.tagging
Part-of-speech tagging
PersonName - Class in org.apdplat.word.recognition
Person name recognition
PersonName() - Constructor for class org.apdplat.word.recognition.PersonName
 
PinyinTagging - Class in org.apdplat.word.tagging
Pinyin tagging
prefix(String) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
process(List<Word>) - Static method in class org.apdplat.word.tagging.AntonymTagging
 
process(List<Word>) - Static method in class org.apdplat.word.tagging.PartOfSpeechTagging
 
process(List<Word>) - Static method in class org.apdplat.word.tagging.PinyinTagging
 
process(List<Word>) - Static method in class org.apdplat.word.tagging.SynonymTagging
 
process(List<Word>, boolean) - Static method in class org.apdplat.word.tagging.SynonymTagging
 
processAnalyzers(AnalysisModule.AnalysisBinderProcessor.AnalyzersBindings) - Method in class org.apdplat.word.elasticsearch.ChineseWordAnalysisBinderProcessor
 
processCommand(String...) - Static method in class org.apdplat.word.WordSegmenter
 
processTokenizers(AnalysisModule.AnalysisBinderProcessor.TokenizersBindings) - Method in class org.apdplat.word.elasticsearch.ChineseWordAnalysisBinderProcessor
 
Punctuation - Class in org.apdplat.word.recognition
Determines whether a character is a punctuation mark
Punctuation() - Constructor for class org.apdplat.word.recognition.Punctuation
 
PureEnglish - Class in org.apdplat.word.segmentation.impl
Segmenter for pure English text
PureEnglish() - Constructor for class org.apdplat.word.segmentation.impl.PureEnglish
 
put(String, V) - Method in class org.apdplat.word.util.GenericTrie
 
putAll(Map<String, Integer>) - Method in class org.apdplat.word.util.DoubleArrayGenericTrie
 

Q

Quantifier - Class in org.apdplat.word.recognition
Quantifier recognition
Quantifier() - Constructor for class org.apdplat.word.recognition.Quantifier
 

R

rank(String, List<String>) - Method in interface org.apdplat.word.analysis.SimilarityRanker
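Computes the similarity between the source text and the target texts, then sorts the target texts by similarity score
rank(String, List<String>, int) - Method in interface org.apdplat.word.analysis.SimilarityRanker
Computes the similarity between the source text and the target texts, sorts the target texts by similarity score, and returns the topN highest-ranked results
recog(String) - Static method in class org.apdplat.word.recognition.RecognitionTool
Recognizes special text (English words, numbers, times, etc.)
recog(String, int, int) - Static method in class org.apdplat.word.recognition.RecognitionTool
Recognizes special text (English words, numbers, times, etc.)
RecognitionTool - Class in org.apdplat.word.recognition
Tool for recognizing special cases in segmentation, such as English words, numbers, and times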
RecognitionTool() - Constructor for class org.apdplat.word.recognition.RecognitionTool
 
recognize(List<Word>) - Static method in class org.apdplat.word.recognition.PersonName
Processes segmentation results to recognize person names
refine(List<Word>) - Static method in class org.apdplat.word.segmentation.WordRefiner
Splits words first, then recombines them
reload() - Static method in class org.apdplat.word.corpus.Bigram
 
reload() - Static method in class org.apdplat.word.corpus.Trigram
 
reload() - Static method in class org.apdplat.word.dictionary.DictionaryFactory
 
reload() - Static method in class org.apdplat.word.recognition.PersonName
 
reload() - Static method in class org.apdplat.word.recognition.Punctuation
 
reload() - Static method in class org.apdplat.word.recognition.Quantifier
 
reload() - Static method in class org.apdplat.word.recognition.StopWord
 
reload() - Static method in class org.apdplat.word.segmentation.WordRefiner
 
reload() - Static method in class org.apdplat.word.tagging.AntonymTagging
 
reload() - Static method in class org.apdplat.word.tagging.PartOfSpeechTagging
 
reload() - Static method in class org.apdplat.word.tagging.SynonymTagging
 
reload() - Static method in class org.apdplat.word.util.WordConfTools
Reloads the configuration file
remove(String) - Method in interface org.apdplat.word.dictionary.Dictionary
Removes a single word from the dictionary
remove(String) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
remove(String) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
remove(String) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
remove(String) - Method in class org.apdplat.word.util.GenericTrie
Removes the part of speech
remove(String) - Method in interface org.apdplat.word.util.ResourceLoader
Dynamically removes one line of data
removeAll(List<String>) - Method in interface org.apdplat.word.dictionary.Dictionary
Removes a batch of words from the dictionary
removeAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
 
removeAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
removeAll(List<String>) - Method in class org.apdplat.word.dictionary.impl.DoubleArrayDictionaryTrie
 
removeEldestEntry(Map.Entry<K, V>) - Method in class org.apdplat.word.vector.LRUCache
Determines whether the cache is full
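`removeEldestEntry(Map.Entry<K, V>)` is the hook that `java.util.LinkedHashMap` calls after each insertion, so the entry above suggests the usual LinkedHashMap-based LRU pattern. The sketch below shows that general pattern under this assumption; the class name `LruCacheSketch` and the `capacity` field are hypothetical and not taken from org.apdplat.word.vector.LRUCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each put; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // the cache is over capacity, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```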
removePhraseFromDic(String, String) - Static method in class org.apdplat.word.dictionary.DictionaryTools
Removes phrase structures from the dictionary
reset() - Method in class org.apdplat.word.WordFrequencyStatistics
Clears the previous statistics results
ResourceLoader - Interface in org.apdplat.word.util
Resource loading interface
ReverseMaximumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based reverse maximum matching algorithm
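A minimal sketch of reverse maximum matching in general: the scan runs from the end of the text and takes the longest dictionary word that ends at the current position. It is not the code of ReverseMaximumMatching; the class name, the `Set<String>` dictionary and `maxLen` are hypothetical, and ASCII strings stand in for Chinese text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

final class ReverseMaxMatchSketch {
    /** Scans right to left, always taking the longest dictionary word ending at the current position. */
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        LinkedList<String> words = new LinkedList<>();
        int end = text.length();
        while (end > 0) {
            int start = Math.max(0, end - maxLen);
            // Shrink the candidate from the left until it is a dictionary word
            // or only a single character is left.
            while (start < end - 1 && !dict.contains(text.substring(start, end))) {
                start++;
            }
            words.addFirst(text.substring(start, end)); // prepend to keep text order
            end = start;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("ab", "abc", "cd", "d"));
        System.out.println(segment("abcd", dict, 3));
    }
}
```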
ReverseMaximumMatching() - Constructor for class org.apdplat.word.segmentation.impl.ReverseMaximumMatching
 
ReverseMinimumMatching - Class in org.apdplat.word.segmentation.impl
Dictionary-based reverse minimum matching algorithm
ReverseMinimumMatching() - Constructor for class org.apdplat.word.segmentation.impl.ReverseMinimumMatching
 
run(String) - Static method in class org.apdplat.word.segmentation.SegmentationContrast
 

S

save(ObjectOutputStream) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Save
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.CosineTextSimilarity
Similarity method: cosine similarity. Principle: for vectors a=(x1,y1) and b=(x2,y2), similarity=a.b/(|a|*|b|), where a.b=x1x2+y1y2, |a|=sqrt((x1)^2+(y1)^2), |b|=sqrt((x2)^2+(y2)^2)
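The cosine formula in this entry generalizes from two dimensions to any number of dimensions. The sketch below shows that arithmetic only; the class name `CosineSketch`, the `double[]` weight vectors, and the choice to return 0 for a zero-length vector are assumptions, not details taken from CosineTextSimilarity.

```java
final class CosineSketch {
    /** Returns a.b / (|a| * |b|), or 0 when either vector has zero length. */
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: an empty vector is treated as dissimilar to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

For example, `similarity(new double[] {1, 0}, new double[] {0, 1})` is 0 because the vectors are orthogonal, while two vectors pointing in the same direction score approximately 1 regardless of their lengths.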
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.EditDistanceTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.EuclideanDistanceTextSimilarity
Similarity method: Euclidean distance. Principle: let A(x1, y1) and B(x2, y2) be any two points in the plane; the distance between them is dist(A,B)=sqrt((x1-x2)^2+(y1-y2)^2)
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.JaccardTextSimilarity
Similarity method: Jaccard similarity coefficient
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.JaroDistanceTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.JaroWinklerDistanceTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.ManhattanDistanceTextSimilarity
Similarity method: Manhattan distance. Principle: let A(x1, y1) and B(x2, y2) be any two points in the plane; the distance between them is dist(A,B)=|x1-x2|+|y1-y2|
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.SimpleTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.SørensenDiceCoefficientTextSimilarity
Computes the similarity score
scoreImpl(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.TextSimilarity
Computes the similarity score
seg(String, boolean, char...) - Static method in class org.apdplat.word.recognition.Punctuation
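Splits a text at punctuation marks into multiple pieces that contain no punctuation; the punctuation marks to keep can be specified
seg(String) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Default segmentation implementation: 1. split the text to be segmented at punctuation marks; 2. segment each resulting piece; 3. combine the segmentation results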
seg(String) - Method in class org.apdplat.word.segmentation.impl.PureEnglish
 
seg(String) - Method in interface org.apdplat.word.segmentation.Segmentation
Segments text into words
seg(String) - Static method in class org.apdplat.word.segmentation.SegmentationContrast
 
seg(File, File, boolean, SegmentationAlgorithm) - Static method in class org.apdplat.word.util.Utils
Segments a file
seg(File, File, boolean, SegmentationAlgorithm, Utils.FileSegmentationCallback) - Static method in class org.apdplat.word.util.Utils
Segments a file
seg(String) - Method in class org.apdplat.word.WordFrequencyStatistics
Segments text
seg(File, File) - Method in class org.apdplat.word.WordFrequencyStatistics
Segments a file
seg(String, SegmentationAlgorithm) - Static method in class org.apdplat.word.WordSegmenter
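Segments text and removes stop words. A segmentation algorithm can be specified
seg(String) - Static method in class org.apdplat.word.WordSegmenter
Segments text and removes stop words. Uses the bidirectional maximum matching algorithm
seg(File, File, SegmentationAlgorithm) - Static method in class org.apdplat.word.WordSegmenter
Segments a file and removes stop words. A segmentation algorithm can be specified
seg(File, File) - Static method in class org.apdplat.word.WordSegmenter
Segments a file and removes stop words. Uses the bidirectional maximum matching algorithm
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
The concrete segmentation implementation, left to subclasses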
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.BidirectionalMaximumMinimumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.BidirectionalMinimumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.FullSegmentation
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.MaximumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.MaxNgramScore
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.MinimalWordCount
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.MinimumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.ReverseMaximumMatching
 
segImpl(String) - Method in class org.apdplat.word.segmentation.impl.ReverseMinimumMatching
 
Segmentation - Interface in org.apdplat.word.segmentation
Chinese word segmentation interface
SegmentationAlgorithm - Enum in org.apdplat.word.segmentation
Chinese word segmentation algorithm
SegmentationContrast - Class in org.apdplat.word.segmentation
Compares the segmentation results of the various segmentation algorithms
SegmentationContrast() - Constructor for class org.apdplat.word.segmentation.SegmentationContrast
 
SegmentationFactory - Class in org.apdplat.word.segmentation
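Chinese word segmentation factory class. Returns the segmentation implementation for the specified algorithm
segWithStopWords(String, SegmentationAlgorithm) - Static method in class org.apdplat.word.WordSegmenter
Segments text and keeps stop words. A segmentation algorithm can be specified
segWithStopWords(String) - Static method in class org.apdplat.word.WordSegmenter
Segments text and keeps stop words. Uses the bidirectional maximum matching algorithm
segWithStopWords(File, File, SegmentationAlgorithm) - Static method in class org.apdplat.word.WordSegmenter
Segments a file and keeps stop words. A segmentation algorithm can be specified
segWithStopWords(File, File) - Static method in class org.apdplat.word.WordSegmenter
Segments a file and keeps stop words. Uses the bidirectional maximum matching algorithm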
set(float) - Method in class org.apdplat.word.util.AtomicFloat
 
set(String, String) - Static method in class org.apdplat.word.util.WordConfTools
 
setAcronymPinYin(String) - Method in class org.apdplat.word.segmentation.Word
 
setAntonym(List<Word>) - Method in class org.apdplat.word.segmentation.Word
 
setDes(String) - Method in class org.apdplat.word.segmentation.PartOfSpeech
 
setDictionary(Dictionary) - Method in interface org.apdplat.word.segmentation.DictionaryBasedSegmentation
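Sets the dictionary used by the dictionary-based Chinese word segmentation interface
setDictionary(Dictionary) - Method in class org.apdplat.word.segmentation.impl.AbstractSegmentation
Sets the dictionary used by the dictionary-based Chinese word segmentation interface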
setFrequency(int) - Method in class org.apdplat.word.segmentation.Word
 
setFullPinYin(String) - Method in class org.apdplat.word.segmentation.Word
 
setHashBitCount(int) - Method in class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
 
setLimit(int) - Method in class org.apdplat.word.vector.Distance
 
setPartOfSpeech(PartOfSpeech) - Method in class org.apdplat.word.segmentation.Word
 
setPerfectCharCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setPerfectLineCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setPos(String) - Method in class org.apdplat.word.segmentation.PartOfSpeech
 
setRemoveStopWord(boolean) - Method in class org.apdplat.word.WordFrequencyStatistics
Sets whether to remove stop words
setResultPath(String) - Method in class org.apdplat.word.WordFrequencyStatistics
Sets the path where word frequency statistics results are saved
setScore(Double) - Method in class org.apdplat.word.analysis.Hit
 
setSegmentationAlgorithm(SegmentationAlgorithm) - Method in class org.apdplat.word.analysis.TextSimilarity
 
setSegmentationAlgorithm(SegmentationAlgorithm) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setSegmentationAlgorithm(SegmentationAlgorithm) - Method in class org.apdplat.word.WordFrequencyStatistics
Sets the segmentation algorithm
setSegSpeed(float) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setSynonym(List<Word>) - Method in class org.apdplat.word.segmentation.Word
 
setText(String) - Method in class org.apdplat.word.analysis.Hit
 
setText(String) - Method in class org.apdplat.word.segmentation.Word
 
setTextSimilarity(TextSimilarity) - Method in class org.apdplat.word.vector.Distance
 
setTotalCharCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setTotalLineCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setWeight(Float) - Method in class org.apdplat.word.segmentation.Word
 
setWrongCharCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
setWrongLineCount(int) - Method in class org.apdplat.word.corpus.EvaluationResult
 
shorterText - Variable in class org.apdplat.word.analysis.JaroDistanceTextSimilarity
 
show(char) - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
show() - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
 
show(char) - Method in class org.apdplat.word.util.GenericTrie
 
show() - Method in class org.apdplat.word.util.GenericTrie
 
showConflict() - Method in class org.apdplat.word.dictionary.impl.DictionaryTrie
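Reports root-node conflicts and the utilization of the preallocated array space
showConflict() - Method in class org.apdplat.word.util.GenericTrie
Reports root-node conflicts and the utilization of the preallocated array space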
showUsage() - Static method in class org.apdplat.word.segmentation.SegmentationContrast
 
SimHashPlusHammingDistanceTextSimilarity - Class in org.apdplat.word.analysis
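Text similarity calculation. Method: SimHash + Hamming distance. SimHash first maps texts of different lengths to equal-length texts, and then the Hamming distance between the equal-length texts is computed. The biggest difference between simhash and an ordinary hash: an ordinary hash maps texts that differ by only one byte to two completely different hash values, whereas simhash maps similar texts to similar hash values. The Hamming distance is named after the American mathematician Richard Wesley Hamming. The Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ; in other words, it is the number of characters that must be substituted to turn one string into the other. For example: the Hamming distance between 1011101 and 1001001 is 2; between 2143896 and 2233796 it is 3; between toned and roses it is 3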
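The three Hamming distance examples in this entry can be checked with a short sketch. The class name `HammingSketch` is hypothetical; the `long` overload assumes fingerprints are held as 64-bit values, which this index does not confirm (the class exposes `setHashBitCount(int)`), and the SimHash step itself is not shown.

```java
final class HammingSketch {
    /** Counts the positions at which two equal-length strings differ. */
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("inputs must have equal length");
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    /** Same idea for bit fingerprints: XOR leaves a 1 wherever the bits differ. */
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        System.out.println(distance("1011101", "1001001")); // 2
        System.out.println(distance("2143896", "2233796")); // 3
        System.out.println(distance("toned", "roses"));     // 3
    }
}
```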
SimHashPlusHammingDistanceTextSimilarity() - Constructor for class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
 
SimHashPlusHammingDistanceTextSimilarity(int) - Constructor for class org.apdplat.word.analysis.SimHashPlusHammingDistanceTextSimilarity
 
Similarity - Interface in org.apdplat.word.analysis
Similarity
SimilarityRanker - Interface in org.apdplat.word.analysis
Similarity ranking
similarScore(String, String) - Method in interface org.apdplat.word.analysis.Similarity
Similarity score of object 1 and object 2
similarScore(List<Word>, List<Word>) - Method in interface org.apdplat.word.analysis.Similarity
Similarity score of word list 1 and word list 2
similarScore(HashMap<Word, Float>, HashMap<Word, Float>) - Method in interface org.apdplat.word.analysis.Similarity
Similarity score of word-to-weight map 1 and word-to-weight map 2
similarScore(Map<String, Float>, Map<String, Float>) - Method in interface org.apdplat.word.analysis.Similarity
Similarity score of word-to-weight map 1 and word-to-weight map 2
similarScore(String, String) - Method in class org.apdplat.word.analysis.TextSimilarity
Similarity score of text 1 and text 2
similarScore(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.TextSimilarity
Similarity score of word list 1 and word list 2
SimpleTextSimilarity - Class in org.apdplat.word.analysis
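Text similarity calculation. Method: simple shared words, which scores similarity as the total character count of the words shared by two documents divided by the character count of the longer document. Algorithm steps: 1. segment the texts; 2. take the intersection (deduplicated) and sum the character counts of all words in it to get intersectionLength; 3. take the character count of the longer text, Math.max(words1Length, words2Length); 4. divide the value from step 2 by the value from step 3: intersectionLength/(double)Math.max(words1Length, words2Length). Full formula: double score = intersectionLength/(double)Math.max(words1Length, words2Length);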
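Steps 2 to 4 of this entry can be sketched as follows, starting from texts that are already segmented. This is an illustration rather than the library's implementation: the class name is hypothetical, words are plain strings, and it is an assumption that words1Length and words2Length are the total character counts of each word list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SimpleSimilaritySketch {
    static double score(List<String> words1, List<String> words2) {
        // Step 2: deduplicated intersection, then sum the lengths of the shared words
        Set<String> common = new HashSet<>(words1);
        common.retainAll(new HashSet<>(words2));
        int intersectionLength = common.stream().mapToInt(String::length).sum();
        // Step 3: character count of the longer text (assumed: sum over all of its words)
        int words1Length = words1.stream().mapToInt(String::length).sum();
        int words2Length = words2.stream().mapToInt(String::length).sum();
        int longest = Math.max(words1Length, words2Length);
        // Step 4: the cast to double avoids integer division
        return longest == 0 ? 0.0 : intersectionLength / (double) longest;
    }
}
```

For the word lists [ab, cd, ef] and [ab, cd, xyz, q], the shared words contribute 4 characters and the longer list has 8, so the score is 0.5.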
SimpleTextSimilarity() - Constructor for class org.apdplat.word.analysis.SimpleTextSimilarity
 
size() - Method in class org.apdplat.word.analysis.Hits
 
size - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
the size of base and check array
size() - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
Get the size of the keywords
split(Word) - Static method in class org.apdplat.word.segmentation.WordRefiner
Splits a word into several words; returns null if the word cannot be split
StopWord - Class in org.apdplat.word.recognition
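Stop word detection. The stop word dictionary (stopwords.path) is specified through system properties and the configuration file. Option 1, programmatic (high priority): WordConfTools.set("stopwords.path", "classpath:stopwords.txt"); Option 2, JVM startup argument (medium priority): java -Dstopwords.path=classpath:stopwords.txt Option 3, configuration file (low priority): set stopwords.path=classpath:stopwords.txt in word.conf on the classpath. If none is specified, the default stop word dictionary file (stopwords.txt on the classpath) is used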
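The priority order described in this entry can be illustrated with a small lookup chain. This sketch does not call or reproduce WordConfTools: the class name, the `resolve` method and its parameters are hypothetical, and only `System.getProperty` and `java.util.Properties` are real JDK APIs.

```java
import java.util.Properties;

final class StopWordPathSketch {
    static final String KEY = "stopwords.path";
    static final String DEFAULT_PATH = "classpath:stopwords.txt";

    /** Returns the first value that is set, checked from highest to lowest priority. */
    static String resolve(String programmatic, String jvmProperty, Properties wordConf) {
        if (programmatic != null) {
            return programmatic; // option 1: set in code
        }
        if (jvmProperty != null) {
            return jvmProperty;  // option 2: -Dstopwords.path=...
        }
        // option 3: configuration file entry, else the default dictionary
        return wordConf.getProperty(KEY, DEFAULT_PATH);
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // stands in for a loaded word.conf
        System.out.println(resolve(null, System.getProperty(KEY), conf));
    }
}
```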
StopWord() - Constructor for class org.apdplat.word.recognition.StopWord
 
SynonymTagging - Class in org.apdplat.word.tagging
Synonym tagging
SørensenDiceCoefficientTextSimilarity - Class in org.apdplat.word.analysis
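Text similarity calculation. Method: Sørensen–Dice coefficient, which scores similarity as twice the size of the intersection of two sets divided by the sum of the sizes of the two sets. Algorithm steps: 1. segment the texts; 2. take the intersection (deduplicated) and count its distinct words to get intersectionSize; 3. let the sizes of the two sets be set1Size and set2Size; 4. similarity score = 2*intersectionSize/(set1Size+set2Size). Full formula: double score = 2*intersectionSize/(set1Size+set2Size);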
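A sketch of the coefficient described in this entry, starting from segmented words. The class name and the use of plain strings are assumptions. Note that the formula as printed would use integer division if all three variables were `int`, so the sketch multiplies by `2.0` to force floating-point arithmetic; whether the library does the same is not visible from this index.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class DiceSketch {
    static double score(Collection<String> words1, Collection<String> words2) {
        Set<String> set1 = new HashSet<>(words1);
        Set<String> set2 = new HashSet<>(words2);
        Set<String> common = new HashSet<>(set1);
        common.retainAll(set2); // deduplicated intersection
        int total = set1.size() + set2.size();
        // 2.0 forces floating-point division
        return total == 0 ? 0.0 : 2.0 * common.size() / total;
    }
}
```

For the sets {a, b} and {b, c} the intersection has one element, so the score is 2*1/(2+2) = 0.5.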
SørensenDiceCoefficientTextSimilarity() - Constructor for class org.apdplat.word.analysis.SørensenDiceCoefficientTextSimilarity
 

T

T - Class in org.apdplat.word.vector
Created by apple on 7/14/15.
T() - Constructor for class org.apdplat.word.vector.T
 
taggingWeightWithWordFrequency(List<Word>, List<Word>) - Method in class org.apdplat.word.analysis.TextSimilarity
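If no weights are specified, word frequency is used as the default word weight. Where does the frequency come from? A word's weight in word list 1 is the number of times it occurs in word list 1, and its weight in word list 2 is the number of times it occurs in word list 2. The assigned weights are stored in the weight field of the Word class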
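The frequency-as-weight rule in this entry amounts to counting occurrences per word list. The sketch below is a simplification under stated assumptions: words are plain strings and the counts are returned in a map, whereas the library stores the weight on each Word object; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FrequencyWeightSketch {
    /** Maps each word to the number of times it occurs in the given list. */
    static Map<String, Integer> weights(List<String> words) {
        Map<String, Integer> weights = new HashMap<>();
        for (String word : words) {
            weights.merge(word, 1, Integer::sum); // first occurrence stores 1, later ones add 1
        }
        return weights;
    }
}
```

Calling `weights` once per word list gives each list its own weights, so the same word can weigh 2 in list 1 and 1 in list 2.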
testBigram() - Static method in class org.apdplat.word.util.TrieTest
 
testBigram2() - Static method in class org.apdplat.word.util.TrieTest
 
testTrigram() - Static method in class org.apdplat.word.util.TrieTest
 
testTrigram2() - Static method in class org.apdplat.word.util.TrieTest
 
TextSimilarity - Class in org.apdplat.word.analysis
Text similarity
TextSimilarity() - Constructor for class org.apdplat.word.analysis.TextSimilarity
 
thresholdRate - Static variable in interface org.apdplat.word.analysis.Similarity
 
toFastSearchMap(List<Word>) - Method in class org.apdplat.word.analysis.TextSimilarity
Builds a fast-lookup container for weights
toString() - Method in class org.apdplat.word.analysis.Hit
 
toString() - Method in class org.apdplat.word.corpus.EvaluationResult
 
toString() - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.Hit
 
toString() - Method in class org.apdplat.word.segmentation.Word
 
toString() - Method in class org.apdplat.word.util.AtomicFloat
 
transition(int, char) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
transition of a state
transitionWithRoot(int, char) - Method in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
transition of a state, if the state is root and it failed, then returns the root
TrieTest - Class in org.apdplat.word.util
Performance test of the trie and the double-array trie
TrieTest() - Constructor for class org.apdplat.word.util.TrieTest
 
Trigram - Class in org.apdplat.word.corpus
Trigram language model
Trigram() - Constructor for class org.apdplat.word.corpus.Trigram
 
trigram(List<Word>...) - Static method in class org.apdplat.word.corpus.Trigram
Computes the trigram scores of multiple segmentation results at once
trigram(List<Word>) - Static method in class org.apdplat.word.corpus.Trigram
Computes the trigram score of a segmentation result

U

Utils - Class in org.apdplat.word.util
Utility class
Utils() - Constructor for class org.apdplat.word.util.Utils
 
Utils.FileSegmentationCallback - Interface in org.apdplat.word.util
 

V

v - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie
outer value array
value - Variable in class org.apdplat.word.dictionary.impl.AhoCorasickDoubleArrayTrie.Hit
the value assigned to the keyword
valueOf(String) - Static method in class org.apdplat.word.segmentation.PartOfSpeech
 
valueOf(String) - Static method in enum org.apdplat.word.segmentation.SegmentationAlgorithm
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.apdplat.word.segmentation.SegmentationAlgorithm
Returns an array containing the constants of this enum type, in the order they are declared.

W

watchDirectory(String) - Method in class org.apdplat.word.util.DirectoryWatcher
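Watches the specified directory, excluding subdirectories
watchDirectory(Path) - Method in class org.apdplat.word.util.DirectoryWatcher
Watches the specified directory, excluding subdirectories
watchDirectoryTree(String) - Method in class org.apdplat.word.util.DirectoryWatcher
Watches the specified directory and all of its subdirectories
watchDirectoryTree(Path) - Method in class org.apdplat.word.util.DirectoryWatcher
Watches the specified directory and all of its subdirectories
Word - Class in org.apdplat.word.segmentation
Word, pinyin, part of speech, word frequency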
Word(String) - Constructor for class org.apdplat.word.segmentation.Word
 
Word(String, PartOfSpeech, int) - Constructor for class org.apdplat.word.segmentation.Word
 
Word2Vector - Class in org.apdplat.word.vector
Represents a word as a word vector
Word2Vector() - Constructor for class org.apdplat.word.vector.Word2Vector
 
WordConfTools - Class in org.apdplat.word.util
Utility class for reading configuration information
WordConfTools() - Constructor for class org.apdplat.word.util.WordConfTools
 
WordFrequencyStatistics - Class in org.apdplat.word
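Word frequency statistics
WordFrequencyStatistics() - Constructor for class org.apdplat.word.WordFrequencyStatistics
Default constructor. If no algorithm is specified, the maximum ngram score algorithm is used; if no result path is specified, WordFrequencyStatistics-Result.txt in the current directory is used
WordFrequencyStatistics(String) - Constructor for class org.apdplat.word.WordFrequencyStatistics
Constructor. If no algorithm is specified, the maximum ngram score algorithm is used
WordFrequencyStatistics(String, SegmentationAlgorithm) - Constructor for class org.apdplat.word.WordFrequencyStatistics
Constructor
WordFrequencyStatistics(String, String) - Constructor for class org.apdplat.word.WordFrequencyStatistics
Constructor
WordRefiner - Class in org.apdplat.word.segmentation
Fine-tunes segmentation results
WordSegmenter - Class in org.apdplat.word
Basic entry point for Chinese word segmentation. Uses the bidirectional maximum matching algorithm by default; another segmentation algorithm can be specified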
WordSegmenter() - Constructor for class org.apdplat.word.WordSegmenter
 

Copyright © 2014–2015 APDPlat. All rights reserved.