public class SpellChecker
extends java.lang.Object
Spell checker class (main class), initially inspired by the David Spencer code.
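Example usage:

```java
import java.io.File;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;

SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
spellchecker.close(); // releases the IndexSearcher held by the spell checker
```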
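PlainTextDictionary takes its words from a file. As a rough, Lucene-free illustration of what reading such a word list could involve, the sketch below assumes one word per line, trims surrounding whitespace, and skips blank lines. The class WordListSketch and its readWords helper are hypothetical and not part of this API; the real file format expected by PlainTextDictionary should be checked in its own documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WordListSketch {
    // Hypothetical helper (assumed format: one word per line).
    // Surrounding whitespace is trimmed, blank lines are skipped,
    // and I/O failures are rethrown as unchecked exceptions.
    static List<String> readWords(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [apple, banana, cherry]
        System.out.println(readWords(new StringReader("apple\nbanana\n\n cherry \n")));
    }
}
```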
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | F_WORD: Field name for each word in the n-gram index. |
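F_WORD names the field that holds each whole word of the spell index, which is described as an n-gram index: candidate words are fetched through the character n-grams they share with the misspelled input. The following is a simplified, hypothetical illustration of that idea in plain Java. The helper names ngrams and sharedGrams, the fixed gram size, and the use of a raw overlap count as a match signal are assumptions of this sketch, not the field layout or scoring that Lucene actually uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // Returns every contiguous substring of length n, left to right.
    // Words shorter than n yield an empty list.
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Counts how many of the query's n-grams also occur in the candidate:
    // a crude stand-in for a first-stage "shares many grams" lookup.
    static int sharedGrams(String query, String candidate, int n) {
        Set<String> candidateGrams = new HashSet<>(ngrams(candidate, n));
        int shared = 0;
        for (String gram : ngrams(query, n)) {
            if (candidateGrams.contains(gram)) {
                shared++;
            }
        }
        return shared;
    }
}
```

With trigrams, "misspelt" and "misspelled" share mis, iss, ssp, spe and pel, so sharedGrams returns 5 for that pair.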
| Constructor and Description |
|---|
| SpellChecker(Directory spellIndex): Uses the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. |
| SpellChecker(Directory spellIndex, StringDistance sd): Uses the given directory as a spell checker index. |
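The single-argument constructor defaults to LevensteinDistance. To give a feel for an edit-distance similarity of that kind, here is a self-contained sketch of the classic dynamic-programming Levenshtein distance, normalized to a score between 0 and 1 as 1 - distance / (length of the longer word). That normalization and the class name LevenshteinSketch are assumptions of this sketch; the exact formula used by the library's StringDistance implementation should be checked in its own documentation.

```java
class LevenshteinSketch {
    // Classic edit distance: insertions, deletions and substitutions each cost 1.
    // Keeps only two rows of the DP table to use O(|b|) memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical words, lower as more edits are needed.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }
}
```

Under this normalization, "kitten" and "sitting" (three edits, longer word of seven characters) score 1 - 3/7, about 0.57.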
| Modifier and Type | Method and Description |
|---|---|
| void | clearIndex(): Removes all terms from the spell check index. |
| void | close(): Closes the IndexSearcher used by this SpellChecker. |
| boolean | exist(java.lang.String word): Checks whether the word exists in the index. |
| StringDistance | getStringDistance(): Returns the StringDistance instance used by this SpellChecker instance. |
| void | indexDictionary(Dictionary dict): Indexes the data from the given Dictionary. |
| void | indexDictionary(Dictionary dict, int mergeFactor, int ramMB): Indexes the data from the given Dictionary. |
| void | setAccuracy(float minScore): Sets the accuracy, 0 < minScore < 1; default 0.5. |
| void | setSpellIndex(Directory spellIndexDir): Uses a different index as the spell checker index, or re-opens the existing index if spellIndexDir is the same directory that was given to the constructor. |
| void | setStringDistance(StringDistance sd): Sets the StringDistance implementation for this SpellChecker instance. |
| java.lang.String[] | suggestSimilar(java.lang.String word, int numSug): Suggests similar words. |
| java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, boolean morePopular): Suggests similar words, optionally restricted to a field of an index. |
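Both suggestSimilar overloads work in two stages: Lucene first fetches the most relevant n-grammed terms, then an edit-distance measure picks the best matches among those hits. The hypothetical sketch below models only the second stage, with the candidate list and word frequencies supplied by the caller instead of read from an index. Candidates are re-ranked with a pluggable similarity (the role setStringDistance plays), optionally filtered the way the morePopular parameter is described, and truncated to numSug. RerankSketch, its parameter list, and the rule of never suggesting the input word itself are illustrative assumptions, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

class RerankSketch {
    // Hypothetical second stage: candidates arrive from a first-stage lookup in
    // arbitrary order and are re-ranked by the supplied similarity function.
    static List<String> rerank(String word, List<String> candidates, int numSug,
                               ToDoubleBiFunction<String, String> similarity,
                               Map<String, Integer> frequencies, boolean morePopular) {
        int wordFreq = frequencies.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.equals(word)) {
                continue; // this sketch never suggests the input word itself
            }
            // morePopular: drop candidates that are rarer than the word being checked
            if (morePopular && frequencies.getOrDefault(candidate, 0) < wordFreq) {
                continue;
            }
            kept.add(candidate);
        }
        // Highest similarity first; List.sort is stable, so ties keep first-stage order.
        kept.sort(Comparator.comparingDouble(
                (String candidate) -> similarity.applyAsDouble(word, candidate)).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(numSug, kept.size())));
    }
}
```

Because the final order comes from the second-stage similarity rather than from the order in which candidates arrive, a single suggestion is only as good as the pool it was chosen from, which is consistent with the advice to request several suggestions.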
public static final java.lang.String F_WORD
Field name for each word in the n-gram index.
public SpellChecker(Directory spellIndex, StringDistance sd) throws java.io.IOException
Uses the given directory as a spell checker index.
Parameters:
- spellIndex: the spell index directory
- sd: the StringDistance measurement to use
Throws:
- java.io.IOException: if the SpellChecker cannot open the directory

public SpellChecker(Directory spellIndex) throws java.io.IOException
Uses the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
Parameters:
- spellIndex: the spell index directory
Throws:
- java.io.IOException: if the SpellChecker cannot open the directory

public void setSpellIndex(Directory spellIndexDir) throws java.io.IOException
Uses a different index as the spell checker index, or re-opens the existing index if spellIndexDir is the same directory that was given to the constructor.
Parameters:
- spellIndexDir: the spell directory to use
Throws:
- AlreadyClosedException: if the SpellChecker is already closed
- java.io.IOException: if the SpellChecker cannot open the directory

public void setStringDistance(StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
Parameters:
- sd: the StringDistance implementation for this SpellChecker instance

public StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
Returns:
- the StringDistance instance used by this SpellChecker instance

public void setAccuracy(float minScore)
Sets the accuracy, 0 < minScore < 1; default 0.5.
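The parameter name minScore suggests a minimum similarity score that a suggestion must reach, with the documented range 0 < minScore < 1 and default 0.5. A minimal hypothetical sketch of such a threshold applied to already-scored suggestions follows. Keeping a score exactly equal to minScore, and rejecting out-of-range values with an exception, are assumptions of this sketch rather than documented behavior of setAccuracy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AccuracySketch {
    private float minScore = 0.5f; // mirrors the documented default of 0.5

    // Hypothetical setter: enforces the documented 0 < minScore < 1 range.
    // Whether the real setAccuracy validates its argument is not stated.
    void setAccuracy(float minScore) {
        if (!(minScore > 0f && minScore < 1f)) {
            throw new IllegalArgumentException("minScore must be in (0, 1): " + minScore);
        }
        this.minScore = minScore;
    }

    // Keeps suggestions whose score is at least minScore (assumed inclusive),
    // preserving the input order.
    Map<String, Float> filter(Map<String, Float> scored) {
        Map<String, Float> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Float> entry : scored.entrySet()) {
            if (entry.getValue() >= minScore) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```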
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug) throws java.io.IOException
Suggests similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit-distance strategy used to pick the best-matching spell-checked word from the hits Lucene found, so one usually has to retrieve several suggestions (numSug greater than 1) to get the true best match.
For example, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
- word: the word you want a spell check done on
- numSug: the number of suggested words
Throws:
- java.io.IOException: if the underlying index throws an IOException
- AlreadyClosedException: if the SpellChecker is already closed

public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, boolean morePopular) throws java.io.IOException
Suggests similar words, optionally restricted to a field of an index.
As with suggestSimilar(java.lang.String word, int numSug), the Lucene similarity used to fetch n-grammed terms differs from the edit-distance strategy that picks the best match among the hits, so with numSug == 1 the single suggestion may not be the best one. Set numSug to at least 5 for a good suggestion.
Parameters:
- word: the word you want a spell check done on
- numSug: the number of suggested words
- ir: the IndexReader of the user index (can be null; see the field parameter)
- field: the field of the user index; if field is not null, the suggested words are restricted to the words present in this field
- morePopular: return only suggested words that are as frequent as or more frequent than the searched word (applies only in restricted mode, that is, when ir != null and field != null)
Throws:
- java.io.IOException: if the underlying index throws an IOException
- AlreadyClosedException: if the SpellChecker is already closed

public void clearIndex() throws java.io.IOException
Removes all terms from the spell check index.
Throws:
- java.io.IOException
- AlreadyClosedException: if the SpellChecker is already closed

public boolean exist(java.lang.String word) throws java.io.IOException
Checks whether the word exists in the index.
Parameters:
- word: the word to check
Returns:
- true if the word exists in the index
Throws:
- java.io.IOException
- AlreadyClosedException: if the SpellChecker is already closed

public void indexDictionary(Dictionary dict, int mergeFactor, int ramMB) throws java.io.IOException
Indexes the data from the given Dictionary.
Parameters:
- dict: the Dictionary to index
- mergeFactor: the merge factor to use when indexing
- ramMB: the maximum amount of memory, in MB, to use
Throws:
- AlreadyClosedException: if the SpellChecker is already closed
- java.io.IOException

public void indexDictionary(Dictionary dict) throws java.io.IOException
Indexes the data from the given Dictionary.
Parameters:
- dict: the dictionary to index
Throws:
- java.io.IOException

public void close() throws java.io.IOException
Closes the IndexSearcher used by this SpellChecker.
Throws:
- java.io.IOException: if the close operation causes an IOException
- AlreadyClosedException: if the SpellChecker is already closed

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.