aspell: Aspell Suggestion Strategy
1
1 A.1 Aspell Suggestion Strategy
1 ==============================
1
1 The magic behind my spell checker comes from merging Lawrence Philips'
1 excellent metaphone algorithm with Ispell's near-miss strategy, which
1 consists of inserting a space or hyphen, interchanging two adjacent
1 letters, changing one letter, deleting a letter, or adding a letter.
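
As an illustration of the near-miss edits listed above, here is a minimal Python sketch. It is not taken from Ispell or Aspell: the function name `near_misses` and the lowercase ASCII alphabet are assumptions made for the example.

```python
import string


def near_misses(word, alphabet=string.ascii_lowercase):
    """Return every string that is one near-miss edit away from `word`.

    Hypothetical helper for illustration only; the alphabet is an
    assumption and would depend on the language being checked.
    """
    candidates = set()
    n = len(word)
    # Insert a space or hyphen between two letters.
    for i in range(1, n):
        candidates.add(word[:i] + " " + word[i:])
        candidates.add(word[:i] + "-" + word[i:])
    # Interchange two adjacent letters.
    for i in range(n - 1):
        candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Change one letter.
    for i in range(n):
        for c in alphabet:
            candidates.add(word[:i] + c + word[i + 1:])
    # Delete a letter.
    for i in range(n):
        candidates.add(word[:i] + word[i + 1:])
    # Add a letter at any position, including both ends.
    for i in range(n + 1):
        for c in alphabet:
            candidates.add(word[:i] + c + word[i:])
    # Changing a letter to itself reproduces the input, so drop it.
    candidates.discard(word)
    return candidates
```

A checker built this way would keep only the candidates that appear in its word list, for example `near_misses("teh") & known_words`, where `known_words` is a set supplied by the caller.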
1
1 The process goes something like this.
1
1 1. Convert the misspelled word to its soundslike equivalent (its
1 metaphone for English words).
1
1 2. Find all words that have a soundslike within one or two edit
1 distances of the original word's soundslike. The edit distance
1 is the total number of deletions, insertions, exchanges, or
1 adjacent swaps needed to make one string equivalent to the other.
1 When set to look only for soundslikes within one edit distance, it
1 tries all possible soundslike combinations and checks whether each
1 one is in the dictionary. When set to find all soundslikes within
1 two edit distances, it scans through the entire dictionary and
1 quickly scores each soundslike. The scoring is quick because it
1 gives up as soon as the two soundslikes are known to be more than
1 two edit distances apart.
1
1 3. Find misspelled words that have a correctly spelled replacement by
1 the same criteria as steps 1 and 2. That is, the misspelled word
1 in the word pair (such as "teh -> the") would appear in the
1 suggestions list as if it were a correct spelling.
1
1 4. Score the result list and return the words with the lowest score.
1 The score is roughly the weighted average of two weighted edit
1 distances: that between the word and the misspelled word, and that
1 between the soundslike equivalents of the two words. The weighted
1 edit distance is like the edit distance except that the various
1 edits have weights attached to them.
1
1 5. Replace the misspelled words that have correctly spelled
1 replacements with their replacements and remove any duplicates
1 that might arise because of this.
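
The edit distance from step 2, including the early give-up, can be sketched as follows. This is an illustrative Python version, not the code in `suggest.cc': the names `edit_distance` and `find_within` are hypothetical, every edit costs 1, and the soundslike strings are assumed to be supplied by the caller as a mapping from soundslike to words.

```python
def edit_distance(a, b, limit=2):
    """Count deletions, insertions, exchanges and adjacent swaps between a and b.

    Returns None as soon as the distance is known to exceed `limit`.
    """
    if abs(len(a) - len(b)) > limit:
        return None  # the length difference alone already exceeds the limit
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # exchange (or match)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent swap
        if min(cur) > limit:
            return None  # row minima never decrease, so give up early
        prev2, prev = prev, cur
    return prev[-1] if prev[-1] <= limit else None


def find_within(target, soundslike_index, limit=2):
    """Scan a {soundslike: [words]} mapping and return {word: distance}."""
    found = {}
    for soundslike, words in soundslike_index.items():
        d = edit_distance(target, soundslike, limit)
        if d is not None:
            for w in words:
                found[w] = d
    return found
```

The cutoff is what keeps a full dictionary scan cheap: most entries are rejected after one or two rows of the table, or by the length check before any row is computed.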
1
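Steps 4 and 5 can be sketched in the same spirit. The Python below is only an illustration under stated assumptions: the two distances are taken as already computed, the 0.5 weighting is a placeholder rather than Aspell's actual value, and the names `score` and `finalize` are hypothetical.

```python
def score(word_dist, soundslike_dist, word_weight=0.5):
    """Weighted average of the word distance and the soundslike distance.

    The default weight is a placeholder chosen for the example.
    """
    return word_weight * word_dist + (1 - word_weight) * soundslike_dist


def finalize(scored, replacements):
    """Order candidates by score, map misspellings to corrections, drop duplicates.

    scored:       iterable of (candidate, score) pairs; lower scores are better
    replacements: dict mapping a known misspelling to its correct spelling
    """
    result = []
    seen = set()
    for candidate, _ in sorted(scored, key=lambda pair: pair[1]):
        word = replacements.get(candidate, candidate)
        if word not in seen:  # keep only the best-scoring occurrence
            seen.add(word)
            result.append(word)
    return result
```

For example, if both "teh" and "the" reach the result list and the replacement table contains "teh -> the", only one "the" survives, placed at the position of whichever entry scored lower.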
1 Please note that the soundslike equivalent is a rough approximation
1 of how the word sounds. It is not the phoneme of the word by any
1 means. For more details about exactly how each step is performed,
1 please see the file `suggest.cc'. For more information on the metaphone
1 algorithm, please see the data file `english_phonet.dat'.
1