aspell: Words With Symbols in Them

C.2 Words With Spaces or Other Symbols in Them
==============================================

Many languages, including English, have words that contain non-letter
symbols, such as the apostrophe.  These symbols generally appear in the
middle of a word, but they can also appear at the end, as in an
abbreviation.  If a symbol can _only_ appear as part of a word then
Aspell can treat it as if it were a letter.

   However, most of these symbols have other uses.  For example, the
apostrophe is often used as a single quote and the abbreviation marker
is also used as a period.  Thus, Aspell cannot blindly treat them as if
they were letters.

   Aspell currently handles the case where the symbol can only appear in
the middle of the word fairly well.  It simply assumes that if there is
a letter both before and after the symbol then it is part of the word.
This works most of the time but it is not foolproof.  For example,
suppose the user forgot to leave a space after the period:

       ... and the dog went up the tree.Then the cat ...

Aspell would think "tree.Then" is one word.  A better solution might be
to then try to check "tree" and "Then" separately.  But what if one of
them is not in the dictionary?  Should Aspell assume "tree.Then" is one
word?
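
   The "letter on both sides" rule can be illustrated with a small
tokenizer.  This is only a sketch of the idea, not Aspell's actual
code: the function name `tokenize` and the symbol set `MIDDLE_SYMBOLS`
are assumptions made for the example, and the period is included only
to reproduce the "tree.Then" case.

```python
# Hypothetical sketch of the rule described above; not Aspell's tokenizer.
MIDDLE_SYMBOLS = {"'", "."}  # assumed set of symbols allowed inside a word


def tokenize(text):
    """Split text into words, keeping a symbol only when letters surround it."""
    words, current = [], []
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in MIDDLE_SYMBOLS and current
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # A letter precedes and a letter follows: treat the symbol
            # as part of the word.
            current.append(ch)
        else:
            # Any other character ends the current word, if there is one.
            if current:
                words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

Under these assumptions, `tokenize("the tree.Then the cat")` keeps
"tree.Then" together as a single token, which is exactly the failure
described above, while the period in "bone." is dropped because no
letter follows it.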

   The case where the symbol can appear at the beginning or end of the
word is more difficult to deal with.  The symbol may or may not
actually be part of the word.  Aspell currently handles this case by
first trying to spell check the word with the symbol and, if that
fails, trying it without.  The problem is, if the word is misspelled,
should Aspell assume the symbol belongs with the word or not?
Currently Aspell assumes it does, which is not always the correct thing
to do.
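
   The two-step lookup can be sketched as follows.  The word list,
the symbol set, and the function name `check_with_edge_symbol` are all
made up for illustration; the only part taken from the text is the
order of the checks and the fallback of keeping the symbol attached to
a misspelled word.

```python
# Hypothetical sketch; the dictionary and names are illustrative only.
DICTIONARY = {"etc.", "dog", "cat"}  # toy word list
EDGE_SYMBOLS = ".'"                  # assumed symbols allowed at a word edge


def check_with_edge_symbol(token):
    """Return (accepted, form): try the token as written, then stripped."""
    if token in DICTIONARY:
        return True, token
    stripped = token.strip(EDGE_SYMBOLS)
    if stripped != token and stripped in DICTIONARY:
        return True, stripped
    # Neither form is known.  Follow the behavior described above and
    # assume the symbol belongs to the misspelled word.
    return False, token
```

With this toy list, "etc." is accepted as written, "dog." is accepted
as "dog", and the misspelling "dgo." is reported with its period still
attached.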

   Numbers in words present a different challenge to Aspell.  If Aspell
treats numbers as letters then every possible number a user might write
in a document must be specified in the dictionary.  This could easily
be solved by having special code to assume all numbers are correctly
spelled.  Yet what about something like "4th"?  Since the "th" suffix
can appear after any number we are left with the same problem.  The
solution would be to have a special symbol for "any number".
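
   One way to picture the "any number" symbol is to rewrite every run
of digits to a placeholder before the dictionary lookup.  The
placeholder `#`, the toy dictionary, and the function names below are
assumptions for this sketch; the text does not say how such a symbol
would be represented.

```python
import re

# Hypothetical sketch; "#" is an assumed placeholder meaning "any number".
NUMBER = "#"
DICTIONARY = {"#", "#th", "#st", "#nd", "#rd", "tree"}  # toy entries


def normalize_numbers(word):
    """Replace each run of digits with the "any number" placeholder."""
    return re.sub(r"\d+", NUMBER, word)


def is_known(word):
    """Look the word up after collapsing its digits to the placeholder."""
    return normalize_numbers(word) in DICTIONARY
```

Here a single entry "#th" covers "4th", "11th", and every other number
followed by "th".  The sketch also accepts forms such as "2th", since
the placeholder carries no information about which number it stands
for.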

   Words with spaces in them, such as foreign phrases, are even more
trouble to deal with.  The basic problem is that when tokenizing a
string there is no good way to keep phrases together.  One solution is
to use trial and error.  If a word is not in the dictionary, try
grouping it with the previous or next word and see if the combined word
is in the dictionary.  But if the combined word is not in the
dictionary either, should the misspelled word be grouped when looking
for suggestions?  One solution is to also store each part of the phrase
in the dictionary, but tag it as part of a phrase and not an
independent word.
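
   The trial-and-error grouping, together with the idea of tagging
phrase parts, might look like the sketch below.  Everything in it is
hypothetical: the word sets, the status labels, and the function name
`check_tokens` are invented for the example, and Aspell does not
necessarily work this way.

```python
# Hypothetical sketch; word lists and status labels are illustrative only.
WORDS = {"the", "plan", "was", "per"}  # toy independent words
PHRASES = {"ad hoc", "per se"}         # toy entries containing a space
# Each part of a phrase, tagged as valid only inside a phrase.
PHRASE_PARTS = {part for phrase in PHRASES for part in phrase.split()}


def check_tokens(tokens):
    """Return (text, status) pairs, grouping neighbors into known phrases."""
    results = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word in WORDS:
            results.append((word, "ok"))
        elif i + 1 < len(tokens) and word + " " + tokens[i + 1] in PHRASES:
            # Grouping with the next word gives a known phrase.
            results.append((word + " " + tokens[i + 1], "phrase"))
            i += 1
        elif results and results[-1][0] + " " + word in PHRASES:
            # Grouping with the previous entry gives a known phrase.
            results[-1] = (results[-1][0] + " " + word, "phrase")
        elif word in PHRASE_PARTS:
            # Known only as part of a phrase, not as a word by itself.
            results.append((word, "phrase-part"))
        else:
            results.append((word, "unknown"))
        i += 1
    return results
```

The "phrase-part" status is what the tagging buys: for input such as
"ad hok", the checker can tell that "ad" only makes sense as part of a
phrase, which hints that suggestions should be sought for the grouped
text rather than for "hok" alone.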

   To further complicate things, most applications that use spell
checkers are accustomed to parsing the document themselves and sending
it to the spell checker a word at a time.  In order to support words
with spaces in them a more complicated interface will be required.