aspell: Affix Compression

7.6 Affix Compression
=====================

As of version 0.60, Aspell supports affix compression.  The code is
derived from MySpell, which is found in OpenOffice.

   To add support for affix compression, add the following lines to the
language data file.

     affix          LANG
     affix-compress true

   The line `affix LANG' adds support for recognizing affix
information, and the line `affix-compress true' enables affix
compression.

   The affix file is expected to be named `LANG_affix.dat'.  It uses
exactly the same format as MySpell affix files.  More information can
be found in the myspell/ directory of the distribution or at
`http://lingucomponent.openoffice.org/dictionary.html'.

   Affix compression can also be used with soundslike lookup.  Aspell
does this by storing the soundslike only for the root word.  When a
word is misspelled, Aspell searches for soundslikes that are close to
each of the possible roots of the misspelled word.
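
   As a rough illustration of what "all possible roots" might mean, the
following Python sketch undoes the entries of the English suffix `D'
(taken from the section on the format of the affix file) to list
candidate roots for a word.  The function name `possible_roots' and the
idea of treating each condition as a regular expression are assumptions
made for illustration; this is not Aspell's actual implementation.

```python
import re

# Entries of the English suffix "D" as (strip, add, condition).
# An empty string stands for the "0" (NULL string) of the affix file.
SUFFIX_D = [
    ("", "d", "e"),
    ("y", "ied", "[^aeiou]y"),
    ("", "ed", "[^ey]"),
    ("", "ed", "[aeiou]y"),
]


def possible_roots(word, entries=SUFFIX_D):
    """Return candidate roots obtained by undoing each suffix entry."""
    roots = [word]  # the word itself may already be a root
    for strip, add, condition in entries:
        if not word.endswith(add):
            continue
        # Remove the added characters and put back the stripped ones.
        stem = word[: len(word) - len(add)] + strip
        # The condition must hold at the end of the candidate root.
        if re.search("(?:" + condition + ")$", stem) and stem not in roots:
            roots.append(stem)
    return roots


print(possible_roots("supplied"))
```

   Under this sketch, a soundslike lookup would be attempted for every
candidate in the returned list, including ones that are not real words.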

   When no soundslike information is used, or only the simple
soundslike, it may be beneficial to specify the option
`partially-expand'.  This option partially expands a word with affix
information so that the affix flags do not affect the first 3 letters
of the word.  Aspell can then get more accurate results when scanning
the list for near misses, since the full word can be used and not just
the root.  Specifying this option, however, also effectively expands
any prefixes, so it should not be used for prefix-heavy languages such
as Hebrew.

   An existing word list without affix information can be affix
compressed using `aspell munch-list'.

7.6.1 Format of the Affix File
------------------------------

An affix is either a prefix or a suffix attached to root words to make
other words.  For example, supply -> supplied is formed by dropping the
"y" and adding "ied" (the suffix).

   Here is an example of how to define one specific suffix borrowed
from the English affix file.

     SFX D Y 4
     SFX D   0     d          e
     SFX D   y     ied        [^aeiou]y
     SFX D   0     ed         [^ey]
     SFX D   0     ed         [aeiou]y

   This file is space delimited and case sensitive.  The information
can be interpreted as follows:

   The first line has 4 fields:

1    SFX         indicates this is a suffix
2    D           is the name of the character which represents this suffix
3    Y           indicates it can be combined with prefixes (cross product)
4    4           indicates that a sequence of 4 affix entries is needed to
                 properly store the affix information

   The remaining lines describe the unique information for the 4 affix
entries that make up this affix.  Each line can be interpreted as
follows (note that fields 1 and 2 are used as a check against the
information on line 1):

1    SFX         indicates this is a suffix
2    D           is the name of the character which represents this affix
3    y           the string of chars to strip off before adding affix (a 0
                 here indicates the NULL string)
4    ied         the string of affix characters to add (a 0 here indicates
                 the NULL string)
5    [^aeiou]y   the conditions which must be met before the affix can be
                 applied
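
   The field layout described above can be sketched as a small Python
parser.  The names `Header', `Entry' and `parse_affix_block' are
hypothetical and chosen only for this illustration; the sketch handles
a single header line followed by its entries and is not a full affix
file reader.

```python
from collections import namedtuple

Header = namedtuple("Header", "kind flag cross_product n_entries")
Entry = namedtuple("Entry", "kind flag strip add condition")


def parse_affix_block(lines):
    """Parse one header line plus its entry lines into (Header, [Entry])."""
    kind, flag, cross, count = lines[0].split()
    header = Header(kind, flag, cross == "Y", int(count))
    entries = []
    for line in lines[1:]:
        e_kind, e_flag, strip, add, condition = line.split()
        # Fields 1 and 2 act as a check against the header line.
        if (e_kind, e_flag) != (kind, flag):
            raise ValueError("entry does not match header: " + line)
        # A "0" in field 3 or 4 stands for the NULL (empty) string.
        entries.append(Entry(e_kind, e_flag,
                             "" if strip == "0" else strip,
                             "" if add == "0" else add,
                             condition))
    if len(entries) != header.n_entries:
        raise ValueError("expected %d entries, found %d"
                         % (header.n_entries, len(entries)))
    return header, entries


block = """\
SFX D Y 4
SFX D   0     d          e
SFX D   y     ied        [^aeiou]y
SFX D   0     ed         [^ey]
SFX D   0     ed         [aeiou]y
""".splitlines()

header, entries = parse_affix_block(block)
print(header)
print(entries[1])
```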

   Field 5 is interesting.  Since this is a suffix, field 5 tells us
that there are 2 conditions that must be met.  The first condition is
that the next-to-last character in the word must _not_ be any of
"a", "e", "i", "o" or "u".  The second condition is that the last
character of the word must be "y".
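
   To make the role of field 5 concrete, here is a minimal Python
sketch that applies a single suffix entry to a word.  It assumes that
the condition can be treated as a regular expression anchored at the
end of the word, which holds for the simple character classes used in
this example; the function name `apply_suffix' is hypothetical.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Return the affixed word, or None when the entry does not apply."""
    # For a suffix, the condition is matched against the end of the word.
    if not re.search("(?:" + condition + ")$", word):
        return None
    if strip:
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + add


print(apply_suffix("supply", "y", "ied", "[^aeiou]y"))  # supplied
print(apply_suffix("convey", "y", "ied", "[^aeiou]y"))  # None
```

   The second call returns None because the next-to-last character of
"convey" is a vowel, so the first condition is not met.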

7.6.2 When Compared With Ispell
-------------------------------

For comparison purposes, here is the same information from the Ispell
`english.aff' compression file, which was used as the basis for the OOo
one.

     flag *D:
         E           >       D               # As in create > created
         [^AEIOU]Y   >       -Y,IED          # As in imply > implied
         [^EY]       >       ED              # As in cross > crossed
         [AEIOU]Y    >       ED              # As in convey > conveyed

   The Ispell file contains exactly the same information but in a
slightly different (case-insensitive) format.

   The mapping from the Ispell .aff format to the OOo format is as
follows:

  1. The Ispell english.aff has flag D under the "suffix" section, so
     you know it is a suffix.

  2. The D is the character assigned to this suffix.

  3. `*' indicates that it can be combined with prefixes.

  4. Each line following the : describes one of the affix entries
     needed to define this suffix.

        * The first field is the conditions that must be met.

        * The second field, which follows the > and is present only
          when a "-" occurs, is the string to strip off (can be
          blank).

        * The third field is the string to add (the affix).

   In addition, all chars in Ispell aff files are written in uppercase.
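
   The mapping above can be sketched in Python as follows.  The
function name `ispell_to_myspell' and the representation of each Ispell
rule as a (condition, replacement) pair are assumptions made for this
illustration.  The sketch covers only suffix rules of the two shapes
shown in the example ("ADD" and "-STRIP,ADD") and lowercases everything,
which suits the English example but may not suit other languages.

```python
def ispell_to_myspell(flag, cross_product, rules):
    """Convert Ispell suffix rules into MySpell-style SFX lines."""
    lines = ["SFX %s %s %d" % (flag, "Y" if cross_product else "N",
                               len(rules))]
    for condition, replacement in rules:
        strip, add = "0", replacement
        if replacement.startswith("-"):
            # "-Y,IED" means: strip "Y", then add "IED".
            strip, add = replacement[1:].split(",", 1)
        # Ispell rules are written in uppercase; the OOo file is lowercase.
        lines.append("SFX %s   %s     %s     %s"
                     % (flag, strip.lower(), add.lower(), condition.lower()))
    return lines


# The rules of "flag *D:" from the Ispell example.
RULES = [
    ("E", "D"),
    ("[^AEIOU]Y", "-Y,IED"),
    ("[^EY]", "ED"),
    ("[AEIOU]Y", "ED"),
]

converted = ispell_to_myspell("D", True, RULES)
print("\n".join(converted))
```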

7.6.3 Specifying Affix Flags
----------------------------

Affix flags are specified in the word list by placing them after the
`/' character:

     WORD/FLAGS

   For example:

     create/DG

will associate the `D' and `G' flags with the word create.
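
   Reading such an entry can be sketched in a few lines of Python.  The
function name `split_entry' is hypothetical, and the sketch assumes
that every flag is a single character, as in the examples in this
section.

```python
def split_entry(entry):
    """Split a word list entry into (word, list of flag characters)."""
    word, _, flags = entry.partition("/")
    return word, list(flags)


print(split_entry("create/DG"))  # ('create', ['D', 'G'])
print(split_entry("cat"))        # ('cat', [])
```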