aspell: Affix Compression
1
1 7.6 Affix Compression
1 =====================
1
1 Aspell, as of version 0.60, supports affix compression. The
1 codebase comes from MySpell, found in OpenOffice.
1
1 To add support for affix compression, add the following lines to the
1 language data file:
1
1 affix LANG
1 affix-compress true
1
1 The line `affix LANG' adds support for recognizing affix
1 information, and the line `affix-compress true' enables affix
1 compression.
1
1 The affix file is expected to be named `LANG_affix.dat'. It uses
1 exactly the same format as the affix files used by MySpell. More
1 information can be found in the myspell/ directory of the distribution
1 or at `http://lingucomponent.openoffice.org/dictionary.html'.
1
1 Affix compression can also be used with soundslike lookup. Aspell
1 does this by only storing the soundslike for the root word. When a
1 word is misspelled it will search for a soundslike close to all
1 possible roots of the misspelled word.
1
1 When no soundslike information, or the simple soundslike, is used, it
1 may be beneficial to specify the option `partially-expand' which will
1 partially expand a word with affix information so that the affix flags
1 do not affect the first 3 letters of the word. This will allow Aspell
1 to get more accurate results when scanning the list for near misses
1 since the full word can be used and not just the root. Specifying this
1 option, however, will also effectively expand any prefixes. Thus this
1 option should not be used for prefix-heavy languages such as Hebrew.
1
1 An existing word list, without affix info, can be affix compressed
1 using `aspell munch-list'.
1
1 7.6.1 Format of the Affix File
1 ------------------------------
1
1 An affix is either a prefix or a suffix attached to root words to make
1 other words. For example, supply -> supplied is formed by dropping the
1 "y" and adding "ied" (the suffix).
1
1 Here is an example of how to define one specific suffix borrowed
1 from the English affix file.
1
1 SFX D Y 4
1 SFX D 0 d e
1 SFX D y ied [^aeiou]y
1 SFX D 0 ed [^ey]
1 SFX D 0 ed [aeiou]y
1
1 This file is space delimited and case sensitive. So this information
1 can be interpreted as follows:
1
1 The first line has 4 fields:
1
1 1 SFX indicates this is a suffix
1 2 D is the name of the character which represents this suffix
1 3 Y indicates it can be combined with prefixes (cross product)
1 4 4 indicates that a sequence of 4 affix entries is needed to
1 properly store the affix information
1
1 The remaining lines describe the unique information for the 4 affix
1 entries that make up this affix. Each line can be interpreted as
1 follows: (note fields 1 and 2 are used as a check against line 1 info)
1
1 1 SFX indicates this is a suffix
1 2 D is the name of the character which represents this affix
1 3 y the string of chars to strip off before adding affix (a 0
1 here indicates the NULL string)
1 4 ied the string of affix characters to add (a 0 here indicates
1 the NULL string)
1 5 [^aeiou]y the conditions which must be met before the affix can be
1 applied
1
1 Field 5 is interesting. Since this is a suffix, field 5 tells us
1 that there are 2 conditions that must be met. The first condition is
1 that the next-to-last character in the word must _not_ be any of the
1 following: "a", "e", "i", "o" or "u". The second condition is that the
1 last character of the word must be "y".
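
The way the five fields combine can be illustrated with a short Python
sketch. It is not Aspell's or MySpell's implementation. The function
names `parse_suffix_block' and `apply_suffix' are hypothetical, and the
sketch assumes that the condition field can be treated as a Python
regular expression anchored at the end of the word, which holds for the
simple character classes used in this example.

```python
import re

# The suffix block from the example above, copied verbatim.
AFFIX_TEXT = """\
SFX D Y 4
SFX D 0 d e
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
"""


def parse_suffix_block(text):
    """Parse one SFX block into (flag, cross_product, entries)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    kind, flag, cross, count = lines[0]
    if kind != "SFX":
        raise ValueError("this sketch only handles suffixes")
    entries = []
    for fields in lines[1:1 + int(count)]:
        entry_kind, entry_flag, strip, add, cond = fields
        # Fields 1 and 2 are a check against the header line.
        if (entry_kind, entry_flag) != (kind, flag):
            raise ValueError("entry does not match its header")
        strip = "" if strip == "0" else strip  # 0 means the NULL string
        add = "" if add == "0" else add
        # Assumption: the condition is a valid regex fragment that must
        # match at the end of the word.
        entries.append((strip, add, re.compile("(?:" + cond + ")$")))
    return flag, cross == "Y", entries


def apply_suffix(word, entries):
    """Return the affixed form of word, or None if no entry applies."""
    for strip, add, cond in entries:
        if cond.search(word) and word.endswith(strip):
            return word[:len(word) - len(strip)] + add
    return None


flag, cross, entries = parse_suffix_block(AFFIX_TEXT)
for root in ("create", "supply", "cross", "convey"):
    print(root, "->", apply_suffix(root, entries))
```

Under these assumptions the loop expands "supply" through the second
entry, because "l" is not a vowel and the word ends in "y", while
"convey" falls through to the fourth entry.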
1
1 7.6.2 When Compared With Ispell
1 -------------------------------
1
1 Now for comparison purposes, here is the same information from the
1 Ispell `english.aff' compression file which was used as the basis for
1 the OOo one.
1
1 flag *D:
1 E > D # As in create > created
1 [^AEIOU]Y > -Y,IED # As in imply > implied
1 [^EY] > ED # As in cross > crossed
1 [AEIOU]Y > ED # As in convey > conveyed
1
1 The Ispell file carries exactly the same information, but in a
1 slightly different (case-insensitive) format.
1
1 The Ispell .aff format maps onto our OOo format as follows:
1
1 1. The Ispell english.aff has flag D under the "suffix" section so
1 you know it is a suffix.
1
1 2. The D is the character assigned to this suffix
1
1 3. `*' indicates that it can be combined with prefixes
1
1 4. Each line following the : describes the affix entries needed to
1 define this suffix
1
1 * The first field is the conditions that must be met.
1
1 * The second field, which follows the > and is present only if a
1 "-" occurs, is the string to strip off (can be blank).
1
1 * The third field is the string to add (the affix)
1
1 In addition all chars in Ispell aff files are in uppercase.
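
The mapping described above can be sketched in Python as a
line-by-line conversion. This is only an illustration, not a tool
shipped with Aspell. The function name `ispell_rule_to_myspell' is
hypothetical, the sketch handles suffix rules only, and it assumes that
lowercasing the condition and affix strings is the right conversion,
as it is for the English example.

```python
def ispell_rule_to_myspell(flag, rule):
    """Convert one Ispell suffix rule line to a MySpell SFX entry line."""
    rule = rule.split("#", 1)[0]  # drop the trailing comment, if any
    cond, action = (part.strip() for part in rule.split(">", 1))
    strip, add = "", action
    if action.startswith("-"):
        # "-Y,IED" means: strip "Y", then add "IED".
        strip, add = action[1:].split(",", 1)
    # Ispell allows spaces inside the condition; MySpell fields are
    # space delimited, so remove them.
    cond = cond.replace(" ", "").lower()
    strip = strip.strip().lower() or "0"  # 0 stands for the NULL string
    add = add.strip().lower() or "0"
    return "SFX {} {} {} {}".format(flag, strip, add, cond)


ISPELL_RULES = [
    "E > D # As in create > created",
    "[^AEIOU]Y > -Y,IED # As in imply > implied",
    "[^EY] > ED # As in cross > crossed",
    "[AEIOU]Y > ED # As in convey > conveyed",
]

# Header line: "Y" because the Ispell flag is marked with `*'.
print("SFX D Y {}".format(len(ISPELL_RULES)))
for rule in ISPELL_RULES:
    print(ispell_rule_to_myspell("D", rule))
```

For the four Ispell rules shown earlier, this reproduces the SFX block
from the previous section.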
1
1 7.6.3 Specifying Affix Flags
1 ----------------------------
1
1 Affix flags are specified in the word list by specifying them after the
1 `/' character:
1
1 WORD/FLAGS
1
1 For example:
1
1 create/DG
1
1 will associate the `D' and `G' flags with the word create.
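
As a minimal sketch of how such an entry could be split into its word
and flags, the following Python fragment may help. The function name
`split_entry' is hypothetical, and the sketch assumes that each flag is
a single character, as in the examples in this section.

```python
def split_entry(entry):
    """Split a word list entry of the form WORD/FLAGS."""
    word, _, flags = entry.strip().partition("/")
    # Assumption: every flag is exactly one character.
    return word, list(flags)


print(split_entry("create/DG"))
print(split_entry("supply"))
```

An entry without a `/' simply yields an empty flag list.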
1