aspell: Compound Words
1
1 C.1 Compound Words
1 ==================
1
1 In some languages, such as German, it is acceptable to string two words
1 together, thus forming a compound word. However, there are rules
1 governing when this can be done. Furthermore, it is not always sufficient to
1 simply concatenate the two words. For example, sometimes a letter is
1 inserted between the two words. Aspell currently has support for
1 unconditionally stringing words together. I tried implementing more
1 sophisticated support for compound words in Aspell but it was too
1 limiting and no one used it.
1
1 After receiving feedback from several people it seems that acceptable
1 support for compound words involves two basically independent parts.
1 If this is not sufficient for your language please let me know.
1
1 Part One
1 ========
1
1 Describes how the word needs to be changed when forming a compound:
1
1 CMP <flag> <strip> <add> <cond> <cond2>
1
1 <flag> is the compound flag
1 <strip> is the string to strip or 0 for the null string
1 <add> is the string to add or 0 for the null string
1 <cond> is the condition to match at the end of the current word
1 <cond2> is the condition to match at the beginning of the next word
1
1 All but the last field are the same as a suffix entry in the existing
1 affix code.
1
1 <cond> is a simplified regular expression. Some examples:
1 . (for anything)
1 e
1 [^aeiou]y
1 [^ey]
1 [aeiou]y
1
1 It does not seem necessary to change the beginning of a word when
1 forming compounds.
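The proposed CMP entry can be illustrated with a minimal Python sketch. This is not Aspell code: the names `CompoundRule`, `parse_cmp`, and `join_words` are hypothetical, the example rules are invented for illustration, and the sketch assumes the simplified condition syntax (`.`, literal letters, `[...]`, `[^...]`) can be passed straight to Python's `re` module.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundRule:
    """One hypothetical 'CMP <flag> <strip> <add> <cond> <cond2>' entry."""
    flag: str
    strip: str
    add: str
    cond: str   # must match at the end of the current word
    cond2: str  # must match at the beginning of the next word


def parse_cmp(line):
    """Parse a CMP line; '0' stands for the null string in strip/add."""
    keyword, flag, strip, add, cond, cond2 = line.split()
    if keyword != "CMP":
        raise ValueError("not a CMP entry: %r" % line)
    return CompoundRule(
        flag,
        "" if strip == "0" else strip,
        "" if add == "0" else add,
        cond,
        cond2,
    )


def join_words(rule, word, next_word):
    """Return the joined form, or None if the rule does not apply."""
    # Assumption: the simplified conditions are valid Python regexes.
    if not re.search("(?:%s)$" % rule.cond, word):
        return None
    if not re.match(rule.cond2, next_word):
        return None
    if not word.endswith(rule.strip):
        return None
    stem = word[:len(word) - len(rule.strip)]
    return stem + rule.add + next_word


# Invented example rules: insert a linking "s" after a final "t",
# and drop a final "e" before the next word.
link_s = parse_cmp("CMP S 0 s t .")
drop_e = parse_cmp("CMP E e 0 e .")

print(join_words(link_s, "arbeit", "zeit"))
print(join_words(drop_e, "schule", "buch"))
```

Only the end of the first word is modified here, in line with the observation above that the beginning of a word does not seem to need changing.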
1
1 Part Two
1 ========
1
1 Describes the position a word can appear in (beginning, middle, or end)
1 and with which words.
1
1 To do this, each word can be assigned a category. Then each category
1 can be given a set of rules to describe how it can be used in a
1 compound word. For example:
1
1 A + B: indicates that category A may appear at the beginning of a
1 word when followed by a category B word. When combined it is then
1 considered a category B word.
1 A + C + B: here a C word may only appear between an A word and a B word
1 A + A + B
1 A + A
1 A + A + A
1 etc.
1
1 I have not decided if a word should be allowed to belong to more than
1 one category, since a new category can be created if necessary to hold
1 words that belong to both category A and category B, for example.
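The category rules can be sketched in Python along the following lines. This is an illustration only, not an Aspell data structure: the rule set simply mirrors the examples listed above, the word list is invented, each word is given exactly one category, and the sketch assumes that a compound always takes the category of its last part (the text states this only for the A + B case).

```python
# Each rule is a sequence of categories that may form a compound.
RULES = {
    ("A", "B"),
    ("A", "C", "B"),
    ("A", "A", "B"),
    ("A", "A"),
    ("A", "A", "A"),
}


def compound_category(parts, categories, rules=RULES):
    """Return the category of the compound made of `parts`, or None.

    Assumption: the result takes the category of the last part.
    """
    try:
        seq = tuple(categories[p] for p in parts)
    except KeyError:
        return None  # some part is not a known word
    return seq[-1] if seq in rules else None


def segmentations(word, categories):
    """Yield every way to split `word` into known words."""
    if not word:
        yield ()
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in categories:
            for rest in segmentations(word[i:], categories):
                yield (head,) + rest


def is_valid_compound(word, categories, rules=RULES):
    """True if some split of `word` into two or more parts obeys a rule."""
    return any(
        len(parts) > 1 and compound_category(parts, categories, rules)
        for parts in segmentations(word, categories)
    )


# Invented word list: one category per word.
WORDS = {"sun": "A", "flower": "B", "and": "C"}

print(compound_category(("sun", "flower"), WORDS))
print(is_valid_compound("flowersun", WORDS))
```

Giving each word a single category keeps the lookup trivial. If words were allowed to belong to several categories, the lookup would have to try every combination, which is one argument for the "create a new combined category" approach mentioned above.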
1
1 C.1.1 To Implement
1 ------------------
1
1 To implement support for compound words based on the above description
1 the following will need to be done:
1
1 1. expand the affix code to support special compound flags as
1 described in part one
1
1 2. write code to store the conditions as described in part two
1
1 3. expand the compound checking code to check against the conditions
1
1 4. expand the dictionary format to store the necessary compound info
1 with the word
1
1
1 I don't know when I will be able to actually implement this. If you
1 would like to try please let me know.
1