aspell: The Language Data File
1
1 7.1 The Language Data File
1 ==========================
1
1 The basic format of the language data file is the same as it is for the
1 Aspell configuration file. It is named `LANG.dat' and is located in
1 the architecture independent data dir for Aspell (option `data-dir')
1 which is usually `PREFIX/share/aspell'. Use `aspell config' to find
1 out where it is in your installation. By convention the language name
1 should be the two letter ISO 639 language code if one exists;
1 otherwise use the three letter code.
1
1 The language data file has two mandatory fields and several
1 optional ones. All fields are case sensitive and should be in all
1 lower case.
1
1 The two mandatory fields are `name' and `charset'.
1
1 `name' is the name of the language and should be the same as the
1 file name (without the `.dat').
1
1 `charset' is the 8-bit character set Aspell will expect the word
1 lists to be formatted in. If possible choose from one of the standard
1 ones provided with Aspell. These are `iso-8859-*', `koi8-*', or
1 `viscii'. If your language does not require any non-ASCII characters
1 choose `iso-8859-1'. If none of these standard character sets is
1 suitable for your language then you can create a new one. See
1 ⇒Creating A New Character Set.
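
As a minimal sketch, a language data file containing only the two
mandatory fields might look like the following. The file name `en.dat'
and the choice of English are assumptions made purely for illustration.

```
name en
charset iso-8859-1
```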
1
1 The optional fields are as follows:
1
1 `data-encoding'
1 The encoding the language data files are expected to be in as well
1 as the default encoding to use when saving the personal
1 dictionaries. It can be either `utf-8' or any of the 8-bit
1 encodings that Aspell supports. If not set, then it defaults to
1 `charset'.
1
1 `special'
1 Non-letter characters that can appear in your language such as the
1 `'' and `-'. The format for the value is a list separated by
1 spaces. Each item of the list has the following format.
1
1 <char> <begin><middle><end>
1
1 CHAR is the non-letter character in question. BEGIN, MIDDLE, END
1 are either a `-' or a `*'. A star for BEGIN means that the
1 character can begin a word, a `-' means it can't. The same is
1 true for MIDDLE and END. For example, the entry for the `'' in
1 English is:
1
1 ' -*-
1
1 To include more than one middle character just list them one after
1 another on the same line. For example, to make both the `'' and
1 the `-' a middle character, use the following line in the language
1 data file:
1
1 special ' -*- - -*-
1
1 However, please be aware that adding special characters can have
1 unintended consequences due to limitations of Aspell. For example,
1 if the `-' were accepted as a middle character, then _every_ word
1 with a `-' in it would be flagged as a spelling error unless that
1 exact word is in the dictionary, even if both parts are in the
1 dictionary. Also, having a `.' as an end character will cause the
1 `.' to be part of any misspelled word, which can get very
1 annoying if you misspell a word at the end of a sentence.
1
1 `soundslike'
1 The name of the soundslike data for the language. The data is
1 expected to be in the file `NAME_phonet.dat'.
1
1 If NAME is `simpile' then a very simple soundslike is used. This
1 is not as powerful as full phonetic soundslike but it can be
1 computed a lot faster. (⇒The Simple Soundslike)
1
1 If the soundslike name is `none', or this option is not specified,
1 then no soundslike will be used. The effective soundslike is the
1 word converted to all lowercase and possibly with accents stripped
1 depending on the `store-as' option. For languages with phonetic
1 spelling the difference compared with a phonetic soundslike will
1 not be very noticeable. However, for languages with non-phonetic
1 spelling there will be a noticeable difference, and how large it
1 is will depend on the quality of the soundslike data file. If you
1 do not notice much of a difference for a language with
1 non-phonetic spelling, that is a good indication that the
1 soundslike data is not rough enough--or the words you are trying
1 are not that badly misspelled.
1
1 `invisible-soundslike'
1 Avoid storing the soundslike information with the word. Instead
1 it is computed as needed. This option defaults to true if the
1 soundslike is `none' or `simpile', and false when a phonetic
1 soundslike is used.
1
1 `repl-table'
1 ⇒Replacement Tables.
1
1 `keyboard'
1 The base name of the keyboard definition file to use. For more
1 information see ⇒Notes on Typo-Analysis.
1
1 `sug-split-char'
1 The characters that may be inserted between the two words when a
1 word is split. This is a list option.
1
1 `affix'
1 `affix-compress'
1 `partially-expand'
1 ⇒Affix Compression.
1
1 `store-as'
1 How the words are indexed in the dictionary. If "stripped" then
1 the word is indexed in a lower case and de-accented form. If
1 "lower", then the word is indexed in a lower case form but with
1 accent info still intact. This just controls how the word is
1 indexed, not how it is stored. The default is "stripped" unless
1 affix compression is used.
1
1 `norm-required'
1 Should be set to true if your language makes use of private use
1 characters or when Normalization Form C is not the same as full
1 composition.
1
1 `normalize'
1
1 `norm-form'
1
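A sketch combining a few of the optional fields described above with
the mandatory ones is shown below. The values are illustrative
assumptions, not a copy of any shipped dictionary. In particular,
`soundslike en' presumes that a file named `en_phonet.dat' exists
alongside the language data file.

```
name en
charset iso-8859-1
data-encoding utf-8
special ' -*-
soundslike en
invisible-soundslike false
store-as lower
```
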
1 Additional options include those that control how run-together words
1 are handled; they work the same way as in the normal configuration
1 files. For more information, see ⇒Controlling the Behavior of
1 Run-together Words.
1