liblouis: Translation Opcodes

2.8 Translation Opcodes
=======================

These opcodes define the braille representations for character
sequences. Each of them defines an entry within the contraction table.
These entries may be defined in any order except, as noted below, when
they define alternate representations for the same character sequence.

Each of these opcodes specifies a condition under which the
translation is legal, and each also has a characters operand and a dots
operand. The text being translated is processed strictly from left to
right, character by character, with the most eligible entry for each
position being used. If there is more than one eligible entry for a
given position in the text, then the one with the longest character
string is used. If there is more than one eligible entry for the same
character string, then the one defined first is tested for legality
first. (This is the only case in which the order of the entries makes a
difference.)
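
The selection rule described above can be modelled in a few lines of
Python. This is only a sketch under stated assumptions, not the
liblouis implementation: the 'Rule' type, 'select_rule' and 'is_legal'
are hypothetical names, only the 'always' and 'word' conditions are
modelled, the dot patterns are illustrative, and falling back to a
shorter string when a longer one is not legal is this sketch's reading
of "eligible".

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    """One table entry (hypothetical model, not a liblouis type)."""
    opcode: str
    chars: str
    dots: str


def select_rule(
    rules: Sequence[Rule],
    text: str,
    pos: int,
    is_legal: Callable[[Rule, str, int], bool],
) -> Optional[Rule]:
    """Pick the entry to use at text[pos:], or None if nothing applies."""
    # Keep only entries whose characters operand matches at this position,
    # remembering each entry's definition order.
    candidates = [
        (index, rule)
        for index, rule in enumerate(rules)
        if rule.chars and text.startswith(rule.chars, pos)
    ]
    # Longest character string first; equal strings in definition order.
    candidates.sort(key=lambda item: (-len(item[1].chars), item[0]))
    for _, rule in candidates:
        if is_legal(rule, text, pos):
            return rule
    return None


def is_legal(rule: Rule, text: str, pos: int) -> bool:
    """Assumed legality check covering only 'always' and 'word'."""
    if rule.opcode == "always":
        return True
    if rule.opcode == "word":
        end = pos + len(rule.chars)
        before_ok = pos == 0 or not text[pos - 1].isalpha()
        after_ok = end == len(text) or not text[end].isalpha()
        return before_ok and after_ok
    return False


# Illustrative entries; two of them share the character string "th".
RULES = [
    Rule("always", "t", "2345"),
    Rule("word", "the", "2346"),
    Rule("word", "th", "1456"),
    Rule("always", "th", "1456"),
]
```

With these entries, 'select_rule(RULES, "then", 0, is_legal)' skips the
'word' entries for "the" and "th" because a letter follows, and settles
on the 'always' entry for "th". For the text "th" alone, the 'word'
entry wins because it was defined before the 'always' entry for the
same string.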

The characters operand is a sequence or string of characters preceded
and followed by whitespace. Each character can be entered in the normal
way, or it can be defined as a four-digit hexadecimal number preceded by
'\x'.
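
As an illustration of the '\x' notation, the following Python sketch
expands four-digit hexadecimal escapes in a characters operand. The
function name 'decode_characters' is hypothetical, and the sketch
deliberately handles nothing but '\x' followed by exactly four hex
digits; any other escape (such as the '\s' that appears in the 'noback'
example) is left untouched.

```python
import re

# "\x" followed by exactly four hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{4})")


def decode_characters(operand: str) -> str:
    """Expand \\xHHHH escapes in a characters operand (sketch only)."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), operand)
```

For example, 'decode_characters(r"\x20AC")' yields the euro sign, and an
operand without escapes, such as "www", comes back unchanged.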

The dots operand defines the braille representation for the
characters operand. It may also be specified as an equals sign ('=').
This means that the default representation for each character (see
Character-Definition Opcodes) within the sequence is to be used. Note,
however, that the '=' shortcut for dot patterns is deprecated. Dot
patterns should be written out; otherwise back-translation may not be
correct.
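
To make the dot-pattern notation concrete, here is a small Python
sketch that turns a dots operand such as '456-2456' into Unicode
braille characters. It assumes that cells are separated by hyphens,
that the digits 1 to 8 name the dots of one cell, and that '0' stands
for a blank cell; virtual dots and the '=' shortcut are not handled.
The function name 'dots_to_unicode' is hypothetical. The mapping to
Unicode relies on the Braille Patterns block, where dot n corresponds
to bit n-1 above U+2800.

```python
def dots_to_unicode(dots: str) -> str:
    """Convert a dots operand like '456-2456' to Unicode braille cells."""
    cells = []
    for cell in dots.split("-"):
        if not cell:
            raise ValueError(f"empty cell in dot pattern {dots!r}")
        mask = 0
        for digit in cell:
            if digit == "0":
                continue  # assumed: 0 denotes a blank cell
            if digit not in "12345678":
                raise ValueError(f"unsupported dot {digit!r} in {dots!r}")
            # Dot n sets bit n-1 of the offset from U+2800.
            mask |= 1 << (int(digit) - 1)
        cells.append(chr(0x2800 + mask))
    return "".join(cells)
```

Calling 'dots_to_unicode("456-2456")' returns two cells, U+2838 followed
by U+283A.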

In what follows the word 'characters' means a sequence of one or more
consecutive letters between spaces and/or punctuation marks.

'noback opcode ...'
This is an opcode prefix, that is to say, it modifies the operation
of the opcode that follows it on the same line. noback specifies
that no back-translation is to be done using this line.

noback always ;\s; 0

'nofor opcode ...'
This is an opcode prefix which modifies the operation of the opcode
following it on the same line. nofor specifies that forward
translation is not to use the information on this line.

'compbrl characters'
If the characters are found within a block of text surrounded by
whitespace, the entire block is translated according to the default
braille representations defined by the Character-Definition Opcodes
if 8-dot computer braille is enabled, or according to the dot
patterns given in the 'comp6' opcode if 6-dot computer braille is
enabled. For example:

compbrl www translate URLs in computer braille

'comp6 character dots'
This opcode specifies the translation of characters in 6-dot
computer braille. It is necessary because the translation of a
single character may require more than one cell. The first operand
must be a character with a decimal representation from 0 to 255
inclusive. The second operand may specify as many cells as
necessary. The opcode is somewhat of a misnomer, since any dots,
not just dots 1 through 6, can be specified. This even includes
virtual dots.

'nocont characters'
Like 'compbrl', except that the string is uncontracted. However,
'prepunc' and 'postpunc' opcode rules are still applied. This is
useful for specifying that foreign words should not be contracted in
an entire document.

'replace characters {characters}'
Replace the first set of characters, no matter where they appear,
with the second. Note that the second operand is _NOT_ a dot
pattern. It is also optional. If it is omitted, the character(s)
in the first operand will be discarded. This is useful for
ignoring characters. It is possible that the "ignored" characters
may still affect the translation indirectly. Therefore, it is
preferable to use the 'correct' opcode.

'always characters dots'
Replace the characters with the dot pattern no matter where they
appear. Do _NOT_ use an entry such as 'always a 1'. Use the
'uplow', 'letter', etc. character definition opcodes instead. For
example:

always world 456-2456 unconditional translation

'repeated characters dots'
Replace the characters with the dot pattern no matter where they
appear. Ignore any consecutive repetitions of the same character
sequence. This is useful for shortening long strings of spaces or
hyphens or periods. For example:

repeated --- 36-36-36 shorten separator lines made with hyphens
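
The effect of 'repeated' on the input text can be pictured with the
following Python sketch. It only models the "ignore consecutive
repetitions" part at the character level; the replacement by the dot
pattern is left out, and 'collapse_repeated' is a hypothetical name,
not a liblouis function.

```python
import re


def collapse_repeated(text: str, chars: str) -> str:
    """Collapse back-to-back repetitions of chars into one occurrence."""
    run = re.compile(f"(?:{re.escape(chars)})+")
    return run.sub(lambda _match: chars, text)
```

With the rule above, a separator line of six hyphens between "a" and
"b" would be treated like a single '---', so
'collapse_repeated("a------b", "---")' gives "a---b". Characters that
do not make up a full repetition are kept as they are.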

'repword characters dots'
When characters are encountered check to see if the word before
this string matches the word after it. If so, replace characters
with dots and eliminate the second word and any word following
another occurrence of characters that is the same. This opcode is
used in Malaysian braille. In this case the rule is:

repword - 123456

'largesign characters dots'
Replace the characters with the dot pattern no matter where they
appear. In addition, if two words defined as large signs follow
each other, remove the space between them. For example, in
'en-us-g2.ctb' the words 'and' and 'the' are both defined as large
signs. Thus, in the phrase 'the cat and the dog' the space would
be deleted between 'and' and 'the', with the result 'the cat andthe
dog'. Of course, 'and' and 'the' would be properly contracted.
The term 'largesign' is a bit of braille jargon that pleases
braille experts.
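
The space-removal part of 'largesign' can be sketched in Python as
follows. This is a simplified model that works on plain words separated
by single spaces and leaves out the contraction itself;
'join_large_signs' is a hypothetical helper, not part of liblouis.

```python
from typing import Set


def join_large_signs(text: str, large_signs: Set[str]) -> str:
    """Drop the space between two adjacent words that are large signs."""
    words = text.split(" ")
    pieces = [words[0]]
    previous_is_large = words[0] in large_signs
    for word in words[1:]:
        current_is_large = word in large_signs
        if previous_is_large and current_is_large:
            pieces.append(word)        # join directly to the previous word
        else:
            pieces.append(" " + word)  # keep the separating space
        previous_is_large = current_is_large
    return "".join(pieces)
```

Applied to the phrase from the description,
'join_large_signs("the cat and the dog", {"and", "the"})' gives
"the cat andthe dog".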

'word characters dots'
Replace the characters with the dot pattern if they are a word,
that is, are surrounded by whitespace and/or punctuation.

'syllable characters dots'
As its name indicates, this opcode defines a "syllable" which must
be represented by exactly the dot patterns given. Contractions may
not cross the boundaries of this "syllable" either from left or
right. The character string defined by this opcode need not be a
lexical syllable, though it usually will be. The equals sign in the
following example means that the default representation for each
character within the sequence is to be used (see
Character-Definition Opcodes):

syllable horse = sawhorse, horseradish

'nocross characters dots'
Replace the characters with the dot pattern if the characters are
all in one syllable (do not cross a syllable boundary). For this
opcode to work, a hyphenation table must be included. If this is
not done, 'nocross' behaves like the 'always' opcode. For example,
if the English Grade 2 table is being used and the appropriate
hyphenation table has been included, 'nocross sh 146' will cause
the 'sh' in 'monkshood' not to be contracted.

'joinword characters dots'
Replace the characters with the dot pattern if they are a word
which is followed by whitespace and a letter. In addition remove
the whitespace. For example, 'en-us-g2.ctb' has 'joinword to 235'.
This means that if the word 'to' is followed by another word the
contraction is to be used and the space is to be omitted. If these
conditions are not met, the word is translated according to any
other opcodes that may apply to it.
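
The condition for 'joinword' can be approximated with a regular
expression. The Python sketch below shows only the whitespace removal
on the input text, not the replacement by the dot pattern, and it uses
the regex word boundary as a stand-in for "is a word". The name
'apply_joinword' is hypothetical.

```python
import re


def apply_joinword(text: str, word: str) -> str:
    """Remove whitespace after word when a letter follows (sketch)."""
    # \b approximates the start of a word; requiring \s+ right after the
    # word ensures it is not merely the beginning of a longer word.
    # [^\W\d_] matches any Unicode letter.
    pattern = re.compile(rf"\b{re.escape(word)}\s+(?=[^\W\d_])")
    return pattern.sub(lambda _match: word, text)
```

For instance, 'apply_joinword("go to school", "to")' gives
"go toschool", while "to 5 people" and "into the" are left unchanged
because the conditions are not met.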

'lowword characters dots'
Replace the characters with the dot pattern if they are a word
preceded and followed by whitespace. No punctuation either before
or after the word is allowed. The term 'lowword' derives from the
fact that in English these contractions are written in the lower
part of the cell. For example:

lowword were 2356

'contraction characters'
If you look at 'en-us-g2.ctb' you will see that some words are
actually contracted into some of their own letters. A famous
example among braille transcribers is 'also', which is contracted
as 'al'. But this is also the name of a person. To take another
example, 'altogether' is contracted as 'alt', but this is the
abbreviation for the alternate key on a computer keyboard.
Similarly 'could' is contracted into 'cd', but this is the
abbreviation for compact disk. To prevent confusion in such cases,
the letter sign (see the 'letsign' opcode) is placed before such
letter combinations when they actually are abbreviations, not
contractions. The 'contraction' opcode tells the translator to do
this.

'sufword characters dots'
Replace the characters with the dot pattern if they are either a
word or at the beginning of a word.

'prfword characters dots'
Replace the characters with the dot pattern if they are either a
word or at the end of a word.

'begword characters dots'
Replace the characters with the dot pattern if they are at the
beginning of a word.

'begmidword characters dots'
Replace the characters with the dot pattern if they are either at
the beginning or in the middle of a word.

'midword characters dots'
Replace the characters with the dot pattern if they are in the
middle of a word.

'midendword characters dots'
Replace the characters with the dot pattern if they are either in
the middle or at the end of a word.

'endword characters dots'
Replace the characters with the dot pattern if they are at the end
of a word.

'partword characters dots'
Replace the characters with the dot pattern if the characters are
anywhere in a word, that is, if they are preceded or followed by a
letter.
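
The word-position opcodes from 'word' through 'partword' differ only
in where the matched characters sit relative to neighbouring letters.
The Python sketch below makes that explicit. It is an interpretation,
not liblouis code: it reads "beginning" as "a letter follows but none
precedes", "end" as the reverse, and "middle" as letters on both sides,
and 'position_opcodes' is a hypothetical name.

```python
from typing import Set


def position_opcodes(text: str, start: int, end: int) -> Set[str]:
    """Return the position opcodes whose condition text[start:end] meets."""
    letter_before = start > 0 and text[start - 1].isalpha()
    letter_after = end < len(text) and text[end].isalpha()

    if not letter_before and not letter_after:
        # The match is a whole word.
        return {"word", "sufword", "prfword"}
    if not letter_before:
        # Beginning of a longer word.
        return {"sufword", "begword", "begmidword", "partword"}
    if letter_after:
        # Middle of a word.
        return {"begmidword", "midword", "midendword", "partword"}
    # End of a longer word.
    return {"prfword", "midendword", "endword", "partword"}
```

For the characters 'th', this classifies the match in "then" as a
beginning ('sufword', 'begword', 'begmidword', 'partword'), in "other"
as a middle, and in "bath" as an end.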

'exactdots @dots'
Note that the operand must begin with an at sign ('@'). The dot
pattern following it is evaluated for validity. If it is valid,
whenever an at sign followed by this dot pattern appears in the
source document it is replaced by the characters corresponding to
the dot pattern in the output. This opcode is intended for use in
liblouisutdml semantic-action files to specify exact dot patterns,
as in mathematical codes. For example, the rule

exactdots @4-46-12356

will produce the characters with these dot patterns in the output.

'prepunc characters dots'
Replace the characters with the dot pattern if they are part of
punctuation at the beginning of a word.

'postpunc characters dots'
Replace the characters with the dot pattern if they are part of
punctuation at the end of a word.

'begnum characters dots'
Replace the characters with the dot pattern if they are at the
beginning of a number, that is, before all its digits. For
example, in 'en-us-g1.ctb' we have 'begnum # 4'.

'midnum characters dots'
Replace the characters with the dot pattern if they are in the
middle of a number. For example, 'en-us-g1.ctb' has 'midnum . 46'.
This is because the decimal point has a different dot pattern than
the period.

'endnum characters dots'
Replace the characters with the dot pattern if they are at the end
of a number. For example 'en-us-g1.ctb' has 'endnum th 1456'.
This handles things like '4th'. A letter sign is _NOT_ inserted.

'joinnum characters dots'
Replace the characters with the dot pattern. In addition, if
whitespace and a number follow, omit the whitespace. This opcode
can be used to join currency symbols to numbers, for example:

joinnum \x20AC 15 (EURO SIGN)
joinnum \x0024 145 (DOLLAR SIGN)
joinnum \x00A3 1234 (POUND SIGN)
joinnum \x00A5 13456 (YEN SIGN)
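
The whitespace handling of 'joinnum' can be sketched in Python in the
same spirit. The helper 'apply_joinnum' is hypothetical; it shows only
the removal of whitespace between the symbol and a following digit and
does not perform the replacement by the dot pattern.

```python
import re


def apply_joinnum(text: str, symbol: str) -> str:
    """Remove whitespace between symbol and a following digit (sketch)."""
    pattern = re.compile(rf"{re.escape(symbol)}\s+(?=\d)")
    return pattern.sub(lambda _match: symbol, text)
```

For example, 'apply_joinnum("\u20ac 50", "\u20ac")' joins the euro sign
to the number, while a dollar sign followed by a word, as in
"$ per unit", keeps its space.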