grub2: Internationalisation

17 Internationalisation
***********************

17.1 Charset
============

GRUB uses UTF-8 internally, except in rendering, where an appropriate
GRUB-specific representation is used.  All text files (including
configuration files) are assumed to be encoded in UTF-8.
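
   Since text files are assumed to be UTF-8, it can be useful to check
a file before GRUB reads it.  The following Python sketch is only an
illustration and is not part of GRUB; the function name is_valid_utf8
is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is always valid UTF-8.
print(is_valid_utf8(b"set timeout=5\n"))                     # True
# Non-ASCII text encoded as UTF-8 is accepted.
print(is_valid_utf8("menuentry 'Größe'".encode("utf-8")))    # True
# The same text in Latin-1 contains bytes that are not valid UTF-8.
print(is_valid_utf8("menuentry 'Größe'".encode("latin-1")))  # False
```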

17.2 Filesystems
================

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
of ISO9660 are treated as UTF-16, as per specification.  AFS and BFS
are read as UTF-8, again according to specification.  BtrFS, cpio, tar,
squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
FAT (short names), F2FS, the RockRidge part of ISO9660, nilfs2, UFS1,
UFS2 and ZFS are assumed to be UTF-8.  This assumption may be false on
systems configured with a legacy charset, but as long as the charset
used is a superset of ASCII you should be able to access ASCII-named
files.  It is recommended to configure your system to use UTF-8 to
access the filesystem; convmv may help with migration.

   ISO9660 (plain) filenames are specified as being ASCII or as being
described with unspecified escape sequences.  GRUB assumes that ISO9660
names are UTF-8 (since any ASCII is valid UTF-8).  Some old CD-ROMs use
CP437 in a non-compliant way.  You are still able to access files with
names containing only ASCII characters on such filesystems, though.
You are also able to access any file if the filesystem contains valid
Joliet (UTF-16) or RockRidge (UTF-8) names.

   AFFS, SFS and HFS never use Unicode, and GRUB assumes them to be in
Latin1, Latin1 and MacRoman respectively.
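
   The reason ASCII-named files stay reachable under a legacy charset
can be illustrated with a short Python sketch.  It is not GRUB code;
the helper name same_bytes_as_utf8 and the sample file names are made
up for the example, and the three charsets are simply ones mentioned
in this section.

```python
LEGACY_CHARSETS = ["latin-1", "cp437", "mac_roman"]


def same_bytes_as_utf8(name: str, charset: str) -> bool:
    """True if name has identical byte encodings in charset and UTF-8."""
    try:
        return name.encode(charset) == name.encode("utf-8")
    except UnicodeEncodeError:
        return False  # the legacy charset cannot represent this name at all


for charset in LEGACY_CHARSETS:
    # ASCII-only names are byte-identical, so a UTF-8 reader finds them.
    print(charset, same_bytes_as_utf8("vmlinuz-6.1", charset))
    # Names with non-ASCII characters differ, so a UTF-8 lookup misses them.
    print(charset, same_bytes_as_utf8("möbel.cfg", charset))
```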

   GRUB handles filesystem case-insensitivity, but makes no attempt at
case conversion of international characters, so, for example, a file
named with a lowercase Greek alpha is treated as different from one
named with an uppercase alpha.  The filesystems in question are NTFS
(except the POSIX namespace), HFS+ (configurable at mkfs time,
insensitive by default), SFS (configurable at mkfs time, insensitive by
default), JFS (configurable at mkfs time, sensitive by default), HFS,
AFFS, FAT, exFAT and ZFS (configurable on a per-subvolume basis through
the "casesensitivity" property, sensitive by default).  On ZFS
subvolumes marked as case-insensitive, files whose names contain
lowercase international characters are inaccessible.

   Also, like all supported filesystems except HFS+ and ZFS
(configurable on a per-subvolume basis through the "normalization"
property, default none), GRUB makes no attempt to check canonical
equivalence, so a file name containing u-diaeresis is treated as
distinct from one containing u followed by a combining diaeresis.
This, however, means that in order to access a file on HFS+ its name
must be specified in normalisation form D.  On normalised ZFS
subvolumes, file names that are not in the configured normalisation
form are inaccessible.
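
   Both behaviours can be mimicked in a few lines of Python.  This is
a sketch of the rules as described above, not GRUB's implementation;
the helper names are hypothetical.  It also assumes standard Unicode
NFD, which is not guaranteed to match the decomposition HFS+ uses for
every character.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name
    )


def names_match_ascii_insensitive(a: str, b: str) -> bool:
    """Compare two names, ignoring case for ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)


# ASCII letters compare equal regardless of case.
print(names_match_ascii_insensitive("GRUB.CFG", "grub.cfg"))  # True
# Greek alpha: uppercase and lowercase stay distinct.
print(names_match_ascii_insensitive("\u0391", "\u03b1"))      # False

# Canonical equivalence: precomposed vs. decomposed u-diaeresis.
precomposed = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"  # u followed by COMBINING DIAERESIS
print(precomposed == decomposed)                                # False
# Normalising to form D yields the decomposed spelling.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```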

17.3 Output terminal
====================

The firmware output console "console" on ARC and IEEE1275 is limited to
ASCII.

   The BIOS firmware console and VGA text are limited to ASCII and some
pseudographics.

   None of the above is appropriate for displaying international text:
any unsupported character is replaced with a question mark, except
pseudographics, which GRUB attempts to approximate with ASCII.
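
   The question-mark fallback resembles what Python's "replace" error
handler does when encoding to ASCII.  The sketch below is only an
analogy with a hypothetical helper name; it does not reproduce GRUB's
approximation of pseudographics.

```python
def to_ascii_console(text: str) -> bytes:
    """Encode text for an ASCII-only console, '?' per unsupported character."""
    return text.encode("ascii", errors="replace")


print(to_ascii_console("Booting Linux"))  # b'Booting Linux'
print(to_ascii_console("Größe"))          # b'Gr??e'
```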

   The EFI console, on the other hand, nominally supports UTF-16, but
actual language coverage depends on the firmware and may be very
limited.

   The encoding used on serial can be chosen with 'terminfo' as either
ASCII, UTF-8 or "visual UTF-8".  The last one is against the
specification but results in correct rendering of right-to-left text on
some readers which don't have their own bidi implementation.

   On emu, GRUB checks whether the charset is UTF-8 and uses it if so;
otherwise it uses ASCII.

   When using gfxterm or gfxmenu, GRUB itself is responsible for
rendering the text.  In this case GRUB is limited by the loaded fonts.
If the fonts contain all required characters, then bidirectional text,
cursive variants and combining marks are supported, with the exception
of enclosing, half (e.g. left half tilde or combining overline) and
double combining marks.  Ligatures aren't supported, though.  This
should cover European, Middle Eastern (if you don't mind the lack of
the lam-alif ligature in Arabic) and East Asian scripts.  Notable
unsupported scripts are the Brahmic family and its derivatives, as well
as Mongolian, Tifinagh, Korean Jamo (precomposed characters have no
problem) and tonal writing (U+02E5-U+02E9).  GRUB also ignores
characters deprecated by Unicode (e.g. tags).  GRUB also doesn't handle
so-called "annotation characters".  If you can complete either of these
two lists or, better, propose a patch to improve rendering, please
contact the developer team.

17.4 Input terminal
===================

The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
enter non-ASCII characters.  The EFI specification allows for such
input, but the author is unaware of any actual implementations.  Serial
input is currently limited to Latin1 (unlikely to change).  GRUB's own
keyboard implementations (at_keyboard and usb_keyboard) support any key
but work on a one-char-per-keystroke basis, so there are no dead keys
or advanced input methods.  There is also no keymap change hotkey.  In
practice this makes it difficult to enter any text using a non-Latin
alphabet.  Moreover, all current input consumers are limited to ASCII.

17.5 Gettext
============

GRUB can be translated.  For this you need to have the language's *.mo
files in $prefix/locale, load the gettext module and set the "lang"
variable.

17.6 Regexp
===========

Regexps work on Unicode characters; however, no attempt is made to
check canonical equivalence.  Moreover, classes like [:alpha:] match
only the ASCII subset.
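
   Python's re module is a different engine from the one GRUB uses,
but it shows the same two effects.  In this sketch the pattern
[A-Za-z]+ is a stand-in for an ASCII-only [:alpha:] class, since
Python's re has no POSIX bracket classes.

```python
import re
import unicodedata

ASCII_ALPHA = re.compile(r"[A-Za-z]+")  # stand-in for an ASCII-only [:alpha:]

# Only the ASCII letters are matched; the non-ASCII letter ends the match.
print(ASCII_ALPHA.match("grüb").group())  # gr

precomposed = "\u00fc"  # u-diaeresis as one code point
decomposed = "u\u0308"  # u followed by combining diaeresis

# Without an equivalence check the two spellings do not match each other.
print(re.fullmatch(precomposed, decomposed) is None)  # True
# Normalising the subject first makes the comparison succeed.
nfc = unicodedata.normalize("NFC", decomposed)
print(re.fullmatch(precomposed, nfc) is not None)     # True
```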

17.7 Other
==========

Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
[WEEKDAY] 24-hour datetime format, but weekdays are translated.  GRUB
always uses the decimal number format with [0-9] as digits, '.' as the
decimal separator and no group separator.

   IEEE1275 aliases are matched case-insensitively, except for
non-ASCII characters, which are matched as binary.  OSBundleRequired is
matched in a similar way.  Since IEEE1275 aliases and OSBundleRequired
don't contain any non-ASCII characters, this should never be a problem
in practice.

   Case-sensitive identifiers are matched as raw strings; no canonical
equivalence check is performed.  Case-insensitive identifiers are also
matched as raw strings, but additionally [a-z] is equivalent to [A-Z].
GRUB-defined identifiers use only ASCII, and so should user-defined
ones.  Identifiers containing non-ASCII characters may work but aren't
supported.

   Only the ASCII space characters (space U+0020, tab U+0009, CR U+000D
and LF U+000A) are recognised.  Other Unicode space characters aren't
valid field separators.

   The 'test' (⇒test) tests <, >, <=, >=, -pgt and -plt compare strings
in the lexicographical order of Unicode codepoints, replicating the
behaviour of test from coreutils.  Environment variables and commands
are listed in the same order.
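
   A few of these conventions can be sketched in Python.  The code is
an illustration of the rules stated above, not GRUB's implementation;
split_fields and codepoint_less are hypothetical names, and the
weekday part of the date format is left out.

```python
import re
from datetime import datetime

# Field separators: only space, tab, CR and LF.
ASCII_SPACE = re.compile(r"[ \t\r\n]+")


def split_fields(line: str) -> list[str]:
    """Split on ASCII space characters only, dropping empty fields."""
    return [f for f in ASCII_SPACE.split(line) if f]


# U+00A0 (no-break space) does not separate fields, so "a\u00a0b" is one.
print(len(split_fields("set\tdefault=0 a\u00a0b")))  # 3


def codepoint_less(a: str, b: str) -> bool:
    """Lexicographic comparison by code point; Python's str '<' does this."""
    return a < b


print(codepoint_less("Z", "a"))       # True: U+005A < U+0061
print(codepoint_less("z", "\u00e9"))  # True: U+007A < U+00E9

# Fixed numeric date layout, independent of locale.
print(datetime(2024, 2, 29, 13, 5, 9).strftime("%Y-%m-%d %H:%M:%S"))
# 2024-02-29 13:05:09
```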
1