gettext: General Problems

15.5.18.1 General Problems Parsing Perl Code
............................................

It is often heard that only Perl can parse Perl. This is not true.
Perl cannot be _parsed_ at all; it can only be _executed_. Perl has
various built-in ambiguities that can only be resolved at runtime.

The following example may illustrate one common problem:

     print gettext "Hello World!";

Although this example looks like a bullet-proof case of a function
invocation, it is not:

     open gettext, ">testfile" or die;
     print gettext "Hello world!";

In this context, the string ‘gettext’ looks more like a file handle.
But not necessarily:

     use Locale::Messages qw (:libintl_h);
     open gettext ">testfile" or die;
     print gettext "Hello world!";

Now, the file is probably syntactically incorrect, provided that the
module ‘Locale::Messages’ found first in the Perl include path exports a
function ‘gettext’. But what if the module ‘Locale::Messages’ really
looks like this?

     use vars qw (*gettext);

     1;

In this case, the string ‘gettext’ will be interpreted as a file
handle again, and the above example will create a file ‘testfile’ and
write the string “Hello world!” into it. Even advanced control flow
analysis will not really help:

     if (0.5 < rand) {
         eval "use Sane";
     } else {
         eval "use InSane";
     }
     print gettext "Hello world!";

If the module ‘Sane’ exports a function ‘gettext’ that does what we
expect, and the module ‘InSane’ opens a file for writing and associates
the _handle_ ‘gettext’ with this output stream, we are clueless again
about what will happen at runtime. It is completely unpredictable. The
truth is that Perl has so many ways to fill its symbol table at runtime
that it is impossible to interpret a particular piece of code without
executing it.
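As a minimal sketch of how late this decision can be made, the following
snippet installs a function named ‘gettext’ into the symbol table only
when a runtime flag is set. The helper ‘call_gettext’ and the variable
‘$install’ are hypothetical names chosen for illustration; they are not
part of any gettext API.

```perl
use strict;
use warnings;

# Hypothetical helper: reports what the name 'gettext' refers to right now.
sub call_gettext {
    my ($msgid) = @_;
    # defined(&name) inspects the symbol table without calling the function.
    return defined(&main::gettext)
        ? main::gettext($msgid)
        : "no function 'gettext' yet";
}

my $before = call_gettext("Hello world!");

# Fill the symbol table at runtime.
my $install = 1;    # assumption: could equally depend on rand, %ENV, ...
if ($install) {
    no strict 'refs';
    *{"main::gettext"} = sub { return "translated: $_[0]" };
}

my $after = call_gettext("Hello world!");

print "$before\n$after\n";
```

The same call yields two different results within one run, and nothing
in the text of ‘call_gettext’ reveals which one a static scanner should
expect.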

Of course, ‘xgettext’ will not execute your Perl sources while
scanning for translatable strings, but rather use heuristics in order to
guess what you meant.

Another problem is the ambiguity of the slash and the question mark.
Their interpretation depends on the context:

     # A pattern match.
     print "OK\n" if /foobar/;

     # A division.
     print 1 / 2;

     # Another pattern match.
     print "OK\n" if ?foobar?;

     # Conditional.
     print $x ? "foo" : "bar";

The slash may either act as the division operator or introduce a
pattern match, whereas the question mark may act as the ternary
conditional operator or as a pattern match, too. Other programming
languages like ‘awk’ present similar problems, but the consequences of a
misinterpretation are particularly nasty with Perl sources. In ‘awk’
for instance, a statement can never exceed one line and the parser can
recover from a parsing error at the next newline and interpret the rest
of the input stream correctly. Perl is different, as a pattern match is
terminated by the next appearance of the delimiter (the slash or the
question mark) in the input stream, regardless of the semantic context.
If a slash is really a division sign but misinterpreted as a pattern
match, the rest of the input file is most probably parsed incorrectly.

There are certain cases where the ambiguity cannot be resolved at
all:

     $x = wantarray ? 1 : 0;

The Perl built-in function ‘wantarray’ does not accept any arguments.
The Perl parser therefore knows that the question mark does not start a
regular expression but is the ternary conditional operator.

     sub wantarrays {}
     $x = wantarrays ? 1 : 0;

Now the situation is different. The function ‘wantarrays’ takes a
variable number of arguments (like any non-prototyped Perl function).
The question mark is now the delimiter of a pattern match, and hence the
piece of code does not compile.

     sub wantarrays() {}
     $x = wantarrays ? 1 : 0;

Now that the function is prototyped, Perl knows that it does not accept
any arguments, and the question mark is therefore interpreted as the
ternary operator again. But that unfortunately outsmarts ‘xgettext’.
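The effect of a prototype can be observed from within Perl itself. The
sketch below uses the slash rather than the question mark, because the
bare ‘?PATTERN?’ form was removed in Perl 5.22. It compiles the same
statement twice with a string ‘eval’, once with and once without an
empty prototype. The names ‘slash_meaning’, ‘whatever’ and the ‘Demo’
packages are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

my $counter = 0;

# Hypothetical helper: compiles one statement with the given prototype
# text and reports how perl read the slash that follows 'whatever'.
sub slash_meaning {
    my ($prototype) = @_;                 # '' (none) or '()' (no arguments)
    my $package = 'Demo' . ++$counter;    # fresh package, avoids redefinition
    my $code = sprintf <<'END', $package, $prototype;
package %s;
no warnings;
sub whatever%s { return 50 }
my $x = whatever / 25 ; # / ; return "pattern match";
return "division";
END
    my $result = eval $code;
    die $@ if $@;
    return $result;
}

print slash_meaning('()'), "\n";    # prototyped: the slash divides
print slash_meaning(''),   "\n";    # not prototyped: the slash opens a pattern
```

With the empty prototype, everything after the ‘#’ is a comment and the
second ‘return’ is reached. Without it, the text between the two slashes
becomes a pattern passed to ‘whatever’, and the first ‘return’ is live
code.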

The Perl parser in ‘xgettext’ cannot know whether a function has a
prototype and what that prototype would look like. It therefore makes
an educated guess. If a function is known to be a Perl built-in and
this function does not accept any arguments, a following question mark
or slash is treated as an operator, otherwise as the delimiter of a
following regular expression. The Perl built-ins that do not accept
arguments are ‘wantarray’, ‘fork’, ‘time’, ‘times’, ‘getlogin’,
‘getppid’, ‘getpwent’, ‘getgrent’, ‘gethostent’, ‘getnetent’,
‘getprotoent’, ‘getservent’, ‘setpwent’, ‘setgrent’, ‘endpwent’,
‘endgrent’, ‘endhostent’, ‘endnetent’, ‘endprotoent’, and ‘endservent’.
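The rule just described can be sketched as a simple table lookup. This
is only an illustration of the heuristic, not the code ‘xgettext’
actually uses; the names ‘%no_arg_builtin’ and ‘guess_role’ are
hypothetical.

```perl
use strict;
use warnings;

# The built-ins listed above, which never take arguments.
my %no_arg_builtin = map { $_ => 1 } qw(
    wantarray fork time times getlogin getppid getpwent getgrent
    gethostent getnetent getprotoent getservent setpwent setgrent
    endpwent endgrent endhostent endnetent endprotoent endservent
);

# Hypothetical helper: guesses the role of a '/' or '?' that directly
# follows the bareword $name.
sub guess_role {
    my ($name) = @_;
    return $no_arg_builtin{$name} ? 'operator' : 'pattern delimiter';
}

for my $name (qw(wantarray time wantarrays gettext)) {
    printf "%s: %s\n", $name, guess_role($name);
}
```

Any name outside the table, including a user-defined function with an
empty prototype, falls into the ‘pattern delimiter’ branch, which is
exactly the case where the guess can go wrong.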

If you find that ‘xgettext’ fails to extract strings from portions of
your sources, you should therefore look out for slashes and/or question
marks preceding these sections. You may have come across a bug in
‘xgettext’’s Perl parser (and of course you should report that bug). In
the meantime you should consider reformulating your code in a manner
less challenging to ‘xgettext’.

In particular, if the parser is too dumb to see that a function does
not accept arguments, use parentheses:

     $x = somefunc() ? 1 : 0;
     $y = (somefunc) ? 1 : 0;

In fact the Perl parser itself has similar problems and warns you
about such constructs.
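The same remedy applies to the slash. In the following small sketch,
‘somefunc’ is a hypothetical function without a prototype; the closing
parenthesis ends its argument list, so the slash that follows can only
be read as a division.

```perl
use strict;
use warnings;

sub somefunc { return 50 }    # no prototype, as in the problematic case

# Both forms close the argument list before the slash is reached.
my $x = somefunc() / 25;
my $y = (somefunc) / 25;

print "$x $y\n";
```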