Bug 11631: Make i18n toolchain ignore useless strings
author Pasi Kallinen <pasi.kallinen@pttk.fi>
Thu, 24 Apr 2014 08:01:52 +0000 (11:01 +0300)
committer Galen Charlton <gmc@esilibrary.com>
Sun, 27 Apr 2014 21:06:48 +0000 (21:06 +0000)
This patch removes several types of strings from the
PO files that cannot be usefully translated, including
ones that consist entirely of punctuation and/or HTML entities.

Test:
1) Update PO files of some lang, xx-YY-*po
cd misc/translator
perl translate update xx-YY
2) Do it again, just in case
3) rm po/xx-YY*po~
4) Extract all msgid's, sorted
cat po/xx-YY*po | egrep "^msgid" | sort | uniq > xx-YY-pre
5) Apply the patch
6) Repeat 1-3
7) Repeat 4, writing to a different file
cat po/xx-YY*po | egrep "^msgid" | sort | uniq > xx-YY-post
8) Diff the two files and inspect the results; only strings consisting of %s and \s should differ
diff xx-YY-pre xx-YY-post | less
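
Steps 4, 7 and 8 can be wrapped in two small helpers, sketched below. The function names extract_msgids and removed_msgids are hypothetical and not part of Koha; the sketch assumes it is run from misc/translator and that the PO files match po/xx-YY*po as in the test plan.

```shell
#!/bin/sh
# Hypothetical helpers for the test plan above; not part of the Koha tree.

# extract_msgids PREFIX OUTFILE
# Collect the unique, sorted msgid lines from every file matching PREFIX*po.
extract_msgids() {
    cat "$1"*po | grep -E '^msgid' | LC_ALL=C sort -u > "$2"
}

# removed_msgids PRE POST
# Print the msgid lines that appear in PRE but no longer appear in POST.
# Both files must have been produced by extract_msgids (same sort order).
removed_msgids() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

Usage would be extract_msgids po/xx-YY xx-YY-pre before the patch, extract_msgids po/xx-YY xx-YY-post after it, then removed_msgids xx-YY-pre xx-YY-post | less. Unlike a plain diff, this lists only the strings that were dropped.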

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works as described, 380 fewer strings to 'translate'
No koha-qa errors.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Tested according to test plan, works as described.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>

diff --git a/misc/translator/xgettext.pl b/misc/translator/xgettext.pl
index 032117d..ea75794 100755
--- a/misc/translator/xgettext.pl
+++ b/misc/translator/xgettext.pl
@@ -37,6 +37,7 @@ sub string_negligible_p ($) {
            || $t =~ /^\d+$/                    # purely digits
            || $t =~ /^[-\+\.,:;!\?'"%\(\)\[\]\|]+$/ # punctuation w/o context
            || $t =~ /^[A-Za-z]$/               # single letters
+            || $t =~ /^(&[a-z]+;|&#\d+;|&#x[0-9a-fA-F]+;|%%|%s|\s|[[:punct:]])*$/ # html entities,placeholder,punct, ...
        )
 }
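
To get a feel for which strings the added alternative discards, the check can be approximated outside Perl. The following is only a sketch: it translates the new pattern into POSIX extended regex syntax (\d becomes [0-9], \s becomes [[:space:]]) and wraps it in a hypothetical is_negligible function. It is not the code from xgettext.pl, it covers only the line added by this patch, and it assumes single-line input, since grep matches line by line.

```shell
#!/bin/sh
# Hypothetical approximation of the pattern added to string_negligible_p.
# Succeeds (exit 0) when the argument consists only of HTML entities,
# %% or %s placeholders, whitespace and punctuation.
is_negligible() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq \
        '^(&[a-z]+;|&#[0-9]+;|&#x[0-9a-fA-F]+;|%%|%s|[[:space:]]|[[:punct:]])*$'
}

# Print a verdict for a few sample strings.
for s in '&nbsp;' '&#160; %s' '&raquo; &#x2192;' 'Save &amp; close' '%d items'; do
    if is_negligible "$s"; then
        echo "skip: $s"
    else
        echo "keep: $s"
    fi
done
```

A string such as 'Save &amp; close' is kept because it contains ordinary letters outside an entity, while '&#160; %s' is skipped because nothing in it can be translated.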