Bug 14078: converting from ISO5426 is not complete
author Fridolin Somers <fridolin.somers@biblibre.com>
Wed, 29 Apr 2015 10:24:23 +0000 (12:24 +0200)
committer Liz Rea <wizzyrea@gmail.com>
Fri, 11 Dec 2015 00:44:28 +0000 (13:44 +1300)
Conversion of MARC from ISO5426 is defined in C4::Charset::char_decode5426().
The conversion of each character, or of each combination of characters, is defined in a map.
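In the map, a combined pair appears to be keyed by one number: the combining byte times 256 plus the following byte, so 0xd5 followed by 'H' (0x48) gives the key 0xd548 used in the diff. The sketch below assumes that key layout. `combined_key` is a hypothetical helper, and the two entries are copied from the patch.

```perl
use strict;
use warnings;

# Two entries copied from the patch: combining byte 0xd5 ("half circle
# below", i.e. breve below) followed by 'H' (0x48) or 'h' (0x68).
my %chars = (
    0xd548 => 0x1e2a,    # capital h with breve below
    0xd568 => 0x1e2b,    # small h with breve below
);

# Hypothetical helper: pack a combining byte and the following base byte
# into the 16-bit key format the map entries are written in.
sub combined_key {
    my ( $combining, $base ) = @_;
    return $combining * 256 + $base;
}

my $key = combined_key( 0xd5, ord('H') );
printf "key 0x%04x -> U+%04X\n", $key, $chars{$key};    # key 0xd548 -> U+1E2A
```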

This patch fixes some incorrect existing conversions.

In char_decode5426(), only characters between 0xC0 and 0xDF are combined with the following character:
  ($char >= 0xC0 && $char <= 0xDF)
So a conversion such as "$chars{0x81d1}=0x00b0" can never be used, because 0x81 is outside that range.
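A simplified decoder sketch illustrates why such an entry is dead. It is not the actual char_decode5426() code: `decode_sketch` is a hypothetical name, the two-byte key is assumed to be `combining byte * 256 + next byte`, and unmapped bytes are simply passed through.

```perl
use strict;
use warnings;

# Simplified sketch of a map-driven ISO 5426 decoder; NOT the actual
# C4::Charset::char_decode5426(). Only bytes 0xC0..0xDF are treated as
# combining bytes whose two-byte key is looked up in %chars.
my %chars = (
    0xd548 => 0x1e2a,    # capital h with breve below (key from the patch)
    0x81d1 => 0x00b0,    # the entry this patch comments out
);

sub decode_sketch {
    my @bytes = unpack 'C*', shift;
    my @code_points;
    for ( my $i = 0 ; $i < @bytes ; $i++ ) {
        my $char = $bytes[$i];
        if ( $char >= 0xC0 && $char <= 0xDF && $i + 1 < @bytes ) {
            my $key = $char * 256 + $bytes[ $i + 1 ];
            if ( exists $chars{$key} ) {
                push @code_points, $chars{$key};
                $i++;    # the following byte has been consumed
                next;
            }
        }
        # Single byte: use its own mapping if any, else pass it through.
        push @code_points, exists $chars{$char} ? $chars{$char} : $char;
    }
    return pack 'U*', @code_points;
}

# 0xd5 is in range, so 0xd5 0x48 becomes U+1E2A.
printf "%vX\n", decode_sketch("\xD5\x48");    # 1E2A
# 0x81 is out of range, so the 0x81d1 entry is never consulted.
printf "%vX\n", decode_sketch("\x81\xD1");    # 81.D1
```

Under this range check, only map entries whose high byte lies between 0xc0 and 0xdf can ever match.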
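The rules for "h with breve below" combine with 0xf9, but the correct character appears to be 0xd5 (the "5/5 half circle below" group of the map). Since 0xf9 is also outside the 0xC0-0xDF range, the old rules could never match.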

See http://www.gymel.com/charsets/MAB2.html

Signed-off-by: Frederic Demians <f.demians@tamil.fr>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
(cherry picked from commit 5e3882bcecede59f24a6b3c4aa9c4324390a29c3)
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
(cherry picked from commit 2ceab7145e2c337a8737d2f3a4b20ff9a6c20315)
Signed-off-by: Liz Rea <wizzyrea@gmail.com>

C4/Charset.pm

index 0998d4e..a04c640 100644
@@ -819,7 +819,7 @@ $chars{0x97}=0x003c;#3/2leftlowsinglequotationmark
 $chars{0x98}=0x003e;#3/2leftlowsinglequotationmark
 $chars{0xfa}=0x0153; #oe
 $chars{0xea}=0x0152; #oe
-$chars{0x81d1}=0x00b0;
+#$chars{0x81d1}=0x00b0; # FIXME useless
 
 ####
 ## combined characters iso5426
@@ -1121,8 +1121,8 @@ $chars{0xd375}=0x0173; # small u with ogonek
 $chars{0xd441}=0x1e00; # capital a with ring below
 $chars{0xd461}=0x1e01; # small a with ring below
         # 5/5 half circle below
-$chars{0xf948}=0x1e2a; # capital h with breve below
-$chars{0xf968}=0x1e2b; # small h with breve below
+$chars{0xd548}=0x1e2a; # capital h with breve below
+$chars{0xd568}=0x1e2b; # small h with breve below
         # 5/6 dot below
 $chars{0xd641}=0x1ea0; # capital a with dot below
 $chars{0xd642}=0x1e04; # capital b with dot below