Apache OpenOffice (AOO) Bugzilla – Issue 75634
[surrogate pair] need collation data for surrogate pair characters
Last modified: 2008-04-28 22:11:53 UTC
- collation. Surrogate characters are not in the collation table, so they will be sorted out of range. We need to update the table. [Karl Hong]
Please set a target once you have decided on one.
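To make the problem concrete, here is a minimal Python sketch of how a supplementary code point maps to a UTF-16 surrogate pair, and of how a collation table that only lists BMP characters misses it. The names `to_surrogate_pair`, `collation_table`, and `sort_key` are hypothetical and only for illustration. The fallback that pushes unknown characters past the end of the table is an assumption about what "sorted out of range" means, not the actual OOo collator code.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


# Hypothetical collation table that only lists BMP characters, in order.
collation_table = {ch: rank for rank, ch in enumerate("あいうえお")}


def sort_key(ch: str) -> tuple[int, int]:
    """Rank listed characters by table position; push unlisted ones past the end.

    This fallback is an assumption made for this sketch.
    """
    if ch in collation_table:
        return (0, collation_table[ch])
    return (1, ord(ch))


high, low = to_surrogate_pair(0x2000B)
print(hex(high), hex(low))  # 0xd840 0xdc0b

# U+2000B is not in the table, so it sorts after every listed character.
print(sorted(["\U0002000B", "う", "あ"], key=sort_key))
```

Under these assumptions, any character missing from the table lands after all listed characters, whatever its intended position.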
Naoyuki, do you have collation data for Japanese surrogate characters?
Created attachment 49285 [details] surrogate paired character list in JIS X 0213:2004
Hi Karl, I attached the list of surrogate characters in JIS X 0213:2004 in UTF-8 and UTF-16 (\u notation). I extracted this list from http://w3.kcua.ac.jp/~fujiwara/jis2000/jis2004/jisx0213-2004.html
Oops, sorry, I misunderstood your request. But I guess the first-column data (kuten) in the chart at that URL, such as 1-14-02, can be used as the collation order for these characters. I will attach the surrogate pair list with this data.
Created attachment 49288 [details] ja surrogate character list with kuten
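As a rough illustration of using kuten as a collation order, the Python sketch below parses a value such as 1-14-02 into numeric plane, row, and cell fields and sorts on that tuple. The function name `kuten_key` and the sample values other than 1-14-02 are made up for the example. It assumes every entry uses the three-part plane-row-cell form.

```python
def kuten_key(kuten: str) -> tuple[int, int, int]:
    """Turn a kuten string like '1-14-02' into a (plane, row, cell) sort key."""
    plane, row, cell = (int(part) for part in kuten.split("-"))
    return plane, row, cell


# Sample values for illustration only; not taken from the attachment.
samples = ["2-01-05", "1-14-02", "1-15-94", "1-14-10"]

print(sorted(samples, key=kuten_key))
# ['1-14-02', '1-14-10', '1-15-94', '2-01-05']
```

Comparing numeric tuples instead of raw strings keeps the order correct even if some values are not zero-padded.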
Hi Naoyuki, what we need to update is the charset collator data: http://l10n.openoffice.org/source/browse/*checkout*/l10n/i18npool/source/collator/data/ja_charset.txt?rev=1.2 I think the surrogate characters need to be inserted into that list. Can I just use the list at http://w3.kcua.ac.jp/~fujiwara/jis2000/jis2004/jisx0213-2004.html to replace ja_charset.txt?
Hi Karl, I looked at the current ja_charset.txt and the data at w3.kcua.ac.jp. Please do not replace the current ja_charset.txt with the data from w3.kcua.ac.jp. I believe the current ja_charset.txt has a more reasonable order, at least for the Kana part. I do not know where ja_charset.txt came from, but if we can get an updated version (with JIS X 0213:2004) from the same source, then we should update to it. If we can't, then I feel we should leave it as is for now. I do not see any benefit in breaking backward compatibility, and I could not find any collation data on the net for the added surrogate pair characters that OOo could use.
Since we don't have collation data for those surrogate pairs, I will close this issue as WONTFIX.
Closed.