Issue 75634 - [surrogate pair] need collation data for surrogate pair characters
Status: CLOSED WONT_FIX
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: OOo 2.2 RC2
Hardware: All All
Importance: P4 Trivial
Target Milestone: ---
Assignee: karl.hong
QA Contact: issues@l10n
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-22 06:44 UTC by naoyuki
Modified: 2008-04-28 22:11 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
surrogate paired character list in JIS X 0213:2004 (3.99 KB, application/x-compressed)
2007-10-31 09:13 UTC, naoyuki
ja surrogate character list with kuten (5.04 KB, application/x-compressed)
2007-10-31 09:56 UTC, naoyuki

Description naoyuki 2007-03-22 06:44:49 UTC
Collation: surrogate-pair characters are not in the collation table, so they
are sorted out of range. The table needs to be updated. [Karl Hong]
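
To illustrate the symptom, here is a minimal sketch in Python. It assumes a table-driven collator that gives every character missing from the table one fallback weight; the three-entry table, the fallback rule, and all names are hypothetical and are not taken from the OOo collator code.

```python
# Hypothetical miniature collation table (character -> weight). The real
# ja_charset.txt is far larger, and its lookup code is not shown in this issue.
COLLATION_WEIGHTS = {"\u4e9c": 1, "\u5516": 2, "\u5a03": 3}
# Assumption: any character missing from the table sorts after all listed ones.
FALLBACK_WEIGHT = len(COLLATION_WEIGHTS) + 1


def utf16_units(ch):
    """Return the UTF-16 code units that encode a single character."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def sort_key(text):
    """Map each character to its table weight, or to the fallback weight."""
    return [COLLATION_WEIGHTS.get(ch, FALLBACK_WEIGHT) for ch in text]


supplementary = "\U00020B9F"  # example character outside the BMP
units = utf16_units(supplementary)  # two code units, i.e. a surrogate pair
ordered = sorted(["\u5a03", supplementary, "\u4e9c"], key=sort_key)
```

In this sketch the supplementary character always lands at the end of `ordered`, whatever its intended position among the kanji, which is the "out of range" behavior described above.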
Comment 1 pavel 2007-08-02 12:22:04 UTC
Please set target when you decide one.
Comment 2 karl.hong 2007-10-30 17:27:47 UTC
Naoyuki, do you have collation data for Japanese surrogate characters?
Comment 3 naoyuki 2007-10-31 09:13:44 UTC
Created attachment 49285
surrogate paired character list in JIS X 0213:2004
Comment 4 naoyuki 2007-10-31 09:17:04 UTC
Hi Karl, I attached the list of surrogate-pair characters in JIS X 0213:2004 in
UTF-8 and UTF-16 (\u notation). I extracted this list from
http://w3.kcua.ac.jp/~fujiwara/jis2000/jis2004/jisx0213-2004.html
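
For readers unfamiliar with the two notations, the following Python sketch shows how one supplementary character can be written as UTF-8 bytes and as a pair of UTF-16 \u escapes. The function names and the lowercase hex layout are assumptions made for illustration; the exact format of the attached list is not reproduced here.

```python
def to_u_notation(text):
    """Render text as UTF-16 code units in \\uXXXX form (hypothetical layout)."""
    raw = text.encode("utf-16-be")
    units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return "".join("\\u{:04x}".format(unit) for unit in units)


def to_utf8_hex(text):
    """Render text as space-separated UTF-8 bytes in hex (hypothetical layout)."""
    return " ".join("{:02x}".format(byte) for byte in text.encode("utf-8"))


sample = "\U00020B9F"  # example character outside the BMP
u_form = to_u_notation(sample)   # two \u escapes: high and low surrogate
utf8_form = to_utf8_hex(sample)  # four UTF-8 bytes
```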
Comment 5 naoyuki 2007-10-31 09:54:23 UTC
Oops, sorry, I misunderstood your request.
But I guess the first-column data (kuten) in the chart at that URL, such as
1-14-02, can be used as the collation order for these characters. I will attach
a surrogate-pair list with this data.
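
As a rough sketch of that idea, a kuten string can be turned into a numeric sort key in Python. This assumes the three hyphen-separated fields are plane, row, and cell; apart from 1-14-02, the kuten values and the labels below are made-up placeholders, not entries from the chart.

```python
def kuten_key(kuten):
    """Split a kuten such as '1-14-02' into a numeric (plane, row, cell) tuple."""
    plane, row, cell = (int(part) for part in kuten.split("-"))
    return (plane, row, cell)


# Placeholder entries: (kuten, label). Only "1-14-02" appears in this issue.
entries = [("2-01-05", "C"), ("1-14-02", "A"), ("1-15-01", "B")]

# Sort numerically rather than as strings so row and cell numbers compare
# correctly even if they are not zero-padded.
ordered = sorted(entries, key=lambda entry: kuten_key(entry[0]))
```

Ordering by kuten gives code-chart order, which is not necessarily a linguistically meaningful collation order.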
Comment 6 naoyuki 2007-10-31 09:56:14 UTC
Created attachment 49288
ja surrogate character list with kuten
Comment 7 karl.hong 2007-12-01 03:08:07 UTC
Hi Naoyuki, what we need to update is the charset collator data:

http://l10n.openoffice.org/source/browse/*checkout*/l10n/i18npool/source/collator/data/ja_charset.txt?rev=1.2

I think the surrogate characters need to be inserted into that list. Can I just
use the list at

http://w3.kcua.ac.jp/~fujiwara/jis2000/jis2004/jisx0213-2004.html

to replace ja_charset.txt?
Comment 8 naoyuki 2007-12-04 04:17:09 UTC
Hi Karl,

I looked at the current ja_charset.txt and the data at w3.kcua.ac.jp. Please do
not replace the current ja_charset.txt with the data from w3.kcua.ac.jp. I
believe the current ja_charset.txt has a more reasonable order, at least for the
Kana part. I do not know where ja_charset.txt came from, but if we can get an
updated version (covering JIS X 0213:2004) from the same source, then we should
update to it. If we can't, I feel we should leave it as is for now. I do not see
any benefit in breaking backward compatibility, and I have not been able to find
collation data on the net for the added surrogate-pair characters that OOo
could use.

Comment 9 karl.hong 2007-12-08 06:05:52 UTC
Since we don't have collation data for those surrogate pairs, I will close this
issue as WONTFIX.
Comment 10 karl.hong 2008-04-28 22:11:53 UTC
Closing.