Unicode 6.0 Sorting
Mountain View, CA, USA – October 29, 2010 – Unicode Technical Standard #10, the Unicode Collation Algorithm (UCA), has been updated for Unicode Version 6.0. The update adds support for the 2,088 new characters in sorting, searching, and matching. This release also includes new data files that support the Unicode Common Locale Data Repository (CLDR), which provides customizations for different languages.
Reorderable Categories. The data files for CLDR order characters strictly by certain major categories. This allows programmers to parametrically reorder these groups of characters to put them in the desired order for different languages. For example, numbers can be ordered after letters, or Cyrillic before Latin. The reorderable categories are:
whitespace, punctuation, general symbols, currency symbols, and numbers, then Latin, Greek, Coptic, Cyrillic, ..., Egyptian Hieroglyphs, and finally, CJK.
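Parametric reordering can be pictured as a sort key that first ranks each character by its group, then compares within the group. The Python below is a minimal sketch under stated assumptions. The helper names (`group_of`, `make_key`, `DEFAULT_ORDER`, `CUSTOM_ORDER`) are hypothetical. The grouping is a rough stand-in built from the general category and the character-name prefix, not the actual CLDR data. The within-group comparison is plain case folding rather than UCA weights.

```python
import unicodedata

def group_of(ch):
    """Rough stand-in for the CLDR reorderable groups (an assumption, not CLDR data)."""
    cat = unicodedata.category(ch)
    if cat.startswith("Z"):
        return "space"
    if cat.startswith("P"):
        return "punct"
    if cat == "Sc":                      # currency must be tested before other symbols
        return "currency"
    if cat.startswith("S"):
        return "symbol"
    if cat.startswith("N"):
        return "digit"
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "GREEK", "CYRILLIC"):
        if name.startswith(script):
            return script
    return "other"

# Group order following the list above: whitespace, punctuation, symbols,
# currency, numbers, then scripts.
DEFAULT_ORDER = ["space", "punct", "symbol", "currency", "digit",
                 "LATIN", "GREEK", "CYRILLIC", "other"]

# A hypothetical tailoring: Cyrillic before Latin, numbers after letters.
CUSTOM_ORDER = ["space", "punct", "symbol", "currency",
                "CYRILLIC", "LATIN", "GREEK", "digit", "other"]

def make_key(order):
    """Build a sort key that ranks each character by its group's position in `order`."""
    rank = {group: i for i, group in enumerate(order)}
    def key(s):
        return tuple((rank[group_of(c)], c.casefold()) for c in s)
    return key

words = ["яблоко", "apple", "42", "βήτα"]
print(sorted(words, key=make_key(DEFAULT_ORDER)))  # digits, Latin, Greek, Cyrillic
print(sorted(words, key=make_key(CUSTOM_ORDER)))   # Cyrillic, Latin, Greek, digits
```

Only the `order` list changes between the two sorts. The character data stays fixed, which is the point of making the groups reorderable.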
Distinguishing Symbols from Punctuation. UCA provides an option for ignoring certain characters when comparing strings. By default, these are whitespace, punctuation, and general symbols. The data files for CLDR modify that default so that symbols are compared significantly, while still ignoring whitespace and punctuation. Thus, for example, "I♥NY" is not sorted the same as "I☠NY".
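The difference between the two defaults can be sketched by deciding which characters take part in the first-level comparison. This Python is a toy model, not the UCA's variable-weighting machinery. The names `is_variable` and `first_level_key` are hypothetical, and character classes come from the Unicode general category (Z* for whitespace, P* for punctuation, S* for symbols). A real implementation would still distinguish the ignored characters at a later comparison level.

```python
import unicodedata

def is_variable(ch, ignore_symbols):
    """Whitespace (Z*) and punctuation (P*) are always ignorable here;
    symbols (S*) are ignorable only when ignore_symbols is True."""
    cat = unicodedata.category(ch)
    if cat[0] in ("Z", "P"):
        return True
    return ignore_symbols and cat[0] == "S"

def first_level_key(s, ignore_symbols=False):
    """Toy first-level key: drop ignorable characters and case-fold the rest.
    ignore_symbols=False models the CLDR behavior; True models the older default."""
    return tuple(c.casefold() for c in s if not is_variable(c, ignore_symbols))

# CLDR-style: symbols are significant, so the two strings differ.
print(first_level_key("I♥NY") == first_level_key("I☠NY"))
# Older default: symbols are ignored as well, so the two strings tie.
print(first_level_key("I♥NY", True) == first_level_key("I☠NY", True))
# Whitespace and punctuation are ignored in both modes.
print(first_level_key("I NY") == first_level_key("I-NY"))
```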
Special Database Values. The data files for CLDR provide special weights for two noncharacters:
1. A special noncharacter (U+FFFF) for specifying a range in a database, allowing "Sch" ≤ X ≤ "Sch\uFFFF" to select all strings that start with "sch", plus those that sort equivalently.
2. A special noncharacter (U+FFFE) for separating merged database fields, allowing "Disílva\uFFFEJohn" to sort next to "Disilva\uFFFEJohn".
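Both special values can be illustrated with a toy two-level sort key. In this key, U+FFFF is assumed to get a weight above every real character, and U+FFFE a weight below every real character. The Python below is a sketch under those assumptions and does not use the actual CLDR weights. The primary level is case-folded base letters and the secondary level records accents. The names `sort_key`, `prefix_range`, `field_by_field`, and `merged` are hypothetical.

```python
import unicodedata

HIGH = "\uffff"  # range upper bound: weighted above every real character
SEP = "\ufffe"   # merged-field separator: weighted below every real character

def sort_key(s):
    """Toy two-level key: primary = case-folded base letters, secondary = accents."""
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFD", s):
        if ch == HIGH:
            primary.append(0x110001)      # greater than any ord(c) + 1 below
            secondary.append(0)
        elif ch == SEP:
            primary.append(0)             # less than any real primary weight
            secondary.append(0)
        elif unicodedata.combining(ch):
            if secondary:
                secondary[-1] += ord(ch)  # attach the accent to the preceding letter
        else:
            for c in ch.casefold():
                primary.append(ord(c) + 1)  # +1 keeps real weights above SEP's 0
                secondary.append(1)
    return tuple(primary), tuple(secondary)

def prefix_range(rows, prefix):
    """Select rows with prefix <= key <= prefix + U+FFFF, as in a database range query."""
    lo, hi = sort_key(prefix), sort_key(prefix + HIGH)
    return [r for r in rows if lo <= sort_key(r) <= hi]

def field_by_field(record):
    """Compare the last name at all levels before looking at the first name."""
    (pl, sl), (pf, sf) = sort_key(record[0]), sort_key(record[1])
    return pl, sl, pf, sf

def merged(record):
    """Compare the whole record level by level, with U+FFFE marking the field boundary."""
    return sort_key(record[0] + SEP + record[1])

rows = ["Schule", "schön", "Sch", "Scala", "Science", "Schwarz"]
print(prefix_range(rows, "Sch"))   # keeps Schule, schön, Sch, Schwarz

records = [("Disílva", "John"), ("Disilva", "Zoe"), ("Disilva", "John")]
print(sorted(records, key=field_by_field))  # the accented Disílva lands after Disilva Zoe
print(sorted(records, key=merged))          # Disílva John sorts next to Disilva John
```

Because the separator weighs less than any letter, a shorter last name still sorts before a longer one that extends it (for example "Dis" before "Disa"), regardless of the first names that follow.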
The version of CLDR using these new data files is planned for release at the start of December, 2010.
The text of the UCA standard has been clarified in several areas. Implementers should pay special attention to the changes regarding ill-formed sequences, noncharacters, and unassigned code points in CJK blocks.
For more information, see:
* The UCA Standard 6.0.0: http://www.unicode.org/reports/tr10/
* The UCA charts: http://unicode.org/charts/collation/
* The UCA data: http://unicode.org/Public/UCA/6.0.0/
* Merged database fields: http://unicode.org/reports/tr10/#Interleaved_Levels
About The Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry.
Members are: Adobe, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, The Society for Natural Language Technology Research, SAP, The University of California (Berkeley), The University of California (Santa Cruz), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium. http://www.unicode.org/contacts.html