Contents of /trunk/mkinitrd-magellan/busybox/docs/unicode.txt
Revision 1123 -
Wed Aug 18 21:56:57 2010 UTC (14 years, 1 month ago) by niro
File MIME type: text/plain
File size: 2362 byte(s)
-updated to busybox-1.17.1
Unicode support in busybox

There are several scenarios where we need to handle Unicode
correctly.

Shell input

We want to correctly handle input of Unicode characters.
There are several problems with it. Handling input as a plain
sequence of bytes would break editing. This was fixed,
and lineedit now operates on an array of wchar_t's.
But we also need to handle the following problems:

* It is unreasonable to expect that the output device supports
  _any_ Unicode chars. Perhaps we need to avoid printing
  those chars which are not supported by the output device.
  Examples: chars which are not present in the font,
  chars which are not assigned in Unicode,
  combining chars (especially trying to combine bad pairs:
  a_chinese_symbol + "combining grave accent" = ??!)

* We need to account for the fact that Unicode chars have
  different widths: 0 for combining chars, 1 for usual,
  2 for ideograms (are there 3+ wide chars?).

* Bidirectional handling. If a user wants to echo a phrase
  in Hebrew, they type: echo "srettel werbeH"

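The first two points above (filtering and widths) can be sketched
roughly as follows. This is an illustration in Python, not busybox
code: the names char_width, display_width, sanitize and UNSAFE are
hypothetical, the width rule is only an approximation built on the
Unicode character database, and the policy "keep a combining mark
only after a width-1 base char" is an assumption made for the example.
Whether the font actually has a glyph cannot be checked this way.

```python
import unicodedata

# Categories treated as unprintable in this sketch (an assumption):
# Cn = unassigned, Cc = control, Cs = surrogate, Co = private use.
UNSAFE = {"Cn", "Cc", "Cs", "Co"}

def char_width(ch):
    """Approximate number of terminal columns for one code point."""
    if unicodedata.category(ch) in ("Mn", "Me"):        # combining marks
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):  # wide / fullwidth
        return 2
    return 1

def display_width(text):
    """Sum of the per-character column estimates."""
    return sum(char_width(ch) for ch in text)

def sanitize(text, placeholder="?"):
    """Replace chars the output device is unlikely to render sensibly."""
    out = []
    base_width = 0  # width of the most recent base char, 0 if none
    for ch in text:
        if unicodedata.category(ch) in UNSAFE:
            out.append(placeholder)
            base_width = 0
        elif char_width(ch) == 0:
            # Keep a combining mark only after a narrow base char;
            # an ideogram + accent pair is replaced instead.
            out.append(ch if base_width == 1 else placeholder)
        else:
            out.append(ch)
            base_width = char_width(ch)
    return "".join(out)

print(display_width("a\u4e2d\u0300"))  # 'a', a CJK ideogram, a combining grave
print(sanitize("\u4e2d\u0300"))        # the bad pair from the text above
```

Bidirectional text is not addressed by this sketch.
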
Editors (vi, ed)

This case is a bit similar to "shell input", but unlike the shell,
editors may encounter many more unexpected Unicode sequences
(try to load a random binary file...), and they need to preserve
them, unlike the shell, which can afford to drop bogus input.

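One way to see what "preserve them" means is a byte-exact round
trip. The sketch below is in Python and is not how busybox editors
work internally; load_buffer and save_buffer are hypothetical names.
It relies on Python's "surrogateescape" error handler, which maps
each undecodable byte to a lone surrogate on input and back to the
same byte on output.

```python
def load_buffer(raw):
    """Decode file bytes for editing, keeping invalid bytes recoverable."""
    return raw.decode("utf-8", errors="surrogateescape")

def save_buffer(text):
    """Encode the edit buffer back to the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")

# Valid UTF-8 ("caf\u00e9") mixed with bytes that are not valid UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe \xc3"
text = load_buffer(raw)

# The valid part is editable as ordinary text ...
assert "caf\u00e9" in text
# ... and the bogus bytes survive a load/save cycle unchanged.
assert save_buffer(text) == raw
```
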
more, less

Need to correctly display any input file. Ideally, with an
ASCII/Unicode/filtered_Unicode option or keyboard switch.
Note: need to handle tabs and backspaces specially
(backspace handling is needed for manpage compatibility).

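A minimal sketch of the tab and backspace handling, in Python and
under simplifying assumptions: render and char_width are hypothetical
names, tab stops are assumed to be every 8 columns, and a backspace
simply discards the previous character. A real pager would more
likely turn the manpage overstrike sequences (c, backspace, c or
_, backspace, c) into bold or underline instead of dropping them.

```python
import unicodedata

def char_width(ch):
    """Approximate number of terminal columns for one code point."""
    if unicodedata.category(ch) in ("Mn", "Me"):        # combining marks
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):  # wide / fullwidth
        return 2
    return 1

def render(line, tabstop=8):
    """Return (visible_text, columns_used) for one input line."""
    out = []  # list of (text, columns) pieces, so backspace can undo one
    col = 0
    for ch in line:
        if ch == "\b":
            if out:
                col -= out.pop()[1]      # drop the previous piece
        elif ch == "\t":
            pad = tabstop - col % tabstop  # distance to the next tab stop
            out.append((" " * pad, pad))
            col += pad
        else:
            w = char_width(ch)
            out.append((ch, w))
            col += w
    return "".join(piece for piece, _ in out), col

# Manpage-style bold: each letter is struck twice.
print(render("N\bNA\bAM\bME\bE"))
# A wide char before a tab: the tab pads from column 2, not column 1.
print(render("\u4e2d\tx"))
```
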
cut, fold, watch

May need the ability to cut a Unicode string to a specified number
of wchars and/or to a specified screen width. Need to handle tabs
specially.

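The difference between cutting by character count and cutting by
screen width can be sketched as below. This is Python for
illustration only; cut_to_width and char_width are hypothetical
names, the width rule is an approximation, and tabs are not handled.

```python
import unicodedata

def char_width(ch):
    """Approximate number of terminal columns for one code point."""
    if unicodedata.category(ch) in ("Mn", "Me"):        # combining marks
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):  # wide / fullwidth
        return 2
    return 1

def cut_to_width(text, max_cols):
    """Longest prefix of text that fits into max_cols screen columns."""
    out = []
    cols = 0
    for ch in text:
        w = char_width(ch)
        if cols + w > max_cols:
            break            # stop here so later marks do not attach
        out.append(ch)       # zero-width marks stay with their base char
        cols += w
    return "".join(out)

s = "\u65e5\u672c\u8a9eabc"   # three ideograms followed by "abc"
by_count = s[:5]              # 5 characters, but 8 columns on screen
by_width = cut_to_width(s, 5) # 2 ideograms: 4 columns, the third would not fit
print(len(by_count), len(by_width))
```
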
sed, awk, grep

Need to handle Unicode-aware regexp matching.

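What "Unicode-aware" changes in practice can be shown by running
the same pattern over raw bytes and over decoded characters. The
sketch uses Python's re module purely as an illustration; it says
nothing about how the busybox regexp code is or should be built.

```python
import re

word = "na\u00efve"         # 5 characters, one of them non-ASCII
raw = word.encode("utf-8")  # 6 bytes: the non-ASCII char takes two

# "." means "one character" on text but "one byte" on bytes.
assert re.fullmatch(r".{5}", word)
assert re.fullmatch(rb".{5}", raw) is None
assert re.fullmatch(rb".{6}", raw)

# "\w" knows about non-ASCII letters only in the character-based match.
assert re.fullmatch(r"\w+", word)
assert re.fullmatch(rb"\w+", raw) is None
```
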
ls (multi-column display)

ls will fail to line up columnar output if it does not account
for character widths (and maybe filter out some chars, see
above). OTOH, non-columnar views (ls -1, ls -l, ls | cat)
should NOT filter out bad Unicode (but they still need to filter
out control chars; coreutils does that). Note that unlike more/less,
tabs and backspaces need no special handling.

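Lining up columns by screen width rather than by character count
can be sketched as follows. This is Python for illustration only;
clean, pad, columns and the two width helpers are hypothetical
names, the width rule is an approximation, and replacing control
chars with "?" is an assumption about the filtering policy.

```python
import unicodedata

def char_width(ch):
    """Approximate number of terminal columns for one code point."""
    if unicodedata.category(ch) in ("Mn", "Me"):        # combining marks
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):  # wide / fullwidth
        return 2
    return 1

def display_width(text):
    return sum(char_width(ch) for ch in text)

def clean(name):
    """Replace control chars (category Cc) so they cannot move the cursor."""
    return "".join("?" if unicodedata.category(ch) == "Cc" else ch
                   for ch in name)

def pad(name, cols):
    """Pad with spaces up to cols screen columns, not cols characters."""
    return name + " " * max(0, cols - display_width(name))

def columns(names, gap=2):
    """Pad every name to the widest one plus a gap."""
    names = [clean(n) for n in names]
    width = max(display_width(n) for n in names) + gap
    return [pad(n, width) for n in names]

cells = columns(["\u65e5\u672c\u8a9e", "abc"])
# Both cells occupy the same number of columns, although they
# contain a different number of characters.
print([display_width(c) for c in cells], [len(c) for c in cells])
```
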
top, ps

Need to perform filtering similar to ls.

Filename display (in error messages and elsewhere)

Need to perform filtering similar to ls.


TODO: write an email to Asmus Freytag (asmus@unicode.org),
author of http://unicode.org/reports/tr11/