Magellan Linux

Annotation of /trunk/mkinitrd-magellan/busybox/docs/keep_data_small.txt

Revision 1123
Wed Aug 18 21:56:57 2010 UTC (13 years, 9 months ago) by niro
File MIME type: text/plain
File size: 8233 byte(s)
-updated to busybox-1.17.1
Keeping data small

When many applets are compiled into busybox, the rw data and bss
of every applet are concatenated, including those from libc if
busybox is built statically. When busybox is started, _all_ of this
data is allocated, not just the part used by the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, because rwdata and bss
occupy real RAM. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

A small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

    bash-3.2# nmeter '%t %c %m %p %[pn]'
    23:17:28 .......... 168M    0  147
    23:17:29 .......... 168M    0  147
    23:17:30 U......... 168M    1  147
    23:17:31 SU........ 181M  244  391
    23:17:32 SSSSUUU... 223M  757 1147
    23:17:33 UUU....... 223M    0 1147
    23:17:34 U......... 223M    1 1147
    23:17:35 .......... 223M    0 1147
    23:17:36 .......... 223M    0 1147
    23:17:37 S......... 223M    0 1147
    23:17:38 .......... 223M    1 1147
    23:17:39 .......... 223M    0 1147
    23:17:40 .......... 223M    0 1147
    23:17:41 .......... 210M    0  906
    23:17:42 .......... 168M    1  147
    23:17:43 .......... 168M    0  147

Memory usage rises from 168M to 223M, so the 1000 processes require
55M of memory. Thus one trivial busybox applet takes about 55k of
memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

    i=1000; while test $i != 0; do
        echo -n .
        busybox sleep 30 &
        i=$((i - 1))
    done
    echo
    wait

(Data from NOMMU arches is sought. Please provide 'size busybox' output too.)


Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

    /* This is somewhat complex-looking arrangement, but it allows
     * to place decompressor state either in bss or in
     * malloc'ed space simply by changing #defines below.
     * Sizes on i386:
     * text    data     bss     dec     hex
     * 5256       0     108    5364    14f4 - bss
     * 4915       0       0    4915    1333 - malloc
     */
    #define STATE_IN_BSS 0
    #define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [the module's
main function] and then passed down as a parameter to all subroutines
which need to access the 'globals'.

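A minimal sketch of this parameter-passing approach. The state_t layout
and the function names (count_byte, process_block, unpack_stream) are
made up for illustration and are not taken from decompress_unzip.c:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical decoder state; in the bss variant these would be
 * file-scope variables. */
typedef struct state_t {
	unsigned bit_buffer;
	unsigned bits_left;
	unsigned long bytes_out;
} state_t;

/* Subroutines receive the state explicitly instead of
 * touching file-scope variables. */
static void count_byte(state_t *state)
{
	assert(state != NULL);
	state->bytes_out++;
}

static void process_block(state_t *state, unsigned nbytes)
{
	while (nbytes--)
		count_byte(state);
}

/* The entry point allocates the state, passes it down, and frees it. */
static unsigned long unpack_stream(unsigned nblocks, unsigned block_size)
{
	unsigned long total;
	state_t *state = calloc(1, sizeof(*state));

	if (!state)
		return 0;
	while (nblocks--)
		process_block(state, block_size);
	total = state->bytes_out;
	free(state);
	return total;
}
```

With this arrangement the module contributes nothing to data or bss;
the cost is one extra argument on every call that needs the state.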

Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order not to duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but struct globals itself is
NOT defined in libbb.h. You first define your own struct:

    struct globals { int a; char buf[1000]; };

and then define a shorthand for dereferencing ptr_to_globals,
which now points to your struct:

    #define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

    SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically this is done in <applet>_main().

Now you can reference "globals" as G.a, G.buf and so on, in any function.

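A self-contained sketch of the 'G.' pattern. It does not use the real
libbb definitions: ptr_to_globals and SET_PTR_TO_GLOBALS are simplified
stand-ins (a plain, non-const pointer), calloc() stands in for
xzalloc(), and remember() and demo_main() are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Applet-private "globals", as in the text above. */
struct globals {
	int a;
	char buf[1000];
};

/* Simplified stand-ins for the libbb pointer and macro (assumption:
 * the real pointer is const and the real macro is more involved). */
static struct globals *ptr_to_globals;
#define SET_PTR_TO_GLOBALS(x) do { ptr_to_globals = (x); } while (0)
#define G (*ptr_to_globals)

/* Any function can reach the "globals" through G. */
static void remember(const char *msg)
{
	assert(ptr_to_globals != NULL);
	G.a++;
	/* buf was zeroed on allocation, so the copy stays terminated */
	strncpy(G.buf, msg, sizeof(G.buf) - 1);
}

/* Stand-in for <applet>_main(): allocate zeroed storage first. */
static int demo_main(const char *msg)
{
	struct globals *p = calloc(1, sizeof(G));

	if (!p)
		return -1;
	SET_PTR_TO_GLOBALS(p);
	remember(msg);
	remember(msg);
	/* storage is not freed here; the sketch leaves it to process exit */
	return G.a;
}
```

Here demo_main("hi") calls remember() twice, so G.a ends up as 2 and
G.buf holds "hi". The only object left in bss is the pointer itself.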

bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its own needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced buffer:

    #define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if the globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

    if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
        BUG_<applet>_globals_too_big();

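A sketch of overlaying the globals on a common buffer. The buffer here
is a local stand-in with the same name and size as in the text; its
alignment attribute, the struct fields, and next_sector() are
assumptions of this sketch. It also swaps the BUG_..._too_big() check
for C11 static_assert, which is a different mechanism with the same
intent. Like the cast in the text, it relies on the compiler tolerating
access to a char buffer through a struct pointer:

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the common buffer; alignment is an assumption so that
 * the struct overlay is suitably aligned. */
static char bb_common_bufsiz1[BUFSIZ + 1]
	__attribute__((aligned(sizeof(long long))));

/* Hypothetical applet globals. */
struct globals {
	int dev_fd;
	long sector;
};

/* Overlay the applet's globals on the common buffer. */
#define G (*(struct globals *)&bb_common_bufsiz1)

/* Compile-time size check (C11), used here instead of the
 * BUG_<applet>_globals_too_big() idiom shown above. */
static_assert(sizeof(struct globals) <= sizeof(bb_common_bufsiz1),
	"globals do not fit into bb_common_bufsiz1");

/* The buffer lives in bss, so the globals start out zeroed. */
static long next_sector(void)
{
	return ++G.sector;
}
```

No allocation is needed in this variant, but the applet must not use
bb_common_bufsiz1 for anything else while the globals live there.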

Drawbacks

You have to initialize the globals by hand. xzalloc() can be helpful in
clearing the allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes the code
less readable, use #defines:

    #define dev_fd (G.dev_fd)
    #define sector (G.sector)


Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


Finding non-shared duplicated strings

    strings busybox | sort | uniq -c | sort -nr


gcc's data alignment problem

The following attribute added in vi.c:

    static int tabstop;
    static struct termios term_orig __attribute__ ((aligned (4)));
    static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

     tabstop:
            .zero   4
            .section        .bss.term_orig,"aw",@nobits
    -       .align 32
    +       .align 4
            .type   term_orig, @object
            .size   term_orig, 60
     term_orig:
            .zero   60
            .section        .bss.term_vi,"aw",@nobits
    -       .align 32
    +       .align 4
            .type   term_vi, @object
            .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

gcc 3.4.3 and 4.1.1 tested:

    char c = 1;
    // gcc aligns to 32 bytes if sizeof(struct) >= 32
    struct {
        int a,b,c,d;
        int i1,i2,i3;
    } s28 = { 1 };    // struct will be aligned to 4 bytes
    struct {
        int a,b,c,d;
        int i1,i2,i3,i4;
    } s32 = { 1 };    // struct will be aligned to 32 bytes
    // same for arrays
    char vc31[31] = { 1 }; // unaligned
    char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces the alignment of s28 to 1 (but will probably
break the layout of many libc structs), but s32 and vc32
are still aligned to 32 bytes.

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c

    int
    ix86_data_alignment (tree type, int align)
    {
    #if 0
      if (AGGREGATE_TYPE_P (type)
          && TYPE_SIZE (type)
          && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
          && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
              || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
        return 256;
    #endif

Result (non-static busybox built against glibc):

    # size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
       text    data     bss     dec     hex filename
     634416    2736   23856  661008   a1610 busybox
     632580    2672   22944  658196   a0b14 busybox_noalign

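The alignment behaviour described in this section can be observed with
a small program. This is only a sketch: struct s32, the variable names,
and the helpers are made up, and the numbers printed depend on the
compiler, target, and options, so no expected output is given. An
address can also happen to be more aligned than was requested:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A 32-byte aggregate, like s32 in the test above. */
struct s32 {
	int32_t a, b, c, d;
	int32_t i1, i2, i3, i4;
};
static_assert(sizeof(struct s32) == 32, "sketch assumes a 32-byte struct");

/* Default placement: the compiler may over-align this object. */
static struct s32 plain = { 1 };
/* Explicit alignment request, as done for the termios structs in vi.c. */
static struct s32 capped __attribute__((aligned(4))) = { 1 };

/* Largest power of two that divides the address. */
static uintptr_t addr_alignment(const void *p)
{
	uintptr_t v = (uintptr_t)p;

	return v & (~v + 1);	/* isolates the lowest set bit */
}

/* Print what the compiler and linker actually did. */
static void report_alignment(void)
{
	printf("plain:  address aligned to %lu\n",
		(unsigned long)addr_alignment(&plain));
	printf("capped: address aligned to %lu\n",
		(unsigned long)addr_alignment(&capped));
}
```

Comparing the generated asm (gcc -S) with and without the attribute, as
in the diff above, is the more reliable way to see the .align value.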

Keeping code small

Set CONFIG_EXTRA_CFLAGS="-fno-inline-functions-called-once",
run "make bloatcheck", and look at the biggest auto-inlined functions.
Now set CONFIG_EXTRA_CFLAGS back to "", but add NOINLINE
to some of these functions. In the 1.16.x timeframe, the results were
(annotated "make bloatcheck" output):

    function                  old     new   delta
    expand_vars_to_list         -    1712   +1712   win
    lzo1x_optimize              -    1429   +1429   win
    arith_apply                 -    1326   +1326   win
    read_interfaces             -    1163   +1163   loss, leave w/o NOINLINE
    logdir_open                 -    1148   +1148   win
    check_deps                  -    1148   +1148   loss
    rewrite                     -    1039   +1039   win
    run_pipe                  358    1396   +1038   win
    write_status_file           -    1029   +1029   almost the same, leave w/o NOINLINE
    dump_identity               -     987    +987   win
    mainQSort3                  -     921    +921   win
    parse_one_line              -     916    +916   loss
    summarize                   -     897    +897   almost the same
    do_shm                      -     884    +884   win
    cpio_o                      -     863    +863   win
    subCommand                  -     841    +841   loss
    receive                     -     834    +834   loss

855 bytes saved in total.

scripts/mkdiff_obj_bloat may be useful to automate this process: run
"scripts/mkdiff_obj_bloat NORMALLY_BUILT_TREE FORCED_NOINLINE_TREE"
and select the modules which shrank.
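
A minimal sketch of what adding NOINLINE to a called-once function
looks like. The macro is defined locally with the gcc attribute so the
example stands alone, and expand_one() and count_vars() are
hypothetical names. Whether NOINLINE is a win for a given function
still has to be measured, as in the table above:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the NOINLINE macro, using the gcc attribute. */
#define NOINLINE __attribute__((__noinline__))

/* A helper with a single caller: gcc would normally inline it into
 * that caller. NOINLINE keeps it a separate function. */
static NOINLINE int expand_one(const char *s)
{
	int n = 0;

	assert(s != NULL);
	while (*s)
		n += (*s++ == '$');	/* count '$' characters */
	return n;
}

/* The only caller of expand_one(). */
static int count_vars(const char *line)
{
	return expand_one(line);
}
```

The behaviour is unchanged; only the generated code layout differs, so
"make bloatcheck" before and after is what tells a win from a loss.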