Contents of /tags/mkinitrd-6_1_5/busybox/docs/keep_data_small.txt
Revision 899
Wed Aug 5 17:52:52 2009 UTC (15 years, 1 month ago) by niro
File MIME type: text/plain
File size: 6757 byte(s)
tagged 'mkinitrd-6_1_5'
Keeping data small

When many applets are compiled into busybox, the rw data and bss
of all applets are concatenated, including those from libc if a
static busybox is built. When busybox is started, _all_ of this data
is allocated, not just the part belonging to the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, since rwdata and bss
occupy real RAM. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-mem systems
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

Small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

bash-3.2# nmeter '%t %c %m %p %[pn]'
23:17:28 .......... 168M 0 147
23:17:29 .......... 168M 0 147
23:17:30 U......... 168M 1 147
23:17:31 SU........ 181M 244 391
23:17:32 SSSSUUU... 223M 757 1147
23:17:33 UUU....... 223M 0 1147
23:17:34 U......... 223M 1 1147
23:17:35 .......... 223M 0 1147
23:17:36 .......... 223M 0 1147
23:17:37 S......... 223M 0 1147
23:17:38 .......... 223M 1 1147
23:17:39 .......... 223M 0 1147
23:17:40 .......... 223M 0 1147
23:17:41 .......... 210M 0 906
23:17:42 .......... 168M 1 147
23:17:43 .......... 168M 0 147

This requires 55M of memory (223M - 168M). Thus 1 trivial busybox applet
takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

i=1000; while test $i != 0; do
    echo -n .
    busybox sleep 10 &
    i=$((i - 1))
done
echo
wait

(Data from NOMMU arches are sought. Provide 'size busybox' output too)


Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

/* This is somewhat complex-looking arrangement, but it allows
 * to place decompressor state either in bss or in
 * malloc'ed space simply by changing #defines below.
 * Sizes on i386:
 * text    data     bss     dec     hex
 * 5256       0     108    5364    14f4 - bss
 * 4915       0       0    4915    1333 - malloc
 */
#define STATE_IN_BSS 0
#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
Required memory is allocated in unpack_gz_stream() [its main function]
and then passed down as a parameter to all subroutines which need
to access 'globals'.
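
The same pattern in a standalone form, as a rough sketch. This is not
the code from decompress_unzip.c: struct state, add_chunk() and
sum_bytes() are made-up names, and calloc() stands in for whatever
allocator the real code uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call state. In a naive version these would be
 * file-scope statics, occupying bss no matter which applet runs. */
struct state {
    unsigned char window[256]; /* scratch buffer */
    unsigned total;            /* running sum of all bytes */
    unsigned count;            /* number of bytes seen */
};

/* Subroutine: gets the state as a parameter instead of using globals. */
static void add_chunk(struct state *st, const unsigned char *p, size_t n)
{
    while (n > 0) {
        size_t part = n < sizeof(st->window) ? n : sizeof(st->window);
        size_t i;

        memcpy(st->window, p, part);
        for (i = 0; i < part; i++)
            st->total += st->window[i];
        st->count += (unsigned)part;
        p += part;
        n -= part;
    }
}

/* Top-level function: allocates the state, passes it down, frees it. */
unsigned sum_bytes(const unsigned char *data, size_t len)
{
    struct state *st = calloc(1, sizeof(*st));
    unsigned result;

    assert(st != NULL); /* sketch only: no real error handling */
    add_chunk(st, data, len);
    result = st->total;
    free(st);
    return result;
}
```

The module has no data or bss of its own; memory is used only while
sum_bytes() is running.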


Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but the struct globals is
NOT defined in libbb.h. You first define your own struct:

    struct globals { int a; char buf[1000]; };

and then define G to dereference ptr_to_globals as a pointer to it:

    #define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

    SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically it is done in <applet>_main().

Now you can reference "globals" by G.a, G.buf and so on, in any function.


bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced buffer:

    #define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

    if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
        BUG_<applet>_globals_too_big();

The condition is a compile-time constant. If the globals fit, the compiler
discards the call. If they don't, the call to the (never defined) function
stays in and the build fails at link time.
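
With a C11 compiler a similar check can be expressed with static_assert.
A sketch, not busybox code: common_buf is a made-up stand-in for
bb_common_bufsiz1, the struct fields and the two accessors are
hypothetical, and overlaying a struct on a char buffer relies on the
compiler tolerating the cast, the same as the #define above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for bb_common_bufsiz1: one shared buffer in bss,
 * aligned so that any struct can be overlaid on it. */
static _Alignas(max_align_t) char common_buf[BUFSIZ + 1];

struct globals {
    int dev_fd;
    unsigned sector;
    char name[64];
};

/* The build stops here if the struct outgrows the buffer,
 * e.g. after switching to a libc with a smaller BUFSIZ. */
static_assert(sizeof(struct globals) <= sizeof(common_buf),
              "globals do not fit into common_buf");

#define G (*(struct globals *)&common_buf)

/* Made-up accessors, only to show G in use. */
void set_sector(unsigned n)
{
    G.sector = n;
}

unsigned get_sector(void)
{
    return G.sector;
}
```

Unlike the BUG_ function trick, a failure here is reported by the
compiler with the given message, not by the linker.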


Drawbacks

You have to initialize the globals by hand. xzalloc() can be helpful in
clearing allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

    #define dev_fd (G.dev_fd)
    #define sector (G.sector)


Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


gcc's data alignment problem

The following attribute, added in vi.c:

    static int tabstop;
    static struct termios term_orig __attribute__ ((aligned (4)));
    static struct termios term_vi __attribute__ ((aligned (4)));

reduces bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. asm output diff for the above example:

 tabstop:
        .zero   4
        .section        .bss.term_orig,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_orig, @object
        .size   term_orig, 60
 term_orig:
        .zero   60
        .section        .bss.term_vi,"aw",@nobits
-       .align 32
+       .align 4
        .type   term_vi, @object
        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

gcc 3.4.3 and 4.1.1 tested:

char c = 1;
// gcc aligns to 32 bytes if sizeof(struct) >= 32
struct {
    int a,b,c,d;
    int i1,i2,i3;
} s28 = { 1 };    // struct will be aligned to 4 bytes
struct {
    int a,b,c,d;
    int i1,i2,i3,i4;
} s32 = { 1 };    // struct will be aligned to 32 bytes
// same for arrays
char vc31[31] = { 1 }; // unaligned
char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but probably
will break layout of many libc structs) but s32 and vc32
are still aligned to 32 bytes.
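
To see what a given compiler does without reading asm output, the
addresses can be inspected at run time. A sketch: actual_alignment()
and report_alignment() are made-up helpers, and s32_a4 is an extra
object added to show the effect of the attribute. An object can land
on a more aligned address by luck, so only a result smaller than 32
proves that the 32-byte alignment was not applied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Largest power of two (capped at 4096) that divides the address. */
unsigned long actual_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    unsigned long align = 1;

    assert(p != NULL);
    while (align < 4096 && (addr & align) == 0)
        align <<= 1;
    return align;
}

/* Same shapes as in the test above. */
struct { int a, b, c, d; int i1, i2, i3; } s28 = { 1 };
struct { int a, b, c, d; int i1, i2, i3, i4; } s32 = { 1 };
/* Same as s32, but with the attribute used in vi.c. */
struct { int a, b, c, d; int i1, i2, i3, i4; }
    s32_a4 __attribute__ ((aligned (4))) = { 1 };
char vc31[31] = { 1 };
char vc32[32] = { 1 };

/* Prints one line per object; the numbers depend on compiler and arch. */
void report_alignment(void)
{
    printf("s28:    %lu\n", actual_alignment(&s28));
    printf("s32:    %lu\n", actual_alignment(&s32));
    printf("s32_a4: %lu\n", actual_alignment(&s32_a4));
    printf("vc31:   %lu\n", actual_alignment(vc31));
    printf("vc32:   %lu\n", actual_alignment(vc32));
}
```

Call report_alignment() from main() and compare the output between
compiler versions and options.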

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source
(sizes and alignments there are in bits, so 256 means 32 bytes):

gcc/config/i386/i386.c
int
ix86_data_alignment (tree type, int align)
{
#if 0
  if (AGGREGATE_TYPE_P (type)
       && TYPE_SIZE (type)
       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
    return 256;
#endif

Result (non-static busybox built against glibc):

# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
   text    data     bss     dec     hex filename
 634416    2736   23856  661008   a1610 busybox
 632580    2672   22944  658196   a0b14 busybox_noalign