Contents of /alx-src/tags/kernel26-2.6.12-alx-r9/Documentation/IPMI.txt
Revision 630
Wed Mar 4 11:03:09 2009 UTC (15 years, 6 months ago) by niro
File MIME type: text/plain
File size: 21237 byte(s)
Tag kernel26-2.6.12-alx-r9
1 | |
2 | The Linux IPMI Driver |
3 | --------------------- |
4 | Corey Minyard |
5 | <minyard@mvista.com> |
6 | <minyard@acm.org> |
7 | |
8 | The Intelligent Platform Management Interface, or IPMI, is a |
9 | standard for controlling intelligent devices that monitor a system. |
10 | It provides for dynamic discovery of sensors in the system and the |
11 | ability to monitor the sensors and be informed when the sensor's |
12 | values change or go outside certain boundaries. It also has a |
13 | standardized database for field-replaceable units (FRUs) and a watchdog |
14 | timer. |
15 | |
16 | To use this, you need an interface to an IPMI controller in your |
17 | system (called a Baseboard Management Controller, or BMC) and |
18 | management software that can use the IPMI system. |
19 | |
20 | This document describes how to use the IPMI driver for Linux. If you |
21 | are not familiar with IPMI itself, see the web site at |
22 | http://www.intel.com/design/servers/ipmi/index.htm. IPMI is a big |
23 | subject and I can't cover it all here! |
24 | |
25 | Configuration |
26 | ------------- |
27 | |
28 | The Linux IPMI driver is modular, which means you have to pick several |
29 | things to have it work right depending on your hardware. Most of |
30 | these are available in the 'Character Devices' menu. |
31 | |
32 | No matter what, you must pick 'IPMI top-level message handler' to use |
33 | IPMI. What you do beyond that depends on your needs and hardware. |
34 | |
35 | The message handler does not provide any user-level interfaces. |
36 | Kernel code (like the watchdog) can still use it. If you need access |
37 | from userland, you need to select 'Device interface for IPMI' if you |
38 | want access through a device driver. Another interface is also |
39 | available: you may select 'IPMI sockets' in the 'Networking Support' |
40 | main menu. This provides a socket interface to IPMI. You may select |
41 | both of these at the same time; they will both work together. |
42 | |
43 | The driver interface depends on your hardware. If you have a board |
44 | with a standard interface (these will generally be either "KCS", |
45 | "SMIC", or "BT"; consult your hardware manual), choose the 'IPMI SI |
46 | handler' option. A driver also exists for direct I2C access to the |
47 | IPMI management controller. Some boards support this, but it is |
48 | unknown if it will work on every board. For this, choose 'IPMI SMBus |
49 | handler', but be ready to try to do some figuring to see if it will |
50 | work. |
51 | |
52 | There is also a KCS-only driver interface supplied, but it is |
53 | deprecated in favor of the SI interface. |
54 | |
55 | You should generally enable ACPI on your system, as systems with IPMI |
56 | should have ACPI tables describing them. |
57 | |
58 | If you have a standard interface and the board manufacturer has done |
59 | their job correctly, the IPMI controller should be automatically |
60 | detected (via ACPI or SMBIOS tables) and should just work. Sadly, many |
61 | boards do not have this information. The driver attempts standard |
62 | defaults, but they may not work. If you fall into this situation, you |
63 | need to read the section below named 'The SI Driver' on how to |
64 | hand-configure your system. |
65 | |
66 | IPMI defines a standard watchdog timer. You can enable this with the |
67 | 'IPMI Watchdog Timer' config option. If you compile the driver into |
68 | the kernel, then via a kernel command-line option you can have the |
69 | watchdog timer start as soon as it initializes. It also has a lot |
70 | of other options; see the 'Watchdog' section below for more details. |
71 | Note that you can also have the watchdog continue to run if it is |
72 | closed (by default it is disabled on close). Go into the 'Watchdog |
73 | Cards' menu, enable 'Watchdog Timer Support', and enable the option |
74 | 'Disable watchdog shutdown on close'. |
75 | |
76 | |
77 | Basic Design |
78 | ------------ |
79 | |
80 | The Linux IPMI driver is designed to be very modular and flexible; you |
81 | only need to take the pieces you need and you can use it in many |
82 | different ways. Because of that, it's broken into many chunks of |
83 | code. These chunks are: |
84 | |
85 | ipmi_msghandler - This is the central piece of software for the IPMI |
86 | system. It handles all messages, message timing, and responses. The |
87 | IPMI users tie into this, and the IPMI physical interfaces (called |
88 | System Management Interfaces, or SMIs) also tie in here. This |
89 | provides the kernelland interface for IPMI, but does not provide an |
90 | interface for use by application processes. |
91 | |
92 | ipmi_devintf - This provides a userland IOCTL interface for the IPMI |
93 | driver, each open file for this device ties in to the message handler |
94 | as an IPMI user. |
95 | |
96 | ipmi_si - A driver for various system interfaces. This supports |
97 | KCS, SMIC, and may support BT in the future. Unless you have your own |
98 | custom interface, you probably need to use this. |
99 | |
100 | ipmi_smb - A driver for accessing BMCs on the SMBus. It uses the |
101 | I2C kernel driver's SMBus interfaces to send and receive IPMI messages |
102 | over the SMBus. |
103 | |
104 | af_ipmi - A network socket interface to IPMI. This doesn't take up |
105 | a character device in your system. |
106 | |
107 | Note that the KCS-only interface has been removed. |
108 | |
109 | Much documentation for the interface is in the include files. The |
110 | IPMI include files are: |
111 | |
112 | net/af_ipmi.h - Contains the socket interface. |
113 | |
114 | linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI. |
115 | |
116 | linux/ipmi_smi.h - Contains the interface for system management interfaces |
117 | (things that interface to IPMI controllers) to use. |
118 | |
119 | linux/ipmi_msgdefs.h - General definitions for base IPMI messaging. |
120 | |
121 | |
122 | Addressing |
123 | ---------- |
124 | |
125 | The IPMI addressing works much like IP addresses: you have an overlay |
126 | to handle the different address types. The overlay is: |
127 | |
128 | struct ipmi_addr |
129 | { |
130 | int addr_type; |
131 | short channel; |
132 | char data[IPMI_MAX_ADDR_SIZE]; |
133 | }; |
134 | |
135 | The addr_type determines what the address really is. The driver |
136 | currently understands two different types of addresses. |
137 | |
138 | "System Interface" addresses are defined as: |
139 | |
140 | struct ipmi_system_interface_addr |
141 | { |
142 | int addr_type; |
143 | short channel; |
144 | }; |
145 | |
146 | and the type is IPMI_SYSTEM_INTERFACE_ADDR_TYPE. This is used for talking |
147 | straight to the BMC on the current card. The channel must be |
148 | IPMI_BMC_CHANNEL. |
149 | |
150 | Messages that are destined to go out on the IPMB bus use the |
151 | IPMI_IPMB_ADDR_TYPE address type. The format is |
152 | |
153 | struct ipmi_ipmb_addr |
154 | { |
155 | int addr_type; |
156 | short channel; |
157 | unsigned char slave_addr; |
158 | unsigned char lun; |
159 | }; |
160 | |
161 | The "channel" here is generally zero, but some devices support more |
162 | than one channel; it corresponds to the channel as defined in the IPMI |
163 | spec. |
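
To make the overlay idea a little more concrete, here is a minimal sketch that fills in both address types through the generic struct ipmi_addr. The structures are local copies of the ones shown above, the numeric constants are assumptions based on my reading of linux/ipmi.h, and the helper names fill_bmc_addr and fill_ipmb_addr are made up for this example. Real code should include the header instead of redefining anything.

```c
#include <string.h>

/* Local stand-ins for the definitions in linux/ipmi.h.  The numeric
   values are assumptions for this sketch; use the header in real code. */
#define IPMI_MAX_ADDR_SIZE               32
#define IPMI_SYSTEM_INTERFACE_ADDR_TYPE  0x0c
#define IPMI_IPMB_ADDR_TYPE              0x01
#define IPMI_BMC_CHANNEL                 0xf

struct ipmi_addr {
	int   addr_type;
	short channel;
	char  data[IPMI_MAX_ADDR_SIZE];
};

struct ipmi_ipmb_addr {
	int           addr_type;
	short         channel;
	unsigned char slave_addr;
	unsigned char lun;
};

/* Hypothetical helper: address the BMC on the local system interface. */
static void fill_bmc_addr(struct ipmi_addr *addr)
{
	memset(addr, 0, sizeof(*addr));
	addr->addr_type = IPMI_SYSTEM_INTERFACE_ADDR_TYPE;
	addr->channel = IPMI_BMC_CHANNEL;
}

/* Hypothetical helper: address a device out on the IPMB bus. */
static void fill_ipmb_addr(struct ipmi_addr *addr, short channel,
			   unsigned char slave_addr, unsigned char lun)
{
	struct ipmi_ipmb_addr ipmb;

	memset(&ipmb, 0, sizeof(ipmb));
	ipmb.addr_type = IPMI_IPMB_ADDR_TYPE;
	ipmb.channel = channel;
	ipmb.slave_addr = slave_addr;
	ipmb.lun = lun;

	/* The specific type shares its first two fields with the overlay. */
	memset(addr, 0, sizeof(*addr));
	memcpy(addr, &ipmb, sizeof(ipmb));
}
```

Whoever receives the generic address looks at addr_type first and only then decides how to interpret the rest.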
164 | |
165 | |
166 | Messages |
167 | -------- |
168 | |
169 | Messages are defined as: |
170 | |
171 | struct ipmi_msg |
172 | { |
173 | unsigned char netfn; |
174 | unsigned char lun; |
175 | unsigned char cmd; |
176 | unsigned char *data; |
177 | int data_len; |
178 | }; |
179 | |
180 | The driver takes care of adding/stripping the header information. The |
181 | data portion is just the data to be sent (do NOT put addressing info |
182 | here) or the response. Note that the completion code of a response is |
183 | the first item in "data", it is not stripped out because that is how |
184 | all the messages are defined in the spec (and thus makes counting the |
185 | offsets a little easier :-). |
186 | |
187 | When using the IOCTL interface from userland, you must provide a block |
188 | of data for "data", fill it, and set data_len to the length of the |
189 | block of data, even when receiving messages. Otherwise the driver |
190 | will have no place to put the message. |
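
The two points above (supply a buffer before receiving, and find the completion code in the first data byte) can be sketched roughly as follows. The structure is a local copy of the one shown above, and prepare_receive and completion_code are hypothetical helpers for illustration, not part of the driver.

```c
#include <stddef.h>

/* Local copy of the message structure shown above. */
struct ipmi_msg {
	unsigned char  netfn;
	unsigned char  lun;
	unsigned char  cmd;
	unsigned char *data;
	int            data_len;
};

/* Hypothetical helper: give the driver somewhere to put a message. */
static void prepare_receive(struct ipmi_msg *msg, unsigned char *buf, int len)
{
	msg->data = buf;
	msg->data_len = len;
}

/* Hypothetical helper: return the completion code of a response, or -1
   if the response is too short to hold one.  The code is data[0]. */
static int completion_code(const struct ipmi_msg *rsp)
{
	if (rsp->data == NULL || rsp->data_len < 1)
		return -1;
	return rsp->data[0];
}
```

Because the completion code stays in place, the remaining response bytes sit at the same offsets the spec gives for them.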
191 | |
192 | Messages coming up from the message handler in kernelland will come in |
193 | as: |
194 | |
195 | struct ipmi_recv_msg |
196 | { |
197 | struct list_head link; |
198 | |
199 | /* The type of message as defined in the "Receive Types" |
200 | defines above. */ |
201 | int recv_type; |
202 | |
203 | ipmi_user_t *user; |
204 | struct ipmi_addr addr; |
205 | long msgid; |
206 | struct ipmi_msg msg; |
207 | |
208 | /* Call this when done with the message. It will presumably free |
209 | the message and do any other necessary cleanup. */ |
210 | void (*done)(struct ipmi_recv_msg *msg); |
211 | |
212 | /* Place-holder for the data, don't make any assumptions about |
213 | the size or existence of this, since it may change. */ |
214 | unsigned char msg_data[IPMI_MAX_MSG_LENGTH]; |
215 | }; |
216 | |
217 | You should look at the receive type and handle the message |
218 | appropriately. |
219 | |
220 | |
221 | The Upper Layer Interface (Message Handler) |
222 | ------------------------------------------- |
223 | |
224 | The upper layer of the interface provides the users with a consistent |
225 | view of the IPMI interfaces. It allows multiple SMI interfaces to be |
226 | addressed (because some boards actually have multiple BMCs on them) |
227 | and the user should not have to care what type of SMI is below them. |
228 | |
229 | |
230 | Creating the User |
231 | |
232 | To use the message handler, you must first create a user using |
233 | ipmi_create_user. The interface number specifies which SMI you want |
234 | to connect to, and you must supply callback functions to be called |
235 | when data comes in. The callback function can run at interrupt level, |
236 | so be careful using the callbacks. This also allows you to pass in a |
237 | piece of data, the handler_data, that will be passed back to you on |
238 | all calls. |
239 | |
240 | Once you are done, call ipmi_destroy_user() to get rid of the user. |
241 | |
242 | From userland, opening the device automatically creates a user, and |
243 | closing the device automatically destroys the user. |
244 | |
245 | |
246 | Messaging |
247 | |
248 | To send a message from kernel-land, the ipmi_request() call does |
249 | pretty much all message handling. Most of the parameters are |
250 | self-explanatory. However, it takes a "msgid" parameter. This is NOT |
251 | the sequence number of messages. It is simply a long value that is |
252 | passed back when the response for the message is returned. You may |
253 | use it for anything you like. |
254 | |
255 | Responses come back in the function pointed to by the ipmi_recv_hndl |
256 | field of the "handler" that you passed in to ipmi_create_user(). |
257 | Remember again, these may be running at interrupt level. Remember to |
258 | look at the receive type, too. |
259 | |
260 | From userland, you fill out an ipmi_req_t structure and use the |
261 | IPMICTL_SEND_COMMAND ioctl. For incoming stuff, you can use select() |
262 | or poll() to wait for messages to come in. However, you cannot use |
263 | read() to get them; you must call the IPMICTL_RECEIVE_MSG ioctl with the |
264 | ipmi_recv_t structure to actually get the message. Remember that you |
265 | must supply a pointer to a block of data in the msg.data field, and |
266 | you must fill in the msg.data_len field with the size of the data. |
267 | This gives the receiver a place to actually put the message. |
268 | |
269 | If the message cannot fit into the data you provide, you will get an |
270 | EMSGSIZE error and the driver will leave the data in the receive |
271 | queue. If you want to get it and have it truncate the message, use |
272 | the IPMICTL_RECEIVE_MSG_TRUNC ioctl. |
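
A rough sketch of the userland sequence described above (send, wait with poll(), then fetch with an ioctl) might look like this. It assumes the userspace copy of linux/ipmi.h is installed and that fd is an already opened IPMI device. The function name get_device_id, the msgid value, the 6 second poll timeout, and the netfn/cmd numbers for "Get Device ID" are my own choices for the example, so check them against the spec and header before relying on them.

```c
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ipmi.h>

/* Send "Get Device ID" to the local BMC and wait for the reply.
   Returns the number of response bytes placed in buf, or -1 on error. */
static int get_device_id(int fd, unsigned char *buf, unsigned short buflen)
{
	struct ipmi_system_interface_addr bmc;
	struct ipmi_req req;
	struct ipmi_recv rsp;
	struct ipmi_addr from;
	struct pollfd pfd;

	memset(&bmc, 0, sizeof(bmc));
	bmc.addr_type = IPMI_SYSTEM_INTERFACE_ADDR_TYPE;
	bmc.channel = IPMI_BMC_CHANNEL;

	memset(&req, 0, sizeof(req));
	req.addr = (unsigned char *)&bmc;
	req.addr_len = sizeof(bmc);
	req.msgid = 1;            /* handed back with the response */
	req.msg.netfn = 0x06;     /* assumed: Application request */
	req.msg.cmd = 0x01;       /* assumed: Get Device ID */
	req.msg.data = buf;       /* no request data, but a valid pointer */
	req.msg.data_len = 0;

	if (ioctl(fd, IPMICTL_SEND_COMMAND, &req) < 0)
		return -1;

	/* Wait for something to arrive; read() will not fetch it. */
	pfd.fd = fd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, 6000) <= 0)
		return -1;

	/* Supply both an address buffer and a data buffer. */
	memset(&rsp, 0, sizeof(rsp));
	rsp.addr = (unsigned char *)&from;
	rsp.addr_len = sizeof(from);
	rsp.msg.data = buf;
	rsp.msg.data_len = buflen;

	if (ioctl(fd, IPMICTL_RECEIVE_MSG, &rsp) < 0)
		return -1;

	return rsp.msg.data_len;
}
```

A careful caller would also compare rsp.msgid and rsp.recv_type with what it expects, since other messages can arrive on the same file descriptor.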
273 | |
274 | When you send a command (which is defined by the lowest-order bit of |
275 | the netfn per the IPMI spec) on the IPMB bus, the driver will |
276 | automatically assign the sequence number to the command and save the |
277 | command. If the response is not received in the IPMI-specified 5 |
278 | seconds, it will generate a response automatically saying the command |
279 | timed out. If an unsolicited response comes in (if it was after 5 |
280 | seconds, for instance), that response will be ignored. |
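
The command/response distinction mentioned above comes down to one bit. A tiny sketch, with helper names that are mine and not the driver's:

```c
/* Per the IPMI spec, request netfns are even and response netfns are odd. */
static int netfn_is_response(unsigned char netfn)
{
	return (netfn & 1) != 0;
}

/* The response netfn that pairs with a given request netfn. */
static unsigned char netfn_response_for(unsigned char request_netfn)
{
	return request_netfn | 1;
}
```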
281 | |
282 | In kernelland, after you receive a message and are done with it, you |
283 | MUST call ipmi_free_recv_msg() on it, or you will leak messages. Note |
284 | that you should NEVER mess with the "done" field of a message, that is |
285 | required to properly clean up the message. |
286 | |
287 | Note that when sending, there is an ipmi_request_supply_msgs() call |
288 | that lets you supply the smi and receive message. This is useful for |
289 | pieces of code that need to work even if the system is out of buffers |
290 | (the watchdog timer uses this, for instance). You supply your own |
291 | buffer and own free routines. This is not recommended for normal use, |
292 | though, since it is tricky to manage your own buffers. |
293 | |
294 | |
295 | Events and Incoming Commands |
296 | |
297 | The driver takes care of polling for IPMI events and receiving |
298 | commands (commands are messages that are not responses; they are |
299 | commands that other things on the IPMB bus have sent you). To receive |
300 | these, you must register for them; they will not automatically be sent |
301 | to you. |
302 | |
303 | To receive events, you must call ipmi_set_gets_events() and set the |
304 | "val" to non-zero. Any events that have been received by the driver |
305 | since startup will immediately be delivered to the first user that |
306 | registers for events. After that, if multiple users are registered |
307 | for events, they will all receive all events that come in. |
308 | |
309 | For receiving commands, you have to individually register commands you |
310 | want to receive. Call ipmi_register_for_cmd() and supply the netfn |
311 | and command name for each command you want to receive. Only one user |
312 | may be registered for each netfn/cmd, but different users may register |
313 | for different commands. |
314 | |
315 | From userland, equivalent IOCTLs are provided to do these functions. |
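
As an illustration of the userland side, the sketch below uses the two ioctls that I believe correspond to these calls in linux/ipmi.h, IPMICTL_SET_GETS_EVENTS_CMD and IPMICTL_REGISTER_FOR_CMD. The wrapper names are invented for the example, and fd is assumed to be an open IPMI device.

```c
#include <sys/ioctl.h>
#include <linux/ipmi.h>

/* Ask the driver to deliver events to this open IPMI device. */
static int enable_events(int fd)
{
	int val = 1;    /* non-zero turns event delivery on */

	return ioctl(fd, IPMICTL_SET_GETS_EVENTS_CMD, &val);
}

/* Register for one netfn/cmd pair.  Only one user may hold each pair,
   so expect this to fail if somebody else got there first. */
static int register_command(int fd, unsigned char netfn, unsigned char cmd)
{
	struct ipmi_cmdspec spec;

	spec.netfn = netfn;
	spec.cmd = cmd;
	return ioctl(fd, IPMICTL_REGISTER_FOR_CMD, &spec);
}
```

Both wrappers return -1 and leave the reason in errno on failure.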
316 | |
317 | |
318 | The Lower Layer (SMI) Interface |
319 | ------------------------------- |
320 | |
321 | As mentioned before, multiple SMI interfaces may be registered to the |
322 | message handler. Each of these is assigned an interface number when |
323 | it registers with the message handler. They are generally assigned |
324 | in the order they register, although if an SMI unregisters and then |
325 | another one registers, all bets are off. |
326 | |
327 | The ipmi_smi.h include file defines the interface for management |
328 | interfaces; see that for more details. |
329 | |
330 | |
331 | The SI Driver |
332 | ------------- |
333 | |
334 | The SI driver allows up to 4 KCS or SMIC interfaces to be configured |
335 | in the system. By default, the driver scans the ACPI tables for |
336 | interfaces, and if it doesn't find any it will attempt to register one KCS |
337 | interface at the spec-specified I/O port 0xca2 without interrupts. |
338 | You can change this at module load time (for a module) with: |
339 | |
340 | modprobe ipmi_si type=<type1>,<type2>... |
341 | ports=<port1>,<port2>... addrs=<addr1>,<addr2>... |
342 | irqs=<irq1>,<irq2>... trydefaults=[0|1] |
343 | regspacings=<sp1>,<sp2>,... regsizes=<size1>,<size2>,... |
344 | regshifts=<shift1>,<shift2>,... |
345 | slave_addrs=<addr1>,<addr2>,... |
346 | |
347 | Each of these except trydefaults is a list, the first item for the |
348 | first interface, second item for the second interface, etc. |
349 | |
350 | The type may be either "kcs", "smic", or "bt". If you leave it blank, it |
351 | defaults to "kcs". |
352 | |
353 | If you specify addrs as non-zero for an interface, the driver will |
354 | use the memory address given as the address of the device. This |
355 | overrides ports. |
356 | |
357 | If you specify ports as non-zero for an interface, the driver will |
358 | use the I/O port given as the device address. |
359 | |
360 | If you specify irqs as non-zero for an interface, the driver will |
361 | attempt to use the given interrupt for the device. |
362 | |
363 | trydefaults sets whether the standard IPMI interface at 0xca2 and |
364 | any interfaces specified by ACPI are tried. By default, the driver |
365 | tries them; set this value to zero to turn this off. |
366 | |
367 | The next three parameters have to do with register layout. The |
368 | registers used by the interfaces may not appear at successive |
369 | locations and they may not be in 8-bit registers. These parameters |
370 | allow the layout of the data in the registers to be more precisely |
371 | specified. |
372 | |
373 | The regspacings parameter gives the number of bytes between successive |
374 | register start addresses. For instance, if the regspacing is set to 4 |
375 | and the start address is 0xca2, then the address for the second |
376 | register would be 0xca6. This defaults to 1. |
377 | |
378 | The regsizes parameter gives the size of a register, in bytes. The |
379 | data used by IPMI is 8-bits wide, but it may be inside a larger |
380 | register. This parameter allows the read and write type to be specified. |
381 | It may be 1, 2, 4, or 8. The default is 1. |
382 | |
383 | Since the register size may be larger than 8 bits, the IPMI data may not |
384 | be in the lower 8 bits. The regshifts parameter gives the amount to shift |
385 | the data to get to the actual IPMI data. |
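
The register layout arithmetic can be written out as a short sketch. The function names are hypothetical, and I am assuming the shift is applied to the raw register value before masking off the low 8 bits, which is how I read the description above.

```c
#include <stdint.h>

/* Address of register number reg (counting from 0), given the start
   address and the regspacings value for the interface. */
static unsigned long reg_address(unsigned long base, unsigned int regspacing,
				 unsigned int reg)
{
	return base + (unsigned long)reg * regspacing;
}

/* Extract the 8-bit IPMI data from a wider raw register value using
   the regshifts value for the interface. */
static unsigned char reg_data(uint64_t raw, unsigned int regshift)
{
	return (unsigned char)((raw >> regshift) & 0xff);
}
```

With a start address of 0xca2 and a regspacing of 4, reg_address(0xca2, 4, 1) gives 0xca6, matching the example in the text.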
386 | |
387 | The slave_addrs specifies the IPMI address of the local BMC. This is |
388 | usually 0x20 and the driver defaults to that, but in case it's not, it |
389 | can be specified when the driver starts up. |
390 | |
391 | When compiled into the kernel, the addresses can be specified on the |
392 | kernel command line as: |
393 | |
394 | ipmi_si.type=<type1>,<type2>... |
395 | ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>... |
396 | ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1] |
397 | ipmi_si.regspacings=<sp1>,<sp2>,... |
398 | ipmi_si.regsizes=<size1>,<size2>,... |
399 | ipmi_si.regshifts=<shift1>,<shift2>,... |
400 | ipmi_si.slave_addrs=<addr1>,<addr2>,... |
401 | |
402 | It works the same as the module parameters of the same names. |
403 | |
404 | By default, the driver will attempt to detect any device specified by |
405 | ACPI, and if none of those then a KCS device at the spec-specified |
406 | 0xca2. If you want to turn this off, set the "trydefaults" option to |
407 | false. |
408 | |
409 | If you have high-res timers compiled into the kernel, the driver will |
410 | use them to provide much better performance. Note that if you do not |
411 | have high-res timers enabled in the kernel and you don't have |
412 | interrupts enabled, the driver will run VERY slowly. Don't blame me, |
413 | these interfaces suck. |
414 | |
415 | |
416 | The SMBus Driver |
417 | ---------------- |
418 | |
419 | The SMBus driver allows up to 4 SMBus devices to be configured in the |
420 | system. By default, the driver will register any SMBus interfaces it finds |
421 | in the I2C address range of 0x20 to 0x4f on any adapter. You can change this |
422 | at module load time (for a module) with: |
423 | |
424 | modprobe ipmi_smb |
425 | addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]] |
426 | dbg=<flags1>,<flags2>... |
427 | [defaultprobe=0] [dbg_probe=1] |
428 | |
429 | The addresses are specified in pairs, the first is the adapter ID and the |
430 | second is the I2C address on that adapter. |
431 | |
432 | The debug flags are bit flags for each BMC found. They are: |
433 | IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8 |
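
Since the debug value is a plain bit mask, flags are combined by OR-ing them together. A small sketch follows; the enum and helper names are invented here, and only the numeric values come from the list above.

```c
/* Hypothetical names for the four documented debug bits. */
enum smb_dbg_flag {
	SMB_DBG_MESSAGES = 1,
	SMB_DBG_STATE    = 2,
	SMB_DBG_TIMING   = 4,
	SMB_DBG_PROBE    = 8
};

/* True if the given bit is set in a dbg value. */
static int dbg_has(unsigned int dbg, enum smb_dbg_flag flag)
{
	return (dbg & flag) != 0;
}
```

For instance, asking for IPMI messages plus timing means passing 1 | 4, that is 5, for that BMC.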
434 | |
435 | Setting defaultprobe to zero disables the default probing of SMBus |
436 | interfaces at address range 0x20 to 0x4f. This means that only the |
437 | BMCs specified on the addr line will be detected. |
438 | |
439 | Setting dbg_probe to 1 will enable debugging of the probing and |
440 | detection process for BMCs on the SMBuses. |
441 | |
442 | Discovering the IPMI compliant BMC on the SMBus can cause devices |
443 | on the I2C bus to fail. The SMBus driver writes a "Get Device ID" IPMI |
444 | message as a block write to the I2C bus and waits for a response. |
445 | This action can be detrimental to some I2C devices. It is highly recommended |
446 | that the known I2C address be given to the SMBus driver in the addr |
447 | parameter. The default address range will not be used when an addr |
448 | parameter is provided. |
449 | |
450 | When compiled into the kernel, the addresses can be specified on the |
451 | kernel command line as: |
452 | |
453 | ipmi_smb.addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]] |
454 | ipmi_smb.dbg=<flags1>,<flags2>... |
455 | ipmi_smb.defaultprobe=0 ipmi_smb.dbg_probe=1 |
456 | |
457 | These are the same options as on the module command line. |
458 | |
459 | Note that you might need some I2C changes if CONFIG_IPMI_PANIC_EVENT |
460 | is enabled along with this, so the I2C driver knows to run to |
461 | completion during sending a panic event. |
462 | |
463 | |
464 | Other Pieces |
465 | ------------ |
466 | |
467 | Watchdog |
468 | -------- |
469 | |
470 | A watchdog timer is provided that implements the Linux-standard |
471 | watchdog timer interface. It has several module parameters that can be |
472 | used to control it: |
473 | |
474 | modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type> |
475 | preaction=<preaction type> preop=<preop type> start_now=x |
476 | nowayout=x |
477 | |
478 | The timeout is the number of seconds to the action, and the pretimeout |
479 | is the number of seconds before the reset that the pre-timeout panic will |
480 | occur (if pretimeout is zero, then pretimeout will not be enabled). Note |
481 | that the pretimeout is the time before the final timeout. So if the |
482 | timeout is 50 seconds and the pretimeout is 10 seconds, then the pretimeout |
483 | will occur in 40 seconds (10 seconds before the timeout). |
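
The relationship between the two values is simple subtraction, sketched here with a hypothetical helper. Treating a pretimeout that is not shorter than the timeout as invalid is my own assumption; the text above only says that zero disables it.

```c
/* Seconds from arming the watchdog until the pretimeout fires, or -1
   when the pretimeout is disabled (zero) or, by assumption, not
   shorter than the timeout. */
static int pretimeout_fires_at(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;
	return timeout - pretimeout;
}
```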
484 | |
485 | The action may be "reset", "power_cycle", or "power_off", and |
486 | specifies what to do when the timer times out, and defaults to |
487 | "reset". |
488 | |
489 | The preaction may be "pre_smi" for an indication through the SMI |
490 | interface, "pre_int" for an indication through the SMI with an |
491 | interrupt, and "pre_nmi" for an NMI on a preaction. This is how |
492 | the driver is informed of the pretimeout. |
493 | |
494 | The preop may be set to "preop_none" for no operation on a pretimeout, |
495 | "preop_panic" to set the preoperation to panic, or "preop_give_data" |
496 | to provide data to read from the watchdog device when the pretimeout |
497 | occurs. A "pre_nmi" setting CANNOT be used with "preop_give_data" |
498 | because you can't do data operations from an NMI. |
499 | |
500 | When preop is set to "preop_give_data", one byte comes ready to read |
501 | on the device when the pretimeout occurs. Select and fasync work on |
502 | the device, as well. |
503 | |
504 | If start_now is set to 1, the watchdog timer will start running as |
505 | soon as the driver is loaded. |
506 | |
507 | If nowayout is set to 1, the watchdog timer will not stop when the |
508 | watchdog device is closed. The default value of nowayout is true |
509 | if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not. |
510 | |
511 | When compiled into the kernel, the kernel command line is available |
512 | for configuring the watchdog: |
513 | |
514 | ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t> |
515 | ipmi_watchdog.action=<action type> |
516 | ipmi_watchdog.preaction=<preaction type> |
517 | ipmi_watchdog.preop=<preop type> |
518 | ipmi_watchdog.start_now=x |
519 | ipmi_watchdog.nowayout=x |
520 | |
521 | The options are the same as the module parameter options. |
522 | |
523 | The watchdog will panic and start a 120 second reset timeout if it |
524 | gets a pre-action. During a panic or a reboot, the watchdog will |
525 | start a 120 second timer if it is running to make sure the reboot occurs. |
526 | |
527 | Note that if you use the NMI preaction for the watchdog, you MUST |
528 | NOT use nmi watchdog mode 1. If you use the NMI watchdog, you |
529 | must use mode 2. |
530 | |
531 | Once you open the watchdog timer, you must write a 'V' character to the |
532 | device to close it, or the timer will not stop. This is a new semantic |
533 | for the driver, but makes it consistent with the rest of the watchdog |
534 | drivers in Linux. |
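
A minimal sketch of the close sequence, assuming fd is the open watchdog device and that nowayout is not set (otherwise the timer keeps running regardless). The function name is made up for this example.

```c
#include <unistd.h>

/* Write the 'V' character and then close the device so the timer stops.
   Returns 0 on success, -1 if the write or the close failed. */
static int watchdog_stop_and_close(int fd)
{
	int ok = (write(fd, "V", 1) == 1);

	if (close(fd) < 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

If the process dies without writing the 'V', the timer is left running, which is the point of the new semantic.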