Magellan Linux

Annotation of /alx-src/trunk/kernel26-alx/linux/Documentation/MSI-HOWTO.txt

Revision 628
Wed Mar 4 10:48:58 2009 UTC by niro
File MIME type: text/plain
File size: 23033 byte(s)
import linux sources based on 2.6.12-alx-r9:
 -using linux-2.6.12.6
 -using 2.6.12-ck6 patch set
 -using fbsplash-0.9.2-r3
 -using vesafb-tng-0.9-rc7
 -using squashfs-2.2
 -added cddvd-cmdfilter-drop.patch as ck dropped it
 -added via-epia-dri (cle266) patch
 -added zd1211-svn-32 wlan driver (http://zd1211.ath.cx/download/)
 -added debian patches to zd1211 for wep256 etc

1 niro 628 The MSI Driver Guide HOWTO
2     Tom L Nguyen tom.l.nguyen@intel.com
3     10/03/2003
4     Revised Feb 12, 2004 by Martine Silbermann
5     email: Martine.Silbermann@hp.com
6     Revised Jun 25, 2004 by Tom L Nguyen
7    
8     1. About this guide
9    
10     This guide describes the basics of Message Signaled Interrupts (MSI),
11     the advantages of using MSI over traditional interrupt mechanisms,
12     and how to enable your driver to use MSI or MSI-X. Also included is
13     a list of Frequently Asked Questions.
14    
15     2. Copyright 2003 Intel Corporation
16    
17     3. What is MSI/MSI-X?
18    
19     Message Signaled Interrupt (MSI), as described in the PCI Local Bus
20     Specification Revision 2.3 or later, is an optional feature for PCI
21     devices and a required feature for PCI Express devices. MSI enables a
22     device function to request service by sending an Inbound Memory Write
23     on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
24     Because MSI is generated in the form of a Memory Write, all transaction
25     conditions, such as a Retry, Master-Abort, Target-Abort or normal
26     completion, are supported.
27    
28     A PCI device that supports MSI must also support the pin IRQ assertion
29     interrupt mechanism to provide backward compatibility for systems that
30     do not support MSI. In systems that support MSI, the bus driver is
31     responsible for initializing the message address and message data of
32     the device function's MSI/MSI-X capability structure during device
33     initial configuration.
34    
35     An MSI capable device function indicates MSI support by implementing
36     the MSI/MSI-X capability structure in its PCI capability list. The
37     device function may implement both the MSI capability structure and
38     the MSI-X capability structure; however, the bus driver should not
39     enable both.
40    
41     The MSI capability structure contains the Message Control register, the
42     Message Address register and the Message Data register. These registers
43     provide the bus driver control over MSI. The Message Control register
44     indicates the MSI capability supported by the device. The Message
45     Address register specifies the target address and the Message Data
46     register specifies the characteristics of the message. To request
47     service, the device function writes the content of the Message Data
48     register to the target address. The device and its software driver
49     are prohibited from writing to these registers.
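As an illustration of the Message Control register described above, the
following sketch decodes its fields in plain C. The bit positions follow the
MSI capability layout of the PCI specification; the macro, struct and
function names are hypothetical and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of the 16-bit MSI Message Control register.
 * The macro names are illustrative, not the kernel's. */
#define MSI_CTRL_ENABLE   0x0001u /* bit 0: MSI enable */
#define MSI_CTRL_MMC      0x000eu /* bits 3:1: Multiple Message Capable */
#define MSI_CTRL_MME      0x0070u /* bits 6:4: Multiple Message Enable */
#define MSI_CTRL_64BIT    0x0080u /* bit 7: 64-bit address capable */
#define MSI_CTRL_MASKBIT  0x0100u /* bit 8: per-vector masking capable */

struct msi_ctrl_info {
	bool enabled;
	unsigned int vectors_capable; /* vectors the function can request */
	unsigned int vectors_enabled; /* vectors granted by system software */
	bool addr64;
	bool per_vector_mask;
};

/* Decode a raw Message Control value. The MMC and MME fields are log2
 * encoded: 0 means 1 vector, 5 means 32 vectors. Encodings 6 and 7 are
 * reserved and are not checked here. */
static struct msi_ctrl_info msi_decode_ctrl(uint16_t ctrl)
{
	struct msi_ctrl_info info;

	info.enabled = (ctrl & MSI_CTRL_ENABLE) != 0;
	info.vectors_capable = 1u << ((ctrl & MSI_CTRL_MMC) >> 1);
	info.vectors_enabled = 1u << ((ctrl & MSI_CTRL_MME) >> 4);
	info.addr64 = (ctrl & MSI_CTRL_64BIT) != 0;
	info.per_vector_mask = (ctrl & MSI_CTRL_MASKBIT) != 0;
	return info;
}
```

A function that advertises 32 vectors is still granted only one by this
version of the kernel, so vectors_enabled would normally decode to 1.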
50    
51     The MSI-X capability structure is an optional extension to MSI. It
52     uses an independent and separate capability structure. There are
53     some key advantages to implementing the MSI-X capability structure
54     over the MSI capability structure as described below.
55    
56     - Support a larger maximum number of vectors per function.
57    
58     - Provide the ability for system software to configure
59     each vector with an independent message address and message
60     data, specified by a table that resides in Memory Space.
61    
62     - MSI and MSI-X both support per-vector masking. Per-vector
63     masking is an optional extension of MSI but a required
64     feature for MSI-X. Per-vector masking provides the kernel
65     the ability to mask/unmask MSI when servicing its software
66     interrupt service routine handler. If per-vector masking is
67     not supported, then the device driver should provide the
68     hardware/software synchronization to ensure that the device
69     generates MSI when the driver wants it to do so.
70    
71     4. Why use MSI?
72    
73     MSI simplifies board design by allowing board designers to remove
74     out-of-band interrupt routing. MSI is another
75     step towards a legacy-free environment.
76    
77     Due to increasing pressure on chipset and processor packages to
78     reduce pin count, the need for interrupt pins is expected to
79     diminish over time. Devices, due to pin constraints, may implement
80     messages to increase performance.
81    
82     PCI Express endpoints use INTx emulation (in-band messages) instead
83     of IRQ pin assertion. Using INTx emulation requires interrupt
84     sharing among devices connected to the same node (PCI bridge) while
85     MSI is unique (non-shared) and does not require BIOS configuration
86     support. As a result, the PCI Express technology requires MSI
87     support for better interrupt performance.
88    
89     Using MSI enables the device functions to support two or more
90     vectors, which can be configured to target different CPUs to
91     increase scalability.
92    
93     5. Configuring a driver to use MSI/MSI-X
94    
95     By default, the kernel will not enable MSI/MSI-X on all devices that
96     support this capability. The CONFIG_PCI_MSI kernel option
97     must be selected to enable MSI/MSI-X support.
98    
99     5.1 Including MSI/MSI-X support into the kernel
100    
101     To allow MSI/MSI-X capable device drivers to selectively enable
102     MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
103     below), the VECTOR based scheme needs to be enabled by setting
104     CONFIG_PCI_MSI during kernel config.
105    
106     Since the target of the inbound message is the local APIC,
107     CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
108    
109     5.2 Configuring for MSI support
110    
111     Due to the non-contiguous fashion in vector assignment of the
112     existing Linux kernel, this version does not support multiple
113     messages regardless of whether a device function is capable of
114     supporting more than one vector. Enabling MSI on a device function's
115     MSI capability structure requires a device driver to call the function
116     pci_enable_msi() explicitly.
117    
118     5.2.1 API pci_enable_msi
119    
120     int pci_enable_msi(struct pci_dev *dev)
121    
122     With this new API, any existing device driver that would like to have
123     MSI enabled on its device function must call this API to enable MSI.
124     A successful call will initialize the MSI capability structure
125     with ONE vector, regardless of whether a device function is
126     capable of supporting multiple messages. This vector replaces the
127     pre-assigned dev->irq with a new MSI vector. To avoid a conflict
128     between the newly assigned vector and the existing pre-assigned
129     vector, a device driver must call this API before calling request_irq().
130    
131     5.2.2 API pci_disable_msi
132    
133     void pci_disable_msi(struct pci_dev *dev)
134    
135     This API should always be used to undo the effect of pci_enable_msi()
136     when a device driver is unloading. This API restores dev->irq with
137     the pre-assigned IOAPIC vector and switches a device's interrupt
138     mode to PCI pin-irq assertion/INTx emulation mode.
139    
140     Note that a device driver should always call free_irq() on the MSI
141     vector it has done request_irq() on before calling this API. Failure to
142     do so results in a BUG_ON(), and the device will be left with MSI
143     enabled and will leak its vector.
144    
145     5.2.3 MSI mode vs. legacy mode diagram
146    
147     The diagram below shows the events that switch the interrupt
148     mode on the MSI-capable device function between MSI mode and
149     PIN-IRQ assertion mode.
150    
151     ------------   pci_enable_msi    --------------------------
152     |          | <===============    |                        |
153     | MSI MODE |                     | PIN-IRQ ASSERTION MODE |
154     |          | ===============>    |                        |
155     ------------   pci_disable_msi   --------------------------
156    
157    
158     Figure 1.0 MSI Mode vs. Legacy Mode
159    
160     In Figure 1.0, a device operates by default in legacy mode. Legacy
161     in this context means PCI pin-irq assertion or PCI-Express INTx
162     emulation. A successful MSI request (using pci_enable_msi()) switches
163     a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
164     stored in dev->irq will be saved by the PCI subsystem and a new
165     assigned MSI vector will replace dev->irq.
166    
167     To return to its default mode, a device driver should always call
168     pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
169     device driver should always call free_irq() on the MSI vector it has
170     done request_irq() on before calling pci_disable_msi(). Failure to do
171     so results in a BUG_ON(), and the device will be left with MSI enabled
172     and will leak its vector. Otherwise, the PCI subsystem restores a
173     device's dev->irq with a pre-assigned IOAPIC vector and marks the
174     released MSI vector as unused.
175    
176     Once marked as unused, there is no guarantee that the PCI
177     subsystem will reserve this MSI vector for a device. Depending on
178     the availability of current PCI vector resources and the number of
179     MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
180    
181     For the case where the PCI subsystem re-assigned this MSI vector
182     to another driver, a request to switch back to MSI mode may result
183     in being assigned a different MSI vector or a failure if no more
184     vectors are available.
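The mode switch and the required call order can be sketched as a small
userspace model. This is not kernel code: struct model_dev and every
model_* function are hypothetical stand-ins for struct pci_dev,
pci_enable_msi(), request_irq(), free_irq() and pci_disable_msi(), written
only to show how dev->irq is swapped and restored.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct pci_dev. */
struct model_dev {
	unsigned int irq;       /* value a driver would pass to request_irq() */
	unsigned int saved_irq; /* pre-assigned IOAPIC vector saved in MSI mode */
	bool msi_mode;
	bool irq_requested;     /* set between request_irq() and free_irq() */
};

/* Models pci_enable_msi(); msi_vector == 0 stands for "no vector available". */
static int model_enable_msi(struct model_dev *dev, unsigned int msi_vector)
{
	if (dev->msi_mode || msi_vector == 0)
		return -1;
	dev->saved_irq = dev->irq; /* the PCI subsystem saves the old vector */
	dev->irq = msi_vector;     /* and replaces dev->irq with the MSI vector */
	dev->msi_mode = true;
	return 0;
}

static void model_request_irq(struct model_dev *dev)
{
	dev->irq_requested = true;
}

static void model_free_irq(struct model_dev *dev)
{
	dev->irq_requested = false;
}

/* Models pci_disable_msi(). The real API returns void and hits BUG_ON()
 * if the vector is still requested; this model returns -1 instead. */
static int model_disable_msi(struct model_dev *dev)
{
	if (!dev->msi_mode || dev->irq_requested)
		return -1;
	dev->irq = dev->saved_irq; /* back to the pre-assigned IOAPIC vector */
	dev->msi_mode = false;
	return 0;
}
```

In this model, as in the text, the only valid teardown order is
model_free_irq() followed by model_disable_msi().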
185    
186     5.3 Configuring for MSI-X support
187    
188     Due to the ability of the system software to configure each vector of
189     the MSI-X capability structure with an independent message address
190     and message data, the non-contiguous fashion in vector assignment of
191     the existing Linux kernel has no impact on supporting multiple
192     messages on MSI-X capable device functions. Enabling MSI-X on
193     a device function's MSI-X capability structure requires its device
194     driver to call the function pci_enable_msix() explicitly.
195    
196     The function pci_enable_msix(), once invoked, enables either
197     all or nothing, depending on the current availability of PCI vector
198     resources. If the PCI vector resources are available for the number
199     of vectors requested by a device driver, this function will configure
200     the MSI-X table of the MSI-X capability structure of a device with
201     requested messages. For example, a device may be capable of
202     supporting a maximum of 32 vectors while its software driver
203     may usually request only 4 vectors. It is recommended
204     that the device driver call this function once during the
205     initialization phase of the device driver.
206    
207     Unlike the function pci_enable_msi(), the function pci_enable_msix()
208     does not replace the pre-assigned IOAPIC dev->irq with a new MSI
209     vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
210     into the field vector of each element contained in a second argument.
211     Note that the pre-assigned IO-APIC dev->irq is valid only if the device
212     operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt by
213     the device driver to use dev->irq to request interrupt service
214     may result in unpredictable behavior.
215    
216     For each MSI-X vector granted, a device driver is responsible for calling
217     other functions like request_irq(), enable_irq(), etc. to enable
218     this vector with its corresponding interrupt service handler. It is
219     a device driver's choice to assign all vectors with the same
220     interrupt service handler or each vector with a unique interrupt
221     service handler.
222    
223     5.3.1 Handling MMIO address space of MSI-X Table
224    
225     The PCI 3.0 specification has implementation notes stating that the MMIO
226     address space for a device's MSI-X structure should be isolated so that
227     the software system can set different pages for controlling accesses to
228     the MSI-X structure. The implementation of the MSI patch requires the PCI
229     subsystem, not a device driver, to maintain full control of the MSI-X
230     table/MSI-X PBA and MMIO address space of the MSI-X table/MSI-X PBA.
231     A device driver is prohibited from requesting the MMIO address space
232     of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem will fail
233     to enable MSI-X on its hardware device when it calls the function
234     pci_enable_msix().
235    
236     5.3.2 Handling MSI-X allocation
237    
238     Determining the number of MSI-X vectors allocated to a function is
239     dependent on the number of MSI capable devices and MSI-X capable
240     devices populated in the system. The policy of allocating MSI-X
241     vectors to a function is defined as follows:
242    
243     Number of MSI-X vectors allocated to a function = (x - y)/z, where
244    
245     x = The number of available PCI vector resources by the time
246     the device driver calls pci_enable_msix(). The PCI vector
247     resources are the sum of the number of unassigned vectors
248     (new) and the number of released vectors when any MSI/MSI-X
249     device driver switches its hardware device back to a legacy
250     mode or is hot-removed. The number of unassigned vectors
251     may exclude some vectors reserved, as defined in parameter
252     NR_HP_RESERVED_VECTORS, for the case where the system is
253     capable of supporting hot-add/hot-remove operations. Users
254     may change the value defined in NR_HP_RESERVED_VECTORS to
255     meet their specific needs.
256    
257     y = The number of MSI capable devices populated in the system.
258     This policy ensures that each MSI capable device has its
259     vector reserved to avoid the case where some MSI-X capable
260     drivers may attempt to claim all available vector resources.
261    
262     z = The number of MSI-X capable devices populated in the system.
263     This policy ensures that maximum (x - y) is distributed
264     evenly among MSI-X capable devices.
265    
266     Note that the PCI subsystem scans y and z during a bus enumeration.
267     When the PCI subsystem completes configuring MSI/MSI-X capability
268     structure of a device as requested by its device driver, y/z is
269     decremented accordingly.
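The allocation policy above can be written out as a short worked example in
C. The function name is hypothetical, and two details are assumptions not
stated in the text: the division truncates toward zero, and the result is 0
when there are no MSI-X capable devices or when nothing remains after the
MSI reservation.

```c
/* Number of MSI-X vectors allocated to one function under the
 * (x - y)/z policy.
 *   x: available PCI vector resources when pci_enable_msix() is called
 *   y: number of MSI capable devices (one vector reserved for each)
 *   z: number of MSI-X capable devices sharing what is left
 * Integer truncation and the zero results are assumptions of this sketch. */
static int msix_vectors_per_function(int x, int y, int z)
{
	if (z <= 0 || x <= y)
		return 0;
	return (x - y) / z;
}
```

For instance, with x = 100 available vectors, y = 4 MSI capable devices and
z = 3 MSI-X capable devices, each MSI-X function would be allocated
(100 - 4)/3 = 32 vectors.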
270    
271     5.3.3 Handling MSI-X shortages
272    
273     For the case where fewer MSI-X vectors are allocated to a function
274     than requested, the function pci_enable_msix() will return the
275     maximum number of MSI-X vectors available to the caller. A device
276     driver may re-send its request with a number of vectors less than or
277     equal to the returned value. For example, if a device driver requests
278     5 vectors, but only 3 vectors are available, the pci_enable_msix()
279     call will return a value of 3. A function could be
280     designed for its driver to use only 3 MSI-X table entries in
281     different combinations such as ABC--, A-B-C, A--CB, etc. Note that this
282     patch does not support multiple entries with the same vector. Any
283     attempt by a device driver to use 5 MSI-X table entries with 3 vectors
284     as ABBCC, AABCC, BCCBA, etc. will result in a failure of the function
285     pci_enable_msix(). Below are the reasons why supporting multiple
286     entries with the same vector is an undesirable solution.
287    
288     - The PCI subsystem cannot determine which entry generated the
289     message, and therefore which entry to mask/unmask, while handling
290     the software driver ISR. Attempting to walk through all MSI-X
291     table entries (2048 max) to mask/unmask any matching vector
292     is an undesirable solution.
293    
294     - Walking through all MSI-X table entries (2048 max) to handle
295     SMP affinity of any matching vector is an undesirable solution.
296    
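The shortage handling described in this section can be sketched as a retry
loop. The code below is a userspace illustration only:
mock_pci_enable_msix() is a hypothetical stand-in that imitates the
documented return values (zero, a positive shortage count, or a negative
failure) and omits the dev argument; enable_msix_with_retry() and the vector
numbers it produces are likewise invented for the example.

```c
#include <stdint.h>

/* Layout of the struct shown in section 5.3.4, with u16 spelled uint16_t. */
struct msix_entry {
	uint16_t vector; /* written by the (mock) PCI subsystem */
	uint16_t entry;  /* chosen by the driver */
};

/* Hypothetical pool of free vectors used by the mock below. */
static int mock_free_vectors = 3;

/* Mock of pci_enable_msix(): enables all requested vectors or none. */
static int mock_pci_enable_msix(struct msix_entry *entries, int nvec)
{
	int i;

	if (nvec <= 0 || mock_free_vectors == 0)
		return -1;                /* failure: nothing available */
	if (nvec > mock_free_vectors)
		return mock_free_vectors; /* shortage: nothing is enabled */
	for (i = 0; i < nvec; i++)
		entries[i].vector = (uint16_t)(100 + i); /* arbitrary numbers */
	mock_free_vectors -= nvec;
	return 0;
}

/* Returns the number of vectors enabled, or a negative value on failure. */
static int enable_msix_with_retry(struct msix_entry *entries, int nvec)
{
	int rc;

	while ((rc = mock_pci_enable_msix(entries, nvec)) > 0)
		nvec = rc;                /* retry with the reported maximum */
	return rc == 0 ? nvec : rc;
}
```

A driver that asks for 5 vectors while 3 are free gets 3 back from the first
call and succeeds on the second, using only the first 3 elements of its
entries array.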
297     5.3.4 API pci_enable_msix
298    
299     int pci_enable_msix(struct pci_dev *dev, u32 *entries, int nvec)
300    
301     This API enables a device driver to request the PCI subsystem
302     for enabling MSI-X messages on its hardware device. Depending on
303     the availability of PCI vectors resources, the PCI subsystem enables
304     either all or nothing.
305    
306     Argument dev points to the device (pci_dev) structure.
307    
308     Argument entries is a pointer of unsigned integer type. The number of
309     elements is indicated in argument nvec. The content of each element
310     will be mapped to the following struct defined in /driver/pci/msi.h.
311    
312     struct msix_entry {
313             u16 vector; /* kernel uses to write alloc vector */
314             u16 entry; /* driver uses to specify entry */
315     };
316    
317     A device driver is responsible for initializing the field entry of
318     each element with a unique entry supported by the MSI-X table. Otherwise,
319     -EINVAL will be returned as a result. A successful return of zero
320     indicates the PCI subsystem completed initializing each of the requested
321     entries of the MSI-X table with message address and message data.
322     Last but not least, the PCI subsystem will write the 1:1
323     vector-to-entry mapping into the field vector of each element. A
324     device driver is responsible for keeping track of allocated MSI-X
325     vectors in its internal data structure.
326    
327     Argument nvec is an integer indicating the number of messages
328     requested.
329    
330     A return of zero indicates that the requested number of MSI-X vectors
331     was successfully allocated. A return of greater than zero indicates
332     an MSI-X vector shortage. A return of less than zero indicates
333     a failure. This failure may be a result of duplicate entries
334     specified in the second argument, or a result of no available vector,
335     or a result of failing to initialize MSI-X table entries.
336    
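The uniqueness rule for the entry field can be illustrated with a small
pre-check that a driver might run on its own array before making the
request. check_unique_entries() is a hypothetical helper, not part of the
kernel API; it only mirrors the documented -EINVAL result for duplicate
entries.

```c
#include <errno.h>
#include <stdint.h>

/* Layout of struct msix_entry as shown above, with u16 spelled uint16_t. */
struct msix_entry {
	uint16_t vector;
	uint16_t entry;
};

/* Returns 0 if every element names a distinct MSI-X table entry,
 * or -EINVAL if any two elements share the same entry index. */
static int check_unique_entries(const struct msix_entry *entries, int nvec)
{
	int i, j;

	for (i = 0; i < nvec; i++)
		for (j = i + 1; j < nvec; j++)
			if (entries[i].entry == entries[j].entry)
				return -EINVAL;
	return 0;
}
```

The quadratic scan is acceptable here because drivers typically request only
a handful of vectors.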
337     5.3.5 API pci_disable_msix
338    
339     void pci_disable_msix(struct pci_dev *dev)
340    
341     This API should always be used to undo the effect of pci_enable_msix()
342     when a device driver is unloading. Note that a device driver should
343     always call free_irq() on all MSI-X vectors it has done request_irq()
344     on before calling this API. Failure to do so results in a BUG_ON(), and
345     the device will be left with MSI-X enabled and will leak its vectors.
346    
347     5.3.6 MSI-X mode vs. legacy mode diagram
348    
349     The diagram below shows the events that switch the interrupt
350     mode on the MSI-X capable device function between MSI-X mode and
351     PIN-IRQ assertion mode (legacy).
352    
353     --------------   pci_enable_msix(,,n)   --------------------------
354     |            | <====================    |                        |
355     | MSI-X MODE |                          | PIN-IRQ ASSERTION MODE |
356     |            | ====================>    |                        |
357     --------------   pci_disable_msix       --------------------------
358    
359     Figure 2.0 MSI-X Mode vs. Legacy Mode
360    
361     In Figure 2.0, a device operates by default in legacy mode. A
362     successful MSI-X request (using pci_enable_msix()) switches a
363     device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
364     stored in dev->irq will be saved by the PCI subsystem; however,
365     unlike MSI mode, the PCI subsystem will not replace dev->irq with an
366     assigned MSI-X vector because the PCI subsystem already writes the 1:1
367     vector-to-entry mapping into the field vector of each element
368     specified in the second argument.
369    
370     To return to its default mode, a device driver should always call
371     pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
372     a device driver should always call free_irq() on all MSI-X vectors it
373     has done request_irq() on before calling pci_disable_msix(). Failure
374     to do so results in a BUG_ON(), and the device will be left with MSI-X
375     enabled and will leak its vectors. Otherwise, the PCI subsystem switches a
376     device function's interrupt mode from MSI-X mode to legacy mode and
377     marks all allocated MSI-X vectors as unused.
378    
379     Once marked as unused, there is no guarantee that the PCI
380     subsystem will reserve these MSI-X vectors for a device. Depending on
381     the availability of current PCI vector resources and the number of
382     MSI/MSI-X requests from other drivers, these MSI-X vectors may be
383     re-assigned.
384    
385     For the case where the PCI subsystem re-assigned these MSI-X vectors
386     to another driver, a request to switch back to MSI-X mode may result
387     in being assigned another set of MSI-X vectors or a failure if no
388     more vectors are available.
389    
390     5.4 Handling a function implementing both MSI and MSI-X capabilities
391    
392     For the case where a function implements both MSI and MSI-X
393     capabilities, the PCI subsystem enables a device to run either in MSI
394     mode or MSI-X mode but not both. A device driver determines whether it
395     wants MSI or MSI-X enabled on its hardware device. Once a device
396     driver requests MSI, for example, it is prohibited from requesting
397     MSI-X; in other words, a device driver is not permitted to ping-pong
398     between MSI mode and MSI-X mode at run-time.
399    
400     5.5 Hardware requirements for MSI/MSI-X support
401     MSI/MSI-X support requires support from both system hardware and
402     individual hardware device functions.
403    
404     5.5.1 System hardware support
405     Since the target of the MSI address is the CPU's local APIC, enabling
406     MSI/MSI-X support in the Linux kernel is dependent on whether existing
407     system hardware supports the local APIC. Users should verify that their
408     system runs when CONFIG_X86_LOCAL_APIC=y.
409    
410     In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
411     however, in a UP environment, users must manually set
412     CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
413     CONFIG_PCI_MSI enables the VECTOR based scheme and
414     the option for MSI-capable device drivers to selectively enable
415     MSI/MSI-X.
416    
417     Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
418     vectors are newly allocated at runtime and MSI/MSI-X support does not
419     depend on BIOS support. This key independence enables MSI/MSI-X
420     support on future IOxAPIC-free platforms.
421    
422     5.5.2 Device hardware support
423     The hardware device function supports MSI by indicating the
424     MSI/MSI-X capability structure on its PCI capability list. By
425     default, this capability structure will not be initialized by
426     the kernel to enable MSI during the system boot. In other words,
427     the device function is running in its default pin assertion mode.
428     Note that in many cases hardware supporting MSI has bugs,
429     which may result in a system hang. The software driver of specific
430     MSI-capable hardware is responsible for deciding whether to call
431     pci_enable_msi or not. A return of zero indicates the kernel
432     successfully initialized the MSI/MSI-X capability structure of the
433     device function. The device function is now running in MSI/MSI-X mode.
434    
435     5.6 How to tell whether MSI/MSI-X is enabled on a device function
436    
437     At the driver level, a return of zero from the function call of
438     pci_enable_msi()/pci_enable_msix() indicates to a device driver that
439     its device function is initialized successfully and ready to run in
440     MSI/MSI-X mode.
441    
442     At the user level, users can use the command 'cat /proc/interrupts'
443     to display the vector allocated for a device and its interrupt
444     MSI/MSI-X mode ("PCI MSI"/"PCI MSIX"). The output below shows that MSI
445     mode is enabled on a SCSI Adaptec 39320D Ultra320.
446    
447                CPU0       CPU1
448       0:     324639          0   IO-APIC-edge   timer
449       1:       1186          0   IO-APIC-edge   i8042
450       2:          0          0   XT-PIC         cascade
451      12:       2797          0   IO-APIC-edge   i8042
452      14:       6543          0   IO-APIC-edge   ide0
453      15:          1          0   IO-APIC-edge   ide1
454     169:          0          0   IO-APIC-level  uhci-hcd
455     185:          0          0   IO-APIC-level  uhci-hcd
456     193:        138         10   PCI MSI        aic79xx
457     201:         30          0   PCI MSI        aic79xx
458     225:         30          0   IO-APIC-level  aic7xxx
459     233:         30          0   IO-APIC-level  aic7xxx
460     NMI:          0          0
461     LOC:     324553     325068
462     ERR:          0
463     MIS:          0
464    
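A user-level check along these lines could be automated by scanning each
line of /proc/interrupts for the mode strings quoted above. The sketch
below is a naive illustration that assumes the "PCI MSI" and "PCI MSIX"
labels appear exactly as in this document; the enum and function names are
hypothetical.

```c
#include <string.h>

enum irq_mode { IRQ_MODE_OTHER, IRQ_MODE_MSI, IRQ_MODE_MSIX };

/* Classify one line of /proc/interrupts by its interrupt type label.
 * "PCI MSIX" is tested first because it contains "PCI MSI" as a prefix. */
static enum irq_mode classify_interrupt_line(const char *line)
{
	if (strstr(line, "PCI MSIX") != NULL)
		return IRQ_MODE_MSIX;
	if (strstr(line, "PCI MSI") != NULL)
		return IRQ_MODE_MSI;
	return IRQ_MODE_OTHER;
}
```

Applied to the listing above, the two aic79xx lines (vectors 193 and 201)
would be classified as MSI and every other line as neither MSI nor MSI-X.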
465     6. FAQ
466    
467     Q1. Are there any limitations on using the MSI?
468    
469     A1. If the PCI device supports MSI and conforms to the
470     specification and the platform supports the APIC local bus,
471     then using MSI should work.
472    
473     Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
474     AMD processors)? In P3, IPIs are transmitted on the APIC local
475     bus and in P4 and Xeon they are transmitted on the system
476     bus. Are there any implications with this?
477    
478     A2. MSI support enables a PCI device to send an inbound
479     memory write (0xfeexxxxx as target address) on its PCI bus
480     directly to the FSB. Since the message address has a
481     redirection hint bit cleared, it should work.
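To make the 0xfeexxxxx target address concrete, the sketch below composes a
32-bit MSI message address using the x86 layout documented by Intel: bits
31:20 are fixed at 0xfee, bits 19:12 carry the destination APIC ID, bit 3 is
the redirection hint and bit 2 is the destination mode. The function and
macro names are hypothetical, and the kernel's own code may build the
address differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSI_ADDR_BASE 0xfee00000u /* fixed upper bits of the target address */

/* Compose an x86 MSI message address for the given destination APIC ID. */
static uint32_t msi_compose_address(uint8_t dest_apic_id,
				    bool redirection_hint,
				    bool logical_dest_mode)
{
	uint32_t addr = MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);

	if (redirection_hint)
		addr |= 1u << 3; /* RH bit; cleared in the case described in A2 */
	if (logical_dest_mode)
		addr |= 1u << 2; /* DM bit */
	return addr;
}
```

With the redirection hint cleared, a message for APIC ID 1 is written to
0xfee01000.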
482    
483     Q3. The target address 0xfeexxxxx will be translated by the
484     Host Bridge into an interrupt message. Are there any
485     limitations on the chipsets such as Intel 8xx, Intel e7xxx,
486     or VIA?
487    
488     A3. If these chipsets support an inbound memory write with
489     target address set as 0xfeexxxxx, in conformance with PCI
490     specification 2.3 or later, then it should work.
491    
492     Q4. From the driver point of view, if the MSI is lost because
493     errors occur during the inbound memory write, then the driver may
494     wait forever. Is there a mechanism for it to recover?
495    
496     A4. Since the target of the transaction is an inbound memory
497     write, all transaction termination conditions (Retry,
498     Master-Abort, Target-Abort, or normal completion) are
499     supported. A device sending an MSI must abide by all the PCI
500     rules and conditions regarding that inbound memory write. So,
501     if a retry is signaled it must retry, etc... We believe that
502     the recommendation for Abort is also a retry (refer to PCI
503     specification 2.3 or later).