Contents of /alx-src/tags/kernel26-2.6.12-alx-r9/Documentation/README.DAC960
Revision 630 -
Wed Mar 4 11:03:09 2009 UTC (15 years, 6 months ago) by niro
File size: 34727 byte(s)
Tag kernel26-2.6.12-alx-r9
1 | Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers |
2 | |
3 | Version 2.2.11 for Linux 2.2.19 |
4 | Version 2.4.11 for Linux 2.4.12 |
5 | |
6 | PRODUCTION RELEASE |
7 | |
8 | 11 October 2001 |
9 | |
10 | Leonard N. Zubkoff |
11 | Dandelion Digital |
12 | lnz@dandelion.com |
13 | |
14 | Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com> |
15 | |
16 | |
17 | INTRODUCTION |
18 | |
19 | Mylex, Inc. designs and manufactures a variety of high performance PCI RAID |
20 | controllers. Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont, |
21 | California 94555, USA and can be reached at 510.796.6100 or on the World Wide |
22 | Web at http://www.mylex.com. Mylex Technical Support can be reached by |
23 | electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at |
24 | 510.745.7715. Contact information for offices in Europe and Japan is available |
25 | on their Web site. |
26 | |
27 | The latest information on Linux support for DAC960 PCI RAID Controllers, as |
28 | well as the most recent release of this driver, will always be available from |
29 | my Linux Home Page at URL "http://www.dandelion.com/Linux/". The Linux DAC960 |
30 | driver supports all current Mylex PCI RAID controllers including the new |
31 | eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models which have an entirely |
32 | new firmware interface from the older eXtremeRAID 1100, AcceleRAID 150/200/250, |
33 | and DAC960PJ/PG/PU/PD/PL. See below for a complete controller list as well as |
34 | minimum firmware version requirements. For simplicity, in most places this |
35 | documentation refers to DAC960 generically rather than explicitly listing all |
36 | the supported models. |
37 | |
38 | Driver bug reports should be sent via electronic mail to "lnz@dandelion.com". |
39 | Please include with the bug report the complete configuration messages reported |
40 | by the driver at startup, along with any subsequent system messages relevant to |
41 | the controller's operation, and a detailed description of your system's |
42 | hardware configuration. Driver bugs are actually quite rare; if you encounter |
43 | problems with disks being marked offline, for example, please contact Mylex |
44 | Technical Support as the problem is related to the hardware configuration |
45 | rather than the Linux driver. |
46 | |
47 | Please consult the RAID controller documentation for detailed information |
48 | regarding installation and configuration of the controllers. This document |
49 | primarily provides information specific to the Linux support. |
50 | |
51 | |
52 | DRIVER FEATURES |
53 | |
54 | The DAC960 RAID controllers are supported solely as high performance RAID |
55 | controllers, not as interfaces to arbitrary SCSI devices. The Linux DAC960 |
56 | driver operates at the block device level, the same level as the SCSI and IDE |
57 | drivers. Unlike other RAID controllers currently supported on Linux, the |
58 | DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the |
59 | complexity and unnecessary code that would be associated with an implementation |
60 | as a SCSI driver. The DAC960 driver is designed for as high a performance as |
61 | possible with no compromises or extra code for compatibility with lower |
62 | performance devices. The DAC960 driver includes extensive error logging and |
63 | online configuration management capabilities. Except for initial configuration |
64 | of the controller and adding new disk drives, almost everything can be handled |
65 | from Linux while the system is operational. |
66 | |
67 | The DAC960 driver is architected to support up to 8 controllers per system. |
68 | Each DAC960 parallel SCSI controller can support up to 15 disk drives per |
69 | channel, for a maximum of 60 drives on a four channel controller; the fibre |
70 | channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for |
71 | a total of 250 drives. The drives installed on a controller are divided into |
72 | one or more "Drive Groups", and then each Drive Group is subdivided further |
73 | into 1 to 32 "Logical Drives". Each Logical Drive has a specific RAID Level |
74 | and caching policy associated with it, and it appears to Linux as a single |
75 | block device. Logical Drives are further subdivided into up to 7 partitions |
76 | through the normal Linux and PC disk partitioning schemes. Logical Drives are |
77 | also known as "System Drives", and Drive Groups are also called "Packs". Both |
78 | terms are in use in the Mylex documentation; I have chosen to standardize on |
79 | the more generic "Logical Drive" and "Drive Group". |
80 | |
81 | DAC960 RAID disk devices are named in the style of the Device File System |
82 | (DEVFS). The device corresponding to Logical Drive D on Controller C is |
83 | referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1 |
84 | through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on |
85 | Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI |
86 | disks the device names will not change in the event of a disk drive failure. |
87 | The DAC960 driver is assigned major numbers 48 - 55 with one major number per |
88 | controller. The 8 bits of minor number are divided into 5 bits for the Logical |
89 | Drive and 3 bits for the partition. |
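
The numbering scheme above can be sketched as a small shell helper. This is
only an illustration: the function name rd_devnum is hypothetical, and it
assumes the Logical Drive number occupies the upper 5 bits of the minor number
and the partition the lower 3 bits, with partition 0 denoting the whole
Logical Drive.

```shell
# rd_devnum C D [P]: print the device name, major and minor number for
# Logical Drive D (0-31) on Controller C (0-7), partition P (0-7, default 0).
# Assumption: minor = D * 8 + P, i.e. drive in the upper 5 bits, partition
# in the lower 3 bits; partition 0 stands for the whole Logical Drive.
rd_devnum() {
    c=$1 d=$2 p=${3:-0}
    major=$((48 + c))
    minor=$((d * 8 + p))
    if [ "$p" -eq 0 ]; then
        name="/dev/rd/c${c}d${d}"
    else
        name="/dev/rd/c${c}d${d}p${p}"
    fi
    echo "$name $major $minor"
}

rd_devnum 2 5 3
```

Under these assumptions, partition 3 of Logical Drive 5 on Controller 2 comes
out as /dev/rd/c2d5p3 with major number 50 and minor number 43.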
90 | |
91 | |
92 | SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS |
93 | |
94 | The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID |
95 | PCI RAID Controllers as of the date of this document. It is recommended that |
96 | anyone purchasing a Mylex PCI RAID Controller not in the following table |
97 | contact the author beforehand to verify that it is or will be supported. |
98 | |
99 | eXtremeRAID 3000 |
100 | 1 Wide Ultra-2/LVD SCSI channel |
101 | 2 External Fibre FC-AL channels |
102 | 233MHz StrongARM SA 110 Processor |
103 | 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) |
104 | 32MB/64MB ECC SDRAM Memory |
105 | |
106 | eXtremeRAID 2000 |
107 | 4 Wide Ultra-160 LVD SCSI channels |
108 | 233MHz StrongARM SA 110 Processor |
109 | 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) |
110 | 32MB/64MB ECC SDRAM Memory |
111 | |
112 | AcceleRAID 352 |
113 | 2 Wide Ultra-160 LVD SCSI channels |
114 | 100MHz Intel i960RN RISC Processor |
115 | 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) |
116 | 32MB/64MB ECC SDRAM Memory |
117 | |
118 | AcceleRAID 170 |
119 | 1 Wide Ultra-160 LVD SCSI channel |
120 | 100MHz Intel i960RM RISC Processor |
121 | 16MB/32MB/64MB ECC SDRAM Memory |
122 | |
123 | AcceleRAID 160 (AcceleRAID 170LP) |
124 | 1 Wide Ultra-160 LVD SCSI channel |
125 | 100MHz Intel i960RS RISC Processor |
126 | Built in 16M ECC SDRAM Memory |
127 | PCI Low Profile Form Factor - fit for 2U height |
128 | |
129 | eXtremeRAID 1100 (DAC1164P) |
130 | 3 Wide Ultra-2/LVD SCSI channels |
131 | 233MHz StrongARM SA 110 Processor |
132 | 64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots) |
133 | 16MB/32MB/64MB Parity SDRAM Memory with Battery Backup |
134 | |
135 | AcceleRAID 250 (DAC960PTL1) |
136 | Uses onboard Symbios SCSI chips on certain motherboards |
137 | Also includes one onboard Wide Ultra-2/LVD SCSI Channel |
138 | 66MHz Intel i960RD RISC Processor |
139 | 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory |
140 | |
141 | AcceleRAID 200 (DAC960PTL0) |
142 | Uses onboard Symbios SCSI chips on certain motherboards |
143 | Includes no onboard SCSI Channels |
144 | 66MHz Intel i960RD RISC Processor |
145 | 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory |
146 | |
147 | AcceleRAID 150 (DAC960PRL) |
148 | Uses onboard Symbios SCSI chips on certain motherboards |
149 | Also includes one onboard Wide Ultra-2/LVD SCSI Channel |
150 | 33MHz Intel i960RP RISC Processor |
151 | 4MB Parity EDO Memory |
152 | |
153 | DAC960PJ 1/2/3 Wide Ultra SCSI-3 Channels |
154 | 66MHz Intel i960RD RISC Processor |
155 | 4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory |
156 | |
157 | DAC960PG 1/2/3 Wide Ultra SCSI-3 Channels |
158 | 33MHz Intel i960RP RISC Processor |
159 | 4MB/8MB ECC EDO Memory |
160 | |
161 | DAC960PU 1/2/3 Wide Ultra SCSI-3 Channels |
162 | Intel i960CF RISC Processor |
163 | 4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory |
164 | |
165 | DAC960PD 1/2/3 Wide Fast SCSI-2 Channels |
166 | Intel i960CF RISC Processor |
167 | 4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory |
168 | |
169 | DAC960PL 1/2/3 Wide Fast SCSI-2 Channels |
170 | Intel i960 RISC Processor |
171 | 2MB/4MB/8MB/16MB/32MB DRAM Memory |
172 | |
173 | DAC960P 1/2/3 Wide Fast SCSI-2 Channels |
174 | Intel i960 RISC Processor |
175 | 2MB/4MB/8MB/16MB/32MB DRAM Memory |
176 | |
177 | For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version |
178 | 6.00-01 or above is required. |
179 | |
180 | For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required. |
181 | |
182 | For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is |
183 | required. |
184 | |
185 | For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required. |
186 | |
187 | For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version |
188 | 3.51-0-04 or above is required (for dual Flash ROM controllers), or firmware |
189 | version 2.73-0-00 or above is required (for single Flash ROM controllers). |
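
A firmware version reported by the driver could be checked against these
minimums along the following lines. This is a rough sketch, not part of the
driver: the function name fw_at_least is hypothetical, and it assumes that
version strings are ordered by comparing their numeric fields from left to
right, which is not confirmed by Mylex documentation.

```shell
# fw_at_least FOUND MINIMUM: succeed if firmware version FOUND is at least
# MINIMUM. Assumption: versions such as 4.06-0-57 are ordered by comparing
# the numeric fields (split on "." and "-") from left to right.
fw_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[.-]/)
        nb = split(b, y, /[.-]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing fields count as 0; adding 0 forces a numeric compare.
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example: an AcceleRAID 150 reporting firmware 4.07-0-07.
fw_at_least 4.07-0-07 4.06-0-57 && echo "firmware OK"
```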
190 | |
191 | Please note that not all SCSI disk drives are suitable for use with DAC960 |
192 | controllers, and only particular firmware versions of any given model may |
193 | actually function correctly. Similarly, not all motherboards have a BIOS that |
194 | properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150, |
195 | DAC960PJ, and DAC960PG because the Intel i960RD/RP is a multi-function device. |
196 | If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to |
197 | verify compatibility. Mylex makes available a hard disk compatibility list at |
198 | http://www.mylex.com/support/hdcomp/hd-lists.html. |
199 | |
200 | |
201 | DRIVER INSTALLATION |
202 | |
203 | This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12. |
204 | |
205 | To install the DAC960 RAID driver, you may use the following commands, |
206 | replacing "/usr/src" with wherever you keep your Linux kernel source tree: |
207 | |
208 | cd /usr/src |
209 | tar -xvzf DAC960-2.2.11.tar.gz    # or DAC960-2.4.11.tar.gz |
210 | mv README.DAC960 linux/Documentation |
211 | mv DAC960.[ch] linux/drivers/block |
212 | patch -p0 < DAC960.patch          # only if DAC960.patch is included |
213 | cd linux |
214 | make config |
215 | make bzImage                      # or zImage |
216 | |
217 | Then install "arch/i386/boot/bzImage" or "arch/i386/boot/zImage" as your |
218 | standard kernel, run lilo if appropriate, and reboot. |
219 | |
220 | To create the necessary devices in /dev, the "make_rd" script included in |
221 | "DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used. |
222 | LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive |
223 | are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with |
224 | statically linked executables of LILO and FDISK. This modified version of LILO |
225 | will allow booting from a DAC960 controller and/or mounting the root file |
226 | system from a DAC960. |
227 | |
228 | Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID |
229 | controllers. Installing directly onto a DAC960 may be problematic from other |
230 | Linux distributions until their installation utilities are updated. |
231 | |
232 | |
233 | INSTALLATION NOTES |
234 | |
235 | Before installing Linux or adding DAC960 logical drives to an existing Linux |
236 | system, the controller must first be configured to provide one or more logical |
237 | drives using the BIOS Configuration Utility or DACCF. Please note that since |
238 | there are only at most 6 usable partitions on each logical drive, systems |
239 | requiring more partitions should subdivide a drive group into multiple logical |
240 | drives, each of which can have up to 6 usable partitions. Also, note that with |
241 | large disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63) |
242 | rather than accepting the default 2GB BIOS Geometry (128/32); failing to do so |
243 | will cause the logical drive geometry to have more than 65535 cylinders which |
244 | will make it impossible for FDISK to be used properly. The 8GB BIOS Geometry |
245 | can be enabled by configuring the DAC960 BIOS, which is accessible via Alt-M |
246 | during the BIOS initialization sequence. |
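
The cylinder limit can be checked with a little arithmetic before choosing a
geometry. The helpers below are a sketch only; the names rd_cylinders and
rd_geometry_ok are hypothetical, and the cylinder count is assumed to be the
block count divided by heads times sectors, rounded down.

```shell
# rd_cylinders BLOCKS HEADS SECTORS: print the number of whole cylinders
# a logical drive of BLOCKS blocks has under a HEADS/SECTORS geometry.
rd_cylinders() {
    echo $(( $1 / ($2 * $3) ))
}

# rd_geometry_ok BLOCKS HEADS SECTORS: succeed if the cylinder count stays
# within the 65535 limit mentioned above.
rd_geometry_ok() {
    [ "$(rd_cylinders "$1" "$2" "$3")" -le 65535 ]
}

rd_cylinders 89640960 128 32    # 2GB BIOS Geometry
rd_cylinders 89640960 255 63    # 8GB BIOS Geometry
```

For a logical drive of 89640960 blocks this gives 21885 cylinders with the
128/32 geometry and 5579 with 255/63. A hypothetical drive of 300000000 blocks
would come to 73242 cylinders with 128/32, which exceeds the limit.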
247 | |
248 | For maximum performance and the most efficient E2FSCK performance, it is |
249 | recommended that EXT2 file systems be built with a 4KB block size and 16 block |
250 | stride to match the DAC960 controller's 64KB default stripe size. The command |
251 | "mke2fs -b 4096 -R stride=16 <device>" is appropriate. Unless there will be a |
252 | large number of small files on the file systems, it is also beneficial to add |
253 | the "-i 16384" option to increase the bytes per inode parameter thereby |
254 | reducing the file system metadata. Finally, on systems that will only be run |
255 | with Linux 2.2 or later kernels it is beneficial to enable sparse superblocks |
256 | with the "-s 1" option. |
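
The stride is simply the stripe size divided by the file system block size.
The sketch below derives it and prints the resulting mke2fs command line
without running it; the function names are hypothetical and the device name is
only a placeholder.

```shell
# ext2_stride STRIPE_KB BLOCK_KB: print the stride, i.e. the number of
# file system blocks per controller stripe.
ext2_stride() {
    echo $(( $1 / $2 ))
}

# mke2fs_cmdline DEVICE [STRIPE_KB] [BLOCK_KB]: print (not run) a mke2fs
# command line; defaults match the 64KB stripe and 4KB block size above.
mke2fs_cmdline() {
    dev=$1 stripe_kb=${2:-64} block_kb=${3:-4}
    echo "mke2fs -b $((block_kb * 1024)) -R stride=$(ext2_stride "$stripe_kb" "$block_kb") $dev"
}

mke2fs_cmdline /dev/rd/c0d0p1
```

If the controller has been configured with a different stripe size, passing it
as the second argument adjusts the stride accordingly.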
257 | |
258 | |
259 | DAC960 ANNOUNCEMENTS MAILING LIST |
260 | |
261 | The DAC960 Announcements Mailing List provides a forum for informing Linux |
262 | users of new driver releases and other announcements regarding Linux support |
263 | for DAC960 PCI RAID Controllers. To join the mailing list, send a message to |
264 | "dac960-announce-request@dandelion.com" with the line "subscribe" in the |
265 | message body. |
266 | |
267 | |
268 | CONTROLLER CONFIGURATION AND STATUS MONITORING |
269 | |
270 | The DAC960 RAID controllers running firmware 4.06 or above include a Background |
271 | Initialization facility so that system downtime is minimized both for initial |
272 | installation and subsequent configuration of additional storage. The BIOS |
273 | Configuration Utility (accessible via Alt-R during the BIOS initialization |
274 | sequence) is used to quickly configure the controller, and then the logical |
275 | drives that have been created are available for immediate use even while they |
276 | are still being initialized by the controller. The primary need for online |
277 | configuration and status monitoring is then to avoid system downtime when disk |
278 | drives fail and must be replaced. Mylex's online monitoring and configuration |
279 | utilities are being ported to Linux and will become available at some point in |
280 | the future. Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure) |
281 | enclosure, the controller is able to rebuild failed drives automatically as |
282 | soon as a drive replacement is made available. |
283 | |
284 | The primary interfaces for controller configuration and status monitoring are |
285 | special files created in the /proc/rd/... hierarchy along with the normal |
286 | system console logging mechanism. Whenever the system is operating, the DAC960 |
287 | driver queries each controller for status information every 10 seconds, and |
288 | checks for additional conditions every 60 seconds. The initial status of each |
289 | controller is always available for controller N in /proc/rd/cN/initial_status, |
290 | and the current status as of the last status monitoring query is available in |
291 | /proc/rd/cN/current_status. In addition, status changes are also logged by the |
292 | driver to the system console and will appear in the log files maintained by |
293 | syslog. The progress of asynchronous rebuild or consistency check operations |
294 | is also available in /proc/rd/cN/current_status, and progress messages are |
295 | logged to the system console at most every 60 seconds. |
296 | |
297 | Starting with the 2.2.3/2.0.3 versions of the driver, the status information |
298 | available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been |
299 | augmented to include the vendor, model, revision, and serial number (if |
300 | available) for each physical device found connected to the controller: |
301 | |
302 | ***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 ***** |
303 | Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> |
304 | Configuring Mylex DAC960PRL PCI RAID Controller |
305 | Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB |
306 | PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned |
307 | PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21 |
308 | Controller Queue Depth: 128, Maximum Blocks per Command: 128 |
309 | Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 |
310 | Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 |
311 | SAF-TE Enclosure Management Enabled |
312 | Physical Devices: |
313 | 0:0 Vendor: IBM Model: DRVS09D Revision: 0270 |
314 | Serial Number: 68016775HA |
315 | Disk Status: Online, 17928192 blocks |
316 | 0:1 Vendor: IBM Model: DRVS09D Revision: 0270 |
317 | Serial Number: 68004E53HA |
318 | Disk Status: Online, 17928192 blocks |
319 | 0:2 Vendor: IBM Model: DRVS09D Revision: 0270 |
320 | Serial Number: 13013935HA |
321 | Disk Status: Online, 17928192 blocks |
322 | 0:3 Vendor: IBM Model: DRVS09D Revision: 0270 |
323 | Serial Number: 13016897HA |
324 | Disk Status: Online, 17928192 blocks |
325 | 0:4 Vendor: IBM Model: DRVS09D Revision: 0270 |
326 | Serial Number: 68019905HA |
327 | Disk Status: Online, 17928192 blocks |
328 | 0:5 Vendor: IBM Model: DRVS09D Revision: 0270 |
329 | Serial Number: 68012753HA |
330 | Disk Status: Online, 17928192 blocks |
331 | 0:6 Vendor: ESG-SHV Model: SCA HSBP M6 Revision: 0.61 |
332 | Logical Drives: |
333 | /dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru |
334 | No Rebuild or Consistency Check in Progress |
335 | |
336 | To simplify the monitoring process for custom software, the special file |
337 | /proc/rd/status returns "OK" when all DAC960 controllers in the system are |
338 | operating normally and no failures have occurred, or "ALERT" if any logical |
339 | drives are offline or critical or any non-standby physical drives are dead. |
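
A monitoring script might wrap this check as follows. This is a minimal
sketch; the function name rd_status_ok and its optional file argument are
hypothetical, and the argument exists only so the function can be tried
against an ordinary file.

```shell
# rd_status_ok [FILE]: succeed only if the status file reads exactly "OK".
# FILE defaults to /proc/rd/status.
# Exit status: 0 = OK, 1 = anything else (e.g. ALERT), 2 = file unreadable.
rd_status_ok() {
    status_file=${1:-/proc/rd/status}
    [ -r "$status_file" ] || return 2
    read -r status < "$status_file"
    [ "$status" = "OK" ]
}
```

A periodic job could call rd_status_ok and raise an alarm whenever it returns
a non-zero exit status.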
340 | |
341 | Configuration commands for controller N are available via the special file |
342 | /proc/rd/cN/user_command. A human readable command can be written to this |
343 | special file to initiate a configuration operation, and the results of the |
344 | operation can then be read back from the special file in addition to being |
345 | logged to the system console. The shell command sequence |
346 | |
347 | echo "<configuration-command>" > /proc/rd/c0/user_command |
348 | cat /proc/rd/c0/user_command |
349 | |
350 | is typically used to execute configuration commands. The configuration |
351 | commands are: |
352 | |
353 | flush-cache |
354 | |
355 | The "flush-cache" command flushes the controller's cache. The system |
356 | automatically flushes the cache at shutdown or if the driver module is |
357 | unloaded, so this command is only needed to be certain a write back cache |
358 | is flushed to disk before the system is powered off by a command to a UPS. |
359 | Note that the flush-cache command also stops an asynchronous rebuild or |
360 | consistency check, so it should not be used except when the system is being |
361 | halted. |
362 | |
363 | kill <channel>:<target-id> |
364 | |
365 | The "kill" command marks the physical drive <channel>:<target-id> as DEAD. |
366 | This command is provided primarily for testing, and should not be used |
367 | during normal system operation. |
368 | |
369 | make-online <channel>:<target-id> |
370 | |
371 | The "make-online" command changes the physical drive <channel>:<target-id> |
372 | from status DEAD to status ONLINE. In cases where multiple physical drives |
373 | have been killed simultaneously, this command may be used to bring all but |
374 | one of them back online, after which a rebuild to the final drive is |
375 | necessary. |
376 | |
377 | Warning: make-online should only be used on a dead physical drive that is |
378 | an active part of a drive group, never on a standby drive. The command |
379 | should never be used on a dead drive that is part of a critical logical |
380 | drive; rebuild should be used if only a single drive is dead. |
381 | |
382 | make-standby <channel>:<target-id> |
383 | |
384 | The "make-standby" command changes physical drive <channel>:<target-id> |
385 | from status DEAD to status STANDBY. It should only be used in cases where |
386 | a dead drive was replaced after an automatic rebuild was performed onto a |
387 | standby drive. It cannot be used to add a standby drive to the controller |
388 | configuration if one was not created initially; the BIOS Configuration |
389 | Utility must be used for that currently. |
390 | |
391 | rebuild <channel>:<target-id> |
392 | |
393 | The "rebuild" command initiates an asynchronous rebuild onto physical drive |
394 | <channel>:<target-id>. It should only be used when a dead drive has been |
395 | replaced. |
396 | |
397 | check-consistency <logical-drive-number> |
398 | |
399 | The "check-consistency" command initiates an asynchronous consistency check |
400 | of <logical-drive-number> with automatic restoration. It can be used |
401 | whenever it is desired to verify the consistency of the redundancy |
402 | information. |
403 | |
404 | cancel-rebuild |
405 | cancel-consistency-check |
406 | |
407 | The "cancel-rebuild" and "cancel-consistency-check" commands cancel any |
408 | rebuild or consistency check operations previously initiated. |
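
The echo/cat sequence shown earlier can be wrapped in a helper that rejects
anything other than the commands listed above. This is only a sketch: the
function name dac960_cmd and the RD_PROC variable are hypothetical, and
RD_PROC is provided solely so the function can be exercised without a
controller.

```shell
# dac960_cmd N COMMAND [ARGS...]: send a configuration command to
# controller N and print the result read back from the same special file.
# RD_PROC (hypothetical) overrides the /proc/rd base directory.
dac960_cmd() {
    ctl=$1; shift
    case $1 in
        flush-cache|kill|make-online|make-standby|rebuild|\
        check-consistency|cancel-rebuild|cancel-consistency-check) ;;
        *) echo "unknown command: $1" >&2; return 1 ;;
    esac
    cmd_file="${RD_PROC:-/proc/rd}/c${ctl}/user_command"
    echo "$*" > "$cmd_file" && cat "$cmd_file"
}
```

For example, "dac960_cmd 0 rebuild 1:1" writes "rebuild 1:1" to
/proc/rd/c0/user_command and prints whatever the driver returns. The helper
checks only the command name, not the arguments or the warnings given above.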
409 | |
410 | |
411 | EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE |
412 | |
413 | The following annotated logs demonstrate the controller configuration and |
414 | online status monitoring capabilities of the Linux DAC960 Driver. The test |
415 | configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a |
416 | DAC960PJ controller. The physical drives are configured into a single drive |
417 | group without a standby drive, and the drive group has been configured into two |
418 | logical drives, one RAID-5 and one RAID-6. Note that these logs are from an |
419 | earlier version of the driver and the messages have changed somewhat with newer |
420 | releases, but the functionality remains similar. First, here is the current |
421 | status of the RAID configuration: |
422 | |
423 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
424 | ***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 ***** |
425 | Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> |
426 | Configuring Mylex DAC960PJ PCI RAID Controller |
427 | Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB |
428 | PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned |
429 | PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9 |
430 | Controller Queue Depth: 128, Maximum Blocks per Command: 128 |
431 | Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 |
432 | Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 |
433 | Physical Devices: |
434 | 0:1 - Disk: Online, 2201600 blocks |
435 | 0:2 - Disk: Online, 2201600 blocks |
436 | 0:3 - Disk: Online, 2201600 blocks |
437 | 1:1 - Disk: Online, 2201600 blocks |
438 | 1:2 - Disk: Online, 2201600 blocks |
439 | 1:3 - Disk: Online, 2201600 blocks |
440 | Logical Drives: |
441 | /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru |
442 | /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru |
443 | No Rebuild or Consistency Check in Progress |
444 | |
445 | gwynedd:/u/lnz# cat /proc/rd/status |
446 | OK |
447 | |
448 | The above messages indicate that everything is healthy, and /proc/rd/status |
449 | returns "OK" indicating that there are no problems with any DAC960 controller |
450 | in the system. For demonstration purposes, while I/O is active, Physical Drive |
451 | 1:1 is now disconnected, simulating a drive failure. The failure is noted by |
452 | the driver within 10 seconds of the controller's having detected it, and the |
453 | driver logs the following console status messages indicating that Logical |
454 | Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD: |
455 | |
456 | DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 |
457 | DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02 |
458 | DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command |
459 | DAC960#0: Physical Drive 1:1 is now DEAD |
460 | DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL |
461 | DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL |
462 | |
463 | The Sense Keys logged here are just Check Condition / Unit Attention conditions |
464 | arising from a SCSI bus reset that is forced by the controller during its error |
465 | recovery procedures. Concurrently with the above, the driver status available |
466 | from /proc/rd also reflects the drive failure. The status message in |
467 | /proc/rd/status has changed from "OK" to "ALERT": |
468 | |
469 | gwynedd:/u/lnz# cat /proc/rd/status |
470 | ALERT |
471 | |
472 | and /proc/rd/c0/current_status has been updated: |
473 | |
474 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
475 | ... |
476 | Physical Devices: |
477 | 0:1 - Disk: Online, 2201600 blocks |
478 | 0:2 - Disk: Online, 2201600 blocks |
479 | 0:3 - Disk: Online, 2201600 blocks |
480 | 1:1 - Disk: Dead, 2201600 blocks |
481 | 1:2 - Disk: Online, 2201600 blocks |
482 | 1:3 - Disk: Online, 2201600 blocks |
483 | Logical Drives: |
484 | /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru |
485 | /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru |
486 | No Rebuild or Consistency Check in Progress |
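
Status output of this form is easy to scan from a script. The sketch below
lists every logical drive and physical disk that is neither Online nor
Standby. The function name rd_problems is hypothetical, and the parsing
assumes the line layout of the older driver output shown in this example;
newer releases print disk status on a separate "Disk Status:" line, which this
sketch does not handle.

```shell
# rd_problems [FILE]: list logical drives and physical disks whose state
# is neither Online nor Standby. Reads standard input if FILE is omitted.
rd_problems() {
    awk '
        # Logical drive lines: "/dev/rd/c0d0: RAID-5, Critical, ..."
        $1 ~ /^\/dev\/rd\// {
            state = $3; sub(/,$/, "", state)
            dev = $1; sub(/:$/, "", dev)
            if (state != "Online") print dev, state
        }
        # Physical disk lines (older format): "1:1 - Disk: Dead, ..."
        $2 == "-" && $3 == "Disk:" {
            state = $4; sub(/,$/, "", state)
            if (state != "Online" && state != "Standby") print $1, state
        }
    ' "$@"
}
```

Run as "rd_problems /proc/rd/c0/current_status", this would report Physical
Drive 1:1 as Dead and both logical drives as Critical for the status shown
above.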
487 | |
488 | Since there are no standby drives configured, the system can continue to access |
489 | the logical drives in a performance degraded mode until the failed drive is |
490 | replaced and a rebuild operation completed to restore the redundancy of the |
491 | logical drives. Once Physical Drive 1:1 is replaced with a properly |
492 | functioning drive, or if the physical drive was killed without having failed |
493 | (e.g., due to electrical problems on the SCSI bus), the user can instruct the |
494 | controller to initiate a rebuild operation onto the newly replaced drive: |
495 | |
496 | gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command |
497 | gwynedd:/u/lnz# cat /proc/rd/c0/user_command |
498 | Rebuild of Physical Drive 1:1 Initiated |
499 | |
500 | The echo command instructs the controller to initiate an asynchronous rebuild |
501 | operation onto Physical Drive 1:1, and the status message that results from the |
502 | operation is then available for reading from /proc/rd/c0/user_command, as well |
503 | as being logged to the console by the driver. |
504 | |
505 | Within 10 seconds of this command the driver logs the initiation of the |
506 | asynchronous rebuild operation: |
507 | |
508 | DAC960#0: Rebuild of Physical Drive 1:1 Initiated |
509 | DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01 |
510 | DAC960#0: Physical Drive 1:1 is now WRITE-ONLY |
511 | DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed |
512 | |
513 | and /proc/rd/c0/current_status is updated: |
514 | |
515 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
516 | ... |
517 | Physical Devices: |
518 | 0:1 - Disk: Online, 2201600 blocks |
519 | 0:2 - Disk: Online, 2201600 blocks |
520 | 0:3 - Disk: Online, 2201600 blocks |
521 | 1:1 - Disk: Write-Only, 2201600 blocks |
522 | 1:2 - Disk: Online, 2201600 blocks |
523 | 1:3 - Disk: Online, 2201600 blocks |
524 | Logical Drives: |
525 | /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru |
526 | /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru |
527 | Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed |
528 | |
529 | As the rebuild progresses, the current status in /proc/rd/c0/current_status is |
530 | updated every 10 seconds: |
531 | |
532 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
533 | ... |
534 | Physical Devices: |
535 | 0:1 - Disk: Online, 2201600 blocks |
536 | 0:2 - Disk: Online, 2201600 blocks |
537 | 0:3 - Disk: Online, 2201600 blocks |
538 | 1:1 - Disk: Write-Only, 2201600 blocks |
539 | 1:2 - Disk: Online, 2201600 blocks |
540 | 1:3 - Disk: Online, 2201600 blocks |
541 | Logical Drives: |
542 | /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru |
543 | /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru |
544 | Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed |
545 | |
546 | and every minute a progress message is logged to the console by the driver: |
547 | |
548 | DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed |
549 | DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed |
550 | DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed |
551 | DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed |
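
Progress messages of this form can be reduced to plain numbers for use in a
script. This is a sketch with a hypothetical function name, and it assumes the
message wording shown above, which may differ between driver releases.

```shell
# rebuild_progress: read driver messages on standard input and print
# "<logical-drive> <device> <percent>" for each rebuild progress line.
# Lines that do not match the expected wording are ignored.
rebuild_progress() {
    sed -n 's/.*Rebuild in Progress: Logical Drive \([0-9][0-9]*\) (\([^)]*\)) \([0-9][0-9]*\)% completed.*/\1 \2 \3/p'
}
```

The same filter applies to the last line of /proc/rd/cN/current_status while a
rebuild is running.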
552 | |
553 | Finally, the rebuild completes successfully. The driver logs the status of the |
554 | logical and physical drives and the rebuild completion: |
555 | |
556 | DAC960#0: Rebuild Completed Successfully |
557 | DAC960#0: Physical Drive 1:1 is now ONLINE |
558 | DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE |
559 | DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE |
560 | |
561 | /proc/rd/c0/current_status is updated: |
562 | |
563 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
564 | ... |
565 | Physical Devices: |
566 | 0:1 - Disk: Online, 2201600 blocks |
567 | 0:2 - Disk: Online, 2201600 blocks |
568 | 0:3 - Disk: Online, 2201600 blocks |
569 | 1:1 - Disk: Online, 2201600 blocks |
570 | 1:2 - Disk: Online, 2201600 blocks |
571 | 1:3 - Disk: Online, 2201600 blocks |
572 | Logical Drives: |
573 | /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru |
574 | /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru |
575 | Rebuild Completed Successfully |
576 | |
577 | and /proc/rd/status indicates that everything is healthy once again: |
578 | |
579 | gwynedd:/u/lnz# cat /proc/rd/status |
580 | OK |
581 | |
582 | |
583 | EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE |
584 | |
585 | The following annotated logs demonstrate the controller configuration and |
586 | online status monitoring capabilities of the Linux DAC960 Driver. The test |
587 | configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a |
588 | DAC960PJ controller. The physical drives are configured into a single drive |
589 | group with a standby drive, and the drive group has been configured into two |
590 | logical drives, one RAID-5 and one RAID-6. Note that these logs are from an |
591 | earlier version of the driver and the messages have changed somewhat with newer |
592 | releases, but the functionality remains similar. First, here is the current |
593 | status of the RAID configuration: |
594 | |
595 | gwynedd:/u/lnz# cat /proc/rd/c0/current_status |
596 | ***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 ***** |
597 | Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com> |
598 | Configuring Mylex DAC960PJ PCI RAID Controller |
599 | Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB |
600 | PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned |
601 | PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9 |
602 | Controller Queue Depth: 128, Maximum Blocks per Command: 128 |
603 | Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33 |
604 | Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63 |
605 | Physical Devices: |
606 | 0:1 - Disk: Online, 2201600 blocks |
607 | 0:2 - Disk: Online, 2201600 blocks |
608 | 0:3 - Disk: Online, 2201600 blocks |
609 | 1:1 - Disk: Online, 2201600 blocks |
610 | 1:2 - Disk: Online, 2201600 blocks |
611 | 1:3 - Disk: Standby, 2201600 blocks |
612 | Logical Drives: |
613 | /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru |
614 | /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru |
615 | No Rebuild or Consistency Check in Progress |
616 | |
617 | gwynedd:/u/lnz# cat /proc/rd/status |
618 | OK |

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:2 is now disconnected, simulating a drive failure. The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:

DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL
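
A log watcher could pick the state transitions out of console messages like
those above. The following is only a minimal sketch in Python: the function
name parse_state_changes is hypothetical, and the pattern assumes the message
wording shown in these logs, which come from an older driver version and may
differ in newer releases.

```python
import re

# Assumed message shapes, taken from the console log above:
#   DAC960#0: Physical Drive 1:2 is now DEAD
#   DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
STATE_CHANGE = re.compile(
    r"DAC960#(?P<controller>\d+): "
    r"(?P<kind>Physical|Logical) Drive (?P<drive>\d+(?::\d+)?)"
    r"(?: \([^)]+\))? is now (?P<state>[A-Z-]+)"
)


def parse_state_changes(log_text):
    """Return (controller, kind, drive, state) for every "is now" message."""
    events = []
    for line in log_text.splitlines():
        match = STATE_CHANGE.search(line)
        if match:
            events.append((
                int(match.group("controller")),
                match.group("kind"),
                match.group("drive"),
                match.group("state"),
            ))
    return events
```

Lines that do not announce a new state, such as the Error Log and "killed
because" messages, are simply skipped.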

Since a standby drive is configured, the controller automatically begins
rebuilding onto the standby drive:

DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

Concurrently with the above, the driver status available from /proc/rd also
reflects the drive failure and automatic rebuild. The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed
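
The per-drive states in a current_status listing like the one above can be
extracted by a script. The following Python sketch is illustrative only: the
names parse_current_status and unhealthy are hypothetical, the patterns assume
the line layout shown in these logs, and only "Disk" entries are handled
because no other device type appears here.

```python
import re

# Assumed line shapes, copied from the listing above:
#   1:2 - Disk: Dead, 2201600 blocks
#   /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
PHYSICAL = re.compile(r"(\d+:\d+) - Disk: ([A-Za-z-]+), \d+ blocks")
LOGICAL = re.compile(r"(/dev/rd/c\d+d\d+): (RAID-\d+), ([A-Za-z-]+), \d+ blocks")


def parse_current_status(text):
    """Return ({channel:target: state}, {device: (raid_level, state)})."""
    physical = {}
    logical = {}
    for line in text.splitlines():
        match = PHYSICAL.search(line)
        if match:
            physical[match.group(1)] = match.group(2)
            continue
        match = LOGICAL.search(line)
        if match:
            logical[match.group(1)] = (match.group(2), match.group(3))
    return physical, logical


def unhealthy(physical, logical):
    """List drives that are neither Online nor (for physical drives) Standby."""
    bad = [name for name, state in physical.items()
           if state not in ("Online", "Standby")]
    bad += [name for name, (_, state) in logical.items() if state != "Online"]
    return bad
```

Applied to the listing above, unhealthy() would report physical drives 1:2
and 1:3 together with both logical drives.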

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed

and every minute a progress message is logged on the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed
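
Rebuild progress lines such as those above are regular enough to be tracked by
a script. This Python sketch keeps the most recent percentage seen for each
logical drive; the name latest_rebuild_progress is hypothetical, and the
pattern assumes the exact wording shown in these logs.

```python
import re

# Assumed message shape, from the console log above:
#   Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
REBUILD = re.compile(
    r"Rebuild in Progress: Logical Drive \d+ \(([^)]+)\) (\d+)% completed"
)


def latest_rebuild_progress(log_text):
    """Map each device name to the last rebuild percentage reported for it."""
    progress = {}
    for line in log_text.splitlines():
        match = REBUILD.search(line)
        if match:
            # Later lines overwrite earlier ones, so the newest value wins.
            progress[match.group(1)] = int(match.group(2))
    return progress
```

The same pattern matches the progress line at the end of a current_status
listing, since that line uses the same wording without the DAC960#0 prefix.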

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK
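
Because /proc/rd/status reduces the health of all controllers to a single
word, it lends itself to simple polling. The sketch below, in Python, waits
for the status to return to "OK". The function names, the max_polls limit and
the injectable sleep parameter are all hypothetical conveniences; the
10-second default interval is only a guess based on the update rate described
above.

```python
import time


def read_status(path="/proc/rd/status"):
    """Return the status word from the given file, e.g. "OK" or "ALERT"."""
    with open(path) as status_file:
        return status_file.read().strip()


def wait_until_ok(path="/proc/rd/status", interval=10.0, max_polls=None,
                  sleep=time.sleep):
    """Poll until the status reads "OK".

    Returns True once "OK" is seen, or False if max_polls reads all
    returned something else. With max_polls=None it polls indefinitely.
    """
    polls = 0
    while True:
        if read_status(path) == "OK":
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(interval)
```

A monitoring job might call wait_until_ok(max_polls=1) as a one-shot check and
raise an alarm when it returns False.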

Note that the absence of a viable standby drive does not create an "ALERT"
status. Once dead Physical Drive 1:2 has been replaced, the controller must be
told that this has occurred and that the newly replaced drive should become the
new standby drive:

gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded

The echo command instructs the controller to make Physical Drive 1:2 into a
standby drive, and the status message that results from the operation is then
available for reading from /proc/rd/c0/user_command, as well as being logged to
the console by the driver. Within 60 seconds of this command the driver logs:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
  ...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Standby, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully
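
The echo/cat exchange with /proc/rd/c0/user_command shown above can also be
driven from a script. The Python sketch below mirrors those two shell
commands. The function names are hypothetical, and treating a result that
ends in "Succeeded" as success is an assumption based solely on the one
message shown in this example. Writing to the real file requires appropriate
privileges and changes the controller configuration.

```python
def make_standby_command(channel, target):
    """Build the command string for a drive address such as 1:2."""
    return "make-standby {}:{}".format(channel, target)


def run_user_command(command, path="/proc/rd/c0/user_command"):
    """Write a command, then read back whatever the file now contains.

    On the real /proc entry the read is expected to return the result
    message for the command; on an ordinary file it returns the command.
    """
    with open(path, "w") as command_file:
        command_file.write(command + "\n")
    with open(path) as command_file:
        return command_file.read().strip()


def succeeded(result):
    """Assumed success test: the result message ends in "Succeeded"."""
    return result.endswith("Succeeded")
```

For example, run_user_command(make_standby_command(1, 2)) corresponds to the
echo and cat commands in the transcript above.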