diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 12533a9..68e902e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -82,6 +82,7 @@ parameter is applicable: SH SuperH architecture is enabled. SMP The kernel is an SMP kernel. SPARC Sparc architecture is enabled. + SUSPEND2 Suspend2 is enabled. SWSUSP Software suspend is enabled. TS Appropriate touchscreen support is enabled. USB USB support is enabled. @@ -1140,6 +1141,8 @@ and is between 256 and 4096 characters. It is defined in the file noresume [SWSUSP] Disables resume and restores original swap space. + noresume2 [SUSPEND2] Disables resuming and restores original swap signature. + no-scroll [VGA] Disables scrollback. This is required for the Braillex ib80-piezo Braille reader made by F.H. Papenmeier (Germany). @@ -1445,6 +1448,11 @@ and is between 256 and 4096 characters. It is defined in the file retain_initrd [RAM] Keep initrd memory after extraction + resume2= [SUSPEND2] Specify the storage device for Suspend2. + Format: :. + See Documentation/power/suspend2.txt for details of the + formats for available image writers. + rhash_entries= [KNL,NET] Set number of hash buckets for route cache diff --git a/Documentation/power/suspend2-internals.txt b/Documentation/power/suspend2-internals.txt new file mode 100644 index 0000000..ba4e1e5 --- /dev/null +++ b/Documentation/power/suspend2-internals.txt @@ -0,0 +1,473 @@ + Software Suspend 2.2 Internal Documentation. + Version 1 + +1. Introduction. + + Software Suspend 2.2 is an addition to the Linux Kernel, designed to + allow the user to quickly shutdown and quickly boot a computer, without + needing to close documents or programs. It is equivalent to the + hibernate facility in some laptops. This implementation, however, + requires no special BIOS or hardware support. + + The code in these files is based upon the original implementation + prepared by Gabor Kuti and additional work by Pavel Machek and a + host of others. This code has been substantially reworked by Nigel + Cunningham, again with the help and testing of many others, not the + least of whom is Michael Frank. At its heart, however, the operation is + essentially the same as Gabor's version. + +2. Overview of operation. + + The basic sequence of operations is as follows: + + a. Quiesce all other activity. + b. Ensure enough memory and storage space are available, and attempt + to free memory/storage if necessary. + c. Allocate the required memory and storage space. + d. Write the image. + e. Power down. + + There are a number of complicating factors which mean that things are + not as simple as the above would imply, however... + + o The activity of each process must be stopped at a point where it will + not be holding locks necessary for saving the image, or unexpectedly + restart operations due to something like a timeout and thereby make + our image inconsistent. + + o It is desirous that we sync outstanding I/O to disk before calculating + image statistics. This reduces corruption if one should suspend but + then not resume, and also makes later parts of the operation safer (see + below). + + o We need to get as close as we can to an atomic copy of the data. + Inconsistencies in the image will result in inconsistent memory contents at + resume time, and thus in instability of the system and/or file system + corruption. This would appear to imply a maximum image size of one half of + the amount of RAM, but we have a solution... 
(again, below). + + o In 2.6, we choose to play nicely with the other suspend-to-disk + implementations. + +3. Detailed description of internals. + + a. Quiescing activity. + + Safely quiescing the system is achieved using two methods. + + First, we note that the vast majority of processes don't need to run during + suspend. They can be 'frozen'. We therefore implement a refrigerator + routine, which processes enter and in which they remain until the cycle is + complete. Processes enter the refrigerator via try_to_freeze() invocations + at appropriate places. A process cannot be frozen in any old place. It + must not be holding locks that will be needed for writing the image or + freezing other processes. For this reason, userspace processes generally + enter the refrigerator via the signal handling code, and kernel threads at + the place in their event loops where they drop locks and yield to other + processes or sleep. + + The second part of our method for quisescing the system involves freezing + the filesystems. We use the standard freeze_bdev and thaw_bdev functions to + ensure that all of the user's data is synced to disk before we begin to + write the image. This is particularly important with XFS, where without + bdev freezing, activity may still occur after we begin to write the image + (potentially causing in-memory and on-disk corruption later). + + Quiescing the system works most quickly and reliably when we add one more + element to the algorithm: separating the freezing of userspace processes + from the freezing of kernel space processes, and doing the filesystem freeze + in between. The filesystem freeze needs to be done while kernel threads such + as kjournald can still run. At the same time, though, everything will be + less racy and run more quickly if we stop userspace submitting more I/O work + while we're trying to quiesce. + + Quiescing the system is therefore done in three steps: + - Freeze userspace + - Freeze filesystems + - Freeze kernel threads + + If we need to free memory, we thaw kernel threads and filesystems, but not + userspace. We can then free caches without worrying about deadlocks due to + swap files being on frozen filesystems or such like. + + One limitation of this is that FUSE filesystems are incompatible with + suspending to disk. They need to be unmounted prior to suspending, to avoid + potential deadlocks. + + b. Ensure enough memory & storage are available. + + We have a number of constraints to meet in order to be able to successfully + suspend and resume. + + First, the image will be written in two parts, described below. One of these + parts needs to have an atomic copy made, which of course implies a maximum + size of one half of the amount of system memory. The other part ('pageset') + is not atomically copied, and can therefore be as large or small as desired. + + Second, we have constraints on the amount of storage available. In these + calculations, we may also consider any compression that will be done. The + cryptoapi module allows the user to configure an expected compression ratio. + + Third, the user can specify an arbitrary limit on the image size, in + megabytes. This limit is treated as a soft limit, so that we don't fail the + attempt to suspend if we cannot meet this constraint. + + c. Allocate the required memory and storage space. + + Having done the initial freeze, we determine whether the above constraints + are met, and seek to allocate the metadata for the image. 
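+
+  In outline this is a bounded retry loop. The sketch below is purely
+  illustrative - the helper functions are invented for this document, and
+  the real logic lives in prepare_image.c - but it shows the shape of the
+  loop described in the following paragraphs:
+
+	/* Illustrative sketch only; not the actual Suspend2 code.
+	 * constraints_met(), allocate_metadata(), shortfall() and the
+	 * thaw/refreeze and freeing helpers are all hypothetical. */
+	static int prepare_image_sketch(void)
+	{
+		int tries;
+
+		for (tries = 0; tries < 4; tries++) {
+			if (constraints_met() && !allocate_metadata())
+				return 0;	/* ready to write the image */
+
+			/* Thaw kernel threads and filesystems (but not
+			 * userspace), free the calculated shortfall, then
+			 * refreeze and try again. */
+			thaw_kernel_threads_and_filesystems();
+			free_some_memory(shortfall());
+			refreeze_kernel_threads_and_filesystems();
+		}
+
+		return -ENOMEM;
+	}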
+  If the constraints are not met, or we fail to allocate the required
+  space for the metadata, we seek to free the amount of memory that we
+  calculate is needed and try again. We allow up to four iterations of
+  this loop before aborting the cycle. If we do fail, it should only be
+  because of a bug in Suspend's calculations.
+
+  These steps are merged together in the prepare_image function, found in
+  prepare_image.c. The functions are merged because of the cyclical nature
+  of the problem of calculating how much memory and storage is needed. Since
+  the data structures containing the information about the image must
+  themselves take memory and use storage, the amount of memory and storage
+  required changes as we prepare the image. Since the changes are not large,
+  only one or two iterations will be required to achieve a solution.
+
+  The recursive nature of the algorithm is minimised by keeping user space
+  frozen while preparing the image, and by the fact that our records of which
+  pages are to be saved and which pageset they are saved in use bitmaps (so
+  that changes in number or fragmentation of the pages to be saved don't
+  feed back via changes in the amount of memory needed for metadata). The
+  recursiveness is thus limited to any extra slab pages allocated to store
+  the extents that record storage used, and the effects of seeking to free
+  memory.
+
+  d. Write the image.
+
+  We previously mentioned the need to create an atomic copy of the data, and
+  the half-of-memory limitation that is implied in this. This limitation is
+  circumvented by dividing the memory to be saved into two parts, called
+  pagesets.
+
+  Pageset2 contains the page cache - the pages on the active and inactive
+  lists. These pages aren't needed or modified while Suspend2 is running, so
+  they can be safely written without an atomic copy. They are therefore
+  saved first and reloaded last. While saving these pages, Suspend2 carefully
+  ensures that the work of writing the pages doesn't make the image
+  inconsistent.
+
+  Once pageset2 has been saved, we prepare to do the atomic copy of remaining
+  memory. As part of the preparation, we power down drivers, thereby providing
+  them with the opportunity to have their state recorded in the image. The
+  amount of memory allocated by drivers for this is usually negligible, but if
+  DRI is in use, video drivers may require significant amounts. Ideally we
+  would be able to query drivers while preparing the image as to the amount of
+  memory they will need. Unfortunately no such mechanism exists at the time of
+  writing. For this reason, Suspend2 allows the user to set an
+  'extra_pages_allowance', which is used to ensure sufficient memory is
+  available for drivers at this point. Suspend2 also lets the user set this
+  value to 0. In this case, a test driver suspend is done while preparing the
+  image, and the difference (plus a margin) is used instead.
+
+  Having suspended the drivers, we save the CPU context before making an
+  atomic copy of pageset1, resuming the drivers and saving the atomic copy.
+  After saving the two pagesets, we just need to save our metadata before
+  powering down.
+
+  As we mentioned earlier, the contents of pageset2 pages aren't needed once
+  they've been saved. We therefore use them as the destination of our atomic
+  copy. In the unlikely event that pageset1 is larger, extra pages are
+  allocated while the image is being prepared.
This is normally only a real + possibility when the system has just been booted and the page cache is + small. + + This is where we need to be careful about syncing, however. Pageset2 will + probably contain filesystem meta data. If this is overwritten with pageset1 + and then a sync occurs, the filesystem will be corrupted - at least until + resume time and another sync of the restored data. Since there is a + possibility that the user might not resume or (may it never be!) that + suspend might oops, we do our utmost to avoid syncing filesystems after + copying pageset1. + + e. Power down. + + Powering down uses standard kernel routines. Suspend2 supports powering down + using the ACPI S3, S4 and S5 methods or the kernel's non-ACPI power-off. + Supporting suspend to ram (S3) as a power off option might sound strange, + but it allows the user to quickly get their system up and running again if + the battery doesn't run out (we just need to re-read the overwritten pages) + and if the battery does run out (or the user removes power), they can still + resume. + +4. Data Structures. + + Suspend2 uses three main structures to store its metadata and configuration + information: + + a) Pageflags bitmaps. + + Suspend records which pages will be in pageset1, pageset2, the destination + of the atomic copy and the source of the atomically restored image using + bitmaps. These bitmaps are created from order zero allocations to maximise + reliability. The individual pages are combined together with pointers to + form per-zone bitmaps, which are in turn combined with another layer of + pointers to construct the overall bitmap. + + The pageset1 bitmap is thus easily stored in the image header for use at + resume time. + + As mentioned above, using bitmaps also means that the amount of memory and + storage required for recording the above information is constant. This + greatly simplifies the work of preparing the image. In earlier versions of + Suspend2, extents were used to record which pages would be stored. In that + case, however, eating memory could result in greater fragmentation of the + lists of pages, which in turn required more memory to store the extents and + more storage in the image header. These could in turn require further + freeing of memory, and another iteration. All of this complexity is removed + by having bitmaps. + + Bitmaps also make a lot of sense because Suspend2 only ever iterates + through the lists. There is therefore no cost to not being able to find the + nth page in order 0 time. We only need to worry about the cost of finding + the n+1th page, given the location of the nth page. Bitwise optimisations + help here. + + The data structure is: unsigned long ***. + + b) Extents for block data. + + Suspend2 supports writing the image to multiple block devices. In the case + of swap, multiple partitions and/or files may be in use, and we happily use + them all. This is accomplished as follows: + + Whatever the actual source of the allocated storage, the destination of the + image can be viewed in terms of one or more block devices, and on each + device, a list of sectors. To simplify matters, we only use contiguous, + PAGE_SIZE aligned sectors, like the swap code does. + + Since sector numbers on each bdev may well not start at 0, it makes much + more sense to use extents here. Contiguous ranges of pages can thus be + represented in the extents by contiguous values. 
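+
+  As a purely illustrative sketch of this idea (the simplified structure and
+  helper below are invented for this document and are not Suspend2's actual
+  API - the real structures are shown below), adding one more PAGE_SIZE
+  aligned block to a chain either extends the last extent or starts a new
+  one:
+
+	/* Sketch only: needs <linux/slab.h> for kmalloc(). */
+	struct sketch_extent {
+		unsigned long minimum, maximum;
+		struct sketch_extent *next;
+	};
+
+	/* Append 'value' after the extent *last; contiguous values simply
+	 * extend the existing extent rather than allocating a new one. */
+	static int sketch_append(struct sketch_extent **last, unsigned long value)
+	{
+		struct sketch_extent *new;
+
+		if (*last && value == (*last)->maximum + 1) {
+			(*last)->maximum = value;
+			return 0;
+		}
+
+		new = kmalloc(sizeof(*new), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+
+		new->minimum = new->maximum = value;
+		new->next = NULL;
+		if (*last)
+			(*last)->next = new;
+		*last = new;
+		return 0;
+	}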
+
+  Variations in block size are taken into account in transforming this data
+  into the parameters for bio submission.
+
+  We can thus implement a layer of abstraction wherein the core of Suspend2
+  doesn't have to worry about which device we're currently writing to or
+  where in the device we are. It simply requests that the next page in the
+  pageset or header be written, leaving the details to this lower layer.
+  The lower layer remembers where in the sequence of devices and blocks each
+  pageset starts. The header always starts at the beginning of the allocated
+  storage.
+
+  So extents are:
+
+	struct extent {
+		unsigned long minimum, maximum;
+		struct extent *next;
+	};
+
+  These are combined into chains of extents for a device:
+
+	struct extent_chain {
+		int size; /* size of the chain, ie sum (max-min+1) */
+		int allocs, frees;
+		char *name;
+		struct extent *first, *last_touched;
+	};
+
+  For each bdev, we need to store a little more info:
+
+	struct suspend_bdev_info {
+		struct block_device *bdev;
+		dev_t dev_t;
+		int bmap_shift;
+		int blocks_per_page;
+	};
+
+  The dev_t is used to identify the device in the stored image. As a result,
+  we expect devices at resume time to have the same major and minor numbers
+  as they had while suspending. This is primarily a concern where the user
+  utilises LVM for storage, as they will need to dmsetup their partitions in
+  such a way as to maintain this consistency at resume time.
+
+  bmap_shift and blocks_per_page record the effects of variations in
+  blocks-per-page settings for the filesystem and underlying bdev. For most
+  filesystems, these are the same, but for xfs, they can have independent
+  values.
+
+  Combining these two structures together, we have everything we need to
+  record what devices and what blocks on each device are being used to
+  store the image, and to submit I/O using submit_bio.
+
+  The last elements in the picture are a means of recording how the storage
+  is being used.
+
+  We do this first and foremost by implementing a layer of abstraction on
+  top of the devices and extent chains which allows us to view however many
+  devices there might be as one long storage tape, with a single 'head' that
+  tracks a 'current position' on the tape:
+
+	struct extent_iterate_state {
+		struct extent_chain *chains;
+		int num_chains;
+		int current_chain;
+		struct extent *current_extent;
+		unsigned long current_offset;
+	};
+
+  That is, *chains points to an array of size num_chains of extent chains.
+  For the filewriter, this is always a single chain. For the swapwriter, the
+  array is of size MAX_SWAPFILES.
+
+  current_chain, current_extent and current_offset thus point to the current
+  index in the chains array (and into a matching array of struct
+  suspend_bdev_info), the current extent in that chain (to optimise access),
+  and the current value in the offset.
+
+  The image is divided into three parts:
+  - The header
+  - Pageset 1
+  - Pageset 2
+
+  The header always starts at the first device and first block. We know its
+  size before we begin to save the image because we carefully account for
+  everything that will be stored in it.
+
+  The second pageset (LRU) is stored first. It begins on the next page after
+  the end of the header.
+
+  The first pageset is stored second. Its start location is only known once
+  pageset2 has been saved, since pageset2 may be compressed as it is written.
+  This location is thus recorded at the end of saving pageset2. It is also
+  page aligned.
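+
+  To illustrate the 'tape head' idea, here is a sketch - the function name,
+  return codes and the simple linear scan are invented for this document -
+  of how the current position might advance by one page using the
+  structures above:
+
+	static int sketch_forward_one_page(struct extent_iterate_state *state)
+	{
+		struct extent *ext = state->current_extent;
+
+		/* Step within the current extent if we can. */
+		if (state->current_offset < ext->maximum) {
+			state->current_offset++;
+			return 0;
+		}
+
+		/* Otherwise move to the next extent in this chain... */
+		if (ext->next) {
+			state->current_extent = ext->next;
+			state->current_offset = ext->next->minimum;
+			return 0;
+		}
+
+		/* ...or to the first extent of the next chain that has any
+		 * storage allocated to it. */
+		while (++state->current_chain < state->num_chains) {
+			struct extent_chain *chain =
+				&state->chains[state->current_chain];
+
+			if (chain->first) {
+				state->current_extent = chain->first;
+				state->current_offset = chain->first->minimum;
+				return 0;
+			}
+		}
+
+		return -ENOSPC;	/* ran off the end of the allocated storage */
+	}
+
+  The chain, extent and offset that make up this position are what need to
+  be remembered when the header and each pageset start.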
+
+  Since this information is needed at resume time, and the location of
+  extents in memory will differ at resume time, this needs to be stored in
+  a portable way:
+
+	struct extent_iterate_saved_state {
+		int chain_num;
+		int extent_num;
+		unsigned long offset;
+	};
+
+  We can thus implement a layer of abstraction wherein the core of Suspend2
+  doesn't have to worry about which device we're currently writing to or
+  where in the device we are. It simply requests that the next page in the
+  pageset or header be written, leaving the details to this layer, and
+  invokes the routines to remember and restore the position, without having
+  to worry about the details of how the data is arranged on disk or such like.
+
+  c) Modules
+
+  One aim in designing Suspend2 was to make it flexible. We wanted to allow
+  for the implementation of different methods of transforming a page to be
+  written to disk and different methods of getting the pages stored.
+
+  In early versions (the betas and perhaps Suspend1), compression support was
+  inlined in the image writing code, and the data structures and code for
+  managing swap were intertwined with the rest of the code. A number of people
+  had expressed interest in implementing image encryption, and alternative
+  methods of storing the image.
+
+  In order to achieve this, Suspend2 was given a modular design.
+
+  A module is a single file which encapsulates the functionality needed
+  to transform a pageset of data (encryption or compression, for example),
+  or to write the pageset to a device. The former type of module is called
+  a 'page-transformer', the latter a 'writer'.
+
+  Modules are linked together in pipeline fashion. There may be zero or more
+  page transformers in a pipeline, and there is always exactly one writer.
+  The pipeline follows this pattern:
+
+		---------------------------------
+		|         Suspend2 Core         |
+		---------------------------------
+		               |
+		               |
+		---------------------------------
+		|      Page transformer 1       |
+		---------------------------------
+		               |
+		               |
+		---------------------------------
+		|      Page transformer 2       |
+		---------------------------------
+		               |
+		               |
+		---------------------------------
+		|            Writer             |
+		---------------------------------
+
+  During the writing of an image, the core code feeds pages one at a time
+  to the first module. This module performs whatever transformations it
+  implements on the incoming data, completely consuming the incoming data and
+  feeding output in a similar manner to the next module. A module may buffer
+  its output.
+
+  During reading, the pipeline works in the reverse direction. The core code
+  calls the first module with the address of a buffer which should be filled.
+  (Note that the buffer size is always PAGE_SIZE at this time). This module
+  will in turn request data from the next module and so on down until the
+  writer is made to read from the stored image.
+
+  Part of the definition of the structure of a module thus looks like this:
+
+	int (*rw_init) (int rw, int stream_number);
+	int (*rw_cleanup) (int rw);
+	int (*write_chunk) (struct page *buffer_page);
+	int (*read_chunk) (struct page *buffer_page, int sync);
+
+  It should be noted that the _cleanup routine may be called before the
+  full stream of data has been read or written. While writing the image,
+  the user may (depending upon settings) choose to abort suspending, and
+  if we are in the midst of writing the last portion of the image, a portion
+  of the second pageset may be reread.
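+
+  As a sketch of how the core might drive this interface while writing a
+  stream (the struct name, the use of the WRITE constant and the page
+  source below are illustrative only, not Suspend2's actual code):
+
+	/* Illustrative only: push one stream of pages into the pipeline. */
+	static int sketch_write_stream(struct suspend_module *first, int stream)
+	{
+		struct page *page;
+		int ret;
+
+		ret = first->rw_init(WRITE, stream);
+		if (ret)
+			return ret;
+
+		while ((page = sketch_next_page(stream)) != NULL) {
+			ret = first->write_chunk(page);
+			if (ret)
+				break;
+		}
+
+		first->rw_cleanup(WRITE);
+		return ret;
+	}
+
+  As noted above, rw_cleanup can therefore run before a stream has been
+  completely written - for example when the user aborts near the end of the
+  image and part of the second pageset has to be reread.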
This may also happen if an error + occurs and we seek to abort the process of writing the image. + + The modular design is also useful in a number of other ways. It provides + a means where by we can add support for: + + - providing overall initialisation and cleanup routines; + - serialising configuration information in the image header; + - providing debugging information to the user; + - determining memory and image storage requirements; + - dis/enabling components at run-time; + - configuring the module (see below); + + ...and routines for writers specific to their work: + - Parsing a resume2= location; + - Determining whether an image exists; + - Marking a resume as having been attempted; + - Invalidating an image; + + Since some parts of the core - the user interface and storage manager + support - have use for some of these functions, they are registered as + 'miscellaneous' modules as well. + + d) Sysfs data structures. + + This brings us naturally to support for configuring Suspend2. We desired to + provide a way to make Suspend2 as flexible and configurable as possible. + The user shouldn't have to reboot just because they want to now suspend to + a file instead of a partition, for example. + + To accomplish this, Suspend2 implements a very generic means whereby the + core and modules can register new sysfs entries. All Suspend2 entries use + a single _store and _show routine, both of which are found in sysfs.c in + the kernel/power directory. These routines handle the most common operations + - getting and setting the values of bits, integers, longs, unsigned longs + and strings in one place, and allow overrides for customised get and set + options as well as side-effect routines for all reads and writes. + + When combined with some simple macros, a new sysfs entry can then be defined + in just a couple of lines: + + { SUSPEND2_ATTR("progress_granularity", SYSFS_RW), + SYSFS_INT(&progress_granularity, 1, 2048) + }, + + This defines a sysfs entry named "progress_granularity" which is rw and + allows the user to access an integer stored at &progress_granularity, giving + it a value between 1 and 2048 inclusive. + + Sysfs entries are registered under /sys/power/suspend2, and entries for + modules are located in a subdirectory named after the module. + diff --git a/Documentation/power/suspend2.txt b/Documentation/power/suspend2.txt new file mode 100644 index 0000000..b5a8edb --- /dev/null +++ b/Documentation/power/suspend2.txt @@ -0,0 +1,713 @@ + --- Suspend2, version 2.2 --- + +1. What is it? +2. Why would you want it? +3. What do you need to use it? +4. Why not just use the version already in the kernel? +5. How do you use it? +6. What do all those entries in /sys/power/suspend2 do? +7. How do you get support? +8. I think I've found a bug. What should I do? +9. When will XXX be supported? +10 How does it work? +11. Who wrote Suspend2? + +1. What is it? + + Imagine you're sitting at your computer, working away. For some reason, you + need to turn off your computer for a while - perhaps it's time to go home + for the day. When you come back to your computer next, you're going to want + to carry on where you left off. Now imagine that you could push a button and + have your computer store the contents of its memory to disk and power down. + Then, when you next start up your computer, it loads that image back into + memory and you can carry on from where you were, just as if you'd never + turned the computer off. 
Far less time to start up, no reopening + applications and finding what directory you put that file in yesterday. + That's what Suspend2 does. + + Suspend2 has a long heritage. It began life as work by Gabor Kuti, who, + with some help from Pavel Machek, got an early version going in 1999. The + project was then taken over by Florent Chabaud while still in alpha version + numbers. Nigel Cunningham came on the scene when Florent was unable to + continue, moving the project into betas, then 1.0, 2.0 and so on up to + the present 2.2 series. Pavel Machek's swsusp code, which was merged around + 2.5.17 retains the original name, and was essentially a fork of the beta + code until Rafael Wysocki came on the scene in 2005 and began to improve it + further. + +2. Why would you want it? + + Why wouldn't you want it? + + Being able to save the state of your system and quickly restore it improves + your productivity - you get a useful system in far less time than through + the normal boot process. + +3. What do you need to use it? + + a. Kernel Support. + + i) The Suspend2 patch. + + Suspend2 is part of the Linux Kernel. This version is not part of Linus's + 2.6 tree at the moment, so you will need to download the kernel source and + apply the latest patch. Having done that, enable the appropriate options in + make [menu|x]config (under Power Management Options), compile and install your + kernel. Suspend2 works with SMP, Highmem, preemption, x86-32, PPC and x86_64. + + Suspend2 patches are available from http://suspend2.net. + + ii) Compression and encryption support. + + Compression and encryption support are implemented via the + cryptoapi. You will therefore want to select any Cryptoapi transforms that + you want to use on your image from the Cryptoapi menu while configuring + your kernel. + + You can also tell Suspend to write it's image to an encrypted and/or + compressed filesystem/swap partition. In that case, you don't need to do + anything special for Suspend2 when it comes to kernel configuration. + + iii) Configuring other options. + + While you're configuring your kernel, try to configure as much as possible + to build as modules. We recommend this because there are a number of drivers + that are still in the process of implementing proper power management + support. In those cases, the best way to work around their current lack is + to build them as modules and remove the modules while suspending. You might + also bug the driver authors to get their support up to speed, or even help! + + b. Storage. + + i) Swap. + + Suspend2 can store the suspend image in your swap partition, a swap file or + a combination thereof. Whichever combination you choose, you will probably + want to create enough swap space to store the largest image you could have, + plus the space you'd normally use for swap. A good rule of thumb would be + to calculate the amount of swap you'd want without using Suspend2, and then + add the amount of memory you have. This swapspace can be arranged in any way + you'd like. It can be in one partition or file, or spread over a number. The + only requirement is that they be active when you start a suspend cycle. + + There is one exception to this requirement. Suspend2 has the ability to turn + on one swap file or partition at the start of suspending and turn it back off + at the end. 
If you want to ensure you have enough memory to store a image + when your memory is fully used, you might want to make one swap partition or + file for 'normal' use, and another for Suspend2 to activate & deactivate + automatically. (Further details below). + + ii) Normal files. + + Suspend2 includes a 'filewriter'. The filewriter can store your image in a + simple file. Since Linux has the idea of everything being a file, this is + more powerful than it initially sounds. If, for example, you were to set up + a network block device file, you could suspend to a network server. This has + been tested and works to a point, but nbd itself isn't stateless enough for + our purposes. + + Take extra care when setting up the filewriter. If you just type commands + without thinking and then try to suspend, you could cause irreversible + corruption on your filesystems! Make sure you have backups. + + Most people will only want to suspend to a local file. To achieve that, do + something along the lines of: + + echo "Suspend2" > /suspend-file + dd if=/dev/zero bs=1M count=512 >> suspend-file + + This will create a 512MB file called /suspend-file. To get Suspend2 to use + it: + + echo /suspend-file > /sys/power/suspend2/filewriter/filewriter_target + + Then + + cat /sys/power/suspend2/resume2 + + Put the results of this into your bootloader's configuration (see also step + C, below: + + ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE--- + # cat /sys/power/suspend2/resume2 + file:/dev/hda2:0x1e001 + + In this example, we would edit the append= line of our lilo.conf|menu.lst + so that it included: + + resume2=file:/dev/hda2:0x1e001 + ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE--- + + For those who are thinking 'Could I make the file sparse?', the answer is + 'No!'. At the moment, there is no way for Suspend2 to fill in the holes in + a sparse file while suspending. In the longer term (post merge!), I'd like + to change things so that the file could be dynamically resized as needed. + Right now, however, that's not possible and not a priority. + + c. Bootloader configuration. + + Using Suspend2 also requires that you add an extra parameter to + your lilo.conf or equivalent. Here's an example for a swap partition: + + append="resume2=swap:/dev/hda1" + + This would tell Suspend2 that /dev/hda1 is a swap partition you + have. Suspend2 will use the swap signature of this partition as a + pointer to your data when you suspend. This means that (in this example) + /dev/hda1 doesn't need to be _the_ swap partition where all of your data + is actually stored. It just needs to be a swap partition that has a + valid signature. + + You don't need to have a swap partition for this purpose. Suspend2 + can also use a swap file, but usage is a little more complex. Having made + your swap file, turn it on and do + + cat /sys/power/suspend2/swapwriter/headerlocations + + (this assumes you've already compiled your kernel with Suspend2 + support and booted it). The results of the cat command will tell you + what you need to put in lilo.conf: + + For swap partitions like /dev/hda1, simply use resume2=/dev/hda1. + For swapfile `swapfile`, use resume2=swap:/dev/hda2:0x242d. + + If the swapfile changes for any reason (it is moved to a different + location, it is deleted and recreated, or the filesystem is + defragmented) then you will have to check + /sys/power/suspend2/swapwriter/headerlocations for a new resume_block value. 
+
+  Once you've compiled and installed the kernel and adjusted your bootloader
+  configuration, you should only need to reboot for the most basic part
+  of Suspend2 to be ready.
+
+  If you only compile in the swapwriter, or only compile in the filewriter,
+  you don't need to add the "swap:" part of the resume2= parameters above.
+  resume2=/dev/hda2:0x242d will work just as well.
+
+  d. The hibernate script.
+
+  Since the driver model in 2.6 kernels is still being developed, you may
+  need to do more, however. Users of Suspend2 usually start the process via
+  a script which prepares for the suspend, tells the kernel to do its stuff
+  and then restores things afterwards. This script might involve:
+
+  - Switching to a text console and back if X doesn't like the video card
+    status on resume.
+  - Un/reloading PCMCIA support since it doesn't play well with suspend.
+
+  Note that you might not be able to unload some drivers if there are
+  processes using them. You might have to kill off processes that hold
+  devices open. Hint: if your X server accesses a USB mouse, doing a
+  'chvt' to a text console releases the device and you can unload the
+  module.
+
+  Check out the latest script (available on suspend2.net).
+
+4. Why not just use the version already in the kernel?
+
+  The version in the vanilla kernel has a number of drawbacks. Among these:
+  - it has a maximum image size of 1/2 total memory.
+  - it doesn't allocate storage until after it has snapshotted memory.
+    This means that you can't be sure suspending will work until you
+    see it start to write the image.
+  - it performs all of its I/O synchronously.
+  - it does not allow you to press escape to cancel a cycle.
+  - it does not allow you to automatically swapon a file when
+    starting a cycle.
+  - it does not allow you to use multiple swap partitions.
+  - it does not allow you to use swapfiles.
+  - it does not allow you to use ordinary files.
+  - it just invalidates an image and continues to boot if you
+    accidentally boot the wrong kernel after suspending.
+  - it doesn't support any sort of nice display while suspending.
+  - it is moving toward requiring that you have an initrd/initramfs
+    to ever have a hope of resuming (uswsusp). While uswsusp will
+    address some of the concerns above, it won't address them all, and
+    will be more complicated to get set up.
+
+5. How do you use it?
+
+  A suspend cycle can be started directly by doing:
+
+	echo > /sys/power/suspend2/do_suspend
+
+  In practice, though, you'll probably want to use the hibernate script
+  to unload modules, configure the kernel the way you like it and so on.
+  In that case, you'd do (as root):
+
+	hibernate
+
+  See the hibernate script's man page for more details on the options it
+  takes.
+
+  If you're using the text or splash user interface modules, one neat feature
+  of Suspend2 that you might find useful is that you can press Escape at any
+  time during suspending, and the process will be aborted.
+
+  Due to the way suspend works, this means you'll have your system back and
+  perfectly usable almost instantly. The only exception is when it's at the
+  very end of writing the image. Then it will need to reload a small
+  (usually 4-50MB, depending upon the image characteristics) portion first.
+
+  If you run into problems with resuming, adding the "noresume2" option to
+  the kernel command line will let you skip the resume step and recover your
+  system.
+
+6. What do all those entries in /sys/power/suspend2 do?
+ + /sys/power/suspend2 is the directory which contains files you can use to + tune and configure Suspend2 to your liking. The exact contents of + the directory will depend upon the version of Suspend2 you're + running and the options you selected at compile time. In the following + descriptions, names in brackets refer to compile time options. + (Note that they're all dependant upon you having selected CONFIG_SUSPEND2 + in the first place!). + + Since the values of these settings can open potential security risks, they + are usually accessible only to the root user. You can, however, enable a + compile time option which makes all of these files world-accessible. This + should only be done if you trust everyone with shell access to this + computer! + + - checksum/enabled + + Use cryptoapi hashing routines to verify that Pageset2 pages don't change + while we're saving the first part of the image, and to get any pages that + do change resaved in the atomic copy. This should normally not be needed, + but if you're seeing issues, please enable this. If your issues stop you + being able to resume, enable this option, suspend and cancel the cycle + after the atomic copy is done. If the debugging info shows a non-zero + number of pages resaved, please report this to Nigel. + + - compression/algorithm + + Set the cryptoapi algorithm used for compressing the image. + + - compression/expected_compression + + These values allow you to set an expected compression ratio, which Software + Suspend will use in calculating whether it meets constraints on the image + size. If this expected compression ratio is not attained, the suspend will + abort, so it is wise to allow some spare. You can see what compression + ratio is achieved in the logs after suspending. + + - debug_info: + + This file returns information about your configuration that may be helpful + in diagnosing problems with suspending. + + - do_resume: + + When anything is written to this file suspend will attempt to read and + restore an image. If there is no image, it will return almost immediately. + If an image exists, the echo > will never return. Instead, the original + kernel context will be restored and the original echo > do_suspend will + return. + + - do_suspend: + + When anything is written to this file, the kernel side of Suspend2 will + begin to attempt to write an image to disk and power down. You'll normally + want to run the hibernate script instead, to get modules unloaded first. + + - driver_model_beeping + + Enable beeping when suspending and resuming the drivers. Might help with + determining where a problem in resuming occurs. + + - */enabled + + These option can be used to temporarily disable various parts of suspend. + + - encryption/* + + The iv, key, save_key_and_iv, mode and algorithm values allow you to + select a cryptoapi encryption algoritm, set the iv and key and whether + they are saved in the image header. Saving the iv and key in the image + header is of course less secure than having them on some external device, + such as a USB key. If you want to use a USB key, you'll need to write + some scripting in your initrd/ramfs to retrieve the key & iv from your + USB key and put them into the entries again prior to doing the echo to + do_resume. + + - extra_pages_allowance + + When Suspend2 does its atomic copy, it calls the driver model suspend + and resume methods. 
+  If you have DRI enabled with a driver such as fglrx, this can result in
+  the driver allocating a substantial amount of memory for storing its
+  state. Extra_pages_allowance tells suspend2 how much extra memory it
+  should ensure is available for those allocations. If your attempts at
+  suspending end with a message in dmesg indicating that insufficient extra
+  pages were allowed, you need to increase this value.
+
+  - filewriter/target:
+
+  Read this value to get the current setting. Write to it to point Suspend
+  at a new storage location for the filewriter. See above for details of how
+  to set up the filewriter.
+
+  - freezer_test
+
+  This entry can be used to get Suspend2 to just test the freezer without
+  actually doing a suspend cycle. It is useful for diagnosing freezing
+  issues.
+
+  - image_exists:
+
+  Can be used in a script to determine whether a valid image exists at the
+  location currently pointed to by resume2=. Returns up to three lines.
+  The first is whether an image exists (-1 for unsure, otherwise 0 or 1).
+  If an image exists, additional lines will return the machine and version.
+  Echoing anything to this entry removes any current image.
+
+  - image_size_limit:
+
+  The maximum size of the suspend image written to disk, measured in
+  megabytes (1024*1024).
+
+  - interface_version:
+
+  The value returned by this file can be used by scripts and configuration
+  tools to determine what entries should be looked for. The value is
+  incremented whenever an entry in /sys/power/suspend2 is obsoleted or
+  added.
+
+  - last_result:
+
+  The result of the last suspend, as defined in
+  include/linux/suspend-debug.h with the values SUSPEND_ABORTED to
+  SUSPEND_KEPT_IMAGE. This is a bitmask.
+
+  - log_everything (CONFIG_PM_DEBUG):
+
+  Setting this option results in all messages printed being logged. Normally,
+  only a subset are logged, so as to not slow the process and not clutter the
+  logs. Useful for debugging. It can be toggled during a cycle by pressing
+  'L'.
+
+  - pause_between_steps (CONFIG_PM_DEBUG):
+
+  This option is used during debugging, to make Suspend2 pause between
+  each step of the process. It is ignored when the nice display is on.
+
+  - powerdown_method:
+
+  Used to select a method by which Suspend2 should power down after writing
+  the image. Currently:
+
+  0: Don't use ACPI to power off.
+  3: Attempt to enter Suspend-to-ram.
+  4: Attempt to enter ACPI S4 mode.
+  5: Attempt to power down via ACPI S5 mode.
+
+  Note that these options are highly dependent upon your hardware & software:
+
+  3: When successful, your machine suspends-to-ram instead of powering off.
+     The advantage of using this mode is that it doesn't matter whether your
+     battery has enough charge to make it through to your next resume. If it
+     lasts, you will simply resume from suspend to ram (and the image on disk
+     will be discarded). If the battery runs out, you will resume from disk
+     instead. The disadvantage is that it takes longer than a normal
+     suspend-to-ram to enter the state, since the suspend-to-disk image needs
+     to be written first.
+  4/5: When successful, your machine will be off and consume (almost) no
+     power. But it might still react to some external events like opening
+     the lid or traffic on a network or USB device. For the bios, resume is
+     then the same as warm boot, similar to a situation where you used the
+     command `reboot' to reboot your machine.
If your machine has problems on warm boot or if + you want to protect your machine with the bios password, this is probably + not the right choice. Mode 4 may be necessary on some machines where ACPI + wake up methods need to be run to properly reinitialise hardware after a + suspend-to-disk cycle. + 0: Switch the machine completely off. The only possible wakeup is the power + button. For the bios, resume is then the same as a cold boot, in + particular you would have to provide your bios boot password if your + machine uses that feature for booting. + + - progressbar_granularity_limit: + + This option can be used to limit the granularity of the progress bar + displayed with a bootsplash screen. The value is the maximum number of + steps. That is, 10 will make the progress bar jump in 10% increments. + + - reboot: + + This option causes Suspend2 to reboot rather than powering down + at the end of saving an image. It can be toggled during a cycle by pressing + 'R'. + + - resume_commandline: + + This entry can be read after resuming to see the commandline that was used + when resuming began. You might use this to set up two bootloader entries + that are the same apart from the fact that one includes a extra append= + argument "at_work=1". You could then grep resume_commandline in your + post-resume scripts and configure networking (for example) differently + depending upon whether you're at home or work. resume_commandline can be + set to arbitrary text if you wish to remove sensitive contents. + + - swapwriter/swapfilename: + + This entry is used to specify the swapfile or partition that + Suspend2 will attempt to swapon/swapoff automatically. Thus, if + I normally use /dev/hda1 for swap, and want to use /dev/hda2 for specifically + for my suspend image, I would + + echo /dev/hda2 > /sys/power/suspend2/swapwriter/swapfile + + /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the + swapon and swapoff occur while other processes are frozen (including kswapd) + so this swap file will not be used up when attempting to free memory. The + parition/file is also given the highest priority, so other swapfiles/partitions + will only be used to save the image when this one is filled. + + The value of this file is used by headerlocations along with any currently + activated swapfiles/partitions. + + - swapwriter/headerlocations: + + This option tells you the resume2= options to use for swap devices you + currently have activated. It is particularly useful when you only want to + use a swap file to store your image. See above for further details. + + - toggle_process_nofreeze + + This entry can be used to toggle the NOFREEZE flag on a process, to allow it + to run during Suspending. It should be used with extreme caution. There are + strict limitations on what a process running during suspend can do. This is + really only intended for use by Suspend's helpers (userui in particular). + + - userui_program + + This entry is used to tell Suspend what userspace program to use for + providing a user interface while suspending. The program uses a netlink + socket to pass messages back and forward to the kernel, allowing all of the + functions formerly implemented in the kernel user interface components. + + - user_interface/debug_sections (CONFIG_PM_DEBUG): + + This value, together with the console log level, controls what debugging + information is displayed. The console log level determines the level of + detail, and this value determines what detail is displayed. 
+  This value is a bit vector, and the meaning of the bits can be found in
+  the kernel tree in include/linux/suspend2.h. It can be overridden using
+  the kernel's command line option suspend_dbg.
+
+  - user_interface/default_console_level (CONFIG_PM_DEBUG):
+
+  This determines the value of the console log level at the start of a
+  suspend cycle. If debugging is compiled in, the console log level can be
+  changed during a cycle by pressing the digit keys. Meanings are:
+
+  0: Nice display.
+  1: Nice display plus numerical progress.
+  2: Errors only.
+  3: Low level debugging info.
+  4: Medium level debugging info.
+  5: High level debugging info.
+  6: Verbose debugging info.
+
+  - user_interface/enable_escape:
+
+  Setting this to "1" will enable you to abort a suspend by pressing escape;
+  "0" (default) disables this feature. Note that enabling this option means
+  that you cannot initiate a suspend and then walk away from your computer,
+  expecting it to be secure. With this feature disabled, you can validly
+  have this expectation once Suspend begins to write the image to disk.
+  (Prior to this point, it is possible that Suspend might abort because of
+  failure to freeze all processes or because constraints on its ability to
+  save the image are not met.)
+
+  - version:
+
+  The version of suspend you have compiled into the currently running kernel.
+
+7. How do you get support?
+
+  Glad you asked. Suspend2 is being actively maintained and supported
+  by Nigel (the guy doing most of the kernel coding at the moment), Bernard
+  (who maintains the hibernate script and userspace user interface components)
+  and its users.
+
+  Resources available include HowTos, FAQs and a Wiki, all accessible via
+  suspend2.net. You can find the mailing lists there.
+
+8. I think I've found a bug. What should I do?
+
+  By far and away, the most common problems people have with Suspend2
+  relate to drivers not having adequate power management support. In this
+  case, it is not a bug with Suspend2, but we can still help you. As we
+  mentioned above, such issues can usually be worked around by building the
+  functionality as modules and unloading them while suspending. Please visit
+  the Wiki for up-to-date lists of known issues and workarounds.
+
+  If this information doesn't help, try running:
+
+	hibernate --bug-report
+
+  ...and sending the output to the users mailing list.
+
+  Good information on how to provide us with useful information from an
+  oops is found in the file REPORTING-BUGS, in the top level directory
+  of the kernel tree. If you get an oops, please especially note the
+  information about running what is printed on the screen through ksymoops.
+  The raw information is useless.
+
+9. When will XXX be supported?
+
+  If there's a feature missing from Suspend2 that you'd like, feel free to
+  ask. We try to be obliging, within reason.
+
+  Patches are welcome. Please send to the list.
+
+10. How does it work?
+
+  Suspend2 does its work in a number of steps.
+
+  a. Freezing system activity.
+
+  The first main stage in suspending is to stop all other activity. This is
+  achieved in stages. Processes are considered in four groups, which we will
+  describe in reverse order for clarity's sake: threads with the PF_NOFREEZE
+  flag, kernel threads without this flag, userspace processes with the
+  PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are
+  untouched by the refrigerator code.
They are allowed to run during suspending + and resuming, and are used to support user interaction, storage access or the + like. Other kernel threads (those unneeded while suspending) are frozen last. + This leaves us with userspace processes that need to be frozen. When a + process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on + that process for the duration of that call. Processes that have this flag are + frozen after processes without it, so that we can seek to ensure that dirty + data is synced to disk as quickly as possible in a situation where other + processes may be submitting writes at the same time. Freezing the processes + that are submitting data stops new I/O from being submitted. Syncthreads can + then cleanly finish their work. So the order is: + + - Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE; + - Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE); + - Kernel processes without PF_NOFREEZE. + + b. Eating memory. + + For a successful suspend, you need to have enough disk space to store the + image and enough memory for the various limitations of Suspend2's + algorithm. You can also specify a maximum image size. In order to attain + to those constraints, Suspend2 may 'eat' memory. If, after freezing + processes, the constraints aren't met, Suspend2 will thaw all the + other processes and begin to eat memory until its calculations indicate + the constraints are met. It will then freeze processes again and recheck + its calculations. + + c. Allocation of storage. + + Next, Suspend2 allocates the storage that will be used to save + the image. + + The core of Suspend2 knows nothing about how or where pages are stored. We + therefore request the active writer (remember you might have compiled in + more than one!) to allocate enough storage for our expect image size. If + this request cannot be fulfilled, we eat more memory and try again. If it + is fulfiled, we seek to allocate additional storage, just in case our + expected compression ratio (if any) isn't achieved. This time, however, we + just continue if we can't allocate enough storage. + + If these calls to our writer change the characteristics of the image such + that we haven't allocated enough memory, we also loop. (The writer may well + need to allocate space for its storage information). + + d. Write the first part of the image. + + Suspend2 stores the image in two sets of pages called 'pagesets'. + Pageset 2 contains pages on the active and inactive lists; essentially + the page cache. Pageset 1 contains all other pages, including the kernel. + We use two pagesets for one important reason: We need to make an atomic copy + of the kernel to ensure consistency of the image. Without a second pageset, + that would limit us to an image that was at most half the amount of memory + available. Using two pagesets allows us to store a full image. Since pageset + 2 pages won't be needed in saving pageset 1, we first save pageset 2 pages. + We can then make our atomic copy of the remaining pages using both pageset 2 + pages and any other pages that are free. While saving both pagesets, we are + careful not to corrupt the image. Among other things, we use lowlevel block + I/O routines that don't change the pagecache contents. + + The next step, then, is writing pageset 2. + + e. Suspending drivers and storing processor context. 
+ + Having written pageset2, Suspend2 calls the power management functions to + notify drivers of the suspend, and saves the processor state in preparation + for the atomic copy of memory we are about to make. + + f. Atomic copy. + + At this stage, everything else but the Suspend2 code is halted. Processes + are frozen or idling, drivers are quiesced and have stored (ideally and where + necessary) their configuration in memory we are about to atomically copy. + In our lowlevel architecture specific code, we have saved the CPU state. + We can therefore now do our atomic copy before resuming drivers etc. + + g. Save the atomic copy (pageset 1). + + Suspend can then write the atomic copy of the remaining pages. Since we + have copied the pages into other locations, we can continue to use the + normal block I/O routines without fear of corruption our image. + + f. Save the suspend header. + + Nearly there! We save our settings and other parameters needed for + reloading pageset 1 in a 'suspend header'. We also tell our writer to + serialise its data at this stage, so that it can reread the image at resume + time. Note that the writer can write this data in any format - in the case + of the swapwriter, for example, it splits header pages in 4092 byte blocks, + using the last four bytes to link pages of data together. This is completely + transparent to the core. + + g. Set the image header. + + Finally, we edit the header at our resume2= location. The signature is + changed by the writer to reflect the fact that an image exists, and to point + to the start of that data if necessary (swapwriter). + + h. Power down. + + Or reboot if we're debugging and the appropriate option is selected. + + Whew! + + Reloading the image. + -------------------- + + Reloading the image is essentially the reverse of all the above. We load + our copy of pageset 1, being careful to choose locations that aren't going + to be overwritten as we copy it back (We start very early in the boot + process, so there are no other processes to quiesce here). We then copy + pageset 1 back to its original location in memory and restore the process + context. We are now running with the original kernel. Next, we reload the + pageset 2 pages, free the memory and swap used by Suspend2, restore + the pageset header and restart processes. Sounds easy in comparison to + suspending, doesn't it! + + There is of course more to Suspend2 than this, but this explanation + should be a good start. If there's interest, I'll write further + documentation on range pages and the low level I/O. + +11. Who wrote Suspend2? + + (Answer based on the writings of Florent Chabaud, credits in files and + Nigel's limited knowledge; apologies to anyone missed out!) + + The main developers of Suspend2 have been... + + Gabor Kuti + Pavel Machek + Florent Chabaud + Bernard Blackham + Nigel Cunningham + + They have been aided in their efforts by a host of hundreds, if not thousands + of testers and people who have submitted bug fixes & suggestions. Of special + note are the efforts of Michael Frank, who had his computers repetitively + suspend and resume for literally tens of thousands of cycles and developed + scripts to stress the system and test Suspend2 far beyond the point + most of us (Nigel included!) would consider testing. His efforts have + contributed as much to Suspend2 as any of the names above. 
diff --git a/MAINTAINERS b/MAINTAINERS index 277877a..f1adba1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3218,6 +3218,13 @@ M: sammy@sammy.net W: http://sammy.net/sun3/ S: Maintained +SUSPEND2 +P: Nigel Cunningham +M: nigel@suspend2.net +L: suspend2-devel@suspend2.net +W: http://suspend2.net +S: Maintained + SVGA HANDLING P: Martin Mares M: mj@ucw.cz diff --git a/arch/i386/mm/fault.c b/arch/i386/mm/fault.c index b8c4e25..21982d9 100644 --- a/arch/i386/mm/fault.c +++ b/arch/i386/mm/fault.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -33,6 +34,9 @@ extern void die(const char *,struct pt_regs *,long); static ATOMIC_NOTIFIER_HEAD(notify_page_fault_chain); +int suspend2_faulted = 0; +EXPORT_SYMBOL(suspend2_faulted); + int register_page_fault_notifier(struct notifier_block *nb) { vmalloc_sync_all(); @@ -311,6 +315,20 @@ fastcall void __kprobes do_page_fault(struct pt_regs *regs, si_code = SEGV_MAPERR; + /* During a Suspend2 atomic copy, with DEBUG_SLAB, we will + * get page faults where slab has been unmapped. Map them + * temporarily and set the variable that tells Suspend2 to + * unmap afterwards. + */ + + if (unlikely(suspend2_running && !suspend2_faulted)) { + struct page *page = NULL; + suspend2_faulted = 1; + page = virt_to_page(address); + kernel_map_pages(page, 1, 1); + return; + } + /* * We fault-in kernel-space virtual memory on-demand. The * 'reference' page table is init_mm.pgd. diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c index ae43688..a180d21 100644 --- a/arch/i386/mm/init.c +++ b/arch/i386/mm/init.c @@ -387,7 +387,7 @@ static void __init pagetable_init (void) #endif } -#if defined(CONFIG_SOFTWARE_SUSPEND) || defined(CONFIG_ACPI_SLEEP) +#if defined(CONFIG_SUSPEND_SHARED) || defined(CONFIG_ACPI_SLEEP) /* * Swap suspend & friends need this for resume because things like the intel-agp * driver might have split up a kernel 4MB mapping. @@ -774,13 +774,13 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end) unsigned long addr; for (addr = begin; addr < end; addr += PAGE_SIZE) { - ClearPageReserved(virt_to_page(addr)); - init_page_count(virt_to_page(addr)); + //ClearPageReserved(virt_to_page(addr)); + //init_page_count(virt_to_page(addr)); memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE); - free_page(addr); - totalram_pages++; + //free_page(addr); + //totalram_pages++; } - printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); + //printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10); } void free_initmem(void) diff --git a/arch/i386/mm/pageattr.c b/arch/i386/mm/pageattr.c index 412ebbd..dded2e1 100644 --- a/arch/i386/mm/pageattr.c +++ b/arch/i386/mm/pageattr.c @@ -252,7 +252,27 @@ void kernel_map_pages(struct page *page, int numpages, int enable) */ __flush_tlb_all(); } +EXPORT_SYMBOL(kernel_map_pages); #endif +int page_is_mapped(struct page *page) +{ + pte_t *kpte; + unsigned long address; + struct page *kpte_page; + + if(PageHighMem(page)) + return 0; + + address = (unsigned long)page_address(page); + + kpte = lookup_address(address); + if (!kpte) + return -EINVAL; + kpte_page = virt_to_page(kpte); + + return (pte_val(*kpte) & (__PAGE_KERNEL_EXEC | __PAGE_KERNEL)) ? 
1:0; +} EXPORT_SYMBOL(change_page_attr); EXPORT_SYMBOL(global_flush_tlb); +EXPORT_SYMBOL(page_is_mapped); diff --git a/arch/i386/power/Makefile b/arch/i386/power/Makefile index 2de7bbf..72a6169 100644 --- a/arch/i386/power/Makefile +++ b/arch/i386/power/Makefile @@ -1,2 +1,2 @@ obj-$(CONFIG_PM) += cpu.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o suspend.o +obj-$(CONFIG_SUSPEND_SHARED) += swsusp.o suspend.o diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 8120d42..5d57f0c 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -36,7 +36,7 @@ obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_6xx) += idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o obj-$(CONFIG_TAU) += tau_6xx.o -obj32-$(CONFIG_SOFTWARE_SUSPEND) += swsusp_32.o +obj32-$(CONFIG_SUSPEND_SHARED) += swsusp_32.o obj32-$(CONFIG_MODULES) += module_32.o ifeq ($(CONFIG_PPC_MERGE),y) diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 651fa42..243e5f4 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -425,7 +425,7 @@ static void __init find_boot_device(void) * only */ -#ifdef CONFIG_SOFTWARE_SUSPEND +#ifdef CONFIG_SUSPEND_SHARED static int pmac_pm_prepare(suspend_state_t state) { @@ -480,16 +480,16 @@ static struct pm_ops pmac_pm_ops = { .valid = pmac_pm_valid, }; -#endif /* CONFIG_SOFTWARE_SUSPEND */ +#endif /* CONFIG_SUSPEND_SHARED */ static int initializing = 1; static int pmac_late_init(void) { initializing = 0; -#ifdef CONFIG_SOFTWARE_SUSPEND +#ifdef CONFIG_SUSPEND_SHARED pm_set_ops(&pmac_pm_ops); -#endif /* CONFIG_SOFTWARE_SUSPEND */ +#endif /* CONFIG_SUSPEND_SHARED */ return 0; } diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile index bb47e86..9943190 100644 --- a/arch/x86_64/kernel/Makefile +++ b/arch/x86_64/kernel/Makefile @@ -26,7 +26,7 @@ obj-y += io_apic.o mpparse.o \ obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_PM) += suspend.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend_asm.o +obj-$(CONFIG_SUSPEND_SHARED) += suspend_asm.o obj-$(CONFIG_CPU_FREQ) += cpufreq/ obj-$(CONFIG_EARLY_PRINTK) += early_printk.o obj-$(CONFIG_IOMMU) += pci-gart.o aperture.o diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c index 91f7e67..983f16f 100644 --- a/arch/x86_64/kernel/suspend.c +++ b/arch/x86_64/kernel/suspend.c @@ -140,7 +140,7 @@ void fix_processor_context(void) } -#ifdef CONFIG_SOFTWARE_SUSPEND +#ifdef CONFIG_SUSPEND_SHARED /* Defined in arch/x86_64/kernel/suspend_asm.S */ extern int restore_image(void); @@ -219,4 +219,4 @@ int swsusp_arch_resume(void) restore_image(); return 0; } -#endif /* CONFIG_SOFTWARE_SUSPEND */ +#endif /* CONFIG_SUSPEND_SHARED */ diff --git a/crypto/Kconfig b/crypto/Kconfig index 086fcec..efde46c 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -406,6 +406,14 @@ config CRYPTO_DEFLATE You will most probably want this if using IPSec. +config CRYPTO_LZF + tristate "LZF compression algorithm" + default y + select CRYPTO_ALGAPI + help + This is the LZF algorithm. It is especially useful for Suspend2, + because it achieves good compression quickly. 
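For a sense of how the algorithm registered above is consumed, the sketch below shows a kernel-side caller driving the "lzf" algorithm through the standard cryptoapi compression interface (crypto_alloc_comp()/crypto_comp_compress()). This is not part of the patch: the function name is made up, error handling is minimal, and dlen is assumed to hold the output buffer size on entry and the compressed size on return.

    #include <linux/crypto.h>
    #include <linux/err.h>

    /* Sketch: compress one buffer with the "lzf" cryptoapi algorithm. */
    static int lzf_compress_buffer(const u8 *src, unsigned int slen,
                                   u8 *dst, unsigned int *dlen)
    {
            struct crypto_comp *tfm;
            int ret;

            tfm = crypto_alloc_comp("lzf", 0, 0);
            if (IS_ERR(tfm))
                    return PTR_ERR(tfm);

            ret = crypto_comp_compress(tfm, src, slen, dst, dlen);
            crypto_free_comp(tfm);

            return ret;
    }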
+ config CRYPTO_MICHAEL_MIC tristate "Michael MIC keyed digest algorithm" select CRYPTO_ALGAPI diff --git a/crypto/Makefile b/crypto/Makefile index 12f93f5..69a8af3 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -46,5 +46,6 @@ obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o +obj-$(CONFIG_CRYPTO_LZF) += lzf.o obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o diff --git a/crypto/lzf.c b/crypto/lzf.c new file mode 100644 index 0000000..7c74784 --- /dev/null +++ b/crypto/lzf.c @@ -0,0 +1,325 @@ +/* + * Cryptoapi LZF compression module. + * + * Copyright (c) 2004-2005 Nigel Cunningham + * + * based on the deflate.c file: + * + * Copyright (c) 2003 James Morris + * + * and upon the LZF compression module donated to the Suspend2 project with + * the following copyright: + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * Copyright (c) 2000-2003 Marc Alexander Lehmann + * + * Redistribution and use in source and binary forms, with or without modifica- + * tion, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER- + * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO + * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE- + * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; + * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH- + * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Alternatively, the contents of this file may be used under the terms of + * the GNU General Public License version 2 (the "GPL"), in which case the + * provisions of the GPL are applicable instead of the above. If you wish to + * allow the use of your version of this file only under the terms of the + * GPL and not to allow others to use your version of this file under the + * BSD license, indicate your decision by deleting the provisions above and + * replace them with the notice and other provisions required by the GPL. If + * you do not delete the provisions above, a recipient may use your version + * of this file under either the BSD or the GPL. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include + +struct lzf_ctx { + void *hbuf; + unsigned int bufofs; +}; + +/* + * size of hashtable is (1 << hlog) * sizeof (char *) + * decompression is independent of the hash table size + * the difference between 15 and 14 is very small + * for small blocks (and 14 is also faster). + * For a low-memory configuration, use hlog == 13; + * For best compression, use 15 or 16. + */ +static const int hlog = 14; + +/* + * don't play with this unless you benchmark! + * decompression is not dependent on the hash function + * the hashing function might seem strange, just believe me + * it works ;) + */ +static inline u16 first(const u8 *p) +{ + return ((p[0]) << 8) + p[1]; +} + +static inline u16 next(u8 v, const u8 *p) +{ + return ((v) << 8) + p[2]; +} + +static inline u32 idx(unsigned int h) +{ + return (((h ^ (h << 5)) >> (3*8 - hlog)) + h*3) & ((1 << hlog) - 1); +} + +/* + * IDX works because it is very similar to a multiplicative hash, e.g. + * (h * 57321 >> (3*8 - hlog)) + * the next one is also quite good, albeit slow ;) + * (int)(cos(h & 0xffffff) * 1e6) + */ + +static const int max_lit = (1 << 5); +static const int max_off = (1 << 13); +static const int max_ref = ((1 << 8) + (1 << 3)); + +/* + * compressed format + * + * 000LLLLL ; literal + * LLLOOOOO oooooooo ; backref L + * 111OOOOO LLLLLLLL oooooooo ; backref L+7 + * + */ + +static void lzf_compress_exit(struct crypto_tfm *tfm) +{ + struct lzf_ctx *ctx = crypto_tfm_ctx(tfm); + + if (!ctx->hbuf) + return; + + vfree(ctx->hbuf); + ctx->hbuf = NULL; +} + +static int lzf_compress_init(struct crypto_tfm *tfm) +{ + struct lzf_ctx *ctx = crypto_tfm_ctx(tfm); + + /* Get LZF ready to go */ + ctx->hbuf = vmalloc_32((1 << hlog) * sizeof(char *)); + if (ctx->hbuf) + return 0; + + printk(KERN_WARNING "Failed to allocate %ld bytes for lzf workspace\n", + (long) ((1 << hlog) * sizeof(char *))); + return -ENOMEM; +} + +static int lzf_compress(struct crypto_tfm *tfm, const u8 *in_data, + unsigned int in_len, u8 *out_data, unsigned int *out_len) +{ + struct lzf_ctx *ctx = crypto_tfm_ctx(tfm); + const u8 **htab = ctx->hbuf; + const u8 **hslot; + const u8 *ip = in_data; + u8 *op = out_data; + const u8 *in_end = ip + in_len; + u8 *out_end = op + *out_len - 3; + const u8 *ref; + + unsigned int hval = first(ip); + unsigned long off; + int lit = 0; + + memset(htab, 0, sizeof(htab)); + + for (;;) { + if (ip < in_end - 2) { + hval = next(hval, ip); + hslot = htab + idx(hval); + ref = *hslot; + *hslot = ip; + + if ((off = ip - ref - 1) < max_off + && ip + 4 < in_end && ref > in_data + && *(u16 *) ref == *(u16 *) ip && ref[2] == ip[2] + ) { + /* match found at *ref++ */ + unsigned int len = 2; + unsigned int maxlen = in_end - ip - len; + maxlen = maxlen > max_ref ? 
max_ref : maxlen; + + do + len++; + while (len < maxlen && ref[len] == ip[len]); + + if (op + lit + 1 + 3 >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + if (lit) { + *op++ = lit - 1; + lit = -lit; + do + *op++ = ip[lit]; + while (++lit); + } + + len -= 2; + ip++; + + if (len < 7) { + *op++ = (off >> 8) + (len << 5); + } else { + *op++ = (off >> 8) + (7 << 5); + *op++ = len - 7; + } + + *op++ = off; + + ip += len; + hval = first(ip); + hval = next(hval, ip); + htab[idx(hval)] = ip; + ip++; + continue; + } + } else if (ip == in_end) + break; + + /* one more literal byte we must copy */ + lit++; + ip++; + + if (lit == max_lit) { + if (op + 1 + max_lit >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + *op++ = max_lit - 1; + memcpy(op, ip - max_lit, max_lit); + op += max_lit; + lit = 0; + } + } + + if (lit) { + if (op + lit + 1 >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + *op++ = lit - 1; + lit = -lit; + do + *op++ = ip[lit]; + while (++lit); + } + + *out_len = op - out_data; + return 0; +} + +static int lzf_decompress(struct crypto_tfm *tfm, const u8 *src, + unsigned int slen, u8 *dst, unsigned int *dlen) +{ + u8 const *ip = src; + u8 *op = dst; + u8 const *const in_end = ip + slen; + u8 *const out_end = op + *dlen; + + *dlen = PAGE_SIZE; + do { + unsigned int ctrl = *ip++; + + if (ctrl < (1 << 5)) { /* literal run */ + ctrl++; + + if (op + ctrl > out_end) + return 0; + memcpy(op, ip, ctrl); + op += ctrl; + ip += ctrl; + } else { /* back reference */ + + unsigned int len = ctrl >> 5; + + u8 *ref = op - ((ctrl & 0x1f) << 8) - 1; + + if (len == 7) + len += *ip++; + + ref -= *ip++; + len += 2; + + if (op + len > out_end || ref < (u8 *) dst) + return 0; + + do + *op++ = *ref++; + while (--len); + } + } + while (op < out_end && ip < in_end); + + *dlen = op - (u8 *) dst; + return 0; +} + +static struct crypto_alg alg = { + .cra_name = "lzf", + .cra_flags = CRYPTO_ALG_TYPE_COMPRESS, + .cra_ctxsize = 0, + .cra_module = THIS_MODULE, + .cra_list = LIST_HEAD_INIT(alg.cra_list), + .cra_init = lzf_compress_init, + .cra_exit = lzf_compress_exit, + .cra_u = { .compress = { + .coa_compress = lzf_compress, + .coa_decompress = lzf_decompress } } +}; + +static int __init init(void) +{ + return crypto_register_alg(&alg); +} + +static void __exit fini(void) +{ + crypto_unregister_alg(&alg); +} + +module_init(init); +module_exit(fini); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("LZF Compression Algorithm"); +MODULE_AUTHOR("Marc Alexander Lehmann & Nigel Cunningham"); diff --git a/drivers/base/core.c b/drivers/base/core.c index d7fcf82..edbbf76 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -27,6 +27,8 @@ int (*platform_notify)(struct device * dev) = NULL; int (*platform_notify_remove)(struct device * dev) = NULL; +static int do_dump_stack; + /* * sysfs bindings for devices. 
*/ @@ -638,6 +640,18 @@ int device_add(struct device *dev) class_intf->add_dev(dev, class_intf); up(&dev->class->sem); } + +#ifdef CONFIG_PM + if (!((dev->class && dev->class->resume) || + (dev->bus && (dev->bus->resume || dev->bus->resume_early))) && + !dev->pm_safe) { + printk("Device driver %s lacks bus and class support for " + "being resumed.\n", kobject_name(&dev->kobj)); + if (do_dump_stack) + dump_stack(); + } +#endif + Done: kfree(class_name); put_device(dev); @@ -975,6 +989,7 @@ struct device *device_create(struct class *class, struct device *parent, dev->class = class; dev->parent = parent; dev->release = device_create_release; + dev->pm_safe = 1; va_start(args, fmt); vsnprintf(dev->bus_id, BUS_ID_SIZE, fmt, args); @@ -1183,3 +1198,11 @@ out: } EXPORT_SYMBOL_GPL(device_move); + +static int __init pm_debug_dump_stack(char *str) +{ + do_dump_stack = 1; + return 1; +} + +__setup("pm_debug_dump_stack", pm_debug_dump_stack); diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c index b6073bd..32bd423 100644 --- a/drivers/macintosh/via-pmu.c +++ b/drivers/macintosh/via-pmu.c @@ -42,7 +42,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/md/md.c b/drivers/md/md.c index 509171c..76e5ac8 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5309,6 +5309,8 @@ void md_do_sync(mddev_t *mddev) last_mark = next; } + while(freezer_is_on()) + yield(); if (kthread_should_stop()) { /* diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index a3c1755..171be82 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -451,6 +451,10 @@ int __pci_register_driver(struct pci_driver *drv, struct module *owner, if (error) driver_unregister(&drv->driver); + if (!drv->resume) + printk("PCI driver %s lacks driver specific resume support.\n", + drv->name); + return error; } diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 9e3e943..e91fdef 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -788,6 +788,9 @@ int usb_register_driver(struct usb_driver *new_driver, struct module *owner, usbcore_name, new_driver->name); usbfs_update_special(); usb_create_newid_file(new_driver); + if (!new_driver->resume) + printk("USB driver %s lacks resume support.\n", + new_driver->name); } else { printk(KERN_ERR "%s: error %d registering interface " " driver %s\n", diff --git a/include/asm-i386/cacheflush.h b/include/asm-i386/cacheflush.h index 74e03c8..3bb8575 100644 --- a/include/asm-i386/cacheflush.h +++ b/include/asm-i386/cacheflush.h @@ -36,4 +36,6 @@ void kernel_map_pages(struct page *page, int numpages, int enable); void mark_rodata_ro(void); #endif +extern int page_is_mapped(struct page *page); + #endif /* _I386_CACHEFLUSH_H */ diff --git a/include/asm-i386/suspend.h b/include/asm-i386/suspend.h index 8dbaafe..e23fd20 100644 --- a/include/asm-i386/suspend.h +++ b/include/asm-i386/suspend.h @@ -8,6 +8,9 @@ static inline int arch_prepare_suspend(void) { return 0; } +extern int suspend2_faulted; +#define clear_suspend2_fault() do { suspend2_faulted = 0; } while(0) + /* image of the saved processor state */ struct saved_context { u16 es, fs, gs, ss; diff --git a/include/asm-ppc/suspend.h b/include/asm-ppc/suspend.h index 3df9f32..9d5db0e 100644 --- a/include/asm-ppc/suspend.h +++ b/include/asm-ppc/suspend.h @@ -10,3 +10,6 @@ static inline void save_processor_state(void) static inline void restore_processor_state(void) { } + +#define suspend2_faulted (0) +#define clear_suspend2_fault() do { } 
while(0) diff --git a/include/asm-x86_64/cacheflush.h b/include/asm-x86_64/cacheflush.h index ab1cb5c..b8e7def 100644 --- a/include/asm-x86_64/cacheflush.h +++ b/include/asm-x86_64/cacheflush.h @@ -32,4 +32,9 @@ int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot); void mark_rodata_ro(void); #endif +static inline int page_is_mapped(struct page *page) +{ + return 1; +} + #endif /* _X8664_CACHEFLUSH_H */ diff --git a/include/asm-x86_64/suspend.h b/include/asm-x86_64/suspend.h index bc7f817..2f18e1b 100644 --- a/include/asm-x86_64/suspend.h +++ b/include/asm-x86_64/suspend.h @@ -12,6 +12,9 @@ arch_prepare_suspend(void) return 0; } +#define suspend2_faulted (0) +#define clear_suspend2_fault() do { } while(0) + /* Image of the saved processor state. If you touch this, fix acpi_wakeup.S. */ struct saved_context { u16 ds, es, fs, gs, ss; diff --git a/include/linux/device.h b/include/linux/device.h index 5cf30e9..69122c1 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -402,6 +402,7 @@ struct device { char bus_id[BUS_ID_SIZE]; /* position on parent bus */ struct device_type *type; unsigned is_registered:1; + unsigned pm_safe:1; /* No resume fn is ok? */ struct device_attribute uevent_attr; struct device_attribute *devt_attr; diff --git a/include/linux/dyn_pageflags.h b/include/linux/dyn_pageflags.h new file mode 100644 index 0000000..23d9127 --- /dev/null +++ b/include/linux/dyn_pageflags.h @@ -0,0 +1,68 @@ +/* + * include/linux/dyn_pageflags.h + * + * Copyright (C) 2004-2006 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It implements support for dynamically allocated bitmaps that are + * used for temporary or infrequently used pageflags, in lieu of + * bits in the struct page flags entry. + */ + +#ifndef DYN_PAGEFLAGS_H +#define DYN_PAGEFLAGS_H + +#include + +/* [pg_dat][zone][page_num] */ +typedef unsigned long **** dyn_pageflags_t; + +#if BITS_PER_LONG == 32 +#define UL_SHIFT 5 +#else +#if BITS_PER_LONG == 64 +#define UL_SHIFT 6 +#else +#error Bits per long not 32 or 64? +#endif +#endif + +#define BIT_NUM_MASK (sizeof(unsigned long) * 8 - 1) +#define PAGE_NUM_MASK (~((1 << (PAGE_SHIFT + 3)) - 1)) +#define UL_NUM_MASK (~(BIT_NUM_MASK | PAGE_NUM_MASK)) + +/* + * PAGENUMBER gives the index of the page within the zone. + * PAGEINDEX gives the index of the unsigned long within that page. + * PAGEBIT gives the index of the bit within the unsigned long. 
+ */ +#define BITS_PER_PAGE (PAGE_SIZE << 3) +#define PAGENUMBER(zone_offset) ((int) (zone_offset >> (PAGE_SHIFT + 3))) +#define PAGEINDEX(zone_offset) ((int) ((zone_offset & UL_NUM_MASK) >> UL_SHIFT)) +#define PAGEBIT(zone_offset) ((int) (zone_offset & BIT_NUM_MASK)) + +#define PAGE_UL_PTR(bitmap, node, zone_num, zone_pfn) \ + ((bitmap[node][zone_num][PAGENUMBER(zone_pfn)])+PAGEINDEX(zone_pfn)) + +#define BITMAP_FOR_EACH_SET(bitmap, counter) \ + for (counter = get_next_bit_on(bitmap, max_pfn + 1); counter <= max_pfn; \ + counter = get_next_bit_on(bitmap, counter)) + +extern void clear_dyn_pageflags(dyn_pageflags_t pagemap); +extern int allocate_dyn_pageflags(dyn_pageflags_t *pagemap); +extern void free_dyn_pageflags(dyn_pageflags_t *pagemap); +extern unsigned long get_next_bit_on(dyn_pageflags_t bitmap, unsigned long counter); + +extern int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +extern void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +extern void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +#endif + +/* + * With the above macros defined, you can do... + * #define PagePageset1(page) (test_dynpageflag(&pageset1_map, page)) + * #define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page)) + * #define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page)) + */ + diff --git a/include/linux/freezer.h b/include/linux/freezer.h index 5e75e26..f49c9be 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -1,7 +1,10 @@ -/* Freezer declarations */ +#ifndef LINUX_FREEZER_H +#define LINUX_FREEZER_H #include +/* Freezer declarations */ + #ifdef CONFIG_PM /* * Check if a process has been frozen @@ -73,6 +76,18 @@ static inline int try_to_freeze(void) extern void thaw_some_processes(int all); +extern int freezer_state; +#define FREEZER_OFF 0 +#define FREEZER_USERSPACE_FROZEN 1 +#define FREEZER_FULLY_ON 2 + +static inline int freezer_is_on(void) +{ + return (freezer_state == FREEZER_FULLY_ON); +} + +extern void thaw_kernel_threads(void); + #else static inline int frozen(struct task_struct *p) { return 0; } static inline int freezing(struct task_struct *p) { return 0; } @@ -85,6 +100,9 @@ static inline int freeze_processes(void) { BUG(); return 0; } static inline void thaw_processes(void) {} static inline int try_to_freeze(void) { return 0; } +static inline int freezer_is_on(void) { return 0; } +static inline void thaw_kernel_threads(void) { } #endif +#endif diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 9ddf25c..c18ec95 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -113,6 +113,8 @@ extern int vsprintf(char *buf, const char *, va_list) __attribute__ ((format (printf, 2, 0))); extern int snprintf(char * buf, size_t size, const char * fmt, ...) __attribute__ ((format (printf, 3, 4))); +extern int snprintf_used(char *buffer, int buffer_size, + const char *fmt, ...); extern int vsnprintf(char *buf, size_t size, const char *fmt, va_list args) __attribute__ ((format (printf, 3, 0))); extern int scnprintf(char * buf, size_t size, const char * fmt, ...) 
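The snprintf_used() helper declared in the kernel.h hunk above is implemented elsewhere in the Suspend2 patch. Judging from how later code in this patch accumulates its return value across successive calls, it appears to return the number of characters actually placed in the buffer, rather than the untruncated length that snprintf() reports. A minimal userspace model of that inferred behaviour follows; the name and the clamping are assumptions, not the patch's implementation.

    #include <stdarg.h>
    #include <stdio.h>

    /* Model of the inferred semantics: format into the buffer, but return only
     * the number of characters that actually fit, so callers can add the result
     * to an offset safely even when output is truncated. */
    static int snprintf_used_model(char *buffer, int buffer_size, const char *fmt, ...)
    {
            va_list args;
            int result;

            if (!buffer || buffer_size <= 0)
                    return 0;

            va_start(args, fmt);
            result = vsnprintf(buffer, buffer_size, fmt, args);
            va_end(args);

            /* vsnprintf() reports the would-be length; clamp to what was written. */
            if (result >= buffer_size)
                    result = buffer_size - 1;

            return result;
    }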
diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 2a20f48..b4d9db1 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -24,6 +24,8 @@ /* leave room for NETLINK_DM (DM Events) */ #define NETLINK_SCSITRANSPORT 18 /* SCSI Transports */ #define NETLINK_ECRYPTFS 19 +#define NETLINK_SUSPEND2_USERUI 20 /* For suspend2's userui */ +#define NETLINK_SUSPEND2_USM 21 /* For suspend2's userspace storage manager */ #define MAX_LINKS 32 diff --git a/include/linux/suspend.h b/include/linux/suspend.h index bf99bd4..02c1184 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -27,14 +27,9 @@ extern void mark_free_pages(struct zone *zone); /* kernel/power/swsusp.c */ extern int software_suspend(void); -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) extern int pm_prepare_console(void); extern void pm_restore_console(void); #else -static inline int pm_prepare_console(void) { return 0; } -static inline void pm_restore_console(void) {} -#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */ -#else static inline int software_suspend(void) { printk("Warning: fake suspend called\n"); @@ -45,8 +40,6 @@ static inline int software_suspend(void) void save_processor_state(void); void restore_processor_state(void); struct saved_context; -void __save_processor_state(struct saved_context *ctxt); -void __restore_processor_state(struct saved_context *ctxt); unsigned long get_safe_page(gfp_t gfp_mask); /* @@ -55,4 +48,81 @@ unsigned long get_safe_page(gfp_t gfp_mask); */ #define PAGES_FOR_IO 1024 +enum { + SUSPEND_CAN_SUSPEND, + SUSPEND_CAN_RESUME, + SUSPEND_RUNNING, + SUSPEND_RESUME_DEVICE_OK, + SUSPEND_NORESUME_SPECIFIED, + SUSPEND_SANITY_CHECK_PROMPT, + SUSPEND_PAGESET2_NOT_LOADED, + SUSPEND_CONTINUE_REQ, + SUSPEND_RESUMED_BEFORE, + SUSPEND_RESUME_NOT_DONE, + SUSPEND_BOOT_TIME, + SUSPEND_NOW_RESUMING, + SUSPEND_IGNORE_LOGLEVEL, + SUSPEND_TRYING_TO_RESUME, + SUSPEND_TRY_RESUME_RD, + SUSPEND_LOADING_ALT_IMAGE, + SUSPEND_STOP_RESUME, + SUSPEND_IO_STOPPED, +}; + +#ifdef CONFIG_SUSPEND2 + +/* Used in init dir files */ +extern unsigned long suspend_state; +#define set_suspend_state(bit) (set_bit(bit, &suspend_state)) +#define clear_suspend_state(bit) (clear_bit(bit, &suspend_state)) +#define test_suspend_state(bit) (test_bit(bit, &suspend_state)) +extern int suspend2_running; + +#else /* !CONFIG_SUSPEND2 */ + +#define suspend_state (0) +#define set_suspend_state(bit) do { } while(0) +#define clear_suspend_state(bit) do { } while (0) +#define test_suspend_state(bit) (0) +#define suspend2_running (0) +#endif /* CONFIG_SUSPEND2 */ + +#ifdef CONFIG_SUSPEND_SHARED +#ifdef CONFIG_SUSPEND2 +extern void suspend2_try_resume(void); +#else +#define suspend2_try_resume() do { } while(0) +#endif + +extern int resume_attempted; + +#ifdef CONFIG_SOFTWARE_SUSPEND +extern int software_resume(void); +#else +static inline int software_resume(void) +{ + resume_attempted = 1; + suspend2_try_resume(); + return 0; +} +#endif + +static inline void check_resume_attempted(void) +{ + if (resume_attempted) + return; + + software_resume(); +} +#else +#define check_resume_attempted() do { } while(0) +#define resume_attempted (0) +#endif + +#ifdef CONFIG_PRINTK_NOSAVE +#define POSS_NOSAVE __nosavedata +#else +#define POSS_NOSAVE +#endif + #endif /* _LINUX_SWSUSP_H */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 0068688..ad3fead 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -190,8 +190,9 @@ extern void swap_setup(void); /* linux/mm/vmscan.c */ extern unsigned 
long try_to_free_pages(struct zone **, gfp_t, struct task_struct *p); extern unsigned long shrink_all_memory(unsigned long nr_pages); +extern void shrink_one_zone(struct zone *zone, int desired_size); extern int vm_mapped; extern int vm_hardmaplimit; extern int remove_mapping(struct address_space *mapping, struct page *page); extern long vm_total_pages; @@ -368,5 +369,10 @@ static inline swp_entry_t get_swap_page(void) #define disable_swap_token() do { } while(0) #endif /* CONFIG_SWAP */ + +/* For Suspend2 - unlink LRU pages while saving separately */ +void unlink_lru_lists(void); +void relink_lru_lists(void); + #endif /* __KERNEL__*/ #endif /* _LINUX_SWAP_H */ diff --git a/include/linux/time.h b/include/linux/time.h index 8ea8dea..5c07f00 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -224,4 +224,7 @@ struct itimerval { */ #define TIMER_ABSTIME 0x01 +extern void save_avenrun(void); +extern void restore_avenrun(void); + #endif diff --git a/init/do_mounts.c b/init/do_mounts.c index dc1ec08..3e67387 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -139,11 +139,16 @@ dev_t name_to_dev_t(char *name) char s[32]; char *p; dev_t res = 0; - int part; + int part, mount_result; #ifdef CONFIG_SYSFS int mkdir_err = sys_mkdir("/sys", 0700); - if (sys_mount("sysfs", "/sys", "sysfs", 0, NULL) < 0) + /* + * When changing resume2 parameter for Software Suspend, sysfs may + * already be mounted. + */ + mount_result = sys_mount("sysfs", "/sys", "sysfs", 0, NULL); + if (mount_result < 0 && mount_result != -EBUSY) goto out; #endif @@ -195,7 +200,8 @@ dev_t name_to_dev_t(char *name) res = try_name(s, part); done: #ifdef CONFIG_SYSFS - sys_umount("/sys", 0); + if (mount_result >= 0) + sys_umount("/sys", 0); out: if (!mkdir_err) sys_rmdir("/sys"); @@ -434,12 +440,27 @@ void __init prepare_namespace(void) is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR; + /* Suspend2: + * By this point, suspend_early_init has been called to initialise our + * sysfs interface. If modules are built in, they have registered (all + * of the above via initcalls). + * + * We have not yet looked to see if an image exists, however. If we + * have an initrd, it is expected that the user will have set it up + * to echo > /sys/power/suspend2/do_resume and thus initiate any + * resume. If they don't do that, we do it immediately after the initrd + * is finished (major issues if they mount filesystems rw from the + * initrd! - they are warned. If there's no usable initrd, we do our + * check next. 
+ */ if (initrd_load()) goto out; if (is_floppy && rd_doload && rd_load_disk(0)) ROOT_DEV = Root_RAM0; + check_resume_attempted(); + mount_root(); out: sys_mount(".", "/", NULL, MS_MOVE, NULL); diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c index 2cfd7cb..9c24ec4 100644 --- a/init/do_mounts_initrd.c +++ b/init/do_mounts_initrd.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include "do_mounts.h" @@ -58,10 +59,18 @@ static void __init handle_initrd(void) current->flags |= PF_NOFREEZE; pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD); if (pid > 0) { - while (pid != sys_wait4(-1, NULL, 0, NULL)) + while (pid != sys_wait4(-1, NULL, 0, NULL)) { yield(); + try_to_freeze(); + } } + if (!resume_attempted) + printk(KERN_ERR "Suspend2: No attempt was made to resume from " + "any image that might exist.\n"); + clear_suspend_state(SUSPEND_BOOT_TIME); + current->flags &= ~PF_NOFREEZE; + /* move initrd to rootfs' /old */ sys_fchdir(old_fd); sys_mount("/", ".", NULL, MS_MOVE, NULL); diff --git a/init/main.c b/init/main.c index a92989e..cd3f60f 100644 --- a/init/main.c +++ b/init/main.c @@ -54,6 +54,7 @@ #include #include #include +#include #include #include @@ -804,7 +805,9 @@ static int __init init(void * unused) /* * check if there is an early userspace init. If yes, let it do all - * the work + * the work. For suspend2, we assume that it will do the right thing + * with regard to trying to resume at the right place. When that + * happens, the BOOT_TIME flag will be cleared. */ if (!ramdisk_execute_command) diff --git a/kernel/kmod.c b/kernel/kmod.c index 7962761..f0d6fc1 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -34,6 +34,7 @@ #include #include #include +#include #include extern int max_threads; @@ -338,6 +339,11 @@ int call_usermodehelper_pipe(char *path, char **argv, char **envp, } sub_info.stdin = f; + if (freezer_is_on()) { + printk(KERN_WARNING "Freezer is on. Refusing to start %s.\n", path); + return -EBUSY; + } + queue_work(khelper_wq, &sub_info.work); wait_for_completion(&done); return sub_info.retval; diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig index 51a4dd0..4caac70 100644 --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -48,6 +48,18 @@ config DISABLE_CONSOLE_SUSPEND suspend/resume routines, but may itself lead to problems, for example if netconsole is used. +config PRINTK_NOSAVE + depends on PM && PM_DEBUG + bool "Preserve printk data from boot kernel when resuming." + default n + ---help--- + This option gives printk data and the associated variables the + attribute __nosave, which means that they will not be saved as + part of the image. The net effect is that after resuming, your + dmesg will show the messages from prior to the atomic restore, + instead of the messages from the resumed kernel. This may be + useful for debugging hibernation. + config PM_TRACE bool "Suspend/resume event tracing" depends on PM && PM_DEBUG && X86_32 && EXPERIMENTAL @@ -162,3 +174,174 @@ config APM_EMULATION random kernel OOPSes or reboots that don't seem to be related to anything, try disabling/enabling this option (or disabling/enabling APM in your BIOS). + +menuconfig SUSPEND2_CORE + tristate "Suspend2" + depends on PM + select DYN_PAGEFLAGS + select HOTPLUG_CPU if SMP + default y + ---help--- + Suspend2 is the 'new and improved' suspend support. + + See the Suspend2 home page (suspend2.net) + for FAQs, HOWTOs and other documentation. 
+ + comment "Image Storage (you need at least one allocator)" + depends on SUSPEND2_CORE + + config SUSPEND2_FILE + tristate "File Allocator" + depends on SUSPEND2_CORE + default y + ---help--- + This option enables support for storing an image in a + simple file. This should be possible, but we're still + testing it. + + config SUSPEND2_SWAP + tristate "Swap Allocator" + depends on SUSPEND2_CORE + default y + select SWAP + ---help--- + This option enables support for storing an image in your + swap space. + + comment "General Options" + depends on SUSPEND2_CORE + + config SUSPEND2_CRYPTO + tristate "Compression support" + depends on SUSPEND2_CORE && CRYPTO + default y + ---help--- + This option adds support for using cryptoapi compression + algorithms. Compression is particularly useful as + the LZF support that comes with the Suspend2 patch can double + your suspend and resume speed. + + You probably want this, so say Y here. + + comment "No compression support available without Cryptoapi support." + depends on SUSPEND2_CORE && !CRYPTO + + config SUSPEND2_USERUI + tristate "Userspace User Interface support" + depends on SUSPEND2_CORE && NET + default y + ---help--- + This option enabled support for a userspace based user interface + to Suspend2, which allows you to have a nice display while suspending + and resuming, and also enables features such as pressing escape to + cancel a cycle or interactive debugging. + + config SUSPEND2_DEFAULT_RESUME2 + string "Default resume device name" + depends on SUSPEND2_CORE + ---help--- + You normally need to add a resume2= parameter to your lilo.conf or + equivalent. With this option properly set, the kernel has a value + to default. No damage will be done if the value is invalid. + + config SUSPEND2_KEEP_IMAGE + bool "Allow Keep Image Mode" + depends on SUSPEND2_CORE + ---help--- + This option allows you to keep and image and reuse it. It is intended + __ONLY__ for use with systems where all filesystems are mounted read- + only (kiosks, for example). To use it, compile this option in and boot + normally. Set the KEEP_IMAGE flag in /sys/power/suspend2 and suspend. + When you resume, the image will not be removed. You will be unable to turn + off swap partitions (assuming you are using the swap allocator), but future + suspends simply do a power-down. The image can be updated using the + kernel command line parameter suspend_act= to turn off the keep image + bit. Keep image mode is a little less user friendly on purpose - it + should not be used without thought! + + config SUSPEND2_REPLACE_SWSUSP + bool "Replace swsusp by default" + default y + depends on SUSPEND2_CORE + ---help--- + Suspend2 can replace swsusp. This option makes that the default state, + requiring you to echo 0 > /sys/power/suspend2/replace_swsusp if you want + to use the vanilla kernel functionality. Note that your initrd/ramfs will + need to do this before trying to resume, too. + With overriding swsusp enabled, Suspend2 will use both the resume= and + noresume commandline options _and_ the resume2= and noresume2 ones (for + compatibility). resume= takes precedence over resume2=. Echoing disk + to /sys/power/state will start a Suspend2 cycle. If resume= doesn't + specify an allocator and both the swap and file allocators are compiled in, + the swap allocator will be used by default. 
+ + config SUSPEND2_CLUSTER + tristate "Cluster support" + default n + depends on SUSPEND2_CORE && NET && BROKEN + ---help--- + Support for linking multiple machines in a cluster so that they suspend + and resume together. + + config SUSPEND2_DEFAULT_CLUSTER_MASTER + string "Default cluster master address/port" + depends on SUSPEND2_CLUSTER + ---help--- + If this machine will be the master, simply enter a port on which to + listen for slaves. + If this machine will be a slave, enter the ip address and port on + which the master listens with a colon separating them. + If no value is set here, cluster support will be disabled by default. + + config SUSPEND2_CHECKSUM + bool "Checksum pageset2" + depends on SUSPEND2_CORE + select CRYPTO + select CRYPTO_ALGAPI + select CRYPTO_MD5 + ---help--- + Adds support for checksumming pageset2 pages, to ensure you really get an + atomic copy. Should not normally be needed, but here for verification and + diagnostic purposes. + +config SUSPEND_SHARED + bool + depends on SUSPEND2_CORE || SOFTWARE_SUSPEND + default y + +config SUSPEND2_USERUI_EXPORTS + bool + depends on SUSPEND2_USERUI=m + default y + +config SUSPEND2_SWAP_EXPORTS + bool + depends on SUSPEND2_SWAP=m + default y + +config SUSPEND2_FILE_EXPORTS + bool + depends on SUSPEND2_FILE=m + default y + +config SUSPEND2_CRYPTO_EXPORTS + bool + depends on SUSPEND2_CRYPTO=m + default y + +config SUSPEND2_CORE_EXPORTS + bool + depends on SUSPEND2_CORE=m + default y + +config SUSPEND2_EXPORTS + bool + depends on SUSPEND2_SWAP_EXPORTS || SUSPEND2_FILE_EXPORTS || \ + SUSPEND2_CRYPTO_EXPORTS || SUSPEND2_CLUSTER=m || \ + SUSPEND2_USERUI_EXPORTS + default y + +config SUSPEND2 + bool + depends on SUSPEND2_CORE!=n + default y diff --git a/kernel/power/Makefile b/kernel/power/Makefile index 38725f5..a4a8c5b 100644 --- a/kernel/power/Makefile +++ b/kernel/power/Makefile @@ -5,6 +5,32 @@ endif obj-y := main.o process.o console.o obj-$(CONFIG_PM_LEGACY) += pm.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o disk.o snapshot.o swap.o user.o +obj-$(CONFIG_SUSPEND_SHARED) += snapshot.o + +suspend_core-objs := modules.o sysfs.o suspend.o \ + io.o pagedir.o prepare_image.o \ + extent.o pageflags.o ui.o \ + power_off.o atomic_copy.o + +obj-$(CONFIG_SUSPEND2) += suspend2_builtin.o + +ifdef CONFIG_SUSPEND2_CHECKSUM +suspend_core-objs += checksum.o +endif + +ifdef CONFIG_NET +suspend_core-objs += storage.o netlink.o +endif + +obj-$(CONFIG_SUSPEND2_CORE) += suspend_core.o +obj-$(CONFIG_SUSPEND2_CRYPTO) += suspend_compress.o + +obj-$(CONFIG_SUSPEND2_SWAP) += suspend_block_io.o suspend_swap.o +obj-$(CONFIG_SUSPEND2_FILE) += suspend_block_io.o suspend_file.o +obj-$(CONFIG_SUSPEND2_CLUSTER) += cluster.o + +obj-$(CONFIG_SUSPEND2_USERUI) += suspend_userui.o + +obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o disk.o swap.o user.o obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o diff --git a/kernel/power/atomic_copy.c b/kernel/power/atomic_copy.c new file mode 100644 index 0000000..d2b8cd7 --- /dev/null +++ b/kernel/power/atomic_copy.c @@ -0,0 +1,416 @@ +/* + * kernel/power/atomic_copy.c + * + * Copyright 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * Copyright (C) 2006 Red Hat, inc. + * + * Distributed under GPLv2. + * + * Routines for doing the atomic save/restore. 
+ */ + +#include +#include +#include +#include +#include +#include "suspend.h" +#include "storage.h" +#include "power_off.h" +#include "ui.h" +#include "power.h" +#include "io.h" +#include "prepare_image.h" +#include "pageflags.h" +#include "checksum.h" +#include "suspend2_builtin.h" + +int extra_pd1_pages_used; + +/* + * Highmem related functions (x86 only). + */ + +#ifdef CONFIG_HIGHMEM + +/** + * copyback_high: Restore highmem pages. + * + * Highmem data and pbe lists are/can be stored in highmem. + * The format is slightly different to the lowmem pbe lists + * used for the assembly code: the last pbe in each page is + * a struct page * instead of struct pbe *, pointing to the + * next page where pbes are stored (or NULL if happens to be + * the end of the list). Since we don't want to generate + * unnecessary deltas against swsusp code, we use a cast + * instead of a union. + **/ + +static void copyback_high(void) +{ + struct page * pbe_page = (struct page *) restore_highmem_pblist; + struct pbe *this_pbe, *first_pbe; + unsigned long *origpage, *copypage; + int pbe_index = 1; + + if (!pbe_page) + return; + + this_pbe = (struct pbe *) kmap_atomic(pbe_page, KM_BOUNCE_READ); + first_pbe = this_pbe; + + while (this_pbe) { + int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1; + + origpage = kmap_atomic((struct page *) this_pbe->orig_address, + KM_BIO_DST_IRQ); + copypage = kmap_atomic((struct page *) this_pbe->address, + KM_BIO_SRC_IRQ); + + while (loop >= 0) { + *(origpage + loop) = *(copypage + loop); + loop--; + } + + kunmap_atomic(origpage, KM_BIO_DST_IRQ); + kunmap_atomic(copypage, KM_BIO_SRC_IRQ); + + if (!this_pbe->next) + break; + + if (pbe_index < PBES_PER_PAGE) { + this_pbe++; + pbe_index++; + } else { + pbe_page = (struct page *) this_pbe->next; + kunmap_atomic(first_pbe, KM_BOUNCE_READ); + if (!pbe_page) + return; + this_pbe = (struct pbe *) kmap_atomic(pbe_page, + KM_BOUNCE_READ); + first_pbe = this_pbe; + pbe_index = 1; + } + } + kunmap_atomic(first_pbe, KM_BOUNCE_READ); +} + +#else /* CONFIG_HIGHMEM */ +void copyback_high(void) { } +#endif + +/** + * free_pbe_list: Free page backup entries used by the atomic copy code. + * + * Normally, this function isn't used. If, however, we need to abort before + * doing the atomic copy, we use this to free the pbes previously allocated. + **/ +static void free_pbe_list(struct pbe **list, int highmem) +{ + struct pbe *free_pbe = *list; + struct page *page = (struct page *) free_pbe; + + do { + int i; + + if (highmem) + free_pbe = (struct pbe *) kmap(page); + + for (i = 0; i < PBES_PER_PAGE; i++) { + if (!free_pbe) + break; + __free_page(free_pbe->address); + free_pbe = free_pbe->next; + } + + if (highmem) { + struct page *next_page = NULL; + if (free_pbe) + next_page = (struct page *) free_pbe->next; + kunmap(page); + __free_page(page); + page = next_page; + } + + } while(page && free_pbe); + + *list = NULL; +} + +/** + * copyback_post: Post atomic-restore actions. + * + * After doing the atomic restore, we have a few more things to do: + * 1) We want to retain some values across the restore, so we now copy + * these from the nosave variables to the normal ones. + * 2) Set the status flags. + * 3) Resume devices. + * 4) Get userui to redraw. + * 5) Reread the page cache. 
+ **/ + +void copyback_post(void) +{ + int loop; + + suspend_action = suspend2_nosave_state1; + suspend_debug_state = suspend2_nosave_state2; + console_loglevel = suspend2_nosave_state3; + + for (loop = 0; loop < 4; loop++) + suspend_io_time[loop/2][loop%2] = + suspend2_nosave_io_speed[loop/2][loop%2]; + + set_suspend_state(SUSPEND_NOW_RESUMING); + set_suspend_state(SUSPEND_PAGESET2_NOT_LOADED); + + if (suspend_activate_storage(1)) + panic("Failed to reactivate our storage."); + + suspend_ui_redraw(); + + suspend_cond_pause(1, "About to reload secondary pagedir."); + + if (read_pageset2(0)) + panic("Unable to successfully reread the page cache."); + + clear_suspend_state(SUSPEND_PAGESET2_NOT_LOADED); +} + +/** + * suspend_copy_pageset1: Do the atomic copy of pageset1. + * + * Make the atomic copy of pageset1. We can't use copy_page (as we once did) + * because we can't be sure what side effects it has. On my old Duron, with + * 3DNOW, kernel_fpu_begin increments preempt count, making our preempt + * count at resume time 4 instead of 3. + * + * We don't want to call kmap_atomic unconditionally because it has the side + * effect of incrementing the preempt count, which will leave it one too high + * post resume (the page containing the preempt count will be copied after + * its incremented. This is essentially the same problem. + **/ + +void suspend_copy_pageset1(void) +{ + int i; + unsigned long source_index, dest_index; + + source_index = get_next_bit_on(pageset1_map, max_pfn + 1); + dest_index = get_next_bit_on(pageset1_copy_map, max_pfn + 1); + + for (i = 0; i < pagedir1.size; i++) { + unsigned long *origvirt, *copyvirt; + struct page *origpage, *copypage; + int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1; + + origpage = pfn_to_page(source_index); + copypage = pfn_to_page(dest_index); + + origvirt = PageHighMem(origpage) ? + kmap_atomic(origpage, KM_USER0) : + page_address(origpage); + + copyvirt = PageHighMem(copypage) ? + kmap_atomic(copypage, KM_USER1) : + page_address(copypage); + + while (loop >= 0) { + *(copyvirt + loop) = *(origvirt + loop); + loop--; + } + + if (PageHighMem(origpage)) + kunmap_atomic(origvirt, KM_USER0); + else if (suspend2_faulted) { + printk("%p (%lu) being unmapped after faulting during atomic copy.\n", origpage, source_index); + kernel_map_pages(origpage, 1, 0); + clear_suspend2_fault(); + } + + if (PageHighMem(copypage)) + kunmap_atomic(copyvirt, KM_USER1); + + source_index = get_next_bit_on(pageset1_map, source_index); + dest_index = get_next_bit_on(pageset1_copy_map, dest_index); + } +} + +/** + * __suspend_post_context_save: Steps after saving the cpu context. + * + * Steps taken after saving the CPU state to make the actual + * atomic copy. + * + * Called from swsusp_save in snapshot.c via suspend_post_context_save. + **/ + +int __suspend_post_context_save(void) +{ + int old_ps1_size = pagedir1.size; + + calculate_check_checksums(1); + + free_checksum_pages(); + + suspend_recalculate_image_contents(1); + + extra_pd1_pages_used = pagedir1.size - old_ps1_size; + + if (extra_pd1_pages_used > extra_pd1_pages_allowance) { + printk("Pageset1 has grown by %d pages. 
" + "extra_pages_allowance is currently only %d.\n", + pagedir1.size - old_ps1_size, + extra_pd1_pages_allowance); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_EXTRA_PAGES_ALLOW_TOO_SMALL); + return -1; + } + + if (!test_action_state(SUSPEND_TEST_FILTER_SPEED) && + !test_action_state(SUSPEND_TEST_BIO)) + suspend_copy_pageset1(); + + return 0; +} + +/** + * suspend2_suspend: High level code for doing the atomic copy. + * + * High-level code which prepares to do the atomic copy. Loosely based + * on the swsusp version, but with the following twists: + * - We set suspend2_running so the swsusp code uses our code paths. + * - We give better feedback regarding what goes wrong if there is a problem. + * - We use an extra function to call the assembly, just in case this code + * is in a module (return address). + **/ + +int suspend2_suspend(void) +{ + int error; + + suspend2_running = 1; /* For the swsusp code we use :< */ + + if (test_action_state(SUSPEND_PM_PREPARE_CONSOLE)) + pm_prepare_console(); + + if ((error = arch_prepare_suspend())) + goto err_out; + + local_irq_disable(); + + /* At this point, device_suspend() has been called, but *not* + * device_power_down(). We *must* device_power_down() now. + * Otherwise, drivers for some devices (e.g. interrupt controllers) + * become desynchronized with the actual state of the hardware + * at resume time, and evil weirdness ensues. + */ + + if ((error = device_power_down(PMSG_FREEZE))) { + set_result_state(SUSPEND_DEVICE_REFUSED); + set_result_state(SUSPEND_ABORTED); + printk(KERN_ERR "Some devices failed to power down, aborting suspend\n"); + goto enable_irqs; + } + + error = suspend2_lowlevel_builtin(); + + if (!suspend2_in_suspend) + copyback_high(); + + device_power_up(); +enable_irqs: + local_irq_enable(); + if (test_action_state(SUSPEND_PM_PREPARE_CONSOLE)) + pm_restore_console(); +err_out: + suspend2_running = 0; + return error; +} + +/** + * suspend_atomic_restore: Prepare to do the atomic restore. + * + * Get ready to do the atomic restore. This part gets us into the same + * state we are in prior to do calling do_suspend2_lowlevel while + * suspending: hot-unplugging secondary cpus and freeze processes, + * before starting the thread that will do the restore. + **/ + +int suspend_atomic_restore(void) +{ + int error, loop; + + suspend2_running = 1; + + suspend_prepare_status(DONT_CLEAR_BAR, "Prepare console"); + + if (test_action_state(SUSPEND_PM_PREPARE_CONSOLE)) + pm_prepare_console(); + + suspend_prepare_status(DONT_CLEAR_BAR, "Device suspend."); + + suspend_console(); + if ((error = device_suspend(PMSG_PRETHAW))) { + printk("Some devices failed to suspend\n"); + goto device_resume; + } + + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) { + suspend_prepare_status(DONT_CLEAR_BAR, "Disable nonboot cpus."); + if (disable_nonboot_cpus()) { + set_result_state(SUSPEND_CPU_HOTPLUG_FAILED); + set_result_state(SUSPEND_ABORTED); + goto device_resume; + } + } + + suspend_prepare_status(DONT_CLEAR_BAR, "Atomic restore preparation"); + + suspend2_nosave_state1 = suspend_action; + suspend2_nosave_state2 = suspend_debug_state; + suspend2_nosave_state3 = console_loglevel; + + for (loop = 0; loop < 4; loop++) + suspend2_nosave_io_speed[loop/2][loop%2] = + suspend_io_time[loop/2][loop%2]; + memcpy(suspend2_nosave_commandline, saved_command_line, COMMAND_LINE_SIZE); + + mb(); + + local_irq_disable(); + + if (device_power_down(PMSG_FREEZE)) { + printk(KERN_ERR "Some devices failed to power down. 
Very bad.\n"); + goto device_power_up; + } + + /* We'll ignore saved state, but this gets preempt count (etc) right */ + save_processor_state(); + + error = swsusp_arch_resume(); + /* + * Code below is only ever reached in case of failure. Otherwise + * execution continues at place where swsusp_arch_suspend was called. + * + * We don't know whether it's safe to continue (this shouldn't happen), + * so lets err on the side of caution. + */ + BUG(); + +device_power_up: + device_power_up(); + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) + enable_nonboot_cpus(); +device_resume: + device_resume(); + resume_console(); + free_pbe_list(&restore_pblist, 0); +#ifdef CONFIG_HIGHMEM + free_pbe_list(&restore_highmem_pblist, 1); +#endif + if (test_action_state(SUSPEND_PM_PREPARE_CONSOLE)) + pm_restore_console(); + suspend2_running = 0; + return 1; +} diff --git a/kernel/power/block_io.h b/kernel/power/block_io.h new file mode 100644 index 0000000..49eaa51 --- /dev/null +++ b/kernel/power/block_io.h @@ -0,0 +1,55 @@ +/* + * kernel/power/block_io.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * Copyright (C) 2006 Red Hat, inc. + * + * Distributed under GPLv2. + * + * This file contains declarations for functions exported from + * block_io.c, which contains low level io functions. + */ + +#include +#include "extent.h" + +struct suspend_bdev_info { + struct block_device *bdev; + dev_t dev_t; + int bmap_shift; + int blocks_per_page; +}; + +/* + * Our exported interface so the swapwriter and filewriter don't + * need these functions duplicated. + */ +struct suspend_bio_ops { + int (*bdev_page_io) (int rw, struct block_device *bdev, long pos, + struct page *page); + void (*check_io_stats) (void); + void (*reset_io_stats) (void); + void (*finish_all_io) (void); + int (*forward_one_page) (void); + void (*set_extra_page_forward) (void); + void (*set_devinfo) (struct suspend_bdev_info *info); + int (*read_chunk) (unsigned long *index, struct page *buffer_page, + unsigned int *buf_size, int sync); + int (*write_chunk) (unsigned long index, struct page *buffer_page, + unsigned int buf_size); + void (*read_header_init) (void); + int (*rw_header_chunk) (int rw, struct suspend_module_ops *owner, + char *buffer, int buffer_size); + int (*write_header_chunk_finish) (void); + int (*rw_init) (int rw, int stream_number); + int (*rw_cleanup) (int rw); +}; + +extern struct suspend_bio_ops suspend_bio_ops; + +extern char *suspend_writer_buffer; +extern int suspend_writer_buffer_posn; +extern int suspend_read_fd; +extern struct extent_iterate_saved_state suspend_writer_posn_save[3]; +extern struct extent_iterate_state suspend_writer_posn; +extern int suspend_header_bytes_used; diff --git a/kernel/power/checksum.c b/kernel/power/checksum.c new file mode 100644 index 0000000..356af21 --- /dev/null +++ b/kernel/power/checksum.c @@ -0,0 +1,371 @@ +/* + * kernel/power/checksum.c + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * Copyright (C) 2006 Red Hat, inc. + * + * This file is released under the GPLv2. + * + * This file contains data checksum routines for suspend2, + * using cryptoapi. They are used to locate any modifications + * made to pageset 2 while we're saving it. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include "suspend.h" +#include "modules.h" +#include "sysfs.h" +#include "io.h" +#include "pageflags.h" +#include "checksum.h" +#include "pagedir.h" + +static struct suspend_module_ops suspend_checksum_ops; + +/* Constant at the mo, but I might allow tuning later */ +static char suspend_checksum_name[32] = "md5"; +/* Bytes per checksum */ +#define CHECKSUM_SIZE (128 / 8) + +#define CHECKSUMS_PER_PAGE ((PAGE_SIZE - sizeof(void *)) / CHECKSUM_SIZE) + +static struct crypto_hash *suspend_checksum_transform; +static struct hash_desc desc; +static int pages_allocated; +static unsigned long page_list; + +static int suspend_num_resaved = 0; + +#if 1 +#define PRINTK(a, b...) do { } while(0) +#else +#define PRINTK(a, b...) do { printk(a, ##b); } while(0) +#endif + +/* ---- Local buffer management ---- */ + +/* + * suspend_checksum_cleanup + * + * Frees memory allocated for our labours. + */ +static void suspend_checksum_cleanup(int ending_cycle) +{ + if (ending_cycle && suspend_checksum_transform) { + crypto_free_hash(suspend_checksum_transform); + suspend_checksum_transform = NULL; + desc.tfm = NULL; + } +} + +/* + * suspend_crypto_prepare + * + * Prepare to do some work by allocating buffers and transforms. + * Returns: Int: Zero. Even if we can't set up checksum, we still + * seek to suspend. + */ +static int suspend_checksum_prepare(int starting_cycle) +{ + if (!starting_cycle || !suspend_checksum_ops.enabled) + return 0; + + if (!*suspend_checksum_name) { + printk("Suspend2: No checksum algorithm name set.\n"); + return 1; + } + + suspend_checksum_transform = crypto_alloc_hash(suspend_checksum_name, 0, 0); + if (IS_ERR(suspend_checksum_transform)) { + printk("Suspend2: Failed to initialise the %s checksum algorithm: %ld.\n", + suspend_checksum_name, + (long) suspend_checksum_transform); + suspend_checksum_transform = NULL; + return 1; + } + + desc.tfm = suspend_checksum_transform; + desc.flags = 0; + + return 0; +} + +static int suspend_print_task_if_using_page(struct task_struct *t, struct page *seeking) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + int result = 0; + + mm = t->active_mm; + + if (!mm || !mm->mmap) return 0; + + /* Don't try to take the sem when processes are frozen, + * drivers are suspended and irqs are disabled. We're + * not racing with anything anyway. */ + if (!irqs_disabled()) + down_read(&mm->mmap_sem); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (vma->vm_flags & VM_PFNMAP) + continue; + if (vma->vm_start) { + unsigned long posn; + for (posn = vma->vm_start; posn < vma->vm_end; + posn += PAGE_SIZE) { + struct page *page = + follow_page(vma, posn, 0); + if (page == seeking) { + printk("%s(%d)", t->comm, t->pid); + result = 1; + goto out; + } + } + } + } + +out: + if (!irqs_disabled()) + up_read(&mm->mmap_sem); + + return result; +} + +static void print_tasks_using_page(struct page *seeking) +{ + struct task_struct *p; + + read_lock(&tasklist_lock); + for_each_process(p) { + if (suspend_print_task_if_using_page(p, seeking)) + printk(" "); + } + read_unlock(&tasklist_lock); +} + +/* + * suspend_checksum_print_debug_stats + * @buffer: Pointer to a buffer into which the debug info will be printed. + * @size: Size of the buffer. + * + * Print information to be recorded for debugging purposes into a buffer. + * Returns: Number of characters written to the buffer. 
+ */ + +static int suspend_checksum_print_debug_stats(char *buffer, int size) +{ + int len; + + if (!suspend_checksum_ops.enabled) + return snprintf_used(buffer, size, + "- Checksumming disabled.\n"); + + len = snprintf_used(buffer, size, "- Checksum method is '%s'.\n", + suspend_checksum_name); + len+= snprintf_used(buffer + len, size - len, + " %d pages resaved in atomic copy.\n", suspend_num_resaved); + return len; +} + +static int suspend_checksum_storage_needed(void) +{ + if (suspend_checksum_ops.enabled) + return strlen(suspend_checksum_name) + sizeof(int) + 1; + else + return 0; +} + +/* + * suspend_checksum_save_config_info + * @buffer: Pointer to a buffer of size PAGE_SIZE. + * + * Save informaton needed when reloading the image at resume time. + * Returns: Number of bytes used for saving our data. + */ +static int suspend_checksum_save_config_info(char *buffer) +{ + int namelen = strlen(suspend_checksum_name) + 1; + int total_len; + + *((unsigned int *) buffer) = namelen; + strncpy(buffer + sizeof(unsigned int), suspend_checksum_name, + namelen); + total_len = sizeof(unsigned int) + namelen; + return total_len; +} + +/* suspend_checksum_load_config_info + * @buffer: Pointer to the start of the data. + * @size: Number of bytes that were saved. + * + * Description: Reload information needed for dechecksuming the image at + * resume time. + */ +static void suspend_checksum_load_config_info(char *buffer, int size) +{ + int namelen; + + namelen = *((unsigned int *) (buffer)); + strncpy(suspend_checksum_name, buffer + sizeof(unsigned int), + namelen); + return; +} + +/* + * Free Checksum Memory + */ + +void free_checksum_pages(void) +{ + PRINTK("Freeing %d checksum pages.\n", pages_allocated); + while (pages_allocated) { + unsigned long next = *((unsigned long *) page_list); + PRINTK("Page %3d is at %lx and points to %lx.\n", pages_allocated, page_list, next); + ClearPageNosave(virt_to_page(page_list)); + free_page((unsigned long) page_list); + page_list = next; + pages_allocated--; + } +} + +/* + * Allocate Checksum Memory + */ + +int allocate_checksum_pages(void) +{ + int pages_needed = DIV_ROUND_UP(pagedir2.size, CHECKSUMS_PER_PAGE); + + if (!suspend_checksum_ops.enabled) + return 0; + + PRINTK("Need %d checksum pages for %ld pageset2 pages.\n", pages_needed, pagedir2.size); + while (pages_allocated < pages_needed) { + unsigned long *new_page = + (unsigned long *) get_zeroed_page(GFP_ATOMIC); + if (!new_page) + return -ENOMEM; + SetPageNosave(virt_to_page(new_page)); + (*new_page) = page_list; + page_list = (unsigned long) new_page; + pages_allocated++; + PRINTK("Page %3d is at %lx and points to %lx.\n", pages_allocated, page_list, *((unsigned long *) page_list)); + } + + return 0; +} + +#if 0 +static void print_checksum(char *buf, int size) +{ + int index; + + for (index = 0; index < size; index++) + printk("%x ", buf[index]); + + printk("\n"); +} +#endif + +/* + * Calculate checksums + */ + +void calculate_check_checksums(int check) +{ + int pfn, index = 0; + unsigned long next_page, this_checksum = 0; + struct scatterlist sg[2]; + char current_checksum[CHECKSUM_SIZE]; + + if (!suspend_checksum_ops.enabled) + return; + + next_page = (unsigned long) page_list; + + if (check) + suspend_num_resaved = 0; + + BITMAP_FOR_EACH_SET(pageset2_map, pfn) { + int ret; + if (index % CHECKSUMS_PER_PAGE) { + this_checksum += CHECKSUM_SIZE; + } else { + this_checksum = next_page + sizeof(void *); + next_page = *((unsigned long *) next_page); + } + PRINTK("Put checksum for page %3d %p in %lx.\n", 
index, page_address(pfn_to_page(pfn)), this_checksum); + sg_set_buf(&sg[0], page_address(pfn_to_page(pfn)), PAGE_SIZE); + if (check) { + ret = crypto_hash_digest(&desc, sg, + PAGE_SIZE, current_checksum); + if (memcmp(current_checksum, (char *) this_checksum, CHECKSUM_SIZE)) { + SetPageResave(pfn_to_page(pfn)); + printk("Page %d changed. Saving in atomic copy." + "Processes using it:", pfn); + print_tasks_using_page(pfn_to_page(pfn)); + printk("\n"); + suspend_num_resaved++; + if (test_action_state(SUSPEND_ABORT_ON_RESAVE_NEEDED)) + set_result_state(SUSPEND_ABORTED); + } + } else + ret = crypto_hash_digest(&desc, sg, + PAGE_SIZE, (char *) this_checksum); + if (ret) { + printk("Digest failed. Returned %d.\n", ret); + return; + } + index++; + } +} + +static struct suspend_sysfs_data sysfs_params[] = { + { SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&suspend_checksum_ops.enabled, 0, 1, 0) + }, + + { SUSPEND2_ATTR("abort_if_resave_needed", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_ABORT_ON_RESAVE_NEEDED, 0) + } +}; + +/* + * Ops structure. + */ +static struct suspend_module_ops suspend_checksum_ops = { + .type = MISC_MODULE, + .name = "Checksumming", + .directory = "checksum", + .module = THIS_MODULE, + .initialise = suspend_checksum_prepare, + .cleanup = suspend_checksum_cleanup, + .print_debug_info = suspend_checksum_print_debug_stats, + .save_config_info = suspend_checksum_save_config_info, + .load_config_info = suspend_checksum_load_config_info, + .storage_needed = suspend_checksum_storage_needed, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* ---- Registration ---- */ +int s2_checksum_init(void) +{ + int result = suspend_register_module(&suspend_checksum_ops); + + /* Disabled by default */ + suspend_checksum_ops.enabled = 0; + return result; +} + +void s2_checksum_exit(void) +{ + suspend_unregister_module(&suspend_checksum_ops); +} diff --git a/kernel/power/checksum.h b/kernel/power/checksum.h new file mode 100644 index 0000000..9984eec --- /dev/null +++ b/kernel/power/checksum.h @@ -0,0 +1,27 @@ +/* + * kernel/power/checksum.h + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * Copyright (C) 2006 Red Hat, inc. + * + * This file is released under the GPLv2. + * + * This file contains data checksum routines for suspend2, + * using cryptoapi. They are used to locate any modifications + * made to pageset 2 while we're saving it. + */ + +#if defined(CONFIG_SUSPEND2_CHECKSUM) +extern int s2_checksum_init(void); +extern void s2_checksum_exit(void); +void calculate_check_checksums(int check); +int allocate_checksum_pages(void); +void free_checksum_pages(void); +#else +static inline int s2_checksum_init(void) { return 0; } +static inline void s2_checksum_exit(void) { } +static inline void calculate_check_checksums(int check) { }; +static inline int allocate_checksum_pages(void) { return 0; }; +static inline void free_checksum_pages(void) { }; +#endif + diff --git a/kernel/power/cluster.c b/kernel/power/cluster.c new file mode 100644 index 0000000..b5ab9ad --- /dev/null +++ b/kernel/power/cluster.c @@ -0,0 +1,152 @@ +/* + * kernel/power/cluster.c + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * This file contains routines for cluster hibernation support. 
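+ *
+ * Apart from the usual "enabled" flag, the only tunable is the name of
+ * the cluster master node. It defaults to
+ * CONFIG_SUSPEND2_DEFAULT_CLUSTER_MASTER and, if that default is empty,
+ * the module starts out disabled. The name can be changed at run time
+ * through the module's "master" sysfs entry, e.g. (assuming the usual
+ * location of the suspend2 sysfs tree):
+ *
+ *   echo master-node > /sys/power/suspend2/cluster/master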
+ * + */ + +#include +#include + +#include "suspend.h" +#include "modules.h" +#include "sysfs.h" +#include "io.h" + +static char suspend_cluster_master[63] = CONFIG_SUSPEND2_DEFAULT_CLUSTER_MASTER; + +static struct suspend_module_ops suspend_cluster_ops; + +/* suspend_cluster_print_debug_stats + * + * Description: Print information to be recorded for debugging purposes into a + * buffer. + * Arguments: buffer: Pointer to a buffer into which the debug info will be + * printed. + * size: Size of the buffer. + * Returns: Number of characters written to the buffer. + */ +static int suspend_cluster_print_debug_stats(char *buffer, int size) +{ + int len; + + if (strlen(suspend_cluster_master)) + len = snprintf_used(buffer, size, "- Cluster master is '%s'.\n", + suspend_cluster_master); + else + len = snprintf_used(buffer, size, "- Cluster support is disabled.\n"); + return len; +} + +/* cluster_memory_needed + * + * Description: Tell the caller how much memory we need to operate during + * suspend/resume. + * Returns: Unsigned long. Maximum number of bytes of memory required for + * operation. + */ +static int suspend_cluster_memory_needed(void) +{ + return 0; +} + +static int suspend_cluster_storage_needed(void) +{ + return 1 + strlen(suspend_cluster_master); +} + +/* suspend_cluster_save_config_info + * + * Description: Save informaton needed when reloading the image at resume time. + * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE. + * Returns: Number of bytes used for saving our data. + */ +static int suspend_cluster_save_config_info(char *buffer) +{ + strcpy(buffer, suspend_cluster_master); + return strlen(suspend_cluster_master + 1); +} + +/* suspend_cluster_load_config_info + * + * Description: Reload information needed for declustering the image at + * resume time. + * Arguments: Buffer: Pointer to the start of the data. + * Size: Number of bytes that were saved. + */ +static void suspend_cluster_load_config_info(char *buffer, int size) +{ + strncpy(suspend_cluster_master, buffer, size); + return; +} + +/* + * data for our sysfs entries. + */ +static struct suspend_sysfs_data sysfs_params[] = { + { + SUSPEND2_ATTR("master", SYSFS_RW), + SYSFS_STRING(suspend_cluster_master, 63, SYSFS_SM_NOT_NEEDED) + }, + + { + SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&suspend_cluster_ops.enabled, 0, 1) + } +}; + +/* + * Ops structure. + */ + +static struct suspend_module_ops suspend_cluster_ops = { + .type = FILTER_MODULE, + .name = "Cluster", + .directory = "cluster", + .module = THIS_MODULE, + .memory_needed = suspend_cluster_memory_needed, + .print_debug_info = suspend_cluster_print_debug_stats, + .save_config_info = suspend_cluster_save_config_info, + .load_config_info = suspend_cluster_load_config_info, + .storage_needed = suspend_cluster_storage_needed, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* ---- Registration ---- */ + +#ifdef MODULE +#warning Module set. 
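+/*
+ * When built as a module, the init/exit routines below become static
+ * __init/__exit entry points hooked up via module_init/module_exit;
+ * when built in, they are left global so that the core suspend2 code
+ * can call them directly.
+ */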
+#define INIT static __init +#define EXIT static __exit +#else +#define INIT +#define EXIT +#endif + +INIT int s2_cluster_init(void) +{ + int temp = suspend_register_module(&suspend_cluster_ops); + + if (!strlen(suspend_cluster_master)) + suspend_cluster_ops.enabled = 0; + return temp; +} + +EXIT void s2_cluster_exit(void) +{ + suspend_unregister_module(&suspend_cluster_ops); +} + +#ifdef MODULE +MODULE_LICENSE("GPL"); +module_init(s2_cluster_init); +module_exit(s2_cluster_exit); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Cluster Support for Suspend2"); +#endif diff --git a/kernel/power/cluster.h b/kernel/power/cluster.h new file mode 100644 index 0000000..d44bbf7 --- /dev/null +++ b/kernel/power/cluster.h @@ -0,0 +1,17 @@ +/* + * kernel/power/cluster.h + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * Copyright (C) 2006 Red Hat, inc. + * + * This file is released under the GPLv2. + */ + +#ifdef CONFIG_SUSPEND2_CLUSTER +extern int s2_cluster_init(void); +extern void s2_cluster_exit(void); +#else +static inline int s2_cluster_init(void) { return 0; } +static inline void s2_cluster_exit(void) { } +#endif + diff --git a/kernel/power/disk.c b/kernel/power/disk.c index aec19b0..5d014f1 100644 --- a/kernel/power/disk.c +++ b/kernel/power/disk.c @@ -24,6 +24,8 @@ #include "power.h" +#include "suspend.h" +#include "suspend2_builtin.h" static int noresume = 0; char resume_file[256] = CONFIG_PM_STD_PARTITION; @@ -118,6 +120,11 @@ int pm_suspend_disk(void) { int error; +#ifdef CONFIG_SUSPEND2 + if (test_action_state(SUSPEND_REPLACE_SWSUSP)) + return suspend2_try_suspend(1); +#endif + error = prepare_processes(); if (error) return error; @@ -200,10 +207,22 @@ int pm_suspend_disk(void) * */ -static int software_resume(void) +int software_resume(void) { int error; + resume_attempted = 1; + +#ifdef CONFIG_SUSPEND2 + /* + * We can't know (until an image header - if any - is loaded), whether + * we did override swsusp. We therefore ensure that both are tried. + */ + if (test_action_state(SUSPEND_REPLACE_SWSUSP)) + printk("Replacing swsusp.\n"); + suspend2_try_resume(); +#endif + mutex_lock(&pm_mutex); if (!swsusp_resume_device) { if (!strlen(resume_file)) { @@ -274,9 +293,6 @@ static int software_resume(void) return 0; } -late_initcall(software_resume); - - static const char * const pm_disk_modes[] = { [PM_DISK_FIRMWARE] = "firmware", [PM_DISK_PLATFORM] = "platform", @@ -457,6 +473,7 @@ static int __init resume_offset_setup(char *str) static int __init noresume_setup(char *str) { noresume = 1; + set_suspend_state(SUSPEND_NORESUME_SPECIFIED); return 1; } diff --git a/kernel/power/extent.c b/kernel/power/extent.c new file mode 100644 index 0000000..cd0ff0c --- /dev/null +++ b/kernel/power/extent.c @@ -0,0 +1,305 @@ +/* + * kernel/power/extent.c + * + * Copyright (C) 2003-2007 Nigel Cunningham (nigel at suspend2 net) + * + * Distributed under GPLv2. + * + * These functions encapsulate the manipulation of storage metadata. For + * pageflags, we use dynamically allocated bitmaps. + */ + +#include +#include +#include "modules.h" +#include "extent.h" +#include "ui.h" +#include "suspend.h" + +/* suspend_get_extent + * + * Returns a free extent. May fail, returning NULL instead. + */ +static struct extent *suspend_get_extent(void) +{ + struct extent *result; + + if (!(result = kmalloc(sizeof(struct extent), GFP_ATOMIC))) + return NULL; + + result->minimum = result->maximum = 0; + result->next = NULL; + + return result; +} + +/* suspend_put_extent_chain. 
+ * + * Frees a whole chain of extents. + */ +void suspend_put_extent_chain(struct extent_chain *chain) +{ + struct extent *this; + + this = chain->first; + + while(this) { + struct extent *next = this->next; + kfree(this); + chain->num_extents--; + this = next; + } + + chain->first = chain->last_touched = NULL; + chain->size = 0; +} + +/* + * suspend_add_to_extent_chain + * + * Add an extent to an existing chain. + */ +int suspend_add_to_extent_chain(struct extent_chain *chain, + unsigned long minimum, unsigned long maximum) +{ + struct extent *new_extent = NULL, *start_at; + + /* Find the right place in the chain */ + start_at = (chain->last_touched && + (chain->last_touched->minimum < minimum)) ? + chain->last_touched : NULL; + + if (!start_at && chain->first && chain->first->minimum < minimum) + start_at = chain->first; + + while (start_at && start_at->next && start_at->next->minimum < minimum) + start_at = start_at->next; + + if (start_at && start_at->maximum == (minimum - 1)) { + start_at->maximum = maximum; + + /* Merge with the following one? */ + if (start_at->next && + start_at->maximum + 1 == start_at->next->minimum) { + struct extent *to_free = start_at->next; + start_at->maximum = start_at->next->maximum; + start_at->next = start_at->next->next; + chain->num_extents--; + kfree(to_free); + } + + chain->last_touched = start_at; + chain->size+= (maximum - minimum + 1); + + return 0; + } + + new_extent = suspend_get_extent(); + if (!new_extent) { + printk("Error unable to append a new extent to the chain.\n"); + return 2; + } + + chain->num_extents++; + chain->size+= (maximum - minimum + 1); + new_extent->minimum = minimum; + new_extent->maximum = maximum; + new_extent->next = NULL; + + chain->last_touched = new_extent; + + if (start_at) { + struct extent *next = start_at->next; + start_at->next = new_extent; + new_extent->next = next; + } else { + if (chain->first) + new_extent->next = chain->first; + chain->first = new_extent; + } + + return 0; +} + +/* suspend_serialise_extent_chain + * + * Write a chain in the image. + */ +int suspend_serialise_extent_chain(struct suspend_module_ops *owner, + struct extent_chain *chain) +{ + struct extent *this; + int ret, i = 0; + + if ((ret = suspendActiveAllocator->rw_header_chunk(WRITE, owner, + (char *) chain, + 2 * sizeof(int)))) + return ret; + + this = chain->first; + while (this) { + if ((ret = suspendActiveAllocator->rw_header_chunk(WRITE, owner, + (char *) this, + 2 * sizeof(unsigned long)))) + return ret; + this = this->next; + i++; + } + + if (i != chain->num_extents) { + printk(KERN_EMERG "Saved %d extents but chain metadata says there " + "should be %d.\n", i, chain->num_extents); + return 1; + } + + return ret; +} + +/* suspend_load_extent_chain + * + * Read back a chain saved in the image. 
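+ *
+ * The on-image format is exactly what suspend_serialise_extent_chain
+ * wrote above: the two ints at the head of struct extent_chain (size
+ * and num_extents) followed by num_extents (minimum, maximum) pairs of
+ * unsigned longs, which is what we parse back here.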
+ */ +int suspend_load_extent_chain(struct extent_chain *chain) +{ + struct extent *this, *last = NULL; + int i, ret; + + if ((ret = suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) chain, 2 * sizeof(int)))) { + printk("Failed to read size of extent chain.\n"); + return 1; + } + + for (i = 0; i < chain->num_extents; i++) { + this = kmalloc(sizeof(struct extent), GFP_ATOMIC); + if (!this) { + printk("Failed to allocate a new extent.\n"); + return -ENOMEM; + } + this->next = NULL; + if ((ret = suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) this, 2 * sizeof(unsigned long)))) { + printk("Failed to an extent.\n"); + return 1; + } + if (last) + last->next = this; + else + chain->first = this; + last = this; + } + return 0; +} + +/* suspend_extent_state_next + * + * Given a state, progress to the next valid entry. We may begin in an + * invalid state, as we do when invoked after extent_state_goto_start below. + * + * When using compression and expected_compression > 0, we let the image size + * be larger than storage, so we can validly run out of data to return. + */ +unsigned long suspend_extent_state_next(struct extent_iterate_state *state) +{ + if (state->current_chain == state->num_chains) + return 0; + + if (state->current_extent) { + if (state->current_offset == state->current_extent->maximum) { + if (state->current_extent->next) { + state->current_extent = state->current_extent->next; + state->current_offset = state->current_extent->minimum; + } else { + state->current_extent = NULL; + state->current_offset = 0; + } + } else + state->current_offset++; + } + + while(!state->current_extent) { + int chain_num = ++(state->current_chain); + + if (chain_num == state->num_chains) + return 0; + + state->current_extent = (state->chains + chain_num)->first; + + if (!state->current_extent) + continue; + + state->current_offset = state->current_extent->minimum; + } + + return state->current_offset; +} + +/* suspend_extent_state_goto_start + * + * Find the first valid value in a group of chains. + */ +void suspend_extent_state_goto_start(struct extent_iterate_state *state) +{ + state->current_chain = -1; + state->current_extent = NULL; + state->current_offset = 0; +} + +/* suspend_extent_start_save + * + * Given a state and a struct extent_state_store, save the current + * position in a format that can be used with relocated chains (at + * resume time). + */ +void suspend_extent_state_save(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state) +{ + struct extent *extent; + + saved_state->chain_num = state->current_chain; + saved_state->extent_num = 0; + saved_state->offset = state->current_offset; + + if (saved_state->chain_num == -1) + return; + + extent = (state->chains + state->current_chain)->first; + + while (extent != state->current_extent) { + saved_state->extent_num++; + extent = extent->next; + } +} + +/* suspend_extent_start_restore + * + * Restore the position saved by extent_state_save. 
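+ *
+ * Because the extent is recorded as an ordinal position within its
+ * chain rather than as a pointer, the saved state remains valid even
+ * though the extents are reallocated at different addresses when the
+ * chain is reloaded.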
+ */ +void suspend_extent_state_restore(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state) +{ + int posn = saved_state->extent_num; + + if (saved_state->chain_num == -1) { + suspend_extent_state_goto_start(state); + return; + } + + state->current_chain = saved_state->chain_num; + state->current_extent = (state->chains + state->current_chain)->first; + state->current_offset = saved_state->offset; + + while (posn--) + state->current_extent = state->current_extent->next; +} + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(suspend_add_to_extent_chain); +EXPORT_SYMBOL_GPL(suspend_put_extent_chain); +EXPORT_SYMBOL_GPL(suspend_load_extent_chain); +EXPORT_SYMBOL_GPL(suspend_serialise_extent_chain); +EXPORT_SYMBOL_GPL(suspend_extent_state_save); +EXPORT_SYMBOL_GPL(suspend_extent_state_restore); +EXPORT_SYMBOL_GPL(suspend_extent_state_goto_start); +EXPORT_SYMBOL_GPL(suspend_extent_state_next); +#endif diff --git a/kernel/power/extent.h b/kernel/power/extent.h new file mode 100644 index 0000000..c97772b --- /dev/null +++ b/kernel/power/extent.h @@ -0,0 +1,77 @@ +/* + * kernel/power/extent.h + * + * Copyright (C) 2003-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * It contains declarations related to extents. Extents are + * suspend's method of storing some of the metadata for the image. + * See extent.c for more info. + * + */ + +#include "modules.h" + +#ifndef EXTENT_H +#define EXTENT_H + +struct extent { + unsigned long minimum, maximum; + struct extent *next; +}; + +struct extent_chain { + int size; /* size of the chain ie sum (max-min+1) */ + int num_extents; + struct extent *first, *last_touched; +}; + +struct extent_iterate_state { + struct extent_chain *chains; + int num_chains; + int current_chain; + struct extent *current_extent; + unsigned long current_offset; +}; + +struct extent_iterate_saved_state { + int chain_num; + int extent_num; + unsigned long offset; +}; + +#define suspend_extent_state_eof(state) ((state)->num_chains == (state)->current_chain) + +/* Simplify iterating through all the values in an extent chain */ +#define suspend_extent_for_each(extent_chain, extentpointer, value) \ +if ((extent_chain)->first) \ + for ((extentpointer) = (extent_chain)->first, (value) = \ + (extentpointer)->minimum; \ + ((extentpointer) && ((extentpointer)->next || (value) <= \ + (extentpointer)->maximum)); \ + (((value) == (extentpointer)->maximum) ? \ + ((extentpointer) = (extentpointer)->next, (value) = \ + ((extentpointer) ? 
(extentpointer)->minimum : 0)) : \ + (value)++)) + +void suspend_put_extent_chain(struct extent_chain *chain); +int suspend_add_to_extent_chain(struct extent_chain *chain, + unsigned long minimum, unsigned long maximum); +int suspend_serialise_extent_chain(struct suspend_module_ops *owner, + struct extent_chain *chain); +int suspend_load_extent_chain(struct extent_chain *chain); + +/* swap_entry_to_extent_val & extent_val_to_swap_entry: + * We are putting offset in the low bits so consecutive swap entries + * make consecutive extent values */ +#define swap_entry_to_extent_val(swp_entry) (swp_entry.val) +#define extent_val_to_swap_entry(val) (swp_entry_t) { (val) } + +void suspend_extent_state_save(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state); +void suspend_extent_state_restore(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state); +void suspend_extent_state_goto_start(struct extent_iterate_state *state); +unsigned long suspend_extent_state_next(struct extent_iterate_state *state); +#endif diff --git a/kernel/power/io.c b/kernel/power/io.c new file mode 100644 index 0000000..854d1a7 --- /dev/null +++ b/kernel/power/io.c @@ -0,0 +1,1407 @@ +/* + * kernel/power/io.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * It contains high level IO routines for suspending. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "suspend.h" +#include "modules.h" +#include "pageflags.h" +#include "io.h" +#include "ui.h" +#include "storage.h" +#include "prepare_image.h" +#include "extent.h" +#include "sysfs.h" +#include "suspend2_builtin.h" + +char poweroff_resume2[256]; + +/* Variables shared between threads and updated under the mutex */ +static int io_write, io_finish_at, io_base, io_barmax, io_pageset, io_result; +static int io_index, io_nextupdate, io_pc, io_pc_step; +static unsigned long pfn, other_pfn; +static DEFINE_MUTEX(io_mutex); +static DEFINE_PER_CPU(struct page *, last_sought); +static DEFINE_PER_CPU(struct page *, last_high_page); +static DEFINE_PER_CPU(struct pbe *, last_low_page); +static atomic_t worker_thread_count; +static atomic_t io_count; + +/* suspend_attempt_to_parse_resume_device + * + * Can we suspend, using the current resume2= parameter? + */ +int suspend_attempt_to_parse_resume_device(int quiet) +{ + struct list_head *Allocator; + struct suspend_module_ops *thisAllocator; + int result, returning = 0; + + if (suspend_activate_storage(0)) + return 0; + + suspendActiveAllocator = NULL; + clear_suspend_state(SUSPEND_RESUME_DEVICE_OK); + clear_suspend_state(SUSPEND_CAN_RESUME); + clear_result_state(SUSPEND_ABORTED); + + if (!suspendNumAllocators) { + if (!quiet) + printk("Suspend2: No storage allocators have been " + "registered. Suspending will be disabled.\n"); + goto cleanup; + } + + if (!resume2_file[0]) { + if (!quiet) + printk("Suspend2: Resume2 parameter is empty." 
+ " Suspending will be disabled.\n"); + goto cleanup; + } + + list_for_each(Allocator, &suspendAllocators) { + thisAllocator = list_entry(Allocator, struct suspend_module_ops, + type_list); + + /* + * Not sure why you'd want to disable an allocator, but + * we should honour the flag if we're providing it + */ + if (!thisAllocator->enabled) + continue; + + result = thisAllocator->parse_sig_location( + resume2_file, (suspendNumAllocators == 1), + quiet); + + switch (result) { + case -EINVAL: + /* For this allocator, but not a valid + * configuration. Error already printed. */ + goto cleanup; + + case 0: + /* For this allocator and valid. */ + suspendActiveAllocator = thisAllocator; + + set_suspend_state(SUSPEND_RESUME_DEVICE_OK); + set_suspend_state(SUSPEND_CAN_RESUME); + if (!quiet) + printk("Suspend2: Resuming enabled.\n"); + + returning = 1; + goto cleanup; + } + } + if (!quiet) + printk("Suspend2: No matching enabled allocator found. " + "Resuming disabled.\n"); +cleanup: + suspend_deactivate_storage(0); + return returning; +} + +void attempt_to_parse_resume_device2(void) +{ + suspend_prepare_usm(); + suspend_attempt_to_parse_resume_device(0); + suspend_cleanup_usm(); +} + +void save_restore_resume2(int replace, int quiet) +{ + static char resume2_save[255]; + static unsigned long suspend_state_save; + + if (replace) { + suspend_state_save = suspend_state; + strcpy(resume2_save, resume2_file); + strcpy(resume2_file, poweroff_resume2); + } else { + strcpy(resume2_file, resume2_save); + suspend_state = suspend_state_save; + } + suspend_attempt_to_parse_resume_device(quiet); +} + +void attempt_to_parse_po_resume_device2(void) +{ + int ok = 0; + + /* Temporarily set resume2 to the poweroff value */ + if (!strlen(poweroff_resume2)) + return; + + printk("=== Trying Poweroff Resume2 ===\n"); + save_restore_resume2(SAVE, NOQUIET); + if (test_suspend_state(SUSPEND_CAN_RESUME)) + ok = 1; + + printk("=== Done ===\n"); + save_restore_resume2(RESTORE, QUIET); + + /* If not ok, clear the string */ + if (ok) + return; + + printk("Can't resume from that location; clearing poweroff_resume2.\n"); + poweroff_resume2[0] = '\0'; +} + +/* noresume_reset_modules + * + * Description: When we read the start of an image, modules (and especially the + * active allocator) might need to reset data structures if we + * decide to invalidate the image rather than resuming from it. + */ + +static void noresume_reset_modules(void) +{ + struct suspend_module_ops *this_filter; + + list_for_each_entry(this_filter, &suspend_filters, type_list) + if (this_filter->noresume_reset) + this_filter->noresume_reset(); + + if (suspendActiveAllocator && suspendActiveAllocator->noresume_reset) + suspendActiveAllocator->noresume_reset(); +} + +/* fill_suspend_header() + * + * Description: Fill the suspend header structure. + * Arguments: struct suspend_header: Header data structure to be filled. 
+ */ + +static void fill_suspend_header(struct suspend_header *sh) +{ + int i; + + memset((char *)sh, 0, sizeof(*sh)); + + sh->version_code = LINUX_VERSION_CODE; + sh->num_physpages = num_physpages; + memcpy(&sh->uts, init_utsname(), sizeof(struct new_utsname)); + sh->page_size = PAGE_SIZE; + sh->pagedir = pagedir1; + sh->pageset_2_size = pagedir2.size; + sh->param0 = suspend_result; + sh->param1 = suspend_action; + sh->param2 = suspend_debug_state; + sh->param3 = console_loglevel; + sh->root_fs = current->fs->rootmnt->mnt_sb->s_dev; + for (i = 0; i < 4; i++) + sh->io_time[i/2][i%2] = suspend_io_time[i/2][i%2]; +} + +/* + * rw_init_modules + * + * Iterate over modules, preparing the ones that will be used to read or write + * data. + */ +static int rw_init_modules(int rw, int which) +{ + struct suspend_module_ops *this_module; + /* Initialise page transformers */ + list_for_each_entry(this_module, &suspend_filters, type_list) { + if (!this_module->enabled) + continue; + if (this_module->rw_init && this_module->rw_init(rw, which)) { + abort_suspend(SUSPEND_FAILED_MODULE_INIT, + "Failed to initialise the %s filter.", + this_module->name); + return 1; + } + } + + /* Initialise allocator */ + if (suspendActiveAllocator->rw_init(rw, which)) { + abort_suspend(SUSPEND_FAILED_MODULE_INIT, + "Failed to initialise the allocator."); + if (!rw) + suspendActiveAllocator->invalidate_image(); + return 1; + } + + /* Initialise other modules */ + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled || + this_module->type == FILTER_MODULE || + this_module->type == WRITER_MODULE) + continue; + if (this_module->rw_init && this_module->rw_init(rw, which)) { + set_result_state(SUSPEND_ABORTED); + printk("Setting aborted flag due to module init failure.\n"); + return 1; + } + } + + return 0; +} + +/* + * rw_cleanup_modules + * + * Cleanup components after reading or writing a set of pages. + * Only the allocator may fail. 
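+ *
+ * The individual cleanup results are simply ORed together, so any
+ * non-zero return (in practice only from the allocator, per the note
+ * above) makes the whole call report failure.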
+ */ +static int rw_cleanup_modules(int rw) +{ + struct suspend_module_ops *this_module; + int result = 0; + + /* Cleanup other modules */ + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled || + this_module->type == FILTER_MODULE || + this_module->type == WRITER_MODULE) + continue; + if (this_module->rw_cleanup) + result |= this_module->rw_cleanup(rw); + } + + /* Flush data and cleanup */ + list_for_each_entry(this_module, &suspend_filters, type_list) { + if (!this_module->enabled) + continue; + if (this_module->rw_cleanup) + result |= this_module->rw_cleanup(rw); + } + + result |= suspendActiveAllocator->rw_cleanup(rw); + + return result; +} + +static struct page *copy_page_from_orig_page(struct page *orig_page) +{ + int is_high = PageHighMem(orig_page), index, min, max; + struct page *high_page = NULL, + **my_last_high_page = &__get_cpu_var(last_high_page), + **my_last_sought = &__get_cpu_var(last_sought); + struct pbe *this, **my_last_low_page = &__get_cpu_var(last_low_page); + void *compare; + + if (is_high) { + if (*my_last_sought && *my_last_high_page && *my_last_sought < orig_page) + high_page = *my_last_high_page; + else + high_page = (struct page *) restore_highmem_pblist; + this = (struct pbe *) kmap(high_page); + compare = orig_page; + } else { + if (*my_last_sought && *my_last_low_page && *my_last_sought < orig_page) + this = *my_last_low_page; + else + this = restore_pblist; + compare = page_address(orig_page); + } + + *my_last_sought = orig_page; + + /* Locate page containing pbe */ + while ( this[PBES_PER_PAGE - 1].next && + this[PBES_PER_PAGE - 1].orig_address < compare) { + if (is_high) { + struct page *next_high_page = (struct page *) + this[PBES_PER_PAGE - 1].next; + kunmap(high_page); + this = kmap(next_high_page); + high_page = next_high_page; + } else + this = this[PBES_PER_PAGE - 1].next; + } + + /* Do a binary search within the page */ + min = 0; + max = PBES_PER_PAGE; + index = PBES_PER_PAGE / 2; + while (max - min) { + if (!this[index].orig_address || + this[index].orig_address > compare) + max = index; + else if (this[index].orig_address == compare) { + if (is_high) { + struct page *page = this[index].address; + *my_last_high_page = high_page; + kunmap(high_page); + return page; + } + *my_last_low_page = this; + return virt_to_page(this[index].address); + } else + min = index; + index = ((max + min) / 2); + }; + + if (is_high) + kunmap(high_page); + + abort_suspend(SUSPEND_FAILED_IO, "Failed to get destination page for" + " orig page %p. This[min].orig_address=%p.\n", orig_page, + this[index].orig_address); + return NULL; +} + +/* + * do_rw_loop + * + * The main I/O loop for reading or writing pages. + */ +static int worker_rw_loop(void *data) +{ + unsigned long orig_pfn, write_pfn; + int result, my_io_index = 0; + struct suspend_module_ops *first_filter = suspend_get_next_filter(NULL); + struct page *buffer = alloc_page(GFP_ATOMIC); + + atomic_inc(&worker_thread_count); + + mutex_lock(&io_mutex); + + do { + int buf_size; + + /* + * What page to use? If reading, don't know yet which page's + * data will be read, so always use the buffer. If writing, + * use the copy (Pageset1) or original page (Pageset2), but + * always write the pfn of the original page. + */ + if (io_write) { + struct page *page; + + pfn = get_next_bit_on(io_map, pfn); + + /* Another thread could have beaten us to it. 
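+			 * If it has, the io_map bitmap is exhausted and
+			 * io_count should have reached zero at the same time;
+			 * anything else is a bookkeeping error, hence the BUG
+			 * below.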
*/ + if (pfn == max_pfn + 1) { + if (atomic_read(&io_count)) { + printk("Ran out of pfns but io_count is still %d.\n", atomic_read(&io_count)); + BUG(); + } + break; + } + + atomic_dec(&io_count); + + orig_pfn = pfn; + write_pfn = pfn; + + /* + * Other_pfn is updated by all threads, so we're not + * writing the same page multiple times. + */ + clear_dynpageflag(&io_map, pfn_to_page(pfn)); + if (io_pageset == 1) { + other_pfn = get_next_bit_on(pageset1_map, other_pfn); + write_pfn = other_pfn; + } + page = pfn_to_page(pfn); + + my_io_index = io_finish_at - atomic_read(&io_count); + + mutex_unlock(&io_mutex); + + result = first_filter->write_chunk(write_pfn, page, + PAGE_SIZE); + } else { + atomic_dec(&io_count); + mutex_unlock(&io_mutex); + + /* + * Are we aborting? If so, don't submit any more I/O as + * resetting the resume_attempted flag (from ui.c) will + * clear the bdev flags, making this thread oops. + */ + if (unlikely(test_suspend_state(SUSPEND_STOP_RESUME))) { + atomic_dec(&worker_thread_count); + if (!atomic_read(&worker_thread_count)) + set_suspend_state(SUSPEND_IO_STOPPED); + while (1) + schedule(); + } + + result = first_filter->read_chunk(&write_pfn, buffer, + &buf_size, SUSPEND_ASYNC); + if (buf_size != PAGE_SIZE) { + abort_suspend(SUSPEND_FAILED_IO, + "I/O pipeline returned %d bytes instead " + "of %d.\n", buf_size, PAGE_SIZE); + mutex_lock(&io_mutex); + break; + } + } + + if (result) { + io_result = result; + if (io_write) { + printk("Write chunk returned %d.\n", result); + abort_suspend(SUSPEND_FAILED_IO, + "Failed to write a chunk of the " + "image."); + mutex_lock(&io_mutex); + break; + } + panic("Read chunk returned (%d)", result); + } + + /* + * Discard reads of resaved pages while reading ps2 + * and unwanted pages while rereading ps2 when aborting. + */ + if (!io_write && !PageResave(pfn_to_page(write_pfn))) { + struct page *final_page = pfn_to_page(write_pfn), + *copy_page = final_page; + char *virt, *buffer_virt; + + if (io_pageset == 1 && !load_direct(final_page)) { + copy_page = copy_page_from_orig_page(final_page); + BUG_ON(!copy_page); + } + + if (test_dynpageflag(&io_map, final_page)) { + virt = kmap(copy_page); + buffer_virt = kmap(buffer); + memcpy(virt, buffer_virt, PAGE_SIZE); + kunmap(copy_page); + kunmap(buffer); + clear_dynpageflag(&io_map, final_page); + mutex_lock(&io_mutex); + my_io_index = io_finish_at - atomic_read(&io_count); + mutex_unlock(&io_mutex); + } else { + mutex_lock(&io_mutex); + atomic_inc(&io_count); + mutex_unlock(&io_mutex); + } + } + + /* Strictly speaking, this is racy - another thread could + * output the next the next percentage before we've done + * ours. 1/5th of the pageset would have to be done first, + * though, so I'm not worried. In addition, the only impact + * would be messed up output, not image corruption. Doing + * this under the mutex seems an unnecessary slowdown. + */ + if ((my_io_index + io_base) >= io_nextupdate) + io_nextupdate = suspend_update_status(my_io_index + + io_base, io_barmax, " %d/%d MB ", + MB(io_base+my_io_index+1), MB(io_barmax)); + + if ((my_io_index + 1) == io_pc) { + printk("%d%%...", 20 * io_pc_step); + io_pc_step++; + io_pc = io_finish_at * io_pc_step / 5; + } + + suspend_cond_pause(0, NULL); + + /* + * Subtle: If there's less I/O still to be done than threads + * running, quit. This stops us doing I/O beyond the end of + * the image when reading. + * + * Possible race condition. Two threads could do the test at + * the same time; one should exit and one should continue. 
+ * Therefore we take the mutex before comparing and exiting. + */ + + mutex_lock(&io_mutex); + + } while(atomic_read(&io_count) >= atomic_read(&worker_thread_count) && + !(io_write && test_result_state(SUSPEND_ABORTED))); + + atomic_dec(&worker_thread_count); + mutex_unlock(&io_mutex); + + __free_pages(buffer, 0); + + return 0; +} + +void start_other_threads(void) +{ + int cpu; + struct task_struct *p; + + for_each_online_cpu(cpu) { + if (cpu == smp_processor_id()) + continue; + + p = kthread_create(worker_rw_loop, NULL, "ks2io/%d", cpu); + if (IS_ERR(p)) { + printk("ks2io for %i failed\n", cpu); + continue; + } + kthread_bind(p, cpu); + wake_up_process(p); + } +} + +/* + * do_rw_loop + * + * The main I/O loop for reading or writing pages. + */ +static int do_rw_loop(int write, int finish_at, dyn_pageflags_t *pageflags, + int base, int barmax, int pageset) +{ + int index = 0, cpu; + + if (!finish_at) + return 0; + + io_write = write; + io_finish_at = finish_at; + io_base = base; + io_barmax = barmax; + io_pageset = pageset; + io_index = 0; + io_pc = io_finish_at / 5; + io_pc_step = 1; + io_result = 0; + io_nextupdate = 0; + + for_each_online_cpu(cpu) { + per_cpu(last_sought, cpu) = NULL; + per_cpu(last_low_page, cpu) = NULL; + per_cpu(last_high_page, cpu) = NULL; + } + + /* Ensure all bits clear */ + pfn = get_next_bit_on(io_map, max_pfn + 1); + + while (pfn < max_pfn + 1) { + clear_dynpageflag(&io_map, pfn_to_page(pfn)); + pfn = get_next_bit_on(io_map, pfn); + } + + /* Set the bits for the pages to write */ + pfn = get_next_bit_on(*pageflags, max_pfn + 1); + + while (pfn < max_pfn + 1 && index < finish_at) { + set_dynpageflag(&io_map, pfn_to_page(pfn)); + pfn = get_next_bit_on(*pageflags, pfn); + index++; + } + + BUG_ON(index < finish_at); + + atomic_set(&io_count, finish_at); + + pfn = max_pfn + 1; + other_pfn = pfn; + + clear_suspend_state(SUSPEND_IO_STOPPED); + + if (!test_action_state(SUSPEND_NO_MULTITHREADED_IO)) + start_other_threads(); + worker_rw_loop(NULL); + + while (atomic_read(&worker_thread_count)) + schedule(); + + set_suspend_state(SUSPEND_IO_STOPPED); + if (unlikely(test_suspend_state(SUSPEND_STOP_RESUME))) { + while (1) + schedule(); + } + + if (!io_result) { + printk("done.\n"); + + suspend_update_status(io_base + io_finish_at, io_barmax, " %d/%d MB ", + MB(io_base + io_finish_at), MB(io_barmax)); + } + + if (io_write && test_result_state(SUSPEND_ABORTED)) + io_result = 1; + else /* All I/O done? */ + BUG_ON(get_next_bit_on(io_map, max_pfn + 1) != max_pfn + 1); + + return io_result; +} + +/* write_pageset() + * + * Description: Write a pageset to disk. + * Arguments: pagedir: Which pagedir to write.. + * Returns: Zero on success or -1 on failure. + */ + +int write_pageset(struct pagedir *pagedir) +{ + int finish_at, base = 0, start_time, end_time; + int barmax = pagedir1.size + pagedir2.size; + long error = 0; + dyn_pageflags_t *pageflags; + + /* + * Even if there is nothing to read or write, the allocator + * may need the init/cleanup for it's housekeeping. (eg: + * Pageset1 may start where pageset2 ends when writing). 
+ */ + finish_at = pagedir->size; + + if (pagedir->id == 1) { + suspend_prepare_status(DONT_CLEAR_BAR, + "Writing kernel & process data..."); + base = pagedir2.size; + if (test_action_state(SUSPEND_TEST_FILTER_SPEED) || + test_action_state(SUSPEND_TEST_BIO)) + pageflags = &pageset1_map; + else + pageflags = &pageset1_copy_map; + } else { + suspend_prepare_status(CLEAR_BAR, "Writing caches..."); + pageflags = &pageset2_map; + } + + start_time = jiffies; + + if (rw_init_modules(1, pagedir->id)) { + abort_suspend(SUSPEND_FAILED_MODULE_INIT, + "Failed to initialise modules for writing."); + error = 1; + } + + if (!error) + error = do_rw_loop(1, finish_at, pageflags, base, barmax, + pagedir->id); + + if (rw_cleanup_modules(WRITE) && !error) { + abort_suspend(SUSPEND_FAILED_MODULE_CLEANUP, + "Failed to cleanup after writing."); + error = 1; + } + + end_time = jiffies; + + if ((end_time - start_time) && (!test_result_state(SUSPEND_ABORTED))) { + suspend_io_time[0][0] += finish_at, + suspend_io_time[0][1] += (end_time - start_time); + } + + return error; +} + +/* read_pageset() + * + * Description: Read a pageset from disk. + * Arguments: whichtowrite: Controls what debugging output is printed. + * overwrittenpagesonly: Whether to read the whole pageset or + * only part. + * Returns: Zero on success or -1 on failure. + */ + +static int read_pageset(struct pagedir *pagedir, int overwrittenpagesonly) +{ + int result = 0, base = 0, start_time, end_time; + int finish_at = pagedir->size; + int barmax = pagedir1.size + pagedir2.size; + dyn_pageflags_t *pageflags; + + if (pagedir->id == 1) { + suspend_prepare_status(CLEAR_BAR, + "Reading kernel & process data..."); + pageflags = &pageset1_map; + } else { + suspend_prepare_status(DONT_CLEAR_BAR, "Reading caches..."); + if (overwrittenpagesonly) + barmax = finish_at = min(pagedir1.size, + pagedir2.size); + else { + base = pagedir1.size; + } + pageflags = &pageset2_map; + } + + start_time = jiffies; + + if (rw_init_modules(0, pagedir->id)) { + suspendActiveAllocator->invalidate_image(); + result = 1; + } else + result = do_rw_loop(0, finish_at, pageflags, base, barmax, + pagedir->id); + + if (rw_cleanup_modules(READ) && !result) { + abort_suspend(SUSPEND_FAILED_MODULE_CLEANUP, + "Failed to cleanup after reading."); + result = 1; + } + + /* Statistics */ + end_time=jiffies; + + if ((end_time - start_time) && (!test_result_state(SUSPEND_ABORTED))) { + suspend_io_time[1][0] += finish_at, + suspend_io_time[1][1] += (end_time - start_time); + } + + return result; +} + +/* write_module_configs() + * + * Description: Store the configuration for each module in the image header. + * Returns: Int: Zero on success, Error value otherwise. + */ +static int write_module_configs(void) +{ + struct suspend_module_ops *this_module; + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + int len, index = 1; + struct suspend_module_header suspend_module_header; + + if (!buffer) { + printk("Failed to allocate a buffer for saving " + "module configuration info.\n"); + return -ENOMEM; + } + + /* + * We have to know which data goes with which module, so we at + * least write a length of zero for a module. Note that we are + * also assuming every module's config data takes <= PAGE_SIZE. 
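+	 *
+	 * The header section we emit is therefore a sequence of records of
+	 * the form (struct suspend_module_header, int len, len bytes of
+	 * config data), terminated by a record whose header name is the
+	 * empty string.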
+ */ + + /* For each module (in registration order) */ + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled || !this_module->storage_needed || + (this_module->type == WRITER_MODULE && + suspendActiveAllocator != this_module)) + continue; + + /* Get the data from the module */ + len = 0; + if (this_module->save_config_info) + len = this_module->save_config_info(buffer); + + /* Save the details of the module */ + suspend_module_header.enabled = this_module->enabled; + suspend_module_header.type = this_module->type; + suspend_module_header.index = index++; + strncpy(suspend_module_header.name, this_module->name, + sizeof(suspend_module_header.name)); + suspendActiveAllocator->rw_header_chunk(WRITE, + this_module, + (char *) &suspend_module_header, + sizeof(suspend_module_header)); + + /* Save the size of the data and any data returned */ + suspendActiveAllocator->rw_header_chunk(WRITE, + this_module, + (char *) &len, sizeof(int)); + if (len) + suspendActiveAllocator->rw_header_chunk( + WRITE, this_module, buffer, len); + } + + /* Write a blank header to terminate the list */ + suspend_module_header.name[0] = '\0'; + suspendActiveAllocator->rw_header_chunk(WRITE, + NULL, + (char *) &suspend_module_header, + sizeof(suspend_module_header)); + + free_page((unsigned long) buffer); + return 0; +} + +/* read_module_configs() + * + * Description: Reload module configurations from the image header. + * Returns: Int. Zero on success, error value otherwise. + */ + +static int read_module_configs(void) +{ + struct suspend_module_ops *this_module; + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + int len, result = 0; + struct suspend_module_header suspend_module_header; + + if (!buffer) { + printk("Failed to allocate a buffer for reloading module " + "configuration info.\n"); + return -ENOMEM; + } + + /* All modules are initially disabled. That way, if we have a module + * loaded now that wasn't loaded when we suspended, it won't be used + * in trying to read the data. + */ + list_for_each_entry(this_module, &suspend_modules, module_list) + this_module->enabled = 0; + + /* Get the first module header */ + result = suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &suspend_module_header, + sizeof(suspend_module_header)); + if (result) { + printk("Failed to read the next module header.\n"); + free_page((unsigned long) buffer); + return -EINVAL; + } + + /* For each module (in registration order) */ + while (suspend_module_header.name[0]) { + + /* Find the module */ + this_module = suspend_find_module_given_name(suspend_module_header.name); + + if (!this_module) { + /* + * Is it used? Only need to worry about filters. The active + * allocator must be loaded! + */ + if (suspend_module_header.enabled) { + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "It looks like we need module %s for " + "reading the image but it hasn't been " + "registered.\n", + suspend_module_header.name); + if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) { + suspendActiveAllocator->invalidate_image(); + free_page((unsigned long) buffer); + return -EINVAL; + } + } else + printk("Module %s configuration data found, but" + " the module hasn't registered. 
Looks " + "like it was disabled, so we're " + "ignoring it's data.", + suspend_module_header.name); + } + + /* Get the length of the data (if any) */ + result = suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &len, sizeof(int)); + if (result) { + printk("Failed to read the length of the module %s's" + " configuration data.\n", + suspend_module_header.name); + free_page((unsigned long) buffer); + return -EINVAL; + } + + /* Read any data and pass to the module (if we found one) */ + if (len) { + suspendActiveAllocator->rw_header_chunk(READ, NULL, + buffer, len); + if (this_module) { + if (!this_module->save_config_info) { + printk("Huh? Module %s appears to have " + "a save_config_info, but not a " + "load_config_info function!\n", + this_module->name); + } else + this_module->load_config_info(buffer, len); + } + } + + if (this_module) { + /* Now move this module to the tail of its lists. This + * will put it in order. Any new modules will end up at + * the top of the lists. They should have been set to + * disabled when loaded (people will normally not edit + * an initrd to load a new module and then suspend + * without using it!). + */ + + suspend_move_module_tail(this_module); + + /* + * We apply the disabled state; modules don't need to + * save whether they were disabled and if they do, we + * override them anyway. + */ + this_module->enabled = suspend_module_header.enabled; + } + + /* Get the next module header */ + result = suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &suspend_module_header, + sizeof(suspend_module_header)); + + if (result) { + printk("Failed to read the next module header.\n"); + free_page((unsigned long) buffer); + return -EINVAL; + } + + } + + free_page((unsigned long) buffer); + return 0; +} + +/* write_image_header() + * + * Description: Write the image header after write the image proper. + * Returns: Int. Zero on success or -1 on failure. 
+ */ + +int write_image_header(void) +{ + int ret; + int total = pagedir1.size + pagedir2.size+2; + char *header_buffer = NULL; + + /* Now prepare to write the header */ + if ((ret = suspendActiveAllocator->write_header_init())) { + abort_suspend(SUSPEND_FAILED_MODULE_INIT, + "Active allocator's write_header_init" + " function failed."); + goto write_image_header_abort; + } + + /* Get a buffer */ + header_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + if (!header_buffer) { + abort_suspend(SUSPEND_OUT_OF_MEMORY, + "Out of memory when trying to get page for header!"); + goto write_image_header_abort; + } + + /* Write suspend header */ + fill_suspend_header((struct suspend_header *) header_buffer); + suspendActiveAllocator->rw_header_chunk(WRITE, NULL, + header_buffer, sizeof(struct suspend_header)); + + free_page((unsigned long) header_buffer); + + /* Write module configurations */ + if ((ret = write_module_configs())) { + abort_suspend(SUSPEND_FAILED_IO, + "Failed to write module configs."); + goto write_image_header_abort; + } + + save_dyn_pageflags(pageset1_map); + + /* Flush data and let allocator cleanup */ + if (suspendActiveAllocator->write_header_cleanup()) { + abort_suspend(SUSPEND_FAILED_IO, + "Failed to cleanup writing header."); + goto write_image_header_abort_no_cleanup; + } + + if (test_result_state(SUSPEND_ABORTED)) + goto write_image_header_abort_no_cleanup; + + suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1, "|\n"); + suspend_update_status(total, total, NULL); + + return 0; + +write_image_header_abort: + suspendActiveAllocator->write_header_cleanup(); +write_image_header_abort_no_cleanup: + return -1; +} + +/* sanity_check() + * + * Description: Perform a few checks, seeking to ensure that the kernel being + * booted matches the one suspended. They need to match so we can + * be _sure_ things will work. It is not absolutely impossible for + * resuming from a different kernel to work, just not assured. + * Arguments: Struct suspend_header. The header which was saved at suspend + * time. + */ +static char *sanity_check(struct suspend_header *sh) +{ + if (sh->version_code != LINUX_VERSION_CODE) + return "Incorrect kernel version."; + + if (sh->num_physpages != num_physpages) + return "Incorrect memory size."; + + if (strncmp(sh->uts.sysname, init_utsname()->sysname, 65)) + return "Incorrect system type."; + + if (strncmp(sh->uts.release, init_utsname()->release, 65)) + return "Incorrect release."; + + if (strncmp(sh->uts.version, init_utsname()->version, 65)) + return "Right kernel version but wrong build number."; + + if (strncmp(sh->uts.machine, init_utsname()->machine, 65)) + return "Incorrect machine type."; + + if (sh->page_size != PAGE_SIZE) + return "Incorrect PAGE_SIZE."; + + if (!test_action_state(SUSPEND_IGNORE_ROOTFS)) { + const struct super_block *sb; + list_for_each_entry(sb, &super_blocks, s_list) { + if ((!(sb->s_flags & MS_RDONLY)) && + (sb->s_type->fs_flags & FS_REQUIRES_DEV)) + return "Device backed fs has been mounted " + "rw prior to resume or initrd/ramfs " + "is mounted rw."; + } + } + + return 0; +} + +/* __read_pageset1 + * + * Description: Test for the existence of an image and attempt to load it. + * Returns: Int. Zero if image found and pageset1 successfully loaded. + * Error if no image found or loaded. 
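+ *
+ * The sequence mirrors the write side: check that an image exists,
+ * honour the noresume option and any earlier resume attempt, read and
+ * sanity-check the header, reload module configuration, then read the
+ * pageset1 pages themselves.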
+ */ +static int __read_pageset1(void) +{ + int i, result = 0; + char *header_buffer = (char *) get_zeroed_page(GFP_ATOMIC), + *sanity_error = NULL; + struct suspend_header *suspend_header; + + if (!header_buffer) { + printk("Unable to allocate a page for reading the signature.\n"); + return -ENOMEM; + } + + /* Check for an image */ + if (!(result = suspendActiveAllocator->image_exists())) { + result = -ENODATA; + noresume_reset_modules(); + printk("Suspend2: No image found.\n"); + goto out; + } + + /* Check for noresume command line option */ + if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED)) { + printk("Suspend2: Noresume: Invalidated image.\n"); + goto out_invalidate_image; + } + + /* Check whether we've resumed before */ + if (test_suspend_state(SUSPEND_RESUMED_BEFORE)) { + int resumed_before_default = 0; + if (test_suspend_state(SUSPEND_RETRY_RESUME)) + resumed_before_default = SUSPEND_CONTINUE_REQ; + + suspend_early_boot_message(1, resumed_before_default, NULL); + clear_suspend_state(SUSPEND_RETRY_RESUME); + if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) { + printk("Suspend2: Tried to resume before: " + "Invalidated image.\n"); + goto out_invalidate_image; + } + } + + clear_suspend_state(SUSPEND_CONTINUE_REQ); + + /* + * Prepare the active allocator for reading the image header. The + * activate allocator might read its own configuration. + * + * NB: This call may never return because there might be a signature + * for a different image such that we warn the user and they choose + * to reboot. (If the device ids look erroneous (2.4 vs 2.6) or the + * location of the image might be unavailable if it was stored on a + * network connection. + */ + + if ((result = suspendActiveAllocator->read_header_init())) { + printk("Suspend2: Failed to initialise, reading the image " + "header.\n"); + goto out_invalidate_image; + } + + /* Read suspend header */ + if ((result = suspendActiveAllocator->rw_header_chunk(READ, NULL, + header_buffer, sizeof(struct suspend_header))) < 0) { + printk("Suspend2: Failed to read the image signature.\n"); + goto out_invalidate_image; + } + + suspend_header = (struct suspend_header *) header_buffer; + + /* + * NB: This call may also result in a reboot rather than returning. + */ + + if ((sanity_error = sanity_check(suspend_header)) && + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, sanity_error)) { + printk("Suspend2: Sanity check failed.\n"); + goto out_invalidate_image; + } + + /* + * We have an image and it looks like it will load okay. + * + * Get metadata from header. Don't override commandline parameters. + * + * We don't need to save the image size limit because it's not used + * during resume and will be restored with the image anyway. 
+ */ + + memcpy((char *) &pagedir1, + (char *) &suspend_header->pagedir, sizeof(pagedir1)); + suspend_result = suspend_header->param0; + suspend_action = suspend_header->param1; + suspend_debug_state = suspend_header->param2; + console_loglevel = suspend_header->param3; + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + pagedir2.size = suspend_header->pageset_2_size; + for (i = 0; i < 4; i++) + suspend_io_time[i/2][i%2] = + suspend_header->io_time[i/2][i%2]; + + /* Read module configurations */ + if ((result = read_module_configs())) { + pagedir1.size = pagedir2.size = 0; + printk("Suspend2: Failed to read Suspend module " + "configurations.\n"); + clear_action_state(SUSPEND_KEEP_IMAGE); + goto out_invalidate_image; + } + + suspend_prepare_console(); + + set_suspend_state(SUSPEND_NOW_RESUMING); + + if (pre_resume_freeze()) + goto out_reset_console; + + suspend_cond_pause(1, "About to read original pageset1 locations."); + + /* + * Read original pageset1 locations. These are the addresses we can't + * use for the data to be restored. + */ + + if (allocate_dyn_pageflags(&pageset1_map) || + allocate_dyn_pageflags(&pageset1_copy_map) || + allocate_dyn_pageflags(&io_map)) + goto out_reset_console; + + if (load_dyn_pageflags(pageset1_map)) + goto out_reset_console; + + /* Clean up after reading the header */ + if ((result = suspendActiveAllocator->read_header_cleanup())) { + printk("Suspend2: Failed to cleanup after reading the image " + "header.\n"); + goto out_reset_console; + } + + suspend_cond_pause(1, "About to read pagedir."); + + /* + * Get the addresses of pages into which we will load the kernel to + * be copied back + */ + if (suspend_get_pageset1_load_addresses()) { + printk("Suspend2: Failed to get load addresses for pageset1.\n"); + goto out_reset_console; + } + + /* Read the original kernel back */ + suspend_cond_pause(1, "About to read pageset 1."); + + if (read_pageset(&pagedir1, 0)) { + suspend_prepare_status(CLEAR_BAR, "Failed to read pageset 1."); + result = -EIO; + printk("Suspend2: Failed to get load pageset1.\n"); + goto out_reset_console; + } + + suspend_cond_pause(1, "About to restore original kernel."); + result = 0; + + if (!test_action_state(SUSPEND_KEEP_IMAGE) && + suspendActiveAllocator->mark_resume_attempted) + suspendActiveAllocator->mark_resume_attempted(1); + +out: + free_page((unsigned long) header_buffer); + return result; + +out_reset_console: + suspend_cleanup_console(); + +out_invalidate_image: + free_dyn_pageflags(&pageset1_map); + free_dyn_pageflags(&pageset1_copy_map); + free_dyn_pageflags(&io_map); + result = -EINVAL; + if (!test_action_state(SUSPEND_KEEP_IMAGE)) + suspendActiveAllocator->invalidate_image(); + suspendActiveAllocator->read_header_cleanup(); + noresume_reset_modules(); + goto out; +} + +/* read_pageset1() + * + * Description: Attempt to read the header and pageset1 of a suspend image. + * Handle the outcome, complaining where appropriate. 
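+ *
+ * Zero, -ENODATA (no image found) and -EINVAL (invalid image) are
+ * treated as non-fatal; any other error aborts the resume attempt.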
+ */ + +int read_pageset1(void) +{ + int error; + + error = __read_pageset1(); + + switch (error) { + case 0: + case -ENODATA: + case -EINVAL: /* non fatal error */ + break; + default: + if (test_result_state(SUSPEND_ABORTED)) + break; + + abort_suspend(SUSPEND_IMAGE_ERROR, + "Suspend2: Error %d resuming\n", + error); + } + return error; +} + +/* + * get_have_image_data() + */ +static char *get_have_image_data(void) +{ + char *output_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + struct suspend_header *suspend_header; + + if (!output_buffer) { + printk("Output buffer null.\n"); + return NULL; + } + + /* Check for an image */ + if (!suspendActiveAllocator->image_exists() || + suspendActiveAllocator->read_header_init() || + suspendActiveAllocator->rw_header_chunk(READ, NULL, + output_buffer, sizeof(struct suspend_header))) { + sprintf(output_buffer, "0\n"); + goto out; + } + + suspend_header = (struct suspend_header *) output_buffer; + + sprintf(output_buffer, "1\n%s\n%s\n", + suspend_header->uts.machine, + suspend_header->uts.version); + + /* Check whether we've resumed before */ + if (test_suspend_state(SUSPEND_RESUMED_BEFORE)) + strcat(output_buffer, "Resumed before.\n"); + +out: + noresume_reset_modules(); + return output_buffer; +} + +/* read_pageset2() + * + * Description: Read in part or all of pageset2 of an image, depending upon + * whether we are suspending and have only overwritten a portion + * with pageset1 pages, or are resuming and need to read them + * all. + * Arguments: Int. Boolean. Read only pages which would have been + * overwritten by pageset1? + * Returns: Int. Zero if no error, otherwise the error value. + */ +int read_pageset2(int overwrittenpagesonly) +{ + int result = 0; + + if (!pagedir2.size) + return 0; + + result = read_pageset(&pagedir2, overwrittenpagesonly); + + suspend_update_status(100, 100, NULL); + suspend_cond_pause(1, "Pagedir 2 read."); + + return result; +} + +/* image_exists_read + * + * Return 0 or 1, depending on whether an image is found. + * Incoming buffer is PAGE_SIZE and result is guaranteed + * to be far less than that, so we don't worry about + * overflow. + */ +int image_exists_read(const char *page, int count) +{ + int len = 0; + char *result; + + if (suspend_activate_storage(0)) + return count; + + if (!test_suspend_state(SUSPEND_RESUME_DEVICE_OK)) + suspend_attempt_to_parse_resume_device(0); + + if (!suspendActiveAllocator) { + len = sprintf((char *) page, "-1\n"); + } else { + result = get_have_image_data(); + if (result) { + len = sprintf((char *) page, "%s", result); + free_page((unsigned long) result); + } + } + + suspend_deactivate_storage(0); + + return len; +} + +/* image_exists_write + * + * Invalidate an image if one exists. + */ +int image_exists_write(const char *buffer, int count) +{ + if (suspend_activate_storage(0)) + return count; + + if (suspendActiveAllocator && suspendActiveAllocator->image_exists()) + suspendActiveAllocator->invalidate_image(); + + suspend_deactivate_storage(0); + + clear_result_state(SUSPEND_KEPT_IMAGE); + + return count; +} + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(suspend_attempt_to_parse_resume_device); +EXPORT_SYMBOL_GPL(attempt_to_parse_resume_device2); +#endif + diff --git a/kernel/power/io.h b/kernel/power/io.h new file mode 100644 index 0000000..527b39e --- /dev/null +++ b/kernel/power/io.h @@ -0,0 +1,56 @@ +/* + * kernel/power/io.h + * + * Copyright (C) 2005-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. 
+ * + * It contains high level IO routines for suspending. + * + */ + +#include +#include "pagedir.h" + +/* Non-module data saved in our image header */ +struct suspend_header { + u32 version_code; + unsigned long num_physpages; + unsigned long orig_mem_free; + struct new_utsname uts; + int num_cpus; + int page_size; + int pageset_2_size; + int param0; + int param1; + int param2; + int param3; + int progress0; + int progress1; + int progress2; + int progress3; + int io_time[2][2]; + struct pagedir pagedir; + dev_t root_fs; +}; + +extern int write_pageset(struct pagedir *pagedir); +extern int write_image_header(void); +extern int read_pageset1(void); +extern int read_pageset2(int overwrittenpagesonly); + +extern int suspend_attempt_to_parse_resume_device(int quiet); +extern void attempt_to_parse_resume_device2(void); +extern void attempt_to_parse_po_resume_device2(void); +int image_exists_read(const char *page, int count); +int image_exists_write(const char *buffer, int count); +extern void save_restore_resume2(int replace, int quiet); + +/* Args to save_restore_resume2 */ +#define RESTORE 0 +#define SAVE 1 + +#define NOQUIET 0 +#define QUIET 1 + +extern dev_t name_to_dev_t(char *line); diff --git a/kernel/power/main.c b/kernel/power/main.c index a064dfd..05b6686 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -155,7 +155,7 @@ static void suspend_finish(suspend_state_t state) static const char * const pm_states[PM_SUSPEND_MAX] = { [PM_SUSPEND_STANDBY] = "standby", [PM_SUSPEND_MEM] = "mem", -#ifdef CONFIG_SOFTWARE_SUSPEND +#if defined(CONFIG_SOFTWARE_SUSPEND) || defined(CONFIG_SUSPEND2) [PM_SUSPEND_DISK] = "disk", #endif }; diff --git a/kernel/power/modules.c b/kernel/power/modules.c new file mode 100644 index 0000000..a6b574f --- /dev/null +++ b/kernel/power/modules.c @@ -0,0 +1,415 @@ +/* + * kernel/power/modules.c + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + */ + +#include +#include +#include "suspend.h" +#include "modules.h" +#include "sysfs.h" +#include "ui.h" + +struct list_head suspend_filters, suspendAllocators, suspend_modules; +struct suspend_module_ops *suspendActiveAllocator; +int suspend_num_filters; +int suspendNumAllocators, suspend_num_modules; +int initialised; + +static inline void suspend_initialise_module_lists(void) { + INIT_LIST_HEAD(&suspend_filters); + INIT_LIST_HEAD(&suspendAllocators); + INIT_LIST_HEAD(&suspend_modules); +} + +/* + * suspend_header_storage_for_modules + * + * Returns the amount of space needed to store configuration + * data needed by the modules prior to copying back the original + * kernel. We can exclude data for pageset2 because it will be + * available anyway once the kernel is copied back. + */ +int suspend_header_storage_for_modules(void) +{ + struct suspend_module_ops *this_module; + int bytes = 0; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled || + (this_module->type == WRITER_MODULE && + suspendActiveAllocator != this_module)) + continue; + if (this_module->storage_needed) { + int this = this_module->storage_needed() + + sizeof(struct suspend_module_header) + + sizeof(int); + this_module->header_requested = this; + bytes += this; + } + } + + /* One more for the empty terminator */ + return bytes + sizeof(struct suspend_module_header); +} + +/* + * suspend_memory_for_modules + * + * Returns the amount of memory requested by modules for + * doing their work during the cycle. 
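suspend_header_storage_for_modules() above reserves, for every enabled module that stores configuration (skipping writers other than the active allocator), the module's own storage_needed() plus one struct suspend_module_header (defined in modules.h further down) and one int, and finishes with a single empty header as a terminator. A worked example with two hypothetical modules follows; the names and byte counts are made up.

        /* Hypothetical accounting mirroring suspend_header_storage_for_modules(). */
        static int example_header_bytes(void)
        {
                int bytes = 0;

                /* "examplecompressor" asks for 256 bytes of config space */
                bytes += 256 + sizeof(struct suspend_module_header) + sizeof(int);

                /* "exampleswapwriter" (the active allocator) asks for 4096 bytes */
                bytes += 4096 + sizeof(struct suspend_module_header) + sizeof(int);

                /* plus the empty terminating header */
                return bytes + sizeof(struct suspend_module_header);
        }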
+ */ + +int suspend_memory_for_modules(void) +{ + int bytes = 0; + struct suspend_module_ops *this_module; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled) + continue; + if (this_module->memory_needed) + bytes += this_module->memory_needed(); + } + + return ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT); +} + +/* + * suspend_expected_compression_ratio + * + * Returns the compression ratio expected when saving the image. + */ + +int suspend_expected_compression_ratio(void) +{ + int ratio = 100; + struct suspend_module_ops *this_module; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled) + continue; + if (this_module->expected_compression) + ratio = ratio * this_module->expected_compression() / 100; + } + + return ratio; +} + +/* suspend_find_module_given_name + * Functionality : Return a module (if found), given a pointer + * to its name + */ + +struct suspend_module_ops *suspend_find_module_given_name(char *name) +{ + struct suspend_module_ops *this_module, *found_module = NULL; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!strcmp(name, this_module->name)) { + found_module = this_module; + break; + } + } + + return found_module; +} + +/* + * suspend_print_module_debug_info + * Functionality : Get debugging info from modules into a buffer. + */ +int suspend_print_module_debug_info(char *buffer, int buffer_size) +{ + struct suspend_module_ops *this_module; + int len = 0; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled) + continue; + if (this_module->print_debug_info) { + int result; + result = this_module->print_debug_info(buffer + len, + buffer_size - len); + len += result; + } + } + + /* Ensure null terminated */ + buffer[buffer_size] = 0; + + return len; +} + +/* + * suspend_register_module + * + * Register a module. + */ +int suspend_register_module(struct suspend_module_ops *module) +{ + int i; + struct kobject *kobj; + + if (!initialised) { + suspend_initialise_module_lists(); + initialised = 1; + } + + module->enabled = 1; + + if (suspend_find_module_given_name(module->name)) { + printk("Suspend2: Trying to load module %s," + " which is already registered.\n", + module->name); + return -EBUSY; + } + + switch (module->type) { + case FILTER_MODULE: + list_add_tail(&module->type_list, + &suspend_filters); + suspend_num_filters++; + break; + + case WRITER_MODULE: + list_add_tail(&module->type_list, + &suspendAllocators); + suspendNumAllocators++; + break; + + case MISC_MODULE: + break; + + default: + printk("Hmmm. Module '%s' has an invalid type." + " It has been ignored.\n", module->name); + return -EINVAL; + } + list_add_tail(&module->module_list, &suspend_modules); + suspend_num_modules++; + + if (module->directory || module->shared_directory) { + /* + * Modules may share a directory, but those with shared_dir + * set must be loaded (via symbol dependencies) after parents + * and unloaded beforehand. 
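The shared_directory handling that follows looks the parent up by module name, so a module wanting to share reuses the parent's ->name rather than its ->directory string, and must be registered after it. A hypothetical pair illustrating the two fields (names invented, most other fields omitted):

        /* Owns the sysfs directory. */
        static struct suspend_module_ops example_writer_ops = {
                .type      = WRITER_MODULE,
                .name      = "exampleblockwriter",
                .directory = "block_io",
        };

        /* Places its entries in the same directory; the string is the
         * parent module's ->name, and the parent must register first. */
        static struct suspend_module_ops example_helper_ops = {
                .type             = MISC_MODULE,
                .name             = "examplehelper",
                .shared_directory = "exampleblockwriter",
        };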
+ */ + if (module->shared_directory) { + struct suspend_module_ops *shared = + suspend_find_module_given_name(module->shared_directory); + if (!shared) { + printk("Suspend2: Module %s wants to share %s's directory but %s isn't loaded.\n", + module->name, + module->shared_directory, + module->shared_directory); + suspend_unregister_module(module); + return -ENODEV; + } + kobj = shared->dir_kobj; + } else + kobj = make_suspend2_sysdir(module->directory); + module->dir_kobj = kobj; + for (i=0; i < module->num_sysfs_entries; i++) { + int result = suspend_register_sysfs_file(kobj, &module->sysfs_data[i]); + if (result) + return result; + } + } + + printk("Suspend2 %s support registered.\n", module->name); + return 0; +} + +/* + * suspend_unregister_module + * + * Remove a module. + */ +void suspend_unregister_module(struct suspend_module_ops *module) +{ + int i; + + if (module->dir_kobj) + for (i=0; i < module->num_sysfs_entries; i++) + suspend_unregister_sysfs_file(module->dir_kobj, &module->sysfs_data[i]); + + if (!module->shared_directory && module->directory) + remove_suspend2_sysdir(module->dir_kobj); + + switch (module->type) { + case FILTER_MODULE: + list_del(&module->type_list); + suspend_num_filters--; + break; + + case WRITER_MODULE: + list_del(&module->type_list); + suspendNumAllocators--; + if (suspendActiveAllocator == module) { + suspendActiveAllocator = NULL; + clear_suspend_state(SUSPEND_CAN_RESUME); + clear_suspend_state(SUSPEND_CAN_SUSPEND); + } + break; + + case MISC_MODULE: + break; + + default: + printk("Hmmm. Module '%s' has an invalid type." + " It has been ignored.\n", module->name); + return; + } + list_del(&module->module_list); + suspend_num_modules--; + printk("Suspend2 %s module unloaded.\n", module->name); +} + +/* + * suspend_move_module_tail + * + * Rearrange modules when reloading the config. + */ +void suspend_move_module_tail(struct suspend_module_ops *module) +{ + switch (module->type) { + case FILTER_MODULE: + if (suspend_num_filters > 1) + list_move_tail(&module->type_list, + &suspend_filters); + break; + + case WRITER_MODULE: + if (suspendNumAllocators > 1) + list_move_tail(&module->type_list, + &suspendAllocators); + break; + + case MISC_MODULE: + break; + default: + printk("Hmmm. Module '%s' has an invalid type." + " It has been ignored.\n", module->name); + return; + } + if ((suspend_num_filters + suspendNumAllocators) > 1) + list_move_tail(&module->module_list, &suspend_modules); +} + +/* + * suspend_initialise_modules + * + * Get ready to do some work! + */ +int suspend_initialise_modules(int starting_cycle) +{ + struct suspend_module_ops *this_module; + int result; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + this_module->header_requested = 0; + this_module->header_used = 0; + if (!this_module->enabled) + continue; + if (this_module->initialise) { + suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1, + "Initialising module %s.\n", + this_module->name); + if ((result = this_module->initialise(starting_cycle))) { + printk("%s didn't initialise okay.\n", + this_module->name); + return result; + } + } + } + + return 0; +} + +/* + * suspend_cleanup_modules + * + * Tell modules the work is done. 
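suspend_expected_compression_ratio() above composes the per-module expectations multiplicatively, and main_storage_needed() later scales the raw page count by the result. For example, with two hypothetical enabled filters expecting 60% and 80%:

        int ratio = 100;
        ratio = ratio * 60 / 100;       /* first filter:  -> 60 */
        ratio = ratio * 80 / 100;       /* second filter: -> 48 */
        /* i.e. storage is budgeted at 48% of the uncompressed image size */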
+ */ +void suspend_cleanup_modules(int finishing_cycle) +{ + struct suspend_module_ops *this_module; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!this_module->enabled) + continue; + if (this_module->cleanup) { + suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1, + "Cleaning up module %s.\n", + this_module->name); + this_module->cleanup(finishing_cycle); + } + } +} + +/* + * suspend_get_next_filter + * + * Get the next filter in the pipeline. + */ +struct suspend_module_ops *suspend_get_next_filter(struct suspend_module_ops *filter_sought) +{ + struct suspend_module_ops *last_filter = NULL, *this_filter = NULL; + + list_for_each_entry(this_filter, &suspend_filters, type_list) { + if (!this_filter->enabled) + continue; + if ((last_filter == filter_sought) || (!filter_sought)) + return this_filter; + last_filter = this_filter; + } + + return suspendActiveAllocator; +} + +/* suspend_get_modules + * + * Take a reference to modules so they can't go away under us. + */ + +int suspend_get_modules(void) +{ + struct suspend_module_ops *this_module; + + list_for_each_entry(this_module, &suspend_modules, module_list) { + if (!try_module_get(this_module->module)) { + /* Failed! Reverse gets and return error */ + struct suspend_module_ops *this_module2; + list_for_each_entry(this_module2, &suspend_modules, module_list) { + if (this_module == this_module2) + return -EINVAL; + module_put(this_module2->module); + } + } + } + + return 0; +} + +/* suspend_put_modules + * + * Release our references to modules we used. + */ + +void suspend_put_modules(void) +{ + struct suspend_module_ops *this_module; + + list_for_each_entry(this_module, &suspend_modules, module_list) + module_put(this_module->module); +} + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(suspend_register_module); +EXPORT_SYMBOL_GPL(suspend_unregister_module); +EXPORT_SYMBOL_GPL(suspend_get_next_filter); +EXPORT_SYMBOL_GPL(suspendActiveAllocator); +#endif diff --git a/kernel/power/modules.h b/kernel/power/modules.h new file mode 100644 index 0000000..ed94458 --- /dev/null +++ b/kernel/power/modules.h @@ -0,0 +1,164 @@ +/* + * kernel/power/modules.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * It contains declarations for modules. Modules are additions to + * suspend2 that provide facilities such as image compression or + * encryption, backends for storage of the image and user interfaces. + * + */ + +#ifndef SUSPEND_MODULES_H +#define SUSPEND_MODULES_H + +/* This is the maximum size we store in the image header for a module name */ +#define SUSPEND_MAX_MODULE_NAME_LENGTH 30 + +/* Per-module metadata */ +struct suspend_module_header { + char name[SUSPEND_MAX_MODULE_NAME_LENGTH]; + int enabled; + int type; + int index; + int data_length; + unsigned long signature; +}; + +enum { + FILTER_MODULE, + WRITER_MODULE, + MISC_MODULE /* Block writer, eg. */ +}; + +enum { + SUSPEND_ASYNC, + SUSPEND_SYNC +}; + +struct suspend_module_ops { + /* Functions common to all modules */ + int type; + char *name; + char *directory; + char *shared_directory; + struct kobject *dir_kobj; + struct module *module; + int enabled; + struct list_head module_list; + + /* List of filters or allocators */ + struct list_head list, type_list; + + /* + * Requirements for memory and storage in + * the image header.. 
+ */ + int (*memory_needed) (void); + int (*storage_needed) (void); + + int header_requested, header_used; + + int (*expected_compression) (void); + + /* + * Debug info + */ + int (*print_debug_info) (char *buffer, int size); + int (*save_config_info) (char *buffer); + void (*load_config_info) (char *buffer, int len); + + /* + * Initialise & cleanup - general routines called + * at the start and end of a cycle. + */ + int (*initialise) (int starting_cycle); + void (*cleanup) (int finishing_cycle); + + /* + * Calls for allocating storage (allocators only). + * + * Header space is allocated separately. Note that allocation + * of space for the header might result in allocated space + * being stolen from the main pool if there is no unallocated + * space. We have to be able to allocate enough space for + * the header. We can eat memory to ensure there is enough + * for the main pool. + */ + + int (*storage_available) (void); + int (*allocate_header_space) (int space_requested); + int (*allocate_storage) (int space_requested); + int (*storage_allocated) (void); + int (*release_storage) (void); + + /* + * Routines used in image I/O. + */ + int (*rw_init) (int rw, int stream_number); + int (*rw_cleanup) (int rw); + int (*write_chunk) (unsigned long index, struct page *buffer_page, + unsigned int buf_size); + int (*read_chunk) (unsigned long *index, struct page *buffer_page, + unsigned int *buf_size, int sync); + + /* Reset module if image exists but reading aborted */ + void (*noresume_reset) (void); + + /* Read and write the metadata */ + int (*write_header_init) (void); + int (*write_header_cleanup) (void); + + int (*read_header_init) (void); + int (*read_header_cleanup) (void); + + int (*rw_header_chunk) (int rw, struct suspend_module_ops *owner, + char *buffer_start, int buffer_size); + + /* Attempt to parse an image location */ + int (*parse_sig_location) (char *buffer, int only_writer, int quiet); + + /* Determine whether image exists that we can restore */ + int (*image_exists) (void); + + /* Mark the image as having tried to resume */ + void (*mark_resume_attempted) (int); + + /* Destroy image if one exists */ + int (*invalidate_image) (void); + + /* Sysfs Data */ + struct suspend_sysfs_data *sysfs_data; + int num_sysfs_entries; +}; + +extern int suspend_num_modules, suspendNumAllocators; + +extern struct suspend_module_ops *suspendActiveAllocator; +extern struct list_head suspend_filters, suspendAllocators, suspend_modules; + +extern void suspend_prepare_console_modules(void); +extern void suspend_cleanup_console_modules(void); + +extern struct suspend_module_ops *suspend_find_module_given_name(char *name); +extern struct suspend_module_ops *suspend_get_next_filter(struct suspend_module_ops *); + +extern int suspend_register_module(struct suspend_module_ops *module); +extern void suspend_move_module_tail(struct suspend_module_ops *module); + +extern int suspend_header_storage_for_modules(void); +extern int suspend_memory_for_modules(void); +extern int suspend_expected_compression_ratio(void); + +extern int suspend_print_module_debug_info(char *buffer, int buffer_size); +extern int suspend_register_module(struct suspend_module_ops *module); +extern void suspend_unregister_module(struct suspend_module_ops *module); + +extern int suspend_initialise_modules(int starting_cycle); +extern void suspend_cleanup_modules(int finishing_cycle); + +int suspend_get_modules(void); +void suspend_put_modules(void); +#endif diff --git a/kernel/power/netlink.c b/kernel/power/netlink.c new file mode 100644 
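Taken together, the ops structure and registration calls above are all an add-on needs in order to plug into these lists. A minimal, hypothetical filter skeleton might look like the sketch below; every "example" identifier is invented, and a real filter would also implement the rw_init/write_chunk/read_chunk hooks to actually transform data.

        #include <linux/module.h>
        #include "modules.h"

        static int example_initialise(int starting_cycle)
        {
                return 0;       /* allocate any per-cycle buffers here */
        }

        static void example_cleanup(int finishing_cycle)
        {
        }

        static int example_expected_compression(void)
        {
                return 70;      /* expect output at roughly 70% of input size */
        }

        static struct suspend_module_ops example_filter_ops = {
                .type                   = FILTER_MODULE,
                .name                   = "examplefilter",
                .module                 = THIS_MODULE,
                .initialise             = example_initialise,
                .cleanup                = example_cleanup,
                .expected_compression   = example_expected_compression,
        };

        static int __init example_filter_load(void)
        {
                return suspend_register_module(&example_filter_ops);
        }

        static void __exit example_filter_unload(void)
        {
                suspend_unregister_module(&example_filter_ops);
        }

        module_init(example_filter_load);
        module_exit(example_filter_unload);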
index 0000000..bb1d563 --- /dev/null +++ b/kernel/power/netlink.c @@ -0,0 +1,387 @@ +/* + * kernel/power/netlink.c + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Functions for communicating with a userspace helper via netlink. + */ + + +#include +#include "netlink.h" +#include "suspend.h" +#include "modules.h" + +struct user_helper_data *uhd_list = NULL; + +/* + * Refill our pool of SKBs for use in emergencies (eg, when eating memory and none + * can be allocated). + */ +static void suspend_fill_skb_pool(struct user_helper_data *uhd) +{ + while (uhd->pool_level < uhd->pool_limit) { + struct sk_buff *new_skb = + alloc_skb(NLMSG_SPACE(uhd->skb_size), GFP_ATOMIC); + + if (!new_skb) + break; + + new_skb->next = uhd->emerg_skbs; + uhd->emerg_skbs = new_skb; + uhd->pool_level++; + } +} + +/* + * Try to allocate a single skb. If we can't get one, try to use one from + * our pool. + */ +static struct sk_buff *suspend_get_skb(struct user_helper_data *uhd) +{ + struct sk_buff *skb = + alloc_skb(NLMSG_SPACE(uhd->skb_size), GFP_ATOMIC); + + if (skb) + return skb; + + skb = uhd->emerg_skbs; + if (skb) { + uhd->pool_level--; + uhd->emerg_skbs = skb->next; + skb->next = NULL; + } + + return skb; +} + +static void put_skb(struct user_helper_data *uhd, struct sk_buff *skb) +{ + if (uhd->pool_level < uhd->pool_limit) { + skb->next = uhd->emerg_skbs; + uhd->emerg_skbs = skb; + } else + kfree_skb(skb); +} + +void suspend_send_netlink_message(struct user_helper_data *uhd, + int type, void* params, size_t len) +{ + struct sk_buff *skb; + struct nlmsghdr *nlh; + void *dest; + struct task_struct *t; + + if (uhd->pid == -1) + return; + + skb = suspend_get_skb(uhd); + if (!skb) { + printk("suspend_netlink: Can't allocate skb!\n"); + return; + } + + /* NLMSG_PUT contains a hidden goto nlmsg_failure */ + nlh = NLMSG_PUT(skb, 0, uhd->sock_seq, type, len); + uhd->sock_seq++; + + dest = NLMSG_DATA(nlh); + if (params && len > 0) + memcpy(dest, params, len); + + netlink_unicast(uhd->nl, skb, uhd->pid, 0); + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(uhd->pid)) == NULL) { + read_unlock(&tasklist_lock); + if (uhd->pid > -1) + printk("Hmm. Can't find the userspace task %d.\n", uhd->pid); + return; + } + wake_up_process(t); + read_unlock(&tasklist_lock); + + yield(); + + return; + +nlmsg_failure: + if (skb) + put_skb(uhd, skb); +} + +static void send_whether_debugging(struct user_helper_data *uhd) +{ + static int is_debugging = 1; + + suspend_send_netlink_message(uhd, NETLINK_MSG_IS_DEBUGGING, + &is_debugging, sizeof(int)); +} + +/* + * Set the PF_NOFREEZE flag on the given process to ensure it can run whilst we + * are suspending. + */ +static int nl_set_nofreeze(struct user_helper_data *uhd, int pid) +{ + struct task_struct *t; + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(pid)) == NULL) { + read_unlock(&tasklist_lock); + printk("Strange. Can't find the userspace task %d.\n", pid); + return -EINVAL; + } + + t->flags |= PF_NOFREEZE; + + read_unlock(&tasklist_lock); + uhd->pid = pid; + + suspend_send_netlink_message(uhd, NETLINK_MSG_NOFREEZE_ACK, NULL, 0); + + return 0; +} + +/* + * Called when the userspace process has informed us that it's ready to roll. + */ +static int nl_ready(struct user_helper_data *uhd, int version) +{ + if (version != uhd->interface_version) { + printk("%s userspace process using invalid interface version." 
+ " Trying to continue without it.\n", + uhd->name); + if (uhd->not_ready) + uhd->not_ready(); + return 1; + } + + complete(&uhd->wait_for_process); + + return 0; +} + +void suspend_netlink_close_complete(struct user_helper_data *uhd) +{ + if (uhd->nl) { + sock_release(uhd->nl->sk_socket); + uhd->nl = NULL; + } + + while (uhd->emerg_skbs) { + struct sk_buff *next = uhd->emerg_skbs->next; + kfree_skb(uhd->emerg_skbs); + uhd->emerg_skbs = next; + } + + uhd->pid = -1; + + suspend_put_modules(); +} + +static int suspend_nl_gen_rcv_msg(struct user_helper_data *uhd, + struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + int err; + + /* Let the more specific handler go first. It returns + * 1 for valid messages that it doesn't know. */ + if ((err = uhd->rcv_msg(skb, nlh)) != 1) + return err; + + type = nlh->nlmsg_type; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && uhd->pid != -1) { + printk("Received extra nofreeze me requests.\n"); + return -EBUSY; + } + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case NETLINK_MSG_NOFREEZE_ME: + if ((err = nl_set_nofreeze(uhd, nlh->nlmsg_pid)) != 0) + return err; + break; + case NETLINK_MSG_GET_DEBUGGING: + send_whether_debugging(uhd); + break; + case NETLINK_MSG_READY: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) { + printk("Invalid ready mesage.\n"); + return -EINVAL; + } + if ((err = nl_ready(uhd, *data)) != 0) + return err; + break; + case NETLINK_MSG_CLEANUP: + suspend_netlink_close_complete(uhd); + break; + } + + return 0; +} + +static void suspend_user_rcv_skb(struct user_helper_data *uhd, + struct sk_buff *skb) +{ + int err; + struct nlmsghdr *nlh; + + while (skb->len >= NLMSG_SPACE(0)) { + u32 rlen; + + nlh = (struct nlmsghdr *) skb->data; + if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len) + return; + + rlen = NLMSG_ALIGN(nlh->nlmsg_len); + if (rlen > skb->len) + rlen = skb->len; + + if ((err = suspend_nl_gen_rcv_msg(uhd, skb, nlh)) != 0) + netlink_ack(skb, nlh, err); + else if (nlh->nlmsg_flags & NLM_F_ACK) + netlink_ack(skb, nlh, 0); + skb_pull(skb, rlen); + } +} + +static void suspend_netlink_input(struct sock *sk, int len) +{ + struct user_helper_data *uhd = uhd_list; + + while (uhd && uhd->netlink_id != sk->sk_protocol) + uhd= uhd->next; + + do { + struct sk_buff *skb; + while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) { + suspend_user_rcv_skb(uhd, skb); + put_skb(uhd, skb); + } + } while (uhd->nl && uhd->nl->sk_receive_queue.qlen); +} + +static int netlink_prepare(struct user_helper_data *uhd) +{ + suspend_get_modules(); + + uhd->next = uhd_list; + uhd_list = uhd; + + uhd->sock_seq = 0x42c0ffee; + uhd->nl = netlink_kernel_create(uhd->netlink_id, 0, + suspend_netlink_input, THIS_MODULE); + if (!uhd->nl) { + printk("Failed to allocate netlink socket for %s.\n", + uhd->name); + return -ENOMEM; + } + + suspend_fill_skb_pool(uhd); + + return 0; +} + +void suspend_netlink_close(struct user_helper_data *uhd) +{ + struct task_struct *t; + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(uhd->pid))) + t->flags &= ~PF_NOFREEZE; + read_unlock(&tasklist_lock); + + suspend_send_netlink_message(uhd, NETLINK_MSG_CLEANUP, NULL, 0); +} + +static int suspend2_launch_userspace_program(char *command, int channel_no) +{ + int retval; + static char *envp[] = { + "HOME=/", + "TERM=linux", + "PATH=/sbin:/usr/sbin:/bin:/usr/bin", + NULL }; + static char *argv[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }; + char *channel = kmalloc(6, 
GFP_KERNEL); + int arg = 0, size; + char test_read[255]; + char *orig_posn = command; + + if (!strlen(orig_posn)) + return 1; + + /* Up to 7 args supported */ + while (arg < 7) { + sscanf(orig_posn, "%s", test_read); + size = strlen(test_read); + if (!(size)) + break; + argv[arg] = kmalloc(size + 1, GFP_ATOMIC); + strcpy(argv[arg], test_read); + orig_posn += size + 1; + *test_read = 0; + arg++; + } + + if (channel_no) { + sprintf(channel, "-c%d", channel_no); + argv[arg] = channel; + } else + arg--; + + retval = call_usermodehelper(argv[0], argv, envp, 0); + + if (retval) + printk("Failed to launch userspace program '%s': Error %d\n", + command, retval); + + { + int i; + for (i = 0; i < arg; i++) + if (argv[i] && argv[i] != channel) + kfree(argv[i]); + } + + kfree(channel); + + return retval; +} + +int suspend_netlink_setup(struct user_helper_data *uhd) +{ + if (netlink_prepare(uhd) < 0) { + printk("Netlink prepare failed.\n"); + return 1; + } + + if (suspend2_launch_userspace_program(uhd->program, uhd->netlink_id) < 0) { + printk("Launch userspace program failed.\n"); + suspend_netlink_close_complete(uhd); + return 1; + } + + /* Wait 2 seconds for the userspace process to make contact */ + wait_for_completion_timeout(&uhd->wait_for_process, 2*HZ); + + if (uhd->pid == -1) { + printk("%s: Failed to contact userspace process.\n", + uhd->name); + suspend_netlink_close_complete(uhd); + return 1; + } + + return 0; +} + +EXPORT_SYMBOL_GPL(suspend_netlink_setup); +EXPORT_SYMBOL_GPL(suspend_netlink_close); +EXPORT_SYMBOL_GPL(suspend_send_netlink_message); diff --git a/kernel/power/netlink.h b/kernel/power/netlink.h new file mode 100644 index 0000000..97647e8 --- /dev/null +++ b/kernel/power/netlink.h @@ -0,0 +1,58 @@ +/* + * kernel/power/netlink.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Declarations for functions for communicating with a userspace helper + * via netlink. 
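On the other side of suspend_netlink_setup() above, the helper launched with "-c<netlink_id>" is expected to open a netlink socket on that protocol number and announce itself with NETLINK_MSG_READY (carrying its interface version) before the two-second timeout expires; it may then send NETLINK_MSG_NOFREEZE_ME to be exempted from freezing. A hypothetical minimal userspace sketch of the READY message follows; the message value comes from netlink.h below, everything else is an assumption.

        #include <sys/socket.h>
        #include <linux/netlink.h>
        #include <string.h>
        #include <unistd.h>

        #define NETLINK_MSG_READY 0x10          /* from kernel/power/netlink.h */

        /* protocol: the number passed to the helper as -c<n>.
         * version:  must match the kernel side's uhd->interface_version.   */
        static int send_ready(int protocol, int version)
        {
                struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
                struct {
                        struct nlmsghdr nlh;
                        int             version;
                } req;
                int sock = socket(AF_NETLINK, SOCK_RAW, protocol);

                if (sock < 0)
                        return -1;

                addr.nl_pid = getpid();
                if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)))
                        return -1;

                memset(&req, 0, sizeof(req));
                req.nlh.nlmsg_len  = NLMSG_LENGTH(sizeof(int));
                req.nlh.nlmsg_type = NETLINK_MSG_READY;
                req.nlh.nlmsg_pid  = getpid();
                req.version        = version;

                return send(sock, &req, req.nlh.nlmsg_len, 0) < 0 ? -1 : 0;
        }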
+ */ + +#include +#include + +#define NETLINK_MSG_BASE 0x10 + +#define NETLINK_MSG_READY 0x10 +#define NETLINK_MSG_NOFREEZE_ME 0x16 +#define NETLINK_MSG_GET_DEBUGGING 0x19 +#define NETLINK_MSG_CLEANUP 0x24 +#define NETLINK_MSG_NOFREEZE_ACK 0x27 +#define NETLINK_MSG_IS_DEBUGGING 0x28 + +struct user_helper_data { + int (*rcv_msg) (struct sk_buff *skb, struct nlmsghdr *nlh); + void (* not_ready) (void); + struct sock *nl; + u32 sock_seq; + pid_t pid; + char *comm; + char program[256]; + int pool_level; + int pool_limit; + struct sk_buff *emerg_skbs; + int skb_size; + int netlink_id; + char *name; + struct user_helper_data *next; + struct completion wait_for_process; + int interface_version; + int must_init; +}; + +#ifdef CONFIG_NET +int suspend_netlink_setup(struct user_helper_data *uhd); +void suspend_netlink_close(struct user_helper_data *uhd); +void suspend_send_netlink_message(struct user_helper_data *uhd, + int type, void* params, size_t len); +#else +static inline int suspend_netlink_setup(struct user_helper_data *uhd) +{ + return 0; +} + +static inline void suspend_netlink_close(struct user_helper_data *uhd) { }; +static inline void suspend_send_netlink_message(struct user_helper_data *uhd, + int type, void* params, size_t len) { }; +#endif diff --git a/kernel/power/pagedir.c b/kernel/power/pagedir.c new file mode 100644 index 0000000..4e74689 --- /dev/null +++ b/kernel/power/pagedir.c @@ -0,0 +1,480 @@ +/* + * kernel/power/pagedir.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Routines for handling pagesets. + * Note that pbes aren't actually stored as such. They're stored as + * bitmaps and extents. + */ + +#include +#include +#include +#include +#include +#include + +#include "pageflags.h" +#include "ui.h" +#include "pagedir.h" +#include "prepare_image.h" +#include "suspend.h" +#include "power.h" +#include "suspend2_builtin.h" + +#define PAGESET1 0 +#define PAGESET2 1 + +static int ps2_pfn; + +/* + * suspend_mark_task_as_pageset + * Functionality : Marks all the saveable pages belonging to a given process + * as belonging to a particular pageset. 
+ */ + +static void suspend_mark_task_as_pageset(struct task_struct *t, int pageset2) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + + mm = t->active_mm; + + if (!mm || !mm->mmap) return; + + if (!irqs_disabled()) + down_read(&mm->mmap_sem); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + unsigned long posn; + + if (vma->vm_flags & (VM_PFNMAP | VM_IO | VM_RESERVED)) { + printk("Skipping vma %p in process %d (%s) which has " + "VM_PFNMAP | VM_IO | VM_RESERVED (%lx).\n", vma, + t->pid, t->comm, vma->vm_flags); + continue; + } + + if (!vma->vm_start) + continue; + + for (posn = vma->vm_start; posn < vma->vm_end; + posn += PAGE_SIZE) { + struct page *page = follow_page(vma, posn, 0); + if (!page) + continue; + + if (pageset2) + SetPagePageset2(page); + else { + ClearPagePageset2(page); + SetPagePageset1(page); + } + } + } + + if (!irqs_disabled()) + up_read(&mm->mmap_sem); +} + +static void pageset2_full(void) +{ + struct zone *zone; + unsigned long flags; + + for_each_zone(zone) { + spin_lock_irqsave(&zone->lru_lock, flags); + if (zone_page_state(zone, NR_INACTIVE)) { + struct page *page; + list_for_each_entry(page, &zone->inactive_list, lru) + SetPagePageset2(page); + } + if (zone_page_state(zone, NR_ACTIVE)) { + struct page *page; + list_for_each_entry(page, &zone->active_list, lru) + SetPagePageset2(page); + } + spin_unlock_irqrestore(&zone->lru_lock, flags); + } +} + +/* mark_pages_for_pageset2 + * + * Description: Mark unshared pages in processes not needed for suspend as + * being able to be written out in a separate pagedir. + * HighMem pages are simply marked as pageset2. They won't be + * needed during suspend. + */ + +struct attention_list { + struct task_struct *task; + struct attention_list *next; +}; + +void suspend_mark_pages_for_pageset2(void) +{ + struct task_struct *p; + struct attention_list *attention_list = NULL, *last = NULL, *next; + int i, task_count = 0; + + if (test_action_state(SUSPEND_NO_PAGESET2)) + return; + + clear_dyn_pageflags(pageset2_map); + + if (test_action_state(SUSPEND_PAGESET2_FULL)) + pageset2_full(); + else { + read_lock(&tasklist_lock); + for_each_process(p) { + if (!p->mm || (p->flags & PF_BORROWED_MM)) + continue; + + suspend_mark_task_as_pageset(p, PAGESET2); + } + read_unlock(&tasklist_lock); + } + + /* + * Now we count all userspace process (with task->mm) marked PF_NOFREEZE. + */ + read_lock(&tasklist_lock); + for_each_process(p) + if ((p->flags & PF_NOFREEZE) || p == current) + task_count++; + read_unlock(&tasklist_lock); + + /* + * Allocate attention list structs. + */ + for (i = 0; i < task_count; i++) { + struct attention_list *this = + kmalloc(sizeof(struct attention_list), GFP_ATOMIC); + if (!this) { + printk("Failed to allocate slab for attention list.\n"); + set_result_state(SUSPEND_ABORTED); + goto free_attention_list; + } + this->next = NULL; + if (attention_list) { + last->next = this; + last = this; + } else + attention_list = last = this; + } + + next = attention_list; + read_lock(&tasklist_lock); + for_each_process(p) + if ((p->flags & PF_NOFREEZE) || p == current) { + next->task = p; + next = next->next; + } + read_unlock(&tasklist_lock); + + /* + * Because the tasks in attention_list are ones related to suspending, + * we know that they won't go away under us. 
+ */ + +free_attention_list: + while (attention_list) { + if (!test_result_state(SUSPEND_ABORTED)) + suspend_mark_task_as_pageset(attention_list->task, PAGESET1); + last = attention_list; + attention_list = attention_list->next; + kfree(last); + } +} + +void suspend_reset_alt_image_pageset2_pfn(void) +{ + ps2_pfn = max_pfn + 1; +} + +static struct page *first_conflicting_page; + +/* + * free_conflicting_pages + */ + +void free_conflicting_pages(void) +{ + while (first_conflicting_page) { + struct page *next = *((struct page **) kmap(first_conflicting_page)); + kunmap(first_conflicting_page); + __free_page(first_conflicting_page); + first_conflicting_page = next; + } +} + +/* __suspend_get_nonconflicting_page + * + * Description: Gets order zero pages that won't be overwritten + * while copying the original pages. + */ + +struct page * ___suspend_get_nonconflicting_page(int can_be_highmem) +{ + struct page *page; + int flags = GFP_ATOMIC | __GFP_NOWARN | __GFP_ZERO; + if (can_be_highmem) + flags |= __GFP_HIGHMEM; + + + if (test_suspend_state(SUSPEND_LOADING_ALT_IMAGE) && pageset2_map && + (ps2_pfn < (max_pfn + 2))) { + /* + * ps2_pfn = max_pfn + 1 when yet to find first ps2 pfn that can + * be used. + * = 0..max_pfn when going through list. + * = max_pfn + 2 when gone through whole list. + */ + do { + ps2_pfn = get_next_bit_on(pageset2_map, ps2_pfn); + if (ps2_pfn <= max_pfn) { + page = pfn_to_page(ps2_pfn); + if (!PagePageset1(page) && + (can_be_highmem || !PageHighMem(page))) + return page; + } else + ps2_pfn++; + } while (ps2_pfn < max_pfn); + } + + do { + page = alloc_pages(flags, 0); + if (!page) { + printk("Failed to get nonconflicting page.\n"); + return 0; + } + if (PagePageset1(page)) { + struct page **next = (struct page **) kmap(page); + *next = first_conflicting_page; + first_conflicting_page = page; + kunmap(page); + } + } while(PagePageset1(page)); + + return page; +} + +unsigned long __suspend_get_nonconflicting_page(void) +{ + struct page *page = ___suspend_get_nonconflicting_page(0); + return page ? (unsigned long) page_address(page) : 0; +} + +struct pbe *get_next_pbe(struct page **page_ptr, struct pbe *this_pbe, int highmem) +{ + if (((((unsigned long) this_pbe) & (PAGE_SIZE - 1)) + + 2 * sizeof(struct pbe)) > PAGE_SIZE) { + struct page *new_page = + ___suspend_get_nonconflicting_page(highmem); + if (!new_page) + return ERR_PTR(-ENOMEM); + this_pbe = (struct pbe *) kmap(new_page); + memset(this_pbe, 0, PAGE_SIZE); + *page_ptr = new_page; + } else + this_pbe++; + + return this_pbe; +} + +/* get_pageset1_load_addresses + * + * Description: We check here that pagedir & pages it points to won't collide + * with pages where we're going to restore from the loaded pages + * later. + * Returns: Zero on success, one if couldn't find enough pages (shouldn't + * happen). 
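get_next_pbe() above packs the chain of page backup entries into the non-conflicting pages it allocates, moving to a fresh page once fewer than two entries would fit in the current one. As a rough worked example of the density (assuming struct pbe is the usual three-pointer swsusp layout; PBES_PER_PAGE in power.h encodes the same arithmetic):

        /* PBES_PER_PAGE = PAGE_SIZE / sizeof(struct pbe)
         *   64-bit, 4 KB pages: 4096 / 24 = 170 pbes per metadata page
         *   32-bit, 4 KB pages: 4096 / 12 = 341 pbes per metadata page */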
+ */ + +int suspend_get_pageset1_load_addresses(void) +{ + int pfn, highallocd = 0, lowallocd = 0; + int low_needed = pagedir1.size - get_highmem_size(pagedir1); + int high_needed = get_highmem_size(pagedir1); + int low_pages_for_highmem = 0; + unsigned long flags = GFP_ATOMIC | __GFP_NOWARN | __GFP_HIGHMEM; + struct page *page, *high_pbe_page = NULL, *last_high_pbe_page = NULL, + *low_pbe_page; + struct pbe **last_low_pbe_ptr = &restore_pblist, + **last_high_pbe_ptr = &restore_highmem_pblist, + *this_low_pbe = NULL, *this_high_pbe = NULL; + int orig_low_pfn = max_pfn + 1, orig_high_pfn = max_pfn + 1; + int high_pbes_done=0, low_pbes_done=0; + int low_direct = 0, high_direct = 0; + int high_to_free, low_to_free; + + /* First, allocate pages for the start of our pbe lists. */ + if (high_needed) { + high_pbe_page = ___suspend_get_nonconflicting_page(1); + if (!high_pbe_page) + return 1; + this_high_pbe = (struct pbe *) kmap(high_pbe_page); + memset(this_high_pbe, 0, PAGE_SIZE); + } + + low_pbe_page = ___suspend_get_nonconflicting_page(0); + if (!low_pbe_page) + return 1; + this_low_pbe = (struct pbe *) page_address(low_pbe_page); + + /* + * Next, allocate all possible memory to find where we can + * load data directly into destination pages. I'd like to do + * this in bigger chunks, but then we can't free pages + * individually later. + */ + + do { + page = alloc_pages(flags, 0); + if (page) + SetPagePageset1Copy(page); + } while (page); + + /* + * Find out how many high- and lowmem pages we allocated above, + * and how many pages we can reload directly to their original + * location. + */ + BITMAP_FOR_EACH_SET(pageset1_copy_map, pfn) { + int is_high; + page = pfn_to_page(pfn); + is_high = PageHighMem(page); + + if (PagePageset1(page)) { + if (test_action_state(SUSPEND_NO_DIRECT_LOAD)) { + ClearPagePageset1Copy(page); + __free_page(page); + continue; + } else { + if (is_high) + high_direct++; + else + low_direct++; + } + } else { + if (is_high) + highallocd++; + else + lowallocd++; + } + } + + high_needed-= high_direct; + low_needed-= low_direct; + + /* + * Do we need to use some lowmem pages for the copies of highmem + * pages? + */ + if (high_needed > highallocd) { + low_pages_for_highmem = high_needed - highallocd; + high_needed -= low_pages_for_highmem; + low_needed += low_pages_for_highmem; + } + + high_to_free = highallocd - high_needed; + low_to_free = lowallocd - low_needed; + + /* + * Now generate our pbes (which will be used for the atomic restore, + * and free unneeded pages. + */ + BITMAP_FOR_EACH_SET(pageset1_copy_map, pfn) { + int is_high; + page = pfn_to_page(pfn); + is_high = PageHighMem(page); + + if (PagePageset1(page)) + continue; + + /* Free the page? */ + if ((is_high && high_to_free) || + (!is_high && low_to_free)) { + ClearPagePageset1Copy(page); + __free_page(page); + if (is_high) + high_to_free--; + else + low_to_free--; + continue; + } + + /* Nope. We're going to use this page. Add a pbe. 
*/ + if (is_high || low_pages_for_highmem) { + struct page *orig_page; + high_pbes_done++; + if (!is_high) + low_pages_for_highmem--; + do { + orig_high_pfn = get_next_bit_on(pageset1_map, + orig_high_pfn); + BUG_ON(orig_high_pfn > max_pfn); + orig_page = pfn_to_page(orig_high_pfn); + } while(!PageHighMem(orig_page) || load_direct(orig_page)); + + this_high_pbe->orig_address = orig_page; + this_high_pbe->address = page; + this_high_pbe->next = NULL; + if (last_high_pbe_page != high_pbe_page) { + *last_high_pbe_ptr = (struct pbe *) high_pbe_page; + if (!last_high_pbe_page) + last_high_pbe_page = high_pbe_page; + } else + *last_high_pbe_ptr = this_high_pbe; + last_high_pbe_ptr = &this_high_pbe->next; + if (last_high_pbe_page != high_pbe_page) { + kunmap(last_high_pbe_page); + last_high_pbe_page = high_pbe_page; + } + this_high_pbe = get_next_pbe(&high_pbe_page, this_high_pbe, 1); + if (IS_ERR(this_high_pbe)) { + printk("This high pbe is an error.\n"); + return -ENOMEM; + } + } else { + struct page *orig_page; + low_pbes_done++; + do { + orig_low_pfn = get_next_bit_on(pageset1_map, + orig_low_pfn); + BUG_ON(orig_low_pfn > max_pfn); + orig_page = pfn_to_page(orig_low_pfn); + } while(PageHighMem(orig_page) || load_direct(orig_page)); + + this_low_pbe->orig_address = page_address(orig_page); + this_low_pbe->address = page_address(page); + this_low_pbe->next = NULL; + *last_low_pbe_ptr = this_low_pbe; + last_low_pbe_ptr = &this_low_pbe->next; + this_low_pbe = get_next_pbe(&low_pbe_page, this_low_pbe, 0); + if (IS_ERR(this_low_pbe)) { + printk("this_low_pbe is an error.\n"); + return -ENOMEM; + } + } + } + + if (high_pbe_page) + kunmap(high_pbe_page); + + if (last_high_pbe_page != high_pbe_page) { + if (last_high_pbe_page) + kunmap(last_high_pbe_page); + __free_page(high_pbe_page); + } + + free_conflicting_pages(); + + return 0; +} diff --git a/kernel/power/pagedir.h b/kernel/power/pagedir.h new file mode 100644 index 0000000..2ae395f --- /dev/null +++ b/kernel/power/pagedir.h @@ -0,0 +1,51 @@ +/* + * kernel/power/pagedir.h + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Declarations for routines for handling pagesets. + */ + +#ifndef KERNEL_POWER_PAGEDIR_H +#define KERNEL_POWER_PAGEDIR_H + +/* Pagedir + * + * Contains the metadata for a set of pages saved in the image. 
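The two chains built above (restore_pblist for lowmem, restore_highmem_pblist for highmem) record, for every page that could not be loaded straight into place, where its copy was parked and where it finally belongs. They are consumed by the atomic-restore code, which for the lowmem case amounts to a loop along these lines (a sketch of the consumer, not code from this patch):

        struct pbe *p;

        for (p = restore_pblist; p; p = p->next)
                copy_page(p->orig_address, p->address);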
+ */ + +struct pagedir { + int id; + int size; +#ifdef CONFIG_HIGHMEM + int size_high; +#endif +}; + +#ifdef CONFIG_HIGHMEM +#define get_highmem_size(pagedir) (pagedir.size_high) +#define set_highmem_size(pagedir, sz) do { pagedir.size_high = sz; } while(0) +#define inc_highmem_size(pagedir) do { pagedir.size_high++; } while(0) +#define get_lowmem_size(pagedir) (pagedir.size - pagedir.size_high) +#else +#define get_highmem_size(pagedir) (0) +#define set_highmem_size(pagedir, sz) do { } while(0) +#define inc_highmem_size(pagedir) do { } while(0) +#define get_lowmem_size(pagedir) (pagedir.size) +#endif + +extern struct pagedir pagedir1, pagedir2; + +extern void suspend_copy_pageset1(void); + +extern void suspend_mark_pages_for_pageset2(void); + +extern int suspend_get_pageset1_load_addresses(void); + +extern unsigned long __suspend_get_nonconflicting_page(void); +struct page * ___suspend_get_nonconflicting_page(int can_be_highmem); + +extern void suspend_reset_alt_image_pageset2_pfn(void); +#endif diff --git a/kernel/power/pageflags.c b/kernel/power/pageflags.c new file mode 100644 index 0000000..dccac97 --- /dev/null +++ b/kernel/power/pageflags.c @@ -0,0 +1,149 @@ +/* + * kernel/power/pageflags.c + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Routines for serialising and relocating pageflags in which we + * store our image metadata. + */ + +#include +#include +#include +#include +#include +#include +#include "pageflags.h" +#include "modules.h" +#include "pagedir.h" +#include "suspend.h" + +dyn_pageflags_t pageset2_map; +dyn_pageflags_t page_resave_map; +dyn_pageflags_t io_map; + +static int pages_for_zone(struct zone *zone) +{ + return DIV_ROUND_UP(zone->spanned_pages, (PAGE_SIZE << 3)); +} + +int suspend_pageflags_space_needed(void) +{ + int total = 0; + struct zone *zone; + + for_each_zone(zone) + if (populated_zone(zone)) + total += sizeof(int) * 3 + pages_for_zone(zone) * PAGE_SIZE; + + total += sizeof(int); + + return total; +} + +/* save_dyn_pageflags + * + * Description: Save a set of pageflags. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being saved. + */ + +void save_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i, zone_idx, size, node = 0; + struct zone *zone; + struct pglist_data *pgdat; + + if (!*pagemap) + return; + + for_each_online_pgdat(pgdat) { + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + zone = &pgdat->node_zones[zone_idx]; + + if (!populated_zone(zone)) + continue; + + suspendActiveAllocator->rw_header_chunk(WRITE, NULL, + (char *) &node, sizeof(int)); + suspendActiveAllocator->rw_header_chunk(WRITE, NULL, + (char *) &zone_idx, sizeof(int)); + size = pages_for_zone(zone); + suspendActiveAllocator->rw_header_chunk(WRITE, NULL, + (char *) &size, sizeof(int)); + + for (i = 0; i < size; i++) + suspendActiveAllocator->rw_header_chunk(WRITE, + NULL, (char *) pagemap[node][zone_idx][i], + PAGE_SIZE); + } + node++; + } + node = -1; + suspendActiveAllocator->rw_header_chunk(WRITE, NULL, + (char *) &node, sizeof(int)); +} + +/* load_dyn_pageflags + * + * Description: Load a set of pageflags. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being loaded. + * (It must be allocated before calling this routine). 
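The stream saved above is keyed by node and zone, with each stored page holding PAGE_SIZE * 8 bits, one per pfn of the zone; that is where pages_for_zone() comes from. The indexing this implies is sketched below; it is an inference from the accessors used here, and the real test_dynpageflag() helpers may differ in detail.

        /* Locating the bit for a pfn in pagemap[node][zone_idx][...]. */
        unsigned long offset   = pfn - zone->zone_start_pfn;
        unsigned long map_page = offset / (PAGE_SIZE << 3);  /* which stored page */
        unsigned long bit      = offset & ((PAGE_SIZE << 3) - 1);
        unsigned long *bitmap  = (unsigned long *) pagemap[node][zone_idx][map_page];
        int set = test_bit(bit, bitmap);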
+ */ + +int load_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i, zone_idx, zone_check = 0, size, node = 0; + struct zone *zone; + struct pglist_data *pgdat; + + if (!pagemap) + return 1; + + for_each_online_pgdat(pgdat) { + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + zone = &pgdat->node_zones[zone_idx]; + + if (!populated_zone(zone)) + continue; + + /* Same node? */ + suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &zone_check, sizeof(int)); + if (zone_check != node) { + printk("Node read (%d) != node (%d).\n", + zone_check, node); + return 1; + } + + /* Same zone? */ + suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &zone_check, sizeof(int)); + if (zone_check != zone_idx) { + printk("Zone read (%d) != node (%d).\n", + zone_check, zone_idx); + return 1; + } + + + suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) &size, sizeof(int)); + + for (i = 0; i < size; i++) + suspendActiveAllocator->rw_header_chunk(READ, NULL, + (char *) pagemap[node][zone_idx][i], + PAGE_SIZE); + } + node++; + } + suspendActiveAllocator->rw_header_chunk(READ, NULL, (char *) &zone_check, + sizeof(int)); + if (zone_check != -1) { + printk("Didn't read end of dyn pageflag data marker.(%x)\n", + zone_check); + return 1; + } + + return 0; +} diff --git a/kernel/power/pageflags.h b/kernel/power/pageflags.h new file mode 100644 index 0000000..405cbce --- /dev/null +++ b/kernel/power/pageflags.h @@ -0,0 +1,49 @@ +/* + * kernel/power/pageflags.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Suspend2 needs a few pageflags while working that aren't otherwise + * used. To save the struct page pageflags, we dynamically allocate + * a bitmap and use that. These are the only non order-0 allocations + * we do. + * + * NOTE!!! + * We assume that PAGE_SIZE - sizeof(void *) is a multiple of + * sizeof(unsigned long). Is this ever false? + */ + +#include +#include + +extern dyn_pageflags_t pageset1_map; +extern dyn_pageflags_t pageset1_copy_map; +extern dyn_pageflags_t pageset2_map; +extern dyn_pageflags_t page_resave_map; +extern dyn_pageflags_t io_map; + +#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page)) +#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page)) +#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page)) + +#define PagePageset1Copy(page) (test_dynpageflag(&pageset1_copy_map, page)) +#define SetPagePageset1Copy(page) (set_dynpageflag(&pageset1_copy_map, page)) +#define ClearPagePageset1Copy(page) (clear_dynpageflag(&pageset1_copy_map, page)) + +#define PagePageset2(page) (test_dynpageflag(&pageset2_map, page)) +#define SetPagePageset2(page) (set_dynpageflag(&pageset2_map, page)) +#define ClearPagePageset2(page) (clear_dynpageflag(&pageset2_map, page)) + +#define PageWasRW(page) (test_dynpageflag(&pageset2_map, page)) +#define SetPageWasRW(page) (set_dynpageflag(&pageset2_map, page)) +#define ClearPageWasRW(page) (clear_dynpageflag(&pageset2_map, page)) + +#define PageResave(page) (page_resave_map ? 
test_dynpageflag(&page_resave_map, page) : 0) +#define SetPageResave(page) (set_dynpageflag(&page_resave_map, page)) +#define ClearPageResave(page) (clear_dynpageflag(&page_resave_map, page)) + +extern void save_dyn_pageflags(dyn_pageflags_t pagemap); +extern int load_dyn_pageflags(dyn_pageflags_t pagemap); +extern int suspend_pageflags_space_needed(void); diff --git a/kernel/power/power.h b/kernel/power/power.h index eb461b8..a4d6550 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -1,5 +1,11 @@ +/* + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + */ + #include #include +#include "suspend.h" +#include "suspend2_builtin.h" struct swsusp_info { struct new_utsname uts; @@ -15,11 +21,15 @@ struct swsusp_info { #ifdef CONFIG_SOFTWARE_SUSPEND extern int pm_suspend_disk(void); - +extern char resume_file[256]; #else static inline int pm_suspend_disk(void) { - return -EPERM; +#ifdef CONFIG_SUSPEND2 + return suspend2_try_suspend(1); +#else + return -ENODEV; +#endif } #endif @@ -40,6 +50,8 @@ extern struct subsystem power_subsys; /* References to section boundaries */ extern const void __nosave_begin, __nosave_end; +extern struct pbe *restore_pblist; + /* Preferred image size in bytes (default 500 MB) */ extern unsigned long image_size; extern int in_suspend; @@ -177,3 +189,11 @@ extern int suspend_enter(suspend_state_t state); struct timeval; extern void swsusp_show_speed(struct timeval *, struct timeval *, unsigned int, char *); +extern struct page *saveable_page(unsigned long pfn); +#ifdef CONFIG_HIGHMEM +extern struct page *saveable_highmem_page(unsigned long pfn); +#else +static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; } +#endif + +#define PBES_PER_PAGE (PAGE_SIZE / sizeof(struct pbe)) diff --git a/kernel/power/power_off.c b/kernel/power/power_off.c new file mode 100644 index 0000000..7db6186 --- /dev/null +++ b/kernel/power/power_off.c @@ -0,0 +1,109 @@ +/* + * kernel/power/power_off.c + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Support for powering down. + */ + +#include +#include +#include +#include +#include +#include +#include +#include "suspend.h" +#include "ui.h" +#include "power_off.h" +#include "power.h" + +unsigned long suspend2_poweroff_method = 0; /* 0 - Kernel power off */ + +/* + * suspend2_power_down + * Functionality : Powers down or reboots the computer once the image + * has been written to disk. + * Key Assumptions : Able to reboot/power down via code called or that + * the warning emitted if the calls fail will be visible + * to the user (ie printk resumes devices). + * Called From : do_suspend2_suspend_2 + */ + +void suspend2_power_down(void) +{ + int result = 0; + + if (test_action_state(SUSPEND_REBOOT)) { + suspend_prepare_status(DONT_CLEAR_BAR, "Ready to reboot."); + kernel_restart(NULL); + } + + suspend_prepare_status(DONT_CLEAR_BAR, "Powering down."); + + switch (suspend2_poweroff_method) { + case 0: + break; + case 3: + suspend_console(); + + if (device_suspend(PMSG_SUSPEND)) { + suspend_prepare_status(DONT_CLEAR_BAR, "Device " + "suspend failure. 
Doing poweroff."); + goto ResumeConsole; + } + + if (!pm_ops || + (pm_ops->prepare && pm_ops->prepare(PM_SUSPEND_MEM))) + goto DeviceResume; + + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG) && + disable_nonboot_cpus()) + goto PmOpsFinish; + + if (!suspend_enter(PM_SUSPEND_MEM)) + result = 1; + + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) + enable_nonboot_cpus(); + +PmOpsFinish: + if (pm_ops->finish) + pm_ops->finish(PM_SUSPEND_MEM); + +DeviceResume: + device_resume(); + +ResumeConsole: + resume_console(); + + /* If suspended to ram and later woke. */ + if (result) + return; + break; + case 4: + case 5: + if (!pm_ops || + (pm_ops->prepare && pm_ops->prepare(PM_SUSPEND_MAX))) + break; + + kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK); + suspend_enter(suspend2_poweroff_method); + + /* Failed. Fall back to kernel_power_off etc. */ + if (pm_ops->finish) + pm_ops->finish(PM_SUSPEND_MAX); + } + + suspend_prepare_status(DONT_CLEAR_BAR, "Falling back to alternate power off method."); + kernel_power_off(); + kernel_halt(); + suspend_prepare_status(DONT_CLEAR_BAR, "Powerdown failed."); + while (1) + cpu_relax(); +} + +EXPORT_SYMBOL_GPL(suspend2_poweroff_method); +EXPORT_SYMBOL_GPL(suspend2_power_down); diff --git a/kernel/power/power_off.h b/kernel/power/power_off.h new file mode 100644 index 0000000..40e0c46 --- /dev/null +++ b/kernel/power/power_off.h @@ -0,0 +1,13 @@ +/* + * kernel/power/power_off.h + * + * Copyright (C) 2006-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Support for the powering down. + */ + +int suspend_pm_state_finish(void); +void suspend2_power_down(void); +extern unsigned long suspend2_poweroff_method; diff --git a/kernel/power/prepare_image.c b/kernel/power/prepare_image.c new file mode 100644 index 0000000..779341b --- /dev/null +++ b/kernel/power/prepare_image.c @@ -0,0 +1,798 @@ +/* + * kernel/power/prepare_image.c + * + * Copyright (C) 2003-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * We need to eat memory until we can: + * 1. Perform the save without changing anything (RAM_NEEDED < #pages) + * 2. Fit it all in available space (suspendActiveAllocator->available_space() >= + * main_storage_needed()) + * 3. Reload the pagedir and pageset1 to places that don't collide with their + * final destinations, not knowing to what extent the resumed kernel will + * overlap with the one loaded at boot time. I think the resumed kernel + * should overlap completely, but I don't want to rely on this as it is + * an unproven assumption. We therefore assume there will be no overlap at + * all (worse case). + * 4. Meet the user's requested limit (if any) on the size of the image. + * The limit is in MB, so pages/256 (assuming 4K pages). + * + */ + +#include +#include +#include +#include +#include +#include + +#include "pageflags.h" +#include "modules.h" +#include "io.h" +#include "ui.h" +#include "extent.h" +#include "prepare_image.h" +#include "block_io.h" +#include "suspend.h" +#include "checksum.h" +#include "sysfs.h" + +static int num_nosave = 0; +static int header_space_allocated = 0; +static int main_storage_allocated = 0; +static int storage_available = 0; +int extra_pd1_pages_allowance = MIN_EXTRA_PAGES_ALLOWANCE; +int image_size_limit = 0; + +/* + * The atomic copy of pageset1 is stored in pageset2 pages. + * But if pageset1 is larger (normally only just after boot), + * we need to allocate extra pages to store the atomic copy. 
+ * The following data struct and functions are used to handle + * the allocation and freeing of that memory. + */ + +static int extra_pages_allocated; + +struct extras { + struct page *page; + int order; + struct extras *next; +}; + +static struct extras *extras_list; + +/* suspend_free_extra_pagedir_memory + * + * Description: Free previously allocated extra pagedir memory. + */ +void suspend_free_extra_pagedir_memory(void) +{ + /* Free allocated pages */ + while (extras_list) { + struct extras *this = extras_list; + int i; + + extras_list = this->next; + + for (i = 0; i < (1 << this->order); i++) + ClearPageNosave(this->page + i); + + __free_pages(this->page, this->order); + kfree(this); + } + + extra_pages_allocated = 0; +} + +/* suspend_allocate_extra_pagedir_memory + * + * Description: Allocate memory for making the atomic copy of pagedir1 in the + * case where it is bigger than pagedir2. + * Arguments: int num_to_alloc: Number of extra pages needed. + * Result: int. Number of extra pages we now have allocated. + */ +static int suspend_allocate_extra_pagedir_memory(int extra_pages_needed) +{ + int j, order, num_to_alloc = extra_pages_needed - extra_pages_allocated; + unsigned long flags = GFP_ATOMIC | __GFP_NOWARN; + + if (num_to_alloc < 1) + return 0; + + order = fls(num_to_alloc); + if (order >= MAX_ORDER) + order = MAX_ORDER - 1; + + while (num_to_alloc) { + struct page *newpage; + unsigned long virt; + struct extras *extras_entry; + + while ((1 << order) > num_to_alloc) + order--; + + extras_entry = (struct extras *) kmalloc(sizeof(struct extras), + GFP_ATOMIC); + + if (!extras_entry) + return extra_pages_allocated; + + virt = __get_free_pages(flags, order); + while (!virt && order) { + order--; + virt = __get_free_pages(flags, order); + } + + if (!virt) { + kfree(extras_entry); + return extra_pages_allocated; + } + + newpage = virt_to_page(virt); + + extras_entry->page = newpage; + extras_entry->order = order; + extras_entry->next = NULL; + + if (extras_list) + extras_entry->next = extras_list; + + extras_list = extras_entry; + + for (j = 0; j < (1 << order); j++) { + SetPageNosave(newpage + j); + SetPagePageset1Copy(newpage + j); + } + + extra_pages_allocated += (1 << order); + num_to_alloc -= (1 << order); + } + + return extra_pages_allocated; +} + +/* + * real_nr_free_pages: Count pcp pages for a zone type or all zones + * (-1 for all, otherwise zone_idx() result desired). + */ +int real_nr_free_pages(unsigned long zone_idx_mask) +{ + struct zone *zone; + int result = 0, i = 0, cpu; + + /* PCP lists */ + for_each_zone(zone) { + if (!populated_zone(zone)) + continue; + + if (!(zone_idx_mask & (1 << zone_idx(zone)))) + continue; + + for_each_online_cpu(cpu) { + struct per_cpu_pageset *pset = zone_pcp(zone, cpu); + + for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { + struct per_cpu_pages *pcp; + + pcp = &pset->pcp[i]; + result += pcp->count; + } + } + + result += zone_page_state(zone, NR_FREE_PAGES); + } + return result; +} + +/* + * Discover how much extra memory will be required by the drivers + * when they're asked to suspend. We can then ensure that amount + * of memory is available when we really want it. 
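get_extra_pd1_allowance() below measures that driver overhead empirically: it counts free pages, runs a trial device_suspend/device_power_down cycle, counts again, and keeps the difference (never less than MIN_EXTRA_PAGES_ALLOWANCE). With hypothetical figures:

        /* Hypothetical figures for the calculation in get_extra_pd1_allowance():
         *   free pages before the trial suspend: 52000
         *   free pages after drivers suspended:  50800
         * extra_pd1_pages_allowance = max(52000 - 50800 + MIN_EXTRA_PAGES_ALLOWANCE,
         *                                 MIN_EXTRA_PAGES_ALLOWANCE)
         *                           = 1200 + MIN_EXTRA_PAGES_ALLOWANCE */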
+ */ +static void get_extra_pd1_allowance(void) +{ + int orig_num_free = real_nr_free_pages(all_zones_mask), final; + + suspend_prepare_status(CLEAR_BAR, "Finding allowance for drivers."); + + suspend_console(); + device_suspend(PMSG_FREEZE); + local_irq_disable(); /* irqs might have been re-enabled on us */ + device_power_down(PMSG_FREEZE); + + final = real_nr_free_pages(all_zones_mask); + + device_power_up(); + local_irq_enable(); + device_resume(); + resume_console(); + + extra_pd1_pages_allowance = max( + orig_num_free - final + MIN_EXTRA_PAGES_ALLOWANCE, + MIN_EXTRA_PAGES_ALLOWANCE); +} + +/* + * Amount of storage needed, possibly taking into account the + * expected compression ratio and possibly also ignoring our + * allowance for extra pages. + */ +static int main_storage_needed(int use_ecr, + int ignore_extra_pd1_allow) +{ + return ((pagedir1.size + pagedir2.size + + (ignore_extra_pd1_allow ? 0 : extra_pd1_pages_allowance)) * + (use_ecr ? suspend_expected_compression_ratio() : 100) / 100); +} + +/* + * Storage needed for the image header, in bytes until the return. + */ +static int header_storage_needed(void) +{ + int bytes = (int) sizeof(struct suspend_header) + + suspend_header_storage_for_modules() + + suspend_pageflags_space_needed(); + + return DIV_ROUND_UP(bytes, PAGE_SIZE); +} + +/* + * When freeing memory, pages from either pageset might be freed. + * + * When seeking to free memory to be able to suspend, for every ps1 page freed, + * we need 2 less pages for the atomic copy because there is one less page to + * copy and one more page into which data can be copied. + * + * Freeing ps2 pages saves us nothing directly. No more memory is available + * for the atomic copy. Indirectly, a ps1 page might be freed (slab?), but + * that's too much work to figure out. + * + * => ps1_to_free functions + * + * Of course if we just want to reduce the image size, because of storage + * limitations or an image size limit either ps will do. + * + * => any_to_free function + */ + +static int highpages_ps1_to_free(void) +{ + return max_t(int, 0, DIV_ROUND_UP(get_highmem_size(pagedir1) - + get_highmem_size(pagedir2), 2) - real_nr_free_high_pages()); +} + +static int lowpages_ps1_to_free(void) +{ + return max_t(int, 0, DIV_ROUND_UP(get_lowmem_size(pagedir1) + + extra_pd1_pages_allowance + MIN_FREE_RAM + + suspend_memory_for_modules() - get_lowmem_size(pagedir2) - + real_nr_free_low_pages() - extra_pages_allocated, 2)); +} + +static int current_image_size(void) +{ + return pagedir1.size + pagedir2.size + header_space_allocated; +} + +static int any_to_free(int use_image_size_limit) +{ + int user_limit = (use_image_size_limit && image_size_limit > 0) ? + max_t(int, 0, current_image_size() - (image_size_limit << 8)) + : 0; + + int storage_limit = max_t(int, 0, + main_storage_needed(1, 1) - storage_available); + + return max(user_limit, storage_limit); +} + +/* amount_needed + * + * Calculates the amount by which the image size needs to be reduced to meet + * our constraints. + */ +static int amount_needed(int use_image_size_limit) +{ + return max(highpages_ps1_to_free() + lowpages_ps1_to_free(), + any_to_free(use_image_size_limit)); +} + +static int image_not_ready(int use_image_size_limit) +{ + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Amount still needed (%d) > 0:%d. 
Header: %d < %d: %d," + " Storage allocd: %d < %d: %d.\n", + amount_needed(use_image_size_limit), + (amount_needed(use_image_size_limit) > 0), + header_space_allocated, header_storage_needed(), + header_space_allocated < header_storage_needed(), + main_storage_allocated, + main_storage_needed(1, 1), + main_storage_allocated < main_storage_needed(1, 1)); + + suspend_cond_pause(0, NULL); + + return ((amount_needed(use_image_size_limit) > 0) || + header_space_allocated < header_storage_needed() || + main_storage_allocated < main_storage_needed(1, 1)); +} + +static void display_stats(int always, int sub_extra_pd1_allow) +{ + char buffer[255]; + snprintf(buffer, 254, + "Free:%d(%d). Sets:%d(%d),%d(%d). Header:%d/%d. Nosave:%d-%d" + "=%d. Storage:%u/%u(%u=>%u). Needed:%d,%d,%d(%d,%d,%d,%d)\n", + + /* Free */ + real_nr_free_pages(all_zones_mask), + real_nr_free_low_pages(), + + /* Sets */ + pagedir1.size, pagedir1.size - get_highmem_size(pagedir1), + pagedir2.size, pagedir2.size - get_highmem_size(pagedir2), + + /* Header */ + header_space_allocated, header_storage_needed(), + + /* Nosave */ + num_nosave, extra_pages_allocated, + num_nosave - extra_pages_allocated, + + /* Storage */ + main_storage_allocated, + storage_available, + main_storage_needed(1, sub_extra_pd1_allow), + main_storage_needed(1, 1), + + /* Needed */ + lowpages_ps1_to_free(), highpages_ps1_to_free(), + any_to_free(1), + MIN_FREE_RAM, suspend_memory_for_modules(), + extra_pd1_pages_allowance, image_size_limit << 8); + + if (always) + printk(buffer); + else + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 1, buffer); +} + +/* generate_free_page_map + * + * Description: This routine generates a bitmap of free pages from the + * lists used by the memory manager. We then use the bitmap + * to quickly calculate which pages to save and in which + * pagesets. + */ +static void generate_free_page_map(void) +{ + int order, loop, cpu; + struct page *page; + unsigned long flags, i; + struct zone *zone; + + for_each_zone(zone) { + if (!populated_zone(zone)) + continue; + + spin_lock_irqsave(&zone->lock, flags); + + for(i=0; i < zone->spanned_pages; i++) + ClearPageNosaveFree(pfn_to_page( + zone->zone_start_pfn + i)); + + for (order = MAX_ORDER - 1; order >= 0; --order) + list_for_each_entry(page, + &zone->free_area[order].free_list, lru) + for(loop=0; loop < (1 << order); loop++) + SetPageNosaveFree(page+loop); + + + for_each_online_cpu(cpu) { + struct per_cpu_pageset *pset = zone_pcp(zone, cpu); + + for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { + struct per_cpu_pages *pcp; + struct page *page; + + pcp = &pset->pcp[i]; + list_for_each_entry(page, &pcp->list, lru) + SetPageNosaveFree(page); + } + } + + spin_unlock_irqrestore(&zone->lock, flags); + } +} + +/* size_of_free_region + * + * Description: Return the number of pages that are free, beginning with and + * including this one. + */ +static int size_of_free_region(struct page *page) +{ + struct zone *zone = page_zone(page); + struct page *posn = page, *last_in_zone = + pfn_to_page(zone->zone_start_pfn) + zone->spanned_pages - 1; + + while (posn <= last_in_zone && PageNosaveFree(posn)) + posn++; + return (posn - page); +} + +/* flag_image_pages + * + * This routine generates our lists of pages to be stored in each + * pageset. Since we store the data using extents, and adding new + * extents might allocate a new extent page, this routine may well + * be called more than once. 
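+ *
+ * Roughly speaking, each pfn in a populated zone ends up in one of
+ * four buckets: free (skipped via the free page bitmap), unsaveable
+ * (counted in num_nosave), pageset2 (not atomically copied, written
+ * out before the atomic copy is made) or pageset1 (atomically
+ * copied). Resave pages are the exception; they are counted in both
+ * pagesets.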
+ */ +static void flag_image_pages(int atomic_copy) +{ + int num_free = 0; + unsigned long loop; + struct zone *zone; + + pagedir1.size = 0; + pagedir2.size = 0; + + set_highmem_size(pagedir1, 0); + set_highmem_size(pagedir2, 0); + + num_nosave = 0; + + clear_dyn_pageflags(pageset1_map); + + generate_free_page_map(); + + /* + * Pages not to be saved are marked Nosave irrespective of being reserved + */ + for_each_zone(zone) { + int highmem = is_highmem(zone); + + if (!populated_zone(zone)) + continue; + + for (loop = 0; loop < zone->spanned_pages; loop++) { + unsigned long pfn = zone->zone_start_pfn + loop; + struct page *page; + int chunk_size; + + if (!pfn_valid(pfn)) + continue; + + page = pfn_to_page(pfn); + + chunk_size = size_of_free_region(page); + if (chunk_size) { + num_free += chunk_size; + loop += chunk_size - 1; + continue; + } + + if (highmem) + page = saveable_highmem_page(pfn); + else + page = saveable_page(pfn); + + if (!page) { + num_nosave++; + continue; + } + + if (PagePageset2(page)) { + pagedir2.size++; + if (PageHighMem(page)) + inc_highmem_size(pagedir2); + else + SetPagePageset1Copy(page); + if (PageResave(page)) { + SetPagePageset1(page); + ClearPagePageset1Copy(page); + pagedir1.size++; + if (PageHighMem(page)) + inc_highmem_size(pagedir1); + } + } else { + pagedir1.size++; + SetPagePageset1(page); + if (PageHighMem(page)) + inc_highmem_size(pagedir1); + } + } + } + + if (atomic_copy) + return; + + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 0, + "Count data pages: Set1 (%d) + Set2 (%d) + Nosave (%d) + " + "NumFree (%d) = %d.\n", + pagedir1.size, pagedir2.size, num_nosave, num_free, + pagedir1.size + pagedir2.size + num_nosave + num_free); +} + +void suspend_recalculate_image_contents(int atomic_copy) +{ + clear_dyn_pageflags(pageset1_map); + if (!atomic_copy) { + int pfn; + BITMAP_FOR_EACH_SET(pageset2_map, pfn) + ClearPagePageset1Copy(pfn_to_page(pfn)); + /* Need to call this before getting pageset1_size! */ + suspend_mark_pages_for_pageset2(); + } + flag_image_pages(atomic_copy); + + if (!atomic_copy) { + storage_available = suspendActiveAllocator->storage_available(); + display_stats(0, 0); + } +} + +/* update_image + * + * Allocate [more] memory and storage for the image. + */ +static void update_image(void) +{ + int result, param_used, wanted, got; + + suspend_recalculate_image_contents(0); + + /* Include allowance for growth in pagedir1 while writing pagedir 2 */ + wanted = pagedir1.size + extra_pd1_pages_allowance - + get_lowmem_size(pagedir2); + if (wanted > extra_pages_allocated) { + got = suspend_allocate_extra_pagedir_memory(wanted); + if (wanted < got) { + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Want %d extra pages for pageset1, got %d.\n", + wanted, got); + return; + } + } + + thaw_kernel_threads(); + + /* + * Allocate remaining storage space, if possible, up to the + * maximum we know we'll need. It's okay to allocate the + * maximum if the writer is the swapwriter, but + * we don't want to grab all available space on an NFS share. + * We therefore ignore the expected compression ratio here, + * thereby trying to allocate the maximum image size we could + * need (assuming compression doesn't expand the image), but + * don't complain if we can't get the full amount we're after. 
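+ *
+ * In other words we ask for main_storage_needed(0, 0), i.e.
+ * pagedir1.size + pagedir2.size + extra_pd1_pages_allowance pages,
+ * capped at storage_available. With hypothetical pageset sizes of
+ * 30000 and 50000 pages and a 1000 page allowance, that would be
+ * 81000 pages.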
+ */ + + suspendActiveAllocator->allocate_storage( + min(storage_available, main_storage_needed(0, 0))); + + main_storage_allocated = suspendActiveAllocator->storage_allocated(); + + param_used = header_storage_needed(); + + result = suspendActiveAllocator->allocate_header_space(param_used); + + if (result) + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Still need to get more storage space for header.\n"); + else + header_space_allocated = param_used; + + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + + allocate_checksum_pages(); + + suspend_recalculate_image_contents(0); +} + +/* attempt_to_freeze + * + * Try to freeze processes. + */ + +static int attempt_to_freeze(void) +{ + int result; + + /* Stop processes before checking again */ + thaw_processes(); + suspend_prepare_status(CLEAR_BAR, "Freezing processes & syncing filesystems."); + result = freeze_processes(); + + if (result) { + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_FREEZING_FAILED); + } + + return result; +} + +/* eat_memory + * + * Try to free some memory, either to meet hard or soft constraints on the image + * characteristics. + * + * Hard constraints: + * - Pageset1 must be < half of memory; + * - We must have enough memory free at resume time to have pageset1 + * be able to be loaded in pages that don't conflict with where it has to + * be restored. + * Soft constraints + * - User specificied image size limit. + */ +static void eat_memory(void) +{ + int amount_wanted = 0; + int free_flags = 0, did_eat_memory = 0; + + /* + * Note that if we have enough storage space and enough free memory, we + * may exit without eating anything. We give up when the last 10 + * iterations ate no extra pages because we're not going to get much + * more anyway, but the few pages we get will take a lot of time. + * + * We freeze processes before beginning, and then unfreeze them if we + * need to eat memory until we think we have enough. If our attempts + * to freeze fail, we give up and abort. + */ + + suspend_recalculate_image_contents(0); + amount_wanted = amount_needed(1); + + switch (image_size_limit) { + case -1: /* Don't eat any memory */ + if (amount_wanted > 0) { + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_WOULD_EAT_MEMORY); + return; + } + break; + case -2: /* Free caches only */ + drop_pagecache(); + suspend_recalculate_image_contents(0); + amount_wanted = amount_needed(1); + did_eat_memory = 1; + break; + default: + free_flags = GFP_ATOMIC | __GFP_HIGHMEM; + } + + if (amount_wanted > 0 && !test_result_state(SUSPEND_ABORTED) && + image_size_limit != -1) { + struct zone *zone; + int zone_idx; + + suspend_prepare_status(CLEAR_BAR, "Seeking to free %dMB of memory.", MB(amount_wanted)); + + thaw_kernel_threads(); + + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + int zone_type_free = max_t(int, (zone_idx == ZONE_HIGHMEM) ? + highpages_ps1_to_free() : + lowpages_ps1_to_free(), amount_wanted); + + if (zone_type_free < 0) + break; + + for_each_zone(zone) { + if (zone_idx(zone) != zone_idx) + continue; + + shrink_one_zone(zone, zone_type_free); + + did_eat_memory = 1; + + suspend_recalculate_image_contents(0); + + amount_wanted = amount_needed(1); + zone_type_free = max_t(int, (zone_idx == ZONE_HIGHMEM) ? 
+ highpages_ps1_to_free() : + lowpages_ps1_to_free(), amount_wanted); + + if (zone_type_free < 0) + break; + } + } + + suspend_cond_pause(0, NULL); + + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + } + + if (did_eat_memory) { + unsigned long orig_state = get_suspend_state(); + /* Freeze_processes will call sys_sync too */ + restore_suspend_state(orig_state); + suspend_recalculate_image_contents(0); + } + + /* Blank out image size display */ + suspend_update_status(100, 100, NULL); +} + +/* suspend_prepare_image + * + * Entry point to the whole image preparation section. + * + * We do four things: + * - Freeze processes; + * - Ensure image size constraints are met; + * - Complete all the preparation for saving the image, + * including allocation of storage. The only memory + * that should be needed when we're finished is that + * for actually storing the image (and we know how + * much is needed for that because the modules tell + * us). + * - Make sure that all dirty buffers are written out. + */ +#define MAX_TRIES 2 +int suspend_prepare_image(void) +{ + int result = 1, tries = 1; + + header_space_allocated = 0; + main_storage_allocated = 0; + + if (attempt_to_freeze()) + return 1; + + if (!extra_pd1_pages_allowance) + get_extra_pd1_allowance(); + + storage_available = suspendActiveAllocator->storage_available(); + + if (!storage_available) { + printk(KERN_ERR "You need some storage available to be able to suspend.\n"); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_NOSTORAGE_AVAILABLE); + return 1; + } + + do { + suspend_prepare_status(CLEAR_BAR, "Preparing Image. Try %d.", tries); + + eat_memory(); + + if (test_result_state(SUSPEND_ABORTED)) + break; + + update_image(); + + tries++; + + } while (image_not_ready(1) && tries <= MAX_TRIES && + !test_result_state(SUSPEND_ABORTED)); + + result = image_not_ready(0); + + if (!test_result_state(SUSPEND_ABORTED)) { + if (result) { + display_stats(1, 0); + abort_suspend(SUSPEND_UNABLE_TO_PREPARE_IMAGE, + "Unable to successfully prepare the image.\n"); + } else { + unlink_lru_lists(); + suspend_cond_pause(1, "Image preparation complete."); + } + } + + return result; +} + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(real_nr_free_pages); +#endif diff --git a/kernel/power/prepare_image.h b/kernel/power/prepare_image.h new file mode 100644 index 0000000..8c2e426 --- /dev/null +++ b/kernel/power/prepare_image.h @@ -0,0 +1,34 @@ +/* + * kernel/power/prepare_image.h + * + * Copyright (C) 2003-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. 
+ * + */ + +#include + +extern int suspend_prepare_image(void); +extern void suspend_recalculate_image_contents(int storage_available); +extern int real_nr_free_pages(unsigned long zone_idx_mask); +extern int image_size_limit; +extern void suspend_free_extra_pagedir_memory(void); +extern int extra_pd1_pages_allowance; + +#define MIN_FREE_RAM 100 +#define MIN_EXTRA_PAGES_ALLOWANCE 500 + +#define all_zones_mask ((unsigned long) ((1 << MAX_NR_ZONES) - 1)) +#ifdef CONFIG_HIGHMEM +#define real_nr_free_high_pages() (real_nr_free_pages(1 << ZONE_HIGHMEM)) +#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask - \ + (1 << ZONE_HIGHMEM))) +#else +#define real_nr_free_high_pages() (0) +#define real_nr_free_low_pages() (real_nr_free_pages(all_zones_mask)) + +/* For eat_memory function */ +#define ZONE_HIGHMEM (MAX_NR_ZONES + 1) +#endif + diff --git a/kernel/power/process.c b/kernel/power/process.c index 6d566bf..90a4cc5 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -15,6 +15,8 @@ #include #include +int freezer_state = 0; + /* * Timeout for stopping processes */ @@ -179,10 +181,11 @@ int freeze_processes(void) return nr_unfrozen; sys_sync(); + freezer_state = FREEZER_USERSPACE_FROZEN; nr_unfrozen = try_to_freeze_tasks(FREEZER_KERNEL_THREADS); if (nr_unfrozen) return nr_unfrozen; - + freezer_state = FREEZER_FULLY_ON; printk("done.\n"); BUG_ON(in_atomic()); return 0; @@ -200,7 +203,7 @@ static void thaw_tasks(int thaw_user_space) if (is_user_space(p) == !thaw_user_space) continue; - if (!thaw_process(p)) + if (!thaw_process(p) && p->state != TASK_TRACED) printk(KERN_WARNING " Strange, %s not stopped\n", p->comm ); } while_each_thread(g, p); @@ -209,11 +212,31 @@ static void thaw_tasks(int thaw_user_space) void thaw_processes(void) { + int old_state = freezer_state; + + if (old_state == FREEZER_OFF) + return; + + /* + * Change state beforehand because thawed tasks might submit I/O + * immediately. + */ + freezer_state = FREEZER_OFF; + printk("Restarting tasks ... "); - thaw_tasks(FREEZER_KERNEL_THREADS); + + if (old_state == FREEZER_FULLY_ON) + thaw_tasks(FREEZER_KERNEL_THREADS); thaw_tasks(FREEZER_USER_SPACE); schedule(); printk("done.\n"); } +void thaw_kernel_threads(void) +{ + freezer_state = FREEZER_USERSPACE_FROZEN; + thaw_tasks(FREEZER_KERNEL_THREADS); +} + EXPORT_SYMBOL(refrigerator); +EXPORT_SYMBOL(freezer_state); diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index fc53ad0..319c96a 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -33,6 +33,7 @@ #include #include "power.h" +#include "suspend2_builtin.h" /* List of PBEs needed for restoring the pages that were allocated before * the suspend and included in the suspend image, but have also been @@ -40,6 +41,13 @@ * directly to their "original" page frames. */ struct pbe *restore_pblist; +int resume_attempted; +EXPORT_SYMBOL_GPL(resume_attempted); + +#ifdef CONFIG_SUSPEND2 +#include "pagedir.h" +int suspend_post_context_save(void); +#endif /* Pointer to an auxiliary buffer (1 page) */ static void *buffer; @@ -82,6 +90,11 @@ static void *get_image_page(gfp_t gfp_mask, int safe_needed) unsigned long get_safe_page(gfp_t gfp_mask) { +#ifdef CONFIG_SUSPEND2 + if (suspend2_running) + return suspend_get_nonconflicting_page(); +#endif + return (unsigned long)get_image_page(gfp_mask, PG_SAFE); } @@ -604,7 +617,7 @@ static unsigned int count_free_highmem_pages(void) * and it isn't a part of a free chunk of pages. 
*/ -static struct page *saveable_highmem_page(unsigned long pfn) +struct page *saveable_highmem_page(unsigned long pfn) { struct page *page; @@ -646,7 +659,6 @@ unsigned int count_highmem_pages(void) return n; } #else -static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; } static inline unsigned int count_highmem_pages(void) { return 0; } #endif /* CONFIG_HIGHMEM */ @@ -670,7 +682,7 @@ static inline int pfn_is_nosave(unsigned long pfn) * a free chunk of pages. */ -static struct page *saveable_page(unsigned long pfn) +struct page *saveable_page(unsigned long pfn) { struct page *page; @@ -986,6 +998,11 @@ asmlinkage int swsusp_save(void) { unsigned int nr_pages, nr_highmem; +#ifdef CONFIG_SUSPEND2 + if (suspend2_running) + return suspend_post_context_save(); +#endif + printk("swsusp: critical section: \n"); drain_local_pages(); diff --git a/kernel/power/storage.c b/kernel/power/storage.c new file mode 100644 index 0000000..702c1d4 --- /dev/null +++ b/kernel/power/storage.c @@ -0,0 +1,288 @@ +/* + * kernel/power/storage.c + * + * Copyright (C) 2005-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Routines for talking to a userspace program that manages storage. + * + * The kernel side: + * - starts the userspace program; + * - sends messages telling it when to open and close the connection; + * - tells it when to quit; + * + * The user space side: + * - passes messages regarding status; + * + */ + +#include +#include + +#include "sysfs.h" +#include "modules.h" +#include "netlink.h" +#include "storage.h" +#include "ui.h" + +static struct user_helper_data usm_helper_data; +static struct suspend_module_ops usm_ops; +static int message_received = 0; +static int usm_prepare_count = 0; +static int storage_manager_last_action = 0; +static int storage_manager_action = 0; + +static int usm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + + type = nlh->nlmsg_type; + + /* A control message: ignore them */ + if (type < NETLINK_MSG_BASE) + return 0; + + /* Unknown message: reply with EINVAL */ + if (type >= USM_MSG_MAX) + return -EINVAL; + + /* All operations require privileges, even GET */ + if (security_netlink_recv(skb, CAP_NET_ADMIN)) + return -EPERM; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && usm_helper_data.pid != -1) + return -EBUSY; + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case USM_MSG_SUCCESS: + case USM_MSG_FAILED: + message_received = type; + complete(&usm_helper_data.wait_for_process); + break; + default: + printk("Storage manager doesn't recognise message %d.\n", type); + } + + return 1; +} + +#ifdef CONFIG_NET +static int activations = 0; + +int suspend_activate_storage(int force) +{ + int tries = 1; + + if (usm_helper_data.pid == -1 || !usm_ops.enabled) + return 0; + + message_received = 0; + activations++; + + if (activations > 1 && !force) + return 0; + + while ((!message_received || message_received == USM_MSG_FAILED) && tries < 2) { + suspend_prepare_status(DONT_CLEAR_BAR, "Activate storage attempt %d.\n", tries); + + init_completion(&usm_helper_data.wait_for_process); + + suspend_send_netlink_message(&usm_helper_data, + USM_MSG_CONNECT, + NULL, 0); + + /* Wait 2 seconds for the userspace process to make contact */ + wait_for_completion_timeout(&usm_helper_data.wait_for_process, 2*HZ); + + tries++; + } + + return 0; +} + +int suspend_deactivate_storage(int force) +{ + if (usm_helper_data.pid == -1 || 
!usm_ops.enabled) + return 0; + + message_received = 0; + activations--; + + if (activations && !force) + return 0; + + init_completion(&usm_helper_data.wait_for_process); + + suspend_send_netlink_message(&usm_helper_data, + USM_MSG_DISCONNECT, + NULL, 0); + + wait_for_completion_timeout(&usm_helper_data.wait_for_process, 2*HZ); + + if (!message_received || message_received == USM_MSG_FAILED) { + printk("Returning failure disconnecting storage.\n"); + return 1; + } + + return 0; +} +#endif + +static void storage_manager_simulate(void) +{ + printk("--- Storage manager simulate ---\n"); + suspend_prepare_usm(); + schedule(); + printk("--- Activate storage 1 ---\n"); + suspend_activate_storage(1); + schedule(); + printk("--- Deactivate storage 1 ---\n"); + suspend_deactivate_storage(1); + schedule(); + printk("--- Cleanup usm ---\n"); + suspend_cleanup_usm(); + schedule(); + printk("--- Storage manager simulate ends ---\n"); +} + +static int usm_storage_needed(void) +{ + return strlen(usm_helper_data.program); +} + +static int usm_save_config_info(char *buf) +{ + int len = strlen(usm_helper_data.program); + memcpy(buf, usm_helper_data.program, len); + return len; +} + +static void usm_load_config_info(char *buf, int size) +{ + /* Don't load the saved path if one has already been set */ + if (usm_helper_data.program[0]) + return; + + memcpy(usm_helper_data.program, buf, size); +} + +static int usm_memory_needed(void) +{ + /* ball park figure of 32 pages */ + return (32 * PAGE_SIZE); +} + +/* suspend_prepare_usm + */ +int suspend_prepare_usm(void) +{ + usm_prepare_count++; + + if (usm_prepare_count > 1 || !usm_ops.enabled) + return 0; + + usm_helper_data.pid = -1; + + if (!*usm_helper_data.program) + return 0; + + suspend_netlink_setup(&usm_helper_data); + + if (usm_helper_data.pid == -1) + printk("Suspend2 Storage Manager wanted, but couldn't start it.\n"); + + suspend_activate_storage(0); + + return (usm_helper_data.pid != -1); +} + +void suspend_cleanup_usm(void) +{ + usm_prepare_count--; + + if (usm_helper_data.pid > -1 && !usm_prepare_count) { + suspend_deactivate_storage(0); + suspend_netlink_close(&usm_helper_data); + } +} + +static void storage_manager_activate(void) +{ + if (storage_manager_action == storage_manager_last_action) + return; + + if (storage_manager_action) + suspend_prepare_usm(); + else + suspend_cleanup_usm(); + + storage_manager_last_action = storage_manager_action; +} + +/* + * User interface specific /sys/power/suspend2 entries. 
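+ *
+ * By way of illustration (the helper name and sysfs paths below are
+ * only examples; the exact location depends on how the suspend2
+ * subsystem is registered):
+ *
+ *   echo /usr/local/sbin/my-storage-manager > \
+ *       /sys/power/suspend2/storage_manager/program
+ *   echo 1 > /sys/power/suspend2/storage_manager/enabled
+ *
+ * Any program that speaks the USM netlink protocol could be used here.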
+ */ + +static struct suspend_sysfs_data sysfs_params[] = { + { SUSPEND2_ATTR("simulate_atomic_copy", SYSFS_RW), + .type = SUSPEND_SYSFS_DATA_NONE, + .write_side_effect = storage_manager_simulate, + }, + + { SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&usm_ops.enabled, 0, 1, 0) + }, + + { SUSPEND2_ATTR("program", SYSFS_RW), + SYSFS_STRING(usm_helper_data.program, 254, 0) + }, + + { SUSPEND2_ATTR("activate_storage", SYSFS_RW), + SYSFS_INT(&storage_manager_action, 0, 1, 0), + .write_side_effect = storage_manager_activate, + } +}; + +static struct suspend_module_ops usm_ops = { + .type = MISC_MODULE, + .name = "Userspace Storage Manager", + .directory = "storage_manager", + .module = THIS_MODULE, + .storage_needed = usm_storage_needed, + .save_config_info = usm_save_config_info, + .load_config_info = usm_load_config_info, + .memory_needed = usm_memory_needed, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* suspend_usm_sysfs_init + * Description: Boot time initialisation for user interface. + */ +int s2_usm_init(void) +{ + usm_helper_data.nl = NULL; + usm_helper_data.program[0] = '\0'; + usm_helper_data.pid = -1; + usm_helper_data.skb_size = 0; + usm_helper_data.pool_limit = 6; + usm_helper_data.netlink_id = NETLINK_SUSPEND2_USM; + usm_helper_data.name = "userspace storage manager"; + usm_helper_data.rcv_msg = usm_user_rcv_msg; + usm_helper_data.interface_version = 1; + usm_helper_data.must_init = 0; + init_completion(&usm_helper_data.wait_for_process); + + return suspend_register_module(&usm_ops); +} + +void s2_usm_exit(void) +{ + suspend_unregister_module(&usm_ops); +} diff --git a/kernel/power/storage.h b/kernel/power/storage.h new file mode 100644 index 0000000..e05eeef --- /dev/null +++ b/kernel/power/storage.h @@ -0,0 +1,53 @@ +/* + * kernel/power/storage.h + * + * Copyright (C) 2005-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + */ + +#ifdef CONFIG_NET +int suspend_prepare_usm(void); +void suspend_cleanup_usm(void); + +int suspend_activate_storage(int force); +int suspend_deactivate_storage(int force); +extern int s2_usm_init(void); +extern void s2_usm_exit(void); +#else +static inline int s2_usm_init(void) { return 0; } +static inline void s2_usm_exit(void) { } + +static inline int suspend_activate_storage(int force) +{ + return 0; +} + +static inline int suspend_deactivate_storage(int force) +{ + return 0; +} + +static inline int suspend_prepare_usm(void) { return 0; } +static inline void suspend_cleanup_usm(void) { } +#endif + +enum { + USM_MSG_BASE = 0x10, + + /* Kernel -> Userspace */ + USM_MSG_CONNECT = 0x30, + USM_MSG_DISCONNECT = 0x31, + USM_MSG_SUCCESS = 0x40, + USM_MSG_FAILED = 0x41, + + USM_MSG_MAX, +}; + +#ifdef CONFIG_NET +extern __init int suspend_usm_init(void); +extern __exit void suspend_usm_cleanup(void); +#else +#define suspend_usm_init() do { } while(0) +#define suspend_usb_cleanup() do { } while(0) +#endif diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c new file mode 100644 index 0000000..3de56ac --- /dev/null +++ b/kernel/power/suspend.c @@ -0,0 +1,1022 @@ +/* + * kernel/power/suspend.c + */ +/** \mainpage Suspend2. + * + * Suspend2 provides support for saving and restoring an image of + * system memory to an arbitrary storage device, either on the local computer, + * or across some network. The support is entirely OS based, so Suspend2 + * works without requiring BIOS, APM or ACPI support. 
The vast majority of the + * code is also architecture independant, so it should be very easy to port + * the code to new architectures. Suspend includes support for SMP, 4G HighMem + * and preemption. Initramfses and initrds are also supported. + * + * Suspend2 uses a modular design, in which the method of storing the image is + * completely abstracted from the core code, as are transformations on the data + * such as compression and/or encryption (multiple 'modules' can be used to + * provide arbitrary combinations of functionality). The user interface is also + * modular, so that arbitrarily simple or complex interfaces can be used to + * provide anything from debugging information through to eye candy. + * + * \section Copyright + * + * Suspend2 is released under the GPLv2. + * + * Copyright (C) 1998-2001 Gabor Kuti
+ * Copyright (C) 1998,2001,2002 Pavel Machek
+ * Copyright (C) 2002-2003 Florent Chabaud
+ * Copyright (C) 2002-2007 Nigel Cunningham (nigel at suspend2 net)
+ * + * \section Credits + * + * Nigel would like to thank the following people for their work: + * + * Bernard Blackham
+ * Web page & Wiki administration, some coding. A person without whom + * Suspend would not be where it is. + * + * Michael Frank
+ * Extensive testing and help with improving stability. I was constantly + * amazed by the quality and quantity of Michael's help. + * + * Pavel Machek
+ * Modifications, defectiveness pointing, being with Gabor at the very beginning, + * suspend to swap space, stop all tasks. Port to 2.4.18-ac and 2.5.17. Even + * though Pavel and I disagree on the direction suspend to disk should take, I + * appreciate the valuable work he did in helping Gabor get the concept working. + * + * ..and of course the myriads of Suspend2 users who have helped diagnose + * and fix bugs, made suggestions on how to improve the code, proofread + * documentation, and donated time and money. + * + * Thanks also to corporate sponsors: + * + * Redhat.Sometime employer from May 2006 (my fault, not Redhat's!). + * + * Cyclades.com. Nigel's employers from Dec 2004 until May 2006, who + * allowed him to work on Suspend and PM related issues on company time. + * + * LinuxFund.org. Sponsored Nigel's work on Suspend for four months Oct 2003 + * to Jan 2004. + * + * LAC Linux. Donated P4 hardware that enabled development and ongoing + * maintenance of SMP and Highmem support. + * + * OSDL. Provided access to various hardware configurations, make occasional + * small donations to the project. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "modules.h" +#include "sysfs.h" +#include "prepare_image.h" +#include "io.h" +#include "ui.h" +#include "power_off.h" +#include "storage.h" +#include "checksum.h" +#include "cluster.h" +#include "suspend2_builtin.h" + +/*! Pageset metadata. */ +struct pagedir pagedir2 = {2}; + +static int get_pmsem = 0, got_pmsem; +static mm_segment_t oldfs; +static atomic_t actions_running; +static int block_dump_save; +extern int block_dump; + +int do_suspend2_step(int step); + +/* + * Basic clean-up routine. + */ +void suspend_finish_anything(int suspend_or_resume) +{ + if (!atomic_dec_and_test(&actions_running)) + return; + + suspend_cleanup_modules(suspend_or_resume); + suspend_put_modules(); + clear_suspend_state(SUSPEND_RUNNING); + set_fs(oldfs); + if (suspend_or_resume) { + block_dump = block_dump_save; + set_cpus_allowed(current, CPU_MASK_ALL); + } +} + +/* + * Basic set-up routine. + */ +int suspend_start_anything(int suspend_or_resume) +{ + if (atomic_add_return(1, &actions_running) != 1) { + if (suspend_or_resume) { + printk("Can't start a cycle when actions are " + "already running.\n"); + atomic_dec(&actions_running); + return -EBUSY; + } else + return 0; + } + + oldfs = get_fs(); + set_fs(KERNEL_DS); + + if (!suspendActiveAllocator) { + /* Be quiet if we're not trying to suspend or resume */ + if (suspend_or_resume) + printk("No storage allocator is currently active. " + "Rechecking whether we can use one.\n"); + suspend_attempt_to_parse_resume_device(!suspend_or_resume); + } + + set_suspend_state(SUSPEND_RUNNING); + + if (suspend_get_modules()) { + printk("Suspend2: Get modules failed!\n"); + goto out_err; + } + + if (suspend_initialise_modules(suspend_or_resume)) { + printk("Suspend2: Initialise modules failed!\n"); + goto out_err; + } + + if (suspend_or_resume) { + block_dump_save = block_dump; + block_dump = 0; + set_cpus_allowed(current, CPU_MASK_CPU0); + } + + return 0; + +out_err: + if (suspend_or_resume) + block_dump_save = block_dump; + suspend_finish_anything(suspend_or_resume); + return -EBUSY; +} + +/* + * Allocate & free bitmaps. 
+ */ +static int allocate_bitmaps(void) +{ + if (allocate_dyn_pageflags(&pageset1_map) || + allocate_dyn_pageflags(&pageset1_copy_map) || + allocate_dyn_pageflags(&pageset2_map) || + allocate_dyn_pageflags(&io_map) || + allocate_dyn_pageflags(&page_resave_map)) + return 1; + + return 0; +} + +static void free_bitmaps(void) +{ + free_dyn_pageflags(&pageset1_map); + free_dyn_pageflags(&pageset1_copy_map); + free_dyn_pageflags(&pageset2_map); + free_dyn_pageflags(&io_map); + free_dyn_pageflags(&page_resave_map); +} + +static int io_MB_per_second(int read_write) +{ + return (suspend_io_time[read_write][1]) ? + MB((unsigned long) suspend_io_time[read_write][0]) * HZ / + suspend_io_time[read_write][1] : 0; +} + +/* get_debug_info + * Functionality: Store debug info in a buffer. + */ +#define SNPRINTF(a...) len += snprintf_used(((char *)buffer) + len, \ + count - len - 1, ## a) +static int get_suspend_debug_info(const char *buffer, int count) +{ + int len = 0; + + SNPRINTF("Suspend2 debugging info:\n"); + SNPRINTF("- Suspend core : %s\n", SUSPEND_CORE_VERSION); + SNPRINTF("- Kernel Version : %s\n", UTS_RELEASE); + SNPRINTF("- Compiler vers. : %d.%d\n", __GNUC__, __GNUC_MINOR__); + SNPRINTF("- Attempt number : %d\n", nr_suspends); + SNPRINTF("- Parameters : %ld %ld %ld %d %d %ld\n", + suspend_result, + suspend_action, + suspend_debug_state, + suspend_default_console_level, + image_size_limit, + suspend2_poweroff_method); + SNPRINTF("- Overall expected compression percentage: %d.\n", + 100 - suspend_expected_compression_ratio()); + len+= suspend_print_module_debug_info(((char *) buffer) + len, + count - len - 1); + if (suspend_io_time[0][1]) { + if ((io_MB_per_second(0) < 5) || (io_MB_per_second(1) < 5)) { + SNPRINTF("- I/O speed: Write %d KB/s", + (KB((unsigned long) suspend_io_time[0][0]) * HZ / + suspend_io_time[0][1])); + if (suspend_io_time[1][1]) + SNPRINTF(", Read %d KB/s", + (KB((unsigned long) suspend_io_time[1][0]) * HZ / + suspend_io_time[1][1])); + } else { + SNPRINTF("- I/O speed: Write %d MB/s", + (MB((unsigned long) suspend_io_time[0][0]) * HZ / + suspend_io_time[0][1])); + if (suspend_io_time[1][1]) + SNPRINTF(", Read %d MB/s", + (MB((unsigned long) suspend_io_time[1][0]) * HZ / + suspend_io_time[1][1])); + } + SNPRINTF(".\n"); + } + else + SNPRINTF("- No I/O speed stats available.\n"); + SNPRINTF("- Extra pages : %d used/%d.\n", + extra_pd1_pages_used, extra_pd1_pages_allowance); + + return len; +} + +/* + * do_cleanup + */ + +static void do_cleanup(int get_debug_info) +{ + int i = 0; + char *buffer = NULL; + + if (get_debug_info) + suspend_prepare_status(DONT_CLEAR_BAR, "Cleaning up..."); + relink_lru_lists(); + + free_checksum_pages(); + + if (get_debug_info) + buffer = (char *) get_zeroed_page(GFP_ATOMIC); + + if (buffer) + i = get_suspend_debug_info(buffer, PAGE_SIZE); + + suspend_free_extra_pagedir_memory(); + + pagedir1.size = pagedir2.size = 0; + set_highmem_size(pagedir1, 0); + set_highmem_size(pagedir2, 0); + + restore_avenrun(); + + thaw_processes(); + +#ifdef CONFIG_SUSPEND2_KEEP_IMAGE + if (test_action_state(SUSPEND_KEEP_IMAGE) && + !test_result_state(SUSPEND_ABORTED)) { + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + "Suspend2: Not invalidating the image due " + "to Keep Image being enabled.\n"); + set_result_state(SUSPEND_KEPT_IMAGE); + } else +#endif + if (suspendActiveAllocator) + suspendActiveAllocator->invalidate_image(); + + free_bitmaps(); + + if (buffer && i) { + /* Printk can only handle 1023 bytes, including + * its level mangling. 
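+ * Note that three 1023 byte chunks cover at most 3069 bytes of the
+ * page sized buffer, so the tail of an unusually long report would
+ * be dropped; in practice the debug info is normally much shorter.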
*/ + for (i = 0; i < 3; i++) + printk("%s", buffer + (1023 * i)); + free_page((unsigned long) buffer); + } + + if (!test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) + enable_nonboot_cpus(); + suspend_cleanup_console(); + + suspend_deactivate_storage(0); + + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + clear_suspend_state(SUSPEND_TRYING_TO_RESUME); + clear_suspend_state(SUSPEND_NOW_RESUMING); + + if (got_pmsem) { + mutex_unlock(&pm_mutex); + got_pmsem = 0; + } +} + +static int check_still_keeping_image(void) +{ + if (test_action_state(SUSPEND_KEEP_IMAGE)) { + printk("Image already stored: powering down immediately."); + do_suspend2_step(STEP_SUSPEND_POWERDOWN); + return 1; /* Just in case we're using S3 */ + } + + printk("Invalidating previous image.\n"); + suspendActiveAllocator->invalidate_image(); + + return 0; +} + +static int suspend_init(void) +{ + suspend_result = 0; + + printk(KERN_INFO "Suspend2: Initiating a software suspend cycle.\n"); + + nr_suspends++; + + save_avenrun(); + + suspend_io_time[0][0] = suspend_io_time[0][1] = + suspend_io_time[1][0] = suspend_io_time[1][1] = 0; + + if (!test_suspend_state(SUSPEND_CAN_SUSPEND) || + allocate_bitmaps()) + return 0; + + suspend_prepare_console(); + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG) || + !disable_nonboot_cpus()) + return 1; + + set_result_state(SUSPEND_CPU_HOTPLUG_FAILED); + set_result_state(SUSPEND_ABORTED); + return 0; +} + +static int can_suspend(void) +{ + if (get_pmsem) { + if (!mutex_trylock(&pm_mutex)) { + printk("Suspend2: Failed to obtain pm_mutex.\n"); + dump_stack(); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_PM_SEM); + return 0; + } + got_pmsem = 1; + } + + if (!test_suspend_state(SUSPEND_CAN_SUSPEND)) + suspend_attempt_to_parse_resume_device(0); + + if (!test_suspend_state(SUSPEND_CAN_SUSPEND)) { + printk("Suspend2: Software suspend is disabled.\n" + "This may be because you haven't put something along " + "the lines of\n\nresume2=swap:/dev/hda1\n\n" + "in lilo.conf or equivalent. (Where /dev/hda1 is your " + "swap partition).\n"); + set_result_state(SUSPEND_ABORTED); + if (!got_pmsem) { + mutex_unlock(&pm_mutex); + got_pmsem = 0; + } + return 0; + } + + return 1; +} + +static int do_power_down(void) +{ + /* If switching images fails, do normal powerdown */ + if (poweroff_resume2[0]) + do_suspend2_step(STEP_RESUME_ALT_IMAGE); + + suspend_cond_pause(1, "About to power down or reboot."); + suspend2_power_down(); + + /* If we return, it's because we suspended to ram */ + if (read_pageset2(1)) + panic("Attempt to reload pagedir 2 failed. Try rebooting."); + + barrier(); + mb(); + do_cleanup(1); + return 0; +} + +/* + * __save_image + * Functionality : High level routine which performs the steps necessary + * to save the image after preparatory steps have been taken. + * Key Assumptions : Processes frozen, sufficient memory available, drivers + * suspended. 
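+ *
+ * In outline: pageset2 is written first, the drivers are suspended
+ * and suspend2_suspend() makes the atomic copy, then pageset1 and
+ * the image header are written. Control also returns here after the
+ * atomic restore at resume time, with suspend2_in_suspend cleared;
+ * in that case copyback_post() runs and nothing further is written.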
+ */ +static int __save_image(void) +{ + int temp_result; + + suspend_prepare_status(DONT_CLEAR_BAR, "Starting to save the image.."); + + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + " - Final values: %d and %d.\n", + pagedir1.size, pagedir2.size); + + suspend_cond_pause(1, "About to write pagedir2."); + + calculate_check_checksums(0); + + temp_result = write_pageset(&pagedir2); + + if (temp_result == -1 || test_result_state(SUSPEND_ABORTED)) + return 1; + + suspend_cond_pause(1, "About to copy pageset 1."); + + if (test_result_state(SUSPEND_ABORTED)) + return 1; + + suspend_deactivate_storage(1); + + suspend_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy."); + + suspend2_in_suspend = 1; + + suspend_console(); + if (device_suspend(PMSG_FREEZE)) { + set_result_state(SUSPEND_DEVICE_REFUSED); + set_result_state(SUSPEND_ABORTED); + goto ResumeConsole; + } + + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG) && + disable_nonboot_cpus()) { + set_result_state(SUSPEND_CPU_HOTPLUG_FAILED); + set_result_state(SUSPEND_ABORTED); + } else + temp_result = suspend2_suspend(); + + /* We return here at resume time too! */ + if (!suspend2_in_suspend && pm_ops && pm_ops->finish && + suspend2_poweroff_method > 3) + pm_ops->finish(suspend2_poweroff_method); + + if (test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) + enable_nonboot_cpus(); + + device_resume(); + +ResumeConsole: + resume_console(); + + if (suspend_activate_storage(1)) + panic("Failed to reactivate our storage."); + + if (temp_result || test_result_state(SUSPEND_ABORTED)) + return 1; + + /* Resume time? */ + if (!suspend2_in_suspend) { + copyback_post(); + return 0; + } + + /* Nope. Suspending. So, see if we can save the image... */ + + suspend_update_status(pagedir2.size, + pagedir1.size + pagedir2.size, + NULL); + + if (test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + suspend_cond_pause(1, "About to write pageset1."); + + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + "-- Writing pageset1\n"); + + temp_result = write_pageset(&pagedir1); + + /* We didn't overwrite any memory, so no reread needs to be done. */ + if (test_action_state(SUSPEND_TEST_FILTER_SPEED)) + return 1; + + if (temp_result == 1 || test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + suspend_cond_pause(1, "About to write header."); + + if (test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + temp_result = write_image_header(); + + if (test_action_state(SUSPEND_TEST_BIO)) + return 1; + + if (!temp_result && !test_result_state(SUSPEND_ABORTED)) + return 0; + +abort_reloading_pagedir_two: + temp_result = read_pageset2(1); + + /* If that failed, we're sunk. Panic! */ + if (temp_result) + panic("Attempt to reload pagedir 2 while aborting " + "a suspend failed."); + + return 1; +} + +/* + * do_save_image + * + * Save the prepared image. + */ + +static int do_save_image(void) +{ + int result = __save_image(); + if (!suspend2_in_suspend || result) + do_cleanup(1); + return result; +} + + +/* do_prepare_image + * + * Seek to initialise and prepare an image to be saved. On failure, + * cleanup. + */ + +static int do_prepare_image(void) +{ + if (suspend_activate_storage(0)) + return 1; + + /* + * If kept image and still keeping image and suspending to RAM, we will + * return 1 after suspending and resuming (provided the power doesn't + * run out. 
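+ * In that case check_still_keeping_image() will already have powered
+ * down (or suspended to RAM and come back), so all that remains is
+ * to clean up and return 1.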
+ */ + + if (!can_suspend() || + (test_result_state(SUSPEND_KEPT_IMAGE) && + check_still_keeping_image())) + goto cleanup; + + if (suspend_init() && !suspend_prepare_image() && + !test_result_state(SUSPEND_ABORTED)) + return 0; + +cleanup: + do_cleanup(0); + return 1; +} + +static int do_check_can_resume(void) +{ + char *buf = (char *) get_zeroed_page(GFP_KERNEL); + int result = 0; + + if (!buf) + return 0; + + /* Only interested in first byte, so throw away return code. */ + image_exists_read(buf, PAGE_SIZE); + + if (buf[0] == '1') + result = 1; + + free_page((unsigned long) buf); + return result; +} + +/* + * We check if we have an image and if so we try to resume. + */ +static int do_load_atomic_copy(void) +{ + int read_image_result = 0; + + if (sizeof(swp_entry_t) != sizeof(long)) { + printk(KERN_WARNING "Suspend2: The size of swp_entry_t != size" + " of long. Please report this!\n"); + return 1; + } + + if (!resume2_file[0]) + printk(KERN_WARNING "Suspend2: " + "You need to use a resume2= command line parameter to " + "tell Suspend2 where to look for an image.\n"); + + suspend_activate_storage(0); + + if (!(test_suspend_state(SUSPEND_RESUME_DEVICE_OK)) && + !suspend_attempt_to_parse_resume_device(0)) { + /* + * Without a usable storage device we can do nothing - + * even if noresume is given + */ + + if (!suspendNumAllocators) + printk(KERN_ALERT "Suspend2: " + "No storage allocators have been registered.\n"); + else + printk(KERN_ALERT "Suspend2: " + "Missing or invalid storage location " + "(resume2= parameter). Please correct and " + "rerun lilo (or equivalent) before " + "suspending.\n"); + suspend_deactivate_storage(0); + return 1; + } + + read_image_result = read_pageset1(); /* non fatal error ignored */ + + if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED)) { + printk(KERN_WARNING "Suspend2: Resuming disabled as requested.\n"); + clear_suspend_state(SUSPEND_NORESUME_SPECIFIED); + } + + suspend_deactivate_storage(0); + + if (read_image_result) + return 1; + + return 0; +} + +static void prepare_restore_load_alt_image(int prepare) +{ + static dyn_pageflags_t pageset1_map_save, pageset1_copy_map_save; + + if (prepare) { + pageset1_map_save = pageset1_map; + pageset1_map = NULL; + pageset1_copy_map_save = pageset1_copy_map; + pageset1_copy_map = NULL; + set_suspend_state(SUSPEND_LOADING_ALT_IMAGE); + suspend_reset_alt_image_pageset2_pfn(); + } else { + if (pageset1_map) + free_dyn_pageflags(&pageset1_map); + pageset1_map = pageset1_map_save; + if (pageset1_copy_map) + free_dyn_pageflags(&pageset1_copy_map); + pageset1_copy_map = pageset1_copy_map_save; + clear_suspend_state(SUSPEND_NOW_RESUMING); + clear_suspend_state(SUSPEND_LOADING_ALT_IMAGE); + } +} + +int pre_resume_freeze(void) +{ + if (!test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) { + suspend_prepare_status(DONT_CLEAR_BAR, "Disable nonboot cpus."); + if (disable_nonboot_cpus()) { + set_result_state(SUSPEND_CPU_HOTPLUG_FAILED); + set_result_state(SUSPEND_ABORTED); + return 1; + } + } + + suspend_prepare_status(DONT_CLEAR_BAR, "Freeze processes."); + + if (freeze_processes()) { + printk("Some processes failed to suspend\n"); + return 1; + } + + return 0; +} + +void post_resume_thaw(void) +{ + thaw_processes(); + if (!test_action_state(SUSPEND_LATE_CPU_HOTPLUG)) + enable_nonboot_cpus(); +} + +int do_suspend2_step(int step) +{ + int result; + + switch (step) { + case STEP_SUSPEND_PREPARE_IMAGE: + return do_prepare_image(); + case STEP_SUSPEND_SAVE_IMAGE: + return do_save_image(); + case STEP_SUSPEND_POWERDOWN: + return 
do_power_down(); + case STEP_RESUME_CAN_RESUME: + return do_check_can_resume(); + case STEP_RESUME_LOAD_PS1: + return do_load_atomic_copy(); + case STEP_RESUME_DO_RESTORE: + /* + * If we succeed, this doesn't return. + * Instead, we return from do_save_image() in the + * suspended kernel. + */ + result = suspend_atomic_restore(); + if (result) + post_resume_thaw(); + return result; + case STEP_RESUME_ALT_IMAGE: + printk("Trying to resume alternate image.\n"); + suspend2_in_suspend = 0; + save_restore_resume2(SAVE, NOQUIET); + prepare_restore_load_alt_image(1); + if (!do_check_can_resume()) { + printk("Nothing to resume from.\n"); + goto out; + } + if (!do_load_atomic_copy()) { + printk("Failed to load image.\n"); + suspend_atomic_restore(); + } +out: + prepare_restore_load_alt_image(0); + save_restore_resume2(RESTORE, NOQUIET); + break; + } + + return 0; +} + +/* -- Functions for kickstarting a suspend or resume --- */ + +/* + * Check if we have an image and if so try to resume. + */ +void __suspend2_try_resume(void) +{ + set_suspend_state(SUSPEND_TRYING_TO_RESUME); + resume_attempted = 1; + + if (do_suspend2_step(STEP_RESUME_CAN_RESUME) && + !do_suspend2_step(STEP_RESUME_LOAD_PS1)) + do_suspend2_step(STEP_RESUME_DO_RESTORE); + + do_cleanup(0); + + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + clear_suspend_state(SUSPEND_TRYING_TO_RESUME); + clear_suspend_state(SUSPEND_NOW_RESUMING); +} + +/* Wrapper for when called from init/do_mounts.c */ +void _suspend2_try_resume(void) +{ + resume_attempted = 1; + + if (suspend_start_anything(SYSFS_RESUMING)) + return; + + /* Unlock will be done in do_cleanup */ + mutex_lock(&pm_mutex); + got_pmsem = 1; + + __suspend2_try_resume(); + + /* + * For initramfs, we have to clear the boot time + * flag after trying to resume + */ + clear_suspend_state(SUSPEND_BOOT_TIME); + suspend_finish_anything(SYSFS_RESUMING); +} + +/* + * _suspend2_try_suspend + * Functionality : + * Called From : drivers/acpi/sleep/main.c + * kernel/reboot.c + */ +int _suspend2_try_suspend(int have_pmsem) +{ + int result = 0, sys_power_disk = 0; + + if (!atomic_read(&actions_running)) { + /* Came in via /sys/power/disk */ + if (suspend_start_anything(SYSFS_SUSPENDING)) + return -EBUSY; + sys_power_disk = 1; + } + + get_pmsem = !have_pmsem; + + if (strlen(poweroff_resume2)) { + attempt_to_parse_po_resume_device2(); + + if (!strlen(poweroff_resume2)) { + printk("Poweroff resume2 now invalid. Aborting.\n"); + goto out; + } + } + + if ((result = do_suspend2_step(STEP_SUSPEND_PREPARE_IMAGE))) + goto out; + + if (test_action_state(SUSPEND_FREEZER_TEST)) { + do_cleanup(0); + goto out; + } + + if ((result = do_suspend2_step(STEP_SUSPEND_SAVE_IMAGE))) + goto out; + + /* This code runs at resume time too! */ + if (suspend2_in_suspend) + result = do_suspend2_step(STEP_SUSPEND_POWERDOWN); +out: + if (sys_power_disk) + suspend_finish_anything(SYSFS_SUSPENDING); + return result; +} + +/* + * This array contains entries that are automatically registered at + * boot. Modules and the console code register their own entries separately. 
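+ *
+ * For example (the sysfs paths and values are illustrative only), a
+ * resume device and a soft image size limit might be set with:
+ *
+ *   echo swap:/dev/hda1 > /sys/power/suspend2/resume2
+ *   echo 500 > /sys/power/suspend2/image_size_limit
+ *
+ * before a cycle is started through the normal suspend-to-disk entry
+ * points.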
+ */ +static struct suspend_sysfs_data sysfs_params[] = { + { SUSPEND2_ATTR("extra_pages_allowance", SYSFS_RW), + SYSFS_INT(&extra_pd1_pages_allowance, 0, INT_MAX, 0) + }, + + { SUSPEND2_ATTR("image_exists", SYSFS_RW), + SYSFS_CUSTOM(image_exists_read, image_exists_write, + SYSFS_NEEDS_SM_FOR_BOTH) + }, + + { SUSPEND2_ATTR("resume2", SYSFS_RW), + SYSFS_STRING(resume2_file, 255, SYSFS_NEEDS_SM_FOR_WRITE), + .write_side_effect = attempt_to_parse_resume_device2, + }, + + { SUSPEND2_ATTR("poweroff_resume2", SYSFS_RW), + SYSFS_STRING(poweroff_resume2, 255, SYSFS_NEEDS_SM_FOR_WRITE), + .write_side_effect = attempt_to_parse_po_resume_device2, + }, + { SUSPEND2_ATTR("debug_info", SYSFS_READONLY), + SYSFS_CUSTOM(get_suspend_debug_info, NULL, 0) + }, + + { SUSPEND2_ATTR("ignore_rootfs", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_IGNORE_ROOTFS, 0) + }, + + { SUSPEND2_ATTR("image_size_limit", SYSFS_RW), + SYSFS_INT(&image_size_limit, -2, INT_MAX, 0) + }, + + { SUSPEND2_ATTR("last_result", SYSFS_RW), + SYSFS_UL(&suspend_result, 0, 0, 0) + }, + + { SUSPEND2_ATTR("no_multithreaded_io", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_NO_MULTITHREADED_IO, 0) + }, + + { SUSPEND2_ATTR("full_pageset2", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_PAGESET2_FULL, 0) + }, + + { SUSPEND2_ATTR("reboot", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_REBOOT, 0) + }, + +#ifdef CONFIG_SOFTWARE_SUSPEND + { SUSPEND2_ATTR("replace_swsusp", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_REPLACE_SWSUSP, 0) + }, +#endif + + { SUSPEND2_ATTR("resume_commandline", SYSFS_RW), + SYSFS_STRING(suspend2_nosave_commandline, COMMAND_LINE_SIZE, 0) + }, + + { SUSPEND2_ATTR("version", SYSFS_READONLY), + SYSFS_STRING(SUSPEND_CORE_VERSION, 0, 0) + }, + + { SUSPEND2_ATTR("no_load_direct", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_NO_DIRECT_LOAD, 0) + }, + + { SUSPEND2_ATTR("freezer_test", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_FREEZER_TEST, 0) + }, + + { SUSPEND2_ATTR("test_bio", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_TEST_BIO, 0) + }, + + { SUSPEND2_ATTR("test_filter_speed", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_TEST_FILTER_SPEED, 0) + }, + + { SUSPEND2_ATTR("slow", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_SLOW, 0) + }, + + { SUSPEND2_ATTR("no_pageset2", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_NO_PAGESET2, 0) + }, + + { SUSPEND2_ATTR("late_cpu_hotplug", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_LATE_CPU_HOTPLUG, 0) + }, + +#if defined(CONFIG_ACPI) + { SUSPEND2_ATTR("powerdown_method", SYSFS_RW), + SYSFS_UL(&suspend2_poweroff_method, 0, 5, 0) + }, +#endif + +#ifdef CONFIG_SUSPEND2_KEEP_IMAGE + { SUSPEND2_ATTR("keep_image", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_KEEP_IMAGE, 0) + }, +#endif +}; + +struct suspend2_core_fns my_fns = { + .get_nonconflicting_page = __suspend_get_nonconflicting_page, + .post_context_save = __suspend_post_context_save, + .try_suspend = _suspend2_try_suspend, + .try_resume = _suspend2_try_resume, +}; + +static __init int core_load(void) +{ + int i, + numfiles = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data); + + printk("Suspend v" SUSPEND_CORE_VERSION "\n"); + + if (s2_sysfs_init()) + return 1; + + for (i=0; i< numfiles; i++) + suspend_register_sysfs_file(&suspend2_subsys.kset.kobj, + &sysfs_params[i]); + + s2_core_fns = &my_fns; + + if (s2_checksum_init()) + return 1; + if (s2_cluster_init()) + return 1; + if (s2_usm_init()) + return 1; + if (s2_ui_init()) + return 1; + +#ifdef CONFIG_SOFTWARE_SUSPEND + /* Overriding resume2= with 
resume=? */ + if (test_action_state(SUSPEND_REPLACE_SWSUSP) && resume_file[0]) + strncpy(resume2_file, resume_file, 256); +#endif + + return 0; +} + +#ifdef MODULE +static __exit void core_unload(void) +{ + int i, + numfiles = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data); + + s2_ui_exit(); + s2_checksum_exit(); + s2_cluster_exit(); + s2_usm_exit(); + + for (i=0; i< numfiles; i++) + suspend_unregister_sysfs_file(&suspend2_subsys.kset.kobj, + &sysfs_params[i]); + + s2_core_fns = NULL; + + s2_sysfs_exit(); +} +MODULE_LICENSE("GPL"); +module_init(core_load); +module_exit(core_unload); +#else +late_initcall(core_load); +#endif + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(pagedir2); +#endif diff --git a/kernel/power/suspend.h b/kernel/power/suspend.h new file mode 100644 index 0000000..81e752d --- /dev/null +++ b/kernel/power/suspend.h @@ -0,0 +1,182 @@ +/* + * kernel/power/suspend.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * It contains declarations used throughout swsusp. + * + */ + +#ifndef KERNEL_POWER_SUSPEND_H +#define KERNEL_POWER_SUSPEND_H + +#include +#include +#include +#include +#include +#include "pageflags.h" + +#define SUSPEND_CORE_VERSION "2.2.9.17" + +/* == Action states == */ + +enum { + SUSPEND_REBOOT, + SUSPEND_PAUSE, + SUSPEND_SLOW, + SUSPEND_LOGALL, + SUSPEND_CAN_CANCEL, + SUSPEND_KEEP_IMAGE, + SUSPEND_FREEZER_TEST, + SUSPEND_SINGLESTEP, + SUSPEND_PAUSE_NEAR_PAGESET_END, + SUSPEND_TEST_FILTER_SPEED, + SUSPEND_TEST_BIO, + SUSPEND_NO_PAGESET2, + SUSPEND_PM_PREPARE_CONSOLE, + SUSPEND_IGNORE_ROOTFS, + SUSPEND_REPLACE_SWSUSP, + SUSPEND_RETRY_RESUME, + SUSPEND_PAGESET2_FULL, + SUSPEND_ABORT_ON_RESAVE_NEEDED, + SUSPEND_NO_MULTITHREADED_IO, + SUSPEND_NO_DIRECT_LOAD, + SUSPEND_LATE_CPU_HOTPLUG, +}; + +extern unsigned long suspend_action; + +#define clear_action_state(bit) (test_and_clear_bit(bit, &suspend_action)) +#define test_action_state(bit) (test_bit(bit, &suspend_action)) + +/* == Result states == */ + +enum { + SUSPEND_ABORTED, + SUSPEND_ABORT_REQUESTED, + SUSPEND_NOSTORAGE_AVAILABLE, + SUSPEND_INSUFFICIENT_STORAGE, + SUSPEND_FREEZING_FAILED, + SUSPEND_UNEXPECTED_ALLOC, + SUSPEND_KEPT_IMAGE, + SUSPEND_WOULD_EAT_MEMORY, + SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY, + SUSPEND_ENCRYPTION_SETUP_FAILED, + SUSPEND_PM_SEM, + SUSPEND_DEVICE_REFUSED, + SUSPEND_EXTRA_PAGES_ALLOW_TOO_SMALL, + SUSPEND_UNABLE_TO_PREPARE_IMAGE, + SUSPEND_FAILED_MODULE_INIT, + SUSPEND_FAILED_MODULE_CLEANUP, + SUSPEND_FAILED_IO, + SUSPEND_OUT_OF_MEMORY, + SUSPEND_IMAGE_ERROR, + SUSPEND_PLATFORM_PREP_FAILED, + SUSPEND_CPU_HOTPLUG_FAILED, +}; + +extern unsigned long suspend_result; + +#define set_result_state(bit) (test_and_set_bit(bit, &suspend_result)) +#define clear_result_state(bit) (test_and_clear_bit(bit, &suspend_result)) +#define test_result_state(bit) (test_bit(bit, &suspend_result)) + +/* == Debug sections and levels == */ + +/* debugging levels. 
*/ +enum { + SUSPEND_STATUS = 0, + SUSPEND_ERROR = 2, + SUSPEND_LOW, + SUSPEND_MEDIUM, + SUSPEND_HIGH, + SUSPEND_VERBOSE, +}; + +enum { + SUSPEND_ANY_SECTION, + SUSPEND_EAT_MEMORY, + SUSPEND_IO, + SUSPEND_HEADER, + SUSPEND_WRITER, + SUSPEND_MEMORY, +}; + +extern unsigned long suspend_debug_state; + +#define set_debug_state(bit) (test_and_set_bit(bit, &suspend_debug_state)) +#define clear_debug_state(bit) (test_and_clear_bit(bit, &suspend_debug_state)) +#define test_debug_state(bit) (test_bit(bit, &suspend_debug_state)) + +/* == Steps in suspending == */ + +enum { + STEP_SUSPEND_PREPARE_IMAGE, + STEP_SUSPEND_SAVE_IMAGE, + STEP_SUSPEND_POWERDOWN, + STEP_RESUME_CAN_RESUME, + STEP_RESUME_LOAD_PS1, + STEP_RESUME_DO_RESTORE, + STEP_RESUME_READ_PS2, + STEP_RESUME_GO, + STEP_RESUME_ALT_IMAGE, +}; + +/* == Suspend states == + (see also include/linux/suspend.h) */ + +#define get_suspend_state() (suspend_state) +#define restore_suspend_state(saved_state) \ + do { suspend_state = saved_state; } while(0) + +/* == Module support == */ + +struct suspend2_core_fns { + int (*post_context_save)(void); + unsigned long (*get_nonconflicting_page)(void); + int (*try_suspend)(int have_pmsem); + void (*try_resume)(void); +}; + +extern struct suspend2_core_fns *s2_core_fns; + +/* == All else == */ +#define KB(x) ((x) << (PAGE_SHIFT - 10)) +#define MB(x) ((x) >> (20 - PAGE_SHIFT)) + +extern int suspend_start_anything(int suspend_or_resume); +extern void suspend_finish_anything(int suspend_or_resume); + +extern int save_image_part1(void); +extern int suspend_atomic_restore(void); + +extern int _suspend2_try_suspend(int have_pmsem); +extern void __suspend2_try_resume(void); + +extern int __suspend_post_context_save(void); + +extern unsigned int nr_suspends; +extern char resume2_file[256]; +extern char poweroff_resume2[256]; + +extern void copyback_post(void); +extern int suspend2_suspend(void); +extern int extra_pd1_pages_used; + +extern int suspend_io_time[2][2]; + +#define SECTOR_SIZE 512 + +extern int suspend_early_boot_message + (int can_erase_image, int default_answer, char *warning_reason, ...); + +static inline int load_direct(struct page *page) +{ + return test_action_state(SUSPEND_NO_DIRECT_LOAD) ? 0 : PagePageset1Copy(page); +} + +extern int pre_resume_freeze(void); +#endif diff --git a/kernel/power/suspend2_builtin.c b/kernel/power/suspend2_builtin.c new file mode 100644 index 0000000..15fe301 --- /dev/null +++ b/kernel/power/suspend2_builtin.c @@ -0,0 +1,287 @@ +/* + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "io.h" +#include "suspend.h" +#include "extent.h" +#include "block_io.h" +#include "netlink.h" +#include "prepare_image.h" +#include "ui.h" +#include "sysfs.h" +#include "pagedir.h" +#include "modules.h" +#include "suspend2_builtin.h" + +#ifdef CONFIG_SUSPEND2_CORE_EXPORTS +#ifdef CONFIG_SOFTWARE_SUSPEND +EXPORT_SYMBOL_GPL(resume_file); +#endif + +EXPORT_SYMBOL_GPL(max_pfn); +EXPORT_SYMBOL_GPL(free_dyn_pageflags); +EXPORT_SYMBOL_GPL(clear_dynpageflag); +EXPORT_SYMBOL_GPL(test_dynpageflag); +EXPORT_SYMBOL_GPL(set_dynpageflag); +EXPORT_SYMBOL_GPL(get_next_bit_on); +EXPORT_SYMBOL_GPL(allocate_dyn_pageflags); +EXPORT_SYMBOL_GPL(clear_dyn_pageflags); + +#ifdef CONFIG_X86_64 +EXPORT_SYMBOL_GPL(restore_processor_state); +EXPORT_SYMBOL_GPL(save_processor_state); +#endif + +EXPORT_SYMBOL_GPL(kernel_shutdown_prepare); +EXPORT_SYMBOL_GPL(drop_pagecache); +EXPORT_SYMBOL_GPL(restore_pblist); +EXPORT_SYMBOL_GPL(pm_mutex); +EXPORT_SYMBOL_GPL(pm_restore_console); +EXPORT_SYMBOL_GPL(super_blocks); +EXPORT_SYMBOL_GPL(next_zone); + +EXPORT_SYMBOL_GPL(freeze_processes); +EXPORT_SYMBOL_GPL(thaw_processes); +EXPORT_SYMBOL_GPL(thaw_kernel_threads); +EXPORT_SYMBOL_GPL(shrink_all_memory); +EXPORT_SYMBOL_GPL(shrink_one_zone); +EXPORT_SYMBOL_GPL(saveable_page); +EXPORT_SYMBOL_GPL(swsusp_arch_suspend); +EXPORT_SYMBOL_GPL(swsusp_arch_resume); +EXPORT_SYMBOL_GPL(pm_ops); +EXPORT_SYMBOL_GPL(pm_prepare_console); +EXPORT_SYMBOL_GPL(follow_page); +EXPORT_SYMBOL_GPL(machine_halt); +EXPORT_SYMBOL_GPL(block_dump); +EXPORT_SYMBOL_GPL(unlink_lru_lists); +EXPORT_SYMBOL_GPL(relink_lru_lists); +EXPORT_SYMBOL_GPL(power_subsys); +EXPORT_SYMBOL_GPL(machine_power_off); +EXPORT_SYMBOL_GPL(suspend_enter); +EXPORT_SYMBOL_GPL(first_online_pgdat); +EXPORT_SYMBOL_GPL(next_online_pgdat); +EXPORT_SYMBOL_GPL(machine_restart); +EXPORT_SYMBOL_GPL(saved_command_line); +EXPORT_SYMBOL_GPL(tasklist_lock); +#ifdef CONFIG_SUSPEND_SMP +EXPORT_SYMBOL_GPL(disable_nonboot_cpus); +EXPORT_SYMBOL_GPL(enable_nonboot_cpus); +#endif +#endif + +#ifdef CONFIG_SUSPEND2_USERUI_EXPORTS +EXPORT_SYMBOL_GPL(kmsg_redirect); +EXPORT_SYMBOL_GPL(console_printk); +#ifndef CONFIG_COMPAT +EXPORT_SYMBOL_GPL(sys_ioctl); +#endif +#endif + +#ifdef CONFIG_SUSPEND2_SWAP_EXPORTS /* Suspend swap specific */ +EXPORT_SYMBOL_GPL(sys_swapon); +EXPORT_SYMBOL_GPL(sys_swapoff); +EXPORT_SYMBOL_GPL(si_swapinfo); +EXPORT_SYMBOL_GPL(map_swap_page); +EXPORT_SYMBOL_GPL(get_swap_page); +EXPORT_SYMBOL_GPL(swap_free); +EXPORT_SYMBOL_GPL(get_swap_info_struct); +#endif + +#ifdef CONFIG_SUSPEND2_FILE_EXPORTS +/* Suspend_file specific */ +extern char * __initdata root_device_name; + +EXPORT_SYMBOL_GPL(ROOT_DEV); +EXPORT_SYMBOL_GPL(root_device_name); +EXPORT_SYMBOL_GPL(sys_unlink); +EXPORT_SYMBOL_GPL(sys_mknod); +#endif + +/* Swap or file */ +#if defined(CONFIG_SUSPEND2_FILE_EXPORTS) || defined(CONFIG_SUSPEND2_SWAP_EXPORTS) +EXPORT_SYMBOL_GPL(bio_set_pages_dirty); +EXPORT_SYMBOL_GPL(name_to_dev_t); +#endif + +#if defined(CONFIG_SUSPEND2_EXPORTS) || defined(CONFIG_SUSPEND2_CORE_EXPORTS) +EXPORT_SYMBOL_GPL(snprintf_used); +#endif +struct suspend2_core_fns *s2_core_fns; +EXPORT_SYMBOL_GPL(s2_core_fns); + +dyn_pageflags_t pageset1_map; +dyn_pageflags_t pageset1_copy_map; +EXPORT_SYMBOL_GPL(pageset1_map); +EXPORT_SYMBOL_GPL(pageset1_copy_map); + +unsigned long suspend_result = 0; +unsigned long suspend_debug_state = 0; +int suspend_io_time[2][2]; 
+struct pagedir pagedir1 = {1}; + +EXPORT_SYMBOL_GPL(suspend_io_time); +EXPORT_SYMBOL_GPL(suspend_debug_state); +EXPORT_SYMBOL_GPL(suspend_result); +EXPORT_SYMBOL_GPL(pagedir1); + +unsigned long suspend_get_nonconflicting_page(void) +{ + return s2_core_fns->get_nonconflicting_page(); +} + +int suspend_post_context_save(void) +{ + return s2_core_fns->post_context_save(); +} + +int suspend2_try_suspend(int have_pmsem) +{ + if (!s2_core_fns) + return -ENODEV; + + return s2_core_fns->try_suspend(have_pmsem); +} + +void suspend2_try_resume(void) +{ + if (s2_core_fns) + s2_core_fns->try_resume(); +} + +int suspend2_lowlevel_builtin(void) +{ + int error = 0; + + save_processor_state(); + if ((error = swsusp_arch_suspend())) + printk(KERN_ERR "Error %d suspending\n", error); + /* Restore control flow appears here */ + restore_processor_state(); + + return error; +} + +EXPORT_SYMBOL_GPL(suspend2_lowlevel_builtin); + +unsigned long suspend_compress_bytes_in, suspend_compress_bytes_out; +EXPORT_SYMBOL_GPL(suspend_compress_bytes_in); +EXPORT_SYMBOL_GPL(suspend_compress_bytes_out); + +#ifdef CONFIG_SUSPEND2_REPLACE_SWSUSP +unsigned long suspend_action = (1 << SUSPEND_REPLACE_SWSUSP) | (1 << SUSPEND_PAGESET2_FULL); +#else +unsigned long suspend_action = 1 << SUSPEND_PAGESET2_FULL; +#endif +EXPORT_SYMBOL_GPL(suspend_action); + +unsigned long suspend_state = ((1 << SUSPEND_BOOT_TIME) | + (1 << SUSPEND_IGNORE_LOGLEVEL) | + (1 << SUSPEND_IO_STOPPED)); +EXPORT_SYMBOL_GPL(suspend_state); + +/* The number of suspends we have started (some may have been cancelled) */ +unsigned int nr_suspends; +EXPORT_SYMBOL_GPL(nr_suspends); + +char resume2_file[256] = CONFIG_SUSPEND2_DEFAULT_RESUME2; +EXPORT_SYMBOL_GPL(resume2_file); + +int suspend2_running = 0; +EXPORT_SYMBOL_GPL(suspend2_running); + +int suspend2_in_suspend __nosavedata; +EXPORT_SYMBOL_GPL(suspend2_in_suspend); + +unsigned long suspend2_nosave_state1 __nosavedata = 0; +unsigned long suspend2_nosave_state2 __nosavedata = 0; +int suspend2_nosave_state3 __nosavedata = 0; +int suspend2_nosave_io_speed[2][2] __nosavedata; +__nosavedata char suspend2_nosave_commandline[COMMAND_LINE_SIZE]; + +__nosavedata struct pbe *restore_highmem_pblist; + +#ifdef CONFIG_SUSPEND2_CORE_EXPORTS +#ifdef CONFIG_HIGHMEM +EXPORT_SYMBOL_GPL(nr_free_highpages); +EXPORT_SYMBOL_GPL(saveable_highmem_page); +EXPORT_SYMBOL_GPL(restore_highmem_pblist); +#endif + +EXPORT_SYMBOL_GPL(suspend2_nosave_state1); +EXPORT_SYMBOL_GPL(suspend2_nosave_state2); +EXPORT_SYMBOL_GPL(suspend2_nosave_state3); +EXPORT_SYMBOL_GPL(suspend2_nosave_io_speed); +EXPORT_SYMBOL_GPL(suspend2_nosave_commandline); +#endif + +/* -- Commandline Parameter Handling --- + * + * Resume setup: obtain the storage device. + */ +static int __init resume2_setup(char *str) +{ + if (!*str) + return 0; + + strncpy(resume2_file, str, 255); + return 0; +} + +/* + * Allow the user to specify that we should ignore any image found and + * invalidate the image if necesssary. This is equivalent to running + * the task queue and a sync and then turning off the power. The same + * precautions should be taken: fsck if you're not journalled. 
+ */ +static int __init noresume2_setup(char *str) +{ + set_suspend_state(SUSPEND_NORESUME_SPECIFIED); + return 0; +} + +static int __init suspend_retry_resume_setup(char *str) +{ + set_suspend_state(SUSPEND_RETRY_RESUME); + return 0; +} + +#ifndef CONFIG_SOFTWARE_SUSPEND +static int __init resume_setup(char *str) +{ + if (!*str) + return 0; + + strncpy(resume2_file, str, 255); + return 0; +} + +static int __init noresume_setup(char *str) +{ + set_suspend_state(SUSPEND_NORESUME_SPECIFIED); + return 0; +} +__setup("noresume", noresume_setup); +__setup("resume=", resume_setup); +#endif + +__setup("noresume2", noresume2_setup); +__setup("resume2=", resume2_setup); +__setup("suspend_retry_resume", suspend_retry_resume_setup); + diff --git a/kernel/power/suspend2_builtin.h b/kernel/power/suspend2_builtin.h new file mode 100644 index 0000000..968b24b --- /dev/null +++ b/kernel/power/suspend2_builtin.h @@ -0,0 +1,35 @@ +/* + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + */ +#include +#include + +extern struct suspend2_core_fns *s2_core_fns; +extern unsigned long suspend_compress_bytes_in, suspend_compress_bytes_out; +extern unsigned long suspend_action; +extern unsigned int nr_suspends; +extern char resume2_file[256]; +extern int suspend2_in_suspend; + +extern unsigned long suspend2_nosave_state1 __nosavedata; +extern unsigned long suspend2_nosave_state2 __nosavedata; +extern int suspend2_nosave_state3 __nosavedata; +extern int suspend2_nosave_io_speed[2][2] __nosavedata; +extern __nosavedata char suspend2_nosave_commandline[COMMAND_LINE_SIZE]; +extern __nosavedata struct pbe *restore_highmem_pblist; + +int suspend2_lowlevel_builtin(void); + +extern dyn_pageflags_t __nosavedata suspend2_nosave_origmap; +extern dyn_pageflags_t __nosavedata suspend2_nosave_copymap; + +#ifdef CONFIG_HIGHMEM +extern __nosavedata struct zone_data *suspend2_nosave_zone_list; +extern __nosavedata unsigned long suspend2_nosave_max_pfn; +#endif + +extern unsigned long suspend_get_nonconflicting_page(void); +extern int suspend_post_context_save(void); +extern int suspend2_try_suspend(int have_pmsem); diff --git a/kernel/power/suspend_block_io.c b/kernel/power/suspend_block_io.c new file mode 100644 index 0000000..533ae5c --- /dev/null +++ b/kernel/power/suspend_block_io.c @@ -0,0 +1,1020 @@ +/* + * kernel/power/suspend_block_io.c + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * Distributed under GPLv2. + * + * This file contains block io functions for suspend2. These are + * used by the swapwriter and it is planned that they will also + * be used by the NFSwriter. + * + */ + +#include +#include +#include + +#include "suspend.h" +#include "sysfs.h" +#include "modules.h" +#include "prepare_image.h" +#include "block_io.h" +#include "ui.h" + +static int pr_index; + +#if 0 +#define PR_DEBUG(a, b...) do { if (pr_index < 20) printk(a, ##b); } while(0) +#else +#define PR_DEBUG(a, b...) 
do { } while(0) +#endif + +#define MAX_OUTSTANDING_IO 2048 +#define SUBMIT_BATCH_SIZE 128 + +static int max_outstanding_io = MAX_OUTSTANDING_IO; +static int submit_batch_size = SUBMIT_BATCH_SIZE; + +struct io_info { + struct bio *sys_struct; + sector_t first_block; + struct page *bio_page, *dest_page; + int writing, readahead_index; + struct block_device *dev; + struct list_head list; +}; + +static LIST_HEAD(ioinfo_ready_for_cleanup); +static DEFINE_SPINLOCK(ioinfo_ready_lock); + +static LIST_HEAD(ioinfo_submit_batch); +static DEFINE_SPINLOCK(ioinfo_submit_lock); + +static LIST_HEAD(ioinfo_busy); +static DEFINE_SPINLOCK(ioinfo_busy_lock); + +static struct io_info *waiting_on; + +static atomic_t submit_batch; +static int submit_batched(void); + +/* [Max] number of I/O operations pending */ +static atomic_t outstanding_io; + +static int extra_page_forward = 0; + +static volatile unsigned long suspend_readahead_flags[ + DIV_ROUND_UP(MAX_OUTSTANDING_IO, BITS_PER_LONG)]; +static spinlock_t suspend_readahead_flags_lock = SPIN_LOCK_UNLOCKED; +static struct page *suspend_readahead_pages[MAX_OUTSTANDING_IO]; +static int readahead_index, readahead_submit_index; + +static int current_stream; +/* 0 = Header, 1 = Pageset1, 2 = Pageset2 */ +struct extent_iterate_saved_state suspend_writer_posn_save[3]; + +/* Pointer to current entry being loaded/saved. */ +struct extent_iterate_state suspend_writer_posn; + +/* Not static, so that the allocators can setup and complete + * writing the header */ +char *suspend_writer_buffer; +int suspend_writer_buffer_posn; + +int suspend_read_fd; + +static struct suspend_bdev_info *suspend_devinfo; + +int suspend_header_bytes_used = 0; + +DEFINE_MUTEX(suspend_bio_mutex); + +/* + * __suspend_bio_cleanup_one + * + * Description: Clean up after completing I/O on a page. + * Arguments: struct io_info: Data for I/O to be completed. + */ +static void __suspend_bio_cleanup_one(struct io_info *io_info) +{ + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0, + "Cleanup IO: [%p]\n", io_info); + + if (!io_info->writing && io_info->readahead_index == -1) { + char *from, *to; + /* + * Copy the page we read into the buffer our caller provided. + */ + to = (char *) kmap(io_info->dest_page); + from = (char *) kmap(io_info->bio_page); + memcpy(to, from, PAGE_SIZE); + kunmap(io_info->dest_page); + kunmap(io_info->bio_page); + } + + if (io_info->writing || io_info->readahead_index == -1) { + /* Sanity check */ + if (page_count(io_info->bio_page) != 2) + printk(KERN_EMERG "Cleanup IO: Page count on page %p" + " is %d. 
Not good!\n", + io_info->bio_page, + page_count(io_info->bio_page)); + put_page(io_info->bio_page); + __free_page(io_info->bio_page); + } else + put_page(io_info->bio_page); + + bio_put(io_info->sys_struct); + io_info->sys_struct = NULL; +} + +/* __suspend_io_cleanup + */ + +static void suspend_bio_cleanup_one(void *data) +{ + struct io_info *io_info = (struct io_info *) data; + int readahead_index; + unsigned long flags; + + readahead_index = io_info->readahead_index; + list_del_init(&io_info->list); + __suspend_bio_cleanup_one(io_info); + + if (readahead_index > -1) { + int index = readahead_index/BITS_PER_LONG; + int bit = readahead_index - (index * BITS_PER_LONG); + spin_lock_irqsave(&suspend_readahead_flags_lock, flags); + set_bit(bit, &suspend_readahead_flags[index]); + spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags); + } + + if (waiting_on == io_info) + waiting_on = NULL; + kfree(io_info); + atomic_dec(&outstanding_io); +} + +/* suspend_cleanup_some_completed_io + * + * NB: This is designed so that multiple callers can be in here simultaneously. + */ + +static void suspend_cleanup_some_completed_io(void) +{ + int num_cleaned = 0; + struct io_info *first; + unsigned long flags; + + spin_lock_irqsave(&ioinfo_ready_lock, flags); + while(!list_empty(&ioinfo_ready_for_cleanup)) { + first = list_entry(ioinfo_ready_for_cleanup.next, + struct io_info, list); + + list_del_init(&first->list); + + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); + suspend_bio_cleanup_one((void *) first); + spin_lock_irqsave(&ioinfo_ready_lock, flags); + + num_cleaned++; + if (num_cleaned == submit_batch_size) + break; + } + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); +} + +/* do_bio_wait + * + * Actions taken when we want some I/O to get run. + * + * Submit any I/O that's batched up (if we're not already doing + * that, unplug queues, schedule and clean up whatever we can. + */ +static void do_bio_wait(void) +{ + int num_submitted = 0; + + /* Don't want to wait on I/O we haven't submitted! */ + num_submitted = submit_batched(); + + kblockd_flush(); + + io_schedule(); + + suspend_cleanup_some_completed_io(); +} + +/* + * suspend_finish_all_io + * + * Description: Finishes all IO and frees all IO info struct pages. + */ +static void suspend_finish_all_io(void) +{ + /* Wait for all I/O to complete. */ + while (atomic_read(&outstanding_io)) + do_bio_wait(); +} + +/* + * wait_on_readahead + * + * Wait until a particular readahead is ready. + */ +static void suspend_wait_on_readahead(int readahead_index) +{ + int index = readahead_index / BITS_PER_LONG; + int bit = readahead_index - index * BITS_PER_LONG; + + /* read_ahead_index is the one we want to return */ + while (!test_bit(bit, &suspend_readahead_flags[index])) + do_bio_wait(); +} + +/* + * readahead_done + * + * Returns whether the readahead requested is ready. 
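+ *
+ * Readiness is recorded in the suspend_readahead_flags bitmap, one bit
+ * per outstanding readahead. The word/bit split used here and in
+ * suspend_bio_cleanup_one() is:
+ *
+ *	index = readahead_index / BITS_PER_LONG;
+ *	bit   = readahead_index % BITS_PER_LONG;
+ *
+ * so on a 64-bit build, for example, readahead_index 70 is bit 6 of
+ * word 1.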
+ */ + +static int suspend_readahead_ready(int readahead_index) +{ + int index = readahead_index / BITS_PER_LONG; + int bit = readahead_index - (index * BITS_PER_LONG); + + return test_bit(bit, &suspend_readahead_flags[index]); +} + +/* suspend_readahead_prepare + * Set up for doing readahead on an image */ +static int suspend_prepare_readahead(int index) +{ + unsigned long new_page = get_zeroed_page(GFP_ATOMIC | __GFP_NOWARN); + + if(!new_page) + return -ENOMEM; + + suspend_readahead_pages[index] = virt_to_page(new_page); + return 0; +} + +/* suspend_readahead_cleanup + * Clean up structures used for readahead */ +static void suspend_cleanup_readahead(int page) +{ + __free_page(suspend_readahead_pages[page]); + suspend_readahead_pages[page] = 0; + return; +} + +/* + * suspend_end_bio + * + * Description: Function called by block driver from interrupt context when I/O + * is completed. This is the reason we use spinlocks in + * manipulating the io_info lists. + * Nearly the fs/buffer.c version, but we want to mark the page as + * done in our own structures too. + */ + +static int suspend_end_bio(struct bio *bio, unsigned int num, int err) +{ + struct io_info *io_info = bio->bi_private; + unsigned long flags; + + spin_lock_irqsave(&ioinfo_busy_lock, flags); + list_del_init(&io_info->list); + spin_unlock_irqrestore(&ioinfo_busy_lock, flags); + + spin_lock_irqsave(&ioinfo_ready_lock, flags); + list_add_tail(&io_info->list, &ioinfo_ready_for_cleanup); + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); + return 0; +} + +/** + * submit - submit BIO request. + * @writing: READ or WRITE. + * @io_info: IO info structure. + * + * Based on Patrick's pmdisk code from long ago: + * "Straight from the textbook - allocate and initialize the bio. + * If we're writing, make sure the page is marked as dirty. + * Then submit it and carry on." + * + * With a twist, though - we handle block_size != PAGE_SIZE. + * Caller has already checked that our page is not fragmented. + */ + +static int submit(struct io_info *io_info) +{ + struct bio *bio = NULL; + unsigned long flags; + + while (!bio) { + bio = bio_alloc(GFP_ATOMIC,1); + if (!bio) + do_bio_wait(); + } + + bio->bi_bdev = io_info->dev; + bio->bi_sector = io_info->first_block; + bio->bi_private = io_info; + bio->bi_end_io = suspend_end_bio; + io_info->sys_struct = bio; + + if (bio_add_page(bio, io_info->bio_page, PAGE_SIZE, 0) < PAGE_SIZE) { + printk("ERROR: adding page to bio at %lld\n", + (unsigned long long) io_info->first_block); + bio_put(bio); + return -EFAULT; + } + + if (io_info->writing) + bio_set_pages_dirty(bio); + + spin_lock_irqsave(&ioinfo_busy_lock, flags); + list_add_tail(&io_info->list, &ioinfo_busy); + spin_unlock_irqrestore(&ioinfo_busy_lock, flags); + + submit_bio(io_info->writing, bio); + + return 0; +} + +/* + * submit a batch. The submit function can wait on I/O, so we have + * simple locking to avoid infinite recursion. 
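+ *
+ * The recursion being guarded against is
+ *
+ *	submit_batched() -> submit() -> do_bio_wait() -> submit_batched()
+ *
+ * (submit() calls do_bio_wait() while bio_alloc() is failing). The
+ * static running_already flag makes the nested call return immediately.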
+ */ +static int submit_batched(void) +{ + static int running_already = 0; + struct io_info *first; + unsigned long flags; + int num_submitted = 0; + + if (running_already) + return 0; + + running_already = 1; + spin_lock_irqsave(&ioinfo_submit_lock, flags); + while(!list_empty(&ioinfo_submit_batch)) { + first = list_entry(ioinfo_submit_batch.next, struct io_info, + list); + list_del_init(&first->list); + atomic_dec(&submit_batch); + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + submit(first); + spin_lock_irqsave(&ioinfo_submit_lock, flags); + num_submitted++; + if (num_submitted == submit_batch_size) + break; + } + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + running_already = 0; + + return num_submitted; +} + +static void add_to_batch(struct io_info *io_info) +{ + unsigned long flags; + int waiting; + + /* Put our prepared I/O struct on the batch list. */ + spin_lock_irqsave(&ioinfo_submit_lock, flags); + list_add_tail(&io_info->list, &ioinfo_submit_batch); + waiting = atomic_add_return(1, &submit_batch); + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + + if (waiting >= submit_batch_size) + submit_batched(); +} + +/* + * get_io_info_struct + * + * Description: Get an I/O struct. + * Returns: Pointer to the struct prepared for use. + */ +static struct io_info *get_io_info_struct(void) +{ + struct io_info *this = NULL; + + do { + while (atomic_read(&outstanding_io) >= max_outstanding_io) + do_bio_wait(); + + this = kmalloc(sizeof(struct io_info), GFP_ATOMIC); + } while (!this); + + INIT_LIST_HEAD(&this->list); + return this; +} + +/* + * suspend_do_io + * + * Description: Prepare and start a read or write operation. + * Note that we use our own buffer for reading or writing. + * This simplifies doing readahead and asynchronous writing. + * We can begin a read without knowing the location into which + * the data will eventually be placed, and the buffer passed + * for a write can be reused immediately (essential for the + * modules system). + * Failure? What's that? + * Returns: The io_info struct created. + */ +static int suspend_do_io(int writing, struct block_device *bdev, long block0, + struct page *page, int readahead_index, int syncio) +{ + struct io_info *io_info; + unsigned long buffer_virt = 0; + char *to, *from; + + io_info = get_io_info_struct(); + + /* Done before submitting to avoid races. */ + if (syncio) + waiting_on = io_info; + + /* Get our local buffer */ + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1, + "Start_IO: [%p]", io_info); + + /* Copy settings to the io_info struct */ + io_info->writing = writing; + io_info->dev = bdev; + io_info->first_block = block0; + io_info->dest_page = page; + io_info->readahead_index = readahead_index; + + if (io_info->readahead_index == -1) { + while (!(buffer_virt = get_zeroed_page(GFP_ATOMIC | __GFP_NOWARN))) + do_bio_wait(); + + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0, + "[ALLOC BUFFER]->%d", + real_nr_free_pages(all_zones_mask)); + io_info->bio_page = virt_to_page(buffer_virt); + } else { + unsigned long flags; + int index = io_info->readahead_index / BITS_PER_LONG; + int bit = io_info->readahead_index - index * BITS_PER_LONG; + + spin_lock_irqsave(&suspend_readahead_flags_lock, flags); + clear_bit(bit, &suspend_readahead_flags[index]); + spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags); + + io_info->bio_page = page; + } + + /* If writing, copy our data. The data is probably in + * lowmem, but we cannot be certain. 
If there is no + * compression/encryption, we might be passed the + * actual source page's address. */ + if (writing) { + to = (char *) buffer_virt; + from = kmap_atomic(page, KM_USER1); + memcpy(to, from, PAGE_SIZE); + kunmap_atomic(from, KM_USER1); + } + + /* Submit the page */ + get_page(io_info->bio_page); + + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1, + "-> (PRE BRW) %d\n", real_nr_free_pages(all_zones_mask)); + + if (syncio) + submit(io_info); + else + add_to_batch(io_info); + + atomic_inc(&outstanding_io); + + if (syncio) + do { do_bio_wait(); } while (waiting_on); + + return 0; +} + +/* We used to use bread here, but it doesn't correctly handle + * blocksize != PAGE_SIZE. Now we create a submit_info to get the data we + * want and use our normal routines (synchronously). + */ + +static int suspend_bdev_page_io(int writing, struct block_device *bdev, + long pos, struct page *page) +{ + return suspend_do_io(writing, bdev, pos, page, -1, 1); +} + +static int suspend_bio_memory_needed(void) +{ + /* We want to have at least enough memory so as to have + * max_outstanding_io transactions on the fly at once. If we + * can do more, fine. */ + return (max_outstanding_io * (PAGE_SIZE + sizeof(struct request) + + sizeof(struct bio) + sizeof(struct io_info))); +} + +static void suspend_set_devinfo(struct suspend_bdev_info *info) +{ + suspend_devinfo = info; +} + +static void dump_block_chains(void) +{ + int i; + + for (i = 0; i < suspend_writer_posn.num_chains; i++) { + struct extent *this; + + printk("Chain %d:", i); + + this = (suspend_writer_posn.chains + i)->first; + + if (!this) + printk(" (Empty)"); + + while (this) { + printk(" [%lu-%lu]%s", this->minimum, this->maximum, + this->next ? "," : ""); + this = this->next; + } + + printk("\n"); + } + + for (i = 0; i < 3; i++) + printk("Posn %d: Chain %d, extent %d, offset %lu.\n", i, + suspend_writer_posn_save[i].chain_num, + suspend_writer_posn_save[i].extent_num, + suspend_writer_posn_save[i].offset); +} +static int forward_extra_blocks(void) +{ + int i; + + for (i = 1; i < suspend_devinfo[suspend_writer_posn.current_chain]. + blocks_per_page; i++) + suspend_extent_state_next(&suspend_writer_posn); + + if (suspend_extent_state_eof(&suspend_writer_posn)) { + printk("Extent state eof.\n"); + dump_block_chains(); + return -ENODATA; + } + + return 0; +} + +static int forward_one_page(void) +{ + int at_start = (suspend_writer_posn.current_chain == -1); + + /* Have to go forward one to ensure we're on the right chain, + * before we can know how many more blocks to skip.*/ + suspend_extent_state_next(&suspend_writer_posn); + + if (!at_start && forward_extra_blocks()) + return -ENODATA; + + if (extra_page_forward) { + extra_page_forward = 0; + return forward_one_page(); + } + + return 0; +} + +/* Used in reading header, to jump to 2nd page after getting 1st page + * direct from image header. 
*/ +static void set_extra_page_forward(void) +{ + extra_page_forward = 1; +} + +static int suspend_bio_rw_page(int writing, struct page *page, + int readahead_index, int sync) +{ + struct suspend_bdev_info *dev_info; + + if (test_action_state(SUSPEND_TEST_FILTER_SPEED)) + return 0; + + if (forward_one_page()) { + printk("Failed to advance a page in the extent data.\n"); + return -ENODATA; + } + + if (current_stream == 0 && writing && + suspend_writer_posn.current_chain == suspend_writer_posn_save[2].chain_num && + suspend_writer_posn.current_offset == suspend_writer_posn_save[2].offset) { + dump_block_chains(); + BUG(); + } + + dev_info = &suspend_devinfo[suspend_writer_posn.current_chain]; + + return suspend_do_io(writing, dev_info->bdev, + suspend_writer_posn.current_offset << + dev_info->bmap_shift, + page, readahead_index, sync); +} + +static int suspend_rw_init(int writing, int stream_number) +{ + suspend_header_bytes_used = 0; + + suspend_extent_state_restore(&suspend_writer_posn, + &suspend_writer_posn_save[stream_number]); + + suspend_writer_buffer_posn = writing ? 0 : PAGE_SIZE; + + current_stream = stream_number; + + readahead_index = readahead_submit_index = -1; + + pr_index = 0; + + return 0; +} + +static void suspend_read_header_init(void) +{ + readahead_index = readahead_submit_index = -1; +} + +static int suspend_rw_cleanup(int writing) +{ + if (writing && suspend_bio_rw_page(WRITE, + virt_to_page(suspend_writer_buffer), -1, 0)) + return -EIO; + + if (writing && current_stream == 2) + suspend_extent_state_save(&suspend_writer_posn, + &suspend_writer_posn_save[1]); + + suspend_finish_all_io(); + + if (!writing) + while (readahead_index != readahead_submit_index) { + suspend_cleanup_readahead(readahead_index); + readahead_index++; + if (readahead_index == max_outstanding_io) + readahead_index = 0; + } + + current_stream = 0; + + return 0; +} + +static int suspend_bio_read_page_with_readahead(void) +{ + static int last_result; + unsigned long *virt; + + if (readahead_index == -1) { + last_result = 0; + readahead_index = readahead_submit_index = 0; + } + + /* Start a new readahead? 
*/ + if (last_result) { + /* We failed to submit a read, and have cleaned up + * all the readahead previously submitted */ + if (readahead_submit_index == readahead_index) { + abort_suspend(SUSPEND_FAILED_IO, "Failed to submit" + " a read and no readahead left.\n"); + return -EIO; + } + goto wait; + } + + do { + if (suspend_prepare_readahead(readahead_submit_index)) + break; + + last_result = suspend_bio_rw_page(READ, + suspend_readahead_pages[readahead_submit_index], + readahead_submit_index, SUSPEND_ASYNC); + if (last_result) { + printk("Begin read chunk for page %d returned %d.\n", + readahead_submit_index, last_result); + suspend_cleanup_readahead(readahead_submit_index); + break; + } + + readahead_submit_index++; + + if (readahead_submit_index == max_outstanding_io) + readahead_submit_index = 0; + + } while((!last_result) && (readahead_submit_index != readahead_index) && + (!suspend_readahead_ready(readahead_index))); + +wait: + suspend_wait_on_readahead(readahead_index); + + virt = kmap_atomic(suspend_readahead_pages[readahead_index], KM_USER1); + memcpy(suspend_writer_buffer, virt, PAGE_SIZE); + kunmap_atomic(virt, KM_USER1); + + suspend_cleanup_readahead(readahead_index); + + readahead_index++; + if (readahead_index == max_outstanding_io) + readahead_index = 0; + + return 0; +} + +/* + * + */ + +static int suspend_rw_buffer(int writing, char *buffer, int buffer_size) +{ + int bytes_left = buffer_size; + + /* Read/write a chunk of the header */ + while (bytes_left) { + char *source_start = buffer + buffer_size - bytes_left; + char *dest_start = suspend_writer_buffer + suspend_writer_buffer_posn; + int capacity = PAGE_SIZE - suspend_writer_buffer_posn; + char *to = writing ? dest_start : source_start; + char *from = writing ? source_start : dest_start; + + if (bytes_left <= capacity) { + if (test_debug_state(SUSPEND_HEADER)) + printk("Copy %d bytes %d-%d from %p to %p.\n", + bytes_left, + suspend_header_bytes_used, + suspend_header_bytes_used + bytes_left, + from, to); + memcpy(to, from, bytes_left); + suspend_writer_buffer_posn += bytes_left; + suspend_header_bytes_used += bytes_left; + return 0; + } + + /* Complete this page and start a new one */ + if (test_debug_state(SUSPEND_HEADER)) + printk("Copy %d bytes (%d-%d) from %p to %p.\n", + capacity, + suspend_header_bytes_used, + suspend_header_bytes_used + capacity, + from, to); + memcpy(to, from, capacity); + bytes_left -= capacity; + suspend_header_bytes_used += capacity; + + if (!writing) { + if (test_suspend_state(SUSPEND_TRY_RESUME_RD)) + sys_read(suspend_read_fd, + suspend_writer_buffer, BLOCK_SIZE); + else + if (suspend_bio_read_page_with_readahead()) + return -EIO; + } else if (suspend_bio_rw_page(WRITE, + virt_to_page(suspend_writer_buffer), + -1, SUSPEND_ASYNC)) + return -EIO; + + suspend_writer_buffer_posn = 0; + suspend_cond_pause(0, NULL); + } + + return 0; +} + +/* + * suspend_bio_read_chunk + * + * Read a (possibly compressed and/or encrypted) page from the image, + * into buffer_page, returning it's index and the buffer size. + * + * If asynchronous I/O is requested, use readahead. 
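+ *
+ * Each chunk in the stream has the layout produced by
+ * suspend_bio_write_chunk():
+ *
+ *	unsigned long index;	which page this chunk belongs to
+ *	int buf_size;		number of data bytes that follow
+ *	char data[buf_size];	possibly compressed/encrypted page data
+ *
+ * suspend_rw_buffer() packs these fields back to back in the PAGE_SIZE
+ * stream buffer, so a chunk may straddle a page boundary.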
+ */ + +static int suspend_bio_read_chunk(unsigned long *index, struct page *buffer_page, + unsigned int *buf_size, int sync) +{ + int result; + char *buffer_virt = kmap(buffer_page); + + pr_index++; + + while (!mutex_trylock(&suspend_bio_mutex)) + do_bio_wait(); + + if ((result = suspend_rw_buffer(READ, (char *) index, + sizeof(unsigned long)))) { + abort_suspend(SUSPEND_FAILED_IO, + "Read of index returned %d.\n", result); + goto out; + } + + if ((result = suspend_rw_buffer(READ, (char *) buf_size, sizeof(int)))) { + abort_suspend(SUSPEND_FAILED_IO, + "Read of buffer size is %d.\n", result); + goto out; + } + + result = suspend_rw_buffer(READ, buffer_virt, *buf_size); + if (result) + abort_suspend(SUSPEND_FAILED_IO, + "Read of data returned %d.\n", result); + + PR_DEBUG("%d: Index %ld, %d bytes.\n", pr_index, *index, *buf_size); +out: + mutex_unlock(&suspend_bio_mutex); + kunmap(buffer_page); + if (result) + abort_suspend(SUSPEND_FAILED_IO, + "Returning %d from suspend_bio_read_chunk.\n", result); + return result; +} + +/* + * suspend_bio_write_chunk + * + * Write a (possibly compressed and/or encrypted) page to the image from + * the buffer, together with it's index and buffer size. + */ + +static int suspend_bio_write_chunk(unsigned long index, struct page *buffer_page, + unsigned int buf_size) +{ + int result; + char *buffer_virt = kmap(buffer_page); + + pr_index++; + + while (!mutex_trylock(&suspend_bio_mutex)) + do_bio_wait(); + + if ((result = suspend_rw_buffer(WRITE, (char *) &index, + sizeof(unsigned long)))) + goto out; + + if ((result = suspend_rw_buffer(WRITE, (char *) &buf_size, sizeof(int)))) + goto out; + + result = suspend_rw_buffer(WRITE, buffer_virt, buf_size); + + PR_DEBUG("%d: Index %ld, %d bytes.\n", pr_index, index, buf_size); +out: + mutex_unlock(&suspend_bio_mutex); + kunmap(buffer_page); + return result; +} + +/* + * suspend_rw_header_chunk + * + * Read or write a portion of the header. + */ + +static int suspend_rw_header_chunk(int writing, + struct suspend_module_ops *owner, + char *buffer, int buffer_size) +{ + if (owner) { + owner->header_used += buffer_size; + if (owner->header_used > owner->header_requested) { + printk(KERN_EMERG "Suspend2 module %s is using more" + "header space (%u) than it requested (%u).\n", + owner->name, + owner->header_used, + owner->header_requested); + return buffer_size; + } + } + + return suspend_rw_buffer(writing, buffer, buffer_size); +} + +/* + * write_header_chunk_finish + * + * Flush any buffered writes in the section of the image. + */ +static int write_header_chunk_finish(void) +{ + return suspend_bio_rw_page(WRITE, virt_to_page(suspend_writer_buffer), + -1, 0) ? -EIO : 0; +} + +static int suspend_bio_storage_needed(void) +{ + return 2 * sizeof(int); +} + +static int suspend_bio_save_config_info(char *buf) +{ + int *ints = (int *) buf; + ints[0] = max_outstanding_io; + ints[1] = submit_batch_size; + return 2 * sizeof(int); +} + +static void suspend_bio_load_config_info(char *buf, int size) +{ + int *ints = (int *) buf; + max_outstanding_io = ints[0]; + submit_batch_size = ints[1]; +} + +static int suspend_bio_initialise(int starting_cycle) +{ + suspend_writer_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + + return suspend_writer_buffer ? 
0 : -ENOMEM; +} + +static void suspend_bio_cleanup(int finishing_cycle) +{ + if (suspend_writer_buffer) { + free_page((unsigned long) suspend_writer_buffer); + suspend_writer_buffer = NULL; + } +} + +struct suspend_bio_ops suspend_bio_ops = { + .bdev_page_io = suspend_bdev_page_io, + .finish_all_io = suspend_finish_all_io, + .forward_one_page = forward_one_page, + .set_extra_page_forward = set_extra_page_forward, + .set_devinfo = suspend_set_devinfo, + .read_chunk = suspend_bio_read_chunk, + .write_chunk = suspend_bio_write_chunk, + .rw_init = suspend_rw_init, + .rw_cleanup = suspend_rw_cleanup, + .read_header_init = suspend_read_header_init, + .rw_header_chunk = suspend_rw_header_chunk, + .write_header_chunk_finish = write_header_chunk_finish, +}; + +static struct suspend_sysfs_data sysfs_params[] = { + { SUSPEND2_ATTR("max_outstanding_io", SYSFS_RW), + SYSFS_INT(&max_outstanding_io, 16, MAX_OUTSTANDING_IO, 0), + }, + + { SUSPEND2_ATTR("submit_batch_size", SYSFS_RW), + SYSFS_INT(&submit_batch_size, 16, SUBMIT_BATCH_SIZE, 0), + } +}; + +static struct suspend_module_ops suspend_blockwriter_ops = +{ + .name = "Block I/O", + .type = MISC_MODULE, + .directory = "block_io", + .module = THIS_MODULE, + .memory_needed = suspend_bio_memory_needed, + .storage_needed = suspend_bio_storage_needed, + .save_config_info = suspend_bio_save_config_info, + .load_config_info = suspend_bio_load_config_info, + .initialise = suspend_bio_initialise, + .cleanup = suspend_bio_cleanup, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +static __init int suspend_block_io_load(void) +{ + return suspend_register_module(&suspend_blockwriter_ops); +} + +#ifdef CONFIG_SUSPEND2_FILE_EXPORTS +EXPORT_SYMBOL_GPL(suspend_read_fd); +#endif +#if defined(CONFIG_SUSPEND2_FILE_EXPORTS) || defined(CONFIG_SUSPEND2_SWAP_EXPORTS) +EXPORT_SYMBOL_GPL(suspend_writer_posn); +EXPORT_SYMBOL_GPL(suspend_writer_posn_save); +EXPORT_SYMBOL_GPL(suspend_writer_buffer); +EXPORT_SYMBOL_GPL(suspend_writer_buffer_posn); +EXPORT_SYMBOL_GPL(suspend_header_bytes_used); +EXPORT_SYMBOL_GPL(suspend_bio_ops); +#endif +#ifdef MODULE +static __exit void suspend_block_io_unload(void) +{ + suspend_unregister_module(&suspend_blockwriter_ops); +} + +module_init(suspend_block_io_load); +module_exit(suspend_block_io_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 block io functions"); +#else +late_initcall(suspend_block_io_load); +#endif diff --git a/kernel/power/suspend_compress.c b/kernel/power/suspend_compress.c new file mode 100644 index 0000000..6e70e33 --- /dev/null +++ b/kernel/power/suspend_compress.c @@ -0,0 +1,436 @@ +/* + * kernel/power/compression.c + * + * Copyright (C) 2003-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * This file contains data compression routines for suspend, + * using cryptoapi. 
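+ *
+ * The compressor is selected by cryptoapi name ("lzf" by default) via
+ * the "algorithm" sysfs entry declared at the bottom of this file.
+ * Assuming the /sys/power/suspend2 root used elsewhere in this patch,
+ * any compressor built into the kernel can be chosen at runtime, e.g.:
+ *
+ *	echo deflate > /sys/power/suspend2/compression/algorithm
+ *	echo 1 > /sys/power/suspend2/compression/enabled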
+ */ + +#include +#include +#include +#include +#include + +#include "suspend2_builtin.h" +#include "suspend.h" +#include "modules.h" +#include "sysfs.h" +#include "io.h" +#include "ui.h" + +static int suspend_expected_compression = 0; + +static struct suspend_module_ops suspend_compression_ops; +static struct suspend_module_ops *next_driver; + +static char suspend_compressor_name[32] = "lzf"; + +static DEFINE_MUTEX(stats_lock); + +struct cpu_context { + u8 * page_buffer; + struct crypto_comp *transform; + unsigned int len; + char *buffer_start; +}; + +static DEFINE_PER_CPU(struct cpu_context, contexts); + +static int suspend_compress_prepare_result; + +/* + * suspend_compress_cleanup + * + * Frees memory allocated for our labours. + */ +static void suspend_compress_cleanup(int suspend_or_resume) +{ + int cpu; + + if (!suspend_or_resume) + return; + + for_each_online_cpu(cpu) { + struct cpu_context *this = &per_cpu(contexts, cpu); + if (this->transform) { + crypto_free_comp(this->transform); + this->transform = NULL; + } + + if (this->page_buffer) + free_page((unsigned long) this->page_buffer); + + this->page_buffer = NULL; + } +} + +/* + * suspend_crypto_prepare + * + * Prepare to do some work by allocating buffers and transforms. + */ +static int suspend_compress_crypto_prepare(void) +{ + int cpu; + + if (!*suspend_compressor_name) { + printk("Suspend2: Compression enabled but no compressor name set.\n"); + return 1; + } + + for_each_online_cpu(cpu) { + struct cpu_context *this = &per_cpu(contexts, cpu); + this->transform = crypto_alloc_comp(suspend_compressor_name, + 0, 0); + if (IS_ERR(this->transform)) { + printk("Suspend2: Failed to initialise the %s " + "compression transform.\n", + suspend_compressor_name); + this->transform = NULL; + return 1; + } + + this->page_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + + if (!this->page_buffer) { + printk(KERN_ERR + "Failed to allocate a page buffer for suspend2 " + "encryption driver.\n"); + return -ENOMEM; + } + } + + return 0; +} + +/* + * suspend_compress_init + */ + +static int suspend_compress_init(int suspend_or_resume) +{ + if (!suspend_or_resume) + return 0; + + suspend_compress_bytes_in = suspend_compress_bytes_out = 0; + + next_driver = suspend_get_next_filter(&suspend_compression_ops); + + if (!next_driver) { + printk("Compression Driver: Argh! Nothing follows me in" + " the pipeline!\n"); + return -ECHILD; + } + + suspend_compress_prepare_result = suspend_compress_crypto_prepare(); + + return 0; +} + +/* + * suspend_compress_rw_init() + */ + +int suspend_compress_rw_init(int rw, int stream_number) +{ + if (suspend_compress_prepare_result) { + printk("Failed to initialise compression algorithm.\n"); + if (rw == READ) + return -ENODEV; + else + suspend_compression_ops.enabled = 0; + } + + return 0; +} + +/* + * suspend_compress_write_chunk() + * + * Compress a page of data, buffering output and passing on filled + * pages to the next module in the pipeline. + * + * Buffer_page: Pointer to a buffer of size PAGE_SIZE, containing + * data to be compressed. + * + * Returns: 0 on success. Otherwise the error is that returned by later + * modules, -ECHILD if we have a broken pipeline or -EIO if + * zlib errs. 
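+ *
+ * Pages that do not shrink are passed through unchanged: if the
+ * compressed length is not smaller than the input, the original buffer
+ * is written instead. With 4 KB pages, for example, a page that
+ * compresses to 1800 bytes is written as an 1800-byte chunk, while an
+ * incompressible page is written as the full 4096 bytes; the read side
+ * treats a chunk of exactly PAGE_SIZE bytes as uncompressed data.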
+ */ +static int suspend_compress_write_chunk(unsigned long index, + struct page *buffer_page, unsigned int buf_size) +{ + int ret, cpu = smp_processor_id(); + struct cpu_context *ctx = &per_cpu(contexts, cpu); + + if (!ctx->transform) + return next_driver->write_chunk(index, buffer_page, buf_size); + + ctx->buffer_start = kmap(buffer_page); + + ctx->len = buf_size; + + ret = crypto_comp_compress(ctx->transform, + ctx->buffer_start, buf_size, + ctx->page_buffer, &ctx->len); + + kunmap(buffer_page); + + if (ret) { + printk("Compression failed.\n"); + goto failure; + } + + mutex_lock(&stats_lock); + suspend_compress_bytes_in += buf_size; + suspend_compress_bytes_out += ctx->len; + mutex_unlock(&stats_lock); + + if (ctx->len < buf_size) /* some compression */ + ret = next_driver->write_chunk(index, + virt_to_page(ctx->page_buffer), + ctx->len); + else + ret = next_driver->write_chunk(index, buffer_page, buf_size); + +failure: + return ret; +} + +/* + * suspend_compress_read_chunk() + * @buffer_page: struct page *. Pointer to a buffer of size PAGE_SIZE. + * @sync: int. Whether the previous module (or core) wants its data + * synchronously. + * + * Retrieve data from later modules and decompress it until the input buffer + * is filled. + * Zero if successful. Error condition from me or from downstream on failure. + */ +static int suspend_compress_read_chunk(unsigned long *index, + struct page *buffer_page, unsigned int *buf_size, int sync) +{ + int ret, cpu = smp_processor_id(); + unsigned int len; + unsigned int outlen = PAGE_SIZE; + char *buffer_start; + struct cpu_context *ctx = &per_cpu(contexts, cpu); + + if (!ctx->transform) + return next_driver->read_chunk(index, buffer_page, buf_size, + sync); + + /* + * All our reads must be synchronous - we can't decompress + * data that hasn't been read yet. + */ + + *buf_size = PAGE_SIZE; + + ret = next_driver->read_chunk(index, buffer_page, &len, SUSPEND_SYNC); + + /* Error or uncompressed data */ + if (ret || len == PAGE_SIZE) + return ret; + + buffer_start = kmap(buffer_page); + memcpy(ctx->page_buffer, buffer_start, len); + ret = crypto_comp_decompress( + ctx->transform, + ctx->page_buffer, + len, buffer_start, &outlen); + if (ret) + abort_suspend(SUSPEND_FAILED_IO, + "Compress_read returned %d.\n", ret); + else if (outlen != PAGE_SIZE) { + abort_suspend(SUSPEND_FAILED_IO, + "Decompression yielded %d bytes instead of %ld.\n", + outlen, PAGE_SIZE); + ret = -EIO; + *buf_size = outlen; + } + kunmap(buffer_page); + return ret; +} + +/* + * suspend_compress_print_debug_stats + * @buffer: Pointer to a buffer into which the debug info will be printed. + * @size: Size of the buffer. + * + * Print information to be recorded for debugging purposes into a buffer. + * Returns: Number of characters written to the buffer. + */ + +static int suspend_compress_print_debug_stats(char *buffer, int size) +{ + int pages_in = suspend_compress_bytes_in >> PAGE_SHIFT, + pages_out = suspend_compress_bytes_out >> PAGE_SHIFT; + int len; + + /* Output the compression ratio achieved. 
*/ + if (*suspend_compressor_name) + len = snprintf_used(buffer, size, "- Compressor is '%s'.\n", + suspend_compressor_name); + else + len = snprintf_used(buffer, size, "- Compressor is not set.\n"); + + if (pages_in) + len+= snprintf_used(buffer+len, size - len, + " Compressed %ld bytes into %ld (%d percent compression).\n", + suspend_compress_bytes_in, + suspend_compress_bytes_out, + (pages_in - pages_out) * 100 / pages_in); + return len; +} + +/* + * suspend_compress_compression_memory_needed + * + * Tell the caller how much memory we need to operate during suspend/resume. + * Returns: Unsigned long. Maximum number of bytes of memory required for + * operation. + */ +static int suspend_compress_memory_needed(void) +{ + return 2 * PAGE_SIZE; +} + +static int suspend_compress_storage_needed(void) +{ + return 4 * sizeof(unsigned long) + strlen(suspend_compressor_name) + 1; +} + +/* + * suspend_compress_save_config_info + * @buffer: Pointer to a buffer of size PAGE_SIZE. + * + * Save informaton needed when reloading the image at resume time. + * Returns: Number of bytes used for saving our data. + */ +static int suspend_compress_save_config_info(char *buffer) +{ + int namelen = strlen(suspend_compressor_name) + 1; + int total_len; + + *((unsigned long *) buffer) = suspend_compress_bytes_in; + *((unsigned long *) (buffer + 1 * sizeof(unsigned long))) = + suspend_compress_bytes_out; + *((unsigned long *) (buffer + 2 * sizeof(unsigned long))) = + suspend_expected_compression; + *((unsigned long *) (buffer + 3 * sizeof(unsigned long))) = namelen; + strncpy(buffer + 4 * sizeof(unsigned long), suspend_compressor_name, + namelen); + total_len = 4 * sizeof(unsigned long) + namelen; + return total_len; +} + +/* suspend_compress_load_config_info + * @buffer: Pointer to the start of the data. + * @size: Number of bytes that were saved. + * + * Description: Reload information needed for decompressing the image at + * resume time. + */ +static void suspend_compress_load_config_info(char *buffer, int size) +{ + int namelen; + + suspend_compress_bytes_in = *((unsigned long *) buffer); + suspend_compress_bytes_out = *((unsigned long *) (buffer + 1 * sizeof(unsigned long))); + suspend_expected_compression = *((unsigned long *) (buffer + 2 * + sizeof(unsigned long))); + namelen = *((unsigned long *) (buffer + 3 * sizeof(unsigned long))); + strncpy(suspend_compressor_name, buffer + 4 * sizeof(unsigned long), + namelen); + return; +} + +/* + * suspend_expected_compression_ratio + * + * Description: Returns the expected ratio between data passed into this module + * and the amount of data output when writing. + * Returns: 100 if the module is disabled. Otherwise the value set by the + * user via our sysfs entry. + */ + +static int suspend_compress_expected_ratio(void) +{ + if (!suspend_compression_ops.enabled) + return 100; + else + return 100 - suspend_expected_compression; +} + +/* + * data for our sysfs entries. + */ +static struct suspend_sysfs_data sysfs_params[] = { + { + SUSPEND2_ATTR("expected_compression", SYSFS_RW), + SYSFS_INT(&suspend_expected_compression, 0, 99, 0) + }, + + { + SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&suspend_compression_ops.enabled, 0, 1, 0) + }, + + { + SUSPEND2_ATTR("algorithm", SYSFS_RW), + SYSFS_STRING(suspend_compressor_name, 31, 0) + } +}; + +/* + * Ops structure. 
+ */ +static struct suspend_module_ops suspend_compression_ops = { + .type = FILTER_MODULE, + .name = "Compressor", + .directory = "compression", + .module = THIS_MODULE, + .initialise = suspend_compress_init, + .cleanup = suspend_compress_cleanup, + .memory_needed = suspend_compress_memory_needed, + .print_debug_info = suspend_compress_print_debug_stats, + .save_config_info = suspend_compress_save_config_info, + .load_config_info = suspend_compress_load_config_info, + .storage_needed = suspend_compress_storage_needed, + .expected_compression = suspend_compress_expected_ratio, + + .rw_init = suspend_compress_rw_init, + + .write_chunk = suspend_compress_write_chunk, + .read_chunk = suspend_compress_read_chunk, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* ---- Registration ---- */ + +static __init int suspend_compress_load(void) +{ + return suspend_register_module(&suspend_compression_ops); +} + +#ifdef MODULE +static __exit void suspend_compress_unload(void) +{ + suspend_unregister_module(&suspend_compression_ops); +} + +module_init(suspend_compress_load); +module_exit(suspend_compress_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Compression Support for Suspend2"); +#else +late_initcall(suspend_compress_load); +#endif diff --git a/kernel/power/suspend_file.c b/kernel/power/suspend_file.c new file mode 100644 index 0000000..e523f8b --- /dev/null +++ b/kernel/power/suspend_file.c @@ -0,0 +1,1131 @@ +/* + * kernel/power/suspend_file.c + * + * Copyright (C) 2005-2007 Nigel Cunningham (nigel at suspend2 net) + * + * Distributed under GPLv2. + * + * This file encapsulates functions for usage of a simple file as a + * backing store. It is based upon the swapallocator, and shares the + * same basic working. Here, though, we have nothing to do with + * swapspace, and only one device to worry about. + * + * The user can just + * + * echo Suspend2 > /path/to/my_file + * + * and + * + * echo /path/to/my_file > /sys/power/suspend2/suspend_file/target + * + * then put what they find in /sys/power/suspend2/resume2 + * as their resume2= parameter in lilo.conf (and rerun lilo if using it). + * + * Having done this, they're ready to suspend and resume. + * + * TODO: + * - File resizing. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "suspend.h" +#include "sysfs.h" +#include "modules.h" +#include "ui.h" +#include "extent.h" +#include "io.h" +#include "storage.h" +#include "block_io.h" + +static struct suspend_module_ops suspend_fileops; + +/* Details of our target. 
*/ + +char suspend_file_target[256]; +static struct inode *target_inode; +static struct file *target_file; +static struct block_device *suspend_file_target_bdev; +static dev_t resume_file_dev_t; +static int used_devt = 0; +static int setting_suspend_file_target = 0; +static sector_t target_firstblock = 0, target_header_start = 0; +static int target_storage_available = 0; +static int target_claim = 0; + +static char HaveImage[] = "HaveImage\n"; +static char NoImage[] = "Suspend2\n"; +#define sig_size (sizeof(HaveImage) + 1) + +struct suspend_file_header { + char sig[sig_size]; + int resumed_before; + unsigned long first_header_block; +}; + +extern char *__initdata root_device_name; + +/* Header Page Information */ +static int header_pages_allocated; + +/* Main Storage Pages */ +static int main_pages_allocated, main_pages_requested; + +#define target_is_normal_file() (S_ISREG(target_inode->i_mode)) + +static struct suspend_bdev_info devinfo; + +/* Extent chain for blocks */ +static struct extent_chain block_chain; + +/* Signature operations */ +enum { + GET_IMAGE_EXISTS, + INVALIDATE, + MARK_RESUME_ATTEMPTED, + UNMARK_RESUME_ATTEMPTED, +}; + +static void set_devinfo(struct block_device *bdev, int target_blkbits) +{ + devinfo.bdev = bdev; + if (!target_blkbits) { + devinfo.bmap_shift = devinfo.blocks_per_page = 0; + } else { + devinfo.bmap_shift = target_blkbits - 9; + devinfo.blocks_per_page = (1 << (PAGE_SHIFT - target_blkbits)); + } +} + +static int adjust_for_extra_pages(int unadjusted) +{ + return (unadjusted << PAGE_SHIFT) / (PAGE_SIZE + sizeof(unsigned long) + + sizeof(int)); +} + +static int suspend_file_storage_available(void) +{ + int result = 0; + struct block_device *bdev=suspend_file_target_bdev; + + if (!target_inode) + return 0; + + switch (target_inode->i_mode & S_IFMT) { + case S_IFSOCK: + case S_IFCHR: + case S_IFIFO: /* Socket, Char, Fifo */ + return -1; + case S_IFREG: /* Regular file: current size - holes + free + space on part */ + result = target_storage_available; + break; + case S_IFBLK: /* Block device */ + if (!bdev->bd_disk) { + printk("bdev->bd_disk null.\n"); + return 0; + } + + result = (bdev->bd_part ? 
+ bdev->bd_part->nr_sects : + bdev->bd_disk->capacity) >> (PAGE_SHIFT - 9); + } + + return adjust_for_extra_pages(result); +} + +static int has_contiguous_blocks(int page_num) +{ + int j; + sector_t last = 0; + + for (j = 0; j < devinfo.blocks_per_page; j++) { + sector_t this = bmap(target_inode, + page_num * devinfo.blocks_per_page + j); + + if (!this || (last && (last + 1) != this)) + break; + + last = this; + } + + return (j == devinfo.blocks_per_page); +} + +static int size_ignoring_ignored_pages(void) +{ + int mappable = 0, i; + + if (!target_is_normal_file()) + return suspend_file_storage_available(); + + for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT) ; i++) + if (has_contiguous_blocks(i)) + mappable++; + + return mappable; +} + +static void __populate_block_list(int min, int max) +{ + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent %d-%d.\n", min << devinfo.bmap_shift, + ((max + 1) << devinfo.bmap_shift) - 1); + + suspend_add_to_extent_chain(&block_chain, min, max); +} + +static void populate_block_list(void) +{ + int i; + int extent_min = -1, extent_max = -1, got_header = 0; + + if (block_chain.first) + suspend_put_extent_chain(&block_chain); + + if (!target_is_normal_file()) { + if (target_storage_available > 0) + __populate_block_list(devinfo.blocks_per_page, + (target_storage_available + 1) * + devinfo.blocks_per_page - 1); + return; + } + + for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT); i++) { + sector_t new_sector; + + if (!has_contiguous_blocks(i)) + continue; + + new_sector = bmap(target_inode, + (i * devinfo.blocks_per_page)); + + /* + * Ignore the first block in the file. + * It gets the header. + */ + if (new_sector == target_firstblock >> devinfo.bmap_shift) { + got_header = 1; + continue; + } + + /* + * I'd love to be able to fill in holes and resize + * files, but not yet... + */ + + if (new_sector == extent_max + 1) + extent_max+= devinfo.blocks_per_page; + else { + if (extent_min > -1) + __populate_block_list(extent_min, + extent_max); + + extent_min = new_sector; + extent_max = extent_min + + devinfo.blocks_per_page - 1; + } + } + + if (extent_min > -1) + __populate_block_list(extent_min, extent_max); +} + +static void suspend_file_cleanup(int finishing_cycle) +{ + if (suspend_file_target_bdev) { + if (target_claim) { + bd_release(suspend_file_target_bdev); + target_claim = 0; + } + + if (used_devt) { + blkdev_put(suspend_file_target_bdev); + used_devt = 0; + } + suspend_file_target_bdev = NULL; + target_inode = NULL; + set_devinfo(NULL, 0); + target_storage_available = 0; + } + + if (target_file > 0) { + filp_close(target_file, NULL); + target_file = NULL; + } +} + +/* + * reopen_resume_devt + * + * Having opened resume2= once, we remember the major and + * minor nodes and use them to reopen the bdev for checking + * whether an image exists (possibly when starting a resume). 
+ */ +static void reopen_resume_devt(void) +{ + suspend_file_target_bdev = open_by_devnum(resume_file_dev_t, FMODE_READ); + if (IS_ERR(suspend_file_target_bdev)) { + printk("Got a dev_num (%lx) but failed to open it.\n", + (unsigned long) resume_file_dev_t); + return; + } + target_inode = suspend_file_target_bdev->bd_inode; + set_devinfo(suspend_file_target_bdev, target_inode->i_blkbits); +} + +static void suspend_file_get_target_info(char *target, int get_size, + int resume2) +{ + if (target_file) + suspend_file_cleanup(0); + + if (!target || !strlen(target)) + return; + + target_file = filp_open(target, O_RDWR, 0); + + if (IS_ERR(target_file) || !target_file) { + + if (!resume2) { + printk("Open file %s returned %p.\n", + target, target_file); + target_file = NULL; + return; + } + + target_file = NULL; + resume_file_dev_t = name_to_dev_t(target); + if (!resume_file_dev_t) { + struct kstat stat; + int error = vfs_stat(target, &stat); + printk("Open file %s returned %p and name_to_devt " + "failed.\n", target, target_file); + if (error) + printk("Stating the file also failed." + " Nothing more we can do.\n"); + else + resume_file_dev_t = stat.rdev; + return; + } + + suspend_file_target_bdev = open_by_devnum(resume_file_dev_t, + FMODE_READ); + if (IS_ERR(suspend_file_target_bdev)) { + printk("Got a dev_num (%lx) but failed to open it.\n", + (unsigned long) resume_file_dev_t); + return; + } + used_devt = 1; + target_inode = suspend_file_target_bdev->bd_inode; + } else + target_inode = target_file->f_mapping->host; + + if (S_ISLNK(target_inode->i_mode) || S_ISDIR(target_inode->i_mode) || + S_ISSOCK(target_inode->i_mode) || S_ISFIFO(target_inode->i_mode)) { + printk("File support works with regular files, character " + "files and block devices.\n"); + goto cleanup; + } + + if (!used_devt) { + if (S_ISBLK(target_inode->i_mode)) { + suspend_file_target_bdev = I_BDEV(target_inode); + if (!bd_claim(suspend_file_target_bdev, &suspend_fileops)) + target_claim = 1; + } else + suspend_file_target_bdev = target_inode->i_sb->s_bdev; + resume_file_dev_t = suspend_file_target_bdev->bd_dev; + } + + set_devinfo(suspend_file_target_bdev, target_inode->i_blkbits); + + if (get_size) + target_storage_available = size_ignoring_ignored_pages(); + + if (!resume2) + target_firstblock = bmap(target_inode, 0) << devinfo.bmap_shift; + + return; +cleanup: + target_inode = NULL; + if (target_file) { + filp_close(target_file, NULL); + target_file = NULL; + } + set_devinfo(NULL, 0); + target_storage_available = 0; +} + +static int parse_signature(struct suspend_file_header *header) +{ + int have_image = !memcmp(HaveImage, header->sig, sizeof(HaveImage) - 1); + int no_image_header = !memcmp(NoImage, header->sig, sizeof(NoImage) - 1); + + if (no_image_header) + return 0; + + if (!have_image) + return -1; + + if (header->resumed_before) + set_suspend_state(SUSPEND_RESUMED_BEFORE); + else + clear_suspend_state(SUSPEND_RESUMED_BEFORE); + + target_header_start = header->first_header_block; + return 1; +} + +/* prepare_signature */ + +static int prepare_signature(struct suspend_file_header *current_header, + unsigned long first_header_block) +{ + strncpy(current_header->sig, HaveImage, sizeof(HaveImage)); + current_header->resumed_before = 0; + current_header->first_header_block = first_header_block; + return 0; +} + +static int suspend_file_storage_allocated(void) +{ + if (!target_inode) + return 0; + + if (target_is_normal_file()) + return (int) target_storage_available; + else + return header_pages_allocated + 
main_pages_requested;
+}
+
+static int suspend_file_release_storage(void)
+{
+	if (test_action_state(SUSPEND_KEEP_IMAGE) &&
+	    test_suspend_state(SUSPEND_NOW_RESUMING))
+		return 0;
+
+	suspend_put_extent_chain(&block_chain);
+
+	header_pages_allocated = 0;
+	main_pages_allocated = 0;
+	main_pages_requested = 0;
+	return 0;
+}
+
+static int __suspend_file_allocate_storage(int main_storage_requested,
+		int header_storage);
+
+static int suspend_file_allocate_header_space(int space_requested)
+{
+	int i;
+
+	if (!block_chain.first && __suspend_file_allocate_storage(
+				main_pages_requested, space_requested)) {
+		printk("Failed to allocate space for the header.\n");
+		return -ENOSPC;
+	}
+
+	suspend_extent_state_goto_start(&suspend_writer_posn);
+	suspend_bio_ops.forward_one_page(); /* To first page */
+
+	for (i = 0; i < space_requested; i++) {
+		if (suspend_bio_ops.forward_one_page()) {
+			printk("Out of space while seeking to allocate "
+				"header pages.\n");
+			header_pages_allocated = i;
+			return -ENOSPC;
+		}
+	}
+
+	header_pages_allocated = space_requested;
+
+	/* The end of header pages will be the start of pageset 2 */
+	suspend_extent_state_save(&suspend_writer_posn,
+			&suspend_writer_posn_save[2]);
+	return 0;
+}
+
+static int suspend_file_allocate_storage(int space_requested)
+{
+	if (__suspend_file_allocate_storage(space_requested,
+				header_pages_allocated))
+		return -ENOSPC;
+
+	main_pages_requested = space_requested;
+	return 0;
+}
+
+static int __suspend_file_allocate_storage(int main_space_requested,
+		int header_space_requested)
+{
+	int result = 0;
+
+	int extra_pages = DIV_ROUND_UP(main_space_requested *
+			(sizeof(unsigned long) + sizeof(int)), PAGE_SIZE);
+	int pages_to_get = main_space_requested + extra_pages +
+		header_space_requested;
+	int blocks_to_get = pages_to_get - block_chain.size;
+
+	/* Only release_storage reduces the size */
+	if (blocks_to_get < 1)
+		return 0;
+
+	populate_block_list();
+
+	suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+		"Finished with block_chain.size == %d.\n",
+		block_chain.size);
+
+	if (block_chain.size < pages_to_get) {
+		printk("Block chain size (%d) < header pages (%d) + extra pages (%d) + main pages (%d) (=%d pages).\n",
+			block_chain.size, header_pages_allocated, extra_pages,
+			main_space_requested, pages_to_get);
+		result = -ENOSPC;
+	}
+
+	main_pages_requested = main_space_requested;
+	main_pages_allocated = main_space_requested + extra_pages;
+
+	suspend_file_allocate_header_space(header_pages_allocated);
+	return result;
+}
+
+static int suspend_file_write_header_init(void)
+{
+	suspend_extent_state_goto_start(&suspend_writer_posn);
+
+	suspend_writer_buffer_posn = suspend_header_bytes_used = 0;
+
+	/* Info needed to bootstrap goes at the start of the header.
+	 * First we save the basic info needed for reading, including the number
+	 * of header pages. Then we save the structs containing data needed
+	 * for reading the header pages back.
+	 * Note that even if header pages take more than one page, when we
+	 * read back the info, we will have restored the location of the
+	 * next header page by the time we go to use it.
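+	 *
+	 * In write order the bootstrap section is therefore:
+	 *
+	 *	suspend_writer_posn_save[]	saved stream positions
+	 *	devinfo				block device parameters
+	 *	block_chain			serialised extent chain
+	 *
+	 * after which the rest of the image header (the other modules'
+	 * configuration data) follows through the same rw_header_chunk()
+	 * interface.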
+ */ + + suspend_bio_ops.rw_header_chunk(WRITE, &suspend_fileops, + (char *) &suspend_writer_posn_save, + sizeof(suspend_writer_posn_save)); + + suspend_bio_ops.rw_header_chunk(WRITE, &suspend_fileops, + (char *) &devinfo, sizeof(devinfo)); + + suspend_serialise_extent_chain(&suspend_fileops, &block_chain); + + return 0; +} + +static int suspend_file_write_header_cleanup(void) +{ + struct suspend_file_header *header; + + /* Write any unsaved data */ + if (suspend_writer_buffer_posn) + suspend_bio_ops.write_header_chunk_finish(); + + suspend_bio_ops.finish_all_io(); + + suspend_extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.forward_one_page(); + + /* Adjust image header */ + suspend_bio_ops.bdev_page_io(READ, suspend_file_target_bdev, + target_firstblock, + virt_to_page(suspend_writer_buffer)); + + header = (struct suspend_file_header *) suspend_writer_buffer; + + prepare_signature(header, + suspend_writer_posn.current_offset << + devinfo.bmap_shift); + + suspend_bio_ops.bdev_page_io(WRITE, suspend_file_target_bdev, + target_firstblock, + virt_to_page(suspend_writer_buffer)); + + suspend_bio_ops.finish_all_io(); + + return 0; +} + +/* HEADER READING */ + +#ifdef CONFIG_DEVFS_FS +int create_dev(char *name, dev_t dev, char *devfs_name); +#else +static int create_dev(char *name, dev_t dev, char *devfs_name) +{ + sys_unlink(name); + return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev)); +} +#endif + +static int rd_init(void) +{ + suspend_writer_buffer_posn = 0; + + create_dev("/dev/root", ROOT_DEV, root_device_name); + create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, 0), NULL); + + suspend_read_fd = sys_open("/dev/root", O_RDONLY, 0); + if (suspend_read_fd < 0) + goto out; + + sys_read(suspend_read_fd, suspend_writer_buffer, BLOCK_SIZE); + + memcpy(&suspend_writer_posn_save, + suspend_writer_buffer + suspend_writer_buffer_posn, + sizeof(suspend_writer_posn_save)); + + suspend_writer_buffer_posn += sizeof(suspend_writer_posn_save); + + return 0; +out: + sys_unlink("/dev/ram"); + sys_unlink("/dev/root"); + return -EIO; +} + +static int file_init(void) +{ + suspend_writer_buffer_posn = 0; + + /* Read suspend_file configuration */ + suspend_bio_ops.bdev_page_io(READ, suspend_file_target_bdev, + target_header_start, + virt_to_page((unsigned long) suspend_writer_buffer)); + + return 0; +} + +/* + * read_header_init() + * + * Ramdisk support based heavily on init/do_mounts_rd.c + * + * Description: + * 1. Attempt to read the device specified with resume2=. + * 2. Check the contents of the header for our signature. + * 3. Warn, ignore, reset and/or continue as appropriate. + * 4. If continuing, read the suspend_file configuration section + * of the header and set up block device info so we can read + * the rest of the header & image. + * + * Returns: + * May not return if user choose to reboot at a warning. + * -EINVAL if cannot resume at this time. Booting should continue + * normally. 
+ */ + +static int suspend_file_read_header_init(void) +{ + int result; + struct block_device *tmp; + + if (test_suspend_state(SUSPEND_TRY_RESUME_RD)) + result = rd_init(); + else + result = file_init(); + + if (result) { + printk("FileAllocator read header init: Failed to initialise " + "reading the first page of data.\n"); + return result; + } + + memcpy(&suspend_writer_posn_save, + suspend_writer_buffer + suspend_writer_buffer_posn, + sizeof(suspend_writer_posn_save)); + + suspend_writer_buffer_posn += sizeof(suspend_writer_posn_save); + + tmp = devinfo.bdev; + + memcpy(&devinfo, + suspend_writer_buffer + suspend_writer_buffer_posn, + sizeof(devinfo)); + + devinfo.bdev = tmp; + suspend_writer_buffer_posn += sizeof(devinfo); + + suspend_bio_ops.read_header_init(); + suspend_extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.set_extra_page_forward(); + + suspend_header_bytes_used = suspend_writer_buffer_posn; + + return suspend_load_extent_chain(&block_chain); +} + +static int suspend_file_read_header_cleanup(void) +{ + suspend_bio_ops.rw_cleanup(READ); + return 0; +} + +static int suspend_file_signature_op(int op) +{ + char *cur; + int result = 0, changed = 0; + struct suspend_file_header *header; + + if(suspend_file_target_bdev <= 0) + return -1; + + cur = (char *) get_zeroed_page(GFP_ATOMIC); + if (!cur) { + printk("Unable to allocate a page for reading the image " + "signature.\n"); + return -ENOMEM; + } + + suspend_bio_ops.bdev_page_io(READ, suspend_file_target_bdev, + target_firstblock, + virt_to_page(cur)); + + header = (struct suspend_file_header *) cur; + result = parse_signature(header); + + switch (op) { + case INVALIDATE: + if (result == -1) + goto out; + + strcpy(header->sig, NoImage); + header->resumed_before = 0; + result = changed = 1; + break; + case MARK_RESUME_ATTEMPTED: + if (result == 1) { + header->resumed_before = 1; + changed = 1; + } + break; + case UNMARK_RESUME_ATTEMPTED: + if (result == 1) { + header->resumed_before = 0; + changed = 1; + } + break; + } + + if (changed) + suspend_bio_ops.bdev_page_io(WRITE, suspend_file_target_bdev, + target_firstblock, + virt_to_page(cur)); + +out: + suspend_bio_ops.finish_all_io(); + free_page((unsigned long) cur); + return result; +} + +/* Print debug info + * + * Description: + */ + +static int suspend_file_print_debug_stats(char *buffer, int size) +{ + int len = 0; + + if (suspendActiveAllocator != &suspend_fileops) { + len = snprintf_used(buffer, size, "- FileAllocator inactive.\n"); + return len; + } + + len = snprintf_used(buffer, size, "- FileAllocator active.\n"); + + len+= snprintf_used(buffer+len, size-len, " Storage available for image: " + "%ld pages.\n", + suspend_file_storage_allocated()); + + return len; +} + +/* + * Storage needed + * + * Returns amount of space in the image header required + * for the suspend_file's data. + * + * We ensure the space is allocated, but actually save the + * data from write_header_init and therefore don't also define a + * save_config_info routine. 
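+ *
+ * The sum below breaks down as: the signature plus the target
+ * filename (and its terminating NUL), three saved iterator
+ * positions, the device info, the extent_chain structure less its
+ * two list pointers, and a start/end pair of unsigned longs for
+ * each extent in the block chain.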
+ */ +static int suspend_file_storage_needed(void) +{ + return sig_size + strlen(suspend_file_target) + 1 + + 3 * sizeof(struct extent_iterate_saved_state) + + sizeof(devinfo) + + sizeof(struct extent_chain) - 2 * sizeof(void *) + + (2 * sizeof(unsigned long) * block_chain.num_extents); +} + +/* + * suspend_file_invalidate_image + * + */ +static int suspend_file_invalidate_image(void) +{ + int result; + + suspend_file_release_storage(); + + result = suspend_file_signature_op(INVALIDATE); + if (result == 1 && !nr_suspends) + printk(KERN_WARNING "Suspend2: Image invalidated.\n"); + + return result; +} + +/* + * Image_exists + * + */ + +static int suspend_file_image_exists(void) +{ + if (!suspend_file_target_bdev) + reopen_resume_devt(); + + return suspend_file_signature_op(GET_IMAGE_EXISTS); +} + +/* + * Mark resume attempted. + * + * Record that we tried to resume from this image. + */ + +static void suspend_file_mark_resume_attempted(int mark) +{ + suspend_file_signature_op(mark ? MARK_RESUME_ATTEMPTED: + UNMARK_RESUME_ATTEMPTED); +} + +static void suspend_file_set_resume2(void) +{ + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + char *buffer2 = (char *) get_zeroed_page(GFP_ATOMIC); + unsigned long sector = bmap(target_inode, 0); + int offset = 0; + + if (suspend_file_target_bdev) { + set_devinfo(suspend_file_target_bdev, target_inode->i_blkbits); + + bdevname(suspend_file_target_bdev, buffer2); + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + "/dev/%s", buffer2); + + if (sector) + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + ":0x%lx", sector << devinfo.bmap_shift); + } else + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + "%s is not a valid target.", suspend_file_target); + + sprintf(resume2_file, "file:%s", buffer); + + free_page((unsigned long) buffer); + free_page((unsigned long) buffer2); + + suspend_attempt_to_parse_resume_device(1); +} + +static int __test_suspend_file_target(char *target, int resume_time, int quiet) +{ + suspend_file_get_target_info(target, 0, resume_time); + if (suspend_file_signature_op(GET_IMAGE_EXISTS) > -1) { + if (!quiet) + printk("Suspend2: FileAllocator: File signature found.\n"); + if (!resume_time) + suspend_file_set_resume2(); + + suspend_bio_ops.set_devinfo(&devinfo); + suspend_writer_posn.chains = &block_chain; + suspend_writer_posn.num_chains = 1; + + if (!resume_time) + set_suspend_state(SUSPEND_CAN_SUSPEND); + return 0; + } + + clear_suspend_state(SUSPEND_CAN_SUSPEND); + + if (quiet) + return 1; + + if (*target) + printk("Suspend2: FileAllocator: Sorry. No signature found at" + " %s.\n", target); + else + if (!resume_time) + printk("Suspend2: FileAllocator: Sorry. Target is not" + " set for suspending.\n"); + + return 1; +} + +static void test_suspend_file_target(void) +{ + setting_suspend_file_target = 1; + + printk("Suspend2: Suspending %sabled.\n", + __test_suspend_file_target(suspend_file_target, 0, 1) ? + "dis" : "en"); + + setting_suspend_file_target = 0; +} + +/* + * Parse Image Location + * + * Attempt to parse a resume2= parameter. + * Swap Writer accepts: + * resume2=file:DEVNAME[:FIRSTBLOCK] + * + * Where: + * DEVNAME is convertable to a dev_t by name_to_dev_t + * FIRSTBLOCK is the location of the first block in the file. + * BLOCKSIZE is the logical blocksize >= SECTOR_SIZE & <= PAGE_SIZE, + * mod SECTOR_SIZE == 0 of the device. + * Data is validated by attempting to read a header from the + * location given. 
Failure will result in suspend_file refusing to + * save an image, and a reboot with correct parameters will be + * necessary. + */ + +static int suspend_file_parse_sig_location(char *commandline, + int only_writer, int quiet) +{ + char *thischar, *devstart = NULL, *colon = NULL, *at_symbol = NULL; + int result = -EINVAL, target_blocksize = 0; + + if (strncmp(commandline, "file:", 5)) { + if (!only_writer) + return 1; + } else + commandline += 5; + + /* + * Don't check signature again if we're beginning a cycle. If we already + * did the initialisation successfully, assume we'll be okay when it comes + * to resuming. + */ + if (suspend_file_target_bdev) + return 0; + + devstart = thischar = commandline; + while ((*thischar != ':') && (*thischar != '@') && + ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == ':') { + colon = thischar; + *colon = 0; + thischar++; + } + + while ((*thischar != '@') && ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == '@') { + at_symbol = thischar; + *at_symbol = 0; + } + + /* + * For the suspend_file, you can be able to resume, but not suspend, + * because the resume2= is set correctly, but the suspend_file_target + * isn't. + * + * We may have come here as a result of setting resume2 or + * suspend_file_target. We only test the suspend_file target in the + * former case (it's already done in the later), and we do it before + * setting the block number ourselves. It will overwrite the values + * given on the command line if we don't. + */ + + if (!setting_suspend_file_target) + __test_suspend_file_target(suspend_file_target, 1, 0); + + if (colon) + target_firstblock = (int) simple_strtoul(colon + 1, NULL, 0); + else + target_firstblock = 0; + + if (at_symbol) { + target_blocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0); + if (target_blocksize & (SECTOR_SIZE - 1)) { + printk("FileAllocator: Blocksizes are multiples of %d.\n", SECTOR_SIZE); + result = -EINVAL; + goto out; + } + } + + if (!quiet) + printk("Suspend2 FileAllocator: Testing whether you can resume:\n"); + + suspend_file_get_target_info(commandline, 0, 1); + + if (!suspend_file_target_bdev || IS_ERR(suspend_file_target_bdev)) { + suspend_file_target_bdev = NULL; + result = -1; + goto out; + } + + if (target_blocksize) + set_devinfo(suspend_file_target_bdev, ffs(target_blocksize)); + + result = __test_suspend_file_target(commandline, 1, 0); + +out: + if (result) + clear_suspend_state(SUSPEND_CAN_SUSPEND); + + if (!quiet) + printk("Resuming %sabled.\n", result ? "dis" : "en"); + + if (colon) + *colon = ':'; + if (at_symbol) + *at_symbol = '@'; + + return result; +} + +/* suspend_file_save_config_info + * + * Description: Save the target's name, not for resume time, but for all_settings. + * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE. + * Returns: Number of bytes used for saving our data. + */ + +static int suspend_file_save_config_info(char *buffer) +{ + strcpy(buffer, suspend_file_target); + return strlen(suspend_file_target) + 1; +} + +/* suspend_file_load_config_info + * + * Description: Reload target's name. + * Arguments: Buffer: Pointer to the start of the data. + * Size: Number of bytes that were saved. 
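+ *
+ * The buffer was produced by suspend_file_save_config_info() above,
+ * so it already carries a terminating NUL; the size argument is not
+ * otherwise needed here.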
+ */ + +static void suspend_file_load_config_info(char *buffer, int size) +{ + strcpy(suspend_file_target, buffer); +} + +static int suspend_file_initialise(int starting_cycle) +{ + if (starting_cycle) { + if (suspendActiveAllocator != &suspend_fileops) + return 0; + + if (starting_cycle & SYSFS_SUSPEND && !*suspend_file_target) { + printk("FileAllocator is the active writer, " + "but no filename has been set.\n"); + return 1; + } + } + + if (suspend_file_target) + suspend_file_get_target_info(suspend_file_target, starting_cycle, 0); + + if (starting_cycle && (suspend_file_image_exists() == -1)) { + printk("%s is does not have a valid signature for suspending.\n", + suspend_file_target); + return 1; + } + + return 0; +} + +static struct suspend_sysfs_data sysfs_params[] = { + + { + SUSPEND2_ATTR("target", SYSFS_RW), + SYSFS_STRING(suspend_file_target, 256, SYSFS_NEEDS_SM_FOR_WRITE), + .write_side_effect = test_suspend_file_target, + }, + + { + SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&suspend_fileops.enabled, 0, 1, 0), + .write_side_effect = attempt_to_parse_resume_device2, + } +}; + +static struct suspend_module_ops suspend_fileops = { + .type = WRITER_MODULE, + .name = "File Allocator", + .directory = "file", + .module = THIS_MODULE, + .print_debug_info = suspend_file_print_debug_stats, + .save_config_info = suspend_file_save_config_info, + .load_config_info = suspend_file_load_config_info, + .storage_needed = suspend_file_storage_needed, + .initialise = suspend_file_initialise, + .cleanup = suspend_file_cleanup, + + .storage_available = suspend_file_storage_available, + .storage_allocated = suspend_file_storage_allocated, + .release_storage = suspend_file_release_storage, + .allocate_header_space = suspend_file_allocate_header_space, + .allocate_storage = suspend_file_allocate_storage, + .image_exists = suspend_file_image_exists, + .mark_resume_attempted = suspend_file_mark_resume_attempted, + .write_header_init = suspend_file_write_header_init, + .write_header_cleanup = suspend_file_write_header_cleanup, + .read_header_init = suspend_file_read_header_init, + .read_header_cleanup = suspend_file_read_header_cleanup, + .invalidate_image = suspend_file_invalidate_image, + .parse_sig_location = suspend_file_parse_sig_location, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* ---- Registration ---- */ +static __init int suspend_file_load(void) +{ + suspend_fileops.rw_init = suspend_bio_ops.rw_init; + suspend_fileops.rw_cleanup = suspend_bio_ops.rw_cleanup; + suspend_fileops.read_chunk = suspend_bio_ops.read_chunk; + suspend_fileops.write_chunk = suspend_bio_ops.write_chunk; + suspend_fileops.rw_header_chunk = suspend_bio_ops.rw_header_chunk; + + return suspend_register_module(&suspend_fileops); +} + +#ifdef MODULE +static __exit void suspend_file_unload(void) +{ + suspend_unregister_module(&suspend_fileops); +} + +module_init(suspend_file_load); +module_exit(suspend_file_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 FileAllocator"); +#else +late_initcall(suspend_file_load); +#endif diff --git a/kernel/power/suspend_swap.c b/kernel/power/suspend_swap.c new file mode 100644 index 0000000..a81dcc1 --- /dev/null +++ b/kernel/power/suspend_swap.c @@ -0,0 +1,1262 @@ +/* + * kernel/power/suspend_swap.c + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * Distributed under GPLv2. 
+ * + * This file encapsulates functions for usage of swap space as a + * backing store. + */ + +#include +#include +#include +#include +#include +#include + +#include "suspend.h" +#include "sysfs.h" +#include "modules.h" +#include "io.h" +#include "ui.h" +#include "extent.h" +#include "block_io.h" + +static struct suspend_module_ops suspend_swapops; + +#define SIGNATURE_VER 6 + +/* --- Struct of pages stored on disk */ + +union diskpage { + union swap_header swh; /* swh.magic is the only member used */ +}; + +union p_diskpage { + union diskpage *pointer; + char *ptr; + unsigned long address; +}; + +/* Devices used for swap */ +static struct suspend_bdev_info devinfo[MAX_SWAPFILES]; + +/* Extent chains for swap & blocks */ +struct extent_chain swapextents; +struct extent_chain block_chain[MAX_SWAPFILES]; + +static dev_t header_dev_t; +static struct block_device *header_block_device; +static unsigned long headerblock; + +/* For swapfile automatically swapon/off'd. */ +static char swapfilename[32] = ""; +static int suspend_swapon_status; + +/* Header Page Information */ +static int header_pages_allocated; + +/* Swap Pages */ +static int main_pages_allocated, main_pages_requested; + +/* User Specified Parameters. */ + +static unsigned long resume_firstblock; +static int resume_blocksize; +static dev_t resume_swap_dev_t; +static struct block_device *resume_block_device; + +struct sysinfo swapinfo; +static int suspend_swap_invalidate_image(void); + +/* Block devices open. */ +struct bdev_opened +{ + dev_t device; + struct block_device *bdev; + int claimed; +}; + +/* + * Entry MAX_SWAPFILES is the resume block device, which may + * not be a swap device enabled when we suspend. + * Entry MAX_SWAPFILES + 1 is the header block device, which + * is needed before we find out which slot it occupies. + */ +static struct bdev_opened *bdev_info_list[MAX_SWAPFILES + 2]; + +static void close_bdev(int i) +{ + struct bdev_opened *this = bdev_info_list[i]; + + if (this->claimed) + bd_release(this->bdev); + + /* Release our reference. */ + blkdev_put(this->bdev); + + /* Free our info. */ + kfree(this); + + bdev_info_list[i] = NULL; +} + +static void close_bdevs(void) +{ + int i; + + for (i = 0; i < MAX_SWAPFILES; i++) + if (bdev_info_list[i]) + close_bdev(i); + + resume_block_device = header_block_device = NULL; +} + +static struct block_device *open_bdev(int index, dev_t device, int display_errs) +{ + struct bdev_opened *this; + struct block_device *bdev; + + if (bdev_info_list[index] && (bdev_info_list[index]->device == device)){ + bdev = bdev_info_list[index]->bdev; + return bdev; + } + + if (bdev_info_list[index] && bdev_info_list[index]->device != device) + close_bdev(index); + + bdev = open_by_devnum(device, FMODE_READ); + + if (IS_ERR(bdev) || !bdev) { + if (display_errs) + suspend_early_boot_message(1,SUSPEND_CONTINUE_REQ, + "Failed to get access to block device " + "\"%x\" (error %d).\n Maybe you need " + "to run mknod and/or lvmsetup in an " + "initrd/ramfs?", device, bdev); + return ERR_PTR(-EINVAL); + } + + this = kmalloc(sizeof(struct bdev_opened), GFP_KERNEL); + if (!this) { + printk(KERN_WARNING "Suspend2: Failed to allocate memory for " + "opening a bdev."); + return ERR_PTR(-ENOMEM); + } + + bdev_info_list[index] = this; + this->device = device; + this->bdev = bdev; + + if (index < MAX_SWAPFILES) + devinfo[index].bdev = bdev; + + return bdev; +} + +/* Must be silent - might be called from cat /sys/power/suspend2/debug_info + * Returns 0 if was off, -EBUSY if was on, error value otherwise. 
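+ *
+ * The swapfile configured via sysfs is activated with the highest
+ * possible priority (0xFFFF below), so that, in principle, swap
+ * entries for the image are taken from it ahead of any swap that
+ * was already enabled.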
+ */ +static int enable_swapfile(void) +{ + int activateswapresult = -EINVAL; + + if (suspend_swapon_status) + return 0; + + if (swapfilename[0]) { + /* Attempt to swap on with maximum priority */ + activateswapresult = sys_swapon(swapfilename, 0xFFFF); + if ((activateswapresult) && (activateswapresult != -EBUSY)) + printk("Suspend2: The swapfile/partition specified by " + "/sys/power/suspend2/suspend_swap/swapfile " + "(%s) could not be turned on (error %d). " + "Attempting to continue.\n", + swapfilename, activateswapresult); + if (!activateswapresult) + suspend_swapon_status = 1; + } + return activateswapresult; +} + +/* Returns 0 if was on, -EINVAL if was off, error value otherwise */ +static int disable_swapfile(void) +{ + int result = -EINVAL; + + if (!suspend_swapon_status) + return 0; + + if (swapfilename[0]) { + result = sys_swapoff(swapfilename); + if (result == -EINVAL) + return 0; /* Wasn't on */ + if (!result) + suspend_swapon_status = 0; + } + + return result; +} + +static int try_to_parse_resume_device(char *commandline, int quiet) +{ + struct kstat stat; + int error = 0; + + resume_swap_dev_t = name_to_dev_t(commandline); + + if (!resume_swap_dev_t) { + struct file *file = filp_open(commandline, O_RDONLY, 0); + + if (!IS_ERR(file) && file) { + vfs_getattr(file->f_vfsmnt, file->f_dentry, &stat); + filp_close(file, NULL); + } else + error = vfs_stat(commandline, &stat); + if (!error) + resume_swap_dev_t = stat.rdev; + } + + if (!resume_swap_dev_t) { + if (quiet) + return 1; + + if (test_suspend_state(SUSPEND_TRYING_TO_RESUME)) + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Failed to translate \"%s\" into a device id.\n", + commandline); + else + printk("Suspend2: Can't translate \"%s\" into a device " + "id yet.\n", commandline); + return 1; + } + + resume_block_device = open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 0); + if (IS_ERR(resume_block_device)) { + if (!quiet) + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Failed to get access to \"%s\", where" + " the swap header should be found.", + commandline); + return 1; + } + + return 0; +} + +/* + * If we have read part of the image, we might have filled memory with + * data that should be zeroed out. + */ +static void suspend_swap_noresume_reset(void) +{ + memset((char *) &devinfo, 0, sizeof(devinfo)); +} + +static int parse_signature(char *header, int restore) +{ + int type = -1; + + if (!memcmp("SWAP-SPACE",header,10)) + return 0; + else if (!memcmp("SWAPSPACE2",header,10)) + return 1; + + else if (!memcmp("S1SUSP",header,6)) + type = 2; + else if (!memcmp("S2SUSP",header,6)) + type = 3; + else if (!memcmp("S1SUSPEND",header,9)) + type = 4; + + else if (!memcmp("z",header,1)) + type = 12; + else if (!memcmp("Z",header,1)) + type = 13; + + /* + * Put bdev of suspend header in last byte of swap header + * (unsigned short) + */ + if (type > 11) { + dev_t *header_ptr = (dev_t *) &header[1]; + unsigned char *headerblocksize_ptr = + (unsigned char *) &header[5]; + u32 *headerblock_ptr = (u32 *) &header[6]; + header_dev_t = *header_ptr; + /* + * We are now using the highest bit of the char to indicate + * whether we have attempted to resume from this image before. 
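+ * For reference, the layout used here: byte 0 is 'z' or 'Z', the
+ * dev_t of the device holding the header starts at byte 1, the
+ * high bit of byte 5 is that resumed-before flag, and a u32 block
+ * number for the header starts at byte 6.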
+ */ + clear_suspend_state(SUSPEND_RESUMED_BEFORE); + if (((int) *headerblocksize_ptr) & 0x80) + set_suspend_state(SUSPEND_RESUMED_BEFORE); + headerblock = (unsigned long) *headerblock_ptr; + } + + if ((restore) && (type > 5)) { + /* We only reset our own signatures */ + if (type & 1) + memcpy(header,"SWAPSPACE2",10); + else + memcpy(header,"SWAP-SPACE",10); + } + + return type; +} + +/* + * prepare_signature + */ +static int prepare_signature(dev_t bdev, unsigned long block, + char *current_header) +{ + int current_type = parse_signature(current_header, 0); + dev_t *header_ptr = (dev_t *) (¤t_header[1]); + unsigned long *headerblock_ptr = + (unsigned long *) (¤t_header[6]); + + if ((current_type > 1) && (current_type < 6)) + return 1; + + /* At the moment, I don't have a way to handle the block being + * > 32 bits. Not enough room in the signature and no way to + * safely put the data elsewhere. */ + + if (BITS_PER_LONG == 64 && ffs(block) > 31) { + suspend_prepare_status(DONT_CLEAR_BAR, + "Header sector requires 33+ bits. " + "Would not be able to resume."); + return 1; + } + + if (current_type & 1) + current_header[0] = 'Z'; + else + current_header[0] = 'z'; + *header_ptr = bdev; + /* prev is the first/last swap page of the resume area */ + *headerblock_ptr = (unsigned long) block; + return 0; +} + +static int __suspend_swap_allocate_storage(int main_storage_requested, + int header_storage); + +static int suspend_swap_allocate_header_space(int space_requested) +{ + int i; + + if (!swapextents.size && __suspend_swap_allocate_storage( + main_pages_requested, space_requested)) { + printk("Failed to allocate space for the header.\n"); + return -ENOSPC; + } + + suspend_extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.forward_one_page(); /* To first page */ + + for (i = 0; i < space_requested; i++) { + if (suspend_bio_ops.forward_one_page()) { + printk("Out of space while seeking to allocate " + "header pages,\n"); + header_pages_allocated = i; + return -ENOSPC; + } + + } + + header_pages_allocated = space_requested; + + /* The end of header pages will be the start of pageset 2; + * we are now sitting on the first pageset2 page. 
*/ + suspend_extent_state_save(&suspend_writer_posn, + &suspend_writer_posn_save[2]); + return 0; +} + +static void get_main_pool_phys_params(void) +{ + struct extent *extentpointer = NULL; + unsigned long address; + int i, extent_min = -1, extent_max = -1, last_chain = -1; + + for (i = 0; i < MAX_SWAPFILES; i++) + if (block_chain[i].first) + suspend_put_extent_chain(&block_chain[i]); + + suspend_extent_for_each(&swapextents, extentpointer, address) { + swp_entry_t swap_address = extent_val_to_swap_entry(address); + pgoff_t offset = swp_offset(swap_address); + unsigned swapfilenum = swp_type(swap_address); + struct swap_info_struct *sis = get_swap_info_struct(swapfilenum); + sector_t new_sector = map_swap_page(sis, offset); + + if ((new_sector == extent_max + 1) && + (last_chain == swapfilenum)) + extent_max++; + else { + if (extent_min > -1) { + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent chain %d %d-%d.\n", + swapfilenum, + extent_min << + devinfo[last_chain].bmap_shift, + extent_max << + devinfo[last_chain].bmap_shift); + + suspend_add_to_extent_chain( + &block_chain[last_chain], + extent_min, extent_max); + } + extent_min = extent_max = new_sector; + last_chain = swapfilenum; + } + } + + if (extent_min > -1) { + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent chain %d %d-%d.\n", + last_chain, + extent_min << + devinfo[last_chain].bmap_shift, + extent_max << + devinfo[last_chain].bmap_shift); + suspend_add_to_extent_chain( + &block_chain[last_chain], + extent_min, extent_max); + } + + suspend_swap_allocate_header_space(header_pages_allocated); +} + +static int suspend_swap_storage_allocated(void) +{ + return main_pages_requested + header_pages_allocated; +} + +static int suspend_swap_storage_available(void) +{ + si_swapinfo(&swapinfo); + return (((int) swapinfo.freeswap + main_pages_allocated) * PAGE_SIZE / + (PAGE_SIZE + sizeof(unsigned long) + sizeof(int))); +} + +static int suspend_swap_initialise(int starting_cycle) +{ + if (!starting_cycle) + return 0; + + enable_swapfile(); + + if (resume_swap_dev_t && !resume_block_device && + IS_ERR(resume_block_device = + open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 1))) + return 1; + + return 0; +} + +static void suspend_swap_cleanup(int ending_cycle) +{ + if (ending_cycle) + disable_swapfile(); + + close_bdevs(); +} + +static int suspend_swap_release_storage(void) +{ + int i = 0; + + if (test_action_state(SUSPEND_KEEP_IMAGE) && + test_suspend_state(SUSPEND_NOW_RESUMING)) + return 0; + + header_pages_allocated = 0; + main_pages_allocated = 0; + + if (swapextents.first) { + /* Free swap entries */ + struct extent *extentpointer; + unsigned long extentvalue; + suspend_extent_for_each(&swapextents, extentpointer, + extentvalue) + swap_free(extent_val_to_swap_entry(extentvalue)); + + suspend_put_extent_chain(&swapextents); + + for (i = 0; i < MAX_SWAPFILES; i++) + if (block_chain[i].first) + suspend_put_extent_chain(&block_chain[i]); + } + + return 0; +} + +static int suspend_swap_allocate_storage(int space_requested) +{ + if (!__suspend_swap_allocate_storage(space_requested, + header_pages_allocated)) { + main_pages_requested = space_requested; + return 0; + } + + return -ENOSPC; +} + +static void free_swap_range(unsigned long min, unsigned long max) +{ + int j; + + for (j = min; j < max; j++) + swap_free(extent_val_to_swap_entry(j)); +} + +/* + * Round robin allocation (where swap storage has the same priority). + * could make this very inefficient, so we track extents allocated on + * a per-swapfiles basis. 
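+ *
+ * In outline: each get_swap_page() call may return an entry from any
+ * enabled swap device, so the loop below keeps a running
+ * [extent_min, extent_max] per device and only pushes a completed
+ * run into the swapextents chain when the next entry is not
+ * contiguous with it; any runs still open when allocation stops are
+ * flushed afterwards.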
+ */ +static int __suspend_swap_allocate_storage(int main_space_requested, + int header_space_requested) +{ + int i, result = 0, first[MAX_SWAPFILES], pages_to_get, extra_pages, gotten = 0; + unsigned long extent_min[MAX_SWAPFILES], extent_max[MAX_SWAPFILES]; + + extra_pages = DIV_ROUND_UP(main_space_requested * (sizeof(unsigned long) + + sizeof(int)), PAGE_SIZE); + pages_to_get = main_space_requested + extra_pages + + header_space_requested - swapextents.size; + + if (pages_to_get < 1) + return 0; + + for (i=0; i < MAX_SWAPFILES; i++) { + struct swap_info_struct *si = get_swap_info_struct(i); + if ((devinfo[i].bdev = si->bdev)) + devinfo[i].dev_t = si->bdev->bd_dev; + devinfo[i].bmap_shift = 3; + devinfo[i].blocks_per_page = 1; + first[i] = 1; + } + + for(i=0; i < pages_to_get; i++) { + swp_entry_t entry; + unsigned long new_value; + unsigned swapfilenum; + + entry = get_swap_page(); + if (!entry.val) + break; + + swapfilenum = swp_type(entry); + new_value = swap_entry_to_extent_val(entry); + + if (first[swapfilenum]) { + first[swapfilenum] = 0; + extent_min[swapfilenum] = new_value; + extent_max[swapfilenum] = new_value; + gotten++; + continue; + } + + if (new_value == extent_max[swapfilenum] + 1) { + extent_max[swapfilenum]++; + gotten++; + continue; + } + + if (suspend_add_to_extent_chain(&swapextents, + extent_min[swapfilenum], + extent_max[swapfilenum])) { + free_swap_range(extent_min[swapfilenum], + extent_max[swapfilenum]); + swap_free(entry); + gotten -= (extent_max[swapfilenum] - + extent_min[swapfilenum]); + break; + } else { + extent_min[swapfilenum] = new_value; + extent_max[swapfilenum] = new_value; + gotten++; + } + } + + for (i = 0; i < MAX_SWAPFILES; i++) + if (!first[i] && suspend_add_to_extent_chain(&swapextents, + extent_min[i], extent_max[i])) { + free_swap_range(extent_min[i], extent_max[i]); + gotten -= (extent_max[i] - extent_min[i]); + } + + if (gotten < pages_to_get) + result = -ENOSPC; + + main_pages_allocated += gotten; + get_main_pool_phys_params(); + return result; +} + +static int suspend_swap_write_header_init(void) +{ + int i, result; + struct swap_info_struct *si; + + suspend_extent_state_goto_start(&suspend_writer_posn); + + suspend_writer_buffer_posn = suspend_header_bytes_used = 0; + + /* Info needed to bootstrap goes at the start of the header. + * First we save the positions and devinfo, including the number + * of header pages. Then we save the structs containing data needed + * for reading the header pages back. + * Note that even if header pages take more than one page, when we + * read back the info, we will have restored the location of the + * next header page by the time we go to use it. 
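+ *
+ * Sketch of the resulting layout, mirrored by
+ * suspend_swap_read_header_init() below: suspend_writer_posn_save,
+ * then the devinfo array, then one serialised block chain for each
+ * of the MAX_SWAPFILES possible swap devices.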
+ */ + + /* Forward one page will be done prior to the read */ + for (i = 0; i < MAX_SWAPFILES; i++) { + si = get_swap_info_struct(i); + if (si->swap_file) + devinfo[i].dev_t = si->bdev->bd_dev; + else + devinfo[i].dev_t = (dev_t) 0; + } + + if ((result = suspend_bio_ops.rw_header_chunk(WRITE, + &suspend_swapops, + (char *) &suspend_writer_posn_save, + sizeof(suspend_writer_posn_save)))) + return result; + + if ((result = suspend_bio_ops.rw_header_chunk(WRITE, + &suspend_swapops, + (char *) &devinfo, sizeof(devinfo)))) + return result; + + for (i=0; i < MAX_SWAPFILES; i++) + suspend_serialise_extent_chain(&suspend_swapops, &block_chain[i]); + + return 0; +} + +static int suspend_swap_write_header_cleanup(void) +{ + int result; + struct swap_info_struct *si; + + /* Write any unsaved data */ + if (suspend_writer_buffer_posn) + suspend_bio_ops.write_header_chunk_finish(); + + suspend_bio_ops.finish_all_io(); + + suspend_extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.forward_one_page(); + + /* Adjust swap header */ + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(suspend_writer_buffer)); + + si = get_swap_info_struct(suspend_writer_posn.current_chain); + result = prepare_signature(si->bdev->bd_dev, + suspend_writer_posn.current_offset, + ((union swap_header *) suspend_writer_buffer)->magic.magic); + + if (!result) + suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, + resume_firstblock, + virt_to_page(suspend_writer_buffer)); + + suspend_bio_ops.finish_all_io(); + + return result; +} + +/* ------------------------- HEADER READING ------------------------- */ + +/* + * read_header_init() + * + * Description: + * 1. Attempt to read the device specified with resume2=. + * 2. Check the contents of the swap header for our signature. + * 3. Warn, ignore, reset and/or continue as appropriate. + * 4. If continuing, read the suspend_swap configuration section + * of the header and set up block device info so we can read + * the rest of the header & image. + * + * Returns: + * May not return if user choose to reboot at a warning. + * -EINVAL if cannot resume at this time. Booting should continue + * normally. + */ + +static int suspend_swap_read_header_init(void) +{ + int i, result = 0; + + suspend_header_bytes_used = 0; + + if (!header_dev_t) { + printk("read_header_init called when we haven't " + "verified there is an image!\n"); + return -EINVAL; + } + + /* + * If the header is not on the resume_swap_dev_t, get the resume device first. + */ + if (header_dev_t != resume_swap_dev_t) { + header_block_device = open_bdev(MAX_SWAPFILES + 1, + header_dev_t, 1); + + if (IS_ERR(header_block_device)) + return PTR_ERR(header_block_device); + } else + header_block_device = resume_block_device; + + /* + * Read suspend_swap configuration. + * Headerblock size taken into account already. 
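+ *
+ * (The << 3 below appears to re-apply the bmap_shift of 3 that is
+ * set for every swap device when storage is allocated, converting
+ * the stored page-sized block number into the units bdev_page_io()
+ * works in.)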
+ */ + suspend_bio_ops.bdev_page_io(READ, header_block_device, + headerblock << 3, + virt_to_page((unsigned long) suspend_writer_buffer)); + + memcpy(&suspend_writer_posn_save, suspend_writer_buffer, 3 * sizeof(struct extent_iterate_saved_state)); + + suspend_writer_buffer_posn = 3 * sizeof(struct extent_iterate_saved_state); + suspend_header_bytes_used += 3 * sizeof(struct extent_iterate_saved_state); + + memcpy(&devinfo, suspend_writer_buffer + suspend_writer_buffer_posn, sizeof(devinfo)); + + suspend_writer_buffer_posn += sizeof(devinfo); + suspend_header_bytes_used += sizeof(devinfo); + + /* Restore device info */ + for (i = 0; i < MAX_SWAPFILES; i++) { + dev_t thisdevice = devinfo[i].dev_t; + struct block_device *result; + + devinfo[i].bdev = NULL; + + if (!thisdevice) + continue; + + if (thisdevice == resume_swap_dev_t) { + devinfo[i].bdev = resume_block_device; + bdev_info_list[i] = bdev_info_list[MAX_SWAPFILES]; + bdev_info_list[MAX_SWAPFILES] = NULL; + continue; + } + + if (thisdevice == header_dev_t) { + devinfo[i].bdev = header_block_device; + bdev_info_list[i] = bdev_info_list[MAX_SWAPFILES + 1]; + bdev_info_list[MAX_SWAPFILES + 1] = NULL; + continue; + } + + result = open_bdev(i, thisdevice, 1); + if (IS_ERR(result)) + return PTR_ERR(result); + } + + suspend_bio_ops.read_header_init(); + suspend_extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.set_extra_page_forward(); + + for (i = 0; i < MAX_SWAPFILES && !result; i++) + result = suspend_load_extent_chain(&block_chain[i]); + + return result; +} + +static int suspend_swap_read_header_cleanup(void) +{ + suspend_bio_ops.rw_cleanup(READ); + return 0; +} + +/* suspend_swap_invalidate_image + * + */ +static int suspend_swap_invalidate_image(void) +{ + union p_diskpage cur; + int result = 0; + char newsig[11]; + + cur.address = get_zeroed_page(GFP_ATOMIC); + if (!cur.address) { + printk("Unable to allocate a page for restoring the swap signature.\n"); + return -ENOMEM; + } + + /* + * If nr_suspends == 0, we must be booting, so no swap pages + * will be recorded as used yet. + */ + + if (nr_suspends > 0) + suspend_swap_release_storage(); + + /* + * We don't do a sanity check here: we want to restore the swap + * whatever version of kernel made the suspend image. + * + * We need to write swap, but swap may not be enabled so + * we write the device directly + */ + + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(cur.pointer)); + + result = parse_signature(cur.pointer->swh.magic.magic, 1); + + if (result < 5) + goto out; + + strncpy(newsig, cur.pointer->swh.magic.magic, 10); + newsig[10] = 0; + + suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, + resume_firstblock, + virt_to_page(cur.pointer)); + + if (!nr_suspends) + printk(KERN_WARNING "Suspend2: Image invalidated.\n"); +out: + suspend_bio_ops.finish_all_io(); + free_page(cur.address); + return 0; +} + +/* + * workspace_size + * + * Description: + * Returns the number of bytes of RAM needed for this + * code to do its work. (Used when calculating whether + * we have enough memory to be able to suspend & resume). 
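+ *
+ * The swap allocator needs essentially no private work space of its
+ * own, so a token single byte is reported below.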
+ * + */ +static int suspend_swap_memory_needed(void) +{ + return 1; +} + +/* + * Print debug info + * + * Description: + */ +static int suspend_swap_print_debug_stats(char *buffer, int size) +{ + int len = 0; + struct sysinfo sysinfo; + + if (suspendActiveAllocator != &suspend_swapops) { + len = snprintf_used(buffer, size, "- SwapAllocator inactive.\n"); + return len; + } + + len = snprintf_used(buffer, size, "- SwapAllocator active.\n"); + if (swapfilename[0]) + len+= snprintf_used(buffer+len, size-len, + " Attempting to automatically swapon: %s.\n", swapfilename); + + si_swapinfo(&sysinfo); + + len+= snprintf_used(buffer+len, size-len, " Swap available for image: %ld pages.\n", + (int) sysinfo.freeswap + suspend_swap_storage_allocated()); + + return len; +} + +/* + * Storage needed + * + * Returns amount of space in the swap header required + * for the suspend_swap's data. This ignores the links between + * pages, which we factor in when allocating the space. + * + * We ensure the space is allocated, but actually save the + * data from write_header_init and therefore don't also define a + * save_config_info routine. + */ +static int suspend_swap_storage_needed(void) +{ + int i, result; + result = sizeof(suspend_writer_posn_save) + sizeof(devinfo); + + for (i = 0; i < MAX_SWAPFILES; i++) { + result += 3 * sizeof(int); + result += (2 * sizeof(unsigned long) * + block_chain[i].num_extents); + } + + return result; +} + +/* + * Image_exists + */ +static int suspend_swap_image_exists(void) +{ + int signature_found; + union p_diskpage diskpage; + + if (!resume_swap_dev_t) { + printk("Not even trying to read header " + "because resume_swap_dev_t is not set.\n"); + return 0; + } + + if (!resume_block_device && + IS_ERR(resume_block_device = + open_bdev(MAX_SWAPFILES, resume_swap_dev_t, 1))) { + printk("Failed to open resume dev_t (%x).\n", resume_swap_dev_t); + return 0; + } + + diskpage.address = get_zeroed_page(GFP_ATOMIC); + + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(diskpage.ptr)); + suspend_bio_ops.finish_all_io(); + + signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0); + free_page(diskpage.address); + + if (signature_found < 2) { + printk("Suspend2: Normal swapspace found.\n"); + return 0; /* Normal swap space */ + } else if (signature_found == -1) { + printk(KERN_ERR "Suspend2: Unable to find a signature. Could " + "you have moved a swap file?\n"); + return 0; + } else if (signature_found < 6) { + printk("Suspend2: Detected another implementation's signature.\n"); + return 0; + } else if ((signature_found >> 1) != SIGNATURE_VER) { + if ((!(test_suspend_state(SUSPEND_NORESUME_SPECIFIED))) && + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Found a different style suspend image signature.")) { + set_suspend_state(SUSPEND_NORESUME_SPECIFIED); + printk("Suspend2: Dectected another implementation's signature.\n"); + } + } + + return 1; +} + +/* + * Mark resume attempted. + * + * Record that we tried to resume from this image. 
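+ *
+ * The flag lives in the high bit of byte 5 of the on-disk signature
+ * (magic[5] below), the same bit parse_signature() checks when
+ * deciding whether to set SUSPEND_RESUMED_BEFORE.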
+ */ +static void suspend_swap_mark_resume_attempted(int mark) +{ + union p_diskpage diskpage; + int signature_found; + + if (!resume_swap_dev_t) { + printk("Not even trying to record attempt at resuming" + " because resume_swap_dev_t is not set.\n"); + return; + } + + diskpage.address = get_zeroed_page(GFP_ATOMIC); + + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(diskpage.ptr)); + signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0); + + switch (signature_found) { + case 12: + case 13: + diskpage.pointer->swh.magic.magic[5] &= ~0x80; + if (mark) + diskpage.pointer->swh.magic.magic[5] |= 0x80; + break; + } + + suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, + resume_firstblock, + virt_to_page(diskpage.ptr)); + suspend_bio_ops.finish_all_io(); + free_page(diskpage.address); + return; +} + +/* + * Parse Image Location + * + * Attempt to parse a resume2= parameter. + * Swap Writer accepts: + * resume2=swap:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE] + * + * Where: + * DEVNAME is convertable to a dev_t by name_to_dev_t + * FIRSTBLOCK is the location of the first block in the swap file + * (specifying for a swap partition is nonsensical but not prohibited). + * Data is validated by attempting to read a swap header from the + * location given. Failure will result in suspend_swap refusing to + * save an image, and a reboot with correct parameters will be + * necessary. + */ +static int suspend_swap_parse_sig_location(char *commandline, + int only_allocator, int quiet) +{ + char *thischar, *devstart, *colon = NULL, *at_symbol = NULL; + union p_diskpage diskpage; + int signature_found, result = -EINVAL, temp_result; + + if (strncmp(commandline, "swap:", 5)) { + /* + * Failing swap:, we'll take a simple + * resume2=/dev/hda2, but fall through to + * other allocators if /dev/ isn't matched. 
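+ * For example (device names and offsets hypothetical):
+ * resume2=swap:/dev/hda1 names a swap partition,
+ * resume2=swap:/dev/hda1:0x1c80@4096 a block within a swap file,
+ * and a bare resume2=/dev/hda2 is accepted here as well; anything
+ * else is handed back to the other allocators by returning 1.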
+ */ + if (strncmp(commandline, "/dev/", 5)) + return 1; + } else + commandline += 5; + + devstart = thischar = commandline; + while ((*thischar != ':') && (*thischar != '@') && + ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == ':') { + colon = thischar; + *colon = 0; + thischar++; + } + + while ((*thischar != '@') && ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == '@') { + at_symbol = thischar; + *at_symbol = 0; + } + + if (colon) + resume_firstblock = (int) simple_strtoul(colon + 1, NULL, 0); + else + resume_firstblock = 0; + + clear_suspend_state(SUSPEND_CAN_SUSPEND); + clear_suspend_state(SUSPEND_CAN_RESUME); + + /* Legacy */ + if (at_symbol) { + resume_blocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0); + if (resume_blocksize & (SECTOR_SIZE - 1)) { + if (!quiet) + printk("SwapAllocator: Blocksizes are multiples" + "of %d!\n", SECTOR_SIZE); + return -EINVAL; + } + resume_firstblock = resume_firstblock * + (resume_blocksize / SECTOR_SIZE); + } + + temp_result = try_to_parse_resume_device(devstart, quiet); + + if (colon) + *colon = ':'; + if (at_symbol) + *at_symbol = '@'; + + if (temp_result) + return -EINVAL; + + diskpage.address = get_zeroed_page(GFP_ATOMIC); + if (!diskpage.address) { + printk(KERN_ERR "Suspend2: SwapAllocator: Failed to allocate " + "a diskpage for I/O.\n"); + return -ENOMEM; + } + + temp_result = suspend_bio_ops.bdev_page_io(READ, + resume_block_device, + resume_firstblock, + virt_to_page(diskpage.ptr)); + + suspend_bio_ops.finish_all_io(); + + if (temp_result) { + printk(KERN_ERR "Suspend2: SwapAllocator: Failed to submit " + "I/O.\n"); + goto invalid; + } + + signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0); + + if (signature_found != -1) { + if (!quiet) + printk("Suspend2: SwapAllocator: Signature found.\n"); + result = 0; + + suspend_bio_ops.set_devinfo(devinfo); + suspend_writer_posn.chains = &block_chain[0]; + suspend_writer_posn.num_chains = MAX_SWAPFILES; + set_suspend_state(SUSPEND_CAN_SUSPEND); + set_suspend_state(SUSPEND_CAN_RESUME); + } else + if (!quiet) + printk(KERN_ERR "Suspend2: SwapAllocator: No swap " + "signature found at specified location.\n"); +invalid: + free_page((unsigned long) diskpage.address); + return result; + +} + +static int header_locations_read_sysfs(const char *page, int count) +{ + int i, printedpartitionsmessage = 0, len = 0, haveswap = 0; + struct inode *swapf = 0; + int zone; + char *path_page = (char *) __get_free_page(GFP_KERNEL); + char *path, *output = (char *) page; + int path_len; + + if (!page) + return 0; + + for (i = 0; i < MAX_SWAPFILES; i++) { + struct swap_info_struct *si = get_swap_info_struct(i); + + if (!si->swap_file) + continue; + + if (S_ISBLK(si->swap_file->f_mapping->host->i_mode)) { + haveswap = 1; + if (!printedpartitionsmessage) { + len += sprintf(output + len, + "For swap partitions, simply use the " + "format: resume2=swap:/dev/hda1.\n"); + printedpartitionsmessage = 1; + } + } else { + path_len = 0; + + path = d_path(si->swap_file->f_dentry, + si->swap_file->f_vfsmnt, + path_page, + PAGE_SIZE); + path_len = snprintf(path_page, 31, "%s", path); + + haveswap = 1; + swapf = si->swap_file->f_mapping->host; + if (!(zone = bmap(swapf,0))) { + len+= sprintf(output + len, + "Swapfile %s has been corrupted. 
Reuse" + " mkswap on it and try again.\n", + path_page); + } else { + char name_buffer[255]; + len+= sprintf(output + len, "For swapfile `%s`," + " use resume2=swap:/dev/%s:0x%x.\n", + path_page, + bdevname(si->bdev, name_buffer), + zone << (swapf->i_blkbits - 9)); + } + + } + } + + if (!haveswap) + len = sprintf(output, "You need to turn on swap partitions " + "before examining this file.\n"); + + free_page((unsigned long) path_page); + return len; +} + +static struct suspend_sysfs_data sysfs_params[] = { + { + SUSPEND2_ATTR("swapfilename", SYSFS_RW), + SYSFS_STRING(swapfilename, 255, 0) + }, + + { + SUSPEND2_ATTR("headerlocations", SYSFS_READONLY), + SYSFS_CUSTOM(header_locations_read_sysfs, NULL, 0) + }, + + { SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&suspend_swapops.enabled, 0, 1, 0), + .write_side_effect = attempt_to_parse_resume_device2, + } +}; + +static struct suspend_module_ops suspend_swapops = { + .type = WRITER_MODULE, + .name = "Swap Allocator", + .directory = "swap", + .module = THIS_MODULE, + .memory_needed = suspend_swap_memory_needed, + .print_debug_info = suspend_swap_print_debug_stats, + .storage_needed = suspend_swap_storage_needed, + .initialise = suspend_swap_initialise, + .cleanup = suspend_swap_cleanup, + + .noresume_reset = suspend_swap_noresume_reset, + .storage_available = suspend_swap_storage_available, + .storage_allocated = suspend_swap_storage_allocated, + .release_storage = suspend_swap_release_storage, + .allocate_header_space = suspend_swap_allocate_header_space, + .allocate_storage = suspend_swap_allocate_storage, + .image_exists = suspend_swap_image_exists, + .mark_resume_attempted = suspend_swap_mark_resume_attempted, + .write_header_init = suspend_swap_write_header_init, + .write_header_cleanup = suspend_swap_write_header_cleanup, + .read_header_init = suspend_swap_read_header_init, + .read_header_cleanup = suspend_swap_read_header_cleanup, + .invalidate_image = suspend_swap_invalidate_image, + .parse_sig_location = suspend_swap_parse_sig_location, + + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +/* ---- Registration ---- */ +static __init int suspend_swap_load(void) +{ + suspend_swapops.rw_init = suspend_bio_ops.rw_init; + suspend_swapops.rw_cleanup = suspend_bio_ops.rw_cleanup; + suspend_swapops.read_chunk = suspend_bio_ops.read_chunk; + suspend_swapops.write_chunk = suspend_bio_ops.write_chunk; + suspend_swapops.rw_header_chunk = suspend_bio_ops.rw_header_chunk; + + return suspend_register_module(&suspend_swapops); +} + +#ifdef MODULE +static __exit void suspend_swap_unload(void) +{ + suspend_unregister_module(&suspend_swapops); +} + +module_init(suspend_swap_load); +module_exit(suspend_swap_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 SwapAllocator"); +#else +late_initcall(suspend_swap_load); +#endif diff --git a/kernel/power/suspend_userui.c b/kernel/power/suspend_userui.c new file mode 100644 index 0000000..a11cd3d --- /dev/null +++ b/kernel/power/suspend_userui.c @@ -0,0 +1,649 @@ +/* + * kernel/power/user_ui.c + * + * Copyright (C) 2005-2007 Bernard Blackham + * Copyright (C) 2002-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Routines for Suspend2's user interface. + * + * The user interface code talks to a userspace program via a + * netlink socket. 
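+ * The socket uses netlink id NETLINK_SUSPEND2_USERUI (see
+ * s2_user_ui_init() at the end of this file).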
+ * + * The kernel side: + * - starts the userui program; + * - sends text messages and progress bar status; + * + * The user space side: + * - passes messages regarding user requests (abort, toggle reboot etc) + * + */ + +#define __KERNEL_SYSCALLS__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "sysfs.h" +#include "modules.h" +#include "suspend.h" +#include "ui.h" +#include "netlink.h" +#include "power_off.h" + +static char local_printf_buf[1024]; /* Same as printk - should be safe */ + +static struct user_helper_data ui_helper_data; +static struct suspend_module_ops userui_ops; +static int orig_kmsg; + +static char lastheader[512]; +static int lastheader_message_len; +static int ui_helper_changed; /* Used at resume-time so don't overwrite value + set from initrd/ramfs. */ + +/* Number of distinct progress amounts that userspace can display */ +static int progress_granularity = 30; + +DECLARE_WAIT_QUEUE_HEAD(userui_wait_for_key); + +static void ui_nl_set_state(int n) +{ + /* Only let them change certain settings */ + static const int suspend_action_mask = + (1 << SUSPEND_REBOOT) | (1 << SUSPEND_PAUSE) | (1 << SUSPEND_SLOW) | + (1 << SUSPEND_LOGALL) | (1 << SUSPEND_SINGLESTEP) | + (1 << SUSPEND_PAUSE_NEAR_PAGESET_END); + + suspend_action = (suspend_action & (~suspend_action_mask)) | + (n & suspend_action_mask); + + if (!test_action_state(SUSPEND_PAUSE) && + !test_action_state(SUSPEND_SINGLESTEP)) + wake_up_interruptible(&userui_wait_for_key); +} + +static void userui_redraw(void) +{ + suspend_send_netlink_message(&ui_helper_data, + USERUI_MSG_REDRAW, NULL, 0); +} + +static int userui_storage_needed(void) +{ + return sizeof(ui_helper_data.program) + 1 + sizeof(int); +} + +static int userui_save_config_info(char *buf) +{ + *((int *) buf) = progress_granularity; + memcpy(buf + sizeof(int), ui_helper_data.program, sizeof(ui_helper_data.program)); + return sizeof(ui_helper_data.program) + sizeof(int) + 1; +} + +static void userui_load_config_info(char *buf, int size) +{ + progress_granularity = *((int *) buf); + size -= sizeof(int); + + /* Don't load the saved path if one has already been set */ + if (ui_helper_changed) + return; + + if (size > sizeof(ui_helper_data.program)) + size = sizeof(ui_helper_data.program); + + memcpy(ui_helper_data.program, buf + sizeof(int), size); + ui_helper_data.program[sizeof(ui_helper_data.program)-1] = '\0'; +} + +static void set_ui_program_set(void) +{ + ui_helper_changed = 1; +} + +static int userui_memory_needed(void) +{ + /* ball park figure of 128 pages */ + return (128 * PAGE_SIZE); +} + +/* suspend_update_status + * + * Description: Update the progress bar and (if on) in-bar message. + * Arguments: UL value, maximum: Current progress percentage (value/max). + * const char *fmt, ...: Message to be displayed in the middle + * of the progress bar. + * Note that a NULL message does not mean that any previous + * message is erased! For that, you need suspend_prepare_status with + * clearbar on. + * Returns: Unsigned long: The next value where status needs to be updated. + * This is to reduce unnecessary calls to update_status. + */ +static unsigned long userui_update_status(unsigned long value, + unsigned long maximum, const char *fmt, ...) 
+{ + static int last_step = -1; + struct userui_msg_params msg; + int bitshift; + int this_step; + unsigned long next_update; + + if (ui_helper_data.pid == -1) + return 0; + + if ((!maximum) || (!progress_granularity)) + return maximum; + + if (value < 0) + value = 0; + + if (value > maximum) + value = maximum; + + /* Try to avoid math problems - we can't do 64 bit math here + * (and shouldn't need it - anyone got screen resolution + * of 65536 pixels or more?) */ + bitshift = fls(maximum) - 16; + if (bitshift > 0) { + unsigned long temp_maximum = maximum >> bitshift; + unsigned long temp_value = value >> bitshift; + this_step = (int) + (temp_value * progress_granularity / temp_maximum); + next_update = (((this_step + 1) * temp_maximum / + progress_granularity) + 1) << bitshift; + } else { + this_step = (int) (value * progress_granularity / maximum); + next_update = ((this_step + 1) * maximum / + progress_granularity) + 1; + } + + if (this_step == last_step) + return next_update; + + memset(&msg, 0, sizeof(msg)); + + msg.a = this_step; + msg.b = progress_granularity; + + if (fmt) { + va_list args; + va_start(args, fmt); + vsnprintf(msg.text, sizeof(msg.text), fmt, args); + va_end(args); + msg.text[sizeof(msg.text)-1] = '\0'; + } + + suspend_send_netlink_message(&ui_helper_data, USERUI_MSG_PROGRESS, + &msg, sizeof(msg)); + last_step = this_step; + + return next_update; +} + +/* userui_message. + * + * Description: This function is intended to do the same job as printk, but + * without normally logging what is printed. The point is to be + * able to get debugging info on screen without filling the logs + * with "1/534. ^M 2/534^M. 3/534^M" + * + * It may be called from an interrupt context - can't sleep! + * + * Arguments: int mask: The debugging section(s) this message belongs to. + * int level: The level of verbosity of this message. + * int restartline: Whether to output a \r or \n with this line + * (\n if we're logging all output). + * const char *fmt, ...: Message to be displayed a la printk. + */ +static void userui_message(unsigned long section, unsigned long level, + int normally_logged, const char *fmt, ...) 
+{ + struct userui_msg_params msg; + + if ((level) && (level > console_loglevel)) + return; + + memset(&msg, 0, sizeof(msg)); + + msg.a = section; + msg.b = level; + msg.c = normally_logged; + + if (fmt) { + va_list args; + va_start(args, fmt); + vsnprintf(msg.text, sizeof(msg.text), fmt, args); + va_end(args); + msg.text[sizeof(msg.text)-1] = '\0'; + } + + if (test_action_state(SUSPEND_LOGALL)) + printk("%s\n", msg.text); + + suspend_send_netlink_message(&ui_helper_data, USERUI_MSG_MESSAGE, + &msg, sizeof(msg)); +} + +static void wait_for_key_via_userui(void) +{ + DECLARE_WAITQUEUE(wait, current); + + add_wait_queue(&userui_wait_for_key, &wait); + set_current_state(TASK_INTERRUPTIBLE); + + interruptible_sleep_on(&userui_wait_for_key); + + set_current_state(TASK_RUNNING); + remove_wait_queue(&userui_wait_for_key, &wait); +} + +static char userui_wait_for_keypress(int timeout) +{ + int fd; + char key = '\0'; + struct termios t, t_backup; + + if (ui_helper_data.pid != -1) { + wait_for_key_via_userui(); + key = ' '; + goto out; + } + + /* We should be guaranteed /dev/console exists after populate_rootfs() in + * init/main.c + */ + if ((fd = sys_open("/dev/console", O_RDONLY, 0)) < 0) { + printk("Couldn't open /dev/console.\n"); + goto out; + } + + if (sys_ioctl(fd, TCGETS, (long)&t) < 0) + goto out_close; + + memcpy(&t_backup, &t, sizeof(t)); + + t.c_lflag &= ~(ISIG|ICANON|ECHO); + t.c_cc[VMIN] = 0; + if (timeout) + t.c_cc[VTIME] = timeout*10; + + if (sys_ioctl(fd, TCSETS, (long)&t) < 0) + goto out_restore; + + while (1) { + if (sys_read(fd, &key, 1) <= 0) { + key = '\0'; + break; + } + key = tolower(key); + if (test_suspend_state(SUSPEND_SANITY_CHECK_PROMPT)) { + if (key == 'c') { + set_suspend_state(SUSPEND_CONTINUE_REQ); + break; + } else if (key == ' ') + break; + } else + break; + } + +out_restore: + sys_ioctl(fd, TCSETS, (long)&t_backup); +out_close: + sys_close(fd); +out: + return key; +} + +/* suspend_prepare_status + * Description: Prepare the 'nice display', drawing the header and version, + * along with the current action and perhaps also resetting the + * progress bar. + * Arguments: + * int clearbar: Whether to reset the progress bar. + * const char *fmt, ...: The action to be displayed. + */ +static void userui_prepare_status(int clearbar, const char *fmt, ...) +{ + va_list args; + + if (fmt) { + va_start(args, fmt); + lastheader_message_len = vsnprintf(lastheader, 512, fmt, args); + va_end(args); + } + + if (clearbar) + suspend_update_status(0, 1, NULL); + + suspend_message(0, SUSPEND_STATUS, 1, lastheader, NULL); + + if (ui_helper_data.pid == -1) + printk(KERN_EMERG "%s\n", lastheader); +} + +/* abort_suspend + * + * Description: Begin to abort a cycle. If this wasn't at the user's request + * (and we're displaying output), tell the user why and wait for + * them to acknowledge the message. + * Arguments: A parameterised string (imagine this is printk) to display, + * telling the user why we're aborting. + */ + +static void userui_abort_suspend(int result_code, const char *fmt, ...) 
+{ + va_list args; + int printed_len = 0; + + set_result_state(result_code); + if (!test_result_state(SUSPEND_ABORTED)) { + if (!test_result_state(SUSPEND_ABORT_REQUESTED)) { + va_start(args, fmt); + printed_len = vsnprintf(local_printf_buf, + sizeof(local_printf_buf), fmt, args); + va_end(args); + if (ui_helper_data.pid != -1) + printed_len = sprintf(local_printf_buf + printed_len, + " (Press SPACE to continue)"); + suspend_prepare_status(CLEAR_BAR, local_printf_buf); + + if (ui_helper_data.pid != -1) + suspend_wait_for_keypress(0); + } + /* Turn on aborting flag */ + set_result_state(SUSPEND_ABORTED); + } +} + +/* request_abort_suspend + * + * Description: Handle the user requesting the cancellation of a suspend by + * pressing escape. + * Callers: Invoked from a netlink packet from userspace when the user presses + * escape. + */ +static void request_abort_suspend(void) +{ + if (test_result_state(SUSPEND_ABORT_REQUESTED)) + return; + + if (test_suspend_state(SUSPEND_NOW_RESUMING)) { + suspend_prepare_status(CLEAR_BAR, "Escape pressed. " + "Powering down again."); + set_suspend_state(SUSPEND_STOP_RESUME); + while (!test_suspend_state(SUSPEND_IO_STOPPED)) + schedule(); + if (suspendActiveAllocator->mark_resume_attempted) + suspendActiveAllocator->mark_resume_attempted(0); + suspend2_power_down(); + } else { + suspend_prepare_status(CLEAR_BAR, "--- ESCAPE PRESSED :" + " ABORTING SUSPEND ---"); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_ABORT_REQUESTED); + + wake_up_interruptible(&userui_wait_for_key); + } +} + +static int userui_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + + type = nlh->nlmsg_type; + + /* A control message: ignore them */ + if (type < NETLINK_MSG_BASE) + return 0; + + /* Unknown message: reply with EINVAL */ + if (type >= USERUI_MSG_MAX) + return -EINVAL; + + /* All operations require privileges, even GET */ + if (security_netlink_recv(skb, CAP_NET_ADMIN)) + return -EPERM; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && ui_helper_data.pid != -1) { + printk("Got NOFREEZE_ME request when ui_helper_data.pid is %d.\n", ui_helper_data.pid); + return -EBUSY; + } + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case USERUI_MSG_ABORT: + request_abort_suspend(); + break; + case USERUI_MSG_GET_STATE: + suspend_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_STATE, &suspend_action, + sizeof(suspend_action)); + break; + case USERUI_MSG_GET_DEBUG_STATE: + suspend_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_DEBUG_STATE, + &suspend_debug_state, + sizeof(suspend_debug_state)); + break; + case USERUI_MSG_SET_STATE: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + ui_nl_set_state(*data); + break; + case USERUI_MSG_SET_DEBUG_STATE: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + suspend_debug_state = (*data); + break; + case USERUI_MSG_SPACE: + wake_up_interruptible(&userui_wait_for_key); + break; + case USERUI_MSG_GET_POWERDOWN_METHOD: + suspend_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_POWERDOWN_METHOD, + &suspend2_poweroff_method, + sizeof(suspend2_poweroff_method)); + break; + case USERUI_MSG_SET_POWERDOWN_METHOD: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + suspend2_poweroff_method = (*data); + break; + case USERUI_MSG_GET_LOGLEVEL: + suspend_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_LOGLEVEL, + &suspend_default_console_level, + 
sizeof(suspend_default_console_level)); + break; + case USERUI_MSG_SET_LOGLEVEL: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + suspend_default_console_level = (*data); + break; + } + + return 1; +} + +/* userui_cond_pause + * + * Description: Potentially pause and wait for the user to tell us to continue. + * We normally only pause when @pause is set. + * Arguments: int pause: Whether we normally pause. + * char *message: The message to display. Not parameterised + * because it's normally a constant. + */ + +static void userui_cond_pause(int pause, char *message) +{ + int displayed_message = 0, last_key = 0; + + while (last_key != 32 && + ui_helper_data.pid != -1 && + (!test_result_state(SUSPEND_ABORTED)) && + ((test_action_state(SUSPEND_PAUSE) && pause) || + (test_action_state(SUSPEND_SINGLESTEP)))) { + if (!displayed_message) { + suspend_prepare_status(DONT_CLEAR_BAR, + "%s Press SPACE to continue.%s", + message ? message : "", + (test_action_state(SUSPEND_SINGLESTEP)) ? + " Single step on." : ""); + displayed_message = 1; + } + last_key = suspend_wait_for_keypress(0); + } + schedule(); +} + +/* userui_prepare_console + * + * Description: Prepare a console for use, save current settings. + * Returns: Boolean: Whether an error occured. Errors aren't + * treated as fatal, but a warning is printed. + */ +static void userui_prepare_console(void) +{ + orig_kmsg = kmsg_redirect; + kmsg_redirect = fg_console + 1; + + ui_helper_data.pid = -1; + + if (!userui_ops.enabled) + return; + + if (!*ui_helper_data.program) { + printk("suspend_userui: program not configured. suspend_userui disabled.\n"); + return; + } + + suspend_netlink_setup(&ui_helper_data); + + return; +} + +/* userui_cleanup_console + * + * Description: Restore the settings we saved above. + */ + +static void userui_cleanup_console(void) +{ + if (ui_helper_data.pid > -1) + suspend_netlink_close(&ui_helper_data); + + kmsg_redirect = orig_kmsg; +} + +/* + * User interface specific /sys/power/suspend2 entries. 
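+ *
+ * As an illustrative example (not part of the patch proper), once this
+ * module is registered the entries below can be driven from a shell:
+ *
+ *	echo 1 > /sys/power/suspend2/user_interface/enable_escape
+ *	cat /sys/power/suspend2/user_interface/progress_granularity
+ *
+ * The "user_interface" path component is an assumption, taken from the
+ * .directory name of the "Basic User Interface" module whose directory
+ * this module shares; adjust it if the module layout differs.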
+ */ + +static struct suspend_sysfs_data sysfs_params[] = { +#if defined(CONFIG_NET) && defined(CONFIG_SYSFS) + { SUSPEND2_ATTR("enable_escape", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_CAN_CANCEL, 0) + }, + + { SUSPEND2_ATTR("pause_between_steps", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_PAUSE, 0) + }, + + { SUSPEND2_ATTR("enabled", SYSFS_RW), + SYSFS_INT(&userui_ops.enabled, 0, 1, 0) + }, + + { SUSPEND2_ATTR("progress_granularity", SYSFS_RW), + SYSFS_INT(&progress_granularity, 1, 2048, 0) + }, + + { SUSPEND2_ATTR("program", SYSFS_RW), + SYSFS_STRING(ui_helper_data.program, 255, 0), + .write_side_effect = set_ui_program_set, + }, +#endif +}; + +static struct suspend_module_ops userui_ops = { + .type = MISC_MODULE, + .name = "Userspace UI", + .shared_directory = "Basic User Interface", + .module = THIS_MODULE, + .storage_needed = userui_storage_needed, + .save_config_info = userui_save_config_info, + .load_config_info = userui_load_config_info, + .memory_needed = userui_memory_needed, + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +static struct ui_ops my_ui_ops = { + .redraw = userui_redraw, + .update_status = userui_update_status, + .message = userui_message, + .prepare_status = userui_prepare_status, + .abort = userui_abort_suspend, + .cond_pause = userui_cond_pause, + .prepare = userui_prepare_console, + .cleanup = userui_cleanup_console, + .wait_for_key = userui_wait_for_keypress, +}; + +/* suspend_console_sysfs_init + * Description: Boot time initialisation for user interface. + */ + +static __init int s2_user_ui_init(void) +{ + int result; + + ui_helper_data.nl = NULL; + ui_helper_data.program[0] = '\0'; + ui_helper_data.pid = -1; + ui_helper_data.skb_size = sizeof(struct userui_msg_params); + ui_helper_data.pool_limit = 6; + ui_helper_data.netlink_id = NETLINK_SUSPEND2_USERUI; + ui_helper_data.name = "userspace ui"; + ui_helper_data.rcv_msg = userui_user_rcv_msg; + ui_helper_data.interface_version = 7; + ui_helper_data.must_init = 0; + ui_helper_data.not_ready = userui_cleanup_console; + init_completion(&ui_helper_data.wait_for_process); + result = suspend_register_module(&userui_ops); + if (!result) + result = s2_register_ui_ops(&my_ui_ops); + if (result) + suspend_unregister_module(&userui_ops); + + return result; +} + +#ifdef MODULE +static __exit void s2_user_ui_exit(void) +{ + s2_remove_ui_ops(&my_ui_ops); + suspend_unregister_module(&userui_ops); +} + +module_init(s2_user_ui_init); +module_exit(s2_user_ui_exit); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 Userui Support"); +MODULE_LICENSE("GPL"); +#else +late_initcall(s2_user_ui_init); +#endif diff --git a/kernel/power/sysfs.c b/kernel/power/sysfs.c new file mode 100644 index 0000000..47eca3b --- /dev/null +++ b/kernel/power/sysfs.c @@ -0,0 +1,347 @@ +/* + * kernel/power/sysfs.c + * + * Copyright (C) 2002-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * This file contains support for sysfs entries for tuning Suspend2. + * + * We have a generic handler that deals with the most common cases, and + * hooks for special handlers to use. 
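+ *
+ * As a sketch of how a client declares an entry (the names my_tunable
+ * and my_sysfs_params are purely illustrative):
+ *
+ *	static int my_tunable = 1;
+ *
+ *	static struct suspend_sysfs_data my_sysfs_params[] = {
+ *		{ SUSPEND2_ATTR("my_tunable", SYSFS_RW),
+ *		  SYSFS_INT(&my_tunable, 0, 10, 0)
+ *		},
+ *	};
+ *
+ * Such entries are registered either directly via
+ * suspend_register_sysfs_file() or, for Suspend2 modules, through the
+ * .sysfs_data and .num_sysfs_entries fields of struct suspend_module_ops.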
+ */ + +#include +#include +#include + +#include "sysfs.h" +#include "suspend.h" +#include "storage.h" + +static int suspend_sysfs_initialised = 0; + +static void suspend_initialise_sysfs(void); + +static struct suspend_sysfs_data sysfs_params[]; + +#define to_sysfs_data(_attr) container_of(_attr, struct suspend_sysfs_data, attr) + +static void suspend2_main_wrapper(void) +{ + _suspend2_try_suspend(0); +} + +static ssize_t suspend2_attr_show(struct kobject *kobj, struct attribute *attr, + char *page) +{ + struct suspend_sysfs_data *sysfs_data = to_sysfs_data(attr); + int len = 0; + + if (suspend_start_anything(0)) + return -EBUSY; + + if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_READ) + suspend_prepare_usm(); + + switch (sysfs_data->type) { + case SUSPEND_SYSFS_DATA_CUSTOM: + len = (sysfs_data->data.special.read_sysfs) ? + (sysfs_data->data.special.read_sysfs)(page, PAGE_SIZE) + : 0; + break; + case SUSPEND_SYSFS_DATA_BIT: + len = sprintf(page, "%d\n", + -test_bit(sysfs_data->data.bit.bit, + sysfs_data->data.bit.bit_vector)); + break; + case SUSPEND_SYSFS_DATA_INTEGER: + len = sprintf(page, "%d\n", + *(sysfs_data->data.integer.variable)); + break; + case SUSPEND_SYSFS_DATA_LONG: + len = sprintf(page, "%ld\n", + *(sysfs_data->data.a_long.variable)); + break; + case SUSPEND_SYSFS_DATA_UL: + len = sprintf(page, "%lu\n", + *(sysfs_data->data.ul.variable)); + break; + case SUSPEND_SYSFS_DATA_STRING: + len = sprintf(page, "%s\n", + sysfs_data->data.string.variable); + break; + } + /* Side effect routine? */ + if (sysfs_data->read_side_effect) + sysfs_data->read_side_effect(); + + if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_READ) + suspend_cleanup_usm(); + + suspend_finish_anything(0); + + return len; +} + +#define BOUND(_variable, _type) \ + if (*_variable < sysfs_data->data._type.minimum) \ + *_variable = sysfs_data->data._type.minimum; \ + else if (*_variable > sysfs_data->data._type.maximum) \ + *_variable = sysfs_data->data._type.maximum; + +static ssize_t suspend2_attr_store(struct kobject *kobj, struct attribute *attr, + const char *my_buf, size_t count) +{ + int assigned_temp_buffer = 0, result = count; + struct suspend_sysfs_data *sysfs_data = to_sysfs_data(attr); + + if (suspend_start_anything((sysfs_data->flags & SYSFS_SUSPEND_OR_RESUME))) + return -EBUSY; + + ((char *) my_buf)[count] = 0; + + if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_WRITE) + suspend_prepare_usm(); + + switch (sysfs_data->type) { + case SUSPEND_SYSFS_DATA_CUSTOM: + if (sysfs_data->data.special.write_sysfs) + result = (sysfs_data->data.special.write_sysfs) + (my_buf, count); + break; + case SUSPEND_SYSFS_DATA_BIT: + { + int value = simple_strtoul(my_buf, NULL, 0); + if (value) + set_bit(sysfs_data->data.bit.bit, + (sysfs_data->data.bit.bit_vector)); + else + clear_bit(sysfs_data->data.bit.bit, + (sysfs_data->data.bit.bit_vector)); + } + break; + case SUSPEND_SYSFS_DATA_INTEGER: + { + int *variable = sysfs_data->data.integer.variable; + *variable = simple_strtol(my_buf, NULL, 0); + BOUND(variable, integer); + break; + } + case SUSPEND_SYSFS_DATA_LONG: + { + long *variable = sysfs_data->data.a_long.variable; + *variable = simple_strtol(my_buf, NULL, 0); + BOUND(variable, a_long); + break; + } + case SUSPEND_SYSFS_DATA_UL: + { + unsigned long *variable = sysfs_data->data.ul.variable; + *variable = simple_strtoul(my_buf, NULL, 0); + BOUND(variable, ul); + break; + } + break; + case SUSPEND_SYSFS_DATA_STRING: + { + int copy_len = count; + char *variable = + sysfs_data->data.string.variable; + + if 
(sysfs_data->data.string.max_length && + (copy_len > sysfs_data->data.string.max_length)) + copy_len = sysfs_data->data.string.max_length; + + if (!variable) { + sysfs_data->data.string.variable = + variable = (char *) get_zeroed_page(GFP_ATOMIC); + assigned_temp_buffer = 1; + } + strncpy(variable, my_buf, copy_len); + if ((copy_len) && + (my_buf[copy_len - 1] == '\n')) + variable[count - 1] = 0; + variable[count] = 0; + } + break; + } + + /* Side effect routine? */ + if (sysfs_data->write_side_effect) + sysfs_data->write_side_effect(); + + /* Free temporary buffers */ + if (assigned_temp_buffer) { + free_page((unsigned long) sysfs_data->data.string.variable); + sysfs_data->data.string.variable = NULL; + } + + if (sysfs_data->flags & SYSFS_NEEDS_SM_FOR_WRITE) + suspend_cleanup_usm(); + + suspend_finish_anything(sysfs_data->flags & SYSFS_SUSPEND_OR_RESUME); + + return result; +} + +static struct sysfs_ops suspend2_sysfs_ops = { + .show = &suspend2_attr_show, + .store = &suspend2_attr_store, +}; + +static struct kobj_type suspend2_ktype = { + .sysfs_ops = &suspend2_sysfs_ops, +}; + +decl_subsys(suspend2, &suspend2_ktype, NULL); + +/* Non-module sysfs entries. + * + * This array contains entries that are automatically registered at + * boot. Modules and the console code register their own entries separately. + * + * NB: If you move do_suspend, change suspend_write_sysfs's test so that + * suspend_start_anything still gets a 1 when the user echos > do_suspend! + */ + +static struct suspend_sysfs_data sysfs_params[] = { + { SUSPEND2_ATTR("do_suspend", SYSFS_WRITEONLY), + SYSFS_CUSTOM(NULL, NULL, SYSFS_SUSPENDING), + .write_side_effect = suspend2_main_wrapper + }, + + { SUSPEND2_ATTR("do_resume", SYSFS_WRITEONLY), + SYSFS_CUSTOM(NULL, NULL, SYSFS_RESUMING), + .write_side_effect = __suspend2_try_resume + }, + +}; + +void remove_suspend2_sysdir(struct kobject *kobj) +{ + if (!kobj) + return; + + kobject_unregister(kobj); + + kfree(kobj); +} + +struct kobject *make_suspend2_sysdir(char *name) +{ + struct kobject *kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL); + int err; + + if(!kobj) { + printk("Suspend2: Can't allocate kobject for sysfs dir!\n"); + return NULL; + } + + err = kobject_set_name(kobj, "%s", name); + + if (err) { + kfree(kobj); + return NULL; + } + + kobj->kset = &suspend2_subsys.kset; + + err = kobject_register(kobj); + + if (err) + kfree(kobj); + + return err ? NULL : kobj; +} + +/* suspend_register_sysfs_file + * + * Helper for registering a new /sysfs/suspend2 entry. + */ + +int suspend_register_sysfs_file( + struct kobject *kobj, + struct suspend_sysfs_data *suspend_sysfs_data) +{ + int result; + + if (!suspend_sysfs_initialised) + suspend_initialise_sysfs(); + + if ((result = sysfs_create_file(kobj, &suspend_sysfs_data->attr))) + printk("Suspend2: sysfs_create_file for %s returned %d.\n", + suspend_sysfs_data->attr.name, result); + + return result; +} + +/* suspend_unregister_sysfs_file + * + * Helper for removing unwanted /sys/power/suspend2 entries. 
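+ *
+ * For illustration, a hypothetical caller wanting its own subdirectory
+ * under /sys/power/suspend2 might pair these helpers as follows (error
+ * handling elided; my_dir and my_sysfs_params are example names only):
+ *
+ *	struct kobject *my_dir = make_suspend2_sysdir("my_module");
+ *
+ *	if (my_dir)
+ *		suspend_register_sysfs_file(my_dir, &my_sysfs_params[0]);
+ *	...
+ *	suspend_unregister_sysfs_file(my_dir, &my_sysfs_params[0]);
+ *	remove_suspend2_sysdir(my_dir);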
+ * + */ +void suspend_unregister_sysfs_file(struct kobject *kobj, + struct suspend_sysfs_data *suspend_sysfs_data) +{ + sysfs_remove_file(kobj, &suspend_sysfs_data->attr); +} + +void suspend_cleanup_sysfs(void) +{ + int i, + numfiles = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data); + + if (!suspend_sysfs_initialised) + return; + + for (i=0; i< numfiles; i++) + suspend_unregister_sysfs_file(&suspend2_subsys.kset.kobj, + &sysfs_params[i]); + + kobj_set_kset_s(&suspend2_subsys.kset, power_subsys); + subsystem_unregister(&suspend2_subsys); + + suspend_sysfs_initialised = 0; +} + +/* suspend_initialise_sysfs + * + * Initialise the /sysfs/suspend2 directory. + */ + +static void suspend_initialise_sysfs(void) +{ + int i, error; + int numfiles = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data); + + if (suspend_sysfs_initialised) + return; + + /* Make our suspend2 directory a child of /sys/power */ + kobj_set_kset_s(&suspend2_subsys.kset, power_subsys); + error = subsystem_register(&suspend2_subsys); + + if (error) + return; + + /* Make it use the .store and .show routines above */ + kobj_set_kset_s(&suspend2_subsys.kset, suspend2_subsys); + + suspend_sysfs_initialised = 1; + + for (i=0; i< numfiles; i++) + suspend_register_sysfs_file(&suspend2_subsys.kset.kobj, + &sysfs_params[i]); +} + +int s2_sysfs_init(void) +{ + suspend_initialise_sysfs(); + return 0; +} + +void s2_sysfs_exit(void) +{ + suspend_cleanup_sysfs(); +} diff --git a/kernel/power/sysfs.h b/kernel/power/sysfs.h new file mode 100644 index 0000000..aad5e26 --- /dev/null +++ b/kernel/power/sysfs.h @@ -0,0 +1,132 @@ +/* + * kernel/power/sysfs.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * It provides declarations for suspend to use in managing + * /sysfs/suspend2. When we switch to kobjects, + * this will become redundant. + * + */ + +#include +#include "power.h" + +struct suspend_sysfs_data { + struct attribute attr; + int type; + int flags; + union { + struct { + unsigned long *bit_vector; + int bit; + } bit; + struct { + int *variable; + int minimum; + int maximum; + } integer; + struct { + long *variable; + long minimum; + long maximum; + } a_long; + struct { + unsigned long *variable; + unsigned long minimum; + unsigned long maximum; + } ul; + struct { + char *variable; + int max_length; + } string; + struct { + int (*read_sysfs) (const char *buffer, int count); + int (*write_sysfs) (const char *buffer, int count); + void *data; + } special; + } data; + + /* Side effects routines. 
Used, eg, for reparsing the + * resume2 entry when it changes */ + void (*read_side_effect) (void); + void (*write_side_effect) (void); + struct list_head sysfs_data_list; +}; + +enum { + SUSPEND_SYSFS_DATA_NONE = 1, + SUSPEND_SYSFS_DATA_CUSTOM, + SUSPEND_SYSFS_DATA_BIT, + SUSPEND_SYSFS_DATA_INTEGER, + SUSPEND_SYSFS_DATA_UL, + SUSPEND_SYSFS_DATA_LONG, + SUSPEND_SYSFS_DATA_STRING +}; + +#define SUSPEND2_ATTR(_name, _mode) \ + .attr = {.name = _name , .mode = _mode } + +#define SYSFS_BIT(_ul, _bit, _flags) \ + .type = SUSPEND_SYSFS_DATA_BIT, \ + .flags = _flags, \ + .data = { .bit = { .bit_vector = _ul, .bit = _bit } } + +#define SYSFS_INT(_int, _min, _max, _flags) \ + .type = SUSPEND_SYSFS_DATA_INTEGER, \ + .flags = _flags, \ + .data = { .integer = { .variable = _int, .minimum = _min, \ + .maximum = _max } } + +#define SYSFS_UL(_ul, _min, _max, _flags) \ + .type = SUSPEND_SYSFS_DATA_UL, \ + .flags = _flags, \ + .data = { .ul = { .variable = _ul, .minimum = _min, \ + .maximum = _max } } + +#define SYSFS_LONG(_long, _min, _max, _flags) \ + .type = SUSPEND_SYSFS_DATA_LONG, \ + .flags = _flags, \ + .data = { .a_long = { .variable = _long, .minimum = _min, \ + .maximum = _max } } + +#define SYSFS_STRING(_string, _max_len, _flags) \ + .type = SUSPEND_SYSFS_DATA_STRING, \ + .flags = _flags, \ + .data = { .string = { .variable = _string, .max_length = _max_len } } + +#define SYSFS_CUSTOM(_read, _write, _flags) \ + .type = SUSPEND_SYSFS_DATA_CUSTOM, \ + .flags = _flags, \ + .data = { .special = { .read_sysfs = _read, .write_sysfs = _write } } + +#define SYSFS_WRITEONLY 0200 +#define SYSFS_READONLY 0444 +#define SYSFS_RW 0644 + +/* Flags */ +#define SYSFS_NEEDS_SM_FOR_READ 1 +#define SYSFS_NEEDS_SM_FOR_WRITE 2 +#define SYSFS_SUSPEND 4 +#define SYSFS_RESUME 8 +#define SYSFS_SUSPEND_OR_RESUME (SYSFS_SUSPEND | SYSFS_RESUME) +#define SYSFS_SUSPENDING (SYSFS_SUSPEND | SYSFS_NEEDS_SM_FOR_WRITE) +#define SYSFS_RESUMING (SYSFS_RESUME | SYSFS_NEEDS_SM_FOR_WRITE) +#define SYSFS_NEEDS_SM_FOR_BOTH \ + (SYSFS_NEEDS_SM_FOR_READ | SYSFS_NEEDS_SM_FOR_WRITE) + +int suspend_register_sysfs_file(struct kobject *kobj, + struct suspend_sysfs_data *suspend_sysfs_data); +void suspend_unregister_sysfs_file(struct kobject *kobj, + struct suspend_sysfs_data *suspend_sysfs_data); + +extern struct subsystem suspend2_subsys; + +struct kobject *make_suspend2_sysdir(char *name); +void remove_suspend2_sysdir(struct kobject *obj); +extern void suspend_cleanup_sysfs(void); + +extern int s2_sysfs_init(void); +extern void s2_sysfs_exit(void); diff --git a/kernel/power/ui.c b/kernel/power/ui.c new file mode 100644 index 0000000..5b6789f --- /dev/null +++ b/kernel/power/ui.c @@ -0,0 +1,235 @@ +/* + * kernel/power/ui.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2007 Nigel Cunningham (nigel at suspend2 net) + * + * This file is released under the GPLv2. + * + * Routines for Suspend2's user interface. + * + * The user interface code talks to a userspace program via a + * netlink socket. 
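+ *
+ * As a rough illustration (condensed from userui_message() earlier in
+ * this patch; the real code clears the struct and fills each field
+ * individually, so treat this as a sketch), a text message travels to
+ * the helper like this:
+ *
+ *	struct userui_msg_params msg = { .a = section, .b = level };
+ *
+ *	vsnprintf(msg.text, sizeof(msg.text), fmt, args);
+ *	suspend_send_netlink_message(&ui_helper_data, USERUI_MSG_MESSAGE,
+ *				     &msg, sizeof(msg));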
+ * + * The kernel side: + * - starts the userui program; + * - sends text messages and progress bar status; + * + * The user space side: + * - passes messages regarding user requests (abort, toggle reboot etc) + * + */ + +#define __KERNEL_SYSCALLS__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "sysfs.h" +#include "modules.h" +#include "suspend.h" +#include "ui.h" +#include "netlink.h" +#include "power_off.h" + +static char local_printf_buf[1024]; /* Same as printk - should be safe */ +struct ui_ops *s2_current_ui; + +/*! The console log level we default to. */ +int suspend_default_console_level = 0; + +/* suspend_early_boot_message() + * Description: Handle errors early in the process of booting. + * The user may press C to continue booting, perhaps + * invalidating the image, or space to reboot. + * This works from either the serial console or normally + * attached keyboard. + * + * Note that we come in here from init, while the kernel is + * locked. If we want to get events from the serial console, + * we need to temporarily unlock the kernel. + * + * suspend_early_boot_message may also be called post-boot. + * In this case, it simply printks the message and returns. + * + * Arguments: int Whether we are able to erase the image. + * int default_answer. What to do when we timeout. This + * will normally be continue, but the user might + * provide command line options (__setup) to override + * particular cases. + * Char *. Pointer to a string explaining why we're moaning. + */ + +#define say(message, a...) printk(KERN_EMERG message, ##a) +#define message_timeout 25 /* message_timeout * 10 must fit in 8 bits */ + +int suspend_early_boot_message(int message_detail, int default_answer, char *warning_reason, ...) +{ + unsigned long orig_state = get_suspend_state(), continue_req = 0; + unsigned long orig_loglevel = console_loglevel; + va_list args; + int printed_len; + + if (warning_reason) { + va_start(args, warning_reason); + printed_len = vsnprintf(local_printf_buf, + sizeof(local_printf_buf), + warning_reason, + args); + va_end(args); + } + + if (!test_suspend_state(SUSPEND_BOOT_TIME)) { + printk("Suspend2: %s\n", local_printf_buf); + return default_answer; + } + + /* We might be called directly from do_mounts_initrd if the + * user fails to set up their initrd properly. We need to + * enable the keyboard handler by setting the running flag */ + set_suspend_state(SUSPEND_RUNNING); + +#if defined(CONFIG_VT) || defined(CONFIG_SERIAL_CONSOLE) + console_loglevel = 7; + + say("=== Suspend2 ===\n\n"); + if (warning_reason) { + say("BIG FAT WARNING!! %s\n\n", local_printf_buf); + switch (message_detail) { + case 0: + say("If you continue booting, note that any image WILL NOT BE REMOVED.\n"); + say("Suspend is unable to do so because the appropriate modules aren't\n"); + say("loaded. You should manually remove the image to avoid any\n"); + say("possibility of corrupting your filesystem(s) later.\n"); + break; + case 1: + say("If you want to use the current suspend image, reboot and try\n"); + say("again with the same kernel that you suspended from. If you want\n"); + say("to forget that image, continue and the image will be erased.\n"); + break; + } + say("Press SPACE to reboot or C to continue booting with this kernel\n\n"); + say("Default action if you don't select one in %d seconds is: %s.\n", + message_timeout, + default_answer == SUSPEND_CONTINUE_REQ ? 
+ "continue booting" : "reboot"); + } else { + say("BIG FAT WARNING!!\n\n"); + say("You have tried to resume from this image before.\n"); + say("If it failed once, it may well fail again.\n"); + say("Would you like to remove the image and boot normally?\n"); + say("This will be equivalent to entering noresume2 on the\n"); + say("kernel command line.\n\n"); + say("Press SPACE to remove the image or C to continue resuming.\n\n"); + say("Default action if you don't select one in %d seconds is: %s.\n", + message_timeout, + !!default_answer ? + "continue resuming" : "remove the image"); + } + console_loglevel = orig_loglevel; + + set_suspend_state(SUSPEND_SANITY_CHECK_PROMPT); + clear_suspend_state(SUSPEND_CONTINUE_REQ); + + if (suspend_wait_for_keypress(message_timeout) == 0) /* We timed out */ + continue_req = !!default_answer; + else + continue_req = test_suspend_state(SUSPEND_CONTINUE_REQ); + + if ((warning_reason) && (!continue_req)) + machine_restart(NULL); + + restore_suspend_state(orig_state); + if (continue_req) + set_suspend_state(SUSPEND_CONTINUE_REQ); + +#endif /* CONFIG_VT or CONFIG_SERIAL_CONSOLE */ + return -EIO; +} +#undef say + +/* + * User interface specific /sys/power/suspend2 entries. + */ + +static struct suspend_sysfs_data sysfs_params[] = { +#if defined(CONFIG_NET) && defined(CONFIG_SYSFS) + { SUSPEND2_ATTR("default_console_level", SYSFS_RW), + SYSFS_INT(&suspend_default_console_level, 0, 7, 0) + }, + + { SUSPEND2_ATTR("debug_sections", SYSFS_RW), + SYSFS_UL(&suspend_debug_state, 0, 1 << 30, 0) + }, + + { SUSPEND2_ATTR("log_everything", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_LOGALL, 0) + }, +#endif + { SUSPEND2_ATTR("pm_prepare_console", SYSFS_RW), + SYSFS_BIT(&suspend_action, SUSPEND_PM_PREPARE_CONSOLE, 0) + } +}; + +static struct suspend_module_ops userui_ops = { + .type = MISC_MODULE, + .name = "Basic User Interface", + .directory = "user_interface", + .module = THIS_MODULE, + .sysfs_data = sysfs_params, + .num_sysfs_entries = sizeof(sysfs_params) / sizeof(struct suspend_sysfs_data), +}; + +int s2_register_ui_ops(struct ui_ops *this_ui) +{ + if (s2_current_ui) { + printk("Only one Suspend2 user interface module can be loaded" + " at a time."); + return -EBUSY; + } + + s2_current_ui = this_ui; + + return 0; +} + +void s2_remove_ui_ops(struct ui_ops *this_ui) +{ + if (s2_current_ui != this_ui) + return; + + s2_current_ui = NULL; +} + +/* suspend_console_sysfs_init + * Description: Boot time initialisation for user interface. 
+ */ + +int s2_ui_init(void) +{ + return suspend_register_module(&userui_ops); +} + +void s2_ui_exit(void) +{ + suspend_unregister_module(&userui_ops); +} + +#ifdef CONFIG_SUSPEND2_EXPORTS +EXPORT_SYMBOL_GPL(s2_current_ui); +EXPORT_SYMBOL_GPL(suspend_early_boot_message); +EXPORT_SYMBOL_GPL(s2_register_ui_ops); +EXPORT_SYMBOL_GPL(s2_remove_ui_ops); +EXPORT_SYMBOL_GPL(suspend_default_console_level); +#endif diff --git a/kernel/power/ui.h b/kernel/power/ui.h new file mode 100644 index 0000000..2d1034c --- /dev/null +++ b/kernel/power/ui.h @@ -0,0 +1,108 @@ +/* + * kernel/power/ui.h + * + * Copyright (C) 2004-2007 Nigel Cunningham (nigel at suspend2 net) + */ + +enum { + DONT_CLEAR_BAR, + CLEAR_BAR +}; + +enum { + /* Userspace -> Kernel */ + USERUI_MSG_ABORT = 0x11, + USERUI_MSG_SET_STATE = 0x12, + USERUI_MSG_GET_STATE = 0x13, + USERUI_MSG_GET_DEBUG_STATE = 0x14, + USERUI_MSG_SET_DEBUG_STATE = 0x15, + USERUI_MSG_SPACE = 0x18, + USERUI_MSG_GET_POWERDOWN_METHOD = 0x1A, + USERUI_MSG_SET_POWERDOWN_METHOD = 0x1B, + USERUI_MSG_GET_LOGLEVEL = 0x1C, + USERUI_MSG_SET_LOGLEVEL = 0x1D, + + /* Kernel -> Userspace */ + USERUI_MSG_MESSAGE = 0x21, + USERUI_MSG_PROGRESS = 0x22, + USERUI_MSG_REDRAW = 0x25, + + USERUI_MSG_MAX, +}; + +struct userui_msg_params { + unsigned long a, b, c, d; + char text[255]; +}; + +struct ui_ops { + char (*wait_for_key) (int timeout); + unsigned long (*update_status) (unsigned long value, + unsigned long maximum, const char *fmt, ...); + void (*prepare_status) (int clearbar, const char *fmt, ...); + void (*cond_pause) (int pause, char *message); + void (*abort)(int result_code, const char *fmt, ...); + void (*prepare)(void); + void (*cleanup)(void); + void (*redraw)(void); + void (*message)(unsigned long section, unsigned long level, + int normally_logged, const char *fmt, ...); +}; + +extern struct ui_ops *s2_current_ui; + +#define suspend_update_status(val, max, fmt, args...) \ + (s2_current_ui ? (s2_current_ui->update_status) (val, max, fmt, ##args) : max) + +#define suspend_wait_for_keypress(timeout) \ + (s2_current_ui ? (s2_current_ui->wait_for_key) (timeout) : 0) + +#define suspend_ui_redraw(void) \ + do { if (s2_current_ui) \ + (s2_current_ui->redraw)(); \ + } while(0) + +#define suspend_prepare_console(void) \ + do { if (s2_current_ui) \ + (s2_current_ui->prepare)(); \ + } while(0) + +#define suspend_cleanup_console(void) \ + do { if (s2_current_ui) \ + (s2_current_ui->cleanup)(); \ + } while(0) + +#define abort_suspend(result, fmt, args...) \ + do { if (s2_current_ui) \ + (s2_current_ui->abort)(result, fmt, ##args); \ + else { \ + set_result_state(SUSPEND_ABORTED); \ + set_result_state(result); \ + } \ + } while(0) + +#define suspend_cond_pause(pause, message) \ + do { if (s2_current_ui) \ + (s2_current_ui->cond_pause)(pause, message); \ + } while(0) + +#define suspend_prepare_status(clear, fmt, args...) \ + do { if (s2_current_ui) \ + (s2_current_ui->prepare_status)(clear, fmt, ##args); \ + else \ + printk(fmt, ##args); \ + } while(0) + +extern int suspend_default_console_level; + +#define suspend_message(sn, lev, log, fmt, a...) 
\ +do { \ + if (s2_current_ui && (!sn || test_debug_state(sn))) \ + s2_current_ui->message(sn, lev, log, fmt, ##a); \ +} while(0) + +__exit void suspend_ui_cleanup(void); +extern int s2_ui_init(void); +extern void s2_ui_exit(void); +extern int s2_register_ui_ops(struct ui_ops *this_ui); +extern void s2_remove_ui_ops(struct ui_ops *this_ui); diff --git a/kernel/printk.c b/kernel/printk.c index 4b47e59..6cafd3b 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -32,6 +32,7 @@ #include #include #include +#include #include @@ -92,9 +93,9 @@ static DEFINE_SPINLOCK(logbuf_lock); * The indices into log_buf are not constrained to log_buf_len - they * must be masked before subscripting */ -static unsigned long log_start; /* Index into log_buf: next char to be read by syslog() */ -static unsigned long con_start; /* Index into log_buf: next char to be sent to consoles */ -static unsigned long log_end; /* Index into log_buf: most-recently-written-char + 1 */ +static unsigned long POSS_NOSAVE log_start; /* Index into log_buf: next char to be read by syslog() */ +static unsigned long POSS_NOSAVE con_start; /* Index into log_buf: next char to be sent to consoles */ +static unsigned long POSS_NOSAVE log_end; /* Index into log_buf: most-recently-written-char + 1 */ /* * Array of consoles built from command line options (console=) @@ -117,10 +118,10 @@ static int console_may_schedule; #ifdef CONFIG_PRINTK -static char __log_buf[__LOG_BUF_LEN]; -static char *log_buf = __log_buf; -static int log_buf_len = __LOG_BUF_LEN; -static unsigned long logged_chars; /* Number of chars produced since last read+clear operation */ +static char POSS_NOSAVE __log_buf[__LOG_BUF_LEN]; +static char POSS_NOSAVE *log_buf = __log_buf; +static int POSS_NOSAVE log_buf_len = __LOG_BUF_LEN; +static unsigned long POSS_NOSAVE logged_chars; /* Number of chars produced since last read+clear operation */ static int __init log_buf_len_setup(char *str) { @@ -739,12 +740,14 @@ void suspend_console(void) acquire_console_sem(); console_suspended = 1; } +EXPORT_SYMBOL(suspend_console); void resume_console(void) { console_suspended = 0; release_console_sem(); } +EXPORT_SYMBOL(resume_console); #endif /* CONFIG_DISABLE_CONSOLE_SUSPEND */ /** diff --git a/kernel/timer.c b/kernel/timer.c index dd6c2c1..aded05d 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1240,6 +1240,38 @@ unsigned long avenrun[3]; EXPORT_SYMBOL(avenrun); +static unsigned long avenrun_save[3]; +/* + * save_avenrun - Record the values prior to starting a hibernation cycle. + * We do this to make the work done in hibernation invisible to userspace + * post-suspend. Some programs, including some MTAs, watch the load average + * and stop work until it lowers. Without this, they would stop working for + * a while post-resume, unnecessarily. + */ + +void save_avenrun(void) +{ + avenrun_save[0] = avenrun[0]; + avenrun_save[1] = avenrun[1]; + avenrun_save[2] = avenrun[2]; +} + +EXPORT_SYMBOL_GPL(save_avenrun); + +void restore_avenrun(void) +{ + if (!avenrun_save[0]) + return; + + avenrun[0] = avenrun_save[0]; + avenrun[1] = avenrun_save[1]; + avenrun[2] = avenrun_save[2]; + + avenrun_save[0] = 0; +} + +EXPORT_SYMBOL_GPL(restore_avenrun); + /* * calc_load - given tick count, update the avenrun load estimates. * This is called while holding a write_lock on xtime_lock. 
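The callers of save_avenrun()/restore_avenrun() live in the Suspend2 core and
are not part of the hunk above, so the following is only a sketch of the
assumed calling pattern around a hibernation cycle:

	save_avenrun();		/* before freezing processes */
	/* ... freeze, write the image, power down, resume, thaw ... */
	restore_avenrun();	/* hide the idle period from userspace */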
diff --git a/lib/Kconfig b/lib/Kconfig index 3842499..758a928 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -47,6 +47,9 @@ config AUDIT_GENERIC depends on AUDIT && !AUDIT_ARCH default y +config DYN_PAGEFLAGS + bool + # # compression support is select'ed if needed # diff --git a/lib/Makefile b/lib/Makefile index 992a39e..b974998 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -38,6 +38,9 @@ ifneq ($(CONFIG_HAVE_DEC_LOCK),y) endif obj-$(CONFIG_BITREVERSE) += bitrev.o + +obj-$(CONFIG_DYN_PAGEFLAGS) += dyn_pageflags.o + obj-$(CONFIG_CRC_CCITT) += crc-ccitt.o obj-$(CONFIG_CRC16) += crc16.o obj-$(CONFIG_CRC32) += crc32.o diff --git a/lib/dyn_pageflags.c b/lib/dyn_pageflags.c new file mode 100644 index 0000000..963ac5c --- /dev/null +++ b/lib/dyn_pageflags.c @@ -0,0 +1,312 @@ +/* + * lib/dyn_pageflags.c + * + * Copyright (C) 2004-2006 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Routines for dynamically allocating and releasing bitmaps + * used as pseudo-pageflags. + */ + +#include +#include +#include +#include + +#if 0 +#define PR_DEBUG(a, b...) do { printk(a, ##b); } while(0) +#else +#define PR_DEBUG(a, b...) do { } while(0) +#endif + +#define pages_for_zone(zone) \ + (DIV_ROUND_UP((zone)->spanned_pages, (PAGE_SIZE << 3))) + +/* + * clear_dyn_pageflags(dyn_pageflags_t pagemap) + * + * Clear an array used to store local page flags. + * + */ + +void clear_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i = 0, zone_idx, node_id = 0; + struct zone *zone; + struct pglist_data *pgdat; + + BUG_ON(!pagemap); + + for_each_online_pgdat(pgdat) { + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + zone = &pgdat->node_zones[zone_idx]; + + if (!populated_zone(zone)) + continue; + + for (i = 0; i < pages_for_zone(zone); i++) + memset((pagemap[node_id][zone_idx][i]), 0, + PAGE_SIZE); + } + node_id++; + } +} + +/* + * free_dyn_pageflags(dyn_pageflags_t pagemap) + * + * Free a dynamically allocated pageflags bitmap. For Suspend2 usage, we + * support data being relocated from slab to pages that don't conflict + * with the image that will be copied back. This is the reason for the + * PageSlab tests below. + * + */ +void free_dyn_pageflags(dyn_pageflags_t *pagemap) +{ + int i = 0, zone_pages, node_id = -1, zone_idx; + struct zone *zone; + struct pglist_data *pgdat; + + if (!*pagemap) + return; + + PR_DEBUG("Seeking to free dyn_pageflags %p.\n", pagemap); + + for_each_online_pgdat(pgdat) { + node_id++; + + PR_DEBUG("Node id %d.\n", node_id); + + if (!(*pagemap)[node_id]) { + PR_DEBUG("Node %d unallocated.\n", node_id); + continue; + } + + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + zone = &pgdat->node_zones[zone_idx]; + if (!populated_zone(zone)) { + PR_DEBUG("Node %d zone %d unpopulated.\n", node_id, zone_idx); + continue; + } + + if (!(*pagemap)[node_id][zone_idx]) { + PR_DEBUG("Node %d zone %d unallocated.\n", node_id, zone_idx); + continue; + } + + PR_DEBUG("Node id %d. Zone %d.\n", node_id, zone_idx); + + zone_pages = pages_for_zone(zone); + + for (i = 0; i < zone_pages; i++) { + PR_DEBUG("Node id %d. Zone %d. 
Page %d.\n", node_id, zone_idx, i); + free_page((unsigned long)(*pagemap)[node_id][zone_idx][i]); + } + + kfree((*pagemap)[node_id][zone_idx]); + } + PR_DEBUG("Free node %d (%p).\n", node_id, pagemap[node_id]); + kfree((*pagemap)[node_id]); + } + + PR_DEBUG("Free map pgdat list at %p.\n", pagemap); + kfree(*pagemap); + + *pagemap = NULL; + PR_DEBUG("Done.\n"); + return; +} + +static int try_alloc_dyn_pageflag_part(int nr_ptrs, void **ptr) +{ + *ptr = kzalloc(sizeof(void *) * nr_ptrs, GFP_ATOMIC); + PR_DEBUG("Got %p. Putting it in %p.\n", *ptr, ptr); + + if (*ptr) + return 0; + + printk("Error. Unable to allocate memory for dynamic pageflags."); + return -ENOMEM; +} + +/* + * allocate_dyn_pageflags + * + * Allocate a bitmap for dynamic page flags. + * + */ +int allocate_dyn_pageflags(dyn_pageflags_t *pagemap) +{ + int i, zone_idx, zone_pages, node_id = 0; + struct zone *zone; + struct pglist_data *pgdat; + + if (*pagemap) { + PR_DEBUG("Pagemap %p already allocated.\n", pagemap); + return 0; + } + + PR_DEBUG("Seeking to allocate dyn_pageflags %p.\n", pagemap); + + for_each_online_pgdat(pgdat) + node_id++; + + if (try_alloc_dyn_pageflag_part(node_id, (void **) pagemap)) + return -ENOMEM; + + node_id = 0; + + for_each_online_pgdat(pgdat) { + PR_DEBUG("Node %d.\n", node_id); + + if (try_alloc_dyn_pageflag_part(MAX_NR_ZONES, + (void **) &(*pagemap)[node_id])) + return -ENOMEM; + + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) { + PR_DEBUG("Zone %d of %d.\n", zone_idx, MAX_NR_ZONES); + + zone = &pgdat->node_zones[zone_idx]; + + if (!populated_zone(zone)) { + PR_DEBUG("Node %d zone %d unpopulated - won't allocate.\n", node_id, zone_idx); + continue; + } + + zone_pages = pages_for_zone(zone); + + PR_DEBUG("Node %d zone %d (needs %d pages).\n", node_id, zone_idx, zone_pages); + + if (try_alloc_dyn_pageflag_part(zone_pages, + (void **) &(*pagemap)[node_id][zone_idx])) + return -ENOMEM; + + for (i = 0; i < zone_pages; i++) { + unsigned long address = get_zeroed_page(GFP_ATOMIC); + if (!address) { + PR_DEBUG("Error. Unable to allocate memory for " + "dynamic pageflags."); + free_dyn_pageflags(pagemap); + return -ENOMEM; + } + PR_DEBUG("Node %d zone %d. Page %d.\n", node_id, zone_idx, i); + (*pagemap)[node_id][zone_idx][i] = + (unsigned long *) address; + } + } + node_id++; + } + + PR_DEBUG("Done.\n"); + return 0; +} + +#define GET_BIT_AND_UL(bitmap, page) \ + struct zone *zone = page_zone(page); \ + unsigned long zone_pfn = page_to_pfn(page) - zone->zone_start_pfn; \ + int node = page_to_nid(page); \ + int zone_num = zone_idx(zone); \ + int pagenum = PAGENUMBER(zone_pfn); \ + int page_offset = PAGEINDEX(zone_pfn); \ + unsigned long *ul = ((*bitmap)[node][zone_num][pagenum]) + page_offset; \ + int bit = PAGEBIT(zone_pfn); + +/* + * test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) + * + * Is the page flagged in the given bitmap? + * + */ + +int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) +{ + GET_BIT_AND_UL(bitmap, page); + return test_bit(bit, ul); +} + +/* + * set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) + * + * Set the flag for the page in the given bitmap. + * + */ + +void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) +{ + GET_BIT_AND_UL(bitmap, page); + set_bit(bit, ul); +} + +/* + * clear_dynpageflags(dyn_pageflags_t *bitmap, struct page *page) + * + * Clear the flag for the page in the given bitmap. 
+ * + */ + +void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) +{ + GET_BIT_AND_UL(bitmap, page); + clear_bit(bit, ul); +} + +/* + * get_next_bit_on(dyn_pageflags_t bitmap, int counter) + * + * Given a pfn (possibly -1), find the next pfn in the bitmap that + * is set. If there are no more flags set, return -1. + * + */ + +unsigned long get_next_bit_on(dyn_pageflags_t bitmap, unsigned long counter) +{ + struct page *page; + struct zone *zone; + unsigned long *ul = NULL; + unsigned long zone_offset; + int pagebit, zone_num, first = (counter == (max_pfn + 1)), node; + + if (first) + counter = first_online_pgdat()->node_zones->zone_start_pfn; + + page = pfn_to_page(counter); + zone = page_zone(page); + node = zone->zone_pgdat->node_id; + zone_num = zone_idx(zone); + zone_offset = counter - zone->zone_start_pfn; + + if (first) + goto test; + + do { + zone_offset++; + + if (zone_offset >= zone->spanned_pages) { + do { + zone = next_zone(zone); + if (!zone) + return max_pfn + 1; + } while(!zone->spanned_pages); + + zone_num = zone_idx(zone); + node = zone->zone_pgdat->node_id; + zone_offset = 0; + } +test: + pagebit = PAGEBIT(zone_offset); + + if (!pagebit || !ul) + ul = (bitmap[node][zone_num][PAGENUMBER(zone_offset)]) + + PAGEINDEX(zone_offset); + + if (!(*ul & ~((1 << pagebit) - 1))) { + zone_offset += BITS_PER_LONG - pagebit - 1; + continue; + } + + } while(!test_bit(pagebit, ul)); + + return zone->zone_start_pfn + zone_offset; +} + diff --git a/lib/vsprintf.c b/lib/vsprintf.c index b025864..2138c47 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -236,6 +236,29 @@ static char * number(char * buf, char * end, unsigned long long num, int base, i return buf; } +/* + * vsnprintf_used + * + * Functionality : Print a string with parameters to a buffer of a + * limited size. Unlike vsnprintf, we return the number + * of bytes actually put in the buffer, not the number + * that would have been put in if it was big enough. + */ +int snprintf_used(char *buffer, int buffer_size, const char *fmt, ...) +{ + int result; + va_list args; + + if (!buffer_size) + return 0; + + va_start(args, fmt); + result = vsnprintf(buffer, buffer_size, fmt, args); + va_end(args); + + return result > buffer_size ? buffer_size : result; +} + /** * vsnprintf - Format a string and place it in a buffer * @buf: The buffer to place the result into diff --git a/mm/vmscan.c b/mm/vmscan.c index db023e2..54acc2c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -654,6 +654,28 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, return nr_taken; } +/* return_lru_pages puts a list of pages back on a zone's lru lists. */ + +static void return_lru_pages(struct list_head *page_list, struct zone *zone, + struct pagevec *pvec) +{ + while (!list_empty(page_list)) { + struct page *page = lru_to_page(page_list); + VM_BUG_ON(PageLRU(page)); + SetPageLRU(page); + list_del(&page->lru); + if (PageActive(page)) + add_page_to_active_list(zone, page); + else + add_page_to_inactive_list(zone, page); + if (!pagevec_add(pvec, page)) { + spin_unlock_irq(&zone->lru_lock); + __pagevec_release(pvec); + spin_lock_irq(&zone->lru_lock); + } + } +} + /* * shrink_inactive_list() is a helper for shrink_zone(). 
It returns the number * of reclaimed pages @@ -671,7 +693,6 @@ static unsigned long shrink_inactive_list(unsigned long max_scan, lru_add_drain(); spin_lock_irq(&zone->lru_lock); do { - struct page *page; unsigned long nr_taken; unsigned long nr_scan; unsigned long nr_freed; @@ -701,21 +722,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan, /* * Put back any unfreeable pages. */ - while (!list_empty(&page_list)) { - page = lru_to_page(&page_list); - VM_BUG_ON(PageLRU(page)); - SetPageLRU(page); - list_del(&page->lru); - if (PageActive(page)) - add_page_to_active_list(zone, page); - else - add_page_to_inactive_list(zone, page); - if (!pagevec_add(&pvec, page)) { - spin_unlock_irq(&zone->lru_lock); - __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); - } - } + return_lru_pages(&page_list, zone, &pvec); } while (nr_scanned < max_scan); spin_unlock(&zone->lru_lock); done: @@ -1276,6 +1283,72 @@ out: return nr_reclaimed; } +struct lru_save { + struct zone *zone; + struct list_head active_list; + struct list_head inactive_list; + struct lru_save *next; +}; + +struct lru_save *lru_save_list; + +void unlink_lru_lists(void) +{ + struct zone *zone; + + for_each_zone(zone) { + struct lru_save *this; + unsigned long moved, scanned; + + if (!zone->spanned_pages) + continue; + + this = (struct lru_save *) + kzalloc(sizeof(struct lru_save), GFP_ATOMIC); + + BUG_ON(!this); + + this->next = lru_save_list; + lru_save_list = this; + + this->zone = zone; + + spin_lock_irq(&zone->lru_lock); + INIT_LIST_HEAD(&this->active_list); + INIT_LIST_HEAD(&this->inactive_list); + moved = isolate_lru_pages(zone_page_state(zone, NR_ACTIVE), + &zone->active_list, &this->active_list, + &scanned); + __mod_zone_page_state(zone, NR_ACTIVE, -moved); + moved = isolate_lru_pages(zone_page_state(zone, NR_INACTIVE), + &zone->inactive_list, &this->inactive_list, + &scanned); + __mod_zone_page_state(zone, NR_INACTIVE, -moved); + spin_unlock_irq(&zone->lru_lock); + } +} + +void relink_lru_lists(void) +{ + while(lru_save_list) { + struct lru_save *this = lru_save_list; + struct zone *zone = this->zone; + struct pagevec pvec; + + pagevec_init(&pvec, 1); + + lru_save_list = this->next; + + spin_lock_irq(&zone->lru_lock); + return_lru_pages(&this->active_list, zone, &pvec); + return_lru_pages(&this->inactive_list, zone, &pvec); + spin_unlock_irq(&zone->lru_lock); + pagevec_release(&pvec); + + kfree(this); + } +} + /* * The background pageout daemon, started as a kernel thread * from the init process. 
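The unlink_lru_lists()/relink_lru_lists() pair added above is intended to be
called by the hibernation core around the atomic copy; those callers are not
in this file, so the following is only a sketch of the assumed usage:

	unlink_lru_lists();	/* pull LRU pages off the per-zone lists */
	/* ... take and write the atomic copy of memory ... */
	relink_lru_lists();	/* put them back before normal VM activity resumes */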
@@ -1323,8 +1396,6 @@ static int kswapd(void *p) for ( ; ; ) { unsigned long new_order; - try_to_freeze(); - /* kswapd has been busy so delay watermark_timer */ mod_timer(&pgdat->watermark_timer, jiffies + WT_EXPIRY); prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); @@ -1335,13 +1406,20 @@ static int kswapd(void *p) */ order = new_order; } else { set_user_nice(tsk, 0); - schedule(); + if (!freezing(current)) + schedule(); + order = pgdat->kswapd_max_order; } finish_wait(&pgdat->kswapd_wait, &wait); - balance_pgdat(pgdat, order); + if (!try_to_freeze()) { + /* We can speed up thawing tasks if we don't call + * balance_pgdat after returning from the refrigerator + */ + balance_pgdat(pgdat, order); + } } return 0; } @@ -1355,6 +1433,9 @@ void wakeup_kswapd(struct zone *zone, int order) if (!populated_zone(zone)) return; + if (freezer_is_on()) + return; + pgdat = zone->zone_pgdat; if (zone_watermark_ok(zone, order, zone->pages_low, 0, 0)) return; @@ -1368,6 +1449,91 @@ void wakeup_kswapd(struct zone *zone, int order) } #ifdef CONFIG_PM +void shrink_one_zone(struct zone *zone, int total_to_free) +{ + int prio; + unsigned long still_to_free = total_to_free; + struct scan_control sc = { + .gfp_mask = GFP_KERNEL, + .may_swap = 0, + .may_writepage = 1, + .mapped = vm_mapped, + }; + + if (!populated_zone(zone) || zone->all_unreclaimable) + return; + + if (still_to_free <= 0) + return; + + if (is_highmem(zone)) + sc.gfp_mask |= __GFP_HIGHMEM; + + for (prio = DEF_PRIORITY; prio >= 0; prio--) { + unsigned long to_free, just_freed, orig_size; + unsigned long old_nr_active; + + to_free = min(zone_page_state(zone, NR_ACTIVE) + + zone_page_state(zone, NR_INACTIVE), + still_to_free); + + if (to_free <= 0) + return; + + sc.swap_cluster_max = to_free - + zone_page_state(zone, NR_INACTIVE); + + do { + old_nr_active = zone_page_state(zone, NR_ACTIVE); + zone->nr_scan_active = sc.swap_cluster_max - 1; + shrink_active_list(sc.swap_cluster_max, zone, &sc, + prio); + zone->nr_scan_active = 0; + + sc.swap_cluster_max = to_free - zone_page_state(zone, + NR_INACTIVE); + + } while (sc.swap_cluster_max > 0 && + zone_page_state(zone, NR_ACTIVE) > old_nr_active); + + to_free = min(zone_page_state(zone, NR_ACTIVE) + + zone_page_state(zone, NR_INACTIVE), + still_to_free); + + do { + orig_size = zone_page_state(zone, NR_ACTIVE) + + zone_page_state(zone, NR_INACTIVE); + zone->nr_scan_inactive = to_free; + sc.swap_cluster_max = to_free; + shrink_inactive_list(to_free, zone, &sc); + just_freed = (orig_size - + (zone_page_state(zone, NR_ACTIVE) + + zone_page_state(zone, NR_INACTIVE))); + zone->nr_scan_inactive = 0; + still_to_free -= just_freed; + to_free -= just_freed; + } while (just_freed > 0 && still_to_free > 0); + }; + + while (still_to_free > 0) { + unsigned long nr_slab = global_page_state(NR_SLAB_RECLAIMABLE); + struct reclaim_state reclaim_state; + + if (nr_slab > still_to_free) + nr_slab = still_to_free; + + reclaim_state.reclaimed_slab = 0; + shrink_slab(nr_slab, sc.gfp_mask, nr_slab); + if (!reclaim_state.reclaimed_slab) + break; + + still_to_free -= reclaim_state.reclaimed_slab; + } + + return; +} + + /* * Helper function for shrink_all_memory(). Tries to reclaim 'nr_pages' pages * from LRU lists system-wide, for given pass and priority, and returns the