Magellan Linux

Annotation of /trunk/kernel26-alx/patches-2.6.33-r2/0153-2.6.33-unionfs-2.5.4.patch

Revision 1192
Sat Nov 20 15:43:20 2010 UTC (13 years, 5 months ago) by niro
File size: 336788 byte(s)
added fixes for intel-agp and i915 module loading issues
1 niro 1192 diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
2     index 875d496..0a9acac 100644
3     --- a/Documentation/filesystems/00-INDEX
4     +++ b/Documentation/filesystems/00-INDEX
5     @@ -106,6 +106,8 @@ udf.txt
6     - info and mount options for the UDF filesystem.
7     ufs.txt
8     - info on the ufs filesystem.
9     +unionfs/
10     + - info on the unionfs filesystem
11     vfat.txt
12     - info on using the VFAT filesystem used in Windows NT and Windows 95
13     vfs.txt
14     diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
15     new file mode 100644
16     index 0000000..96fdf67
17     --- /dev/null
18     +++ b/Documentation/filesystems/unionfs/00-INDEX
19     @@ -0,0 +1,10 @@
20     +00-INDEX
21     + - this file.
22     +concepts.txt
23     + - A brief introduction of concepts.
24     +issues.txt
25     + - A summary of known issues with unionfs.
26     +rename.txt
27     + - Information regarding rename operations.
28     +usage.txt
29     + - Usage information and examples.
30     diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
31     new file mode 100644
32     index 0000000..b853788
33     --- /dev/null
34     +++ b/Documentation/filesystems/unionfs/concepts.txt
35     @@ -0,0 +1,287 @@
36     +Unionfs 2.x CONCEPTS:
37     +=====================
38     +
39     +This file describes the concepts needed by a namespace unification file
40     +system.
41     +
42     +
43     +Branch Priority:
44     +================
45     +
46     +Each branch is assigned a unique priority - starting from 0 (highest
47     +priority). No two branches can have the same priority.
48     +
49     +
50     +Branch Mode:
51     +============
52     +
53     +Each branch is assigned a mode - read-write or read-only. This allows
54     +directories on media mounted read-write to be used in a read-only manner.
55     +
56     +
57     +Whiteouts:
58     +==========
59     +
60     +A whiteout removes a file name from the namespace. Whiteouts are needed when
61     +one attempts to remove a file on a read-only branch.
62     +
63     +Suppose we have a two-branch union, where branch 0 is read-write and branch
64     +1 is read-only, and a file 'foo' exists on branch 1:
65     +
66     +./b0/
67     +./b1/
68     +./b1/foo
69     +
70     +The unified view would simply be:
71     +
72     +./union/
73     +./union/foo
74     +
75     +Since 'foo' is stored on a read-only branch, it cannot be removed. A
76     +whiteout is used to remove the name 'foo' from the unified namespace. Again,
77     +since branch 1 is read-only, the whiteout cannot be created there. So, we
78     +try on a higher priority (lower numerically) branch and create the whiteout
79     +there.
80     +
81     +./b0/
82     +./b0/.wh.foo
83     +./b1/
84     +./b1/foo
85     +
86     +Later, when Unionfs traverses branches (due to lookup or readdir), it
87     +eliminates 'foo' from the namespace (as well as the whiteout itself).
88     +
89     +
90     +Opaque Directories:
91     +===================
92     +
93     +Assume we have a unionfs mount comprising two branches. Branch 0 is
94     +empty; branch 1 has the directory /a and file /a/f. Let's say we mount a
95     +union of branch 0 as read-write and branch 1 as read-only. Now, let's say
96     +we try to perform the following operation in the union:
97     +
98     + rm -fr a
99     +
100     +Because branch 1 is not writable, we cannot physically remove the file /a/f
101     +or the directory /a. So instead, we will create a whiteout in branch 0
102     +named /.wh.a, masking out the name "a" from branch 1. Next, let's say we
103     +try to create a directory named "a" as follows:
104     +
105     + mkdir a
106     +
107     +Because we have a whiteout for "a" already, Unionfs behaves as if "a"
108     +doesn't exist, and thus will delete the whiteout and replace it with an
109     +actual directory named "a".
110     +
111     +The problem now is that if you try to "ls" in the union, Unionfs will
112     +perform its normal directory name unification, for *all* directories named
113     +"a" in all branches. This will cause the file /a/f from branch 1 to
114     +re-appear in the union's namespace, which violates Unix semantics.
115     +
116     +To avoid this problem, we have a different form of whiteouts for
117     +directories, called "opaque directories" (same as BSD Union Mount does).
118     +Whenever we replace a whiteout with a directory, that directory is marked as
119     +opaque. In Unionfs 2.x, it means that we create a file named
120     +/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
121     +When unionfs notices that a directory is opaque, it stops all namespace
122     +operations (including merging readdir contents) at that opaque directory.
123     +This prevents re-exposing names from masked out directories.
124     +
125     +
126     +Duplicate Elimination:
127     +======================
128     +
129     +It is possible for files on different branches to have the same name.
130     +Unionfs then has to select which instance of the file to show to the user.
131     +Given the fact that each branch has a priority associated with it, the
132     +simplest solution is to take the instance from the highest priority
133     +(numerically lowest value) and "hide" the others.
134     +
135     +
136     +Unlinking:
137     +==========
138     +
139     +The unlink operation on non-directory instances is optimized to remove as
140     +many objects as possible when multiple underlying branches have the same
141     +file name. The unlink operation first tries to delete file instances
142     +from the highest priority branch and then moves on to delete from the remaining
143     +branches in order of decreasing priority. Consider a case (F..D..F),
144     +where F is a file and D is a directory of the same name; here, some
145     +intermediate branch could have an empty directory instance with the same
146     +name, so this operation also tries to delete this directory instance and
147     +proceeds to delete from the next possible lower priority branch. The
148     +unionfs unlink operation thus deletes the files with the same name from
149     +all possible underlying branches. If an error occurs, it creates a
150     +whiteout in the highest priority branch that hides the file instances in the
151     +remaining branches. An error could occur either if an unlink operation in
152     +any of the underlying branches failed or if a branch has no write permission.
153     +
154     +This unlinking policy is known as "delete all" and it has the benefit of
155     +overall reducing the number of inodes used by duplicate files, and further
156     +reducing the total number of inodes consumed by whiteouts. The cost is of
157     +extra processing, but testing shows this extra processing is well worth the
158     +savings.
159     +
160     +
161     +Copyup:
162     +=======
163     +
164     +When a change is made to the contents of a file's data or meta-data, they
165     +have to be stored somewhere. The best way is to create a copy of the
166     +original file on a branch that is writable, and then redirect the write
167     +through to this copy. The copy must be made on a higher priority branch so
168     +that lookup and readdir return this newer "version" of the file rather than
169     +the original (see duplicate elimination).
170     +
171     +An entire unionfs mount can be read-only or read-write. If it's read-only,
172     +then none of the branches will be written to, even if some of the branches
173     +are physically writeable. If the unionfs mount is read-write, then the
174     +leftmost (highest priority) branch must be writeable (for copyup to take
175     +place); the remaining branches can be any mix of read-write and read-only.
176     +
177     +In a writeable mount, unionfs will create new files/dir in the leftmost
178     +branch. If one tries to modify a file in a read-only branch/media, unionfs
179     +will copyup the file to the leftmost branch and modify it there. If you try
180     +to modify a file from a writeable branch which is not the leftmost branch,
181     +then unionfs will modify it in that branch; this is useful if you, say,
182     +unify different packages (e.g., apache, sendmail, ftpd, etc.) and you want
183     +changes to specific package files to remain logically in the directory where
184     +they came from.
185     +
186     +Cache Coherency:
187     +================
188     +
189     +Unionfs users often want to be able to modify files and directories directly
190     +on the lower branches, and have those changes be visible at the Unionfs
191     +level. This means that data (e.g., pages) and meta-data (dentries, inodes,
192     +open files, etc.) have to be synchronized between the upper and lower
193     +layers. In other words, the newest changes from a layer below have to be
194     +propagated to the Unionfs layer above. If the two layers are not in sync, a
195     +cache incoherency ensues, which could lead to application failures and even
196     +oopses. The Linux kernel, however, has a rather limited set of mechanisms
197     +to ensure this inter-layer cache coherency---so Unionfs has to do most of
198     +the hard work on its own.
199     +
200     +Maintaining Invariants:
201     +
202     +The way Unionfs ensures cache coherency is as follows. At each entry point
203     +to a Unionfs file system method, we call a utility function to validate the
204     +primary objects of this method. Generally, we call unionfs_file_revalidate
205     +on open files, and __unionfs_d_revalidate_chain on dentries (which also
206     +validates inodes). These utility functions check to see whether the upper
207     +Unionfs object is in sync with any of the lower objects that it represents.
208     +The checks we perform include whether the Unionfs superblock has a newer
209     +generation number, or if any of the lower objects mtime's or ctime's are
210     +newer. (Note: generation numbers change when branch-management commands are
211     +issued, so in a way, maintaining cache coherency is also very important for
212     +branch-management.) If indeed we determine that any Unionfs object is no
213     +longer in sync with its lower counterparts, then we rebuild that object
214     +similarly to how we do so for branch-management.
215     +
216     +While rebuilding Unionfs's objects, we also purge any page mappings and
217     +truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This is to
218     +ensure that Unionfs will re-get the newer data from the lower branches. We
219     +perform this purging only if the Unionfs operation in question is a reading
220     +operation; if Unionfs is performing a data writing operation (e.g., ->write,
221     +->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
222     +because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
223     +considered more authoritative anyway, as they are newer and will overwrite
224     +any lower pages.
225     +
226     +Unionfs maintains the following important invariant regarding mtime's,
227     +ctime's, and atime's: the upper inode object's times are the max() of all of
228     +the lower ones. For non-directory objects, there's only one object below,
229     +so the mapping is simple; for directory objects, there could be multiple
230     +lower objects and we have to sync up with the newest one of all the lower
231     +ones. This invariant is important to maintain, especially for directories
232     +(besides, we need this to be POSIX compliant). A union could comprise
233     +multiple writable branches, each of which could change. If we don't reflect
234     +the newest possible mtime/ctime, some applications could fail. For example,
235     +NFSv2/v3 exports check for newer directory mtimes on the server to determine
236     +if the client-side attribute cache should be purged.
237     +
238     +To maintain these important invariants, of course, Unionfs carefully
239     +synchronizes upper and lower times in various places. For example, if we
240     +copy-up a file to a top-level branch, the parent directory where the file
241     +was copied up to will now have a new mtime: so after a successful copy-up,
242     +we sync up with the new top-level branch's parent directory mtime.
243     +
244     +Implementation:
245     +
246     +This cache-coherency implementation is efficient because it defers any
247     +synchronizing between the upper and lower layers until absolutely needed.
248     +Consider, for example, a common situation where users perform a lot of lower
249     +changes, such as untarring a whole package. While these take place,
250     +typically the user doesn't access the files via Unionfs; only after the
251     +lower changes are done, does the user try to access the lower files. With
252     +our cache-coherency implementation, the entirety of the changes to the lower
253     +branches will not result in a single CPU cycle spent at the Unionfs level
254     +until the user invokes a system call that goes through Unionfs.
255     +
256     +We have considered two alternate cache-coherency designs. (1) Using the
257     +dentry/inode notify functionality to register interest in finding out about
258     +any lower changes. This is a somewhat limited and also a heavy-handed
259     +approach which could result in many notifications to the Unionfs layer upon
260     +each small change at the lower layer (imagine a file being modified multiple
261     +times in rapid succession). (2) Rewriting the VFS to support explicit
262     +callbacks from lower objects to upper objects. We began exploring such an
263     +implementation, but found it to be very complicated--it would have resulted
264     +in massive VFS/MM changes which are unlikely to be accepted by the LKML
265     +community. We therefore believe that our current cache-coherency design and
266     +implementation represent the best approach at this time.
267     +
268     +Limitations:
269     +
270     +Our implementation works as long as a user process causes Unionfs to be
271     +called, directly or indirectly, even if only to do ->d_revalidate; at that
272     +point, we will have purged the current Unionfs data and the
273     +process will see the new data. For example, a process that continually
274     +re-reads the same file's data will see the NEW data as soon as the lower
275     +file had changed, upon the next read(2) syscall (even if the file is still
276     +open!) However, this doesn't work when the process re-reads the open file's
277     +data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens
278     +it). Once we respond to ->readpage(s), then the kernel maps the page into
279     +the process's address space and there doesn't appear to be a way to force
280     +the kernel to invalidate those pages/mappings, and force the process to
281     +re-issue ->readpage. If there's a way to invalidate active mappings and
282     +force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do
283     +the trick).
284     +
285     +Our current Unionfs code has to perform many file-revalidation calls. It
286     +would be really nice if the VFS would export an optional file system hook
287     +->file_revalidate (similarly to dentry->d_revalidate) that will be called
288     +before each VFS op that has a "struct file" in it.
289     +
290     +Certain file systems have micro-second granularity (or better) for inode
291     +times, and asynchronous actions could cause those times to change with some
292     +small delay. In such cases, Unionfs may see a changed inode time that only
293     +differs by a tiny fraction of a second: such a change may be a false
294     +positive indication that the lower object has changed, whereas if unionfs
295     +waits a little longer, that false indication will not be seen. (These false
296     +positives are harmless, because they would at most cause unionfs to
297     +re-validate an object that may need no revalidation, and print a debugging
298     +message that clutters the console/logs.) Therefore, to minimize the chances
299     +of these situations, we delay the detection of changed times by a small
300     +factor of a few seconds, called UNIONFS_MIN_CC_TIME (which defaults to 3
301     +seconds, as does NFS). This means that we will detect the change, only a
302     +couple of seconds later, if indeed the time change persists in the lower
303     +file object. This delayed detection has an added performance benefit: we
304     +reduce the number of times that unionfs has to revalidate objects, in case
305     +there's a lot of concurrent activity on both the upper and lower objects,
306     +for the same file(s). Lastly, this delayed time attribute detection is
307     +similar to how NFS clients operate (e.g., acregmin).
308     +
309     +Finally, there is no way currently in Linux to prevent lower directories
310     +from being moved around (i.e., topology changes); there's no way to prevent
311     +modifications to directory sub-trees of whole file systems which are mounted
312     +read-write. It is therefore possible for in-flight operations in unionfs to
313     +take place, while a lower directory is being moved around. Therefore, if
314     +you try to, say, create a new file in a directory through unionfs, while the
315     +directory is being moved around directly, then the new file may get created
316     +in the new location where that directory was moved to. This is somewhat
317     +similar to the behaviour in NFS: an NFS client could be creating a new file while
318     +the NFS server is moving the directory around; the file will get successfully
319     +created in the new location. (The one exception in unionfs is that if the
320     +branch is marked read-only by unionfs, then a copyup will take place.)
321     +
322     +For more information, see <http://unionfs.filesystems.org/>.
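The whiteout, opaque-directory, and duplicate-elimination rules from concepts.txt can be illustrated with a small user-space model. This is only a sketch and not the kernel implementation: the function name union_listing, the representation of each branch as a Python set of names found in one directory, and the choice that a whiteout masks names only in lower-priority branches are all assumptions made for illustration.

```python
WH_PREFIX = ".wh."                   # whiteout name prefix, as in ./b0/.wh.foo
OPAQUE_MARKER = ".wh.__dir_opaque"   # opaque-directory marker named in the text


def union_listing(branches):
    """Return the names visible in one unified directory.

    branches is a list of sets; branches[0] is the highest-priority branch.
    Each set holds the names physically present in that branch's copy of
    the directory (hypothetical model, not the kernel data structures).
    """
    visible = []     # names shown in the union, highest priority first
    masked = set()   # names hidden by whiteouts in higher-priority branches
    for names in branches:
        for name in sorted(names):
            if name.startswith(WH_PREFIX):
                continue  # whiteouts and the opaque marker are never shown
            # Duplicate elimination: the first (highest-priority) instance wins.
            if name not in masked and name not in visible:
                visible.append(name)
        # Assumption: whiteouts found here mask names in lower branches only.
        for name in names:
            if name.startswith(WH_PREFIX) and name != OPAQUE_MARKER:
                masked.add(name[len(WH_PREFIX):])
        # An opaque directory stops unification at this branch.
        if OPAQUE_MARKER in names:
            break
    return visible
```

With branch 0 holding only .wh.foo and branch 1 holding foo, the model returns an empty listing, matching the whiteout example. With branch 0 holding only the opaque marker and branch 1 holding f, the file f stays hidden.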
323     diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
324     new file mode 100644
325     index 0000000..f4b7e7e
326     --- /dev/null
327     +++ b/Documentation/filesystems/unionfs/issues.txt
328     @@ -0,0 +1,28 @@
329     +KNOWN Unionfs 2.x ISSUES:
330     +=========================
331     +
332     +1. Unionfs should not use lookup_one_len() on the underlying f/s as it
333     + confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the
334     + lower file-system; this eliminates part of the problem. The remaining
335     + calls to lookup_one_len may need to be changed to pass an intent. We are
336     + currently introducing VFS changes to fs/namei.c's do_path_lookup() to
337     + allow proper file lookup and opening in stackable file systems.
338     +
339     +2. Lockdep (a debugging feature) isn't aware of stacking, and so it
340     + incorrectly complains about locking problems. The problem boils down to
341     + this: Lockdep considers all objects of a certain type to be in the same
342     + class, for example, all inodes. Lockdep doesn't like to see a lock held
343     + on two inodes within the same task, and warns that it could lead to a
344     + deadlock. However, stackable file systems do precisely that: they lock
345     + an upper object, and then a lower object, in a strict order to avoid
346     + locking problems; in addition, Unionfs, as a fan-out file system, may
347     + have to lock several lower inodes. We are currently looking into Lockdep
348     + to see how to make it aware of stackable file systems. For now, we
349     + temporarily disable lockdep when calling vfs methods on lower objects,
350     + but only for those places where lockdep complained. While this solution
351     + may seem unclean, it is not without precedent: other places in the kernel
352     + also do similar temporary disabling, of course after carefully having
353     + checked that it is the right thing to do. Anyway, if you get any warnings
354     + from Lockdep, please report them to the Unionfs maintainers.
355     +
356     +For more information, see <http://unionfs.filesystems.org/>.
357     diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
358     new file mode 100644
359     index 0000000..e20bb82
360     --- /dev/null
361     +++ b/Documentation/filesystems/unionfs/rename.txt
362     @@ -0,0 +1,31 @@
363     +Rename is a complex beast. The following table shows which rename(2) operations
364     +should succeed and which should fail.
365     +
366     +o: success
367     +E: error (either unionfs or vfs)
368     +X: EXDEV
369     +
370     +none = file does not exist
371     +file = file is a file
372     +dir = file is an empty directory
373     +child= file is a non-empty directory
374     +wh = file is a directory containing only whiteouts; this makes it logically
375     + empty
376     +
377     +       none    file    dir     child   wh
378     +file   o       o       E       E       E
379     +dir    o       E       o       E       o
380     +child  X       E       X       E       X
381     +wh     o       E       o       E       o
382     +
383     +
384     +Renaming directories:
385     +=====================
386     +
387     +Whenever an empty (either physically or logically) directory is being renamed,
388     +the following sequence of events should take place:
389     +
390     +1) Remove whiteouts from both source and destination directory
391     +2) Rename source to destination
392     +3) Make destination opaque to prevent anything under it from showing up
393     +
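The rename table in rename.txt can be expressed as a lookup structure, which makes it easy to check an expected outcome in a test script. This is a sketch only: the text does not say which axis is the source and which is the destination, so reading rows as the source and columns as the destination is an assumption, and the names RENAME_TABLE and rename_outcome are hypothetical.

```python
# Outcome codes from the table: "o" success, "E" error, "X" EXDEV.
# Assumption: outer key = source kind (table row),
#             inner key = destination kind (table column).
RENAME_TABLE = {
    "file":  {"none": "o", "file": "o", "dir": "E", "child": "E", "wh": "E"},
    "dir":   {"none": "o", "file": "E", "dir": "o", "child": "E", "wh": "o"},
    "child": {"none": "X", "file": "E", "dir": "X", "child": "E", "wh": "X"},
    "wh":    {"none": "o", "file": "E", "dir": "o", "child": "E", "wh": "o"},
}


def rename_outcome(source, destination):
    """Look up the expected result code for one rename(2) combination."""
    return RENAME_TABLE[source][destination]
```

Under that reading, rename_outcome("child", "none") gives "X": renaming a non-empty directory to a name that does not exist is reported as EXDEV.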
394     diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
395     new file mode 100644
396     index 0000000..1adde69
397     --- /dev/null
398     +++ b/Documentation/filesystems/unionfs/usage.txt
399     @@ -0,0 +1,134 @@
400     +Unionfs is a stackable unification file system, which can appear to merge
401     +the contents of several directories (branches), while keeping their physical
402     +content separate. Unionfs is useful for unified source tree management,
403     +merged contents of split CD-ROM, merged separate software package
404     +directories, data grids, and more. Unionfs allows any mix of read-only and
405     +read-write branches, as well as insertion and deletion of branches anywhere
406     +in the fan-out. To maintain Unix semantics, Unionfs handles elimination of
407     +duplicates, partial-error conditions, and more.
408     +
409     +GENERAL SYNTAX
410     +==============
411     +
412     +# mount -t unionfs -o <OPTIONS>,<BRANCH-OPTIONS> none MOUNTPOINT
413     +
414     +OPTIONS can be any legal combination of:
415     +
416     +- ro # mount file system read-only
417     +- rw # mount file system read-write
418     +- remount # remount the file system (see Branch Management below)
419     +- incgen # increment generation no. (see Cache Consistency below)
420     +
421     +BRANCH-OPTIONS can be either (1) a list of branches given to the "dirs="
422     +option, or (2) a list of individual branch manipulation commands, combined
423     +with the "remount" option, and is further described in the "Branch
424     +Management" section below.
425     +
426     +The syntax for the "dirs=" mount option is:
427     +
428     + dirs=branch[=ro|=rw][:...]
429     +
430     +The "dirs=" option takes a colon-delimited list of directories to compose
431     +the union, with an optional branch mode for each of those directories.
432     +Directories that come earlier (specified first, on the left) in the list
433     +have a higher precedence than those which come later. Additionally,
434     +read-only or read-write permissions of the branch can be specified by
435     +appending =ro or =rw (default) to each directory. See the Copyup section in
436     +concepts.txt, for a description of Unionfs's behavior when mixing read-only
437     +and read-write branches and mounts.
438     +
439     +Syntax:
440     +
441     + dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
442     +
443     +Example:
444     +
445     + dirs=/writable_branch=rw:/read-only_branch=ro
446     +
447     +
448     +BRANCH MANAGEMENT
449     +=================
450     +
451     +Once you mount your union for the first time, using the "dirs=" option, you
452     +can then change the union's overall mode or reconfigure the branches, using
453     +the remount option, as follows.
454     +
455     +To downgrade a union from read-write to read-only:
456     +
457     +# mount -t unionfs -o remount,ro none MOUNTPOINT
458     +
459     +To upgrade a union from read-only to read-write:
460     +
461     +# mount -t unionfs -o remount,rw none MOUNTPOINT
462     +
463     +To delete a branch /foo, regardless of where it is in the current union:
464     +
465     +# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
466     +
467     +To insert (add) a branch /foo before /bar:
468     +
469     +# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
470     +
471     +To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
472     +
473     +# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
474     +
475     +To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
476     +new highest-priority branch), you can use the above syntax, or use a
477     +shorthand version as follows:
478     +
479     +# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
480     +
481     +To append a branch to the very end (new lowest-priority branch):
482     +
483     +# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
484     +
485     +To append a branch to the very end (new lowest-priority branch), in
486     +read-only mode:
487     +
488     +# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
489     +
490     +Finally, to change the mode of one existing branch, say /foo, from read-only
491     +to read-write, and change /bar from read-write to read-only:
492     +
493     +# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
494     +
495     +Note: in Unionfs 2.x, you cannot set the leftmost branch to readonly because
496     +then Unionfs won't have any writable place for copyups to take place.
497     +Moreover, the VFS can get confused when it tries to modify something in a
498     +file system mounted read-write, but isn't permitted to write to it.
499     +Instead, you should set the whole union as readonly, as described above.
500     +If, however, you must set the leftmost branch as readonly, perhaps so you
501     +can get a snapshot of it at a point in time, then you should insert a new
502     +writable top-level branch, and mark the one you want as readonly. This can
503     +be accomplished as follows, assuming that /foo is your current leftmost
504     +branch:
505     +
506     +# mount -t tmpfs -o size=NNN none /new
507     +# mount -t unionfs -o remount,add=/new,mode=/foo=ro none MOUNTPOINT
508     +<do what you want safely in /foo>
509     +# mount -t unionfs -o remount,del=/new,mode=/foo=rw none MOUNTPOINT
510     +<check if there's anything in /new you want to preserve>
511     +# umount /new
512     +
513     +CACHE CONSISTENCY
514     +=================
515     +
516     +If you modify any file on any of the lower branches directly, while there is
517     +a Unionfs 2.x mounted above any of those branches, you should tell Unionfs
518     +to purge its caches and re-get the objects. To do that, you have to
519     +increment the generation number of the superblock using the following
520     +command:
521     +
522     +# mount -t unionfs -o remount,incgen none MOUNTPOINT
523     +
524     +Note that the older way of incrementing the generation number using an
525     +ioctl, is no longer supported in Unionfs 2.0 and newer. Ioctls in general
526     +are not encouraged. Plus, an ioctl is a per-file concept, whereas the
527     +generation number is a per-file-system concept. Worse, such an ioctl
528     +requires an open file, which then has to be invalidated by the very nature
529     +of the generation number increase (read: the old generation increase ioctl
530     +was pretty racy).
531     +
532     +
533     +For more information, see <http://unionfs.filesystems.org/>.
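The "dirs=" syntax described in usage.txt can be modelled with a short parser, for example to validate a mount option string in a script before calling mount. This is a user-space sketch and not the kernel's option parser: the function name parse_dirs and the (priority, path, mode) tuple layout are assumptions. It follows the text in treating the leftmost branch as priority 0 and in defaulting to "rw" when no mode is given.

```python
def parse_dirs(option):
    """Split a 'dirs=' mount option into (priority, path, mode) tuples.

    Priority 0 is the leftmost, highest-priority branch.
    Hypothetical helper for illustration only.
    """
    prefix = "dirs="
    if not option.startswith(prefix):
        raise ValueError("expected an option starting with 'dirs='")
    branches = []
    for priority, spec in enumerate(option[len(prefix):].split(":")):
        if spec.endswith("=ro") or spec.endswith("=rw"):
            path, mode = spec[:-3], spec[-2:]
        else:
            path, mode = spec, "rw"   # "=rw" is the documented default
        if not path:
            raise ValueError("empty branch path in %r" % option)
        branches.append((priority, path, mode))
    return branches
```

For the example from the text, parse_dirs("dirs=/writable_branch=rw:/read-only_branch=ro") yields the writable branch at priority 0 and the read-only branch at priority 1.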
534     diff --git a/MAINTAINERS b/MAINTAINERS
535     index 2533fc4..23bc981 100644
536     --- a/MAINTAINERS
537     +++ b/MAINTAINERS
538     @@ -5446,6 +5446,14 @@ F: Documentation/cdrom/
539     F: drivers/cdrom/cdrom.c
540     F: include/linux/cdrom.h
541    
542     +UNIONFS
543     +P: Erez Zadok
544     +M: ezk@cs.sunysb.edu
545     +L: unionfs@filesystems.org
546     +W: http://unionfs.filesystems.org/
547     +T: git git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git
548     +S: Maintained
549     +
550     UNSORTED BLOCK IMAGES (UBI)
551     M: Artem Bityutskiy <dedekind1@gmail.com>
552     W: http://www.linux-mtd.infradead.org/
553     diff --git a/fs/Kconfig b/fs/Kconfig
554     index 64d44ef..b69e2f2 100644
555     --- a/fs/Kconfig
556     +++ b/fs/Kconfig
557     @@ -169,6 +169,7 @@ if MISC_FILESYSTEMS
558     source "fs/adfs/Kconfig"
559     source "fs/affs/Kconfig"
560     source "fs/ecryptfs/Kconfig"
561     +source "fs/unionfs/Kconfig"
562     source "fs/hfs/Kconfig"
563     source "fs/hfsplus/Kconfig"
564     source "fs/befs/Kconfig"
565     diff --git a/fs/Makefile b/fs/Makefile
566     index af6d047..6c254d5 100644
567     --- a/fs/Makefile
568     +++ b/fs/Makefile
569     @@ -84,6 +84,7 @@ obj-$(CONFIG_ISO9660_FS) += isofs/
570     obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+
571     obj-$(CONFIG_HFS_FS) += hfs/
572     obj-$(CONFIG_ECRYPT_FS) += ecryptfs/
573     +obj-$(CONFIG_UNION_FS) += unionfs/
574     obj-$(CONFIG_VXFS_FS) += freevxfs/
575     obj-$(CONFIG_NFS_FS) += nfs/
576     obj-$(CONFIG_EXPORTFS) += exportfs/
577     diff --git a/fs/namei.c b/fs/namei.c
578     index a4855af..948c5e5 100644
579     --- a/fs/namei.c
580     +++ b/fs/namei.c
581     @@ -387,6 +387,7 @@ void release_open_intent(struct nameidata *nd)
582     else
583     fput(nd->intent.open.file);
584     }
585     +EXPORT_SYMBOL_GPL(release_open_intent);
586    
587     static inline struct dentry *
588     do_revalidate(struct dentry *dentry, struct nameidata *nd)
589     diff --git a/fs/splice.c b/fs/splice.c
590     index 3920866..488e3ba 100644
591     --- a/fs/splice.c
592     +++ b/fs/splice.c
593     @@ -1053,8 +1053,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
594     /*
595     * Attempt to initiate a splice from pipe to file.
596     */
597     -static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
598     - loff_t *ppos, size_t len, unsigned int flags)
599     +long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
600     + loff_t *ppos, size_t len, unsigned int flags)
601     {
602     ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
603     loff_t *, size_t, unsigned int);
604     @@ -1077,13 +1077,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
605    
606     return splice_write(pipe, out, ppos, len, flags);
607     }
608     +EXPORT_SYMBOL_GPL(vfs_splice_from);
609    
610     /*
611     * Attempt to initiate a splice from a file to a pipe.
612     */
613     -static long do_splice_to(struct file *in, loff_t *ppos,
614     - struct pipe_inode_info *pipe, size_t len,
615     - unsigned int flags)
616     +long vfs_splice_to(struct file *in, loff_t *ppos,
617     + struct pipe_inode_info *pipe, size_t len,
618     + unsigned int flags)
619     {
620     ssize_t (*splice_read)(struct file *, loff_t *,
621     struct pipe_inode_info *, size_t, unsigned int);
622     @@ -1103,6 +1104,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
623    
624     return splice_read(in, ppos, pipe, len, flags);
625     }
626     +EXPORT_SYMBOL_GPL(vfs_splice_to);
627    
628     /**
629     * splice_direct_to_actor - splices data directly between two non-pipes
630     @@ -1172,7 +1174,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
631     size_t read_len;
632     loff_t pos = sd->pos, prev_pos = pos;
633    
634     - ret = do_splice_to(in, &pos, pipe, len, flags);
635     + ret = vfs_splice_to(in, &pos, pipe, len, flags);
636     if (unlikely(ret <= 0))
637     goto out_release;
638    
639     @@ -1231,7 +1233,7 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
640     {
641     struct file *file = sd->u.file;
642    
643     - return do_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
644     + return vfs_splice_from(pipe, file, &sd->pos, sd->total_len, sd->flags);
645     }
646    
647     /**
648     @@ -1329,7 +1331,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
649     } else
650     off = &out->f_pos;
651    
652     - ret = do_splice_from(ipipe, out, off, len, flags);
653     + ret = vfs_splice_from(ipipe, out, off, len, flags);
654    
655     if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
656     ret = -EFAULT;
657     @@ -1350,7 +1352,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
658     } else
659     off = &in->f_pos;
660    
661     - ret = do_splice_to(in, off, opipe, len, flags);
662     + ret = vfs_splice_to(in, off, opipe, len, flags);
663    
664     if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
665     ret = -EFAULT;
666     diff --git a/fs/stack.c b/fs/stack.c
667     index 4a6f7f4..7eeef12 100644
668     --- a/fs/stack.c
669     +++ b/fs/stack.c
670     @@ -1,8 +1,20 @@
671     +/*
672     + * Copyright (c) 2006-2009 Erez Zadok
673     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
674     + * Copyright (c) 2006-2009 Stony Brook University
675     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
676     + *
677     + * This program is free software; you can redistribute it and/or modify
678     + * it under the terms of the GNU General Public License version 2 as
679     + * published by the Free Software Foundation.
680     + */
681     +
682     #include <linux/module.h>
683     #include <linux/fs.h>
684     #include <linux/fs_stack.h>
685    
686     -/* does _NOT_ require i_mutex to be held.
687     +/*
688     + * does _NOT_ require i_mutex to be held.
689     *
690     * This function cannot be inlined since i_size_{read,write} is rather
691     * heavy-weight on 32-bit systems
692     diff --git a/fs/super.c b/fs/super.c
693     index aff046b..ad6dc74 100644
694     --- a/fs/super.c
695     +++ b/fs/super.c
696     @@ -95,6 +95,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
697     s->s_count = S_BIAS;
698     atomic_set(&s->s_active, 1);
699     mutex_init(&s->s_vfs_rename_mutex);
700     + lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
701     mutex_init(&s->s_dquot.dqio_mutex);
702     mutex_init(&s->s_dquot.dqonoff_mutex);
703     init_rwsem(&s->s_dquot.dqptr_sem);
704     diff --git a/fs/unionfs/Kconfig b/fs/unionfs/Kconfig
705     new file mode 100644
706     index 0000000..f3c1ac4
707     --- /dev/null
708     +++ b/fs/unionfs/Kconfig
709     @@ -0,0 +1,24 @@
710     +config UNION_FS
711     + tristate "Union file system (EXPERIMENTAL)"
712     + depends on EXPERIMENTAL
713     + help
714     + Unionfs is a stackable unification file system, which appears to
715     + merge the contents of several directories (branches), while keeping
716     + their physical content separate.
717     +
718     + See <http://unionfs.filesystems.org> for details
719     +
720     +config UNION_FS_XATTR
721     + bool "Unionfs extended attributes"
722     + depends on UNION_FS
723     + help
724     + Extended attributes are name:value pairs associated with inodes by
725     + the kernel or by users (see the attr(5) manual page).
726     +
727     + If unsure, say N.
728     +
729     +config UNION_FS_DEBUG
730     + bool "Debug Unionfs"
731     + depends on UNION_FS
732     + help
733     + If you say Y here, you can turn on debugging output from Unionfs.
734     diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
735     new file mode 100644
736     index 0000000..1ef873e
737     --- /dev/null
738     +++ b/fs/unionfs/Makefile
739     @@ -0,0 +1,17 @@
740     +UNIONFS_VERSION="2.5.4 (for 2.6.33)"
741     +
742     +EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\"
743     +
744     +obj-$(CONFIG_UNION_FS) += unionfs.o
745     +
746     +unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
747     + rdstate.o copyup.o dirhelper.o rename.o unlink.o \
748     + lookup.o commonfops.o dirfops.o sioq.o mmap.o whiteout.o
749     +
750     +unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
751     +
752     +unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
753     +
754     +ifeq ($(CONFIG_UNION_FS_DEBUG),y)
755     +EXTRA_CFLAGS += -DDEBUG
756     +endif
757     diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
758     new file mode 100644
759     index 0000000..740c4ad
760     --- /dev/null
761     +++ b/fs/unionfs/commonfops.c
762     @@ -0,0 +1,896 @@
763     +/*
764     + * Copyright (c) 2003-2010 Erez Zadok
765     + * Copyright (c) 2003-2006 Charles P. Wright
766     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
767     + * Copyright (c) 2005-2006 Junjiro Okajima
768     + * Copyright (c) 2005 Arun M. Krishnakumar
769     + * Copyright (c) 2004-2006 David P. Quigley
770     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
771     + * Copyright (c) 2003 Puja Gupta
772     + * Copyright (c) 2003 Harikesavan Krishnan
773     + * Copyright (c) 2003-2010 Stony Brook University
774     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
775     + *
776     + * This program is free software; you can redistribute it and/or modify
777     + * it under the terms of the GNU General Public License version 2 as
778     + * published by the Free Software Foundation.
779     + */
780     +
781     +#include "union.h"
782     +
783     +/*
784     + * 1) Copyup the file
785     + * 2) Rename the file to '.unionfs<original inode#><counter>' - obviously
786     + * stolen from NFS's silly rename
787     + */
788     +static int copyup_deleted_file(struct file *file, struct dentry *dentry,
789     + struct dentry *parent, int bstart, int bindex)
790     +{
791     + static unsigned int counter;
792     + const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
793     + const int countersize = sizeof(counter) * 2;
794     + const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
795     + char name[nlen + 1];
796     + int err;
797     + struct dentry *tmp_dentry = NULL;
798     + struct dentry *lower_dentry;
799     + struct dentry *lower_dir_dentry = NULL;
800     +
801     + lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
802     +
803     + sprintf(name, ".unionfs%*.*lx",
804     + i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
805     +
806     + /*
807     + * Loop, looking for an unused temp name to copyup to.
808     + *
809     + * It's somewhat silly that we look for a free temp name in the
810     + * source branch (bstart) instead of the dest branch (bindex), where
811     + * the final name will be created. We _will_ catch it if somehow
812     + * the name exists in the dest branch, but it'd be nice to catch it
813     + * sooner rather than later.
814     + */
815     +retry:
816     + tmp_dentry = NULL;
817     + do {
818     + char *suffix = name + nlen - countersize;
819     +
820     + dput(tmp_dentry);
821     + counter++;
822     + sprintf(suffix, "%*.*x", countersize, countersize, counter);
823     +
824     + pr_debug("unionfs: trying to rename %s to %s\n",
825     + dentry->d_name.name, name);
826     +
827     + tmp_dentry = lookup_lck_len(name, lower_dentry->d_parent,
828     + nlen);
829     + if (IS_ERR(tmp_dentry)) {
830     + err = PTR_ERR(tmp_dentry);
831     + goto out;
832     + }
833     + } while (tmp_dentry->d_inode != NULL); /* need negative dentry */
834     + dput(tmp_dentry);
835     +
836     + err = copyup_named_file(parent->d_inode, file, name, bstart, bindex,
837     + i_size_read(file->f_path.dentry->d_inode));
838     + if (err) {
839     + if (unlikely(err == -EEXIST))
840     + goto retry;
841     + goto out;
842     + }
843     +
844     + /* bring it to the same state as an unlinked file */
845     + lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
846     + if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
847     + atomic_inc(&lower_dentry->d_inode->i_count);
848     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
849     + lower_dentry->d_inode);
850     + }
851     + lower_dir_dentry = lock_parent(lower_dentry);
852     + err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
853     + unlock_dir(lower_dir_dentry);
854     +
855     +out:
856     + if (!err)
857     + unionfs_check_dentry(dentry);
858     + return err;
859     +}
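
The temporary name built in copyup_deleted_file() has a fixed layout: the literal prefix ".unionfs", the lower inode number as zero-padded hex, then a zero-padded hex counter. The following is a minimal user-space sketch of that layout only. format_silly_name is a hypothetical helper that does not appear in the patch, and it uses snprintf with a bounds check where the kernel code uses sprintf into a stack buffer sized in advance.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not in the patch): build
 * ".unionfs<inode#><counter>" with both numbers as fixed-width hex.
 * Returns the name length, or -1 if buf is too small.
 */
static int format_silly_name(char *buf, size_t buflen,
                             unsigned long ino, unsigned int counter)
{
    /* two hex digits per byte of the underlying type */
    const int ino_width = (int)sizeof(ino) * 2;
    const int counter_width = (int)sizeof(counter) * 2;
    int n;

    assert(buf != NULL);
    /* "%*.*lx": width equals precision, so the value is zero-padded */
    n = snprintf(buf, buflen, ".unionfs%*.*lx%*.*x",
                 ino_width, ino_width, ino,
                 counter_width, counter_width, counter);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

Because both fields have a fixed width, the counter suffix always starts at the same offset, which is what lets the retry loop above overwrite only the suffix on each attempt.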
860     +
861     +/*
862     + * put all references held by upper struct file and free lower file pointer
863     + * array
864     + */
865     +static void cleanup_file(struct file *file)
866     +{
867     + int bindex, bstart, bend;
868     + struct file **lower_files;
869     + struct file *lower_file;
870     + struct super_block *sb = file->f_path.dentry->d_sb;
871     +
872     + lower_files = UNIONFS_F(file)->lower_files;
873     + bstart = fbstart(file);
874     + bend = fbend(file);
875     +
876     + for (bindex = bstart; bindex <= bend; bindex++) {
877     + int i; /* holds (possibly) updated branch index */
878     + int old_bid;
879     +
880     + lower_file = unionfs_lower_file_idx(file, bindex);
881     + if (!lower_file)
882     + continue;
883     +
884     + /*
885     + * Find new index of matching branch with an open
886     + * file, since branches could have been added or
887     + * deleted causing the one with open files to shift.
888     + */
889     + old_bid = UNIONFS_F(file)->saved_branch_ids[bindex];
890     + i = branch_id_to_idx(sb, old_bid);
891     + if (unlikely(i < 0)) {
892     + printk(KERN_ERR "unionfs: no superblock for "
893     + "file %p\n", file);
894     + continue;
895     + }
896     +
897     + /* decrement count of open files */
898     + branchput(sb, i);
899     + /*
900     + * fput will perform an mntput for us on the correct branch.
901     + * Although we're using the file's old branch configuration,
902     + * bindex, which is the old index, correctly points to the
903     + * right branch in the file's branch list. In other words,
904     + * we're going to mntput the correct branch even if branches
905     + * have been added/removed.
906     + */
907     + fput(lower_file);
908     + UNIONFS_F(file)->lower_files[bindex] = NULL;
909     + UNIONFS_F(file)->saved_branch_ids[bindex] = -1;
910     + }
911     +
912     + UNIONFS_F(file)->lower_files = NULL;
913     + kfree(lower_files);
914     + kfree(UNIONFS_F(file)->saved_branch_ids);
915     + /* set to NULL because caller needs to know whether to kfree on error */
916     + UNIONFS_F(file)->saved_branch_ids = NULL;
917     +}
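
cleanup_file() depends on branch IDs staying stable while branch indices shift as branches are added or removed. The sketch below illustrates that idea in user space, assuming the IDs are kept in a flat array ordered by current index. find_branch_index is a hypothetical stand-in for branch_id_to_idx(), whose real implementation is not part of this excerpt.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for branch_id_to_idx(): return the current index
 * of the branch whose stable ID is `id`, or -1 if that branch is gone.
 */
static int find_branch_index(const int *branch_ids, int nbranches, int id)
{
    int i;

    assert(branch_ids != NULL || nbranches == 0);
    for (i = 0; i < nbranches; i++)
        if (branch_ids[i] == id)
            return i;
    return -1;
}
```

A negative result corresponds to the case the loop above logs and skips: the branch that held the open file no longer exists in the current configuration.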
918     +
919     +/* open all lower files for a given file */
920     +static int open_all_files(struct file *file)
921     +{
922     + int bindex, bstart, bend, err = 0;
923     + struct file *lower_file;
924     + struct dentry *lower_dentry;
925     + struct dentry *dentry = file->f_path.dentry;
926     + struct super_block *sb = dentry->d_sb;
927     +
928     + bstart = dbstart(dentry);
929     + bend = dbend(dentry);
930     +
931     + for (bindex = bstart; bindex <= bend; bindex++) {
932     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
933     + if (!lower_dentry)
934     + continue;
935     +
936     + dget(lower_dentry);
937     + unionfs_mntget(dentry, bindex);
938     + branchget(sb, bindex);
939     +
940     + lower_file =
941     + dentry_open(lower_dentry,
942     + unionfs_lower_mnt_idx(dentry, bindex),
943     + file->f_flags, current_cred());
944     + if (IS_ERR(lower_file)) {
945     + branchput(sb, bindex);
946     + err = PTR_ERR(lower_file);
947     + goto out;
948     + } else {
949     + unionfs_set_lower_file_idx(file, bindex, lower_file);
950     + }
951     + }
952     +out:
953     + return err;
954     +}
955     +
956     +/* open the highest priority file for a given upper file */
957     +static int open_highest_file(struct file *file, bool willwrite)
958     +{
959     + int bindex, bstart, bend, err = 0;
960     + struct file *lower_file;
961     + struct dentry *lower_dentry;
962     + struct dentry *dentry = file->f_path.dentry;
963     + struct dentry *parent = dget_parent(dentry);
964     + struct inode *parent_inode = parent->d_inode;
965     + struct super_block *sb = dentry->d_sb;
966     +
967     + bstart = dbstart(dentry);
968     + bend = dbend(dentry);
969     +
970     + lower_dentry = unionfs_lower_dentry(dentry);
971     + if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) {
972     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
973     + err = copyup_file(parent_inode, file, bstart, bindex,
974     + i_size_read(dentry->d_inode));
975     + if (!err)
976     + break;
977     + }
978     + atomic_set(&UNIONFS_F(file)->generation,
979     + atomic_read(&UNIONFS_I(dentry->d_inode)->
980     + generation));
981     + goto out;
982     + }
983     +
984     + dget(lower_dentry);
985     + unionfs_mntget(dentry, bstart);
986     + lower_file = dentry_open(lower_dentry,
987     + unionfs_lower_mnt_idx(dentry, bstart),
988     + file->f_flags, current_cred());
989     + if (IS_ERR(lower_file)) {
990     + err = PTR_ERR(lower_file);
991     + goto out;
992     + }
993     + branchget(sb, bstart);
994     + unionfs_set_lower_file(file, lower_file);
995     + /* Fix up the position. */
996     + lower_file->f_pos = file->f_pos;
997     +
998     + memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
999     +out:
1000     + dput(parent);
1001     + return err;
1002     +}
1003     +
1004     +/* perform a delayed copyup of a read-write file on a read-only branch */
1005     +static int do_delayed_copyup(struct file *file, struct dentry *parent)
1006     +{
1007     + int bindex, bstart, bend, err = 0;
1008     + struct dentry *dentry = file->f_path.dentry;
1009     + struct inode *parent_inode = parent->d_inode;
1010     +
1011     + bstart = fbstart(file);
1012     + bend = fbend(file);
1013     +
1014     + BUG_ON(!S_ISREG(dentry->d_inode->i_mode));
1015     +
1016     + unionfs_check_file(file);
1017     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1018     + if (!d_deleted(dentry))
1019     + err = copyup_file(parent_inode, file, bstart,
1020     + bindex,
1021     + i_size_read(dentry->d_inode));
1022     + else
1023     + err = copyup_deleted_file(file, dentry, parent,
1024     + bstart, bindex);
1025     + /* if succeeded, set lower open-file flags and break */
1026     + if (!err) {
1027     + struct file *lower_file;
1028     + lower_file = unionfs_lower_file_idx(file, bindex);
1029     + lower_file->f_flags = file->f_flags;
1030     + break;
1031     + }
1032     + }
1033     + if (err || (bstart <= fbstart(file)))
1034     + goto out;
1035     + bend = fbend(file);
1036     + for (bindex = bstart; bindex <= bend; bindex++) {
1037     + if (unionfs_lower_file_idx(file, bindex)) {
1038     + branchput(dentry->d_sb, bindex);
1039     + fput(unionfs_lower_file_idx(file, bindex));
1040     + unionfs_set_lower_file_idx(file, bindex, NULL);
1041     + }
1042     + }
1043     + path_put_lowers(dentry, bstart, bend, false);
1044     + iput_lowers(dentry->d_inode, bstart, bend, false);
1045     + /* for reg file, we only open it "once" */
1046     + fbend(file) = fbstart(file);
1047     + dbend(dentry) = dbstart(dentry);
1048     + ibend(dentry->d_inode) = ibstart(dentry->d_inode);
1049     +
1050     +out:
1051     + unionfs_check_file(file);
1052     + return err;
1053     +}
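
The copyup loops in open_highest_file() and do_delayed_copyup() share one search pattern: start just left of bstart and walk toward index 0 (the highest-priority branch) until a copyup succeeds. A simplified user-space model of that search follows. It assumes the outcome of each attempt is known up front in an array, which the kernel code does not do; pick_copyup_branch and branch_ok are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: branch_ok[i] is nonzero if a copyup into branch i
 * would succeed. Scan leftwards from bstart - 1 (toward lower indices,
 * i.e. higher priority) and return the first branch that accepts the
 * copy, or -1 if none does.
 */
static int pick_copyup_branch(const int *branch_ok, int bstart)
{
    int bindex;

    assert(branch_ok != NULL || bstart <= 0);
    for (bindex = bstart - 1; bindex >= 0; bindex--)
        if (branch_ok[bindex])
            return bindex;
    return -1;
}
```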
1054     +
1055     +/*
1056     + * Helper function for unionfs_file_revalidate/locked.
1057     + * Expects dentry/parent to be locked already, and revalidated.
1058     + */
1059     +static int __unionfs_file_revalidate(struct file *file, struct dentry *dentry,
1060     + struct dentry *parent,
1061     + struct super_block *sb, int sbgen,
1062     + int dgen, bool willwrite)
1063     +{
1064     + int fgen;
1065     + int bstart, bend, orig_brid;
1066     + int size;
1067     + int err = 0;
1068     +
1069     + fgen = atomic_read(&UNIONFS_F(file)->generation);
1070     +
1071     + /*
1072     + * There are two cases we are interested in. The first is if the
1073     + * file's generation is lower than the super-block's. The second is
1074     + * if someone has copied up this file from underneath us; then we
1075     + * also need to refresh things.
1076     + */
1077     + if (d_deleted(dentry) ||
1078     + (sbgen <= fgen &&
1079     + dbstart(dentry) == fbstart(file) &&
1080     + unionfs_lower_file(file)))
1081     + goto out_may_copyup;
1082     +
1083     + /* save orig branch ID */
1084     + orig_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1085     +
1086     + /* First we throw out the existing files. */
1087     + cleanup_file(file);
1088     +
1089     + /* Now we reopen the file(s) as in unionfs_open. */
1090     + bstart = fbstart(file) = dbstart(dentry);
1091     + bend = fbend(file) = dbend(dentry);
1092     +
1093     + size = sizeof(struct file *) * sbmax(sb);
1094     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1095     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1096     + err = -ENOMEM;
1097     + goto out;
1098     + }
1099     + size = sizeof(int) * sbmax(sb);
1100     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1101     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1102     + err = -ENOMEM;
1103     + goto out;
1104     + }
1105     +
1106     + if (S_ISDIR(dentry->d_inode->i_mode)) {
1107     + /* We need to open all the files. */
1108     + err = open_all_files(file);
1109     + if (err)
1110     + goto out;
1111     + } else {
1112     + int new_brid;
1113     + /* We only open the highest priority branch. */
1114     + err = open_highest_file(file, willwrite);
1115     + if (err)
1116     + goto out;
1117     + new_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1118     + if (unlikely(new_brid != orig_brid && sbgen > fgen)) {
1119     + /*
1120     + * If we re-opened the file on a different branch
1121     + * than the original one, and this was due to a new
1122     + * branch inserted, then update the mnt counts of
1123     + * the old and new branches accordingly.
1124     + */
1125     + unionfs_mntget(dentry, bstart);
1126     + unionfs_mntput(sb->s_root,
1127     + branch_id_to_idx(sb, orig_brid));
1128     + }
1129     + /* regular files have only one open lower file */
1130     + fbend(file) = fbstart(file);
1131     + }
1132     + atomic_set(&UNIONFS_F(file)->generation,
1133     + atomic_read(&UNIONFS_I(dentry->d_inode)->generation));
1134     +
1135     +out_may_copyup:
1136     + /* Copyup on the first write to a file on a readonly branch. */
1137     + if (willwrite && IS_WRITE_FLAG(file->f_flags) &&
1138     + !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) &&
1139     + is_robranch(dentry)) {
1140     + pr_debug("unionfs: do delay copyup of \"%s\"\n",
1141     + dentry->d_name.name);
1142     + err = do_delayed_copyup(file, parent);
1143     + /* regular files have only one open lower file */
1144     + if (!err && !S_ISDIR(dentry->d_inode->i_mode))
1145     + fbend(file) = fbstart(file);
1146     + }
1147     +
1148     +out:
1149     + if (err) {
1150     + kfree(UNIONFS_F(file)->lower_files);
1151     + kfree(UNIONFS_F(file)->saved_branch_ids);
1152     + }
1153     + return err;
1154     +}
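
The test at the top of __unionfs_file_revalidate() decides whether the lower files can be kept or must be closed and reopened. The sketch below restates that condition as a user-space predicate. struct reval_state and needs_reopen are hypothetical names introduced for illustration; the fields only mirror the values the kernel code reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the values the check reads; not a kernel struct. */
struct reval_state {
    bool deleted;        /* d_deleted(dentry) */
    int sbgen;           /* superblock generation */
    int fgen;            /* file generation */
    int dentry_bstart;   /* dbstart(dentry) */
    int file_bstart;     /* fbstart(file) */
    bool has_lower_file; /* unionfs_lower_file(file) != NULL */
};

/* True when the lower files must be thrown out and reopened. */
static bool needs_reopen(const struct reval_state *s)
{
    assert(s != NULL);
    if (s->deleted)
        return false; /* deleted files skip straight to the copyup check */
    return !(s->sbgen <= s->fgen &&
             s->dentry_bstart == s->file_bstart &&
             s->has_lower_file);
}
```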
1155     +
1156     +/*
1157     + * Revalidate the struct file
1158     + * @file: file to revalidate
1159     + * @parent: parent dentry (locked by caller)
1160     + * @willwrite: true if caller may cause changes to the file; false otherwise.
1161     + * Caller must lock/unlock dentry's branch configuration.
1162     + */
1163     +int unionfs_file_revalidate(struct file *file, struct dentry *parent,
1164     + bool willwrite)
1165     +{
1166     + struct super_block *sb;
1167     + struct dentry *dentry;
1168     + int sbgen, dgen;
1169     + int err = 0;
1170     +
1171     + dentry = file->f_path.dentry;
1172     + sb = dentry->d_sb;
1173     + verify_locked(dentry);
1174     + verify_locked(parent);
1175     +
1176     + /*
1177     + * First revalidate the dentry inside struct file,
1178     + * but not unhashed dentries.
1179     + */
1180     + if (!d_deleted(dentry) &&
1181     + !__unionfs_d_revalidate(dentry, parent, willwrite)) {
1182     + err = -ESTALE;
1183     + goto out;
1184     + }
1185     +
1186     + sbgen = atomic_read(&UNIONFS_SB(sb)->generation);
1187     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
1188     +
1189     + if (unlikely(sbgen > dgen)) { /* XXX: should never happen */
1190     + pr_debug("unionfs: failed to revalidate dentry (%s)\n",
1191     + dentry->d_name.name);
1192     + err = -ESTALE;
1193     + goto out;
1194     + }
1195     +
1196     + err = __unionfs_file_revalidate(file, dentry, parent, sb,
1197     + sbgen, dgen, willwrite);
1198     +out:
1199     + return err;
1200     +}
1201     +
1202     +/* unionfs_open helper function: open a directory */
1203     +static int __open_dir(struct inode *inode, struct file *file)
1204     +{
1205     + struct dentry *lower_dentry;
1206     + struct file *lower_file;
1207     + int bindex, bstart, bend;
1208     + struct vfsmount *mnt;
1209     +
1210     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1211     + bend = fbend(file) = dbend(file->f_path.dentry);
1212     +
1213     + for (bindex = bstart; bindex <= bend; bindex++) {
1214     + lower_dentry =
1215     + unionfs_lower_dentry_idx(file->f_path.dentry, bindex);
1216     + if (!lower_dentry)
1217     + continue;
1218     +
1219     + dget(lower_dentry);
1220     + unionfs_mntget(file->f_path.dentry, bindex);
1221     + mnt = unionfs_lower_mnt_idx(file->f_path.dentry, bindex);
1222     + lower_file = dentry_open(lower_dentry, mnt, file->f_flags,
1223     + current_cred());
1224     + if (IS_ERR(lower_file))
1225     + return PTR_ERR(lower_file);
1226     +
1227     + unionfs_set_lower_file_idx(file, bindex, lower_file);
1228     +
1229     + /*
1230     + * The branchget goes after the open, because otherwise
1231     + * we would miss the reference on release.
1232     + */
1233     + branchget(inode->i_sb, bindex);
1234     + }
1235     +
1236     + return 0;
1237     +}
1238     +
1239     +/* unionfs_open helper function: open a file */
1240     +static int __open_file(struct inode *inode, struct file *file,
1241     + struct dentry *parent)
1242     +{
1243     + struct dentry *lower_dentry;
1244     + struct file *lower_file;
1245     + int lower_flags;
1246     + int bindex, bstart, bend;
1247     +
1248     + lower_dentry = unionfs_lower_dentry(file->f_path.dentry);
1249     + lower_flags = file->f_flags;
1250     +
1251     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1252     + bend = fbend(file) = dbend(file->f_path.dentry);
1253     +
1254     + /*
1255     + * check for the permission for lower file. If the error is
1256     + * COPYUP_ERR, copyup the file.
1257     + */
1258     + if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) {
1259     + /*
1260     + * if the open will change the file, copy it up otherwise
1261     + * defer it.
1262     + */
1263     + if (lower_flags & O_TRUNC) {
1264     + int size = 0;
1265     + int err = -EROFS;
1266     +
1267     + /* copyup the file */
1268     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1269     + err = copyup_file(parent->d_inode, file,
1270     + bstart, bindex, size);
1271     + if (!err)
1272     + break;
1273     + }
1274     + return err;
1275     + } else {
1276     + /*
1277     + * turn off writeable flags, to force delayed copyup
1278     + * by caller.
1279     + */
1280     + lower_flags &= ~(OPEN_WRITE_FLAGS);
1281     + }
1282     + }
1283     +
1284     + dget(lower_dentry);
1285     +
1286     + /*
1287     + * dentry_open will decrement mnt refcnt if err.
1288     + * otherwise fput() will do an mntput() for us upon file close.
1289     + */
1290     + unionfs_mntget(file->f_path.dentry, bstart);
1291     + lower_file =
1292     + dentry_open(lower_dentry,
1293     + unionfs_lower_mnt_idx(file->f_path.dentry, bstart),
1294     + lower_flags, current_cred());
1295     + if (IS_ERR(lower_file))
1296     + return PTR_ERR(lower_file);
1297     +
1298     + unionfs_set_lower_file(file, lower_file);
1299     + branchget(inode->i_sb, bstart);
1300     +
1301     + return 0;
1302     +}
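
__open_file() picks one of three paths for the lower open: pass the flags through, copy the file up immediately (O_TRUNC on a read-only branch), or strip the write flags so that the caller performs a delayed copyup on first write. A user-space sketch of that decision follows. The value of SKETCH_OPEN_WRITE_FLAGS is an assumption, since OPEN_WRITE_FLAGS is defined in union.h outside this excerpt; plan_lower_open and the enum are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Assumed value; the real OPEN_WRITE_FLAGS lives in union.h. */
#define SKETCH_OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)

enum open_action { OPEN_AS_IS, COPYUP_NOW, STRIP_WRITE_FLAGS };

/*
 * Hypothetical helper: decide how to open the lower file and store the
 * flags to use for it in *lower_flags.
 */
static enum open_action plan_lower_open(int flags, int on_ro_branch,
                                        int *lower_flags)
{
    assert(lower_flags != NULL);
    *lower_flags = flags;
    if (!on_ro_branch)
        return OPEN_AS_IS;
    if (flags & O_TRUNC)
        return COPYUP_NOW; /* the open itself modifies the file */
    *lower_flags &= ~SKETCH_OPEN_WRITE_FLAGS;
    return STRIP_WRITE_FLAGS; /* copyup is deferred to the first write */
}
```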
1303     +
1304     +int unionfs_open(struct inode *inode, struct file *file)
1305     +{
1306     + int err = 0;
1307     + struct file *lower_file = NULL;
1308     + struct dentry *dentry = file->f_path.dentry;
1309     + struct dentry *parent;
1310     + int bindex = 0, bstart = 0, bend = 0;
1311     + int size;
1312     + int valid = 0;
1313     +
1314     + unionfs_read_lock(inode->i_sb, UNIONFS_SMUTEX_PARENT);
1315     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1316     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1317     +
1318     + /* don't open unhashed/deleted files */
1319     + if (d_deleted(dentry)) {
1320     + err = -ENOENT;
1321     + goto out_nofree;
1322     + }
1323     +
1324     + /* XXX: should I change 'false' below to the 'willwrite' flag? */
1325     + valid = __unionfs_d_revalidate(dentry, parent, false);
1326     + if (unlikely(!valid)) {
1327     + err = -ESTALE;
1328     + goto out_nofree;
1329     + }
1330     +
1331     + file->private_data =
1332     + kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL);
1333     + if (unlikely(!UNIONFS_F(file))) {
1334     + err = -ENOMEM;
1335     + goto out_nofree;
1336     + }
1337     + fbstart(file) = -1;
1338     + fbend(file) = -1;
1339     + atomic_set(&UNIONFS_F(file)->generation,
1340     + atomic_read(&UNIONFS_I(inode)->generation));
1341     +
1342     + size = sizeof(struct file *) * sbmax(inode->i_sb);
1343     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1344     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1345     + err = -ENOMEM;
1346     + goto out;
1347     + }
1348     + size = sizeof(int) * sbmax(inode->i_sb);
1349     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1350     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1351     + err = -ENOMEM;
1352     + goto out;
1353     + }
1354     +
1355     + bstart = fbstart(file) = dbstart(dentry);
1356     + bend = fbend(file) = dbend(dentry);
1357     +
1358     + /*
1359     + * open all directories and make the unionfs file struct point to
1360     + * these lower file structs
1361     + */
1362     + if (S_ISDIR(inode->i_mode))
1363     + err = __open_dir(inode, file); /* open a dir */
1364     + else
1365     + err = __open_file(inode, file, parent); /* open a file */
1366     +
1367     + /* freeing the allocated resources, and fput the opened files */
1368     + if (err) {
1369     + for (bindex = bstart; bindex <= bend; bindex++) {
1370     + lower_file = unionfs_lower_file_idx(file, bindex);
1371     + if (!lower_file)
1372     + continue;
1373     +
1374     + branchput(dentry->d_sb, bindex);
1375     + /* fput calls dput for lower_dentry */
1376     + fput(lower_file);
1377     + }
1378     + }
1379     +
1380     +out:
1381     + if (err) {
1382     + kfree(UNIONFS_F(file)->lower_files);
1383     + kfree(UNIONFS_F(file)->saved_branch_ids);
1384     + kfree(UNIONFS_F(file));
1385     + }
1386     +out_nofree:
1387     + if (!err) {
1388     + unionfs_postcopyup_setmnt(dentry);
1389     + unionfs_copy_attr_times(inode);
1390     + unionfs_check_file(file);
1391     + unionfs_check_inode(inode);
1392     + }
1393     + unionfs_unlock_dentry(dentry);
1394     + unionfs_unlock_parent(dentry, parent);
1395     + unionfs_read_unlock(inode->i_sb);
1396     + return err;
1397     +}
1398     +
1399     +/*
1400     + * release all lower object references & free the file info structure
1401     + *
1402     + * No need to grab sb info's rwsem.
1403     + */
1404     +int unionfs_file_release(struct inode *inode, struct file *file)
1405     +{
1406     + struct file *lower_file = NULL;
1407     + struct unionfs_file_info *fileinfo;
1408     + struct unionfs_inode_info *inodeinfo;
1409     + struct super_block *sb = inode->i_sb;
1410     + struct dentry *dentry = file->f_path.dentry;
1411     + struct dentry *parent;
1412     + int bindex, bstart, bend;
1413     + int fgen, err = 0;
1414     +
1415     + /*
1416     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
1417     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
1418     + * has been causing false positives in file system stacking layers.
1419     + * In particular, our ->mmap is called after sys_mmap2 already holds
1420     + * mmap_sem, then we lock our own mutexes; but earlier, it's
1421     + * possible for lockdep to have locked our mutexes first, and then
1422     + * we call a lower ->readdir which could call might_fault. The
1423     + * different ordering of the locks is what lockdep complains about
1424     + * -- unnecessarily. Therefore, we have no choice but to turn
1425     + * lockdep off temporarily here. Note: the comments
1426     + * inside might_sleep also suggest that it would have been
1427     + * nicer to only annotate paths that need that might_lock_read.
1428     + */
1429     + lockdep_off();
1430     + unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
1431     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1432     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1433     +
1434     + /*
1435     + * We try to revalidate, but the VFS ignores return values
1436     + * from file->release, so we must always try to succeed here,
1437     + * including to do the kfree and dput below. So if revalidation
1438     + * failed, all we can do is print some message and keep going.
1439     + */
1440     + err = unionfs_file_revalidate(file, parent,
1441     + UNIONFS_F(file)->wrote_to_file);
1442     + if (!err)
1443     + unionfs_check_file(file);
1444     + fileinfo = UNIONFS_F(file);
1445     + BUG_ON(file->f_path.dentry->d_inode != inode);
1446     + inodeinfo = UNIONFS_I(inode);
1447     +
1448     + /* fput all the lower files */
1449     + fgen = atomic_read(&fileinfo->generation);
1450     + bstart = fbstart(file);
1451     + bend = fbend(file);
1452     +
1453     + for (bindex = bstart; bindex <= bend; bindex++) {
1454     + lower_file = unionfs_lower_file_idx(file, bindex);
1455     +
1456     + if (lower_file) {
1457     + unionfs_set_lower_file_idx(file, bindex, NULL);
1458     + fput(lower_file);
1459     + branchput(sb, bindex);
1460     + }
1461     +
1462     + /* if there are no more refs to the dentry, dput it */
1463     + if (d_deleted(dentry)) {
1464     + dput(unionfs_lower_dentry_idx(dentry, bindex));
1465     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1466     + }
1467     + }
1468     +
1469     + kfree(fileinfo->lower_files);
1470     + kfree(fileinfo->saved_branch_ids);
1471     +
1472     + if (fileinfo->rdstate) {
1473     + fileinfo->rdstate->access = jiffies;
1474     + spin_lock(&inodeinfo->rdlock);
1475     + inodeinfo->rdcount++;
1476     + list_add_tail(&fileinfo->rdstate->cache,
1477     + &inodeinfo->readdircache);
1478     + mark_inode_dirty(inode);
1479     + spin_unlock(&inodeinfo->rdlock);
1480     + fileinfo->rdstate = NULL;
1481     + }
1482     + kfree(fileinfo);
1483     +
1484     + unionfs_unlock_dentry(dentry);
1485     + unionfs_unlock_parent(dentry, parent);
1486     + unionfs_read_unlock(sb);
1487     + lockdep_on();
1488     + return err;
1489     +}
1490     +
1491     +/* pass the ioctl to the lower fs */
1492     +static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1493     +{
1494     + struct file *lower_file;
1495     + int err;
1496     +
1497     + lower_file = unionfs_lower_file(file);
1498     +
1499     + err = -ENOTTY;
1500     + if (!lower_file || !lower_file->f_op)
1501     + goto out;
1502     + if (lower_file->f_op->unlocked_ioctl) {
1503     + err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
1504     + } else if (lower_file->f_op->ioctl) {
1505     + lock_kernel();
1506     + err = lower_file->f_op->ioctl(
1507     + lower_file->f_path.dentry->d_inode,
1508     + lower_file, cmd, arg);
1509     + unlock_kernel();
1510     + }
1511     +
1512     +out:
1513     + return err;
1514     +}
1515     +
1516     +/*
1517     + * return to user-space the branch indices containing the file in question
1518     + *
1519     + * We use fd_set, so the number of branches is limited to
1520     + * FD_SETSIZE, which is currently 1024 - plenty for most people
1521     + */
1522     +static int unionfs_ioctl_queryfile(struct file *file, struct dentry *parent,
1523     + unsigned int cmd, unsigned long arg)
1524     +{
1525     + int err = 0;
1526     + fd_set branchlist;
1527     + int bstart = 0, bend = 0, bindex = 0;
1528     + int orig_bstart, orig_bend;
1529     + struct dentry *dentry, *lower_dentry;
1530     + struct vfsmount *mnt;
1531     +
1532     + dentry = file->f_path.dentry;
1533     + orig_bstart = dbstart(dentry);
1534     + orig_bend = dbend(dentry);
1535     + err = unionfs_partial_lookup(dentry, parent);
1536     + if (err)
1537     + goto out;
1538     + bstart = dbstart(dentry);
1539     + bend = dbend(dentry);
1540     +
1541     + FD_ZERO(&branchlist);
1542     +
1543     + for (bindex = bstart; bindex <= bend; bindex++) {
1544     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
1545     + if (!lower_dentry)
1546     + continue;
1547     + if (likely(lower_dentry->d_inode))
1548     + FD_SET(bindex, &branchlist);
1549     + /* purge any lower objects after partial_lookup */
1550     + if (bindex < orig_bstart || bindex > orig_bend) {
1551     + dput(lower_dentry);
1552     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1553     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1554     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1555     + NULL);
1556     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
1557     + if (!mnt)
1558     + continue;
1559     + unionfs_mntput(dentry, bindex);
1560     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1561     + }
1562     + }
1563     + /* restore original dentry's offsets */
1564     + dbstart(dentry) = orig_bstart;
1565     + dbend(dentry) = orig_bend;
1566     + ibstart(dentry->d_inode) = orig_bstart;
1567     + ibend(dentry->d_inode) = orig_bend;
1568     +
1569     + err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
1570     + if (unlikely(err))
1571     + err = -EFAULT;
1572     +
1573     +out:
1574     + return err < 0 ? err : bend;
1575     +}
1576     +
1577     +long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1578     +{
1579     + long err;
1580     + struct dentry *dentry = file->f_path.dentry;
1581     + struct dentry *parent;
1582     +
1583     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1584     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1585     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1586     +
1587     + err = unionfs_file_revalidate(file, parent, true);
1588     + if (unlikely(err))
1589     + goto out;
1590     +
1591     + /* check if asked for local commands */
1592     + switch (cmd) {
1593     + case UNIONFS_IOCTL_INCGEN:
1594     + /* Increment the superblock generation count */
1595     + pr_info("unionfs: incgen ioctl deprecated; "
1596     + "use \"-o remount,incgen\"\n");
1597     + err = -ENOSYS;
1598     + break;
1599     +
1600     + case UNIONFS_IOCTL_QUERYFILE:
1601     + /* Return list of branches containing the given file */
1602     + err = unionfs_ioctl_queryfile(file, parent, cmd, arg);
1603     + break;
1604     +
1605     + default:
1606     + /* pass the ioctl down */
1607     + err = do_ioctl(file, cmd, arg);
1608     + break;
1609     + }
1610     +
1611     +out:
1612     + unionfs_check_file(file);
1613     + unionfs_unlock_dentry(dentry);
1614     + unionfs_unlock_parent(dentry, parent);
1615     + unionfs_read_unlock(dentry->d_sb);
1616     + return err;
1617     +}
1618     +
1619     +int unionfs_flush(struct file *file, fl_owner_t id)
1620     +{
1621     + int err = 0;
1622     + struct file *lower_file = NULL;
1623     + struct dentry *dentry = file->f_path.dentry;
1624     + struct dentry *parent;
1625     + int bindex, bstart, bend;
1626     +
1627     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1628     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1629     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1630     +
1631     + err = unionfs_file_revalidate(file, parent,
1632     + UNIONFS_F(file)->wrote_to_file);
1633     + if (unlikely(err))
1634     + goto out;
1635     + unionfs_check_file(file);
1636     +
1637     + bstart = fbstart(file);
1638     + bend = fbend(file);
1639     + for (bindex = bstart; bindex <= bend; bindex++) {
1640     + lower_file = unionfs_lower_file_idx(file, bindex);
1641     +
1642     + if (lower_file && lower_file->f_op &&
1643     + lower_file->f_op->flush) {
1644     + err = lower_file->f_op->flush(lower_file, id);
1645     + if (err)
1646     + goto out;
1647     + }
1648     +
1649     + }
1650     +
1651     +out:
1652     + if (!err)
1653     + unionfs_check_file(file);
1654     + unionfs_unlock_dentry(dentry);
1655     + unionfs_unlock_parent(dentry, parent);
1656     + unionfs_read_unlock(dentry->d_sb);
1657     + return err;
1658     +}
1659     diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
1660     new file mode 100644
1661     index 0000000..9c7b2ac
1662     --- /dev/null
1663     +++ b/fs/unionfs/copyup.c
1664     @@ -0,0 +1,897 @@
1665     +/*
1666     + * Copyright (c) 2003-2010 Erez Zadok
1667     + * Copyright (c) 2003-2006 Charles P. Wright
1668     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
1669     + * Copyright (c) 2005-2006 Junjiro Okajima
1670     + * Copyright (c) 2005 Arun M. Krishnakumar
1671     + * Copyright (c) 2004-2006 David P. Quigley
1672     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
1673     + * Copyright (c) 2003 Puja Gupta
1674     + * Copyright (c) 2003 Harikesavan Krishnan
1675     + * Copyright (c) 2003-2010 Stony Brook University
1676     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
1677     + *
1678     + * This program is free software; you can redistribute it and/or modify
1679     + * it under the terms of the GNU General Public License version 2 as
1680     + * published by the Free Software Foundation.
1681     + */
1682     +
1683     +#include "union.h"
1684     +
1685     +/*
1686     + * For detailed explanation of copyup see:
1687     + * Documentation/filesystems/unionfs/concepts.txt
1688     + */
1689     +
1690     +#ifdef CONFIG_UNION_FS_XATTR
1691     +/* copyup all extended attrs for a given dentry */
1692     +static int copyup_xattrs(struct dentry *old_lower_dentry,
1693     + struct dentry *new_lower_dentry)
1694     +{
1695     + int err = 0;
1696     + ssize_t list_size = -1;
1697     + char *name_list = NULL;
1698     + char *attr_value = NULL;
1699     + char *name_list_buf = NULL;
1700     +
1701     + /* query the actual size of the xattr list */
1702     + list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
1703     + if (list_size <= 0) {
1704     + err = list_size;
1705     + goto out;
1706     + }
1707     +
1708     + /* allocate space for the actual list */
1709     + name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
1710     + if (unlikely(!name_list || IS_ERR(name_list))) {
1711     + err = PTR_ERR(name_list);
1712     + goto out;
1713     + }
1714     +
1715     + name_list_buf = name_list; /* save for kfree at end */
1716     +
1717     + /* now get the actual xattr list of the source file */
1718     + list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
1719     + if (list_size <= 0) {
1720     + err = list_size;
1721     + goto out;
1722     + }
1723     +
1724     + /* allocate space to hold each xattr's value */
1725     + attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
1726     + if (unlikely(!attr_value || IS_ERR(attr_value))) {
1727     + err = PTR_ERR(attr_value);
1728     + goto out;
1729     + }
1730     +
1731     + /* in a loop, get and set each xattr from src to dst file */
1732     + while (*name_list) {
1733     + ssize_t size;
1734     +
1735     + /* Lock here since vfs_getxattr doesn't lock for us */
1736     + mutex_lock(&old_lower_dentry->d_inode->i_mutex);
1737     + size = vfs_getxattr(old_lower_dentry, name_list,
1738     + attr_value, XATTR_SIZE_MAX);
1739     + mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
1740     + if (size < 0) {
1741     + err = size;
1742     + goto out;
1743     + }
1744     + if (size > XATTR_SIZE_MAX) {
1745     + err = -E2BIG;
1746     + goto out;
1747     + }
1748     + /* Don't lock here since vfs_setxattr does it for us. */
1749     + err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
1750     + size, 0);
1751     + /*
1752     + * SELinux depends on "security.*" xattrs, so to maintain
1753     + * the security of copied-up files, if SELinux is active,
1754     + * then we must copy these xattrs as well. So we need to
1755     + * temporarily get FOWNER privileges.
1756     + * XXX: move entire copyup code to SIOQ.
1757     + */
1758     + if (err == -EPERM && !capable(CAP_FOWNER)) {
1759     + const struct cred *old_creds;
1760     + struct cred *new_creds;
1761     +
1762     + new_creds = prepare_creds();
1763     + if (unlikely(!new_creds)) {
1764     + err = -ENOMEM;
1765     + goto out;
1766     + }
1767     + cap_raise(new_creds->cap_effective, CAP_FOWNER);
1768     + old_creds = override_creds(new_creds);
1769     + err = vfs_setxattr(new_lower_dentry, name_list,
1770     + attr_value, size, 0);
1771     + revert_creds(old_creds);
1772     + }
1773     + if (err < 0)
1774     + goto out;
1775     + name_list += strlen(name_list) + 1;
1776     + }
1777     +out:
1778     + unionfs_xattr_kfree(name_list_buf);
1779     + unionfs_xattr_kfree(attr_value);
1780     + /* Ignore if xattr isn't supported */
1781     + if (err == -ENOTSUPP || err == -EOPNOTSUPP)
1782     + err = 0;
1783     + return err;
1784     +}
1785     +#endif /* CONFIG_UNION_FS_XATTR */
1786     +
1787     +/*
1788     + * Determine the mode based on the copyup flags, and the existing dentry.
1789     + *
1790     + * Handle file systems which may not support certain options. For example
1791     + * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless
1792     + * errors, rather than propagating them up, which results in copyup errors
1793     + * and errors returned back to users.
1794     + */
1795     +static int copyup_permissions(struct super_block *sb,
1796     + struct dentry *old_lower_dentry,
1797     + struct dentry *new_lower_dentry)
1798     +{
1799     + struct inode *i = old_lower_dentry->d_inode;
1800     + struct iattr newattrs;
1801     + int err;
1802     +
1803     + newattrs.ia_atime = i->i_atime;
1804     + newattrs.ia_mtime = i->i_mtime;
1805     + newattrs.ia_ctime = i->i_ctime;
1806     + newattrs.ia_gid = i->i_gid;
1807     + newattrs.ia_uid = i->i_uid;
1808     + newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
1809     + ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
1810     + ATTR_GID | ATTR_UID;
1811     + mutex_lock(&new_lower_dentry->d_inode->i_mutex);
1812     + err = notify_change(new_lower_dentry, &newattrs);
1813     + if (err)
1814     + goto out;
1815     +
1816     + /* now try to change the mode and ignore EOPNOTSUPP on symlinks */
1817     + newattrs.ia_mode = i->i_mode;
1818     + newattrs.ia_valid = ATTR_MODE | ATTR_FORCE;
1819     + err = notify_change(new_lower_dentry, &newattrs);
1820     + if (err == -EOPNOTSUPP &&
1821     + S_ISLNK(new_lower_dentry->d_inode->i_mode)) {
1822     + printk(KERN_WARNING
1823     + "unionfs: changing \"%s\" symlink mode unsupported\n",
1824     + new_lower_dentry->d_name.name);
1825     + err = 0;
1826     + }
1827     +
1828     +out:
1829     + mutex_unlock(&new_lower_dentry->d_inode->i_mutex);
1830     + return err;
1831     +}
1832     +
1833     +/*
1834     + * create the new device/file/directory - use copyup_permissions to copy up
1835     + * times and mode
1836     + *
1837     + * if the object being copied up is a regular file, the file is only created;
1838     + * the contents have to be copied up separately
1839     + */
1840     +static int __copyup_ndentry(struct dentry *old_lower_dentry,
1841     + struct dentry *new_lower_dentry,
1842     + struct dentry *new_lower_parent_dentry,
1843     + char *symbuf)
1844     +{
1845     + int err = 0;
1846     + umode_t old_mode = old_lower_dentry->d_inode->i_mode;
1847     + struct sioq_args args;
1848     +
1849     + if (S_ISDIR(old_mode)) {
1850     + args.mkdir.parent = new_lower_parent_dentry->d_inode;
1851     + args.mkdir.dentry = new_lower_dentry;
1852     + args.mkdir.mode = old_mode;
1853     +
1854     + run_sioq(__unionfs_mkdir, &args);
1855     + err = args.err;
1856     + } else if (S_ISLNK(old_mode)) {
1857     + args.symlink.parent = new_lower_parent_dentry->d_inode;
1858     + args.symlink.dentry = new_lower_dentry;
1859     + args.symlink.symbuf = symbuf;
1860     +
1861     + run_sioq(__unionfs_symlink, &args);
1862     + err = args.err;
1863     + } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) ||
1864     + S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
1865     + args.mknod.parent = new_lower_parent_dentry->d_inode;
1866     + args.mknod.dentry = new_lower_dentry;
1867     + args.mknod.mode = old_mode;
1868     + args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
1869     +
1870     + run_sioq(__unionfs_mknod, &args);
1871     + err = args.err;
1872     + } else if (S_ISREG(old_mode)) {
1873     + struct nameidata nd;
1874     + err = init_lower_nd(&nd, LOOKUP_CREATE);
1875     + if (unlikely(err < 0))
1876     + goto out;
1877     + args.create.nd = &nd;
1878     + args.create.parent = new_lower_parent_dentry->d_inode;
1879     + args.create.dentry = new_lower_dentry;
1880     + args.create.mode = old_mode;
1881     +
1882     + run_sioq(__unionfs_create, &args);
1883     + err = args.err;
1884     + release_lower_nd(&nd, err);
1885     + } else {
1886     + printk(KERN_CRIT "unionfs: unknown inode type %d\n",
1887     + old_mode);
1888     + BUG();
1889     + }
1890     +
1891     +out:
1892     + return err;
1893     +}
1894     +
1895     +static int __copyup_reg_data(struct dentry *dentry,
1896     + struct dentry *new_lower_dentry, int new_bindex,
1897     + struct dentry *old_lower_dentry, int old_bindex,
1898     + struct file **copyup_file, loff_t len)
1899     +{
1900     + struct super_block *sb = dentry->d_sb;
1901     + struct file *input_file;
1902     + struct file *output_file;
1903     + struct vfsmount *output_mnt;
1904     + mm_segment_t old_fs;
1905     + char *buf = NULL;
1906     + ssize_t read_bytes, write_bytes;
1907     + loff_t size;
1908     + int err = 0;
1909     +
1910     + /* open old file */
1911     + unionfs_mntget(dentry, old_bindex);
1912     + branchget(sb, old_bindex);
1913     + /* dentry_open calls dput and mntput if it returns an error */
1914     + input_file = dentry_open(old_lower_dentry,
1915     + unionfs_lower_mnt_idx(dentry, old_bindex),
1916     + O_RDONLY | O_LARGEFILE, current_cred());
1917     + if (IS_ERR(input_file)) {
1918     + dput(old_lower_dentry);
1919     + err = PTR_ERR(input_file);
1920     + goto out;
1921     + }
1922     + if (unlikely(!input_file->f_op || !input_file->f_op->read)) {
1923     + err = -EINVAL;
1924     + goto out_close_in;
1925     + }
1926     +
1927     + /* open new file */
1928     + dget(new_lower_dentry);
1929     + output_mnt = unionfs_mntget(sb->s_root, new_bindex);
1930     + branchget(sb, new_bindex);
1931     + output_file = dentry_open(new_lower_dentry, output_mnt,
1932     + O_RDWR | O_LARGEFILE, current_cred());
1933     + if (IS_ERR(output_file)) {
1934     + err = PTR_ERR(output_file);
1935     + goto out_close_in2;
1936     + }
1937     + if (unlikely(!output_file->f_op || !output_file->f_op->write)) {
1938     + err = -EINVAL;
1939     + goto out_close_out;
1940     + }
1941     +
1942     + /* allocating a buffer */
1943     + buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
1944     + if (unlikely(!buf)) {
1945     + err = -ENOMEM;
1946     + goto out_close_out;
1947     + }
1948     +
1949     + input_file->f_pos = 0;
1950     + output_file->f_pos = 0;
1951     +
1952     + old_fs = get_fs();
1953     + set_fs(KERNEL_DS);
1954     +
1955     + size = len;
1956     + err = 0;
1957     + do {
1958     + if (len >= PAGE_SIZE)
1959     + size = PAGE_SIZE;
1960     + else if ((len < PAGE_SIZE) && (len > 0))
1961     + size = len;
1962     +
1963     + len -= PAGE_SIZE;
1964     +
1965     + read_bytes =
1966     + input_file->f_op->read(input_file,
1967     + (char __user *)buf, size,
1968     + &input_file->f_pos);
1969     + if (read_bytes <= 0) {
1970     + err = read_bytes;
1971     + break;
1972     + }
1973     +
1974     + /* see Documentation/filesystems/unionfs/issues.txt */
1975     + lockdep_off();
1976     + write_bytes =
1977     + output_file->f_op->write(output_file,
1978     + (char __user *)buf,
1979     + read_bytes,
1980     + &output_file->f_pos);
1981     + lockdep_on();
1982     + if ((write_bytes < 0) || (write_bytes < read_bytes)) {
1983     + err = write_bytes;
1984     + break;
1985     + }
1986     + } while ((read_bytes > 0) && (len > 0));
1987     +
1988     + set_fs(old_fs);
1989     +
1990     + kfree(buf);
1991     +
1992     + if (!err)
1993     + err = output_file->f_op->fsync(output_file,
1994     + new_lower_dentry, 0);
1995     +
1996     + if (err)
1997     + goto out_close_out;
1998     +
1999     + if (copyup_file) {
2000     + *copyup_file = output_file;
2001     + goto out_close_in;
2002     + }
2003     +
2004     +out_close_out:
2005     + fput(output_file);
2006     +
2007     +out_close_in2:
2008     + branchput(sb, new_bindex);
2009     +
2010     +out_close_in:
2011     + fput(input_file);
2012     +
2013     +out:
2014     + branchput(sb, old_bindex);
2015     +
2016     + return err;
2017     +}
2018     +
2019     +/*
2020     + * dput the lower references for old and new dentry & clear a lower dentry
2021     + * pointer
2022     + */
2023     +static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry,
2024     + int old_bstart, int old_bend,
2025     + struct dentry *new_lower_dentry, int new_bindex)
2026     +{
2027     + /* get rid of the lower dentry and all its traces */
2028     + unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL);
2029     + dbstart(dentry) = old_bstart;
2030     + dbend(dentry) = old_bend;
2031     +
2032     + dput(new_lower_dentry);
2033     + dput(old_lower_dentry);
2034     +}
2035     +
2036     +/*
2037     + * Copy up a dentry to a file of specified name.
2038     + *
2039     + * @dir: used to pull the ->i_sb to access other branches
2040     + * @dentry: the non-negative dentry whose lower_inode we should copy
2041     + * @bstart: the branch of the lower_inode to copy from
2042     + * @new_bindex: the branch to create the new file in
2043     + * @name: the name of the file to create
2044     + * @namelen: length of @name
2045     + * @copyup_file: the "struct file" to return (optional)
2046     + * @len: how many bytes to copy-up?
2047     + */
2048     +int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
2049     + int new_bindex, const char *name, int namelen,
2050     + struct file **copyup_file, loff_t len)
2051     +{
2052     + struct dentry *new_lower_dentry;
2053     + struct dentry *old_lower_dentry = NULL;
2054     + struct super_block *sb;
2055     + int err = 0;
2056     + int old_bindex;
2057     + int old_bstart;
2058     + int old_bend;
2059     + struct dentry *new_lower_parent_dentry = NULL;
2060     + mm_segment_t oldfs;
2061     + char *symbuf = NULL;
2062     +
2063     + verify_locked(dentry);
2064     +
2065     + old_bindex = bstart;
2066     + old_bstart = dbstart(dentry);
2067     + old_bend = dbend(dentry);
2068     +
2069     + BUG_ON(new_bindex < 0);
2070     + BUG_ON(new_bindex >= old_bindex);
2071     +
2072     + sb = dir->i_sb;
2073     +
2074     + err = is_robranch_super(sb, new_bindex);
2075     + if (err)
2076     + goto out;
2077     +
2078     + /* Create the directory structure above this dentry. */
2079     + new_lower_dentry = create_parents(dir, dentry, name, new_bindex);
2080     + if (IS_ERR(new_lower_dentry)) {
2081     + err = PTR_ERR(new_lower_dentry);
2082     + goto out;
2083     + }
2084     +
2085     + old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex);
2086     + /* we conditionally dput this old_lower_dentry at end of function */
2087     + dget(old_lower_dentry);
2088     +
2089     + /* For symlinks, we must read the link before we lock the directory. */
2090     + if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) {
2091     +
2092     + symbuf = kmalloc(PATH_MAX, GFP_KERNEL);
2093     + if (unlikely(!symbuf)) {
2094     + __clear(dentry, old_lower_dentry,
2095     + old_bstart, old_bend,
2096     + new_lower_dentry, new_bindex);
2097     + err = -ENOMEM;
2098     + goto out_free;
2099     + }
2100     +
2101     + oldfs = get_fs();
2102     + set_fs(KERNEL_DS);
2103     + err = old_lower_dentry->d_inode->i_op->readlink(
2104     + old_lower_dentry,
2105     + (char __user *)symbuf,
2106     + PATH_MAX);
2107     + set_fs(oldfs);
2108     + if (err < 0) {
2109     + __clear(dentry, old_lower_dentry,
2110     + old_bstart, old_bend,
2111     + new_lower_dentry, new_bindex);
2112     + goto out_free;
2113     + }
2114     + symbuf[err] = '\0';
2115     + }
2116     +
2117     + /* Now we lock the parent, and create the object in the new branch. */
2118     + new_lower_parent_dentry = lock_parent(new_lower_dentry);
2119     +
2120     + /* create the new inode */
2121     + err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
2122     + new_lower_parent_dentry, symbuf);
2123     +
2124     + if (err) {
2125     + __clear(dentry, old_lower_dentry,
2126     + old_bstart, old_bend,
2127     + new_lower_dentry, new_bindex);
2128     + goto out_unlock;
2129     + }
2130     +
2131     + /* We actually copyup the file here. */
2132     + if (S_ISREG(old_lower_dentry->d_inode->i_mode))
2133     + err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex,
2134     + old_lower_dentry, old_bindex,
2135     + copyup_file, len);
2136     + if (err)
2137     + goto out_unlink;
2138     +
2139     + /* Set permissions. */
2140     + err = copyup_permissions(sb, old_lower_dentry, new_lower_dentry);
2141     + if (err)
2142     + goto out_unlink;
2143     +
2144     +#ifdef CONFIG_UNION_FS_XATTR
2145     + /* SELinux uses extended attributes for permissions. */
2146     + err = copyup_xattrs(old_lower_dentry, new_lower_dentry);
2147     + if (err)
2148     + goto out_unlink;
2149     +#endif /* CONFIG_UNION_FS_XATTR */
2150     +
2151     + /* do not allow files getting deleted to be re-interposed */
2152     + if (!d_deleted(dentry))
2153     + unionfs_reinterpose(dentry);
2154     +
2155     + goto out_unlock;
2156     +
2157     +out_unlink:
2158     + /*
2159     + * copyup failed, because we possibly ran out of space or
2160     + * quota, or something else happened so let's unlink; we don't
2161     + * really care about the return value of vfs_unlink
2162     + */
2163     + vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
2164     +
2165     + if (copyup_file) {
2166     + /* need to close the file */
2167     +
2168     + fput(*copyup_file);
2169     + branchput(sb, new_bindex);
2170     + }
2171     +
2172     + /*
2173     + * TODO: should we reset the error to something like -EIO?
2174     + *
2175     + * If we don't reset, the user may get some nonsensical errors, but
2176     + * on the other hand, if we reset to EIO, we guarantee that the user
2177     + * will get a "confusing" error message.
2178     + */
2179     +
2180     +out_unlock:
2181     + unlock_dir(new_lower_parent_dentry);
2182     +
2183     +out_free:
2184     + /*
2185     + * If old_lower_dentry was not a file, then we need to dput it. If
2186     + * it was a file, then it was already dput indirectly by other
2187     + * functions we call above which operate on regular files.
2188     + */
2189     + if (old_lower_dentry && old_lower_dentry->d_inode &&
2190     + !S_ISREG(old_lower_dentry->d_inode->i_mode))
2191     + dput(old_lower_dentry);
2192     + kfree(symbuf);
2193     +
2194     + if (err) {
2195     + /*
2196     + * if directory creation succeeded, but inode copyup failed,
2197     + * then purge new dentries.
2198     + */
2199     + if (dbstart(dentry) < old_bstart &&
2200     + ibstart(dentry->d_inode) > dbstart(dentry))
2201     + __clear(dentry, NULL, old_bstart, old_bend,
2202     + unionfs_lower_dentry(dentry), dbstart(dentry));
2203     + goto out;
2204     + }
2205     + if (!S_ISDIR(dentry->d_inode->i_mode)) {
2206     + unionfs_postcopyup_release(dentry);
2207     + if (!unionfs_lower_inode(dentry->d_inode)) {
2208     + /*
2209     + * If we got here, then we copied up to an
2210     + * unlinked-open file, whose name is .unionfsXXXXX.
2211     + */
2212     + struct inode *inode = new_lower_dentry->d_inode;
2213     + atomic_inc(&inode->i_count);
2214     + unionfs_set_lower_inode_idx(dentry->d_inode,
2215     + ibstart(dentry->d_inode),
2216     + inode);
2217     + }
2218     + }
2219     + unionfs_postcopyup_setmnt(dentry);
2220     + /* sync inode times from copied-up inode to our inode */
2221     + unionfs_copy_attr_times(dentry->d_inode);
2222     + unionfs_check_inode(dir);
2223     + unionfs_check_dentry(dentry);
2224     +out:
2225     + return err;
2226     +}
2227     +
2228     +/*
2229     + * This function creates a copy of a file represented by 'file' which
2230     + * currently resides in branch 'bstart' to branch 'new_bindex'. The copy
2231     + * will be named "name".
2232     + */
2233     +int copyup_named_file(struct inode *dir, struct file *file, char *name,
2234     + int bstart, int new_bindex, loff_t len)
2235     +{
2236     + int err = 0;
2237     + struct file *output_file = NULL;
2238     +
2239     + err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex,
2240     + name, strlen(name), &output_file, len);
2241     + if (!err) {
2242     + fbstart(file) = new_bindex;
2243     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2244     + }
2245     +
2246     + return err;
2247     +}
2248     +
2249     +/*
2250     + * This function creates a copy of a file represented by 'file' which
2251     + * currently resides in branch 'bstart' to branch 'new_bindex'.
2252     + */
2253     +int copyup_file(struct inode *dir, struct file *file, int bstart,
2254     + int new_bindex, loff_t len)
2255     +{
2256     + int err = 0;
2257     + struct file *output_file = NULL;
2258     + struct dentry *dentry = file->f_path.dentry;
2259     +
2260     + err = copyup_dentry(dir, dentry, bstart, new_bindex,
2261     + dentry->d_name.name, dentry->d_name.len,
2262     + &output_file, len);
2263     + if (!err) {
2264     + fbstart(file) = new_bindex;
2265     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2266     + }
2267     +
2268     + return err;
2269     +}
2270     +
2271     +/* purge a dentry's lower-branch states (dput/mntput, etc.) */
2272     +static void __cleanup_dentry(struct dentry *dentry, int bindex,
2273     + int old_bstart, int old_bend)
2274     +{
2275     + int loop_start;
2276     + int loop_end;
2277     + int new_bstart = -1;
2278     + int new_bend = -1;
2279     + int i;
2280     +
2281     + loop_start = min(old_bstart, bindex);
2282     + loop_end = max(old_bend, bindex);
2283     +
2284     + /*
2285     + * This loop sets the bstart and bend for the new dentry by
2286     + * traversing from left to right. It also dputs all negative
2287     + * dentries except bindex
2288     + */
2289     + for (i = loop_start; i <= loop_end; i++) {
2290     + if (!unionfs_lower_dentry_idx(dentry, i))
2291     + continue;
2292     +
2293     + if (i == bindex) {
2294     + new_bend = i;
2295     + if (new_bstart < 0)
2296     + new_bstart = i;
2297     + continue;
2298     + }
2299     +
2300     + if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) {
2301     + dput(unionfs_lower_dentry_idx(dentry, i));
2302     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
2303     +
2304     + unionfs_mntput(dentry, i);
2305     + unionfs_set_lower_mnt_idx(dentry, i, NULL);
2306     + } else {
2307     + if (new_bstart < 0)
2308     + new_bstart = i;
2309     + new_bend = i;
2310     + }
2311     + }
2312     +
2313     + if (new_bstart < 0)
2314     + new_bstart = bindex;
2315     + if (new_bend < 0)
2316     + new_bend = bindex;
2317     + dbstart(dentry) = new_bstart;
2318     + dbend(dentry) = new_bend;
2319     +
2320     +}
2321     +
2322     +/* set lower inode ptr and update bstart & bend if necessary */
2323     +static void __set_inode(struct dentry *upper, struct dentry *lower,
2324     + int bindex)
2325     +{
2326     + unionfs_set_lower_inode_idx(upper->d_inode, bindex,
2327     + igrab(lower->d_inode));
2328     + if (likely(ibstart(upper->d_inode) > bindex))
2329     + ibstart(upper->d_inode) = bindex;
2330     + if (likely(ibend(upper->d_inode) < bindex))
2331     + ibend(upper->d_inode) = bindex;
2332     +
2333     +}
2334     +
2335     +/* set lower dentry ptr and update bstart & bend if necessary */
2336     +static void __set_dentry(struct dentry *upper, struct dentry *lower,
2337     + int bindex)
2338     +{
2339     + unionfs_set_lower_dentry_idx(upper, bindex, lower);
2340     + if (likely(dbstart(upper) > bindex))
2341     + dbstart(upper) = bindex;
2342     + if (likely(dbend(upper) < bindex))
2343     + dbend(upper) = bindex;
2344     +}
2345     +
2346     +/*
2347     + * This function replicates the directory structure up to the given dentry
2348     + * in the bindex branch.
2349     + */
2350     +struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
2351     + const char *name, int bindex)
2352     +{
2353     + int err;
2354     + struct dentry *child_dentry;
2355     + struct dentry *parent_dentry;
2356     + struct dentry *lower_parent_dentry = NULL;
2357     + struct dentry *lower_dentry = NULL;
2358     + const char *childname;
2359     + unsigned int childnamelen;
2360     + int nr_dentry;
2361     + int count = 0;
2362     + int old_bstart;
2363     + int old_bend;
2364     + struct dentry **path = NULL;
2365     + struct super_block *sb;
2366     +
2367     + verify_locked(dentry);
2368     +
2369     + err = is_robranch_super(dir->i_sb, bindex);
2370     + if (err) {
2371     + lower_dentry = ERR_PTR(err);
2372     + goto out;
2373     + }
2374     +
2375     + old_bstart = dbstart(dentry);
2376     + old_bend = dbend(dentry);
2377     +
2378     + lower_dentry = ERR_PTR(-ENOMEM);
2379     +
2380     + /* There is no sense allocating any less than the minimum. */
2381     + nr_dentry = 1;
2382     + path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL);
2383     + if (unlikely(!path))
2384     + goto out;
2385     +
2386     + /* assume the negative dentry of unionfs as the parent dentry */
2387     + parent_dentry = dentry;
2388     +
2389     + /*
2390     + * This loop finds the first parent that exists in the given branch.
2391     + * We start building the directory structure from there. At the end
2392     + * of the loop, the following should hold:
2393     + * - child_dentry is the first nonexistent child
2394     + * - parent_dentry is the first existent parent
2395     + * - path[0] is the deepest child
2396     + * - path[count] is the first child to create
2397     + */
2398     + do {
2399     + child_dentry = parent_dentry;
2400     +
2401     + /* find the parent directory dentry in unionfs */
2402     + parent_dentry = dget_parent(child_dentry);
2403     +
2404     + /* find out the lower_parent_dentry in the given branch */
2405     + lower_parent_dentry =
2406     + unionfs_lower_dentry_idx(parent_dentry, bindex);
2407     +
2408     + /* grow path table */
2409     + if (count == nr_dentry) {
2410     + void *p;
2411     +
2412     + nr_dentry *= 2;
2413     + p = krealloc(path, nr_dentry * sizeof(struct dentry *),
2414     + GFP_KERNEL);
2415     + if (unlikely(!p)) {
2416     + lower_dentry = ERR_PTR(-ENOMEM);
2417     + goto out;
2418     + }
2419     + path = p;
2420     + }
2421     +
2422     + /* store the child dentry */
2423     + path[count++] = child_dentry;
2424     + } while (!lower_parent_dentry);
2425     + count--;
2426     +
2427     + sb = dentry->d_sb;
2428     +
2429     + /*
2430     + * This code goes between the begin/out labels and basically
2431     + * emulates a while(child_dentry != dentry), only cleaner and
2432     + * shorter than what would be a much longer while loop.
2433     + */
2434     +begin:
2435     + /* get lower parent dir in the current branch */
2436     + lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
2437     + dput(parent_dentry);
2438     +
2439     + /* init the values to lookup */
2440     + childname = child_dentry->d_name.name;
2441     + childnamelen = child_dentry->d_name.len;
2442     +
2443     + if (child_dentry != dentry) {
2444     + /* lookup child in the underlying file system */
2445     + lower_dentry = lookup_lck_len(childname, lower_parent_dentry,
2446     + childnamelen);
2447     + if (IS_ERR(lower_dentry))
2448     + goto out;
2449     + } else {
2450     + /*
2451     + * Is the name a whiteout of the child name ? lookup the
2452     + * whiteout child in the underlying file system
2453     + */
2454     + lower_dentry = lookup_lck_len(name, lower_parent_dentry,
2455     + strlen(name));
2456     + if (IS_ERR(lower_dentry))
2457     + goto out;
2458     +
2459     + /* Replace the current dentry (if any) with the new one */
2460     + dput(unionfs_lower_dentry_idx(dentry, bindex));
2461     + unionfs_set_lower_dentry_idx(dentry, bindex,
2462     + lower_dentry);
2463     +
2464     + __cleanup_dentry(dentry, bindex, old_bstart, old_bend);
2465     + goto out;
2466     + }
2467     +
2468     + if (lower_dentry->d_inode) {
2469     + /*
2470     + * since this already exists we dput to avoid
2471     + * multiple references on the same dentry
2472     + */
2473     + dput(lower_dentry);
2474     + } else {
2475     + struct sioq_args args;
2476     +
2477     + /* it's a negative dentry, create a new dir */
2478     + lower_parent_dentry = lock_parent(lower_dentry);
2479     +
2480     + args.mkdir.parent = lower_parent_dentry->d_inode;
2481     + args.mkdir.dentry = lower_dentry;
2482     + args.mkdir.mode = child_dentry->d_inode->i_mode;
2483     +
2484     + run_sioq(__unionfs_mkdir, &args);
2485     + err = args.err;
2486     +
2487     + if (!err)
2488     + err = copyup_permissions(dir->i_sb, child_dentry,
2489     + lower_dentry);
2490     + unlock_dir(lower_parent_dentry);
2491     + if (err) {
2492     + dput(lower_dentry);
2493     + lower_dentry = ERR_PTR(err);
2494     + goto out;
2495     + }
2496     +
2497     + }
2498     +
2499     + __set_inode(child_dentry, lower_dentry, bindex);
2500     + __set_dentry(child_dentry, lower_dentry, bindex);
2501     + /*
2502     + * update times of this dentry, but also the parent, because if
2503     + * we changed, the parent may have changed too.
2504     + */
2505     + fsstack_copy_attr_times(parent_dentry->d_inode,
2506     + lower_parent_dentry->d_inode);
2507     + unionfs_copy_attr_times(child_dentry->d_inode);
2508     +
2509     + parent_dentry = child_dentry;
2510     + child_dentry = path[--count];
2511     + goto begin;
2512     +out:
2513     + /* drop any leftover references from the do/while loop above */
2514     + if (IS_ERR(lower_dentry))
2515     + while (count)
2516     + dput(path[count--]);
2517     + kfree(path);
2518     + return lower_dentry;
2519     +}
2520     +
2521     +/*
2522     + * Post-copyup helper to ensure we have valid mnts: set lower mnt of
2523     + * dentry+parents to the first parent node that has an mnt.
2524     + */
2525     +void unionfs_postcopyup_setmnt(struct dentry *dentry)
2526     +{
2527     + struct dentry *parent, *hasone;
2528     + int bindex = dbstart(dentry);
2529     +
2530     + if (unionfs_lower_mnt_idx(dentry, bindex))
2531     + return;
2532     + hasone = dentry->d_parent;
2533     + /* this loop should stop at root dentry */
2534     + while (!unionfs_lower_mnt_idx(hasone, bindex))
2535     + hasone = hasone->d_parent;
2536     + parent = dentry;
2537     + while (!unionfs_lower_mnt_idx(parent, bindex)) {
2538     + unionfs_set_lower_mnt_idx(parent, bindex,
2539     + unionfs_mntget(hasone, bindex));
2540     + parent = parent->d_parent;
2541     + }
2542     +}
2543     +
2544     +/*
2545     + * Post-copyup helper to release all non-directory source objects of a
2546     + * copied-up file. Regular files should have only one lower object.
2547     + */
2548     +void unionfs_postcopyup_release(struct dentry *dentry)
2549     +{
2550     + int bstart, bend;
2551     +
2552     + BUG_ON(S_ISDIR(dentry->d_inode->i_mode));
2553     + bstart = dbstart(dentry);
2554     + bend = dbend(dentry);
2555     +
2556     + path_put_lowers(dentry, bstart + 1, bend, false);
2557     + iput_lowers(dentry->d_inode, bstart + 1, bend, false);
2558     +
2559     + dbend(dentry) = bstart;
2560     + ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bstart;
2561     +}
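
The path-table walk in create_parents() above can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the unionfs code: struct node and replicate_parents() are hypothetical names, a boolean flag stands in for "the lower directory exists in the branch", and, unlike create_parents(), the sketch also creates the leaf rather than only looking it up. It shows the two phases: climb toward the root while recording each node in a doubling array, then descend that array to create the missing levels top-down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a dentry: a name, a parent, and whether the
 * directory already exists in the target branch.
 */
struct node {
	const char *name;
	struct node *parent;
	bool exists;
};

/*
 * Mark every missing ancestor of @leaf (and @leaf itself) as existing,
 * creating top-down. Returns the number of nodes created, or -1 on
 * allocation failure. The root is assumed to exist in the branch.
 */
int replicate_parents(struct node *leaf)
{
	size_t cap = 1, count = 0;
	struct node **path = malloc(cap * sizeof(*path));
	struct node *n = leaf;
	int created = 0;

	if (!path)
		return -1;

	/* Walk up, recording each node, until its parent exists. */
	for (;;) {
		if (count == cap) {
			struct node **p;

			/* grow the path table by doubling */
			cap *= 2;
			p = realloc(path, cap * sizeof(*path));
			if (!p) {
				free(path);
				return -1;
			}
			path = p;
		}
		path[count++] = n;
		assert(n->parent);	/* the walk must stop at or below root */
		if (n->parent->exists)
			break;
		n = n->parent;
	}

	/* Walk back down: path[count - 1] is the shallowest node recorded. */
	while (count > 0) {
		n = path[--count];
		if (!n->exists) {
			n->exists = true;
			created++;
		}
	}
	free(path);
	return created;
}
```

Starting the table at one slot and doubling keeps the common case (the parent already exists) to a single small allocation, which matches the "no sense allocating any less than the minimum" comment in the patch.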
2562     diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
2563     new file mode 100644
2564     index 0000000..acc44bd
2565     --- /dev/null
2566     +++ b/fs/unionfs/debug.c
2567     @@ -0,0 +1,533 @@
2568     +/*
2569     + * Copyright (c) 2003-2010 Erez Zadok
2570     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2571     + * Copyright (c) 2003-2010 Stony Brook University
2572     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
2573     + *
2574     + * This program is free software; you can redistribute it and/or modify
2575     + * it under the terms of the GNU General Public License version 2 as
2576     + * published by the Free Software Foundation.
2577     + */
2578     +
2579     +#include "union.h"
2580     +
2581     +/*
2582     + * Helper debugging functions for maintainers (and for users to report
2583     + * useful information back to maintainers)
2584     + */
2585     +
2586     +/* it's always useful to know what part of the code called us */
2587     +#define PRINT_CALLER(fname, fxn, line) \
2588     + do { \
2589     + if (!printed_caller) { \
2590     + pr_debug("PC:%s:%s:%d\n", (fname), (fxn), (line)); \
2591     + printed_caller = 1; \
2592     + } \
2593     + } while (0)
2594     +
2595     +/*
2596     + * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
2597     + * the fan-out of various Unionfs objects. We check that no lower objects
2598     + * exist outside the start/end branch range; that all objects within are
2599     + * non-NULL (with some allowed exceptions); that for every lower file
2600     + * there's a lower dentry+inode; that the start/end ranges match for all
2601     + * corresponding lower objects; that open files/symlinks have only one lower
2602     + * object, but directories can have several; and more.
2603     + */
2604     +void __unionfs_check_inode(const struct inode *inode,
2605     + const char *fname, const char *fxn, int line)
2606     +{
2607     + int bindex;
2608     + int istart, iend;
2609     + struct inode *lower_inode;
2610     + struct super_block *sb;
2611     + int printed_caller = 0;
2612     + void *poison_ptr;
2613     +
2614     + /* for inodes now */
2615     + BUG_ON(!inode);
2616     + sb = inode->i_sb;
2617     + istart = ibstart(inode);
2618     + iend = ibend(inode);
2619     + /* don't check inode if no lower branches */
2620     + if (istart < 0 && iend < 0)
2621     + return;
2622     + if (unlikely(istart > iend)) {
2623     + PRINT_CALLER(fname, fxn, line);
2624     + pr_debug(" Ci0: inode=%p istart/end=%d:%d\n",
2625     + inode, istart, iend);
2626     + }
2627     + if (unlikely((istart == -1 && iend != -1) ||
2628     + (istart != -1 && iend == -1))) {
2629     + PRINT_CALLER(fname, fxn, line);
2630     + pr_debug(" Ci1: inode=%p istart/end=%d:%d\n",
2631     + inode, istart, iend);
2632     + }
2633     + if (!S_ISDIR(inode->i_mode)) {
2634     + if (unlikely(iend != istart)) {
2635     + PRINT_CALLER(fname, fxn, line);
2636     + pr_debug(" Ci2: inode=%p istart=%d iend=%d\n",
2637     + inode, istart, iend);
2638     + }
2639     + }
2640     +
2641     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2642     + if (unlikely(!UNIONFS_I(inode))) {
2643     + PRINT_CALLER(fname, fxn, line);
2644     + pr_debug(" Ci3: no inode_info %p\n", inode);
2645     + return;
2646     + }
2647     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
2648     + PRINT_CALLER(fname, fxn, line);
2649     + pr_debug(" Ci4: no lower_inodes %p\n", inode);
2650     + return;
2651     + }
2652     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2653     + if (lower_inode) {
2654     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2655     + if (unlikely(bindex < istart || bindex > iend)) {
2656     + PRINT_CALLER(fname, fxn, line);
2657     + pr_debug(" Ci5: inode/linode=%p:%p bindex=%d "
2658     + "istart/end=%d:%d\n", inode,
2659     + lower_inode, bindex, istart, iend);
2660     + } else if (unlikely(lower_inode == poison_ptr)) {
2661     + /* freed inode! */
2662     + PRINT_CALLER(fname, fxn, line);
2663     + pr_debug(" Ci6: inode/linode=%p:%p bindex=%d "
2664     + "istart/end=%d:%d\n", inode,
2665     + lower_inode, bindex, istart, iend);
2666     + }
2667     + continue;
2668     + }
2669     + /* if we get here, then lower_inode == NULL */
2670     + if (bindex < istart || bindex > iend)
2671     + continue;
2672     + /*
2673     + * directories can have NULL lower inodes in b/t start/end,
2674     + * but NOT if at the start/end range.
2675     + */
2676     + if (unlikely(S_ISDIR(inode->i_mode) &&
2677     + bindex > istart && bindex < iend))
2678     + continue;
2679     + PRINT_CALLER(fname, fxn, line);
2680     + pr_debug(" Ci7: inode/linode=%p:%p "
2681     + "bindex=%d istart/end=%d:%d\n",
2682     + inode, lower_inode, bindex, istart, iend);
2683     + }
2684     +}
2685     +
2686     +void __unionfs_check_dentry(const struct dentry *dentry,
2687     + const char *fname, const char *fxn, int line)
2688     +{
2689     + int bindex;
2690     + int dstart, dend, istart, iend;
2691     + struct dentry *lower_dentry;
2692     + struct inode *inode, *lower_inode;
2693     + struct super_block *sb;
2694     + struct vfsmount *lower_mnt;
2695     + int printed_caller = 0;
2696     + void *poison_ptr;
2697     +
2698     + BUG_ON(!dentry);
2699     + sb = dentry->d_sb;
2700     + inode = dentry->d_inode;
2701     + dstart = dbstart(dentry);
2702     + dend = dbend(dentry);
2703     + /* don't check dentry/mnt if no lower branches */
2704     + if (dstart < 0 && dend < 0)
2705     + goto check_inode;
2706     + BUG_ON(dstart > dend);
2707     +
2708     + if (unlikely((dstart == -1 && dend != -1) ||
2709     + (dstart != -1 && dend == -1))) {
2710     + PRINT_CALLER(fname, fxn, line);
2711     + pr_debug(" CD0: dentry=%p dstart/end=%d:%d\n",
2712     + dentry, dstart, dend);
2713     + }
2714     + /*
2715     + * check for NULL dentries inside the start/end range, or
2716     + * non-NULL dentries outside the start/end range.
2717     + */
2718     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2719     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
2720     + if (lower_dentry) {
2721     + if (unlikely(bindex < dstart || bindex > dend)) {
2722     + PRINT_CALLER(fname, fxn, line);
2723     + pr_debug(" CD1: dentry/lower=%p:%p(%p) "
2724     + "bindex=%d dstart/end=%d:%d\n",
2725     + dentry, lower_dentry,
2726     + (lower_dentry ? lower_dentry->d_inode :
2727     + (void *) -1L),
2728     + bindex, dstart, dend);
2729     + }
2730     + } else { /* lower_dentry == NULL */
2731     + if (bindex < dstart || bindex > dend)
2732     + continue;
2733     + /*
2734     + * Directories can have NULL lower inodes in b/t
2735     + * start/end, but NOT if at the start/end range.
2736     + * Ignore this rule, however, if this is a NULL
2737     + * dentry or a deleted dentry.
2738     + */
2739     + if (unlikely(!d_deleted((struct dentry *) dentry) &&
2740     + inode &&
2741     + !(inode && S_ISDIR(inode->i_mode) &&
2742     + bindex > dstart && bindex < dend))) {
2743     + PRINT_CALLER(fname, fxn, line);
2744     + pr_debug(" CD2: dentry/lower=%p:%p(%p) "
2745     + "bindex=%d dstart/end=%d:%d\n",
2746     + dentry, lower_dentry,
2747     + (lower_dentry ?
2748     + lower_dentry->d_inode :
2749     + (void *) -1L),
2750     + bindex, dstart, dend);
2751     + }
2752     + }
2753     + }
2754     +
2755     + /* check for vfsmounts same as for dentries */
2756     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2757     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2758     + if (lower_mnt) {
2759     + if (unlikely(bindex < dstart || bindex > dend)) {
2760     + PRINT_CALLER(fname, fxn, line);
2761     + pr_debug(" CM0: dentry/lmnt=%p:%p bindex=%d "
2762     + "dstart/end=%d:%d\n", dentry,
2763     + lower_mnt, bindex, dstart, dend);
2764     + }
2765     + } else { /* lower_mnt == NULL */
2766     + if (bindex < dstart || bindex > dend)
2767     + continue;
2768     + /*
2769     + * Directories can have NULL lower inodes in b/t
2770     + * start/end, but NOT if at the start/end range.
2771     + * Ignore this rule, however, if this is a NULL
2772     + * dentry.
2773     + */
2774     + if (unlikely(inode &&
2775     + !(inode && S_ISDIR(inode->i_mode) &&
2776     + bindex > dstart && bindex < dend))) {
2777     + PRINT_CALLER(fname, fxn, line);
2778     + pr_debug(" CM1: dentry/lmnt=%p:%p "
2779     + "bindex=%d dstart/end=%d:%d\n",
2780     + dentry, lower_mnt, bindex,
2781     + dstart, dend);
2782     + }
2783     + }
2784     + }
2785     +
2786     +check_inode:
2787     + /* for inodes now */
2788     + if (!inode)
2789     + return;
2790     + istart = ibstart(inode);
2791     + iend = ibend(inode);
2792     + /* don't check inode if no lower branches */
2793     + if (istart < 0 && iend < 0)
2794     + return;
2795     + BUG_ON(istart > iend);
2796     + if (unlikely((istart == -1 && iend != -1) ||
2797     + (istart != -1 && iend == -1))) {
2798     + PRINT_CALLER(fname, fxn, line);
2799     + pr_debug(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n",
2800     + dentry, inode, istart, iend);
2801     + }
2802     + if (unlikely(istart != dstart)) {
2803     + PRINT_CALLER(fname, fxn, line);
2804     + pr_debug(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n",
2805     + dentry, inode, istart, dstart);
2806     + }
2807     + if (unlikely(iend != dend)) {
2808     + PRINT_CALLER(fname, fxn, line);
2809     + pr_debug(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n",
2810     + dentry, inode, iend, dend);
2811     + }
2812     +
2813     + if (!S_ISDIR(inode->i_mode)) {
2814     + if (unlikely(dend != dstart)) {
2815     + PRINT_CALLER(fname, fxn, line);
2816     + pr_debug(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n",
2817     + dentry, inode, dstart, dend);
2818     + }
2819     + if (unlikely(iend != istart)) {
2820     + PRINT_CALLER(fname, fxn, line);
2821     + pr_debug(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n",
2822     + dentry, inode, istart, iend);
2823     + }
2824     + }
2825     +
2826     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2827     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2828     + if (lower_inode) {
2829     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2830     + if (unlikely(bindex < istart || bindex > iend)) {
2831     + PRINT_CALLER(fname, fxn, line);
2832     + pr_debug(" CI5: dentry/linode=%p:%p bindex=%d "
2833     + "istart/end=%d:%d\n", dentry,
2834     + lower_inode, bindex, istart, iend);
2835     + } else if (unlikely(lower_inode == poison_ptr)) {
2836     + /* freed inode! */
2837     + PRINT_CALLER(fname, fxn, line);
2838     + pr_debug(" CI6: dentry/linode=%p:%p bindex=%d "
2839     + "istart/end=%d:%d\n", dentry,
2840     + lower_inode, bindex, istart, iend);
2841     + }
2842     + continue;
2843     + }
2844     + /* if we get here, then lower_inode == NULL */
2845     + if (bindex < istart || bindex > iend)
2846     + continue;
2847     + /*
2848     + * directories can have NULL lower inodes in b/t start/end,
2849     + * but NOT if at the start/end range.
2850     + */
2851     + if (unlikely(S_ISDIR(inode->i_mode) &&
2852     + bindex > istart && bindex < iend))
2853     + continue;
2854     + PRINT_CALLER(fname, fxn, line);
2855     + pr_debug(" CI7: dentry/linode=%p:%p "
2856     + "bindex=%d istart/end=%d:%d\n",
2857     + dentry, lower_inode, bindex, istart, iend);
2858     + }
2859     +
2860     + /*
2861     + * If it's a directory, then intermediate objects b/t start/end can
2862     + * be NULL. But, check that all three are NULL: lower dentry, mnt,
2863     + * and inode.
2864     + */
2865     + if (dstart >= 0 && dend >= 0 && S_ISDIR(inode->i_mode))
2866     + for (bindex = dstart+1; bindex < dend; bindex++) {
2867     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2868     + lower_dentry = unionfs_lower_dentry_idx(dentry,
2869     + bindex);
2870     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2871     + if (unlikely(!((lower_inode && lower_dentry &&
2872     + lower_mnt) ||
2873     + (!lower_inode &&
2874     + !lower_dentry && !lower_mnt)))) {
2875     + PRINT_CALLER(fname, fxn, line);
2876     + pr_debug(" Cx: lmnt/ldentry/linode=%p:%p:%p "
2877     + "bindex=%d dstart/end=%d:%d\n",
2878     + lower_mnt, lower_dentry, lower_inode,
2879     + bindex, dstart, dend);
2880     + }
2881     + }
2882     + /* check if lower inode is newer than upper one (it shouldn't be) */
2883     + if (unlikely(is_newer_lower(dentry) && !is_negative_lower(dentry))) {
2884     + PRINT_CALLER(fname, fxn, line);
2885     + for (bindex = ibstart(inode); bindex <= ibend(inode);
2886     + bindex++) {
2887     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2888     + if (unlikely(!lower_inode))
2889     + continue;
2890     + pr_debug(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu "
2891     + "ctime/lctime=%lu.%lu/%lu.%lu\n",
2892     + bindex,
2893     + inode->i_mtime.tv_sec,
2894     + inode->i_mtime.tv_nsec,
2895     + lower_inode->i_mtime.tv_sec,
2896     + lower_inode->i_mtime.tv_nsec,
2897     + inode->i_ctime.tv_sec,
2898     + inode->i_ctime.tv_nsec,
2899     + lower_inode->i_ctime.tv_sec,
2900     + lower_inode->i_ctime.tv_nsec);
2901     + }
2902     + }
2903     +}
2904     +
2905     +void __unionfs_check_file(const struct file *file,
2906     + const char *fname, const char *fxn, int line)
2907     +{
2908     + int bindex;
2909     + int dstart, dend, fstart, fend;
2910     + struct dentry *dentry;
2911     + struct file *lower_file;
2912     + struct inode *inode;
2913     + struct super_block *sb;
2914     + int printed_caller = 0;
2915     +
2916     + BUG_ON(!file);
2917     + dentry = file->f_path.dentry;
2918     + sb = dentry->d_sb;
2919     + dstart = dbstart(dentry);
2920     + dend = dbend(dentry);
2921     + BUG_ON(dstart > dend);
2922     + fstart = fbstart(file);
2923     + fend = fbend(file);
2924     + BUG_ON(fstart > fend);
2925     +
2926     + if (unlikely((fstart == -1 && fend != -1) ||
2927     + (fstart != -1 && fend == -1))) {
2928     + PRINT_CALLER(fname, fxn, line);
2929     + pr_debug(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n",
2930     + file, dentry, fstart, fend);
2931     + }
2932     + if (unlikely(fstart != dstart)) {
2933     + PRINT_CALLER(fname, fxn, line);
2934     + pr_debug(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n",
2935     + file, dentry, fstart, dstart);
2936     + }
2937     + if (unlikely(fend != dend)) {
2938     + PRINT_CALLER(fname, fxn, line);
2939     + pr_debug(" CF2: file/dentry=%p:%p fend=%d dend=%d\n",
2940     + file, dentry, fend, dend);
2941     + }
2942     + inode = dentry->d_inode;
2943     + if (!S_ISDIR(inode->i_mode)) {
2944     + if (unlikely(fend != fstart)) {
2945     + PRINT_CALLER(fname, fxn, line);
2946     + pr_debug(" CF3: file/inode=%p:%p fstart=%d fend=%d\n",
2947     + file, inode, fstart, fend);
2948     + }
2949     + if (unlikely(dend != dstart)) {
2950     + PRINT_CALLER(fname, fxn, line);
2951     + pr_debug(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n",
2952     + file, dentry, dstart, dend);
2953     + }
2954     + }
2955     +
2956     + /*
2957     + * check for NULL lower files inside the start/end range, or
2958     + * non-NULL lower files outside the start/end range.
2959     + */
2960     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2961     + lower_file = unionfs_lower_file_idx(file, bindex);
2962     + if (lower_file) {
2963     + if (unlikely(bindex < fstart || bindex > fend)) {
2964     + PRINT_CALLER(fname, fxn, line);
2965     + pr_debug(" CF5: file/lower=%p:%p bindex=%d "
2966     + "fstart/end=%d:%d\n", file,
2967     + lower_file, bindex, fstart, fend);
2968     + }
2969     + } else { /* lower_file == NULL */
2970     + if (bindex >= fstart && bindex <= fend) {
2971     + /*
2972     + * directories can have NULL lower inodes in
2973     + * b/t start/end, but NOT if at the
2974     + * start/end range.
2975     + */
2976     + if (unlikely(!(S_ISDIR(inode->i_mode) &&
2977     + bindex > fstart &&
2978     + bindex < fend))) {
2979     + PRINT_CALLER(fname, fxn, line);
2980     + pr_debug(" CF6: file/lower=%p:%p "
2981     + "bindex=%d fstart/end=%d:%d\n",
2982     + file, lower_file, bindex,
2983     + fstart, fend);
2984     + }
2985     + }
2986     + }
2987     + }
2988     +
2989     + __unionfs_check_dentry(dentry, fname, fxn, line);
2990     +}
2991     +
2992     +void __unionfs_check_nd(const struct nameidata *nd,
2993     + const char *fname, const char *fxn, int line)
2994     +{
2995     + struct file *file;
2996     + int printed_caller = 0;
2997     +
2998     + if (unlikely(!nd))
2999     + return;
3000     + if (nd->flags & LOOKUP_OPEN) {
3001     + file = nd->intent.open.file;
3002     + if (unlikely(file->f_path.dentry &&
3003     + strcmp(file->f_path.dentry->d_sb->s_type->name,
3004     + UNIONFS_NAME))) {
3005     + PRINT_CALLER(fname, fxn, line);
3006     + pr_debug(" CND1: lower_file of type %s\n",
3007     + file->f_path.dentry->d_sb->s_type->name);
3008     + BUG();
3009     + }
3010     + }
3011     +}
3012     +
3013     +/* useful to track vfsmount leaks that could cause EBUSY on unmount */
3014     +void __show_branch_counts(const struct super_block *sb,
3015     + const char *file, const char *fxn, int line)
3016     +{
3017     + int i;
3018     + struct vfsmount *mnt;
3019     +
3020     + pr_debug("BC:");
3021     + for (i = 0; i < sbmax(sb); i++) {
3022     + if (likely(sb->s_root))
3023     + mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt;
3024     + else
3025     + mnt = NULL;
3026     + printk(KERN_CONT "%d:",
3027     + (mnt ? atomic_read(&mnt->mnt_count) : -99));
3028     + }
3029     + printk(KERN_CONT "%s:%s:%d\n", file, fxn, line);
3030     +}
3031     +
3032     +void __show_inode_times(const struct inode *inode,
3033     + const char *file, const char *fxn, int line)
3034     +{
3035     + struct inode *lower_inode;
3036     + int bindex;
3037     +
3038     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3039     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3040     + if (unlikely(!lower_inode))
3041     + continue;
3042     + pr_debug("IT(%lu:%d): %s:%s:%d "
3043     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3044     + inode->i_ino, bindex,
3045     + file, fxn, line,
3046     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3047     + lower_inode->i_mtime.tv_sec,
3048     + lower_inode->i_mtime.tv_nsec,
3049     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3050     + lower_inode->i_ctime.tv_sec,
3051     + lower_inode->i_ctime.tv_nsec);
3052     + }
3053     +}
3054     +
3055     +void __show_dinode_times(const struct dentry *dentry,
3056     + const char *file, const char *fxn, int line)
3057     +{
3058     + struct inode *inode = dentry->d_inode;
3059     + struct inode *lower_inode;
3060     + int bindex;
3061     +
3062     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3063     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3064     + if (!lower_inode)
3065     + continue;
3066     + pr_debug("DT(%s:%lu:%d): %s:%s:%d "
3067     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3068     + dentry->d_name.name, inode->i_ino, bindex,
3069     + file, fxn, line,
3070     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3071     + lower_inode->i_mtime.tv_sec,
3072     + lower_inode->i_mtime.tv_nsec,
3073     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3074     + lower_inode->i_ctime.tv_sec,
3075     + lower_inode->i_ctime.tv_nsec);
3076     + }
3077     +}
3078     +
3079     +void __show_inode_counts(const struct inode *inode,
3080     + const char *file, const char *fxn, int line)
3081     +{
3082     + struct inode *lower_inode;
3083     + int bindex;
3084     +
3085     + if (unlikely(!inode)) {
3086     + pr_debug("SiC: Null inode\n");
3087     + return;
3088     + }
3089     + for (bindex = sbstart(inode->i_sb); bindex <= sbend(inode->i_sb);
3090     + bindex++) {
3091     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3092     + if (unlikely(!lower_inode))
3093     + continue;
3094     + pr_debug("SIC(%lu:%d:%d): lc=%d %s:%s:%d\n",
3095     + inode->i_ino, bindex,
3096     + atomic_read(&(inode)->i_count),
3097     + atomic_read(&(lower_inode)->i_count),
3098     + file, fxn, line);
3099     + }
3100     +}
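
The range invariants that the __unionfs_check_* helpers above enforce can be condensed into a small userspace model. This is only a sketch under stated assumptions: struct fanout and fanout_check() are hypothetical names, the slots array stands in for the per-branch lower objects, and the function counts violations instead of printing them with pr_debug(). The rules modelled are the ones named in the comments of the patch: nothing may live outside the start/end range, non-directories have exactly one lower object, and directories may have NULL holes strictly between start and end but not at either endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a unionfs fan-out: slots[i] is the lower object
 * for branch i (NULL if absent); start/end delimit the branches that may
 * hold an object (-1/-1 means "no lower branches").
 */
struct fanout {
	const void **slots;
	int nslots;
	int start;
	int end;
	bool is_dir;
};

/* Return the number of invariant violations found (0 means consistent). */
int fanout_check(const struct fanout *f)
{
	int i, bad = 0;

	assert(f && f->slots);

	if (f->start < 0 && f->end < 0)
		return 0;	/* no lower branches: nothing to check */
	if (f->start < 0 || f->end < 0 || f->start > f->end)
		return 1;	/* half-set or inverted range */
	if (!f->is_dir && f->start != f->end)
		bad++;		/* non-directories have one lower object */

	for (i = 0; i < f->nslots; i++) {
		bool inside = i >= f->start && i <= f->end;
		bool interior = i > f->start && i < f->end;

		if (f->slots[i]) {
			if (!inside)
				bad++;	/* object outside the range */
		} else if (inside && !(f->is_dir && interior)) {
			bad++;	/* hole at an endpoint, or in a non-dir */
		}
	}
	return bad;
}
```

Returning a count rather than stopping at the first failure mirrors the patch's approach of reporting every inconsistency it finds in one pass.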
3101     diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
3102     new file mode 100644
3103     index 0000000..a0c3bba
3104     --- /dev/null
3105     +++ b/fs/unionfs/dentry.c
3106     @@ -0,0 +1,397 @@
3107     +/*
3108     + * Copyright (c) 2003-2010 Erez Zadok
3109     + * Copyright (c) 2003-2006 Charles P. Wright
3110     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3111     + * Copyright (c) 2005-2006 Junjiro Okajima
3112     + * Copyright (c) 2005 Arun M. Krishnakumar
3113     + * Copyright (c) 2004-2006 David P. Quigley
3114     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3115     + * Copyright (c) 2003 Puja Gupta
3116     + * Copyright (c) 2003 Harikesavan Krishnan
3117     + * Copyright (c) 2003-2010 Stony Brook University
3118     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3119     + *
3120     + * This program is free software; you can redistribute it and/or modify
3121     + * it under the terms of the GNU General Public License version 2 as
3122     + * published by the Free Software Foundation.
3123     + */
3124     +
3125     +#include "union.h"
3126     +
3127     +bool is_negative_lower(const struct dentry *dentry)
3128     +{
3129     + int bindex;
3130     + struct dentry *lower_dentry;
3131     +
3132     + BUG_ON(!dentry);
3133     + /* cache coherency: check if file was deleted on lower branch */
3134     + if (dbstart(dentry) < 0)
3135     + return true;
3136     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
3137     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3138     + /* unhashed (i.e., unlinked) lower dentries don't count */
3139     + if (lower_dentry && lower_dentry->d_inode &&
3140     + !d_deleted(lower_dentry) &&
3141     + !(lower_dentry->d_flags & DCACHE_NFSFS_RENAMED))
3142     + return false;
3143     + }
3144     + return true;
3145     +}
3146     +
3147     +static inline void __dput_lowers(struct dentry *dentry, int start, int end)
3148     +{
3149     + struct dentry *lower_dentry;
3150     + int bindex;
3151     +
3152     + if (start < 0)
3153     + return;
3154     + for (bindex = start; bindex <= end; bindex++) {
3155     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3156     + if (!lower_dentry)
3157     + continue;
3158     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
3159     + dput(lower_dentry);
3160     + }
3161     +}
3162     +
3163     +/*
3164     + * Purge and invalidate as many data pages of a unionfs inode as possible.
3165     + * This is called when the lower inode has changed, and we want to force
3166     + * processes to re-get the new data.
3167     + */
3168     +static inline void purge_inode_data(struct inode *inode)
3169     +{
3170     + /* remove all non-private mappings */
3171     + unmap_mapping_range(inode->i_mapping, 0, 0, 0);
3172     + /* invalidate as many pages as possible */
3173     + invalidate_mapping_pages(inode->i_mapping, 0, -1);
3174     + /*
3175     + * Don't try to truncate_inode_pages here, because this could lead
3176     + * to a deadlock between some of address_space ops and dentry
3177     + * revalidation: the address space op is invoked with a lock on our
3178     + * own page, and truncate_inode_pages will block on locked pages.
3179     + */
3180     +}
3181     +
3182     +/*
3183     + * Revalidate a single file/symlink/special dentry. Assume that info nodes
3184     + * of the @dentry and its @parent are locked. Assume parent is valid,
3185     + * otherwise return false (and let's hope the VFS will try to re-lookup this
3186     + * dentry). Returns true if valid, false otherwise.
3187     + */
3188     +bool __unionfs_d_revalidate(struct dentry *dentry, struct dentry *parent,
3189     + bool willwrite)
3190     +{
3191     + bool valid = true; /* default is valid */
3192     + struct dentry *lower_dentry;
3193     + struct dentry *result;
3194     + int bindex, bstart, bend;
3195     + int sbgen, dgen, pdgen;
3196     + int positive = 0;
3197     + int interpose_flag;
3198     +
3199     + verify_locked(dentry);
3200     + verify_locked(parent);
3201     +
3202     + /* if the dentry is unhashed, do NOT revalidate */
3203     + if (d_deleted(dentry))
3204     + goto out;
3205     +
3206     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3207     +
3208     + if (is_newer_lower(dentry)) {
3209     + /* root dentry is always valid */
3210     + if (IS_ROOT(dentry)) {
3211     + unionfs_copy_attr_times(dentry->d_inode);
3212     + } else {
3213     + /*
3214     + * reset generation number to zero, guaranteed to be
3215     + * "old"
3216     + */
3217     + dgen = 0;
3218     + atomic_set(&UNIONFS_D(dentry)->generation, dgen);
3219     + }
3220     + if (!willwrite)
3221     + purge_inode_data(dentry->d_inode);
3222     + }
3223     +
3224     + sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3225     +
3226     + BUG_ON(dbstart(dentry) == -1);
3227     + if (dentry->d_inode)
3228     + positive = 1;
3229     +
3230     + /* if our dentry is valid, then validate all lower ones */
3231     + if (sbgen == dgen)
3232     + goto validate_lowers;
3233     +
3234     + /* The root entry should always be valid */
3235     + BUG_ON(IS_ROOT(dentry));
3236     +
3237     + /* We can't work correctly if our parent isn't valid. */
3238     + pdgen = atomic_read(&UNIONFS_D(parent)->generation);
3239     +
3240     + /* Free the pointers for our inodes and this dentry. */
3241     + path_put_lowers_all(dentry, false);
3242     +
3243     + interpose_flag = INTERPOSE_REVAL_NEG;
3244     + if (positive) {
3245     + interpose_flag = INTERPOSE_REVAL;
3246     + iput_lowers_all(dentry->d_inode, true);
3247     + }
3248     +
3249     + if (realloc_dentry_private_data(dentry) != 0) {
3250     + valid = false;
3251     + goto out;
3252     + }
3253     +
3254     + result = unionfs_lookup_full(dentry, parent, interpose_flag);
3255     + if (result) {
3256     + if (IS_ERR(result)) {
3257     + valid = false;
3258     + goto out;
3259     + }
3260     + /*
3261     + * current unionfs_lookup_full() doesn't return
3262     + * a valid dentry
3263     + */
3264     + dput(dentry);
3265     + dentry = result;
3266     + }
3267     +
3268     + if (unlikely(positive && is_negative_lower(dentry))) {
3269     + /* call make_bad_inode here ? */
3270     + d_drop(dentry);
3271     + valid = false;
3272     + goto out;
3273     + }
3274     +
3275     + /*
3276     + * if we got here then we have revalidated our dentry and all lower
3277     + * ones, so we can return safely.
3278     + */
3279     + if (!valid) /* lower dentry revalidation failed */
3280     + goto out;
3281     +
3282     + /*
3283     + * If the parent's gen no. matches the superblock's gen no., then
3284     + * we can update our dentry's gen no. If they didn't match, then it
3285     + * was OK to revalidate this dentry with a stale parent, but we'll
3286     + * purposely not update our dentry's gen no. (so it can be redone);
3287     + * and, we'll mark our parent dentry as invalid so it'll force it
3288     + * (and our dentry) to be revalidated.
3289     + */
3290     + if (pdgen == sbgen)
3291     + atomic_set(&UNIONFS_D(dentry)->generation, sbgen);
3292     + goto out;
3293     +
3294     +validate_lowers:
3295     +
3296     + /* The revalidation must occur across all branches */
3297     + bstart = dbstart(dentry);
3298     + bend = dbend(dentry);
3299     + BUG_ON(bstart == -1);
3300     + for (bindex = bstart; bindex <= bend; bindex++) {
3301     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3302     + if (!lower_dentry || !lower_dentry->d_op
3303     + || !lower_dentry->d_op->d_revalidate)
3304     + continue;
3305     + /*
3306     + * Don't pass nameidata to lower file system, because we
3307     + * don't want an arbitrary lower file being opened or
3308     + * returned to us: it may be useless to us because of the
3309     + * fanout nature of unionfs (cf. file/directory open-file
3310     + * invariants). We will open lower files as and when needed
3311     + * later on.
3312     + */
3313     + if (!lower_dentry->d_op->d_revalidate(lower_dentry, NULL))
3314     + valid = false;
3315     + }
3316     +
3317     + if (!dentry->d_inode ||
3318     + ibstart(dentry->d_inode) < 0 ||
3319     + ibend(dentry->d_inode) < 0) {
3320     + valid = false;
3321     + goto out;
3322     + }
3323     +
3324     + if (valid) {
3325     + /*
3326     + * If we get here, and we copy the meta-data from the lower
3327     + * inode to our inode, then it is vital that we have already
3328     + * purged all unionfs-level file data. We do that earlier in
3329     + * this function (__unionfs_d_revalidate) by calling
3330     + * purge_inode_data.
3331     + */
3332     + unionfs_copy_attr_all(dentry->d_inode,
3333     + unionfs_lower_inode(dentry->d_inode));
3334     + fsstack_copy_inode_size(dentry->d_inode,
3335     + unionfs_lower_inode(dentry->d_inode));
3336     + }
3337     +
3338     +out:
3339     + return valid;
3340     +}
3341     +
3342     +/*
3343     + * Determine if the lower inode objects have changed from below the unionfs
3344     + * inode. Return true if changed, false otherwise.
3345     + *
3346     + * We check if the mtime or ctime have changed. However, the inode times
3347     + * can be changed by anyone without much protection, including
3348     + * asynchronously. This can sometimes cause unionfs to find that the lower
3349     + * file system doesn't change its inode times quickly enough, resulting in a
3350     + * false positive indication (which is harmless, it just makes unionfs do
3351     + * extra work in re-validating the objects). To minimize the chances of
3352     + * these situations, we still consider such small time changes valid, but we
3353     + * don't print debugging messages unless the time changes are greater than
3354     + * UNIONFS_MIN_CC_TIME (which defaults to 3 seconds, as with NFS's acregmin)
3355     + * because significant changes are more likely due to users manually
3356     + * touching lower files.
3357     + */
3358     +bool is_newer_lower(const struct dentry *dentry)
3359     +{
3360     + int bindex;
3361     + struct inode *inode;
3362     + struct inode *lower_inode;
3363     +
3364     + /* ignore if we're called on semi-initialized dentries/inodes */
3365     + if (!dentry || !UNIONFS_D(dentry))
3366     + return false;
3367     + inode = dentry->d_inode;
3368     + if (!inode || !UNIONFS_I(inode)->lower_inodes ||
3369     + ibstart(inode) < 0 || ibend(inode) < 0)
3370     + return false;
3371     +
3372     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3373     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3374     + if (!lower_inode)
3375     + continue;
3376     +
3377     + /* check if mtime/ctime have changed */
3378     + if (unlikely(timespec_compare(&inode->i_mtime,
3379     + &lower_inode->i_mtime) < 0)) {
3380     + if ((lower_inode->i_mtime.tv_sec -
3381     + inode->i_mtime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3382     + pr_info("unionfs: new lower inode mtime "
3383     + "(bindex=%d, name=%s)\n", bindex,
3384     + dentry->d_name.name);
3385     + show_dinode_times(dentry);
3386     + }
3387     + return true;
3388     + }
3389     + if (unlikely(timespec_compare(&inode->i_ctime,
3390     + &lower_inode->i_ctime) < 0)) {
3391     + if ((lower_inode->i_ctime.tv_sec -
3392     + inode->i_ctime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3393     + pr_info("unionfs: new lower inode ctime "
3394     + "(bindex=%d, name=%s)\n", bindex,
3395     + dentry->d_name.name);
3396     + show_dinode_times(dentry);
3397     + }
3398     + return true;
3399     + }
3400     + }
3401     +
3402     + /*
3403     + * Last check: if this is a positive dentry, but somehow all lower
3404     + * dentries are negative or unhashed, then this dentry needs to be
3405     + * revalidated, because someone probably deleted the objects from
3406     + * the lower branches directly.
3407     + */
3408     + if (is_negative_lower(dentry))
3409     + return true;
3410     +
3411     + return false; /* default: lower is not newer */
3412     +}
3413     +
3414     +static int unionfs_d_revalidate(struct dentry *dentry,
3415     + struct nameidata *nd_unused)
3416     +{
3417     + bool valid = true;
3418     + int err = 1; /* 1 means valid for the VFS */
3419     + struct dentry *parent;
3420     +
3421     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3422     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3423     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3424     +
3425     + valid = __unionfs_d_revalidate(dentry, parent, false);
3426     + if (valid) {
3427     + unionfs_postcopyup_setmnt(dentry);
3428     + unionfs_check_dentry(dentry);
3429     + } else {
3430     + d_drop(dentry);
3431     + err = valid;
3432     + }
3433     + unionfs_unlock_dentry(dentry);
3434     + unionfs_unlock_parent(dentry, parent);
3435     + unionfs_read_unlock(dentry->d_sb);
3436     +
3437     + return err;
3438     +}
3439     +
3440     +static void unionfs_d_release(struct dentry *dentry)
3441     +{
3442     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3443     + if (unlikely(!UNIONFS_D(dentry)))
3444     + goto out; /* skip if no lower branches */
3445     + /* must lock our branch configuration here */
3446     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3447     +
3448     + unionfs_check_dentry(dentry);
3449     + /* this could be a negative dentry, so check first */
3450     + if (dbstart(dentry) < 0) {
3451     + unionfs_unlock_dentry(dentry);
3452     + goto out; /* due to a (normal) failed lookup */
3453     + }
3454     +
3455     + /* Release all the lower dentries */
3456     + path_put_lowers_all(dentry, true);
3457     +
3458     + unionfs_unlock_dentry(dentry);
3459     +
3460     +out:
3461     + free_dentry_private_data(dentry);
3462     + unionfs_read_unlock(dentry->d_sb);
3463     + return;
3464     +}
3465     +
3466     +/*
3467     + * Called when we're removing the last reference to our dentry. So we
3468     + * should drop all lower references too.
3469     + */
3470     +static void unionfs_d_iput(struct dentry *dentry, struct inode *inode)
3471     +{
3472     + int rc;
3473     +
3474     + BUG_ON(!dentry);
3475     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3476     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3477     +
3478     + if (!UNIONFS_D(dentry) || dbstart(dentry) < 0)
3479     + goto drop_lower_inodes;
3480     + path_put_lowers_all(dentry, false);
3481     +
3482     +drop_lower_inodes:
3483     + rc = atomic_read(&inode->i_count);
3484     + if (rc == 1 && inode->i_nlink == 1 && ibstart(inode) >= 0) {
3485     + /* see Documentation/filesystems/unionfs/issues.txt */
3486     + lockdep_off();
3487     + iput(unionfs_lower_inode(inode));
3488     + lockdep_on();
3489     + unionfs_set_lower_inode(inode, NULL);
3490     + /* XXX: may need to set start/end to -1? */
3491     + }
3492     +
3493     + iput(inode);
3494     +
3495     + unionfs_unlock_dentry(dentry);
3496     + unionfs_read_unlock(dentry->d_sb);
3497     +}
3498     +
3499     +struct dentry_operations unionfs_dops = {
3500     + .d_revalidate = unionfs_d_revalidate,
3501     + .d_release = unionfs_d_release,
3502     + .d_iput = unionfs_d_iput,
3503     +};
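
The comment on is_newer_lower() in the dentry.c hunk above describes a two-level time check: any lower mtime or ctime that is strictly newer than the unionfs inode's copy forces revalidation, but a message is printed only when the gap exceeds UNIONFS_MIN_CC_TIME. A minimal userspace sketch of that comparison follows. It is an illustration only: `struct ts`, `ts_compare`, `lower_is_newer`, `change_is_significant` and `MIN_CC_TIME` are hypothetical stand-ins rather than identifiers from the patch, and the value 3 is taken from the default quoted in the comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_CC_TIME 3 /* seconds; assumed from the comment's stated default */

/* Hypothetical stand-in for the kernel's struct timespec. */
struct ts {
	long sec;
	long nsec;
};

/* Same ordering contract as timespec_compare(): <0, 0 or >0. */
static int ts_compare(const struct ts *a, const struct ts *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* True if the lower timestamp is strictly newer than the cached upper one. */
static bool lower_is_newer(const struct ts *upper, const struct ts *lower)
{
	return ts_compare(upper, lower) < 0;
}

/*
 * True if a detected change is large enough to be worth logging.
 * Only meaningful once lower_is_newer() has reported a change.
 */
static bool change_is_significant(const struct ts *upper,
				  const struct ts *lower)
{
	assert(lower_is_newer(upper, lower));
	return (lower->sec - upper->sec) > MIN_CC_TIME;
}
```

Under these assumptions, a lower file touched a few nanoseconds after the cached time still counts as newer (so the dentry is revalidated), but it stays below the logging threshold.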
3504     diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
3505     new file mode 100644
3506     index 0000000..7da0ff0
3507     --- /dev/null
3508     +++ b/fs/unionfs/dirfops.c
3509     @@ -0,0 +1,302 @@
3510     +/*
3511     + * Copyright (c) 2003-2010 Erez Zadok
3512     + * Copyright (c) 2003-2006 Charles P. Wright
3513     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3514     + * Copyright (c) 2005-2006 Junjiro Okajima
3515     + * Copyright (c) 2005 Arun M. Krishnakumar
3516     + * Copyright (c) 2004-2006 David P. Quigley
3517     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3518     + * Copyright (c) 2003 Puja Gupta
3519     + * Copyright (c) 2003 Harikesavan Krishnan
3520     + * Copyright (c) 2003-2010 Stony Brook University
3521     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3522     + *
3523     + * This program is free software; you can redistribute it and/or modify
3524     + * it under the terms of the GNU General Public License version 2 as
3525     + * published by the Free Software Foundation.
3526     + */
3527     +
3528     +#include "union.h"
3529     +
3530     +/* Make sure our rdstate is playing by the rules. */
3531     +static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
3532     +{
3533     + BUG_ON(rdstate->offset >= DIREOF);
3534     + BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
3535     +}
3536     +
3537     +struct unionfs_getdents_callback {
3538     + struct unionfs_dir_state *rdstate;
3539     + void *dirent;
3540     + int entries_written;
3541     + int filldir_called;
3542     + int filldir_error;
3543     + filldir_t filldir;
3544     + struct super_block *sb;
3545     +};
3546     +
3547     +/* based on generic filldir in fs/readdir.c */
3548     +static int unionfs_filldir(void *dirent, const char *oname, int namelen,
3549     + loff_t offset, u64 ino, unsigned int d_type)
3550     +{
3551     + struct unionfs_getdents_callback *buf = dirent;
3552     + struct filldir_node *found = NULL;
3553     + int err = 0;
3554     + int is_whiteout;
3555     + char *name = (char *) oname;
3556     +
3557     + buf->filldir_called++;
3558     +
3559     + is_whiteout = is_whiteout_name(&name, &namelen);
3560     +
3561     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3562     +
3563     + if (found) {
3564     + /*
3565     + * If we had a non-whiteout entry in the dir cache, then mark
3566     + * it as a whiteout but leave it in the dir cache.
3567     + */
3568     + if (is_whiteout && !found->whiteout)
3569     + found->whiteout = is_whiteout;
3570     + goto out;
3571     + }
3572     +
3573     + /* if 'name' isn't a whiteout, filldir it. */
3574     + if (!is_whiteout) {
3575     + off_t pos = rdstate2offset(buf->rdstate);
3576     + u64 unionfs_ino = ino;
3577     +
3578     + err = buf->filldir(buf->dirent, name, namelen, pos,
3579     + unionfs_ino, d_type);
3580     + buf->rdstate->offset++;
3581     + verify_rdstate_offset(buf->rdstate);
3582     + }
3583     + /*
3584     + * If we did fill it, stuff it in our hash, otherwise return an
3585     + * error.
3586     + */
3587     + if (err) {
3588     + buf->filldir_error = err;
3589     + goto out;
3590     + }
3591     + buf->entries_written++;
3592     + err = add_filldir_node(buf->rdstate, name, namelen,
3593     + buf->rdstate->bindex, is_whiteout);
3594     + if (err)
3595     + buf->filldir_error = err;
3596     +
3597     +out:
3598     + return err;
3599     +}
3600     +
3601     +static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
3602     +{
3603     + int err = 0;
3604     + struct file *lower_file = NULL;
3605     + struct dentry *dentry = file->f_path.dentry;
3606     + struct dentry *parent;
3607     + struct inode *inode = NULL;
3608     + struct unionfs_getdents_callback buf;
3609     + struct unionfs_dir_state *uds;
3610     + int bend;
3611     + loff_t offset;
3612     +
3613     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3614     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3615     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3616     +
3617     + err = unionfs_file_revalidate(file, parent, false);
3618     + if (unlikely(err))
3619     + goto out;
3620     +
3621     + inode = dentry->d_inode;
3622     +
3623     + uds = UNIONFS_F(file)->rdstate;
3624     + if (!uds) {
3625     + if (file->f_pos == DIREOF) {
3626     + goto out;
3627     + } else if (file->f_pos > 0) {
3628     + uds = find_rdstate(inode, file->f_pos);
3629     + if (unlikely(!uds)) {
3630     + err = -ESTALE;
3631     + goto out;
3632     + }
3633     + UNIONFS_F(file)->rdstate = uds;
3634     + } else {
3635     + init_rdstate(file);
3636     + uds = UNIONFS_F(file)->rdstate;
3637     + }
3638     + }
3639     + bend = fbend(file);
3640     +
3641     + while (uds->bindex <= bend) {
3642     + lower_file = unionfs_lower_file_idx(file, uds->bindex);
3643     + if (!lower_file) {
3644     + uds->bindex++;
3645     + uds->dirpos = 0;
3646     + continue;
3647     + }
3648     +
3649     + /* prepare callback buffer */
3650     + buf.filldir_called = 0;
3651     + buf.filldir_error = 0;
3652     + buf.entries_written = 0;
3653     + buf.dirent = dirent;
3654     + buf.filldir = filldir;
3655     + buf.rdstate = uds;
3656     + buf.sb = inode->i_sb;
3657     +
3658     + /* Read starting from where we last left off. */
3659     + offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET);
3660     + if (offset < 0) {
3661     + err = offset;
3662     + goto out;
3663     + }
3664     + err = vfs_readdir(lower_file, unionfs_filldir, &buf);
3665     +
3666     + /* Save the position for when we continue. */
3667     + offset = vfs_llseek(lower_file, 0, SEEK_CUR);
3668     + if (offset < 0) {
3669     + err = offset;
3670     + goto out;
3671     + }
3672     + uds->dirpos = offset;
3673     +
3674     + /* Copy the atime. */
3675     + fsstack_copy_attr_atime(inode,
3676     + lower_file->f_path.dentry->d_inode);
3677     +
3678     + if (err < 0)
3679     + goto out;
3680     +
3681     + if (buf.filldir_error)
3682     + break;
3683     +
3684     + if (!buf.entries_written) {
3685     + uds->bindex++;
3686     + uds->dirpos = 0;
3687     + }
3688     + }
3689     +
3690     + if (!buf.filldir_error && uds->bindex >= bend) {
3691     + /* Save the number of hash entries for next time. */
3692     + UNIONFS_I(inode)->hashsize = uds->hashentries;
3693     + free_rdstate(uds);
3694     + UNIONFS_F(file)->rdstate = NULL;
3695     + file->f_pos = DIREOF;
3696     + } else {
3697     + file->f_pos = rdstate2offset(uds);
3698     + }
3699     +
3700     +out:
3701     + if (!err)
3702     + unionfs_check_file(file);
3703     + unionfs_unlock_dentry(dentry);
3704     + unionfs_unlock_parent(dentry, parent);
3705     + unionfs_read_unlock(dentry->d_sb);
3706     + return err;
3707     +}
3708     +
3709     +/*
3710     + * This is not meant to be a generic repositioning function. If you do
3711     + * things that aren't supported, then we return EINVAL.
3712     + *
3713     + * What is allowed:
3714     + * (1) seeking to the same position that you are currently at
3715     + * This really has no effect, but returns where you are.
3716     + * (2) seeking to the beginning of the file
3717     + * This throws out all state, and lets you begin again.
3718     + */
3719     +static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
3720     +{
3721     + struct unionfs_dir_state *rdstate;
3722     + struct dentry *dentry = file->f_path.dentry;
3723     + struct dentry *parent;
3724     + loff_t err;
3725     +
3726     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3727     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3728     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3729     +
3730     + err = unionfs_file_revalidate(file, parent, false);
3731     + if (unlikely(err))
3732     + goto out;
3733     +
3734     + rdstate = UNIONFS_F(file)->rdstate;
3735     +
3736     + /*
3737     + * we let users seek to their current position, but not anywhere
3738     + * else.
3739     + */
3740     + if (!offset) {
3741     + switch (origin) {
3742     + case SEEK_SET:
3743     + if (rdstate) {
3744     + free_rdstate(rdstate);
3745     + UNIONFS_F(file)->rdstate = NULL;
3746     + }
3747     + init_rdstate(file);
3748     + err = 0;
3749     + break;
3750     + case SEEK_CUR:
3751     + err = file->f_pos;
3752     + break;
3753     + case SEEK_END:
3754     + /* Unsupported, because we would break everything. */
3755     + err = -EINVAL;
3756     + break;
3757     + }
3758     + } else {
3759     + switch (origin) {
3760     + case SEEK_SET:
3761     + if (rdstate) {
3762     + if (offset == rdstate2offset(rdstate))
3763     + err = offset;
3764     + else if (file->f_pos == DIREOF)
3765     + err = DIREOF;
3766     + else
3767     + err = -EINVAL;
3768     + } else {
3769     + struct inode *inode;
3770     + inode = dentry->d_inode;
3771     + rdstate = find_rdstate(inode, offset);
3772     + if (rdstate) {
3773     + UNIONFS_F(file)->rdstate = rdstate;
3774     + err = rdstate->offset;
3775     + } else {
3776     + err = -EINVAL;
3777     + }
3778     + }
3779     + break;
3780     + case SEEK_CUR:
3781     + case SEEK_END:
3782     + /* Unsupported, because we would break everything. */
3783     + err = -EINVAL;
3784     + break;
3785     + }
3786     + }
3787     +
3788     +out:
3789     + if (!err)
3790     + unionfs_check_file(file);
3791     + unionfs_unlock_dentry(dentry);
3792     + unionfs_unlock_parent(dentry, parent);
3793     + unionfs_read_unlock(dentry->d_sb);
3794     + return err;
3795     +}
3796     +
3797     +/*
3798     + * Trimmed directory operations; we shouldn't pass everything down since
3799     + * we don't want to operate on partial directories.
3800     + */
3801     +struct file_operations unionfs_dir_fops = {
3802     + .llseek = unionfs_dir_llseek,
3803     + .read = generic_read_dir,
3804     + .readdir = unionfs_readdir,
3805     + .unlocked_ioctl = unionfs_ioctl,
3806     + .open = unionfs_open,
3807     + .release = unionfs_file_release,
3808     + .flush = unionfs_flush,
3809     + .fsync = unionfs_fsync,
3810     + .fasync = unionfs_fasync,
3811     +};
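
The unionfs_filldir() callback in the dirfops.c hunk above merges entries from several branches: a name already recorded from a higher-priority branch is skipped, a whiteout hides the same name in lower branches, and only non-whiteout names are passed on to the caller. The following userspace sketch shows that merge rule in isolation. It assumes a ".wh." whiteout prefix and uses a fixed-size linear table instead of the rdstate hash; `merge_state`, `merge_entry`, `find_seen`, `WH_PREFIX` and `MAX_SEEN` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX ".wh." /* assumed whiteout prefix */
#define MAX_SEEN 64      /* arbitrary table size for the sketch */

struct seen_entry {
	const char *name;
	bool whiteout;
};

struct merge_state {
	struct seen_entry seen[MAX_SEEN]; /* every name recorded so far */
	int nseen;
	const char *out[MAX_SEEN];        /* names shown to the caller */
	int nout;
};

static struct seen_entry *find_seen(struct merge_state *st, const char *name)
{
	for (int i = 0; i < st->nseen; i++)
		if (strcmp(st->seen[i].name, name) == 0)
			return &st->seen[i];
	return NULL;
}

/*
 * Feed one raw entry from a lower directory. Branches must be fed in
 * priority order (highest first). Returns 0, or -1 if the table is full.
 */
static int merge_entry(struct merge_state *st, const char *raw)
{
	assert(st && raw);

	bool wh = strncmp(raw, WH_PREFIX, strlen(WH_PREFIX)) == 0;
	const char *name = wh ? raw + strlen(WH_PREFIX) : raw;
	struct seen_entry *found = find_seen(st, name);

	if (found) {
		/* Already handled by a higher branch; keep it, maybe as whiteout. */
		if (wh)
			found->whiteout = true;
		return 0;
	}
	if (st->nseen == MAX_SEEN)
		return -1;
	if (!wh)
		st->out[st->nout++] = name; /* visible in the merged listing */
	st->seen[st->nseen].name = name;
	st->seen[st->nseen].whiteout = wh;
	st->nseen++;
	return 0;
}
```

Feeding a top branch that contains "a" and ".wh.b", then a lower branch that contains "a", "b" and "c", leaves only "a" and "c" in the merged output under these assumptions.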
3812     diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
3813     new file mode 100644
3814     index 0000000..033343b
3815     --- /dev/null
3816     +++ b/fs/unionfs/dirhelper.c
3817     @@ -0,0 +1,158 @@
3818     +/*
3819     + * Copyright (c) 2003-2010 Erez Zadok
3820     + * Copyright (c) 2003-2006 Charles P. Wright
3821     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3822     + * Copyright (c) 2005-2006 Junjiro Okajima
3823     + * Copyright (c) 2005 Arun M. Krishnakumar
3824     + * Copyright (c) 2004-2006 David P. Quigley
3825     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3826     + * Copyright (c) 2003 Puja Gupta
3827     + * Copyright (c) 2003 Harikesavan Krishnan
3828     + * Copyright (c) 2003-2010 Stony Brook University
3829     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3830     + *
3831     + * This program is free software; you can redistribute it and/or modify
3832     + * it under the terms of the GNU General Public License version 2 as
3833     + * published by the Free Software Foundation.
3834     + */
3835     +
3836     +#include "union.h"
3837     +
3838     +#define RD_NONE 0
3839     +#define RD_CHECK_EMPTY 1
3840     +/* The callback structure for check_empty. */
3841     +struct unionfs_rdutil_callback {
3842     + int err;
3843     + int filldir_called;
3844     + struct unionfs_dir_state *rdstate;
3845     + int mode;
3846     +};
3847     +
3848     +/* This filldir function makes sure only whiteouts exist within a directory. */
3849     +static int readdir_util_callback(void *dirent, const char *oname, int namelen,
3850     + loff_t offset, u64 ino, unsigned int d_type)
3851     +{
3852     + int err = 0;
3853     + struct unionfs_rdutil_callback *buf = dirent;
3854     + int is_whiteout;
3855     + struct filldir_node *found;
3856     + char *name = (char *) oname;
3857     +
3858     + buf->filldir_called = 1;
3859     +
3860     + if (name[0] == '.' && (namelen == 1 ||
3861     + (name[1] == '.' && namelen == 2)))
3862     + goto out;
3863     +
3864     + is_whiteout = is_whiteout_name(&name, &namelen);
3865     +
3866     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3867     + /* If it was found in the table there was a previous whiteout. */
3868     + if (found)
3869     + goto out;
3870     +
3871     + /*
3872     + * if it wasn't found and isn't a whiteout, the directory isn't
3873     + * empty.
3874     + */
3875     + err = -ENOTEMPTY;
3876     + if ((buf->mode == RD_CHECK_EMPTY) && !is_whiteout)
3877     + goto out;
3878     +
3879     + err = add_filldir_node(buf->rdstate, name, namelen,
3880     + buf->rdstate->bindex, is_whiteout);
3881     +
3882     +out:
3883     + buf->err = err;
3884     + return err;
3885     +}
3886     +
3887     +/* Is a directory logically empty? */
3888     +int check_empty(struct dentry *dentry, struct dentry *parent,
3889     + struct unionfs_dir_state **namelist)
3890     +{
3891     + int err = 0;
3892     + struct dentry *lower_dentry = NULL;
3893     + struct vfsmount *mnt;
3894     + struct super_block *sb;
3895     + struct file *lower_file;
3896     + struct unionfs_rdutil_callback *buf = NULL;
3897     + int bindex, bstart, bend, bopaque;
3898     +
3899     + sb = dentry->d_sb;
3900     +
3901     +
3902     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3903     +
3904     + err = unionfs_partial_lookup(dentry, parent);
3905     + if (err)
3906     + goto out;
3907     +
3908     + bstart = dbstart(dentry);
3909     + bend = dbend(dentry);
3910     + bopaque = dbopaque(dentry);
3911     + if (0 <= bopaque && bopaque < bend)
3912     + bend = bopaque;
3913     +
3914     + buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL);
3915     + if (unlikely(!buf)) {
3916     + err = -ENOMEM;
3917     + goto out;
3918     + }
3919     + buf->err = 0;
3920     + buf->mode = RD_CHECK_EMPTY;
3921     + buf->rdstate = alloc_rdstate(dentry->d_inode, bstart);
3922     + if (unlikely(!buf->rdstate)) {
3923     + err = -ENOMEM;
3924     + goto out;
3925     + }
3926     +
3927     + /* Process the lower directories with rdutil_callback as a filldir. */
3928     + for (bindex = bstart; bindex <= bend; bindex++) {
3929     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3930     + if (!lower_dentry)
3931     + continue;
3932     + if (!lower_dentry->d_inode)
3933     + continue;
3934     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
3935     + continue;
3936     +
3937     + dget(lower_dentry);
3938     + mnt = unionfs_mntget(dentry, bindex);
3939     + branchget(sb, bindex);
3940     + lower_file = dentry_open(lower_dentry, mnt, O_RDONLY, current_cred());
3941     + if (IS_ERR(lower_file)) {
3942     + err = PTR_ERR(lower_file);
3943     + branchput(sb, bindex);
3944     + goto out;
3945     + }
3946     +
3947     + do {
3948     + buf->filldir_called = 0;
3949     + buf->rdstate->bindex = bindex;
3950     + err = vfs_readdir(lower_file,
3951     + readdir_util_callback, buf);
3952     + if (buf->err)
3953     + err = buf->err;
3954     + } while ((err >= 0) && buf->filldir_called);
3955     +
3956     + /* fput calls dput for lower_dentry */
3957     + fput(lower_file);
3958     + branchput(sb, bindex);
3959     +
3960     + if (err < 0)
3961     + goto out;
3962     + }
3963     +
3964     +out:
3965     + if (buf) {
3966     + if (namelist && !err)
3967     + *namelist = buf->rdstate;
3968     + else if (buf->rdstate)
3969     + free_rdstate(buf->rdstate);
3970     + kfree(buf);
3971     + }
3972     +
3973     +
3974     + return err;
3975     +}
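
check_empty() in the dirhelper.c hunk above treats a directory as logically empty when every entry found in its lower branches is ".", "..", a whiteout, or a name already hidden by a whiteout from a higher-priority branch. A rough userspace sketch of that rule is given below. It assumes a ".wh." whiteout prefix and represents each branch as a NULL-terminated array of names; `logically_empty`, `is_dot_or_dotdot`, `WH_PREFIX` and `MAX_NAMES` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX ".wh." /* assumed whiteout prefix */
#define MAX_NAMES 64     /* arbitrary limit for the sketch */

static bool is_dot_or_dotdot(const char *n)
{
	return strcmp(n, ".") == 0 || strcmp(n, "..") == 0;
}

/*
 * branches[b] is a NULL-terminated list of raw names found in branch b,
 * with index 0 being the highest-priority branch.
 */
static bool logically_empty(const char *const *const *branches,
			    size_t nbranches)
{
	const char *whited_out[MAX_NAMES]; /* names hidden by a whiteout */
	size_t nwh = 0;

	for (size_t b = 0; b < nbranches; b++) {
		for (const char *const *p = branches[b]; *p; p++) {
			const char *name = *p;
			bool wh, masked = false;

			if (is_dot_or_dotdot(name))
				continue;
			wh = strncmp(name, WH_PREFIX, strlen(WH_PREFIX)) == 0;
			if (wh)
				name += strlen(WH_PREFIX);
			for (size_t i = 0; i < nwh; i++)
				if (strcmp(whited_out[i], name) == 0)
					masked = true;
			if (masked)
				continue;     /* hidden by a higher branch */
			if (!wh)
				return false; /* a real, visible entry exists */
			assert(nwh < MAX_NAMES);
			whited_out[nwh++] = name;
		}
	}
	return true;
}
```

Unlike the kernel code, the sketch returns a plain boolean rather than -ENOTEMPTY and does not hand the collected name list back to the caller.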
3976     diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
3977     new file mode 100644
3978     index 0000000..5b77eac
3979     --- /dev/null
3980     +++ b/fs/unionfs/fanout.h
3981     @@ -0,0 +1,407 @@
3982     +/*
3983     + * Copyright (c) 2003-2010 Erez Zadok
3984     + * Copyright (c) 2003-2006 Charles P. Wright
3985     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3986     + * Copyright (c) 2005 Arun M. Krishnakumar
3987     + * Copyright (c) 2004-2006 David P. Quigley
3988     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3989     + * Copyright (c) 2003 Puja Gupta
3990     + * Copyright (c) 2003 Harikesavan Krishnan
3991     + * Copyright (c) 2003-2010 Stony Brook University
3992     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3993     + *
3994     + * This program is free software; you can redistribute it and/or modify
3995     + * it under the terms of the GNU General Public License version 2 as
3996     + * published by the Free Software Foundation.
3997     + */
3998     +
3999     +#ifndef _FANOUT_H_
4000     +#define _FANOUT_H_
4001     +
4002     +/*
4003     + * Inode to private data
4004     + *
4005     + * Since we use containers and the struct inode is _inside_ the
4006     + * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
4007     + * inode pointer), return a valid non-NULL pointer.
4008     + */
4009     +static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
4010     +{
4011     + return container_of(inode, struct unionfs_inode_info, vfs_inode);
4012     +}
4013     +
4014     +#define ibstart(ino) (UNIONFS_I(ino)->bstart)
4015     +#define ibend(ino) (UNIONFS_I(ino)->bend)
4016     +
4017     +/* Dentry to private data */
4018     +#define UNIONFS_D(dent) ((struct unionfs_dentry_info *)(dent)->d_fsdata)
4019     +#define dbstart(dent) (UNIONFS_D(dent)->bstart)
4020     +#define dbend(dent) (UNIONFS_D(dent)->bend)
4021     +#define dbopaque(dent) (UNIONFS_D(dent)->bopaque)
4022     +
4023     +/* Superblock to private data */
4024     +#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
4025     +#define sbstart(sb) 0
4026     +#define sbend(sb) (UNIONFS_SB(sb)->bend)
4027     +#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
4028     +#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
4029     +
4030     +/* File to private data */
4031     +#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
4032     +#define fbstart(file) (UNIONFS_F(file)->bstart)
4033     +#define fbend(file) (UNIONFS_F(file)->bend)
4034     +
4035     +/* macros to manipulate branch IDs stored in our superblock */
4036     +static inline int branch_id(struct super_block *sb, int index)
4037     +{
4038     + BUG_ON(!sb || index < 0);
4039     + return UNIONFS_SB(sb)->data[index].branch_id;
4040     +}
4041     +
4042     +static inline void set_branch_id(struct super_block *sb, int index, int val)
4043     +{
4044     + BUG_ON(!sb || index < 0);
4045     + UNIONFS_SB(sb)->data[index].branch_id = val;
4046     +}
4047     +
4048     +static inline void new_branch_id(struct super_block *sb, int index)
4049     +{
4050     + BUG_ON(!sb || index < 0);
4051     + set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
4052     +}
4053     +
4054     +/*
4055     + * Find new index of matching branch with an existing superblock of a known
4056     + * (possibly old) id. This is needed because branches could have been
4057     + * added/deleted causing the branches of any open files to shift.
4058     + *
4059     + * @sb: the new superblock which may have new/different branch IDs
4060     + * @id: the old/existing id we're looking for
4061     + * Returns index of newly found branch (0 or greater), -1 otherwise.
4062     + */
4063     +static inline int branch_id_to_idx(struct super_block *sb, int id)
4064     +{
4065     + int i;
4066     + for (i = 0; i < sbmax(sb); i++) {
4067     + if (branch_id(sb, i) == id)
4068     + return i;
4069     + }
4070     + /* in the non-ODF code, this should really never happen */
4071     + printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
4072     + return -1;
4073     +}
4074     +
4075     +/* File to lower file. */
4076     +static inline struct file *unionfs_lower_file(const struct file *f)
4077     +{
4078     + BUG_ON(!f);
4079     + return UNIONFS_F(f)->lower_files[fbstart(f)];
4080     +}
4081     +
4082     +static inline struct file *unionfs_lower_file_idx(const struct file *f,
4083     + int index)
4084     +{
4085     + BUG_ON(!f || index < 0);
4086     + return UNIONFS_F(f)->lower_files[index];
4087     +}
4088     +
4089     +static inline void unionfs_set_lower_file_idx(struct file *f, int index,
4090     + struct file *val)
4091     +{
4092     + BUG_ON(!f || index < 0);
4093     + UNIONFS_F(f)->lower_files[index] = val;
4094     + /* save branch ID (may be redundant?) */
4095     + UNIONFS_F(f)->saved_branch_ids[index] =
4096     + branch_id((f)->f_path.dentry->d_sb, index);
4097     +}
4098     +
4099     +static inline void unionfs_set_lower_file(struct file *f, struct file *val)
4100     +{
4101     + BUG_ON(!f);
4102     + unionfs_set_lower_file_idx((f), fbstart(f), (val));
4103     +}
4104     +
4105     +/* Inode to lower inode. */
4106     +static inline struct inode *unionfs_lower_inode(const struct inode *i)
4107     +{
4108     + BUG_ON(!i);
4109     + return UNIONFS_I(i)->lower_inodes[ibstart(i)];
4110     +}
4111     +
4112     +static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
4113     + int index)
4114     +{
4115     + BUG_ON(!i || index < 0);
4116     + return UNIONFS_I(i)->lower_inodes[index];
4117     +}
4118     +
4119     +static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
4120     + struct inode *val)
4121     +{
4122     + BUG_ON(!i || index < 0);
4123     + UNIONFS_I(i)->lower_inodes[index] = val;
4124     +}
4125     +
4126     +static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
4127     +{
4128     + BUG_ON(!i);
4129     + UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
4130     +}
4131     +
4132     +/* Superblock to lower superblock. */
4133     +static inline struct super_block *unionfs_lower_super(
4134     + const struct super_block *sb)
4135     +{
4136     + BUG_ON(!sb);
4137     + return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
4138     +}
4139     +
4140     +static inline struct super_block *unionfs_lower_super_idx(
4141     + const struct super_block *sb,
4142     + int index)
4143     +{
4144     + BUG_ON(!sb || index < 0);
4145     + return UNIONFS_SB(sb)->data[index].sb;
4146     +}
4147     +
4148     +static inline void unionfs_set_lower_super_idx(struct super_block *sb,
4149     + int index,
4150     + struct super_block *val)
4151     +{
4152     + BUG_ON(!sb || index < 0);
4153     + UNIONFS_SB(sb)->data[index].sb = val;
4154     +}
4155     +
4156     +static inline void unionfs_set_lower_super(struct super_block *sb,
4157     + struct super_block *val)
4158     +{
4159     + BUG_ON(!sb);
4160     + UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
4161     +}
4162     +
4163     +/* Branch count helpers. */
4164     +static inline int branch_count(const struct super_block *sb, int index)
4165     +{
4166     + BUG_ON(!sb || index < 0);
4167     + return atomic_read(&UNIONFS_SB(sb)->data[index].open_files);
4168     +}
4169     +
4170     +static inline void set_branch_count(struct super_block *sb, int index, int val)
4171     +{
4172     + BUG_ON(!sb || index < 0);
4173     + atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val);
4174     +}
4175     +
4176     +static inline void branchget(struct super_block *sb, int index)
4177     +{
4178     + BUG_ON(!sb || index < 0);
4179     + atomic_inc(&UNIONFS_SB(sb)->data[index].open_files);
4180     +}
4181     +
4182     +static inline void branchput(struct super_block *sb, int index)
4183     +{
4184     + BUG_ON(!sb || index < 0);
4185     + atomic_dec(&UNIONFS_SB(sb)->data[index].open_files);
4186     +}
4187     +
4188     +/* Dentry helpers */
4189     +static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index,
4190     + struct dentry *val)
4191     +{
4192     + BUG_ON(!dent || index < 0);
4193     + UNIONFS_D(dent)->lower_paths[index].dentry = val;
4194     +}
4195     +
4196     +static inline struct dentry *unionfs_lower_dentry_idx(
4197     + const struct dentry *dent,
4198     + int index)
4199     +{
4200     + BUG_ON(!dent || index < 0);
4201     + return UNIONFS_D(dent)->lower_paths[index].dentry;
4202     +}
4203     +
4204     +static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent)
4205     +{
4206     + BUG_ON(!dent);
4207     + return unionfs_lower_dentry_idx(dent, dbstart(dent));
4208     +}
4209     +
4210     +static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index,
4211     + struct vfsmount *mnt)
4212     +{
4213     + BUG_ON(!dent || index < 0);
4214     + UNIONFS_D(dent)->lower_paths[index].mnt = mnt;
4215     +}
4216     +
4217     +static inline struct vfsmount *unionfs_lower_mnt_idx(
4218     + const struct dentry *dent,
4219     + int index)
4220     +{
4221     + BUG_ON(!dent || index < 0);
4222     + return UNIONFS_D(dent)->lower_paths[index].mnt;
4223     +}
4224     +
4225     +static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent)
4226     +{
4227     + BUG_ON(!dent);
4228     + return unionfs_lower_mnt_idx(dent, dbstart(dent));
4229     +}
4230     +
4231     +/* Helpers for locking a dentry. */
4232     +enum unionfs_dentry_lock_class {
4233     + UNIONFS_DMUTEX_NORMAL,
4234     + UNIONFS_DMUTEX_ROOT,
4235     + UNIONFS_DMUTEX_PARENT,
4236     + UNIONFS_DMUTEX_CHILD,
4237     + UNIONFS_DMUTEX_WHITEOUT,
4238     + UNIONFS_DMUTEX_REVAL_PARENT, /* for file/dentry revalidate */
4239     + UNIONFS_DMUTEX_REVAL_CHILD, /* for file/dentry revalidate */
4240     +};
4241     +
4242     +static inline void unionfs_lock_dentry(struct dentry *d,
4243     + unsigned int subclass)
4244     +{
4245     + BUG_ON(!d);
4246     + mutex_lock_nested(&UNIONFS_D(d)->lock, subclass);
4247     +}
4248     +
4249     +static inline void unionfs_unlock_dentry(struct dentry *d)
4250     +{
4251     + BUG_ON(!d);
4252     + mutex_unlock(&UNIONFS_D(d)->lock);
4253     +}
4254     +
4255     +static inline struct dentry *unionfs_lock_parent(struct dentry *d,
4256     + unsigned int subclass)
4257     +{
4258     + struct dentry *p;
4259     +
4260     + BUG_ON(!d);
4261     + p = dget_parent(d);
4262     + if (p != d)
4263     + mutex_lock_nested(&UNIONFS_D(p)->lock, subclass);
4264     + return p;
4265     +}
4266     +
4267     +static inline void unionfs_unlock_parent(struct dentry *d, struct dentry *p)
4268     +{
4269     + BUG_ON(!d);
4270     + BUG_ON(!p);
4271     + if (p != d) {
4272     + BUG_ON(!mutex_is_locked(&UNIONFS_D(p)->lock));
4273     + mutex_unlock(&UNIONFS_D(p)->lock);
4274     + }
4275     + dput(p);
4276     +}
4277     +
4278     +static inline void verify_locked(struct dentry *d)
4279     +{
4280     + BUG_ON(!d);
4281     + BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock));
4282     +}
4283     +
4284     +/* helpers to put lower objects */
4285     +
4286     +/*
4287     + * iput lower inodes of a unionfs inode, from bstart to bend. If
4288     + * @free_lower is true, then also kfree the memory used to hold the lower
4289     + * object pointers.
4290     + */
4291     +static inline void iput_lowers(struct inode *inode,
4292     + int bstart, int bend, bool free_lower)
4293     +{
4294     + struct inode *lower_inode;
4295     + int bindex;
4296     +
4297     + BUG_ON(!inode);
4298     + BUG_ON(!UNIONFS_I(inode));
4299     + BUG_ON(bstart < 0);
4300     +
4301     + for (bindex = bstart; bindex <= bend; bindex++) {
4302     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4303     + if (lower_inode) {
4304     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
4305     + /* see Documentation/filesystems/unionfs/issues.txt */
4306     + lockdep_off();
4307     + iput(lower_inode);
4308     + lockdep_on();
4309     + }
4310     + }
4311     +
4312     + if (free_lower) {
4313     + kfree(UNIONFS_I(inode)->lower_inodes);
4314     + UNIONFS_I(inode)->lower_inodes = NULL;
4315     + }
4316     +}
4317     +
4318     +/* iput all lower inodes, and reset start/end branch indices to -1 */
4319     +static inline void iput_lowers_all(struct inode *inode, bool free_lower)
4320     +{
4321     + int bstart, bend;
4322     +
4323     + BUG_ON(!inode);
4324     + BUG_ON(!UNIONFS_I(inode));
4325     + bstart = ibstart(inode);
4326     + bend = ibend(inode);
4327     + BUG_ON(bstart < 0);
4328     +
4329     + iput_lowers(inode, bstart, bend, free_lower);
4330     + ibstart(inode) = ibend(inode) = -1;
4331     +}
4332     +
4333     +/*
4334     + * dput/mntput all lower dentries and vfsmounts of a unionfs dentry, from
4335     + * bstart to bend. If @free_lower is true, then also kfree the memory used
4336     + * to hold the lower object pointers.
4337     + *
4338     + * XXX: implement using path_put VFS macros
4339     + */
4340     +static inline void path_put_lowers(struct dentry *dentry,
4341     + int bstart, int bend, bool free_lower)
4342     +{
4343     + struct dentry *lower_dentry;
4344     + struct vfsmount *lower_mnt;
4345     + int bindex;
4346     +
4347     + BUG_ON(!dentry);
4348     + BUG_ON(!UNIONFS_D(dentry));
4349     + BUG_ON(bstart < 0);
4350     +
4351     + for (bindex = bstart; bindex <= bend; bindex++) {
4352     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4353     + if (lower_dentry) {
4354     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
4355     + dput(lower_dentry);
4356     + }
4357     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
4358     + if (lower_mnt) {
4359     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
4360     + mntput(lower_mnt);
4361     + }
4362     + }
4363     +
4364     + if (free_lower) {
4365     + kfree(UNIONFS_D(dentry)->lower_paths);
4366     + UNIONFS_D(dentry)->lower_paths = NULL;
4367     + }
4368     +}
4369     +
4370     +/*
4371     + * dput/mntput all lower dentries and vfsmounts, and reset start/end branch
4372     + * indices to -1.
4373     + */
4374     +static inline void path_put_lowers_all(struct dentry *dentry, bool free_lower)
4375     +{
4376     + int bstart, bend;
4377     +
4378     + BUG_ON(!dentry);
4379     + BUG_ON(!UNIONFS_D(dentry));
4380     + bstart = dbstart(dentry);
4381     + bend = dbend(dentry);
4382     + BUG_ON(bstart < 0);
4383     +
4384     + path_put_lowers(dentry, bstart, bend, free_lower);
4385     + dbstart(dentry) = dbend(dentry) = -1;
4386     +}
4387     +
4388     +#endif /* not _FANOUT_H */
4389     diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
4390     new file mode 100644
4391     index 0000000..46eaa90
4392     --- /dev/null
4393     +++ b/fs/unionfs/file.c
4394     @@ -0,0 +1,380 @@
4395     +/*
4396     + * Copyright (c) 2003-2010 Erez Zadok
4397     + * Copyright (c) 2003-2006 Charles P. Wright
4398     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4399     + * Copyright (c) 2005-2006 Junjiro Okajima
4400     + * Copyright (c) 2005 Arun M. Krishnakumar
4401     + * Copyright (c) 2004-2006 David P. Quigley
4402     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4403     + * Copyright (c) 2003 Puja Gupta
4404     + * Copyright (c) 2003 Harikesavan Krishnan
4405     + * Copyright (c) 2003-2010 Stony Brook University
4406     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
4407     + *
4408     + * This program is free software; you can redistribute it and/or modify
4409     + * it under the terms of the GNU General Public License version 2 as
4410     + * published by the Free Software Foundation.
4411     + */
4412     +
4413     +#include "union.h"
4414     +
4415     +static ssize_t unionfs_read(struct file *file, char __user *buf,
4416     + size_t count, loff_t *ppos)
4417     +{
4418     + int err;
4419     + struct file *lower_file;
4420     + struct dentry *dentry = file->f_path.dentry;
4421     + struct dentry *parent;
4422     +
4423     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4424     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4425     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4426     +
4427     + err = unionfs_file_revalidate(file, parent, false);
4428     + if (unlikely(err))
4429     + goto out;
4430     +
4431     + lower_file = unionfs_lower_file(file);
4432     + err = vfs_read(lower_file, buf, count, ppos);
4433     + /* update our inode atime upon a successful lower read */
4434     + if (err >= 0) {
4435     + fsstack_copy_attr_atime(dentry->d_inode,
4436     + lower_file->f_path.dentry->d_inode);
4437     + unionfs_check_file(file);
4438     + }
4439     +
4440     +out:
4441     + unionfs_unlock_dentry(dentry);
4442     + unionfs_unlock_parent(dentry, parent);
4443     + unionfs_read_unlock(dentry->d_sb);
4444     + return err;
4445     +}
4446     +
4447     +static ssize_t unionfs_write(struct file *file, const char __user *buf,
4448     + size_t count, loff_t *ppos)
4449     +{
4450     + int err = 0;
4451     + struct file *lower_file;
4452     + struct dentry *dentry = file->f_path.dentry;
4453     + struct dentry *parent;
4454     +
4455     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4456     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4457     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4458     +
4459     + err = unionfs_file_revalidate(file, parent, true);
4460     + if (unlikely(err))
4461     + goto out;
4462     +
4463     + lower_file = unionfs_lower_file(file);
4464     + err = vfs_write(lower_file, buf, count, ppos);
4465     + /* update our inode times+sizes upon a successful lower write */
4466     + if (err >= 0) {
4467     + fsstack_copy_inode_size(dentry->d_inode,
4468     + lower_file->f_path.dentry->d_inode);
4469     + fsstack_copy_attr_times(dentry->d_inode,
4470     + lower_file->f_path.dentry->d_inode);
4471     + UNIONFS_F(file)->wrote_to_file = true; /* for delayed copyup */
4472     + unionfs_check_file(file);
4473     + }
4474     +
4475     +out:
4476     + unionfs_unlock_dentry(dentry);
4477     + unionfs_unlock_parent(dentry, parent);
4478     + unionfs_read_unlock(dentry->d_sb);
4479     + return err;
4480     +}
4481     +
4482     +static int unionfs_file_readdir(struct file *file, void *dirent,
4483     + filldir_t filldir)
4484     +{
4485     + return -ENOTDIR;
4486     +}
4487     +
4488     +static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
4489     +{
4490     + int err = 0;
4491     + bool willwrite;
4492     + struct file *lower_file;
4493     + struct dentry *dentry = file->f_path.dentry;
4494     + struct dentry *parent;
4495     + const struct vm_operations_struct *saved_vm_ops = NULL;
4496     +
4497     + /*
4498     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
4499     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
4500     + * has been causing false positives in file system stacking layers.
4501     + * In particular, our ->mmap is called after sys_mmap2 already holds
4502     + * mmap_sem, then we lock our own mutexes; but earlier, it's
4503     + * possible for lockdep to have locked our mutexes first, and then
4504     + * we call a lower ->readdir which could call might_fault. The
4505     + * different ordering of the locks is what lockdep complains about
4506     + * -- unnecessarily. Therefore, we have no choice but to
4507     + * temporarily turn off lockdep here. Note: the comments
4508     + * inside might_sleep also suggest that it would have been
4509     + * nicer to only annotate paths that need that might_lock_read.
4510     + */
4511     + lockdep_off();
4512     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4513     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4514     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4515     +
4516     + /* This might be deferred to mmap's writepage */
4517     + willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
4518     + err = unionfs_file_revalidate(file, parent, willwrite);
4519     + if (unlikely(err))
4520     + goto out;
4521     + unionfs_check_file(file);
4522     +
4523     + /*
4524     + * File systems which do not implement ->writepage may use
4525     + * generic_file_readonly_mmap as their ->mmap op. If you call
4526     + * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
4527     + * But we cannot call the lower ->mmap op, so we can't tell that
4528     + * writeable mappings won't work. Therefore, our only choice is to
4529     + * check if the lower file system supports the ->writepage, and if
4530     + * not, return EINVAL (the same error that
4531     + * generic_file_readonly_mmap returns in that case).
4532     + */
4533     + lower_file = unionfs_lower_file(file);
4534     + if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
4535     + err = -EINVAL;
4536     + printk(KERN_ERR "unionfs: branch %d file system does not "
4537     + "support writeable mmap\n", fbstart(file));
4538     + goto out;
4539     + }
4540     +
4541     + /*
4542     + * find and save lower vm_ops.
4543     + *
4544     + * XXX: the VFS should have a cleaner way of finding the lower vm_ops
4545     + */
4546     + if (!UNIONFS_F(file)->lower_vm_ops) {
4547     + err = lower_file->f_op->mmap(lower_file, vma);
4548     + if (err) {
4549     + printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
4550     + goto out;
4551     + }
4552     + saved_vm_ops = vma->vm_ops;
4553     + err = do_munmap(current->mm, vma->vm_start,
4554     + vma->vm_end - vma->vm_start);
4555     + if (err) {
4556     + printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
4557     + goto out;
4558     + }
4559     + }
4560     +
4561     + file->f_mapping->a_ops = &unionfs_dummy_aops;
4562     + err = generic_file_mmap(file, vma);
4563     + file->f_mapping->a_ops = &unionfs_aops;
4564     + if (err) {
4565     + printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
4566     + goto out;
4567     + }
4568     + vma->vm_ops = &unionfs_vm_ops;
4569     + if (!UNIONFS_F(file)->lower_vm_ops)
4570     + UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
4571     +
4572     +out:
4573     + if (!err) {
4574     + /* copyup could cause parent dir times to change */
4575     + unionfs_copy_attr_times(parent->d_inode);
4576     + unionfs_check_file(file);
4577     + }
4578     + unionfs_unlock_dentry(dentry);
4579     + unionfs_unlock_parent(dentry, parent);
4580     + unionfs_read_unlock(dentry->d_sb);
4581     + lockdep_on();
4582     + return err;
4583     +}
4584     +
4585     +int unionfs_fsync(struct file *file, struct dentry *dentry, int datasync)
4586     +{
4587     + int bindex, bstart, bend;
4588     + struct file *lower_file;
4589     + struct dentry *lower_dentry;
4590     + struct dentry *parent;
4591     + struct inode *lower_inode, *inode;
4592     + int err = -EINVAL;
4593     +
4594     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4595     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4596     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4597     +
4598     + err = unionfs_file_revalidate(file, parent, true);
4599     + if (unlikely(err))
4600     + goto out;
4601     + unionfs_check_file(file);
4602     +
4603     + bstart = fbstart(file);
4604     + bend = fbend(file);
4605     + if (bstart < 0 || bend < 0)
4606     + goto out;
4607     +
4608     + inode = dentry->d_inode;
4609     + if (unlikely(!inode)) {
4610     + printk(KERN_ERR
4611     + "unionfs: null inode in unionfs_fsync\n");
4612     + goto out;
4613     + }
4614     + for (bindex = bstart; bindex <= bend; bindex++) {
4615     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4616     + if (!lower_inode || !lower_inode->i_fop->fsync)
4617     + continue;
4618     + lower_file = unionfs_lower_file_idx(file, bindex);
4619     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4620     + mutex_lock(&lower_inode->i_mutex);
4621     + err = lower_inode->i_fop->fsync(lower_file,
4622     + lower_dentry,
4623     + datasync);
4624     + if (!err && bindex == bstart)
4625     + fsstack_copy_attr_times(inode, lower_inode);
4626     + mutex_unlock(&lower_inode->i_mutex);
4627     + if (err)
4628     + goto out;
4629     + }
4630     +
4631     +out:
4632     + if (!err)
4633     + unionfs_check_file(file);
4634     + unionfs_unlock_dentry(dentry);
4635     + unionfs_unlock_parent(dentry, parent);
4636     + unionfs_read_unlock(dentry->d_sb);
4637     + return err;
4638     +}
4639     +
4640     +int unionfs_fasync(int fd, struct file *file, int flag)
4641     +{
4642     + int bindex, bstart, bend;
4643     + struct file *lower_file;
4644     + struct dentry *dentry = file->f_path.dentry;
4645     + struct dentry *parent;
4646     + struct inode *lower_inode, *inode;
4647     + int err = 0;
4648     +
4649     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4650     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4651     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4652     +
4653     + err = unionfs_file_revalidate(file, parent, true);
4654     + if (unlikely(err))
4655     + goto out;
4656     + unionfs_check_file(file);
4657     +
4658     + bstart = fbstart(file);
4659     + bend = fbend(file);
4660     + if (bstart < 0 || bend < 0)
4661     + goto out;
4662     +
4663     + inode = dentry->d_inode;
4664     + if (unlikely(!inode)) {
4665     + printk(KERN_ERR
4666     + "unionfs: null inode in unionfs_fasync\n");
4667     + goto out;
4668     + }
4669     + for (bindex = bstart; bindex <= bend; bindex++) {
4670     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4671     + if (!lower_inode || !lower_inode->i_fop->fasync)
4672     + continue;
4673     + lower_file = unionfs_lower_file_idx(file, bindex);
4674     + mutex_lock(&lower_inode->i_mutex);
4675     + err = lower_inode->i_fop->fasync(fd, lower_file, flag);
4676     + if (!err && bindex == bstart)
4677     + fsstack_copy_attr_times(inode, lower_inode);
4678     + mutex_unlock(&lower_inode->i_mutex);
4679     + if (err)
4680     + goto out;
4681     + }
4682     +
4683     +out:
4684     + if (!err)
4685     + unionfs_check_file(file);
4686     + unionfs_unlock_dentry(dentry);
4687     + unionfs_unlock_parent(dentry, parent);
4688     + unionfs_read_unlock(dentry->d_sb);
4689     + return err;
4690     +}
4691     +
4692     +static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
4693     + struct pipe_inode_info *pipe, size_t len,
4694     + unsigned int flags)
4695     +{
4696     + ssize_t err;
4697     + struct file *lower_file;
4698     + struct dentry *dentry = file->f_path.dentry;
4699     + struct dentry *parent;
4700     +
4701     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4702     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4703     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4704     +
4705     + err = unionfs_file_revalidate(file, parent, false);
4706     + if (unlikely(err))
4707     + goto out;
4708     +
4709     + lower_file = unionfs_lower_file(file);
4710     + err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
4711     + /* update our inode atime upon a successful lower splice-read */
4712     + if (err >= 0) {
4713     + fsstack_copy_attr_atime(dentry->d_inode,
4714     + lower_file->f_path.dentry->d_inode);
4715     + unionfs_check_file(file);
4716     + }
4717     +
4718     +out:
4719     + unionfs_unlock_dentry(dentry);
4720     + unionfs_unlock_parent(dentry, parent);
4721     + unionfs_read_unlock(dentry->d_sb);
4722     + return err;
4723     +}
4724     +
4725     +static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
4726     + struct file *file, loff_t *ppos,
4727     + size_t len, unsigned int flags)
4728     +{
4729     + ssize_t err = 0;
4730     + struct file *lower_file;
4731     + struct dentry *dentry = file->f_path.dentry;
4732     + struct dentry *parent;
4733     +
4734     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4735     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4736     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4737     +
4738     + err = unionfs_file_revalidate(file, parent, true);
4739     + if (unlikely(err))
4740     + goto out;
4741     +
4742     + lower_file = unionfs_lower_file(file);
4743     + err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
4744     + /* update our inode times+sizes upon a successful lower write */
4745     + if (err >= 0) {
4746     + fsstack_copy_inode_size(dentry->d_inode,
4747     + lower_file->f_path.dentry->d_inode);
4748     + fsstack_copy_attr_times(dentry->d_inode,
4749     + lower_file->f_path.dentry->d_inode);
4750     + unionfs_check_file(file);
4751     + }
4752     +
4753     +out:
4754     + unionfs_unlock_dentry(dentry);
4755     + unionfs_unlock_parent(dentry, parent);
4756     + unionfs_read_unlock(dentry->d_sb);
4757     + return err;
4758     +}
4759     +
4760     +struct file_operations unionfs_main_fops = {
4761     + .llseek = generic_file_llseek,
4762     + .read = unionfs_read,
4763     + .write = unionfs_write,
4764     + .readdir = unionfs_file_readdir,
4765     + .unlocked_ioctl = unionfs_ioctl,
4766     + .mmap = unionfs_mmap,
4767     + .open = unionfs_open,
4768     + .flush = unionfs_flush,
4769     + .release = unionfs_file_release,
4770     + .fsync = unionfs_fsync,
4771     + .fasync = unionfs_fasync,
4772     + .splice_read = unionfs_splice_read,
4773     + .splice_write = unionfs_splice_write,
4774     +};
4775     diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
4776     new file mode 100644
4777     index 0000000..062163a
4778     --- /dev/null
4779     +++ b/fs/unionfs/inode.c
4780     @@ -0,0 +1,1055 @@
4781     +/*
4782     + * Copyright (c) 2003-2010 Erez Zadok
4783     + * Copyright (c) 2003-2006 Charles P. Wright
4784     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4785     + * Copyright (c) 2005-2006 Junjiro Okajima
4786     + * Copyright (c) 2005 Arun M. Krishnakumar
4787     + * Copyright (c) 2004-2006 David P. Quigley
4788     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4789     + * Copyright (c) 2003 Puja Gupta
4790     + * Copyright (c) 2003 Harikesavan Krishnan
4791     + * Copyright (c) 2003-2010 Stony Brook University
4792     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
4793     + *
4794     + * This program is free software; you can redistribute it and/or modify
4795     + * it under the terms of the GNU General Public License version 2 as
4796     + * published by the Free Software Foundation.
4797     + */
4798     +
4799     +#include "union.h"
4800     +
4801     +/*
4802     + * Find a writeable branch to create new object in. Checks all writeable
4803     + * branches of the parent inode, from istart to iend order; if none are
4804     + * suitable, also tries branch 0 (which may require a copyup).
4805     + *
4806     + * Return a lower_dentry we can use to create object in, or ERR_PTR.
4807     + */
4808     +static struct dentry *find_writeable_branch(struct inode *parent,
4809     + struct dentry *dentry)
4810     +{
4811     + int err = -EINVAL;
4812     + int bindex, istart, iend;
4813     + struct dentry *lower_dentry = NULL;
4814     +
4815     + istart = ibstart(parent);
4816     + iend = ibend(parent);
4817     + if (istart < 0)
4818     + goto out;
4819     +
4820     +begin:
4821     + for (bindex = istart; bindex <= iend; bindex++) {
4822     + /* skip non-writeable branches */
4823     + err = is_robranch_super(dentry->d_sb, bindex);
4824     + if (err) {
4825     + err = -EROFS;
4826     + continue;
4827     + }
4828     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4829     + if (!lower_dentry)
4830     + continue;
4831     + /*
4832     + * check for whiteouts in writeable branch, and remove them
4833     + * if necessary.
4834     + */
4835     + err = check_unlink_whiteout(dentry, lower_dentry, bindex);
4836     + if (err > 0) /* ignore if whiteout found and removed */
4837     + err = 0;
4838     + if (err)
4839     + continue;
4840     + /* if we get here, we can write to the branch */
4841     + break;
4842     + }
4843     + /*
4844     + * If istart wasn't already branch 0, and we got any error, then try
4845     + * branch 0 (which may require copyup)
4846     + */
4847     + if (err && istart > 0) {
4848     + istart = iend = 0;
4849     + goto begin;
4850     + }
4851     +
4852     + /*
4853     + * If we tried even branch 0, and still got an error, abort. But if
4854     + * the error was an EROFS, then we should try to copyup.
4855     + */
4856     + if (err && err != -EROFS)
4857     + goto out;
4858     +
4859     + /*
4860     + * If we get here, then check if copyup needed. If lower_dentry is
4861     + * NULL, create the entire dentry directory structure in branch 0.
4862     + */
4863     + if (!lower_dentry) {
4864     + bindex = 0;
4865     + lower_dentry = create_parents(parent, dentry,
4866     + dentry->d_name.name, bindex);
4867     + if (IS_ERR(lower_dentry)) {
4868     + err = PTR_ERR(lower_dentry);
4869     + goto out;
4870     + }
4871     + }
4872     + err = 0; /* all's well */
4873     +out:
4874     + if (err)
4875     + return ERR_PTR(err);
4876     + return lower_dentry;
4877     +}
4878     +
4879     +static int unionfs_create(struct inode *dir, struct dentry *dentry,
4880     + int mode, struct nameidata *nd_unused)
4881     +{
4882     + int err = 0;
4883     + struct dentry *lower_dentry = NULL;
4884     + struct dentry *lower_parent_dentry = NULL;
4885     + struct dentry *parent;
4886     + int valid = 0;
4887     + struct nameidata lower_nd;
4888     +
4889     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4890     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4891     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4892     +
4893     + valid = __unionfs_d_revalidate(dentry, parent, false);
4894     + if (unlikely(!valid)) {
4895     + err = -ESTALE; /* same as what real_lookup does */
4896     + goto out;
4897     + }
4898     +
4899     + lower_dentry = find_writeable_branch(dir, dentry);
4900     + if (IS_ERR(lower_dentry)) {
4901     + err = PTR_ERR(lower_dentry);
4902     + goto out;
4903     + }
4904     +
4905     + lower_parent_dentry = lock_parent(lower_dentry);
4906     + if (IS_ERR(lower_parent_dentry)) {
4907     + err = PTR_ERR(lower_parent_dentry);
4908     + goto out_unlock;
4909     + }
4910     +
4911     + err = init_lower_nd(&lower_nd, LOOKUP_CREATE);
4912     + if (unlikely(err < 0))
4913     + goto out_unlock;
4914     + err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode,
4915     + &lower_nd);
4916     + release_lower_nd(&lower_nd, err);
4917     +
4918     + if (!err) {
4919     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
4920     + if (!err) {
4921     + unionfs_copy_attr_times(dir);
4922     + fsstack_copy_inode_size(dir,
4923     + lower_parent_dentry->d_inode);
4924     + /* update no. of links on parent directory */
4925     + dir->i_nlink = unionfs_get_nlinks(dir);
4926     + }
4927     + }
4928     +
4929     +out_unlock:
4930     + unlock_dir(lower_parent_dentry);
4931     +out:
4932     + if (!err) {
4933     + unionfs_postcopyup_setmnt(dentry);
4934     + unionfs_check_inode(dir);
4935     + unionfs_check_dentry(dentry);
4936     + }
4937     + unionfs_unlock_dentry(dentry);
4938     + unionfs_unlock_parent(dentry, parent);
4939     + unionfs_read_unlock(dentry->d_sb);
4940     + return err;
4941     +}
4942     +
4943     +/*
4944     + * unionfs_lookup is the only special function which takes a dentry, yet we
4945     + * do NOT want to call __unionfs_d_revalidate_chain because by definition,
4946     + * we don't have a valid dentry here yet.
4947     + */
4948     +static struct dentry *unionfs_lookup(struct inode *dir,
4949     + struct dentry *dentry,
4950     + struct nameidata *nd_unused)
4951     +{
4952     + struct dentry *ret, *parent;
4953     + int err = 0;
4954     +
4955     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4956     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4957     +
4958     + /*
4959     + * As long as we lock/dget the parent, we can skip validating the
4960     + * parent now; we may have to rebuild this dentry on the next
4961     + * ->d_revalidate, however.
4962     + */
4963     +
4964     + /* allocate dentry private data. We free it in ->d_release */
4965     + err = new_dentry_private_data(dentry, UNIONFS_DMUTEX_CHILD);
4966     + if (unlikely(err)) {
4967     + ret = ERR_PTR(err);
4968     + goto out;
4969     + }
4970     +
4971     + ret = unionfs_lookup_full(dentry, parent, INTERPOSE_LOOKUP);
4972     +
4973     + if (!IS_ERR(ret)) {
4974     + if (ret)
4975     + dentry = ret;
4976     + /* lookup_full can return multiple positive dentries */
4977     + if (dentry->d_inode && !S_ISDIR(dentry->d_inode->i_mode)) {
4978     + BUG_ON(dbstart(dentry) < 0);
4979     + unionfs_postcopyup_release(dentry);
4980     + }
4981     + unionfs_copy_attr_times(dentry->d_inode);
4982     + }
4983     +
4984     + unionfs_check_inode(dir);
4985     + if (!IS_ERR(ret))
4986     + unionfs_check_dentry(dentry);
4987     + unionfs_check_dentry(parent);
4988     + unionfs_unlock_dentry(dentry); /* locked in new_dentry_private_data */
4989     +
4990     +out:
4991     + unionfs_unlock_parent(dentry, parent);
4992     + unionfs_read_unlock(dentry->d_sb);
4993     +
4994     + return ret;
4995     +}
4996     +
4997     +static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
4998     + struct dentry *new_dentry)
4999     +{
5000     + int err = 0;
5001     + struct dentry *lower_old_dentry = NULL;
5002     + struct dentry *lower_new_dentry = NULL;
5003     + struct dentry *lower_dir_dentry = NULL;
5004     + struct dentry *old_parent, *new_parent;
5005     + char *name = NULL;
5006     + bool valid;
5007     +
5008     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5009     + old_parent = dget_parent(old_dentry);
5010     + new_parent = dget_parent(new_dentry);
5011     + unionfs_double_lock_parents(old_parent, new_parent);
5012     + unionfs_double_lock_dentry(old_dentry, new_dentry);
5013     +
5014     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
5015     + if (unlikely(!valid)) {
5016     + err = -ESTALE;
5017     + goto out;
5018     + }
5019     + if (new_dentry->d_inode) {
5020     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
5021     + if (unlikely(!valid)) {
5022     + err = -ESTALE;
5023     + goto out;
5024     + }
5025     + }
5026     +
5027     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5028     +
5029     + /* check for a whiteout in new dentry branch, and delete it */
5030     + err = check_unlink_whiteout(new_dentry, lower_new_dentry,
5031     + dbstart(new_dentry));
5032     + if (err > 0) { /* whiteout found and removed successfully */
5033     + lower_dir_dentry = dget_parent(lower_new_dentry);
5034     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
5035     + dput(lower_dir_dentry);
5036     + dir->i_nlink = unionfs_get_nlinks(dir);
5037     + err = 0;
5038     + }
5039     + if (err)
5040     + goto out;
5041     +
5042     + /* check if parent hierarchy is needed, then link in same branch */
5043     + if (dbstart(old_dentry) != dbstart(new_dentry)) {
5044     + lower_new_dentry = create_parents(dir, new_dentry,
5045     + new_dentry->d_name.name,
5046     + dbstart(old_dentry));
5047     + err = PTR_ERR(lower_new_dentry);
5048     + if (IS_COPYUP_ERR(err))
5049     + goto docopyup;
5050     + if (!lower_new_dentry || IS_ERR(lower_new_dentry))
5051     + goto out;
5052     + }
5053     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5054     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5055     +
5056     + BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
5057     + lower_dir_dentry = lock_parent(lower_new_dentry);
5058     + err = is_robranch(old_dentry);
5059     + if (!err) {
5060     + /* see Documentation/filesystems/unionfs/issues.txt */
5061     + lockdep_off();
5062     + err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
5063     + lower_new_dentry);
5064     + lockdep_on();
5065     + }
5066     + unlock_dir(lower_dir_dentry);
5067     +
5068     +docopyup:
5069     + if (IS_COPYUP_ERR(err)) {
5070     + int old_bstart = dbstart(old_dentry);
5071     + int bindex;
5072     +
5073     + for (bindex = old_bstart - 1; bindex >= 0; bindex--) {
5074     + err = copyup_dentry(old_parent->d_inode,
5075     + old_dentry, old_bstart,
5076     + bindex, old_dentry->d_name.name,
5077     + old_dentry->d_name.len, NULL,
5078     + i_size_read(old_dentry->d_inode));
5079     + if (err)
5080     + continue;
5081     + lower_new_dentry =
5082     + create_parents(dir, new_dentry,
5083     + new_dentry->d_name.name,
5084     + bindex);
5085     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5086     + lower_dir_dentry = lock_parent(lower_new_dentry);
5087     + /* see Documentation/filesystems/unionfs/issues.txt */
5088     + lockdep_off();
5089     + /* do vfs_link */
5090     + err = vfs_link(lower_old_dentry,
5091     + lower_dir_dentry->d_inode,
5092     + lower_new_dentry);
5093     + lockdep_on();
5094     + unlock_dir(lower_dir_dentry);
5095     + goto check_link;
5096     + }
5097     + goto out;
5098     + }
5099     +
5100     +check_link:
5101     + if (err || !lower_new_dentry->d_inode)
5102     + goto out;
5103     +
5104     + /* It's a hard link, so use the same inode */
5105     + new_dentry->d_inode = igrab(old_dentry->d_inode);
5106     + d_add(new_dentry, new_dentry->d_inode);
5107     + unionfs_copy_attr_all(dir, lower_new_dentry->d_parent->d_inode);
5108     + fsstack_copy_inode_size(dir, lower_new_dentry->d_parent->d_inode);
5109     +
5110     + /* propagate number of hard-links */
5111     + old_dentry->d_inode->i_nlink = unionfs_get_nlinks(old_dentry->d_inode);
5112     + /* new dentry's ctime may have changed due to hard-link counts */
5113     + unionfs_copy_attr_times(new_dentry->d_inode);
5114     +
5115     +out:
5116     + if (!new_dentry->d_inode)
5117     + d_drop(new_dentry);
5118     +
5119     + kfree(name);
5120     + if (!err)
5121     + unionfs_postcopyup_setmnt(new_dentry);
5122     +
5123     + unionfs_check_inode(dir);
5124     + unionfs_check_dentry(new_dentry);
5125     + unionfs_check_dentry(old_dentry);
5126     +
5127     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
5128     + unionfs_double_unlock_parents(old_parent, new_parent);
5129     + dput(new_parent);
5130     + dput(old_parent);
5131     + unionfs_read_unlock(old_dentry->d_sb);
5132     +
5133     + return err;
5134     +}
5135     +
5136     +static int unionfs_symlink(struct inode *dir, struct dentry *dentry,
5137     + const char *symname)
5138     +{
5139     + int err = 0;
5140     + struct dentry *lower_dentry = NULL;
5141     + struct dentry *wh_dentry = NULL;
5142     + struct dentry *lower_parent_dentry = NULL;
5143     + struct dentry *parent;
5144     + char *name = NULL;
5145     + int valid = 0;
5146     + umode_t mode;
5147     +
5148     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5149     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5150     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5151     +
5152     + valid = __unionfs_d_revalidate(dentry, parent, false);
5153     + if (unlikely(!valid)) {
5154     + err = -ESTALE;
5155     + goto out;
5156     + }
5157     +
5158     + /*
5159     + * It's only a bug if this dentry was not negative and couldn't be
5160     + * revalidated (shouldn't happen).
5161     + */
5162     + BUG_ON(!valid && dentry->d_inode);
5163     +
5164     + lower_dentry = find_writeable_branch(dir, dentry);
5165     + if (IS_ERR(lower_dentry)) {
5166     + err = PTR_ERR(lower_dentry);
5167     + goto out;
5168     + }
5169     +
5170     + lower_parent_dentry = lock_parent(lower_dentry);
5171     + if (IS_ERR(lower_parent_dentry)) {
5172     + err = PTR_ERR(lower_parent_dentry);
5173     + goto out_unlock;
5174     + }
5175     +
5176     + mode = S_IALLUGO;
5177     + err = vfs_symlink(lower_parent_dentry->d_inode, lower_dentry, symname);
5178     + if (!err) {
5179     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5180     + if (!err) {
5181     + unionfs_copy_attr_times(dir);
5182     + fsstack_copy_inode_size(dir,
5183     + lower_parent_dentry->d_inode);
5184     + /* update no. of links on parent directory */
5185     + dir->i_nlink = unionfs_get_nlinks(dir);
5186     + }
5187     + }
5188     +
5189     +out_unlock:
5190     + unlock_dir(lower_parent_dentry);
5191     +out:
5192     + dput(wh_dentry);
5193     + kfree(name);
5194     +
5195     + if (!err) {
5196     + unionfs_postcopyup_setmnt(dentry);
5197     + unionfs_check_inode(dir);
5198     + unionfs_check_dentry(dentry);
5199     + }
5200     + unionfs_unlock_dentry(dentry);
5201     + unionfs_unlock_parent(dentry, parent);
5202     + unionfs_read_unlock(dentry->d_sb);
5203     + return err;
5204     +}
5205     +
5206     +static int unionfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
5207     +{
5208     + int err = 0;
5209     + struct dentry *lower_dentry = NULL;
5210     + struct dentry *lower_parent_dentry = NULL;
5211     + struct dentry *parent;
5212     + int bindex = 0, bstart;
5213     + char *name = NULL;
5214     + int valid;
5215     +
5216     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5217     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5218     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5219     +
5220     + valid = __unionfs_d_revalidate(dentry, parent, false);
5221     + if (unlikely(!valid)) {
5222     + err = -ESTALE; /* same as what real_lookup does */
5223     + goto out;
5224     + }
5225     +
5226     + bstart = dbstart(dentry);
5227     +
5228     + lower_dentry = unionfs_lower_dentry(dentry);
5229     +
5230     + /* check for a whiteout in new dentry branch, and delete it */
5231     + err = check_unlink_whiteout(dentry, lower_dentry, bstart);
5232     + if (err > 0) /* whiteout found and removed successfully */
5233     + err = 0;
5234     + if (err) {
5235     + /* exit if the error returned was NOT -EROFS */
5236     + if (!IS_COPYUP_ERR(err))
5237     + goto out;
5238     + bstart--;
5239     + }
5240     +
5241     + /* check if copyup's needed, and mkdir */
5242     + for (bindex = bstart; bindex >= 0; bindex--) {
5243     + int i;
5244     + int bend = dbend(dentry);
5245     +
5246     + if (is_robranch_super(dentry->d_sb, bindex))
5247     + continue;
5248     +
5249     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5250     + if (!lower_dentry) {
5251     + lower_dentry = create_parents(dir, dentry,
5252     + dentry->d_name.name,
5253     + bindex);
5254     + if (!lower_dentry || IS_ERR(lower_dentry)) {
5255     + printk(KERN_ERR "unionfs: lower dentry "
5256     + "NULL for bindex = %d\n", bindex);
5257     + continue;
5258     + }
5259     + }
5260     +
5261     + lower_parent_dentry = lock_parent(lower_dentry);
5262     +
5263     + if (IS_ERR(lower_parent_dentry)) {
5264     + err = PTR_ERR(lower_parent_dentry);
5265     + goto out;
5266     + }
5267     +
5268     + err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
5269     + mode);
5270     +
5271     + unlock_dir(lower_parent_dentry);
5272     +
5273     + /* did the mkdir succeed? */
5274     + if (err)
5275     + break;
5276     +
5277     + for (i = bindex + 1; i <= bend; i++) {
5278     + /* XXX: use path_put_lowers? */
5279     + if (unionfs_lower_dentry_idx(dentry, i)) {
5280     + dput(unionfs_lower_dentry_idx(dentry, i));
5281     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
5282     + }
5283     + }
5284     + dbend(dentry) = bindex;
5285     +
5286     + /*
5287     + * Only INTERPOSE_LOOKUP can return a value other than 0 on
5288     + * err.
5289     + */
5290     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5291     + if (!err) {
5292     + unionfs_copy_attr_times(dir);
5293     + fsstack_copy_inode_size(dir,
5294     + lower_parent_dentry->d_inode);
5295     +
5296     + /* update number of links on parent directory */
5297     + dir->i_nlink = unionfs_get_nlinks(dir);
5298     + }
5299     +
5300     + err = make_dir_opaque(dentry, dbstart(dentry));
5301     + if (err) {
5302     + printk(KERN_ERR "unionfs: mkdir: error creating "
5303     + ".wh.__dir_opaque: %d\n", err);
5304     + goto out;
5305     + }
5306     +
5307     + /* we are done! */
5308     + break;
5309     + }
5310     +
5311     +out:
5312     + if (!dentry->d_inode)
5313     + d_drop(dentry);
5314     +
5315     + kfree(name);
5316     +
5317     + if (!err) {
5318     + unionfs_copy_attr_times(dentry->d_inode);
5319     + unionfs_postcopyup_setmnt(dentry);
5320     + }
5321     + unionfs_check_inode(dir);
5322     + unionfs_check_dentry(dentry);
5323     + unionfs_unlock_dentry(dentry);
5324     + unionfs_unlock_parent(dentry, parent);
5325     + unionfs_read_unlock(dentry->d_sb);
5326     +
5327     + return err;
5328     +}
5329     +
5330     +static int unionfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
5331     + dev_t dev)
5332     +{
5333     + int err = 0;
5334     + struct dentry *lower_dentry = NULL;
5335     + struct dentry *wh_dentry = NULL;
5336     + struct dentry *lower_parent_dentry = NULL;
5337     + struct dentry *parent;
5338     + char *name = NULL;
5339     + int valid = 0;
5340     +
5341     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5342     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5343     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5344     +
5345     + valid = __unionfs_d_revalidate(dentry, parent, false);
5346     + if (unlikely(!valid)) {
5347     + err = -ESTALE;
5348     + goto out;
5349     + }
5350     +
5351     + /*
5352     + * It's only a bug if this dentry was not negative and couldn't be
5353     + * revalidated (shouldn't happen).
5354     + */
5355     + BUG_ON(!valid && dentry->d_inode);
5356     +
5357     + lower_dentry = find_writeable_branch(dir, dentry);
5358     + if (IS_ERR(lower_dentry)) {
5359     + err = PTR_ERR(lower_dentry);
5360     + goto out;
5361     + }
5362     +
5363     + lower_parent_dentry = lock_parent(lower_dentry);
5364     + if (IS_ERR(lower_parent_dentry)) {
5365     + err = PTR_ERR(lower_parent_dentry);
5366     + goto out_unlock;
5367     + }
5368     +
5369     + err = vfs_mknod(lower_parent_dentry->d_inode, lower_dentry, mode, dev);
5370     + if (!err) {
5371     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5372     + if (!err) {
5373     + unionfs_copy_attr_times(dir);
5374     + fsstack_copy_inode_size(dir,
5375     + lower_parent_dentry->d_inode);
5376     + /* update no. of links on parent directory */
5377     + dir->i_nlink = unionfs_get_nlinks(dir);
5378     + }
5379     + }
5380     +
5381     +out_unlock:
5382     + unlock_dir(lower_parent_dentry);
5383     +out:
5384     + dput(wh_dentry);
5385     + kfree(name);
5386     +
5387     + if (!err) {
5388     + unionfs_postcopyup_setmnt(dentry);
5389     + unionfs_check_inode(dir);
5390     + unionfs_check_dentry(dentry);
5391     + }
5392     + unionfs_unlock_dentry(dentry);
5393     + unionfs_unlock_parent(dentry, parent);
5394     + unionfs_read_unlock(dentry->d_sb);
5395     + return err;
5396     +}
5397     +
5398     +/* requires sb, dentry, and parent to already be locked */
5399     +static int __unionfs_readlink(struct dentry *dentry, char __user *buf,
5400     + int bufsiz)
5401     +{
5402     + int err;
5403     + struct dentry *lower_dentry;
5404     +
5405     + lower_dentry = unionfs_lower_dentry(dentry);
5406     +
5407     + if (!lower_dentry->d_inode->i_op ||
5408     + !lower_dentry->d_inode->i_op->readlink) {
5409     + err = -EINVAL;
5410     + goto out;
5411     + }
5412     +
5413     + err = lower_dentry->d_inode->i_op->readlink(lower_dentry,
5414     + buf, bufsiz);
5415     + if (err >= 0)
5416     + fsstack_copy_attr_atime(dentry->d_inode,
5417     + lower_dentry->d_inode);
5418     +
5419     +out:
5420     + return err;
5421     +}
5422     +
5423     +static int unionfs_readlink(struct dentry *dentry, char __user *buf,
5424     + int bufsiz)
5425     +{
5426     + int err;
5427     + struct dentry *parent;
5428     +
5429     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5430     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5431     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5432     +
5433     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5434     + err = -ESTALE;
5435     + goto out;
5436     + }
5437     +
5438     + err = __unionfs_readlink(dentry, buf, bufsiz);
5439     +
5440     +out:
5441     + unionfs_check_dentry(dentry);
5442     + unionfs_unlock_dentry(dentry);
5443     + unionfs_unlock_parent(dentry, parent);
5444     + unionfs_read_unlock(dentry->d_sb);
5445     +
5446     + return err;
5447     +}
5448     +
5449     +static void *unionfs_follow_link(struct dentry *dentry, struct nameidata *nd)
5450     +{
5451     + char *buf;
5452     + int len = PAGE_SIZE, err;
5453     + mm_segment_t old_fs;
5454     + struct dentry *parent;
5455     +
5456     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5457     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5458     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5459     +
5460     + /* This is freed by the put_link method assuming a successful call. */
5461     + buf = kmalloc(len, GFP_KERNEL);
5462     + if (unlikely(!buf)) {
5463     + err = -ENOMEM;
5464     + goto out;
5465     + }
5466     +
5467     + /* read the symlink, and then we will follow it */
5468     + old_fs = get_fs();
5469     + set_fs(KERNEL_DS);
5470     + err = __unionfs_readlink(dentry, buf, len);
5471     + set_fs(old_fs);
5472     + if (err < 0) {
5473     + kfree(buf);
5474     + buf = NULL;
5475     + goto out;
5476     + }
5477     + buf[err] = 0;
5478     + nd_set_link(nd, buf);
5479     + err = 0;
5480     +
5481     +out:
5482     + if (err >= 0) {
5483     + unionfs_check_nd(nd);
5484     + unionfs_check_dentry(dentry);
5485     + }
5486     +
5487     + unionfs_unlock_dentry(dentry);
5488     + unionfs_unlock_parent(dentry, parent);
5489     + unionfs_read_unlock(dentry->d_sb);
5490     +
5491     + return ERR_PTR(err);
5492     +}
5493     +
5494     +/* this @nd *IS* still used */
5495     +static void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
5496     + void *cookie)
5497     +{
5498     + struct dentry *parent;
5499     +
5500     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5501     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5502     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5503     +
5504     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false)))
5505     + printk(KERN_ERR
5506     + "unionfs: put_link failed to revalidate dentry\n");
5507     +
5508     + unionfs_check_dentry(dentry);
5509     + unionfs_check_nd(nd);
5510     + kfree(nd_get_link(nd));
5511     + unionfs_unlock_dentry(dentry);
5512     + unionfs_unlock_parent(dentry, parent);
5513     + unionfs_read_unlock(dentry->d_sb);
5514     +}
5515     +
5516     +/*
5517     + * This is a variant of fs/namei.c:permission() or inode_permission() which
5518     + * skips over EROFS tests (because we perform copyup on EROFS).
5519     + */
5520     +static int __inode_permission(struct inode *inode, int mask)
5521     +{
5522     + int retval;
5523     +
5524     + /* nobody gets write access to an immutable file */
5525     + if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
5526     + return -EACCES;
5527     +
5528     + /* Ordinary permission routines do not understand MAY_APPEND. */
5529     + if (inode->i_op && inode->i_op->permission) {
5530     + retval = inode->i_op->permission(inode, mask);
5531     + if (!retval) {
5532     + /*
5533     + * Exec permission on a regular file is denied if none
5534     + * of the execute bits are set.
5535     + *
5536     + * This check should be done by the ->permission()
5537     + * method.
5538     + */
5539     + if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode) &&
5540     + !(inode->i_mode & S_IXUGO))
5541     + return -EACCES;
5542     + }
5543     + } else {
5544     + retval = generic_permission(inode, mask, NULL);
5545     + }
5546     + if (retval)
5547     + return retval;
5548     +
5549     + return security_inode_permission(inode,
5550     + mask & (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND));
5551     +}
5552     +
5553     +/*
5554     + * Don't grab the superblock read-lock in unionfs_permission, which prevents
5555     + * a deadlock with the branch-management "add branch" code (which grabbed
5556     + * the write lock). It is safe to not grab the read lock here, because even
5557     + * with branch management taking place, there is no chance that
5558     + * unionfs_permission, or anything it calls, will use stale branch
5559     + * information.
5560     + */
5561     +static int unionfs_permission(struct inode *inode, int mask)
5562     +{
5563     + struct inode *lower_inode = NULL;
5564     + int err = 0;
5565     + int bindex, bstart, bend;
5566     + const int is_file = !S_ISDIR(inode->i_mode);
5567     + const int write_mask = (mask & MAY_WRITE) && !(mask & MAY_READ);
5568     + struct inode *inode_grabbed = igrab(inode);
5569     + struct dentry *dentry = d_find_alias(inode);
5570     +
5571     + if (dentry)
5572     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5573     +
5574     + if (!UNIONFS_I(inode)->lower_inodes) {
5575     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5576     + err = -ESTALE; /* force revalidate */
5577     + goto out;
5578     + }
5579     + bstart = ibstart(inode);
5580     + bend = ibend(inode);
5581     + if (unlikely(bstart < 0 || bend < 0)) {
5582     + /*
5583     + * With branch-management, we can get a stale inode here.
5584     + * If so, we return ESTALE back to link_path_walk, which
5585     + * would discard the dcache entry and re-lookup the
5586     + * dentry+inode. This should be equivalent to issuing
5587     + * __unionfs_d_revalidate_chain on nd.dentry here.
5588     + */
5589     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5590     + err = -ESTALE; /* force revalidate */
5591     + goto out;
5592     + }
5593     +
5594     + for (bindex = bstart; bindex <= bend; bindex++) {
5595     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
5596     + if (!lower_inode)
5597     + continue;
5598     +
5599     + /*
5600     + * Check the condition for D-F-D underlying files/directories:
5601     + * we don't have to check lower files if we are checking
5602     + * directories.
5603     + */
5604     + if (!is_file && !S_ISDIR(lower_inode->i_mode))
5605     + continue;
5606     +
5607     + /*
5608     + * We check basic permissions, but we ignore any conditions
5609     + * such as readonly file systems or branches marked as
5610     + * readonly, because those conditions should lead to a
5611     + * copyup taking place later on. However, if user never had
5612     + * access to the file, then no copyup could ever take place.
5613     + */
5614     + err = __inode_permission(lower_inode, mask);
5615     + if (err && err != -EACCES && err != -EPERM && bindex > 0) {
5616     + umode_t mode = lower_inode->i_mode;
5617     + if ((is_robranch_super(inode->i_sb, bindex) ||
5618     + __is_rdonly(lower_inode)) &&
5619     + (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5620     + err = 0;
5621     + if (IS_COPYUP_ERR(err))
5622     + err = 0;
5623     + }
5624     +
5625     + /*
5626     + * NFS HACK: NFSv2/3 return EACCES on readonly-exported,
5627     + * locally readonly-mounted file systems, instead of EROFS
5628     + * like other file systems do. So we have no choice here
5629     + * but to intercept this and ignore it for NFS branches
5630     + * marked readonly. Specifically, we avoid using NFS's own
5631     + * "broken" ->permission method, and rely on
5632     + * generic_permission() to do basic checking for us.
5633     + */
5634     + if (err == -EACCES &&
5635     + is_robranch_super(inode->i_sb, bindex) &&
5636     + lower_inode->i_sb->s_magic == NFS_SUPER_MAGIC)
5637     + err = generic_permission(lower_inode, mask, NULL);
5638     +
5639     + /*
5640     + * The permissions are an intersection of the overall directory
5641     + * permissions, so we fail if one fails.
5642     + */
5643     + if (err)
5644     + goto out;
5645     +
5646     + /* only the leftmost file matters. */
5647     + if (is_file || write_mask) {
5648     + if (is_file && write_mask) {
5649     + err = get_write_access(lower_inode);
5650     + if (!err)
5651     + put_write_access(lower_inode);
5652     + }
5653     + break;
5654     + }
5655     + }
5656     + /* sync times which may have changed (asynchronously) below */
5657     + unionfs_copy_attr_times(inode);
5658     +
5659     +out:
5660     + unionfs_check_inode(inode);
5661     + if (dentry) {
5662     + unionfs_unlock_dentry(dentry);
5663     + dput(dentry);
5664     + }
5665     + iput(inode_grabbed);
5666     + return err;
5667     +}
5668     +
5669     +static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
5670     +{
5671     + int err = 0;
5672     + struct dentry *lower_dentry;
5673     + struct dentry *parent;
5674     + struct inode *inode;
5675     + struct inode *lower_inode;
5676     + int bstart, bend, bindex;
5677     + loff_t size;
5678     +
5679     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5680     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5681     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5682     +
5683     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5684     + err = -ESTALE;
5685     + goto out;
5686     + }
5687     +
5688     + bstart = dbstart(dentry);
5689     + bend = dbend(dentry);
5690     + inode = dentry->d_inode;
5691     +
5692     + /*
5693     + * mode change is for clearing setuid/setgid. Allow lower filesystem
5694     + * to reinterpret it in its own way.
5695     + */
5696     + if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
5697     + ia->ia_valid &= ~ATTR_MODE;
5698     +
5699     + lower_dentry = unionfs_lower_dentry(dentry);
5700     + if (!lower_dentry) { /* should never happen after above revalidate */
5701     + err = -EINVAL;
5702     + goto out;
5703     + }
5704     + lower_inode = unionfs_lower_inode(inode);
5705     +
5706     + /* check if user has permission to change lower inode */
5707     + err = inode_change_ok(lower_inode, ia);
5708     + if (err)
5709     + goto out;
5710     +
5711     + /* copyup if the file is on a read only branch */
5712     + if (is_robranch_super(dentry->d_sb, bstart)
5713     + || __is_rdonly(lower_inode)) {
5714     + /* check if we have a branch to copy up to */
5715     + if (bstart <= 0) {
5716     + err = -EACCES;
5717     + goto out;
5718     + }
5719     +
5720     + if (ia->ia_valid & ATTR_SIZE)
5721     + size = ia->ia_size;
5722     + else
5723     + size = i_size_read(inode);
5724     + /* copyup to next available branch */
5725     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
5726     + err = copyup_dentry(parent->d_inode,
5727     + dentry, bstart, bindex,
5728     + dentry->d_name.name,
5729     + dentry->d_name.len,
5730     + NULL, size);
5731     + if (!err)
5732     + break;
5733     + }
5734     + if (err)
5735     + goto out;
5736     + /* get updated lower_dentry/inode after copyup */
5737     + lower_dentry = unionfs_lower_dentry(dentry);
5738     + lower_inode = unionfs_lower_inode(inode);
5739     + }
5740     +
5741     + /*
5742     + * If shrinking, first truncate upper level to cancel writing dirty
5743     + * pages beyond the new eof; and also if its maxbytes is more
5744     + * limiting (fail with -EFBIG before making any change to the lower
5745     + * level). There is no need to vmtruncate the upper level
5746     + * afterwards in the other cases: we fsstack_copy_inode_size from
5747     + * the lower level.
5748     + */
5749     + if (ia->ia_valid & ATTR_SIZE) {
5750     + size = i_size_read(inode);
5751     + if (ia->ia_size < size || (ia->ia_size > size &&
5752     + inode->i_sb->s_maxbytes < lower_inode->i_sb->s_maxbytes)) {
5753     + err = vmtruncate(inode, ia->ia_size);
5754     + if (err)
5755     + goto out;
5756     + }
5757     + }
5758     +
5759     + /* notify the (possibly copied-up) lower inode */
5760     + /*
5761     + * Note: we use lower_dentry->d_inode, because lower_inode may be
5762     + * unlinked (no inode->i_sb and i_ino==0). This happens if someone
5763     + * tries to open(), unlink(), then ftruncate() a file.
5764     + */
5765     + mutex_lock(&lower_dentry->d_inode->i_mutex);
5766     + err = notify_change(lower_dentry, ia);
5767     + mutex_unlock(&lower_dentry->d_inode->i_mutex);
5768     + if (err)
5769     + goto out;
5770     +
5771     + /* get attributes from the first lower inode */
5772     + if (ibstart(inode) >= 0)
5773     + unionfs_copy_attr_all(inode, lower_inode);
5774     + /*
5775     + * unionfs_copy_attr_all will copy the lower times to our inode if
5776     + * the lower ones are newer (useful for cache coherency). However,
5777     + * ->setattr is the only place in which we may have to copy the
5778     + * lower inode times absolutely, to support utimes(2).
5779     + */
5780     + if (ia->ia_valid & ATTR_MTIME_SET)
5781     + inode->i_mtime = lower_inode->i_mtime;
5782     + if (ia->ia_valid & ATTR_CTIME)
5783     + inode->i_ctime = lower_inode->i_ctime;
5784     + if (ia->ia_valid & ATTR_ATIME_SET)
5785     + inode->i_atime = lower_inode->i_atime;
5786     + fsstack_copy_inode_size(inode, lower_inode);
5787     +
5788     +out:
5789     + if (!err)
5790     + unionfs_check_dentry(dentry);
5791     + unionfs_unlock_dentry(dentry);
5792     + unionfs_unlock_parent(dentry, parent);
5793     + unionfs_read_unlock(dentry->d_sb);
5794     +
5795     + return err;
5796     +}
5797     +
5798     +struct inode_operations unionfs_symlink_iops = {
5799     + .readlink = unionfs_readlink,
5800     + .permission = unionfs_permission,
5801     + .follow_link = unionfs_follow_link,
5802     + .setattr = unionfs_setattr,
5803     + .put_link = unionfs_put_link,
5804     +};
5805     +
5806     +struct inode_operations unionfs_dir_iops = {
5807     + .create = unionfs_create,
5808     + .lookup = unionfs_lookup,
5809     + .link = unionfs_link,
5810     + .unlink = unionfs_unlink,
5811     + .symlink = unionfs_symlink,
5812     + .mkdir = unionfs_mkdir,
5813     + .rmdir = unionfs_rmdir,
5814     + .mknod = unionfs_mknod,
5815     + .rename = unionfs_rename,
5816     + .permission = unionfs_permission,
5817     + .setattr = unionfs_setattr,
5818     +#ifdef CONFIG_UNION_FS_XATTR
5819     + .setxattr = unionfs_setxattr,
5820     + .getxattr = unionfs_getxattr,
5821     + .removexattr = unionfs_removexattr,
5822     + .listxattr = unionfs_listxattr,
5823     +#endif /* CONFIG_UNION_FS_XATTR */
5824     +};
5825     +
5826     +struct inode_operations unionfs_main_iops = {
5827     + .permission = unionfs_permission,
5828     + .setattr = unionfs_setattr,
5829     +#ifdef CONFIG_UNION_FS_XATTR
5830     + .setxattr = unionfs_setxattr,
5831     + .getxattr = unionfs_getxattr,
5832     + .removexattr = unionfs_removexattr,
5833     + .listxattr = unionfs_listxattr,
5834     +#endif /* CONFIG_UNION_FS_XATTR */
5835     +};
5836     diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
5837     new file mode 100644
5838     index 0000000..b63c17e
5839     --- /dev/null
5840     +++ b/fs/unionfs/lookup.c
5841     @@ -0,0 +1,569 @@
5842     +/*
5843     + * Copyright (c) 2003-2010 Erez Zadok
5844     + * Copyright (c) 2003-2006 Charles P. Wright
5845     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
5846     + * Copyright (c) 2005-2006 Junjiro Okajima
5847     + * Copyright (c) 2005 Arun M. Krishnakumar
5848     + * Copyright (c) 2004-2006 David P. Quigley
5849     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
5850     + * Copyright (c) 2003 Puja Gupta
5851     + * Copyright (c) 2003 Harikesavan Krishnan
5852     + * Copyright (c) 2003-2010 Stony Brook University
5853     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
5854     + *
5855     + * This program is free software; you can redistribute it and/or modify
5856     + * it under the terms of the GNU General Public License version 2 as
5857     + * published by the Free Software Foundation.
5858     + */
5859     +
5860     +#include "union.h"
5861     +
5862     +/*
5863     + * Lookup one path component @name relative to a <base,mnt> path pair.
5864     + * Behaves nearly the same as lookup_one_len (i.e., return negative dentry
5865     + * on ENOENT), but uses the @mnt passed, so it can cross bind mounts and
5866     + * other lower mounts properly. If @new_mnt is non-null, will fill in the
5867     + * new mnt there. The caller is responsible for releasing the returned
5868     + * dentry and @new_mnt (dput/mntput/path_put).
5869     + */
5870     +struct dentry *__lookup_one(struct dentry *base, struct vfsmount *mnt,
5871     + const char *name, struct vfsmount **new_mnt)
5872     +{
5873     + struct dentry *dentry = NULL;
5874     + struct nameidata lower_nd;
5875     + int err;
5876     +
5877     + /* we use flags=0 to get basic lookup */
5878     + err = vfs_path_lookup(base, mnt, name, 0, &lower_nd);
5879     +
5880     + switch (err) {
5881     + case 0: /* no error */
5882     + dentry = lower_nd.path.dentry;
5883     + if (new_mnt)
5884     + *new_mnt = lower_nd.path.mnt; /* rc already inc'ed */
5885     + break;
5886     + case -ENOENT:
5887     + /*
5888     + * We don't consider ENOENT an error, and we want to return
5889     + * a negative dentry (as lookup_one_len does). Since we know
5890     + * there was no inode for this name (-ENOENT), it is safe
5891     + * to call lookup_one_len (which doesn't take a
5892     + * vfsmount).
5893     + */
5894     + dentry = lookup_lck_len(name, base, strlen(name));
5895     + if (new_mnt)
5896     + *new_mnt = mntget(lower_nd.path.mnt);
5897     + break;
5898     + default: /* all other real errors */
5899     + dentry = ERR_PTR(err);
5900     + break;
5901     + }
5902     +
5903     + return dentry;
5904     +}
5905     +
5906     +/*
5907     + * This is a utility function that fills in a unionfs dentry.
5908     + * Caller must lock this dentry with unionfs_lock_dentry.
5909     + *
5910     + * Returns: 0 (ok), or -ERRNO if an error occurred.
5911     + * XXX: get rid of _partial_lookup and make callers call _lookup_full directly
5912     + */
5913     +int unionfs_partial_lookup(struct dentry *dentry, struct dentry *parent)
5914     +{
5915     + struct dentry *tmp;
5916     + int err = -ENOSYS;
5917     +
5918     + tmp = unionfs_lookup_full(dentry, parent, INTERPOSE_PARTIAL);
5919     +
5920     + if (!tmp) {
5921     + err = 0;
5922     + goto out;
5923     + }
5924     + if (IS_ERR(tmp)) {
5925     + err = PTR_ERR(tmp);
5926     + goto out;
5927     + }
5928     + /* XXX: need to change the interface */
5929     + BUG_ON(tmp != dentry);
5930     +out:
5931     + return err;
5932     +}
5933     +
5934     +/* The dentry cache is just so we have properly sized dentries. */
5935     +static struct kmem_cache *unionfs_dentry_cachep;
5936     +int unionfs_init_dentry_cache(void)
5937     +{
5938     + unionfs_dentry_cachep =
5939     + kmem_cache_create("unionfs_dentry",
5940     + sizeof(struct unionfs_dentry_info),
5941     + 0, SLAB_RECLAIM_ACCOUNT, NULL);
5942     +
5943     + return (unionfs_dentry_cachep ? 0 : -ENOMEM);
5944     +}
5945     +
5946     +void unionfs_destroy_dentry_cache(void)
5947     +{
5948     + if (unionfs_dentry_cachep)
5949     + kmem_cache_destroy(unionfs_dentry_cachep);
5950     +}
5951     +
5952     +void free_dentry_private_data(struct dentry *dentry)
5953     +{
5954     + if (!dentry || !dentry->d_fsdata)
5955     + return;
5956     + kfree(UNIONFS_D(dentry)->lower_paths);
5957     + UNIONFS_D(dentry)->lower_paths = NULL;
5958     + kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
5959     + dentry->d_fsdata = NULL;
5960     +}
5961     +
5962     +static inline int __realloc_dentry_private_data(struct dentry *dentry)
5963     +{
5964     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
5965     + void *p;
5966     + int size;
5967     +
5968     + BUG_ON(!info);
5969     +
5970     + size = sizeof(struct path) * sbmax(dentry->d_sb);
5971     + p = krealloc(info->lower_paths, size, GFP_ATOMIC);
5972     + if (unlikely(!p))
5973     + return -ENOMEM;
5974     +
5975     + info->lower_paths = p;
5976     +
5977     + info->bstart = -1;
5978     + info->bend = -1;
5979     + info->bopaque = -1;
5980     + info->bcount = sbmax(dentry->d_sb);
5981     + atomic_set(&info->generation,
5982     + atomic_read(&UNIONFS_SB(dentry->d_sb)->generation));
5983     +
5984     + memset(info->lower_paths, 0, size);
5985     +
5986     + return 0;
5987     +}
5988     +
5989     +/* UNIONFS_D(dentry)->lock must be locked */
5990     +int realloc_dentry_private_data(struct dentry *dentry)
5991     +{
5992     + if (!__realloc_dentry_private_data(dentry))
5993     + return 0;
5994     +
5995     + /* lower_paths is freed once, by free_dentry_private_data() below */
5996     + free_dentry_private_data(dentry);
5997     + return -ENOMEM;
5998     +}
5999     +
6000     +/* allocate new dentry private data */
6001     +int new_dentry_private_data(struct dentry *dentry, int subclass)
6002     +{
6003     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6004     +
6005     + BUG_ON(info);
6006     +
6007     + info = kmem_cache_alloc(unionfs_dentry_cachep, GFP_ATOMIC);
6008     + if (unlikely(!info))
6009     + return -ENOMEM;
6010     +
6011     + mutex_init(&info->lock);
6012     + mutex_lock_nested(&info->lock, subclass);
6013     +
6014     + info->lower_paths = NULL;
6015     +
6016     + dentry->d_fsdata = info;
6017     +
6018     + if (!__realloc_dentry_private_data(dentry))
6019     + return 0;
6020     +
6021     + mutex_unlock(&info->lock);
6022     + free_dentry_private_data(dentry);
6023     + return -ENOMEM;
6024     +}
6025     +
6026     +/*
6027     + * scan through the lower dentry objects, and set bstart to reflect the
6028     + * starting branch
6029     + */
6030     +void update_bstart(struct dentry *dentry)
6031     +{
6032     + int bindex;
6033     + int bstart = dbstart(dentry);
6034     + int bend = dbend(dentry);
6035     + struct dentry *lower_dentry;
6036     +
6037     + for (bindex = bstart; bindex <= bend; bindex++) {
6038     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6039     + if (!lower_dentry)
6040     + continue;
6041     + if (lower_dentry->d_inode) {
6042     + dbstart(dentry) = bindex;
6043     + break;
6044     + }
6045     + dput(lower_dentry);
6046     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
6047     + }
6048     +}
6049     +
6050     +
6051     +/*
6052     + * Initialize a nameidata structure (the intent part) we can pass to a lower
6053     + * file system. Returns 0 on success or -error (only -ENOMEM possible).
6054     + * Inside that nd structure, this function may also return an allocated
6055     + * struct file (for open intents). The caller, when done with this nd, must
6056     + * kfree the intent file (using release_lower_nd).
6057     + *
6058     + * XXX: this code, and the callers of this code, should be redone using
6059     + * vfs_path_lookup() when (1) the nameidata structure is refactored into a
6060     + * separate intent-structure, and (2) open_namei() is broken into a VFS-only
6061     + * function and a method that other file systems can call.
6062     + */
6063     +int init_lower_nd(struct nameidata *nd, unsigned int flags)
6064     +{
6065     + int err = 0;
6066     +#ifdef ALLOC_LOWER_ND_FILE
6067     + /*
6068     + * XXX: one day we may need to have the lower return an open file
6069     + * for us. It is not needed in 2.6.23-rc1 for nfs2/nfs3, but may
6070     + * very well be needed for nfs4.
6071     + */
6072     + struct file *file;
6073     +#endif /* ALLOC_LOWER_ND_FILE */
6074     +
6075     + memset(nd, 0, sizeof(struct nameidata));
6076     + if (!flags)
6077     + return err;
6078     +
6079     + switch (flags) {
6080     + case LOOKUP_CREATE:
6081     + nd->intent.open.flags |= O_CREAT;
6082     + /* fall through: shared code for create/open cases */
6083     + case LOOKUP_OPEN:
6084     + nd->flags = flags;
6085     + nd->intent.open.flags |= (FMODE_READ | FMODE_WRITE);
6086     +#ifdef ALLOC_LOWER_ND_FILE
6087     + file = kzalloc(sizeof(struct file), GFP_KERNEL);
6088     + if (unlikely(!file)) {
6089     + err = -ENOMEM;
6090     + break; /* exit switch statement and thus return */
6091     + }
6092     + nd->intent.open.file = file;
6093     +#endif /* ALLOC_LOWER_ND_FILE */
6094     + break;
6095     + default:
6096     + /*
6097     + * We should never get here, for now.
6098     + * We can add new cases here later on.
6099     + */
6100     + pr_debug("unionfs: unknown nameidata flag 0x%x\n", flags);
6101     + BUG();
6102     + break;
6103     + }
6104     +
6105     + return err;
6106     +}
6107     +
6108     +void release_lower_nd(struct nameidata *nd, int err)
6109     +{
6110     + if (!nd->intent.open.file)
6111     + return;
6112     + else if (!err)
6113     + release_open_intent(nd);
6114     +#ifdef ALLOC_LOWER_ND_FILE
6115     + kfree(nd->intent.open.file);
6116     +#endif /* ALLOC_LOWER_ND_FILE */
6117     +}
6118     +
6119     +/*
6120     + * Main (and complex) driver function for Unionfs's lookup
6121     + *
6122     + * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
6123     + * PTR if d_splice returned a different dentry.
6124     + *
6125     + * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
6126     + * inode info must be locked. If lookupmode is INTERPOSE_LOOKUP (i.e., a
6127     + * newly looked-up dentry), then this function will return a locked
6128     + * dentry's info, which the caller must unlock.
6129     + */
6130     +struct dentry *unionfs_lookup_full(struct dentry *dentry,
6131     + struct dentry *parent, int lookupmode)
6132     +{
6133     + int err = 0;
6134     + struct dentry *lower_dentry = NULL;
6135     + struct vfsmount *lower_mnt;
6136     + struct vfsmount *lower_dir_mnt;
6137     + struct dentry *wh_lower_dentry = NULL;
6138     + struct dentry *lower_dir_dentry = NULL;
6139     + struct dentry *d_interposed = NULL;
6140     + int bindex, bstart, bend, bopaque;
6141     + int opaque, num_positive = 0;
6142     + const char *name;
6143     + int namelen;
6144     + int pos_start, pos_end;
6145     +
6146     + /*
6147     + * We should already have a lock on this dentry in the case of a
6148     + * partial lookup, or a revalidation. Otherwise it is returned from
6149     + * new_dentry_private_data already locked.
6150     + */
6151     + verify_locked(dentry);
6152     + verify_locked(parent);
6153     +
6154     + /* must initialize dentry operations */
6155     + dentry->d_op = &unionfs_dops;
6156     +
6157     + /* We never partial lookup the root directory. */
6158     + if (IS_ROOT(dentry))
6159     + goto out;
6160     +
6161     + name = dentry->d_name.name;
6162     + namelen = dentry->d_name.len;
6163     +
6164     + /* No dentries should get created for possible whiteout names. */
6165     + if (!is_validname(name)) {
6166     + err = -EPERM;
6167     + goto out_free;
6168     + }
6169     +
6170     + /* Now start the actual lookup procedure. */
6171     + bstart = dbstart(parent);
6172     + bend = dbend(parent);
6173     + bopaque = dbopaque(parent);
6174     + BUG_ON(bstart < 0);
6175     +
6176     + /* adjust bend to bopaque if needed */
6177     + if ((bopaque >= 0) && (bopaque < bend))
6178     + bend = bopaque;
6179     +
6180     + /* lookup all possible dentries */
6181     + for (bindex = bstart; bindex <= bend; bindex++) {
6182     +
6183     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6184     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
6185     +
6186     + /* skip if we already have a positive lower dentry */
6187     + if (lower_dentry) {
6188     + if (dbstart(dentry) < 0)
6189     + dbstart(dentry) = bindex;
6190     + if (bindex > dbend(dentry))
6191     + dbend(dentry) = bindex;
6192     + if (lower_dentry->d_inode)
6193     + num_positive++;
6194     + continue;
6195     + }
6196     +
6197     + lower_dir_dentry =
6198     + unionfs_lower_dentry_idx(parent, bindex);
6199     + /* if the lower dentry's parent does not exist, skip this */
6200     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6201     + continue;
6202     +
6203     + /* also skip it if the parent isn't a directory. */
6204     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6205     + continue; /* XXX: should be BUG_ON */
6206     +
6207     + /* check for whiteouts: stop lookup if found */
6208     + wh_lower_dentry = lookup_whiteout(name, lower_dir_dentry);
6209     + if (IS_ERR(wh_lower_dentry)) {
6210     + err = PTR_ERR(wh_lower_dentry);
6211     + goto out_free;
6212     + }
6213     + if (wh_lower_dentry->d_inode) {
6214     + dbend(dentry) = dbopaque(dentry) = bindex;
6215     + if (dbstart(dentry) < 0)
6216     + dbstart(dentry) = bindex;
6217     + dput(wh_lower_dentry);
6218     + break;
6219     + }
6220     + dput(wh_lower_dentry);
6221     +
6222     + /* Now do regular lookup; lookup @name */
6223     + lower_dir_mnt = unionfs_lower_mnt_idx(parent, bindex);
6224     + lower_mnt = NULL; /* XXX: needed? */
6225     +
6226     + lower_dentry = __lookup_one(lower_dir_dentry, lower_dir_mnt,
6227     + name, &lower_mnt);
6228     +
6229     + if (IS_ERR(lower_dentry)) {
6230     + err = PTR_ERR(lower_dentry);
6231     + goto out_free;
6232     + }
6233     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6234     + if (!lower_mnt)
6235     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root,
6236     + bindex);
6237     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6238     +
6239     + /* adjust dbstart/end */
6240     + if (dbstart(dentry) < 0)
6241     + dbstart(dentry) = bindex;
6242     + if (bindex > dbend(dentry))
6243     + dbend(dentry) = bindex;
6244     + /*
6245     + * We always store the lower dentries above, and update
6246     + * dbstart/dbend, even if the whole unionfs dentry is
6247     + * negative (i.e., no lower inodes).
6248     + */
6249     + if (!lower_dentry->d_inode)
6250     + continue;
6251     + num_positive++;
6252     +
6253     + /*
6254     + * check if we just found an opaque directory, if so, stop
6255     + * lookups here.
6256     + */
6257     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
6258     + continue;
6259     + opaque = is_opaque_dir(dentry, bindex);
6260     + if (opaque < 0) {
6261     + err = opaque;
6262     + goto out_free;
6263     + } else if (opaque) {
6264     + dbend(dentry) = dbopaque(dentry) = bindex;
6265     + break;
6266     + }
6267     + dbend(dentry) = bindex;
6268     +
6269     + /* update parent directory's atime with the bindex */
6270     + fsstack_copy_attr_atime(parent->d_inode,
6271     + lower_dir_dentry->d_inode);
6272     + }
6273     +
6274     + /* sanity checks, then decide whether to process a negative dentry */
6275     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6276     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6277     +
6278     + if (num_positive > 0)
6279     + goto out_positive;
6280     +
6281     + /*** handle NEGATIVE dentries ***/
6282     +
6283     + /*
6284     + * If negative, keep only first lower negative dentry, to save on
6285     + * memory.
6286     + */
6287     + if (dbstart(dentry) < dbend(dentry)) {
6288     + path_put_lowers(dentry, dbstart(dentry) + 1,
6289     + dbend(dentry), false);
6290     + dbend(dentry) = dbstart(dentry);
6291     + }
6292     + if (lookupmode == INTERPOSE_PARTIAL)
6293     + goto out;
6294     + if (lookupmode == INTERPOSE_LOOKUP) {
6295     + /*
6296     + * If all we found was a whiteout in the first available
6297     + * branch, then create a negative dentry for a possibly new
6298     + * file to be created.
6299     + */
6300     + if (dbopaque(dentry) < 0)
6301     + goto out;
6302     + /* XXX: need to get mnt here */
6303     + bindex = dbstart(dentry);
6304     + if (unionfs_lower_dentry_idx(dentry, bindex))
6305     + goto out;
6306     + lower_dir_dentry =
6307     + unionfs_lower_dentry_idx(parent, bindex);
6308     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6309     + goto out;
6310     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6311     + goto out; /* XXX: should be BUG_ON */
6312     + /* XXX: do we need to cross bind mounts here? */
6313     + lower_dentry = lookup_lck_len(name, lower_dir_dentry, namelen);
6314     + if (IS_ERR(lower_dentry)) {
6315     + err = PTR_ERR(lower_dentry);
6316     + goto out;
6317     + }
6318     + /* XXX: need to mntget/mntput as needed too! */
6319     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6320     + /* XXX: wrong mnt for crossing bind mounts! */
6321     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root, bindex);
6322     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6323     +
6324     + goto out;
6325     + }
6326     +
6327     + /* if we're revalidating a positive dentry, don't make it negative */
6328     + if (lookupmode != INTERPOSE_REVAL)
6329     + d_add(dentry, NULL);
6330     +
6331     + goto out;
6332     +
6333     +out_positive:
6334     + /*** handle POSITIVE dentries ***/
6335     +
6336     + /*
6337     + * This unionfs dentry is positive (at least one lower inode
6338     + * exists), so scan entire dentry from beginning to end, and remove
6339     + * any negative lower dentries, if any. Then, update dbstart/dbend
6340     + * to reflect the start/end of positive dentries.
6341     + */
6342     + pos_start = pos_end = -1;
6343     + for (bindex = bstart; bindex <= bend; bindex++) {
6344     + lower_dentry = unionfs_lower_dentry_idx(dentry,
6345     + bindex);
6346     + if (lower_dentry && lower_dentry->d_inode) {
6347     + if (pos_start < 0)
6348     + pos_start = bindex;
6349     + if (bindex > pos_end)
6350     + pos_end = bindex;
6351     + continue;
6352     + }
6353     + path_put_lowers(dentry, bindex, bindex, false);
6354     + }
6355     + if (pos_start >= 0)
6356     + dbstart(dentry) = pos_start;
6357     + if (pos_end >= 0)
6358     + dbend(dentry) = pos_end;
6359     +
6360     + /* Partial lookups need to re-interpose, or throw away older negs. */
6361     + if (lookupmode == INTERPOSE_PARTIAL) {
6362     + if (dentry->d_inode) {
6363     + unionfs_reinterpose(dentry);
6364     + goto out;
6365     + }
6366     +
6367     + /*
6368     + * The unionfs dentry has no inode yet, but positive lower
6369     + * dentries were found: treat it as a negative revalidation.
6370     + */
6371     + lookupmode = INTERPOSE_REVAL_NEG;
6372     + update_bstart(dentry);
6373     + }
6374     +
6375     + /*
6376     + * Interpose can return a dentry if d_splice returned a different
6377     + * dentry.
6378     + */
6379     + d_interposed = unionfs_interpose(dentry, dentry->d_sb, lookupmode);
6380     + if (IS_ERR(d_interposed))
6381     + err = PTR_ERR(d_interposed);
6382     + else if (d_interposed)
6383     + dentry = d_interposed;
6384     +
6385     + if (!err)
6386     + goto out;
6387     + d_drop(dentry);
6388     +
6389     +out_free:
6390     + /* should dput/mntput all the underlying dentries on error condition */
6391     + if (dbstart(dentry) >= 0)
6392     + path_put_lowers_all(dentry, false);
6393     + /* free lower_paths unconditionally */
6394     + kfree(UNIONFS_D(dentry)->lower_paths);
6395     + UNIONFS_D(dentry)->lower_paths = NULL;
6396     +
6397     +out:
6398     + if (dentry && UNIONFS_D(dentry)) {
6399     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6400     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6401     + }
6402     + if (d_interposed && UNIONFS_D(d_interposed)) {
6403     + BUG_ON(dbstart(d_interposed) < 0 && dbend(d_interposed) >= 0);
6404     + BUG_ON(dbstart(d_interposed) >= 0 && dbend(d_interposed) < 0);
6405     + }
6406     +
6407     + if (!err && d_interposed)
6408     + return d_interposed;
6409     + return ERR_PTR(err);
6410     +}
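The branch scan in unionfs_lookup_full is easier to follow in isolation. The following is a simplified userspace model, not the kernel code: each branch is reduced to a single state for the name being looked up, and the function reports the resulting start/end/opaque indices. The `demo_*` names are hypothetical. The model deliberately ignores branches whose parent directory is missing, the parent's own opaque limit, reference counting, and all error paths.

```c
#include <assert.h>

/* Hypothetical per-branch state of one name; not a unionfs type. */
enum demo_entry {
	DEMO_ABSENT,      /* negative lower dentry */
	DEMO_FILE,
	DEMO_DIR,
	DEMO_OPAQUE_DIR,  /* directory that hides lower branches */
	DEMO_WHITEOUT     /* name was deleted in this branch */
};

struct demo_scan {
	int start;        /* models dbstart(dentry), -1 if unset */
	int end;          /* models dbend(dentry), -1 if unset */
	int opaque;       /* models dbopaque(dentry), -1 if unset */
	int num_positive; /* branches where the name exists */
};

struct demo_scan demo_scan_branches(const enum demo_entry *br,
				    int bstart, int bend)
{
	struct demo_scan r = { -1, -1, -1, 0 };
	int pos_start = -1, pos_end = -1;
	int i;

	assert(bstart >= 0); /* mirrors BUG_ON(bstart < 0) */

	for (i = bstart; i <= bend; i++) {
		/* a whiteout stops the scan at this branch */
		if (br[i] == DEMO_WHITEOUT) {
			r.end = r.opaque = i;
			if (r.start < 0)
				r.start = i;
			break;
		}
		/* negative entries are recorded too */
		if (r.start < 0)
			r.start = i;
		if (i > r.end)
			r.end = i;
		if (br[i] == DEMO_ABSENT)
			continue;
		r.num_positive++;
		if (pos_start < 0)
			pos_start = i;
		pos_end = i;
		/* an opaque directory also stops the scan */
		if (br[i] == DEMO_OPAQUE_DIR) {
			r.end = r.opaque = i;
			break;
		}
	}

	if (r.num_positive == 0) {
		/* negative: keep only the first recorded branch */
		if (r.start < r.end)
			r.end = r.start;
	} else {
		/* positive: narrow the range to existing entries */
		r.start = pos_start;
		r.end = pos_end;
	}
	return r;
}
```

For example, with branches {absent, file, whiteout, file} scanned from 0 to 3, the model yields start 1, end 1, opaque 2 and one positive entry: the whiteout in branch 2 hides the file in branch 3, and the leading negative entry is dropped once a positive one is known.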
6411     diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
6412     new file mode 100644
6413     index 0000000..258386e
6414     --- /dev/null
6415     +++ b/fs/unionfs/main.c
6416     @@ -0,0 +1,758 @@
6417     +/*
6418     + * Copyright (c) 2003-2010 Erez Zadok
6419     + * Copyright (c) 2003-2006 Charles P. Wright
6420     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
6421     + * Copyright (c) 2005-2006 Junjiro Okajima
6422     + * Copyright (c) 2005 Arun M. Krishnakumar
6423     + * Copyright (c) 2004-2006 David P. Quigley
6424     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
6425     + * Copyright (c) 2003 Puja Gupta
6426     + * Copyright (c) 2003 Harikesavan Krishnan
6427     + * Copyright (c) 2003-2010 Stony Brook University
6428     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
6429     + *
6430     + * This program is free software; you can redistribute it and/or modify
6431     + * it under the terms of the GNU General Public License version 2 as
6432     + * published by the Free Software Foundation.
6433     + */
6434     +
6435     +#include "union.h"
6436     +#include <linux/module.h>
6437     +#include <linux/moduleparam.h>
6438     +
6439     +static void unionfs_fill_inode(struct dentry *dentry,
6440     + struct inode *inode)
6441     +{
6442     + struct inode *lower_inode;
6443     + struct dentry *lower_dentry;
6444     + int bindex, bstart, bend;
6445     +
6446     + bstart = dbstart(dentry);
6447     + bend = dbend(dentry);
6448     +
6449     + for (bindex = bstart; bindex <= bend; bindex++) {
6450     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6451     + if (!lower_dentry) {
6452     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
6453     + continue;
6454     + }
6455     +
6456     + /* Initialize the lower inode to the new lower inode. */
6457     + if (!lower_dentry->d_inode)
6458     + continue;
6459     +
6460     + unionfs_set_lower_inode_idx(inode, bindex,
6461     + igrab(lower_dentry->d_inode));
6462     + }
6463     +
6464     + ibstart(inode) = dbstart(dentry);
6465     + ibend(inode) = dbend(dentry);
6466     +
6467     + /* Use attributes from the first branch. */
6468     + lower_inode = unionfs_lower_inode(inode);
6469     +
6470     + /* Use different set of inode ops for symlinks & directories */
6471     + if (S_ISLNK(lower_inode->i_mode))
6472     + inode->i_op = &unionfs_symlink_iops;
6473     + else if (S_ISDIR(lower_inode->i_mode))
6474     + inode->i_op = &unionfs_dir_iops;
6475     +
6476     + /* Use different set of file ops for directories */
6477     + if (S_ISDIR(lower_inode->i_mode))
6478     + inode->i_fop = &unionfs_dir_fops;
6479     +
6480     + /* properly initialize special inodes */
6481     + if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
6482     + S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
6483     + init_special_inode(inode, lower_inode->i_mode,
6484     + lower_inode->i_rdev);
6485     +
6486     + /* all well, copy inode attributes */
6487     + unionfs_copy_attr_all(inode, lower_inode);
6488     + fsstack_copy_inode_size(inode, lower_inode);
6489     +}
6490     +
6491     +/*
6492     + * Connect a unionfs dentry/inode with several lower ones. This is
6493     + * the classic stackable file system "vnode interposition" action.
6494     + *
6495     + * @sb: unionfs's super_block
6496     + */
6497     +struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
6498     + int flag)
6499     +{
6500     + int err = 0;
6501     + struct inode *inode;
6502     + int need_fill_inode = 1;
6503     + struct dentry *spliced = NULL;
6504     +
6505     + verify_locked(dentry);
6506     +
6507     + /*
6508     + * We allocate our new inode below by calling unionfs_iget,
6509     + * which will initialize some of the new inode's fields
6510     + */
6511     +
6512     + /*
6513     + * On revalidate we've already got our own inode and just need
6514     + * to fix it up.
6515     + */
6516     + if (flag == INTERPOSE_REVAL) {
6517     + inode = dentry->d_inode;
6518     + UNIONFS_I(inode)->bstart = -1;
6519     + UNIONFS_I(inode)->bend = -1;
6520     + atomic_set(&UNIONFS_I(inode)->generation,
6521     + atomic_read(&UNIONFS_SB(sb)->generation));
6522     +
6523     + UNIONFS_I(inode)->lower_inodes =
6524     + kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
6525     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
6526     + err = -ENOMEM;
6527     + goto out;
6528     + }
6529     + } else {
6530     + /* get unique inode number for unionfs */
6531     + inode = unionfs_iget(sb, iunique(sb, UNIONFS_ROOT_INO));
6532     + if (IS_ERR(inode)) {
6533     + err = PTR_ERR(inode);
6534     + goto out;
6535     + }
6536     + if (atomic_read(&inode->i_count) > 1)
6537     + goto skip;
6538     + }
6539     +
6540     + need_fill_inode = 0;
6541     + unionfs_fill_inode(dentry, inode);
6542     +
6543     +skip:
6544     + /* only (our) lookup wants to do a d_add */
6545     + switch (flag) {
6546     + case INTERPOSE_DEFAULT:
6547     + /* for operations which create new inodes */
6548     + d_add(dentry, inode);
6549     + break;
6550     + case INTERPOSE_REVAL_NEG:
6551     + d_instantiate(dentry, inode);
6552     + break;
6553     + case INTERPOSE_LOOKUP:
6554     + spliced = d_splice_alias(inode, dentry);
6555     + if (spliced && spliced != dentry) {
6556     + /*
6557     + * d_splice can return a dentry if it was
6558     + * disconnected and had to be moved. We must ensure
6559     + * that the private data of the new dentry is
6560     + * correct and that the inode info was filled
6561     + * properly. Finally we must return this new
6562     + * dentry.
6563     + */
6564     + spliced->d_op = &unionfs_dops;
6565     + spliced->d_fsdata = dentry->d_fsdata;
6566     + dentry->d_fsdata = NULL;
6567     + dentry = spliced;
6568     + if (need_fill_inode) {
6569     + need_fill_inode = 0;
6570     + unionfs_fill_inode(dentry, inode);
6571     + }
6572     + goto out_spliced;
6573     + } else if (!spliced) {
6574     + if (need_fill_inode) {
6575     + need_fill_inode = 0;
6576     + unionfs_fill_inode(dentry, inode);
6577     + goto out_spliced;
6578     + }
6579     + }
6580     + break;
6581     + case INTERPOSE_REVAL:
6582     + /* Do nothing. */
6583     + break;
6584     + default:
6585     + printk(KERN_CRIT "unionfs: invalid interpose flag passed!\n");
6586     + BUG();
6587     + }
6588     + goto out;
6589     +
6590     +out_spliced:
6591     + if (!err)
6592     + return spliced;
6593     +out:
6594     + return ERR_PTR(err);
6595     +}
6596     +
6597     +/* like interpose above, but for an already existing dentry */
6598     +void unionfs_reinterpose(struct dentry *dentry)
6599     +{
6600     + struct dentry *lower_dentry;
6601     + struct inode *inode;
6602     + int bindex, bstart, bend;
6603     +
6604     + verify_locked(dentry);
6605     +
6606     + /* This is a pre-allocated inode */
6607     + inode = dentry->d_inode;
6608     +
6609     + bstart = dbstart(dentry);
6610     + bend = dbend(dentry);
6611     + for (bindex = bstart; bindex <= bend; bindex++) {
6612     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6613     + if (!lower_dentry)
6614     + continue;
6615     +
6616     + if (!lower_dentry->d_inode)
6617     + continue;
6618     + if (unionfs_lower_inode_idx(inode, bindex))
6619     + continue;
6620     + unionfs_set_lower_inode_idx(inode, bindex,
6621     + igrab(lower_dentry->d_inode));
6622     + }
6623     + ibstart(inode) = dbstart(dentry);
6624     + ibend(inode) = dbend(dentry);
6625     +}
6626     +
6627     +/*
6628     + * make sure the branch we just looked up (nd) makes sense:
6629     + *
6630     + * 1) we're not trying to stack unionfs on top of unionfs
6631     + * 2) it exists
6632     + * 3) is a directory
6633     + */
6634     +int check_branch(struct nameidata *nd)
6635     +{
6636     + /* XXX: remove in ODF code -- stacking unions allowed there */
6637     + if (!strcmp(nd->path.dentry->d_sb->s_type->name, UNIONFS_NAME))
6638     + return -EINVAL;
6639     + if (!nd->path.dentry->d_inode)
6640     + return -ENOENT;
6641     + if (!S_ISDIR(nd->path.dentry->d_inode->i_mode))
6642     + return -ENOTDIR;
6643     + return 0;
6644     +}
6645     +
6646     +/* checks if two lower_dentries have overlapping branches */
6647     +static int is_branch_overlap(struct dentry *dent1, struct dentry *dent2)
6648     +{
6649     + struct dentry *dent = NULL;
6650     +
6651     + dent = dent1;
6652     + while ((dent != dent2) && (dent->d_parent != dent))
6653     + dent = dent->d_parent;
6654     +
6655     + if (dent == dent2)
6656     + return 1;
6657     +
6658     + dent = dent2;
6659     + while ((dent != dent1) && (dent->d_parent != dent))
6660     + dent = dent->d_parent;
6661     +
6662     + return (dent == dent1);
6663     +}
6664     +
6665     +/*
6666     + * Parse "ro" or "rw" options, but default to "rw" if no mode option was
6667     + * specified. Fill the mode bits in @perms. If we encounter an unknown
6668     + * string, return -EINVAL. Otherwise return 0.
6669     + */
6670     +int parse_branch_mode(const char *name, int *perms)
6671     +{
6672     + if (!name || !strcmp(name, "rw")) {
6673     + *perms = MAY_READ | MAY_WRITE;
6674     + return 0;
6675     + }
6676     + if (!strcmp(name, "ro")) {
6677     + *perms = MAY_READ;
6678     + return 0;
6679     + }
6680     + return -EINVAL;
6681     +}
6682     +
6683     +/*
6684     + * parse the dirs= mount argument
6685     + *
6686     + * We don't need to lock the superblock private data's rwsem, as we get
6687     + * called only by unionfs_read_super - it is still a long time before anyone
6688     + * can even get a reference to us.
6689     + */
6690     +static int parse_dirs_option(struct super_block *sb, struct unionfs_dentry_info
6691     + *lower_root_info, char *options)
6692     +{
6693     + struct nameidata nd;
6694     + char *name;
6695     + int err = 0;
6696     + int branches = 1;
6697     + int bindex = 0;
6698     + int i = 0;
6699     + int j = 0;
6700     + struct dentry *dent1;
6701     + struct dentry *dent2;
6702     +
6703     + if (options[0] == '\0') {
6704     + printk(KERN_ERR "unionfs: no branches specified\n");
6705     + err = -EINVAL;
6706     + goto out;
6707     + }
6708     +
6709     + /*
6710     + * Each colon is a separator; this is only a rough count of the
6711     + * branches, since strsep will handle empty fields for us.
6712     + */
6713     + for (i = 0; options[i]; i++)
6714     + if (options[i] == ':')
6715     + branches++;
6716     +
6717     + /* allocate space for underlying pointers to lower dentry */
6718     + UNIONFS_SB(sb)->data =
6719     + kcalloc(branches, sizeof(struct unionfs_data), GFP_KERNEL);
6720     + if (unlikely(!UNIONFS_SB(sb)->data)) {
6721     + err = -ENOMEM;
6722     + goto out;
6723     + }
6724     +
6725     + lower_root_info->lower_paths =
6726     + kcalloc(branches, sizeof(struct path), GFP_KERNEL);
6727     + if (unlikely(!lower_root_info->lower_paths)) {
6728     + err = -ENOMEM;
6729     + goto out;
6730     + }
6731     +
6732     + /* now parsing a string such as "b1:b2=rw:b3=ro:b4" */
6733     + branches = 0;
6734     + while ((name = strsep(&options, ":")) != NULL) {
6735     + int perms;
6736     + char *mode = strchr(name, '=');
6737     +
6738     + if (!name)
6739     + continue;
6740     + if (!*name) { /* bad use of ':' (extra colons) */
6741     + err = -EINVAL;
6742     + goto out;
6743     + }
6744     +
6745     + branches++;
6746     +
6747     + /* strip off '=' if any */
6748     + if (mode)
6749     + *mode++ = '\0';
6750     +
6751     + err = parse_branch_mode(mode, &perms);
6752     + if (err) {
6753     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
6754     + "branch %d\n", mode, bindex);
6755     + goto out;
6756     + }
6757     + /* ensure that leftmost branch is writeable */
6758     + if (!bindex && !(perms & MAY_WRITE)) {
6759     + printk(KERN_ERR "unionfs: leftmost branch cannot be "
6760     + "read-only (use \"-o ro\" to create a "
6761     + "read-only union)\n");
6762     + err = -EINVAL;
6763     + goto out;
6764     + }
6765     +
6766     + err = path_lookup(name, LOOKUP_FOLLOW, &nd);
6767     + if (err) {
6768     + printk(KERN_ERR "unionfs: error accessing "
6769     + "lower directory '%s' (error %d)\n",
6770     + name, err);
6771     + goto out;
6772     + }
6773     +
6774     + err = check_branch(&nd);
6775     + if (err) {
6776     + printk(KERN_ERR "unionfs: lower directory "
6777     + "'%s' is not a valid branch\n", name);
6778     + path_put(&nd.path);
6779     + goto out;
6780     + }
6781     +
6782     + lower_root_info->lower_paths[bindex].dentry = nd.path.dentry;
6783     + lower_root_info->lower_paths[bindex].mnt = nd.path.mnt;
6784     +
6785     + set_branchperms(sb, bindex, perms);
6786     + set_branch_count(sb, bindex, 0);
6787     + new_branch_id(sb, bindex);
6788     +
6789     + if (lower_root_info->bstart < 0)
6790     + lower_root_info->bstart = bindex;
6791     + lower_root_info->bend = bindex;
6792     + bindex++;
6793     + }
6794     +
6795     + if (branches == 0) {
6796     + printk(KERN_ERR "unionfs: no branches specified\n");
6797     + err = -EINVAL;
6798     + goto out;
6799     + }
6800     +
6801     + BUG_ON(branches != (lower_root_info->bend + 1));
6802     +
6803     + /*
6804     + * Ensure that no overlaps exist in the branches.
6805     + *
6806     + * This test is required because the Linux kernel has no support
6807     + * currently for ensuring coherency between stackable layers and
6808     + * branches. If we were to allow overlapping branches, it would be
6809     + * possible, for example, to delete a file via one branch, which
6810     + * would not be reflected in another branch. Such incoherency could
6811     + * lead to inconsistencies and even kernel oopses. Rather than
6812     + * implement hacks to work around some of these cache-coherency
6813     + * problems, we prevent branch overlapping, for now. A complete
6814     + * solution will involve proper kernel/VFS support for cache
6815     + * coherency, at which time we could safely remove this
6816     + * branch-overlapping test.
6817     + */
6818     + for (i = 0; i < branches; i++) {
6819     + dent1 = lower_root_info->lower_paths[i].dentry;
6820     + for (j = i + 1; j < branches; j++) {
6821     + dent2 = lower_root_info->lower_paths[j].dentry;
6822     + if (is_branch_overlap(dent1, dent2)) {
6823     + printk(KERN_ERR "unionfs: branches %d and "
6824     + "%d overlap\n", i, j);
6825     + err = -EINVAL;
6826     + goto out;
6827     + }
6828     + }
6829     + }
6830     +
6831     +out:
6832     + if (err) {
6833     + for (i = 0; i < branches; i++)
6834     + path_put(&lower_root_info->lower_paths[i]);
6835     +
6836     + kfree(lower_root_info->lower_paths);
6837     + kfree(UNIONFS_SB(sb)->data);
6838     +
6839     + /*
6840     + * MUST clear the pointers to prevent potential double free if
6841     + * the caller dies later on
6842     + */
6843     + lower_root_info->lower_paths = NULL;
6844     + UNIONFS_SB(sb)->data = NULL;
6845     + }
6846     + return err;
6847     +}
6848     +
6849     +/*
6850     + * Parse mount options. See the manual page for usage instructions.
6851     + *
6852     + * Returns a unionfs_dentry_info describing the lower directories, or an
6853     + * ERR_PTR() on failure. We mount our stackable file system on top of them.
6854     + */
6855     +static struct unionfs_dentry_info *unionfs_parse_options(
6856     + struct super_block *sb,
6857     + char *options)
6858     +{
6859     + struct unionfs_dentry_info *lower_root_info;
6860     + char *optname;
6861     + int err = 0;
6862     + int bindex;
6863     + int dirsfound = 0;
6864     +
6865     + /* allocate private data area */
6866     + err = -ENOMEM;
6867     + lower_root_info =
6868     + kzalloc(sizeof(struct unionfs_dentry_info), GFP_KERNEL);
6869     + if (unlikely(!lower_root_info))
6870     + goto out_error;
6871     + lower_root_info->bstart = -1;
6872     + lower_root_info->bend = -1;
6873     + lower_root_info->bopaque = -1;
6874     +
6875     + while ((optname = strsep(&options, ",")) != NULL) {
6876     + char *optarg;
6877     +
6878     + if (!optname || !*optname)
6879     + continue;
6880     +
6881     + optarg = strchr(optname, '=');
6882     + if (optarg)
6883     + *optarg++ = '\0';
6884     +
6885     + /*
6886     + * All of our options take an argument now. Insert ones that
6887     + * don't, above this check.
6888     + */
6889     + if (!optarg) {
6890     + printk(KERN_ERR "unionfs: %s requires an argument\n",
6891     + optname);
6892     + err = -EINVAL;
6893     + goto out_error;
6894     + }
6895     +
6896     + if (!strcmp("dirs", optname)) {
6897     + if (++dirsfound > 1) {
6898     + printk(KERN_ERR
6899     + "unionfs: multiple dirs specified\n");
6900     + err = -EINVAL;
6901     + goto out_error;
6902     + }
6903     + err = parse_dirs_option(sb, lower_root_info, optarg);
6904     + if (err)
6905     + goto out_error;
6906     + continue;
6907     + }
6908     +
6909     + err = -EINVAL;
6910     + printk(KERN_ERR
6911     + "unionfs: unrecognized option '%s'\n", optname);
6912     + goto out_error;
6913     + }
6914     + if (dirsfound != 1) {
6915     + printk(KERN_ERR "unionfs: dirs option required\n");
6916     + err = -EINVAL;
6917     + goto out_error;
6918     + }
6919     + goto out;
6920     +
6921     +out_error:
6922     + if (lower_root_info && lower_root_info->lower_paths) {
6923     + for (bindex = lower_root_info->bstart;
6924     + bindex >= 0 && bindex <= lower_root_info->bend;
6925     + bindex++)
6926     + path_put(&lower_root_info->lower_paths[bindex]);
6927     + }
6928     + if (lower_root_info)
6929     + kfree(lower_root_info->lower_paths);
6930     + kfree(lower_root_info);
6931     +
6932     + kfree(UNIONFS_SB(sb)->data);
6933     + UNIONFS_SB(sb)->data = NULL;
6934     +
6935     + lower_root_info = ERR_PTR(err);
6936     +out:
6937     + return lower_root_info;
6938     +}
6939     +
6940     +/*
6941     + * our custom d_alloc_root work-alike
6942     + *
6943     + * we can't use d_alloc_root if we want to use our own interpose function
6944     + * unchanged, so we simply call our own "fake" d_alloc_root
6945     + */
6946     +static struct dentry *unionfs_d_alloc_root(struct super_block *sb)
6947     +{
6948     + struct dentry *ret = NULL;
6949     +
6950     + if (sb) {
6951     + static const struct qstr name = {
6952     + .name = "/",
6953     + .len = 1
6954     + };
6955     +
6956     + ret = d_alloc(NULL, &name);
6957     + if (likely(ret)) {
6958     + ret->d_op = &unionfs_dops;
6959     + ret->d_sb = sb;
6960     + ret->d_parent = ret;
6961     + }
6962     + }
6963     + return ret;
6964     +}
6965     +
6966     +/*
6967     + * There is no need to lock the unionfs_super_info's rwsem as there is no
6968     + * way anyone can have a reference to the superblock at this point in time.
6969     + */
6970     +static int unionfs_read_super(struct super_block *sb, void *raw_data,
6971     + int silent)
6972     +{
6973     + int err = 0;
6974     + struct unionfs_dentry_info *lower_root_info = NULL;
6975     + int bindex, bstart, bend;
6976     +
6977     + if (!raw_data) {
6978     + printk(KERN_ERR
6979     + "unionfs: read_super: missing data argument\n");
6980     + err = -EINVAL;
6981     + goto out;
6982     + }
6983     +
6984     + /* Allocate superblock private data */
6985     + sb->s_fs_info = kzalloc(sizeof(struct unionfs_sb_info), GFP_KERNEL);
6986     + if (unlikely(!UNIONFS_SB(sb))) {
6987     + printk(KERN_CRIT "unionfs: read_super: out of memory\n");
6988     + err = -ENOMEM;
6989     + goto out;
6990     + }
6991     +
6992     + UNIONFS_SB(sb)->bend = -1;
6993     + atomic_set(&UNIONFS_SB(sb)->generation, 1);
6994     + init_rwsem(&UNIONFS_SB(sb)->rwsem);
6995     + UNIONFS_SB(sb)->high_branch_id = -1; /* -1 == invalid branch ID */
6996     +
6997     + lower_root_info = unionfs_parse_options(sb, raw_data);
6998     + if (IS_ERR(lower_root_info)) {
6999     + printk(KERN_ERR
7000     + "unionfs: read_super: error while parsing options "
7001     + "(err = %ld)\n", PTR_ERR(lower_root_info));
7002     + err = PTR_ERR(lower_root_info);
7003     + lower_root_info = NULL;
7004     + goto out_free;
7005     + }
7006     + if (lower_root_info->bstart == -1) {
7007     + err = -ENOENT;
7008     + goto out_free;
7009     + }
7010     +
7011     + /* set the lower superblock field of upper superblock */
7012     + bstart = lower_root_info->bstart;
7013     + BUG_ON(bstart != 0);
7014     + sbend(sb) = bend = lower_root_info->bend;
7015     + for (bindex = bstart; bindex <= bend; bindex++) {
7016     + struct dentry *d = lower_root_info->lower_paths[bindex].dentry;
7017     + atomic_inc(&d->d_sb->s_active);
7018     + unionfs_set_lower_super_idx(sb, bindex, d->d_sb);
7019     + }
7020     +
7021     + /* s_maxbytes is inherited from the highest-priority branch (index 0) */
7022     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
7023     +
7024     + /*
7025     + * Our c/m/atime granularity is 1 ns because we may stack on file
7026     + * systems whose granularity is as good. This is important for our
7027     + * time-based cache coherency.
7028     + */
7029     + sb->s_time_gran = 1;
7030     +
7031     + sb->s_op = &unionfs_sops;
7032     +
7033     + /* See comment next to the definition of unionfs_d_alloc_root */
7034     + sb->s_root = unionfs_d_alloc_root(sb);
7035     + if (unlikely(!sb->s_root)) {
7036     + err = -ENOMEM;
7037     + goto out_dput;
7038     + }
7039     +
7040     + /* link the upper and lower dentries */
7041     + sb->s_root->d_fsdata = NULL;
7042     + err = new_dentry_private_data(sb->s_root, UNIONFS_DMUTEX_ROOT);
7043     + if (unlikely(err))
7044     + goto out_freedpd;
7045     +
7046     + /* Set the lower dentries for s_root */
7047     + for (bindex = bstart; bindex <= bend; bindex++) {
7048     + struct dentry *d;
7049     + struct vfsmount *m;
7050     +
7051     + d = lower_root_info->lower_paths[bindex].dentry;
7052     + m = lower_root_info->lower_paths[bindex].mnt;
7053     +
7054     + unionfs_set_lower_dentry_idx(sb->s_root, bindex, d);
7055     + unionfs_set_lower_mnt_idx(sb->s_root, bindex, m);
7056     + }
7057     + dbstart(sb->s_root) = bstart;
7058     + dbend(sb->s_root) = bend;
7059     +
7060     + /* Set the generation number to one, since this is for the mount. */
7061     + atomic_set(&UNIONFS_D(sb->s_root)->generation, 1);
7062     +
7063     + /*
7064     + * Call interpose to create the upper level inode. Only
7065     + * INTERPOSE_LOOKUP can return a value other than 0 on err.
7066     + */
7067     + err = PTR_ERR(unionfs_interpose(sb->s_root, sb, 0));
7068     + unionfs_unlock_dentry(sb->s_root);
7069     + if (!err)
7070     + goto out;
7071     + /* else fall through */
7072     +
7073     +out_freedpd:
7074     + if (UNIONFS_D(sb->s_root)) {
7075     + kfree(UNIONFS_D(sb->s_root)->lower_paths);
7076     + free_dentry_private_data(sb->s_root);
7077     + }
7078     + dput(sb->s_root);
7079     +
7080     +out_dput:
7081     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7082     + for (bindex = lower_root_info->bstart;
7083     + bindex <= lower_root_info->bend; bindex++) {
7084     + struct dentry *d;
7085     + d = lower_root_info->lower_paths[bindex].dentry;
7086     + /* drop refs we took earlier */
7087     + atomic_dec(&d->d_sb->s_active);
7088     + path_put(&lower_root_info->lower_paths[bindex]);
7089     + }
7090     + kfree(lower_root_info->lower_paths);
7091     + kfree(lower_root_info);
7092     + lower_root_info = NULL;
7093     + }
7094     +
7095     +out_free:
7096     + kfree(UNIONFS_SB(sb)->data);
7097     + kfree(UNIONFS_SB(sb));
7098     + sb->s_fs_info = NULL;
7099     +
7100     +out:
7101     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7102     + kfree(lower_root_info->lower_paths);
7103     + kfree(lower_root_info);
7104     + }
7105     + return err;
7106     +}
7107     +
7108     +static int unionfs_get_sb(struct file_system_type *fs_type,
7109     + int flags, const char *dev_name,
7110     + void *raw_data, struct vfsmount *mnt)
7111     +{
7112     + int err;
7113     + err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
7114     + if (!err)
7115     + UNIONFS_SB(mnt->mnt_sb)->dev_name =
7116     + kstrdup(dev_name, GFP_KERNEL);
7117     + return err;
7118     +}
7119     +
7120     +static struct file_system_type unionfs_fs_type = {
7121     + .owner = THIS_MODULE,
7122     + .name = UNIONFS_NAME,
7123     + .get_sb = unionfs_get_sb,
7124     + .kill_sb = generic_shutdown_super,
7125     + .fs_flags = FS_REVAL_DOT,
7126     +};
7127     +
7128     +static int __init init_unionfs_fs(void)
7129     +{
7130     + int err;
7131     +
7132     + pr_info("Registering unionfs " UNIONFS_VERSION "\n");
7133     +
7134     + err = unionfs_init_filldir_cache();
7135     + if (unlikely(err))
7136     + goto out;
7137     + err = unionfs_init_inode_cache();
7138     + if (unlikely(err))
7139     + goto out;
7140     + err = unionfs_init_dentry_cache();
7141     + if (unlikely(err))
7142     + goto out;
7143     + err = init_sioq();
7144     + if (unlikely(err))
7145     + goto out;
7146     + err = register_filesystem(&unionfs_fs_type);
7147     +out:
7148     + if (unlikely(err)) {
7149     + stop_sioq();
7150     + unionfs_destroy_filldir_cache();
7151     + unionfs_destroy_inode_cache();
7152     + unionfs_destroy_dentry_cache();
7153     + }
7154     + return err;
7155     +}
7156     +
7157     +static void __exit exit_unionfs_fs(void)
7158     +{
7159     + stop_sioq();
7160     + unionfs_destroy_filldir_cache();
7161     + unionfs_destroy_inode_cache();
7162     + unionfs_destroy_dentry_cache();
7163     + unregister_filesystem(&unionfs_fs_type);
7164     + pr_info("Completed unionfs module unload\n");
7165     +}
7166     +
7167     +MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University"
7168     + " (http://www.fsl.cs.sunysb.edu)");
7169     +MODULE_DESCRIPTION("Unionfs " UNIONFS_VERSION
7170     + " (http://unionfs.filesystems.org)");
7171     +MODULE_LICENSE("GPL");
7172     +
7173     +module_init(init_unionfs_fs);
7174     +module_exit(exit_unionfs_fs);
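The dirs= parsing in parse_dirs_option() above can be tried outside the kernel. What follows is a minimal user-space sketch, not the unionfs code itself: next_field(), sketch_parse_mode(), sketch_parse_dirs() and the SKETCH_* constants are hypothetical names introduced here, next_field() is a stand-in for strsep() with a single delimiter, and the rule that a missing mode means read-write is an assumption, since only the tail of parse_branch_mode() is visible in this section. Path lookup, branch validation and the overlap test are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical permission bits standing in for MAY_WRITE and MAY_READ. */
#define SKETCH_MAY_WRITE 2
#define SKETCH_MAY_READ  4

/* Hypothetical stand-in for strsep(3) with one delimiter character. */
static char *next_field(char **cursor, char delim)
{
	char *start = *cursor;
	char *end;

	if (!start)
		return NULL;
	end = strchr(start, delim);
	if (end) {
		*end = '\0';
		*cursor = end + 1;
	} else {
		*cursor = NULL;
	}
	return start;
}

/* Assumption: a missing mode defaults to read-write. */
static int sketch_parse_mode(const char *mode, int *perms)
{
	if (!mode || !strcmp(mode, "rw")) {
		*perms = SKETCH_MAY_READ | SKETCH_MAY_WRITE;
		return 0;
	}
	if (!strcmp(mode, "ro")) {
		*perms = SKETCH_MAY_READ;
		return 0;
	}
	return -1;
}

/*
 * Split a string such as "b1:b2=rw:b3=ro:b4" in place. Returns the number
 * of branches, or -1 on an empty field, an unknown mode, a read-only
 * leftmost branch, or more than max branches.
 */
static int sketch_parse_dirs(char *options, const char **names, int *perms,
			     int max)
{
	int n = 0;
	char *name;

	assert(options && names && perms);
	while ((name = next_field(&options, ':')) != NULL) {
		char *mode = strchr(name, '=');

		if (!*name || n >= max)	/* extra colons or too many fields */
			return -1;
		if (mode)		/* strip off '=' if any */
			*mode++ = '\0';
		if (sketch_parse_mode(mode, &perms[n]))
			return -1;
		if (n == 0 && !(perms[0] & SKETCH_MAY_WRITE))
			return -1;	/* leftmost branch must be writable */
		names[n++] = name;
	}
	return n;
}
```

Called on a writable copy of "b1:b2=rw:b3=ro:b4", the sketch reports four branches, with the third marked read-only. Like the kernel function, it modifies the option string in place, so the caller must not pass a string literal.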
7175     diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
7176     new file mode 100644
7177     index 0000000..1f70535
7178     --- /dev/null
7179     +++ b/fs/unionfs/mmap.c
7180     @@ -0,0 +1,89 @@
7181     +/*
7182     + * Copyright (c) 2003-2010 Erez Zadok
7183     + * Copyright (c) 2003-2006 Charles P. Wright
7184     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7185     + * Copyright (c) 2005-2006 Junjiro Okajima
7186     + * Copyright (c) 2006 Shaya Potter
7187     + * Copyright (c) 2005 Arun M. Krishnakumar
7188     + * Copyright (c) 2004-2006 David P. Quigley
7189     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7190     + * Copyright (c) 2003 Puja Gupta
7191     + * Copyright (c) 2003 Harikesavan Krishnan
7192     + * Copyright (c) 2003-2010 Stony Brook University
7193     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7194     + *
7195     + * This program is free software; you can redistribute it and/or modify
7196     + * it under the terms of the GNU General Public License version 2 as
7197     + * published by the Free Software Foundation.
7198     + */
7199     +
7200     +#include "union.h"
7201     +
7202     +
7203     +/*
7204     + * XXX: we need a dummy readpage handler because generic_file_mmap (which we
7205     + * use in unionfs_mmap) checks for the existence of
7206     + * mapping->a_ops->readpage, else it returns -ENOEXEC. The VFS will need to
7207     + * be fixed to allow a file system to define vm_ops->fault without any
7208     + * address_space_ops whatsoever.
7209     + *
7210     + * Otherwise, we don't want to use our readpage method at all.
7211     + */
7212     +static int unionfs_readpage(struct file *file, struct page *page)
7213     +{
7214     + BUG();
7215     + return -EINVAL;
7216     +}
7217     +
7218     +static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
7219     +{
7220     + int err;
7221     + struct file *file, *lower_file;
7222     + const struct vm_operations_struct *lower_vm_ops;
7223     + struct vm_area_struct lower_vma;
7224     +
7225     + BUG_ON(!vma);
7226     + memcpy(&lower_vma, vma, sizeof(struct vm_area_struct));
7227     + file = lower_vma.vm_file;
7228     + lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
7229     + BUG_ON(!lower_vm_ops);
7230     +
7231     + lower_file = unionfs_lower_file(file);
7232     + BUG_ON(!lower_file);
7233     + /*
7234     + * XXX: vm_ops->fault may be called in parallel. Because we have to
7235     + * resort to temporarily changing the vma->vm_file to point to the
7236     + * lower file, a concurrent invocation of unionfs_fault could see a
7237     + * different value. In this workaround, we keep a different copy of
7238     + * the vma structure in our stack, so we never expose a different
7239     + * value of the vma->vm_file called to us, even temporarily. A
7240     + * better fix would be to change the calling semantics of ->fault to
7241     + * take an explicit file pointer.
7242     + */
7243     + lower_vma.vm_file = lower_file;
7244     + err = lower_vm_ops->fault(&lower_vma, vmf);
7245     + return err;
7246     +}
7247     +
7248     +/*
7249     + * XXX: the default address_space_ops for unionfs is empty. We cannot set
7250     + * our inode->i_mapping->a_ops to NULL because too many code paths expect
7251     + * the a_ops vector to be non-NULL.
7252     + */
7253     +struct address_space_operations unionfs_aops = {
7254     + /* empty on purpose */
7255     +};
7256     +
7257     +/*
7258     + * XXX: we need a second, dummy address_space_ops vector, to be used
7259     + * temporarily during unionfs_mmap, because the latter calls
7260     + * generic_file_mmap, which checks if ->readpage exists, else returns
7261     + * -ENOEXEC.
7262     + */
7263     +struct address_space_operations unionfs_dummy_aops = {
7264     + .readpage = unionfs_readpage,
7265     +};
7266     +
7267     +struct vm_operations_struct unionfs_vm_ops = {
7268     + .fault = unionfs_fault,
7269     +};
7270     diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
7271     new file mode 100644
7272     index 0000000..f745fbc
7273     --- /dev/null
7274     +++ b/fs/unionfs/rdstate.c
7275     @@ -0,0 +1,285 @@
7276     +/*
7277     + * Copyright (c) 2003-2010 Erez Zadok
7278     + * Copyright (c) 2003-2006 Charles P. Wright
7279     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7280     + * Copyright (c) 2005-2006 Junjiro Okajima
7281     + * Copyright (c) 2005 Arun M. Krishnakumar
7282     + * Copyright (c) 2004-2006 David P. Quigley
7283     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7284     + * Copyright (c) 2003 Puja Gupta
7285     + * Copyright (c) 2003 Harikesavan Krishnan
7286     + * Copyright (c) 2003-2010 Stony Brook University
7287     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7288     + *
7289     + * This program is free software; you can redistribute it and/or modify
7290     + * it under the terms of the GNU General Public License version 2 as
7291     + * published by the Free Software Foundation.
7292     + */
7293     +
7294     +#include "union.h"
7295     +
7296     +/* This file contains the routines for maintaining readdir state. */
7297     +
7298     +/*
7299     + * There are two structures here: rdstate, which is a hash table, and
7300     + * filldir_node, which is the type of the entries stored in that table.
7301     + */
7302     +
7303     +/*
7304     + * This is a struct kmem_cache for filldir nodes, because we allocate a lot
7305     + * of them and they shouldn't waste memory. If the node has a small name
7306     + * (as defined by the dentry structure), then we use an inline name to
7307     + * preserve kmalloc space.
7308     + */
7309     +static struct kmem_cache *unionfs_filldir_cachep;
7310     +
7311     +int unionfs_init_filldir_cache(void)
7312     +{
7313     + unionfs_filldir_cachep =
7314     + kmem_cache_create("unionfs_filldir",
7315     + sizeof(struct filldir_node), 0,
7316     + SLAB_RECLAIM_ACCOUNT, NULL);
7317     +
7318     + return (unionfs_filldir_cachep ? 0 : -ENOMEM);
7319     +}
7320     +
7321     +void unionfs_destroy_filldir_cache(void)
7322     +{
7323     + if (unionfs_filldir_cachep)
7324     + kmem_cache_destroy(unionfs_filldir_cachep);
7325     +}
7326     +
7327     +/*
7328     + * This is a tuning parameter that tells us roughly how big to make the
7329     + * hash table in directory entries per page. This isn't perfect, but
7330     + * at least we get a hash table size that shouldn't be too overloaded.
7331     + * The following averages are based on my home directory.
7332     + * 14.44693 Overall
7333     + * 12.29 Single Page Directories
7334     + * 117.93 Multi-page directories
7335     + */
7336     +#define DENTPAGE 4096
7337     +#define DENTPERONEPAGE 12
7338     +#define DENTPERPAGE 118
7339     +#define MINHASHSIZE 1
7340     +static int guesstimate_hash_size(struct inode *inode)
7341     +{
7342     + struct inode *lower_inode;
7343     + int bindex;
7344     + int hashsize = MINHASHSIZE;
7345     +
7346     + if (UNIONFS_I(inode)->hashsize > 0)
7347     + return UNIONFS_I(inode)->hashsize;
7348     +
7349     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
7350     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
7351     + if (!lower_inode)
7352     + continue;
7353     +
7354     + if (i_size_read(lower_inode) == DENTPAGE)
7355     + hashsize += DENTPERONEPAGE;
7356     + else
7357     + hashsize += (i_size_read(lower_inode) / DENTPAGE) *
7358     + DENTPERPAGE;
7359     + }
7360     +
7361     + return hashsize;
7362     +}
7363     +
7364     +int init_rdstate(struct file *file)
7365     +{
7366     + BUG_ON(sizeof(loff_t) !=
7367     + (sizeof(unsigned int) + sizeof(unsigned int)));
7368     + BUG_ON(UNIONFS_F(file)->rdstate != NULL);
7369     +
7370     + UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
7371     + fbstart(file));
7372     +
7373     + return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
7374     +}
7375     +
7376     +struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
7377     +{
7378     + struct unionfs_dir_state *rdstate = NULL;
7379     + struct list_head *pos;
7380     +
7381     + spin_lock(&UNIONFS_I(inode)->rdlock);
7382     + list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
7383     + struct unionfs_dir_state *r =
7384     + list_entry(pos, struct unionfs_dir_state, cache);
7385     + if (fpos == rdstate2offset(r)) {
7386     + UNIONFS_I(inode)->rdcount--;
7387     + list_del(&r->cache);
7388     + rdstate = r;
7389     + break;
7390     + }
7391     + }
7392     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7393     + return rdstate;
7394     +}
7395     +
7396     +struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
7397     +{
7398     + int i = 0;
7399     + int hashsize;
7400     + unsigned long mallocsize = sizeof(struct unionfs_dir_state);
7401     + struct unionfs_dir_state *rdstate;
7402     +
7403     + hashsize = guesstimate_hash_size(inode);
7404     + mallocsize += hashsize * sizeof(struct list_head);
7405     + mallocsize = __roundup_pow_of_two(mallocsize);
7406     +
7407     + /* This should give us about 500 entries anyway. */
7408     + if (mallocsize > PAGE_SIZE)
7409     + mallocsize = PAGE_SIZE;
7410     +
7411     + hashsize = (mallocsize - sizeof(struct unionfs_dir_state)) /
7412     + sizeof(struct list_head);
7413     +
7414     + rdstate = kmalloc(mallocsize, GFP_KERNEL);
7415     + if (unlikely(!rdstate))
7416     + return NULL;
7417     +
7418     + spin_lock(&UNIONFS_I(inode)->rdlock);
7419     + if (UNIONFS_I(inode)->cookie >= (MAXRDCOOKIE - 1))
7420     + UNIONFS_I(inode)->cookie = 1;
7421     + else
7422     + UNIONFS_I(inode)->cookie++;
7423     +
7424     + rdstate->cookie = UNIONFS_I(inode)->cookie;
7425     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7426     + rdstate->offset = 1;
7427     + rdstate->access = jiffies;
7428     + rdstate->bindex = bindex;
7429     + rdstate->dirpos = 0;
7430     + rdstate->hashentries = 0;
7431     + rdstate->size = hashsize;
7432     + for (i = 0; i < rdstate->size; i++)
7433     + INIT_LIST_HEAD(&rdstate->list[i]);
7434     +
7435     + return rdstate;
7436     +}
7437     +
7438     +static void free_filldir_node(struct filldir_node *node)
7439     +{
7440     + if (node->namelen >= DNAME_INLINE_LEN_MIN)
7441     + kfree(node->name);
7442     + kmem_cache_free(unionfs_filldir_cachep, node);
7443     +}
7444     +
7445     +void free_rdstate(struct unionfs_dir_state *state)
7446     +{
7447     + struct filldir_node *tmp;
7448     + int i;
7449     +
7450     + for (i = 0; i < state->size; i++) {
7451     + struct list_head *head = &(state->list[i]);
7452     + struct list_head *pos, *n;
7453     +
7454     + /* traverse the list and deallocate space */
7455     + list_for_each_safe(pos, n, head) {
7456     + tmp = list_entry(pos, struct filldir_node, file_list);
7457     + list_del(&tmp->file_list);
7458     + free_filldir_node(tmp);
7459     + }
7460     + }
7461     +
7462     + kfree(state);
7463     +}
7464     +
7465     +struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
7466     + const char *name, int namelen,
7467     + int is_whiteout)
7468     +{
7469     + int index;
7470     + unsigned int hash;
7471     + struct list_head *head;
7472     + struct list_head *pos;
7473     + struct filldir_node *cursor = NULL;
7474     + int found = 0;
7475     +
7476     + BUG_ON(namelen <= 0);
7477     +
7478     + hash = full_name_hash(name, namelen);
7479     + index = hash % rdstate->size;
7480     +
7481     + head = &(rdstate->list[index]);
7482     + list_for_each(pos, head) {
7483     + cursor = list_entry(pos, struct filldir_node, file_list);
7484     +
7485     + if (cursor->namelen == namelen && cursor->hash == hash &&
7486     + !strncmp(cursor->name, name, namelen)) {
7487     + /*
7488     + * a duplicate exists, so there is no need to add
7489     + * a new entry to the list
7490     + */
7491     + found = 1;
7492     +
7493     + /*
7494     + * if a duplicate is found in this branch, and is
7495     + * not due to the caller looking for an entry to
7496     + * whiteout, then the file system may be corrupted.
7497     + */
7498     + if (unlikely(!is_whiteout &&
7499     + cursor->bindex == rdstate->bindex))
7500     + printk(KERN_ERR "unionfs: filldir: possible "
7501     + "I/O error: a file is duplicated "
7502     + "in the same branch %d: %s\n",
7503     + rdstate->bindex, cursor->name);
7504     + break;
7505     + }
7506     + }
7507     +
7508     + if (!found)
7509     + cursor = NULL;
7510     +
7511     + return cursor;
7512     +}
7513     +
7514     +int add_filldir_node(struct unionfs_dir_state *rdstate, const char *name,
7515     + int namelen, int bindex, int whiteout)
7516     +{
7517     + struct filldir_node *new;
7518     + unsigned int hash;
7519     + int index;
7520     + int err = 0;
7521     + struct list_head *head;
7522     +
7523     + BUG_ON(namelen <= 0);
7524     +
7525     + hash = full_name_hash(name, namelen);
7526     + index = hash % rdstate->size;
7527     + head = &(rdstate->list[index]);
7528     +
7529     + new = kmem_cache_alloc(unionfs_filldir_cachep, GFP_KERNEL);
7530     + if (unlikely(!new)) {
7531     + err = -ENOMEM;
7532     + goto out;
7533     + }
7534     +
7535     + INIT_LIST_HEAD(&new->file_list);
7536     + new->namelen = namelen;
7537     + new->hash = hash;
7538     + new->bindex = bindex;
7539     + new->whiteout = whiteout;
7540     +
7541     + if (namelen < DNAME_INLINE_LEN_MIN) {
7542     + new->name = new->iname;
7543     + } else {
7544     + new->name = kmalloc(namelen + 1, GFP_KERNEL);
7545     + if (unlikely(!new->name)) {
7546     + kmem_cache_free(unionfs_filldir_cachep, new);
7547     + err = -ENOMEM;
7548     + goto out;
7549     + }
7550     + }
7551     +
7552     + memcpy(new->name, name, namelen);
7553     + new->name[namelen] = '\0';
7554     +
7555     + rdstate->hashentries++;
7556     +
7557     + list_add(&(new->file_list), head);
7558     +out:
7559     + return err;
7560     +}
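The sizing arithmetic in alloc_rdstate() above (add the bucket array to the header, round up to a power of two, cap at one page, then recompute the bucket count) can be checked in isolation. The following is a rough user-space sketch under stated assumptions: roundup_pow2() and sketch_bucket_count() are hypothetical helpers, SKETCH_PAGE_SIZE assumes 4096-byte pages, and the header and per-bucket sizes are passed in as parameters because the real sizes of struct unionfs_dir_state and struct list_head depend on the architecture.

```c
#include <assert.h>

/* Assumption: 4096-byte pages, matching DENTPAGE in the code above. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Given the size of the fixed header, the size of one hash bucket and the
 * number of buckets wanted, return the number of buckets that fit once the
 * allocation is rounded up to a power of two and capped at one page.
 */
static unsigned long sketch_bucket_count(unsigned long header,
					 unsigned long bucket,
					 unsigned long wanted)
{
	unsigned long bytes;

	assert(bucket > 0 && header < SKETCH_PAGE_SIZE);
	bytes = roundup_pow2(header + wanted * bucket);
	if (bytes > SKETCH_PAGE_SIZE)
		bytes = SKETCH_PAGE_SIZE;
	return (bytes - header) / bucket;
}
```

With an assumed 40-byte header, a request for 13 eight-byte buckets is rounded up to a 256-byte allocation holding 27 buckets, while a very large request is capped at one page, which yields 507 buckets of 8 bytes or 253 buckets of 16 bytes. The figure of "about 500 entries" in the original comment therefore appears to correspond to 8-byte list heads, that is, 32-bit pointers.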
7561     diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
7562     new file mode 100644
7563     index 0000000..936700e
7564     --- /dev/null
7565     +++ b/fs/unionfs/rename.c
7566     @@ -0,0 +1,517 @@
7567     +/*
7568     + * Copyright (c) 2003-2010 Erez Zadok
7569     + * Copyright (c) 2003-2006 Charles P. Wright
7570     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7571     + * Copyright (c) 2005-2006 Junjiro Okajima
7572     + * Copyright (c) 2005 Arun M. Krishnakumar
7573     + * Copyright (c) 2004-2006 David P. Quigley
7574     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7575     + * Copyright (c) 2003 Puja Gupta
7576     + * Copyright (c) 2003 Harikesavan Krishnan
7577     + * Copyright (c) 2003-2010 Stony Brook University
7578     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7579     + *
7580     + * This program is free software; you can redistribute it and/or modify
7581     + * it under the terms of the GNU General Public License version 2 as
7582     + * published by the Free Software Foundation.
7583     + */
7584     +
7585     +#include "union.h"
7586     +
7587     +/*
7588     + * This is a helper function for rename, used when rename ends up with hosed
7589     + * over dentries and we need to revert.
7590     + */
7591     +static int unionfs_refresh_lower_dentry(struct dentry *dentry,
7592     + struct dentry *parent, int bindex)
7593     +{
7594     + struct dentry *lower_dentry;
7595     + struct dentry *lower_parent;
7596     + int err = 0;
7597     +
7598     + verify_locked(dentry);
7599     +
7600     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
7601     +
7602     + BUG_ON(!S_ISDIR(lower_parent->d_inode->i_mode));
7603     +
7604     + lower_dentry = lookup_one_len(dentry->d_name.name, lower_parent,
7605     + dentry->d_name.len);
7606     + if (IS_ERR(lower_dentry)) {
7607     + err = PTR_ERR(lower_dentry);
7608     + goto out;
7609     + }
7610     +
7611     + dput(unionfs_lower_dentry_idx(dentry, bindex));
7612     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
7613     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, NULL);
7614     +
7615     + if (!lower_dentry->d_inode) {
7616     + dput(lower_dentry);
7617     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
7618     + } else {
7619     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
7620     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
7621     + igrab(lower_dentry->d_inode));
7622     + }
7623     +
7624     +out:
7625     + return err;
7626     +}
7627     +
7628     +static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7629     + struct dentry *old_parent,
7630     + struct inode *new_dir, struct dentry *new_dentry,
7631     + struct dentry *new_parent,
7632     + int bindex)
7633     +{
7634     + int err = 0;
7635     + struct dentry *lower_old_dentry;
7636     + struct dentry *lower_new_dentry;
7637     + struct dentry *lower_old_dir_dentry;
7638     + struct dentry *lower_new_dir_dentry;
7639     + struct dentry *trap;
7640     +
7641     + lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7642     + lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
7643     +
7644     + if (!lower_new_dentry) {
7645     + lower_new_dentry =
7646     + create_parents(new_parent->d_inode,
7647     + new_dentry, new_dentry->d_name.name,
7648     + bindex);
7649     + if (IS_ERR(lower_new_dentry)) {
7650     + err = PTR_ERR(lower_new_dentry);
7651     + if (IS_COPYUP_ERR(err))
7652     + goto out;
7653     + printk(KERN_ERR "unionfs: error creating directory "
7654     + "tree for rename, bindex=%d err=%d\n",
7655     + bindex, err);
7656     + goto out;
7657     + }
7658     + }
7659     +
7660     + /* check for and remove whiteout, if any */
7661     + err = check_unlink_whiteout(new_dentry, lower_new_dentry, bindex);
7662     + if (err > 0) /* ignore if whiteout found and successfully removed */
7663     + err = 0;
7664     + if (err)
7665     + goto out;
7666     +
7667     + /* check if old_dentry branch is writable */
7668     + err = is_robranch_super(old_dentry->d_sb, bindex);
7669     + if (err)
7670     + goto out;
7671     +
7672     + dget(lower_old_dentry);
7673     + dget(lower_new_dentry);
7674     + lower_old_dir_dentry = dget_parent(lower_old_dentry);
7675     + lower_new_dir_dentry = dget_parent(lower_new_dentry);
7676     +
7677     + trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7678     + /* source should not be ancestor of target */
7679     + if (trap == lower_old_dentry) {
7680     + err = -EINVAL;
7681     + goto out_err_unlock;
7682     + }
7683     + /* target should not be ancestor of source */
7684     + if (trap == lower_new_dentry) {
7685     + err = -ENOTEMPTY;
7686     + goto out_err_unlock;
7687     + }
7688     + err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
7689     + lower_new_dir_dentry->d_inode, lower_new_dentry);
7690     +out_err_unlock:
7691     + if (!err) {
7692     + /* update parent dir times */
7693     + fsstack_copy_attr_times(old_dir, lower_old_dir_dentry->d_inode);
7694     + fsstack_copy_attr_times(new_dir, lower_new_dir_dentry->d_inode);
7695     + }
7696     + unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7697     +
7698     + dput(lower_old_dir_dentry);
7699     + dput(lower_new_dir_dentry);
7700     + dput(lower_old_dentry);
7701     + dput(lower_new_dentry);
7702     +
7703     +out:
7704     + if (!err) {
7705     + /* Fixup the new_dentry. */
7706     + if (bindex < dbstart(new_dentry))
7707     + dbstart(new_dentry) = bindex;
7708     + else if (bindex > dbend(new_dentry))
7709     + dbend(new_dentry) = bindex;
7710     + }
7711     +
7712     + return err;
7713     +}
7714     +
7715     +/*
7716     + * Main rename code. This is sufficiently complex that it's documented in
7717     + * Documentation/filesystems/unionfs/rename.txt. This routine calls
7718     + * __unionfs_rename() above to perform some of the work.
7719     + */
7720     +static int do_unionfs_rename(struct inode *old_dir,
7721     + struct dentry *old_dentry,
7722     + struct dentry *old_parent,
7723     + struct inode *new_dir,
7724     + struct dentry *new_dentry,
7725     + struct dentry *new_parent)
7726     +{
7727     + int err = 0;
7728     + int bindex;
7729     + int old_bstart, old_bend;
7730     + int new_bstart, new_bend;
7731     + int do_copyup = -1;
7732     + int local_err = 0;
7733     + int eio = 0;
7734     + int revert = 0;
7735     +
7736     + old_bstart = dbstart(old_dentry);
7737     + old_bend = dbend(old_dentry);
7738     +
7739     + new_bstart = dbstart(new_dentry);
7740     + new_bend = dbend(new_dentry);
7741     +
7742     + /* Rename source to destination. */
7743     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7744     + new_dir, new_dentry, new_parent,
7745     + old_bstart);
7746     + if (err) {
7747     + if (!IS_COPYUP_ERR(err))
7748     + goto out;
7749     + do_copyup = old_bstart - 1;
7750     + } else {
7751     + revert = 1;
7752     + }
7753     +
7754     + /*
7755     + * Unlink all instances of destination that exist to the left of
7756     + * bstart of source. On error, revert back, goto out.
7757     + */
7758     + for (bindex = old_bstart - 1; bindex >= new_bstart; bindex--) {
7759     + struct dentry *unlink_dentry;
7760     + struct dentry *unlink_dir_dentry;
7761     +
7762     + BUG_ON(bindex < 0);
7763     + unlink_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7764     + if (!unlink_dentry)
7765     + continue;
7766     +
7767     + unlink_dir_dentry = lock_parent(unlink_dentry);
7768     + err = is_robranch_super(old_dir->i_sb, bindex);
7769     + if (!err)
7770     + err = vfs_unlink(unlink_dir_dentry->d_inode,
7771     + unlink_dentry);
7772     +
7773     + fsstack_copy_attr_times(new_parent->d_inode,
7774     + unlink_dir_dentry->d_inode);
7775     + /* propagate number of hard-links */
7776     + new_parent->d_inode->i_nlink =
7777     + unionfs_get_nlinks(new_parent->d_inode);
7778     +
7779     + unlock_dir(unlink_dir_dentry);
7780     + if (!err) {
7781     + if (bindex != new_bstart) {
7782     + dput(unlink_dentry);
7783     + unionfs_set_lower_dentry_idx(new_dentry,
7784     + bindex, NULL);
7785     + }
7786     + } else if (IS_COPYUP_ERR(err)) {
7787     + do_copyup = bindex - 1;
7788     + } else if (revert) {
7789     + goto revert;
7790     + }
7791     + }
7792     +
7793     + if (do_copyup != -1) {
7794     + for (bindex = do_copyup; bindex >= 0; bindex--) {
7795     + /*
7796     + * copyup the file into some left directory, so that
7797     + * you can rename it
7798     + */
7799     + err = copyup_dentry(old_parent->d_inode,
7800     + old_dentry, old_bstart, bindex,
7801     + old_dentry->d_name.name,
7802     + old_dentry->d_name.len, NULL,
7803     + i_size_read(old_dentry->d_inode));
7804     + /* if copyup failed, try next branch to the left */
7805     + if (err)
7806     + continue;
7807     + /*
7808     + * create whiteout before calling __unionfs_rename
7809     + * because the latter will change the old_dentry's
7810     + * lower name and parent dir, resulting in the
7811     + * whiteout getting created in the wrong dir.
7812     + */
7813     + err = create_whiteout(old_dentry, bindex);
7814     + if (err) {
7815     + printk(KERN_ERR "unionfs: can't create a "
7816     + "whiteout for %s in rename (err=%d)\n",
7817     + old_dentry->d_name.name, err);
7818     + continue;
7819     + }
7820     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7821     + new_dir, new_dentry, new_parent,
7822     + bindex);
7823     + break;
7824     + }
7825     + }
7826     +
7827     + /* make it opaque */
7828     + if (S_ISDIR(old_dentry->d_inode->i_mode)) {
7829     + err = make_dir_opaque(old_dentry, dbstart(old_dentry));
7830     + if (err)
7831     + goto revert;
7832     + }
7833     +
7834     + /*
7835     + * Create whiteout for source, only if:
7836     + * (1) There is more than one underlying instance of source.
7837     + * (2) No copyup was done; the copyup path above creates its own.
7838     + */
7839     + if ((old_bstart != old_bend) && (do_copyup == -1)) {
7840     + err = create_whiteout(old_dentry, old_bstart);
7841     + if (err) {
7842     + /* can't fix anything now, so we exit with -EIO */
7843     + printk(KERN_ERR "unionfs: can't create a whiteout for "
7844     + "%s in rename!\n", old_dentry->d_name.name);
7845     + err = -EIO;
7846     + }
7847     + }
7848     +
7849     +out:
7850     + return err;
7851     +
7852     +revert:
7853     + /* Do revert here. */
7854     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7855     + old_bstart);
7856     + if (local_err) {
7857     + printk(KERN_ERR "unionfs: revert failed in rename: "
7858     + "the new refresh failed\n");
7859     + eio = -EIO;
7860     + }
7861     +
7862     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7863     + old_bstart);
7864     + if (local_err) {
7865     + printk(KERN_ERR "unionfs: revert failed in rename: "
7866     + "the old refresh failed\n");
7867     + eio = -EIO;
7868     + goto revert_out;
7869     + }
7870     +
7871     + if (!unionfs_lower_dentry_idx(new_dentry, bindex) ||
7872     + !unionfs_lower_dentry_idx(new_dentry, bindex)->d_inode) {
7873     + printk(KERN_ERR "unionfs: revert failed in rename: "
7874     + "the object disappeared from under us!\n");
7875     + eio = -EIO;
7876     + goto revert_out;
7877     + }
7878     +
7879     + if (unionfs_lower_dentry_idx(old_dentry, bindex) &&
7880     + unionfs_lower_dentry_idx(old_dentry, bindex)->d_inode) {
7881     + printk(KERN_ERR "unionfs: revert failed in rename: "
7882     + "the object was created underneath us!\n");
7883     + eio = -EIO;
7884     + goto revert_out;
7885     + }
7886     +
7887     + local_err = __unionfs_rename(new_dir, new_dentry, new_parent,
7888     + old_dir, old_dentry, old_parent,
7889     + old_bstart);
7890     +
7891     + /* If we can't fix it, then we cop-out with -EIO. */
7892     + if (local_err) {
7893     + printk(KERN_ERR "unionfs: revert failed in rename!\n");
7894     + eio = -EIO;
7895     + }
7896     +
7897     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7898     + bindex);
7899     + if (local_err)
7900     + eio = -EIO;
7901     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7902     + bindex);
7903     + if (local_err)
7904     + eio = -EIO;
7905     +
7906     +revert_out:
7907     + if (eio)
7908     + err = eio;
7909     + return err;
7910     +}
7911     +
7912     +/*
7913     + * We can't copyup a directory, because it may involve huge numbers of
7914     + * children, etc. Doing that in the kernel would be bad, so instead we
7915     + * return EXDEV to the user-space utility that caused this, and let the
7916     + * user-space recurse and ask us to copy up each file separately.
7917     + */
7918     +static int may_rename_dir(struct dentry *dentry, struct dentry *parent)
7919     +{
7920     + int err, bstart;
7921     +
7922     + err = check_empty(dentry, parent, NULL);
7923     + if (err == -ENOTEMPTY) {
7924     + if (is_robranch(dentry))
7925     + return -EXDEV;
7926     + } else if (err) {
7927     + return err;
7928     + }
7929     +
7930     + bstart = dbstart(dentry);
7931     + if (dbend(dentry) == bstart || dbopaque(dentry) == bstart)
7932     + return 0;
7933     +
7934     + dbstart(dentry) = bstart + 1;
7935     + err = check_empty(dentry, parent, NULL);
7936     + dbstart(dentry) = bstart;
7937     + if (err == -ENOTEMPTY)
7938     + err = -EXDEV;
7939     + return err;
7940     +}
7941     +
7942     +/*
7943     + * The locking rules in unionfs_rename are complex. We could use a simpler
7944     + * superblock-level name-space lock for renames and copy-ups.
7945     + */
7946     +int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7947     + struct inode *new_dir, struct dentry *new_dentry)
7948     +{
7949     + int err = 0;
7950     + struct dentry *wh_dentry;
7951     + struct dentry *old_parent, *new_parent;
7952     + int valid = true;
7953     +
7954     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
7955     + old_parent = dget_parent(old_dentry);
7956     + new_parent = dget_parent(new_dentry);
7957     + /* un/lock parent dentries only if they differ from old/new_dentry */
7958     + if (old_parent != old_dentry &&
7959     + old_parent != new_dentry)
7960     + unionfs_lock_dentry(old_parent, UNIONFS_DMUTEX_REVAL_PARENT);
7961     + if (new_parent != old_dentry &&
7962     + new_parent != new_dentry &&
7963     + new_parent != old_parent)
7964     + unionfs_lock_dentry(new_parent, UNIONFS_DMUTEX_REVAL_CHILD);
7965     + unionfs_double_lock_dentry(old_dentry, new_dentry);
7966     +
7967     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
7968     + if (!valid) {
7969     + err = -ESTALE;
7970     + goto out;
7971     + }
7972     + if (!d_deleted(new_dentry) && new_dentry->d_inode) {
7973     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
7974     + if (!valid) {
7975     + err = -ESTALE;
7976     + goto out;
7977     + }
7978     + }
7979     +
7980     + if (!S_ISDIR(old_dentry->d_inode->i_mode))
7981     + err = unionfs_partial_lookup(old_dentry, old_parent);
7982     + else
7983     + err = may_rename_dir(old_dentry, old_parent);
7984     +
7985     + if (err)
7986     + goto out;
7987     +
7988     + err = unionfs_partial_lookup(new_dentry, new_parent);
7989     + if (err)
7990     + goto out;
7991     +
7992     + /*
7993     + * if new_dentry is already lower because of whiteout,
7994     + * simply override it even if the whited-out dir is not empty.
7995     + */
7996     + wh_dentry = find_first_whiteout(new_dentry);
7997     + if (!IS_ERR(wh_dentry)) {
7998     + dput(wh_dentry);
7999     + } else if (new_dentry->d_inode) {
8000     + if (S_ISDIR(old_dentry->d_inode->i_mode) !=
8001     + S_ISDIR(new_dentry->d_inode->i_mode)) {
8002     + err = S_ISDIR(old_dentry->d_inode->i_mode) ?
8003     + -ENOTDIR : -EISDIR;
8004     + goto out;
8005     + }
8006     +
8007     + if (S_ISDIR(new_dentry->d_inode->i_mode)) {
8008     + struct unionfs_dir_state *namelist = NULL;
8009     + /* check if this unionfs directory is empty or not */
8010     + err = check_empty(new_dentry, new_parent, &namelist);
8011     + if (err)
8012     + goto out;
8013     +
8014     + if (!is_robranch(new_dentry))
8015     + err = delete_whiteouts(new_dentry,
8016     + dbstart(new_dentry),
8017     + namelist);
8018     +
8019     + free_rdstate(namelist);
8020     +
8021     + if (err)
8022     + goto out;
8023     + }
8024     + }
8025     +
8026     + err = do_unionfs_rename(old_dir, old_dentry, old_parent,
8027     + new_dir, new_dentry, new_parent);
8028     + if (err)
8029     + goto out;
8030     +
8031     + /*
8032     + * force re-lookup since the dir on ro branch is not renamed, and
8033     + * lower dentries still indicate the un-renamed ones.
8034     + */
8035     + if (S_ISDIR(old_dentry->d_inode->i_mode))
8036     + atomic_dec(&UNIONFS_D(old_dentry)->generation);
8037     + else
8038     + unionfs_postcopyup_release(old_dentry);
8039     + if (new_dentry->d_inode && !S_ISDIR(new_dentry->d_inode->i_mode)) {
8040     + unionfs_postcopyup_release(new_dentry);
8041     + unionfs_postcopyup_setmnt(new_dentry);
8042     + if (!unionfs_lower_inode(new_dentry->d_inode)) {
8043     + /*
8044     + * If we get here, it means that no copyup was
8045     + * needed, and that a file by the old name already
8046     + * existed on the destination branch; that file got
8047     + * renamed earlier in this function, so all we need
8048     + * to do here is set the lower inode.
8049     + */
8050     + struct inode *inode;
8051     + inode = unionfs_lower_inode(old_dentry->d_inode);
8052     + igrab(inode);
8053     + unionfs_set_lower_inode_idx(new_dentry->d_inode,
8054     + dbstart(new_dentry),
8055     + inode);
8056     + }
8057     + }
8058     + /* if all of this renaming succeeded, update our times */
8059     + unionfs_copy_attr_times(old_dentry->d_inode);
8060     + unionfs_copy_attr_times(new_dentry->d_inode);
8061     + unionfs_check_inode(old_dir);
8062     + unionfs_check_inode(new_dir);
8063     + unionfs_check_dentry(old_dentry);
8064     + unionfs_check_dentry(new_dentry);
8065     +
8066     +out:
8067     + if (err) /* clear the new_dentry stuff created */
8068     + d_drop(new_dentry);
8069     +
8070     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
8071     + if (new_parent != old_dentry &&
8072     + new_parent != new_dentry &&
8073     + new_parent != old_parent)
8074     + unionfs_unlock_dentry(new_parent);
8075     + if (old_parent != old_dentry &&
8076     + old_parent != new_dentry)
8077     + unionfs_unlock_dentry(old_parent);
8078     + dput(new_parent);
8079     + dput(old_parent);
8080     + unionfs_read_unlock(old_dentry->d_sb);
8081     +
8082     + return err;
8083     +}
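
Note on fs/unionfs/rename.c: two small pieces of bookkeeping in the rename path are easy to miss when reading the hunk above. After a successful lower rename on branch bindex, __unionfs_rename() widens the new dentry's branch range to include that branch, and do_unionfs_rename() creates a whiteout for the source only when the source existed on more than one branch and no copyup took place. The following is a minimal userspace sketch of just those two decisions. The names struct branch_range, widen_range() and source_needs_whiteout() are hypothetical and do not appear in the patch; the real code operates on dentries through dbstart() and dbend().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-dentry branch range (dbstart/dbend). */
struct branch_range {
	int bstart;	/* leftmost (highest-priority) branch holding the object */
	int bend;	/* rightmost branch holding the object */
};

/*
 * Mirrors the "Fixup the new_dentry" step: after a successful rename on
 * branch bindex, make sure the range covers that branch.
 */
static void widen_range(struct branch_range *r, int bindex)
{
	if (bindex < r->bstart)
		r->bstart = bindex;
	else if (bindex > r->bend)
		r->bend = bindex;
}

/*
 * Mirrors the whiteout condition at the end of do_unionfs_rename():
 * the source spans more than one branch and do_copyup is still -1.
 */
static bool source_needs_whiteout(int old_bstart, int old_bend, int do_copyup)
{
	return old_bstart != old_bend && do_copyup == -1;
}
```

In this model, a source that lives only on one branch never needs a whiteout, because nothing to its right can show through once it is renamed.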
8084     diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
8085     new file mode 100644
8086     index 0000000..760c580
8087     --- /dev/null
8088     +++ b/fs/unionfs/sioq.c
8089     @@ -0,0 +1,101 @@
8090     +/*
8091     + * Copyright (c) 2006-2010 Erez Zadok
8092     + * Copyright (c) 2006 Charles P. Wright
8093     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8094     + * Copyright (c) 2006 Junjiro Okajima
8095     + * Copyright (c) 2006 David P. Quigley
8096     + * Copyright (c) 2006-2010 Stony Brook University
8097     + * Copyright (c) 2006-2010 The Research Foundation of SUNY
8098     + *
8099     + * This program is free software; you can redistribute it and/or modify
8100     + * it under the terms of the GNU General Public License version 2 as
8101     + * published by the Free Software Foundation.
8102     + */
8103     +
8104     +#include "union.h"
8105     +
8106     +/*
8107     + * Super-user IO work Queue - sometimes we need to perform actions which
8108     + * would fail due to the unix permissions on the parent directory (e.g.,
8109     + * rmdir a directory which appears empty, but in reality contains
8110     + * whiteouts).
8111     + */
8112     +
8113     +static struct workqueue_struct *superio_workqueue;
8114     +
8115     +int __init init_sioq(void)
8116     +{
8117     + int err;
8118     +
8119     + superio_workqueue = create_workqueue("unionfs_siod");
8120     + if (superio_workqueue)
8121     + return 0;
8122     +
8123     + err = -ENOMEM; /* create_workqueue() returns NULL on failure */
8124     + printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
8125     + superio_workqueue = NULL;
8126     + return err;
8127     +}
8128     +
8129     +void stop_sioq(void)
8130     +{
8131     + if (superio_workqueue)
8132     + destroy_workqueue(superio_workqueue);
8133     +}
8134     +
8135     +void run_sioq(work_func_t func, struct sioq_args *args)
8136     +{
8137     + INIT_WORK(&args->work, func);
8138     +
8139     + init_completion(&args->comp);
8140     + while (!queue_work(superio_workqueue, &args->work)) {
8141     + /* TODO: do accounting if needed */
8142     + schedule();
8143     + }
8144     + wait_for_completion(&args->comp);
8145     +}
8146     +
8147     +void __unionfs_create(struct work_struct *work)
8148     +{
8149     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8150     + struct create_args *c = &args->create;
8151     +
8152     + args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
8153     + complete(&args->comp);
8154     +}
8155     +
8156     +void __unionfs_mkdir(struct work_struct *work)
8157     +{
8158     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8159     + struct mkdir_args *m = &args->mkdir;
8160     +
8161     + args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
8162     + complete(&args->comp);
8163     +}
8164     +
8165     +void __unionfs_mknod(struct work_struct *work)
8166     +{
8167     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8168     + struct mknod_args *m = &args->mknod;
8169     +
8170     + args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
8171     + complete(&args->comp);
8172     +}
8173     +
8174     +void __unionfs_symlink(struct work_struct *work)
8175     +{
8176     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8177     + struct symlink_args *s = &args->symlink;
8178     +
8179     + args->err = vfs_symlink(s->parent, s->dentry, s->symbuf);
8180     + complete(&args->comp);
8181     +}
8182     +
8183     +void __unionfs_unlink(struct work_struct *work)
8184     +{
8185     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8186     + struct unlink_args *u = &args->unlink;
8187     +
8188     + args->err = vfs_unlink(u->parent, u->dentry);
8189     + complete(&args->comp);
8190     +}
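
Note on fs/unionfs/sioq.c: every helper above receives only a pointer to the work item and recovers its enclosing argument structure with container_of(). The following is a minimal userspace sketch of that pattern, assuming nothing beyond standard C. The types struct work_item, struct unlink_req and struct request are hypothetical stand-ins for struct work_struct, struct unlink_args and struct sioq_args, and the "operation" is a placeholder rather than a real vfs_unlink() call.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), built on offsetof(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_item {		/* hypothetical stand-in for struct work_struct */
	int pending;
};

struct unlink_req {		/* hypothetical stand-in for struct unlink_args */
	int parent_id;
	int target_id;
};

struct request {		/* hypothetical stand-in for struct sioq_args */
	struct work_item work;
	int err;
	union {
		struct unlink_req unlink;
	} u;
};

/* The handler sees only the embedded work item, as a workqueue callback would. */
static void handle_unlink(struct work_item *work)
{
	struct request *req = container_of(work, struct request, work);

	/* placeholder operation: fail when the target id is negative */
	req->err = req->u.unlink.target_id < 0 ? -1 : 0;
	work->pending = 0;
}
```

The caller owns the request and reads req->err afterwards, which is why run_sioq() in the patch waits on the completion before returning.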
8191     diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
8192     new file mode 100644
8193     index 0000000..b26d248
8194     --- /dev/null
8195     +++ b/fs/unionfs/sioq.h
8196     @@ -0,0 +1,91 @@
8197     +/*
8198     + * Copyright (c) 2006-2010 Erez Zadok
8199     + * Copyright (c) 2006 Charles P. Wright
8200     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8201     + * Copyright (c) 2006 Junjiro Okajima
8202     + * Copyright (c) 2006 David P. Quigley
8203     + * Copyright (c) 2006-2010 Stony Brook University
8204     + * Copyright (c) 2006-2010 The Research Foundation of SUNY
8205     + *
8206     + * This program is free software; you can redistribute it and/or modify
8207     + * it under the terms of the GNU General Public License version 2 as
8208     + * published by the Free Software Foundation.
8209     + */
8210     +
8211     +#ifndef _SIOQ_H
8212     +#define _SIOQ_H
8213     +
8214     +struct deletewh_args {
8215     + struct unionfs_dir_state *namelist;
8216     + struct dentry *dentry;
8217     + int bindex;
8218     +};
8219     +
8220     +struct is_opaque_args {
8221     + struct dentry *dentry;
8222     +};
8223     +
8224     +struct create_args {
8225     + struct inode *parent;
8226     + struct dentry *dentry;
8227     + umode_t mode;
8228     + struct nameidata *nd;
8229     +};
8230     +
8231     +struct mkdir_args {
8232     + struct inode *parent;
8233     + struct dentry *dentry;
8234     + umode_t mode;
8235     +};
8236     +
8237     +struct mknod_args {
8238     + struct inode *parent;
8239     + struct dentry *dentry;
8240     + umode_t mode;
8241     + dev_t dev;
8242     +};
8243     +
8244     +struct symlink_args {
8245     + struct inode *parent;
8246     + struct dentry *dentry;
8247     + char *symbuf;
8248     +};
8249     +
8250     +struct unlink_args {
8251     + struct inode *parent;
8252     + struct dentry *dentry;
8253     +};
8254     +
8255     +
8256     +struct sioq_args {
8257     + struct completion comp;
8258     + struct work_struct work;
8259     + int err;
8260     + void *ret;
8261     +
8262     + union {
8263     + struct deletewh_args deletewh;
8264     + struct is_opaque_args is_opaque;
8265     + struct create_args create;
8266     + struct mkdir_args mkdir;
8267     + struct mknod_args mknod;
8268     + struct symlink_args symlink;
8269     + struct unlink_args unlink;
8270     + };
8271     +};
8272     +
8273     +/* Extern definitions for SIOQ functions */
8274     +extern int __init init_sioq(void);
8275     +extern void stop_sioq(void);
8276     +extern void run_sioq(work_func_t func, struct sioq_args *args);
8277     +
8278     +/* Extern definitions for our privilege escalation helpers */
8279     +extern void __unionfs_create(struct work_struct *work);
8280     +extern void __unionfs_mkdir(struct work_struct *work);
8281     +extern void __unionfs_mknod(struct work_struct *work);
8282     +extern void __unionfs_symlink(struct work_struct *work);
8283     +extern void __unionfs_unlink(struct work_struct *work);
8284     +extern void __delete_whiteouts(struct work_struct *work);
8285     +extern void __is_opaque_dir(struct work_struct *work);
8286     +
8287     +#endif /* not _SIOQ_H */
8288     diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
8289     new file mode 100644
8290     index 0000000..570a344
8291     --- /dev/null
8292     +++ b/fs/unionfs/subr.c
8293     @@ -0,0 +1,95 @@
8294     +/*
8295     + * Copyright (c) 2003-2010 Erez Zadok
8296     + * Copyright (c) 2003-2006 Charles P. Wright
8297     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8298     + * Copyright (c) 2005-2006 Junjiro Okajima
8299     + * Copyright (c) 2005 Arun M. Krishnakumar
8300     + * Copyright (c) 2004-2006 David P. Quigley
8301     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8302     + * Copyright (c) 2003 Puja Gupta
8303     + * Copyright (c) 2003 Harikesavan Krishnan
8304     + * Copyright (c) 2003-2010 Stony Brook University
8305     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
8306     + *
8307     + * This program is free software; you can redistribute it and/or modify
8308     + * it under the terms of the GNU General Public License version 2 as
8309     + * published by the Free Software Foundation.
8310     + */
8311     +
8312     +#include "union.h"
8313     +
8314     +/*
8315     + * returns the right i_nlink value based on the inode type
8316     + */
8317     +int unionfs_get_nlinks(const struct inode *inode)
8318     +{
8319     + /* don't bother to do all the work since we're unlinked */
8320     + if (inode->i_nlink == 0)
8321     + return 0;
8322     +
8323     + if (!S_ISDIR(inode->i_mode))
8324     + return unionfs_lower_inode(inode)->i_nlink;
8325     +
8326     + /*
8327     + * For directories, we return 1. The only place that could care
8328     + * about links is readdir, and there's d_type there so even that
8329     + * doesn't matter.
8330     + */
8331     + return 1;
8332     +}
8333     +
8334     +/* copy a/m/ctime from the lower branch with the newest times */
8335     +void unionfs_copy_attr_times(struct inode *upper)
8336     +{
8337     + int bindex;
8338     + struct inode *lower;
8339     +
8340     + if (!upper)
8341     + return;
8342     + if (ibstart(upper) < 0) {
8343     +#ifdef CONFIG_UNION_FS_DEBUG
8344     + WARN_ON(ibstart(upper) < 0);
8345     +#endif /* CONFIG_UNION_FS_DEBUG */
8346     + return;
8347     + }
8348     + for (bindex = ibstart(upper); bindex <= ibend(upper); bindex++) {
8349     + lower = unionfs_lower_inode_idx(upper, bindex);
8350     + if (!lower)
8351     + continue; /* not all lower dir objects may exist */
8352     + if (unlikely(timespec_compare(&upper->i_mtime,
8353     + &lower->i_mtime) < 0))
8354     + upper->i_mtime = lower->i_mtime;
8355     + if (unlikely(timespec_compare(&upper->i_ctime,
8356     + &lower->i_ctime) < 0))
8357     + upper->i_ctime = lower->i_ctime;
8358     + if (unlikely(timespec_compare(&upper->i_atime,
8359     + &lower->i_atime) < 0))
8360     + upper->i_atime = lower->i_atime;
8361     + }
8362     +}
8363     +
8364     +/*
8365     + * A unionfs/fanout version of fsstack_copy_attr_all. Uses
8366     + * unionfs_get_nlinks to properly calculate the number of links to a file.
8367     + * Also, copies the max() of all a/m/ctimes for all lower inodes (which is
8368     + * important if the lower inode is a directory type)
8369     + */
8370     +void unionfs_copy_attr_all(struct inode *dest,
8371     + const struct inode *src)
8372     +{
8373     + dest->i_mode = src->i_mode;
8374     + dest->i_uid = src->i_uid;
8375     + dest->i_gid = src->i_gid;
8376     + dest->i_rdev = src->i_rdev;
8377     +
8378     + unionfs_copy_attr_times(dest);
8379     +
8380     + dest->i_blkbits = src->i_blkbits;
8381     + dest->i_flags = src->i_flags;
8382     +
8383     + /*
8384     + * Update the nlinks AFTER updating the above fields, because the
8385     + * get_links callback may depend on them.
8386     + */
8387     + dest->i_nlink = unionfs_get_nlinks(dest);
8388     +}
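
Note on fs/unionfs/subr.c: unionfs_copy_attr_times() keeps the upper inode's timestamps equal to the newest value found on any lower branch, skipping branches where the object does not exist. The following is a minimal userspace sketch of that "take the maximum" rule for a single timestamp. The helpers ts_compare() and take_newest() are hypothetical; the patch itself uses the kernel's timespec_compare() and walks the range from ibstart() to ibend().

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns <0, 0 or >0, like the kernel's timespec_compare(). */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Raise *upper to the newest of the lower timestamps; NULL entries are skipped. */
static void take_newest(struct timespec *upper,
			const struct timespec *const *lower, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!lower[i])
			continue;	/* object absent on this branch */
		if (ts_compare(upper, lower[i]) < 0)
			*upper = *lower[i];
	}
}
```

Because the update only ever moves the upper value forward, the result does not depend on the order in which branches are visited.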
8389     diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
8390     new file mode 100644
8391     index 0000000..bd058fe
8392     --- /dev/null
8393     +++ b/fs/unionfs/super.c
8394     @@ -0,0 +1,1047 @@
8395     +/*
8396     + * Copyright (c) 2003-2010 Erez Zadok
8397     + * Copyright (c) 2003-2006 Charles P. Wright
8398     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8399     + * Copyright (c) 2005-2006 Junjiro Okajima
8400     + * Copyright (c) 2005 Arun M. Krishnakumar
8401     + * Copyright (c) 2004-2006 David P. Quigley
8402     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8403     + * Copyright (c) 2003 Puja Gupta
8404     + * Copyright (c) 2003 Harikesavan Krishnan
8405     + * Copyright (c) 2003-2010 Stony Brook University
8406     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
8407     + *
8408     + * This program is free software; you can redistribute it and/or modify
8409     + * it under the terms of the GNU General Public License version 2 as
8410     + * published by the Free Software Foundation.
8411     + */
8412     +
8413     +#include "union.h"
8414     +
8415     +/*
8416     + * The inode cache is used with alloc_inode for both our inode info and the
8417     + * vfs inode.
8418     + */
8419     +static struct kmem_cache *unionfs_inode_cachep;
8420     +
8421     +struct inode *unionfs_iget(struct super_block *sb, unsigned long ino)
8422     +{
8423     + int size;
8424     + struct unionfs_inode_info *info;
8425     + struct inode *inode;
8426     +
8427     + inode = iget_locked(sb, ino);
8428     + if (!inode)
8429     + return ERR_PTR(-ENOMEM);
8430     + if (!(inode->i_state & I_NEW))
8431     + return inode;
8432     +
8433     + info = UNIONFS_I(inode);
8434     + memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
8435     + info->bstart = -1;
8436     + info->bend = -1;
8437     + atomic_set(&info->generation,
8438     + atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
8439     + spin_lock_init(&info->rdlock);
8440     + info->rdcount = 1;
8441     + info->hashsize = -1;
8442     + INIT_LIST_HEAD(&info->readdircache);
8443     +
8444     + size = sbmax(inode->i_sb) * sizeof(struct inode *);
8445     + info->lower_inodes = kzalloc(size, GFP_KERNEL);
8446     + if (unlikely(!info->lower_inodes)) {
8447     + printk(KERN_CRIT "unionfs: no kernel memory when allocating "
8448     + "lower-pointer array!\n");
8449     + iget_failed(inode);
8450     + return ERR_PTR(-ENOMEM);
8451     + }
8452     +
8453     + inode->i_version++;
8454     + inode->i_op = &unionfs_main_iops;
8455     + inode->i_fop = &unionfs_main_fops;
8456     +
8457     + inode->i_mapping->a_ops = &unionfs_aops;
8458     +
8459     + /*
8460     + * reset times so unionfs_copy_attr_all can keep our time invariants
8461     + * right (upper inode time being the max of all lower ones).
8462     + */
8463     + inode->i_atime.tv_sec = inode->i_atime.tv_nsec = 0;
8464     + inode->i_mtime.tv_sec = inode->i_mtime.tv_nsec = 0;
8465     + inode->i_ctime.tv_sec = inode->i_ctime.tv_nsec = 0;
8466     + unlock_new_inode(inode);
8467     + return inode;
8468     +}
8469     +
8470     +/*
8471     + * we now define delete_inode, because there are two VFS paths that may
8472     + * destroy an inode: one of them calls clear inode before doing everything
8473     + * else that's needed, and the other is fine. This way we truncate the inode
8474     + * size (and its pages) and then clear our own inode, which will do an iput
8475     + * on our and the lower inode.
8476     + *
8477     + * No need to lock sb info's rwsem.
8478     + */
8479     +static void unionfs_delete_inode(struct inode *inode)
8480     +{
8481     +#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
8482     + spin_lock(&inode->i_lock);
8483     +#endif
8484     + i_size_write(inode, 0); /* every f/s seems to do that */
8485     +#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
8486     + spin_unlock(&inode->i_lock);
8487     +#endif
8488     +
8489     + if (inode->i_data.nrpages)
8490     + truncate_inode_pages(&inode->i_data, 0);
8491     +
8492     + clear_inode(inode);
8493     +}
8494     +
8495     +/*
8496     + * final actions when unmounting a file system
8497     + *
8498     + * No need to lock rwsem.
8499     + */
8500     +static void unionfs_put_super(struct super_block *sb)
8501     +{
8502     + int bindex, bstart, bend;
8503     + struct unionfs_sb_info *spd;
8504     + int leaks = 0;
8505     +
8506     + spd = UNIONFS_SB(sb);
8507     + if (!spd)
8508     + return;
8509     +
8510     + bstart = sbstart(sb);
8511     + bend = sbend(sb);
8512     +
8513     + /* Make sure we have no leaks of branchget/branchput. */
8514     + for (bindex = bstart; bindex <= bend; bindex++)
8515     + if (unlikely(branch_count(sb, bindex) != 0)) {
8516     + printk(KERN_CRIT
8517     + "unionfs: branch %d has %d references left!\n",
8518     + bindex, branch_count(sb, bindex));
8519     + leaks = 1;
8520     + }
8521     + WARN_ON(leaks != 0);
8522     +
8523     + /* decrement lower super references */
8524     + for (bindex = bstart; bindex <= bend; bindex++) {
8525     + struct super_block *s;
8526     + s = unionfs_lower_super_idx(sb, bindex);
8527     + unionfs_set_lower_super_idx(sb, bindex, NULL);
8528     + atomic_dec(&s->s_active);
8529     + }
8530     +
8531     + kfree(spd->dev_name);
8532     + kfree(spd->data);
8533     + kfree(spd);
8534     + sb->s_fs_info = NULL;
8535     +}
8536     +
8537     +/*
8538     + * Since people use this to answer the "How big of a file can I write?"
8539     + * question, we report the size of the highest priority branch as the size of
8540     + * the union.
8541     + */
8542     +static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
8543     +{
8544     + int err = 0;
8545     + struct super_block *sb;
8546     + struct dentry *lower_dentry;
8547     + struct dentry *parent;
8548     + bool valid;
8549     +
8550     + sb = dentry->d_sb;
8551     +
8552     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
8553     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
8554     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
8555     +
8556     + valid = __unionfs_d_revalidate(dentry, parent, false);
8557     + if (unlikely(!valid)) {
8558     + err = -ESTALE;
8559     + goto out;
8560     + }
8561     + unionfs_check_dentry(dentry);
8562     +
8563     + lower_dentry = unionfs_lower_dentry(sb->s_root);
8564     + err = vfs_statfs(lower_dentry, buf);
8565     +
8566     + /* set return buf to our f/s to avoid confusing user-level utils */
8567     + buf->f_type = UNIONFS_SUPER_MAGIC;
8568     + /*
8569     + * Our maximum file name length is shorter by a few bytes because every
8570     + * file name could potentially be whited-out.
8571     + *
8572     + * XXX: this restriction goes away with ODF.
8573     + */
8574     + unionfs_set_max_namelen(&buf->f_namelen);
8575     +
8576     + /*
8577     + * reset two fields to avoid confusing user-land.
8578     + * XXX: is this still necessary?
8579     + */
8580     + memset(&buf->f_fsid, 0, sizeof(__kernel_fsid_t));
8581     + memset(&buf->f_spare, 0, sizeof(buf->f_spare));
8582     +
8583     +out:
8584     + unionfs_check_dentry(dentry);
8585     + unionfs_unlock_dentry(dentry);
8586     + unionfs_unlock_parent(dentry, parent);
8587     + unionfs_read_unlock(sb);
8588     + return err;
8589     +}
8590     +
8591     +/* handle mode changing during remount */
8592     +static noinline_for_stack int do_remount_mode_option(
8593     + char *optarg,
8594     + int cur_branches,
8595     + struct unionfs_data *new_data,
8596     + struct path *new_lower_paths)
8597     +{
8598     + int err = -EINVAL;
8599     + int perms, idx;
8600     + char *modename = strchr(optarg, '=');
8601     + struct nameidata nd;
8602     +
8603     + /* by now, optarg contains the branch name */
8604     + if (!*optarg) {
8605     + printk(KERN_ERR
8606     + "unionfs: no branch specified for mode change\n");
8607     + goto out;
8608     + }
8609     + if (!modename) {
8610     + printk(KERN_ERR "unionfs: branch \"%s\" requires a mode\n",
8611     + optarg);
8612     + goto out;
8613     + }
8614     + *modename++ = '\0';
8615     + err = parse_branch_mode(modename, &perms);
8616     + if (err) {
8617     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for \"%s\"\n",
8618     + modename, optarg);
8619     + goto out;
8620     + }
8621     +
8622     + /*
8623     + * Find matching branch index. For now, this assumes that nothing
8624     + * has been mounted on top of this Unionfs stack. Once we have /odf
8625     + * and cache-coherency resolved, we'll address the branch-path
8626     + * uniqueness.
8627     + */
8628     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8629     + if (err) {
8630     + printk(KERN_ERR "unionfs: error accessing "
8631     + "lower directory \"%s\" (error %d)\n",
8632     + optarg, err);
8633     + goto out;
8634     + }
8635     + for (idx = 0; idx < cur_branches; idx++)
8636     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8637     + nd.path.dentry == new_lower_paths[idx].dentry)
8638     + break;
8639     + path_put(&nd.path); /* no longer needed */
8640     + if (idx == cur_branches) {
8641     + err = -ENOENT; /* err may have been reset above */
8642     + printk(KERN_ERR "unionfs: branch \"%s\" "
8643     + "not found\n", optarg);
8644     + goto out;
8645     + }
8646     + /* check/change mode for existing branch */
8647     + /* we don't warn if perms==branchperms */
8648     + new_data[idx].branchperms = perms;
8649     + err = 0;
8650     +out:
8651     + return err;
8652     +}
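The mode option above takes an argument of the form BRANCH=MODE and splits it in place at the first '='. A minimal user-space sketch of that splitting step follows. The helper name split_key_val is hypothetical and not part of the patch; it only mirrors the two early checks (empty argument, missing '=') and leaves mode validation to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): split "KEY=VAL" in place.
 * Returns 0 and points *val at the text after the first '=', or -1 if
 * arg is empty or contains no '='. The caller's buffer is modified.
 */
int split_key_val(char *arg, char **val)
{
	char *eq;

	assert(arg != NULL && val != NULL);
	if (*arg == '\0')
		return -1;	/* no branch specified */
	eq = strchr(arg, '=');
	if (eq == NULL)
		return -1;	/* branch given without a mode */
	*eq = '\0';		/* terminate the KEY part */
	*val = eq + 1;		/* VAL starts right after the '=' */
	return 0;
}
```

Because the split writes a NUL into the argument, the caller has to pass a writable buffer, which is also why the remount code works on the options string it was handed rather than on a constant.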
8653     +
8654     +/* handle branch deletion during remount */
8655     +static noinline_for_stack int do_remount_del_option(
8656     + char *optarg, int cur_branches,
8657     + struct unionfs_data *new_data,
8658     + struct path *new_lower_paths)
8659     +{
8660     + int err = -EINVAL;
8661     + int idx;
8662     + struct nameidata nd;
8663     +
8664     + /* optarg contains the branch name to delete */
8665     +
8666     + /*
8667     + * Find matching branch index. For now, this assumes that nothing
8668     + * has been mounted on top of this Unionfs stack. Once we have /odf
8669     + * and cache-coherency resolved, we'll address the branch-path
8670     + * uniqueness.
8671     + */
8672     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8673     + if (err) {
8674     + printk(KERN_ERR "unionfs: error accessing "
8675     + "lower directory \"%s\" (error %d)\n",
8676     + optarg, err);
8677     + goto out;
8678     + }
8679     + for (idx = 0; idx < cur_branches; idx++)
8680     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8681     + nd.path.dentry == new_lower_paths[idx].dentry)
8682     + break;
8683     + path_put(&nd.path); /* no longer needed */
8684     + if (idx == cur_branches) {
8685     + printk(KERN_ERR "unionfs: branch \"%s\" "
8686     + "not found\n", optarg);
8687     + err = -ENOENT;
8688     + goto out;
8689     + }
8690     + /* check if there are any open files on the branch to be deleted */
8691     + if (atomic_read(&new_data[idx].open_files) > 0) {
8692     + err = -EBUSY;
8693     + goto out;
8694     + }
8695     +
8696     + /*
8697     + * Now we have to delete the branch. First, release any handles it
8698     + * has. Then, move the remaining array indexes past "idx" in
8699     + * new_data and new_lower_paths one to the left. Finally, adjust
8700     + * cur_branches.
8701     + */
8702     + path_put(&new_lower_paths[idx]);
8703     +
8704     + if (idx < cur_branches - 1) {
8705     + /* if idx==cur_branches-1, we delete last branch: easy */
8706     + memmove(&new_data[idx], &new_data[idx+1],
8707     + (cur_branches - 1 - idx) *
8708     + sizeof(struct unionfs_data));
8709     + memmove(&new_lower_paths[idx], &new_lower_paths[idx+1],
8710     + (cur_branches - 1 - idx) * sizeof(struct path));
8711     + }
8712     +
8713     + err = 0;
8714     +out:
8715     + return err;
8716     +}
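The deletion above closes the gap by shifting every entry past idx one slot to the left with memmove, and skips the move when the last entry is removed. The same pattern on a plain int array can be sketched as follows; array_delete_at is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): remove element idx from an
 * array of n ints by shifting the tail one slot to the left. Returns the
 * new element count. memmove is used because source and destination overlap.
 */
size_t array_delete_at(int *a, size_t n, size_t idx)
{
	assert(a != NULL && idx < n);
	if (idx < n - 1)	/* deleting the last element needs no move */
		memmove(&a[idx], &a[idx + 1], (n - 1 - idx) * sizeof(a[0]));
	return n - 1;
}
```

As in the patch, the array itself is not shrunk here; only the logical count changes, and the caller decides later whether to reallocate.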
8717     +
8718     +/* handle branch insertion during remount */
8719     +static noinline_for_stack int do_remount_add_option(
8720     + char *optarg, int cur_branches,
8721     + struct unionfs_data *new_data,
8722     + struct path *new_lower_paths,
8723     + int *high_branch_id)
8724     +{
8725     + int err = -EINVAL;
8726     + int perms;
8727     + int idx = 0; /* default: insert at beginning */
8728     + char *new_branch , *modename = NULL;
8729     + struct nameidata nd;
8730     +
8731     + /*
8732     + * optarg can be of several forms:
8733     + *
8734     + * /bar:/foo insert /foo before /bar
8735     + * /bar:/foo=ro insert /foo in ro mode before /bar
8736     + * /foo insert /foo in the beginning (prepend)
8737     + * :/foo insert /foo at the end (append)
8738     + */
8739     + if (*optarg == ':') { /* append? */
8740     + new_branch = optarg + 1; /* skip ':' */
8741     + idx = cur_branches;
8742     + goto found_insertion_point;
8743     + }
8744     + new_branch = strchr(optarg, ':');
8745     + if (!new_branch) { /* prepend? */
8746     + new_branch = optarg;
8747     + goto found_insertion_point;
8748     + }
8749     + *new_branch++ = '\0'; /* holds path+mode of new branch */
8750     +
8751     + /*
8752     + * Find matching branch index. For now, this assumes that nothing
8753     + * has been mounted on top of this Unionfs stack. Once we have /odf
8754     + * and cache-coherency resolved, we'll address the branch-path
8755     + * uniqueness.
8756     + */
8757     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8758     + if (err) {
8759     + printk(KERN_ERR "unionfs: error accessing "
8760     + "lower directory \"%s\" (error %d)\n",
8761     + optarg, err);
8762     + goto out;
8763     + }
8764     + for (idx = 0; idx < cur_branches; idx++)
8765     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8766     + nd.path.dentry == new_lower_paths[idx].dentry)
8767     + break;
8768     + path_put(&nd.path); /* no longer needed */
8769     + if (idx == cur_branches) {
8770     + printk(KERN_ERR "unionfs: branch \"%s\" "
8771     + "not found\n", optarg);
8772     + err = -ENOENT;
8773     + goto out;
8774     + }
8775     +
8776     + /*
8777     + * At this point idx holds the index before which the new branch
8778     + * should be inserted.
8779     + */
8780     +found_insertion_point:
8781     + /* find the mode for the new branch */
8782     + if (new_branch)
8783     + modename = strchr(new_branch, '=');
8784     + if (modename)
8785     + *modename++ = '\0';
8786     + if (!new_branch || !*new_branch) {
8787     + printk(KERN_ERR "unionfs: null new branch\n");
8788     + err = -EINVAL;
8789     + goto out;
8790     + }
8791     + err = parse_branch_mode(modename, &perms);
8792     + if (err) {
8793     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
8794     + "branch \"%s\"\n", modename, new_branch);
8795     + goto out;
8796     + }
8797     + err = path_lookup(new_branch, LOOKUP_FOLLOW, &nd);
8798     + if (err) {
8799     + printk(KERN_ERR "unionfs: error accessing "
8800     + "lower directory \"%s\" (error %d)\n",
8801     + new_branch, err);
8802     + goto out;
8803     + }
8804     + /*
8805     + * It's probably safe to run check_branch on the new branch. Note:
8806     + * we don't allow inserting branches which are themselves unionfs
8807     + * mounts (check_branch returns EINVAL in that case). This is
8808     + * because this code base doesn't support stacking unionfs: the ODF
8809     + * code base supports that correctly.
8810     + */
8811     + err = check_branch(&nd);
8812     + if (err) {
8813     + printk(KERN_ERR "unionfs: lower directory "
8814     + "\"%s\" is not a valid branch\n", new_branch);
8815     + path_put(&nd.path);
8816     + goto out;
8817     + }
8818     +
8819     + /*
8820     + * Now we have to insert the new branch. But first, move the bits
8821     + * to make space for the new branch, if needed. Finally, adjust
8822     + * cur_branches.
8823     + * We don't release nd here; it's kept until umount/remount.
8824     + */
8825     + if (idx < cur_branches) {
8826     + /* if idx==cur_branches, we append: easy */
8827     + memmove(&new_data[idx+1], &new_data[idx],
8828     + (cur_branches - idx) * sizeof(struct unionfs_data));
8829     + memmove(&new_lower_paths[idx+1], &new_lower_paths[idx],
8830     + (cur_branches - idx) * sizeof(struct path));
8831     + }
8832     + new_lower_paths[idx].dentry = nd.path.dentry;
8833     + new_lower_paths[idx].mnt = nd.path.mnt;
8834     +
8835     + new_data[idx].sb = nd.path.dentry->d_sb;
8836     + atomic_set(&new_data[idx].open_files, 0);
8837     + new_data[idx].branchperms = perms;
8838     + new_data[idx].branch_id = ++*high_branch_id; /* assign new branch ID */
8839     +
8840     + err = 0;
8841     +out:
8842     + return err;
8843     +}
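The add option accepts three argument shapes: "/bar:/foo[=mode]" (insert before /bar), "/foo[=mode]" (prepend), and ":/foo[=mode]" (append). A rough user-space sketch of just the string classification is shown below. The struct, enum, and function names are hypothetical; the path lookups and mode validation done by the patch are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types, for illustration only. */
enum add_where { ADD_PREPEND, ADD_APPEND, ADD_BEFORE };

struct add_request {
	enum add_where where;
	char *anchor;	/* existing branch to insert before, or NULL */
	char *branch;	/* path of the new branch */
	char *mode;	/* text after '=', or NULL if no mode was given */
};

/* Classify and split arg in place. Returns 0, or -1 on an empty branch. */
int parse_add_arg(char *arg, struct add_request *req)
{
	char *colon;

	assert(arg != NULL && req != NULL);
	req->anchor = NULL;
	if (*arg == ':') {			/* ":/foo" appends */
		req->where = ADD_APPEND;
		req->branch = arg + 1;
	} else if ((colon = strchr(arg, ':')) == NULL) {
		req->where = ADD_PREPEND;	/* "/foo" prepends */
		req->branch = arg;
	} else {				/* "/bar:/foo" */
		*colon = '\0';
		req->where = ADD_BEFORE;
		req->anchor = arg;
		req->branch = colon + 1;
	}
	req->mode = strchr(req->branch, '=');
	if (req->mode != NULL) {
		*req->mode = '\0';		/* cut the mode off the path */
		req->mode++;
	}
	return *req->branch == '\0' ? -1 : 0;
}
```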
8844     +
8845     +
8846     +/*
8847     + * Support branch management options on remount.
8848     + *
8849     + * See Documentation/filesystems/unionfs/ for details.
8850     + *
8851     + * @flags: numeric mount options
8852     + * @options: mount options string
8853     + *
8854     + * This function can rearrange a mounted union dynamically, adding and
8855     + * removing branches, including changing branch modes. Clearly this has to
8856     + * be done safely and atomically. Luckily, the VFS already calls this
8857     + * function with lock_super(sb) and lock_kernel() held, preventing
8858     + * concurrent mixing of new mounts, remounts, and unmounts. Moreover,
8859     + * do_remount_sb(), our caller function, already called shrink_dcache_sb(sb)
8860     + * to purge dentries/inodes from our superblock, and also called
8861     + * fsync_super(sb) to purge any dirty pages. So we're good.
8862     + *
8863     + * XXX: however, our remount code may also need to invalidate mapped pages
8864     + * so as to force them to be re-gotten from the (newly reconfigured) lower
8865     + * branches. This has to wait for proper mmap and cache coherency support
8866     + * in the VFS.
8867     + *
8868     + */
8869     +static int unionfs_remount_fs(struct super_block *sb, int *flags,
8870     + char *options)
8871     +{
8872     + int err = 0;
8873     + int i;
8874     + char *optionstmp, *tmp_to_free; /* kstrdup'ed copy of "options" */
8875     + char *optname;
8876     + int cur_branches = 0; /* no. of current branches */
8877     + int new_branches = 0; /* no. of branches actually left in the end */
8878     + int add_branches; /* est. no. of branches to add */
8879     + int del_branches; /* est. no. of branches to del */
8880     + int max_branches; /* max possible no. of branches */
8881     + struct unionfs_data *new_data = NULL, *tmp_data = NULL;
8882     + struct path *new_lower_paths = NULL, *tmp_lower_paths = NULL;
8883     + struct inode **new_lower_inodes = NULL;
8884     + int new_high_branch_id; /* new high branch ID */
8885     + int size; /* memory allocation size, temp var */
8886     + int old_ibstart, old_ibend;
8887     +
8888     + unionfs_write_lock(sb);
8889     +
8890     + /*
8891     + * The VFS will take care of "ro" and "rw" flags, and we can safely
8892     + * ignore MS_SILENT, but anything else left over is an error. So we
8893     + * need to check if any other flags may have been passed (none are
8894     + * allowed/supported as of now).
8895     + */
8896     + if ((*flags & ~(MS_RDONLY | MS_SILENT)) != 0) {
8897     + printk(KERN_ERR
8898     + "unionfs: remount flags 0x%x unsupported\n", *flags);
8899     + err = -EINVAL;
8900     + goto out_error;
8901     + }
8902     +
8903     + /*
8904     + * If 'options' is NULL, it's probably because the user just changed
8905     + * the union to a "ro" or "rw" and the VFS took care of it. So
8906     + * nothing to do and we're done.
8907     + */
8908     + if (!options || options[0] == '\0')
8909     + goto out_error;
8910     +
8911     + /*
8912     + * Find out how many branches we will have in the end, counting
8913     + * "add" and "del" commands. Copy the "options" string because
8914     + * strsep modifies the string and we need it later.
8915     + */
8916     + tmp_to_free = kstrdup(options, GFP_KERNEL);
8917     + optionstmp = tmp_to_free;
8918     + if (unlikely(!optionstmp)) {
8919     + err = -ENOMEM;
8920     + goto out_free;
8921     + }
8922     + cur_branches = sbmax(sb); /* current no. branches */
8923     + new_branches = sbmax(sb);
8924     + del_branches = 0;
8925     + add_branches = 0;
8926     + new_high_branch_id = sbhbid(sb); /* save current high_branch_id */
8927     + while ((optname = strsep(&optionstmp, ",")) != NULL) {
8928     + char *optarg;
8929     +
8930     + if (!optname || !*optname)
8931     + continue;
8932     +
8933     + optarg = strchr(optname, '=');
8934     + if (optarg)
8935     + *optarg++ = '\0';
8936     +
8937     + if (!strcmp("add", optname))
8938     + add_branches++;
8939     + else if (!strcmp("del", optname))
8940     + del_branches++;
8941     + }
8942     + kfree(tmp_to_free);
8943     + /* after all changes, will we have at least one branch left? */
8944     + if ((new_branches + add_branches - del_branches) < 1) {
8945     + printk(KERN_ERR
8946     + "unionfs: no branches left after remount\n");
8947     + err = -EINVAL;
8948     + goto out_free;
8949     + }
8950     +
8951     + /*
8952     + * Since we haven't actually parsed all the add/del options, nor
8953     + * have we checked them for errors, we don't know for sure how many
8954     + * branches we will have after all changes have taken place. In
8955     + * fact, the total number of branches left could be less than what
8956     + * we have now. So we need to allocate space for a temporary
8957     + * placeholder that is at least as large as the maximum number of
8958     + * branches we *could* have, which is the current number plus all
8959     + * the additions. Once we're done with these temp placeholders, we
8960     + * may have to re-allocate the final size, copy over from the temp,
8961     + * and then free the temps (done near the end of this function).
8962     + */
8963     + max_branches = cur_branches + add_branches;
8964     + /* allocate space for new per-branch data */
8965     + tmp_data = kcalloc(max_branches,
8966     + sizeof(struct unionfs_data), GFP_KERNEL);
8967     + if (unlikely(!tmp_data)) {
8968     + err = -ENOMEM;
8969     + goto out_free;
8970     + }
8971     + /* allocate space for new pointers to lower paths */
8972     + tmp_lower_paths = kcalloc(max_branches,
8973     + sizeof(struct path), GFP_KERNEL);
8974     + if (unlikely(!tmp_lower_paths)) {
8975     + err = -ENOMEM;
8976     + goto out_free;
8977     + }
8978     + /* copy current info into new placeholders, incrementing refcnts */
8979     + memcpy(tmp_data, UNIONFS_SB(sb)->data,
8980     + cur_branches * sizeof(struct unionfs_data));
8981     + memcpy(tmp_lower_paths, UNIONFS_D(sb->s_root)->lower_paths,
8982     + cur_branches * sizeof(struct path));
8983     + for (i = 0; i < cur_branches; i++)
8984     + path_get(&tmp_lower_paths[i]); /* drop refs at end of fxn */
8985     +
8986     + /*******************************************************************
8987     + * For each branch command, do path_lookup on the requested branch,
8988     + * and apply the change to a temp branch list. To handle errors, we
8989     + * already dup'ed the old arrays (above), and increased the refcnts
8990     + * on various f/s objects. So now we can do all the path_lookups
8991     + * and branch-management commands on the new arrays. If we fail
8992     + * midway, we free the tmp arrays and *put all objects. If we succeed,
8993     + * then we free old arrays and *put its objects, and then replace
8994     + * the arrays with the new tmp list (we may have to re-allocate the
8995     + * memory because the temp lists could have been larger than what we
8996     + * actually needed).
8997     + *******************************************************************/
8998     +
8999     + while ((optname = strsep(&options, ",")) != NULL) {
9000     + char *optarg;
9001     +
9002     + if (!optname || !*optname)
9003     + continue;
9004     + /*
9005     + * At this stage optname holds a comma-delimited option, but
9006     + * without the commas. Next, we need to break the string on
9007     + * the '=' symbol to separate CMD=ARG, where ARG itself can
9008     + * be KEY=VAL. For example, in mode=/foo=rw, CMD is "mode",
9009     + * KEY is "/foo", and VAL is "rw".
9010     + */
9011     + optarg = strchr(optname, '=');
9012     + if (optarg)
9013     + *optarg++ = '\0';
9014     + /* incgen remount option (instead of old ioctl) */
9015     + if (!strcmp("incgen", optname)) {
9016     + err = 0;
9017     + goto out_no_change;
9018     + }
9019     +
9020     + /*
9021     + * All of our options take an argument now. (Insert ones
9022     + * that don't above this check.) So at this stage optname
9023     + * contains the CMD part and optarg contains the ARG part.
9024     + */
9025     + if (!optarg || !*optarg) {
9026     + printk(KERN_ERR "unionfs: all remount options require "
9027     + "an argument (%s)\n", optname);
9028     + err = -EINVAL;
9029     + goto out_release;
9030     + }
9031     +
9032     + if (!strcmp("add", optname)) {
9033     + err = do_remount_add_option(optarg, new_branches,
9034     + tmp_data,
9035     + tmp_lower_paths,
9036     + &new_high_branch_id);
9037     + if (err)
9038     + goto out_release;
9039     + new_branches++;
9040     + if (new_branches > UNIONFS_MAX_BRANCHES) {
9041     + printk(KERN_ERR "unionfs: command exceeds "
9042     + "%d branches\n", UNIONFS_MAX_BRANCHES);
9043     + err = -E2BIG;
9044     + goto out_release;
9045     + }
9046     + continue;
9047     + }
9048     + if (!strcmp("del", optname)) {
9049     + err = do_remount_del_option(optarg, new_branches,
9050     + tmp_data,
9051     + tmp_lower_paths);
9052     + if (err)
9053     + goto out_release;
9054     + new_branches--;
9055     + continue;
9056     + }
9057     + if (!strcmp("mode", optname)) {
9058     + err = do_remount_mode_option(optarg, new_branches,
9059     + tmp_data,
9060     + tmp_lower_paths);
9061     + if (err)
9062     + goto out_release;
9063     + continue;
9064     + }
9065     +
9066     + /*
9067     + * When you use "mount -o remount,ro", mount(8) will
9068     + * reportedly pass the original dirs= string from
9069     + * /proc/mounts. So for now, we have to ignore dirs= and
9070     + * not consider it an error, unless we want to allow users
9071     + * to pass dirs= in remount. Note that to allow the VFS to
9072     + * actually process the ro/rw remount options, we have to
9073     + * return 0 from this function.
9074     + */
9075     + if (!strcmp("dirs", optname)) {
9076     + printk(KERN_WARNING
9077     + "unionfs: remount ignoring option \"%s\"\n",
9078     + optname);
9079     + continue;
9080     + }
9081     +
9082     + err = -EINVAL;
9083     + printk(KERN_ERR
9084     + "unionfs: unrecognized option \"%s\"\n", optname);
9085     + goto out_release;
9086     + }
9087     +
9088     +out_no_change:
9089     +
9090     + /******************************************************************
9091     + * WE'RE ALMOST DONE: check if leftmost branch might be read-only,
9092     + * see if we need to allocate a small-sized new vector, copy the
9093     + * vectors to their correct place, release the refcnt of the older
9094     + * ones, and return. Also handle invalidating any pages that will
9095     + * have to be re-read.
9096     + *******************************************************************/
9097     +
9098     + if (!(tmp_data[0].branchperms & MAY_WRITE)) {
9099     + printk(KERN_ERR "unionfs: leftmost branch cannot be read-only "
9100     + "(use \"remount,ro\" to create a read-only union)\n");
9101     + err = -EINVAL;
9102     + goto out_release;
9103     + }
9104     +
9105     + /* (re)allocate space for new per-branch data */
9106     + size = new_branches * sizeof(struct unionfs_data);
9107     + new_data = krealloc(tmp_data, size, GFP_KERNEL);
9108     + if (unlikely(!new_data)) {
9109     + err = -ENOMEM;
9110     + goto out_release;
9111     + }
9112     +
9113     + /* allocate space for new pointers to lower paths */
9114     + size = new_branches * sizeof(struct path);
9115     + new_lower_paths = krealloc(tmp_lower_paths, size, GFP_KERNEL);
9116     + if (unlikely(!new_lower_paths)) {
9117     + err = -ENOMEM;
9118     + goto out_release;
9119     + }
9120     +
9121     + /* allocate space for new pointers to lower inodes */
9122     + new_lower_inodes = kcalloc(new_branches,
9123     + sizeof(struct inode *), GFP_KERNEL);
9124     + if (unlikely(!new_lower_inodes)) {
9125     + err = -ENOMEM;
9126     + goto out_release;
9127     + }
9128     +
9129     + /*
9130     + * OK, just before we actually put the new set of branches in place,
9131     + * we need to ensure that our own f/s has no dirty objects left.
9132     + * Luckily, do_remount_sb() already calls shrink_dcache_sb(sb) and
9133     + * fsync_super(sb), taking care of dentries, inodes, and dirty
9134     + * pages. So all that's left is for us to invalidate any leftover
9135     + * (non-dirty) pages to ensure that they will be re-read from the
9136     + * new lower branches (and to support mmap).
9137     + */
9138     +
9139     + /*
9140     + * Once we finish the remounting successfully, our superblock
9141     + * generation number will have increased. This will be detected by
9142     + * our dentry-revalidation code upon subsequent f/s operations
9143     + * through unionfs. The revalidation code will rebuild the union of
9144     + * lower inodes for a given unionfs inode and invalidate any pages
9145     + * of such "stale" inodes (by calling our purge_inode_data
9146     + * function). This revalidation will happen lazily and
9147     + * incrementally, as users perform operations on cached inodes. We
9148     + * would like to encourage this revalidation to happen sooner if
9149     + * possible, so we would like to invalidate as many other pages in
9150     + * our superblock as we can. We used to call drop_pagecache_sb() or
9151     + * a variant thereof, but either method was racy (drop_caches alone
9152     + * is known to be racy). So now we let the revalidation happen on a
9153     + * per file basis in ->d_revalidate.
9154     + */
9155     +
9156     + /* grab new lower super references; release old ones */
9157     + for (i = 0; i < new_branches; i++)
9158     + atomic_inc(&new_data[i].sb->s_active);
9159     + for (i = 0; i < sbmax(sb); i++)
9160     + atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
9161     +
9162     + /* copy new vectors into their correct place */
9163     + tmp_data = UNIONFS_SB(sb)->data;
9164     + UNIONFS_SB(sb)->data = new_data;
9165     + new_data = NULL; /* so don't free good pointers below */
9166     + tmp_lower_paths = UNIONFS_D(sb->s_root)->lower_paths;
9167     + UNIONFS_D(sb->s_root)->lower_paths = new_lower_paths;
9168     + new_lower_paths = NULL; /* so don't free good pointers below */
9169     +
9170     + /* update our unionfs_sb_info and root dentry index of last branch */
9171     + i = sbmax(sb); /* save no. of branches to release at end */
9172     + sbend(sb) = new_branches - 1;
9173     + dbend(sb->s_root) = new_branches - 1;
9174     + old_ibstart = ibstart(sb->s_root->d_inode);
9175     + old_ibend = ibend(sb->s_root->d_inode);
9176     + ibend(sb->s_root->d_inode) = new_branches - 1;
9177     + UNIONFS_D(sb->s_root)->bcount = new_branches;
9178     + new_branches = i; /* no. of branches to release below */
9179     +
9180     + /*
9181     + * Update lower inodes: 3 steps
9182     + * 1. grab ref on all new lower inodes
9183     + */
9184     + for (i = dbstart(sb->s_root); i <= dbend(sb->s_root); i++) {
9185     + struct dentry *lower_dentry =
9186     + unionfs_lower_dentry_idx(sb->s_root, i);
9187     + igrab(lower_dentry->d_inode);
9188     + new_lower_inodes[i] = lower_dentry->d_inode;
9189     + }
9190     + /* 2. release reference on all older lower inodes */
9191     + iput_lowers(sb->s_root->d_inode, old_ibstart, old_ibend, true);
9192     + /* 3. update root dentry's inode to new lower_inodes array */
9193     + UNIONFS_I(sb->s_root->d_inode)->lower_inodes = new_lower_inodes;
9194     + new_lower_inodes = NULL;
9195     +
9196     + /* maxbytes may have changed */
9197     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
9198     + /* update high branch ID */
9199     + sbhbid(sb) = new_high_branch_id;
9200     +
9201     + /* update our sb->generation for revalidating objects */
9202     + i = atomic_inc_return(&UNIONFS_SB(sb)->generation);
9203     + atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
9204     + atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
9205     + if (!(*flags & MS_SILENT))
9206     + pr_info("unionfs: %s: new generation number %d\n",
9207     + UNIONFS_SB(sb)->dev_name, i);
9208     + /* finally, update the root dentry's times */
9209     + unionfs_copy_attr_times(sb->s_root->d_inode);
9210     + err = 0; /* reset to success */
9211     +
9212     + /*
9213     + * The code above falls through to the next label, and releases the
9214     + * refcnts of the older ones (stored in tmp_*): if we fell through
9215     + * here, it means success. However, if we jump directly to this
9216     + * label from any error above, then an error occurred after we
9217     + * grabbed various refcnts, and so we have to release the
9218     + * temporarily constructed structures.
9219     + */
9220     +out_release:
9221     + /* no need to cleanup/release anything in tmp_data */
9222     + if (tmp_lower_paths)
9223     + for (i = 0; i < new_branches; i++)
9224     + path_put(&tmp_lower_paths[i]);
9225     +out_free:
9226     + kfree(tmp_lower_paths);
9227     + kfree(tmp_data);
9228     + kfree(new_lower_paths);
9229     + kfree(new_data);
9230     + kfree(new_lower_inodes);
9231     +out_error:
9232     + unionfs_check_dentry(sb->s_root);
9233     + unionfs_write_unlock(sb);
9234     + return err;
9235     +}
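The first pass of unionfs_remount_fs only counts "add" and "del" commands so it can reject a remount that would leave no branches and size the temporary arrays. A minimal sketch of that counting pass is given below, assuming a read-only scan with strcspn instead of the strsep-on-a-copy approach used in the patch; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the token [tok, tok+len) is exactly name, or name followed by '='. */
int token_is_cmd(const char *tok, size_t len, const char *name)
{
	size_t n = strlen(name);

	if (len < n || strncmp(tok, name, n) != 0)
		return 0;
	return len == n || tok[n] == '=';
}

/*
 * Return the branch count a comma-separated option string would leave,
 * starting from cur branches. Empty tokens and other commands are ignored.
 */
int projected_branches(const char *options, int cur)
{
	const char *p = options;
	int adds = 0, dels = 0;

	assert(options != NULL);
	while (*p != '\0') {
		size_t len = strcspn(p, ",");	/* length of this token */

		if (token_is_cmd(p, len, "add"))
			adds++;
		else if (token_is_cmd(p, len, "del"))
			dels++;
		p += len;
		if (*p == ',')
			p++;			/* step over the separator */
	}
	return cur + adds - dels;
}
```

Like the original, this is only an estimate: the commands have not been validated yet, so the real final count is known only after the second pass.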
9236     +
9237     +/*
9238     + * Called by iput() when the inode reference count reached zero
9239     + * and the inode is not hashed anywhere. Used to clear anything
9240     + * that needs to be, before the inode is completely destroyed and put
9241     + * on the inode free list.
9242     + *
9243     + * No need to lock sb info's rwsem.
9244     + */
9245     +static void unionfs_clear_inode(struct inode *inode)
9246     +{
9247     + int bindex, bstart, bend;
9248     + struct inode *lower_inode;
9249     + struct list_head *pos, *n;
9250     + struct unionfs_dir_state *rdstate;
9251     +
9252     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9253     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9254     + list_del(&rdstate->cache);
9255     + free_rdstate(rdstate);
9256     + }
9257     +
9258     + /*
9259     + * Decrement a reference to a lower_inode, which was incremented
9260     + * by our read_inode when it was created initially.
9261     + */
9262     + bstart = ibstart(inode);
9263     + bend = ibend(inode);
9264     + if (bstart >= 0) {
9265     + for (bindex = bstart; bindex <= bend; bindex++) {
9266     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
9267     + if (!lower_inode)
9268     + continue;
9269     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
9270     + /* see Documentation/filesystems/unionfs/issues.txt */
9271     + lockdep_off();
9272     + iput(lower_inode);
9273     + lockdep_on();
9274     + }
9275     + }
9276     +
9277     + kfree(UNIONFS_I(inode)->lower_inodes);
9278     + UNIONFS_I(inode)->lower_inodes = NULL;
9279     +}
9280     +
9281     +static struct inode *unionfs_alloc_inode(struct super_block *sb)
9282     +{
9283     + struct unionfs_inode_info *i;
9284     +
9285     + i = kmem_cache_alloc(unionfs_inode_cachep, GFP_KERNEL);
9286     + if (unlikely(!i))
9287     + return NULL;
9288     +
9289     + /* memset everything up to the inode to 0 */
9290     + memset(i, 0, offsetof(struct unionfs_inode_info, vfs_inode));
9291     +
9292     + i->vfs_inode.i_version = 1;
9293     + return &i->vfs_inode;
9294     +}
9295     +
9296     +static void unionfs_destroy_inode(struct inode *inode)
9297     +{
9298     + kmem_cache_free(unionfs_inode_cachep, UNIONFS_I(inode));
9299     +}
9300     +
9301     +/* unionfs inode cache constructor */
9302     +static void init_once(void *obj)
9303     +{
9304     + struct unionfs_inode_info *i = obj;
9305     +
9306     + inode_init_once(&i->vfs_inode);
9307     +}
9308     +
9309     +int unionfs_init_inode_cache(void)
9310     +{
9311     + int err = 0;
9312     +
9313     + unionfs_inode_cachep =
9314     + kmem_cache_create("unionfs_inode_cache",
9315     + sizeof(struct unionfs_inode_info), 0,
9316     + SLAB_RECLAIM_ACCOUNT, init_once);
9317     + if (unlikely(!unionfs_inode_cachep))
9318     + err = -ENOMEM;
9319     + return err;
9320     +}
9321     +
9322     +/* unionfs inode cache destructor */
9323     +void unionfs_destroy_inode_cache(void)
9324     +{
9325     + if (unionfs_inode_cachep)
9326     + kmem_cache_destroy(unionfs_inode_cachep);
9327     +}
9328     +
9329     +/*
9330     + * Called when we have a dirty inode. Here we only throw out the
9331     + * parts of our readdir list that are too old.
9332     + *
9333     + * No need to grab sb info's rwsem.
9334     + */
9335     +static int unionfs_write_inode(struct inode *inode, int sync)
9336     +{
9337     + struct list_head *pos, *n;
9338     + struct unionfs_dir_state *rdstate;
9339     +
9340     + spin_lock(&UNIONFS_I(inode)->rdlock);
9341     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9342     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9343     + /* We keep this list in LRU order. */
9344     + if ((rdstate->access + RDCACHE_JIFFIES) > jiffies)
9345     + break;
9346     + UNIONFS_I(inode)->rdcount--;
9347     + list_del(&rdstate->cache);
9348     + free_rdstate(rdstate);
9349     + }
9350     + spin_unlock(&UNIONFS_I(inode)->rdlock);
9351     +
9352     + return 0;
9353     +}
9354     +
9355     +/*
9356     + * Used only in NFS, to kill any pending RPC tasks, so that subsequent
9357     + * code can actually succeed and won't leave tasks that need handling.
9358     + */
9359     +static void unionfs_umount_begin(struct super_block *sb)
9360     +{
9361     + struct super_block *lower_sb;
9362     + int bindex, bstart, bend;
9363     +
9364     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9365     +
9366     + bstart = sbstart(sb);
9367     + bend = sbend(sb);
9368     + for (bindex = bstart; bindex <= bend; bindex++) {
9369     + lower_sb = unionfs_lower_super_idx(sb, bindex);
9370     +
9371     + if (lower_sb && lower_sb->s_op &&
9372     + lower_sb->s_op->umount_begin)
9373     + lower_sb->s_op->umount_begin(lower_sb);
9374     + }
9375     +
9376     + unionfs_read_unlock(sb);
9377     +}
9378     +
9379     +static int unionfs_show_options(struct seq_file *m, struct vfsmount *mnt)
9380     +{
9381     + struct super_block *sb = mnt->mnt_sb;
9382     + int ret = 0;
9383     + char *tmp_page;
9384     + char *path;
9385     + int bindex, bstart, bend;
9386     + int perms;
9387     +
9388     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9389     +
9390     + unionfs_lock_dentry(sb->s_root, UNIONFS_DMUTEX_CHILD);
9391     +
9392     + tmp_page = (char *) __get_free_page(GFP_KERNEL);
9393     + if (unlikely(!tmp_page)) {
9394     + ret = -ENOMEM;
9395     + goto out;
9396     + }
9397     +
9398     + bstart = sbstart(sb);
9399     + bend = sbend(sb);
9400     +
9401     + seq_printf(m, ",dirs=");
9402     + for (bindex = bstart; bindex <= bend; bindex++) {
9403     + struct path p;
9404     + p.dentry = unionfs_lower_dentry_idx(sb->s_root, bindex);
9405     + p.mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
9406     + path = d_path(&p, tmp_page, PAGE_SIZE);
9407     + if (IS_ERR(path)) {
9408     + ret = PTR_ERR(path);
9409     + goto out;
9410     + }
9411     +
9412     + perms = branchperms(sb, bindex);
9413     +
9414     + seq_printf(m, "%s=%s", path,
9415     + perms & MAY_WRITE ? "rw" : "ro");
9416     + if (bindex != bend)
9417     + seq_printf(m, ":");
9418     + }
9419     +
9420     +out:
9421     + free_page((unsigned long) tmp_page);
9422     +
9423     + unionfs_unlock_dentry(sb->s_root);
9424     +
9425     + unionfs_read_unlock(sb);
9426     +
9427     + return ret;
9428     +}
9429     +
9430     +struct super_operations unionfs_sops = {
9431     + .delete_inode = unionfs_delete_inode,
9432     + .put_super = unionfs_put_super,
9433     + .statfs = unionfs_statfs,
9434     + .remount_fs = unionfs_remount_fs,
9435     + .clear_inode = unionfs_clear_inode,
9436     + .umount_begin = unionfs_umount_begin,
9437     + .show_options = unionfs_show_options,
9438     + .write_inode = unionfs_write_inode,
9439     + .alloc_inode = unionfs_alloc_inode,
9440     + .destroy_inode = unionfs_destroy_inode,
9441     +};
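
The option string that unionfs_show_options() emits above has the shape ",dirs=<path>=<rw|ro>[:<path>=<rw|ro>...]". The following is a minimal user-space sketch of that formatting only, not the kernel code: struct sketch_branch, SKETCH_MAY_WRITE and format_dirs_option() are hypothetical names introduced for illustration, and snprintf() stands in for seq_printf() and d_path().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's MAY_WRITE permission bit. */
#define SKETCH_MAY_WRITE 0x2

/* Hypothetical per-branch record: a lower path and its permission bits. */
struct sketch_branch {
	const char *path;
	int perms;
};

/*
 * Build ",dirs=p0=rw:p1=ro" into buf, mirroring the loop in
 * unionfs_show_options(): one "path=mode" pair per branch, with ":"
 * between pairs but not after the last one.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_dirs_option(char *buf, size_t len,
			      const struct sketch_branch *br, int nbranches)
{
	size_t used;
	int i, n;

	assert(buf != NULL && br != NULL);

	n = snprintf(buf, len, ",dirs=");
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (i = 0; i < nbranches; i++) {
		n = snprintf(buf + used, len - used, "%s=%s%s",
			     br[i].path,
			     (br[i].perms & SKETCH_MAY_WRITE) ? "rw" : "ro",
			     (i != nbranches - 1) ? ":" : "");
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Given a writable branch "/upper" followed by a read-only branch "/lower", the sketch fills the buffer with ",dirs=/upper=rw:/lower=ro". This mirrors the order in which the kernel loop walks bstart through bend.
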
9442     diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
9443     new file mode 100644
9444     index 0000000..99335a3
9445     --- /dev/null
9446     +++ b/fs/unionfs/union.h
9447     @@ -0,0 +1,670 @@
9448     +/*
9449     + * Copyright (c) 2003-2010 Erez Zadok
9450     + * Copyright (c) 2003-2006 Charles P. Wright
9451     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
9452     + * Copyright (c) 2005 Arun M. Krishnakumar
9453     + * Copyright (c) 2004-2006 David P. Quigley
9454     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
9455     + * Copyright (c) 2003 Puja Gupta
9456     + * Copyright (c) 2003 Harikesavan Krishnan
9457     + * Copyright (c) 2003-2010 Stony Brook University
9458     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
9459     + *
9460     + * This program is free software; you can redistribute it and/or modify
9461     + * it under the terms of the GNU General Public License version 2 as
9462     + * published by the Free Software Foundation.
9463     + */
9464     +
9465     +#ifndef _UNION_H_
9466     +#define _UNION_H_
9467     +
9468     +#include <linux/dcache.h>
9469     +#include <linux/file.h>
9470     +#include <linux/list.h>
9471     +#include <linux/fs.h>
9472     +#include <linux/mm.h>
9473     +#include <linux/module.h>
9474     +#include <linux/mount.h>
9475     +#include <linux/namei.h>
9476     +#include <linux/page-flags.h>
9477     +#include <linux/pagemap.h>
9478     +#include <linux/poll.h>
9479     +#include <linux/security.h>
9480     +#include <linux/seq_file.h>
9481     +#include <linux/slab.h>
9482     +#include <linux/spinlock.h>
9483     +#include <linux/smp_lock.h>
9484     +#include <linux/statfs.h>
9485     +#include <linux/string.h>
9486     +#include <linux/vmalloc.h>
9487     +#include <linux/writeback.h>
9488     +#include <linux/buffer_head.h>
9489     +#include <linux/xattr.h>
9490     +#include <linux/fs_stack.h>
9491     +#include <linux/magic.h>
9492     +#include <linux/log2.h>
9493     +#include <linux/poison.h>
9494     +#include <linux/mman.h>
9495     +#include <linux/backing-dev.h>
9496     +#include <linux/splice.h>
9497     +
9498     +#include <asm/system.h>
9499     +
9500     +#include <linux/union_fs.h>
9501     +
9502     +/* the file system name */
9503     +#define UNIONFS_NAME "unionfs"
9504     +
9505     +/* unionfs root inode number */
9506     +#define UNIONFS_ROOT_INO 1
9507     +
9508     +/* number of times we try to get a unique temporary file name */
9509     +#define GET_TMPNAM_MAX_RETRY 5
9510     +
9511     +/* maximum number of branches we support, to avoid memory blowup */
9512     +#define UNIONFS_MAX_BRANCHES 128
9513     +
9514     +/* minimum time (seconds) required for time-based cache-coherency */
9515     +#define UNIONFS_MIN_CC_TIME 3
9516     +
9517     +/* Operations vectors defined in specific files. */
9518     +extern struct file_operations unionfs_main_fops;
9519     +extern struct file_operations unionfs_dir_fops;
9520     +extern struct inode_operations unionfs_main_iops;
9521     +extern struct inode_operations unionfs_dir_iops;
9522     +extern struct inode_operations unionfs_symlink_iops;
9523     +extern struct super_operations unionfs_sops;
9524     +extern struct dentry_operations unionfs_dops;
9525     +extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
9526     +extern struct vm_operations_struct unionfs_vm_ops;
9527     +
9528     +/* How long should an entry be allowed to persist */
9529     +#define RDCACHE_JIFFIES (5*HZ)
9530     +
9531     +/* compatibility with Real-Time patches */
9532     +#ifdef CONFIG_PREEMPT_RT
9533     +# define unionfs_rw_semaphore compat_rw_semaphore
9534     +#else /* not CONFIG_PREEMPT_RT */
9535     +# define unionfs_rw_semaphore rw_semaphore
9536     +#endif /* not CONFIG_PREEMPT_RT */
9537     +
9538     +/* file private data. */
9539     +struct unionfs_file_info {
9540     + int bstart;
9541     + int bend;
9542     + atomic_t generation;
9543     +
9544     + struct unionfs_dir_state *rdstate;
9545     + struct file **lower_files;
9546     + int *saved_branch_ids; /* IDs of branches when file was opened */
9547     + const struct vm_operations_struct *lower_vm_ops;
9548     + bool wrote_to_file; /* for delayed copyup */
9549     +};
9550     +
9551     +/* unionfs inode data in memory */
9552     +struct unionfs_inode_info {
9553     + int bstart;
9554     + int bend;
9555     + atomic_t generation;
9556     + /* Stuff for readdir over NFS. */
9557     + spinlock_t rdlock;
9558     + struct list_head readdircache;
9559     + int rdcount;
9560     + int hashsize;
9561     + int cookie;
9562     +
9563     + /* The lower inodes */
9564     + struct inode **lower_inodes;
9565     +
9566     + struct inode vfs_inode;
9567     +};
9568     +
9569     +/* unionfs dentry data in memory */
9570     +struct unionfs_dentry_info {
9571     + /*
9572     + * This mutex is used to lock the dentry as soon as we get into a
9573     + * unionfs function from the VFS. Our lock ordering is that children
9574     + * go before their parents.
9575     + */
9576     + struct mutex lock;
9577     + int bstart;
9578     + int bend;
9579     + int bopaque;
9580     + int bcount;
9581     + atomic_t generation;
9582     + struct path *lower_paths;
9583     +};
9584     +
9585     +/* These are the pointers to our various objects. */
9586     +struct unionfs_data {
9587     + struct super_block *sb; /* lower super_block */
9588     + atomic_t open_files; /* number of open files on branch */
9589     + int branchperms;
9590     + int branch_id; /* unique branch ID at re/mount time */
9591     +};
9592     +
9593     +/* unionfs super-block data in memory */
9594     +struct unionfs_sb_info {
9595     + int bend;
9596     +
9597     + atomic_t generation;
9598     +
9599     + /*
9600     + * This rwsem is used to make sure that a branch management
9601     + * operation...
9602     + * 1) will not begin before all currently in-flight operations
9603     + * complete.
9604     + * 2) any new operations do not execute until the currently
9605     + * running branch management operation completes.
9606     + *
9607     + * The write_lock_owner records the PID of the task which grabbed
9608     + * the rw_sem for writing. If the same task also tries to grab the
9609     + * read lock, we allow it. This prevents a self-deadlock when
9610     + * branch-management is used on a pivot_root'ed union, because we
9611     + * have to ->lookup paths which belong to the same union.
9612     + */
9613     + struct unionfs_rw_semaphore rwsem;
9614     + pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
9615     + int high_branch_id; /* last unique branch ID given */
9616     + char *dev_name; /* to identify different unions in pr_debug */
9617     + struct unionfs_data *data;
9618     +};
9619     +
9620     +/*
9621     + * structure for making the linked list of entries by readdir on left branch
9622     + * to compare with entries on right branch
9623     + */
9624     +struct filldir_node {
9625     + struct list_head file_list; /* list for directory entries */
9626     + char *name; /* name entry */
9627     + int hash; /* name hash */
9628     + int namelen; /* name len since name is not 0 terminated */
9629     +
9630     + /*
9631     + * we can check for duplicate whiteouts and files in the same branch
9632     + * in order to return -EIO.
9633     + */
9634     + int bindex;
9635     +
9636     + /* is this a whiteout entry? */
9637     + int whiteout;
9638     +
9639     + /* Inline name, so we don't need to separately kmalloc small ones */
9640     + char iname[DNAME_INLINE_LEN_MIN];
9641     +};
9642     +
9643     +/* Directory hash table. */
9644     +struct unionfs_dir_state {
9645     + unsigned int cookie; /* the cookie, based off of rdversion */
9646     + unsigned int offset; /* The entry we have returned. */
9647     + int bindex;
9648     + loff_t dirpos; /* offset within the lower level directory */
9649     + int size; /* How big is the hash table? */
9650     + int hashentries; /* How many entries have been inserted? */
9651     + unsigned long access;
9652     +
9653     + /* This cache list is used when the inode keeps us around. */
9654     + struct list_head cache;
9655     + struct list_head list[0];
9656     +};
9657     +
9658     +/* externs needed for fanout.h or sioq.h */
9659     +extern int unionfs_get_nlinks(const struct inode *inode);
9660     +extern void unionfs_copy_attr_times(struct inode *upper);
9661     +extern void unionfs_copy_attr_all(struct inode *dest, const struct inode *src);
9662     +
9663     +/* include miscellaneous macros */
9664     +#include "fanout.h"
9665     +#include "sioq.h"
9666     +
9667     +/* externs for cache creation/deletion routines */
9668     +extern void unionfs_destroy_filldir_cache(void);
9669     +extern int unionfs_init_filldir_cache(void);
9670     +extern int unionfs_init_inode_cache(void);
9671     +extern void unionfs_destroy_inode_cache(void);
9672     +extern int unionfs_init_dentry_cache(void);
9673     +extern void unionfs_destroy_dentry_cache(void);
9674     +
9675     +/* Initialize and free readdir-specific state. */
9676     +extern int init_rdstate(struct file *file);
9677     +extern struct unionfs_dir_state *alloc_rdstate(struct inode *inode,
9678     + int bindex);
9679     +extern struct unionfs_dir_state *find_rdstate(struct inode *inode,
9680     + loff_t fpos);
9681     +extern void free_rdstate(struct unionfs_dir_state *state);
9682     +extern int add_filldir_node(struct unionfs_dir_state *rdstate,
9683     + const char *name, int namelen, int bindex,
9684     + int whiteout);
9685     +extern struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
9686     + const char *name, int namelen,
9687     + int is_whiteout);
9688     +
9689     +extern struct dentry **alloc_new_dentries(int objs);
9690     +extern struct unionfs_data *alloc_new_data(int objs);
9691     +
9692     +/* We can only use 32-bits of offset for rdstate --- blech! */
9693     +#define DIREOF (0xfffff)
9694     +#define RDOFFBITS 20 /* This is the number of bits in DIREOF. */
9695     +#define MAXRDCOOKIE (0xfff)
9696     +/* Turn an rdstate into an offset. */
9697     +static inline off_t rdstate2offset(struct unionfs_dir_state *buf)
9698     +{
9699     + off_t tmp;
9700     +
9701     + tmp = ((buf->cookie & MAXRDCOOKIE) << RDOFFBITS)
9702     + | (buf->offset & DIREOF);
9703     + return tmp;
9704     +}
9705     +
9706     +/* Macros for locking a super_block. */
9707     +enum unionfs_super_lock_class {
9708     + UNIONFS_SMUTEX_NORMAL,
9709     + UNIONFS_SMUTEX_PARENT, /* when locking on behalf of file */
9710     + UNIONFS_SMUTEX_CHILD, /* when locking on behalf of dentry */
9711     +};
9712     +static inline void unionfs_read_lock(struct super_block *sb, int subclass)
9713     +{
9714     + if (UNIONFS_SB(sb)->write_lock_owner &&
9715     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9716     + return;
9717     + down_read_nested(&UNIONFS_SB(sb)->rwsem, subclass);
9718     +}
9719     +static inline void unionfs_read_unlock(struct super_block *sb)
9720     +{
9721     + if (UNIONFS_SB(sb)->write_lock_owner &&
9722     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9723     + return;
9724     + up_read(&UNIONFS_SB(sb)->rwsem);
9725     +}
9726     +static inline void unionfs_write_lock(struct super_block *sb)
9727     +{
9728     + down_write(&UNIONFS_SB(sb)->rwsem);
9729     + UNIONFS_SB(sb)->write_lock_owner = current->pid;
9730     +}
9731     +static inline void unionfs_write_unlock(struct super_block *sb)
9732     +{
9733     + up_write(&UNIONFS_SB(sb)->rwsem);
9734     + UNIONFS_SB(sb)->write_lock_owner = 0;
9735     +}
9736     +
9737     +static inline void unionfs_double_lock_dentry(struct dentry *d1,
9738     + struct dentry *d2)
9739     +{
9740     + BUG_ON(d1 == d2);
9741     + if (d1 < d2) {
9742     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_PARENT);
9743     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_CHILD);
9744     + } else {
9745     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_PARENT);
9746     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_CHILD);
9747     + }
9748     +}
9749     +
9750     +static inline void unionfs_double_unlock_dentry(struct dentry *d1,
9751     + struct dentry *d2)
9752     +{
9753     + BUG_ON(d1 == d2);
9754     + if (d1 < d2) { /* unlock in reverse order of double_lock_dentry */
9755     + unionfs_unlock_dentry(d1);
9756     + unionfs_unlock_dentry(d2);
9757     + } else {
9758     + unionfs_unlock_dentry(d2);
9759     + unionfs_unlock_dentry(d1);
9760     + }
9761     +}
9762     +
9763     +static inline void unionfs_double_lock_parents(struct dentry *p1,
9764     + struct dentry *p2)
9765     +{
9766     + if (p1 == p2) {
9767     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9768     + return;
9769     + }
9770     + if (p1 < p2) {
9771     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9772     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_CHILD);
9773     + } else {
9774     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_PARENT);
9775     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_CHILD);
9776     + }
9777     +}
9778     +
9779     +static inline void unionfs_double_unlock_parents(struct dentry *p1,
9780     + struct dentry *p2)
9781     +{
9782     + if (p1 == p2) {
9783     + unionfs_unlock_dentry(p1);
9784     + return;
9785     + }
9786     + if (p1 < p2) { /* unlock in reverse order of double_lock_parents */
9787     + unionfs_unlock_dentry(p1);
9788     + unionfs_unlock_dentry(p2);
9789     + } else {
9790     + unionfs_unlock_dentry(p2);
9791     + unionfs_unlock_dentry(p1);
9792     + }
9793     +}
9794     +
9795     +extern int new_dentry_private_data(struct dentry *dentry, int subclass);
9796     +extern int realloc_dentry_private_data(struct dentry *dentry);
9797     +extern void free_dentry_private_data(struct dentry *dentry);
9798     +extern void update_bstart(struct dentry *dentry);
9799     +extern int init_lower_nd(struct nameidata *nd, unsigned int flags);
9800     +extern void release_lower_nd(struct nameidata *nd, int err);
9801     +
9802     +/*
9803     + * EXTERNALS:
9804     + */
9805     +
9806     +/* replicates the directory structure up to given dentry in given branch */
9807     +extern struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
9808     + const char *name, int bindex);
9809     +
9810     +/* partial lookup */
9811     +extern int unionfs_partial_lookup(struct dentry *dentry,
9812     + struct dentry *parent);
9813     +extern struct dentry *unionfs_lookup_full(struct dentry *dentry,
9814     + struct dentry *parent,
9815     + int lookupmode);
9816     +
9817     +/* copies a file from dbstart to newbindex branch */
9818     +extern int copyup_file(struct inode *dir, struct file *file, int bstart,
9819     + int newbindex, loff_t size);
9820     +extern int copyup_named_file(struct inode *dir, struct file *file,
9821     + char *name, int bstart, int new_bindex,
9822     + loff_t len);
9823     +/* copies a dentry from dbstart to newbindex branch */
9824     +extern int copyup_dentry(struct inode *dir, struct dentry *dentry,
9825     + int bstart, int new_bindex, const char *name,
9826     + int namelen, struct file **copyup_file, loff_t len);
9827     +/* helper functions for post-copyup actions */
9828     +extern void unionfs_postcopyup_setmnt(struct dentry *dentry);
9829     +extern void unionfs_postcopyup_release(struct dentry *dentry);
9830     +
9831     +/* Is this directory empty: 0 if it is empty, -ENOTEMPTY if not. */
9832     +extern int check_empty(struct dentry *dentry, struct dentry *parent,
9833     + struct unionfs_dir_state **namelist);
9834     +/* whiteout and opaque directory helpers */
9835     +extern char *alloc_whname(const char *name, int len);
9836     +extern bool is_whiteout_name(char **namep, int *namelenp);
9837     +extern bool is_validname(const char *name);
9838     +extern struct dentry *lookup_whiteout(const char *name,
9839     + struct dentry *lower_parent);
9840     +extern struct dentry *find_first_whiteout(struct dentry *dentry);
9841     +extern int unlink_whiteout(struct dentry *wh_dentry);
9842     +extern int check_unlink_whiteout(struct dentry *dentry,
9843     + struct dentry *lower_dentry, int bindex);
9844     +extern int create_whiteout(struct dentry *dentry, int start);
9845     +extern int delete_whiteouts(struct dentry *dentry, int bindex,
9846     + struct unionfs_dir_state *namelist);
9847     +extern int is_opaque_dir(struct dentry *dentry, int bindex);
9848     +extern int make_dir_opaque(struct dentry *dir, int bindex);
9849     +extern void unionfs_set_max_namelen(long *namelen);
9850     +
9851     +extern void unionfs_reinterpose(struct dentry *this_dentry);
9852     +extern struct super_block *unionfs_duplicate_super(struct super_block *sb);
9853     +
9854     +/* Locking functions. */
9855     +extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
9856     +extern int unionfs_getlk(struct file *file, struct file_lock *fl);
9857     +
9858     +/* Common file operations. */
9859     +extern int unionfs_file_revalidate(struct file *file, struct dentry *parent,
9860     + bool willwrite);
9861     +extern int unionfs_open(struct inode *inode, struct file *file);
9862     +extern int unionfs_file_release(struct inode *inode, struct file *file);
9863     +extern int unionfs_flush(struct file *file, fl_owner_t id);
9864     +extern long unionfs_ioctl(struct file *file, unsigned int cmd,
9865     + unsigned long arg);
9866     +extern int unionfs_fsync(struct file *file, struct dentry *dentry,
9867     + int datasync);
9868     +extern int unionfs_fasync(int fd, struct file *file, int flag);
9869     +
9870     +/* Inode operations */
9871     +extern struct inode *unionfs_iget(struct super_block *sb, unsigned long ino);
9872     +extern int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
9873     + struct inode *new_dir, struct dentry *new_dentry);
9874     +extern int unionfs_unlink(struct inode *dir, struct dentry *dentry);
9875     +extern int unionfs_rmdir(struct inode *dir, struct dentry *dentry);
9876     +
9877     +extern bool __unionfs_d_revalidate(struct dentry *dentry,
9878     + struct dentry *parent, bool willwrite);
9879     +extern bool is_negative_lower(const struct dentry *dentry);
9880     +extern bool is_newer_lower(const struct dentry *dentry);
9881     +extern void purge_sb_data(struct super_block *sb);
9882     +
9883     +/* The values for unionfs_interpose's flag. */
9884     +#define INTERPOSE_DEFAULT 0
9885     +#define INTERPOSE_LOOKUP 1
9886     +#define INTERPOSE_REVAL 2
9887     +#define INTERPOSE_REVAL_NEG 3
9888     +#define INTERPOSE_PARTIAL 4
9889     +
9890     +extern struct dentry *unionfs_interpose(struct dentry *this_dentry,
9891     + struct super_block *sb, int flag);
9892     +
9893     +#ifdef CONFIG_UNION_FS_XATTR
9894     +/* Extended attribute functions. */
9895     +extern void *unionfs_xattr_alloc(size_t size, size_t limit);
9896     +static inline void unionfs_xattr_kfree(const void *p)
9897     +{
9898     + kfree(p);
9899     +}
9900     +extern ssize_t unionfs_getxattr(struct dentry *dentry, const char *name,
9901     + void *value, size_t size);
9902     +extern int unionfs_removexattr(struct dentry *dentry, const char *name);
9903     +extern ssize_t unionfs_listxattr(struct dentry *dentry, char *list,
9904     + size_t size);
9905     +extern int unionfs_setxattr(struct dentry *dentry, const char *name,
9906     + const void *value, size_t size, int flags);
9907     +#endif /* CONFIG_UNION_FS_XATTR */
9908     +
9909     +/* The root directory is unhashed, but isn't deleted. */
9910     +static inline int d_deleted(struct dentry *d)
9911     +{
9912     + return d_unhashed(d) && (d != d->d_sb->s_root);
9913     +}
9914     +
9915     +/* unionfs_permission, check if we should bypass error to facilitate copyup */
9916     +#define IS_COPYUP_ERR(err) ((err) == -EROFS)
9917     +
9918     +/* unionfs_open, check if we need to copyup the file */
9919     +#define OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
9920     +#define IS_WRITE_FLAG(flag) ((flag) & OPEN_WRITE_FLAGS)
9921     +
9922     +static inline int branchperms(const struct super_block *sb, int index)
9923     +{
9924     + BUG_ON(index < 0);
9925     + return UNIONFS_SB(sb)->data[index].branchperms;
9926     +}
9927     +
9928     +static inline int set_branchperms(struct super_block *sb, int index, int perms)
9929     +{
9930     + BUG_ON(index < 0);
9931     + UNIONFS_SB(sb)->data[index].branchperms = perms;
9932     + return perms;
9933     +}
9934     +
9935     +/* check if readonly lower inode, but possibly unlinked (no inode->i_sb) */
9936     +static inline int __is_rdonly(const struct inode *inode)
9937     +{
9938     + /* if unlinked, can't be readonly (?) */
9939     + if (!inode->i_sb)
9940     + return 0;
9941     + return IS_RDONLY(inode);
9942     +
9943     +}
9944     +/* Is this file on a read-only branch? */
9945     +static inline int is_robranch_super(const struct super_block *sb, int index)
9946     +{
9947     + int ret;
9948     +
9949     + ret = (!(branchperms(sb, index) & MAY_WRITE)) ? -EROFS : 0;
9950     + return ret;
9951     +}
9952     +
9953     +/* Is this file on a read-only branch? */
9954     +static inline int is_robranch_idx(const struct dentry *dentry, int index)
9955     +{
9956     + struct super_block *lower_sb;
9957     +
9958     + BUG_ON(index < 0);
9959     +
9960     + if (!(branchperms(dentry->d_sb, index) & MAY_WRITE))
9961     + return -EROFS;
9962     +
9963     + lower_sb = unionfs_lower_super_idx(dentry->d_sb, index);
9964     + BUG_ON(lower_sb == NULL);
9965     + /*
9966     + * test sb flags directly, not IS_RDONLY(lower_inode) because the
9967     + * lower_dentry could be a negative.
9968     + */
9969     + if (lower_sb->s_flags & MS_RDONLY)
9970     + return -EROFS;
9971     +
9972     + return 0;
9973     +}
9974     +
9975     +static inline int is_robranch(const struct dentry *dentry)
9976     +{
9977     + int index;
9978     +
9979     + index = UNIONFS_D(dentry)->bstart;
9980     + BUG_ON(index < 0);
9981     +
9982     + return is_robranch_idx(dentry, index);
9983     +}
9984     +
9985     +/*
9986     + * EXTERNALS:
9987     + */
9988     +extern int check_branch(struct nameidata *nd);
9989     +extern int parse_branch_mode(const char *name, int *perms);
9990     +
9991     +/* locking helpers */
9992     +static inline struct dentry *lock_parent(struct dentry *dentry)
9993     +{
9994     + struct dentry *dir = dget_parent(dentry);
9995     + mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
9996     + return dir;
9997     +}
9998     +static inline struct dentry *lock_parent_wh(struct dentry *dentry)
9999     +{
10000     + struct dentry *dir = dget_parent(dentry);
10001     +
10002     + mutex_lock_nested(&dir->d_inode->i_mutex, UNIONFS_DMUTEX_WHITEOUT);
10003     + return dir;
10004     +}
10005     +
10006     +static inline void unlock_dir(struct dentry *dir)
10007     +{
10008     + mutex_unlock(&dir->d_inode->i_mutex);
10009     + dput(dir);
10010     +}
10011     +
10012     +/* lock base inode mutex before calling lookup_one_len */
10013     +static inline struct dentry *lookup_lck_len(const char *name,
10014     + struct dentry *base, int len)
10015     +{
10016     + struct dentry *d;
10017     + mutex_lock(&base->d_inode->i_mutex);
10018     + d = lookup_one_len(name, base, len);
10019     + mutex_unlock(&base->d_inode->i_mutex);
10020     + return d;
10021     +}
10022     +
10023     +static inline struct vfsmount *unionfs_mntget(struct dentry *dentry,
10024     + int bindex)
10025     +{
10026     + struct vfsmount *mnt;
10027     +
10028     + BUG_ON(!dentry || bindex < 0);
10029     +
10030     + mnt = mntget(unionfs_lower_mnt_idx(dentry, bindex));
10031     +#ifdef CONFIG_UNION_FS_DEBUG
10032     + if (!mnt)
10033     + pr_debug("unionfs: mntget: mnt=%p bindex=%d\n",
10034     + mnt, bindex);
10035     +#endif /* CONFIG_UNION_FS_DEBUG */
10036     +
10037     + return mnt;
10038     +}
10039     +
10040     +static inline void unionfs_mntput(struct dentry *dentry, int bindex)
10041     +{
10042     + struct vfsmount *mnt;
10043     +
10044     + if (!dentry && bindex < 0)
10045     + return;
10046     + BUG_ON(!dentry || bindex < 0);
10047     +
10048     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
10049     +#ifdef CONFIG_UNION_FS_DEBUG
10050     + /*
10051     + * Directories can have NULL lower objects in between start/end, but
10052     + * NOT if at the start/end range. We cannot verify that this dentry
10053     + * is a type=DIR, because it may already be a negative dentry. But
10054     + * if dbstart is greater than dbend, we know that this couldn't have
10055     + * been a regular file: it had to have been a directory.
10056     + */
10057     + if (!mnt && !(bindex > dbstart(dentry) && bindex < dbend(dentry)))
10058     + pr_debug("unionfs: mntput: mnt=%p bindex=%d\n", mnt, bindex);
10059     +#endif /* CONFIG_UNION_FS_DEBUG */
10060     + mntput(mnt);
10061     +}
10062     +
10063     +#ifdef CONFIG_UNION_FS_DEBUG
10064     +
10065     +/* useful for tracking code reachability */
10066     +#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
10067     +
10068     +#define unionfs_check_inode(i) __unionfs_check_inode((i), \
10069     + __FILE__, __func__, __LINE__)
10070     +#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
10071     + __FILE__, __func__, __LINE__)
10072     +#define unionfs_check_file(f) __unionfs_check_file((f), \
10073     + __FILE__, __func__, __LINE__)
10074     +#define unionfs_check_nd(n) __unionfs_check_nd((n), \
10075     + __FILE__, __func__, __LINE__)
10076     +#define show_branch_counts(sb) __show_branch_counts((sb), \
10077     + __FILE__, __func__, __LINE__)
10078     +#define show_inode_times(i) __show_inode_times((i), \
10079     + __FILE__, __func__, __LINE__)
10080     +#define show_dinode_times(d) __show_dinode_times((d), \
10081     + __FILE__, __func__, __LINE__)
10082     +#define show_inode_counts(i) __show_inode_counts((i), \
10083     + __FILE__, __func__, __LINE__)
10084     +
10085     +extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
10086     + const char *fxn, int line);
10087     +extern void __unionfs_check_dentry(const struct dentry *dentry,
10088     + const char *fname, const char *fxn,
10089     + int line);
10090     +extern void __unionfs_check_file(const struct file *file,
10091     + const char *fname, const char *fxn, int line);
10092     +extern void __unionfs_check_nd(const struct nameidata *nd,
10093     + const char *fname, const char *fxn, int line);
10094     +extern void __show_branch_counts(const struct super_block *sb,
10095     + const char *file, const char *fxn, int line);
10096     +extern void __show_inode_times(const struct inode *inode,
10097     + const char *file, const char *fxn, int line);
10098     +extern void __show_dinode_times(const struct dentry *dentry,
10099     + const char *file, const char *fxn, int line);
10100     +extern void __show_inode_counts(const struct inode *inode,
10101     + const char *file, const char *fxn, int line);
10102     +
10103     +#else /* not CONFIG_UNION_FS_DEBUG */
10104     +
10105     +/* we leave useful hooks for these check functions throughout the code */
10106     +#define unionfs_check_inode(i) do { } while (0)
10107     +#define unionfs_check_dentry(d) do { } while (0)
10108     +#define unionfs_check_file(f) do { } while (0)
10109     +#define unionfs_check_nd(n) do { } while (0)
10110     +#define show_branch_counts(sb) do { } while (0)
10111     +#define show_inode_times(i) do { } while (0)
10112     +#define show_dinode_times(d) do { } while (0)
10113     +#define show_inode_counts(i) do { } while (0)
10114     +
10115     +#endif /* not CONFIG_UNION_FS_DEBUG */
10116     +
10117     +#endif /* not _UNION_H_ */
10118     diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
10119     new file mode 100644
10120     index 0000000..542c513
10121     --- /dev/null
10122     +++ b/fs/unionfs/unlink.c
10123     @@ -0,0 +1,278 @@
10124     +/*
10125     + * Copyright (c) 2003-2010 Erez Zadok
10126     + * Copyright (c) 2003-2006 Charles P. Wright
10127     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10128     + * Copyright (c) 2005-2006 Junjiro Okajima
10129     + * Copyright (c) 2005 Arun M. Krishnakumar
10130     + * Copyright (c) 2004-2006 David P. Quigley
10131     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10132     + * Copyright (c) 2003 Puja Gupta
10133     + * Copyright (c) 2003 Harikesavan Krishnan
10134     + * Copyright (c) 2003-2010 Stony Brook University
10135     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
10136     + *
10137     + * This program is free software; you can redistribute it and/or modify
10138     + * it under the terms of the GNU General Public License version 2 as
10139     + * published by the Free Software Foundation.
10140     + */
10141     +
10142     +#include "union.h"
10143     +
10144     +/*
10145     + * Helper function for Unionfs's unlink operation.
10146     + *
10147     + * The main goal of this function is to optimize the unlinking of non-dir
10148     + * objects in unionfs by deleting all possible lower inode objects from the
10149     + * underlying branches having the same dentry name as the non-dir dentry on
10150     + * which this unlink operation is called. This way we delete as many lower
10151     + * inodes as possible, and save space. Whiteouts need to be created in
10152     + * branch0 only if unlinking fails on any lower branch other than
10153     + * branch0, or if a lower branch is marked read-only.
10154     + *
10155     + * Also, while unlinking a file, if we encounter any dir type entry in any
10156     + * intermediate branch, then we remove the directory by calling vfs_rmdir.
10157     + * The following special cases are also handled:
10158     + *
10159     + * (1) If an error occurs in branch0 during vfs_unlink, then we return
10160     + * the appropriate error.
10161     + *
10162     + * (2) If we get an error during unlink in any lower branch other
10163     + * than branch0, then we create a whiteout in branch0.
10164     + *
10165     + * (3) If a whiteout already exists in any intermediate branch, we delete
10166     + * all possible inodes only up to that branch (this is an "opaqueness"
10167     + * as per Documentation/filesystems/unionfs/concepts.txt).
10168     + *
10169     + */
10170     +static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry,
10171     + struct dentry *parent)
10172     +{
10173     + struct dentry *lower_dentry;
10174     + struct dentry *lower_dir_dentry;
10175     + int bindex;
10176     + int err = 0;
10177     +
10178     + err = unionfs_partial_lookup(dentry, parent);
10179     + if (err)
10180     + goto out;
10181     +
10182     + /* trying to unlink all possible valid instances */
10183     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
10184     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10185     + if (!lower_dentry || !lower_dentry->d_inode)
10186     + continue;
10187     +
10188     + lower_dir_dentry = lock_parent(lower_dentry);
10189     +
10190     + /* avoid destroying the lower inode if the object is in use */
10191     + dget(lower_dentry);
10192     + err = is_robranch_super(dentry->d_sb, bindex);
10193     + if (!err) {
10194     + /* see Documentation/filesystems/unionfs/issues.txt */
10195     + lockdep_off();
10196     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
10197     + err = vfs_unlink(lower_dir_dentry->d_inode,
10198     + lower_dentry);
10199     + else
10200     + err = vfs_rmdir(lower_dir_dentry->d_inode,
10201     + lower_dentry);
10202     + lockdep_on();
10203     + }
10204     +
10205     + /* if lower object deletion succeeds, update inode's times */
10206     + if (!err)
10207     + unionfs_copy_attr_times(dentry->d_inode);
10208     + dput(lower_dentry);
10209     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10210     + unlock_dir(lower_dir_dentry);
10211     +
10212     + if (err)
10213     + break;
10214     + }
10215     +
10216     + /*
10217     + * Create the whiteout in branch 0 (highest priority) only if (a)
10218     + * there was an error in any intermediate branch other than branch 0
10219     + * due to failure of vfs_unlink/vfs_rmdir or (b) a branch is marked or
10220     + * mounted read-only.
10221     + */
10222     + if (err) {
10223     + if ((bindex == 0) ||
10224     + ((bindex == dbstart(dentry)) &&
10225     + (!IS_COPYUP_ERR(err))))
10226     + goto out;
10227     + else {
10228     + if (!IS_COPYUP_ERR(err))
10229     + pr_debug("unionfs: lower object deletion "
10230     + "failed in branch:%d\n", bindex);
10231     + err = create_whiteout(dentry, sbstart(dentry->d_sb));
10232     + }
10233     + }
10234     +
10235     +out:
10236     + if (!err)
10237     + inode_dec_link_count(dentry->d_inode);
10238     +
10239     + /* We don't want to leave negative leftover dentries for revalidate. */
10240     + if (!err && (dbopaque(dentry) != -1))
10241     + update_bstart(dentry);
10242     +
10243     + return err;
10244     +}
10245     +
10246     +int unionfs_unlink(struct inode *dir, struct dentry *dentry)
10247     +{
10248     + int err = 0;
10249     + struct inode *inode = dentry->d_inode;
10250     + struct dentry *parent;
10251     + int valid;
10252     +
10253     + BUG_ON(S_ISDIR(inode->i_mode));
10254     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10255     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10256     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10257     +
10258     + valid = __unionfs_d_revalidate(dentry, parent, false);
10259     + if (unlikely(!valid)) {
10260     + err = -ESTALE;
10261     + goto out;
10262     + }
10263     + unionfs_check_dentry(dentry);
10264     +
10265     + err = unionfs_unlink_whiteout(dir, dentry, parent);
10266     + /* call d_drop so the system "forgets" about us */
10267     + if (!err) {
10268     + unionfs_postcopyup_release(dentry);
10269     + unionfs_postcopyup_setmnt(parent);
10270     + if (inode->i_nlink == 0) /* drop lower inodes */
10271     + iput_lowers_all(inode, false);
10272     + d_drop(dentry);
10273     + /*
10274     + * if unlink/whiteout succeeded, parent dir mtime has
10275     + * changed
10276     + */
10277     + unionfs_copy_attr_times(dir);
10278     + }
10279     +
10280     +out:
10281     + if (!err) {
10282     + unionfs_check_dentry(dentry);
10283     + unionfs_check_inode(dir);
10284     + }
10285     + unionfs_unlock_dentry(dentry);
10286     + unionfs_unlock_parent(dentry, parent);
10287     + unionfs_read_unlock(dentry->d_sb);
10288     + return err;
10289     +}
10290     +
10291     +static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
10292     + struct unionfs_dir_state *namelist)
10293     +{
10294     + int err;
10295     + struct dentry *lower_dentry;
10296     + struct dentry *lower_dir_dentry = NULL;
10297     +
10298     + /* Here we need to remove whiteout entries. */
10299     + err = delete_whiteouts(dentry, dbstart(dentry), namelist);
10300     + if (err)
10301     + goto out;
10302     +
10303     + lower_dentry = unionfs_lower_dentry(dentry);
10304     +
10305     + lower_dir_dentry = lock_parent(lower_dentry);
10306     +
10307     + /* avoid destroying the lower inode if the file is in use */
10308     + dget(lower_dentry);
10309     + err = is_robranch(dentry);
10310     + if (!err)
10311     + err = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
10312     + dput(lower_dentry);
10313     +
10314     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10315     + /* propagate number of hard-links */
10316     + dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
10317     +
10318     +out:
10319     + if (lower_dir_dentry)
10320     + unlock_dir(lower_dir_dentry);
10321     + return err;
10322     +}
10323     +
10324     +int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
10325     +{
10326     + int err = 0;
10327     + struct unionfs_dir_state *namelist = NULL;
10328     + struct dentry *parent;
10329     + int dstart, dend;
10330     + bool valid;
10331     +
10332     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10333     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10334     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10335     +
10336     + valid = __unionfs_d_revalidate(dentry, parent, false);
10337     + if (unlikely(!valid)) {
10338     + err = -ESTALE;
10339     + goto out;
10340     + }
10341     + unionfs_check_dentry(dentry);
10342     +
10343     + /* check if this unionfs directory is empty or not */
10344     + err = check_empty(dentry, parent, &namelist);
10345     + if (err)
10346     + goto out;
10347     +
10348     + err = unionfs_rmdir_first(dir, dentry, namelist);
10349     + dstart = dbstart(dentry);
10350     + dend = dbend(dentry);
10351     + /*
10352     + * We create a whiteout for the directory if rmdir of the first
10353     + * directory entry in the union failed. Otherwise, we create a
10354     + * whiteout only if a lower priority branch might also have a
10355     + * same-named directory (i.e., dstart < dend). IOW,
10356     + * if there is not another same-named directory at a lower priority
10357     + * branch, then we don't need to create a whiteout for it.
10358     + */
10359     + if (!err) {
10360     + if (dstart < dend)
10361     + err = create_whiteout(dentry, dstart);
10362     + } else {
10363     + int new_err;
10364     +
10365     + if (dstart == 0)
10366     + goto out;
10367     +
10368     + /* exit if the error returned was NOT -EROFS */
10369     + if (!IS_COPYUP_ERR(err))
10370     + goto out;
10371     +
10372     + new_err = create_whiteout(dentry, dstart - 1);
10373     + if (new_err != -EEXIST)
10374     + err = new_err;
10375     + }
10376     +
10377     +out:
10378     + /*
10379     + * Drop references to lower dentry/inode so storage space for them
10380     + * can be reclaimed. Then, call d_drop so the system "forgets"
10381     + * about us.
10382     + */
10383     + if (!err) {
10384     + iput_lowers_all(dentry->d_inode, false);
10385     + dput(unionfs_lower_dentry_idx(dentry, dstart));
10386     + unionfs_set_lower_dentry_idx(dentry, dstart, NULL);
10387     + d_drop(dentry);
10388     + /* update our lower vfsmnts, in case a copyup took place */
10389     + unionfs_postcopyup_setmnt(dentry);
10390     + unionfs_check_dentry(dentry);
10391     + unionfs_check_inode(dir);
10392     + }
10393     +
10394     + if (namelist)
10395     + free_rdstate(namelist);
10396     +
10397     + unionfs_unlock_dentry(dentry);
10398     + unionfs_unlock_parent(dentry, parent);
10399     + unionfs_read_unlock(dentry->d_sb);
10400     + return err;
10401     +}
10402     diff --git a/fs/unionfs/whiteout.c b/fs/unionfs/whiteout.c
10403     new file mode 100644
10404     index 0000000..405073a
10405     --- /dev/null
10406     +++ b/fs/unionfs/whiteout.c
10407     @@ -0,0 +1,584 @@
10408     +/*
10409     + * Copyright (c) 2003-2010 Erez Zadok
10410     + * Copyright (c) 2003-2006 Charles P. Wright
10411     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10412     + * Copyright (c) 2005-2006 Junjiro Okajima
10413     + * Copyright (c) 2005 Arun M. Krishnakumar
10414     + * Copyright (c) 2004-2006 David P. Quigley
10415     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10416     + * Copyright (c) 2003 Puja Gupta
10417     + * Copyright (c) 2003 Harikesavan Krishnan
10418     + * Copyright (c) 2003-2010 Stony Brook University
10419     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
10420     + *
10421     + * This program is free software; you can redistribute it and/or modify
10422     + * it under the terms of the GNU General Public License version 2 as
10423     + * published by the Free Software Foundation.
10424     + */
10425     +
10426     +#include "union.h"
10427     +
10428     +/*
10429     + * whiteout and opaque directory helpers
10430     + */
10431     +
10432     +/* The name prefix we use for whiteouts. */
10433     +#define UNIONFS_WHPFX ".wh."
10434     +#define UNIONFS_WHLEN 4
10435     +/*
10436     + * If a directory contains this file, then it is opaque. We start with the
10437     + * .wh. flag so that it is blocked by lookup.
10438     + */
10439     +#define UNIONFS_DIR_OPAQUE_NAME "__dir_opaque"
10440     +#define UNIONFS_DIR_OPAQUE UNIONFS_WHPFX UNIONFS_DIR_OPAQUE_NAME
10441     +
10442     +/* construct whiteout filename */
10443     +char *alloc_whname(const char *name, int len)
10444     +{
10445     + char *buf;
10446     +
10447     + buf = kmalloc(len + UNIONFS_WHLEN + 1, GFP_KERNEL);
10448     + if (unlikely(!buf))
10449     + return ERR_PTR(-ENOMEM);
10450     +
10451     + strcpy(buf, UNIONFS_WHPFX);
10452     + strlcat(buf, name, len + UNIONFS_WHLEN + 1);
10453     +
10454     + return buf;
10455     +}
10456     +
10457     +/*
10458     + * XXX: this can be inline or CPP macro, but is here to keep all whiteout
10459     + * code in one place.
10460     + */
10461     +void unionfs_set_max_namelen(long *namelen)
10462     +{
10463     + *namelen -= UNIONFS_WHLEN;
10464     +}
10465     +
10466     +/* check if @namep is a whiteout, update @namep and @namelenp accordingly */
10467     +bool is_whiteout_name(char **namep, int *namelenp)
10468     +{
10469     + if (*namelenp > UNIONFS_WHLEN &&
10470     + !strncmp(*namep, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
10471     + *namep += UNIONFS_WHLEN;
10472     + *namelenp -= UNIONFS_WHLEN;
10473     + return true;
10474     + }
10475     + return false;
10476     +}
10477     +
10478     +/* is the filename valid == !(whiteout for a file or opaque dir marker) */
10479     +bool is_validname(const char *name)
10480     +{
10481     + if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
10482     + return false;
10483     + if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
10484     + sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
10485     + return false;
10486     + return true;
10487     +}
10488     +
10489     +/*
10490     + * Look for a whiteout @name in @lower_parent directory. If error, return
10491     + * ERR_PTR. Caller must dput() the returned dentry if not an error.
10492     + *
10493     + * XXX: some callers can reuse the whname allocated buffer to avoid repeated
10494     + * free then re-malloc calls. Need to provide a different API for those
10495     + * callers.
10496     + */
10497     +struct dentry *lookup_whiteout(const char *name, struct dentry *lower_parent)
10498     +{
10499     + char *whname = NULL;
10500     + int err = 0, namelen;
10501     + struct dentry *wh_dentry = NULL;
10502     +
10503     + namelen = strlen(name);
10504     + whname = alloc_whname(name, namelen);
10505     + if (unlikely(IS_ERR(whname))) {
10506     + err = PTR_ERR(whname);
10507     + goto out;
10508     + }
10509     +
10510     + /* check if whiteout exists in this branch: lookup .wh.foo */
10511     + wh_dentry = lookup_lck_len(whname, lower_parent, strlen(whname));
10512     + if (IS_ERR(wh_dentry)) {
10513     + err = PTR_ERR(wh_dentry);
10514     + goto out;
10515     + }
10516     +
10517     + /* check if negative dentry (ENOENT) */
10518     + if (!wh_dentry->d_inode)
10519     + goto out;
10520     +
10521     + /* whiteout found: check if valid type */
10522     + if (!S_ISREG(wh_dentry->d_inode->i_mode)) {
10523     + printk(KERN_ERR "unionfs: invalid whiteout %s entry type %d\n",
10524     + whname, wh_dentry->d_inode->i_mode);
10525     + dput(wh_dentry);
10526     + err = -EIO;
10527     + goto out;
10528     + }
10529     +
10530     +out:
10531     + kfree(whname);
10532     + if (err)
10533     + wh_dentry = ERR_PTR(err);
10534     + return wh_dentry;
10535     +}
10536     +
10537     +/* find and return first whiteout in parent directory, else ENOENT */
10538     +struct dentry *find_first_whiteout(struct dentry *dentry)
10539     +{
10540     + int bindex, bstart, bend;
10541     + struct dentry *parent, *lower_parent, *wh_dentry;
10542     +
10543     + parent = dget_parent(dentry);
10544     +
10545     + bstart = dbstart(parent);
10546     + bend = dbend(parent);
10547     + wh_dentry = ERR_PTR(-ENOENT);
10548     +
10549     + for (bindex = bstart; bindex <= bend; bindex++) {
10550     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
10551     + if (!lower_parent)
10552     + continue;
10553     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_parent);
10554     + if (IS_ERR(wh_dentry))
10555     + continue;
10556     + if (wh_dentry->d_inode)
10557     + break;
10558     + dput(wh_dentry);
10559     + wh_dentry = ERR_PTR(-ENOENT);
10560     + }
10561     +
10562     + dput(parent);
10563     +
10564     + return wh_dentry;
10565     +}
10566     +
10567     +/*
10568     + * Unlink a whiteout dentry. Returns 0 or -errno. Caller must hold and
10569     + * release dentry reference.
10570     + */
10571     +int unlink_whiteout(struct dentry *wh_dentry)
10572     +{
10573     + int err;
10574     + struct dentry *lower_dir_dentry;
10575     +
10576     + /* dget and lock parent dentry */
10577     + lower_dir_dentry = lock_parent_wh(wh_dentry);
10578     +
10579     + /* see Documentation/filesystems/unionfs/issues.txt */
10580     + lockdep_off();
10581     + err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
10582     + lockdep_on();
10583     + unlock_dir(lower_dir_dentry);
10584     +
10585     + /*
10586     + * Whiteouts are special files and should be deleted no matter what
10587     + * (as if they never existed), in order to allow this create
10588     + * operation to succeed. This is especially important in sticky
10589     + * directories: a whiteout may have been created by one user, but
10590     + * the newly created file may be created by another user.
10591     + * Therefore, in order to maintain Unix semantics, if the vfs_unlink
10592     + * above failed, then we have to try to directly unlink the
10593     + * whiteout. Note: in the ODF version of unionfs, whiteouts are
10594     + * handled much more cleanly.
10595     + */
10596     + if (err == -EPERM) {
10597     + struct inode *inode = lower_dir_dentry->d_inode;
10598     + err = inode->i_op->unlink(inode, wh_dentry);
10599     + }
10600     + if (err)
10601     + printk(KERN_ERR "unionfs: could not unlink whiteout %s, "
10602     + "err = %d\n", wh_dentry->d_name.name, err);
10603     +
10604     + return err;
10605     +
10606     +}
10607     +
10608     +/*
10609     + * Helper function when creating new objects (create, symlink, mknod, etc.).
10610     + * Checks to see if there's a whiteout in @lower_dentry's parent directory,
10611     + * whose name is taken from @dentry. Then tries to remove that whiteout, if
10612     + * found. If <dentry,bindex> is a branch marked readonly, return -EROFS.
10613     + * If it finds both a regular file and a whiteout, return -EIO (this should
10614     + * never happen).
10615     + *
10616     + * Return 0 if no whiteout was found. Return 1 if one was found and
10617     + * successfully removed. Therefore a value >= 0 tells the caller that
10618     + * @lower_dentry belongs to a good branch to create the new object in.
10619     + * Return -ERRNO if an error occurred during whiteout lookup or in trying to
10620     + * unlink the whiteout.
10621     + */
10622     +int check_unlink_whiteout(struct dentry *dentry, struct dentry *lower_dentry,
10623     + int bindex)
10624     +{
10625     + int err;
10626     + struct dentry *wh_dentry = NULL;
10627     + struct dentry *lower_dir_dentry = NULL;
10628     +
10629     + /* look for whiteout dentry first */
10630     + lower_dir_dentry = dget_parent(lower_dentry);
10631     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_dir_dentry);
10632     + dput(lower_dir_dentry);
10633     + if (IS_ERR(wh_dentry)) {
10634     + err = PTR_ERR(wh_dentry);
10635     + goto out;
10636     + }
10637     +
10638     + if (!wh_dentry->d_inode) { /* no whiteout exists */
10639     + err = 0;
10640     + goto out_dput;
10641     + }
10642     +
10643     + /* check if regular file and whiteout were both found */
10644     + if (unlikely(lower_dentry->d_inode)) {
10645     + err = -EIO;
10646     + printk(KERN_ERR "unionfs: found both whiteout and regular "
10647     + "file in directory %s (branch %d)\n",
10648     + lower_dir_dentry->d_name.name, bindex);
10649     + goto out_dput;
10650     + }
10651     +
10652     + /* check if branch is writeable */
10653     + err = is_robranch_super(dentry->d_sb, bindex);
10654     + if (err)
10655     + goto out_dput;
10656     +
10657     + /* .wh.foo has been found, so let's unlink it */
10658     + err = unlink_whiteout(wh_dentry);
10659     + if (!err)
10660     + err = 1; /* a whiteout was found and successfully removed */
10661     +out_dput:
10662     + dput(wh_dentry);
10663     +out:
10664     + return err;
10665     +}
10666     +
10667     +/*
10668     + * Pass a unionfs dentry and a branch index. It will try to create a
10669     + * whiteout for the filename in dentry, starting in branch @start. On
10670     + * error, it will proceed to a branch to the left (higher priority).
10671     + */
10672     +int create_whiteout(struct dentry *dentry, int start)
10673     +{
10674     + int bstart, bend, bindex;
10675     + struct dentry *lower_dir_dentry;
10676     + struct dentry *lower_dentry;
10677     + struct dentry *lower_wh_dentry;
10678     + struct nameidata nd;
10679     + char *name = NULL;
10680     + int err = -EINVAL;
10681     +
10682     + verify_locked(dentry);
10683     +
10684     + bstart = dbstart(dentry);
10685     + bend = dbend(dentry);
10686     +
10687     + /* create dentry's whiteout equivalent */
10688     + name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
10689     + if (unlikely(IS_ERR(name))) {
10690     + err = PTR_ERR(name);
10691     + goto out;
10692     + }
10693     +
10694     + for (bindex = start; bindex >= 0; bindex--) {
10695     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10696     +
10697     + if (!lower_dentry) {
10698     + /*
10699     + * if lower dentry is not present, create the
10700     + * entire lower dentry directory structure and go
10701     + * ahead. Since we just want to create a whiteout, we
10702     + * only want the parent dentry, and hence get rid of
10703     + * this dentry.
10704     + */
10705     + lower_dentry = create_parents(dentry->d_inode,
10706     + dentry,
10707     + dentry->d_name.name,
10708     + bindex);
10709     + if (!lower_dentry || IS_ERR(lower_dentry)) {
10710     + int ret = PTR_ERR(lower_dentry);
10711     + if (!IS_COPYUP_ERR(ret))
10712     + printk(KERN_ERR
10713     + "unionfs: create_parents for "
10714     + "whiteout failed: bindex=%d "
10715     + "err=%d\n", bindex, ret);
10716     + continue;
10717     + }
10718     + }
10719     +
10720     + lower_wh_dentry =
10721     + lookup_lck_len(name, lower_dentry->d_parent,
10722     + dentry->d_name.len + UNIONFS_WHLEN);
10723     + if (IS_ERR(lower_wh_dentry))
10724     + continue;
10725     +
10726     + /*
10727     + * The whiteout already exists. This used to be impossible,
10728     + * but now is possible because of opaqueness.
10729     + */
10730     + if (lower_wh_dentry->d_inode) {
10731     + dput(lower_wh_dentry);
10732     + err = 0;
10733     + goto out;
10734     + }
10735     +
10736     + err = init_lower_nd(&nd, LOOKUP_CREATE);
10737     + if (unlikely(err < 0))
10738     + goto out;
10739     + lower_dir_dentry = lock_parent_wh(lower_wh_dentry);
10740     + err = is_robranch_super(dentry->d_sb, bindex);
10741     + if (!err)
10742     + err = vfs_create(lower_dir_dentry->d_inode,
10743     + lower_wh_dentry,
10744     + current_umask() & S_IRUGO,
10745     + &nd);
10746     + unlock_dir(lower_dir_dentry);
10747     + dput(lower_wh_dentry);
10748     + release_lower_nd(&nd, err);
10749     +
10750     + if (!err || !IS_COPYUP_ERR(err))
10751     + break;
10752     + }
10753     +
10754     + /* set dbopaque so that lookup will not proceed after this branch */
10755     + if (!err)
10756     + dbopaque(dentry) = bindex;
10757     +
10758     +out:
10759     + kfree(name);
10760     + return err;
10761     +}
10762     +
10763     +/*
10764     + * Delete all of the whiteouts in a given directory for rmdir.
10765     + *
10766     + * The lower directory inode should be locked.
10767     + */
10768     +static int do_delete_whiteouts(struct dentry *dentry, int bindex,
10769     + struct unionfs_dir_state *namelist)
10770     +{
10771     + int err = 0;
10772     + struct dentry *lower_dir_dentry = NULL;
10773     + struct dentry *lower_dentry;
10774     + char *name = NULL, *p;
10775     + struct inode *lower_dir;
10776     + int i;
10777     + struct list_head *pos;
10778     + struct filldir_node *cursor;
10779     +
10780     + /* Find out lower parent dentry */
10781     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10782     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10783     + lower_dir = lower_dir_dentry->d_inode;
10784     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10785     +
10786     + err = -ENOMEM;
10787     + name = __getname();
10788     + if (unlikely(!name))
10789     + goto out;
10790     + strcpy(name, UNIONFS_WHPFX);
10791     + p = name + UNIONFS_WHLEN;
10792     +
10793     + err = 0;
10794     + for (i = 0; !err && i < namelist->size; i++) {
10795     + list_for_each(pos, &namelist->list[i]) {
10796     + cursor =
10797     + list_entry(pos, struct filldir_node,
10798     + file_list);
10799     + /* Only operate on whiteouts in this branch. */
10800     + if (cursor->bindex != bindex)
10801     + continue;
10802     + if (!cursor->whiteout)
10803     + continue;
10804     +
10805     + strlcpy(p, cursor->name, PATH_MAX - UNIONFS_WHLEN);
10806     + lower_dentry =
10807     + lookup_lck_len(name, lower_dir_dentry,
10808     + cursor->namelen +
10809     + UNIONFS_WHLEN);
10810     + if (IS_ERR(lower_dentry)) {
10811     + err = PTR_ERR(lower_dentry);
10812     + break;
10813     + }
10814     + if (lower_dentry->d_inode)
10815     + err = vfs_unlink(lower_dir, lower_dentry);
10816     + dput(lower_dentry);
10817     + if (err)
10818     + break;
10819     + }
10820     + }
10821     +
10822     + __putname(name);
10823     +
10824     + /* After all of the removals, we should copy the attributes once. */
10825     + fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
10826     +
10827     +out:
10828     + return err;
10829     +}
10830     +
10831     +
10832     +void __delete_whiteouts(struct work_struct *work)
10833     +{
10834     + struct sioq_args *args = container_of(work, struct sioq_args, work);
10835     + struct deletewh_args *d = &args->deletewh;
10836     +
10837     + args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
10838     + complete(&args->comp);
10839     +}
10840     +
10841     +/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
10842     +int delete_whiteouts(struct dentry *dentry, int bindex,
10843     + struct unionfs_dir_state *namelist)
10844     +{
10845     + int err;
10846     + struct super_block *sb;
10847     + struct dentry *lower_dir_dentry;
10848     + struct inode *lower_dir;
10849     + struct sioq_args args;
10850     +
10851     + sb = dentry->d_sb;
10852     +
10853     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
10854     + BUG_ON(bindex < dbstart(dentry));
10855     + BUG_ON(bindex > dbend(dentry));
10856     + err = is_robranch_super(sb, bindex);
10857     + if (err)
10858     + goto out;
10859     +
10860     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10861     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10862     + lower_dir = lower_dir_dentry->d_inode;
10863     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10864     +
10865     + if (!inode_permission(lower_dir, MAY_WRITE | MAY_EXEC)) {
10866     + err = do_delete_whiteouts(dentry, bindex, namelist);
10867     + } else {
10868     + args.deletewh.namelist = namelist;
10869     + args.deletewh.dentry = dentry;
10870     + args.deletewh.bindex = bindex;
10871     + run_sioq(__delete_whiteouts, &args);
10872     + err = args.err;
10873     + }
10874     +
10875     +out:
10876     + return err;
10877     +}
10878     +
10879     +/****************************************************************************
10880     + * Opaque directory helpers *
10881     + ****************************************************************************/
10882     +
10883     +/*
10884     + * is_opaque_dir: returns 0 if it is NOT an opaque dir, 1 if it is, and
10885     + * -errno if an error occurred trying to figure this out.
10886     + */
10887     +int is_opaque_dir(struct dentry *dentry, int bindex)
10888     +{
10889     + int err = 0;
10890     + struct dentry *lower_dentry;
10891     + struct dentry *wh_lower_dentry;
10892     + struct inode *lower_inode;
10893     + struct sioq_args args;
10894     +
10895     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10896     + lower_inode = lower_dentry->d_inode;
10897     +
10898     + BUG_ON(!S_ISDIR(lower_inode->i_mode));
10899     +
10900     + mutex_lock(&lower_inode->i_mutex);
10901     +
10902     + if (!inode_permission(lower_inode, MAY_EXEC)) {
10903     + wh_lower_dentry =
10904     + lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10905     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10906     + } else {
10907     + args.is_opaque.dentry = lower_dentry;
10908     + run_sioq(__is_opaque_dir, &args);
10909     + wh_lower_dentry = args.ret;
10910     + }
10911     +
10912     + mutex_unlock(&lower_inode->i_mutex);
10913     +
10914     + if (IS_ERR(wh_lower_dentry)) {
10915     + err = PTR_ERR(wh_lower_dentry);
10916     + goto out;
10917     + }
10918     +
10919     + /* This is an opaque dir iff wh_lower_dentry is positive */
10920     + err = !!wh_lower_dentry->d_inode;
10921     +
10922     + dput(wh_lower_dentry);
10923     +out:
10924     + return err;
10925     +}
10926     +
10927     +void __is_opaque_dir(struct work_struct *work)
10928     +{
10929     + struct sioq_args *args = container_of(work, struct sioq_args, work);
10930     +
10931     + args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->is_opaque.dentry,
10932     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10933     + complete(&args->comp);
10934     +}
10935     +
10936     +int make_dir_opaque(struct dentry *dentry, int bindex)
10937     +{
10938     + int err = 0;
10939     + struct dentry *lower_dentry, *diropq;
10940     + struct inode *lower_dir;
10941     + struct nameidata nd;
10942     + const struct cred *old_creds;
10943     + struct cred *new_creds;
10944     +
10945     + /*
10946     + * Opaque directory whiteout markers are special files (like regular
10947     + * whiteouts), and should appear to the users as if they don't
10948     + * exist. They should be created/deleted regardless of directory
10949     + * search/create permissions, but only for the duration of this
10950     + * creation of the .wh.__dir_opaque file. Note, this does not
10951     + * circumvent normal ->permission.
10952     + */
10953     + new_creds = prepare_creds();
10954     + if (unlikely(!new_creds)) {
10955     + err = -ENOMEM;
10956     + goto out_err;
10957     + }
10958     + cap_raise(new_creds->cap_effective, CAP_DAC_READ_SEARCH);
10959     + cap_raise(new_creds->cap_effective, CAP_DAC_OVERRIDE);
10960     + old_creds = override_creds(new_creds);
10961     +
10962     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10963     + lower_dir = lower_dentry->d_inode;
10964     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode) ||
10965     + !S_ISDIR(lower_dir->i_mode));
10966     +
10967     + mutex_lock(&lower_dir->i_mutex);
10968     + diropq = lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10969     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10970     + if (IS_ERR(diropq)) {
10971     + err = PTR_ERR(diropq);
10972     + goto out;
10973     + }
10974     +
10975     + err = init_lower_nd(&nd, LOOKUP_CREATE);
10976     + if (unlikely(err < 0))
10977     + goto out;
10978     + if (!diropq->d_inode)
10979     + err = vfs_create(lower_dir, diropq, S_IRUGO, &nd);
10980     + if (!err)
10981     + dbopaque(dentry) = bindex;
10982     + release_lower_nd(&nd, err);
10983     +
10984     + dput(diropq);
10985     +
10986     +out:
10987     + mutex_unlock(&lower_dir->i_mutex);
10988     + revert_creds(old_creds);
10989     +out_err:
10990     + return err;
10991     +}
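The return convention documented for is_opaque_dir() above (0 for "not opaque", 1 for "opaque", -errno on failure) can be illustrated outside the kernel. The following is a minimal userspace sketch, not part of the patch: dir_is_opaque() and DIR_OPAQUE_MARKER are hypothetical names, the marker file name is taken from the comment in make_dir_opaque(), and lstat() stands in for the kernel's lookup_one_len().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Marker name as given in the make_dir_opaque() comment (assumption). */
#define DIR_OPAQUE_MARKER ".wh.__dir_opaque"

/*
 * Hypothetical userspace helper, not part of unionfs.
 * Returns 1 if dirpath contains the opaque marker, 0 if it does not
 * (a missing dirpath is also reported as 0), and -errno on any other
 * failure.
 */
int dir_is_opaque(const char *dirpath)
{
	char path[4096];
	struct stat st;
	int n;

	assert(dirpath != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", dirpath, DIR_OPAQUE_MARKER);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	if (lstat(path, &st) == 0)
		return 1;	/* marker exists: directory is opaque */
	if (errno == ENOENT)
		return 0;	/* no marker: not opaque */
	return -errno;		/* e.g. -EACCES or -ENOTDIR */
}
```

A caller has to test for a negative value before treating the result as a boolean, which is the same check the kernel callers of is_opaque_dir() need.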
10992     diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
10993     new file mode 100644
10994     index 0000000..9002e06
10995     --- /dev/null
10996     +++ b/fs/unionfs/xattr.c
10997     @@ -0,0 +1,173 @@
10998     +/*
10999     + * Copyright (c) 2003-2010 Erez Zadok
11000     + * Copyright (c) 2003-2006 Charles P. Wright
11001     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11002     + * Copyright (c) 2005-2006 Junjiro Okajima
11003     + * Copyright (c) 2005 Arun M. Krishnakumar
11004     + * Copyright (c) 2004-2006 David P. Quigley
11005     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
11006     + * Copyright (c) 2003 Puja Gupta
11007     + * Copyright (c) 2003 Harikesavan Krishnan
11008     + * Copyright (c) 2003-2010 Stony Brook University
11009     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
11010     + *
11011     + * This program is free software; you can redistribute it and/or modify
11012     + * it under the terms of the GNU General Public License version 2 as
11013     + * published by the Free Software Foundation.
11014     + */
11015     +
11016     +#include "union.h"
11017     +
11018     +/* This is lifted from fs/xattr.c */
11019     +void *unionfs_xattr_alloc(size_t size, size_t limit)
11020     +{
11021     + void *ptr;
11022     +
11023     + if (size > limit)
11024     + return ERR_PTR(-E2BIG);
11025     +
11026     + if (!size) /* size request, no buffer is needed */
11027     + return NULL;
11028     +
11029     + ptr = kmalloc(size, GFP_KERNEL);
11030     + if (unlikely(!ptr))
11031     + return ERR_PTR(-ENOMEM);
11032     + return ptr;
11033     +}
11034     +
11035     +/*
11036     + * BKL held by caller.
11037     + * dentry->d_inode->i_mutex locked
11038     + */
11039     +ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
11040     + size_t size)
11041     +{
11042     + struct dentry *lower_dentry = NULL;
11043     + struct dentry *parent;
11044     + int err = -EOPNOTSUPP;
11045     + bool valid;
11046     +
11047     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11048     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11049     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11050     +
11051     + valid = __unionfs_d_revalidate(dentry, parent, false);
11052     + if (unlikely(!valid)) {
11053     + err = -ESTALE;
11054     + goto out;
11055     + }
11056     +
11057     + lower_dentry = unionfs_lower_dentry(dentry);
11058     +
11059     + err = vfs_getxattr(lower_dentry, (char *) name, value, size);
11060     +
11061     +out:
11062     + unionfs_check_dentry(dentry);
11063     + unionfs_unlock_dentry(dentry);
11064     + unionfs_unlock_parent(dentry, parent);
11065     + unionfs_read_unlock(dentry->d_sb);
11066     + return err;
11067     +}
11068     +
11069     +/*
11070     + * BKL held by caller.
11071     + * dentry->d_inode->i_mutex locked
11072     + */
11073     +int unionfs_setxattr(struct dentry *dentry, const char *name,
11074     + const void *value, size_t size, int flags)
11075     +{
11076     + struct dentry *lower_dentry = NULL;
11077     + struct dentry *parent;
11078     + int err = -EOPNOTSUPP;
11079     + bool valid;
11080     +
11081     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11082     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11083     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11084     +
11085     + valid = __unionfs_d_revalidate(dentry, parent, false);
11086     + if (unlikely(!valid)) {
11087     + err = -ESTALE;
11088     + goto out;
11089     + }
11090     +
11091     + lower_dentry = unionfs_lower_dentry(dentry);
11092     +
11093     + err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
11094     + size, flags);
11095     +
11096     +out:
11097     + unionfs_check_dentry(dentry);
11098     + unionfs_unlock_dentry(dentry);
11099     + unionfs_unlock_parent(dentry, parent);
11100     + unionfs_read_unlock(dentry->d_sb);
11101     + return err;
11102     +}
11103     +
11104     +/*
11105     + * BKL held by caller.
11106     + * dentry->d_inode->i_mutex locked
11107     + */
11108     +int unionfs_removexattr(struct dentry *dentry, const char *name)
11109     +{
11110     + struct dentry *lower_dentry = NULL;
11111     + struct dentry *parent;
11112     + int err = -EOPNOTSUPP;
11113     + bool valid;
11114     +
11115     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11116     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11117     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11118     +
11119     + valid = __unionfs_d_revalidate(dentry, parent, false);
11120     + if (unlikely(!valid)) {
11121     + err = -ESTALE;
11122     + goto out;
11123     + }
11124     +
11125     + lower_dentry = unionfs_lower_dentry(dentry);
11126     +
11127     + err = vfs_removexattr(lower_dentry, (char *) name);
11128     +
11129     +out:
11130     + unionfs_check_dentry(dentry);
11131     + unionfs_unlock_dentry(dentry);
11132     + unionfs_unlock_parent(dentry, parent);
11133     + unionfs_read_unlock(dentry->d_sb);
11134     + return err;
11135     +}
11136     +
11137     +/*
11138     + * BKL held by caller.
11139     + * dentry->d_inode->i_mutex locked
11140     + */
11141     +ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
11142     +{
11143     + struct dentry *lower_dentry = NULL;
11144     + struct dentry *parent;
11145     + int err = -EOPNOTSUPP;
11146     + char *encoded_list = NULL;
11147     + bool valid;
11148     +
11149     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11150     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11151     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11152     +
11153     + valid = __unionfs_d_revalidate(dentry, parent, false);
11154     + if (unlikely(!valid)) {
11155     + err = -ESTALE;
11156     + goto out;
11157     + }
11158     +
11159     + lower_dentry = unionfs_lower_dentry(dentry);
11160     +
11161     + encoded_list = list;
11162     + err = vfs_listxattr(lower_dentry, encoded_list, size);
11163     +
11164     +out:
11165     + unionfs_check_dentry(dentry);
11166     + unionfs_unlock_dentry(dentry);
11167     + unionfs_unlock_parent(dentry, parent);
11168     + unionfs_read_unlock(dentry->d_sb);
11169     + return err;
11170     +}
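unionfs_xattr_alloc() above distinguishes three outcomes: a request larger than the limit, a zero-length "size query" that needs no buffer, and a normal allocation. The sketch below is a userspace analogue under stated assumptions: xattr_alloc() is a hypothetical name, malloc() replaces kmalloc(), and because ERR_PTR() is kernel-only the error is returned as a negative errno while the buffer is handed back through an out parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of unionfs_xattr_alloc().
 * Returns 0 on success and a negative errno on failure. On success,
 * *out is NULL for a size query (size == 0) and a freshly allocated
 * buffer otherwise; the caller frees it with free().
 */
int xattr_alloc(size_t size, size_t limit, void **out)
{
	assert(out != NULL);
	*out = NULL;

	if (size > limit)
		return -E2BIG;
	if (size == 0)		/* size query: no buffer is needed */
		return 0;

	*out = malloc(size);
	if (*out == NULL)
		return -ENOMEM;
	return 0;
}
```

Keeping the size-query case separate from the error cases matters because a NULL buffer is a valid result there, not a failure.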
11171     diff --git a/include/linux/fs.h b/include/linux/fs.h
11172     index ebb1cd5..b03df2d 100644
11173     --- a/include/linux/fs.h
11174     +++ b/include/linux/fs.h
11175     @@ -1739,6 +1739,7 @@ struct file_system_type {
11176    
11177     struct lock_class_key s_lock_key;
11178     struct lock_class_key s_umount_key;
11179     + struct lock_class_key s_vfs_rename_key;
11180    
11181     struct lock_class_key i_lock_key;
11182     struct lock_class_key i_mutex_key;
11183     diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
11184     index da317c7..64f1ced 100644
11185     --- a/include/linux/fs_stack.h
11186     +++ b/include/linux/fs_stack.h
11187     @@ -1,7 +1,19 @@
11188     +/*
11189     + * Copyright (c) 2006-2009 Erez Zadok
11190     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
11191     + * Copyright (c) 2006-2009 Stony Brook University
11192     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
11193     + *
11194     + * This program is free software; you can redistribute it and/or modify
11195     + * it under the terms of the GNU General Public License version 2 as
11196     + * published by the Free Software Foundation.
11197     + */
11198     +
11199     #ifndef _LINUX_FS_STACK_H
11200     #define _LINUX_FS_STACK_H
11201    
11202     -/* This file defines generic functions used primarily by stackable
11203     +/*
11204     + * This file defines generic functions used primarily by stackable
11205     * filesystems; none of these functions require i_mutex to be held.
11206     */
11207    
11208     diff --git a/include/linux/magic.h b/include/linux/magic.h
11209     index 76285e0..ff4f649 100644
11210     --- a/include/linux/magic.h
11211     +++ b/include/linux/magic.h
11212     @@ -47,6 +47,8 @@
11213     #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
11214     #define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
11215    
11216     +#define UNIONFS_SUPER_MAGIC 0xf15f083d
11217     +
11218     #define SMB_SUPER_MAGIC 0x517B
11219     #define USBDEVICE_SUPER_MAGIC 0x9fa2
11220     #define CGROUP_SUPER_MAGIC 0x27e0eb
11221     diff --git a/include/linux/namei.h b/include/linux/namei.h
11222     index 05b441d..dca6f9a 100644
11223     --- a/include/linux/namei.h
11224     +++ b/include/linux/namei.h
11225     @@ -72,6 +72,7 @@ extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
11226    
11227     extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry *dentry,
11228     int (*open)(struct inode *, struct file *));
11229     +extern void release_open_intent(struct nameidata *);
11230    
11231     extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
11232    
11233     diff --git a/include/linux/splice.h b/include/linux/splice.h
11234     index 18e7c7c..af56841 100644
11235     --- a/include/linux/splice.h
11236     +++ b/include/linux/splice.h
11237     @@ -81,5 +81,10 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
11238     struct splice_pipe_desc *);
11239     extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
11240     splice_direct_actor *);
11241     +extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
11242     + loff_t *ppos, size_t len, unsigned int flags);
11243     +extern long vfs_splice_to(struct file *in, loff_t *ppos,
11244     + struct pipe_inode_info *pipe, size_t len,
11245     + unsigned int flags);
11246    
11247     #endif
11248     diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
11249     new file mode 100644
11250     index 0000000..c84d97e
11251     --- /dev/null
11252     +++ b/include/linux/union_fs.h
11253     @@ -0,0 +1,22 @@
11254     +/*
11255     + * Copyright (c) 2003-2009 Erez Zadok
11256     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11257     + * Copyright (c) 2003-2009 Stony Brook University
11258     + * Copyright (c) 2003-2009 The Research Foundation of SUNY
11259     + *
11260     + * This program is free software; you can redistribute it and/or modify
11261     + * it under the terms of the GNU General Public License version 2 as
11262     + * published by the Free Software Foundation.
11263     + */
11264     +
11265     +#ifndef _LINUX_UNION_FS_H
11266     +#define _LINUX_UNION_FS_H
11267     +
11268     +/*
11269     + * DEFINITIONS FOR USER AND KERNEL CODE:
11270     + */
11271     +# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
11272     +# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
11273     +
11274     +#endif /* _LINUX_UNION_FS_H */
11275     +
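The two request codes defined in union_fs.h are built with _IOR(), which packs a direction, a type byte, a command number and an argument size into one integer. The following Linux-only userspace sketch unpacks those fields again; describe_ioctl() is a hypothetical helper, and the two defines are copied from the header above so that the block compiles without the patched kernel tree.

```c
#include <assert.h>
#include <stdio.h>
#include <linux/ioctl.h>

/* Copied from include/linux/union_fs.h in this patch. */
#define UNIONFS_IOCTL_INCGEN	_IOR(0x15, 11, int)
#define UNIONFS_IOCTL_QUERYFILE	_IOR(0x15, 15, int)

/* Hypothetical helper: print the fields packed into an ioctl request. */
void describe_ioctl(const char *name, unsigned long req)
{
	assert(name != NULL);
	printf("%s: type=0x%lx nr=%lu size=%lu read=%d write=%d\n",
	       name,
	       (unsigned long)_IOC_TYPE(req),
	       (unsigned long)_IOC_NR(req),
	       (unsigned long)_IOC_SIZE(req),
	       (_IOC_DIR(req) & _IOC_READ) != 0,
	       (_IOC_DIR(req) & _IOC_WRITE) != 0);
}
```

Calling describe_ioctl("INCGEN", UNIONFS_IOCTL_INCGEN) shows that both codes share type 0x15 and differ only in the command number. The read direction in _IOR() is from the caller's point of view: the kernel writes the int-sized argument and userspace reads it.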
11276     diff --git a/security/security.c b/security/security.c
11277     index 122b748..a02aece 100644
11278     --- a/security/security.c
11279     +++ b/security/security.c
11280     @@ -557,6 +557,7 @@ int security_inode_permission(struct inode *inode, int mask)
11281     return 0;
11282     return security_ops->inode_permission(inode, mask);
11283     }
11284     +EXPORT_SYMBOL(security_inode_permission);
11285    
11286     int security_inode_setattr(struct dentry *dentry, struct iattr *attr)
11287     {