Magellan Linux

Annotation of /trunk/kernel26-mcore/patches-2.6.36-r3/0153-2.6.36-unionfs-2.5.6.patch

Revision 1231
Fri Dec 10 22:38:07 2010 UTC by niro
File size: 335475 byte(s)
-2.6.36-mcore-r3:
-using linux-2.6.36.2
-enabled ipv6 support
1 niro 1231 diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
2     index 4303614..5ade4a8 100644
3     --- a/Documentation/filesystems/00-INDEX
4     +++ b/Documentation/filesystems/00-INDEX
5     @@ -112,6 +112,8 @@ udf.txt
6     - info and mount options for the UDF filesystem.
7     ufs.txt
8     - info on the ufs filesystem.
9     +unionfs/
10     + - info on the unionfs filesystem
11     vfat.txt
12     - info on using the VFAT filesystem used in Windows NT and Windows 95
13     vfs.txt
14     diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
15     new file mode 100644
16     index 0000000..96fdf67
17     --- /dev/null
18     +++ b/Documentation/filesystems/unionfs/00-INDEX
19     @@ -0,0 +1,10 @@
20     +00-INDEX
21     + - this file.
22     +concepts.txt
23     + - A brief introduction of concepts.
24     +issues.txt
25     + - A summary of known issues with unionfs.
26     +rename.txt
27     + - Information regarding rename operations.
28     +usage.txt
29     + - Usage information and examples.
30     diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
31     new file mode 100644
32     index 0000000..b853788
33     --- /dev/null
34     +++ b/Documentation/filesystems/unionfs/concepts.txt
35     @@ -0,0 +1,287 @@
36     +Unionfs 2.x CONCEPTS:
37     +=====================
38     +
39     +This file describes the concepts needed by a namespace unification file
40     +system.
41     +
42     +
43     +Branch Priority:
44     +================
45     +
46     +Each branch is assigned a unique priority - starting from 0 (highest
47     +priority). No two branches can have the same priority.
48     +
49     +
50     +Branch Mode:
51     +============
52     +
53     +Each branch is assigned a mode - read-write or read-only. This allows
54     +directories on media mounted read-write to be used in a read-only manner.
55     +
56     +
57     +Whiteouts:
58     +==========
59     +
60     +A whiteout removes a file name from the namespace. Whiteouts are needed when
61     +one attempts to remove a file on a read-only branch.
62     +
63     +Suppose we have a two-branch union, where branch 0 is read-write and branch
64     +1 is read-only. And a file 'foo' on branch 1:
65     +
66     +./b0/
67     +./b1/
68     +./b1/foo
69     +
70     +The unified view would simply be:
71     +
72     +./union/
73     +./union/foo
74     +
75     +Since 'foo' is stored on a read-only branch, it cannot be removed. A
76     +whiteout is used to remove the name 'foo' from the unified namespace. Again,
77     +since branch 1 is read-only, the whiteout cannot be created there. So, we
78     +try on a higher priority (lower numerically) branch and create the whiteout
79     +there.
80     +
81     +./b0/
82     +./b0/.wh.foo
83     +./b1/
84     +./b1/foo
85     +
86     +Later, when Unionfs traverses branches (due to lookup or readdir), it
87     +eliminates 'foo' from the namespace (as well as the whiteout itself).
88     +
89     +
90     +Opaque Directories:
91     +===================
92     +
93     +Assume we have a unionfs mount comprising two branches. Branch 0 is
94     +empty; branch 1 has the directory /a and file /a/f. Let's say we mount a
95     +union of branch 0 as read-write and branch 1 as read-only. Now, let's say
96     +we try to perform the following operation in the union:
97     +
98     + rm -fr a
99     +
100     +Because branch 1 is not writable, we cannot physically remove the file /a/f
101     +or the directory /a. So instead, we will create a whiteout in branch 0
102     +named /.wh.a, masking out the name "a" from branch 1. Next, let's say we
103     +try to create a directory named "a" as follows:
104     +
105     + mkdir a
106     +
107     +Because we have a whiteout for "a" already, Unionfs behaves as if "a"
108     +doesn't exist, and thus will delete the whiteout and replace it with an
109     +actual directory named "a".
110     +
111     +The problem now is that if you try to "ls" in the union, Unionfs will
112     +perform its normal directory name unification, for *all* directories named
113     +"a" in all branches. This will cause the file /a/f from branch 1 to
114     +re-appear in the union's namespace, which violates Unix semantics.
115     +
116     +To avoid this problem, we have a different form of whiteouts for
117     +directories, called "opaque directories" (same as BSD Union Mount does).
118     +Whenever we replace a whiteout with a directory, that directory is marked as
119     +opaque. In Unionfs 2.x, it means that we create a file named
120     +/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
121     +When unionfs notices that a directory is opaque, it stops all namespace
122     +operations (including merging readdir contents) at that opaque directory.
123     +This prevents re-exposing names from masked out directories.
124     +
125     +
126     +Duplicate Elimination:
127     +======================
128     +
129     +It is possible for files on different branches to have the same name.
130     +Unionfs then has to select which instance of the file to show to the user.
131     +Given the fact that each branch has a priority associated with it, the
132     +simplest solution is to take the instance from the highest priority
133     +(numerically lowest value) and "hide" the others.
134     +
135     +
136     +Unlinking:
137     +==========
138     +
139     +Unlink operation on non-directory instances is optimized to remove the
140     +maximum possible objects in case multiple underlying branches have the same
141     +file name. The unlink operation will first try to delete file instances
142     +from highest priority branch and then move further to delete from remaining
143     +branches in order of their decreasing priority. Consider a case (F..D..F),
144     +where F is a file and D is a directory of the same name; here, some
145     +intermediate branch could have an empty directory instance with the same
146     +name, so this operation also tries to delete this directory instance and
147     +proceed further to delete from next possible lower priority branch. The
148     +unionfs unlink operation will smoothly delete the files with same name from
149     +all possible underlying branches. If an error occurs, it creates a
150     +whiteout in the highest priority branch that will hide the file instances in
151     +the remaining branches. An error could occur either if an unlink operation
152     +in any of the underlying branches failed or if a branch has no write permission.
153     +
154     +This unlinking policy is known as "delete all" and it has the benefit of
155     +overall reducing the number of inodes used by duplicate files, and further
156     +reducing the total number of inodes consumed by whiteouts. The cost is
157     +extra processing, but testing shows this extra processing is well worth the
158     +savings.
159     +
160     +
161     +Copyup:
162     +=======
163     +
164     +When a change is made to a file's data or meta-data, the change has to
165     +be stored somewhere. The best way is to create a copy of the
166     +original file on a branch that is writable, and then redirect the write
167     +through to this copy. The copy must be made on a higher priority branch so
168     +that lookup and readdir return this newer "version" of the file rather than
169     +the original (see duplicate elimination).
170     +
171     +An entire unionfs mount can be read-only or read-write. If it's read-only,
172     +then none of the branches will be written to, even if some of the branches
173     +are physically writeable. If the unionfs mount is read-write, then the
174     +leftmost (highest priority) branch must be writeable (for copyup to take
175     +place); the remaining branches can be any mix of read-write and read-only.
176     +
177     +In a writeable mount, unionfs will create new files/dir in the leftmost
178     +branch. If one tries to modify a file in a read-only branch/media, unionfs
179     +will copyup the file to the leftmost branch and modify it there. If you try
180     +to modify a file from a writeable branch which is not the leftmost branch,
181     +then unionfs will modify it in that branch; this is useful if you, say,
182     +unify different packages (e.g., apache, sendmail, ftpd, etc.) and you want
183     +changes to specific package files to remain logically in the directory where
184     +they came from.
185     +
186     +Cache Coherency:
187     +================
188     +
189     +Unionfs users often want to be able to modify files and directories directly
190     +on the lower branches, and have those changes be visible at the Unionfs
191     +level. This means that data (e.g., pages) and meta-data (dentries, inodes,
192     +open files, etc.) have to be synchronized between the upper and lower
193     +layers. In other words, the newest changes from a layer below have to be
194     +propagated to the Unionfs layer above. If the two layers are not in sync, a
195     +cache incoherency ensues, which could lead to application failures and even
196     +oopses. The Linux kernel, however, has a rather limited set of mechanisms
197     +to ensure this inter-layer cache coherency---so Unionfs has to do most of
198     +the hard work on its own.
199     +
200     +Maintaining Invariants:
201     +
202     +The way Unionfs ensures cache coherency is as follows. At each entry point
203     +to a Unionfs file system method, we call a utility function to validate the
204     +primary objects of this method. Generally, we call unionfs_file_revalidate
205     +on open files, and __unionfs_d_revalidate_chain on dentries (which also
206     +validates inodes). These utility functions check to see whether the upper
207     +Unionfs object is in sync with any of the lower objects that it represents.
208     +The checks we perform include whether the Unionfs superblock has a newer
209     +generation number, or if any of the lower objects mtime's or ctime's are
210     +newer. (Note: generation numbers change when branch-management commands are
211     +issued, so in a way, maintaining cache coherency is also very important for
212     +branch-management.) If indeed we determine that any Unionfs object is no
213     +longer in sync with its lower counterparts, then we rebuild that object
214     +similarly to how we do so for branch-management.
215     +
216     +While rebuilding Unionfs's objects, we also purge any page mappings and
217     +truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This is to
218     +ensure that Unionfs will re-get the newer data from the lower branches. We
219     +perform this purging only if the Unionfs operation in question is a reading
220     +operation; if Unionfs is performing a data writing operation (e.g., ->write,
221     +->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
222     +because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
223     +considered more authoritative anyway, as they are newer and will overwrite
224     +any lower pages.
225     +
226     +Unionfs maintains the following important invariant regarding mtime's,
227     +ctime's, and atime's: the upper inode object's times are the max() of all of
228     +the lower ones. For non-directory objects, there's only one object below,
229     +so the mapping is simple; for directory objects, there could be multiple
230     +lower objects and we have to sync up with the newest one of all the lower
231     +ones. This invariant is important to maintain, especially for directories
232     +(besides, we need this to be POSIX compliant). A union could comprise
233     +multiple writable branches, each of which could change. If we don't reflect
234     +the newest possible mtime/ctime, some applications could fail. For example,
235     +NFSv2/v3 exports check for newer directory mtimes on the server to determine
236     +if the client-side attribute cache should be purged.
237     +
238     +To maintain these important invariants, of course, Unionfs carefully
239     +synchronizes upper and lower times in various places. For example, if we
240     +copy-up a file to a top-level branch, the parent directory where the file
241     +was copied up to will now have a new mtime: so after a successful copy-up,
242     +we sync up with the new top-level branch's parent directory mtime.
243     +
244     +Implementation:
245     +
246     +This cache-coherency implementation is efficient because it defers any
247     +synchronizing between the upper and lower layers until absolutely needed.
248     +Consider, for example, a common situation where users perform a lot of lower
249     +changes, such as untarring a whole package. While these take place,
250     +typically the user doesn't access the files via Unionfs; only after the
251     +lower changes are done, does the user try to access the lower files. With
252     +our cache-coherency implementation, the entirety of the changes to the lower
253     +branches will not result in a single CPU cycle spent at the Unionfs level
254     +until the user invokes a system call that goes through Unionfs.
255     +
256     +We have considered two alternate cache-coherency designs. (1) Using the
257     +dentry/inode notify functionality to register interest in finding out about
258     +any lower changes. This is a somewhat limited and also a heavy-handed
259     +approach which could result in many notifications to the Unionfs layer upon
260     +each small change at the lower layer (imagine a file being modified multiple
261     +times in rapid succession). (2) Rewriting the VFS to support explicit
262     +callbacks from lower objects to upper objects. We began exploring such an
263     +implementation, but found it to be very complicated--it would have resulted
264     +in massive VFS/MM changes which are unlikely to be accepted by the LKML
265     +community. We therefore believe that our current cache-coherency design and
266     +implementation represent the best approach at this time.
267     +
268     +Limitations:
269     +
270     +Our implementation works as long as a user process has caused Unionfs to
271     +be called, directly or indirectly, even just to do ->d_revalidate: by
272     +then we will have purged the current Unionfs data and the
273     +process will see the new data. For example, a process that continually
274     +re-reads the same file's data will see the NEW data as soon as the lower
275     +file has changed, upon the next read(2) syscall (even if the file is still
276     +open!). However, this doesn't work when the process re-reads the open file's
277     +data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens
278     +it). Once we respond to ->readpage(s), then the kernel maps the page into
279     +the process's address space and there doesn't appear to be a way to force
280     +the kernel to invalidate those pages/mappings, and force the process to
281     +re-issue ->readpage. If there's a way to invalidate active mappings and
282     +force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do
283     +the trick).
284     +
285     +Our current Unionfs code has to perform many file-revalidation calls. It
286     +would be really nice if the VFS would export an optional file system hook
287     +->file_revalidate (similarly to dentry->d_revalidate) that will be called
288     +before each VFS op that has a "struct file" in it.
289     +
290     +Certain file systems have micro-second granularity (or better) for inode
291     +times, and asynchronous actions could cause those times to change with some
292     +small delay. In such cases, Unionfs may see a changed inode time that only
293     +differs by a tiny fraction of a second: such a change may be a false
294     +positive indication that the lower object has changed, whereas if unionfs
295     +waits a little longer, that false indication will not be seen. (These false
296     +positives are harmless, because they would at most cause unionfs to
297     +re-validate an object that may need no revalidation, and print a debugging
298     +message that clutters the console/logs.) Therefore, to minimize the chances
299     +of these situations, we delay the detection of changed times by a small
300     +factor of a few seconds, called UNIONFS_MIN_CC_TIME (which defaults to 3
301     +seconds, as does NFS). This means that we will detect the change, only a
302     +couple of seconds later, if indeed the time change persists in the lower
303     +file object. This delayed detection has an added performance benefit: we
304     +reduce the number of times that unionfs has to revalidate objects, in case
305     +there's a lot of concurrent activity on both the upper and lower objects,
306     +for the same file(s). Lastly, this delayed time attribute detection is
307     +similar to how NFS clients operate (e.g., acregmin).
308     +
309     +Finally, there is no way currently in Linux to prevent lower directories
310     +from being moved around (i.e., topology changes); there's no way to prevent
311     +modifications to directory sub-trees of whole file systems which are mounted
312     +read-write. It is therefore possible for in-flight operations in unionfs to
313     +take place, while a lower directory is being moved around. Therefore, if
314     +you try to, say, create a new file in a directory through unionfs, while the
315     +directory is being moved around directly, then the new file may get created
316     +in the new location where that directory was moved to. This is somewhat
317     +similar to the behaviour of NFS: an NFS client could be creating a new file while
318     +the NFS server is moving the directory around; the file will get successfully
319     +created in the new location. (The one exception in unionfs is that if the
320     +branch is marked read-only by unionfs, then a copyup will take place.)
321     +
322     +For more information, see <http://unionfs.filesystems.org/>.
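
The whiteout, opaque-directory, and duplicate-elimination rules described in concepts.txt can be modelled in user space. The following is only a minimal sketch, not Unionfs code: the function name `union_listdir` and the dict-based branch layout are hypothetical and chosen purely for illustration. It assumes that `branches[0]` is the highest-priority branch, and it inspects a single directory only (it does not check whether a parent directory was itself whited out).

```python
WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh.__dir_opaque"


def union_listdir(branches, dirpath):
    """Return the unified listing of dirpath across all branches.

    Hypothetical model: each branch is a dict mapping a directory path to
    the list of names stored in that directory; branches[0] has the
    highest priority.
    """
    visible = []
    hidden = set()  # names masked by whiteouts in higher-priority branches
    for branch in branches:
        if dirpath not in branch:
            continue
        names = branch[dirpath]
        for name in names:
            if name.startswith(WHITEOUT_PREFIX):
                continue  # whiteouts and opaque markers are never shown
            if name not in hidden and name not in visible:
                visible.append(name)  # duplicate elimination: first wins
        # Whiteouts found here mask the same names in lower-priority branches.
        for name in names:
            if name.startswith(WHITEOUT_PREFIX) and name != OPAQUE_MARKER:
                hidden.add(name[len(WHITEOUT_PREFIX):])
        if OPAQUE_MARKER in names:
            break  # opaque directory: stop merging lower-priority branches
    return sorted(visible)


# Whiteout example from the text: b0 holds .wh.foo, b1 holds foo.
b0 = {"/": [".wh.foo"]}
b1 = {"/": ["foo"]}
print(union_listdir([b0, b1], "/"))

# Opaque directory example: /a was removed and recreated in branch 0.
b0 = {"/": ["a"], "/a": [OPAQUE_MARKER]}
b1 = {"/": ["a"], "/a": ["f"]}
print(union_listdir([b0, b1], "/a"))
```

In this model both calls print an empty listing: the whiteout masks 'foo', and the opaque marker keeps /a/f from reappearing. Removing the marker from branch 0 makes 'f' visible again, which is the problem opaque directories are meant to prevent.
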
323     diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
324     new file mode 100644
325     index 0000000..f4b7e7e
326     --- /dev/null
327     +++ b/Documentation/filesystems/unionfs/issues.txt
328     @@ -0,0 +1,28 @@
329     +KNOWN Unionfs 2.x ISSUES:
330     +=========================
331     +
332     +1. Unionfs should not use lookup_one_len() on the underlying f/s as it
333     + confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the
334     + lower file-system; this eliminates part of the problem. The remaining
335     + calls to lookup_one_len may need to be changed to pass an intent. We are
336     + currently introducing VFS changes to fs/namei.c's do_path_lookup() to
337     + allow proper file lookup and opening in stackable file systems.
338     +
339     +2. Lockdep (a debugging feature) isn't aware of stacking, and so it
340     + incorrectly complains about locking problems. The problem boils down to
341     + this: Lockdep considers all objects of a certain type to be in the same
342     + class, for example, all inodes. Lockdep doesn't like to see a lock held
343     + on two inodes within the same task, and warns that it could lead to a
344     + deadlock. However, stackable file systems do precisely that: they lock
345     + an upper object, and then a lower object, in a strict order to avoid
346     + locking problems; in addition, Unionfs, as a fan-out file system, may
347     + have to lock several lower inodes. We are currently looking into Lockdep
348     + to see how to make it aware of stackable file systems. For now, we
349     + temporarily disable lockdep when calling vfs methods on lower objects,
350     + but only for those places where lockdep complained. While this solution
351     + may seem unclean, it is not without precedent: other places in the kernel
352     + also do similar temporary disabling, of course after carefully having
353     + checked that it is the right thing to do. Anyway, if you get any warnings
354     + from Lockdep, please report them to the Unionfs maintainers.
355     +
356     +For more information, see <http://unionfs.filesystems.org/>.
357     diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
358     new file mode 100644
359     index 0000000..e20bb82
360     --- /dev/null
361     +++ b/Documentation/filesystems/unionfs/rename.txt
362     @@ -0,0 +1,31 @@
363     +Rename is a complex beast. The following table shows which rename(2) operations
364     +should succeed and which should fail.
365     +
366     +o: success
367     +E: error (either unionfs or vfs)
368     +X: EXDEV
369     +
370     +none = file does not exist
371     +file = file is a file
372     +dir = file is an empty directory
373     +child= file is a non-empty directory
374     +wh = file is a directory containing only whiteouts; this makes it logically
375     + empty
376     +
377     +      none file dir  child wh
378     +file  o    o    E    E     E
379     +dir   o    E    o    E     o
380     +child X    E    X    E     X
381     +wh    o    E    o    E     o
382     +
383     +
384     +Renaming directories:
385     +=====================
386     +
387     +Whenever an empty (either physically or logically) directory is being renamed,
388     +the following sequence of events should take place:
389     +
390     +1) Remove whiteouts from both source and destination directory
391     +2) Rename source to destination
392     +3) Make destination opaque to prevent anything under it from showing up
393     +
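
The rename table in rename.txt can be read as a simple lookup. The sketch below encodes it as data; it is illustrative only and not part of Unionfs. It assumes that rows are the rename source and columns are the destination (the table itself does not say so, but the rows have no "none" entry, which fits a source that must exist). The names `RENAME_TABLE` and `rename_outcome` are hypothetical.

```python
# Column order of the table in rename.txt.
STATES = ("none", "file", "dir", "child", "wh")

# One string per row; assumption: row = source state, column = destination state.
RENAME_TABLE = {
    "file":  "ooEEE",
    "dir":   "oEoEo",
    "child": "XEXEX",
    "wh":    "oEoEo",
}

# Legend from rename.txt.
OUTCOMES = {"o": "success", "E": "error", "X": "EXDEV"}


def rename_outcome(src, dst):
    """Look up the documented outcome of renaming src onto dst."""
    row = RENAME_TABLE[src]
    return OUTCOMES[row[STATES.index(dst)]]


print(rename_outcome("file", "none"))
print(rename_outcome("child", "dir"))
```

Under the stated row/column assumption, renaming a file to a name that does not exist succeeds, while renaming a non-empty directory onto an empty one yields EXDEV.
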
394     diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
395     new file mode 100644
396     index 0000000..1adde69
397     --- /dev/null
398     +++ b/Documentation/filesystems/unionfs/usage.txt
399     @@ -0,0 +1,134 @@
400     +Unionfs is a stackable unification file system, which can appear to merge
401     +the contents of several directories (branches), while keeping their physical
402     +content separate. Unionfs is useful for unified source tree management,
403     +merged contents of split CD-ROM, merged separate software package
404     +directories, data grids, and more. Unionfs allows any mix of read-only and
405     +read-write branches, as well as insertion and deletion of branches anywhere
406     +in the fan-out. To maintain Unix semantics, Unionfs handles elimination of
407     +duplicates, partial-error conditions, and more.
408     +
409     +GENERAL SYNTAX
410     +==============
411     +
412     +# mount -t unionfs -o <OPTIONS>,<BRANCH-OPTIONS> none MOUNTPOINT
413     +
414     +OPTIONS can be any legal combination of:
415     +
416     +- ro # mount file system read-only
417     +- rw # mount file system read-write
418     +- remount # remount the file system (see Branch Management below)
419     +- incgen # increment generation no. (see Cache Consistency below)
420     +
421     +BRANCH-OPTIONS can be either (1) a list of branches given to the "dirs="
422     +option, or (2) a list of individual branch manipulation commands, combined
423     +with the "remount" option, and is further described in the "Branch
424     +Management" section below.
425     +
426     +The syntax for the "dirs=" mount option is:
427     +
428     + dirs=branch[=ro|=rw][:...]
429     +
430     +The "dirs=" option takes a colon-delimited list of directories to compose
431     +the union, with an optional branch mode for each of those directories.
432     +Directories that come earlier (specified first, on the left) in the list
433     +have a higher precedence than those which come later. Additionally,
434     +read-only or read-write permissions of the branch can be specified by
435     +appending =ro or =rw (default) to each directory. See the Copyup section in
436     +concepts.txt, for a description of Unionfs's behavior when mixing read-only
437     +and read-write branches and mounts.
438     +
439     +Syntax:
440     +
441     + dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
442     +
443     +Example:
444     +
445     + dirs=/writable_branch=rw:/read-only_branch=ro
446     +
447     +
448     +BRANCH MANAGEMENT
449     +=================
450     +
451     +Once you mount your union for the first time, using the "dirs=" option, you
452     +can then change the union's overall mode or reconfigure the branches, using
453     +the remount option, as follows.
454     +
455     +To downgrade a union from read-write to read-only:
456     +
457     +# mount -t unionfs -o remount,ro none MOUNTPOINT
458     +
459     +To upgrade a union from read-only to read-write:
460     +
461     +# mount -t unionfs -o remount,rw none MOUNTPOINT
462     +
463     +To delete a branch /foo, regardless of where it is in the current union:
464     +
465     +# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
466     +
467     +To insert (add) a branch /foo before /bar:
468     +
469     +# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
470     +
471     +To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
472     +
473     +# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
474     +
475     +To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
476     +new highest-priority branch), you can use the above syntax, or use a
477     +shorthand version as follows:
478     +
479     +# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
480     +
481     +To append a branch to the very end (new lowest-priority branch):
482     +
483     +# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
484     +
485     +To append a branch to the very end (new lowest-priority branch), in
486     +read-only mode:
487     +
488     +# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
489     +
490     +Finally, to change the mode of one existing branch, say /foo, from read-only
491     +to read-write, and change /bar from read-write to read-only:
492     +
493     +# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
494     +
495     +Note: in Unionfs 2.x, you cannot set the leftmost branch to readonly because
496     +then Unionfs won't have any writable place for copyups to take place.
497     +Moreover, the VFS can get confused when it tries to modify something in a
498     +file system mounted read-write, but isn't permitted to write to it.
499     +Instead, you should set the whole union as readonly, as described above.
500     +If, however, you must set the leftmost branch as readonly, perhaps so you
501     +can get a snapshot of it at a point in time, then you should insert a new
502     +writable top-level branch, and mark the one you want as readonly. This can
503     +be accomplished as follows, assuming that /foo is your current leftmost
504     +branch:
505     +
506     +# mount -t tmpfs -o size=NNN none /new
507     +# mount -t unionfs -o remount,add=/new,mode=/foo=ro none MOUNTPOINT
508     +<do what you want safely in /foo>
509     +# mount -t unionfs -o remount,del=/new,mode=/foo=rw none MOUNTPOINT
510     +<check if there's anything in /new you want to preserve>
511     +# umount /new
512     +
513     +CACHE CONSISTENCY
514     +=================
515     +
516     +If you modify any file on any of the lower branches directly, while there is
517     +a Unionfs 2.x mounted above any of those branches, you should tell Unionfs
518     +to purge its caches and re-get the objects. To do that, you have to
519     +increment the generation number of the superblock using the following
520     +command:
521     +
522     +# mount -t unionfs -o remount,incgen none MOUNTPOINT
523     +
524     +Note that the older way of incrementing the generation number, using an
525     +ioctl, is no longer supported in Unionfs 2.0 and newer. Ioctls in general
526     +are not encouraged. Plus, an ioctl is a per-file concept, whereas the
527     +generation number is a per-file-system concept. Worse, such an ioctl
528     +requires an open file, which then has to be invalidated by the very nature
529     +of the generation number increase (read: the old generation increase ioctl
530     +was pretty racy).
531     +
532     +
533     +For more information, see <http://unionfs.filesystems.org/>.
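
The "dirs=" syntax described in usage.txt can be illustrated with a small user-space parser. This is a sketch under assumptions, not the kernel's option parser: the function name `parse_dirs` is hypothetical, and treating an unrecognized "=suffix" as part of the path is a choice made here for illustration, not documented Unionfs behavior. Branches are returned in priority order, leftmost (highest priority) first, and the mode defaults to "rw" as the text states.

```python
def parse_dirs(value):
    """Split a dirs= value into (path, mode) pairs, highest priority first.

    Hypothetical helper: accepts the value with or without the leading
    "dirs=" and defaults each branch to read-write.
    """
    if value.startswith("dirs="):
        value = value[len("dirs="):]
    branches = []
    for item in value.split(":"):
        if not item:
            raise ValueError("empty branch in dirs= list")
        path, sep, mode = item.rpartition("=")
        if sep and mode in ("ro", "rw"):
            branches.append((path, mode))
        else:
            # Assumption: no recognized mode suffix, so use the default.
            branches.append((item, "rw"))
    return branches


for priority, (path, mode) in enumerate(
        parse_dirs("dirs=/writable_branch=rw:/read-only_branch=ro")):
    print(priority, path, mode)
```

The loop prints one line per branch with its numeric priority, so the example from the text yields /writable_branch at priority 0 in rw mode and /read-only_branch at priority 1 in ro mode.
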
534     diff --git a/MAINTAINERS b/MAINTAINERS
535     index df34283..559779b 100644
536     --- a/MAINTAINERS
537     +++ b/MAINTAINERS
538     @@ -5877,6 +5877,14 @@ F: Documentation/cdrom/
539     F: drivers/cdrom/cdrom.c
540     F: include/linux/cdrom.h
541    
542     +UNIONFS
543     +P: Erez Zadok
544     +M: ezk@cs.sunysb.edu
545     +L: unionfs@filesystems.org
546     +W: http://unionfs.filesystems.org/
547     +T: git git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git
548     +S: Maintained
549     +
550     UNSORTED BLOCK IMAGES (UBI)
551     M: Artem Bityutskiy <dedekind1@gmail.com>
552     W: http://www.linux-mtd.infradead.org/
553     diff --git a/fs/Kconfig b/fs/Kconfig
554     index 3d18530..65b6aa1 100644
555     --- a/fs/Kconfig
556     +++ b/fs/Kconfig
557     @@ -169,6 +169,7 @@ if MISC_FILESYSTEMS
558     source "fs/adfs/Kconfig"
559     source "fs/affs/Kconfig"
560     source "fs/ecryptfs/Kconfig"
561     +source "fs/unionfs/Kconfig"
562     source "fs/hfs/Kconfig"
563     source "fs/hfsplus/Kconfig"
564     source "fs/befs/Kconfig"
565     diff --git a/fs/Makefile b/fs/Makefile
566     index e6ec1d3..787332e 100644
567     --- a/fs/Makefile
568     +++ b/fs/Makefile
569     @@ -84,6 +84,7 @@ obj-$(CONFIG_ISO9660_FS) += isofs/
570     obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+
571     obj-$(CONFIG_HFS_FS) += hfs/
572     obj-$(CONFIG_ECRYPT_FS) += ecryptfs/
573     +obj-$(CONFIG_UNION_FS) += unionfs/
574     obj-$(CONFIG_VXFS_FS) += freevxfs/
575     obj-$(CONFIG_NFS_FS) += nfs/
576     obj-$(CONFIG_EXPORTFS) += exportfs/
577     diff --git a/fs/namei.c b/fs/namei.c
578     index 24896e8..db22420 100644
579     --- a/fs/namei.c
580     +++ b/fs/namei.c
581     @@ -385,6 +385,7 @@ void release_open_intent(struct nameidata *nd)
582     else
583     fput(nd->intent.open.file);
584     }
585     +EXPORT_SYMBOL_GPL(release_open_intent);
586    
587     static inline struct dentry *
588     do_revalidate(struct dentry *dentry, struct nameidata *nd)
589     diff --git a/fs/splice.c b/fs/splice.c
590     index 8f1dfae..7a57fab 100644
591     --- a/fs/splice.c
592     +++ b/fs/splice.c
593     @@ -1092,8 +1092,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
594     /*
595     * Attempt to initiate a splice from pipe to file.
596     */
597     -static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
598     - loff_t *ppos, size_t len, unsigned int flags)
599     +long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
600     + loff_t *ppos, size_t len, unsigned int flags)
601     {
602     ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
603     loff_t *, size_t, unsigned int);
604     @@ -1116,13 +1116,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
605    
606     return splice_write(pipe, out, ppos, len, flags);
607     }
608     +EXPORT_SYMBOL_GPL(vfs_splice_from);
609    
610     /*
611     * Attempt to initiate a splice from a file to a pipe.
612     */
613     -static long do_splice_to(struct file *in, loff_t *ppos,
614     - struct pipe_inode_info *pipe, size_t len,
615     - unsigned int flags)
616     +long vfs_splice_to(struct file *in, loff_t *ppos,
617     + struct pipe_inode_info *pipe, size_t len,
618     + unsigned int flags)
619     {
620     ssize_t (*splice_read)(struct file *, loff_t *,
621     struct pipe_inode_info *, size_t, unsigned int);
622     @@ -1142,6 +1143,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
623    
624     return splice_read(in, ppos, pipe, len, flags);
625     }
626     +EXPORT_SYMBOL_GPL(vfs_splice_to);
627    
628     /**
629     * splice_direct_to_actor - splices data directly between two non-pipes
630     @@ -1211,7 +1213,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
631     size_t read_len;
632     loff_t pos = sd->pos, prev_pos = pos;
633    
634     - ret = do_splice_to(in, &pos, pipe, len, flags);
635     + ret = vfs_splice_to(in, &pos, pipe, len, flags);
636     if (unlikely(ret <= 0))
637     goto out_release;
638    
639     @@ -1270,8 +1272,8 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
640     {
641     struct file *file = sd->u.file;
642    
643     - return do_splice_from(pipe, file, &file->f_pos, sd->total_len,
644     - sd->flags);
645     + return vfs_splice_from(pipe, file, &file->f_pos, sd->total_len,
646     + sd->flags);
647     }
648    
649     /**
650     @@ -1368,7 +1370,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
651     } else
652     off = &out->f_pos;
653    
654     - ret = do_splice_from(ipipe, out, off, len, flags);
655     + ret = vfs_splice_from(ipipe, out, off, len, flags);
656    
657     if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
658     ret = -EFAULT;
659     @@ -1388,7 +1390,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
660     } else
661     off = &in->f_pos;
662    
663     - ret = do_splice_to(in, off, opipe, len, flags);
664     + ret = vfs_splice_to(in, off, opipe, len, flags);
665    
666     if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
667     ret = -EFAULT;
668     diff --git a/fs/stack.c b/fs/stack.c
669     index 4a6f7f4..7eeef12 100644
670     --- a/fs/stack.c
671     +++ b/fs/stack.c
672     @@ -1,8 +1,20 @@
673     +/*
674     + * Copyright (c) 2006-2009 Erez Zadok
675     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
676     + * Copyright (c) 2006-2009 Stony Brook University
677     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
678     + *
679     + * This program is free software; you can redistribute it and/or modify
680     + * it under the terms of the GNU General Public License version 2 as
681     + * published by the Free Software Foundation.
682     + */
683     +
684     #include <linux/module.h>
685     #include <linux/fs.h>
686     #include <linux/fs_stack.h>
687    
688     -/* does _NOT_ require i_mutex to be held.
689     +/*
690     + * does _NOT_ require i_mutex to be held.
691     *
692     * This function cannot be inlined since i_size_{read,write} is rather
693     * heavy-weight on 32-bit systems
694     diff --git a/fs/unionfs/Kconfig b/fs/unionfs/Kconfig
695     new file mode 100644
696     index 0000000..f3c1ac4
697     --- /dev/null
698     +++ b/fs/unionfs/Kconfig
699     @@ -0,0 +1,24 @@
700     +config UNION_FS
701     + tristate "Union file system (EXPERIMENTAL)"
702     + depends on EXPERIMENTAL
703     + help
704     + Unionfs is a stackable unification file system, which appears to
705     + merge the contents of several directories (branches), while keeping
706     + their physical content separate.
707     +
708     + See <http://unionfs.filesystems.org> for details
709     +
710     +config UNION_FS_XATTR
711     + bool "Unionfs extended attributes"
712     + depends on UNION_FS
713     + help
714     + Extended attributes are name:value pairs associated with inodes by
715     + the kernel or by users (see the attr(5) manual page).
716     +
717     + If unsure, say N.
718     +
719     +config UNION_FS_DEBUG
720     + bool "Debug Unionfs"
721     + depends on UNION_FS
722     + help
723     + If you say Y here, you can turn on debugging output from Unionfs.
724     diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
725     new file mode 100644
726     index 0000000..08f4fd4
727     --- /dev/null
728     +++ b/fs/unionfs/Makefile
729     @@ -0,0 +1,17 @@
730     +UNIONFS_VERSION="2.5.6 (for 2.6.36-rc5)"
731     +
732     +EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\"
733     +
734     +obj-$(CONFIG_UNION_FS) += unionfs.o
735     +
736     +unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
737     + rdstate.o copyup.o dirhelper.o rename.o unlink.o \
738     + lookup.o commonfops.o dirfops.o sioq.o mmap.o whiteout.o
739     +
740     +unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
741     +
742     +unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
743     +
744     +ifeq ($(CONFIG_UNION_FS_DEBUG),y)
745     +EXTRA_CFLAGS += -DDEBUG
746     +endif
747     diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
748     new file mode 100644
749     index 0000000..51ea65e
750     --- /dev/null
751     +++ b/fs/unionfs/commonfops.c
752     @@ -0,0 +1,896 @@
753     +/*
754     + * Copyright (c) 2003-2010 Erez Zadok
755     + * Copyright (c) 2003-2006 Charles P. Wright
756     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
757     + * Copyright (c) 2005-2006 Junjiro Okajima
758     + * Copyright (c) 2005 Arun M. Krishnakumar
759     + * Copyright (c) 2004-2006 David P. Quigley
760     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
761     + * Copyright (c) 2003 Puja Gupta
762     + * Copyright (c) 2003 Harikesavan Krishnan
763     + * Copyright (c) 2003-2010 Stony Brook University
764     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
765     + *
766     + * This program is free software; you can redistribute it and/or modify
767     + * it under the terms of the GNU General Public License version 2 as
768     + * published by the Free Software Foundation.
769     + */
770     +
771     +#include "union.h"
772     +
773     +/*
774     + * 1) Copyup the file
775     + * 2) Rename the file to '.unionfs<original inode#><counter>' - obviously
776     + * stolen from NFS's silly rename
777     + */
778     +static int copyup_deleted_file(struct file *file, struct dentry *dentry,
779     + struct dentry *parent, int bstart, int bindex)
780     +{
781     + static unsigned int counter;
782     + const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
783     + const int countersize = sizeof(counter) * 2;
784     + const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
785     + char name[nlen + 1];
786     + int err;
787     + struct dentry *tmp_dentry = NULL;
788     + struct dentry *lower_dentry;
789     + struct dentry *lower_dir_dentry = NULL;
790     +
791     + lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
792     +
793     + sprintf(name, ".unionfs%*.*lx",
794     + i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
795     +
796     + /*
797     + * Loop, looking for an unused temp name to copyup to.
798     + *
799     + * It's somewhat silly that we look for a free temp name in the
800     + * source branch (bstart) instead of the dest branch (bindex), where
801     + * the final name will be created. We _will_ catch it if somehow
802     + * the name exists in the dest branch, but it'd be nice to catch it
803     + * sooner rather than later.
804     + */
805     +retry:
806     + tmp_dentry = NULL;
807     + do {
808     + char *suffix = name + nlen - countersize;
809     +
810     + dput(tmp_dentry);
811     + counter++;
812     + sprintf(suffix, "%*.*x", countersize, countersize, counter);
813     +
814     + pr_debug("unionfs: trying to rename %s to %s\n",
815     + dentry->d_name.name, name);
816     +
817     + tmp_dentry = lookup_lck_len(name, lower_dentry->d_parent,
818     + nlen);
819     + if (IS_ERR(tmp_dentry)) {
820     + err = PTR_ERR(tmp_dentry);
821     + goto out;
822     + }
823     + } while (tmp_dentry->d_inode != NULL); /* need negative dentry */
824     + dput(tmp_dentry);
825     +
826     + err = copyup_named_file(parent->d_inode, file, name, bstart, bindex,
827     + i_size_read(file->f_path.dentry->d_inode));
828     + if (err) {
829     + if (unlikely(err == -EEXIST))
830     + goto retry;
831     + goto out;
832     + }
833     +
834     + /* bring it to the same state as an unlinked file */
835     + lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
836     + if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
837     + atomic_inc(&lower_dentry->d_inode->i_count);
838     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
839     + lower_dentry->d_inode);
840     + }
841     + lower_dir_dentry = lock_parent(lower_dentry);
842     + err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
843     + unlock_dir(lower_dir_dentry);
844     +
845     +out:
846     + if (!err)
847     + unionfs_check_dentry(dentry);
848     + return err;
849     +}
850     +
851     +/*
852     + * put all references held by upper struct file and free lower file pointer
853     + * array
854     + */
855     +static void cleanup_file(struct file *file)
856     +{
857     + int bindex, bstart, bend;
858     + struct file **lower_files;
859     + struct file *lower_file;
860     + struct super_block *sb = file->f_path.dentry->d_sb;
861     +
862     + lower_files = UNIONFS_F(file)->lower_files;
863     + bstart = fbstart(file);
864     + bend = fbend(file);
865     +
866     + for (bindex = bstart; bindex <= bend; bindex++) {
867     + int i; /* holds (possibly) updated branch index */
868     + int old_bid;
869     +
870     + lower_file = unionfs_lower_file_idx(file, bindex);
871     + if (!lower_file)
872     + continue;
873     +
874     + /*
875     + * Find new index of matching branch with an open
876     + * file, since branches could have been added or
877     + * deleted causing the one with open files to shift.
878     + */
879     + old_bid = UNIONFS_F(file)->saved_branch_ids[bindex];
880     + i = branch_id_to_idx(sb, old_bid);
881     + if (unlikely(i < 0)) {
882     + printk(KERN_ERR "unionfs: no superblock for "
883     + "file %p\n", file);
884     + continue;
885     + }
886     +
887     + /* decrement count of open files */
888     + branchput(sb, i);
889     + /*
890     + * fput will perform an mntput for us on the correct branch.
891     + * Although we're using the file's old branch configuration,
892     + * bindex, which is the old index, correctly points to the
893     + * right branch in the file's branch list. In other words,
894     + * we're going to mntput the correct branch even if branches
895     + * have been added/removed.
896     + */
897     + fput(lower_file);
898     + UNIONFS_F(file)->lower_files[bindex] = NULL;
899     + UNIONFS_F(file)->saved_branch_ids[bindex] = -1;
900     + }
901     +
902     + UNIONFS_F(file)->lower_files = NULL;
903     + kfree(lower_files);
904     + kfree(UNIONFS_F(file)->saved_branch_ids);
905     + /* set to NULL because caller needs to know whether to kfree on error */
906     + UNIONFS_F(file)->saved_branch_ids = NULL;
907     +}
908     +
909     +/* open all lower files for a given file */
910     +static int open_all_files(struct file *file)
911     +{
912     + int bindex, bstart, bend, err = 0;
913     + struct file *lower_file;
914     + struct dentry *lower_dentry;
915     + struct dentry *dentry = file->f_path.dentry;
916     + struct super_block *sb = dentry->d_sb;
917     +
918     + bstart = dbstart(dentry);
919     + bend = dbend(dentry);
920     +
921     + for (bindex = bstart; bindex <= bend; bindex++) {
922     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
923     + if (!lower_dentry)
924     + continue;
925     +
926     + dget(lower_dentry);
927     + unionfs_mntget(dentry, bindex);
928     + branchget(sb, bindex);
929     +
930     + lower_file =
931     + dentry_open(lower_dentry,
932     + unionfs_lower_mnt_idx(dentry, bindex),
933     + file->f_flags, current_cred());
934     + if (IS_ERR(lower_file)) {
935     + branchput(sb, bindex);
936     + err = PTR_ERR(lower_file);
937     + goto out;
938     + } else {
939     + unionfs_set_lower_file_idx(file, bindex, lower_file);
940     + }
941     + }
942     +out:
943     + return err;
944     +}
945     +
946     +/* open the highest priority file for a given upper file */
947     +static int open_highest_file(struct file *file, bool willwrite)
948     +{
949     + int bindex, bstart, bend, err = 0;
950     + struct file *lower_file;
951     + struct dentry *lower_dentry;
952     + struct dentry *dentry = file->f_path.dentry;
953     + struct dentry *parent = dget_parent(dentry);
954     + struct inode *parent_inode = parent->d_inode;
955     + struct super_block *sb = dentry->d_sb;
956     +
957     + bstart = dbstart(dentry);
958     + bend = dbend(dentry);
959     +
960     + lower_dentry = unionfs_lower_dentry(dentry);
961     + if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) {
962     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
963     + err = copyup_file(parent_inode, file, bstart, bindex,
964     + i_size_read(dentry->d_inode));
965     + if (!err)
966     + break;
967     + }
968     + atomic_set(&UNIONFS_F(file)->generation,
969     + atomic_read(&UNIONFS_I(dentry->d_inode)->
970     + generation));
971     + goto out;
972     + }
973     +
974     + dget(lower_dentry);
975     + unionfs_mntget(dentry, bstart);
976     + lower_file = dentry_open(lower_dentry,
977     + unionfs_lower_mnt_idx(dentry, bstart),
978     + file->f_flags, current_cred());
979     + if (IS_ERR(lower_file)) {
980     + err = PTR_ERR(lower_file);
981     + goto out;
982     + }
983     + branchget(sb, bstart);
984     + unionfs_set_lower_file(file, lower_file);
985     + /* Fix up the position. */
986     + lower_file->f_pos = file->f_pos;
987     +
988     + memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
989     +out:
990     + dput(parent);
991     + return err;
992     +}
993     +
994     +/* perform a delayed copyup of a read-write file on a read-only branch */
995     +static int do_delayed_copyup(struct file *file, struct dentry *parent)
996     +{
997     + int bindex, bstart, bend, err = 0;
998     + struct dentry *dentry = file->f_path.dentry;
999     + struct inode *parent_inode = parent->d_inode;
1000     +
1001     + bstart = fbstart(file);
1002     + bend = fbend(file);
1003     +
1004     + BUG_ON(!S_ISREG(dentry->d_inode->i_mode));
1005     +
1006     + unionfs_check_file(file);
1007     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1008     + if (!d_deleted(dentry))
1009     + err = copyup_file(parent_inode, file, bstart,
1010     + bindex,
1011     + i_size_read(dentry->d_inode));
1012     + else
1013     + err = copyup_deleted_file(file, dentry, parent,
1014     + bstart, bindex);
1015     + /* if succeeded, set lower open-file flags and break */
1016     + if (!err) {
1017     + struct file *lower_file;
1018     + lower_file = unionfs_lower_file_idx(file, bindex);
1019     + lower_file->f_flags = file->f_flags;
1020     + break;
1021     + }
1022     + }
1023     + if (err || (bstart <= fbstart(file)))
1024     + goto out;
1025     + bend = fbend(file);
1026     + for (bindex = bstart; bindex <= bend; bindex++) {
1027     + if (unionfs_lower_file_idx(file, bindex)) {
1028     + branchput(dentry->d_sb, bindex);
1029     + fput(unionfs_lower_file_idx(file, bindex));
1030     + unionfs_set_lower_file_idx(file, bindex, NULL);
1031     + }
1032     + }
1033     + path_put_lowers(dentry, bstart, bend, false);
1034     + iput_lowers(dentry->d_inode, bstart, bend, false);
1035     + /* for reg file, we only open it "once" */
1036     + fbend(file) = fbstart(file);
1037     + dbend(dentry) = dbstart(dentry);
1038     + ibend(dentry->d_inode) = ibstart(dentry->d_inode);
1039     +
1040     +out:
1041     + unionfs_check_file(file);
1042     + return err;
1043     +}
1044     +
1045     +/*
1046     + * Helper function for unionfs_file_revalidate/locked.
1047     + * Expects dentry/parent to be locked already, and revalidated.
1048     + */
1049     +static int __unionfs_file_revalidate(struct file *file, struct dentry *dentry,
1050     + struct dentry *parent,
1051     + struct super_block *sb, int sbgen,
1052     + int dgen, bool willwrite)
1053     +{
1054     + int fgen;
1055     + int bstart, bend, orig_brid;
1056     + int size;
1057     + int err = 0;
1058     +
1059     + fgen = atomic_read(&UNIONFS_F(file)->generation);
1060     +
1061     + /*
1062     + * There are two cases we are interested in. The first is if the
1063     + * file's generation is lower than the superblock's. The second is
1064     + * if someone has copied up this file from underneath us. In both
1065     + * cases we need to refresh things.
1066     + */
1067     + if (d_deleted(dentry) ||
1068     + (sbgen <= fgen &&
1069     + dbstart(dentry) == fbstart(file) &&
1070     + unionfs_lower_file(file)))
1071     + goto out_may_copyup;
1072     +
1073     + /* save orig branch ID */
1074     + orig_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1075     +
1076     + /* First we throw out the existing files. */
1077     + cleanup_file(file);
1078     +
1079     + /* Now we reopen the file(s) as in unionfs_open. */
1080     + bstart = fbstart(file) = dbstart(dentry);
1081     + bend = fbend(file) = dbend(dentry);
1082     +
1083     + size = sizeof(struct file *) * sbmax(sb);
1084     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1085     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1086     + err = -ENOMEM;
1087     + goto out;
1088     + }
1089     + size = sizeof(int) * sbmax(sb);
1090     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1091     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1092     + err = -ENOMEM;
1093     + goto out;
1094     + }
1095     +
1096     + if (S_ISDIR(dentry->d_inode->i_mode)) {
1097     + /* We need to open all the files. */
1098     + err = open_all_files(file);
1099     + if (err)
1100     + goto out;
1101     + } else {
1102     + int new_brid;
1103     + /* We only open the highest priority branch. */
1104     + err = open_highest_file(file, willwrite);
1105     + if (err)
1106     + goto out;
1107     + new_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1108     + if (unlikely(new_brid != orig_brid && sbgen > fgen)) {
1109     + /*
1110     + * If we re-opened the file on a different branch
1111     + * than the original one, and this was due to a new
1112     + * branch inserted, then update the mnt counts of
1113     + * the old and new branches accordingly.
1114     + */
1115     + unionfs_mntget(dentry, bstart);
1116     + unionfs_mntput(sb->s_root,
1117     + branch_id_to_idx(sb, orig_brid));
1118     + }
1119     + /* regular files have only one open lower file */
1120     + fbend(file) = fbstart(file);
1121     + }
1122     + atomic_set(&UNIONFS_F(file)->generation,
1123     + atomic_read(&UNIONFS_I(dentry->d_inode)->generation));
1124     +
1125     +out_may_copyup:
1126     + /* Copyup on the first write to a file on a readonly branch. */
1127     + if (willwrite && IS_WRITE_FLAG(file->f_flags) &&
1128     + !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) &&
1129     + is_robranch(dentry)) {
1130     + pr_debug("unionfs: do delay copyup of \"%s\"\n",
1131     + dentry->d_name.name);
1132     + err = do_delayed_copyup(file, parent);
1133     + /* regular files have only one open lower file */
1134     + if (!err && !S_ISDIR(dentry->d_inode->i_mode))
1135     + fbend(file) = fbstart(file);
1136     + }
1137     +
1138     +out:
1139     + if (err) {
1140     + kfree(UNIONFS_F(file)->lower_files);
1141     + kfree(UNIONFS_F(file)->saved_branch_ids);
1142     + }
1143     + return err;
1144     +}
1145     +
1146     +/*
1147     + * Revalidate the struct file
1148     + * @file: file to revalidate
1149     + * @parent: parent dentry (locked by caller)
1150     + * @willwrite: true if caller may cause changes to the file; false otherwise.
1151     + * Caller must lock/unlock dentry's branch configuration.
1152     + */
1153     +int unionfs_file_revalidate(struct file *file, struct dentry *parent,
1154     + bool willwrite)
1155     +{
1156     + struct super_block *sb;
1157     + struct dentry *dentry;
1158     + int sbgen, dgen;
1159     + int err = 0;
1160     +
1161     + dentry = file->f_path.dentry;
1162     + sb = dentry->d_sb;
1163     + verify_locked(dentry);
1164     + verify_locked(parent);
1165     +
1166     + /*
1167     + * First revalidate the dentry inside struct file,
1168     + * but not unhashed dentries.
1169     + */
1170     + if (!d_deleted(dentry) &&
1171     + !__unionfs_d_revalidate(dentry, parent, willwrite)) {
1172     + err = -ESTALE;
1173     + goto out;
1174     + }
1175     +
1176     + sbgen = atomic_read(&UNIONFS_SB(sb)->generation);
1177     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
1178     +
1179     + if (unlikely(sbgen > dgen)) { /* XXX: should never happen */
1180     + pr_debug("unionfs: failed to revalidate dentry (%s)\n",
1181     + dentry->d_name.name);
1182     + err = -ESTALE;
1183     + goto out;
1184     + }
1185     +
1186     + err = __unionfs_file_revalidate(file, dentry, parent, sb,
1187     + sbgen, dgen, willwrite);
1188     +out:
1189     + return err;
1190     +}
1191     +
1192     +/* unionfs_open helper function: open a directory */
1193     +static int __open_dir(struct inode *inode, struct file *file)
1194     +{
1195     + struct dentry *lower_dentry;
1196     + struct file *lower_file;
1197     + int bindex, bstart, bend;
1198     + struct vfsmount *mnt;
1199     +
1200     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1201     + bend = fbend(file) = dbend(file->f_path.dentry);
1202     +
1203     + for (bindex = bstart; bindex <= bend; bindex++) {
1204     + lower_dentry =
1205     + unionfs_lower_dentry_idx(file->f_path.dentry, bindex);
1206     + if (!lower_dentry)
1207     + continue;
1208     +
1209     + dget(lower_dentry);
1210     + unionfs_mntget(file->f_path.dentry, bindex);
1211     + mnt = unionfs_lower_mnt_idx(file->f_path.dentry, bindex);
1212     + lower_file = dentry_open(lower_dentry, mnt, file->f_flags,
1213     + current_cred());
1214     + if (IS_ERR(lower_file))
1215     + return PTR_ERR(lower_file);
1216     +
1217     + unionfs_set_lower_file_idx(file, bindex, lower_file);
1218     +
1219     + /*
1220     + * The branchget goes after the open, because otherwise
1221     + * we would miss the reference on release.
1222     + */
1223     + branchget(inode->i_sb, bindex);
1224     + }
1225     +
1226     + return 0;
1227     +}
1228     +
1229     +/* unionfs_open helper function: open a file */
1230     +static int __open_file(struct inode *inode, struct file *file,
1231     + struct dentry *parent)
1232     +{
1233     + struct dentry *lower_dentry;
1234     + struct file *lower_file;
1235     + int lower_flags;
1236     + int bindex, bstart, bend;
1237     +
1238     + lower_dentry = unionfs_lower_dentry(file->f_path.dentry);
1239     + lower_flags = file->f_flags;
1240     +
1241     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1242     + bend = fbend(file) = dbend(file->f_path.dentry);
1243     +
1244     + /*
1245     + * check for the permission for lower file. If the error is
1246     + * COPYUP_ERR, copyup the file.
1247     + */
1248     + if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) {
1249     + /*
1250     + * if the open will change the file, copy it up; otherwise
1251     + * defer the copyup.
1252     + */
1253     + if (lower_flags & O_TRUNC) {
1254     + int size = 0;
1255     + int err = -EROFS;
1256     +
1257     + /* copyup the file */
1258     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1259     + err = copyup_file(parent->d_inode, file,
1260     + bstart, bindex, size);
1261     + if (!err)
1262     + break;
1263     + }
1264     + return err;
1265     + } else {
1266     + /*
1267     + * turn off writeable flags, to force delayed copyup
1268     + * by caller.
1269     + */
1270     + lower_flags &= ~(OPEN_WRITE_FLAGS);
1271     + }
1272     + }
1273     +
1274     + dget(lower_dentry);
1275     +
1276     + /*
1277     + * dentry_open will decrement mnt refcnt if err.
1278     + * otherwise fput() will do an mntput() for us upon file close.
1279     + */
1280     + unionfs_mntget(file->f_path.dentry, bstart);
1281     + lower_file =
1282     + dentry_open(lower_dentry,
1283     + unionfs_lower_mnt_idx(file->f_path.dentry, bstart),
1284     + lower_flags, current_cred());
1285     + if (IS_ERR(lower_file))
1286     + return PTR_ERR(lower_file);
1287     +
1288     + unionfs_set_lower_file(file, lower_file);
1289     + branchget(inode->i_sb, bstart);
1290     +
1291     + return 0;
1292     +}
1293     +
1294     +int unionfs_open(struct inode *inode, struct file *file)
1295     +{
1296     + int err = 0;
1297     + struct file *lower_file = NULL;
1298     + struct dentry *dentry = file->f_path.dentry;
1299     + struct dentry *parent;
1300     + int bindex = 0, bstart = 0, bend = 0;
1301     + int size;
1302     + int valid = 0;
1303     +
1304     + unionfs_read_lock(inode->i_sb, UNIONFS_SMUTEX_PARENT);
1305     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1306     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1307     +
1308     + /* don't open unhashed/deleted files */
1309     + if (d_deleted(dentry)) {
1310     + err = -ENOENT;
1311     + goto out_nofree;
1312     + }
1313     +
1314     + /* XXX: should I change 'false' below to the 'willwrite' flag? */
1315     + valid = __unionfs_d_revalidate(dentry, parent, false);
1316     + if (unlikely(!valid)) {
1317     + err = -ESTALE;
1318     + goto out_nofree;
1319     + }
1320     +
1321     + file->private_data =
1322     + kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL);
1323     + if (unlikely(!UNIONFS_F(file))) {
1324     + err = -ENOMEM;
1325     + goto out_nofree;
1326     + }
1327     + fbstart(file) = -1;
1328     + fbend(file) = -1;
1329     + atomic_set(&UNIONFS_F(file)->generation,
1330     + atomic_read(&UNIONFS_I(inode)->generation));
1331     +
1332     + size = sizeof(struct file *) * sbmax(inode->i_sb);
1333     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1334     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1335     + err = -ENOMEM;
1336     + goto out;
1337     + }
1338     + size = sizeof(int) * sbmax(inode->i_sb);
1339     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1340     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1341     + err = -ENOMEM;
1342     + goto out;
1343     + }
1344     +
1345     + bstart = fbstart(file) = dbstart(dentry);
1346     + bend = fbend(file) = dbend(dentry);
1347     +
1348     + /*
1349     + * open all directories and make the unionfs file struct point to
1350     + * these lower file structs
1351     + */
1352     + if (S_ISDIR(inode->i_mode))
1353     + err = __open_dir(inode, file); /* open a dir */
1354     + else
1355     + err = __open_file(inode, file, parent); /* open a file */
1356     +
1357     + /* on error, fput the opened files; allocations are freed below */
1358     + if (err) {
1359     + for (bindex = bstart; bindex <= bend; bindex++) {
1360     + lower_file = unionfs_lower_file_idx(file, bindex);
1361     + if (!lower_file)
1362     + continue;
1363     +
1364     + branchput(dentry->d_sb, bindex);
1365     + /* fput calls dput for lower_dentry */
1366     + fput(lower_file);
1367     + }
1368     + }
1369     +
1370     +out:
1371     + if (err) {
1372     + kfree(UNIONFS_F(file)->lower_files);
1373     + kfree(UNIONFS_F(file)->saved_branch_ids);
1374     + kfree(UNIONFS_F(file));
1375     + }
1376     +out_nofree:
1377     + if (!err) {
1378     + unionfs_postcopyup_setmnt(dentry);
1379     + unionfs_copy_attr_times(inode);
1380     + unionfs_check_file(file);
1381     + unionfs_check_inode(inode);
1382     + }
1383     + unionfs_unlock_dentry(dentry);
1384     + unionfs_unlock_parent(dentry, parent);
1385     + unionfs_read_unlock(inode->i_sb);
1386     + return err;
1387     +}
1388     +
1389     +/*
1390     + * release all lower object references & free the file info structure
1391     + *
1392     + * No need to grab sb info's rwsem.
1393     + */
1394     +int unionfs_file_release(struct inode *inode, struct file *file)
1395     +{
1396     + struct file *lower_file = NULL;
1397     + struct unionfs_file_info *fileinfo;
1398     + struct unionfs_inode_info *inodeinfo;
1399     + struct super_block *sb = inode->i_sb;
1400     + struct dentry *dentry = file->f_path.dentry;
1401     + struct dentry *parent;
1402     + int bindex, bstart, bend;
1403     + int fgen, err = 0;
1404     +
1405     + /*
1406     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
1407     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
1408     + * has been causing false positives in file system stacking layers.
1409     + * In particular, our ->mmap is called after sys_mmap2 already holds
1410     + * mmap_sem, then we lock our own mutexes; but earlier, it's
1411     + * possible for lockdep to have locked our mutexes first, and then
1412     + * we call a lower ->readdir which could call might_fault. The
1413     + * different ordering of the locks is what lockdep complains about
1414     + * -- unnecessarily. Therefore, we have no choice but to
1415     + * temporarily turn off lockdep here. Note: the comments
1416     + * inside might_sleep also suggest that it would have been
1417     + * nicer to only annotate paths that need that might_lock_read.
1418     + */
1419     + lockdep_off();
1420     + unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
1421     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1422     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1423     +
1424     + /*
1425     + * We try to revalidate, but the VFS ignores return values
1426     + * from file->release, so we must always try to succeed here,
1427     + * including doing the kfree and dput below. So if revalidation
1428     + * failed, all we can do is print some message and keep going.
1429     + */
1430     + err = unionfs_file_revalidate(file, parent,
1431     + UNIONFS_F(file)->wrote_to_file);
1432     + if (!err)
1433     + unionfs_check_file(file);
1434     + fileinfo = UNIONFS_F(file);
1435     + BUG_ON(file->f_path.dentry->d_inode != inode);
1436     + inodeinfo = UNIONFS_I(inode);
1437     +
1438     + /* fput all the lower files */
1439     + fgen = atomic_read(&fileinfo->generation);
1440     + bstart = fbstart(file);
1441     + bend = fbend(file);
1442     +
1443     + for (bindex = bstart; bindex <= bend; bindex++) {
1444     + lower_file = unionfs_lower_file_idx(file, bindex);
1445     +
1446     + if (lower_file) {
1447     + unionfs_set_lower_file_idx(file, bindex, NULL);
1448     + fput(lower_file);
1449     + branchput(sb, bindex);
1450     + }
1451     +
1452     + /* if there are no more refs to the dentry, dput it */
1453     + if (d_deleted(dentry)) {
1454     + dput(unionfs_lower_dentry_idx(dentry, bindex));
1455     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1456     + }
1457     + }
1458     +
1459     + kfree(fileinfo->lower_files);
1460     + kfree(fileinfo->saved_branch_ids);
1461     +
1462     + if (fileinfo->rdstate) {
1463     + fileinfo->rdstate->access = jiffies;
1464     + spin_lock(&inodeinfo->rdlock);
1465     + inodeinfo->rdcount++;
1466     + list_add_tail(&fileinfo->rdstate->cache,
1467     + &inodeinfo->readdircache);
1468     + mark_inode_dirty(inode);
1469     + spin_unlock(&inodeinfo->rdlock);
1470     + fileinfo->rdstate = NULL;
1471     + }
1472     + kfree(fileinfo);
1473     +
1474     + unionfs_unlock_dentry(dentry);
1475     + unionfs_unlock_parent(dentry, parent);
1476     + unionfs_read_unlock(sb);
1477     + lockdep_on();
1478     + return err;
1479     +}
1480     +
1481     +/* pass the ioctl to the lower fs */
1482     +static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1483     +{
1484     + struct file *lower_file;
1485     + int err;
1486     +
1487     + lower_file = unionfs_lower_file(file);
1488     +
1489     + err = -ENOTTY;
1490     + if (!lower_file || !lower_file->f_op)
1491     + goto out;
1492     + if (lower_file->f_op->unlocked_ioctl) {
1493     + err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
1494     +#ifdef CONFIG_COMPAT
1495     + } else if (lower_file->f_op->compat_ioctl) {
1496     + err = lower_file->f_op->compat_ioctl(
1497     + lower_file,
1498     + cmd, arg);
1499     +#endif
1500     + }
1501     +
1502     +out:
1503     + return err;
1504     +}
1505     +
1506     +/*
1507     + * return to user-space the branch indices containing the file in question
1508     + *
1509     + * We use fd_set, which limits the number of branches to FD_SETSIZE
1510     + * (currently 1024) - plenty for most people
1511     + */
1512     +static int unionfs_ioctl_queryfile(struct file *file, struct dentry *parent,
1513     + unsigned int cmd, unsigned long arg)
1514     +{
1515     + int err = 0;
1516     + fd_set branchlist;
1517     + int bstart = 0, bend = 0, bindex = 0;
1518     + int orig_bstart, orig_bend;
1519     + struct dentry *dentry, *lower_dentry;
1520     + struct vfsmount *mnt;
1521     +
1522     + dentry = file->f_path.dentry;
1523     + orig_bstart = dbstart(dentry);
1524     + orig_bend = dbend(dentry);
1525     + err = unionfs_partial_lookup(dentry, parent);
1526     + if (err)
1527     + goto out;
1528     + bstart = dbstart(dentry);
1529     + bend = dbend(dentry);
1530     +
1531     + FD_ZERO(&branchlist);
1532     +
1533     + for (bindex = bstart; bindex <= bend; bindex++) {
1534     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
1535     + if (!lower_dentry)
1536     + continue;
1537     + if (likely(lower_dentry->d_inode))
1538     + FD_SET(bindex, &branchlist);
1539     + /* purge any lower objects after partial_lookup */
1540     + if (bindex < orig_bstart || bindex > orig_bend) {
1541     + dput(lower_dentry);
1542     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1543     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1544     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1545     + NULL);
1546     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
1547     + if (!mnt)
1548     + continue;
1549     + unionfs_mntput(dentry, bindex);
1550     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1551     + }
1552     + }
1553     + /* restore original dentry's offsets */
1554     + dbstart(dentry) = orig_bstart;
1555     + dbend(dentry) = orig_bend;
1556     + ibstart(dentry->d_inode) = orig_bstart;
1557     + ibend(dentry->d_inode) = orig_bend;
1558     +
1559     + err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
1560     + if (unlikely(err))
1561     + err = -EFAULT;
1562     +
1563     +out:
1564     + return err < 0 ? err : bend;
1565     +}
1566     +
1567     +long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1568     +{
1569     + long err;
1570     + struct dentry *dentry = file->f_path.dentry;
1571     + struct dentry *parent;
1572     +
1573     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1574     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1575     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1576     +
1577     + err = unionfs_file_revalidate(file, parent, true);
1578     + if (unlikely(err))
1579     + goto out;
1580     +
1581     + /* check if asked for local commands */
1582     + switch (cmd) {
1583     + case UNIONFS_IOCTL_INCGEN:
1584     + /* Increment the superblock generation count */
1585     + pr_info("unionfs: incgen ioctl deprecated; "
1586     + "use \"-o remount,incgen\"\n");
1587     + err = -ENOSYS;
1588     + break;
1589     +
1590     + case UNIONFS_IOCTL_QUERYFILE:
1591     + /* Return list of branches containing the given file */
1592     + err = unionfs_ioctl_queryfile(file, parent, cmd, arg);
1593     + break;
1594     +
1595     + default:
1596     + /* pass the ioctl down */
1597     + err = do_ioctl(file, cmd, arg);
1598     + break;
1599     + }
1600     +
1601     +out:
1602     + unionfs_check_file(file);
1603     + unionfs_unlock_dentry(dentry);
1604     + unionfs_unlock_parent(dentry, parent);
1605     + unionfs_read_unlock(dentry->d_sb);
1606     + return err;
1607     +}
1608     +
1609     +int unionfs_flush(struct file *file, fl_owner_t id)
1610     +{
1611     + int err = 0;
1612     + struct file *lower_file = NULL;
1613     + struct dentry *dentry = file->f_path.dentry;
1614     + struct dentry *parent;
1615     + int bindex, bstart, bend;
1616     +
1617     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1618     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1619     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1620     +
1621     + err = unionfs_file_revalidate(file, parent,
1622     + UNIONFS_F(file)->wrote_to_file);
1623     + if (unlikely(err))
1624     + goto out;
1625     + unionfs_check_file(file);
1626     +
1627     + bstart = fbstart(file);
1628     + bend = fbend(file);
1629     + for (bindex = bstart; bindex <= bend; bindex++) {
1630     + lower_file = unionfs_lower_file_idx(file, bindex);
1631     +
1632     + if (lower_file && lower_file->f_op &&
1633     + lower_file->f_op->flush) {
1634     + err = lower_file->f_op->flush(lower_file, id);
1635     + if (err)
1636     + goto out;
1637     + }
1638     +
1639     + }
1640     +
1641     +out:
1642     + if (!err)
1643     + unionfs_check_file(file);
1644     + unionfs_unlock_dentry(dentry);
1645     + unionfs_unlock_parent(dentry, parent);
1646     + unionfs_read_unlock(dentry->d_sb);
1647     + return err;
1648     +}
1649     diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
1650     new file mode 100644
1651     index 0000000..bba3a75
1652     --- /dev/null
1653     +++ b/fs/unionfs/copyup.c
1654     @@ -0,0 +1,896 @@
1655     +/*
1656     + * Copyright (c) 2003-2010 Erez Zadok
1657     + * Copyright (c) 2003-2006 Charles P. Wright
1658     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
1659     + * Copyright (c) 2005-2006 Junjiro Okajima
1660     + * Copyright (c) 2005 Arun M. Krishnakumar
1661     + * Copyright (c) 2004-2006 David P. Quigley
1662     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
1663     + * Copyright (c) 2003 Puja Gupta
1664     + * Copyright (c) 2003 Harikesavan Krishnan
1665     + * Copyright (c) 2003-2010 Stony Brook University
1666     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
1667     + *
1668     + * This program is free software; you can redistribute it and/or modify
1669     + * it under the terms of the GNU General Public License version 2 as
1670     + * published by the Free Software Foundation.
1671     + */
1672     +
1673     +#include "union.h"
1674     +
1675     +/*
1676     + * For detailed explanation of copyup see:
1677     + * Documentation/filesystems/unionfs/concepts.txt
1678     + */
1679     +
1680     +#ifdef CONFIG_UNION_FS_XATTR
1681     +/* copyup all extended attrs for a given dentry */
1682     +static int copyup_xattrs(struct dentry *old_lower_dentry,
1683     + struct dentry *new_lower_dentry)
1684     +{
1685     + int err = 0;
1686     + ssize_t list_size = -1;
1687     + char *name_list = NULL;
1688     + char *attr_value = NULL;
1689     + char *name_list_buf = NULL;
1690     +
1691     + /* query the actual size of the xattr list */
1692     + list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
1693     + if (list_size <= 0) {
1694     + err = list_size;
1695     + goto out;
1696     + }
1697     +
1698     + /* allocate space for the actual list */
1699     + name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
1700     + if (unlikely(!name_list || IS_ERR(name_list))) {
1701     + err = PTR_ERR(name_list);
1702     + goto out;
1703     + }
1704     +
1705     + name_list_buf = name_list; /* save for kfree at end */
1706     +
1707     + /* now get the actual xattr list of the source file */
1708     + list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
1709     + if (list_size <= 0) {
1710     + err = list_size;
1711     + goto out;
1712     + }
1713     +
1714     + /* allocate space to hold each xattr's value */
1715     + attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
1716     + if (unlikely(!attr_value || IS_ERR(attr_value))) {
1717     + err = PTR_ERR(attr_value);
1718     + goto out;
1719     + }
1720     +
1721     + /* in a loop, get and set each xattr from src to dst file */
1722     + while (*name_list) {
1723     + ssize_t size;
1724     +
1725     + /* Lock here since vfs_getxattr doesn't lock for us */
1726     + mutex_lock(&old_lower_dentry->d_inode->i_mutex);
1727     + size = vfs_getxattr(old_lower_dentry, name_list,
1728     + attr_value, XATTR_SIZE_MAX);
1729     + mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
1730     + if (size < 0) {
1731     + err = size;
1732     + goto out;
1733     + }
1734     + if (size > XATTR_SIZE_MAX) {
1735     + err = -E2BIG;
1736     + goto out;
1737     + }
1738     + /* Don't lock here since vfs_setxattr does it for us. */
1739     + err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
1740     + size, 0);
1741     + /*
1742     + * SELinux depends on "security.*" xattrs, so to maintain
1743     + * the security of copied-up files, if SELinux is active,
1744     + * then we must copy these xattrs as well. So we need to
1745     + * temporarily get FOWNER privileges.
1746     + * XXX: move entire copyup code to SIOQ.
1747     + */
1748     + if (err == -EPERM && !capable(CAP_FOWNER)) {
1749     + const struct cred *old_creds;
1750     + struct cred *new_creds;
1751     +
1752     + new_creds = prepare_creds();
1753     + if (unlikely(!new_creds)) {
1754     + err = -ENOMEM;
1755     + goto out;
1756     + }
1757     + cap_raise(new_creds->cap_effective, CAP_FOWNER);
1758     + old_creds = override_creds(new_creds);
1759     + err = vfs_setxattr(new_lower_dentry, name_list,
1760     + attr_value, size, 0);
1761     + revert_creds(old_creds);
1762     + }
1763     + if (err < 0)
1764     + goto out;
1765     + name_list += strlen(name_list) + 1;
1766     + }
1767     +out:
1768     + unionfs_xattr_kfree(name_list_buf);
1769     + unionfs_xattr_kfree(attr_value);
1770     + /* Ignore if xattr isn't supported */
1771     + if (err == -ENOTSUPP || err == -EOPNOTSUPP)
1772     + err = 0;
1773     + return err;
1774     +}
1775     +#endif /* CONFIG_UNION_FS_XATTR */
1776     +
1777     +/*
1778     + * Determine the mode based on the copyup flags, and the existing dentry.
1779     + *
1780     + * Handle file systems which may not support certain options. For example
1781     + * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless
1782     + * errors, rather than propagating them up, which results in copyup errors
1783     + * and errors returned back to users.
1784     + */
1785     +static int copyup_permissions(struct super_block *sb,
1786     + struct dentry *old_lower_dentry,
1787     + struct dentry *new_lower_dentry)
1788     +{
1789     + struct inode *i = old_lower_dentry->d_inode;
1790     + struct iattr newattrs;
1791     + int err;
1792     +
1793     + newattrs.ia_atime = i->i_atime;
1794     + newattrs.ia_mtime = i->i_mtime;
1795     + newattrs.ia_ctime = i->i_ctime;
1796     + newattrs.ia_gid = i->i_gid;
1797     + newattrs.ia_uid = i->i_uid;
1798     + newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
1799     + ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
1800     + ATTR_GID | ATTR_UID;
1801     + mutex_lock(&new_lower_dentry->d_inode->i_mutex);
1802     + err = notify_change(new_lower_dentry, &newattrs);
1803     + if (err)
1804     + goto out;
1805     +
1806     + /* now try to change the mode and ignore EOPNOTSUPP on symlinks */
1807     + newattrs.ia_mode = i->i_mode;
1808     + newattrs.ia_valid = ATTR_MODE | ATTR_FORCE;
1809     + err = notify_change(new_lower_dentry, &newattrs);
1810     + if (err == -EOPNOTSUPP &&
1811     + S_ISLNK(new_lower_dentry->d_inode->i_mode)) {
1812     + printk(KERN_WARNING
1813     + "unionfs: changing \"%s\" symlink mode unsupported\n",
1814     + new_lower_dentry->d_name.name);
1815     + err = 0;
1816     + }
1817     +
1818     +out:
1819     + mutex_unlock(&new_lower_dentry->d_inode->i_mutex);
1820     + return err;
1821     +}
1822     +
1823     +/*
1824     + * create the new device/file/directory - use copyup_permissions to copyup
1825     + * times, and mode
1826     + *
1827     + * if the object being copied up is a regular file, the file is only created,
1828     + * the contents have to be copied up separately
1829     + */
1830     +static int __copyup_ndentry(struct dentry *old_lower_dentry,
1831     + struct dentry *new_lower_dentry,
1832     + struct dentry *new_lower_parent_dentry,
1833     + char *symbuf)
1834     +{
1835     + int err = 0;
1836     + umode_t old_mode = old_lower_dentry->d_inode->i_mode;
1837     + struct sioq_args args;
1838     +
1839     + if (S_ISDIR(old_mode)) {
1840     + args.mkdir.parent = new_lower_parent_dentry->d_inode;
1841     + args.mkdir.dentry = new_lower_dentry;
1842     + args.mkdir.mode = old_mode;
1843     +
1844     + run_sioq(__unionfs_mkdir, &args);
1845     + err = args.err;
1846     + } else if (S_ISLNK(old_mode)) {
1847     + args.symlink.parent = new_lower_parent_dentry->d_inode;
1848     + args.symlink.dentry = new_lower_dentry;
1849     + args.symlink.symbuf = symbuf;
1850     +
1851     + run_sioq(__unionfs_symlink, &args);
1852     + err = args.err;
1853     + } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) ||
1854     + S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
1855     + args.mknod.parent = new_lower_parent_dentry->d_inode;
1856     + args.mknod.dentry = new_lower_dentry;
1857     + args.mknod.mode = old_mode;
1858     + args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
1859     +
1860     + run_sioq(__unionfs_mknod, &args);
1861     + err = args.err;
1862     + } else if (S_ISREG(old_mode)) {
1863     + struct nameidata nd;
1864     + err = init_lower_nd(&nd, LOOKUP_CREATE);
1865     + if (unlikely(err < 0))
1866     + goto out;
1867     + args.create.nd = &nd;
1868     + args.create.parent = new_lower_parent_dentry->d_inode;
1869     + args.create.dentry = new_lower_dentry;
1870     + args.create.mode = old_mode;
1871     +
1872     + run_sioq(__unionfs_create, &args);
1873     + err = args.err;
1874     + release_lower_nd(&nd, err);
1875     + } else {
1876     + printk(KERN_CRIT "unionfs: unknown inode type %d\n",
1877     + old_mode);
1878     + BUG();
1879     + }
1880     +
1881     +out:
1882     + return err;
1883     +}
1884     +
1885     +static int __copyup_reg_data(struct dentry *dentry,
1886     + struct dentry *new_lower_dentry, int new_bindex,
1887     + struct dentry *old_lower_dentry, int old_bindex,
1888     + struct file **copyup_file, loff_t len)
1889     +{
1890     + struct super_block *sb = dentry->d_sb;
1891     + struct file *input_file;
1892     + struct file *output_file;
1893     + struct vfsmount *output_mnt;
1894     + mm_segment_t old_fs;
1895     + char *buf = NULL;
1896     + ssize_t read_bytes, write_bytes;
1897     + loff_t size;
1898     + int err = 0;
1899     +
1900     + /* open old file */
1901     + unionfs_mntget(dentry, old_bindex);
1902     + branchget(sb, old_bindex);
1903     + /* dentry_open calls dput and mntput if it returns an error */
1904     + input_file = dentry_open(old_lower_dentry,
1905     + unionfs_lower_mnt_idx(dentry, old_bindex),
1906     + O_RDONLY | O_LARGEFILE, current_cred());
1907     + if (IS_ERR(input_file)) {
1908     + dput(old_lower_dentry);
1909     + err = PTR_ERR(input_file);
1910     + goto out;
1911     + }
1912     + if (unlikely(!input_file->f_op || !input_file->f_op->read)) {
1913     + err = -EINVAL;
1914     + goto out_close_in;
1915     + }
1916     +
1917     + /* open new file */
1918     + dget(new_lower_dentry);
1919     + output_mnt = unionfs_mntget(sb->s_root, new_bindex);
1920     + branchget(sb, new_bindex);
1921     + output_file = dentry_open(new_lower_dentry, output_mnt,
1922     + O_RDWR | O_LARGEFILE, current_cred());
1923     + if (IS_ERR(output_file)) {
1924     + err = PTR_ERR(output_file);
1925     + goto out_close_in2;
1926     + }
1927     + if (unlikely(!output_file->f_op || !output_file->f_op->write)) {
1928     + err = -EINVAL;
1929     + goto out_close_out;
1930     + }
1931     +
1932     + /* allocating a buffer */
1933     + buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
1934     + if (unlikely(!buf)) {
1935     + err = -ENOMEM;
1936     + goto out_close_out;
1937     + }
1938     +
1939     + input_file->f_pos = 0;
1940     + output_file->f_pos = 0;
1941     +
1942     + old_fs = get_fs();
1943     + set_fs(KERNEL_DS);
1944     +
1945     + size = len;
1946     + err = 0;
1947     + do {
1948     + if (len >= PAGE_SIZE)
1949     + size = PAGE_SIZE;
1950     + else if ((len < PAGE_SIZE) && (len > 0))
1951     + size = len;
1952     +
1953     + len -= PAGE_SIZE;
1954     +
1955     + read_bytes =
1956     + input_file->f_op->read(input_file,
1957     + (char __user *)buf, size,
1958     + &input_file->f_pos);
1959     + if (read_bytes <= 0) {
1960     + err = read_bytes;
1961     + break;
1962     + }
1963     +
1964     + /* see Documentation/filesystems/unionfs/issues.txt */
1965     + lockdep_off();
1966     + write_bytes =
1967     + output_file->f_op->write(output_file,
1968     + (char __user *)buf,
1969     + read_bytes,
1970     + &output_file->f_pos);
1971     + lockdep_on();
1972     + if ((write_bytes < 0) || (write_bytes < read_bytes)) {
1973     + err = write_bytes;
1974     + break;
1975     + }
1976     + } while ((read_bytes > 0) && (len > 0));
1977     +
1978     + set_fs(old_fs);
1979     +
1980     + kfree(buf);
1981     +
1982     + if (!err)
1983     + err = output_file->f_op->fsync(output_file, 0);
1984     +
1985     + if (err)
1986     + goto out_close_out;
1987     +
1988     + if (copyup_file) {
1989     + *copyup_file = output_file;
1990     + goto out_close_in;
1991     + }
1992     +
1993     +out_close_out:
1994     + fput(output_file);
1995     +
1996     +out_close_in2:
1997     + branchput(sb, new_bindex);
1998     +
1999     +out_close_in:
2000     + fput(input_file);
2001     +
2002     +out:
2003     + branchput(sb, old_bindex);
2004     +
2005     + return err;
2006     +}
2007     +
2008     +/*
2009     + * dput the lower references for old and new dentry & clear a lower dentry
2010     + * pointer
2011     + */
2012     +static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry,
2013     + int old_bstart, int old_bend,
2014     + struct dentry *new_lower_dentry, int new_bindex)
2015     +{
2016     + /* get rid of the lower dentry and all its traces */
2017     + unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL);
2018     + dbstart(dentry) = old_bstart;
2019     + dbend(dentry) = old_bend;
2020     +
2021     + dput(new_lower_dentry);
2022     + dput(old_lower_dentry);
2023     +}
2024     +
2025     +/*
2026     + * Copy up a dentry to a file of specified name.
2027     + *
2028     + * @dir: used to pull the ->i_sb to access other branches
2029     + * @dentry: the non-negative dentry whose lower_inode we should copy
2030     + * @bstart: the branch of the lower_inode to copy from
2031     + * @new_bindex: the branch to create the new file in
2032     + * @name: the name of the file to create
2033     + * @namelen: length of @name
2034     + * @copyup_file: the "struct file" to return (optional)
2035     + * @len: how many bytes to copy-up?
2036     + */
2037     +int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
2038     + int new_bindex, const char *name, int namelen,
2039     + struct file **copyup_file, loff_t len)
2040     +{
2041     + struct dentry *new_lower_dentry;
2042     + struct dentry *old_lower_dentry = NULL;
2043     + struct super_block *sb;
2044     + int err = 0;
2045     + int old_bindex;
2046     + int old_bstart;
2047     + int old_bend;
2048     + struct dentry *new_lower_parent_dentry = NULL;
2049     + mm_segment_t oldfs;
2050     + char *symbuf = NULL;
2051     +
2052     + verify_locked(dentry);
2053     +
2054     + old_bindex = bstart;
2055     + old_bstart = dbstart(dentry);
2056     + old_bend = dbend(dentry);
2057     +
2058     + BUG_ON(new_bindex < 0);
2059     + BUG_ON(new_bindex >= old_bindex);
2060     +
2061     + sb = dir->i_sb;
2062     +
2063     + err = is_robranch_super(sb, new_bindex);
2064     + if (err)
2065     + goto out;
2066     +
2067     + /* Create the directory structure above this dentry. */
2068     + new_lower_dentry = create_parents(dir, dentry, name, new_bindex);
2069     + if (IS_ERR(new_lower_dentry)) {
2070     + err = PTR_ERR(new_lower_dentry);
2071     + goto out;
2072     + }
2073     +
2074     + old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex);
2075     + /* we conditionally dput this old_lower_dentry at end of function */
2076     + dget(old_lower_dentry);
2077     +
2078     + /* For symlinks, we must read the link before we lock the directory. */
2079     + if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) {
2080     +
2081     + symbuf = kmalloc(PATH_MAX, GFP_KERNEL);
2082     + if (unlikely(!symbuf)) {
2083     + __clear(dentry, old_lower_dentry,
2084     + old_bstart, old_bend,
2085     + new_lower_dentry, new_bindex);
2086     + err = -ENOMEM;
2087     + goto out_free;
2088     + }
2089     +
2090     + oldfs = get_fs();
2091     + set_fs(KERNEL_DS);
2092     + err = old_lower_dentry->d_inode->i_op->readlink(
2093     + old_lower_dentry,
2094     + (char __user *)symbuf,
2095     + PATH_MAX);
2096     + set_fs(oldfs);
2097     + if (err < 0) {
2098     + __clear(dentry, old_lower_dentry,
2099     + old_bstart, old_bend,
2100     + new_lower_dentry, new_bindex);
2101     + goto out_free;
2102     + }
2103     + symbuf[err] = '\0';
2104     + }
2105     +
2106     + /* Now we lock the parent, and create the object in the new branch. */
2107     + new_lower_parent_dentry = lock_parent(new_lower_dentry);
2108     +
2109     + /* create the new inode */
2110     + err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
2111     + new_lower_parent_dentry, symbuf);
2112     +
2113     + if (err) {
2114     + __clear(dentry, old_lower_dentry,
2115     + old_bstart, old_bend,
2116     + new_lower_dentry, new_bindex);
2117     + goto out_unlock;
2118     + }
2119     +
2120     + /* We actually copyup the file here. */
2121     + if (S_ISREG(old_lower_dentry->d_inode->i_mode))
2122     + err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex,
2123     + old_lower_dentry, old_bindex,
2124     + copyup_file, len);
2125     + if (err)
2126     + goto out_unlink;
2127     +
2128     + /* Set permissions. */
2129     + err = copyup_permissions(sb, old_lower_dentry, new_lower_dentry);
2130     + if (err)
2131     + goto out_unlink;
2132     +
2133     +#ifdef CONFIG_UNION_FS_XATTR
2134     + /* SELinux uses extended attributes for permissions. */
2135     + err = copyup_xattrs(old_lower_dentry, new_lower_dentry);
2136     + if (err)
2137     + goto out_unlink;
2138     +#endif /* CONFIG_UNION_FS_XATTR */
2139     +
2140     + /* do not allow files getting deleted to be re-interposed */
2141     + if (!d_deleted(dentry))
2142     + unionfs_reinterpose(dentry);
2143     +
2144     + goto out_unlock;
2145     +
2146     +out_unlink:
2147     + /*
2148     + * copyup failed, because we possibly ran out of space or
2149     + * quota, or something else happened so let's unlink; we don't
2150     + * really care about the return value of vfs_unlink
2151     + */
2152     + vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
2153     +
2154     + if (copyup_file) {
2155     + /* need to close the file */
2156     +
2157     + fput(*copyup_file);
2158     + branchput(sb, new_bindex);
2159     + }
2160     +
2161     + /*
2162     + * TODO: should we reset the error to something like -EIO?
2163     + *
2164     + * If we don't reset, the user may get some nonsensical errors, but
2165     + * on the other hand, if we reset to EIO, we guarantee that the user
2166     + * will get a "confusing" error message.
2167     + */
2168     +
2169     +out_unlock:
2170     + unlock_dir(new_lower_parent_dentry);
2171     +
2172     +out_free:
2173     + /*
2174     + * If old_lower_dentry was not a file, then we need to dput it. If
2175     + * it was a file, then it was already dput indirectly by other
2176     + * functions we call above which operate on regular files.
2177     + */
2178     + if (old_lower_dentry && old_lower_dentry->d_inode &&
2179     + !S_ISREG(old_lower_dentry->d_inode->i_mode))
2180     + dput(old_lower_dentry);
2181     + kfree(symbuf);
2182     +
2183     + if (err) {
2184     + /*
2185     + * if directory creation succeeded, but inode copyup failed,
2186     + * then purge new dentries.
2187     + */
2188     + if (dbstart(dentry) < old_bstart &&
2189     + ibstart(dentry->d_inode) > dbstart(dentry))
2190     + __clear(dentry, NULL, old_bstart, old_bend,
2191     + unionfs_lower_dentry(dentry), dbstart(dentry));
2192     + goto out;
2193     + }
2194     + if (!S_ISDIR(dentry->d_inode->i_mode)) {
2195     + unionfs_postcopyup_release(dentry);
2196     + if (!unionfs_lower_inode(dentry->d_inode)) {
2197     + /*
2198     + * If we got here, then we copied up to an
2199     + * unlinked-open file, whose name is .unionfsXXXXX.
2200     + */
2201     + struct inode *inode = new_lower_dentry->d_inode;
2202     + atomic_inc(&inode->i_count);
2203     + unionfs_set_lower_inode_idx(dentry->d_inode,
2204     + ibstart(dentry->d_inode),
2205     + inode);
2206     + }
2207     + }
2208     + unionfs_postcopyup_setmnt(dentry);
2209     + /* sync inode times from copied-up inode to our inode */
2210     + unionfs_copy_attr_times(dentry->d_inode);
2211     + unionfs_check_inode(dir);
2212     + unionfs_check_dentry(dentry);
2213     +out:
2214     + return err;
2215     +}
2216     +
2217     +/*
2218     + * This function creates a copy of a file represented by 'file' which
2219     + * currently resides in branch 'bstart' to branch 'new_bindex'. The copy
2220     + * will be named "name".
2221     + */
2222     +int copyup_named_file(struct inode *dir, struct file *file, char *name,
2223     + int bstart, int new_bindex, loff_t len)
2224     +{
2225     + int err = 0;
2226     + struct file *output_file = NULL;
2227     +
2228     + err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex,
2229     + name, strlen(name), &output_file, len);
2230     + if (!err) {
2231     + fbstart(file) = new_bindex;
2232     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2233     + }
2234     +
2235     + return err;
2236     +}
2237     +
2238     +/*
2239     + * This function creates a copy of a file represented by 'file' which
2240     + * currently resides in branch 'bstart' to branch 'new_bindex'.
2241     + */
2242     +int copyup_file(struct inode *dir, struct file *file, int bstart,
2243     + int new_bindex, loff_t len)
2244     +{
2245     + int err = 0;
2246     + struct file *output_file = NULL;
2247     + struct dentry *dentry = file->f_path.dentry;
2248     +
2249     + err = copyup_dentry(dir, dentry, bstart, new_bindex,
2250     + dentry->d_name.name, dentry->d_name.len,
2251     + &output_file, len);
2252     + if (!err) {
2253     + fbstart(file) = new_bindex;
2254     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2255     + }
2256     +
2257     + return err;
2258     +}
2259     +
2260     +/* purge a dentry's lower-branch states (dput/mntput, etc.) */
2261     +static void __cleanup_dentry(struct dentry *dentry, int bindex,
2262     + int old_bstart, int old_bend)
2263     +{
2264     + int loop_start;
2265     + int loop_end;
2266     + int new_bstart = -1;
2267     + int new_bend = -1;
2268     + int i;
2269     +
2270     + loop_start = min(old_bstart, bindex);
2271     + loop_end = max(old_bend, bindex);
2272     +
2273     + /*
2274     + * This loop sets the bstart and bend for the new dentry by
2275     + * traversing from left to right. It also dputs all negative
2276     + * dentries except bindex
2277     + */
2278     + for (i = loop_start; i <= loop_end; i++) {
2279     + if (!unionfs_lower_dentry_idx(dentry, i))
2280     + continue;
2281     +
2282     + if (i == bindex) {
2283     + new_bend = i;
2284     + if (new_bstart < 0)
2285     + new_bstart = i;
2286     + continue;
2287     + }
2288     +
2289     + if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) {
2290     + dput(unionfs_lower_dentry_idx(dentry, i));
2291     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
2292     +
2293     + unionfs_mntput(dentry, i);
2294     + unionfs_set_lower_mnt_idx(dentry, i, NULL);
2295     + } else {
2296     + if (new_bstart < 0)
2297     + new_bstart = i;
2298     + new_bend = i;
2299     + }
2300     + }
2301     +
2302     + if (new_bstart < 0)
2303     + new_bstart = bindex;
2304     + if (new_bend < 0)
2305     + new_bend = bindex;
2306     + dbstart(dentry) = new_bstart;
2307     + dbend(dentry) = new_bend;
2308     +
2309     +}
2310     +
2311     +/* set lower inode ptr and update bstart & bend if necessary */
2312     +static void __set_inode(struct dentry *upper, struct dentry *lower,
2313     + int bindex)
2314     +{
2315     + unionfs_set_lower_inode_idx(upper->d_inode, bindex,
2316     + igrab(lower->d_inode));
2317     + if (likely(ibstart(upper->d_inode) > bindex))
2318     + ibstart(upper->d_inode) = bindex;
2319     + if (likely(ibend(upper->d_inode) < bindex))
2320     + ibend(upper->d_inode) = bindex;
2321     +
2322     +}
2323     +
2324     +/* set lower dentry ptr and update bstart & bend if necessary */
2325     +static void __set_dentry(struct dentry *upper, struct dentry *lower,
2326     + int bindex)
2327     +{
2328     + unionfs_set_lower_dentry_idx(upper, bindex, lower);
2329     + if (likely(dbstart(upper) > bindex))
2330     + dbstart(upper) = bindex;
2331     + if (likely(dbend(upper) < bindex))
2332     + dbend(upper) = bindex;
2333     +}
2334     +
2335     +/*
2336     + * This function replicates the directory structure up to the given dentry
2337     + * in the bindex branch.
2338     + */
2339     +struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
2340     + const char *name, int bindex)
2341     +{
2342     + int err;
2343     + struct dentry *child_dentry;
2344     + struct dentry *parent_dentry;
2345     + struct dentry *lower_parent_dentry = NULL;
2346     + struct dentry *lower_dentry = NULL;
2347     + const char *childname;
2348     + unsigned int childnamelen;
2349     + int nr_dentry;
2350     + int count = 0;
2351     + int old_bstart;
2352     + int old_bend;
2353     + struct dentry **path = NULL;
2354     + struct super_block *sb;
2355     +
2356     + verify_locked(dentry);
2357     +
2358     + err = is_robranch_super(dir->i_sb, bindex);
2359     + if (err) {
2360     + lower_dentry = ERR_PTR(err);
2361     + goto out;
2362     + }
2363     +
2364     + old_bstart = dbstart(dentry);
2365     + old_bend = dbend(dentry);
2366     +
2367     + lower_dentry = ERR_PTR(-ENOMEM);
2368     +
2369     + /* There is no sense allocating any less than the minimum. */
2370     + nr_dentry = 1;
2371     + path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL);
2372     + if (unlikely(!path))
2373     + goto out;
2374     +
2375     + /* assume the negative dentry of unionfs as the parent dentry */
2376     + parent_dentry = dentry;
2377     +
2378     + /*
2379     + * This loop finds the first parent that exists in the given branch.
2380     + * We start building the directory structure from there. At the end
2381     + * of the loop, the following should hold:
2382     + * - child_dentry is the first nonexistent child
2383     + * - parent_dentry is the first existent parent
2384     + * - path[0] is the deepest child
2385     + * - path[count] is the first child to create
2386     + */
2387     + do {
2388     + child_dentry = parent_dentry;
2389     +
2390     + /* find the parent directory dentry in unionfs */
2391     + parent_dentry = dget_parent(child_dentry);
2392     +
2393     + /* find out the lower_parent_dentry in the given branch */
2394     + lower_parent_dentry =
2395     + unionfs_lower_dentry_idx(parent_dentry, bindex);
2396     +
2397     + /* grow path table */
2398     + if (count == nr_dentry) {
2399     + void *p;
2400     +
2401     + nr_dentry *= 2;
2402     + p = krealloc(path, nr_dentry * sizeof(struct dentry *),
2403     + GFP_KERNEL);
2404     + if (unlikely(!p)) {
2405     + lower_dentry = ERR_PTR(-ENOMEM);
2406     + goto out;
2407     + }
2408     + path = p;
2409     + }
2410     +
2411     + /* store the child dentry */
2412     + path[count++] = child_dentry;
2413     + } while (!lower_parent_dentry);
2414     + count--;
2415     +
2416     + sb = dentry->d_sb;
2417     +
2418     + /*
2419     + * This code goes between the begin/out labels and basically
2420     + * emulates a while(child_dentry != dentry), only cleaner and
2421     + * shorter than what would be a much longer while loop.
2422     + */
2423     +begin:
2424     + /* get lower parent dir in the current branch */
2425     + lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
2426     + dput(parent_dentry);
2427     +
2428     + /* init the values to lookup */
2429     + childname = child_dentry->d_name.name;
2430     + childnamelen = child_dentry->d_name.len;
2431     +
2432     + if (child_dentry != dentry) {
2433     + /* lookup child in the underlying file system */
2434     + lower_dentry = lookup_lck_len(childname, lower_parent_dentry,
2435     + childnamelen);
2436     + if (IS_ERR(lower_dentry))
2437     + goto out;
2438     + } else {
2439     + /*
2440     + * Is the name a whiteout of the child name? Look up the
2441     + * whiteout child in the underlying file system
2442     + */
2443     + lower_dentry = lookup_lck_len(name, lower_parent_dentry,
2444     + strlen(name));
2445     + if (IS_ERR(lower_dentry))
2446     + goto out;
2447     +
2448     + /* Replace the current dentry (if any) with the new one */
2449     + dput(unionfs_lower_dentry_idx(dentry, bindex));
2450     + unionfs_set_lower_dentry_idx(dentry, bindex,
2451     + lower_dentry);
2452     +
2453     + __cleanup_dentry(dentry, bindex, old_bstart, old_bend);
2454     + goto out;
2455     + }
2456     +
2457     + if (lower_dentry->d_inode) {
2458     + /*
2459     + * since this already exists we dput to avoid
2460     + * multiple references on the same dentry
2461     + */
2462     + dput(lower_dentry);
2463     + } else {
2464     + struct sioq_args args;
2465     +
2466     + /* it's a negative dentry, create a new dir */
2467     + lower_parent_dentry = lock_parent(lower_dentry);
2468     +
2469     + args.mkdir.parent = lower_parent_dentry->d_inode;
2470     + args.mkdir.dentry = lower_dentry;
2471     + args.mkdir.mode = child_dentry->d_inode->i_mode;
2472     +
2473     + run_sioq(__unionfs_mkdir, &args);
2474     + err = args.err;
2475     +
2476     + if (!err)
2477     + err = copyup_permissions(dir->i_sb, child_dentry,
2478     + lower_dentry);
2479     + unlock_dir(lower_parent_dentry);
2480     + if (err) {
2481     + dput(lower_dentry);
2482     + lower_dentry = ERR_PTR(err);
2483     + goto out;
2484     + }
2485     +
2486     + }
2487     +
2488     + __set_inode(child_dentry, lower_dentry, bindex);
2489     + __set_dentry(child_dentry, lower_dentry, bindex);
2490     + /*
2491     + * update times of this dentry, but also the parent, because if
2492     + * we changed, the parent may have changed too.
2493     + */
2494     + fsstack_copy_attr_times(parent_dentry->d_inode,
2495     + lower_parent_dentry->d_inode);
2496     + unionfs_copy_attr_times(child_dentry->d_inode);
2497     +
2498     + parent_dentry = child_dentry;
2499     + child_dentry = path[--count];
2500     + goto begin;
2501     +out:
2502     + /* drop any leftover references from the do/while loop above */
2503     + if (IS_ERR(lower_dentry))
2504     + while (count)
2505     + dput(path[count--]);
2506     + kfree(path);
2507     + return lower_dentry;
2508     +}
2509     +
2510     +/*
2511     + * Post-copyup helper to ensure we have valid mnts: set lower mnt of
2512     + * dentry+parents to the first parent node that has an mnt.
2513     + */
2514     +void unionfs_postcopyup_setmnt(struct dentry *dentry)
2515     +{
2516     + struct dentry *parent, *hasone;
2517     + int bindex = dbstart(dentry);
2518     +
2519     + if (unionfs_lower_mnt_idx(dentry, bindex))
2520     + return;
2521     + hasone = dentry->d_parent;
2522     + /* this loop should stop at root dentry */
2523     + while (!unionfs_lower_mnt_idx(hasone, bindex))
2524     + hasone = hasone->d_parent;
2525     + parent = dentry;
2526     + while (!unionfs_lower_mnt_idx(parent, bindex)) {
2527     + unionfs_set_lower_mnt_idx(parent, bindex,
2528     + unionfs_mntget(hasone, bindex));
2529     + parent = parent->d_parent;
2530     + }
2531     +}
2532     +
2533     +/*
2534     + * Post-copyup helper to release all non-directory source objects of a
2535     + * copied-up file. Regular files should have only one lower object.
2536     + */
2537     +void unionfs_postcopyup_release(struct dentry *dentry)
2538     +{
2539     + int bstart, bend;
2540     +
2541     + BUG_ON(S_ISDIR(dentry->d_inode->i_mode));
2542     + bstart = dbstart(dentry);
2543     + bend = dbend(dentry);
2544     +
2545     + path_put_lowers(dentry, bstart + 1, bend, false);
2546     + iput_lowers(dentry->d_inode, bstart + 1, bend, false);
2547     +
2548     + dbend(dentry) = bstart;
2549     + ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bstart;
2550     +}
2551     diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
2552     new file mode 100644
2553     index 0000000..100d2c6
2554     --- /dev/null
2555     +++ b/fs/unionfs/debug.c
2556     @@ -0,0 +1,532 @@
2557     +/*
2558     + * Copyright (c) 2003-2010 Erez Zadok
2559     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2560     + * Copyright (c) 2003-2010 Stony Brook University
2561     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
2562     + *
2563     + * This program is free software; you can redistribute it and/or modify
2564     + * it under the terms of the GNU General Public License version 2 as
2565     + * published by the Free Software Foundation.
2566     + */
2567     +
2568     +#include "union.h"
2569     +
2570     +/*
2571     + * Helper debugging functions for maintainers (and for users to report
2572     + * useful information back to maintainers)
2573     + */
2574     +
2575     +/* it's always useful to know what part of the code called us */
2576     +#define PRINT_CALLER(fname, fxn, line) \
2577     + do { \
2578     + if (!printed_caller) { \
2579     + pr_debug("PC:%s:%s:%d\n", (fname), (fxn), (line)); \
2580     + printed_caller = 1; \
2581     + } \
2582     + } while (0)
2583     +
2584     +/*
2585     + * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
2586     + * the fan-out of various Unionfs objects. We check that no lower objects
2587     + * exist outside the start/end branch range; that all objects within are
2588     + * non-NULL (with some allowed exceptions); that for every lower file
2589     + * there's a lower dentry+inode; that the start/end ranges match for all
2590     + * corresponding lower objects; that open files/symlinks have only one lower
2591     + * object, but directories can have several; and more.
2592     + */
2593     +void __unionfs_check_inode(const struct inode *inode,
2594     + const char *fname, const char *fxn, int line)
2595     +{
2596     + int bindex;
2597     + int istart, iend;
2598     + struct inode *lower_inode;
2599     + struct super_block *sb;
2600     + int printed_caller = 0;
2601     + void *poison_ptr;
2602     +
2603     + /* for inodes now */
2604     + BUG_ON(!inode);
2605     + sb = inode->i_sb;
2606     + istart = ibstart(inode);
2607     + iend = ibend(inode);
2608     + /* don't check inode if no lower branches */
2609     + if (istart < 0 && iend < 0)
2610     + return;
2611     + if (unlikely(istart > iend)) {
2612     + PRINT_CALLER(fname, fxn, line);
2613     + pr_debug(" Ci0: inode=%p istart/end=%d:%d\n",
2614     + inode, istart, iend);
2615     + }
2616     + if (unlikely((istart == -1 && iend != -1) ||
2617     + (istart != -1 && iend == -1))) {
2618     + PRINT_CALLER(fname, fxn, line);
2619     + pr_debug(" Ci1: inode=%p istart/end=%d:%d\n",
2620     + inode, istart, iend);
2621     + }
2622     + if (!S_ISDIR(inode->i_mode)) {
2623     + if (unlikely(iend != istart)) {
2624     + PRINT_CALLER(fname, fxn, line);
2625     + pr_debug(" Ci2: inode=%p istart=%d iend=%d\n",
2626     + inode, istart, iend);
2627     + }
2628     + }
2629     +
2630     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2631     + if (unlikely(!UNIONFS_I(inode))) {
2632     + PRINT_CALLER(fname, fxn, line);
2633     + pr_debug(" Ci3: no inode_info %p\n", inode);
2634     + return;
2635     + }
2636     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
2637     + PRINT_CALLER(fname, fxn, line);
2638     + pr_debug(" Ci4: no lower_inodes %p\n", inode);
2639     + return;
2640     + }
2641     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2642     + if (lower_inode) {
2643     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2644     + if (unlikely(bindex < istart || bindex > iend)) {
2645     + PRINT_CALLER(fname, fxn, line);
2646     + pr_debug(" Ci5: inode/linode=%p:%p bindex=%d "
2647     + "istart/end=%d:%d\n", inode,
2648     + lower_inode, bindex, istart, iend);
2649     + } else if (unlikely(lower_inode == poison_ptr)) {
2650     + /* freed inode! */
2651     + PRINT_CALLER(fname, fxn, line);
2652     + pr_debug(" Ci6: inode/linode=%p:%p bindex=%d "
2653     + "istart/end=%d:%d\n", inode,
2654     + lower_inode, bindex, istart, iend);
2655     + }
2656     + continue;
2657     + }
2658     + /* if we get here, then lower_inode == NULL */
2659     + if (bindex < istart || bindex > iend)
2660     + continue;
2661     + /*
2662     + * directories can have NULL lower inodes in b/t start/end,
2663     + * but NOT if at the start/end range.
2664     + */
2665     + if (unlikely(S_ISDIR(inode->i_mode) &&
2666     + bindex > istart && bindex < iend))
2667     + continue;
2668     + PRINT_CALLER(fname, fxn, line);
2669     + pr_debug(" Ci7: inode/linode=%p:%p "
2670     + "bindex=%d istart/end=%d:%d\n",
2671     + inode, lower_inode, bindex, istart, iend);
2672     + }
2673     +}
2674     +
2675     +void __unionfs_check_dentry(const struct dentry *dentry,
2676     + const char *fname, const char *fxn, int line)
2677     +{
2678     + int bindex;
2679     + int dstart, dend, istart, iend;
2680     + struct dentry *lower_dentry;
2681     + struct inode *inode, *lower_inode;
2682     + struct super_block *sb;
2683     + struct vfsmount *lower_mnt;
2684     + int printed_caller = 0;
2685     + void *poison_ptr;
2686     +
2687     + BUG_ON(!dentry);
2688     + sb = dentry->d_sb;
2689     + inode = dentry->d_inode;
2690     + dstart = dbstart(dentry);
2691     + dend = dbend(dentry);
2692     + /* don't check dentry/mnt if no lower branches */
2693     + if (dstart < 0 && dend < 0)
2694     + goto check_inode;
2695     + BUG_ON(dstart > dend);
2696     +
2697     + if (unlikely((dstart == -1 && dend != -1) ||
2698     + (dstart != -1 && dend == -1))) {
2699     + PRINT_CALLER(fname, fxn, line);
2700     + pr_debug(" CD0: dentry=%p dstart/end=%d:%d\n",
2701     + dentry, dstart, dend);
2702     + }
2703     + /*
2704     + * check for NULL dentries inside the start/end range, or
2705     + * non-NULL dentries outside the start/end range.
2706     + */
2707     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2708     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
2709     + if (lower_dentry) {
2710     + if (unlikely(bindex < dstart || bindex > dend)) {
2711     + PRINT_CALLER(fname, fxn, line);
2712     + pr_debug(" CD1: dentry/lower=%p:%p(%p) "
2713     + "bindex=%d dstart/end=%d:%d\n",
2714     + dentry, lower_dentry,
2715     + (lower_dentry ? lower_dentry->d_inode :
2716     + (void *) -1L),
2717     + bindex, dstart, dend);
2718     + }
2719     + } else { /* lower_dentry == NULL */
2720     + if (bindex < dstart || bindex > dend)
2721     + continue;
2722     + /*
2723     + * Directories can have NULL lower inodes in b/t
2724     + * start/end, but NOT if at the start/end range.
2725     + * Ignore this rule, however, if this is a NULL
2726     + * dentry or a deleted dentry.
2727     + */
2728     + if (unlikely(!d_deleted((struct dentry *) dentry) &&
2729     + inode &&
2730     + !(inode && S_ISDIR(inode->i_mode) &&
2731     + bindex > dstart && bindex < dend))) {
2732     + PRINT_CALLER(fname, fxn, line);
2733     + pr_debug(" CD2: dentry/lower=%p:%p(%p) "
2734     + "bindex=%d dstart/end=%d:%d\n",
2735     + dentry, lower_dentry,
2736     + (lower_dentry ?
2737     + lower_dentry->d_inode :
2738     + (void *) -1L),
2739     + bindex, dstart, dend);
2740     + }
2741     + }
2742     + }
2743     +
2744     + /* check for vfsmounts same as for dentries */
2745     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2746     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2747     + if (lower_mnt) {
2748     + if (unlikely(bindex < dstart || bindex > dend)) {
2749     + PRINT_CALLER(fname, fxn, line);
2750     + pr_debug(" CM0: dentry/lmnt=%p:%p bindex=%d "
2751     + "dstart/end=%d:%d\n", dentry,
2752     + lower_mnt, bindex, dstart, dend);
2753     + }
2754     + } else { /* lower_mnt == NULL */
2755     + if (bindex < dstart || bindex > dend)
2756     + continue;
2757     + /*
2758     + * Directories can have NULL lower inodes in b/t
2759     + * start/end, but NOT if at the start/end range.
2760     + * Ignore this rule, however, if this is a NULL
2761     + * dentry.
2762     + */
2763     + if (unlikely(inode &&
2764     + !(inode && S_ISDIR(inode->i_mode) &&
2765     + bindex > dstart && bindex < dend))) {
2766     + PRINT_CALLER(fname, fxn, line);
2767     + pr_debug(" CM1: dentry/lmnt=%p:%p "
2768     + "bindex=%d dstart/end=%d:%d\n",
2769     + dentry, lower_mnt, bindex,
2770     + dstart, dend);
2771     + }
2772     + }
2773     + }
2774     +
2775     +check_inode:
2776     + /* for inodes now */
2777     + if (!inode)
2778     + return;
2779     + istart = ibstart(inode);
2780     + iend = ibend(inode);
2781     + /* don't check inode if no lower branches */
2782     + if (istart < 0 && iend < 0)
2783     + return;
2784     + BUG_ON(istart > iend);
2785     + if (unlikely((istart == -1 && iend != -1) ||
2786     + (istart != -1 && iend == -1))) {
2787     + PRINT_CALLER(fname, fxn, line);
2788     + pr_debug(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n",
2789     + dentry, inode, istart, iend);
2790     + }
2791     + if (unlikely(istart != dstart)) {
2792     + PRINT_CALLER(fname, fxn, line);
2793     + pr_debug(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n",
2794     + dentry, inode, istart, dstart);
2795     + }
2796     + if (unlikely(iend != dend)) {
2797     + PRINT_CALLER(fname, fxn, line);
2798     + pr_debug(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n",
2799     + dentry, inode, iend, dend);
2800     + }
2801     +
2802     + if (!S_ISDIR(inode->i_mode)) {
2803     + if (unlikely(dend != dstart)) {
2804     + PRINT_CALLER(fname, fxn, line);
2805     + pr_debug(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n",
2806     + dentry, inode, dstart, dend);
2807     + }
2808     + if (unlikely(iend != istart)) {
2809     + PRINT_CALLER(fname, fxn, line);
2810     + pr_debug(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n",
2811     + dentry, inode, istart, iend);
2812     + }
2813     + }
2814     +
2815     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2816     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2817     + if (lower_inode) {
2818     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2819     + if (unlikely(bindex < istart || bindex > iend)) {
2820     + PRINT_CALLER(fname, fxn, line);
2821     + pr_debug(" CI5: dentry/linode=%p:%p bindex=%d "
2822     + "istart/end=%d:%d\n", dentry,
2823     + lower_inode, bindex, istart, iend);
2824     + } else if (unlikely(lower_inode == poison_ptr)) {
2825     + /* freed inode! */
2826     + PRINT_CALLER(fname, fxn, line);
2827     + pr_debug(" CI6: dentry/linode=%p:%p bindex=%d "
2828     + "istart/end=%d:%d\n", dentry,
2829     + lower_inode, bindex, istart, iend);
2830     + }
2831     + continue;
2832     + }
2833     + /* if we get here, then lower_inode == NULL */
2834     + if (bindex < istart || bindex > iend)
2835     + continue;
2836     + /*
2837     + * directories can have NULL lower inodes in b/t start/end,
2838     + * but NOT if at the start/end range.
2839     + */
2840     + if (unlikely(S_ISDIR(inode->i_mode) &&
2841     + bindex > istart && bindex < iend))
2842     + continue;
2843     + PRINT_CALLER(fname, fxn, line);
2844     + pr_debug(" CI7: dentry/linode=%p:%p "
2845     + "bindex=%d istart/end=%d:%d\n",
2846     + dentry, lower_inode, bindex, istart, iend);
2847     + }
2848     +
2849     + /*
2850     + * If it's a directory, then intermediate objects b/t start/end can
2851     + * be NULL. But, check that all three are NULL: lower dentry, mnt,
2852     + * and inode.
2853     + */
2854     + if (dstart >= 0 && dend >= 0 && S_ISDIR(inode->i_mode))
2855     + for (bindex = dstart+1; bindex < dend; bindex++) {
2856     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2857     + lower_dentry = unionfs_lower_dentry_idx(dentry,
2858     + bindex);
2859     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2860     + if (unlikely(!((lower_inode && lower_dentry &&
2861     + lower_mnt) ||
2862     + (!lower_inode &&
2863     + !lower_dentry && !lower_mnt)))) {
2864     + PRINT_CALLER(fname, fxn, line);
2865     + pr_debug(" Cx: lmnt/ldentry/linode=%p:%p:%p "
2866     + "bindex=%d dstart/end=%d:%d\n",
2867     + lower_mnt, lower_dentry, lower_inode,
2868     + bindex, dstart, dend);
2869     + }
2870     + }
2871     + /* check if lower inode is newer than upper one (it shouldn't be) */
2872     + if (unlikely(is_newer_lower(dentry) && !is_negative_lower(dentry))) {
2873     + PRINT_CALLER(fname, fxn, line);
2874     + for (bindex = ibstart(inode); bindex <= ibend(inode);
2875     + bindex++) {
2876     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2877     + if (unlikely(!lower_inode))
2878     + continue;
2879     + pr_debug(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu "
2880     + "ctime/lctime=%lu.%lu/%lu.%lu\n",
2881     + bindex,
2882     + inode->i_mtime.tv_sec,
2883     + inode->i_mtime.tv_nsec,
2884     + lower_inode->i_mtime.tv_sec,
2885     + lower_inode->i_mtime.tv_nsec,
2886     + inode->i_ctime.tv_sec,
2887     + inode->i_ctime.tv_nsec,
2888     + lower_inode->i_ctime.tv_sec,
2889     + lower_inode->i_ctime.tv_nsec);
2890     + }
2891     + }
2892     +}
2893     +
2894     +void __unionfs_check_file(const struct file *file,
2895     + const char *fname, const char *fxn, int line)
2896     +{
2897     + int bindex;
2898     + int dstart, dend, fstart, fend;
2899     + struct dentry *dentry;
2900     + struct file *lower_file;
2901     + struct inode *inode;
2902     + struct super_block *sb;
2903     + int printed_caller = 0;
2904     +
2905     + BUG_ON(!file);
2906     + dentry = file->f_path.dentry;
2907     + sb = dentry->d_sb;
2908     + dstart = dbstart(dentry);
2909     + dend = dbend(dentry);
2910     + BUG_ON(dstart > dend);
2911     + fstart = fbstart(file);
2912     + fend = fbend(file);
2913     + BUG_ON(fstart > fend);
2914     +
2915     + if (unlikely((fstart == -1 && fend != -1) ||
2916     + (fstart != -1 && fend == -1))) {
2917     + PRINT_CALLER(fname, fxn, line);
2918     + pr_debug(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n",
2919     + file, dentry, fstart, fend);
2920     + }
2921     + if (unlikely(fstart != dstart)) {
2922     + PRINT_CALLER(fname, fxn, line);
2923     + pr_debug(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n",
2924     + file, dentry, fstart, dstart);
2925     + }
2926     + if (unlikely(fend != dend)) {
2927     + PRINT_CALLER(fname, fxn, line);
2928     + pr_debug(" CF2: file/dentry=%p:%p fend=%d dend=%d\n",
2929     + file, dentry, fend, dend);
2930     + }
2931     + inode = dentry->d_inode;
2932     + if (!S_ISDIR(inode->i_mode)) {
2933     + if (unlikely(fend != fstart)) {
2934     + PRINT_CALLER(fname, fxn, line);
2935     + pr_debug(" CF3: file/inode=%p:%p fstart=%d fend=%d\n",
2936     + file, inode, fstart, fend);
2937     + }
2938     + if (unlikely(dend != dstart)) {
2939     + PRINT_CALLER(fname, fxn, line);
2940     + pr_debug(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n",
2941     + file, dentry, dstart, dend);
2942     + }
2943     + }
2944     +
2945     + /*
2946     + * check for NULL dentries inside the start/end range, or
2947     + * non-NULL dentries outside the start/end range.
2948     + */
2949     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2950     + lower_file = unionfs_lower_file_idx(file, bindex);
2951     + if (lower_file) {
2952     + if (unlikely(bindex < fstart || bindex > fend)) {
2953     + PRINT_CALLER(fname, fxn, line);
2954     + pr_debug(" CF5: file/lower=%p:%p bindex=%d "
2955     + "fstart/end=%d:%d\n", file,
2956     + lower_file, bindex, fstart, fend);
2957     + }
2958     + } else { /* lower_file == NULL */
2959     + if (bindex >= fstart && bindex <= fend) {
2960     + /*
2961     + * directories can have NULL lower inodes in
2962     + * b/t start/end, but NOT if at the
2963     + * start/end range.
2964     + */
2965     + if (unlikely(!(S_ISDIR(inode->i_mode) &&
2966     + bindex > fstart &&
2967     + bindex < fend))) {
2968     + PRINT_CALLER(fname, fxn, line);
2969     + pr_debug(" CF6: file/lower=%p:%p "
2970     + "bindex=%d fstart/end=%d:%d\n",
2971     + file, lower_file, bindex,
2972     + fstart, fend);
2973     + }
2974     + }
2975     + }
2976     + }
2977     +
2978     + __unionfs_check_dentry(dentry, fname, fxn, line);
2979     +}
2980     +
2981     +void __unionfs_check_nd(const struct nameidata *nd,
2982     + const char *fname, const char *fxn, int line)
2983     +{
2984     + struct file *file;
2985     + int printed_caller = 0;
2986     +
2987     + if (unlikely(!nd))
2988     + return;
2989     + if (nd->flags & LOOKUP_OPEN) {
2990     + file = nd->intent.open.file;
2991     + if (unlikely(file->f_path.dentry &&
2992     + strcmp(file->f_path.dentry->d_sb->s_type->name,
2993     + UNIONFS_NAME))) {
2994     + PRINT_CALLER(fname, fxn, line);
2995     + pr_debug(" CND1: lower_file of type %s\n",
2996     + file->f_path.dentry->d_sb->s_type->name);
2997     + }
2998     + }
2999     +}
3000     +
3001     +/* useful to track vfsmount leaks that could cause EBUSY on unmount */
3002     +void __show_branch_counts(const struct super_block *sb,
3003     + const char *file, const char *fxn, int line)
3004     +{
3005     + int i;
3006     + struct vfsmount *mnt;
3007     +
3008     + pr_debug("BC:");
3009     + for (i = 0; i < sbmax(sb); i++) {
3010     + if (likely(sb->s_root))
3011     + mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt;
3012     + else
3013     + mnt = NULL;
3014     + printk(KERN_CONT "%d:",
3015     + (mnt ? atomic_read(&mnt->mnt_count) : -99));
3016     + }
3017     + printk(KERN_CONT "%s:%s:%d\n", file, fxn, line);
3018     +}
3019     +
3020     +void __show_inode_times(const struct inode *inode,
3021     + const char *file, const char *fxn, int line)
3022     +{
3023     + struct inode *lower_inode;
3024     + int bindex;
3025     +
3026     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3027     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3028     + if (unlikely(!lower_inode))
3029     + continue;
3030     + pr_debug("IT(%lu:%d): %s:%s:%d "
3031     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3032     + inode->i_ino, bindex,
3033     + file, fxn, line,
3034     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3035     + lower_inode->i_mtime.tv_sec,
3036     + lower_inode->i_mtime.tv_nsec,
3037     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3038     + lower_inode->i_ctime.tv_sec,
3039     + lower_inode->i_ctime.tv_nsec);
3040     + }
3041     +}
3042     +
3043     +void __show_dinode_times(const struct dentry *dentry,
3044     + const char *file, const char *fxn, int line)
3045     +{
3046     + struct inode *inode = dentry->d_inode;
3047     + struct inode *lower_inode;
3048     + int bindex;
3049     +
3050     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3051     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3052     + if (!lower_inode)
3053     + continue;
3054     + pr_debug("DT(%s:%lu:%d): %s:%s:%d "
3055     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3056     + dentry->d_name.name, inode->i_ino, bindex,
3057     + file, fxn, line,
3058     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3059     + lower_inode->i_mtime.tv_sec,
3060     + lower_inode->i_mtime.tv_nsec,
3061     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3062     + lower_inode->i_ctime.tv_sec,
3063     + lower_inode->i_ctime.tv_nsec);
3064     + }
3065     +}
3066     +
3067     +void __show_inode_counts(const struct inode *inode,
3068     + const char *file, const char *fxn, int line)
3069     +{
3070     + struct inode *lower_inode;
3071     + int bindex;
3072     +
3073     + if (unlikely(!inode)) {
3074     + pr_debug("SiC: Null inode\n");
3075     + return;
3076     + }
3077     + for (bindex = sbstart(inode->i_sb); bindex <= sbend(inode->i_sb);
3078     + bindex++) {
3079     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3080     + if (unlikely(!lower_inode))
3081     + continue;
3082     + pr_debug("SIC(%lu:%d:%d): lc=%d %s:%s:%d\n",
3083     + inode->i_ino, bindex,
3084     + atomic_read(&(inode)->i_count),
3085     + atomic_read(&(lower_inode)->i_count),
3086     + file, fxn, line);
3087     + }
3088     +}
3089     diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
3090     new file mode 100644
3091     index 0000000..a0c3bba
3092     --- /dev/null
3093     +++ b/fs/unionfs/dentry.c
3094     @@ -0,0 +1,397 @@
3095     +/*
3096     + * Copyright (c) 2003-2010 Erez Zadok
3097     + * Copyright (c) 2003-2006 Charles P. Wright
3098     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3099     + * Copyright (c) 2005-2006 Junjiro Okajima
3100     + * Copyright (c) 2005 Arun M. Krishnakumar
3101     + * Copyright (c) 2004-2006 David P. Quigley
3102     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3103     + * Copyright (c) 2003 Puja Gupta
3104     + * Copyright (c) 2003 Harikesavan Krishnan
3105     + * Copyright (c) 2003-2010 Stony Brook University
3106     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3107     + *
3108     + * This program is free software; you can redistribute it and/or modify
3109     + * it under the terms of the GNU General Public License version 2 as
3110     + * published by the Free Software Foundation.
3111     + */
3112     +
3113     +#include "union.h"
3114     +
3115     +bool is_negative_lower(const struct dentry *dentry)
3116     +{
3117     + int bindex;
3118     + struct dentry *lower_dentry;
3119     +
3120     + BUG_ON(!dentry);
3121     + /* cache coherency: check if file was deleted on lower branch */
3122     + if (dbstart(dentry) < 0)
3123     + return true;
3124     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
3125     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3126     + /* unhashed (i.e., unlinked) lower dentries don't count */
3127     + if (lower_dentry && lower_dentry->d_inode &&
3128     + !d_deleted(lower_dentry) &&
3129     + !(lower_dentry->d_flags & DCACHE_NFSFS_RENAMED))
3130     + return false;
3131     + }
3132     + return true;
3133     +}
3134     +
3135     +static inline void __dput_lowers(struct dentry *dentry, int start, int end)
3136     +{
3137     + struct dentry *lower_dentry;
3138     + int bindex;
3139     +
3140     + if (start < 0)
3141     + return;
3142     + for (bindex = start; bindex <= end; bindex++) {
3143     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3144     + if (!lower_dentry)
3145     + continue;
3146     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
3147     + dput(lower_dentry);
3148     + }
3149     +}
3150     +
3151     +/*
3152     + * Purge and invalidate as many data pages of a unionfs inode as possible.
3153     + * This is called when the lower inode has changed, and we want to force
3154     + * processes to re-get the new data.
3155     + */
3156     +static inline void purge_inode_data(struct inode *inode)
3157     +{
3158     + /* remove all non-private mappings */
3159     + unmap_mapping_range(inode->i_mapping, 0, 0, 0);
3160     + /* invalidate as many pages as possible */
3161     + invalidate_mapping_pages(inode->i_mapping, 0, -1);
3162     + /*
3163     + * Don't try to truncate_inode_pages here, because this could lead
3164     + * to a deadlock between some of address_space ops and dentry
3165     + * revalidation: the address space op is invoked with a lock on our
3166     + * own page, and truncate_inode_pages will block on locked pages.
3167     + */
3168     +}
3169     +
3170     +/*
3171     + * Revalidate a single file/symlink/special dentry. Assume that info nodes
3172     + * of the @dentry and its @parent are locked. Assume parent is valid,
3173     + * otherwise return false (and let's hope the VFS will try to re-lookup this
3174     + * dentry). Returns true if valid, false otherwise.
3175     + */
3176     +bool __unionfs_d_revalidate(struct dentry *dentry, struct dentry *parent,
3177     + bool willwrite)
3178     +{
3179     + bool valid = true; /* default is valid */
3180     + struct dentry *lower_dentry;
3181     + struct dentry *result;
3182     + int bindex, bstart, bend;
3183     + int sbgen, dgen, pdgen;
3184     + int positive = 0;
3185     + int interpose_flag;
3186     +
3187     + verify_locked(dentry);
3188     + verify_locked(parent);
3189     +
3190     + /* if the dentry is unhashed, do NOT revalidate */
3191     + if (d_deleted(dentry))
3192     + goto out;
3193     +
3194     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3195     +
3196     + if (is_newer_lower(dentry)) {
3197     + /* root dentry is always valid */
3198     + if (IS_ROOT(dentry)) {
3199     + unionfs_copy_attr_times(dentry->d_inode);
3200     + } else {
3201     + /*
3202     + * reset generation number to zero, guaranteed to be
3203     + * "old"
3204     + */
3205     + dgen = 0;
3206     + atomic_set(&UNIONFS_D(dentry)->generation, dgen);
3207     + }
3208     + if (!willwrite)
3209     + purge_inode_data(dentry->d_inode);
3210     + }
3211     +
3212     + sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3213     +
3214     + BUG_ON(dbstart(dentry) == -1);
3215     + if (dentry->d_inode)
3216     + positive = 1;
3217     +
3218     + /* if our dentry is valid, then validate all lower ones */
3219     + if (sbgen == dgen)
3220     + goto validate_lowers;
3221     +
3222     + /* The root entry should always be valid */
3223     + BUG_ON(IS_ROOT(dentry));
3224     +
3225     + /* We can't work correctly if our parent isn't valid. */
3226     + pdgen = atomic_read(&UNIONFS_D(parent)->generation);
3227     +
3228     + /* Free the pointers for our inodes and this dentry. */
3229     + path_put_lowers_all(dentry, false);
3230     +
3231     + interpose_flag = INTERPOSE_REVAL_NEG;
3232     + if (positive) {
3233     + interpose_flag = INTERPOSE_REVAL;
3234     + iput_lowers_all(dentry->d_inode, true);
3235     + }
3236     +
3237     + if (realloc_dentry_private_data(dentry) != 0) {
3238     + valid = false;
3239     + goto out;
3240     + }
3241     +
3242     + result = unionfs_lookup_full(dentry, parent, interpose_flag);
3243     + if (result) {
3244     + if (IS_ERR(result)) {
3245     + valid = false;
3246     + goto out;
3247     + }
3248     + /*
3249     + * current unionfs_lookup_full() doesn't return
3250     + * a valid dentry
3251     + */
3252     + dput(dentry);
3253     + dentry = result;
3254     + }
3255     +
3256     + if (unlikely(positive && is_negative_lower(dentry))) {
3257     + /* call make_bad_inode here ? */
3258     + d_drop(dentry);
3259     + valid = false;
3260     + goto out;
3261     + }
3262     +
3263     + /*
3264     + * if we got here then we have revalidated our dentry and all lower
3265     + * ones, so we can return safely.
3266     + */
3267     + if (!valid) /* lower dentry revalidation failed */
3268     + goto out;
3269     +
3270     + /*
3271     + * If the parent's gen no. matches the superblock's gen no., then
3272     + * we can update our dentry's gen no. If they didn't match, then it
3273     + * was OK to revalidate this dentry with a stale parent, but we'll
3274     + * purposely not update our dentry's gen no. (so it can be redone);
3275     + * and, we'll mark our parent dentry as invalid so it'll force it
3276     + * (and our dentry) to be revalidated.
3277     + */
3278     + if (pdgen == sbgen)
3279     + atomic_set(&UNIONFS_D(dentry)->generation, sbgen);
3280     + goto out;
3281     +
3282     +validate_lowers:
3283     +
3284     + /* The revalidation must occur across all branches */
3285     + bstart = dbstart(dentry);
3286     + bend = dbend(dentry);
3287     + BUG_ON(bstart == -1);
3288     + for (bindex = bstart; bindex <= bend; bindex++) {
3289     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3290     + if (!lower_dentry || !lower_dentry->d_op
3291     + || !lower_dentry->d_op->d_revalidate)
3292     + continue;
3293     + /*
3294     + * Don't pass nameidata to lower file system, because we
3295     + * don't want an arbitrary lower file being opened or
3296     + * returned to us: it may be useless to us because of the
3297     + * fanout nature of unionfs (cf. file/directory open-file
3298     + * invariants). We will open lower files as and when needed
3299     + * later on.
3300     + */
3301     + if (!lower_dentry->d_op->d_revalidate(lower_dentry, NULL))
3302     + valid = false;
3303     + }
3304     +
3305     + if (!dentry->d_inode ||
3306     + ibstart(dentry->d_inode) < 0 ||
3307     + ibend(dentry->d_inode) < 0) {
3308     + valid = false;
3309     + goto out;
3310     + }
3311     +
3312     + if (valid) {
3313     + /*
3314     + * If we get here, and we copy the meta-data from the lower
3315     + * inode to our inode, then it is vital that we have already
3316     + * purged all unionfs-level file data. We do that near the
3317     + * top of this function (__unionfs_d_revalidate) by calling
3318     + * purge_inode_data.
3319     + */
3320     + unionfs_copy_attr_all(dentry->d_inode,
3321     + unionfs_lower_inode(dentry->d_inode));
3322     + fsstack_copy_inode_size(dentry->d_inode,
3323     + unionfs_lower_inode(dentry->d_inode));
3324     + }
3325     +
3326     +out:
3327     + return valid;
3328     +}
3329     +
3330     +/*
3331     + * Determine if the lower inode objects have changed from below the unionfs
3332     + * inode. Return true if changed, false otherwise.
3333     + *
3334     + * We check if the mtime or ctime have changed. However, the inode times
3335     + * can be changed by anyone without much protection, including
3336     + * asynchronously. This can sometimes cause unionfs to find that the lower
3337     + * file system doesn't change its inode times quickly enough, resulting in a
3338     + * false positive indication (which is harmless, it just makes unionfs do
3339     + * extra work in re-validating the objects). To minimize the chances of
3340     + * these situations, we still consider such small time changes valid, but we
3341     + * don't print debugging messages unless the time changes are greater than
3342     + * UNIONFS_MIN_CC_TIME (which defaults to 3 seconds, as with NFS's acregmin)
3343     + * because significant changes are more likely due to users manually
3344     + * touching lower files.
3345     + */
3346     +bool is_newer_lower(const struct dentry *dentry)
3347     +{
3348     + int bindex;
3349     + struct inode *inode;
3350     + struct inode *lower_inode;
3351     +
3352     + /* ignore if we're called on semi-initialized dentries/inodes */
3353     + if (!dentry || !UNIONFS_D(dentry))
3354     + return false;
3355     + inode = dentry->d_inode;
3356     + if (!inode || !UNIONFS_I(inode)->lower_inodes ||
3357     + ibstart(inode) < 0 || ibend(inode) < 0)
3358     + return false;
3359     +
3360     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3361     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3362     + if (!lower_inode)
3363     + continue;
3364     +
3365     + /* check if mtime/ctime have changed */
3366     + if (unlikely(timespec_compare(&inode->i_mtime,
3367     + &lower_inode->i_mtime) < 0)) {
3368     + if ((lower_inode->i_mtime.tv_sec -
3369     + inode->i_mtime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3370     + pr_info("unionfs: new lower inode mtime "
3371     + "(bindex=%d, name=%s)\n", bindex,
3372     + dentry->d_name.name);
3373     + show_dinode_times(dentry);
3374     + }
3375     + return true;
3376     + }
3377     + if (unlikely(timespec_compare(&inode->i_ctime,
3378     + &lower_inode->i_ctime) < 0)) {
3379     + if ((lower_inode->i_ctime.tv_sec -
3380     + inode->i_ctime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3381     + pr_info("unionfs: new lower inode ctime "
3382     + "(bindex=%d, name=%s)\n", bindex,
3383     + dentry->d_name.name);
3384     + show_dinode_times(dentry);
3385     + }
3386     + return true;
3387     + }
3388     + }
3389     +
3390     + /*
3391     + * Last check: if this is a positive dentry, but somehow all lower
3392     + * dentries are negative or unhashed, then this dentry needs to be
3393     + * revalidated, because someone probably deleted the objects from
3394     + * the lower branches directly.
3395     + */
3396     + if (is_negative_lower(dentry))
3397     + return true;
3398     +
3399     + return false; /* default: lower is not newer */
3400     +}
3401     +
3402     +static int unionfs_d_revalidate(struct dentry *dentry,
3403     + struct nameidata *nd_unused)
3404     +{
3405     + bool valid = true;
3406     + int err = 1; /* 1 means valid for the VFS */
3407     + struct dentry *parent;
3408     +
3409     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3410     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3411     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3412     +
3413     + valid = __unionfs_d_revalidate(dentry, parent, false);
3414     + if (valid) {
3415     + unionfs_postcopyup_setmnt(dentry);
3416     + unionfs_check_dentry(dentry);
3417     + } else {
3418     + d_drop(dentry);
3419     + err = valid;
3420     + }
3421     + unionfs_unlock_dentry(dentry);
3422     + unionfs_unlock_parent(dentry, parent);
3423     + unionfs_read_unlock(dentry->d_sb);
3424     +
3425     + return err;
3426     +}
3427     +
3428     +static void unionfs_d_release(struct dentry *dentry)
3429     +{
3430     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3431     + if (unlikely(!UNIONFS_D(dentry)))
3432     + goto out; /* skip if no lower branches */
3433     + /* must lock our branch configuration here */
3434     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3435     +
3436     + unionfs_check_dentry(dentry);
3437     + /* this could be a negative dentry, so check first */
3438     + if (dbstart(dentry) < 0) {
3439     + unionfs_unlock_dentry(dentry);
3440     + goto out; /* due to a (normal) failed lookup */
3441     + }
3442     +
3443     + /* Release all the lower dentries */
3444     + path_put_lowers_all(dentry, true);
3445     +
3446     + unionfs_unlock_dentry(dentry);
3447     +
3448     +out:
3449     + free_dentry_private_data(dentry);
3450     + unionfs_read_unlock(dentry->d_sb);
3451     + return;
3452     +}
3453     +
3454     +/*
3455     + * Called when we're removing the last reference to our dentry. So we
3456     + * should drop all lower references too.
3457     + */
3458     +static void unionfs_d_iput(struct dentry *dentry, struct inode *inode)
3459     +{
3460     + int rc;
3461     +
3462     + BUG_ON(!dentry);
3463     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3464     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3465     +
3466     + if (!UNIONFS_D(dentry) || dbstart(dentry) < 0)
3467     + goto drop_lower_inodes;
3468     + path_put_lowers_all(dentry, false);
3469     +
3470     +drop_lower_inodes:
3471     + rc = atomic_read(&inode->i_count);
3472     + if (rc == 1 && inode->i_nlink == 1 && ibstart(inode) >= 0) {
3473     + /* see Documentation/filesystems/unionfs/issues.txt */
3474     + lockdep_off();
3475     + iput(unionfs_lower_inode(inode));
3476     + lockdep_on();
3477     + unionfs_set_lower_inode(inode, NULL);
3478     + /* XXX: may need to set start/end to -1? */
3479     + }
3480     +
3481     + iput(inode);
3482     +
3483     + unionfs_unlock_dentry(dentry);
3484     + unionfs_read_unlock(dentry->d_sb);
3485     +}
3486     +
3487     +struct dentry_operations unionfs_dops = {
3488     + .d_revalidate = unionfs_d_revalidate,
3489     + .d_release = unionfs_d_release,
3490     + .d_iput = unionfs_d_iput,
3491     +};
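
The generation checks in __unionfs_d_revalidate() above can be hard to follow inside the locking and lookup code. The following user-space sketch models only the decision logic: which path is taken, and which generation number the dentry ends up with. It is a simplified illustration, not the kernel code; the names gen_snapshot, choose_reval_path, effective_dgen and gen_after_relookup are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two paths taken by the revalidation code. */
enum reval_path {
	REVAL_VALIDATE_LOWERS,	/* dentry generation is current */
	REVAL_RELOOKUP		/* dentry is stale: redo the lookup */
};

/* Snapshot of the three generation numbers the function compares. */
struct gen_snapshot {
	int sbgen;	/* superblock generation */
	int dgen;	/* this dentry's generation */
	int pdgen;	/* parent dentry's generation */
};

/*
 * A non-root dentry whose lower objects changed is forced "old" by
 * resetting its generation to zero; the root keeps its generation.
 */
static int effective_dgen(int dgen, bool lower_is_newer, bool is_root)
{
	return (lower_is_newer && !is_root) ? 0 : dgen;
}

/* Matching generations mean only the lower dentries need checking. */
static enum reval_path choose_reval_path(const struct gen_snapshot *g)
{
	assert(g);
	return (g->sbgen == g->dgen) ? REVAL_VALIDATE_LOWERS : REVAL_RELOOKUP;
}

/*
 * After a successful re-lookup the dentry adopts the superblock
 * generation only if its parent is current; otherwise it stays stale
 * so that it is revalidated again later.
 */
static int gen_after_relookup(const struct gen_snapshot *g)
{
	assert(g);
	return (g->pdgen == g->sbgen) ? g->sbgen : g->dgen;
}
```

With sbgen, dgen and pdgen all equal, the model picks the validate_lowers path. Once a lower change resets dgen to zero, it picks the re-lookup path, and the dentry regains the superblock generation only while the parent is current.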
3492     diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
3493     new file mode 100644
3494     index 0000000..7da0ff0
3495     --- /dev/null
3496     +++ b/fs/unionfs/dirfops.c
3497     @@ -0,0 +1,302 @@
3498     +/*
3499     + * Copyright (c) 2003-2010 Erez Zadok
3500     + * Copyright (c) 2003-2006 Charles P. Wright
3501     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3502     + * Copyright (c) 2005-2006 Junjiro Okajima
3503     + * Copyright (c) 2005 Arun M. Krishnakumar
3504     + * Copyright (c) 2004-2006 David P. Quigley
3505     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3506     + * Copyright (c) 2003 Puja Gupta
3507     + * Copyright (c) 2003 Harikesavan Krishnan
3508     + * Copyright (c) 2003-2010 Stony Brook University
3509     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3510     + *
3511     + * This program is free software; you can redistribute it and/or modify
3512     + * it under the terms of the GNU General Public License version 2 as
3513     + * published by the Free Software Foundation.
3514     + */
3515     +
3516     +#include "union.h"
3517     +
3518     +/* Make sure our rdstate is playing by the rules. */
3519     +static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
3520     +{
3521     + BUG_ON(rdstate->offset >= DIREOF);
3522     + BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
3523     +}
3524     +
3525     +struct unionfs_getdents_callback {
3526     + struct unionfs_dir_state *rdstate;
3527     + void *dirent;
3528     + int entries_written;
3529     + int filldir_called;
3530     + int filldir_error;
3531     + filldir_t filldir;
3532     + struct super_block *sb;
3533     +};
3534     +
3535     +/* based on generic filldir in fs/readdir.c */
3536     +static int unionfs_filldir(void *dirent, const char *oname, int namelen,
3537     + loff_t offset, u64 ino, unsigned int d_type)
3538     +{
3539     + struct unionfs_getdents_callback *buf = dirent;
3540     + struct filldir_node *found = NULL;
3541     + int err = 0;
3542     + int is_whiteout;
3543     + char *name = (char *) oname;
3544     +
3545     + buf->filldir_called++;
3546     +
3547     + is_whiteout = is_whiteout_name(&name, &namelen);
3548     +
3549     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3550     +
3551     + if (found) {
3552     + /*
3553     + * If we had non-whiteout entry in dir cache, then mark it
3554     + * as a whiteout but leave it in the dir cache.
3555     + */
3556     + if (is_whiteout && !found->whiteout)
3557     + found->whiteout = is_whiteout;
3558     + goto out;
3559     + }
3560     +
3561     + /* if 'name' isn't a whiteout, filldir it. */
3562     + if (!is_whiteout) {
3563     + off_t pos = rdstate2offset(buf->rdstate);
3564     + u64 unionfs_ino = ino;
3565     +
3566     + err = buf->filldir(buf->dirent, name, namelen, pos,
3567     + unionfs_ino, d_type);
3568     + buf->rdstate->offset++;
3569     + verify_rdstate_offset(buf->rdstate);
3570     + }
3571     + /*
3572     + * If we did fill it, stuff it in our hash, otherwise return an
3573     + * error.
3574     + */
3575     + if (err) {
3576     + buf->filldir_error = err;
3577     + goto out;
3578     + }
3579     + buf->entries_written++;
3580     + err = add_filldir_node(buf->rdstate, name, namelen,
3581     + buf->rdstate->bindex, is_whiteout);
3582     + if (err)
3583     + buf->filldir_error = err;
3584     +
3585     +out:
3586     + return err;
3587     +}
3588     +
3589     +static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
3590     +{
3591     + int err = 0;
3592     + struct file *lower_file = NULL;
3593     + struct dentry *dentry = file->f_path.dentry;
3594     + struct dentry *parent;
3595     + struct inode *inode = NULL;
3596     + struct unionfs_getdents_callback buf;
3597     + struct unionfs_dir_state *uds;
3598     + int bend;
3599     + loff_t offset;
3600     +
3601     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3602     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3603     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3604     +
3605     + err = unionfs_file_revalidate(file, parent, false);
3606     + if (unlikely(err))
3607     + goto out;
3608     +
3609     + inode = dentry->d_inode;
3610     +
3611     + uds = UNIONFS_F(file)->rdstate;
3612     + if (!uds) {
3613     + if (file->f_pos == DIREOF) {
3614     + goto out;
3615     + } else if (file->f_pos > 0) {
3616     + uds = find_rdstate(inode, file->f_pos);
3617     + if (unlikely(!uds)) {
3618     + err = -ESTALE;
3619     + goto out;
3620     + }
3621     + UNIONFS_F(file)->rdstate = uds;
3622     + } else {
3623     + init_rdstate(file);
3624     + uds = UNIONFS_F(file)->rdstate;
3625     + }
3626     + }
3627     + bend = fbend(file);
3628     +
3629     + while (uds->bindex <= bend) {
3630     + lower_file = unionfs_lower_file_idx(file, uds->bindex);
3631     + if (!lower_file) {
3632     + uds->bindex++;
3633     + uds->dirpos = 0;
3634     + continue;
3635     + }
3636     +
3637     + /* prepare callback buffer */
3638     + buf.filldir_called = 0;
3639     + buf.filldir_error = 0;
3640     + buf.entries_written = 0;
3641     + buf.dirent = dirent;
3642     + buf.filldir = filldir;
3643     + buf.rdstate = uds;
3644     + buf.sb = inode->i_sb;
3645     +
3646     + /* Read starting from where we last left off. */
3647     + offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET);
3648     + if (offset < 0) {
3649     + err = offset;
3650     + goto out;
3651     + }
3652     + err = vfs_readdir(lower_file, unionfs_filldir, &buf);
3653     +
3654     + /* Save the position for when we continue. */
3655     + offset = vfs_llseek(lower_file, 0, SEEK_CUR);
3656     + if (offset < 0) {
3657     + err = offset;
3658     + goto out;
3659     + }
3660     + uds->dirpos = offset;
3661     +
3662     + /* Copy the atime. */
3663     + fsstack_copy_attr_atime(inode,
3664     + lower_file->f_path.dentry->d_inode);
3665     +
3666     + if (err < 0)
3667     + goto out;
3668     +
3669     + if (buf.filldir_error)
3670     + break;
3671     +
3672     + if (!buf.entries_written) {
3673     + uds->bindex++;
3674     + uds->dirpos = 0;
3675     + }
3676     + }
3677     +
3678     + if (!buf.filldir_error && uds->bindex >= bend) {
3679     + /* Save the number of hash entries for next time. */
3680     + UNIONFS_I(inode)->hashsize = uds->hashentries;
3681     + free_rdstate(uds);
3682     + UNIONFS_F(file)->rdstate = NULL;
3683     + file->f_pos = DIREOF;
3684     + } else {
3685     + file->f_pos = rdstate2offset(uds);
3686     + }
3687     +
3688     +out:
3689     + if (!err)
3690     + unionfs_check_file(file);
3691     + unionfs_unlock_dentry(dentry);
3692     + unionfs_unlock_parent(dentry, parent);
3693     + unionfs_read_unlock(dentry->d_sb);
3694     + return err;
3695     +}
3696     +
3697     +/*
3698     + * This is not meant to be a generic repositioning function. If you do
3699     + * things that aren't supported, then we return EINVAL.
3700     + *
3701     + * What is allowed:
3702     + * (1) seeking to the same position that you are currently at
3703     + * This really has no effect, but returns where you are.
3704     + * (2) seeking to the beginning of the file
3705     + * This throws out all state, and lets you begin again.
3706     + */
3707     +static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
3708     +{
3709     + struct unionfs_dir_state *rdstate;
3710     + struct dentry *dentry = file->f_path.dentry;
3711     + struct dentry *parent;
3712     + loff_t err;
3713     +
3714     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3715     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3716     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3717     +
3718     + err = unionfs_file_revalidate(file, parent, false);
3719     + if (unlikely(err))
3720     + goto out;
3721     +
3722     + rdstate = UNIONFS_F(file)->rdstate;
3723     +
3724     + /*
3725     + * we let users seek to their current position, but not anywhere
3726     + * else.
3727     + */
3728     + if (!offset) {
3729     + switch (origin) {
3730     + case SEEK_SET:
3731     + if (rdstate) {
3732     + free_rdstate(rdstate);
3733     + UNIONFS_F(file)->rdstate = NULL;
3734     + }
3735     + init_rdstate(file);
3736     + err = 0;
3737     + break;
3738     + case SEEK_CUR:
3739     + err = file->f_pos;
3740     + break;
3741     + case SEEK_END:
3742     + /* Unsupported, because we would break everything. */
3743     + err = -EINVAL;
3744     + break;
3745     + }
3746     + } else {
3747     + switch (origin) {
3748     + case SEEK_SET:
3749     + if (rdstate) {
3750     + if (offset == rdstate2offset(rdstate))
3751     + err = offset;
3752     + else if (file->f_pos == DIREOF)
3753     + err = DIREOF;
3754     + else
3755     + err = -EINVAL;
3756     + } else {
3757     + struct inode *inode;
3758     + inode = dentry->d_inode;
3759     + rdstate = find_rdstate(inode, offset);
3760     + if (rdstate) {
3761     + UNIONFS_F(file)->rdstate = rdstate;
3762     + err = rdstate->offset;
3763     + } else {
3764     + err = -EINVAL;
3765     + }
3766     + }
3767     + break;
3768     + case SEEK_CUR:
3769     + case SEEK_END:
3770     + /* Unsupported, because we would break everything. */
3771     + err = -EINVAL;
3772     + break;
3773     + }
3774     + }
3775     +
3776     +out:
3777     + if (!err)
3778     + unionfs_check_file(file);
3779     + unionfs_unlock_dentry(dentry);
3780     + unionfs_unlock_parent(dentry, parent);
3781     + unionfs_read_unlock(dentry->d_sb);
3782     + return err;
3783     +}
3784     +
3785     +/*
3786     + * Trimmed directory operations; we shouldn't pass everything down since
3787     + * we don't want to operate on partial directories.
3788     + */
3789     +struct file_operations unionfs_dir_fops = {
3790     + .llseek = unionfs_dir_llseek,
3791     + .read = generic_read_dir,
3792     + .readdir = unionfs_readdir,
3793     + .unlocked_ioctl = unionfs_ioctl,
3794     + .open = unionfs_open,
3795     + .release = unionfs_file_release,
3796     + .flush = unionfs_flush,
3797     + .fsync = unionfs_fsync,
3798     + .fasync = unionfs_fasync,
3799     +};
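
The seek rules documented above unionfs_dir_llseek() can be summarized as a small pure function. This is a user-space sketch under stated assumptions, not the kernel implementation: dir_cursor and dir_seek_policy are hypothetical names, SKETCH_DIREOF is a placeholder for the patch's DIREOF marker, and the cached-state lookup that the patch performs through find_rdstate() when no readdir state is attached is left out (the sketch simply rejects that case). Origins other than the three standard ones are treated as unsupported here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Placeholder end-of-directory marker; the value is an assumption. */
#define SKETCH_DIREOF 0xfffffLL

/* Hypothetical stand-in for the per-file readdir position. */
struct dir_cursor {
	long long f_pos;	/* current file position */
	bool has_state;		/* is a readdir state attached? */
	long long state_offset;	/* offset encoded by that state */
};

/* Returns the resulting position, or a negative errno value. */
static long long dir_seek_policy(const struct dir_cursor *c,
				 long long offset, int origin)
{
	assert(c);

	if (offset == 0) {
		switch (origin) {
		case SEEK_SET:
			return 0;		/* rewind: start over */
		case SEEK_CUR:
			return c->f_pos;	/* report where we are */
		default:
			return -EINVAL;		/* SEEK_END unsupported */
		}
	}

	/* A non-zero offset is only meaningful with SEEK_SET. */
	if (origin != SEEK_SET)
		return -EINVAL;
	if (!c->has_state)
		return -EINVAL;	/* cached-state lookup omitted here */
	if (offset == c->state_offset)
		return offset;	/* seeking to the current position */
	if (c->f_pos == SKETCH_DIREOF)
		return SKETCH_DIREOF;
	return -EINVAL;
}
```

Keeping the policy separate from the locking makes it easy to see that only a rewind, a position query, or a seek to the current position succeeds.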
3800     diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
3801     new file mode 100644
3802     index 0000000..033343b
3803     --- /dev/null
3804     +++ b/fs/unionfs/dirhelper.c
3805     @@ -0,0 +1,158 @@
3806     +/*
3807     + * Copyright (c) 2003-2010 Erez Zadok
3808     + * Copyright (c) 2003-2006 Charles P. Wright
3809     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3810     + * Copyright (c) 2005-2006 Junjiro Okajima
3811     + * Copyright (c) 2005 Arun M. Krishnakumar
3812     + * Copyright (c) 2004-2006 David P. Quigley
3813     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3814     + * Copyright (c) 2003 Puja Gupta
3815     + * Copyright (c) 2003 Harikesavan Krishnan
3816     + * Copyright (c) 2003-2010 Stony Brook University
3817     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3818     + *
3819     + * This program is free software; you can redistribute it and/or modify
3820     + * it under the terms of the GNU General Public License version 2 as
3821     + * published by the Free Software Foundation.
3822     + */
3823     +
3824     +#include "union.h"
3825     +
3826     +#define RD_NONE 0
3827     +#define RD_CHECK_EMPTY 1
3828     +/* The callback structure for check_empty. */
3829     +struct unionfs_rdutil_callback {
3830     + int err;
3831     + int filldir_called;
3832     + struct unionfs_dir_state *rdstate;
3833     + int mode;
3834     +};
3835     +
3836     +/* This filldir function makes sure only whiteouts exist within a directory. */
3837     +static int readdir_util_callback(void *dirent, const char *oname, int namelen,
3838     + loff_t offset, u64 ino, unsigned int d_type)
3839     +{
3840     + int err = 0;
3841     + struct unionfs_rdutil_callback *buf = dirent;
3842     + int is_whiteout;
3843     + struct filldir_node *found;
3844     + char *name = (char *) oname;
3845     +
3846     + buf->filldir_called = 1;
3847     +
3848     + if (name[0] == '.' && (namelen == 1 ||
3849     + (name[1] == '.' && namelen == 2)))
3850     + goto out;
3851     +
3852     + is_whiteout = is_whiteout_name(&name, &namelen);
3853     +
3854     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3855     + /* If it was found in the table there was a previous whiteout. */
3856     + if (found)
3857     + goto out;
3858     +
3859     + /*
3860     + * if it wasn't found and isn't a whiteout, the directory isn't
3861     + * empty.
3862     + */
3863     + err = -ENOTEMPTY;
3864     + if ((buf->mode == RD_CHECK_EMPTY) && !is_whiteout)
3865     + goto out;
3866     +
3867     + err = add_filldir_node(buf->rdstate, name, namelen,
3868     + buf->rdstate->bindex, is_whiteout);
3869     +
3870     +out:
3871     + buf->err = err;
3872     + return err;
3873     +}
3874     +
3875     +/* Is a directory logically empty? */
3876     +int check_empty(struct dentry *dentry, struct dentry *parent,
3877     + struct unionfs_dir_state **namelist)
3878     +{
3879     + int err = 0;
3880     + struct dentry *lower_dentry = NULL;
3881     + struct vfsmount *mnt;
3882     + struct super_block *sb;
3883     + struct file *lower_file;
3884     + struct unionfs_rdutil_callback *buf = NULL;
3885     + int bindex, bstart, bend, bopaque;
3886     +
3887     + sb = dentry->d_sb;
3888     +
3889     +
3890     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3891     +
3892     + err = unionfs_partial_lookup(dentry, parent);
3893     + if (err)
3894     + goto out;
3895     +
3896     + bstart = dbstart(dentry);
3897     + bend = dbend(dentry);
3898     + bopaque = dbopaque(dentry);
3899     + if (0 <= bopaque && bopaque < bend)
3900     + bend = bopaque;
3901     +
3902     + buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL);
3903     + if (unlikely(!buf)) {
3904     + err = -ENOMEM;
3905     + goto out;
3906     + }
3907     + buf->err = 0;
3908     + buf->mode = RD_CHECK_EMPTY;
3909     + buf->rdstate = alloc_rdstate(dentry->d_inode, bstart);
3910     + if (unlikely(!buf->rdstate)) {
3911     + err = -ENOMEM;
3912     + goto out;
3913     + }
3914     +
3915     + /* Process the lower directories with rdutil_callback as a filldir. */
3916     + for (bindex = bstart; bindex <= bend; bindex++) {
3917     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3918     + if (!lower_dentry)
3919     + continue;
3920     + if (!lower_dentry->d_inode)
3921     + continue;
3922     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
3923     + continue;
3924     +
3925     + dget(lower_dentry);
3926     + mnt = unionfs_mntget(dentry, bindex);
3927     + branchget(sb, bindex);
3928     + lower_file = dentry_open(lower_dentry, mnt, O_RDONLY, current_cred());
3929     + if (IS_ERR(lower_file)) {
3930     + err = PTR_ERR(lower_file);
3931     + branchput(sb, bindex);
3932     + goto out;
3933     + }
3934     +
3935     + do {
3936     + buf->filldir_called = 0;
3937     + buf->rdstate->bindex = bindex;
3938     + err = vfs_readdir(lower_file,
3939     + readdir_util_callback, buf);
3940     + if (buf->err)
3941     + err = buf->err;
3942     + } while ((err >= 0) && buf->filldir_called);
3943     +
3944     + /* fput calls dput for lower_dentry */
3945     + fput(lower_file);
3946     + branchput(sb, bindex);
3947     +
3948     + if (err < 0)
3949     + goto out;
3950     + }
3951     +
3952     +out:
3953     + if (buf) {
3954     + if (namelist && !err)
3955     + *namelist = buf->rdstate;
3956     + else if (buf->rdstate)
3957     + free_rdstate(buf->rdstate);
3958     + kfree(buf);
3959     + }
3960     +
3961     +
3962     + return err;
3963     +}
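
The idea behind check_empty() above is that a union directory is logically empty when every name found in its branches is either a whiteout or hidden by a whiteout from a higher-priority branch. The sketch below models that in user space with plain string arrays instead of vfs_readdir() callbacks. It is illustrative only: the branch structure, union_dir_is_empty() and the fixed-size seen table are hypothetical, and the ".wh." prefix is assumed to be the whiteout prefix, since this part of the patch only calls is_whiteout_name() without showing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX ".wh."	/* assumed whiteout name prefix */
#define MAX_SEEN 64		/* capacity of the sketch's name table */

/* Hypothetical stand-in for one lower directory's entries. */
struct branch {
	const char *const *names;
	size_t count;
};

static bool is_dot_or_dotdot(const char *name)
{
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Branches are ordered from highest to lowest priority. */
static bool union_dir_is_empty(const struct branch *branches, size_t nbranches)
{
	const char *seen[MAX_SEEN];	/* names already whited out */
	size_t nseen = 0;
	size_t wh_len = strlen(WH_PREFIX);
	size_t b, i, s;

	assert(branches || nbranches == 0);

	for (b = 0; b < nbranches; b++) {
		for (i = 0; i < branches[b].count; i++) {
			const char *name = branches[b].names[i];
			bool wh, hidden = false;

			if (is_dot_or_dotdot(name))
				continue;
			wh = strncmp(name, WH_PREFIX, wh_len) == 0;
			if (wh)
				name += wh_len;	/* strip the prefix */

			for (s = 0; s < nseen && !hidden; s++)
				hidden = strcmp(seen[s], name) == 0;
			if (hidden)
				continue;	/* masked by a whiteout */
			if (!wh)
				return false;	/* a visible entry exists */
			if (nseen == MAX_SEEN)
				return false;	/* table full: be conservative */
			seen[nseen++] = name;
		}
	}
	return true;
}
```

For example, a top branch holding only ".wh.foo" over a lower branch holding "foo" counts as empty, while an additional unmasked "bar" in the lower branch does not.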
3964     diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
3965     new file mode 100644
3966     index 0000000..5b77eac
3967     --- /dev/null
3968     +++ b/fs/unionfs/fanout.h
3969     @@ -0,0 +1,407 @@
3970     +/*
3971     + * Copyright (c) 2003-2010 Erez Zadok
3972     + * Copyright (c) 2003-2006 Charles P. Wright
3973     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3974     + * Copyright (c) 2005 Arun M. Krishnakumar
3975     + * Copyright (c) 2004-2006 David P. Quigley
3976     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3977     + * Copyright (c) 2003 Puja Gupta
3978     + * Copyright (c) 2003 Harikesavan Krishnan
3979     + * Copyright (c) 2003-2010 Stony Brook University
3980     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
3981     + *
3982     + * This program is free software; you can redistribute it and/or modify
3983     + * it under the terms of the GNU General Public License version 2 as
3984     + * published by the Free Software Foundation.
3985     + */
3986     +
3987     +#ifndef _FANOUT_H_
3988     +#define _FANOUT_H_
3989     +
3990     +/*
3991     + * Inode to private data
3992     + *
3993     + * Since we use containers and the struct inode is _inside_ the
3994     + * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
3995     + * inode pointer), return a valid non-NULL pointer.
3996     + */
3997     +static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
3998     +{
3999     + return container_of(inode, struct unionfs_inode_info, vfs_inode);
4000     +}
4001     +
4002     +#define ibstart(ino) (UNIONFS_I(ino)->bstart)
4003     +#define ibend(ino) (UNIONFS_I(ino)->bend)
4004     +
4005     +/* Dentry to private data */
4006     +#define UNIONFS_D(dent) ((struct unionfs_dentry_info *)(dent)->d_fsdata)
4007     +#define dbstart(dent) (UNIONFS_D(dent)->bstart)
4008     +#define dbend(dent) (UNIONFS_D(dent)->bend)
4009     +#define dbopaque(dent) (UNIONFS_D(dent)->bopaque)
4010     +
4011     +/* Superblock to private data */
4012     +#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
4013     +#define sbstart(sb) 0
4014     +#define sbend(sb) (UNIONFS_SB(sb)->bend)
4015     +#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
4016     +#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
4017     +
4018     +/* File to private data */
4019     +#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
4020     +#define fbstart(file) (UNIONFS_F(file)->bstart)
4021     +#define fbend(file) (UNIONFS_F(file)->bend)
4022     +
4023     +/* macros to manipulate branch IDs stored in our superblock */
4024     +static inline int branch_id(struct super_block *sb, int index)
4025     +{
4026     + BUG_ON(!sb || index < 0);
4027     + return UNIONFS_SB(sb)->data[index].branch_id;
4028     +}
4029     +
4030     +static inline void set_branch_id(struct super_block *sb, int index, int val)
4031     +{
4032     + BUG_ON(!sb || index < 0);
4033     + UNIONFS_SB(sb)->data[index].branch_id = val;
4034     +}
4035     +
4036     +static inline void new_branch_id(struct super_block *sb, int index)
4037     +{
4038     + BUG_ON(!sb || index < 0);
4039     + set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
4040     +}
4041     +
4042     +/*
4043     + * Find new index of matching branch with an existing superblock of a known
4044     + * (possibly old) id. This is needed because branches could have been
4045     + * added/deleted causing the branches of any open files to shift.
4046     + *
4047     + * @sb: the new superblock which may have new/different branch IDs
4048     + * @id: the old/existing id we're looking for
4049     + * Returns index of newly found branch (0 or greater), -1 otherwise.
4050     + */
4051     +static inline int branch_id_to_idx(struct super_block *sb, int id)
4052     +{
4053     + int i;
4054     + for (i = 0; i < sbmax(sb); i++) {
4055     + if (branch_id(sb, i) == id)
4056     + return i;
4057     + }
4058     + /* in the non-ODF code, this should really never happen */
4059     + printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
4060     + return -1;
4061     +}
4062     +
4063     +/* File to lower file. */
4064     +static inline struct file *unionfs_lower_file(const struct file *f)
4065     +{
4066     + BUG_ON(!f);
4067     + return UNIONFS_F(f)->lower_files[fbstart(f)];
4068     +}
4069     +
4070     +static inline struct file *unionfs_lower_file_idx(const struct file *f,
4071     + int index)
4072     +{
4073     + BUG_ON(!f || index < 0);
4074     + return UNIONFS_F(f)->lower_files[index];
4075     +}
4076     +
4077     +static inline void unionfs_set_lower_file_idx(struct file *f, int index,
4078     + struct file *val)
4079     +{
4080     + BUG_ON(!f || index < 0);
4081     + UNIONFS_F(f)->lower_files[index] = val;
4082     + /* save branch ID (may be redundant?) */
4083     + UNIONFS_F(f)->saved_branch_ids[index] =
4084     + branch_id((f)->f_path.dentry->d_sb, index);
4085     +}
4086     +
4087     +static inline void unionfs_set_lower_file(struct file *f, struct file *val)
4088     +{
4089     + BUG_ON(!f);
4090     + unionfs_set_lower_file_idx((f), fbstart(f), (val));
4091     +}
4092     +
4093     +/* Inode to lower inode. */
4094     +static inline struct inode *unionfs_lower_inode(const struct inode *i)
4095     +{
4096     + BUG_ON(!i);
4097     + return UNIONFS_I(i)->lower_inodes[ibstart(i)];
4098     +}
4099     +
4100     +static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
4101     + int index)
4102     +{
4103     + BUG_ON(!i || index < 0);
4104     + return UNIONFS_I(i)->lower_inodes[index];
4105     +}
4106     +
4107     +static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
4108     + struct inode *val)
4109     +{
4110     + BUG_ON(!i || index < 0);
4111     + UNIONFS_I(i)->lower_inodes[index] = val;
4112     +}
4113     +
4114     +static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
4115     +{
4116     + BUG_ON(!i);
4117     + UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
4118     +}
4119     +
4120     +/* Superblock to lower superblock. */
4121     +static inline struct super_block *unionfs_lower_super(
4122     + const struct super_block *sb)
4123     +{
4124     + BUG_ON(!sb);
4125     + return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
4126     +}
4127     +
4128     +static inline struct super_block *unionfs_lower_super_idx(
4129     + const struct super_block *sb,
4130     + int index)
4131     +{
4132     + BUG_ON(!sb || index < 0);
4133     + return UNIONFS_SB(sb)->data[index].sb;
4134     +}
4135     +
4136     +static inline void unionfs_set_lower_super_idx(struct super_block *sb,
4137     + int index,
4138     + struct super_block *val)
4139     +{
4140     + BUG_ON(!sb || index < 0);
4141     + UNIONFS_SB(sb)->data[index].sb = val;
4142     +}
4143     +
4144     +static inline void unionfs_set_lower_super(struct super_block *sb,
4145     + struct super_block *val)
4146     +{
4147     + BUG_ON(!sb);
4148     + UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
4149     +}
4150     +
4151     +/* Branch count macros. */
4152     +static inline int branch_count(const struct super_block *sb, int index)
4153     +{
4154     + BUG_ON(!sb || index < 0);
4155     + return atomic_read(&UNIONFS_SB(sb)->data[index].open_files);
4156     +}
4157     +
4158     +static inline void set_branch_count(struct super_block *sb, int index, int val)
4159     +{
4160     + BUG_ON(!sb || index < 0);
4161     + atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val);
4162     +}
4163     +
4164     +static inline void branchget(struct super_block *sb, int index)
4165     +{
4166     + BUG_ON(!sb || index < 0);
4167     + atomic_inc(&UNIONFS_SB(sb)->data[index].open_files);
4168     +}
4169     +
4170     +static inline void branchput(struct super_block *sb, int index)
4171     +{
4172     + BUG_ON(!sb || index < 0);
4173     + atomic_dec(&UNIONFS_SB(sb)->data[index].open_files);
4174     +}
4175     +
4176     +/* Dentry macros */
4177     +static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index,
4178     + struct dentry *val)
4179     +{
4180     + BUG_ON(!dent || index < 0);
4181     + UNIONFS_D(dent)->lower_paths[index].dentry = val;
4182     +}
4183     +
4184     +static inline struct dentry *unionfs_lower_dentry_idx(
4185     + const struct dentry *dent,
4186     + int index)
4187     +{
4188     + BUG_ON(!dent || index < 0);
4189     + return UNIONFS_D(dent)->lower_paths[index].dentry;
4190     +}
4191     +
4192     +static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent)
4193     +{
4194     + BUG_ON(!dent);
4195     + return unionfs_lower_dentry_idx(dent, dbstart(dent));
4196     +}
4197     +
4198     +static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index,
4199     + struct vfsmount *mnt)
4200     +{
4201     + BUG_ON(!dent || index < 0);
4202     + UNIONFS_D(dent)->lower_paths[index].mnt = mnt;
4203     +}
4204     +
4205     +static inline struct vfsmount *unionfs_lower_mnt_idx(
4206     + const struct dentry *dent,
4207     + int index)
4208     +{
4209     + BUG_ON(!dent || index < 0);
4210     + return UNIONFS_D(dent)->lower_paths[index].mnt;
4211     +}
4212     +
4213     +static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent)
4214     +{
4215     + BUG_ON(!dent);
4216     + return unionfs_lower_mnt_idx(dent, dbstart(dent));
4217     +}
4218     +
4219     +/* Macros for locking a dentry. */
4220     +enum unionfs_dentry_lock_class {
4221     + UNIONFS_DMUTEX_NORMAL,
4222     + UNIONFS_DMUTEX_ROOT,
4223     + UNIONFS_DMUTEX_PARENT,
4224     + UNIONFS_DMUTEX_CHILD,
4225     + UNIONFS_DMUTEX_WHITEOUT,
4226     + UNIONFS_DMUTEX_REVAL_PARENT, /* for file/dentry revalidate */
4227     + UNIONFS_DMUTEX_REVAL_CHILD, /* for file/dentry revalidate */
4228     +};
4229     +
4230     +static inline void unionfs_lock_dentry(struct dentry *d,
4231     + unsigned int subclass)
4232     +{
4233     + BUG_ON(!d);
4234     + mutex_lock_nested(&UNIONFS_D(d)->lock, subclass);
4235     +}
4236     +
4237     +static inline void unionfs_unlock_dentry(struct dentry *d)
4238     +{
4239     + BUG_ON(!d);
4240     + mutex_unlock(&UNIONFS_D(d)->lock);
4241     +}
4242     +
4243     +static inline struct dentry *unionfs_lock_parent(struct dentry *d,
4244     + unsigned int subclass)
4245     +{
4246     + struct dentry *p;
4247     +
4248     + BUG_ON(!d);
4249     + p = dget_parent(d);
4250     + if (p != d)
4251     + mutex_lock_nested(&UNIONFS_D(p)->lock, subclass);
4252     + return p;
4253     +}
4254     +
4255     +static inline void unionfs_unlock_parent(struct dentry *d, struct dentry *p)
4256     +{
4257     + BUG_ON(!d);
4258     + BUG_ON(!p);
4259     + if (p != d) {
4260     + BUG_ON(!mutex_is_locked(&UNIONFS_D(p)->lock));
4261     + mutex_unlock(&UNIONFS_D(p)->lock);
4262     + }
4263     + dput(p);
4264     +}
4265     +
4266     +static inline void verify_locked(struct dentry *d)
4267     +{
4268     + BUG_ON(!d);
4269     + BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock));
4270     +}
4271     +
4272     +/* macros to put lower objects */
4273     +
4274     +/*
4275     + * iput lower inodes of a unionfs inode, from bstart to bend. If
4276     + * @free_lower is true, then also kfree the memory used to hold the lower
4277     + * object pointers.
4278     + */
4279     +static inline void iput_lowers(struct inode *inode,
4280     + int bstart, int bend, bool free_lower)
4281     +{
4282     + struct inode *lower_inode;
4283     + int bindex;
4284     +
4285     + BUG_ON(!inode);
4286     + BUG_ON(!UNIONFS_I(inode));
4287     + BUG_ON(bstart < 0);
4288     +
4289     + for (bindex = bstart; bindex <= bend; bindex++) {
4290     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4291     + if (lower_inode) {
4292     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
4293     + /* see Documentation/filesystems/unionfs/issues.txt */
4294     + lockdep_off();
4295     + iput(lower_inode);
4296     + lockdep_on();
4297     + }
4298     + }
4299     +
4300     + if (free_lower) {
4301     + kfree(UNIONFS_I(inode)->lower_inodes);
4302     + UNIONFS_I(inode)->lower_inodes = NULL;
4303     + }
4304     +}
4305     +
4306     +/* iput all lower inodes, and reset start/end branch indices to -1 */
4307     +static inline void iput_lowers_all(struct inode *inode, bool free_lower)
4308     +{
4309     + int bstart, bend;
4310     +
4311     + BUG_ON(!inode);
4312     + BUG_ON(!UNIONFS_I(inode));
4313     + bstart = ibstart(inode);
4314     + bend = ibend(inode);
4315     + BUG_ON(bstart < 0);
4316     +
4317     + iput_lowers(inode, bstart, bend, free_lower);
4318     + ibstart(inode) = ibend(inode) = -1;
4319     +}
4320     +
4321     +/*
4322     + * dput/mntput all lower dentries and vfsmounts of a unionfs dentry, from
4323     + * bstart to bend. If @free_lower is true, then also kfree the memory used
4324     + * to hold the lower object pointers.
4325     + *
4326     + * XXX: implement using path_put VFS macros
4327     + */
4328     +static inline void path_put_lowers(struct dentry *dentry,
4329     + int bstart, int bend, bool free_lower)
4330     +{
4331     + struct dentry *lower_dentry;
4332     + struct vfsmount *lower_mnt;
4333     + int bindex;
4334     +
4335     + BUG_ON(!dentry);
4336     + BUG_ON(!UNIONFS_D(dentry));
4337     + BUG_ON(bstart < 0);
4338     +
4339     + for (bindex = bstart; bindex <= bend; bindex++) {
4340     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4341     + if (lower_dentry) {
4342     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
4343     + dput(lower_dentry);
4344     + }
4345     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
4346     + if (lower_mnt) {
4347     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
4348     + mntput(lower_mnt);
4349     + }
4350     + }
4351     +
4352     + if (free_lower) {
4353     + kfree(UNIONFS_D(dentry)->lower_paths);
4354     + UNIONFS_D(dentry)->lower_paths = NULL;
4355     + }
4356     +}
4357     +
4358     +/*
4359     + * dput/mntput all lower dentries and vfsmounts, and reset start/end branch
4360     + * indices to -1.
4361     + */
4362     +static inline void path_put_lowers_all(struct dentry *dentry, bool free_lower)
4363     +{
4364     + int bstart, bend;
4365     +
4366     + BUG_ON(!dentry);
4367     + BUG_ON(!UNIONFS_D(dentry));
4368     + bstart = dbstart(dentry);
4369     + bend = dbend(dentry);
4370     + BUG_ON(bstart < 0);
4371     +
4372     + path_put_lowers(dentry, bstart, bend, free_lower);
4373     + dbstart(dentry) = dbend(dentry) = -1;
4374     +}
4375     +
4376     +#endif /* not _FANOUT_H */
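The branch_id_to_idx() helper in the header above can be read as a plain linear scan over the branch table. The following is a minimal userspace sketch of that idea, not the kernel code: struct demo_sb and demo_branch_id_to_idx are hypothetical names standing in for the superblock, sbmax() and branch_id(), and the NULL check is an addition made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the per-superblock branch table. */
struct demo_sb {
	int nbranches;   /* plays the role of sbmax(sb) */
	const int *ids;  /* ids[i] plays the role of branch_id(sb, i) */
};

/*
 * Linear scan in the style of branch_id_to_idx(): return the current
 * index of the branch carrying @id, or -1 if no branch has that id.
 */
static int demo_branch_id_to_idx(const struct demo_sb *sb, int id)
{
	int i;

	assert(sb != NULL); /* sketch-only check, loosely mirroring BUG_ON() */
	for (i = 0; i < sb->nbranches; i++) {
		if (sb->ids[i] == id)
			return i;
	}
	return -1;
}
```

With a table of ids {7, 3, 9}, id 9 sits at index 2. If the branch with id 3 is removed and the table becomes {7, 9}, the same id resolves to index 1, which is the shift the header comment describes for open files.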
4377     diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
4378     new file mode 100644
4379     index 0000000..1c694c3
4380     --- /dev/null
4381     +++ b/fs/unionfs/file.c
4382     @@ -0,0 +1,382 @@
4383     +/*
4384     + * Copyright (c) 2003-2010 Erez Zadok
4385     + * Copyright (c) 2003-2006 Charles P. Wright
4386     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4387     + * Copyright (c) 2005-2006 Junjiro Okajima
4388     + * Copyright (c) 2005 Arun M. Krishnakumar
4389     + * Copyright (c) 2004-2006 David P. Quigley
4390     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4391     + * Copyright (c) 2003 Puja Gupta
4392     + * Copyright (c) 2003 Harikesavan Krishnan
4393     + * Copyright (c) 2003-2010 Stony Brook University
4394     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
4395     + *
4396     + * This program is free software; you can redistribute it and/or modify
4397     + * it under the terms of the GNU General Public License version 2 as
4398     + * published by the Free Software Foundation.
4399     + */
4400     +
4401     +#include "union.h"
4402     +
4403     +static ssize_t unionfs_read(struct file *file, char __user *buf,
4404     + size_t count, loff_t *ppos)
4405     +{
4406     + int err;
4407     + struct file *lower_file;
4408     + struct dentry *dentry = file->f_path.dentry;
4409     + struct dentry *parent;
4410     +
4411     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4412     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4413     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4414     +
4415     + err = unionfs_file_revalidate(file, parent, false);
4416     + if (unlikely(err))
4417     + goto out;
4418     +
4419     + lower_file = unionfs_lower_file(file);
4420     + err = vfs_read(lower_file, buf, count, ppos);
4421     + /* update our inode atime upon a successful lower read */
4422     + if (err >= 0) {
4423     + fsstack_copy_attr_atime(dentry->d_inode,
4424     + lower_file->f_path.dentry->d_inode);
4425     + unionfs_check_file(file);
4426     + }
4427     +
4428     +out:
4429     + unionfs_unlock_dentry(dentry);
4430     + unionfs_unlock_parent(dentry, parent);
4431     + unionfs_read_unlock(dentry->d_sb);
4432     + return err;
4433     +}
4434     +
4435     +static ssize_t unionfs_write(struct file *file, const char __user *buf,
4436     + size_t count, loff_t *ppos)
4437     +{
4438     + int err = 0;
4439     + struct file *lower_file;
4440     + struct dentry *dentry = file->f_path.dentry;
4441     + struct dentry *parent;
4442     +
4443     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4444     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4445     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4446     +
4447     + err = unionfs_file_revalidate(file, parent, true);
4448     + if (unlikely(err))
4449     + goto out;
4450     +
4451     + lower_file = unionfs_lower_file(file);
4452     + err = vfs_write(lower_file, buf, count, ppos);
4453     + /* update our inode times+sizes upon a successful lower write */
4454     + if (err >= 0) {
4455     + fsstack_copy_inode_size(dentry->d_inode,
4456     + lower_file->f_path.dentry->d_inode);
4457     + fsstack_copy_attr_times(dentry->d_inode,
4458     + lower_file->f_path.dentry->d_inode);
4459     + UNIONFS_F(file)->wrote_to_file = true; /* for delayed copyup */
4460     + unionfs_check_file(file);
4461     + }
4462     +
4463     +out:
4464     + unionfs_unlock_dentry(dentry);
4465     + unionfs_unlock_parent(dentry, parent);
4466     + unionfs_read_unlock(dentry->d_sb);
4467     + return err;
4468     +}
4469     +
4470     +static int unionfs_file_readdir(struct file *file, void *dirent,
4471     + filldir_t filldir)
4472     +{
4473     + return -ENOTDIR;
4474     +}
4475     +
4476     +static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
4477     +{
4478     + int err = 0;
4479     + bool willwrite;
4480     + struct file *lower_file;
4481     + struct dentry *dentry = file->f_path.dentry;
4482     + struct dentry *parent;
4483     + const struct vm_operations_struct *saved_vm_ops = NULL;
4484     +
4485     + /*
4486     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
4487     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
4488     + * has been causing false positives in file system stacking layers.
4489     + * In particular, our ->mmap is called after sys_mmap2 already holds
4490     + * mmap_sem, then we lock our own mutexes; but earlier, it's
4491     + * possible for lockdep to have locked our mutexes first, and then
4492     + * we call a lower ->readdir which could call might_fault. The
4493     + * different ordering of the locks is what lockdep complains about
4494     + * -- unnecessarily. Therefore, we have no choice but to turn
4495     + * lockdep off temporarily here. Note: the comments inside
4496     + * might_sleep also suggest that it would have been nicer to
4497     + * annotate only the paths that need that might_lock_read.
4498     + */
4499     + lockdep_off();
4500     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4501     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4502     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4503     +
4504     + /* This might be deferred to mmap's writepage */
4505     + willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
4506     + err = unionfs_file_revalidate(file, parent, willwrite);
4507     + if (unlikely(err))
4508     + goto out;
4509     + unionfs_check_file(file);
4510     +
4511     + /*
4512     + * File systems which do not implement ->writepage may use
4513     + * generic_file_readonly_mmap as their ->mmap op. If you call
4514     + * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
4515     + * But we cannot call the lower ->mmap op, so we can't tell that
4516     + * writeable mappings won't work. Therefore, our only choice is to
4517     + * check if the lower file system supports the ->writepage, and if
4518     + * not, return EINVAL (the same error that
4519     + * generic_file_readonly_mmap returns in that case).
4520     + */
4521     + lower_file = unionfs_lower_file(file);
4522     + if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
4523     + err = -EINVAL;
4524     + printk(KERN_ERR "unionfs: branch %d file system does not "
4525     + "support writeable mmap\n", fbstart(file));
4526     + goto out;
4527     + }
4528     +
4529     + /*
4530     + * find and save lower vm_ops.
4531     + *
4532     + * XXX: the VFS should have a cleaner way of finding the lower vm_ops
4533     + */
4534     + if (!UNIONFS_F(file)->lower_vm_ops) {
4535     + err = lower_file->f_op->mmap(lower_file, vma);
4536     + if (err) {
4537     + printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
4538     + goto out;
4539     + }
4540     + saved_vm_ops = vma->vm_ops;
4541     + err = do_munmap(current->mm, vma->vm_start,
4542     + vma->vm_end - vma->vm_start);
4543     + if (err) {
4544     + printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
4545     + goto out;
4546     + }
4547     + }
4548     +
4549     + file->f_mapping->a_ops = &unionfs_dummy_aops;
4550     + err = generic_file_mmap(file, vma);
4551     + file->f_mapping->a_ops = &unionfs_aops;
4552     + if (err) {
4553     + printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
4554     + goto out;
4555     + }
4556     + vma->vm_ops = &unionfs_vm_ops;
4557     + if (!UNIONFS_F(file)->lower_vm_ops)
4558     + UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
4559     +
4560     +out:
4561     + if (!err) {
4562     + /* copyup could cause parent dir times to change */
4563     + unionfs_copy_attr_times(parent->d_inode);
4564     + unionfs_check_file(file);
4565     + }
4566     + unionfs_unlock_dentry(dentry);
4567     + unionfs_unlock_parent(dentry, parent);
4568     + unionfs_read_unlock(dentry->d_sb);
4569     + lockdep_on();
4570     + return err;
4571     +}
4572     +
4573     +int unionfs_fsync(struct file *file, int datasync)
4574     +{
4575     + int bindex, bstart, bend;
4576     + struct file *lower_file;
4577     + struct dentry *dentry = file->f_path.dentry;
4578     + struct dentry *lower_dentry;
4579     + struct dentry *parent;
4580     + struct inode *lower_inode, *inode;
4581     + int err = -EINVAL;
4582     +
4583     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4584     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4585     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4586     +
4587     + err = unionfs_file_revalidate(file, parent, true);
4588     + if (unlikely(err))
4589     + goto out;
4590     + unionfs_check_file(file);
4591     +
4592     + bstart = fbstart(file);
4593     + bend = fbend(file);
4594     + if (bstart < 0 || bend < 0)
4595     + goto out;
4596     +
4597     + inode = dentry->d_inode;
4598     + if (unlikely(!inode)) {
4599     + printk(KERN_ERR
4600     + "unionfs: null inode in unionfs_fsync\n");
4601     + goto out;
4602     + }
4603     + for (bindex = bstart; bindex <= bend; bindex++) {
4604     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4605     + if (!lower_inode || !lower_inode->i_fop->fsync)
4606     + continue;
4607     + lower_file = unionfs_lower_file_idx(file, bindex);
4608     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4609     + mutex_lock(&lower_inode->i_mutex);
4610     + err = lower_inode->i_fop->fsync(lower_file, datasync);
4611     + if (!err && bindex == bstart)
4612     + fsstack_copy_attr_times(inode, lower_inode);
4613     + mutex_unlock(&lower_inode->i_mutex);
4614     + if (err)
4615     + goto out;
4616     + }
4617     +
4618     +out:
4619     + if (!err)
4620     + unionfs_check_file(file);
4621     + unionfs_unlock_dentry(dentry);
4622     + unionfs_unlock_parent(dentry, parent);
4623     + unionfs_read_unlock(dentry->d_sb);
4624     + return err;
4625     +}
4626     +
4627     +int unionfs_fasync(int fd, struct file *file, int flag)
4628     +{
4629     + int bindex, bstart, bend;
4630     + struct file *lower_file;
4631     + struct dentry *dentry = file->f_path.dentry;
4632     + struct dentry *parent;
4633     + struct inode *lower_inode, *inode;
4634     + int err = 0;
4635     +
4636     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4637     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4638     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4639     +
4640     + err = unionfs_file_revalidate(file, parent, true);
4641     + if (unlikely(err))
4642     + goto out;
4643     + unionfs_check_file(file);
4644     +
4645     + bstart = fbstart(file);
4646     + bend = fbend(file);
4647     + if (bstart < 0 || bend < 0)
4648     + goto out;
4649     +
4650     + inode = dentry->d_inode;
4651     + if (unlikely(!inode)) {
4652     + printk(KERN_ERR
4653     + "unionfs: null inode in unionfs_fasync\n");
4654     + goto out;
4655     + }
4656     + for (bindex = bstart; bindex <= bend; bindex++) {
4657     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4658     + if (!lower_inode || !lower_inode->i_fop->fasync)
4659     + continue;
4660     + lower_file = unionfs_lower_file_idx(file, bindex);
4661     + mutex_lock(&lower_inode->i_mutex);
4662     + err = lower_inode->i_fop->fasync(fd, lower_file, flag);
4663     + if (!err && bindex == bstart)
4664     + fsstack_copy_attr_times(inode, lower_inode);
4665     + mutex_unlock(&lower_inode->i_mutex);
4666     + if (err)
4667     + goto out;
4668     + }
4669     +
4670     +out:
4671     + if (!err)
4672     + unionfs_check_file(file);
4673     + unionfs_unlock_dentry(dentry);
4674     + unionfs_unlock_parent(dentry, parent);
4675     + unionfs_read_unlock(dentry->d_sb);
4676     + return err;
4677     +}
4678     +
4679     +static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
4680     + struct pipe_inode_info *pipe, size_t len,
4681     + unsigned int flags)
4682     +{
4683     + ssize_t err;
4684     + struct file *lower_file;
4685     + struct dentry *dentry = file->f_path.dentry;
4686     + struct dentry *parent;
4687     +
4688     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4689     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4690     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4691     +
4692     + err = unionfs_file_revalidate(file, parent, false);
4693     + if (unlikely(err))
4694     + goto out;
4695     +
4696     + lower_file = unionfs_lower_file(file);
4697     + err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
4698     + /* update our inode atime upon a successful lower splice-read */
4699     + if (err >= 0) {
4700     + fsstack_copy_attr_atime(dentry->d_inode,
4701     + lower_file->f_path.dentry->d_inode);
4702     + unionfs_check_file(file);
4703     + }
4704     +
4705     +out:
4706     + unionfs_unlock_dentry(dentry);
4707     + unionfs_unlock_parent(dentry, parent);
4708     + unionfs_read_unlock(dentry->d_sb);
4709     + return err;
4710     +}
4711     +
4712     +static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
4713     + struct file *file, loff_t *ppos,
4714     + size_t len, unsigned int flags)
4715     +{
4716     + ssize_t err = 0;
4717     + struct file *lower_file;
4718     + struct dentry *dentry = file->f_path.dentry;
4719     + struct dentry *parent;
4720     +
4721     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4722     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4723     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4724     +
4725     + err = unionfs_file_revalidate(file, parent, true);
4726     + if (unlikely(err))
4727     + goto out;
4728     +
4729     + lower_file = unionfs_lower_file(file);
4730     + err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
4731     + /* update our inode times+sizes upon a successful lower write */
4732     + if (err >= 0) {
4733     + fsstack_copy_inode_size(dentry->d_inode,
4734     + lower_file->f_path.dentry->d_inode);
4735     + fsstack_copy_attr_times(dentry->d_inode,
4736     + lower_file->f_path.dentry->d_inode);
4737     + unionfs_check_file(file);
4738     + }
4739     +
4740     +out:
4741     + unionfs_unlock_dentry(dentry);
4742     + unionfs_unlock_parent(dentry, parent);
4743     + unionfs_read_unlock(dentry->d_sb);
4744     + return err;
4745     +}
4746     +
4747     +struct file_operations unionfs_main_fops = {
4748     + .llseek = generic_file_llseek,
4749     + .read = unionfs_read,
4750     + .write = unionfs_write,
4751     + .readdir = unionfs_file_readdir,
4752     + .unlocked_ioctl = unionfs_ioctl,
4753     +#ifdef CONFIG_COMPAT
4754     + .compat_ioctl = unionfs_ioctl,
4755     +#endif
4756     + .mmap = unionfs_mmap,
4757     + .open = unionfs_open,
4758     + .flush = unionfs_flush,
4759     + .release = unionfs_file_release,
4760     + .fsync = unionfs_fsync,
4761     + .fasync = unionfs_fasync,
4762     + .splice_read = unionfs_splice_read,
4763     + .splice_write = unionfs_splice_write,
4764     +};
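The willwrite test in unionfs_mmap() above uses the form (flags | mask) == flags, which is true only when every bit of the mask is already set in flags. The following is a small userspace sketch of that check. The DEMO_VM_* values are placeholders chosen for illustration (the kernel headers define the real VM_WRITE and VM_SHARED bits), and demo_willwrite is a hypothetical name.

```c
#include <stdbool.h>

/* Placeholder flag bits for this sketch only; not the kernel's definitions. */
#define DEMO_VM_WRITE  0x2UL
#define DEMO_VM_SHARED 0x8UL

/*
 * True only if the mapping is both shared and writable. OR-ing the mask
 * into the flags changes nothing exactly when all mask bits are already
 * present, so this is equivalent to (vm_flags & need) == need.
 */
static bool demo_willwrite(unsigned long vm_flags)
{
	const unsigned long need = DEMO_VM_SHARED | DEMO_VM_WRITE;

	return (vm_flags | need) == vm_flags;
}
```

A mapping with only one of the two bits set yields false, so under this reading a private writable mapping does not count as a write for the purposes of the revalidation call.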
4765     diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
4766     new file mode 100644
4767     index 0000000..4c36f16
4768     --- /dev/null
4769     +++ b/fs/unionfs/inode.c
4770     @@ -0,0 +1,1061 @@
4771     +/*
4772     + * Copyright (c) 2003-2010 Erez Zadok
4773     + * Copyright (c) 2003-2006 Charles P. Wright
4774     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4775     + * Copyright (c) 2005-2006 Junjiro Okajima
4776     + * Copyright (c) 2005 Arun M. Krishnakumar
4777     + * Copyright (c) 2004-2006 David P. Quigley
4778     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4779     + * Copyright (c) 2003 Puja Gupta
4780     + * Copyright (c) 2003 Harikesavan Krishnan
4781     + * Copyright (c) 2003-2010 Stony Brook University
4782     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
4783     + *
4784     + * This program is free software; you can redistribute it and/or modify
4785     + * it under the terms of the GNU General Public License version 2 as
4786     + * published by the Free Software Foundation.
4787     + */
4788     +
4789     +#include "union.h"
4790     +
4791     +/*
4792     + * Find a writeable branch to create new object in. Checks all writeable
4793     + * branches of the parent inode, from istart to iend order; if none are
4794     + * suitable, also tries branch 0 (which may require a copyup).
4795     + *
4796     + * Return a lower_dentry we can use to create object in, or ERR_PTR.
4797     + */
4798     +static struct dentry *find_writeable_branch(struct inode *parent,
4799     + struct dentry *dentry)
4800     +{
4801     + int err = -EINVAL;
4802     + int bindex, istart, iend;
4803     + struct dentry *lower_dentry = NULL;
4804     +
4805     + istart = ibstart(parent);
4806     + iend = ibend(parent);
4807     + if (istart < 0)
4808     + goto out;
4809     +
4810     +begin:
4811     + for (bindex = istart; bindex <= iend; bindex++) {
4812     + /* skip non-writeable branches */
4813     + err = is_robranch_super(dentry->d_sb, bindex);
4814     + if (err) {
4815     + err = -EROFS;
4816     + continue;
4817     + }
4818     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4819     + if (!lower_dentry)
4820     + continue;
4821     + /*
4822     + * check for whiteouts in writeable branch, and remove them
4823     + * if necessary.
4824     + */
4825     + err = check_unlink_whiteout(dentry, lower_dentry, bindex);
4826     + if (err > 0) /* ignore if whiteout found and removed */
4827     + err = 0;
4828     + if (err)
4829     + continue;
4830     + /* if we get here, we can write to the branch */
4831     + break;
4832     + }
4833     + /*
4834     + * If istart wasn't already branch 0, and we got any error, then try
4835     + * branch 0 (which may require copyup)
4836     + */
4837     + if (err && istart > 0) {
4838     + istart = iend = 0;
4839     + goto begin;
4840     + }
4841     +
4842     + /*
4843     + * If we tried even branch 0, and still got an error, abort. But if
4844     + * the error was an EROFS, then we should try to copyup.
4845     + */
4846     + if (err && err != -EROFS)
4847     + goto out;
4848     +
4849     + /*
4850     + * If we get here, then check if copyup needed. If lower_dentry is
4851     + * NULL, create the entire dentry directory structure in branch 0.
4852     + */
4853     + if (!lower_dentry) {
4854     + bindex = 0;
4855     + lower_dentry = create_parents(parent, dentry,
4856     + dentry->d_name.name, bindex);
4857     + if (IS_ERR(lower_dentry)) {
4858     + err = PTR_ERR(lower_dentry);
4859     + goto out;
4860     + }
4861     + }
4862     + err = 0; /* all's well */
4863     +out:
4864     + if (err)
4865     + return ERR_PTR(err);
4866     + return lower_dentry;
4867     +}
4868     +
4869     +static int unionfs_create(struct inode *dir, struct dentry *dentry,
4870     + int mode, struct nameidata *nd_unused)
4871     +{
4872     + int err = 0;
4873     + struct dentry *lower_dentry = NULL;
4874     + struct dentry *lower_parent_dentry = NULL;
4875     + struct dentry *parent;
4876     + int valid = 0;
4877     + struct nameidata lower_nd;
4878     +
4879     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4880     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4881     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4882     +
4883     + valid = __unionfs_d_revalidate(dentry, parent, false);
4884     + if (unlikely(!valid)) {
4885     + err = -ESTALE; /* same as what real_lookup does */
4886     + goto out;
4887     + }
4888     +
4889     + lower_dentry = find_writeable_branch(dir, dentry);
4890     + if (IS_ERR(lower_dentry)) {
4891     + err = PTR_ERR(lower_dentry);
4892     + goto out;
4893     + }
4894     +
4895     + lower_parent_dentry = lock_parent(lower_dentry);
4896     + if (IS_ERR(lower_parent_dentry)) {
4897     + err = PTR_ERR(lower_parent_dentry);
4898     + goto out_unlock;
4899     + }
4900     +
4901     + err = init_lower_nd(&lower_nd, LOOKUP_CREATE);
4902     + if (unlikely(err < 0))
4903     + goto out_unlock;
4904     + err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode,
4905     + &lower_nd);
4906     + release_lower_nd(&lower_nd, err);
4907     +
4908     + if (!err) {
4909     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
4910     + if (!err) {
4911     + unionfs_copy_attr_times(dir);
4912     + fsstack_copy_inode_size(dir,
4913     + lower_parent_dentry->d_inode);
4914     + /* update no. of links on parent directory */
4915     + dir->i_nlink = unionfs_get_nlinks(dir);
4916     + }
4917     + }
4918     +
4919     +out_unlock:
4920     + unlock_dir(lower_parent_dentry);
4921     +out:
4922     + if (!err) {
4923     + unionfs_postcopyup_setmnt(dentry);
4924     + unionfs_check_inode(dir);
4925     + unionfs_check_dentry(dentry);
4926     + }
4927     + unionfs_unlock_dentry(dentry);
4928     + unionfs_unlock_parent(dentry, parent);
4929     + unionfs_read_unlock(dentry->d_sb);
4930     + return err;
4931     +}
4932     +
4933     +/*
4934     + * unionfs_lookup is the only special function which takes a dentry, yet we
4935     + * do NOT want to call __unionfs_d_revalidate_chain because by definition,
4936     + * we don't have a valid dentry here yet.
4937     + */
4938     +static struct dentry *unionfs_lookup(struct inode *dir,
4939     + struct dentry *dentry,
4940     + struct nameidata *nd_unused)
4941     +{
4942     + struct dentry *ret, *parent;
4943     + int err = 0;
4944     +
4945     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4946     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4947     +
4948     + /*
4949     + * As long as we lock/dget the parent, we can skip validating the
4950     + * parent now; we may have to rebuild this dentry on the next
4951     + * ->d_revalidate, however.
4952     + */
4953     +
4954     + /* allocate dentry private data. We free it in ->d_release */
4955     + err = new_dentry_private_data(dentry, UNIONFS_DMUTEX_CHILD);
4956     + if (unlikely(err)) {
4957     + ret = ERR_PTR(err);
4958     + goto out;
4959     + }
4960     +
4961     + ret = unionfs_lookup_full(dentry, parent, INTERPOSE_LOOKUP);
4962     +
4963     + if (!IS_ERR(ret)) {
4964     + if (ret)
4965     + dentry = ret;
4966     + /* lookup_full can return multiple positive dentries */
4967     + if (dentry->d_inode && !S_ISDIR(dentry->d_inode->i_mode)) {
4968     + BUG_ON(dbstart(dentry) < 0);
4969     + unionfs_postcopyup_release(dentry);
4970     + }
4971     + unionfs_copy_attr_times(dentry->d_inode);
4972     + }
4973     +
4974     + unionfs_check_inode(dir);
4975     + if (!IS_ERR(ret))
4976     + unionfs_check_dentry(dentry);
4977     + unionfs_check_dentry(parent);
4978     + unionfs_unlock_dentry(dentry); /* locked in new_dentry_private_data */
4979     +
4980     +out:
4981     + unionfs_unlock_parent(dentry, parent);
4982     + unionfs_read_unlock(dentry->d_sb);
4983     +
4984     + return ret;
4985     +}
4986     +
4987     +static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
4988     + struct dentry *new_dentry)
4989     +{
4990     + int err = 0;
4991     + struct dentry *lower_old_dentry = NULL;
4992     + struct dentry *lower_new_dentry = NULL;
4993     + struct dentry *lower_dir_dentry = NULL;
4994     + struct dentry *old_parent, *new_parent;
4995     + char *name = NULL;
4996     + bool valid;
4997     +
4998     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4999     + old_parent = dget_parent(old_dentry);
5000     + new_parent = dget_parent(new_dentry);
5001     + unionfs_double_lock_parents(old_parent, new_parent);
5002     + unionfs_double_lock_dentry(old_dentry, new_dentry);
5003     +
5004     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
5005     + if (unlikely(!valid)) {
5006     + err = -ESTALE;
5007     + goto out;
5008     + }
5009     + if (new_dentry->d_inode) {
5010     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
5011     + if (unlikely(!valid)) {
5012     + err = -ESTALE;
5013     + goto out;
5014     + }
5015     + }
5016     +
5017     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5018     +
5019     + /* check for a whiteout in new dentry branch, and delete it */
5020     + err = check_unlink_whiteout(new_dentry, lower_new_dentry,
5021     + dbstart(new_dentry));
5022     + if (err > 0) { /* whiteout found and removed successfully */
5023     + lower_dir_dentry = dget_parent(lower_new_dentry);
5024     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
5025     + dput(lower_dir_dentry);
5026     + dir->i_nlink = unionfs_get_nlinks(dir);
5027     + err = 0;
5028     + }
5029     + if (err)
5030     + goto out;
5031     +
5032     + /* check if parent hierarchy is needed, then link in same branch */
5033     + if (dbstart(old_dentry) != dbstart(new_dentry)) {
5034     + lower_new_dentry = create_parents(dir, new_dentry,
5035     + new_dentry->d_name.name,
5036     + dbstart(old_dentry));
5037     + err = PTR_ERR(lower_new_dentry);
5038     + if (IS_COPYUP_ERR(err))
5039     + goto docopyup;
5040     + if (!lower_new_dentry || IS_ERR(lower_new_dentry))
5041     + goto out;
5042     + }
5043     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5044     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5045     +
5046     + BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
5047     + lower_dir_dentry = lock_parent(lower_new_dentry);
5048     + err = is_robranch(old_dentry);
5049     + if (!err) {
5050     + /* see Documentation/filesystems/unionfs/issues.txt */
5051     + lockdep_off();
5052     + err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
5053     + lower_new_dentry);
5054     + lockdep_on();
5055     + }
5056     + unlock_dir(lower_dir_dentry);
5057     +
5058     +docopyup:
5059     + if (IS_COPYUP_ERR(err)) {
5060     + int old_bstart = dbstart(old_dentry);
5061     + int bindex;
5062     +
5063     + for (bindex = old_bstart - 1; bindex >= 0; bindex--) {
5064     + err = copyup_dentry(old_parent->d_inode,
5065     + old_dentry, old_bstart,
5066     + bindex, old_dentry->d_name.name,
5067     + old_dentry->d_name.len, NULL,
5068     + i_size_read(old_dentry->d_inode));
5069     + if (err)
5070     + continue;
5071     + lower_new_dentry =
5072     + create_parents(dir, new_dentry,
5073     + new_dentry->d_name.name,
5074     + bindex);
5075     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5076     + lower_dir_dentry = lock_parent(lower_new_dentry);
5077     + /* see Documentation/filesystems/unionfs/issues.txt */
5078     + lockdep_off();
5079     + /* do vfs_link */
5080     + err = vfs_link(lower_old_dentry,
5081     + lower_dir_dentry->d_inode,
5082     + lower_new_dentry);
5083     + lockdep_on();
5084     + unlock_dir(lower_dir_dentry);
5085     + goto check_link;
5086     + }
5087     + goto out;
5088     + }
5089     +
5090     +check_link:
5091     + if (err || !lower_new_dentry->d_inode)
5092     + goto out;
5093     +
5094     + /* It's a hard link, so use the same inode */
5095     + new_dentry->d_inode = igrab(old_dentry->d_inode);
5096     + d_add(new_dentry, new_dentry->d_inode);
5097     + unionfs_copy_attr_all(dir, lower_new_dentry->d_parent->d_inode);
5098     + fsstack_copy_inode_size(dir, lower_new_dentry->d_parent->d_inode);
5099     +
5100     + /* propagate number of hard-links */
5101     + old_dentry->d_inode->i_nlink = unionfs_get_nlinks(old_dentry->d_inode);
5102     + /* new dentry's ctime may have changed due to hard-link counts */
5103     + unionfs_copy_attr_times(new_dentry->d_inode);
5104     +
5105     +out:
5106     + if (!new_dentry->d_inode)
5107     + d_drop(new_dentry);
5108     +
5109     + kfree(name);
5110     + if (!err)
5111     + unionfs_postcopyup_setmnt(new_dentry);
5112     +
5113     + unionfs_check_inode(dir);
5114     + unionfs_check_dentry(new_dentry);
5115     + unionfs_check_dentry(old_dentry);
5116     +
5117     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
5118     + unionfs_double_unlock_parents(old_parent, new_parent);
5119     + dput(new_parent);
5120     + dput(old_parent);
5121     + unionfs_read_unlock(old_dentry->d_sb);
5122     +
5123     + return err;
5124     +}
5125     +
5126     +static int unionfs_symlink(struct inode *dir, struct dentry *dentry,
5127     + const char *symname)
5128     +{
5129     + int err = 0;
5130     + struct dentry *lower_dentry = NULL;
5131     + struct dentry *wh_dentry = NULL;
5132     + struct dentry *lower_parent_dentry = NULL;
5133     + struct dentry *parent;
5134     + char *name = NULL;
5135     + int valid = 0;
5136     + umode_t mode;
5137     +
5138     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5139     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5140     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5141     +
5142     + valid = __unionfs_d_revalidate(dentry, parent, false);
5143     + if (unlikely(!valid)) {
5144     + err = -ESTALE;
5145     + goto out;
5146     + }
5147     +
5148     + /*
5149     + * It's only a bug if this dentry was not negative and couldn't be
5150     + * revalidated (shouldn't happen).
5151     + */
5152     + BUG_ON(!valid && dentry->d_inode);
5153     +
5154     + lower_dentry = find_writeable_branch(dir, dentry);
5155     + if (IS_ERR(lower_dentry)) {
5156     + err = PTR_ERR(lower_dentry);
5157     + goto out;
5158     + }
5159     +
5160     + lower_parent_dentry = lock_parent(lower_dentry);
5161     + if (IS_ERR(lower_parent_dentry)) {
5162     + err = PTR_ERR(lower_parent_dentry);
5163     + goto out_unlock;
5164     + }
5165     +
5166     + mode = S_IALLUGO;
5167     + err = vfs_symlink(lower_parent_dentry->d_inode, lower_dentry, symname);
5168     + if (!err) {
5169     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5170     + if (!err) {
5171     + unionfs_copy_attr_times(dir);
5172     + fsstack_copy_inode_size(dir,
5173     + lower_parent_dentry->d_inode);
5174     + /* update no. of links on parent directory */
5175     + dir->i_nlink = unionfs_get_nlinks(dir);
5176     + }
5177     + }
5178     +
5179     +out_unlock:
5180     + unlock_dir(lower_parent_dentry);
5181     +out:
5182     + dput(wh_dentry);
5183     + kfree(name);
5184     +
5185     + if (!err) {
5186     + unionfs_postcopyup_setmnt(dentry);
5187     + unionfs_check_inode(dir);
5188     + unionfs_check_dentry(dentry);
5189     + }
5190     + unionfs_unlock_dentry(dentry);
5191     + unionfs_unlock_parent(dentry, parent);
5192     + unionfs_read_unlock(dentry->d_sb);
5193     + return err;
5194     +}
5195     +
5196     +static int unionfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
5197     +{
5198     + int err = 0;
5199     + struct dentry *lower_dentry = NULL;
5200     + struct dentry *lower_parent_dentry = NULL;
5201     + struct dentry *parent;
5202     + int bindex = 0, bstart;
5203     + char *name = NULL;
5204     + int valid;
5205     +
5206     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5207     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5208     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5209     +
5210     + valid = __unionfs_d_revalidate(dentry, parent, false);
5211     + if (unlikely(!valid)) {
5212     + err = -ESTALE; /* same as what real_lookup does */
5213     + goto out;
5214     + }
5215     +
5216     + bstart = dbstart(dentry);
5217     +
5218     + lower_dentry = unionfs_lower_dentry(dentry);
5219     +
5220     + /* check for a whiteout in new dentry branch, and delete it */
5221     + err = check_unlink_whiteout(dentry, lower_dentry, bstart);
5222     + if (err > 0) /* whiteout found and removed successfully */
5223     + err = 0;
5224     + if (err) {
5225     + /* exit if the error returned was NOT -EROFS */
5226     + if (!IS_COPYUP_ERR(err))
5227     + goto out;
5228     + bstart--;
5229     + }
5230     +
5231     + /* check if copyup's needed, and mkdir */
5232     + for (bindex = bstart; bindex >= 0; bindex--) {
5233     + int i;
5234     + int bend = dbend(dentry);
5235     +
5236     + if (is_robranch_super(dentry->d_sb, bindex))
5237     + continue;
5238     +
5239     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5240     + if (!lower_dentry) {
5241     + lower_dentry = create_parents(dir, dentry,
5242     + dentry->d_name.name,
5243     + bindex);
5244     + if (!lower_dentry || IS_ERR(lower_dentry)) {
5245     + printk(KERN_ERR "unionfs: lower dentry "
5246     + "NULL for bindex = %d\n", bindex);
5247     + continue;
5248     + }
5249     + }
5250     +
5251     + lower_parent_dentry = lock_parent(lower_dentry);
5252     +
5253     + if (IS_ERR(lower_parent_dentry)) {
5254     + err = PTR_ERR(lower_parent_dentry);
5255     + goto out;
5256     + }
5257     +
5258     + err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
5259     + mode);
5260     +
5261     + unlock_dir(lower_parent_dentry);
5262     +
5263     + /* did the mkdir succeed? */
5264     + if (err)
5265     + break;
5266     +
5267     + for (i = bindex + 1; i <= bend; i++) {
5268     + /* XXX: use path_put_lowers? */
5269     + if (unionfs_lower_dentry_idx(dentry, i)) {
5270     + dput(unionfs_lower_dentry_idx(dentry, i));
5271     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
5272     + }
5273     + }
5274     + dbend(dentry) = bindex;
5275     +
5276     + /*
5277     + * Only INTERPOSE_LOOKUP can return a value other than 0 on
5278     + * err.
5279     + */
5280     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5281     + if (!err) {
5282     + unionfs_copy_attr_times(dir);
5283     + fsstack_copy_inode_size(dir,
5284     + lower_parent_dentry->d_inode);
5285     +
5286     + /* update number of links on parent directory */
5287     + dir->i_nlink = unionfs_get_nlinks(dir);
5288     + }
5289     +
5290     + err = make_dir_opaque(dentry, dbstart(dentry));
5291     + if (err) {
5292     + printk(KERN_ERR "unionfs: mkdir: error creating "
5293     + ".wh.__dir_opaque: %d\n", err);
5294     + goto out;
5295     + }
5296     +
5297     + /* we are done! */
5298     + break;
5299     + }
5300     +
5301     +out:
5302     + if (!dentry->d_inode)
5303     + d_drop(dentry);
5304     +
5305     + kfree(name);
5306     +
5307     + if (!err) {
5308     + unionfs_copy_attr_times(dentry->d_inode);
5309     + unionfs_postcopyup_setmnt(dentry);
5310     + }
5311     + unionfs_check_inode(dir);
5312     + unionfs_check_dentry(dentry);
5313     + unionfs_unlock_dentry(dentry);
5314     + unionfs_unlock_parent(dentry, parent);
5315     + unionfs_read_unlock(dentry->d_sb);
5316     +
5317     + return err;
5318     +}
5319     +
5320     +static int unionfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
5321     + dev_t dev)
5322     +{
5323     + int err = 0;
5324     + struct dentry *lower_dentry = NULL;
5325     + struct dentry *wh_dentry = NULL;
5326     + struct dentry *lower_parent_dentry = NULL;
5327     + struct dentry *parent;
5328     + char *name = NULL;
5329     + int valid = 0;
5330     +
5331     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5332     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5333     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5334     +
5335     + valid = __unionfs_d_revalidate(dentry, parent, false);
5336     + if (unlikely(!valid)) {
5337     + err = -ESTALE;
5338     + goto out;
5339     + }
5340     +
5341     + /*
5342     + * It's only a bug if this dentry was not negative and couldn't be
5343     + * revalidated (shouldn't happen).
5344     + */
5345     + BUG_ON(!valid && dentry->d_inode);
5346     +
5347     + lower_dentry = find_writeable_branch(dir, dentry);
5348     + if (IS_ERR(lower_dentry)) {
5349     + err = PTR_ERR(lower_dentry);
5350     + goto out;
5351     + }
5352     +
5353     + lower_parent_dentry = lock_parent(lower_dentry);
5354     + if (IS_ERR(lower_parent_dentry)) {
5355     + err = PTR_ERR(lower_parent_dentry);
5356     + goto out_unlock;
5357     + }
5358     +
5359     + err = vfs_mknod(lower_parent_dentry->d_inode, lower_dentry, mode, dev);
5360     + if (!err) {
5361     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5362     + if (!err) {
5363     + unionfs_copy_attr_times(dir);
5364     + fsstack_copy_inode_size(dir,
5365     + lower_parent_dentry->d_inode);
5366     + /* update no. of links on parent directory */
5367     + dir->i_nlink = unionfs_get_nlinks(dir);
5368     + }
5369     + }
5370     +
5371     +out_unlock:
5372     + unlock_dir(lower_parent_dentry);
5373     +out:
5374     + dput(wh_dentry);
5375     + kfree(name);
5376     +
5377     + if (!err) {
5378     + unionfs_postcopyup_setmnt(dentry);
5379     + unionfs_check_inode(dir);
5380     + unionfs_check_dentry(dentry);
5381     + }
5382     + unionfs_unlock_dentry(dentry);
5383     + unionfs_unlock_parent(dentry, parent);
5384     + unionfs_read_unlock(dentry->d_sb);
5385     + return err;
5386     +}
5387     +
5388     +/* requires sb, dentry, and parent to already be locked */
5389     +static int __unionfs_readlink(struct dentry *dentry, char __user *buf,
5390     + int bufsiz)
5391     +{
5392     + int err;
5393     + struct dentry *lower_dentry;
5394     +
5395     + lower_dentry = unionfs_lower_dentry(dentry);
5396     +
5397     + if (!lower_dentry->d_inode->i_op ||
5398     + !lower_dentry->d_inode->i_op->readlink) {
5399     + err = -EINVAL;
5400     + goto out;
5401     + }
5402     +
5403     + err = lower_dentry->d_inode->i_op->readlink(lower_dentry,
5404     + buf, bufsiz);
5405     + if (err >= 0)
5406     + fsstack_copy_attr_atime(dentry->d_inode,
5407     + lower_dentry->d_inode);
5408     +
5409     +out:
5410     + return err;
5411     +}
5412     +
5413     +static int unionfs_readlink(struct dentry *dentry, char __user *buf,
5414     + int bufsiz)
5415     +{
5416     + int err;
5417     + struct dentry *parent;
5418     +
5419     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5420     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5421     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5422     +
5423     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5424     + err = -ESTALE;
5425     + goto out;
5426     + }
5427     +
5428     + err = __unionfs_readlink(dentry, buf, bufsiz);
5429     +
5430     +out:
5431     + unionfs_check_dentry(dentry);
5432     + unionfs_unlock_dentry(dentry);
5433     + unionfs_unlock_parent(dentry, parent);
5434     + unionfs_read_unlock(dentry->d_sb);
5435     +
5436     + return err;
5437     +}
5438     +
5439     +static void *unionfs_follow_link(struct dentry *dentry, struct nameidata *nd)
5440     +{
5441     + char *buf;
5442     + int len = PAGE_SIZE, err;
5443     + mm_segment_t old_fs;
5444     + struct dentry *parent;
5445     +
5446     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5447     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5448     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5449     +
5450     + /* This is freed by the put_link method assuming a successful call. */
5451     + buf = kmalloc(len, GFP_KERNEL);
5452     + if (unlikely(!buf)) {
5453     + err = -ENOMEM;
5454     + goto out;
5455     + }
5456     +
5457     + /* read the symlink, and then we will follow it */
5458     + old_fs = get_fs();
5459     + set_fs(KERNEL_DS);
5460     + err = __unionfs_readlink(dentry, buf, len);
5461     + set_fs(old_fs);
5462     + if (err < 0) {
5463     + kfree(buf);
5464     + buf = NULL;
5465     + goto out;
5466     + }
5467     + buf[err] = 0;
5468     + nd_set_link(nd, buf);
5469     + err = 0;
5470     +
5471     +out:
5472     + if (err >= 0) {
5473     + unionfs_check_nd(nd);
5474     + unionfs_check_dentry(dentry);
5475     + }
5476     +
5477     + unionfs_unlock_dentry(dentry);
5478     + unionfs_unlock_parent(dentry, parent);
5479     + unionfs_read_unlock(dentry->d_sb);
5480     +
5481     + return ERR_PTR(err);
5482     +}
5483     +
5484     +/* this @nd *IS* still used */
5485     +static void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
5486     + void *cookie)
5487     +{
5488     + struct dentry *parent;
5489     + char *buf;
5490     +
5491     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5492     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5493     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5494     +
5495     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false)))
5496     + printk(KERN_ERR
5497     + "unionfs: put_link failed to revalidate dentry\n");
5498     +
5499     + unionfs_check_dentry(dentry);
5500     +#if 0
5501     + /* XXX: can't run this check b/c this fxn can receive a poisoned 'nd' PTR */
5502     + unionfs_check_nd(nd);
5503     +#endif
5504     + buf = nd_get_link(nd);
5505     + if (!IS_ERR(buf))
5506     + kfree(buf);
5507     + unionfs_unlock_dentry(dentry);
5508     + unionfs_unlock_parent(dentry, parent);
5509     + unionfs_read_unlock(dentry->d_sb);
5510     +}
5511     +
5512     +/*
5513     + * This is a variant of fs/namei.c:permission() or inode_permission() which
5514     + * skips over EROFS tests (because we perform copyup on EROFS).
5515     + */
5516     +static int __inode_permission(struct inode *inode, int mask)
5517     +{
5518     + int retval;
5519     +
5520     + /* nobody gets write access to an immutable file */
5521     + if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
5522     + return -EACCES;
5523     +
5524     + /* Ordinary permission routines do not understand MAY_APPEND. */
5525     + if (inode->i_op && inode->i_op->permission) {
5526     + retval = inode->i_op->permission(inode, mask);
5527     + if (!retval) {
5528     + /*
5529     + * Exec permission on a regular file is denied if none
5530     + * of the execute bits are set.
5531     + *
5532     + * This check should be done by the ->permission()
5533     + * method.
5534     + */
5535     + if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode) &&
5536     + !(inode->i_mode & S_IXUGO))
5537     + return -EACCES;
5538     + }
5539     + } else {
5540     + retval = generic_permission(inode, mask, NULL);
5541     + }
5542     + if (retval)
5543     + return retval;
5544     +
5545     + return security_inode_permission(inode,
5546     + mask & (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND));
5547     +}
5548     +
5549     +/*
5550     + * Don't grab the superblock read-lock in unionfs_permission, which prevents
5551     + * a deadlock with the branch-management "add branch" code (which grabbed
5552     + * the write lock). It is safe to not grab the read lock here, because even
5553     + * with branch management taking place, there is no chance that
5554     + * unionfs_permission, or anything it calls, will use stale branch
5555     + * information.
5556     + */
5557     +static int unionfs_permission(struct inode *inode, int mask)
5558     +{
5559     + struct inode *lower_inode = NULL;
5560     + int err = 0;
5561     + int bindex, bstart, bend;
5562     + const int is_file = !S_ISDIR(inode->i_mode);
5563     + const int write_mask = (mask & MAY_WRITE) && !(mask & MAY_READ);
5564     + struct inode *inode_grabbed = igrab(inode);
5565     + struct dentry *dentry = d_find_alias(inode);
5566     +
5567     + if (dentry)
5568     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5569     +
5570     + if (!UNIONFS_I(inode)->lower_inodes) {
5571     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5572     + err = -ESTALE; /* force revalidate */
5573     + goto out;
5574     + }
5575     + bstart = ibstart(inode);
5576     + bend = ibend(inode);
5577     + if (unlikely(bstart < 0 || bend < 0)) {
5578     + /*
5579     + * With branch-management, we can get a stale inode here.
5580     + * If so, we return ESTALE back to link_path_walk, which
5581     + * would discard the dcache entry and re-lookup the
5582     + * dentry+inode. This should be equivalent to issuing
5583     + * __unionfs_d_revalidate_chain on nd.dentry here.
5584     + */
5585     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5586     + err = -ESTALE; /* force revalidate */
5587     + goto out;
5588     + }
5589     +
5590     + for (bindex = bstart; bindex <= bend; bindex++) {
5591     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
5592     + if (!lower_inode)
5593     + continue;
5594     +
5595     + /*
5596     + * check the condition for D-F-D underlying files/directories,
5597     + * we don't have to check for files, if we are checking for
5598     + * directories.
5599     + */
5600     + if (!is_file && !S_ISDIR(lower_inode->i_mode))
5601     + continue;
5602     +
5603     + /*
5604     + * We check basic permissions, but we ignore any conditions
5605     + * such as readonly file systems or branches marked as
5606     + * readonly, because those conditions should lead to a
5607     + * copyup taking place later on. However, if user never had
5608     + * access to the file, then no copyup could ever take place.
5609     + */
5610     + err = __inode_permission(lower_inode, mask);
5611     + if (err && err != -EACCES && err != -EPERM && bindex > 0) {
5612     + umode_t mode = lower_inode->i_mode;
5613     + if ((is_robranch_super(inode->i_sb, bindex) ||
5614     + __is_rdonly(lower_inode)) &&
5615     + (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5616     + err = 0;
5617     + if (IS_COPYUP_ERR(err))
5618     + err = 0;
5619     + }
5620     +
5621     + /*
5622     + * NFS HACK: NFSv2/3 return EACCES on readonly-exported,
5623     + * locally readonly-mounted file systems, instead of EROFS
5624     + * like other file systems do. So we have no choice here
5625     + * but to intercept this and ignore it for NFS branches
5626     + * marked readonly. Specifically, we avoid using NFS's own
5627     + * "broken" ->permission method, and rely on
5628     + * generic_permission() to do basic checking for us.
5629     + */
5630     + if (err == -EACCES &&
5631     + is_robranch_super(inode->i_sb, bindex) &&
5632     + lower_inode->i_sb->s_magic == NFS_SUPER_MAGIC)
5633     + err = generic_permission(lower_inode, mask, NULL);
5634     +
5635     + /*
5636     + * The permissions are an intersection of the overall directory
5637     + * permissions, so we fail if one fails.
5638     + */
5639     + if (err)
5640     + goto out;
5641     +
5642     + /* only the leftmost file matters. */
5643     + if (is_file || write_mask) {
5644     + if (is_file && write_mask) {
5645     + err = get_write_access(lower_inode);
5646     + if (!err)
5647     + put_write_access(lower_inode);
5648     + }
5649     + break;
5650     + }
5651     + }
5652     + /* sync times which may have changed (asynchronously) below */
5653     + unionfs_copy_attr_times(inode);
5654     +
5655     +out:
5656     + unionfs_check_inode(inode);
5657     + if (dentry) {
5658     + unionfs_unlock_dentry(dentry);
5659     + dput(dentry);
5660     + }
5661     + iput(inode_grabbed);
5662     + return err;
5663     +}
5664     +
5665     +static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
5666     +{
5667     + int err = 0;
5668     + struct dentry *lower_dentry;
5669     + struct dentry *parent;
5670     + struct inode *inode;
5671     + struct inode *lower_inode;
5672     + int bstart, bend, bindex;
5673     + loff_t size;
5674     +
5675     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5676     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5677     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5678     +
5679     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5680     + err = -ESTALE;
5681     + goto out;
5682     + }
5683     +
5684     + bstart = dbstart(dentry);
5685     + bend = dbend(dentry);
5686     + inode = dentry->d_inode;
5687     +
5688     + /*
5689     + * mode change is for clearing setuid/setgid. Allow lower filesystem
5690     + * to reinterpret it in its own way.
5691     + */
5692     + if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
5693     + ia->ia_valid &= ~ATTR_MODE;
5694     +
5695     + lower_dentry = unionfs_lower_dentry(dentry);
5696     + if (!lower_dentry) { /* should never happen after above revalidate */
5697     + err = -EINVAL;
5698     + goto out;
5699     + }
5700     + lower_inode = unionfs_lower_inode(inode);
5701     +
5702     + /* check if user has permission to change lower inode */
5703     + err = inode_change_ok(lower_inode, ia);
5704     + if (err)
5705     + goto out;
5706     +
5707     + /* copyup if the file is on a read only branch */
5708     + if (is_robranch_super(dentry->d_sb, bstart)
5709     + || __is_rdonly(lower_inode)) {
5710     + /* check if we have a branch to copy up to */
5711     + if (bstart <= 0) {
5712     + err = -EACCES;
5713     + goto out;
5714     + }
5715     +
5716     + if (ia->ia_valid & ATTR_SIZE)
5717     + size = ia->ia_size;
5718     + else
5719     + size = i_size_read(inode);
5720     + /* copyup to next available branch */
5721     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
5722     + err = copyup_dentry(parent->d_inode,
5723     + dentry, bstart, bindex,
5724     + dentry->d_name.name,
5725     + dentry->d_name.len,
5726     + NULL, size);
5727     + if (!err)
5728     + break;
5729     + }
5730     + if (err)
5731     + goto out;
5732     + /* get updated lower_dentry/inode after copyup */
5733     + lower_dentry = unionfs_lower_dentry(dentry);
5734     + lower_inode = unionfs_lower_inode(inode);
5735     + }
5736     +
5737     + /*
5738     + * If shrinking, first truncate upper level to cancel writing dirty
5739     + * pages beyond the new eof; and also if its maxbytes is more
5740     + * limiting (fail with -EFBIG before making any change to the lower
5741     + * level). There is no need to vmtruncate the upper level
5742     + * afterwards in the other cases: we fsstack_copy_inode_size from
5743     + * the lower level.
5744     + */
5745     + if (ia->ia_valid & ATTR_SIZE) {
5746     + size = i_size_read(inode);
5747     + if (ia->ia_size < size || (ia->ia_size > size &&
5748     + inode->i_sb->s_maxbytes < lower_inode->i_sb->s_maxbytes)) {
5749     + err = vmtruncate(inode, ia->ia_size);
5750     + if (err)
5751     + goto out;
5752     + }
5753     + }
5754     +
5755     + /* notify the (possibly copied-up) lower inode */
5756     + /*
5757     + * Note: we use lower_dentry->d_inode, because lower_inode may be
5758     + * unlinked (no inode->i_sb and i_ino==0). This happens if someone
5759     + * tries to open(), unlink(), then ftruncate() a file.
5760     + */
5761     + mutex_lock(&lower_dentry->d_inode->i_mutex);
5762     + err = notify_change(lower_dentry, ia);
5763     + mutex_unlock(&lower_dentry->d_inode->i_mutex);
5764     + if (err)
5765     + goto out;
5766     +
5767     + /* get attributes from the first lower inode */
5768     + if (ibstart(inode) >= 0)
5769     + unionfs_copy_attr_all(inode, lower_inode);
5770     + /*
5771     + * unionfs_copy_attr_all will copy the lower times to our inode if
5772     + * the lower ones are newer (useful for cache coherency). However,
5773     + * ->setattr is the only place in which we may have to copy the
5774     + * lower inode times absolutely, to support utimes(2).
5775     + */
5776     + if (ia->ia_valid & ATTR_MTIME_SET)
5777     + inode->i_mtime = lower_inode->i_mtime;
5778     + if (ia->ia_valid & ATTR_CTIME)
5779     + inode->i_ctime = lower_inode->i_ctime;
5780     + if (ia->ia_valid & ATTR_ATIME_SET)
5781     + inode->i_atime = lower_inode->i_atime;
5782     + fsstack_copy_inode_size(inode, lower_inode);
5783     +
5784     +out:
5785     + if (!err)
5786     + unionfs_check_dentry(dentry);
5787     + unionfs_unlock_dentry(dentry);
5788     + unionfs_unlock_parent(dentry, parent);
5789     + unionfs_read_unlock(dentry->d_sb);
5790     +
5791     + return err;
5792     +}
5793     +
5794     +struct inode_operations unionfs_symlink_iops = {
5795     + .readlink = unionfs_readlink,
5796     + .permission = unionfs_permission,
5797     + .follow_link = unionfs_follow_link,
5798     + .setattr = unionfs_setattr,
5799     + .put_link = unionfs_put_link,
5800     +};
5801     +
5802     +struct inode_operations unionfs_dir_iops = {
5803     + .create = unionfs_create,
5804     + .lookup = unionfs_lookup,
5805     + .link = unionfs_link,
5806     + .unlink = unionfs_unlink,
5807     + .symlink = unionfs_symlink,
5808     + .mkdir = unionfs_mkdir,
5809     + .rmdir = unionfs_rmdir,
5810     + .mknod = unionfs_mknod,
5811     + .rename = unionfs_rename,
5812     + .permission = unionfs_permission,
5813     + .setattr = unionfs_setattr,
5814     +#ifdef CONFIG_UNION_FS_XATTR
5815     + .setxattr = unionfs_setxattr,
5816     + .getxattr = unionfs_getxattr,
5817     + .removexattr = unionfs_removexattr,
5818     + .listxattr = unionfs_listxattr,
5819     +#endif /* CONFIG_UNION_FS_XATTR */
5820     +};
5821     +
5822     +struct inode_operations unionfs_main_iops = {
5823     + .permission = unionfs_permission,
5824     + .setattr = unionfs_setattr,
5825     +#ifdef CONFIG_UNION_FS_XATTR
5826     + .setxattr = unionfs_setxattr,
5827     + .getxattr = unionfs_getxattr,
5828     + .removexattr = unionfs_removexattr,
5829     + .listxattr = unionfs_listxattr,
5830     +#endif /* CONFIG_UNION_FS_XATTR */
5831     +};
5832     diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
5833     new file mode 100644
5834     index 0000000..b63c17e
5835     --- /dev/null
5836     +++ b/fs/unionfs/lookup.c
5837     @@ -0,0 +1,569 @@
5838     +/*
5839     + * Copyright (c) 2003-2010 Erez Zadok
5840     + * Copyright (c) 2003-2006 Charles P. Wright
5841     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
5842     + * Copyright (c) 2005-2006 Junjiro Okajima
5843     + * Copyright (c) 2005 Arun M. Krishnakumar
5844     + * Copyright (c) 2004-2006 David P. Quigley
5845     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
5846     + * Copyright (c) 2003 Puja Gupta
5847     + * Copyright (c) 2003 Harikesavan Krishnan
5848     + * Copyright (c) 2003-2010 Stony Brook University
5849     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
5850     + *
5851     + * This program is free software; you can redistribute it and/or modify
5852     + * it under the terms of the GNU General Public License version 2 as
5853     + * published by the Free Software Foundation.
5854     + */
5855     +
5856     +#include "union.h"
5857     +
5858     +/*
5859     + * Lookup one path component @name relative to a <base,mnt> path pair.
5860     + * Behaves nearly the same as lookup_one_len (i.e., return negative dentry
5861     + * on ENOENT), but uses the @mnt passed, so it can cross bind mounts and
5862     + * other lower mounts properly. If @new_mnt is non-null, the new mnt is
5863     + * stored there. The caller is responsible for releasing the returned
5864     + * @dentry and @new_mnt (dput/mntput/path_put).
5865     + */
5866     +struct dentry *__lookup_one(struct dentry *base, struct vfsmount *mnt,
5867     + const char *name, struct vfsmount **new_mnt)
5868     +{
5869     + struct dentry *dentry = NULL;
5870     + struct nameidata lower_nd;
5871     + int err;
5872     +
5873     + /* we use flags=0 to get basic lookup */
5874     + err = vfs_path_lookup(base, mnt, name, 0, &lower_nd);
5875     +
5876     + switch (err) {
5877     + case 0: /* no error */
5878     + dentry = lower_nd.path.dentry;
5879     + if (new_mnt)
5880     + *new_mnt = lower_nd.path.mnt; /* rc already inc'ed */
5881     + break;
5882     + case -ENOENT:
5883     + /*
5884     + * We don't consider ENOENT an error, and we want to return
5885     + * a negative dentry (ala lookup_one_len). As we know
5886     + * there was no inode for this name before (-ENOENT), then
5887     + * it's safe to call lookup_one_len (which doesn't take a
5888     + * vfsmount).
5889     + */
5890     + dentry = lookup_lck_len(name, base, strlen(name));
5891     + if (new_mnt)
5892     + *new_mnt = mntget(lower_nd.path.mnt);
5893     + break;
5894     + default: /* all other real errors */
5895     + dentry = ERR_PTR(err);
5896     + break;
5897     + }
5898     +
5899     + return dentry;
5900     +}
5901     +
5902     +/*
5903     + * This is a utility function that fills in a unionfs dentry.
5904     + * Caller must lock this dentry with unionfs_lock_dentry.
5905     + *
5906     + * Returns: 0 (ok), or -ERRNO if an error occurred.
5907     + * XXX: get rid of _partial_lookup and make callers call _lookup_full directly
5908     + */
5909     +int unionfs_partial_lookup(struct dentry *dentry, struct dentry *parent)
5910     +{
5911     + struct dentry *tmp;
5912     + int err = -ENOSYS;
5913     +
5914     + tmp = unionfs_lookup_full(dentry, parent, INTERPOSE_PARTIAL);
5915     +
5916     + if (!tmp) {
5917     + err = 0;
5918     + goto out;
5919     + }
5920     + if (IS_ERR(tmp)) {
5921     + err = PTR_ERR(tmp);
5922     + goto out;
5923     + }
5924     + /* XXX: need to change the interface */
5925     + BUG_ON(tmp != dentry);
5926     +out:
5927     + return err;
5928     +}
5929     +
5930     +/* The dentry cache is just so we have properly sized dentries. */
5931     +static struct kmem_cache *unionfs_dentry_cachep;
5932     +int unionfs_init_dentry_cache(void)
5933     +{
5934     + unionfs_dentry_cachep =
5935     + kmem_cache_create("unionfs_dentry",
5936     + sizeof(struct unionfs_dentry_info),
5937     + 0, SLAB_RECLAIM_ACCOUNT, NULL);
5938     +
5939     + return (unionfs_dentry_cachep ? 0 : -ENOMEM);
5940     +}
5941     +
5942     +void unionfs_destroy_dentry_cache(void)
5943     +{
5944     + if (unionfs_dentry_cachep)
5945     + kmem_cache_destroy(unionfs_dentry_cachep);
5946     +}
5947     +
5948     +void free_dentry_private_data(struct dentry *dentry)
5949     +{
5950     + if (!dentry || !dentry->d_fsdata)
5951     + return;
5952     + kfree(UNIONFS_D(dentry)->lower_paths);
5953     + UNIONFS_D(dentry)->lower_paths = NULL;
5954     + kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
5955     + dentry->d_fsdata = NULL;
5956     +}
5957     +
5958     +static inline int __realloc_dentry_private_data(struct dentry *dentry)
5959     +{
5960     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
5961     + void *p;
5962     + int size;
5963     +
5964     + BUG_ON(!info);
5965     +
5966     + size = sizeof(struct path) * sbmax(dentry->d_sb);
5967     + p = krealloc(info->lower_paths, size, GFP_ATOMIC);
5968     + if (unlikely(!p))
5969     + return -ENOMEM;
5970     +
5971     + info->lower_paths = p;
5972     +
5973     + info->bstart = -1;
5974     + info->bend = -1;
5975     + info->bopaque = -1;
5976     + info->bcount = sbmax(dentry->d_sb);
5977     + atomic_set(&info->generation,
5978     + atomic_read(&UNIONFS_SB(dentry->d_sb)->generation));
5979     +
5980     + memset(info->lower_paths, 0, size);
5981     +
5982     + return 0;
5983     +}
5984     +
5985     +/* UNIONFS_D(dentry)->lock must be locked */
5986     +int realloc_dentry_private_data(struct dentry *dentry)
5987     +{
5988     + if (!__realloc_dentry_private_data(dentry))
5989     + return 0;
5990     +
5991     + /* lower_paths is released by free_dentry_private_data() below */
5992     + free_dentry_private_data(dentry);
5993     + return -ENOMEM;
5994     +}
5995     +
5996     +/* allocate new dentry private data */
5997     +int new_dentry_private_data(struct dentry *dentry, int subclass)
5998     +{
5999     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6000     +
6001     + BUG_ON(info);
6002     +
6003     + info = kmem_cache_alloc(unionfs_dentry_cachep, GFP_ATOMIC);
6004     + if (unlikely(!info))
6005     + return -ENOMEM;
6006     +
6007     + mutex_init(&info->lock);
6008     + mutex_lock_nested(&info->lock, subclass);
6009     +
6010     + info->lower_paths = NULL;
6011     +
6012     + dentry->d_fsdata = info;
6013     +
6014     + if (!__realloc_dentry_private_data(dentry))
6015     + return 0;
6016     +
6017     + mutex_unlock(&info->lock);
6018     + free_dentry_private_data(dentry);
6019     + return -ENOMEM;
6020     +}
6021     +
6022     +/*
6023     + * scan through the lower dentry objects, and set bstart to reflect the
6024     + * starting branch
6025     + */
6026     +void update_bstart(struct dentry *dentry)
6027     +{
6028     + int bindex;
6029     + int bstart = dbstart(dentry);
6030     + int bend = dbend(dentry);
6031     + struct dentry *lower_dentry;
6032     +
6033     + for (bindex = bstart; bindex <= bend; bindex++) {
6034     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6035     + if (!lower_dentry)
6036     + continue;
6037     + if (lower_dentry->d_inode) {
6038     + dbstart(dentry) = bindex;
6039     + break;
6040     + }
6041     + dput(lower_dentry);
6042     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
6043     + }
6044     +}
6045     +
6046     +
6047     +/*
6048     + * Initialize a nameidata structure (the intent part) we can pass to a lower
6049     + * file system. Returns 0 on success or -error (only -ENOMEM possible).
6050     + * Inside that nd structure, this function may also return an allocated
6051     + * struct file (for open intents). The caller, when done with this nd, must
6052     + * kfree the intent file (using release_lower_nd).
6053     + *
6054     + * XXX: this code, and the callers of this code, should be redone using
6055     + * vfs_path_lookup() when (1) the nameidata structure is refactored into a
6056     + * separate intent-structure, and (2) open_namei() is broken into a VFS-only
6057     + * function and a method that other file systems can call.
6058     + */
6059     +int init_lower_nd(struct nameidata *nd, unsigned int flags)
6060     +{
6061     + int err = 0;
6062     +#ifdef ALLOC_LOWER_ND_FILE
6063     + /*
6064     + * XXX: one day we may need to have the lower return an open file
6065     + * for us. It is not needed in 2.6.23-rc1 for nfs2/nfs3, but may
6066     + * very well be needed for nfs4.
6067     + */
6068     + struct file *file;
6069     +#endif /* ALLOC_LOWER_ND_FILE */
6070     +
6071     + memset(nd, 0, sizeof(struct nameidata));
6072     + if (!flags)
6073     + return err;
6074     +
6075     + switch (flags) {
6076     + case LOOKUP_CREATE:
6077     + nd->intent.open.flags |= O_CREAT;
6078     + /* fall through: shared code for create/open cases */
6079     + case LOOKUP_OPEN:
6080     + nd->flags = flags;
6081     + nd->intent.open.flags |= (FMODE_READ | FMODE_WRITE);
6082     +#ifdef ALLOC_LOWER_ND_FILE
6083     + file = kzalloc(sizeof(struct file), GFP_KERNEL);
6084     + if (unlikely(!file)) {
6085     + err = -ENOMEM;
6086     + break; /* exit switch statement and thus return */
6087     + }
6088     + nd->intent.open.file = file;
6089     +#endif /* ALLOC_LOWER_ND_FILE */
6090     + break;
6091     + default:
6092     + /*
6093     + * We should never get here, for now.
6094     + * We can add new cases here later on.
6095     + */
6096     + pr_debug("unionfs: unknown nameidata flag 0x%x\n", flags);
6097     + BUG();
6098     + break;
6099     + }
6100     +
6101     + return err;
6102     +}
6103     +
6104     +void release_lower_nd(struct nameidata *nd, int err)
6105     +{
6106     + if (!nd->intent.open.file)
6107     + return;
6108     + else if (!err)
6109     + release_open_intent(nd);
6110     +#ifdef ALLOC_LOWER_ND_FILE
6111     + kfree(nd->intent.open.file);
6112     +#endif /* ALLOC_LOWER_ND_FILE */
6113     +}
6114     +
6115     +/*
6116     + * Main (and complex) driver function for Unionfs's lookup
6117     + *
6118     + * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
6119     + * PTR if d_splice returned a different dentry.
6120     + *
6121     + * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
6122     + * inode info must be locked. If lookupmode is INTERPOSE_LOOKUP (i.e., a
6123     + * newly looked-up dentry), then unionfs_lookup_full will return a locked
6124     + * dentry's info, which the caller must unlock.
6125     + */
6126     +struct dentry *unionfs_lookup_full(struct dentry *dentry,
6127     + struct dentry *parent, int lookupmode)
6128     +{
6129     + int err = 0;
6130     + struct dentry *lower_dentry = NULL;
6131     + struct vfsmount *lower_mnt;
6132     + struct vfsmount *lower_dir_mnt;
6133     + struct dentry *wh_lower_dentry = NULL;
6134     + struct dentry *lower_dir_dentry = NULL;
6135     + struct dentry *d_interposed = NULL;
6136     + int bindex, bstart, bend, bopaque;
6137     + int opaque, num_positive = 0;
6138     + const char *name;
6139     + int namelen;
6140     + int pos_start, pos_end;
6141     +
6142     + /*
6143     + * We should already have a lock on this dentry in the case of a
6144     + * partial lookup, or a revalidation. Otherwise it is returned from
6145     + * new_dentry_private_data already locked.
6146     + */
6147     + verify_locked(dentry);
6148     + verify_locked(parent);
6149     +
6150     + /* must initialize dentry operations */
6151     + dentry->d_op = &unionfs_dops;
6152     +
6153     + /* We never partial lookup the root directory. */
6154     + if (IS_ROOT(dentry))
6155     + goto out;
6156     +
6157     + name = dentry->d_name.name;
6158     + namelen = dentry->d_name.len;
6159     +
6160     + /* No dentries should get created for possible whiteout names. */
6161     + if (!is_validname(name)) {
6162     + err = -EPERM;
6163     + goto out_free;
6164     + }
6165     +
6166     + /* Now start the actual lookup procedure. */
6167     + bstart = dbstart(parent);
6168     + bend = dbend(parent);
6169     + bopaque = dbopaque(parent);
6170     + BUG_ON(bstart < 0);
6171     +
6172     + /* adjust bend to bopaque if needed */
6173     + if ((bopaque >= 0) && (bopaque < bend))
6174     + bend = bopaque;
6175     +
6176     + /* lookup all possible dentries */
6177     + for (bindex = bstart; bindex <= bend; bindex++) {
6178     +
6179     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6180     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
6181     +
6182     + /* skip if we already have a positive lower dentry */
6183     + if (lower_dentry) {
6184     + if (dbstart(dentry) < 0)
6185     + dbstart(dentry) = bindex;
6186     + if (bindex > dbend(dentry))
6187     + dbend(dentry) = bindex;
6188     + if (lower_dentry->d_inode)
6189     + num_positive++;
6190     + continue;
6191     + }
6192     +
6193     + lower_dir_dentry =
6194     + unionfs_lower_dentry_idx(parent, bindex);
6195     + /* if the lower dentry's parent does not exist, skip this */
6196     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6197     + continue;
6198     +
6199     + /* also skip it if the parent isn't a directory. */
6200     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6201     + continue; /* XXX: should be BUG_ON */
6202     +
6203     + /* check for whiteouts: stop lookup if found */
6204     + wh_lower_dentry = lookup_whiteout(name, lower_dir_dentry);
6205     + if (IS_ERR(wh_lower_dentry)) {
6206     + err = PTR_ERR(wh_lower_dentry);
6207     + goto out_free;
6208     + }
6209     + if (wh_lower_dentry->d_inode) {
6210     + dbend(dentry) = dbopaque(dentry) = bindex;
6211     + if (dbstart(dentry) < 0)
6212     + dbstart(dentry) = bindex;
6213     + dput(wh_lower_dentry);
6214     + break;
6215     + }
6216     + dput(wh_lower_dentry);
6217     +
6218     + /* Now do regular lookup; lookup @name */
6219     + lower_dir_mnt = unionfs_lower_mnt_idx(parent, bindex);
6220     + lower_mnt = NULL; /* XXX: needed? */
6221     +
6222     + lower_dentry = __lookup_one(lower_dir_dentry, lower_dir_mnt,
6223     + name, &lower_mnt);
6224     +
6225     + if (IS_ERR(lower_dentry)) {
6226     + err = PTR_ERR(lower_dentry);
6227     + goto out_free;
6228     + }
6229     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6230     + if (!lower_mnt)
6231     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root,
6232     + bindex);
6233     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6234     +
6235     + /* adjust dbstart/end */
6236     + if (dbstart(dentry) < 0)
6237     + dbstart(dentry) = bindex;
6238     + if (bindex > dbend(dentry))
6239     + dbend(dentry) = bindex;
6240     + /*
6241     + * We always store the lower dentries above, and update
6242     + * dbstart/dbend, even if the whole unionfs dentry is
6243     + * negative (i.e., no lower inodes).
6244     + */
6245     + if (!lower_dentry->d_inode)
6246     + continue;
6247     + num_positive++;
6248     +
6249     + /*
6250     + * check if we just found an opaque directory, if so, stop
6251     + * lookups here.
6252     + */
6253     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
6254     + continue;
6255     + opaque = is_opaque_dir(dentry, bindex);
6256     + if (opaque < 0) {
6257     + err = opaque;
6258     + goto out_free;
6259     + } else if (opaque) {
6260     + dbend(dentry) = dbopaque(dentry) = bindex;
6261     + break;
6262     + }
6263     + dbend(dentry) = bindex;
6264     +
6265     + /* update parent directory's atime with the bindex */
6266     + fsstack_copy_attr_atime(parent->d_inode,
6267     + lower_dir_dentry->d_inode);
6268     + }
6269     +
6270     + /* sanity checks, then decide whether to process a negative dentry */
6271     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6272     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6273     +
6274     + if (num_positive > 0)
6275     + goto out_positive;
6276     +
6277     + /*** handle NEGATIVE dentries ***/
6278     +
6279     + /*
6280     + * If negative, keep only first lower negative dentry, to save on
6281     + * memory.
6282     + */
6283     + if (dbstart(dentry) < dbend(dentry)) {
6284     + path_put_lowers(dentry, dbstart(dentry) + 1,
6285     + dbend(dentry), false);
6286     + dbend(dentry) = dbstart(dentry);
6287     + }
6288     + if (lookupmode == INTERPOSE_PARTIAL)
6289     + goto out;
6290     + if (lookupmode == INTERPOSE_LOOKUP) {
6291     + /*
6292     + * If all we found was a whiteout in the first available
6293     + * branch, then create a negative dentry for a possibly new
6294     + * file to be created.
6295     + */
6296     + if (dbopaque(dentry) < 0)
6297     + goto out;
6298     + /* XXX: need to get mnt here */
6299     + bindex = dbstart(dentry);
6300     + if (unionfs_lower_dentry_idx(dentry, bindex))
6301     + goto out;
6302     + lower_dir_dentry =
6303     + unionfs_lower_dentry_idx(parent, bindex);
6304     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6305     + goto out;
6306     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6307     + goto out; /* XXX: should be BUG_ON */
6308     + /* XXX: do we need to cross bind mounts here? */
6309     + lower_dentry = lookup_lck_len(name, lower_dir_dentry, namelen);
6310     + if (IS_ERR(lower_dentry)) {
6311     + err = PTR_ERR(lower_dentry);
6312     + goto out;
6313     + }
6314     + /* XXX: need to mntget/mntput as needed too! */
6315     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6316     + /* XXX: wrong mnt for crossing bind mounts! */
6317     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root, bindex);
6318     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6319     +
6320     + goto out;
6321     + }
6322     +
6323     + /* if we're revalidating a positive dentry, don't make it negative */
6324     + if (lookupmode != INTERPOSE_REVAL)
6325     + d_add(dentry, NULL);
6326     +
6327     + goto out;
6328     +
6329     +out_positive:
6330     + /*** handle POSITIVE dentries ***/
6331     +
6332     + /*
6333     + * This unionfs dentry is positive (at least one lower inode
6334     + * exists), so scan entire dentry from beginning to end, and remove
6335     + * any negative lower dentries, if any. Then, update dbstart/dbend
6336     + * to reflect the start/end of positive dentries.
6337     + */
6338     + pos_start = pos_end = -1;
6339     + for (bindex = bstart; bindex <= bend; bindex++) {
6340     + lower_dentry = unionfs_lower_dentry_idx(dentry,
6341     + bindex);
6342     + if (lower_dentry && lower_dentry->d_inode) {
6343     + if (pos_start < 0)
6344     + pos_start = bindex;
6345     + if (bindex > pos_end)
6346     + pos_end = bindex;
6347     + continue;
6348     + }
6349     + path_put_lowers(dentry, bindex, bindex, false);
6350     + }
6351     + if (pos_start >= 0)
6352     + dbstart(dentry) = pos_start;
6353     + if (pos_end >= 0)
6354     + dbend(dentry) = pos_end;
6355     +
6356     + /* Partial lookups need to re-interpose, or throw away older negs. */
6357     + if (lookupmode == INTERPOSE_PARTIAL) {
6358     + if (dentry->d_inode) {
6359     + unionfs_reinterpose(dentry);
6360     + goto out;
6361     + }
6362     +
6363     + /*
6364     + * This dentry was negative (it has no inode yet), so treat
6365     + * this as a negative revalidation.
6366     + */
6367     + lookupmode = INTERPOSE_REVAL_NEG;
6368     + update_bstart(dentry);
6369     + }
6370     +
6371     + /*
6372     + * Interpose can return a dentry if d_splice returned a different
6373     + * dentry.
6374     + */
6375     + d_interposed = unionfs_interpose(dentry, dentry->d_sb, lookupmode);
6376     + if (IS_ERR(d_interposed))
6377     + err = PTR_ERR(d_interposed);
6378     + else if (d_interposed)
6379     + dentry = d_interposed;
6380     +
6381     + if (!err)
6382     + goto out;
6383     + d_drop(dentry);
6384     +
6385     +out_free:
6386     + /* should dput/mntput all the underlying dentries on error condition */
6387     + if (dbstart(dentry) >= 0)
6388     + path_put_lowers_all(dentry, false);
6389     + /* free lower_paths unconditionally */
6390     + kfree(UNIONFS_D(dentry)->lower_paths);
6391     + UNIONFS_D(dentry)->lower_paths = NULL;
6392     +
6393     +out:
6394     + if (dentry && UNIONFS_D(dentry)) {
6395     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6396     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6397     + }
6398     + if (d_interposed && UNIONFS_D(d_interposed)) {
6399     + BUG_ON(dbstart(d_interposed) < 0 && dbend(d_interposed) >= 0);
6400     + BUG_ON(dbstart(d_interposed) >= 0 && dbend(d_interposed) < 0);
6401     + }
6402     +
6403     + if (!err && d_interposed)
6404     + return d_interposed;
6405     + return ERR_PTR(err);
6406     +}
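The branch scan in unionfs_lookup_full() above can be hard to follow through the locking and reference counting, so here is a minimal user-space sketch of just its index bookkeeping. It is not part of the patch. The names entry_kind, scan_result and scan_branches are hypothetical, and the model assumes a scan that starts at branch 0 with no previously cached lower dentries; it ignores mounts, locks and error paths.

```c
#include <assert.h>

/* Hypothetical, simplified state of one branch for a single name. */
enum entry_kind {
	ENTRY_ABSENT,      /* name not present: a negative lower dentry */
	ENTRY_FILE,        /* regular file or other non-directory */
	ENTRY_DIR,         /* ordinary directory */
	ENTRY_OPAQUE_DIR,  /* directory that hides all lower branches */
	ENTRY_WHITEOUT     /* whiteout that masks the name from here down */
};

struct scan_result {
	int bstart;        /* first recorded branch, or -1 */
	int bend;          /* last recorded branch, or -1 */
	int bopaque;       /* branch where the scan was cut short, or -1 */
	int num_positive;  /* branches where the name really exists */
};

/* Model of the loop: walk branches left to right, stop at a barrier. */
struct scan_result scan_branches(const enum entry_kind *branches, int nbranches)
{
	struct scan_result r = { -1, -1, -1, 0 };
	int pos_start = -1, pos_end = -1;
	int bindex;

	assert(nbranches >= 0);

	for (bindex = 0; bindex < nbranches; bindex++) {
		enum entry_kind kind = branches[bindex];

		/* a whiteout ends the scan; nothing is stored for it */
		if (kind == ENTRY_WHITEOUT) {
			r.bend = r.bopaque = bindex;
			if (r.bstart < 0)
				r.bstart = bindex;
			break;
		}

		/* record the branch even when the name is absent */
		if (r.bstart < 0)
			r.bstart = bindex;
		if (bindex > r.bend)
			r.bend = bindex;
		if (kind == ENTRY_ABSENT)
			continue;

		r.num_positive++;
		if (pos_start < 0)
			pos_start = bindex;
		pos_end = bindex;

		/* an opaque directory also ends the scan */
		if (kind == ENTRY_OPAQUE_DIR) {
			r.bend = r.bopaque = bindex;
			break;
		}
	}

	if (r.num_positive > 0) {
		/* positive: narrow the range to the positive branches */
		r.bstart = pos_start;
		r.bend = pos_end;
	} else if (r.bstart < r.bend) {
		/* negative: keep only the first recorded branch */
		r.bend = r.bstart;
	}
	return r;
}
```

In this model, a name that is absent in branch 0, a directory in branch 1 and whited out in branch 2 yields bstart = bend = 1 and bopaque = 2, so branch 3 is never consulted. The kernel code reaches the same indices but also stores and releases the lower dentries and vfsmounts along the way.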
6407     diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
6408     new file mode 100644
6409     index 0000000..258386e
6410     --- /dev/null
6411     +++ b/fs/unionfs/main.c
6412     @@ -0,0 +1,758 @@
6413     +/*
6414     + * Copyright (c) 2003-2010 Erez Zadok
6415     + * Copyright (c) 2003-2006 Charles P. Wright
6416     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
6417     + * Copyright (c) 2005-2006 Junjiro Okajima
6418     + * Copyright (c) 2005 Arun M. Krishnakumar
6419     + * Copyright (c) 2004-2006 David P. Quigley
6420     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
6421     + * Copyright (c) 2003 Puja Gupta
6422     + * Copyright (c) 2003 Harikesavan Krishnan
6423     + * Copyright (c) 2003-2010 Stony Brook University
6424     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
6425     + *
6426     + * This program is free software; you can redistribute it and/or modify
6427     + * it under the terms of the GNU General Public License version 2 as
6428     + * published by the Free Software Foundation.
6429     + */
6430     +
6431     +#include "union.h"
6432     +#include <linux/module.h>
6433     +#include <linux/moduleparam.h>
6434     +
6435     +static void unionfs_fill_inode(struct dentry *dentry,
6436     + struct inode *inode)
6437     +{
6438     + struct inode *lower_inode;
6439     + struct dentry *lower_dentry;
6440     + int bindex, bstart, bend;
6441     +
6442     + bstart = dbstart(dentry);
6443     + bend = dbend(dentry);
6444     +
6445     + for (bindex = bstart; bindex <= bend; bindex++) {
6446     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6447     + if (!lower_dentry) {
6448     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
6449     + continue;
6450     + }
6451     +
6452     + /* Initialize the lower inode to the new lower inode. */
6453     + if (!lower_dentry->d_inode)
6454     + continue;
6455     +
6456     + unionfs_set_lower_inode_idx(inode, bindex,
6457     + igrab(lower_dentry->d_inode));
6458     + }
6459     +
6460     + ibstart(inode) = dbstart(dentry);
6461     + ibend(inode) = dbend(dentry);
6462     +
6463     + /* Use attributes from the first branch. */
6464     + lower_inode = unionfs_lower_inode(inode);
6465     +
6466     + /* Use different set of inode ops for symlinks & directories */
6467     + if (S_ISLNK(lower_inode->i_mode))
6468     + inode->i_op = &unionfs_symlink_iops;
6469     + else if (S_ISDIR(lower_inode->i_mode))
6470     + inode->i_op = &unionfs_dir_iops;
6471     +
6472     + /* Use different set of file ops for directories */
6473     + if (S_ISDIR(lower_inode->i_mode))
6474     + inode->i_fop = &unionfs_dir_fops;
6475     +
6476     + /* properly initialize special inodes */
6477     + if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
6478     + S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
6479     + init_special_inode(inode, lower_inode->i_mode,
6480     + lower_inode->i_rdev);
6481     +
6482     + /* all well, copy inode attributes */
6483     + unionfs_copy_attr_all(inode, lower_inode);
6484     + fsstack_copy_inode_size(inode, lower_inode);
6485     +}
6486     +
6487     +/*
6488     + * Connect a unionfs dentry/inode with several lower ones. This is
6489     + * the classic stackable file system "vnode interposition" action.
6490     + *
6491     + * @sb: unionfs's super_block
6492     + */
6493     +struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
6494     + int flag)
6495     +{
6496     + int err = 0;
6497     + struct inode *inode;
6498     + int need_fill_inode = 1;
6499     + struct dentry *spliced = NULL;
6500     +
6501     + verify_locked(dentry);
6502     +
6503     + /*
6504     + * We allocate our new inode below by calling unionfs_iget,
6505     + * which will initialize some of the new inode's fields
6506     + */
6507     +
6508     + /*
6509     + * On revalidate we've already got our own inode and just need
6510     + * to fix it up.
6511     + */
6512     + if (flag == INTERPOSE_REVAL) {
6513     + inode = dentry->d_inode;
6514     + UNIONFS_I(inode)->bstart = -1;
6515     + UNIONFS_I(inode)->bend = -1;
6516     + atomic_set(&UNIONFS_I(inode)->generation,
6517     + atomic_read(&UNIONFS_SB(sb)->generation));
6518     +
6519     + UNIONFS_I(inode)->lower_inodes =
6520     + kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
6521     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
6522     + err = -ENOMEM;
6523     + goto out;
6524     + }
6525     + } else {
6526     + /* get unique inode number for unionfs */
6527     + inode = unionfs_iget(sb, iunique(sb, UNIONFS_ROOT_INO));
6528     + if (IS_ERR(inode)) {
6529     + err = PTR_ERR(inode);
6530     + goto out;
6531     + }
6532     + if (atomic_read(&inode->i_count) > 1)
6533     + goto skip;
6534     + }
6535     +
6536     + need_fill_inode = 0;
6537     + unionfs_fill_inode(dentry, inode);
6538     +
6539     +skip:
6540     + /* only (our) lookup wants to do a d_add */
6541     + switch (flag) {
6542     + case INTERPOSE_DEFAULT:
6543     + /* for operations which create new inodes */
6544     + d_add(dentry, inode);
6545     + break;
6546     + case INTERPOSE_REVAL_NEG:
6547     + d_instantiate(dentry, inode);
6548     + break;
6549     + case INTERPOSE_LOOKUP:
6550     + spliced = d_splice_alias(inode, dentry);
6551     + if (spliced && spliced != dentry) {
6552     + /*
6553     + * d_splice can return a dentry if it was
6554     + * disconnected and had to be moved. We must ensure
6555     + * that the private data of the new dentry is
6556     + * correct and that the inode info was filled
6557     + * properly. Finally we must return this new
6558     + * dentry.
6559     + */
6560     + spliced->d_op = &unionfs_dops;
6561     + spliced->d_fsdata = dentry->d_fsdata;
6562     + dentry->d_fsdata = NULL;
6563     + dentry = spliced;
6564     + if (need_fill_inode) {
6565     + need_fill_inode = 0;
6566     + unionfs_fill_inode(dentry, inode);
6567     + }
6568     + goto out_spliced;
6569     + } else if (!spliced) {
6570     + if (need_fill_inode) {
6571     + need_fill_inode = 0;
6572     + unionfs_fill_inode(dentry, inode);
6573     + goto out_spliced;
6574     + }
6575     + }
6576     + break;
6577     + case INTERPOSE_REVAL:
6578     + /* Do nothing. */
6579     + break;
6580     + default:
6581     + printk(KERN_CRIT "unionfs: invalid interpose flag passed!\n");
6582     + BUG();
6583     + }
6584     + goto out;
6585     +
6586     +out_spliced:
6587     + if (!err)
6588     + return spliced;
6589     +out:
6590     + return ERR_PTR(err);
6591     +}
6592     +
6593     +/* like interpose above, but for an already existing dentry */
6594     +void unionfs_reinterpose(struct dentry *dentry)
6595     +{
6596     + struct dentry *lower_dentry;
6597     + struct inode *inode;
6598     + int bindex, bstart, bend;
6599     +
6600     + verify_locked(dentry);
6601     +
6602     + /* This is a pre-allocated inode */
6603     + inode = dentry->d_inode;
6604     +
6605     + bstart = dbstart(dentry);
6606     + bend = dbend(dentry);
6607     + for (bindex = bstart; bindex <= bend; bindex++) {
6608     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6609     + if (!lower_dentry)
6610     + continue;
6611     +
6612     + if (!lower_dentry->d_inode)
6613     + continue;
6614     + if (unionfs_lower_inode_idx(inode, bindex))
6615     + continue;
6616     + unionfs_set_lower_inode_idx(inode, bindex,
6617     + igrab(lower_dentry->d_inode));
6618     + }
6619     + ibstart(inode) = dbstart(dentry);
6620     + ibend(inode) = dbend(dentry);
6621     +}
6622     +
6623     +/*
6624     + * make sure the branch we just looked up (nd) makes sense:
6625     + *
6626     + * 1) we're not trying to stack unionfs on top of unionfs
6627     + * 2) it exists
6628     + * 3) is a directory
6629     + */
6630     +int check_branch(struct nameidata *nd)
6631     +{
6632     + /* XXX: remove in ODF code -- stacking unions allowed there */
6633     + if (!strcmp(nd->path.dentry->d_sb->s_type->name, UNIONFS_NAME))
6634     + return -EINVAL;
6635     + if (!nd->path.dentry->d_inode)
6636     + return -ENOENT;
6637     + if (!S_ISDIR(nd->path.dentry->d_inode->i_mode))
6638     + return -ENOTDIR;
6639     + return 0;
6640     +}
6641     +
6642     +/* checks if two lower_dentries have overlapping branches */
6643     +static int is_branch_overlap(struct dentry *dent1, struct dentry *dent2)
6644     +{
6645     + struct dentry *dent = NULL;
6646     +
6647     + dent = dent1;
6648     + while ((dent != dent2) && (dent->d_parent != dent))
6649     + dent = dent->d_parent;
6650     +
6651     + if (dent == dent2)
6652     + return 1;
6653     +
6654     + dent = dent2;
6655     + while ((dent != dent1) && (dent->d_parent != dent))
6656     + dent = dent->d_parent;
6657     +
6658     + return (dent == dent1);
6659     +}
6660     +
6661     +/*
6662     + * Parse a "ro" or "rw" option, defaulting to "rw" if no mode option was
6663     + * specified. Fill the mode bits in @perms. If an unknown string is
6664     + * encountered, return -EINVAL. Otherwise return 0.
6665     + */
6666     +int parse_branch_mode(const char *name, int *perms)
6667     +{
6668     + if (!name || !strcmp(name, "rw")) {
6669     + *perms = MAY_READ | MAY_WRITE;
6670     + return 0;
6671     + }
6672     + if (!strcmp(name, "ro")) {
6673     + *perms = MAY_READ;
6674     + return 0;
6675     + }
6676     + return -EINVAL;
6677     +}
6678     +
6679     +/*
6680     + * parse the dirs= mount argument
6681     + *
6682     + * We don't need to lock the superblock private data's rwsem, as we get
6683     + * called only by unionfs_read_super - it is still a long time before anyone
6684     + * can even get a reference to us.
6685     + */
6686     +static int parse_dirs_option(struct super_block *sb, struct unionfs_dentry_info
6687     + *lower_root_info, char *options)
6688     +{
6689     + struct nameidata nd;
6690     + char *name;
6691     + int err = 0;
6692     + int branches = 1;
6693     + int bindex = 0;
6694     + int i = 0;
6695     + int j = 0;
6696     + struct dentry *dent1;
6697     + struct dentry *dent2;
6698     +
6699     + if (options[0] == '\0') {
6700     + printk(KERN_ERR "unionfs: no branches specified\n");
6701     + err = -EINVAL;
6702     + goto out;
6703     + }
6704     +
6705     + /*
6706     + * Each colon means we have a separator; this is really just a rough
6707     + * guess, since strsep will handle empty fields for us.
6708     + */
6709     + for (i = 0; options[i]; i++)
6710     + if (options[i] == ':')
6711     + branches++;
6712     +
6713     + /* allocate space for underlying pointers to lower dentry */
6714     + UNIONFS_SB(sb)->data =
6715     + kcalloc(branches, sizeof(struct unionfs_data), GFP_KERNEL);
6716     + if (unlikely(!UNIONFS_SB(sb)->data)) {
6717     + err = -ENOMEM;
6718     + goto out;
6719     + }
6720     +
6721     + lower_root_info->lower_paths =
6722     + kcalloc(branches, sizeof(struct path), GFP_KERNEL);
6723     + if (unlikely(!lower_root_info->lower_paths)) {
6724     + err = -ENOMEM;
6725     + goto out;
6726     + }
6727     +
6728     + /* now parsing a string such as "b1:b2=rw:b3=ro:b4" */
6729     + branches = 0;
6730     + while ((name = strsep(&options, ":")) != NULL) {
6731     + int perms;
6732     + char *mode = strchr(name, '=');
6733     +
6734     + if (!name)
6735     + continue;
6736     + if (!*name) { /* bad use of ':' (extra colons) */
6737     + err = -EINVAL;
6738     + goto out;
6739     + }
6740     +
6741     + branches++;
6742     +
6743     + /* strip off '=' if any */
6744     + if (mode)
6745     + *mode++ = '\0';
6746     +
6747     + err = parse_branch_mode(mode, &perms);
6748     + if (err) {
6749     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
6750     + "branch %d\n", mode, bindex);
6751     + goto out;
6752     + }
6753     + /* ensure that leftmost branch is writeable */
6754     + if (!bindex && !(perms & MAY_WRITE)) {
6755     + printk(KERN_ERR "unionfs: leftmost branch cannot be "
6756     + "read-only (use \"-o ro\" to create a "
6757     + "read-only union)\n");
6758     + err = -EINVAL;
6759     + goto out;
6760     + }
6761     +
6762     + err = path_lookup(name, LOOKUP_FOLLOW, &nd);
6763     + if (err) {
6764     + printk(KERN_ERR "unionfs: error accessing "
6765     + "lower directory '%s' (error %d)\n",
6766     + name, err);
6767     + goto out;
6768     + }
6769     +
6770     + err = check_branch(&nd);
6771     + if (err) {
6772     + printk(KERN_ERR "unionfs: lower directory "
6773     + "'%s' is not a valid branch\n", name);
6774     + path_put(&nd.path);
6775     + goto out;
6776     + }
6777     +
6778     + lower_root_info->lower_paths[bindex].dentry = nd.path.dentry;
6779     + lower_root_info->lower_paths[bindex].mnt = nd.path.mnt;
6780     +
6781     + set_branchperms(sb, bindex, perms);
6782     + set_branch_count(sb, bindex, 0);
6783     + new_branch_id(sb, bindex);
6784     +
6785     + if (lower_root_info->bstart < 0)
6786     + lower_root_info->bstart = bindex;
6787     + lower_root_info->bend = bindex;
6788     + bindex++;
6789     + }
6790     +
6791     + if (branches == 0) {
6792     + printk(KERN_ERR "unionfs: no branches specified\n");
6793     + err = -EINVAL;
6794     + goto out;
6795     + }
6796     +
6797     + BUG_ON(branches != (lower_root_info->bend + 1));
6798     +
6799     + /*
6800     + * Ensure that no overlaps exist in the branches.
6801     + *
6802     + * This test is required because the Linux kernel has no support
6803     + * currently for ensuring coherency between stackable layers and
6804     + * branches. If we were to allow overlapping branches, it would be
6805     + * possible, for example, to delete a file via one branch, which
6806     + * would not be reflected in another branch. Such incoherency could
6807     + * lead to inconsistencies and even kernel oopses. Rather than
6808     + * implement hacks to work around some of these cache-coherency
6809     + * problems, we prevent branch overlapping, for now. A complete
6810     + * solution will involve proper kernel/VFS support for cache
6811     + * coherency, at which time we could safely remove this
6812     + * branch-overlapping test.
6813     + */
6814     + for (i = 0; i < branches; i++) {
6815     + dent1 = lower_root_info->lower_paths[i].dentry;
6816     + for (j = i + 1; j < branches; j++) {
6817     + dent2 = lower_root_info->lower_paths[j].dentry;
6818     + if (is_branch_overlap(dent1, dent2)) {
6819     + printk(KERN_ERR "unionfs: branches %d and "
6820     + "%d overlap\n", i, j);
6821     + err = -EINVAL;
6822     + goto out;
6823     + }
6824     + }
6825     + }
6826     +
6827     +out:
6828     + if (err) {
6829     + for (i = 0; i < branches; i++)
6830     + path_put(&lower_root_info->lower_paths[i]);
6831     +
6832     + kfree(lower_root_info->lower_paths);
6833     + kfree(UNIONFS_SB(sb)->data);
6834     +
6835     + /*
6836     + * MUST clear the pointers to prevent potential double free if
6837     + * the caller dies later on
6838     + */
6839     + lower_root_info->lower_paths = NULL;
6840     + UNIONFS_SB(sb)->data = NULL;
6841     + }
6842     + return err;
6843     +}
6844     +
6845     +/*
6846     + * Parse mount options. See the manual page for usage instructions.
6847     + *
6848     + * Returns the unionfs_dentry_info for the lower-level directories, or an
6849     + * ERR_PTR on failure; we stack our file system on top of those directories.
6850     + */
6851     +static struct unionfs_dentry_info *unionfs_parse_options(
6852     + struct super_block *sb,
6853     + char *options)
6854     +{
6855     + struct unionfs_dentry_info *lower_root_info;
6856     + char *optname;
6857     + int err = 0;
6858     + int bindex;
6859     + int dirsfound = 0;
6860     +
6861     + /* allocate private data area */
6862     + err = -ENOMEM;
6863     + lower_root_info =
6864     + kzalloc(sizeof(struct unionfs_dentry_info), GFP_KERNEL);
6865     + if (unlikely(!lower_root_info))
6866     + goto out_error;
6867     + lower_root_info->bstart = -1;
6868     + lower_root_info->bend = -1;
6869     + lower_root_info->bopaque = -1;
6870     +
6871     + while ((optname = strsep(&options, ",")) != NULL) {
6872     + char *optarg;
6873     +
6874     + if (!optname || !*optname)
6875     + continue;
6876     +
6877     + optarg = strchr(optname, '=');
6878     + if (optarg)
6879     + *optarg++ = '\0';
6880     +
6881     + /*
6882     + * All of our options take an argument now. Insert ones that
6883     + * don't, above this check.
6884     + */
6885     + if (!optarg) {
6886     + printk(KERN_ERR "unionfs: %s requires an argument\n",
6887     + optname);
6888     + err = -EINVAL;
6889     + goto out_error;
6890     + }
6891     +
6892     + if (!strcmp("dirs", optname)) {
6893     + if (++dirsfound > 1) {
6894     + printk(KERN_ERR
6895     + "unionfs: multiple dirs specified\n");
6896     + err = -EINVAL;
6897     + goto out_error;
6898     + }
6899     + err = parse_dirs_option(sb, lower_root_info, optarg);
6900     + if (err)
6901     + goto out_error;
6902     + continue;
6903     + }
6904     +
6905     + err = -EINVAL;
6906     + printk(KERN_ERR
6907     + "unionfs: unrecognized option '%s'\n", optname);
6908     + goto out_error;
6909     + }
6910     + if (dirsfound != 1) {
6911     + printk(KERN_ERR "unionfs: dirs option required\n");
6912     + err = -EINVAL;
6913     + goto out_error;
6914     + }
6915     + goto out;
6916     +
6917     +out_error:
6918     + if (lower_root_info && lower_root_info->lower_paths) {
6919     + for (bindex = lower_root_info->bstart;
6920     + bindex >= 0 && bindex <= lower_root_info->bend;
6921     + bindex++)
6922     + path_put(&lower_root_info->lower_paths[bindex]);
6923     + }
6924     +
6925     + kfree(lower_root_info->lower_paths);
6926     + kfree(lower_root_info);
6927     +
6928     + kfree(UNIONFS_SB(sb)->data);
6929     + UNIONFS_SB(sb)->data = NULL;
6930     +
6931     + lower_root_info = ERR_PTR(err);
6932     +out:
6933     + return lower_root_info;
6934     +}
6935     +
6936     +/*
6937     + * our custom d_alloc_root work-alike
6938     + *
6939     + * we can't use d_alloc_root if we want to use our own interpose function
6940     + * unchanged, so we simply call our own "fake" d_alloc_root
6941     + */
6942     +static struct dentry *unionfs_d_alloc_root(struct super_block *sb)
6943     +{
6944     + struct dentry *ret = NULL;
6945     +
6946     + if (sb) {
6947     + static const struct qstr name = {
6948     + .name = "/",
6949     + .len = 1
6950     + };
6951     +
6952     + ret = d_alloc(NULL, &name);
6953     + if (likely(ret)) {
6954     + ret->d_op = &unionfs_dops;
6955     + ret->d_sb = sb;
6956     + ret->d_parent = ret;
6957     + }
6958     + }
6959     + return ret;
6960     +}
6961     +
6962     +/*
6963     + * There is no need to lock the unionfs_super_info's rwsem as there is no
6964     + * way anyone can have a reference to the superblock at this point in time.
6965     + */
6966     +static int unionfs_read_super(struct super_block *sb, void *raw_data,
6967     + int silent)
6968     +{
6969     + int err = 0;
6970     + struct unionfs_dentry_info *lower_root_info = NULL;
6971     + int bindex, bstart, bend;
6972     +
6973     + if (!raw_data) {
6974     + printk(KERN_ERR
6975     + "unionfs: read_super: missing data argument\n");
6976     + err = -EINVAL;
6977     + goto out;
6978     + }
6979     +
6980     + /* Allocate superblock private data */
6981     + sb->s_fs_info = kzalloc(sizeof(struct unionfs_sb_info), GFP_KERNEL);
6982     + if (unlikely(!UNIONFS_SB(sb))) {
6983     + printk(KERN_CRIT "unionfs: read_super: out of memory\n");
6984     + err = -ENOMEM;
6985     + goto out;
6986     + }
6987     +
6988     + UNIONFS_SB(sb)->bend = -1;
6989     + atomic_set(&UNIONFS_SB(sb)->generation, 1);
6990     + init_rwsem(&UNIONFS_SB(sb)->rwsem);
6991     + UNIONFS_SB(sb)->high_branch_id = -1; /* -1 == invalid branch ID */
6992     +
6993     + lower_root_info = unionfs_parse_options(sb, raw_data);
6994     + if (IS_ERR(lower_root_info)) {
6995     + printk(KERN_ERR
6996     + "unionfs: read_super: error while parsing options "
6997     + "(err = %ld)\n", PTR_ERR(lower_root_info));
6998     + err = PTR_ERR(lower_root_info);
6999     + lower_root_info = NULL;
7000     + goto out_free;
7001     + }
7002     + if (lower_root_info->bstart == -1) {
7003     + err = -ENOENT;
7004     + goto out_free;
7005     + }
7006     +
7007     + /* set the lower superblock field of upper superblock */
7008     + bstart = lower_root_info->bstart;
7009     + BUG_ON(bstart != 0);
7010     + sbend(sb) = bend = lower_root_info->bend;
7011     + for (bindex = bstart; bindex <= bend; bindex++) {
7012     + struct dentry *d = lower_root_info->lower_paths[bindex].dentry;
7013     + atomic_inc(&d->d_sb->s_active);
7014     + unionfs_set_lower_super_idx(sb, bindex, d->d_sb);
7015     + }
7016     +
7017     + /* s_maxbytes is inherited from the highest-priority branch */
7018     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
7019     +
7020     + /*
7021     + * Our c/m/atime granularity is 1 ns because we may stack on file
7022     + * systems whose granularity is as good. This is important for our
7023     + * time-based cache coherency.
7024     + */
7025     + sb->s_time_gran = 1;
7026     +
7027     + sb->s_op = &unionfs_sops;
7028     +
7029     + /* See comment next to the definition of unionfs_d_alloc_root */
7030     + sb->s_root = unionfs_d_alloc_root(sb);
7031     + if (unlikely(!sb->s_root)) {
7032     + err = -ENOMEM;
7033     + goto out_dput;
7034     + }
7035     +
7036     + /* link the upper and lower dentries */
7037     + sb->s_root->d_fsdata = NULL;
7038     + err = new_dentry_private_data(sb->s_root, UNIONFS_DMUTEX_ROOT);
7039     + if (unlikely(err))
7040     + goto out_freedpd;
7041     +
7042     + /* Set the lower dentries for s_root */
7043     + for (bindex = bstart; bindex <= bend; bindex++) {
7044     + struct dentry *d;
7045     + struct vfsmount *m;
7046     +
7047     + d = lower_root_info->lower_paths[bindex].dentry;
7048     + m = lower_root_info->lower_paths[bindex].mnt;
7049     +
7050     + unionfs_set_lower_dentry_idx(sb->s_root, bindex, d);
7051     + unionfs_set_lower_mnt_idx(sb->s_root, bindex, m);
7052     + }
7053     + dbstart(sb->s_root) = bstart;
7054     + dbend(sb->s_root) = bend;
7055     +
7056     + /* Set the generation number to one, since this is for the mount. */
7057     + atomic_set(&UNIONFS_D(sb->s_root)->generation, 1);
7058     +
7059     + /*
7060     + * Call interpose to create the upper level inode. Only
7061     + * INTERPOSE_LOOKUP can return a value other than 0 on err.
7062     + */
7063     + err = PTR_ERR(unionfs_interpose(sb->s_root, sb, 0));
7064     + unionfs_unlock_dentry(sb->s_root);
7065     + if (!err)
7066     + goto out;
7067     + /* else fall through */
7068     +
7069     +out_freedpd:
7070     + if (UNIONFS_D(sb->s_root)) {
7071     + kfree(UNIONFS_D(sb->s_root)->lower_paths);
7072     + free_dentry_private_data(sb->s_root);
7073     + }
7074     + dput(sb->s_root);
7075     +
7076     +out_dput:
7077     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7078     + for (bindex = lower_root_info->bstart;
7079     + bindex <= lower_root_info->bend; bindex++) {
7080     + struct dentry *d;
7081     + d = lower_root_info->lower_paths[bindex].dentry;
7082     + /* drop refs we took earlier */
7083     + atomic_dec(&d->d_sb->s_active);
7084     + path_put(&lower_root_info->lower_paths[bindex]);
7085     + }
7086     + kfree(lower_root_info->lower_paths);
7087     + kfree(lower_root_info);
7088     + lower_root_info = NULL;
7089     + }
7090     +
7091     +out_free:
7092     + kfree(UNIONFS_SB(sb)->data);
7093     + kfree(UNIONFS_SB(sb));
7094     + sb->s_fs_info = NULL;
7095     +
7096     +out:
7097     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7098     + kfree(lower_root_info->lower_paths);
7099     + kfree(lower_root_info);
7100     + }
7101     + return err;
7102     +}
7103     +
7104     +static int unionfs_get_sb(struct file_system_type *fs_type,
7105     + int flags, const char *dev_name,
7106     + void *raw_data, struct vfsmount *mnt)
7107     +{
7108     + int err;
7109     + err = get_sb_nodev(fs_type, flags, raw_data, unionfs_read_super, mnt);
7110     + if (!err)
7111     + UNIONFS_SB(mnt->mnt_sb)->dev_name =
7112     + kstrdup(dev_name, GFP_KERNEL);
7113     + return err;
7114     +}
7115     +
7116     +static struct file_system_type unionfs_fs_type = {
7117     + .owner = THIS_MODULE,
7118     + .name = UNIONFS_NAME,
7119     + .get_sb = unionfs_get_sb,
7120     + .kill_sb = generic_shutdown_super,
7121     + .fs_flags = FS_REVAL_DOT,
7122     +};
7123     +
7124     +static int __init init_unionfs_fs(void)
7125     +{
7126     + int err;
7127     +
7128     + pr_info("Registering unionfs " UNIONFS_VERSION "\n");
7129     +
7130     + err = unionfs_init_filldir_cache();
7131     + if (unlikely(err))
7132     + goto out;
7133     + err = unionfs_init_inode_cache();
7134     + if (unlikely(err))
7135     + goto out;
7136     + err = unionfs_init_dentry_cache();
7137     + if (unlikely(err))
7138     + goto out;
7139     + err = init_sioq();
7140     + if (unlikely(err))
7141     + goto out;
7142     + err = register_filesystem(&unionfs_fs_type);
7143     +out:
7144     + if (unlikely(err)) {
7145     + stop_sioq();
7146     + unionfs_destroy_filldir_cache();
7147     + unionfs_destroy_inode_cache();
7148     + unionfs_destroy_dentry_cache();
7149     + }
7150     + return err;
7151     +}
7152     +
7153     +static void __exit exit_unionfs_fs(void)
7154     +{
7155     + stop_sioq();
7156     + unionfs_destroy_filldir_cache();
7157     + unionfs_destroy_inode_cache();
7158     + unionfs_destroy_dentry_cache();
7159     + unregister_filesystem(&unionfs_fs_type);
7160     + pr_info("Completed unionfs module unload\n");
7161     +}
7162     +
7163     +MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University"
7164     + " (http://www.fsl.cs.sunysb.edu)");
7165     +MODULE_DESCRIPTION("Unionfs " UNIONFS_VERSION
7166     + " (http://unionfs.filesystems.org)");
7167     +MODULE_LICENSE("GPL");
7168     +
7169     +module_init(init_unionfs_fs);
7170     +module_exit(exit_unionfs_fs);
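The branch-list parsing in parse_dirs_option above can be illustrated outside the kernel. The following is a minimal userspace sketch, not part of the patch: every sketch_* name and SKETCH_* constant is hypothetical, the field splitter is a plain strchr-based stand-in for strsep with a single delimiter, and treating a missing mode as read-write is an assumption inferred from the "b1:b2=rw:b3=ro:b4" example (parse_branch_mode itself is not shown in this section).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's MAY_WRITE / MAY_READ bits. */
#define SKETCH_MAY_WRITE 2
#define SKETCH_MAY_READ 4
#define SKETCH_MAX_BRANCHES 16

struct sketch_branch {
	const char *path;	/* points into the caller's option string */
	int perms;
};

/* Split off the next delim-separated field in place (strsep-like). */
static char *sketch_next_field(char **cursor, char delim)
{
	char *start = *cursor;
	char *end;

	if (!start)
		return NULL;
	end = strchr(start, delim);
	if (end) {
		*end = '\0';
		*cursor = end + 1;
	} else {
		*cursor = NULL;
	}
	return start;
}

/* Assumption: a branch without "=mode" is read-write. */
static int sketch_parse_mode(const char *mode, int *perms)
{
	if (!mode || !strcmp(mode, "rw")) {
		*perms = SKETCH_MAY_READ | SKETCH_MAY_WRITE;
		return 0;
	}
	if (!strcmp(mode, "ro")) {
		*perms = SKETCH_MAY_READ;
		return 0;
	}
	return -EINVAL;
}

/*
 * Parse "b1:b2=rw:b3=ro" into out[]. Modifies options in place.
 * Returns the number of branches, or a negative errno value.
 */
static int sketch_parse_dirs(char *options, struct sketch_branch *out, int max)
{
	char *name;
	int n = 0;

	assert(options && out && max > 0);
	if (options[0] == '\0')
		return -EINVAL;		/* no branches specified */

	while ((name = sketch_next_field(&options, ':')) != NULL) {
		char *mode;
		int perms, err;

		if (!*name)
			return -EINVAL;	/* extra colon */
		if (n == max)
			return -E2BIG;

		mode = strchr(name, '=');
		if (mode)
			*mode++ = '\0';	/* strip off "=mode" */

		err = sketch_parse_mode(mode, &perms);
		if (err)
			return err;
		/* the leftmost branch must be writable */
		if (n == 0 && !(perms & SKETCH_MAY_WRITE))
			return -EINVAL;

		out[n].path = name;
		out[n].perms = perms;
		n++;
	}
	return n;
}
```

Unlike the kernel code, the sketch stops at the string level: it performs no path lookup, branch validation, or overlap check, and it caps the branch count with a fixed array instead of counting colons and allocating.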
7171     diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
7172     new file mode 100644
7173     index 0000000..1f70535
7174     --- /dev/null
7175     +++ b/fs/unionfs/mmap.c
7176     @@ -0,0 +1,89 @@
7177     +/*
7178     + * Copyright (c) 2003-2010 Erez Zadok
7179     + * Copyright (c) 2003-2006 Charles P. Wright
7180     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7181     + * Copyright (c) 2005-2006 Junjiro Okajima
7182     + * Copyright (c) 2006 Shaya Potter
7183     + * Copyright (c) 2005 Arun M. Krishnakumar
7184     + * Copyright (c) 2004-2006 David P. Quigley
7185     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7186     + * Copyright (c) 2003 Puja Gupta
7187     + * Copyright (c) 2003 Harikesavan Krishnan
7188     + * Copyright (c) 2003-2010 Stony Brook University
7189     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7190     + *
7191     + * This program is free software; you can redistribute it and/or modify
7192     + * it under the terms of the GNU General Public License version 2 as
7193     + * published by the Free Software Foundation.
7194     + */
7195     +
7196     +#include "union.h"
7197     +
7198     +
7199     +/*
7200     + * XXX: we need a dummy readpage handler because generic_file_mmap (which we
7201     + * use in unionfs_mmap) checks for the existence of
7202     + * mapping->a_ops->readpage, else it returns -ENOEXEC. The VFS will need to
7203     + * be fixed to allow a file system to define vm_ops->fault without any
7204     + * address_space_ops whatsoever.
7205     + *
7206     + * Otherwise, we don't want to use our readpage method at all.
7207     + */
7208     +static int unionfs_readpage(struct file *file, struct page *page)
7209     +{
7210     + BUG();
7211     + return -EINVAL;
7212     +}
7213     +
7214     +static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
7215     +{
7216     + int err;
7217     + struct file *file, *lower_file;
7218     + const struct vm_operations_struct *lower_vm_ops;
7219     + struct vm_area_struct lower_vma;
7220     +
7221     + BUG_ON(!vma);
7222     + memcpy(&lower_vma, vma, sizeof(struct vm_area_struct));
7223     + file = lower_vma.vm_file;
7224     + lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
7225     + BUG_ON(!lower_vm_ops);
7226     +
7227     + lower_file = unionfs_lower_file(file);
7228     + BUG_ON(!lower_file);
7229     + /*
7230     + * XXX: vm_ops->fault may be called in parallel. Because we have to
7231     + * resort to temporarily changing the vma->vm_file to point to the
7232     + * lower file, a concurrent invocation of unionfs_fault could see a
7233     + * different value. In this workaround, we keep a different copy of
7234     + * the vma structure in our stack, so we never expose a different
7235     + * value of the vma->vm_file called to us, even temporarily. A
7236     + * better fix would be to change the calling semantics of ->fault to
7237     + * take an explicit file pointer.
7238     + */
7239     + lower_vma.vm_file = lower_file;
7240     + err = lower_vm_ops->fault(&lower_vma, vmf);
7241     + return err;
7242     +}
7243     +
7244     +/*
7245     + * XXX: the default address_space_ops for unionfs is empty. We cannot set
7246     + * our inode->i_mapping->a_ops to NULL because too many code paths expect
7247     + * the a_ops vector to be non-NULL.
7248     + */
7249     +struct address_space_operations unionfs_aops = {
7250     + /* empty on purpose */
7251     +};
7252     +
7253     +/*
7254     + * XXX: we need a second, dummy address_space_ops vector, to be used
7255     + * temporarily during unionfs_mmap, because the latter calls
7256     + * generic_file_mmap, which checks if ->readpage exists, else returns
7257     + * -ENOEXEC.
7258     + */
7259     +struct address_space_operations unionfs_dummy_aops = {
7260     + .readpage = unionfs_readpage,
7261     +};
7262     +
7263     +struct vm_operations_struct unionfs_vm_ops = {
7264     + .fault = unionfs_fault,
7265     +};
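The workaround described in unionfs_fault above (copy the vma onto the stack, redirect the copy's file pointer, and hand the copy to the lower handler) can be shown with ordinary structs. This is a simplified userspace sketch, not kernel code: the sketch_* types and functions are hypothetical stand-ins for vm_area_struct, struct file, and the lower ->fault operation.

```c
#include <assert.h>
#include <string.h>

struct sketch_file {
	const char *name;
};

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct sketch_vma {
	struct sketch_file *file;
	unsigned long start;
	unsigned long end;
};

typedef int (*sketch_fault_fn)(struct sketch_vma *vma);

/* What the lower handler observed on its last call. */
static const char *sketch_seen_name;
static unsigned long sketch_seen_start;

/* Stand-in for the lower file system's fault handler. */
static int sketch_lower_fault(struct sketch_vma *vma)
{
	sketch_seen_name = vma->file->name;
	sketch_seen_start = vma->start;
	return 0;
}

/*
 * Delegate to the lower handler without ever modifying the caller's
 * vma: work on a private stack copy whose file pointer is redirected.
 */
static int sketch_upper_fault(const struct sketch_vma *vma,
			      struct sketch_file *lower_file,
			      sketch_fault_fn lower_fault)
{
	struct sketch_vma lower_vma;

	assert(vma && lower_file && lower_fault);
	memcpy(&lower_vma, vma, sizeof(lower_vma));
	lower_vma.file = lower_file;
	return lower_fault(&lower_vma);
}
```

Because only the stack copy is changed, a concurrent caller that reads the original structure never sees the lower file pointer, which is the property the comment in unionfs_fault is after.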
7266     diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
7267     new file mode 100644
7268     index 0000000..f745fbc
7269     --- /dev/null
7270     +++ b/fs/unionfs/rdstate.c
7271     @@ -0,0 +1,285 @@
7272     +/*
7273     + * Copyright (c) 2003-2010 Erez Zadok
7274     + * Copyright (c) 2003-2006 Charles P. Wright
7275     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7276     + * Copyright (c) 2005-2006 Junjiro Okajima
7277     + * Copyright (c) 2005 Arun M. Krishnakumar
7278     + * Copyright (c) 2004-2006 David P. Quigley
7279     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7280     + * Copyright (c) 2003 Puja Gupta
7281     + * Copyright (c) 2003 Harikesavan Krishnan
7282     + * Copyright (c) 2003-2010 Stony Brook University
7283     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7284     + *
7285     + * This program is free software; you can redistribute it and/or modify
7286     + * it under the terms of the GNU General Public License version 2 as
7287     + * published by the Free Software Foundation.
7288     + */
7289     +
7290     +#include "union.h"
7291     +
7292     +/* This file contains the routines for maintaining readdir state. */
7293     +
7294     +/*
7295     + * There are two structures here: rdstate, which is a hash table of the
7296     + * second structure, filldir_node.
7297     + */
7298     +
7299     +/*
7300     + * This is a struct kmem_cache for filldir nodes, because we allocate a lot
7301     + * of them and they shouldn't waste memory. If the node has a small name
7302     + * (as defined by the dentry structure), then we use an inline name to
7303     + * preserve kmalloc space.
7304     + */
7305     +static struct kmem_cache *unionfs_filldir_cachep;
7306     +
7307     +int unionfs_init_filldir_cache(void)
7308     +{
7309     + unionfs_filldir_cachep =
7310     + kmem_cache_create("unionfs_filldir",
7311     + sizeof(struct filldir_node), 0,
7312     + SLAB_RECLAIM_ACCOUNT, NULL);
7313     +
7314     + return (unionfs_filldir_cachep ? 0 : -ENOMEM);
7315     +}
7316     +
7317     +void unionfs_destroy_filldir_cache(void)
7318     +{
7319     + if (unionfs_filldir_cachep)
7320     + kmem_cache_destroy(unionfs_filldir_cachep);
7321     +}
7322     +
7323     +/*
7324     + * This is a tuning parameter that tells us roughly how big to make the
7325     + * hash table in directory entries per page. This isn't perfect, but
7326     + * at least we get a hash table size that shouldn't be too overloaded.
7327     + * The following averages are based on my home directory.
7328     + * 14.44693 Overall
7329     + * 12.29 Single Page Directories
7330     + * 117.93 Multi-page directories
7331     + */
7332     +#define DENTPAGE 4096
7333     +#define DENTPERONEPAGE 12
7334     +#define DENTPERPAGE 118
7335     +#define MINHASHSIZE 1
7336     +static int guesstimate_hash_size(struct inode *inode)
7337     +{
7338     + struct inode *lower_inode;
7339     + int bindex;
7340     + int hashsize = MINHASHSIZE;
7341     +
7342     + if (UNIONFS_I(inode)->hashsize > 0)
7343     + return UNIONFS_I(inode)->hashsize;
7344     +
7345     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
7346     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
7347     + if (!lower_inode)
7348     + continue;
7349     +
7350     + if (i_size_read(lower_inode) == DENTPAGE)
7351     + hashsize += DENTPERONEPAGE;
7352     + else
7353     + hashsize += (i_size_read(lower_inode) / DENTPAGE) *
7354     + DENTPERPAGE;
7355     + }
7356     +
7357     + return hashsize;
7358     +}
7359     +
7360     +int init_rdstate(struct file *file)
7361     +{
7362     + BUG_ON(sizeof(loff_t) !=
7363     + (sizeof(unsigned int) + sizeof(unsigned int)));
7364     + BUG_ON(UNIONFS_F(file)->rdstate != NULL);
7365     +
7366     + UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
7367     + fbstart(file));
7368     +
7369     + return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
7370     +}
7371     +
7372     +struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
7373     +{
7374     + struct unionfs_dir_state *rdstate = NULL;
7375     + struct list_head *pos;
7376     +
7377     + spin_lock(&UNIONFS_I(inode)->rdlock);
7378     + list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
7379     + struct unionfs_dir_state *r =
7380     + list_entry(pos, struct unionfs_dir_state, cache);
7381     + if (fpos == rdstate2offset(r)) {
7382     + UNIONFS_I(inode)->rdcount--;
7383     + list_del(&r->cache);
7384     + rdstate = r;
7385     + break;
7386     + }
7387     + }
7388     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7389     + return rdstate;
7390     +}
7391     +
7392     +struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
7393     +{
7394     + int i = 0;
7395     + int hashsize;
7396     + unsigned long mallocsize = sizeof(struct unionfs_dir_state);
7397     + struct unionfs_dir_state *rdstate;
7398     +
7399     + hashsize = guesstimate_hash_size(inode);
7400     + mallocsize += hashsize * sizeof(struct list_head);
7401     + mallocsize = __roundup_pow_of_two(mallocsize);
7402     +
7403     + /* This should give us about 500 entries anyway. */
7404     + if (mallocsize > PAGE_SIZE)
7405     + mallocsize = PAGE_SIZE;
7406     +
7407     + hashsize = (mallocsize - sizeof(struct unionfs_dir_state)) /
7408     + sizeof(struct list_head);
7409     +
7410     + rdstate = kmalloc(mallocsize, GFP_KERNEL);
7411     + if (unlikely(!rdstate))
7412     + return NULL;
7413     +
7414     + spin_lock(&UNIONFS_I(inode)->rdlock);
7415     + if (UNIONFS_I(inode)->cookie >= (MAXRDCOOKIE - 1))
7416     + UNIONFS_I(inode)->cookie = 1;
7417     + else
7418     + UNIONFS_I(inode)->cookie++;
7419     +
7420     + rdstate->cookie = UNIONFS_I(inode)->cookie;
7421     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7422     + rdstate->offset = 1;
7423     + rdstate->access = jiffies;
7424     + rdstate->bindex = bindex;
7425     + rdstate->dirpos = 0;
7426     + rdstate->hashentries = 0;
7427     + rdstate->size = hashsize;
7428     + for (i = 0; i < rdstate->size; i++)
7429     + INIT_LIST_HEAD(&rdstate->list[i]);
7430     +
7431     + return rdstate;
7432     +}
7433     +
7434     +static void free_filldir_node(struct filldir_node *node)
7435     +{
7436     + if (node->namelen >= DNAME_INLINE_LEN_MIN)
7437     + kfree(node->name);
7438     + kmem_cache_free(unionfs_filldir_cachep, node);
7439     +}
7440     +
7441     +void free_rdstate(struct unionfs_dir_state *state)
7442     +{
7443     + struct filldir_node *tmp;
7444     + int i;
7445     +
7446     + for (i = 0; i < state->size; i++) {
7447     + struct list_head *head = &(state->list[i]);
7448     + struct list_head *pos, *n;
7449     +
7450     + /* traverse the list and deallocate space */
7451     + list_for_each_safe(pos, n, head) {
7452     + tmp = list_entry(pos, struct filldir_node, file_list);
7453     + list_del(&tmp->file_list);
7454     + free_filldir_node(tmp);
7455     + }
7456     + }
7457     +
7458     + kfree(state);
7459     +}
7460     +
7461     +struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
7462     + const char *name, int namelen,
7463     + int is_whiteout)
7464     +{
7465     + int index;
7466     + unsigned int hash;
7467     + struct list_head *head;
7468     + struct list_head *pos;
7469     + struct filldir_node *cursor = NULL;
7470     + int found = 0;
7471     +
7472     + BUG_ON(namelen <= 0);
7473     +
7474     + hash = full_name_hash(name, namelen);
7475     + index = hash % rdstate->size;
7476     +
7477     + head = &(rdstate->list[index]);
7478     + list_for_each(pos, head) {
7479     + cursor = list_entry(pos, struct filldir_node, file_list);
7480     +
7481     + if (cursor->namelen == namelen && cursor->hash == hash &&
7482     + !strncmp(cursor->name, name, namelen)) {
7483     + /*
7484     + * a duplicate exists, and hence no need to create
7485     + * entry to the list
7486     + */
7487     + found = 1;
7488     +
7489     + /*
7490     + * if a duplicate is found in this branch, and is
7491     + * not due to the caller looking for an entry to
7492     + * whiteout, then the file system may be corrupted.
7493     + */
7494     + if (unlikely(!is_whiteout &&
7495     + cursor->bindex == rdstate->bindex))
7496     + printk(KERN_ERR "unionfs: filldir: possible "
7497     + "I/O error: a file is duplicated "
7498     + "in the same branch %d: %s\n",
7499     + rdstate->bindex, cursor->name);
7500     + break;
7501     + }
7502     + }
7503     +
7504     + if (!found)
7505     + cursor = NULL;
7506     +
7507     + return cursor;
7508     +}
7509     +
7510     +int add_filldir_node(struct unionfs_dir_state *rdstate, const char *name,
7511     + int namelen, int bindex, int whiteout)
7512     +{
7513     + struct filldir_node *new;
7514     + unsigned int hash;
7515     + int index;
7516     + int err = 0;
7517     + struct list_head *head;
7518     +
7519     + BUG_ON(namelen <= 0);
7520     +
7521     + hash = full_name_hash(name, namelen);
7522     + index = hash % rdstate->size;
7523     + head = &(rdstate->list[index]);
7524     +
7525     + new = kmem_cache_alloc(unionfs_filldir_cachep, GFP_KERNEL);
7526     + if (unlikely(!new)) {
7527     + err = -ENOMEM;
7528     + goto out;
7529     + }
7530     +
7531     + INIT_LIST_HEAD(&new->file_list);
7532     + new->namelen = namelen;
7533     + new->hash = hash;
7534     + new->bindex = bindex;
7535     + new->whiteout = whiteout;
7536     +
7537     + if (namelen < DNAME_INLINE_LEN_MIN) {
7538     + new->name = new->iname;
7539     + } else {
7540     + new->name = kmalloc(namelen + 1, GFP_KERNEL);
7541     + if (unlikely(!new->name)) {
7542     + kmem_cache_free(unionfs_filldir_cachep, new);
7543     + err = -ENOMEM;
7544     + goto out;
7545     + }
7546     + }
7547     +
7548     + memcpy(new->name, name, namelen);
7549     + new->name[namelen] = '\0';
7550     +
7551     + rdstate->hashentries++;
7552     +
7553     + list_add(&(new->file_list), head);
7554     +out:
7555     + return err;
7556     +}
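The sizing arithmetic in guesstimate_hash_size and alloc_rdstate above can be checked in isolation. The sketch below is a userspace approximation under stated assumptions: the sketch_* names are hypothetical, and the header and bucket sizes are passed in as parameters because the real sizeof(struct unionfs_dir_state) and sizeof(struct list_head) depend on the kernel build.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DENTPAGE 4096
#define SKETCH_DENTPERONEPAGE 12
#define SKETCH_DENTPERPAGE 118
#define SKETCH_MINHASHSIZE 1

/* Estimate the bucket count from the lower directory sizes, in bytes. */
static unsigned long sketch_guess_hash_size(const long long *dir_sizes,
					    size_t nbranches)
{
	unsigned long hashsize = SKETCH_MINHASHSIZE;
	size_t i;

	for (i = 0; i < nbranches; i++) {
		if (dir_sizes[i] == SKETCH_DENTPAGE)
			hashsize += SKETCH_DENTPERONEPAGE;
		else	/* integer division: sub-page directories add 0 */
			hashsize += (unsigned long)
				(dir_sizes[i] / SKETCH_DENTPAGE) *
				SKETCH_DENTPERPAGE;
	}
	return hashsize;
}

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static unsigned long sketch_roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Turn a wanted bucket count into the count that actually fits once the
 * allocation is rounded up to a power of two and capped at one page.
 */
static unsigned long sketch_bucket_count(unsigned long header,
					 unsigned long bucket,
					 unsigned long wanted,
					 unsigned long page)
{
	unsigned long bytes;

	assert(bucket > 0 && header < page);
	bytes = sketch_roundup_pow2(header + wanted * bucket);
	if (bytes > page)
		bytes = page;
	return (bytes - header) / bucket;
}
```

With an assumed 48-byte header and 8-byte buckets, the one-page cap yields 506 buckets, in line with the "about 500 entries" remark in alloc_rdstate; rounding up also means a small request usually gets more buckets than it asked for.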
7557     diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
7558     new file mode 100644
7559     index 0000000..936700e
7560     --- /dev/null
7561     +++ b/fs/unionfs/rename.c
7562     @@ -0,0 +1,517 @@
7563     +/*
7564     + * Copyright (c) 2003-2010 Erez Zadok
7565     + * Copyright (c) 2003-2006 Charles P. Wright
7566     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7567     + * Copyright (c) 2005-2006 Junjiro Okajima
7568     + * Copyright (c) 2005 Arun M. Krishnakumar
7569     + * Copyright (c) 2004-2006 David P. Quigley
7570     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7571     + * Copyright (c) 2003 Puja Gupta
7572     + * Copyright (c) 2003 Harikesavan Krishnan
7573     + * Copyright (c) 2003-2010 Stony Brook University
7574     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
7575     + *
7576     + * This program is free software; you can redistribute it and/or modify
7577     + * it under the terms of the GNU General Public License version 2 as
7578     + * published by the Free Software Foundation.
7579     + */
7580     +
7581     +#include "union.h"
7582     +
7583     +/*
7584     + * This is a helper function for rename, used when rename ends up with hosed
7585     + * over dentries and we need to revert.
7586     + */
7587     +static int unionfs_refresh_lower_dentry(struct dentry *dentry,
7588     + struct dentry *parent, int bindex)
7589     +{
7590     + struct dentry *lower_dentry;
7591     + struct dentry *lower_parent;
7592     + int err = 0;
7593     +
7594     + verify_locked(dentry);
7595     +
7596     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
7597     +
7598     + BUG_ON(!S_ISDIR(lower_parent->d_inode->i_mode));
7599     +
7600     + lower_dentry = lookup_one_len(dentry->d_name.name, lower_parent,
7601     + dentry->d_name.len);
7602     + if (IS_ERR(lower_dentry)) {
7603     + err = PTR_ERR(lower_dentry);
7604     + goto out;
7605     + }
7606     +
7607     + dput(unionfs_lower_dentry_idx(dentry, bindex));
7608     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
7609     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, NULL);
7610     +
7611     + if (!lower_dentry->d_inode) {
7612     + dput(lower_dentry);
7613     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
7614     + } else {
7615     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
7616     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
7617     + igrab(lower_dentry->d_inode));
7618     + }
7619     +
7620     +out:
7621     + return err;
7622     +}
7623     +
7624     +static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7625     + struct dentry *old_parent,
7626     + struct inode *new_dir, struct dentry *new_dentry,
7627     + struct dentry *new_parent,
7628     + int bindex)
7629     +{
7630     + int err = 0;
7631     + struct dentry *lower_old_dentry;
7632     + struct dentry *lower_new_dentry;
7633     + struct dentry *lower_old_dir_dentry;
7634     + struct dentry *lower_new_dir_dentry;
7635     + struct dentry *trap;
7636     +
7637     + lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7638     + lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
7639     +
7640     + if (!lower_new_dentry) {
7641     + lower_new_dentry =
7642     + create_parents(new_parent->d_inode,
7643     + new_dentry, new_dentry->d_name.name,
7644     + bindex);
7645     + if (IS_ERR(lower_new_dentry)) {
7646     + err = PTR_ERR(lower_new_dentry);
7647     + if (IS_COPYUP_ERR(err))
7648     + goto out;
7649     + printk(KERN_ERR "unionfs: error creating directory "
7650     + "tree for rename, bindex=%d err=%d\n",
7651     + bindex, err);
7652     + goto out;
7653     + }
7654     + }
7655     +
7656     + /* check for and remove whiteout, if any */
7657     + err = check_unlink_whiteout(new_dentry, lower_new_dentry, bindex);
7658     + if (err > 0) /* ignore if whiteout found and successfully removed */
7659     + err = 0;
7660     + if (err)
7661     + goto out;
7662     +
7663     + /* check if the old_dentry branch is writable */
7664     + err = is_robranch_super(old_dentry->d_sb, bindex);
7665     + if (err)
7666     + goto out;
7667     +
7668     + dget(lower_old_dentry);
7669     + dget(lower_new_dentry);
7670     + lower_old_dir_dentry = dget_parent(lower_old_dentry);
7671     + lower_new_dir_dentry = dget_parent(lower_new_dentry);
7672     +
7673     + trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7674     + /* source should not be ancestor of target */
7675     + if (trap == lower_old_dentry) {
7676     + err = -EINVAL;
7677     + goto out_err_unlock;
7678     + }
7679     + /* target should not be ancestor of source */
7680     + if (trap == lower_new_dentry) {
7681     + err = -ENOTEMPTY;
7682     + goto out_err_unlock;
7683     + }
7684     + err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
7685     + lower_new_dir_dentry->d_inode, lower_new_dentry);
7686     +out_err_unlock:
7687     + if (!err) {
7688     + /* update parent dir times */
7689     + fsstack_copy_attr_times(old_dir, lower_old_dir_dentry->d_inode);
7690     + fsstack_copy_attr_times(new_dir, lower_new_dir_dentry->d_inode);
7691     + }
7692     + unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7693     +
7694     + dput(lower_old_dir_dentry);
7695     + dput(lower_new_dir_dentry);
7696     + dput(lower_old_dentry);
7697     + dput(lower_new_dentry);
7698     +
7699     +out:
7700     + if (!err) {
7701     + /* Fixup the new_dentry. */
7702     + if (bindex < dbstart(new_dentry))
7703     + dbstart(new_dentry) = bindex;
7704     + else if (bindex > dbend(new_dentry))
7705     + dbend(new_dentry) = bindex;
7706     + }
7707     +
7708     + return err;
7709     +}
7710     +
7711     +/*
7712     + * Main rename code. This is sufficiently complex that it is documented in
7713     + * Documentation/filesystems/unionfs/rename.txt. This routine calls
7714     + * __unionfs_rename() above to perform some of the work.
7715     + */
7716     +static int do_unionfs_rename(struct inode *old_dir,
7717     + struct dentry *old_dentry,
7718     + struct dentry *old_parent,
7719     + struct inode *new_dir,
7720     + struct dentry *new_dentry,
7721     + struct dentry *new_parent)
7722     +{
7723     + int err = 0;
7724     + int bindex;
7725     + int old_bstart, old_bend;
7726     + int new_bstart, new_bend;
7727     + int do_copyup = -1;
7728     + int local_err = 0;
7729     + int eio = 0;
7730     + int revert = 0;
7731     +
7732     + old_bstart = dbstart(old_dentry);
7733     + old_bend = dbend(old_dentry);
7734     +
7735     + new_bstart = dbstart(new_dentry);
7736     + new_bend = dbend(new_dentry);
7737     +
7738     + /* Rename source to destination. */
7739     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7740     + new_dir, new_dentry, new_parent,
7741     + old_bstart);
7742     + if (err) {
7743     + if (!IS_COPYUP_ERR(err))
7744     + goto out;
7745     + do_copyup = old_bstart - 1;
7746     + } else {
7747     + revert = 1;
7748     + }
7749     +
7750     + /*
7751     + * Unlink all instances of destination that exist to the left of
7752     + * bstart of source. On error, revert back, goto out.
7753     + */
7754     + for (bindex = old_bstart - 1; bindex >= new_bstart; bindex--) {
7755     + struct dentry *unlink_dentry;
7756     + struct dentry *unlink_dir_dentry;
7757     +
7758     + BUG_ON(bindex < 0);
7759     + unlink_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7760     + if (!unlink_dentry)
7761     + continue;
7762     +
7763     + unlink_dir_dentry = lock_parent(unlink_dentry);
7764     + err = is_robranch_super(old_dir->i_sb, bindex);
7765     + if (!err)
7766     + err = vfs_unlink(unlink_dir_dentry->d_inode,
7767     + unlink_dentry);
7768     +
7769     + fsstack_copy_attr_times(new_parent->d_inode,
7770     + unlink_dir_dentry->d_inode);
7771     + /* propagate number of hard-links */
7772     + new_parent->d_inode->i_nlink =
7773     + unionfs_get_nlinks(new_parent->d_inode);
7774     +
7775     + unlock_dir(unlink_dir_dentry);
7776     + if (!err) {
7777     + if (bindex != new_bstart) {
7778     + dput(unlink_dentry);
7779     + unionfs_set_lower_dentry_idx(new_dentry,
7780     + bindex, NULL);
7781     + }
7782     + } else if (IS_COPYUP_ERR(err)) {
7783     + do_copyup = bindex - 1;
7784     + } else if (revert) {
7785     + goto revert;
7786     + }
7787     + }
7788     +
7789     + if (do_copyup != -1) {
7790     + for (bindex = do_copyup; bindex >= 0; bindex--) {
7791     + /*
7792     + * copyup the file into some left directory, so that
7793     + * you can rename it
7794     + */
7795     + err = copyup_dentry(old_parent->d_inode,
7796     + old_dentry, old_bstart, bindex,
7797     + old_dentry->d_name.name,
7798     + old_dentry->d_name.len, NULL,
7799     + i_size_read(old_dentry->d_inode));
7800     + /* if copyup failed, try next branch to the left */
7801     + if (err)
7802     + continue;
7803     + /*
7804     + * create whiteout before calling __unionfs_rename
7805     + * because the latter will change the old_dentry's
7806     + * lower name and parent dir, resulting in the
7807     + * whiteout getting created in the wrong dir.
7808     + */
7809     + err = create_whiteout(old_dentry, bindex);
7810     + if (err) {
7811     + printk(KERN_ERR "unionfs: can't create a "
7812     + "whiteout for %s in rename (err=%d)\n",
7813     + old_dentry->d_name.name, err);
7814     + continue;
7815     + }
7816     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7817     + new_dir, new_dentry, new_parent,
7818     + bindex);
7819     + break;
7820     + }
7821     + }
7822     +
7823     + /* make it opaque */
7824     + if (S_ISDIR(old_dentry->d_inode->i_mode)) {
7825     + err = make_dir_opaque(old_dentry, dbstart(old_dentry));
7826     + if (err)
7827     + goto revert;
7828     + }
7829     +
7830     + /*
7831     + * Create whiteout for source, only if:
7832     + * (1) There is more than one underlying instance of source.
7833     + * (The case where we did a copyup is taken care of above.)
7834     + */
7835     + if ((old_bstart != old_bend) && (do_copyup == -1)) {
7836     + err = create_whiteout(old_dentry, old_bstart);
7837     + if (err) {
7838     + /* can't fix anything now, so we exit with -EIO */
7839     + printk(KERN_ERR "unionfs: can't create a whiteout for "
7840     + "%s in rename!\n", old_dentry->d_name.name);
7841     + err = -EIO;
7842     + }
7843     + }
7844     +
7845     +out:
7846     + return err;
7847     +
7848     +revert:
7849     + /* Do revert here. */
7850     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7851     + old_bstart);
7852     + if (local_err) {
7853     + printk(KERN_ERR "unionfs: revert failed in rename: "
7854     + "the new refresh failed\n");
7855     + eio = -EIO;
7856     + }
7857     +
7858     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7859     + old_bstart);
7860     + if (local_err) {
7861     + printk(KERN_ERR "unionfs: revert failed in rename: "
7862     + "the old refresh failed\n");
7863     + eio = -EIO;
7864     + goto revert_out;
7865     + }
7866     +
7867     + if (!unionfs_lower_dentry_idx(new_dentry, bindex) ||
7868     + !unionfs_lower_dentry_idx(new_dentry, bindex)->d_inode) {
7869     + printk(KERN_ERR "unionfs: revert failed in rename: "
7870     + "the object disappeared from under us!\n");
7871     + eio = -EIO;
7872     + goto revert_out;
7873     + }
7874     +
7875     + if (unionfs_lower_dentry_idx(old_dentry, bindex) &&
7876     + unionfs_lower_dentry_idx(old_dentry, bindex)->d_inode) {
7877     + printk(KERN_ERR "unionfs: revert failed in rename: "
7878     + "the object was created underneath us!\n");
7879     + eio = -EIO;
7880     + goto revert_out;
7881     + }
7882     +
7883     + local_err = __unionfs_rename(new_dir, new_dentry, new_parent,
7884     + old_dir, old_dentry, old_parent,
7885     + old_bstart);
7886     +
7887     + /* If we can't fix it, then we cop out with -EIO. */
7888     + if (local_err) {
7889     + printk(KERN_ERR "unionfs: revert failed in rename!\n");
7890     + eio = -EIO;
7891     + }
7892     +
7893     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7894     + bindex);
7895     + if (local_err)
7896     + eio = -EIO;
7897     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7898     + bindex);
7899     + if (local_err)
7900     + eio = -EIO;
7901     +
7902     +revert_out:
7903     + if (eio)
7904     + err = eio;
7905     + return err;
7906     +}
7907     +
7908     +/*
7909     + * We can't copyup a directory, because it may involve huge numbers of
7910     + * children, etc. Doing that in the kernel would be bad, so instead we
7911     + * return EXDEV to the user-space utility that caused this, and let the
7912     + * user-space recurse and ask us to copy up each file separately.
7913     + */
7914     +static int may_rename_dir(struct dentry *dentry, struct dentry *parent)
7915     +{
7916     + int err, bstart;
7917     +
7918     + err = check_empty(dentry, parent, NULL);
7919     + if (err == -ENOTEMPTY) {
7920     + if (is_robranch(dentry))
7921     + return -EXDEV;
7922     + } else if (err) {
7923     + return err;
7924     + }
7925     +
7926     + bstart = dbstart(dentry);
7927     + if (dbend(dentry) == bstart || dbopaque(dentry) == bstart)
7928     + return 0;
7929     +
7930     + dbstart(dentry) = bstart + 1;
7931     + err = check_empty(dentry, parent, NULL);
7932     + dbstart(dentry) = bstart;
7933     + if (err == -ENOTEMPTY)
7934     + err = -EXDEV;
7935     + return err;
7936     +}
7937     +
7938     +/*
7939     + * The locking rules in unionfs_rename are complex. We could use a simpler
7940     + * superblock-level name-space lock for renames and copy-ups.
7941     + */
7942     +int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7943     + struct inode *new_dir, struct dentry *new_dentry)
7944     +{
7945     + int err = 0;
7946     + struct dentry *wh_dentry;
7947     + struct dentry *old_parent, *new_parent;
7948     + int valid = true;
7949     +
7950     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
7951     + old_parent = dget_parent(old_dentry);
7952     + new_parent = dget_parent(new_dentry);
7953     + /* un/lock parent dentries only if they differ from old/new_dentry */
7954     + if (old_parent != old_dentry &&
7955     + old_parent != new_dentry)
7956     + unionfs_lock_dentry(old_parent, UNIONFS_DMUTEX_REVAL_PARENT);
7957     + if (new_parent != old_dentry &&
7958     + new_parent != new_dentry &&
7959     + new_parent != old_parent)
7960     + unionfs_lock_dentry(new_parent, UNIONFS_DMUTEX_REVAL_CHILD);
7961     + unionfs_double_lock_dentry(old_dentry, new_dentry);
7962     +
7963     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
7964     + if (!valid) {
7965     + err = -ESTALE;
7966     + goto out;
7967     + }
7968     + if (!d_deleted(new_dentry) && new_dentry->d_inode) {
7969     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
7970     + if (!valid) {
7971     + err = -ESTALE;
7972     + goto out;
7973     + }
7974     + }
7975     +
7976     + if (!S_ISDIR(old_dentry->d_inode->i_mode))
7977     + err = unionfs_partial_lookup(old_dentry, old_parent);
7978     + else
7979     + err = may_rename_dir(old_dentry, old_parent);
7980     +
7981     + if (err)
7982     + goto out;
7983     +
7984     + err = unionfs_partial_lookup(new_dentry, new_parent);
7985     + if (err)
7986     + goto out;
7987     +
7988     + /*
7989     + * if new_dentry is already lower because of whiteout,
7990     + * simply override it even if the whited-out dir is not empty.
7991     + */
7992     + wh_dentry = find_first_whiteout(new_dentry);
7993     + if (!IS_ERR(wh_dentry)) {
7994     + dput(wh_dentry);
7995     + } else if (new_dentry->d_inode) {
7996     + if (S_ISDIR(old_dentry->d_inode->i_mode) !=
7997     + S_ISDIR(new_dentry->d_inode->i_mode)) {
7998     + err = S_ISDIR(old_dentry->d_inode->i_mode) ?
7999     + -ENOTDIR : -EISDIR;
8000     + goto out;
8001     + }
8002     +
8003     + if (S_ISDIR(new_dentry->d_inode->i_mode)) {
8004     + struct unionfs_dir_state *namelist = NULL;
8005     + /* check if this unionfs directory is empty or not */
8006     + err = check_empty(new_dentry, new_parent, &namelist);
8007     + if (err)
8008     + goto out;
8009     +
8010     + if (!is_robranch(new_dentry))
8011     + err = delete_whiteouts(new_dentry,
8012     + dbstart(new_dentry),
8013     + namelist);
8014     +
8015     + free_rdstate(namelist);
8016     +
8017     + if (err)
8018     + goto out;
8019     + }
8020     + }
8021     +
8022     + err = do_unionfs_rename(old_dir, old_dentry, old_parent,
8023     + new_dir, new_dentry, new_parent);
8024     + if (err)
8025     + goto out;
8026     +
8027     + /*
8028     + * force re-lookup since the dir on ro branch is not renamed, and
8029     + * lower dentries still indicate the un-renamed ones.
8030     + */
8031     + if (S_ISDIR(old_dentry->d_inode->i_mode))
8032     + atomic_dec(&UNIONFS_D(old_dentry)->generation);
8033     + else
8034     + unionfs_postcopyup_release(old_dentry);
8035     + if (new_dentry->d_inode && !S_ISDIR(new_dentry->d_inode->i_mode)) {
8036     + unionfs_postcopyup_release(new_dentry);
8037     + unionfs_postcopyup_setmnt(new_dentry);
8038     + if (!unionfs_lower_inode(new_dentry->d_inode)) {
8039     + /*
8040     + * If we get here, it means that no copyup was
8041     + * needed, and that a file by the old name already
8042     + * existed on the destination branch; that file got
8043     + * renamed earlier in this function, so all we need
8044     + * to do here is set the lower inode.
8045     + */
8046     + struct inode *inode;
8047     + inode = unionfs_lower_inode(old_dentry->d_inode);
8048     + igrab(inode);
8049     + unionfs_set_lower_inode_idx(new_dentry->d_inode,
8050     + dbstart(new_dentry),
8051     + inode);
8052     + }
8053     + }
8054     + /* if all of this renaming succeeded, update our times */
8055     + unionfs_copy_attr_times(old_dentry->d_inode);
8056     + unionfs_copy_attr_times(new_dentry->d_inode);
8057     + unionfs_check_inode(old_dir);
8058     + unionfs_check_inode(new_dir);
8059     + unionfs_check_dentry(old_dentry);
8060     + unionfs_check_dentry(new_dentry);
8061     +
8062     +out:
8063     + if (err) /* clear the new_dentry stuff created */
8064     + d_drop(new_dentry);
8065     +
8066     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
8067     + if (new_parent != old_dentry &&
8068     + new_parent != new_dentry &&
8069     + new_parent != old_parent)
8070     + unionfs_unlock_dentry(new_parent);
8071     + if (old_parent != old_dentry &&
8072     + old_parent != new_dentry)
8073     + unionfs_unlock_dentry(old_parent);
8074     + dput(new_parent);
8075     + dput(old_parent);
8076     + unionfs_read_unlock(old_dentry->d_sb);
8077     +
8078     + return err;
8079     +}
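Editorial sketch, not part of the patch: after a successful lower rename, __unionfs_rename() in rename.c above widens the new dentry's branch range so that it covers the branch just written to. The user-space C below mirrors only that fixup. The names branch_range and widen_range are hypothetical stand-ins for the dbstart()/dbend() accessors and are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry's branch range (dbstart/dbend). */
struct branch_range {
	int start;	/* leftmost (highest-priority) branch holding the object */
	int end;	/* rightmost branch holding the object */
};

/*
 * Widen the range so that it covers bindex. An index already inside
 * the range leaves it unchanged.
 */
static void widen_range(struct branch_range *r, int bindex)
{
	assert(r != NULL);
	if (bindex < r->start)
		r->start = bindex;
	else if (bindex > r->end)
		r->end = bindex;
}
```

A copyup followed by a rename on a branch further to the left therefore lowers the start index and leaves the end index alone.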
8080     diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
8081     new file mode 100644
8082     index 0000000..760c580
8083     --- /dev/null
8084     +++ b/fs/unionfs/sioq.c
8085     @@ -0,0 +1,101 @@
8086     +/*
8087     + * Copyright (c) 2006-2010 Erez Zadok
8088     + * Copyright (c) 2006 Charles P. Wright
8089     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8090     + * Copyright (c) 2006 Junjiro Okajima
8091     + * Copyright (c) 2006 David P. Quigley
8092     + * Copyright (c) 2006-2010 Stony Brook University
8093     + * Copyright (c) 2006-2010 The Research Foundation of SUNY
8094     + *
8095     + * This program is free software; you can redistribute it and/or modify
8096     + * it under the terms of the GNU General Public License version 2 as
8097     + * published by the Free Software Foundation.
8098     + */
8099     +
8100     +#include "union.h"
8101     +
8102     +/*
8103     + * Super-user IO work Queue - sometimes we need to perform actions which
8104     + * would fail due to the unix permissions on the parent directory (e.g.,
8105     + * rmdir a directory which appears empty, but in reality contains
8106     + * whiteouts).
8107     + */
8108     +
8109     +static struct workqueue_struct *superio_workqueue;
8110     +
8111     +int __init init_sioq(void)
8112     +{
8113     + int err;
8114     +
8115     + superio_workqueue = create_workqueue("unionfs_siod");
8116     + if (superio_workqueue)
8117     + return 0;
8118     +
8119     + err = -ENOMEM; /* create_workqueue() returns NULL on failure */
8120     + printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
8121     + superio_workqueue = NULL;
8122     + return err;
8123     +}
8124     +
8125     +void stop_sioq(void)
8126     +{
8127     + if (superio_workqueue)
8128     + destroy_workqueue(superio_workqueue);
8129     +}
8130     +
8131     +void run_sioq(work_func_t func, struct sioq_args *args)
8132     +{
8133     + INIT_WORK(&args->work, func);
8134     +
8135     + init_completion(&args->comp);
8136     + while (!queue_work(superio_workqueue, &args->work)) {
8137     + /* TODO: do accounting if needed */
8138     + schedule();
8139     + }
8140     + wait_for_completion(&args->comp);
8141     +}
8142     +
8143     +void __unionfs_create(struct work_struct *work)
8144     +{
8145     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8146     + struct create_args *c = &args->create;
8147     +
8148     + args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
8149     + complete(&args->comp);
8150     +}
8151     +
8152     +void __unionfs_mkdir(struct work_struct *work)
8153     +{
8154     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8155     + struct mkdir_args *m = &args->mkdir;
8156     +
8157     + args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
8158     + complete(&args->comp);
8159     +}
8160     +
8161     +void __unionfs_mknod(struct work_struct *work)
8162     +{
8163     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8164     + struct mknod_args *m = &args->mknod;
8165     +
8166     + args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
8167     + complete(&args->comp);
8168     +}
8169     +
8170     +void __unionfs_symlink(struct work_struct *work)
8171     +{
8172     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8173     + struct symlink_args *s = &args->symlink;
8174     +
8175     + args->err = vfs_symlink(s->parent, s->dentry, s->symbuf);
8176     + complete(&args->comp);
8177     +}
8178     +
8179     +void __unionfs_unlink(struct work_struct *work)
8180     +{
8181     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8182     + struct unlink_args *u = &args->unlink;
8183     +
8184     + args->err = vfs_unlink(u->parent, u->dentry);
8185     + complete(&args->comp);
8186     +}
8187     diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
8188     new file mode 100644
8189     index 0000000..b26d248
8190     --- /dev/null
8191     +++ b/fs/unionfs/sioq.h
8192     @@ -0,0 +1,91 @@
8193     +/*
8194     + * Copyright (c) 2006-2010 Erez Zadok
8195     + * Copyright (c) 2006 Charles P. Wright
8196     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8197     + * Copyright (c) 2006 Junjiro Okajima
8198     + * Copyright (c) 2006 David P. Quigley
8199     + * Copyright (c) 2006-2010 Stony Brook University
8200     + * Copyright (c) 2006-2010 The Research Foundation of SUNY
8201     + *
8202     + * This program is free software; you can redistribute it and/or modify
8203     + * it under the terms of the GNU General Public License version 2 as
8204     + * published by the Free Software Foundation.
8205     + */
8206     +
8207     +#ifndef _SIOQ_H
8208     +#define _SIOQ_H
8209     +
8210     +struct deletewh_args {
8211     + struct unionfs_dir_state *namelist;
8212     + struct dentry *dentry;
8213     + int bindex;
8214     +};
8215     +
8216     +struct is_opaque_args {
8217     + struct dentry *dentry;
8218     +};
8219     +
8220     +struct create_args {
8221     + struct inode *parent;
8222     + struct dentry *dentry;
8223     + umode_t mode;
8224     + struct nameidata *nd;
8225     +};
8226     +
8227     +struct mkdir_args {
8228     + struct inode *parent;
8229     + struct dentry *dentry;
8230     + umode_t mode;
8231     +};
8232     +
8233     +struct mknod_args {
8234     + struct inode *parent;
8235     + struct dentry *dentry;
8236     + umode_t mode;
8237     + dev_t dev;
8238     +};
8239     +
8240     +struct symlink_args {
8241     + struct inode *parent;
8242     + struct dentry *dentry;
8243     + char *symbuf;
8244     +};
8245     +
8246     +struct unlink_args {
8247     + struct inode *parent;
8248     + struct dentry *dentry;
8249     +};
8250     +
8251     +
8252     +struct sioq_args {
8253     + struct completion comp;
8254     + struct work_struct work;
8255     + int err;
8256     + void *ret;
8257     +
8258     + union {
8259     + struct deletewh_args deletewh;
8260     + struct is_opaque_args is_opaque;
8261     + struct create_args create;
8262     + struct mkdir_args mkdir;
8263     + struct mknod_args mknod;
8264     + struct symlink_args symlink;
8265     + struct unlink_args unlink;
8266     + };
8267     +};
8268     +
8269     +/* Extern definitions for SIOQ functions */
8270     +extern int __init init_sioq(void);
8271     +extern void stop_sioq(void);
8272     +extern void run_sioq(work_func_t func, struct sioq_args *args);
8273     +
8274     +/* Extern definitions for our privilege escalation helpers */
8275     +extern void __unionfs_create(struct work_struct *work);
8276     +extern void __unionfs_mkdir(struct work_struct *work);
8277     +extern void __unionfs_mknod(struct work_struct *work);
8278     +extern void __unionfs_symlink(struct work_struct *work);
8279     +extern void __unionfs_unlink(struct work_struct *work);
8280     +extern void __delete_whiteouts(struct work_struct *work);
8281     +extern void __is_opaque_dir(struct work_struct *work);
8282     +
8283     +#endif /* not _SIOQ_H */
8284     diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
8285     new file mode 100644
8286     index 0000000..570a344
8287     --- /dev/null
8288     +++ b/fs/unionfs/subr.c
8289     @@ -0,0 +1,95 @@
8290     +/*
8291     + * Copyright (c) 2003-2010 Erez Zadok
8292     + * Copyright (c) 2003-2006 Charles P. Wright
8293     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8294     + * Copyright (c) 2005-2006 Junjiro Okajima
8295     + * Copyright (c) 2005 Arun M. Krishnakumar
8296     + * Copyright (c) 2004-2006 David P. Quigley
8297     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8298     + * Copyright (c) 2003 Puja Gupta
8299     + * Copyright (c) 2003 Harikesavan Krishnan
8300     + * Copyright (c) 2003-2010 Stony Brook University
8301     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
8302     + *
8303     + * This program is free software; you can redistribute it and/or modify
8304     + * it under the terms of the GNU General Public License version 2 as
8305     + * published by the Free Software Foundation.
8306     + */
8307     +
8308     +#include "union.h"
8309     +
8310     +/*
8311     + * returns the right n_link value based on the inode type
8312     + */
8313     +int unionfs_get_nlinks(const struct inode *inode)
8314     +{
8315     + /* don't bother to do all the work since we're unlinked */
8316     + if (inode->i_nlink == 0)
8317     + return 0;
8318     +
8319     + if (!S_ISDIR(inode->i_mode))
8320     + return unionfs_lower_inode(inode)->i_nlink;
8321     +
8322     + /*
8323     + * For directories, we return 1. The only place that could care
8324     + * about links is readdir, and there's d_type there so even that
8325     + * doesn't matter.
8326     + */
8327     + return 1;
8328     +}
8329     +
8330     +/* copy a/m/ctime from the lower branch with the newest times */
8331     +void unionfs_copy_attr_times(struct inode *upper)
8332     +{
8333     + int bindex;
8334     + struct inode *lower;
8335     +
8336     + if (!upper)
8337     + return;
8338     + if (ibstart(upper) < 0) {
8339     +#ifdef CONFIG_UNION_FS_DEBUG
8340     + WARN_ON(ibstart(upper) < 0);
8341     +#endif /* CONFIG_UNION_FS_DEBUG */
8342     + return;
8343     + }
8344     + for (bindex = ibstart(upper); bindex <= ibend(upper); bindex++) {
8345     + lower = unionfs_lower_inode_idx(upper, bindex);
8346     + if (!lower)
8347     + continue; /* not all lower dir objects may exist */
8348     + if (unlikely(timespec_compare(&upper->i_mtime,
8349     + &lower->i_mtime) < 0))
8350     + upper->i_mtime = lower->i_mtime;
8351     + if (unlikely(timespec_compare(&upper->i_ctime,
8352     + &lower->i_ctime) < 0))
8353     + upper->i_ctime = lower->i_ctime;
8354     + if (unlikely(timespec_compare(&upper->i_atime,
8355     + &lower->i_atime) < 0))
8356     + upper->i_atime = lower->i_atime;
8357     + }
8358     +}
8359     +
8360     +/*
8361     + * A unionfs/fanout version of fsstack_copy_attr_all. Uses
8362     + * unionfs_get_nlinks to properly calculate the number of links to a file.
8363     + * Also, copies the max() of all a/m/ctimes for all lower inodes (which is
8364     + * important if the lower inode is a directory type)
8365     + */
8366     +void unionfs_copy_attr_all(struct inode *dest,
8367     + const struct inode *src)
8368     +{
8369     + dest->i_mode = src->i_mode;
8370     + dest->i_uid = src->i_uid;
8371     + dest->i_gid = src->i_gid;
8372     + dest->i_rdev = src->i_rdev;
8373     +
8374     + unionfs_copy_attr_times(dest);
8375     +
8376     + dest->i_blkbits = src->i_blkbits;
8377     + dest->i_flags = src->i_flags;
8378     +
8379     + /*
8380     + * Update the nlinks AFTER updating the above fields, because the
8381     + * get_links callback may depend on them.
8382     + */
8383     + dest->i_nlink = unionfs_get_nlinks(dest);
8384     +}
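Editorial sketch, not part of the patch: unionfs_copy_attr_times() in subr.c above keeps each upper timestamp at the maximum over all lower inodes and skips branches where the object does not exist. The user-space C below shows the same rule for a single timestamp. ts_compare and ts_take_newest are hypothetical names chosen for illustration, and struct timespec from <time.h> stands in for the kernel's inode time fields.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Return <0, 0 or >0 as a is earlier than, equal to, or later than b. */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Raise *upper to the newest of lower[0..n-1]; NULL slots are skipped. */
static void ts_take_newest(struct timespec *upper,
			   const struct timespec *const lower[], size_t n)
{
	size_t i;

	assert(upper != NULL);
	for (i = 0; i < n; i++) {
		if (!lower[i])
			continue;	/* this branch has no object */
		if (ts_compare(upper, lower[i]) < 0)
			*upper = *lower[i];
	}
}
```

Because the helper only ever moves the upper value forward, the upper timestamp has to start at zero for the result to equal the true maximum.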
8385     diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
8386     new file mode 100644
8387     index 0000000..b8cabec
8388     --- /dev/null
8389     +++ b/fs/unionfs/super.c
8390     @@ -0,0 +1,1026 @@
8391     +/*
8392     + * Copyright (c) 2003-2010 Erez Zadok
8393     + * Copyright (c) 2003-2006 Charles P. Wright
8394     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8395     + * Copyright (c) 2005-2006 Junjiro Okajima
8396     + * Copyright (c) 2005 Arun M. Krishnakumar
8397     + * Copyright (c) 2004-2006 David P. Quigley
8398     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8399     + * Copyright (c) 2003 Puja Gupta
8400     + * Copyright (c) 2003 Harikesavan Krishnan
8401     + * Copyright (c) 2003-2010 Stony Brook University
8402     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
8403     + *
8404     + * This program is free software; you can redistribute it and/or modify
8405     + * it under the terms of the GNU General Public License version 2 as
8406     + * published by the Free Software Foundation.
8407     + */
8408     +
8409     +#include "union.h"
8410     +
8411     +/*
8412     + * The inode cache is used with alloc_inode for both our inode info and the
8413     + * vfs inode.
8414     + */
8415     +static struct kmem_cache *unionfs_inode_cachep;
8416     +
8417     +struct inode *unionfs_iget(struct super_block *sb, unsigned long ino)
8418     +{
8419     + int size;
8420     + struct unionfs_inode_info *info;
8421     + struct inode *inode;
8422     +
8423     + inode = iget_locked(sb, ino);
8424     + if (!inode)
8425     + return ERR_PTR(-ENOMEM);
8426     + if (!(inode->i_state & I_NEW))
8427     + return inode;
8428     +
8429     + info = UNIONFS_I(inode);
8430     + memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
8431     + info->bstart = -1;
8432     + info->bend = -1;
8433     + atomic_set(&info->generation,
8434     + atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
8435     + spin_lock_init(&info->rdlock);
8436     + info->rdcount = 1;
8437     + info->hashsize = -1;
8438     + INIT_LIST_HEAD(&info->readdircache);
8439     +
8440     + size = sbmax(inode->i_sb) * sizeof(struct inode *);
8441     + info->lower_inodes = kzalloc(size, GFP_KERNEL);
8442     + if (unlikely(!info->lower_inodes)) {
8443     + printk(KERN_CRIT "unionfs: no kernel memory when allocating "
8444     + "lower-pointer array!\n");
8445     + iget_failed(inode);
8446     + return ERR_PTR(-ENOMEM);
8447     + }
8448     +
8449     + inode->i_version++;
8450     + inode->i_op = &unionfs_main_iops;
8451     + inode->i_fop = &unionfs_main_fops;
8452     +
8453     + inode->i_mapping->a_ops = &unionfs_aops;
8454     +
8455     + /*
8456     + * reset times so unionfs_copy_attr_all can keep our time invariants
8457     + * right (upper inode time being the max of all lower ones).
8458     + */
8459     + inode->i_atime.tv_sec = inode->i_atime.tv_nsec = 0;
8460     + inode->i_mtime.tv_sec = inode->i_mtime.tv_nsec = 0;
8461     + inode->i_ctime.tv_sec = inode->i_ctime.tv_nsec = 0;
8462     + unlock_new_inode(inode);
8463     + return inode;
8464     +}
8465     +
8466     +/*
8467     + * final actions when unmounting a file system
8468     + *
8469     + * No need to lock rwsem.
8470     + */
8471     +static void unionfs_put_super(struct super_block *sb)
8472     +{
8473     + int bindex, bstart, bend;
8474     + struct unionfs_sb_info *spd;
8475     + int leaks = 0;
8476     +
8477     + spd = UNIONFS_SB(sb);
8478     + if (!spd)
8479     + return;
8480     +
8481     + bstart = sbstart(sb);
8482     + bend = sbend(sb);
8483     +
8484     + /* Make sure we have no leaks of branchget/branchput. */
8485     + for (bindex = bstart; bindex <= bend; bindex++)
8486     + if (unlikely(branch_count(sb, bindex) != 0)) {
8487     + printk(KERN_CRIT
8488     + "unionfs: branch %d has %d references left!\n",
8489     + bindex, branch_count(sb, bindex));
8490     + leaks = 1;
8491     + }
8492     + WARN_ON(leaks != 0);
8493     +
8494     + /* decrement lower super references */
8495     + for (bindex = bstart; bindex <= bend; bindex++) {
8496     + struct super_block *s;
8497     + s = unionfs_lower_super_idx(sb, bindex);
8498     + unionfs_set_lower_super_idx(sb, bindex, NULL);
8499     + atomic_dec(&s->s_active);
8500     + }
8501     +
8502     + kfree(spd->dev_name);
8503     + kfree(spd->data);
8504     + kfree(spd);
8505     + sb->s_fs_info = NULL;
8506     +}
8507     +
8508     +/*
8509     + * Since people use this to answer the "How big of a file can I write?"
8510     + * question, we report the size of the highest priority branch as the size of
8511     + * the union.
8512     + */
8513     +static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
8514     +{
8515     + int err = 0;
8516     + struct super_block *sb;
8517     + struct dentry *lower_dentry;
8518     + struct dentry *parent;
8519     + struct path lower_path;
8520     + bool valid;
8521     +
8522     + sb = dentry->d_sb;
8523     +
8524     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
8525     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
8526     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
8527     +
8528     + valid = __unionfs_d_revalidate(dentry, parent, false);
8529     + if (unlikely(!valid)) {
8530     + err = -ESTALE;
8531     + goto out;
8532     + }
8533     + unionfs_check_dentry(dentry);
8534     +
8535     + lower_dentry = unionfs_lower_dentry(sb->s_root);
8536     + lower_path.dentry = lower_dentry;
8537     + lower_path.mnt = unionfs_mntget(sb->s_root, 0);
8538     + err = vfs_statfs(&lower_path, buf);
8539     + mntput(lower_path.mnt);
8540     +
8541     + /* set return buf to our f/s to avoid confusing user-level utils */
8542     + buf->f_type = UNIONFS_SUPER_MAGIC;
8543     + /*
8544     + * Our maximum file name can be shorter by a few bytes because every
8545     + * file name could potentially be whited-out.
8546     + *
8547     + * XXX: this restriction goes away with ODF.
8548     + */
8549     + unionfs_set_max_namelen(&buf->f_namelen);
8550     +
8551     + /*
8552     + * reset two fields to avoid confusing user-land.
8553     + * XXX: is this still necessary?
8554     + */
8555     + memset(&buf->f_fsid, 0, sizeof(__kernel_fsid_t));
8556     + memset(&buf->f_spare, 0, sizeof(buf->f_spare));
8557     +
8558     +out:
8559     + unionfs_check_dentry(dentry);
8560     + unionfs_unlock_dentry(dentry);
8561     + unionfs_unlock_parent(dentry, parent);
8562     + unionfs_read_unlock(sb);
8563     + return err;
8564     +}
8565     +
8566     +/* handle mode changing during remount */
8567     +static noinline_for_stack int do_remount_mode_option(
8568     + char *optarg,
8569     + int cur_branches,
8570     + struct unionfs_data *new_data,
8571     + struct path *new_lower_paths)
8572     +{
8573     + int err = -EINVAL;
8574     + int perms, idx;
8575     + char *modename = strchr(optarg, '=');
8576     + struct nameidata nd;
8577     +
8578     + /* by now, optarg contains the branch name */
8579     + if (!*optarg) {
8580     + printk(KERN_ERR
8581     + "unionfs: no branch specified for mode change\n");
8582     + goto out;
8583     + }
8584     + if (!modename) {
8585     + printk(KERN_ERR "unionfs: branch \"%s\" requires a mode\n",
8586     + optarg);
8587     + goto out;
8588     + }
8589     + *modename++ = '\0';
8590     + err = parse_branch_mode(modename, &perms);
8591     + if (err) {
8592     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for \"%s\"\n",
8593     + modename, optarg);
8594     + goto out;
8595     + }
8596     +
8597     + /*
8598     + * Find matching branch index. For now, this assumes that nothing
8599     + * has been mounted on top of this Unionfs stack. Once we have /odf
8600     + * and cache-coherency resolved, we'll address the branch-path
8601     + * uniqueness.
8602     + */
8603     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8604     + if (err) {
8605     + printk(KERN_ERR "unionfs: error accessing "
8606     + "lower directory \"%s\" (error %d)\n",
8607     + optarg, err);
8608     + goto out;
8609     + }
8610     + for (idx = 0; idx < cur_branches; idx++)
8611     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8612     + nd.path.dentry == new_lower_paths[idx].dentry)
8613     + break;
8614     + path_put(&nd.path); /* no longer needed */
8615     + if (idx == cur_branches) {
8616     + err = -ENOENT; /* err may have been reset above */
8617     + printk(KERN_ERR "unionfs: branch \"%s\" "
8618     + "not found\n", optarg);
8619     + goto out;
8620     + }
8621     + /* check/change mode for existing branch */
8622     + /* we don't warn if perms==branchperms */
8623     + new_data[idx].branchperms = perms;
8624     + err = 0;
8625     +out:
8626     + return err;
8627     +}
8628     +
8629     +/* handle branch deletion during remount */
8630     +static noinline_for_stack int do_remount_del_option(
8631     + char *optarg, int cur_branches,
8632     + struct unionfs_data *new_data,
8633     + struct path *new_lower_paths)
8634     +{
8635     + int err = -EINVAL;
8636     + int idx;
8637     + struct nameidata nd;
8638     +
8639     + /* optarg contains the branch name to delete */
8640     +
8641     + /*
8642     + * Find matching branch index. For now, this assumes that nothing
8643     + * has been mounted on top of this Unionfs stack. Once we have /odf
8644     + * and cache-coherency resolved, we'll address the branch-path
8645     + * uniqueness.
8646     + */
8647     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8648     + if (err) {
8649     + printk(KERN_ERR "unionfs: error accessing "
8650     + "lower directory \"%s\" (error %d)\n",
8651     + optarg, err);
8652     + goto out;
8653     + }
8654     + for (idx = 0; idx < cur_branches; idx++)
8655     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8656     + nd.path.dentry == new_lower_paths[idx].dentry)
8657     + break;
8658     + path_put(&nd.path); /* no longer needed */
8659     + if (idx == cur_branches) {
8660     + printk(KERN_ERR "unionfs: branch \"%s\" "
8661     + "not found\n", optarg);
8662     + err = -ENOENT;
8663     + goto out;
8664     + }
8665     + /* check if there are any open files on the branch to be deleted */
8666     + if (atomic_read(&new_data[idx].open_files) > 0) {
8667     + err = -EBUSY;
8668     + goto out;
8669     + }
8670     +
8671     + /*
8672     + * Now we have to delete the branch. First, release any handles it
8673     + * has. Then, move the remaining array indexes past "idx" in
8674     + * new_data and new_lower_paths one to the left. Finally, adjust
8675     + * cur_branches.
8676     + */
8677     + path_put(&new_lower_paths[idx]);
8678     +
8679     + if (idx < cur_branches - 1) {
8680     + /* if idx==cur_branches-1, we delete last branch: easy */
8681     + memmove(&new_data[idx], &new_data[idx+1],
8682     + (cur_branches - 1 - idx) *
8683     + sizeof(struct unionfs_data));
8684     + memmove(&new_lower_paths[idx], &new_lower_paths[idx+1],
8685     + (cur_branches - 1 - idx) * sizeof(struct path));
8686     + }
8687     +
8688     + err = 0;
8689     +out:
8690     + return err;
8691     +}
8692     +
8693     +/* handle branch insertion during remount */
8694     +static noinline_for_stack int do_remount_add_option(
8695     + char *optarg, int cur_branches,
8696     + struct unionfs_data *new_data,
8697     + struct path *new_lower_paths,
8698     + int *high_branch_id)
8699     +{
8700     + int err = -EINVAL;
8701     + int perms;
8702     + int idx = 0; /* default: insert at beginning */
8703     + char *new_branch , *modename = NULL;
8704     + struct nameidata nd;
8705     +
8706     + /*
8707     + * optarg can be of several forms:
8708     + *
8709     + * /bar:/foo insert /foo before /bar
8710     + * /bar:/foo=ro insert /foo in ro mode before /bar
8711     + * /foo insert /foo in the beginning (prepend)
8712     + * :/foo insert /foo at the end (append)
8713     + */
8714     + if (*optarg == ':') { /* append? */
8715     + new_branch = optarg + 1; /* skip ':' */
8716     + idx = cur_branches;
8717     + goto found_insertion_point;
8718     + }
8719     + new_branch = strchr(optarg, ':');
8720     + if (!new_branch) { /* prepend? */
8721     + new_branch = optarg;
8722     + goto found_insertion_point;
8723     + }
8724     + *new_branch++ = '\0'; /* holds path+mode of new branch */
8725     +
8726     + /*
8727     + * Find matching branch index. For now, this assumes that nothing
8728     + * has been mounted on top of this Unionfs stack. Once we have /odf
8729     + * and cache-coherency resolved, we'll address the branch-path
8730     + * uniqueness.
8731     + */
8732     + err = path_lookup(optarg, LOOKUP_FOLLOW, &nd);
8733     + if (err) {
8734     + printk(KERN_ERR "unionfs: error accessing "
8735     + "lower directory \"%s\" (error %d)\n",
8736     + optarg, err);
8737     + goto out;
8738     + }
8739     + for (idx = 0; idx < cur_branches; idx++)
8740     + if (nd.path.mnt == new_lower_paths[idx].mnt &&
8741     + nd.path.dentry == new_lower_paths[idx].dentry)
8742     + break;
8743     + path_put(&nd.path); /* no longer needed */
8744     + if (idx == cur_branches) {
8745     + printk(KERN_ERR "unionfs: branch \"%s\" "
8746     + "not found\n", optarg);
8747     + err = -ENOENT;
8748     + goto out;
8749     + }
8750     +
8751     + /*
8752     + * At this point idx holds the index before which the new branch
8753     + * should be inserted.
8754     + */
8755     +found_insertion_point:
8756     + /* find the mode for the new branch */
8757     + if (new_branch)
8758     + modename = strchr(new_branch, '=');
8759     + if (modename)
8760     + *modename++ = '\0';
8761     + if (!new_branch || !*new_branch) {
8762     + printk(KERN_ERR "unionfs: null new branch\n");
8763     + err = -EINVAL;
8764     + goto out;
8765     + }
8766     + err = parse_branch_mode(modename, &perms);
8767     + if (err) {
8768     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
8769     + "branch \"%s\"\n", modename, new_branch);
8770     + goto out;
8771     + }
8772     + err = path_lookup(new_branch, LOOKUP_FOLLOW, &nd);
8773     + if (err) {
8774     + printk(KERN_ERR "unionfs: error accessing "
8775     + "lower directory \"%s\" (error %d)\n",
8776     + new_branch, err);
8777     + goto out;
8778     + }
8779     + /*
8780     + * It's probably safe to check_mode the new branch to insert. Note:
8781     + * we don't allow inserting branches which are themselves unionfs
8782     + * mounts (check_branch returns EINVAL in that case). This is
8783     + * because this code base doesn't support stacking unionfs: the ODF
8784     + * code base supports that correctly.
8785     + */
8786     + err = check_branch(&nd);
8787     + if (err) {
8788     + printk(KERN_ERR "unionfs: lower directory "
8789     + "\"%s\" is not a valid branch\n", new_branch);
8790     + path_put(&nd.path);
8791     + goto out;
8792     + }
8793     +
8794     + /*
8795     + * Now we have to insert the new branch. But first, move the bits
8796     + * to make space for the new branch, if needed. Finally, adjust
8797     + * cur_branches.
8798     + * We don't release nd here; it's kept until umount/remount.
8799     + */
8800     + if (idx < cur_branches) {
8801     + /* if idx==cur_branches, we append: easy */
8802     + memmove(&new_data[idx+1], &new_data[idx],
8803     + (cur_branches - idx) * sizeof(struct unionfs_data));
8804     + memmove(&new_lower_paths[idx+1], &new_lower_paths[idx],
8805     + (cur_branches - idx) * sizeof(struct path));
8806     + }
8807     + new_lower_paths[idx].dentry = nd.path.dentry;
8808     + new_lower_paths[idx].mnt = nd.path.mnt;
8809     +
8810     + new_data[idx].sb = nd.path.dentry->d_sb;
8811     + atomic_set(&new_data[idx].open_files, 0);
8812     + new_data[idx].branchperms = perms;
8813     + new_data[idx].branch_id = ++*high_branch_id; /* assign new branch ID */
8814     +
8815     + err = 0;
8816     +out:
8817     + return err;
8818     +}
8819     +
8820     +
8821     +/*
8822     + * Support branch management options on remount.
8823     + *
8824     + * See Documentation/filesystems/unionfs/ for details.
8825     + *
8826     + * @flags: numeric mount options
8827     + * @options: mount options string
8828     + *
8829     + * This function can rearrange a mounted union dynamically, adding and
8830     + * removing branches, including changing branch modes. Clearly this has to
8831     + * be done safely and atomically. Luckily, the VFS already calls this
8832     + * function with lock_super(sb) and lock_kernel() held, preventing
8833     + * concurrent mixing of new mounts, remounts, and unmounts. Moreover,
8834     + * do_remount_sb(), our caller function, already called shrink_dcache_sb(sb)
8835     + * to purge dentries/inodes from our superblock, and also called
8836     + * fsync_super(sb) to purge any dirty pages. So we're good.
8837     + *
8838     + * XXX: however, our remount code may also need to invalidate mapped pages
8839     + * so as to force them to be re-gotten from the (newly reconfigured) lower
8840     + * branches. This has to wait for proper mmap and cache coherency support
8841     + * in the VFS.
8842     + *
8843     + */
8844     +static int unionfs_remount_fs(struct super_block *sb, int *flags,
8845     + char *options)
8846     +{
8847     + int err = 0;
8848     + int i;
8849     + char *optionstmp, *tmp_to_free; /* kstrdup'ed copy of "options" */
8850     + char *optname;
8851     + int cur_branches = 0; /* no. of current branches */
8852     + int new_branches = 0; /* no. of branches actually left in the end */
8853     + int add_branches; /* est. no. of branches to add */
8854     + int del_branches; /* est. no. of branches to del */
8855     + int max_branches; /* max possible no. of branches */
8856     + struct unionfs_data *new_data = NULL, *tmp_data = NULL;
8857     + struct path *new_lower_paths = NULL, *tmp_lower_paths = NULL;
8858     + struct inode **new_lower_inodes = NULL;
8859     + int new_high_branch_id; /* new high branch ID */
8860     + int size; /* memory allocation size, temp var */
8861     + int old_ibstart, old_ibend;
8862     +
8863     + unionfs_write_lock(sb);
8864     +
8865     + /*
8866     + * The VFS will take care of "ro" and "rw" flags, and we can safely
8867     + * ignore MS_SILENT, but anything else left over is an error. So we
8868     + * need to check if any other flags may have been passed (none are
8869     + * allowed/supported as of now).
8870     + */
8871     + if ((*flags & ~(MS_RDONLY | MS_SILENT)) != 0) {
8872     + printk(KERN_ERR
8873     + "unionfs: remount flags 0x%x unsupported\n", *flags);
8874     + err = -EINVAL;
8875     + goto out_error;
8876     + }
8877     +
8878     + /*
8879     + * If 'options' is NULL, it's probably because the user just changed
8880     + * the union to a "ro" or "rw" and the VFS took care of it. So
8881     + * nothing to do and we're done.
8882     + */
8883     + if (!options || options[0] == '\0')
8884     + goto out_error;
8885     +
8886     + /*
8887     + * Find out how many branches we will have in the end, counting
8888     + * "add" and "del" commands. Copy the "options" string because
8889     + * strsep modifies the string and we need it later.
8890     + */
8891     + tmp_to_free = kstrdup(options, GFP_KERNEL);
8892     + optionstmp = tmp_to_free;
8893     + if (unlikely(!optionstmp)) {
8894     + err = -ENOMEM;
8895     + goto out_free;
8896     + }
8897     + cur_branches = sbmax(sb); /* current no. branches */
8898     + new_branches = sbmax(sb);
8899     + del_branches = 0;
8900     + add_branches = 0;
8901     + new_high_branch_id = sbhbid(sb); /* save current high_branch_id */
8902     + while ((optname = strsep(&optionstmp, ",")) != NULL) {
8903     + char *optarg;
8904     +
8905     + if (!optname || !*optname)
8906     + continue;
8907     +
8908     + optarg = strchr(optname, '=');
8909     + if (optarg)
8910     + *optarg++ = '\0';
8911     +
8912     + if (!strcmp("add", optname))
8913     + add_branches++;
8914     + else if (!strcmp("del", optname))
8915     + del_branches++;
8916     + }
8917     + kfree(tmp_to_free);
8918     + /* after all changes, will we have at least one branch left? */
8919     + if ((new_branches + add_branches - del_branches) < 1) {
8920     + printk(KERN_ERR
8921     + "unionfs: no branches left after remount\n");
8922     + err = -EINVAL;
8923     + goto out_free;
8924     + }
8925     +
8926     + /*
8927     + * Since we haven't actually parsed all the add/del options, nor
8928     + * have we checked them for errors, we don't know for sure how many
8929     + * branches we will have after all changes have taken place. In
8930     + * fact, the total number of branches left could be less than what
8931     + * we have now. So we need to allocate space for a temporary
8932     + * placeholder that is at least as large as the maximum number of
8933     + * branches we *could* have, which is the current number plus all
8934     + * the additions. Once we're done with these temp placeholders, we
8935     + * may have to re-allocate the final size, copy over from the temp,
8936     + * and then free the temps (done near the end of this function).
8937     + */
8938     + max_branches = cur_branches + add_branches;
8939     + /* allocate space for new per-branch data */
8940     + tmp_data = kcalloc(max_branches,
8941     + sizeof(struct unionfs_data), GFP_KERNEL);
8942     + if (unlikely(!tmp_data)) {
8943     + err = -ENOMEM;
8944     + goto out_free;
8945     + }
8946     + /* allocate space for new pointers to lower paths */
8947     + tmp_lower_paths = kcalloc(max_branches,
8948     + sizeof(struct path), GFP_KERNEL);
8949     + if (unlikely(!tmp_lower_paths)) {
8950     + err = -ENOMEM;
8951     + goto out_free;
8952     + }
8953     + /* copy current info into new placeholders, incrementing refcnts */
8954     + memcpy(tmp_data, UNIONFS_SB(sb)->data,
8955     + cur_branches * sizeof(struct unionfs_data));
8956     + memcpy(tmp_lower_paths, UNIONFS_D(sb->s_root)->lower_paths,
8957     + cur_branches * sizeof(struct path));
8958     + for (i = 0; i < cur_branches; i++)
8959     + path_get(&tmp_lower_paths[i]); /* drop refs at end of fxn */
8960     +
8961     + /*******************************************************************
8962     + * For each branch command, do path_lookup on the requested branch,
8963     + * and apply the change to a temp branch list. To handle errors, we
8964     + * already dup'ed the old arrays (above), and increased the refcnts
8965     + * on various f/s objects. So now we can do all the path_lookups
8966     + * and branch-management commands on the new arrays. If it fails
8967     + * midway, we free the tmp arrays and *put all objects. If we succeed,
8968     + * then we free old arrays and *put its objects, and then replace
8969     + * the arrays with the new tmp list (we may have to re-allocate the
8970     + * memory because the temp lists could have been larger than what we
8971     + * actually needed).
8972     + *******************************************************************/
8973     +
8974     + while ((optname = strsep(&options, ",")) != NULL) {
8975     + char *optarg;
8976     +
8977     + if (!optname || !*optname)
8978     + continue;
8979     + /*
8980     + * At this stage optname holds a comma-delimited option, but
8981     + * without the commas. Next, we need to break the string on
8982     + * the '=' symbol to separate CMD=ARG, where ARG itself can
8983     + * be KEY=VAL. For example, in mode=/foo=rw, CMD is "mode",
8984     + * KEY is "/foo", and VAL is "rw".
8985     + */
8986     + optarg = strchr(optname, '=');
8987     + if (optarg)
8988     + *optarg++ = '\0';
8989     + /* incgen remount option (instead of old ioctl) */
8990     + if (!strcmp("incgen", optname)) {
8991     + err = 0;
8992     + goto out_no_change;
8993     + }
8994     +
8995     + /*
8996     + * All of our options take an argument now. (Insert ones
8997     + * that don't above this check.) So at this stage optname
8998     + * contains the CMD part and optarg contains the ARG part.
8999     + */
9000     + if (!optarg || !*optarg) {
9001     + printk(KERN_ERR "unionfs: all remount options require "
9002     + "an argument (%s)\n", optname);
9003     + err = -EINVAL;
9004     + goto out_release;
9005     + }
9006     +
9007     + if (!strcmp("add", optname)) {
9008     + err = do_remount_add_option(optarg, new_branches,
9009     + tmp_data,
9010     + tmp_lower_paths,
9011     + &new_high_branch_id);
9012     + if (err)
9013     + goto out_release;
9014     + new_branches++;
9015     + if (new_branches > UNIONFS_MAX_BRANCHES) {
9016     + printk(KERN_ERR "unionfs: command exceeds "
9017     + "%d branches\n", UNIONFS_MAX_BRANCHES);
9018     + err = -E2BIG;
9019     + goto out_release;
9020     + }
9021     + continue;
9022     + }
9023     + if (!strcmp("del", optname)) {
9024     + err = do_remount_del_option(optarg, new_branches,
9025     + tmp_data,
9026     + tmp_lower_paths);
9027     + if (err)
9028     + goto out_release;
9029     + new_branches--;
9030     + continue;
9031     + }
9032     + if (!strcmp("mode", optname)) {
9033     + err = do_remount_mode_option(optarg, new_branches,
9034     + tmp_data,
9035     + tmp_lower_paths);
9036     + if (err)
9037     + goto out_release;
9038     + continue;
9039     + }
9040     +
9041     + /*
9042     + * When you use "mount -o remount,ro", mount(8) will
9043     + * reportedly pass the original dirs= string from
9044     + * /proc/mounts. So for now, we have to ignore dirs= and
9045     + * not consider it an error, unless we want to allow users
9046     + * to pass dirs= in remount. Note that to allow the VFS to
9047     + * actually process the ro/rw remount options, we have to
9048     + * return 0 from this function.
9049     + */
9050     + if (!strcmp("dirs", optname)) {
9051     + printk(KERN_WARNING
9052     + "unionfs: remount ignoring option \"%s\"\n",
9053     + optname);
9054     + continue;
9055     + }
9056     +
9057     + err = -EINVAL;
9058     + printk(KERN_ERR
9059     + "unionfs: unrecognized option \"%s\"\n", optname);
9060     + goto out_release;
9061     + }
9062     +
9063     +out_no_change:
9064     +
9065     + /******************************************************************
9066     + * WE'RE ALMOST DONE: check if leftmost branch might be read-only,
9067     + * see if we need to allocate a small-sized new vector, copy the
9068     + * vectors to their correct place, release the refcnt of the older
9069     + * ones, and return. Also handle invalidating any pages that will
9070     + * have to be re-read.
9071     + *******************************************************************/
9072     +
9073     + if (!(tmp_data[0].branchperms & MAY_WRITE)) {
9074     + printk(KERN_ERR "unionfs: leftmost branch cannot be read-only "
9075     + "(use \"remount,ro\" to create a read-only union)\n");
9076     + err = -EINVAL;
9077     + goto out_release;
9078     + }
9079     +
9080     + /* (re)allocate space for the new per-branch data */
9081     + size = new_branches * sizeof(struct unionfs_data);
9082     + new_data = krealloc(tmp_data, size, GFP_KERNEL);
9083     + if (unlikely(!new_data)) {
9084     + err = -ENOMEM;
9085     + goto out_release;
9086     + }
9087     +
9088     + /* allocate space for new pointers to lower paths */
9089     + size = new_branches * sizeof(struct path);
9090     + new_lower_paths = krealloc(tmp_lower_paths, size, GFP_KERNEL);
9091     + if (unlikely(!new_lower_paths)) {
9092     + err = -ENOMEM;
9093     + goto out_release;
9094     + }
9095     +
9096     + /* allocate space for new pointers to lower inodes */
9097     + new_lower_inodes = kcalloc(new_branches,
9098     + sizeof(struct inode *), GFP_KERNEL);
9099     + if (unlikely(!new_lower_inodes)) {
9100     + err = -ENOMEM;
9101     + goto out_release;
9102     + }
9103     +
9104     + /*
9105     + * OK, just before we actually put the new set of branches in place,
9106     + * we need to ensure that our own f/s has no dirty objects left.
9107     + * Luckily, do_remount_sb() already calls shrink_dcache_sb(sb) and
9108     + * fsync_super(sb), taking care of dentries, inodes, and dirty
9109     + * pages. So all that's left is for us to invalidate any leftover
9110     + * (non-dirty) pages to ensure that they will be re-read from the
9111     + * new lower branches (and to support mmap).
9112     + */
9113     +
9114     + /*
9115     + * Once we finish the remounting successfully, our superblock
9116     + * generation number will have increased. This will be detected by
9117     + * our dentry-revalidation code upon subsequent f/s operations
9118     + * through unionfs. The revalidation code will rebuild the union of
9119     + * lower inodes for a given unionfs inode and invalidate any pages
9120     + * of such "stale" inodes (by calling our purge_inode_data
9121     + * function). This revalidation will happen lazily and
9122     + * incrementally, as users perform operations on cached inodes. We
9123     + * would like to encourage this revalidation to happen sooner if
9124     + * possible, so we would like to invalidate as many other pages in
9125     + * our superblock as we can. We used to call drop_pagecache_sb() or
9126     + * a variant thereof, but either method was racy (drop_caches alone
9127     + * is known to be racy). So now we let the revalidation happen on a
9128     + * per file basis in ->d_revalidate.
9129     + */
9130     +
9131     + /* grab new lower super references; release old ones */
9132     + for (i = 0; i < new_branches; i++)
9133     + atomic_inc(&new_data[i].sb->s_active);
9134     + for (i = 0; i < sbmax(sb); i++)
9135     + atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
9136     +
9137     + /* copy new vectors into their correct place */
9138     + tmp_data = UNIONFS_SB(sb)->data;
9139     + UNIONFS_SB(sb)->data = new_data;
9140     + new_data = NULL; /* so don't free good pointers below */
9141     + tmp_lower_paths = UNIONFS_D(sb->s_root)->lower_paths;
9142     + UNIONFS_D(sb->s_root)->lower_paths = new_lower_paths;
9143     + new_lower_paths = NULL; /* so don't free good pointers below */
9144     +
9145     + /* update our unionfs_sb_info and root dentry index of last branch */
9146     + i = sbmax(sb); /* save no. of branches to release at end */
9147     + sbend(sb) = new_branches - 1;
9148     + dbend(sb->s_root) = new_branches - 1;
9149     + old_ibstart = ibstart(sb->s_root->d_inode);
9150     + old_ibend = ibend(sb->s_root->d_inode);
9151     + ibend(sb->s_root->d_inode) = new_branches - 1;
9152     + UNIONFS_D(sb->s_root)->bcount = new_branches;
9153     + new_branches = i; /* no. of branches to release below */
9154     +
9155     + /*
9156     + * Update lower inodes: 3 steps
9157     + * 1. grab ref on all new lower inodes
9158     + */
9159     + for (i = dbstart(sb->s_root); i <= dbend(sb->s_root); i++) {
9160     + struct dentry *lower_dentry =
9161     + unionfs_lower_dentry_idx(sb->s_root, i);
9162     + igrab(lower_dentry->d_inode);
9163     + new_lower_inodes[i] = lower_dentry->d_inode;
9164     + }
9165     + /* 2. release reference on all older lower inodes */
9166     + iput_lowers(sb->s_root->d_inode, old_ibstart, old_ibend, true);
9167     + /* 3. update root dentry's inode to new lower_inodes array */
9168     + UNIONFS_I(sb->s_root->d_inode)->lower_inodes = new_lower_inodes;
9169     + new_lower_inodes = NULL;
9170     +
9171     + /* maxbytes may have changed */
9172     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
9173     + /* update high branch ID */
9174     + sbhbid(sb) = new_high_branch_id;
9175     +
9176     + /* update our sb->generation for revalidating objects */
9177     + i = atomic_inc_return(&UNIONFS_SB(sb)->generation);
9178     + atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
9179     + atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
9180     + if (!(*flags & MS_SILENT))
9181     + pr_info("unionfs: %s: new generation number %d\n",
9182     + UNIONFS_SB(sb)->dev_name, i);
9183     + /* finally, update the root dentry's times */
9184     + unionfs_copy_attr_times(sb->s_root->d_inode);
9185     + err = 0; /* reset to success */
9186     +
9187     + /*
9188     + * The code above falls through to the next label, and releases the
9189     + * refcnts of the older ones (stored in tmp_*): if we fell through
9190     + * here, it means success. However, if we jump directly to this
9191     + * label from any error above, then an error occurred after we
9192     + * grabbed various refcnts, and so we have to release the
9193     + * temporarily constructed structures.
9194     + */
9195     +out_release:
9196     + /* no need to cleanup/release anything in tmp_data */
9197     + if (tmp_lower_paths)
9198     + for (i = 0; i < new_branches; i++)
9199     + path_put(&tmp_lower_paths[i]);
9200     +out_free:
9201     + kfree(tmp_lower_paths);
9202     + kfree(tmp_data);
9203     + kfree(new_lower_paths);
9204     + kfree(new_data);
9205     + kfree(new_lower_inodes);
9206     +out_error:
9207     + unionfs_check_dentry(sb->s_root);
9208     + unionfs_write_unlock(sb);
9209     + return err;
9210     +}
9211     +
9212     +/*
9213     + * Called by iput() when the inode reference count reached zero
9214     + * and the inode is not hashed anywhere. Used to clear anything
9215     + * that needs to be, before the inode is completely destroyed and put
9216     + * on the inode free list.
9217     + *
9218     + * No need to lock sb info's rwsem.
9219     + */
9220     +static void unionfs_evict_inode(struct inode *inode)
9221     +{
9222     + int bindex, bstart, bend;
9223     + struct inode *lower_inode;
9224     + struct list_head *pos, *n;
9225     + struct unionfs_dir_state *rdstate;
9226     +
9227     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9228     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9229     + list_del(&rdstate->cache);
9230     + free_rdstate(rdstate);
9231     + }
9232     +
9233     + /*
9234     + * Decrement a reference to a lower_inode, which was incremented
9235     + * by our read_inode when it was created initially.
9236     + */
9237     + bstart = ibstart(inode);
9238     + bend = ibend(inode);
9239     + if (bstart >= 0) {
9240     + for (bindex = bstart; bindex <= bend; bindex++) {
9241     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
9242     + if (!lower_inode)
9243     + continue;
9244     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
9245     + /* see Documentation/filesystems/unionfs/issues.txt */
9246     + lockdep_off();
9247     + iput(lower_inode);
9248     + lockdep_on();
9249     + }
9250     + }
9251     +
9252     + kfree(UNIONFS_I(inode)->lower_inodes);
9253     + UNIONFS_I(inode)->lower_inodes = NULL;
9254     +}
9255     +
9256     +static struct inode *unionfs_alloc_inode(struct super_block *sb)
9257     +{
9258     + struct unionfs_inode_info *i;
9259     +
9260     + i = kmem_cache_alloc(unionfs_inode_cachep, GFP_KERNEL);
9261     + if (unlikely(!i))
9262     + return NULL;
9263     +
9264     + /* memset everything up to the inode to 0 */
9265     + memset(i, 0, offsetof(struct unionfs_inode_info, vfs_inode));
9266     +
9267     + i->vfs_inode.i_version = 1;
9268     + return &i->vfs_inode;
9269     +}
9270     +
9271     +static void unionfs_destroy_inode(struct inode *inode)
9272     +{
9273     + kmem_cache_free(unionfs_inode_cachep, UNIONFS_I(inode));
9274     +}
9275     +
9276     +/* unionfs inode cache constructor */
9277     +static void init_once(void *obj)
9278     +{
9279     + struct unionfs_inode_info *i = obj;
9280     +
9281     + inode_init_once(&i->vfs_inode);
9282     +}
9283     +
9284     +int unionfs_init_inode_cache(void)
9285     +{
9286     + int err = 0;
9287     +
9288     + unionfs_inode_cachep =
9289     + kmem_cache_create("unionfs_inode_cache",
9290     + sizeof(struct unionfs_inode_info), 0,
9291     + SLAB_RECLAIM_ACCOUNT, init_once);
9292     + if (unlikely(!unionfs_inode_cachep))
9293     + err = -ENOMEM;
9294     + return err;
9295     +}
9296     +
9297     +/* unionfs inode cache destructor */
9298     +void unionfs_destroy_inode_cache(void)
9299     +{
9300     + if (unionfs_inode_cachep)
9301     + kmem_cache_destroy(unionfs_inode_cachep);
9302     +}
9303     +
9304     +/*
9305     + * Called when we have a dirty inode; here we only throw out
9306     + * parts of our readdir list that are too old.
9307     + *
9308     + * No need to grab sb info's rwsem.
9309     + */
9310     +static int unionfs_write_inode(struct inode *inode,
9311     + struct writeback_control *wbc)
9312     +{
9313     + struct list_head *pos, *n;
9314     + struct unionfs_dir_state *rdstate;
9315     +
9316     + spin_lock(&UNIONFS_I(inode)->rdlock);
9317     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9318     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9319     + /* We keep this list in LRU order. */
9320     + if ((rdstate->access + RDCACHE_JIFFIES) > jiffies)
9321     + break;
9322     + UNIONFS_I(inode)->rdcount--;
9323     + list_del(&rdstate->cache);
9324     + free_rdstate(rdstate);
9325     + }
9326     + spin_unlock(&UNIONFS_I(inode)->rdlock);
9327     +
9328     + return 0;
9329     +}
9330     +
9331     +/*
9332     + * Used only in nfs, to kill any pending RPC tasks, so that subsequent
9333     + * code can actually succeed and won't leave tasks that need handling.
9334     + */
9335     +static void unionfs_umount_begin(struct super_block *sb)
9336     +{
9337     + struct super_block *lower_sb;
9338     + int bindex, bstart, bend;
9339     +
9340     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9341     +
9342     + bstart = sbstart(sb);
9343     + bend = sbend(sb);
9344     + for (bindex = bstart; bindex <= bend; bindex++) {
9345     + lower_sb = unionfs_lower_super_idx(sb, bindex);
9346     +
9347     + if (lower_sb && lower_sb->s_op &&
9348     + lower_sb->s_op->umount_begin)
9349     + lower_sb->s_op->umount_begin(lower_sb);
9350     + }
9351     +
9352     + unionfs_read_unlock(sb);
9353     +}
9354     +
9355     +static int unionfs_show_options(struct seq_file *m, struct vfsmount *mnt)
9356     +{
9357     + struct super_block *sb = mnt->mnt_sb;
9358     + int ret = 0;
9359     + char *tmp_page;
9360     + char *path;
9361     + int bindex, bstart, bend;
9362     + int perms;
9363     +
9364     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9365     +
9366     + unionfs_lock_dentry(sb->s_root, UNIONFS_DMUTEX_CHILD);
9367     +
9368     + tmp_page = (char *) __get_free_page(GFP_KERNEL);
9369     + if (unlikely(!tmp_page)) {
9370     + ret = -ENOMEM;
9371     + goto out;
9372     + }
9373     +
9374     + bstart = sbstart(sb);
9375     + bend = sbend(sb);
9376     +
9377     + seq_printf(m, ",dirs=");
9378     + for (bindex = bstart; bindex <= bend; bindex++) {
9379     + struct path p;
9380     + p.dentry = unionfs_lower_dentry_idx(sb->s_root, bindex);
9381     + p.mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
9382     + path = d_path(&p, tmp_page, PAGE_SIZE);
9383     + if (IS_ERR(path)) {
9384     + ret = PTR_ERR(path);
9385     + goto out;
9386     + }
9387     +
9388     + perms = branchperms(sb, bindex);
9389     +
9390     + seq_printf(m, "%s=%s", path,
9391     + perms & MAY_WRITE ? "rw" : "ro");
9392     + if (bindex != bend)
9393     + seq_printf(m, ":");
9394     + }
9395     +
9396     +out:
9397     + free_page((unsigned long) tmp_page);
9398     +
9399     + unionfs_unlock_dentry(sb->s_root);
9400     +
9401     + unionfs_read_unlock(sb);
9402     +
9403     + return ret;
9404     +}
9405     +
9406     +struct super_operations unionfs_sops = {
9407     + .put_super = unionfs_put_super,
9408     + .statfs = unionfs_statfs,
9409     + .remount_fs = unionfs_remount_fs,
9410     + .evict_inode = unionfs_evict_inode,
9411     + .umount_begin = unionfs_umount_begin,
9412     + .show_options = unionfs_show_options,
9413     + .write_inode = unionfs_write_inode,
9414     + .alloc_inode = unionfs_alloc_inode,
9415     + .destroy_inode = unionfs_destroy_inode,
9416     +};
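Note on unionfs_alloc_inode() above: the VFS inode is embedded as the last member of struct unionfs_inode_info, so the allocator zeroes only the private fields in front of it (the slab constructor init_once() has already set up vfs_inode) and returns a pointer to the embedded member. The userspace sketch below illustrates that layout trick under stated assumptions: the demo_* names and the reduced struct fields are hypothetical stand-ins, malloc() replaces the slab cache, and demo_info() plays the role that UNIONFS_I() presumably fills in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct inode. */
struct demo_inode {
	unsigned long i_version;
};

/* Hypothetical stand-in for struct unionfs_inode_info:
 * private fields first, embedded VFS inode last. */
struct demo_inode_info {
	int bstart;
	int bend;
	int rdcount;
	struct demo_inode vfs_inode;
};

/* Userspace equivalent of container_of(): step back from the
 * embedded member to the start of the enclosing structure. */
static struct demo_inode_info *demo_info(struct demo_inode *inode)
{
	return (struct demo_inode_info *)
		((char *)inode - offsetof(struct demo_inode_info, vfs_inode));
}

/* Mirrors the shape of unionfs_alloc_inode(): zero only the private
 * prefix, set i_version, and hand out the embedded inode. */
static struct demo_inode *demo_alloc_inode(void)
{
	struct demo_inode_info *i = malloc(sizeof(*i));

	if (!i)
		return NULL;
	memset(i, 0, offsetof(struct demo_inode_info, vfs_inode));
	i->vfs_inode.i_version = 1;
	return &i->vfs_inode;
}

/* Mirrors unionfs_destroy_inode(): free the wrapper, not the member. */
static void demo_destroy_inode(struct demo_inode *inode)
{
	free(demo_info(inode));
}
```

Calling demo_alloc_inode() yields an inode whose wrapper has bstart, bend and rdcount all zero, and demo_destroy_inode() releases the whole wrapper through the embedded pointer.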
9417     diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
9418     new file mode 100644
9419     index 0000000..d49c834
9420     --- /dev/null
9421     +++ b/fs/unionfs/union.h
9422     @@ -0,0 +1,669 @@
9423     +/*
9424     + * Copyright (c) 2003-2010 Erez Zadok
9425     + * Copyright (c) 2003-2006 Charles P. Wright
9426     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
9427     + * Copyright (c) 2005 Arun M. Krishnakumar
9428     + * Copyright (c) 2004-2006 David P. Quigley
9429     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
9430     + * Copyright (c) 2003 Puja Gupta
9431     + * Copyright (c) 2003 Harikesavan Krishnan
9432     + * Copyright (c) 2003-2010 Stony Brook University
9433     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
9434     + *
9435     + * This program is free software; you can redistribute it and/or modify
9436     + * it under the terms of the GNU General Public License version 2 as
9437     + * published by the Free Software Foundation.
9438     + */
9439     +
9440     +#ifndef _UNION_H_
9441     +#define _UNION_H_
9442     +
9443     +#include <linux/dcache.h>
9444     +#include <linux/file.h>
9445     +#include <linux/list.h>
9446     +#include <linux/fs.h>
9447     +#include <linux/mm.h>
9448     +#include <linux/module.h>
9449     +#include <linux/mount.h>
9450     +#include <linux/namei.h>
9451     +#include <linux/page-flags.h>
9452     +#include <linux/pagemap.h>
9453     +#include <linux/poll.h>
9454     +#include <linux/security.h>
9455     +#include <linux/seq_file.h>
9456     +#include <linux/slab.h>
9457     +#include <linux/spinlock.h>
9458     +#include <linux/smp_lock.h>
9459     +#include <linux/statfs.h>
9460     +#include <linux/string.h>
9461     +#include <linux/vmalloc.h>
9462     +#include <linux/writeback.h>
9463     +#include <linux/buffer_head.h>
9464     +#include <linux/xattr.h>
9465     +#include <linux/fs_stack.h>
9466     +#include <linux/magic.h>
9467     +#include <linux/log2.h>
9468     +#include <linux/poison.h>
9469     +#include <linux/mman.h>
9470     +#include <linux/backing-dev.h>
9471     +#include <linux/splice.h>
9472     +
9473     +#include <asm/system.h>
9474     +
9475     +#include <linux/union_fs.h>
9476     +
9477     +/* the file system name */
9478     +#define UNIONFS_NAME "unionfs"
9479     +
9480     +/* unionfs root inode number */
9481     +#define UNIONFS_ROOT_INO 1
9482     +
9483     +/* number of times we try to get a unique temporary file name */
9484     +#define GET_TMPNAM_MAX_RETRY 5
9485     +
9486     +/* maximum number of branches we support, to avoid memory blowup */
9487     +#define UNIONFS_MAX_BRANCHES 128
9488     +
9489     +/* minimum time (seconds) required for time-based cache-coherency */
9490     +#define UNIONFS_MIN_CC_TIME 3
9491     +
9492     +/* Operations vectors defined in specific files. */
9493     +extern struct file_operations unionfs_main_fops;
9494     +extern struct file_operations unionfs_dir_fops;
9495     +extern struct inode_operations unionfs_main_iops;
9496     +extern struct inode_operations unionfs_dir_iops;
9497     +extern struct inode_operations unionfs_symlink_iops;
9498     +extern struct super_operations unionfs_sops;
9499     +extern struct dentry_operations unionfs_dops;
9500     +extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
9501     +extern struct vm_operations_struct unionfs_vm_ops;
9502     +
9503     +/* How long should an entry be allowed to persist */
9504     +#define RDCACHE_JIFFIES (5*HZ)
9505     +
9506     +/* compatibility with Real-Time patches */
9507     +#ifdef CONFIG_PREEMPT_RT
9508     +# define unionfs_rw_semaphore compat_rw_semaphore
9509     +#else /* not CONFIG_PREEMPT_RT */
9510     +# define unionfs_rw_semaphore rw_semaphore
9511     +#endif /* not CONFIG_PREEMPT_RT */
9512     +
9513     +/* file private data. */
9514     +struct unionfs_file_info {
9515     + int bstart;
9516     + int bend;
9517     + atomic_t generation;
9518     +
9519     + struct unionfs_dir_state *rdstate;
9520     + struct file **lower_files;
9521     + int *saved_branch_ids; /* IDs of branches when file was opened */
9522     + const struct vm_operations_struct *lower_vm_ops;
9523     + bool wrote_to_file; /* for delayed copyup */
9524     +};
9525     +
9526     +/* unionfs inode data in memory */
9527     +struct unionfs_inode_info {
9528     + int bstart;
9529     + int bend;
9530     + atomic_t generation;
9531     + /* Stuff for readdir over NFS. */
9532     + spinlock_t rdlock;
9533     + struct list_head readdircache;
9534     + int rdcount;
9535     + int hashsize;
9536     + int cookie;
9537     +
9538     + /* The lower inodes */
9539     + struct inode **lower_inodes;
9540     +
9541     + struct inode vfs_inode;
9542     +};
9543     +
9544     +/* unionfs dentry data in memory */
9545     +struct unionfs_dentry_info {
9546     + /*
9547     + * This mutex is used to lock the dentry as soon as we get into a
9548     + * unionfs function from the VFS. Our lock ordering is that children
9549     + * go before their parents.
9550     + */
9551     + struct mutex lock;
9552     + int bstart;
9553     + int bend;
9554     + int bopaque;
9555     + int bcount;
9556     + atomic_t generation;
9557     + struct path *lower_paths;
9558     +};
9559     +
9560     +/* These are the pointers to our various objects. */
9561     +struct unionfs_data {
9562     + struct super_block *sb; /* lower super_block */
9563     + atomic_t open_files; /* number of open files on branch */
9564     + int branchperms;
9565     + int branch_id; /* unique branch ID at re/mount time */
9566     +};
9567     +
9568     +/* unionfs super-block data in memory */
9569     +struct unionfs_sb_info {
9570     + int bend;
9571     +
9572     + atomic_t generation;
9573     +
9574     + /*
9575     + * This rwsem is used to make sure that a branch management
9576     + * operation...
9577     + * 1) will not begin before all currently in-flight operations
9578     + * complete.
9579     + * 2) any new operations do not execute until the currently
9580     + * running branch management operation completes.
9581     + *
9582     + * The write_lock_owner records the PID of the task which grabbed
9583     + * the rw_sem for writing. If the same task also tries to grab the
9584     + * read lock, we allow it. This prevents a self-deadlock when
9585     + * branch-management is used on a pivot_root'ed union, because we
9586     + * have to ->lookup paths which belong to the same union.
9587     + */
9588     + struct unionfs_rw_semaphore rwsem;
9589     + pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
9590     + int high_branch_id; /* last unique branch ID given */
9591     + char *dev_name; /* to identify different unions in pr_debug */
9592     + struct unionfs_data *data;
9593     +};
9594     +
9595     +/*
9596     + * Structure for building the linked list of entries returned by readdir on
9597     + * the left branch, to compare with entries on the right branch.
9598     + */
9599     +struct filldir_node {
9600     + struct list_head file_list; /* list for directory entries */
9601     + char *name; /* name entry */
9602     + int hash; /* name hash */
9603     + int namelen; /* name len since name is not 0 terminated */
9604     +
9605     + /*
9606     + * we can check for duplicate whiteouts and files in the same branch
9607     + * in order to return -EIO.
9608     + */
9609     + int bindex;
9610     +
9611     + /* is this a whiteout entry? */
9612     + int whiteout;
9613     +
9614     + /* Inline name, so we don't need to separately kmalloc small ones */
9615     + char iname[DNAME_INLINE_LEN_MIN];
9616     +};
9617     +
9618     +/* Directory hash table. */
9619     +struct unionfs_dir_state {
9620     + unsigned int cookie; /* the cookie, based off of rdversion */
9621     + unsigned int offset; /* The entry we have returned. */
9622     + int bindex;
9623     + loff_t dirpos; /* offset within the lower level directory */
9624     + int size; /* How big is the hash table? */
9625     + int hashentries; /* How many entries have been inserted? */
9626     + unsigned long access;
9627     +
9628     + /* This cache list is used when the inode keeps us around. */
9629     + struct list_head cache;
9630     + struct list_head list[0];
9631     +};
9632     +
9633     +/* externs needed for fanout.h or sioq.h */
9634     +extern int unionfs_get_nlinks(const struct inode *inode);
9635     +extern void unionfs_copy_attr_times(struct inode *upper);
9636     +extern void unionfs_copy_attr_all(struct inode *dest, const struct inode *src);
9637     +
9638     +/* include miscellaneous macros */
9639     +#include "fanout.h"
9640     +#include "sioq.h"
9641     +
9642     +/* externs for cache creation/deletion routines */
9643     +extern void unionfs_destroy_filldir_cache(void);
9644     +extern int unionfs_init_filldir_cache(void);
9645     +extern int unionfs_init_inode_cache(void);
9646     +extern void unionfs_destroy_inode_cache(void);
9647     +extern int unionfs_init_dentry_cache(void);
9648     +extern void unionfs_destroy_dentry_cache(void);
9649     +
9650     +/* Initialize and free readdir-specific state. */
9651     +extern int init_rdstate(struct file *file);
9652     +extern struct unionfs_dir_state *alloc_rdstate(struct inode *inode,
9653     + int bindex);
9654     +extern struct unionfs_dir_state *find_rdstate(struct inode *inode,
9655     + loff_t fpos);
9656     +extern void free_rdstate(struct unionfs_dir_state *state);
9657     +extern int add_filldir_node(struct unionfs_dir_state *rdstate,
9658     + const char *name, int namelen, int bindex,
9659     + int whiteout);
9660     +extern struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
9661     + const char *name, int namelen,
9662     + int is_whiteout);
9663     +
9664     +extern struct dentry **alloc_new_dentries(int objs);
9665     +extern struct unionfs_data *alloc_new_data(int objs);
9666     +
9667     +/* We can only use 32 bits of offset for rdstate --- blech! */
9668     +#define DIREOF (0xfffff)
9669     +#define RDOFFBITS 20 /* This is the number of bits in DIREOF. */
9670     +#define MAXRDCOOKIE (0xfff)
9671     +/* Turn an rdstate into an offset. */
9672     +static inline off_t rdstate2offset(struct unionfs_dir_state *buf)
9673     +{
9674     + off_t tmp;
9675     +
9676     + tmp = ((buf->cookie & MAXRDCOOKIE) << RDOFFBITS)
9677     + | (buf->offset & DIREOF);
9678     + return tmp;
9679     +}
9680     +
9681     +/* Macros for locking a super_block. */
9682     +enum unionfs_super_lock_class {
9683     + UNIONFS_SMUTEX_NORMAL,
9684     + UNIONFS_SMUTEX_PARENT, /* when locking on behalf of file */
9685     + UNIONFS_SMUTEX_CHILD, /* when locking on behalf of dentry */
9686     +};
9687     +static inline void unionfs_read_lock(struct super_block *sb, int subclass)
9688     +{
9689     + if (UNIONFS_SB(sb)->write_lock_owner &&
9690     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9691     + return;
9692     + down_read_nested(&UNIONFS_SB(sb)->rwsem, subclass);
9693     +}
9694     +static inline void unionfs_read_unlock(struct super_block *sb)
9695     +{
9696     + if (UNIONFS_SB(sb)->write_lock_owner &&
9697     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9698     + return;
9699     + up_read(&UNIONFS_SB(sb)->rwsem);
9700     +}
9701     +static inline void unionfs_write_lock(struct super_block *sb)
9702     +{
9703     + down_write(&UNIONFS_SB(sb)->rwsem);
9704     + UNIONFS_SB(sb)->write_lock_owner = current->pid;
9705     +}
9706     +static inline void unionfs_write_unlock(struct super_block *sb)
9707     +{
9708     + up_write(&UNIONFS_SB(sb)->rwsem);
9709     + UNIONFS_SB(sb)->write_lock_owner = 0;
9710     +}
9711     +
9712     +static inline void unionfs_double_lock_dentry(struct dentry *d1,
9713     + struct dentry *d2)
9714     +{
9715     + BUG_ON(d1 == d2);
9716     + if (d1 < d2) {
9717     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_PARENT);
9718     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_CHILD);
9719     + } else {
9720     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_PARENT);
9721     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_CHILD);
9722     + }
9723     +}
9724     +
9725     +static inline void unionfs_double_unlock_dentry(struct dentry *d1,
9726     + struct dentry *d2)
9727     +{
9728     + BUG_ON(d1 == d2);
9729     + if (d1 < d2) { /* unlocks in the same order as double_lock_dentry */
9730     + unionfs_unlock_dentry(d1);
9731     + unionfs_unlock_dentry(d2);
9732     + } else {
9733     + unionfs_unlock_dentry(d2);
9734     + unionfs_unlock_dentry(d1);
9735     + }
9736     +}
9737     +
9738     +static inline void unionfs_double_lock_parents(struct dentry *p1,
9739     + struct dentry *p2)
9740     +{
9741     + if (p1 == p2) {
9742     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9743     + return;
9744     + }
9745     + if (p1 < p2) {
9746     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9747     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_CHILD);
9748     + } else {
9749     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_PARENT);
9750     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_CHILD);
9751     + }
9752     +}
9753     +
9754     +static inline void unionfs_double_unlock_parents(struct dentry *p1,
9755     + struct dentry *p2)
9756     +{
9757     + if (p1 == p2) {
9758     + unionfs_unlock_dentry(p1);
9759     + return;
9760     + }
9761     + if (p1 < p2) { /* unlocks in the same order as double_lock_parents */
9762     + unionfs_unlock_dentry(p1);
9763     + unionfs_unlock_dentry(p2);
9764     + } else {
9765     + unionfs_unlock_dentry(p2);
9766     + unionfs_unlock_dentry(p1);
9767     + }
9768     +}
9769     +
9770     +extern int new_dentry_private_data(struct dentry *dentry, int subclass);
9771     +extern int realloc_dentry_private_data(struct dentry *dentry);
9772     +extern void free_dentry_private_data(struct dentry *dentry);
9773     +extern void update_bstart(struct dentry *dentry);
9774     +extern int init_lower_nd(struct nameidata *nd, unsigned int flags);
9775     +extern void release_lower_nd(struct nameidata *nd, int err);
9776     +
9777     +/*
9778     + * EXTERNALS:
9779     + */
9780     +
9781     +/* replicates the directory structure up to given dentry in given branch */
9782     +extern struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
9783     + const char *name, int bindex);
9784     +
9785     +/* partial lookup */
9786     +extern int unionfs_partial_lookup(struct dentry *dentry,
9787     + struct dentry *parent);
9788     +extern struct dentry *unionfs_lookup_full(struct dentry *dentry,
9789     + struct dentry *parent,
9790     + int lookupmode);
9791     +
9792     +/* copies a file from dbstart to newbindex branch */
9793     +extern int copyup_file(struct inode *dir, struct file *file, int bstart,
9794     + int newbindex, loff_t size);
9795     +extern int copyup_named_file(struct inode *dir, struct file *file,
9796     + char *name, int bstart, int new_bindex,
9797     + loff_t len);
9798     +/* copies a dentry from dbstart to newbindex branch */
9799     +extern int copyup_dentry(struct inode *dir, struct dentry *dentry,
9800     + int bstart, int new_bindex, const char *name,
9801     + int namelen, struct file **copyup_file, loff_t len);
9802     +/* helper functions for post-copyup actions */
9803     +extern void unionfs_postcopyup_setmnt(struct dentry *dentry);
9804     +extern void unionfs_postcopyup_release(struct dentry *dentry);
9805     +
9806     +/* Is this directory empty: 0 if it is empty, -ENOTEMPTY if not. */
9807     +extern int check_empty(struct dentry *dentry, struct dentry *parent,
9808     + struct unionfs_dir_state **namelist);
9809     +/* whiteout and opaque directory helpers */
9810     +extern char *alloc_whname(const char *name, int len);
9811     +extern bool is_whiteout_name(char **namep, int *namelenp);
9812     +extern bool is_validname(const char *name);
9813     +extern struct dentry *lookup_whiteout(const char *name,
9814     + struct dentry *lower_parent);
9815     +extern struct dentry *find_first_whiteout(struct dentry *dentry);
9816     +extern int unlink_whiteout(struct dentry *wh_dentry);
9817     +extern int check_unlink_whiteout(struct dentry *dentry,
9818     + struct dentry *lower_dentry, int bindex);
9819     +extern int create_whiteout(struct dentry *dentry, int start);
9820     +extern int delete_whiteouts(struct dentry *dentry, int bindex,
9821     + struct unionfs_dir_state *namelist);
9822     +extern int is_opaque_dir(struct dentry *dentry, int bindex);
9823     +extern int make_dir_opaque(struct dentry *dir, int bindex);
9824     +extern void unionfs_set_max_namelen(long *namelen);
9825     +
9826     +extern void unionfs_reinterpose(struct dentry *this_dentry);
9827     +extern struct super_block *unionfs_duplicate_super(struct super_block *sb);
9828     +
9829     +/* Locking functions. */
9830     +extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
9831     +extern int unionfs_getlk(struct file *file, struct file_lock *fl);
9832     +
9833     +/* Common file operations. */
9834     +extern int unionfs_file_revalidate(struct file *file, struct dentry *parent,
9835     + bool willwrite);
9836     +extern int unionfs_open(struct inode *inode, struct file *file);
9837     +extern int unionfs_file_release(struct inode *inode, struct file *file);
9838     +extern int unionfs_flush(struct file *file, fl_owner_t id);
9839     +extern long unionfs_ioctl(struct file *file, unsigned int cmd,
9840     + unsigned long arg);
9841     +extern int unionfs_fsync(struct file *file, int datasync);
9842     +extern int unionfs_fasync(int fd, struct file *file, int flag);
9843     +
9844     +/* Inode operations */
9845     +extern struct inode *unionfs_iget(struct super_block *sb, unsigned long ino);
9846     +extern int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
9847     + struct inode *new_dir, struct dentry *new_dentry);
9848     +extern int unionfs_unlink(struct inode *dir, struct dentry *dentry);
9849     +extern int unionfs_rmdir(struct inode *dir, struct dentry *dentry);
9850     +
9851     +extern bool __unionfs_d_revalidate(struct dentry *dentry,
9852     + struct dentry *parent, bool willwrite);
9853     +extern bool is_negative_lower(const struct dentry *dentry);
9854     +extern bool is_newer_lower(const struct dentry *dentry);
9855     +extern void purge_sb_data(struct super_block *sb);
9856     +
9857     +/* The values for unionfs_interpose's flag. */
9858     +#define INTERPOSE_DEFAULT 0
9859     +#define INTERPOSE_LOOKUP 1
9860     +#define INTERPOSE_REVAL 2
9861     +#define INTERPOSE_REVAL_NEG 3
9862     +#define INTERPOSE_PARTIAL 4
9863     +
9864     +extern struct dentry *unionfs_interpose(struct dentry *this_dentry,
9865     + struct super_block *sb, int flag);
9866     +
9867     +#ifdef CONFIG_UNION_FS_XATTR
9868     +/* Extended attribute functions. */
9869     +extern void *unionfs_xattr_alloc(size_t size, size_t limit);
9870     +static inline void unionfs_xattr_kfree(const void *p)
9871     +{
9872     + kfree(p);
9873     +}
9874     +extern ssize_t unionfs_getxattr(struct dentry *dentry, const char *name,
9875     + void *value, size_t size);
9876     +extern int unionfs_removexattr(struct dentry *dentry, const char *name);
9877     +extern ssize_t unionfs_listxattr(struct dentry *dentry, char *list,
9878     + size_t size);
9879     +extern int unionfs_setxattr(struct dentry *dentry, const char *name,
9880     + const void *value, size_t size, int flags);
9881     +#endif /* CONFIG_UNION_FS_XATTR */
9882     +
9883     +/* The root directory is unhashed, but isn't deleted. */
9884     +static inline int d_deleted(struct dentry *d)
9885     +{
9886     + return d_unhashed(d) && (d != d->d_sb->s_root);
9887     +}
9888     +
9889     +/* unionfs_permission, check if we should bypass error to facilitate copyup */
9890     +#define IS_COPYUP_ERR(err) ((err) == -EROFS)
9891     +
9892     +/* unionfs_open, check if we need to copyup the file */
9893     +#define OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
9894     +#define IS_WRITE_FLAG(flag) ((flag) & OPEN_WRITE_FLAGS)
9895     +
9896     +static inline int branchperms(const struct super_block *sb, int index)
9897     +{
9898     + BUG_ON(index < 0);
9899     + return UNIONFS_SB(sb)->data[index].branchperms;
9900     +}
9901     +
9902     +static inline int set_branchperms(struct super_block *sb, int index, int perms)
9903     +{
9904     + BUG_ON(index < 0);
9905     + UNIONFS_SB(sb)->data[index].branchperms = perms;
9906     + return perms;
9907     +}
9908     +
9909     +/* check if readonly lower inode, but possibly unlinked (no inode->i_sb) */
9910     +static inline int __is_rdonly(const struct inode *inode)
9911     +{
9912     + /* if unlinked, can't be readonly (?) */
9913     + if (!inode->i_sb)
9914     + return 0;
9915     + return IS_RDONLY(inode);
9916     +
9917     +}
9918     +/* Is this file on a read-only branch? */
9919     +static inline int is_robranch_super(const struct super_block *sb, int index)
9920     +{
9921     + int ret;
9922     +
9923     + ret = (!(branchperms(sb, index) & MAY_WRITE)) ? -EROFS : 0;
9924     + return ret;
9925     +}
9926     +
9927     +/* Is this file on a read-only branch? */
9928     +static inline int is_robranch_idx(const struct dentry *dentry, int index)
9929     +{
9930     + struct super_block *lower_sb;
9931     +
9932     + BUG_ON(index < 0);
9933     +
9934     + if (!(branchperms(dentry->d_sb, index) & MAY_WRITE))
9935     + return -EROFS;
9936     +
9937     + lower_sb = unionfs_lower_super_idx(dentry->d_sb, index);
9938     + BUG_ON(lower_sb == NULL);
9939     + /*
9940     + * test sb flags directly, not IS_RDONLY(lower_inode) because the
9941     + * lower_dentry could be a negative.
9942     + */
9943     + if (lower_sb->s_flags & MS_RDONLY)
9944     + return -EROFS;
9945     +
9946     + return 0;
9947     +}
9948     +
9949     +static inline int is_robranch(const struct dentry *dentry)
9950     +{
9951     + int index;
9952     +
9953     + index = UNIONFS_D(dentry)->bstart;
9954     + BUG_ON(index < 0);
9955     +
9956     + return is_robranch_idx(dentry, index);
9957     +}
9958     +
9959     +/*
9960     + * EXTERNALS:
9961     + */
9962     +extern int check_branch(struct nameidata *nd);
9963     +extern int parse_branch_mode(const char *name, int *perms);
9964     +
9965     +/* locking helpers */
9966     +static inline struct dentry *lock_parent(struct dentry *dentry)
9967     +{
9968     + struct dentry *dir = dget_parent(dentry);
9969     + mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
9970     + return dir;
9971     +}
9972     +static inline struct dentry *lock_parent_wh(struct dentry *dentry)
9973     +{
9974     + struct dentry *dir = dget_parent(dentry);
9975     +
9976     + mutex_lock_nested(&dir->d_inode->i_mutex, UNIONFS_DMUTEX_WHITEOUT);
9977     + return dir;
9978     +}
9979     +
9980     +static inline void unlock_dir(struct dentry *dir)
9981     +{
9982     + mutex_unlock(&dir->d_inode->i_mutex);
9983     + dput(dir);
9984     +}
9985     +
9986     +/* lock base inode mutex before calling lookup_one_len */
9987     +static inline struct dentry *lookup_lck_len(const char *name,
9988     + struct dentry *base, int len)
9989     +{
9990     + struct dentry *d;
9991     + mutex_lock(&base->d_inode->i_mutex);
9992     + d = lookup_one_len(name, base, len);
9993     + mutex_unlock(&base->d_inode->i_mutex);
9994     + return d;
9995     +}
9996     +
9997     +static inline struct vfsmount *unionfs_mntget(struct dentry *dentry,
9998     + int bindex)
9999     +{
10000     + struct vfsmount *mnt;
10001     +
10002     + BUG_ON(!dentry || bindex < 0);
10003     +
10004     + mnt = mntget(unionfs_lower_mnt_idx(dentry, bindex));
10005     +#ifdef CONFIG_UNION_FS_DEBUG
10006     + if (!mnt)
10007     + pr_debug("unionfs: mntget: mnt=%p bindex=%d\n",
10008     + mnt, bindex);
10009     +#endif /* CONFIG_UNION_FS_DEBUG */
10010     +
10011     + return mnt;
10012     +}
10013     +
10014     +static inline void unionfs_mntput(struct dentry *dentry, int bindex)
10015     +{
10016     + struct vfsmount *mnt;
10017     +
10018     + if (!dentry && bindex < 0)
10019     + return;
10020     + BUG_ON(!dentry || bindex < 0);
10021     +
10022     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
10023     +#ifdef CONFIG_UNION_FS_DEBUG
10024     + /*
10025     + * Directories can have NULL lower objects in between start/end, but
10026     + * NOT if at the start/end range. We cannot verify that this dentry
10027     + * is a type=DIR, because it may already be a negative dentry. But
10028     + * if dbstart is greater than dbend, we know that this couldn't have
10029     + * been a regular file: it had to have been a directory.
10030     + */
10031     + if (!mnt && !(bindex > dbstart(dentry) && bindex < dbend(dentry)))
10032     + pr_debug("unionfs: mntput: mnt=%p bindex=%d\n", mnt, bindex);
10033     +#endif /* CONFIG_UNION_FS_DEBUG */
10034     + mntput(mnt);
10035     +}
10036     +
10037     +#ifdef CONFIG_UNION_FS_DEBUG
10038     +
10039     +/* useful for tracking code reachability */
10040     +#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
10041     +
10042     +#define unionfs_check_inode(i) __unionfs_check_inode((i), \
10043     + __FILE__, __func__, __LINE__)
10044     +#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
10045     + __FILE__, __func__, __LINE__)
10046     +#define unionfs_check_file(f) __unionfs_check_file((f), \
10047     + __FILE__, __func__, __LINE__)
10048     +#define unionfs_check_nd(n) __unionfs_check_nd((n), \
10049     + __FILE__, __func__, __LINE__)
10050     +#define show_branch_counts(sb) __show_branch_counts((sb), \
10051     + __FILE__, __func__, __LINE__)
10052     +#define show_inode_times(i) __show_inode_times((i), \
10053     + __FILE__, __func__, __LINE__)
10054     +#define show_dinode_times(d) __show_dinode_times((d), \
10055     + __FILE__, __func__, __LINE__)
10056     +#define show_inode_counts(i) __show_inode_counts((i), \
10057     + __FILE__, __func__, __LINE__)
10058     +
10059     +extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
10060     + const char *fxn, int line);
10061     +extern void __unionfs_check_dentry(const struct dentry *dentry,
10062     + const char *fname, const char *fxn,
10063     + int line);
10064     +extern void __unionfs_check_file(const struct file *file,
10065     + const char *fname, const char *fxn, int line);
10066     +extern void __unionfs_check_nd(const struct nameidata *nd,
10067     + const char *fname, const char *fxn, int line);
10068     +extern void __show_branch_counts(const struct super_block *sb,
10069     + const char *file, const char *fxn, int line);
10070     +extern void __show_inode_times(const struct inode *inode,
10071     + const char *file, const char *fxn, int line);
10072     +extern void __show_dinode_times(const struct dentry *dentry,
10073     + const char *file, const char *fxn, int line);
10074     +extern void __show_inode_counts(const struct inode *inode,
10075     + const char *file, const char *fxn, int line);
10076     +
10077     +#else /* not CONFIG_UNION_FS_DEBUG */
10078     +
10079     +/* we leave useful hooks for these check functions throughout the code */
10080     +#define unionfs_check_inode(i) do { } while (0)
10081     +#define unionfs_check_dentry(d) do { } while (0)
10082     +#define unionfs_check_file(f) do { } while (0)
10083     +#define unionfs_check_nd(n) do { } while (0)
10084     +#define show_branch_counts(sb) do { } while (0)
10085     +#define show_inode_times(i) do { } while (0)
10086     +#define show_dinode_times(d) do { } while (0)
10087     +#define show_inode_counts(i) do { } while (0)
10088     +
10089     +#endif /* not CONFIG_UNION_FS_DEBUG */
10090     +
10091     +#endif /* not _UNION_H_ */
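Note on rdstate2offset() in the header above: it packs a 12-bit cookie (MAXRDCOOKIE) and a 20-bit entry offset (DIREOF, RDOFFBITS) into one 32-bit directory position. The sketch below reproduces that arithmetic on plain integers in userspace. The three macros carry the values from the patch; demo_pack() is a hypothetical rename of the same expression, and demo_cookie() and demo_offset() are hypothetical inverse helpers added only for illustration, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DIREOF      0xfffffu	/* low 20 bits: entry offset */
#define RDOFFBITS   20		/* number of bits in DIREOF */
#define MAXRDCOOKIE 0xfffu	/* 12-bit cookie */

/* Same expression as rdstate2offset(), applied to plain integers. */
static uint32_t demo_pack(uint32_t cookie, uint32_t offset)
{
	return ((cookie & MAXRDCOOKIE) << RDOFFBITS) | (offset & DIREOF);
}

/* Hypothetical inverse: recover the cookie from a packed position. */
static uint32_t demo_cookie(uint32_t packed)
{
	return (packed >> RDOFFBITS) & MAXRDCOOKIE;
}

/* Hypothetical inverse: recover the entry offset from a packed position. */
static uint32_t demo_offset(uint32_t packed)
{
	return packed & DIREOF;
}
```

Bits above either field width are masked off before packing, so a cookie larger than 0xfff or an offset larger than 0xfffff wraps silently rather than corrupting the neighbouring field.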
10092     diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
10093     new file mode 100644
10094     index 0000000..542c513
10095     --- /dev/null
10096     +++ b/fs/unionfs/unlink.c
10097     @@ -0,0 +1,278 @@
10098     +/*
10099     + * Copyright (c) 2003-2010 Erez Zadok
10100     + * Copyright (c) 2003-2006 Charles P. Wright
10101     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10102     + * Copyright (c) 2005-2006 Junjiro Okajima
10103     + * Copyright (c) 2005 Arun M. Krishnakumar
10104     + * Copyright (c) 2004-2006 David P. Quigley
10105     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10106     + * Copyright (c) 2003 Puja Gupta
10107     + * Copyright (c) 2003 Harikesavan Krishnan
10108     + * Copyright (c) 2003-2010 Stony Brook University
10109     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
10110     + *
10111     + * This program is free software; you can redistribute it and/or modify
10112     + * it under the terms of the GNU General Public License version 2 as
10113     + * published by the Free Software Foundation.
10114     + */
10115     +
10116     +#include "union.h"
10117     +
10118     +/*
10119     + * Helper function for Unionfs's unlink operation.
10120     + *
10121     + * The main goal of this function is to optimize the unlinking of non-dir
10122     + * objects in unionfs by deleting all possible lower inode objects from the
10123     + * underlying branches having same dentry name as the non-dir dentry on
10124     + * which this unlink operation is called. This way we delete as many lower
10125     + * inodes as possible, and save space. Whiteouts need to be created in
10126     + * branch0 only if unlinking fails on any lower branch other than
10127     + * branch0, or if a lower branch is marked read-only.
10128     + *
10129     + * Also, while unlinking a file, if we encounter any dir type entry in any
10130     + * intermediate branch, then we remove the directory by calling vfs_rmdir.
10131     + * The following special cases are also handled:
10132     + *
10133     + * (1) If an error occurs in branch0 during vfs_unlink, then we return
10134     + * appropriate error.
10135     + *
10136     + * (2) If we get an error during unlink in any lower branch other
10137     + * than branch0, then we create a whiteout in branch0.
10138     + *
10139     + * (3) If a whiteout already exists in any intermediate branch, we delete
10140     + * all possible inodes only up to that branch (this is "opaqueness",
10141     + * as per Documentation/filesystems/unionfs/concepts.txt).
10142     + *
10143     + */
10144     +static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry,
10145     + struct dentry *parent)
10146     +{
10147     + struct dentry *lower_dentry;
10148     + struct dentry *lower_dir_dentry;
10149     + int bindex;
10150     + int err = 0;
10151     +
10152     + err = unionfs_partial_lookup(dentry, parent);
10153     + if (err)
10154     + goto out;
10155     +
10156     + /* trying to unlink all possible valid instances */
10157     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
10158     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10159     + if (!lower_dentry || !lower_dentry->d_inode)
10160     + continue;
10161     +
10162     + lower_dir_dentry = lock_parent(lower_dentry);
10163     +
10164     + /* avoid destroying the lower inode if the object is in use */
10165     + dget(lower_dentry);
10166     + err = is_robranch_super(dentry->d_sb, bindex);
10167     + if (!err) {
10168     + /* see Documentation/filesystems/unionfs/issues.txt */
10169     + lockdep_off();
10170     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
10171     + err = vfs_unlink(lower_dir_dentry->d_inode,
10172     + lower_dentry);
10173     + else
10174     + err = vfs_rmdir(lower_dir_dentry->d_inode,
10175     + lower_dentry);
10176     + lockdep_on();
10177     + }
10178     +
10179     + /* if lower object deletion succeeds, update inode's times */
10180     + if (!err)
10181     + unionfs_copy_attr_times(dentry->d_inode);
10182     + dput(lower_dentry);
10183     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10184     + unlock_dir(lower_dir_dentry);
10185     +
10186     + if (err)
10187     + break;
10188     + }
10189     +
10190     + /*
10191     + * Create the whiteout in branch 0 (highest priority) only if (a)
10192     + * there was an error in any intermediate branch other than branch 0
10193     + * due to failure of vfs_unlink/vfs_rmdir or (b) a branch is marked or
10194     + * mounted read-only.
10195     + */
10196     + if (err) {
10197     + if ((bindex == 0) ||
10198     + ((bindex == dbstart(dentry)) &&
10199     + (!IS_COPYUP_ERR(err))))
10200     + goto out;
10201     + else {
10202     + if (!IS_COPYUP_ERR(err))
10203     + pr_debug("unionfs: lower object deletion "
10204     + "failed in branch:%d\n", bindex);
10205     + err = create_whiteout(dentry, sbstart(dentry->d_sb));
10206     + }
10207     + }
10208     +
10209     +out:
10210     + if (!err)
10211     + inode_dec_link_count(dentry->d_inode);
10212     +
10213     + /* We don't want to leave negative leftover dentries for revalidate. */
10214     + if (!err && (dbopaque(dentry) != -1))
10215     + update_bstart(dentry);
10216     +
10217     + return err;
10218     +}
10219     +
10220     +int unionfs_unlink(struct inode *dir, struct dentry *dentry)
10221     +{
10222     + int err = 0;
10223     + struct inode *inode = dentry->d_inode;
10224     + struct dentry *parent;
10225     + int valid;
10226     +
10227     + BUG_ON(S_ISDIR(inode->i_mode));
10228     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10229     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10230     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10231     +
10232     + valid = __unionfs_d_revalidate(dentry, parent, false);
10233     + if (unlikely(!valid)) {
10234     + err = -ESTALE;
10235     + goto out;
10236     + }
10237     + unionfs_check_dentry(dentry);
10238     +
10239     + err = unionfs_unlink_whiteout(dir, dentry, parent);
10240     + /* call d_drop so the system "forgets" about us */
10241     + if (!err) {
10242     + unionfs_postcopyup_release(dentry);
10243     + unionfs_postcopyup_setmnt(parent);
10244     + if (inode->i_nlink == 0) /* drop lower inodes */
10245     + iput_lowers_all(inode, false);
10246     + d_drop(dentry);
10247     + /*
10248     + * if unlink/whiteout succeeded, parent dir mtime has
10249     + * changed
10250     + */
10251     + unionfs_copy_attr_times(dir);
10252     + }
10253     +
10254     +out:
10255     + if (!err) {
10256     + unionfs_check_dentry(dentry);
10257     + unionfs_check_inode(dir);
10258     + }
10259     + unionfs_unlock_dentry(dentry);
10260     + unionfs_unlock_parent(dentry, parent);
10261     + unionfs_read_unlock(dentry->d_sb);
10262     + return err;
10263     +}
10264     +
10265     +static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
10266     + struct unionfs_dir_state *namelist)
10267     +{
10268     + int err;
10269     + struct dentry *lower_dentry;
10270     + struct dentry *lower_dir_dentry = NULL;
10271     +
10272     + /* Here we need to remove whiteout entries. */
10273     + err = delete_whiteouts(dentry, dbstart(dentry), namelist);
10274     + if (err)
10275     + goto out;
10276     +
10277     + lower_dentry = unionfs_lower_dentry(dentry);
10278     +
10279     + lower_dir_dentry = lock_parent(lower_dentry);
10280     +
10281     + /* avoid destroying the lower inode if the file is in use */
10282     + dget(lower_dentry);
10283     + err = is_robranch(dentry);
10284     + if (!err)
10285     + err = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
10286     + dput(lower_dentry);
10287     +
10288     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10289     + /* propagate number of hard-links */
10290     + dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
10291     +
10292     +out:
10293     + if (lower_dir_dentry)
10294     + unlock_dir(lower_dir_dentry);
10295     + return err;
10296     +}
10297     +
10298     +int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
10299     +{
10300     + int err = 0;
10301     + struct unionfs_dir_state *namelist = NULL;
10302     + struct dentry *parent;
10303     + int dstart, dend;
10304     + bool valid;
10305     +
10306     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10307     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10308     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10309     +
10310     + valid = __unionfs_d_revalidate(dentry, parent, false);
10311     + if (unlikely(!valid)) {
10312     + err = -ESTALE;
10313     + goto out;
10314     + }
10315     + unionfs_check_dentry(dentry);
10316     +
10317     + /* check if this unionfs directory is empty or not */
10318     + err = check_empty(dentry, parent, &namelist);
10319     + if (err)
10320     + goto out;
10321     +
10322     + err = unionfs_rmdir_first(dir, dentry, namelist);
10323     + dstart = dbstart(dentry);
10324     + dend = dbend(dentry);
10325     + /*
10326     + * We create a whiteout for the directory if there was an error to
10327     + * rmdir the first directory entry in the union. Otherwise, we
10328     + * create a whiteout only if there is a chance that a lower
10329     + * priority branch might also have the same named directory. IOW,
10330     + * if there is not another same-named directory at a lower priority
10331     + * branch, then we don't need to create a whiteout for it.
10332     + */
10333     + if (!err) {
10334     + if (dstart < dend)
10335     + err = create_whiteout(dentry, dstart);
10336     + } else {
10337     + int new_err;
10338     +
10339     + if (dstart == 0)
10340     + goto out;
10341     +
10342     + /* exit if the error returned was NOT -EROFS */
10343     + if (!IS_COPYUP_ERR(err))
10344     + goto out;
10345     +
10346     + new_err = create_whiteout(dentry, dstart - 1);
10347     + if (new_err != -EEXIST)
10348     + err = new_err;
10349     + }
10350     +
10351     +out:
10352     + /*
10353     + * Drop references to lower dentry/inode so storage space for them
10354     + * can be reclaimed. Then, call d_drop so the system "forgets"
10355     + * about us.
10356     + */
10357     + if (!err) {
10358     + iput_lowers_all(dentry->d_inode, false);
10359     + dput(unionfs_lower_dentry_idx(dentry, dstart));
10360     + unionfs_set_lower_dentry_idx(dentry, dstart, NULL);
10361     + d_drop(dentry);
10362     + /* update our lower vfsmnts, in case a copyup took place */
10363     + unionfs_postcopyup_setmnt(dentry);
10364     + unionfs_check_dentry(dentry);
10365     + unionfs_check_inode(dir);
10366     + }
10367     +
10368     + if (namelist)
10369     + free_rdstate(namelist);
10370     +
10371     + unionfs_unlock_dentry(dentry);
10372     + unionfs_unlock_parent(dentry, parent);
10373     + unionfs_read_unlock(dentry->d_sb);
10374     + return err;
10375     +}
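The whiteout decision at the end of unionfs_unlink_whiteout() above can be modelled in user space. The sketch below is illustrative only and not part of the patch: the names is_copyup_err() and unlink_needs_whiteout() are hypothetical, and it assumes that IS_COPYUP_ERR() matches only -EROFS, which this hunk does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: copy-up errors are modelled as -EROFS only. */
static bool is_copyup_err(int err)
{
	return err == -EROFS;
}

/*
 * Hypothetical model of the post-loop check in unionfs_unlink_whiteout().
 * err    - result of the last lower unlink/rmdir attempt (0 or -errno)
 * bindex - branch index at which the loop stopped
 * dstart - leftmost branch holding the object (dbstart in the patch)
 * Returns true if a whiteout would be attempted.
 */
static bool unlink_needs_whiteout(int err, int bindex, int dstart)
{
	/* the loop in the patch only visits branches from dstart rightwards */
	assert(bindex >= dstart);

	if (!err)
		return false;	/* every lower object was removed */
	if (bindex == 0)
		return false;	/* failure in branch 0: report the error */
	if (bindex == dstart && !is_copyup_err(err))
		return false;	/* real failure on the first instance */
	return true;		/* read-only or lower-priority failure */
}
```

For example, unlink_needs_whiteout(-EROFS, 2, 2) is true (the only copy sits on a read-only branch), while unlink_needs_whiteout(-EIO, 2, 2) is false and the error is returned unchanged.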
10376     diff --git a/fs/unionfs/whiteout.c b/fs/unionfs/whiteout.c
10377     new file mode 100644
10378     index 0000000..405073a
10379     --- /dev/null
10380     +++ b/fs/unionfs/whiteout.c
10381     @@ -0,0 +1,584 @@
10382     +/*
10383     + * Copyright (c) 2003-2010 Erez Zadok
10384     + * Copyright (c) 2003-2006 Charles P. Wright
10385     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10386     + * Copyright (c) 2005-2006 Junjiro Okajima
10387     + * Copyright (c) 2005 Arun M. Krishnakumar
10388     + * Copyright (c) 2004-2006 David P. Quigley
10389     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10390     + * Copyright (c) 2003 Puja Gupta
10391     + * Copyright (c) 2003 Harikesavan Krishnan
10392     + * Copyright (c) 2003-2010 Stony Brook University
10393     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
10394     + *
10395     + * This program is free software; you can redistribute it and/or modify
10396     + * it under the terms of the GNU General Public License version 2 as
10397     + * published by the Free Software Foundation.
10398     + */
10399     +
10400     +#include "union.h"
10401     +
10402     +/*
10403     + * whiteout and opaque directory helpers
10404     + */
10405     +
10406     +/* What do we use for whiteouts. */
10407     +#define UNIONFS_WHPFX ".wh."
10408     +#define UNIONFS_WHLEN 4
10409     +/*
10410     + * If a directory contains this file, then it is opaque. We start with the
10411     + * .wh. flag so that it is blocked by lookup.
10412     + */
10413     +#define UNIONFS_DIR_OPAQUE_NAME "__dir_opaque"
10414     +#define UNIONFS_DIR_OPAQUE UNIONFS_WHPFX UNIONFS_DIR_OPAQUE_NAME
10415     +
10416     +/* construct whiteout filename */
10417     +char *alloc_whname(const char *name, int len)
10418     +{
10419     + char *buf;
10420     +
10421     + buf = kmalloc(len + UNIONFS_WHLEN + 1, GFP_KERNEL);
10422     + if (unlikely(!buf))
10423     + return ERR_PTR(-ENOMEM);
10424     +
10425     + strcpy(buf, UNIONFS_WHPFX);
10426     + strlcat(buf, name, len + UNIONFS_WHLEN + 1);
10427     +
10428     + return buf;
10429     +}
10430     +
10431     +/*
10432     + * XXX: this can be inline or CPP macro, but is here to keep all whiteout
10433     + * code in one place.
10434     + */
10435     +void unionfs_set_max_namelen(long *namelen)
10436     +{
10437     + *namelen -= UNIONFS_WHLEN;
10438     +}
10439     +
10440     +/* check if @namep is a whiteout, update @namep and @namelenp accordingly */
10441     +bool is_whiteout_name(char **namep, int *namelenp)
10442     +{
10443     + if (*namelenp > UNIONFS_WHLEN &&
10444     + !strncmp(*namep, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
10445     + *namep += UNIONFS_WHLEN;
10446     + *namelenp -= UNIONFS_WHLEN;
10447     + return true;
10448     + }
10449     + return false;
10450     +}
10451     +
10452     +/* is the filename valid == !(whiteout for a file or opaque dir marker) */
10453     +bool is_validname(const char *name)
10454     +{
10455     + if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
10456     + return false;
10457     + if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
10458     + sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
10459     + return false;
10460     + return true;
10461     +}
10462     +
10463     +/*
10464     + * Look for a whiteout @name in @lower_parent directory. If error, return
10465     + * ERR_PTR. Caller must dput() the returned dentry if not an error.
10466     + *
10467     + * XXX: some callers can reuse the whname allocated buffer to avoid repeated
10468     + * free then re-malloc calls. Need to provide a different API for those
10469     + * callers.
10470     + */
10471     +struct dentry *lookup_whiteout(const char *name, struct dentry *lower_parent)
10472     +{
10473     + char *whname = NULL;
10474     + int err = 0, namelen;
10475     + struct dentry *wh_dentry = NULL;
10476     +
10477     + namelen = strlen(name);
10478     + whname = alloc_whname(name, namelen);
10479     + if (unlikely(IS_ERR(whname))) {
10480     + err = PTR_ERR(whname);
10481     + goto out;
10482     + }
10483     +
10484     + /* check if whiteout exists in this branch: lookup .wh.foo */
10485     + wh_dentry = lookup_lck_len(whname, lower_parent, strlen(whname));
10486     + if (IS_ERR(wh_dentry)) {
10487     + err = PTR_ERR(wh_dentry);
10488     + goto out;
10489     + }
10490     +
10491     + /* check if negative dentry (ENOENT) */
10492     + if (!wh_dentry->d_inode)
10493     + goto out;
10494     +
10495     + /* whiteout found: check if valid type */
10496     + if (!S_ISREG(wh_dentry->d_inode->i_mode)) {
10497     + printk(KERN_ERR "unionfs: invalid whiteout %s entry type %d\n",
10498     + whname, wh_dentry->d_inode->i_mode);
10499     + dput(wh_dentry);
10500     + err = -EIO;
10501     + goto out;
10502     + }
10503     +
10504     +out:
10505     + kfree(whname);
10506     + if (err)
10507     + wh_dentry = ERR_PTR(err);
10508     + return wh_dentry;
10509     +}
10510     +
10511     +/* find and return first whiteout in parent directory, else ENOENT */
10512     +struct dentry *find_first_whiteout(struct dentry *dentry)
10513     +{
10514     + int bindex, bstart, bend;
10515     + struct dentry *parent, *lower_parent, *wh_dentry;
10516     +
10517     + parent = dget_parent(dentry);
10518     +
10519     + bstart = dbstart(parent);
10520     + bend = dbend(parent);
10521     + wh_dentry = ERR_PTR(-ENOENT);
10522     +
10523     + for (bindex = bstart; bindex <= bend; bindex++) {
10524     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
10525     + if (!lower_parent)
10526     + continue;
10527     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_parent);
10528     + if (IS_ERR(wh_dentry))
10529     + continue;
10530     + if (wh_dentry->d_inode)
10531     + break;
10532     + dput(wh_dentry);
10533     + wh_dentry = ERR_PTR(-ENOENT);
10534     + }
10535     +
10536     + dput(parent);
10537     +
10538     + return wh_dentry;
10539     +}
10540     +
10541     +/*
10542     + * Unlink a whiteout dentry. Returns 0 or -errno. Caller must hold and
10543     + * release dentry reference.
10544     + */
10545     +int unlink_whiteout(struct dentry *wh_dentry)
10546     +{
10547     + int err;
10548     + struct dentry *lower_dir_dentry;
10549     +
10550     + /* dget and lock parent dentry */
10551     + lower_dir_dentry = lock_parent_wh(wh_dentry);
10552     +
10553     + /* see Documentation/filesystems/unionfs/issues.txt */
10554     + lockdep_off();
10555     + err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
10556     + lockdep_on();
10557     + unlock_dir(lower_dir_dentry);
10558     +
10559     + /*
10560     + * Whiteouts are special files and should be deleted no matter what
10561     + * (as if they never existed), in order to allow this create
10562     + * operation to succeed. This is especially important in sticky
10563     + * directories: a whiteout may have been created by one user, but
10564     + * the newly created file may be created by another user.
10565     + * Therefore, in order to maintain Unix semantics, if the vfs_unlink
10566     + * above failed, then we have to try to directly unlink the
10567     + * whiteout. Note: in the ODF version of unionfs, whiteouts are
10568     + * handled much more cleanly.
10569     + */
10570     + if (err == -EPERM) {
10571     + struct inode *inode = lower_dir_dentry->d_inode;
10572     + err = inode->i_op->unlink(inode, wh_dentry);
10573     + }
10574     + if (err)
10575     + printk(KERN_ERR "unionfs: could not unlink whiteout %s, "
10576     + "err = %d\n", wh_dentry->d_name.name, err);
10577     +
10578     + return err;
10579     +
10580     +}
10581     +
10582     +/*
10583     + * Helper function when creating new objects (create, symlink, mknod, etc.).
10584     + * Checks to see if there's a whiteout in @lower_dentry's parent directory,
10585     + * whose name is taken from @dentry. Then tries to remove that whiteout, if
10586     + * found. If <dentry,bindex> is a branch marked readonly, return -EROFS.
10587     + * If it finds both a regular file and a whiteout, return -EIO (this should
10588     + * never happen).
10589     + *
10590     + * Return 0 if no whiteout was found. Return 1 if one was found and
10591     + * successfully removed. Therefore a value >= 0 tells the caller that
10592     + * @lower_dentry belongs to a good branch to create the new object in.
10593     + * Return -ERRNO if an error occurred during whiteout lookup or in trying to
10594     + * unlink the whiteout.
10595     + */
10596     +int check_unlink_whiteout(struct dentry *dentry, struct dentry *lower_dentry,
10597     + int bindex)
10598     +{
10599     + int err;
10600     + struct dentry *wh_dentry = NULL;
10601     + struct dentry *lower_dir_dentry = NULL;
10602     +
10603     + /* look for whiteout dentry first */
10604     + lower_dir_dentry = dget_parent(lower_dentry);
10605     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_dir_dentry);
10606     + dput(lower_dir_dentry);
10607     + if (IS_ERR(wh_dentry)) {
10608     + err = PTR_ERR(wh_dentry);
10609     + goto out;
10610     + }
10611     +
10612     + if (!wh_dentry->d_inode) { /* no whiteout exists */
10613     + err = 0;
10614     + goto out_dput;
10615     + }
10616     +
10617     + /* check if regular file and whiteout were both found */
10618     + if (unlikely(lower_dentry->d_inode)) {
10619     + err = -EIO;
10620     + printk(KERN_ERR "unionfs: found both whiteout and regular "
10621     + "file in directory %s (branch %d)\n",
10622     + lower_dir_dentry->d_name.name, bindex);
10623     + goto out_dput;
10624     + }
10625     +
10626     + /* check if branch is writeable */
10627     + err = is_robranch_super(dentry->d_sb, bindex);
10628     + if (err)
10629     + goto out_dput;
10630     +
10631     + /* .wh.foo has been found, so let's unlink it */
10632     + err = unlink_whiteout(wh_dentry);
10633     + if (!err)
10634     + err = 1; /* a whiteout was found and successfully removed */
10635     +out_dput:
10636     + dput(wh_dentry);
10637     +out:
10638     + return err;
10639     +}
10640     +
10641     +/*
10642     + * Pass a unionfs dentry and an index. It will try to create a whiteout
10643     + * for the filename in dentry, and will try in branch 'index'. On error,
10644     + * it will proceed to a branch to the left.
10645     + */
10646     +int create_whiteout(struct dentry *dentry, int start)
10647     +{
10648     + int bstart, bend, bindex;
10649     + struct dentry *lower_dir_dentry;
10650     + struct dentry *lower_dentry;
10651     + struct dentry *lower_wh_dentry;
10652     + struct nameidata nd;
10653     + char *name = NULL;
10654     + int err = -EINVAL;
10655     +
10656     + verify_locked(dentry);
10657     +
10658     + bstart = dbstart(dentry);
10659     + bend = dbend(dentry);
10660     +
10661     + /* create dentry's whiteout equivalent */
10662     + name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
10663     + if (unlikely(IS_ERR(name))) {
10664     + err = PTR_ERR(name);
10665     + goto out;
10666     + }
10667     +
10668     + for (bindex = start; bindex >= 0; bindex--) {
10669     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10670     +
10671     + if (!lower_dentry) {
10672     + /*
10673     + * if lower dentry is not present, create the
10674     + * entire lower dentry directory structure and go
10675     + * ahead. Since we want to just create whiteout, we
10676     + * only want the parent dentry, and hence get rid of
10677     + * this dentry.
10678     + */
10679     + lower_dentry = create_parents(dentry->d_inode,
10680     + dentry,
10681     + dentry->d_name.name,
10682     + bindex);
10683     + if (!lower_dentry || IS_ERR(lower_dentry)) {
10684     + int ret = PTR_ERR(lower_dentry);
10685     + if (!IS_COPYUP_ERR(ret))
10686     + printk(KERN_ERR
10687     + "unionfs: create_parents for "
10688     + "whiteout failed: bindex=%d "
10689     + "err=%d\n", bindex, ret);
10690     + continue;
10691     + }
10692     + }
10693     +
10694     + lower_wh_dentry =
10695     + lookup_lck_len(name, lower_dentry->d_parent,
10696     + dentry->d_name.len + UNIONFS_WHLEN);
10697     + if (IS_ERR(lower_wh_dentry))
10698     + continue;
10699     +
10700     + /*
10701     + * The whiteout already exists. This used to be impossible,
10702     + * but now is possible because of opaqueness.
10703     + */
10704     + if (lower_wh_dentry->d_inode) {
10705     + dput(lower_wh_dentry);
10706     + err = 0;
10707     + goto out;
10708     + }
10709     +
10710     + err = init_lower_nd(&nd, LOOKUP_CREATE);
10711     + if (unlikely(err < 0))
10712     + goto out;
10713     + lower_dir_dentry = lock_parent_wh(lower_wh_dentry);
10714     + err = is_robranch_super(dentry->d_sb, bindex);
10715     + if (!err)
10716     + err = vfs_create(lower_dir_dentry->d_inode,
10717     + lower_wh_dentry,
10718     + current_umask() & S_IRUGO,
10719     + &nd);
10720     + unlock_dir(lower_dir_dentry);
10721     + dput(lower_wh_dentry);
10722     + release_lower_nd(&nd, err);
10723     +
10724     + if (!err || !IS_COPYUP_ERR(err))
10725     + break;
10726     + }
10727     +
10728     + /* set dbopaque so that lookup will not proceed after this branch */
10729     + if (!err)
10730     + dbopaque(dentry) = bindex;
10731     +
10732     +out:
10733     + kfree(name);
10734     + return err;
10735     +}
10736     +
10737     +/*
10738     + * Delete all of the whiteouts in a given directory for rmdir.
10739     + *
10740     + * lower directory inode should be locked
10741     + */
10742     +static int do_delete_whiteouts(struct dentry *dentry, int bindex,
10743     + struct unionfs_dir_state *namelist)
10744     +{
10745     + int err = 0;
10746     + struct dentry *lower_dir_dentry = NULL;
10747     + struct dentry *lower_dentry;
10748     + char *name = NULL, *p;
10749     + struct inode *lower_dir;
10750     + int i;
10751     + struct list_head *pos;
10752     + struct filldir_node *cursor;
10753     +
10754     + /* Find out lower parent dentry */
10755     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10756     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10757     + lower_dir = lower_dir_dentry->d_inode;
10758     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10759     +
10760     + err = -ENOMEM;
10761     + name = __getname();
10762     + if (unlikely(!name))
10763     + goto out;
10764     + strcpy(name, UNIONFS_WHPFX);
10765     + p = name + UNIONFS_WHLEN;
10766     +
10767     + err = 0;
10768     + for (i = 0; !err && i < namelist->size; i++) {
10769     + list_for_each(pos, &namelist->list[i]) {
10770     + cursor =
10771     + list_entry(pos, struct filldir_node,
10772     + file_list);
10773     + /* Only operate on whiteouts in this branch. */
10774     + if (cursor->bindex != bindex)
10775     + continue;
10776     + if (!cursor->whiteout)
10777     + continue;
10778     +
10779     + strlcpy(p, cursor->name, PATH_MAX - UNIONFS_WHLEN);
10780     + lower_dentry =
10781     + lookup_lck_len(name, lower_dir_dentry,
10782     + cursor->namelen +
10783     + UNIONFS_WHLEN);
10784     + if (IS_ERR(lower_dentry)) {
10785     + err = PTR_ERR(lower_dentry);
10786     + break;
10787     + }
10788     + if (lower_dentry->d_inode)
10789     + err = vfs_unlink(lower_dir, lower_dentry);
10790     + dput(lower_dentry);
10791     + if (err)
10792     + break;
10793     + }
10794     + }
10795     +
10796     + __putname(name);
10797     +
10798     + /* After all of the removals, we should copy the attributes once. */
10799     + fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
10800     +
10801     +out:
10802     + return err;
10803     +}
10804     +
10805     +
10806     +void __delete_whiteouts(struct work_struct *work)
10807     +{
10808     + struct sioq_args *args = container_of(work, struct sioq_args, work);
10809     + struct deletewh_args *d = &args->deletewh;
10810     +
10811     + args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
10812     + complete(&args->comp);
10813     +}
10814     +
10815     +/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
10816     +int delete_whiteouts(struct dentry *dentry, int bindex,
10817     + struct unionfs_dir_state *namelist)
10818     +{
10819     + int err;
10820     + struct super_block *sb;
10821     + struct dentry *lower_dir_dentry;
10822     + struct inode *lower_dir;
10823     + struct sioq_args args;
10824     +
10825     + sb = dentry->d_sb;
10826     +
10827     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
10828     + BUG_ON(bindex < dbstart(dentry));
10829     + BUG_ON(bindex > dbend(dentry));
10830     + err = is_robranch_super(sb, bindex);
10831     + if (err)
10832     + goto out;
10833     +
10834     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10835     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10836     + lower_dir = lower_dir_dentry->d_inode;
10837     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10838     +
10839     + if (!inode_permission(lower_dir, MAY_WRITE | MAY_EXEC)) {
10840     + err = do_delete_whiteouts(dentry, bindex, namelist);
10841     + } else {
10842     + args.deletewh.namelist = namelist;
10843     + args.deletewh.dentry = dentry;
10844     + args.deletewh.bindex = bindex;
10845     + run_sioq(__delete_whiteouts, &args);
10846     + err = args.err;
10847     + }
10848     +
10849     +out:
10850     + return err;
10851     +}
10852     +
10853     +/****************************************************************************
10854     + * Opaque directory helpers *
10855     + ****************************************************************************/
10856     +
10857     +/*
10858     + * is_opaque_dir: returns 0 if it is NOT an opaque dir, 1 if it is, and
10859     + * -errno if an error occurred trying to figure this out.
10860     + */
10861     +int is_opaque_dir(struct dentry *dentry, int bindex)
10862     +{
10863     + int err = 0;
10864     + struct dentry *lower_dentry;
10865     + struct dentry *wh_lower_dentry;
10866     + struct inode *lower_inode;
10867     + struct sioq_args args;
10868     +
10869     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10870     + lower_inode = lower_dentry->d_inode;
10871     +
10872     + BUG_ON(!S_ISDIR(lower_inode->i_mode));
10873     +
10874     + mutex_lock(&lower_inode->i_mutex);
10875     +
10876     + if (!inode_permission(lower_inode, MAY_EXEC)) {
10877     + wh_lower_dentry =
10878     + lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10879     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10880     + } else {
10881     + args.is_opaque.dentry = lower_dentry;
10882     + run_sioq(__is_opaque_dir, &args);
10883     + wh_lower_dentry = args.ret;
10884     + }
10885     +
10886     + mutex_unlock(&lower_inode->i_mutex);
10887     +
10888     + if (IS_ERR(wh_lower_dentry)) {
10889     + err = PTR_ERR(wh_lower_dentry);
10890     + goto out;
10891     + }
10892     +
10893     + /* This is an opaque dir iff wh_lower_dentry is positive */
10894     + err = !!wh_lower_dentry->d_inode;
10895     +
10896     + dput(wh_lower_dentry);
10897     +out:
10898     + return err;
10899     +}
10900     +
10901     +void __is_opaque_dir(struct work_struct *work)
10902     +{
10903     + struct sioq_args *args = container_of(work, struct sioq_args, work);
10904     +
10905     + args->ret = lookup_one_len(UNIONFS_DIR_OPAQUE, args->is_opaque.dentry,
10906     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10907     + complete(&args->comp);
10908     +}
10909     +
10910     +int make_dir_opaque(struct dentry *dentry, int bindex)
10911     +{
10912     + int err = 0;
10913     + struct dentry *lower_dentry, *diropq;
10914     + struct inode *lower_dir;
10915     + struct nameidata nd;
10916     + const struct cred *old_creds;
10917     + struct cred *new_creds;
10918     +
10919     + /*
10920     + * Opaque directory whiteout markers are special files (like regular
10921     + * whiteouts), and should appear to the users as if they don't
10922     + * exist. They should be created/deleted regardless of directory
10923     + * search/create permissions, but only for the duration of this
10924     + * creation of the .wh.__dir_opaque file. Note, this does not
10925     + * circumvent normal ->permission.
10926     + */
10927     + new_creds = prepare_creds();
10928     + if (unlikely(!new_creds)) {
10929     + err = -ENOMEM;
10930     + goto out_err;
10931     + }
10932     + cap_raise(new_creds->cap_effective, CAP_DAC_READ_SEARCH);
10933     + cap_raise(new_creds->cap_effective, CAP_DAC_OVERRIDE);
10934     + old_creds = override_creds(new_creds);
10935     +
10936     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10937     + lower_dir = lower_dentry->d_inode;
10938     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode) ||
10939     + !S_ISDIR(lower_dir->i_mode));
10940     +
10941     + mutex_lock(&lower_dir->i_mutex);
10942     + diropq = lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
10943     + sizeof(UNIONFS_DIR_OPAQUE) - 1);
10944     + if (IS_ERR(diropq)) {
10945     + err = PTR_ERR(diropq);
10946     + goto out;
10947     + }
10948     +
10949     + err = init_lower_nd(&nd, LOOKUP_CREATE);
10950     + if (unlikely(err < 0))
10951     + goto out;
10952     + if (!diropq->d_inode)
10953     + err = vfs_create(lower_dir, diropq, S_IRUGO, &nd);
10954     + if (!err)
10955     + dbopaque(dentry) = bindex;
10956     + release_lower_nd(&nd, err);
10957     +
10958     + dput(diropq);
10959     +
10960     +out:
10961     + mutex_unlock(&lower_dir->i_mutex);
10962     + revert_creds(old_creds);
10963     +out_err:
10964     + return err;
10965     +}
10966     diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
10967     new file mode 100644
10968     index 0000000..9002e06
10969     --- /dev/null
10970     +++ b/fs/unionfs/xattr.c
10971     @@ -0,0 +1,173 @@
10972     +/*
10973     + * Copyright (c) 2003-2010 Erez Zadok
10974     + * Copyright (c) 2003-2006 Charles P. Wright
10975     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10976     + * Copyright (c) 2005-2006 Junjiro Okajima
10977     + * Copyright (c) 2005 Arun M. Krishnakumar
10978     + * Copyright (c) 2004-2006 David P. Quigley
10979     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10980     + * Copyright (c) 2003 Puja Gupta
10981     + * Copyright (c) 2003 Harikesavan Krishnan
10982     + * Copyright (c) 2003-2010 Stony Brook University
10983     + * Copyright (c) 2003-2010 The Research Foundation of SUNY
10984     + *
10985     + * This program is free software; you can redistribute it and/or modify
10986     + * it under the terms of the GNU General Public License version 2 as
10987     + * published by the Free Software Foundation.
10988     + */
10989     +
10990     +#include "union.h"
10991     +
10992     +/* This is lifted from fs/xattr.c */
10993     +void *unionfs_xattr_alloc(size_t size, size_t limit)
10994     +{
10995     + void *ptr;
10996     +
10997     + if (size > limit)
10998     + return ERR_PTR(-E2BIG);
10999     +
11000     + if (!size) /* size request, no buffer is needed */
11001     + return NULL;
11002     +
11003     + ptr = kmalloc(size, GFP_KERNEL);
11004     + if (unlikely(!ptr))
11005     + return ERR_PTR(-ENOMEM);
11006     + return ptr;
11007     +}
11008     +
11009     +/*
11010     + * BKL held by caller.
11011     + * dentry->d_inode->i_mutex locked
11012     + */
11013     +ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
11014     + size_t size)
11015     +{
11016     + struct dentry *lower_dentry = NULL;
11017     + struct dentry *parent;
11018     + ssize_t err = -EOPNOTSUPP;
11019     + bool valid;
11020     +
11021     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11022     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11023     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11024     +
11025     + valid = __unionfs_d_revalidate(dentry, parent, false);
11026     + if (unlikely(!valid)) {
11027     + err = -ESTALE;
11028     + goto out;
11029     + }
11030     +
11031     + lower_dentry = unionfs_lower_dentry(dentry);
11032     +
11033     + err = vfs_getxattr(lower_dentry, (char *) name, value, size);
11034     +
11035     +out:
11036     + unionfs_check_dentry(dentry);
11037     + unionfs_unlock_dentry(dentry);
11038     + unionfs_unlock_parent(dentry, parent);
11039     + unionfs_read_unlock(dentry->d_sb);
11040     + return err;
11041     +}
11042     +
11043     +/*
11044     + * BKL held by caller.
11045     + * dentry->d_inode->i_mutex locked
11046     + */
11047     +int unionfs_setxattr(struct dentry *dentry, const char *name,
11048     + const void *value, size_t size, int flags)
11049     +{
11050     + struct dentry *lower_dentry = NULL;
11051     + struct dentry *parent;
11052     + int err = -EOPNOTSUPP;
11053     + bool valid;
11054     +
11055     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11056     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11057     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11058     +
11059     + valid = __unionfs_d_revalidate(dentry, parent, false);
11060     + if (unlikely(!valid)) {
11061     + err = -ESTALE;
11062     + goto out;
11063     + }
11064     +
11065     + lower_dentry = unionfs_lower_dentry(dentry);
11066     +
11067     + err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
11068     + size, flags);
11069     +
11070     +out:
11071     + unionfs_check_dentry(dentry);
11072     + unionfs_unlock_dentry(dentry);
11073     + unionfs_unlock_parent(dentry, parent);
11074     + unionfs_read_unlock(dentry->d_sb);
11075     + return err;
11076     +}
11077     +
11078     +/*
11079     + * BKL held by caller.
11080     + * dentry->d_inode->i_mutex locked
11081     + */
11082     +int unionfs_removexattr(struct dentry *dentry, const char *name)
11083     +{
11084     + struct dentry *lower_dentry = NULL;
11085     + struct dentry *parent;
11086     + int err = -EOPNOTSUPP;
11087     + bool valid;
11088     +
11089     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11090     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11091     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11092     +
11093     + valid = __unionfs_d_revalidate(dentry, parent, false);
11094     + if (unlikely(!valid)) {
11095     + err = -ESTALE;
11096     + goto out;
11097     + }
11098     +
11099     + lower_dentry = unionfs_lower_dentry(dentry);
11100     +
11101     + err = vfs_removexattr(lower_dentry, (char *) name);
11102     +
11103     +out:
11104     + unionfs_check_dentry(dentry);
11105     + unionfs_unlock_dentry(dentry);
11106     + unionfs_unlock_parent(dentry, parent);
11107     + unionfs_read_unlock(dentry->d_sb);
11108     + return err;
11109     +}
11110     +
11111     +/*
11112     + * BKL held by caller.
11113     + * dentry->d_inode->i_mutex locked
11114     + */
11115     +ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
11116     +{
11117     + struct dentry *lower_dentry = NULL;
11118     + struct dentry *parent;
11119     + ssize_t err = -EOPNOTSUPP;
11120     + char *encoded_list = NULL;
11121     + bool valid;
11122     +
11123     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11124     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11125     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11126     +
11127     + valid = __unionfs_d_revalidate(dentry, parent, false);
11128     + if (unlikely(!valid)) {
11129     + err = -ESTALE;
11130     + goto out;
11131     + }
11132     +
11133     + lower_dentry = unionfs_lower_dentry(dentry);
11134     +
11135     + encoded_list = list;
11136     + err = vfs_listxattr(lower_dentry, encoded_list, size);
11137     +
11138     +out:
11139     + unionfs_check_dentry(dentry);
11140     + unionfs_unlock_dentry(dentry);
11141     + unionfs_unlock_parent(dentry, parent);
11142     + unionfs_read_unlock(dentry->d_sb);
11143     + return err;
11144     +}
11145     diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
11146     index da317c7..64f1ced 100644
11147     --- a/include/linux/fs_stack.h
11148     +++ b/include/linux/fs_stack.h
11149     @@ -1,7 +1,19 @@
11150     +/*
11151     + * Copyright (c) 2006-2009 Erez Zadok
11152     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
11153     + * Copyright (c) 2006-2009 Stony Brook University
11154     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
11155     + *
11156     + * This program is free software; you can redistribute it and/or modify
11157     + * it under the terms of the GNU General Public License version 2 as
11158     + * published by the Free Software Foundation.
11159     + */
11160     +
11161     #ifndef _LINUX_FS_STACK_H
11162     #define _LINUX_FS_STACK_H
11163    
11164     -/* This file defines generic functions used primarily by stackable
11165     +/*
11166     + * This file defines generic functions used primarily by stackable
11167     * filesystems; none of these functions require i_mutex to be held.
11168     */
11169    
11170     diff --git a/include/linux/magic.h b/include/linux/magic.h
11171     index eb9800f..9770154 100644
11172     --- a/include/linux/magic.h
11173     +++ b/include/linux/magic.h
11174     @@ -47,6 +47,8 @@
11175     #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
11176     #define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
11177    
11178     +#define UNIONFS_SUPER_MAGIC 0xf15f083d
11179     +
11180     #define SMB_SUPER_MAGIC 0x517B
11181     #define USBDEVICE_SUPER_MAGIC 0x9fa2
11182     #define CGROUP_SUPER_MAGIC 0x27e0eb
11183     diff --git a/include/linux/namei.h b/include/linux/namei.h
11184     index 05b441d..dca6f9a 100644
11185     --- a/include/linux/namei.h
11186     +++ b/include/linux/namei.h
11187     @@ -72,6 +72,7 @@ extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
11188    
11189     extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry *dentry,
11190     int (*open)(struct inode *, struct file *));
11191     +extern void release_open_intent(struct nameidata *);
11192    
11193     extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
11194    
11195     diff --git a/include/linux/splice.h b/include/linux/splice.h
11196     index 997c3b4..54f5501 100644
11197     --- a/include/linux/splice.h
11198     +++ b/include/linux/splice.h
11199     @@ -81,6 +81,11 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
11200     struct splice_pipe_desc *);
11201     extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
11202     splice_direct_actor *);
11203     +extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
11204     + loff_t *ppos, size_t len, unsigned int flags);
11205     +extern long vfs_splice_to(struct file *in, loff_t *ppos,
11206     + struct pipe_inode_info *pipe, size_t len,
11207     + unsigned int flags);
11208    
11209     /*
11210     * for dynamic pipe sizing
11211     diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
11212     new file mode 100644
11213     index 0000000..c84d97e
11214     --- /dev/null
11215     +++ b/include/linux/union_fs.h
11216     @@ -0,0 +1,22 @@
11217     +/*
11218     + * Copyright (c) 2003-2009 Erez Zadok
11219     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11220     + * Copyright (c) 2003-2009 Stony Brook University
11221     + * Copyright (c) 2003-2009 The Research Foundation of SUNY
11222     + *
11223     + * This program is free software; you can redistribute it and/or modify
11224     + * it under the terms of the GNU General Public License version 2 as
11225     + * published by the Free Software Foundation.
11226     + */
11227     +
11228     +#ifndef _LINUX_UNION_FS_H
11229     +#define _LINUX_UNION_FS_H
11230     +
11231     +/*
11232     + * DEFINITIONS FOR USER AND KERNEL CODE:
11233     + */
11234     +# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
11235     +# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
11236     +
11237     +#endif /* _LINUX_UNION_FS_H */
11238     +
11239     diff --git a/security/security.c b/security/security.c
11240     index c53949f..eb71394 100644
11241     --- a/security/security.c
11242     +++ b/security/security.c
11243     @@ -528,6 +528,7 @@ int security_inode_permission(struct inode *inode, int mask)
11244     return 0;
11245     return security_ops->inode_permission(inode, mask);
11246     }
11247     +EXPORT_SYMBOL(security_inode_permission);
11248    
11249     int security_inode_setattr(struct dentry *dentry, struct iattr *attr)
11250     {