Magellan Linux

Annotation of /trunk/kernel26-magellan/patches-2.6.39-r1/0153-2.6.39-unionfs-2.5.9.1.patch

Revision 1327
Fri May 27 12:09:46 2011 UTC (12 years, 11 months ago) by niro
File size: 339695 byte(s)
2.6.39-magellan-r1: using linux-2.6.39, fbcondecor-0.9.6, unionfs-2.5.9.1. dropped reiser4 and tuxonice support
1 niro 1327 diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
2     index 8c624a1..4aa288b 100644
3     --- a/Documentation/filesystems/00-INDEX
4     +++ b/Documentation/filesystems/00-INDEX
5     @@ -110,6 +110,8 @@ udf.txt
6     - info and mount options for the UDF filesystem.
7     ufs.txt
8     - info on the ufs filesystem.
9     +unionfs/
10     + - info on the unionfs filesystem
11     vfat.txt
12     - info on using the VFAT filesystem used in Windows NT and Windows 95
13     vfs.txt
14     diff --git a/Documentation/filesystems/unionfs/00-INDEX b/Documentation/filesystems/unionfs/00-INDEX
15     new file mode 100644
16     index 0000000..96fdf67
17     --- /dev/null
18     +++ b/Documentation/filesystems/unionfs/00-INDEX
19     @@ -0,0 +1,10 @@
20     +00-INDEX
21     + - this file.
22     +concepts.txt
23     + - A brief introduction of concepts.
24     +issues.txt
25     + - A summary of known issues with unionfs.
26     +rename.txt
27     + - Information regarding rename operations.
28     +usage.txt
29     + - Usage information and examples.
30     diff --git a/Documentation/filesystems/unionfs/concepts.txt b/Documentation/filesystems/unionfs/concepts.txt
31     new file mode 100644
32     index 0000000..b853788
33     --- /dev/null
34     +++ b/Documentation/filesystems/unionfs/concepts.txt
35     @@ -0,0 +1,287 @@
36     +Unionfs 2.x CONCEPTS:
37     +=====================
38     +
39     +This file describes the concepts needed by a namespace unification file
40     +system.
41     +
42     +
43     +Branch Priority:
44     +================
45     +
46     +Each branch is assigned a unique priority - starting from 0 (highest
47     +priority). No two branches can have the same priority.
48     +
49     +
50     +Branch Mode:
51     +============
52     +
53     +Each branch is assigned a mode - read-write or read-only. This allows
54     +directories on media mounted read-write to be used in a read-only manner.
55     +
56     +
57     +Whiteouts:
58     +==========
59     +
60     +A whiteout removes a file name from the namespace. Whiteouts are needed when
61     +one attempts to remove a file on a read-only branch.
62     +
63     +Suppose we have a two-branch union, where branch 0 is read-write and branch
64     +1 is read-only. And a file 'foo' on branch 1:
65     +
66     +./b0/
67     +./b1/
68     +./b1/foo
69     +
70     +The unified view would simply be:
71     +
72     +./union/
73     +./union/foo
74     +
75     +Since 'foo' is stored on a read-only branch, it cannot be removed. A
76     +whiteout is used to remove the name 'foo' from the unified namespace. Again,
77     +since branch 1 is read-only, the whiteout cannot be created there. So, we
78     +try on a higher priority (lower numerically) branch and create the whiteout
79     +there.
80     +
81     +./b0/
82     +./b0/.wh.foo
83     +./b1/
84     +./b1/foo
85     +
86     +Later, when Unionfs traverses branches (due to lookup or readdir), it
87     +eliminates 'foo' from the namespace (as well as the whiteout itself).
88     +
89     +
90     +Opaque Directories:
91     +===================
92     +
93     +Assume we have a unionfs mount comprising two branches. Branch 0 is
94     +empty; branch 1 has the directory /a and file /a/f. Let's say we mount a
95     +union of branch 0 as read-write and branch 1 as read-only. Now, let's say
96     +we try to perform the following operation in the union:
97     +
98     + rm -fr a
99     +
100     +Because branch 1 is not writable, we cannot physically remove the file /a/f
101     +or the directory /a. So instead, we will create a whiteout in branch 0
102     +named /.wh.a, masking out the name "a" from branch 1. Next, let's say we
103     +try to create a directory named "a" as follows:
104     +
105     + mkdir a
106     +
107     +Because we have a whiteout for "a" already, Unionfs behaves as if "a"
108     +doesn't exist, and thus will delete the whiteout and replace it with an
109     +actual directory named "a".
110     +
111     +The problem now is that if you try to "ls" in the union, Unionfs will
112     +perform its normal directory name unification, for *all* directories named
113     +"a" in all branches. This will cause the file /a/f from branch 1 to
114     +re-appear in the union's namespace, which violates Unix semantics.
115     +
116     +To avoid this problem, we have a different form of whiteouts for
117     +directories, called "opaque directories" (same as BSD Union Mount does).
118     +Whenever we replace a whiteout with a directory, that directory is marked as
119     +opaque. In Unionfs 2.x, it means that we create a file named
120     +/a/.wh.__dir_opaque in branch 0, after having created directory /a there.
121     +When unionfs notices that a directory is opaque, it stops all namespace
122     +operations (including merging readdir contents) at that opaque directory.
123     +This prevents re-exposing names from masked out directories.
124     +
125     +
126     +Duplicate Elimination:
127     +======================
128     +
129     +It is possible for files on different branches to have the same name.
130     +Unionfs then has to select which instance of the file to show to the user.
131     +Given the fact that each branch has a priority associated with it, the
132     +simplest solution is to take the instance from the highest priority
133     +(numerically lowest value) and "hide" the others.
134     +
135     +
136     +Unlinking:
137     +==========
138     +
139     +Unlink operation on non-directory instances is optimized to remove the
140     +maximum possible objects in case multiple underlying branches have the same
141     +file name. The unlink operation will first try to delete file instances
142     +from highest priority branch and then move further to delete from remaining
143     +branches in order of their decreasing priority. Consider a case (F..D..F),
144     +where F is a file and D is a directory of the same name; here, some
145     +intermediate branch could have an empty directory instance with the same
146     +name, so this operation also tries to delete this directory instance and
147     +proceed further to delete from next possible lower priority branch. The
148     +unionfs unlink operation will smoothly delete the files with same name from
149     +all possible underlying branches. If an error occurs, it creates a
150     +whiteout in the highest priority branch that will hide file instances in the
151     +remaining branches. An error could occur either if an unlink operation in any
152     +of the underlying branches failed or if a branch has no write permission.
153     +
154     +This unlinking policy is known as "delete all" and it has the benefit of
155     +overall reducing the number of inodes used by duplicate files, and further
156     +reducing the total number of inodes consumed by whiteouts. The cost is
157     +extra processing, but testing shows this extra processing is well worth the
158     +savings.
159     +
160     +
161     +Copyup:
162     +=======
163     +
164     +When a change is made to a file's data or meta-data, the change has to be
165     +stored somewhere. The best way is to create a copy of the
166     +original file on a branch that is writable, and then redirect the write
167     +through to this copy. The copy must be made on a higher priority branch so
168     +that lookup and readdir return this newer "version" of the file rather than
169     +the original (see duplicate elimination).
170     +
171     +An entire unionfs mount can be read-only or read-write. If it's read-only,
172     +then none of the branches will be written to, even if some of the branches
173     +are physically writeable. If the unionfs mount is read-write, then the
174     +leftmost (highest priority) branch must be writeable (for copyup to take
175     +place); the remaining branches can be any mix of read-write and read-only.
176     +
177     +In a writeable mount, unionfs will create new files/dir in the leftmost
178     +branch. If one tries to modify a file in a read-only branch/media, unionfs
179     +will copyup the file to the leftmost branch and modify it there. If you try
180     +to modify a file from a writeable branch which is not the leftmost branch,
181     +then unionfs will modify it in that branch; this is useful if you, say,
182     +unify different packages (e.g., apache, sendmail, ftpd, etc.) and you want
183     +changes to specific package files to remain logically in the directory where
184     +they came from.
185     +
186     +Cache Coherency:
187     +================
188     +
189     +Unionfs users often want to be able to modify files and directories directly
190     +on the lower branches, and have those changes be visible at the Unionfs
191     +level. This means that data (e.g., pages) and meta-data (dentries, inodes,
192     +open files, etc.) have to be synchronized between the upper and lower
193     +layers. In other words, the newest changes from a layer below have to be
194     +propagated to the Unionfs layer above. If the two layers are not in sync, a
195     +cache incoherency ensues, which could lead to application failures and even
196     +oopses. The Linux kernel, however, has a rather limited set of mechanisms
197     +to ensure this inter-layer cache coherency---so Unionfs has to do most of
198     +the hard work on its own.
199     +
200     +Maintaining Invariants:
201     +
202     +The way Unionfs ensures cache coherency is as follows. At each entry point
203     +to a Unionfs file system method, we call a utility function to validate the
204     +primary objects of this method. Generally, we call unionfs_file_revalidate
205     +on open files, and __unionfs_d_revalidate_chain on dentries (which also
206     +validates inodes). These utility functions check to see whether the upper
207     +Unionfs object is in sync with any of the lower objects that it represents.
208     +The checks we perform include whether the Unionfs superblock has a newer
209     +generation number, or if any of the lower objects mtime's or ctime's are
210     +newer. (Note: generation numbers change when branch-management commands are
211     +issued, so in a way, maintaining cache coherency is also very important for
212     +branch-management.) If indeed we determine that any Unionfs object is no
213     +longer in sync with its lower counterparts, then we rebuild that object
214     +similarly to how we do so for branch-management.
215     +
216     +While rebuilding Unionfs's objects, we also purge any page mappings and
217     +truncate inode pages (see fs/unionfs/dentry.c:purge_inode_data). This is to
218     +ensure that Unionfs will re-get the newer data from the lower branches. We
219     +perform this purging only if the Unionfs operation in question is a reading
220     +operation; if Unionfs is performing a data writing operation (e.g., ->write,
221     +->commit_write, etc.) then we do NOT flush the lower mappings/pages: this is
222     +because (1) a self-deadlock could occur and (2) the upper Unionfs pages are
223     +considered more authoritative anyway, as they are newer and will overwrite
224     +any lower pages.
225     +
226     +Unionfs maintains the following important invariant regarding mtime's,
227     +ctime's, and atime's: the upper inode object's times are the max() of all of
228     +the lower ones. For non-directory objects, there's only one object below,
229     +so the mapping is simple; for directory objects, there could be multiple
230     +lower objects and we have to sync up with the newest one of all the lower
231     +ones. This invariant is important to maintain, especially for directories
232     +(besides, we need this to be POSIX compliant). A union could comprise
233     +multiple writable branches, each of which could change. If we don't reflect
234     +the newest possible mtime/ctime, some applications could fail. For example,
235     +NFSv2/v3 exports check for newer directory mtimes on the server to determine
236     +if the client-side attribute cache should be purged.
237     +
238     +To maintain these important invariants, of course, Unionfs carefully
239     +synchronizes upper and lower times in various places. For example, if we
240     +copy-up a file to a top-level branch, the parent directory where the file
241     +was copied up to will now have a new mtime: so after a successful copy-up,
242     +we sync up with the new top-level branch's parent directory mtime.
243     +
244     +Implementation:
245     +
246     +This cache-coherency implementation is efficient because it defers any
247     +synchronizing between the upper and lower layers until absolutely needed.
248     +Consider, for example, a common situation where users perform a lot of lower
249     +changes, such as untarring a whole package. While these take place,
250     +typically the user doesn't access the files via Unionfs; only after the
251     +lower changes are done, does the user try to access the lower files. With
252     +our cache-coherency implementation, the entirety of the changes to the lower
253     +branches will not result in a single CPU cycle spent at the Unionfs level
254     +until the user invokes a system call that goes through Unionfs.
255     +
256     +We have considered two alternate cache-coherency designs. (1) Using the
257     +dentry/inode notify functionality to register interest in finding out about
258     +any lower changes. This is a somewhat limited and also a heavy-handed
259     +approach which could result in many notifications to the Unionfs layer upon
260     +each small change at the lower layer (imagine a file being modified multiple
261     +times in rapid succession). (2) Rewriting the VFS to support explicit
262     +callbacks from lower objects to upper objects. We began exploring such an
263     +implementation, but found it to be very complicated--it would have resulted
264     +in massive VFS/MM changes which are unlikely to be accepted by the LKML
265     +community. We therefore believe that our current cache-coherency design and
266     +implementation represent the best approach at this time.
267     +
268     +Limitations:
269     +
270     +Our implementation works as long as a user process has caused
271     +Unionfs to be called, directly or indirectly, even just to do
272     +->d_revalidate; we will then have purged the current Unionfs data and the
273     +process will see the new data. For example, a process that continually
274     +re-reads the same file's data will see the NEW data as soon as the lower
275     +file has changed, upon the next read(2) syscall (even if the file is still
276     +open!). However, this doesn't work when the process re-reads the open file's
277     +data via mmap(2) (unless the user unmaps/closes the file and remaps/reopens
278     +it). Once we respond to ->readpage(s), then the kernel maps the page into
279     +the process's address space and there doesn't appear to be a way to force
280     +the kernel to invalidate those pages/mappings, and force the process to
281     +re-issue ->readpage. If there's a way to invalidate active mappings and
282     +force a ->readpage, let us know please (invalidate_inode_pages2 doesn't do
283     +the trick).
284     +
285     +Our current Unionfs code has to perform many file-revalidation calls. It
286     +would be really nice if the VFS would export an optional file system hook
287     +->file_revalidate (similarly to dentry->d_revalidate) that will be called
288     +before each VFS op that has a "struct file" in it.
289     +
290     +Certain file systems have micro-second granularity (or better) for inode
291     +times, and asynchronous actions could cause those times to change with some
292     +small delay. In such cases, Unionfs may see a changed inode time that only
293     +differs by a tiny fraction of a second: such a change may be a false
294     +positive indication that the lower object has changed, whereas if unionfs
295     +waits a little longer, that false indication will not be seen. (These false
296     +positives are harmless, because they would at most cause unionfs to
297     +re-validate an object that may need no revalidation, and print a debugging
298     +message that clutters the console/logs.) Therefore, to minimize the chances
299     +of these situations, we delay the detection of changed times by a small
300     +factor of a few seconds, called UNIONFS_MIN_CC_TIME (which defaults to 3
301     +seconds, as does NFS). This means that we will detect the change, only a
302     +couple of seconds later, if indeed the time change persists in the lower
303     +file object. This delayed detection has an added performance benefit: we
304     +reduce the number of times that unionfs has to revalidate objects, in case
305     +there's a lot of concurrent activity on both the upper and lower objects,
306     +for the same file(s). Lastly, this delayed time attribute detection is
307     +similar to how NFS clients operate (e.g., acregmin).
308     +
309     +Finally, there is no way currently in Linux to prevent lower directories
310     +from being moved around (i.e., topology changes); there's no way to prevent
311     +modifications to directory sub-trees of whole file systems which are mounted
312     +read-write. It is therefore possible for in-flight operations in unionfs to
313     +take place, while a lower directory is being moved around. Therefore, if
314     +you try to, say, create a new file in a directory through unionfs, while the
315     +directory is being moved around directly, then the new file may get created
316     +in the new location where that directory was moved to. This is somewhat
317     +similar to the behaviour of NFS: an NFS client could be creating a new file while
318     +the NFS server is moving the directory around; the file will get successfully
319     +created in the new location. (The one exception in unionfs is that if the
320     +branch is marked read-only by unionfs, then a copyup will take place.)
321     +
322     +For more information, see <http://unionfs.filesystems.org/>.
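The whiteout, opaque-directory, duplicate-elimination and copyup rules described in concepts.txt above can be modelled in a few lines. This is only an illustrative sketch in Python, not Unionfs code: the names union_listdir and write_branch are hypothetical, each branch is reduced to a set of entry names for a single directory, and the mount is assumed to be read-write so that branch 0 is writable.

```python
WH_PREFIX = ".wh."                  # whiteout prefix, as in ./b0/.wh.foo
OPAQUE_MARKER = ".wh.__dir_opaque"  # opaque-directory marker named in the text


def union_listdir(branches):
    """Merge one directory's entry names across branches.

    branches[0] is the highest-priority branch; each item is a set of the
    names physically present in that branch's copy of the directory.
    (Hypothetical helper, for illustration only.)
    """
    visible = []
    hidden = set()
    for names in branches:
        for name in sorted(names):
            if name.startswith(WH_PREFIX):
                continue  # whiteouts and the opaque marker are never shown
            if name not in hidden:
                visible.append(name)
                hidden.add(name)  # duplicate elimination for lower branches
        # A whiteout in this branch masks the name in lower-priority branches.
        for name in names:
            if name.startswith(WH_PREFIX) and name != OPAQUE_MARKER:
                hidden.add(name[len(WH_PREFIX):])
        # An opaque directory stops the merge at this branch.
        if OPAQUE_MARKER in names:
            break
    return visible


def write_branch(modes, found_in):
    """Pick the branch index that receives a modification.

    modes is a list of "rw"/"ro" strings, leftmost branch first. A file found
    in a writable branch is modified in place; otherwise it is copied up to
    the leftmost branch (assumed writable because the mount is read-write).
    """
    if modes[found_in] == "rw":
        return found_in
    return 0
```

For instance, union_listdir([{".wh.foo"}, {"foo", "bar"}]) yields only "bar", which matches the two-branch whiteout example in the text, and write_branch(["rw", "ro"], 1) selects branch 0 as the copyup target.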
323     diff --git a/Documentation/filesystems/unionfs/issues.txt b/Documentation/filesystems/unionfs/issues.txt
324     new file mode 100644
325     index 0000000..f4b7e7e
326     --- /dev/null
327     +++ b/Documentation/filesystems/unionfs/issues.txt
328     @@ -0,0 +1,28 @@
329     +KNOWN Unionfs 2.x ISSUES:
330     +=========================
331     +
332     +1. Unionfs should not use lookup_one_len() on the underlying f/s as it
333     + confuses NFSv4. Currently, unionfs_lookup() passes lookup intents to the
334     + lower file-system; this eliminates part of the problem. The remaining
335     + calls to lookup_one_len may need to be changed to pass an intent. We are
336     + currently introducing VFS changes to fs/namei.c's do_path_lookup() to
337     + allow proper file lookup and opening in stackable file systems.
338     +
339     +2. Lockdep (a debugging feature) isn't aware of stacking, and so it
340     + incorrectly complains about locking problems. The problem boils down to
341     + this: Lockdep considers all objects of a certain type to be in the same
342     + class, for example, all inodes. Lockdep doesn't like to see a lock held
343     + on two inodes within the same task, and warns that it could lead to a
344     + deadlock. However, stackable file systems do precisely that: they lock
345     + an upper object, and then a lower object, in a strict order to avoid
346     + locking problems; in addition, Unionfs, as a fan-out file system, may
347     + have to lock several lower inodes. We are currently looking into Lockdep
348     + to see how to make it aware of stackable file systems. For now, we
349     + temporarily disable lockdep when calling vfs methods on lower objects,
350     + but only for those places where lockdep complained. While this solution
351     + may seem unclean, it is not without precedent: other places in the kernel
352     + also do similar temporary disabling, of course after carefully having
353     + checked that it is the right thing to do. Anyway, if you get any warnings
354     + from Lockdep, please report them to the Unionfs maintainers.
355     +
356     +For more information, see <http://unionfs.filesystems.org/>.
357     diff --git a/Documentation/filesystems/unionfs/rename.txt b/Documentation/filesystems/unionfs/rename.txt
358     new file mode 100644
359     index 0000000..e20bb82
360     --- /dev/null
361     +++ b/Documentation/filesystems/unionfs/rename.txt
362     @@ -0,0 +1,31 @@
363     +Rename is a complex beast. The following table shows which rename(2) operations
364     +should succeed and which should fail.
365     +
366     +o: success
367     +E: error (either unionfs or vfs)
368     +X: EXDEV
369     +
370     +none = file does not exist
371     +file = file is a file
372     +dir = file is an empty directory
373     +child= file is a non-empty directory
374     +wh = file is a directory containing only whiteouts; this makes it logically
375     + empty
376     +
377     +         none    file    dir     child   wh
378     +file     o       o       E       E       E
379     +dir      o       E       o       E       o
380     +child    X       E       X       E       X
381     +wh       o       E       o       E       o
382     +
383     +
384     +Renaming directories:
385     +=====================
386     +
387     +Whenever an empty (either physically or logically) directory is being renamed,
388     +the following sequence of events should take place:
389     +
390     +1) Remove whiteouts from both source and destination directory
391     +2) Rename source to destination
392     +3) Make destination opaque to prevent anything under it from showing up
393     +
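The rename table in rename.txt above can be encoded as a small lookup. This is a hedged sketch in Python: the text does not label the table's axes, so the code simply speaks of "row" and "column". Reading rows as the rename source and columns as the destination is an assumption, suggested only by the fact that "none" appears as a column but not as a row. The name rename_outcome is hypothetical.

```python
# Column order as printed in the table header.
COLUMNS = ("none", "file", "dir", "child", "wh")

# One string per table row; each character is the cell for that column.
TABLE = {
    "file":  "ooEEE",
    "dir":   "oEoEo",
    "child": "XEXEX",
    "wh":    "oEoEo",
}

# Legend from the text: o = success, E = error, X = EXDEV.
OUTCOME = {"o": "success", "E": "error", "X": "EXDEV"}


def rename_outcome(row, column):
    """Return the documented outcome for one cell of the rename table."""
    cell = TABLE[row][COLUMNS.index(column)]
    return OUTCOME[cell]
```

Under the assumed reading of the axes, rename_outcome("child", "none") reports EXDEV: renaming a non-empty directory to a name that does not exist is refused with a cross-device error.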
394     diff --git a/Documentation/filesystems/unionfs/usage.txt b/Documentation/filesystems/unionfs/usage.txt
395     new file mode 100644
396     index 0000000..1adde69
397     --- /dev/null
398     +++ b/Documentation/filesystems/unionfs/usage.txt
399     @@ -0,0 +1,134 @@
400     +Unionfs is a stackable unification file system, which can appear to merge
401     +the contents of several directories (branches), while keeping their physical
402     +content separate. Unionfs is useful for unified source tree management,
403     +merged contents of split CD-ROM, merged separate software package
404     +directories, data grids, and more. Unionfs allows any mix of read-only and
405     +read-write branches, as well as insertion and deletion of branches anywhere
406     +in the fan-out. To maintain Unix semantics, Unionfs handles elimination of
407     +duplicates, partial-error conditions, and more.
408     +
409     +GENERAL SYNTAX
410     +==============
411     +
412     +# mount -t unionfs -o <OPTIONS>,<BRANCH-OPTIONS> none MOUNTPOINT
413     +
414     +OPTIONS can be any legal combination of:
415     +
416     +- ro # mount file system read-only
417     +- rw # mount file system read-write
418     +- remount # remount the file system (see Branch Management below)
419     +- incgen # increment generation no. (see Cache Consistency below)
420     +
421     +BRANCH-OPTIONS can be either (1) a list of branches given to the "dirs="
422     +option, or (2) a list of individual branch manipulation commands, combined
423     +with the "remount" option, and is further described in the "Branch
424     +Management" section below.
425     +
426     +The syntax for the "dirs=" mount option is:
427     +
428     + dirs=branch[=ro|=rw][:...]
429     +
430     +The "dirs=" option takes a colon-delimited list of directories to compose
431     +the union, with an optional branch mode for each of those directories.
432     +Directories that come earlier (specified first, on the left) in the list
433     +have a higher precedence than those which come later. Additionally,
434     +read-only or read-write permissions of the branch can be specified by
435     +appending =ro or =rw (default) to each directory. See the Copyup section in
436     +concepts.txt, for a description of Unionfs's behavior when mixing read-only
437     +and read-write branches and mounts.
438     +
439     +Syntax:
440     +
441     + dirs=/branch1[=ro|=rw]:/branch2[=ro|=rw]:...:/branchN[=ro|=rw]
442     +
443     +Example:
444     +
445     + dirs=/writable_branch=rw:/read-only_branch=ro
446     +
447     +
448     +BRANCH MANAGEMENT
449     +=================
450     +
451     +Once you mount your union for the first time, using the "dirs=" option, you
452     +can then change the union's overall mode or reconfigure the branches, using
453     +the remount option, as follows.
454     +
455     +To downgrade a union from read-write to read-only:
456     +
457     +# mount -t unionfs -o remount,ro none MOUNTPOINT
458     +
459     +To upgrade a union from read-only to read-write:
460     +
461     +# mount -t unionfs -o remount,rw none MOUNTPOINT
462     +
463     +To delete a branch /foo, regardless where it is in the current union:
464     +
465     +# mount -t unionfs -o remount,del=/foo none MOUNTPOINT
466     +
467     +To insert (add) a branch /foo before /bar:
468     +
469     +# mount -t unionfs -o remount,add=/bar:/foo none MOUNTPOINT
470     +
471     +To insert (add) a branch /foo (with the "rw" mode flag) before /bar:
472     +
473     +# mount -t unionfs -o remount,add=/bar:/foo=rw none MOUNTPOINT
474     +
475     +To insert (add) a branch /foo (in "rw" mode) at the very beginning (i.e., a
476     +new highest-priority branch), you can use the above syntax, or use a
477     +shorthand version as follows:
478     +
479     +# mount -t unionfs -o remount,add=/foo none MOUNTPOINT
480     +
481     +To append a branch to the very end (new lowest-priority branch):
482     +
483     +# mount -t unionfs -o remount,add=:/foo none MOUNTPOINT
484     +
485     +To append a branch to the very end (new lowest-priority branch), in
486     +read-only mode:
487     +
488     +# mount -t unionfs -o remount,add=:/foo=ro none MOUNTPOINT
489     +
490     +Finally, to change the mode of one existing branch, say /foo, from read-only
491     +to read-write, and change /bar from read-write to read-only:
492     +
493     +# mount -t unionfs -o remount,mode=/foo=rw,mode=/bar=ro none MOUNTPOINT
494     +
495     +Note: in Unionfs 2.x, you cannot set the leftmost branch to readonly because
496     +then Unionfs won't have any writable place for copyups to take place.
497     +Moreover, the VFS can get confused when it tries to modify something in a
498     +file system mounted read-write, but isn't permitted to write to it.
499     +Instead, you should set the whole union as readonly, as described above.
500     +If, however, you must set the leftmost branch as readonly, perhaps so you
501     +can get a snapshot of it at a point in time, then you should insert a new
502     +writable top-level branch, and mark the one you want as readonly. This can
503     +be accomplished as follows, assuming that /foo is your current leftmost
504     +branch:
505     +
506     +# mount -t tmpfs -o size=NNN none /new
507     +# mount -t unionfs -o remount,add=/new,mode=/foo=ro none MOUNTPOINT
508     +<do what you want safely in /foo>
509     +# mount -t unionfs -o remount,del=/new,mode=/foo=rw none MOUNTPOINT
510     +<check if there's anything in /new you want to preserve>
511     +# umount /new
512     +
513     +CACHE CONSISTENCY
514     +=================
515     +
516     +If you modify any file on any of the lower branches directly, while there is
517     +a Unionfs 2.x mounted above any of those branches, you should tell Unionfs
518     +to purge its caches and re-get the objects. To do that, you have to
519     +increment the generation number of the superblock using the following
520     +command:
521     +
522     +# mount -t unionfs -o remount,incgen none MOUNTPOINT
523     +
524     +Note that the older way of incrementing the generation number, using an
525     +ioctl, is no longer supported in Unionfs 2.0 and newer. Ioctls in general
526     +are not encouraged. Plus, an ioctl is a per-file concept, whereas the
527     +generation number is a per-file-system concept. Worse, such an ioctl
528     +requires an open file, which then has to be invalidated by the very nature
529     +of the generation number increase (read: the old generation increase ioctl
530     +was pretty racy).
531     +
532     +
533     +For more information, see <http://unionfs.filesystems.org/>.
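The "dirs=" syntax and the general mount form from usage.txt above can be assembled programmatically. The following is a minimal sketch in Python that only builds the argument list and does not run mount. The function names dirs_option and mount_command are hypothetical, and the sketch assumes branch paths contain no ':' or '=' characters, since those act as delimiters in the option string.

```python
def dirs_option(branches):
    """Build the dirs= option from (path, mode) pairs, leftmost branch first.

    mode is "ro", "rw", or None to rely on the documented default (rw).
    """
    parts = []
    for path, mode in branches:
        if mode not in (None, "ro", "rw"):
            raise ValueError("branch mode must be 'ro', 'rw' or None")
        if ":" in path or "=" in path:
            raise ValueError("path would be ambiguous in dirs= syntax")
        parts.append(path if mode is None else path + "=" + mode)
    return "dirs=" + ":".join(parts)


def mount_command(mountpoint, branches, read_only=False):
    """Return the argv for: mount -t unionfs -o OPTIONS,dirs=... none MOUNTPOINT."""
    options = ("ro" if read_only else "rw") + "," + dirs_option(branches)
    return ["mount", "-t", "unionfs", "-o", options, "none", mountpoint]
```

With the example from the text, dirs_option([("/writable_branch", "rw"), ("/read-only_branch", "ro")]) produces "dirs=/writable_branch=rw:/read-only_branch=ro". The returned argv could then be passed to subprocess.run by a caller with sufficient privileges.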
534     diff --git a/MAINTAINERS b/MAINTAINERS
535     index 2199ba1..4ca288d 100644
536     --- a/MAINTAINERS
537     +++ b/MAINTAINERS
538     @@ -6305,6 +6305,14 @@ F: Documentation/cdrom/
539     F: drivers/cdrom/cdrom.c
540     F: include/linux/cdrom.h
541    
542     +UNIONFS
543     +P: Erez Zadok
544     +M: ezk@cs.sunysb.edu
545     +L: unionfs@filesystems.org
546     +W: http://unionfs.filesystems.org/
547     +T: git git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git
548     +S: Maintained
549     +
550     UNSORTED BLOCK IMAGES (UBI)
551     M: Artem Bityutskiy <dedekind1@gmail.com>
552     W: http://www.linux-mtd.infradead.org/
553     diff --git a/fs/Kconfig b/fs/Kconfig
554     index f3aa9b0..0e6182c 100644
555     --- a/fs/Kconfig
556     +++ b/fs/Kconfig
557     @@ -170,6 +170,7 @@ if MISC_FILESYSTEMS
558     source "fs/adfs/Kconfig"
559     source "fs/affs/Kconfig"
560     source "fs/ecryptfs/Kconfig"
561     +source "fs/unionfs/Kconfig"
562     source "fs/hfs/Kconfig"
563     source "fs/hfsplus/Kconfig"
564     source "fs/befs/Kconfig"
565     diff --git a/fs/Makefile b/fs/Makefile
566     index fb68c2b..8ca9290 100644
567     --- a/fs/Makefile
568     +++ b/fs/Makefile
569     @@ -83,6 +83,7 @@ obj-$(CONFIG_ISO9660_FS) += isofs/
570     obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+
571     obj-$(CONFIG_HFS_FS) += hfs/
572     obj-$(CONFIG_ECRYPT_FS) += ecryptfs/
573     +obj-$(CONFIG_UNION_FS) += unionfs/
574     obj-$(CONFIG_VXFS_FS) += freevxfs/
575     obj-$(CONFIG_NFS_FS) += nfs/
576     obj-$(CONFIG_EXPORTFS) += exportfs/
577     diff --git a/fs/namei.c b/fs/namei.c
578     index 54fc993..276e262 100644
579     --- a/fs/namei.c
580     +++ b/fs/namei.c
581     @@ -578,6 +578,7 @@ void release_open_intent(struct nameidata *nd)
582     fput(file);
583     }
584     }
585     +EXPORT_SYMBOL_GPL(release_open_intent);
586    
587     static inline int d_revalidate(struct dentry *dentry, struct nameidata *nd)
588     {
589     @@ -1819,6 +1820,42 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
590     return __lookup_hash(&this, base, NULL);
591     }
592    
593     +/* pass nameidata from caller (useful for NFS) */
594     +struct dentry *lookup_one_len_nd(const char *name, struct dentry *base,
595     +				 int len, struct nameidata *nd)
596     +{
597     +	struct qstr this;
598     +	unsigned long hash;
599     +	unsigned int c;
600     +
601     +	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
602     +
603     +	this.name = name;
604     +	this.len = len;
605     +	if (!len)
606     +		return ERR_PTR(-EACCES);
607     +
608     +	hash = init_name_hash();
609     +	while (len--) {
610     +		c = *(const unsigned char *)name++;
611     +		if (c == '/' || c == '\0')
612     +			return ERR_PTR(-EACCES);
613     +		hash = partial_name_hash(c, hash);
614     +	}
615     +	this.hash = end_name_hash(hash);
616     +	/*
617     +	 * See if the low-level filesystem might want
618     +	 * to use its own hash..
619     +	 */
620     +	if (base->d_flags & DCACHE_OP_HASH) {
621     +		int err = base->d_op->d_hash(base, base->d_inode, &this);
622     +		if (err < 0)
623     +			return ERR_PTR(err);
624     +	}
625     +
626     +	return __lookup_hash(&this, base, nd);
627     +}
628     +
629     int user_path_at(int dfd, const char __user *name, unsigned flags,
630     struct path *path)
631     {
632     @@ -3422,6 +3459,7 @@ EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
633     EXPORT_SYMBOL(getname);
634     EXPORT_SYMBOL(lock_rename);
635     EXPORT_SYMBOL(lookup_one_len);
636     +EXPORT_SYMBOL(lookup_one_len_nd);
637     EXPORT_SYMBOL(page_follow_link_light);
638     EXPORT_SYMBOL(page_put_link);
639     EXPORT_SYMBOL(page_readlink);
640     diff --git a/fs/splice.c b/fs/splice.c
641     index 50a5d978..a3af841 100644
642     --- a/fs/splice.c
643     +++ b/fs/splice.c
644     @@ -1081,8 +1081,8 @@ EXPORT_SYMBOL(generic_splice_sendpage);
645     /*
646     * Attempt to initiate a splice from pipe to file.
647     */
648     -static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
649     - loff_t *ppos, size_t len, unsigned int flags)
650     +long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
651     + loff_t *ppos, size_t len, unsigned int flags)
652     {
653     ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
654     loff_t *, size_t, unsigned int);
655     @@ -1105,13 +1105,14 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
656    
657     return splice_write(pipe, out, ppos, len, flags);
658     }
659     +EXPORT_SYMBOL_GPL(vfs_splice_from);
660    
661     /*
662     * Attempt to initiate a splice from a file to a pipe.
663     */
664     -static long do_splice_to(struct file *in, loff_t *ppos,
665     - struct pipe_inode_info *pipe, size_t len,
666     - unsigned int flags)
667     +long vfs_splice_to(struct file *in, loff_t *ppos,
668     + struct pipe_inode_info *pipe, size_t len,
669     + unsigned int flags)
670     {
671     ssize_t (*splice_read)(struct file *, loff_t *,
672     struct pipe_inode_info *, size_t, unsigned int);
673     @@ -1131,6 +1132,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
674    
675     return splice_read(in, ppos, pipe, len, flags);
676     }
677     +EXPORT_SYMBOL_GPL(vfs_splice_to);
678    
679     /**
680     * splice_direct_to_actor - splices data directly between two non-pipes
681     @@ -1200,7 +1202,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
682     size_t read_len;
683     loff_t pos = sd->pos, prev_pos = pos;
684    
685     - ret = do_splice_to(in, &pos, pipe, len, flags);
686     + ret = vfs_splice_to(in, &pos, pipe, len, flags);
687     if (unlikely(ret <= 0))
688     goto out_release;
689    
690     @@ -1259,8 +1261,8 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
691     {
692     struct file *file = sd->u.file;
693    
694     - return do_splice_from(pipe, file, &file->f_pos, sd->total_len,
695     - sd->flags);
696     + return vfs_splice_from(pipe, file, &file->f_pos, sd->total_len,
697     + sd->flags);
698     }
699    
700     /**
701     @@ -1345,7 +1347,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
702     } else
703     off = &out->f_pos;
704    
705     - ret = do_splice_from(ipipe, out, off, len, flags);
706     + ret = vfs_splice_from(ipipe, out, off, len, flags);
707    
708     if (off_out && copy_to_user(off_out, off, sizeof(loff_t)))
709     ret = -EFAULT;
710     @@ -1365,7 +1367,7 @@ static long do_splice(struct file *in, loff_t __user *off_in,
711     } else
712     off = &in->f_pos;
713    
714     - ret = do_splice_to(in, off, opipe, len, flags);
715     + ret = vfs_splice_to(in, off, opipe, len, flags);
716    
717     if (off_in && copy_to_user(off_in, off, sizeof(loff_t)))
718     ret = -EFAULT;
719     diff --git a/fs/stack.c b/fs/stack.c
720     index 4a6f7f4..7eeef12 100644
721     --- a/fs/stack.c
722     +++ b/fs/stack.c
723     @@ -1,8 +1,20 @@
724     +/*
725     + * Copyright (c) 2006-2009 Erez Zadok
726     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
727     + * Copyright (c) 2006-2009 Stony Brook University
728     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
729     + *
730     + * This program is free software; you can redistribute it and/or modify
731     + * it under the terms of the GNU General Public License version 2 as
732     + * published by the Free Software Foundation.
733     + */
734     +
735     #include <linux/module.h>
736     #include <linux/fs.h>
737     #include <linux/fs_stack.h>
738    
739     -/* does _NOT_ require i_mutex to be held.
740     +/*
741     + * does _NOT_ require i_mutex to be held.
742     *
743     * This function cannot be inlined since i_size_{read,write} is rather
744     * heavy-weight on 32-bit systems
745     diff --git a/fs/unionfs/Kconfig b/fs/unionfs/Kconfig
746     new file mode 100644
747     index 0000000..f3c1ac4
748     --- /dev/null
749     +++ b/fs/unionfs/Kconfig
750     @@ -0,0 +1,24 @@
751     +config UNION_FS
752     + tristate "Union file system (EXPERIMENTAL)"
753     + depends on EXPERIMENTAL
754     + help
755     + Unionfs is a stackable unification file system, which appears to
756     + merge the contents of several directories (branches), while keeping
757     + their physical content separate.
758     +
759     + See <http://unionfs.filesystems.org> for details
760     +
761     +config UNION_FS_XATTR
762     + bool "Unionfs extended attributes"
763     + depends on UNION_FS
764     + help
765     + Extended attributes are name:value pairs associated with inodes by
766     + the kernel or by users (see the attr(5) manual page).
767     +
768     + If unsure, say N.
769     +
770     +config UNION_FS_DEBUG
771     + bool "Debug Unionfs"
772     + depends on UNION_FS
773     + help
774     + If you say Y here, you can turn on debugging output from Unionfs.
775     diff --git a/fs/unionfs/Makefile b/fs/unionfs/Makefile
776     new file mode 100644
777     index 0000000..1fc8d91
778     --- /dev/null
779     +++ b/fs/unionfs/Makefile
780     @@ -0,0 +1,17 @@
781     +UNIONFS_VERSION="2.5.9.1 (for 2.6.39-rc5)"
782     +
783     +EXTRA_CFLAGS += -DUNIONFS_VERSION=\"$(UNIONFS_VERSION)\"
784     +
785     +obj-$(CONFIG_UNION_FS) += unionfs.o
786     +
787     +unionfs-y := subr.o dentry.o file.o inode.o main.o super.o \
788     + rdstate.o copyup.o dirhelper.o rename.o unlink.o \
789     + lookup.o commonfops.o dirfops.o sioq.o mmap.o whiteout.o
790     +
791     +unionfs-$(CONFIG_UNION_FS_XATTR) += xattr.o
792     +
793     +unionfs-$(CONFIG_UNION_FS_DEBUG) += debug.o
794     +
795     +ifeq ($(CONFIG_UNION_FS_DEBUG),y)
796     +EXTRA_CFLAGS += -DDEBUG
797     +endif
798     diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
799     new file mode 100644
800     index 0000000..0a271f4
801     --- /dev/null
802     +++ b/fs/unionfs/commonfops.c
803     @@ -0,0 +1,896 @@
804     +/*
805     + * Copyright (c) 2003-2011 Erez Zadok
806     + * Copyright (c) 2003-2006 Charles P. Wright
807     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
808     + * Copyright (c) 2005-2006 Junjiro Okajima
809     + * Copyright (c) 2005 Arun M. Krishnakumar
810     + * Copyright (c) 2004-2006 David P. Quigley
811     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
812     + * Copyright (c) 2003 Puja Gupta
813     + * Copyright (c) 2003 Harikesavan Krishnan
814     + * Copyright (c) 2003-2011 Stony Brook University
815     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
816     + *
817     + * This program is free software; you can redistribute it and/or modify
818     + * it under the terms of the GNU General Public License version 2 as
819     + * published by the Free Software Foundation.
820     + */
821     +
822     +#include "union.h"
823     +
824     +/*
825     + * 1) Copyup the file
826     + * 2) Rename the file to '.unionfs<original inode#><counter>' - obviously
827     + * stolen from NFS's silly rename
828     + */
829     +static int copyup_deleted_file(struct file *file, struct dentry *dentry,
830     + struct dentry *parent, int bstart, int bindex)
831     +{
832     + static unsigned int counter;
833     + const int i_inosize = sizeof(dentry->d_inode->i_ino) * 2;
834     + const int countersize = sizeof(counter) * 2;
835     + const int nlen = sizeof(".unionfs") + i_inosize + countersize - 1;
836     + char name[nlen + 1];
837     + int err;
838     + struct dentry *tmp_dentry = NULL;
839     + struct dentry *lower_dentry;
840     + struct dentry *lower_dir_dentry = NULL;
841     +
842     + lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
843     +
844     + sprintf(name, ".unionfs%*.*lx",
845     + i_inosize, i_inosize, lower_dentry->d_inode->i_ino);
846     +
847     + /*
848     + * Loop, looking for an unused temp name to copyup to.
849     + *
850     + * It's somewhat silly that we look for a free temp name in the
851     + * source branch (bstart) instead of the dest branch (bindex), where
852     + * the final name will be created. We _will_ catch it if somehow
853     + * the name exists in the dest branch, but it'd be nice to catch it
854     + * sooner rather than later.
855     + */
856     +retry:
857     + tmp_dentry = NULL;
858     + do {
859     + char *suffix = name + nlen - countersize;
860     +
861     + dput(tmp_dentry);
862     + counter++;
863     + sprintf(suffix, "%*.*x", countersize, countersize, counter);
864     +
865     + pr_debug("unionfs: trying to rename %s to %s\n",
866     + dentry->d_name.name, name);
867     +
868     + tmp_dentry = lookup_lck_len(name, lower_dentry->d_parent,
869     + nlen);
870     + if (IS_ERR(tmp_dentry)) {
871     + err = PTR_ERR(tmp_dentry);
872     + goto out;
873     + }
874     + } while (tmp_dentry->d_inode != NULL); /* need negative dentry */
875     + dput(tmp_dentry);
876     +
877     + err = copyup_named_file(parent->d_inode, file, name, bstart, bindex,
878     + i_size_read(file->f_path.dentry->d_inode));
879     + if (err) {
880     + if (unlikely(err == -EEXIST))
881     + goto retry;
882     + goto out;
883     + }
884     +
885     + /* bring it to the same state as an unlinked file */
886     + lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
887     + if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
888     + atomic_inc(&lower_dentry->d_inode->i_count);
889     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
890     + lower_dentry->d_inode);
891     + }
892     + lower_dir_dentry = lock_parent(lower_dentry);
893     + err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
894     + unlock_dir(lower_dir_dentry);
895     +
896     +out:
897     + if (!err)
898     + unionfs_check_dentry(dentry);
899     + return err;
900     +}
901     +
902     +/*
903     + * put all references held by upper struct file and free lower file pointer
904     + * array
905     + */
906     +static void cleanup_file(struct file *file)
907     +{
908     + int bindex, bstart, bend;
909     + struct file **lower_files;
910     + struct file *lower_file;
911     + struct super_block *sb = file->f_path.dentry->d_sb;
912     +
913     + lower_files = UNIONFS_F(file)->lower_files;
914     + bstart = fbstart(file);
915     + bend = fbend(file);
916     +
917     + for (bindex = bstart; bindex <= bend; bindex++) {
918     + int i; /* holds (possibly) updated branch index */
919     + int old_bid;
920     +
921     + lower_file = unionfs_lower_file_idx(file, bindex);
922     + if (!lower_file)
923     + continue;
924     +
925     + /*
926     + * Find new index of matching branch with an open
927     + * file, since branches could have been added or
928     + * deleted causing the one with open files to shift.
929     + */
930     + old_bid = UNIONFS_F(file)->saved_branch_ids[bindex];
931     + i = branch_id_to_idx(sb, old_bid);
932     + if (unlikely(i < 0)) {
933     + printk(KERN_ERR "unionfs: no superblock for "
934     + "file %p\n", file);
935     + continue;
936     + }
937     +
938     + /* decrement count of open files */
939     + branchput(sb, i);
940     + /*
941     + * fput will perform an mntput for us on the correct branch.
942     + * Although we're using the file's old branch configuration,
943     + * bindex, which is the old index, correctly points to the
944     + * right branch in the file's branch list. In other words,
945     + * we're going to mntput the correct branch even if branches
946     + * have been added/removed.
947     + */
948     + fput(lower_file);
949     + UNIONFS_F(file)->lower_files[bindex] = NULL;
950     + UNIONFS_F(file)->saved_branch_ids[bindex] = -1;
951     + }
952     +
953     + UNIONFS_F(file)->lower_files = NULL;
954     + kfree(lower_files);
955     + kfree(UNIONFS_F(file)->saved_branch_ids);
956     + /* set to NULL because caller needs to know if to kfree on error */
957     + UNIONFS_F(file)->saved_branch_ids = NULL;
958     +}
959     +
960     +/* open all lower files for a given file */
961     +static int open_all_files(struct file *file)
962     +{
963     + int bindex, bstart, bend, err = 0;
964     + struct file *lower_file;
965     + struct dentry *lower_dentry;
966     + struct dentry *dentry = file->f_path.dentry;
967     + struct super_block *sb = dentry->d_sb;
968     +
969     + bstart = dbstart(dentry);
970     + bend = dbend(dentry);
971     +
972     + for (bindex = bstart; bindex <= bend; bindex++) {
973     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
974     + if (!lower_dentry)
975     + continue;
976     +
977     + dget(lower_dentry);
978     + unionfs_mntget(dentry, bindex);
979     + branchget(sb, bindex);
980     +
981     + lower_file =
982     + dentry_open(lower_dentry,
983     + unionfs_lower_mnt_idx(dentry, bindex),
984     + file->f_flags, current_cred());
985     + if (IS_ERR(lower_file)) {
986     + branchput(sb, bindex);
987     + err = PTR_ERR(lower_file);
988     + goto out;
989     + } else {
990     + unionfs_set_lower_file_idx(file, bindex, lower_file);
991     + }
992     + }
993     +out:
994     + return err;
995     +}
996     +
997     +/* open the highest priority file for a given upper file */
998     +static int open_highest_file(struct file *file, bool willwrite)
999     +{
1000     + int bindex, bstart, bend, err = 0;
1001     + struct file *lower_file;
1002     + struct dentry *lower_dentry;
1003     + struct dentry *dentry = file->f_path.dentry;
1004     + struct dentry *parent = dget_parent(dentry);
1005     + struct inode *parent_inode = parent->d_inode;
1006     + struct super_block *sb = dentry->d_sb;
1007     +
1008     + bstart = dbstart(dentry);
1009     + bend = dbend(dentry);
1010     +
1011     + lower_dentry = unionfs_lower_dentry(dentry);
1012     + if (willwrite && IS_WRITE_FLAG(file->f_flags) && is_robranch(dentry)) {
1013     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1014     + err = copyup_file(parent_inode, file, bstart, bindex,
1015     + i_size_read(dentry->d_inode));
1016     + if (!err)
1017     + break;
1018     + }
1019     + atomic_set(&UNIONFS_F(file)->generation,
1020     + atomic_read(&UNIONFS_I(dentry->d_inode)->
1021     + generation));
1022     + goto out;
1023     + }
1024     +
1025     + dget(lower_dentry);
1026     + unionfs_mntget(dentry, bstart);
1027     + lower_file = dentry_open(lower_dentry,
1028     + unionfs_lower_mnt_idx(dentry, bstart),
1029     + file->f_flags, current_cred());
1030     + if (IS_ERR(lower_file)) {
1031     + err = PTR_ERR(lower_file);
1032     + goto out;
1033     + }
1034     + branchget(sb, bstart);
1035     + unionfs_set_lower_file(file, lower_file);
1036     + /* Fix up the position. */
1037     + lower_file->f_pos = file->f_pos;
1038     +
1039     + memcpy(&lower_file->f_ra, &file->f_ra, sizeof(struct file_ra_state));
1040     +out:
1041     + dput(parent);
1042     + return err;
1043     +}
1044     +
1045     +/* perform a delayed copyup of a read-write file on a read-only branch */
1046     +static int do_delayed_copyup(struct file *file, struct dentry *parent)
1047     +{
1048     + int bindex, bstart, bend, err = 0;
1049     + struct dentry *dentry = file->f_path.dentry;
1050     + struct inode *parent_inode = parent->d_inode;
1051     +
1052     + bstart = fbstart(file);
1053     + bend = fbend(file);
1054     +
1055     + BUG_ON(!S_ISREG(dentry->d_inode->i_mode));
1056     +
1057     + unionfs_check_file(file);
1058     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1059     + if (!d_deleted(dentry))
1060     + err = copyup_file(parent_inode, file, bstart,
1061     + bindex,
1062     + i_size_read(dentry->d_inode));
1063     + else
1064     + err = copyup_deleted_file(file, dentry, parent,
1065     + bstart, bindex);
1066     + /* if succeeded, set lower open-file flags and break */
1067     + if (!err) {
1068     + struct file *lower_file;
1069     + lower_file = unionfs_lower_file_idx(file, bindex);
1070     + lower_file->f_flags = file->f_flags;
1071     + break;
1072     + }
1073     + }
1074     + if (err || (bstart <= fbstart(file)))
1075     + goto out;
1076     + bend = fbend(file);
1077     + for (bindex = bstart; bindex <= bend; bindex++) {
1078     + if (unionfs_lower_file_idx(file, bindex)) {
1079     + branchput(dentry->d_sb, bindex);
1080     + fput(unionfs_lower_file_idx(file, bindex));
1081     + unionfs_set_lower_file_idx(file, bindex, NULL);
1082     + }
1083     + }
1084     + path_put_lowers(dentry, bstart, bend, false);
1085     + iput_lowers(dentry->d_inode, bstart, bend, false);
1086     + /* for reg file, we only open it "once" */
1087     + fbend(file) = fbstart(file);
1088     + dbend(dentry) = dbstart(dentry);
1089     + ibend(dentry->d_inode) = ibstart(dentry->d_inode);
1090     +
1091     +out:
1092     + unionfs_check_file(file);
1093     + return err;
1094     +}
1095     +
1096     +/*
1097     + * Helper function for unionfs_file_revalidate/locked.
1098     + * Expects dentry/parent to be locked already, and revalidated.
1099     + */
1100     +static int __unionfs_file_revalidate(struct file *file, struct dentry *dentry,
1101     + struct dentry *parent,
1102     + struct super_block *sb, int sbgen,
1103     + int dgen, bool willwrite)
1104     +{
1105     + int fgen;
1106     + int bstart, bend, orig_brid;
1107     + int size;
1108     + int err = 0;
1109     +
1110     + fgen = atomic_read(&UNIONFS_F(file)->generation);
1111     +
1112     + /*
1113     + * There are two cases we are interested in. The first is if the
1114     + * file's generation is lower than the superblock's. The second is if
1115     + * someone has copied up this file from underneath us; in both cases
1116     + * we need to refresh things.
1117     + */
1118     + if ((d_deleted(dentry) && dbstart(dentry) >= fbstart(file)) ||
1119     + (sbgen <= fgen &&
1120     + dbstart(dentry) == fbstart(file) &&
1121     + unionfs_lower_file(file)))
1122     + goto out_may_copyup;
1123     +
1124     + /* save orig branch ID */
1125     + orig_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1126     +
1127     + /* First we throw out the existing files. */
1128     + cleanup_file(file);
1129     +
1130     + /* Now we reopen the file(s) as in unionfs_open. */
1131     + bstart = fbstart(file) = dbstart(dentry);
1132     + bend = fbend(file) = dbend(dentry);
1133     +
1134     + size = sizeof(struct file *) * sbmax(sb);
1135     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1136     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1137     + err = -ENOMEM;
1138     + goto out;
1139     + }
1140     + size = sizeof(int) * sbmax(sb);
1141     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1142     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1143     + err = -ENOMEM;
1144     + goto out;
1145     + }
1146     +
1147     + if (S_ISDIR(dentry->d_inode->i_mode)) {
1148     + /* We need to open all the files. */
1149     + err = open_all_files(file);
1150     + if (err)
1151     + goto out;
1152     + } else {
1153     + int new_brid;
1154     + /* We only open the highest priority branch. */
1155     + err = open_highest_file(file, willwrite);
1156     + if (err)
1157     + goto out;
1158     + new_brid = UNIONFS_F(file)->saved_branch_ids[fbstart(file)];
1159     + if (unlikely(new_brid != orig_brid && sbgen > fgen)) {
1160     + /*
1161     + * If we re-opened the file on a different branch
1162     + * than the original one, and this was due to a new
1163     + * branch inserted, then update the mnt counts of
1164     + * the old and new branches accordingly.
1165     + */
1166     + unionfs_mntget(dentry, bstart);
1167     + unionfs_mntput(sb->s_root,
1168     + branch_id_to_idx(sb, orig_brid));
1169     + }
1170     + /* regular files have only one open lower file */
1171     + fbend(file) = fbstart(file);
1172     + }
1173     + atomic_set(&UNIONFS_F(file)->generation,
1174     + atomic_read(&UNIONFS_I(dentry->d_inode)->generation));
1175     +
1176     +out_may_copyup:
1177     + /* Copyup on the first write to a file on a readonly branch. */
1178     + if (willwrite && IS_WRITE_FLAG(file->f_flags) &&
1179     + !IS_WRITE_FLAG(unionfs_lower_file(file)->f_flags) &&
1180     + is_robranch(dentry)) {
1181     + pr_debug("unionfs: do delay copyup of \"%s\"\n",
1182     + dentry->d_name.name);
1183     + err = do_delayed_copyup(file, parent);
1184     + /* regular files have only one open lower file */
1185     + if (!err && !S_ISDIR(dentry->d_inode->i_mode))
1186     + fbend(file) = fbstart(file);
1187     + }
1188     +
1189     +out:
1190     + if (err) {
1191     + kfree(UNIONFS_F(file)->lower_files);
1192     + kfree(UNIONFS_F(file)->saved_branch_ids);
1193     + }
1194     + return err;
1195     +}
1196     +
1197     +/*
1198     + * Revalidate the struct file
1199     + * @file: file to revalidate
1200     + * @parent: parent dentry (locked by caller)
1201     + * @willwrite: true if caller may cause changes to the file; false otherwise.
1202     + * Caller must lock/unlock dentry's branch configuration.
1203     + */
1204     +int unionfs_file_revalidate(struct file *file, struct dentry *parent,
1205     + bool willwrite)
1206     +{
1207     + struct super_block *sb;
1208     + struct dentry *dentry;
1209     + int sbgen, dgen;
1210     + int err = 0;
1211     +
1212     + dentry = file->f_path.dentry;
1213     + sb = dentry->d_sb;
1214     + verify_locked(dentry);
1215     + verify_locked(parent);
1216     +
1217     + /*
1218     + * First revalidate the dentry inside struct file,
1219     + * but not unhashed dentries.
1220     + */
1221     + if (!d_deleted(dentry) &&
1222     + !__unionfs_d_revalidate(dentry, parent, willwrite)) {
1223     + err = -ESTALE;
1224     + goto out;
1225     + }
1226     +
1227     + sbgen = atomic_read(&UNIONFS_SB(sb)->generation);
1228     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
1229     +
1230     + if (unlikely(sbgen > dgen)) { /* XXX: should never happen */
1231     + pr_debug("unionfs: failed to revalidate dentry (%s)\n",
1232     + dentry->d_name.name);
1233     + err = -ESTALE;
1234     + goto out;
1235     + }
1236     +
1237     + err = __unionfs_file_revalidate(file, dentry, parent, sb,
1238     + sbgen, dgen, willwrite);
1239     +out:
1240     + return err;
1241     +}
1242     +
1243     +/* unionfs_open helper function: open a directory */
1244     +static int __open_dir(struct inode *inode, struct file *file)
1245     +{
1246     + struct dentry *lower_dentry;
1247     + struct file *lower_file;
1248     + int bindex, bstart, bend;
1249     + struct vfsmount *mnt;
1250     +
1251     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1252     + bend = fbend(file) = dbend(file->f_path.dentry);
1253     +
1254     + for (bindex = bstart; bindex <= bend; bindex++) {
1255     + lower_dentry =
1256     + unionfs_lower_dentry_idx(file->f_path.dentry, bindex);
1257     + if (!lower_dentry)
1258     + continue;
1259     +
1260     + dget(lower_dentry);
1261     + unionfs_mntget(file->f_path.dentry, bindex);
1262     + mnt = unionfs_lower_mnt_idx(file->f_path.dentry, bindex);
1263     + lower_file = dentry_open(lower_dentry, mnt, file->f_flags,
1264     + current_cred());
1265     + if (IS_ERR(lower_file))
1266     + return PTR_ERR(lower_file);
1267     +
1268     + unionfs_set_lower_file_idx(file, bindex, lower_file);
1269     +
1270     + /*
1271     + * The branchget goes after the open, because otherwise
1272     + * we would miss the reference on release.
1273     + */
1274     + branchget(inode->i_sb, bindex);
1275     + }
1276     +
1277     + return 0;
1278     +}
1279     +
1280     +/* unionfs_open helper function: open a file */
1281     +static int __open_file(struct inode *inode, struct file *file,
1282     + struct dentry *parent)
1283     +{
1284     + struct dentry *lower_dentry;
1285     + struct file *lower_file;
1286     + int lower_flags;
1287     + int bindex, bstart, bend;
1288     +
1289     + lower_dentry = unionfs_lower_dentry(file->f_path.dentry);
1290     + lower_flags = file->f_flags;
1291     +
1292     + bstart = fbstart(file) = dbstart(file->f_path.dentry);
1293     + bend = fbend(file) = dbend(file->f_path.dentry);
1294     +
1295     + /*
1296     + * check for the permission for lower file. If the error is
1297     + * COPYUP_ERR, copyup the file.
1298     + */
1299     + if (lower_dentry->d_inode && is_robranch(file->f_path.dentry)) {
1300     + /*
1301     + * if the open will change the file, copy it up; otherwise
1302     + * defer it.
1303     + */
1304     + if (lower_flags & O_TRUNC) {
1305     + int size = 0;
1306     + int err = -EROFS;
1307     +
1308     + /* copyup the file */
1309     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
1310     + err = copyup_file(parent->d_inode, file,
1311     + bstart, bindex, size);
1312     + if (!err) {
1313     + /* only one regular file open */
1314     + fbend(file) = fbstart(file);
1315     + break;
1316     + }
1317     + }
1318     + return err;
1319     + } else {
1320     + /*
1321     + * turn off writeable flags, to force delayed copyup
1322     + * by caller.
1323     + */
1324     + lower_flags &= ~(OPEN_WRITE_FLAGS);
1325     + }
1326     + }
1327     +
1328     + dget(lower_dentry);
1329     +
1330     + /*
1331     + * dentry_open will decrement mnt refcnt if err.
1332     + * otherwise fput() will do an mntput() for us upon file close.
1333     + */
1334     + unionfs_mntget(file->f_path.dentry, bstart);
1335     + lower_file =
1336     + dentry_open(lower_dentry,
1337     + unionfs_lower_mnt_idx(file->f_path.dentry, bstart),
1338     + lower_flags, current_cred());
1339     + if (IS_ERR(lower_file))
1340     + return PTR_ERR(lower_file);
1341     +
1342     + unionfs_set_lower_file(file, lower_file);
1343     + branchget(inode->i_sb, bstart);
1344     +
1345     + return 0;
1346     +}
1347     +
1348     +int unionfs_open(struct inode *inode, struct file *file)
1349     +{
1350     + int err = 0;
1351     + struct file *lower_file = NULL;
1352     + struct dentry *dentry = file->f_path.dentry;
1353     + struct dentry *parent;
1354     + int bindex = 0, bstart = 0, bend = 0;
1355     + int size;
1356     + int valid = 0;
1357     +
1358     + unionfs_read_lock(inode->i_sb, UNIONFS_SMUTEX_PARENT);
1359     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1360     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1361     +
1362     + /* don't open unhashed/deleted files */
1363     + if (d_deleted(dentry)) {
1364     + err = -ENOENT;
1365     + goto out_nofree;
1366     + }
1367     +
1368     + /* XXX: should I change 'false' below to the 'willwrite' flag? */
1369     + valid = __unionfs_d_revalidate(dentry, parent, false);
1370     + if (unlikely(!valid)) {
1371     + err = -ESTALE;
1372     + goto out_nofree;
1373     + }
1374     +
1375     + file->private_data =
1376     + kzalloc(sizeof(struct unionfs_file_info), GFP_KERNEL);
1377     + if (unlikely(!UNIONFS_F(file))) {
1378     + err = -ENOMEM;
1379     + goto out_nofree;
1380     + }
1381     + fbstart(file) = -1;
1382     + fbend(file) = -1;
1383     + atomic_set(&UNIONFS_F(file)->generation,
1384     + atomic_read(&UNIONFS_I(inode)->generation));
1385     +
1386     + size = sizeof(struct file *) * sbmax(inode->i_sb);
1387     + UNIONFS_F(file)->lower_files = kzalloc(size, GFP_KERNEL);
1388     + if (unlikely(!UNIONFS_F(file)->lower_files)) {
1389     + err = -ENOMEM;
1390     + goto out;
1391     + }
1392     + size = sizeof(int) * sbmax(inode->i_sb);
1393     + UNIONFS_F(file)->saved_branch_ids = kzalloc(size, GFP_KERNEL);
1394     + if (unlikely(!UNIONFS_F(file)->saved_branch_ids)) {
1395     + err = -ENOMEM;
1396     + goto out;
1397     + }
1398     +
1399     + bstart = fbstart(file) = dbstart(dentry);
1400     + bend = fbend(file) = dbend(dentry);
1401     +
1402     + /*
1403     + * open all directories and make the unionfs file struct point to
1404     + * these lower file structs
1405     + */
1406     + if (S_ISDIR(inode->i_mode))
1407     + err = __open_dir(inode, file); /* open a dir */
1408     + else
1409     + err = __open_file(inode, file, parent); /* open a file */
1410     +
1411     + /* free the allocated resources and fput the opened files */
1412     + if (err) {
1413     + for (bindex = bstart; bindex <= bend; bindex++) {
1414     + lower_file = unionfs_lower_file_idx(file, bindex);
1415     + if (!lower_file)
1416     + continue;
1417     +
1418     + branchput(dentry->d_sb, bindex);
1419     + /* fput calls dput for lower_dentry */
1420     + fput(lower_file);
1421     + }
1422     + }
1423     +
1424     +out:
1425     + if (err) {
1426     + kfree(UNIONFS_F(file)->lower_files);
1427     + kfree(UNIONFS_F(file)->saved_branch_ids);
1428     + kfree(UNIONFS_F(file));
1429     + }
1430     +out_nofree:
1431     + if (!err) {
1432     + unionfs_postcopyup_setmnt(dentry);
1433     + unionfs_copy_attr_times(inode);
1434     + unionfs_check_file(file);
1435     + unionfs_check_inode(inode);
1436     + }
1437     + unionfs_unlock_dentry(dentry);
1438     + unionfs_unlock_parent(dentry, parent);
1439     + unionfs_read_unlock(inode->i_sb);
1440     + return err;
1441     +}
1442     +
1443     +/*
1444     + * release all lower object references & free the file info structure
1445     + *
1446     + * No need to grab sb info's rwsem.
1447     + */
1448     +int unionfs_file_release(struct inode *inode, struct file *file)
1449     +{
1450     + struct file *lower_file = NULL;
1451     + struct unionfs_file_info *fileinfo;
1452     + struct unionfs_inode_info *inodeinfo;
1453     + struct super_block *sb = inode->i_sb;
1454     + struct dentry *dentry = file->f_path.dentry;
1455     + struct dentry *parent;
1456     + int bindex, bstart, bend;
1457     + int err = 0;
1458     +
1459     + /*
1460     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
1461     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
1462     + * has been causing false positives in file system stacking layers.
1463     + * In particular, our ->mmap is called after sys_mmap2 already holds
1464     + * mmap_sem, then we lock our own mutexes; but earlier, it's
1465     + * possible for lockdep to have locked our mutexes first, and then
1466     + * we call a lower ->readdir which could call might_fault. The
1467     + * different ordering of the locks is what lockdep complains about
1468     + * -- unnecessarily. Therefore, we have no choice but to tell
1469     + * lockdep to temporarily turn off lockdep here. Note: the comments
1470     + * inside might_sleep also suggest that it would have been
1471     + * nicer to only annotate paths that need that might_lock_read.
1472     + */
1473     + lockdep_off();
1474     + unionfs_read_lock(sb, UNIONFS_SMUTEX_PARENT);
1475     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1476     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1477     +
1478     + /*
1479     + * We try to revalidate, but the VFS ignores return values
1480     + * from file->release, so we must always try to succeed here,
1481     + * including to do the kfree and dput below. So if revalidation
1482     + * failed, all we can do is print some message and keep going.
1483     + */
1484     + err = unionfs_file_revalidate(file, parent,
1485     + UNIONFS_F(file)->wrote_to_file);
1486     + if (!err)
1487     + unionfs_check_file(file);
1488     + fileinfo = UNIONFS_F(file);
1489     + BUG_ON(file->f_path.dentry->d_inode != inode);
1490     + inodeinfo = UNIONFS_I(inode);
1491     +
1492     + /* fput all the lower files */
1493     + bstart = fbstart(file);
1494     + bend = fbend(file);
1495     +
1496     + for (bindex = bstart; bindex <= bend; bindex++) {
1497     + lower_file = unionfs_lower_file_idx(file, bindex);
1498     +
1499     + if (lower_file) {
1500     + unionfs_set_lower_file_idx(file, bindex, NULL);
1501     + fput(lower_file);
1502     + branchput(sb, bindex);
1503     + }
1504     +
1505     + /* if there are no more refs to the dentry, dput it */
1506     + if (d_deleted(dentry)) {
1507     + dput(unionfs_lower_dentry_idx(dentry, bindex));
1508     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1509     + }
1510     + }
1511     +
1512     + kfree(fileinfo->lower_files);
1513     + kfree(fileinfo->saved_branch_ids);
1514     +
1515     + if (fileinfo->rdstate) {
1516     + fileinfo->rdstate->access = jiffies;
1517     + spin_lock(&inodeinfo->rdlock);
1518     + inodeinfo->rdcount++;
1519     + list_add_tail(&fileinfo->rdstate->cache,
1520     + &inodeinfo->readdircache);
1521     + mark_inode_dirty(inode);
1522     + spin_unlock(&inodeinfo->rdlock);
1523     + fileinfo->rdstate = NULL;
1524     + }
1525     + kfree(fileinfo);
1526     +
1527     + unionfs_unlock_dentry(dentry);
1528     + unionfs_unlock_parent(dentry, parent);
1529     + unionfs_read_unlock(sb);
1530     + lockdep_on();
1531     + return err;
1532     +}
1533     +
1534     +/* pass the ioctl to the lower fs */
1535     +static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1536     +{
1537     + struct file *lower_file;
1538     + int err;
1539     +
1540     + lower_file = unionfs_lower_file(file);
1541     +
1542     + err = -ENOTTY;
1543     + if (!lower_file || !lower_file->f_op)
1544     + goto out;
1545     + if (lower_file->f_op->unlocked_ioctl) {
1546     + err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
1547     +#ifdef CONFIG_COMPAT
1548     + } else if (lower_file->f_op->compat_ioctl) {
1549     + err = lower_file->f_op->compat_ioctl(lower_file, cmd, arg);
1550     +#endif
1551     + }
1552     +
1553     +out:
1554     + return err;
1555     +}
1556     +
1557     +/*
1558     + * return to user-space the branch indices containing the file in question
1559     + *
1560     + * We use fd_set, so the number of branches is limited to FD_SETSIZE,
1561     + * which is currently 1024 - plenty for most people.
1562     + */
1563     +static int unionfs_ioctl_queryfile(struct file *file, struct dentry *parent,
1564     + unsigned int cmd, unsigned long arg)
1565     +{
1566     + int err = 0;
1567     + fd_set branchlist;
1568     + int bstart = 0, bend = 0, bindex = 0;
1569     + int orig_bstart, orig_bend;
1570     + struct dentry *dentry, *lower_dentry;
1571     + struct vfsmount *mnt;
1572     +
1573     + dentry = file->f_path.dentry;
1574     + orig_bstart = dbstart(dentry);
1575     + orig_bend = dbend(dentry);
1576     + err = unionfs_partial_lookup(dentry, parent);
1577     + if (err)
1578     + goto out;
1579     + bstart = dbstart(dentry);
1580     + bend = dbend(dentry);
1581     +
1582     + FD_ZERO(&branchlist);
1583     +
1584     + for (bindex = bstart; bindex <= bend; bindex++) {
1585     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
1586     + if (!lower_dentry)
1587     + continue;
1588     + if (likely(lower_dentry->d_inode))
1589     + FD_SET(bindex, &branchlist);
1590     + /* purge any lower objects after partial_lookup */
1591     + if (bindex < orig_bstart || bindex > orig_bend) {
1592     + dput(lower_dentry);
1593     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
1594     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
1595     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
1596     + NULL);
1597     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
1598     + if (!mnt)
1599     + continue;
1600     + unionfs_mntput(dentry, bindex);
1601     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
1602     + }
1603     + }
1604     + /* restore original dentry's offsets */
1605     + dbstart(dentry) = orig_bstart;
1606     + dbend(dentry) = orig_bend;
1607     + ibstart(dentry->d_inode) = orig_bstart;
1608     + ibend(dentry->d_inode) = orig_bend;
1609     +
1610     + err = copy_to_user((void __user *)arg, &branchlist, sizeof(fd_set));
1611     + if (unlikely(err))
1612     + err = -EFAULT;
1613     +
1614     +out:
1615     + return err < 0 ? err : bend;
1616     +}
1617     +
1618     +long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
1619     +{
1620     + long err;
1621     + struct dentry *dentry = file->f_path.dentry;
1622     + struct dentry *parent;
1623     +
1624     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1625     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1626     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1627     +
1628     + err = unionfs_file_revalidate(file, parent, true);
1629     + if (unlikely(err))
1630     + goto out;
1631     +
1632     + /* check if asked for local commands */
1633     + switch (cmd) {
1634     + case UNIONFS_IOCTL_INCGEN:
1635     + /* Increment the superblock generation count */
1636     + pr_info("unionfs: incgen ioctl deprecated; "
1637     + "use \"-o remount,incgen\"\n");
1638     + err = -ENOSYS;
1639     + break;
1640     +
1641     + case UNIONFS_IOCTL_QUERYFILE:
1642     + /* Return list of branches containing the given file */
1643     + err = unionfs_ioctl_queryfile(file, parent, cmd, arg);
1644     + break;
1645     +
1646     + default:
1647     + /* pass the ioctl down */
1648     + err = do_ioctl(file, cmd, arg);
1649     + break;
1650     + }
1651     +
1652     +out:
1653     + unionfs_check_file(file);
1654     + unionfs_unlock_dentry(dentry);
1655     + unionfs_unlock_parent(dentry, parent);
1656     + unionfs_read_unlock(dentry->d_sb);
1657     + return err;
1658     +}
1659     +
1660     +int unionfs_flush(struct file *file, fl_owner_t id)
1661     +{
1662     + int err = 0;
1663     + struct file *lower_file = NULL;
1664     + struct dentry *dentry = file->f_path.dentry;
1665     + struct dentry *parent;
1666     + int bindex, bstart, bend;
1667     +
1668     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
1669     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
1670     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
1671     +
1672     + err = unionfs_file_revalidate(file, parent,
1673     + UNIONFS_F(file)->wrote_to_file);
1674     + if (unlikely(err))
1675     + goto out;
1676     + unionfs_check_file(file);
1677     +
1678     + bstart = fbstart(file);
1679     + bend = fbend(file);
1680     + for (bindex = bstart; bindex <= bend; bindex++) {
1681     + lower_file = unionfs_lower_file_idx(file, bindex);
1682     +
1683     + if (lower_file && lower_file->f_op &&
1684     + lower_file->f_op->flush) {
1685     + err = lower_file->f_op->flush(lower_file, id);
1686     + if (err)
1687     + goto out;
1688     + }
1689     +
1690     + }
1691     +
1692     +out:
1693     + if (!err)
1694     + unionfs_check_file(file);
1695     + unionfs_unlock_dentry(dentry);
1696     + unionfs_unlock_parent(dentry, parent);
1697     + unionfs_read_unlock(dentry->d_sb);
1698     + return err;
1699     +}
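
The QUERYFILE handler in fs/unionfs/file.c above copies an fd_set to user space in which bit N is set when branch N holds the file, and on success returns the highest branch index (bend). The following is only a minimal user-space sketch of how such a mask could be decoded; list_branches() and its parameters are hypothetical names introduced for illustration and are not part of unionfs, and the ioctl call itself is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper (not part of unionfs): walk a branch mask of the
 * kind filled in by the QUERYFILE ioctl and collect the indices of the
 * branches whose bit is set.
 *
 * mask       - fd_set used as a plain bitmap of branch indices
 * max_branch - highest branch index to examine (the ioctl's return value)
 * out        - optional array receiving the branch indices found
 * out_len    - capacity of out
 *
 * Returns the number of set bits found, which may exceed out_len.
 */
int list_branches(fd_set *mask, int max_branch, int *out, int out_len)
{
	int n = 0;
	int b;

	assert(mask != NULL);

	/* never look past FD_SETSIZE, the hard limit of an fd_set */
	for (b = 0; b <= max_branch && b < FD_SETSIZE; b++) {
		if (!FD_ISSET(b, mask))
			continue;
		if (out && n < out_len)
			out[n] = b;
		n++;
	}
	return n;
}
```

A caller would pass the fd_set it handed to the ioctl together with the ioctl's non-negative return value as max_branch. Bounding the loop by FD_SETSIZE mirrors the branch limit mentioned in the comment above unionfs_ioctl_queryfile.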
1700     diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
1701     new file mode 100644
1702     index 0000000..37c2654
1703     --- /dev/null
1704     +++ b/fs/unionfs/copyup.c
1705     @@ -0,0 +1,896 @@
1706     +/*
1707     + * Copyright (c) 2003-2011 Erez Zadok
1708     + * Copyright (c) 2003-2006 Charles P. Wright
1709     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
1710     + * Copyright (c) 2005-2006 Junjiro Okajima
1711     + * Copyright (c) 2005 Arun M. Krishnakumar
1712     + * Copyright (c) 2004-2006 David P. Quigley
1713     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
1714     + * Copyright (c) 2003 Puja Gupta
1715     + * Copyright (c) 2003 Harikesavan Krishnan
1716     + * Copyright (c) 2003-2011 Stony Brook University
1717     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
1718     + *
1719     + * This program is free software; you can redistribute it and/or modify
1720     + * it under the terms of the GNU General Public License version 2 as
1721     + * published by the Free Software Foundation.
1722     + */
1723     +
1724     +#include "union.h"
1725     +
1726     +/*
1727     + * For detailed explanation of copyup see:
1728     + * Documentation/filesystems/unionfs/concepts.txt
1729     + */
1730     +
1731     +#ifdef CONFIG_UNION_FS_XATTR
1732     +/* copyup all extended attrs for a given dentry */
1733     +static int copyup_xattrs(struct dentry *old_lower_dentry,
1734     + struct dentry *new_lower_dentry)
1735     +{
1736     + int err = 0;
1737     + ssize_t list_size = -1;
1738     + char *name_list = NULL;
1739     + char *attr_value = NULL;
1740     + char *name_list_buf = NULL;
1741     +
1742     + /* query the actual size of the xattr list */
1743     + list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
1744     + if (list_size <= 0) {
1745     + err = list_size;
1746     + goto out;
1747     + }
1748     +
1749     + /* allocate space for the actual list */
1750     + name_list = unionfs_xattr_alloc(list_size + 1, XATTR_LIST_MAX);
1751     + if (unlikely(!name_list || IS_ERR(name_list))) {
1752     + err = PTR_ERR(name_list);
1753     + goto out;
1754     + }
1755     +
1756     + name_list_buf = name_list; /* save for kfree at end */
1757     +
1758     + /* now get the actual xattr list of the source file */
1759     + list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
1760     + if (list_size <= 0) {
1761     + err = list_size;
1762     + goto out;
1763     + }
1764     +
1765     + /* allocate space to hold each xattr's value */
1766     + attr_value = unionfs_xattr_alloc(XATTR_SIZE_MAX, XATTR_SIZE_MAX);
1767     + if (unlikely(!attr_value || IS_ERR(attr_value))) {
1768     + err = PTR_ERR(attr_value);
1769     + goto out;
1770     + }
1771     +
1772     + /* in a loop, get and set each xattr from src to dst file */
1773     + while (*name_list) {
1774     + ssize_t size;
1775     +
1776     + /* Lock here since vfs_getxattr doesn't lock for us */
1777     + mutex_lock(&old_lower_dentry->d_inode->i_mutex);
1778     + size = vfs_getxattr(old_lower_dentry, name_list,
1779     + attr_value, XATTR_SIZE_MAX);
1780     + mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
1781     + if (size < 0) {
1782     + err = size;
1783     + goto out;
1784     + }
1785     + if (size > XATTR_SIZE_MAX) {
1786     + err = -E2BIG;
1787     + goto out;
1788     + }
1789     + /* Don't lock here since vfs_setxattr does it for us. */
1790     + err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
1791     + size, 0);
1792     + /*
1793     + * SELinux depends on "security.*" xattrs, so to maintain
1794     + * the security of copied-up files, if SELinux is active,
1795     + * then we must copy these xattrs as well. So we need to
1796     + * temporarily get FOWNER privileges.
1797     + * XXX: move entire copyup code to SIOQ.
1798     + */
1799     + if (err == -EPERM && !capable(CAP_FOWNER)) {
1800     + const struct cred *old_creds;
1801     + struct cred *new_creds;
1802     +
1803     + new_creds = prepare_creds();
1804     + if (unlikely(!new_creds)) {
1805     + err = -ENOMEM;
1806     + goto out;
1807     + }
1808     + cap_raise(new_creds->cap_effective, CAP_FOWNER);
1809     + old_creds = override_creds(new_creds);
1810     + err = vfs_setxattr(new_lower_dentry, name_list,
1811     + attr_value, size, 0);
1812     + revert_creds(old_creds);
1813     + }
1814     + if (err < 0)
1815     + goto out;
1816     + name_list += strlen(name_list) + 1;
1817     + }
1818     +out:
1819     + unionfs_xattr_kfree(name_list_buf);
1820     + unionfs_xattr_kfree(attr_value);
1821     + /* Ignore if xattr isn't supported */
1822     + if (err == -ENOTSUPP || err == -EOPNOTSUPP)
1823     + err = 0;
1824     + return err;
1825     +}
1826     +#endif /* CONFIG_UNION_FS_XATTR */
1827     +
1828     +/*
1829     + * Determine the mode based on the copyup flags, and the existing dentry.
1830     + *
1831     + * Handle file systems which may not support certain options. For example,
1832     + * jffs2 doesn't allow one to chmod a symlink. So we ignore such harmless
1833     + * errors rather than propagating them up, which would cause copyup
1834     + * failures and errors returned back to users.
1835     + */
1836     +static int copyup_permissions(struct super_block *sb,
1837     + struct dentry *old_lower_dentry,
1838     + struct dentry *new_lower_dentry)
1839     +{
1840     + struct inode *i = old_lower_dentry->d_inode;
1841     + struct iattr newattrs;
1842     + int err;
1843     +
1844     + newattrs.ia_atime = i->i_atime;
1845     + newattrs.ia_mtime = i->i_mtime;
1846     + newattrs.ia_ctime = i->i_ctime;
1847     + newattrs.ia_gid = i->i_gid;
1848     + newattrs.ia_uid = i->i_uid;
1849     + newattrs.ia_valid = ATTR_CTIME | ATTR_ATIME | ATTR_MTIME |
1850     + ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_FORCE |
1851     + ATTR_GID | ATTR_UID;
1852     + mutex_lock(&new_lower_dentry->d_inode->i_mutex);
1853     + err = notify_change(new_lower_dentry, &newattrs);
1854     + if (err)
1855     + goto out;
1856     +
1857     + /* now try to change the mode and ignore EOPNOTSUPP on symlinks */
1858     + newattrs.ia_mode = i->i_mode;
1859     + newattrs.ia_valid = ATTR_MODE | ATTR_FORCE;
1860     + err = notify_change(new_lower_dentry, &newattrs);
1861     + if (err == -EOPNOTSUPP &&
1862     + S_ISLNK(new_lower_dentry->d_inode->i_mode)) {
1863     + printk(KERN_WARNING
1864     + "unionfs: changing \"%s\" symlink mode unsupported\n",
1865     + new_lower_dentry->d_name.name);
1866     + err = 0;
1867     + }
1868     +
1869     +out:
1870     + mutex_unlock(&new_lower_dentry->d_inode->i_mutex);
1871     + return err;
1872     +}
1873     +
1874     +/*
1875     + * create the new device/file/directory - use copyup_permissions to copy up
1876     + * times and mode
1877     + *
1878     + * if the object being copied up is a regular file, the file is only created,
1879     + * the contents have to be copied up separately
1880     + */
1881     +static int __copyup_ndentry(struct dentry *old_lower_dentry,
1882     + struct dentry *new_lower_dentry,
1883     + struct dentry *new_lower_parent_dentry,
1884     + char *symbuf)
1885     +{
1886     + int err = 0;
1887     + umode_t old_mode = old_lower_dentry->d_inode->i_mode;
1888     + struct sioq_args args;
1889     +
1890     + if (S_ISDIR(old_mode)) {
1891     + args.mkdir.parent = new_lower_parent_dentry->d_inode;
1892     + args.mkdir.dentry = new_lower_dentry;
1893     + args.mkdir.mode = old_mode;
1894     +
1895     + run_sioq(__unionfs_mkdir, &args);
1896     + err = args.err;
1897     + } else if (S_ISLNK(old_mode)) {
1898     + args.symlink.parent = new_lower_parent_dentry->d_inode;
1899     + args.symlink.dentry = new_lower_dentry;
1900     + args.symlink.symbuf = symbuf;
1901     +
1902     + run_sioq(__unionfs_symlink, &args);
1903     + err = args.err;
1904     + } else if (S_ISBLK(old_mode) || S_ISCHR(old_mode) ||
1905     + S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
1906     + args.mknod.parent = new_lower_parent_dentry->d_inode;
1907     + args.mknod.dentry = new_lower_dentry;
1908     + args.mknod.mode = old_mode;
1909     + args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
1910     +
1911     + run_sioq(__unionfs_mknod, &args);
1912     + err = args.err;
1913     + } else if (S_ISREG(old_mode)) {
1914     + struct nameidata nd;
1915     + err = init_lower_nd(&nd, LOOKUP_CREATE);
1916     + if (unlikely(err < 0))
1917     + goto out;
1918     + args.create.nd = &nd;
1919     + args.create.parent = new_lower_parent_dentry->d_inode;
1920     + args.create.dentry = new_lower_dentry;
1921     + args.create.mode = old_mode;
1922     +
1923     + run_sioq(__unionfs_create, &args);
1924     + err = args.err;
1925     + release_lower_nd(&nd, err);
1926     + } else {
1927     + printk(KERN_CRIT "unionfs: unknown inode type %d\n",
1928     + old_mode);
1929     + BUG();
1930     + }
1931     +
1932     +out:
1933     + return err;
1934     +}
1935     +
1936     +static int __copyup_reg_data(struct dentry *dentry,
1937     + struct dentry *new_lower_dentry, int new_bindex,
1938     + struct dentry *old_lower_dentry, int old_bindex,
1939     + struct file **copyup_file, loff_t len)
1940     +{
1941     + struct super_block *sb = dentry->d_sb;
1942     + struct file *input_file;
1943     + struct file *output_file;
1944     + struct vfsmount *output_mnt;
1945     + mm_segment_t old_fs;
1946     + char *buf = NULL;
1947     + ssize_t read_bytes, write_bytes;
1948     + loff_t size;
1949     + int err = 0;
1950     +
1951     + /* open old file */
1952     + unionfs_mntget(dentry, old_bindex);
1953     + branchget(sb, old_bindex);
1954     + /* dentry_open calls dput and mntput if it returns an error */
1955     + input_file = dentry_open(old_lower_dentry,
1956     + unionfs_lower_mnt_idx(dentry, old_bindex),
1957     + O_RDONLY | O_LARGEFILE, current_cred());
1958     + if (IS_ERR(input_file)) {
1959     + dput(old_lower_dentry);
1960     + err = PTR_ERR(input_file);
1961     + goto out;
1962     + }
1963     + if (unlikely(!input_file->f_op || !input_file->f_op->read)) {
1964     + err = -EINVAL;
1965     + goto out_close_in;
1966     + }
1967     +
1968     + /* open new file */
1969     + dget(new_lower_dentry);
1970     + output_mnt = unionfs_mntget(sb->s_root, new_bindex);
1971     + branchget(sb, new_bindex);
1972     + output_file = dentry_open(new_lower_dentry, output_mnt,
1973     + O_RDWR | O_LARGEFILE, current_cred());
1974     + if (IS_ERR(output_file)) {
1975     + err = PTR_ERR(output_file);
1976     + goto out_close_in2;
1977     + }
1978     + if (unlikely(!output_file->f_op || !output_file->f_op->write)) {
1979     + err = -EINVAL;
1980     + goto out_close_out;
1981     + }
1982     +
1983     + /* allocating a buffer */
1984     + buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
1985     + if (unlikely(!buf)) {
1986     + err = -ENOMEM;
1987     + goto out_close_out;
1988     + }
1989     +
1990     + input_file->f_pos = 0;
1991     + output_file->f_pos = 0;
1992     +
1993     + old_fs = get_fs();
1994     + set_fs(KERNEL_DS);
1995     +
1996     + size = len;
1997     + err = 0;
1998     + do {
1999     + if (len >= PAGE_SIZE)
2000     + size = PAGE_SIZE;
2001     + else if ((len < PAGE_SIZE) && (len > 0))
2002     + size = len;
2003     +
2004     + len -= PAGE_SIZE;
2005     +
2006     + read_bytes =
2007     + input_file->f_op->read(input_file,
2008     + (char __user *)buf, size,
2009     + &input_file->f_pos);
2010     + if (read_bytes <= 0) {
2011     + err = read_bytes;
2012     + break;
2013     + }
2014     +
2015     + /* see Documentation/filesystems/unionfs/issues.txt */
2016     + lockdep_off();
2017     + write_bytes =
2018     + output_file->f_op->write(output_file,
2019     + (char __user *)buf,
2020     + read_bytes,
2021     + &output_file->f_pos);
2022     + lockdep_on();
2023     + if ((write_bytes < 0) || (write_bytes < read_bytes)) {
2024     + err = write_bytes;
2025     + break;
2026     + }
2027     + } while ((read_bytes > 0) && (len > 0));
2028     +
2029     + set_fs(old_fs);
2030     +
2031     + kfree(buf);
2032     +
2033     + if (!err)
2034     + err = output_file->f_op->fsync(output_file, 0);
2035     +
2036     + if (err)
2037     + goto out_close_out;
2038     +
2039     + if (copyup_file) {
2040     + *copyup_file = output_file;
2041     + goto out_close_in;
2042     + }
2043     +
2044     +out_close_out:
2045     + fput(output_file);
2046     +
2047     +out_close_in2:
2048     + branchput(sb, new_bindex);
2049     +
2050     +out_close_in:
2051     + fput(input_file);
2052     +
2053     +out:
2054     + branchput(sb, old_bindex);
2055     +
2056     + return err;
2057     +}
2058     +
2059     +/*
2060     + * dput the lower references for old and new dentry & clear a lower dentry
2061     + * pointer
2062     + */
2063     +static void __clear(struct dentry *dentry, struct dentry *old_lower_dentry,
2064     + int old_bstart, int old_bend,
2065     + struct dentry *new_lower_dentry, int new_bindex)
2066     +{
2067     + /* get rid of the lower dentry and all its traces */
2068     + unionfs_set_lower_dentry_idx(dentry, new_bindex, NULL);
2069     + dbstart(dentry) = old_bstart;
2070     + dbend(dentry) = old_bend;
2071     +
2072     + dput(new_lower_dentry);
2073     + dput(old_lower_dentry);
2074     +}
2075     +
2076     +/*
2077     + * Copy up a dentry to a file of specified name.
2078     + *
2079     + * @dir: used to pull the ->i_sb to access other branches
2080     + * @dentry: the non-negative dentry whose lower_inode we should copy
2081     + * @bstart: the branch of the lower_inode to copy from
2082     + * @new_bindex: the branch to create the new file in
2083     + * @name: the name of the file to create
2084     + * @namelen: length of @name
2085     + * @copyup_file: the "struct file" to return (optional)
2086     + * @len: number of bytes to copy up
2087     + */
2088     +int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
2089     + int new_bindex, const char *name, int namelen,
2090     + struct file **copyup_file, loff_t len)
2091     +{
2092     + struct dentry *new_lower_dentry;
2093     + struct dentry *old_lower_dentry = NULL;
2094     + struct super_block *sb;
2095     + int err = 0;
2096     + int old_bindex;
2097     + int old_bstart;
2098     + int old_bend;
2099     + struct dentry *new_lower_parent_dentry = NULL;
2100     + mm_segment_t oldfs;
2101     + char *symbuf = NULL;
2102     +
2103     + verify_locked(dentry);
2104     +
2105     + old_bindex = bstart;
2106     + old_bstart = dbstart(dentry);
2107     + old_bend = dbend(dentry);
2108     +
2109     + BUG_ON(new_bindex < 0);
2110     + BUG_ON(new_bindex >= old_bindex);
2111     +
2112     + sb = dir->i_sb;
2113     +
2114     + err = is_robranch_super(sb, new_bindex);
2115     + if (err)
2116     + goto out;
2117     +
2118     + /* Create the directory structure above this dentry. */
2119     + new_lower_dentry = create_parents(dir, dentry, name, new_bindex);
2120     + if (IS_ERR(new_lower_dentry)) {
2121     + err = PTR_ERR(new_lower_dentry);
2122     + goto out;
2123     + }
2124     +
2125     + old_lower_dentry = unionfs_lower_dentry_idx(dentry, old_bindex);
2126     + /* we conditionally dput this old_lower_dentry at end of function */
2127     + dget(old_lower_dentry);
2128     +
2129     + /* For symlinks, we must read the link before we lock the directory. */
2130     + if (S_ISLNK(old_lower_dentry->d_inode->i_mode)) {
2131     +
2132     + symbuf = kmalloc(PATH_MAX, GFP_KERNEL);
2133     + if (unlikely(!symbuf)) {
2134     + __clear(dentry, old_lower_dentry,
2135     + old_bstart, old_bend,
2136     + new_lower_dentry, new_bindex);
2137     + err = -ENOMEM;
2138     + goto out_free;
2139     + }
2140     +
2141     + oldfs = get_fs();
2142     + set_fs(KERNEL_DS);
2143     + err = old_lower_dentry->d_inode->i_op->readlink(
2144     + old_lower_dentry,
2145     + (char __user *)symbuf,
2146     + PATH_MAX);
2147     + set_fs(oldfs);
2148     + if (err < 0) {
2149     + __clear(dentry, old_lower_dentry,
2150     + old_bstart, old_bend,
2151     + new_lower_dentry, new_bindex);
2152     + goto out_free;
2153     + }
2154     + symbuf[err] = '\0';
2155     + }
2156     +
2157     + /* Now we lock the parent, and create the object in the new branch. */
2158     + new_lower_parent_dentry = lock_parent(new_lower_dentry);
2159     +
2160     + /* create the new inode */
2161     + err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
2162     + new_lower_parent_dentry, symbuf);
2163     +
2164     + if (err) {
2165     + __clear(dentry, old_lower_dentry,
2166     + old_bstart, old_bend,
2167     + new_lower_dentry, new_bindex);
2168     + goto out_unlock;
2169     + }
2170     +
2171     + /* We actually copyup the file here. */
2172     + if (S_ISREG(old_lower_dentry->d_inode->i_mode))
2173     + err = __copyup_reg_data(dentry, new_lower_dentry, new_bindex,
2174     + old_lower_dentry, old_bindex,
2175     + copyup_file, len);
2176     + if (err)
2177     + goto out_unlink;
2178     +
2179     + /* Set permissions. */
2180     + err = copyup_permissions(sb, old_lower_dentry, new_lower_dentry);
2181     + if (err)
2182     + goto out_unlink;
2183     +
2184     +#ifdef CONFIG_UNION_FS_XATTR
2185     + /* SELinux uses extended attributes for permissions. */
2186     + err = copyup_xattrs(old_lower_dentry, new_lower_dentry);
2187     + if (err)
2188     + goto out_unlink;
2189     +#endif /* CONFIG_UNION_FS_XATTR */
2190     +
2191     + /* do not allow files getting deleted to be re-interposed */
2192     + if (!d_deleted(dentry))
2193     + unionfs_reinterpose(dentry);
2194     +
2195     + goto out_unlock;
2196     +
2197     +out_unlink:
2198     + /*
2199     + * copyup failed, because we possibly ran out of space or
2200     + * quota, or something else happened so let's unlink; we don't
2201     + * really care about the return value of vfs_unlink
2202     + */
2203     + vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
2204     +
2205     + if (copyup_file) {
2206     + /* need to close the file */
2207     +
2208     + fput(*copyup_file);
2209     + branchput(sb, new_bindex);
2210     + }
2211     +
2212     + /*
2213     + * TODO: should we reset the error to something like -EIO?
2214     + *
2215     + * If we don't reset, the user may get some nonsensical errors, but
2216     + * on the other hand, if we reset to EIO, we guarantee that the user
2217     + * will get a "confusing" error message.
2218     + */
2219     +
2220     +out_unlock:
2221     + unlock_dir(new_lower_parent_dentry);
2222     +
2223     +out_free:
2224     + /*
2225     + * If old_lower_dentry was not a file, then we need to dput it. If
2226     + * it was a file, then it was already dput indirectly by other
2227     + * functions we call above which operate on regular files.
2228     + */
2229     + if (old_lower_dentry && old_lower_dentry->d_inode &&
2230     + !S_ISREG(old_lower_dentry->d_inode->i_mode))
2231     + dput(old_lower_dentry);
2232     + kfree(symbuf);
2233     +
2234     + if (err) {
2235     + /*
2236     + * if directory creation succeeded, but inode copyup failed,
2237     + * then purge new dentries.
2238     + */
2239     + if (dbstart(dentry) < old_bstart &&
2240     + ibstart(dentry->d_inode) > dbstart(dentry))
2241     + __clear(dentry, NULL, old_bstart, old_bend,
2242     + unionfs_lower_dentry(dentry), dbstart(dentry));
2243     + goto out;
2244     + }
2245     + if (!S_ISDIR(dentry->d_inode->i_mode)) {
2246     + unionfs_postcopyup_release(dentry);
2247     + if (!unionfs_lower_inode(dentry->d_inode)) {
2248     + /*
2249     + * If we got here, then we copied up to an
2250     + * unlinked-open file, whose name is .unionfsXXXXX.
2251     + */
2252     + struct inode *inode = new_lower_dentry->d_inode;
2253     + atomic_inc(&inode->i_count);
2254     + unionfs_set_lower_inode_idx(dentry->d_inode,
2255     + ibstart(dentry->d_inode),
2256     + inode);
2257     + }
2258     + }
2259     + unionfs_postcopyup_setmnt(dentry);
2260     + /* sync inode times from copied-up inode to our inode */
2261     + unionfs_copy_attr_times(dentry->d_inode);
2262     + unionfs_check_inode(dir);
2263     + unionfs_check_dentry(dentry);
2264     +out:
2265     + return err;
2266     +}
2267     +
2268     +/*
2269     + * This function creates a copy of a file represented by 'file' which
2270     + * currently resides in branch 'bstart' to branch 'new_bindex'. The copy
2271     + * will be named "name".
2272     + */
2273     +int copyup_named_file(struct inode *dir, struct file *file, char *name,
2274     + int bstart, int new_bindex, loff_t len)
2275     +{
2276     + int err = 0;
2277     + struct file *output_file = NULL;
2278     +
2279     + err = copyup_dentry(dir, file->f_path.dentry, bstart, new_bindex,
2280     + name, strlen(name), &output_file, len);
2281     + if (!err) {
2282     + fbstart(file) = new_bindex;
2283     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2284     + }
2285     +
2286     + return err;
2287     +}
2288     +
2289     +/*
2290     + * This function creates a copy of a file represented by 'file' which
2291     + * currently resides in branch 'bstart' to branch 'new_bindex'.
2292     + */
2293     +int copyup_file(struct inode *dir, struct file *file, int bstart,
2294     + int new_bindex, loff_t len)
2295     +{
2296     + int err = 0;
2297     + struct file *output_file = NULL;
2298     + struct dentry *dentry = file->f_path.dentry;
2299     +
2300     + err = copyup_dentry(dir, dentry, bstart, new_bindex,
2301     + dentry->d_name.name, dentry->d_name.len,
2302     + &output_file, len);
2303     + if (!err) {
2304     + fbstart(file) = new_bindex;
2305     + unionfs_set_lower_file_idx(file, new_bindex, output_file);
2306     + }
2307     +
2308     + return err;
2309     +}
2310     +
2311     +/* purge a dentry's lower-branch states (dput/mntput, etc.) */
2312     +static void __cleanup_dentry(struct dentry *dentry, int bindex,
2313     + int old_bstart, int old_bend)
2314     +{
2315     + int loop_start;
2316     + int loop_end;
2317     + int new_bstart = -1;
2318     + int new_bend = -1;
2319     + int i;
2320     +
2321     + loop_start = min(old_bstart, bindex);
2322     + loop_end = max(old_bend, bindex);
2323     +
2324     + /*
2325     + * This loop sets the bstart and bend for the new dentry by
2326     + * traversing from left to right. It also dputs all negative
2327     + * dentries except bindex
2328     + */
2329     + for (i = loop_start; i <= loop_end; i++) {
2330     + if (!unionfs_lower_dentry_idx(dentry, i))
2331     + continue;
2332     +
2333     + if (i == bindex) {
2334     + new_bend = i;
2335     + if (new_bstart < 0)
2336     + new_bstart = i;
2337     + continue;
2338     + }
2339     +
2340     + if (!unionfs_lower_dentry_idx(dentry, i)->d_inode) {
2341     + dput(unionfs_lower_dentry_idx(dentry, i));
2342     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
2343     +
2344     + unionfs_mntput(dentry, i);
2345     + unionfs_set_lower_mnt_idx(dentry, i, NULL);
2346     + } else {
2347     + if (new_bstart < 0)
2348     + new_bstart = i;
2349     + new_bend = i;
2350     + }
2351     + }
2352     +
2353     + if (new_bstart < 0)
2354     + new_bstart = bindex;
2355     + if (new_bend < 0)
2356     + new_bend = bindex;
2357     + dbstart(dentry) = new_bstart;
2358     + dbend(dentry) = new_bend;
2359     +
2360     +}
2361     +
2362     +/* set lower inode ptr and update bstart & bend if necessary */
2363     +static void __set_inode(struct dentry *upper, struct dentry *lower,
2364     + int bindex)
2365     +{
2366     + unionfs_set_lower_inode_idx(upper->d_inode, bindex,
2367     + igrab(lower->d_inode));
2368     + if (likely(ibstart(upper->d_inode) > bindex))
2369     + ibstart(upper->d_inode) = bindex;
2370     + if (likely(ibend(upper->d_inode) < bindex))
2371     + ibend(upper->d_inode) = bindex;
2372     +
2373     +}
2374     +
2375     +/* set lower dentry ptr and update bstart & bend if necessary */
2376     +static void __set_dentry(struct dentry *upper, struct dentry *lower,
2377     + int bindex)
2378     +{
2379     + unionfs_set_lower_dentry_idx(upper, bindex, lower);
2380     + if (likely(dbstart(upper) > bindex))
2381     + dbstart(upper) = bindex;
2382     + if (likely(dbend(upper) < bindex))
2383     + dbend(upper) = bindex;
2384     +}
2385     +
2386     +/*
2387     + * This function replicates the directory structure up to the given dentry
2388     + * in the bindex branch.
2389     + */
2390     +struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
2391     + const char *name, int bindex)
2392     +{
2393     + int err;
2394     + struct dentry *child_dentry;
2395     + struct dentry *parent_dentry;
2396     + struct dentry *lower_parent_dentry = NULL;
2397     + struct dentry *lower_dentry = NULL;
2398     + const char *childname;
2399     + unsigned int childnamelen;
2400     + int nr_dentry;
2401     + int count = 0;
2402     + int old_bstart;
2403     + int old_bend;
2404     + struct dentry **path = NULL;
2405     + struct super_block *sb;
2406     +
2407     + verify_locked(dentry);
2408     +
2409     + err = is_robranch_super(dir->i_sb, bindex);
2410     + if (err) {
2411     + lower_dentry = ERR_PTR(err);
2412     + goto out;
2413     + }
2414     +
2415     + old_bstart = dbstart(dentry);
2416     + old_bend = dbend(dentry);
2417     +
2418     + lower_dentry = ERR_PTR(-ENOMEM);
2419     +
2420     + /* There is no sense allocating any less than the minimum. */
2421     + nr_dentry = 1;
2422     + path = kmalloc(nr_dentry * sizeof(struct dentry *), GFP_KERNEL);
2423     + if (unlikely(!path))
2424     + goto out;
2425     +
2426     + /* assume the negative dentry of unionfs as the parent dentry */
2427     + parent_dentry = dentry;
2428     +
2429     + /*
2430     + * This loop finds the first parent that exists in the given branch.
2431     + * We start building the directory structure from there. At the end
2432     + * of the loop, the following should hold:
2433     + * - child_dentry is the first nonexistent child
2434     + * - parent_dentry is the first existent parent
2435     + * - path[0] is the deepest child
2436     + * - path[count] is the first child to create
2437     + */
2438     + do {
2439     + child_dentry = parent_dentry;
2440     +
2441     + /* find the parent directory dentry in unionfs */
2442     + parent_dentry = dget_parent(child_dentry);
2443     +
2444     + /* find out the lower_parent_dentry in the given branch */
2445     + lower_parent_dentry =
2446     + unionfs_lower_dentry_idx(parent_dentry, bindex);
2447     +
2448     + /* grow path table */
2449     + if (count == nr_dentry) {
2450     + void *p;
2451     +
2452     + nr_dentry *= 2;
2453     + p = krealloc(path, nr_dentry * sizeof(struct dentry *),
2454     + GFP_KERNEL);
2455     + if (unlikely(!p)) {
2456     + lower_dentry = ERR_PTR(-ENOMEM);
2457     + goto out;
2458     + }
2459     + path = p;
2460     + }
2461     +
2462     + /* store the child dentry */
2463     + path[count++] = child_dentry;
2464     + } while (!lower_parent_dentry);
2465     + count--;
2466     +
2467     + sb = dentry->d_sb;
2468     +
2469     + /*
2470     + * This code goes between the begin/end labels and basically
2471     + * emulates a while(child_dentry != dentry), only cleaner and
2472     + * shorter than what would be a much longer while loop.
2473     + */
2474     +begin:
2475     + /* get lower parent dir in the current branch */
2476     + lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
2477     + dput(parent_dentry);
2478     +
2479     + /* init the values to lookup */
2480     + childname = child_dentry->d_name.name;
2481     + childnamelen = child_dentry->d_name.len;
2482     +
2483     + if (child_dentry != dentry) {
2484     + /* lookup child in the underlying file system */
2485     + lower_dentry = lookup_lck_len(childname, lower_parent_dentry,
2486     + childnamelen);
2487     + if (IS_ERR(lower_dentry))
2488     + goto out;
2489     + } else {
2490     + /*
2491     + * Is the name a whiteout of the child name? Look up the
2492     + * whiteout child in the underlying file system
2493     + */
2494     + lower_dentry = lookup_lck_len(name, lower_parent_dentry,
2495     + strlen(name));
2496     + if (IS_ERR(lower_dentry))
2497     + goto out;
2498     +
2499     + /* Replace the current dentry (if any) with the new one */
2500     + dput(unionfs_lower_dentry_idx(dentry, bindex));
2501     + unionfs_set_lower_dentry_idx(dentry, bindex,
2502     + lower_dentry);
2503     +
2504     + __cleanup_dentry(dentry, bindex, old_bstart, old_bend);
2505     + goto out;
2506     + }
2507     +
2508     + if (lower_dentry->d_inode) {
2509     + /*
2510     + * since this already exists we dput to avoid
2511     + * multiple references on the same dentry
2512     + */
2513     + dput(lower_dentry);
2514     + } else {
2515     + struct sioq_args args;
2516     +
2517     + /* it's a negative dentry, create a new dir */
2518     + lower_parent_dentry = lock_parent(lower_dentry);
2519     +
2520     + args.mkdir.parent = lower_parent_dentry->d_inode;
2521     + args.mkdir.dentry = lower_dentry;
2522     + args.mkdir.mode = child_dentry->d_inode->i_mode;
2523     +
2524     + run_sioq(__unionfs_mkdir, &args);
2525     + err = args.err;
2526     +
2527     + if (!err)
2528     + err = copyup_permissions(dir->i_sb, child_dentry,
2529     + lower_dentry);
2530     + unlock_dir(lower_parent_dentry);
2531     + if (err) {
2532     + dput(lower_dentry);
2533     + lower_dentry = ERR_PTR(err);
2534     + goto out;
2535     + }
2536     +
2537     + }
2538     +
2539     + __set_inode(child_dentry, lower_dentry, bindex);
2540     + __set_dentry(child_dentry, lower_dentry, bindex);
2541     + /*
2542     + * update times of this dentry, but also the parent, because if
2543     + * we changed, the parent may have changed too.
2544     + */
2545     + fsstack_copy_attr_times(parent_dentry->d_inode,
2546     + lower_parent_dentry->d_inode);
2547     + unionfs_copy_attr_times(child_dentry->d_inode);
2548     +
2549     + parent_dentry = child_dentry;
2550     + child_dentry = path[--count];
2551     + goto begin;
2552     +out:
2553     + /* clean up any leftover references from the do/while loop above */
2554     + if (IS_ERR(lower_dentry))
2555     + while (count)
2556     + dput(path[count--]);
2557     + kfree(path);
2558     + return lower_dentry;
2559     +}
2560     +
2561     +/*
2562     + * Post-copyup helper to ensure we have valid mnts: set lower mnt of
2563     + * dentry+parents to the first parent node that has an mnt.
2564     + */
2565     +void unionfs_postcopyup_setmnt(struct dentry *dentry)
2566     +{
2567     + struct dentry *parent, *hasone;
2568     + int bindex = dbstart(dentry);
2569     +
2570     + if (unionfs_lower_mnt_idx(dentry, bindex))
2571     + return;
2572     + hasone = dentry->d_parent;
2573     + /* this loop should stop at root dentry */
2574     + while (!unionfs_lower_mnt_idx(hasone, bindex))
2575     + hasone = hasone->d_parent;
2576     + parent = dentry;
2577     + while (!unionfs_lower_mnt_idx(parent, bindex)) {
2578     + unionfs_set_lower_mnt_idx(parent, bindex,
2579     + unionfs_mntget(hasone, bindex));
2580     + parent = parent->d_parent;
2581     + }
2582     +}
2583     +
2584     +/*
2585     + * Post-copyup helper to release all non-directory source objects of a
2586     + * copied-up file. Regular files should have only one lower object.
2587     + */
2588     +void unionfs_postcopyup_release(struct dentry *dentry)
2589     +{
2590     + int bstart, bend;
2591     +
2592     + BUG_ON(S_ISDIR(dentry->d_inode->i_mode));
2593     + bstart = dbstart(dentry);
2594     + bend = dbend(dentry);
2595     +
2596     + path_put_lowers(dentry, bstart + 1, bend, false);
2597     + iput_lowers(dentry->d_inode, bstart + 1, bend, false);
2598     +
2599     + dbend(dentry) = bstart;
2600     + ibend(dentry->d_inode) = ibstart(dentry->d_inode) = bstart;
2601     +}
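The range bookkeeping done by __cleanup_dentry() above can be illustrated outside the kernel. The following is a minimal user-space sketch, not unionfs code: `enum slot_state`, `struct branch_range`, and `recompute_range()` are hypothetical names standing in for the lower-dentry slots and the dbstart()/dbend() values. It assumes that "dropping" a negative entry can be modeled by marking its slot empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lower dentry slot; not a unionfs type. */
enum slot_state { SLOT_EMPTY, SLOT_NEGATIVE, SLOT_POSITIVE };

struct branch_range {
	int bstart;
	int bend;
};

/*
 * Recompute the branch range after a copy-up into branch bindex.
 * Negative slots other than bindex are dropped (marked SLOT_EMPTY),
 * modeling the dput()/mntput() step; positive slots and the bindex
 * slot widen the new range. Falls back to bindex if nothing qualifies.
 */
static struct branch_range recompute_range(enum slot_state *slots,
					   size_t nslots, int bindex,
					   int old_bstart, int old_bend)
{
	struct branch_range r = { -1, -1 };
	int lo = old_bstart < bindex ? old_bstart : bindex;
	int hi = old_bend > bindex ? old_bend : bindex;
	int i;

	assert(lo >= 0 && (size_t)hi < nslots);

	for (i = lo; i <= hi; i++) {
		if (slots[i] == SLOT_EMPTY)
			continue;
		if (i != bindex && slots[i] == SLOT_NEGATIVE) {
			slots[i] = SLOT_EMPTY;	/* release the negative entry */
			continue;
		}
		if (r.bstart < 0)
			r.bstart = i;
		r.bend = i;
	}
	if (r.bstart < 0)
		r.bstart = bindex;
	if (r.bend < 0)
		r.bend = bindex;
	return r;
}
```

For example, with slots {negative, empty, negative, positive}, a copy-up into branch 0 and an old range of 2..3, the sketch releases slot 2 and yields the range 0..3.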
2602     diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
2603     new file mode 100644
2604     index 0000000..6092e69
2605     --- /dev/null
2606     +++ b/fs/unionfs/debug.c
2607     @@ -0,0 +1,548 @@
2608     +/*
2609     + * Copyright (c) 2003-2011 Erez Zadok
2610     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
2611     + * Copyright (c) 2003-2011 Stony Brook University
2612     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
2613     + *
2614     + * This program is free software; you can redistribute it and/or modify
2615     + * it under the terms of the GNU General Public License version 2 as
2616     + * published by the Free Software Foundation.
2617     + */
2618     +
2619     +#include "union.h"
2620     +
2621     +/*
2622     + * Helper debugging functions for maintainers (and for users to report
2623     + * useful information back to maintainers)
2624     + */
2625     +
2626     +/* it's always useful to know what part of the code called us */
2627     +#define PRINT_CALLER(fname, fxn, line) \
2628     + do { \
2629     + if (!printed_caller) { \
2630     + pr_debug("PC:%s:%s:%d\n", (fname), (fxn), (line)); \
2631     + printed_caller = 1; \
2632     + } \
2633     + } while (0)
2634     +
2635     +/*
2636     + * __unionfs_check_{inode,dentry,file} perform exhaustive sanity checking on
2637     + * the fan-out of various Unionfs objects. We check that no lower objects
2638     + * exist outside the start/end branch range; that all objects within are
2639     + * non-NULL (with some allowed exceptions); that for every lower file
2640     + * there's a lower dentry+inode; that the start/end ranges match for all
2641     + * corresponding lower objects; that open files/symlinks have only one lower
2642     + * object, but directories can have several; and more.
2643     + */
2644     +void __unionfs_check_inode(const struct inode *inode,
2645     + const char *fname, const char *fxn, int line)
2646     +{
2647     + int bindex;
2648     + int istart, iend;
2649     + struct inode *lower_inode;
2650     + struct super_block *sb;
2651     + int printed_caller = 0;
2652     + void *poison_ptr;
2653     +
2654     + /* for inodes now */
2655     + BUG_ON(!inode);
2656     + sb = inode->i_sb;
2657     + istart = ibstart(inode);
2658     + iend = ibend(inode);
2659     + /* don't check inode if no lower branches */
2660     + if (istart < 0 && iend < 0)
2661     + return;
2662     + if (unlikely(istart > iend)) {
2663     + PRINT_CALLER(fname, fxn, line);
2664     + pr_debug(" Ci0: inode=%p istart/end=%d:%d\n",
2665     + inode, istart, iend);
2666     + }
2667     + if (unlikely((istart == -1 && iend != -1) ||
2668     + (istart != -1 && iend == -1))) {
2669     + PRINT_CALLER(fname, fxn, line);
2670     + pr_debug(" Ci1: inode=%p istart/end=%d:%d\n",
2671     + inode, istart, iend);
2672     + }
2673     + if (!S_ISDIR(inode->i_mode)) {
2674     + if (unlikely(iend != istart)) {
2675     + PRINT_CALLER(fname, fxn, line);
2676     + pr_debug(" Ci2: inode=%p istart=%d iend=%d\n",
2677     + inode, istart, iend);
2678     + }
2679     + }
2680     +
2681     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2682     + if (unlikely(!UNIONFS_I(inode))) {
2683     + PRINT_CALLER(fname, fxn, line);
2684     + pr_debug(" Ci3: no inode_info %p\n", inode);
2685     + return;
2686     + }
2687     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
2688     + PRINT_CALLER(fname, fxn, line);
2689     + pr_debug(" Ci4: no lower_inodes %p\n", inode);
2690     + return;
2691     + }
2692     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2693     + if (lower_inode) {
2694     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2695     + if (unlikely(bindex < istart || bindex > iend)) {
2696     + PRINT_CALLER(fname, fxn, line);
2697     + pr_debug(" Ci5: inode/linode=%p:%p bindex=%d "
2698     + "istart/end=%d:%d\n", inode,
2699     + lower_inode, bindex, istart, iend);
2700     + } else if (unlikely(lower_inode == poison_ptr)) {
2701     + /* freed inode! */
2702     + PRINT_CALLER(fname, fxn, line);
2703     + pr_debug(" Ci6: inode/linode=%p:%p bindex=%d "
2704     + "istart/end=%d:%d\n", inode,
2705     + lower_inode, bindex, istart, iend);
2706     + }
2707     + continue;
2708     + }
2709     + /* if we get here, then lower_inode == NULL */
2710     + if (bindex < istart || bindex > iend)
2711     + continue;
2712     + /*
2713     + * directories can have NULL lower inodes in b/t start/end,
2714     + * but NOT if at the start/end range.
2715     + */
2716     + if (unlikely(S_ISDIR(inode->i_mode) &&
2717     + bindex > istart && bindex < iend))
2718     + continue;
2719     + PRINT_CALLER(fname, fxn, line);
2720     + pr_debug(" Ci7: inode/linode=%p:%p "
2721     + "bindex=%d istart/end=%d:%d\n",
2722     + inode, lower_inode, bindex, istart, iend);
2723     + }
2724     +}
2725     +
2726     +void __unionfs_check_dentry(const struct dentry *dentry,
2727     + const char *fname, const char *fxn, int line)
2728     +{
2729     + int bindex;
2730     + int dstart, dend, istart, iend;
2731     + struct dentry *lower_dentry;
2732     + struct inode *inode, *lower_inode;
2733     + struct super_block *sb;
2734     + struct vfsmount *lower_mnt;
2735     + int printed_caller = 0;
2736     + void *poison_ptr;
2737     +
2738     + BUG_ON(!dentry);
2739     + sb = dentry->d_sb;
2740     + inode = dentry->d_inode;
2741     + dstart = dbstart(dentry);
2742     + dend = dbend(dentry);
2743     + /* don't check dentry/mnt if no lower branches */
2744     + if (dstart < 0 && dend < 0)
2745     + goto check_inode;
2746     + BUG_ON(dstart > dend);
2747     +
2748     + if (unlikely((dstart == -1 && dend != -1) ||
2749     + (dstart != -1 && dend == -1))) {
2750     + PRINT_CALLER(fname, fxn, line);
2751     + pr_debug(" CD0: dentry=%p dstart/end=%d:%d\n",
2752     + dentry, dstart, dend);
2753     + }
2754     + /*
2755     + * check for NULL dentries inside the start/end range, or
2756     + * non-NULL dentries outside the start/end range.
2757     + */
2758     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2759     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
2760     + if (lower_dentry) {
2761     + if (unlikely(bindex < dstart || bindex > dend)) {
2762     + PRINT_CALLER(fname, fxn, line);
2763     + pr_debug(" CD1: dentry/lower=%p:%p(%p) "
2764     + "bindex=%d dstart/end=%d:%d\n",
2765     + dentry, lower_dentry,
2766     + (lower_dentry ? lower_dentry->d_inode :
2767     + (void *) -1L),
2768     + bindex, dstart, dend);
2769     + }
2770     + } else { /* lower_dentry == NULL */
2771     + if (bindex < dstart || bindex > dend)
2772     + continue;
2773     + /*
2774     + * Directories can have NULL lower dentries in b/t
2775     + * start/end, but NOT if at the start/end range.
2776     + * Ignore this rule, however, if this is a NULL
2777     + * dentry or a deleted dentry.
2778     + */
2779     + if (unlikely(!d_deleted((struct dentry *) dentry) &&
2780     + inode &&
2781     + !(inode && S_ISDIR(inode->i_mode) &&
2782     + bindex > dstart && bindex < dend))) {
2783     + PRINT_CALLER(fname, fxn, line);
2784     + pr_debug(" CD2: dentry/lower=%p:%p(%p) "
2785     + "bindex=%d dstart/end=%d:%d\n",
2786     + dentry, lower_dentry,
2787     + (lower_dentry ?
2788     + lower_dentry->d_inode :
2789     + (void *) -1L),
2790     + bindex, dstart, dend);
2791     + }
2792     + }
2793     + }
2794     +
2795     + /* check for vfsmounts same as for dentries */
2796     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2797     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2798     + if (lower_mnt) {
2799     + if (unlikely(bindex < dstart || bindex > dend)) {
2800     + PRINT_CALLER(fname, fxn, line);
2801     + pr_debug(" CM0: dentry/lmnt=%p:%p bindex=%d "
2802     + "dstart/end=%d:%d\n", dentry,
2803     + lower_mnt, bindex, dstart, dend);
2804     + }
2805     + } else { /* lower_mnt == NULL */
2806     + if (bindex < dstart || bindex > dend)
2807     + continue;
2808     + /*
2809     + * Directories can have NULL lower mnts in b/t
2810     + * start/end, but NOT if at the start/end range.
2811     + * Ignore this rule, however, if this is a NULL
2812     + * dentry.
2813     + */
2814     + if (unlikely(inode &&
2815     + !(inode && S_ISDIR(inode->i_mode) &&
2816     + bindex > dstart && bindex < dend))) {
2817     + PRINT_CALLER(fname, fxn, line);
2818     + pr_debug(" CM1: dentry/lmnt=%p:%p "
2819     + "bindex=%d dstart/end=%d:%d\n",
2820     + dentry, lower_mnt, bindex,
2821     + dstart, dend);
2822     + }
2823     + }
2824     + }
2825     +
2826     +check_inode:
2827     + /* for inodes now */
2828     + if (!inode)
2829     + return;
2830     + istart = ibstart(inode);
2831     + iend = ibend(inode);
2832     + /* don't check inode if no lower branches */
2833     + if (istart < 0 && iend < 0)
2834     + return;
2835     + BUG_ON(istart > iend);
2836     + if (unlikely((istart == -1 && iend != -1) ||
2837     + (istart != -1 && iend == -1))) {
2838     + PRINT_CALLER(fname, fxn, line);
2839     + pr_debug(" CI0: dentry/inode=%p:%p istart/end=%d:%d\n",
2840     + dentry, inode, istart, iend);
2841     + }
2842     + if (unlikely(istart != dstart)) {
2843     + PRINT_CALLER(fname, fxn, line);
2844     + pr_debug(" CI1: dentry/inode=%p:%p istart=%d dstart=%d\n",
2845     + dentry, inode, istart, dstart);
2846     + }
2847     + if (unlikely(iend != dend)) {
2848     + PRINT_CALLER(fname, fxn, line);
2849     + pr_debug(" CI2: dentry/inode=%p:%p iend=%d dend=%d\n",
2850     + dentry, inode, iend, dend);
2851     + }
2852     +
2853     + if (!S_ISDIR(inode->i_mode)) {
2854     + if (unlikely(dend != dstart)) {
2855     + PRINT_CALLER(fname, fxn, line);
2856     + pr_debug(" CI3: dentry/inode=%p:%p dstart=%d dend=%d\n",
2857     + dentry, inode, dstart, dend);
2858     + }
2859     + if (unlikely(iend != istart)) {
2860     + PRINT_CALLER(fname, fxn, line);
2861     + pr_debug(" CI4: dentry/inode=%p:%p istart=%d iend=%d\n",
2862     + dentry, inode, istart, iend);
2863     + }
2864     + }
2865     +
2866     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
2867     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2868     + if (lower_inode) {
2869     + memset(&poison_ptr, POISON_INUSE, sizeof(void *));
2870     + if (unlikely(bindex < istart || bindex > iend)) {
2871     + PRINT_CALLER(fname, fxn, line);
2872     + pr_debug(" CI5: dentry/linode=%p:%p bindex=%d "
2873     + "istart/end=%d:%d\n", dentry,
2874     + lower_inode, bindex, istart, iend);
2875     + } else if (unlikely(lower_inode == poison_ptr)) {
2876     + /* freed inode! */
2877     + PRINT_CALLER(fname, fxn, line);
2878     + pr_debug(" CI6: dentry/linode=%p:%p bindex=%d "
2879     + "istart/end=%d:%d\n", dentry,
2880     + lower_inode, bindex, istart, iend);
2881     + }
2882     + continue;
2883     + }
2884     + /* if we get here, then lower_inode == NULL */
2885     + if (bindex < istart || bindex > iend)
2886     + continue;
2887     + /*
2888     + * directories can have NULL lower inodes in b/t start/end,
2889     + * but NOT if at the start/end range.
2890     + */
2891     + if (unlikely(S_ISDIR(inode->i_mode) &&
2892     + bindex > istart && bindex < iend))
2893     + continue;
2894     + PRINT_CALLER(fname, fxn, line);
2895     + pr_debug(" CI7: dentry/linode=%p:%p "
2896     + "bindex=%d istart/end=%d:%d\n",
2897     + dentry, lower_inode, bindex, istart, iend);
2898     + }
2899     +
2900     + /*
2901     + * If it's a directory, then intermediate objects b/t start/end can
2902     + * be NULL. But, check that all three are NULL: lower dentry, mnt,
2903     + * and inode.
2904     + */
2905     + if (dstart >= 0 && dend >= 0 && S_ISDIR(inode->i_mode))
2906     + for (bindex = dstart+1; bindex < dend; bindex++) {
2907     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2908     + lower_dentry = unionfs_lower_dentry_idx(dentry,
2909     + bindex);
2910     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
2911     + if (unlikely(!((lower_inode && lower_dentry &&
2912     + lower_mnt) ||
2913     + (!lower_inode &&
2914     + !lower_dentry && !lower_mnt)))) {
2915     + PRINT_CALLER(fname, fxn, line);
2916     + pr_debug(" Cx: lmnt/ldentry/linode=%p:%p:%p "
2917     + "bindex=%d dstart/end=%d:%d\n",
2918     + lower_mnt, lower_dentry, lower_inode,
2919     + bindex, dstart, dend);
2920     + }
2921     + }
2922     + /* check if lower inode is newer than upper one (it shouldn't be) */
2923     + if (unlikely(is_newer_lower(dentry) && !is_negative_lower(dentry))) {
2924     + PRINT_CALLER(fname, fxn, line);
2925     + for (bindex = ibstart(inode); bindex <= ibend(inode);
2926     + bindex++) {
2927     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
2928     + if (unlikely(!lower_inode))
2929     + continue;
2930     + pr_debug(" CI8: bindex=%d mtime/lmtime=%lu.%lu/%lu.%lu "
2931     + "ctime/lctime=%lu.%lu/%lu.%lu\n",
2932     + bindex,
2933     + inode->i_mtime.tv_sec,
2934     + inode->i_mtime.tv_nsec,
2935     + lower_inode->i_mtime.tv_sec,
2936     + lower_inode->i_mtime.tv_nsec,
2937     + inode->i_ctime.tv_sec,
2938     + inode->i_ctime.tv_nsec,
2939     + lower_inode->i_ctime.tv_sec,
2940     + lower_inode->i_ctime.tv_nsec);
2941     + }
2942     + }
2943     +}
2944     +
2945     +void __unionfs_check_file(const struct file *file,
2946     + const char *fname, const char *fxn, int line)
2947     +{
2948     + int bindex;
2949     + int dstart, dend, fstart, fend;
2950     + struct dentry *dentry;
2951     + struct file *lower_file;
2952     + struct inode *inode;
2953     + struct super_block *sb;
2954     + int printed_caller = 0;
2955     +
2956     + BUG_ON(!file);
2957     + dentry = file->f_path.dentry;
2958     + sb = dentry->d_sb;
2959     + dstart = dbstart(dentry);
2960     + dend = dbend(dentry);
2961     + BUG_ON(dstart > dend);
2962     + fstart = fbstart(file);
2963     + fend = fbend(file);
2964     + BUG_ON(fstart > fend);
2965     +
2966     + if (unlikely((fstart == -1 && fend != -1) ||
2967     + (fstart != -1 && fend == -1))) {
2968     + PRINT_CALLER(fname, fxn, line);
2969     + pr_debug(" CF0: file/dentry=%p:%p fstart/end=%d:%d\n",
2970     + file, dentry, fstart, fend);
2971     + }
2972     + if (unlikely(fstart != dstart)) {
2973     + PRINT_CALLER(fname, fxn, line);
2974     + pr_debug(" CF1: file/dentry=%p:%p fstart=%d dstart=%d\n",
2975     + file, dentry, fstart, dstart);
2976     + }
2977     + if (unlikely(fend != dend)) {
2978     + PRINT_CALLER(fname, fxn, line);
2979     + pr_debug(" CF2: file/dentry=%p:%p fend=%d dend=%d\n",
2980     + file, dentry, fend, dend);
2981     + }
2982     + inode = dentry->d_inode;
2983     + if (!S_ISDIR(inode->i_mode)) {
2984     + if (unlikely(fend != fstart)) {
2985     + PRINT_CALLER(fname, fxn, line);
2986     + pr_debug(" CF3: file/inode=%p:%p fstart=%d fend=%d\n",
2987     + file, inode, fstart, fend);
2988     + }
2989     + if (unlikely(dend != dstart)) {
2990     + PRINT_CALLER(fname, fxn, line);
2991     + pr_debug(" CF4: file/dentry=%p:%p dstart=%d dend=%d\n",
2992     + file, dentry, dstart, dend);
2993     + }
2994     + }
2995     +
2996     + /*
2997     + * check for NULL files inside the start/end range, or
2998     + * non-NULL files outside the start/end range.
2999     + */
3000     + for (bindex = sbstart(sb); bindex < sbmax(sb); bindex++) {
3001     + lower_file = unionfs_lower_file_idx(file, bindex);
3002     + if (lower_file) {
3003     + if (unlikely(bindex < fstart || bindex > fend)) {
3004     + PRINT_CALLER(fname, fxn, line);
3005     + pr_debug(" CF5: file/lower=%p:%p bindex=%d "
3006     + "fstart/end=%d:%d\n", file,
3007     + lower_file, bindex, fstart, fend);
3008     + }
3009     + } else { /* lower_file == NULL */
3010     + if (bindex >= fstart && bindex <= fend) {
3011     + /*
3012     + * directories can have NULL lower files in
3013     + * b/t start/end, but NOT if at the
3014     + * start/end range.
3015     + */
3016     + if (unlikely(!(S_ISDIR(inode->i_mode) &&
3017     + bindex > fstart &&
3018     + bindex < fend))) {
3019     + PRINT_CALLER(fname, fxn, line);
3020     + pr_debug(" CF6: file/lower=%p:%p "
3021     + "bindex=%d fstart/end=%d:%d\n",
3022     + file, lower_file, bindex,
3023     + fstart, fend);
3024     + }
3025     + }
3026     + }
3027     + }
3028     +
3029     + __unionfs_check_dentry(dentry, fname, fxn, line);
3030     +}
3031     +
3032     +void __unionfs_check_nd(const struct nameidata *nd,
3033     + const char *fname, const char *fxn, int line)
3034     +{
3035     + struct file *file;
3036     + int printed_caller = 0;
3037     +
3038     + if (unlikely(!nd))
3039     + return;
3040     + if (nd->flags & LOOKUP_OPEN) {
3041     + file = nd->intent.open.file;
3042     + if (unlikely(file->f_path.dentry &&
3043     + strcmp(file->f_path.dentry->d_sb->s_type->name,
3044     + UNIONFS_NAME))) {
3045     + PRINT_CALLER(fname, fxn, line);
3046     + pr_debug(" CND1: lower_file of type %s\n",
3047     + file->f_path.dentry->d_sb->s_type->name);
3048     + }
3049     + }
3050     +}
3051     +
3052     +static unsigned int __mnt_get_count(struct vfsmount *mnt)
3053     +{
3054     +#ifdef CONFIG_SMP
3055     + unsigned int count = 0;
3056     + int cpu;
3057     +
3058     + for_each_possible_cpu(cpu) {
3059     + count += per_cpu_ptr(mnt->mnt_pcp, cpu)->mnt_count;
3060     + }
3061     +
3062     + return count;
3063     +#else
3064     + return mnt->mnt_count;
3065     +#endif
3066     +}
3067     +
3068     +/* useful to track vfsmount leaks that could cause EBUSY on unmount */
3069     +void __show_branch_counts(const struct super_block *sb,
3070     + const char *file, const char *fxn, int line)
3071     +{
3072     + int i;
3073     + struct vfsmount *mnt;
3074     +
3075     + pr_debug("BC:");
3076     + for (i = 0; i < sbmax(sb); i++) {
3077     + if (likely(sb->s_root))
3078     + mnt = UNIONFS_D(sb->s_root)->lower_paths[i].mnt;
3079     + else
3080     + mnt = NULL;
3081     + printk(KERN_CONT "%d:",
3082     + (mnt ? __mnt_get_count(mnt) : -99));
3083     + }
3084     + printk(KERN_CONT "%s:%s:%d\n", file, fxn, line);
3085     +}
3086     +
3087     +void __show_inode_times(const struct inode *inode,
3088     + const char *file, const char *fxn, int line)
3089     +{
3090     + struct inode *lower_inode;
3091     + int bindex;
3092     +
3093     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3094     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3095     + if (unlikely(!lower_inode))
3096     + continue;
3097     + pr_debug("IT(%lu:%d): %s:%s:%d "
3098     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3099     + inode->i_ino, bindex,
3100     + file, fxn, line,
3101     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3102     + lower_inode->i_mtime.tv_sec,
3103     + lower_inode->i_mtime.tv_nsec,
3104     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3105     + lower_inode->i_ctime.tv_sec,
3106     + lower_inode->i_ctime.tv_nsec);
3107     + }
3108     +}
3109     +
3110     +void __show_dinode_times(const struct dentry *dentry,
3111     + const char *file, const char *fxn, int line)
3112     +{
3113     + struct inode *inode = dentry->d_inode;
3114     + struct inode *lower_inode;
3115     + int bindex;
3116     +
3117     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3118     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3119     + if (!lower_inode)
3120     + continue;
3121     + pr_debug("DT(%s:%lu:%d): %s:%s:%d "
3122     + "um=%lu/%lu lm=%lu/%lu uc=%lu/%lu lc=%lu/%lu\n",
3123     + dentry->d_name.name, inode->i_ino, bindex,
3124     + file, fxn, line,
3125     + inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec,
3126     + lower_inode->i_mtime.tv_sec,
3127     + lower_inode->i_mtime.tv_nsec,
3128     + inode->i_ctime.tv_sec, inode->i_ctime.tv_nsec,
3129     + lower_inode->i_ctime.tv_sec,
3130     + lower_inode->i_ctime.tv_nsec);
3131     + }
3132     +}
3133     +
3134     +void __show_inode_counts(const struct inode *inode,
3135     + const char *file, const char *fxn, int line)
3136     +{
3137     + struct inode *lower_inode;
3138     + int bindex;
3139     +
3140     + if (unlikely(!inode)) {
3141     + pr_debug("SiC: Null inode\n");
3142     + return;
3143     + }
3144     + for (bindex = sbstart(inode->i_sb); bindex <= sbend(inode->i_sb);
3145     + bindex++) {
3146     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3147     + if (unlikely(!lower_inode))
3148     + continue;
3149     + pr_debug("SIC(%lu:%d:%d): lc=%d %s:%s:%d\n",
3150     + inode->i_ino, bindex,
3151     + atomic_read(&(inode)->i_count),
3152     + atomic_read(&(lower_inode)->i_count),
3153     + file, fxn, line);
3154     + }
3155     +}
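The per-branch rule that the __unionfs_check_* helpers above apply to inodes, dentries, mnts, and files can be summarized in one loop. This is a hedged user-space sketch, not the kernel code: `count_fanout_violations()` and its parameters are hypothetical, and it counts violations instead of printing them with pr_debug(). The rule it models is that a lower object outside [start, end] is wrong, a missing object at either endpoint is wrong, and a missing object strictly between the endpoints is tolerated only for directories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count violations of the fan-out invariant for one object.
 * lower[i] is non-NULL when branch i holds a lower object.
 */
static int count_fanout_violations(void *const *lower, int nbranches,
				   int start, int end, bool is_dir)
{
	int violations = 0;
	int i;

	assert(start <= end);	/* the kernel code uses BUG_ON() here */

	for (i = 0; i < nbranches; i++) {
		bool in_range = (i >= start && i <= end);

		if (lower[i]) {
			if (!in_range)
				violations++;	/* object outside the range */
			continue;
		}
		if (!in_range)
			continue;	/* NULL outside the range is fine */
		/* only directories may have holes strictly inside the range */
		if (is_dir && i > start && i < end)
			continue;
		violations++;	/* hole at an endpoint, or in a non-dir */
	}
	return violations;
}
```

A directory with lower objects in branches 0 and 2 and a range of 0..2 passes cleanly, while the same layout on a regular file is flagged once for the hole at branch 1.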
3156     diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
3157     new file mode 100644
3158     index 0000000..c0205a4
3159     --- /dev/null
3160     +++ b/fs/unionfs/dentry.c
3161     @@ -0,0 +1,406 @@
3162     +/*
3163     + * Copyright (c) 2003-2011 Erez Zadok
3164     + * Copyright (c) 2003-2006 Charles P. Wright
3165     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3166     + * Copyright (c) 2005-2006 Junjiro Okajima
3167     + * Copyright (c) 2005 Arun M. Krishnakumar
3168     + * Copyright (c) 2004-2006 David P. Quigley
3169     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3170     + * Copyright (c) 2003 Puja Gupta
3171     + * Copyright (c) 2003 Harikesavan Krishnan
3172     + * Copyright (c) 2003-2011 Stony Brook University
3173     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
3174     + *
3175     + * This program is free software; you can redistribute it and/or modify
3176     + * it under the terms of the GNU General Public License version 2 as
3177     + * published by the Free Software Foundation.
3178     + */
3179     +
3180     +#include "union.h"
3181     +
3182     +bool is_negative_lower(const struct dentry *dentry)
3183     +{
3184     + int bindex;
3185     + struct dentry *lower_dentry;
3186     +
3187     + BUG_ON(!dentry);
3188     + /* cache coherency: check if file was deleted on lower branch */
3189     + if (dbstart(dentry) < 0)
3190     + return true;
3191     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
3192     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3193     + /* unhashed (i.e., unlinked) lower dentries don't count */
3194     + if (lower_dentry && lower_dentry->d_inode &&
3195     + !d_deleted(lower_dentry) &&
3196     + !(lower_dentry->d_flags & DCACHE_NFSFS_RENAMED))
3197     + return false;
3198     + }
3199     + return true;
3200     +}
3201     +
3202     +static inline void __dput_lowers(struct dentry *dentry, int start, int end)
3203     +{
3204     + struct dentry *lower_dentry;
3205     + int bindex;
3206     +
3207     + if (start < 0)
3208     + return;
3209     + for (bindex = start; bindex <= end; bindex++) {
3210     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3211     + if (!lower_dentry)
3212     + continue;
3213     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
3214     + dput(lower_dentry);
3215     + }
3216     +}
3217     +
3218     +/*
3219     + * Purge and invalidate as many data pages of a unionfs inode as possible.
3220     + * This is called when the lower inode has changed, and we want to force
3221     + * processes to re-get the new data.
3222     + */
3223     +static inline void purge_inode_data(struct inode *inode)
3224     +{
3225     + /* remove all non-private mappings */
3226     + unmap_mapping_range(inode->i_mapping, 0, 0, 0);
3227     + /* invalidate as many pages as possible */
3228     + invalidate_mapping_pages(inode->i_mapping, 0, -1);
3229     + /*
3230     + * Don't try to truncate_inode_pages here, because this could lead
3231     + * to a deadlock between some of address_space ops and dentry
3232     + * revalidation: the address space op is invoked with a lock on our
3233     + * own page, and truncate_inode_pages will block on locked pages.
3234     + */
3235     +}
3236     +
3237     +/*
3238     + * Revalidate a single file/symlink/special dentry. Assume that info nodes
3239     + * of the @dentry and its @parent are locked. Assume parent is valid,
3240     + * otherwise return false (and let's hope the VFS will try to re-lookup this
3241     + * dentry). Returns true if valid, false otherwise.
3242     + */
3243     +bool __unionfs_d_revalidate(struct dentry *dentry, struct dentry *parent,
3244     + bool willwrite)
3245     +{
3246     + bool valid = true; /* default is valid */
3247     + struct dentry *lower_dentry;
3248     + struct dentry *result;
3249     + int bindex, bstart, bend;
3250     + int sbgen, dgen, pdgen;
3251     + int positive = 0;
3252     + int interpose_flag;
3253     +
3254     + verify_locked(dentry);
3255     + verify_locked(parent);
3256     +
3257     + /* if the dentry is unhashed, do NOT revalidate */
3258     + if (d_deleted(dentry))
3259     + goto out;
3260     +
3261     + dgen = atomic_read(&UNIONFS_D(dentry)->generation);
3262     +
3263     + if (is_newer_lower(dentry)) {
3264     + /* root dentry is always valid */
3265     + if (IS_ROOT(dentry)) {
3266     + unionfs_copy_attr_times(dentry->d_inode);
3267     + } else {
3268     + /*
3269     + * reset generation number to zero, guaranteed to be
3270     + * "old"
3271     + */
3272     + dgen = 0;
3273     + atomic_set(&UNIONFS_D(dentry)->generation, dgen);
3274     + }
3275     + if (!willwrite)
3276     + purge_inode_data(dentry->d_inode);
3277     + }
3278     +
3279     + sbgen = atomic_read(&UNIONFS_SB(dentry->d_sb)->generation);
3280     +
3281     + BUG_ON(dbstart(dentry) == -1);
3282     + if (dentry->d_inode)
3283     + positive = 1;
3284     +
3285     + /* if our dentry is valid, then validate all lower ones */
3286     + if (sbgen == dgen)
3287     + goto validate_lowers;
3288     +
3289     + /* The root entry should always be valid */
3290     + BUG_ON(IS_ROOT(dentry));
3291     +
3292     + /* We can't work correctly if our parent isn't valid. */
3293     + pdgen = atomic_read(&UNIONFS_D(parent)->generation);
3294     +
3295     + /* Free the pointers for our inodes and this dentry. */
3296     + path_put_lowers_all(dentry, false);
3297     +
3298     + interpose_flag = INTERPOSE_REVAL_NEG;
3299     + if (positive) {
3300     + interpose_flag = INTERPOSE_REVAL;
3301     + iput_lowers_all(dentry->d_inode, true);
3302     + }
3303     +
3304     + if (realloc_dentry_private_data(dentry) != 0) {
3305     + valid = false;
3306     + goto out;
3307     + }
3308     +
3309     + result = unionfs_lookup_full(dentry, parent, interpose_flag);
3310     + if (result) {
3311     + if (IS_ERR(result)) {
3312     + valid = false;
3313     + goto out;
3314     + }
3315     + /*
3316     + * current unionfs_lookup_backend() doesn't return
3317     + * a valid dentry
3318     + */
3319     + dput(dentry);
3320     + dentry = result;
3321     + }
3322     +
3323     + if (unlikely(positive && is_negative_lower(dentry))) {
3324     + /* call make_bad_inode here ? */
3325     + d_drop(dentry);
3326     + valid = false;
3327     + goto out;
3328     + }
3329     +
3330     + /*
3331     + * if we got here then we have revalidated our dentry and all lower
3332     + * ones, so we can return safely.
3333     + */
3334     + if (!valid) /* lower dentry revalidation failed */
3335     + goto out;
3336     +
3337     + /*
3338     + * If the parent's gen no. matches the superblock's gen no., then
3339     + * we can update our dentry's gen no. If they didn't match, then it
3340     + * was OK to revalidate this dentry with a stale parent, but we'll
3341     + * purposely not update our dentry's gen no. (so it can be redone);
3342     + * and, we'll mark our parent dentry as invalid so it'll force it
3343     + * (and our dentry) to be revalidated.
3344     + */
3345     + if (pdgen == sbgen)
3346     + atomic_set(&UNIONFS_D(dentry)->generation, sbgen);
3347     + goto out;
3348     +
3349     +validate_lowers:
3350     +
3351     + /* The revalidation must occur across all branches */
3352     + bstart = dbstart(dentry);
3353     + bend = dbend(dentry);
3354     + BUG_ON(bstart == -1);
3355     + for (bindex = bstart; bindex <= bend; bindex++) {
3356     + int err;
3357     + struct nameidata lower_nd;
3358     +
3359     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3360     + if (!lower_dentry || !lower_dentry->d_op
3361     + || !lower_dentry->d_op->d_revalidate)
3362     + continue;
3363     + /*
3364     + * Don't pass nameidata to lower file system, because we
3365     + * don't want an arbitrary lower file being opened or
3366     + * returned to us: it may be useless to us because of the
3367     + * fanout nature of unionfs (cf. file/directory open-file
3368     + * invariants). We will open lower files as and when needed
3369     + * later on.
3370     + */
3371     + err = init_lower_nd(&lower_nd, LOOKUP_OPEN);
3372     + if (unlikely(err < 0)) {
3373     + valid = false;
3374     + break;
3375     + }
3376     + if (!lower_dentry->d_op->d_revalidate(lower_dentry, &lower_nd))
3377     + valid = false;
3378     + release_lower_nd(&lower_nd, err);
3379     + }
3380     +
3381     + if (!dentry->d_inode ||
3382     + ibstart(dentry->d_inode) < 0 ||
3383     + ibend(dentry->d_inode) < 0) {
3384     + valid = false;
3385     + goto out;
3386     + }
3387     +
3388     + if (valid) {
3389     + /*
3390     + * If we get here, and we copy the meta-data from the lower
3391     + * inode to our inode, then it is vital that we have already
3392     + * purged all unionfs-level file data. We do that earlier in
3393     + * this function (__unionfs_d_revalidate) by calling
3394     + * purge_inode_data.
3395     + */
3396     + unionfs_copy_attr_all(dentry->d_inode,
3397     + unionfs_lower_inode(dentry->d_inode));
3398     + fsstack_copy_inode_size(dentry->d_inode,
3399     + unionfs_lower_inode(dentry->d_inode));
3400     + }
3401     +
3402     +out:
3403     + return valid;
3404     +}
3405     +
3406     +/*
3407     + * Determine if the lower inode objects have changed from below the unionfs
3408     + * inode. Return true if changed, false otherwise.
3409     + *
3410     + * We check if the mtime or ctime have changed. However, the inode times
3411     + * can be changed by anyone without much protection, including
3412     + * asynchronously. This can sometimes cause unionfs to find that the lower
3413     + * file system doesn't change its inode times quickly enough, resulting in a
3414     + * false positive indication (which is harmless, it just makes unionfs do
3415     + * extra work in re-validating the objects). To minimize the chances of
3416     + * these situations, we still consider such small time changes valid, but we
3417     + * don't print debugging messages unless the time changes are greater than
3418     + * UNIONFS_MIN_CC_TIME (which defaults to 3 seconds, as with NFS's acregmin)
3419     + * because significant changes are more likely due to users manually
3420     + * touching lower files.
3421     + */
3422     +bool is_newer_lower(const struct dentry *dentry)
3423     +{
3424     + int bindex;
3425     + struct inode *inode;
3426     + struct inode *lower_inode;
3427     +
3428     + /* ignore if we're called on semi-initialized dentries/inodes */
3429     + if (!dentry || !UNIONFS_D(dentry))
3430     + return false;
3431     + inode = dentry->d_inode;
3432     + if (!inode || !UNIONFS_I(inode)->lower_inodes ||
3433     + ibstart(inode) < 0 || ibend(inode) < 0)
3434     + return false;
3435     +
3436     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
3437     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
3438     + if (!lower_inode)
3439     + continue;
3440     +
3441     + /* check if mtime/ctime have changed */
3442     + if (unlikely(timespec_compare(&inode->i_mtime,
3443     + &lower_inode->i_mtime) < 0)) {
3444     + if ((lower_inode->i_mtime.tv_sec -
3445     + inode->i_mtime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3446     + pr_info("unionfs: new lower inode mtime "
3447     + "(bindex=%d, name=%s)\n", bindex,
3448     + dentry->d_name.name);
3449     + show_dinode_times(dentry);
3450     + }
3451     + return true;
3452     + }
3453     + if (unlikely(timespec_compare(&inode->i_ctime,
3454     + &lower_inode->i_ctime) < 0)) {
3455     + if ((lower_inode->i_ctime.tv_sec -
3456     + inode->i_ctime.tv_sec) > UNIONFS_MIN_CC_TIME) {
3457     + pr_info("unionfs: new lower inode ctime "
3458     + "(bindex=%d, name=%s)\n", bindex,
3459     + dentry->d_name.name);
3460     + show_dinode_times(dentry);
3461     + }
3462     + return true;
3463     + }
3464     + }
3465     +
3466     + /*
3467     + * Last check: if this is a positive dentry, but somehow all lower
3468     + * dentries are negative or unhashed, then this dentry needs to be
3469     + * revalidated, because someone probably deleted the objects from
3470     + * the lower branches directly.
3471     + */
3472     + if (is_negative_lower(dentry))
3473     + return true;
3474     +
3475     + return false; /* default: lower is not newer */
3476     +}
3477     +
3478     +static int unionfs_d_revalidate(struct dentry *dentry,
3479     + struct nameidata *nd_unused)
3480     +{
3481     + bool valid = true;
3482     + int err = 1; /* 1 means valid for the VFS */
3483     + struct dentry *parent;
3484     +
3485     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3486     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3487     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3488     +
3489     + valid = __unionfs_d_revalidate(dentry, parent, false);
3490     + if (valid) {
3491     + unionfs_postcopyup_setmnt(dentry);
3492     + unionfs_check_dentry(dentry);
3493     + } else {
3494     + d_drop(dentry);
3495     + err = valid;
3496     + }
3497     + unionfs_unlock_dentry(dentry);
3498     + unionfs_unlock_parent(dentry, parent);
3499     + unionfs_read_unlock(dentry->d_sb);
3500     +
3501     + return err;
3502     +}
3503     +
3504     +static void unionfs_d_release(struct dentry *dentry)
3505     +{
3506     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3507     + if (unlikely(!UNIONFS_D(dentry)))
3508     + goto out; /* skip if no lower branches */
3509     + /* must lock our branch configuration here */
3510     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3511     +
3512     + unionfs_check_dentry(dentry);
3513     + /* this could be a negative dentry, so check first */
3514     + if (dbstart(dentry) < 0) {
3515     + unionfs_unlock_dentry(dentry);
3516     + goto out; /* due to a (normal) failed lookup */
3517     + }
3518     +
3519     + /* Release all the lower dentries */
3520     + path_put_lowers_all(dentry, true);
3521     +
3522     + unionfs_unlock_dentry(dentry);
3523     +
3524     +out:
3525     + free_dentry_private_data(dentry);
3526     + unionfs_read_unlock(dentry->d_sb);
3527     + return;
3528     +}
3529     +
3530     +/*
3531     + * Called when we're removing the last reference to our dentry. So we
3532     + * should drop all lower references too.
3533     + */
3534     +static void unionfs_d_iput(struct dentry *dentry, struct inode *inode)
3535     +{
3536     + int rc;
3537     +
3538     + BUG_ON(!dentry);
3539     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
3540     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3541     +
3542     + if (!UNIONFS_D(dentry) || dbstart(dentry) < 0)
3543     + goto drop_lower_inodes;
3544     + path_put_lowers_all(dentry, false);
3545     +
3546     +drop_lower_inodes:
3547     + rc = atomic_read(&inode->i_count);
3548     + if (rc == 1 && inode->i_nlink == 1 && ibstart(inode) >= 0) {
3549     + /* see Documentation/filesystems/unionfs/issues.txt */
3550     + lockdep_off();
3551     + iput(unionfs_lower_inode(inode));
3552     + lockdep_on();
3553     + unionfs_set_lower_inode(inode, NULL);
3554     + /* XXX: may need to set start/end to -1? */
3555     + }
3556     +
3557     + iput(inode);
3558     +
3559     + unionfs_unlock_dentry(dentry);
3560     + unionfs_read_unlock(dentry->d_sb);
3561     +}
3562     +
3563     +struct dentry_operations unionfs_dops = {
3564     + .d_revalidate = unionfs_d_revalidate,
3565     + .d_release = unionfs_d_release,
3566     + .d_iput = unionfs_d_iput,
3567     +};
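
The staleness test in dentry.c above combines two signals: a lower inode whose mtime or ctime is later than the unionfs inode's, and a dentry generation number that no longer matches the superblock's. The following is a minimal user-space sketch of that decision, not part of the patch. All `model_*` names and types are hypothetical stand-ins for the kernel structures. It assumes, following the comment in the source, that generation 0 is never a valid superblock generation. It also leaves out the root-dentry special case and the negative-lower check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timespec. */
struct model_time {
	long sec;
	long nsec;
};

/* Hypothetical stand-in for a unionfs object and its cached times. */
struct model_obj {
	int generation;          /* plays the role of UNIONFS_D(dentry)->generation */
	struct model_time mtime;
	struct model_time ctime;
};

/* Same ordering contract as timespec_compare(): negative, zero, positive. */
static int model_time_compare(const struct model_time *a,
			      const struct model_time *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* True if any lower copy has an mtime or ctime later than the upper one. */
static bool model_is_newer_lower(const struct model_obj *upper,
				 const struct model_obj *lowers, size_t n)
{
	size_t i;

	assert(upper);
	for (i = 0; i < n; i++) {
		if (model_time_compare(&upper->mtime, &lowers[i].mtime) < 0)
			return true;
		if (model_time_compare(&upper->ctime, &lowers[i].ctime) < 0)
			return true;
	}
	return false;
}

/*
 * True if the object must be looked up again. A newer lower object
 * resets the generation to 0 (assumed never to equal sbgen), which
 * forces the mismatch path.
 */
static bool model_needs_relookup(struct model_obj *upper,
				 const struct model_obj *lowers, size_t n,
				 int sbgen)
{
	if (model_is_newer_lower(upper, lowers, n))
		upper->generation = 0;
	return upper->generation != sbgen;
}
```

In this model, an object whose generation equals `sbgen` and whose lower times are not newer takes the cheap path (the `validate_lowers` label in the real code). Anything else goes through a full lookup.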
3568     diff --git a/fs/unionfs/dirfops.c b/fs/unionfs/dirfops.c
3569     new file mode 100644
3570     index 0000000..72a9c1a
3571     --- /dev/null
3572     +++ b/fs/unionfs/dirfops.c
3573     @@ -0,0 +1,302 @@
3574     +/*
3575     + * Copyright (c) 2003-2011 Erez Zadok
3576     + * Copyright (c) 2003-2006 Charles P. Wright
3577     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3578     + * Copyright (c) 2005-2006 Junjiro Okajima
3579     + * Copyright (c) 2005 Arun M. Krishnakumar
3580     + * Copyright (c) 2004-2006 David P. Quigley
3581     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3582     + * Copyright (c) 2003 Puja Gupta
3583     + * Copyright (c) 2003 Harikesavan Krishnan
3584     + * Copyright (c) 2003-2011 Stony Brook University
3585     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
3586     + *
3587     + * This program is free software; you can redistribute it and/or modify
3588     + * it under the terms of the GNU General Public License version 2 as
3589     + * published by the Free Software Foundation.
3590     + */
3591     +
3592     +#include "union.h"
3593     +
3594     +/* Make sure our rdstate is playing by the rules. */
3595     +static void verify_rdstate_offset(struct unionfs_dir_state *rdstate)
3596     +{
3597     + BUG_ON(rdstate->offset >= DIREOF);
3598     + BUG_ON(rdstate->cookie >= MAXRDCOOKIE);
3599     +}
3600     +
3601     +struct unionfs_getdents_callback {
3602     + struct unionfs_dir_state *rdstate;
3603     + void *dirent;
3604     + int entries_written;
3605     + int filldir_called;
3606     + int filldir_error;
3607     + filldir_t filldir;
3608     + struct super_block *sb;
3609     +};
3610     +
3611     +/* based on generic filldir in fs/readdir.c */
3612     +static int unionfs_filldir(void *dirent, const char *oname, int namelen,
3613     + loff_t offset, u64 ino, unsigned int d_type)
3614     +{
3615     + struct unionfs_getdents_callback *buf = dirent;
3616     + struct filldir_node *found = NULL;
3617     + int err = 0;
3618     + int is_whiteout;
3619     + char *name = (char *) oname;
3620     +
3621     + buf->filldir_called++;
3622     +
3623     + is_whiteout = is_whiteout_name(&name, &namelen);
3624     +
3625     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3626     +
3627     + if (found) {
3628     + /*
3629     + * If we had a non-whiteout entry in the dir cache, then mark
3630     + * it as a whiteout but leave it in the dir cache.
3631     + */
3632     + if (is_whiteout && !found->whiteout)
3633     + found->whiteout = is_whiteout;
3634     + goto out;
3635     + }
3636     +
3637     + /* if 'name' isn't a whiteout, filldir it. */
3638     + if (!is_whiteout) {
3639     + off_t pos = rdstate2offset(buf->rdstate);
3640     + u64 unionfs_ino = ino;
3641     +
3642     + err = buf->filldir(buf->dirent, name, namelen, pos,
3643     + unionfs_ino, d_type);
3644     + buf->rdstate->offset++;
3645     + verify_rdstate_offset(buf->rdstate);
3646     + }
3647     + /*
3648     + * If we did fill it, stuff it in our hash, otherwise return an
3649     + * error.
3650     + */
3651     + if (err) {
3652     + buf->filldir_error = err;
3653     + goto out;
3654     + }
3655     + buf->entries_written++;
3656     + err = add_filldir_node(buf->rdstate, name, namelen,
3657     + buf->rdstate->bindex, is_whiteout);
3658     + if (err)
3659     + buf->filldir_error = err;
3660     +
3661     +out:
3662     + return err;
3663     +}
3664     +
3665     +static int unionfs_readdir(struct file *file, void *dirent, filldir_t filldir)
3666     +{
3667     + int err = 0;
3668     + struct file *lower_file = NULL;
3669     + struct dentry *dentry = file->f_path.dentry;
3670     + struct dentry *parent;
3671     + struct inode *inode = NULL;
3672     + struct unionfs_getdents_callback buf;
3673     + struct unionfs_dir_state *uds;
3674     + int bend;
3675     + loff_t offset;
3676     +
3677     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3678     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3679     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3680     +
3681     + err = unionfs_file_revalidate(file, parent, false);
3682     + if (unlikely(err))
3683     + goto out;
3684     +
3685     + inode = dentry->d_inode;
3686     +
3687     + uds = UNIONFS_F(file)->rdstate;
3688     + if (!uds) {
3689     + if (file->f_pos == DIREOF) {
3690     + goto out;
3691     + } else if (file->f_pos > 0) {
3692     + uds = find_rdstate(inode, file->f_pos);
3693     + if (unlikely(!uds)) {
3694     + err = -ESTALE;
3695     + goto out;
3696     + }
3697     + UNIONFS_F(file)->rdstate = uds;
3698     + } else {
3699     + init_rdstate(file);
3700     + uds = UNIONFS_F(file)->rdstate;
3701     + }
3702     + }
3703     + bend = fbend(file);
3704     +
3705     + while (uds->bindex <= bend) {
3706     + lower_file = unionfs_lower_file_idx(file, uds->bindex);
3707     + if (!lower_file) {
3708     + uds->bindex++;
3709     + uds->dirpos = 0;
3710     + continue;
3711     + }
3712     +
3713     + /* prepare callback buffer */
3714     + buf.filldir_called = 0;
3715     + buf.filldir_error = 0;
3716     + buf.entries_written = 0;
3717     + buf.dirent = dirent;
3718     + buf.filldir = filldir;
3719     + buf.rdstate = uds;
3720     + buf.sb = inode->i_sb;
3721     +
3722     + /* Read starting from where we last left off. */
3723     + offset = vfs_llseek(lower_file, uds->dirpos, SEEK_SET);
3724     + if (offset < 0) {
3725     + err = offset;
3726     + goto out;
3727     + }
3728     + err = vfs_readdir(lower_file, unionfs_filldir, &buf);
3729     +
3730     + /* Save the position for when we continue. */
3731     + offset = vfs_llseek(lower_file, 0, SEEK_CUR);
3732     + if (offset < 0) {
3733     + err = offset;
3734     + goto out;
3735     + }
3736     + uds->dirpos = offset;
3737     +
3738     + /* Copy the atime. */
3739     + fsstack_copy_attr_atime(inode,
3740     + lower_file->f_path.dentry->d_inode);
3741     +
3742     + if (err < 0)
3743     + goto out;
3744     +
3745     + if (buf.filldir_error)
3746     + break;
3747     +
3748     + if (!buf.entries_written) {
3749     + uds->bindex++;
3750     + uds->dirpos = 0;
3751     + }
3752     + }
3753     +
3754     + if (!buf.filldir_error && uds->bindex >= bend) {
3755     + /* Save the number of hash entries for next time. */
3756     + UNIONFS_I(inode)->hashsize = uds->hashentries;
3757     + free_rdstate(uds);
3758     + UNIONFS_F(file)->rdstate = NULL;
3759     + file->f_pos = DIREOF;
3760     + } else {
3761     + file->f_pos = rdstate2offset(uds);
3762     + }
3763     +
3764     +out:
3765     + if (!err)
3766     + unionfs_check_file(file);
3767     + unionfs_unlock_dentry(dentry);
3768     + unionfs_unlock_parent(dentry, parent);
3769     + unionfs_read_unlock(dentry->d_sb);
3770     + return err;
3771     +}
3772     +
3773     +/*
3774     + * This is not meant to be a generic repositioning function. If you do
3775     + * things that aren't supported, then we return EINVAL.
3776     + *
3777     + * What is allowed:
3778     + * (1) seeking to the same position that you are currently at
3779     + * This really has no effect, but returns where you are.
3780     + * (2) seeking to the beginning of the file
3781     + * This throws out all state, and lets you begin again.
3782     + */
3783     +static loff_t unionfs_dir_llseek(struct file *file, loff_t offset, int origin)
3784     +{
3785     + struct unionfs_dir_state *rdstate;
3786     + struct dentry *dentry = file->f_path.dentry;
3787     + struct dentry *parent;
3788     + loff_t err;
3789     +
3790     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
3791     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
3792     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
3793     +
3794     + err = unionfs_file_revalidate(file, parent, false);
3795     + if (unlikely(err))
3796     + goto out;
3797     +
3798     + rdstate = UNIONFS_F(file)->rdstate;
3799     +
3800     + /*
3801     + * we let users seek to their current position, but not anywhere
3802     + * else.
3803     + */
3804     + if (!offset) {
3805     + switch (origin) {
3806     + case SEEK_SET:
3807     + if (rdstate) {
3808     + free_rdstate(rdstate);
3809     + UNIONFS_F(file)->rdstate = NULL;
3810     + }
3811     + init_rdstate(file);
3812     + err = 0;
3813     + break;
3814     + case SEEK_CUR:
3815     + err = file->f_pos;
3816     + break;
3817     + case SEEK_END:
3818     + /* Unsupported, because we would break everything. */
3819     + err = -EINVAL;
3820     + break;
3821     + }
3822     + } else {
3823     + switch (origin) {
3824     + case SEEK_SET:
3825     + if (rdstate) {
3826     + if (offset == rdstate2offset(rdstate))
3827     + err = offset;
3828     + else if (file->f_pos == DIREOF)
3829     + err = DIREOF;
3830     + else
3831     + err = -EINVAL;
3832     + } else {
3833     + struct inode *inode;
3834     + inode = dentry->d_inode;
3835     + rdstate = find_rdstate(inode, offset);
3836     + if (rdstate) {
3837     + UNIONFS_F(file)->rdstate = rdstate;
3838     + err = rdstate->offset;
3839     + } else {
3840     + err = -EINVAL;
3841     + }
3842     + }
3843     + break;
3844     + case SEEK_CUR:
3845     + case SEEK_END:
3846     + /* Unsupported, because we would break everything. */
3847     + err = -EINVAL;
3848     + break;
3849     + }
3850     + }
3851     +
3852     +out:
3853     + if (!err)
3854     + unionfs_check_file(file);
3855     + unionfs_unlock_dentry(dentry);
3856     + unionfs_unlock_parent(dentry, parent);
3857     + unionfs_read_unlock(dentry->d_sb);
3858     + return err;
3859     +}
3860     +
3861     +/*
3862     + * Trimmed directory operations; we shouldn't pass everything down since
3863     + * we don't want to operate on partial directories.
3864     + */
3865     +struct file_operations unionfs_dir_fops = {
3866     + .llseek = unionfs_dir_llseek,
3867     + .read = generic_read_dir,
3868     + .readdir = unionfs_readdir,
3869     + .unlocked_ioctl = unionfs_ioctl,
3870     + .open = unionfs_open,
3871     + .release = unionfs_file_release,
3872     + .flush = unionfs_flush,
3873     + .fsync = unionfs_fsync,
3874     + .fasync = unionfs_fasync,
3875     +};
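
The duplicate and whiteout handling in unionfs_filldir() above can be modeled in user space as follows. This is a simplified sketch, not code from the patch. The `model_*` names, the fixed-size table, and the `.wh.` prefix are assumptions made for illustration. The real code uses a hash table inside the rdstate and its own `is_whiteout_name()` helper. Entries are fed branch by branch, highest-priority branch first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_WHPFX    ".wh."  /* assumed whiteout prefix */
#define MODEL_MAX_SEEN 64      /* hypothetical table size */

struct model_seen {
	const char *name;
	bool whiteout;
};

struct model_rdstate {
	struct model_seen seen[MODEL_MAX_SEEN];
	size_t nseen;
};

/* If *name carries the whiteout prefix, strip it and return true. */
static bool model_is_whiteout_name(const char **name)
{
	size_t plen = strlen(MODEL_WHPFX);

	if (strncmp(*name, MODEL_WHPFX, plen) != 0)
		return false;
	*name += plen;
	return true;
}

static struct model_seen *model_find(struct model_rdstate *rd,
				     const char *name)
{
	size_t i;

	for (i = 0; i < rd->nseen; i++)
		if (strcmp(rd->seen[i].name, name) == 0)
			return &rd->seen[i];
	return NULL;
}

/*
 * Feed one lower directory entry. Returns 1 if the name was emitted
 * into out[], 0 if it was suppressed (duplicate or whiteout), and -1
 * if the table is full. The caller must size out[] large enough.
 */
static int model_filldir(struct model_rdstate *rd, const char *name,
			 const char **out, size_t *nout)
{
	bool wh = model_is_whiteout_name(&name);
	struct model_seen *found = model_find(rd, name);

	if (found) {
		/* a later whiteout marks the cached entry, nothing is emitted */
		if (wh && !found->whiteout)
			found->whiteout = true;
		return 0;
	}
	if (rd->nseen == MODEL_MAX_SEEN)
		return -1;
	rd->seen[rd->nseen].name = name;
	rd->seen[rd->nseen].whiteout = wh;
	rd->nseen++;
	if (wh)
		return 0;
	out[(*nout)++] = name;
	return 1;
}
```

Because a whiteout is recorded under its stripped name, a same-named entry from a lower-priority branch is found in the table and never reaches the caller.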
3876     diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
3877     new file mode 100644
3878     index 0000000..62ec9af
3879     --- /dev/null
3880     +++ b/fs/unionfs/dirhelper.c
3881     @@ -0,0 +1,158 @@
3882     +/*
3883     + * Copyright (c) 2003-2011 Erez Zadok
3884     + * Copyright (c) 2003-2006 Charles P. Wright
3885     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
3886     + * Copyright (c) 2005-2006 Junjiro Okajima
3887     + * Copyright (c) 2005 Arun M. Krishnakumar
3888     + * Copyright (c) 2004-2006 David P. Quigley
3889     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
3890     + * Copyright (c) 2003 Puja Gupta
3891     + * Copyright (c) 2003 Harikesavan Krishnan
3892     + * Copyright (c) 2003-2011 Stony Brook University
3893     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
3894     + *
3895     + * This program is free software; you can redistribute it and/or modify
3896     + * it under the terms of the GNU General Public License version 2 as
3897     + * published by the Free Software Foundation.
3898     + */
3899     +
3900     +#include "union.h"
3901     +
3902     +#define RD_NONE 0
3903     +#define RD_CHECK_EMPTY 1
3904     +/* The callback structure for check_empty. */
3905     +struct unionfs_rdutil_callback {
3906     + int err;
3907     + int filldir_called;
3908     + struct unionfs_dir_state *rdstate;
3909     + int mode;
3910     +};
3911     +
3912     +/* This filldir function makes sure only whiteouts exist within a directory. */
3913     +static int readdir_util_callback(void *dirent, const char *oname, int namelen,
3914     + loff_t offset, u64 ino, unsigned int d_type)
3915     +{
3916     + int err = 0;
3917     + struct unionfs_rdutil_callback *buf = dirent;
3918     + int is_whiteout;
3919     + struct filldir_node *found;
3920     + char *name = (char *) oname;
3921     +
3922     + buf->filldir_called = 1;
3923     +
3924     + if (name[0] == '.' && (namelen == 1 ||
3925     + (name[1] == '.' && namelen == 2)))
3926     + goto out;
3927     +
3928     + is_whiteout = is_whiteout_name(&name, &namelen);
3929     +
3930     + found = find_filldir_node(buf->rdstate, name, namelen, is_whiteout);
3931     + /* If it was found in the table there was a previous whiteout. */
3932     + if (found)
3933     + goto out;
3934     +
3935     + /*
3936     + * if it wasn't found and isn't a whiteout, the directory isn't
3937     + * empty.
3938     + */
3939     + err = -ENOTEMPTY;
3940     + if ((buf->mode == RD_CHECK_EMPTY) && !is_whiteout)
3941     + goto out;
3942     +
3943     + err = add_filldir_node(buf->rdstate, name, namelen,
3944     + buf->rdstate->bindex, is_whiteout);
3945     +
3946     +out:
3947     + buf->err = err;
3948     + return err;
3949     +}
3950     +
3951     +/* Is a directory logically empty? */
3952     +int check_empty(struct dentry *dentry, struct dentry *parent,
3953     + struct unionfs_dir_state **namelist)
3954     +{
3955     + int err = 0;
3956     + struct dentry *lower_dentry = NULL;
3957     + struct vfsmount *mnt;
3958     + struct super_block *sb;
3959     + struct file *lower_file;
3960     + struct unionfs_rdutil_callback *buf = NULL;
3961     + int bindex, bstart, bend, bopaque;
3962     +
3963     + sb = dentry->d_sb;
3964     +
3965     +
3966     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
3967     +
3968     + err = unionfs_partial_lookup(dentry, parent);
3969     + if (err)
3970     + goto out;
3971     +
3972     + bstart = dbstart(dentry);
3973     + bend = dbend(dentry);
3974     + bopaque = dbopaque(dentry);
3975     + if (0 <= bopaque && bopaque < bend)
3976     + bend = bopaque;
3977     +
3978     + buf = kmalloc(sizeof(struct unionfs_rdutil_callback), GFP_KERNEL);
3979     + if (unlikely(!buf)) {
3980     + err = -ENOMEM;
3981     + goto out;
3982     + }
3983     + buf->err = 0;
3984     + buf->mode = RD_CHECK_EMPTY;
3985     + buf->rdstate = alloc_rdstate(dentry->d_inode, bstart);
3986     + if (unlikely(!buf->rdstate)) {
3987     + err = -ENOMEM;
3988     + goto out;
3989     + }
3990     +
3991     + /* Process the lower directories with rdutil_callback as a filldir. */
3992     + for (bindex = bstart; bindex <= bend; bindex++) {
3993     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
3994     + if (!lower_dentry)
3995     + continue;
3996     + if (!lower_dentry->d_inode)
3997     + continue;
3998     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
3999     + continue;
4000     +
4001     + dget(lower_dentry);
4002     + mnt = unionfs_mntget(dentry, bindex);
4003     + branchget(sb, bindex);
4004     + lower_file = dentry_open(lower_dentry, mnt, O_RDONLY, current_cred());
4005     + if (IS_ERR(lower_file)) {
4006     + err = PTR_ERR(lower_file);
4007     + branchput(sb, bindex);
4008     + goto out;
4009     + }
4010     +
4011     + do {
4012     + buf->filldir_called = 0;
4013     + buf->rdstate->bindex = bindex;
4014     + err = vfs_readdir(lower_file,
4015     + readdir_util_callback, buf);
4016     + if (buf->err)
4017     + err = buf->err;
4018     + } while ((err >= 0) && buf->filldir_called);
4019     +
4020     + /* fput calls dput for lower_dentry */
4021     + fput(lower_file);
4022     + branchput(sb, bindex);
4023     +
4024     + if (err < 0)
4025     + goto out;
4026     + }
4027     +
4028     +out:
4029     + if (buf) {
4030     + if (namelist && !err)
4031     + *namelist = buf->rdstate;
4032     + else if (buf->rdstate)
4033     + free_rdstate(buf->rdstate);
4034     + kfree(buf);
4035     + }
4036     +
4037     +
4038     + return err;
4039     +}
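
check_empty() above treats a directory as logically empty when every name in the scanned branches is ".", "..", a whiteout, or a name already masked by a whiteout from a higher-priority branch. Branches below an opaque branch are not scanned. The sketch below models that rule in user space and is not part of the patch. The `model_*` names, the fixed-size table, and the `.wh.` prefix are illustrative assumptions. Unlike the real function, it assumes bstart is 0 and does not return the collected name list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_WHPFX     ".wh."  /* assumed whiteout prefix */
#define MODEL_MAX_NAMES 64      /* hypothetical table size */

/* Hypothetical stand-in for one lower directory's entries. */
struct model_branch_dir {
	const char *const *names;
	size_t count;
};

static bool model_is_dot_or_dotdot(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}

/*
 * Returns 0 if the union of dirs[0..ndirs-1] is logically empty,
 * -ENOTEMPTY if a visible name exists, -ENOMEM if the table overflows.
 * bopaque is the index of an opaque branch, or -1 if there is none.
 */
static int model_check_empty(const struct model_branch_dir *dirs,
			     size_t ndirs, int bopaque)
{
	const char *masked[MODEL_MAX_NAMES];
	size_t nmasked = 0, plen = strlen(MODEL_WHPFX);
	size_t bend, b, i, j;

	if (ndirs == 0)
		return 0;
	bend = ndirs - 1;
	if (bopaque >= 0 && (size_t)bopaque < bend)
		bend = (size_t)bopaque;  /* stop at the opaque branch */

	for (b = 0; b <= bend; b++) {
		for (i = 0; i < dirs[b].count; i++) {
			const char *name = dirs[b].names[i];
			bool wh, found = false;

			if (model_is_dot_or_dotdot(name))
				continue;
			wh = strncmp(name, MODEL_WHPFX, plen) == 0;
			if (wh)
				name += plen;
			for (j = 0; j < nmasked; j++) {
				if (strcmp(masked[j], name) == 0) {
					found = true;
					break;
				}
			}
			if (found)
				continue;  /* masked by an earlier whiteout */
			if (!wh)
				return -ENOTEMPTY;
			if (nmasked == MODEL_MAX_NAMES)
				return -ENOMEM;
			masked[nmasked++] = name;
		}
	}
	return 0;
}
```

Only whiteouts are ever stored in the table, so a hit always means the name was deleted in a higher-priority branch.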
4040     diff --git a/fs/unionfs/fanout.h b/fs/unionfs/fanout.h
4041     new file mode 100644
4042     index 0000000..ae1b86a
4043     --- /dev/null
4044     +++ b/fs/unionfs/fanout.h
4045     @@ -0,0 +1,407 @@
4046     +/*
4047     + * Copyright (c) 2003-2011 Erez Zadok
4048     + * Copyright (c) 2003-2006 Charles P. Wright
4049     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4050     + * Copyright (c) 2005 Arun M. Krishnakumar
4051     + * Copyright (c) 2004-2006 David P. Quigley
4052     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4053     + * Copyright (c) 2003 Puja Gupta
4054     + * Copyright (c) 2003 Harikesavan Krishnan
4055     + * Copyright (c) 2003-2011 Stony Brook University
4056     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
4057     + *
4058     + * This program is free software; you can redistribute it and/or modify
4059     + * it under the terms of the GNU General Public License version 2 as
4060     + * published by the Free Software Foundation.
4061     + */
4062     +
4063     +#ifndef _FANOUT_H_
4064     +#define _FANOUT_H_
4065     +
4066     +/*
4067     + * Inode to private data
4068     + *
4069     + * Since we use containers and the struct inode is _inside_ the
4070     + * unionfs_inode_info structure, UNIONFS_I will always (given a non-NULL
4071     + * inode pointer), return a valid non-NULL pointer.
4072     + */
4073     +static inline struct unionfs_inode_info *UNIONFS_I(const struct inode *inode)
4074     +{
4075     + return container_of(inode, struct unionfs_inode_info, vfs_inode);
4076     +}
4077     +
4078     +#define ibstart(ino) (UNIONFS_I(ino)->bstart)
4079     +#define ibend(ino) (UNIONFS_I(ino)->bend)
4080     +
4081     +/* Dentry to private data */
4082     +#define UNIONFS_D(dent) ((struct unionfs_dentry_info *)(dent)->d_fsdata)
4083     +#define dbstart(dent) (UNIONFS_D(dent)->bstart)
4084     +#define dbend(dent) (UNIONFS_D(dent)->bend)
4085     +#define dbopaque(dent) (UNIONFS_D(dent)->bopaque)
4086     +
4087     +/* Superblock to private data */
4088     +#define UNIONFS_SB(super) ((struct unionfs_sb_info *)(super)->s_fs_info)
4089     +#define sbstart(sb) 0
4090     +#define sbend(sb) (UNIONFS_SB(sb)->bend)
4091     +#define sbmax(sb) (UNIONFS_SB(sb)->bend + 1)
4092     +#define sbhbid(sb) (UNIONFS_SB(sb)->high_branch_id)
4093     +
4094     +/* File to private Data */
4095     +#define UNIONFS_F(file) ((struct unionfs_file_info *)((file)->private_data))
4096     +#define fbstart(file) (UNIONFS_F(file)->bstart)
4097     +#define fbend(file) (UNIONFS_F(file)->bend)
4098     +
4099     +/* macros to manipulate branch IDs stored in our superblock */
4100     +static inline int branch_id(struct super_block *sb, int index)
4101     +{
4102     + BUG_ON(!sb || index < 0);
4103     + return UNIONFS_SB(sb)->data[index].branch_id;
4104     +}
4105     +
4106     +static inline void set_branch_id(struct super_block *sb, int index, int val)
4107     +{
4108     + BUG_ON(!sb || index < 0);
4109     + UNIONFS_SB(sb)->data[index].branch_id = val;
4110     +}
4111     +
4112     +static inline void new_branch_id(struct super_block *sb, int index)
4113     +{
4114     + BUG_ON(!sb || index < 0);
4115     + set_branch_id(sb, index, ++UNIONFS_SB(sb)->high_branch_id);
4116     +}
4117     +
4118     +/*
4119     + * Find new index of matching branch with an existing superblock of a known
4120     + * (possibly old) id. This is needed because branches could have been
4121     + * added/deleted causing the branches of any open files to shift.
4122     + *
4123     + * @sb: the new superblock which may have new/different branch IDs
4124     + * @id: the old/existing id we're looking for
4125     + * Returns index of newly found branch (0 or greater), -1 otherwise.
4126     + */
4127     +static inline int branch_id_to_idx(struct super_block *sb, int id)
4128     +{
4129     + int i;
4130     + for (i = 0; i < sbmax(sb); i++) {
4131     + if (branch_id(sb, i) == id)
4132     + return i;
4133     + }
4134     + /* in the non-ODF code, this should really never happen */
4135     + printk(KERN_WARNING "unionfs: cannot find branch with id %d\n", id);
4136     + return -1;
4137     +}
4138     +
4139     +/* File to lower file. */
4140     +static inline struct file *unionfs_lower_file(const struct file *f)
4141     +{
4142     + BUG_ON(!f);
4143     + return UNIONFS_F(f)->lower_files[fbstart(f)];
4144     +}
4145     +
4146     +static inline struct file *unionfs_lower_file_idx(const struct file *f,
4147     + int index)
4148     +{
4149     + BUG_ON(!f || index < 0);
4150     + return UNIONFS_F(f)->lower_files[index];
4151     +}
4152     +
4153     +static inline void unionfs_set_lower_file_idx(struct file *f, int index,
4154     + struct file *val)
4155     +{
4156     + BUG_ON(!f || index < 0);
4157     + UNIONFS_F(f)->lower_files[index] = val;
4158     + /* save branch ID (may be redundant?) */
4159     + UNIONFS_F(f)->saved_branch_ids[index] =
4160     + branch_id((f)->f_path.dentry->d_sb, index);
4161     +}
4162     +
4163     +static inline void unionfs_set_lower_file(struct file *f, struct file *val)
4164     +{
4165     + BUG_ON(!f);
4166     + unionfs_set_lower_file_idx((f), fbstart(f), (val));
4167     +}
4168     +
4169     +/* Inode to lower inode. */
4170     +static inline struct inode *unionfs_lower_inode(const struct inode *i)
4171     +{
4172     + BUG_ON(!i);
4173     + return UNIONFS_I(i)->lower_inodes[ibstart(i)];
4174     +}
4175     +
4176     +static inline struct inode *unionfs_lower_inode_idx(const struct inode *i,
4177     + int index)
4178     +{
4179     + BUG_ON(!i || index < 0);
4180     + return UNIONFS_I(i)->lower_inodes[index];
4181     +}
4182     +
4183     +static inline void unionfs_set_lower_inode_idx(struct inode *i, int index,
4184     + struct inode *val)
4185     +{
4186     + BUG_ON(!i || index < 0);
4187     + UNIONFS_I(i)->lower_inodes[index] = val;
4188     +}
4189     +
4190     +static inline void unionfs_set_lower_inode(struct inode *i, struct inode *val)
4191     +{
4192     + BUG_ON(!i);
4193     + UNIONFS_I(i)->lower_inodes[ibstart(i)] = val;
4194     +}
4195     +
4196     +/* Superblock to lower superblock. */
4197     +static inline struct super_block *unionfs_lower_super(
4198     + const struct super_block *sb)
4199     +{
4200     + BUG_ON(!sb);
4201     + return UNIONFS_SB(sb)->data[sbstart(sb)].sb;
4202     +}
4203     +
4204     +static inline struct super_block *unionfs_lower_super_idx(
4205     + const struct super_block *sb,
4206     + int index)
4207     +{
4208     + BUG_ON(!sb || index < 0);
4209     + return UNIONFS_SB(sb)->data[index].sb;
4210     +}
4211     +
4212     +static inline void unionfs_set_lower_super_idx(struct super_block *sb,
4213     + int index,
4214     + struct super_block *val)
4215     +{
4216     + BUG_ON(!sb || index < 0);
4217     + UNIONFS_SB(sb)->data[index].sb = val;
4218     +}
4219     +
4220     +static inline void unionfs_set_lower_super(struct super_block *sb,
4221     + struct super_block *val)
4222     +{
4223     + BUG_ON(!sb);
4224     + UNIONFS_SB(sb)->data[sbstart(sb)].sb = val;
4225     +}
4226     +
4227     +/* Branch count macros. */
4228     +static inline int branch_count(const struct super_block *sb, int index)
4229     +{
4230     + BUG_ON(!sb || index < 0);
4231     + return atomic_read(&UNIONFS_SB(sb)->data[index].open_files);
4232     +}
4233     +
4234     +static inline void set_branch_count(struct super_block *sb, int index, int val)
4235     +{
4236     + BUG_ON(!sb || index < 0);
4237     + atomic_set(&UNIONFS_SB(sb)->data[index].open_files, val);
4238     +}
4239     +
4240     +static inline void branchget(struct super_block *sb, int index)
4241     +{
4242     + BUG_ON(!sb || index < 0);
4243     + atomic_inc(&UNIONFS_SB(sb)->data[index].open_files);
4244     +}
4245     +
4246     +static inline void branchput(struct super_block *sb, int index)
4247     +{
4248     + BUG_ON(!sb || index < 0);
4249     + atomic_dec(&UNIONFS_SB(sb)->data[index].open_files);
4250     +}
4251     +
4252     +/* Dentry macros */
4253     +static inline void unionfs_set_lower_dentry_idx(struct dentry *dent, int index,
4254     + struct dentry *val)
4255     +{
4256     + BUG_ON(!dent || index < 0);
4257     + UNIONFS_D(dent)->lower_paths[index].dentry = val;
4258     +}
4259     +
4260     +static inline struct dentry *unionfs_lower_dentry_idx(
4261     + const struct dentry *dent,
4262     + int index)
4263     +{
4264     + BUG_ON(!dent || index < 0);
4265     + return UNIONFS_D(dent)->lower_paths[index].dentry;
4266     +}
4267     +
4268     +static inline struct dentry *unionfs_lower_dentry(const struct dentry *dent)
4269     +{
4270     + BUG_ON(!dent);
4271     + return unionfs_lower_dentry_idx(dent, dbstart(dent));
4272     +}
4273     +
4274     +static inline void unionfs_set_lower_mnt_idx(struct dentry *dent, int index,
4275     + struct vfsmount *mnt)
4276     +{
4277     + BUG_ON(!dent || index < 0);
4278     + UNIONFS_D(dent)->lower_paths[index].mnt = mnt;
4279     +}
4280     +
4281     +static inline struct vfsmount *unionfs_lower_mnt_idx(
4282     + const struct dentry *dent,
4283     + int index)
4284     +{
4285     + BUG_ON(!dent || index < 0);
4286     + return UNIONFS_D(dent)->lower_paths[index].mnt;
4287     +}
4288     +
4289     +static inline struct vfsmount *unionfs_lower_mnt(const struct dentry *dent)
4290     +{
4291     + BUG_ON(!dent);
4292     + return unionfs_lower_mnt_idx(dent, dbstart(dent));
4293     +}
4294     +
4295     +/* Macros for locking a dentry. */
4296     +enum unionfs_dentry_lock_class {
4297     + UNIONFS_DMUTEX_NORMAL,
4298     + UNIONFS_DMUTEX_ROOT,
4299     + UNIONFS_DMUTEX_PARENT,
4300     + UNIONFS_DMUTEX_CHILD,
4301     + UNIONFS_DMUTEX_WHITEOUT,
4302     + UNIONFS_DMUTEX_REVAL_PARENT, /* for file/dentry revalidate */
4303     + UNIONFS_DMUTEX_REVAL_CHILD, /* for file/dentry revalidate */
4304     +};
4305     +
4306     +static inline void unionfs_lock_dentry(struct dentry *d,
4307     + unsigned int subclass)
4308     +{
4309     + BUG_ON(!d);
4310     + mutex_lock_nested(&UNIONFS_D(d)->lock, subclass);
4311     +}
4312     +
4313     +static inline void unionfs_unlock_dentry(struct dentry *d)
4314     +{
4315     + BUG_ON(!d);
4316     + mutex_unlock(&UNIONFS_D(d)->lock);
4317     +}
4318     +
4319     +static inline struct dentry *unionfs_lock_parent(struct dentry *d,
4320     + unsigned int subclass)
4321     +{
4322     + struct dentry *p;
4323     +
4324     + BUG_ON(!d);
4325     + p = dget_parent(d);
4326     + if (p != d)
4327     + mutex_lock_nested(&UNIONFS_D(p)->lock, subclass);
4328     + return p;
4329     +}
4330     +
4331     +static inline void unionfs_unlock_parent(struct dentry *d, struct dentry *p)
4332     +{
4333     + BUG_ON(!d);
4334     + BUG_ON(!p);
4335     + if (p != d) {
4336     + BUG_ON(!mutex_is_locked(&UNIONFS_D(p)->lock));
4337     + mutex_unlock(&UNIONFS_D(p)->lock);
4338     + }
4339     + dput(p);
4340     +}
4341     +
4342     +static inline void verify_locked(struct dentry *d)
4343     +{
4344     + BUG_ON(!d);
4345     + BUG_ON(!mutex_is_locked(&UNIONFS_D(d)->lock));
4346     +}
4347     +
4348     +/* macros to put lower objects */
4349     +
4350     +/*
4351     + * iput lower inodes of a unionfs inode, from bstart to bend. If
4352     + * @free_lower is true, then also kfree the memory used to hold the lower
4353     + * object pointers.
4354     + */
4355     +static inline void iput_lowers(struct inode *inode,
4356     + int bstart, int bend, bool free_lower)
4357     +{
4358     + struct inode *lower_inode;
4359     + int bindex;
4360     +
4361     + BUG_ON(!inode);
4362     + BUG_ON(!UNIONFS_I(inode));
4363     + BUG_ON(bstart < 0);
4364     +
4365     + for (bindex = bstart; bindex <= bend; bindex++) {
4366     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4367     + if (lower_inode) {
4368     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
4369     + /* see Documentation/filesystems/unionfs/issues.txt */
4370     + lockdep_off();
4371     + iput(lower_inode);
4372     + lockdep_on();
4373     + }
4374     + }
4375     +
4376     + if (free_lower) {
4377     + kfree(UNIONFS_I(inode)->lower_inodes);
4378     + UNIONFS_I(inode)->lower_inodes = NULL;
4379     + }
4380     +}
4381     +
4382     +/* iput all lower inodes, and reset start/end branch indices to -1 */
4383     +static inline void iput_lowers_all(struct inode *inode, bool free_lower)
4384     +{
4385     + int bstart, bend;
4386     +
4387     + BUG_ON(!inode);
4388     + BUG_ON(!UNIONFS_I(inode));
4389     + bstart = ibstart(inode);
4390     + bend = ibend(inode);
4391     + BUG_ON(bstart < 0);
4392     +
4393     + iput_lowers(inode, bstart, bend, free_lower);
4394     + ibstart(inode) = ibend(inode) = -1;
4395     +}
4396     +
4397     +/*
4398     + * dput/mntput all lower dentries and vfsmounts of a unionfs dentry, from
4399     + * bstart to bend. If @free_lower is true, then also kfree the memory used
4400     + * to hold the lower object pointers.
4401     + *
4402     + * XXX: implement using path_put VFS macros
4403     + */
4404     +static inline void path_put_lowers(struct dentry *dentry,
4405     + int bstart, int bend, bool free_lower)
4406     +{
4407     + struct dentry *lower_dentry;
4408     + struct vfsmount *lower_mnt;
4409     + int bindex;
4410     +
4411     + BUG_ON(!dentry);
4412     + BUG_ON(!UNIONFS_D(dentry));
4413     + BUG_ON(bstart < 0);
4414     +
4415     + for (bindex = bstart; bindex <= bend; bindex++) {
4416     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4417     + if (lower_dentry) {
4418     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
4419     + dput(lower_dentry);
4420     + }
4421     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
4422     + if (lower_mnt) {
4423     + unionfs_set_lower_mnt_idx(dentry, bindex, NULL);
4424     + mntput(lower_mnt);
4425     + }
4426     + }
4427     +
4428     + if (free_lower) {
4429     + kfree(UNIONFS_D(dentry)->lower_paths);
4430     + UNIONFS_D(dentry)->lower_paths = NULL;
4431     + }
4432     +}
4433     +
4434     +/*
4435     + * dput/mntput all lower dentries and vfsmounts, and reset start/end branch
4436     + * indices to -1.
4437     + */
4438     +static inline void path_put_lowers_all(struct dentry *dentry, bool free_lower)
4439     +{
4440     + int bstart, bend;
4441     +
4442     + BUG_ON(!dentry);
4443     + BUG_ON(!UNIONFS_D(dentry));
4444     + bstart = dbstart(dentry);
4445     + bend = dbend(dentry);
4446     + BUG_ON(bstart < 0);
4447     +
4448     + path_put_lowers(dentry, bstart, bend, free_lower);
4449     + dbstart(dentry) = dbend(dentry) = -1;
4450     +}
4451     +
4452     +#endif /* not _FANOUT_H_ */
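The comment on UNIONFS_I() in fanout.h says that the VFS inode lives inside unionfs_inode_info, so the accessor returns a valid pointer for any non-NULL inode. The following is a minimal userspace sketch of that embedding pattern, not the kernel code itself: demo_container_of, struct demo_inode, struct demo_inode_info and DEMO_I are hypothetical stand-ins, and assert() plays the role that BUG_ON() plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() (assumption: demo only). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for struct inode. */
struct demo_inode {
	unsigned long i_ino;
};

/* The private structure embeds the inode rather than pointing to it. */
struct demo_inode_info {
	int bstart;
	int bend;
	struct demo_inode vfs_inode;
};

/* Step back from the embedded member to the enclosing structure. */
static struct demo_inode_info *DEMO_I(const struct demo_inode *inode)
{
	assert(inode != NULL);
	return demo_container_of(inode, struct demo_inode_info, vfs_inode);
}
```

Because the result is computed by pointer arithmetic from the member's address, no lookup can fail; the only precondition is that the inode really is embedded in the enclosing structure.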
4453     diff --git a/fs/unionfs/file.c b/fs/unionfs/file.c
4454     new file mode 100644
4455     index 0000000..416c52f
4456     --- /dev/null
4457     +++ b/fs/unionfs/file.c
4458     @@ -0,0 +1,382 @@
4459     +/*
4460     + * Copyright (c) 2003-2011 Erez Zadok
4461     + * Copyright (c) 2003-2006 Charles P. Wright
4462     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4463     + * Copyright (c) 2005-2006 Junjiro Okajima
4464     + * Copyright (c) 2005 Arun M. Krishnakumar
4465     + * Copyright (c) 2004-2006 David P. Quigley
4466     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4467     + * Copyright (c) 2003 Puja Gupta
4468     + * Copyright (c) 2003 Harikesavan Krishnan
4469     + * Copyright (c) 2003-2011 Stony Brook University
4470     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
4471     + *
4472     + * This program is free software; you can redistribute it and/or modify
4473     + * it under the terms of the GNU General Public License version 2 as
4474     + * published by the Free Software Foundation.
4475     + */
4476     +
4477     +#include "union.h"
4478     +
4479     +static ssize_t unionfs_read(struct file *file, char __user *buf,
4480     + size_t count, loff_t *ppos)
4481     +{
4482     + int err;
4483     + struct file *lower_file;
4484     + struct dentry *dentry = file->f_path.dentry;
4485     + struct dentry *parent;
4486     +
4487     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4488     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4489     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4490     +
4491     + err = unionfs_file_revalidate(file, parent, false);
4492     + if (unlikely(err))
4493     + goto out;
4494     +
4495     + lower_file = unionfs_lower_file(file);
4496     + err = vfs_read(lower_file, buf, count, ppos);
4497     + /* update our inode atime upon a successful lower read */
4498     + if (err >= 0) {
4499     + fsstack_copy_attr_atime(dentry->d_inode,
4500     + lower_file->f_path.dentry->d_inode);
4501     + unionfs_check_file(file);
4502     + }
4503     +
4504     +out:
4505     + unionfs_unlock_dentry(dentry);
4506     + unionfs_unlock_parent(dentry, parent);
4507     + unionfs_read_unlock(dentry->d_sb);
4508     + return err;
4509     +}
4510     +
4511     +static ssize_t unionfs_write(struct file *file, const char __user *buf,
4512     + size_t count, loff_t *ppos)
4513     +{
4514     + int err = 0;
4515     + struct file *lower_file;
4516     + struct dentry *dentry = file->f_path.dentry;
4517     + struct dentry *parent;
4518     +
4519     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4520     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4521     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4522     +
4523     + err = unionfs_file_revalidate(file, parent, true);
4524     + if (unlikely(err))
4525     + goto out;
4526     +
4527     + lower_file = unionfs_lower_file(file);
4528     + err = vfs_write(lower_file, buf, count, ppos);
4529     + /* update our inode times+sizes upon a successful lower write */
4530     + if (err >= 0) {
4531     + fsstack_copy_inode_size(dentry->d_inode,
4532     + lower_file->f_path.dentry->d_inode);
4533     + fsstack_copy_attr_times(dentry->d_inode,
4534     + lower_file->f_path.dentry->d_inode);
4535     + UNIONFS_F(file)->wrote_to_file = true; /* for delayed copyup */
4536     + unionfs_check_file(file);
4537     + }
4538     +
4539     +out:
4540     + unionfs_unlock_dentry(dentry);
4541     + unionfs_unlock_parent(dentry, parent);
4542     + unionfs_read_unlock(dentry->d_sb);
4543     + return err;
4544     +}
4545     +
4546     +static int unionfs_file_readdir(struct file *file, void *dirent,
4547     + filldir_t filldir)
4548     +{
4549     + return -ENOTDIR;
4550     +}
4551     +
4552     +static int unionfs_mmap(struct file *file, struct vm_area_struct *vma)
4553     +{
4554     + int err = 0;
4555     + bool willwrite;
4556     + struct file *lower_file;
4557     + struct dentry *dentry = file->f_path.dentry;
4558     + struct dentry *parent;
4559     + const struct vm_operations_struct *saved_vm_ops = NULL;
4560     +
4561     + /*
4562     + * Since mm/memory.c:might_fault() (under PROVE_LOCKING) was
4563     + * modified in 2.6.29-rc1 to call might_lock_read on mmap_sem, this
4564     + * has been causing false positives in file system stacking layers.
4565     + * In particular, our ->mmap is called after sys_mmap2 already holds
4566     + * mmap_sem, then we lock our own mutexes; but earlier, it's
4567     + * possible for lockdep to have locked our mutexes first, and then
4568     + * we call a lower ->readdir which could call might_fault. The
4569     + * different ordering of the locks is what lockdep complains about
4570     + * -- unnecessarily. Therefore, we have no choice but to turn
4571     + * lockdep off temporarily here. Note: the comments
4572     + * inside might_sleep also suggest that it would have been
4573     + * nicer to annotate only the paths that need that might_lock_read.
4574     + */
4575     + lockdep_off();
4576     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4577     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4578     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4579     +
4580     + /* This might be deferred to mmap's writepage */
4581     + willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
4582     + err = unionfs_file_revalidate(file, parent, willwrite);
4583     + if (unlikely(err))
4584     + goto out;
4585     + unionfs_check_file(file);
4586     +
4587     + /*
4588     + * File systems which do not implement ->writepage may use
4589     + * generic_file_readonly_mmap as their ->mmap op. If you call
4590     + * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
4591     + * But we cannot call the lower ->mmap op, so we can't tell that
4592     + * writeable mappings won't work. Therefore, our only choice is to
4593     + * check if the lower file system supports the ->writepage, and if
4594     + * not, return EINVAL (the same error that
4595     + * generic_file_readonly_mmap returns in that case).
4596     + */
4597     + lower_file = unionfs_lower_file(file);
4598     + if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
4599     + err = -EINVAL;
4600     + printk(KERN_ERR "unionfs: branch %d file system does not "
4601     + "support writeable mmap\n", fbstart(file));
4602     + goto out;
4603     + }
4604     +
4605     + /*
4606     + * find and save lower vm_ops.
4607     + *
4608     + * XXX: the VFS should have a cleaner way of finding the lower vm_ops
4609     + */
4610     + if (!UNIONFS_F(file)->lower_vm_ops) {
4611     + err = lower_file->f_op->mmap(lower_file, vma);
4612     + if (err) {
4613     + printk(KERN_ERR "unionfs: lower mmap failed %d\n", err);
4614     + goto out;
4615     + }
4616     + saved_vm_ops = vma->vm_ops;
4617     + err = do_munmap(current->mm, vma->vm_start,
4618     + vma->vm_end - vma->vm_start);
4619     + if (err) {
4620     + printk(KERN_ERR "unionfs: do_munmap failed %d\n", err);
4621     + goto out;
4622     + }
4623     + }
4624     +
4625     + file->f_mapping->a_ops = &unionfs_dummy_aops;
4626     + err = generic_file_mmap(file, vma);
4627     + file->f_mapping->a_ops = &unionfs_aops;
4628     + if (err) {
4629     + printk(KERN_ERR "unionfs: generic_file_mmap failed %d\n", err);
4630     + goto out;
4631     + }
4632     + vma->vm_ops = &unionfs_vm_ops;
4633     + if (!UNIONFS_F(file)->lower_vm_ops)
4634     + UNIONFS_F(file)->lower_vm_ops = saved_vm_ops;
4635     +
4636     +out:
4637     + if (!err) {
4638     + /* copyup could cause parent dir times to change */
4639     + unionfs_copy_attr_times(parent->d_inode);
4640     + unionfs_check_file(file);
4641     + }
4642     + unionfs_unlock_dentry(dentry);
4643     + unionfs_unlock_parent(dentry, parent);
4644     + unionfs_read_unlock(dentry->d_sb);
4645     + lockdep_on();
4646     + return err;
4647     +}
4648     +
4649     +int unionfs_fsync(struct file *file, int datasync)
4650     +{
4651     + int bindex, bstart, bend;
4652     + struct file *lower_file;
4653     + struct dentry *dentry = file->f_path.dentry;
4654     + struct dentry *lower_dentry;
4655     + struct dentry *parent;
4656     + struct inode *lower_inode, *inode;
4657     + int err = -EINVAL;
4658     +
4659     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4660     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4661     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4662     +
4663     + err = unionfs_file_revalidate(file, parent, true);
4664     + if (unlikely(err))
4665     + goto out;
4666     + unionfs_check_file(file);
4667     +
4668     + bstart = fbstart(file);
4669     + bend = fbend(file);
4670     + if (bstart < 0 || bend < 0)
4671     + goto out;
4672     +
4673     + inode = dentry->d_inode;
4674     + if (unlikely(!inode)) {
4675     + printk(KERN_ERR
4676     + "unionfs: null lower inode in unionfs_fsync\n");
4677     + goto out;
4678     + }
4679     + for (bindex = bstart; bindex <= bend; bindex++) {
4680     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4681     + if (!lower_inode || !lower_inode->i_fop->fsync)
4682     + continue;
4683     + lower_file = unionfs_lower_file_idx(file, bindex);
4684     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4685     + mutex_lock(&lower_inode->i_mutex);
4686     + err = lower_inode->i_fop->fsync(lower_file, datasync);
4687     + if (!err && bindex == bstart)
4688     + fsstack_copy_attr_times(inode, lower_inode);
4689     + mutex_unlock(&lower_inode->i_mutex);
4690     + if (err)
4691     + goto out;
4692     + }
4693     +
4694     +out:
4695     + if (!err)
4696     + unionfs_check_file(file);
4697     + unionfs_unlock_dentry(dentry);
4698     + unionfs_unlock_parent(dentry, parent);
4699     + unionfs_read_unlock(dentry->d_sb);
4700     + return err;
4701     +}
4702     +
4703     +int unionfs_fasync(int fd, struct file *file, int flag)
4704     +{
4705     + int bindex, bstart, bend;
4706     + struct file *lower_file;
4707     + struct dentry *dentry = file->f_path.dentry;
4708     + struct dentry *parent;
4709     + struct inode *lower_inode, *inode;
4710     + int err = 0;
4711     +
4712     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4713     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4714     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4715     +
4716     + err = unionfs_file_revalidate(file, parent, true);
4717     + if (unlikely(err))
4718     + goto out;
4719     + unionfs_check_file(file);
4720     +
4721     + bstart = fbstart(file);
4722     + bend = fbend(file);
4723     + if (bstart < 0 || bend < 0)
4724     + goto out;
4725     +
4726     + inode = dentry->d_inode;
4727     + if (unlikely(!inode)) {
4728     + printk(KERN_ERR
4729     + "unionfs: null lower inode in unionfs_fasync\n");
4730     + goto out;
4731     + }
4732     + for (bindex = bstart; bindex <= bend; bindex++) {
4733     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
4734     + if (!lower_inode || !lower_inode->i_fop->fasync)
4735     + continue;
4736     + lower_file = unionfs_lower_file_idx(file, bindex);
4737     + mutex_lock(&lower_inode->i_mutex);
4738     + err = lower_inode->i_fop->fasync(fd, lower_file, flag);
4739     + if (!err && bindex == bstart)
4740     + fsstack_copy_attr_times(inode, lower_inode);
4741     + mutex_unlock(&lower_inode->i_mutex);
4742     + if (err)
4743     + goto out;
4744     + }
4745     +
4746     +out:
4747     + if (!err)
4748     + unionfs_check_file(file);
4749     + unionfs_unlock_dentry(dentry);
4750     + unionfs_unlock_parent(dentry, parent);
4751     + unionfs_read_unlock(dentry->d_sb);
4752     + return err;
4753     +}
4754     +
4755     +static ssize_t unionfs_splice_read(struct file *file, loff_t *ppos,
4756     + struct pipe_inode_info *pipe, size_t len,
4757     + unsigned int flags)
4758     +{
4759     + ssize_t err;
4760     + struct file *lower_file;
4761     + struct dentry *dentry = file->f_path.dentry;
4762     + struct dentry *parent;
4763     +
4764     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4765     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4766     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4767     +
4768     + err = unionfs_file_revalidate(file, parent, false);
4769     + if (unlikely(err))
4770     + goto out;
4771     +
4772     + lower_file = unionfs_lower_file(file);
4773     + err = vfs_splice_to(lower_file, ppos, pipe, len, flags);
4774     + /* update our inode atime upon a successful lower splice-read */
4775     + if (err >= 0) {
4776     + fsstack_copy_attr_atime(dentry->d_inode,
4777     + lower_file->f_path.dentry->d_inode);
4778     + unionfs_check_file(file);
4779     + }
4780     +
4781     +out:
4782     + unionfs_unlock_dentry(dentry);
4783     + unionfs_unlock_parent(dentry, parent);
4784     + unionfs_read_unlock(dentry->d_sb);
4785     + return err;
4786     +}
4787     +
4788     +static ssize_t unionfs_splice_write(struct pipe_inode_info *pipe,
4789     + struct file *file, loff_t *ppos,
4790     + size_t len, unsigned int flags)
4791     +{
4792     + ssize_t err = 0;
4793     + struct file *lower_file;
4794     + struct dentry *dentry = file->f_path.dentry;
4795     + struct dentry *parent;
4796     +
4797     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_PARENT);
4798     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4799     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4800     +
4801     + err = unionfs_file_revalidate(file, parent, true);
4802     + if (unlikely(err))
4803     + goto out;
4804     +
4805     + lower_file = unionfs_lower_file(file);
4806     + err = vfs_splice_from(pipe, lower_file, ppos, len, flags);
4807     + /* update our inode times+sizes upon a successful lower write */
4808     + if (err >= 0) {
4809     + fsstack_copy_inode_size(dentry->d_inode,
4810     + lower_file->f_path.dentry->d_inode);
4811     + fsstack_copy_attr_times(dentry->d_inode,
4812     + lower_file->f_path.dentry->d_inode);
4813     + unionfs_check_file(file);
4814     + }
4815     +
4816     +out:
4817     + unionfs_unlock_dentry(dentry);
4818     + unionfs_unlock_parent(dentry, parent);
4819     + unionfs_read_unlock(dentry->d_sb);
4820     + return err;
4821     +}
4822     +
4823     +struct file_operations unionfs_main_fops = {
4824     + .llseek = generic_file_llseek,
4825     + .read = unionfs_read,
4826     + .write = unionfs_write,
4827     + .readdir = unionfs_file_readdir,
4828     + .unlocked_ioctl = unionfs_ioctl,
4829     +#ifdef CONFIG_COMPAT
4830     + .compat_ioctl = unionfs_ioctl,
4831     +#endif
4832     + .mmap = unionfs_mmap,
4833     + .open = unionfs_open,
4834     + .flush = unionfs_flush,
4835     + .release = unionfs_file_release,
4836     + .fsync = unionfs_fsync,
4837     + .fasync = unionfs_fasync,
4838     + .splice_read = unionfs_splice_read,
4839     + .splice_write = unionfs_splice_write,
4840     +};
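In unionfs_mmap() above, willwrite is computed as ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags), which is true only when both bits are already set. The sketch below shows that this is the same as the more common mask-and-compare form. The DEMO_ flag values and both function names are assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this demo only. */
#define DEMO_VM_WRITE  0x2UL
#define DEMO_VM_SHARED 0x8UL

/* Form used in the patch: OR-ing the bits in changes nothing iff both are set. */
static bool demo_willwrite(unsigned long vm_flags)
{
	return (vm_flags | DEMO_VM_SHARED | DEMO_VM_WRITE) == vm_flags;
}

/* Equivalent mask-and-compare form. */
static bool demo_willwrite_masked(unsigned long vm_flags)
{
	const unsigned long mask = DEMO_VM_SHARED | DEMO_VM_WRITE;

	return (vm_flags & mask) == mask;
}
```

A private writable mapping (write bit only) or a shared read-only mapping (shared bit only) yields false in both forms, so only shared writable mappings trigger the ->writepage check.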
4841     diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
4842     new file mode 100644
4843     index 0000000..b207c13
4844     --- /dev/null
4845     +++ b/fs/unionfs/inode.c
4846     @@ -0,0 +1,1099 @@
4847     +/*
4848     + * Copyright (c) 2003-2011 Erez Zadok
4849     + * Copyright (c) 2003-2006 Charles P. Wright
4850     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
4851     + * Copyright (c) 2005-2006 Junjiro Okajima
4852     + * Copyright (c) 2005 Arun M. Krishnakumar
4853     + * Copyright (c) 2004-2006 David P. Quigley
4854     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
4855     + * Copyright (c) 2003 Puja Gupta
4856     + * Copyright (c) 2003 Harikesavan Krishnan
4857     + * Copyright (c) 2003-2011 Stony Brook University
4858     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
4859     + *
4860     + * This program is free software; you can redistribute it and/or modify
4861     + * it under the terms of the GNU General Public License version 2 as
4862     + * published by the Free Software Foundation.
4863     + */
4864     +
4865     +#include "union.h"
4866     +
4867     +/*
4868     + * Find a writeable branch to create a new object in. Checks all writeable
4869     + * branches of the parent inode, in order from istart to iend; if none are
4870     + * suitable, also tries branch 0 (which may require a copyup).
4871     + *
4872     + * Returns a lower_dentry in which we can create the object, or an ERR_PTR.
4873     + */
4874     +static struct dentry *find_writeable_branch(struct inode *parent,
4875     + struct dentry *dentry)
4876     +{
4877     + int err = -EINVAL;
4878     + int bindex, istart, iend;
4879     + struct dentry *lower_dentry = NULL;
4880     +
4881     + istart = ibstart(parent);
4882     + iend = ibend(parent);
4883     + if (istart < 0)
4884     + goto out;
4885     +
4886     +begin:
4887     + for (bindex = istart; bindex <= iend; bindex++) {
4888     + /* skip non-writeable branches */
4889     + err = is_robranch_super(dentry->d_sb, bindex);
4890     + if (err) {
4891     + err = -EROFS;
4892     + continue;
4893     + }
4894     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
4895     + if (!lower_dentry)
4896     + continue;
4897     + /*
4898     + * check for whiteouts in writeable branch, and remove them
4899     + * if necessary.
4900     + */
4901     + err = check_unlink_whiteout(dentry, lower_dentry, bindex);
4902     + if (err > 0) /* ignore if whiteout found and removed */
4903     + err = 0;
4904     + if (err)
4905     + continue;
4906     + /* if we get here, we can write to the branch */
4907     + break;
4908     + }
4909     + /*
4910     + * If istart wasn't already branch 0, and we got any error, then try
4911     + * branch 0 (which may require copyup)
4912     + */
4913     + if (err && istart > 0) {
4914     + istart = iend = 0;
4915     + goto begin;
4916     + }
4917     +
4918     + /*
4919     + * If we tried even branch 0, and still got an error, abort. But if
4920     + * the error was an EROFS, then we should try to copyup.
4921     + */
4922     + if (err && err != -EROFS)
4923     + goto out;
4924     +
4925     + /*
4926     + * If we get here, then check if copyup is needed. If lower_dentry is
4927     + * NULL, create the entire dentry directory structure in branch 0.
4928     + */
4929     + if (!lower_dentry) {
4930     + bindex = 0;
4931     + lower_dentry = create_parents(parent, dentry,
4932     + dentry->d_name.name, bindex);
4933     + if (IS_ERR(lower_dentry)) {
4934     + err = PTR_ERR(lower_dentry);
4935     + goto out;
4936     + }
4937     + }
4938     + err = 0; /* all's well */
4939     +out:
4940     + if (err)
4941     + return ERR_PTR(err);
4942     + return lower_dentry;
4943     +}
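The search order in find_writeable_branch() can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the function itself: struct demo_branch and demo_pick_branch are hypothetical, the whiteout check is left out, and where the real code goes on to try create_parents() in branch 0 after an -EROFS result, the sketch simply returns -EROFS to its caller.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-branch state, for illustration only. */
struct demo_branch {
	bool read_only;
};

/*
 * Return the index of the first writable branch in [start, end].
 * If none is found and start > 0, fall back to branch 0.
 * Returns -EINVAL for a negative start and -EROFS when nothing is writable.
 */
static int demo_pick_branch(const struct demo_branch *br, int start, int end)
{
	int i;

	if (start < 0)
		return -EINVAL;

	for (i = start; i <= end; i++) {
		if (!br[i].read_only)
			return i;	/* first writable branch wins */
	}

	/* The range was exhausted; retry branch 0 if it was not scanned. */
	if (start > 0 && !br[0].read_only)
		return 0;

	return -EROFS;
}
```

Scanning from the lowest index first reflects branch priority: the leftmost writable branch that already holds the parent directory is preferred, and branch 0 is the fallback.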
4944     +
4945     +static int unionfs_create(struct inode *dir, struct dentry *dentry,
4946     + int mode, struct nameidata *nd_unused)
4947     +{
4948     + int err = 0;
4949     + struct dentry *lower_dentry = NULL;
4950     + struct dentry *lower_parent_dentry = NULL;
4951     + struct dentry *parent;
4952     + int valid = 0;
4953     + struct nameidata lower_nd;
4954     +
4955     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
4956     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
4957     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
4958     +
4959     + valid = __unionfs_d_revalidate(dentry, parent, false);
4960     + if (unlikely(!valid)) {
4961     + err = -ESTALE; /* same as what real_lookup does */
4962     + goto out;
4963     + }
4964     +
4965     + lower_dentry = find_writeable_branch(dir, dentry);
4966     + if (IS_ERR(lower_dentry)) {
4967     + err = PTR_ERR(lower_dentry);
4968     + goto out;
4969     + }
4970     +
4971     + lower_parent_dentry = lock_parent(lower_dentry);
4972     + if (IS_ERR(lower_parent_dentry)) {
4973     + err = PTR_ERR(lower_parent_dentry);
4974     + goto out_unlock;
4975     + }
4976     +
4977     + err = init_lower_nd(&lower_nd, LOOKUP_CREATE);
4978     + if (unlikely(err < 0))
4979     + goto out_unlock;
4980     + err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode,
4981     + &lower_nd);
4982     + release_lower_nd(&lower_nd, err);
4983     +
4984     + if (!err) {
4985     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
4986     + if (!err) {
4987     + unionfs_copy_attr_times(dir);
4988     + fsstack_copy_inode_size(dir,
4989     + lower_parent_dentry->d_inode);
4990     + /* update no. of links on parent directory */
4991     + dir->i_nlink = unionfs_get_nlinks(dir);
4992     + }
4993     + }
4994     +
4995     +out_unlock:
4996     + unlock_dir(lower_parent_dentry);
4997     +out:
4998     + if (!err) {
4999     + unionfs_postcopyup_setmnt(dentry);
5000     + unionfs_check_inode(dir);
5001     + unionfs_check_dentry(dentry);
5002     + }
5003     + unionfs_unlock_dentry(dentry);
5004     + unionfs_unlock_parent(dentry, parent);
5005     + unionfs_read_unlock(dentry->d_sb);
5006     + return err;
5007     +}
5008     +
5009     +/*
5010     + * unionfs_lookup is the only special function which takes a dentry, yet we
5011     + * do NOT want to call __unionfs_d_revalidate_chain because by definition,
5012     + * we don't have a valid dentry here yet.
5013     + */
5014     +static struct dentry *unionfs_lookup(struct inode *dir,
5015     + struct dentry *dentry,
5016     + struct nameidata *nd_unused)
5017     +{
5018     + struct dentry *ret, *parent;
5019     + int err = 0;
5020     +
5021     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5022     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5023     +
5024     + /*
5025     + * As long as we lock/dget the parent, then we can skip validating the
5026     + * parent now; we may have to rebuild this dentry on the next
5027     + * ->d_revalidate, however.
5028     + */
5029     +
5030     + /* allocate dentry private data. We free it in ->d_release */
5031     + err = new_dentry_private_data(dentry, UNIONFS_DMUTEX_CHILD);
5032     + if (unlikely(err)) {
5033     + ret = ERR_PTR(err);
5034     + goto out;
5035     + }
5036     +
5037     + ret = unionfs_lookup_full(dentry, parent, INTERPOSE_LOOKUP);
5038     +
5039     + if (!IS_ERR(ret)) {
5040     + if (ret)
5041     + dentry = ret;
5042     + /* lookup_full can return multiple positive dentries */
5043     + if (dentry->d_inode && !S_ISDIR(dentry->d_inode->i_mode)) {
5044     + BUG_ON(dbstart(dentry) < 0);
5045     + unionfs_postcopyup_release(dentry);
5046     + }
5047     + unionfs_copy_attr_times(dentry->d_inode);
5048     + }
5049     +
5050     + unionfs_check_inode(dir);
5051     + if (!IS_ERR(ret))
5052     + unionfs_check_dentry(dentry);
5053     + unionfs_check_dentry(parent);
5054     + unionfs_unlock_dentry(dentry); /* locked in new_dentry_private_data */
5055     +
5056     +out:
5057     + unionfs_unlock_parent(dentry, parent);
5058     + unionfs_read_unlock(dentry->d_sb);
5059     +
5060     + return ret;
5061     +}
5062     +
5063     +static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
5064     + struct dentry *new_dentry)
5065     +{
5066     + int err = 0;
5067     + struct dentry *lower_old_dentry = NULL;
5068     + struct dentry *lower_new_dentry = NULL;
5069     + struct dentry *lower_dir_dentry = NULL;
5070     + struct dentry *old_parent, *new_parent;
5071     + char *name = NULL;
5072     + bool valid;
5073     +
5074     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5075     + old_parent = dget_parent(old_dentry);
5076     + new_parent = dget_parent(new_dentry);
5077     + unionfs_double_lock_parents(old_parent, new_parent);
5078     + unionfs_double_lock_dentry(old_dentry, new_dentry);
5079     +
5080     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
5081     + if (unlikely(!valid)) {
5082     + err = -ESTALE;
5083     + goto out;
5084     + }
5085     + if (new_dentry->d_inode) {
5086     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
5087     + if (unlikely(!valid)) {
5088     + err = -ESTALE;
5089     + goto out;
5090     + }
5091     + }
5092     +
5093     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5094     +
5095     + /* check for a whiteout in new dentry branch, and delete it */
5096     + err = check_unlink_whiteout(new_dentry, lower_new_dentry,
5097     + dbstart(new_dentry));
5098     + if (err > 0) { /* whiteout found and removed successfully */
5099     + lower_dir_dentry = dget_parent(lower_new_dentry);
5100     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
5101     + dput(lower_dir_dentry);
5102     + dir->i_nlink = unionfs_get_nlinks(dir);
5103     + err = 0;
5104     + }
5105     + if (err)
5106     + goto out;
5107     +
5108     + /* check if parent hierarchy is needed, then link in same branch */
5109     + if (dbstart(old_dentry) != dbstart(new_dentry)) {
5110     + lower_new_dentry = create_parents(dir, new_dentry,
5111     + new_dentry->d_name.name,
5112     + dbstart(old_dentry));
5113     + err = PTR_ERR(lower_new_dentry);
5114     + if (IS_COPYUP_ERR(err))
5115     + goto docopyup;
5116     + if (!lower_new_dentry || IS_ERR(lower_new_dentry))
5117     + goto out;
5118     + }
5119     + lower_new_dentry = unionfs_lower_dentry(new_dentry);
5120     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5121     +
5122     + BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
5123     + lower_dir_dentry = lock_parent(lower_new_dentry);
5124     + err = is_robranch(old_dentry);
5125     + if (!err) {
5126     + /* see Documentation/filesystems/unionfs/issues.txt */
5127     + lockdep_off();
5128     + err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
5129     + lower_new_dentry);
5130     + lockdep_on();
5131     + }
5132     + unlock_dir(lower_dir_dentry);
5133     +
5134     +docopyup:
5135     + if (IS_COPYUP_ERR(err)) {
5136     + int old_bstart = dbstart(old_dentry);
5137     + int bindex;
5138     +
5139     + for (bindex = old_bstart - 1; bindex >= 0; bindex--) {
5140     + err = copyup_dentry(old_parent->d_inode,
5141     + old_dentry, old_bstart,
5142     + bindex, old_dentry->d_name.name,
5143     + old_dentry->d_name.len, NULL,
5144     + i_size_read(old_dentry->d_inode));
5145     + if (err)
5146     + continue;
5147     + lower_new_dentry =
5148     + create_parents(dir, new_dentry,
5149     + new_dentry->d_name.name,
5150     + bindex);
5151     + lower_old_dentry = unionfs_lower_dentry(old_dentry);
5152     + lower_dir_dentry = lock_parent(lower_new_dentry);
5153     + /* see Documentation/filesystems/unionfs/issues.txt */
5154     + lockdep_off();
5155     + /* do vfs_link */
5156     + err = vfs_link(lower_old_dentry,
5157     + lower_dir_dentry->d_inode,
5158     + lower_new_dentry);
5159     + lockdep_on();
5160     + unlock_dir(lower_dir_dentry);
5161     + goto check_link;
5162     + }
5163     + goto out;
5164     + }
5165     +
5166     +check_link:
5167     + if (err || !lower_new_dentry->d_inode)
5168     + goto out;
5169     +
5170     + /* It's a hard link, so use the same inode */
5171     + new_dentry->d_inode = igrab(old_dentry->d_inode);
5172     + d_add(new_dentry, new_dentry->d_inode);
5173     + unionfs_copy_attr_all(dir, lower_new_dentry->d_parent->d_inode);
5174     + fsstack_copy_inode_size(dir, lower_new_dentry->d_parent->d_inode);
5175     +
5176     + /* propagate number of hard-links */
5177     + old_dentry->d_inode->i_nlink = unionfs_get_nlinks(old_dentry->d_inode);
5178     + /* new dentry's ctime may have changed due to hard-link counts */
5179     + unionfs_copy_attr_times(new_dentry->d_inode);
5180     +
5181     +out:
5182     + if (!new_dentry->d_inode)
5183     + d_drop(new_dentry);
5184     +
5185     + kfree(name);
5186     + if (!err)
5187     + unionfs_postcopyup_setmnt(new_dentry);
5188     +
5189     + unionfs_check_inode(dir);
5190     + unionfs_check_dentry(new_dentry);
5191     + unionfs_check_dentry(old_dentry);
5192     +
5193     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
5194     + unionfs_double_unlock_parents(old_parent, new_parent);
5195     + dput(new_parent);
5196     + dput(old_parent);
5197     + unionfs_read_unlock(old_dentry->d_sb);
5198     +
5199     + return err;
5200     +}
5201     +
5202     +static int unionfs_symlink(struct inode *dir, struct dentry *dentry,
5203     + const char *symname)
5204     +{
5205     + int err = 0;
5206     + struct dentry *lower_dentry = NULL;
5207     + struct dentry *wh_dentry = NULL;
5208     + struct dentry *lower_parent_dentry = NULL;
5209     + struct dentry *parent;
5210     + char *name = NULL;
5211     + int valid = 0;
5212     + umode_t mode;
5213     +
5214     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5215     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5216     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5217     +
5218     + valid = __unionfs_d_revalidate(dentry, parent, false);
5219     + if (unlikely(!valid)) {
5220     + err = -ESTALE;
5221     + goto out;
5222     + }
5223     +
5224     + /*
5225     + * It's only a bug if this dentry was not negative and couldn't be
5226     + * revalidated (shouldn't happen).
5227     + */
5228     + BUG_ON(!valid && dentry->d_inode);
5229     +
5230     + lower_dentry = find_writeable_branch(dir, dentry);
5231     + if (IS_ERR(lower_dentry)) {
5232     + err = PTR_ERR(lower_dentry);
5233     + goto out;
5234     + }
5235     +
5236     + lower_parent_dentry = lock_parent(lower_dentry);
5237     + if (IS_ERR(lower_parent_dentry)) {
5238     + err = PTR_ERR(lower_parent_dentry);
5239     + goto out_unlock;
5240     + }
5241     +
5242     + mode = S_IALLUGO;
5243     + err = vfs_symlink(lower_parent_dentry->d_inode, lower_dentry, symname);
5244     + if (!err) {
5245     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5246     + if (!err) {
5247     + unionfs_copy_attr_times(dir);
5248     + fsstack_copy_inode_size(dir,
5249     + lower_parent_dentry->d_inode);
5250     + /* update no. of links on parent directory */
5251     + dir->i_nlink = unionfs_get_nlinks(dir);
5252     + }
5253     + }
5254     +
5255     +out_unlock:
5256     + unlock_dir(lower_parent_dentry);
5257     +out:
5258     + dput(wh_dentry);
5259     + kfree(name);
5260     +
5261     + if (!err) {
5262     + unionfs_postcopyup_setmnt(dentry);
5263     + unionfs_check_inode(dir);
5264     + unionfs_check_dentry(dentry);
5265     + }
5266     + unionfs_unlock_dentry(dentry);
5267     + unionfs_unlock_parent(dentry, parent);
5268     + unionfs_read_unlock(dentry->d_sb);
5269     + return err;
5270     +}
5271     +
5272     +static int unionfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
5273     +{
5274     + int err = 0;
5275     + struct dentry *lower_dentry = NULL;
5276     + struct dentry *lower_parent_dentry = NULL;
5277     + struct dentry *parent;
5278     + int bindex = 0, bstart;
5279     + char *name = NULL;
5280     + int valid;
5281     +
5282     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5283     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5284     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5285     +
5286     + valid = __unionfs_d_revalidate(dentry, parent, false);
5287     + if (unlikely(!valid)) {
5288     + err = -ESTALE; /* same as what real_lookup does */
5289     + goto out;
5290     + }
5291     +
5292     + bstart = dbstart(dentry);
5293     +
5294     + lower_dentry = unionfs_lower_dentry(dentry);
5295     +
5296     + /* check for a whiteout in new dentry branch, and delete it */
5297     + err = check_unlink_whiteout(dentry, lower_dentry, bstart);
5298     + if (err > 0) /* whiteout found and removed successfully */
5299     + err = 0;
5300     + if (err) {
5301     + /* exit if the error returned was NOT -EROFS */
5302     + if (!IS_COPYUP_ERR(err))
5303     + goto out;
5304     + bstart--;
5305     + }
5306     +
5307     + /* check if copyup's needed, and mkdir */
5308     + for (bindex = bstart; bindex >= 0; bindex--) {
5309     + int i;
5310     + int bend = dbend(dentry);
5311     +
5312     + if (is_robranch_super(dentry->d_sb, bindex))
5313     + continue;
5314     +
5315     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
5316     + if (!lower_dentry) {
5317     + lower_dentry = create_parents(dir, dentry,
5318     + dentry->d_name.name,
5319     + bindex);
5320     + if (!lower_dentry || IS_ERR(lower_dentry)) {
5321     + printk(KERN_ERR "unionfs: lower dentry "
5322     + "NULL for bindex = %d\n", bindex);
5323     + continue;
5324     + }
5325     + }
5326     +
5327     + lower_parent_dentry = lock_parent(lower_dentry);
5328     +
5329     + if (IS_ERR(lower_parent_dentry)) {
5330     + err = PTR_ERR(lower_parent_dentry);
5331     + goto out;
5332     + }
5333     +
5334     + err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
5335     + mode);
5336     +
5337     + unlock_dir(lower_parent_dentry);
5338     +
5339     + /* did the mkdir succeed? */
5340     + if (err)
5341     + break;
5342     +
5343     + for (i = bindex + 1; i <= bend; i++) {
5344     + /* XXX: use path_put_lowers? */
5345     + if (unionfs_lower_dentry_idx(dentry, i)) {
5346     + dput(unionfs_lower_dentry_idx(dentry, i));
5347     + unionfs_set_lower_dentry_idx(dentry, i, NULL);
5348     + }
5349     + }
5350     + dbend(dentry) = bindex;
5351     +
5352     + /*
5353     + * Only INTERPOSE_LOOKUP can return a value other than 0 on
5354     + * err.
5355     + */
5356     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5357     + if (!err) {
5358     + unionfs_copy_attr_times(dir);
5359     + fsstack_copy_inode_size(dir,
5360     + lower_parent_dentry->d_inode);
5361     +
5362     + /* update number of links on parent directory */
5363     + dir->i_nlink = unionfs_get_nlinks(dir);
5364     + }
5365     +
5366     + err = make_dir_opaque(dentry, dbstart(dentry));
5367     + if (err) {
5368     + printk(KERN_ERR "unionfs: mkdir: error creating "
5369     + ".wh.__dir_opaque: %d\n", err);
5370     + goto out;
5371     + }
5372     +
5373     + /* we are done! */
5374     + break;
5375     + }
5376     +
5377     +out:
5378     + if (!dentry->d_inode)
5379     + d_drop(dentry);
5380     +
5381     + kfree(name);
5382     +
5383     + if (!err) {
5384     + unionfs_copy_attr_times(dentry->d_inode);
5385     + unionfs_postcopyup_setmnt(dentry);
5386     + }
5387     + unionfs_check_inode(dir);
5388     + unionfs_check_dentry(dentry);
5389     + unionfs_unlock_dentry(dentry);
5390     + unionfs_unlock_parent(dentry, parent);
5391     + unionfs_read_unlock(dentry->d_sb);
5392     +
5393     + return err;
5394     +}
5395     +
5396     +static int unionfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
5397     + dev_t dev)
5398     +{
5399     + int err = 0;
5400     + struct dentry *lower_dentry = NULL;
5401     + struct dentry *wh_dentry = NULL;
5402     + struct dentry *lower_parent_dentry = NULL;
5403     + struct dentry *parent;
5404     + char *name = NULL;
5405     + int valid = 0;
5406     +
5407     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5408     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5409     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5410     +
5411     + valid = __unionfs_d_revalidate(dentry, parent, false);
5412     + if (unlikely(!valid)) {
5413     + err = -ESTALE;
5414     + goto out;
5415     + }
5416     +
5417     + /*
5418     + * It's only a bug if this dentry was not negative and couldn't be
5419     + * revalidated (shouldn't happen).
5420     + */
5421     + BUG_ON(!valid && dentry->d_inode);
5422     +
5423     + lower_dentry = find_writeable_branch(dir, dentry);
5424     + if (IS_ERR(lower_dentry)) {
5425     + err = PTR_ERR(lower_dentry);
5426     + goto out;
5427     + }
5428     +
5429     + lower_parent_dentry = lock_parent(lower_dentry);
5430     + if (IS_ERR(lower_parent_dentry)) {
5431     + err = PTR_ERR(lower_parent_dentry);
5432     + goto out_unlock;
5433     + }
5434     +
5435     + err = vfs_mknod(lower_parent_dentry->d_inode, lower_dentry, mode, dev);
5436     + if (!err) {
5437     + err = PTR_ERR(unionfs_interpose(dentry, dir->i_sb, 0));
5438     + if (!err) {
5439     + unionfs_copy_attr_times(dir);
5440     + fsstack_copy_inode_size(dir,
5441     + lower_parent_dentry->d_inode);
5442     + /* update no. of links on parent directory */
5443     + dir->i_nlink = unionfs_get_nlinks(dir);
5444     + }
5445     + }
5446     +
5447     +out_unlock:
5448     + unlock_dir(lower_parent_dentry);
5449     +out:
5450     + dput(wh_dentry);
5451     + kfree(name);
5452     +
5453     + if (!err) {
5454     + unionfs_postcopyup_setmnt(dentry);
5455     + unionfs_check_inode(dir);
5456     + unionfs_check_dentry(dentry);
5457     + }
5458     + unionfs_unlock_dentry(dentry);
5459     + unionfs_unlock_parent(dentry, parent);
5460     + unionfs_read_unlock(dentry->d_sb);
5461     + return err;
5462     +}
5463     +
5464     +/* requires sb, dentry, and parent to already be locked */
5465     +static int __unionfs_readlink(struct dentry *dentry, char __user *buf,
5466     + int bufsiz)
5467     +{
5468     + int err;
5469     + struct dentry *lower_dentry;
5470     +
5471     + lower_dentry = unionfs_lower_dentry(dentry);
5472     +
5473     + if (!lower_dentry->d_inode->i_op ||
5474     + !lower_dentry->d_inode->i_op->readlink) {
5475     + err = -EINVAL;
5476     + goto out;
5477     + }
5478     +
5479     + err = lower_dentry->d_inode->i_op->readlink(lower_dentry,
5480     + buf, bufsiz);
5481     + if (err >= 0)
5482     + fsstack_copy_attr_atime(dentry->d_inode,
5483     + lower_dentry->d_inode);
5484     +
5485     +out:
5486     + return err;
5487     +}
5488     +
5489     +static int unionfs_readlink(struct dentry *dentry, char __user *buf,
5490     + int bufsiz)
5491     +{
5492     + int err;
5493     + struct dentry *parent;
5494     +
5495     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5496     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5497     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5498     +
5499     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5500     + err = -ESTALE;
5501     + goto out;
5502     + }
5503     +
5504     + err = __unionfs_readlink(dentry, buf, bufsiz);
5505     +
5506     +out:
5507     + unionfs_check_dentry(dentry);
5508     + unionfs_unlock_dentry(dentry);
5509     + unionfs_unlock_parent(dentry, parent);
5510     + unionfs_read_unlock(dentry->d_sb);
5511     +
5512     + return err;
5513     +}
5514     +
5515     +static void *unionfs_follow_link(struct dentry *dentry, struct nameidata *nd)
5516     +{
5517     + char *buf;
5518     + int len = PAGE_SIZE, err;
5519     + mm_segment_t old_fs;
5520     + struct dentry *parent;
5521     +
5522     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5523     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5524     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5525     +
5526     + /* This is freed by the put_link method assuming a successful call. */
5527     + buf = kmalloc(len, GFP_KERNEL);
5528     + if (unlikely(!buf)) {
5529     + err = -ENOMEM;
5530     + goto out;
5531     + }
5532     +
5533     + /* read the symlink, and then we will follow it */
5534     + old_fs = get_fs();
5535     + set_fs(KERNEL_DS);
5536     + err = __unionfs_readlink(dentry, buf, len);
5537     + set_fs(old_fs);
5538     + if (err < 0) {
5539     + kfree(buf);
5540     + buf = NULL;
5541     + goto out;
5542     + }
5543     + buf[err] = 0;
5544     + nd_set_link(nd, buf);
5545     + err = 0;
5546     +
5547     +out:
5548     + if (err >= 0) {
5549     + unionfs_check_nd(nd);
5550     + unionfs_check_dentry(dentry);
5551     + }
5552     +
5553     + unionfs_unlock_dentry(dentry);
5554     + unionfs_unlock_parent(dentry, parent);
5555     + unionfs_read_unlock(dentry->d_sb);
5556     +
5557     + return ERR_PTR(err);
5558     +}
5559     +
5560     +/* this @nd *IS* still used */
5561     +static void unionfs_put_link(struct dentry *dentry, struct nameidata *nd,
5562     + void *cookie)
5563     +{
5564     + struct dentry *parent;
5565     + char *buf;
5566     +
5567     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5568     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5569     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5570     +
5571     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false)))
5572     + printk(KERN_ERR
5573     + "unionfs: put_link failed to revalidate dentry\n");
5574     +
5575     + unionfs_check_dentry(dentry);
5576     +#if 0
5577     + /* XXX: can't run this check because this function can receive a poisoned 'nd' pointer */
5578     + unionfs_check_nd(nd);
5579     +#endif
5580     + buf = nd_get_link(nd);
5581     + if (!IS_ERR(buf))
5582     + kfree(buf);
5583     + unionfs_unlock_dentry(dentry);
5584     + unionfs_unlock_parent(dentry, parent);
5585     + unionfs_read_unlock(dentry->d_sb);
5586     +}
5587     +
5588     +/*
5589     + * This is a variant of fs/namei.c:permission() or inode_permission() which
5590     + * skips over EROFS tests (because we perform copyup on EROFS).
5591     + */
5592     +static int __inode_permission(struct inode *inode, int mask, unsigned int flags)
5593     +{
5594     + int retval;
5595     +
5596     + /* nobody gets write access to an immutable file */
5597     + if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
5598     + return -EACCES;
5599     +
5600     + /* Ordinary permission routines do not understand MAY_APPEND. */
5601     + if (inode->i_op && inode->i_op->permission) {
5602     + retval = inode->i_op->permission(inode, mask, flags);
5603     + if (!retval) {
5604     + /*
5605     + * Exec permission on a regular file is denied if none
5606     + * of the execute bits are set.
5607     + *
5608     + * This check should be done by the ->permission()
5609     + * method.
5610     + */
5611     + if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode) &&
5612     + !(inode->i_mode & S_IXUGO))
5613     + return -EACCES;
5614     + }
5615     + } else {
5616     + retval = generic_permission(inode, mask, flags, NULL);
5617     + }
5618     + if (retval)
5619     + return retval;
5620     +
5621     + return security_inode_permission(inode,
5622     + mask & (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND));
5623     +}
5624     +
5625     +/*
5626     + * Don't grab the superblock read-lock in unionfs_permission, which prevents
5627     + * a deadlock with the branch-management "add branch" code (which grabbed
5628     + * the write lock). It is safe to not grab the read lock here, because even
5629     + * with branch management taking place, there is no chance that
5630     + * unionfs_permission, or anything it calls, will use stale branch
5631     + * information.
5632     + */
5633     +static int unionfs_permission(struct inode *inode, int mask, unsigned int flags)
5634     +{
5635     + struct inode *lower_inode = NULL;
5636     + int err = 0;
5637     + int bindex, bstart, bend;
5638     + int is_file;
5639     + const int write_mask = (mask & MAY_WRITE) && !(mask & MAY_READ);
5640     + struct inode *inode_grabbed;
5641     + struct dentry *dentry;
5642     +
5643     + if (flags & IPERM_FLAG_RCU) {
5644     + err = -ECHILD;
5645     + goto out_nograb;
5646     + }
5647     +
5648     + dentry = d_find_alias(inode);
5649     + if (dentry)
5650     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5651     +
5652     + inode_grabbed = igrab(inode);
5653     + is_file = !S_ISDIR(inode->i_mode);
5654     +
5655     + if (!UNIONFS_I(inode)->lower_inodes) {
5656     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5657     + err = -ESTALE; /* force revalidate */
5658     + goto out;
5659     + }
5660     + bstart = ibstart(inode);
5661     + bend = ibend(inode);
5662     + if (unlikely(bstart < 0 || bend < 0)) {
5663     + /*
5664     + * With branch-management, we can get a stale inode here.
5665     + * If so, we return ESTALE back to link_path_walk, which
5666     + * would discard the dcache entry and re-lookup the
5667     + * dentry+inode. This should be equivalent to issuing
5668     + * __unionfs_d_revalidate_chain on nd.dentry here.
5669     + */
5670     + if (is_file) /* dirs can be unlinked but chdir'ed to */
5671     + err = -ESTALE; /* force revalidate */
5672     + goto out;
5673     + }
5674     +
5675     + for (bindex = bstart; bindex <= bend; bindex++) {
5676     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
5677     + if (!lower_inode)
5678     + continue;
5679     +
5680     + /*
5681     + * Handle the D-F-D (directory-file-directory) case among the
5682     + * lower objects: when checking a directory, skip any lower
5683     + * object that is not a directory.
5684     + */
5685     + if (!is_file && !S_ISDIR(lower_inode->i_mode))
5686     + continue;
5687     +
5688     + /*
5689     + * We check basic permissions, but we ignore any conditions
5690     + * such as readonly file systems or branches marked as
5691     + * readonly, because those conditions should lead to a
5692     + * copyup taking place later on. However, if user never had
5693     + * access to the file, then no copyup could ever take place.
5694     + */
5695     + err = __inode_permission(lower_inode, mask, flags);
5696     + if (err && err != -EACCES && err != -EPERM && bindex > 0) {
5697     + umode_t mode = lower_inode->i_mode;
5698     + if ((is_robranch_super(inode->i_sb, bindex) ||
5699     + __is_rdonly(lower_inode)) &&
5700     + (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
5701     + err = 0;
5702     + if (IS_COPYUP_ERR(err))
5703     + err = 0;
5704     + }
5705     +
5706     + /*
5707     + * NFS HACK: NFSv2/3 return EACCES on readonly-exported,
5708     + * locally readonly-mounted file systems, instead of EROFS
5709     + * like other file systems do. So we have no choice here
5710     + * but to intercept this and ignore it for NFS branches
5711     + * marked readonly. Specifically, we avoid using NFS's own
5712     + * "broken" ->permission method, and rely on
5713     + * generic_permission() to do basic checking for us.
5714     + */
5715     + if (err == -EACCES &&
5716     + is_robranch_super(inode->i_sb, bindex) &&
5717     + lower_inode->i_sb->s_magic == NFS_SUPER_MAGIC)
5718     + err = generic_permission(lower_inode, mask, flags, NULL);
5719     +
5720     + /*
5721     + * The permissions are an intersection of the overall directory
5722     + * permissions, so we fail if one fails.
5723     + */
5724     + if (err)
5725     + goto out;
5726     +
5727     + /* only the leftmost file matters. */
5728     + if (is_file || write_mask) {
5729     + if (is_file && write_mask) {
5730     + err = get_write_access(lower_inode);
5731     + if (!err)
5732     + put_write_access(lower_inode);
5733     + }
5734     + break;
5735     + }
5736     + }
5737     + /* sync times which may have changed (asynchronously) below */
5738     + unionfs_copy_attr_times(inode);
5739     +
5740     +out:
5741     + unionfs_check_inode(inode);
5742     + if (dentry) {
5743     + unionfs_unlock_dentry(dentry);
5744     + dput(dentry);
5745     + }
5746     + iput(inode_grabbed);
5747     +out_nograb:
5748     + return err;
5749     +}
5750     +
5751     +static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
5752     +{
5753     + int err = 0;
5754     + struct dentry *lower_dentry;
5755     + struct dentry *parent;
5756     + struct inode *inode;
5757     + struct inode *lower_inode;
5758     + int bstart, bend, bindex;
5759     + loff_t size;
5760     + struct iattr lower_ia;
5761     +
5762     + /* check if user has permission to change inode */
5763     + err = inode_change_ok(dentry->d_inode, ia);
5764     + if (err)
5765     + goto out_err;
5766     +
5767     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
5768     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
5769     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
5770     +
5771     + if (unlikely(!__unionfs_d_revalidate(dentry, parent, false))) {
5772     + err = -ESTALE;
5773     + goto out;
5774     + }
5775     +
5776     + bstart = dbstart(dentry);
5777     + bend = dbend(dentry);
5778     + inode = dentry->d_inode;
5779     +
5780     + /*
5781     + * mode change is for clearing setuid/setgid. Allow lower filesystem
5782     + * to reinterpret it in its own way.
5783     + */
5784     + if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
5785     + ia->ia_valid &= ~ATTR_MODE;
5786     +
5787     + lower_dentry = unionfs_lower_dentry(dentry);
5788     + if (!lower_dentry) { /* should never happen after above revalidate */
5789     + err = -EINVAL;
5790     + goto out;
5791     + }
5792     +
5793     + /*
5794     + * Get the lower inode directly from lower dentry, in case ibstart
5795     + * is -1 (which happens when the file is open but unlinked).
5796     + */
5797     + lower_inode = lower_dentry->d_inode;
5798     +
5799     + /* check if user has permission to change lower inode */
5800     + err = inode_change_ok(lower_inode, ia);
5801     + if (err)
5802     + goto out;
5803     +
5804     + /* copyup if the file is on a read only branch */
5805     + if (is_robranch_super(dentry->d_sb, bstart)
5806     + || __is_rdonly(lower_inode)) {
5807     + /* check if we have a branch to copy up to */
5808     + if (bstart <= 0) {
5809     + err = -EACCES;
5810     + goto out;
5811     + }
5812     +
5813     + if (ia->ia_valid & ATTR_SIZE)
5814     + size = ia->ia_size;
5815     + else
5816     + size = i_size_read(inode);
5817     + /* copyup to next available branch */
5818     + for (bindex = bstart - 1; bindex >= 0; bindex--) {
5819     + err = copyup_dentry(parent->d_inode,
5820     + dentry, bstart, bindex,
5821     + dentry->d_name.name,
5822     + dentry->d_name.len,
5823     + NULL, size);
5824     + if (!err)
5825     + break;
5826     + }
5827     + if (err)
5828     + goto out;
5829     + /* get updated lower_dentry/inode after copyup */
5830     + lower_dentry = unionfs_lower_dentry(dentry);
5831     + lower_inode = unionfs_lower_inode(inode);
5832     + /*
5833     + * check for whiteouts in writeable branch, and remove them
5834     + * if necessary.
5835     + */
5836     + if (lower_dentry) {
5837     + err = check_unlink_whiteout(dentry, lower_dentry,
5838     + bindex);
5839     + if (err > 0) /* ignore if whiteout found and removed */
5840     + err = 0;
5841     + }
5842     + }
5843     +
5844     + /*
5845     + * If shrinking, first truncate upper level to cancel writing dirty
5846     + * pages beyond the new eof; and also if its maxbytes is more
5847     + * limiting (fail with -EFBIG before making any change to the lower
5848     + * level). There is no need to vmtruncate the upper level
5849     + * afterwards in the other cases: we fsstack_copy_inode_size from
5850     + * the lower level.
5851     + */
5852     + if (ia->ia_valid & ATTR_SIZE) {
5853     + size = i_size_read(inode);
5854     + if (ia->ia_size < size || (ia->ia_size > size &&
5855     + inode->i_sb->s_maxbytes < lower_inode->i_sb->s_maxbytes)) {
5856     + err = vmtruncate(inode, ia->ia_size);
5857     + if (err)
5858     + goto out;
5859     + }
5860     + }
5861     +
5862     + /* notify the (possibly copied-up) lower inode */
5863     + /*
5864     + * Note: we use lower_dentry->d_inode, because lower_inode may be
5865     + * unlinked (no inode->i_sb and i_ino==0). This happens if someone
5866     + * tries to open(), unlink(), then ftruncate() a file.
5867     + */
5868     + /* prepare our own lower struct iattr (with our own lower file) */
5869     + memcpy(&lower_ia, ia, sizeof(lower_ia));
5870     + if (ia->ia_valid & ATTR_FILE) {
5871     + lower_ia.ia_file = unionfs_lower_file(ia->ia_file);
5872     + BUG_ON(!lower_ia.ia_file); // XXX?
5873     + }
5874     +
5875     + mutex_lock(&lower_dentry->d_inode->i_mutex);
5876     + err = notify_change(lower_dentry, &lower_ia);
5877     + mutex_unlock(&lower_dentry->d_inode->i_mutex);
5878     + if (err)
5879     + goto out;
5880     +
5881     + /* get attributes from the first lower inode */
5882     + if (ibstart(inode) >= 0)
5883     + unionfs_copy_attr_all(inode, lower_inode);
5884     + /*
5885     + * unionfs_copy_attr_all will copy the lower times to our inode if
5886     + * the lower ones are newer (useful for cache coherency). However,
5887     + * ->setattr is the only place in which we may have to copy the
5888     + * lower inode times absolutely, to support utimes(2).
5889     + */
5890     + if (ia->ia_valid & ATTR_MTIME_SET)
5891     + inode->i_mtime = lower_inode->i_mtime;
5892     + if (ia->ia_valid & ATTR_CTIME)
5893     + inode->i_ctime = lower_inode->i_ctime;
5894     + if (ia->ia_valid & ATTR_ATIME_SET)
5895     + inode->i_atime = lower_inode->i_atime;
5896     + fsstack_copy_inode_size(inode, lower_inode);
5897     +
5898     +out:
5899     + if (!err)
5900     + unionfs_check_dentry(dentry);
5901     + unionfs_unlock_dentry(dentry);
5902     + unionfs_unlock_parent(dentry, parent);
5903     + unionfs_read_unlock(dentry->d_sb);
5904     +out_err:
5905     + return err;
5906     +}
5907     +
5908     +struct inode_operations unionfs_symlink_iops = {
5909     + .readlink = unionfs_readlink,
5910     + .permission = unionfs_permission,
5911     + .follow_link = unionfs_follow_link,
5912     + .setattr = unionfs_setattr,
5913     + .put_link = unionfs_put_link,
5914     +};
5915     +
5916     +struct inode_operations unionfs_dir_iops = {
5917     + .create = unionfs_create,
5918     + .lookup = unionfs_lookup,
5919     + .link = unionfs_link,
5920     + .unlink = unionfs_unlink,
5921     + .symlink = unionfs_symlink,
5922     + .mkdir = unionfs_mkdir,
5923     + .rmdir = unionfs_rmdir,
5924     + .mknod = unionfs_mknod,
5925     + .rename = unionfs_rename,
5926     + .permission = unionfs_permission,
5927     + .setattr = unionfs_setattr,
5928     +#ifdef CONFIG_UNION_FS_XATTR
5929     + .setxattr = unionfs_setxattr,
5930     + .getxattr = unionfs_getxattr,
5931     + .removexattr = unionfs_removexattr,
5932     + .listxattr = unionfs_listxattr,
5933     +#endif /* CONFIG_UNION_FS_XATTR */
5934     +};
5935     +
5936     +struct inode_operations unionfs_main_iops = {
5937     + .permission = unionfs_permission,
5938     + .setattr = unionfs_setattr,
5939     +#ifdef CONFIG_UNION_FS_XATTR
5940     + .setxattr = unionfs_setxattr,
5941     + .getxattr = unionfs_getxattr,
5942     + .removexattr = unionfs_removexattr,
5943     + .listxattr = unionfs_listxattr,
5944     +#endif /* CONFIG_UNION_FS_XATTR */
5945     +};
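The ATTR_SIZE test in unionfs_setattr() above can be read as a single predicate: truncate the upper inode first when the file shrinks, or when it grows and the upper file system has the smaller size limit. The following is a minimal user-space sketch of that condition only; the function name and the plain integer parameters are illustrative assumptions and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the size check in unionfs_setattr().
 * cur_size       stands in for i_size_read(inode)
 * new_size       stands in for ia->ia_size
 * upper_maxbytes stands in for inode->i_sb->s_maxbytes
 * lower_maxbytes stands in for lower_inode->i_sb->s_maxbytes
 */
static bool must_truncate_upper_first(int64_t cur_size, int64_t new_size,
				      int64_t upper_maxbytes,
				      int64_t lower_maxbytes)
{
	assert(cur_size >= 0 && new_size >= 0);

	/* shrinking: cancel dirty pages beyond the new end of file */
	if (new_size < cur_size)
		return true;

	/* growing: only if the upper level is the more limiting one */
	return new_size > cur_size && upper_maxbytes < lower_maxbytes;
}
```

When the predicate is false, the upper inode size is simply copied from the lower inode afterwards, as fsstack_copy_inode_size() does in the code above.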
5946     diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
5947     new file mode 100644
5948     index 0000000..3cbde56
5949     --- /dev/null
5950     +++ b/fs/unionfs/lookup.c
5951     @@ -0,0 +1,569 @@
5952     +/*
5953     + * Copyright (c) 2003-2011 Erez Zadok
5954     + * Copyright (c) 2003-2006 Charles P. Wright
5955     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
5956     + * Copyright (c) 2005-2006 Junjiro Okajima
5957     + * Copyright (c) 2005 Arun M. Krishnakumar
5958     + * Copyright (c) 2004-2006 David P. Quigley
5959     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
5960     + * Copyright (c) 2003 Puja Gupta
5961     + * Copyright (c) 2003 Harikesavan Krishnan
5962     + * Copyright (c) 2003-2011 Stony Brook University
5963     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
5964     + *
5965     + * This program is free software; you can redistribute it and/or modify
5966     + * it under the terms of the GNU General Public License version 2 as
5967     + * published by the Free Software Foundation.
5968     + */
5969     +
5970     +#include "union.h"
5971     +
5972     +/*
5973     + * Lookup one path component @name relative to a <base,mnt> path pair.
5974     + * Behaves nearly the same as lookup_one_len (i.e., return negative dentry
5975     + * on ENOENT), but uses the @mnt passed, so it can cross bind mounts and
5976     + * other lower mounts properly. If @new_mnt is non-null, will fill in the
5977     + * new mnt there. Caller is responsible for releasing the returned
5978     + * @dentry and @new_mnt (dput/mntput/path_put).
5979     + */
5980     +struct dentry *__lookup_one(struct dentry *base, struct vfsmount *mnt,
5981     + const char *name, struct vfsmount **new_mnt)
5982     +{
5983     + struct dentry *dentry = NULL;
5984     + struct nameidata lower_nd;
5985     + int err;
5986     +
5987     + /* we use flags=0 to get basic lookup */
5988     + err = vfs_path_lookup(base, mnt, name, 0, &lower_nd);
5989     +
5990     + switch (err) {
5991     + case 0: /* no error */
5992     + dentry = lower_nd.path.dentry;
5993     + if (new_mnt)
5994     + *new_mnt = lower_nd.path.mnt; /* rc already inc'ed */
5995     + break;
5996     + case -ENOENT:
5997     + /*
5998     + * We don't consider ENOENT an error, and we want to return
5999     + * a negative dentry (a la lookup_one_len). As we know
6000     + * there was no inode for this name before (-ENOENT), then
6001     + * it's safe to call lookup_one_len (which doesn't take a
6002     + * vfsmount).
6003     + */
6004     + dentry = lookup_lck_len(name, base, strlen(name));
6005     + if (new_mnt)
6006     + *new_mnt = mntget(lower_nd.path.mnt);
6007     + break;
6008     + default: /* all other real errors */
6009     + dentry = ERR_PTR(err);
6010     + break;
6011     + }
6012     +
6013     + return dentry;
6014     +}
6015     +
6016     +/*
6017     + * This is a utility function that fills in a unionfs dentry.
6018     + * Caller must lock this dentry with unionfs_lock_dentry.
6019     + *
6020     + * Returns: 0 (ok), or -ERRNO if an error occurred.
6021     + * XXX: get rid of _partial_lookup and make callers call _lookup_full directly
6022     + */
6023     +int unionfs_partial_lookup(struct dentry *dentry, struct dentry *parent)
6024     +{
6025     + struct dentry *tmp;
6026     + int err = -ENOSYS;
6027     +
6028     + tmp = unionfs_lookup_full(dentry, parent, INTERPOSE_PARTIAL);
6029     +
6030     + if (!tmp) {
6031     + err = 0;
6032     + goto out;
6033     + }
6034     + if (IS_ERR(tmp)) {
6035     + err = PTR_ERR(tmp);
6036     + goto out;
6037     + }
6038     + /* XXX: need to change the interface */
6039     + BUG_ON(tmp != dentry);
6040     +out:
6041     + return err;
6042     +}
6043     +
6044     +/* The dentry cache is just so we have properly sized dentries. */
6045     +static struct kmem_cache *unionfs_dentry_cachep;
6046     +int unionfs_init_dentry_cache(void)
6047     +{
6048     + unionfs_dentry_cachep =
6049     + kmem_cache_create("unionfs_dentry",
6050     + sizeof(struct unionfs_dentry_info),
6051     + 0, SLAB_RECLAIM_ACCOUNT, NULL);
6052     +
6053     + return (unionfs_dentry_cachep ? 0 : -ENOMEM);
6054     +}
6055     +
6056     +void unionfs_destroy_dentry_cache(void)
6057     +{
6058     + if (unionfs_dentry_cachep)
6059     + kmem_cache_destroy(unionfs_dentry_cachep);
6060     +}
6061     +
6062     +void free_dentry_private_data(struct dentry *dentry)
6063     +{
6064     + if (!dentry || !dentry->d_fsdata)
6065     + return;
6066     + kfree(UNIONFS_D(dentry)->lower_paths);
6067     + UNIONFS_D(dentry)->lower_paths = NULL;
6068     + kmem_cache_free(unionfs_dentry_cachep, dentry->d_fsdata);
6069     + dentry->d_fsdata = NULL;
6070     +}
6071     +
6072     +static inline int __realloc_dentry_private_data(struct dentry *dentry)
6073     +{
6074     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6075     + void *p;
6076     + int size;
6077     +
6078     + BUG_ON(!info);
6079     +
6080     + size = sizeof(struct path) * sbmax(dentry->d_sb);
6081     + p = krealloc(info->lower_paths, size, GFP_ATOMIC);
6082     + if (unlikely(!p))
6083     + return -ENOMEM;
6084     +
6085     + info->lower_paths = p;
6086     +
6087     + info->bstart = -1;
6088     + info->bend = -1;
6089     + info->bopaque = -1;
6090     + info->bcount = sbmax(dentry->d_sb);
6091     + atomic_set(&info->generation,
6092     + atomic_read(&UNIONFS_SB(dentry->d_sb)->generation));
6093     +
6094     + memset(info->lower_paths, 0, size);
6095     +
6096     + return 0;
6097     +}
6098     +
6099     +/* UNIONFS_D(dentry)->lock must be locked */
6100     +int realloc_dentry_private_data(struct dentry *dentry)
6101     +{
6102     + if (!__realloc_dentry_private_data(dentry))
6103     + return 0;
6104     +
6105     + /* lower_paths is freed once, by free_dentry_private_data() below */
6106     + free_dentry_private_data(dentry);
6107     + return -ENOMEM;
6108     +}
6109     +
6110     +/* allocate new dentry private data */
6111     +int new_dentry_private_data(struct dentry *dentry, int subclass)
6112     +{
6113     + struct unionfs_dentry_info *info = UNIONFS_D(dentry);
6114     +
6115     + BUG_ON(info);
6116     +
6117     + info = kmem_cache_alloc(unionfs_dentry_cachep, GFP_ATOMIC);
6118     + if (unlikely(!info))
6119     + return -ENOMEM;
6120     +
6121     + mutex_init(&info->lock);
6122     + mutex_lock_nested(&info->lock, subclass);
6123     +
6124     + info->lower_paths = NULL;
6125     +
6126     + dentry->d_fsdata = info;
6127     +
6128     + if (!__realloc_dentry_private_data(dentry))
6129     + return 0;
6130     +
6131     + mutex_unlock(&info->lock);
6132     + free_dentry_private_data(dentry);
6133     + return -ENOMEM;
6134     +}
6135     +
6136     +/*
6137     + * scan through the lower dentry objects, and set bstart to reflect the
6138     + * starting branch
6139     + */
6140     +void update_bstart(struct dentry *dentry)
6141     +{
6142     + int bindex;
6143     + int bstart = dbstart(dentry);
6144     + int bend = dbend(dentry);
6145     + struct dentry *lower_dentry;
6146     +
6147     + for (bindex = bstart; bindex <= bend; bindex++) {
6148     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6149     + if (!lower_dentry)
6150     + continue;
6151     + if (lower_dentry->d_inode) {
6152     + dbstart(dentry) = bindex;
6153     + break;
6154     + }
6155     + dput(lower_dentry);
6156     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
6157     + }
6158     +}
6159     +
6160     +
6161     +/*
6162     + * Initialize a nameidata structure (the intent part) we can pass to a lower
6163     + * file system. Returns 0 on success or -error (only -ENOMEM possible).
6164     + * Inside that nd structure, this function may also return an allocated
6165     + * struct file (for open intents). The caller, when done with this nd, must
6166     + * kfree the intent file (using release_lower_nd).
6167     + *
6168     + * XXX: this code, and the callers of this code, should be redone using
6169     + * vfs_path_lookup() when (1) the nameidata structure is refactored into a
6170     + * separate intent-structure, and (2) open_namei() is broken into a VFS-only
6171     + * function and a method that other file systems can call.
6172     + */
6173     +int init_lower_nd(struct nameidata *nd, unsigned int flags)
6174     +{
6175     + int err = 0;
6176     +#ifdef ALLOC_LOWER_ND_FILE
6177     + /*
6178     + * XXX: one day we may need to have the lower return an open file
6179     + * for us. It is not needed in 2.6.23-rc1 for nfs2/nfs3, but may
6180     + * very well be needed for nfs4.
6181     + */
6182     + struct file *file;
6183     +#endif /* ALLOC_LOWER_ND_FILE */
6184     +
6185     + memset(nd, 0, sizeof(struct nameidata));
6186     + if (!flags)
6187     + return err;
6188     +
6189     + switch (flags) {
6190     + case LOOKUP_CREATE:
6191     + nd->intent.open.flags |= O_CREAT;
6192     + /* fall through: shared code for create/open cases */
6193     + case LOOKUP_OPEN:
6194     + nd->flags = flags;
6195     + nd->intent.open.flags |= (FMODE_READ | FMODE_WRITE);
6196     +#ifdef ALLOC_LOWER_ND_FILE
6197     + file = kzalloc(sizeof(struct file), GFP_KERNEL);
6198     + if (unlikely(!file)) {
6199     + err = -ENOMEM;
6200     + break; /* exit switch statement and thus return */
6201     + }
6202     + nd->intent.open.file = file;
6203     +#endif /* ALLOC_LOWER_ND_FILE */
6204     + break;
6205     + default:
6206     + /*
6207     + * We should never get here, for now.
6208     + * We can add new cases here later on.
6209     + */
6210     + pr_debug("unionfs: unknown nameidata flag 0x%x\n", flags);
6211     + BUG();
6212     + break;
6213     + }
6214     +
6215     + return err;
6216     +}
6217     +
6218     +void release_lower_nd(struct nameidata *nd, int err)
6219     +{
6220     + if (!nd->intent.open.file)
6221     + return;
6222     + else if (!err)
6223     + release_open_intent(nd);
6224     +#ifdef ALLOC_LOWER_ND_FILE
6225     + kfree(nd->intent.open.file);
6226     +#endif /* ALLOC_LOWER_ND_FILE */
6227     +}
6228     +
6229     +/*
6230     + * Main (and complex) driver function for Unionfs's lookup
6231     + *
6232     + * Returns: NULL (ok), ERR_PTR if an error occurred, or a non-null non-error
6233     + * PTR if d_splice returned a different dentry.
6234     + *
6235     + * If lookupmode is INTERPOSE_PARTIAL/REVAL/REVAL_NEG, the passed dentry's
6236     + * inode info must be locked. If lookupmode is INTERPOSE_LOOKUP (i.e., a
6237     + * newly looked-up dentry), then unionfs_lookup_full will return a locked
6238     + * dentry's info, which the caller must unlock.
6239     + */
6240     +struct dentry *unionfs_lookup_full(struct dentry *dentry,
6241     + struct dentry *parent, int lookupmode)
6242     +{
6243     + int err = 0;
6244     + struct dentry *lower_dentry = NULL;
6245     + struct vfsmount *lower_mnt;
6246     + struct vfsmount *lower_dir_mnt;
6247     + struct dentry *wh_lower_dentry = NULL;
6248     + struct dentry *lower_dir_dentry = NULL;
6249     + struct dentry *d_interposed = NULL;
6250     + int bindex, bstart, bend, bopaque;
6251     + int opaque, num_positive = 0;
6252     + const char *name;
6253     + int namelen;
6254     + int pos_start, pos_end;
6255     +
6256     + /*
6257     + * We should already have a lock on this dentry in the case of a
6258     + * partial lookup, or a revalidation. Otherwise it is returned from
6259     + * new_dentry_private_data already locked.
6260     + */
6261     + verify_locked(dentry);
6262     + verify_locked(parent);
6263     +
6264     + /* must initialize dentry operations */
6265     + dentry->d_op = &unionfs_dops;
6266     +
6267     + /* We never partial lookup the root directory. */
6268     + if (IS_ROOT(dentry))
6269     + goto out;
6270     +
6271     + name = dentry->d_name.name;
6272     + namelen = dentry->d_name.len;
6273     +
6274     + /* No dentries should get created for possible whiteout names. */
6275     + if (!is_validname(name)) {
6276     + err = -EPERM;
6277     + goto out_free;
6278     + }
6279     +
6280     + /* Now start the actual lookup procedure. */
6281     + bstart = dbstart(parent);
6282     + bend = dbend(parent);
6283     + bopaque = dbopaque(parent);
6284     + BUG_ON(bstart < 0);
6285     +
6286     + /* adjust bend to bopaque if needed */
6287     + if ((bopaque >= 0) && (bopaque < bend))
6288     + bend = bopaque;
6289     +
6290     + /* lookup all possible dentries */
6291     + for (bindex = bstart; bindex <= bend; bindex++) {
6292     +
6293     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6294     + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
6295     +
6296     + /* skip if we already have a positive lower dentry */
6297     + if (lower_dentry) {
6298     + if (dbstart(dentry) < 0)
6299     + dbstart(dentry) = bindex;
6300     + if (bindex > dbend(dentry))
6301     + dbend(dentry) = bindex;
6302     + if (lower_dentry->d_inode)
6303     + num_positive++;
6304     + continue;
6305     + }
6306     +
6307     + lower_dir_dentry =
6308     + unionfs_lower_dentry_idx(parent, bindex);
6309     + /* if the lower dentry's parent does not exist, skip this */
6310     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6311     + continue;
6312     +
6313     + /* also skip it if the parent isn't a directory. */
6314     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6315     + continue; /* XXX: should be BUG_ON */
6316     +
6317     + /* check for whiteouts: stop lookup if found */
6318     + wh_lower_dentry = lookup_whiteout(name, lower_dir_dentry);
6319     + if (IS_ERR(wh_lower_dentry)) {
6320     + err = PTR_ERR(wh_lower_dentry);
6321     + goto out_free;
6322     + }
6323     + if (wh_lower_dentry->d_inode) {
6324     + dbend(dentry) = dbopaque(dentry) = bindex;
6325     + if (dbstart(dentry) < 0)
6326     + dbstart(dentry) = bindex;
6327     + dput(wh_lower_dentry);
6328     + break;
6329     + }
6330     + dput(wh_lower_dentry);
6331     +
6332     + /* Now do regular lookup; lookup @name */
6333     + lower_dir_mnt = unionfs_lower_mnt_idx(parent, bindex);
6334     + lower_mnt = NULL; /* XXX: needed? */
6335     +
6336     + lower_dentry = __lookup_one(lower_dir_dentry, lower_dir_mnt,
6337     + name, &lower_mnt);
6338     +
6339     + if (IS_ERR(lower_dentry)) {
6340     + err = PTR_ERR(lower_dentry);
6341     + goto out_free;
6342     + }
6343     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6344     + if (!lower_mnt)
6345     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root,
6346     + bindex);
6347     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6348     +
6349     + /* adjust dbstart/end */
6350     + if (dbstart(dentry) < 0)
6351     + dbstart(dentry) = bindex;
6352     + if (bindex > dbend(dentry))
6353     + dbend(dentry) = bindex;
6354     + /*
6355     + * We always store the lower dentries above, and update
6356     + * dbstart/dbend, even if the whole unionfs dentry is
6357     + * negative (i.e., no lower inodes).
6358     + */
6359     + if (!lower_dentry->d_inode)
6360     + continue;
6361     + num_positive++;
6362     +
6363     + /*
6364     + * check if we just found an opaque directory, if so, stop
6365     + * lookups here.
6366     + */
6367     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
6368     + continue;
6369     + opaque = is_opaque_dir(dentry, bindex);
6370     + if (opaque < 0) {
6371     + err = opaque;
6372     + goto out_free;
6373     + } else if (opaque) {
6374     + dbend(dentry) = dbopaque(dentry) = bindex;
6375     + break;
6376     + }
6377     + dbend(dentry) = bindex;
6378     +
6379     + /* update parent directory's atime with the bindex */
6380     + fsstack_copy_attr_atime(parent->d_inode,
6381     + lower_dir_dentry->d_inode);
6382     + }
6383     +
6384     + /* sanity checks, then decide whether to process a negative dentry */
6385     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6386     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6387     +
6388     + if (num_positive > 0)
6389     + goto out_positive;
6390     +
6391     + /*** handle NEGATIVE dentries ***/
6392     +
6393     + /*
6394     + * If negative, keep only first lower negative dentry, to save on
6395     + * memory.
6396     + */
6397     + if (dbstart(dentry) < dbend(dentry)) {
6398     + path_put_lowers(dentry, dbstart(dentry) + 1,
6399     + dbend(dentry), false);
6400     + dbend(dentry) = dbstart(dentry);
6401     + }
6402     + if (lookupmode == INTERPOSE_PARTIAL)
6403     + goto out;
6404     + if (lookupmode == INTERPOSE_LOOKUP) {
6405     + /*
6406     + * If all we found was a whiteout in the first available
6407     + * branch, then create a negative dentry for a possibly new
6408     + * file to be created.
6409     + */
6410     + if (dbopaque(dentry) < 0)
6411     + goto out;
6412     + /* XXX: need to get mnt here */
6413     + bindex = dbstart(dentry);
6414     + if (unionfs_lower_dentry_idx(dentry, bindex))
6415     + goto out;
6416     + lower_dir_dentry =
6417     + unionfs_lower_dentry_idx(parent, bindex);
6418     + if (!lower_dir_dentry || !lower_dir_dentry->d_inode)
6419     + goto out;
6420     + if (!S_ISDIR(lower_dir_dentry->d_inode->i_mode))
6421     + goto out; /* XXX: should be BUG_ON */
6422     + /* XXX: do we need to cross bind mounts here? */
6423     + lower_dentry = lookup_lck_len(name, lower_dir_dentry, namelen);
6424     + if (IS_ERR(lower_dentry)) {
6425     + err = PTR_ERR(lower_dentry);
6426     + goto out;
6427     + }
6428     + /* XXX: need to mntget/mntput as needed too! */
6429     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
6430     + /* XXX: wrong mnt for crossing bind mounts! */
6431     + lower_mnt = unionfs_mntget(dentry->d_sb->s_root, bindex);
6432     + unionfs_set_lower_mnt_idx(dentry, bindex, lower_mnt);
6433     +
6434     + goto out;
6435     + }
6436     +
6437     + /* if we're revalidating a positive dentry, don't make it negative */
6438     + if (lookupmode != INTERPOSE_REVAL)
6439     + d_add(dentry, NULL);
6440     +
6441     + goto out;
6442     +
6443     +out_positive:
6444     + /*** handle POSITIVE dentries ***/
6445     +
6446     + /*
6447     + * This unionfs dentry is positive (at least one lower inode
6448     + * exists), so scan entire dentry from beginning to end, and remove
6449     + * any negative lower dentries, if any. Then, update dbstart/dbend
6450     + * to reflect the start/end of positive dentries.
6451     + */
6452     + pos_start = pos_end = -1;
6453     + for (bindex = bstart; bindex <= bend; bindex++) {
6454     + lower_dentry = unionfs_lower_dentry_idx(dentry,
6455     + bindex);
6456     + if (lower_dentry && lower_dentry->d_inode) {
6457     + if (pos_start < 0)
6458     + pos_start = bindex;
6459     + if (bindex > pos_end)
6460     + pos_end = bindex;
6461     + continue;
6462     + }
6463     + path_put_lowers(dentry, bindex, bindex, false);
6464     + }
6465     + if (pos_start >= 0)
6466     + dbstart(dentry) = pos_start;
6467     + if (pos_end >= 0)
6468     + dbend(dentry) = pos_end;
6469     +
6470     + /* Partial lookups need to re-interpose, or throw away older negs. */
6471     + if (lookupmode == INTERPOSE_PARTIAL) {
6472     + if (dentry->d_inode) {
6473     + unionfs_reinterpose(dentry);
6474     + goto out;
6475     + }
6476     +
6477     + /*
6478     + * This dentry was negative (no d_inode) but now has positive
6479     + * lower dentries, so treat it as a negative revalidation.
6480     + */
6481     + lookupmode = INTERPOSE_REVAL_NEG;
6482     + update_bstart(dentry);
6483     + }
6484     +
6485     + /*
6486     + * Interpose can return a dentry if d_splice returned a different
6487     + * dentry.
6488     + */
6489     + d_interposed = unionfs_interpose(dentry, dentry->d_sb, lookupmode);
6490     + if (IS_ERR(d_interposed))
6491     + err = PTR_ERR(d_interposed);
6492     + else if (d_interposed)
6493     + dentry = d_interposed;
6494     +
6495     + if (!err)
6496     + goto out;
6497     + d_drop(dentry);
6498     +
6499     +out_free:
6500     + /* should dput/mntput all the underlying dentries on error condition */
6501     + if (dbstart(dentry) >= 0)
6502     + path_put_lowers_all(dentry, false);
6503     + /* free lower_paths unconditionally */
6504     + kfree(UNIONFS_D(dentry)->lower_paths);
6505     + UNIONFS_D(dentry)->lower_paths = NULL;
6506     +
6507     +out:
6508     + if (dentry && UNIONFS_D(dentry)) {
6509     + BUG_ON(dbstart(dentry) < 0 && dbend(dentry) >= 0);
6510     + BUG_ON(dbstart(dentry) >= 0 && dbend(dentry) < 0);
6511     + }
6512     + if (d_interposed && UNIONFS_D(d_interposed)) {
6513     + BUG_ON(dbstart(d_interposed) < 0 && dbend(d_interposed) >= 0);
6514     + BUG_ON(dbstart(d_interposed) >= 0 && dbend(d_interposed) < 0);
6515     + }
6516     +
6517     + if (!err && d_interposed)
6518     + return d_interposed;
6519     + return ERR_PTR(err);
6520     +}
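The branch scan in unionfs_lookup_full() above can be summarized with a small user-space model: walk the branches from highest priority down, stop at the first whiteout or opaque directory, and record the start and end branch indices along the way. This is only a sketch under simplifying assumptions. The struct names, field names, and model_lookup() are hypothetical. The model also ignores branches whose parent directory is missing, the parent's own opaque limit, and the later trimming of negative or positive lower dentries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for what a lower lookup finds in one branch. */
struct branch_state {
	bool has_whiteout;	/* a whiteout for the name exists here */
	bool has_entry;		/* a positive lower dentry exists here */
	bool is_opaque_dir;	/* the entry is a directory marked opaque */
};

/* Hypothetical summary mirroring dbstart/dbend/dbopaque/num_positive. */
struct lookup_result {
	int bstart;		/* first branch recorded, -1 if none */
	int bend;		/* last branch recorded, -1 if none */
	int bopaque;		/* branch where the scan was cut off, -1 if none */
	int num_positive;	/* number of positive lower entries seen */
};

static struct lookup_result model_lookup(const struct branch_state *b,
					 int nbranches)
{
	struct lookup_result r = { -1, -1, -1, 0 };
	int i;

	assert(b && nbranches >= 0);

	for (i = 0; i < nbranches; i++) {
		/* a whiteout hides everything in lower-priority branches */
		if (b[i].has_whiteout) {
			r.bend = r.bopaque = i;
			if (r.bstart < 0)
				r.bstart = i;
			break;
		}

		/* every lower lookup is recorded, even a negative one */
		if (r.bstart < 0)
			r.bstart = i;
		if (i > r.bend)
			r.bend = i;

		if (!b[i].has_entry)
			continue;
		r.num_positive++;

		/* an opaque directory also ends the scan */
		if (b[i].is_opaque_dir) {
			r.bend = r.bopaque = i;
			break;
		}
	}
	return r;
}
```

In the real function, a result with num_positive == 0 takes the negative-dentry path, and anything else jumps to out_positive.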
6521     diff --git a/fs/unionfs/main.c b/fs/unionfs/main.c
6522     new file mode 100644
6523     index 0000000..fa52f61
6524     --- /dev/null
6525     +++ b/fs/unionfs/main.c
6526     @@ -0,0 +1,763 @@
6527     +/*
6528     + * Copyright (c) 2003-2011 Erez Zadok
6529     + * Copyright (c) 2003-2006 Charles P. Wright
6530     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
6531     + * Copyright (c) 2005-2006 Junjiro Okajima
6532     + * Copyright (c) 2005 Arun M. Krishnakumar
6533     + * Copyright (c) 2004-2006 David P. Quigley
6534     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
6535     + * Copyright (c) 2003 Puja Gupta
6536     + * Copyright (c) 2003 Harikesavan Krishnan
6537     + * Copyright (c) 2003-2011 Stony Brook University
6538     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
6539     + *
6540     + * This program is free software; you can redistribute it and/or modify
6541     + * it under the terms of the GNU General Public License version 2 as
6542     + * published by the Free Software Foundation.
6543     + */
6544     +
6545     +#include "union.h"
6546     +#include <linux/module.h>
6547     +#include <linux/moduleparam.h>
6548     +
6549     +static void unionfs_fill_inode(struct dentry *dentry,
6550     + struct inode *inode)
6551     +{
6552     + struct inode *lower_inode;
6553     + struct dentry *lower_dentry;
6554     + int bindex, bstart, bend;
6555     +
6556     + bstart = dbstart(dentry);
6557     + bend = dbend(dentry);
6558     +
6559     + for (bindex = bstart; bindex <= bend; bindex++) {
6560     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6561     + if (!lower_dentry) {
6562     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
6563     + continue;
6564     + }
6565     +
6566     + /* Initialize the lower inode to the new lower inode. */
6567     + if (!lower_dentry->d_inode)
6568     + continue;
6569     +
6570     + unionfs_set_lower_inode_idx(inode, bindex,
6571     + igrab(lower_dentry->d_inode));
6572     + }
6573     +
6574     + ibstart(inode) = dbstart(dentry);
6575     + ibend(inode) = dbend(dentry);
6576     +
6577     + /* Use attributes from the first branch. */
6578     + lower_inode = unionfs_lower_inode(inode);
6579     +
6580     + /* Use different set of inode ops for symlinks & directories */
6581     + if (S_ISLNK(lower_inode->i_mode))
6582     + inode->i_op = &unionfs_symlink_iops;
6583     + else if (S_ISDIR(lower_inode->i_mode))
6584     + inode->i_op = &unionfs_dir_iops;
6585     +
6586     + /* Use different set of file ops for directories */
6587     + if (S_ISDIR(lower_inode->i_mode))
6588     + inode->i_fop = &unionfs_dir_fops;
6589     +
6590     + /* properly initialize special inodes */
6591     + if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
6592     + S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
6593     + init_special_inode(inode, lower_inode->i_mode,
6594     + lower_inode->i_rdev);
6595     +
6596     + /* all well, copy inode attributes */
6597     + unionfs_copy_attr_all(inode, lower_inode);
6598     + fsstack_copy_inode_size(inode, lower_inode);
6599     +}
6600     +
6601     +/*
6602     + * Connect a unionfs dentry/inode with several lower ones. This is
6603     + * the classic stackable file system "vnode interposition" action.
6604     + *
6605     + * @sb: unionfs's super_block
6606     + */
6607     +struct dentry *unionfs_interpose(struct dentry *dentry, struct super_block *sb,
6608     + int flag)
6609     +{
6610     + int err = 0;
6611     + struct inode *inode;
6612     + int need_fill_inode = 1;
6613     + struct dentry *spliced = NULL;
6614     +
6615     + verify_locked(dentry);
6616     +
6617     + /*
6618     + * We allocate our new inode below by calling unionfs_iget,
6619     + * which will initialize some of the new inode's fields
6620     + */
6621     +
6622     + /*
6623     + * On revalidate we've already got our own inode and just need
6624     + * to fix it up.
6625     + */
6626     + if (flag == INTERPOSE_REVAL) {
6627     + inode = dentry->d_inode;
6628     + UNIONFS_I(inode)->bstart = -1;
6629     + UNIONFS_I(inode)->bend = -1;
6630     + atomic_set(&UNIONFS_I(inode)->generation,
6631     + atomic_read(&UNIONFS_SB(sb)->generation));
6632     +
6633     + UNIONFS_I(inode)->lower_inodes =
6634     + kcalloc(sbmax(sb), sizeof(struct inode *), GFP_KERNEL);
6635     + if (unlikely(!UNIONFS_I(inode)->lower_inodes)) {
6636     + err = -ENOMEM;
6637     + goto out;
6638     + }
6639     + } else {
6640     + /* get unique inode number for unionfs */
6641     + inode = unionfs_iget(sb, iunique(sb, UNIONFS_ROOT_INO));
6642     + if (IS_ERR(inode)) {
6643     + err = PTR_ERR(inode);
6644     + goto out;
6645     + }
6646     + if (atomic_read(&inode->i_count) > 1)
6647     + goto skip;
6648     + }
6649     +
6650     + need_fill_inode = 0;
6651     + unionfs_fill_inode(dentry, inode);
6652     +
6653     +skip:
6654     + /* only (our) lookup wants to do a d_add */
6655     + switch (flag) {
6656     + case INTERPOSE_DEFAULT:
6657     + /* for operations which create new inodes */
6658     + d_add(dentry, inode);
6659     + break;
6660     + case INTERPOSE_REVAL_NEG:
6661     + d_instantiate(dentry, inode);
6662     + break;
6663     + case INTERPOSE_LOOKUP:
6664     + spliced = d_splice_alias(inode, dentry);
6665     + if (spliced && spliced != dentry) {
6666     + /*
6667     + * d_splice can return a dentry if it was
6668     + * disconnected and had to be moved. We must ensure
6669     + * that the private data of the new dentry is
6670     + * correct and that the inode info was filled
6671     + * properly. Finally we must return this new
6672     + * dentry.
6673     + */
6674     + spliced->d_op = &unionfs_dops;
6675     + spliced->d_fsdata = dentry->d_fsdata;
6676     + dentry->d_fsdata = NULL;
6677     + dentry = spliced;
6678     + if (need_fill_inode) {
6679     + need_fill_inode = 0;
6680     + unionfs_fill_inode(dentry, inode);
6681     + }
6682     + goto out_spliced;
6683     + } else if (!spliced) {
6684     + if (need_fill_inode) {
6685     + need_fill_inode = 0;
6686     + unionfs_fill_inode(dentry, inode);
6687     + goto out_spliced;
6688     + }
6689     + }
6690     + break;
6691     + case INTERPOSE_REVAL:
6692     + /* Do nothing. */
6693     + break;
6694     + default:
6695     + printk(KERN_CRIT "unionfs: invalid interpose flag passed!\n");
6696     + BUG();
6697     + }
6698     + goto out;
6699     +
6700     +out_spliced:
6701     + if (!err)
6702     + return spliced;
6703     +out:
6704     + return ERR_PTR(err);
6705     +}
6706     +
6707     +/* like interpose above, but for an already existing dentry */
6708     +void unionfs_reinterpose(struct dentry *dentry)
6709     +{
6710     + struct dentry *lower_dentry;
6711     + struct inode *inode;
6712     + int bindex, bstart, bend;
6713     +
6714     + verify_locked(dentry);
6715     +
6716     + /* This is a pre-allocated inode */
6717     + inode = dentry->d_inode;
6718     +
6719     + bstart = dbstart(dentry);
6720     + bend = dbend(dentry);
6721     + for (bindex = bstart; bindex <= bend; bindex++) {
6722     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
6723     + if (!lower_dentry)
6724     + continue;
6725     +
6726     + if (!lower_dentry->d_inode)
6727     + continue;
6728     + if (unionfs_lower_inode_idx(inode, bindex))
6729     + continue;
6730     + unionfs_set_lower_inode_idx(inode, bindex,
6731     + igrab(lower_dentry->d_inode));
6732     + }
6733     + ibstart(inode) = dbstart(dentry);
6734     + ibend(inode) = dbend(dentry);
6735     +}
6736     +
6737     +/*
6738     + * make sure the branch we just looked up (path) makes sense:
6739     + *
6740     + * 1) we're not trying to stack unionfs on top of unionfs
6741     + * 2) it exists
6742     + * 3) is a directory
6743     + */
6744     +int check_branch(const struct path *path)
6745     +{
6746     + /* XXX: remove in ODF code -- stacking unions allowed there */
6747     + if (!strcmp(path->dentry->d_sb->s_type->name, UNIONFS_NAME))
6748     + return -EINVAL;
6749     + if (!path->dentry->d_inode)
6750     + return -ENOENT;
6751     + if (!S_ISDIR(path->dentry->d_inode->i_mode))
6752     + return -ENOTDIR;
6753     + return 0;
6754     +}
6755     +
6756     +/* checks if two lower_dentries have overlapping branches */
6757     +static int is_branch_overlap(struct dentry *dent1, struct dentry *dent2)
6758     +{
6759     + struct dentry *dent = NULL;
6760     +
6761     + dent = dent1;
6762     + while ((dent != dent2) && (dent->d_parent != dent))
6763     + dent = dent->d_parent;
6764     +
6765     + if (dent == dent2)
6766     + return 1;
6767     +
6768     + dent = dent2;
6769     + while ((dent != dent1) && (dent->d_parent != dent))
6770     + dent = dent->d_parent;
6771     +
6772     + return (dent == dent1);
6773     +}
6774     +
6775     +/*
6776     + * Parse "ro" or "rw" options, but default to "rw" if no mode option was
6777     + * specified. Fill the mode bits in @perms. If we encounter an unknown
6778     + * string, return -EINVAL. Otherwise return 0.
6779     + */
6780     +int parse_branch_mode(const char *name, int *perms)
6781     +{
6782     + if (!name || !strcmp(name, "rw")) {
6783     + *perms = MAY_READ | MAY_WRITE;
6784     + return 0;
6785     + }
6786     + if (!strcmp(name, "ro")) {
6787     + *perms = MAY_READ;
6788     + return 0;
6789     + }
6790     + return -EINVAL;
6791     +}
6792     +
6793     +/*
6794     + * parse the dirs= mount argument
6795     + *
6796     + * We don't need to lock the superblock private data's rwsem, as we get
6797     + * called only by unionfs_read_super - it is still a long time before anyone
6798     + * can even get a reference to us.
6799     + */
6800     +static int parse_dirs_option(struct super_block *sb, struct unionfs_dentry_info
6801     + *lower_root_info, char *options)
6802     +{
6803     + struct path path;
6804     + char *name;
6805     + int err = 0;
6806     + int branches = 1;
6807     + int bindex = 0;
6808     + int i = 0;
6809     + int j = 0;
6810     + struct dentry *dent1;
6811     + struct dentry *dent2;
6812     +
6813     + if (options[0] == '\0') {
6814     + printk(KERN_ERR "unionfs: no branches specified\n");
6815     + err = -EINVAL;
6816     + goto out_return;
6817     + }
6818     +
6819     + /*
6820     + * Each colon is a separator. This is only a rough guess, since
6821     + * strsep will handle empty fields for us.
6822     + */
6823     + for (i = 0; options[i]; i++)
6824     + if (options[i] == ':')
6825     + branches++;
6826     +
6827     + /* allocate space for underlying pointers to lower dentry */
6828     + UNIONFS_SB(sb)->data =
6829     + kcalloc(branches, sizeof(struct unionfs_data), GFP_KERNEL);
6830     + if (unlikely(!UNIONFS_SB(sb)->data)) {
6831     + err = -ENOMEM;
6832     + goto out_return;
6833     + }
6834     +
6835     + lower_root_info->lower_paths =
6836     + kcalloc(branches, sizeof(struct path), GFP_KERNEL);
6837     + if (unlikely(!lower_root_info->lower_paths)) {
6838     + err = -ENOMEM;
6839     + /* free the underlying pointer array */
6840     + kfree(UNIONFS_SB(sb)->data);
6841     + UNIONFS_SB(sb)->data = NULL;
6842     + goto out_return;
6843     + }
6844     +
6845     + /* now parsing a string such as "b1:b2=rw:b3=ro:b4" */
6846     + branches = 0;
6847     + while ((name = strsep(&options, ":")) != NULL) {
6848     + int perms;
6849     + char *mode = strchr(name, '=');
6850     +
6851     + if (!name)
6852     + continue;
6853     + if (!*name) { /* bad use of ':' (extra colons) */
6854     + err = -EINVAL;
6855     + goto out;
6856     + }
6857     +
6858     + branches++;
6859     +
6860     + /* strip off '=' if any */
6861     + if (mode)
6862     + *mode++ = '\0';
6863     +
6864     + err = parse_branch_mode(mode, &perms);
6865     + if (err) {
6866     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
6867     + "branch %d\n", mode, bindex);
6868     + goto out;
6869     + }
6870     + /* ensure that leftmost branch is writeable */
6871     + if (!bindex && !(perms & MAY_WRITE)) {
6872     + printk(KERN_ERR "unionfs: leftmost branch cannot be "
6873     + "read-only (use \"-o ro\" to create a "
6874     + "read-only union)\n");
6875     + err = -EINVAL;
6876     + goto out;
6877     + }
6878     +
6879     + err = kern_path(name, LOOKUP_FOLLOW, &path);
6880     + if (err) {
6881     + printk(KERN_ERR "unionfs: error accessing "
6882     + "lower directory '%s' (error %d)\n",
6883     + name, err);
6884     + goto out;
6885     + }
6886     +
6887     + err = check_branch(&path);
6888     + if (err) {
6889     + printk(KERN_ERR "unionfs: lower directory "
6890     + "'%s' is not a valid branch\n", name);
6891     + path_put(&path);
6892     + goto out;
6893     + }
6894     +
6895     + lower_root_info->lower_paths[bindex].dentry = path.dentry;
6896     + lower_root_info->lower_paths[bindex].mnt = path.mnt;
6897     +
6898     + set_branchperms(sb, bindex, perms);
6899     + set_branch_count(sb, bindex, 0);
6900     + new_branch_id(sb, bindex);
6901     +
6902     + if (lower_root_info->bstart < 0)
6903     + lower_root_info->bstart = bindex;
6904     + lower_root_info->bend = bindex;
6905     + bindex++;
6906     + }
6907     +
6908     + if (branches == 0) {
6909     + printk(KERN_ERR "unionfs: no branches specified\n");
6910     + err = -EINVAL;
6911     + goto out;
6912     + }
6913     +
6914     + BUG_ON(branches != (lower_root_info->bend + 1));
6915     +
6916     + /*
6917     + * Ensure that no overlaps exist in the branches.
6918     + *
6919     + * This test is required because the Linux kernel has no support
6920     + * currently for ensuring coherency between stackable layers and
6921     + * branches. If we were to allow overlapping branches, it would be
6922     + * possible, for example, to delete a file via one branch, which
6923     + * would not be reflected in another branch. Such incoherency could
6924     + * lead to inconsistencies and even kernel oopses. Rather than
6925     + * implement hacks to work around some of these cache-coherency
6926     + * problems, we prevent branch overlapping, for now. A complete
6927     + * solution will involve proper kernel/VFS support for cache
6928     + * coherency, at which time we could safely remove this
6929     + * branch-overlapping test.
6930     + */
6931     + for (i = 0; i < branches; i++) {
6932     + dent1 = lower_root_info->lower_paths[i].dentry;
6933     + for (j = i + 1; j < branches; j++) {
6934     + dent2 = lower_root_info->lower_paths[j].dentry;
6935     + if (is_branch_overlap(dent1, dent2)) {
6936     + printk(KERN_ERR "unionfs: branches %d and "
6937     + "%d overlap\n", i, j);
6938     + err = -EINVAL;
6939     + goto out;
6940     + }
6941     + }
6942     + }
6943     +
6944     +out:
6945     + if (err) {
6946     + for (i = 0; i < branches; i++)
6947     + path_put(&lower_root_info->lower_paths[i]);
6948     +
6949     + kfree(lower_root_info->lower_paths);
6950     + kfree(UNIONFS_SB(sb)->data);
6951     +
6952     + /*
6953     + * MUST clear the pointers to prevent potential double free if
6954     + * the caller dies later on
6955     + */
6956     + lower_root_info->lower_paths = NULL;
6957     + UNIONFS_SB(sb)->data = NULL;
6958     + }
6959     +out_return:
6960     + return err;
6961     +}
6962     +
6963     +/*
6964     + * Parse mount options. See the manual page for usage instructions.
6965     + *
6966     + * Returns a unionfs_dentry_info holding the lower paths of all branches,
6967     + * or an ERR_PTR() on failure. The union is mounted on top of those paths.
6968     + */
6969     +static struct unionfs_dentry_info *unionfs_parse_options(
6970     + struct super_block *sb,
6971     + char *options)
6972     +{
6973     + struct unionfs_dentry_info *lower_root_info;
6974     + char *optname;
6975     + int err = 0;
6976     + int bindex;
6977     + int dirsfound = 0;
6978     +
6979     + /* allocate private data area */
6980     + err = -ENOMEM;
6981     + lower_root_info =
6982     + kzalloc(sizeof(struct unionfs_dentry_info), GFP_KERNEL);
6983     + if (unlikely(!lower_root_info))
6984     + return ERR_PTR(err); /* nothing to free; out_error would dereference NULL */
6985     + lower_root_info->bstart = -1;
6986     + lower_root_info->bend = -1;
6987     + lower_root_info->bopaque = -1;
6988     +
6989     + while ((optname = strsep(&options, ",")) != NULL) {
6990     + char *optarg;
6991     +
6992     + if (!optname || !*optname)
6993     + continue;
6994     +
6995     + optarg = strchr(optname, '=');
6996     + if (optarg)
6997     + *optarg++ = '\0';
6998     +
6999     + /*
7000     + * All of our options take an argument now. Insert ones that
7001     + * don't, above this check.
7002     + */
7003     + if (!optarg) {
7004     + printk(KERN_ERR "unionfs: %s requires an argument\n",
7005     + optname);
7006     + err = -EINVAL;
7007     + goto out_error;
7008     + }
7009     +
7010     + if (!strcmp("dirs", optname)) {
7011     + if (++dirsfound > 1) {
7012     + printk(KERN_ERR
7013     + "unionfs: multiple dirs specified\n");
7014     + err = -EINVAL;
7015     + goto out_error;
7016     + }
7017     + err = parse_dirs_option(sb, lower_root_info, optarg);
7018     + if (err)
7019     + goto out_error;
7020     + continue;
7021     + }
7022     +
7023     + err = -EINVAL;
7024     + printk(KERN_ERR
7025     + "unionfs: unrecognized option '%s'\n", optname);
7026     + goto out_error;
7027     + }
7028     + if (dirsfound != 1) {
7029     + printk(KERN_ERR "unionfs: dirs option required\n");
7030     + err = -EINVAL;
7031     + goto out_error;
7032     + }
7033     + goto out;
7034     +
7035     +out_error:
7036     + if (lower_root_info && lower_root_info->lower_paths) {
7037     + for (bindex = lower_root_info->bstart;
7038     + bindex >= 0 && bindex <= lower_root_info->bend;
7039     + bindex++)
7040     + path_put(&lower_root_info->lower_paths[bindex]);
7041     + }
7042     +
7043     + kfree(lower_root_info->lower_paths);
7044     + kfree(lower_root_info);
7045     +
7046     + kfree(UNIONFS_SB(sb)->data);
7047     + UNIONFS_SB(sb)->data = NULL;
7048     +
7049     + lower_root_info = ERR_PTR(err);
7050     +out:
7051     + return lower_root_info;
7052     +}
7053     +
7054     +/*
7055     + * our custom d_alloc_root work-alike
7056     + *
7057     + * we can't use d_alloc_root if we want to use our own interpose function
7058     + * unchanged, so we simply call our own "fake" d_alloc_root
7059     + */
7060     +static struct dentry *unionfs_d_alloc_root(struct super_block *sb)
7061     +{
7062     + struct dentry *ret = NULL;
7063     +
7064     + if (sb) {
7065     + static const struct qstr name = {
7066     + .name = "/",
7067     + .len = 1
7068     + };
7069     +
7070     + ret = d_alloc(NULL, &name);
7071     + if (likely(ret)) {
7072     + ret->d_op = &unionfs_dops;
7073     + ret->d_sb = sb;
7074     + ret->d_parent = ret;
7075     + }
7076     + }
7077     + return ret;
7078     +}
7079     +
7080     +/*
7081     + * There is no need to lock the unionfs_super_info's rwsem as there is no
7082     + * way anyone can have a reference to the superblock at this point in time.
7083     + */
7084     +static int unionfs_read_super(struct super_block *sb, void *raw_data,
7085     + int silent)
7086     +{
7087     + int err = 0;
7088     + struct unionfs_dentry_info *lower_root_info = NULL;
7089     + int bindex, bstart, bend;
7090     +
7091     + if (!raw_data) {
7092     + printk(KERN_ERR
7093     + "unionfs: read_super: missing data argument\n");
7094     + err = -EINVAL;
7095     + goto out;
7096     + }
7097     +
7098     + /* Allocate superblock private data */
7099     + sb->s_fs_info = kzalloc(sizeof(struct unionfs_sb_info), GFP_KERNEL);
7100     + if (unlikely(!UNIONFS_SB(sb))) {
7101     + printk(KERN_CRIT "unionfs: read_super: out of memory\n");
7102     + err = -ENOMEM;
7103     + goto out;
7104     + }
7105     +
7106     + UNIONFS_SB(sb)->bend = -1;
7107     + atomic_set(&UNIONFS_SB(sb)->generation, 1);
7108     + init_rwsem(&UNIONFS_SB(sb)->rwsem);
7109     + UNIONFS_SB(sb)->high_branch_id = -1; /* -1 == invalid branch ID */
7110     +
7111     + lower_root_info = unionfs_parse_options(sb, raw_data);
7112     + if (IS_ERR(lower_root_info)) {
7113     + printk(KERN_ERR
7114     + "unionfs: read_super: error while parsing options "
7115     + "(err = %ld)\n", PTR_ERR(lower_root_info));
7116     + err = PTR_ERR(lower_root_info);
7117     + lower_root_info = NULL;
7118     + goto out_free;
7119     + }
7120     + if (lower_root_info->bstart == -1) {
7121     + err = -ENOENT;
7122     + goto out_free;
7123     + }
7124     +
7125     + /* set the lower superblock field of upper superblock */
7126     + bstart = lower_root_info->bstart;
7127     + BUG_ON(bstart != 0);
7128     + sbend(sb) = bend = lower_root_info->bend;
7129     + for (bindex = bstart; bindex <= bend; bindex++) {
7130     + struct dentry *d = lower_root_info->lower_paths[bindex].dentry;
7131     + atomic_inc(&d->d_sb->s_active);
7132     + unionfs_set_lower_super_idx(sb, bindex, d->d_sb);
7133     + }
7134     +
7135     + /* s_maxbytes is inherited from the highest-priority (leftmost) branch */
7136     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
7137     +
7138     + /*
7139     + * Our c/m/atime granularity is 1 ns because we may stack on file
7140     + * systems whose granularity is as good. This is important for our
7141     + * time-based cache coherency.
7142     + */
7143     + sb->s_time_gran = 1;
7144     +
7145     + sb->s_op = &unionfs_sops;
7146     +
7147     + /* See comment next to the definition of unionfs_d_alloc_root */
7148     + sb->s_root = unionfs_d_alloc_root(sb);
7149     + if (unlikely(!sb->s_root)) {
7150     + err = -ENOMEM;
7151     + goto out_dput;
7152     + }
7153     +
7154     + /* link the upper and lower dentries */
7155     + sb->s_root->d_fsdata = NULL;
7156     + err = new_dentry_private_data(sb->s_root, UNIONFS_DMUTEX_ROOT);
7157     + if (unlikely(err))
7158     + goto out_freedpd;
7159     +
7160     + /* Set the lower dentries for s_root */
7161     + for (bindex = bstart; bindex <= bend; bindex++) {
7162     + struct dentry *d;
7163     + struct vfsmount *m;
7164     +
7165     + d = lower_root_info->lower_paths[bindex].dentry;
7166     + m = lower_root_info->lower_paths[bindex].mnt;
7167     +
7168     + unionfs_set_lower_dentry_idx(sb->s_root, bindex, d);
7169     + unionfs_set_lower_mnt_idx(sb->s_root, bindex, m);
7170     + }
7171     + dbstart(sb->s_root) = bstart;
7172     + dbend(sb->s_root) = bend;
7173     +
7174     + /* Set the generation number to one, since this is for the mount. */
7175     + atomic_set(&UNIONFS_D(sb->s_root)->generation, 1);
7176     +
7177     + /*
7178     + * Call interpose to create the upper level inode. Only
7179     + * INTERPOSE_LOOKUP can return a value other than 0 on err.
7180     + */
7181     + err = PTR_ERR(unionfs_interpose(sb->s_root, sb, 0));
7182     + unionfs_unlock_dentry(sb->s_root);
7183     + if (!err)
7184     + goto out;
7185     + /* else fall through */
7186     +
7187     +out_freedpd:
7188     + if (UNIONFS_D(sb->s_root)) {
7189     + kfree(UNIONFS_D(sb->s_root)->lower_paths);
7190     + free_dentry_private_data(sb->s_root);
7191     + }
7192     + dput(sb->s_root);
7193     +
7194     +out_dput:
7195     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7196     + for (bindex = lower_root_info->bstart;
7197     + bindex <= lower_root_info->bend; bindex++) {
7198     + struct dentry *d;
7199     + d = lower_root_info->lower_paths[bindex].dentry;
7200     + /* drop refs we took earlier */
7201     + atomic_dec(&d->d_sb->s_active);
7202     + path_put(&lower_root_info->lower_paths[bindex]);
7203     + }
7204     + kfree(lower_root_info->lower_paths);
7205     + kfree(lower_root_info);
7206     + lower_root_info = NULL;
7207     + }
7208     +
7209     +out_free:
7210     + kfree(UNIONFS_SB(sb)->data);
7211     + kfree(UNIONFS_SB(sb));
7212     + sb->s_fs_info = NULL;
7213     +
7214     +out:
7215     + if (lower_root_info && !IS_ERR(lower_root_info)) {
7216     + kfree(lower_root_info->lower_paths);
7217     + kfree(lower_root_info);
7218     + }
7219     + return err;
7220     +}
7221     +
7222     +static struct dentry *unionfs_mount(struct file_system_type *fs_type,
7223     + int flags, const char *dev_name,
7224     + void *raw_data)
7225     +{
7226     + struct dentry *dentry;
7227     +
7228     + dentry = mount_nodev(fs_type, flags, raw_data, unionfs_read_super);
7229     + if (!IS_ERR(dentry))
7230     + UNIONFS_SB(dentry->d_sb)->dev_name =
7231     + kstrdup(dev_name, GFP_KERNEL);
7232     + return dentry;
7233     +}
7234     +
7235     +static struct file_system_type unionfs_fs_type = {
7236     + .owner = THIS_MODULE,
7237     + .name = UNIONFS_NAME,
7238     + .mount = unionfs_mount,
7239     + .kill_sb = generic_shutdown_super,
7240     + .fs_flags = FS_REVAL_DOT,
7241     +};
7242     +
7243     +static int __init init_unionfs_fs(void)
7244     +{
7245     + int err;
7246     +
7247     + pr_info("Registering unionfs " UNIONFS_VERSION "\n");
7248     +
7249     + err = unionfs_init_filldir_cache();
7250     + if (unlikely(err))
7251     + goto out;
7252     + err = unionfs_init_inode_cache();
7253     + if (unlikely(err))
7254     + goto out;
7255     + err = unionfs_init_dentry_cache();
7256     + if (unlikely(err))
7257     + goto out;
7258     + err = init_sioq();
7259     + if (unlikely(err))
7260     + goto out;
7261     + err = register_filesystem(&unionfs_fs_type);
7262     +out:
7263     + if (unlikely(err)) {
7264     + stop_sioq();
7265     + unionfs_destroy_filldir_cache();
7266     + unionfs_destroy_inode_cache();
7267     + unionfs_destroy_dentry_cache();
7268     + }
7269     + return err;
7270     +}
7271     +
7272     +static void __exit exit_unionfs_fs(void)
7273     +{
7274     + stop_sioq();
7275     + unionfs_destroy_filldir_cache();
7276     + unionfs_destroy_inode_cache();
7277     + unionfs_destroy_dentry_cache();
7278     + unregister_filesystem(&unionfs_fs_type);
7279     + pr_info("Completed unionfs module unload\n");
7280     +}
7281     +
7282     +MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University"
7283     + " (http://www.fsl.cs.sunysb.edu)");
7284     +MODULE_DESCRIPTION("Unionfs " UNIONFS_VERSION
7285     + " (http://unionfs.filesystems.org)");
7286     +MODULE_LICENSE("GPL");
7287     +
7288     +module_init(init_unionfs_fs);
7289     +module_exit(exit_unionfs_fs);
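
The branch-list handling in parse_dirs_option() and parse_branch_mode() above can be sketched in userspace as follows. This is a minimal illustration only, not the patch's code: every sk_* name, the SK_* constants, and the fixed SK_MAX_BRANCHES limit are assumptions made for the example, a strchr() loop stands in for strsep(), and no path lookup or overlap test is performed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel's MAY_WRITE/MAY_READ. */
#define SK_MAY_WRITE 0x2
#define SK_MAY_READ  0x4

/* Arbitrary limit for the sketch; the patch sizes its arrays dynamically. */
#define SK_MAX_BRANCHES 8

struct sk_branch {
	const char *dir;	/* points into the caller's option buffer */
	int perms;
};

/* Same contract as parse_branch_mode(): a missing mode defaults to "rw". */
static int sk_parse_branch_mode(const char *name, int *perms)
{
	assert(perms != NULL);
	if (!name || !strcmp(name, "rw")) {
		*perms = SK_MAY_READ | SK_MAY_WRITE;
		return 0;
	}
	if (!strcmp(name, "ro")) {
		*perms = SK_MAY_READ;
		return 0;
	}
	return -EINVAL;
}

/*
 * Split a string such as "b1:b2=rw:b3=ro:b4" in place. Returns the number
 * of branches stored in @out, or a negative errno value.
 */
static int sk_parse_dirs(char *options, struct sk_branch *out)
{
	int n = 0;
	char *name = options;

	if (!options || options[0] == '\0')
		return -EINVAL;		/* no branches specified */

	while (name) {
		char *next = strchr(name, ':');
		char *mode;
		int perms, err;

		if (next)
			*next++ = '\0';
		if (!*name)		/* empty field: stray ':' */
			return -EINVAL;
		if (n == SK_MAX_BRANCHES)
			return -E2BIG;

		mode = strchr(name, '=');
		if (mode)		/* strip off "=mode" if present */
			*mode++ = '\0';
		err = sk_parse_branch_mode(mode, &perms);
		if (err)
			return err;
		/* the leftmost branch must be writable */
		if (n == 0 && !(perms & SK_MAY_WRITE))
			return -EINVAL;

		out[n].dir = name;
		out[n].perms = perms;
		n++;
		name = next;
	}
	return n;
}
```

Called on a writable buffer holding "/b1:/b2=rw:/b3=ro", sk_parse_dirs() fills three entries, and it rejects both a read-only leftmost branch and an empty field between colons, which matches the error cases the kernel code reports.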
7290     diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
7291     new file mode 100644
7292     index 0000000..bcc5652
7293     --- /dev/null
7294     +++ b/fs/unionfs/mmap.c
7295     @@ -0,0 +1,89 @@
7296     +/*
7297     + * Copyright (c) 2003-2011 Erez Zadok
7298     + * Copyright (c) 2003-2006 Charles P. Wright
7299     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7300     + * Copyright (c) 2005-2006 Junjiro Okajima
7301     + * Copyright (c) 2006 Shaya Potter
7302     + * Copyright (c) 2005 Arun M. Krishnakumar
7303     + * Copyright (c) 2004-2006 David P. Quigley
7304     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7305     + * Copyright (c) 2003 Puja Gupta
7306     + * Copyright (c) 2003 Harikesavan Krishnan
7307     + * Copyright (c) 2003-2011 Stony Brook University
7308     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
7309     + *
7310     + * This program is free software; you can redistribute it and/or modify
7311     + * it under the terms of the GNU General Public License version 2 as
7312     + * published by the Free Software Foundation.
7313     + */
7314     +
7315     +#include "union.h"
7316     +
7317     +
7318     +/*
7319     + * XXX: we need a dummy readpage handler because generic_file_mmap (which we
7320     + * use in unionfs_mmap) checks for the existence of
7321     + * mapping->a_ops->readpage, else it returns -ENOEXEC. The VFS will need to
7322     + * be fixed to allow a file system to define vm_ops->fault without any
7323     + * address_space_ops whatsoever.
7324     + *
7325     + * Otherwise, we don't want to use our readpage method at all.
7326     + */
7327     +static int unionfs_readpage(struct file *file, struct page *page)
7328     +{
7329     + BUG();
7330     + return -EINVAL;
7331     +}
7332     +
7333     +static int unionfs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
7334     +{
7335     + int err;
7336     + struct file *file, *lower_file;
7337     + const struct vm_operations_struct *lower_vm_ops;
7338     + struct vm_area_struct lower_vma;
7339     +
7340     + BUG_ON(!vma);
7341     + memcpy(&lower_vma, vma, sizeof(struct vm_area_struct));
7342     + file = lower_vma.vm_file;
7343     + lower_vm_ops = UNIONFS_F(file)->lower_vm_ops;
7344     + BUG_ON(!lower_vm_ops);
7345     +
7346     + lower_file = unionfs_lower_file(file);
7347     + BUG_ON(!lower_file);
7348     + /*
7349     + * XXX: vm_ops->fault may be called in parallel. Because we have to
7350     + * resort to temporarily changing the vma->vm_file to point to the
7351     + * lower file, a concurrent invocation of unionfs_fault could see a
7352     + * different value. In this workaround, we keep a different copy of
7353     + * the vma structure in our stack, so we never expose a different
7354     + * value of the vma->vm_file passed to us, even temporarily. A
7355     + * better fix would be to change the calling semantics of ->fault to
7356     + * take an explicit file pointer.
7357     + */
7358     + lower_vma.vm_file = lower_file;
7359     + err = lower_vm_ops->fault(&lower_vma, vmf);
7360     + return err;
7361     +}
7362     +
7363     +/*
7364     + * XXX: the default address_space_ops for unionfs is empty. We cannot set
7365     + * our inode->i_mapping->a_ops to NULL because too many code paths expect
7366     + * the a_ops vector to be non-NULL.
7367     + */
7368     +struct address_space_operations unionfs_aops = {
7369     + /* empty on purpose */
7370     +};
7371     +
7372     +/*
7373     + * XXX: we need a second, dummy address_space_ops vector, to be used
7374     + * temporarily during unionfs_mmap, because the latter calls
7375     + * generic_file_mmap, which checks if ->readpage exists, else returns
7376     + * -ENOEXEC.
7377     + */
7378     +struct address_space_operations unionfs_dummy_aops = {
7379     + .readpage = unionfs_readpage,
7380     +};
7381     +
7382     +struct vm_operations_struct unionfs_vm_ops = {
7383     + .fault = unionfs_fault,
7384     +};
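
The workaround described in unionfs_fault() above (copy the vma onto the stack, point the copy at the lower file, and hand only the copy to the lower handler) can be sketched outside the kernel like this. The sk_* types and functions are hypothetical stand-ins invented for the example; they are not the kernel's vm_area_struct, file, or ->fault signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for struct file and the vma. */
struct sk_file {
	const char *name;
	struct sk_file *lower;	/* the lower file this one is stacked on */
};

struct sk_vma {
	struct sk_file *vm_file;
	unsigned long vm_start;
};

typedef int (*sk_fault_fn)(struct sk_vma *vma);

/* Records which file the lower handler was shown, for illustration. */
static struct sk_file *sk_seen_file;

/* Hypothetical lower-level fault handler. */
static int sk_lower_fault(struct sk_vma *vma)
{
	sk_seen_file = vma->vm_file;
	return 0;
}

/*
 * Forward a fault to the lower handler without ever modifying the
 * caller's vma: the vm_file swap happens only in a private stack copy,
 * so a concurrent caller looking at *vma never sees the lower file.
 */
static int sk_upper_fault(struct sk_vma *vma, sk_fault_fn lower_fault)
{
	struct sk_vma lower_vma;

	assert(vma && vma->vm_file && vma->vm_file->lower && lower_fault);
	lower_vma = *vma;			/* private copy */
	lower_vma.vm_file = vma->vm_file->lower;
	return lower_fault(&lower_vma);
}
```

The design trade-off is the one the source comment names: copying the structure costs a little stack space per fault, but it avoids temporarily exposing a changed vm_file to parallel invocations.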
7385     diff --git a/fs/unionfs/rdstate.c b/fs/unionfs/rdstate.c
7386     new file mode 100644
7387     index 0000000..59b7333
7388     --- /dev/null
7389     +++ b/fs/unionfs/rdstate.c
7390     @@ -0,0 +1,285 @@
7391     +/*
7392     + * Copyright (c) 2003-2011 Erez Zadok
7393     + * Copyright (c) 2003-2006 Charles P. Wright
7394     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7395     + * Copyright (c) 2005-2006 Junjiro Okajima
7396     + * Copyright (c) 2005 Arun M. Krishnakumar
7397     + * Copyright (c) 2004-2006 David P. Quigley
7398     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7399     + * Copyright (c) 2003 Puja Gupta
7400     + * Copyright (c) 2003 Harikesavan Krishnan
7401     + * Copyright (c) 2003-2011 Stony Brook University
7402     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
7403     + *
7404     + * This program is free software; you can redistribute it and/or modify
7405     + * it under the terms of the GNU General Public License version 2 as
7406     + * published by the Free Software Foundation.
7407     + */
7408     +
7409     +#include "union.h"
7410     +
7411     +/* This file contains the routines for maintaining readdir state. */
7412     +
7413     +/*
7414     + * Two structures are used here: rdstate, which is a hash table, and
7415     + * filldir_node, which is the type of the entries stored in that table.
7416     + */
7417     +
7418     +/*
7419     + * This is a struct kmem_cache for filldir nodes, because we allocate a lot
7420     + * of them and they shouldn't waste memory. If the node has a small name
7421     + * (as defined by the dentry structure), then we use an inline name to
7422     + * preserve kmalloc space.
7423     + */
7424     +static struct kmem_cache *unionfs_filldir_cachep;
7425     +
7426     +int unionfs_init_filldir_cache(void)
7427     +{
7428     + unionfs_filldir_cachep =
7429     + kmem_cache_create("unionfs_filldir",
7430     + sizeof(struct filldir_node), 0,
7431     + SLAB_RECLAIM_ACCOUNT, NULL);
7432     +
7433     + return (unionfs_filldir_cachep ? 0 : -ENOMEM);
7434     +}
7435     +
7436     +void unionfs_destroy_filldir_cache(void)
7437     +{
7438     + if (unionfs_filldir_cachep)
7439     + kmem_cache_destroy(unionfs_filldir_cachep);
7440     +}
7441     +
7442     +/*
7443     + * This is a tuning parameter that tells us roughly how big to make the
7444     + * hash table in directory entries per page. This isn't perfect, but
7445     + * at least we get a hash table size that shouldn't be too overloaded.
7446     + * The following averages are based on my home directory.
7447     + * 14.44693 Overall
7448     + * 12.29 Single Page Directories
7449     + * 117.93 Multi-page directories
7450     + */
7451     +#define DENTPAGE 4096
7452     +#define DENTPERONEPAGE 12
7453     +#define DENTPERPAGE 118
7454     +#define MINHASHSIZE 1
7455     +static int guesstimate_hash_size(struct inode *inode)
7456     +{
7457     + struct inode *lower_inode;
7458     + int bindex;
7459     + int hashsize = MINHASHSIZE;
7460     +
7461     + if (UNIONFS_I(inode)->hashsize > 0)
7462     + return UNIONFS_I(inode)->hashsize;
7463     +
7464     + for (bindex = ibstart(inode); bindex <= ibend(inode); bindex++) {
7465     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
7466     + if (!lower_inode)
7467     + continue;
7468     +
7469     + if (i_size_read(lower_inode) == DENTPAGE)
7470     + hashsize += DENTPERONEPAGE;
7471     + else
7472     + hashsize += (i_size_read(lower_inode) / DENTPAGE) *
7473     + DENTPERPAGE;
7474     + }
7475     +
7476     + return hashsize;
7477     +}
7478     +
7479     +int init_rdstate(struct file *file)
7480     +{
7481     + BUG_ON(sizeof(loff_t) !=
7482     + (sizeof(unsigned int) + sizeof(unsigned int)));
7483     + BUG_ON(UNIONFS_F(file)->rdstate != NULL);
7484     +
7485     + UNIONFS_F(file)->rdstate = alloc_rdstate(file->f_path.dentry->d_inode,
7486     + fbstart(file));
7487     +
7488     + return (UNIONFS_F(file)->rdstate ? 0 : -ENOMEM);
7489     +}
7490     +
7491     +struct unionfs_dir_state *find_rdstate(struct inode *inode, loff_t fpos)
7492     +{
7493     + struct unionfs_dir_state *rdstate = NULL;
7494     + struct list_head *pos;
7495     +
7496     + spin_lock(&UNIONFS_I(inode)->rdlock);
7497     + list_for_each(pos, &UNIONFS_I(inode)->readdircache) {
7498     + struct unionfs_dir_state *r =
7499     + list_entry(pos, struct unionfs_dir_state, cache);
7500     + if (fpos == rdstate2offset(r)) {
7501     + UNIONFS_I(inode)->rdcount--;
7502     + list_del(&r->cache);
7503     + rdstate = r;
7504     + break;
7505     + }
7506     + }
7507     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7508     + return rdstate;
7509     +}
7510     +
7511     +struct unionfs_dir_state *alloc_rdstate(struct inode *inode, int bindex)
7512     +{
7513     + int i = 0;
7514     + int hashsize;
7515     + unsigned long mallocsize = sizeof(struct unionfs_dir_state);
7516     + struct unionfs_dir_state *rdstate;
7517     +
7518     + hashsize = guesstimate_hash_size(inode);
7519     + mallocsize += hashsize * sizeof(struct list_head);
7520     + mallocsize = __roundup_pow_of_two(mallocsize);
7521     +
7522     + /* This should give us about 500 entries anyway. */
7523     + if (mallocsize > PAGE_SIZE)
7524     + mallocsize = PAGE_SIZE;
7525     +
7526     + hashsize = (mallocsize - sizeof(struct unionfs_dir_state)) /
7527     + sizeof(struct list_head);
7528     +
7529     + rdstate = kmalloc(mallocsize, GFP_KERNEL);
7530     + if (unlikely(!rdstate))
7531     + return NULL;
7532     +
7533     + spin_lock(&UNIONFS_I(inode)->rdlock);
7534     + if (UNIONFS_I(inode)->cookie >= (MAXRDCOOKIE - 1))
7535     + UNIONFS_I(inode)->cookie = 1;
7536     + else
7537     + UNIONFS_I(inode)->cookie++;
7538     +
7539     + rdstate->cookie = UNIONFS_I(inode)->cookie;
7540     + spin_unlock(&UNIONFS_I(inode)->rdlock);
7541     + rdstate->offset = 1;
7542     + rdstate->access = jiffies;
7543     + rdstate->bindex = bindex;
7544     + rdstate->dirpos = 0;
7545     + rdstate->hashentries = 0;
7546     + rdstate->size = hashsize;
7547     + for (i = 0; i < rdstate->size; i++)
7548     + INIT_LIST_HEAD(&rdstate->list[i]);
7549     +
7550     + return rdstate;
7551     +}
7552     +
7553     +static void free_filldir_node(struct filldir_node *node)
7554     +{
7555     + if (node->namelen >= DNAME_INLINE_LEN)
7556     + kfree(node->name);
7557     + kmem_cache_free(unionfs_filldir_cachep, node);
7558     +}
7559     +
7560     +void free_rdstate(struct unionfs_dir_state *state)
7561     +{
7562     + struct filldir_node *tmp;
7563     + int i;
7564     +
7565     + for (i = 0; i < state->size; i++) {
7566     + struct list_head *head = &(state->list[i]);
7567     + struct list_head *pos, *n;
7568     +
7569     + /* traverse the list and deallocate space */
7570     + list_for_each_safe(pos, n, head) {
7571     + tmp = list_entry(pos, struct filldir_node, file_list);
7572     + list_del(&tmp->file_list);
7573     + free_filldir_node(tmp);
7574     + }
7575     + }
7576     +
7577     + kfree(state);
7578     +}
7579     +
7580     +struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
7581     + const char *name, int namelen,
7582     + int is_whiteout)
7583     +{
7584     + int index;
7585     + unsigned int hash;
7586     + struct list_head *head;
7587     + struct list_head *pos;
7588     + struct filldir_node *cursor = NULL;
7589     + int found = 0;
7590     +
7591     + BUG_ON(namelen <= 0);
7592     +
7593     + hash = full_name_hash(name, namelen);
7594     + index = hash % rdstate->size;
7595     +
7596     + head = &(rdstate->list[index]);
7597     + list_for_each(pos, head) {
7598     + cursor = list_entry(pos, struct filldir_node, file_list);
7599     +
7600     + if (cursor->namelen == namelen && cursor->hash == hash &&
7601     + !strncmp(cursor->name, name, namelen)) {
7602     + /*
7603     + * a duplicate exists, and hence there is no need to
7604     + * add a new entry to the list
7605     + */
7606     + found = 1;
7607     +
7608     + /*
7609     + * if a duplicate is found in this branch, and is
7610     + * not due to the caller looking for an entry to
7611     + * whiteout, then the file system may be corrupted.
7612     + */
7613     + if (unlikely(!is_whiteout &&
7614     + cursor->bindex == rdstate->bindex))
7615     + printk(KERN_ERR "unionfs: filldir: possible "
7616     + "I/O error: a file is duplicated "
7617     + "in the same branch %d: %s\n",
7618     + rdstate->bindex, cursor->name);
7619     + break;
7620     + }
7621     + }
7622     +
7623     + if (!found)
7624     + cursor = NULL;
7625     +
7626     + return cursor;
7627     +}
7628     +
7629     +int add_filldir_node(struct unionfs_dir_state *rdstate, const char *name,
7630     + int namelen, int bindex, int whiteout)
7631     +{
7632     + struct filldir_node *new;
7633     + unsigned int hash;
7634     + int index;
7635     + int err = 0;
7636     + struct list_head *head;
7637     +
7638     + BUG_ON(namelen <= 0);
7639     +
7640     + hash = full_name_hash(name, namelen);
7641     + index = hash % rdstate->size;
7642     + head = &(rdstate->list[index]);
7643     +
7644     + new = kmem_cache_alloc(unionfs_filldir_cachep, GFP_KERNEL);
7645     + if (unlikely(!new)) {
7646     + err = -ENOMEM;
7647     + goto out;
7648     + }
7649     +
7650     + INIT_LIST_HEAD(&new->file_list);
7651     + new->namelen = namelen;
7652     + new->hash = hash;
7653     + new->bindex = bindex;
7654     + new->whiteout = whiteout;
7655     +
7656     + if (namelen < DNAME_INLINE_LEN) {
7657     + new->name = new->iname;
7658     + } else {
7659     + new->name = kmalloc(namelen + 1, GFP_KERNEL);
7660     + if (unlikely(!new->name)) {
7661     + kmem_cache_free(unionfs_filldir_cachep, new);
7662     + err = -ENOMEM;
7663     + goto out;
7664     + }
7665     + }
7666     +
7667     + memcpy(new->name, name, namelen);
7668     + new->name[namelen] = '\0';
7669     +
7670     + rdstate->hashentries++;
7671     +
7672     + list_add(&(new->file_list), head);
7673     +out:
7674     + return err;
7675     +}
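The lookup and insert helpers above follow a common bucketed-hash pattern: hash the name, pick a bucket, compare length and hash before comparing bytes, and insert new nodes at the head of the bucket. The following is a minimal userspace sketch of that pattern, not the unionfs code itself. It assumes a stand-in hash (`toy_hash`, hypothetical, not the kernel's `full_name_hash()`), a fixed `NBUCKETS` in place of `rdstate->size`, and a singly linked chain in place of `struct list_head`; the names `name_node`, `find_name` and `add_name` are likewise made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8	/* hypothetical; the patch uses rdstate->size */

struct name_node {
	struct name_node *next;	/* bucket chain (stand-in for list_head) */
	unsigned int hash;
	int namelen;
	int bindex;
	char *name;
};

/* Stand-in string hash; this is NOT the kernel's full_name_hash(). */
unsigned int toy_hash(const char *name, int namelen)
{
	unsigned int h = 5381;
	int i;

	for (i = 0; i < namelen; i++)
		h = h * 33 + (unsigned char)name[i];
	return h;
}

/* Return the node matching name, or NULL if the bucket has none. */
struct name_node *find_name(struct name_node **table,
			    const char *name, int namelen)
{
	unsigned int hash = toy_hash(name, namelen);
	struct name_node *n;

	assert(namelen > 0);
	for (n = table[hash % NBUCKETS]; n; n = n->next) {
		/* compare the cheap fields first, the bytes last */
		if (n->namelen == namelen && n->hash == hash &&
		    !strncmp(n->name, name, namelen))
			return n;
	}
	return NULL;
}

/* Insert at the head of the bucket; returns 0, or -1 if allocation fails. */
int add_name(struct name_node **table, const char *name,
	     int namelen, int bindex)
{
	struct name_node *n;

	assert(namelen > 0);
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->name = malloc(namelen + 1);
	if (!n->name) {
		free(n);
		return -1;	/* report the failure to the caller */
	}
	memcpy(n->name, name, namelen);
	n->name[namelen] = '\0';
	n->namelen = namelen;
	n->hash = toy_hash(name, namelen);
	n->bindex = bindex;
	n->next = table[n->hash % NBUCKETS];
	table[n->hash % NBUCKETS] = n;
	return 0;
}
```

Storing the hash in each node lets the lookup reject most non-matching entries without touching the name bytes, which is presumably why the patch keeps `hash` in `struct filldir_node` as well.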
7676     diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
7677     new file mode 100644
7678     index 0000000..c8ab910
7679     --- /dev/null
7680     +++ b/fs/unionfs/rename.c
7681     @@ -0,0 +1,522 @@
7682     +/*
7683     + * Copyright (c) 2003-2011 Erez Zadok
7684     + * Copyright (c) 2003-2006 Charles P. Wright
7685     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
7686     + * Copyright (c) 2005-2006 Junjiro Okajima
7687     + * Copyright (c) 2005 Arun M. Krishnakumar
7688     + * Copyright (c) 2004-2006 David P. Quigley
7689     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
7690     + * Copyright (c) 2003 Puja Gupta
7691     + * Copyright (c) 2003 Harikesavan Krishnan
7692     + * Copyright (c) 2003-2011 Stony Brook University
7693     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
7694     + *
7695     + * This program is free software; you can redistribute it and/or modify
7696     + * it under the terms of the GNU General Public License version 2 as
7697     + * published by the Free Software Foundation.
7698     + */
7699     +
7700     +#include "union.h"
7701     +
7702     +/*
7703     + * This is a helper function for rename, used when rename leaves the lower
7704     + * dentries in an inconsistent state and we need to revert.
7705     + */
7706     +static int unionfs_refresh_lower_dentry(struct dentry *dentry,
7707     + struct dentry *parent, int bindex)
7708     +{
7709     + struct dentry *lower_dentry;
7710     + struct dentry *lower_parent;
7711     + int err = 0;
7712     + struct nameidata lower_nd;
7713     +
7714     + verify_locked(dentry);
7715     +
7716     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
7717     +
7718     + BUG_ON(!S_ISDIR(lower_parent->d_inode->i_mode));
7719     +
7720     + err = init_lower_nd(&lower_nd, LOOKUP_OPEN);
7721     + if (unlikely(err < 0))
7722     + goto out;
7723     + lower_dentry = lookup_one_len_nd(dentry->d_name.name, lower_parent,
7724     + dentry->d_name.len, &lower_nd);
7725     + release_lower_nd(&lower_nd, err);
7726     + if (IS_ERR(lower_dentry)) {
7727     + err = PTR_ERR(lower_dentry);
7728     + goto out;
7729     + }
7730     +
7731     + dput(unionfs_lower_dentry_idx(dentry, bindex));
7732     + iput(unionfs_lower_inode_idx(dentry->d_inode, bindex));
7733     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex, NULL);
7734     +
7735     + if (!lower_dentry->d_inode) {
7736     + dput(lower_dentry);
7737     + unionfs_set_lower_dentry_idx(dentry, bindex, NULL);
7738     + } else {
7739     + unionfs_set_lower_dentry_idx(dentry, bindex, lower_dentry);
7740     + unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
7741     + igrab(lower_dentry->d_inode));
7742     + }
7743     +
7744     +out:
7745     + return err;
7746     +}
7747     +
7748     +static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
7749     + struct dentry *old_parent,
7750     + struct inode *new_dir, struct dentry *new_dentry,
7751     + struct dentry *new_parent,
7752     + int bindex)
7753     +{
7754     + int err = 0;
7755     + struct dentry *lower_old_dentry;
7756     + struct dentry *lower_new_dentry;
7757     + struct dentry *lower_old_dir_dentry;
7758     + struct dentry *lower_new_dir_dentry;
7759     + struct dentry *trap;
7760     +
7761     + lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7762     + lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
7763     +
7764     + if (!lower_new_dentry) {
7765     + lower_new_dentry =
7766     + create_parents(new_parent->d_inode,
7767     + new_dentry, new_dentry->d_name.name,
7768     + bindex);
7769     + if (IS_ERR(lower_new_dentry)) {
7770     + err = PTR_ERR(lower_new_dentry);
7771     + if (IS_COPYUP_ERR(err))
7772     + goto out;
7773     + printk(KERN_ERR "unionfs: error creating directory "
7774     + "tree for rename, bindex=%d err=%d\n",
7775     + bindex, err);
7776     + goto out;
7777     + }
7778     + }
7779     +
7780     + /* check for and remove whiteout, if any */
7781     + err = check_unlink_whiteout(new_dentry, lower_new_dentry, bindex);
7782     + if (err > 0) /* ignore if whiteout found and successfully removed */
7783     + err = 0;
7784     + if (err)
7785     + goto out;
7786     +
7787     + /* check if old_dentry branch is writable */
7788     + err = is_robranch_super(old_dentry->d_sb, bindex);
7789     + if (err)
7790     + goto out;
7791     +
7792     + dget(lower_old_dentry);
7793     + dget(lower_new_dentry);
7794     + lower_old_dir_dentry = dget_parent(lower_old_dentry);
7795     + lower_new_dir_dentry = dget_parent(lower_new_dentry);
7796     +
7797     + trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7798     + /* source should not be ancestor of target */
7799     + if (trap == lower_old_dentry) {
7800     + err = -EINVAL;
7801     + goto out_err_unlock;
7802     + }
7803     + /* target should not be ancestor of source */
7804     + if (trap == lower_new_dentry) {
7805     + err = -ENOTEMPTY;
7806     + goto out_err_unlock;
7807     + }
7808     + err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
7809     + lower_new_dir_dentry->d_inode, lower_new_dentry);
7810     +out_err_unlock:
7811     + if (!err) {
7812     + /* update parent dir times */
7813     + fsstack_copy_attr_times(old_dir, lower_old_dir_dentry->d_inode);
7814     + fsstack_copy_attr_times(new_dir, lower_new_dir_dentry->d_inode);
7815     + }
7816     + unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
7817     +
7818     + dput(lower_old_dir_dentry);
7819     + dput(lower_new_dir_dentry);
7820     + dput(lower_old_dentry);
7821     + dput(lower_new_dentry);
7822     +
7823     +out:
7824     + if (!err) {
7825     + /* Fixup the new_dentry. */
7826     + if (bindex < dbstart(new_dentry))
7827     + dbstart(new_dentry) = bindex;
7828     + else if (bindex > dbend(new_dentry))
7829     + dbend(new_dentry) = bindex;
7830     + }
7831     +
7832     + return err;
7833     +}
7834     +
7835     +/*
7836     + * Main rename code. This is sufficiently complex that it is documented in
7837     + * Documentation/filesystems/unionfs/rename.txt. This routine calls
7838     + * __unionfs_rename() above to perform some of the work.
7839     + */
7840     +static int do_unionfs_rename(struct inode *old_dir,
7841     + struct dentry *old_dentry,
7842     + struct dentry *old_parent,
7843     + struct inode *new_dir,
7844     + struct dentry *new_dentry,
7845     + struct dentry *new_parent)
7846     +{
7847     + int err = 0;
7848     + int bindex;
7849     + int old_bstart, old_bend;
7850     + int new_bstart, new_bend;
7851     + int do_copyup = -1;
7852     + int local_err = 0;
7853     + int eio = 0;
7854     + int revert = 0;
7855     +
7856     + old_bstart = dbstart(old_dentry);
7857     + old_bend = dbend(old_dentry);
7858     +
7859     + new_bstart = dbstart(new_dentry);
7860     + new_bend = dbend(new_dentry);
7861     +
7862     + /* Rename source to destination. */
7863     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7864     + new_dir, new_dentry, new_parent,
7865     + old_bstart);
7866     + if (err) {
7867     + if (!IS_COPYUP_ERR(err))
7868     + goto out;
7869     + do_copyup = old_bstart - 1;
7870     + } else {
7871     + revert = 1;
7872     + }
7873     +
7874     + /*
7875     + * Unlink all instances of destination that exist to the left of
7876     + * bstart of source. On error, revert back, goto out.
7877     + */
7878     + for (bindex = old_bstart - 1; bindex >= new_bstart; bindex--) {
7879     + struct dentry *unlink_dentry;
7880     + struct dentry *unlink_dir_dentry;
7881     +
7882     + BUG_ON(bindex < 0);
7883     + unlink_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
7884     + if (!unlink_dentry)
7885     + continue;
7886     +
7887     + unlink_dir_dentry = lock_parent(unlink_dentry);
7888     + err = is_robranch_super(old_dir->i_sb, bindex);
7889     + if (!err)
7890     + err = vfs_unlink(unlink_dir_dentry->d_inode,
7891     + unlink_dentry);
7892     +
7893     + fsstack_copy_attr_times(new_parent->d_inode,
7894     + unlink_dir_dentry->d_inode);
7895     + /* propagate number of hard-links */
7896     + new_parent->d_inode->i_nlink =
7897     + unionfs_get_nlinks(new_parent->d_inode);
7898     +
7899     + unlock_dir(unlink_dir_dentry);
7900     + if (!err) {
7901     + if (bindex != new_bstart) {
7902     + dput(unlink_dentry);
7903     + unionfs_set_lower_dentry_idx(new_dentry,
7904     + bindex, NULL);
7905     + }
7906     + } else if (IS_COPYUP_ERR(err)) {
7907     + do_copyup = bindex - 1;
7908     + } else if (revert) {
7909     + goto revert;
7910     + }
7911     + }
7912     +
7913     + if (do_copyup != -1) {
7914     + for (bindex = do_copyup; bindex >= 0; bindex--) {
7915     + /*
7916     + * copyup the file into some left directory, so that
7917     + * you can rename it
7918     + */
7919     + err = copyup_dentry(old_parent->d_inode,
7920     + old_dentry, old_bstart, bindex,
7921     + old_dentry->d_name.name,
7922     + old_dentry->d_name.len, NULL,
7923     + i_size_read(old_dentry->d_inode));
7924     + /* if copyup failed, try next branch to the left */
7925     + if (err)
7926     + continue;
7927     + /*
7928     + * create whiteout before calling __unionfs_rename
7929     + * because the latter will change the old_dentry's
7930     + * lower name and parent dir, resulting in the
7931     + * whiteout getting created in the wrong dir.
7932     + */
7933     + err = create_whiteout(old_dentry, bindex);
7934     + if (err) {
7935     + printk(KERN_ERR "unionfs: can't create a "
7936     + "whiteout for %s in rename (err=%d)\n",
7937     + old_dentry->d_name.name, err);
7938     + continue;
7939     + }
7940     + err = __unionfs_rename(old_dir, old_dentry, old_parent,
7941     + new_dir, new_dentry, new_parent,
7942     + bindex);
7943     + break;
7944     + }
7945     + }
7946     +
7947     + /* make it opaque */
7948     + if (S_ISDIR(old_dentry->d_inode->i_mode)) {
7949     + err = make_dir_opaque(old_dentry, dbstart(old_dentry));
7950     + if (err)
7951     + goto revert;
7952     + }
7953     +
7954     + /*
7955     + * Create whiteout for source, only if:
7956     + * (1) There is more than one underlying instance of source.
7957     + * (The case where we did a copyup is taken care of above.)
7958     + */
7959     + if ((old_bstart != old_bend) && (do_copyup == -1)) {
7960     + err = create_whiteout(old_dentry, old_bstart);
7961     + if (err) {
7962     + /* can't fix anything now, so we exit with -EIO */
7963     + printk(KERN_ERR "unionfs: can't create a whiteout for "
7964     + "%s in rename!\n", old_dentry->d_name.name);
7965     + err = -EIO;
7966     + }
7967     + }
7968     +
7969     +out:
7970     + return err;
7971     +
7972     +revert:
7973     + /* Do revert here. */
7974     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
7975     + old_bstart);
7976     + if (local_err) {
7977     + printk(KERN_ERR "unionfs: revert failed in rename: "
7978     + "the new refresh failed\n");
7979     + eio = -EIO;
7980     + }
7981     +
7982     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
7983     + old_bstart);
7984     + if (local_err) {
7985     + printk(KERN_ERR "unionfs: revert failed in rename: "
7986     + "the old refresh failed\n");
7987     + eio = -EIO;
7988     + goto revert_out;
7989     + }
7990     +
7991     + if (!unionfs_lower_dentry_idx(new_dentry, bindex) ||
7992     + !unionfs_lower_dentry_idx(new_dentry, bindex)->d_inode) {
7993     + printk(KERN_ERR "unionfs: revert failed in rename: "
7994     + "the object disappeared from under us!\n");
7995     + eio = -EIO;
7996     + goto revert_out;
7997     + }
7998     +
7999     + if (unionfs_lower_dentry_idx(old_dentry, bindex) &&
8000     + unionfs_lower_dentry_idx(old_dentry, bindex)->d_inode) {
8001     + printk(KERN_ERR "unionfs: revert failed in rename: "
8002     + "the object was created underneath us!\n");
8003     + eio = -EIO;
8004     + goto revert_out;
8005     + }
8006     +
8007     + local_err = __unionfs_rename(new_dir, new_dentry, new_parent,
8008     + old_dir, old_dentry, old_parent,
8009     + old_bstart);
8010     +
8011     + /* If we can't fix it, then we cop-out with -EIO. */
8012     + if (local_err) {
8013     + printk(KERN_ERR "unionfs: revert failed in rename!\n");
8014     + eio = -EIO;
8015     + }
8016     +
8017     + local_err = unionfs_refresh_lower_dentry(new_dentry, new_parent,
8018     + bindex);
8019     + if (local_err)
8020     + eio = -EIO;
8021     + local_err = unionfs_refresh_lower_dentry(old_dentry, old_parent,
8022     + bindex);
8023     + if (local_err)
8024     + eio = -EIO;
8025     +
8026     +revert_out:
8027     + if (eio)
8028     + err = eio;
8029     + return err;
8030     +}
8031     +
8032     +/*
8033     + * We can't copyup a directory, because it may involve huge numbers of
8034     + * children, etc. Doing that in the kernel would be bad, so instead we
8035     + * return EXDEV to the user-space utility that caused this, and let the
8036     + * user-space recurse and ask us to copy up each file separately.
8037     + */
8038     +static int may_rename_dir(struct dentry *dentry, struct dentry *parent)
8039     +{
8040     + int err, bstart;
8041     +
8042     + err = check_empty(dentry, parent, NULL);
8043     + if (err == -ENOTEMPTY) {
8044     + if (is_robranch(dentry))
8045     + return -EXDEV;
8046     + } else if (err) {
8047     + return err;
8048     + }
8049     +
8050     + bstart = dbstart(dentry);
8051     + if (dbend(dentry) == bstart || dbopaque(dentry) == bstart)
8052     + return 0;
8053     +
8054     + dbstart(dentry) = bstart + 1;
8055     + err = check_empty(dentry, parent, NULL);
8056     + dbstart(dentry) = bstart;
8057     + if (err == -ENOTEMPTY)
8058     + err = -EXDEV;
8059     + return err;
8060     +}
8061     +
8062     +/*
8063     + * The locking rules in unionfs_rename are complex. We could use a simpler
8064     + * superblock-level name-space lock for renames and copy-ups.
8065     + */
8066     +int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
8067     + struct inode *new_dir, struct dentry *new_dentry)
8068     +{
8069     + int err = 0;
8070     + struct dentry *wh_dentry;
8071     + struct dentry *old_parent, *new_parent;
8072     + int valid = true;
8073     +
8074     + unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
8075     + old_parent = dget_parent(old_dentry);
8076     + new_parent = dget_parent(new_dentry);
8077     + /* un/lock parent dentries only if they differ from old/new_dentry */
8078     + if (old_parent != old_dentry &&
8079     + old_parent != new_dentry)
8080     + unionfs_lock_dentry(old_parent, UNIONFS_DMUTEX_REVAL_PARENT);
8081     + if (new_parent != old_dentry &&
8082     + new_parent != new_dentry &&
8083     + new_parent != old_parent)
8084     + unionfs_lock_dentry(new_parent, UNIONFS_DMUTEX_REVAL_CHILD);
8085     + unionfs_double_lock_dentry(old_dentry, new_dentry);
8086     +
8087     + valid = __unionfs_d_revalidate(old_dentry, old_parent, false);
8088     + if (!valid) {
8089     + err = -ESTALE;
8090     + goto out;
8091     + }
8092     + if (!d_deleted(new_dentry) && new_dentry->d_inode) {
8093     + valid = __unionfs_d_revalidate(new_dentry, new_parent, false);
8094     + if (!valid) {
8095     + err = -ESTALE;
8096     + goto out;
8097     + }
8098     + }
8099     +
8100     + if (!S_ISDIR(old_dentry->d_inode->i_mode))
8101     + err = unionfs_partial_lookup(old_dentry, old_parent);
8102     + else
8103     + err = may_rename_dir(old_dentry, old_parent);
8104     +
8105     + if (err)
8106     + goto out;
8107     +
8108     + err = unionfs_partial_lookup(new_dentry, new_parent);
8109     + if (err)
8110     + goto out;
8111     +
8112     + /*
8113     + * if new_dentry is already lower because of whiteout,
8114     + * simply override it even if the whited-out dir is not empty.
8115     + */
8116     + wh_dentry = find_first_whiteout(new_dentry);
8117     + if (!IS_ERR(wh_dentry)) {
8118     + dput(wh_dentry);
8119     + } else if (new_dentry->d_inode) {
8120     + if (S_ISDIR(old_dentry->d_inode->i_mode) !=
8121     + S_ISDIR(new_dentry->d_inode->i_mode)) {
8122     + err = S_ISDIR(old_dentry->d_inode->i_mode) ?
8123     + -ENOTDIR : -EISDIR;
8124     + goto out;
8125     + }
8126     +
8127     + if (S_ISDIR(new_dentry->d_inode->i_mode)) {
8128     + struct unionfs_dir_state *namelist = NULL;
8129     + /* check if this unionfs directory is empty or not */
8130     + err = check_empty(new_dentry, new_parent, &namelist);
8131     + if (err)
8132     + goto out;
8133     +
8134     + if (!is_robranch(new_dentry))
8135     + err = delete_whiteouts(new_dentry,
8136     + dbstart(new_dentry),
8137     + namelist);
8138     +
8139     + free_rdstate(namelist);
8140     +
8141     + if (err)
8142     + goto out;
8143     + }
8144     + }
8145     +
8146     + err = do_unionfs_rename(old_dir, old_dentry, old_parent,
8147     + new_dir, new_dentry, new_parent);
8148     + if (err)
8149     + goto out;
8150     +
8151     + /*
8152     + * force re-lookup since the dir on ro branch is not renamed, and
8153     + * lower dentries still indicate the un-renamed ones.
8154     + */
8155     + if (S_ISDIR(old_dentry->d_inode->i_mode))
8156     + atomic_dec(&UNIONFS_D(old_dentry)->generation);
8157     + else
8158     + unionfs_postcopyup_release(old_dentry);
8159     + if (new_dentry->d_inode && !S_ISDIR(new_dentry->d_inode->i_mode)) {
8160     + unionfs_postcopyup_release(new_dentry);
8161     + unionfs_postcopyup_setmnt(new_dentry);
8162     + if (!unionfs_lower_inode(new_dentry->d_inode)) {
8163     + /*
8164     + * If we get here, it means that no copyup was
8165     + * needed, and that a file by the old name already
8166     + * existed on the destination branch; that file got
8167     + * renamed earlier in this function, so all we need
8168     + * to do here is set the lower inode.
8169     + */
8170     + struct inode *inode;
8171     + inode = unionfs_lower_inode(old_dentry->d_inode);
8172     + igrab(inode);
8173     + unionfs_set_lower_inode_idx(new_dentry->d_inode,
8174     + dbstart(new_dentry),
8175     + inode);
8176     + }
8177     + }
8178     + /* if all of this renaming succeeded, update our times */
8179     + unionfs_copy_attr_times(old_dentry->d_inode);
8180     + unionfs_copy_attr_times(new_dentry->d_inode);
8181     + unionfs_check_inode(old_dir);
8182     + unionfs_check_inode(new_dir);
8183     + unionfs_check_dentry(old_dentry);
8184     + unionfs_check_dentry(new_dentry);
8185     +
8186     +out:
8187     + if (err) /* clear the new_dentry stuff created */
8188     + d_drop(new_dentry);
8189     +
8190     + unionfs_double_unlock_dentry(old_dentry, new_dentry);
8191     + if (new_parent != old_dentry &&
8192     + new_parent != new_dentry &&
8193     + new_parent != old_parent)
8194     + unionfs_unlock_dentry(new_parent);
8195     + if (old_parent != old_dentry &&
8196     + old_parent != new_dentry)
8197     + unionfs_unlock_dentry(old_parent);
8198     + dput(new_parent);
8199     + dput(old_parent);
8200     + unionfs_read_unlock(old_dentry->d_sb);
8201     +
8202     + return err;
8203     +}
8204     diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
8205     new file mode 100644
8206     index 0000000..b923742
8207     --- /dev/null
8208     +++ b/fs/unionfs/sioq.c
8209     @@ -0,0 +1,101 @@
8210     +/*
8211     + * Copyright (c) 2006-2011 Erez Zadok
8212     + * Copyright (c) 2006 Charles P. Wright
8213     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8214     + * Copyright (c) 2006 Junjiro Okajima
8215     + * Copyright (c) 2006 David P. Quigley
8216     + * Copyright (c) 2006-2011 Stony Brook University
8217     + * Copyright (c) 2006-2011 The Research Foundation of SUNY
8218     + *
8219     + * This program is free software; you can redistribute it and/or modify
8220     + * it under the terms of the GNU General Public License version 2 as
8221     + * published by the Free Software Foundation.
8222     + */
8223     +
8224     +#include "union.h"
8225     +
8226     +/*
8227     + * Super-user IO work Queue - sometimes we need to perform actions which
8228     + * would fail due to the unix permissions on the parent directory (e.g.,
8229     + * rmdir a directory which appears empty, but in reality contains
8230     + * whiteouts).
8231     + */
8232     +
8233     +static struct workqueue_struct *superio_workqueue;
8234     +
8235     +int __init init_sioq(void)
8236     +{
8237     + int err;
8238     +
8239     + superio_workqueue = create_workqueue("unionfs_siod");
8240     + if (superio_workqueue)
8241     + return 0;
8242     +
8243     + err = -ENOMEM; /* create_workqueue() returns NULL on failure */
8244     + printk(KERN_ERR "unionfs: create_workqueue failed %d\n", err);
8245     + superio_workqueue = NULL;
8246     + return err;
8247     +}
8248     +
8249     +void stop_sioq(void)
8250     +{
8251     + if (superio_workqueue)
8252     + destroy_workqueue(superio_workqueue);
8253     +}
8254     +
8255     +void run_sioq(work_func_t func, struct sioq_args *args)
8256     +{
8257     + INIT_WORK(&args->work, func);
8258     +
8259     + init_completion(&args->comp);
8260     + while (!queue_work(superio_workqueue, &args->work)) {
8261     + /* TODO: do accounting if needed */
8262     + schedule();
8263     + }
8264     + wait_for_completion(&args->comp);
8265     +}
8266     +
8267     +void __unionfs_create(struct work_struct *work)
8268     +{
8269     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8270     + struct create_args *c = &args->create;
8271     +
8272     + args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
8273     + complete(&args->comp);
8274     +}
8275     +
8276     +void __unionfs_mkdir(struct work_struct *work)
8277     +{
8278     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8279     + struct mkdir_args *m = &args->mkdir;
8280     +
8281     + args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
8282     + complete(&args->comp);
8283     +}
8284     +
8285     +void __unionfs_mknod(struct work_struct *work)
8286     +{
8287     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8288     + struct mknod_args *m = &args->mknod;
8289     +
8290     + args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
8291     + complete(&args->comp);
8292     +}
8293     +
8294     +void __unionfs_symlink(struct work_struct *work)
8295     +{
8296     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8297     + struct symlink_args *s = &args->symlink;
8298     +
8299     + args->err = vfs_symlink(s->parent, s->dentry, s->symbuf);
8300     + complete(&args->comp);
8301     +}
8302     +
8303     +void __unionfs_unlink(struct work_struct *work)
8304     +{
8305     + struct sioq_args *args = container_of(work, struct sioq_args, work);
8306     + struct unlink_args *u = &args->unlink;
8307     +
8308     + args->err = vfs_unlink(u->parent, u->dentry);
8309     + complete(&args->comp);
8310     +}
8311     diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
8312     new file mode 100644
8313     index 0000000..c2dfb94
8314     --- /dev/null
8315     +++ b/fs/unionfs/sioq.h
8316     @@ -0,0 +1,91 @@
8317     +/*
8318     + * Copyright (c) 2006-2011 Erez Zadok
8319     + * Copyright (c) 2006 Charles P. Wright
8320     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
8321     + * Copyright (c) 2006 Junjiro Okajima
8322     + * Copyright (c) 2006 David P. Quigley
8323     + * Copyright (c) 2006-2011 Stony Brook University
8324     + * Copyright (c) 2006-2011 The Research Foundation of SUNY
8325     + *
8326     + * This program is free software; you can redistribute it and/or modify
8327     + * it under the terms of the GNU General Public License version 2 as
8328     + * published by the Free Software Foundation.
8329     + */
8330     +
8331     +#ifndef _SIOQ_H
8332     +#define _SIOQ_H
8333     +
8334     +struct deletewh_args {
8335     + struct unionfs_dir_state *namelist;
8336     + struct dentry *dentry;
8337     + int bindex;
8338     +};
8339     +
8340     +struct is_opaque_args {
8341     + struct dentry *dentry;
8342     +};
8343     +
8344     +struct create_args {
8345     + struct inode *parent;
8346     + struct dentry *dentry;
8347     + umode_t mode;
8348     + struct nameidata *nd;
8349     +};
8350     +
8351     +struct mkdir_args {
8352     + struct inode *parent;
8353     + struct dentry *dentry;
8354     + umode_t mode;
8355     +};
8356     +
8357     +struct mknod_args {
8358     + struct inode *parent;
8359     + struct dentry *dentry;
8360     + umode_t mode;
8361     + dev_t dev;
8362     +};
8363     +
8364     +struct symlink_args {
8365     + struct inode *parent;
8366     + struct dentry *dentry;
8367     + char *symbuf;
8368     +};
8369     +
8370     +struct unlink_args {
8371     + struct inode *parent;
8372     + struct dentry *dentry;
8373     +};
8374     +
8375     +
8376     +struct sioq_args {
8377     + struct completion comp;
8378     + struct work_struct work;
8379     + int err;
8380     + void *ret;
8381     +
8382     + union {
8383     + struct deletewh_args deletewh;
8384     + struct is_opaque_args is_opaque;
8385     + struct create_args create;
8386     + struct mkdir_args mkdir;
8387     + struct mknod_args mknod;
8388     + struct symlink_args symlink;
8389     + struct unlink_args unlink;
8390     + };
8391     +};
8392     +
8393     +/* Extern definitions for SIOQ functions */
8394     +extern int __init init_sioq(void);
8395     +extern void stop_sioq(void);
8396     +extern void run_sioq(work_func_t func, struct sioq_args *args);
8397     +
8398     +/* Extern definitions for our privilege escalation helpers */
8399     +extern void __unionfs_create(struct work_struct *work);
8400     +extern void __unionfs_mkdir(struct work_struct *work);
8401     +extern void __unionfs_mknod(struct work_struct *work);
8402     +extern void __unionfs_symlink(struct work_struct *work);
8403     +extern void __unionfs_unlink(struct work_struct *work);
8404     +extern void __delete_whiteouts(struct work_struct *work);
8405     +extern void __is_opaque_dir(struct work_struct *work);
8406     +
8407     +#endif /* not _SIOQ_H */
8408     diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
8409     new file mode 100644
8410     index 0000000..bdca2f7
8411     --- /dev/null
8412     +++ b/fs/unionfs/subr.c
8413     @@ -0,0 +1,95 @@
8414     +/*
8415     + * Copyright (c) 2003-2011 Erez Zadok
8416     + * Copyright (c) 2003-2006 Charles P. Wright
8417     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8418     + * Copyright (c) 2005-2006 Junjiro Okajima
8419     + * Copyright (c) 2005 Arun M. Krishnakumar
8420     + * Copyright (c) 2004-2006 David P. Quigley
8421     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8422     + * Copyright (c) 2003 Puja Gupta
8423     + * Copyright (c) 2003 Harikesavan Krishnan
8424     + * Copyright (c) 2003-2011 Stony Brook University
8425     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
8426     + *
8427     + * This program is free software; you can redistribute it and/or modify
8428     + * it under the terms of the GNU General Public License version 2 as
8429     + * published by the Free Software Foundation.
8430     + */
8431     +
8432     +#include "union.h"
8433     +
8434     +/*
8435     + * returns the right i_nlink value based on the inode type
8436     + */
8437     +int unionfs_get_nlinks(const struct inode *inode)
8438     +{
8439     + /* don't bother to do all the work since we're unlinked */
8440     + if (inode->i_nlink == 0)
8441     + return 0;
8442     +
8443     + if (!S_ISDIR(inode->i_mode))
8444     + return unionfs_lower_inode(inode)->i_nlink;
8445     +
8446     + /*
8447     + * For directories, we return 1. The only place that could care
8448     + * about links is readdir, and there's d_type there so even that
8449     + * doesn't matter.
8450     + */
8451     + return 1;
8452     +}
8453     +
8454     +/* copy a/m/ctime from the lower branch with the newest times */
8455     +void unionfs_copy_attr_times(struct inode *upper)
8456     +{
8457     + int bindex;
8458     + struct inode *lower;
8459     +
8460     + if (!upper)
8461     + return;
8462     + if (ibstart(upper) < 0) {
8463     +#ifdef CONFIG_UNION_FS_DEBUG
8464     + WARN_ON(ibstart(upper) < 0);
8465     +#endif /* CONFIG_UNION_FS_DEBUG */
8466     + return;
8467     + }
8468     + for (bindex = ibstart(upper); bindex <= ibend(upper); bindex++) {
8469     + lower = unionfs_lower_inode_idx(upper, bindex);
8470     + if (!lower)
8471     + continue; /* not all lower dir objects may exist */
8472     + if (unlikely(timespec_compare(&upper->i_mtime,
8473     + &lower->i_mtime) < 0))
8474     + upper->i_mtime = lower->i_mtime;
8475     + if (unlikely(timespec_compare(&upper->i_ctime,
8476     + &lower->i_ctime) < 0))
8477     + upper->i_ctime = lower->i_ctime;
8478     + if (unlikely(timespec_compare(&upper->i_atime,
8479     + &lower->i_atime) < 0))
8480     + upper->i_atime = lower->i_atime;
8481     + }
8482     +}
8483     +
8484     +/*
8485     + * A unionfs/fanout version of fsstack_copy_attr_all. Uses
8486     + * unionfs_get_nlinks to properly calculate the number of links to a file.
8487     + * Also, copies the max() of all a/m/ctimes for all lower inodes (which is
8488     + * important if the lower inode is a directory type)
8489     + */
8490     +void unionfs_copy_attr_all(struct inode *dest,
8491     + const struct inode *src)
8492     +{
8493     + dest->i_mode = src->i_mode;
8494     + dest->i_uid = src->i_uid;
8495     + dest->i_gid = src->i_gid;
8496     + dest->i_rdev = src->i_rdev;
8497     +
8498     + unionfs_copy_attr_times(dest);
8499     +
8500     + dest->i_blkbits = src->i_blkbits;
8501     + dest->i_flags = src->i_flags;
8502     +
8503     + /*
8504     + * Update the nlinks AFTER updating the above fields, because
8505     + * unionfs_get_nlinks() may depend on them.
8506     + */
8507     + dest->i_nlink = unionfs_get_nlinks(dest);
8508     +}
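The time propagation in unionfs_copy_attr_times() above amounts to raising the upper timestamp to the maximum over all existing lower inodes. The following is a minimal userspace sketch of that idea for a single timestamp, not the kernel code. It assumes plain `struct timespec` values instead of inodes; `ts_compare` is a hypothetical stand-in for the kernel's `timespec_compare()`, and `copy_newest_time` is a made-up name.

```c
#include <stddef.h>
#include <time.h>

/* Userspace stand-in for the kernel's timespec_compare(). */
int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Raise *upper to the newest of the lower timestamps. NULL slots are
 * skipped, since not every branch has to hold the object. *upper is
 * never lowered, only raised.
 */
void copy_newest_time(struct timespec *upper,
		      const struct timespec *const *lower, int nbranches)
{
	int bindex;

	for (bindex = 0; bindex < nbranches; bindex++) {
		if (!lower[bindex])
			continue;
		if (ts_compare(upper, lower[bindex]) < 0)
			*upper = *lower[bindex];
	}
}
```

In the patch the same comparison is applied separately to i_mtime, i_ctime and i_atime, so the three upper times may end up coming from different branches.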
8509     diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
8510     new file mode 100644
8511     index 0000000..c3ac814
8512     --- /dev/null
8513     +++ b/fs/unionfs/super.c
8514     @@ -0,0 +1,1030 @@
8515     +/*
8516     + * Copyright (c) 2003-2011 Erez Zadok
8517     + * Copyright (c) 2003-2006 Charles P. Wright
8518     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
8519     + * Copyright (c) 2005-2006 Junjiro Okajima
8520     + * Copyright (c) 2005 Arun M. Krishnakumar
8521     + * Copyright (c) 2004-2006 David P. Quigley
8522     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
8523     + * Copyright (c) 2003 Puja Gupta
8524     + * Copyright (c) 2003 Harikesavan Krishnan
8525     + * Copyright (c) 2003-2011 Stony Brook University
8526     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
8527     + *
8528     + * This program is free software; you can redistribute it and/or modify
8529     + * it under the terms of the GNU General Public License version 2 as
8530     + * published by the Free Software Foundation.
8531     + */
8532     +
8533     +#include "union.h"
8534     +
8535     +/*
8536     + * The inode cache is used with alloc_inode for both our inode info and the
8537     + * vfs inode.
8538     + */
8539     +static struct kmem_cache *unionfs_inode_cachep;
8540     +
8541     +struct inode *unionfs_iget(struct super_block *sb, unsigned long ino)
8542     +{
8543     + int size;
8544     + struct unionfs_inode_info *info;
8545     + struct inode *inode;
8546     +
8547     + inode = iget_locked(sb, ino);
8548     + if (!inode)
8549     + return ERR_PTR(-ENOMEM);
8550     + if (!(inode->i_state & I_NEW))
8551     + return inode;
8552     +
8553     + info = UNIONFS_I(inode);
8554     + memset(info, 0, offsetof(struct unionfs_inode_info, vfs_inode));
8555     + info->bstart = -1;
8556     + info->bend = -1;
8557     + atomic_set(&info->generation,
8558     + atomic_read(&UNIONFS_SB(inode->i_sb)->generation));
8559     + spin_lock_init(&info->rdlock);
8560     + info->rdcount = 1;
8561     + info->hashsize = -1;
8562     + INIT_LIST_HEAD(&info->readdircache);
8563     +
8564     + size = sbmax(inode->i_sb) * sizeof(struct inode *);
8565     + info->lower_inodes = kzalloc(size, GFP_KERNEL);
8566     + if (unlikely(!info->lower_inodes)) {
8567     + printk(KERN_CRIT "unionfs: no kernel memory when allocating "
8568     + "lower-pointer array!\n");
8569     + iget_failed(inode);
8570     + return ERR_PTR(-ENOMEM);
8571     + }
8572     +
8573     + inode->i_version++;
8574     + inode->i_op = &unionfs_main_iops;
8575     + inode->i_fop = &unionfs_main_fops;
8576     +
8577     + inode->i_mapping->a_ops = &unionfs_aops;
8578     +
8579     + /*
8580     + * reset times so unionfs_copy_attr_all can keep our time invariants
8581     + * right (upper inode time being the max of all lower ones).
8582     + */
8583     + inode->i_atime.tv_sec = inode->i_atime.tv_nsec = 0;
8584     + inode->i_mtime.tv_sec = inode->i_mtime.tv_nsec = 0;
8585     + inode->i_ctime.tv_sec = inode->i_ctime.tv_nsec = 0;
8586     + unlock_new_inode(inode);
8587     + return inode;
8588     +}
8589     +
8590     +/*
8591     + * final actions when unmounting a file system
8592     + *
8593     + * No need to lock rwsem.
8594     + */
8595     +static void unionfs_put_super(struct super_block *sb)
8596     +{
8597     + int bindex, bstart, bend;
8598     + struct unionfs_sb_info *spd;
8599     + int leaks = 0;
8600     +
8601     + spd = UNIONFS_SB(sb);
8602     + if (!spd)
8603     + return;
8604     +
8605     + bstart = sbstart(sb);
8606     + bend = sbend(sb);
8607     +
8608     + /* Make sure we have no leaks of branchget/branchput. */
8609     + for (bindex = bstart; bindex <= bend; bindex++)
8610     + if (unlikely(branch_count(sb, bindex) != 0)) {
8611     + printk(KERN_CRIT
8612     + "unionfs: branch %d has %d references left!\n",
8613     + bindex, branch_count(sb, bindex));
8614     + leaks = 1;
8615     + }
8616     + WARN_ON(leaks != 0);
8617     +
8618     + /* decrement lower super references */
8619     + for (bindex = bstart; bindex <= bend; bindex++) {
8620     + struct super_block *s;
8621     + s = unionfs_lower_super_idx(sb, bindex);
8622     + unionfs_set_lower_super_idx(sb, bindex, NULL);
8623     + atomic_dec(&s->s_active);
8624     + }
8625     +
8626     + kfree(spd->dev_name);
8627     + kfree(spd->data);
8628     + kfree(spd);
8629     + sb->s_fs_info = NULL;
8630     +}
8631     +
8632     +/*
8633     + * Since people use this to answer the "How big of a file can I write?"
8634     + * question, we report the size of the highest priority branch as the size of
8635     + * the union.
8636     + */
8637     +static int unionfs_statfs(struct dentry *dentry, struct kstatfs *buf)
8638     +{
8639     + int err = 0;
8640     + struct super_block *sb;
8641     + struct dentry *lower_dentry;
8642     + struct dentry *parent;
8643     + struct path lower_path;
8644     + bool valid;
8645     +
8646     + sb = dentry->d_sb;
8647     +
8648     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
8649     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
8650     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
8651     +
8652     + valid = __unionfs_d_revalidate(dentry, parent, false);
8653     + if (unlikely(!valid)) {
8654     + err = -ESTALE;
8655     + goto out;
8656     + }
8657     + unionfs_check_dentry(dentry);
8658     +
8659     + lower_dentry = unionfs_lower_dentry(sb->s_root);
8660     + lower_path.dentry = lower_dentry;
8661     + lower_path.mnt = unionfs_mntget(sb->s_root, 0);
8662     + err = vfs_statfs(&lower_path, buf);
8663     + mntput(lower_path.mnt);
8664     +
8665     + /* set return buf to our f/s to avoid confusing user-level utils */
8666     + buf->f_type = UNIONFS_SUPER_MAGIC;
8667     + /*
8668     + * Our maximum file name length is shorter by a few bytes because every
8669     + * file name could potentially be whited-out.
8670     + *
8671     + * XXX: this restriction goes away with ODF.
8672     + */
8673     + unionfs_set_max_namelen(&buf->f_namelen);
8674     +
8675     + /*
8676     + * reset two fields to avoid confusing user-land.
8677     + * XXX: is this still necessary?
8678     + */
8679     + memset(&buf->f_fsid, 0, sizeof(__kernel_fsid_t));
8680     + memset(&buf->f_spare, 0, sizeof(buf->f_spare));
8681     +
8682     +out:
8683     + unionfs_check_dentry(dentry);
8684     + unionfs_unlock_dentry(dentry);
8685     + unionfs_unlock_parent(dentry, parent);
8686     + unionfs_read_unlock(sb);
8687     + return err;
8688     +}
8689     +
8690     +/* handle mode changing during remount */
8691     +static noinline_for_stack int do_remount_mode_option(
8692     + char *optarg,
8693     + int cur_branches,
8694     + struct unionfs_data *new_data,
8695     + struct path *new_lower_paths)
8696     +{
8697     + int err = -EINVAL;
8698     + int perms, idx;
8699     + char *modename = strchr(optarg, '=');
8700     + struct path path;
8701     +
8702     + /* by now, optarg contains the branch name */
8703     + if (!*optarg) {
8704     + printk(KERN_ERR
8705     + "unionfs: no branch specified for mode change\n");
8706     + goto out;
8707     + }
8708     + if (!modename) {
8709     + printk(KERN_ERR "unionfs: branch \"%s\" requires a mode\n",
8710     + optarg);
8711     + goto out;
8712     + }
8713     + *modename++ = '\0';
8714     + err = parse_branch_mode(modename, &perms);
8715     + if (err) {
8716     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for \"%s\"\n",
8717     + modename, optarg);
8718     + goto out;
8719     + }
8720     +
8721     + /*
8722     + * Find matching branch index. For now, this assumes that nothing
8723     + * has been mounted on top of this Unionfs stack. Once we have /odf
8724     + * and cache-coherency resolved, we'll address the branch-path
8725     + * uniqueness.
8726     + */
8727     + err = kern_path(optarg, LOOKUP_FOLLOW, &path);
8728     + if (err) {
8729     + printk(KERN_ERR "unionfs: error accessing "
8730     + "lower directory \"%s\" (error %d)\n",
8731     + optarg, err);
8732     + goto out;
8733     + }
8734     + for (idx = 0; idx < cur_branches; idx++)
8735     + if (path.mnt == new_lower_paths[idx].mnt &&
8736     + path.dentry == new_lower_paths[idx].dentry)
8737     + break;
8738     + path_put(&path); /* no longer needed */
8739     + if (idx == cur_branches) {
8740     + err = -ENOENT; /* err may have been reset above */
8741     + printk(KERN_ERR "unionfs: branch \"%s\" "
8742     + "not found\n", optarg);
8743     + goto out;
8744     + }
8745     + /* check/change mode for existing branch */
8746     + /* we don't warn if perms==branchperms */
8747     + new_data[idx].branchperms = perms;
8748     + err = 0;
8749     +out:
8750     + return err;
8751     +}
8752     +
8753     +/* handle branch deletion during remount */
8754     +static noinline_for_stack int do_remount_del_option(
8755     + char *optarg, int cur_branches,
8756     + struct unionfs_data *new_data,
8757     + struct path *new_lower_paths)
8758     +{
8759     + int err = -EINVAL;
8760     + int idx;
8761     + struct path path;
8762     +
8763     + /* optarg contains the branch name to delete */
8764     +
8765     + /*
8766     + * Find matching branch index. For now, this assumes that nothing
8767     + * has been mounted on top of this Unionfs stack. Once we have /odf
8768     + * and cache-coherency resolved, we'll address the branch-path
8769     + * uniqueness.
8770     + */
8771     + err = kern_path(optarg, LOOKUP_FOLLOW, &path);
8772     + if (err) {
8773     + printk(KERN_ERR "unionfs: error accessing "
8774     + "lower directory \"%s\" (error %d)\n",
8775     + optarg, err);
8776     + goto out;
8777     + }
8778     + for (idx = 0; idx < cur_branches; idx++)
8779     + if (path.mnt == new_lower_paths[idx].mnt &&
8780     + path.dentry == new_lower_paths[idx].dentry)
8781     + break;
8782     + path_put(&path); /* no longer needed */
8783     + if (idx == cur_branches) {
8784     + printk(KERN_ERR "unionfs: branch \"%s\" "
8785     + "not found\n", optarg);
8786     + err = -ENOENT;
8787     + goto out;
8788     + }
8789     + /* check if there are any open files on the branch to be deleted */
8790     + if (atomic_read(&new_data[idx].open_files) > 0) {
8791     + err = -EBUSY;
8792     + goto out;
8793     + }
8794     +
8795     + /*
8796     + * Now we have to delete the branch. First, release any handles it
8797     + * has. Then, move the remaining array indexes past "idx" in
8798     + * new_data and new_lower_paths one to the left. Finally, adjust
8799     + * cur_branches.
8800     + */
8801     + path_put(&new_lower_paths[idx]);
8802     +
8803     + if (idx < cur_branches - 1) {
8804     + /* if idx==cur_branches-1, we delete last branch: easy */
8805     + memmove(&new_data[idx], &new_data[idx+1],
8806     + (cur_branches - 1 - idx) *
8807     + sizeof(struct unionfs_data));
8808     + memmove(&new_lower_paths[idx], &new_lower_paths[idx+1],
8809     + (cur_branches - 1 - idx) * sizeof(struct path));
8810     + }
8811     +
8812     + err = 0;
8813     +out:
8814     + return err;
8815     +}
8816     +
8817     +/* handle branch insertion during remount */
8818     +static noinline_for_stack int do_remount_add_option(
8819     + char *optarg, int cur_branches,
8820     + struct unionfs_data *new_data,
8821     + struct path *new_lower_paths,
8822     + int *high_branch_id)
8823     +{
8824     + int err = -EINVAL;
8825     + int perms;
8826     + int idx = 0; /* default: insert at beginning */
8827     + char *new_branch, *modename = NULL;
8828     + struct path path;
8829     +
8830     + /*
8831     + * optarg can be of several forms:
8832     + *
8833     + * /bar:/foo insert /foo before /bar
8834     + * /bar:/foo=ro insert /foo in ro mode before /bar
8835     + * /foo insert /foo in the beginning (prepend)
8836     + * :/foo insert /foo at the end (append)
8837     + */
8838     + if (*optarg == ':') { /* append? */
8839     + new_branch = optarg + 1; /* skip ':' */
8840     + idx = cur_branches;
8841     + goto found_insertion_point;
8842     + }
8843     + new_branch = strchr(optarg, ':');
8844     + if (!new_branch) { /* prepend? */
8845     + new_branch = optarg;
8846     + goto found_insertion_point;
8847     + }
8848     + *new_branch++ = '\0'; /* holds path+mode of new branch */
8849     +
8850     + /*
8851     + * Find matching branch index. For now, this assumes that nothing
8852     + * has been mounted on top of this Unionfs stack. Once we have /odf
8853     + * and cache-coherency resolved, we'll address the branch-path
8854     + * uniqueness.
8855     + */
8856     + err = kern_path(optarg, LOOKUP_FOLLOW, &path);
8857     + if (err) {
8858     + printk(KERN_ERR "unionfs: error accessing "
8859     + "lower directory \"%s\" (error %d)\n",
8860     + optarg, err);
8861     + goto out;
8862     + }
8863     + for (idx = 0; idx < cur_branches; idx++)
8864     + if (path.mnt == new_lower_paths[idx].mnt &&
8865     + path.dentry == new_lower_paths[idx].dentry)
8866     + break;
8867     + path_put(&path); /* no longer needed */
8868     + if (idx == cur_branches) {
8869     + printk(KERN_ERR "unionfs: branch \"%s\" "
8870     + "not found\n", optarg);
8871     + err = -ENOENT;
8872     + goto out;
8873     + }
8874     +
8875     + /*
8876     + * At this point idx holds the index before which the new branch
8877     + * should be inserted.
8878     + */
8879     +found_insertion_point:
8880     + /* find the mode for the new branch */
8881     + if (new_branch)
8882     + modename = strchr(new_branch, '=');
8883     + if (modename)
8884     + *modename++ = '\0';
8885     + if (!new_branch || !*new_branch) {
8886     + printk(KERN_ERR "unionfs: null new branch\n");
8887     + err = -EINVAL;
8888     + goto out;
8889     + }
8890     + err = parse_branch_mode(modename, &perms);
8891     + if (err) {
8892     + printk(KERN_ERR "unionfs: invalid mode \"%s\" for "
8893     + "branch \"%s\"\n", modename, new_branch);
8894     + goto out;
8895     + }
8896     + err = kern_path(new_branch, LOOKUP_FOLLOW, &path);
8897     + if (err) {
8898     + printk(KERN_ERR "unionfs: error accessing "
8899     + "lower directory \"%s\" (error %d)\n",
8900     + new_branch, err);
8901     + goto out;
8902     + }
8903     + /*
8904     + * It's probably safe to run check_branch on the new branch. Note:
8905     + * we don't allow inserting branches which are themselves unionfs
8906     + * mounts (check_branch returns EINVAL in that case). This is
8907     + * because this code base doesn't support stacking unionfs: the ODF
8908     + * code base supports that correctly.
8909     + */
8910     + err = check_branch(&path);
8911     + if (err) {
8912     + printk(KERN_ERR "unionfs: lower directory "
8913     + "\"%s\" is not a valid branch\n", new_branch);
8914     + path_put(&path);
8915     + goto out;
8916     + }
8917     +
8918     + /*
8919     + * Now we have to insert the new branch. But first, move the bits
8920     + * to make space for the new branch, if needed. Finally, adjust
8921     + * cur_branches.
8922     + * We don't release path here; it's kept until umount/remount.
8923     + */
8924     + if (idx < cur_branches) {
8925     + /* if idx==cur_branches, we append: easy */
8926     + memmove(&new_data[idx+1], &new_data[idx],
8927     + (cur_branches - idx) * sizeof(struct unionfs_data));
8928     + memmove(&new_lower_paths[idx+1], &new_lower_paths[idx],
8929     + (cur_branches - idx) * sizeof(struct path));
8930     + }
8931     + new_lower_paths[idx].dentry = path.dentry;
8932     + new_lower_paths[idx].mnt = path.mnt;
8933     +
8934     + new_data[idx].sb = path.dentry->d_sb;
8935     + atomic_set(&new_data[idx].open_files, 0);
8936     + new_data[idx].branchperms = perms;
8937     + new_data[idx].branch_id = ++*high_branch_id; /* assign new branch ID */
8938     +
8939     + err = 0;
8940     +out:
8941     + return err;
8942     +}
8943     +
8944     +
8945     +/*
8946     + * Support branch management options on remount.
8947     + *
8948     + * See Documentation/filesystems/unionfs/ for details.
8949     + *
8950     + * @flags: numeric mount options
8951     + * @options: mount options string
8952     + *
8953     + * This function can rearrange a mounted union dynamically, adding and
8954     + * removing branches, including changing branch modes. Clearly this has to
8955     + * be done safely and atomically. Luckily, the VFS already calls this
8956     + * function with lock_super(sb) and lock_kernel() held, preventing
8957     + * concurrent mixing of new mounts, remounts, and unmounts. Moreover,
8958     + * do_remount_sb(), our caller function, already called shrink_dcache_sb(sb)
8959     + * to purge dentries/inodes from our superblock, and also called
8960     + * fsync_super(sb) to purge any dirty pages. So we're good.
8961     + *
8962     + * XXX: however, our remount code may also need to invalidate mapped pages
8963     + * so as to force them to be re-gotten from the (newly reconfigured) lower
8964     + * branches. This has to wait for proper mmap and cache coherency support
8965     + * in the VFS.
8966     + *
8967     + */
8968     +static int unionfs_remount_fs(struct super_block *sb, int *flags,
8969     + char *options)
8970     +{
8971     + int err = 0;
8972     + int i;
8973     + char *optionstmp, *tmp_to_free; /* kstrdup'ed copy of "options" */
8974     + char *optname;
8975     + int cur_branches = 0; /* no. of current branches */
8976     + int new_branches = 0; /* no. of branches actually left in the end */
8977     + int add_branches; /* est. no. of branches to add */
8978     + int del_branches; /* est. no. of branches to del */
8979     + int max_branches; /* max possible no. of branches */
8980     + struct unionfs_data *new_data = NULL, *tmp_data = NULL;
8981     + struct path *new_lower_paths = NULL, *tmp_lower_paths = NULL;
8982     + struct inode **new_lower_inodes = NULL;
8983     + int new_high_branch_id; /* new high branch ID */
8984     + int size; /* memory allocation size, temp var */
8985     + int old_ibstart, old_ibend;
8986     +
8987     + unionfs_write_lock(sb);
8988     +
8989     + /*
8990     + * The VFS will take care of "ro" and "rw" flags, and we can safely
8991     + * ignore MS_SILENT, but anything else left over is an error. So we
8992     + * need to check if any other flags may have been passed (none are
8993     + * allowed/supported as of now).
8994     + */
8995     + if ((*flags & ~(MS_RDONLY | MS_SILENT)) != 0) {
8996     + printk(KERN_ERR
8997     + "unionfs: remount flags 0x%x unsupported\n", *flags);
8998     + err = -EINVAL;
8999     + goto out_error;
9000     + }
9001     +
9002     + /*
9003     + * If 'options' is NULL, it's probably because the user just changed
9004     + * the union to a "ro" or "rw" and the VFS took care of it. So
9005     + * nothing to do and we're done.
9006     + */
9007     + if (!options || options[0] == '\0')
9008     + goto out_error;
9009     +
9010     + /*
9011     + * Find out how many branches we will have in the end, counting
9012     + * "add" and "del" commands. Copy the "options" string because
9013     + * strsep modifies the string and we need it later.
9014     + */
9015     + tmp_to_free = kstrdup(options, GFP_KERNEL);
9016     + optionstmp = tmp_to_free;
9017     + if (unlikely(!optionstmp)) {
9018     + err = -ENOMEM;
9019     + goto out_free;
9020     + }
9021     + cur_branches = sbmax(sb); /* current no. branches */
9022     + new_branches = sbmax(sb);
9023     + del_branches = 0;
9024     + add_branches = 0;
9025     + new_high_branch_id = sbhbid(sb); /* save current high_branch_id */
9026     + while ((optname = strsep(&optionstmp, ",")) != NULL) {
9027     + char *optarg;
9028     +
9029     + if (!optname || !*optname)
9030     + continue;
9031     +
9032     + optarg = strchr(optname, '=');
9033     + if (optarg)
9034     + *optarg++ = '\0';
9035     +
9036     + if (!strcmp("add", optname))
9037     + add_branches++;
9038     + else if (!strcmp("del", optname))
9039     + del_branches++;
9040     + }
9041     + kfree(tmp_to_free);
9042     + /* after all changes, will we have at least one branch left? */
9043     + if ((new_branches + add_branches - del_branches) < 1) {
9044     + printk(KERN_ERR
9045     + "unionfs: no branches left after remount\n");
9046     + err = -EINVAL;
9047     + goto out_free;
9048     + }
9049     +
9050     + /*
9051     + * Since we haven't actually parsed all the add/del options, nor
9052     + * have we checked them for errors, we don't know for sure how many
9053     + * branches we will have after all changes have taken place. In
9054     + * fact, the total number of branches left could be less than what
9055     + * we have now. So we need to allocate space for a temporary
9056     + * placeholder that is at least as large as the maximum number of
9057     + * branches we *could* have, which is the current number plus all
9058     + * the additions. Once we're done with these temp placeholders, we
9059     + * may have to re-allocate the final size, copy over from the temp,
9060     + * and then free the temps (done near the end of this function).
9061     + */
9062     + max_branches = cur_branches + add_branches;
9063     + /* allocate space for new pointers to lower dentry */
9064     + tmp_data = kcalloc(max_branches,
9065     + sizeof(struct unionfs_data), GFP_KERNEL);
9066     + if (unlikely(!tmp_data)) {
9067     + err = -ENOMEM;
9068     + goto out_free;
9069     + }
9070     + /* allocate space for new pointers to lower paths */
9071     + tmp_lower_paths = kcalloc(max_branches,
9072     + sizeof(struct path), GFP_KERNEL);
9073     + if (unlikely(!tmp_lower_paths)) {
9074     + err = -ENOMEM;
9075     + goto out_free;
9076     + }
9077     + /* copy current info into new placeholders, incrementing refcnts */
9078     + memcpy(tmp_data, UNIONFS_SB(sb)->data,
9079     + cur_branches * sizeof(struct unionfs_data));
9080     + memcpy(tmp_lower_paths, UNIONFS_D(sb->s_root)->lower_paths,
9081     + cur_branches * sizeof(struct path));
9082     + for (i = 0; i < cur_branches; i++)
9083     + path_get(&tmp_lower_paths[i]); /* drop refs at end of fxn */
9084     +
9085     + /*******************************************************************
9086     + * For each branch command, do kern_path on the requested branch,
9087     + * and apply the change to a temp branch list. To handle errors, we
9088     + * already dup'ed the old arrays (above), and increased the refcnts
9089     + * on various f/s objects. So now we can do all the kern_path's
9090     + * and branch-management commands on the new arrays. If it fails
9091     + * midway, we free the tmp arrays and *put all objects. If we succeed,
9092     + * then we free old arrays and *put its objects, and then replace
9093     + * the arrays with the new tmp list (we may have to re-allocate the
9094     + * memory because the temp lists could have been larger than what we
9095     + * actually needed).
9096     + *******************************************************************/
9097     +
9098     + while ((optname = strsep(&options, ",")) != NULL) {
9099     + char *optarg;
9100     +
9101     + if (!optname || !*optname)
9102     + continue;
9103     + /*
9104     + * At this stage optname holds a comma-delimited option, but
9105     + * without the commas. Next, we need to break the string on
9106     + * the '=' symbol to separate CMD=ARG, where ARG itself can
9107     + * be KEY=VAL. For example, in mode=/foo=rw, CMD is "mode",
9108     + * KEY is "/foo", and VAL is "rw".
9109     + */
9110     + optarg = strchr(optname, '=');
9111     + if (optarg)
9112     + *optarg++ = '\0';
9113     + /* incgen remount option (instead of old ioctl) */
9114     + if (!strcmp("incgen", optname)) {
9115     + err = 0;
9116     + goto out_no_change;
9117     + }
9118     +
9119     + /*
9120     + * All of our options take an argument now. (Insert ones
9121     + * that don't above this check.) So at this stage optname
9122     + * contains the CMD part and optarg contains the ARG part.
9123     + */
9124     + if (!optarg || !*optarg) {
9125     + printk(KERN_ERR "unionfs: all remount options require "
9126     + "an argument (%s)\n", optname);
9127     + err = -EINVAL;
9128     + goto out_release;
9129     + }
9130     +
9131     + if (!strcmp("add", optname)) {
9132     + err = do_remount_add_option(optarg, new_branches,
9133     + tmp_data,
9134     + tmp_lower_paths,
9135     + &new_high_branch_id);
9136     + if (err)
9137     + goto out_release;
9138     + new_branches++;
9139     + if (new_branches > UNIONFS_MAX_BRANCHES) {
9140     + printk(KERN_ERR "unionfs: command exceeds "
9141     + "%d branches\n", UNIONFS_MAX_BRANCHES);
9142     + err = -E2BIG;
9143     + goto out_release;
9144     + }
9145     + continue;
9146     + }
9147     + if (!strcmp("del", optname)) {
9148     + err = do_remount_del_option(optarg, new_branches,
9149     + tmp_data,
9150     + tmp_lower_paths);
9151     + if (err)
9152     + goto out_release;
9153     + new_branches--;
9154     + continue;
9155     + }
9156     + if (!strcmp("mode", optname)) {
9157     + err = do_remount_mode_option(optarg, new_branches,
9158     + tmp_data,
9159     + tmp_lower_paths);
9160     + if (err)
9161     + goto out_release;
9162     + continue;
9163     + }
9164     +
9165     + /*
9166     + * When you use "mount -o remount,ro", mount(8) will
9167     + * reportedly pass the original dirs= string from
9168     + * /proc/mounts. So for now, we have to ignore dirs= and
9169     + * not consider it an error, unless we want to allow users
9170     + * to pass dirs= in remount. Note that to allow the VFS to
9171     + * actually process the ro/rw remount options, we have to
9172     + * return 0 from this function.
9173     + */
9174     + if (!strcmp("dirs", optname)) {
9175     + printk(KERN_WARNING
9176     + "unionfs: remount ignoring option \"%s\"\n",
9177     + optname);
9178     + continue;
9179     + }
9180     +
9181     + err = -EINVAL;
9182     + printk(KERN_ERR
9183     + "unionfs: unrecognized option \"%s\"\n", optname);
9184     + goto out_release;
9185     + }
9186     +
9187     +out_no_change:
9188     +
9189     + /******************************************************************
9190     + * WE'RE ALMOST DONE: check if leftmost branch might be read-only,
9191     + * see if we need to allocate a small-sized new vector, copy the
9192     + * vectors to their correct place, release the refcnt of the older
9193     + * ones, and return. Also handle invalidating any pages that will
9194     + * have to be re-read.
9195     + *******************************************************************/
9196     +
9197     + if (!(tmp_data[0].branchperms & MAY_WRITE)) {
9198     + printk(KERN_ERR "unionfs: leftmost branch cannot be read-only "
9199     + "(use \"remount,ro\" to create a read-only union)\n");
9200     + err = -EINVAL;
9201     + goto out_release;
9202     + }
9203     +
9204     + /* (re)allocate space for new pointers to lower dentry */
9205     + size = new_branches * sizeof(struct unionfs_data);
9206     + new_data = krealloc(tmp_data, size, GFP_KERNEL);
9207     + if (unlikely(!new_data)) {
9208     + err = -ENOMEM;
9209     + goto out_release;
9210     + }
9211     +
9212     + /* allocate space for new pointers to lower paths */
9213     + size = new_branches * sizeof(struct path);
9214     + new_lower_paths = krealloc(tmp_lower_paths, size, GFP_KERNEL);
9215     + if (unlikely(!new_lower_paths)) {
9216     + err = -ENOMEM;
9217     + goto out_release;
9218     + }
9219     +
9220     + /* allocate space for new pointers to lower inodes */
9221     + new_lower_inodes = kcalloc(new_branches,
9222     + sizeof(struct inode *), GFP_KERNEL);
9223     + if (unlikely(!new_lower_inodes)) {
9224     + err = -ENOMEM;
9225     + goto out_release;
9226     + }
9227     +
9228     + /*
9229     + * OK, just before we actually put the new set of branches in place,
9230     + * we need to ensure that our own f/s has no dirty objects left.
9231     + * Luckily, do_remount_sb() already calls shrink_dcache_sb(sb) and
9232     + * fsync_super(sb), taking care of dentries, inodes, and dirty
9233     + * pages. So all that's left is for us to invalidate any leftover
9234     + * (non-dirty) pages to ensure that they will be re-read from the
9235     + * new lower branches (and to support mmap).
9236     + */
9237     +
9238     + /*
9239     + * Once we finish the remounting successfully, our superblock
9240     + * generation number will have increased. This will be detected by
9241     + * our dentry-revalidation code upon subsequent f/s operations
9242     + * through unionfs. The revalidation code will rebuild the union of
9243     + * lower inodes for a given unionfs inode and invalidate any pages
9244     + * of such "stale" inodes (by calling our purge_inode_data
9245     + * function). This revalidation will happen lazily and
9246     + * incrementally, as users perform operations on cached inodes. We
9247     + * would like to encourage this revalidation to happen sooner if
9248     + * possible, so we would like to invalidate as many other pages in
9249     + * our superblock as we can. We used to call drop_pagecache_sb() or
9250     + * a variant thereof, but either method was racy (drop_caches alone
9251     + * is known to be racy). So now we let the revalidation happen on a
9252     + * per file basis in ->d_revalidate.
9253     + */
9254     +
9255     + /* grab new lower super references; release old ones */
9256     + for (i = 0; i < new_branches; i++)
9257     + atomic_inc(&new_data[i].sb->s_active);
9258     + for (i = 0; i < sbmax(sb); i++)
9259     + atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
9260     +
9261     + /* copy new vectors into their correct place */
9262     + tmp_data = UNIONFS_SB(sb)->data;
9263     + UNIONFS_SB(sb)->data = new_data;
9264     + new_data = NULL; /* so don't free good pointers below */
9265     + tmp_lower_paths = UNIONFS_D(sb->s_root)->lower_paths;
9266     + UNIONFS_D(sb->s_root)->lower_paths = new_lower_paths;
9267     + new_lower_paths = NULL; /* so don't free good pointers below */
9268     +
9269     + /* update our unionfs_sb_info and root dentry index of last branch */
9270     + i = sbmax(sb); /* save no. of branches to release at end */
9271     + sbend(sb) = new_branches - 1;
9272     + dbend(sb->s_root) = new_branches - 1;
9273     + old_ibstart = ibstart(sb->s_root->d_inode);
9274     + old_ibend = ibend(sb->s_root->d_inode);
9275     + ibend(sb->s_root->d_inode) = new_branches - 1;
9276     + UNIONFS_D(sb->s_root)->bcount = new_branches;
9277     + new_branches = i; /* no. of branches to release below */
9278     +
9279     + /*
9280     + * Update lower inodes: 3 steps
9281     + * 1. grab ref on all new lower inodes
9282     + */
9283     + for (i = dbstart(sb->s_root); i <= dbend(sb->s_root); i++) {
9284     + struct dentry *lower_dentry =
9285     + unionfs_lower_dentry_idx(sb->s_root, i);
9286     + igrab(lower_dentry->d_inode);
9287     + new_lower_inodes[i] = lower_dentry->d_inode;
9288     + }
9289     + /* 2. release reference on all older lower inodes */
9290     + iput_lowers(sb->s_root->d_inode, old_ibstart, old_ibend, true);
9291     + /* 3. update root dentry's inode to new lower_inodes array */
9292     + UNIONFS_I(sb->s_root->d_inode)->lower_inodes = new_lower_inodes;
9293     + new_lower_inodes = NULL;
9294     +
9295     + /* maxbytes may have changed */
9296     + sb->s_maxbytes = unionfs_lower_super_idx(sb, 0)->s_maxbytes;
9297     + /* update high branch ID */
9298     + sbhbid(sb) = new_high_branch_id;
9299     +
9300     + /* update our sb->generation for revalidating objects */
9301     + i = atomic_inc_return(&UNIONFS_SB(sb)->generation);
9302     + atomic_set(&UNIONFS_D(sb->s_root)->generation, i);
9303     + atomic_set(&UNIONFS_I(sb->s_root->d_inode)->generation, i);
9304     + if (!(*flags & MS_SILENT))
9305     + pr_info("unionfs: %s: new generation number %d\n",
9306     + UNIONFS_SB(sb)->dev_name, i);
9307     + /* finally, update the root dentry's times */
9308     + unionfs_copy_attr_times(sb->s_root->d_inode);
9309     + err = 0; /* reset to success */
9310     +
9311     + /*
9312     + * The code above falls through to the next label, and releases the
9313     + * refcnts of the older ones (stored in tmp_*): if we fell through
9314     + * here, it means success. However, if we jump directly to this
9315     + * label from any error above, then an error occurred after we
9316     + * grabbed various refcnts, and so we have to release the
9317     + * temporarily constructed structures.
9318     + */
9319     +out_release:
9320     + /* no need to clean up or release anything in tmp_data */
9321     + if (tmp_lower_paths)
9322     + for (i = 0; i < new_branches; i++)
9323     + path_put(&tmp_lower_paths[i]);
9324     +out_free:
9325     + kfree(tmp_lower_paths);
9326     + kfree(tmp_data);
9327     + kfree(new_lower_paths);
9328     + kfree(new_data);
9329     + kfree(new_lower_inodes);
9330     +out_error:
9331     + unionfs_check_dentry(sb->s_root);
9332     + unionfs_write_unlock(sb);
9333     + return err;
9334     +}
9335     +
9336     +/*
9337     + * Called by iput() when the inode reference count reached zero
9338     + * and the inode is not hashed anywhere. Used to clear anything
9339     + * that needs to be, before the inode is completely destroyed and put
9340     + * on the inode free list.
9341     + *
9342     + * No need to lock sb info's rwsem.
9343     + */
9344     +static void unionfs_evict_inode(struct inode *inode)
9345     +{
9346     + int bindex, bstart, bend;
9347     + struct inode *lower_inode;
9348     + struct list_head *pos, *n;
9349     + struct unionfs_dir_state *rdstate;
9350     +
9351     + truncate_inode_pages(&inode->i_data, 0);
9352     + end_writeback(inode);
9353     +
9354     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9355     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9356     + list_del(&rdstate->cache);
9357     + free_rdstate(rdstate);
9358     + }
9359     +
9360     + /*
9361     + * Decrement a reference to a lower_inode, which was incremented
9362     + * by our read_inode when it was created initially.
9363     + */
9364     + bstart = ibstart(inode);
9365     + bend = ibend(inode);
9366     + if (bstart >= 0) {
9367     + for (bindex = bstart; bindex <= bend; bindex++) {
9368     + lower_inode = unionfs_lower_inode_idx(inode, bindex);
9369     + if (!lower_inode)
9370     + continue;
9371     + unionfs_set_lower_inode_idx(inode, bindex, NULL);
9372     + /* see Documentation/filesystems/unionfs/issues.txt */
9373     + lockdep_off();
9374     + iput(lower_inode);
9375     + lockdep_on();
9376     + }
9377     + }
9378     +
9379     + kfree(UNIONFS_I(inode)->lower_inodes);
9380     + UNIONFS_I(inode)->lower_inodes = NULL;
9381     +}
9382     +
9383     +static struct inode *unionfs_alloc_inode(struct super_block *sb)
9384     +{
9385     + struct unionfs_inode_info *i;
9386     +
9387     + i = kmem_cache_alloc(unionfs_inode_cachep, GFP_KERNEL);
9388     + if (unlikely(!i))
9389     + return NULL;
9390     +
9391     + /* memset everything up to the inode to 0 */
9392     + memset(i, 0, offsetof(struct unionfs_inode_info, vfs_inode));
9393     +
9394     + i->vfs_inode.i_version = 1;
9395     + return &i->vfs_inode;
9396     +}
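
The allocator above zeroes only the private fields that precede the embedded `vfs_inode`, leaving the inode itself as the slab constructor (`init_once`) prepared it. The following user-space sketch illustrates that layout trick. All `sketch_*` names are hypothetical stand-ins, and the way `UNIONFS_I()` recovers the wrapper is an assumption, since its definition is not part of this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct inode. */
struct sketch_inode {
	unsigned long i_version;
	int preinitialised;	/* set once by a cache constructor */
};

/* Hypothetical stand-in for struct unionfs_inode_info. */
struct sketch_info {
	int bstart;
	int bend;
	void **lower;
	struct sketch_inode vfs_inode;	/* everything before this is zeroed */
};

/* Zero only the private fields that precede the embedded inode. */
static struct sketch_inode *sketch_reset(struct sketch_info *i)
{
	memset(i, 0, offsetof(struct sketch_info, vfs_inode));
	i->vfs_inode.i_version = 1;
	return &i->vfs_inode;
}

/* Recover the wrapper from the embedded inode (container_of style). */
static struct sketch_info *sketch_info_of(struct sketch_inode *inode)
{
	return (struct sketch_info *)((char *)inode -
				      offsetof(struct sketch_info, vfs_inode));
}
```

Because the embedded structure sits after the private fields, a single `memset` bounded by `offsetof` clears the per-use state without touching what the constructor set up.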
9397     +
9398     +static void unionfs_destroy_inode(struct inode *inode)
9399     +{
9400     + kmem_cache_free(unionfs_inode_cachep, UNIONFS_I(inode));
9401     +}
9402     +
9403     +/* unionfs inode cache constructor */
9404     +static void init_once(void *obj)
9405     +{
9406     + struct unionfs_inode_info *i = obj;
9407     +
9408     + inode_init_once(&i->vfs_inode);
9409     +}
9410     +
9411     +int unionfs_init_inode_cache(void)
9412     +{
9413     + int err = 0;
9414     +
9415     + unionfs_inode_cachep =
9416     + kmem_cache_create("unionfs_inode_cache",
9417     + sizeof(struct unionfs_inode_info), 0,
9418     + SLAB_RECLAIM_ACCOUNT, init_once);
9419     + if (unlikely(!unionfs_inode_cachep))
9420     + err = -ENOMEM;
9421     + return err;
9422     +}
9423     +
9424     +/* unionfs inode cache destructor */
9425     +void unionfs_destroy_inode_cache(void)
9426     +{
9427     + if (unionfs_inode_cachep)
9428     + kmem_cache_destroy(unionfs_inode_cachep);
9429     +}
9430     +
9431     +/*
9432     + * Called when we have a dirty inode. Here we only throw out the parts
9433     + * of our readdir list that are too old.
9434     + *
9435     + * No need to grab sb info's rwsem.
9436     + */
9437     +static int unionfs_write_inode(struct inode *inode,
9438     + struct writeback_control *wbc)
9439     +{
9440     + struct list_head *pos, *n;
9441     + struct unionfs_dir_state *rdstate;
9442     +
9443     + spin_lock(&UNIONFS_I(inode)->rdlock);
9444     + list_for_each_safe(pos, n, &UNIONFS_I(inode)->readdircache) {
9445     + rdstate = list_entry(pos, struct unionfs_dir_state, cache);
9446     + /* We keep this list in LRU order. */
9447     + if ((rdstate->access + RDCACHE_JIFFIES) > jiffies)
9448     + break;
9449     + UNIONFS_I(inode)->rdcount--;
9450     + list_del(&rdstate->cache);
9451     + free_rdstate(rdstate);
9452     + }
9453     + spin_unlock(&UNIONFS_I(inode)->rdlock);
9454     +
9455     + return 0;
9456     +}
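
The expiry loop above walks an LRU-ordered list and stops at the first entry that is still fresh. Below is a minimal user-space sketch of the same idea, assuming an array instead of a kernel list. It uses a wrap-safe tick comparison in the style of the kernel's `time_after()`, which is a swapped-in technique: the code above uses a plain `>`, which should only differ when the tick counter is close to wrapping. `SKETCH_HZ`, `SKETCH_TTL` and the `sketch_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for HZ and RDCACHE_JIFFIES. */
#define SKETCH_HZ  100UL
#define SKETCH_TTL (5 * SKETCH_HZ)

/* Hypothetical cache entry: only the last-access tick matters here. */
struct sketch_entry {
	unsigned long access;
};

/* Wrap-safe "a is later than b" on a free-running tick counter. */
static int sketch_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Entries are in LRU order (oldest first). Return how many leading
 * entries have expired at tick 'now'; stop at the first fresh one.
 */
static size_t sketch_count_expired(const struct sketch_entry *e, size_t n,
				   unsigned long now)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* still fresh: its expiry tick lies after 'now' */
		if (sketch_after(e[i].access + SKETCH_TTL, now))
			break;
	}
	return i;
}
```

Stopping at the first fresh entry is only valid because the list is kept in LRU order; an unordered list would need a full scan.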
9457     +
9458     +/*
9459     + * Used only in NFS, to kill any pending RPC tasks, so that subsequent
9460     + * code can actually succeed and won't leave tasks that need handling.
9461     + */
9462     +static void unionfs_umount_begin(struct super_block *sb)
9463     +{
9464     + struct super_block *lower_sb;
9465     + int bindex, bstart, bend;
9466     +
9467     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9468     +
9469     + bstart = sbstart(sb);
9470     + bend = sbend(sb);
9471     + for (bindex = bstart; bindex <= bend; bindex++) {
9472     + lower_sb = unionfs_lower_super_idx(sb, bindex);
9473     +
9474     + if (lower_sb && lower_sb->s_op &&
9475     + lower_sb->s_op->umount_begin)
9476     + lower_sb->s_op->umount_begin(lower_sb);
9477     + }
9478     +
9479     + unionfs_read_unlock(sb);
9480     +}
9481     +
9482     +static int unionfs_show_options(struct seq_file *m, struct vfsmount *mnt)
9483     +{
9484     + struct super_block *sb = mnt->mnt_sb;
9485     + int ret = 0;
9486     + char *tmp_page;
9487     + char *path;
9488     + int bindex, bstart, bend;
9489     + int perms;
9490     +
9491     + /* to prevent a silly lockdep warning with namespace_sem */
9492     + lockdep_off();
9493     + unionfs_read_lock(sb, UNIONFS_SMUTEX_CHILD);
9494     + unionfs_lock_dentry(sb->s_root, UNIONFS_DMUTEX_CHILD);
9495     +
9496     + tmp_page = (char *) __get_free_page(GFP_KERNEL);
9497     + if (unlikely(!tmp_page)) {
9498     + ret = -ENOMEM;
9499     + goto out;
9500     + }
9501     +
9502     + bstart = sbstart(sb);
9503     + bend = sbend(sb);
9504     +
9505     + seq_printf(m, ",dirs=");
9506     + for (bindex = bstart; bindex <= bend; bindex++) {
9507     + struct path p;
9508     + p.dentry = unionfs_lower_dentry_idx(sb->s_root, bindex);
9509     + p.mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
9510     + path = d_path(&p, tmp_page, PAGE_SIZE);
9511     + if (IS_ERR(path)) {
9512     + ret = PTR_ERR(path);
9513     + goto out;
9514     + }
9515     +
9516     + perms = branchperms(sb, bindex);
9517     +
9518     + seq_printf(m, "%s=%s", path,
9519     + perms & MAY_WRITE ? "rw" : "ro");
9520     + if (bindex != bend)
9521     + seq_printf(m, ":");
9522     + }
9523     +
9524     +out:
9525     + free_page((unsigned long) tmp_page);
9526     +
9527     + unionfs_unlock_dentry(sb->s_root);
9528     + unionfs_read_unlock(sb);
9529     + lockdep_on();
9530     +
9531     + return ret;
9532     +}
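
For reference, the option string emitted above has the shape `,dirs=<path>=<rw|ro>[:<path>=<rw|ro>...]`. The sketch below reproduces that formatting in user space with `snprintf`, assuming a hypothetical `sk_branch` array in place of the superblock's branch data. It is an illustration of the output format only, not the kernel's `seq_file` path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical branch description: path plus a writable flag. */
struct sk_branch {
	const char *path;
	int writable;
};

/* Format ",dirs=p1=rw:p2=ro" into buf; return 0, or -1 if it does not fit. */
static int sk_format_dirs(char *buf, size_t size,
			  const struct sk_branch *b, int n)
{
	size_t used;
	int i, len;

	len = snprintf(buf, size, ",dirs=");
	if (len < 0 || (size_t)len >= size)
		return -1;
	used = (size_t)len;

	for (i = 0; i < n; i++) {
		/* branches are separated by ':' with no trailing separator */
		len = snprintf(buf + used, size - used, "%s=%s%s",
			       b[i].path, b[i].writable ? "rw" : "ro",
			       i != n - 1 ? ":" : "");
		if (len < 0 || (size_t)len >= size - used)
			return -1;
		used += (size_t)len;
	}
	return 0;
}
```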
9533     +
9534     +struct super_operations unionfs_sops = {
9535     + .put_super = unionfs_put_super,
9536     + .statfs = unionfs_statfs,
9537     + .remount_fs = unionfs_remount_fs,
9538     + .evict_inode = unionfs_evict_inode,
9539     + .umount_begin = unionfs_umount_begin,
9540     + .show_options = unionfs_show_options,
9541     + .write_inode = unionfs_write_inode,
9542     + .alloc_inode = unionfs_alloc_inode,
9543     + .destroy_inode = unionfs_destroy_inode,
9544     +};
9545     diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
9546     new file mode 100644
9547     index 0000000..1821705
9548     --- /dev/null
9549     +++ b/fs/unionfs/union.h
9550     @@ -0,0 +1,679 @@
9551     +/*
9552     + * Copyright (c) 2003-2011 Erez Zadok
9553     + * Copyright (c) 2003-2006 Charles P. Wright
9554     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
9555     + * Copyright (c) 2005 Arun M. Krishnakumar
9556     + * Copyright (c) 2004-2006 David P. Quigley
9557     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
9558     + * Copyright (c) 2003 Puja Gupta
9559     + * Copyright (c) 2003 Harikesavan Krishnan
9560     + * Copyright (c) 2003-2011 Stony Brook University
9561     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
9562     + *
9563     + * This program is free software; you can redistribute it and/or modify
9564     + * it under the terms of the GNU General Public License version 2 as
9565     + * published by the Free Software Foundation.
9566     + */
9567     +
9568     +#ifndef _UNION_H_
9569     +#define _UNION_H_
9570     +
9571     +#include <linux/dcache.h>
9572     +#include <linux/file.h>
9573     +#include <linux/list.h>
9574     +#include <linux/fs.h>
9575     +#include <linux/mm.h>
9576     +#include <linux/module.h>
9577     +#include <linux/mount.h>
9578     +#include <linux/namei.h>
9579     +#include <linux/page-flags.h>
9580     +#include <linux/pagemap.h>
9581     +#include <linux/poll.h>
9582     +#include <linux/security.h>
9583     +#include <linux/seq_file.h>
9584     +#include <linux/slab.h>
9585     +#include <linux/spinlock.h>
9586     +#include <linux/statfs.h>
9587     +#include <linux/string.h>
9588     +#include <linux/vmalloc.h>
9589     +#include <linux/writeback.h>
9590     +#include <linux/buffer_head.h>
9591     +#include <linux/xattr.h>
9592     +#include <linux/fs_stack.h>
9593     +#include <linux/magic.h>
9594     +#include <linux/log2.h>
9595     +#include <linux/poison.h>
9596     +#include <linux/mman.h>
9597     +#include <linux/backing-dev.h>
9598     +#include <linux/splice.h>
9599     +#include <linux/sched.h>
9600     +
9601     +#include <asm/system.h>
9602     +
9603     +#include <linux/union_fs.h>
9604     +
9605     +/* the file system name */
9606     +#define UNIONFS_NAME "unionfs"
9607     +
9608     +/* unionfs root inode number */
9609     +#define UNIONFS_ROOT_INO 1
9610     +
9611     +/* number of times we try to get a unique temporary file name */
9612     +#define GET_TMPNAM_MAX_RETRY 5
9613     +
9614     +/* maximum number of branches we support, to avoid memory blowup */
9615     +#define UNIONFS_MAX_BRANCHES 128
9616     +
9617     +/* minimum time (seconds) required for time-based cache-coherency */
9618     +#define UNIONFS_MIN_CC_TIME 3
9619     +
9620     +/* Operations vectors defined in specific files. */
9621     +extern struct file_operations unionfs_main_fops;
9622     +extern struct file_operations unionfs_dir_fops;
9623     +extern struct inode_operations unionfs_main_iops;
9624     +extern struct inode_operations unionfs_dir_iops;
9625     +extern struct inode_operations unionfs_symlink_iops;
9626     +extern struct super_operations unionfs_sops;
9627     +extern struct dentry_operations unionfs_dops;
9628     +extern struct address_space_operations unionfs_aops, unionfs_dummy_aops;
9629     +extern struct vm_operations_struct unionfs_vm_ops;
9630     +
9631     +/* How long should an entry be allowed to persist */
9632     +#define RDCACHE_JIFFIES (5*HZ)
9633     +
9634     +/* compatibility with Real-Time patches */
9635     +#ifdef CONFIG_PREEMPT_RT
9636     +# define unionfs_rw_semaphore compat_rw_semaphore
9637     +#else /* not CONFIG_PREEMPT_RT */
9638     +# define unionfs_rw_semaphore rw_semaphore
9639     +#endif /* not CONFIG_PREEMPT_RT */
9640     +
9641     +/* file private data. */
9642     +struct unionfs_file_info {
9643     + int bstart;
9644     + int bend;
9645     + atomic_t generation;
9646     +
9647     + struct unionfs_dir_state *rdstate;
9648     + struct file **lower_files;
9649     + int *saved_branch_ids; /* IDs of branches when file was opened */
9650     + const struct vm_operations_struct *lower_vm_ops;
9651     + bool wrote_to_file; /* for delayed copyup */
9652     +};
9653     +
9654     +/* unionfs inode data in memory */
9655     +struct unionfs_inode_info {
9656     + int bstart;
9657     + int bend;
9658     + atomic_t generation;
9659     + /* Stuff for readdir over NFS. */
9660     + spinlock_t rdlock;
9661     + struct list_head readdircache;
9662     + int rdcount;
9663     + int hashsize;
9664     + int cookie;
9665     +
9666     + /* The lower inodes */
9667     + struct inode **lower_inodes;
9668     +
9669     + struct inode vfs_inode;
9670     +};
9671     +
9672     +/* unionfs dentry data in memory */
9673     +struct unionfs_dentry_info {
9674     + /*
9675     + * This mutex is used to lock the dentry as soon as we get into a
9676     + * unionfs function from the VFS. Our lock ordering is that children
9677     + * go before their parents.
9678     + */
9679     + struct mutex lock;
9680     + int bstart;
9681     + int bend;
9682     + int bopaque;
9683     + int bcount;
9684     + atomic_t generation;
9685     + struct path *lower_paths;
9686     +};
9687     +
9688     +/* These are the pointers to our various objects. */
9689     +struct unionfs_data {
9690     + struct super_block *sb; /* lower super_block */
9691     + atomic_t open_files; /* number of open files on branch */
9692     + int branchperms;
9693     + int branch_id; /* unique branch ID at re/mount time */
9694     +};
9695     +
9696     +/* unionfs super-block data in memory */
9697     +struct unionfs_sb_info {
9698     + int bend;
9699     +
9700     + atomic_t generation;
9701     +
9702     + /*
9703     + * This rwsem is used to make sure that a branch management
9704     + * operation...
9705     + * 1) will not begin before all currently in-flight operations
9706     + * complete.
9707     + * 2) any new operations do not execute until the currently
9708     + * running branch management operation completes.
9709     + *
9710     + * The write_lock_owner records the PID of the task which grabbed
9711     + * the rw_sem for writing. If the same task also tries to grab the
9712     + * read lock, we allow it. This prevents a self-deadlock when
9713     + * branch-management is used on a pivot_root'ed union, because we
9714     + * have to ->lookup paths which belong to the same union.
9715     + */
9716     + struct unionfs_rw_semaphore rwsem;
9717     + pid_t write_lock_owner; /* PID of rw_sem owner (write lock) */
9718     + int high_branch_id; /* last unique branch ID given */
9719     + char *dev_name; /* to identify different unions in pr_debug */
9720     + struct unionfs_data *data;
9721     +};
9722     +
9723     +/*
9724     + * structure for making the linked list of entries by readdir on left branch
9725     + * to compare with entries on right branch
9726     + */
9727     +struct filldir_node {
9728     + struct list_head file_list; /* list for directory entries */
9729     + char *name; /* name entry */
9730     + int hash; /* name hash */
9731     + int namelen; /* name len since name is not 0 terminated */
9732     +
9733     + /*
9734     + * we can check for duplicate whiteouts and files in the same branch
9735     + * in order to return -EIO.
9736     + */
9737     + int bindex;
9738     +
9739     + /* is this a whiteout entry? */
9740     + int whiteout;
9741     +
9742     + /* Inline name, so we don't need to separately kmalloc small ones */
9743     + char iname[DNAME_INLINE_LEN];
9744     +};
9745     +
9746     +/* Directory hash table. */
9747     +struct unionfs_dir_state {
9748     + unsigned int cookie; /* the cookie, based off of rdversion */
9749     + unsigned int offset; /* The entry we have returned. */
9750     + int bindex;
9751     + loff_t dirpos; /* offset within the lower level directory */
9752     + int size; /* How big is the hash table? */
9753     + int hashentries; /* How many entries have been inserted? */
9754     + unsigned long access;
9755     +
9756     + /* This cache list is used when the inode keeps us around. */
9757     + struct list_head cache;
9758     + struct list_head list[0];
9759     +};
9760     +
9761     +/* externs needed for fanout.h or sioq.h */
9762     +extern int unionfs_get_nlinks(const struct inode *inode);
9763     +extern void unionfs_copy_attr_times(struct inode *upper);
9764     +extern void unionfs_copy_attr_all(struct inode *dest, const struct inode *src);
9765     +
9766     +/* include miscellaneous macros */
9767     +#include "fanout.h"
9768     +#include "sioq.h"
9769     +
9770     +/* externs for cache creation/deletion routines */
9771     +extern void unionfs_destroy_filldir_cache(void);
9772     +extern int unionfs_init_filldir_cache(void);
9773     +extern int unionfs_init_inode_cache(void);
9774     +extern void unionfs_destroy_inode_cache(void);
9775     +extern int unionfs_init_dentry_cache(void);
9776     +extern void unionfs_destroy_dentry_cache(void);
9777     +
9778     +/* Initialize and free readdir-specific state. */
9779     +extern int init_rdstate(struct file *file);
9780     +extern struct unionfs_dir_state *alloc_rdstate(struct inode *inode,
9781     + int bindex);
9782     +extern struct unionfs_dir_state *find_rdstate(struct inode *inode,
9783     + loff_t fpos);
9784     +extern void free_rdstate(struct unionfs_dir_state *state);
9785     +extern int add_filldir_node(struct unionfs_dir_state *rdstate,
9786     + const char *name, int namelen, int bindex,
9787     + int whiteout);
9788     +extern struct filldir_node *find_filldir_node(struct unionfs_dir_state *rdstate,
9789     + const char *name, int namelen,
9790     + int is_whiteout);
9791     +
9792     +extern struct dentry **alloc_new_dentries(int objs);
9793     +extern struct unionfs_data *alloc_new_data(int objs);
9794     +
9795     +/* We can only use 32 bits of offset for rdstate --- blech! */
9796     +#define DIREOF (0xfffff)
9797     +#define RDOFFBITS 20 /* This is the number of bits in DIREOF. */
9798     +#define MAXRDCOOKIE (0xfff)
9799     +/* Turn an rdstate into an offset. */
9800     +static inline off_t rdstate2offset(struct unionfs_dir_state *buf)
9801     +{
9802     + off_t tmp;
9803     +
9804     + tmp = ((buf->cookie & MAXRDCOOKIE) << RDOFFBITS)
9805     + | (buf->offset & DIREOF);
9806     + return tmp;
9807     +}
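
The packing done by `rdstate2offset()` puts a 12-bit cookie in the upper bits and a 20-bit entry offset in the lower bits of a 32-bit value. The sketch below mirrors that arithmetic in user space. The `SK_*` constants copy the values defined above; the unpacking helpers `sk_cookie()` and `sk_offset()` are hypothetical and are added only to show the round trip.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DIREOF      0xfffffu	/* low 20 bits: entry offset */
#define SK_RDOFFBITS   20
#define SK_MAXRDCOOKIE 0xfffu	/* next 12 bits: cookie */

/* Pack cookie and offset the same way rdstate2offset() does. */
static uint32_t sk_pack(unsigned int cookie, unsigned int offset)
{
	return ((uint32_t)(cookie & SK_MAXRDCOOKIE) << SK_RDOFFBITS)
		| (offset & SK_DIREOF);
}

/* Hypothetical inverse helpers. */
static unsigned int sk_cookie(uint32_t pos)
{
	return (pos >> SK_RDOFFBITS) & SK_MAXRDCOOKIE;
}

static unsigned int sk_offset(uint32_t pos)
{
	return pos & SK_DIREOF;
}
```

Values wider than their field are silently truncated by the masks, so a cookie above `0xfff` or an offset above `0xfffff` does not survive the round trip.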
9808     +
9809     +/* Macros for locking a super_block. */
9810     +enum unionfs_super_lock_class {
9811     + UNIONFS_SMUTEX_NORMAL,
9812     + UNIONFS_SMUTEX_PARENT, /* when locking on behalf of file */
9813     + UNIONFS_SMUTEX_CHILD, /* when locking on behalf of dentry */
9814     +};
9815     +static inline void unionfs_read_lock(struct super_block *sb, int subclass)
9816     +{
9817     + if (UNIONFS_SB(sb)->write_lock_owner &&
9818     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9819     + return;
9820     + down_read_nested(&UNIONFS_SB(sb)->rwsem, subclass);
9821     +}
9822     +static inline void unionfs_read_unlock(struct super_block *sb)
9823     +{
9824     + if (UNIONFS_SB(sb)->write_lock_owner &&
9825     + UNIONFS_SB(sb)->write_lock_owner == current->pid)
9826     + return;
9827     + up_read(&UNIONFS_SB(sb)->rwsem);
9828     +}
9829     +static inline void unionfs_write_lock(struct super_block *sb)
9830     +{
9831     + down_write(&UNIONFS_SB(sb)->rwsem);
9832     + UNIONFS_SB(sb)->write_lock_owner = current->pid;
9833     +}
9834     +static inline void unionfs_write_unlock(struct super_block *sb)
9835     +{
9836     + up_write(&UNIONFS_SB(sb)->rwsem);
9837     + UNIONFS_SB(sb)->write_lock_owner = 0;
9838     +}
9839     +
9840     +static inline void unionfs_double_lock_dentry(struct dentry *d1,
9841     + struct dentry *d2)
9842     +{
9843     + BUG_ON(d1 == d2);
9844     + if (d1 < d2) {
9845     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_PARENT);
9846     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_CHILD);
9847     + } else {
9848     + unionfs_lock_dentry(d2, UNIONFS_DMUTEX_PARENT);
9849     + unionfs_lock_dentry(d1, UNIONFS_DMUTEX_CHILD);
9850     + }
9851     +}
9852     +
9853     +static inline void unionfs_double_unlock_dentry(struct dentry *d1,
9854     + struct dentry *d2)
9855     +{
9856     + BUG_ON(d1 == d2);
9857     + if (d1 < d2) { /* unlock in reverse order of double_lock_dentry */
9858     + unionfs_unlock_dentry(d1);
9859     + unionfs_unlock_dentry(d2);
9860     + } else {
9861     + unionfs_unlock_dentry(d2);
9862     + unionfs_unlock_dentry(d1);
9863     + }
9864     +}
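
The double-lock helpers above avoid ABBA deadlocks by always taking the lower-addressed dentry first, whatever order the caller passes the two arguments in. Here is a single-threaded sketch of that ordering rule, assuming a toy `sk_lock` that records its acquisition sequence instead of a real mutex. All `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy lock: 'seq' records the global order in which it was acquired. */
struct sk_lock {
	int held;
	int seq;
};

static int sk_counter;

static void sk_acquire(struct sk_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->seq = ++sk_counter;
}

static void sk_release(struct sk_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Always acquire the lower address first, regardless of argument order. */
static void sk_double_lock(struct sk_lock *a, struct sk_lock *b)
{
	assert(a != b);
	if ((uintptr_t)a < (uintptr_t)b) {
		sk_acquire(a);
		sk_acquire(b);
	} else {
		sk_acquire(b);
		sk_acquire(a);
	}
}

static void sk_double_unlock(struct sk_lock *a, struct sk_lock *b)
{
	assert(a != b);
	sk_release(a);
	sk_release(b);
}
```

With this rule, two callers that pass the same pair in opposite order still contend on the same first lock, so neither can hold one lock while waiting for the other.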
9865     +
9866     +static inline void unionfs_double_lock_parents(struct dentry *p1,
9867     + struct dentry *p2)
9868     +{
9869     + if (p1 == p2) {
9870     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9871     + return;
9872     + }
9873     + if (p1 < p2) {
9874     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_PARENT);
9875     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_CHILD);
9876     + } else {
9877     + unionfs_lock_dentry(p2, UNIONFS_DMUTEX_REVAL_PARENT);
9878     + unionfs_lock_dentry(p1, UNIONFS_DMUTEX_REVAL_CHILD);
9879     + }
9880     +}
9881     +
9882     +static inline void unionfs_double_unlock_parents(struct dentry *p1,
9883     + struct dentry *p2)
9884     +{
9885     + if (p1 == p2) {
9886     + unionfs_unlock_dentry(p1);
9887     + return;
9888     + }
9889     + if (p1 < p2) { /* unlock in reverse order of double_lock_parents */
9890     + unionfs_unlock_dentry(p1);
9891     + unionfs_unlock_dentry(p2);
9892     + } else {
9893     + unionfs_unlock_dentry(p2);
9894     + unionfs_unlock_dentry(p1);
9895     + }
9896     +}
9897     +
9898     +extern int new_dentry_private_data(struct dentry *dentry, int subclass);
9899     +extern int realloc_dentry_private_data(struct dentry *dentry);
9900     +extern void free_dentry_private_data(struct dentry *dentry);
9901     +extern void update_bstart(struct dentry *dentry);
9902     +extern int init_lower_nd(struct nameidata *nd, unsigned int flags);
9903     +extern void release_lower_nd(struct nameidata *nd, int err);
9904     +
9905     +/*
9906     + * EXTERNALS:
9907     + */
9908     +
9909     +/* replicates the directory structure up to given dentry in given branch */
9910     +extern struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
9911     + const char *name, int bindex);
9912     +
9913     +/* partial lookup */
9914     +extern int unionfs_partial_lookup(struct dentry *dentry,
9915     + struct dentry *parent);
9916     +extern struct dentry *unionfs_lookup_full(struct dentry *dentry,
9917     + struct dentry *parent,
9918     + int lookupmode);
9919     +
9920     +/* copies a file from dbstart to newbindex branch */
9921     +extern int copyup_file(struct inode *dir, struct file *file, int bstart,
9922     + int newbindex, loff_t size);
9923     +extern int copyup_named_file(struct inode *dir, struct file *file,
9924     + char *name, int bstart, int new_bindex,
9925     + loff_t len);
9926     +/* copies a dentry from dbstart to newbindex branch */
9927     +extern int copyup_dentry(struct inode *dir, struct dentry *dentry,
9928     + int bstart, int new_bindex, const char *name,
9929     + int namelen, struct file **copyup_file, loff_t len);
9930     +/* helper functions for post-copyup actions */
9931     +extern void unionfs_postcopyup_setmnt(struct dentry *dentry);
9932     +extern void unionfs_postcopyup_release(struct dentry *dentry);
9933     +
9934     +/* Is this directory empty: 0 if it is empty, -ENOTEMPTY if not. */
9935     +extern int check_empty(struct dentry *dentry, struct dentry *parent,
9936     + struct unionfs_dir_state **namelist);
9937     +/* whiteout and opaque directory helpers */
9938     +extern char *alloc_whname(const char *name, int len);
9939     +extern bool is_whiteout_name(char **namep, int *namelenp);
9940     +extern bool is_validname(const char *name);
9941     +extern struct dentry *lookup_whiteout(const char *name,
9942     + struct dentry *lower_parent);
9943     +extern struct dentry *find_first_whiteout(struct dentry *dentry);
9944     +extern int unlink_whiteout(struct dentry *wh_dentry);
9945     +extern int check_unlink_whiteout(struct dentry *dentry,
9946     + struct dentry *lower_dentry, int bindex);
9947     +extern int create_whiteout(struct dentry *dentry, int start);
9948     +extern int delete_whiteouts(struct dentry *dentry, int bindex,
9949     + struct unionfs_dir_state *namelist);
9950     +extern int is_opaque_dir(struct dentry *dentry, int bindex);
9951     +extern int make_dir_opaque(struct dentry *dir, int bindex);
9952     +extern void unionfs_set_max_namelen(long *namelen);
9953     +
9954     +extern void unionfs_reinterpose(struct dentry *this_dentry);
9955     +extern struct super_block *unionfs_duplicate_super(struct super_block *sb);
9956     +
9957     +/* Locking functions. */
9958     +extern int unionfs_setlk(struct file *file, int cmd, struct file_lock *fl);
9959     +extern int unionfs_getlk(struct file *file, struct file_lock *fl);
9960     +
9961     +/* Common file operations. */
9962     +extern int unionfs_file_revalidate(struct file *file, struct dentry *parent,
9963     + bool willwrite);
9964     +extern int unionfs_open(struct inode *inode, struct file *file);
9965     +extern int unionfs_file_release(struct inode *inode, struct file *file);
9966     +extern int unionfs_flush(struct file *file, fl_owner_t id);
9967     +extern long unionfs_ioctl(struct file *file, unsigned int cmd,
9968     + unsigned long arg);
9969     +extern int unionfs_fsync(struct file *file, int datasync);
9970     +extern int unionfs_fasync(int fd, struct file *file, int flag);
9971     +
9972     +/* Inode operations */
9973     +extern struct inode *unionfs_iget(struct super_block *sb, unsigned long ino);
9974     +extern int unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
9975     + struct inode *new_dir, struct dentry *new_dentry);
9976     +extern int unionfs_unlink(struct inode *dir, struct dentry *dentry);
9977     +extern int unionfs_rmdir(struct inode *dir, struct dentry *dentry);
9978     +
9979     +extern bool __unionfs_d_revalidate(struct dentry *dentry,
9980     + struct dentry *parent, bool willwrite);
9981     +extern bool is_negative_lower(const struct dentry *dentry);
9982     +extern bool is_newer_lower(const struct dentry *dentry);
9983     +extern void purge_sb_data(struct super_block *sb);
9984     +
9985     +/* The values for unionfs_interpose's flag. */
9986     +#define INTERPOSE_DEFAULT 0
9987     +#define INTERPOSE_LOOKUP 1
9988     +#define INTERPOSE_REVAL 2
9989     +#define INTERPOSE_REVAL_NEG 3
9990     +#define INTERPOSE_PARTIAL 4
9991     +
9992     +extern struct dentry *unionfs_interpose(struct dentry *this_dentry,
9993     + struct super_block *sb, int flag);
9994     +
9995     +#ifdef CONFIG_UNION_FS_XATTR
9996     +/* Extended attribute functions. */
9997     +extern void *unionfs_xattr_alloc(size_t size, size_t limit);
9998     +static inline void unionfs_xattr_kfree(const void *p)
9999     +{
10000     + kfree(p);
10001     +}
10002     +extern ssize_t unionfs_getxattr(struct dentry *dentry, const char *name,
10003     + void *value, size_t size);
10004     +extern int unionfs_removexattr(struct dentry *dentry, const char *name);
10005     +extern ssize_t unionfs_listxattr(struct dentry *dentry, char *list,
10006     + size_t size);
10007     +extern int unionfs_setxattr(struct dentry *dentry, const char *name,
10008     + const void *value, size_t size, int flags);
10009     +#endif /* CONFIG_UNION_FS_XATTR */
10010     +
10011     +/* The root directory is unhashed, but isn't deleted. */
10012     +static inline int d_deleted(struct dentry *d)
10013     +{
10014     + return d_unhashed(d) && (d != d->d_sb->s_root);
10015     +}
10016     +
10017     +/* unionfs_permission, check if we should bypass error to facilitate copyup */
10018     +#define IS_COPYUP_ERR(err) ((err) == -EROFS)
10019     +
10020     +/* unionfs_open, check if we need to copyup the file */
10021     +#define OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
10022     +#define IS_WRITE_FLAG(flag) ((flag) & OPEN_WRITE_FLAGS)
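
`IS_WRITE_FLAG()` is a plain bitmask test: it works because `O_RDONLY` is zero on Linux, so a read-only open sets none of the masked bits. The user-space sketch below, using the `<fcntl.h>` constants, shows which open modes the mask catches. The `SK_*` macros copy the definitions above and `sk_needs_copyup_check()` is a hypothetical wrapper.

```c
#include <assert.h>
#include <fcntl.h>

#define SK_OPEN_WRITE_FLAGS (O_WRONLY | O_RDWR | O_APPEND)
#define SK_IS_WRITE_FLAG(flag) ((flag) & SK_OPEN_WRITE_FLAGS)

/* Nonzero when the open flags indicate an intent to write. */
static int sk_needs_copyup_check(int open_flags)
{
	return SK_IS_WRITE_FLAG(open_flags) != 0;
}
```

Note that flags such as `O_CREAT` or `O_TRUNC` combined with `O_RDONLY` are not matched by this mask.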
10023     +
10024     +static inline int branchperms(const struct super_block *sb, int index)
10025     +{
10026     + BUG_ON(index < 0);
10027     + return UNIONFS_SB(sb)->data[index].branchperms;
10028     +}
10029     +
10030     +static inline int set_branchperms(struct super_block *sb, int index, int perms)
10031     +{
10032     + BUG_ON(index < 0);
10033     + UNIONFS_SB(sb)->data[index].branchperms = perms;
10034     + return perms;
10035     +}
10036     +
10037     +/* check if readonly lower inode, but possibly unlinked (no inode->i_sb) */
10038     +static inline int __is_rdonly(const struct inode *inode)
10039     +{
10040     + /* if unlinked, can't be readonly (?) */
10041     + if (!inode->i_sb)
10042     + return 0;
10043     + return IS_RDONLY(inode);
10044     +
10045     +}
10046     +/* Is this branch read-only according to its unionfs branch permissions? */
10047     +static inline int is_robranch_super(const struct super_block *sb, int index)
10048     +{
10049     + int ret;
10050     +
10051     + ret = (!(branchperms(sb, index) & MAY_WRITE)) ? -EROFS : 0;
10052     + return ret;
10053     +}
10054     +
10055     +/* Is this file on a read-only branch? */
10056     +static inline int is_robranch_idx(const struct dentry *dentry, int index)
10057     +{
10058     + struct super_block *lower_sb;
10059     +
10060     + BUG_ON(index < 0);
10061     +
10062     + if (!(branchperms(dentry->d_sb, index) & MAY_WRITE))
10063     + return -EROFS;
10064     +
10065     + lower_sb = unionfs_lower_super_idx(dentry->d_sb, index);
10066     + BUG_ON(lower_sb == NULL);
10067     + /*
10068     + * test sb flags directly, not IS_RDONLY(lower_inode) because the
10069     + * lower_dentry could be a negative.
10070     + */
10071     + if (lower_sb->s_flags & MS_RDONLY)
10072     + return -EROFS;
10073     +
10074     + return 0;
10075     +}
10076     +
10077     +static inline int is_robranch(const struct dentry *dentry)
10078     +{
10079     + int index;
10080     +
10081     + index = UNIONFS_D(dentry)->bstart;
10082     + BUG_ON(index < 0);
10083     +
10084     + return is_robranch_idx(dentry, index);
10085     +}
10086     +
10087     +/*
10088     + * EXTERNALS:
10089     + */
10090     +extern int check_branch(const struct path *path);
10091     +extern int parse_branch_mode(const char *name, int *perms);
10092     +
10093     +/* locking helpers */
10094     +static inline struct dentry *lock_parent(struct dentry *dentry)
10095     +{
10096     + struct dentry *dir = dget_parent(dentry);
10097     + mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
10098     + return dir;
10099     +}
10100     +static inline struct dentry *lock_parent_wh(struct dentry *dentry)
10101     +{
10102     + struct dentry *dir = dget_parent(dentry);
10103     +
10104     + mutex_lock_nested(&dir->d_inode->i_mutex, UNIONFS_DMUTEX_WHITEOUT);
10105     + return dir;
10106     +}
10107     +
10108     +static inline void unlock_dir(struct dentry *dir)
10109     +{
10110     + mutex_unlock(&dir->d_inode->i_mutex);
10111     + dput(dir);
10112     +}
10113     +
10114     +/* lock base inode mutex before calling lookup_one_len */
10115     +static inline struct dentry *lookup_lck_len(const char *name,
10116     + struct dentry *base, int len)
10117     +{
10118     + struct dentry *d;
10119     + struct nameidata lower_nd;
10120     + int err;
10121     +
10122     + err = init_lower_nd(&lower_nd, LOOKUP_OPEN);
10123     + if (unlikely(err < 0)) {
10124     + d = ERR_PTR(err);
10125     + goto out;
10126     + }
10127     + mutex_lock(&base->d_inode->i_mutex);
10128     + d = lookup_one_len_nd(name, base, len, &lower_nd);
10129     + release_lower_nd(&lower_nd, err);
10130     + mutex_unlock(&base->d_inode->i_mutex);
10131     +out:
10132     + return d;
10133     +}
10134     +
10135     +static inline struct vfsmount *unionfs_mntget(struct dentry *dentry,
10136     + int bindex)
10137     +{
10138     + struct vfsmount *mnt;
10139     +
10140     + BUG_ON(!dentry || bindex < 0);
10141     +
10142     + mnt = mntget(unionfs_lower_mnt_idx(dentry, bindex));
10143     +#ifdef CONFIG_UNION_FS_DEBUG
10144     + if (!mnt)
10145     + pr_debug("unionfs: mntget: mnt=%p bindex=%d\n",
10146     + mnt, bindex);
10147     +#endif /* CONFIG_UNION_FS_DEBUG */
10148     +
10149     + return mnt;
10150     +}
10151     +
10152     +static inline void unionfs_mntput(struct dentry *dentry, int bindex)
10153     +{
10154     + struct vfsmount *mnt;
10155     +
10156     + if (!dentry && bindex < 0)
10157     + return;
10158     + BUG_ON(!dentry || bindex < 0);
10159     +
10160     + mnt = unionfs_lower_mnt_idx(dentry, bindex);
10161     +#ifdef CONFIG_UNION_FS_DEBUG
10162     + /*
10163     + * Directories can have NULL lower objects in between start/end, but
10164     + * NOT if at the start/end range. We cannot verify that this dentry
10165     + * is a type=DIR, because it may already be a negative dentry. But
10166     + * if dbstart is greater than dbend, we know that this couldn't have
10167     + * been a regular file: it had to have been a directory.
10168     + */
10169     + if (!mnt && !(bindex > dbstart(dentry) && bindex < dbend(dentry)))
10170     + pr_debug("unionfs: mntput: mnt=%p bindex=%d\n", mnt, bindex);
10171     +#endif /* CONFIG_UNION_FS_DEBUG */
10172     + mntput(mnt);
10173     +}
10174     +
10175     +#ifdef CONFIG_UNION_FS_DEBUG
10176     +
10177     +/* useful for tracking code reachability */
10178     +#define UDBG pr_debug("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
10179     +
10180     +#define unionfs_check_inode(i) __unionfs_check_inode((i), \
10181     + __FILE__, __func__, __LINE__)
10182     +#define unionfs_check_dentry(d) __unionfs_check_dentry((d), \
10183     + __FILE__, __func__, __LINE__)
10184     +#define unionfs_check_file(f) __unionfs_check_file((f), \
10185     + __FILE__, __func__, __LINE__)
10186     +#define unionfs_check_nd(n) __unionfs_check_nd((n), \
10187     + __FILE__, __func__, __LINE__)
10188     +#define show_branch_counts(sb) __show_branch_counts((sb), \
10189     + __FILE__, __func__, __LINE__)
10190     +#define show_inode_times(i) __show_inode_times((i), \
10191     + __FILE__, __func__, __LINE__)
10192     +#define show_dinode_times(d) __show_dinode_times((d), \
10193     + __FILE__, __func__, __LINE__)
10194     +#define show_inode_counts(i) __show_inode_counts((i), \
10195     + __FILE__, __func__, __LINE__)
10196     +
10197     +extern void __unionfs_check_inode(const struct inode *inode, const char *fname,
10198     + const char *fxn, int line);
10199     +extern void __unionfs_check_dentry(const struct dentry *dentry,
10200     + const char *fname, const char *fxn,
10201     + int line);
10202     +extern void __unionfs_check_file(const struct file *file,
10203     + const char *fname, const char *fxn, int line);
10204     +extern void __unionfs_check_nd(const struct nameidata *nd,
10205     + const char *fname, const char *fxn, int line);
10206     +extern void __show_branch_counts(const struct super_block *sb,
10207     + const char *file, const char *fxn, int line);
10208     +extern void __show_inode_times(const struct inode *inode,
10209     + const char *file, const char *fxn, int line);
10210     +extern void __show_dinode_times(const struct dentry *dentry,
10211     + const char *file, const char *fxn, int line);
10212     +extern void __show_inode_counts(const struct inode *inode,
10213     + const char *file, const char *fxn, int line);
10214     +
10215     +#else /* not CONFIG_UNION_FS_DEBUG */
10216     +
10217     +/* we leave useful hooks for these check functions throughout the code */
10218     +#define unionfs_check_inode(i) do { } while (0)
10219     +#define unionfs_check_dentry(d) do { } while (0)
10220     +#define unionfs_check_file(f) do { } while (0)
10221     +#define unionfs_check_nd(n) do { } while (0)
10222     +#define show_branch_counts(sb) do { } while (0)
10223     +#define show_inode_times(i) do { } while (0)
10224     +#define show_dinode_times(d) do { } while (0)
10225     +#define show_inode_counts(i) do { } while (0)
10226     +
10227     +#endif /* not CONFIG_UNION_FS_DEBUG */
10228     +
10229     +#endif /* not _UNION_H_ */
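The debug hooks at the end of union.h follow a common pattern: with debugging enabled, each macro forwards its argument plus the call site (__FILE__, __func__, __LINE__) to a checker function; otherwise it expands to an empty do { } while (0) statement, so callers never need their own #ifdef. Below is a minimal user-space sketch of that pattern, not code from the patch. DEMO_DEBUG, demo_check_index, demo_check_index_at and demo_next_index are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* DEMO_DEBUG is a hypothetical stand-in for CONFIG_UNION_FS_DEBUG. */
#ifdef DEMO_DEBUG

/* Debug build: the wrapper passes the call site to the checker. */
#define demo_check_index(i) demo_check_index_at((i), \
	__FILE__, __func__, __LINE__)

static void demo_check_index_at(int index, const char *fname,
				const char *fxn, int line)
{
	if (index < 0)
		fprintf(stderr, "demo: bad index %d at %s:%s:%d\n",
			index, fname, fxn, line);
}

#else /* not DEMO_DEBUG */

/* Non-debug build: the hook is an empty statement; its argument is not evaluated. */
#define demo_check_index(i) do { } while (0)

#endif /* not DEMO_DEBUG */

/* Hypothetical caller: the hook stays in place in both builds. */
static int demo_next_index(int index)
{
	assert(index >= 0);	/* user-space stand-in for BUG_ON(index < 0) */
	demo_check_index(index);
	return index + 1;
}
```

Because the non-debug macro discards its argument, an expression passed to such a hook should not have side effects that the caller relies on.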
10230     diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
10231     new file mode 100644
10232     index 0000000..bf447bb
10233     --- /dev/null
10234     +++ b/fs/unionfs/unlink.c
10235     @@ -0,0 +1,278 @@
10236     +/*
10237     + * Copyright (c) 2003-2011 Erez Zadok
10238     + * Copyright (c) 2003-2006 Charles P. Wright
10239     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10240     + * Copyright (c) 2005-2006 Junjiro Okajima
10241     + * Copyright (c) 2005 Arun M. Krishnakumar
10242     + * Copyright (c) 2004-2006 David P. Quigley
10243     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10244     + * Copyright (c) 2003 Puja Gupta
10245     + * Copyright (c) 2003 Harikesavan Krishnan
10246     + * Copyright (c) 2003-2011 Stony Brook University
10247     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
10248     + *
10249     + * This program is free software; you can redistribute it and/or modify
10250     + * it under the terms of the GNU General Public License version 2 as
10251     + * published by the Free Software Foundation.
10252     + */
10253     +
10254     +#include "union.h"
10255     +
10256     +/*
10257     + * Helper function for Unionfs's unlink operation.
10258     + *
10259     + * The main goal of this function is to optimize the unlinking of non-dir
10260     + * objects in unionfs by deleting all possible lower inode objects from the
10261     + * underlying branches having same dentry name as the non-dir dentry on
10262     + * which this unlink operation is called. This way we delete as many lower
10263     + * inodes as possible, and save space. Whiteouts need to be created in
10264     + * branch0 only if unlinking fails on any lower branch other than
10265     + * branch0, or if a lower branch is marked read-only.
10266     + *
10267     + * Also, while unlinking a file, if we encounter any dir type entry in any
10268     + * intermediate branch, then we remove the directory by calling vfs_rmdir.
10269     + * The following special cases are also handled:
10270     + *
10271     + * (1) If an error occurs in branch0 during vfs_unlink, then we return
10272     + * appropriate error.
10273     + *
10274     + * (2) If we get an error during unlink in any lower branch other
10275     + * than branch0, then we create a whiteout in branch0.
10276     + *
10277     + * (3) If a whiteout already exists in any intermediate branch, we delete
10278     + * all possible inodes only up to that branch (this is an "opaqueness"
10279     + * as per Documentation/filesystems/unionfs/concepts.txt).
10280     + *
10281     + */
10282     +static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry,
10283     + struct dentry *parent)
10284     +{
10285     + struct dentry *lower_dentry;
10286     + struct dentry *lower_dir_dentry;
10287     + int bindex;
10288     + int err = 0;
10289     +
10290     + err = unionfs_partial_lookup(dentry, parent);
10291     + if (err)
10292     + goto out;
10293     +
10294     + /* trying to unlink all possible valid instances */
10295     + for (bindex = dbstart(dentry); bindex <= dbend(dentry); bindex++) {
10296     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10297     + if (!lower_dentry || !lower_dentry->d_inode)
10298     + continue;
10299     +
10300     + lower_dir_dentry = lock_parent(lower_dentry);
10301     +
10302     + /* avoid destroying the lower inode if the object is in use */
10303     + dget(lower_dentry);
10304     + err = is_robranch_super(dentry->d_sb, bindex);
10305     + if (!err) {
10306     + /* see Documentation/filesystems/unionfs/issues.txt */
10307     + lockdep_off();
10308     + if (!S_ISDIR(lower_dentry->d_inode->i_mode))
10309     + err = vfs_unlink(lower_dir_dentry->d_inode,
10310     + lower_dentry);
10311     + else
10312     + err = vfs_rmdir(lower_dir_dentry->d_inode,
10313     + lower_dentry);
10314     + lockdep_on();
10315     + }
10316     +
10317     + /* if lower object deletion succeeds, update inode's times */
10318     + if (!err)
10319     + unionfs_copy_attr_times(dentry->d_inode);
10320     + dput(lower_dentry);
10321     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10322     + unlock_dir(lower_dir_dentry);
10323     +
10324     + if (err)
10325     + break;
10326     + }
10327     +
10328     + /*
10329     + * Create the whiteout in branch 0 (highest priority) only if (a)
10330     + * there was an error in any intermediate branch other than branch 0
10331     + * due to failure of vfs_unlink/vfs_rmdir or (b) a branch marked or
10332     + * mounted read-only.
10333     + */
10334     + if (err) {
10335     + if ((bindex == 0) ||
10336     + ((bindex == dbstart(dentry)) &&
10337     + (!IS_COPYUP_ERR(err))))
10338     + goto out;
10339     + else {
10340     + if (!IS_COPYUP_ERR(err))
10341     + pr_debug("unionfs: lower object deletion "
10342     + "failed in branch:%d\n", bindex);
10343     + err = create_whiteout(dentry, sbstart(dentry->d_sb));
10344     + }
10345     + }
10346     +
10347     +out:
10348     + if (!err)
10349     + inode_dec_link_count(dentry->d_inode);
10350     +
10351     + /* We don't want to leave negative leftover dentries for revalidate. */
10352     + if (!err && (dbopaque(dentry) != -1))
10353     + update_bstart(dentry);
10354     +
10355     + return err;
10356     +}
10357     +
10358     +int unionfs_unlink(struct inode *dir, struct dentry *dentry)
10359     +{
10360     + int err = 0;
10361     + struct inode *inode = dentry->d_inode;
10362     + struct dentry *parent;
10363     + int valid;
10364     +
10365     + BUG_ON(S_ISDIR(inode->i_mode));
10366     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10367     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10368     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10369     +
10370     + valid = __unionfs_d_revalidate(dentry, parent, false);
10371     + if (unlikely(!valid)) {
10372     + err = -ESTALE;
10373     + goto out;
10374     + }
10375     + unionfs_check_dentry(dentry);
10376     +
10377     + err = unionfs_unlink_whiteout(dir, dentry, parent);
10378     + /* call d_drop so the system "forgets" about us */
10379     + if (!err) {
10380     + unionfs_postcopyup_release(dentry);
10381     + unionfs_postcopyup_setmnt(parent);
10382     + if (inode->i_nlink == 0) /* drop lower inodes */
10383     + iput_lowers_all(inode, false);
10384     + d_drop(dentry);
10385     + /*
10386     + * if unlink/whiteout succeeded, parent dir mtime has
10387     + * changed
10388     + */
10389     + unionfs_copy_attr_times(dir);
10390     + }
10391     +
10392     +out:
10393     + if (!err) {
10394     + unionfs_check_dentry(dentry);
10395     + unionfs_check_inode(dir);
10396     + }
10397     + unionfs_unlock_dentry(dentry);
10398     + unionfs_unlock_parent(dentry, parent);
10399     + unionfs_read_unlock(dentry->d_sb);
10400     + return err;
10401     +}
10402     +
10403     +static int unionfs_rmdir_first(struct inode *dir, struct dentry *dentry,
10404     + struct unionfs_dir_state *namelist)
10405     +{
10406     + int err;
10407     + struct dentry *lower_dentry;
10408     + struct dentry *lower_dir_dentry = NULL;
10409     +
10410     + /* Here we need to remove whiteout entries. */
10411     + err = delete_whiteouts(dentry, dbstart(dentry), namelist);
10412     + if (err)
10413     + goto out;
10414     +
10415     + lower_dentry = unionfs_lower_dentry(dentry);
10416     +
10417     + lower_dir_dentry = lock_parent(lower_dentry);
10418     +
10419     + /* avoid destroying the lower inode if the file is in use */
10420     + dget(lower_dentry);
10421     + err = is_robranch(dentry);
10422     + if (!err)
10423     + err = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
10424     + dput(lower_dentry);
10425     +
10426     + fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
10427     + /* propagate number of hard-links */
10428     + dentry->d_inode->i_nlink = unionfs_get_nlinks(dentry->d_inode);
10429     +
10430     +out:
10431     + if (lower_dir_dentry)
10432     + unlock_dir(lower_dir_dentry);
10433     + return err;
10434     +}
10435     +
10436     +int unionfs_rmdir(struct inode *dir, struct dentry *dentry)
10437     +{
10438     + int err = 0;
10439     + struct unionfs_dir_state *namelist = NULL;
10440     + struct dentry *parent;
10441     + int dstart, dend;
10442     + bool valid;
10443     +
10444     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
10445     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
10446     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
10447     +
10448     + valid = __unionfs_d_revalidate(dentry, parent, false);
10449     + if (unlikely(!valid)) {
10450     + err = -ESTALE;
10451     + goto out;
10452     + }
10453     + unionfs_check_dentry(dentry);
10454     +
10455     + /* check if this unionfs directory is empty or not */
10456     + err = check_empty(dentry, parent, &namelist);
10457     + if (err)
10458     + goto out;
10459     +
10460     + err = unionfs_rmdir_first(dir, dentry, namelist);
10461     + dstart = dbstart(dentry);
10462     + dend = dbend(dentry);
10463     + /*
10464     + * We create a whiteout for the directory if there was an error to
10465     + * rmdir the first directory entry in the union. Otherwise, we
10466     + * create a whiteout only if there is a chance that a lower
10467     + * priority branch might also have the same named directory. IOW,
10468     + * if there is not another same-named directory at a lower priority
10469     + * branch, then we don't need to create a whiteout for it.
10470     + */
10471     + if (!err) {
10472     + if (dstart < dend)
10473     + err = create_whiteout(dentry, dstart);
10474     + } else {
10475     + int new_err;
10476     +
10477     + if (dstart == 0)
10478     + goto out;
10479     +
10480     + /* exit if the error returned was NOT -EROFS */
10481     + if (!IS_COPYUP_ERR(err))
10482     + goto out;
10483     +
10484     + new_err = create_whiteout(dentry, dstart - 1);
10485     + if (new_err != -EEXIST)
10486     + err = new_err;
10487     + }
10488     +
10489     +out:
10490     + /*
10491     + * Drop references to lower dentry/inode so storage space for them
10492     + * can be reclaimed. Then, call d_drop so the system "forgets"
10493     + * about us.
10494     + */
10495     + if (!err) {
10496     + iput_lowers_all(dentry->d_inode, false);
10497     + dput(unionfs_lower_dentry_idx(dentry, dstart));
10498     + unionfs_set_lower_dentry_idx(dentry, dstart, NULL);
10499     + d_drop(dentry);
10500     + /* update our lower vfsmnts, in case a copyup took place */
10501     + unionfs_postcopyup_setmnt(dentry);
10502     + unionfs_check_dentry(dentry);
10503     + unionfs_check_inode(dir);
10504     + }
10505     +
10506     + if (namelist)
10507     + free_rdstate(namelist);
10508     +
10509     + unionfs_unlock_dentry(dentry);
10510     + unionfs_unlock_parent(dentry, parent);
10511     + unionfs_read_unlock(dentry->d_sb);
10512     + return err;
10513     +}
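The branch walk in unionfs_unlink_whiteout() can be easier to follow as a small user-space model. The sketch below is an illustration under stated assumptions, not the kernel code: struct demo_branch, demo_unlink and DEMO_IS_COPYUP_ERR are hypothetical names, each branch is reduced to three flags, the only failure modelled is a read-only branch (-EROFS), and the whiteout is always recorded in branch 0 rather than searched for leftwards as create_whiteout() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror IS_COPYUP_ERR(): only -EROFS counts as a copyup error. */
#define DEMO_IS_COPYUP_ERR(err) ((err) == -EROFS)

/* Hypothetical model of one branch, for illustration only. */
struct demo_branch {
	bool present;	/* a lower object with this name exists here */
	bool read_only;	/* branch is marked or mounted read-only */
	bool whiteout;	/* set when a whiteout is recorded in this branch */
};

/*
 * Delete the object from every branch in [start, end], left to right.
 * Stop at the first failure; if that failure is not in branch 0, mask
 * the remaining copies by recording a whiteout in branch 0.
 */
static int demo_unlink(struct demo_branch *br, int start, int end)
{
	int bindex, err = 0;

	assert(br && start >= 0 && start <= end);

	for (bindex = start; bindex <= end; bindex++) {
		if (!br[bindex].present)
			continue;
		if (br[bindex].read_only) {
			err = -EROFS;
			break;
		}
		br[bindex].present = false;	/* lower object removed */
	}
	if (!err)
		return 0;

	/* Same exit test as the original; the second clause cannot fire
	 * in this model because -EROFS is the only error simulated. */
	if (bindex == 0 || (bindex == start && !DEMO_IS_COPYUP_ERR(err)))
		return err;

	if (br[0].read_only)
		return -EROFS;
	br[0].whiteout = true;
	return 0;
}
```

With three branches where branch 0 and branch 2 both hold the file and branch 2 is read-only, the call removes the copy in branch 0, leaves the copy in branch 2, and records a whiteout in branch 0 so that the surviving copy stays hidden.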
10514     diff --git a/fs/unionfs/whiteout.c b/fs/unionfs/whiteout.c
10515     new file mode 100644
10516     index 0000000..582cef2
10517     --- /dev/null
10518     +++ b/fs/unionfs/whiteout.c
10519     @@ -0,0 +1,601 @@
10520     +/*
10521     + * Copyright (c) 2003-2011 Erez Zadok
10522     + * Copyright (c) 2003-2006 Charles P. Wright
10523     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
10524     + * Copyright (c) 2005-2006 Junjiro Okajima
10525     + * Copyright (c) 2005 Arun M. Krishnakumar
10526     + * Copyright (c) 2004-2006 David P. Quigley
10527     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
10528     + * Copyright (c) 2003 Puja Gupta
10529     + * Copyright (c) 2003 Harikesavan Krishnan
10530     + * Copyright (c) 2003-2011 Stony Brook University
10531     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
10532     + *
10533     + * This program is free software; you can redistribute it and/or modify
10534     + * it under the terms of the GNU General Public License version 2 as
10535     + * published by the Free Software Foundation.
10536     + */
10537     +
10538     +#include "union.h"
10539     +
10540     +/*
10541     + * whiteout and opaque directory helpers
10542     + */
10543     +
10544     +/* What do we use for whiteouts. */
10545     +#define UNIONFS_WHPFX ".wh."
10546     +#define UNIONFS_WHLEN 4
10547     +/*
10548     + * If a directory contains this file, then it is opaque. We start with the
10549     + * .wh. flag so that it is blocked by lookup.
10550     + */
10551     +#define UNIONFS_DIR_OPAQUE_NAME "__dir_opaque"
10552     +#define UNIONFS_DIR_OPAQUE UNIONFS_WHPFX UNIONFS_DIR_OPAQUE_NAME
10553     +
10554     +/* construct whiteout filename */
10555     +char *alloc_whname(const char *name, int len)
10556     +{
10557     + char *buf;
10558     +
10559     + buf = kmalloc(len + UNIONFS_WHLEN + 1, GFP_KERNEL);
10560     + if (unlikely(!buf))
10561     + return ERR_PTR(-ENOMEM);
10562     +
10563     + strcpy(buf, UNIONFS_WHPFX);
10564     + strlcat(buf, name, len + UNIONFS_WHLEN + 1);
10565     +
10566     + return buf;
10567     +}
10568     +
10569     +/*
10570     + * XXX: this can be inline or CPP macro, but is here to keep all whiteout
10571     + * code in one place.
10572     + */
10573     +void unionfs_set_max_namelen(long *namelen)
10574     +{
10575     + *namelen -= UNIONFS_WHLEN;
10576     +}
10577     +
10578     +/* check if @namep is a whiteout, update @namep and @namelenp accordingly */
10579     +bool is_whiteout_name(char **namep, int *namelenp)
10580     +{
10581     + if (*namelenp > UNIONFS_WHLEN &&
10582     + !strncmp(*namep, UNIONFS_WHPFX, UNIONFS_WHLEN)) {
10583     + *namep += UNIONFS_WHLEN;
10584     + *namelenp -= UNIONFS_WHLEN;
10585     + return true;
10586     + }
10587     + return false;
10588     +}
10589     +
10590     +/* is the filename valid == !(whiteout for a file or opaque dir marker) */
10591     +bool is_validname(const char *name)
10592     +{
10593     + if (!strncmp(name, UNIONFS_WHPFX, UNIONFS_WHLEN))
10594     + return false;
10595     + if (!strncmp(name, UNIONFS_DIR_OPAQUE_NAME,
10596     + sizeof(UNIONFS_DIR_OPAQUE_NAME) - 1))
10597     + return false;
10598     + return true;
10599     +}
10600     +
10601     +/*
10602     + * Look for a whiteout @name in @lower_parent directory. If error, return
10603     + * ERR_PTR. Caller must dput() the returned dentry if not an error.
10604     + *
10605     + * XXX: some callers can reuse the whname allocated buffer to avoid repeated
10606     + * free then re-malloc calls. Need to provide a different API for those
10607     + * callers.
10608     + */
10609     +struct dentry *lookup_whiteout(const char *name, struct dentry *lower_parent)
10610     +{
10611     + char *whname = NULL;
10612     + int err = 0, namelen;
10613     + struct dentry *wh_dentry = NULL;
10614     +
10615     + namelen = strlen(name);
10616     + whname = alloc_whname(name, namelen);
10617     + if (unlikely(IS_ERR(whname))) {
10618     + err = PTR_ERR(whname);
10619     + goto out;
10620     + }
10621     +
10622     + /* check if whiteout exists in this branch: lookup .wh.foo */
10623     + wh_dentry = lookup_lck_len(whname, lower_parent, strlen(whname));
10624     + if (IS_ERR(wh_dentry)) {
10625     + err = PTR_ERR(wh_dentry);
10626     + goto out;
10627     + }
10628     +
10629     + /* check if negative dentry (ENOENT) */
10630     + if (!wh_dentry->d_inode)
10631     + goto out;
10632     +
10633     + /* whiteout found: check if valid type */
10634     + if (!S_ISREG(wh_dentry->d_inode->i_mode)) {
10635     + printk(KERN_ERR "unionfs: invalid whiteout %s entry type %d\n",
10636     + whname, wh_dentry->d_inode->i_mode);
10637     + dput(wh_dentry);
10638     + err = -EIO;
10639     + goto out;
10640     + }
10641     +
10642     +out:
10643     + kfree(whname);
10644     + if (err)
10645     + wh_dentry = ERR_PTR(err);
10646     + return wh_dentry;
10647     +}
10648     +
10649     +/* find and return first whiteout in parent directory, else ENOENT */
10650     +struct dentry *find_first_whiteout(struct dentry *dentry)
10651     +{
10652     + int bindex, bstart, bend;
10653     + struct dentry *parent, *lower_parent, *wh_dentry;
10654     +
10655     + parent = dget_parent(dentry);
10656     +
10657     + bstart = dbstart(parent);
10658     + bend = dbend(parent);
10659     + wh_dentry = ERR_PTR(-ENOENT);
10660     +
10661     + for (bindex = bstart; bindex <= bend; bindex++) {
10662     + lower_parent = unionfs_lower_dentry_idx(parent, bindex);
10663     + if (!lower_parent)
10664     + continue;
10665     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_parent);
10666     + if (IS_ERR(wh_dentry))
10667     + continue;
10668     + if (wh_dentry->d_inode)
10669     + break;
10670     + dput(wh_dentry);
10671     + wh_dentry = ERR_PTR(-ENOENT);
10672     + }
10673     +
10674     + dput(parent);
10675     +
10676     + return wh_dentry;
10677     +}
10678     +
10679     +/*
10680     + * Unlink a whiteout dentry. Returns 0 or -errno. Caller must hold and
10681     + * release dentry reference.
10682     + */
10683     +int unlink_whiteout(struct dentry *wh_dentry)
10684     +{
10685     + int err;
10686     + struct dentry *lower_dir_dentry;
10687     +
10688     + /* dget and lock parent dentry */
10689     + lower_dir_dentry = lock_parent_wh(wh_dentry);
10690     +
10691     + /* see Documentation/filesystems/unionfs/issues.txt */
10692     + lockdep_off();
10693     + err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
10694     + lockdep_on();
10695     + unlock_dir(lower_dir_dentry);
10696     +
10697     + /*
10698     + * Whiteouts are special files and should be deleted no matter what
10699     + * (as if they never existed), in order to allow this create
10700     + * operation to succeed. This is especially important in sticky
10701     + * directories: a whiteout may have been created by one user, but
10702     + * the newly created file may be created by another user.
10703     + * Therefore, in order to maintain Unix semantics, if the vfs_unlink
10704     + * above failed, then we have to try to directly unlink the
10705     + * whiteout. Note: in the ODF version of unionfs, whiteouts are
10706     + * handled much more cleanly.
10707     + */
10708     + if (err == -EPERM) {
10709     + struct inode *inode = lower_dir_dentry->d_inode;
10710     + err = inode->i_op->unlink(inode, wh_dentry);
10711     + }
10712     + if (err)
10713     + printk(KERN_ERR "unionfs: could not unlink whiteout %s, "
10714     + "err = %d\n", wh_dentry->d_name.name, err);
10715     +
10716     + return err;
10717     +
10718     +}
10719     +
10720     +/*
10721     + * Helper function when creating new objects (create, symlink, mknod, etc.).
10722     + * Checks to see if there's a whiteout in @lower_dentry's parent directory,
10723     + * whose name is taken from @dentry. Then tries to remove that whiteout, if
10724     + * found. If <dentry,bindex> is a branch marked readonly, return -EROFS.
10725     + * If it finds both a regular file and a whiteout, delete whiteout (this
10726     + * should never happen).
10727     + *
10728     + * Return 0 if no whiteout was found. Return 1 if one was found and
10729     + * successfully removed. Therefore a value >= 0 tells the caller that
10730     + * @lower_dentry belongs to a good branch to create the new object in.
10731     + * Return -ERRNO if an error occurred during whiteout lookup or in trying to
10732     + * unlink the whiteout.
10733     + */
10734     +int check_unlink_whiteout(struct dentry *dentry, struct dentry *lower_dentry,
10735     + int bindex)
10736     +{
10737     + int err;
10738     + struct dentry *wh_dentry = NULL;
10739     + struct dentry *lower_dir_dentry = NULL;
10740     +
10741     + /* look for whiteout dentry first */
10742     + lower_dir_dentry = dget_parent(lower_dentry);
10743     + wh_dentry = lookup_whiteout(dentry->d_name.name, lower_dir_dentry);
10744     + dput(lower_dir_dentry);
10745     + if (IS_ERR(wh_dentry)) {
10746     + err = PTR_ERR(wh_dentry);
10747     + goto out;
10748     + }
10749     +
10750     + if (!wh_dentry->d_inode) { /* no whiteout exists */
10751     + err = 0;
10752     + goto out_dput;
10753     + }
10754     +
10755     + /* check if regular file and whiteout were both found */
10756     + if (unlikely(lower_dentry->d_inode))
10757     + printk(KERN_WARNING "unionfs: removing whiteout; regular "
10758     + "file exists in directory %s (branch %d)\n",
10759     + lower_dir_dentry->d_name.name, bindex);
10760     +
10761     + /* check if branch is writeable */
10762     + err = is_robranch_super(dentry->d_sb, bindex);
10763     + if (err)
10764     + goto out_dput;
10765     +
10766     + /* .wh.foo has been found, so let's unlink it */
10767     + err = unlink_whiteout(wh_dentry);
10768     + if (!err)
10769     + err = 1; /* a whiteout was found and successfully removed */
10770     +out_dput:
10771     + dput(wh_dentry);
10772     +out:
10773     + return err;
10774     +}
10775     +
10776     +/*
10777     + * Pass a unionfs dentry and an index. It will try to create a whiteout
10778     + * for the filename in dentry, and will try in branch 'start'. On error,
10779     + * it will proceed to the next branch to the left (lower index).
10780     + */
10781     +int create_whiteout(struct dentry *dentry, int start)
10782     +{
10783     + int bstart, bend, bindex;
10784     + struct dentry *lower_dir_dentry;
10785     + struct dentry *lower_dentry;
10786     + struct dentry *lower_wh_dentry;
10787     + struct nameidata nd;
10788     + char *name = NULL;
10789     + int err = -EINVAL;
10790     +
10791     + verify_locked(dentry);
10792     +
10793     + bstart = dbstart(dentry);
10794     + bend = dbend(dentry);
10795     +
10796     + /* create dentry's whiteout equivalent */
10797     + name = alloc_whname(dentry->d_name.name, dentry->d_name.len);
10798     + if (unlikely(IS_ERR(name))) {
10799     + err = PTR_ERR(name);
10800     + goto out;
10801     + }
10802     +
10803     + for (bindex = start; bindex >= 0; bindex--) {
10804     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10805     +
10806     + if (!lower_dentry) {
10807     + /*
10808     + * if lower dentry is not present, create the
10809     + * entire lower dentry directory structure and go
10810     + * ahead. Since we want to just create whiteout, we
10811     + * only want the parent dentry, and hence get rid of
10812     + * this dentry.
10813     + */
10814     + lower_dentry = create_parents(dentry->d_inode,
10815     + dentry,
10816     + dentry->d_name.name,
10817     + bindex);
10818     + if (!lower_dentry || IS_ERR(lower_dentry)) {
10819     + int ret = PTR_ERR(lower_dentry);
10820     + if (!IS_COPYUP_ERR(ret))
10821     + printk(KERN_ERR
10822     + "unionfs: create_parents for "
10823     + "whiteout failed: bindex=%d "
10824     + "err=%d\n", bindex, ret);
10825     + continue;
10826     + }
10827     + }
10828     +
10829     + lower_wh_dentry =
10830     + lookup_lck_len(name, lower_dentry->d_parent,
10831     + dentry->d_name.len + UNIONFS_WHLEN);
10832     + if (IS_ERR(lower_wh_dentry))
10833     + continue;
10834     +
10835     + /*
10836     + * The whiteout already exists. This used to be impossible,
10837     + * but now is possible because of opaqueness.
10838     + */
10839     + if (lower_wh_dentry->d_inode) {
10840     + dput(lower_wh_dentry);
10841     + err = 0;
10842     + goto out;
10843     + }
10844     +
10845     + err = init_lower_nd(&nd, LOOKUP_CREATE);
10846     + if (unlikely(err < 0))
10847     + goto out;
10848     + lower_dir_dentry = lock_parent_wh(lower_wh_dentry);
10849     + err = is_robranch_super(dentry->d_sb, bindex);
10850     + if (!err)
10851     + err = vfs_create(lower_dir_dentry->d_inode,
10852     + lower_wh_dentry,
10853     + current_umask() & S_IRUGO,
10854     + &nd);
10855     + unlock_dir(lower_dir_dentry);
10856     + dput(lower_wh_dentry);
10857     + release_lower_nd(&nd, err);
10858     +
10859     + if (!err || !IS_COPYUP_ERR(err))
10860     + break;
10861     + }
10862     +
10863     + /* set dbopaque so that lookup will not proceed after this branch */
10864     + if (!err)
10865     + dbopaque(dentry) = bindex;
10866     +
10867     +out:
10868     + kfree(name);
10869     + return err;
10870     +}
10871     +
10872     +/*
10873     + * Delete all of the whiteouts in a given directory for rmdir.
10874     + *
10875     + * lower directory inode should be locked
10876     + */
10877     +static int do_delete_whiteouts(struct dentry *dentry, int bindex,
10878     + struct unionfs_dir_state *namelist)
10879     +{
10880     + int err = 0;
10881     + struct dentry *lower_dir_dentry = NULL;
10882     + struct dentry *lower_dentry;
10883     + char *name = NULL, *p;
10884     + struct inode *lower_dir;
10885     + int i;
10886     + struct list_head *pos;
10887     + struct filldir_node *cursor;
10888     +
10889     + /* Find out lower parent dentry */
10890     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10891     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10892     + lower_dir = lower_dir_dentry->d_inode;
10893     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10894     +
10895     + err = -ENOMEM;
10896     + name = __getname();
10897     + if (unlikely(!name))
10898     + goto out;
10899     + strcpy(name, UNIONFS_WHPFX);
10900     + p = name + UNIONFS_WHLEN;
10901     +
10902     + err = 0;
10903     + for (i = 0; !err && i < namelist->size; i++) {
10904     + list_for_each(pos, &namelist->list[i]) {
10905     + cursor =
10906     + list_entry(pos, struct filldir_node,
10907     + file_list);
10908     + /* Only operate on whiteouts in this branch. */
10909     + if (cursor->bindex != bindex)
10910     + continue;
10911     + if (!cursor->whiteout)
10912     + continue;
10913     +
10914     + strlcpy(p, cursor->name, PATH_MAX - UNIONFS_WHLEN);
10915     + lower_dentry =
10916     + lookup_lck_len(name, lower_dir_dentry,
10917     + cursor->namelen +
10918     + UNIONFS_WHLEN);
10919     + if (IS_ERR(lower_dentry)) {
10920     + err = PTR_ERR(lower_dentry);
10921     + break;
10922     + }
10923     + if (lower_dentry->d_inode)
10924     + err = vfs_unlink(lower_dir, lower_dentry);
10925     + dput(lower_dentry);
10926     + if (err)
10927     + break;
10928     + }
10929     + }
10930     +
10931     + __putname(name);
10932     +
10933     + /* After all of the removals, we should copy the attributes once. */
10934     + fsstack_copy_attr_times(dentry->d_inode, lower_dir_dentry->d_inode);
10935     +
10936     +out:
10937     + return err;
10938     +}
10939     +
10940     +
10941     +void __delete_whiteouts(struct work_struct *work)
10942     +{
10943     + struct sioq_args *args = container_of(work, struct sioq_args, work);
10944     + struct deletewh_args *d = &args->deletewh;
10945     +
10946     + args->err = do_delete_whiteouts(d->dentry, d->bindex, d->namelist);
10947     + complete(&args->comp);
10948     +}
10949     +
10950     +/* delete whiteouts in a dir (for rmdir operation) using sioq if necessary */
10951     +int delete_whiteouts(struct dentry *dentry, int bindex,
10952     + struct unionfs_dir_state *namelist)
10953     +{
10954     + int err;
10955     + struct super_block *sb;
10956     + struct dentry *lower_dir_dentry;
10957     + struct inode *lower_dir;
10958     + struct sioq_args args;
10959     +
10960     + sb = dentry->d_sb;
10961     +
10962     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode));
10963     + BUG_ON(bindex < dbstart(dentry));
10964     + BUG_ON(bindex > dbend(dentry));
10965     + err = is_robranch_super(sb, bindex);
10966     + if (err)
10967     + goto out;
10968     +
10969     + lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
10970     + BUG_ON(!S_ISDIR(lower_dir_dentry->d_inode->i_mode));
10971     + lower_dir = lower_dir_dentry->d_inode;
10972     + BUG_ON(!S_ISDIR(lower_dir->i_mode));
10973     +
10974     + if (!inode_permission(lower_dir, MAY_WRITE | MAY_EXEC)) {
10975     + err = do_delete_whiteouts(dentry, bindex, namelist);
10976     + } else {
10977     + args.deletewh.namelist = namelist;
10978     + args.deletewh.dentry = dentry;
10979     + args.deletewh.bindex = bindex;
10980     + run_sioq(__delete_whiteouts, &args);
10981     + err = args.err;
10982     + }
10983     +
10984     +out:
10985     + return err;
10986     +}
10987     +
10988     +/****************************************************************************
10989     + * Opaque directory helpers *
10990     + ****************************************************************************/
10991     +
10992     +/*
10993     + * is_opaque_dir: returns 0 if it is NOT an opaque dir, 1 if it is, and
10994     + * -errno if an error occurred trying to figure this out.
10995     + */
10996     +int is_opaque_dir(struct dentry *dentry, int bindex)
10997     +{
10998     + int err = 0;
10999     + struct dentry *lower_dentry;
11000     + struct dentry *wh_lower_dentry;
11001     + struct inode *lower_inode;
11002     + struct sioq_args args;
11003     + struct nameidata lower_nd;
11004     +
11005     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
11006     + lower_inode = lower_dentry->d_inode;
11007     +
11008     + BUG_ON(!S_ISDIR(lower_inode->i_mode));
11009     +
11010     + mutex_lock(&lower_inode->i_mutex);
11011     +
11012     + if (!inode_permission(lower_inode, MAY_EXEC)) {
11013     + err = init_lower_nd(&lower_nd, LOOKUP_OPEN);
11014     + if (unlikely(err < 0)) {
11015     + mutex_unlock(&lower_inode->i_mutex);
11016     + goto out;
11017     + }
11018     + wh_lower_dentry =
11019     + lookup_one_len_nd(UNIONFS_DIR_OPAQUE, lower_dentry,
11020     + sizeof(UNIONFS_DIR_OPAQUE) - 1,
11021     + &lower_nd);
11022     + release_lower_nd(&lower_nd, err);
11023     + } else {
11024     + args.is_opaque.dentry = lower_dentry;
11025     + run_sioq(__is_opaque_dir, &args);
11026     + wh_lower_dentry = args.ret;
11027     + }
11028     +
11029     + mutex_unlock(&lower_inode->i_mutex);
11030     +
11031     + if (IS_ERR(wh_lower_dentry)) {
11032     + err = PTR_ERR(wh_lower_dentry);
11033     + goto out;
11034     + }
11035     +
11036     + /* This is an opaque dir iff wh_lower_dentry is positive */
11037     + err = !!wh_lower_dentry->d_inode;
11038     +
11039     + dput(wh_lower_dentry);
11040     +out:
11041     + return err;
11042     +}
11043     +
11044     +void __is_opaque_dir(struct work_struct *work)
11045     +{
11046     + struct sioq_args *args = container_of(work, struct sioq_args, work);
11047     + struct nameidata lower_nd;
11048     + int err;
11049     +
11050     + err = init_lower_nd(&lower_nd, LOOKUP_OPEN);
11051     + if (unlikely(err < 0))
11052     + return;
11053     + args->ret = lookup_one_len_nd(UNIONFS_DIR_OPAQUE,
11054     + args->is_opaque.dentry,
11055     + sizeof(UNIONFS_DIR_OPAQUE) - 1,
11056     + &lower_nd);
11057     + release_lower_nd(&lower_nd, err);
11058     + complete(&args->comp);
11059     +}
11060     +
11061     +int make_dir_opaque(struct dentry *dentry, int bindex)
11062     +{
11063     + int err = 0;
11064     + struct dentry *lower_dentry, *diropq;
11065     + struct inode *lower_dir;
11066     + struct nameidata nd;
11067     + const struct cred *old_creds;
11068     + struct cred *new_creds;
11069     +
11070     + /*
11071     + * Opaque directory whiteout markers are special files (like regular
11072     + * whiteouts), and should appear to the users as if they don't
11073     + * exist. They should be created/deleted regardless of directory
11074     + * search/create permissions, but only for the duration of this
11075     + * creation of the .wh.__dir_opaque file. Note, this does not
11076     + * circumvent normal ->permission().
11077     + */
11078     + new_creds = prepare_creds();
11079     + if (unlikely(!new_creds)) {
11080     + err = -ENOMEM;
11081     + goto out_err;
11082     + }
11083     + cap_raise(new_creds->cap_effective, CAP_DAC_READ_SEARCH);
11084     + cap_raise(new_creds->cap_effective, CAP_DAC_OVERRIDE);
11085     + old_creds = override_creds(new_creds);
11086     +
11087     + lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
11088     + lower_dir = lower_dentry->d_inode;
11089     + BUG_ON(!S_ISDIR(dentry->d_inode->i_mode) ||
11090     + !S_ISDIR(lower_dir->i_mode));
11091     +
11092     + mutex_lock(&lower_dir->i_mutex);
11093     + err = init_lower_nd(&nd, LOOKUP_OPEN);
11094     + if (unlikely(err < 0))
11095     + goto out;
11096     + diropq = lookup_one_len_nd(UNIONFS_DIR_OPAQUE, lower_dentry,
11097     + sizeof(UNIONFS_DIR_OPAQUE) - 1, &nd);
11098     + release_lower_nd(&nd, err);
11099     + if (IS_ERR(diropq)) {
11100     + err = PTR_ERR(diropq);
11101     + goto out;
11102     + }
11103     +
11104     + err = init_lower_nd(&nd, LOOKUP_CREATE);
11105     + if (unlikely(err < 0))
11106     + goto out;
11107     + if (!diropq->d_inode)
11108     + err = vfs_create(lower_dir, diropq, S_IRUGO, &nd);
11109     + if (!err)
11110     + dbopaque(dentry) = bindex;
11111     + release_lower_nd(&nd, err);
11112     +
11113     + dput(diropq);
11114     +
11115     +out:
11116     + mutex_unlock(&lower_dir->i_mutex);
11117     + revert_creds(old_creds);
11118     +out_err:
11119     + return err;
11120     +}
11121     diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
11122     new file mode 100644
11123     index 0000000..a93d803
11124     --- /dev/null
11125     +++ b/fs/unionfs/xattr.c
11126     @@ -0,0 +1,173 @@
11127     +/*
11128     + * Copyright (c) 2003-2011 Erez Zadok
11129     + * Copyright (c) 2003-2006 Charles P. Wright
11130     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11131     + * Copyright (c) 2005-2006 Junjiro Okajima
11132     + * Copyright (c) 2005 Arun M. Krishnakumar
11133     + * Copyright (c) 2004-2006 David P. Quigley
11134     + * Copyright (c) 2003-2004 Mohammad Nayyer Zubair
11135     + * Copyright (c) 2003 Puja Gupta
11136     + * Copyright (c) 2003 Harikesavan Krishnan
11137     + * Copyright (c) 2003-2011 Stony Brook University
11138     + * Copyright (c) 2003-2011 The Research Foundation of SUNY
11139     + *
11140     + * This program is free software; you can redistribute it and/or modify
11141     + * it under the terms of the GNU General Public License version 2 as
11142     + * published by the Free Software Foundation.
11143     + */
11144     +
11145     +#include "union.h"
11146     +
11147     +/* This is lifted from fs/xattr.c */
11148     +void *unionfs_xattr_alloc(size_t size, size_t limit)
11149     +{
11150     + void *ptr;
11151     +
11152     + if (size > limit)
11153     + return ERR_PTR(-E2BIG);
11154     +
11155     + if (!size) /* size request, no buffer is needed */
11156     + return NULL;
11157     +
11158     + ptr = kmalloc(size, GFP_KERNEL);
11159     + if (unlikely(!ptr))
11160     + return ERR_PTR(-ENOMEM);
11161     + return ptr;
11162     +}
11163     +
11164     +/*
11165     + * BKL held by caller.
11166     + * dentry->d_inode->i_mutex locked
11167     + */
11168     +ssize_t unionfs_getxattr(struct dentry *dentry, const char *name, void *value,
11169     + size_t size)
11170     +{
11171     + struct dentry *lower_dentry = NULL;
11172     + struct dentry *parent;
11173     + int err = -EOPNOTSUPP;
11174     + bool valid;
11175     +
11176     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11177     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11178     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11179     +
11180     + valid = __unionfs_d_revalidate(dentry, parent, false);
11181     + if (unlikely(!valid)) {
11182     + err = -ESTALE;
11183     + goto out;
11184     + }
11185     +
11186     + lower_dentry = unionfs_lower_dentry(dentry);
11187     +
11188     + err = vfs_getxattr(lower_dentry, (char *) name, value, size);
11189     +
11190     +out:
11191     + unionfs_check_dentry(dentry);
11192     + unionfs_unlock_dentry(dentry);
11193     + unionfs_unlock_parent(dentry, parent);
11194     + unionfs_read_unlock(dentry->d_sb);
11195     + return err;
11196     +}
11197     +
11198     +/*
11199     + * BKL held by caller.
11200     + * dentry->d_inode->i_mutex locked
11201     + */
11202     +int unionfs_setxattr(struct dentry *dentry, const char *name,
11203     + const void *value, size_t size, int flags)
11204     +{
11205     + struct dentry *lower_dentry = NULL;
11206     + struct dentry *parent;
11207     + int err = -EOPNOTSUPP;
11208     + bool valid;
11209     +
11210     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11211     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11212     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11213     +
11214     + valid = __unionfs_d_revalidate(dentry, parent, false);
11215     + if (unlikely(!valid)) {
11216     + err = -ESTALE;
11217     + goto out;
11218     + }
11219     +
11220     + lower_dentry = unionfs_lower_dentry(dentry);
11221     +
11222     + err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
11223     + size, flags);
11224     +
11225     +out:
11226     + unionfs_check_dentry(dentry);
11227     + unionfs_unlock_dentry(dentry);
11228     + unionfs_unlock_parent(dentry, parent);
11229     + unionfs_read_unlock(dentry->d_sb);
11230     + return err;
11231     +}
11232     +
11233     +/*
11234     + * BKL held by caller.
11235     + * dentry->d_inode->i_mutex locked
11236     + */
11237     +int unionfs_removexattr(struct dentry *dentry, const char *name)
11238     +{
11239     + struct dentry *lower_dentry = NULL;
11240     + struct dentry *parent;
11241     + int err = -EOPNOTSUPP;
11242     + bool valid;
11243     +
11244     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11245     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11246     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11247     +
11248     + valid = __unionfs_d_revalidate(dentry, parent, false);
11249     + if (unlikely(!valid)) {
11250     + err = -ESTALE;
11251     + goto out;
11252     + }
11253     +
11254     + lower_dentry = unionfs_lower_dentry(dentry);
11255     +
11256     + err = vfs_removexattr(lower_dentry, (char *) name);
11257     +
11258     +out:
11259     + unionfs_check_dentry(dentry);
11260     + unionfs_unlock_dentry(dentry);
11261     + unionfs_unlock_parent(dentry, parent);
11262     + unionfs_read_unlock(dentry->d_sb);
11263     + return err;
11264     +}
11265     +
11266     +/*
11267     + * BKL held by caller.
11268     + * dentry->d_inode->i_mutex locked
11269     + */
11270     +ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
11271     +{
11272     + struct dentry *lower_dentry = NULL;
11273     + struct dentry *parent;
11274     + int err = -EOPNOTSUPP;
11275     + char *encoded_list = NULL;
11276     + bool valid;
11277     +
11278     + unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
11279     + parent = unionfs_lock_parent(dentry, UNIONFS_DMUTEX_PARENT);
11280     + unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
11281     +
11282     + valid = __unionfs_d_revalidate(dentry, parent, false);
11283     + if (unlikely(!valid)) {
11284     + err = -ESTALE;
11285     + goto out;
11286     + }
11287     +
11288     + lower_dentry = unionfs_lower_dentry(dentry);
11289     +
11290     + encoded_list = list;
11291     + err = vfs_listxattr(lower_dentry, encoded_list, size);
11292     +
11293     +out:
11294     + unionfs_check_dentry(dentry);
11295     + unionfs_unlock_dentry(dentry);
11296     + unionfs_unlock_parent(dentry, parent);
11297     + unionfs_read_unlock(dentry->d_sb);
11298     + return err;
11299     +}
11300     diff --git a/include/linux/fs_stack.h b/include/linux/fs_stack.h
11301     index da317c7..64f1ced 100644
11302     --- a/include/linux/fs_stack.h
11303     +++ b/include/linux/fs_stack.h
11304     @@ -1,7 +1,19 @@
11305     +/*
11306     + * Copyright (c) 2006-2009 Erez Zadok
11307     + * Copyright (c) 2006-2007 Josef 'Jeff' Sipek
11308     + * Copyright (c) 2006-2009 Stony Brook University
11309     + * Copyright (c) 2006-2009 The Research Foundation of SUNY
11310     + *
11311     + * This program is free software; you can redistribute it and/or modify
11312     + * it under the terms of the GNU General Public License version 2 as
11313     + * published by the Free Software Foundation.
11314     + */
11315     +
11316     #ifndef _LINUX_FS_STACK_H
11317     #define _LINUX_FS_STACK_H
11318    
11319     -/* This file defines generic functions used primarily by stackable
11320     +/*
11321     + * This file defines generic functions used primarily by stackable
11322     * filesystems; none of these functions require i_mutex to be held.
11323     */
11324    
11325     diff --git a/include/linux/magic.h b/include/linux/magic.h
11326     index 1e5df2a..01ee54d 100644
11327     --- a/include/linux/magic.h
11328     +++ b/include/linux/magic.h
11329     @@ -50,6 +50,8 @@
11330     #define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
11331     #define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
11332    
11333     +#define UNIONFS_SUPER_MAGIC 0xf15f083d
11334     +
11335     #define SMB_SUPER_MAGIC 0x517B
11336     #define USBDEVICE_SUPER_MAGIC 0x9fa2
11337     #define CGROUP_SUPER_MAGIC 0x27e0eb
11338     diff --git a/include/linux/namei.h b/include/linux/namei.h
11339     index eba45ea..8e19e9c 100644
11340     --- a/include/linux/namei.h
11341     +++ b/include/linux/namei.h
11342     @@ -81,8 +81,11 @@ extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
11343    
11344     extern struct file *lookup_instantiate_filp(struct nameidata *nd, struct dentry *dentry,
11345     int (*open)(struct inode *, struct file *));
11346     +extern void release_open_intent(struct nameidata *);
11347    
11348     extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
11349     +extern struct dentry *lookup_one_len_nd(const char *, struct dentry *, int,
11350     + struct nameidata *nd);
11351    
11352     extern int follow_down_one(struct path *);
11353     extern int follow_down(struct path *);
11354     diff --git a/include/linux/splice.h b/include/linux/splice.h
11355     index 997c3b4..54f5501 100644
11356     --- a/include/linux/splice.h
11357     +++ b/include/linux/splice.h
11358     @@ -81,6 +81,11 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
11359     struct splice_pipe_desc *);
11360     extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
11361     splice_direct_actor *);
11362     +extern long vfs_splice_from(struct pipe_inode_info *pipe, struct file *out,
11363     + loff_t *ppos, size_t len, unsigned int flags);
11364     +extern long vfs_splice_to(struct file *in, loff_t *ppos,
11365     + struct pipe_inode_info *pipe, size_t len,
11366     + unsigned int flags);
11367    
11368     /*
11369     * for dynamic pipe sizing
11370     diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
11371     new file mode 100644
11372     index 0000000..c84d97e
11373     --- /dev/null
11374     +++ b/include/linux/union_fs.h
11375     @@ -0,0 +1,22 @@
11376     +/*
11377     + * Copyright (c) 2003-2009 Erez Zadok
11378     + * Copyright (c) 2005-2007 Josef 'Jeff' Sipek
11379     + * Copyright (c) 2003-2009 Stony Brook University
11380     + * Copyright (c) 2003-2009 The Research Foundation of SUNY
11381     + *
11382     + * This program is free software; you can redistribute it and/or modify
11383     + * it under the terms of the GNU General Public License version 2 as
11384     + * published by the Free Software Foundation.
11385     + */
11386     +
11387     +#ifndef _LINUX_UNION_FS_H
11388     +#define _LINUX_UNION_FS_H
11389     +
11390     +/*
11391     + * DEFINITIONS FOR USER AND KERNEL CODE:
11392     + */
11393     +# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
11394     +# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
11395     +
11396     +#endif /* _LINUX_UNION_FS_H */
11397     +
11398     diff --git a/security/security.c b/security/security.c
11399     index 4ba6d4c..093d8b4 100644
11400     --- a/security/security.c
11401     +++ b/security/security.c
11402     @@ -520,6 +520,7 @@ int security_inode_permission(struct inode *inode, int mask)
11403     return 0;
11404     return security_ops->inode_permission(inode, mask, 0);
11405     }
11406     +EXPORT_SYMBOL(security_inode_permission);
11407    
11408     int security_inode_exec_permission(struct inode *inode, unsigned int flags)
11409     {