The kernel handles the managment of UFS/FFS snapshots. Since UFS/FFS
updates filesystem data (rather than always writing changes to new
locations like ZFS), the kernel must check every filesystem write
to see if the block being written is part of a snapshot. If it is
part of a snapshot, then the kernel must make a copy of the old
block value into a newly allocated block for the snapshot before
allowing the write to be done. Similarly, if a block is being freed,
the kernel must check to see if it is part of a snapshot and let
the snapshot claim the block rather than freeing it for future use.
When a snapshot is freed, its blocks need to be offered to older
snapshots and freed only if no older snapshots wish to claim them.
When snapshots were added to UFS/FFS they were integrated into soft
updates and just a small part of the management of snapshots needed
to be added to fsck_ffs(8) as soft updates minimized the set of
snapshot changes that might need correction. When journaling was
added to soft updates a much more complete knowledge of snapshots
needed to be added to fsck_ffs(8) for it to be able to properly
handle the filesystem changes that a journal rollback needs to do
(specifically the freeing and allocation of blocks). Since this
functionality was unavailable, the use of snapshots was disabled
when running with journaled soft updates.
This set of changes imports the kernel code for the management of
snapshots to fsck_ffs(8). With this code in place it will become
possible to enable snapshots when running with journalled soft
updates. The most immediate benefit will be the ability to use
snapshots to take consistent filesystem dumps on live filesystems.
Future work will be done to update fsck_ffs(8) to be able to use
snapshots to run in background on live filesystems running with
journaled soft updates.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36491
If the root directory exists but has a bad block number Pass1 will
accept it and setup an inoinfo structure for it. When Pass2 runs
and cannot read the root inode's content because of a bad (or
duplicate) block number, it removes the bad root inode and replaces
it. As part of creating the replacement root inode, it creates an
inoinfo entry for it. But Pass2 did delete the inoinfo entry that
Pass1 had set up for the root inode so ended up with two inoinfo
structures for it. The final step of Pass2 checks that all the ".."
entries are correct adding them if they are missing which resulted
in a second ".." entry being added to the root directory which
definitely did not go over well in the kernel name cache!
Reported by: Peter Holm
Sponsored by: The FreeBSD Foundation
into ffs_sbsearch() to allow use by other parts of the system.
Historically only fsck_ffs(8), the UFS filesystem checker, had code
to track down and use alternate UFS superblocks. Since fsdb(8) used
much of the fsck_ffs(8) implementation it had some ability to track
down alternate superblocks.
This change extracts the code to track down alternate superblocks
from fsck_ffs(8) and puts it into a new function ffs_sbsearch() in
sys/ufs/ffs/ffs_subr.c. Like ffs_sbget() and ffs_sbput() also found
in ffs_subr.c, these functions can be used directly by the kernel
subsystems. Additionally they are exported to the UFS library,
libufs(8) so that they can be used by user-level programs. The new
functions added to libufs(8) are sbfind(3) that is an alternative
to sbread(3) and sbsearch(3) that is an alternative to sbget(3).
See their manual pages for further details.
The utilities that have been changed to search for superblocks are
dumpfs(8), fsdb(8), ffsinfo(8), and fsck_ffs(8). Also, the prtblknos(8)
tool found in tools/diag/prtblknos searches for superblocks.
The UFS specific mount code uses the superblock search interface
when mounting the root filesystem and when the administrator doing
a mount(8) command specifies the force flag (-f). The standalone UFS
boot code (found in stand/libsa/ufs.c) uses the superblock search
code in the hope of being able to get the system up and running so
that fsck_ffs(8) can be used to get the filesystem cleaned up.
The following utilities have not been changed to search for
superblocks: clri(8), tunefs(8), snapinfo(8), fstyp(8), quot(8),
dump(8), fsirand(8), growfs(8), quotacheck(8), gjournal(8), and
glabel(8). When these utilities fail, they do report the cause of
the failure. The one exception is the tasting code used to try and
figure what a given disk contains. The tasting code will remain
silent so as not to put out a slew of messages as it trying to taste
every new mass storage device that shows up.
Reviewed by: kib
Reviewed by: Warner Losh
Tested by: Peter Holm
Differential Revision: https://reviews.freebsd.org/D36053
Sponsored by: The FreeBSD Foundation
The cleanup of fsck_ffs(8) in commit c0bfa109b9 broke fsdb(8).
This commit adds the one-line update needed in fsdb(8) to make it
work with the new fsck_ffs(8) structure.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 3 days
Part of the problem was that fsck_ffs would read the superblock
multiple times complaining and repairing the superblock check hash
each time and then at the end failing to write out the superblock
with the corrected check hash. This fix reads the superblock just
once and if the check hash is corrected ensures that the fixed
superblock gets written.
Tested by: Peter Holm
PR: 245916
MFC after: 1 week
Sponsored by: Netflix
Several large data structures are allocated by fsck_ffs to track
resource usage. Most but not all were deallocated at the end of
checking each filesystem. This commit consolidates the freeing
of all data structures in one place and adds one that had previously
been missing.
It is important to clean up these data structures as they can be
large. If the previous allocations have not been freed, fsck_ffs
can run out of address space when many large filesystems are being
checked. An alternative would be to fork a new instance of fsck_ffs
for each filesystem to be checked, but we choose to free the small
set of large structures to save the fork overhead.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 7 days
Sponsored by: Netflix
making fsck_ffs(8) run faster, there should be no functional change.
The original fsck_ffs(8) had its own disk I/O management system.
When gjournal(8) was added to FreeBSD 7, code was added to fsck_ffs(8)
to do the necessary gjournal rollback. Rather than use the existing
fsck_ffs(8) disk I/O system, it wrote its own from scratch. Similarly
when journalled soft updates were added in FreeBSD 9, code was added
to fsck_ffs(8) to do the necessary journal rollback. And once again,
rather than using either of the existing fsck_ffs(8) disk I/O
systems, it wrote its own from scratch. Lastly the fsdb(8) utility
uses the fsck_ffs(8) disk I/O management system. In preparation for
making the changes necessary to enable snapshots to be taken when
using journalled soft updates, it was necessary to have a single
disk I/O system used by all the various subsystems in fsck_ffs(8).
This commit merges the functionality required by all the different
subsystems into a single disk I/O system that supports all of their
needs. In so doing it picks up optimizations from each of them
with the results that each of the subsystems does fewer reads and
writes than it did with its own customized I/O system. It also
greatly simplifies making changes to fsck_ffs(8) since everything
goes through a single place. For example the ginode() function
fetches an inode from the disk. When inode check hashes were added,
they previously had to be checked in the code implementing inode
fetch in each of the three different disk I/O systems. Now they
need only be checked in ginode().
Tested by: Peter Holm
Sponsored by: Netflix
The new name more accurately describes what it does and the file move
puts it with other similar functions. Done in preparation for future
cleanups. No functional differences intended.
Sponsored by: Netflix
Historic Footnote: my last FreeBSD svn commit
This one is also a small list:
- 3x duplicate definition (ufs2_zino, returntosingle, nflag)
- 5x 'needs extern', 3/5 of which are referenced in fsdb
-fno-common will become the default in GCC10/LLVM11.
MFC after: 1 week
directory entries that is caused by uninitialized directory entry
padding written to the disk. It can be viewed by any user with read
access to that directory. Up to 3 bytes of kernel stack are disclosed
per file entry, depending on the the amount of padding the kernel
needs to pad out the entry to a 32 bit boundry. The offset in the
kernel stack that is disclosed is a function of the filename size.
Furthermore, if the user can create files in a directory, this 3
byte window can be expanded 3 bytes at a time to a 254 byte window
with 75% of the data in that window exposed. The additional exposure
is done by removing the entry, creating a new entry with a 4-byte
longer name, extracting 3 more bytes by reading the directory, and
repeating until a 252 byte name is created.
This exploit works in part because the area of the kernel stack
that is being disclosed is in an area that typically doesn't change
that often (perhaps a few times a second on a lightly loaded system),
and these file creates and unlinks themselves don't overwrite the
area of kernel stack being disclosed.
It appears that this bug originated with the creation of the Fast
File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and
is likely present in every Unix or Unix-like system that uses
UFS/FFS. Amazingly, nobody noticed until now.
This update also adds the -z flag to fsck_ffs to have it scrub
the leaked information in the name padding of existing directories.
It only needs to be run once on each UFS/FFS filesystem after a
patched kernel is installed and running.
Submitted by: David G. Lawrence <dg@dglawrence.com>
Reviewed by: kib
MFC after: 1 week
last allocated block of the file and if that is found, shortens the
file to reference the last allocated block thus avoiding having it
reference a hole at its end.
This update corrects an error where fsck_ffs miscalculated the last
logical block of the file when the file contained a large hole.
Reported by: Jamie Landeg-Jones
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: Netflix
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).
Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.
Submitted by: Chuck Silvers <chs@netflix.com>
Tested by: Chuck Silvers <chs@netflix.com>
MFC after: 1 week
Sponsored by: Netflix
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.
Convert the utilities that had been using the undocumented
interface to use the new documented interface.
No functional change (as for now the libufs library does not
do inode check-hashes).
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
Specifically reading is done if ffs_sbget() and writing is done
in ffs_sbput(). These functions are exported to libufs via the
sbget() and sbput() functions which then used in the various
filesystem utilities. This work is in preparation for adding
subperblock check hashes.
No functional change intended.
Reviewed by: kib
routine to write out the cylinder groups rather than recreating the
calculation of the cylinder-group check hash in fsck_ffs.
No functional change intended.
When the fsck_ffs program cannot fully repair a file system, it will
output the message PLEASE RERUN FSCK. However, it does not exit with a
non-zero status in this case (contradicting the man page claim that it
"exits with 0 on success, and >0 if an error occurs." The fsck
rc-script (when running "fsck -y") tests the status from fsck (which
passes along the exit status from fsck_ffs) and issues a "stop_boot"
if the status fails. However, this is not effective since fsck_ffs can
return zero even on (some) errors. Effectively, it is left to a later
step in the boot process when the file systems are mounted to detect
the still-unclean file system and stop the boot.
This change modifies fsck_ffs so that when it cannot fully repair the
file system and issues the PLEASE RERUN FSCK message it also exits
with a non-zero status.
While here, the fsck_ffs man page has also been updated to document
the failing exit status codes used by fsck_ffs. Previously, only exit
status 7 was documented. Some of these exit statuses are tested for in
the fsck rc-script, so they are clearly depended upon and deserve
documentation.
Reviewed by: mckusick, vangyzen, jilles (manpages)
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D13862
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Instead of casting listmax and numdirs to unsigned values just define
them as unsigned and avoid the casts. Use reallocarray(3).
While here, fs_ncg is already unsigned so the cast is unnecessary.
Reviewed by: mckusick
MFC after: 2 weeks
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
alternate superblock location when given in the -b option. When int
is 32-bits, block numbers larger than 2^32 would get truncated. This
commit changes the storage fpr the alternate superblock location
to a ufs2_daddr_t.
Submitted by: Dmitry Sivachenko <trtrmitya@gmail.com>
errors have been detected in a particular run.
Clean up the global state variables so that a restart can happen correctly.
Separate the global variables in fsck_ffs and fsdb to their own file. This
fixes header sharing with fscd.
Correctly initialize, static-ize, and remove global variables as needed in
dir.c. This fixes a problem with lost+found directories that was causing
a segfault.
Correctly initialize, static-ize, and remove global variables as needed in
suj.c.
Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode
Submitted by: scottl, max
Reviewed by: max, mckusick (earlier version)
Obtained from: Netflix
MFC after: 3 days
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb were to significant for this type of change.
Submitter of this changeset has been notified and hopefully this can be
restored soon.
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.
This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker
Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 4 weeks
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
Reviewed by: mckusick@
MFC after: 3 weeks
Due to UFS insistence to pretend that device sector size is 512 bytes,
sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices,
and from the label otherwise. The file images without label have to
be made with 512 sector size.
In collaboration with: pho
Reviewed by: jeff
Tested by: bz, pho
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
robust. With these changes fsck is now able to detect and reliably
rebuild corrupted cylinder group maps. The -D option is no longer
necessary as it has been replaced by a prompt asking whether the
corrupted cylinder group should be rebuilt and doing so when requested.
These actions are only offered and taken when running fsck in manual
mode. Corrupted cylinder groups found during preen mode cause the fsck
to fail.
Add the -r option to free up excess unused inodes. Decreasing the
number of preallocated inodes reduces the running time of future
runs of fsck and frees up space that can allocated to files. The -r
option is ignored when running in preen mode.
Reviewed by: Xin LI <delphij@>
Sponsored by: Rsync.net