1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-19 10:53:58 +00:00
Commit Graph

14059 Commits

Author SHA1 Message Date
Konstantin Belousov
271ab2406f For sigaction(2), ignore possible garbage in sa_flags for sa_handler
== SIG_DFL or SIG_IGN.  Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.

Reported and tested by:	Alex Tutubalin <lexa@lexa.ru>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-16 07:06:58 +00:00
Hans Petter Selasky
1a26c3c047 Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API are working. The manual page has
  been made function oriented to make it easier to deduce how each of
  the functions making up the callout API are working without having
  to first read the whole manual page. Group all functions into a
  handful of sections which should give a quick top-level overview
  when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
  to reduce the complexity in the callout code and to avoid problems
  about atomically stopping callouts via callout_stop(). If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout like done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed. Currently
  they are marked as available and will be cleaned up in a follow up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There has been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
Robert Watson
3d1a9ed34e In order to support ongoing work to implement variable-size mbufs, and
more generally make it easier to extend 'struct mbuf in the future', make
a number of changes to the data structure:

- As we anticipate embedding mbufs headers within variable-size regions of
  memory in the future, change the definitions of byte arrays embedded in
  mbufs to be of size [0] rather than [MLEN] and [MHLEN].  In fact, the
  cxgbe driver already uses 'struct mbuf' on the front of other storage
  sizes, but we would like the global mbuf allocator do be able to do this
  as well.

- Fold 'struct m_hdr' into 'struct mbuf' itself, eliminating a set of
  macros that aliased 'mh_foo' field names to 'm_foo' names such as
  'm_next'.  These present a particular problem as we would like to add
  new mbuf-header fields -- e.g., 'm_size' -- that, if similarly named via
  macros, would introduce collisions with many other variable names in the
  kernel.

- Rename 'struct m_ext' to 'struct struct_m_ext' so that we can add
  compile-time assertions without bumping into the still-extant 'm_ext'
  macro.

- Remove the MSIZE compile-time assertion for 'struct mbuf', but add new
  assertions for alignment of embedded data arrays (64-bit alignment even
  on 32-bit platforms), and for the sizes the mbuf header, packet header,
  and m_ext structure.

- Document that these assertions exist in comments in mbuf.h.

This change is not intended to cause (non-trivial) behavioural
differences, but is a precursor to further mbuf-allocator work.

Differential Revision:	https://reviews.freebsd.org/D1483
Reviewed by:	bz, gnn, np, glebius ("go ahead, I trust you")
Sponsored by:	EMC / Isilon Storage Division
2015-01-14 23:44:00 +00:00
Hans Petter Selasky
d2955419cd Avoid race with "dev_rel()" when using the recently added
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".

MFC after:	1 week
Reported by:	dchagin@
2015-01-14 22:07:13 +00:00
Hans Petter Selasky
07dbde6777 Add a kernel function to delist our kernel character devices, so that
the device name can be re-used right away in case we are destroying
the character devices in the background.

MFC after:	4 days
Reported by:	dchagin@
2015-01-14 14:04:29 +00:00
Jamie Gritton
6a3f277901 Remove the prison flags PR_IP4_DISABLE and PR_IP6_DISABLE, which have been
write-only for as long as they've existed.
2015-01-14 04:50:28 +00:00
Jamie Gritton
0e5e396ede Don't set prison's pr_ip4s or pr_ip6s to -1.
PR:		196474
MFC after:	3 days
2015-01-14 03:52:41 +00:00
Konstantin Belousov
18cc2ff047 Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem
does not access kernel mappings directly.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-12 08:58:07 +00:00
Ian Lepore
f8f02928d3 Fix an off by one in ppsratecheck(). If you asked for N=1 you'd get one,
but for any N>1 you'd get N-1 packets/events per second.
2015-01-11 20:48:29 +00:00
Robert Watson
a5e9a6d56f Garbage collect m_copymdata(), an mbuf utility routine introduced
in FreeBSD 7 that has not been used since.  It contains a number
of unresolved bugs including an inverted bcopy() and incorrect
handling of read-only mbufs using internal storage.  Removing this
unused code is substantially essier than fixing it in order to
update it to the coming mbuf world order -- but it can always be
restored from revision history if it turns out to prove useful for
future work.

Pointed out by:	jmallett
Sponsored by:	EMC / Isilon Storage Division
2015-01-10 10:41:23 +00:00
Dmitry Chagin
47ce3e5326 Allow clock_getcpuclockid() on the CPU-time clock for zombie process.
Posix does not prohibit this.

Differential Revision:	https://reviews.freebsd.org/D1470

Reviewed by:	kib
MFC after:	1 week
2015-01-10 07:22:38 +00:00
Xin LI
6e19f0def0 Improve style and fix a possible use-after-free case introduced in r268384
by reinitializing the 'freestate' pointer after freeing the memory.

Obtained from:	HardenedBSD (71fab80c5dd3034b71a29a61064625018671bbeb)
PR:		194525
Submitted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
MFC after:	2 weeks
2015-01-10 06:48:35 +00:00
Robert Watson
3df42f654e Remove a 'This is dumb' comment that has been incorrect for at least a
decade: m_pulldown() is willing to consider ordinary mbufs writable.
Retain another, related, and also outdated comment, but with a caveat
that it is partially stale.  Do not, for now, address the problem that
it raises (that only EXT_CLUSTER external storage is considered
writable, regardless of the results of M_WRITABLE() on the mbuf).

MFC after:	3 days
Sponsored by:	EMC / Isilon Storage Division
2015-01-09 12:08:51 +00:00
John Baldwin
7a635f0d70 Change the default method for device_quiesce() to return 0 instead of
EOPNOTSUPP.  The current behavior can mask real quiesce errors since
devclass_quiesce_driver() stops iterating over drivers as soon as it
gets an error (incluiding EOPNOTSUPP), but the caller it returns the
error to explicitly ignores EOPNOTSUPP.

Reviewed by:	imp
2015-01-08 21:46:28 +00:00
John Baldwin
bbf686ed60 Reject attempts to read the cpuset mask of a negative domain ID. 2015-01-08 19:11:14 +00:00
John Baldwin
c0ae66888b Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision:	https://reviews.freebsd.org/D1232
Submitted by:	jeff (kernel bits)
Reviewed by:	adrian, jeff
2015-01-08 15:53:13 +00:00
Robert Watson
b66f2a48e6 Replace hand-crafted versions of M_SIZE() and M_START() in uipc_mbuf.c
with calls to the centralised macros, reducing direct use of MLEN and
MHLEN.

Differential Revision:	https://reviews.freebsd.org/D1444
Reviewed by:	bz
Sponsored by:	EMC / Isilon Storage Division
2015-01-08 11:16:21 +00:00
Mark Johnston
bdb9ab0dd9 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
Mark Johnston
bbd685e3a5 Use crcopysafe(9) to make a copy of a process' credential struct. crcopy(9)
may perform a blocking memory allocation, which is unsafe when holding a
mutex.

Differential Revision:	https://reviews.freebsd.org/D1443
Reviewed by:	rwatson
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-01-05 23:07:22 +00:00
John Baldwin
531d65e139 Trim trailing whitespace. 2015-01-05 20:50:44 +00:00
John Baldwin
92597e064b On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST).  Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR:		192316
Differential Revision:	https://reviews.freebsd.org/D1441
No objection from:	jkim
MFC after:	1 week
2015-01-05 20:44:44 +00:00
Robert Watson
ed6a66ca6c To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision:	https://reviews.freebsd.org/D1436
Reviewed by:	glebius, trasz, gnn, bz
Sponsored by:	EMC / Isilon Storage Division
2015-01-05 09:58:32 +00:00
Justin T. Gibbs
5b326a32ce Prevent live-lock and access of destroyed data in taskqueue_drain_all().
Phabric:	https://reviews.freebsd.org/D1247
Reviewed by:	jhb, avg
Sponsored by:	Spectra Logic Corporation

sys/kern_subr_taskqueue.c:
	Modify taskqueue_drain_all() processing to use a temporary
	"barrier task", rather than rely on a user task that may
	be destroyed during taskqueue_drain_all()'s execution.  The
	barrier task is queued behind all previously queued tasks
	and then has its priority elevated so that future tasks
	cannot pass it in the queue.

	Use a similar barrier scheme to drain threads processing
	current tasks.  This requires taskqueue_run_locked() to
	insert and remove the taskqueue_busy object for the running
	thread for every task processed.

share/man/man9/taskqueue.9:
	Remove warning about live-lock issues with taskqueue_drain_all()
	and indicate that it does not wait for tasks queued after
	it begins processing.
2015-01-04 19:55:44 +00:00
Dmitry Chagin
1beb1a8e13 Regen for r276654 (__getcwd()). 2015-01-04 10:40:23 +00:00
Dmitry Chagin
9f7a06f27e Indeed, instead of hiding the kern___getcwd() bug by bogus cast
in r276564, change path type to char * (pathnames are always char *).
And remove bogus casts of malloc().
kern___getcwd() internally doesn't actually use or support u_char *
paths, except to copy them to a normal char * path.

These changes are not visible to libc as libc/gen/getcwd.c misdeclares
__getcwd() as taking a plain char * path.

While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as
we always have sysproto.h.

Pointed out by:	bde

MFC after:	1 week
2015-01-04 10:34:02 +00:00
Hans Petter Selasky
04a8159ddf Rework r276532 a bit. Always avoid recursing into the console drivers
clients, hence they might not handle it very well. This change allows
debugging mutex problems with kernel console drivers when
"debug.witness.skipspin=0" is set in the boot environment.

MFC after:	1 week
2015-01-03 17:21:19 +00:00
Hans Petter Selasky
2029b6c9e6 The "cnputs_mtx" mutex must be allowed to recurse. Debug prints and/or
witness printouts in the console driver clients can cause this mutex
to recurse by calls to "printf()" from witness for example. In
particular this can happen if "debug.witness.skipspin=0" is set in the
boot environment.

MFC after:	1 week
2015-01-02 13:10:33 +00:00
Mateusz Guzik
af77c1a620 Convert vfs hash lock from a mutex to an rwlock. 2014-12-30 21:40:45 +00:00
Warner Losh
12d7eaa009 Turns out, this isn't only called from i386... 2014-12-30 02:39:47 +00:00
Mateusz Guzik
84267cacf4 sysctl: don't modify oid_running for static nodes
It is necessary to prevent nodes from being destroyed while used, but static
ones cannot be destroyed.
2014-12-28 19:24:01 +00:00
Rick Macklem
46b34e5adc Fix the comment introduced in r276192 so that it clearly
states that the change is needed to avoid a deadlock.

Suggested by:	kib
MFC after:	1 week
2014-12-25 14:44:04 +00:00
Rick Macklem
65cb225c5e Modify vop_stdadvlock{async}() so that it only
locks/unlocks the vnode and does a VOP_GETATTR()
for the SEEK_END case. This is safe to do, since
lf_advlock{async}() only uses the size argument
for the SEEK_END case.
The NFSv4 server needs this when
vfs.nfsd.enable_locallocks!=0 since locking the
vnode results in a LOR that can cause a deadlock
for the nfsd threads.

Reviewed by:	kib
MFC after:	1 week
2014-12-24 22:58:08 +00:00
Gleb Smirnoff
53b680caa2 In sbappend*() family of functions clear M_PROTO flags of incoming
mbufs. sbappendstream() already does this in m_demote().

PR:		196174
Sponsored by:	Nginx, Inc.
2014-12-22 15:39:24 +00:00
Konstantin Belousov
8ee9765a9d Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that
the created file name was cached.  Use the flag for core dumps.

Requested by:	rpaulo
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:32:07 +00:00
Warner Losh
61f26cae7d Where appropriate, use the modern terms for the one true time base
(UTC) rather than the archaic (GMT) in comments. Except where the
comments are making fun of people doing this (and pedants who insist
on the new terms).
2014-12-21 05:07:11 +00:00
Gleb Smirnoff
e834a84026 Revert r274494, r274712, r275955 and provide extra comments explaining
why there could appear a zero-sized mbufs in socket buffers.

A proper fix would be to divorce record socket buffers and stream
socket buffers, and divorce pru_send that accepts normal data from
pru_send that accepts control data.
2014-12-20 22:12:04 +00:00
Gleb Smirnoff
b7413232f6 Add to sbappendstream_locked() a check against NULL mbuf, like it is done
in sbappend_locked() and sbappendrecord_locked().

This is a quick fix to the panic introduced by r274712.

A proper solution should be to make sosend_generic() avoid calling
pru_send() with NULL mbuf for the protocols that do not understand
control messages. Those protocols that understand control messages,
should be able to receive NULL mbuf, if control is non-NULL.
2014-12-20 14:19:46 +00:00
Konstantin Belousov
6c21f6edb8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
Gleb Kurtsou
dde58752db Adjust printf format specifiers for dev_t and ino_t in kernel.
ino_t and dev_t are about to become uint64_t.

Reviewed by:	kib, mckusick
2014-12-17 07:27:19 +00:00
Konstantin Belousov
5b73811feb Add missed break.
CID:	1258587
Sponsored by:	The FreeBSD Foundation
MFC after:	20 days
2014-12-16 09:49:07 +00:00
Konstantin Belousov
917dd39084 Add missed break.
CID:	1258586
Sponsored by:	The FreeBSD Foundation
MFC after:	4 days
2014-12-16 09:48:23 +00:00
John Baldwin
5ad25ceb41 Check for SS_NBIO in so->so_state instead of sb->sb_flags in
soreceive_stream().

Differential Revision:	https://reviews.freebsd.org/D1299
Reviewed by:	bz, gnn
MFC after:	1 week
2014-12-15 17:52:08 +00:00
Konstantin Belousov
237623b028 Add a facility for non-init process to declare itself the reaper of
the orphaned descendants.  Base of the API is modelled after the same
feature from the DragonFlyBSD.

Requested by:	bapt
Reviewed by:	jilles (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-15 12:01:42 +00:00
Konstantin Belousov
83d7eb544c Fix gcc build.
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2014-12-14 08:43:13 +00:00
Dmitry Chagin
fd07ddcf6f Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9).
A _NEW flag passed to _init_flags() to avoid check for double-init.

Differential Revision:	https://reviews.freebsd.org/D1208
Reviewed by:	jhb, wblock
MFC after:	1 Month
2014-12-13 21:00:10 +00:00
Konstantin Belousov
6ddcc23386 Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.

Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC.  SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process.  Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume.  It
is used for debugging; should be removed after the real use of the
interface is added.

In collaboration with:	pho
Discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-13 16:18:29 +00:00
Konstantin Belousov
0061ddb3ed Only sleep interruptible while waiting for suspension end when
filesystem specified VFCF_SBDRY flag, i.e. for NFS.

There are two issues with the sleeps.  First, applications may get
unexpected EINTR from the disk i/o syscalls.  Second, interruptible
sleep allows the stop of the process, and since mount point is
referenced while thread sleeps, unmount cannot free mount point
structure' memory, blocking unmount indefinitely.

Even for NFS, it is probably only reasonable to enable PCATCH for intr
mounts, but this information is currently not available at VFS level.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-13 16:07:01 +00:00
Konstantin Belousov
ea117d1735 The vinactive() call in vgonel() may start writes for the dirty pages,
creating delayed write buffers belonging to the reclaimed vnode.  Put
the buffer cleanup code after inactivation.

Add asserts that ensure that buffer queues are empty and add BO_DEAD
flag for bufobj to check that no buffers are added after the cleanup.
BO_DEAD is only used by INVARIANTS-enabled kernels.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-13 16:02:37 +00:00
Konstantin Belousov
fe21241ee0 For architectures where time_t is wide enough, in particular, 64bit
platforms, avoid overflow after year 2038 in clock_ct_to_ts().

PR:	195868
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-12 09:37:18 +00:00
Konstantin Belousov
b2344ab5ff Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.
Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all.  E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish.  On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour.  Note that it is still racy, since writes are not
stopped.

Discussed with:	avg, bjk, mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-09 10:00:47 +00:00