1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-19 10:53:58 +00:00
Commit Graph

14059 Commits

Author SHA1 Message Date
Konstantin Belousov
a77c72f5ae Apply chunk forgotten in r275620. Remove local variable for real.
CID:	1257462
Sponsored by:	The FreeBSD Foundation
2014-12-09 09:36:28 +00:00
Konstantin Belousov
a25100c539 Add functions syncer_suspend() and syncer_resume(), which are supposed
to be called before suspension and after resume, correspondingly.  The
syncer_suspend() ensures that all filesystems dirty data and metadata
are saved to the permanent storage, and stops kernel threads which
might modify filesystems.  The syncer_resume() restores stopped
threads.

For now, only syncer is stopped.  This is needed, because each sync
loop causes superblock updates for UFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:48:57 +00:00
Konstantin Belousov
904ed548bb When getnewbuf_reuse_bp() is called to reclaim some (clean) buffer,
the vnode owning the buffer is not locked.  More, it cannot be locked
safely, since getnewbuf_reuse_bp() is called from newbuf(), and some
other vnode is already locked, for which reused buffer will be
reassigned.

As the consequence, reclamation of the owning vnode could go in
parallel, in particular, the call to vnode_destroy_vobject(), which
deallocates the vm object and zeroes the v_bufobj->bo_object.  Note
that the pages wired by the buffer are left wired and can be safely
freed by the vfs_vmio_release() without the need for the vm object
lock.  Also, seeing stale pointer to the v_object is safe due to vm
object type stability.

Check for bo_bufobj != NULL and cache the value in local variable to
avoid trying to lock NULL vm object.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:42:34 +00:00
Konstantin Belousov
07a9368a48 Do some refactoring and minor cleanups of the thread_single() code in
preparation for the global stop commit.

Move the code to weed suspended or sleeping threads into the
appropriate state, into the helper weed_inhib().  Current code already
has deep nesting and hard to follow [1].

Add currently useless helper remain_for_mode(), which returns the
count of threads which are allowed to run, according to the
single-threading mode.

In thread_single_end(), do not save curthread into local variable, it
is unused after, except to find curproc.

Remove stray empty line.

Requested by:	avg [1]
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:27:43 +00:00
Konstantin Belousov
8638fe7bea Thread waiting for the vfork(2)-ed child to exec or exit, must allow
for the suspension.

Currently, the loop performs uninterruptible cv_wait(9) call, which
prevents suspension until child allows further execution of parent.
If child is stopped, suspension or single-threading is delayed
indefinitely.

Create a helper thread_suspend_check_needed() to identify the need for
a call to thread_suspend_check().  It is required since call to the
thread_suspend_check() cannot be safely done while owning the child
(p2) process lock.  Only when suspension is needed, drop p2 lock and
call thread_suspend_check().  Perform wait for cv with timeout, in
case suspend is requested after wait started; I do not see a better
way to interrupt the wait.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:18:05 +00:00
Konstantin Belousov
aba1ca528e When process is exiting, check for suspension regardless of
multithreaded status of the process.

The stopped state must be cleared before P_WEXIT is set.  A stop
signal delivered just before first PROC_LOCK() block in exit1(9) would
put the process into pending stop with P_WEXIT set or assertion
triggered.  Also recheck for the suspension after failed
thread_single(9) call, since process lock could be dropped.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:02:02 +00:00
Andriy Gapon
036a8c5dac remove opensolaris cyclic code, replace with high-precision callouts
In the old days callout(9) had 1 tick precision and that was inadequate
for some uses, e.g. DTrace profile module, so we had to emulate cyclic
API and behavior.  Now we can directly use callout(9) in the very few
places where cyclic was used.

Differential Revision:	https://reviews.freebsd.org/D1161
Reviewed by:	gnn, jhb, markj
MFC after:	2 weeks
2014-12-07 11:21:41 +00:00
Warner Losh
d0b6da086f Const poison in a few places to ensure we don't modify things
through the module data pointer.
2014-12-03 22:14:13 +00:00
John Baldwin
b10c08a52b Revert device_getenv_int() for now as it duplicates resource_int_value().
We should perhaps implement a device_getenv_*() and device_setenv_*() API
as a convenience wrapper on top of resource_*_value() and resource_set_*().
2014-12-03 15:29:53 +00:00
Konstantin Belousov
6afb32fc67 Disable recursion for the process spinlock.
Tested by:	pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2014-12-01 17:36:10 +00:00
Justin T. Gibbs
2c6bf3d90b Remove trailing whitespace. 2014-11-30 19:32:00 +00:00
Gleb Smirnoff
c80ea19b38 Merge from projects/sendfile:
Provide pru_ready for AF_LOCAL sockets.  Local sockets sendsdata directly
to the receive buffer of the peer, thus pru_ready also works on the peer
socket.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 13:40:58 +00:00
Gleb Smirnoff
651e4e6a30 Merge from projects/sendfile: extend protocols API to support
sending not ready data:
o Add new flag to pru_send() flags - PRUS_NOTREADY.
o Add new protocol method pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-30 13:24:21 +00:00
Gleb Smirnoff
0f9d0a73a4 Merge from projects/sendfile:
o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are now being populated by some I/O in background and are
referenced outside.  This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for ""claimed character count", or "committed
  character count".  And the sb_acc is "available character count".
  Consumers of socket buffer API shouldn't already access them directly,
  but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
  search.
o New function sbready() is provided to activate certain amount of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 12:52:33 +00:00
Gleb Smirnoff
57f43a45a3 - Move sbcheck() declaration under SOCKBUF_DEBUG.
- Improve SOCKBUF_DEBUG macros.
- Improve sbcheck().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 11:22:39 +00:00
Gleb Smirnoff
8967b220a3 Make sballoc() and sbfree() functions. Ideally, they could be marked
as static, but unfortunately Infiniband (ab)uses them.

Sponsored by:	Nginx, Inc.
2014-11-30 11:02:07 +00:00
Warner Losh
fac92ae126 The current limit of 100k for the linker hints file is getting a bit
crowded as we now are at about 70k. Bump the limit to 1MB instead
which is still quite a reasonable limit and allows for future growth
of this file and possible future expansion to additional data.

MFC After: 2 weeks
2014-11-29 17:29:30 +00:00
Konstantin Belousov
6762091ea4 Remove lock recursion for the pipe pair mutex, and disable the
recursion on mutex initialization.

The only places where the recursive acquire is performed are read and
write filters, since knlist, which uses the pipe pair mutex as lock,
is locked when filter is called.

The recursion was added in r93296, and consistent locking for
kn_fop->f_event() introduced in r133741.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2014-11-29 17:18:20 +00:00
Konstantin Belousov
70778bba03 Assert the state of the process lock and sigact mutex in
kern_sigprocmask() and reschedule_signals().

Discussed with:	rea
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-28 10:20:00 +00:00
Hans Petter Selasky
50ae6690fc Style changes:
- Move two IOCTL related defines to the top of the C-file
- Add more comments describing the recently added IOCTL small size and
small align macros
2014-11-28 09:32:07 +00:00
Alfred Perlstein
56c14bca7e Make igb and ixgbe check tunables at probe time.
This allows one to make a kernel module to tune the
number of queues before the driver loads.

This is needed so that a module at SI_SUB_CPU can set
tunables for these drivers to take.  Otherwise getenv
is called too early by the TUNABLE macros.

Reviewed by: smh
Phabric: https://reviews.freebsd.org/D1149
2014-11-26 20:19:36 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Konstantin Belousov
e442f29f08 Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.

Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().

Submitted by:	rea
MFC after:	1 week
2014-11-26 14:09:04 +00:00
John Baldwin
a2d751936b Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add
a new per-device '%domain' sysctl node that returns the NUMA domain a
device is associated with if it is associated with one.

Note that this API is still a WIP and might change before 11.0 actually
ships.

Differential Revision:	https://reviews.freebsd.org/D930
Reviewed by:	kib, adrian
2014-11-24 19:55:45 +00:00
John Baldwin
20abb66ede Properly initialize the capability rights for vnodes exported to procstat
that aren't for file descriptors (cwd, jdir, tracevp, etc.).

Submitted by:	Mikhail <mp@lenta.ru>
2014-11-24 18:34:11 +00:00
Gleb Smirnoff
90effb2341 Merge from projects/sendfile:
o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but
  doesn't sleep. It returns immediately, and will execute the I/O done handler
  function that must be supplied as argument.
o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager.
o Extend pagertab to support pgo_getpages_async method, and implement this
  method for vnode_pager.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-23 12:01:52 +00:00
Mateusz Guzik
dff9862c0e ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct
Change racct_ create and destroy to macros evaluating to nothing without RACCT
so that their callers passing ui_racct don't have to be ifdefed.
2014-11-23 08:25:44 +00:00
Mateusz Guzik
0c0d16e8ac filedesc: plug a test for impossible condition in fgetvp_rights 2014-11-23 00:12:27 +00:00
Konstantin Belousov
64779280c9 The size value should be asserted when it is known.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
2014-11-22 18:15:02 +00:00
John Baldwin
180e57e5c7 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
Gleb Smirnoff
67af272bcf Do not allocate zero-length mbuf in sosend_generic().
Found by:	pho
Sponsored by:	Nginx, Inc.
2014-11-19 14:27:38 +00:00
Zbigniew Bodek
dc61566f95 Stop using early_putc immediately after configuring console with cninit()
Early UART should be released right after system console initialization is
completed. Otherwise, after cninit() both early and system console coexist
what may lead to various issues (i.a. writing to unmapped early
UART address). This cannot be done in cninit_finish() since it can be
called late at the end of MI configuration.

Obtained from:   Semihalf
Reviewed by:     andrew
Sponsored by:    The FreeBSD Foundation
2014-11-19 14:23:29 +00:00
Warner Losh
40e6bdaf1e opt_global.h is included automatically in the build. No need to
explicitly include it in these places.

Sponsored by: Netflix
2014-11-18 17:06:56 +00:00
John-Mark Gurney
2c30bc1fcf prevent doing filter ops locking for staticly compiled filter ops...
This significantly reduces lock contention when adding/removing knotes
on busy multi-kq system...  Next step is to cache these references per
kq.. i.e. kq refs it once and keeps a local ref count so that the same
refs don't get accessed by many cpus...

only allocate a knote when we might use it...

Add a new flag, _FORCEONESHOT..  This allows a thread to force the
delivery of another event in a safe manner, say waking up an idle http
connection to force it to be reaped...

If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's
disabled, so won't be delivered anyways..

Tested by:	adrian
2014-11-16 01:18:41 +00:00
Gleb Smirnoff
8146bcfea1 - Use NULL to compare a pointer.
- Use KASSERT() instead of panic.
- Remove useless 'continue', no need to restart cycle here.

Sponsored by:	Nginx, Inc.
2014-11-14 15:44:19 +00:00
Gleb Smirnoff
6bf6b25e88 Merge from projects/sendfile:
Use sbcut_locked() instead of manually editing a sockbuf.

Sponsored by:	Nginx, Inc.
2014-11-14 15:33:40 +00:00
Konstantin Belousov
5fab60a071 In vfs_write_suspend_umnt(), if suspension cannot be established, do
not forget to restore write ops count when returning the error.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-14 11:31:10 +00:00
Gleb Smirnoff
f274e25659 There should not be zero length mbufs in socket buffers. The code comes
from r1451, and thus can't be explained.  A patch with explicit panic()
here survived all tests.

Tested by:	pho
Sponsored by:	Nginx, Inc.
2014-11-14 06:02:29 +00:00
Jung-uk Kim
db1ec81edd Correct a typo to fix chown(2). It was broken since r274476.
Pointy hat to:	kib
X-MFC-With:	r274476
2014-11-13 23:51:13 +00:00
Mateusz Guzik
eb48fbd963 filedesc: fixup fdinit to lock fdp and preapare files conditinally
Not all consumers providing fdp to copy from want files.

Perhaps these functions should be reorganized to better express the outcome.

This fixes up panics after r273895 .

Reported by:	markj
2014-11-13 21:15:09 +00:00
Konstantin Belousov
416be7a1c6 Fix assertion, &uc->uc_busy is never zero, the intent is to test the
uc_busy value, and not its address [1].

Remove the single use of the macro, write KASSERT() explicitely in the
code of umtxq_sleep_pi().

Submitted by:	Eric van Gyzen <eric@vangyzen.net> [1]
MFC after:	1 week
2014-11-13 18:51:09 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Konstantin Belousov
e64b4fa858 Do not try to dereference thread pointer when the value is not a pointer.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 17:44:35 +00:00
Konstantin Belousov
f2c1a52afb Remove fossil. It has been present in 4.4Lite2, but its use was
removed for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 17:43:37 +00:00
Dmitry Chagin
c28d9d0f9f Regen for r274462. 2014-11-13 05:28:06 +00:00
Dmitry Chagin
186d9c3473 Add the ppoll() system call.
Export kern_poll() needed by an upcoming Linuxulator change.

Differential Revision:	https://reviews.freebsd.org/D1133
Reviewed by:	kib, wblock
MFC after:	1 month
2014-11-13 05:26:14 +00:00
Konstantin Belousov
389a25c716 For posix_fallocate(2) and posix_fadvise(2), return ESPIPE when
underlying file does not have DFLAG_SEEKABLE set [1].

For posix_fallocate(2), simplify error handling logic.  Do return when
fp is not yet referenced.

Noted by:	bde [1]
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-12 17:31:38 +00:00
Gleb Smirnoff
2b21d0e883 Merge from projects/sendfile:
- Use KASSERT()s instead of panic().
- Use sbavail() instead of sb_cc.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-12 10:17:46 +00:00
Gleb Smirnoff
cfa6009e36 In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-12 09:57:15 +00:00
Gleb Smirnoff
efe28398f5 Fix build. 2014-11-11 22:08:18 +00:00
Gleb Smirnoff
0e87b36eaa Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used.  It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained.  It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by:	Netflix
2014-11-11 20:32:46 +00:00
Pawel Jakub Dawidek
5ebb15b942 Add missing privilege check when setting the dump device. Before that change it
was possible for a regular user to setup the dump device if he had write access
to the given device. In theory it is a security issue as user might get access
to kernel's memory after provoking kernel crash, but in practise it is not
recommended to give regular users direct access to storage devices.

Rework the code so that we do privileges check within the set_dumper() function
to avoid similar problems in the future.

Discussed with:	secteam
2014-11-11 04:48:09 +00:00
Konstantin Belousov
0436fcb809 When sleeping waiting for the profiling stop, always set P_STOPPROF
before dropping process lock.  Clear P_STOPPROF when doing wakeup.

Both issues caused thread to hang in stopprofclock() "stopprof" sleep.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-10 14:11:17 +00:00
Alexander V. Chernikov
5e11eb847e Finish r274118#2: commit forgotten uipc_debug.c 2014-11-06 15:17:04 +00:00
Bjoern A. Zeeb
763f2e7844 After the changes in r274118 make NOIP kernels compile by hiding an
otherwise unused variable declaration behind INET6 || INET.

MFC after:	27 days
X-MFS with:	r274118
2014-11-06 12:19:39 +00:00
Mateusz Guzik
bfda9935bd Add sysctl kern.proc.cwd
It returns only current working directory of given process which saves a lot of
overhead over kern.proc.filedesc if given proc has a lot of open fds.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified)
X-Additional:	JuniorJobs project
2014-11-06 08:12:34 +00:00
Mateusz Guzik
3ae366de58 filedesc: avoid taking fdesc_mtx when not necessary in fddrop
No functional changes.
2014-11-06 07:44:10 +00:00
Mateusz Guzik
eb6021fb96 filedesc: just free old tables without altering the list which is freed anyway
No functional changes.
2014-11-06 07:37:31 +00:00
Mateusz Guzik
a99500a912 Extend struct ucred with group table.
This saves one malloc + free with typical cases and better utilizes
memory.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified)
X-Additional:	JuniorJobs project
2014-11-05 02:08:37 +00:00
Alexander V. Chernikov
9f25cbe45e Remove old hack abusing domattach from NFS code.
According to IANA RPC uaddr registry, there are no AFs
except IPv4 and IPv6, so it's not worth being too abstract here.

Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries.
Use own initialization without relying on domattach code.

While I admit that this was one of the rare places in kernel
networking code which really was capable of doing multi-AF
without any AF-depended code, it is not possible anymore to
rely on dom* code.

While here, change terrifying "Invalid radix node head, rn:" message,
to different non-understandable "netcred already exists for given addr/mask",
but less terrifying. Since we know that rn_addaddr() returns NULL if
the same record already exists, we should provide more friendly error.

MFC after:	1 month
2014-11-05 00:58:01 +00:00
Dag-Erling Smørgrav
bccb6d5aa1 [SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2).
[SA-14:26] Fix remote command execution in ftp(1).

Approved by:	so (des)
2014-11-04 23:29:29 +00:00
John Baldwin
2cba8dd301 Add a new thread state "spinning" to schedgraph and add tracepoints at the
start and stop of spinning waits in lock primitives.
2014-11-04 16:35:56 +00:00
Hans Petter Selasky
0ecd606b24 Simplify logic a bit. Ensure data buffer is properly aligned,
especially for platforms where unaligned access is not allowed. Make
it possible to override the small buffer size.

A simple continuous read string test using libusb showed a reduction
in CPU usage from roughly 10% to less than 1% using a dual-core GHz
CPU, when the malloc() operation was skipped for small buffers.

MFC after:	2 weeks
2014-11-04 11:29:49 +00:00
Jean-Sébastien Pédron
2d6f6d6373 Enable vt(4) by default
vt(4) is a new console driver which brings features such as:
    o  Support for Unicode and double-width characters
    o  Integration with the KMS kernel video drivers
    o  Support for UEFI

You may need to update your console settings in /etc/rc.conf, most
probably the keymap. During boot, /etc/rc.d/syscons will indicate what
you need to do.

vt(4) still has issues and lacks some features compared to syscons(4).
See the wiki for up-to-date information:
    https://wiki.freebsd.org/Newcons

If you want to keep using syscons(4), you can do so by adding the
following line to /boot/loader.conf:
    kern.vty=sc

Differential Revision:	https://reviews.freebsd.org/D1005
Discussed with:	emaste@, nwhitehorn@, ray@
Relnotes:	yes
2014-11-04 10:18:03 +00:00
Konstantin Belousov
74d5b4af82 Clean up confusing comment. Move it to the place of code which is
talked about.  Explain where the mentioned trampoline located
(usermode), and the fact that attempt to exit last thread is denied in
kernel (by delegating the work to usermode).

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-03 11:29:08 +00:00
Konstantin Belousov
ab57474c83 When other end of the pipe closed during the write, but some bytes
were written, return short write instead of EPIPE.

Update comment.

Discussed with:	bde (long time ago)
MFC after:	2 weeks
2014-11-03 10:01:56 +00:00
Mateusz Guzik
5cbf44bf89 Provide an on-stack temporary buffer for small ioctl requests. 2014-11-03 07:46:51 +00:00
Mateusz Guzik
324a7026f1 filedesc: plus sys/kdb.h include which crept in with r274007 2014-11-03 06:24:43 +00:00
Mateusz Guzik
1d29258ac2 filedesc: plug unnecessary fdp NULL checks in fdescfreee and fdcopy
Anything reaching these functions has fd table.
2014-11-03 05:12:17 +00:00
Mateusz Guzik
32417098f0 filedesc: create a dedicated zone for struct filedesc0
Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from
malloc use 2048 bytes.

There is no easy way to shrink the structure <= 1024 an it is likely to grow in
the future.
2014-11-03 04:16:04 +00:00
Konstantin Belousov
cc24666735 Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option.
Note that the option is currently not used in any in-tree kernel
configs, including LINTs.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-11-02 19:51:33 +00:00
Mateusz Guzik
3dca54ab98 filedesc: move freeing old tables to fdescfree
They cannot be accessed by anyone and hold count only protects the structure
from being freed.
2014-11-02 14:12:03 +00:00
Mateusz Guzik
3dc85312b2 filedesc: factor out some code out of fdescfree
Previously it had a huge self-contained chunk dedicated to dealing with shared
tables.

No functional changes.
2014-11-02 13:43:04 +00:00
Konstantin Belousov
72ba3c0822 Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines
whether the shared request for already shared-locked lock could be
granted.  Both problems result in the exclusive locker starvation.

The concurrent exclusive request is indicated by either
LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags.  The reverse
condition, i.e. no exclusive waiters, must check that both flags are
cleared.

Add a flag LK_NODDLKTREAT for shared lock request to indicate that
current thread guarantees that it does not own the lock in shared
mode.  This turns back the exclusive lock starvation avoidance code;
see man page update for detailed description.

Use LK_NODDLKTREAT when doing lookup(9).

Reported and tested by:	pho
No objections from:	attilio
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-11-02 13:10:31 +00:00
Mateusz Guzik
080fdefc28 filedesc: tidy up fdcheckstd
No functional changes.
2014-11-02 02:32:33 +00:00
Mateusz Guzik
d3f3e12a4f filedesc: lock filedesc lock in fdcloseexec only when needed 2014-11-02 01:13:11 +00:00
Mateusz Guzik
cdcf242896 Fix up module unload for syscall_module_handler consumers.
After r273707 it was registering syscalls as static.

This fixes hwpmc module unload.

Reported by: markj
2014-11-01 22:36:40 +00:00
Jean-Sébastien Pédron
da49f6bcc3 vt(4): Adjust the cursor position after changing the window size
A new terminal_set_cursor() is added: it wraps the existing
teken_set_cursor() function.

In vtbuf_grow(), the cursor position is adjusted at the end of the
function. In vt_change_font(), we call terminal_set_cursor() just after
terminal_set_winsize_blank(), while the terminal is mute.

This fixes a bug where, after loading a kernel video driver which
increases the terminal window size, the cursor remains at its old
position, in other words, in the middle of the display content.

PR:		194421
MFC after:	1 week
2014-11-01 17:05:15 +00:00
Konstantin Belousov
2361c6d135 Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).  This makes the functions type-compatible
with volatile objects and does not require devolatile force, e.g. in
kern_umtx.c.

Requested by:	bde
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-10-31 17:43:21 +00:00
Mateusz Guzik
2534d8eeb6 filedesc: drop retval argument from do_dup
It was almost always td_retval anyway.

For the one case where it is not, preserve the old value across the call.
2014-10-31 10:35:01 +00:00
Mateusz Guzik
8a5177cca3 filedesc: fix missed comments about fdsetugidsafety
While here just note that both fdsetugidsafety and fdcheckstd take sleepable
locks.
2014-10-31 09:56:00 +00:00
Mateusz Guzik
f652d856ab filedesc: make fdinit return with source filedesc locked and new one sized
appropriately

Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable.
We don't have to call it with the lock held if we are just creating new
filedesc.

As a side note, strictly speaking processes can have fdtables with
fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file
descriptor they get will be 0 and the only syscall allowing to choose fd number
requires an active file descriptor. Should this ever change, we can add an 'init'
(or similar) parameter to fdgrowtable.
2014-10-31 09:25:28 +00:00
Mateusz Guzik
ffeb890592 filedesc: iterate over fd table only once in fdcopy
While here add 'fdused_init' which does not perform unnecessary work.

Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold
it when appropriate. This function is only used with INVARIANTS.

No functional changes intended.
2014-10-31 09:19:46 +00:00
Mateusz Guzik
1a0c80a3df filedesc: tidy up fdfree
Implement fdefree_last variant and get rid of 'last' parameter.

No functional changes.
2014-10-31 09:15:59 +00:00
Mateusz Guzik
b97a758ffc filedesc: tidy up fdcopy a little bit
Test for file availability by fde_file != NULL instead of fdisused, this is
consistent with similar checks later.

Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never
reached in practice.

fdiused is now only used in some KASSERTS, so ifdef it under INVARIANTS.

No functional changes.
2014-10-31 05:41:27 +00:00
Mark Murray
10cb24248a This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random.
This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.

The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.

The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.

Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.

My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.

My Nomex pants are on. Let the feedback commence!

Reviewed by:	trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by:	so(des)
2014-10-30 21:21:53 +00:00
Mateusz Guzik
f55cf4b0d1 filedesc: make sure to force table reload in fget_unlocked when count == 0
This is a fixup to r273843.
2014-10-30 07:21:38 +00:00
Mateusz Guzik
29c85772bb filedesc: microoptimize fget_unlocked by retrying obtaining reference count
without restarting whole lookup

Restart is only needed when fp was closed by current process, which is a much
rarer event than ref/deref by some other thread.
2014-10-30 05:21:12 +00:00
Mateusz Guzik
aa77d52800 filedesc: get rid of atomic_load_acq_int from fget_unlocked
A read barrier was necessary because fd table pointer and table size were
updated separately, opening a window where fget_unlocked could read new size
and old pointer.

This patch puts both these fields into one dedicated structure, pointer to which
is later atomically updated. As such, fget_unlocked only needs data a dependency
barrier which is a noop on all supported architectures.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-30 05:10:33 +00:00
John Baldwin
01e1933dcc Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00
Konstantin Belousov
f7e91c288a Convert kern_umtx.c to use fueword() and casueword().
Also fix some mishandling of suword(9) errors as errno, which resulted
in spurious ERESTART.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	3 weeks
2014-10-28 15:30:33 +00:00
Konstantin Belousov
0a2c94b86e Replace some calls to fuword() by fueword() with proper error checking.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	3 weeks
2014-10-28 15:28:20 +00:00
Konstantin Belousov
4f3dc90023 Add fueword(9) and casueword(9) functions. They are like fuword(9)
and casuword(9), but do not mix value read and indication of fault.

I know (or remember) enough assembly to handle x86 and powerpc.  For
arm, mips and sparc64, implement fueword() and casueword() as wrappers
around fuword() and casuword(), which means that the functions cannot
distinguish between -1 and fault.

On architectures where fueword() and casueword() are native, implement
fuword() and casuword() using fueword() and casuword(), to reduce
assembly code duplication.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	2 weeks (ia64 needs treating)
2014-10-28 15:22:13 +00:00
Hans Petter Selasky
0e1152fcc2 The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-28 12:00:39 +00:00
Mateusz Guzik
a7963b9758 Simplify sys_getloginclass.
Just use current thread credentials as they have the same accuracy as the
ones obtained from proc..
2014-10-28 04:59:33 +00:00
Mateusz Guzik
b720dc9749 Change loginclass mutex to an rwlock.
While here reduce nesting in loginclass_free.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn>
X-Additional:	JuniorJobs project
MFC after:	2 weeks
2014-10-28 04:33:57 +00:00
Mateusz Guzik
8d866b10fc Tidy up functions related to uidinfo management.
- reference found uidinfo in uilookup
- reduce nesting by handling shorter cases first
2014-10-27 20:20:05 +00:00
Mateusz Guzik
8101958c4f De-k&r-ify function definitions in kern/kern_resource.c
No functional changes.
2014-10-27 20:18:30 +00:00
Mateusz Guzik
e015b1ab0a Avoid dynamic syscall overhead for statically compiled modules.
The kernel tracks syscall users so that modules can safely unregister them.

But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.

Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-26 19:42:44 +00:00
Mateusz Guzik
b90638866e Fix up an assertion in kern_setgroups, it should compare with ngroups_max + 1
Bug introdued in r273685.

Noted by: Tiwei Bie <btw mail.ustc.edu.cn>
2014-10-26 14:25:42 +00:00