1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00
Commit Graph

244072 Commits

Author SHA1 Message Date
Mark Johnston
7cdeaf3309 Add preliminary support for atomic updates of per-page queue state.
Queue operations on a page use the page lock when updating the page to
reflect the desired queue state, and the page queue lock when physically
enqueuing or dequeuing a page.  Multiple pages share a given page lock,
but queue state is per-page; this false sharing results in heavy lock
contention.

Take a small step towards the use of atomic_cmpset to synchronize
updates to per-page queue state by introducing vm_page_pqstate_cmpset()
and using it in the page daemon.  In the longer term the plan is to stop
using the page lock to protect page identity and rely only on the object
and page busy locks.  However, since the page daemon avoids acquiring
the object lock except when necessary, some synchronization with a
concurrent free of the page is required.  vm_page_pqstate_cmpset() can
be used to ensure that queue state updates are successful only if the
page is not scheduled for a dequeue, which is sufficient for the page
daemon.

Add vm_page_swapqueue(), which moves a page from one queue to another
using vm_page_pqstate_cmpset().  Use it in the active queue scan, which
does not use the object lock.  Modify vm_page_dequeue_deferred() to
use vm_page_pqstate_cmpset() as well.

Reviewed by:	kib
Discussed with:	jeff
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21257
2019-09-03 14:29:58 +00:00
Mark Johnston
9d75f0dc75 Map the vm_page array into KVA on amd64.
r351198 allows the kernel to use domain-local memory to back the vm_page
array (up to 2MB boundaries) and reserves a separate PML4 entry for that
purpose.  One consequence of that change is that the vm_page array is no
longer present in minidumps, which only adds pages mapped above
VM_MIN_KERNEL_ADDRESS.

To avoid the friction caused by having kernel data structures mapped
below VM_MIN_KERNEL_ADDRESS, map the vm_page array starting at
VM_MIN_KERNEL_ADDRESS instead of using a dedicated PML4 entry.

Reviewed by:	kib
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21491
2019-09-03 13:18:51 +00:00
Mateusz Guzik
f5791174df pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable)
Sponsored by:	The FreeBSD Foundation
2019-09-03 12:54:51 +00:00
Andriy Gapon
50f14c4f68 superio: fix the copyright block and update the year
MFC after:	2 weeks
2019-09-03 12:40:58 +00:00
Li-Wen Hsu
7fb6c523bc Temporarily skip sys.sys.qmath_test.qdivq_s64q in CI because it is unstable
PR:		240219
Discussed with:	trasz
Sponsored by:	The FreeBSD Foundation
2019-09-03 10:49:13 +00:00
Mateusz Guzik
d05b53e0ba Add sysctlbyname system call
Previously userspace would issue one syscall to resolve the sysctl and then
another one to actually use it. Do it all in one trip.

Fallback is provided in case newer libc happens to be running on an older
kernel.

Submitted by:	Pawel Biernacki
Reported by:	kib, brooks
Differential Revision:	https://reviews.freebsd.org/D17282
2019-09-03 04:16:30 +00:00
Mark Johnston
209f2e9838 Add a sysctl to dump kernel mappings and their properties on amd64.
The sysctl is called vm.pmap.kernel_maps.  It dumps address ranges
and their corresponding protection and mapping mode, as well as
counts of 2MB and 1GB pages in the range.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21380
2019-09-02 21:57:57 +00:00
Mark Johnston
87044fca73 Replace PMAP_LARGEMAP_MAX_ADDRESS() with a more general predicate.
No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-09-02 21:54:08 +00:00
Michael Tuexen
fe5dee73f7 This patch improves the DSACK handling to conform with RFC 2883.
The lowest SACK block is used when multiple Blocks would be elegible as
DSACK blocks ACK blocks get reordered - while maintaining the ordering of
SACK blocks not relevant in the DSACK context is maintained.

Reviewed by:		rrs@, tuexen@
Obtained from:		Richard Scheffenegger
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D21038
2019-09-02 19:04:02 +00:00
Ian Lepore
385a0efed3 Fix the name of the devicetree bindings document file cited in the manpage.
Reported by:	thj@
2019-09-02 18:32:08 +00:00
Edward Tomasz Napierala
1d3a302b4a Bump Linux version to 3.2.0. Without it, binaries linked against
glibc 2.24 and up (eg Ubuntu 19.04) fail with "FATAL: kernel too old".

This alone is not enough to make newer binaries actually work;
fix/hack/workaround is pending review at https://reviews.freebsd.org/D20687.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20757
2019-09-02 18:10:35 +00:00
Warner Losh
31b11bb3f2 In nvme_completion_poll, add a sanity check to make sure that we complete the
polling within a second. Panic if we don't. All the commands that use this
interface should typically complete within a few tens to hundreds of
microseconds. Panic rather than return ETIMEDOUT because if the command somehow
does later complete, it will randomly corrupt memory. Also, it helps to get a
traceback from where the unexpected failure happens, rather than an infinite
loop.
2019-09-02 17:11:32 +00:00
Warner Losh
ab0681aac9 In all the places that we use the polled for completion interface, except crash
dump support code, move the while loop into an inline function. These aren't
done in the fast path, so if the compiler choses to not inline, any performance
hit is tiny.
2019-09-02 17:11:27 +00:00
Warner Losh
fc68da4b4d Add a brief comment explaining why we can return ETIMEDOUT from the call to the
polled interface. Normally this would have the potential to corrupt stack memory
because the completion routines would run after we return. In this case,
however, we're doing a dump so it's safe for reasons explained in the comment.
2019-09-02 17:10:46 +00:00
Edward Tomasz Napierala
7a8cbc5288 Relax compat.linux.osrelease checks. This way one can do eg
'compat.linux.osrelease=3.10.0-957.12.1.el7.x86_64', which
corresponds to CentOS 7.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20685
2019-09-02 16:57:42 +00:00
Mateusz Guzik
1874c61b90 vfs: restore mp null check in vop_stdgetwritemount
The initially read mount point can already be NULL.

Reported by:	markj
Fixes: r351656 ("vfs: stop refing freed mount points in vop_stdgetwritemount")
Sponsored by:	The FreeBSD Foundation
2019-09-02 15:24:25 +00:00
Johannes Lundberg
458ba18d43 LinuxKPI: Add sysfs create/remove functions that handles multiple files in one call.
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	D21475
2019-09-02 14:51:59 +00:00
Ed Maste
6b62f42434 libc: Use musl's optimized memchr
Parentheses added to HASZERO macro to avoid a GCC warning.

Reviewed by:	kib, mjg
Obtained from:	musl (snapshot at commit 4d0a82170a)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17631
2019-09-02 13:56:44 +00:00
Mateusz Guzik
ebea9e6d8f libutil: remove SIGSYS handling from setusercontext
It was a workaround for cases where the kernel lacks setloginclass(2),
added in the 9.x era.

Submitted by:	Pawel Biernacki
2019-09-02 13:55:31 +00:00
Ed Maste
82adf9ad56 Belatedly bump __FreeBSD_version for r351659, gets(3) removal
Reported by:	linimon
2019-09-02 12:48:18 +00:00
Mateusz Guzik
9576ff5803 proc: clear pid bitmap entry after dropping proctree lock
There is no correctness change here, but the procid lock is contended in
the fork path and taking it while holding proctree avoidably extends its
hold time.

Note that there are other ids which can end up getting cleared with the
lock.

Sponsored by:	The FreeBSD Foundation
2019-09-02 12:46:43 +00:00
Toomas Soome
05b24e8617 loader.efi: use and prefer coninex interface
Add support for EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL.
2019-09-02 11:04:17 +00:00
Toomas Soome
591d5f0ef5 loader.efi: some systems do not translate scan code 0x8 to backspace
Add scancode translation for backspace.
2019-09-02 10:45:10 +00:00
Hans Petter Selasky
4d83500fda Use DEVICE memory instead of UNCACHEABLE on aarch64 in ioremap() in the LinuxKPI.
This fixes system hangs on reading device registers on aarch64.

Tested with:	Marvell MACCHIATObin (Armada8k) + mlx4en, amdgpu
Submitted by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D20789
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-02 08:34:45 +00:00
Hans Petter Selasky
f6549df685 Fix regression issue after r351616. Make sure the mbuf queue gets initialized.
Found by:	gonzo@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-02 08:31:18 +00:00
Brooks Davis
389154096a Remove remnants of optimization for > pagesize allocations.
In the past, this allocator seems to have allocated things larger than
a page seperately. Much of this code was removed at some point (perhaps
along with sbrk() used) so remove the rest. Instead, keep allocating in
power-of-two bins up to FIRST_BUCKET_SIZE << (NBUCKETS - 1). If we want
something more efficent, we should use a fancier allocator.

While here, remove some vestages of sbrk() use. Most importantly, don't
try to page align the pagepool since it's always page aligned by mmap().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D21453
2019-09-02 08:03:29 +00:00
Kyle Evans
4b3b82a756 mips: fix some mcount nits
The symbol version for _mcount was removed 12 years ago in r169525 from
gmon/Symbol.map, to be added to the per-arch Symbol.map. mips was overlooked
in this, so _mcount has no symver. Add it back to where it should have been,
rather than where it would go if it were added today, since we're correcting
a historical mistake.

Additionally, _mcount is getting thrown into .mdebug.abi32 in the llvm80/90
world as it's not getting explicitly thrown into .text, so do this now. This
fixes the libc build that was previously failing due to relocations in
.mdebug.abi32. This is specifically due to the way clang's integrated AS
works and that they emit the .mdebug.abiNN section early in the process. An
LLVM bug has been submitted[0] and an agreement has been made that the
mips backend should switch to .text following .mdebug.abiNN for
compatibility.

[0] https://bugs.llvm.org/show_bug.cgi?id=43119

Reviewed by:	imp, arichardson
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21435
2019-09-02 01:55:55 +00:00
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00
Mark Johnston
63cdd18e40 Restrict the input domain set in cpuset_setdomain(2) to all_domains.
To permit larger values of MAXMEMDOM, which is currently 8 on amd64,
cpuset_setdomain(2) accepts a mask of size 256.  In the kernel, domain
set masks are 64 bits wide, but can only represent a set of MAXMEMDOM
domains due to the use of the ds_order table.

Domain sets passed to cpuset_setdomain(2) are restricted to a subset
of their parent set, which is typically the root set, but before this
happens we modify the input set to exclude empty domains.
domainset_empty_vm() and other code which manipulates domain sets
expect the mask to be a subset of all_domains, so enforce that when
performing validation of cpuset_setdomain(2) parameters.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21477
2019-09-01 21:38:08 +00:00
Mark Johnston
c993b95329 Fix an off-by-one bug in the CPU and domain ID parser.
The "size" parameter is the size of the corresponding bit set, so the
maximum CPU or domain index is size - 1.

MFC after:	1 week
2019-09-01 21:20:31 +00:00
Brandon Bergren
9ae631858e Move CAS check in powerpc64 ofw loader until after the PVR check.
This unbreaks using the powerpc64 loader on a 32-bit processor.

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21297
2019-09-01 18:26:21 +00:00
Ed Maste
840aca2880 makefs: share msdosfsmount.h between kernel msdosfs and makefs
Sponsored by:	The FreeBSD Foundation
2019-09-01 16:55:33 +00:00
Ed Maste
73f4b4ebac vnic: correct and simplify SIOCSIFFLAGS
PR:		223573, 223575
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13028
2019-09-01 16:53:17 +00:00
Ed Maste
597d9b4008 ar: use more correct size_t type for loop index
Submitted by:	cem
MFC after:	1 week
2019-09-01 16:51:25 +00:00
Ed Maste
aebac09b6f lldb: shorten thread names to make logs easier to follow
lldb prepends the thread name to log entries, and the existing thread
name for the FreeBSD ProcessMonitor thread was longer than the kernel's
supported thread name length, and so was truncated.  This made logs hard
to read, as the truncated thread name ran into the log message.  Shorten
"lldb.process.freebsd.operation" to just "freebsd.op" so that logs are
more readable.

(Upstreaming to lldb still to be done).
2019-09-01 16:50:34 +00:00
Ed Maste
6c30aa54c3 Remove CLANG_NO_IAS definition
CLANG_NO_IAS is not used anywhere in the tree.

Sponsored by:	The FreeBSD Foundation
2019-09-01 16:47:48 +00:00
Ed Maste
da15a90df6 libstdc++: remove gets
Removed from libc in r351659
2019-09-01 16:41:24 +00:00
Ed Maste
7381dcc9ee libc: remove gets
gets is unsafe and shouldn't be used (for many years now).  Leave it in
the existing symbol version so anything that previously linked aginst it
still runs, but do not allow new software to link against it.

(The compatability/legacy implementation must not be static so that
the symbol and in particular the compat sym gets@FBSD_1.0 make it
into libc.)

PR:		222796 (exp-run)
Reported by:	Paul Vixie
Reviewed by:	allanjude, cy, eadler, gnn, jhb, kib, ngie (some earlier)
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12298
2019-09-01 16:12:05 +00:00
Santhosh Raju
47b9005348 Add myself (fox) and update the mentor information.
Approved by:	philip (mentor)
2019-09-01 15:39:28 +00:00
Vincenzo Maffione
253b2ec199 netmap: import changes from upstream (SHA 137f537eae513)
- Rework option processing.
 - Use larger integers for memory size values in the
   memory management code.

MFC after:	2 weeks
2019-09-01 14:47:41 +00:00
Mateusz Guzik
2796c209b0 vfs: stop refing freed mount points in vop_stdgetwritemount
The code used blindly ref based on an unsafely red address and then would
backpedal if necessary. This was safe in terms of memory access since
mounts are type-stable, but made for a potential a bug where the mount
was reused and had the count reset to 0 before this code decreased it.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21411
2019-09-01 14:01:09 +00:00
Michael Tuexen
b21097894e Fix initialization of top_fsn.
MFC after:		3 days
2019-09-01 10:39:16 +00:00
Michael Tuexen
6182677fb3 Improve the handling of state cookie parameters in INIT-ACK chunks.
This fixes problem with parameters indicating a zero length or partial
parameters after an unknown parameter indicating to stop processing. It
also fixes a problem with state cookie parameters after unknown
parametes indicating to stop porcessing.
Thanks to Mark Wodrich from Google for finding two of these issues
by fuzz testing the userland stack and reporting them in
https://github.com/sctplab/usrsctp/issues/355
and
https://github.com/sctplab/usrsctp/issues/352

MFC after:		3 days
2019-09-01 10:09:53 +00:00
Jung-uk Kim
1c9c1f5903 Add support for TP-Link Archer T2U Nano.
MFC after:	2 weeks
2019-09-01 06:40:58 +00:00
Mateusz Guzik
e0f4540a2a nullfs: reduce areas protected by vnode interlock in null_lock
Similarly to the other routine stop taking the interlock for the lower
vnode. The interlock for nullfs vnode is still taken to ensure
stability of ->v_data.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21480
2019-09-01 02:52:00 +00:00
Kyle Evans
32287ea72b posixshm: switch to OBJT_SWAP in advance of other changes
Future changes to posixshm will start tracking writeable mappings in order
to support file sealing. Tracking writeable mappings for an OBJT_DEFAULT
object is complicated as it may be swapped out and converted to an
OBJT_SWAP. One may generically add this tracking for vm_object, but this is
difficult to do without increasing memory footprint of vm_object and blowing
up memory usage by a significant amount.

On the other hand, the swap pager can be expanded to track writeable
mappings without increasing vm_object size. This change is currently in
D21456. Switch over to OBJT_SWAP in advance of the other changes to the
swap pager and posixshm.
2019-09-01 00:33:16 +00:00
Aleksandr Rybalko
c96574f39c ARM kernel can get RAM regions three ways:
o from FDT;
o from EFI;
o from Linux Boot API (ATAG).
U-Boot may pass RAM info all that 3 ways simultaneously.
We do select between FDT and EFI, but not for ATAG.
So this is not problem fix, but correctness check.

MFC after:	2 weeks
2019-08-31 21:28:06 +00:00
Li-Wen Hsu
24612bfd1f Unskip test cases from netbsd-tests by defining __HAVE_FENV
This unskips:
  - lib.libc.stdlib.strtod_test.strtod_round
  - lib.msun.fe_round_test.t_nofe_round

In lib/msun/tests/Makefile only define on fe_round_test.c because
lib.msun.ilogb_test.ilogb will get wrong results and needs more examination.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-08-31 20:45:45 +00:00
Li-Wen Hsu
c4bf2f169a Fix dtrace test case after r351423 due to ping6(8) options changed
Failure test case:
    cddl.usr.sbin.dtrace.common.ip.t_dtrace_contrib.tst_ipv6localicmp_ksh

Sponsored by:	The FreeBSD Foundation
2019-08-31 15:10:27 +00:00
Li-Wen Hsu
8dc0ad14ad Fix tests use /etc/motd after r350184 by using an always existing file
Sponsored by:	The FreeBSD Foundation
2019-08-31 14:41:58 +00:00