Before this change kern.sched.interact sysctl setting above 32 gave
all interactive threads identical priority of PRI_MIN_INTERACT due to
((PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1) / sched_interact) turning
zero. Setting the sysctl lower reduced the range of used priority
levels up to half, that is not great either.
Change of the operations order should fix the issue, always using full
range of priorities, while overflow is impossible there since both
score and priority values are small. While there, make the variables
unsigned as they really are.
MFC after: 1 month
We only use nvme_completion_poll in the initialization path. The
commands they queue and wait for finish quickly as they involve no I/O
to the drive's media. These command take about 20-200 microsecnds
each. Set the wait time to 1us and then increase it by 1.5 each
successive iteration (max 1ms). This reduces initialization time by
80ms in cpervica's tests.
Use this same technique waiting for RDY state transitions. This saves
another 20ms. In total we're down from ~330ms to ~2ms.
Tested by: cperciva
Sponsored by: Netflix
Reviewed by: mav
Differential Review: https://reviews.freebsd.org/D32259
Imports the changes for building official images on Azure Marketplace,
which fulfill the requirements of Azure and FreeBSD cloud images like
disk layout and UEFI for Gen2 VM, along with some minor improvements like
configurations to speed up booting.
"CLOUDWARE" list will be updated after some more collaborations with re
completed.
Reviewed by: re (gjb)
Sponsored by: The FreeBSD Foundation
Technical assistance from: Microsoft
Differential Revision: https://reviews.freebsd.org/D23804
A core segment is bounded in size only by memory size. On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core. Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.
This dates back to the original refactor back in 2015 (r279801 /
aa14e9b7).
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Make freebsd-update(8) support jails by adding the -j flag which takes
a jail jid or name as an argument. This takes advantage of the recently
added -j support to freebsd-version(8) in order to get the version of
the installed userland.
Reviewed by: dteske, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25711
Make freebsd-version(1) support jails by adding the -j flag which takes
a jail jid or name as an argument. As with other options, -j
flags stack and display in the order requested.
Reviewed by: bcr (manpages), kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25705
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past. A test
has been added for this one as well.
Fixes: 9c999a259f ("kqueue: don't arbitrarily restrict long-past...")
Point hat: kevans
Reported by: syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.
Discussed with: jtl, markj
Fixes: 2dbc9a388e
PR: 258787
The FreeBSD nvme driver has reset the nvme controller twice on attach to
address a theoretical issue assuring the hardware is in a known
state. However, exierence has shown the second reset is unnecessary and
increases the time to boot. Eliminate the second reset. Should there be
a situation when you need a second reset (for buggy or at least somewhat
out of the mainstream hardware), the hardware option NVME_2X_RESET will
restore the old behavior. Document this in nvme(4).
If there's any trouble at all with this, I'll add a sysctl tunable to
control it.
Sponsored by: Netflix
Reviewed by: cperciva, mav
Differential Revision: https://reviews.freebsd.org/D32241
After some study of the code and the standard, I think we can just drop
the pause(), unconditionally. If we're not initialized, then there's
nothing to wait for from a software perspective. If we are initialized,
then there might be outstanding I/O. If so, then the qpair 'recovery
state' will transition to WAITING in nvme_ctrlr_disable_qpairs, which
will ignore any interrupts for items that complete before we complete
the reset by setting cc.en=0.
If we go on to fail the controller, we'll cancel the outstanding I/O
transactions. If we reset the controller, the hardware throws away
pending transactions and we retry all the pending I/O transactions. Any
transactions that happend to complete before cc.en=0 will have the same
effect in the end (doing the same transaction twice is just inefficient,
it won't affect the state of the device any differently than having done
it once).
The standard imposes no wait times here, so it isn't needed from that
perspective.
Unanswered Question: Do we may need to disable interrupts while we
disable in legacy mode since those are level-sensitive.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D32248
The don't touch the mmio of the drive after we do a EN 1->0 transition
is only for a tiny number of dirves that have this unforunate issue.
Sponsored by: Netflix
Rewrite the nested if's using the preferred FreeBSD style for branches
of ifs that return. NFC. Minor tweaks to the comments to better fit new
code layout.
Sponsored by: Netflix
Reviewed by: mav, chuck (prior rev, but comments rolled in)
Differential Revision: https://reviews.freebsd.org/D32245
Remove the 5ms delays after writing the administrative queue
registers. These delays are from the very earliest days of the driver
(they are in the first commit) and were most likely vestiges of the
Chatham NVMe prototype card that was used to create this driver. Many of
the workarounds necessary for it aren't necessary for standards
compliant cards. The original driver had other areas marked for Chatham,
but these were not. They are unneeded. There's three lines of supporting
evidence.
First, the NVMe standards make no mention of a delay time after these
registers are written. Second, the Linux driver doesn't have them, even
as an option. Third, all my nvme cards work w/o them.
To be safe, add a write barrier between setting up the admin queue and
enabling the controller.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D32247
ATF_REQUIRE_ERRNO requires the given errno iff the given expression is
true. These test cases used it incorrectly, potentially allowing
sem_clockwait_np to succeed when it was expected to fail. Use separate
ATF calls to require failure and the expected errno.
MFC after: 1 week
Sponsored by: Dell EMC Isilon
In a guest on a busy hypervisor, the time remaining after an
interrupted sleep could be much lower than other environments.
Relax the lower bound on VMs.
MFC after: 1 week
Sponsored by: Dell EMC Isilon
DSACK accounting has been for quite some time under a NETFLIX_STATS ifdef. Statistics
on DSACKs however are very useful in figuring out how much bad retransmissions you
are doing. This is further complicated, however, by stacks that do TLP. A TLP
when discovering a lost ack in the reverse path will cause the generation
of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well
as the more traditional dsack-bytes and dsack-packets. These will now
all display in netstat -p tcp -s. This also updates all stacks that
are currently built to keep track of these stats.
Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32158
RFC 3414 Section 4. Discovery specifies that a discovery request message has a
varBindList left empty. Nonetheless, bsnmpd(1) should not crash when receiving
a non-zero var-bindings list in a Discovery Request message.
PR: 255214
MFC after: 2 weeks
The previous update to handle the gicv2m as a child of the gicv3 driver
assumed there was only a single gicv2m child. On some hardware there
are multiple children. Support this by removing the mbi ivars and
adding a new interface to handle MSI allocation in a given range.
Tested by: mw, trasz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32224
When writing to memory on arm64 we may be trying to be accessing a
read-only page. In this case try to access via the DMAP region to
get a writable location.
While here simplify writing data in DDB and stop trashing the size as
it is passed into the cache handling functions.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32053
When printing arm64 registers because of an exception in the kernel
also print the symbol and offset. This can be used to track down why
the exception occured without needing external tools.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32077
Use correct devclass name, due to the mismatch miibus would attach
to the wrong thing causing mii_attach to silently fail.
Fixes: dfcaa2c18b (enetc_mdio: Support building the driver ...)
Having it included confuses KOBJOPLOOKUP resulting in kobj_error_method
being called instead of a devmethod from the switch driver.
That in turn returns ENXIO which was treated as a pointer and
dereferenced by etherswitch ioctl logic causing the kernel to panic.
Fixes: b542c9e42b (modules: felix: Add needed dependencies)
It was missed during the conversion of kernel configs.
Although the driver is already built as a kernel module we might
want to have it built-in for diskless booting and such.