Print a message when iflib_tx_structures_setup fails, like we do for
iflib_rx_structures_setup.
Now that we always print a message from within
iflib_qset_structures_setup when it fails, stop printing one in
iflib_device_register() at the call site.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D15300
Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.
In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3. Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.
For i386, we must reload %cr3. The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.
Due to some CPU erratas it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6. In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.
Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.
Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.
Reviewed by: jhb
Discussed with: Matthew Dillon
Tested by: emaste
Sponsored by: The FreeBSD Foundation
Security: CVE-2018-8897
Security: FreeBSD-SA-18:06.debugreg
Summary:
chrp_cpuref_init() was relying on the boot strap processor to be
the first child of /cpus. That was not always the case, specially
on pseries with FDT.
This change uses the "reg" property of each CPU instead and also
adds several sanity checks to avoid unexpected behavior (maybe
too many panics?).
The main observed symptom was interrupts being missed by the main
processor, leading to timeouts and the kernel aborting the boot.
Submitted by: Leandro Lupori
Reviewed by: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D15174
based link-local address.
The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.
Reviewed by: network (ae), kib
Differential Revision: https://reviews.freebsd.org/D13407
MFC after: 1 week
Sponsored by: Mellanox Technologies
A version of this patch was originally sent to me by se@, matching behavior
from newer versions of GNU grep.
While there have been some differences of opinion on whether stdin should be
closed or not after depleting it in process of -f, I've opted to leave stdin
open and just let the later matching stuff fail and result in a no-match.
I'm not married to the current behavior- it was generally chosen since we
are adopting this in particular from GNU grep, and I would like to stay
consistent without a strong argument to the contrary. The current behavior
isn't technically wrong, it's just fairly unfriendly to the developer-user
of grep that may not realize their usage is trivially invalid.
Submitted by: se
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.
If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));
We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.
Reported by: rstone
Reviewed by: hiren, transport
Approved by: sbruno
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8556
With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().
Reported by: gallatin
Reviewed by: sbruno
Approved by: sbruno
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14984
The entire mechanism is rarely used and is quite not performant due to
atomci ops on the syscall table. It also has added overhead for completely
unrelated syscalls.
Reduce it by avoiding the func calls if possible (which consistutes vast
majority of cases).
Provides about 3% syscall rate speed up for getuid on Broadwell.
The parameter is effectively controllable by userspace. It does not matter
what it is set to as it is being passed to copyin - worst case the operation
will just fail.
While here stop computing it unless it is going to be used.
Noted by: dillon@backplane.com
If you add a child to a device that has quiet children, we'll
automatically set the quiet flag on the children, and its
children.
This is indended for things like CPU that have a large amount of
repetition in booting that adds nothing.
There was a missing trick expanding the passed pattern to a full word
by multiplication. As a side effect non-zero patterns would be
incorrectly laid down.
This stems from the use of rep stosq which is word-sized, while the passed
argument is byte-sized.
I initially repurposed memcpy into memset without taking this into account.
All but non-bzero testing was performed with a variant utilizing ERMS, i.e.
using only stosb which happens to not into the problem whatsoever. So my bad
twice.
Thanks to Oliver Pinter for noting the problem and providing a testcase.
The caph_enter function should made it easier to sandbox application
and not force us to remember that we need to check errno on failure.
Another function is also checking if casper is present.
Reviewed by: emaste, cem (partially)
Differential Revision: https://reviews.freebsd.org/D14557
The canonical check for whether or not a ring is drainable is
TXQ_AVAIL() > MAX_TX_DESC() + 2. Use this same construct here,
in order to avoid a potential off-by-one error where we might otherwise
fail to request an interrupt.
Reviewed by: mmacy
Sponsored by: Netflix
Boost the priority of user-space threads when they set
their affinity to a core to adjust its frequency. This avoids a situation
where a CPU bound kernel thread with the same affinity is running on a
down-clocked core, and will "block" powerd from up-clocking the core
until the kernel thread yields. This can lead to poor perfomance,
and to things potentially getting stuck on Giant.
Reviewed by: kib (imp reviewed earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15246
memmove is repurposed bcopy (arguments swapped, return value added)
The libkern variant is a wrapper around bcopy, so this is a big
improvement.
memset is repurposed memcpy. The librkern variant is doing fishy stuff,
including branching on 0 and calling bzero.
Both functions are rather crude and subject to partial depessimization.
This is a soft prerequisite to adding variants utilizing the
'Enhanced REP MOVSB/STOSB' bit and let the kernel patch at runtime.
Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is
not very useful (if there are other masters -- it can lead to split brain,
if there are none -- it makes no sense). Having it as INIT makes it clear
that carp packets are disabled.
Submitted by: wg
MFC after: 1 month
Relnotes: yes
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14477
Without a subsequent wbinvd the changes to suspended_cpus (and
resuming_cpus) can be lost at least on AMD systems that use MOESI cache
coherency protocol. That can happen because one of APs ends up as an
Owner of the corresponding cache line(s) and the changes may never reach
the main memory before the AP is reset.
While here, move clearing of suspended_cpus a little bit earlier as the
fact of returning from savectx (with zero return value) means that the
CPU has fully restored it execution context.
Also, rework the comment that describes the need for resuming_cpus.
This change fixed suspend to RAM a previously broken AMD-based system.
Reviewed by: kib
Discussed with: bde
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15295
The properties 'assigned-clocks', 'assigned-clock-parents' and
'assigned-clock-rates' all work together.
'assigned-clocks' holds the list of clock for which we need to either
assign a new parent or a new frequency.
The old code just supported assigning a new parents, add support for
assigning a new frequency too.
Add support for setting pll rate. On RockChip SoC two kind of plls are
supported, integer mode and fractional mode.
The two modes are intended to support more frequencies for the core plls.
While here change the recalc method as it appears that the datasheet is
wrong on the calculation method.
Add the clock definition for the arm clock.
While here remove the indexes in the clock table as we will need clock
with a 0 index (non-exported clocks).
Most filesystems, with the notable exceptions of msdosfs and autofs use
only vfs_timestamp() to read the current time. This has the benefit of
configurable granularity (using the vfs.timestamp_precision sysctl).
For convenience, use it on msdosfs too.
Submitted by: Damjan Jovanovic
Differential Revision: https://reviews.freebsd.org/D15297
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.
Synchronously remove all external references to a multicast address before enqueueing for delete.
Reported by: lwhsu
Approved by: sbruno
https://github.com/illumos/illumos-gate/edit/master/usr/src/lib/libzfs/common/libzfs_sendrecv.c
Patch our version of it to quiesce warnings until someone decides to sync
up our code:
libzfs_sendrecv.c:2555:30: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", thisguid);
~~~ ^~~~~~~~
%llu
libzfs_sendrecv.c:2612:29: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", parent_fromsnap_guid);
~~~ ^~~~~~~~~~~~~~~~~~~~
%llu
libzfs_sendrecv.c:2645:29: warning: format specifies type 'unsigned long'
but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
sprintf(guidname, "%lu", parent_fromsnap_guid);
~~~ ^~~~~~~~~~~~~~~~~~~~
%llu
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D15325
With Linux 4.17 dts the compatible for the prcm added 'simplebus' we mean
that the simplebus driver will attach to it at the BUS_PASS_BUS pass.
Change the pass for the prcm driver to be at BUS_PASS_BUS so we will win
the attach.
This introduce a problem as this driver needs the ti_scm one to be already
attached. ti_scm also attach at BUS_PASS_BUS but after the prcm one as it is
after in the dtb and the simplebus driver simpy walk the tree to attach it's
children.
Use the bus_new_pass method to defer the frequencies read at BUS_PASS_TIMER.
This fixes booting on BeagleBone*
Reported by: many
em(4) and igb(4) were tested by me, and ixgbe(4) and bnxt(4) were
tested by sbruno.
Reviewed by: mmacy, shurd
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15262