We originally left this out because iflib modulates interrupts and
accomplishes some level of batching versus the custom queues in the
older driver. Upon more detailed study of the Linux driver which has a
newer implementation, it finally became clear to me this is actually a
holdoff timer and not an interrupt limit as it is conventionally
(statically) programmed and displayed as an interrupt rate. The data
sheets also make this somewhat clear.
Thus, AIM accomplishes two beneficial things for a wide variety of
workloads[1]:
1. At low throughput/packet rates, it will significantly lower latency
(by counter-intuitively "increasing" the interrupt rate.. better
thought of as decreasing the holdoff timer because you will modulate
down before coming anywhere near these interrupt rates).
2. At bulk data rates, it is tuned to achieve a lower interrupt rate
(by increasing the holdoff timer) than the current static 8000/s. This
decreases processing overhead and yields more headroom for other work
such as packet filters or userland.
For a single NIC this might be worth a few sys% on common CPUs, but may
be meaningful when multiplied such as if_lagg, if_bridge and forwarding
setups.
The AIM algorithm was re-introduced from the older igb or out of tree
driver, and then modernized with permission to use Intel code from other
drivers.
I have retroactively added it to lem(4) and em(4) where the same concept
applies, albeit to a single ITR register.
[1]: http://iommu.com/datasheets/ethernet/controllers-nics/intel/e1000/gbe-controllers-interrupt-moderation-appl-note.pdf
Tested by: cc (https://wiki.freebsd.org/chengcui/testD46768)
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D46768
(cherry picked from commit 3e501ef896)
Use 1000 * major + minor when calculating the version number that
gets set in the Ficl environment or lua loader property. This allows
for more room if the minor number needs to go above 9.
Add loader.version property to lua loader.
PR: 282001
Reviewed by: imp
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39631
(cherry picked from commit a50d73d578)
The program copies an input buffer to an output buffer without verifying
that the size of the input buffer is less than the size of the output
buffer, leading to a buffer overflow.
Inside the function pci_vtcon_control_send, the length of the iov buffer
is not validated before copy of the payload.
Reported by: Synacktiv
Reviewed by: markj
Security: HYP-19
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46105
(cherry picked from commit 8934002959)
This is a follow-up to the fix for HYP-19, addressing another condition
where an overflow might still occur. (Spotted by jhb@, thanks!)
Reported by: Synacktiv
Reviewed by: markj
Security: HYP-19
Sponsored by: Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46882
(cherry picked from commit b34a4edefb)
(cherry picked from commit c17d96fe79)
Sometimes, especially with older firmware, mps(4) would have trouble
initializing the card in one of these two steps. Add in a retry after a
short delay. Sean Bruno and Stephen McConnell thought this was OK in the
bug discussions, but never committed it. Steve indicated the delay
might not be necessary, but the OP clearly needed to make it longer to
make things work. I've kept the delay, and added the suggested comment.
Ported the iocfacts part to mpr as well, since we see similar errors
about once every month or two over a few thousand controllers at
work. We've not seen it with IOC_INIT as far back as I can query the
error log database, so I didn't port that forward. We'll see if this
helps, but won't know for sure until next year (so I'm committing it now
since it won't hurt and might help). We usually see this failure in
connection with complicated recovery operations with a drive that's
failing, though, at least in the last year's worth of failures. It's
not clear this is the same as OP or not.
PR: 212841
Sponsored by: Netflix
Co-authored-by: imp
(cherry picked from commit c0e0e530ce)
The default VM size may depend on the architecture. In particular,
it is currently larged on riscv64 due to a toolchain issue which
results in bloated binaries.
MFC after: 3 days
Fixes: 59c21ed6e8 "release: Bump default VM size for riscv64 to 6 GB"
Sponsored by: Amazon
(cherry picked from commit ed807f7bca)
Due to issues with the riscv64 toolchain, some binaries end up
significantly larger on riscv64 than they should be. This results
in riscv64 VM images -- and at present *only* riscv64 images -- not
fitting within the default 5 GB filesystem size.
Bump the default size for riscv64 to 6 GB until the toolchain issues
can be resolved.
MFC after: 1 week
Sponsored by: Amazon
(cherry picked from commit 59c21ed6e8)
The units of the size reported in the 'sectors' xenbus node is always 512b,
regardless of the value of the 'sector-size' node. The sector offsets in
the ring requests are also always based on 512b sectors, regardless of the
'sector-size' reported in xenbus.
Fix both blkfront and blkback to assume 512b sectors in the required fields.
The blkif.h public header has been recently updated in upstream Xen repository
to fix the regressions in the specification introduced by later modifications,
and clarify the base units of xenstore and shared ring fields.
PR: 280884
Reported by: Christian Kujau
MFC after: 1 week
Sponsored by: Cloud Software Group
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D46756
(cherry picked from commit e7fe856437)
MFC after: 3 days
Sponsored by: Klara, Inc.
Reviewed by: 0mp, markj
Differential Revision: https://reviews.freebsd.org/D47019
(cherry picked from commit d350e8d795)
cmp: Check the status of stdout.
POSIX requires us to print an error message and exit non-zero if
writing to stdout fails. This can only happen if sflag is unset.
MFC after: 3 days
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D47020
(cherry picked from commit 3c37828ee1)
MFC after: 3 days
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D46996
(cherry picked from commit 334af5e413)
env: Improve documentation.
* The `env` utility's inability to run a command whose name contains an
equal sign is a feature, not a bug, so move that paragraph up from the
BUGS section to the DESCRIPTION section.
* Mention that this can be worked around by prefixing the command name
with `command`, and add an example of this to the EXAMPLE section.
* Add a test case which verifies that `env` does not run a command with
an equal sign in its name even if it exists, and also demonstrates the
workaround.
MFC after: 3 days
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D46997
(cherry picked from commit a0dfb0668b)
env: Check the status of stdout.
MFC after: 3 days
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D47009
(cherry picked from commit c2d93a803a)
The sendmail service script needs to be stopped during shutdown
to ensure a clean shutdown of active SMTP connections (and writing
any in memory queue files).
rcorder(8) requires the rcorder block to be an uninterrupted sequence of
REQUIRE, PROVIDE, BEFORE, and KEYWORD lines. Having a comment in between
REQUIRE and KEYWORD makes rcorder stop parsing the block when it reaches
the comment.
Fix that by moving the comment out from the rcorder block.
Reviewed by: bnovkov, christos, gshapiro, markj
Approved by: bnovkov (mentor), christos (mentor), markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D46924
(cherry picked from commit 8751fbe36f)
Synchronize the error handling in nfsd. If you check other error
handlings in those same condition blocks, it uses nfsd_exit instead,
which will call killchildren() and call the rpcbind service to do
the service un-mapping.
When soft updates began being enabled by default that change carried
over to mdmfs(8) which does not want or need them. This fix ensures
that they are only enabled in mdmfs(8) when requested with the -U flag.
Reported by: Ivan Rozhuk
Tested by: Michael Proto
PR: 279308
(cherry picked from commit 5b21d4ad06)
Ignore DHCP options 124 and 125 to shut up the warning messages.
These options are defined in the RFC 3925.
PR: 281361
Reviewed by: jrm (mentor), otis (mentor), thj
Tested by: jlduran@gmail.com
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46760
(cherry picked from commit 38c63b5283)
nfsrv_freeopen() was being called after the mutex
lock was released, making it possible for other
kernel threads to change the lists while nfsrv_freeopen()
took the nfsstateid out of the lists.
This patch moves the code around
"if (nfsrv_freeopen(stp, vp, 1 p) == 0) {"
into nfsrv_freeopen(), so that it can remove the nfsstateid
structure from all lists before unlocking the mutex.
This should avoid any race between CLOSE and other nfsd threads
updating the NFSv4 state.
The patch does not affect semantics when vfs.nfsd.enable_locallocks=0.
PR: 280978
Tested by: Matthew L. Dailey <matthew.l.dailey@dartmouth.edu>
(cherry picked from commit eb345e05ac)
When reading the next code in a stream, avoid reading an extra byte if
we're going to throw it away. When there's no more bits to extract from
the stream, bits will be 0 and we'll mask the read byte with 0 anyway.
At worst, this will avoid reading one past the end of gbuf array (which
is not possible in well formed streams).
PR: 127912
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D47041
(cherry picked from commit 818c7b769a)
Print the complete list of url that have failed
PR: 281924
Co-authored-by: Baptiste Daroussin <bapt@FreeBSD.org>
Differential Revision: https://reviews.freebsd.org/D46983
(cherry picked from commit be9243409d)
(cherry picked from commit 2f29060f46)
.pkg is the default extension as of commit c244b1d8a3, falling back to
.txz if not found.
PR: 281924
Reviewed by: bapt
Fixes: a2aac2f5e5 ("pkg(7): when bootstrapping first search for pkg.bsd file then pkg.txz")
Fixes: c244b1d8a3 ("pkg: settle the uniq extension to .pkg instead of .bsd")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46977
(cherry picked from commit f5c847ae84)
(cherry picked from commit fef1f3fecd)
The virtio_scsi device allows a VM guest to directly send SCSI commands
(ctsio->cdb array) to the kernel driver exposed on /dev/cam/ctl
(ctl.ko).
All kernel commands accessible from the guest are defined by
ctl_cmd_table.
The command ctl_persistent_reserve_out (cdb[0]=0x5F and cbd[1]=0) allows
the caller to call malloc() with an arbitrary size (uint32_t). This can
be used by the guest to overload the kernel memory (DOS attack).
Reported by: Synacktiv
Reviewed by: asomers
Security: HYP-08
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46044
(cherry picked from commit 64b0f52be2)
(cherry picked from commit 2e7f4728fa)
The function nvme_opc_get_log_page in the file usr.sbin/bhyve/pci_nvme.c
is vulnerable to buffer over-read. The value logoff is user controlled
but never checked against the value of logsize. Thus the difference:
logsize - logoff
can underflow.
Due to the sc structure layout, an attacker can dump internals fields of
sc and the content of next heap allocation.
Reported by: Synacktiv
Reviewed by: emaste, jhb
Security: HYP-07
Sponsored by: Alpha-Omega Project, The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46021
(cherry picked from commit b0a24be007)
(cherry picked from commit a5be19efbb)
Current cleanup code assumes that all the fields are allocated and/or setup by
the time cleanup is called, but this is not always true: a failure in mid-setup
of the device will cause the functions to be called with possibly uninitialized
fields.
Fix the functions to cope with such sate, while also attempting to make the
cleanup idempotent.
Finally fix an error path during setup that would not mark the device as
closed, and hence prevents the kernel from finishing booting.
Fixes: 96375eac94 ("xen-netfront: add multiqueue support")
Sponsored by: Citrix Systems R&D
(cherry picked from commit 318bbb6d5a)
The current sizing of the array used to store grant table frames is broken, as
the calculation:
max_nr_glist_frames = (boot_max_nr_grant_frames *
GREFS_PER_GRANT_FRAME /
(PAGE_SIZE / sizeof(grant_ref_t)));
Is plain bogus, for once grant_ref_t is the type of the grant reference, but
not the entry used to store such references in the grant frames. But even if
the above calculation is switched to use grant_entry_v1_t, it would end up as:
max_nr_glist_frames = (boot_max_nr_grant_frames *
(PAGE_SIZE / sizeof(grant_entry_v1_t)) /
(PAGE_SIZE / sizeof(grant_entry_v1_t)));
Which is pointless (note GREFS_PER_GRANT_FRAME has been expanded to (PAGE_SIZE
/ sizeof(grant_entry_v1_t))).
Just use boot_max_nr_grant_frames directly to size the grant table frames
array.
Fixes: 30d1eefe39 ("Import OS interfaces to Xen services.")
Sponsored by: Citrix Systems R&D
(cherry picked from commit 1a12f0aea8)
Do not clear knotes from the TTY until it gets dealloc'ed, unless the
TTY is being revoked, in that case delete the knotes when closed is
called on the TTY.
When knotes are cleared from a knlist, those knotes become detached from
the knlist. And when an event is triggered on a detached knote there
isn't an associated knlist and therefore no lock will be taken when the
event is triggered.
This becomes a problem when a detached knote is triggered on a TTY since
the mutex for a TTY is also used as the lock for its knlists. This
scenario ends up calling the TTY event handlers without the TTY lock
being held and tripping on asserts in the event handlers.
PR: 272151
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D41605
(cherry picked from commit acd5638e26)
fsize is using 2 bytes for cluster number, but with fat32 we
actually do have 4 bytes and with large disks the high bytes will be in use.
illumos issue: https://www.illumos.org/issues/16821
Sponsored by: MNX Cloud, Inc.
(cherry picked from commit 79a0d14fa0)
All uses of this function were incorrect. if_amcount is a reference
count which tracks the number of times the network stack internally set
IFF_ALLMULTI. (if_pcount is the corresponding counter for IFF_PROMISC.)
Remove if_getamcount() and fix up callers to get the number of assigned
multicast addresses instead, since that's what they actually want.
Sponsored by: Klara, Inc.
Reviewed by: zlei, glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46523
(cherry picked from commit 408c909dc6)
(cherry picked from commit b513c311d0)
The previous width of Netif (10 or 8) was too short for modern interface
names; make it 12, which is long enough to display "epair0a.1000".
This came up in practice with genet(4) interfaces, since the base
interface name is long enough that with the previous limit, VLAN
identifiers would be truncated at 1 character in the IPv6 output:
"genet0.100" becomes "genet0.1".
The width is now fixed, and doesn't depend on the address family,
because there's no reason that length of the interface name would vary
based on the AF.
PR: 266474
Reviewed by: imp,zlei,Mina Galić
Pull Request: https://github.com/freebsd/freebsd-src/pull/1223
(cherry picked from commit d33b87e8cf)
netstat: for -W, use IFNAMSIZ
If -W is specified, use IFNAMSIZ as the width of the Netif column,
instead of the default 12.
(cherry picked from commit ae9c0ba8ef)
(cherry picked from commit 6b86b8f0f6)
Try to live with cruel reality fact - if_vmove doesn't move an
interface from previous vnet cloning infrastructure to the new
one. Let's admit this as design feature and make it work better.
* Delete two blocks of code that would fallback to vnet0, if a
cloner isn't found. They didn't do any good job and also whole
idea of treating vnet0 as special one is wrong.
* When deleting a cloned interface, lookup its cloner using it's
home vnet.
With this change simple sequence works correctly:
ifconfig foo0 create
jail -c name=jj persist vnet vnet.interface=foo0
jexec jj ifconfig foo0 destroy
Differential revision: https://reviews.freebsd.org/D33942
(cherry picked from commit 6d1808f051)
* Do a single call into if_clone.c instead of two. The cloner
can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
exists in the new vnet.
Differential revision: https://reviews.freebsd.org/D33941
(cherry picked from commit 54712fc423)
Now that if_alloc_domain() never fails and actually doesn't
expose ifnet to outside we can eliminate IFNET_HOLD and two
step index allocation.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33259
(cherry picked from commit 8062e5759c)
This follows the recent MFC [1]. No functional change intended.
This is a direct commit to stable/13.
1. f500e5c6c9 net: Remove unneeded NULL check for the allocated ifnet
As for the consumer `enc_add_hhooks()`, `hhook_add_hook()` will never
fail for the given parameters. Meanwhile, to build the module if_enc(4),
at least option INET or INET6 is required, so no need for the error
EPFNOSUPPORT.
No functional change intended.
Reviewed by: ae
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46770
(cherry picked from commit 7643141e93)
(cherry picked from commit d6374ee051)
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
MFC note: This is only a partial MFC, as some drivers do not exist in
stable/13 branch. The if_epair(4) drifts too much from stable/14 so not
included in this MFC.
(cherry picked from commit aa3860851b)
(cherry picked from commit 6b1f530935)
The upstream fix to make lld output for our EFI loaders reproducible
again was committed in 54521a2ff9. Bump lld's LINKER_FREEBSD_VERSION
to be able to check this in the EFI loader Makefile.
MFC after: 3 days
(cherry picked from commit f97c7fdc59)
[Parallel] Revert sequential task changes
https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify
`execute` and the old `spawn` without argument. However, sequential
tasks might be executed by any worker thread (non-deterministic),
leading to non-determinism output for ld.lld -z nocombreloc (see
https://reviews.llvm.org/D133003).
In addition, the extra member variables have overhead.
This sequential task has only been used for lld parallel relocation
scanning.
This patch restores the behavior before https://reviews.llvm.org/D148728 .
Fix#105958
Pull Request: https://github.com/llvm/llvm-project/pull/109084
This fixes the non-reproducibility we had noticed when linking our EFI
loaders, and for which we committed a workaround in f5ce3f4ef5.
MFC after: 3 days
(cherry picked from commit 54521a2ff9)
The last consumer of if_com_alloc() is firewire. It never fails
to allocate. Most likely the if_com_alloc() KPI will go away
together with if_fwip(), less likely new consumers of if_com_alloc()
will be added, but they would need to follow the no fail KPI.
MFC note: As for stable/13, there is one additional consumer sppp,
which also never fails to allocate. This MFCing is mainly to keep
behavioral compatibility of if_alloc_domain() and its wrappers
if_alloc(), if_alloc_dev(), and if_gethandle() with stable/14 and
onward branches. 3rd party drivers should be ready for this for years
as this behavioral change was done in stable/14 at November 22 2021.
As a good effect new drivers to be MFCed to stable/13 do not have to
conditionally check failure from if_alloc() for stable/13.
(cherry picked from commit 4787572d05)
The function hda_codec_command is vulnerable to buffer over-read, the
payload value is extracted from the command and used as an array index
without any validation.
Fortunately, the payload value is capped at 255, so the information
disclosure is limited and only a small part of .rodata of bhyve binary
can be disclosed.
The risk is low because the leaked information is not sensitive. An
attacker may be able to validate the version of the bhyve binary using
this information disclosure (layout of .rodata information, ex:
jmp_tables) before executing an exploit.
Reported by: Synacktiv
Reviewed by: christos, emaste
Security: HYP-13
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46098
(cherry picked from commit e94a1d6a7f)
(cherry picked from commit 757bbf484c)
If the guest VM emits the exit code VM_EXITCODE_DB the kernel will
execute the function named vm_handle_db.
If the value of rsp is not page aligned and if rsp+sizeof(uint64_t)
spans across two pages, the function vm_copy_setup will need two structs
vm_copyinfo to prepare the copy operation.
For instance is rsp value is 0xFFC, two vm_copyinfo objects are needed:
* address=0xFFC, len=4
* address=0x1000, len=4
The vulnerability was addressed by commit 51fda658ba ("vmm: Properly
handle writes spanning across two pages in vm_handle_db"). Still,
replace the KASSERT with an error return as a more defensive approach.
Reported by: Synacktiv
Reviewed by markj, emaste
Security: HYP-09
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46133
(cherry picked from commit d19fa9c1b7)
(cherry picked from commit f8db6fb90e)
The manual page says %m is replaced with “the string representation of
the error code stored in the errno variable at the beginning of the
call”. However, we don't actually save `errno` until fairly late in
`__vfprintf()`. Make sure it is saved before we do anything that
might perturb `errno`.
MFC after: 1 week
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D46718
(cherry picked from commit 74f1007fcc)
This script is usually run unprivileged, so install fails to create a
temporary file while copying the finished database. Revert to using
cat, which can overwrite the existing file as it is usually owned by
the same user which is running the script.
Fixes: f62c1f3f8e
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D46872
(cherry picked from commit 26bd374e72)
When a signal is trapped, the script continues after the trap code has
run, unless the trap code explicitly exits. In the particular case of
locate.updatedb, this is mostly harmless, except that the trap code is
executed twice (once for the signal and once when we reach the end of
the script), but it's still worth fixing.
Furthermore, install the trap as soon as we've created the temporary
directory, to minimize the window during which we can fail to clean up
after ourselves if interrupted.
While here, simplify the empty check at the end and make some minor
style tweaks.
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D46475
(cherry picked from commit f62c1f3f8e)
This chipset suffered an (un)usual number of bugs and iterations. Let's
add our NVM/firmware code from e1000 and the similar igc_nvm function
from DPDK to keep track of issues.
Sponsored by: BBOX.io
(cherry picked from commit 33ed9bdca3)
igc, derived from igb, does not use these registers. All interrupt
timing is governed by EITR or LLI and driven by write-back.
Sponsored by: BBOX.io
(cherry picked from commit a40ecb6f74)
These are more succinct than jumping through the function pointers
directly and add some additional error handling.
(cherry picked from commit 1e3b1870ad)