MI linux.[c|h] are the module independent in terms of the Linux emulation
layer (ie, intended for both ISA - 32 & 64 bit), analogue of MD linux.h.
There must be a code here that cannot be placed into the corresponding by
common sense MI source and header files, i.e., code is machine independent,
but ISA dependent.
For the use_real_names knob, the code must be placed into the
linux_socket.[c|h], however linux_socket is ISA dependent.
MFC after: 2 weeks
This obsolete system call is not supported by glibc. In ancient libc
versions (before glibc 2.0), uselib() was used to load the shared
libraries with names found in an array of names in the binary.
On Linux, since 3.15, this system call is available only when
the kernel is configured with the CONFIG_USELIB option.
It doesn't look like anyone needs this syscall for others Linuxulators,
so move it to the corresponding MD Linuxulator.
MFC after: 2 weeks
Include sys/sysent.h directly where it needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.
MFC after: 2 weeks
Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.
MFC after: 2 weeks
FreeBSD-9 had introduced support for the full set of Unicode
characters to the parsing and processing of keymap character tables.
This support has been extended to cover the table for accented
characters that are reached via dead key combinations in FreeBSD-13.2.
New ioctls have been introduced to support both the pre-Unicode and
the Unicode formats and keyboard drivers have been extended to support
those ioctls.
This commit makes the ABI compatibility functions in the kernel
optional and dependent on COMPAT_FREEBSD13 in -CURRENT.
The kbdcontrol command in -CURRENT and 13-STABLE (before 13.2) has
been made ABI compatible with old kernels to allow a new world to be
run on an old kernel (that does not have full Unicode support for
keymaps).
This commit is not to merged back to 12-STABLE or 13-STABLE. It is
part of review D38465, which has been split into 3 separate commits
due to different MFC and life-time requirements of either commit.
Approved by: imp
Differential Revision: https://reviews.freebsd.org/D38465
The definition of pre-Unicode keymap ioctls will be made optional and
dependent on COMPAT_FREEBSD13 in a follow-up commit to 14-CURRENT.
While we generally provide ABI compatibility for older binaries on
a new kernel, but not functionally extended userland programs on an
old kernel, it has been specifically requested to preserve ABI
compatibility for the kbdcontrol program for both these cases.
Passing the kernel configuration option COMPAT_FREEBSD13 to the build
of kbdcontrol will make ioctls visible to the build that are normally
hidden, but required to implement compatibility with kernels that only
support 8 bit characters in dead key maps.
This commit is not to be merged to any previous FreeBSD version and
it shall be reverted as soon as this type of ABI compatibility is no
longer deemed necessary (probably before 14-STABLE is branched).
This commit is a part of review D38465 and split off to allow it to be
reverted using the commit ID.
Support for the full range of Unicode character codes has been added
to the main keymap back in 2009, with compatibility shims added in
2011 (to support an older kbdcontrol command on a new kernel during
an upgrade from FreeBSD-8 to FreeBSD-9).
Unicode support for accented characters that are reached via dead key
combinations has been added just recently, again with compatibility
shims to allow all combinations of old/new kernel and old/new
kbdcontrol command to load and display the keymaps including the dead
key table. (But full Unicode in the dead key table requires both a new
kernel and kbdcontrol command.)
This commit makes the compatibility shims depend on the respective
compatibility ioctls (OGIO_KEYMAP, OPIO_KEYMAP, OGIO_DEADKEYMAP, and
OPIO_DEADKEYMAP) being defined in sys/kbio.h. This is true for all of
them in 13-STABLE, none in 12-STABLE (as of now), and will become
optional due to a follow-up commit to sys/kbio.h in -CURRENT.
This commit is the only part of review D38465 that should be merged
back to 12-STABLE and 13-STABLE.
MFC after: 1 month
The data port returns the data of the fwcfg item.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38333
The selector port is used to select the desired fwcfg item.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38332
qemu's fwcfg and bhyve's fwctl are both used to configure ovmf. qemu's
fwcfg is much more powerfull than bhyve's fwctl. For that reason, add
support for qemu's fwcfg.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38331
The list is used to generate the dsdt entry for every acpi device.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D3830
The guest will check the dsdt to detect acpi devices. Therefore, add a
helper function to create such a dsdt entry for an acpi device.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38329
These helper function can be used to assign acpi resources to an
acpi_device.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38328
To simplify the handling of different acpi devices like qemu fwcfg or a
tpm, add a helper struct. It will handle the reporting of acpi
resources.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38327
Notable changes include:
- DSCP QoS Support (leveraging support added in
rG9c950139051298831ce19d01ea5fb33ec6ea7f89)
- Improved PFC handling and TC queue assignments (now all remaining
queues are assigned to TC 0 when more than one TC is enabled and the
number of available queues does not evenly divide between them)
- Support for dumping the internal FW state for additional debugging by
Intel support
- Support for allowing "No FEC" to be a valid state for the LESM to
negotiate when using non-standard compliant modules
Also includes various bug fixes and smaller enhancements, too.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: erj@
Tested by: Jeff Pieper <jeffrey.pieper@intel.com>
MFC after: 3 days
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D38109
Unfortunately, even after e79b6807, I still, much more rarely,
tripped asserts when playing with many ctldir mounts at once.
Since this appears to happen if we dispatched twice too fast, just
ignore it. We don't actually need to do anything if someone already
started doing it for us.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#14462
The zio returned from arc_write() in dmu_objset_sync() uses
zio_nowait(). However we may reach the end of dsl_dataset_sync()
which checks if we need to activate features in the filesystem
without knowing if that zio has even run through the ZIO pipeline yet.
In that case we will flag features to be activated in
dsl_dataset_block_born() but dsl_dataset_sync() has already
completed its run and those features will not actually be activated.
Mitigate this by moving the feature activation code in
dsl_dataset_sync_done(). Also add new ASSERTs in
dsl_scan_visitbp() checking if a block contradicts any filesystem
flags.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#13816
We've had cases where we trigger an OOM despite having memory freely
available on the system. For example, here, we had about 21GB free:
kernel: Node 0 Normal: 2418758*4kB (UME) 1549533*8kB (UE) 0*16kB
0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
22071296kB
The problem being, all the memory is in 4K and 8K contiguous regions,
but the allocation request was for a 16K contiguous region:
kernel: SafeExecutors-4 invoked oom-killer:
gfp_mask=0x42dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_ZERO),
order=2, oom_score_adj=0
The offending allocation came from this call trace:
kernel: Call Trace:
kernel: dump_stack+0x57/0x7a
kernel: dump_header+0x4f/0x1e1
kernel: oom_kill_process.cold.33+0xb/0x10
kernel: out_of_memory+0x1ad/0x490
kernel: __alloc_pages_slowpath+0xd55/0xe40
kernel: __alloc_pages_nodemask+0x2df/0x330
kernel: kmalloc_large_node+0x42/0x90
kernel: __kmalloc_node+0x25a/0x320
kernel: ? spl_kmem_free_impl+0x21/0x30 [spl]
kernel: spl_kmem_alloc_impl+0xa5/0x100 [spl]
kernel: spl_kmem_zalloc+0x19/0x20 [spl]
kernel: zfsdev_ioctl+0x2b/0xe0 [zfs]
kernel: do_vfs_ioctl+0xa9/0x640
kernel: ? __audit_syscall_entry+0xdd/0x130
kernel: ksys_ioctl+0x67/0x90
kernel: __x64_sys_ioctl+0x1a/0x20
kernel: do_syscall_64+0x5e/0x200
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7fdca3674317
The problem is, for each ioctl that ZFS makes, it has to allocate a
zfs_cmd_t structure, which is 13744 bytes in size (on my system):
sdb> sizeof zfs_cmd
(size_t)13744
This size, coupled with the fact that we currently allocate it with
kmem_zalloc, means we need a 16K contiguous region of memory to satisfy
the request.
The solution taken by this change, is to use "vmem" instead of "kmem" to
do the allocation, such that we don't necessarily need a contiguous 16K
memory region to satisfy the allocation.
Arguably, a better solution would be not to require such a large
allocation to begin with (e.g. reduce the size of the zfs_cmd_t
structure), but that'd be a much larger change than this "one liner".
Thus, I've opted for this approach for now; we can always circle back
and attempt to reduce the size of the structure in the future.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes#14474
Protect the call with the node lock. We cannot lock the fvp vnode
sleepable there, because we already own other participating vnode's
locks. Taking it without sleeping require unwinding the whole locking
state in one more place.
Note that the liveness of the node is guaranteed by the lock on the
parent directory vnode.
Reported and tested by: pho
Fixes: cbac1f3464956185cf95955344b6009e2cc3ae40ESC
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38557
The helper tmpfs_access_locked() requires either the vnode or node
locked for consistency of the access check, unlike the pure vnode op.
Reviewed by: markj, mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38557
Commit 7344856e3a6d added a lot of macros that will front end
vnet macros so that nfsd(8) can run in vnet prison.
This patch adds some more of them and also a lot of uses of
nfsstatsv1_p instead of nfsstatsv1. nfsstatsv1_p points to
nfsstatsv1 for prison0, but will point to a malloc'd structure
for other prisons.
It also puts nfsstatsv1_p in nfscommon.ko instead of nfsd.ko.
MFC after: 3 months
Suppose that the cluster size is larger than page size. If the buffer
at the old EOF (before extending) was partial and dirty, it cannot be
automatically neither written out nor validated by the buffer cache,
since extending buffer adds invalid pages at the end.
Correct the buffer state by calling vfs_bio_clrbuf() on it, to mark
newly added and zeroed pages as valid.
Note that UFS is immune to the problem because ffs_truncate() always
allocate the block and buffer for the last byte of the file.
PR: 269341
Reported by: asomers
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38549
If extension fails, vnode pager recorded size might be left increased.
Only update vnode pager when extension is past the point of no rollback.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38549
Also do minor style adjustments.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38549
When vm_fault_soft_fast() creates a mapping, it release the VM map lock
before unbusying the top-level object. Without the map lock, however,
nothing prevents the VM object from being deallocated while still busy.
Fix the problem by unbusying the object before releasing the VM map
lock. If vm_fault_soft_fast() fails to create a mapping, the VM map
lock is not released, so those cases don't need to change.
Reported by: syzkaller
Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D38527
tcp6_connect() is always called in a net_epoch read section.
Fixes: 3d76be28ec ("netinet6: require network epoch for in6_pcbconnect()")
Reviewed by: tuexen, glebius
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38506
Debugging reported NULL de-reference panic in dnode_hold_impl() I found
that for certain types of errors arc_read() may only return error code,
but not properly report it via done and pio arguments. Lack of done
calls may result in reference and/or memory leaks in higher level code.
Lack of error reporting via pio may result in unnoticed errors there.
For example, dbuf_read(), where dbuf_read_impl() ignores arc_read()
return, relies completely on the pio mechanism and missed the errors.
This patch makes arc_read() to always call done callback and always
propagate errors to parent zio, if either is provided.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes#14454
pmap_init_pv_table makes a first pass over the memory segments to
compute the amount of address space needed to allocate per-superpage
locks. It then makes a second pass over each segment allocating
domain-local memory to back the pages for the locks belonging to each
segment. This second pass rounds each segment's allocation up to a
page size since the domain-local allocation has to be a multiple of
pages. However, the first pass was only doing a single round of the
total page counts up at the end not accounting for the padding present
in each segment. To fix, apply the rounding in each segment in the
first pass instead of just at the end.
While here, tidy the second pass a bit by trimming some
not-quite-right logic copied from amd64. In particular, compute pages
directly at the start of the loop iteration to more closely match the
first loop. Then, drop an always-false condition as 'end' was
computed as 'start + pages' where 'start == highest + 1'. Thus, the
actual condition being tested was 'if (highest >= highest + 1 +
pages)'. Finally, remove 'highest' entirely by keep the result of the
'pvd' increment in the existing loop.
Reported by: CHERI (overflow)
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D38377
These ports have been removed so these knobs are no longer meaningful.
This reverts commit 608289394f.
This reverts commit 39eb07f172.
Reviewed by: imp, bapt, emaste
Differential Revision: https://reviews.freebsd.org/D38562
`cpu_data(cpu)` evaluates to a `struct cpuinfo_x86` filled with
attributes of the given CPU number. The CPU number is an index in the
`__cpu_data[]` array with MAXCPU entries. On FreeBSD, we simply
initialize all of them like we do with `boot_cpu_data`.
While here, we add the `x86_model` field to the `struct cpuinfo_x86`. We
use `CPUID_TO_MODEL()` to set it.
At the same time, we fix the value of `x86` which should have been set
to the CPU family. It was using the same implementation as
`CPUID_TO_MODEL()` before. It now uses `CPUID_TO_FAMILY()`.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D38542
`xa_is_err()` is synonymous to `IS_ERR()`.
Other introduced functions call their equivalent without the `irq*`
suffix.
Reviewed by: manu
Approved by: manu
Differential Revision: https://reviews.freebsd.org/D38534