The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly. For that
purpose adjust ena_dma_alloc() to support it.
It's a critical fix, especially for the arm64 EC2 instances.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27114
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
This deduplicates 2 sets of caches using the same sizes.
Memory savings fluctuate a lot, one sample result is buildworld on zfs
saving ~180MB RAM in reduced page count associated with zio caches.
The ZIL will be opened on the first write, not earlier.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
OpenZFS Pull Request: https://github.com/openzfs/zfs/pull/11152
PR: 250934
My script to convert git commits to svn patch does not handle binary
files correctly, and r367387 committed a set of empty files as a result.
MFC with: r367387
Sponsored by: Rubicon Communications, LLC (Netgate)
lz4 port from illumos to Linux added a 16KB per-CPU cache to accommodate for
the missing 16KB malloc. FreeBSD supports this size, making the extra cache
harmful as it can't share buckets.
The use of atomic_sub_64() in zfs_zstd.c was breaking the 32-bit build on
platforms without native 64-bit atomics due to atomic_sub_64() not being
available, and no fallback being provided in _STANDALONE.
Provide a standalone stub to match atomic_add_64() using simple math.
While this is not actually atomic, it does not matter in libsa context,
since it always runs single-threaded and does not run under a scheduler.
Reviewed by: mjg (in email)
This applies:
commit c4ede65bdf
Author: Mateusz Guzik <mjguzik@gmail.com>
Date: Fri Oct 30 23:26:10 2020 +0100
zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.
from openzfs
Foundation copyrights, approved by emaste@. It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.
Reviewed by: emaste, imp, gbe (manpages)
Differential Revision: https://reviews.freebsd.org/D26980
hese kstats are often expensive to compute so we want to avoid them
unless specifically requested.
The following kstats are affected by this change:
kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg
PR: 249258
Reported by: mjg
Reviewed by: mjg, allanjude
Obtained from: https://github.com/openzfs/zfs/pull/11099
Sponsored by: iXsystems, Inc.
- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one
Some vnodes come with a hack which inherits the fplookup flag despite having vops
which don't provide the routine.
Reported by: YAMAMOTO Shigeru <shigeru@os-hackers.jp>
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.
Make it 64-bit on LP64 platforms and 0-check otherwise.
Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.
Reported by: YAMAMOTO Shigeru <shigeru@os-hackers.jp>
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D26759
Add support to the _STANDALONE environment enough bits of the kernel
that we can compile it. We still have a small zstd_shim.c since there
were 3 items that were a bit hard to nail down and may be cleaned up
in the future. These go hand in hand with a number of commits to
sys/sys in the past weeks, should this need be MFCd.
Discussed with: mmacy (in review and on IRC/Slack)
Reviewed by: freqlabs (on openzfs repo)
Differential Revision: https://reviews.freebsd.org/D26218
Dynamically created OIDs automatically get this flag set.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26561
- Annotate FreeBSD sysctls with CTLFLAG_MPSAFE
- Reduce stack usage of Lua
- Don't save user FPU context in kernel threads
- Add support for procfs_list
- Code cleanup in zio_crypt
- Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
- Drop references when skipping dmu_send due to EXDEV
- Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
- Fix legacy compat for platform IOCs
This was introduced when I merged r361287 to OpenZFS and has been fixed
there already, commit 3f6bb6e43fd68e.
Reported by: swills
Reviewed by: allanjude, freqlabs, mmacy
An upcoming change to include bitset(9) macros from vm_page.h
causes a macro name collision with vchi's custom bitset macros.
This change was performed mechanically by:
sed -i .orig s/BITSET/VCHI_BITSET/g $(grep -rl BITSET sys/contrib/vchiq)
Reviewed by: andrew
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26177
The pointer to vnode is already stored into f_vnode, so f_data can be
reused. Fix all found users of f_data for DTYPE_VNODE.
Provide finit_vnode() helper to initialize file of DTYPE_VNODE type.
Reviewed by: markj (previous version)
Discussed with: freqlabs (openzfs chunk)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
This package is intended to be used with ice(4) version 0.26.16. That
update will happen in a forthcoming commit.
MFC after: 3 days
Sponsored by: Intel Corporation
Make the inlines static to avoid kernel build failure with Clang 11 on i386.
(The issue was not observed with Clang 10, currently in tree; reproduction
depends on compiler inlining choices.)
The compiler may choose not to inline 'bare' C inlines, and in that case
expects a symbol of the same name will be available. It does not
automatically define that symbol at use, because of traditional C linking
semantics. (In contrast, C++ does define it, and then deduplicates redundant
definitions at link). As we do not instantiate the C99 inline ('extern
inline ...;'), the linker errors with "undefined symbol."
Reported by: dim
Tested by: dim
Fixes: r364219
Add prng(9) as a replacement for random(9) in the kernel.
There are two major differences from random(9) and random(3):
- General prng(9) APIs (prng32(9), etc) do not guarantee an
implementation or particular sequence; they should not be used for
repeatable simulations.
- However, specific named API families are also exposed (for now: PCG),
and those are expected to be repeatable (when so-guaranteed by the named
algorithm).
Some minor differences from random(3) and earlier random(9):
- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
contention on PRNG state in SMP workloads. Each PCPU generator in an
SMP system produces a unique sequence.
- Better statistical properties than the Park-Miller ("minstd") PRNG
(longer period, uniform distribution in all bits, passes
BigCrush/PractRand analysis).
- Faster than Park-Miller ("minstd") PRNG -- no division is required to
step PCG-family PRNGs.
For now, random(9) becomes a thin shim around prng32(). Eventually I
would like to mechanically switch consumers over to the explicit API.
Reviewed by: kib, markj (previous version both)
Discussed with: markm
Differential Revision: https://reviews.freebsd.org/D25916
It was pointed out to me that this is the convention for documenting upgrade
instructions, rather than just leaving the instructions in the commit message.
It's possible these commands won't be used again before we transition to git,
but then at least they'll give a path forward for whoever touches this next.
Suggested by: lwhsu
We use these to compile libefivar. The particular motivation for this update is
the inclusion of the RISC-V machine definitions that allow us to build the
library on the platform. This support could easily have been submitted as a
small local diff, but the timing of the release coincided with this work, and
it has been over 3 years since these sources were initially imported.
Note that this comes with a license change from regular BSD 2-clause to the
BSD+Patent license. This has been approved by core@ for this particular
project [1].
As with the original import, we retain only the subset of headers that we
actually need to build libefivar. I adapted imp@'s process slightly for this
update:
# Generate list of the headers needed to build
cp -r ../vendor/edk2/dist/MdePkg/Include sys/contrib/edk2
cd lib/libefivar
make
pushd `make -V .OBJDIR`
cat .depend*.o | grep sys/contrib | cut -d' ' -f 3 |
sort -u | sed -e 's=/full/path/sys/contrib/edk2/==' > /tmp/xxx
popd
# Merge the needed files
cd ../../sys/contrib/edk2
svn revert -R .
for i in `cat /tmp/xxx`; do
svn merge -c VendorRevision svn+ssh://repo.freebsd.org/base/vendor/edk2/dist/MdePkg/$i $i
done
svn merge -c VendorRevision svn+ssh://repo.freebsd.org/base/vendor/edk2/dist/MdePkg/MdePkg.dec MdePkg.dec
[1] https://www.freebsd.org/internal/software-license.html
The ice(4) driver is the driver for the Intel E8xx series Ethernet
controllers; currently with codenames Columbiaville and
Columbia Park.
These new controllers support 100G speeds, as well as introducing
more queues, better virtualization support, and more offload
capabilities. Future work will enable virtual functions (like
in ixl(4)) and the other functionality outlined above.
For full functionality, the kernel should be compiled with
"device ice_ddp" like in the amd64 NOTES file, and/or
ice_ddp_load="YES" should be added to /boot/loader.conf so that
the DDP package file included in this commit can be downloaded
to the adapter. Otherwise, the adapter will fall back to a single
queue mode with limited functionality.
A man page for this driver will be forthcoming.
MFC after: 1 month
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21959
* Removed adaptive interrupt moderation (not suported on FreeBSD).
* Use ena_com_free_q_entries instead of ena_com_free_desc.
* Don't use ENA_MEM_FREE outside of the ena_com.
* Don't use barriers before calling doorbells as it's already done in
the HAL.
* Add function that generates random RSS key, common for all driver's
interfaces.
* Change admin stats sysctls to U64.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Although I added the reset type field to ath_hal_reset() years ago,
I never finished adding it both throughout the HALs and in if_ath.c.
This will eventually deprecate the ath_hal force_full_reset option
because it can be requested at the driver layer.
So:
* Teach ar5416ChipReset() and ar9300_chip_reset() about the HAL type
* Use it in ar5416Reset() and ar9300_reset() when doing a full chip reset
* Extend ath_reset() to include the HAL_RESET_TYPE parameter added in the above functions
* Use HAL_RESET_NORMAL in most calls to ath_reset()
* .. but use HAL_RESET_BBPANIC for the BB panics, and HAL_RESET_FORCE_COLD during fatal, beacon miss and other hardware related hangs.
This should be a glorified no-op outside of actual hardware issues.
I've tested things with ath_hal force_full_reset set to 1 for years now,
so I know that feature and a full reset works (albeit much slower than
a warm reset!) and it does unwedge hardware.
The eventual aim is to use this for all the places where the driver
detects a potential hang as well as if long calibration - ie, noise floor
calibration - fails to complete. That's one of the big hardware related
things that causes station mode operation to hang without easy recovery.
Differential Revision: https://reviews.freebsd.org/D24981
As usual, the full release notes are found on Github:
https://github.com/facebook/zstd/releases/tag/v1.4.5
Notable changes include:
* Improved decompress performance on amd64 and arm (5-10%
and 15-50%, respectively).
* '--patch-from' zstd(1) CLI option, which provides something like a very fast
version of bspatch(1) with slightly worse compression. See release notes.
In this update, I dropped the 3-year old -O0 workaround for an LLVM ARM bug;
the bug was fixed in LLVM SVN in 2017, but we didn't remove this workaround
from our tree until now.
MFC after: I won't, but feel free
Relnotes: yes
Yes, people shouldn't use bitfields in C for structure parsing.
If someone ever wants a cleanup task then it'd be great to remove them
from this vendor code and other places in the ar9285/ar9287 HALs.
Alas, here we are.
AH_BYTE_ORDER wasn't defined and neither were the two values it could be.
So when compiling ath_ee_print_9300 it'd default to the big endian struct
layout and get a WHOLE lot of stuff wrong.
So:
* move AH_BYTE_ORDER into ath_hal/ah.h where it can be used by everyone.
* ensure that AH_BYTE_ORDER is actually defined before using it!
This should work on both big and little endian platforms.
Ok, yeah, the commit title is a bit misleading.
This has to do with CDD (cyclic delay diversity) - how this and later
wifi hardware transmits lower rates over more antennas. Eg, if you're
transmitting legacy 11abg rates on 2 or 3 antennas, you COULD just
send them all at the same time or you could delay each by tens/hundreds
of nanoseconds to try and get some better diversity characteristics.
However, this has a fun side effect - the antenna pattern is no longer
a bunch of interacting dipoles, but are a bunch of interacting dipoles
plus a bunch of changing phases. And it's frequency dependent - 50-200nS
is not exactly the same fraction of a wavelength across all of 2GHz or 5GHz!
Thus the power spectral density and maximum directional gain that you're
effectively getting is not .. well, as flat as it once was.
For more information, look up FCC/OET 13TR1003 in the FCC technical report
database. It has pretty graphics and everything.
Anyway, the problem lies thusly - the CDD code just subtracts another 3dB
or 5dB for the lower rates based on transmit antenna configuration.
However, it's not done based on operating configuration and it doesn't
take into account how far from any regulatory limits the hardware is at.
It also doesn't let us do things like transmit legacy rates and frames
on a single antenna without losing up to 5dB when we absolutely don't
need to in that case (there's no CDD used when one antenna is used!)
This shows up as the hardware behaving even worse for longer distance links
at 20MHz because, well, those are the exact rates losing a bunch more
transmit power.
* For lower power NICs (ie the majority of what is out there!) it's highly
unlikely we're going to hit anywhere near the PSD limits.
* It's doing it based on the existing limits from the CTL table (conformance
testing limits) - this isn't the regulatory max! It's what the NIC is
allowed to put out in each frequency and rate configuration! So things like
band edges, power amplifier behaviour and maximum current draw apply here.
Blindly subtracting 3 to 5dB from /this/ value is /very/ conservative..
* /and/ ath9k just plainly doesn't do any of this at all.
So, for now disable it and get the TX power back, thus matching what ath9k
in Linux is doing. If/once I get some more cycles I'll look at making it
a bit more adaptive and really only kick in if we're a few dB away from
hard regulatory limits.
Tested:
* AR9344 (2GHz + SoC, 2x2 configuration) - AP and STA modes
* QCA9580 (5GHz 2x2 and 3x3 configurations) - AP and STA modes
Now that armv4/v5 is gone, remove the bits that implemented atomic operations
by disabling interrupts.
Those were specific to FreeBSD and never reached upstream.
This is required to switch MIPS to compile with LLVM by default (D23204).
Reviewed By: brooks
Differential Revision: https://reviews.freebsd.org/D24091
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Remove an old workaround that is no longer necessary since rS343824.
There used to be a problem with FMan interrupts firing on multiple CPUS
at the same time.
This ended up being due to multicast interrupts being unsupported in the
Freescale PIC (so instead of using a selection algorithm, it would do some
unspecified action, such as interrupting multiple cpus at random.)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23829
After the network epoch was added, we lost the ability to migrate the
ithread in the middle of dispatch, as being in the network epoch will pin
the current thread (for safety reasons.)
Luckily, we don't actually have to do this workaround in the first place,
as we can just bind it to the correct cpu when we preallocate it.
Pass dev through to XX_PreallocAndBindIntr() and actually bind it to the
cpu like it was supposed to in the first place, instad of leaving it
floating and moving it to the correct cpu the first time it fires.
This fixes panics while bringing up dtsec on my X5000.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23826
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23628
* Fix a couple of format errors.
* Add some extra compiler flags needed to force clang to build SPE code.
(These are temporary until the target triple is fixed)
ipf_pcksum6(), directly pass the adddress of the mbuf to it. This reduces
one pointer dereference. ipf_pcksum6() doesn't use the packet information
control block except to obtain the mbuf address.
MFC after: 3 days
FreeBSD-only function should live in the O/S specific source file.
This essentially reverts r349929 Now that ipftest and ipfreplay are
disabled in FreeBSD 11-stable.
MFC after: 3 days
The full release notes can be found on Github:
https://github.com/facebook/zstd/releases/tag/v1.4.4
Notable changes in this release include improved decompression speed (about
10%). See the Github release notes for more details.
MFC after: I'm not going to, but feel free
Relnotes: yes
Version 43 requires further modifications to iwm(4), and this was not
caught in some initial testing. Version 34 works and is the version
available on Intel's web site.
MFC with: r354201
Sponsored by: The FreeBSD Foundation
Mock implementation of NETMAP routines is located in ena_netmap.c/.h
files. All code is protected under the DEV_NETMAP macro. Makefile was
updated with files and flag.
As ENA driver provide own implementations of (un)likely it must be
undefined before including NETMAP headers.
ena_netmap_attach function is called on the end of NIC attach. It fills
structure with NIC configuration and callbacks. Then provides it to
netmap_attach. Similarly netmap_detach is called during ena_detach.
Three callbacks are used.
nm_register is implemented by ena_netmap_reg. It is called when user
space application open or close NIC in NETMAP mode. Current action is
recognized based on onoff parameter: true means on and false off. As
NICs rings need to be reconfigured ena_down and ena_up are reused.
When user space application wants to receive new packets from NIC
nm_rxsync is called, and when there are new packets ready for Tx
nm_txsync is called.
Differential Revision: https://reviews.freebsd.org/D21934
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
This is in preparation for adding the corresponding support to iwm(4).
Version 46 is the latest but contains unrecognized TLVs, so use version
43 for now.
Obtained from: linux-firmware
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
The logic in XX_IsPortalIntr() was reading past the end of XX_PInfo.
This was causing it to erroneously return 1 instead of 0 in some
circumstances, causing a panic on the AmigaOne X5000 due to mixing
exclusive and nonexclusive interrupts on the same interrupt line.
Since this code is only called a couple of times during startup, use
a simple double loop instead of the complex read-ahead single loop.
This also fixes a bug where it would never check cpu=0 on type=1.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21988
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
On 64-bit platforms uintptr_t makes the copy twice as large as it should be.
This code isn't actually used in FreeBSD, since it's for guest mode only,
not hypervisor mode, but fixing it for completeness sake.
Reported by: bdragon (clang9 build)
Clang sees this construct and warns that adding an int to a string like this
does not concatenate the two. Fortunately, this is not what octeon-sdk
actually intended to do, so we take the path towards remediation that clang
offers: use array indexing instead.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
Disable ar9300_swap_tx_desc() for the moment. It is an unused
function only tried to compile on big endian systems.
Found by: s390x buildkernel
MFC after: 3 months
Fix the reported boot failures and revert r350510.
Note this commit is effectively merging ACPICA 20190703 again and applying
an upstream patch.
https://github.com/acpica/acpica/commit/73f6372
Tested by: scottl
instead of the hard-coded value of 4. This is a precursor to increasing
the number of interfaces speficied in "on {interface, ..., interface}".
Note that though this feature is coded in ipf_y.y, it is partially
supported in the ipfilter kld, meaning it does not work yet (and is yet
to be documented in ipf.5 too).
MFC after: 2 weeks
broken ipfilter rule matches (upstream bug #554). The upstream patch
was incomplete, it resolved all but one rule compare issue. The issue
fixed here is when "{to, reply-to, dup-to} interface" are used in
conjuncion with "on interface". The match was only made if the on keyword
was specified in the same order in each case referencing the same rule.
This commit fixes this.
The reason for this is that interface name strings and comment keyword
comments are stored in a a variable length field starting at fr_names
in the frentry struct. These strings are placed into this variable length
in the order they are encountered by ipf_y.y and indexed through index
pointers in fr_ifnames, fr_comment or one of the frdest struct fd_name
fields. (Three frdest structs are within frentry.) Order matters and
this patch takes this into account.
While in here it was discovered that though ipfilter is designed to
support multiple interface specifiations per rule (up to four), this
undocumented (the man page makes no mention of it) feature does not work.
A todo is to fix the multiple interfaces feature at a later date. To
understand the design decision as to why only four were intended, it is
suspected that the decision was made because Sun workstations and PCs
rarely if ever exceeded four NICs at the time, this is not true in 2019.
PR: 238796
Reported by: WHR <msl0000023508@gmail.com>
MFC after: 2 weeks
Our in-tree gcc doesn't have a no-tree-vectorize optimization knob, so we get a
warning that it's unused. This causes the build to fail on all our gcc platforms.
Add a quick version check as a stop-gap measure to get CI building again.
When the ipfilter kld is loaded, used within VNET jail, and unloaded,
then subsequent loading, use, and unloading of another packet filters
will cause the subsequently loaded netpfil kld's to panic.
The scenario is as follows:
cd /usr/tests/sys/netpfil/common
kldunload ipl
kldunload pfsync
kldunload ipfw
kyua test pass_block
kldload ipl
kyua test pass_block
kldunload ipl
kldload pfsync
kyua test pass_block
kldunload pfsync
-- page fault panic occurs here --
Reported by: "Ahsan Barkati" <ahsanbarkati@g.....com> via kp@
Discussed with: kp@
Tested by: kp@
MFC after: 3 days
Finish what was started a few years ago and harmonize IPv6 and IPv4
kernel names. We are down to very few places now that it is feasible
to do the change for everything remaining with causing too much disturbance.
Remove "aliases" for IPv6 names which confusingly could indicate
that we are talking about a different data structure or field or
have two fields, one for each address family.
Try to follow common conventions used in FreeBSD.
* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
- in6pcb which really is an inpcb and nothing separate
- sotoin6pcb which is sotoinpcb (as per above)
- in6p_sp which is inp_sp
- in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as
that is what we call it in most of the network stack including
parts of netinet6.
The reasons behind this cleanup are that we try to further
unify netinet and netinet6 code where possible and that people
will less ignore one or the other protocol family when doing
code changes as they may not have spotted places due to different
names for the same thing.
No functional changes.
Discussed with: tuexen (SCTP changes)
MFC after: 3 months
Sponsored by: Netflix
with an eventual goal to convert all legacl zlib callers to the new zlib
version:
* Move generic zlib shims that are not specific to zlib 1.0.4 to
sys/dev/zlib.
* Connect new zlib (1.2.11) to the zlib kernel module, currently built
with Z_SOLO.
* Prefix the legacy zlib (1.0.4) with 'zlib104_' namespace.
* Convert sys/opencrypto/cryptodeflate.c to use new zlib.
* Remove bundled zlib 1.2.3 from ZFS and adapt it to new zlib and make
it depend on the zlib module.
* Fix Z_SOLO build of new zlib.
PR: 229763
Submitted by: Yoshihiro Ota <ota j email ne jp>
Reviewed by: markm (sys/dev/zlib/zlib_kmod.c)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19706
ipfilter 5.1.2 into FreeBSD-10, the fix for, 2580062 from/to targets
should be able to use any interface name, moved frentry.fr_cksum to
prior to frentry.fr_func thereby making this code redundant. After
investigating whether this fix to move fr_cksum was correct and if it
broke anything, it has been determined that the fix is correct and this
code is redundant. We remove it here.
MFC after: 2 weeks
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.
MFC after: 1 week
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.
No effective change in code is intended.
MFC after: 1 week
The reason for this is that ipftest(8), which still works on FreeBSD-11,
fails to link to it, breaking stable/11 builds.
ipftest(8) was broken (segfault) sometime during the FreeBSD-12 cycle.
glebius@ suggested we disable building it until I can get around to
fixing it. Hence this was not caught in -current.
The intention is to fix ipftest(8) as it is used by the netbsd-tests
(imported by ngie@ many moons ago) for regression testing.
MFC after: immediately
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 1 is add, 2 is remove, and 3 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
MFC after: 1 week
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
by dropping TCP fragments with offset = 1.
In addition to dropping these fragments, add a DTrace probe to allow
for more detailed monitoring and diagnosis if required.
MFC after: 1 week
PR/203585 this appears to have been broken by r235959, which predates
the ipfilter 5.1.2 import into FreeBSD.
The IPv6 checksum calculation is incorrect. To resolve this we call
in6_cksum() to do the the heavy lifting for us, through a new function
ipf_pcksum6(). Should we need to revisit this area again, a DTrace probe
is added to aid with future debugging.
PR: 203275, 203585
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20583
assumed the pfil hook registration performed in ipf_modload() would take
care of this. However ipf_modload() is only called when the ipl kld is
loaded or when ipfilter is first called when it is statically linked
into the kernel at build time.
Prior to this, even though r302298 has been in the tree for a while, it
has never been used. So, r302298 in reality begins now.
PR: 212000
Reported by: ahsanb@
MFC after: 1 month
Recent HAL change preparing to support ENAv2 required minor driver
modifications.
The ena_com_sq_empty_space() is not available in this ena-com, so it had
to be replaced with ena_com_free_desc().
Moreover, the ena_com_admin_init() is no longer using 3rd argument
indicating if the spin lock should be initialized, so it was removed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
This is a prerequisite of unifying kernel zlib instances.
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20191
One of the fun issues with scanning has been how the existing
ANI values were programmed into the hardware when channels were
changed. If you're on a really crappy channel and ANI has made
you deaf then when you scan you continue to be deaf on all channels.
This code passes in a flag to startpcureceive which in AR5416 and later
is also used to enable ANI. This allows it to know if it's a normal
operation or a scan operation.
This fixes my situation at home where a temporary spot of a device
going deaf due to interference starts scanning and .. can't hear
anything until I restart.
Now, this isn't the full fix - ideally:
(a) all the ANI config and per-channel information would be migrated
to the shared HAL stuff and enabled for all of the NICs;
(b) when a station reassociates and some other error conditions
(like missed beacons, NF calibration failures, etc) a knob
to reset ANI parameters would likely help recovery.
But hey, I'm committing bits of code again! woo!
Tested:
* AR9344 (2G), STA operation
When building libnv without a debug those arguments are no longer used
because assertions will be changed to NOP.
Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC after: 2 weeks
When building libnv without a debug those arguments are no longer used
because assertions will be changed to NOP.
Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC after: 2 weeks
The drivers were removed in r344299 so there is no need to keep the
firmware files in the src tree.
Reviewed by: imp, jhibbits, johalun
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19583
Embedded lzma decompression library becomes a module usable by other
consumers, in addition to geom_uzip.
Most important code changes are
- removal of XZ_DEC_SINGLE define, we need the code to work
with XZ_DEC_DYNALLOC;
- xz_crc32_init() call is removed from geom_uzip, xz module handles
initialization on its own.
xz is no longer embedded into geom_uzip, instead the depend line for
the module is provided, and corresponding kernel option is added to
each MIPS kernel config file using geom_uzip.
The commit also carries unrelated cleanup by removing excess "device geom_uzip"
in places which were missed in r344479.
Reviewed by: cem, hselasky, ray, slavash (previous versions)
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D19266
MFC after: 3 weeks
In r343986 we introduced a double free. The structure was already
freed fixed in the r302966. This problem was introduced
because the GitHub version was out of sync with the FreeBSD one.
Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC with: r343986
nvpair_create_stringv: free the temporary string; this fix affects
nvlist_add_stringf() and nvlist_add_stringv().
nvpair_remove_nvlist_array (NV_TYPE_NVLIST_ARRAY case): free the chain
of nvpairs (as resetting it prevents nvlist_destroy() from freeing it).
Note: freeing the chain in nvlist_destroy() is not sufficient, because
it would still leak through nvlist_take_nvlist_array(). This affects
all nvlist_*_nvlist_array() use
Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
Reported by: clang/gcc ASAN
MFC after: 2 weeks
never get here, however a test for SOLARIS, as redundant as this test is,
serves to document that this is the illumos definition. This should help
those who come after me to follow the code more easily.
MFC after: 1 month
Remove #ifdefs for ancient and irrelevant operating systems from
ipfilter.
When ipfilter was written the UNIX and UNIX-like systems in use
were diverse and plentiful. IRIX, Tru64 (OSF/1) don't exist any
more. OpenBSD removed ipfilter shortly after the first time the
ipfilter license terms changed in the early 2000's. ipfilter on AIX,
HP/UX, and Linux never really caught on. Removal of code for operating
systems that ipfilter will never run on again will simplify the code
making it easier to fix bugs, complete partially implemented features,
and extend ipfilter.
Unsupported previous version FreeBSD code and some older NetBSD code
has also been removed.
What remains is supported FreeBSD, NetBSD, and illumos. FreeBSD and
NetBSD have collaborated exchanging patches, while illumos has expressed
willingness to have their ipfilter updated to 5.1.2, provided their
zone-specific updates to their ipfilter are merged (which are of interest
to FreeBSD to allow control of ipfilters in jails from the global zone).
Reviewed by: glebius@
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19006
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.
In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.
New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.
Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.
Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.
Differential Revision: https://reviews.freebsd.org/D18951