Intel Speed Shift is Intel's technology to control frequency in hardware,
with hints from software.
Let's get a working version of this in the tree and we can refine it from
here.
Submitted by: bwidawsk, scottph
Reviewed by: bcr (manpages), myself
Discussed with: jhb, kib (earlier versions)
With feedback from: Greg V, gallatin, freebsdnewbie AT freenet.de
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D18028
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.
It introduces the net.inet.icmp.redirtimeout sysctl, which sets
the expiration time for redirected routes. It defaults to 10 minutes,
analogous to net.inet6.icmp6.redirtimeout.
The implementation uses a separate file, route_temporal.c, as route.c is
already bloated with tons of different functions.
Internally, expiration is implemented as a per-rnh callout scheduled when
a route with a non-zero rt_expire time is added or rt_expire is changed.
It does not add any overhead when no temporal routes are present.
The callout traverses the entire routing tree under wlock, scheduling expired
routes for deletion and calculating the next time it needs to run. The
rationale for such an implementation is the following: workloads requiring a
large number of routes typically have redirects turned off already, while
systems with a small number of routes will not incur large overhead during
tree traversal.
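The traversal can be illustrated with a minimal Python sketch. This is not the
route_temporal.c code: the dict-based table, the name expire_routes, and the
use of None for "nothing to reschedule" are assumptions made for illustration.

```python
def expire_routes(routes, now):
    """Remove expired routes and return (expired_keys, next_run).

    routes maps a destination to its expiry time; 0 means the route
    never expires (modelling a zero rt_expire). next_run is None when
    no temporal routes remain, so nothing would be rescheduled.
    """
    expired = []
    next_run = None
    for dst, expire in routes.items():
        if expire == 0:
            continue  # permanent route, never considered
        if expire <= now:
            expired.append(dst)  # schedule for deletion
        elif next_run is None or expire < next_run:
            next_run = expire  # earliest remaining expiry
    for dst in expired:
        del routes[dst]
    return expired, next_run
```

A single pass both collects expired entries and finds the earliest remaining
expiry, which is why no work is needed when no temporal routes exist.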
This change also fixes the netstat -rn display of route expiration time, which
has been broken since the conversion from kread() to sysctl.
Reviewed by: bz
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D23075
The main objective here is to make it easy to identify what needs to change
in order to use a different sysent generator than the current Lua-based one,
which may be used to MFC some of the changes that have happened so we can
avoid parallel accidents in stable branches, for instance.
As a secondary objective, it's now feasible to override the generator on a
per-Makefile basis if needed, so that one could refactor their Makefile to
use this while pinning generation to the legacy makesyscalls.sh. I don't
anticipate any consistent need for such a thing, but it's low-effort to
achieve.
Summary:
The CPLD is the communications medium between the CPU and the XMOS
"Xena" event coprocessor. It provides a mailbox communication feature,
along with dual-port RAM to be used between the CPU and XMOS. It also
provides basic board stats, such as PCIe presence, JTAG signals,
and CPU fan speed reporting (in revolutions per second). Only fan speed
reading is handled, as a sysctl.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D23136
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.
Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.
Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.
Reviewed by: brooks, imp
Nice!: emaste
Differential Revision: https://reviews.freebsd.org/D23197
The utility here seems somewhat limited, but clang will attempt to generate
.eh_frame and actively fail in doing so. It is perhaps worth investigating
why it's being generated in the first place (GCC doesn't do so), but this
isn't a high priority.
lld on RISC-V is not yet able to handle undefined weak symbols for
non-PIC code in the code model (medany/medium) used by the RISC-V
kernel.
Both GCC and clang emit an auipc / addi pair of instructions to
generate an address relative to the current PC with a 31-bit offset.
Undefined weak symbols need to have an address of 0, but the kernel
runs with PC values much greater than 2^31, so there is no way to
construct a NULL pointer as a PC-relative value. The bfd linker
rewrites the instruction pair to use lui / addi with values of 0 to
force a NULL pointer address. (There are similar cases for 'ld'
becoming auipc / ld that bfd rewrites to lui / ld with an address of
0.)
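The reachability argument can be checked with a few lines of Python. This is a
rough sketch: the window is modelled as a plain signed range of 2^31 bytes
either side of the PC, as the text describes, and KERNEL_PC is a hypothetical
address chosen only for illustration.

```python
OFFSET_LIMIT = 1 << 31  # assumed reach of an auipc / addi pair, per the text

# Hypothetical kernel text address, well above 2^31.
KERNEL_PC = 0xFFFFFFC000000000


def pc_relative_reachable(pc, target):
    """True if target lies within the assumed PC-relative window around pc."""
    delta = target - pc
    return -OFFSET_LIMIT <= delta < OFFSET_LIMIT
```

With these assumptions pc_relative_reachable(KERNEL_PC, 0) is False, which is
why an undefined weak symbol cannot be expressed as a PC-relative value there.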
To work around this, compile the kernel with -fPIE when using lld.
This does not make the kernel position-independent, but it does
force the compiler to indirect address lookups through GOT entries
(so auipc / ld against a GOT entry to fetch the address). This
adds extra memory indirections for global symbols, so should be
disabled once lld is finally fixed.
A few 'la' instructions in locore that depend on PC-relative
addressing to load physical addresses before paging is enabled have to
use auipc / addi and not indirect via GOT entries, so change those to
use 'lla' which always uses auipc / addi for both PIC and non-PIC.
Submitted by: jrtc27
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23064
more consistent with other NUMA features as UMA_ZONE_FIRSTTOUCH and
UMA_ZONE_ROUNDROBIN. The system will now pick a default depending
on kernel configuration. API users need only specify one if they want to
override the default.
Remove the UMA_XDOMAIN and UMA_FIRSTTOUCH kernel options and key only off
of NUMA. XDOMAIN is now fast enough in all cases to enable whenever NUMA
is.
Reviewed by: markj
Discussed with: rlibby
Differential Revision: https://reviews.freebsd.org/D22831
This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.
This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.
The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.
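As an illustration of what a lock-based emulation looks like, here is a small
Python sketch. It does not reflect the kernel code: threading.Lock stands in
for whatever locking primitive the kernel uses, and the class and method names
are assumptions.

```python
import threading

MASK64 = (1 << 64) - 1


class Atomic64:
    """Lock-based emulation of a 64-bit atomic variable (illustrative only)."""

    def __init__(self, value=0):
        self._value = value & MASK64
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def fetchadd(self, delta):
        """Add delta modulo 2**64 and return the previous value."""
        with self._lock:
            old = self._value
            self._value = (old + delta) & MASK64
            return old

    def cmpset(self, expected, new):
        """Store new only if the current value equals expected."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new & MASK64
            return True
```

Every operation takes the lock, so the read-modify-write sequences appear
indivisible to other threads even though no 64-bit atomic instruction is used.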
Submitted by: jhibbits (original patch), kevans (mips bits)
Reviewed by: jhibbits, jeff, kevans
Differential Revision: https://reviews.freebsd.org/D22976
The fdt attachment for this heavily relies on extres for clk work. This
unbreaks the build for mips XLPN32/XLP, which have pci/fdt but no need for
this fdt attachment.
An i2c bus can be divided into segments which can be selectively connected
and disconnected from the main bus. This is usually done to enable using
multiple slave devices having the same address, by isolating the devices
onto separate bus segments, only one of which is connected to the main bus
at once.
There are several types of i2c bus muxes, which break down into two general
categories...
- Muxes which are themselves i2c slaves. These devices respond to i2c
commands on their upstream bus, and based on those commands, connect
various downstream buses to the upstream. In newbus terms, they are both
a child of an iicbus and the parent of one or more iicbus instances.
- Muxes which are not i2c devices themselves. Such devices are part of the
i2c bus electrically, but in newbus terms their parent is some other
bus. The association with the upstream bus must be established by
separate metadata (such as FDT data).
In both cases, the mux driver has one or more iicbus child instances
representing the downstream buses. The mux driver implements the iicbus_if
interface, as if it were an iichb host bridge/i2c controller driver. It
services the IO requests sent to it by forwarding them to the iicbus
instance representing the upstream bus, after electrically connecting the
upstream bus to the downstream bus that hosts the i2c slave device which
made the IO request.
The net effect is automatic mux switching which is transparent to slaves on
the downstream buses. They just do i2c IO the way they normally do, and the
bus is electrically connected for the duration of the IO and then idled when
it is complete.
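The select, forward, idle sequence can be sketched in Python as follows. The
class names and the history list are hypothetical and exist only to make the
ordering visible; this is not the iicbus_if interface.

```python
class UpstreamBus:
    """Stand-in for the iicbus instance representing the upstream bus."""

    def __init__(self):
        self.log = []

    def transfer(self, addr, data):
        self.log.append((addr, bytes(data)))
        return len(data)


class MuxSketch:
    """Forwards IO upstream after connecting the requesting downstream bus."""

    IDLE = None

    def __init__(self, upstream, num_downstream):
        self.upstream = upstream
        self.num_downstream = num_downstream
        self.selected = self.IDLE
        self.history = []  # records each bus selection, for illustration

    def _select(self, busidx):
        self.selected = busidx
        self.history.append(busidx)

    def transfer(self, busidx, addr, data):
        if not 0 <= busidx < self.num_downstream:
            raise ValueError("no such downstream bus")
        self._select(busidx)  # electrically connect the downstream bus
        try:
            return self.upstream.transfer(addr, data)  # forward upstream
        finally:
            self._select(self.IDLE)  # idle the mux when the IO completes
```

The slave only calls transfer(); the switching happens around the forwarded
request, which is what makes it transparent to downstream devices.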
The existing iicbus_if callback() method is enhanced so that the parameter
passed to it can be a struct which contains a device_t for the requesting
bus and slave devices. This change is done by adding a flag that indicates
the extra values are present, and making the flags field the first field of
a new args struct. If the flag is set, the iichb or mux driver can recast
the pointer-to-flags into a pointer-to-struct and access the extra
fields. Thus ABI compatibility with older drivers is retained (but a mux
cannot exist on the bus with the older iicbus driver in use.)
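The flags-first layout can be demonstrated with Python's ctypes. This is only
a sketch of the technique: the flag value, the struct name, and the field
names are hypothetical and are not taken from the iicbus headers.

```python
import ctypes

HAS_EXTRA_ARGS = 0x8000  # hypothetical flag bit meaning "struct follows"


class CallbackArgs(ctypes.Structure):
    # flags is deliberately the first field, so a pointer to the struct
    # and a pointer to its flags refer to the same address.
    _fields_ = [
        ("flags", ctypes.c_int),
        ("bus", ctypes.c_void_p),
        ("dev", ctypes.c_void_p),
    ]


def callback(flags_ptr):
    """Receive a pointer-to-int, as an older-style caller would pass."""
    flags = flags_ptr.contents.value
    if not flags & HAS_EXTRA_ARGS:
        return (flags, None, None)  # old caller: only flags are valid
    # New caller: recast pointer-to-flags into pointer-to-struct.
    args = ctypes.cast(flags_ptr, ctypes.POINTER(CallbackArgs)).contents
    return (flags & ~HAS_EXTRA_ARGS, args.bus, args.dev)
```

An old caller passes a pointer to a bare int and the extra fields are never
touched; a new caller sets the flag and passes a pointer to the struct.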
A new set of core support routines exists in iicbus.c. This code will help
implement mux drivers for any type of mux hardware by supplying all the
boilerplate code that forwards IO requests upstream. It also has code for
parsing metadata and instantiating the child iicbus instances based on it.
Two new hardware mux drivers are added. The ltc430x driver supports the
LTC4305/4306 mux chips which are controlled via i2c commands. The
iic_gpiomux driver supports any mux hardware which is controlled by
manipulating the state of one or more gpio pins.
Test Plan:
Tested locally using a variety of mux'd bus configurations involving both
ltc4305 and a homebrew gpio-controlled mux. Tested configurations included
cascaded muxes (unlikely in the real world, but useful to prove that 'it all
just works' in terms of the automatic switching and upstream forwarding of
IO requests).
This brings arm into line with how every other arch does it. For some
reason, only arm lacked a definition of a symbol named kernbase in its
locore.S file(s) for use in its ldscript.arm file. Needlessly different
means harder to maintain.
Using a common symbol name also eases work in progress on a script to help
generate arm and arm64 kernels packaged in various ways (like with a header
blob needed for a bootloader prepended to the kernel file).
symbols from the linked kernel.
The main thrust of this change is to generate a kernel that has the arm
"marker" symbols stripped. Marker symbols start with $a, $d, $t or $x, and
are emitted by the compiler to tell other toolchain components about the
locations of data embedded in the instruction stream (literal-pool
stuff). They are used for generating mixed-endian binaries (which we don't
support). The linked kernel has approximately 21,000 such symbols in it,
wasting space (500K in kernel.full, 190K in the final linked kernel), and
sometimes obscuring function names in stack tracebacks.
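The name filter itself is simple. The Python sketch below only illustrates
which symbol names count as markers under the rule stated above; the actual
stripping is done by the toolchain during the kernel link, and the function
names here are assumptions.

```python
MARKER_PREFIXES = ("$a", "$d", "$t", "$x")


def is_marker_symbol(name):
    """True if the symbol name starts with one of the arm marker prefixes."""
    return name.startswith(MARKER_PREFIXES)


def strip_markers(symbols):
    """Return the symbol names with marker symbols removed, order preserved."""
    return [s for s in symbols if not is_marker_symbol(s)]
```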
This change also simplifies the way the kernel is linked. Instead of using
sed to generate two different ldscript files to generate both an elf kernel
and a binary (elf headers stripped) kernel, we now use a single ldscript
that refers to a "text_start" symbol, and we provide the value for that
symbol using --defsym on the linker command line.
This driver configures the registers in the GRF according to the value
of the regulators for the platform.
Some IP can run with either 3.0V or 1.8V. If we don't configure them
correctly according to the external voltage used, they will not work.
It's only done at boot time for now and might be needed at runtime for
IP like sdmmc.
Reviewed by: mmel
Tested On: RockPro64, Firefly-RK3399 (gonzo), AIO-3288 (mmel)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22854
* Fix a couple of format errors.
* Add some extra compiler flags needed to force clang to build SPE code.
(These are temporary until the target triple is fixed)
To improve reliability of kernel modules after the clang switch, switch to
-fPIC when building for now.
This bypasses some limitations to the way clang and LLD handle relocations,
and is a more robustly tested compilation regime than the
"static shared object" mode that we were previously attempting to convince
the compiler stack to use.
The kernel linker was recently augmented to be able to handle this mode.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22798
other files.
Arm and mips systems need to replace the SYSTEM_LD variable because they
need to create intermediate files which are post-processed with objcopy to
create the final .TARGET file. Previously they did so by pasting the full
expansion of SYSTEM_LD with the output filename replaced. This meant that
changing SYSTEM_LD in kern.pre.mk required chasing down anything that
replaces it and figuring out how it differs so you could paste your changes
in there too.
Now there is a SYSTEM_LD_BASECMD variable that holds the entire basic kernel
linker command without the input and output files. This will allow arm and
mips makefiles to create their custom versions by referring to
SYSTEM_LD_BASECMD, which then becomes the one place where you have to make
changes to the basic linker command args.
Differential Revision: https://reviews.freebsd.org/D22921
SYSTEM_LD variable. This avoids duplicating the contents of SYSTEM_LD
from kern.pre.mk just to add the -N flag to it. If the basic linker command
ever needs to be changed, this will be one less place that has to be found
and fixed.
Some testing by kp@ indicates that the -N flag may not be needed at all,
so a comment to that effect is also added, and the -N flag may be removed
in a followup commit.
Differential Revision: https://reviews.freebsd.org/D22920
The VM generation counter is a 128-bit value exposed by the BIOS via ACPI.
The value changes to another unique identifier whenever a VM is duplicated.
Additionally, ACPI provides notification events when such events occur.
The driver decodes the pointer to the UUID, exports the value to userspace
via OPAQUE sysctl blob, and forwards the ACPI notifications in the form of
an EVENTHANDLER invocation as well as userspace devctl events.
See design paper: https://go.microsoft.com/fwlink/p/?LinkID=260709
This is lame, but it's what we already do for the clang build. We take
misaligned pointers into network header structures in many places.
Reviewed by: ian
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22876
The OpenCores I2C IP core can be found on any bus. Split out the PCI
bus specifics into their own file, only compiled on systems with PCI.
Reviewed by: kp
Sponsored by: Axiado
We use armv7/GENERIC for the RPI2 images. The original RPI2 is actually a
32-bit BCM2836, but v1.2 was upgraded to the 64-bit BCM2837. The project
continues to provide the RPI2 image as armv7, as it's the lowest common
denominator of the two. Historically, we've just kind of implicitly
acknowledged this by including some bcm2837 bits on a SOC_BCM2836 kernel
config -- this worked until r354875 added code that actually cared.
Acknowledge formally that BCM2837 is valid in arm32.
This name is inconsistent with the other BCM* SOC on !arm64 for two reasons:
1. It's a pre-existing option on arm64, and
2. the naming convention on arm/ should've arguably changed to include BRCM
#1 seems to be a convincing enough argument to maintain the existing name
for it.
array with a singleton.
Also, pccbb isa attachment is never going to happen, so disconnect it from the
build (will delete this in a future commit). It would need to be updated as well,
but since this code is effectively dead code, remove it from the build instead.
Unfortunately, there are some limitations:
- the memory aperture of this controller is only 16MiB, so it is nearly
unusable for graphics cards
- every attempt to generate a type 1 config cycle causes a trap.
These config cycles are disabled for now, so cards with a PCIe switch
are not supported.
- in some cases, an attempt to do a config cycle to a (probably) not-yet-ready
card also causes a trap. This cannot be detected at runtime, but it seems
to be a very rare issue.
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22724
We used to include the hisi version if soc_hisi_hi6220 was present,
include the altera version if dwmmc_altera was present and include
the rockchip version if soc_rockchip_rk3328 was present.
Now every version has its own device directive.
The rockchip version isn't named dwmmc_rockchip because all other
rockchip drivers are named rk_XXX.
MFC after: 1 month
These were obtained from the Chelsio Unified Wire v3.12.0.1 beta
release.
Note that the firmwares are not uuencoded any more.
MFH: 1 month
Sponsored by: Chelsio Communications
This change makes it possible to use OPAL console as a GDB debug port.
Similar to uart and uart_phyp debug ports, it has to be enabled by
setting the hw.uart.dbgport variable to the serial console node
of the device tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22649
Summary:
There's no need to use the fallback fls() and flsl() libkern functions
when the PowerISA includes instructions that already do the bulk of the
work. Take advantage of this through the GCC builtins __builtin_clz()
and __builtin_clzl().
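The relationship between fls() and count-leading-zeros can be written out as a
short Python model. It mirrors the arithmetic only; clz32 is a stand-in for
__builtin_clz() on a 32-bit int, and the explicit zero check is needed because
the builtin's result is undefined for an argument of 0.

```python
def clz32(x):
    """Count leading zeros of a nonzero 32-bit value (models __builtin_clz)."""
    if not 0 < x < (1 << 32):
        raise ValueError("clz is undefined for 0 and out-of-range values")
    return 32 - x.bit_length()


def fls(x):
    """1-based index of the most significant set bit, or 0 if x is 0."""
    x &= 0xFFFFFFFF
    return 0 if x == 0 else 32 - clz32(x)
```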
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D22340
In some cases, such as a locked bootstrap or a device's inability to boot from
removable media, we cannot use the standard boot sequence and it is necessary
to boot the kernel directly from U-Boot.
Discussed with: jhibbits
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D13861
Uses two GPIO pins as MDC (clock) and MDIO (bidirectional I/O), relies
on mii_bitbang.
Tested on SG-3200 where the PHY for one of the ports is wired independently
of the SoC MDIO bus.
Sponsored by: Rubicon Communications, LLC (Netgate)
ConnectX-6 DX.
Currently TLS v1.2 and v1.3 with AES 128/256 crypto over TCP/IP (v4
and v6) is supported.
A per PCI device UMA zone is used to manage the memory of the send
tags. To optimize performance some crypto contexts may be cached by
the UMA zone, until the UMA zone finishes the memory of the given send
tag.
An asynchronous task is used to manage setup of the send tags towards the
firmware, most importantly setting the AES 128/256 bit pre-shared keys
for the crypto context.
Updating the state of the AES crypto engine and encrypting data are
both done in the fast path. Each send tag tracks the TCP sequence
number in order to detect non-contiguous blocks of data, which may
require a dump of prior unencrypted data, to restore the crypto state
prior to wire transmission.
Statistics counters have been added to count the amount of TLS data
transmitted in total, and the amount of TLS data which has been dumped
prior to transmission. When non-contiguous TCP sequence numbers are
detected, the software needs to dump the beginning of the current TLS
record up until the point of retransmission. All TLS counters utilize
the counter(9) API.
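The sequence tracking can be sketched in Python as below. This is a simplified
model under stated assumptions: the class, its counters, and the idea of
simply counting a "resync" are illustrative, and the real driver additionally
dumps the start of the current TLS record to restore the crypto state.

```python
SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers wrap at 32 bits


class SendTagSketch:
    """Tracks the next expected TCP sequence number for one send tag."""

    def __init__(self, initial_seq):
        self.expected_seq = initial_seq & SEQ_MASK
        self.contiguous_sends = 0
        self.resyncs = 0

    def transmit(self, seq, length):
        """Return True if the segment continues the stream, else False."""
        contiguous = (seq & SEQ_MASK) == self.expected_seq
        if contiguous:
            self.contiguous_sends += 1
        else:
            self.resyncs += 1  # non-contiguous: crypto state must be restored
        self.expected_seq = (seq + length) & SEQ_MASK
        return contiguous
```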
In order to enable hardware TLS offload the following sysctls must be set:
kern.ipc.mb_use_ext_pgs=1
kern.ipc.tls.ifnet.permitted=1
kern.ipc.tls.enable=1
Sponsored by: Mellanox Technologies
Interrupt based driver, implements SPI mode and clock configuration.
Tested on espressobin and SG-3200.
Sponsored by: Rubicon Communications, LLC (Netgate)
When the linker doesn't have this feature, add -mno-relax to CFLAGS
on RISC-V.
Define the feature for ld.bfd, but not lld. If lld gains relaxation
support in a newer version, we can enable it for those versions of lld
in bsd.linker.mk.
Reviewed by: mhorne
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22659
The hardware offload is primarily targeted for TLS v1.2 and v1.3,
using AES 128/256 bit pre-shared keys. This patch adds all the needed
hardware structures, capabilities and firmware commands.
Sponsored by: Mellanox Technologies
This controller is a bit tricky, as the STOP condition must be indicated in
the last transferred byte, and some devices will not like the repeated start
behavior of this controller. A proper fix to this issue is in the works.
This driver works in polling mode and can be used early in the boot (required
in some cases).
Tested on espressobin/SG-1100 and the SG-3200.
Obtained from: pfSense
Sponsored by: Rubicon Communications, LLC (Netgate)
This makes it possible to retrieve per-connection statistical
information such as the receive window size, RTT, or goodput,
using a newly added TCP_STATS getsockopt(2) option, and extract
them using the stats_voistat_fetch(3) API.
See the net/tcprtt port for an example consumer of this API.
Compared to the existing TCP_INFO system, the main differences
are that this mechanism is easy to extend without breaking ABI,
and provides statistical information instead of raw "snapshots"
of values at a given point in time. stats(3) is more generic
and can be used in both userland and the kernel.
Reviewed by: thj
Tested by: thj
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20655
After discussing with mmel@, it was clear this is insufficient to address
all the needs. mmel@ will commit his original patch, from
https://reviews.freebsd.org/D13861, and the additions needed from r354714
will be made afterward.
Requested by: mmel
Sponsored by: Juniper Networks, Inc.
r354290 removed arm.arm from universe, but arm.arm kernels were still
found and built during the kernel stage. r354934 tagged armv5 kernel
configs as NO_UNIVERSE, but LINT-V5 remained. Stop building it as well.
Leave the clean rule in place for now so folks don't end up with a stale
LINT-V5.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22560
This change makes it possible to use a POWER Hypervisor virtual
terminal device (phyp vty) as a GDB debug port.
Similar to the uart debug port, it has to be enabled by setting
the hw.uart_phyp.dbgport variable to the vty node of the device
tree.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22205
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.
NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine. Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS. This permits using the
TOE to segment the encrypted TLS records. However, this approach does
have some limitations:
1) Regular TOE sockets cannot be used when the TOE is in this special
mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but
not both at the same time.
2) In NIC KTLS mode, the TOE is only able to accept a per-connection
timestamp offset that varies in the upper 4 bits. Put another way,
only connections whose timestamp offset has the 28 lower bits
cleared can use NIC KTLS and generate correct timestamps. The
driver will refuse to enable NIC KTLS on connections with a
timestamp offset with any of the lower 28 bits set. To use NIC
KTLS, users can either disable TCP timestamps by setting the
net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
tcp_new_ts_offset() function to clear the lower 28 bits of the
generated offset.
3) Because the TCP segmentation relies on fields mirrored in a TCB in
the TOE, not all fields in a TCP packet can be sent in the TCP
segments generated from a TLS record. Specifically, for packets
containing TCP options other than timestamps, the driver will
inject an "empty" TCP packet holding the requested options (e.g. a
SACK scoreboard) along with the segments from the TLS record.
These empty TCP packets are counted by the
dev.cc.N.txq.M.kern_tls_options sysctls.
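The timestamp offset restriction in item 2 comes down to a mask test. The
Python sketch below shows the check and the clearing operation a local patch
would perform; the function names are hypothetical and this is not the driver
or tcp_new_ts_offset() code.

```python
LOW28_MASK = (1 << 28) - 1


def offset_usable_for_nic_ktls(ts_offset):
    """True if the lower 28 bits of the 32-bit timestamp offset are clear."""
    return (ts_offset & LOW28_MASK) == 0


def clear_low28(ts_offset):
    """Keep only the upper 4 bits of a 32-bit timestamp offset."""
    return ts_offset & 0xFFFFFFFF & ~LOW28_MASK
```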
Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer. The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records. In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.
TCP packets for TLS requests are broken down into the following
classes (with associated counters):
- Mbufs that send an entire TLS record in full do not have any waste
bytes (dev.cc.N.txq.M.kern_tls_full).
- Mbufs that send a short TLS record that ends before the end of the
trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC,
the encryption must always start at the beginning, so if the mbuf
starts at an offset into the TLS record, the offset bytes will be
"waste" bytes. For sockets using AES-GCM, the encryption can start
at the 16 byte block before the starting offset capping the waste at
15 bytes.
- Mbufs that send a partial TLS record that has a non-zero starting
offset but ends at the end of the trailer
(dev.cc.N.txq.M.kern_tls_partial). In order to compute the
authentication hash stored in the trailer, the entire TLS record
must be sent as input to the crypto engine, so the bytes before the
offset are always "waste" bytes.
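The three classes and their waste bytes can be summarised in a small Python
model. It is deliberately simplified: offsets are measured from the start of
the record, the 16-byte AES-GCM blocks are assumed to be aligned to offset 0,
and the function name and return format are illustrative.

```python
def classify_tls_request(start, end, record_len, cipher):
    """Return (class_name, waste_bytes) for a request covering [start, end).

    cipher is "cbc" or "gcm". record_len includes the trailer.
    """
    if not 0 <= start < end <= record_len:
        raise ValueError("request must lie within the record")
    if cipher not in ("cbc", "gcm"):
        raise ValueError("unknown cipher")
    if end == record_len:
        if start == 0:
            return ("full", 0)  # whole record, nothing wasted
        # Trailer hash needs the entire record as input.
        return ("partial", start)
    if cipher == "gcm":
        # Encryption can start at the enclosing 16-byte block.
        return ("short", start % 16)
    # AES-CBC must always start at the beginning.
    return ("short", start)
```

For example, a request covering bytes 100 to 500 of a 1000-byte record is
"short" with 100 waste bytes under AES-CBC but only 4 under AES-GCM.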
In addition, other per-txq sysctls are provided:
- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
using AES-CBC.
- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
using AES-GCM.
- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
compensate for the TOE engine not being able to set FIN on the last
segment of a TLS record if the TLS record mbuf had FIN set.
- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
txq including full, short, and partial records.
- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
and payload) sent for TLS record requests.
- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
record requests.
To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:
hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21962