mpf are allocated on the stack, which causes this check to falsely trigger.
A new check which takes on-stack mbufs into account will be reintroduced
after 5.2 is out the door.
Approved by: re (watson)
Requested by: many
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.
It removes significant locking requirements from the tcp stack with
regard to the routing table.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
accordingly. The define is left intact for ABI compatibility
with userland.
This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Short description of ip_fastforward:
o adds full direct process-to-completion IPv4 forwarding code
o handles ip fragmentation incl. hw support (ip_flow did not)
o sends icmp needfrag to source if DF is set (ip_flow did not)
o supports ipfw and ipfilter (ip_flow did not)
o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
o returns anything it can't handle back to normal ip_input
Enable with sysctl -w net.inet.ip.fastforwarding=1
Reviewed by: sam (mentor)
be printed, if the module were loaded into a kernel which had INET6 enabled.
The gre(4) driver does not use INET6, nor is it specified for IPv6. The
tunnel_status() function in ifconfig(8) is somewhat overzealous and assumes
that all tunnel interfaces speak KAME ifioctls.
This fix follows the path of least resistance, by teaching gre(4) about
the two KAME ifioctls concerned.
PR: bin/56341
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.
Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
o move it from subr_bus.c to netisr.c where it more properly belongs
o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to
grab Giant as needed when MPSAFE operation is enabled
Supported by: FreeBSD Foundation
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake`
happens w/o a context switch.
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
a minor conflict):
o Use ETHER_ADDR_LEN in preference to '6'.
o Remove two unnecessary (caddr_t) casts. One of them causes problems in
my tree where etherbroadcastaddr is const, and (caddr_t) casts the const
away.
is non-free. (More checks can/should be added in the future.)
Use M_ASSERTVALID in BPF_MTAP so that we catch when freed mbufs are
passed in, even if no bpf listeners are active.
Inspired by a bug in if_dc caught by Kenjiro Cho.
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
any queued packets for the isr, process those packets before the newly
submitted packet, maintaining ordering of all packets being delivered
to the netisr. Remove the bypass counter since we don't bypass anymore.
Leave the comment about possible problems and options since later
performance optimization may change the strategy for addressing ordering
problems here.
Specifically, this maintains the strong isr ordering guarantee; additional
parallelism and lower latency may be possible by moving to weaker
guarantees (per-interface, for example). We will probably at some point
also want to remove the one instance netisr dispatch limit currently
enforced by a mutex, but it's not clear that's 100% safe yet, even in
the netperf branch.
Reviewed by: sam, others
o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send
Support by: FreeBSD Foundation
(direct dispatch) in interrupt threads when the netisr in question
isn't already active. If a netisr is already active, or direct
dispatch is already in progress, we queue the packet for later
delivery. Previously, this option was disabled by default. I have
measured 20%+ performance improvements in IP packet forwarding with
this enabled.
Please report any problems ASAP, especially relating to stack depth or
out-of-order packet processing.
Discussed with: jlemon, peter
Sponsored by: DARPA, Network Associates Laboratories
then the mbuf has been consumed by a hook; otherwise beware of a null
mbuf return (gack). In particular the bridge was doing the wrong thing.
While in the ipv6 code make it's handling of pfil_run_hooks identical
to netbsd.
Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.
If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.
Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.
The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules
Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)
o replace magic constants with #defines (e.g. ETHER_ADDR_LEN)
o move mib variables to net.link.ether.bridge with backwards compatible
entries for well-known items maintained under BURN_BRIDGES
o revamp debugging support so it is conditioanlly compiled with BRIDGE_DEBUG
(on currently) and runtime controlled by net.link.ether.bridge.debug
o change timeout to MPSAFE callout
o optimize lookup for common case of two interfaces
o optimize forwarding path to take IFNET lock only when needed
o make boot-time printf dependent on bootverbose
o sundry style changes (ANSI decls, extraneous spaces, etc.)
Sponsored by: FreeBSD Foundation
to protect the vlan state in each ifnet (e.g. vlan count). The latter is
probably better handled through an ifnet-centric means but since changes
are infrequent shouldn't matter for now.
Sponsored by: FreeBSD Foundation
parts of the system about certain kinds of events, like changes
in the ABR rate, changes in the carrier state, PVC changes. The
main consumers of these events are the harp(4) pseudo-driver
and the ILMI daemon via ng_atm(4).
implement the ATMIOCGVCCS ioctls. This routine handles changing
VCC tables (which can occure because we cannot hold the driver mutex
while allocating memory) with a loop and a re-allocation, should the
table not fit in the allocated memory.
from the network interface earlier in ether_input(). At some point
(no fingers pointed), things were restructured and the labeling operation
moved later. This wasn't a problem as BPF_MTAP() relies on the ifnet
label not the mbuf label, but there might have been other problems.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
extracted from received frames, both in the IFCAP_VLAN_HWTAGGING case
and not. (Some drivers may already do this masking internally, but
doing it here doesn't hurt and insures consistency.)
- In vlan_ioctl(), don't let the user set a VLAN ID value with anything
besides the VLID bits set, otherwise we will have trouble matching
an interface in vlan_input() later.
PR: kern/46405
insertion and extraction) has revealed two bugs:
- In vlan_start(), we're supposed to check the underlying interface to
see if it has the IFCAP_VLAN_HWTAGGING cabability set and, if so, set
things up for the VLAN_OUTPUT_TAG() routine. However the code checks
ifp->if_capabilities, which is the vlan pseudo-interface's capabilities
when it should be checking p->if_capabilities, which relates to the
underlying physical interface. Change ifp->if_capabilities to
p->if_capabilities so this works.
- In vlan_input(), we have to extract the 16-bit tag value from the
received frame and use it to figure out which vlan interface gets
the frame. The code that we use to track down the desired vlan
pseudo-interface is:
for (ifv = LIST_FIRST(&ifv_list); ifv != NULL;
ifv = LIST_NEXT(ifv, ifv_list))
if (ifp == ifv->ifv_p && tag == ifv->ifv_tag)
break;
The problem is that 'tag' is not computed consistently. In the case
where the interface supports hardware VLAN tag extraction and calls
VLAN_INPUT_TAG(), we do this:
tag = *(u_int*)(mtag+1);
But in the software emulation case, we do this
tag = EVL_VLANOFTAG(ntohs(evl->evl_tag));
The problem here is the EVL_VLANOFTAG() macro is only ever applied
in this one case. It's never applied to ifv->ifv_tag or anwhere else.
We must be consistent: either it's applied everywhere or nowhere.
To see how this can be a problem, do something like
ifconfig vlan0 vlan 12345 vlandev foo0 and observe the results.
I'm not quite sure what the right thing is to do here. Neither the
vlan(4) nor ifconfig(8) man pages suggest which way to go. For now,
I've removed this use of EVL_VLANOFTAG() so that the tag will match
correctly in all cases. I will not get upset if somebody makes a
compelling argument for using EVL_VLANOFTAG() everywhere instead,
as long as the use is consistent.
on friday 13th and without making a universe). This adds struct and
constant definitions for ATM traffic parameters and re-enables the
build of the midway driver.
Tested by: make universe
function couldn't handle chains of > MCLBYTES, and it had a bug which
caused corruption and panics in certain low mbuf situations.
Additionally, change the failure case so that looutput returns ENOBUFS
rather than attempting to pass on non-defragmented mbuf chains.
Finally, remove the printf which would happen every time the low memory
situation occured. It served no useful purpose other than to clue me
in as to what was causing the panic in question. :)
MFC after: 4 days
ILMI daemons. Factor out common softc fields for all ATM interfaces that
need to be externally visible into an ifatm structure and make the midway
driver using this structure and fill the MIB.
be changed, it is very convenient to be able to toggle SDH/Sonet,
idle/unassigned cells and scrambled mode and to see the carrier
state.
Reviewed by: -arch (if_media.h definitions)
(currently) only consumer (en).
Add a sysctl node hw.atm where the atm drivers will hook on their hardware
sysctl sub-trees.
Make atm_ifattach call if_attach and remove the corresponding call to if_attach
from en. Create atm_ifdetach and use that in en.
While the last change actually changes the interface this is not a problem in
practice because the only other consumer of this API is an older LANAI driver
on the net, that is not ready for current anyway.
Reviewed by: -atm
11a/b/g by adding an optional 3-bit mode field
o correct the spelling of OFDM (was ODFM)
o add an 802.11 subtype option for turbo mode: the phy is clocked at 2x the
normal clock rate; note this can be applied to both OFDM in 11a and OFDM
in 11g mode (and possibly DS11 in 11b for certain phy's)
o add 802.11 CCK aliases for 11b/11g rates--the more common terminology
returning some additional room in the first mbuf in a chain, and
avoiding feature-specific contents in the mbuf header. To do this:
- Modify mbuf_to_label() to extract the tag, returning NULL if not
found.
- Introduce mac_init_mbuf_tag() which does most of the work
mac_init_mbuf() used to do, except on an m_tag rather than an
mbuf.
- Scale back mac_init_mbuf() to perform m_tag allocation and invoke
mac_init_mbuf_tag().
- Replace mac_destroy_mbuf() with mac_destroy_mbuf_tag(), since
m_tag's are now GC'd deep in the m_tag/mbuf code rather than
at a higher level when mbufs are directly free()'d.
- Add mac_copy_mbuf_tag() to support m_copy_pkthdr() and related
notions.
- Generally change all references to mbuf labels so that they use
mbuf_to_label() rather than &mbuf->m_pkthdr.label. This
required no changes in the MAC policies (yay!).
- Tweak mbuf release routines to not call mac_destroy_mbuf(),
tag destruction takes care of it for us now.
- Remove MAC magic from m_copy_pkthdr() and m_move_pkthdr() --
the existing m_tag support does all this for us. Note that
we can no longer just zero the m_tag list on the target mbuf,
rather, we have to delete the chain because m_tag's will
already be hung off freshly allocated mbuf's.
- Tweak m_tag copying routines so that if we're copying a MAC
m_tag, we don't do a binary copy, rather, we initialize the
new storage and do a deep copy of the label.
- Remove use of MAC_FLAG_INITIALIZED in a few bizarre places
having to do with mbuf header copies previously.
- When an mbuf is copied in ip_input(), we no longer need to
explicitly copy the label because it will get handled by the
m_tag code now.
- No longer any weird handling of MAC labels in if_loop.c during
header copies.
- Add MPC_LOADTIME_FLAG_LABELMBUFS flag to Biba, MLS, mac_test.
In mac_test, handle the label==NULL case, since it can be
dynamically loaded.
In order to improve performance with this change, introduce the notion
of "lazy MAC label allocation" -- only allocate m_tag storage for MAC
labels if we're running with a policy that uses MAC labels on mbufs.
Policies declare this intent by setting the MPC_LOADTIME_FLAG_LABELMBUFS
flag in their load-time flags field during declaration. Note: this
opens up the possibility of post-boot policy modules getting back NULL
slot entries even though they have policy invariants of non-NULL slot
entries, as the policy might have been loaded after the mbuf was
allocated, leaving the mbuf without label storage. Policies that cannot
handle this case must be declared as NOTLATE, or must be modified.
- mac_labelmbufs holds the current cumulative status as to whether
any policies require mbuf labeling or not. This is updated whenever
the active policy set changes by the function mac_policy_updateflags().
The function iterates the list and checks whether any have the
flag set. Write access to this variable is protected by the policy
list; read access is currently not protected for performance reasons.
This might change if it causes problems.
- Add MAC_POLICY_LIST_ASSERT_EXCLUSIVE() to permit the flags update
function to assert appropriate locks.
- This makes allocation in mac_init_mbuf() conditional on the flag.
Reviewed by: sam
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
- Support IFF_MONITOR.
- Borrow some consistency for if_input() routines from if_ethersubr.c.
- Correct comments regarding fddi_input() that no longer apply.
frames. A comment in if_atm.h suggests that both macros, that for extracting
the ethertype and that for inserting it, handle their argument in host
byte order. In fact, the inserting macro treated its argument as an opposite
host order short and the calling code feeds it the result of htons(). This
happens to work on i386, but fails on sparc. Make the macro use real host
endianess.
Reviewed by: kjc, atm@
is more robust and prevents the hijacking of /dev/console for the typical
mistake.
Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.
Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
warning which breaks builds.
cc1: warnings being treated as errors
src/sys/net/bridge.c: In function `bdg_forward':
sys/net/bridge.c:931: warning: suggest parentheses around assignment used as truth value
*** Error code 1
IP fast forwarding, SIOCGIFADDR, setting hardware address (not currently
enabled in cm driver), multicasts (experimental)
- add ARC_MAX_DATA, use IF_HANDOFF, remove arc_sprintf() and some unused
variables
- if_simloop logic is made more similar to ethernet
- drop not ours packets early (if we are not in promiscous mode)
Submitted by: mark tinguely (partially)
parent device, if there is a parent configured. Modify the result
returned by the parent to indicate that the only supported media
is the currently configured one.
Reviewed by: brooks
and set the link type for use by libpcap and tcpdump
o move mtx unlock in bpfdetach up; it doesn't need to be held so long
o change printf in bpf_detach to distinguish it from the same one in bpfsetdlt
Note there are locking issues here related to ioctl processing; they
have not been addressed here.
Submitted by: Guy Harris <guy@alum.mit.edu>
Obtained from: NetBSD (w/ locking modifications)
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"
Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.
No functional change by this commit.
state machine to provide station and host ap functionality for drivers.
More work will follow to split out the state machine and protocol
support from the ioctl interfaces to ease portability/sharing with
NetBSD and forthcoming ports to other systems.
Reviewed by: imp
Obtained from: NetBSD (originally)
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.
These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.
Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.
Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.
Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>
revision 1.62. It was checking for M_PREPEND() failing, not for the
case of a NULL mbuf pointer being supplied to the macro. Back out
that revision, and fix the NULL dereference by not calling EH_RESTORE()
in the case where the mbuf pointer is NULL because the firewall
rejected the packet.