packet divert at kernel for IPv6/IPv4 translater daemon
This includes queue related patch submitted by jburkhol@home.com.
Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
the old one: an unnecessary define (KLD_MODULE) has been deleted and
the initialisation of the module is done after domaininit was called
to be sure inet is running.
Some slight changed were made to ip_auth.c and ip_state.c in order
to assure including of sys/systm.h in case we make a kld
Make sure ip_fil does nmot include osreldate in kernel mode
Remove mlfk_ipl.c from here: no sources allowed in these directories!
- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead
of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter
to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter
to ip_reass()
- style(9) fixes
Reviewed by: julian, green
This results in closer behavior to earlier versions, where the fixed
200ms timer actually resulted in a delay anywhere from 1..200ms, with
the average delay being 100ms.
Pointed out by: dg
for IPv6 yet)
With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
to be dangerous. It will better serve us as a port building a KLD,
ala SKIP.
The hooks are staying although it would be better to port and use
the NetBSD pfil interface rather than have custom hooks.
the link are equal to the default aliasing address. Do not zero them!
This will fix the problem with non-working links added with the source
and/or aliasing address equal to the default aliasing address, but the
default aliasing address is set later, after the link has been set up,
like both natd(8) and ppp(8) do (for objective reasons).
Reviewed by: Brian Somers <brian@FreeBSD.org>,
Eivind Eklund <eivind@FreeBSD.org>,
Charles Mott <cmott@srv.net>
have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
`dst_port') work for outgoing packets.
- Make permanent links whose `alias_addr' matches the primary aliasing
address `aliasAddress' work for incoming packets.
- Typo fixes.
Reviewed by: brian, eivind
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.
In the words of originator:
:If an incoming connection is initiated through natd and deny_incoming is
:not set, then a new alias_link structure is created to handle the link.
:If there is nothing listening for the incoming connection, then the kernel
:responds with a RST for the connection. However, this is not processed
:correctly in libalias/alias.c:TcpMonitor{In,Out} and
:libalias/alias_db.c:SetState{In,Out} as it thinks a connection
:has been established and therefore applies a timeout of 86400 seconds
:to the link.
:
:If many of these half-connections are initiated (during, for example, a
:port scan of the host), then many thousands of unnecessary links are
:created and the resident size of natd balloons to 20MB or more.
PR: 13639
Reviewed by: brian
- eliminate the fast/slow timeout lists for TCP and instead use a
callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in
"On Estimating End-to-End Network Path Properties", by Allman and Paxson.
Submitted by: jlemon, wollmann
using syslog(3) (log(9)) for its various purposes! This long-awaited
change also includes such nice things as:
* macros expanding into _two_ comma-delimited arguments!
* snprintf!
* more snprintf!
* linting and criticism by more people than you can shake a stick at!
* a slightly more uniform message style than before!
and last but not least
* no less than 5 rewrites!
Reviewed by: committers
drop any segment arriving at a closed port.
tcp.blackhole=1 - only drop SYN without RST
tcp.blackhole=2 - drop everything without RST
tcp.blackhole=0 - always send RST - default behaviour
This confuses nmap -sF or -sX or -sN quite badly.
sysctl knobs.
With these knobs on, refused connection attempts are dropped
without sending a RST, or Port unreachable in the UDP case.
In the TCP case, sending of RST is inhibited iff the incoming
segment was a SYN.
Docs and rc.conf settings to follow.
- Sort xrefs
- FreeBSD.ORG -> FreeBSD.org
- Be consistent with section names as outlines in mdoc(7)
- Other misc mdoc cleanup.
PR: doc/13144
Submitted by: Alexy M. Zelkin <phantom@cris.net>
with a match probability to achieve non-deterministic behaviour of
the firewall. This can be extremely useful for testing purposes
such as simulating random packet drop without having to use dummynet
(which already does the same thing), and simulating multipath effects
and the associated out-of-order delivery (this time in conjunction
with dummynet).
The overhead on normal rules is just one comparison with 0.
Since it would have been trivial to implement this by just adding
a field to the ip_fw structure, I decided to do it in a
backward-compatible way (i.e. struct ip_fw is unchanged, and as a
consequence you don't need to recompile ipfw if you don't want to
use this feature), since this was also useful for -STABLE.
When, at some point, someone decides to change struct ip_fw, please
add a length field and a version number at the beginning, so userland
apps can keep working even if they are out of sync with the kernel.
respectively logging and dropping ICMP REDIRECT packets.
Note that there is no rate limiting on the log messages, so log_redirect
should be used with caution (preferrably only for debugging purposes).
_or_ you may specify "log logamount number" to set logging specifically
the rule.
In addition, "ipfw resetlog" has been added, which will reset the
logging counters on any/all rule(s). ipfw resetlog does not affect
the packet/byte counters (as ipfw reset does), and is the only "set"
command that can be run at securelevel >= 3.
This should address complaints about not being able to set logging
amounts, not being able to restart logging at a high securelevel,
and not being able to just reset logging without resetting all of the
counters in a rule.
would make a difference. However, my previous diff _did_ change the
behavior in some way (not necessarily break it), so I'm fixing it.
Found by: bde
Submitted by: bde
exit on errors.
If we don't, in_pcbrehash() is called without a preceeding
in_pcbinshash(), causing a crash.
There are apparently several conditions that could cause the crash;
PR misc/12256 is only one of these.
PR: misc/12256
This is the change to struct sockets that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb has been torn down in the
middle of a send/write attempt. This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion. This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system and that it
was the actual cause of the leak and looks like it fixes it. The machine
in question was loosing (from memory) about 150 mbufs per hour under
load and a change similar to this stopped it. (Don't blame Jayanth
for this patch though)
An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened. However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code. There are other things that call pru_send()
directly though.
Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.
cdevsw_add() will print an message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
(default 1) disables PMTUD globally. Although PMTUD can be disabled in
the standard case by locking the MTU on a static route (including the
default route), this method doesn't work in the face of dynamic routing
protocols like gated.
+ add a missing call to dn_rule_delete() when flushing firewall
rules, thus preventing possible panics due to dangling pointers
(this was already done for single rule deletes).
+ improve "usage" output in ipfw(8)
+ add a few checks to ipfw pipe parameters and make it a bit more
tolerant of common mistakes (such as specifying kbit instead of Kbit)
PR: kern/10889
Submitted by: Ruslan Ermilov
routines. The descriptor contains parameters which could be used
within those routines (eg. ip_output() ).
On passing, add IPPROTO_PGM entry to netinet/in.h
+ plug an mbuf leak when dummynet used with bridging
+ make prototype of dummynet_io consistent with usage
+ code cleanup so that now bandwidth regulation is precise to the
bit/s and not to (8*HZ) bit/s as before.
This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.
Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail
still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for
jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/
1:
s/suser/suser_xxx/
2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with
later.
There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
- unifdef -DCOMPAT_IPFW (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on
compiled-in hooks in ip_init(). This means they initialize the same
way both in-kernel and as kld modules. (IPFW initializes now :-)
the IP header (this would not work for bridged packets).
This has been fixed long ago in the 2.2 branch.
Problem noticed by: a few people
Fix suggested by: Remy Nonnenmacher
also rely less on other modules clearing static values, and clear them
in a few cases we missed before.
Submitted by: Matthew Reimer <mreimer@vpop.net>
Move the Olicom token ring driver to the officially sanctionned location of
/sys/contrib. Also fix some brokenness in the generic token ring support.
Be warned that if_dl.h has been changed and SOME programs might
like recompilation.
- Transparent proxying support added.
- PPTP redirecting support added based on patches
contributed by Dru Nelson <dnelson@redwoodsoft.com>.
Submitted by: Charles Mott <cmott@srv.net>
their ttl). This can be used - in combination with the proper ipfw
incantations - to make a firewall or router invisible to traceroute
and other exploration tools.
This behaviour is controlled by a sysctl variable (net.inet.ip.stealth)
and hidden behind a kernel option (IPSTEALTH).
Reviewed by: eivind, bde
This is for various Olicom cards. An IBM driver is following.
This patch also adds support to tcpdump to decode packets on tokenring.
Congratulations to the proud father.. (below)
Submitted by: Larry Lile <lile@stdio.com>
This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
convince myself that nothing will break if we permit IP input while
interface addresses are unconfigured. (At worst, they will hit some
ULP's PCB scan and fail if nobody is listening.) So, remove the restriction
that addresses must be configured before packets can be input. Assume
that any unicast packet we receive while unconfigured is potentially ours.
Divert was not feeding clean data to ifa_ifwithaddr() so it was
giving bad results.
Submitted by: kseel <kseel@utcorp.com>, Ruslan Ermilov <ru@ucb.crimea.ua>
This was missed in the 4.4-Lite2 merge.
Noticed by: Mohan Parthasarathy <Mohan.Parthasarathy@eng.Sun.COM> and
jayanth@loc201.tandem.com (vijayaraghavan_jayanth)
on the tcp-impl mailing list.
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
state.
Note: this requires a recompilation of netstat (but netstat has been
broken since rev 1.52 of ip_mroute.c anyway)
Obtained from: Significantly based on Steve McCanne's
<mccanne@cs.berkeley.edu> work for BSD/OS
arplookup() to try again. This gets rid of at least one user's
"arpresolve: can't allocate llinfo" errors, and arplookup() gives
better error messages to help track down the problem if there really
is a problem with the routing table.
problems reported recently (the rtentry pointer in the dummynet
queue was not initialized in all cases, resulting in spurious
rt_refcnt decreases in the lucky cases, and memory trashing in
other cases.
have all fields in network order, whereas ipfw expects some to be
in host order. This resulted in some incorrect matching, e.g. some
packets being identified as fragments, or bandwidth not being
correctly enforced.
NOTE: this only affects bridge+ipfw, normal ipfw usage was already
correct).
Reported-By: Dave Alden and others.
Add bounds checking to netbios NS packet resolving code. This should
prevent natd from crashing on badly formed netbios packets (as might be
heard when the machine is sitting on a cable modem or certain DSL
networks), and also closes potential security holes that might have
exploited the lack of bounds checking in the previous version of the
code.
If timer calculation results in degenerate value (0), force it to 1
to avoid divide-by-zero panic later on in calls to IGMP_RANDOM_DELAY().
I considered simply adding 1 to the timer calculation, but was unsure
if the calculation was part of the IGMP standard or not so did not want
to mess with it for all cases.
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
option not defined the sysctl int value is set to -1 and read-only.
#ifdef KERNEL's added appropriately to wall off visibility of kernel
routines from user code.
Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option
is specified in kernel config, icmplim defaults to 100 pps. Setting it
to 0 will disable the feature. This feature limits ICMP error responses
for packets sent to bad tcp or udp ports, which does a lot to help the
machine handle network D.O.S. attacks.
The kernel will report packet rates that exceed the limit at a rate of
one kernel printf per second. There is one issue in regards to the
'tail end' of an attack... the kernel will not output the last report
until some unrelated and valid icmp error packet is return at some
point after the attack is over. This is a minor reporting issue only.
when a TCP "stealth" scan is directed at a *BSD box by ensuring the window
is 0 for all RST packets generated through tcp_respond()
Reviewed by: Don Lewis <Don.Lewis@tsc.tdk.com>
Obtained from: Bugtraq (from: Darren Reed <avalon@COOMBS.ANU.EDU.AU>)
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has SO_REUSEPORT
set.
PR: kern/7713
addresses by default.
Add a knob "icmp_bmcastecho" to "rc.network" to allow this
behaviour to be controlled from "rc.conf".
Document the controlling sysctl variable "net.inet.icmp.bmcastecho"
in sysctl(3).
Reviewed by: dg, jkh
Reminded on -hackers by: Steinar Haug <sthaug@nethelp.no>
4.1.4. Experimental Protocol
A system should not implement an experimental protocol unless it
is participating in the experiment and has coordinated its use of
the protocol with the developer of the protocol.
Pointed out by: Steinar Haug <sthaug@nethelp.no>
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
several new features are added:
- support vc/vp shaping
- support pvc shadow interface
code cleanup:
- remove WMAYBE related code. ENI WMAYBE DMA doen't work.
- remove updating if_lastchange for every packet.
- BPF related code is moved to midway.c as it should be.
(bpfwrite should work if atm_pseudohdr and LLC/SNAP are
prepended.)
- BPF link type is changed to DLT_ATM_RFC1483.
BPF now understands only LLC/SNAP!! (because bpf can't
handle variable link header length.)
It is recommended to use LLC/SNAP instead of NULL
encapsulation for various reasons. (BPF, IPv6,
interoperability, etc.)
the code has been used for months in ALTQ and KAME IPv6.
OKed by phk long time ago.