/var/run resides on an NFS filesystem (flock() always returns 0 in
this case, so we falsely assume that ypbind is dead and bail out).
Settle instead for better failure checking when using clnttcp_create()
and clnt_call() to interact with ypbind. We still try to flock()
/var/yp/binding/$DOMAINNAME.2, but if this doesn't work, we drop into
the code that retrieves the binding information from ypbind directly.
If that also fails, then we're toast. On NFS filesystems, this means
we'll be ignoring the binding file for no reason and always talking to
ypbind even though we don't have to, but at least things will work.
(I could just replace the flock(/var/run/ypbind.lock) check with
an RPC call to ypbind's NULLPROC procedure, but if the flock() of
the binding file doesn't pan out we're going to try to talk to
ypbind later anyway. *sigh* Is NFS file locking ever going to work?)
of a successful map retrieval. (This has to do with a previous change
to xdr_ypresp_all_seq() and ypxfr_get_map(); originally, yp_all()
would look for a return value of YP_FALSE to signal success, but now
it should be looking for YP_NOMORE. It should not be passing YP_NOMORE
back up to the caller though.)
Noticed by: <aagero@aage.priv.no>
There is also another small bug here, which is that the call to
xdr_free() that happens immediately after the clnt_call() in yp_all()
clobbers the return status value. I've worked around this for now,
but I think the xdr_free() is actually bogus and should be removed.
I want to check some more before I do that though.
XDR routines auto-generated by rpcgen don't quite match the format of
the original ones even though tey have the same names (that was one of
the things wrong with the old XDR routines).
rpcgen-erated on the fly (just like librpcsvc).
Makefile: Add rule for generating yp_xdr.c and yp.h.
xdryp.c: gut everything except the special ypresp_all XDR function
needed to to handle yp_all() (this one can't be created on
the fly), and xdr_datum(), which isn't used internally by
libc, but which as documented as being there in yp_prot.h,
so what the hell. We now get everything else from yp_xdr.c.
yplib.c: change a few structure member names to match those found in
yp.h instead of those declared in yp_prot.h.
it before before trying to establish a binding. If /var/run/ypbind.lock
doesn't exist, or if it exists and isn't locked, then ypbind isn't
running, which means NIS is either turned off or hosed.
- Have _yp_check() call yp_unbind() after it sucessfully calls yp_bind()
to make sure it frees resources correctly. (I don't think there's really
a memory leak here, but it seems somehow wrong to call yp_bind() without
making a corresponding call to yp_unbind() afterwards.)
This makes the NIS code behave a little better in cases where libc makes
calls to NIS, but it isn't running correctly (i.e. there's no ypbind).
This cleans up some strange libc behavior that manifests itself if
you have the system domain name set, but aren't actually running NIS.
In this event, the getrpcent(3) code could try to call into NIS and
cause several inexplicable "clnttcp_create error: RPC program not
registered" messages to appear. This happens because _yp_check() checks
if the system domain name is set and, if it is, proceeds to call
yp_bind() to attempt to establish a binding. Since there is no
binding file (remember: ypbind isn't running, so /var/yp/binding
will be empty), _yp_dobind() will attempt to contact ypbind to
prod it into binding the domain. And because ypbind isn't running,
the code generates the 'clnttcp_create' error. Ultimately the
_yp_check() fails and the getrpcent(3) code rolls over to the /etc/rpc
file, but the error messages are annoying, and the code should be
smart enough to forgo the binding attempt when NIS is turned off.
on, which is fine, except that _yp_dobind() is called before we check
the cache. The means we can return from the cache check (if we have
a hit) without calling _yp_unbind().
We should do the cache check first and _then_ drop into the section
that binds the server and does the yp_match query.
Strange as it sounds, it should map to YPERR_DOMAIN instead.
The YP_NODOM protocol error code is generally returned by ypserv when you
ask it for data from a domain that it doesn't support. By contrast,
the YPERR_NODOM error code means 'local domain name not set.'
Consequently, this incorrect mapping leads to yperr_string() generating
a very confusing error message. YPERR_DOMAIN says 'couldn't
bind to a server which serves this domain' which is much closer
to the truth.
ypbind.c:
Make fewer assumtions about the state of the dom_alive and dom_broadcasting
flags in roc_received().
If select() fails, use syslog() to report the error rather than perror().
Check that all our malloc()s succeed. Report malloc() failure in
ypbindproc_setdom_2() to callers.
yplib.c:
Use #defined constants in ypbinderr_string() rather than hard-coded values.
- Moved to a more client-driven model. We aggressively attempt to keep
the default domain bound (as before) but we give up on non-default
domains if we lose contact with a server and fail to get a response
after one round of broadcasting. This helps drastically reduce the
amount of network bandwitdh that ypbind consumes: if a client references
the secondary domain at some later point, this will prod ypbind into
establishing a new binding anyway, so continuously broadcasting without
need is pointless.
Note that we still actively seek out a binding for our default domain
even if no client program has queried us yet. I'm not exactly sure if
this matches SunOS's behavior or not, but I decided to do it this way
since we can get into all sorts of trouble if our default domain comes
unbound. Even so, we're still much quieter than we used to be.
- Removed a bunch of no-longer pertinent comments and a couple of
chunks of #ifdef 0'ed code that no longer fit in to the new layout.
- Theo deRaadt must have become frustrated with the callback mechanism
in clnt_broadcast(), because he shamelessly stole the clnt_broadcast()
code right out of the RPC library and hacked it up to suit his needs.
(Comments and all! :)
I can understand why: clnt_broadcast() blocks while awaiting replies.
Changing this behavior requires surgery. However, you can work around
this: fork the broadcast into a child process and relay the results
back to the parent via a pipe. (Careful obervation has shown that the
SunOS ypbind forks children for broadcasting too, though I can only
guess what sort of interprocess communication it uses. pipe() seems to
do the job well enough.)
This may seem like the long way around, but it's not really that
hard to implement, and I'd prefer to use documented RPC library functions
wherever possible. We're careful to limit the number of simultaneous
broadcasters to avoid swamping the system (the current limit is 5).
Each clnt_broadcast() call only sends out a small number of packets
at increasing intervals. We're also careful not to spawn more than one
bradcaster for a given domain.
- Used clntudp_bufcreate() and clnt_call() to implement a ping()
function for directly querying a particular server so that we can
check if it's still alive. This lets me completely remove the old
bradcasting code and use actual RPC library calls instead, at the
cost of more than a few handfulls of torn-out hair. (Make no mistake
folks: I *HATE* RPC.) Currently, the ping interval is one minute.
- Fixed another potential 'nfds too big for select()' bug: use
_rpc_dtablesize() instead of getdtablesize().
- Quieted gcc -Wall a bit.
- Probably a bunch of other stuff that I've forgotten.
ypbind.8:
- Updated man page to reflect modifications.
ypwhich.c:
- Small mind-o fix from last time: decode error results from
ypbind correctly (*groan*)
yplib.c:
- same as above
- Change behavior of _yp_dobind() a little: if we get back a 'Domain
not bound' error for a given domain, retry a few times before giving
up and passing the error back to the caller. We have to sleep for a
few seconds between tries since the 'Domain not bound' error comes
back immediately (by repeatedly looping, we end up pounding on ypbind).
We retry at most 20 times at 5 second intervals. This gives us a full
minute to get a response. This seems to deviate a bit from SunOS
behavior -- it appears to wait forever -- but I don't like the idea
of perpetually hanging inside a library call.
Note that this should fix the problems some people have with bindings
not being established fast enough at boot time; sometimes amd is started
in /etc/rc after ypbind has run but before it gets a binding set up. The
automounter gets annoyed at this and tends to exit. By pausing ther YP
calls until a binding is ready, we avoid this situation.
- Another _yp_dobind() change: if we determine that our binding files
are unlocked or nonexistent, jump directly to code that pokes ypbind
into restablishing the binding. Again, if it fails, we'll time out
eventually and return.
ypbind.c: if a client program asks ypbind for the name of the server
for a particular domain, and there isn't a binding for that domain
available yet, ypbind needs to supply a status value along with its
failure message. Set yprespbody.ypbind_error before returning from
a ypbindproc_domain request.
yplib.c: properly handle the error status messages ypbind now has the
ability to send us. Add a ypbinderr_string() function to decode the
error values.
ypwhich.c: handle ypbind errors correctly: yperr_string() can't handle
ypbind_status messages -- use ypbinderr_string instead.
where one or more of the non-default domains are not yet bound.
If we make a YP request for a domain other than the default domain,
and there is no binding for the new domain yet, _yp_dobind() sees
that the /var/yp/binding/DOMAIN.VERS file for the unbound domain is
not locked (by ypbind) and from this it concludes that the NIS system
is dead, so it gives up.
This behavior has been changed: before giving up in this case, we now
make a second check to see if the binding file for the *default* domain
is also not locked. Only if the default domain binding file is also
unlocked to we now assume that ypbind has bought the farm and bail out.
(Note: this assumes that the user hasn't changed the default domain
while ypbind is running.)
With this change, _do_ypbind() is allowed to proceed into the next
section of code wherein it prods ypbind into establishing a binding
for the new domain. This first call times out after ten seconds,
after which it should retry and succeed. From then on, the binding
for the second domain should be handled normally.
Make sure all arguments to the yp_*() functions are valid before sending
them off to the server. This is somewhat distressing: once again my
FreeBSD box brought down my entire network because of NIS bogosities.
I *think* the poor argument checking in this module is the cause, but
I still haven't been able to reproduce the exact series of events that
lead to the ypserv crashes. For now I've resorted to sticking my FreeBSD
box in a seprate domain. Hopefully a weekend of heavy testing will
uncover the problem.
ypserv to do a yp_match() with an a null or empty key causes much havok.
(Note that this could be construed as a denial of service attack if used
maliciously.)
Submitted by: Sebastian Strollo <seb@erix.ericsson.se>
- In /usr/src/lib/libc/yp/yplib.c, function yp_first when clnt_call
fails with (r != RPC_SUCCESS) ysd->dom_vers should be set to 0! This
ensures that /var/yp/bindings/dom.vers will be read again on retry.
What happens now is that when our server is down and someone tries to
use yp they will continue to try until kingdom come. So:
if(r != RPC_SUCCESS) {
clnt_perror(ysd->dom_client, "yp_first: clnt_call");
ysd->dom_vers = -1;
^^^^ change to 0
goto again;
}