1
0
mirror of https://git.FreeBSD.org/src.git synced 2025-01-06 13:09:50 +00:00
Commit Graph

11897 Commits

Author SHA1 Message Date
Sean Farley
4bc1fa7662 Have the man page catch up with the namespace pollution cleanup that
occurred between 2001-2003.  Thanks to bde for the history lesson[1]
concerning sys/types.h and the many system calls that at one time
(pre-2001) were required by POSIX to include it.

1. http://lists.freebsd.org/pipermail/freebsd-arch/2008-April/008126.html

MFC after:	3 days
2008-04-26 02:33:53 +00:00
Ruslan Ermilov
eff93c8073 Stricter check for integer overflow. 2008-04-24 07:49:00 +00:00
Marcel Moolenaar
236ee032b5 Add support for gpart:
o  Correct for gpart's 1-based index, versus 0-based index used by
   legacy slicers.
o  Parse and understand the xs and xt parameters.
2008-04-24 00:11:15 +00:00
Xin LI
d0aa4fd3ca Avoid various shadowed variables. libthr is now almost WARNS=4 clean except
for some const dequalifiers that needs more careful investigation.

Ok'ed by:	davidxu
2008-04-23 21:06:51 +00:00
Jason Evans
e5bf0d71c9 Implement red-black trees without using parent pointers, and store the
color bit in the least significant bit of the right child pointer, in
order to reduce red-black tree linkage overhead by ~2X as compared to
sys/tree.h.

Use the new red-black tree implementation in malloc, which drops
memory usage by ~0.5 or ~1%, for 32- and 64-bit systems, respectively.
2008-04-23 16:09:18 +00:00
Marcel Moolenaar
727b08eb66 Correct an off-by-1 for GPART. The literal partition type (i.e.
the actual UUID) is prefixed by '!' to distinguish them from
well-known aliases.

MFC after: 3 days
2008-04-23 03:00:26 +00:00
Sean Farley
0b5e889911 Add four utility functions related to struct grp processing modeled in-part
after similar calls related to struct pwd in libutil/pw_util.c:
  - gr_equal()
    Perform a deep comparison of two struct grp's.  It does a thorough, yet
    unoptimized comparison of all the members regardless of order.

  - gr_make()
    Create a string (see group(5)) from a struct grp.

  - gr_dup()
    Duplicate a struct grp.  Returns a value that is a single contiguous
    block of memory.

  - gr_scan()
    Create a struct grp from a string (as produced by gr_make()).

MFC after:	3 weeks
2008-04-23 00:49:13 +00:00
John Baldwin
bc669a8c33 Fix a leak in the recent fixes for file descriptors > SHRT_MAX. In the
case of a file descriptor we can't handle, clear the FILE structure's flags
so it can be reused.

MFC after:	1 week
Reported by:	otto @ OpenBSD
2008-04-22 17:03:32 +00:00
David Xu
fb2641d9b1 Use native rwlock. 2008-04-22 06:44:11 +00:00
Antoine Brodin
88ff5136d1 Document that you must include <sys/param.h> before <sys/cpuset.h>.
Approved by:	rwatson (mentor)
2008-04-20 15:51:56 +00:00
Ruslan Ermilov
5b30d6ca77 Don't forget to free() currency_symbol and asciivalue when multiple
conversion specifiers for them are present.

Submitted by:	Maxim Dounin <mdounin@mdounin.ru>
Obtained from:	NetBSD (partially)
MFC after:	3 days
2008-04-19 07:22:58 +00:00
Ruslan Ermilov
3890416f9c Better strfmon(3) conversion specifiers sanity checking.
There were no checks for left and right precisions at all, and
a check for field width had integer overflow bug.

Reported by:	Maksymilian Arciemowicz
Security:	http://securityreason.com/achievement_securityalert/53
Submitted by:	Maxim Dounin <mdounin@mdounin.ru>
MFC after:	3 days
2008-04-19 07:18:22 +00:00
John Baldwin
1e98f88776 Next stage of stdio cleanup: Retire __sFILEX and merge the fields back into
__sFILE.  This was supposed to be done in 6.0.  Some notes:
- Where possible I restored the various lines to their pre-__sFILEX state.
- Retire INITEXTRA() and just initialize the wchar bits (orientation and
  mbstate) explicitly instead.  The various places that used INITEXTRA
  didn't need the locking fields or _up initialized.  (Some places needed
  _up to exist and not be off the end of a NULL or garbage pointer, but
  they didn't require it to be initialized to a specific value.)
- For now, stdio.h "knows" that pthread_t is a 'struct pthread *' to
  avoid namespace pollution of including all the pthread types in stdio.h.
  Once we remove all the inlines and make __sFILE private it can go back
  to using pthread_t, etc.
- This does not remove any of the inlines currently and does not change
  any of the public ABI of 'FILE'.

MFC after:	1 month
Reviewed by:	peter
2008-04-17 22:17:54 +00:00
Xin LI
6fda52ba75 Implement fdopendir(3) by splitting __opendir2() into two parts, the upper part
deals with the usual __opendir2() calls, and the rest part with an interface
translator to expose fdopendir(3) functionality.  Manual page was obtained from
kib@'s work for *at(2) system calls.
2008-04-16 18:59:36 +00:00
Xin LI
f6386c2536 Style fixes to opendir.c:
- Use /*- for copyright block;
 - ANSIfy.
2008-04-16 18:40:52 +00:00
Ruslan Ermilov
96e5e69a4a Sort MAN and MLINKS. 2008-04-16 14:57:40 +00:00
Ruslan Ermilov
878f6086e3 Connect newly added manpages to the build.
Submitted by:	kib
2008-04-16 14:44:43 +00:00
Konstantin Belousov
a141af6930 Man pages for the openat(2), fexecve(2) and related syscalls.
Reviewed by:	ru
2008-04-16 13:03:12 +00:00
Warner Losh
abe458f391 Doh! Extra mips in the path. Remove these and wait until tomorrow
when I have more brain cells to try again.
2008-04-16 05:11:25 +00:00
Warner Losh
6afe466807 Turns out the machine/asm.h isn't needed here, since SYS.h already
included it.
2008-04-16 05:08:27 +00:00
Warner Losh
69e1fc6e80 FreeBSD/mips libc support. Merged from perforce mips2-jnpr branch. 2008-04-16 05:06:11 +00:00
David Xu
6d9517bc9f _vfork is not in libthr, remove the reference. 2008-04-16 03:19:11 +00:00
Colin Percival
fc2841a92f Fix one-byte buffer overflow: NUL gets written to the buffer, but isn't
counted in the width specification in scanf.

This is not a security problem, since this function is only used to
parse a user's configuration file.

Submitted by:	Joerg Sonnenberger
Obtained from:	dragonflybsd
MFC after:	1 week
2008-04-15 23:29:51 +00:00
David Xu
d61f3de656 Implement POSIX function tcgetsid() which returns session id.
PR: stand/107561
2008-04-15 08:33:32 +00:00
David Xu
fa4b421a7a don't include pthread_np.h, it is not used. 2008-04-14 08:08:40 +00:00
Xin LI
92226c92f3 Use calloc() instaed of zeroing memory ourselves. 2008-04-13 08:05:08 +00:00
David Schultz
77fab5a8eb Unbreak the build for arm and powerpc.
Pointy hat to yours truly.
2008-04-12 14:53:52 +00:00
David Schultz
e058c00c40 Updates for changes in the way printf() handles hex floating point
numbers.
2008-04-12 03:11:56 +00:00
David Schultz
76303a9735 Make several changes to the way printf handles hex floating point (%a):
1. Previously, printing the number 1.0 could produce 0x1p+0, 0x2p-1,
   0x4p-2, or 0x8p-3, depending on what happened to be convenient. This
   meant that printing a value as a double and printing the same value
   as a long double could produce different (but equivalent) results.
   The change is to always make the leading digit a 1, unless the
   number is 0. This solves the aforementioned problem and has
   several other advantages.

2. Use the FPU to do rounding. This is far simpler and more portable
   than manipulating the bits, and it fixes an obsure round-to-even
   bug. It also raises the exceptions now required by IEEE 754R.
   The drawbacks are that it is usually slightly slower, and it makes
   printf less effective as a debugging tool when the FPU is hosed
   (e.g., due to a buggy softfloat implementation).

3. On i386, twiddle the rounding precision so that (2) works properly
   for long doubles.

4. Make several simplifications that are now possible due to (2).

5. Split __hldtoa() into a separate file.

Thanks to remko for access to a sparc64 box for testing.
2008-04-12 03:11:36 +00:00
David Schultz
10a465e525 Fix some bugs that caused sparc64's quad precision sqrt to get
the wrong answer for virtually all inputs.

Thanks to remko for access to a sparc64 box for testing.
2008-04-12 03:10:13 +00:00
David Schultz
a9d5aa6aeb Make the software emulator for long doubles set the FPU exception
flags appropriately. The next step is to make it raise a SIGFPE if
any exceptions are unmasked.

Thanks to remko for access to a sparc64 box for testing.
2008-04-12 03:09:51 +00:00
Xin LI
82e45205c8 Add memrchr(3).
Obtained from:	OpenBSD
2008-04-10 00:12:44 +00:00
Daniel Eischen
fc9299dd1b Move the cpuset functions from FBSD_1.0 to FBSD_1.1. All symbols added
to 8.0 belong in the FBSD_1.1 symbol namespace.
2008-04-07 13:53:51 +00:00
Doug Rabson
472f4537e6 On i386, don't try to do network-type stuff if the device name is'nt pxeN. 2008-04-05 15:03:29 +00:00
Doug Rabson
aea15cbc62 Add some compatibility code so that software which is built to use the new
struct flock with l_sysid member can work properly on an an old kernel which
doesn't support l_sysid.

Sponsored by:	Isilon Systems
2008-04-04 09:43:03 +00:00
Warner Losh
22e5baf782 Minor style(9) nit: move to using ANSI definition of functions. 2008-04-03 20:36:44 +00:00
Ruslan Ermilov
c3ee8ebcbc Fix descriptions of "struct msqid_ds and "struct ipc_perm" to match
harsh reality.
2008-04-03 16:21:43 +00:00
David Schultz
92a1a6e169 Fix some corner cases:
- fma(x, y, z) returns z, not NaN, if z is infinite, x and y are finite,
  x*y overflows, and x*y and z have opposite signs.
- fma(x, y, z) doesn't generate an overflow, underflow, or inexact exception
  if z is NaN or infinite, as per IEEE 754R.
- If the rounding mode is set to FE_DOWNWARD, fma(1.0, 0.0, -0.0) is -0.0,
  not +0.0.
2008-04-03 06:14:51 +00:00
David Xu
caad30a422 put THR_CRITICAL_LEAVE into do .. while statement. 2008-04-03 02:47:35 +00:00
Kevin Lo
6cec2e4b55 style(9) cleanup 2008-04-03 02:41:54 +00:00
David Xu
a6cba9400a add __hidden suffix to _umtx_op_err, this eliminates PLT. 2008-04-03 02:13:51 +00:00
David Xu
7abb97dcd8 Non-portable functions are in pthread_np.h, fix compiling problem. 2008-04-02 11:41:12 +00:00
David Xu
7a30bcf04b Add pthread_setaffinity_np and pthread_getaffinity_np to libc namespace. 2008-04-02 08:53:18 +00:00
David Xu
8b873a2328 Remove unused functions. 2008-04-02 08:33:42 +00:00
David Xu
d6e0eb0a48 Replace function _umtx_op with _umtx_op_err, the later function directly
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.
2008-04-02 07:41:25 +00:00
David Xu
8bf1a48cb3 Replace userland rwlock with a pure kernel based rwlock, the new
implementation does not switch pointers when it resumes waiters.

Asked by: jeff
2008-04-02 04:32:31 +00:00
David Xu
ad4a96ba13 Normally, we are often reading local time rather than setting time zone,
replace mutex with rwlock, this should eliminate lock contention in
most cases.
2008-04-01 06:56:11 +00:00
David Xu
18967c1918 Restore normal pthread_cond_signal path to avoid some obscure races. 2008-04-01 06:23:08 +00:00
David Xu
f5bc4f9930 return EAGAIN early rather than running bunch of code later, micro optimize
static branch prediction.
2008-04-01 00:21:49 +00:00
David Schultz
8087c515ab Remove a (bogus) remnant of debugging this on sparc64. 2008-03-31 13:11:45 +00:00
Konstantin Belousov
ba2983e5b3 Add the libc glue and headers definitions for the *at() syscalls.
Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:14:04 +00:00
Tim Kientzle
4b7d286a5b Include an extra byte for the trailing NUL. <sigh>
Pointy hat: Me
2008-03-31 06:24:39 +00:00
David Xu
5ab512bb8e Rewrite rwlock to user atomic operations to change rwlock state, this
eliminates internal mutex lock contention when most rwlock operations
are read.

Orignal patch provided by: jeff
2008-03-31 02:55:49 +00:00
David Schultz
074fb64d9a Add assembly versions of remquol() and remainderl(). 2008-03-30 21:21:53 +00:00
David Schultz
c7392feecc Hook remquol() and remainderl() up to the build. 2008-03-30 20:48:02 +00:00
David Schultz
a2e5f27559 Implement remainderl() as a wrapper around remquol(). The extra work
remquol() performs to compute the quotient is negligible.
2008-03-30 20:47:42 +00:00
David Schultz
cef56f9d6d Implement remquol() based on remquo(). 2008-03-30 20:47:26 +00:00
David Schultz
511dd36b32 Implement csqrtl(). 2008-03-30 20:07:15 +00:00
David Schultz
84c1c0a1ca Hook hypotl() and cabsl() up to the build. 2008-03-30 20:03:46 +00:00
David Schultz
01a13522ad Document hypotl().
Submitted by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2008-03-30 20:03:29 +00:00
David Schultz
a641fc76eb Alias hypotl() and cabsl() for platforms where long double is the same
as double.
2008-03-30 20:03:06 +00:00
David Schultz
2264157a42 Implement cabsl() in terms of hypotl().
Submitted by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2008-03-30 20:02:03 +00:00
David Schultz
d23166b015 Implement hypotl(). This is bde's conversion of fdlibm hypot(), with minor
fixes for ld128 by me.
2008-03-30 20:01:50 +00:00
Bruce Evans
42ee187c3c Use fabs[f]() instead of bit fiddling for setting absolute values.
This makes little difference in float precision, but in double
precision gives a speedup of about 30% on amd64 (A64 CPU) and i386
(A64).  This depends on fabs[f]() being inline and efficient.  The
bit fiddling (or any use of SET_HIGH_WORD(), which libm does too
much because it was best on old 32-bit machines) always causes
packing overheads and sometimes causes stalls in the packing, since
it operates on only part of a variable in the double precision case.
It apparently did cause stalls in a critical path here.
2008-03-30 18:07:12 +00:00
Bruce Evans
c0c7ddd3a8 Use the expression fabs(x+0.0)-fabs(y+0.0) instead of
fabs(x+0.0)+fabs(y+0.0) when mixing NaNs.  This improves
consistency of the result by making it harder for the compiler to reorder
the operands.  (FP addition is not necessarily commutative because the
order of operands makes a difference on some machines iff the operands are
both NaNs.)
2008-03-30 17:28:27 +00:00
Bruce Evans
f94997c8d7 Fix a missing mask in a hi+lo decomposition. Thus bug made the extra
precision in software useless, so hypotf() had some errors in the 1-2
ulp range unless there is extra precision in hardware (as happens on
i386).
2008-03-30 17:17:42 +00:00
Doug Rabson
ecc03b80f1 Don't call xdrrec_skiprecord in the non-blocking case. If
__xdrrec_getrec has returned TRUE, then we have a complete request in
the buffer - calling xdrrec_skiprecord is not necessary. In particular,
if there is another record already buffered on the stream,
xdrrec_skiprecord will discard both this request and the next
one, causing the call to xdr_callmsg to fail and the stream to be
closed.

Sponsored by:	Isilon Systems
2008-03-30 09:36:17 +00:00
Doug Rabson
7ea7cc4bab Don't assume that there is readable data on the stream after the
fragment header.
2008-03-30 09:35:04 +00:00
Ruslan Ermilov
dbdb679c6f Remove options MK_LIBKSE and DEFAULT_THREAD_LIB now that we no longer
build libkse.  This should fix WITHOUT_LIBTHR builds as a side effect.
2008-03-29 17:44:40 +00:00
David Schultz
a1af0d70da Include math.h for the fmaf() prototype. 2008-03-29 16:38:29 +00:00
David Schultz
ee0730e61e Fix some rather obscene code that has ambiguous if...if...else...
constructs in it.
2008-03-29 16:37:59 +00:00
David Schultz
838200ff96 Document modff() and modfl(). Technically, modff() and modfl()
live in libm, while modf() lives in libc due to historical
mistakes. I'm claiming in the manpage that they all live in libm,
since programmers should not rely on the mistake.
2008-03-29 16:19:35 +00:00
Jeff Roberson
d1317e00b8 - Add a man page for cpuset_getaffinity() and cpuset_setaffinity() and
hook it up to the build.

Reviewed by:	brueffer (skeleton and formatting assistance)
2008-03-29 10:26:29 +00:00
Jeff Roberson
329356f9f2 - Add a man page for cpuset(), cpuset_setid(), and cpuset_getid() and hook
it up to the build.

Reviewed by:	brueffer (skeleton and formatting assistance)
2008-03-29 10:06:30 +00:00
Paul Saab
6e7534b8c8 Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by:	alc, ups
2008-03-28 04:29:27 +00:00
Ruslan Ermilov
cbdcc7cb91 Removed no longer existing CTL_MACHDEP defines.
Inspired by:	phk
2008-03-26 23:02:17 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
Christian Brueffer
662cac9f23 Fix some "in in" typos in comments.
PR:		121490
Submitted by:	Anatoly Borodin <anatoly.borodin@gmail.com>
Approved by:	rwatson (mentor), jkoshy
MFC after:	3 days
2008-03-26 07:32:08 +00:00
Ruslan Ermilov
5a9926445a Compile libthr with warnings.
(Somehow this file sneaked from initial commit.)
2008-03-25 15:33:00 +00:00
Ruslan Ermilov
e03efb02bc Compile libthr with warnings. 2008-03-25 13:28:12 +00:00
Ruslan Ermilov
7e0e78248e Fixed mis-implementation of pthread_mutex_get{spin,yield}loops_np().
Reviewed by:	davidxu
2008-03-25 09:48:10 +00:00
Jeff Roberson
fbb275f59d - Restore kse.h in this directory so other tools don't find it by mistake.
- Restore the ability to debug kse coredumps in 8.0.

Suggested by:	marcel
2008-03-23 09:38:11 +00:00
David Xu
9939a13667 Add POSIX pthread API pthread_getcpuclockid() to get a thread's cpu
time clock id.
2008-03-22 09:59:20 +00:00
David Xu
20b94d8035 Use linker set to collection all target operations. 2008-03-22 05:40:44 +00:00
Kai Wang
7a36fb79f9 Add MLINK for archive_write_close.
Approved by:	jkoshy(mentor), kientzle
2008-03-21 11:10:20 +00:00
David Xu
04a57d2c83 Resolve __error()'s PLT early so that it needs not to be resolved again,
otherwise rwlock is recursivly called when signal happens and the __error
was never resolved before.
2008-03-21 02:31:55 +00:00
Ruslan Ermilov
a1292a02d3 pthread_mutexattr_destroy() was accidentally broken in last revision,
unbreak it.  We should really start compiling this with warnings.
2008-03-20 11:47:08 +00:00
Dag-Erling Smørgrav
5092cf0569 s/wait/delta/ to avoid namespace collision.
MFC after:	2 weeks
2008-03-20 09:55:27 +00:00
David Xu
8c38215f50 Preserve application code's errno in rtld locking code, it attemps to keep
any case safe.
2008-03-20 09:35:44 +00:00
David Xu
48ebe2ebc4 Make pthread_mutexattr_settype to return error number directly and
conformant to POSIX specification.

Bug reported by: modelnine at modelnine dt org
2008-03-20 08:27:14 +00:00
David Xu
c8a4eae56f don't reduce new thread's refcount if current thread can not set cpuset
for it, since the new thread will reduce it by itself.
2008-03-19 09:33:07 +00:00
David Xu
519e8d87bb - Trim trailing spaces.
- Use a different sigmask variable name to avoid confusing.
2008-03-19 08:13:04 +00:00
David Xu
86a06c6000 if passed thread pointer is equal to current thread, pass -1 to kernel
to speed up searching.
2008-03-19 06:38:21 +00:00
Joseph Koshy
b23372cd8e Ensure that the section header table is written out in an order
consistent with the section indices returned to the application by
elf_ndxscn().

Submitted by:		kaiw
2008-03-19 06:06:34 +00:00
Joseph Koshy
df7d1e2023 Clarify that the ELF library only sets the sh_entsize field of a
section header entry if the application is not taking charge of ELF
object layout.

Update (c) years, and bump the manual page's date.

Submitted by:		kaiw
2008-03-19 05:07:49 +00:00
Maksim Yevmenkin
07f8cd18c6 Add mandatory "security description" SDP parameter to the PANU profile
Pointed-out by:	Iain Hibbert < plunky at rya-online dot net >
MFC after:	3 days
2008-03-19 00:06:30 +00:00
Maksim Yevmenkin
13040bc96b Add PSM and Load Factor SDP parameters to the BNEP based profiles
(NAP, GN and PANU). No reason to not to support them.

Separate SDP parameters data structures for the BNEP based profiles.

Generalize Service Availability SDP parameter creation.

Requested by:	Iain Hibbert < plunky at rya-online dot net >
MFC after:	3 days
2008-03-18 18:21:39 +00:00
David Xu
2ea1f90a18 - Copy signal mask out before THR_UNLOCK(), because THR_UNLOCK() may call
_thr_suspend_check() which messes sigmask saved in thread structure.
- Don't suspend a thread has force_exit set.
- In pthread_exit(), if there is a suspension flag set, wake up waiting-
  thread after setting PS_DEAD, this causes waiting-thread to break loop
  in suspend_common().
2008-03-18 02:06:51 +00:00
Antoine Brodin
59e7781613 Don't allocate the constant array "props" on the stack in wctype.
PR:		74743
Submitted by:	knut st. osmundsen
Approved by:	rwatson (mentor)
MFC after:	1 month
2008-03-17 18:22:23 +00:00
David Schultz
18798c64f0 scandir(3) previously used st_size to obtain an initial estimate
of the array length needed to store all the directory entries.
Although BSD has historically guaranteed that st_size is the size
of the directory file, POSIX does not, and more to the point, some
recent filesystems such as ZFS use st_size to mean something else.

The fix is to not stat the directory at all, set the initial
array size to 32 entries, and realloc it in powers of 2 if that
proves insufficient.

PR:	113668
2008-03-16 19:08:53 +00:00
David Xu
a9a11568ff Actually delete SIGCANCEL mask for suspended thread, so the signal will not
be masked when it is resumed.
2008-03-16 03:22:38 +00:00
Tim Kientzle
409e319377 Update a comment: the format bid only runs once per archive; it no
longer runs once per entry.
2008-03-15 11:09:16 +00:00
Tim Kientzle
845aa4ab0a Free up the entry objects allocated during this test. 2008-03-15 11:06:15 +00:00
Tim Kientzle
adfb462fea Release the buffers used for exercising the compress code. 2008-03-15 11:05:49 +00:00
Tim Kientzle
0b315cd9ae Remove the duplicate "archive_format" and "archive_format_name" fields
from the private archive_write structure and fix up all writers to use
the format fields in the base "archive" structure.  This error made it
impossible to query the format after setting up a writer because the
write format was stored in an inaccessible place.
2008-03-15 11:04:45 +00:00
Tim Kientzle
c43d294189 Correct a sign mismatch that only showed up on 64-bit systems.
Pointy hat: me
2008-03-15 11:02:47 +00:00
Tim Kientzle
3010219939 Refactor the mtree code a bit to make the layering clearer: Each
"file" is described by multiple "lines" each possibly containing
multiple "keywords."  Incorporate some additions from Joerg Sonnenberger
to handle linked files and correctly deal with backing files on disk.
2008-03-15 07:10:24 +00:00
Tim Kientzle
d7740aea75 FreeBSD does have fstat().
Correct the nasty typo this uncovers.
2008-03-15 04:20:50 +00:00
Tim Kientzle
eb971f9524 Testability is more important than standards conformance.
Disable the use of PaxHeader.<pid> for the fake pax extension pathname
until I can make the name here settable.  Otherwise, tests that try
to compare output to static pre-generated reference files break.
2008-03-15 03:49:18 +00:00
Tim Kientzle
24f55a5963 Ignore a few more common files. 2008-03-15 02:31:28 +00:00
Tim Kientzle
80334b7d22 Resolve a minor nit in SUS compliance by including the PID in the
fake directory name used for pax extended headers.
2008-03-15 02:30:42 +00:00
Tim Kientzle
cde1a05218 GC a reference to the defunct TESTFILES variable. 2008-03-15 02:22:08 +00:00
Tim Kientzle
60617bf578 A subtle point: "pax interchange format" mandates that all strings
(including pathname, gname, uname) be stored in UTF-8.  This usually
doesn't cause problems on FreeBSD because the "C" locale on FreeBSD
can convert any byte to Unicode/wchar_t and from there to UTF-8.  In
other locales (including the "C" locale on Linux which is really
ASCII), you can get into trouble with pathnames that cannot be
converted to UTF-8.

Libarchive's pax writer truncated pathnames and other strings at the
first nonconvertible character.  (ouch!)  Other archivers have worked
around this by storing unconvertible pathnames as raw binary, a
practice which has been sanctioned by the Austin group.  However,
libarchive's pax reader would segfault reading headers that weren't
proper UTF-8.  (ouch!)  Since bsdtar defaults to pax format, this
affects bsdtar rather heavily.

To correctly support the new "hdrcharset" header that is going into
SUS and to handle conversion failures in general, libarchive's pax reader
and writer have been overhauled fairly extensively.  They used to do
most of the pax header processing using wchar_t (Unicode); they now do
most of it using char so that common logic applies to either UTF-8 or
"binary" strings.

As a bonus, a number of extraneous conversions to/from wchar_t have
been eliminated, which should speed things up just a tad.

Thanks to: Bjoern Jacke for originally reporting this to me
Thanks to: Joerg Sonnenberger for noting a bad typo in my first draft of this
Thanks to: Gunnar Ritter for getting the standard fixed
MFC after: 5 days
2008-03-15 01:43:59 +00:00
Tim Kientzle
3a6aaff135 Ignore some built files. 2008-03-15 00:52:22 +00:00
Tim Kientzle
408a822432 Don't lie. If a string can't be converted to a wide (Unicode) string,
return a NULL instead of an incomplete string.  Expand the test coverage
to verify the correct behavior here.
2008-03-14 23:19:46 +00:00
Tim Kientzle
6c8f54e991 Don't advertise the default block size as a constant; don't
rely on a deprecated value to set the default.  This is also
related to a longer-term goal of setting the default block
size based on format and possibly other factors, which makes
it a bad idea to tie this to a published constant.
2008-03-14 23:09:02 +00:00
Tim Kientzle
8e4bc81237 New public functions archive_entry_copy_link() and archive_entry_copy_link_w()
override the currently set link value, whether that's a hardlink
or a symlink.  Plus documentation update and tests.
2008-03-14 23:00:53 +00:00
Tim Kientzle
1051e364aa Update some comments, comment out argument names to guard against
namespace problems.
2008-03-14 22:47:38 +00:00
Tim Kientzle
871e5c0326 Since "length" computes the length of a string and is used as an
argument to malloc(3), it should be size_t, not int.
2008-03-14 22:44:07 +00:00
Tim Kientzle
d6f37be734 Let archive_entry_clear() accept a NULL pointer and simply do nothing.
In particular, this allows archive_entry_free() to work correctly
for a NULL pointer, which makes it parallel with free(3).
2008-03-14 22:40:36 +00:00
Tim Kientzle
42d1f7b4ba Rework the versioning implementation and test to match the
new interface.  Mark the functions that are going away in
libarchive 3.0.

In particular, archive_version_string() now computes the
string rather than assuming that it will be created by the
build infrastructure.  Eventually, this will allow some
simplification of the build infrastructure.
2008-03-14 22:31:57 +00:00
Tim Kientzle
0349d719b1 Rework the versioning information, hopefully for the last time.
* There are now only two public version identifiers:  "number" is
   a single integer that combines Major/minor/release in a single
   value of the form Mmmmrrr.  This is easy to compare against for
   checking feature support.  "string" is a displayable text string
   of the form "libarchive M.mm.rr".
 * The number is present both as a macro (version of the installed header)
   and a function (version of the shared library).  The string form
   is available only as a function.
 * Retain the older version definitions for now, but mark them all
   as deprecated, to disappear in libarchive 3.0 (whenever that happens).
 * Rework the various deprecation conditionals to use ARCHIVE_VERSION_NUMBER.

An ancillary goal is to reduce the number of @...@ substitutions that
are required.  Someday, I might even be able to avoid build-time
processing of archive.h entirely.
2008-03-14 22:19:50 +00:00
Tim Kientzle
45943bfd93 Add a useful sprintf()-style wrapper around
archive_string_vsprintf().  (Which is built
on top of libarchive's internal resizable string
support.)
2008-03-14 22:00:09 +00:00
Tim Kientzle
7c5b1173a5 Support for writing 'compress' format, thanks to Joerg Sonnenberger. 2008-03-14 20:35:38 +00:00
Tim Kientzle
20347f62e6 A block in a tar file is 512 bytes. Period.
Remove the entirely pointless symbolic constant
and sizeof(unsigned char).  (The constant
here is doubly wrong, since not only does
it obscure a basic format constant, it was
never intended to be a tar-specific value,
so could conceivably be changed at some point
in the future.)
2008-03-14 20:32:20 +00:00
Joseph Koshy
bcbe65a85f - Document Pentium and Pentium MMX events.
- Update (c) years and the manual page's date.
2008-03-14 06:22:03 +00:00
Ruslan Ermilov
53bbf5aa35 Fix bugs in previous revision (missing comma, misspelled syscall name). 2008-03-13 10:33:24 +00:00
Ruslan Ermilov
517d383637 Remove trailing whitespace. 2008-03-13 10:26:17 +00:00
Ruslan Ermilov
c130b9f1af Add missing section number. 2008-03-13 10:25:30 +00:00
David Xu
9fbfd54e8e In file sem_timewait.3, remove reference to SYSV semphore in SEE ALSO
section, sync it with sem_wait.3.
2008-03-13 01:53:28 +00:00
Kai Wang
a739eb8374 Current 'ar' read support in libarchive can only handle a GNU/SVR4
filename table whose size is less than 65536 bytes.

The original intention was to not consume the filename table, so the
client will have a chance to look at it. To achieve that, the library
call decompressor->read_ahead to read(look ahead) but do not call
decompressor->consume to consume the data, thus a limit was raised
since read_ahead call can only look ahead at most BUFFER_SIZE(65536)
bytes at the moment, and you can not "look any further" before you
consume what you already "saw".

This commit will turn GNU/SVR4 filename table into "archive format
data", i.e., filename table will be consumed by libarchive, so the
65536-bytes limit will be gone, but client can no longer have access
to the content of filename table.

'ar' support test suite is changed accordingly. BSD ar(1) is not
affected by this change since it doesn't look at the filename table.

Reported by:	erwin
Discussed with:	jkoshy, kientzle
Reviewed by:	jkoshy, kientzle
Approved by:	jkoshy(mentor), kientzle
2008-03-12 21:10:26 +00:00
Joseph Koshy
484202faab Bring the behaviour of pmc_capabilities() and pmc_width() in line with
documentation: set 'errno' and return -1 in case of an error.

Update (c) years.
2008-03-12 15:51:32 +00:00
Joseph Koshy
ef4ba9be47 Describe return values from pmc_ncpu() and pmc_npmc() better. 2008-03-12 15:48:59 +00:00
Paolo Pisati
ab0fcfd00a -Don't pass down the entire pkt to ProtoAliasIn, ProtoAliasOut, FragmentIn
and FragmentOut.
-Axe the old PacketAlias API: it has been deprecated since 5.x.
2008-03-12 11:58:29 +00:00
Jeff Roberson
7d4cbc3607 - Remove kse syscall symbols and man pages. 2008-03-12 10:12:22 +00:00
Jeff Roberson
1e71e49d12 - Don't inspect the P_SA flag. It's being removed. 2008-03-12 10:00:33 +00:00
Jeff Roberson
34147e4308 - Remove libkse and related support code in libpthread from the build.
Don't remove the files yet.  Kernel support will be removed shortly.
2008-03-12 09:49:39 +00:00
Tim Kientzle
df4691b984 Portability: Eliminate the need for uudecode by incorporating
uudecode into the main test driver and invoking it just-in-time
within the various tests.

Also, incorporate a number of improvements to the main test support
code that have proven useful on other projects where I've used this
framework.
2008-03-12 05:12:23 +00:00
Tim Kientzle
0b4793efb7 Remove some unused fields from the private archive_read structure
(left over from when the unified read/write structure was copied
to form separate read and write structures) and eliminate the
pointless initialization of a couple of the unused fields.
2008-03-12 04:58:32 +00:00
Tim Kientzle
c2247d3995 Tighten up the semantics of acl_next() and xattr_next() when you
hit the end of the ACL or xattr list.

Thanks to: Jeff Johnson for pointing out the obvious typo
2008-03-12 04:47:37 +00:00
Tim Kientzle
826055b6a8 Typo, thanks to: Jeff Johnson.
MFC after: 3 days
2008-03-12 04:26:44 +00:00
David Xu
e54cc1f0d5 Add missing comma. 2008-03-12 02:37:31 +00:00
David Xu
1dd273df59 Add manual for function sem_timedwait().
Reviewed by: ru, deischen
2008-03-12 02:33:17 +00:00
David Xu
150b71918c If a thread is cancelled, it may have already consumed a umtx_wake,
check waiter and semphore counter to see if we may wake up next thread.
2008-03-11 03:26:47 +00:00
Maksim Yevmenkin
e096d1e4ba Add structures to hold SDP parameters for the NAP, GN and PANU profiles.
It should be mentioned that a somewhat similar patch was submitted by
Rako < rako29 at gmail dot com >

MFC after:	1 week
2008-03-11 00:08:40 +00:00
Joseph Koshy
c7f03ab040 Use .Fo/.Fc and .Xo/.Xc to bring the line widths below 79 columns.
Correct a typo [a misplaced comma].

Reviewed by:		ru
2008-03-10 14:45:29 +00:00
Joseph Koshy
80c4d6eba3 Use .Fo/.Fc and .Xo/.Xc to bring the line widths below 79 columns.
Reviewed by:		ru
2008-03-10 14:44:41 +00:00
Robert Watson
4813b6af4b Add reference to kldunloadf system call, which was previously not
mentioned in the kldunload(2) man page.

MFC after:	3 days
Spotted by:	rink
2008-03-10 09:54:13 +00:00
Antoine Brodin
e3ad7f6626 Introduce a new F_DUP2FD command to fcntl(2), for compatibility with
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionnaly equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).

PR:		120233
Submitted by:	Jukka Ukkonen
Approved by:	rwaston (mentor)
MFC after:	1 month
2008-03-08 22:02:21 +00:00
Antoine Brodin
6044f112a6 Merge changes from NetBSD on humanize_number.c, 1.8 -> 1.13
Significant changes:
- rev. 1.11: Use PRId64 instead of a cast to long long and %lld to print
an int64_t.
- rev. 1.12: Fix a bug that humanize_number() produces "1000" where it
should be "1.0G" or "1.0M".  The bug reported by Greg Troxel.

PR:		118461
PR:		102694
Approved by:	rwatson (mentor)
Obtained from:	NetBSD
MFC after:	1 month
2008-03-08 21:55:59 +00:00
Jason Evans
f2ec9c0c86 Remove stale #include <machine/atomic.h>, which as needed by lazy
deallocation.
2008-03-07 16:54:03 +00:00
Robert Watson
cee815cf77 Add __FBSDID() tags.
MFC after:	3 days
2008-03-07 15:25:56 +00:00
David Xu
8a18c0d3c8 Fix a bug when calculating remnant size. 2008-03-06 03:24:03 +00:00
David Xu
697b4b49be Don't report death event to debugger if it is a forced exit. 2008-03-06 02:07:18 +00:00
David Xu
70e79fbb0d Restore code setting new thread's scheduler parameters, I was thinking
that there might be starvations, but because we have already locked the
thread, the cpuset settings will always be done before the new thread
does real-world work.
2008-03-06 01:59:08 +00:00
David Xu
1cb51125aa Increase and decrease in_sigcancel_handler accordingly to avoid possible
error caused by nested SIGCANCEL stack, it is a bit complex.
2008-03-05 07:04:55 +00:00
David Xu
54dff16b26 Use cpuset defined in pthread_attr for newly created thread, for now,
we set scheduling parameters and cpu binding fully in userland, and
because default scheduling policy is SCHED_RR (time-sharing), we set
default sched_inherit to PTHREAD_SCHED_INHERIT, this saves a system
call.
2008-03-05 07:01:20 +00:00
David Xu
07bbb16640 Add more cpu affinity function's symbols. 2008-03-05 06:56:35 +00:00
David Xu
21845eb98d Check actual size of cpuset kernel is using and define underscore version
of API.
2008-03-05 06:55:48 +00:00
David Xu
76a9679f8e If a new thread is created, it inherits current thread's signal masks,
however if current thread is executing cancellation handler, signal
SIGCANCEL may have already been blocked, this is unexpected, unblock the
signal in new thread if this happens.

MFC after: 1 week
2008-03-04 04:28:59 +00:00
David Xu
54c9b47c2b Include cpuset.h, unbreak compiling. 2008-03-04 03:45:11 +00:00
David Xu
a759db946a implement pthread_attr_getaffinity_np and pthread_attr_setaffinity_np. 2008-03-04 03:03:24 +00:00
David Xu
57030e1071 Implement functions pthread_getaffinity_np and pthread_setaffinity_np to
get and set thread's cpu affinity mask.
2008-03-03 09:16:29 +00:00
Joseph Koshy
2a100d2353 - Fix an off-by-one bug in _libelf_insert_section(). [1]
- Update (c) years.

Submitted by:		kaiw [1]
2008-03-03 04:29:25 +00:00
David Schultz
3e13dd37ff 1 << 47 needs to be written 1ULL << 47. 2008-03-02 20:16:55 +00:00
Jeff Roberson
d7f687fc9b Add cpuset, an api for thread to cpu binding and cpu resource grouping
and assignment.
 - Add a reference to a struct cpuset in each thread that is inherited from
   the thread that created it.
 - Release the reference when the thread is destroyed.
 - Add prototypes for syscalls and macros for manipulating cpusets in
   sys/cpuset.h
 - Add syscalls to create, get, and set new numbered cpusets:
   cpuset(), cpuset_{get,set}id()
 - Add syscalls for getting and setting affinity masks for cpusets or
   individual threads: cpuid_{get,set}affinity()
 - Add types for the 'level' and 'which' parameters for the cpuset.  This
   will permit expansion of the api to cover cpu masks for other objects
   identifiable with an id_t integer.  For example, IRQs and Jails may be
   coming soon.
 - The root set 0 contains all valid cpus.  All thread initially belong to
   cpuset 1.  This permits migrating all threads off of certain cpus to
   reserve them for special applications.

Sponsored by:	Nokia
Discussed with:	arch, rwatson, brooks, davidxu, deischen
Reviewed by:	antoine
2008-03-02 07:39:22 +00:00
Joseph Koshy
48f9a2656a Translate the r_info field of ELF relocation records when converting
between 64 and 32 bit variants.

Submitted by:		kaiw
2008-03-02 06:33:10 +00:00
David Schultz
e43c8f6acc Hook up sqrtl() to the build. 2008-03-02 01:48:17 +00:00
David Schultz
c6f56f9f41 MD implementations of sqrtl(). 2008-03-02 01:48:08 +00:00
David Schultz
c6a4447b64 MI implementation of sqrtl(). This is very slow and should
be overridden when hardware sqrt is available.
2008-03-02 01:47:58 +00:00
Philip Paeps
db47316b5c Use the easily-greppable copyright notice template from
src/share/examples/mdoc/POSIX-copyright.

Requested by:	ru
2008-02-29 17:48:25 +00:00
Bruce Evans
a278d99026 Fix and improve some magic numbers for the "medium size" case.
e_rem_pio2.c:
This case goes up to about 2**20pi/2, but the comment about it said that
it goes up to about 2**19pi/2.

It went too far above 2**pi/2, giving a multiplier fn with 21 significant
bits in some cases.  This would be harmful except for a numerical
accident.  It happens that the terms of the approximation to pi/2,
when rounded to 33 bits so that multiplications by 20-bit fn's are
exact, happen to be rounded to 32 bits so multiplications by 21-bit
fn's are exact too, so the bug only complicates the error analysis (we
might lose a bit of accuracy but have bits to spare).

e_rem_pio2f.c:
The bogus comment in e_rem_pio2.c was copied and the code was changed
to be bug-for-bug compatible with it, except the limit was made 90
ulps smaller than necessary.  The approximation to pi/2 was not
modified except for discarding some of it.

The same rough error analysis that justifies the limit of 2**20pi/2
for double precision only justifies a limit of 2**18pi/2 for float
precision.  We depended on exhaustive testing to check the magic numbers
for float precision.  More exaustive testing shows that we can go up
to 2**28pi/2 using a 53+25 bit approximation to pi/2 for float precision,
with a the maximum error for cosf() and sinf() unchanged at 0.5009
ulps despite the maximum error in rem_pio2f being ~0.25 ulps.  Implement
this.
2008-02-28 16:22:36 +00:00
Sean Farley
7f08f0dd77 Replace the use of warnx() with direct output to stderr using _write().
This reduces the size of a statically-linked binary by approximately 100KB
in a trivial "return (0)" test application.  readelf -S was used to verify
that the .text section was reduced and that using strlen() saved a few
more bytes over using sizeof().  Since the section of code is only called
when environ is corrupt (program bug), I went with fewer bytes over fewer
cycles.

I made minor edits to the submitted patch to make the output resemble
warnx().

Submitted by:	kib bz
Approved by:	wes (mentor)
MFC after:	5 days
2008-02-28 04:09:08 +00:00
John Baldwin
fc9ab4f6da Add <limits.h> for SHRT_MAX.
Pointy hat to:	jhb
2008-02-27 21:25:19 +00:00
John Baldwin
c55d7e868a File descriptors are an int, but our stdio FILE object uses a short to hold
them.  Thus, any fd whose value is greater than SHRT_MAX is handled
incorrectly (the short value is sign-extended when converted to an int).
An unpleasant side effect is that if fopen() opens a file and gets a
backing fd that is greater than SHRT_MAX, fclose() will fail and the file
descriptor will be leaked.  Better handle this by fixing fopen(), fdopen(),
and freopen() to fail attempts to use a fd greater than SHRT_MAX with
EMFILE.

At some point in the future we should look at expanding the file descriptor
in FILE to an int, but that is a bit complicated due to ABI issues.

MFC after:	1 week
Discussed on:	arch
Reviewed by:	wollman
2008-02-27 19:02:02 +00:00
Tim Kientzle
e29c664a4c Spelling correction, thanks to Joerg Sonnenberger. 2008-02-27 06:16:41 +00:00
Tim Kientzle
a26e9253f6 Optimize skipping over Zip entries.
Thanks to: Dan Nelson, who sent me the patch
MFC after: 7 days
2008-02-27 06:05:59 +00:00
Garrett Wollman
6ca61b39bb stdio is currently limited to file descriptors not greater than
{SHRT_MAX}, so {STREAM_MAX} should be no greater than that.  (This
does not exactly meet the letter of POSIX but comes reasonably close
to it in spirit.)

MFC after:	14 days
2008-02-27 05:56:57 +00:00
Ruslan Ermilov
a059c409c2 Added the "restrict" type-qualifier to the readlink() prototype. 2008-02-26 20:33:52 +00:00
Tim Kientzle
35f4ae0981 Rename the archive_endian.h functions to avoid name clashes
with NetBSD's sys/endian.h file.

Pointed out by: Joerg Sonnenberger
2008-02-26 07:17:47 +00:00
Bruce Evans
e822ea5b2a Inline __ieee754__rem_pio2f(). On amd64 (A64) and i386 (A64), this
gives an average speedup of about 12 cycles or 17% for
9pi/4 < |x| <= 2**19pi/2 and a smaller speedup for larger x, and a
small speeddown for |x| <= 9pi/4 (only 1-2 cycles average, but that
is 4%).

Inlining this is less likely to bust caches than inlining the float
version since it is much smaller (about 220 bytes text and rodata) and
has many fewer branches.  However, the float version was already large
due to its manual inlining of the branches and also the polynomial
evaluations.
2008-02-25 22:19:17 +00:00
Bruce Evans
c32951b16e Use a temporary array instead of the arg array y[] for calling
__kernel_rem_pio2().  This simplifies analysis of aliasing and thus
results in better code for the usual case where __kernel_rem_pio2()
is not called.  In particular, when __ieee854_rem_pio2[f]() is inlined,
it normally results in y[] being returned in registers.  I couldn't
get this to work using the restrict qualifier.

In float precision, this saves 2-3% in most cases on amd64 and i386
(A64) despite it not being inlined in float precision yet.  In double
precision, this has high variance, with an average gain of 2% for
amd64 and 0.7% for i386 (but a much larger gain for usual cases) and
some losses.
2008-02-25 18:28:58 +00:00
Bruce Evans
70d818a20e Change __ieee754_rem_pio2f() to return double instead of float so that
this function and its callers cosf(), sinf() and tanf() don't waste time
converting values from doubles to floats and back for |x| > 9pi/4.
All these functions were optimized a few years ago to mostly use doubles
internally and across the __kernel*() interfaces but not across the
__ieee754_rem_pio2f() interface.

This saves about 40 cycles in cosf(), sinf() and tanf() for |x| > 9pi/4
on amd64 (A64), and about 20 cycles on i386 (A64) (except for cosf()
and sinf() in the upper range).  40 cycles is about 35% for |x| < 9pi/4
<= 2**19pi/2 and about 5% for |x| > 2**19pi/2.  The saving is much
larger on amd64 than on i386 since the conversions are not easy to
optimize except on i386 where some of them are automatic and others
are optimized invalidly.  amd64 is still about 10% slower in cosf()
and tanf() in the lower range due to conversion overhead.

This also gives a tiny speedup for |x| <= 9pi/4 on amd64 (by simplifying
the code).  It also avoids compiler bugs and/or additional slowness
in the conversions on (not yet supported) machines where double_t !=
double.
2008-02-25 13:33:20 +00:00
Christian Brueffer
636133e3dd Add missing words.
MFC after:	3 days
2008-02-25 13:03:18 +00:00
Bruce Evans
0d1564b6c7 Fix some off-by-1 errors.
e_rem_pio2.c:
Float and double precision didn't work because init_jk[] was 1 too small.
It needs to be 2 larger than you might expect, and 1 larger than it was
for these precisions, since its test for recomputing needs a margin of
47 bits (almost 2 24-bit units).

init_jk[] seems to be barely enough for extended and quad precisions.
This hasn't been completely verified.  Callers now get about 24 bits
of extra precision for float, and about 19 for double, but only about
8 for extended and quad.  8 is not enough for callers that want to
produce extra-precision results, but current callers have rounding
errors of at least 0.8 ulps, so another 1/2**8 ulps of error from the
reduction won't affect them much.

Add a comment about some of the magic for init_jk[].

e_rem_pio2.c:
Double precision worked in practice because of a compensating off-by-1
error here.  Extended precision was asked for, and it executed exactly
the same code as the unbroken double precision.

e_rem_pio2f.c:
Float precision worked in practice because of a compensating off-by-1
error here.  Double precision was asked for, and was almost needed,
since the cosf() and sinf() callers want to produce extra-precision
results, at least internally so that their error is only 0.5009 ulps.
However, the extra precision provided by unbroken float precision is
enough, and the double-precision code has extra overheads, so the
off-by-1 error cost about 5% in efficiency on amd64 and i386.
2008-02-25 11:43:20 +00:00
Rafal Jaworowski
56ae1bed48 Let PowerPC world optionally build with -msoft-float. For FPU-less PowerPC
variations (e500 currently), this provides a gcc-level FPU emulation and is an
alternative approach to the recently introduced kernel-level emulation
(FPU_EMU).

Approved by:	cognet (mentor)
MFp4:		e500
2008-02-24 19:22:53 +00:00
Bruce Evans
60a50c2585 Optimize the 9pi/2 < |x| <= 2**19pi/2 case some more by avoiding an
fabs(), a conditional branch, and sign adjustments of 3 variables for
x < 0 when the branch is taken.  In double precision, even when the
branch is perfectly predicted, this saves about 10 cycles or 10% on
amd64 (A64) and i386 (A64) for the negative half of the range, but
makes little difference for the positive half of the range.  In float
precision, it also saves about 4 cycles for the positive half of the
range on i386, and many more cycles in both halves on amd64 (28 in the
negative half and 11 in the positive half for tanf), but the amd64
times for float precision are anomalously slow so the larger
improvement is only a side effect.

Previous commits arranged for the x < 0 case to be handled simply:
- one part of the rounding method uses the magic number 0x1.8p52
  instead of the usual 0x1.0p52.  The latter is required for large |x|,
  but it doesn't work for negative x and we don't need it for large |x|.
- another part of the rounding method no longer needs to add `half'.
  It would have needed to add -half for negative x.
- removing the "quick check no cancellation" in the double precision
  case removed the need to take the absolute value of the quadrant
  number.

Add my noncopyright in e_rem_pio2.c
2008-02-23 12:53:21 +00:00
Bruce Evans
dbf10e45c4 Avoid using FP-to-integer conversion for !(amd64 || i386) too. Use the
FP-to-FP method to round to an integer on all arches, and convert this
to an int using FP-to-integer conversion iff irint() is not available.
This is cleaner and works well on at least ia64, where it saves 20-30
cycles or about 10% on average for 9Pi/4 < |x| <= 32pi/2 (should be
similar up to 2**19pi/2, but I only tested the smaller range).

After the previous commit to e_rem_pio2.c removed the "quick check no
cancellation" non-optimization, the result of the FP-to-integer
conversion is not needed so early, so using irint() became a much
smaller optimization than when it was committed.

An earlier commit message said that cos, cosf, sin and sinf were equally
fast on amd64 and i386 except for cos and sin on i386.  Actually, cos
and sin on amd64 are equally fast to cosf and sinf on i386 (~88 cycles),
while cosf and sinf on amd64 are not quite equally slow to cos and sin
on i386 (average 115 cycles with more variance).
2008-02-22 18:43:23 +00:00
Bruce Evans
7c1b5e7953 Remove the "quick check no cancellation" optimization for
9pi/2 < |x| < 32pi/2 since it is only a small or negative optimation
and it gets in the way of further optimizations.  It did one more
branch to avoid some integer operations and to use a different
dependency on previous results.  The branches are fairly predictable
so they are usually not a problem, so whether this is a good
optimization depends mainly on the timing for the previous results,
which is very machine-dependent.  On amd64 (A64), this "optimization"
is a pessimization of about 1 cycle or 1%; on ia64, it is an
optimization of about 2 cycles or 1%; on i386 (A64), it is an
optimization of about 5 cycles or 4%; on i386 (Celeron P2) it is an
optimization of about 4 cycles or 3% for cos but a pessimization of
about 5 cycles for sin and 1 cycle for tan.  I think the new i386
(A64) slowness is due to an pipeline stall due to an avoidable
load-store mismatch (so the old timing was better), and the i386
(Celeron) variance is due to its branch predictor not being too good.
2008-02-22 17:26:24 +00:00
Bruce Evans
43590b1517 Optimize the 9pi/2 < |x| <= 2**19pi/2 case on amd64 and i386 by avoiding
the the double to int conversion operation which is very slow on these
arches.  Assume that the current rounding mode is the default of
round-to-nearest and use rounding operations in this mode instead of
faking this mode using the round-towards-zero mode for conversion to
int.  Round the double to an integer as a double first and as an int
second since the double result is needed much earler.

Double rounding isn't a problem since we only need a rough approximation.
We didn't support other current rounding modes and produce much larger
errors than before if called in a non-default mode.

This saves an average about 10 cycles on amd64 (A64) and about 25 on
i386 (A64) for x in the above range.  In some cases the saving is over
25%.  Most cases with |x| < 1000pi now take about 88 cycles for cos
and sin (with certain CFLAGS, etc.), except on i386 where cos and sin
(but not cosf and sinf) are much slower at 111 and 121 cycles respectivly
due to the compiler only optimizing well for float precision.  A64
hardware cos and sin are slower at 105 cycles on i386 and 110 cycles
on amd64.
2008-02-22 15:55:14 +00:00
Bruce Evans
0ddfa46b44 Add an irint() function in inline asm for amd64 and i386. irint() is
the same as lrint() except it returns int instead of long.  Though the
extern lrint() is fairly fast on these arches, it still takes about
12 cycles longer than the inline version, and 12 cycles is a lot in
applications where [li]rint() is used to avoid slow conversions that
are only a couple of times slower.

This is only for internal use.  The libm versions of *rint*() should
also be inline, but that would take would take more header engineering.
Implementing irint() instead of lrint() also avoids a conflict with
the extern declaration of the latter.
2008-02-22 14:11:03 +00:00
Bruce Evans
f839bac29c Optimize the conversion to bits a little (by about 11 cycles or 16%
on i386 (A64), 5 cycles on amd64 (A64), and 3 cycles on ia64).  gcc
tends to generate very bad code for accessing floating point values
as bits except when the integer accesses have the same width as the
floating point values, and direct accesses to bit-fields (as is common
only for long double precision) always gives such accesses.  Use the
expsign access method, which is good for 80-bit long doubles and
hopefully no worse for 128-bit long doubles.  Now the generated code
is less bad.  There is still unnecessary copying of the arg on amd64
and i386 and mysterious extra slowness on amd64.
2008-02-22 11:59:05 +00:00
Bruce Evans
a7aa8cc980 Optimize the fixup for +-0 by using better classification for this case
and by using a table lookup to avoid a branch when this case occurs.
On i386, this saves 1-4 cycles out of about 64 for non-large args.
2008-02-22 10:04:53 +00:00
Bruce Evans
33843eef65 Fix rintl() on signaling NaNs and unsupported formats. 2008-02-22 09:21:14 +00:00
David Schultz
5aa554c7e5 s/rcsid/__FBSDID/ 2008-02-22 02:30:36 +00:00
David Schultz
fab324dfa4 Remove an unused variable. 2008-02-22 02:27:34 +00:00
David Schultz
7cd50f4d94 Eliminate some warnings. 2008-02-22 02:26:51 +00:00
Philip Paeps
a975b4b6f2 Note, as required by our agreement with IEEE/The Open Group, that the message
queue manual pages excerpt the POSIX standard.

Spotted by:	Mindaugas Rasiukevicius <rmind -at- NetBSD.org>
Reviewed by:	imp
MFC after:	1 day
2008-02-21 19:16:57 +00:00
Tim Kientzle
5b7a04161d Sanity-check the block size.
Thanks to: Joerg Sonnenberger
MFC after: 7 days
2008-02-21 03:21:50 +00:00
Bruce Evans
f21d26becb Merge cosmetic changes from e_rem_pio2.c 1.10 (convert to __FBSDID();
fix indentation and return type of __ieee754_rem_pio2()).

Remove unused variables.
2008-02-19 15:42:46 +00:00
Bruce Evans
9e9d3bc9f1 Optimize for 3pi/4 <= |x| <= 9pi/4 in much the same way as for
pi/4 <= |x| <= 3pi/4.  Use the same branch ladder as for float precision.
Remove the optimization for |x| near pi/2 and don't do it near the
multiples of pi/2 in the newly optimized range, since it requires
fairly large code to handle only relativley few cases.  Ifdef out
optimization for |x| <= pi/4 since this case can't occur because it
is done in callers.

On amd64 (A64), for cos() and sin() with uniformly distributed args,
no cache misses, some parallelism in the caller, and good but not great
CC and CFLAGS, etc., this saves about 40 cycles or 38% in the newly
optimized range, or about 27% on average across the range |x| <= 2pi
(~65 cycles for most args, while the A64 hardware fcos and fsin take
~75 cycles for half the args and 125 cycles for the other half).  The
speedup for tan() is much smaller, especially relatively.  The speedup
on i386 (A64) is slightly smaller, especially relatively.  i386 is
still much slower than amd64 here (unlike in the float case where it
is slightly faster).
2008-02-19 15:30:58 +00:00
Bruce Evans
9ce8756044 Rearrange the polynomial evaluation for better parallelism. This
saves an average of about 8 cycles or 5% on A64 (amd64 and i386 --
more in cycles but about the same percentage on i386, and more with
old versions of gcc) with good CFLAGS and some parallelism in the
caller.  As usual, it takes a couple more multiplications so it will
be slower on old machines.

Convert to __FBSDID().
2008-02-19 12:54:14 +00:00
Tim Kientzle
b3fa7a9568 Include O_BINARY in open() calls on platforms that support it. 2008-02-19 06:10:48 +00:00
Tim Kientzle
dc4a55fdfc Another tiny, tiny step towards Windows support. No, I don't plan to
ever commit the Windows support files to FreeBSD CVS.  That would just
be wrong.
2008-02-19 06:06:13 +00:00
Tim Kientzle
54c845efb9 Someday I might forgive the standards bodies for omitting timegm().
Maybe.  In the meantime, my workarounds for trying to coax UTC without
timegm() are getting uglier and uglier.  Apparently, some systems
don't support setenv()/unsetenv(), so you can't set the TZ env var and
hope thereby to coax mktime() into generating UTC.  Without that, I
don't see a really good alternative to just giving up and converting to
localtime with mktime().  (I suppose I should research the Perl library
approach for computing an inverse function to gmtime(); that might
actually be simpler than this growing list of hacks.)
2008-02-19 06:02:01 +00:00
Tim Kientzle
334a6ee707 Simplify file type setting. 2008-02-19 05:54:24 +00:00
Tim Kientzle
4d9cfd1eb7 The test_assert() function that backs my custom assert() macro
now returns a value, which supports such convenient
constructs as:
   if (assert(NULL != foo())) {
   }

Also be careful to setlocale("C") for each new test to
avoid locale pollution.

Also a couple of minor portability enhancements.
2008-02-19 05:52:30 +00:00
Tim Kientzle
5c5430972a Portability: Since the values are fixed and the symbolic names
are only present on some platforms, just use the values directly.
2008-02-19 05:49:02 +00:00
Tim Kientzle
98ef1f2ddb Portability: Include O_BINARY if the local platform defines it. 2008-02-19 05:46:58 +00:00
Tim Kientzle
f167d4f9c3 Correct a compile error when libbz2/zlib are unavailable. 2008-02-19 05:44:59 +00:00
Tim Kientzle
ee10f0feb0 Mark a few additional functions that are/are not available on FreeBSD. 2008-02-19 05:40:28 +00:00
Tim Kientzle
75018fc592 Portability improvements:
* If the platform can't restore char nodes, block nodes, or fifos,
don't try and just return error.
  * Include O_BINARY in most open() calls (define O_BINARY to 0 if the
platform doesn't provide a definition already)
  * Refactor the ownership restore to more cleanly support platforms
that don't have any form of {l,f,}chown() call.
  * Comment a lingering issue with older Unix-like systems that allow
root to hose the filesystem.  I don't (yet) have a good solution for
this, but I expect it will require adding more redundant stat()
calls. <sigh>

MFC after: 14 days
2008-02-19 05:39:35 +00:00
David Schultz
345241c5e0 Document return values better. 2008-02-18 19:02:49 +00:00
David Schultz
71c11dd528 Add tgammaf() as a simple wrapper around tgamma(). 2008-02-18 17:27:11 +00:00
Bruce Evans
be396b71c1 2 long double constants were missing L suffixes. This helped break tanl()
on !(amd64 || i386).  It gave slightly worse than double precision in some
cases.  tanl() now passes tests of 2^24 values on ia64.
2008-02-18 15:39:52 +00:00
Bruce Evans
19a9e1bb1c Fix a typo which broke k_tanl.c on !(amd64 || i386). 2008-02-18 14:09:41 +00:00
Bruce Evans
38662c9698 Inline __ieee754__rem_pio2(). With gcc4-2, this gives an average
optimization of about 10% for cos(x), sin(x) and tan(x) on
|x| < 2**19*pi/2.  We didn't do this before because __ieee754__rem_pio2()
is too large and complicated for gcc-3.3 to inline very well.  We don't
do this for float precision because it interferes with optimization
of the usual (?) case (|x| < 9pi/4) which is manually inlined for float
precision only.

This has some rough edges:
- some static data is duplicated unnecessarily.  There isn't much after
  the recent move of large tables to k_rem_pio2.c, and some static data
  is duplicated to good affect (all the data static const, so that the
  compiler can evaluate expressions like 2*pio2 at compile time and
  generate even more static data for the constant for this).
- extern inline is used (for the same reason as in previous inlining of
  k_cosf.c etc.), but C99 apparently doesn't allow extern inline
  functions with static data, and gcc will eventually warn about this.

Convert to __FBSDID().

Indent __ieee754_rem_pio2()'s declaration consistently (its style was
made inconsistent with fdlibm a while ago, so complete this).

Fix __ieee754_rem_pio2()'s return type to match its prototype.  Someone
changed too many ints to int32_t's when fixing the assumption that all
ints are int32_t's.
2008-02-18 14:02:12 +00:00
Kevin Lo
8f9872ccb3 getopt(3) returns -1, not EOF. 2008-02-18 03:19:25 +00:00
David Schultz
842d1d5c98 Use volatile hacks to make sure exp() generates an underflow
exception when it's supposed to. Previously, gcc -O2 was optimizing
away the statement that generated it.
2008-02-17 21:53:19 +00:00
Jason Evans
1945c7bd47 Fix a race condition in arena_ralloc() for shrinking in-place large
reallocation, when junk filling is enabled.  Junk filling must occur
prior to shrinking, since any deallocated trailing pages are immediately
available for use by other threads.

Reported by:	Mats Palmgren <mats.palmgren@bredband.net>
2008-02-17 18:34:17 +00:00
Jason Evans
196d0d4b59 Remove support for lazy deallocation. Benchmarks across a wide range of
allocation patterns, number of CPUs, and MALLOC_OPTIONS settings indicate
that lazy deallocation has the potential to worsen throughput dramatically.
Performance degradation occurs when multiple threads try to clear the lazy
free cache simultaneously.  Various experiments to avoid this bottleneck
failed to completely solve this problem, while adding yet more complexity.
2008-02-17 17:09:24 +00:00
David Schultz
234b60cd97 Hook up sinl(), cosl(), and tanl() to the build. 2008-02-17 07:33:51 +00:00
David Schultz
8e77cc6431 Add implementations of sinl(), cosl(), and tanl().
Submitted by:	Steve Kargl <sgk@apl.washington.edu>
2008-02-17 07:33:12 +00:00
David Schultz
f869a8c5f3 Documentation for sinl(), cosl(), and tanl(). 2008-02-17 07:32:44 +00:00
David Schultz
61f955827d Add kernel functions for 128-bit long doubles. These could be improved
a bit, but access to a freebsd/sparc64 machine is needed.

Submitted by:	bde and Steve Kargl <sgk@apl.washington.edu> (earlier version)
2008-02-17 07:32:31 +00:00
David Schultz
de336b0c5e Add kernel functions for 80-bit long doubles. Many thanks to Steve and
Bruce for putting lots of effort into these; getting them right isn't
easy, and they went through many iterations.

Submitted by:	Steve Kargl <sgk@apl.washington.edu> with revisions from bde
2008-02-17 07:32:14 +00:00
David Schultz
079299f710 Add more pi for long doubles. Also, avoid storing multiple copies
of the pi/2 array, as it is unlikely to vary, except in Indiana.
2008-02-17 07:31:59 +00:00
Gregory Neil Shapiro
b3a502610b Switch libmilter from select(2) to poll(2) so milters are not limited
by the size of FD_SETSIZE.

PR:		118824
Submitted by:	vsevolod
MFC after:	3 weeks
2008-02-17 05:14:47 +00:00
Xin LI
5435230b4d Allow underscore in domain names while resolving. While having underscore
is a violation of RFC 1034 [STD 13], it is accepted by certain name servers
as well as other popular operating systems' resolver library.

Bugs are mine.

Obtained from:	ume
MFC after:	2 weeks
2008-02-16 00:16:49 +00:00
Antoine Brodin
b08b3ca18f - Make Disk_Names() behave as documented in libdisk(3): return an array
of disk names, where you must free each pointer, as well as the array
by hand. [1]
- Destaticize "disks" in Disk_Names, it has no reasons to be static.

PR:		kern/96077 [1]
PR:		kern/114110 [1]
MFC after:	1 month
Approved by:	rwatson (mentor)
2008-02-15 21:19:15 +00:00
Bruce Evans
63b4a1f80c Sigh, the weak reference for ceill(), floorl() and truncl() was in
unreachable code due to a missing include.  This kept arm and powerpc
broken.

Reported by:	sam, grehan
2008-02-15 07:01:40 +00:00
Bruce Evans
5014f8ded4 Oops, the weak reference for ceill(), floorl() and truncl() was in the
wrong file.  This broke arm and powerpc.

Reported by:	grehan
2008-02-14 15:10:34 +00:00
Bruce Evans
3365b45e5e Use the expression fabs(x+0.0)+fabs(y+0.0) instad of a+b (where a is
|x| or |y| and b is |y| or |x|) when mixing NaN arg(s).

hypot*() had its own foot shooting for mixing NaNs -- it swaps the
args so that |x| in bits is largest, but does this before quieting
signaling NaNs, so on amd64 (where the result of adding NaNs depends
on the order) it gets inconsistent results if setting the quiet bit
makes a difference, just like a similar ia64 and i387 hardware comparison.
The usual fix (see e_powf.c 1.13 for more details) of mixing using
(a+0.0)+-(b+0.0) doesn't work on amd64 if the args are swapped (since
the rder makes a difference with SSE). Fortunately, the original args
are unchanged and don't need to be swapped when we let the hardware
decide the mixing after quieting them, but we need to take their
absolute value.

hypotf() doesn't seem to have any real bugs masked by this non-bug.
On amd64, its maximum error in 2^32 trials on amd64 is now 0.8422 ulps,
and on i386 the maximum error is unchanged and about the same, except
with certain CFLAGS it magically drops to 0.5 (perfect rounding).

Convert to __FBSDID().
2008-02-14 13:44:03 +00:00
Dag-Erling Smørgrav
096ba44775 _pthread_mutex_isowned_np(): use a more reliable method; the current code
will work in simple cases, but may fail in more complicated ones.

Reviewed by:	davidxu
2008-02-14 12:37:58 +00:00
Bruce Evans
b4437c3d32 Fix the hi+lo decomposition for 2/(3ln2). The decomposition needs to
be into 12+24 bits of precision for extra-precision multiplication,
but was into 13+24 bits.  On i386 with -O1 the bug was hidden by
accidental extra precision, but on amd64, in 2^32 trials the bug
caused about 200000 errors of more than 1 ulp, with a maximum error
of about 80 ulps.  Now the maximum error in 2^32 trials on amd64
is 0.8573 ulps.  It is still 0.8316 ulps on i386 with -O1.

The nearby decomposition of 1/ln2 and the decomposition of 2/(3ln2) in
the double precision version seem to be sub-optimal but not broken.
2008-02-14 10:23:51 +00:00
Bruce Evans
011cbae1fe Use the expression (x+0.0)-(y+0.0) instead of x+y when mixing NaN arg(s).
This uses 2 tricks to improve consistency so that more serious problems
aren't hidden in simple regression tests by noise for the NaNs:

- for a signaling NaN, adding 0.0 generates the invalid exception and
  converts to a quiet NaN, and doesn't have too many effects for other
  types of args (it converts -0 to +0 in some rounding modes, but that
  hopefully doesn't change the result after adding the NaN arg).  This
  avoids some inconsistencies on i386 and ia64.  On these arches, the
  result of an operation on 2 NaNs is apparently the largest or the
  smallest of the NaNs as bits (consistently largest or smallest for
  each arch, but the opposite).  I forget which way the comparison
  goes and if the sign bit affects it.  The quiet bit is is handled
  poorly by not always setting it before the comparision or ignoring
  it.  Thus if one of the args was originally a signaling NaN and the
  other was originally a quiet NaN, then the result depends too much
  on whether the signaling NaN has been quieted at this point, which
  in turn depends on optimizations and promotions.  E.g., passing float
  signaling NaNs to double functions must quiet them on conversion;
  on i387, loading a signaling NaN of type float or double (but not
  long double) into a register involves a conversion, so it quiets
  signaling NaNs, so if the addition has 2 register operands than it
  only sees quiet NaNs, but if the addition has a memory operand then
  it sees a signaling NaN iff it is in the memory operand.

- subtraction instead of addition is used to avoid a dubious optimization
  in old versions of gcc.  For SSE operations, mixing of NaNs apparently
  always gives the target operand.  This is not as good as the i387
  and ia64 behaviour.  It doesn't mix NaNs at all, and makes addition
  not quite commutative.  Old versions of gcc sometimes rewrite x+y
  to y+x and thus give different results (in bits) for NaNs.  gcc-3.3.3
  rewrites x+y to y+x for one of pow() and powf() but not the other,
  so starting from float NaN args x and y, powf(x, y) was almost always
  different from pow(x, y).

These tricks won't give consistency of 2-arg float and double functions
with long double ones on amd64, since long double ones use the i387
which has different semantics from SSE.

Convert to __FBSDID().
2008-02-14 09:42:24 +00:00
Bruce Evans
e7c95ee5fe s_ceill.c
s_floorl.c
s_truncl.c
2008-02-13 17:38:16 +00:00
Bruce Evans
74d68da630 On arches where long double is the same as double, alias ceil(), floor()
and trunc() to the corresponding long double functions.  This is not
just an optimization for these arches.  The full long double functions
have a wrong value for `huge', and the arches without full long doubles
depended on it being wrong.
2008-02-13 16:56:52 +00:00
Bruce Evans
6597187205 Fix the C version of ceill(x) for -1 < x <= -0 in all rounding modes.
The result should be -0, but was +0.
2008-02-13 15:22:53 +00:00
Rong-En Fan
7913e26359 - Remove duplicate tputs.3 from MLINK. As we use termcap in the bsae, remove
the one links to curs_terminfo.

Submitted by:	David Naylor <blackdragon at highveldmail.co.za>
MFC after:	3 days
2008-02-13 14:34:39 +00:00
Bruce Evans
f01bfe5c6d Fix exp2*(x) on signaling NaNs by returning x+x as usual.
This has the side effect of confusing gcc-4.2.1's optimizer into more
often doing the right thing.  When it does the wrong thing here, it
seems to be mainly making too many copies of x with dependency chains.
This effect is tiny on amd64, but in some cases on i386 it is enormous.
E.g., on i386 (A64) with -O1, the current version of exp2() should
take about 50 cycles, but took 83 cycles before this change and 66
cycles after this change.  exp2f() with -O1 only speeded up from 51
to 47 cycles.  (exp2f() should take about 40 cycles, on an Athlon in
either i386 or amd64 mode, and now takes 42 on amd64).  exp2l() with
-O1 slowed down from 155 cycles to 123 for some args; this is unimportant
since the i386 exp2l() is a fake; the wrong thing for it seems to
involve branch misprediction.
2008-02-13 10:44:44 +00:00
Bruce Evans
828f7b4a82 Rearrange the polynomial evaluation for better parallelism. This is
faster on all machines tested (old Celeron (P2), A64 (amd64 and i386)
and ia64) except on ia64 when compiled with -O1.  It takes 2 more
multiplications, so it will be slower on old machines.  The speedup
is about 8 cycles = 17% on A64 (amd64 and i386) with best CFLAGS
and some parallelism in the caller.

Move the evaluation of 2**k up a bit so that it doesn't compete too
much with the new polynomial evaluation.  Unlike the previous
optimization, this rearrangement cannot change the result, so compilers
and CPU schedulers can do it, but they don't do it quite right yet.
This saves a whole 1 or 2 cycles on A64.
2008-02-13 08:36:13 +00:00
Bruce Evans
02ef796d23 Use hardware remainder on amd64 since it is 5 to 10 times faster than
software remainder and is already used for remquo().
2008-02-13 06:01:48 +00:00
David E. O'Brien
a2c4cd4549 style.Makefile(5) 2008-02-13 05:25:43 +00:00
David E. O'Brien
6cc9986927 style(9) 2008-02-13 05:12:05 +00:00
Ruslan Ermilov
5f56182b6f Change readlink(2)'s return type and type of the last argument
to match POSIX.

Prodded by:	Alexey Lyashkov
2008-02-12 20:09:04 +00:00
Bruce Evans
a2ddfa5ea7 Fix remainder() and remainderf() in round-towards-minus-infinity mode
when the result is +-0.  IEEE754 requires (in all rounding modes) that
if the result is +-0 then its sign is the same as that of the first
arg, but in round-towards-minus-infinity mode an uncorrected implementation
detail always reversed the sign.  (The detail is that x-x with x's
sign positive gives -0 in this mode only, but the algorithm assumed
that x-x always has positive sign for such x.)

remquo() and remquof() seem to need the same fix, but I cannot test them
yet.

Use long doubles when mixing NaN args.  This trick improves consistency
of results on at least amd64, so that more serious problems like the
above aren't hidden in simple regression tests by noise for the NaNs.
On amd64, hardware remainder should be used since it is about 10 times
faster than software remainder and is already used for remquo(), but
it involves using the i387 even for floats and doubles, and the i387
does NaN mixing which is better than but inconsistent with SSE NaN mixing.
Software remainder() would probably have been inconsistent with
software remainderl() for the same reason if the latter existed.

Signaling NaNs cause further inconsistencies on at least ia64 and i386.

Use __FBSDID().
2008-02-12 17:11:36 +00:00
Rong-En Fan
67c700c106 - Update build glues for ncurses 5.6 snapshot 20080209
- While I'm here, sort macro defines in ncurses_cfg.h
2008-02-11 13:39:36 +00:00
Remko Lodder
8e167da222 After issueing a ntpdate [1] I noticed it's already 2008, reflect that
in the last modified date.

Noticed by:	brueffer [1]
2008-02-11 07:43:23 +00:00
Remko Lodder
08a155ad22 Fix typo (s/existance/existence/)
Noticed by:	ceri
2008-02-11 07:15:52 +00:00