This uses the kdump(1) utrace support code directly until a common library
is created.
This allows malloc(3) tracing with MALLOC_CONF=utrace:true and rtld tracing
with LD_UTRACE=1. Unknown utrace(2) data is just printed as hex.
PR: 43819 [inspired by]
Reviewed by: jhb
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D3819
The functional_test.sh harness for each test subdir was inspired
by the version in bin/sh/tests/functional_test.sh
Some gymnastics were required to deal with implicit rules for
.c / .o -> .out as the suffix transformation rules were
incorrectly trying to create the test outputs from some of the
source files
Sponsored by: EMC / Isilon Storage Division
The latter is already defined in bsd.libnames.mk, so avoid the conflict
in case someone copy-pastes make variables
While here, switch path to the top of the source tree with SRCTOP
and move from the pattern of:
.if ${MK_FOO} != "no"
SUBDIR+= bar
.endif
to
SUBDIR.${MK_FOO}+= bar
since we know that MK_FOO is always either yes or no and the latter
form is easier to follow and much shorter. Various exceptions to this
pattern are dealt with on an ad-hoc basis.
Discussed on arch@ a while ago.
This is done by changing get_syscall() to either lookup the known syscall
or add it into the list with the default handlers for printing.
This also simplifies some code to not have to check if the syscall variable
is set or NULL.
Reviewed by: jhb
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D3792
This change adds the bits that are necessary to fetch system call
arguments and return values from trapframes for CloudABI. This allows us
to properly print system calls with the right name. We need to make sure
that we properly convert error numbers when system calls fail.
We still need to improve truss to pretty-print some of the system calls
that have flags.
address's length (and then overriding it if it "looks wrong"), use the
next argument to the system call to determine the length. This is more
reliable since this is what the kernel depends on anyway and is also
simpler.
integer. Fix the argument decoding to treat this as a quad instead of an
int. This includes using QUAD_ALIGN and QUAD_SLOTS as necessary. To
continue printing IDs in decimal, add a new QuadHex argument type that
prints a 64-bit integer in hex, use QuadHex for the existing off_t arguments,
repurpose Quad to print a 64-bit integer in decimal, and use Quad for id_t
arguments.
This fixes the decoding of wait6(2) and procctl(2) on 32-bit platforms.
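The split between the two argument types can be sketched as follows. This is
only an illustration: the enum and function names are hypothetical and are not
the identifiers used in truss, and treating the decimal case as signed is an
assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical argument kinds, named after the types in the text above. */
enum arg_kind { ARG_QUAD, ARG_QUADHEX };

/*
 * Format a 64-bit argument: decimal for Quad (e.g. an id_t), hex for
 * QuadHex (e.g. an off_t).  Returns what snprintf() returns.
 */
int
format_quad(char *buf, size_t len, enum arg_kind kind, uint64_t value)
{
	assert(buf != NULL && len > 0);
	if (kind == ARG_QUADHEX)
		return (snprintf(buf, len, "0x%" PRIx64, value));
	/* Assumption: IDs are shown as signed decimal values. */
	return (snprintf(buf, len, "%" PRId64, (int64_t)value));
}
```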
probably fallout from the removal of the extra padding argument before
off_t in 7. However, that padding still exists for 32-bit powerpc, so
use QUAD_ALIGN.
- Fix QUAD_ALIGN to be zero for powerpc64. It should only be set to 1
for 32-bit platforms that add padding to align 64-bit arguments.
- Refactor the interface between the ABI-independent code and the
ABI-specific backends. The backends now provide smaller hooks to
fetch system call arguments and return values. The rest of the
system call entry and exit handling that was previously duplicated
among all the backends has been moved to one place.
- Merge the loop when waiting for an event with the loop for handling stops.
This also means not emulating a procfs-like interface on top of ptrace().
Instead, use a single event loop that fetches process events via waitid().
Among other things this allows us to report the full 32-bit exit value.
- Use PT_FOLLOW_FORK to follow new child processes instead of forking a new
truss process for each new child. This allows one truss process to monitor
a tree of processes and truss -c should now display one total for the
entire tree instead of separate summaries per process.
- Use the recently added fields to ptrace_lwpinfo to determine the current
system call number and argument count. The latter is especially useful
and fixes a regression since the conversion from procfs. truss now
generally prints the correct number of arguments for most system calls
rather than printing extra arguments for any call not listed in the
table in syscalls.c.
- Actually check the new ABI when processes call exec. The comments claimed
that this happened but it was not being done (perhaps this was another
regression in the conversion to ptrace()). If the new ABI after exec
is not supported, truss detaches from the process. If truss does not
support the ABI for a newly executed process the process is killed
before it returns from exec.
- Along with the refactor, teach the various ABI-specific backends to
fetch both return values, not just the first. Use this to properly
report the full 64-bit return value from lseek(). In addition, the
handler for "pipe" now pulls the pair of descriptors out of the
return values (which is the true kernel system call interface) but
displays them as an argument (which matches the interface exported by
libc).
- Each ABI handler adds entries to a linker set rather than requiring
a statically defined table of handlers in main.c.
- The arm and mips system call fetching code was changed to follow the
same pattern as amd64 (and the in-kernel handler) of fetching register
arguments first and then reading any remaining arguments from the
stack. This should fix indirect system call arguments on at least
arm.
- The mipsn32 and n64 ABIs will now look for arguments in A4 through A7.
- Use register %ebp for the 6th system call argument for Linux/i386 ABIs
to match the in-kernel argument fetch code.
- For powerpc binaries on a powerpc64 system, fetch the extra arguments
on the stack as 32-bit values that are then copied into the 64-bit
argument array instead of reading the 32-bit values directly into the
64-bit array.
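As an illustration of why both return values matter for lseek() on a 32-bit
ABI, the sketch below joins two 32-bit return registers into one 64-bit
offset. Which register holds the low half depends on the ABI; the ordering
here (first value low, second value high) is an assumption, and the function
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combine two 32-bit return values into a 64-bit result.
 * Assumption: retval[0] is the low half and retval[1] is the high half.
 */
int64_t
combine_retvals(const uint32_t retval[2])
{
	assert(retval != NULL);
	return ((int64_t)(((uint64_t)retval[1] << 32) | retval[0]));
}
```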
Reviewed by: kib (earlier version)
Tested on: amd64 (FreeBSD/amd64 & i386), i386, arm (earlier version)
Tested on: powerpc64 (FreeBSD/powerpc64 & powerpc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D3575
These are only handled as 'build-tools' in Makefile.inc1. This causes
'make clean' from the top of the tree to not clean the directories. It also
effectively has kept them disconnected and risks them bitrotting. The
buildworld process never cleans them either.
Connect them so they will always be built, cleaned, etc, but never installed.
Discussed with: imp (briefly)
Sponsored by: EMC / Isilon Storage Division
routepr() (-r flag). It is too narrow to show an IPv6 prefix
in most cases.
- Accept "local" as a synonym of "unix" in protocol family name.
- Show a prefix length in CIDR notation when name resolution failed in
netname().
- Make routename() and netname() AF-independent and remove
unnecessary typecasting from struct sockaddr.
- Use getnameinfo(3) to format L2 addr in intpr().
- Fix a bug which showed "Address" when the -A flag is specified in pr_rthdr().
- Replace cryptic GETSA() macro with SA_SIZE().
- Fix declarations shadowing local variables with the same names.
- Add more static, remove unused header files and variables.
MFC after: 1 week
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.
This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.
Reviewed by: rwatson, wblock
Differential Revision: https://reviews.freebsd.org/D3411
Feeding any file encoded in an 8-bit locale such as KOI8-RU
to the sort utility running under a UTF-8 locale produces the astonishing
result of recoding the output to UTF-8. To counter that, just
run sort under the 'C' locale for now.
Sort the output obtained from xargs and the expected output
to ensure the end result versus the input file is stable
Differential Revision: D3432
Submitted by: Nikolai Lifanov <lifanov@mail.lifanov.com>
arrays generically rather than duplicating a hack in all of the backends.
- Add two new system call argument types and use them instead of StringArray
for the argument and environment arguments execve and linux_execve.
- Honor the -a/-e flags in the handling of these new types.
- Instead of printing "<missing argument>" when the decoding is disabled,
print the raw pointer value.
Previously, truss would fetch 100 string pointers and happily walk off the end
of the array if it never found a NULL. This also means for a short argv
list it could fail entirely if the 100 string pointers spanned into an
unmapped page.
Instead, fetch page-aligned blocks of string pointers in a loop fetching
each string until a NULL is found.
While here, make use of the open memstream file descriptor instead of
allocating a temporary array. This allows us to fetch each string once
instead of twice.
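A minimal sketch of the block-at-a-time pointer walk follows. It is not the
truss code: fake_mem and fake_read() are hypothetical stand-ins for reading
the traced process, the page size is assumed to be 4096, and the sketch only
counts the pointers rather than fetching each string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE 4096u	/* assumed page size for this sketch */

/* Stand-in for the traced process: a buffer mapped at a fake address. */
struct fake_mem {
	uintptr_t base;
	const unsigned char *data;
	size_t len;
};

/* Copy len bytes at addr; fail with -1 if the range is not fully mapped. */
int
fake_read(const struct fake_mem *m, uintptr_t addr, void *buf, size_t len)
{
	if (addr < m->base || len > m->len || addr - m->base > m->len - len)
		return (-1);
	memcpy(buf, m->data + (addr - m->base), len);
	return (0);
}

/*
 * Count the entries of a NULL-terminated pointer array at addr, reading
 * one block at a time and never reading across a page boundary.
 * Returns -1 if a read fails before the terminator is seen.
 */
long
count_pointers(const struct fake_mem *m, uintptr_t addr)
{
	uintptr_t block[SKETCH_PAGE / sizeof(uintptr_t)];
	long count = 0;

	assert(addr % sizeof(uintptr_t) == 0);
	for (;;) {
		/* Bytes left in the current page, starting at addr. */
		size_t bytes = SKETCH_PAGE - (size_t)(addr % SKETCH_PAGE);
		size_t n = bytes / sizeof(uintptr_t);

		if (fake_read(m, addr, block, bytes) != 0)
			return (-1);
		for (size_t i = 0; i < n; i++) {
			if (block[i] == 0)
				return (count);
			count++;
		}
		addr += bytes;	/* now page-aligned */
	}
}
```

With a short array that ends right at the end of a mapped page, the walk
stops at the terminator without touching the following page, which is the
case the fixed-size fetch could not handle.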
- Print the ident value as decimal instead of hexadecimal for filter types
that use "small" values such as file descriptors and PIDs.
- Decode NOTE_* flags in the fflags field of kevents for several system
filter types.
with open_memstream() to build the string for each argument. This allows
for more complicated argument building without resorting to intermediate
malloc's, etc.
Related, the strsig*() functions no longer return allocated strings but
use a static global buffer instead.
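For readers unfamiliar with open_memstream(3), a minimal sketch of building
an argument string through a stdio stream is shown here. The function name
and the output format are hypothetical and are not taken from truss.

```c
#define _POSIX_C_SOURCE 200809L	/* for open_memstream() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a "{ 1, 2, 3 }" style string for an int array using a memory
 * stream, so no intermediate buffers have to be sized by hand.
 * The caller frees the returned string.
 */
char *
format_int_list(const int *vals, size_t n)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *fp;

	assert(vals != NULL || n == 0);
	fp = open_memstream(&buf, &len);
	if (fp == NULL)
		return (NULL);
	fputs("{", fp);
	for (size_t i = 0; i < n; i++)
		fprintf(fp, "%s %d", i == 0 ? "" : ",", vals[i]);
	fputs(" }", fp);
	/* fclose() flushes the stream and finalizes buf and len. */
	if (fclose(fp) != 0) {
		free(buf);
		return (NULL);
	}
	return (buf);
}
```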
The many Christian denominations have different dates for their
celebrations, and controversies are likely to always exist.
These are well established and happen to be holidays in many
Catholic countries.
MFC after: 1 month
- Don't exit if get_struct() fails, instead print the raw pointer value to
match all other argument decoding cases.
- Use an xlat table instead of a home-rolled switch for the operation name.
- Display the nested socketcall args structure as a structure instead of as
two inline arguments.
sigqueue, sigreturn, sigsuspend, sigtimedwait, sigwait, sigwaitinfo, and
thr_kill.
- Print signal sets as a structure (with {}'s) and in particular use this to
differentiate empty sets from a NULL pointer.
- Decode arguments for some other system calls: issetugid, pipe2, sysarch
(operations are only decoded for amd64 and i386), and thr_self.
Its idea was to be a simple initiator and execute several commands from
kernel level, but FreeBSD never had a consumer for that functionality,
while its implementation polluted many unrelated places.
As the name indicates, these are flags to pass to nm(1). The newer
binutils have a plugin mechanism so, to build something with LLVM's
LTO, we need to pass flags to nm(1). This commit also extends
lorder(1) to pass NMFLAGS to nm(1).
The option was added only to ease the transition from GNU Binutils to
ELF Tool Chain tools, and that process is now complete (for the viable
replacements). Noting the removal in UPDATING is sufficient as we have
not shipped a release with the option.
Reviewed by: brooks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3240
fixes and quality of life improvements.
While there are security issues in this time frame that affect usage as a
server (eg: linked into apache), this isn't possible here.
in units(1). The most visible is the removal of libedit warnings
about being unable to open termcap database.
Reviewed by: eadler@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3322
The localedef tool can read entire (and unmodified) CLDR posix definition
files, and generate all 6 LC categories: LC_COLLATE, LC_CTYPE, LC_TIME,
LC_NUMERIC, LC_MONETARY and LC_MESSAGES.
This tool has a long history with Solaris. The Nexenta developers
modified it to read CLDR files and created the much richer collation
formats. The libc collation functions have to be modified to read the
new format (called "BSD-1.0") and to handle the new data structures.
The result will be that locale-sensitive tools and functions will now
properly sort multibyte and unicode strings.
Obtained from: DragonFly
is taken to match the geometry and only when the geometry is max'd
out, is the actual recorded size taken.
Note that qemu has the same logic for the fixed VHD format. However
that is known to conflict with Microsoft Azure, where the recorded
size of the image is what counts.
Pointed out by: gjb@
especially useful now that libc's open() always calls openat(). While here,
fix a few other things:
- Decode the mode argument passed to access(), eaccess(), and faccessat().
- Decode the atfd parameter to pretty-print AT_FDCWD.
- Decode the special AT_* flags used with some of the *at() system calls.
- Decode arguments for fchmod(), lchmod(), fchown(), lchown(), eaccess(),
and futimens().
- Decode both of the timeval structures passed to futimes() instead of just
the first one.
tightening sanity check of the input. [1]
While I'm here, also replace ed(1) with red(1) because we do
not need the unrestricted functionality. [2]
Obtained from: Bitrig [1], DragonFly [2]
Security: CVE-2015-1418 [1]
If the resulting argument is longer than MAXPATHLEN, realloc() was called to
extend the space, but the new pointer was not correctly stored.
Different from what OpenBSD has done, rewrite brace_subst() to calculate the
necessary space first and realloc() at most once.
As before, the e_len fields are not updated in case of a realloc.
Therefore, a following long argument will do another realloc.
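The size-first approach can be sketched as below. This is not the actual
brace_subst() code: the function name and signature are hypothetical, and
unlike the behavior described above, the sketch does record the new length
after growing the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace every "{}" in tmpl with repl, writing into *bufp.
 * The required size is computed first so realloc() runs at most once,
 * and the pointer it returns is stored back through bufp.
 * Returns 0 on success, -1 if the allocation fails.
 */
int
subst_braces(char **bufp, size_t *lenp, const char *tmpl, const char *repl)
{
	size_t rlen, need = 1;	/* 1 for the terminating NUL */
	const char *p;
	char *out;

	assert(bufp != NULL && lenp != NULL && tmpl != NULL && repl != NULL);
	rlen = strlen(repl);

	/* Pass 1: compute the exact size of the result. */
	for (p = tmpl; *p != '\0'; ) {
		if (p[0] == '{' && p[1] == '}') {
			need += rlen;
			p += 2;
		} else {
			need++;
			p++;
		}
	}

	/* Grow once if needed and keep the new pointer. */
	if (need > *lenp) {
		char *n = realloc(*bufp, need);

		if (n == NULL)
			return (-1);
		*bufp = n;
		*lenp = need;
	}

	/* Pass 2: copy, substituting as we go. */
	out = *bufp;
	for (p = tmpl; *p != '\0'; ) {
		if (p[0] == '{' && p[1] == '}') {
			memcpy(out, repl, rlen);
			out += rlen;
			p += 2;
		} else {
			*out++ = *p++;
		}
	}
	*out = '\0';
	return (0);
}
```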
PR: 201750
MFC after: 1 week
length. In particular, instead of blindly fetching 1k blocks, do an initial
fetch up to the end of the current page followed by page-sized fetches up to
the maximum size. Previously if the 1k buffer crossed a page boundary and
the second page was not valid, the entire operation would fail.
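The chunk sizing described above can be sketched with a small helper. The
name and parameters are hypothetical; the page size and the remaining byte
budget are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many bytes to fetch next, starting at addr: up to the end of
 * the current page, capped by the remaining budget.  Once addr is
 * page-aligned this yields whole pages, so no fetch spans two pages.
 */
size_t
next_chunk_len(uintptr_t addr, size_t pagesize, size_t remaining)
{
	size_t to_page_end;

	assert(pagesize != 0);
	to_page_end = pagesize - (size_t)(addr % pagesize);
	return (to_page_end < remaining ? to_page_end : remaining);
}
```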
Revert r286102 and apply a cleaner fix.
Tested for overflows by FORTIFY_SOURCE GSoC (with clang).
Suggested by: bde
Reviewed by: Oliver Pinter
Tested by: Oliver Pinter
MFC after: 3 days
ELF Tool Chain elfcopy is nearly a drop-in replacement for GNU objcopy,
but does not currently support PE output which is needed for building
x86 UEFI bits.
Add a src.conf knob to allow installing it as objcopy and set it by
default for aarch64 only, where we don't have a native binutils.
Reviewed by: bapt
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2887
Ar cannot handle UIDs with more than 6 digits, and storing the mtime,
uid, gid and mode provides little to negative value anyhow for ar's
uses. Turn on deterministic (-D) mode by default; it can be disabled by
the user with -U.
PR: 196929
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3190
making clear transitions between different states, and avoids a bug with
handling repeated $'s.
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D3221
Rationale: ident(1) is useful outside of RCS; a lot of scripts use ident(1) and
fail when base is built WITHOUT_RCS.
This version is:
- fully compatible with RCS 5.7 ident.
- fully compatible with RCS 5.9 ident.
- passes all ident tests from the GNU RCS 5.9 test suite
This version has support for: svn extension for the Keyword id (double colon and
# before last $)
Differences with GNU RCS ident:
- no long options as found in GNU RCS 5.9 (but not commented there).
- '-V' reports nothing but has been added for compatibility.
Differential Revision: https://reviews.freebsd.org/D3200
Reviewed by: pfg
This is required by our FORTIFY_SOURCE implementation as it
does more inlining. As a rule of thumb, FORTIFY_SOURCE doubles
the number of inlines except that in grep inlining
blows up for some reason.
This is required in order for us to support deterministic mode by
default. If multiple -D or -U options are specified on the command
line, the final one takes precedence. GNU ar also uses -U for this.
An equivalent change will be applied to ELF Tool Chain's version of ar.
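The "final one takes precedence" rule can be illustrated with a simplified
scan over a flag string. This is a sketch only: ar's real option parsing is
not shown here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether deterministic mode is in effect after scanning a string
 * of option letters.  Each 'D' turns it on and each 'U' turns it off, so
 * the last of the two that appears wins; dflt applies if neither appears.
 */
bool
deterministic_mode(const char *flags, bool dflt)
{
	bool det = dflt;

	assert(flags != NULL);
	for (const char *p = flags; *p != '\0'; p++) {
		if (*p == 'D')
			det = true;
		else if (*p == 'U')
			det = false;
	}
	return (det);
}
```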
PR: 196929
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3175
If apropos(1) and whatis(1) are not hardlinks to man(1), the system is
using mandocdb, so man -k should spawn apropos(1) and/or whatis(1) directly.
Reported by: kevlo
Tested by: kevlo
Sponsored by: gandi.net
In the kernel, structs such as tcpstat are manipulated as an array of
counter_u64_t (uint64_t *), but made visible to userland as an array of
uint64_t. kread_counters() was previously copying the counter array into
user space and sequentially overwriting each counter with its value. This
mostly affects IPsec counters, as other counters are exported via sysctl.
PR: 201700
Tested by: Jason Unovitch
MFC after: 1 week
According to the last(1) man page, the "reboot" pseudo-user should print
all system reboot entries. This got broken by the utmpx import, as
records are typed.
Re-add support for "last reboot" by specifically matching against
SHUTDOWN_TIME and BOOT_TIME records.
PR: 168844
Submitted by: matthew@
MFC after: 1 month
This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.
* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policies.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set in a variety of methods.
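The cascade order from the list above can be sketched as a simple lookup.
The enum values and function name are hypothetical and do not correspond to
the kernel's actual policy types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy values; POLICY_NONE means "no policy set". */
enum vm_policy { POLICY_NONE, POLICY_ROUND_ROBIN, POLICY_FIRST_TOUCH };

/*
 * Pick the effective policy: the thread policy if one exists, otherwise
 * the process policy if one exists, otherwise the global default.
 */
enum vm_policy
effective_policy(enum vm_policy thread, enum vm_policy proc,
    enum vm_policy dflt)
{
	/* The global default must always be a concrete policy. */
	assert(dflt != POLICY_NONE);
	if (thread != POLICY_NONE)
		return (thread);
	if (proc != POLICY_NONE)
		return (proc);
	return (dflt);
}
```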
This is only relevant for very specific workloads.
This doesn't pretend to be a final NUMA solution.
The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.
This is only relevant if MAXMEMDOM is set to something other than 1.
I.e., if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.
Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.
Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.
Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.
Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!
Tested:
* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)
* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)
* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.
Verified:
* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.
Review:
This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but has not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.
Notes:
* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.
* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.
Sponsored by: Norse Corp, Inc.; Dell