1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-25 11:37:56 +00:00
Commit Graph

506 Commits

Author SHA1 Message Date
Mateusz Guzik
32417098f0 filedesc: create a dedicated zone for struct filedesc0
Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from
malloc use 2048 bytes.

There is no easy way to shrink the structure <= 1024 an it is likely to grow in
the future.
2014-11-03 04:16:04 +00:00
Mateusz Guzik
3dca54ab98 filedesc: move freeing old tables to fdescfree
They cannot be accessed by anyone and hold count only protects the structure
from being freed.
2014-11-02 14:12:03 +00:00
Mateusz Guzik
3dc85312b2 filedesc: factor out some code out of fdescfree
Previously it had a huge self-contained chunk dedicated to dealing with shared
tables.

No functional changes.
2014-11-02 13:43:04 +00:00
Mateusz Guzik
080fdefc28 filedesc: tidy up fdcheckstd
No functional changes.
2014-11-02 02:32:33 +00:00
Mateusz Guzik
d3f3e12a4f filedesc: lock filedesc lock in fdcloseexec only when needed 2014-11-02 01:13:11 +00:00
Mateusz Guzik
2534d8eeb6 filedesc: drop retval argument from do_dup
It was almost always td_retval anyway.

For the one case where it is not, preserve the old value across the call.
2014-10-31 10:35:01 +00:00
Mateusz Guzik
8a5177cca3 filedesc: fix missed comments about fdsetugidsafety
While here just note that both fdsetugidsafety and fdcheckstd take sleepable
locks.
2014-10-31 09:56:00 +00:00
Mateusz Guzik
f652d856ab filedesc: make fdinit return with source filedesc locked and new one sized
appropriately

Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable.
We don't have to call it with the lock held if we are just creating new
filedesc.

As a side note, strictly speaking processes can have fdtables with
fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file
descriptor they get will be 0 and the only syscall allowing to choose fd number
requires an active file descriptor. Should this ever change, we can add an 'init'
(or similar) parameter to fdgrowtable.
2014-10-31 09:25:28 +00:00
Mateusz Guzik
ffeb890592 filedesc: iterate over fd table only once in fdcopy
While here add 'fdused_init' which does not perform unnecessary work.

Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold
it when appropriate. This function is only used with INVARIANTS.

No functional changes intended.
2014-10-31 09:19:46 +00:00
Mateusz Guzik
1a0c80a3df filedesc: tidy up fdfree
Implement fdefree_last variant and get rid of 'last' parameter.

No functional changes.
2014-10-31 09:15:59 +00:00
Mateusz Guzik
b97a758ffc filedesc: tidy up fdcopy a little bit
Test for file availability by fde_file != NULL instead of fdisused, this is
consistent with similar checks later.

Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never
reached in practice.

fdiused is now only used in some KASSERTS, so ifdef it under INVARIANTS.

No functional changes.
2014-10-31 05:41:27 +00:00
Mateusz Guzik
f55cf4b0d1 filedesc: make sure to force table reload in fget_unlocked when count == 0
This is a fixup to r273843.
2014-10-30 07:21:38 +00:00
Mateusz Guzik
29c85772bb filedesc: microoptimize fget_unlocked by retrying obtaining reference count
without restarting whole lookup

Restart is only needed when fp was closed by current process, which is a much
rarer event than ref/deref by some other thread.
2014-10-30 05:21:12 +00:00
Mateusz Guzik
aa77d52800 filedesc: get rid of atomic_load_acq_int from fget_unlocked
A read barrier was necessary because fd table pointer and table size were
updated separately, opening a window where fget_unlocked could read new size
and old pointer.

This patch puts both these fields into one dedicated structure, pointer to which
is later atomically updated. As such, fget_unlocked only needs data a dependency
barrier which is a noop on all supported architectures.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-30 05:10:33 +00:00
Mateusz Guzik
58a3dcb229 filedesc assert that table size is at least 3 in fdsetugidsafety
Requested by: kib
2014-10-22 08:56:57 +00:00
Mateusz Guzik
11888da8d9 filedesc: cleanup setugidsafety a little
Rename it to fdsetugidsafety for consistency with other functions.

There is no need to take filedesc lock if not closing any files.

The loop has to verify each file and we are guaranteed fdtable has space
for at least 20 fds. As such there is no need to check fd_lastfile.

While here tidy up is_unsafe.
2014-10-22 00:23:43 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Mateusz Guzik
966ee9f25f filedesc: plug 2 write-only variables
Reported by: Coverity
CID: 1245745, 1245746
2014-10-20 21:57:24 +00:00
Mateusz Guzik
55056be254 filedesc: plug 2 assignments to M_ZERO-ed pointers in falloc_noinstall
No functional changes.
2014-10-15 01:16:11 +00:00
Mateusz Guzik
2b4a2528d7 filedesc: fix up breakage introduced in 272505
Include sequence counter supports incoditionally [1]. This fixes reprted build
problems with e.g. nvidia driver due to missing opt_capsicum.h.

Replace fishy looking sizeof with offsetof. Make fde_seq the last member in
order to simplify calculations.

Suggested by:	kib [1]
X-MFC:		with 272505
2014-10-05 19:40:29 +00:00
Konstantin Belousov
57c2505e65 On error, sbuf_bcat() returns -1. Some callers returned this -1 to
the upper layers, which interpret it as errno value, which happens to
be ERESTART.  The result was spurious restarts of the sysctls in loop,
e.g. kern.proc.proc, instead of returning ENOMEM to caller.

Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers
expecting errno.

In collaboration with:	pho
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2014-10-05 17:35:59 +00:00
Mateusz Guzik
ee3fd7bbb1 Plug capability races.
fp and appropriate capability lookups were not atomic, which could result in
improper capabilities being checked.

This could result either in protection bypass or in a spurious ENOTCAPABLE.

Make fp + capability check atomic with the help of sequence counters.

Reviewed by:	kib
MFC after:	3 weeks
2014-10-04 08:08:56 +00:00
Mateusz Guzik
0c4a09a378 Make do_dup() static and move relevant macros to kern_descrip.c
No functional changes.
2014-09-26 19:48:47 +00:00
Konstantin Belousov
f69261f2f9 Fix fcntl(2) compat32 after r270691. The copyin and copyout of the
struct flock are done in the sys_fcntl(), which mean that compat32 used
direct access to userland pointers.

Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs neccessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.

Reported by:	jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-09-25 21:07:19 +00:00
John Baldwin
9696feebe2 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
John Baldwin
2d69d0dcc2 Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value, invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.
2014-09-12 21:29:10 +00:00
John Baldwin
0ed667f6e5 Simplify vntype_to_kinfo() by returning when the desired value is found
instead of breaking out of the loop and then immediately checking the loop
index so that if it was broken out of the proper value can be returned.

While here, use nitems().
2014-09-12 20:56:09 +00:00
Mateusz Guzik
64196a9996 Plug unnecessary fp assignments in kern_fcntl.
No functional changes.
2014-09-05 23:56:25 +00:00
Gleb Smirnoff
e86447ca44 - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-26 14:44:08 +00:00
Mateusz Guzik
037755fd15 Fix up races with f_seqcount handling.
It was possible that the kernel would overwrite user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after:	1 week
2014-08-26 08:17:22 +00:00
Mateusz Guzik
a1bf811596 Prepare fget_unlocked for reading fd table only once.
Some capsicum functions accept fdp + fd and lookup fde based on that.
Add variants which accept fde.

Reviewed by:	pjd
MFC after:	1 week
2014-07-23 19:33:49 +00:00
Mateusz Guzik
b23c40d7b1 Don't zero fd_nfiles during fdp destruction.
Code trying to take a look has to check fd_refcnt and it is 0 by that time.

This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.

MFC after:	1 week
2014-07-10 21:05:45 +00:00
Mateusz Guzik
e518baf8f9 Avoid relocking filedesc lock when closing fds during fdp destruction.
Don't call bzero nor fdunused from fdfree for such cases. It would do
unnecessary work and complain that the lock is not taken.

MFC after:	1 week
2014-07-10 20:59:54 +00:00
Mateusz Guzik
b9d32c36fa Make fdunshare accept only td parameter.
Proc had to match the thread anyway and 2 parameters were inconsistent
with the rest.

MFC after:	1 week
2014-06-28 05:41:53 +00:00
Mateusz Guzik
35778d7aa9 Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.

Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.

MFC after:	1 week
2014-06-28 05:18:03 +00:00
Mateusz Guzik
450570a55e Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could be only reallocated if new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone

MFC after:	1 week
2014-06-23 01:28:18 +00:00
Mateusz Guzik
158627616c Don't take filedesc lock in fdunshare().
We can read refcnt safely and only care if it is equal to 1.

If it could suddenly change from 1 to something bigger the code would be
buggy even in the previous form and transitions from > 1 to 1 are equally racy
and harmless (we copy even though there is no need).

MFC after:	1 week
2014-06-22 21:37:27 +00:00
Mateusz Guzik
adf87ab01c fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent
is to iterate over active fds or lookup one, there is no point in looking
beyond that limit.

Few places are left unpatched for now.

MFC after:	1 week
2014-06-22 01:31:55 +00:00
Mateusz Guzik
0f0b852c73 do_dup: plug redundant adjustment of fd_lastfile
By that time it was already set by fdalloc, or was there in the first place
if fd is replaced.

MFC after:	1 week
2014-06-22 00:53:33 +00:00
Mateusz Guzik
f2b1eaec33 Request a non-exiting process in sysctl_kern_proc_{o,}filedesc
This fixes a race with exit1 freeing p_textvp.

Suggested by:	kib
MFC after:	1 week
2014-05-02 21:55:09 +00:00
Mateusz Guzik
210a5d1689 Garbage collect fdavail.
It rarely returns an error and fdallocn handles the failure of fdalloc
just fine.
2014-04-04 05:07:36 +00:00
Mateusz Guzik
f804336026 Mark the following sysctls as MPSAFE:
kern.file
kern.proc.filedesc
kern.proc.ofiledesc

MFC after:	7 days
2014-03-21 19:12:05 +00:00
Mateusz Guzik
4c73e705a5 Take filedesc lock only for reading when allocating new fdtable.
Code populating the table does this already.

MFC after:	1 week
2014-03-21 01:34:19 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Bryan Drewery
63d8fe5531 Fix style of comment blocks.
Reported by:	peter
Approved by:	bapt (mentor, implicit)
X-MFC with:	r262006
2014-02-22 04:28:49 +00:00
Mateusz Guzik
1f9e8f8ad9 Fix a race between kern_proc_{o,}filedesc_out and fdescfree leading
to use-after-free.

fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but
kern_proc_{o,}filedesc_out only checked for hold count.

MFC after:	3 days
2014-02-21 22:29:09 +00:00
Bryan Drewery
70f82cfbaf Fix M_FILEDESC leak in fdgrowtable() introduced in r244510.
fdgrowtable() now only reallocates fd_map when necessary.

This fixes fdgrowtable() to use the same logic as fdescfree() for
when to free the fd_map. The logic in fdescfree() is intended to
not free the initial static allocation, however the fd_map grows
at a slower rate than the table does. The table is intended to hold
20 fd, but its initial map has many more slots than 20.  The slot
sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches
NDSLOTS(20), so fdescfree() was assuming that the fd_map was still
the initial allocation and not freeing it.

This partially reverts r244510 by reintroducing some of the logic
it removed in fdgrowtable().

Reviewed by:	mjg
Approved by:	bapt (mentor)
MFC after:	2 weeks
2014-02-17 00:00:39 +00:00
Bryan Drewery
88812f91aa Remove redundant memcpy of fd_ofiles in fdgrowtable() added in r247602
Discussed with:	mjg
Approved by:	bapt (mentor)
MFC after:	2 weeks
2014-02-16 23:10:46 +00:00
Mateusz Guzik
231a0fe857 Plug a memory leak in dup2 when both old and new fd have ioctl caps.
Reviewed by:	pjd
MFC after:	3 days
2014-01-03 16:36:55 +00:00
Mateusz Guzik
0918d4b21f Don't check for fd limits in fdgrowtable_exp.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.

MFC after:	3 days
2014-01-03 16:34:16 +00:00