freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-29 12:03:03 +00:00

Author	SHA1	Message	Date
Pawel Jakub Dawidek	a7ebb3eb8b	When we are operating on blocking socket and get EAGAIN on send(2) or recv(2) this means that request timed out. Translate the meaningless EAGAIN to ETIMEDOUT to give administrator a hint that he might need to increase timeout in configuration file. MFC after: 1 month	2011-04-02 09:29:53 +00:00
Pawel Jakub Dawidek	02dfe9724c	Declare directions for sockets between primary and secondary. In HAST we use two sockets - one for only sending the data and one for only receiving the data. MFC after: 1 month	2011-04-02 09:25:13 +00:00
Pawel Jakub Dawidek	3a0b818f59	Allow to disable sends or receives on a socket using shutdown(2) by interpreting NULL 'data' argument passed to proto_common_send() or proto_common_recv() as a will to do so. MFC after: 1 month	2011-04-02 09:22:06 +00:00
Pawel Jakub Dawidek	2a49afacd1	Handle the problem described in r220264 by using GEOM GATE queue of unlimited length. This should fix deadlocks reported by HAST users. MFC after: 1 week	2011-04-02 07:01:09 +00:00
Pawel Jakub Dawidek	54987cacfd	Add mapsize to the header just before sending the packet. Before it could change later and we were sending invalid mapsize. Some time ago I added optimization where when nodes are connected for the first time and there were no writes to them yet, there is no initial full synchronization. This bug prevented it from working. MFC after: 1 week	2011-03-25 20:19:15 +00:00
Pawel Jakub Dawidek	7d4df5cd0b	Use timeout from configuration file not only when sending and receiving, but also when establishing connection. MFC after: 1 week	2011-03-25 20:15:16 +00:00
Pawel Jakub Dawidek	643080b75f	Use role2str() when setting process title. MFC after: 1 week	2011-03-25 20:13:38 +00:00
Pawel Jakub Dawidek	640b7ee623	Don't create socketpair for connection forwarding between parent and secondary. Secondary doesn't need to connect anywhere. MFC after: 1 week	2011-03-23 11:09:04 +00:00
Pawel Jakub Dawidek	6d51b7d530	Add my copyright. MFC after: 1 week	2011-03-22 21:19:51 +00:00
Mikolaj Golub	9237aa3fa5	After synchronization is complete we should make primary counters be equal to secondary counters: primary_localcnt = secondary_remotecnt primary_remotecnt = secondary_localcnt Previously it was done wrong and split-brain was observed after primary had synchronized up-to-date data from secondary. Approved by: pjd (mentor) MFC after: 1 week	2011-03-22 20:27:26 +00:00
Mikolaj Golub	b068d5aafb	For requests that are sent only to remote component use the error from remote. Approved by: pjd (mentor) MFC after: 1 week	2011-03-22 19:49:27 +00:00
Pawel Jakub Dawidek	e2eabb44d7	The proto API is a general purpose API, so don't use 'hast' in structures or function names. It can now be used outside of HAST. MFC after: 1 week	2011-03-22 16:21:11 +00:00
Pawel Jakub Dawidek	cd72d521e3	White space cleanups. MFC after: 1 week	2011-03-22 10:39:34 +00:00
Pawel Jakub Dawidek	4d8dc3b838	When dropping privileges prefer capsicum over chroot+setgid+setuid. We can use capsicum for secondary worker processes and hastctl. When working as primary we drop privileges using chroot+setgid+setuid still as we need to send ioctl(2)s to ggate device, for which capsicum doesn't allow (yet). X-MFC after: capsicum is merged to stable/8	2011-03-21 21:31:50 +00:00
Pawel Jakub Dawidek	9446b4536e	Initialize localcnt on first write. This fixes assertion when we create resource, set role to primary, do no writes, then set it to secondary and accept connection from primary. MFC after: 1 week	2011-03-21 21:16:12 +00:00
Pawel Jakub Dawidek	756cb15420	Fix typo. MFC after: 1 week	2011-03-21 21:14:07 +00:00
Pawel Jakub Dawidek	351758d85b	Before handling any events on descriptors check signals so we can update our info about worker processes if any of them was terminated in the meantime. This fixes the problem with 'hastctl status' running from a hook called on split-brain: 1. Secondary calls a hook and terminates. 2. Hook asks for resource status via 'hastctl status'. 3. The main hastd handles the status request by sending it to the secondary worker who is already dead, but because signals weren't checked yet he doesn't know that and we get EPIPE. MFC after: 1 week	2011-03-21 15:29:20 +00:00
Pawel Jakub Dawidek	ed646d4dbc	Remove stale comment. Yes, it is valid to set role back to init. MFC after: 1 week	2011-03-21 15:08:10 +00:00
Pawel Jakub Dawidek	2b5ad0e077	Increase debug level of "Checking hooks." message. MFC after: 1 week	2011-03-21 14:53:27 +00:00
Pawel Jakub Dawidek	e208a185f0	Be pedantic and free nvout before exiting. MFC after: 1 week	2011-03-21 14:51:16 +00:00
Pawel Jakub Dawidek	38ea70cadf	Detect situation where resource internal identifier differs. This means that both nodes have separately managed resources that don't have the same data. MFC after: 1 week	2011-03-21 14:50:12 +00:00
Pawel Jakub Dawidek	0b626a289e	In hast.conf we define the other node's address in 'remote' variable. This way we know how to connect to secondary node when we are primary. The same variable is used by the secondary node - it only accepts connections from the address stored in 'remote' variable. In cluster configurations it is common that each node has its individual IP address and there is one additional shared IP address which is assigned to primary node. It seems it is possible that if the shared IP address is from the same network as the individual IP address it might be chosen by the kernel as a source address for connection with the secondary node. Such connection will be rejected by secondary, as it doesn't come from primary node individual IP. Add 'source' variable that allows to specify source IP address we want to bind to before connecting to the secondary node. MFC after: 1 week	2011-03-21 08:54:59 +00:00
Pawel Jakub Dawidek	1884f6bbf3	Log when we start hooks checking and when we execute a hook. MFC after: 1 week	2011-03-21 08:38:24 +00:00
Pawel Jakub Dawidek	8a8763b7cf	Use snprlcat() instead of two strlcat(3)s. MFC after: 1 week	2011-03-21 08:37:50 +00:00
Pawel Jakub Dawidek	9925a680a9	Add snprlcat() and vsnprlcat() - the functions I'm always missing. They work as a combination of snprintf(3) and strlcat(3) - the caller can append a string built based on the given format. MFC after: 1 week	2011-03-21 08:36:50 +00:00
Pawel Jakub Dawidek	4f0ec4797a	When creating connection on behalf of primary worker, set pjdlog prefix to resource name and role, so that any logs related to that can be identified properly. MFC after: 1 week	2011-03-21 08:33:58 +00:00
Pawel Jakub Dawidek	c3a8627c9a	If there is any traffic on one of our descriptors, we were not checking for long running hooks. Fix it by not using select(2) timeout to decide if we want to check hooks or not. MFC after: 1 week	2011-03-21 08:31:35 +00:00
Mikolaj Golub	8d7dcf14ff	For secondary, set 2 * HAST_KEEPALIVE seconds timeout for incoming connection so the worker will exit if it does not receive packets from the primary during this interval. Reported by: Christian Vogt <Christian.Vogt@haw-hamburg.de> Tested by: Christian Vogt <Christian.Vogt@haw-hamburg.de> Approved by: pjd (mentor) MFC after: 1 week	2011-03-17 21:02:14 +00:00
Pawel Jakub Dawidek	35daccccce	Remove #include needed for debugging. MFC after: 1 week	2011-03-15 13:53:39 +00:00
Mikolaj Golub	bc7a916a25	Make workers inherit debug level from the main process. Approved by: pjd (mentor) MFC after: 1 week	2011-03-11 12:12:35 +00:00
Pawel Jakub Dawidek	a98bce2941	Unbreak the build. MFC after: 2 weeks	2011-03-07 19:54:51 +00:00
Pawel Jakub Dawidek	fa356f6cfe	- Log size of data to synchronize in human readable form (using %N). - Log synchronization time (using %T). - Log synchronization speed in human readable form (using %N). MFC after: 2 weeks	2011-03-07 10:41:12 +00:00
Pawel Jakub Dawidek	1c151458c6	Use %S to print IP address and port number. MFC after: 2 weeks	2011-03-07 10:39:26 +00:00
Pawel Jakub Dawidek	9e5bdc9d83	- Turn on printf extensions. - Load support for %T for printing time. - Add support for %N for printing number in human readable form. - Add support for %S for printing sockaddr structure (currently only AF_INET family is supported, as this is all we need in HAST). - Disable gcc compile-time format checking as this will no longer work. MFC after: 2 weeks	2011-03-07 10:38:18 +00:00
Pawel Jakub Dawidek	a61f579394	Provides three states for pjdlog_initialized, so we can also tell that this is first initialization ever. MFC after: 2 weeks	2011-03-07 10:33:52 +00:00
Pawel Jakub Dawidek	8cd3d45ad9	Allow to compress on-the-wire data using two algorithms: - HOLE - it simply turns all-zero blocks into few bytes header; it is extremely fast, so it is turned on by default; it is mostly intended to speed up initial synchronization where we expect many zeros; - LZF - very fast algorithm by Marc Alexander Lehmann, which shows very decent compression ratio and has BSD license. MFC after: 2 weeks	2011-03-06 23:09:33 +00:00
Pawel Jakub Dawidek	1fee97b01f	Allow to checksum on-the-wire data using either CRC32 or SHA256. MFC after: 2 weeks	2011-03-06 22:56:14 +00:00
Pawel Jakub Dawidek	493812ee6e	When we decide to unlink socket file, sun_path must be set. If it is set, but there is problem unlinking the file, log a warning. MFC after: 1 week	2011-02-09 08:01:10 +00:00
Pawel Jakub Dawidek	0d8d37212b	Explicitly include <sys/types.h> as suggested by getpid(2) and don't rely on <sys/un.h> including what's needed. MFC after: 1 week	2011-02-08 23:16:19 +00:00
Pawel Jakub Dawidek	f431ab182a	Unlink UNIX domain socket file only if: 1. The descriptor is the one we are listening on (not the one when we connect as a client and not the one which is created on accept(2)). 2. Descriptor was created by us (PID matches with the PID stored on bind(2)). Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2011-02-08 23:08:20 +00:00
Pawel Jakub Dawidek	e84a29b629	Now that we break the loop on fstat(2) failure we no longer need to satisfy gcc's imperfections. MFC after: 1 week	2011-02-06 14:17:08 +00:00
Pawel Jakub Dawidek	207ee3cdea	Add (void) cast before snprintf(3)s for which we are not interested in return values. MFC after: 1 week	2011-02-06 14:09:19 +00:00
Pawel Jakub Dawidek	ee3a876c18	Treat fstat(2) failure (different than EBADF) as fatal error. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2011-02-06 14:07:58 +00:00
Pawel Jakub Dawidek	18d6e1a5f6	Open syslog when logging sysconf(3) failure. Reported by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2011-02-06 14:06:37 +00:00
Pawel Jakub Dawidek	5aa85abd1d	Close more descriptors that can be open if the worker process for the given resource is already running. Submitted by: Mikolaj Golub <to.my.trociny@gmail.com> MFC after: 1 week	2011-02-06 12:21:29 +00:00
Pawel Jakub Dawidek	32ecf62028	Setup another socketpair between parent and child, so that primary sandboxed worker can ask the main privileged process to connect in worker's behalf and then we can migrate descriptor using this socketpair to worker. This is not really needed now, but will be needed once we start to use capsicum for sandboxing. MFC after: 1 week	2011-02-03 11:39:49 +00:00
Pawel Jakub Dawidek	21e7bc5e52	Add missing locking after moving keepalive_send() to remote send thread in r214692. MFC after: 1 week	2011-02-03 11:33:32 +00:00
Pawel Jakub Dawidek	f4c96f944c	Let the caller log info about successful privilege drop. We don't want to log this in hastctl. MFC after: 1 week	2011-02-03 10:37:44 +00:00
Pawel Jakub Dawidek	01ab52c021	- Rename proto_descriptor_{send,recv}() functions to proto_connection_{send,recv} and change them to return proto_conn structure. We don't operate directly on descriptors, but on proto_conns. - Add wrap method to wrap descriptor with proto_conn. - Remove methods to send and receive descriptors and implement this functionality as additional argument to send and receive methods. MFC after: 1 week	2011-02-02 15:53:09 +00:00
Pawel Jakub Dawidek	1c1933226f	Add proto_connect_wait() to wait for connection to finish. If timeout argument to proto_connect() is -1, then the caller needs to use this new function to wait for connection. This change is in preparation for capsicum, where sandboxed worker wants to ask main process to connect in worker's behalf and pass descriptor to the worker. Because we don't want the main process to wait for the connection, it will start async connection and pass descriptor to the worker who will be responsible for waiting for the connection to finish. MFC after: 1 week	2011-02-02 15:46:28 +00:00
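
Commit 9925a680a9 describes snprlcat() and vsnprlcat() as a combination of snprintf(3) and strlcat(3). The sketch below is one way such helpers could look, not the committed code: the bounds check and the -1 return for an unterminated buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Append formatted text to the NUL-terminated string in str without
 * writing more than size bytes in total. Returns the vsnprintf(3)
 * result for the appended part, or -1 if str has no NUL within size
 * bytes (an assumption of this sketch).
 */
int
vsnprlcat(char *str, size_t size, const char *fmt, va_list ap)
{
	const char *end;
	size_t len;

	assert(str != NULL && fmt != NULL);
	end = memchr(str, '\0', size);
	if (end == NULL)
		return (-1);
	len = (size_t)(end - str);
	return (vsnprintf(str + len, size - len, fmt, ap));
}

int
snprlcat(char *str, size_t size, const char *fmt, ...)
{
	va_list ap;
	int result;

	va_start(ap, fmt);
	result = vsnprlcat(str, size, fmt, ap);
	va_end(ap);
	return (result);
}
```

With a buffer holding "res0", calling snprlcat(buf, sizeof(buf), " (%s)", "primary") leaves "res0 (primary)" in the buffer, replacing a pair of strlcat(3) calls with one formatted append as in commit 8a8763b7cf.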
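
Commit a7ebb3eb8b translates EAGAIN on a blocking socket into ETIMEDOUT. A minimal sketch of that idea for the receive side follows; set_recv_timeout() and recv_all() are hypothetical names, and returning ENOTCONN on end-of-file is an assumption of this sketch rather than something the log states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Give a blocking socket a receive timeout, in seconds. */
int
set_recv_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (errno);
	return (0);
}

/* Receive exactly size bytes. Returns 0 on success or an errno value. */
int
recv_all(int fd, void *data, size_t size)
{
	unsigned char *p = data;
	ssize_t done;

	assert(fd >= 0 && data != NULL);
	while (size > 0) {
		done = recv(fd, p, size, 0);
		if (done == 0)
			return (ENOTCONN);	/* peer closed the connection */
		if (done == -1) {
			if (errno == EINTR)
				continue;
			/*
			 * On a blocking socket EAGAIN can only mean that
			 * the SO_RCVTIMEO timeout expired.
			 */
			if (errno == EAGAIN)
				return (ETIMEDOUT);
			return (errno);
		}
		p += done;
		size -= (size_t)done;
	}
	return (0);
}
```

The caller then sees ETIMEDOUT in its logs, which points at the timeout setting instead of a generic "resource temporarily unavailable" message.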


202 Commits
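
Commit 8cd3d45ad9 describes HOLE compression as turning all-zero blocks into a header of a few bytes. The sketch below illustrates that idea only; the 4-byte big-endian length header and the function names are assumptions, not the on-the-wire format used by HAST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	HOLE_HDR_SIZE	4	/* hypothetical header: block length, big-endian */

static bool
all_zeros(const unsigned char *data, size_t size)
{
	for (size_t i = 0; i < size; i++) {
		if (data[i] != 0)
			return (false);
	}
	return (true);
}

/*
 * If the block consists only of zeros, store its length in hdr and
 * return HOLE_HDR_SIZE. Otherwise return 0, meaning the block has to
 * be sent as-is.
 */
size_t
hole_compress(const unsigned char *data, size_t size, unsigned char *hdr)
{
	assert(data != NULL && hdr != NULL);
	if (size == 0 || size > UINT32_MAX || !all_zeros(data, size))
		return (0);
	hdr[0] = (unsigned char)(size >> 24);
	hdr[1] = (unsigned char)(size >> 16);
	hdr[2] = (unsigned char)(size >> 8);
	hdr[3] = (unsigned char)size;
	return (HOLE_HDR_SIZE);
}

/*
 * Rebuild the zero-filled block described by hdr. Returns the block
 * length, or 0 if it does not fit in outsize bytes.
 */
size_t
hole_decompress(const unsigned char *hdr, unsigned char *out, size_t outsize)
{
	size_t size;

	assert(hdr != NULL && out != NULL);
	size = ((size_t)hdr[0] << 24) | ((size_t)hdr[1] << 16) |
	    ((size_t)hdr[2] << 8) | (size_t)hdr[3];
	if (size > outsize)
		return (0);
	memset(out, 0, size);
	return (size);
}
```

A single pass over the block is all the compressor needs, which is consistent with the log's remark that the algorithm is extremely fast and helps most during initial synchronization of mostly empty devices.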