socket.txt
- int listen(int sockfd, int backlog);
Linux: The backlog argument defines the maximum length to which the
queue of pending connections for sockfd may grow. If a connection
request arrives when the queue is full, the client may receive an error
with an indication of ECONNREFUSED or, if the underlying protocol
supports retransmission, the request may be ignored so that a later
reattempt at connection succeeds.
backlog specifies the queue length for completely established sockets
waiting to be accepted, instead of the number of incomplete connection
requests. The maximum length of the queue for incomplete sockets can
be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
then it is silently truncated to that value; the default value in this
file is 128 (4096 since Linux 5.4). In kernels before 2.4.25, this limit was
a hard-coded value, SOMAXCONN, with the value 128.
BSD: The backlog argument defines the maximum length the queue of pending
connections may grow to. The real maximum queue length will be 1.5 times
more than the value specified in the backlog argument. A subsequent
listen() system call on the listening socket allows the caller to change
the maximum queue length using a new backlog argument. If a connection
request arrives with the queue full the client may receive an error with
an indication of ECONNREFUSED, or, in the case of TCP, the connection
will be silently dropped.
The listen() system call appeared in 4.2BSD. The ability to configure
the maximum backlog at run-time, and to use a negative backlog to request
the maximum allowable value, was introduced in FreeBSD 2.2.
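A minimal sketch of how the backlog argument is supplied in practice. The helper name open_listener and the choice of a loopback address with a kernel-assigned port are assumptions made for illustration, not part of the original notes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1 with a
 * kernel-assigned port and the requested accept-queue backlog.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  /* 0 = let the kernel pick a port */

    /* The kernel may clamp backlog (on Linux, to somaxconn). */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling open_listener(SOMAXCONN) requests the largest backlog the headers advertise. A later listen(fd, n) on the same descriptor changes the queue length, as described for BSD above.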
- SO_LINGER (linger) socket option
This option specifies what should happen when a socket of a type that
promises reliable delivery still has untransmitted messages when it is
closed.
    struct linger {
        int l_onoff;   /* nonzero to linger on close */
        int l_linger;  /* time to linger (in secs) */
    };
l_onoff = 0 (default): the l_linger value is ignored and close returns
immediately. If there is any data still remaining in the socket send
buffer, the system will try to deliver the data to the peer.
l_onoff = nonzero: the behavior of close depends on the l_linger value:
a) l_linger = 0: TCP aborts the connection, discards any data still remaining
in the socket send buffer and sends an RST to the peer. This avoids
TCP's TIME_WAIT state.
b) l_linger = nonzero: the kernel will linger when the socket is closed. If
there is any pending data in the socket send buffer, close blocks
until all the data is sent and acknowledged by the peer TCP, or the
linger time expires.
If a socket is set as nonblocking, it will not wait for close to complete
even if the linger time is nonzero.
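The cases above map onto a single setsockopt call. The following is a sketch only; the helper name set_linger is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: configure SO_LINGER on fd.
 *   onoff == 0             -> default behavior, close returns immediately
 *   onoff != 0, secs == 0  -> close aborts the connection with an RST
 *   onoff != 0, secs  > 0  -> close lingers for up to secs seconds
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_linger(int fd, int onoff, int secs)
{
    struct linger lg;
    lg.l_onoff = onoff;
    lg.l_linger = secs;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

For example, set_linger(fd, 1, 0) before close(fd) requests the abortive close described in case a).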
- TIME_WAIT state
The end that performs the active close, i.e. the end that sends the first FIN,
goes into the TIME_WAIT state. After a FIN packet is sent to the peer, and
after the peer's FIN/ACK arrives and is ACKed, we go into the TIME_WAIT
state. The duration that the endpoint remains in this state is 2 x MSL
(maximum segment lifetime). The reason that the duration of the TIME_WAIT
state is 2 x MSL is that the maximum amount of time a packet can wander
around a network is assumed to be MSL seconds. The factor of 2 is for the
round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived
implementations normally use 30 seconds instead. This means a
TIME_WAIT delay is between 1 and 4 minutes.
For Linux, the TIME_WAIT state duration is 1 minute (include/net/tcp.h):
    #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                      * state, about 60 seconds */
TIME_WAIT state on the client, combined with the limited number of ephemeral
ports available for TCP connections, severely limits the rate at which new
connections to the server can be created. On Linux, by default ephemeral
ports are in the range of 32768 to 61000:
    $ cat /proc/sys/net/ipv4/ip_local_port_range
    32768 61000
So with a TIME_WAIT state duration of 1 minute, the maximum sustained rate
for any client is (61000 - 32768) / 60 = ~470 new connections per second.
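The arithmetic behind that figure can be sketched as below. The function name max_conn_rate is hypothetical, and the model assumes every ephemeral port stays unusable for the full TIME_WAIT duration after its connection closes.

```c
/* Upper bound on sustained new connections per second from one client,
 * assuming each ephemeral port in [port_lo, port_hi) is held in
 * TIME_WAIT for time_wait_secs after use. Integer division rounds down. */
long max_conn_rate(long port_lo, long port_hi, long time_wait_secs)
{
    return (port_hi - port_lo) / time_wait_secs;
}
```

With the Linux defaults quoted above, max_conn_rate(32768, 61000, 60) gives 470. With a 4 minute TIME_WAIT (2 x 120 seconds) the same port range gives 117.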
- TCP keepalive
A TCP keepalive packet (a TCP packet with no data and the ACK flag turned on)
is used to check that a connection is still up and running. This is useful
because if the remote peer goes away without closing its connection, the
keepalive probe will detect that the connection is broken
even if there is no traffic on it.
Imagine the following scenario: You have a valid TCP connection established
between two endpoints A and B. B terminates abnormally (think kernel panic
or unplugging of network cable) without sending anything over the network
to notify A that the connection is broken. A, from its side, is ready to
receive data, and has no idea that B has gone away. Now B comes back up
again, and while A knows about a connection with B and still thinks that it
is active, B has no such idea. If A tries to send data to B over the dead
connection, B replies with an RST packet, causing A to finally close
the connection. So, without a keepalive probe, A would never close the
connection if it never sent data over it.
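Keepalive is off by default and is enabled per socket. A minimal sketch follows; the helper name enable_keepalive is hypothetical, and probe timing is left at the system defaults.

```c
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive probes for fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

In the scenario above, A would call enable_keepalive on its connection to B so that a silent failure of B is eventually detected even when A never sends data.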
- There are four socket functions that pass a socket address structure from
the process to the kernel - bind, connect, sendmsg and sendto. These
functions also pass the length of the sockaddr that
they are passing (socklen_t).
There are five socket functions that pass a socket address structure from
the kernel to the process - accept, recvfrom, recvmsg, getpeername,
getsockname. The kernel also returns the length of the sockaddr struct
that it hands back to userspace.
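The two directions can be seen side by side in a short sketch. The helper name bound_port and the use of a UDP socket on the loopback address are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to 127.0.0.1 with port 0 and
 * return the port the kernel assigned, or -1 on error. */
int bound_port(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Process -> kernel: the length is passed by value. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* Kernel -> process: the length is passed by pointer. It holds the
     * buffer size on entry and the stored address size on return. */
    struct sockaddr_in out;
    socklen_t len = sizeof(out);
    int rc = getsockname(fd, (struct sockaddr *)&out, &len);
    close(fd);
    if (rc < 0 || len != sizeof(out))
        return -1;
    return ntohs(out.sin_port);
}
```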
Different sockaddr structs:
1. sockaddr_in (IPv4)
2. sockaddr_in6 (IPv6)
3. sockaddr_un (Unix domain)
Special values of in_addr_t
/* Address to accept any incoming messages */
#define INADDR_ANY ((in_addr_t) 0x00000000)
/* Address to send to all hosts */
#define INADDR_BROADCAST ((in_addr_t) 0xffffffff)
/* Address indicating an error return */
#define INADDR_NONE ((in_addr_t) 0xffffffff)
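Because INADDR_NONE and INADDR_BROADCAST share the value 0xffffffff, a parser that reports errors by returning INADDR_NONE (as inet_addr does) cannot tell a valid 255.255.255.255 from a failure. A sketch of one way around this, using inet_pton; the helper name parse_ipv4 is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: parse a dotted-quad IPv4 string into out
 * (network byte order). Returns 1 on success and 0 on failure, so the
 * result never depends on comparing the address with INADDR_NONE. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    return inet_pton(AF_INET, text, out) == 1;
}
```

The constants above are in host byte order, so they are wrapped in htonl() before being stored in or compared with s_addr, for example addr.sin_addr.s_addr = htonl(INADDR_ANY).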