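What sockets are for
Without sockets, network communication looks like this:
As can be seen, a socket provides an intermediate abstraction layer, i.e. an interface, between user processes and the TCP/IP protocol suite.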
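Socket interface functions
In TCP client/server network communication, the flow is roughly as shown in the figure below:
The C interface and the Python interface of each function are explained below: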
socket()
int socket(int domain, int type, int protocol);
socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint.
This file descriptor is abbreviated below as sockfd.
- domain: specifies a communication domain, i.e. the protocol family. Common values:
  - AF_UNIX, AF_LOCAL: Local communication
  - AF_INET: IPv4 Internet protocols
  - AF_INET6: IPv6 Internet protocols
- type: specifies the communication semantics. Common values:
  - SOCK_STREAM: Provides sequenced, reliable, two-way, connection-based byte streams. Used for TCP.
  - SOCK_DGRAM: Supports datagrams (connectionless, unreliable messages of a fixed maximum length). Used for UDP.
  - SOCK_RAW: Provides raw network protocol access. Lets the application send and receive packets directly to and from the IP layer without going through the transport layer; can be used for ICMP and IGMP.
- protocol: specifies the protocol, e.g. IPPROTO_TCP, IPPROTO_UDP or IPPROTO_SCTP for TCP, UDP and SCTP respectively. If 0 is given, the kernel picks the default protocol for the given type.
The corresponding Python interface:
socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)
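The following is a minimal sketch of the Python call, assuming an IPv4 host. It creates one TCP socket and one UDP socket and reads the underlying file descriptor (the sockfd) through fileno(). Passing proto=0 leaves the choice of protocol to the kernel, just as in the C call.

```python
import socket

# Create a TCP socket (SOCK_STREAM) and a UDP socket (SOCK_DGRAM) over IPv4.
# proto=0 lets the kernel choose the default protocol for each type.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) as tcp_sock, \
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0) as udp_sock:
    # fileno() returns the file descriptor that socket() handed back (the sockfd)
    tcp_fd, udp_fd = tcp_sock.fileno(), udp_sock.fileno()
    tcp_type, udp_type = tcp_sock.type, udp_sock.type
    print("TCP sockfd:", tcp_fd, tcp_sock.family.name, tcp_type.name)
    print("UDP sockfd:", udp_fd, udp_sock.family.name, udp_type.name)
# Leaving the with-block closes both sockets (the equivalent of close()).
```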
bind()
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
When a socket is created with socket(), it exists in a name space (address family) but has no address assigned to it. bind() assigns the address specified by addr to the socket referred to by the file descriptor sockfd. Traditionally, this operation is called "assigning a name to a socket".
- sockfd: the return value of socket().
- addr: The actual structure passed for the addr argument will depend on the address family. For IPv4 it is:
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address. */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
- addrlen: addrlen specifies the size, in bytes, of the address structure pointed to by addr.
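The sketch below packs the bytes of a struct sockaddr_in by hand with Python's struct module, which makes "network byte order" concrete. It assumes the Linux layout, which also contains an 8-byte sin_zero padding field that the struct above leaves out. Other systems, such as the BSDs, lay the structure out differently. pack_sockaddr_in is a hypothetical helper for illustration only. Python's socket module builds this structure from a (host, port) tuple for you.

```python
import socket
import struct

def pack_sockaddr_in(host: str, port: int) -> bytes:
    """Hypothetical helper: build struct sockaddr_in bytes, assuming the Linux layout."""
    family = struct.pack("=H", socket.AF_INET)  # sin_family: host byte order
    port_be = struct.pack("!H", port)           # sin_port: network byte order (big-endian)
    addr_be = socket.inet_aton(host)            # sin_addr.s_addr: network byte order
    zero_pad = bytes(8)                         # sin_zero: padding up to 16 bytes
    return family + port_be + addr_be + zero_pad

raw = pack_sockaddr_in("127.0.0.1", 8080)
print(len(raw))            # the value that would be passed as addrlen
print(raw[2:4].hex())      # 8080 == 0x1f90, most significant byte first
print(raw[4:8].hex())      # 127.0.0.1 as 4 bytes in network order
# htons() performs the same host-to-network conversion for the port
print(socket.htons(8080) == struct.unpack("=H", raw[2:4])[0])
```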
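!Note: A server usually binds a well-known address (IP address + port) at startup to offer its service, so that clients can reach it through that address. A client does not need to specify one: the system automatically assigns it a port number, combined with its own IP address. This is why a server usually calls bind() before listen(), while a client usually does not call bind() at all and instead gets a port chosen by the system during connect().
The corresponding Python interface: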
socket.bind(address)
For AF_INET (IPv4), address is a (host, port) tuple.
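Below is a minimal server-side sketch of bind() in Python. It binds to the loopback address 127.0.0.1. For illustration only, it binds to port 0, which asks the kernel to pick a free port; a real server would bind its well-known port instead. getsockname() then reports the address that bind() actually assigned.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    # Port 0 is an illustrative choice: the kernel assigns any free port.
    # A real server would bind a fixed, well-known port instead.
    server.bind(("127.0.0.1", 0))        # for AF_INET, address is a (host, port) tuple
    host, port = server.getsockname()    # the address actually assigned by bind()
    print(f"bound to {host}:{port}")
```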
listen()
int listen(int sockfd, int backlog);
listen for connections on a socket.
- backlog: defines the maximum length to which the queue of pending connections for sockfd may grow.
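!Note: After listen() is called, the kernel maintains two queues:
- The SYN queue holds connections whose request has arrived but whose three-way handshake is not yet complete.
- The accept queue holds connections that have completed the three-way handshake.

backlog is the maximum number of connections that have completed the handshake and are waiting for accept(), i.e. the length of the accept queue. This is the Linux behavior since 2.2; Linux also silently caps backlog at the value in /proc/sys/net/core/somaxconn.
The corresponding Python interface: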
socket.listen([backlog])
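The sketch below illustrates the accept queue. Once the server socket is listening, a client's connect() completes even though the server has not called accept() yet. The kernel finishes the three-way handshake on its own and parks the connection in the accept queue. The loopback address, the kernel-chosen port and backlog=5 are illustrative choices.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(5)                  # backlog=5 chosen arbitrarily
    server_addr = server.getsockname()

    # No accept() yet: connect() still succeeds, because the kernel completes
    # the three-way handshake and queues the connection in the accept queue.
    with socket.create_connection(server_addr, timeout=2) as client:
        client_addr = client.getsockname()
        print("client connected from", client_addr, "before accept()")

        # accept() now simply dequeues the already-established connection.
        conn, peer_addr = server.accept()
        with conn:
            print("accepted connection from", peer_addr)
```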
connect()
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by addr.
- sockfd: the client's sockfd
- addr: the address of the server's socket
The corresponding Python interface:
socket.connect(address)
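The following is a minimal client-side sketch of connect() in Python. To stay self-contained, it starts its own listening socket on the loopback address as a stand-in server. The client never calls bind(). After connect(), getsockname() shows the local IP and port that the kernel picked for the client, as described in the bind() note.

```python
import socket

# Stand-in server so the example is self-contained (loopback, kernel-chosen port).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen()
    server_addr = server.getsockname()

    # The client socket is never bound explicitly.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(server_addr)       # address of the server socket, (host, port)
        local_addr = client.getsockname() # local IP and port chosen by the kernel
        print("server address:", server_addr)
        print("client local address assigned at connect():", local_addr)
```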
accept()
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
Accept a connection on a socket. It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state.
- sockfd: The argument sockfd is a socket that has been created with socket(), bound to a local address with bind(), and is listening for connections after a listen().
- addr: The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the peer socket, as known to the communications layer.
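!Note: If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK.
- On Linux, EAGAIN and EWOULDBLOCK are the same error code:
#define EAGAIN 11 /* Try again */
#define EWOULDBLOCK EAGAIN /* Operation would block */

These errors show up when an application performs non-blocking operations on a file or socket. For example, suppose a file, socket or FIFO is opened with the O_NONBLOCK flag and you keep calling read() when no data is available. The program does not block waiting for data. Instead, read() returns -1 with errno set to EAGAIN, telling the application that there is nothing to read right now and that it should try again later.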
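In Python, the EAGAIN/EWOULDBLOCK case surfaces as a BlockingIOError exception rather than as a -1 return value. The sketch below puts a listening socket into non-blocking mode with setblocking(False). It then calls accept() while no client is pending. The loopback address and kernel-chosen port are illustrative choices.

```python
import errno
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen()
    server.setblocking(False)        # comparable to setting O_NONBLOCK on the sockfd

    try:
        server.accept()              # nothing is queued yet
        caught_errno = None
    except BlockingIOError as exc:   # raised for EAGAIN / EWOULDBLOCK
        caught_errno = exc.errno
        print("accept() would block:", errno.errorcode[exc.errno])

print(caught_errno in (errno.EAGAIN, errno.EWOULDBLOCK))
```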
The corresponding Python interface:
socket.accept()
The return value is a pair (conn, address) where conn is a new socket object usable to send and receive data on the connection, and address is the address bound to the socket on the other end of the connection.
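Below is a minimal sketch of accept() on the server side. A client runs in the same process so the example works on its own. conn is the new connected socket, and it is the one used for sendall()/recv(); address is the client's (host, port). The loopback setup, the message and the single recv() call are assumptions for illustration. With real traffic, one recv() may return only part of the data.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen()

    with socket.create_connection(server.getsockname(), timeout=2) as client:
        conn, address = server.accept()   # new connected socket + peer address
        with conn:
            client.sendall(b"hello")
            data = conn.recv(1024)        # read on the connected socket, not on `server`
            conn.sendall(data.upper())    # echo back in upper case
            reply = client.recv(1024)
        print("peer address:", address)
        print("server got:", data, "client got:", reply)
```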
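!Note: listening socket vs. connected socket
- Listening socket: the sockfd returned by socket() on the server side and passed to listen().
- Connected socket: the new socket created by accept() after a client's connect() has succeeded.
- A server normally creates only one listening socket descriptor, which exists for the server's entire lifetime. The kernel creates a connected socket descriptor for each client connection accepted by the server process. When the server has finished serving a client, the corresponding connected socket descriptor is closed.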
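The sketch below illustrates the difference between the two kinds of socket. It keeps one listening socket for its whole lifetime and serves three clients one after another. Each accept() yields a fresh connected socket, and that socket is closed as soon as its client has been served. The number of clients and the loopback setup are arbitrary choices. A real server would usually handle clients concurrently, for example with select or threads.

```python
import socket

served = []
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    listen_fd = listener.fileno()           # one listening socket for the whole run

    for i in range(3):                      # three clients, served one at a time
        with socket.create_connection(listener.getsockname(), timeout=2) as client:
            conn, addr = listener.accept()  # the kernel hands back a new connected socket
            with conn:                      # closed once this client has been served
                client.sendall(f"request {i}".encode())
                conn.sendall(b"ack: " + conn.recv(1024))
                served.append((conn.fileno(), addr, client.recv(1024)))

for fd, addr, reply in served:
    print(f"connected fd={fd} (listening fd={listen_fd}) peer={addr} reply={reply!r}")
```

The connected sockets are closed one by one, so the kernel may reuse the same descriptor number for later clients. The numbers are not guaranteed to be distinct from each other, only from the listening descriptor, which stays open.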
read()/write()
close() vs. shutdown()
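References
The figure below is from Socket 编程实战.
Three-way handshake
A simple diagram:
A slightly more detailed one (from fellow student 曾志优):
For the steps in detail, see section 3.6 ("listen、connect、accept流程及原理") of socket原理详解.
For an even more detailed explanation, see 高性能网络编程(一)----accept建立连接.
References
Advanced socket topics
- Understanding and using select: I/O multiplexing
- Using the socketserver module and a walk-through of its source code; see python之socket编程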