Ryan Dahl ryan@joyent.com
This document assumes you are familiar with how non-blocking socket I/O is done in UNIX.
Windows has different notions for how asynchronous and non-blocking I/O
are done. select() is supported in Windows, but it supports only 64
file descriptors (the default FD_SETSIZE), which is unacceptable.
Microsoft understands how to make high-concurrency servers, but they have
chosen to do it with a system somewhat different from what one is used to in
UNIX. It is called overlapped I/O. The device by which overlapped socket I/O
is polled for completion is an I/O completion port. It is more or less
equivalent to kqueue (Macintosh and BSDs), epoll (Linux), event completion
ports (Solaris), poll (modern UNIXes), or select (all operating systems). The
main difference is that in UNIXes you generally ask the kernel to wait for
file descriptors to change their readability or writability, while in Windows
you wait for asynchronous functions to complete.
For example, instead of waiting for a socket to become writable and then
calling write(2) on it, as you do in UNIX operating systems, you would
instead WSASend() a buffer and wait for it to have been sent.
The consequence of this different polling interface is that non-blocking
write(2) and read(2) (among other calls) are not
portable to Windows for high-performance servers.
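To make the UNIX side of this contrast concrete, here is a minimal sketch of the readiness model: wait for writability with poll(2), then call write(2). The names set_nonblocking, write_when_ready, and demo_readiness_write are made up for this illustration and do not come from any library. The Windows completion model has no equivalent of the poll step, which is why this loop does not port.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Readiness model: wait until fd is writable, then write(2).
 * Returns 0 once all len bytes are written, -1 on error or timeout. */
int write_when_ready(int fd, const char *buf, size_t len, int timeout_ms) {
    size_t off = 0;
    while (off < len) {
        struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int r = poll(&p, 1, timeout_ms);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return -1;                  /* error or timeout */
        if (!(p.revents & POLLOUT)) return -1;  /* POLLERR or POLLHUP */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) continue;
            return -1;
        }
        off += (size_t)n;
    }
    return 0;
}

/* Usage: send a short message across a socketpair and read it back. */
int demo_readiness_write(void) {
    int sv[2];
    char in[16] = {0};
    const char msg[] = "hello";
    assert(sizeof in >= sizeof msg);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    int rc = set_nonblocking(sv[0]);
    if (rc == 0) rc = write_when_ready(sv[0], msg, sizeof msg, 1000);
    if (rc == 0 && read(sv[1], in, sizeof in) != (ssize_t)sizeof msg) rc = -1;
    if (rc == 0 && strcmp(in, msg) != 0) rc = -1;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

demo_readiness_write() returns 0 when the message arrives intact on the other end of the socketpair.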
In UNIX nearly everything has a file descriptor, and read(2) and write(2)
more or less work on all of them. This is a nice abstraction, but for
non-blocking I/O it does not dig as deep as one would like. The file system
itself has no concept of non-blocking I/O: file descriptors for on-disk files
cannot be meaningfully polled for readability, and read(2) always has the
possibility of blocking for an indefinite amount of time. UNIX users should
not snub the Windows async API. In practice the explicit distinction between
sockets, pipes, on-disk files, and TTYs seems to make usage clearer, whereas
in UNIX they deceptively seem like they should work similarly but do not.
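The claim about on-disk files can be checked with a short experiment. The sketch below, which uses made-up helper names (polls_readable_now, demo_poll_regular_file) and a throwaway temporary file, asks poll(2) with a zero timeout whether a descriptor is readable. An empty pipe reports not readable, while a regular file reports readable at once even though it is empty, so the answer says nothing about whether read(2) would block on the disk.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if poll(2) reports fd readable right away, 0 if not, -1 on error. */
int polls_readable_now(int fd) {
    assert(fd >= 0);
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, 0);   /* timeout 0: do not wait */
    if (r < 0) return -1;
    return (r == 1 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Usage: compare an empty pipe with an empty regular file. */
int demo_poll_regular_file(void) {
    int pfd[2];
    char path[] = "/tmp/poll-demo-XXXXXX";   /* hypothetical temp file name */
    if (pipe(pfd) < 0) return -1;
    int ffd = mkstemp(path);
    if (ffd < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    unlink(path);   /* the open descriptor keeps the file alive */

    int pipe_ready = polls_readable_now(pfd[0]);  /* nothing has been written */
    int file_ready = polls_readable_now(ffd);     /* regular files always poll ready */

    close(pfd[0]);
    close(pfd[1]);
    close(ffd);
    return (pipe_ready == 0 && file_ready == 1) ? 0 : -1;
}
```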
Almost every socket operation that you're familiar with has an overlapped counterpart. The following section tries to pair Windows overlapped I/O syscalls with non-blocking UNIX ones.
_setmaxstdio().)
send(2), write(2) -> WSASend()
recv(2), read(2) -> WSARecv()
connect(2) -> ConnectEx()
Non-blocking connect() has difficult semantics in UNIX. The proper way to
connect to a remote host is this: call connect(2); if it fails with
EINPROGRESS, poll on the file descriptor for writability. Then use

    int error; socklen_t len = sizeof(int);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len);

A zero error indicates that the connection succeeded.
(Documented in connect(2) under EINPROGRESS
on the Linux man page.)
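The UNIX connect sequence described above can be sketched end to end as follows. This is one possible arrangement, not a canonical one: connect_nonblocking, close_and_fail, and demo_connect are hypothetical names, the one-second timeout is arbitrary, and the demo assumes a working loopback interface.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd while preserving errno; always returns -1. */
int close_and_fail(int fd) {
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Start a non-blocking connect and wait up to timeout_ms for the outcome.
 * Returns a connected, non-blocking fd, or -1 with errno set. */
int connect_nonblocking(const struct sockaddr *addr, socklen_t addrlen, int timeout_ms) {
    assert(addr != NULL);
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_and_fail(fd);

    if (connect(fd, addr, addrlen) == 0) return fd;   /* connected immediately */
    if (errno != EINPROGRESS) return close_and_fail(fd);

    /* The connection is in progress: wait for writability. */
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int r;
    do { r = poll(&p, 1, timeout_ms); } while (r < 0 && errno == EINTR);
    if (r == 0) errno = ETIMEDOUT;
    if (r <= 0) return close_and_fail(fd);

    /* Writability alone does not mean success; fetch the pending error. */
    int error = 0;
    socklen_t len = sizeof error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
        return close_and_fail(fd);
    if (error != 0) {
        errno = error;
        return close_and_fail(fd);
    }
    return fd;
}

/* Usage: connect to a listener on a kernel-chosen loopback port. */
int demo_connect(void) {
    struct sockaddr_in sa;
    socklen_t slen = sizeof sa;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;   /* let the kernel choose a free port */
    if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sa, &slen) < 0)
        return close_and_fail(lfd);
    int fd = connect_nonblocking((struct sockaddr *)&sa, sizeof sa, 1000);
    if (fd >= 0) close(fd);
    close(lfd);
    return fd >= 0 ? 0 : -1;
}
```

The getsockopt() step is the part that is easy to forget: a refused connection also makes the socket writable, so the SO_ERROR check is what distinguishes success from failure.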
accept(2) -> AcceptEx()
sendfile(2) -> TransmitFile()
The exact API of sendfile(2) on UNIX has not been agreed
on yet. Each operating system does it slightly differently. All
sendfile(2) implementations (except possibly FreeBSD?) are blocking
even on non-blocking sockets.
shutdown(2), graceful close, half-duplex connections -> DisconnectEx()
close(2) -> closesocket()
SOCKET.
Windows named pipes are roughly the counterpart of AF_UNIX domain sockets.
AF_UNIX sockets exist in the file system, often looking like
/tmp/pipename
Windows named pipes have a path, but they are not directly part of the file system; instead they look like
\\.\pipe\pipename
socket(AF_UNIX, SOCK_STREAM, 0), bind(2), listen(2) -> CreateNamedPipe()
    Use FILE_FLAG_OVERLAPPED, PIPE_TYPE_BYTE, PIPE_NOWAIT.
send(2), write(2) -> WriteFileEx()
recv(2), read(2) -> ReadFileEx()
connect(2) -> CreateNamedPipe()
accept(2) -> ConnectNamedPipe()
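For readers who know the Windows side better, the UNIX half of the pairing above looks roughly like this. It is a blocking sketch for brevity, and the helper names (make_unix_addr, unix_listen, unix_connect, demo_unix_socket) and the /tmp/pipename-PID path are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for a file system path; -1 if the path is too long. */
int make_unix_addr(struct sockaddr_un *sa, const char *path) {
    assert(sa != NULL && path != NULL);
    if (strlen(path) >= sizeof sa->sun_path) return -1;
    memset(sa, 0, sizeof *sa);
    sa->sun_family = AF_UNIX;
    strcpy(sa->sun_path, path);
    return 0;
}

/* Server side: socket(), bind(2), listen(2) on a path in the file system. */
int unix_listen(const char *path) {
    struct sockaddr_un sa;
    if (make_unix_addr(&sa, path) < 0) return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    unlink(path);   /* remove a stale socket file from an earlier run */
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect(2) to the same path. */
int unix_connect(const char *path) {
    struct sockaddr_un sa;
    if (make_unix_addr(&sa, path) < 0) return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Usage: listen, connect, accept, and pass four bytes through the socket. */
int demo_unix_socket(void) {
    char path[64], buf[8] = {0};
    snprintf(path, sizeof path, "/tmp/pipename-%ld", (long)getpid());
    int lfd = unix_listen(path);
    if (lfd < 0) return -1;
    int cfd = unix_connect(path);   /* waits in the listen backlog */
    int afd = cfd < 0 ? -1 : accept(lfd, NULL, NULL);
    int ok = afd >= 0 && write(cfd, "ping", 4) == 4 &&
             read(afd, buf, sizeof buf) == 4 && memcmp(buf, "ping", 4) == 0;
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    close(lfd);
    unlink(path);   /* the socket file persists until removed */
    return ok ? 0 : -1;
}
```

Note the two unlink() calls: because the socket lives in the file system, the path outlives the process unless it is removed, which is one practical difference from the \\.\pipe\ namespace.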
In UNIX, files on the file system cannot use non-blocking I/O. Some operating systems have asynchronous I/O, but it is not standard and, at least on Linux, it is done with pthreads in GNU libc. For this reason, applications designed to be portable across different UNIXes must manage a thread pool for issuing file I/O syscalls.
The situation is better in Windows: true overlapped I/O is available when reading or writing a stream of data to a file.
write(2) -> WriteFileEx()
    Solaris's event completion ports have true in-kernel async writes with aio_write(3RT).
read(2) -> ReadFileEx()
    Solaris's event completion ports have true in-kernel async reads with aio_read(3RT).
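The thread-based approach that portable UNIX programs fall back on can be sketched in its simplest form: hand the blocking read(2) to another thread and have that thread wake the event loop through a pipe, which can be polled. This is a single worker per request rather than a real pool, and the struct layout and function names (read_req, read_worker, demo_threaded_read) are assumptions made for the example. A production pool would keep workers alive and feed them from a queue.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One file-read request handed to a worker thread (hypothetical layout). */
struct read_req {
    int fd;           /* file to read from */
    char buf[64];     /* destination buffer */
    ssize_t result;   /* return value of read(2), or -1 */
    int notify_fd;    /* write end of a pipe that the event loop polls */
};

/* Worker: perform the blocking read(2), then wake the event loop. */
void *read_worker(void *arg) {
    struct read_req *req = arg;
    assert(req != NULL);
    req->result = read(req->fd, req->buf, sizeof req->buf);
    char done = 1;
    if (write(req->notify_fd, &done, 1) != 1) req->result = -1;
    return NULL;
}

/* Usage: read four bytes from a temp file without blocking the "loop". */
int demo_threaded_read(void) {
    char path[] = "/tmp/aio-demo-XXXXXX";   /* hypothetical temp file name */
    int pfd[2];
    pthread_t tid;
    struct read_req req;

    int ffd = mkstemp(path);
    if (ffd < 0) return -1;
    unlink(path);
    if (pipe(pfd) < 0) { close(ffd); return -1; }
    if (write(ffd, "data", 4) != 4 || lseek(ffd, 0, SEEK_SET) < 0) {
        close(ffd); close(pfd[0]); close(pfd[1]);
        return -1;
    }

    memset(&req, 0, sizeof req);
    req.fd = ffd;
    req.notify_fd = pfd[1];
    if (pthread_create(&tid, NULL, read_worker, &req) != 0) {
        close(ffd); close(pfd[0]); close(pfd[1]);
        return -1;
    }

    /* Event loop side: the pipe, unlike the file, can be polled usefully. */
    struct pollfd p = { .fd = pfd[0], .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, 1000);
    pthread_join(tid, NULL);   /* after this, req is safe to inspect */

    close(ffd);
    close(pfd[0]);
    close(pfd[1]);
    return (r == 1 && req.result == 4 && memcmp(req.buf, "data", 4) == 0) ? 0 : -1;
}
```

Depending on the toolchain, this may need to be compiled with -pthread.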
It is (usually?) possible to poll a UNIX TTY file descriptor for
readability or writability just like a TCP socket, which is very helpful
and nice. In Windows the situation is worse: not only is it a completely
different API, but there are no overlapped versions of the calls that read
and write to the TTY. Polling for readability can be accomplished by waiting
in another thread with RegisterWaitForSingleObject().
read(2) -> ReadConsole() and ReadConsoleInput()
    These do not support overlapped I/O and there are no overlapped
    counterparts. One strategy to get around this is

        RegisterWaitForSingleObject(&tty_wait_handle, tty_handle, tty_want_poll, NULL, INFINITE, WT_EXECUTEINWAITTHREAD | WT_EXECUTEONLYONCE)

    which will execute tty_want_poll() in a different thread. You can use
    this to notify the calling thread that ReadConsoleInput() will not block.
write(2) -> WriteConsole()
    This is also blocking, but this is probably acceptable.
tcsetattr(3) -> SetConsoleMode()
tips
GetAddrInfoEx() function. It seems Asynchronous Procedure Calls must be used instead.
Windows Sockets 2
IOCP:
WSAOVERLAPPED Structure
GetOverlappedResult()
HasOverlappedIoCompleted()
CancelIoEx(): cancels an overlapped operation.
WSASend()
WSARecv()
ConnectEx()
TransmitFile(): an async sendfile() for Windows.
WSADuplicateSocket(): describes how to share a socket between two processes.
_setmaxstdio(): something like setting the maximum number of file descriptors with setrlimit(3), AKA ulimit -n. Note the file descriptor limit on Windows is 2048.
APC:
DNSQuery(): general purpose DNS query function like res_query() on UNIX.
Pipe functions
CreateNamedPipe
CallNamedPipe: like accept is for UNIX pipes.
ConnectNamedPipe