diff --git a/iocp-links.html b/iocp-links.html
index 49e97a6b..b046f60e 100644
--- a/iocp-links.html
+++ b/iocp-links.html
@@ -1,251 +1,304 @@
-
+ dt { margin-top: 1em; }
+ dd { margin-bottom: 1em; }
+
+
Ryan Dahl ry@tinyclouds.org +
Ryan Dahl ryan@joyent.com
This document assumes you are familiar with how non-blocking socket I/O is done in UNIX. -
Windows has very different notions for how asynchronous and non-blocking I/O
-are done. While Windows has select() it supports only 64
-file descriptors. Obviously Microsoft does understand how to make
-high-concurrency servers, they've simply choosen a different paradigm for
-this called Windows has different notions for how asynchronous and non-blocking I/O
+are done. select() is supported in Windows but it supports only 64
+file descriptors, which is unacceptable.
+Microsoft understands how to make high-concurrency servers, but they've
+chosen to do it with a system somewhat different from what one is used to
+in UNIX. It is called overlapped
- I/O. The mechanism in Windows by which multiple sockets are polled
-for completion is called
-I/O
- completion ports. More or less equivlant to kqueue (Macintosh,
-FreeBSD, other BSDs), epoll
+ I/O. The device by which overlapped socket I/O is polled for
+completion is an I/O
+ completion port. It is more or less equivalent to kqueue (Macintosh and
+BSDs), epoll
(Linux), event
completion ports (Solaris), poll (modern UNIXes), or select
-(all operating systems). The main difference is that in UNIX you ask the
-kernel to wait for file descriptors to change their readability or
-writablity while in windows you wait for asynchronous functions to complete.
+(all operating systems). The main variation is that in UNIXes you generally
+ask the kernel to wait for file descriptors to change their readability or
+writability, while in Windows you wait for asynchronous functions to complete.
+
+
For example, instead of waiting for a socket to become writable and then
write(2)
-to it, as you do in UNIX operating systems, you rather WSASend()
a buffer and wait for it to have been sent.
-The result is that non-blocking write(2) and read(2)
-are non-portable to Windows. This tends to throw the poor sap assigned with
-the job of porting your app to Windows into compulsive nervous twitches.
-Almost every socket operation that you're familar with has an
-overlapped counter-part (see table).
+The consequence of this different polling interface is that non-blocking
+write(2) and read(2) (among other calls) are not
+portable to Windows for high-performance servers.
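For readers less familiar with the UNIX side of this comparison, the "wait for writability, then write(2)" pattern can be sketched roughly as follows. This is only an illustration: the helper name write_when_writable() and its timeout parameter are hypothetical and do not come from this document, and a real server would register the descriptor with its event loop (and set O_NONBLOCK) rather than call poll() once per write.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the document): wait until fd is reported
 * writable, then issue a single write(2).
 * Returns the number of bytes written, or -1 on error or timeout
 * (errno is ETIMEDOUT when the wait timed out). */
ssize_t write_when_writable(int fd, const void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    assert(buf != NULL);

    pfd.fd = fd;
    pfd.events = POLLOUT;   /* ask the kernel about writability only */
    pfd.revents = 0;

    /* Retry if the wait is interrupted by a signal. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);

    if (n == -1)
        return -1;
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    /* The descriptor is ready (or in an error state, which write reports). */
    return write(fd, buf, len);
}
```

The Windows model inverts this order: the buffer is handed to WSASend() first and the completion is waited for afterwards, which is why code shaped like the sketch above does not carry over directly.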
-
-
| - |
- int fd;- |
-
- HANDLE handle;- SOCKET socket;- (the two are the same type) - |
| socket or pipe | -
- send(2),
- write(2)
- |
-
- WSASend()
- |
-
| socket or pipe | -
- recv(2),
- read(2)
- |
-
- WSARecv()
- |
-
| socket | -
- connect(2)- Non-blocking connect() is has difficult semantics in
- UNIX. The proper way to connect to a remote host is this: call
- connect(2) which will usually return EAGAIN.
- Poll on the file descriptor for writablity. Then use
- int error; + +
|
-
- ConnectEx()
- |
-
| pipe | -
- connect(2)- |
-
- ConnectNamedPipe()
-
- Be sure to set PIPE_NOWAIT in CreateNamedPipe()
- |
-
| socket | -
- accept(2)- |
-
- AcceptEx()
- |
-
| pipe | -
- accept(2)- |
-
- ConnectNamedPipe()
- |
-
| file | -
- write(2)
- |
-
- WriteFileEx()
- |
-
| file | -
- read(2)
- |
-
- ReadFileEx()
- |
-
| socket and file | -
- sendfile() [1]
- |
-
- TransmitFile()
- |
-
| tty | -
- tcsetattr(3)
- |
-
- SetConsoleMode()
- |
-
| tty | -
- read(2)
- |
-
- ReadConsole()
- and
- ReadConsoleInput()
- do not support overlapped I/O and there are no overlapped
- counter-parts. One strategy to get around this is
- RegisterWaitForSingleObject(&tty_wait_handle, tty_handle, - tty_want_poll, NULL, INFINITE, WT_EXECUTEINWAITTHREAD | - WT_EXECUTEONLYONCE)- which will execute tty_want_poll() in a different thread.
- You can use this to notify the calling thread that
- ReadConsoleInput() will not block.
-
- |
-
| tty | -
- write(2)
- |
-
- WriteConsole()
- is also blocking but this is probably acceptable.
- |
-
-
[1] sendfile() on UNIX has not been agreed
-on yet. Each operating system has a slightly different API.
+
The exact API of sendfile(2) on UNIX has not been agreed
+on yet. Each operating system does it slightly differently. All
+sendfile(2) implementations (except possibly FreeBSD?) are blocking
+even on non-blocking sockets.
+
+The following are nearly the same in Windows overlapped and UNIX
+non-blocking sockets. The only difference is that the UNIX variants
+take integer file descriptors while Windows uses SOCKET.
+
sockaddr
+ bind()
+ getsockname()
+AF_UNIX
+ domain sockets. AF_UNIX sockets exist in the file system
+often looking like
+/tmp/pipename
+
+Windows named pipes have a path, but they are not directly part of the file
+system; instead they look like
+
+\\.\pipe\pipename
+
socket(AF_UNIX, SOCK_STREAM, 0), bind(2), listen(2)CreateNamedPipe()
+
+Use FILE_FLAG_OVERLAPPED, PIPE_TYPE_BYTE,
+PIPE_NOWAIT.
+
send(2), write(2)WriteFileEx()
+recv(2), read(2)ReadFileEx()
+connect(2)CreateNamedPipe()
+accept(2)ConnectNamedPipe()
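As a rough illustration of the UNIX column above, the socket(), bind(2), listen(2) sequence for an AF_UNIX socket might look like the sketch below. The helper name unix_pipe_listen() is hypothetical, the backlog of 128 is an arbitrary choice, and removing a stale socket file before bind(2) is an assumption about what the caller wants rather than something this document prescribes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from the document): create a listening AF_UNIX
 * stream socket bound to a file system path such as /tmp/pipename.
 * Returns the listening descriptor, or -1 on failure. */
int unix_pipe_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);

    /* sun_path is a fixed-size array; refuse paths that do not fit. */
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* Assumption: a leftover socket file from an earlier run may be removed. */
    unlink(path);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 128) == -1) {
        close(fd);
        return -1;
    }

    return fd;
}
```

After a successful call the socket is visible as a file at the given path, which is the contrast with the \\.\pipe\ namespace used by Windows named pipes.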
+
+In UNIX, file system files are not able to use non-blocking I/O. Some
+operating systems have asynchronous I/O, but it is not standard and,
+at least on Linux, it is done with pthreads in GNU libc. For this reason,
+applications designed to be portable across different UNIXes must manage a
+thread pool for issuing file I/O syscalls.
+
+
+The situation is better in Windows: true overlapped I/O is available when
+reading or writing a stream of data to a file.
+
+
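The UNIX workaround mentioned above, issuing file I/O from other threads, can be sketched in a much reduced form as follows. This is not a thread pool: it starts one worker thread per request, which is enough to show the idea of running a blocking pread(2) off the main thread and waking a poll() loop through a pipe. All names here (file_read_req, file_read_start, file_read_wait) are hypothetical and do not come from this document.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* open(2) flags, for callers that open the file */
#include <poll.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record describing one file read. */
struct file_read_req {
    int fd;             /* file to read from */
    void *buf;          /* destination buffer */
    size_t len;         /* number of bytes requested */
    off_t offset;       /* file offset to read at */
    ssize_t result;     /* filled in by the worker thread */
    int notify_fd;      /* write end of a pipe that the event loop polls */
};

/* Worker: perform the blocking read, then wake the event loop. */
static void *file_read_worker(void *arg)
{
    struct file_read_req *req = arg;
    char done = 1;

    req->result = pread(req->fd, req->buf, req->len, req->offset);

    if (write(req->notify_fd, &done, 1) != 1) {
        /* Sketch only: a real implementation must handle a failed wakeup. */
    }
    return NULL;
}

/* Start the read on a new thread. Returns 0 or a pthread error code. */
int file_read_start(struct file_read_req *req, pthread_t *thread)
{
    assert(req != NULL && thread != NULL);
    return pthread_create(thread, NULL, file_read_worker, req);
}

/* Event-loop side: wait for the wakeup byte, then collect the result.
 * Returns the pread(2) result, or -1 if the wakeup did not arrive
 * (in which case this sketch leaves the thread unjoined). */
ssize_t file_read_wait(struct file_read_req *req, pthread_t thread,
                       int wake_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_fd, .events = POLLIN };
    char done;

    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;
    if (read(wake_fd, &done, 1) != 1)
        return -1;

    /* Joining guarantees the worker's write to req->result is visible. */
    pthread_join(thread, NULL);
    return req->result;
}
```

In a real server the read end of the pipe would sit in the same poll set as the sockets, so file completions and socket readiness are handled by one loop. A pool would reuse a fixed number of workers fed from a queue instead of creating a thread per request.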
write(2)WriteFileEx()
+
+Solaris's event completion ports have true in-kernel async writes with aio_write(3RT).
+
read(2)ReadFileEx()
+
+Solaris's event completion ports have true in-kernel async reads with aio_read(3RT).
+
It is (usually?) possible to poll a UNIX TTY file descriptor for
+readability or writability just like a TCP socket, which is very helpful
+and nice. In Windows the situation is worse: not only is it a completely
+different API, but there are no overlapped calls for reading from or writing to the
+TTY. Polling for readability can be accomplished by waiting in another
+thread with RegisterWaitForSingleObject().
+
+
read(2)ReadConsole()
+and
+ReadConsoleInput()
+do not support overlapped I/O and there are no overlapped
+counterparts. One strategy to get around this is
+RegisterWaitForSingleObject(&tty_wait_handle, tty_handle,
+    tty_want_poll, NULL, INFINITE, WT_EXECUTEINWAITTHREAD |
+    WT_EXECUTEONLYONCE)
+which will execute
tty_want_poll() in a different thread.
+You can use this to notify the calling thread that
+ReadConsoleInput() will not block.
+write(2)WriteConsole()
+is also blocking but this is probably acceptable.
+tcsetattr(3)SetConsoleMode()
+tips
accept is for UNIX pipes.
ConnectNamedPipe