<html>
<head>
<style>
  body {
    max-width: 40em;
    margin: 2em;
  }

  a {
    color: inherit;
  }

  a:hover {
    color: red;
  }

  dt { margin-top: 1em; }
  dd { margin-bottom: 1em; }
</style>
<title>Asynchronous I/O in Windows for UNIX Programmers</title>
</head>
<body>
<h1>Asynchronous I/O in Windows for UNIX Programmers</h1>

<p>Ryan Dahl ryan@joyent.com

<p>This document assumes you are familiar with how non-blocking socket I/O
is done in UNIX.

<p>Windows has different notions for how asynchronous and non-blocking I/O
are done. <code>select()</code> is supported in Windows but it supports only 64
file descriptors by default (<code>FD_SETSIZE</code>), which is unacceptable.
Microsoft understands how to make high-concurrency servers but they've
chosen to do it with a system somewhat different from what one is used to
in UNIX. It is called <a
href="http://msdn.microsoft.com/en-us/library/ms686358(v=vs.85).aspx">overlapped
I/O</a>. The device by which overlapped socket I/O is polled for
completion is an <a
href="http://msdn.microsoft.com/en-us/library/aa365198(VS.85).aspx">I/O
completion port</a>. It is more or less equivalent to <a
href="http://en.wikipedia.org/wiki/Kqueue">kqueue</a> (Macintosh and
BSDs), <a href="http://en.wikipedia.org/wiki/Epoll">epoll</a>
(Linux), <a
href="http://developers.sun.com/solaris/articles/event_completion.html">event
completion ports</a> (Solaris), <a href="">poll</a> (modern UNIXes), or <a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/select.2.html">select</a>
(all operating systems). The main variation is that in UNIXes you generally
ask the kernel to wait for file descriptors to change their readability or
writability, while in Windows you wait for asynchronous functions to complete.

<p>
For example, instead of waiting for a socket to become writable and then
calling <a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/write.2.html"><code>write(2)</code></a>
on it, as you do in UNIX operating systems, you instead <a
href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
a buffer and wait for the send to complete.

<p>
The consequence of this different polling interface is that non-blocking
<code>write(2)</code> and <code>read(2)</code> (among other calls) are not
portable to Windows for high-performance servers.

<p>In UNIX nearly everything has a file descriptor and <code>read(2)</code>
and <code>write(2)</code> more or less work on all of them. This is a nice
abstraction but for non-blocking I/O it does not dig as deep as one would
like. The file system itself has no concept of non-blocking I/O: file
descriptors for on-disk files cannot be polled for readability, and
<code>read(2)</code> always has the possibility of blocking for an
indefinite amount of time. UNIX users should not snub the Windows async API.
In practice the explicit difference between sockets, pipes, on-disk files,
and TTYs seems to make usage clearer, whereas in UNIX they deceptively
seem like they should work similarly but do not.

<p>
Almost every socket operation that you're familiar with has an
overlapped counterpart. The following section tries to pair Windows
overlapped I/O syscalls with non-blocking UNIX ones.

<h3>TCP Sockets</h3>

<p>
TCP sockets are by far the most important stream to get right.
Servers should expect to be handling tens of thousands of these
per thread, concurrently. This is possible with overlapped I/O in Windows if
one is careful to avoid UNIX-isms like file descriptors. (Windows has a
hard limit of 2048 open file descriptors; see
<a
href="http://msdn.microsoft.com/en-us/library/6e3b887c.aspx"><code>_setmaxstdio()</code></a>.)

<dl>

<dt><code>send(2)</code>, <code>write(2)</code></dt>
<dd>Windows: <a href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
</dd>

<dt><code>recv(2)</code>, <code>read(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms741688(v=VS.85).aspx"><code>WSARecv()</code></a>
</dd>

<dt><code>connect(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms737606(VS.85).aspx"><code>ConnectEx()</code></a>

<p>
Non-blocking <code>connect()</code> has difficult semantics in
UNIX. The proper way to connect to a remote host is this: call
<code>connect(2)</code>; if it fails with
<code>EINPROGRESS</code>, poll the file descriptor for writability.
Then use
<pre>int error = 0;
socklen_t len = sizeof(error);
getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len);</pre>
A zero <code>error</code> indicates that the connection succeeded.
(Documented in <code>connect(2)</code> under <code>EINPROGRESS</code>
on the Linux man page.)
</dd>

<dt><code>accept(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms737524(v=VS.85).aspx"><code>AcceptEx()</code></a>
</dd>

<dt><code>sendfile(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms740565(v=VS.85).aspx"><code>TransmitFile()</code></a>

<p> The exact API of <code>sendfile(2)</code> on UNIX has not been agreed
on yet. Each operating system does it slightly differently. All
<code>sendfile(2)</code> implementations (except possibly FreeBSD?) are blocking
even on non-blocking sockets.
<ul>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/sendfile.2.html">Linux <code>sendfile(2)</code></a>
<li><a href="http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2">FreeBSD <code>sendfile(2)</code></a>
<li><a href="http://www.manpagez.com/man/2/sendfile/">Darwin <code>sendfile(2)</code></a>
</ul>
Marc Lehmann has written <a
href="https://github.com/joyent/node/blob/2c185a9dfd3be8e718858b946333c433c375c295/deps/libeio/eio.c#L954-1080">a
portable version in libeio</a>.
</dd>

<dt><code>shutdown(2)</code>, graceful close, half-duplex connections</dt>
<dd>
<a
href="http://msdn.microsoft.com/en-us/library/ms738547(v=VS.85).aspx">Graceful
Shutdown, Linger Options, and Socket Closure</a>
<br/>
<a
href="http://msdn.microsoft.com/en-us/library/ms737757(VS.85).aspx"><code>DisconnectEx()</code></a>

</dd>

<dt><code>close(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms737582(v=VS.85).aspx"><code>closesocket()</code></a>
</dd>

</dl>

<p>
The following are nearly the same in Windows overlapped and UNIX
non-blocking sockets. The only difference is that the UNIX variants
take integer file descriptors while Windows uses <code>SOCKET</code>.
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms740496(v=VS.85).aspx"><code>sockaddr</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms737550(v=VS.85).aspx"><code>bind()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms738543(v=VS.85).aspx"><code>getsockname()</code></a>
</ul>

<h3>Named Pipes</h3>

<p>
Windows has "named pipes" which are more or less the same as <a
href="http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html"><code>AF_UNIX</code>
domain sockets</a>. <code>AF_UNIX</code> sockets exist in the file system,
often looking like
<pre>/tmp/<i>pipename</i></pre>

Windows named pipes have a path, but they are not directly part of the file
system; instead they look like

<pre>\\.\pipe\<i>pipename</i></pre>

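<p>
For reference, the UNIX side of this comparison, creating a listening
<code>AF_UNIX</code> socket at a file system path, might be sketched as
follows. The helper name <code>listen_unix()</code> and its
<code>backlog</code> parameter are assumptions made for illustration only.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening AF_UNIX stream socket at `path`.
 * Returns the listening fd, or -1 on error. */
int listen_unix(const char *path, int backlog)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length checked above */

    unlink(path);                       /* remove a stale socket file, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

<p>
Unlike a Windows pipe name, the path here is a real directory entry, so the
server is responsible for unlinking it when it shuts down.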
<dl>
<dt><code>socket(AF_UNIX, SOCK_STREAM, 0), bind(2), listen(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe()</code></a>

<p>Use <code>FILE_FLAG_OVERLAPPED</code> and <code>PIPE_TYPE_BYTE</code>.
<code>PIPE_NOWAIT</code> also exists, but the MSDN page describes it as a
LAN Manager compatibility mode that should not be used to achieve
asynchronous I/O.
</dd>

<dt><code>send(2)</code>, <code>write(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365748(v=VS.85).aspx"><code>WriteFileEx()</code></a>
</dd>

<dt><code>recv(2)</code>, <code>read(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365468(v=VS.85).aspx"><code>ReadFileEx()</code></a>
</dd>

<dt><code>connect(2)</code></dt>
<dd>
<code>CreateFile()</code> on the pipe name opens the client end of a pipe
that the server created with
<a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe()</code></a>.
</dd>

<dt><code>accept(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365146(v=VS.85).aspx"><code>ConnectNamedPipe()</code></a>
</dd>

</dl>

<p>
Examples:
<ul>
<li><a
href="http://msdn.microsoft.com/en-us/library/aa365601(v=VS.85).aspx">Named
Pipe Server Using Completion Routines</a>
<li><a
href="http://msdn.microsoft.com/en-us/library/aa365603(v=VS.85).aspx">Named
Pipe Server Using Overlapped I/O</a>
</ul>

<h3>On Disk Files</h3>

<p>
In UNIX, files on the file system cannot use non-blocking I/O. There are
some operating systems that have asynchronous I/O but it is not standard and
at least on Linux is done with pthreads in GNU libc. For this reason
applications designed to be portable across different UNIXes must manage a
thread pool for issuing file I/O syscalls.

<p>
The situation is better in Windows: true overlapped I/O is available when
reading or writing a stream of data to a file.

<dl>

<dt><code>write(2)</code></dt>
<dd> Windows:
<a href="http://msdn.microsoft.com/en-us/library/aa365748(v=VS.85).aspx"><code>WriteFileEx()</code></a>

<p>Solaris's event completion ports have true in-kernel async writes with <a
href="http://download.oracle.com/docs/cd/E19253-01/816-5171/aio-write-3rt/index.html">aio_write(3RT)</a>.
</dd>

<dt><code>read(2)</code></dt>
<dd> Windows:
<a href="http://msdn.microsoft.com/en-us/library/aa365468(v=VS.85).aspx"><code>ReadFileEx()</code></a>

<p>Solaris's event completion ports have true in-kernel async reads with <a
href="http://download.oracle.com/docs/cd/E19253-01/816-5171/aio-read-3rt/index.html">aio_read(3RT)</a>.
</dd>

</dl>

<h3>Console/TTY</h3>

<p>It is (usually?) possible to poll a UNIX TTY file descriptor for
readability or writability just like a TCP socket, which is very helpful
and nice. In Windows the situation is worse: not only is it a completely
different API, but there are no overlapped versions of the calls that read
and write to the TTY. Polling for readability can be accomplished by waiting
in another thread with <a
href="http://msdn.microsoft.com/en-us/library/ms685061(VS.85).aspx"><code>RegisterWaitForSingleObject()</code></a>.

<dl>

<dt><code>read(2)</code></dt>
<dd>
<a
href="http://msdn.microsoft.com/en-us/library/ms684958(v=VS.85).aspx"><code>ReadConsole()</code></a>
and
<a
href="http://msdn.microsoft.com/en-us/library/ms684961(v=VS.85).aspx"><code>ReadConsoleInput()</code></a>
do not support overlapped I/O and there are no overlapped
counterparts. One strategy to get around this is
<pre><a href="http://msdn.microsoft.com/en-us/library/ms685061(VS.85).aspx">RegisterWaitForSingleObject</a>(&tty_wait_handle, tty_handle,
    tty_want_poll, NULL, INFINITE, WT_EXECUTEINWAITTHREAD |
    WT_EXECUTEONLYONCE)</pre>
which will execute <code>tty_want_poll()</code> in a different thread.
You can use this to notify the calling thread that
<code>ReadConsoleInput()</code> will not block.
</dd>

<dt><code>write(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms687401(v=VS.85).aspx"><code>WriteConsole()</code></a>
is also blocking but this is probably acceptable.
</dd>

<dt><a
href="http://www.kernel.org/doc/man-pages/online/pages/man3/tcsetattr.3.html"><code>tcsetattr(3)</code></a></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms686033(VS.85).aspx"><code>SetConsoleMode()</code></a>
</dd>

</dl>

<h2 id="foot2">Links</h2>
<p>
Tips:
<ul>
<li> overlapped = non-blocking.
<li> There is no overlapped <a href="http://msdn.microsoft.com/en-us/library/ms738518(VS.85).aspx"><code>GetAddrInfoEx()</code></a> function. It seems Asynchronous Procedure Calls must be used instead.
<li> <a href="http://msdn.microsoft.com/en-us/library/ms740673(VS.85).aspx"><code>Windows Sockets 2</code></a>
</ul>

<p>
IOCP:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms686358(v=vs.85).aspx">Synchronization and Overlapped Input and Output</a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms741665(v=VS.85).aspx"><code>WSAOVERLAPPED</code> Structure</a>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms683209(v=VS.85).aspx"><code>GetOverlappedResult()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms683244(v=VS.85).aspx"><code>HasOverlappedIoCompleted()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa363792(v=vs.85).aspx"><code>CancelIoEx()</code></a>:
cancels an overlapped operation.
</ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms741688(v=VS.85).aspx"><code>WSARecv()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms737606(VS.85).aspx"><code>ConnectEx()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms740565(v=VS.85).aspx"><code>TransmitFile()</code></a>:
an async <code>sendfile()</code> for Windows.
<li><a href="http://msdn.microsoft.com/en-us/library/ms741565(v=VS.85).aspx"><code>WSADuplicateSocket()</code></a>:
describes how to share a socket between two processes.
<li><a href="http://msdn.microsoft.com/en-us/library/6e3b887c.aspx"><code>_setmaxstdio()</code></a>:
something like setting the maximum number of file descriptors with
<a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/setrlimit.2.html"><code>setrlimit(2)</code></a>,
AKA <code>ulimit -n</code>. Note the file descriptor limit on Windows is
2048.
</ul>

<p>
APC:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms681951(v=vs.85).aspx">Asynchronous Procedure Calls</a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms682016"><code>DnsQuery()</code></a>:
general purpose DNS query function like <code>res_query()</code> on UNIX.
</ul>

<p>
Pipes:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365781(v=VS.85).aspx"><code>Pipe functions</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365144(v=VS.85).aspx"><code>CallNamedPipe</code></a>:
client side; connects to a pipe, writes, reads, and closes in one call.
<li><a href="http://msdn.microsoft.com/en-us/library/aa365146(v=VS.85).aspx"><code>ConnectNamedPipe</code></a>:
like <code>accept</code> is for UNIX domain sockets.
</ul>

<p>
Also useful:
<a
href="http://msdn.microsoft.com/en-us/library/xw1ew2f8(v=vs.80).aspx">Introduction
to Visual C++ for UNIX Users</a>

</body></html>