libuv/iocp-links.html
Ryan Dahl 116bd967d6 ...
2011-03-23 20:40:55 -07:00

377 lines
14 KiB
HTML

<html>
<head>
<style>
body {
max-width: 40em;
margin: 2em;
}
a {
color: inherit;
}
a:hover {
color: red;
}
dt { margin-top: 1em; }
dd { margin-bottom: 1em; }
</style>
<title>Asynchronous I/O in Windows for UNIX Programmers</title>
</head>
<body>
<h1>Asynchronous I/O in Windows for UNIX Programmers</h1>
<p>Ryan Dahl ryan@joyent.com
<p>This document assumes you are familiar with how non-blocking socket I/O
is done in UNIX.
<p>Windows has different notions for how asynchronous and non-blocking I/O
are done. <code>select()</code> is supported in Window but it supports only 64
file descriptors&mdash;which is unacceptable.
Microsoft understands how to make high-concurrency servers but they've
choosen to do it with an system somewhat different than what one is used to
UNIX. It is called <a
href="http://msdn.microsoft.com/en-us/library/ms686358(v=vs.85).aspx">overlapped
I/O</a>. The device by which overlapped socket I/O is polled for
completion is an <a
href="http://msdn.microsoft.com/en-us/library/aa365198(VS.85).aspx">I/O
completion port</a>. It is more or less equivalent to <a
href="http://en.wikipedia.org/wiki/Kqueue">kqueue</a> (Macintosh and
BSDs), <a href="http://en.wikipedia.org/wiki/Epoll">epoll</a>
(Linux), <a
href="http://developers.sun.com/solaris/articles/event_completion.html">event
completion ports</a> (Solaris), <a href="">poll</a> (modern UNIXes), or <a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/select.2.html">select</a>
(all operating systems). The main variation is that in UNIXes you generally
ask the kernel to wait for file descriptors to change their readability or
writablity, while in Windows you wait for asynchronous functions to complete.
<p>
For example, instead of waiting for a socket to become writable and then
<a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/write.2.html"><code>write(2)</code></a>
to it, as you do in UNIX operating systems, you would rather <a
href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
a buffer and wait for it to have been sent.
<p>
The consequence of this different polling interface is that non-blocking
<code>write(2)</code> and <code>read(2)</code> (among other calls) are not
portable to Windows for high-performance servers.
<p>In UNIX nearly everything has a file descriptor and <code>read(2)</code>
and <code>write(2)</code> more or less work on all of them. This is a nice
abstraction but for non-blocking I/O it does not dig as deep as one would
like. The file system itself has no concept of non-blocking I/O&mdash;file
descriptors for on disk files cannot be polled for readability,
<code>read(2)</code> always has the possibility of blocking for an
indefinite amount of time. UNIX users should not snub the Windows async API,
in practice the explicit difference between sockets, pipes, on disk files,
and TTYs seems make usage more clear where as in UNIX they deceptively seem
seem like they should work similar but do not.
<p>
Almost every socket operation that you're familiar with has an
overlapped counter-part. The following section tries to pair Windows
overlapped I/O syscalls with non-blocking UNIX ones.
<h3>TCP Sockets</h3>
TCP Sockets are by far the most important stream to get right.
Servers should expect to be handling tens of thousands of these
per thread, concurrently. This is possible with overlapped I/O in Windows if
one is careful to avoid UNIX-ism like file descriptors. (Windows has a
hard limit of 2048 open file descriptors&mdash;see
<a
href="http://msdn.microsoft.com/en-us/library/6e3b887c.aspx"><code>_setmaxstdio()</code></a>.)
<dl>
<dt><code>send(2)</code>, <code>write(2)</code></dt>
<dd>Windows: <a href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
</dd>
<dt><code>recv(2)</code>, <code>read(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms741688(v=VS.85).aspx"><code>WSARecv()</code></a>
</dd>
<dt><code>connect(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms737606(VS.85).aspx"><code>ConnectEx()</code></a>
<p>
Non-blocking <code>connect()</code> is has difficult semantics in
UNIX. The proper way to connect to a remote host is this: call
<code>connect(2)</code> while it returns
<code>EINPROGRESS</code> poll on the file descriptor for writablity.
Then use
<pre>int error;
socklen_t len = sizeof(int);
getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len);</pre>
A zero <code>error</code> indicates that the connection succeeded.
(Documented in <code>connect(2)</code> under <code>EINPROGRESS</code>
on the Linux man page.)
</dd>
<dt><code>accept(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms737524(v=VS.85).aspx"><code>AcceptEx()</code></a>
</dd>
<dt><code>sendfile(2)</code></dt>
<dd>
Windows: <a href="http://msdn.microsoft.com/en-us/library/ms740565(v=VS.85).aspx"><code>TransmitFile()</code></a>
<p> The exact API of <code>sendfile(2)</code> on UNIX has not been agreed
on yet. Each operating system does it slightly different. All
<code>sendfile(2)</code> implementations (except possibly FreeBSD?) are blocking
even on non-blocking sockets.
<ul>
<li><a href="http://www.kernel.org/doc/man-pages/online/pages/man2/sendfile.2.html">Linux <code>sendfile(2)</code></a>
<li><a href="http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2">FreeBSD <code>sendfile(2)</code></a>
<li><a href="http://www.manpagez.com/man/2/sendfile/">Darwin <code>sendfile(2)</code></a>
</ul>
Marc Lehmann has written <a
href="https://github.com/joyent/node/blob/2c185a9dfd3be8e718858b946333c433c375c295/deps/libeio/eio.c#L954-1080">a
portable version in libeio</a>.
</dd>
<dt><code>shutdown(2)</code>, graceful close, half-duplex connections</dt>
<dd>
<a
href="http://msdn.microsoft.com/en-us/library/ms738547(v=VS.85).aspx">Graceful
Shutdown, Linger Options, and Socket Closure</a>
<br/>
<a
href="http://msdn.microsoft.com/en-us/library/ms737757(VS.85).aspx"><code>DisconnectEx()</code></a>
</dd>
<dt><code>close(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms737582(v=VS.85).aspx"><code>closesocket()</code></a>
</dd>
The following are nearly same in Windows overlapped and UNIX
non-blocking sockets. The only difference is that the UNIX variants
take integer file descriptors while Windows uses <code>SOCKET</code>.
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms740496(v=VS.85).aspx"><code>sockaddr</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms737550(v=VS.85).aspx"><code>bind()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms738543(v=VS.85).aspx"><code>getsockname()</code></a>
</ul>
<h3>Named Pipes</h3>
Windows has "named pipes" which are more or less the same as <a
href="http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html"><code>AF_UNIX</code>
domain sockets</a>. <code>AF_UNIX</code> sockets exist in the file system
often looking like
<pre>/tmp/<i>pipename</i></pre>
Windows named pipes have a path, but they are not directly part of the file
system; instead they look like
<pre>\\.\pipe\<i>pipename</i></pre>
<dl>
<dt><code>socket(AF_UNIX, SOCK_STREAM, 0), bind(2), listen(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe()</code></a>
<p>Use <code>FILE_FLAG_OVERLAPPED</code>, <code>PIPE_TYPE_BYTE</code>,
<code>PIPE_NOWAIT</code>.
</dd>
<dt><code>send(2)</code>, <code>write(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365748(v=VS.85).aspx"><code>WriteFileEx()</code></a>
</dd>
<dt><code>recv(2)</code>, <code>read(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365468(v=VS.85).aspx"><code>ReadFileEx()</code></a>
</dd>
<dt><code>connect(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe()</code></a>
</dd>
<dt><code>accept(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/aa365146(v=VS.85).aspx"><code>ConnectNamedPipe()</code></a>
</dd>
</dl>
Examples:
<ul>
<li><a
href="http://msdn.microsoft.com/en-us/library/aa365601(v=VS.85).aspx">Named
Pipe Server Using Completion Routines</a>
<li><a
href="http://msdn.microsoft.com/en-us/library/aa365603(v=VS.85).aspx">Named
Pipe Server Using Overlapped I/O</a>
</ul>
<h3>On Disk Files</h3>
<p>
In UNIX file system files are not able to use non-blocking I/O. There are
some operating systems that have asynchronous I/O but it is not standard and
at least on Linux is done with pthreads in GNU libc. For this reason
applications designed to be portable across different UNIXes must manage a
thread pool for issuing file I/O syscalls.
<p>
The situation is better in Windows: true overlapped I/O is available when
reading or writing a stream of data to a file.
<dl>
<dt><code>write(2)</code></dt>
<dd> Windows:
<a href="http://msdn.microsoft.com/en-us/library/aa365748(v=VS.85).aspx"><code>WriteFileEx()</code></a>
<p>Solaris's event completion ports has true in-kernel async writes with <a
href="http://download.oracle.com/docs/cd/E19253-01/816-5171/aio-write-3rt/index.html">aio_write(3RT)</a>
</dd>
<dt><code>read(2)</code></dt>
<dd> Windows:
<a href="http://msdn.microsoft.com/en-us/library/aa365468(v=VS.85).aspx"><code>ReadFileEx()</code></a>
<p>Solaris's event completion ports has true in-kernel async reads with <a
href="http://download.oracle.com/docs/cd/E19253-01/816-5171/aio-read-3rt/index.html">aio_read(3RT)</a>
</dd>
</dl>
<h3>Console/TTY</h3>
<p>It is (usually?) possible to poll a UNIX TTY file descriptor for
readability or writablity just like a TCP socket&mdash;this is very helpful
and nice. In Windows the situation is worse, not only is it a completely
different API but there are not overlapped versions to read and write to the
TTY. Polling for readability can be accomplished by waiting in another
thread with <a
href="http://msdn.microsoft.com/en-us/library/ms685061(VS.85).aspx"><code>RegisterWaitForSingleObject()</code></a>.
<dl>
<dt><code>read(2)</code></dt>
<dd>
<a
href="http://msdn.microsoft.com/en-us/library/ms684958(v=VS.85).aspx"><code>ReadConsole()</code></a>
and
<a
href="http://msdn.microsoft.com/en-us/library/ms684961(v=VS.85).aspx"><code>ReadConsoleInput()</code></a>
do not support overlapped I/O and there are no overlapped
counter-parts. One strategy to get around this is
<pre><a href="http://msdn.microsoft.com/en-us/library/ms685061(VS.85).aspx">RegisterWaitForSingleObject</a>(&tty_wait_handle, tty_handle,
tty_want_poll, NULL, INFINITE, WT_EXECUTEINWAITTHREAD |
WT_EXECUTEONLYONCE)</pre>
which will execute <code>tty_want_poll()</code> in a different thread.
You can use this to notify the calling thread that
<code>ReadConsoleInput()</code> will not block.
</dd>
<dt><code>write(2)</code></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms687401(v=VS.85).aspx"><code>WriteConsole()</code></a>
is also blocking but this is probably acceptable.
</dd>
<dt><a
href="http://www.kernel.org/doc/man-pages/online/pages/man3/tcsetattr.3.html"><code>tcsetattr(3)</code></a></dt>
<dd>
<a href="http://msdn.microsoft.com/en-us/library/ms686033(VS.85).aspx"><code>SetConsoleMode()</code></a>
</dd>
</dl>
<h2 id="foot2">Links</h2>
<p>
tips
<ul>
<li> overlapped = non-blocking.
<li> There is no overlapped <a href="http://msdn.microsoft.com/en-us/library/ms738518(VS.85).aspx"><code>GetAddrInfoEx()</code></a> function. It seems Asynchronous Procedure Calls must be used instead.
<li> <a href=http://msdn.microsoft.com/en-us/library/ms740673(VS.85).aspx"><code>Windows Sockets 2</code></a>
</ul>
<p>
IOCP:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms686358(v=vs.85).aspx">Synchronization and Overlapped Input and Output</a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms741665(v=VS.85).aspx"><code>WSAOVERLAPPED</code> Structure</a>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms683209(v=VS.85).aspx"><code>GetOverlappedResult()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms683244(v=VS.85).aspx"><code>HasOverlappedIoCompleted()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa363792(v=vs.85).aspx"><code>CancelIoEx()</code></a>
&mdash; cancels an overlapped operation.
</ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms742203(v=vs.85).aspx"><code>WSASend()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms741688(v=VS.85).aspx"><code>WSARecv()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms737606(VS.85).aspx"><code>ConnectEx()</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms740565(v=VS.85).aspx"><code>TransmitFile()</code></a>
&mdash; an async <code>sendfile()</code> for windows.
<li><a href="http://msdn.microsoft.com/en-us/library/ms741565(v=VS.85).aspx"><code>WSADuplicateSocket()</code></a>
&mdash; describes how to share a socket between two processes.
<li><a href="http://msdn.microsoft.com/en-us/library/6e3b887c.aspx"><code>_setmaxstdio()</code></a>
&mdash; something like setting the maximum number of file decriptors
and <a
href="http://www.kernel.org/doc/man-pages/online/pages/man2/setrlimit.2.html"><code>setrlimit(3)</code></a>
AKA <code>ulimit -n</code>. Note the file descriptor limit on windows is
2048.
</ul>
<p>
APC:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms681951(v=vs.85).aspx">Asynchronous Procedure Calls</a>
<li><a href="http://msdn.microsoft.com/en-us/library/ms682016"><code>DNSQuery()</code></a>
&mdash; General purpose DNS query function like <code>res_query()</code> on UNIX.
</ul>
Pipes:
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365781(v=VS.85).aspx"><code>Pipe functions</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365150(VS.85).aspx"><code>CreateNamedPipe</code></a>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365144(v=VS.85).aspx"><code>CallNamedPipe</code></a>
&mdash; like <code>accept</code> is for UNIX pipes.
<li><a href="http://msdn.microsoft.com/en-us/library/aa365146(v=VS.85).aspx"><code>ConnectNamedPipe</code></a>
</ul>
Also useful:
<a
href="http://msdn.microsoft.com/en-us/library/xw1ew2f8(v=vs.80).aspx">Introduction
to Visual C++ for UNIX Users</a>
</body></html>