diff --git a/docs/libcurl-the-guide b/docs/libcurl-the-guide
deleted file mode 100644
index ba2fb9d2fd..0000000000
--- a/docs/libcurl-the-guide
+++ /dev/null
@@ -1,1191 +0,0 @@
-$Id$
-                                  _   _ ____  _
-                              ___| | | |  _ \| |
-                             / __| | | | |_) | |
-                            | (__| |_| |  _ <| |___
-                             \___|\___/|_| \_\_____|
-
-PROGRAMMING WITH LIBCURL
-
-About this Document
-
- This document attempts to describe the general principles and some basic
- approaches to consider when programming with libcurl. The text will focus
- mainly on the C interface but might apply fairly well to other interfaces as
- well, as they usually follow the C one pretty closely.
-
- This document will refer to 'the user' as the person writing the source code
- that uses libcurl. That would probably be you or someone in your position.
- What will be generally referred to as 'the program' will be the collected
- source code that you write that is using libcurl for transfers. The program
- is outside libcurl and libcurl is outside of the program.
-
- For more details on all options and functions described herein, please
- refer to their respective man pages.
-
-Building
-
- There are many different ways to build C programs. This chapter will assume a
- unix-style build process. If you use a different build system, you can still
- read this to get general information that may apply to your environment as
- well.
-
- Compiling the Program
-
-  Your compiler needs to know where the libcurl headers are located.
-  Therefore you must set your compiler's include path to point to the
-  directory where you installed them. The 'curl-config'[3] tool can be used
-  to get this information:
-
-      $ curl-config --cflags
-
- Linking the Program with libcurl
-
-  Once you have compiled the program, you need to link your object files to
-  create a single executable. For that to succeed, you need to link with
-  libcurl and possibly also with other libraries that libcurl itself depends
-  on.
-  Such libraries may include OpenSSL, and even some standard OS libraries may
-  be needed on the command line. To figure out which flags to use, once again
-  the 'curl-config' tool comes to the rescue:
-
-      $ curl-config --libs
-
- SSL or Not
-
-  libcurl can be built and customized in many ways. One of the things that
-  varies between different libraries and builds is the support for SSL-based
-  transfers, like HTTPS and FTPS. If OpenSSL was detected properly at
-  build-time, libcurl will be built with SSL support. To figure out if an
-  installed libcurl has been built with SSL support enabled, use
-  'curl-config' like this:
-
-      $ curl-config --feature
-
-  And if SSL is supported, the keyword 'SSL' will be written to stdout,
-  possibly together with a few other features that can be on and off on
-  different libcurls.
-
-  See also the "Features libcurl Provides" chapter further down.
-
-
-Portable Code in a Portable World
-
- The people behind libcurl have put considerable effort into making libcurl
- work on a large number of different operating systems and environments.
-
- You program libcurl the same way on all platforms that libcurl runs on. There
- are only a very few minor considerations that differ. If you just make sure
- to write your code portable enough, you may very well create yourself a very
- portable program. libcurl shouldn't stop you from that.
-
-
-Global Preparation
-
- The program must initialize some of the libcurl functionality globally. That
- means it should be done exactly once, no matter how many times you intend to
- use the library: once for your program's entire lifetime. This is done using
-
-    curl_global_init()
-
- and it takes one parameter which is a bit pattern that tells libcurl what to
- initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal
- sub modules, and might be a good default option. The two bits currently
- specified are:
-
-  CURL_GLOBAL_WIN32, which only does anything on Windows machines.
-  When used on a Windows machine, it'll make libcurl initialize the win32
-  socket layer (Winsock). Without having that initialized properly, your
-  program cannot use sockets properly. You should only do this once for each
-  application, so if your program already does this or another library in
-  use does it, you should not tell libcurl to do this as well.
-
-  CURL_GLOBAL_SSL, which only does anything on libcurl builds with SSL
-  support enabled. On these systems, this will make libcurl initialize
-  OpenSSL properly for this application. This only needs to be done once for
-  each application, so if your program or another library already does this,
-  this bit should not be needed.
-
- libcurl has a default protection mechanism that detects if curl_global_init()
- hasn't been called by the time curl_easy_perform() is called and if that is
- the case, libcurl runs the function itself with a guessed bit pattern. Please
- note that depending solely on this is neither nice nor very good practice.
-
- When the program no longer uses libcurl, it should call
- curl_global_cleanup(), which is the opposite of the init call. It will then
- do the reverse operations to clean up the resources the curl_global_init()
- call initialized.
-
- Repeated calls to curl_global_init() and curl_global_cleanup() should be
- avoided. They should only be called once each.
-
-
-Features libcurl Provides
-
- It is considered best practice to determine libcurl features at run-time
- rather than at build-time (if possible, of course). By calling
- curl_version_info() and checking out the details of the returned struct,
- your program can figure out exactly what the currently running libcurl
- supports.
-
-
-Handle the Easy libcurl
-
- libcurl first introduced the so-called easy interface. All operations in the
- easy interface are prefixed with 'curl_easy'.
-
- Recent libcurl versions also offer the multi interface.
- More about that interface, what it is targeted for and how to use it is
- detailed in a separate chapter further down. You still need to understand the
- easy interface first, so please continue reading for better understanding.
-
- To use the easy interface, you must first create yourself an easy handle. You
- need one handle for each easy session you want to perform. Basically, you
- should use one handle for every thread you plan to use for transferring. You
- must never share the same handle in multiple threads.
-
- Get an easy handle with
-
-    easyhandle = curl_easy_init();
-
- It returns an easy handle. Using that you proceed to the next step: setting
- up your preferred actions. A handle is just a logical entity for the upcoming
- transfer or series of transfers.
-
- You set properties and options for this handle using curl_easy_setopt(). They
- control how the subsequent transfer or transfers will be made. Options remain
- set in the handle until set again to something different. Thus, multiple
- requests using the same handle will use the same options.
-
- Many of the options you set in libcurl are "strings", pointers to data
- terminated with a zero byte. Keep in mind that when you set strings with
- curl_easy_setopt(), libcurl will not copy the data. It will merely point to
- the data. You MUST make sure that the data remains available for libcurl to
- use until finished or until you use the same option again to point to
- something else.
-
- One of the most basic properties to set in the handle is the URL. You set
- your preferred URL to transfer with CURLOPT_URL in a manner similar to:
-
-    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/");
-
- Let's assume for a while that you want to receive data as the URL identifies
- a remote resource you want to get here.
- Since you are writing an application that needs this transfer, I assume that
- you would like to get the data passed to you directly instead of simply
- getting it passed to stdout. So, you write your own function that matches
- this prototype:
-
-    size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);
-
- You tell libcurl to pass all data to this function by issuing a function call
- similar to this:
-
-    curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);
-
- You can control what data your function gets in the fourth argument by
- setting another property:
-
-    curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct);
-
- Using that property, you can easily pass local data between your application
- and the function that gets invoked by libcurl. libcurl itself won't touch the
- data you pass with CURLOPT_FILE.
-
- libcurl offers its own default internal callback that'll take care of the
- data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then
- simply output the received data to stdout. You can have the default callback
- write the data to a different file handle by passing a 'FILE *' to a file
- opened for writing with the CURLOPT_FILE option.
-
- Now, we need to take a step back and take a deep breath. Here's one of those
- rare platform-dependent nitpicks. Did you spot it? On some platforms[2],
- libcurl won't be able to operate on files opened by the program. Thus, if you
- use the default callback and pass in an open file with CURLOPT_FILE, it
- will crash. You should therefore avoid this to make your program run fine
- virtually everywhere.
-
- There are of course many more options you can set, and we'll get back to a
- few of them later. Let's instead continue to the actual transfer:
-
-    success = curl_easy_perform(easyhandle);
-
- curl_easy_perform() will connect to the remote site, do the necessary
- commands and receive the transfer.
- Whenever it receives data, it calls the callback function we previously set.
- The function may get one byte at a time, or it may get many kilobytes at
- once. libcurl delivers as much as possible as often as possible. Your
- callback function should return the number of bytes it "took care of". If
- that is not exactly the same number of bytes that was passed to it, libcurl
- will abort the operation and return with an error code.
-
- When the transfer is complete, the function returns a return code that
- informs you if it succeeded in its mission or not. If a return code isn't
- enough for you, you can use the CURLOPT_ERRORBUFFER option to point libcurl
- to a buffer of yours where it'll store a human readable error message as
- well.
-
- If you then want to transfer another file, the handle is ready to be used
- again. Mind you, it is even preferred that you re-use an existing handle if
- you intend to make another transfer. libcurl will then attempt to re-use the
- previous connection.
-
-
-Multi-threading issues
-
- libcurl is completely thread safe, except for two issues: signals and alarm
- handlers. Signals are needed for a SIGPIPE handler, and the alarm() syscall
- is used to catch timeouts (mostly during DNS lookup).
-
- If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you are
- then of course using OpenSSL multi-threaded, and OpenSSL itself has a few
- requirements for this. Basically, you need to provide one or two functions
- to allow it to function properly. For all details, see this:
-
-    http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION
-
- When using multiple threads you should set the CURLOPT_NOSIGNAL option to
- TRUE for all handles. Everything will work fine except that timeouts are not
- honored during the DNS lookup, which you can work around by building libcurl
- with c-ares support. c-ares is a library that provides asynchronous name
- resolving. Unfortunately, c-ares does not yet support IPv6.
-
- Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe.
-
-When It Doesn't Work
-
- There will always be times when the transfer fails for some reason. You might
- have set the wrong libcurl option or misunderstood what the libcurl option
- actually does, or the remote server might return non-standard replies that
- confuse the library, which then confuses your program.
-
- There's one golden rule when these things occur: set the CURLOPT_VERBOSE
- option to TRUE. It'll cause the library to spew out all the protocol details
- it sends, some internal info and some received protocol data as well
- (especially when using FTP). If you're using HTTP, including the received
- headers in the output and studying them is also a clever way to get a better
- understanding of why the server behaves the way it does. Include headers in
- the normal body output with CURLOPT_HEADER set TRUE.
-
- Of course there are bugs left. We need to know about them to be able to fix
- them, so we're quite dependent on your bug reports! When you do report
- suspected bugs in libcurl, please include as many details as you possibly
- can: a protocol dump that CURLOPT_VERBOSE produces, library version, as much
- as possible of your code that uses libcurl, operating system name and
- version, compiler name and version etc.
-
- If CURLOPT_VERBOSE is not enough, you can increase the level of debug data
- your application receives by using the CURLOPT_DEBUGFUNCTION option.
-
- Getting some in-depth knowledge about the protocols involved is never wrong,
- and if you're trying to do funny things, you might very well understand
- libcurl and how to use it better if you study the appropriate RFC documents
- at least briefly.
-
-
-Upload Data to a Remote Site
-
- libcurl tries to keep a protocol independent approach to most transfers, thus
- uploading to a remote FTP site is very similar to uploading data to an HTTP
- server with a PUT request.
-
- Of course, first you either create an easy handle or you re-use an existing
- one. Then you set the URL to operate on just like before. This is the remote
- URL that we will now upload to.
-
- Since we are writing an application, we most likely want libcurl to get the
- upload data by asking us for it. To make it do that, we set the read callback
- and the custom pointer libcurl will pass to our read callback. The read
- callback should have a prototype similar to:
-
-    size_t function(char *bufptr, size_t size, size_t nitems, void *userp);
-
- Where bufptr is the pointer to a buffer we fill in with data to upload and
- size*nitems is the size of the buffer and therefore also the maximum amount
- of data we can return to libcurl in this call. The 'userp' pointer is the
- custom pointer we set to point to a struct of ours to pass private data
- between the application and the callback.
-
-    curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);
-
-    curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);
-
- Tell libcurl that we want to upload:
-
-    curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, 1L);
-
- A few protocols won't behave properly when uploads are done without any prior
- knowledge of the expected file size. So, set the upload file size using the
- CURLOPT_INFILESIZE_LARGE option for all known file sizes like this[1]:
-
-    /* in this example, file_size must be a curl_off_t variable */
-    curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);
-
- When you call curl_easy_perform() this time, it'll perform all the necessary
- operations and when it has invoked the upload it'll call your supplied
- callback to get the data to upload. The program should return as much data as
- possible in every invocation, as that is likely to make the upload perform as
- fast as possible. The callback should return the number of bytes it wrote in
- the buffer. Returning 0 will signal the end of the upload.
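Below is a minimal sketch of a read callback that matches the prototype above. It assumes the data to upload already sits in memory rather than in a file. The names struct upload_source and read_from_memory are hypothetical and chosen for this illustration; they are not part of libcurl. A pointer to the struct is what you would hand to libcurl with CURLOPT_INFILE. Each call copies at most size*nitems bytes and returns 0 once everything has been handed over, which signals the end of the upload.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical upload source for this sketch: the bytes to send are
   already in memory. A pointer to this struct is what would be passed
   to libcurl with CURLOPT_INFILE, and it comes back as 'userp'. */
struct upload_source {
  const char *data;  /* next byte to hand to libcurl */
  size_t remaining;  /* number of bytes not yet handed over */
};

/* Read callback with the prototype described above. It copies at most
   size*nitems bytes into libcurl's buffer and returns how many bytes it
   wrote there; returning 0 signals the end of the upload. */
size_t read_from_memory(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_source *src = (struct upload_source *)userp;
  size_t room = size * nitems;  /* capacity of the buffer for this call */
  size_t n = src->remaining < room ? src->remaining : room;

  if(n > 0) {
    memcpy(bufptr, src->data, n);
    src->data += n;
    src->remaining -= n;
  }
  return n;
}

/* Wiring it up (assumes an easy handle named 'easyhandle' as above):

     struct upload_source src = { "hello world", 11 };
     curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_from_memory);
     curl_easy_setopt(easyhandle, CURLOPT_INFILE, &src);
     curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, 1L);
     curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE,
                      (curl_off_t)src.remaining);
*/
```

The callback may return fewer bytes than the buffer can hold. libcurl keeps calling it until it returns 0, so the callback never needs to fill the whole buffer in one go. The size passed with CURLOPT_INFILESIZE_LARGE should match the total number of bytes the callback will eventually deliver.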
-
-
-Passwords
-
- Many protocols use or even require that a user name and password be
- provided to be able to download or upload the data of your choice. libcurl
- offers several ways to specify them.
-
- Most protocols support specifying the name and password in the URL itself.
- libcurl will detect this and use them accordingly. This is written like
- this:
-
-    protocol://user:password@example.com/path/
-
- If you need any odd characters in your user name or password, you should
- enter them URL encoded, as %XX where XX is a two-digit hexadecimal number.
-
- libcurl also provides options to set various passwords. The user name and
- password as shown embedded in the URL can instead be set with the
- CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to
- a string in the format "user:password", in a manner like this:
-
-    curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");
-
- Another case where name and password might be needed at times is for those
- users who need to authenticate themselves to a proxy they use. libcurl offers
- another option for this, CURLOPT_PROXYUSERPWD. It is used quite similarly to
- the CURLOPT_USERPWD option, like this:
-
-    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");
-
- There's a long-time unix "standard" way of storing ftp user names and
- passwords, namely in the $HOME/.netrc file. The file should be made private
- so that only the user may read it (see also the "Security Considerations"
- chapter), as it might contain the password in plain text. libcurl has the
- ability to use this file to figure out what set of user name and password to
- use for a particular host. As an extension to the normal functionality,
- libcurl also supports this file for non-FTP protocols such as HTTP.
- To make libcurl use this file, use the CURLOPT_NETRC option:
-
-    curl_easy_setopt(easyhandle, CURLOPT_NETRC, 1L);
-
- And a very basic example of how such a .netrc file may look:
-
-    machine myhost.mydomain.com
-    login userlogin
-    password secretword
-
- All these examples have been cases where the password has been optional, or
- at least you could leave it out and have libcurl attempt to do its job
- without it. There are times when the password isn't optional, like when
- you're using an SSL private key for secure transfers.
-
- To pass the known private key password to libcurl:
-
-    curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");
-
-
-HTTP Authentication
-
- The previous chapter showed how to set user name and password for getting
- URLs that require authentication. When using the HTTP protocol, there are
- many different ways a client can provide those credentials to the server and
- you can control which way libcurl will (attempt to) use. The default HTTP
- authentication method is called 'Basic', which sends the name and password
- in clear-text in the HTTP request, base64-encoded. This is insecure.
-
- At the time of this writing libcurl can be built to use: Basic, Digest, NTLM,
- Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which one to use
- with CURLOPT_HTTPAUTH as in:
-
-    curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
-
- And when you send authentication to a proxy, you can also set the
- authentication type the same way but instead with CURLOPT_PROXYAUTH:
-
-    curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM);
-
- Both these options allow you to set multiple types (by ORing them together),
- to make libcurl pick the most secure one out of the types the server/proxy
- claims to support.
- This method does however add a round-trip since libcurl must first ask the
- server what it supports:
-
-    curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH,
-                     CURLAUTH_DIGEST|CURLAUTH_BASIC);
-
- For convenience, you can use the 'CURLAUTH_ANY' define (instead of a list
- with specific types) which allows libcurl to use whatever method it wants.
-
- When asking for multiple types, libcurl will pick the available one it
- considers "best" in its own internal order of preference.
-
-
-HTTP POSTing
-
- We get many questions regarding how to issue HTTP POSTs with libcurl the
- proper way. This chapter will thus include examples using both versions of
- HTTP POST that libcurl supports.
-
- The first version is the simple POST, the most common version, that most HTML
- pages using the