This clarifies the handling of server responses by folding the code for
the complicated protocols into their protocol handlers. This concerns
mainly HTTP and its bastard sibling RTSP.
The terms "read" and "write" are often used without clear context if
they refer to the connect or the client/application side of a
transfer. This PR uses "read/write" for operations on the client side
and "send/receive" for the connection, e.g. server side. If this is
considered useful, we can revisit renaming of further methods in another
PR.
Curl's protocol handler `readwrite()` method been changed:
```diff
- CURLcode (*readwrite)(struct Curl_easy *data, struct connectdata *conn,
- const char *buf, size_t blen,
- size_t *pconsumed, bool *readmore);
+ CURLcode (*write_resp)(struct Curl_easy *data, const char *buf, size_t blen,
+ bool is_eos, bool *done);
```
The name was changed to clarify that this writes reponse data to the
client side. The parameter changes are:
* `conn` removed as it always operates on `data->conn`
* `pconsumed` removed as the method needs to handle all data on success
* `readmore` removed as no longer necessary
* `is_eos` as indicator that this is the last call for the transfer
response (end-of-stream).
* `done` TRUE on return iff the transfer response is to be treated as
finished
This change affects many files only because of updated comments in
handlers that provide no implementation. The real change is that the
HTTP protocol handlers now provide an implementation.
The HTTP protocol handlers `write_resp()` implementation will get passed
**all** raw data of a server response for the transfer. The HTTP/1.x
formatted status and headers, as well as the undecoded response
body. `Curl_http_write_resp_hds()` is used internally to parse the
response headers and pass them on. This method is public as the RTSP
protocol handler also uses it.
HTTP/1.1 "chunked" transport encoding is now part of the general
*content encoding* writer stack, just like other encodings. A new flag
`CLIENTWRITE_EOS` was added for the last client write. This allows
writers to verify that they are in a valid end state. The chunked
decoder will check if it indeed has seen the last chunk.
The general response handling in `transfer.c:466` happens in function
`readwrite_data()`. This mainly operates now like:
```
static CURLcode readwrite_data(data, ...)
{
do {
Curl_xfer_recv_resp(data, buf)
...
Curl_xfer_write_resp(data, buf)
...
} while(interested);
...
}
```
All the response data handling is implemented in
`Curl_xfer_write_resp()`. It calls the protocol handler's `write_resp()`
implementation if available, or does the default behaviour.
All raw response data needs to pass through this function. Which also
means that anyone in possession of such data may call
`Curl_xfer_write_resp()`.
Closes#12480
To help users better understand where the URL (and denied scheme) comes
from. Also removed "in libcurl" from the message, since the disabling
can be done by the application.
The error message now says "not supported" or "disabled" depending on
why it was denied:
Protocol "hej" not supported
Protocol "http" disabled
And in redirects:
Protocol "hej" not supported (in redirect)
Protocol "http" disabled (in redirect)
Reported-by: Mauricio Scheffer
Fixes#12465Closes#12469
- add `SingleRequest->download_done` as indicator that
all download bytes have been received
- remove `stop_reading` bool from readwrite functions
- move excess body handling into client download writer
Closes#12371
Windows compilers define `_WIN32` automatically. Windows SDK headers
or build env defines `WIN32`, or we have to take care of it. The
agreement seems to be that `_WIN32` is the preferred practice here.
Make the source code rely on that to detect we're building for Windows.
Public `curl.h` was using `WIN32`, `__WIN32__` and `CURL_WIN32` for
Windows detection, next to the official `_WIN32`. After this patch it
only uses `_WIN32` for this. Also, make it stop defining `CURL_WIN32`.
There is a slight chance these break compatibility with Windows
compilers that fail to define `_WIN32`. I'm not aware of any obsolete
or modern compiler affected, but in case there is one, one possible
solution is to define this macro manually.
grepping for `WIN32` remains useful to discover Windows-specific code.
Also:
- extend `checksrc` to ensure we're not using `WIN32` anymore.
- apply minor formatting here and there.
- delete unnecessary checks for `!MSDOS` when `_WIN32` is present.
Co-authored-by: Jay Satiro
Reviewed-by: Daniel Stenberg
Closes#12376
- have common pattern of `if not match, continue`
- revert pages long if()s to return early
- move dead connection check to later since it may
be relatively expensive
- check multiuse also when NOT building with NGHTTP2
- for MULTIUSE bundles, verify that the inspected
connection indeed supports multiplexing when in use
(bundles may contain a mix of connection, afaict)
Closes#12373
Instead of a loop to scan over the potentially 30+ scheme names, this
uses a "perfect hash" table. This works fine because the set of schemes
is known and cannot change in a build. The hash algorithm and table size
is made to only make a single scheme index per table entry.
The perfect hash is generated by a separate tool (scripts/schemetable.c)
Closes#12347
Fixes:
```
./lib/url.c:178:56: warning: use of an empty initializer is a C2x extension [-Wc2x-extensions]
178 | static const struct Curl_handler * const protocols[] = {
| ^
./lib/url.c:178:56: warning: zero size arrays are an extension [-Wzero-length-array]
```
Closes#12344
Fixes:
```
./lib/url.c:456:35: error: no member named 'formp' in 'struct UrlState'
456 | Curl_mime_cleanpart(data->state.formp);
| ~~~~~~~~~~~ ^
```
Regression from 74b87a8af1#11682Closes#12343
1. Because the value is not strictly set with a setopt option.
2. Because otherwise when duping a handle when all the set.* fields are
first copied and an error happens (think out of memory mid-function),
the function would easily free the list *before* it was deep-copied,
which could lead to a double-free.
Closes#12323
To make it work properly with curl_easy_duphandle(). This, because
duphandle duplicates the entire 'UserDefined' struct by plain copy while
'hstslist' is a linked curl_list of file names. This would lead to a
double-free when the second of the two involved easy handles were
closed.
Closes#12315
- tunnel https proxy used for http: transfers does
no check if proxy-ssl configuration matches
- test cases added, test_10_12 fails on 8.4.0
Closes#12255
- perform connection cache matching against `data->set.ssl.primary`
and proxy counterpart
- fully clone connection ssl config only when connection is used
Closes#12237
- resolving is done for a connection, not for every transfer
- save create/dup/free of a cares channel for each transfer
- check values of setopt calls against a local channel if no
connection has been attached yet, when needed.
Closes#12198
- move definitions from content_encoding.h to sendf.h
- move create/cleanup/add code into sendf.c
- installed content_encoding writers will always be called
on Curl_client_write(CLIENTWRITE_BODY)
- Curl_client_cleanup() frees writers and tempbuffers from
paused transfers, irregardless of protocol
Closes#11908
- Fix netrc info message to use the generic ".netrc" filename if the
user did not specify a netrc location.
- Update --netrc doc to add that recent versions of curl on Windows
prefer .netrc over _netrc.
Before:
* Couldn't find host google.com in the (nil) file; using defaults
After:
* Couldn't find host google.com in the .netrc file; using defaults
Closes https://github.com/curl/curl/pull/11904
When the legacy CURLOPT_HTTPPOST option is used, it gets converted into
the modem mimpost struct at first use. This data is (now) kept for the
entire transfer and not only per single HTTP request. This re-enables
rewind in the beginning of the second request instead of in end of the
first, as brought by 1b39731.
The request struct is per-request data only.
Extend test 650 to verify.
Fixes#11680
Reported-by: yushicheng7788 on github
Closes#11682
It was previously unlimited by default, but that's not a sensible
default. While changing this has a remote risk of breaking an existing
use case, I figure it is more likely to actually save users from loops.
Closes#11581
HTTP/1 connections that are upgraded to HTTP/2 should not be picked up
for reuse and multiplexing by other handles until the 101 switching
process is completed.
Lots-of-debgging-by: Stefan Eissing
Reported-by: Richard W.M. Jones
Bug: https://curl.se/mail/lib-2023-07/0045.htmlCloses#11557
Make sure the user and password for the second request is taken from the
redirected-to URL.
Add test case 899 to verify.
Reported-by: James Lucas
Fixes#11410Closes#11412
- fix HTTP/2 check to not declare a connection dead when
the read attempt results in EAGAIN
- add H2-PROXY alive check as for HTTP/2 that was missing
and is needed
- add attach/detach around Curl_conn_is_alive() and remove
these in filter methods
- add checks for number of connections used in some test_10
proxy tunneling tests
Closes#11368
- Use CURL_FORMAT_CURL_OFF_T where %zd was erroneously used for some
curl_off_t variables.
- Use %zu where %zd was erroneously used for some size_t variables.
Prior to this change some of the Windows CI tests were failing because
in Windows 32-bit targets have a 32-bit size_t and a 64-bit curl_off_t.
When %zd was used for some curl_off_t variables then only the lower
32-bits was read and the upper 32-bits would be read for part or all of
the next specifier.
Fixes https://github.com/curl/curl/issues/11327
Closes https://github.com/curl/curl/pull/11321
- add an `id` long to Curl_easy, -1 on init
- once added to a multi (or its own multi), it gets
a non-negative number assigned by the connection cache
- `id` is unique among all transfers using the same
cache until reaching LONG_MAX where it will wrap
around. So, not unique eternally.
- CURLINFO_CONN_ID returns the connection id attached to
data or, if none present, data->state.lastconnect_id
- variables and type declared in tool for write out
Closes#11185
Out of 415 labels throughout the code base, 86 of those labels were
not at the start of the line. Which means labels always at the start of
the line is the favoured style overall with 329 instances.
Out of the 86 labels not at the start of the line:
* 75 were indented with the same indentation level of the following line
* 8 were indented with exactly one space
* 2 were indented with one fewer indentation level then the following
line
* 1 was indented with the indentation level of the following line minus
three space (probably unintentional)
Co-Authored-By: Viktor Szakats
Closes#11134
- expression 'hostptr' is always true
- a part of conditional expression is always true: proxypasswd
- expression 'proxyuser' is always true
- avoid multiple Curl_now() calls in allocate_conn
Ref: #10929Closes#10959
- move host checks together
- simplify the scheme parser loop and the end of host name parser
- avoid itermediate buffer storing in multiple places
- reduce scope for several variables
- skip the Curl_dyn_tail() call for speed
- detect IPv6 earlier and skip extra checks for such hosts
- normalize directly in dynbuf instead of itermediate buffer
- split out the IPv6 parser into its own funciton
- call the IPv6 parser directly for ipv6 addresses
- remove (unused) special treatment of % in host names
- junkscan() once in the beginning instead of scattered
- make junkscan return error code
- remove unused query management from dedotdotify()
- make Curl_parse_login_details use memchr
- more use of memchr() instead of strchr() and less strlen() calls
- make junkscan check and return the URL length
An optimized build runs one of my benchmark URL parsing programs ~41%
faster using this branch. (compared against the shipped 7.88.1 library
in Debian)
Closes#10935
As they are not driving transfers or any socket activity, the main loop
does not need to iterate over these handles. A performance improvement.
They are instead only held in their own separate lists.
'data->multi' is kept a pointer to the multi handle as long as the easy
handle is actually part of it even when the handle is moved to the
pending/msgsent lists. It needs to know which multi handle it belongs
to, if for example curl_easy_cleanup() is called before the handle is
removed from the multi handle.
Alll 'data->multi' pointers of handles still part of the multi handle
gets cleared by curl_multi_cleanup() which "orphans" all previously
attached easy handles.
This is take 2. The first version was reverted for the 8.0.1 release.
Assisted-by: Stefan Eissing
Closes#10801
Linked lists themselves do not carry any allocations, so for the lists
that do not have have a set destructor we can just skip the
Curl_llist_destroy() call and save CPU time.
Closes#10764