Previously the code would just do that for the path when extracting the
full URL, which made a subsequent curl_url_get() of the path to
(unexpectedly) still return it without the leading path.
Amend lib1560 to verify this. Clarify the curl_url_set() docs about it.
Bug: https://curl.se/mail/lib-2023-06/0015.htmlCloses#11272
Reported-by: Pedro Henrique
This reverts commit df6c2f7b54.
(It only keep the test case that checks redirection to an absolute URL
without hostname and CURLU_NO_AUTHORITY).
I originally wanted to make CURLU_ALLOW_SPACE accept spaces in the
hostname only because I thought
curl_url_set(CURLUPART_URL, CURLU_ALLOW_SPACE) was already accepting
them, and they were only not being accepted in the hostname when
curl_url_set(CURLUPART_URL) was used for a redirection.
That is not actually the case, urlapi never accepted hostnames with
spaces, and a hostname with a space in it never makes sense.
I probably misread the output of my original test when I they were
normally accepted when using CURLU_ALLOW_SPACE, and not redirecting.
Some other URL parsers seems to allow space in the host part of the URL,
e.g. both python3's urllib.parse module, and Chromium's javascript URL
object allow spaces (chromium percent escapes the spaces with %20),
(they also both ignore TABs, and other whitespace characters), but those
URLs with spaces in the hostname are useless, neither python3's requests
module nor Chromium's window.location can actually use them.
There is no reason to add support for URLs with spaces in the host,
since it was not a inconsistency bug; let's revert that patch before it
makes it into release. Sorry about that.
I also reverted the extra check for CURLU_NO_AUTHORITY since that does
not seem to be necessary, CURLU_NO_AUTHORITY already worked for
redirects.
Closes#11169
It can only be an IPv4 address if all parts are all digits and no more than
four parts, otherwise it is a host name. Even slightly wrong IPv4 will now be
passed through as a host name.
Regression from 17a15d8846 shipped in 8.1.0
Extended test 1560 accordingly.
Reported-by: Pavel Kalyugin
Fixes#11129Closes#11131
curl_url_set(uh, CURLUPART_URL, redirurl, flags) was not respecing
CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY in the host part of redirurl
when redirecting to an absolute URL.
Closes#11136
- move host checks together
- simplify the scheme parser loop and the end of host name parser
- avoid itermediate buffer storing in multiple places
- reduce scope for several variables
- skip the Curl_dyn_tail() call for speed
- detect IPv6 earlier and skip extra checks for such hosts
- normalize directly in dynbuf instead of itermediate buffer
- split out the IPv6 parser into its own funciton
- call the IPv6 parser directly for ipv6 addresses
- remove (unused) special treatment of % in host names
- junkscan() once in the beginning instead of scattered
- make junkscan return error code
- remove unused query management from dedotdotify()
- make Curl_parse_login_details use memchr
- more use of memchr() instead of strchr() and less strlen() calls
- make junkscan check and return the URL length
An optimized build runs one of my benchmark URL parsing programs ~41%
faster using this branch. (compared against the shipped 7.88.1 library
in Debian)
Closes#10935
A typical mistake would be to try to set "https://" - including the
separator - this is now rejected as that would then lead to
url_get(... URL...) would get an invalid URL extracted.
Extended test 1560 to verify.
Closes#10911
Using bad numbers in an IPv4 numerical address now returns
CURLUE_BAD_HOSTNAME.
I noticed while working on trurl and it was originally reported here:
https://github.com/curl/trurl/issues/78
Updated test 1560 accordingly.
Closes#10894
Meaning that it would wrongly still store the fragment using spaces
instead of %20 if allowing space while also asking for URL encoding.
Discovered when playing with trurl.
Added test to lib1560 to verify the fix.
Closes#10887
- sscanf() is rather complex and slow, strchr() much simpler
- the port number function does not need to fully verify the IPv6 address
anyway as it is done later in the hostname_check() function and doing
it twice is unnecessary.
Closes#10541
- they are mostly pointless in all major jurisdictions
- many big corporations and projects already don't use them
- saves us from pointless churn
- git keeps history for us
- the year range is kept in COPYING
checksrc is updated to allow non-year using copyright statements
Closes#10205
When CURLU_URLENCODE is set, the parser would mistreat the path
component if the URL was specified without a slash like in
http://local.test:80?-123
Extended test 1560 to reproduce and verify the fix.
Reported-by: Trail of Bits
Closes#9763
As per the documentation :
> Setting a part to a NULL pointer will effectively remove that
> part's contents from the CURLU handle.
But currently clearing CURLUPART_URL does nothing and returns
CURLUE_OK. This change will clear all parts of the URL at once.
Closes#9028
Add licensing and copyright information for all files in this repository. This
either happens in the file itself as a comment header or in the file
`.reuse/dep5`.
This commit also adds a Github workflow to check pull requests and adapts
copyright.pl to the changes.
Closes#8869
Previously, the return code CURLUE_MALFORMED_INPUT was used for almost
30 different URL format violations. This made it hard for users to
understand why a particular URL was not acceptable. Since the API cannot
point out a specific position within the URL for the problem, this now
instead introduces a number of additional and more fine-grained error
codes to allow the API to return more exactly in what "part" or section
of the URL a problem was detected.
Also bug-fixes curl_url_get() with CURLUPART_ZONEID, which previously
returned CURLUE_OK even if no zoneid existed.
Test cases in 1560 have been adjusted and extended. Tests 1538 and 1559
have been updated.
Updated libcurl-errors.3 and curl_url_strerror() accordingly.
Closes#8049
file URLs that are 6 bytes or shorter are not complete. Return
CURLUE_MALFORMED_INPUT for those. Extended test 1560 to verify.
Triggered by #8041Closes#8042
The host name is stored decoded and can be encoded when used to extract
the full URL. By default when extracting the URL, the host name will not
be URL encoded to work as similar as possible as before. When not URL
encoding the host name, the '%' character will however still be encoded.
Getting the URL with the CURLU_URLENCODE flag set will percent encode
the host name part.
As a bonus, setting the host name part with curl_url_set() no longer
accepts a name that contains space, CR or LF.
Test 1560 has been extended to verify percent encodings.
Reported-by: Noam Moshe
Reported-by: Sharon Brizinov
Reported-by: Raul Onitza-Klugman
Reported-by: Kirill Efimov
Fixes#7830Closes#7834
- file://host.name/path/file.txt is a valid UNC path
\\host.name\path\files.txt to a non-local file transformed into URI
(RFC 8089 Appendix E.3)
- UNC paths on other OSs must be smb: URLs
Closes#7366
Add curl_url_strerror() to convert CURLUcode into readable string and
facilitate easier troubleshooting in programs using URL API.
Extend CURLUcode with CURLU_LAST for iteration in unit tests.
Update man pages with a mention of new function.
Update example code and tests with new functionality where it fits.
Closes#7605
They were never officially allowed and slipped in only due to sloppy
parsing. Spaces (ascii 32) should be correctly encoded (to %20) before
being part of a URL.
The new flag bit CURLU_ALLOW_SPACE when a full URL is set, makes libcurl
allow spaces.
Updated test 1560 to verify.
Closes#7073
When the host name in a URL is given as an IPv4 numerical address, the
address can be specified with dotted numericals in four different ways:
a32, a.b24, a.b.c16 or a.b.c.d and each part can be specified in
decimal, octal (0-prefixed) or hexadecimal (0x-prefixed).
Instead of passing on the name as-is and leaving the handling to the
underlying name functions, which made them not work with c-ares but work
with getaddrinfo, this change now makes the curl URL API itself detect
and "normalize" host names specified as IPv4 numericals.
The WHATWG URL Spec says this is an okay way to specify a host name in a
URL. RFC 3896 does not allow them, but curl didn't prevent them before
and it seems other RFC 3896-using tools have not either. Host names used
like this are widely supported by other tools as well due to the
handling being done by getaddrinfo and friends.
I decided to add the functionality into the URL API itself so that all
users of these functions get the benefits, when for example wanting to
compare two URLs. Also, it makes curl built to use c-ares now support
them as well and make curl builds more consistent.
The normalization makes HTTPS and virtual hosted HTTP work fine even
when curl gets the address specified using one of the "obscure" formats.
Test 1560 is extended to verify.
Fixes#6863Closes#6871
configure --enable-debug now enables -Wassign-enum with clang,
identifying several enum "abuses" also fixed.
Reported-by: Gisle Vanem
Bug: 879007f811 (commitcomment-42087553)Closes#5929