Commit Graph

120 Commits

Author SHA1 Message Date
Viktor Szakats
1bc69df7b4
tidy-up: use more example domains
Also make use of the example TLD:
https://en.wikipedia.org/wiki/.example

Reviewed-by: Daniel Stenberg
Closes #11992
2023-09-29 18:25:56 +00:00
Jay Satiro
7a2421dbb7 escape: replace Curl_isunreserved with ISUNRESERVED
- Use the ALLCAPS version of the macro so that it is clear a macro is
  being called that evaluates the variable multiple times.

- Also capitalize macro isurlpuntcs => ISURLPUNTCS since it evaluates
  a variable multiple times.

This is a follow-up to 291d225a which changed Curl_isunreserved into an
alias macro for ISUNRESERVED. The problem is the former is not easily
identified as a macro by the caller, which could lead to a bug.

For example, ISUNRESERVED(*foo++) is easily identifiable as wrong but
Curl_isunreserved(*foo++) is not even though they both are the same.

Closes https://github.com/curl/curl/pull/11846
2023-09-14 03:07:45 -04:00
Daniel Stenberg
887b998e6e
urlapi: setting a blank URL ("") is not an ok URL
Test it in 1560
Fixes #11714
Reported-by: ad0p on github
Closes #11715
2023-08-23 23:24:16 +02:00
Daniel Stenberg
a281057091
urlapi: return CURLUE_BAD_HOSTNAME if puny2idn encoding fails
And document it. Only return out of memory when it actually is a memory
problem.

Pointed-out-by: Jacob Mealey
Closes #11674
2023-08-17 08:21:08 +02:00
Daniel Stenberg
c350069f64
urlapi: CURLU_PUNY2IDN - convert from punycode to IDN name
Asssisted-by: Jay Satiro
Closes #11655
2023-08-13 15:34:38 +02:00
Daniel Stenberg
49e2443186
urlapi: make sure zoneid is also duplicated in curl_url_dup
Add several curl_url_dup() tests to the general lib1560 test.

Reported-by: Rutger Broekhoff
Bug: https://curl.se/mail/lib-2023-07/0047.html
Closes #11549
2023-08-01 08:00:28 +02:00
Sergey
a21f318992
urlapi: fix heap buffer overflow
`u->path = Curl_memdup(path, pathlen + 1);` accesses bytes after the null-terminator.

```
==2676==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x04d48c75 at pc 0x0112708a bp 0x006fb7e0 sp 0x006fb3c4
READ of size 78 at 0x04d48c75 thread T0
    #0 0x1127089 in __asan_wrap_memcpy D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:840
    #1 0x1891a0e in Curl_memdup C:\actions-runner\_work\client\client\third_party\curl\lib\strdup.c:97
    #2 0x18db4b0 in parseurl C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1297
    #3 0x18db819 in parseurl_and_replace C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1342
    #4 0x18d6e39 in curl_url_set C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1790
    #5 0x1877d3e in parseurlandfillconn C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:1768
    #6 0x1871acf in create_conn C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:3403
    #7 0x186d8dc in Curl_connect C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:3888
    #8 0x1856b78 in multi_runsingle C:\actions-runner\_work\client\client\third_party\curl\lib\multi.c:1982
    #9 0x18531e3 in curl_multi_perform C:\actions-runner\_work\client\client\third_party\curl\lib\multi.c:2756
```

Closes #11560
2023-08-01 07:59:07 +02:00
Daniel Stenberg
dacd25888f
curl_url_set: enforce the max string length check for all parts
Update the docs and test 1559 accordingly

Closes #11273
2023-06-08 23:40:08 +02:00
Daniel Stenberg
3c9256c8a0
urlapi: have *set(PATH) prepend a slash if one is missing
Previously the code would just do that for the path when extracting the
full URL, which made a subsequent curl_url_get() of the path to
(unexpectedly) still return it without the leading path.

Amend lib1560 to verify this. Clarify the curl_url_set() docs about it.

Bug: https://curl.se/mail/lib-2023-06/0015.html
Closes #11272
Reported-by: Pedro Henrique
2023-06-08 16:08:45 +02:00
Daniel Stenberg
ba669d072d
urlapi: scheme starts with alpha
Add multiple tests to lib1560 to verify

Fixes #11249
Reported-by: ad0p on github
Closes #11250
2023-06-05 16:28:27 +02:00
Daniel Stenberg
6375a65433
urlapi: remove superfluous host name check
... as it is checked later more proper.

Closes #11195
2023-05-25 08:30:20 +02:00
Daniel Stenberg
127eb0d83a
misc: fix spelling mistakes
Reported-by: musvaage on github
Fixes #11171
Closes #11172
2023-05-23 10:42:09 +02:00
Emanuele Torre
eef076baa6
Revert "urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects"
This reverts commit df6c2f7b54.
(It only keep the test case that checks redirection to an absolute URL
without hostname and CURLU_NO_AUTHORITY).

I originally wanted to make CURLU_ALLOW_SPACE accept spaces in the
hostname only because I thought
curl_url_set(CURLUPART_URL, CURLU_ALLOW_SPACE) was already accepting
them, and they were only not being accepted in the hostname when
curl_url_set(CURLUPART_URL) was used for a redirection.

That is not actually the case, urlapi never accepted hostnames with
spaces, and a hostname with a space in it never makes sense.
I probably misread the output of my original test when I they were
normally accepted when using CURLU_ALLOW_SPACE, and not redirecting.

Some other URL parsers seems to allow space in the host part of the URL,
e.g. both python3's urllib.parse module, and Chromium's javascript URL
object allow spaces (chromium percent escapes the spaces with %20),
(they also both ignore TABs, and other whitespace characters), but those
URLs with spaces in the hostname are useless, neither python3's requests
module nor Chromium's window.location can actually use them.

There is no reason to add support for URLs with spaces in the host,
since it was not a inconsistency bug; let's revert that patch before it
makes it into release. Sorry about that.

I also reverted the extra check for CURLU_NO_AUTHORITY since that does
not seem to be necessary, CURLU_NO_AUTHORITY already worked for
redirects.

Closes #11169
2023-05-21 13:59:04 +02:00
Daniel Stenberg
92772e6d39
urlapi: allow numerical parts in the host name
It can only be an IPv4 address if all parts are all digits and no more than
four parts, otherwise it is a host name. Even slightly wrong IPv4 will now be
passed through as a host name.

Regression from 17a15d8846 shipped in 8.1.0

Extended test 1560 accordingly.

Reported-by: Pavel Kalyugin
Fixes #11129
Closes #11131
2023-05-19 16:01:26 +02:00
Emanuele Torre
df6c2f7b54
urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects
curl_url_set(uh, CURLUPART_URL, redirurl, flags)  was not respecing
CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY in the host part of redirurl
when redirecting to an absolute URL.

Closes #11136
2023-05-18 20:52:59 +02:00
Emanuele Torre
f198d33e8d
checksrc: disallow spaces before labels
Out of 415 labels throughout the code base, 86 of those labels were
not at the start of the line. Which means labels always at the start of
the line is the favoured style overall with 329 instances.

Out of the 86 labels not at the start of the line:
* 75 were indented with the same indentation level of the following line
* 8 were indented with exactly one space
* 2 were indented with one fewer indentation level then the following
  line
* 1 was indented with the indentation level of the following line minus
  three space (probably unintentional)

Co-Authored-By: Viktor Szakats

Closes #11134
2023-05-18 20:45:04 +02:00
Emanuele Torre
7f712399d5
checksrc: check for spaces before the colon of switch labels
Closes #11047
2023-04-27 23:26:50 +02:00
Daniel Stenberg
d567cca1de
checksrc: fix SPACEBEFOREPAREN for conditions starting with "*"
The open paren check wants to warn for spaces before open parenthesis
for if/while/for but also for any function call. In order to avoid
catching function pointer declarations, the logic allows a space if the
first character after the open parenthesis is an asterisk.

I also spotted what we did not include "switch" in the check but we should.

This check is a little lame, but we reduce this problem by not allowing
that space for if/while/for/switch.

Reported-by: Emanuele Torre
Closes #11044
2023-04-27 17:24:47 +02:00
Daniel Stenberg
b7b1846275
urlapi: make internal function start with Curl_
Curl_url_set_authority() it is.

Follow-up to acd82c8bfd

Closes #11035
2023-04-27 08:36:51 +02:00
Stefan Eissing
acd82c8bfd
tests/http: more tests with specific clients
- Makefile support for building test specific clients in tests/http/clients
- auto-make of clients when invoking pytest
- added test_09_02 for server PUSH_PROMISEs using clients/h2-serverpush
- added test_02_21 for lib based downloads and pausing/unpausing transfers

curl url parser:
- added internal method `curl_url_set_authority()` for setting the
  authority part of a url (used for PUSH_PROMISE)

http2:
- made logging of PUSH_PROMISE handling nicer

Placing python test requirements in requirements.txt files
- separate files to base test suite and http tests since use
  and module lists differ
- using the files in the gh workflows

websocket test cases, fixes for we and bufq
- bufq: account for spare chunks in space calculation
- bufq: reset chunks that are skipped empty
- ws: correctly encode frames with 126 bytes payload
- ws: update frame meta information on first call of collect
  callback that fills user buffer
- test client ws-data: some test/reporting improvements

Closes #11006
2023-04-26 23:24:46 +02:00
Daniel Stenberg
3f1d89ed24
urlapi: skip a pointless assign
It stores a null byte after already having confirmed there is a null
byte there. Detected by PVS.

Ref: #10929
Closes #10943
2023-04-13 14:36:28 +02:00
Daniel Stenberg
4cfa5bcc9a
urlapi: cleanups
- move host checks together
- simplify the scheme parser loop and the end of host name parser
- avoid itermediate buffer storing in multiple places
- reduce scope for several variables
- skip the Curl_dyn_tail() call for speed
- detect IPv6 earlier and skip extra checks for such hosts
- normalize directly in dynbuf instead of itermediate buffer
- split out the IPv6 parser into its own funciton
- call the IPv6 parser directly for ipv6 addresses
- remove (unused) special treatment of % in host names
- junkscan() once in the beginning instead of scattered
- make junkscan return error code
- remove unused query management from dedotdotify()
- make Curl_parse_login_details use memchr
- more use of memchr() instead of strchr() and less strlen() calls
- make junkscan check and return the URL length

An optimized build runs one of my benchmark URL parsing programs ~41%
faster using this branch. (compared against the shipped 7.88.1 library
in Debian)

Closes #10935
2023-04-13 08:41:40 +02:00
Daniel Stenberg
826e8011d5
urlapi: prevent setting invalid schemes with *url_set()
A typical mistake would be to try to set "https://" - including the
separator - this is now rejected as that would then lead to
url_get(... URL...) would get an invalid URL extracted.

Extended test 1560 to verify.

Closes #10911
2023-04-09 23:23:54 +02:00
Daniel Stenberg
17a15d8846
urlapi: detect and error on illegal IPv4 addresses
Using bad numbers in an IPv4 numerical address now returns
CURLUE_BAD_HOSTNAME.

I noticed while working on trurl and it was originally reported here:
https://github.com/curl/trurl/issues/78

Updated test 1560 accordingly.

Closes #10894
2023-04-06 09:02:00 +02:00
Daniel Stenberg
f042e1e75d
urlapi: URL encoding for the URL missed the fragment
Meaning that it would wrongly still store the fragment using spaces
instead of %20 if allowing space while also asking for URL encoding.

Discovered when playing with trurl.

Added test to lib1560 to verify the fix.

Closes #10887
2023-04-05 08:30:12 +02:00
rcombs
b1d735956f
urlapi: take const args in _dup and _get functions
Closes #10708
2023-03-08 15:38:26 +01:00
rcombs
95cb7d3166
urlapi: avoid mutating internals in getter routine
This was not intended.

Closes #10708
2023-03-08 15:38:18 +01:00
Daniel Stenberg
0a0c9b6dfa
urlapi: '%' is illegal in host names
Update test 1560 to verify

Ref: #10708
Closes #10711
2023-03-08 15:33:43 +01:00
Brad Spencer
ad4997e5b2
urlapi: parse IPv6 literals without ENABLE_IPV6
This makes the URL parser API stable and working the same way
independently of libcurl supporting IPv6 transfers or not.

Closes #10660
2023-03-03 10:05:08 +01:00
Daniel Stenberg
8b27799f8c
urlapi: do the port number extraction without using sscanf()
- sscanf() is rather complex and slow, strchr() much simpler

- the port number function does not need to fully verify the IPv6 address
  anyway as it is done later in the hostname_check() function and doing
  it twice is unnecessary.

Closes #10541
2023-02-17 16:21:26 +01:00
Pronyushkin Petr
2b46ce0313
urlapi: fix part of conditional expression is always true: qlen
Closes #10408
2023-02-06 08:53:07 +01:00
Daniel Stenberg
37554d7c07
urlapi: remove pathlen assignment
"Value stored to 'pathlen' is never read"

Follow-up to 804d5293f8

Reported-by: Kvarec Lezki

Closes #10405
2023-02-03 08:20:21 +01:00
Daniel Stenberg
63c53ea627
urlapi: skip the extra dedotdot alloc if no dot in path
Saves an allocation for many/most URLs.

Updates test 1395 accordingly

Closes #10403
2023-02-02 22:34:32 +01:00
Daniel Stenberg
7305ca63e2
urlapi: avoid Curl_dyn_addf() for hex outputs
Inspired by the recent fixes to escape.c, we should avoid calling
Curl_dyn_addf() in loops, perhaps in particular when adding something so
simple as %HH codes - for performance reasons. This change makes the
same thing for the URL parser's two URL-encoding loops.

Closes #10384
2023-02-01 23:05:51 +01:00
Daniel Stenberg
804d5293f8
urlapi: skip path checks if path is just "/"
As a miniscule optimization, treat a path of the length 1 as the same as
non-existing, as it can only be a single leading slash, and that's what
we do for no paths as well.

Closes #10385
2023-02-01 23:04:45 +01:00
Daniel Stenberg
2bc1d775f5
copyright: update all copyright lines and remove year ranges
- they are mostly pointless in all major jurisdictions
- many big corporations and projects already don't use them
- saves us from pointless churn
- git keeps history for us
- the year range is kept in COPYING

checksrc is updated to allow non-year using copyright statements

Closes #10205
2023-01-03 09:19:21 +01:00
Daniel Stenberg
901392cbb7
urlapi: add CURLU_PUNYCODE
Allows curl_url_get() get the punycode version of host names for the
host name and URL parts.

Extend test 1560 to verify.

Closes #10109
2022-12-26 23:29:23 +01:00
Daniel Stenberg
c20b35ddae
urlapi: reject more bad letters from the host name: &+()
Follow-up from eb0167ff7d

Extend test 1560 to verify

Closes #10096
2022-12-15 08:23:48 +01:00
Daniel Stenberg
b15ca64bb0
urlapi: remove two variable assigns
To please scan-build:

urlapi.c:1163:9: warning: Value stored to 'qlen' is never read
        qlen = Curl_dyn_len(&enc);
        ^      ~~~~~~~~~~~~~~~~~~
urlapi.c:1164:9: warning: Value stored to 'query' is never read
        query = u->query = Curl_dyn_ptr(&enc);
        ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow-up to 7d6cf06f57

Closes #9777
2022-10-21 11:00:18 +02:00
Daniel Stenberg
7d6cf06f57
urlapi: fix parsing URL without slash with CURLU_URLENCODE
When CURLU_URLENCODE is set, the parser would mistreat the path
component if the URL was specified without a slash like in
http://local.test:80?-123

Extended test 1560 to reproduce and verify the fix.

Reported-by: Trail of Bits

Closes #9763
2022-10-20 08:56:53 +02:00
12932
ddeec8feba
misc: nitpick grammar in comments/docs
because the 'u' in URL is actually a consonant *sound* it is only
correct to write "a URL"

sorry this is a bit nitpicky :P

https://english.stackexchange.com/questions/152/when-should-i-use-a-vs-an
https://www.techtarget.com/whatis/feature/Which-is-correct-a-URL-or-an-URL

Closes #9699
2022-10-12 11:32:43 +02:00
John Bampton
e80c4ff3d0
misc: fix spelling in docs and comments
also: remove outdated sentence

Closes #9644
2022-10-05 16:12:10 +02:00
Daniel Stenberg
eb0167ff7d
urlapi: reject more bad characters from the host name field
Extended test 1560 to verify

Report from the ongoing source code audit by Trail of Bits.

Closes #9608
2022-09-28 08:22:42 +02:00
Patrick Monnerat
9d51329047
setopt: use the handler table for protocol name to number conversions
This also returns error CURLE_UNSUPPORTED_PROTOCOL rather than
CURLE_BAD_FUNCTION_ARGUMENT when a listed protocol name is not found.

A new schemelen parameter is added to Curl_builtin_scheme() to support
this extended use.

Note that disabled protocols are not recognized anymore.

Tests adapted accordingly.

Closes #9472
2022-09-16 23:29:01 +02:00
Daniel Stenberg
846678541b
urlapi: detect scheme better when not guessing
When the parser is not allowed to guess scheme, it should consider the
word ending at the first colon to be the scheme, independently of number
of slashes.

The parser now checks that the scheme is known before it counts slashes,
to improve the error messge for URLs with unknown schemes and maybe no
slashes.

When following redirects, no scheme guessing is allowed and therefore
this change effectively prevents redirects to unknown schemes such as
"data".

Fixes #9503
2022-09-15 09:31:40 +02:00
Daniel Stenberg
f703cf971c
urlapi: leaner with fewer allocs
Slightly faster with more robust code. Uses fewer and smaller mallocs.

- remove two fields from the URL handle struct
- reduce copies and allocs
- use dynbuf buffers more instead of custom malloc + copies
- uses dynbuf to build the host name in reduces serial alloc+free within
  the same function.
- move dedotdotify into urlapi.c and make it static, not strdup the input
  and optimize it by checking for . and / before using strncmp
- remove a few strlen() calls
- add Curl_dyn_setlen() that can "trim" an existing dynbuf

Closes #9408
2022-09-07 10:21:45 +02:00
Daniel Stenberg
8dd95da35b
ctype: remove all use of <ctype.h>, use our own versions
Except in the test servers.

Closes #9433
2022-09-06 08:32:36 +02:00
Viktor Szakats
c9061f242b
misc: spelling fixes
Found using codespell 2.2.1.

Also delete the redundant protocol designator from an archive.org URL.

Reviewed-by: Daniel Stenberg
Closes #9403
2022-08-31 14:31:01 +00:00
Pierrick Charron
4bf2c231d7
urlapi: make curl_url_set(url, CURLUPART_URL, NULL, 0) clear all parts
As per the documentation :

> Setting a part to a NULL pointer will effectively remove that
> part's contents from the CURLU handle.

But currently clearing CURLUPART_URL does nothing and returns
CURLUE_OK. This change will clear all parts of the URL at once.

Closes #9028
2022-06-20 08:15:51 +02:00
max.mehl
ad9bc5976d
copyright: make repository REUSE compliant
Add licensing and copyright information for all files in this repository. This
either happens in the file itself as a comment header or in the file
`.reuse/dep5`.

This commit also adds a Github workflow to check pull requests and adapts
copyright.pl to the changes.

Closes #8869
2022-06-13 09:13:00 +02:00