Commit Graph

152 Commits

Author SHA1 Message Date
Daniel Stenberg
b696fc129b
lib: use Curl_str_number() for parsing decimal numbers
Instead of strtoul() and strtol() calls.

Easier API with better integer overflow detection and built-in max check
that now comes automatic everywhere this is used.

Closes #16319
2025-02-14 10:38:56 +01:00
Daniel Stenberg
44deccf907
urlapi: simplify junkscan
Makes it smaller and possibly somewhat faster

Closes #16307
2025-02-13 12:51:47 +01:00
Terence Eden
a042c67df3
docs: use valid example domain names
Replace .site domains and domain.com with valid example domains.

Fixes #16269
Closes #16270
2025-02-09 00:17:05 +01:00
Daniel Stenberg
150b0d808b
urlapi: cleanup the redirect logic somewhat
Closes #15877
2025-01-01 14:11:15 +01:00
Daniel Stenberg
66e5351e0a
urlapi: fix redirect to a new fragment or query (only)
The redirect logic was broken when the redirect-to URL was a relative
URL only as a fragment or query (starting with '#' or '?').

Extended test 1560 to reproduce, then verify.

Reported-by: Jeroen Ooms
Fixes #15836
Closes #15848
2024-12-30 08:23:26 +01:00
Daniel Stenberg
566a6d7b09
urlapi: normalize the IPv6 address
As the parsing and address "regeneration" are done anyway, we might as
well use the updated version in the result and thereby A) get a
normalized (and lower cased) version of the address and B) avoid a
strcpy().

Updated test 1560 to verify.

Closes #15143
2024-10-03 16:05:03 +02:00
Daniel Stenberg
fbf5d507ce
lib/src: white space edits to comply better with code style
... as checksrc now finds and complains about these.

Closes #14921
2024-09-19 14:59:12 +02:00
Viktor Szakats
56b0442e4c
urlapi: drop unused header
Closes #14867
2024-09-19 12:56:21 +02:00
Daniel Stenberg
d1394a00ea
urlapi: verify URL *decoded* hostname when set
It was previously wrongly verifying the input in its URL encoded format
when setting the hostname component with curl_url_set(), so it wrongly
rejected '%'.

Now it URL decodes the name appropriately before the check.

Added tests to lib1560 to verify that a fine %-code is okay and that a
bad %-code (that decodes to '%') is rejected.

Regression from 0a0c9b6dfa, shipped in 8.0.0

Fixes #14656
Reported-by: Venkat Krishna R
Closes #14657
2024-08-23 13:55:13 +02:00
Viktor Szakats
f81f351b9a
tidy-up: OS names
Use these words and casing more consistently across text, comments and
one curl tool output:
AIX, ALPN, ANSI, BSD, Cygwin, Darwin, FreeBSD, GitHub, HP-UX, Linux,
macOS, MS-DOS, MSYS, MinGW, NTLM, POSIX, Solaris, UNIX, Unix, Unicode,
WINE, WebDAV, Win32, winbind, WinIDN, Windows, Windows CE, Winsock.

Mostly OS names and a few more.

Also a couple of other minor text fixups.

Closes #14360
2024-08-04 19:17:45 +02:00
martinevsky
e22b509754
urlapi: remove unused definition of HOST_BAD
Closes #14235
2024-07-19 18:17:53 +02:00
Daniel Stenberg
c074ba64a8
code: language cleanup in comments
Based on the standards and guidelines we use for our documentation.

 - expand contractions (they're => they are etc)
 - host name = > hostname
 - file name => filename
 - user name = username
 - man page => manpage
 - run-time => runtime
 - set-up => setup
 - back-end => backend
 - a HTTP => an HTTP
 - Two spaces after a period => one space after period

Closes #14073
2024-07-01 22:58:55 +02:00
Viktor Szakats
72abf7c13a
lib: tidy up types and casts
Cherry-picked from #13489
Closes #13862
2024-06-05 14:02:39 +02:00
Daniel Stenberg
655d44d139
urlapi: add CURLU_NO_GUESS_SCHEME
Used for extracting:

- when used asking for a scheme, it will return CURLUE_NO_SCHEME if the
  stored information was a guess

- when used asking for a URL, the URL is returned without a scheme, like
  when previously given to the URL parser when it was asked to guess

- as soon as the scheme is set explicitly, it is no longer internally
  marked as guessed

The idea being:

1. allow a user to figure out if a URL's scheme was set as a result of
  guessing

2. extract the URL without a guessed scheme

3. this makes it work similar to how we already deal with port numbers

Extend test 1560 to verify.

Closes #13616
2024-06-01 23:51:42 +02:00
Daniel Stenberg
aef369867f
lib: call Curl_strntolower instead of doing crafted loops
Closes #13627
2024-05-14 08:00:19 +02:00
Daniel Stenberg
fe17c162d0
urlapi: allow setting port number zero
Also set and check errno when strtoul() parsing numbers for better error
checking.

Updated test 1560

Closes #13427
2024-04-19 23:54:21 +02:00
Daniel Stenberg
0a25b3e014
urlapi: remove unused flags argument from Curl_url_set_authority
The function is only called from a single place (for HTTP/2 server push)
so might as well just assume this fixed option every time.

Closes #13409
2024-04-18 22:24:33 +02:00
Daniel Stenberg
3eac21d86b
urlapi: add CURLU_GET_EMPTY for empty queries and fragments
By default the API inhibits empty queries and fragments extracted.
Unless this new flag is set.

This also makes the behavior more consistent: without it set, zero
length queries and fragments are considered not present in the URL. With
the flag set, they are returned as a zero length strings if they were in
fact present in the URL.

This applies when extracting the individual query and fragment
components and for the full URL.

Closes #13396
2024-04-18 10:37:28 +02:00
Daniel Stenberg
c37b694e46
urlapi: fix relative redirects to fragment-only
Using the URL API for a redirect URL when the redirected-to string
starts with a hash, ie is only a fragment, the API would produce the
wrong final URL.

Adjusted test 1560 to test for several new redirect cases.

Closes #13394
2024-04-17 22:41:47 +02:00
Viktor Szakats
e411c98f70
build: prefer USE_IPV6 macro internally (was: ENABLE_IPV6)
Before this patch, two macros were used to guard IPv6 features in curl
sources: `ENABLE_IPV6` and `USE_IPV6`. This patch makes the source use
the latter for consistency with other similar switches.

`-DENABLE_IPV6` remains accepted for compatibility as a synonym for
`-DUSE_IPV6`, when passed to the compiler.

`ENABLE_IPV6` also remains the name of the CMake and `Makefile.vc`
options to control this feature.

Closes #13349
2024-04-13 08:33:26 +00:00
Louis Solofrizzo
57446b67ba
lib: initialize output pointers to NULL before calling strto[ff,l,ul]
In order to make MSAN happy:

    ==2200945==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x596f3b3ed246 in curlx_strtoofft [...]/libcurl/src/lib/strtoofft.c:239:11
    #1 0x596f3b402156 in Curl_httpchunk_read [...]/libcurl/src/lib/http_chunks.c:149:12
    #2 0x596f3b348550 in readwrite_data [...]/libcurl/src/lib/transfer.c:607:11
    [...]

    ==2202041==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5a3fab66a72a in Curl_parse_port [...]/libcurl/src/lib/urlapi.c:547:8
    #1 0x5a3fab650645 in parse_authority [...]/libcurl/src/lib/urlapi.c:796:12
    #2 0x5a3fab6740f6 in parseurl [...]/libcurl/src/lib/urlapi.c:1176:16
    #3 0x5a3fab664fc5 in parseurl_and_replace [...]/libcurl/src/lib/urlapi.c:1342:12
    [...]

    ==2202320==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x569076a0d6b0 in ipv4_normalize [...]/libcurl/src/lib/urlapi.c:683:12
    #1 0x5690769f2820 in parse_authority [...]/libcurl/src/lib/urlapi.c:803:10
    #2 0x569076a160f6 in parseurl [...]/libcurl/src/lib/urlapi.c:1176:16
    #3 0x569076a06fc5 in parseurl_and_replace [...]/libcurl/src/lib/urlapi.c:1342:12
    [...]

Signed-off-by: Louis Solofrizzo <lsolofrizzo@scaleway.com>
Closes #12995
2024-02-26 17:19:27 +01:00
Daniel Stenberg
162113676a
urlapi: remove assert
This assert triggers wrongly when CURLU_GUESS_SCHEME and
CURLU_NO_AUTHORITY are both set and the URL is a single path.

I think this assert has played out its role. It was introduced in a
rather big refactor.

Follow-up to 4cfa5bcc9a

Reported-by: promptfuzz_ on hackerone
Closes #12775
2024-01-24 23:15:13 +01:00
Daniel Stenberg
f58e493e44
curl.h: add CURLE_TOO_LARGE
A new error code to be used when an internal field grows too large, like
when a dynbuf reaches its maximum. Previously it would return
CURLE_OUT_OF_MEMORY for this, which is highly misleading.

Ref: #12268
Closes #12269
2023-12-18 10:34:22 +01:00
Viktor Szakats
3829759bd0
build: enable missing OpenSSF-recommended warnings, with fixes
https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html
as of 2023-11-29 [1].

Enable new recommended warnings (except `-Wsign-conversion`):

- enable `-Wformat=2` for clang (in both cmake and autotools).
- add `CURL_PRINTF()` internal attribute and mark functions accepting
  printf arguments with it. This is a copy of existing
  `CURL_TEMP_PRINTF()` but using `__printf__` to make it compatible
  with redefinting the `printf` symbol:
  https://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcc_5.html#SEC94
- fix `CURL_PRINTF()` and existing `CURL_TEMP_PRINTF()` for
  mingw-w64 and enable it on this platform.
- enable `-Wimplicit-fallthrough`.
- enable `-Wtrampolines`.
- add `-Wsign-conversion` commented with a FIXME.
- cmake: enable `-pedantic-errors` the way we do it with autotools.
  Follow-up to d5c0351055 #2747
- lib/curl_trc.h: use `CURL_FORMAT()`, this also fixes it to enable format
  checks. Previously it was always disabled due to the internal `printf`
  macro.

Fix them:

- fix bug where an `set_ipv6_v6only()` call was missed in builds with
  `--disable-verbose` / `CURL_DISABLE_VERBOSE_STRINGS=ON`.
- add internal `FALLTHROUGH()` macro.
- replace obsolete fall-through comments with `FALLTHROUGH()`.
- fix fallthrough markups: Delete redundant ones (showing up as
  warnings in most cases). Add missing ones. Fix indentation.
- silence `-Wformat-nonliteral` warnings with llvm/clang.
- fix one `-Wformat-nonliteral` warning.
- fix new `-Wformat` and `-Wformat-security` warnings.
- fix `CURL_FORMAT_SOCKET_T` value for mingw-w64. Also move its
  definition to `lib/curl_setup.h` allowing use in `tests/server`.
- lib: fix two wrongly passed string arguments in log outputs.
  Co-authored-by: Jay Satiro
- fix new `-Wformat` warnings on mingw-w64.

[1] 56c0fde389/docs/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C%2B%2B.md

Closes #12489
2023-12-16 13:12:37 +00:00
Daniel Stenberg
7c992dd9f8
lib: rename Curl_strndup to Curl_memdup0 to avoid misunderstanding
Since the copy does not stop at a null byte, let's not call it anything
that makes you think it works like the common strndup() function.

Based on feedback from Jay Satiro, Stefan Eissing and Patrick Monnerat

Closes #12490
2023-12-08 17:22:33 +01:00
Viktor Szakats
e9a7d4a1c8
windows: use built-in _WIN32 macro to detect Windows
Windows compilers define `_WIN32` automatically. Windows SDK headers
or build env defines `WIN32`, or we have to take care of it. The
agreement seems to be that `_WIN32` is the preferred practice here.
Make the source code rely on that to detect we're building for Windows.

Public `curl.h` was using `WIN32`, `__WIN32__` and `CURL_WIN32` for
Windows detection, next to the official `_WIN32`. After this patch it
only uses `_WIN32` for this. Also, make it stop defining `CURL_WIN32`.

There is a slight chance these break compatibility with Windows
compilers that fail to define `_WIN32`. I'm not aware of any obsolete
or modern compiler affected, but in case there is one, one possible
solution is to define this macro manually.

grepping for `WIN32` remains useful to discover Windows-specific code.

Also:

- extend `checksrc` to ensure we're not using `WIN32` anymore.

- apply minor formatting here and there.

- delete unnecessary checks for `!MSDOS` when `_WIN32` is present.

Co-authored-by: Jay Satiro
Reviewed-by: Daniel Stenberg

Closes #12376
2023-11-22 15:42:25 +00:00
Sam James
bc8509a748
misc: fix -Walloc-size warnings
GCC 14 introduces a new -Walloc-size included in -Wextra which gives:

```
src/tool_operate.c: In function ‘add_per_transfer’:
src/tool_operate.c:213:5: warning: allocation of insufficient size ‘1’ for type ‘struct per_transfer’ with size ‘480’ [-Walloc-size]
  213 |   p = calloc(sizeof(struct per_transfer), 1);
      |     ^
src/var.c: In function ‘addvariable’:
src/var.c:361:5: warning: allocation of insufficient size ‘1’ for type ‘struct var’ with size ‘32’ [-Walloc-size]
  361 |   p = calloc(sizeof(struct var), 1);
      |     ^
```

The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
    ```

So, just swap the number of members and size arguments to match the
prototype, as we're initialising 1 struct of size `sizeof(struct
...)`. GCC then sees we're not doing anything wrong.

Closes #12292
2023-11-11 23:35:47 +01:00
Daniel Stenberg
d3b3ba35a5
lib: add and use Curl_strndup()
The Curl_strndup() function is similar to memdup(), but copies 'n' bytes
then adds a terminating null byte ('\0').

Closes #12251
2023-11-02 20:35:20 +01:00
Daniel Stenberg
5c846a12a3
urlapi: when URL encoding the fragment, pass in the right length
A benign bug because it would only add an extra null terminator.

Made lib1560 get a test that runs this code.

Closes #12250
2023-11-02 16:23:17 +01:00
Daniel Stenberg
ffbc9981c4
urlapi: skip appending NULL pointer query
Reported-by: kirbyn17 on hackerone

Closes #12240
2023-11-01 10:55:55 +01:00
Daniel Stenberg
c64d0d67fd
urlapi: avoid null deref if setting blank host to url encode
Reported-by: kirbyn17 on hackerone

Closes #12240
2023-11-01 10:55:46 +01:00
Stefan Eissing
39547ae64d
url: protocol handler lookup tidy-up
- rename lookup to what it does
- use ARRAYSIZE instead of NULL check for end
- offer alternate lookup for 0-terminated strings

Closes #12216
2023-10-27 16:55:54 +02:00
Viktor Szakats
1bc69df7b4
tidy-up: use more example domains
Also make use of the example TLD:
https://en.wikipedia.org/wiki/.example

Reviewed-by: Daniel Stenberg
Closes #11992
2023-09-29 18:25:56 +00:00
Jay Satiro
7a2421dbb7 escape: replace Curl_isunreserved with ISUNRESERVED
- Use the ALLCAPS version of the macro so that it is clear a macro is
  being called that evaluates the variable multiple times.

- Also capitalize macro isurlpuntcs => ISURLPUNTCS since it evaluates
  a variable multiple times.

This is a follow-up to 291d225a which changed Curl_isunreserved into an
alias macro for ISUNRESERVED. The problem is the former is not easily
identified as a macro by the caller, which could lead to a bug.

For example, ISUNRESERVED(*foo++) is easily identifiable as wrong but
Curl_isunreserved(*foo++) is not even though they both are the same.

Closes https://github.com/curl/curl/pull/11846
2023-09-14 03:07:45 -04:00
Daniel Stenberg
887b998e6e
urlapi: setting a blank URL ("") is not an ok URL
Test it in 1560
Fixes #11714
Reported-by: ad0p on github
Closes #11715
2023-08-23 23:24:16 +02:00
Daniel Stenberg
a281057091
urlapi: return CURLUE_BAD_HOSTNAME if puny2idn encoding fails
And document it. Only return out of memory when it actually is a memory
problem.

Pointed-out-by: Jacob Mealey
Closes #11674
2023-08-17 08:21:08 +02:00
Daniel Stenberg
c350069f64
urlapi: CURLU_PUNY2IDN - convert from punycode to IDN name
Asssisted-by: Jay Satiro
Closes #11655
2023-08-13 15:34:38 +02:00
Daniel Stenberg
49e2443186
urlapi: make sure zoneid is also duplicated in curl_url_dup
Add several curl_url_dup() tests to the general lib1560 test.

Reported-by: Rutger Broekhoff
Bug: https://curl.se/mail/lib-2023-07/0047.html
Closes #11549
2023-08-01 08:00:28 +02:00
Sergey
a21f318992
urlapi: fix heap buffer overflow
`u->path = Curl_memdup(path, pathlen + 1);` accesses bytes after the null-terminator.

```
==2676==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x04d48c75 at pc 0x0112708a bp 0x006fb7e0 sp 0x006fb3c4
READ of size 78 at 0x04d48c75 thread T0
    #0 0x1127089 in __asan_wrap_memcpy D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:840
    #1 0x1891a0e in Curl_memdup C:\actions-runner\_work\client\client\third_party\curl\lib\strdup.c:97
    #2 0x18db4b0 in parseurl C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1297
    #3 0x18db819 in parseurl_and_replace C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1342
    #4 0x18d6e39 in curl_url_set C:\actions-runner\_work\client\client\third_party\curl\lib\urlapi.c:1790
    #5 0x1877d3e in parseurlandfillconn C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:1768
    #6 0x1871acf in create_conn C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:3403
    #7 0x186d8dc in Curl_connect C:\actions-runner\_work\client\client\third_party\curl\lib\url.c:3888
    #8 0x1856b78 in multi_runsingle C:\actions-runner\_work\client\client\third_party\curl\lib\multi.c:1982
    #9 0x18531e3 in curl_multi_perform C:\actions-runner\_work\client\client\third_party\curl\lib\multi.c:2756
```

Closes #11560
2023-08-01 07:59:07 +02:00
Daniel Stenberg
dacd25888f
curl_url_set: enforce the max string length check for all parts
Update the docs and test 1559 accordingly

Closes #11273
2023-06-08 23:40:08 +02:00
Daniel Stenberg
3c9256c8a0
urlapi: have *set(PATH) prepend a slash if one is missing
Previously the code would just do that for the path when extracting the
full URL, which made a subsequent curl_url_get() of the path to
(unexpectedly) still return it without the leading path.

Amend lib1560 to verify this. Clarify the curl_url_set() docs about it.

Bug: https://curl.se/mail/lib-2023-06/0015.html
Closes #11272
Reported-by: Pedro Henrique
2023-06-08 16:08:45 +02:00
Daniel Stenberg
ba669d072d
urlapi: scheme starts with alpha
Add multiple tests to lib1560 to verify

Fixes #11249
Reported-by: ad0p on github
Closes #11250
2023-06-05 16:28:27 +02:00
Daniel Stenberg
6375a65433
urlapi: remove superfluous host name check
... as it is checked later more proper.

Closes #11195
2023-05-25 08:30:20 +02:00
Daniel Stenberg
127eb0d83a
misc: fix spelling mistakes
Reported-by: musvaage on github
Fixes #11171
Closes #11172
2023-05-23 10:42:09 +02:00
Emanuele Torre
eef076baa6
Revert "urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects"
This reverts commit df6c2f7b54.
(It only keep the test case that checks redirection to an absolute URL
without hostname and CURLU_NO_AUTHORITY).

I originally wanted to make CURLU_ALLOW_SPACE accept spaces in the
hostname only because I thought
curl_url_set(CURLUPART_URL, CURLU_ALLOW_SPACE) was already accepting
them, and they were only not being accepted in the hostname when
curl_url_set(CURLUPART_URL) was used for a redirection.

That is not actually the case, urlapi never accepted hostnames with
spaces, and a hostname with a space in it never makes sense.
I probably misread the output of my original test when I they were
normally accepted when using CURLU_ALLOW_SPACE, and not redirecting.

Some other URL parsers seems to allow space in the host part of the URL,
e.g. both python3's urllib.parse module, and Chromium's javascript URL
object allow spaces (chromium percent escapes the spaces with %20),
(they also both ignore TABs, and other whitespace characters), but those
URLs with spaces in the hostname are useless, neither python3's requests
module nor Chromium's window.location can actually use them.

There is no reason to add support for URLs with spaces in the host,
since it was not a inconsistency bug; let's revert that patch before it
makes it into release. Sorry about that.

I also reverted the extra check for CURLU_NO_AUTHORITY since that does
not seem to be necessary, CURLU_NO_AUTHORITY already worked for
redirects.

Closes #11169
2023-05-21 13:59:04 +02:00
Daniel Stenberg
92772e6d39
urlapi: allow numerical parts in the host name
It can only be an IPv4 address if all parts are all digits and no more than
four parts, otherwise it is a host name. Even slightly wrong IPv4 will now be
passed through as a host name.

Regression from 17a15d8846 shipped in 8.1.0

Extended test 1560 accordingly.

Reported-by: Pavel Kalyugin
Fixes #11129
Closes #11131
2023-05-19 16:01:26 +02:00
Emanuele Torre
df6c2f7b54
urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects
curl_url_set(uh, CURLUPART_URL, redirurl, flags)  was not respecing
CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY in the host part of redirurl
when redirecting to an absolute URL.

Closes #11136
2023-05-18 20:52:59 +02:00
Emanuele Torre
f198d33e8d
checksrc: disallow spaces before labels
Out of 415 labels throughout the code base, 86 of those labels were
not at the start of the line. Which means labels always at the start of
the line is the favoured style overall with 329 instances.

Out of the 86 labels not at the start of the line:
* 75 were indented with the same indentation level of the following line
* 8 were indented with exactly one space
* 2 were indented with one fewer indentation level then the following
  line
* 1 was indented with the indentation level of the following line minus
  three space (probably unintentional)

Co-Authored-By: Viktor Szakats

Closes #11134
2023-05-18 20:45:04 +02:00
Emanuele Torre
7f712399d5
checksrc: check for spaces before the colon of switch labels
Closes #11047
2023-04-27 23:26:50 +02:00
Daniel Stenberg
d567cca1de
checksrc: fix SPACEBEFOREPAREN for conditions starting with "*"
The open paren check wants to warn for spaces before open parenthesis
for if/while/for but also for any function call. In order to avoid
catching function pointer declarations, the logic allows a space if the
first character after the open parenthesis is an asterisk.

I also spotted what we did not include "switch" in the check but we should.

This check is a little lame, but we reduce this problem by not allowing
that space for if/while/for/switch.

Reported-by: Emanuele Torre
Closes #11044
2023-04-27 17:24:47 +02:00