Commit Graph

94 Commits

Author SHA1 Message Date
Daniel Stenberg
66e5351e0a
urlapi: fix redirect to a new fragment or query (only)
The redirect logic was broken when the redirect-to URL was a relative
URL only as a fragment or query (starting with '#' or '?').

Extended test 1560 to reproduce, then verify.

Reported-by: Jeroen Ooms
Fixes #15836
Closes #15848
2024-12-30 08:23:26 +01:00
Daniel Stenberg
566a6d7b09
urlapi: normalize the IPv6 address
As the parsing and address "regeneration" are done anyway, we might as
well use the updated version in the result and thereby A) get a
normalized (and lower cased) version of the address and B) avoid a
strcpy().

Updated test 1560 to verify.

Closes #15143
2024-10-03 16:05:03 +02:00
Daniel Stenberg
d78e129d50
WebSockets: make support official (non-experimental)
Inverts the configure/cmake options to instead provide options that
disable WebSockets and have them (ws + wss) enabled by default.

Closes #14936
2024-09-27 13:20:25 +02:00
Daniel Stenberg
fbf5d507ce
lib/src: white space edits to comply better with code style
... as checksrc now finds and complains about these.

Closes #14921
2024-09-19 14:59:12 +02:00
Daniel Stenberg
d1394a00ea
urlapi: verify URL *decoded* hostname when set
It was previously wrongly verifying the input in its URL encoded format
when setting the hostname component with curl_url_set(), so it wrongly
rejected '%'.

Now it URL decodes the name appropriately before the check.

Added tests to lib1560 to verify that a fine %-code is okay and that a
bad %-code (that decodes to '%') is rejected.

Regression from 0a0c9b6dfa, shipped in 8.0.0

Fixes #14656
Reported-by: Venkat Krishna R
Closes #14657
2024-08-23 13:55:13 +02:00
Daniel Stenberg
655d44d139
urlapi: add CURLU_NO_GUESS_SCHEME
Used for extracting:

- when used asking for a scheme, it will return CURLUE_NO_SCHEME if the
  stored information was a guess

- when used asking for a URL, the URL is returned without a scheme, like
  when previously given to the URL parser when it was asked to guess

- as soon as the scheme is set explicitly, it is no longer internally
  marked as guessed

The idea being:

1. allow a user to figure out if a URL's scheme was set as a result of
  guessing

2. extract the URL without a guessed scheme

3. this makes it work similar to how we already deal with port numbers

Extend test 1560 to verify.

Closes #13616
2024-06-01 23:51:42 +02:00
Viktor Szakats
25cbc2f79a
tests: make the unit test result type CURLcode
Before this patch, the result code was a mixture of `int` and
`CURLcode`.

Also adjust casts and fix a couple of minor issues found along the way.

Cherry-picked from #13489
Closes #13600
2024-05-12 18:53:07 +02:00
Daniel Stenberg
fe17c162d0
urlapi: allow setting port number zero
Also set and check errno when strtoul() parsing numbers for better error
checking.

Updated test 1560

Closes #13427
2024-04-19 23:54:21 +02:00
Daniel Stenberg
3eac21d86b
urlapi: add CURLU_GET_EMPTY for empty queries and fragments
By default the API inhibits empty queries and fragments extracted.
Unless this new flag is set.

This also makes the behavior more consistent: without it set, zero
length queries and fragments are considered not present in the URL. With
the flag set, they are returned as a zero length strings if they were in
fact present in the URL.

This applies when extracting the individual query and fragment
components and for the full URL.

Closes #13396
2024-04-18 10:37:28 +02:00
Daniel Stenberg
4dc414c3ec
lib1560: test with leading zeroes and more IPv4 versions
Inspired by WHATWG URL Spec test inputs

Closes #13400
2024-04-17 22:45:24 +02:00
Daniel Stenberg
c37b694e46
urlapi: fix relative redirects to fragment-only
Using the URL API for a redirect URL when the redirected-to string
starts with a hash, ie is only a fragment, the API would produce the
wrong final URL.

Adjusted test 1560 to test for several new redirect cases.

Closes #13394
2024-04-17 22:41:47 +02:00
MonkeybreadSoftware
add22feeef
idn: add native AppleIDN (icucore) support for macOS/iOS
I implemented the IDN functions for macOS and iOS using Unicode
libraries coming with macOS and iOS.

Builds and runs here on macOS 14.2.1. Also verified to load and
run on older macOS version 10.13.

Build requires macOS SDK 13 or equivalent.

Set `-DUSE_APPLE_IDN=ON` CMake option to enable it.
With autotools and other build tools, set these manual options:
```
CPPFLAGS=-DUSE_APPLE_IDN
LIBS=-licucore
```

Completes TODO 1.6.

TODO: add autotools option and feature-detection.

Refs: #5330 #5371
Co-authored-by: Viktor Szakats
Closes #13246
2024-04-17 00:24:09 +02:00
Viktor Szakats
5b286c2508
build: delete/replace clang warning pragmas
- delete redundant warning suppressions for `-Wformat-nonliteral`.
  This now relies on `CURL_PRINTF()` and it's theoratically possible
  that this macro isn't active but the warning is. We're ignoring this
  as a corner-case here.

- replace two pragmas with code changes to avoid the warnings.

Follow-up to aee4ebe591 #12803
Follow-up to 0923012758 #12540
Follow-up to 3829759bd0 #12489

Reviewed-by: Daniel Stenberg
Closes #12812
2024-01-27 21:19:41 +00:00
Viktor Szakats
2dbe75bd7f
build: fix some -Wsign-conversion/-Warith-conversion warnings
- enable `-Wsign-conversion` warnings, but also setting them to not
  raise errors.
- fix `-Warith-conversion` warnings seen in CI.
  These are triggered by `-Wsign-converion` and causing errors unless
  explicitly silenced. It makes more sense to fix them, there just a few
  of them.
- fix some `-Wsign-conversion` warnings.
- hide `-Wsign-conversion` warnings with a `#pragma`.
- add macro `CURL_WARN_SIGN_CONVERSION` to unhide them on a per-build
  basis.
- update a CI job to unhide them with the above macro:
  https://github.com/curl/curl/actions/workflows/linux.yml -> OpenSSL -O3

Closes #12492
2023-12-19 12:45:28 +00:00
Viktor Szakats
3829759bd0
build: enable missing OpenSSF-recommended warnings, with fixes
https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html
as of 2023-11-29 [1].

Enable new recommended warnings (except `-Wsign-conversion`):

- enable `-Wformat=2` for clang (in both cmake and autotools).
- add `CURL_PRINTF()` internal attribute and mark functions accepting
  printf arguments with it. This is a copy of existing
  `CURL_TEMP_PRINTF()` but using `__printf__` to make it compatible
  with redefinting the `printf` symbol:
  https://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcc_5.html#SEC94
- fix `CURL_PRINTF()` and existing `CURL_TEMP_PRINTF()` for
  mingw-w64 and enable it on this platform.
- enable `-Wimplicit-fallthrough`.
- enable `-Wtrampolines`.
- add `-Wsign-conversion` commented with a FIXME.
- cmake: enable `-pedantic-errors` the way we do it with autotools.
  Follow-up to d5c0351055 #2747
- lib/curl_trc.h: use `CURL_FORMAT()`, this also fixes it to enable format
  checks. Previously it was always disabled due to the internal `printf`
  macro.

Fix them:

- fix bug where an `set_ipv6_v6only()` call was missed in builds with
  `--disable-verbose` / `CURL_DISABLE_VERBOSE_STRINGS=ON`.
- add internal `FALLTHROUGH()` macro.
- replace obsolete fall-through comments with `FALLTHROUGH()`.
- fix fallthrough markups: Delete redundant ones (showing up as
  warnings in most cases). Add missing ones. Fix indentation.
- silence `-Wformat-nonliteral` warnings with llvm/clang.
- fix one `-Wformat-nonliteral` warning.
- fix new `-Wformat` and `-Wformat-security` warnings.
- fix `CURL_FORMAT_SOCKET_T` value for mingw-w64. Also move its
  definition to `lib/curl_setup.h` allowing use in `tests/server`.
- lib: fix two wrongly passed string arguments in log outputs.
  Co-authored-by: Jay Satiro
- fix new `-Wformat` warnings on mingw-w64.

[1] 56c0fde389/docs/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C%2B%2B.md

Closes #12489
2023-12-16 13:12:37 +00:00
Viktor Szakats
e9a7d4a1c8
windows: use built-in _WIN32 macro to detect Windows
Windows compilers define `_WIN32` automatically. Windows SDK headers
or build env defines `WIN32`, or we have to take care of it. The
agreement seems to be that `_WIN32` is the preferred practice here.
Make the source code rely on that to detect we're building for Windows.

Public `curl.h` was using `WIN32`, `__WIN32__` and `CURL_WIN32` for
Windows detection, next to the official `_WIN32`. After this patch it
only uses `_WIN32` for this. Also, make it stop defining `CURL_WIN32`.

There is a slight chance these break compatibility with Windows
compilers that fail to define `_WIN32`. I'm not aware of any obsolete
or modern compiler affected, but in case there is one, one possible
solution is to define this macro manually.

grepping for `WIN32` remains useful to discover Windows-specific code.

Also:

- extend `checksrc` to ensure we're not using `WIN32` anymore.

- apply minor formatting here and there.

- delete unnecessary checks for `!MSDOS` when `_WIN32` is present.

Co-authored-by: Jay Satiro
Reviewed-by: Daniel Stenberg

Closes #12376
2023-11-22 15:42:25 +00:00
Daniel Stenberg
5c846a12a3
urlapi: when URL encoding the fragment, pass in the right length
A benign bug because it would only add an extra null terminator.

Made lib1560 get a test that runs this code.

Closes #12250
2023-11-02 16:23:17 +01:00
Daniel Stenberg
8c8a03f252
lib1560: verify appending blank URL encoded query string 2023-11-01 10:55:58 +01:00
Daniel Stenberg
21c5d5971e
lib1560: verify setting host to "" with and without URL encode 2023-11-01 10:55:55 +01:00
Viktor Szakats
3b6d18bbf6
spelling: fix codespell 2.2.6 typos
Closes #12019
2023-10-03 21:37:56 +00:00
Daniel Stenberg
887b998e6e
urlapi: setting a blank URL ("") is not an ok URL
Test it in 1560
Fixes #11714
Reported-by: ad0p on github
Closes #11715
2023-08-23 23:24:16 +02:00
Daniel Stenberg
c350069f64
urlapi: CURLU_PUNY2IDN - convert from punycode to IDN name
Asssisted-by: Jay Satiro
Closes #11655
2023-08-13 15:34:38 +02:00
Daniel Stenberg
49e2443186
urlapi: make sure zoneid is also duplicated in curl_url_dup
Add several curl_url_dup() tests to the general lib1560 test.

Reported-by: Rutger Broekhoff
Bug: https://curl.se/mail/lib-2023-07/0047.html
Closes #11549
2023-08-01 08:00:28 +02:00
Daniel Stenberg
3c9256c8a0
urlapi: have *set(PATH) prepend a slash if one is missing
Previously the code would just do that for the path when extracting the
full URL, which made a subsequent curl_url_get() of the path to
(unexpectedly) still return it without the leading path.

Amend lib1560 to verify this. Clarify the curl_url_set() docs about it.

Bug: https://curl.se/mail/lib-2023-06/0015.html
Closes #11272
Reported-by: Pedro Henrique
2023-06-08 16:08:45 +02:00
Daniel Stenberg
ba669d072d
urlapi: scheme starts with alpha
Add multiple tests to lib1560 to verify

Fixes #11249
Reported-by: ad0p on github
Closes #11250
2023-06-05 16:28:27 +02:00
Daniel Stenberg
329889f1ea
lib1560: verify more scheme guessing
- on 2nd level domains
- on names without dots

As mentioned in #11161, "imap.com" will be guessed IMAP

Closes #11219
2023-05-29 23:44:42 +02:00
Daniel Stenberg
6375a65433
urlapi: remove superfluous host name check
... as it is checked later more proper.

Closes #11195
2023-05-25 08:30:20 +02:00
Emanuele Torre
eef076baa6
Revert "urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects"
This reverts commit df6c2f7b54.
(It only keep the test case that checks redirection to an absolute URL
without hostname and CURLU_NO_AUTHORITY).

I originally wanted to make CURLU_ALLOW_SPACE accept spaces in the
hostname only because I thought
curl_url_set(CURLUPART_URL, CURLU_ALLOW_SPACE) was already accepting
them, and they were only not being accepted in the hostname when
curl_url_set(CURLUPART_URL) was used for a redirection.

That is not actually the case, urlapi never accepted hostnames with
spaces, and a hostname with a space in it never makes sense.
I probably misread the output of my original test when I they were
normally accepted when using CURLU_ALLOW_SPACE, and not redirecting.

Some other URL parsers seems to allow space in the host part of the URL,
e.g. both python3's urllib.parse module, and Chromium's javascript URL
object allow spaces (chromium percent escapes the spaces with %20),
(they also both ignore TABs, and other whitespace characters), but those
URLs with spaces in the hostname are useless, neither python3's requests
module nor Chromium's window.location can actually use them.

There is no reason to add support for URLs with spaces in the host,
since it was not a inconsistency bug; let's revert that patch before it
makes it into release. Sorry about that.

I also reverted the extra check for CURLU_NO_AUTHORITY since that does
not seem to be necessary, CURLU_NO_AUTHORITY already worked for
redirects.

Closes #11169
2023-05-21 13:59:04 +02:00
Daniel Stenberg
92772e6d39
urlapi: allow numerical parts in the host name
It can only be an IPv4 address if all parts are all digits and no more than
four parts, otherwise it is a host name. Even slightly wrong IPv4 will now be
passed through as a host name.

Regression from 17a15d8846 shipped in 8.1.0

Extended test 1560 accordingly.

Reported-by: Pavel Kalyugin
Fixes #11129
Closes #11131
2023-05-19 16:01:26 +02:00
Emanuele Torre
df6c2f7b54
urlapi: respect CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY for redirects
curl_url_set(uh, CURLUPART_URL, redirurl, flags)  was not respecing
CURLU_ALLOW_SPACE and CURLU_NO_AUTHORITY in the host part of redirurl
when redirecting to an absolute URL.

Closes #11136
2023-05-18 20:52:59 +02:00
Daniel Stenberg
4cfa5bcc9a
urlapi: cleanups
- move host checks together
- simplify the scheme parser loop and the end of host name parser
- avoid itermediate buffer storing in multiple places
- reduce scope for several variables
- skip the Curl_dyn_tail() call for speed
- detect IPv6 earlier and skip extra checks for such hosts
- normalize directly in dynbuf instead of itermediate buffer
- split out the IPv6 parser into its own funciton
- call the IPv6 parser directly for ipv6 addresses
- remove (unused) special treatment of % in host names
- junkscan() once in the beginning instead of scattered
- make junkscan return error code
- remove unused query management from dedotdotify()
- make Curl_parse_login_details use memchr
- more use of memchr() instead of strchr() and less strlen() calls
- make junkscan check and return the URL length

An optimized build runs one of my benchmark URL parsing programs ~41%
faster using this branch. (compared against the shipped 7.88.1 library
in Debian)

Closes #10935
2023-04-13 08:41:40 +02:00
Daniel Stenberg
309a517ffd
lib1560: verify that more bad host names are rejected
when setting the hostname component of a URL

Closes #10922
2023-04-11 11:33:07 +02:00
Daniel Stenberg
826e8011d5
urlapi: prevent setting invalid schemes with *url_set()
A typical mistake would be to try to set "https://" - including the
separator - this is now rejected as that would then lead to
url_get(... URL...) would get an invalid URL extracted.

Extended test 1560 to verify.

Closes #10911
2023-04-09 23:23:54 +02:00
Daniel Stenberg
17a15d8846
urlapi: detect and error on illegal IPv4 addresses
Using bad numbers in an IPv4 numerical address now returns
CURLUE_BAD_HOSTNAME.

I noticed while working on trurl and it was originally reported here:
https://github.com/curl/trurl/issues/78

Updated test 1560 accordingly.

Closes #10894
2023-04-06 09:02:00 +02:00
Daniel Stenberg
f042e1e75d
urlapi: URL encoding for the URL missed the fragment
Meaning that it would wrongly still store the fragment using spaces
instead of %20 if allowing space while also asking for URL encoding.

Discovered when playing with trurl.

Added test to lib1560 to verify the fix.

Closes #10887
2023-04-05 08:30:12 +02:00
Daniel Stenberg
0a0c9b6dfa
urlapi: '%' is illegal in host names
Update test 1560 to verify

Ref: #10708
Closes #10711
2023-03-08 15:33:43 +01:00
Daniel Stenberg
54605666ed
lib1560: fix enumerated type mixed with another type
Follow-up to c84c0f9aa3

Closes #10684
2023-03-06 08:14:42 +01:00
Daniel Stenberg
c84c0f9aa3
lib1560: test parsing URLs with ridiculously large fields
In the order of 120K.

Closes #10665
2023-03-03 23:23:53 +01:00
Daniel Stenberg
bb11969838
lib1560: add a test using %25 in the userinfo in a URL
Closes #10578
2023-02-21 16:10:13 +01:00
Daniel Stenberg
b30b0c3840
lib1560: add IPv6 canonicalization tests
Closes #10552
2023-02-17 23:22:05 +01:00
Daniel Stenberg
8b27799f8c
urlapi: do the port number extraction without using sscanf()
- sscanf() is rather complex and slow, strchr() much simpler

- the port number function does not need to fully verify the IPv6 address
  anyway as it is done later in the hostname_check() function and doing
  it twice is unnecessary.

Closes #10541
2023-02-17 16:21:26 +01:00
Daniel Stenberg
2bc1d775f5
copyright: update all copyright lines and remove year ranges
- they are mostly pointless in all major jurisdictions
- many big corporations and projects already don't use them
- saves us from pointless churn
- git keeps history for us
- the year range is kept in COPYING

checksrc is updated to allow non-year using copyright statements

Closes #10205
2023-01-03 09:19:21 +01:00
Daniel Stenberg
901392cbb7
urlapi: add CURLU_PUNYCODE
Allows curl_url_get() get the punycode version of host names for the
host name and URL parts.

Extend test 1560 to verify.

Closes #10109
2022-12-26 23:29:23 +01:00
Daniel Stenberg
b151faa083
lib1560: add some basic IDN host name tests
Closes #10094
2022-12-15 22:57:08 +01:00
Daniel Stenberg
c20b35ddae
urlapi: reject more bad letters from the host name: &+()
Follow-up from eb0167ff7d

Extend test 1560 to verify

Closes #10096
2022-12-15 08:23:48 +01:00
Daniel Stenberg
7d6cf06f57
urlapi: fix parsing URL without slash with CURLU_URLENCODE
When CURLU_URLENCODE is set, the parser would mistreat the path
component if the URL was specified without a slash like in
http://local.test:80?-123

Extended test 1560 to reproduce and verify the fix.

Reported-by: Trail of Bits

Closes #9763
2022-10-20 08:56:53 +02:00
Daniel Stenberg
eb0167ff7d
urlapi: reject more bad characters from the host name field
Extended test 1560 to verify

Report from the ongoing source code audit by Trail of Bits.

Closes #9608
2022-09-28 08:22:42 +02:00
Daniel Stenberg
1a87a1efba
url: a zero-length userinfo part in the URL is still a (blank) user
Adjusted test 1560 to verify

Reported-by: Jay Satiro

Fixes #9088
Closes #9590
2022-09-26 07:45:53 +02:00
Daniel Stenberg
c4768f168c
lib1560: extended to verify detect/reject of unknown schemes
... when no guessing is allowed.
2022-09-15 09:31:45 +02:00
Daniel Stenberg
ef80a87f40
libtest/lib1560: test basic websocket URL parsing 2022-09-09 15:11:14 +02:00