Commit Graph

155 Commits

Author SHA1 Message Date
Emanuele Torre
f198d33e8d
checksrc: disallow spaces before labels
Out of 415 labels throughout the code base, 86 of those labels were
not at the start of the line. Which means labels always at the start of
the line is the favoured style overall with 329 instances.

Out of the 86 labels not at the start of the line:
* 75 were indented with the same indentation level of the following line
* 8 were indented with exactly one space
* 2 were indented with one fewer indentation level then the following
  line
* 1 was indented with the indentation level of the following line minus
  three space (probably unintentional)

Co-Authored-By: Viktor Szakats

Closes #11134
2023-05-18 20:45:04 +02:00
Emanuele Torre
7f712399d5
checksrc: check for spaces before the colon of switch labels
Closes #11047
2023-04-27 23:26:50 +02:00
Daniel Stenberg
d567cca1de
checksrc: fix SPACEBEFOREPAREN for conditions starting with "*"
The open paren check wants to warn for spaces before open parenthesis
for if/while/for but also for any function call. In order to avoid
catching function pointer declarations, the logic allows a space if the
first character after the open parenthesis is an asterisk.

I also spotted what we did not include "switch" in the check but we should.

This check is a little lame, but we reduce this problem by not allowing
that space for if/while/for/switch.

Reported-by: Emanuele Torre
Closes #11044
2023-04-27 17:24:47 +02:00
Daniel Stenberg
b7b1846275
urlapi: make internal function start with Curl_
Curl_url_set_authority() it is.

Follow-up to acd82c8bfd

Closes #11035
2023-04-27 08:36:51 +02:00
Stefan Eissing
acd82c8bfd
tests/http: more tests with specific clients
- Makefile support for building test specific clients in tests/http/clients
- auto-make of clients when invoking pytest
- added test_09_02 for server PUSH_PROMISEs using clients/h2-serverpush
- added test_02_21 for lib based downloads and pausing/unpausing transfers

curl url parser:
- added internal method `curl_url_set_authority()` for setting the
  authority part of a url (used for PUSH_PROMISE)

http2:
- made logging of PUSH_PROMISE handling nicer

Placing python test requirements in requirements.txt files
- separate files to base test suite and http tests since use
  and module lists differ
- using the files in the gh workflows

websocket test cases, fixes for we and bufq
- bufq: account for spare chunks in space calculation
- bufq: reset chunks that are skipped empty
- ws: correctly encode frames with 126 bytes payload
- ws: update frame meta information on first call of collect
  callback that fills user buffer
- test client ws-data: some test/reporting improvements

Closes #11006
2023-04-26 23:24:46 +02:00
Daniel Stenberg
3f1d89ed24
urlapi: skip a pointless assign
It stores a null byte after already having confirmed there is a null
byte there. Detected by PVS.

Ref: #10929
Closes #10943
2023-04-13 14:36:28 +02:00
Daniel Stenberg
4cfa5bcc9a
urlapi: cleanups
- move host checks together
- simplify the scheme parser loop and the end of host name parser
- avoid itermediate buffer storing in multiple places
- reduce scope for several variables
- skip the Curl_dyn_tail() call for speed
- detect IPv6 earlier and skip extra checks for such hosts
- normalize directly in dynbuf instead of itermediate buffer
- split out the IPv6 parser into its own funciton
- call the IPv6 parser directly for ipv6 addresses
- remove (unused) special treatment of % in host names
- junkscan() once in the beginning instead of scattered
- make junkscan return error code
- remove unused query management from dedotdotify()
- make Curl_parse_login_details use memchr
- more use of memchr() instead of strchr() and less strlen() calls
- make junkscan check and return the URL length

An optimized build runs one of my benchmark URL parsing programs ~41%
faster using this branch. (compared against the shipped 7.88.1 library
in Debian)

Closes #10935
2023-04-13 08:41:40 +02:00
Daniel Stenberg
826e8011d5
urlapi: prevent setting invalid schemes with *url_set()
A typical mistake would be to try to set "https://" - including the
separator - this is now rejected as that would then lead to
url_get(... URL...) would get an invalid URL extracted.

Extended test 1560 to verify.

Closes #10911
2023-04-09 23:23:54 +02:00
Daniel Stenberg
17a15d8846
urlapi: detect and error on illegal IPv4 addresses
Using bad numbers in an IPv4 numerical address now returns
CURLUE_BAD_HOSTNAME.

I noticed while working on trurl and it was originally reported here:
https://github.com/curl/trurl/issues/78

Updated test 1560 accordingly.

Closes #10894
2023-04-06 09:02:00 +02:00
Daniel Stenberg
f042e1e75d
urlapi: URL encoding for the URL missed the fragment
Meaning that it would wrongly still store the fragment using spaces
instead of %20 if allowing space while also asking for URL encoding.

Discovered when playing with trurl.

Added test to lib1560 to verify the fix.

Closes #10887
2023-04-05 08:30:12 +02:00
rcombs
b1d735956f
urlapi: take const args in _dup and _get functions
Closes #10708
2023-03-08 15:38:26 +01:00
rcombs
95cb7d3166
urlapi: avoid mutating internals in getter routine
This was not intended.

Closes #10708
2023-03-08 15:38:18 +01:00
Daniel Stenberg
0a0c9b6dfa
urlapi: '%' is illegal in host names
Update test 1560 to verify

Ref: #10708
Closes #10711
2023-03-08 15:33:43 +01:00
Brad Spencer
ad4997e5b2
urlapi: parse IPv6 literals without ENABLE_IPV6
This makes the URL parser API stable and working the same way
independently of libcurl supporting IPv6 transfers or not.

Closes #10660
2023-03-03 10:05:08 +01:00
Daniel Stenberg
8b27799f8c
urlapi: do the port number extraction without using sscanf()
- sscanf() is rather complex and slow, strchr() much simpler

- the port number function does not need to fully verify the IPv6 address
  anyway as it is done later in the hostname_check() function and doing
  it twice is unnecessary.

Closes #10541
2023-02-17 16:21:26 +01:00
Pronyushkin Petr
2b46ce0313
urlapi: fix part of conditional expression is always true: qlen
Closes #10408
2023-02-06 08:53:07 +01:00
Daniel Stenberg
37554d7c07
urlapi: remove pathlen assignment
"Value stored to 'pathlen' is never read"

Follow-up to 804d5293f8

Reported-by: Kvarec Lezki

Closes #10405
2023-02-03 08:20:21 +01:00
Daniel Stenberg
63c53ea627
urlapi: skip the extra dedotdot alloc if no dot in path
Saves an allocation for many/most URLs.

Updates test 1395 accordingly

Closes #10403
2023-02-02 22:34:32 +01:00
Daniel Stenberg
7305ca63e2
urlapi: avoid Curl_dyn_addf() for hex outputs
Inspired by the recent fixes to escape.c, we should avoid calling
Curl_dyn_addf() in loops, perhaps in particular when adding something so
simple as %HH codes - for performance reasons. This change makes the
same thing for the URL parser's two URL-encoding loops.

Closes #10384
2023-02-01 23:05:51 +01:00
Daniel Stenberg
804d5293f8
urlapi: skip path checks if path is just "/"
As a miniscule optimization, treat a path of the length 1 as the same as
non-existing, as it can only be a single leading slash, and that's what
we do for no paths as well.

Closes #10385
2023-02-01 23:04:45 +01:00
Daniel Stenberg
2bc1d775f5
copyright: update all copyright lines and remove year ranges
- they are mostly pointless in all major jurisdictions
- many big corporations and projects already don't use them
- saves us from pointless churn
- git keeps history for us
- the year range is kept in COPYING

checksrc is updated to allow non-year using copyright statements

Closes #10205
2023-01-03 09:19:21 +01:00
Daniel Stenberg
901392cbb7
urlapi: add CURLU_PUNYCODE
Allows curl_url_get() get the punycode version of host names for the
host name and URL parts.

Extend test 1560 to verify.

Closes #10109
2022-12-26 23:29:23 +01:00
Daniel Stenberg
c20b35ddae
urlapi: reject more bad letters from the host name: &+()
Follow-up from eb0167ff7d

Extend test 1560 to verify

Closes #10096
2022-12-15 08:23:48 +01:00
Daniel Stenberg
b15ca64bb0
urlapi: remove two variable assigns
To please scan-build:

urlapi.c:1163:9: warning: Value stored to 'qlen' is never read
        qlen = Curl_dyn_len(&enc);
        ^      ~~~~~~~~~~~~~~~~~~
urlapi.c:1164:9: warning: Value stored to 'query' is never read
        query = u->query = Curl_dyn_ptr(&enc);
        ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow-up to 7d6cf06f57

Closes #9777
2022-10-21 11:00:18 +02:00
Daniel Stenberg
7d6cf06f57
urlapi: fix parsing URL without slash with CURLU_URLENCODE
When CURLU_URLENCODE is set, the parser would mistreat the path
component if the URL was specified without a slash like in
http://local.test:80?-123

Extended test 1560 to reproduce and verify the fix.

Reported-by: Trail of Bits

Closes #9763
2022-10-20 08:56:53 +02:00
12932
ddeec8feba
misc: nitpick grammar in comments/docs
because the 'u' in URL is actually a consonant *sound* it is only
correct to write "a URL"

sorry this is a bit nitpicky :P

https://english.stackexchange.com/questions/152/when-should-i-use-a-vs-an
https://www.techtarget.com/whatis/feature/Which-is-correct-a-URL-or-an-URL

Closes #9699
2022-10-12 11:32:43 +02:00
John Bampton
e80c4ff3d0
misc: fix spelling in docs and comments
also: remove outdated sentence

Closes #9644
2022-10-05 16:12:10 +02:00
Daniel Stenberg
eb0167ff7d
urlapi: reject more bad characters from the host name field
Extended test 1560 to verify

Report from the ongoing source code audit by Trail of Bits.

Closes #9608
2022-09-28 08:22:42 +02:00
Patrick Monnerat
9d51329047
setopt: use the handler table for protocol name to number conversions
This also returns error CURLE_UNSUPPORTED_PROTOCOL rather than
CURLE_BAD_FUNCTION_ARGUMENT when a listed protocol name is not found.

A new schemelen parameter is added to Curl_builtin_scheme() to support
this extended use.

Note that disabled protocols are not recognized anymore.

Tests adapted accordingly.

Closes #9472
2022-09-16 23:29:01 +02:00
Daniel Stenberg
846678541b
urlapi: detect scheme better when not guessing
When the parser is not allowed to guess scheme, it should consider the
word ending at the first colon to be the scheme, independently of number
of slashes.

The parser now checks that the scheme is known before it counts slashes,
to improve the error messge for URLs with unknown schemes and maybe no
slashes.

When following redirects, no scheme guessing is allowed and therefore
this change effectively prevents redirects to unknown schemes such as
"data".

Fixes #9503
2022-09-15 09:31:40 +02:00
Daniel Stenberg
f703cf971c
urlapi: leaner with fewer allocs
Slightly faster with more robust code. Uses fewer and smaller mallocs.

- remove two fields from the URL handle struct
- reduce copies and allocs
- use dynbuf buffers more instead of custom malloc + copies
- uses dynbuf to build the host name in reduces serial alloc+free within
  the same function.
- move dedotdotify into urlapi.c and make it static, not strdup the input
  and optimize it by checking for . and / before using strncmp
- remove a few strlen() calls
- add Curl_dyn_setlen() that can "trim" an existing dynbuf

Closes #9408
2022-09-07 10:21:45 +02:00
Daniel Stenberg
8dd95da35b
ctype: remove all use of <ctype.h>, use our own versions
Except in the test servers.

Closes #9433
2022-09-06 08:32:36 +02:00
Viktor Szakats
c9061f242b
misc: spelling fixes
Found using codespell 2.2.1.

Also delete the redundant protocol designator from an archive.org URL.

Reviewed-by: Daniel Stenberg
Closes #9403
2022-08-31 14:31:01 +00:00
Pierrick Charron
4bf2c231d7
urlapi: make curl_url_set(url, CURLUPART_URL, NULL, 0) clear all parts
As per the documentation :

> Setting a part to a NULL pointer will effectively remove that
> part's contents from the CURLU handle.

But currently clearing CURLUPART_URL does nothing and returns
CURLUE_OK. This change will clear all parts of the URL at once.

Closes #9028
2022-06-20 08:15:51 +02:00
max.mehl
ad9bc5976d
copyright: make repository REUSE compliant
Add licensing and copyright information for all files in this repository. This
either happens in the file itself as a comment header or in the file
`.reuse/dep5`.

This commit also adds a Github workflow to check pull requests and adapts
copyright.pl to the changes.

Closes #8869
2022-06-13 09:13:00 +02:00
Daniel Stenberg
c3fc406ebb
urlapi: support CURLU_URLENCODE for curl_url_get() 2022-06-08 16:32:46 +02:00
Daniel Stenberg
914aaab915
urlapi: reject percent-decoding host name into separator bytes
CVE-2022-27780

Reported-by: Axel Chong
Bug: https://curl.se/docs/CVE-2022-27780.html
Closes #8826
2022-05-09 12:50:34 +02:00
Sergey Markelov
b5b86856a9
urlapi: address (harmless) UndefinedBehavior sanitizer warning
`while(i--)` causes runtime error: unsigned integer overflow: 0 - 1
cannot be represented in type 'size_t' (aka 'unsigned long')

Closes #8797
2022-05-05 08:38:06 +02:00
Daniel Stenberg
a3f4d7cee9
misc: spelling fixes
Mostly in comments but also in the -w documentation for headers_json.

Closes #8647
2022-03-30 10:49:06 +02:00
Stefan Eissing
70ac27604a
urlapi: handle "redirects" smarter
- avoid one malloc when setting a new url via curl_url_set()
    and CURLUPART_URL.
  - extract common pattern into a new static function.

Closes #8450
2022-02-14 17:56:58 +01:00
Daniel Stenberg
2610142139
lib: remove support for CURL_DOES_CONVERSIONS
TPF was the only user and support for that was dropped.

Closes #8378
2022-02-04 08:05:35 +01:00
HenrikHolst
9fe2a20b1c urlapi: remove an unnecessary call to strlen
- Use strcpy instead of strlen+memcpy to copy the url path.

Ref: https://curl.se/mail/lib-2022-02/0006.html

Closes https://github.com/curl/curl/pull/8370
2022-02-01 15:43:45 -05:00
Daniel Stenberg
eec5ce4ab4
urlapi: if possible, shorten given numerical IPv6 addresses
Extended test 1560 to verify

Closes #8206
2022-01-02 22:59:08 +01:00
Daniel Stenberg
92d1aee8b1
urlapi: accept port number zero
This is a regression since 7.62.0 (fb30ac5a2d).

Updated test 1560 accordingly

Reported-by: Brad Fitzpatrick
Fixes #8090
Closes #8091
2021-12-03 22:58:41 +01:00
Daniel Stenberg
4183b8fe9a
urlapi: provide more detailed return codes
Previously, the return code CURLUE_MALFORMED_INPUT was used for almost
30 different URL format violations. This made it hard for users to
understand why a particular URL was not acceptable. Since the API cannot
point out a specific position within the URL for the problem, this now
instead introduces a number of additional and more fine-grained error
codes to allow the API to return more exactly in what "part" or section
of the URL a problem was detected.

Also bug-fixes curl_url_get() with CURLUPART_ZONEID, which previously
returned CURLUE_OK even if no zoneid existed.

Test cases in 1560 have been adjusted and extended. Tests 1538 and 1559
have been updated.

Updated libcurl-errors.3 and curl_url_strerror() accordingly.

Closes #8049
2021-11-25 08:36:04 +01:00
Daniel Stenberg
a5f5687368
urlapi: make Curl_is_absolute_url always use MAX_SCHEME_LEN
Instad of having all callers pass in the maximum length, always use
it. The passed in length is instead used only as the length of the
target buffer for to storing the scheme name in, if used.

Added the scheme max length restriction to the curl_url_set.3 man page.

Follow-up to 45bcb2eaa7

Closes #8047
2021-11-25 08:33:48 +01:00
Daniel Stenberg
3e6eb18fce
urlapi: reject short file URLs
file URLs that are 6 bytes or shorter are not complete. Return
CURLUE_MALFORMED_INPUT for those. Extended test 1560 to verify.

Triggered by #8041
Closes #8042
2021-11-23 08:45:21 +01:00
Stefan Eissing
45bcb2eaa7
urlapi: cleanup scheme parsing
Makea Curl_is_absolute_url() always leave a defined 'buf' and avoids
copying on urls that do not start with a scheme.

Closes #8043
2021-11-22 22:41:11 +01:00
Daniel Stenberg
efffa66f65
urlapi: skip a strlen(), pass in zero
... to let curl_easy_escape() itself do the strlen. This avoids a (false
positive) Coverity warning and it avoids us having to store the strlen()
return value in an int variable.

Reviewed-by: Daniel Gustafsson
Closes #7862
2021-10-15 23:22:14 +02:00
Daniel Stenberg
9a8564a920
urlapi: URL decode percent-encoded host names
The host name is stored decoded and can be encoded when used to extract
the full URL. By default when extracting the URL, the host name will not
be URL encoded to work as similar as possible as before. When not URL
encoding the host name, the '%' character will however still be encoded.

Getting the URL with the CURLU_URLENCODE flag set will percent encode
the host name part.

As a bonus, setting the host name part with curl_url_set() no longer
accepts a name that contains space, CR or LF.

Test 1560 has been extended to verify percent encodings.

Reported-by: Noam Moshe
Reported-by: Sharon Brizinov
Reported-by: Raul Onitza-Klugman
Reported-by: Kirill Efimov
Fixes #7830
Closes #7834
2021-10-11 17:04:14 +02:00
Daniel Gustafsson
12246eddc5 lib: avoid fallthrough cases in switch statements
Commit b5a434f7f0 inhibits the warning
on implicit fallthrough cases, since the current coding of indicating
fallthrough with comments is falling out of fashion with new compilers.
This attempts to make the issue smaller by rewriting fallthroughs to no
longer fallthrough, via either breaking the cases or turning switch
statements into if statements.

  lib/content_encoding.c: the fallthrough codepath is simply copied
    into the case as it's a single line.
  lib/http_ntlm.c: the fallthrough case skips a state in the state-
    machine and fast-forwards to NTLMSTATE_LAST. Do this before the
    switch statement instead to set up the states that we actually
    want.
  lib/http_proxy.c: the fallthrough is just falling into exiting the
    switch statement which can be done easily enough in the case.
  lib/mime.c: switch statement rewritten as if statement.
  lib/pop3.c: the fallthrough case skips to the next state in the
    statemachine, do this explicitly instead.
  lib/urlapi.c: switch statement rewritten as if statement.
  lib/vssh/wolfssh.c: the fallthrough cases fast-forwards the state
    machine, do this by running another iteration of the switch
    statement instead.
  lib/vtls/gtls.c: switch statement rewritten as if statement.
  lib/vtls/nss.c: the fallthrough codepath is simply copied into the
    case as it's a single line. Also twiddle a comment to not be
    inside a non-brace if statement.

Closes: #7322
See-also: #7295
Reviewed-by: Daniel Stenberg <daniel@haxx.se>
2021-09-29 10:00:52 +02:00
Sergey Markelov
4b997626b1
urlapi: support UNC paths in file: URLs on Windows
- file://host.name/path/file.txt is a valid UNC path
  \\host.name\path\files.txt to a non-local file transformed into URI
  (RFC 8089 Appendix E.3)

- UNC paths on other OSs must be smb: URLs

Closes #7366
2021-09-27 08:32:41 +02:00
Daniel Stenberg
98e6db24c4
urlapi.c:seturl: assert URL instead of using if-check
There's no code flow possible where this can happen. The assert makes
sure it also won't be introduced undetected in the future.

Closes #7610
2021-08-23 08:50:58 +02:00
Daniel Stenberg
d696ee00ee
lib: use %u instead of %ld for port number printf
Follow-up to 764c6bd3bf which changed the type of some port number
fields. Detected by Coverity (CID 1486624) etc.

Closes #7325
2021-06-30 23:25:35 +02:00
Daniel Stenberg
b67d3ba73e
curl_url_set: reject spaces in URLs w/o CURLU_ALLOW_SPACE
They were never officially allowed and slipped in only due to sloppy
parsing. Spaces (ascii 32) should be correctly encoded (to %20) before
being part of a URL.

The new flag bit CURLU_ALLOW_SPACE when a full URL is set, makes libcurl
allow spaces.

Updated test 1560 to verify.

Closes #7073
2021-06-15 10:49:49 +02:00
Daniel Stenberg
04488851e2
urlapi: make sure no +/- signs are accepted in IPv4 numericals
Follow-up to 56a037cc0a. Extends test 1560 to verify.

Reported-by: Tuomas Siipola
Fixes #6916
Closes #6917
2021-04-21 09:17:55 +02:00
Daniel Stenberg
56a037cc0a
urlapi: "normalize" numerical IPv4 host names
When the host name in a URL is given as an IPv4 numerical address, the
address can be specified with dotted numericals in four different ways:
a32, a.b24, a.b.c16 or a.b.c.d and each part can be specified in
decimal, octal (0-prefixed) or hexadecimal (0x-prefixed).

Instead of passing on the name as-is and leaving the handling to the
underlying name functions, which made them not work with c-ares but work
with getaddrinfo, this change now makes the curl URL API itself detect
and "normalize" host names specified as IPv4 numericals.

The WHATWG URL Spec says this is an okay way to specify a host name in a
URL. RFC 3896 does not allow them, but curl didn't prevent them before
and it seems other RFC 3896-using tools have not either. Host names used
like this are widely supported by other tools as well due to the
handling being done by getaddrinfo and friends.

I decided to add the functionality into the URL API itself so that all
users of these functions get the benefits, when for example wanting to
compare two URLs. Also, it makes curl built to use c-ares now support
them as well and make curl builds more consistent.

The normalization makes HTTPS and virtual hosted HTTP work fine even
when curl gets the address specified using one of the "obscure" formats.

Test 1560 is extended to verify.

Fixes #6863
Closes #6871
2021-04-19 08:34:55 +02:00
Daniel Stenberg
8ab78f720a
misc: fix "warning: empty expression statement has no effect"
Turned several macros into do-while(0) style to allow their use to work
find with semicolon.

Bug: 08e8455ddd (commitcomment-45433279)
Follow-up to 08e8455ddd
Reported-by: Gisle Vanem
Closes #6376
2020-12-26 23:44:17 +01:00
Daniel Stenberg
abd846c374
urlapi: don't accept blank port number field without scheme
... as it makes the URL parser accept "very-long-hostname://" as a valid
host name and we don't want that. The parser now only accepts a blank
(no digits) after the colon if the URL starts with a scheme.

Reported-by: d4d on hackerone

Closes #6283
2020-12-07 00:50:49 +01:00
Daniel Stenberg
4d2f800677
curl.se: new home
Closes #6172
2020-11-04 23:59:47 +01:00
Daniel Stenberg
b7ea3d2c22
urlapi: URL encode a '+' in the query part
... when asked to with CURLU_URLENCODE.

Extended test 1560 to verify.
Reported-by: Dietmar Hauser
Fixes #6086
Closes #6087
2020-10-15 23:21:53 +02:00
Emil Engler
c0f0e400e0
urlapi: use more Curl_safefree
Closes #5968
2020-09-17 09:44:36 +02:00
Daniel Stenberg
032e838b73
terminology: call them null-terminated strings
Updated terminology in docs, comments and phrases to refer to C strings
as "null-terminated". Done to unify with how most other C oriented docs
refer of them and what users in general seem to prefer (based on a
single highly unscientific poll on twitter).

Reported-by: coinhubs on github
Fixes #5598
Closes #5608
2020-06-28 00:31:24 +02:00
Daniel Stenberg
31e53584db
escape: make the URL decode able to reject only %00 bytes
... or all "control codes" or nothing.

Assisted-by: Nicolas Sterchele
2020-06-25 09:57:18 +02:00
Daniel Stenberg
7f1c098728
urlapi: accept :: as a valid IPv6 address
Text 1560 is extended to verify.

Reported-by: Pavel Volgarev
Fixes #5344
Closes #5351
2020-05-08 08:47:29 +02:00
Daniel Stenberg
d3dc0a07e9
urlapi: guess scheme correct even with credentials given
In the "scheme-less" parsing case, we need to strip off credentials
first before we guess scheme based on the host name!

Assisted-by: Jay Satiro
Fixes #4856
Closes #4857
2020-01-28 08:40:16 +01:00
Daniel Stenberg
02c6b984cb
urlapi: fix use-after-free bug
Follow-up from 2c20109a9b

Added test 663 to verify.

Reported by OSS-Fuzz
Bug: https://crbug.com/oss-fuzz/17954

Closes #4453
2019-10-03 22:54:26 +02:00
Daniel Stenberg
2c20109a9b
urlapi: fix URL encoding when setting a full URL 2019-10-02 07:53:17 +02:00
Marcel Raad
0f62c9af8b
urlapi: fix unused variable warning
`dest` is only used with `ENABLE_IPV6`.

Closes https://github.com/curl/curl/pull/4444
2019-10-01 10:47:41 +02:00
Daniel Stenberg
6e7733f788
urlapi: question mark within fragment is still fragment
The parser would check for a query part before fragment, which caused it
to do wrong when the fragment contains a question mark.

Extended test 1560 to verify.

Reported-by: Alex Konev
Fixes #4412
Closes #4413
2019-09-24 23:30:43 +02:00
Paul Dreik
47066036a0
urlapi: avoid index underflow for short ipv6 hostnames
If the input hostname is "[", hlen will underflow to max of size_t when
it is subtracted with 2.

hostname[hlen] will then cause a warning by ubsanitizer:

runtime error: addition of unsigned offset to 0x<snip> overflowed to
0x<snip>

I think that in practice, the generated code will work, and the output
of hostname[hlen] will be the first character "[".

This can be demonstrated by the following program (tested in both clang
and gcc, with -O3)

int main() {
  char* hostname=strdup("[");
  size_t hlen = strlen(hostname);

  hlen-=2;
  hostname++;
  printf("character is %d\n",+hostname[hlen]);
  free(hostname-1);
}

I found this through fuzzing, and even if it seems harmless, the proper
thing is to return early with an error.

Closes #4389
2019-09-21 15:57:17 +02:00
Daniel Stenberg
36fbb10071
urlapi: Expression 'storep' is always true
Fixes warning detected by PVS-Studio
Fixes #4374
2019-09-20 08:07:48 +02:00
Daniel Stenberg
a6451487d4
urlapi: 'scheme' is always true
Fixes warning detected by PVS-Studio
Fixes #4374
2019-09-20 08:07:46 +02:00
Daniel Stenberg
b10464399b
urlapi: part of conditional expression is always true: (relurl[0] == '/')
Fixes warning detected by PVS-Studio
Fixes #4374
2019-09-20 08:07:42 +02:00
Jens Finkhaeuser
0a4ecbdf1c
urlapi: CURLU_NO_AUTHORITY allows empty authority/host part
CURLU_NO_AUTHORITY is intended for use with unknown schemes (i.e. not
"file:///") to override cURL's default demand that an authority exists.

Closes #4349
2019-09-19 15:57:28 +02:00
Daniel Stenberg
9637dbfffd
urlapi: one colon is enough for the strspn() input (typo) 2019-09-10 11:51:51 +02:00
Daniel Stenberg
eab3c580f9
urlapi: verify the IPv6 numerical address
It needs to parse correctly. Otherwise it could be tricked into letting
through a-f using host names that libcurl would then resolve. Like
'[ab.be]'.

Reported-by: Thomas Vegas
Closes #4315
2019-09-10 11:32:12 +02:00
Omar Ramadan
c454d7f3f4
urlapi: increase supported scheme length to 40 bytes
The longest currently registered URI scheme at IANA is 36 bytes long.

Closes #3905
Closes #3900
2019-05-20 15:27:02 +02:00
Marcel Raad
10db3ef21e
lib: reduce variable scopes
Fixes Codacy/CppCheck warnings.

Closes https://github.com/curl/curl/pull/3872
2019-05-20 08:51:11 +02:00
Daniel Stenberg
9f9ec7da57
urlapi: require a non-zero host name length when parsing URL
Updated test 1560 to verify.

Closes #3880
2019-05-14 13:39:10 +02:00
Daniel Stenberg
2d0e9b40d3
urlapi: add CURLUPART_ZONEID to set and get
The zoneid can be used with IPv6 numerical addresses.

Updated test 1560 to verify.

Closes #3834
2019-05-05 15:52:46 +02:00
Daniel Stenberg
bdb2dbc103
urlapi: strip off scope id from numerical IPv6 addresses
... to make the host name "usable". Store the scope id and put it back
when extracting a URL out of it.

Also makes curl_url_set() syntax check CURLUPART_HOST.

Fixes #3817
Closes #3822
2019-05-03 12:17:22 +02:00
Daniel Stenberg
5fc28510a4
CURL_MAX_INPUT_LENGTH: largest acceptable string input size
This limits all accepted input strings passed to libcurl to be less than
CURL_MAX_INPUT_LENGTH (8000000) bytes, for these API calls:
curl_easy_setopt() and curl_url_set().

The 8000000 number is arbitrary picked and is meant to detect mistakes
or abuse, not to limit actual practical use cases. By limiting the
acceptable string lengths we also reduce the risk of integer overflows
all over.

NOTE: This does not apply to `CURLOPT_POSTFIELDS`.

Test 1559 verifies.

Closes #3805
2019-04-29 08:02:44 +02:00
Daniel Stenberg
d715d2ac89
urlapi: stricter CURLUPART_PORT parsing
Only allow well formed decimal numbers in the input.

Document that the number MUST be between 1 and 65535.

Add tests to test 1560 to verify the above.

Ref: https://github.com/curl/curl/issues/3753
Closes #3762
2019-04-13 11:17:30 +02:00
Jakub Zakrzewski
0dd47c2a3d
urlapi: urlencode characters above 0x7f correctly
fixes #3741
Closes #3742
2019-04-07 22:57:42 +02:00
Daniel Stenberg
05b100aee2
cleanup: make local functions static
urlapi: turn three local-only functions into statics

conncache: make conncache_find_first_connection static

multi: make detach_connnection static

connect: make getaddressinfo static

curl_ntlm_core: make hmac_md5 static

http2: make two functions static

http: make http_setup_conn static

connect: make tcpnodelay static

tests: make UNITTEST a thing to mark functions with, so they can be static for
normal builds and non-static for unit test builds

... and mark Curl_shuffle_addr accordingly.

url: make up_free static

setopt: make vsetopt static

curl_endian: make write32_le static

rtsp: make rtsp_connisdead static

warnless: remove unused functions

memdebug: remove one unused function, made another static
2019-02-10 18:38:57 +01:00
Daniel Stenberg
f260b9e932
urlapi: reduce variable scope, remove unreachable 'break'
Both nits pointed out by codacy.com

Closes #3540
2019-02-09 23:33:36 +01:00
Daniel Gustafsson
a4482b21bd urlapi: fix parsing ipv6 with zone index
The previous fix for parsing IPv6 URLs with a zone index was a paddle
short for URLs without an explicit port. This patch fixes that case
and adds a unit test case.

This bug was highlighted by issue #3408, and while it's not the full
fix for the problem there it is an isolated bug that should be fixed
regardless.

Closes #3411
Reported-by: GitYuanQu on github
Reviewed-by: Daniel Stenberg <daniel@haxx.se>
2018-12-30 20:11:57 +01:00
Leonardo Taccari
305d25ed8a
urlapi: distinguish possibly empty query
If just a `?' to indicate the query is passed always store a zero length
query instead of having a NULL query.

This permits to distinguish URL with trailing `?'.

Fixes #3369
Closes #3370
2018-12-13 10:21:33 +01:00
Daniel Gustafsson
d8607da1a6 urlapi: Fix port parsing of eol colon
A URL with a single colon without a portnumber should use the default
port, discarding the colon. Fix, add a testcase and also do little bit
of comment wordsmithing.

Closes #3365
Reviewed-by: Daniel Stenberg <daniel@haxx.se>
2018-12-12 11:48:04 +01:00
Daniel Gustafsson
e1be2ecba4 tests: add urlapi unittest
This adds a new unittest intended to cover the internal functions in
the urlapi code, starting with parse_port(). In order to avoid name
collisions in debug builds, parse_port() is renamed Curl_parse_port()
since it will be exported.

Reviewed-by: Daniel Stenberg <daniel@haxx.se>
Reviewed-by: Marcel Raad <Marcel.Raad@teamviewer.com>
2018-12-11 15:02:24 +01:00
Daniel Gustafsson
63533cbde2 urlapi: fix portnumber parsing for ipv6 zone index
An IPv6 URL which contains a zone index includes a '%%25<zode id>'
string before the ending ']' bracket. The parsing logic wasn't set
up to cope with the zone index however, resulting in a malformed url
error being returned. Fix by breaking the parsing into two stages
to correctly handle the zone index.

Closes #3355
Closes #3319
Reported-by: tonystz on Github
Reviewed-by: Daniel Stenberg <daniel@haxx.se>
Reviewed-by: Marcel Raad <Marcel.Raad@teamviewer.com>
2018-12-11 15:02:19 +01:00
Daniel Stenberg
dcd6f81025
snprintf: renamed and we now only use msnprintf()
The function does not return the same value as snprintf() normally does,
so readers may be mislead into thinking the code works differently than
it actually does. A different function name makes this easier to detect.

Reported-by: Tomas Hoger
Assisted-by: Daniel Gustafsson
Fixes #3296
Closes #3297
2018-11-23 08:26:51 +01:00
Daniel Stenberg
9aa8ff2895
urlapi: only skip encoding the first '=' with APPENDQUERY set
APPENDQUERY + URLENCODE would skip all equals signs but now it only skip
encoding the first to better allow "name=content" for any content.

Reported-by: Alexey Melnichuk
Fixes #3231
Closes #3231
2018-11-07 08:28:48 +01:00
Daniel Stenberg
9df8dc101b
url: a short host name + port is not a scheme
The function identifying a leading "scheme" part of the URL considered a
few letters ending with a colon to be a scheme, making something like
"short:80" to become an unknown scheme instead of a short host name and
a port number.

Extended test 1560 to verify.

Also fixed test203 to use file_pwd to make it get the correct path on
windows. Removed test 2070 since it was a duplicate of 203.

Assisted-by: Marcel Raad
Reported-by: Hagai Auro
Fixes #3220
Fixes #3233
Closes #3223
Closes #3235
2018-11-06 19:11:58 +01:00
Daniel Stenberg
d9abebc7ee
Revert "url: a short host name + port is not a scheme"
This reverts commit 226cfa8264.

This commit caused test failures on appveyor/windows. Work on fixing them is
in #3235.
2018-11-05 09:24:59 +01:00
Daniel Stenberg
226cfa8264
url: a short host name + port is not a scheme
The function identifying a leading "scheme" part of the URL considered a few
letters ending with a colon to be a scheme, making something like "short:80"
to become an unknown scheme instead of a short host name and a port number.

Extended test 1560 to verify.

Reported-by: Hagai Auro
Fixes #3220
Closes #3223
2018-11-03 15:01:27 +01:00
Daniel Stenberg
b28094833a
URL: fix IPv6 numeral address parser
Regression from 46e164069d. Extended test 1560 to verify.

Reported-by: tpaukrt on github
Fixes #3218
Closes #3219
2018-11-03 00:14:04 +01:00
Daniel Stenberg
d9a2dc9aad
urlapi: starting with a drive letter on win32 is not an abs url
... and libcurl doesn't support any single-letter URL schemes (if there
even exist any) so it should be fairly risk-free.

Reported-by: Marcel Raad

Fixes #3070
Closes #3071
2018-10-02 11:48:01 +02:00
Daniel Stenberg
2097cd5152
urlapi: fix support for address scope in IPv6 numerical addresses
Closes #3024
2018-09-21 11:19:14 +02:00