curl/lib/curl_multibyte.c
Viktor Szakats 2a292c3984
build: add Windows CE / CeGCC support, with CI jobs
Make it possible to build curl for Windows CE using the CeGCC toolchain.
With both CMake and autotools, including tests and examples, also in CI.
The build configuration is the default one with Schannel enabled. No
3rd-party dependencies have been tested.

Also revive old code to make Schannel build with Windows CE, including
certificate verification.

Builds have been throughougly tested. But, I've made no functional tests
for this PR. Some parts (esp. file operations, like truncate and seek)
are stubbed out and likely broken as a result. Test servers build, but
they do not work on Windows CE. This patch substitutes `fstat()` calls
with `stat()`, which operate on filenames, not file handles. This may or
may not work and/or may not be secure.

About CeGCC: I used the latest available macOS binary build v0.59.1
r1397 from 2009, in native `mingw32ce` build mode. CeGCC is in effect
MinGW + GCC 4.4.0 + old/classic-mingw Windows headers. It targets
Windows CE v3.0 according to its `_WIN32_WCE` value. It means this PR
restores portions of old/classic-mingw support. It makes the Windows CE
codepath compatible with GCC 4.4.0. It also adds workaround for CMake,
which cannot identify and configure this toolchain out of the box.

Notes:
- CMake doesn't recognize CeGCC/mingw32ce, necessitating tricks as seen
  with Amiga and MS-DOS.
- CMake doesn't set `MINGW` for mingw32ce. Set it and `MINGW32CE`
  manually as a helper variable, in addition to `WINCE` which CMake sets
  based on `CMAKE_SYSTEM_NAME`.
- CMake fails to create an implib for `libcurl.dll`, due to not
  recognizing the platform as a Windowsy one. This patch adds the
  necessary workaround to make it work.
- headers shipping with CeGCC miss some things curl needs for Schannel
  support. Fixed by restoring and renovating code previously deleted
  old-mingw code.
- it's sometime non-trivial to figure out if a fallout is WinCE,
  mingw32ce, old-mingw, or GCC version-specific.
- WinCE is always Unicode. With exceptions: no `wmain`,
  `GetProcAddress()`.
- `_fileno()` is said to convert from `FILE *` to `void *` which is
  a Win32 file `HANDLE`. (This patch doesn't use this, but with further
  effort it probably could be.)
  https://stackoverflow.com/questions/3989545/how-do-i-get-the-file-handle-from-the-fopen-file-structure
- WinCE has no signals, current directory, stdio/CRT file handles, no
  `_get_osfhandle()`, no `errno`, no `errno.h`. Some of this stuff is
  standard C89, yet missing from this platform. Microsoft expects
  Windows CE apps to use Win32 file API and `FILE *` exclusively.
- revived CeGCC here (not tested for this PR):
  https://building.enlyze.com/posts/a-new-windows-ce-x86-compiler-in-2024/

On `UNDER_CE` vs. `_WIN32_WCE`: (This patch settled on `UNDER_CE`)

- A custom VS2008 WinCE toolchain does not set any of these.
  The compiler binaries don't contain these strings, and has no compiler
  option for targeting WinCE, hinting that a vanilla toolchain isn't
  setting any of them either.
- `UNDER_CE` is automatically defined by the CeGCC compiler.
  https://cegcc.sourceforge.net/docs/details.html
- `UNDER_CE` is similar to `_WIN32`, except it's not set automatically
  by all compilers. It's not supposed to have any value, like a version.
  (Though e.g. OpenSSL sets it to a version)
- `_WIN32_WCE` is the CE counterpart of the non-CE `_WIN32_WINNT` macro.
  That does return the targeted Windows CE version.
- `_WIN32_WCE` is not defined by compilers, and relies on a header
  setting it to a default, or the build to set it to the desired target
  version. This is also how `_WIN32_WINNT` works.
- `_WIN32_WCE` default is set by `windef.h` in CeGCC.
- `_WIN32_WCE` isn't set to a default by MSVC Windows CE headers (the
  ones I checked at least).
- CMake sets `_WIN32_WCE=<ver>`, `UNDER_CE`, `WINCE` for MSVC WinCE.
- `_WIN32_WCE` seems more popular in other projects, including CeGCC
  itself. `zlib` is a notable exception amongst curl dependencies,
  which uses `UNDER_CE`.
- Since `_WIN32_WCE` needs "certain" headers to have it defined, it's
  undefined depending on headers included beforehand.
- `curl/curl.h` re-uses `_WIN32_WCE`'s as a self-guard, relying on
  its not-(necessarily)-defined-by-default property:
  25b445e479/include/curl/curl.h (L77)

Toolchain downloads:
- Windows:
  https://downloads.sourceforge.net/cegcc/cegcc/0.59.1/cegcc_mingw32ce_cygwin1.7_r1399.tar.bz2
- macOS Intel:
  https://downloads.sourceforge.net/cegcc/cegcc/0.59.1/cegcc_mingw32ce_snowleopard_r1397.tar.bz2

Closes #15975
2025-02-21 13:56:34 +01:00

358 lines
9.3 KiB
C

/***************************************************************************
* _ _ ____ _
* Project ___| | | | _ \| |
* / __| | | | |_) | |
* | (__| |_| | _ <| |___
* \___|\___/|_| \_\_____|
*
* Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
*
* This software is licensed as described in the file COPYING, which
* you should have received as part of this distribution. The terms
* are also available at https://curl.se/docs/copyright.html.
*
* You may opt to use, copy, modify, merge, publish, distribute and/or sell
* copies of the Software, and permit persons to whom the Software is
* furnished to do so, under the terms of the COPYING file.
*
* This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY
* KIND, either express or implied.
*
* SPDX-License-Identifier: curl
*
***************************************************************************/
/*
* This file is 'mem-include-scan' clean, which means memdebug.h and
* curl_memory.h are purposely not included in this file. See test 1132.
*
* The functions in this file are curlx functions which are not tracked by the
* curl memory tracker memdebug.
*/
#include "curl_setup.h"
#ifdef _WIN32
#include "curl_multibyte.h"
/*
* MultiByte conversions using Windows kernel32 library.
*/
wchar_t *curlx_convert_UTF8_to_wchar(const char *str_utf8)
{
wchar_t *str_w = NULL;
if(str_utf8) {
int str_w_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
str_utf8, -1, NULL, 0);
if(str_w_len > 0) {
str_w = malloc(str_w_len * sizeof(wchar_t));
if(str_w) {
if(MultiByteToWideChar(CP_UTF8, 0, str_utf8, -1, str_w,
str_w_len) == 0) {
free(str_w);
return NULL;
}
}
}
}
return str_w;
}
char *curlx_convert_wchar_to_UTF8(const wchar_t *str_w)
{
char *str_utf8 = NULL;
if(str_w) {
int bytes = WideCharToMultiByte(CP_UTF8, 0, str_w, -1,
NULL, 0, NULL, NULL);
if(bytes > 0) {
str_utf8 = malloc(bytes);
if(str_utf8) {
if(WideCharToMultiByte(CP_UTF8, 0, str_w, -1, str_utf8, bytes,
NULL, NULL) == 0) {
free(str_utf8);
return NULL;
}
}
}
}
return str_utf8;
}
#ifndef UNDER_CE
/* declare GetFullPathNameW for mingw-w64 UWP builds targeting old windows */
#if defined(CURL_WINDOWS_UWP) && defined(__MINGW32__) && \
(_WIN32_WINNT < _WIN32_WINNT_WIN10)
WINBASEAPI DWORD WINAPI GetFullPathNameW(LPCWSTR, DWORD, LPWSTR, LPWSTR *);
#endif
/* Fix excessive paths (paths that exceed MAX_PATH length of 260).
*
* This is a helper function to fix paths that would exceed the MAX_PATH
* limitation check done by Windows APIs. It does so by normalizing the passed
* in filename or path 'in' to its full canonical path, and if that path is
* longer than MAX_PATH then setting 'out' to "\\?\" prefix + that full path.
*
* For example 'in' filename255chars in current directory C:\foo\bar is
* fixed as \\?\C:\foo\bar\filename255chars for 'out' which will tell Windows
* it is ok to access that filename even though the actual full path is longer
* than 260 chars.
*
* For non-Unicode builds this function may fail sometimes because only the
* Unicode versions of some Windows API functions can access paths longer than
* MAX_PATH, for example GetFullPathNameW which is used in this function. When
* the full path is then converted from Unicode to multibyte that fails if any
* directories in the path contain characters not in the current codepage.
*/
static bool fix_excessive_path(const TCHAR *in, TCHAR **out)
{
size_t needed, count;
const wchar_t *in_w;
wchar_t *fbuf = NULL;
/* MS documented "approximate" limit for the maximum path length */
const size_t max_path_len = 32767;
#ifndef _UNICODE
wchar_t *ibuf = NULL;
char *obuf = NULL;
#endif
*out = NULL;
/* skip paths already normalized */
if(!_tcsncmp(in, _T("\\\\?\\"), 4))
goto cleanup;
#ifndef _UNICODE
/* convert multibyte input to unicode */
needed = mbstowcs(NULL, in, 0);
if(needed == (size_t)-1 || needed >= max_path_len)
goto cleanup;
++needed; /* for NUL */
ibuf = malloc(needed * sizeof(wchar_t));
if(!ibuf)
goto cleanup;
count = mbstowcs(ibuf, in, needed);
if(count == (size_t)-1 || count >= needed)
goto cleanup;
in_w = ibuf;
#else
in_w = in;
#endif
/* GetFullPathNameW returns the normalized full path in unicode. It converts
forward slashes to backslashes, processes .. to remove directory segments,
etc. Unlike GetFullPathNameA it can process paths that exceed MAX_PATH. */
needed = (size_t)GetFullPathNameW(in_w, 0, NULL, NULL);
if(!needed || needed > max_path_len)
goto cleanup;
/* skip paths that are not excessive and do not need modification */
if(needed <= MAX_PATH)
goto cleanup;
fbuf = malloc(needed * sizeof(wchar_t));
if(!fbuf)
goto cleanup;
count = (size_t)GetFullPathNameW(in_w, (DWORD)needed, fbuf, NULL);
if(!count || count >= needed)
goto cleanup;
/* prepend \\?\ or \\?\UNC\ to the excessively long path.
*
* c:\longpath ---> \\?\c:\longpath
* \\.\c:\longpath ---> \\?\c:\longpath
* \\?\c:\longpath ---> \\?\c:\longpath (unchanged)
* \\server\c$\longpath ---> \\?\UNC\server\c$\longpath
*
* https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats
*/
if(!wcsncmp(fbuf, L"\\\\?\\", 4))
; /* do nothing */
else if(!wcsncmp(fbuf, L"\\\\.\\", 4))
fbuf[2] = '?';
else if(!wcsncmp(fbuf, L"\\\\.", 3) || !wcsncmp(fbuf, L"\\\\?", 3)) {
/* Unexpected, not UNC. The formatting doc doesn't allow this AFAICT. */
goto cleanup;
}
else {
wchar_t *temp;
if(!wcsncmp(fbuf, L"\\\\", 2)) {
/* "\\?\UNC\" + full path without "\\" + null */
needed = 8 + (count - 2) + 1;
if(needed > max_path_len)
goto cleanup;
temp = malloc(needed * sizeof(wchar_t));
if(!temp)
goto cleanup;
wcsncpy(temp, L"\\\\?\\UNC\\", 8);
wcscpy(temp + 8, fbuf + 2);
}
else {
/* "\\?\" + full path + null */
needed = 4 + count + 1;
if(needed > max_path_len)
goto cleanup;
temp = malloc(needed * sizeof(wchar_t));
if(!temp)
goto cleanup;
wcsncpy(temp, L"\\\\?\\", 4);
wcscpy(temp + 4, fbuf);
}
free(fbuf);
fbuf = temp;
}
#ifndef _UNICODE
/* convert unicode full path to multibyte output */
needed = wcstombs(NULL, fbuf, 0);
if(needed == (size_t)-1 || needed >= max_path_len)
goto cleanup;
++needed; /* for NUL */
obuf = malloc(needed);
if(!obuf)
goto cleanup;
count = wcstombs(obuf, fbuf, needed);
if(count == (size_t)-1 || count >= needed)
goto cleanup;
*out = obuf;
obuf = NULL;
#else
*out = fbuf;
fbuf = NULL;
#endif
cleanup:
free(fbuf);
#ifndef _UNICODE
free(ibuf);
free(obuf);
#endif
return *out ? true : false;
}
int curlx_win32_open(const char *filename, int oflag, ...)
{
int pmode = 0;
int result = -1;
TCHAR *fixed = NULL;
const TCHAR *target = NULL;
#ifdef _UNICODE
wchar_t *filename_w = curlx_convert_UTF8_to_wchar(filename);
#endif
va_list param;
va_start(param, oflag);
if(oflag & O_CREAT)
pmode = va_arg(param, int);
va_end(param);
#ifdef _UNICODE
if(filename_w) {
if(fix_excessive_path(filename_w, &fixed))
target = fixed;
else
target = filename_w;
result = _wopen(target, oflag, pmode);
curlx_unicodefree(filename_w);
}
else
CURL_SETERRNO(EINVAL);
#else
if(fix_excessive_path(filename, &fixed))
target = fixed;
else
target = filename;
result = (_open)(target, oflag, pmode);
#endif
free(fixed);
return result;
}
FILE *curlx_win32_fopen(const char *filename, const char *mode)
{
FILE *result = NULL;
TCHAR *fixed = NULL;
const TCHAR *target = NULL;
#ifdef _UNICODE
wchar_t *filename_w = curlx_convert_UTF8_to_wchar(filename);
wchar_t *mode_w = curlx_convert_UTF8_to_wchar(mode);
if(filename_w && mode_w) {
if(fix_excessive_path(filename_w, &fixed))
target = fixed;
else
target = filename_w;
result = _wfopen(target, mode_w);
}
else
CURL_SETERRNO(EINVAL);
curlx_unicodefree(filename_w);
curlx_unicodefree(mode_w);
#else
if(fix_excessive_path(filename, &fixed))
target = fixed;
else
target = filename;
result = (fopen)(target, mode);
#endif
free(fixed);
return result;
}
int curlx_win32_stat(const char *path, struct_stat *buffer)
{
int result = -1;
TCHAR *fixed = NULL;
const TCHAR *target = NULL;
#ifdef _UNICODE
wchar_t *path_w = curlx_convert_UTF8_to_wchar(path);
if(path_w) {
if(fix_excessive_path(path_w, &fixed))
target = fixed;
else
target = path_w;
#ifndef USE_WIN32_LARGE_FILES
result = _wstat(target, buffer);
#else
result = _wstati64(target, buffer);
#endif
curlx_unicodefree(path_w);
}
else
CURL_SETERRNO(EINVAL);
#else
if(fix_excessive_path(path, &fixed))
target = fixed;
else
target = path;
#ifndef USE_WIN32_LARGE_FILES
result = _stat(target, buffer);
#else
result = _stati64(target, buffer);
#endif
#endif
free(fixed);
return result;
}
#endif /* UNDER_CE */
#endif /* _WIN32 */