Cross-platform way to handle std::string/std::wstring with std::filesystem::path - character-encoding

I have a sample piece of C++ code that is throwing an exception on Linux:
namespace fs = std::filesystem;
const fs::path pathDir(L"/var/media");
const fs::path pathMedia = pathDir / L"COMPACTO - Diogo Poças.mxf"; // <-- Exception thrown here
The exception being thrown is: filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
I surmise that the issue is related to the use of the ç character.
Why is this wide string (wchar_t) an "invalid or incomplete multibyte or wide character"?
Going forward, how do I make related code cross-platform so that it runs on Windows and/or Linux?
Are there helper functions I need to use?
What rules do I need to enforce from a programmer's PoV?
I've seen a response here that says "Don't use wide strings on Linux". Do I use the same rules for Windows?
Linux Environment (not forgetting the fact that I'd like to run cross-platform):
Ubuntu 18.04.3
gcc 9.2.1
C++17

Unfortunately std::filesystem was not written with operating system compatibility in mind, at least not as advertised.
For Unix-based systems, we need UTF-8 (u8"string", or just "string" depending on the compiler).
For Windows, we need UTF-16 (L"string").
In C++17 you can use filesystem::u8path (deprecated in C++20, where the path constructor accepts char8_t strings directly). On Windows, this converts UTF-8 to UTF-16, so the resulting path can be passed to the wide-character APIs.
#ifdef _WIN32
// Windows I/O setup: switch the standard streams to UTF-16 text mode
// (_setmode and _O_WTEXT come from <io.h> and <fcntl.h>)
_setmode(_fileno(stdin), _O_WTEXT);
_setmode(_fileno(stdout), _O_WTEXT);
#endif
fs::path path = fs::u8path(u8"ελληνικά.txt");
#ifdef _WIN32
std::wcout << "UTF16: " << path << std::endl;
#else
std::cout << "UTF8: " << path << std::endl;
#endif
Or use your own macro to set UTF16 for Windows (L"string"), and UTF8 for Unix based systems (u8"string" or just "string"). Make sure UNICODE is defined for Windows.
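#ifdef _WIN32
// Windows: wide (UTF-16) literals and wide output stream
#define _TEXT(quote) L##quote
#define _tcout std::wcout
#else
// Unix-based systems: UTF-8 literals and narrow output stream
#define _TEXT(quote) u8##quote
#define _tcout std::cout
#endif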
fs::path path(_TEXT("ελληνικά.txt"));
_tcout << path << std::endl;
See also
https://en.cppreference.com/w/cpp/filesystem/path/native
Note, Visual Studio has a special constructor for std::fstream which allows using UTF16 filename, and it's compatible for UTF8 read/write. For example the following code will work in Visual Studio:
fs::path utf16 = fs::u8path(u8"UTF8 filename ελληνικά.txt");
std::ofstream fout(utf16);
fout << u8"UTF8 content ελληνικά";
I am not sure if that's supported on latest gcc versions running on Windows.
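The u8path approach above can be wrapped in a pair of small helpers so that the rest of the code only ever handles UTF-8 std::string values. This is a minimal sketch, not a definitive implementation: the names path_from_utf8 and path_to_utf8 are hypothetical, and it assumes the input really is valid UTF-8. The feature-test macro __cpp_lib_char8_t is used to avoid the deprecated u8path when compiling as C++20.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from a UTF-8 encoded std::string.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: u8path is deprecated; the path constructor treats char8_t text as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path interprets the bytes as UTF-8 on every platform.
    return fs::u8path(utf8);
#endif
}

// Hypothetical helper: return the path as UTF-8 bytes in a std::string.
std::string path_to_utf8(const fs::path& p) {
    const auto u8 = p.u8string();  // std::string in C++17, std::u8string in C++20
    return std::string(u8.begin(), u8.end());
}
```

With these, a call such as path_from_utf8("/var/media") / path_from_utf8(name) stays free of wide literals on both platforms, and the conversion to UTF-16 on Windows happens inside std::filesystem::path.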

Looks like a GCC bug.
According to the documentation for std::filesystem::path::path, you should be able to call the std::filesystem::path constructor with a wide-character string, independent of the underlying platform (that's the whole point of std::filesystem).
Clang shows correct behavior.

Related

QtCreator annotation compiler does not find stdbool.h

I'm using QtCreator 4.11.2, installed via MSYS2, with ClangCodeModel enabled.
Here is my program (this is the result of creating a New Non-QT Plain C Application):
#include <stdio.h>
#include <stdbool.h>
_Bool a;
bool b;
int main()
{
printf("Hello World!\n");
return 0;
}
The .pro file is unchanged from the default:
TEMPLATE = app
CONFIG += console
CONFIG -= app_bundle
CONFIG -= qt
SOURCES += \
main.c
The annotation compiler highlights an error saying stdbool.h cannot be found.
But it does not give an error for _Bool a; so it is clearly running in C99 mode but has some problem with include paths. The "Follow symbol under cursor" option works, opening stdbool.h.
My question is: How do I configure include paths for the annotation compiler or otherwise fix this problem?
I have been unable to figure out how to set options for the annotation compiler, or even which compiler binary it is using. Under Tools > Options > C++ > Code Model > Diagnostic Configuration it lets me add -W flags but does not let me add -I flags; a red message pops up saying the option is invalid.
Under Tools > Options > C++ Code Model inspector, there are no diagnostic messages, and the Code Model Inspecting Log shows stdbool.h being correctly found and parsed, as msys64/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/stdbool.h.
If I disable the ClangCodeModel plugin then there are no errors, but I would like to use the clang version if it can be made to work, as in general it has good diagnostics.
The result of clang --version in a shell prompt is:
clang version 10.0.0 (https://github.com/msys2/MINGW-packages.git 3f880aaba91a3d9cdfb222dc270274731a2119a9)
Target: x86_64-w64-windows-gnu
Thread model: posix
InstalledDir: F:\Prog\msys64\mingw64\bin
and if I compile this same source code using clang outside of QtCreator, it compiles and runs correctly with no diagnostics. So the annotation compiler is clearly not the same as the command-line clang?
The Kit I have selected in QtCreator is the autodetected Desktop Qt MinGW-w64 64bit (MSYS2)
The exact same symptoms occur if I make a Plain C++ project and try to include stdbool.h (which is required to exist by the C++ Standard, although deprecated), although interestingly it does accept <cstdbool>.
I have found a workaround of sorts: including in the .pro file the line:
INCLUDEPATH += F:/Prog/msys64/mingw64/lib/gcc/x86_64-w64-mingw32/9.3.0/include/
causes the annotation compiler to work correctly, however this is undesirable as I'd have to keep changing it whenever I switch Kits because it also passes this to the actual build compiler, not just the annotation compiler.
Create a file named stdbool.h in C:\msys64\mingw64\x86_64-w64-mingw32\include and paste in this code:
/* Copyright (C) 1998-2017 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
/*
* ISO C Standard: 7.16 Boolean type and values <stdbool.h>
*/
#ifndef _STDBOOL_H
#define _STDBOOL_H
#ifndef __cplusplus
#define bool _Bool
#define true 1
#define false 0
#else /* __cplusplus */
/* Supporting _Bool in C++ is a GCC extension. */
#define _Bool bool
#if __cplusplus < 201103L
/* Defining these macros in C++98 is a GCC extension. */
#define bool bool
#define false false
#define true true
#endif
#endif /* __cplusplus */
/* Signal that all the definitions are present. */
#define __bool_true_false_are_defined 1
#endif /* stdbool.h */
Note
Creating stdbool.h by hand works for me, but it's a sketchy and temporary solution. Don't use it if you feel it's too sketchy. I would rather use an alternative solution than this hack if one exists.

puts(NULL) - why doesn't WP+RTE complain?

Consider this small C file:
#include <stdio.h>
void f(void) {
puts(NULL);
}
I'm running the WP and RTE plugins of Frama-C like this:
frama-c-gui puts.c -wp -rte -wp-rte
I would expect this code to generate a proof obligation of valid_read_string(NULL); or similar, which would be obviously unprovable. However, to my surprise, no such thing happens. Is this a deficiency in the ACSL specification of the standard library?
Basically yes. You can see in the version of stdio.h that is bundled with Frama-C that the specification for puts is
/*@ assigns *stream \from s[..]; */
extern int fputs(const char * restrict s,
FILE * restrict stream);
i.e. the bare minimum: an assigns clause (plus a \from clause for Eva). There are no preconditions on s or stream. Adding a precondition on s would be easy; things are more complex for stream, since you need a model for the various objects of type FILE.

Getting bison parser to divulge debug information

I am having trouble writing a bison parser, and unexpectedly ran into difficulties getting the parser to print debug information. I found two solutions on the web, but neither seems to work.
This advocates to put this code in the main routine:
extern int yydebug;
yydebug = 1;
Unfortunately the C++ compiler detects an undefined reference to `yydebug'.
This suggests putting
#if YYDEBUG == 1
extern yydebug;
yydebug = 1;
#endif
into the grammar file. It compiles but does not produce output.
What does work is to edit the parser file itself, replacing
int yydebug;
by
int yydebug = 1;
The big disadvantage is that I have to redo this every time I change the grammar file, which during debugging would happen constantly. Is there any other way I can provoke the parser into coughing up its secret machinations?
I am using bison v2.4.1 to generate the parser, with the following command-line options:
bison -ldv -p osil -o $(srcdir)/OSParseosil.tab.cpp OSParseosil.y
Although the output is a C++ file, I am using the standard C skeleton.
With bison and the standard C skeleton, to enable debug support you need to do one of the following:
Use the -t (Posix) or --debug (Bison extension) command-line option when you create your grammar (bison -t ...).
Use the -DYYDEBUG=1 command-line option (gcc or clang, at least) when you compile the generated grammar (gcc -DYYDEBUG=1 parser.tab.c ...).
Add the %debug directive to your bison source.
Put #define YYDEBUG 1 in the prologue of your bison source (the part of the file between %{ and %}).
I'd use -t in the bison command line. It's simple, and since it is Posix standard it probably will also work on other derived parser generators. However, adding %debug to the bison source is also simple; while it is not as portable, it works in bison 2.4.
Once you've done that, simply setting yydebug to a non-zero value is sufficient to produce debug output.
If you want to set yydebug in some translation unit other than the generated parser itself, you need to be aware of the parser prefix you declared in the bison command line. (In the parser itself, yydebug is #defined to the prefixed name.) And you need to declare the debug variable (with the correct prefix) as extern. So in your main, you probably want to use:
extern int osildebug;
// ...
int main(int argc, char** argv) {
osildebug = 1;
// ...
}
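The prefix mapping described above can be illustrated without a real grammar. This is only a sketch: the definition of osildebug here is a stand-in for the one the generated parser provides when it is built with debug support, the #define imitates what the skeleton emits for -p osil, and set_parser_trace is a hypothetical helper name.

```cpp
// Stand-in for the variable that the generated parser defines when it is
// built with debug support and "-p osil". In a real build, remove this
// definition and keep only the extern declaration below.
int osildebug = 0;

// What a separate translation unit (e.g. the one containing main) declares.
extern int osildebug;

// Inside the generated parser, the generic name is mapped to the prefixed
// one, roughly like this:
#define yydebug osildebug

// Hypothetical helper: switch parser tracing on or off before parsing.
void set_parser_trace(bool enabled) {
    yydebug = enabled ? 1 : 0;  // expands to: osildebug = ...
}
```

Outside the parser there is no such #define, which is why a plain extern int yydebug; fails to link when a prefix is in use.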
If you're using bison, your best place to find information is the bison manual; most of the above answer will be found in that page.

JEDI JCL runtime compiler error E2040 when using JclWin32.hpp

I have installed the current stable JEDI Code Library in C++ Builder XE3 on Windows 7 x32. It works fine, but only as long as I don't include files like JclFileUtils.hpp, which include JclWin32.hpp. Then I always get the compiler error E2040: "Declaration terminated incorrectly" (in file JclWin32.hpp, line 682, second line in the following code snippet):
#define NetApi32 L"netapi32.dll"
static const System::Int8 CSIDL_PROGRAM_FILESX86 = System::Int8(0x2a);
#define RT_MANIFEST (System::WideChar *)(0x18)
I have no idea where this error comes from, nor could I find any hints about it. What could be the cause? Thanks in advance.
I got help and the solution for this problem. Just replace the static const declaration:
static const System::Int8 CSIDL_PROGRAM_FILESX86 = System::Int8(0x2a);
with this macro definition:
#define CSIDL_PROGRAM_FILESX86 0x2a
This is a bug in JclWin32.pas.
In C/C++, the Win32 API declares CSIDL values in Microsoft's shlobj.h header using preprocessor #define statements, eg:
#define CSIDL_PROGRAM_FILESX86 0x002a
After the preprocessor is run and performs #define symbol replacements, the compiler ends up seeing the following invalid declaration in JclWin32.hpp:
static const System::Int8 0x002a = System::Int8(0x2a);
JCL should not be re-declaring CSIDL_PROGRAM_FILESX86 (or any other CSIDL value) at all. It should be either:
using Delphi's own Winapi.ShlObj unit, which already declares CSIDL values.
if not using the Winapi.ShlObj unit, then it should at least be declaring its manual CSIDL values as {$EXTERNALSYM} so they do not appear in the generated JclWin32.hpp file. If needed, JCL can include an {$HPPEMIT '#include <shlobj.h>'} statement to pull in the existing Win32 API declarations for C/C++ projects to use.
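The macro collision described above can be reproduced in a few lines. This is a minimal sketch rather than JCL's actual fix: the #define stands in for the one in shlobj.h, and both the #ifndef guard and the function name program_files_x86_csidl are hypothetical, shown only to illustrate why the declaration must not be emitted when the macro already exists.

```cpp
// Stand-in for the definition in Microsoft's shlobj.h.
#define CSIDL_PROGRAM_FILESX86 0x002a

// After preprocessing, an unguarded declaration such as
//   static const int CSIDL_PROGRAM_FILESX86 = 0x2a;
// would read "static const int 0x002a = 0x2a;" and fail to compile.
// Hypothetical guard: declare the constant only if no macro has that name.
#ifndef CSIDL_PROGRAM_FILESX86
static const int CSIDL_PROGRAM_FILESX86 = 0x2a;
#endif

// Works whether the name is a macro or a constant.
int program_files_x86_csidl() {
    return CSIDL_PROGRAM_FILESX86;
}
```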

Cannot link a minimal Lua program

I have the following trivial Lua program which I copied from the book Programming In Lua
#include <stdio.h>
#include <string.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
int main (void)
{
char buff[256];
int error;
lua_State *L = luaL_newstate(); /* opens Lua */
luaL_openlibs(L); /* opens the standard libraries */
while (fgets(buff, sizeof(buff), stdin) != NULL)
{
error = luaL_loadbuffer(L, buff, strlen(buff), "line") ||
lua_pcall(L, 0, 0, 0);
if (error)
{
fprintf(stderr, "%s", lua_tostring(L, -1));
lua_pop(L, 1); /* pop error message from the stack */
}
}
lua_close(L);
return 0;
}
My environment is Cygwin.
My makefile looks like this:
CC=gcc
INCLUDE='-I/home/xyz/c_drive/Program Files/Lua/5.1/include'
LINKFLAGS='-L/home/xyz/c_drive/Program Files/Lua/5.1/lib' -llua51
li.o:li.c
$(CC) $(INCLUDE) -c li.c
main:li.o
$(CC) -o main $(LINKFLAGS) li.o
clean:
rm *.o
rm main
My /home/xyz/c_drive/Program Files/Lua/5.1/lib directory contains lua5.1.dll lua5.1.lib lua51.dll and lua51.lib
Trying to build my main target I am getting the following errors:
li.o:li.c:(.text+0x35): undefined reference to `_luaL_newstate'
li.o:li.c:(.text+0x49): undefined reference to `_luaL_openlibs'
li.o:li.c:(.text+0xaf): undefined reference to `_luaL_loadbuffer'
li.o:li.c:(.text+0xd9): undefined reference to `_lua_pcall'
li.o:li.c:(.text+0x120): undefined reference to `_lua_tolstring'
li.o:li.c:(.text+0x154): undefined reference to `_lua_settop'
li.o:li.c:(.text+0x167): undefined reference to `_lua_close'
Any ideas about what I might be doing wrong here?
The problem is that you have named the libraries on the link command line before the object files that require them. The linker loads modules from left to right on the command line. At the point on the line where you name -llua51, no undefined symbols that could be satisfied by that library are known. Then you name li.o, which does have unknown symbols.
Some Unix-like environments don't treat this as an error because part of the link process is deferred to program load, when references to .so files are satisfied. But Cygwin, MinGW, and Windows in general must treat this as an error because DLLs work quite differently from .so files.
The solution is to put -llua51 after all the .o files on your link line, for example: $(CC) -o main li.o $(LINKFLAGS)
Edit: Incidentally, it appears you are linking against the Lua for Windows distribution, but building with GCC under Cygwin. You will want to use Dependency Walker to make sure that your program does not depend on the Cygwin runtime, and that it does depend on the same C runtime as the lua51.dll from Lua for Windows. IIRC, that will be the runtime for the previous version of Visual Studio. It is possible to make GCC link against that, but you will need to be using the MinGW port (which you can use from Cygwin), and link against a couple of specific libraries to get that version. I'm away from my usual PC, or I'd quote an exact link line. (I believe you need -lmoldname -lmsvcr80 or something like that, as the last items on the link line.)
Using more than one C runtime library will cause mysterious and very hard to diagnose problems. The easy answer is to use the same one as your preferred Lua DLL. Another alternative is that the Lua Binaries project has pre-compiled Lua DLLs for a wide array of C toolchains on Windows. If you need a Lua application that understands the Cygwin environment, you will want one that is built by GCC for Cygwin and not the Lua for Windows flavor. Lua Binaries will be your friend, or you can build Lua yourself from source.
The names in the Lua API do not have those leading underscores. Try compiling with -fno-leading-underscore.

Resources