How to avoid formatting for XML, SQL etc. raw strings in clang-format?

I have some C++ code with embedded XML like this:
QLatin1String test()
{
return QLatin1String(R"XML(
<ui language="c++">
<widget class="Test" name="dialogbuttonbox">
<property name="text">
<string>DialogButtonBox</string>
</property>
</widget>
</ui>)XML");
}
When I format this with
clang-format --style=WebKit test1.cpp
the XML gets reformatted and broken, apparently because of how RawStringFormats is defined:
If no style has been defined in the .clang-format file for the
specific language, a predefined style given by ‘BasedOnStyle’ is used.
If ‘BasedOnStyle’ is not found, the formatting is based on llvm style.
So I tried the following within .clang-format:
RawStringFormats:
  - Language: None
    Delimiters: ['XML']
    DisableFormat: true
This results in the following error:
.clang-format:177:15: error: unknown enumerated scalar - Language: None
What I really want to avoid is adding comments to the source code to switch formatting off and on:
// clang-format off
// clang-format on
So my question is: how do I switch off formatting for raw strings containing XML, SQL, or anything else that clang-format does not support?
Also, what about the "unknown enumerated scalar" message above? Does it mean that Language: None is not allowed? If so, is this a feature request?

I had a similar issue.
My fix was to move the XML sections into their own files.
clang-format then doesn't touch them, and your regular VS Code XML formatter is used instead.
I then needed to set xml.format.maxLineWidth to 0, which means no line breaks are inserted because of line length.
.vscode/settings.json
"xml.format.maxLineWidth": 0
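A minimal sketch of the C++ side of this approach, assuming the XML moves into a separate file that is read at runtime with the standard library. The function name loadXml and the file path are hypothetical; in a Qt project like the question's, a Qt resource (.qrc) read through QFile would be the more usual choice, but this version sticks to std::ifstream so it stands alone.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the whole file at `path` into a string.
// The XML now lives in its own file, so clang-format never sees it.
std::string loadXml(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);  // binary: keep the bytes verbatim
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the entire file contents
    return buffer.str();
}
```

The test() function from the question would then call loadXml with the path of the XML file instead of returning a raw string literal, so the .cpp file contains nothing for clang-format to reformat. The trade-off is that the file must be found at runtime rather than being compiled into the binary.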

Related

SEC company filings: Is the <SEC-HEADER> tag valid SGML? If so, how to parse it?

I tried to parse SEC company filings from sec.gov. Starting from Facebook's (fb) 10-Q index.htm, let's look at the complete submission text filing. It has a structure like:
<SEC-DOCUMENT>
<SEC-HEADER>
<ACCEPTANCE-DATETIME>"some content" This tag is not closed.
"some lines resembling yaml markup"
These are indented lines with a
"key": "value" structure.
</SEC-HEADER>
<DOCUMENT>
.
.
some content
.
.
</DOCUMENT>
"several DOCUMENT tags" ...
</SEC-DOCUMENT>
I tried to figure out the structure of the <SEC-HEADER> tag, found some information in the Public Dissemination Service (PDS) Technical Specification (PDF), and concluded that the content of the header should be SGML.
Nevertheless, I am clueless about the formatting, since there are no angle brackets and the key-value pairs are separated by colons, as in key: value, instead of <key>value</key>. I could not find anything about colons in the PDF.
Question: Is the <SEC-HEADER> tag valid SGML? If it is, how to parse it?
I'd be glad for any help.
The short answer is no. The <SEC-HEADER> tag in the raw filing is not valid SGML.
However, it is my understanding that this section in the raw filing is parsed automatically from the header file <accession_num>.hdr.sgml, which does follow SGML. This header file can be found in the same directory as the raw filing (i.e., the <accession_num>.txt file).
I use a REGEX of the form: ^<(.+?)>(.+?)$ (with re.MULTILINE option) to capture each (tag, value) tuple and get the results directly in a dict().
I believe the only tag in that file that has a closing tag is the </FILER> tag, where there could be multiple filers in each filing. You can first extract those using a REGEX of the form: <FILER>(.+?)</FILER> and then employ the same REGEX as above to get the inner tags for each filer.
Note that other than 'FILER', there could be other tags, representing different relations of the entities to the filing. Those are 'ISSUER', 'SUBJECT COMPANY', 'FILED BY', 'FILED FOR', 'SERIAL COMPANY', 'REPORTING OWNER'.
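The answer's two-step approach (first cut out each <FILER>...</FILER> block, then collect the <TAG>value lines into a map) can be sketched in C++ with std::regex. This is a minimal sketch, not the answer author's code: the function names parseTags and extractBlocks are hypothetical, the pattern <([^>]+)>(.+) is matched against one line at a time with std::regex_match instead of relying on a multiline flag, and a later duplicate tag overwrites an earlier one, as Python's dict() would.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

using TagMap = std::map<std::string, std::string>;

// Collects every "<TAG>value" line into a map, like the answer's
// ^<(.+?)>(.+?)$ regex with re.MULTILINE. Lines holding only an opening or
// closing tag (e.g. "<FILER>" or "</FILER>") have no value and are skipped.
// A later duplicate tag overwrites an earlier one.
TagMap parseTags(const std::string& text)
{
    static const std::regex tagLine("<([^>]+)>(.+)");
    TagMap tags;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        std::smatch m;
        if (std::regex_match(line, m, tagLine)) {
            tags[m[1].str()] = m[2].str();
        }
    }
    return tags;
}

// Returns the text between each "<TAG>" and the next "</TAG>", e.g. every
// FILER block, mirroring the answer's <FILER>(.+?)</FILER> step.
std::vector<std::string> extractBlocks(const std::string& text, const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> blocks;
    std::string::size_type pos = 0;
    while ((pos = text.find(open, pos)) != std::string::npos) {
        const auto start = pos + open.size();
        const auto end = text.find(close, start);
        if (end == std::string::npos) {
            break;  // unterminated block: stop rather than guess
        }
        blocks.push_back(text.substr(start, end - start));
        pos = end + close.size();
    }
    return blocks;
}
```

For the other relationship blocks listed above, the same extractBlocks call should work when given the corresponding tag name instead of FILER, followed by parseTags on each block.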

Setting EnableHipHopSyntax to True with HHVM

When I run my code, I get the following error:
Syntax only allowed with -v Eval.EnableHipHopSyntax=true in /var/web/site/myfile.php on line 26
myfile.php has a function at that line:
public static function set (
string $theme // <str> The theme to set as active.
, string $style = "default" // <str> The style that you want to set.
, string $layout = "default" // <str> The layout that you want to assign.
): string // RETURNS <str>
The last line, ): string, is valid syntax in the Hack language, but for some reason HHVM decided to brilliantly disable its own syntax by default.
I can't seem to find any documentation for HHVM that indicates how to set that in the config file. How can one go about this?
Edit:
It turns out my HHVM conversion tool was not converting <?php to <?hh as I had instructed it to, due to having converted itself. In other words, it was attempting to convert <?hh to <?hh, which did me no good.
I had mistakenly assumed that HHVM was disabling it for <?hh tags, which was not the case.
This syntax is part of Hack, but you have a PHP file. If you change the opening tag from <?php to <?hh, it'll work.
Alternatively, you can add hhvm.enable_hip_hop_syntax = true to /etc/hhvm/php.ini.

Simple NSData category to parse XML with Cyrillic

I have to parse NSData containing an XML string. Does somebody know a simple category to do it? I have one for JSON, but I am forced to use XML. I tried XMLReader; its interface looks clean, but I found some issues:
Mysterious newline characters and spaces everywhere:
"comment_count" = {text = "\n \n 21";};
My Cyrillic characters look like this:
"description_text" = {text = "\n \U041f\U0438\U043a\U0430\U0431\U0443\U0448};
Example:
<?xml version="1.0" encoding="UTF-8" ?>
<news>
<xml_count>43</xml_count>
<hot_count>449</hot_count>
<item type="text">
<id>1469845</id>
<rating>147</rating>
<pluses>171</pluses>
<minuses>24</minuses>
<title>
<![CDATA[Обновление огромного архива Пикабу!]]>
</title>
<comment_count>26</comment_count>
<comment_link>http://pikabu.ru/story/obnovlenie_ogromnogo_arkhiva_pikabu_1469845</comment_link>
<author>icq677555</author>
<description_text>
<![CDATA[Пикабушники, я обновил свой огромный архив текстовых постов из горячего!]]>
</description_text>
</item>
</news>
I just realized what's going on. Your data samples are obviously NSDictionary instances printed in the debugger. So the issues you found are:
As XML was originally designed as an annotated text format, its whitespace (spaces, newlines) handling doesn't fit data-only usage perfectly. You can either trim all resulting strings ([stringVar stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]), adapt XMLReader to do it, or use the XML parser at http://ios.biomsoft.com/2011/09/11/simple-xml-to-nsdictionary-converter/ (which does it by default).
The funny output you get for Cyrillic characters is the proper escaping for non-ASCII characters in the debugger output (which uses the old-style property list format). It's an artifact of the debugger output. Your variables contain the proper characters.
BTW: While JSON contains implicit type information (strings are always quoted, numbers are never quoted etc.), XML without a schema file does not. So all the parsed simple values will be strings even if they originally were numbers.
Update:
The XML parser you're using still contains the old whitespace handling code described in Pesky new lines and whitespace in XML reader class (though the comment tells otherwise). Apply the fix mentioned at the bottom of the answer, namely change the line:
[dictInProgress setObject:textInProgress forKey:kXMLReaderTextNodeKey];
to:
[dictInProgress setObject:[textInProgress stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] forKey:kXMLReaderTextNodeKey];

iconv C API: charset conversion from/to local encoding

I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is: in the source code, the file iconv_open1.h shows that if the fromcode or tocode argument is an empty string (""), the locale encoding is used, obtained by calling locale_charset().
Someone also told me that to convert from the locale encoding to Unicode, all I needed was iconv_open("UTF-8", "").
Unfortunately, I find no mention of this in the documentation.
And when I convert some ISO-8859-1 text to the locale encoding (which is UTF-8 on my machine), I get errno=EILSEQ (illegal sequence) during conversion. I checked, and iconv_open returned no error.
If instead of the empty string in iconv_open I specify "utf-8", then I get no error. Obviously iconv failed to detect my current charset.
edit: I checked with a simple C program that calls puts(nl_langinfo(CODESET)), and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I have a problem with charset detection.
edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
additional information: my program is written in Ada, and I bind at link-time to C functions. Apparently, the locale setting is not initialized the same way in the Ada runtime and C runtime.
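The answer is the same as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to call
setlocale(LC_ALL, "");
first. Every C program starts in the "C" locale, whose codeset on glibc is ANSI_X3.4-1968, i.e. ASCII (hence the result above); this call switches the program to the locale specified by the environment (LC_ALL, LC_CTYPE, LANG).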
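A minimal C++ sketch of the fix, assuming glibc (where iconv takes a char** input pointer). Rather than relying on the empty-string shortcut that the question found in GNU libiconv's source but not in its documentation, it asks nl_langinfo(CODESET) for the locale's charset explicitly and passes that name to iconv_open. The function name latin1ToLocale is hypothetical.

```cpp
#include <cerrno>
#include <clocale>   // std::setlocale, which callers must invoke first
#include <iconv.h>
#include <langinfo.h>
#include <stdexcept>
#include <string>

// Converts ISO-8859-1 text to the charset of the current locale.
// Assumes std::setlocale(LC_ALL, "") has already been called, so that
// nl_langinfo(CODESET) reflects the environment instead of the "C" locale.
std::string latin1ToLocale(const std::string& input)
{
    const char* codeset = nl_langinfo(CODESET);  // e.g. "UTF-8"
    iconv_t cd = iconv_open(codeset, "ISO-8859-1");
    if (cd == (iconv_t)-1) {  // iconv_open's documented error value
        throw std::runtime_error(std::string("iconv_open failed for ") + codeset);
    }

    std::string in = input;  // iconv wants a non-const input buffer
    // Generous output size: UTF-8 needs at most 2 bytes per Latin-1 character.
    std::string out(in.size() * 4, '\0');
    char* inPtr = &in[0];
    std::size_t inLeft = in.size();
    char* outPtr = &out[0];
    std::size_t outLeft = out.size();

    const std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    const int err = errno;  // save before iconv_close can change it
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        // EILSEQ: a character could not be converted, as in the question
        // when the codeset was still ASCII.
        throw std::runtime_error(err == EILSEQ ? "iconv: EILSEQ" : "iconv failed");
    }
    out.resize(out.size() - outLeft);  // drop the unused tail
    return out;
}
```

Call std::setlocale(LC_ALL, "") once at startup, before the first conversion. Without it, nl_langinfo(CODESET) keeps reporting the ASCII codeset of the "C" locale, and ISO-8859-1 input outside ASCII fails with EILSEQ, as described in the question.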

What does "Error parsing XML: not well-formed" mean?

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
android:orientation=”vertical”
android:layout_width=”fill_parent”
android:layout_height=”fill_parent” >
I get these two errors:
error: Error parsing XML: not well-formed (invalid token)
&
Open quote is expected for attribute "android:orientation" associated with an element type "LinearLayout".
Did you copy and paste that from Word? Your quotes look a little funky. Sometimes Word uses a different character than the expected " for double quotes. Make sure those are all consistent; otherwise, the syntax is invalid.
Looks like you have "smart quotes" (” rather than simple " double quotes) around some attributes in your LinearLayout element.
There are many references that explain the differences between valid and well formed XML documents. A good starting point can be found here. There is also an online XML Validator that you can use to test XML documents.
The validator shows that you have two issues:
Some of your attribute values use an invalid quote character: ” vs. ", and
the LinearLayout element is never closed: end the start tag with /> if the element has no children, or add a matching </LinearLayout> end tag after its children.
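If a layout file contains many of these characters, a small C++ sketch like the following can straighten them before the file is parsed. It assumes the file is UTF-8 encoded (where U+201C and U+201D are the byte sequences E2 80 9C and E2 80 9D) and handles only double quotes; the function name straightenQuotes is hypothetical, and an editor's find-and-replace achieves the same thing. It does not fix the unclosed LinearLayout element.

```cpp
#include <string>

// Replaces typographic ("smart") double quotes, U+201C and U+201D, with the
// plain ASCII quote that XML requires around attribute values.
// Assumes UTF-8 input, where those characters are 3-byte sequences.
std::string straightenQuotes(const std::string& text)
{
    static const std::string left = "\xE2\x80\x9C";   // U+201C
    static const std::string right = "\xE2\x80\x9D";  // U+201D
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size();) {
        // compare() on a shorter tail near the end simply reports "not equal"
        if (text.compare(i, 3, left) == 0 || text.compare(i, 3, right) == 0) {
            result += '"';
            i += 3;
        } else {
            result += text[i];
            ++i;
        }
    }
    return result;
}
```

Applied to the question's android:orientation=”vertical” attribute, this yields android:orientation="vertical", which the XML parser accepts.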
