How can I efficiently parse formatted text from a file in Qt?

How can I efficiently parse formatted text from a file in Qt? - parsing

I would like to get efficient way of working with Strings in Qt. Since I am new in Qt environment.
So What I am doing:
I am loading a text file, and getting each lines.
Each line has text with comma separated.
Line schema:
Fname{limit:list:option}, Lname{limit:list:option} ... etc.
Example:
John{0:0:0}, Lname{0:0:0}
Notes:limit can be 1 or 0 and the same as others.
So I would like to get Fname and get limit,list,option values from {}.
I am thinking to write a code with find { and takes what is inside, by reading symbol by symbol.
What is the efficient way to parse that?
Thanks.

The following snippet will give you Fname and limit,list,option from the first set of brackets. It could be easily updated if you are interested in the Lname set as well.
QFile file("input.txt");
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
qDebug() << "Failed to open input file.";
QRegularExpression re("(?<name>\\w+)\\{(?<limit>[0-1]):(?<list>[0-1]):(?<option>[0-1])}");
while (!file.atEnd())
{
QString line = file.readLine();
QRegularExpressionMatch match = re.match(line);
QString name = match.captured("name");
int limit = match.captured("limit").toInt();
int list = match.captured("list").toInt();
int option = match.captured("option").toInt();
// Do something with values ...
}

Related

Implement heredocs with trim indent using PEG.js

I working on a language similar to ruby called gaiman and I'm using PEG.js to generate the parser.
Do you know if there is a way to implement heredocs with proper indentation?
xxx = <<<END
hello
world
END
the output should be:
"hello
world"
I need this because this code doesn't look very nice:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
this is a function where the user wants to return:
"xxx
xxx"
I would prefer the code to look like this:
def foo(arg) {
if arg == "here" then
return <<<END
xxx
xxx
END
end
end
If I trim all the lines user will not be able to use a string with leading spaces when he wants. Does anyone know if PEG.js allows this?
I don't have any code yet for heredocs, just want to be sure if something that I want is possible.
EDIT:
So I've tried to implement heredocs and the problem is that PEG doesn't allow back-references.
heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ marker {
return text.join('');
}
It says that the marker is not defined. As for trimming I think I can use location() function

I don't think that's a reasonable expectation for a parser generator; few if any would be equal to the challenge.
For a start, recognising the here-string syntax is inherently context-sensitive, since the end-delimiter must be a precise copy of the delimiter provided after the <<< token. So you would need a custom lexical analyser, and that means that you need a parser generator which allows you to use a custom lexical analyser. (So a parser generator which assumes you want a scannerless parser might not be the optimal choice.)
Recognising the end of the here-string token shouldn't be too difficult, although you can't do it with a single regular expression. My approach would be to use a custom scanning function which breaks the here-string into a series of lines, concatenating them as it goes until it reaches a line containing only the end-delimiter.
Once you've recognised the text of the literal, all you need to normalise the spaces in the way you want is the column number at which the <<< starts. With that, you can trim each line in the string literal. So you only need a lexical scanner which accurately reports token position. Trimming wouldn't normally be done inside the generated lexical scanner; rather, it would be the associated semantic action. (Equally, it could be a semantic action in the grammar. But it's always going to be code that you write.)
When you trim the literal, you'll need to deal with the cases in which it is impossible, because the user has not respected the indentation requirement. And you'll need to do something with tab characters; getting those right probably means that you'll want a lexical scanner which computes visible column positions rather than character offsets.
I don't know if peg.js corresponds with those requirements, since I don't use it. (I did look at the documentation, and failed to see any indication as to how you might incorporate a custom scanner function. But that doesn't mean there isn't a way to do it.) I hope that the discussion above at least lets you check the detailed documentation for the parser generator you want to use, and otherwise find a different parser generator which will work for you in this use case.

Here is the implementation of heredocs in Peggy successor to PEG.js that is not maintained anymore. This code was based on the GitHub issue.
heredoc = "<<<" begin:marker "\n" text:($any_char+ "\n")+ _ end:marker (
&{ return begin === end; }
/ '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`\\s{${min}}`);
return text.map(line => {
return line[0].replace(re, '');
}).join('\n');
}
any_char = (!"\n" .)
marker_char = (!" " !"\n" .)
marker "Marker" = $marker_char+
_ "whitespace"
= [ \t\n\r]* { return []; }
EDIT: above didn't work with another piece of code after heredoc, here is better grammar:
{ let heredoc_begin = null; }
heredoc = "<<<" beginMarker "\n" text:content endMarker {
const loc = location();
const min = loc.start.column - 1;
const re = new RegExp(`^\\s{${min}}`, 'mg');
return {
type: 'Literal',
value: text.replace(re, '')
};
}
__ = (!"\n" !" " .)
marker 'Marker' = $__+
beginMarker = m:marker { heredoc_begin = m; }
endMarker = "\n" " "* end:marker &{ return heredoc_begin === end; }
content = $(!endMarker .)*

Saxonica - .NET API - XQuery - XPDY0002: The context item for axis step root/descendant::xxx is absent

I'm getting same error as this question, but with XQuery:
SaxonApiException: The context item for axis step ./CLIENT is absent
When running from the command line, all is good. So I don't think there is a syntax problem with the XQuery itself. I won't post the input file unless needed.
The XQuery is displayed with a Console.WriteLine before the error appears:
----- Start: XQUERY:
(: FLWOR = For Let Where Order-by Return :)
<MyFlightLegs>
{
for $flightLeg in //FlightLeg
where $flightLeg/DepartureAirport = 'OKC' or $flightLeg/ArrivalAirport = 'OKC'
order by $flightLeg/ArrivalDate[1] descending
return $flightLeg
}
</MyFlightLegs>
----- End : XQUERY:
Error evaluating (<MyFlightLegs {for $flightLeg in root/descendant::FlightLeg[DepartureAirport = "OKC" or ArrivalAirport = "OKC"] ... return $flightLeg}/>) on line 4 column 20
XPDY0002: The context item for axis step root/descendant::FlightLeg is absent
I think that like the other question, maybe my input XML file is not properly specified.
I took the samples/cs/ExamplesHE.cs run method of the XQuerytoStream class.
Code there for easy reference is:
public class XQueryToStream : Example
{
public override string testName
{
get { return "XQueryToStream"; }
}
public override void run(Uri samplesDir)
{
Processor processor = new Processor();
XQueryCompiler compiler = processor.NewXQueryCompiler();
compiler.BaseUri = samplesDir.ToString();
compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");
XQueryExecutable exp = compiler.Compile("<saxon:example>{static-base-uri()}</saxon:example>");
XQueryEvaluator eval = exp.Load();
Serializer qout = processor.NewSerializer();
qout.SetOutputProperty(Serializer.METHOD, "xml");
qout.SetOutputProperty(Serializer.INDENT, "yes");
qout.SetOutputStream(new FileStream("testoutput.xml", FileMode.Create, FileAccess.Write));
Console.WriteLine("Output written to testoutput.xml");
eval.Run(qout);
}
}
I changed to pass the Xquery file name, the xml file name, and the output file name, and tried to make a static method out of it. (Had success doing the same with the XSLT processor.)
static void DemoXQuery(string xmlInputFilename, string xqueryInputFilename, string outFilename)
{
// Create a Processor instance.
Processor processor = new Processor();
// Load the source document
DocumentBuilder loader = processor.NewDocumentBuilder();
loader.BaseUri = new Uri(xmlInputFilename);
XdmNode indoc = loader.Build(loader.BaseUri);
XQueryCompiler compiler = processor.NewXQueryCompiler();
//BaseUri is inconsistent with Transform= Processor?
//compiler.BaseUri = new Uri(xqueryInputFilename);
//compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");
string xqueryFileContents = File.ReadAllText(xqueryInputFilename);
Console.WriteLine("----- Start: XQUERY:");
Console.WriteLine(xqueryFileContents);
Console.WriteLine("----- End : XQUERY:");
XQueryExecutable exp = compiler.Compile(xqueryFileContents);
XQueryEvaluator eval = exp.Load();
Serializer qout = processor.NewSerializer();
qout.SetOutputProperty(Serializer.METHOD, "xml");
qout.SetOutputProperty(Serializer.INDENT, "yes");
qout.SetOutputStream(new FileStream(outFilename,
FileMode.Create, FileAccess.Write));
eval.Run(qout);
}
Also two questions regarding "BaseURI".
1. Should it be a directory name, or can it be same as the Xquery file name?
2. I get this compile error: "Cannot implicity convert to "System.Uri" to "String".
compiler.BaseUri = new Uri(xqueryInputFilename);
It's exactly the same thing I did for XSLT which worked. But it looks like BaseUri is a string for XQuery, but a real Uri object for XSLT? Any reason for the difference?

You seem to be asking a whole series of separate questions, which are hard to disentangle.
Your C# code appears to be compiling the query
<saxon:example>{static-base-uri()}</saxon:example>
which bears no relationship to the XQuery code you supplied that involves MyFlightLegs.
The MyFlightLegs query uses //FlightLeg and is clearly designed to run against a source document containing a FlightLeg element, but your C# code makes no attempt to supply such a document. You need to add an eval.ContextItem = value statement.
Your second C# fragment creates an input document in the line
XdmNode indoc = loader.Build(loader.BaseUri);
but it doesn't supply it to the query evaluator.
A base URI can be either a directory or a file; resolving relative.xml against file:///my/dir/ gives exactly the same result as resolving it against file:///my/dir/query.xq. By convention, though, the static base URI of the query is the URI of the resource (eg file) containing the source query text.
Yes, there's a lot of inconsistency in the use of strings versus URI objects in the API design. (There's also inconsistency about the spelling of BaseURI versus BaseUri.) Sorry about that; you're just going to have to live with it.

Bottom line solution based on Michael Kay's response; I added this line of code after doing the exp.Load():
eval.ContextItem = indoc;
The indoc object created earlier is what relates to the XML input file to be processed by the XQuery.

cl_http_utility not normalizing my url. Why?

Via an enterpreise service consumer I connect to a webservice, which returns me some data, and also url's.
However, I tried all methods of the mentioned class above and NO METHOD seems to convert the unicode-characters inside my url into the proper readable characters.... ( in this case '=' and ';' ) ...
The only method, which runs properly is "is_valid_url", which returns false, when I pass url's like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?

It seems that this format is for json values. Usually = and & don't need to be written with the \u prefix. To decode all \u characters, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.

I hate to answer my own questions, but anyway, I found an own solution, via manually replacing those unicodes. It is similar to Sandra's idea, but able to convert ANY unicode.
I share it here, just in case, any person might also need it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.

Create a simple Lua .txt Database with write() and read()

im trying to create a simple 2 functions text-file "database" with LUA. All i need is 2 Functions.
my db should look like that:
varName;varValue
JohnAge;18
JohnCity;Munich
LarissaAge;21
LarissaCity;Berlin
In fact im not stuck to any format! I just don't have a way to save data longterm in my lua enviroment and i need to find a workaround. So if you already have a
similiar solution at hand, please feel free to throw it at me. Thank you very much
Function WriteToDB(varName, varValue)
If database.text contains a line that starts with varName
replace whatever comes after seperator ";" with varValue (but dont go into the next line)
Function ReadFromDB(varName)
If database.text contains a line that starts with varName
take whatever comes after the seperator ";" and return that (but dont go into the next line)
elseif not found print("error")

Save the data as Lua code that builds a table:
return {
JohnAge = 18,
JohnCity = "Munich",
LarissaAge = 21,
LarissaCity = "Berlin",
}
Or better yet
return {
["John"] = {Age = 18, City = "Munich"},
["Larissa"] = {Age = 21, City = "Berlin"},
}
Load the data with
db = dofile"db.lua"
Access the data with
print(db["Larissa"].Age)
or
print(db[name].Age)

Read lua interface

In lua, is there any way to read an interface file to extract name/methods/args?
I have an .idl file like this:
interface
{
name = myInterface,
methods = {
testing = {
resulttype = "double",
args = {{direction = "in",
type = "double"},
}
}
}
This is equal to the code bellow (easier to read):
interface myInterface {
double testing (in double a);
};
I can read file, load as string and parse with gmatch for example to extract information, but is there any easy mode to parse this info?
At the end i want something (a table for example) with the interface name, their methods, result types and args. Just to know the interface that i`m working.

Lua has several facilities to interpret chunks of code. Namely, dofile, loadfile and loadstring. Luckily, your input file is almost valid Lua code (assuming those braces were matched). The only thing that is problematic is interface {.
All of the above functions effectively create a function object with a file's or a string's contents as their code. dofile immediately executes that function, while the others return a function, which you can invoke whenever you like. Therefore, if you're free to change the files, replace interface in the first line with return. Then you can do:
local interface = dofile("input.idl")
And interface will be a nice table, just as you have specified it in the file. If you cannot change those files to your liking, you will have to load the file into the string, perform some string manipulation (specifically, replace the first interface with return) and then use loadstring instead:
io.input("input.idl")
local input = io.read("*all")
input = string.gsub(input, "^interface", "return") -- ^ marks beginning of string
local f = loadstring(input)
local interface = f()
In both cases this is what you will get:
> require"pl.pretty".dump(interface)
{
name = "myInterface",
methods = {
testing = {
args = {
{
type = "double",
direction = "in"
}
},
resulttype = "double"
}
}
}
> print(interface.methods.testing.args[1].type)
double
EDIT:
I just realised, in your example input myInterface is not enclosed in " and therefore not a proper string. Is that also a mistake in your input file or is that what your files actually look like? In the latter case, you would need to change that as well. Lua is not going to complain if it's a name it doesn't know, but you also won't get the field in that case.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How can I efficiently parse formatted text from a file in Qt? - parsing

Related

Implement heredocs with trim indent using PEG.js

Saxonica - .NET API - XQuery - XPDY0002: The context item for axis step root/descendant::xxx is absent

cl_http_utility not normalizing my url. Why?

Create a simple Lua .txt Database with write() and read()

Read lua interface

Categories

Resources