Parsing an integer and HEX value in Ragel - parsing

I am trying to design a parser using Ragel and C++ as host langauge.
There is a particular case where a parameter can be defined in two formats :
a. Integer : eg. SignalValue = 24
b. Hexadecimal : eg. SignalValue = 0x18
I have the below code to parse such a parameter :
INT = ((digit+)$incr_Count) %get_int >!(int_error); #[0-9]
HEX = (([0].'x'.[0-9A-F]+)$incr_Count) %get_hex >!(hex_error); #[hexadecimal]
SIGNAL_VAL = ( INT | HEX ) %/getSignalValue;
However in the above defined parser command, only the integer values(as defined in section a) gets recognized and parsed correctly.
If an hexadecimal number(eg. 0x24) is provided, then the number gets stored as ´0´ . There is no error called in case of hexadecimal number. The parser recognizes the hexadecimal, but the value stored is '0'.
I seem to be missing out some minor details with Ragel. Has anyone faced a similar situation?
The remaning part of the code :
//Global
int lInt = -1;
action incr_Count {
iGenrlCount++;
}
action get_int {
int channel = 0xFF;
std::stringstream str;
while(iGenrlCount > 0)
{
str << *(p - iGenrlCount);
iGenrlCount--;
}
str >> lInt; //push the values
str.clear();
}
action get_hex {
std::stringstream str;
while(iGenrlCount > 0)
{
str << std::hex << *(p - iGenrlCount);
iGenrlCount--;
}
str >> lInt; //push the values
}
action getSignalValue {
cout << "lInt = " << lInt << endl;
}

It's not a problem with your FSM (which looks fine for the task you have), it's more of a C++ coding issue. Try this implementation of get_hex():
action get_hex {
std::stringstream str;
cout << "get_hex()" << endl;
while(iGenrlCount > 0)
{
str << *(p - iGenrlCount);
iGenrlCount--;
}
str >> std::hex >> lInt; //push the values
}
Notice that it uses str just as a string buffer and applies std::hex to >> from std::stringstream to int. So in the end you get:
$ ./a.out 245
lInt = 245
$ ./a.out 0x245
lInt = 581
Which probably is what you want.

Related

How to print a String in a C++ Builder console application?

I am trying to print the contents of a String in a console application. I am doing a test and would like to visualize the content for debugging purposes.
Here is my code:
bool Tests::test001() {
std::string temp;
CDecoder decoder; // Create an instance of the CDecoder class
String input = "60000000190210703800000EC00000164593560001791662000000000000080000000002104302040235313531353135313531353153414C4535313030313233343536373831323334353637383930313233";
String expected_output = "6000000019";
String output = decoder.getTPDU(input); // Call the getTPDU method
std::cout << "Expected :" << expected_output.t_str() <<std::endl;
std::cout << "Obtained :" << output.t_str() <<std::endl;
return output == expected_output; // Return true if the output is as expected, false otherwise
}
This is what I get:
Running test: 0
Expected :024B8874
Obtained :00527226
Test Fail
Press any key to continue...
This is what I want to get:
Running test: 0
Expected :6000000019
Obtained :0000001902
Test Fail
Press any key to continue...
Here the Obtained value is a substring of the input I chose randomly (a shift to the left by two characters).
Whether I use t_str() or c_str() the result is the same.
In C++Builder 2009 and later, String (aka System::String) is an alias (ie, a typedef) for System::UnicodeString, which is a UTF-16 string type based on wchar_t on Windows and char16_t on other platforms.
Also, the UnicodeString::t_str() method has been deprecated since around C++Builder 2010. In modern versions, it just returns the same pointer as the UnicodeString::c_str() method.
You can't print a UnicodeString's characters using std::cout. You are getting memory addresses printed out instead of characters, because std::cout does not have an operator<< defined for wchar_t*/char16_t* pointers, but it does have one for void* pointers.
You need to use std::wcout instead, eg:
std::wcout << L"Expected :" << expected_output.c_str() << std::endl;
std::wcout << L"Obtained :" << output.c_str() << std::endl;
If you want to use std::cout, you will have to convert the String values to either System::UTF8String (and put the console into UTF-8 mode) or System::AnsiString instead, eg:
std::cout << "Expected :" << AnsiString(expected_output).c_str() << std::endl;
std::cout << "Obtained :" << AnsiString(output).c_str() << std::endl;
This seems to do the work:
std::wcout
Here is the working code:
// Member function to run test001
bool Tests::test001() {
std::string temp;
CDecoder decoder; // Create an instance of the CDecoder class
String input = "60000000190210703800000EC00000164593560001791662000000000000080000000002104302040235313531353135313531353153414C4535313030313233343536373831323334353637383930313233";
String expected_output = "6000000019";
String output = decoder.getTPDU(input); // Call the getTPDU method
std::wcout << "Expected: " << expected_output <<std::endl;
std::wcout << "Obtained: " << output <<std::endl;
return output == expected_output; // Return true if the output is as expected, false otherwise

String Newline not displaying in Doors

I have csv file containing some data like:
374,Test Comment multiplelines \n Here's the 2nd line,Other_Data
Where 374 is the object ID from doors, then some commentary and then some other data.
I have a piece of code that reads the data from the CSV file, stores it in the appropriate variables and then writes it to the doors Object.
Module Openend_module = edit("path_to_mod", true,true,true)
Object o ;
Column c;
string attrib;
string oneLine ;
string OBJECT_ID = "";
string Comment = "";
String Other_data = "";
int offset;
string split_text(string s)
{
if (findPlainText(s, sub, offset, len, false))
{
return s[0 : offset -1]
}
else
{
return ""
}
}
Stream input = read("Path_to_Input.txt");
input >> oneLine
OBJECT_ID = split_text(oneLine)
oneLine = oneLine[offset+1:]
Comment = split_text(oneLine)
Other_data = oneLine[offset+1:]
When using print Comment the output in the DXL console is : Test Comment multiplelines \n Here's the 2nd line
for o in Opened_Module do
{
if (o."Absolute Number"""==OBJECT_ID ){
attrib = "Result_Comment " 2
o.attrib = Comment
}
}
But after writing to the doors object, the \n is not taken into consideration and the result is as follows:
I've tried putting the string inside a Buffer and using stringOf() but the escape character just disappeared.
I've also tried adding \r\n and \\n to the input csv text but still no luck
This isn't the most efficient way of handling this, but I have a relatively straightforward fix.
I would suggest adding the following:
Module Openend_module = edit("path_to_mod", true,true,true)
Object o ;
Column c;
string attrib;
string oneLine ;
string OBJECT_ID = "";
string Comment = "";
String Other_data = "";
int offset;
string split_text(string s)
{
if (findPlainText(s, sub, offset, len, false))
{
return s[0 : offset -1]
}
else
{
return ""
}
}
Stream input = read("Path_to_Input.txt");
input >> oneLine
OBJECT_ID = split_text(oneLine)
oneLine = oneLine[offset+1:]
Comment = split_text(oneLine)
Other_data = oneLine[offset+1:]
//Modification to comment string
int x
int y
while ( findPlainText ( Comment , "\\n" , x , y , false ) ) {
Comment = ( Comment [ 0 : x - 1 ] ) "\n" ( Comment [ x + 2 : ] )
}
This will run the comment string through a parser, replacing string "\n" with the char '\n'. Be aware- this will ignore any trailing spaces at the end of a line.
Let me know if that helps.

Boost Spirit Qi parser not working when factored out into qi::grammar

I'm playing around with Spirit parsers to parse strings like id 1234. It works perfectly with an inline start = qi::lit("id") >> qi::int_; but not if I want to put that into a separate qi::grammar-based struct. See cases 1, 2 and 3 in the below example:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <fstream>
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
struct Context{};
template <typename Iterator, typename SkipParser>
struct IdGrammar : qi::grammar<Iterator, SkipParser>
{
explicit IdGrammar(Context& out) : IdGrammar::base_type(start, "IdGrammar")
{
start = qi::lit("id") >> qi::int_;
}
qi::rule<Iterator, SkipParser> start;
};
template <typename Iterator, typename SkipParser>
struct MyGrammar : qi::grammar<Iterator, SkipParser>
{
explicit MyGrammar(Context& out)
: MyGrammar::base_type(start, "MyGrammar")
{
IdGrammar<Iterator, SkipParser> idGrammar(out);
// start = idGrammar >> *(',' >> idGrammar); // 1 = Parsing fails
start = idGrammar; // 2 = Parsing fails
// start = qi::lit("id") >> qi::int_; // 3 = Parsing succeeds
start.name("the start");
qi::on_error<qi::fail>(
start,
phoenix::ref(std::cout) << phoenix::val("Parsing error: expecting ") << qi::_4 // what failed?
<< phoenix::val(" here: \"")
<< phoenix::construct<std::string>(qi::_3, qi::_2) // iterators to error-pos, end
<< phoenix::val("\"")
<< std::endl);
}
qi::rule<Iterator, SkipParser> start;
};
int main()
{
typedef std::string::const_iterator iterator_type;
Context ctx;
MyGrammar<iterator_type, qi::space_type> roman_parser(ctx); // Our grammar
std::string str = "id 5012";
iterator_type iter = str.begin(), end = str.end();
bool r = phrase_parse(iter, end, roman_parser, qi::space);
if (r && iter == end)
{
std::cout << "Parsing succeeded\n";
}
else
{
std::string rest(iter, end);
std::cout << "Parsing failed\n";
std::cout << "stopped at: \"" << rest << "\"\n";
}
}
Run example on Coliru
The output for the failing cases (1 and 2) is:
Parsing failed
stopped at: "id 5012"
What makes the difference here? Mind that I removed any assignment of the integer result to keep the example minimal - assuming that is unrelated to the problem.
The lifetime of idGrammar must be longer than just the constructor scope.
Make it a member variable:
template <typename Iterator, typename SkipParser>
struct MyGrammar : qi::grammar<Iterator, SkipParser>
{
explicit MyGrammar(Context& out)
: MyGrammar::base_type(start, "MyGrammar")
, idGrammar(out)
{
start = idGrammar >> *(',' >> idGrammar); // 1 = Now parsing succeeds
// start = idGrammar; // 2 = Now parsing succeeds
// start = qi::lit("id") >> qi::int_; // 3 = Parsing succeeds
start.name("the start");
qi::on_error<qi::fail>(
start,
phoenix::ref(std::cout) << phoenix::val("Parsing error: expecting ") << qi::_4 // what failed?
<< phoenix::val(" here: \"")
<< phoenix::construct<std::string>(qi::_3, qi::_2) // iterators to error-pos, end
<< phoenix::val("\"")
<< std::endl);
}
qi::rule<Iterator, SkipParser> start;
IdGrammar<Iterator, SkipParser> idGrammar;
};

Nom Parser To Unescape String

I'm writing a Nom parser for RCS. RCS Files tend to be ISO-8859-1 encoded. One of the grammar productions is for a String. This is #-delimited and literal # symbols are escaped as ##.
#A String# -> A String
#A ## String# -> A # String
I have a working function (see end). IResult is from Nom, you either return the parsed thing, plus the rest of the unparsed input, or an Error/Incomplete. Cow is used to return a reference built on the original input slice if no unescaping was required, or an owned string if it was.
Are there any built in Nom macros that could have helped with this parse?
#[macro_use]
extern crate nom;
use std::str;
use std::borrow::Cow;
use nom::*;
/// Parse an RCS String
fn string<'a>(input: &'a[u8]) -> IResult<&'a[u8], Cow<'a, str>> {
let len = input.len();
if len < 1 {
return IResult::Incomplete(Needed::Unknown);
}
if input[0] != b'#' {
return IResult::Error(Err::Code(ErrorKind::Custom(0)));
}
// start of current chunk. Chunk is a piece of unescaped input
let mut start = 1;
// current char index in input
let mut i = start;
// FIXME only need to allocate if input turned out to need unescaping
let mut s: String = String::new();
// Was the input escaped?
let mut escaped = false;
while i < len {
// Check for end delimiter
if input[i] == b'#' {
// if there's another # then it is an escape sequence
if i + 1 < len && input[i + 1] == b'#' {
// escaped #
i += 1; // want to include the first # in the output
s.push_str(str::from_utf8(&input[start .. i]).unwrap());
start = i + 1;
escaped = true;
} else {
// end of string
let result = if escaped {
s.push_str(str::from_utf8(&input[start .. i]).unwrap());
Cow::Owned(s)
} else {
Cow::Borrowed(str::from_utf8(&input[1 .. i]).unwrap())
};
return IResult::Done(&input[i + 1 ..], result);
}
}
i += 1;
}
IResult::Incomplete(Needed::Unknown)
}
It looks like the way to use the nom library is using the macro combinators. A quick browse of the source code gives some nice examples of parsers, including parsing of strings with escape characters. This is what I came up with:
#[macro_use]
extern crate nom;
use nom::*;
named!(string< Vec<u8> >, delimited!(
tag!("#"),
fold_many0!(
alt!(
is_not!(b"#") |
map!(
complete!(tag!("##")),
|_| &b"#"[..]
)
),
Vec::new(),
|mut acc: Vec<u8>, bytes: &[u8]| {
acc.extend(bytes);
acc
}
),
tag!("#")
));
#[test]
fn it_works() {
assert_eq!(string(b"#string#"), IResult::Done(&b""[..], b"string".to_vec()));
assert_eq!(string(b"#string with ## escapes#"), IResult::Done(&b""[..], b"string with # escapes".to_vec()));
assert_eq!(string(b"#invalid string"), IResult::Incomplete(Needed::Size(16)));
}
As you can see, I simply copy the bytes into a vector using Vec::extend - you could be more sophisticated here and return a Cow byte slice if you wanted.
The escaped! macro does not appear to be of use in this case unfortunately, as it can't seem to work when the terminator is the same as the escape character (which is actually a pretty common case).

String Split in DXL

I have a string
Ex: "We prefer questions that can be answered; not just discussed "
now i want to split this string from ";"
like
We prefer questions that can be answered
and
not just discussed
is this possible in DXL.
i am learning DXL, so i don't have any idea whether we can split or not.
Note : This is not a home work.
I'm sorry for necroing this post. Being new to DXL I spent some time with the same challenge. I noticed that the implementations available on the have different specifications of "splitting" a string. Loving the Ruby language, I missed an implementation which comes at least close to the Ruby version of String#split.
Maybe my findings will be helpful to anybody.
Here's a functional comparison of
Variant A: niol's implementation (which at a first glance, appears to be the same implementation which is usually found at Capri Soft,
Variant B: PJT's implementation,
Variant C: Brett's implementation and
Variant D: my implementation (which provides the correct functionality imo).
To eliminate structural difference, all implementations were implemented in functions, returning a Skip list or an Array.
Splitting results
Note that all implementations return different results, depending on their definition of "splitting":
string mellow yellow; delimiter ello
splitVariantA returns 1 elements: ["mellow yellow" ]
splitVariantB returns 2 elements: ["m" "llow yellow" ]
splitVariantC returns 3 elements: ["w" "w y" "" ]
splitVariantD returns 3 elements: ["m" "w y" "w" ]
string now's the time; delimiter
splitVariantA returns 3 elements: ["now's" "the" "time" ]
splitVariantB returns 2 elements: ["" "now's the time" ]
splitVariantC returns 5 elements: ["time" "the" "" "now's" "" ]
splitVariantD returns 3 elements: ["now's" "the" "time" ]
string 1,2,,3,4,,; delimiter ,
splitVariantA returns 4 elements: ["1" "2" "3" "4" ]
splitVariantB returns 2 elements: ["1" "2,,3,4,," ]
splitVariantC returns 7 elements: ["" "" "4" "3" "" "2" "" ]
splitVariantD returns 7 elements: ["1" "2" "" "3" "4" "" "" ]
Timing
Splitting the string 1,2,,3,4,, with the pattern , for 10000 times on my machine gives these timings:
splitVariantA() : 406 ms
splitVariantB() : 46 ms
splitVariantC() : 749 ms
splitVariantD() : 1077 ms
Unfortunately, my implementation D is the slowest. Surprisingly, the regular expressions implementation C is pretty fast.
Source code
// niol, modified
Array splitVariantA(string splitter, string str){
Array tokens = create(1, 1);
Buffer buf = create;
int str_index;
buf = "";
for(str_index = 0; str_index < length(str); str_index++){
if( str[str_index:str_index] == splitter ){
array_push_str(tokens, stringOf(buf));
buf = "";
}
else
buf += str[str_index:str_index];
}
array_push_str(tokens, stringOf(buf));
delete buf;
return tokens;
}
// PJT, modified
Skip splitVariantB(string s, string delimiter) {
int offset
int len
Skip skp = create
if ( findPlainText(s, delimiter, offset, len, false)) {
put(skp, 0, s[0 : offset -1])
put(skp, 1, s[offset +1 :])
}
return skp
}
// Brett, modified
Skip splitVariantC (string s, string delim) {
Skip skp = create
int i = 0
Regexp split = regexp "^(.*)" delim "(.*)$"
while (split s) {
string temp_s = s[match 1]
put(skp, i++, s[match 2])
s = temp_s
}
put(skp, i++, s[match 2])
return skp
}
Skip splitVariantD(string str, string pattern) {
if (null(pattern) || 0 == length(pattern))
pattern = " ";
if (pattern == " ")
str = stringStrip(stringSqueeze(str, ' '));
Skip result = create;
int i = 0; // index for searching in str
int j = 0; // index counter for result array
bool found = true;
while (found) {
// find pattern
int pos = 0;
int len = 0;
found = findPlainText(str[i:], pattern, pos, len, true);
if (found) {
// insert into result
put(result, j++, str[i:i+pos-1]);
i += pos + len;
}
}
// append the rest after last found pattern
put(result, j, str[i:]);
return result;
}
Quick join&split I could come up with. Seams to work okay.
int array_size(Array a){
int size = 0;
while( !null(get(a, size, 0) ) )
size++;
return size;
}
void array_push_str(Array a, string str){
int array_index = array_size(a);
put(a, str, array_index, 0);
}
string array_get_str(Array a, int index){
return (string get(a, index, 0));
}
string str_join(string joiner, Array str_array){
Buffer joined = create;
int array_index = 0;
joined += "";
for(array_index = 0; array_index < array_size(str_array); array_index++){
joined += array_get_str(str_array, array_index);
if( array_index + 1 < array_size(str_array) )
joined += joiner;
}
return stringOf(joined)
}
Array str_split(string splitter, string str){
Array tokens = create(1, 1);
Buffer buf = create;
int str_index;
buf = "";
for(str_index = 0; str_index < length(str); str_index++){
if( str[str_index:str_index] == splitter ){
array_push_str(tokens, stringOf(buf));
buf = "";
}else{
buf += str[str_index:str_index];
}
}
array_push_str(tokens, stringOf(buf));
delete buf;
return tokens;
}
If you only split the string once this is how I would do it:
string s = "We prefer questions that can be answered; not just discussed"
string sub = ";"
int offset
int len
if ( findPlainText(s, sub, offset, len, false)) {
/* the reason why I subtract one and add one is to remove the delimiter from the out put.
First print is to print the prefix and then second is the suffix.*/
print s[0 : offset -1]
print s[offset +1 :]
} else {
// no delimiter found
print "Failed to match"
}
You could also use regular expressions refer to the DXL reference manual. It would be better to use regular expressions if you want to split up the string by multiple delimiters such as str = "this ; is an;example"
ACTUALLY WORKS:
This solution will split as many times as needed, or none, if the delimiter doesn't exist in the string.
This is what I have used instead of a traditional "split" command.
It actually skips the creation of an array, and just loops through each string that would be in the array and calls "someFunction" on each of those strings.
string s = "We prefer questions that can be answered; not just discussed"
// for this example, ";" is used as the delimiter
Regexp split = regexp "^(.*);(.*)$"
// while a ";" exists in s
while (split s) {
// save the text before the last ";"
string temp_s = s[match 1]
// call someFunction on the text after the last ";"
someFunction(s[match 2])
// remove the text after the last ";" (including ";")
s = temp_s
}
// call someFunction again for the last (or only) string
someFunction(s)
Sorry for necroing an old post; I just didn't find the other answers useful.
Perhaps someone would find handy this fused solution as well. It splits string in Skip, based on delimiter, which can actually have length more then one.
Skip splitString(string s1, string delimit)
{
int offset, len
Skip splited = create
while(findPlainText(s1, delimit, offset, len, false))
{
put(splited, s1[0:offset-1], s1[0:offset-1])
s1 = s1[offset+length(delimit):length(s1)-1]
}
if(length(s1)>0)
{
put (splited, s1, s1)
}
return splited
}
I tried this out and worked out for me...
string s = "We prefer questions that can be answered,not just discussed,hiyas"
string sub = ","
int offset
int len
string s1=s
while(length(s1)>0){
if ( findPlainText(s1, sub, offset, len, false)) {
print s1[0 : offset -1]"\n"
s1= s1[offset+1:length(s1)]
}
else
{
print s1
s1=""
}
}
Here is a better implementation. This is a recursive split of the string by searching a keyword.
pragma runLim, 10000
string s = "We prefer questions that can be answered,not just discussed,hiyas;
Next Line,Var1,Nemesis;
Next Line,Var2,Nemesis1;
Next Line,Var3,Nemesis2;
New,Var4,Nemesis3;
Next Line,Var5,Nemesis4;
New,Var5,Nemesis5;"
string sub = ","
int offset
int len
string searchkey=null
string curr=s
string nxt=s
string searchline=null
string Modulename=""
string Attributename=""
string Attributevalue=""
while(findPlainText(curr,"Next Line", offset,len,false))
{
int intlen=offset
searchkey=curr[offset:length(curr)]
if(findPlainText(searchkey,"Next Line",offset,len,false))
{
curr=searchkey[offset+1:length(searchkey)]
}
if(findPlainText(searchkey,";",offset,len,false))
{
searchline=searchkey[0:offset]
}
int counter=0
while(length(searchline)>0)
{
if (findPlainText(searchline, sub, offset, len, false))
{
if(counter==0)
{
Modulename=searchline[0 : offset -1]
counter++
}
else if(counter==1)
{
Attributename=searchline[0 : offset -1]
counter++
}
searchline= searchline[offset+1:length(searchline)]
}
else
{
if(counter==2)
{
Attributevalue=searchline[0:length(searchline)-2]
counter++
}
searchline=""
}
}
print "Modulename="Modulename " Attributename=" Attributename " Attributevalue= "Attributevalue "\n"
}

Resources