c++ xml parser function not working - xml-parsing

I am using Xerces-C++ to manipulate an XML file, but getNodeValue() and setNodeValue() are not working, although getNodeName() works. Does anyone have any suggestions?
if( currentNode->getNodeType() && currentNode->getNodeType() == DOMNode::ELEMENT_NODE )
{
// Found node which is an Element. Re-cast node as element
DOMElement* currentElement= dynamic_cast< xercesc::DOMElement* >( currentNode );
if( XMLString::equals(currentElement->getTagName(), TAG_ApplicationSettings))
{
// Already tested node as type element and of name "ApplicationSettings".
// Read attributes of element "ApplicationSettings".
const XMLCh* xmlch_OptionA = currentElement->getAttribute(ATTR_OptionA);
m_OptionA = XMLString::transcode(xmlch_OptionA);
XMLCh* t,*s;
//s= XMLString::transcode("manish");
//currentNode->setElementText(s);
t=(XMLCh*)currentNode->getNodeName();
s=(XMLCh*)currentNode->getNodeValue();
cout << XMLString::transcode(currentNode->getNodeValue()) << "\n";

A DOMElement may contain other DOMElements or a DOMText. To get the text value of an element, call getTextContent(); getNodeValue() always returns NULL for an element.
There is another, conceptually cleaner way: since the DOMText is a child of the DOMElement, we can traverse the child nodes and collect the value.
Below is the logic in the form of a method:
string getElementValue(const DOMElement& parent)
{
DOMNode *child;
string strVal;
for (child = parent.getFirstChild();child != NULL ; child = child->getNextSibling())
{
if(DOMNode::TEXT_NODE == child->getNodeType())
{
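DOMText* data = dynamic_cast<DOMText*>(child);
const XMLCh* val = data->getWholeText();
// transcode() allocates a new buffer, so release it after copying
char* tmp = XMLString::transcode(val);
strVal += tmp;
XMLString::release(&tmp);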
}
else
{
throw "ERROR : Non Text Node";
}
}
return strVal;
}
Hope this helps :)

Calling getNodeValue() on an element node always returns null, because the "value" of an element lives in its children; in this case, in a text node child. You can either iterate through the child nodes or use getTextContent().
To iterate, first check for child nodes using hasChildNodes(), then use methods such as getFirstChild() to reach the text node, and call getNodeValue() on that node.
DOMNode* ptrDomNode = SomeNode;
if(ptrDomNode->hasChildNodes())
{
DOMNode* dTextNode = ptrDomNode->getFirstChild();
char* string = XMLString::transcode(dTextNode->getNodeValue());
}

Related

Filling a list within a loop using list.add and loop variable - how to clone variable?

I am trying to read from a certain XML API. It is my first project in Dart and not very optimized; however, it works until the loop ... well, loops.
Here is the code:
getKategorie(XmlElement element) {
Kategorie tmpKategorie = Kategorie();
dynamic iterableElements;
Board tmpBoard = Board();
tmpKategorie.id = element.getAttribute("id");
tmpKategorie.name = element.getElement("name")?.innerText;
tmpKategorie.description = element.getElement("description")?.innerText;
iterableElements = element.getElement("boards")?.childElements;
for (XmlElement tmpElement in iterableElements) {
if (tmpElement.localName == "board") {
tmpBoard.id = tmpElement?.getAttribute("id");
tmpBoard.name = tmpElement?.getElement("name")?.innerText;
tmpBoard.description = tmpElement?.getElement("description")?.innerText;
tmpKategorie.boards.add(tmpBoard);
}
}
return tmpKategorie;
}
The custom classes I use:
class Kategorie {
List<Board> boards = [];
dynamic id;
dynamic name = "NN";
dynamic description = "NN";
}
class Board {
dynamic id;
dynamic name;
dynamic description;
}
Everything works fine until I reach the second element with localName board. When tmpBoard is filled with the data from the second (or third, or any later) element, the element previously added to tmpKategorie.boards changes as well.
It looks like the add call on the tmpKategorie.boards list only stores a reference to the tmpBoard variable, which causes my problem: at the end, all list entries are identical to the last one.
How can I copy the object into the list instead of referencing it?
Solution, as suggested:
if (tmpElement.localName == "board") {
tmpBoard.id = tmpElement?.getAttribute("id");
becomes
if (tmpElement.localName == "board") {
tmpBoard = Board();
tmpBoard.id = tmpElement?.getAttribute("id");
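The effect described in the question is not specific to Dart: a list stores references, so adding the same object repeatedly yields entries that all change together. Below is a minimal sketch of the buggy and the fixed pattern, written in JavaScript rather than Dart. The Board class, the rows data and both function names are hypothetical stand-ins, not the asker's code.

```javascript
// Hypothetical stand-in for the Board class from the question.
class Board {
  constructor() {
    this.id = null;
    this.name = null;
  }
}

// Hypothetical input data standing in for the parsed XML elements.
const rows = [
  { id: "1", name: "General" },
  { id: "2", name: "Off-topic" },
];

// Buggy variant: one Board instance is reused, so every list entry
// refers to the same object and ends up holding the last row's data.
function collectShared(rows) {
  const boards = [];
  const tmpBoard = new Board();
  for (const row of rows) {
    tmpBoard.id = row.id;
    tmpBoard.name = row.name;
    boards.push(tmpBoard);
  }
  return boards;
}

// Fixed variant: a fresh Board is created on every iteration.
function collectFresh(rows) {
  const boards = [];
  for (const row of rows) {
    const board = new Board();
    board.id = row.id;
    board.name = row.name;
    boards.push(board);
  }
  return boards;
}

const shared = collectShared(rows);
const fresh = collectFresh(rows);
console.log(shared.map((b) => b.id)); // both entries show the last id
console.log(fresh.map((b) => b.id)); // each entry keeps its own id
```

Creating the object inside the loop is usually simpler than cloning it afterwards, because no copy logic has to be kept in sync with the class fields.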

Determining context at a position in file using ANTLR4

I'm trying to write a Language Extension for VS Code in JavaScript and I seem to be missing something.
I have a Lexer.g4 and Parser.g4 for my language and can generate a tree using them.
My issue is that the VS Code API gives me a document and a position in that document (line #, character #). From any of the examples I've looked at for ANTLR4 I can't seem to find any functions generated that take a position in the file and give back the nodes of a tree at that position.
I want to know, for example that the cursor is placed on the name of a function.
Am I supposed to walk the entire tree and check the positions of the tokens to see whether they enclose my position in the editor? Or maybe I'm not using the right tool for the job? I feel like I'm probably missing something more fundamental.
Yes, you have to walk the parse tree to find the context at a given position. This is a pretty simple task and you can see it in action in my ANTLR4 extension for VS Code. There are multiple functions returning something useful for a given position. For instance, this one:
/**
* Returns the parse tree which covers the given position or undefined if none could be found.
*/
function parseTreeFromPosition(root: ParseTree, column: number, row: number): ParseTree | undefined {
// Does the root node actually contain the position? If not we don't need to look further.
if (root instanceof TerminalNode) {
let terminal = (root as TerminalNode);
let token = terminal.symbol;
if (token.line != row)
return undefined;
let tokenStop = token.charPositionInLine + (token.stopIndex - token.startIndex + 1);
if (token.charPositionInLine <= column && tokenStop >= column) {
return terminal;
}
return undefined;
} else {
let context = (root as ParserRuleContext);
if (!context.start || !context.stop) { // Invalid tree?
return undefined;
}
if (context.start.line > row || (context.start.line == row && column < context.start.charPositionInLine)) {
return undefined;
}
let tokenStop = context.stop.charPositionInLine + (context.stop.stopIndex - context.stop.startIndex + 1);
if (context.stop.line < row || (context.stop.line == row && tokenStop < column)) {
return undefined;
}
if (context.children) {
for (let child of context.children) {
let result = parseTreeFromPosition(child, column, row);
if (result) {
return result;
}
}
}
return context;
}
}
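The same walk can be tried without the ANTLR runtime. The following is a minimal JavaScript sketch that assumes a simplified, hypothetical node shape (name, inclusive start and stop positions, optional children); it is not the ANTLR API, and the sample tree is hand-built for illustration only.

```javascript
// Hypothetical node shape (not the ANTLR runtime API): each node carries an
// inclusive start position and an inclusive stop position.
function containsPosition(node, line, col) {
  const afterStart =
    line > node.startLine || (line === node.startLine && col >= node.startCol);
  const beforeStop =
    line < node.stopLine || (line === node.stopLine && col <= node.stopCol);
  return afterStart && beforeStop;
}

// Returns the deepest node covering (line, col), or undefined if none does.
function nodeAtPosition(node, line, col) {
  if (!containsPosition(node, line, col)) {
    return undefined;
  }
  for (const child of node.children || []) {
    const hit = nodeAtPosition(child, line, col);
    if (hit) {
      return hit;
    }
  }
  // No child covers the position, so this node is the best match.
  return node;
}

// Hand-built tree for the one-line source: function foo() {}
const tree = {
  name: "functionDecl", startLine: 1, startCol: 0, stopLine: 1, stopCol: 16,
  children: [
    { name: "keyword", startLine: 1, startCol: 0, stopLine: 1, stopCol: 7 },
    { name: "functionName", startLine: 1, startCol: 9, stopLine: 1, stopCol: 11 },
    { name: "body", startLine: 1, startCol: 15, stopLine: 1, stopCol: 16 },
  ],
};

const hit = nodeAtPosition(tree, 1, 10);
console.log(hit ? hit.name : "nothing found");
```

When wiring this up to an editor, keep in mind that ANTLR token lines are 1-based while VS Code positions are 0-based, so the line number needs an offset of one before the lookup.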

How to get object and lower hierarchical objects from a DXL

I'm new to DXL and working on something that is probably quite simple.
I would like to parse the current module and, for each object whose ID attribute (called IDNUM below) is not empty, get the following data:
IDNUM - Object text - all text at a lower hierarchical level, and the same for every object linked to this one.
It will probably be easier to understand with the code. So far, it looks like this:
Object o
Object ol
Link l
Module m = current Module
For o in entire(m) do{
if (o."IDNUM" != ""){
print o."IDNUM" ""
print o."text" ""
//HERE I WOULD LIKE TO ALSO PRINT EVERY TEXT IN OBJECT "LOWER" THAN o
for l in o --> "*" do{
ol = target(l)
print ol."text" ""
//HERE I WOULD LIKE TO ALSO PRINT EVERY TEXT IN OBJECT "LOWER" THAN ol
}
}
}
Basically, I have the ID and title of both an object and the one linked to it, but not the text below them. In other words, my code should mimic the function right click > copy > copy with hierarchy.
How can I do that? Unfortunately I didn't find anything very helpful.
Thanks a lot in advance.
There are a few little syntax things that need to be changed here, but the big change is how you are handling linked items. Links 'live' in the source module, but they only store a limited amount of information, mostly the modules that are the source and target of the link, and the absolute numbers of the objects they touch. So you need to check if the module on the other side is open before you try and read text from it.
And since you are trying to go through the entire link structure, you'll need a recursive element to this.
I would probably end up with something like this:
//turn off run limit timer- this might take a bit
pragma runLim , 0
Object o
Module m = current Module
// Recursive function- assumes each object has a text attribute- will error otherwise
void link_print(Object obj) {
print obj."text" "\n"
Link out_link
Object next_obj = null
for out_link in obj -> "*" do {
// Set the next object in the chain
next_obj = target ( out_link )
// This might return null if the module is not loaded
if ( null next_obj ) {
// Load the module in read-only mode, displayed and in standard view
read ( fullName ( ModName_ target ( out_link ) ) , true , true )
// Try and resolve out 'target' again
next_obj = target ( out_link )
// If it doesn't work, print a message so we can figure it out
if ( null next_obj ) {
print "Error Accessing Object " ( targetAbsNo ( out_link ) )""
} else {
//Iterate down structure
link_print ( next_obj )
}
} else {
//Iterate down structure
link_print ( next_obj )
}
}
}
for o in entire(m) do {
// Note that I cast the result of o."IDNUM" to a string type by appending ""
if (o."IDNUM" "" != ""){
print o."IDNUM" "\n"
// Recurse
link_print(o)
print "\n"
}
}
Note! Depending on the size of your link structure, i.e. how many levels you have (and if there are any circular link patterns), this could be a pretty resource intensive task, and would be better solved using something other than "print" commands (like appending it to a word file, for example, so you know how far it got before it errored out)
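One common way to keep a recursive walk from looping forever on circular links is to remember which objects have already been visited. The sketch below shows the idea in JavaScript, not DXL; the object shape (id, text, links) and the function name collectLinkedText are hypothetical and only stand in for DOORS objects and their outgoing links.

```javascript
// Hypothetical object shape: { id, text, links: [targetObject, ...] }.
// Collects the text of the start object and of everything reachable from it,
// visiting each object at most once.
function collectLinkedText(start) {
  const visited = new Set();
  const lines = [];

  function visit(obj) {
    if (visited.has(obj.id)) {
      return; // already reported: stop here to break the cycle
    }
    visited.add(obj.id);
    lines.push(obj.text);
    for (const target of obj.links) {
      visit(target);
    }
  }

  visit(start);
  return lines;
}

// Three objects linked in a circle: a -> b -> c -> a
const a = { id: 1, text: "A", links: [] };
const b = { id: 2, text: "B", links: [] };
const c = { id: 3, text: "C", links: [] };
a.links.push(b);
b.links.push(c);
c.links.push(a);

console.log(collectLinkedText(a).join("\n"));
```

Collecting the lines in a list first, rather than printing during the walk, also makes it easy to write the result to a file afterwards.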
Good luck!
EDIT:
Rather than head down recursively, this script will now go a single level but should report child objects.
//turn off run limit timer- this might take a bit
pragma runLim , 0
Object o
Module m = current Module
// Prints the object's text, then the text of the child objects of each link target- assumes each object has a text attribute- will error otherwise
void link_print(Object obj) {
print obj."text" "\n"
Link out_link
Object next_obj = null
Object child_obj = null
for out_link in obj -> "*" do {
// Set the next object in the chain
next_obj = target ( out_link )
// This might return null if the module is not loaded
if ( null next_obj ) {
// Load the module in read-only mode, displayed and in standard view
read ( fullName ( ModName_ target ( out_link ) ) , true , true )
// Try and resolve out 'target' again
next_obj = target ( out_link )
// If it doesn't work, print a message so we can figure it out
if ( null next_obj ) {
print "Error Accessing Object " ( targetAbsNo ( out_link ) )""
} else {
// Loop and report on child objects
for child_obj in next_obj do {
print child_obj."text" "\n"
}
}
} else {
// Loop and report on child objects
for child_obj in next_obj do {
print child_obj."text" "\n"
}
}
}
}
for o in entire(m) do {
// Note that I cast the result of o."IDNUM" to a string type by appending ""
if (o."IDNUM" "" != ""){
print o."IDNUM" "\n"
// Report this object and the child objects of its link targets
link_print(o)
print "\n"
}
}
Dear Russell (and everyone else)
I've just gone through the piece of code you provided and it works... but not for what I'm looking for. It seems my explanation wasn't very clear. I'm sorry (I'm not a native speaker).
I'm not looking to get all links, just the object text that is written directly below the object pointed to by the current link.
Here is what my files look like
object1 (with IDNUM) : "Title_doc_1" --> (link) objectA "Title_doc_2"
object2 : "some_text" objectB : "some_text"
object3 : "some_text" objectC : "some_text"
(object1 can point to many other objectA but I already deal with that.)
The code I provided above parses "doc_1" and prints "IDNUM" "Title_doc_1" "Title_doc_2".
What I'm looking for is to get not only objectA, but also objectB and objectC, which are hierarchically below objectA (and object2 and object3 too, but it will be the same process).
Hoping I made myself understood...

Find Words in entire module

I have a skip list containing terms such as ADC, FIFO, DAC, FILO, etc.
I want to know whether these words are used anywhere in the module; the function should return the words that are not used.
I have a program, but it takes too long to execute.
Please help me with this.
Here is the code:
Skip Search_In_Entire_Module(Skip List)
{
int sKey = 0
Skip sList = create()
string data = ""
string objText1
Object obj
for data in List do
{
int var_count = 0
for obj in m do
{
objText1 = obj."Object Text"
if objText1!=null then
{
if (isDeleted obj){continue}
if (table obj) {continue}
if (row obj) {continue}
if (cell obj) {continue}
Buffer buf = create()
buf = objText1
int index = 0
while(true)
{
index = contains(buf, data, index)
if(0 <= index)
{
index += length(data)
}
else
{
var_count++
break
}
}
delete(buf)
}
}
if (var_count ==0)
{
put(sList,sKey,data)
sKey++
}
}
return sList
}
Unused_Terminolody_Data = Search_In_Entire_Module(Terminology_Data)
Just wondering: why is this in a while loop?
while(true)
{
index = contains(buf, data, index)
if(0 <= index)
{
index += length(data)
}
else
{
var_count++
break
}
}
I would instead just do:
index = contains ( buf, data )
if ( index == -1 ) {
var_count++
}
buf = ""
I would also not keep deleting and recreating the buffer. Create the buffer up where you create the object variable, then set it equal to "" to clear it, then delete it at the end of the program.
Let me know if this helps!
Balthos makes good points, and I think there's a little more you could do. My adaptation of your function follows. Points to note:
1. I implemented Balthos's suggestions (above) of taking out the 'while' loop and the buffer creation/deletion.
2. I changed the function signature. Given that Skip lists are passed by reference, and must be created and deleted outside the function, it's syntactically confusing (to me, anyway) to return them from a function. So, I pass both skip lists (terms we're seeking, terms not found) in as function parameters. Please excuse me changing variable names - it helped me to understand what was going on more quickly.
3. There's no need to put the Object Text in a string - this is relatively slow and consumes memory that will not be freed until DOORS exits. So, I put the Object Text in a buffer earlier in the function, and search that. The 'if (!null bufObjText)' at my line 34 is equivalent to your 'objText1!=null'. If you prefer, 'if (bufObjText != null)' does the same.
4. The conditional 'if (var_count ==0)' is redundant - I moved its functions into an earlier 'if' block (my line 40).
5. I moved the tests for deleted, table, row and cell objects up, so that they occur before we take the time to fill a buffer with object text - so that's only done if necessary.
Item 2 probably isn't going to have a performance impact, but the others will. The only question is, how large?
Please let us know if this improves the running time over what you currently have. I don't have a sufficiently large set of sample data to make meaningful comparisons with your code.
Module modCurrent = current
Skip skUnused_Terminology_Data = create
Skip skSeeking_Terminology_Data = create()
put (skSeeking_Terminology_Data, 0, "SPONG")
put (skSeeking_Terminology_Data, 1, "DoD")
void Search_In_Entire_Module(Skip skTermsSought, Skip skTermsNotFound)
{
Object obj
Buffer bufObjText = create()
int intSkipKey = 0
string strSkipData = ""
for strSkipData in skTermsSought do
{
bool blFoundTerm = false
Regexp re = regexp2 strSkipData
for obj in modCurrent do
{
if (isDeleted obj){continue}
if (table obj) {continue}
if (row obj) {continue}
if (cell obj) {continue}
bufObjText = obj."Object Text"
if (!null bufObjText) then
{
// Stop scanning as soon as the term has been found once
if (search (re, bufObjText, 0))
{
blFoundTerm = true
break
}
}
}
// Only terms that matched no object at all are reported as unused
if (!blFoundTerm)
{
put(skTermsNotFound, intSkipKey, strSkipData)
intSkipKey++
}
}
// Delete the buffer once, after all terms have been checked
delete (bufObjText)
}
Search_In_Entire_Module (skSeeking_Terminology_Data, skUnused_Terminology_Data)
string strNotFound
for strNotFound in skUnused_Terminology_Data do
{
print strNotFound "\n"
}
delete skUnused_Terminology_Data
delete skSeeking_Terminology_Data
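The goal of the search, independent of DXL, is to report each term that occurs in none of the object texts, and to stop scanning for a term as soon as it has been found once. A minimal JavaScript sketch of that logic follows; the function name findUnusedTerms and the sample texts are hypothetical, and plain substring matching is assumed instead of regular expressions.

```javascript
// Returns the terms that occur in none of the given object texts.
// Array.prototype.some stops at the first text containing the term,
// so a term that is found early costs only a few comparisons.
function findUnusedTerms(terms, objectTexts) {
  return terms.filter(
    (term) => !objectTexts.some((text) => text.includes(term))
  );
}

// Hypothetical sample data standing in for the module's object texts.
const objectTexts = [
  "The ADC samples the input at 1 kHz.",
  "Samples are buffered in a FIFO before transmission.",
];
const terms = ["ADC", "FIFO", "DAC", "FILO"];

const unused = findUnusedTerms(terms, objectTexts);
console.log(unused.join(", "));
```

The early exit matters most for large modules: without it, every term is compared against every object even after a match has already settled the answer.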

What grammar is this?

I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Here are the different elements:
Group IDs: 4 and 1
Length of the string representation of each group: 26 and 14
One of the groups (the one with ID 4): 4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^
Variables: VAR1 and VAR2 (in group 4), VAR1 (in group 1)
Length of the string representation of the values: 6, 4 and 6
The values themselves: VALUE1, VAL2 and VALUE1
Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.
Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?
So far I am using my own parser, but due to the fact that I need to detect and handle corrupt serializations the code looks rather messy, thus my question for a parser library that could lift the burden.
The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:
id^length^content^
At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.
So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.
Breaking it down into these steps, first we need a way to get "tokens" from the string. This function returns an object with three methods, to find out if we're at "end of file", and to grab the next delimited or counted substring:
var tokens = function(str) {
var pos = 0;
return {
eof: function() {
return pos == str.length;
},
delimited: function(d) {
var end = str.indexOf(d, pos);
if (end == -1) {
throw new Error('Expected delimiter');
}
var result = str.substr(pos, end - pos);
pos = end + d.length;
return result;
},
counted: function(c) {
var result = str.substr(pos, c);
pos += c;
return result;
}
};
};
Now we can conveniently write the reusable parse function:
var parse = function(str) {
var parts = {};
var t = tokens(str);
while (!t.eof()) {
var id = t.delimited('^');
var len = t.delimited('^');
var content = t.counted(parseInt(len, 10));
var end = t.counted(1);
if (end !== '^') {
throw new Error('Expected ^ after counted string, instead found: ' + end);
}
parts[id] = content;
}
return parts;
};
It builds an object where the keys are the IDs (or variable names). I'm assuming that, as they have names, the order isn't significant.
Then we can use that at both levels to create the function to do the whole job:
var parseGroups = function(str) {
var groups = parse(str);
Object.keys(groups).forEach(function(id) {
groups[id] = parse(groups[id]);
});
return groups;
}
For your example, it produces this object:
{
'1': {
VAR1: 'VALUE1'
},
'4': {
VAR1: 'VALUE1',
VAR2: 'VAL2'
}
}
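Since the question also asks about detecting corrupt serializations, the same id^length^content^ idea can be extended with explicit checks. The sketch below is one possible variant, not the code above: the names parseLevel and parseGroupsStrict are hypothetical, and it assumes that lengths are plain decimal digits and that every record must end with ^.

```javascript
// Parses one level of id^length^content^ records and throws on corrupt input.
function parseLevel(str) {
  const parts = {};
  let pos = 0;

  // Reads up to the next ^ and moves past it.
  const readDelimited = (what) => {
    const end = str.indexOf("^", pos);
    if (end === -1) {
      throw new Error(`Missing ^ after ${what} at offset ${pos}`);
    }
    const token = str.slice(pos, end);
    pos = end + 1;
    return token;
  };

  while (pos < str.length) {
    const id = readDelimited("id");
    const lenText = readDelimited("length");
    if (!/^\d+$/.test(lenText)) {
      throw new Error(`Invalid length "${lenText}" for id "${id}"`);
    }
    const len = Number(lenText);
    // The content plus its terminating ^ must fit into the remaining input.
    if (pos + len >= str.length) {
      throw new Error(`Length ${len} for id "${id}" runs past the end of the input`);
    }
    const content = str.slice(pos, pos + len);
    pos += len;
    if (str[pos] !== "^") {
      throw new Error(`Expected ^ after content of id "${id}"`);
    }
    pos += 1;
    parts[id] = content;
  }
  return parts;
}

// Applies parseLevel to the groups and then to each group's content.
function parseGroupsStrict(str) {
  const groups = parseLevel(str);
  for (const id of Object.keys(groups)) {
    groups[id] = parseLevel(groups[id]);
  }
  return groups;
}

const sample = "4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^";
console.log(JSON.stringify(parseGroupsStrict(sample)));
```

Because values are read by length rather than by searching for a delimiter, a value may itself contain ^ without confusing the parser; only the id and length fields are delimiter-terminated.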
I don't think it's a trivial task to create a grammar for this. On the other hand, a simple, straightforward approach is not that hard: you know the string length of every critical string, so you just chop your string apart according to those lengths.
Where do you see problems?
