how to use poco library parser XML stream - xml-parsing

The XML content is following:
test.xml
<root>
<headers>
<elementA>Google</elementA>
<elementB>FaceBook</elementB>
</headers>
</root>
I want to use poco XML library parser the up XML content.
I use the following code parser the /root/headers/elementA Node and get the content("Google");
std::ifstream in(“test.xml”);
Poco::XML::InputSource src(in);
Poco::XML::DOMParser parser;
Poco::AutoPtr<Poco::XML::Document> pDoc = parser.parse(&src);
Poco::XML::NodeIterator it(pDoc, Poco::XML::NodeFilter::SHOW_ELEMENTS);
Poco::XML::Node* pRootNode = it.root();
Poco::XML::XMLString xmlPath("/headers/elementA");
Poco::XML::Node * pNewNode = pRootNode->getNodeByPath(xmlPath);
the program running success, but the pNewNode pointer is NULL, i'am reference the poco office site document and not found any useful example. i'am also use google search the topic but still not found any useful info.
who can help me fixed the problem. i have cost three days for the problem.

You are missing /root in your path:
std::ifstream in(“test.xml”);
Poco::XML::InputSource src(in);
Poco::XML::DOMParser parser;
Poco::AutoPtr<Poco::XML::Document> pDoc = parser.parse(&src);
Poco::XML::NodeIterator it(pDoc, Poco::XML::NodeFilter::SHOW_ELEMENT);
Poco::XML::Node* pRootNode = it.root();
Poco::XML::XMLString xmlPath("/root/headers/elementA");
Poco::XML::Node * pNewNode = pRootNode->getNodeByPath(xmlPath);
std::cout << (pNewNode ? "not null" : "null") << std::endl;

Related

Read data from XLSX provided as XSTRING

An Excel file (.xlsx) is uploaded on the frontend which is UI5 Fiori.
The file contents come to SAP ABAP backend via ODATA in XSTRING format.
I need to store that XSTRING into an internal table and then in a DDIC table. Eg: Suppose the Excel has 5 columns then I want to store that data of 5 columns in the corresponding columns in the DDIC table.
I have tried various Function Modules like:
SCMS_XSTRING_TO_BINARY
SCMS_BINARY_TO_STRING
and following Classes & methods:
cl_bcs_convert=>raw_to_string
cl_soap_xml_helper=>xstring_to_string
but none were able to convert the XSTRING to STRING.
Can you please suggest which function module or class/method can be used to solve the problem?
For most comfort, use abap2xlsx.
If you cannot or do not want to use that, you can alternatively parse the Excel file on your own. .xlsx files are basically .zip files with a different file ending. Use cl_abap_zip->load to open the xstring you receive and ->get to extract the individual files from the zip. Afterwards, use XML parsers like cl_ixml or transformations to parse the XML content of the files.
Note that Excel's XML is a complicated file format, with several files that work together to form the worksheets. Refer to Microsoft's File format reference for Word, Excel, and PowerPoint for details. It's non-trivial to interpret this, so you will usually be a lot happier with abap2xlsx.
abap2xlsx is the most powerful and feature-rich way of doing this, as said by Florian, it supports styles, charts, complex tables, however it may not be always available due to the system limitations, restrictions to install custom packages in system or whatever.
Here is the way how to accomplish this with pure standard without using custom frameworks.
Since Netweaver 7.02 SAP supports Open Microsoft formats natively and provides classes for handling them: CL_XLSX_DOCUMENT, CL_DOCX_DOCUMENT and CL_PPTX_DOCUMENT, abap2xlsx is built at these classes too, yes. So let's start a bit of reinventing the wheel.
XLSX file is an OpenXML archive of files, of which the most interesting: sheet1.xml and sharedStrings.xml. Let's build a sample based on MARC table fields
Now you want to transfer this table to internal table with the same structure. The steps would be:
Extract needed files from XLSX archive
Read worksheet structure from sheet1.xml
Read sheet values from sharedStrings.xml
Map them together and write the result to the internal table
Here is the sample class that handles the job, I used the cl_openxml_helper applet to load XLSX, but you can receive XSTRINGed XLSX in whatever way.
CLASS xlsx_reader DEFINITION.
PUBLIC SECTION.
TYPES: BEGIN OF ty_marc,
matnr TYPE char20,
werks TYPE char20,
disls TYPE char20,
ekgrp TYPE char20,
dismm TYPE char20,
END OF ty_marc,
tt_marc TYPE STANDARD TABLE OF ty_marc WITH EMPTY KEY.
METHODS: read RETURNING VALUE(tab) TYPE tt_marc,
extract_xml IMPORTING index TYPE i
xstring TYPE xstring
RETURNING VALUE(rv_xml_data) TYPE xstring.
ENDCLASS.
CLASS xlsx_reader IMPLEMENTATION.
METHOD read.
TYPES: BEGIN OF ty_row,
value TYPE string,
index TYPE abap_bool,
END OF ty_row,
BEGIN OF ty_worksheet,
row_id TYPE i,
row TYPE TABLE OF ty_row WITH EMPTY KEY,
END OF ty_worksheet,
BEGIN OF ty_si,
t TYPE string,
END OF ty_si.
DATA: data TYPE TABLE OF ty_si,
sheet TYPE TABLE OF ty_worksheet.
TRY.
DATA(xstring_xlsx) = cl_openxml_helper=>load_local_file( 'C:\marc.xlsx' ).
CATCH cx_openxml_not_found.
ENDTRY.
"Read the sheet XML
DATA(xml_sheet) = extract_xml( EXPORTING xstring = xstring_xlsx iv_xml_index = 2 ).
"Read the data XML
DATA(xml_data) = extract_xml( EXPORTING xstring = xstring_xlsx iv_xml_index = 3 ).
TRY.
* transforming structure into ABAP
CALL TRANSFORMATION zsheet
SOURCE XML xml_sheet
RESULT root = sheet.
* transforming data into ABAP
CALL TRANSFORMATION zxlsx_data
SOURCE XML xml_data
RESULT root = data.
CATCH cx_xslt_exception.
CATCH cx_st_match_element.
CATCH cx_st_ref_access.
ENDTRY.
* mapping structure and data
LOOP AT sheet ASSIGNING FIELD-SYMBOL(<fs_row>).
APPEND INITIAL LINE TO tab ASSIGNING FIELD-SYMBOL(<line>).
LOOP AT <fs_row>-row ASSIGNING FIELD-SYMBOL(<fs_cell>).
ASSIGN COMPONENT sy-tabix OF STRUCTURE <line> TO FIELD-SYMBOL(<fs_field>).
CHECK sy-subrc = 0.
<fs_field> = COND #( WHEN <fs_cell>-index = abap_false THEN <fs_cell>-value ELSE VALUE #( data[ <fs_cell>-value + 1 ]-t OPTIONAL ) ).
ENDLOOP.
ENDLOOP.
ENDMETHOD.
METHOD extract_xml.
TRY.
DATA(lo_package) = cl_xlsx_document=>load_document( iv_data = xstring ).
DATA(lo_parts) = lo_package->get_parts( ).
CHECK lo_parts IS BOUND AND lo_package IS BOUND.
DATA(lv_uri) = lo_parts->get_part( 2 )->get_parts( )->get_part( index )->get_uri( )->get_uri( ).
DATA(lo_xml_part) = lo_package->get_part_by_uri( cl_openxml_parturi=>create_from_partname( lv_uri ) ).
rv_xml_data = lo_xml_part->get_data( ).
CATCH cx_openxml_format cx_openxml_not_found.
ENDTRY.
ENDMETHOD.
ENDCLASS.
zsheet transformation:
<?sap.transform simple?>
<tt:transform xmlns:tt="http://www.sap.com/transformation-templates" template="main">
<tt:root name="root"/>
<tt:template name="main">
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac=
"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3=
"http://schemas.microsoft.com/office/spreadsheetml/2016/revision3">
<tt:skip count="4"/>
<sheetData>
<tt:loop name="row" ref="root">
<row>
<tt:attribute name="r" value-ref="row_id"/>
<tt:loop name="cells" ref="$row.ROW">
<c>
<tt:cond><tt:attribute name="t" value-ref="index"/><tt:assign to-ref="index" val="C('X')"/></tt:cond>
<v><tt:value ref="value"/></v>
</c>
</tt:loop>
</row>
</tt:loop>
</sheetData>
<tt:skip count="2"/>
</worksheet>
</tt:template>
</tt:transform>
zxlsx_data transformation
<?sap.transform simple?>
<tt:transform xmlns:tt="http://www.sap.com/transformation-templates" template="main">
<tt:root name="ROOT"/>
<tt:template name="main">
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<tt:loop name="line" ref=".ROOT">
<si>
<t>
<tt:value ref="t"/>
</t>
</si>
</tt:loop>
</sst>
</tt:template>
</tt:transform>
Here is how to call it:
START-OF-SELECTION.
DATA(reader) = NEW xlsx_reader( ).
DATA(marc) = reader->read( ).
The code is pretty self-explanatory, but let's put a couple of notes:
File sheet1.xml contains a special attribute t in each cell which denotes either the value should be treated as a literal or a reference to sharedStrings.xml
I used two simple transformations but XSLT can be used as well, possibly allowing you to reduce all XML stuff to single transformation
I deliberately used generic char20 types to be able to handle headers. If you wanna preserve native types, then you cannot read table header (skip the first line in sheet LOOP), because you'll receive type violation and dump. If you receive table without headers, then it is fine to declare structure with native types
If you don't want to use transformations then sXML is your friend. You can parse XML with classes as well, but ST transformation are considerably faster
With some additional effort you can make this snippet dynamic and parse XLSX with any structure
You can read more about this approach in this doc.

Compile error when adding semantic action to Boost Spirit parser

Thanks to the help from user 'sehe' I'm now at the point where I can
compile my ast.
(Please see here: https://stackoverflow.com/a/29301655/1835623 )
Now one of the data fields extracted from JEDEC file I need to parse looks like this:
"12345 0000100101010111010110101011010"
I already built a parser to consume these kind of fields:
std::string input("12345 010101010101010101010");
std::string::iterator st = input.begin();
qi::parse(st, input.end(), qi::ulong_ >> ' ' >> *qi::char_("01"));
Obviously not that complicated. Now my problem is that I want to assign
the ulong_ and the binary string to some local variables using a semantic action. This is what I did:
using boost::phoenix::ref;
std::string input("12345 010101010101010101010");
std::string::iterator st = input.begin();
uint32_t idx;
std::string sequence;
qi::parse(st, input.end(),
qi::ulong_[ref(idx) = qi::_1] >>
' ' >>
*qi::char_("01")[ref(sequence) += qi::_2]);
But unfortunately this doesn't even compile and the error message I get
is not helpful (at least to me)? I guess it's something simple... but I'm hopelessly stuck now. :-(
Does someone have an idea what I'm doing wrong?
Two ways:
fix the SA's
qi::parse(st, input.end(),
qi::ulong_[ref(idx) = qi::_1] >>
' ' >>
qi::as_string[*qi::char_("01")] [phx::ref(sequence) += qi::_1]);
Notes:
it's qi::_1 because the expression the SA attaches to doesn't expose two elements, just 1
it's explicitly phx::ref because otherwise ADL¹ will select std::ref (because of std::string)
use as_string to coerce the attribute type from std::vector<char>
However, of course, as always: Boost Spirit: "Semantic actions are evil"?
qi::parse(st, input.end(),
qi::ulong_ >> ' ' >> *qi::char_("01"),
idx, sequence
);
Use the parser API to bind references to attributes.
¹ What is "Argument-Dependent Lookup" (aka ADL, or "Koenig Lookup")?

Stanford type dependency, can not extract "prepositional modfier"

I am trying to extract the prepositional modifier, like it is stated in the Dependency Manual:
I try to parse the sentence :
"I saw a cat with a telescope" , using the code:
List<CoreMap> sentences = stanfordDocument.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
Tree tree = sentence.get(TreeAnnotation.class);
TreebankLanguagePack languagePack = new PennTreebankLanguagePack();
GrammaticalStructureFactory grammaticalStructureFactory = languagePack.grammaticalStructureFactory();
GrammaticalStructure structure = grammaticalStructureFactory.newGrammaticalStructure(tree);
Collection<TypedDependency> typedDependencies = structure.typedDependenciesCollapsed();
for (TypedDependency td : typedDependencies) {
System.out.println(td.reln());
}
}
As stated in the Manual I was expecting to get : prep(saw, with).
In the Collection of the TypedDependeny I get only
"nsubj; root; det; dobj; det; prep_with" as relation type, and not the "prep/prepc" as stated in the http://robotics.usc.edu/~gkoch/DependencyManual.pdf (page 8).
I have also tried to extract pcomp : Prepositional compelement (page 7 of the manual) and it doesnt find it.
Did somebody encountered the same problem? Am I doing anything wrong?
CoreNLP outputs "Collapsed dependencies preserving a tree structure" (section 4.4 of the manual) from my experience. I think it's the same thing here (e.g. prep_with is a collapsed dependency of prep(saw, with))

Unable to append a sheet using OpenXml with F# (FSharp)

The CreateSpreadsheetWorkbook example method from the OpenXml documentation does translate directly to F#. The problem seems to be the Append method of the Sheets object. The code executes without error, but the resulting xlsx file is missing the inner Xml which should have been appended, and the file is unreadable by Excel. I suspect the problem stems from the conversion of functional F# structures into a System.Collections type, but I do not have direct evidence for this.
I have run similar code in C# and VB.NET (i.e. the documentation example) and it executes perfectly and creates a readable, complete xlsx file.
I know that I could deal with the XML directly, but I would like to understand the nature of the mismatch between F# and OpenXml. Any suggestions?
The code is almost directly from the example:
namespace OpenXmlLib
open System
open DocumentFormat
open DocumentFormat.OpenXml
open DocumentFormat.OpenXml.Packaging
open DocumentFormat.OpenXml.Spreadsheet
module OpenXmlXL =
// this function overwrites an existing file without warning!
let CreateSpreadsheetWorkbook (filepath: string) =
// Create a spreadsheet document by supplying the filepath.
// By default, AutoSave = true, Editable = true, and Type = xlsx.
let spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook)
// Add a WorkbookPart to the document.
let workbookpart = spreadsheetDocument.AddWorkbookPart()
workbookpart.Workbook <- new Workbook()
// Add a WorksheetPart to the WorkbookPart.
let worksheetPart = workbookpart.AddNewPart<WorksheetPart>()
worksheetPart.Worksheet <- new Worksheet(new SheetData())
// Add Sheets to the Workbook.
let sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets())
// Append a new worksheet and associate it with the workbook.
let sheet = new Sheet()
sheet.Id <- stringValue(spreadsheetDocument.WorkbookPart.GetIdOfPart(worksheetPart))
//Console.WriteLine(sheet.Id.Value)
sheet.SheetId <- UInt32Value(1u)
// Console.WriteLine(sheet.SheetId.Value)
sheet.Name <- StringValue("TestSheet")
//Console.WriteLine(sheet.Name.Value)
sheets.Append (sheet)
// Console.WriteLine("Sheets: {0}", sheets.InnerXml.ToString())
workbookpart.Workbook.Save()
spreadsheetDocument.Close()
The sheet is created, but empty:
sheet.xml:
<?xml version="1.0" encoding="utf-8" ?>
<x:worksheet xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main" />
workbook.xml:
<?xml version="1.0" encoding="utf-8" ?>
- <x:workbook xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
- <x:sheets>
<x:sheet name="TestSheet" sheetId="1" r:id="R263eb6f245a2497e" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
</x:sheets>
</x:workbook>
The problem is very subtle, and is in your calls to the Worksheet constructor and the Sheets.Append method. Both of these methods are overloaded, and can take either a seq<OpenXmlElement> or any number of individual OpenXmlElements (via a [<System.ParamArray>]/params array). The twist is that the OpenXmlElement type itself implements the seq<OpenXmlElement> interface.
In C#, when you call new Worksheet(new SheetData()), the compiler's overload resolution picks the second of the overloads, implicitly creating a one-element array containing the SheetData value. However, in F#, since the SheetData class implements IEnumerable<OpenXmlElement>, the first overload is chosen, which creates a new WorkSheet by enumerating the contents of the SheetData, which is not what you want.
To fix this, you need to set up your calls so that they use the other overload (first example below) or explicitly create a singleton sequence (second example below):
worksheetPart.Worksheet <- new Worksheet(new SheetData() :> OpenXmlElement)
...
sheets.Append([sheet :> OpenXmlElement])

When parsing XML, the character é is missing

I have an XML as input to a Java function that parses it and produces an output. Somewhere in the XML there is the word "stratégie". The output is "stratgie". How should I parse the XML as to get the "é" character as well?
The XML is not produced by myself, I get it as a response from a web service and I am positive that "stratégie" is included in it as "stratégie".
In the parser, I have:
public List<Item> GetItems(InputStream stream) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(stream);
doc.getDocumentElement().normalize();
NodeList nodeLst = doc.getElementsByTagName("item");
List<Item> items = new ArrayList<Item>();
Item currentItem = new Item();
Node node = nodeLst.item(0);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element item = (Element) node;
if(node.getChildNodes().getLength()==0){
return null;
}
NodeList title = item.getElementsByTagName("title");
Element titleElmnt = (Element) title.item(0);
if (null != titleElmnt)
currentItem.setTitle(titleElmnt.getChildNodes().item(0).getNodeValue());
....
Using the debugger, I can see that titleElmnt.getChildNodes().item(0).getNodeValue() is "stratgie" (without the é).
Thank you for your help.
I strongly suspect that either you're parsing it incorrectly or (rather more likely) it's just not being displayed properly. You haven't really told us anything about the code or how you're using the result, which makes it hard to give very concrete advice.
As ever with encoding issues, the first thing to do is work out exactly where data is getting lost. Lots of logging tends to be the way forward: create a small test case that demonstrates the problem (as small as you can get away with) and log everything about the data. Don't just try to log it as raw text: log the Unicode value of each character. That way your log will have all the information even if there are problems with the font or encoding you use to view the log.
The answer was here: http://www.yagudaev.com/programming/java/7-jsp-escaping-html
You can either use utf-8 and have the 'é' char in your document instead of é, or you need to have a parser that understand this entity which exists in HTML and XHTML and maybe other XML dialects but not in pure XML : in pure XML there's "only" ", <, > and maybe &apos; I don't remember.
Maybe you can need to specify those special-char entities in your DTD or XML Schema (I don't know which one you use) and tell your parser about it.

Resources