How can one to dynamically parse a CSV file using C# and the Smart Format Detector in FileHelpers 3.1? - parsing

As per in this FileHelpers 3.1 example, you can automatically detect a CSV file format using the FileHelpers.Detection.SmartFormatDetector class.
But the example goes no further. How do you use this information to dynamically parse a CSV file? It must have something to do with the DelimitedFileEngine but I cannot see how.
Update:
I figured out a possible way but had to resort to using reflection (which does not feel right). Is there another/better way? Maybe using System.Dynamic? Anyway, here is the code I have so far, it ain't pretty but it works:
// follows on from smart detector example
FileHelpers.Detection.RecordFormatInfo lDetectedFormat = formats[0];
Type lDetectedClass = lDetectedFormat.ClassBuilderAsDelimited.CreateRecordClass();
List<FieldInfo> lFieldInfoList = new List<FieldInfo>(lDetectedFormat.ClassBuilderAsDelimited.FieldCount);
foreach (FileHelpers.Dynamic.DelimitedFieldBuilder lField in lDetectedFormat.ClassBuilderAsDelimited.Fields)
lFieldInfoList.Add(lDetectedClass.GetField(lField.FieldName));
FileHelperAsyncEngine lFileEngine = new FileHelperAsyncEngine(lDetectedClass);
int lRecNo = 0;
lFileEngine.BeginReadFile(cReadingsFile);
try
{
while (true)
{
object lRec = lFileEngine.ReadNext();
if (lRec == null)
break;
Trace.WriteLine("Record " + lRecNo);
lFieldInfoList.ForEach(f => Trace.WriteLine(" " + f.Name + " = " + f.GetValue(lRec)));
lRecNo++;
}
}
finally
{
lFileEngine.Close();
}

As I use the SmartFormatDetector to determine the exact format of the incoming Delimited files you can use following appoach:
private DelimitedClassBuilder GetFormat(string file)
{
var detector = new FileHelpers.Detection.SmartFormatDetector();
var format = detector.DetectFileFormat(file);
return format.First().ClassBuilderAsDelimited;
}
private List<T> ConvertFile2Objects<T>(string file, out DelimitedFileEngine engine)
{
var format = GetSeperator(file); // Get Here your FormatInfo
engine = new DelimitedFileEngine(typeof(T)); //define your DelimitdFileEngine
//set some Properties of the engine with what you need
engine.ErrorMode = ErrorMode.SaveAndContinue; //optional
engine.Options.Delimiter = format.Delimiter;
engine.Options.IgnoreFirstLines = format.IgnoreFirstLines;
engine.Options.IgnoreLastLines = format.IgnoreLastLines;
//process
var ret = engine.ReadFileAsList(file);
this.errorCount = engine.ErrorManager.ErrorCount;
var err = engine.ErrorManager.Errors;
engine.ErrorManager.SaveErrors("errors.out");
//return records do here what you need
return ret.Cast<T>().ToList();
}
This is an approach I use in a project, where I only know that I have to process Delimited files of multiple types.
Attention:
I noticed that with the files I recieved the SmartFormatDetector has a problem with tab delimiter. Maybe this should be considered.
Disclaimer: This code is not perfected but in a usable state. Modification and/or refactoring is adviced.

Related

Filling a list within a loop using list.add and loop variable - how to clone variable?

i try to read out a certain XML API. It is my first project in dart and not very optimized, however it works until the loop ... well loops.
Here the code:
getKategorie(XmlElement element) {
Kategorie tmpKategorie = Kategorie();
dynamic iterableElements;
Board tmpBoard = Board();
tmpKategorie.id = element.getAttribute("id");
tmpKategorie.name = element.getElement("name")?.innerText;
tmpKategorie.description = element.getElement("description")?.innerText;
iterableElements = element.getElement("boards")?.childElements;
for (XmlElement tmpElement in iterableElements) {
if (tmpElement.localName == "board") {
tmpBoard.id = tmpElement?.getAttribute("id");
tmpBoard.name = tmpElement?.getElement("name")?.innerText;
tmpBoard.description = tmpElement?.getElement("description")?.innerText;
tmpKategorie.boards.add(tmpBoard);
}
}
return tmpKategorie;
}
the custom classes i use:
class Kategorie {
List<Board> boards = [];
dynamic id;
dynamic name = "NN";
dynamic description = "NN";
}
class Board {
dynamic id;
dynamic name;
dynamic description;
}
Everything works fine until i reach the second Element with localName board. While the tmpBoard variable will be filled with the data from the second/third/whatever Element content, the before added list element in tmpKategore.boards is also changed.
It looks like the add call for the tmpKategorie.boards list is only putting in the reference to the loop variable tmpBoard which ends in my problem: At the end all added list entries are identical to the last one.
How can i copy the object instead of referencing it into the list?
solution like suggested
if (tmpElement.localName == "board") {
tmpBoard.id = tmpElement?.getAttribute("id");
becomes
if (tmpElement.localName == "board") {
tmpBoard = Board();
tmpBoard.id = tmpElement?.getAttribute("id");

Saxonica - .NET API - XQuery - XPDY0002: The context item for axis step root/descendant::xxx is absent

I'm getting same error as this question, but with XQuery:
SaxonApiException: The context item for axis step ./CLIENT is absent
When running from the command line, all is good. So I don't think there is a syntax problem with the XQuery itself. I won't post the input file unless needed.
The XQuery is displayed with a Console.WriteLine before the error appears:
----- Start: XQUERY:
(: FLWOR = For Let Where Order-by Return :)
<MyFlightLegs>
{
for $flightLeg in //FlightLeg
where $flightLeg/DepartureAirport = 'OKC' or $flightLeg/ArrivalAirport = 'OKC'
order by $flightLeg/ArrivalDate[1] descending
return $flightLeg
}
</MyFlightLegs>
----- End : XQUERY:
Error evaluating (<MyFlightLegs {for $flightLeg in root/descendant::FlightLeg[DepartureAirport = "OKC" or ArrivalAirport = "OKC"] ... return $flightLeg}/>) on line 4 column 20
XPDY0002: The context item for axis step root/descendant::FlightLeg is absent
I think that like the other question, maybe my input XML file is not properly specified.
I took the samples/cs/ExamplesHE.cs run method of the XQuerytoStream class.
Code there for easy reference is:
public class XQueryToStream : Example
{
public override string testName
{
get { return "XQueryToStream"; }
}
public override void run(Uri samplesDir)
{
Processor processor = new Processor();
XQueryCompiler compiler = processor.NewXQueryCompiler();
compiler.BaseUri = samplesDir.ToString();
compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");
XQueryExecutable exp = compiler.Compile("<saxon:example>{static-base-uri()}</saxon:example>");
XQueryEvaluator eval = exp.Load();
Serializer qout = processor.NewSerializer();
qout.SetOutputProperty(Serializer.METHOD, "xml");
qout.SetOutputProperty(Serializer.INDENT, "yes");
qout.SetOutputStream(new FileStream("testoutput.xml", FileMode.Create, FileAccess.Write));
Console.WriteLine("Output written to testoutput.xml");
eval.Run(qout);
}
}
I changed to pass the Xquery file name, the xml file name, and the output file name, and tried to make a static method out of it. (Had success doing the same with the XSLT processor.)
static void DemoXQuery(string xmlInputFilename, string xqueryInputFilename, string outFilename)
{
// Create a Processor instance.
Processor processor = new Processor();
// Load the source document
DocumentBuilder loader = processor.NewDocumentBuilder();
loader.BaseUri = new Uri(xmlInputFilename);
XdmNode indoc = loader.Build(loader.BaseUri);
XQueryCompiler compiler = processor.NewXQueryCompiler();
//BaseUri is inconsistent with Transform= Processor?
//compiler.BaseUri = new Uri(xqueryInputFilename);
//compiler.DeclareNamespace("saxon", "http://saxon.sf.net/");
string xqueryFileContents = File.ReadAllText(xqueryInputFilename);
Console.WriteLine("----- Start: XQUERY:");
Console.WriteLine(xqueryFileContents);
Console.WriteLine("----- End : XQUERY:");
XQueryExecutable exp = compiler.Compile(xqueryFileContents);
XQueryEvaluator eval = exp.Load();
Serializer qout = processor.NewSerializer();
qout.SetOutputProperty(Serializer.METHOD, "xml");
qout.SetOutputProperty(Serializer.INDENT, "yes");
qout.SetOutputStream(new FileStream(outFilename,
FileMode.Create, FileAccess.Write));
eval.Run(qout);
}
Also two questions regarding "BaseURI".
1. Should it be a directory name, or can it be same as the Xquery file name?
2. I get this compile error: "Cannot implicity convert to "System.Uri" to "String".
compiler.BaseUri = new Uri(xqueryInputFilename);
It's exactly the same thing I did for XSLT which worked. But it looks like BaseUri is a string for XQuery, but a real Uri object for XSLT? Any reason for the difference?
You seem to be asking a whole series of separate questions, which are hard to disentangle.
Your C# code appears to be compiling the query
<saxon:example>{static-base-uri()}</saxon:example>
which bears no relationship to the XQuery code you supplied that involves MyFlightLegs.
The MyFlightLegs query uses //FlightLeg and is clearly designed to run against a source document containing a FlightLeg element, but your C# code makes no attempt to supply such a document. You need to add an eval.ContextItem = value statement.
Your second C# fragment creates an input document in the line
XdmNode indoc = loader.Build(loader.BaseUri);
but it doesn't supply it to the query evaluator.
A base URI can be either a directory or a file; resolving relative.xml against file:///my/dir/ gives exactly the same result as resolving it against file:///my/dir/query.xq. By convention, though, the static base URI of the query is the URI of the resource (eg file) containing the source query text.
Yes, there's a lot of inconsistency in the use of strings versus URI objects in the API design. (There's also inconsistency about the spelling of BaseURI versus BaseUri.) Sorry about that; you're just going to have to live with it.
Bottom line solution based on Michael Kay's response; I added this line of code after doing the exp.Load():
eval.ContextItem = indoc;
The indoc object created earlier is what relates to the XML input file to be processed by the XQuery.

Aspose: Text after Ampersand(&) not seen while setting the page header

I encountered a problem with setting the page header text containing ampersand like ‘a&b’. The text after ‘&’ disappears in the pdf maybe because it is the reserved key in Aspose. My code looks like this:
PageSetup pageSetup = workbook.getWorksheets().get(worksheetName).getPageSetup();
//calling the function
setHeaderFooter(pageSetup, parameters, criteria)
//function for setting header and footer
def setHeaderFooter(PageSetup pageSetup, parameters, criteria = [:])
{
def selectedLoa=getSelectedLoa(parameters)
if(selectedLoa.length()>110){
String firstLine = selectedLoa.substring(0,110);
String secondLine = selectedLoa.substring(110);
if(secondLine.length()>120){
secondLine = secondLine.substring(0,122)+"...."
}
selectedLoa = firstLine+"\n"+secondLine.trim();
}
def periodInfo=getPeriodInfo(parameters, criteria)
def reportingInfo=periodInfo[0]
def comparisonInfo=periodInfo[1]
def benchmarkName=getBenchmark(parameters)
def isNonComparison = criteria.isNonComparison?
criteria.isNonComparison:false
def footerInfo="&BReporting Period:&B " + reportingInfo+"\n"
if (comparisonInfo && !isNonComparison){
footerInfo=footerInfo+"&BComparison Period:&B " +comparisonInfo+"\n"
}
if (benchmarkName){
footerInfo+="&BBenchmark:&B "+benchmarkName
}
//where I encounterd the issue,selectedLoa contains string with ampersand
pageSetup.setHeader(0, pageSetup.getHeader(0) + "\n&\"Lucida Sans,Regular\"&8&K02-074&BPopulation:&B "+selectedLoa)
//Insertion of footer
pageSetup.setFooter(0,"&\"Lucida Sans,Regular\"&8&K02-074"+footerInfo)
def downloadDate = new Date().format("MMMM dd, yyyy")
pageSetup.setFooter(2,"&\"Lucida Sans,Regular\"&8&K02-074" + downloadDate)
//Insertion of logo
try{
def bucketName = parameters.containsKey('printedRLBucketName')?parameters.get('printedRLBucketName'):null
def filePath = parameters.containsKey('printedReportLogo')?parameters.get('printedReportLogo'): null
// Declaring a byte array
byte[] binaryData
if(!filePath || filePath.contains("null") || filePath.endsWith("null")){
filePath = root+"/images/defaultExportLogo.png"
InputStream is = new FileInputStream(new File(filePath))
binaryData = is.getBytes()
}else {
AmazonS3Client s3client = amazonClientService.getAmazonS3Client()
S3Object object = s3client.getObject(bucketName, filePath)
// Getting the bytes out of input stream of S3 object
binaryData = object.getObjectContent().getBytes()
}
// Setting the logo/picture in the right section (2) of the page header
pageSetup.setHeaderPicture(2, binaryData);
// Setting the script for the logo/picture
pageSetup.setHeader(2, "&G");
// Scaling the picture to correct size
Picture pic = pageSetup.getPicture(true, 2);
pic.setLockAspectRatio(true)
pic.setRelativeToOriginalPictureSize(true)
pic.setHeight(35)
pic.setWidth(Math.abs(pic.getWidth() * (pic.getHeightScale() / 100)).intValue());
}catch (Exception e){
e.printStackTrace()
}
}
In this case, I get only ‘a’ in the pdf header all other text after ampersand gets disappeared. Please suggest me with a solution for this. I am using aspose 18.2
We have added header on a PDF page with below code snippet but we did not notice any problem when ampersand sign is included in header text.
// open document
Document document = new Document(dataDir + "input.pdf");
// create text stamp
TextStamp textStamp = new TextStamp("a&bcdefg");
// set properties of the stamp
textStamp.setTopMargin(10);
textStamp.setHorizontalAlignment(HorizontalAlignment.Center);
textStamp.setVerticalAlignment(VerticalAlignment.Top);
// set text properties
textStamp.getTextState().setFont(new FontRepository().findFont("Arial"));
textStamp.getTextState().setFontSize(14.0F);
textStamp.getTextState().setFontStyle(FontStyles.Bold);
textStamp.getTextState().setFontStyle(FontStyles.Italic);
textStamp.getTextState().setForegroundColor(Color.getGreen());
// iterate through all pages of PDF file
for (int Page_counter = 1; Page_counter <= document.getPages().size(); Page_counter++) {
// add stamp to all pages of PDF file
document.getPages().get_Item(Page_counter).addStamp(textStamp);
}
// save output document
document.save(dataDir + "TextStamp_18.8.pdf");
Please ensure using Aspose.PDF for Java 18.8 in your environment. For further information on adding page header, you may visit Add Text Stamp in the Header or Footer section.
In case you face any problem while adding header, then please share your code snippet and generated PDF document with us via Google Drive, Dropbox etc. so that we may investigate it to help you out.
PS: I work with Aspose as Developer Evangelist.
Well, yes, "&" is a reserved word when inserting headers/footers in MS Excel spreadsheet via Aspose.Cells APIs. To cope with your issue, you got to place another ampersand to paste the "& (ampersand)" in the header string. See the sample code for your reference:
e.g
Sample code:
Workbook wb = new Workbook();
Worksheet ws = wb.getWorksheets().get(0);
ws.getCells().get("A1").putValue("testin..");
String headerText="a&&bcdefg";
PageSetup pageSetup = ws.getPageSetup();
pageSetup.setHeader(0, headerText);
wb.save("f:\\files\\out1.xlsx");
wb.save("f:\\files\\out2.pdf");
Hope this helps a bit.
I am working as Support developer/ Evangelist at Aspose.

Google Apps Script

I've created a simple script that reads through an xml file and posts the results to an SQL database. This works perfectly.
I've put a little if statement in the script to identify orders that have already been posted to SQL. Basically if the transactionID in the input array is higher than the highest transactionID on the SQL server it adds the row values to the output array.
It seems that I am missing a trick here because I am getting "TypeError: Cannot call method "getAttribute" of undefined. (line 18, file "Code")" when trying to compare the current xml row to the last transaction ID.
I've done some searching and whilst I can see people with similar problems the explanations don't make a whole lot of sense to me.
Anyway, here is the relevant part of the code. Note that this all works perfectly without the if() bit.
function getXML() {
var id = lastTransactionID();
var xmlSite = UrlFetchApp.fetch("https://api.eveonline.com/corp/WalletTransactions.xml.aspx?KeyID=1111&vCode=1111&accountKey=1001").getContentText();
var xmlDoc = XmlService.parse(xmlSite);
var root = xmlDoc.getRootElement();
var row = new Array();
row = root.getChild("result").getChild("rowset").getChildren("row");
var output = new Array();
var i = 0;
for (j=0;i<row.length;j++){
if(row[j].getAttribute("transactionID").getValue()>id){ //Produces: TypeError: Cannot call method "getAttribute" of undefined. (line 18, file "Code")
output[i] = new Array();
output[i][0] = row[j].getAttribute("transactionDateTime").getValue();
output[i][1] = row[j].getAttribute("transactionID").getValue();
output[i][2] = row[j].getAttribute("quantity").getValue();
output[i][3] = row[j].getAttribute("typeName").getValue();
output[i][4] = row[j].getAttribute("typeID").getValue();
output[i][5] = row[j].getAttribute("price").getValue();
output[i][6] = row[j].getAttribute("clientID").getValue();
output[i][7] = row[j].getAttribute("clientName").getValue();
output[i][8] = row[j].getAttribute("stationID").getValue();
output[i][9] = row[j].getAttribute("stationName").getValue();
output[i][10] = row[j].getAttribute("transactionType").getValue();
output[i][11] = row[j].getAttribute("transactionFor").getValue();
output[i][12] = row[j].getAttribute("journalTransactionID").getValue();
output[i][13] = row[j].getAttribute("clientTypeID").getValue();
i++;
}
}
insert(output,output.length);
}
I have seen my mistake and corrected.
Mistake was in the for loop.
for (j=0;i

What grammar is this?

I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Here are the different elements:
Group IDs:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of each group:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
One of the groups:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14 ^VAR1^6^VALUE1^^
Variables:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of the values:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
The values themselves:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.
Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?
So far I am using my own parser, but due to the fact that I need to detect and handle corrupt serializations the code looks rather messy, thus my question for a parser library that could lift the burden.
The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:
id^length^content^
At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.
So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.
Breaking it down into these steps, first we need a way to get "tokens" from the string. This function returns an object with three methods, to find out if we're at "end of file", and to grab the next delimited or counted substring:
var tokens = function(str) {
var pos = 0;
return {
eof: function() {
return pos == str.length;
},
delimited: function(d) {
var end = str.indexOf(d, pos);
if (end == -1) {
throw new Error('Expected delimiter');
}
var result = str.substr(pos, end - pos);
pos = end + d.length;
return result;
},
counted: function(c) {
var result = str.substr(pos, c);
pos += c;
return result;
}
};
};
Now we can conveniently write the reusable parse function:
var parse = function(str) {
var parts = {};
var t = tokens(str);
while (!t.eof()) {
var id = t.delimited('^');
var len = t.delimited('^');
var content = t.counted(parseInt(len, 10));
var end = t.counted(1);
if (end !== '^') {
throw new Error('Expected ^ after counted string, instead found: ' + end);
}
parts[id] = content;
}
return parts;
};
It builds an object where the keys are the IDs (or variable names). I'm asuming as they have names that the order isn't significant.
Then we can use that at both levels to create the function to do the whole job:
var parseGroups = function(str) {
var groups = parse(str);
Object.keys(groups).forEach(function(id) {
groups[id] = parse(groups[id]);
});
return groups;
}
For your example, it produces this object:
{
'1': {
VAR1: 'VALUE1'
},
'4': {
VAR1: 'VALUE1',
VAR2: 'VAL2'
}
}
I don't think it's a trivial task to create a grammar for this. But on the other hand, a simple straight forward approach is not that hard. You know the corresponding string length for every critical string. So you just chop your string according to those lengths apart..
where do you see problems?

Resources