How to parse a .xfa file - parsing

Hoping that someone has some info on how to parse an .xfa file. I can parse CSV or XML files just fine, but an .xfa one has come along and I'm not familiar with the format. It looks like a tab-delimited body with column metadata at the top.
Anyone dealt with these before or can give me a steer on how to parse them?
I use vb.net but the language of any solution isn't too relevant.
Much appreciated.

Mmm, looks like nobody has a clue. The problem is that .xfa doesn't look like a "standard" extension: after all, anybody can create their own extension names, from .xyz to .something...
I looked around a bit and found, unsurprisingly (given the 'x'), an XML format with this extension, but not much more.
Indicating where this kind of file comes from and what kind of data it holds might help. Or not.
You describe the file as being a simple TSV (tab separated values) with a header. It is quite trivial to parse, with a tokenizer or some regex, so I am not sure where you are stuck.
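If the body really is plain TSV, a standard CSV reader with a tab delimiter does the job. A minimal Python sketch, assuming the metadata occupies a known number of lines (`skip`, a hypothetical parameter) and is followed by one header row of column names; quoting is switched off so quote characters are kept literally, as in strict TSV:

```python
import csv


def parse_tsv(text, skip=0):
    """Parse tab-separated text into a list of dicts keyed by column name.

    `skip` is the number of metadata lines before the header row (an assumption
    about the file layout; adjust it to what the real file contains).
    """
    lines = text.splitlines()[skip:]
    # DictReader takes the first remaining line as the header row and skips blank lines.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```

For example, `parse_tsv(open("data.xfa").read(), skip=1)` (hypothetical file name) would return one dict per data row. The same approach maps directly to VB.NET's `TextFieldParser` with its delimiter set to a tab.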

I think you might be talking about this: http://en.wikipedia.org/wiki/XFA_forms
This seemed to be a page that was designed to deal with that template: http://www.w3.org/1999/05/XFA/xfa-template-19990614
That information should be enough to get the ball rolling. If that fails then you can always analyse the file itself for patterns and go from there. I don't see it being too tricky.
Anyway, I hope that helps.
P.S. If you could provide a link to that .xfa we could probably give you more help.

The original post says the content looks like "tab delimited body with column metadata at the top". An XFA form doesn't look anything like that - XFA forms typically use a *.xdp extension and are XML.
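Since the two interpretations (XML-based XFA/XDP versus a tab-delimited export) look so different, a quick check of the first bytes tells you which one you actually have. A minimal Python sketch, assuming a UTF-8 or ASCII file (UTF-16 files would need a different check):

```python
import codecs


def looks_like_xml(head: bytes) -> bool:
    """Guess whether a file is XML from its first few bytes."""
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]  # skip a UTF-8 byte-order mark
    # XML documents start with '<' (declaration or root element) after optional whitespace.
    return head.lstrip().startswith(b"<")
```

Usage: `with open("data.xfa", "rb") as f: print(looks_like_xml(f.read(1024)))`, where `data.xfa` is a placeholder name.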

Check out the Adobe page:
http://partners.adobe.com/public/developer/xml/index_arch.html
(Adobe XML Forms Architecture, currently 1400 pages)
Let LiveCycle/Acrobat parse it for you.

Related

Angular 2 - upload of file

I need to implement image uploads to my web server in my Angular 2 app. Can anybody give me some guidance on how to achieve this?
These are the prerequisites:
ASMX web service communicating in JSON.
POST method used for communication.
JPEG/PNG images up to 1 MB in size.
Concept I wanted to follow (but failed)
Load the content of the JPEG into a variable, encode it using Base64, and post it to an ASMX service that accepts two parameters (a token for authentication and the encoded data).
What exactly is my problem
The web service was the easy part; it is done and working, but I can't manage to get the file content for encoding. I used this:
component.html
...
<input type="file" (change)="fileChangeEvent($event)" />
...
component.ts
private fileChangeEvent(fileInput: any) {
let image = fileInput.target.files[0] as File;
...
}
As you have probably guessed, the problem is the File class: it gives me only basic info about the file (name, size, last modified, ...) but not the content of the file. Or at least I don't know how to get it. I also checked other questions here on SO, but all of the answers had something special that did not meet my requirements. Maybe I'm just blind, but I can't see where the content comes from.
So, is there anybody, who is able to provide me some guidelines to follow?
Thank you very much in advance.
I left this question open for more experienced people to answer, but no answer came, and yesterday I found it myself after some research and rewording my search. There is a FileReader type that can be used to read the content of a file. Here is the source of the answer:
Getting byte array through input type = file
Thanks to the original answer, I now know how to do it.
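Once FileReader has produced the bytes (or a base64 data URL via `readAsDataURL`, whose `data:...;base64,` prefix you would strip first), the remaining step from the question is building the JSON body. A minimal Python sketch of that payload, for illustration only; the field names `token` and `data` are hypothetical, since the question does not name the ASMX parameters:

```python
import base64
import json

MAX_BYTES = 1024 * 1024  # the question's 1 MB limit


def build_upload_payload(token: str, image_bytes: bytes) -> str:
    """Return a JSON body carrying an auth token and Base64-encoded image data."""
    if len(image_bytes) > MAX_BYTES:
        raise ValueError("image larger than 1 MB")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # "token" and "data" are placeholder names; use the web method's real parameter names.
    return json.dumps({"token": token, "data": encoded})
```

On the server, decoding the `data` field from Base64 gives back the original file bytes.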

Crawler: HTML file encoding issue

I am trying to write a crawler to collect some information, but I find that some words look different in the webpage source. For example, the word Möller appears in a different, encoded form in the HTML file.
I want to know how I can recover the original word after downloading the HTML file.
I have fixed this problem and am posting the answer in case a beginner runs into the same thing.
I use chr() to replace the wrongly encoded sequence; for example, chr(246) gives ö.
If there is better solution, please tell me.
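Replacing characters one by one with chr() only covers the cases you have already seen. Two more general fixes in Python's standard library, depending on which kind of damage the page shows: HTML character references (`&ouml;`, `&#246;`) are decoded by `html.unescape`, and UTF-8 text that was wrongly decoded as Latin-1 (shown as `MÃ¶ller`) can be reversed by re-encoding. This is a sketch; the cleanest fix is to decode the downloaded bytes with the charset the page declares in the first place.

```python
import html


def fix_entities(text: str) -> str:
    """Turn HTML character references such as &ouml; or &#246; into characters."""
    return html.unescape(text)


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 (e.g. 'MÃ¶ller')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of damage; leave the text unchanged
```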

How to get HTML tag text by id using Lua

I have a webpage parser that takes a page containing several tags in a certain structure, where the divs are badly nested. I need to extract a certain div element and copy it, with all its content, to a new HTML file.
Since I am new to Lua, I may need basic clarification for things that might seem simple.
Thanks,
How easy it is to extract the data largely depends on the page itself. If the page uses the exact same tags throughout, it will be much harder to extract from than if it has named tags.
If you're able to find a version of the page that returns JSON, then you're that much better off. Here's a snippet from something I wrote to grab definitions from a webpage that did not offer JSON:
local actualword, definition = string.match(wayup,"<html.-<td class='word'>%c(.-)%c</td>.-<div class=\"definition\">(.-)</div>")
Essentially, this code searched down the page until it found the class "word", and took the word after it (%c is the pattern for control characters). It continued on to "definition" and captured that, as well.
As you can see, it's a bit convoluted, but I had the luck of having specifically named tags for what I wanted.
This is edited to fit your comment. As a side note I should have mentioned before: if you're familiar with regular expressions, Lua patterns work in a similar way, and you can use them to capture what you need. In this case, the pattern captures the whole div, including its content:
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")
It's rarely the fault of the language, but rather the webpage itself, that makes it hard to data mine anything. Since webpages can have hundreds of lines of markup, it's hard to pinpoint exactly what you want without picking up garbage. That's why I prefer a simplified result such as JSON: JSON modules are available for Lua that can encode and decode it, so you get precisely the information you need.
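A lazy pattern like `.-</div>` stops at the first closing tag, so it breaks as soon as the target div contains nested divs of unknown depth. The usual alternative is to walk the tags and count div depth. A minimal sketch using Python's standard `html.parser` (Python is used only for illustration; the same depth-counting idea can be written in Lua with `string.find` in a loop). It assumes the target div's own open and close tags balance; it does not repair badly nested markup.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the markup of the first <div> whose id matches, including nested divs."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=False)  # keep entities so they are re-emitted as-is
        self.target_id = target_id
        self.depth = 0  # div nesting depth inside the target; 0 means not capturing
        self.done = False
        self.parts = []

    def capturing(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.capturing():  # self-closing tags such as <br/> do not change depth
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.capturing():
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.capturing():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.capturing():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.capturing():
            self.parts.append(f"&#{name};")


def extract_div(page, div_id):
    """Return the markup of the div with the given id, or None if it is not found."""
    parser = DivExtractor(div_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Writing the result of `extract_div(page, "aa")` to a new file gives the copied div with all its content.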

Ruby/Rails parse XML without loading it all into memory

I'm wondering if there's an easy way to parse an XML document in Rails without loading it all into RAM.
I've been using (depending on the XML) a combination of Nokogiri and the standard Hash.from_xml to get the contents of the XML.
That is all well and good when I'm dealing with (attempting to import) 100 or even 1000 products. When, however, the XML doc has 16,000 or 40,000 products in it... well, my Dino starts to really, really feel it.
So I'm wondering if there's a way to walk the XML without pulling it all into memory.
Sorry I don't have code... I'm attempting to avoid writing anything new. I mean, who wants to write their own XML parser, eh?
I came up with this:
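require 'nokogiri'

# Stream the XML node by node instead of loading the whole document into memory.
reader = Nokogiri::XML::Reader(File.open('test.xml'))
reader.each do |node|
  # The reader visits opening and closing tags; only act on opening <Product> tags.
  next unless node.name == 'Product' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

  # Parse just this element into a Hash (Hash.from_xml comes from ActiveSupport/Rails).
  hash = Hash.from_xml(node.outer_xml).values.first
  # ... import `hash` here ...
end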
I watched my memory usage while I ran this across a 60 MB file. It accomplished my goal. I'd love to see other answers. Perhaps something even lighter.
Because XML is hierarchical, a DOM parser needs to know the whole structure to parse it correctly. You could feed well-formed fragments to Nokogiri::XML::Document.parse, but you'd need to get those fragments out some other way.
Let's say you have a huge xml document:
<products>
<product>stuff</product>
<product>...</product>
... and so on
</products>
The actual products are wrapped in <products>: strip out the envelope, then use string splitting to get an array of each <product> and its contents, and parse each of these as an XML fragment. Just a thought.
This might help, although I've never used it: https://github.com/soulcutter/saxerator
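For comparison, the same streaming idea as the Nokogiri::XML::Reader answer, sketched with Python's standard `xml.etree.ElementTree.iterparse`. It assumes each `<product>` holds simple child elements whose text you want; clearing each element after use keeps memory roughly flat (the root still keeps empty placeholders for processed elements, which is usually negligible).

```python
import io
import xml.etree.ElementTree as ET


def iter_products(source, tag="product"):
    """Yield one dict per <tag> element, built from its child elements' text."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop this element's children now that we are done with it


if __name__ == "__main__":
    sample = io.BytesIO(b"<products><product><name>A</name></product></products>")
    for record in iter_products(sample):
        print(record)
```

`source` can be a filename or a binary file object, so a 60 MB file is read incrementally rather than all at once.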

Config file format

Does anyone know a configuration file format that is easy for humans to read? I want to have something like tag = value, where value may be:
String
Number (int or float)
Boolean (true/false)
Array (of String values, Number values, Boolean values)
Another structure (it will be clearer what I mean in the following example)
Now I use something like this:
IntTag=1
FloatTag=1.1
StringTag="a string"
BoolTag=true
ArrayTag1=[1 2 3]
ArrayTag2=[1.1 2.1 3.1]
ArrayTag3=["str1" "str2" "str3"]
StructTag=
{
NestedTag1=1
NestedTag2="str1"
}
and so on.
Parsing is easy, but for large files I find it hard to read and edit in text editors. I don't like XML for the same reason: it's hard to read. INI does not support nesting, and I want to be able to nest tags. I also don't want a complicated format, because I will only use the limited set of value types listed above.
Thanks for any help.
What about YAML? It's easy to parse, nicely structured, and has wide programming language support. If you don't need the full feature set, you could also use JSON.
Try YAML - is (subjectively) easy to read, allows nesting, and is relatively simple to parse.
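To show how the question's example maps onto JSON (the lighter option suggested above), here is the same configuration written as JSON and loaded with Python's standard `json` module. Every value type the question lists (string, int, float, boolean, arrays, nested structures) has a direct JSON equivalent; YAML would look similar but needs a third-party parser in most languages.

```python
import json

CONFIG = """
{
  "IntTag": 1,
  "FloatTag": 1.1,
  "StringTag": "a string",
  "BoolTag": true,
  "ArrayTag1": [1, 2, 3],
  "ArrayTag2": [1.1, 2.1, 3.1],
  "ArrayTag3": ["str1", "str2", "str3"],
  "StructTag": {
    "NestedTag1": 1,
    "NestedTag2": "str1"
  }
}
"""

# json.loads turns objects into dicts, arrays into lists, and keeps int/float/bool types.
config = json.loads(CONFIG)
```

After loading, nested values are reached with ordinary indexing, e.g. `config["StructTag"]["NestedTag1"]`.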
