XML Message from UNIX Pipe via Stream Component - stream

I have a process that outputs multiple XML documents to a UNIX named pipe in a continuous stream. I'd like to be able to take that named pipe input and create multiple XML messages in the "from" portion of a camel route.
The Stream component seemed to be the natural choice to consume the named pipe input, but each line of the XML text is converted into a message instead of the whole XML document being the message.
I know I'm missing something fundamental here, but my google-foo has come up empty...
Any pointers on how to accomplish this greatly appreciated.
Thanks,
Dave

So more RTFM'ing and I have a solution.
from("file:/tmp?fileName=tshark-pipe&noop=true")
.split().tokenizeXML("packet").streaming()
.log("body: ${body}");
I found the above solution in the Camel "Splitter" EIP documentation page.

Related

Dart Read support for binary files

there exist some sample code for an Http Server in the Dart:io section.
Now I will distribute images with this server. To achieve this, I read the requested image file and send its content to the client via request.response.write().
The problem is the format of the read data:
Either I read the image file as 16bit-String or as Byte Array. Neither of them is compatible to a raw 8-bit array, which I have to send to the client.
May someone help me?
There exist several kinds of write-methods in the response class.
write
writeCharCode
add
While "write" writes the data 'as seen', "writeCharCode" transforms the data back to raw-format. However, writeCharCode prepends some "magic byte" (C2) at the beginning, so it corrupts the data.
Another function, called add( List < int > ) processes the readAsBytes-result as desired.
Best regards,
Alex

How to get the string that caused parse error?

Suppose I have this code:
(handler-case (read ...)
(parse-error (condition)
(format t "What text was I reading last to get this error? ~s~&"
(how-to-get-this-text? condition))))
I can only see the parse-namestring accessors, but it gives the message of the error, not the text it was parsing.
EDIT
In my case the problem is less generic, so an alternative solution not involving the entire string that failed to parse can be good too.
Imagine this example code I'm trying to parse:
prefix(perhaps (nested (symbolic)) expressions))suffix
In some cases I need to stop on "suffix" and in others, I need to continue, the suffix itself has no other meaning but just being an indicator of the action the parser should take next.
READ parses from a stream, not a string. The s-expression can be arbitrarily long. Should READ keep a string of what's been read?
What you might need is a special stream. In standard Common Lisp there is no mechanism for user defined streams. But in real life every implementation has such extensible streams. See for example 'gray streams'.
http://www.sbcl.org/1.0/manual/Gray-Streams.html
There's no standard function to do it. You might be able to brute-force something with read-from-string, but whatever you do, it will require some extra work.

What Character encoding is this?

When i backup my blackberry using blackberry desktop mananger, it saves it as an .ipd file.
its in hex... Not sure if its any particular type. But i used software called ABC amber Text Converter to convert this .ipd file into plain text format. And some of it comes out as plain text, Like all the messages saved in the backup file. But some of the text in the file looks like this:
qÖ²u_+;¢õ¿B[[¤†D`Ø,>p
|Cñ:ÌQ†nÁä¼sÒ®sKDv©{(]
)++³É«.gsn>
z
'‚51o4Kq
8Ütâ¯cí¿þ2´Õ|5kl$S,H
dbiIjz
*!~k$|
&*OÝ>0ðî­wã
+zno%q
2k;
YnÁÅŸ5|Xñ7Ú<}y2
A
V܉lO5‰<œtÅRI-I
Does anybody have any idea What the hell this is or if there is Any way i can decode this?
Thanks
It's just binary data. You may have been able to extract some text from the file where strings of text were stored, but the rest will be just bytes of data.
You'll need a specific program that understands these backup files. A quick google reveals a few choices, such as MagicBerry.
One of the Blackberry developers has helpfully blogged a bit of information about the binary format, so you could try using that to write your own program to parse it:
http://us.blackberry.com/devjournals/resources/journals/jan_2006/ipd_file_format.jsp

how to obtain URLs from Dmoz ODP

I want to use a database of URLs present in DMOZ ODP for my application. ( an array of URL strings OR a file containing the same ). Is there any way of obtaining it , ( other than the manual copy-paste ) ?
EDIT :
Is there any script / code to parse the rdf file..
Take a look at http://rdf.dmoz.org/, you'll need to find a way to parse the RDF into your database.
I did this the other day using the odp2db scripts from Steve's Software. They're old, but the format hasn't changed significantly so they work fine.
I found I didn't need to do the iconv and xmlclean.pl steps suggested in the readme, just uncompressed the dumps and ran the structure2db.pl and content2db.pl scripts. You'll need to create the database tables manually (see the SQL at top of script for that) and modify the connection details in the scripts before you start.
With the mid-January 2009 dump I used, there's 756,962 categories and 4,436,796 websites. It took a while to run through them all, but not excessively long, though I did dispense with the site descriptions as I didn't need them. Also, may be worth adding database indices after creating the tables to speed access up later. The raw structure and content files were 75MB and 300MB compressed respectively. 848MB and 2GB respectively.
I've actually done this in java. I just used the SAX API to read through the RDF files. It was pretty straight forward. In my case I wanted to pull out every URL that was in a topic with "Weblogs" in the topic name.
Basically what did was implement a org.xml.sax.helpers.DefaultHandler
Then to setup the code you do:
InputSource is = new InputSource(new FileInputStream("filename.rdf"));
XMLReader r = XMLReaderFactory.createXMLReader();
r.setContentHandler(new MyHandlerClass());
r.parse(is);
and that's pretty much it. In my handler class I had to implement:
startElement(String uri, String localName, String qName, Attributes attributes) then I had an if statement to see if it was an "ExternalPage" tag, in which case I went to another state to look for "topic","Title" and "Description". I had another
characters(char[] ch, int start, int length) where I read in the topic, title, and description text depending on which one had been most recently sent to startElement
endElement(String uri, String localName, String qName) where I checked to see which element was ending, and if it ExternalPage, that meant the end of the current element.
The whole thing was 80-90 lines of code for the basic parsing. So pretty easy to write. It was able to chew through the multi-gigabyte files in... I don't remember maybe a minute or two? If you just want to query out some specific data, it might be easier just to write the code to do that in your handler, rather then trying to load it into a DB.
If you find a tool that works well, that's obviously better then writing your own code. But writing your own code isn't hard! RDF is just an XML format, and it's not nested or anything. A simple SAX parser is easily doable in a day or so.
You could always pay one of the currupt editors there and they will help you out :)

How to parse a .xfa file

Hoping that someone has some info on how to parse a xfa file. I can parse csv or xml files just fine, but an xfa one has come along and I'm not familar with the format. Looks like tab delimited body with column metadata at the top.
Anyone dealt with these before or can give me a steer on how to parse them?
I use vb.net but the language of any solution isn't too relevant.
Much appreciated.
Mmm, looks like nobody has a clue. The problem is that .xfa doesn't look like a "standard" extension: after all, anybody can create its own extension names, from .xyz to .something...
I looked around a bit, found, unsurprisingly (the 'x') an XML format with this extension, not much more.
Indicating where this kind of file come from, what kind of data it holds, might help. Or not.
You describe the file as being a simple TSV (tab separated values) with a header. It is quite trivial to parse, with a tokenizer or some regex, so I am not sure where you are stuck.
I think you might be talking about this: http://en.wikipedia.org/wiki/XFA_forms
This seemed to be a page that was designed to deal with that template: http://www.w3.org/1999/05/XFA/xfa-template-19990614
That information should be enough to get the ball rolling. If that fails then you can always analyse the file itself for patterns and go from there. I don't see it being too tricky.
Anyway, I hope that helps.
P.S. If you could provide a link to that .xfa we could probably give you more help.
The original post says the content looks like "tab delimited body with column metadata at the top". An XFA form doesn't look anything like that - XFA forms typically use a *.xdp extension and are XML.
Check out the Adobe page:
http://partners.adobe.com/public/developer/xml/index_arch.html
(Adobe XML Forms Architecture, currently 1400 pages)
Let LiveCycle/Acrobat parse it for you.

Resources