How to parse XML with ASIHTTPRequest - iOS

I am currently testing how to combine iOS, PHP, MySQL and XML. So far I have set up a database, got my PHP scripts working, and am using them to request data from my external MySQL database, which then returns the results to my app via the ASIHTTPRequest network wrappers. What I am now trying to do is parse the XML that I am generating with my PHP scripts. It's a pretty simple example; my PHP output looks a little something like this:
<?xml version="1.0"?>
<entries>
<code>3237554</code>
</entries>
All I am looking to parse is that number. In the past I have used NSXMLParser from the Objective-C library, but there I was reading everything from an RSS feed, so the setup for acquiring the data was entirely different since it was not self-generated. Because I am using ASIHTTPRequest, my guess (which I'd like you to help me confirm) is that I should be grabbing my XML in
- (void)requestFinished:(ASIHTTPRequest *)request
which is pretty obvious, because that is where I am already grabbing the basic text output. So my question is: what is the best way to start parsing this incoming text? Is there a special XML parser library I am not aware of, or can I use NSXMLParser somehow?
EDIT: Working solution here

If the size of the returned XML is not that big, libxml2 is a convenient solution that I have used effectively.
Take a look at Matt's blog here --
http://cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html
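Whichever library you pick (libxml2 with XPath, or NSXMLParser), the extraction itself is a single query against the sample response in the question. Here is the idea sketched in Python's standard library for illustration; the tag names match that sample:

```python
import xml.etree.ElementTree as ET

# The sample XML returned by the PHP script (copied from the question).
xml_text = """<?xml version="1.0"?>
<entries>
<code>3237554</code>
</entries>"""

root = ET.fromstring(xml_text)   # parse the response body
code = root.findtext("code")     # XPath-style lookup of <code> under <entries>
print(code)  # -> 3237554
```

With libxml2 the equivalent is an XPath query like `/entries/code` against the data handed to you in requestFinished:.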

Related

Parse and store using Apache Nutch

I am trying to develop a crawler to crawl youtube.com, parse the meta information (title, description, publisher etc.) and store it in HBase or other storage systems. I understand that I have to write plugin(s) to achieve this, but I'm confused about which plugins I need to write. I am looking at these four:
Parser
ParserFilter
Indexer
IndexFilter
To parse the specific metadata from a YouTube page, do I need to write a custom Parser plugin or a ParseFilter plugin alongside the parse-html plugin?
After parsing, do I need to write an IndexWriter plugin to store the entry in HBase or another storage system? By indexing, we generally mean indexing into Solr, Elasticsearch etc., but I obviously don't need to index into any search engine. So how can I store the parsed data in a store such as HBase?
Thanks in advance!
Since YouTube is a web page, you'll need to write an HtmlParseFilter, which gives you access to the raw HTML fetched from the server. At the moment, however, youtube.com uses a LOT of JavaScript, and neither parse-html nor parse-tika supports executing JS, so I'd advise you to use the protocol-selenium plugin: you delegate the rendering of the page to the Selenium driver and get the HTML back after all the JS has been executed. After you write your own HtmlParseFilter, you'll need to write your own IndexingFilter; there you only specify what info you want to send to your backend. This part is totally backend-agnostic and relies only on the Nutch codebase (which is why you'll also need your own IndexWriter).
I assume that you're using Nutch 1.x; in that case, yes, you need to write a custom IndexWriter for your backend (which is fairly easy). If you use Nutch 2.x you'll have access to several backends through Apache Gora, but then you'll have some features missing (like protocol-selenium).
I think you should use something like Crawler4j for your purposes.
The real power of Nutch shows when you want to do a much wider crawl or index your data directly into Solr/ES. But since you just want to download data for each URL, I would go with Crawler4j: it's much easier to set up and does not require complex configuration.
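Whichever plugin ends up hosting it, the extraction step itself is ordinary HTML metadata scraping. Here is a minimal, Nutch-independent sketch in Python using only the standard library; the sample HTML and field names are invented for illustration:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collects <meta name/property=... content=...> pairs and the <title> text."""
    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "content" in attrs:
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.meta[key] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data

# Hypothetical raw HTML as an HtmlParseFilter would see it (post-rendering).
html = ('<html><head><title>Some video</title>'
        '<meta name="description" content="A demo clip">'
        '<meta property="og:video:tag" content="demo"></head><body></body></html>')

p = MetaExtractor()
p.feed(html)
print(p.title, p.meta["description"])
```

In a real Nutch HtmlParseFilter you would do the same walk over the parsed DocumentFragment and put the resulting key/value pairs into the parse metadata for your IndexingFilter to pick up.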

How to download a webpage with a URL query in Clojure?

I like that in Clojure I can read a webpage with (slurp url) the same way I read files stored on the local machine. But as soon as the URL contains a question mark followed by parameters (https://www.google.ru/search?q=clojure), slurp returns a 400 error. Do I have to use another function? What is the simplest way?
I think you'll need to encode the = yourself. Try this:
(slurp "https://google.ru/search?q%3Dclojure")
Also note that there have been encoding issues in the past with the underlying clojure.java.io/reader (which slurp uses under the covers), so check your Clojure version as well.
It is worth noting, however, that slurp is pretty basic, and I wouldn't recommend relying on it for anything beyond really simple tasks or as a convenience when working with URLs. If you need to pull information from URLs in real code, I would suggest looking at clj-http, a full-featured HTTP client library that gives you much more control than slurp.
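The underlying issue the answer points at is that characters in a query string sometimes need percent-encoding before the URL is fetched. The same idea sketched with Python's standard library (the host and parameter are just the ones from the question):

```python
from urllib.parse import urlencode, quote

# Build a properly encoded query string instead of hand-writing it.
params = {"q": "clojure"}
url = "https://www.google.ru/search?" + urlencode(params)
print(url)  # -> https://www.google.ru/search?q=clojure

# Percent-encoding a single literal character, as the answer above does with '=':
print(quote("=", safe=""))  # -> %3D
```

Full-featured HTTP clients (clj-http in Clojure, or requests in Python) do this encoding for you, which is one reason they are preferable to raw slurp for anything non-trivial.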

Get data from server in iOS

I have a script that runs for hours and outputs whether a certain place is open or closed. I want to run it on my server and then, from my iOS app, find out whether the place is open or closed. I was wondering how I could do this without generating a JSON file and then parsing that, since I would have to re-generate the file frequently as places change from open to closed and back. So is there a way I could make a call to the server and get just the data I want?
I'm not entirely sure whether you are asking about an iOS framework for requests and responses or about the server side, so I'll try to answer both. Here are some Swift projects you might be interested in. I'm not incredibly familiar with Objective-C yet, but these resources might prove useful.
Alamofire
This will help you send and receive HTTP requests.
SwiftyJSON
This will help with JSON parsing.
However, if your question is more about the server side, you could look into a REST API. That way you request only the entities you are interested in, instead of sending back the entire batch of information and searching through it on the iOS device.
Hopefully this is somewhat helpful.
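To make the REST suggestion concrete: the endpoint can return just the single value the app needs, so there is no JSON file to regenerate. Below is a minimal sketch in Python as a WSGI app; the /status path and the is_open() check are hypothetical stand-ins for however your long-running script records its result:

```python
from wsgiref.util import setup_testing_defaults

def is_open():
    # Hypothetical: in reality this would read the result your
    # long-running script stores (e.g. a row in a database).
    return True

def app(environ, start_response):
    """Tiny WSGI endpoint: GET /status returns just 'open' or 'closed'."""
    if environ.get("PATH_INFO") == "/status":
        body = b"open" if is_open() else b"closed"
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# Exercise the app directly, without opening a network socket:
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/status"
status_holder = {}
def start_response(status, headers):
    status_holder["status"] = status
result = b"".join(app(environ, start_response))
print(status_holder["status"], result)  # -> 200 OK b'open'
```

The iOS app then does a plain GET against that path and receives a four- or six-byte body, nothing more.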
If you are only planning on GET-ing some content from the web, you can use the NSURL class: NSURL *url = [NSURL URLWithString:@"http://mywebserver.com"], and then create a new string object from the URL with [NSString stringWithContentsOfURL:url], or any data you would like. If you are trying to hold a session or want more control over the request, I would suggest looking into the NSURLSession class.

Simple way to modify XML in iOS

I get an XML file with a huge amount of data and would like to add some entries to it. I have looked at GDataXML and a few others, but I can't seem to find something that will let me add entries without fully mapping the XML and then reconstructing it. What is a simple way to get this done?
As you have looked at GDataXML and don't want to go with it, TouchXML could be a nice alternative. There is a good tutorial on how to use TouchXML here.
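Whatever library you settle on, "add an entry" usually reduces to: parse, append a node, serialize. The shape of that operation, sketched with Python's ElementTree (the entries/entry element names are invented for illustration):

```python
import xml.etree.ElementTree as ET

# A small stand-in for the large file you receive.
doc = ET.fromstring("<entries><entry>first</entry></entries>")

# Append a new child without touching the existing ones.
new = ET.SubElement(doc, "entry")
new.text = "second"

out = ET.tostring(doc, encoding="unicode")
print(out)  # -> <entries><entry>first</entry><entry>second</entry></entries>
```

Note that DOM-style libraries (GDataXML, TouchXML, ElementTree) all load the whole tree into memory, so for a genuinely huge file you may still want to weigh that cost.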

Parsing html data with nutch 1.0 and a custom plugin

I am currently trying to write a custom plugin for Nutch 1.0. This plugin is supposed to parse HTML data and filter out relevant information from documents. I have a basic plugin working; it extends the HtmlParserResult object and is executed each time I do a parse.
My problems are twofold at the moment:
I do not understand the workflow/pipeline of Nutch parsing well enough, and I cannot find information about it on the Nutch site.
I do not understand how the DOM parsing is done. I see that Nutch has a set of DOM objects and that the HtmlParser plugin does some DOM parsing, but I have not figured out how this is best done.
I remember writing a Nutch HTML parsing plugin for a past job. I no longer have access to exactly how I did it, but here are the basic points. We wanted to do the following:
parse an HTML page, but conditionally use an H1 tag, or a tag with a certain class, as the page title rather than the actual //html/head/title
pick up special pieces of data that were sometimes on the page (e.g. which tab was selected, which would tell us whether this was a retail customer, a bank customer, or a corporate customer)
etc.
What I did was find the html-parse plugin class (I'm having trouble recalling the actual class name) and extend it, then override the parsing function. The new function should call the super implementation and can then walk the DOM tree to find the special data you are looking for. In my case I'd look for a better title and then override the value the super function came up with.
For your second question, I'm not clear on what you are asking. I think you are asking what happens when the DOM isn't well formed? I would dig through the Nutch code (http://grepcode.com/snapshot/repo1.maven.org/maven2/org.apache.nutch/nutch/1.3/) and find out how the parsing is done (I'm sure they use a library for it). That should tell you whether the parsing is greedy, and so on.
Holler if you have questions.
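The title-override logic described above is small once you have a parsed document. Here it is sketched in Python with the standard library's HTMLParser, purely to show the decision rule; inside Nutch you would instead walk the DocumentFragment that the super call hands you:

```python
from html.parser import HTMLParser

class TitlePicker(HTMLParser):
    """Prefer the first <h1> text over <title>, mirroring the override above."""
    def __init__(self):
        super().__init__()
        self.title = None
        self.h1 = None
        self._stack = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        if self._stack[-1] == "title" and self.title is None:
            self.title = data.strip()
        elif self._stack[-1] == "h1" and self.h1 is None:
            self.h1 = data.strip()

    def best_title(self):
        # The decision rule: use the H1 if one exists, else fall back
        # to whatever //html/head/title produced.
        return self.h1 or self.title

p = TitlePicker()
p.feed("<html><head><title>boring</title></head>"
       "<body><h1>Better Title</h1></body></html>")
print(p.best_title())  # -> Better Title
```

The "tag with a certain class" variant is the same walk, just checking the attrs of each start tag instead of the tag name.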
