I receive an XML file with a huge amount of data, and I would like to add some entries to it. I have looked at GDataXML and a few others, but I can't seem to find something that will let me add entries without having to fully map the XML and then reconstruct it. What is a simple way to get this done?
Since you have looked at GDataXML and don't want to go with it, TouchXML could be a nice alternative. A good tutorial on how to use TouchXML is available here.
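To illustrate the general idea behind "add entries without mapping the whole file", here is a minimal sketch. It is written in Python with the standard library rather than Objective-C, and the sample document, the element names, and the `add_entry` helper are all assumptions made for illustration. It is not GDataXML or TouchXML code. The tree is parsed once, one node is appended, and the result is serialized again, with no hand-built model of the existing entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = '<catalog><entry id="1"><name>first</name></entry></catalog>'


def add_entry(xml_text, entry_id, name):
    """Parse the XML, append one <entry> under the root, return the new text."""
    root = ET.fromstring(xml_text)
    # Existing children are left untouched; only the new node is built by hand.
    entry = ET.SubElement(root, "entry", {"id": str(entry_id)})
    ET.SubElement(entry, "name").text = name
    return ET.tostring(root, encoding="unicode")


updated = add_entry(SAMPLE, 2, "second")
print(updated)
```

Note that this approach still loads the whole document into memory, which may matter for a very large file.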
How can I access a website and turn parts of the page into strings, for example taking information from Facebook posts? I have done a little searching but can't find any good tutorials or anything useful.
Try looking at this tutorial. It should get you more familiar with the subject and start you off on the right track.
As it states at the beginning of the tutorial...
How to Parse HTML on iOS
Let’s say you want to find some information inside a web page and
display it in a custom way in your app. This technique is called
“scraping.” Let’s also assume you’ve thought through alternatives to
scraping web pages from inside your app, and are pretty sure that’s
what you want to do. Well then you get to the question – how can you
programmatically dig through the HTML and find the part you’re looking
for, in the most robust way possible? Believe it or not, regular
expressions won’t cut it! Well, in this tutorial you’ll find out how!
You’ll get hands-on experience with parsing HTML into an Objective-C
data model that your apps can use.
http://www.raywenderlich.com/14172/how-to-parse-html-on-ios
I am creating an app with a page that lets users create a small project: painting with a brush, adding labels and text fields, and adding UIImageViews and placing images in them using the iOS library. For now a screenshot is taken and uploaded to Dropbox, and from a table view the users can see all the uploaded documents. The problem is that it is only a screenshot. I want to upload things in a way that the text fields can still be scrolled and, once I add the video feature, the video can be played and comments can be added.
My idea is to upload all the images, photos, and text views separately, and save the positions of all the pieces in an XML file. When a row is selected in the table view, the app opens the XML and, based on it, assembles all the pieces like a puzzle. I decided to use the Google data library, but I can't find where to download the sample project with the library, so I put it aside. I then tried NSXMLParser, but I only see tutorials that read an existing XML file, not ones that create one for each project.
How can I proceed? Any suggestions or tutorials? A link to the data project would also help, but please not the Google developer page or trunk, because it is a mess.
Thanks in advance for the help.
There is a very good tutorial on Ray Wenderlich's web site (a great iOS development and tutorial resource) on how to create XML with GDataXML here:
http://www.raywenderlich.com/725/how-to-read-and-write-xml-documents-with-gdataxml
Here is the GitHub repo: https://github.com/neonichu/GDataXML
Or you could use the standard, very easy to use NSPropertyListSerialization method dataWithPropertyList:format:options:error:. This is how you would do it:
Build a root property list object out of NSArrays and NSDictionaries (this is the part that requires the most code, but it isn't difficult). The objects in your dictionary are the children and sub-children, which may themselves be NSArrays of NSDictionaries. Then use the following to write it out:
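NSError *error = nil;
// resultLists is the root NSArray/NSDictionary built in the previous step.
NSData *xmlData = [NSPropertyListSerialization dataWithPropertyList:resultLists
                                                             format:NSPropertyListXMLFormat_v1_0
                                                            options:0
                                                              error:&error];
// Minimal error check: xmlData is nil if serialization failed.
if (xmlData == nil) {
    NSLog(@"Could not serialize property list: %@", error);
}
// arrayFileName is the destination path; result is NO if the write failed.
BOOL result = [xmlData writeToFile:arrayFileName atomically:YES];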
Boom. An XML file.
I'm backing up a Core Data file this way, and it is relatively trivial to do.
I normally like Ray Wenderlich's web site, but in this case he makes something that is very easy to do very complex. I've done it that way and honestly, the Apple way is much, much easier.
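For readers who want to see what such a property list looks like without running Xcode, here is a small sketch using Python's standard `plistlib` module, which reads and writes the property list files used by Apple. The project layout, keys, and values below are assumptions invented for illustration. They are not taken from the question.

```python
import plistlib

# Hypothetical project layout; every key and value here is an assumption.
project = {
    "title": "Demo project",
    "pieces": [
        {"type": "label", "text": "Hello", "x": 10, "y": 20},
        {"type": "image", "file": "photo.png", "x": 0, "y": 100},
    ],
}

# Serialize the nested dict/list structure as an XML property list.
xml_bytes = plistlib.dumps(project, fmt=plistlib.FMT_XML)

# Reading it back yields the same nested structure.
restored = plistlib.loads(xml_bytes)

print(xml_bytes.decode("utf-8"))
```

The nested dictionaries and arrays play the same role as the NSDictionary and NSArray objects described above: the structure itself is the document, so there is no separate XML-building step.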
I want to parse this RSS (from a bit.ly user stream) to grab all bit.ly links and related data, and store all new items in a database. What are some easy ways to parse RSS (e.g. simple helpers with a small footprint)? I'm a CodeIgniter rookie, so please be gentle :)
I ended up simply using json_decode(file_get_contents('http://bitly.com/u/joaoramos.json')); and then dug into the object to get what I needed.
Check out SimplePie. Integrating it into CI shouldn't be hard.
Or, if you want one with a smaller footprint, check this thread on the CI forums. It is very light and does the job.
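As a language-neutral illustration of the "parse the feed, keep only new items" step, here is a minimal sketch in Python rather than PHP. The sample feed, the use of `guid` as the uniqueness key, and the `new_items` helper are assumptions. The real bit.ly feed may be shaped differently, and the database lookup is stood in for by a plain set.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample; the real feed may be shaped differently.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>demo</title>
  <item><title>One</title><link>http://bit.ly/a</link><guid>a</guid></item>
  <item><title>Two</title><link>http://bit.ly/b</link><guid>b</guid></item>
</channel></rss>"""


def new_items(rss_text, seen_guids):
    """Return the feed items whose guid is not already in seen_guids."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid in seen_guids:
            continue  # already stored, skip it
        items.append({
            "guid": guid,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        })
    return items


# seen_guids stands in for a lookup against the database.
fresh = new_items(SAMPLE_RSS, seen_guids={"a"})
print(fresh)
```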
I have the following problem: I have a lot of papers in PDF format, and I have to extract information from the first page of each one and then save it into a database.
I just need to extract the title, the abstract, the keywords, the list of authors, the list of universities, and the emails. I want to write a script that gets a string for each of those fields, for each paper.
How can I do that? Has anyone already done this? What languages and tools do you recommend?
Also, is there a paper repository that already feeds a database like this?
The PDFs may use different encodings, so I have to deal with that problem too. Any help with this would be great.
An example of a paper is here.
Greetings!
http://pdfbox.apache.org/
You have to check the security settings of the PDF, and that it really contains text and not an image. Check whether the PDFBox command-line application can extract the text; if it can, you can use the jar together with http://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/util/ExtractTextByArea.html
Hope it helps.
By the way, it's Java.
Edit:
I have not used http://www.qoppa.com/pdftext/ as a jar library, but I tried the example application and it works. In the end I decided to go with PDFBox.
You need an API to read your PDF.
Seems fine (I have never tried it, though).
You can probably find others with this link :-)
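Once a tool such as PDFBox has produced the plain text of the first page, the remaining work is splitting that text into fields. The sketch below shows one way this might look. It is in Python, and it rests on assumptions that will not hold for every paper: the title is the first non-empty line, and the page contains literal "Abstract" and "Keywords" markers. The sample text and the `extract_fields` helper are hypothetical, and author and university lists are left out because they need layout-specific rules.

```python
import re

# Simple email pattern; good enough for a sketch, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_fields(page_text):
    """Split already-extracted first-page text into a few named fields."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    # Assumption: the title is the first non-empty line.
    title = lines[0] if lines else ""
    emails = EMAIL_RE.findall(page_text)
    # Assumption: the abstract runs from "Abstract" up to "Keywords" (or the end).
    abstract_match = re.search(
        r"Abstract[:.\s]*(.+?)(?=Keywords|$)", page_text, re.S | re.I
    )
    abstract = " ".join(abstract_match.group(1).split()) if abstract_match else ""
    # Assumption: keywords sit on one comma-separated line.
    keywords_match = re.search(r"Keywords[:.\s]*(.+)", page_text, re.I)
    keywords = (
        [k.strip() for k in keywords_match.group(1).split(",")]
        if keywords_match
        else []
    )
    return {"title": title, "emails": emails,
            "abstract": abstract, "keywords": keywords}


# Hypothetical first-page text, standing in for the extractor's output.
SAMPLE_PAGE = """A Study of Things
Jane Doe, John Roe
Example University
jane@example.edu, john@example.edu
Abstract: We study things
in some detail.
Keywords: things, study, detail
"""

fields = extract_fields(SAMPLE_PAGE)
print(fields)
```

Real papers vary a lot in layout, so rules like these usually need tuning per publisher or template.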
I am currently trying to write a custom plugin for Nutch 1.0. The plugin is supposed to parse HTML data and filter out relevant information from documents. I have a basic plugin working; it extends the HtmlParserResult object and is executed each time I do a parse.
My problems are twofold at the moment:
I do not understand the workflow/pipeline of Nutch parsing well enough, and I cannot find information about it on the Nutch site.
I do not understand how the DOM parsing is done. I see that Nutch has a set of DOM objects and that the HtmlParser plugin does some DOM parsing, but I have not figured out how this is best done.
I remember making a Nutch HTML parsing plugin at a previous job. I no longer have access to exactly how I did it, but here are the basic points. We wanted to do the following:
Parse an HTML page, but conditionally use an H1 tag or a tag with a certain class as the page title rather than the actual //html/head/title.
Pick up some special pieces of data that were sometimes on the page (e.g. which tab was selected, which would tell us whether this was a retail customer, a bank customer, or a corporate customer).
etc.
What I did was find the html-parse plugin class (I'm having trouble finding the actual class name) and extend it, then override the parsing function. The new function should call the super function and can then walk the DOM tree to find the special data you are looking for. In my case I'd look for a better title and then override the value that the super function came up with.
For your second question, I'm not clear what you are asking. I think you are asking what happens when the DOM isn't well formed? I would just dig through the Nutch code (http://grepcode.com/snapshot/repo1.maven.org/maven2/org.apache.nutch/nutch/1.3/) and find out how the parsing is done (I'm sure they use a library for it). That should tell you more about whether things are greedy, or what.
Holler if you have questions.
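The title-override logic described above can be sketched independently of Nutch. The following is not Nutch plugin code and uses no Nutch API. It is a small Python illustration, using the standard `html.parser` module, of "prefer the first H1, otherwise fall back to the head title". The `TitleFinder` class and `choose_title` function are hypothetical names.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first non-empty <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.found = {"title": "", "h1": ""}

    def handle_starttag(self, tag, attrs):
        # Start capturing only for a tag we track and have not filled yet.
        if tag in self.found and not self.found[tag]:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still appended.
        if self._current:
            self.found[self._current] += data


def choose_title(html_text):
    """Return the first H1 text if present, else the <title> text."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found["h1"].strip() or finder.found["title"].strip()


page = ("<html><head><title>Site</title></head>"
        "<body><h1>Better <b>Title</b></h1></body></html>")
print(choose_title(page))
```

In a real plugin the same decision would be made after the parent parser has run, replacing the title it produced.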