Reading ePub format - ios

I am trying to develop an iPhone application to read ePub files. Is there any framework available for this? I have no idea how to read this file format. I tried to parse a sample file with the .epub extension using NSXMLParser, but that fails.

The EPUB format brings together a bunch of different specifications / formats:
one to say what the content of the book should look like (a subset of XHTML 1.1 + CSS)
one to define a "manifest" that lists all of the files that make up that content (OPF, which is an XML file)
one to define how everything is packaged up (OCF, the OEBPS Container Format: a zip file of everything in the manifest plus a few extra files)
The specs look a bit daunting but actually once you've got the basics (unzipping, parsing XML) down it's not particularly difficult or complex.
You'll need to work out how to download the EPUB, to unzip it somewhere, to parse the manifest and then to display the relevant content.
Some pointers if you're just starting out:
parse xml
unzip
To display content just use a UIWebView for now.
Here's a high level step by step for your code:
1) create a view with a UIWebView
2) download the EPUB file
3) unzip it to a subdirectory in your app's documents folder using the zip library, linked above
4) parse the XML file at META-INF/container.xml (if this file doesn't exist the EPUB is invalid) using TBXML, linked above
5) In this XML, find the first "rootfile" with media-type application/oebps-package+xml. This is the OPF file for the book.
6) parse the OPF file (also XML)
7) now you need to know what the first chapter of the book is.
a) each <item> in the <manifest> element has an id and an href. Store these in an NSDictionary where the key is the id and the object is the href.
b) Look at the first <itemref> in the <spine>. It has an idref attribute which corresponds to one of the ids in (a). Look up that id in the NSDictionary and you'll get an href.
c) this is the file of the first chapter to show the user. Work out what the full path is (hint: it's wherever you unzipped the zip file to in (3) plus the base directory of the OPF file in (6))
8) create an NSURL using fileURLWithPath:, where the path is the full path from (7c). Wrap it in an NSURLRequest and load that request using the UIWebView you created in (1).
You'll need to implement forward / backward buttons or swipes or something so that users can move from one chapter to another. Use the <spine> to work out which file to show next: the <itemref> elements in the XML are in the order they should appear to the reader.
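Steps 4 to 7 above can be sketched in a few lines. This is a language-neutral illustration in Python rather than the Objective-C the answer targets, and it reads the files straight from the zip instead of unzipping to disk first. The function name `spine_paths` is made up for this example; the two namespace URIs are the ones used by the EPUB container and OPF formats.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def spine_paths(epub):
    """Return the archive paths of the spine documents in reading order.

    `epub` is a path or file-like object accepted by zipfile.ZipFile.
    (Hypothetical helper name, for illustration only.)
    """
    with zipfile.ZipFile(epub) as zf:
        # Steps 4-5: container.xml names the OPF file.
        # zf.read raises KeyError if the file is missing (invalid EPUB).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = None
        for rootfile in container.iterfind(".//c:rootfile", CONTAINER_NS):
            if rootfile.get("media-type") == OPF_MEDIA_TYPE:
                opf_path = rootfile.get("full-path")
                break
        if opf_path is None:
            raise ValueError("no OPF rootfile listed in container.xml")

        # Step 6: the OPF file is XML as well.
        package = ET.fromstring(zf.read(opf_path))

    # Step 7a: map each manifest item's id to its href.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.iterfind("opf:manifest/opf:item", OPF_NS)
    }

    # Steps 7b-7c: resolve each spine itemref against the OPF's directory.
    base_dir = posixpath.dirname(opf_path)
    return [
        posixpath.join(base_dir, manifest[ref.get("idref")])
        for ref in package.iterfind("opf:spine/opf:itemref", OPF_NS)
    ]
```

The first element of the returned list is the first chapter, and moving forward or backward is an index change in that list. Hrefs in real books may be percent-encoded, so a reader would probably want to decode them before building a file path.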

Apparently EPUB is "just" an XML format, so if you have an XML parser and the spec, you should be okay.
Plus a little tutorial? Have fun!
EDIT: you could also read some code here. It is for generating EPUB files, not reading them, but the code may be useful.
EDIT again: Also see the links to related questions in the right sidebar; some of the answers there link to free ebook readers that support ePub.
EDIT 3: You should add a comment when you edit your question so that the people who answered can continue the discussion (if you don't comment, we're not notified of your edit).
So, the parsing fails because you didn't read the spec or the related questions on Stack Overflow... An *.epub file is a zipped folder containing XML file(s), not plain XML.
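A quick way to confirm that point is to check whether the bytes are a ZIP archive before handing anything to an XML parser. The sketch below is in Python for illustration only, and the helper name `epub_mimetype` is made up; it relies on the `mimetype` entry that a valid EPUB container carries.

```python
import io
import zipfile


def epub_mimetype(data):
    """Return the declared mimetype if `data` is a ZIP archive that
    contains a `mimetype` entry, otherwise None.

    (Hypothetical helper name, for illustration only.)
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return None  # plain XML, or anything else that is not a ZIP container
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "mimetype" not in zf.namelist():
            return None
        return zf.read("mimetype").decode("ascii", errors="replace").strip()
```

For a well-formed book this should return `application/epub+zip`; feeding the same bytes to an XML parser fails because the archive itself is binary.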

I read through this tutorial once (free registration required, sorry) and it gave me a great introduction to ePub: developerWorks tutorial here
I highly suggest you look at some of the XML processing libraries. If you just want to get specific information out of the XML file, then you can pick the right parsing strategy.

There is an open source project, FBReader, which also supports the iPhone:
http://www.fbreader.org/about.php

I'm playing around with creating an EPUB framework for iPhone apps.
At the moment (I really just started) I can generate a title page with links to the chapters.
My approach is:
Use the QuickConnect iPhone framework as a layer (maybe I'll change to PhoneGap), which basically allows JavaScript apps to run as iPhone apps
Add the unzipped EPUB as a resource to the project
Parse the whole thing with a customized version of epub.js (somewhere on Google Code)
Right now I'm looking into page flipping, some kind of GUI, and minor usability issues (such as saving the page currently being viewed).
I hope that gives you an idea of how to start.

Jonathan Wight (schwa) has developed an Objective-C solution for parsing and displaying ePub documents on the iPhone. It's part of his TouchCode open source repository.

Related

How to generate .mobi file for Kindle that supports Kindle Reading Speed feature

I am generating a multi-chapter eBook for Kindle Fire by first generating a well-formed xhtml-based EPUB 3.0 format file and then converting the .epub file to .mobi w/ Kindle Previewer and/or kindlegen. The generated .mobi file transfers properly to the Kindle and looks entirely correct. The problem is that my generated file never produces the "Learning Reading Speed" status at the bottom or the actual estimate of reading time. The reading speed feature never seems to get activated for any .mobi file generated with kindlegen. I'm aware that status area cycles through various features/statuses by pressing the status area on the reader screen and am certain that the feature is never activated.
I have generated an alternate version of the .mobi file using Calibre and the reading speed feature is enabled, however the format of the output file is heavily altered and is not consistent with the kindlegen format.
What is the key to generating a Kindle .mobi file with kindlegen that supports the reading speed feature?

I finally discovered the answer, which is that the generated .mobi file needs two tags manually added, 113 ASIN and 501 CDEContentType = EBOK, in the correct primary header of the .mobi file.
The tag information is published elsewhere, but often overlooks that a kindlegen generated .mobi file can have two versions of the same book embedded within the .mobi file, each with a primary header. If the tags are added to the first primary header (typically a v6 header) but not the second primary header (typically a v8 header), the Kindle device will not recognize the tags.
In my case, the tags needed to be added to the second primary header which allowed the Kindle device to treat the file as a book rather than a document. Most .mobi tag editors reference the first primary header only, which can cause confusion. Alternately, the .mobi could be split into two files in which case the tags could be manually added to the primary header of the relevant post-split file.
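For readers who want to see what such a tag looks like on disk, here is a minimal sketch of encoding the two records. It assumes the EXTH record layout described in community documentation of the MOBI format (a 32-bit big-endian type, a 32-bit big-endian length that includes the 8 header bytes, then the payload); this is not an official Amazon specification. The ASIN value is a placeholder, and the helper name `exth_record` is made up. The sketch only builds the record bytes; inserting them into the correct primary header also means updating that header's record count, lengths and any offsets that follow, which is not shown.

```python
import struct


def exth_record(record_type, data):
    """Encode one EXTH record: type, total length, then the payload.

    Assumed layout (community-documented): both header fields are 32-bit
    big-endian, and the length counts the 8 header bytes as well.
    """
    return struct.pack(">II", record_type, 8 + len(data)) + data


# 113 = ASIN (placeholder value), 501 = CDEContentType
asin_record = exth_record(113, b"B000000000")
cde_type_record = exth_record(501, b"EBOK")
```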

Custom file types with iOS Document Interaction Programming

I understand the basics of Document Interaction Programming and UIDocumentInteractionController and I've got it working in my app. However I'm having trouble with specific details of using custom file types. I can't find this addressed in the Apple docs anywhere.
My app uses its own file types with unique extensions. The files themselves are just plists (XML), but I want the device to treat the files as only openable in my app. Originally I implemented the Document Interaction stuff to treat them as XML while I got it working, but now I want it to treat them as binary files that it needs to hand off to my app.
At the moment, if you have one of my files in an email attachment, iOS first shows the QuickLook (which just spews all the text content of the xml out) before you can choose to Open In. Similarly if one of my files is opened with Safari, Safari just shows the XML and doesn't give you the option to show it in my app at all.
So how do I get iOS to not treat my files as XML? I've changed the "Conforms to UTI" value and "public.mime-type" value in the info.plist, but it seems to have no effect.
Any tips greatly appreciated.

As far as I understand Apple's UTI concept, you cannot change a file's UTI just by changing its extension. If the file contains XML data, other apps as well as built-in apps might recognize the content and display it as XML.
Try storing your plists with NSPropertyListSerialization using NSPropertyListBinaryFormat_v1_0 (then they are no longer readable XML).
If that doesn't work, why not try this:
use zlib to compress the XML plists into a compressed file.
give it a "unique" file extension (<file>.myappname)
This should "hide" the files from other apps and from Quick Look.
Tell me if one of these approaches works for you.
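Both suggestions can be tried out quickly off-device. The sketch below is a Python analogue, not iOS code: `plistlib.FMT_BINARY` plays the role of NSPropertyListBinaryFormat_v1_0, and `zlib.compress` stands in for the compression step. The function names are made up for this example, and whether either form actually stops iOS from previewing the file as XML is something to verify on a device.

```python
import plistlib
import zlib


def to_binary_plist(payload):
    """Serialize `payload` as a binary property list (starts with b'bplist00')."""
    return plistlib.dumps(payload, fmt=plistlib.FMT_BINARY)


def to_compressed_xml_plist(payload):
    """Serialize `payload` as an XML plist, then compress it with zlib."""
    return zlib.compress(plistlib.dumps(payload, fmt=plistlib.FMT_XML))


def from_compressed_xml_plist(blob):
    """Reverse to_compressed_xml_plist."""
    return plistlib.loads(zlib.decompress(blob))
```

In both cases the file no longer begins with an XML declaration, so a tool that sniffs the first bytes has nothing to recognize as XML.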

How to create a reflowable content from the PDF?

I am going to develop an EPUB application. I have PDF files, and I need to turn them into reflowable content (EPUB) so that they are readable on mobile phones, tablets, and so on. Please suggest ways to create reflowable content from a PDF.

If you don't mind using open source software, go with Sigil.
If you want to learn the innards of creating one by hand, or want to write a tool of your own, follow this. (It is a one-month course, so you will not get all the content in one day.)
1) Create the folder structure. In a folder of your choice, create the following: META-INF (folder), OEBPS (folder), and mimetype (a file with exactly that name).
2) Put application/epub+zip in the mimetype file, with no spaces and no line breaks.
3) Convert your PDF to text format. In Adobe Acrobat, this is under File > Export.
4) Read the content from the PDF and decide how to split it into chapters or sub-topics. Split it according to your understanding of the book, creating one text file per part.
5) Make the sub-folder structure: inside the OEBPS folder, create the folders Images, Text, and Styles and the files content.opf and toc.ncx.
6) Put all your split files in the Text folder created in step 5.
7) Put all images extracted from the PDF in the Images folder.
8) Put any styles (not described here) in the Styles folder.
9) In the META-INF folder created in step 1, create a file called container.xml and fill it with the following: <?xml version="1.0"?><container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container"> <rootfiles> <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/> </rootfiles></container>.
If you manage to do all of this, ping again and I will try to tell you what you should put in the content.opf and toc.ncx files created in step 5.
As an example, you can use some samples from my site. Download from here and use them with caution. Do not distribute.
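The folder layout from the steps above eventually has to be packaged as a zip to become an .epub. The answer does not cover that step, so the following Python sketch is an assumption based on the EPUB container format: the mimetype entry goes first and is stored uncompressed. The function name `write_epub_skeleton` is made up, and the contents of content.opf, toc.ncx and the chapter files are left to the caller because the answer does not specify them.

```python
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, files):
    """Write an EPUB container to `target` (a path or writable file object).

    `files` maps archive paths such as "OEBPS/Text/ch1.xhtml" to their
    contents (str or bytes). Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry is written first and stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A call might look like `write_epub_skeleton("book.epub", {"OEBPS/content.opf": opf_text, "OEBPS/Text/ch1.xhtml": chapter_text})`, where `opf_text` and `chapter_text` are whatever you produced in the later steps.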

We're opening up a beta for our web based pdf reflow viewer at the beginning of 2015. Feel free to sign up to be part of our beta test. More info here:
http://flexpaper.devaldi.com/reflow-pdf-documents.jsp

To retrieve the contents of .doc files?

I work as a software developer for mobile applications. I am developing an application in which I want to retrieve the contents of .doc files that arrive on a BlackBerry device as email attachments. When I retrieve the contents of .txt files, the code returns the contents accurately, but for .doc files it displays a lot of junk before and after the actual contents.
So my problem is: how can I get rid of this additional junk, since I want to retrieve only the actual contents of the .doc files? Please reply.
Thanks

You can get the specifications of the doc-Format from Microsoft. Though, I don't know if they're complete or even useful. Another guess would be to have a look at Projects which have implemented it, like OpenOffice.org.
Bobby

Creating Microsoft Word (.docx) documents in Ruby

Is there an easy way to create Word documents (.docx) in a Ruby application? Actually, in my case it's a Rails application served from a Linux server.
A gem similar to Prawn but for DOCX instead of PDF would be great!

As has been noted, there don't appear to be any libraries to manipulate Open XML documents in Ruby, but OpenXML Developer has complete documentation on the format of Open XML documents.
If what you want is to send a copy of a standard document (like a form letter) customized for each user, it should be fairly simple given that a DOCX is a ZIP file that contains various parts in a directory hierarchy. Have a DOCX "template" that contains all the parts and tree structure that you want to send to all users (with no real content), then simply create new (or modify existing) pieces that contain the user-specific content you want and inject it into the ZIP (DOCX file) before sending it to the user.
For example: You could have document-template.xml that contains Dear [USER-PLACEHOLDER]:. When a user requests the document, you replace [USER-PLACEHOLDER] with the user's name, then add the resulting document.xml to the your-template.docx ZIP file (which would contain all the images and other parts you want in the Word document) and send that resulting document to the user.
Note that if you rename a .docx file to .zip it is trivial to explore the structure and format of the parts inside. You can remove or replace images or other parts very easily with any ZIP manipulation tools or programmatically with code.
Generating a brand new Word document with completely custom content from raw XML would be very difficult without access to an API to make the job easier. If you really need to do that, you might consider installing Mono, then use VB.NET, C# or IronRuby to create your Open XML documents using the Open XML Format SDK 1.0. Since you would just be using the Microsoft.Office.DocumentFormat.OpenXml.Packaging Namespace to manipulate Open XML documents, it should work okay in Mono, which seems to support everything the SDK requires.
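The template approach described above comes down to copying the ZIP and rewriting one part on the way. Here is a minimal sketch of that idea in Python rather than Ruby, using only the standard library. The function name `fill_docx_template` is made up, and `word/document.xml` is assumed to be the main document part, which is the usual location in a .docx.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx_template(template, output, replacements, part="word/document.xml"):
    """Copy the DOCX at `template` to `output`, replacing placeholders in one part.

    `template` and `output` are paths or file objects; `replacements` maps
    placeholder text to the value to insert. Hypothetical helper name.
    """
    with zipfile.ZipFile(template) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part:
                text = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    # Escape &, < and > so the part stays well-formed XML.
                    text = text.replace(placeholder, escape(value))
                data = text.encode("utf-8")
            # Every other part (images, styles, relationships) is copied as-is.
            dst.writestr(info.filename, data)
```

For example, `fill_docx_template("your-template.docx", "letter.docx", {"[USER-PLACEHOLDER]": "Ada"})`. One caveat: Word sometimes splits text that looks contiguous across several runs in the XML, so a placeholder typed in Word may not appear as one unbroken string; it is worth checking the template's document.xml by hand.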

Maybe this gem is interesting for you:
https://github.com/trade-informatics/caracal/
It's like Prawn, but for docx.

You can use Apache POI. It is written in Java, but integrates with Ruby as an extension

This is an old question but there's a new answer. If you'd like to turn an HTML doc into a Word (docx) doc, just use the 'htmltoword' gem:
https://github.com/karnov/htmltoword
I'm not sure why there was answer creep and everyone started posting templating solutions, but this answers the OP's question. Just like Prawn, except Word instead of PDF.
UPDATE:
There's also pandoc and an API wrapper for pandoc called docverter. Both have slightly complicated installs since pandoc is a Haskell library.

I know if you serve a HTML document as a word document with the .doc extension, it will open in Word just fine. Just don't do anything fancy.
Edit: Here is an example using classic ASP. http://www.aspdev.org/asp/asp-export-word/

Using a technique very similar to that suggested by Grant Wagner, I have created a Ruby HTML-to-Word gem that should allow you to easily output Word docx files from your Ruby app. You can check it out at http://github.com/nickfrandsen/htmltoword - simply pass it an HTML string and it will create a corresponding Word docx file.
    def show
      respond_to do |format|
        format.docx do
          file = Htmltoword::Document.create params[:docx_html_source], "file_name.docx"
          send_file file.path, :disposition => "attachment"
        end
      end
    end
Hope you find it useful. If you have any problems with it feel free to open a github issue.

Disclosure: I'm the leader of the docxtemplater project.
I know you're looking for a ruby solution, but because all other solutions only tell you how to do it globally, without giving you a library that does exactly what you want, here's a solution based on JS or NodeJS (works in both)
DocxTemplater Library
Demo of the library
You can also use it in the commandline:
npm install docxtemplater -g
docxtemplater <configFile>
----config.docxFile: The input file in docx format
----config.outputFile: The outputfile of the document

Another way: Doccy (doccyapp.com) has an API that does just that. It supports docx, odt and Pages, and can convert to PDF as well if you like.

Further to Grant's answer, you can also send Word a "Flat OPC" file, which is essentially the docx unzipped and concatenated to create a single xml file. This way, you can replace [USER-PLACEHOLDER] in one file and be done with it (ie no zipping or unzipping).

If anyone is still looking at this, this post explains how to use an XML data source. This works nicely for me.
http://seroter.wordpress.com/2009/12/23/populating-word-2007-templates-through-open-xml/

Check out this github repo: https://github.com/jawspeak/ruby-docx-templater
It allows you to create a document from a word template.

If you're running on Windows, of course, it's a matter of WIN32OLE and some pain with the Word COM objects.
Chances are that you're serving from a *nix environment, though. Word 2007 uses the "Office Open XML" format (*.docx), which older versions of Word can open using the appropriate compatibility pack from Microsoft.
Some of the more recent Office apps (2002/XP and 2003 at least) had their own XML formats which may also be useable.
I'm not aware of any Ruby tools to make the process easier, sadly.
If it can be made acceptable, I think I'd be inclined to go down the renamed-html file route. I just saved a document as HTML from WordXP, renamed it to a .doc and opened it without problem.

I encountered the same problem. Unfortunately I could not manipulate the XML, because my clients need to fill in the templates themselves, and that is not always possible (for example, Office for Mac does not allow it).
As a solution to this problem, I made a simple gem that lets you use an RTF document as a template with embedded Ruby: https://github.com/eicca/rtf-templater
I tested it and it works OK for filling in reports and documents. However, formatting displays badly for complex loops and conditions.
