Does ePub restrict HTML to only some subset?


I was thinking about creating an ePub reader. All the ePub files I have seen so far seemed very simple: just text paragraphs with some big font for the title, and some rectangular illustration images. So, I thought ePub provides only simple ways to describe the text content.
But it seems that an ePub file contains lots of HTML and CSS. I opened a sample ePub, and it contained text in <p> elements with class attributes. Does this mean that it can basically be like a website archive, where the author can use any advanced formatting/layout feature available when creating an HTML website? If so, I would have to implement a whole web browser to create an ePub reader.
Or is the HTML allowed in ePub somehow restricted to only certain tags and attributes, like the HTML that is allowed when writing on an online forum?
PS: I did some research on my own after posting this, and my conclusion is that it is the former. I have tried some well-known ePub apps on the Android market, and their GUIs all look a bit odd (meaning, probably non-native). There does not seem to be a definitive way to tell whether an app is native or a web app. One trick is to enable "Show layout bounds" in the developer options: those apps show no boundaries inside the ePub view itself, which suggests it is probably a web view.
I searched GitHub for ePub viewers, and they all seem to use JavaScript or a web view, including this Android ePub viewer.
So those ePub apps are probably just parsing the metadata files in the ePub format and delegating the rendering of the book itself to the web view, using some sort of JavaScript framework to add a UI on top of it.
If someone knows better, please correct me.
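The "parse the metadata, hand the content to a web view" split described above can be sketched with Python's standard library alone. This is a minimal sketch that assumes a well-formed ePub whose META-INF/container.xml names a single package (.opf) file, and whose manifest hrefs are plain relative paths (no URL-encoding or fragments). The function name spine_documents is hypothetical. It returns the XHTML files in reading order, which a reader app would then load into its web view one by one.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OCF container file and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub_file):
    """Return archive paths of the content documents, in reading order.

    Hypothetical helper: assumes one rootfile and simple relative hrefs.
    """
    with zipfile.ZipFile(epub_file) as z:
        # container.xml points at the package (.opf) file.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))

    # Manifest hrefs are relative to the directory that holds the .opf file.
    base = posixpath.dirname(opf_path)
    manifest = {
        item.get("id"): posixpath.join(base, item.get("href"))
        for item in package.find("opf:manifest", NS)
    }
    # The spine lists manifest ids (idref) in reading order.
    return [manifest[ref.get("idref")] for ref in package.find("opf:spine", NS)]
```

For example, `spine_documents("book.epub")` (a hypothetical file name) might return something like `["OEBPS/text/ch1.xhtml", "OEBPS/text/ch2.xhtml"]`. The heavy lifting of CSS and layout is then left entirely to the web view.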

My understanding of previous ePub specs is that it is a web archive of sorts: a compressed archive consisting of metadata, fonts, images, and content.
This content used to be only in a specially flavored XHTML format, but it looks like they've also added SVG content documents. I've admittedly lost track of the ePub spec changes (I didn't realize they had merged efforts with the W3C), but hopefully the spec links above can give an idea of what's different between a standard HTML5 web page and what ePub expects.
EDIT: I should also mention that a lot of the readers I worked with back in the day had the bad habit of stripping out formatting and just presenting text (not even text with embedded fonts, which is a big no-no for non-English texts). I'm not sure whether this was the reader software being "robust" against ePub formatting that would break the app, or something else.
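One practical consequence of the content being XHTML rather than browser-style HTML is that each content document has to be well-formed XML in the XHTML namespace. Tag soup that a web browser would happily accept does not qualify. The sketch below checks only that necessary condition, not the full ePub content rules, and the function name looks_like_epub_xhtml is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def looks_like_epub_xhtml(markup):
    """True if markup parses as XML and its root is an XHTML <html> element.

    A necessary (not sufficient) check for an ePub XHTML content document.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Unclosed tags such as a bare <br> make the XML parser fail.
        return False
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == f"{{{XHTML_NS}}}html"
```

For instance, `<p>Hi<br/></p>` inside a namespaced `<html>` passes, while the same page written with a bare `<br>` does not, even though any browser would render both.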

Related

iOS create PDF invoice

I want to create a PDF invoice inside my iOS App (either in Objective-C or Swift).
My main problem is that the invoice might have several pages, which is very difficult to realize with the existing APIs from Apple (CoreGraphics, Quartz 2D, etc).
So far, I have a barely working solution:
I created an HTML template that is the basic structure for the invoice
The template is filled with data using GRMustache
I load the generated HTML file into a UIWebView and save it as PDF (I used NDHTMLtoPDF to do this)
So far, so good.
The problem with this solution is that page breaks don't work properly.
There are some tables and images, and the page break often cuts off tables or images.
I have tried to use the page-break-inside: avoid; CSS property for the images and the tables, but UIWebView seems to ignore it completely...
My question is:
Do you know how to fix the page break problem?
Can you recommend another solution to create PDFs on iOS?
Should I design the invoice in Storyboard and generate a PDF from the UIView? What about the page breaks here?
I would prefer to have a template (e.g. HTML), fill it with data and save it as PDF, rather than doing everything in code.

I actually wrote a PDF reporting component myself with presumably similar requirements:
templating is done via HTML / CSS
static header/footer on each page
You can achieve this with the UIPrintPageRenderer and UIViewPrintFormatter APIs, with the downside of possibly getting your app rejected in App Store review: the common recipes for setting the page size (e.g. writing the renderer's read-only paperRect and printableRect via key-value coding) go beyond the documented API. So this approach might only be viable in an academic setting or if you're developing in-house apps.
There are some tutorials on how to do this which I'm not going to repeat, but for me these resources were quite useful:
http://www.fromdev.com/2014/06/how-to-create-a4-size-pdf-file-in-ios.html
http://www.labs.saachitech.com/2012/10/23/pdf-generation-using-uiprintpagerenderer/

How to download linked PDF files from a website?

I want to download hundreds of PDF documents from a site. I have tried tools such as SiteSucker and similar ones, but they don't work, because there appears to be some "separation" between the files and the page that links to them. I don't know how to describe this better, since I don't know much about website programming or scraping. Any advice on what this could be and how to get around it?
More specifically, I am trying to download PDFs of UN resolutions, stored on pages like this one: http://www.un.org/depts/dhl/resguide/r53_en.shtml
It appears there is a built-in "search function" on the UN site, which keeps simple scrapers like SiteSucker from working as intended.
Are there other tools that I can use?

Clicking a link on the page you mentioned redirects to a page composed of two HTML frames. The first one is the "header", and the second one loads a page that generates the PDF file and embeds it. The URL of the PDF file is hard to guess. I don't know of a free tool that could scrape this type of page.
Here is an example of the URL in the second frame that leads to the PDF file:
http://daccess-dds-ny.un.org/doc/UNDOC/GEN/N99/774/43/PDF/N9977443.pdf?OpenElement
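If you want to script this yourself, the frame structure described above can be inspected with Python's standard html.parser. This is a sketch only. The class and function names are hypothetical, it assumes the pages use ordinary `<frame>`/`<iframe>` src attributes and `<a href>` links, and, as noted above, the final PDF URL may be produced dynamically, in which case static parsing won't find it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects frame/iframe sources and any href/src that points at a PDF."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []
        self.pdfs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(urljoin(self.base_url, attrs["src"]))
        for key in ("href", "src"):
            value = attrs.get(key)
            # Compare only the path, so "?OpenElement"-style query strings still match.
            if value and urlparse(value).path.lower().endswith(".pdf"):
                self.pdfs.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return (frame URLs, PDF URLs) found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.frames, parser.pdfs
```

To follow the frames, you could fetch each returned frame URL (for example with urllib.request.urlopen) and feed its HTML back into collect_links until PDF URLs like the one above turn up.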

How to implement an interactive PDF in iOS

Here I'm hitting my head against the wall.
My client provided a PDF with buttons (when the user taps a button, it loads the next or previous page, etc.).
These buttons only work when we open the PDF in Adobe Reader.
I tried QLPreviewController (Quick Look), but it is not working; all I can do is load the PDF in a web view.
Can anyone please help me with how to load an interactive PDF in iOS?
Thanks in advance.

Have a look at PSPDFKit, it is the most advanced framework I've found for PDFs in iOS. They have an impressive list of customers as well.
It is a bit pricey, though, but you have the option to get the source code too if you need to modify anything. It could be worth it if your client needs that kind of performance and other features as well.
(I am not in any way affiliated with PSPDFKit)

The limitations are due to the capabilities (or non-capabilities) of the PDF viewer used.
Currently the leading PDF viewer on iDevices is PDF Expert by Readdle. Adobe Reader for iDevices is weaker, but can deal with form elements to some extent.
For page navigation etc. you might use links instead of button fields (as long as you can live with the capabilities of links and do not use JavaScript). Links are said to be handled properly by many PDF viewers.
You may have to require certain PDF viewers at the instructional level, because you don't have control over the viewer the actual user picks. And, as you noticed, many PDF viewers are simply too dumb to deal with active elements.
Another approach would be looking at PDF-to-HTML5 converters, and serve HTML5 from a server.

Parse epub information, like title, author

I'd like to make an eBook app for iOS, something like iBooks, but with some special features. My question is about the epub files. I can unzip them with ZipArchive and store each epub in its own folder.
So far so good. The epubs are imported into my app by email attachment, Safari, or iTunes Sync. This is also working well.
When I get a new epub, I want to extract some information from it, like title, author, publisher... and store this information in Core Data.
Is this possible and if yes, does anyone have a solution for it?

Please have a look at the two discussions linked above.
You might also want to check out Readium SDK (if its licensing terms fit your project): https://github.com/readium/readium-sdk
There are also "premade EPUB-support SDKs" which you can buy online, but in my experience they are quite poor.
For our own hybrid app Menestrello ( https://readbeyond.it/menestrello/ ), the native iOS part does smart unzipping (via the Objective-Zip library) + basic metadata parsing (title, author, etc.) just to populate the library view. The "real" parsing required when the EPUB is opened in reading view is done in JS. I wrote both parsers (native iOS and JS) from scratch.
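The "basic metadata parsing" mentioned above comes down to two XML lookups. First, META-INF/container.xml points to the package (.opf) file. Second, the package's `<metadata>` element holds Dublin Core elements such as dc:title, dc:creator and dc:publisher. Here is a minimal Python sketch of that logic. It is not the Menestrello code, read_metadata is a hypothetical name, and it assumes a well-formed ePub with a single rootfile. On iOS the same steps could be done with Foundation's XMLParser before saving the values to Core Data.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_metadata(epub_file):
    """Return title, authors and publisher from an ePub's package document."""
    with zipfile.ZipFile(epub_file) as z:
        # container.xml names the package (.opf) file inside the archive.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))

    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        return {"title": None, "authors": [], "publisher": None}

    def texts(tag):
        # Dublin Core elements may repeat (e.g. several dc:creator entries).
        return [(el.text or "").strip() for el in metadata.findall(f"dc:{tag}", NS)]

    titles = texts("title")
    publishers = texts("publisher")
    return {
        "title": titles[0] if titles else None,
        "authors": texts("creator"),
        "publisher": publishers[0] if publishers else None,
    }
```

This ignores ePub 3 refinements (such as marking which creator is the author), which a fuller parser would need to handle.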

What's a good way to "standardize" the format of web content for importing into an app?

I'm working on a project where one of the requirements is to import articles from a website. The problem is that different websites have different formats. I could theoretically write some code that strips out extra content from known websites, but this is hardly maintainable. It sounds like there should be something already out there that can "RSS-ify" pages for me into some format, so that I can easily import content.
Is there such a service that I can just "plug" any website into? Also, what are my options here in regards to importing web content into an app, aside from this? How would you handle it?
Edit: Two things that come to mind are FeedBurner and Apple's Safari browser, which adds an "RSS" button. (Safari Mobile actually does one better and renders a neat read view in iOS 5.) Are they relevant to this question in any way?

It seems that there is none. The best I can do is work with content providers to offer RSS feeds and such.
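If content providers do offer RSS feeds, turning them into a single import format for the app is straightforward. Here is a minimal sketch, assuming plain RSS 2.0 (channel/item with title, link and description) rather than Atom or RDF feeds; the Article type and parse_rss function are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    link: str
    summary: str


def parse_rss(xml_text):
    """Map the <item> elements of an RSS 2.0 feed to uniform Article records."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return []
    return [
        Article(
            # findtext returns the default when a child element is missing.
            title=item.findtext("title", default="").strip(),
            link=item.findtext("link", default="").strip(),
            summary=item.findtext("description", default="").strip(),
        )
        for item in channel.findall("item")
    ]
```

Because every source ends up as the same Article record, the rest of the app never has to care which site an article came from.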
