How to view EPUB documents as pages matching their printed counterpart? - epub

In popular desktop ebook viewers like calibre, FBreader or Cool Reader, I'm missing a feature to show ebooks in the same pagination as their printed counterparts. Some people (here, too) claim that epub does not have a page concept (e.g. at how to implement 'page break' in epub reader).
But this is not true. From http://www.idpf.org/accessibility/guidelines/content/xhtml/pagenum.php:
"If an ebook is produced from the same workflow as a print document, print pagination markers should be retained in the document. These markers benefit readers in mixed print/digital environments, such as a classroom, as the page numbers allow a common point of reference between the two editions." and from its FAQ-section: "Do page numbers really matter anymore? - Yes. Despite the assertions of the futurists and technophiles, print still reigns supreme. As a result, anyone in a mixed print/digital environment — using an assistive technology or not — needs a way synchronize electronic and print content."
I tested several ebook viewers with two different documents that contain page break tags, but they did not break the text up into pages (or I'm missing a preference option). Any help or information is highly appreciated.

You can force a page break with the CSS property page-break-after: always, although this is not the best page layout for an ebook. For example, add a class with that style to your epub:type="pagebreak" element:
<span epub:type="pagebreak" id="page23" class="pagenumber">23</span>
.pagenumber { /* other styles */ page-break-after: always !important; }
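A reader or conversion script first has to locate the print page markers before it can break or label pages at them. The following is a minimal sketch, assuming the content documents are well-formed XHTML and use markers such as the one above; the function name find_page_markers and the SAMPLE document are made up for illustration and do not come from any ebook viewer.

```python
import xml.etree.ElementTree as ET

# Namespace used by the epub:type attribute in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical sample chapter, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <p>End of page 22.<span epub:type="pagebreak" id="page23" class="pagenumber">23</span>
    Start of page 23.</p>
    <p>More text.<span epub:type="pagebreak" id="page24" class="pagenumber">24</span></p>
  </body>
</html>"""


def find_page_markers(xhtml):
    """Return (id, label) for every element whose epub:type includes "pagebreak"."""
    root = ET.fromstring(xhtml)
    markers = []
    for el in root.iter():
        # epub:type may hold several space-separated values.
        types = el.get("{%s}type" % EPUB_NS, "").split()
        if "pagebreak" in types:
            # The label is the element text, or the title attribute if the element is empty.
            label = (el.text or el.get("title") or "").strip()
            markers.append((el.get("id"), label))
    return markers


if __name__ == "__main__":
    print(find_page_markers(SAMPLE))
```

With the marker ids in hand, a viewer could show the print page number in its status bar or jump to a given print page, even if it does not force a visual break there.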

How to make web site iPad ready? [duplicate]

How does the Reader function of Mobile Safari in iOS 5 work? How do I enable it on my site? How do I tell it what content on my page is an article to trigger this function?
A lot of the answers posted here contain false information. Here are some corrections/clarifications:
The <article> element works fine as a wrapper; Safari Reader recognizes it. My site is an example. It doesn’t matter which wrapper element you choose, as long as there is one, other than <body> or <p>. You can use <article>, <div>, <section>; or elements that are semantically incorrect for this purpose, like <nav>, <aside>, <footer>, <header>; or even inline elements like <span> (!).
No headings are required for Reader to work. Here’s an example of a document without any <h*> elements on which Reader works fine: http://mathiasbynens.be/demo/safari-reader-test-3
I posted some more details regarding my findings here: http://mathiasbynens.be/notes/safari-reader
I've tested 100 or so variations of this on my iPhone in order to figure out what triggers this elusive Reader state. My conclusions are as follows:
Here is what I found had an impact:
Having around 200 or more words (or 1000 characters including whitespace) in the article you want to trigger the "Reader" seems necessary
The reader was NEVER triggered when I had fewer than 170 words, although it was sometimes triggered when I had 180 or 190 words.
Text inside certain elements such as <ol> or <ul> (that are not typically used to contain a story) will not count towards the 200 words (they will however be displayed in the reader if the reader is triggered for other reasons)
Wrapping the 200 words in a block element such as a <div> or <article> seems necessary (that said, I'd be surprised if there were any websites where that was not already the case)
For full disclosure, here is what I found did NOT have an impact:
Whether using a header or not
Whether wrapping the text in a <p> or letting it flow freely
Punctuation (i.e. removing all periods, commas, etc. did not have an impact)
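The observations above can be turned into a rough check. This is only a sketch of the reported behaviour, not Apple's algorithm: the 200-word threshold and the exclusion of list content come from the tests described above, the names WordCounter and might_trigger_reader are made up, and the input is assumed to be already wrapped in a block element.

```python
from html.parser import HTMLParser

WORD_THRESHOLD = 200     # approximate figure reported above, not a documented value
IGNORED = {"ul", "ol"}   # text inside lists reportedly does not count


class WordCounter(HTMLParser):
    """Counts words that appear outside <ul>/<ol> elements."""

    def __init__(self):
        super().__init__()
        self.ignored_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED:
            self.ignored_depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED and self.ignored_depth > 0:
            self.ignored_depth -= 1

    def handle_data(self, data):
        # Only count text that is not nested inside an ignored list.
        if self.ignored_depth == 0:
            self.words += len(data.split())


def might_trigger_reader(html):
    """Guess whether a wrapped article has enough countable words for Reader."""
    counter = WordCounter()
    counter.feed(html)
    counter.close()
    return counter.words >= WORD_THRESHOLD
```

For example, might_trigger_reader("<div>" + "word " * 250 + "</div>") is true, while the same 250 words placed inside a <ul> are not counted at all.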
The algorithm it is based on seems to look for <p> tags and to count delimiters like "." in the innerText. The section (div) with the most points gets the focus.
see:
http://lab.arc90.com/experiments/readability/
Seems to be the base for the Reader-mode, at least Safari attributes it in the Acknowledgements, see:
file:///C:/Program%20Files/Safari/Safari.resources/Help/Acknowledgments.html
Arc90 ( Readability )
Copyright © Arc90 Inc.
Readability is licensed under the Apache License, Version 2.0.
This question (How to disable Safari Reader in a web page) has more details. Copied here:
I'm curious to know more about what triggers the Reader option in Safari and what does not. I wouldn't plan to implement anything that would disable it, but curious as a technical exercise.
Here is what I've learned so far with some basic playing around:
You need at least one H tag
It does not go by character count alone but by the number of P tags and length
Probably looks for sentence breaks '.' and other criteria
Safari will provide the 'Reader' option when the page has an H tag and any of the following:
1 P tag, 2417 chars
4 P tags, 1527 chars
5 P tags, 1150 chars
6 P tags, 862 chars
If you subtract 1 character from any of the above, the 'Reader' option is not available.
I should note that the character count of the H tag plays a part but sadly did not realize this when I determined the results above. Assume 20+ characters for H tag and fixed throughout the results above.
Some other interesting things:
Setting for P tags removes them from the count
Setting display to none, and then showing them 230ms later with Javascript avoided the Reader option too
I'd be interested if anyone can determine this in full.
Both Firefox and Chrome have a similar plugin named iReader. Here is its project with source code.
http://code.google.com/p/ireader-extension/
Read the code to get more.
I was struggling with this. I finally took out the <ul> markup in my story, and voilà! It started working.
I didn't put any wrapper around the body, but may have done it by accident.
HTML5 article tag doesn't trigger it on my tests. It also doesn't seem to work on offline content (i.e. pages saved on your local machine).
What does seem to trigger it is a div block with a lot of p's with a lot of text.
The p tag theory sounds good. I think it also detects other elements as well. One of our pages with 6 paragraphs didn't trigger the Reader, but one with 4 paragraphs and an img tag did.
It's also smart enough to detect multi-page articles. Try it out on a multi-page article on nytimes.com or nymag.com. Would be interested to know how it detects that as well.
Surprising though it may be, it indeed does not pay any attention to the HTML5 article tag, particularly disappointing given that Safari 5 has complete support for article, section, nav, etc in CSS--they can be styled just like a div now, and behave the same as any block level element.
I had specifically set up a site with an article tag and several inner section tags, in prep for semantic HTML5 labeling for exactly such a purpose, so I was really hoping that Safari 5 would use that for Reader. No such luck--probably should file a bug on this, as it would make a great deal of sense. It in fact completely ignores most of the h2 level subheads on the page, each marked as a section, only displaying the single div that adheres to the criteria mentioned previously.
Ironically, the old version of the same site, which has neither article, section, nor separating div tags, recognizes the whole body for display in Reader.
See Article Publishing Guidelines.
Here are APIs for reading and parsing: Readability Developer APIs. There's already a project you can refer to: ruby-readability.
A brief history:
The Safari Reader feature, available since Apple's Safari 5 browser, embeds a codebase named Readability. Readability started off as a simple, JavaScript-based reading tool that turned any web page into a customizable reading view. It was released in early 2009 by Arc90, a New York City-based design and technology shop, as an Arc90 Lab experiment. It is also embedded in the Amazon Kindle and in popular iPad applications like Flipboard and Reeder.
I am working on algorithms for cleaning web sites of information "waste", similar to the Safari Reader feature. It's not as good as Readability but has some cool stuff.
You can learn more at smartbrowser.codeplex.com project page.

Add alternative text to a phrase in a document file

I use LibreOffice Writer and I want to add alternative text to a specific phrase in the document. How can I do it?
For example, if we have an image in the document, we can double-click it and add the alternative text like this:
Is it possible to do the same if we select a whole phrase of text? If yes, how? And if not, is there any other option?
The alternative text in 'word'/odt documents is actually intended as the 'alt' attribute in HTML (web) pages:
The alt attribute provides alternative information for an image if a
user for some reason cannot view it (because of slow connection, an
error in the src attribute, or if the user uses a screen reader).
(http://www.w3schools.com/tags/att_img_alt.asp)
Its only purpose is thus to provide the user with information in case he/she cannot view the image. Since having alternative text in case some text cannot be displayed is, well, silly, this 'alt' attribute is not defined for pieces of text. Alternatively, you could use a hyperlink pointing to nothing ("#"), which does provide a tooltip attribute.
What is it that you're intending to achieve anyway? It's not going to show up on any prints, which is the intended purpose of Writer... Footnotes (for prints) or Comments (for communication with co-editors) might suit you better.

Ruby on rails, markup interpreter with custom tags provided on runtime? For forms, not views

The site is for writers and readers; both groups will be non-technical users (writers will already be familiar with BBCode, but I can choose other markup). Writers will write guides using markup tags to embed info. Readers will be presented with the parsed text, in which the tags are expanded to that info.
The number of tags needed, as well as the info tied to a particular tag, will change, so they cannot be hard-coded.
I'm looking for any interpreter that can use tags provided at run time, for my next Ruby on Rails app. Does anyone know of one?
Edit: Yeah. I'm not looking for views markup, but for forms textarea markup to be used by website users (to format their guides, but I do need ONE markup for formatting, and embedding info).
Based on my current understanding of your needs, I recommend mustache. This is described as a "logic-less" template processor. It doesn't have programming logic, simply run-time replacements.
Here's one way to use it (from the github readme)
Given this template (winner.mustache):
Hello {{name}}
You have just won {{value}} bucks!
We can fill in the values at will:
view = Winner.new
view[:name] = 'George'
view[:value] = 100
view.render
Which returns:
Hello George
You have just won 100 bucks!
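The underlying idea, tags whose replacements are only known at run time, is language-independent. The sketch below illustrates it in Python rather than Ruby and does not use the mustache library; the expand function and the tag dictionary are hypothetical, and in a real app the dictionary would be loaded from wherever the tag definitions are stored.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand(text, tags):
    """Replace each {{tag}} with its run-time value; unknown tags are left untouched."""

    def replace(match):
        name = match.group(1)
        return str(tags.get(name, match.group(0)))

    return TAG.sub(replace, text)


if __name__ == "__main__":
    template = "Hello {{name}}\nYou have just won {{value}} bucks!"
    print(expand(template, {"name": "George", "value": 100}))
```

Unlike Mustache proper, this sketch performs no HTML escaping, so values coming from users would need to be escaped before being shown in a page.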

Processing HTML with UIMA

I am trying to get my head around the UIMA architecture.
I would like to create a pipeline that starts with HTML markup. I need to strip this to plain text, so it can be processed by different annotators, like POS, chunking, entity detection, etc. However I would also like to keep track of which regions correspond to the original html tags, like links, paragraphs, em, etc. Basically I would like a final annotator that takes advantage of structural annotations (from html) and semantic annotations (from the other components), all at once.
So, I can imagine starting off with a component that strips the html markup and adds annotations to keep track of the tags I am interested in. Does such a component exist already? It seems like something a lot of people would want.
If I do have to create it from scratch, what kind of component is it? It's not just a straight annotator, because it needs to change the SOFA: it needs to replace the markup with plain text.
Or should I have it create a new view of the document, so we maintain a markup view and a plain text view of the document? This seems weird, considering I will never care about the markup view again. Also, how would I make sure the other annotators (which I won't be coding myself) operate on the plain text view of the document rather than the markup view?
Depending on the complexity of the markup, some people use Apache Tika, and some people use Boilerpipe.
Here is a blog post from someone who wanted to use Boilerpipe in UIMA but ran into a snag because he wanted to preserve offsets back to the HTML.
Here is the UIMA annotator that calls tika.
UIMA Ruta provides some analysis engines for this task. The HtmlAnnotator creates annotations in the HTML text for the different tags. The HtmlConverter is able to create a new view that contains only the text of the HTML, but with the corresponding annotations for the tags. There are some configuration parameters for handling line breaks and so on. For further processing without sofa mappings in a pipeline, there is the ViewWriter, which is able to copy the new plain text view to the _InitialView of a new file.
DISCLAIMER: I am a developer of UIMA Ruta
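The core of the task, stripping the markup while keeping tag annotations whose offsets refer to the plain text, can be sketched independently of UIMA. The following is a minimal illustration using Python's standard html.parser, not UIMA or Ruta code; the names StripWithOffsets, strip_html and TRACKED are made up, and only the three tags mentioned in the question are tracked.

```python
from html.parser import HTMLParser

TRACKED = {"a", "p", "em"}  # tags to keep as annotations (assumed for this sketch)


class StripWithOffsets(HTMLParser):
    """Builds the plain text and records (tag, begin, end) offsets into it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # pieces of plain text, in document order
        self.length = 0        # current length of the plain text
        self.open_tags = []    # stack of (tag, begin offset)
        self.annotations = []  # finished (tag, begin, end) triples

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with the same name, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def strip_html(html):
    """Return (plain_text, annotations), outer annotations first."""
    parser = StripWithOffsets()
    parser.feed(html)
    parser.close()
    annotations = sorted(parser.annotations, key=lambda a: (a[1], -a[2]))
    return "".join(parser.chunks), annotations
```

Because every offset points into the stripped text, a downstream component can combine these structural spans with POS or entity spans computed on the same text, which is the role the plain text view plays in a UIMA pipeline.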

Why aren't CSS3 @page rules working?

I am trying to make a print newspaper that is easily generated from my WordPress site. I am trying to design the print template for it, but in order to control specific pages, I need to use the @page rule.
I read this tutorial, which suggests methods like:
@page :left {
  @top-left {
    content: "Cascading Style Sheets";
  }
}
But when I try this in Firefox 5 (and Chrome 14) print-preview or print, it does not print anything at the top-left. In fact, I can't seem to do anything within the @page rule.
Am I doing this wrong or is this a CSS3 feature that just hasn't been implemented yet?
Just after the preface, the tutorial states:
Web browsers are good at dealing with pixels on a screen, but not very good at printing. To print a full book we turned to Prince, a dedicated batch processor which converts XML to PDF by way of CSS. Prince supports the print-specific features of CSS2, as well as functionality proposed for CSS3.
So, the tutorial code isn't meant for use with a browser's print function in the first place.
In any case, as I said in my comment, I wouldn't count on any browser implementing this yet. Everyone's crazy about the stuff happening on screen like animations right now.
