ePub specifications clarification needed - epub

just wanted some help on ePub specifications.. Is it mandatory for the toc.ncx to have the src(ie. the xhtml) .I have observed the same content src is also available in the .opf file.

Yes, that is mandatory and that's a design problem :
Overlap between NCX and OPF metadata
Because the NCX is borrowed from another standard, there is some
overlap between the information encoded in the NCX and that in the
OPF. This is rarely a problem when you generate EPUBs
programmatically, where the same code can output to two different
files. Take care to put the same information in both places, as
different EPUB readers might use the values from one or the other.
Source
That's not the case anymore in the future version of the norm (ePub 3), but please note that no device support it currently.

Related

How can I parse a progressive JPG

I am writing a 'clean-room' program that requires parsing/unparsing of jpegs. I have found all the information I need to parse/unparse baseline jpegs, but I cannot find the information that I need to parse/unparse progressive jpegs.
I need to be able to convert the compressed data to macroblocks and back, so most available frameworks are too high level. I also want to understand what is going on, hence the 'clean room' approach.
Can anybody help me please? A specification of the SOF1 header would be useful, as would be the layout of the compressed data in the scan segment.
Thanks in advance.
If you want to figure this out, I'd get this book:
https://www.amazon.com/Compressed-Image-File-Formats-JPEG/dp/0201604434/ref=sr_1_5?ie=UTF8&qid=1486949641&sr=8-5&keywords=jpeg
It explains it all in easy-to-understand terms. The author has source code at
http://colosseumbuilders.com/sourcecode/imagelib403.zip
that is designed to be easy to understand.
The SOF1 header is the same as all other SOF headers. You need to have a copy of the JPEG standard (as obtuse as it is). The other sources above will help you get through it.

Print contents of rpg file in human-readable format

Context
A friend of mine is having trouble printing source code to a human readable format.
The compiled (I assume) programs of their welding robot have the .rpg extension. They want to collect print-outs in human-readable format, possibly for backup or future reference.
Their supplier can provide the software that accomplishes this, be it at a considerable cost (and possibly: an annual license). Because of this, my friend decided to ask me if a easier/cheaper solution exists.
Examples & Pictures
The files can be read on the console of the robot, an example:
I've done some minor research and I'm fairly sure this is the Report Program Generator (RPG) language developed by IBM. The Assembly-like syntax seems to match; it might be one of the later versions of the language.
My friend has send me an example .rpg file, the contents seem binary with some string literals scattered throughout. Screenshot of the contents of an example file in hexadecimal:
The Question
There is not much, if any, clear information to be found online so I suppose I have multiple questions (for anyone that might know more about this):
Is this (first image) Report Program Generator (RPG) code?
Does the .rpg file contain compiled or processed code? Maybe an intermediate format?
Is it possible to convert files as shown in the example, back to source-code or human-readable format, kind of 'disassemble' it?
If anyone knows more, don't hesitate to give me any information or ask more details if necessary. Thanks in advance!
And maybe not an important question but still something that bugs me (and might indicate I'm on the wrong track):
If this is indeed an RPG program, why would the compiled/processed binary have the .rpg extension, shouldn't the source-file have that? This leads me to believe I'm either (a) assuming the wrong things (the language, etc...) or (b) this is an intermediate format, easier for machines to read, that has to be interpreted by some kind of runtime system.
I don't think that's any version of IBM's RPG language. RPG does have a MOVEL opcode, but it doesn't have any of the others.
Also, all the versions of the IBM language have been intended for business programming. I doubt that it would have been used for robotics.
My guess is that's a proprietary language of the company that makes the robot.
There are some similarities but it does not look like IBM RPG language.
RPG sources are in fact source physical file members. They are not stored in the "traditional" file system but in OS/400 libraries. Therefore RPG sources have no extension. They can be converted to Integrated File System stream file though.
I can't answer this question I'm afraid as it's unknown language to me.
I expect possibly that the OP misidentifies the file type/extension; that the extension is actually .prg, and the files serve as instructions for a Panasonic Industrial Welding Robot. The following forum [drilled down to Panasonic Robots] bills itself as the biggest Industrial Robots Supportforum worldwide!; perhaps a good place to ask about those images provided in the OP, and the inquiry about getting source from what appears to be a binary instruction stream.
FWiW, the first image seems to show that the Ezed utility [on the console] gives that human-readable format, so then the question might be how to get that saved and then how to transfer that elsewhere; e.g. what type of comm ports and file transfer utilities are available from whatever platform/OS.

Extracting ePub Excerpt

I've read about the ePub format, standard, structure, readers, tools and available developer techniques to manipulate/convert/create ePubs but there is no such thing as a magical function (so far) to extract a particular length of characters to create an excerpt of the book. And that's precisely what I'm looking for: A way to extract the first X words of an ePub.
The first approach I'm considering (not my favorite btw) is creating a parser to read all the ePub metadata and start parsing the xml files in the right order until I have enough words to create the excerpt of a determined ePub (I will appreciate some feedback in this direction)
The second way (which I can't find so far) is an existent tool/function or parser (in any language) which returns (hopefully) the plain text of the ePub so I can collect the first X words in order to create my excerpt.
Do you know about any tool which can help me achieve the second option?
You should have a look at Apache Tika: http://tika.apache.org/
You can use it from command line, or as a java library or even in server mode to extract text from ePub.
Hope this will help,
F.
Jose,
I'm not aware of any tool to do what you want. Let me comment on your first approach, though. If you do find a tool I hope these comments allow you to evaluate it.
I think your approach is fine and, if you want to do a good job of creating an extract, you may want to own this step anyway. I would suggest you,
grab the OPF file and look for a GUIDE section. If a GUIDE section exists, check the types that are given. Some are probably not relevant for an excerpt (cover,title-page,copyright-page). Many books will not have the types explicitly stated but this should help where they do.
now go through the files in sequence in the SPINE section, excluding anything that is irrelevant, and read through enough XHTML files to get your excerpt.
while in the OPF file grab a bunch of metadata if this is relevant for the excerpt (title, creator, date are mandatory, I think, and some authors will also put in a whole bunch of other metadata such as keywords).
If you are creating a mini-EPUB with this excerpt you will need to pick up any CSS, Audio, Video, Image and Custom Font files that get referenced in the XHTML files used to make your excerpt. You may even choose to use the original cover file for the cover file of your excerpt epub.
If you working with fixed layout books with fun stuff like Read Aloud AND you want to create a mini-EPUB as an excerpt, you may be better off going with a page count rather than a word count. Don't forget to include any SMIL files into your excerpt and to make it look nice: (i) don't split a two page spread and (ii) make sure that the first page is an odd numbered page if odd in the original or even if even numbered in the original - to do this you may need to add a blank filler page (get the odd/even wrong and subsequent two page spreads won't be facing each other)
I hope that helps.

Binary Serialized File - Delphi

I am trying to deserialize an old file format that was serialized in Delphi, it uses binary seralization. I know nothing about the structure of the file except some very high level records that are in it.
What steps would you take to solve this problem? Any tools etc?
A good hexeditor, and use the gray matter to identify structures.
If you get a hint what kind of file it is, you can search for more specialized tools.
Running the unix/Linux "file" command can be good too (*) See Barry's comment below for how it works. It can be a quick check for common filetypes like DBF,ZIP etc hidden by using a different extension.
(*) there are 3rd party builds for windows, but they might lag in versions. If you can do it on a recent *nix distro, it is advised to do so.
The serialization process simply loops over all published properties and streams their value to a text file. If you do not know the exact classes that were streamed to the file you will have a very hard time deserializing the file. (if not impossible)
A good hex editor is first. If the file is read without buffering (eg read directly from a TFileStream) you could gain some information when using ProcMon from SysInternals; You can see exactly what data is read in what chunks and thus determine more quickly where the boundaries are between the structures you already identified.

Setting up help for a Delphi app

What's the best way to set up help (specifically HTML Help) for a Delphi application? I can see several options, all of which has disadvantages. Specifically:
I could set HelpContext in the forms designer wherever appropriate, but then I'm stuck having to track numbers instead of symbolic constants.
I could set HelpContext programmatically. Then I can use symbolic constants, but I'd have more code to keep up with, and I couldn't easily check the text DFMs to see which forms still need help.
I could set HelpKeyword, but since that does a keyword lookup (like Application.HelpKeyword) rather than a topic jump (like Application.HelpJump), I'd have to make sure that each of my help pages has a unique, non-changing, top-level keyword; this seems like extra work. (And there are HelpKeyword-related VCL bugs like this and this.)
I could set HelpKeyword, set an Application.OnHelp handler to convert HelpKeyword requests to HelpJump requests so that I can assign help by topic ID instead of keyword lookup, and add code such as my own help viewer (based on HelpScribble's code) that fixes the VCL bugs and lets HelpJump work with anchors. By this point, though, I feel like I'm working against the VCL rather than with it.
Which approach did you choose for your app?
When I first started researching how to do this several years ago, I first got the "All About help files in Borland Delphi" tutorial from: http://www.ec-software.com/support_tutorials.html
In that document, the section "Preparing a help file for context sensitive help" (which in my version of the document starts on page 28). It describes a nice numbering scheme you can use to organize your numbers into sections, e.g. Starting with 100000 for your main form and continuing with 101000 or 110000 for each secondary form, etc.
But then I wanted to use descriptive string IDs instead of numbers for my Help topics. I started using THelpRouter, which is part of EC Software's free Help Suite at: http://www.ec-software.com/downloads_delphi.html
But then I settled on a Help tool that supported string ID's directly for topics (I use Dr. Explain: http://www.drexplain.com/) so now I simply use HelpJump, e.g.:
Application.HelpJump('UGQuickStart');
I hope that helps.
We use symbolic constants. Yes, it is a bit more work, but it pays off. Especially because some of our dialogs are dynamically built and sometimes require different help IDs.
I create the help file, which gets the help topic ID, and then go around the forms and set their HelpContext values to them. Since the level of maintenance needed is very low - the form is unlikely to change help file context unless something major happens - this works just fine.
We use Help&Manual - its a wonderful tool, outputting almost any format of stuff you could want, doc, rtf, html, pdf - all from the same source. It will even read in (or paste from rtf (eg MSWord). It uses topic ID's (strings) which I just keep a list of and I manually put each one into a form (or class) as it suits me. Sounds difficult but trust me you'll spend far longer hating the wrong authouring tool. I spent years finding it!
Brian

Resources