Extracting binary data from QR-code with zbar - ios

I'm trying to extract binary data from a QR code with zbar (the QR code was originally encoded using the iOS SDK, passing an NSData object). Unfortunately, the ZBarSymbol class only provides the content in an NSString member. Trying to extract an NSData from it using NSISOLatin1StringEncoding seems to work, but it still fails on some occasions.
I see in the zbar implementation that it is possible to access an object of type zbar_symbol_t that contains a pointer to char. Looking into it, it seems to contain the original content but with additional data of some kind. Here is an example:
Original data: 9e7328c16bca3aaff532440917e4df6e155b96bd
Data in zbar_symbol_t: c29e7328c3816bc38a3ac2afc3b532440917c3a4c39f6e155bc296c2bd
Does anyone know what exactly the data in zbar_symbol_t is, why it differs from the data I originally placed in the QR code, and how (if at all) I can extract my original data from it?

I am not sure what those bytes represent; probably zbar is trying to interpret the bytes as a UTF-8 string even though the QR code is in byte mode.
Switching to zxing fixed everything: there are no unexpected interleaved bytes, and the raw data contains the entire QR code, including the mode, terminator, padding, etc. It also never seems to fail, whereas zbar failed occasionally.
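The sample in the question supports that guess, with one refinement: every original byte above 0x7F appears to have been read as an ISO-8859-1 (Latin-1) character and re-emitted as two-byte UTF-8 (0x9E became C2 9E, 0xC1 became C3 81), while bytes below 0x80 passed through unchanged. Assuming zbar did exactly that for a given symbol, decoding the text as UTF-8 and re-encoding it as Latin-1 gives back the original bytes. The sketch below shows this in Python; `recover_qr_bytes` is an illustrative name, not part of zbar. Because it depends on zbar having chosen Latin-1, a plausible (unverified) explanation for the occasional failures is that zbar guesses the text encoding per symbol and sometimes picks a different one.

```python
def recover_qr_bytes(zbar_text_bytes: bytes) -> bytes:
    """Undo a Latin-1 -> UTF-8 transcoding of QR byte-mode data.

    Assumes the decoder read each payload byte as one ISO-8859-1 character
    and emitted the result as UTF-8. Raises UnicodeDecodeError or
    UnicodeEncodeError if that assumption does not hold for the input.
    """
    text = zbar_text_bytes.decode("utf-8")  # one code point per original byte
    return text.encode("latin-1")           # U+0000..U+00FF map 1:1 back to bytes


# Hex strings copied from the question.
zbar_data = bytes.fromhex("c29e7328c3816bc38a3ac2afc3b532440917c3a4c39f6e155bc296c2bd")
original = bytes.fromhex("9e7328c16bca3aaff532440917e4df6e155b96bd")
print(recover_qr_bytes(zbar_data) == original)  # True
```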

Related

Unicode characters in scanned Barcode Swift

I have created a barcode scanner application using the native AVFoundation framework. Some of our barcodes contain hidden Unicode characters, and we are unable to scan them. Here is an example of such a barcode:
]d201000000000010!0000-023
I am getting the above code as: \u{1D}01000000000010\u{1D}0000-023
In the above barcode, the ]d2 part varies, and I am unable to determine the type of the barcode. How can I parse that string containing Unicode characters into a normal string? Has anyone faced this type of issue or barcode? Thanks in advance.
\u{1D}01000000000010\u{1D}0000-023 looks to be a GS1-formatted barcode. The full spec is here. The values after the {1D} delimiter are called "application identifiers" and identify the type of data contained in that field. GS1 is really common in any industry where full supply-chain tracking is needed, such as the medical device industry. A concise list of application identifiers is here.
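The hidden character is U+001D, the ASCII group separator (GS, decimal 29), which is how scanners commonly transmit the FNC1 field separator of a GS1 symbol. The `]d2` prefix seen in the first rendering appears to be an AIM symbology identifier, which for Data Matrix indicates a symbol with FNC1 in the first position, i.e. GS1 DataMatrix. A minimal Python sketch for making the separator visible and splitting on it (both function names are illustrative):

```python
GS = "\x1d"  # U+001D GROUP SEPARATOR (ASCII 29), used as the GS1 field separator


def split_gs1_fields(scanned: str) -> list[str]:
    """Split a scanned GS1 string on GS and drop the empty pieces."""
    return [field for field in scanned.split(GS) if field]


def make_visible(scanned: str) -> str:
    """Replace the invisible separator with a printable marker for logging."""
    return scanned.replace(GS, "<GS>")


scanned = "\x1d01000000000010\x1d0000-023"  # the string from the question
print(split_gs1_fields(scanned))  # ['01000000000010', '0000-023']
print(make_visible(scanned))      # <GS>01000000000010<GS>0000-023
```

Splitting on GS only separates fields that end with a separator; fixed-length AIs such as 01 can be followed directly by the next AI, so a complete parser also needs the length of each AI.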

NSKeyedArchiver sometimes makes a broken file

My iOS app saves NSCoding objects in the Documents directory.
NSKeyedArchiver archives them. It usually works fine, but sometimes it produces broken files.
The broken files show one of the following two patterns.
Lack of data
I can convert them to ASCII strings (as in "How do I convert an NSData object with hex data to ASCII in Swift?") and recover meaningful content.
They have the bplist prefix, but they don't have the trailers.
Total loss
I cannot convert them to ASCII strings.
All the bytes look shifted.
This is the header of one of the broken files, compared with the correct header.
broken (the sequence of characters seems to be different in every file):
Nè\à¡<99>K<80>^_È<97>▸T§:Æñã9µú▸Ñ1^LË^VYGfM^A%KÍ<95
expected:
bplist00Ô^A^B^C^D^E^H01T$topX$objectsX$versionY$
Has anyone experienced the same problem?
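The two patterns can be told apart mechanically. An NSKeyedArchiver file is a binary property list, which starts with the 8-byte magic `bplist00` and ends with a fixed 32-byte trailer (6 unused bytes, offset-integer size, object-reference size, object count, top-object index and offset-table offset, big-endian). The Python sketch below checks both ends. The rule that the offset table ends exactly where the trailer begins holds for files written by Python's `plistlib` and, as far as I know, for typical Apple-written files, but treat it as a heuristic rather than a full validator. `classify_bplist` is a hypothetical helper, not an Apple or Python API.

```python
import plistlib
import struct

MAGIC = b"bplist00"
TRAILER_SIZE = 32


def classify_bplist(data: bytes) -> str:
    """Return 'ok', 'truncated' (header present, trailer bad) or 'no header'."""
    if not data.startswith(MAGIC):
        return "no header"  # the "total loss" pattern
    if len(data) < len(MAGIC) + TRAILER_SIZE:
        return "truncated"
    # Trailer layout: 6 unused bytes, offset int size, object ref size,
    # object count, top object index, offset table offset (all big-endian).
    offset_size, ref_size, num_objects, top_object, table_offset = struct.unpack(
        ">6xBBQQQ", data[-TRAILER_SIZE:]
    )
    if offset_size not in (1, 2, 4, 8) or ref_size not in (1, 2, 4, 8):
        return "truncated"
    if top_object >= num_objects:
        return "truncated"
    # Heuristic: the offset table should end exactly where the trailer starts.
    if table_offset + num_objects * offset_size != len(data) - TRAILER_SIZE:
        return "truncated"  # the "lack of data" pattern
    return "ok"


good = plistlib.dumps({"key": "value"}, fmt=plistlib.FMT_BINARY)
print(classify_bplist(good))               # ok
print(classify_bplist(good[:-10]))         # truncated
print(classify_bplist(b"N\xe8\\\xe0\xa1"))  # no header
```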

GS1 barcode parsing

We need to parse a GS1 DataMatrix barcode which will be provided by another party. We know they are going to use GTIN (01), lot number (10), expiration date (17) and serial number (21). The problem is that the barcode reader outputs a string in a format like this: 01076123456789001710050310AC3453G321455777. Since there is no separator, and both the serial number and the lot number are variable-length according to the GS1 standard, we have trouble identifying the segments. My understanding is that the best way to parse this is to embed the parser in the scanning device, not in the application, but we haven't planned any embedded software yet. How can I implement the parser? Any suggestions?
There should be an FNC1 character at the end of a variable-length field that is not filled to its maximum length, so an FNC1 will appear between the G3 and the 21.
FNC1 is invisible to humans but can be detected by scanners, and it will be reproduced in the string reported by the scanner (typically as the ASCII GS character, 0x1D). Simply send the string directly to a text file and examine it with a hex viewer; the FNC1 should be obvious.
If you can, it might be an idea to swap the order of the 21 and 10 fields, since you appear to be using a purely numeric value for 21. This would make the barcode a little shorter.
One way to deal with this is to program the scanner to replace FNC1 with space or another plain text character before sending it to your application. The scanner manufacturer usually provides a tool to produce programming bar codes that can do simple substitutions in the scanner. Then you can parse the data without having to handle special characters.
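Once the separator arrives in the string (as GS or as whatever substitute the scanner has been programmed to send), the parsing can be done in the application by walking the string with a table of AI lengths: fixed-length AIs are cut at their known length, and variable-length AIs run to the next separator or the end of the data. The Python sketch below only knows the four AIs from the question (01 and 17 are fixed at 14 and 6 digits; 10 and 21 are variable, up to 20 characters). A real parser would need the full GS1 AI table, including the 3- and 4-digit AIs, and should validate contents such as the GTIN check digit. `parse_gs1` and `AI_TABLE` are illustrative names.

```python
GS = "\x1d"  # ASCII group separator, the usual transmitted form of FNC1

# Illustrative subset of GS1 application identifiers:
# AI -> (fixed length, or None for variable length; maximum length)
AI_TABLE = {
    "01": (14, 14),    # GTIN, fixed 14 digits
    "17": (6, 6),      # expiration date YYMMDD, fixed 6 digits
    "10": (None, 20),  # batch/lot number, variable, up to 20 characters
    "21": (None, 20),  # serial number, variable, up to 20 characters
}


def parse_gs1(data: str, separator: str = GS) -> dict[str, str]:
    """Split a GS1 element string into {AI: value} using AI_TABLE."""
    fields: dict[str, str] = {}
    i = 0
    while i < len(data):
        if data[i] == separator:  # leading or redundant separator
            i += 1
            continue
        ai = data[i:i + 2]
        if ai not in AI_TABLE:
            raise ValueError(f"unknown AI {ai!r} at position {i}")
        fixed, max_len = AI_TABLE[ai]
        i += 2
        if fixed is not None:
            end = i + fixed
        else:
            end = data.find(separator, i)
            if end == -1:  # the last field needs no separator
                end = len(data)
        value = data[i:end]
        if (fixed is not None and len(value) != fixed) or len(value) > max_len or separator in value:
            raise ValueError(f"bad value {value!r} for AI {ai}")
        fields[ai] = value
        i = end
    return fields


scanned = "01076123456789001710050310AC3453G3" + GS + "21455777"
print(parse_gs1(scanned))
# {'01': '07612345678900', '17': '100503', '10': 'AC3453G3', '21': '455777'}
```

If the scanner is programmed to send a space instead of FNC1, the same function works with `parse_gs1(scanned, separator=" ")`.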

How to write and read float data fast, not using string?

I have a lot of float data generated from an image. I want to store it in a file, like XX.dat (as is common in C), and read it back later for further processing.
I have a method that represents each float as an NSString and writes it into a .txt file, but it is too slow. Is there a function equivalent to fwrite(*data, *pfile) and fread(*buf, *pfile) in C? Or some other idea?
Many thanks!
In iOS you can still use the standard low-level file (and socket, among other things) APIs, so you can use fopen(), fwrite(), fread(), etc. just as you would in any other C program.
This question has some examples of using the low-level file API on iOS: iPhone Unzip code
Another option to consider is writing your floats into something like an NSMutableData instance, and then writing that to a file. That will be faster than converting everything to strings (you'll get a binary file instead of a text one), though probably still not as fast as using the low-level APIs. And you'd probably have to use something like this to convert between floats and byte arrays.
If you are familiar with lower-level access, you could mmap your file and then access the data directly, just as you would any allocated memory.
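The binary-file idea can be illustrated in Python, where `array("f", ...)` packs values as raw C `float`s (4 bytes on common platforms) in native byte order, much like `fwrite` on a `float` buffer, and reading them back is a single bulk copy; appending raw floats to an `NSMutableData` before writing follows the same idea. This is a sketch only: the function names and the file name are illustrative, and the resulting file is portable only between machines with the same float format and byte order.

```python
import os
import tempfile
from array import array


def write_floats(path: str, values) -> None:
    """Write values as raw C floats in native byte order, like fwrite."""
    with open(path, "wb") as f:
        array("f", values).tofile(f)


def read_floats(path: str) -> list[float]:
    """Read a file written by write_floats back into a list, like fread."""
    data = array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())  # length must be a multiple of the item size
    return data.tolist()


path = os.path.join(tempfile.mkdtemp(), "pixels.dat")
write_floats(path, [1.5, -2.25, 0.0])
print(read_floats(path))  # [1.5, -2.25, 0.0]
```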
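A Python sketch of the mmap approach, assuming a file of raw native-order C floats like the one produced by `fwrite`: the mapping is viewed as a float array through `memoryview.cast("f")`, so elements can be indexed without reading the file into a separate buffer. The views must be released before the mapping closes, which the nested `with` statements take care of. For simplicity the example copies the values out with `tolist()` at the end; real code would work on `floats[i]` inside the `with` block. Mapping an empty file raises `ValueError`, and the file length must be a multiple of the float size. `read_floats_mmap` is an illustrative name.

```python
import mmap
import os
import tempfile
from array import array


def read_floats_mmap(path: str) -> list[float]:
    """Map the file and view its bytes as C floats without a read() copy."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Both views are released (innermost first) before the mapping is closed.
        with memoryview(mm) as raw, raw.cast("f") as floats:
            return floats.tolist()  # copies out only here, for the demo


path = os.path.join(tempfile.mkdtemp(), "pixels.dat")
with open(path, "wb") as f:
    array("f", [1.5, -2.25, 0.0]).tofile(f)
print(read_floats_mmap(path))  # [1.5, -2.25, 0.0]
```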

Parsing PDF files

I'm finding it difficult to parse a PDF file that's created in a non-English language. I used PDFBox and iText but couldn't find anything in them that could help parse this file. Here's the PDF file I'm talking about: http://prapatti.com/slokas/telugu/vishnusahasranaamam.pdf The PDF says that it was created using LaTeX and the Tikkana font. I have the Tikkana font installed on my machine, but that didn't help. Please help me with this.
Thanks, K
When you said "parse PDF files", my first thought was that the PDF in question wasn't opening in various PDF viewers and libraries, and was therefore corrupt in some way.
But that's not the case at all. It opens just fine in Acrobat Reader X, and I can see the text on the page.
And when I copy/paste that text from the first page, I get:
Ûûp{¨¶ðQ{p{¨|={pÛû{¨>üb¶úN}l{¨d{p{¨> >Ûpû¶bp{¨}|=/}pT¶=}Nm{Z{Úpd{m}a¾Ú}mp{Ú¶¨>ztNð{øÔ_c}m{ТÁ}=N{Nzt¶ztbm}¥Ázv¬b¢Á
Á ÛûÁøÛûzÏrze¨=ztTzv}lÛzt{¨d¨c}p{Ðu{¨½ÐuÛ½{=Û Á{=Á Á ÁÛûb}ßb{q{d}p{¨ze=Vm{Ðu½Û{=Á
That's from Reader.
Much of the text in this PDF is written using various "Type 3" fonts. These fonts claim to use "WinAnsiEncoding" (also known as code page 1252) with a "Differences" array. This Differences array is wrong:
47 /BB 61 /BP /BQ 81 /C6...
Each number is the first code point being replaced; the name after it replaces the character at that code point, and any further names replace the characters at the following consecutive code points (so /BQ above applies to code 62).
There are no such character names as BB, BP, BQ, C9 and so on, so when you copy and paste that text, you get the garbage above.
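A Differences array is read as runs: each integer sets a starting code, and each name after it is assigned to that code and then to the next consecutive codes, until the next integer. The sketch below decodes the visible part of the array quoted above that way and flags names missing from a small table of standard glyph names. The tokenizer is deliberately naive (whitespace-separated tokens only, while real PDFs may write names back to back), and `KNOWN_GLYPHS` is a tiny illustrative subset of the Adobe Glyph List, not the real list.

```python
# Tiny illustrative subset of standard glyph names; the real list is far larger.
KNOWN_GLYPHS = {"space", "slash", "equal", "greater", "Q", "A", "a"}


def parse_differences(tokens: str) -> dict[int, str]:
    """Map character codes to glyph names from the body of a /Differences array."""
    mapping: dict[int, str] = {}
    code = None
    for token in tokens.split():
        if token.startswith("/"):
            if code is None:
                raise ValueError("glyph name before any starting code")
            mapping[code] = token[1:]  # assign the name to the current code
            code += 1                  # the next name goes to the next code
        else:
            code = int(token)          # a new run starts at this code
    return mapping


diffs = parse_differences("47 /BB 61 /BP /BQ 81 /C6")
print(diffs)  # {47: 'BB', 61: 'BP', 62: 'BQ', 81: 'C6'}
print([name for name in diffs.values() if name not in KNOWN_GLYPHS])  # ['BB', 'BP', 'BQ', 'C6']
```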
I'm sorry, but the only reliable way to extract text from such a PDF is OCR (optical character recognition).
Eh... Long shot idea:
If you can find the specific versions of the specific fonts used to generate this PDF, you just might be able to determine the actual stream contents of known characters converted to Type 3 fonts in this way.
Once you have these known streams, you can compare them to the streams in the PDF and use that to build your own translation table.
You could either fix the existing PDF[s] (by changing the names in the encoding dictionary and Type 3 charproc entries) such that these text extractors will work correctly, or just grab the bytes out of the stream and translate them yourself.
The workflow would go something like this:
For each character in a font used in the PDF:
  - Render it to PDF by itself using the same LaTeX/Ghostscript versions.
  - Open the PDF and find the CharProc for that particular known character.
  - Store that stream along with the known character used to build it.
For each text byte in the PDF to be interpreted:
  - Get the glyph name for the given byte based on the existing encoding array.
  - Get the "char proc" stream for that glyph name and compare it to your known char procs.
NOTE: This could be rewritten to be much more efficient with some caching, but it gets the idea across (I hope).
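A rough Python sketch of the second loop, assuming the first loop has already produced `known_procs` (a dict from CharProc stream bytes to the character it draws) and that the PDF's encoding and CharProcs have already been extracted into plain dicts by some PDF library. All names and the toy stream contents are hypothetical placeholders. Exact byte equality is a simplification; real streams may need normalizing (for example decompressing, or canonicalizing whitespace) before they compare equal. A per-byte cache means each distinct byte is only resolved once.

```python
def translate(text_bytes: bytes,
              encoding: dict[int, str],
              char_procs: dict[str, bytes],
              known_procs: dict[bytes, str],
              unknown: str = "\ufffd") -> str:
    """Map each byte to a character by matching its glyph's CharProc stream."""
    cache: dict[int, str] = {}  # byte value -> decoded character
    out = []
    for b in text_bytes:
        if b not in cache:
            glyph_name = encoding.get(b)  # e.g. taken from the Differences array
            stream = char_procs.get(glyph_name) if glyph_name else None
            cache[b] = known_procs.get(stream, unknown) if stream is not None else unknown
        out.append(cache[b])
    return "".join(out)


# Hypothetical toy data standing in for what a PDF library would extract.
encoding = {47: "BB", 61: "BP"}
char_procs = {"BB": b"0 0 m 10 10 l S", "BP": b"5 0 m 5 10 l S"}
known_procs = {b"0 0 m 10 10 l S": "\u0c35", b"5 0 m 5 10 l S": "\u0c3f"}
result = translate(bytes([47, 61, 99]), encoding, char_procs, known_procs)
print(ascii(result))  # '\u0c35\u0c3f\ufffd'
```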
All that requires a fairly deep understanding of PDF and the parsing methods involved. But it just might work. Then again, it might not...

Resources