I am working with an Epson dot matrix printer and printing with ESC/P commands, but I have a problem when I want to print a table border. The border uses extended ASCII characters,
so I want to use characters such as 179, 180, 185, 186, etc. But when I send them to the printer, it prints strange characters. How can I make this work?
I have tried selecting a character set, but it still does not seem to work. If anyone knows about this, please let me know.
The printer contains several character tables (code pages), and the same byte value prints a different glyph depending on which table is active.
The printer documentation shows which table is the default.
To send an extended character to the printer, write it as a hex byte:
data="\xC9" # example extended byte
If you need a more precise answer, please tell us which printer you are using and show an example of your code.
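As a minimal sketch of the point above, the border can be built in Python by encoding box-drawing characters with the cp437 codec, which yields exactly the byte values 179, 180, 185, 186 and so on. ESC @ and ESC t are ESC/P commands (initialize printer, select character table). Whether table 1 is PC437 on a given model is an assumption to check against the printer manual, and the helper names are made up for illustration.

```python
ESC = b"\x1b"
INIT = ESC + b"@"                # ESC @ : reset the printer to its defaults
SELECT_TABLE_1 = ESC + b"t\x01"  # ESC t 1 : select character table 1 (assumed to be PC437)


def box_row(left, fill, right, width):
    """Return one border row, encoded as single CP437 bytes plus CR LF."""
    return (left + fill * width + right).encode("cp437") + b"\r\n"


def build_table(text, width=10):
    """Build the byte stream for a one-cell table with a double-line border."""
    body = "║" + text.ljust(width) + "║"
    return b"".join([
        INIT,
        SELECT_TABLE_1,
        box_row("╔", "═", "╗", width),
        body.encode("cp437") + b"\r\n",
        box_row("╚", "═", "╝", width),
    ])


payload = build_table("Total")
```

The payload has to reach the printer as raw bytes, for example by writing it to the printer device opened in binary mode. If it passes through a text layer that re-encodes it (as UTF-8, say), each border character becomes several bytes and prints as garbage.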
I am trying to extract data from this Japanese PDF using tabula-py (and tabula-java), but the output is gibberish. In both tabula-py and tabula-java, the output isn't human readable (definitely not Japanese characters), and there are no error or warning messages. It does seem that the content of the PDF is processed, though.
When using the standalone Tabula tool, the characters are encoded properly:
I searched online and in the tabula-py and tabula-java documentation. Below are the suggestions I could find, but they don't change the output.
Setting -Dfile.encoding=utf8 (in the Java call to tabula-py or tabula-java)
Setting chcp 65001 (in Windows command prompt)
I understand Tabula and tabula-java (and tabula-py) use the same library, but is there something different between the two that would explain the difference in encoding output?
Background info
There is nothing unusual in this PDF compared to any other.
As in any PDF, the text is written in whatever order the authoring tool chose. For example, the first body line of the page (港区内認可保育園等一覧) is the 1262nd block of text, added long after the table was started. To hear the written order we can use Read Aloud, which also verifies character and language recognition, but unless the PDF was correctly tagged it will jump from text block to text block.
So internally the text is rarely tabular. The first 8 lines are:
1 認可保育園
0歳 1歳 2歳3歳4歳5歳 計
短時間 標準時間
001010 区立
3か月
3455-
4669
芝5-18-1-101
Thus you need a text extractor that works in a grid-like manner or converts the text layout into row-by-row output.
This is where all extractors will be confounded as to how to output such a jumbled, dense layout, and generally ALL of them will struggle with this page.
Hence it's best to use a good generic solution. It will still need data cleaning, but at least you will have something to work on.
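To illustrate what a row-by-row conversion means, here is a minimal sketch that groups positioned text blocks into rows by their vertical position and then orders each row left to right. The coordinates, the tolerance, and the function name are hypothetical. A real extractor would take positions from the PDF and would also have to deal with cells that span several lines.

```python
def blocks_to_rows(blocks, y_tolerance=2.0):
    """Group (x, y, text) blocks into rows by y position, then sort each row by x."""
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in sorted(blocks, key=lambda b: b[1]):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))   # close enough: same row
        else:
            rows.append([y, [(x, text)]])   # start a new row
    return [
        [text for _, text in sorted(cells, key=lambda c: c[0])]
        for _, cells in rows
    ]


# Blocks in "written" order, not reading order (made-up coordinates).
blocks = [
    (300, 10, "区立"),
    (10, 30.5, "3か月"),
    (10, 10, "001010"),
    (120, 31, "芝5-18-1-101"),
]
rows = blocks_to_rows(blocks)
```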
If you only need a zone from the page, it is best to set the boundary of interest to avoid extraneous parsing.
Your "standalone Tabula tool" output is very good, but it could possibly be improved by using pdftotext -layout and adjusting some options to produce a more regular order.
Your Question
the difference in encoding output?
The Answer
The text output you want is UTF-8, but that is not how the PDF stores it internally: a PDF does not store text as UTF-8 or Unicode, it simply uses numbers from a font character map. If the map is poor, everything will be gibberish. In this case the map is good, so where does the gibberish arise? It arises because the output stage is not using UTF-8, and console output is rarely Unicode.
You correctly show that the console needs to be set to Unicode mode; then the output should match (except for the density problem).
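The effect can be reproduced without any PDF at all. The sketch below encodes the page's first body line as UTF-8 and reads those bytes back with a legacy code page. CP437 is only a stand-in here for whatever code page the console actually uses. It then shows that writing to a file with an explicit encoding round-trips cleanly.

```python
import os
import tempfile

text = "港区内認可保育園等一覧"
utf8_bytes = text.encode("utf-8")

# Interpreting UTF-8 bytes with a legacy code page gives mojibake:
# every byte becomes its own (wrong) character.
garbled = utf8_bytes.decode("cp437")

# Writing to a file with an explicit encoding sidesteps the console entirely.
path = os.path.join(tempfile.mkdtemp(), "table.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write(text + "\n")
with open(path, encoding="utf-8") as f:
    restored = f.read().strip()
```

Under that assumption, sending the extractor's output to a UTF-8 file rather than to the console is a quick way to tell whether the extraction or the display is at fault.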
The density issue would be easier to handle if preprocessed in a flowing format such as HTML
or using a different language
I use a Zebra printer, model ZD410, and have designed a label with some keyboard inputs that I want to provide in a ZPL payload instead of using their software to fill out a form, since we already have the data in the application that should print the labels.
The label has these 4 variable keyboard inputs with the following prompt texts:
barcode = text
reservation = text
department = text
size = text
So by sending
LABEL.ZPL^XA
^XFE:LABEL.ZPL^FS
^XZ
I can print a label with the empty values. I was hoping that I could fill them out by doing something like:
LABEL.ZPL^XA
^XFE:LABEL.ZPL^FS
^department=M3
^size=XL
^reservation=0008734
^barcode=000000000001
^XZ
However, this does not work. How would one achieve something similar?
What you want to do is possible, but requires the recalled format to contain "placeholders" where the data is to be displayed. In the E:LABEL.ZPL format, you need to define numbered fields like:
^FO150,125^A0N,36,20^FN1^FS
Where the ^FN1^FS is the placeholder for the field #1 data. The ^FN#^FS can be placed in the format anywhere you would normally put a ^FD...^FS data field (text and barcode data).
Then you recall the format and supply the data for the placeholder using:
^XA
^XFE:LABEL.ZPL^FS
^FN1^FDMY VALUE^FS
^XZ
That will substitute ^FDMY VALUE^FS for any occurrences of the ^FN1^FS placeholder in the recalled format.
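Since the data already lives in the application, the recall payload can be generated from it. The following is a minimal Python sketch. The function name and the assignment of field numbers 1 to 4 are assumptions; the stored E:LABEL.ZPL format must define the matching ^FN placeholders.

```python
def build_recall(format_path, fields):
    """Return ZPL that recalls a stored format and fills its ^FN placeholders.

    fields maps a field number (the # in ^FN#) to the text to print there.
    """
    lines = ["^XA", f"^XF{format_path}^FS"]
    for number, value in sorted(fields.items()):
        # ^ and ~ are ZPL's default command prefixes, so refuse them in data.
        if "^" in value or "~" in value:
            raise ValueError(f"field {number} contains a ZPL control character")
        lines.append(f"^FN{number}^FD{value}^FS")
    lines.append("^XZ")
    return "\n".join(lines)


# Hypothetical numbering: 1=barcode, 2=reservation, 3=department, 4=size.
zpl = build_recall(
    "E:LABEL.ZPL",
    {1: "000000000001", 2: "0008734", 3: "M3", 4: "XL"},
)
```

How the resulting string reaches the printer (raw network socket, USB, print spooler) depends on the setup and is not covered here.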
We need to parse a GS1 DataMatrix barcode that will be provided by another party. We know they are going to use GTIN (01), lot number (10), expiration date (17), and serial number (21). The problem is that the barcode reader outputs a single string such as 01076123456789001710050310AC3453G321455777. Since there is no separator, and both the serial number and the lot number are variable length according to the GS1 standard, we have trouble identifying the segments. My understanding is that the best way seems to be to embed the parser in the scanning device rather than in the application, but we have not planned any embedded software yet. How can I implement the parser? Any suggestions?
There should be an FNC1 character at the end of any variable-length field that is not filled to its maximum length, so an FNC1 will appear between the G3 and the 21.
FNC1 is invisible to humans, but it can be detected by scanners and will be reproduced in the string reported by the scanner. Simply send the string directly to a text file and examine it with a hex viewer. The FNC1 should be obvious.
If you can, it might be an idea to swap the sequence of the 21 field and the 10 field since you appear to be using a pure-numeric for 21. This would make the barcode produced a little shorter.
One way to deal with this is to program the scanner to replace FNC1 with space or another plain text character before sending it to your application. The scanner manufacturer usually provides a tool to produce programming bar codes that can do simple substitutions in the scanner. Then you can parse the data without having to handle special characters.
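Once the separator is visible to the application, parsing is straightforward. The sketch below is a minimal example and not a full GS1 parser: it handles only the four AIs named in the question, and it assumes the scanner transmits FNC1 as the ASCII group separator (0x1D), which is common but should be confirmed with the hex dump. If the scanner is programmed to substitute another character, change SEPARATOR accordingly.

```python
SEPARATOR = "\x1d"                # assumed scanner representation of FNC1
FIXED = {"01": 14, "17": 6}       # AI -> exact data length
VARIABLE = {"10": 20, "21": 20}   # AI -> maximum data length


def parse_gs1(data):
    """Split a GS1 element string into {AI: value} for the AIs above."""
    result = {}
    i = 0
    while i < len(data):
        if data[i] == SEPARATOR:
            i += 1
            continue
        ai = data[i:i + 2]
        i += 2
        if ai in FIXED:
            end = i + FIXED[ai]
            if end > len(data):
                raise ValueError(f"AI {ai} is truncated")
        elif ai in VARIABLE:
            # Read up to the separator, the end of data, or the maximum length.
            end = data.find(SEPARATOR, i)
            if end == -1:
                end = len(data)
            end = min(end, i + VARIABLE[ai])
        else:
            raise ValueError(f"unsupported AI {ai!r}")
        result[ai] = data[i:end]
        i = end
    return result


# The string from the question, with the separator after the lot number.
parsed = parse_gs1("01076123456789001710050310AC3453G3\x1d21455777")
```

With the separator in place, the lot number comes out as AC3453G3 and the serial number as 455777. Without it, the lot field would swallow the rest of the string, which is exactly the ambiguity described in the question.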
I am using PDFKitten to search for strings within PDF documents and highlight the results. FastPDFKit or any other commercial library is not an option, so I stuck with the one closest to my requirements.
As you can see in the screenshot, I searched for the string "in", which is always highlighted correctly except for the last occurrence. I have a more complex PDF document where the highlight box for "in" is wrong in nearly 40% of cases.
I read the whole syntax and checked the issue tracker, but apart from line height problems I found nothing regarding the width calculation. For the moment I don't see any pattern in where the calculation goes or could go wrong, and I hope that someone else has had a problem similar to mine.
My current guess is that the coordinates and character widths are calculated incorrectly somewhere in the font classes or in RenderingState.m. The project is very complex, and maybe one of you has had a similar problem with PDFKitten in the past.
I have used the original sample PDF document from PDFKitten for my screenshot.
This might be a bug in PDFKitten when calculating the width of characters whose character identifier does not coincide with its unicode character code.
appendPDFString in StringDetector works with two strings when processing some string data:
// Use CID string for font-related computations.
NSString *cidString = [font stringWithPDFString:string];
// Use Unicode string to compare with user input.
NSString *unicodeString = [[font stringWithPDFString:string] lowercaseString];
stringWithPDFString in Font transforms the sequence of character identifiers of its argument into a unicode string.
Thus, in spite of the name of the variable, cidString is not a sequence of character identifiers but of Unicode characters. Nonetheless, its entries are used as the argument of didScanCharacter, which in Scanner is implemented to advance the position by the character width: it passes the value to widthOfCharacter in Font to determine the character width, and that method (according to the comment "Width of the given character (CID) scaled to fontsize") expects its argument to be a character identifier.
So, if the CID and the Unicode character code don't coincide, the wrong character width is determined and the position of any following character cannot be trusted. In the case at hand, the /fi ligature has a CID of 12, which is very different from its Unicode code 0xfb01.
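The effect on the cursor position can be shown with a toy model. It is written in Python rather than Objective-C for brevity, and it is not PDFKitten's code. The width values and the default width are invented. Only the CID 12 to U+FB01 mapping comes from the case above.

```python
# Hypothetical glyph widths keyed by CID, as a font's width table would be.
WIDTHS_BY_CID = {12: 556, 105: 222, 110: 556}
# CID -> Unicode mapping: plain letters coincide, the fi ligature does not.
CID_TO_UNICODE = {12: "\ufb01", 105: "i", 110: "n"}
DEFAULT_WIDTH = 1000  # used when the lookup key is not in the table


def advance(cids, lookup_by_cid):
    """Sum the glyph widths, looking them up by CID or by Unicode code point."""
    x = 0
    for cid in cids:
        key = cid if lookup_by_cid else ord(CID_TO_UNICODE[cid])
        x += WIDTHS_BY_CID.get(key, DEFAULT_WIDTH)
    return x


text_cids = [12, 110, 105]                        # "fi" ligature, "n", "i"
correct = advance(text_cids, lookup_by_cid=True)
wrong = advance(text_cids, lookup_by_cid=False)   # 0xFB01 is not a valid key
```

In this model only the ligature is looked up with the wrong key, yet everything after it is shifted, which would explain why just some highlight boxes are off.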
I would propose PDFKitten to be enhanced to also define a didScanCID method in StringDetector which in appendPDFString should be called next to didScanCharacter for each processed character forwarding its CID. Scanner then should make use of this new method instead to calculate the width to forward its cursor.
This should be triple-checked first, though. Maybe some widthOfCharacter implementations (there are different ones for different font types) in spite of the comment expect the argument to be a unicode code after all...
(Sorry if I used the wrong vocabulary here or there, I'm a 'Java guy... :))
I need to print a receipt from my web-based app using an Epson TM-U220D dot matrix printer (a POS printer).
Should I generate the receipt in HTML or in plain text?
I once saw some commands for dot matrix printers that change the font size, line feed, etc., but I don't remember them. If I have to use plain text, I will need those commands. Does anyone know where I can find a reference?
Thanks
There is a very good chance that these printers support ESC/P2, the set of escape codes used to do formatting on the printer... Here's a link to the RawPrinterHelper...
How are you connected to the printer: parallel, USB? You may need to add a generic text print driver so that raw escape code sequences can be written to the printer. Below is an example of the codes that need to be sent to the printer. Depending on how you implement this, for additional flexibility the class could parse simple HTML codes and reinterpret them as ESC/P2 codes:
<b>This will be printed in bold</b>
|
V
0x1b 0x45 This will be printed in bold 0x1b 0x46
0x1b is Escape, 0x45 is E (turns on bold)
0x1b is Escape, 0x46 is F (turns off bold)
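The HTML-to-escape-code idea can be sketched in a few lines of Python. In ESC/P, bold on is ESC E (bytes 0x1B 0x45) and bold off is ESC F (bytes 0x1B 0x46). The function name is made up, and only the bold tag is handled. POS models document their own command set, where the bold command may additionally take a parameter byte, so the sequences should be checked against the printer's command reference.

```python
ESC = b"\x1b"
BOLD_ON = ESC + b"E"    # 0x1B 0x45
BOLD_OFF = ESC + b"F"   # 0x1B 0x46


def markup_to_escp(text, encoding="ascii"):
    """Replace <b>...</b> with ESC E / ESC F and return raw bytes for the printer."""
    data = text.encode(encoding)
    return data.replace(b"<b>", BOLD_ON).replace(b"</b>", BOLD_OFF)


payload = markup_to_escp("Total: <b>12.50</b>\n")
```

The resulting bytes must be sent to the printer unmodified, which is what a raw or generic text driver is for.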
It appears that this printer has a windows driver:
http://www.posguys.com/12_12/Epson-TM-U220_502/
If that is the case, then you can try to print via html. If that doesn't work and you have the ability to create pdf's, you can print the pdf to the windows driver and you should be set. Most pdf generation libraries permit changing the size of the paper, so with some experimentation you can probably make it work. I actually have a web app that does this ... it generates a pdf sized for the printer and the user prints to a label printer from acrobat.