Print a web receipt on a dot matrix receipt printer

I need to print a receipt from my web-based app using an Epson TM-U220D dot matrix printer (a POS printer).
Should I generate the receipt in HTML or in plain text?
I have seen commands for dot matrix printers that change the font size, perform line feeds, and so on, but I don't remember them. If I have to use plain text, I will need those commands. Does anyone know where I can find a reference?
Thanks

There is a very good chance that these printers support ESC/P2, the escape-code language used to do formatting on the printer. Here's a link to the RawPrinterHelper.
How are you connected to the printer? Parallel? USB? You may need to add a generic text print driver so that raw escape-code sequences can be written straight to the printer. As an example, here is what needs to be sent to the printer. Depending on how you implement this, the class could also parse simple HTML tags and re-interpret them as ESC/P2 codes to give additional flexibility:
This will be printed in bold
|
V
0x1B 0x45 This will be printed in bold 0x1B 0x46
0x1B is Escape, 0x45 is E (turns bold on)
0x1B is Escape, 0x46 is F (turns bold off)

It appears that this printer has a Windows driver:
http://www.posguys.com/12_12/Epson-TM-U220_502/
If that is the case, then you can try to print via HTML. If that doesn't work and you have the ability to create PDFs, you can print the PDF through the Windows driver and you should be set. Most PDF generation libraries let you change the paper size, so with some experimentation you can probably make it work. I actually have a web app that does this: it generates a PDF sized for the printer, and the user prints it to a label printer from Acrobat.
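As a rough illustration of that PDF approach (not the answerer's actual app), here is a Python sketch using reportlab; the 80 mm roll width, the page length, the font, and the file name are all assumptions to adjust for your paper:

    from reportlab.lib.units import mm
    from reportlab.pdfgen import canvas

    # Assumed paper: 80 mm wide roll, 200 mm long page; adjust for your printer.
    PAGE_WIDTH, PAGE_HEIGHT = 80 * mm, 200 * mm

    c = canvas.Canvas("receipt.pdf", pagesize=(PAGE_WIDTH, PAGE_HEIGHT))
    c.setFont("Courier", 8)
    c.drawString(5 * mm, PAGE_HEIGHT - 10 * mm, "RECEIPT")
    c.drawString(5 * mm, PAGE_HEIGHT - 15 * mm, "1 x Coffee .......... 2.50")
    c.showPage()
    c.save()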

Related

Gibberish table output in tabula-java for Japanese PDF but works in standalone Tabula

I am trying to extract data from this Japanese PDF using tabula-py (and tabula-java), but the output is gibberish. In both tabula-py and tabula-java the output isn't human readable (definitely not Japanese characters), and there are no error or warning messages. It does seem that the content of the PDF is processed, though.
When using the standalone Tabula tool, the characters are encoded properly.
Searching online and through the tabula-py and tabula-java documentation, the suggestions below are all I could find, but neither changes the output:
Setting -Dfile.encoding=utf8 (in the Java call made by tabula-py or tabula-java)
Setting chcp 65001 (in the Windows command prompt)
I understand that Tabula, tabula-java, and tabula-py use the same library, but is there something different between them that would explain the difference in the encoding of the output?
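For reference, here is roughly how I am calling tabula-py; the file name is a placeholder, and the java_options line is the -Dfile.encoding suggestion mentioned above, which made no difference:

    import tabula  # tabula-py

    tables = tabula.read_pdf(
        "hoikuen.pdf",                            # placeholder path to the Japanese PDF
        pages="all",
        lattice=True,                             # use lattice mode if the table has ruled lines
        java_options=["-Dfile.encoding=UTF8"],    # suggestion found online; no effect here
    )
    for df in tables:
        print(df.head())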
Background info
There is nothing unusual in this PDF compared to any other.
The text, as in any PDF, is written in the author's arbitrary order; for example, the first body line of the PDF (港区内認可保育園等一覧) is the 1262nd block of text, added long after the table was started. To hear the written order we can use Read Aloud to verify character and language recognition, but unless the PDF is correctly tagged it will also jump from text block to text block.
So internally the text is rarely tabular; the first 8 lines are:
1 認可保育園
0歳 1歳 2歳3歳4歳5歳 計
短時間 標準時間
001010 区立
3か月
3455-
4669
芝5-18-1-101
Thus you need a text extractor that works in a grid-like manner or converts the text layout into row-by-row output.
This is where all extractors will be confounded as to how to output such a jumbled, dense layout, and generally all of them will struggle with this page.
Hence it's best to use a good generic solution. It will still need data cleaning, but at least you will have something to work on.
If you only need a zone of the page, it is best to set the boundary of interest to avoid extraneous parsing.
Your "standalone Tabula tool" output is very good, but it could possibly be better by using pdftotext -layout and adjusting some options to produce a more regular order.
Your Question
"the difference in encoding output?"
The Answer
The output from a PDF is not its internal coding. The desired text output is UTF-8, but a PDF does not store text as UTF-8 or Unicode; it simply uses numbers taken from a font character map. If the map is poor, everything would be gibberish; in this case, however, the map is good, so where does the gibberish arise? It arises because the output path is not using UTF-8, and console output is rarely Unicode.
You correctly show that the console needs to be set to Unicode mode; then the output should match (except for the density problem).
The density issue would be easier to handle if the output were preprocessed into a flowing format such as HTML,
or handled in a different language.
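For example, one way to sidestep the console entirely is to write the extracted tables straight to UTF-8 files and inspect those. A sketch assuming tabula-py's DataFrame output (file names are placeholders):

    import pathlib
    import tabula

    tables = tabula.read_pdf("hoikuen.pdf", pages="all")   # placeholder path
    for i, df in enumerate(tables):
        # Write as UTF-8 so the Japanese text survives regardless of the
        # console code page; inspect the files in a browser or editor.
        df.to_csv(f"table_{i}.csv", index=False, encoding="utf-8")
        html = df.to_html(index=False)
        pathlib.Path(f"table_{i}.html").write_text(html, encoding="utf-8")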

Escp command to print table border

I am working with an Epson dot matrix printer and printing using ESC/P commands. But I have a problem when I want to print a table border. The table border uses extended ASCII characters.
So I want to use characters 179, 180, 185, 186, etc. But when I send them to the printer, it prints strange characters. How can I make this work?
I have tried selecting a character set, but it still doesn't seem to work. If anyone knows about this, please let me know.
There are several ASCII code tables in the printer.
The printer documentation shows which table is the default.
To send an extended character to the printer, it must be written as a raw byte, for example in hex:
data = "\xC9"  # example extended byte
If you need a more precise answer, please let us know which printer you are using and show an example of your code.
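As a hedged sketch, assuming the printer honors the ESC/P "ESC t" select-character-table command and that table 1 is the PC437 extended-graphics set on your model (check the printer's manual), the bytes for a simple top border could be built like this in Python:

    ESC = b"\x1b"

    # ESC @  : initialize the printer
    # ESC t 1: select character table 1 (PC437 / Epson extended graphics on many
    #          Epson dot matrix models -- verify against your printer's manual)
    data = ESC + b"@" + ESC + b"t" + bytes([1])

    # Box-drawing bytes from code page 437: 201 = top-left corner, 205 = horizontal
    # line, 187 = top-right corner (the values 179, 180, 185, 186 from the question
    # are the vertical and junction characters of the same set).
    data += bytes([201]) + bytes([205]) * 20 + bytes([187]) + b"\r\n"

    # Placeholder device path for a USB/parallel-attached printer on Linux; on
    # Windows, send the same bytes through a raw spooler call instead.
    with open("/dev/usb/lp0", "wb") as printer:
        printer.write(data)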

Cannot Serve International Characters From Lisp Portable AllegroServe

I am using Clozure CL on Mac OS X 10.9 and Portable AllegroServe.
I have a file whose text contains characters like ı ç ş ö (some of the characters Turkish has) and some Arabic characters. I cannot serve them: when I visit the page from the browser, these characters are not displayed at all, and only the part of the text up to the first ı is shown.
In Lisp I use a function built from a do loop, read-lines, and format (I have also tried print, princ, and prin1); it reads the entire document, and when I set :external-format :utf-8 it shows the characters properly in Lisp. The problem is in serving them; if I can serve them as I read them in Lisp, it will be done.
Also, if I do not set :external-format at all, the text is read improperly in Lisp, as expected; however, this time the browser shows all the text, but with wrong characters in place of the ones described above.
How do I fix this and use external-format character encodings properly?
See http://www.xach.com/lisp/allegro-cl/2001-3/964.html for an example on how to use :external-format in AllegroServe.
Cheers
Frank
P.S. I also posted an answer to the same question on the newsgroup comp.lang.lisp.

Determine if received data is PostScript or PCL

I have a service that receives printer data via TCP/IP. When the data is received, is there a reliable, efficient way to examine the data stream and determine whether it is PostScript or PCL data? For example, are there characters I could look for at the beginning of the stream that indicate the format?
I would probably just count the number of escape characters in the file. PCL will have gobs of them. Postscript will have gobs of % signs. That isn't a perfect solution, but it's dead simple and I'll bet it would actually be quite reliable.
The only "real" way I can see doing this is to actually parse the PCL and parse the postscript and see which one works.
I'll add my 2¢.
Like others have mentioned here, your first stab at programmatically identifying the document would be to look at the first two characters. If it starts with %!, it is PostScript. If it starts with an escape character (hex 1B, octal 033, ASCII 27), it is very likely PCL, since PCL jobs usually start with PCL commands. This will resolve 99% of the documents you need to process. If the format still isn't known, search the document for a showpage string: if it's PostScript, it has to have a showpage to render the page. If you can't find one, and there are any escape characters in the file, you know it is PCL. You can err on the side of PCL when there is no showpage and no escape characters, because raw text files are valid PCL and printers can blort them out as they come.
PostScript data must begin with "%!ps" or "%!PS"; it may be a longer readable string such as "%!PS-Adobe-3.0", but that is basically it.
Most likely PCL has a similar signature; I remember seeing it in the past.
According to the PCL 5 General Printing FAQs, PCL files should start with ESC "E". I assume another ESC sequence must follow, so my guess is that files starting with the bytes 1B 45 1B are most likely PCL files.
This leaves unrecognized any PCL files that don't adhere to this rule.
In my use case it's macOS that always produces PCL with ESC E at the beginning.
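Pulling these hints together, here is a hedged Python sketch of a detection function; the order of the checks and the fallbacks are guesses based on the answers above, not a specification:

    def sniff_print_format(data: bytes) -> str:
        """Rough guess at whether a print stream is PostScript or PCL."""
        head = data.lstrip()[:4]

        if head.startswith(b"%!"):        # e.g. %!PS-Adobe-3.0
            return "postscript"
        if head.startswith(b"\x1b"):      # PCL jobs usually open with ESC E (1B 45)
            return "pcl"

        # Fallback heuristics: PostScript needs a showpage to render anything,
        # PCL is full of escape characters, and plain text is valid PCL.
        if b"showpage" in data:
            return "postscript"
        if data.count(b"\x1b") > data.count(b"%"):
            return "pcl"
        return "pcl"                      # err on the side of PCL for raw text

    # Example usage (placeholder filename):
    # with open("job.prn", "rb") as f:
    #     print(sniff_print_format(f.read()))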

Parsing PDF files

I'm finding it difficult to parse a PDF file that was created in a non-English language. I used PDFBox and iText but couldn't find anything in them that could help parse this file. Here's the PDF file that I'm talking about: http://prapatti.com/slokas/telugu/vishnusahasranaamam.pdf The PDF says that it was created using LaTeX and the Tikkana font. I have the Tikkana font installed on my machine, but that didn't help. Please help me with this.
Thanks, K
When you say "parse PDF files", my first thought was that the PDF in question wasn't opening in various PDF viewers & libraries, and was therefore corrupt in some way.
But that's not the case at all. It opens just fine in Acrobat Reader X. And then I see the text on the page.
And when I copy/paste that text from the first page, I get:
Ûûp{¨¶ðQ{p{¨|={pÛû{¨>üb¶úN}l{¨d{p{¨> >Ûpû¶bp{¨}|=/}pT¶=}Nm{Z{Úpd{m}a¾Ú}mp{Ú¶¨>ztNð{øÔ_c}m{ТÁ}=N{Nzt¶ztbm}¥Ázv¬b¢Á
Á ÛûÁøÛûzÏrze¨=ztTzv}lÛzt{¨d¨c}p{Ðu{¨½ÐuÛ½{=Û Á{=Á Á ÁÛûb}ßb{q{d}p{¨ze=Vm{Ðu½Û{=Á
That's from Reader.
Much of the text in this PDF is written using various "Type 3" fonts. These fonts claim to use "WinAnsiEncoding" (also known as code page 1252) with a "Differences" array. This Differences array is wrong:
47 /BB 61 /BP /BQ 81 /C6...
The first number is the code point being replaced; the second is the name of the character that replaces the original value at that code point.
There are no such character names as BB, BP, BQ, C9, and so on. So when you copy and paste that text, you get the garbage above.
I'm sorry, but the only reliable way to extract text from such a PDF is OCR (optical character recognition).
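For what it's worth, a rough OCR sketch in Python using pdf2image and pytesseract; this assumes Poppler, Tesseract, and Tesseract's Telugu ("tel") language data are installed, and it is not guaranteed to cope well with this particular font:

    from pdf2image import convert_from_path   # needs Poppler installed
    import pytesseract                        # needs Tesseract + "tel" language data

    pages = convert_from_path("vishnusahasranaamam.pdf", dpi=300)
    for i, page in enumerate(pages, start=1):
        text = pytesseract.image_to_string(page, lang="tel")
        print(f"--- page {i} ---")
        print(text)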
Eh... Long shot idea:
If you can find the specific versions of the specific fonts used to generate this PDF, you just might be able to determine the actual stream contents of known characters converted to Type 3 fonts in this way.
Once you have these known streams, you can compare them to the streams in the PDF and use that to build your own translation table.
You could either fix the existing PDF[s] (by changing the names in the encoding dictionary and Type 3 charproc entries) such that these text extractors will work correctly, or just grab the bytes out of the stream and translate them yourself.
The workflow would go something like this:
For each character in a font used in the document:
Render it to a PDF by itself using the same LaTeX/Ghostscript versions.
Open the PDF and find the CharProc for that particular known character.
Store that stream along with the known character used to build it.
For each text byte in the PDF to be interpreted:
Get the glyph name for the given byte based on the existing encoding array.
Get the "char proc" stream for that glyph name and compare it to your known char procs.
NOTE: This could be rewritten to be much more efficient with some caching, but it gets the idea across (I hope).
All that requires a fairly deep understanding of PDF and the parsing methods involved. But it just might work. Might not too...
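If you want to experiment with the "grab the bytes out of the stream" part, here is a hedged Python sketch using pikepdf to dump the Type 3 CharProc streams; the key names follow the PDF specification, but treat the code as a starting point rather than something tested against this file:

    import pikepdf

    pdf = pikepdf.open("vishnusahasranaamam.pdf")
    for page_number, page in enumerate(pdf.pages, start=1):
        fonts = page.Resources.get("/Font", {})
        for font_name in fonts.keys():
            font = fonts[font_name]
            if str(font.get("/Subtype", "")) != "/Type3":
                continue
            charprocs = font.get("/CharProcs", {})
            for glyph_name in charprocs.keys():
                stream = charprocs[glyph_name].read_bytes()  # glyph-drawing instructions
                # Compare `stream` against streams rendered from known characters
                # to build the translation table described above.
                print(page_number, str(font_name), str(glyph_name), len(stream))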
