How to get the page number in Tesseract while OCRing a multi-page TIFF file - image-processing

How can we get the page number on the command line while OCRing a multi-page TIFF file? For example:
tesseract myfile.tif output-page_no.txt
Here the output file name should contain the corresponding page number from the TIFF file.

The hOCR output option produces page numbers. The plain-text output, however, can include page breaks, given the appropriate switches:
tesseract -c include_page_breaks=1 -c page_separator="[PAGE SEPARATOR]" 109359.tiff 109359
See this post.
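To get one file per page (the output-page_no.txt naming from the question), one option is to run tesseract once with a page separator and split the text afterwards. A minimal Python sketch, assuming you pass it the same string you gave to -c page_separator; when none is given it falls back to a form feed, which as far as I know is Tesseract's default separator. The function names and the <stem>-<n>.txt naming are hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path


def split_pages(text: str, separator: str = "\f") -> list[str]:
    """Split Tesseract's plain-text output into one chunk per page."""
    pages = text.split(separator)
    # A separator after the final page leaves an empty trailing chunk; drop it.
    if len(pages) > 1 and not pages[-1].strip():
        pages.pop()
    return pages


def write_pages(text: str, stem: str, separator: str = "\f") -> list[Path]:
    """Write <stem>-1.txt, <stem>-2.txt, ... (hypothetical naming scheme)."""
    paths = []
    for page_no, page_text in enumerate(split_pages(text, separator), start=1):
        path = Path(f"{stem}-{page_no}.txt")
        path.write_text(page_text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python split_pages.py 109359.txt 109359 "[PAGE SEPARATOR]"
    sep = sys.argv[3] if len(sys.argv) > 3 else "\f"
    source_text = Path(sys.argv[1]).read_text(encoding="utf-8")
    for written in write_pages(source_text, sys.argv[2], sep):
        print(written)
```

Blank pages in the middle are kept as empty files, so the numbering still matches the TIFF page order.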

Related

Importing an external .txt file that produces an ASCII art image using stdin and stdout

[Image: my code]
I need to convert a text file that contains the number of slashes, spaces, and dots needed to draw the desired ASCII art image, using stdin and stdout.
I'm having trouble importing the text file.
[Image: snippet of the text file I was given to convert into the picture below]
[Image: the picture I need to create in the shell from the data above]
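The actual file format only appears in an image, so the following is a sketch under a hypothetical format: each line holds whitespace-separated count/symbol pairs such as 3 / 2 s 1 ., where s stands for a space. It shows the stdin-to-stdout part that the question is stuck on; the token table and function names are assumptions to adapt to the real file.

```python
from __future__ import annotations

import sys
from typing import TextIO

# Hypothetical tokens: the real file format is not shown in the question.
SYMBOLS = {"/": "/", "\\": "\\", ".": ".", "s": " "}  # "s" stands for a space


def decode_line(line: str) -> str:
    """Turn '3 / 2 s 1 .' into '///  .' (count/symbol pairs)."""
    tokens = line.split()
    if len(tokens) % 2:
        raise ValueError(f"expected count/symbol pairs, got {line!r}")
    return "".join(SYMBOLS[sym] * int(count)
                   for count, sym in zip(tokens[0::2], tokens[1::2]))


def decode_stream(src: TextIO, dst: TextIO) -> None:
    """Read encoded lines from src and write one ASCII art row per line to dst."""
    for line in src:
        dst.write(decode_line(line) + "\n")


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Usage: python decode_art.py - < art.txt
    decode_stream(sys.stdin, sys.stdout)
```

Reading from sys.stdin line by line means the file never has to be "imported"; the shell redirection feeds it in.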

How to output multiple pages in the gnuplot cairolatex terminal?

As written in the title, I am trying to create a multipage PDF with gnuplot's cairolatex terminal.
I am using gnuplot 5.4 in Cygwin.
A single page works fine, i.e.
./gnuplot-script
pdflatex plot.tex
However, when I plot multiple pages in the gnuplot script, the output .tex file seems to contain errors.
For example, the gnuplot script
set terminal cairolatex standalone
set output "plot.tex"
plot x
plot x**2
produces a plot.tex that contains two \documentclass{minimal} lines, and pdflatex then complains with
! LaTeX Error: Can be used only in preamble.
...
l.181 \documentclass
{minimal}
I can work around this by putting each plot into a separate file, but it seems strange that simple multipage output is broken in this terminal.
Am I missing some special command to start a new page in the cairolatex terminal? I don't see anything about this in the documentation.
If you really need to create a TeX-based multipage PDF file directly from gnuplot, I suggest using the tikz terminal rather than cairolatex.
set terminal tikz standalone
set output "plot.tex"
plot x
plot x**2
unset output
!pdflatex plot
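If you would rather stay with cairolatex, the asker's own workaround (one file per plot) can be automated. A minimal Python sketch, assuming gnuplot and pdflatex are on PATH; the function names and the plotN.tex naming are hypothetical. Issuing a new set output before each plot closes the previous file, so each standalone .tex gets exactly one \documentclass and pdflatex produces one PDF per plot.

```python
from __future__ import annotations

import subprocess
from typing import Sequence


def one_file_per_plot_script(expressions: Sequence[str], stem: str = "plot") -> str:
    """Build a gnuplot script that sends each plot to its own standalone .tex file."""
    lines = ["set terminal cairolatex standalone"]
    for i, expr in enumerate(expressions, start=1):
        # A new output file per plot, so no file gets a second preamble.
        lines.append(f'set output "{stem}{i}.tex"')
        lines.append(f"plot {expr}")
    lines.append("unset output")  # close the last file before running pdflatex
    return "\n".join(lines) + "\n"


def build_all(expressions: Sequence[str], stem: str = "plot") -> None:
    """Run gnuplot on the generated script, then pdflatex on each .tex file."""
    script = one_file_per_plot_script(expressions, stem)
    subprocess.run(["gnuplot"], input=script, text=True, check=True)
    for i in range(1, len(expressions) + 1):
        subprocess.run(["pdflatex", f"{stem}{i}.tex"], check=True)
```

For example, build_all(["x", "x**2"]) would produce plot1.pdf and plot2.pdf; combining them into one document is a separate step.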

JTessBoxEditor is not identifying characters or making boxes

I am trying to train Tesseract by drawing character boxes on the images, but JTessBoxEditor is not recognising any characters. When I run the command
tesseract eng.arial.exp1.tiff eng.arial.exp1 batch.nochop makebox
it creates a box file for the same .tiff image, but I do not know how to edit that text file. Is there any reason JTessBoxEditor is not recognising any characters? Is there any alternative training software or method?
I expected each .tiff file to be boxed automatically. Since the .box files are being generated by Tesseract, is there any way to precisely edit a .box file so that it contains the correct characters?
Once you have the box file, you can switch to the Box Editor tab, open it, and start editing.
http://vietocr.sourceforge.net/training.html
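The .box file can also be corrected without the GUI, since it is plain text. In the classic format written by makebox, each line is symbol left bottom right top page, with pixel coordinates measured from the bottom-left corner of the image. A minimal Python sketch for fixing symbols by line index follows; the helper names are hypothetical, and it does not handle the WordStr-style line boxes used for newer LSTM training.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    def to_line(self) -> str:
        return f"{self.symbol} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def parse_box_file(text: str) -> list[Box]:
    """Parse 'symbol left bottom right top page' lines, skipping blank lines."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the five numbers are always the last fields.
        symbol, *coords = line.rsplit(None, 5)
        left, bottom, right, top, page = map(int, coords)
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def set_symbols(boxes: list[Box], corrections: dict[int, str]) -> list[Box]:
    """Return a copy with the symbol at each 0-based line index replaced."""
    return [replace(b, symbol=corrections.get(i, b.symbol)) for i, b in enumerate(boxes)]


def dump_box_file(boxes: list[Box]) -> str:
    return "".join(b.to_line() + "\n" for b in boxes)
```

In practice you would read eng.arial.exp1.box as UTF-8, apply set_symbols with the corrections you found by looking at the image, and write the result back.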

PostScript with an embedded logo EPS file for printing a report

I am trying to write a printed report in which PostScript provides the form design and the body of the report is dynamic data received from a database.
I need to include at least two EPS files:
1. Our customer logo, which has been converted to EPS.
2. A PostScript form design: basically a square box that fills the page,
with a tail description at the bottom, which can be several lines.
It would also be nice to have a third EPS file, almost identical to the second one, to mark the end of the report with some extra details at the tail end (bottom of the page).
The basic flow of my program:
1. Print the header (the logo EPS).
2. Print a data line from the database source.
3. Check for end of page:
   No: loop back and print the next data line.
   Yes: print the second EPS file, start the next page, then loop back to print the header (first EPS file) and continue with the report until the end of the report.
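The page loop described above can be sketched in Python as a generator of PostScript text. This is only an outline under assumptions: the EPS file names, lines_per_page, the font, and the coordinates are hypothetical, and the EPS inclusion points are left as comments rather than real inclusion code.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical file names for the three EPS pieces described above.
HEADER_EPS = "logo.eps"
FORM_EPS = "form.eps"
END_EPS = "form_end.eps"


def paginate(rows: Iterable[str], lines_per_page: int) -> Iterator[list[str]]:
    """Group data lines into pages (the 'check for end of page' step)."""
    page: list[str] = []
    for row in rows:
        page.append(row)
        if len(page) == lines_per_page:
            yield page
            page = []
    if page:
        yield page


def ps_string(text: str) -> str:
    """Quote text as a PostScript string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def render_report(rows: Iterable[str], lines_per_page: int,
                  top: int = 700, leading: int = 14) -> str:
    pages = list(paginate(rows, lines_per_page)) or [[]]
    out = ["%!PS"]
    for i, page in enumerate(pages):
        # The last page gets the end-of-report tail instead of the normal form.
        footer = END_EPS if i == len(pages) - 1 else FORM_EPS
        out.append("/Helvetica findfont 10 scalefont setfont")
        out.append(f"% header: include {HEADER_EPS} here")
        y = top
        for row in page:
            out.append(f"72 {y} moveto {ps_string(row)} show")
            y -= leading
        out.append(f"% footer: include {footer} here")
        out.append("showpage")
    return "\n".join(out) + "\n"
```

Knowing which page is last requires buffering the rows (or at least one page ahead), which is why the sketch materializes the page list before rendering.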
My issue is:
if I only use the logo EPS, every page prints the logo header and it works well.
But if I include the second EPS file,
the first page prints the logo and the box from the second EPS file correctly,
while subsequent pages print only the box from the second EPS file and no logo.
This appears to happen whenever I use any logo image that has been converted to EPS: if I replace my second EPS with just another logo, only one logo is printed. But if I write two PostScript files, each with a box and a different tail, the report prints perfectly.
Using only the logo EPS also works.
Any suggestions, please?
(Sorry, I was not able to include the PostScript in question; I kept getting an error when I tried to paste in my code.)
But any simple PostScript that draws a box and some descriptive text,
combined with a logo that has been converted to an EPS file, will cause the issue.
Hi KenS and everyone else who was following my question.
I have changed the way I create the two EPS files.
I generated the logo EPS using GIMP as before,
but now I use the code from
https://www.math.ubc.ca/~cass/graphics/import/sample/combined.html
which allows combining more than one EPS.
After adding the definitions from the link above,
each EPS needs to be wrapped like this:
BeginEPSE
%%BeginDocument: (name of the following EPS)
% ... your EPS code here (in my case, the form box and tail) ...
%%EndDocument
EndEPSE
BeginEPSE
%%BeginDocument:
% ... all the code for the logo ...
%%EndDocument
EndEPSE
This has solved my issue, although I am sure my original method should still work... maybe with time?
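The definitions from the linked page are not reproduced in the post, so here is a hedged sketch of the same idea in Python. It assembles a multi-page PostScript file using procedures modelled on the BeginEPSF/EndEPSF pair from Adobe's EPSF specification; whether the linked page's BeginEPSE/EndEPSE are identical is an assumption. Each embedded EPS runs inside save/restore with showpage disabled, and anything it leaves on the operand or dictionary stack is cleaned up, which is the usual way to stop one included EPS from disturbing the next. The helper names, page positions, and EPS bodies are hypothetical.

```python
from __future__ import annotations

from typing import Sequence, Tuple

# Procedures modelled on Adobe's BeginEPSF/EndEPSF: isolate each included EPS.
PROLOGUE = """\
/BeginEPSF {
  /b4_Inc_state save def             % save VM and graphics state
  /dict_count countdictstack def     % remember dictionary stack depth
  /op_count count 1 sub def          % remember operand stack depth
  userdict begin
  /showpage { } def                  % an EPS must not end our page
  0 setgray 0 setlinecap 1 setlinewidth 0 setlinejoin
  10 setmiterlimit [ ] 0 setdash newpath
} bind def
/EndEPSF {
  count op_count sub { pop } repeat            % drop anything the EPS left
  countdictstack dict_count sub { end } repeat % close dicts it opened
  b4_Inc_state restore
} bind def
"""

Placement = Tuple[str, str, float, float]  # (name, EPS text, x, y) in points


def wrap_eps(name: str, body: str, x: float, y: float) -> str:
    """Place one EPS at (x, y), isolated by BeginEPSF/EndEPSF.

    Assumes the EPS BoundingBox starts at 0 0; otherwise also translate by -llx -lly.
    """
    return (
        "BeginEPSF\n"
        f"{x} {y} translate\n"
        f"%%BeginDocument: {name}\n"
        f"{body.rstrip()}\n"
        "%%EndDocument\n"
        "EndEPSF\n"
    )


def build_document(pages: Sequence[Sequence[Placement]]) -> str:
    """Emit the prologue once, then each page's EPS pieces followed by showpage."""
    parts = ["%!PS-Adobe-3.0\n", PROLOGUE]
    for placements in pages:
        for name, body, x, y in placements:
            parts.append(wrap_eps(name, body, x, y))
        parts.append("showpage\n")
    return "".join(parts)
```

Because every page repeats the same wrapped logo and form, a logo EPS that redefines something or leaves the stack dirty on page one cannot affect page two.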

Multi-page PDF to a single PNG using Ghostscript

I am trying to convert a multi-page PDF to PNG using the following command:
gs -q -dNOPAUSE -dBATCH -sDEVICE=png256 -r600x600 -sOutputFile=out.png in.pdf
However, it only converts the first page. I can put %d in the file name (out%d.png) to get one file per page, but I want a single PNG output. Is that possible? I am aware that ImageMagick's convert utility can do this (using gs as a delegate), but I want to do it directly with gs.
I wasn't aware it was possible to have a PNG file that contains multiple different images. As far as I am aware, it is not possible to produce such a file with Ghostscript.
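Since Ghostscript writes one image per page, a workaround (an assumption here, not something gs does on its own) is to render every page with a numbered pattern and join the images afterwards with another tool. The Python sketch below only builds and runs the per-page gs command from the question; the function names are hypothetical and gs must be on PATH.

```python
from __future__ import annotations

import shlex
import subprocess


def gs_per_page_png_cmd(pdf: str, out_pattern: str = "out%03d.png",
                        dpi: int = 600, device: str = "png256") -> list[str]:
    """Build a Ghostscript command that writes one PNG per PDF page.

    Ghostscript replaces the printf-style %03d with the page number, so the
    zero padding keeps the files in page order when sorted by name.
    """
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", f"-sDEVICE={device}",
            f"-r{dpi}x{dpi}", f"-sOutputFile={out_pattern}", pdf]


def render_pages(pdf: str) -> None:
    """Print the command for reference, then run it (requires gs on PATH)."""
    cmd = gs_per_page_png_cmd(pdf)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

The resulting out001.png, out002.png, ... could then be stacked into one image outside Ghostscript, for example with ImageMagick's -append option (convert out*.png -append single.png).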
