Empty Page!! Error when trying to generate a box file with tesseract

I'm trying to train tesseract on a custom font.
When I try to generate a training file with: tesseract lang.font-name.exp0.tif lang.font-name.exp0.box nobatch box.train, I get the error Empty page!!
I've been following this guide. I've created my tif file, loaded it into jTessBoxEditor and manually added the boxes as none were auto populated.
I'm under the impression that all I need is the tif file and the box file: those get fed into the box.train command, and out comes a .tr file.
What does Empty Page!! mean and how do I fix it?

Related

New PDF file generated with pdfjs is missing text

I used pdfjs to split some PDF files and then added a watermark (the watermark string contains some Chinese characters). Finally, I generated a new PDF file for viewing in the browser. Strangely, some of the text appears incomplete. What's happening?
Library versions:
pdfjs: 2.16.105
pdf-lib: 1.17.1
vue: 2.6.10
Somebody said: "The CMap is compressed by Webpack". I put it in the static resource directory, but the problem still exists. What do I need to do to fix it? Thanks.

Trying to make a list of all the images for preprocessing but getting an error

I am trying to build a list of all the images on my computer in a Jupyter notebook. I created a variable named class_name and set it to covid, then defined source_dir and destination_dir; running those cells shows no error. But when I try to list the files in source_dir, I get an error saying the system cannot find the path. Screenshots of my notebook and of the source_dir path are attached. Can anyone explain why I am getting this error?

Why can't Jupyter download as PDF a markdown cell that uses LaTeX \mathscr?

I just created a markdown cell in Jupyter containing some equations, a few of which use \mathscr to get script-style math fonts. When I run the cell containing the equations everything is fine; however, when I choose the option Download as PDF via LaTeX, I get the error below:
! Undefined control sequence.
l.300 [\mathscr
{L}({\bf{y}}|\beta, \sigma^2, {\bf{X}}) = (2\pi\sigma^2)^{-...
?
! Emergency stop.
l.300 [\mathscr
{L}({\bf{y}}|\beta, \sigma^2, {\bf{X}}) = (2\pi\sigma^2)^{-...
If I remove the \mathscr part, everything exports with no issues (except for some conversion problems with special characters); however, I would like to know how to solve this. From what I've read, it looks like the nbconvert configuration file can be modified to fix it, but I couldn't find that file or the exact way to modify it.
Thanks for your help
I think the problem is the absence of a \usepackage{mathrsfs} directive in the intermediate .tex file.
There are several ways to work around it.
If you run into this problem only occasionally, you could do the following:
1. download the .tex file instead of the PDF;
2. manually insert \usepackage{mathrsfs} into it, for example before the first \usepackage;
3. run something like xelatex file.tex to finally convert it to PDF.
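If you end up repeating the manual steps above, they can be scripted. The following is only a minimal sketch, not part of nbconvert: the helper names add_mathrsfs and patch_tex_file are made up for illustration, and it assumes the exported .tex file already contains at least one \usepackage line.

```python
from pathlib import Path

# The package line the exported preamble is missing.
PACKAGE_LINE = r"\usepackage{mathrsfs}"


def add_mathrsfs(tex_source: str) -> str:
    """Insert the mathrsfs package line before the first usepackage line.

    Hypothetical helper: returns the source unchanged if the line is
    already present.
    """
    if PACKAGE_LINE in tex_source:
        return tex_source
    lines = tex_source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.lstrip().startswith(r"\usepackage"):
            lines.insert(index, PACKAGE_LINE + "\n")
            return "".join(lines)
    raise ValueError("no usepackage line found in the LaTeX source")


def patch_tex_file(path: str) -> None:
    """Rewrite the .tex file at `path` in place with the package added."""
    tex_path = Path(path)
    patched = add_mathrsfs(tex_path.read_text(encoding="utf-8"))
    tex_path.write_text(patched, encoding="utf-8")
```

For example, after exporting the notebook with jupyter nbconvert --to latex, you would call patch_tex_file("file.tex") and then run xelatex file.tex as in the last step.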
If you do this often, you could instead edit the appropriate Jinja template.
First, find where nbconvert is installed, for example with pip: pip show nbconvert. Suppose the path is /home/i/.local/lib/python3.5/site-packages
Then the template would be at /home/i/.local/lib/python3.5/site-packages/nbconvert/templates/latex/base.tplx.
Again, just add \usepackage{mathrsfs} right after ((* block packages *)).
Voila -- the problem should be gone.
Finally, there is a third option: you can create your own template from scratch and use it with nbconvert. I don't think that is a very convenient way to solve your problem, but you can read more in the documentation: http://nbconvert.readthedocs.io/en/latest/customizing.html

How to change the default image viewer in SimpleITK

I am using SimpleITK with an IPython notebook for image processing, and as we know, ImageJ is the default image viewer in SimpleITK. But the images I am using are .mha files, and .mha is not supported by ImageJ, so I have to use Fiji or ImageJ2.
I tried to make Fiji the default image viewer by following the instructions in the SimpleITK FAQ, but it did not work and showed the following message:
I want to know what mistake I am making.
How can I make Fiji or ImageJ2 the default image viewer for all image types when I work with SimpleITK and an IPython notebook?
Thanks.
It looks to me like you still have ImageJ in your SITK_SHOW_COMMAND variable, and it is not found in your Fiji folder.
Read the instructions again; I assume you did not follow them closely enough. Your variable should contain the new image viewer, not an ImageJ in a different folder.
Try changing the file extension with the environment variable SITK_SHOW_EXTENSION, and specify the command for the visualisation software with SITK_SHOW_COMMAND.
In my .bash_profile I have:
export SITK_SHOW_COMMAND='itksnap'
Other instructions can be obtained with ?sitk.Show().
Try changing the "%F" to "%f". It is case sensitive. Or actually, you can just leave it off. If there is no "%f" it will just put the file name at the end of the command line.
Also, you can use SimpleITK/ImageJ to view MHA files. SimpleITK actually writes out a NIfTI file by default when Show is called, regardless of the input image.
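Since the question is about an IPython notebook, the two environment variables mentioned above can also be set from Python instead of .bash_profile. This is just a sketch under a few assumptions: configure_sitk_viewer is a made-up helper name, 'itksnap' is the viewer from the answer above (substitute the path to your Fiji executable), '.mha' is only an example extension, and I am assuming your SimpleITK version reads the variables when Show is called, so set them before calling it.

```python
import os
from typing import Optional


def configure_sitk_viewer(command: str, extension: Optional[str] = None) -> None:
    """Set the environment variables SimpleITK's Show() consults.

    Hypothetical helper: `command` goes into SITK_SHOW_COMMAND and
    `extension` (if given) into SITK_SHOW_EXTENSION.
    """
    os.environ["SITK_SHOW_COMMAND"] = command
    if extension is not None:
        os.environ["SITK_SHOW_EXTENSION"] = extension


# 'itksnap' is taken from the answer above; the '.mha' extension is an
# assumed example value, not something the answers prescribe.
configure_sitk_viewer("itksnap", extension=".mha")

# Afterwards, in the same session (requires SimpleITK to be installed):
# import SimpleITK as sitk
# sitk.Show(sitk.ReadImage("image.mha"))
```

Variables set this way only affect the current Python process, so this is handy for trying out a viewer before making the change permanent in your shell profile.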

Combine multi-page PDFs into one PDF with ImageMagick

I am trying to use ImageMagick (6.8.0) to combine several multi-page PDFs into a single PDF. This command:
$ convert multi-page-1.pdf multi-page-2.pdf merged.pdf
Returns merged.pdf, which contains the first page of multi-page-1.pdf and the first page of multi-page-2.pdf.
This command:
$ convert multi-page-1.pdf[2] multi-page-2.pdf[2] merged.pdf
Returns merged.pdf, which contains the third page of multi-page-1.pdf and the third page of multi-page-2.pdf.
I would like merged.pdf to contain all of the pages of each multi-page PDF. So far I have not found a way of telling the convert command to use a range of pages, although I have tried adding [0-1] and [0,1] at the end of the filenames.
Interestingly, this ghostscript command (which I found via StackOverflow but cannot re-find) does work as I would like it to:
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf multi-page-1.pdf multi-page-2.pdf
The problem is that the ImageMagick 'convert' command takes URLs as input and ghostscript does not, and I need my program to take URL input rather than file paths.
Is it possible to get the result of the above ghostscript command using ImageMagick?
Why don't you use pdfunite?
Example:
$ pdfunite 1.pdf 2.pdf 3.pdf merged.pdf
I asked this question on an internal company forum, and the conclusion was that there is no way to do the type of document merging we would like to do with ImageMagick without first downloading the files to the local filesystem.
For those of you using Heroku, we are taking advantage of the Heroku 'tmp' directory in order to save the file "locally" on staging and production: https://devcenter.heroku.com/articles/read-only-filesystem
Once we save the file in 'tmp', we will iterate through each page of the pdf and save them all separately. We will find the number of PDF pages using the 'pdf-reader' gem.
EDIT:
Here is the custom paperclip processor I wrote to deal with this (all files are pulled down to the tmp directory beforehand):
https://gist.github.com/jessieay/5832466
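For readers not on Ruby, the general "download first, then merge locally" idea can be sketched in Python. To be clear, this is not the paperclip processor from the gist: it simply downloads each URL to a temporary directory and reuses the ghostscript command from the question. The function names are made up for illustration, and it assumes gs is on the PATH and the URLs are directly downloadable.

```python
import subprocess
import tempfile
import urllib.request
from pathlib import Path
from typing import List


def build_gs_merge_command(inputs: List[str], output: str) -> List[str]:
    """Build the ghostscript merge command from the question as an argv list."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}", *inputs,
    ]


def merge_pdfs_from_urls(urls: List[str], output: str) -> None:
    """Download each PDF to a temporary directory, then merge all pages.

    Hypothetical helper: raises CalledProcessError if gs fails.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_paths = []
        for index, url in enumerate(urls):
            target = Path(tmp_dir) / f"input-{index}.pdf"
            urllib.request.urlretrieve(url, str(target))
            local_paths.append(str(target))
        # The temporary files are deleted when the with-block exits,
        # so the merge has to run inside it.
        subprocess.run(build_gs_merge_command(local_paths, output), check=True)
```

Calling merge_pdfs_from_urls with two URLs and "merged.pdf" should then produce the same result as the ghostscript command in the question, with every page of each input included.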

Resources