I am trying to extract the text from an image using Tesseract-OCR and OpenCV in Python. An example image is attached below:
It cannot capture '[' and ']' properly. The extraction output for this image (testScreenshot) is:
Elektronik Mühendisliği Bölümü
Ozturkfat)osmaniye.edu.tr
0328 8271000
The expected result is [at] instead of fat). If I change the language from Turkish to English, fat] is captured. Don't you think this is weird? How can I capture this properly as [at] with the Turkish setting?
Thanks in advance
from PIL import Image
import pytesseract

# testScreenshot (the image path) and tessdata_dir_config are defined elsewhere.
plainText = pytesseract.image_to_string(Image.open(testScreenshot), lang='tur', config=tessdata_dir_config)
print(plainText)
Edit: If I give it only '[' and ']', it does not capture what is inside the brackets either. The example input image is:
The output:
rolfat)
rolfat)
As you can see, the right half of the image ([at]) is not captured because I removed the beginning of the text (rol). Somehow, it is sensitive to the characters [ and ]. They might be sharper in the image than the other characters. Could this be the reason?
How can I use the tesseract to extract the mathematical equation?
While reading the image given below:
after using:
import cv2
import pytesseract
img = cv2.imread(IN_PATH+'sample1.png')
pytesseract.image_to_string(img)
I get the result as:
'The value of 7/8144 is\n- (a) 20.2 (b) 20.16\n(c) 20.12 (d) 20.4'
With the older versions, I could have used
config='-l eng + equ'
pytesseract.image_to_string(img,config=config)
but equ is no longer supported in Tesseract 4.0+.
I have the equ.traineddata file too, but I do not know how it would work, and when I tried to paste it into /usr/share/tesseract-ocr/4.00/tessdata/ I got an error saying it could not be copied.
Please help: how can I extract text that contains simple mathematical symbols?
I was looking at some Lua source files, trying to learn from them, but it seems they are encoded and obfuscated.
I decoded it using base64 decode, but it is still unreadable.
Is there any way to deobfuscate it?
LuaR“
æÆì~>o¢by„A#€ÁÀAA†AÅÂAFB„K¥Jƒƒ„JÃB…¥CJƒ†¥ƒJƒƒ†ŒCÀC€‹ÀÝ€EÀ À…ŠÃ
âƒcþåÃ%eD‹Á„…AÅEÁFA†ÆÁGA‡ŠÄÅ Š„ÅŠF
ŠDÆ
Š„FŠÄÆŠGŠDÇŠ„G
ŠÄÇ
ŠH‹Á‡ˆAÈHÁIA‰ÉÁ JAŠ
ÁJ‹AËKÁ L AŒ Ì Á
M
A
Í
Á
This is a precompiled Lua 5.2 script.
You can see its contents with luac -l -p foo.
Make sure you use luac from Lua 5.2. If in doubt, try luac -v.
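If you want to check a file before reaching for luac, here is a minimal Python sketch (not part of any Lua tooling; the function name is my own). It relies on the fact that precompiled Lua chunks start with the byte ESC (0x1B) followed by "Lua" and then a version byte, where 0x52 (the "R" visible in the dump above) means Lua 5.2:

```python
LUA_SIGNATURE = b"\x1bLua"  # ESC followed by "Lua" starts every precompiled chunk


def lua_bytecode_version(data: bytes):
    """Return the Lua version as 'major.minor' if data looks like bytecode, else None.

    Hypothetical helper for illustration: it only inspects the first five bytes.
    """
    if len(data) < 5 or not data.startswith(LUA_SIGNATURE):
        return None
    version = data[4]  # high nibble is the major version, low nibble the minor
    return f"{version >> 4}.{version & 0x0F}"


# The dump in the question begins with "LuaR"; 'R' is byte 0x52.
print(lua_bytecode_version(b"\x1bLuaR\x00\x01\x04"))
```

If this returns None for your file, the file is probably plain text or really is encoded in some other way, and luac -l will not help.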
Sure: luadec
Just curious, why did you try base64? The chunk you provided is simple Lua code compiled to Lua VM bytecode. It is not even obfuscated.
This is compiled Lua source. You can use this tool to decompile. It isn't actually obfuscated.
I've been using Doxygen successfully to generate PDF documentation for a sizable Fortran 90 project since v1.6. After a recent upgrade to Doxygen 1.8, pdflatex is choking with an error I can't understand. From refman.log:
.
.
.
<use classfate__source_a022bf629bdc1d3059ebd5fb86d13b4f4_icgraph.pdf>
Package pdftex.def Info: classfate__source_a022bf629bdc1d3059ebd5fb86d13b4f4_ic
graph.pdf used on input line 607.
(pdftex.def) Requested size: 350.0pt x 65.42921pt.
)
(./classm__aerosol.tex
! Undefined control sequence.
<recently read> \LT#LL#FM#cr
l.25 ...1833ffa6f2fae54ededb}{ia\-\_\-nsize}), \\*
? ?
Type <return> to proceed, S to scroll future error messages,
R to run without stopping, Q to run quietly,
I to insert something, E to edit your file,
1 or ... or 9 to ignore the next 1 to 9 tokens of input,
H for help, X to quit.
Looking at the first 25 lines of classm__aerosol.tex, nothing obviously matches the error message:
\hypertarget{classm__aerosol}{\section{m\-\_\-aerosol Module Reference}
\label{classm__aerosol}\index{m\-\_\-aerosol#{m\-\_\-aerosol}}
}
Contains general aerosol-\/related constants and routines.
\subsection*{Public Member Functions}
\begin{DoxyCompactItemize}
\item
subroutine \hyperlink{classm__aerosol_aa06c1f39c6bd34f22be92d21535f0320}{aerdis} (I\-A\-E\-R\-O, M\-A\-E\-R\-O, V\-O\-L, A\-R\-E\-A, M\-U, T\-G\-A\-S, R\-H\-O, A\-G\-A\-M\-M\-A, X\-L\-A\-E\-R, D\-M\-E\-A\-N, N\-A\-E\-R, X\-N\-D\-A\-E\-R, L\-S\-D\-A\-E\-R)
\begin{DoxyCompactList}\small\item\em Return aerosol mass given a volume, based on aerosol size distribution function. \end{DoxyCompactList}\item
real(kind=wp) function \hyperlink{classm__aerosol_a2dff4ff413057e8788fba7270a30c093}{lamsed} (V\-O\-L, H, M\-U\-G, R\-H\-O\-A\-E\-R, A\-G\-A\-M\-M\-A, A\-C\-H\-I, A\-F\-E\-O, K\-O, M\-A\-E\-R, F\-M\-A\-E\-R, F\-A\-E\-R\-S\-S, F\-S\-E\-D\-D\-K)
\begin{DoxyCompactList}\small\item\em Calculate aerosol removal constant and interpolation factor between steady-\/state and decaying aerosol correlations. \end{DoxyCompactList}\item
pure real(kind=wp) function \hyperlink{classm__aerosol_a6d0a04004f49c404c67e0aa69dd39ee1}{fdbend} (V\-E\-L, H\-S\-E\-D, T\-G, R\-H\-O\-G, M\-U\-G, R\-H\-O\-P\-A\-R, C\-A\-E\-R\-O, X\-D\-B\-E\-N\-D, N90\-J)
\begin{DoxyCompactList}\small\item\em Find total impaction efficiency for aerosol deposition considering 90-\/degree bends in a flow path. \end{DoxyCompactList}\end{DoxyCompactItemize}
\subsection*{Public Attributes}
\begin{DoxyCompactItemize}
\item
integer, parameter \hyperlink{classm__aerosol_a8f604b7ffe3c1833ffa6f2fae54ededb}{ia\-\_\-nsize} = 30
\item
integer, parameter \hyperlink{classm__aerosol_ae71813ecf0c7768af9d6292efb14774f}{ia\-\_\-nmass} = 10
\item
real(kind=wp), dimension(\hyperlink{classm__aerosol_a8f604b7ffe3c1833ffa6f2fae54ededb}{ia\-\_\-nsize}), \\*
Nothing obviously matches the recently read chunk "\LT#LL#FM#cr" and I don't know enough low-level TeX to translate that into something that might actually be in the source text.
Suspecting this might have been fixed in a later version of Doxygen than the one shipping with Linux Mint (v1.8.1.2), I built & installed v1.8.3.1 from source, updated my doxyfile, blew away the old documentation and regenerated it. I get the same baffling error.
There's nothing obvious in refman.log that would indicate missing or broken LaTeX packages and I'm completely at a loss as to what's causing this.
As this still gets a hit on Google when you search:
doxygen missing $ inserted
I would like to add something.
Do not use a PROJECT_NAME containing underscores (_)!
A brief look at the current Doxygen documentation (I am using 1.8.4) shows that it does not make this explicit.
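To catch this before running pdflatex, a rough Python sketch along these lines might help. It is not a Doxygen tool: the function names are my own, and the parser only understands simple one-line KEY = value and KEY += value assignments (no backslash line continuations):

```python
import re


def doxyfile_settings(text):
    """Collect simple KEY = value / KEY += value assignments from Doxyfile text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"^([A-Z_0-9]+)\s*(\+?=)\s*(.*)$", line)
        if not match:
            continue
        key, op, value = match.groups()
        value = value.strip().strip('"')
        if op == "+=" and key in settings:
            settings[key] = settings[key] + " " + value
        else:
            settings[key] = value
    return settings


def project_name_has_underscore(text):
    """True if PROJECT_NAME is set and contains an underscore."""
    return "_" in doxyfile_settings(text).get("PROJECT_NAME", "")


print(project_name_has_underscore('PROJECT_NAME = "my_project"\n'))
```

Read your Doxyfile into a string and pass it in; if the check returns True, try renaming the project before regenerating the documentation.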
This will be difficult to solve unless you provide a bit more information, possibly using \errorcontextlines=9999 as suggested in the comments on the question.
As a first thought, the control sequence that can't be found (i.e. \LT#LL#FM#cr) is one defined by the longtable package (documentation, p. 15), so adding:
\usepackage{longtable}
to the preamble of the document might help.
If so, according to the doxygen documentation here, adding the following to your configuration file should do the trick:
EXTRA_PACKAGES=longtable
I'm trying to do some parsing of a bunch of haskell source files using haskell-src-exts but ran into trouble in the first file I tested on. Here is the first bit:
{-# LANGUAGE CPP, MultiParamTypeClasses, ScopedTypeVariables #-}
{-# OPTIONS_GHC -Wall -fno-warn-orphans #-}
----------------------------------------------------------------------
-- |
-- Module : FRP.Reactive.Fun
-- Copyright : (c) Conal Elliott 2007
-- License : GNU AGPLv3 (see COPYING)
--
-- Maintainer : conal#conal.net
-- Stability : experimental
--
-- Functions, with constant functions optimized, with instances for many
-- standard classes.
----------------------------------------------------------------------
module FRP.Reactive.Fun (Fun, fun, apply, batch) where
import Prelude hiding
( zip, zipWith
#if __GLASGOW_HASKELL__ >= 609
, (.), id
#endif
)
#if __GLASGOW_HASKELL__ >= 609
import Control.Category
#endif
And the code I'm using to test:
*Search> f <- parseFile "/tmp/file.hs"
*Search> f
ParseFailed (SrcLoc {srcFilename = "/tmp/file.hs", srcLine = 19, srcColumn = 1}) "Parse error: ;"
The issue appears to be the CPP conditional sections, but it appears that CPP is a supported extension. I'm using haskell-src-exts-1.11.1 with GHC 7.0.4.
I'm just trying to do some quick and dirty analysis, so I don't mind stripping out those sections before parsing if I have to, but better solutions would be welcomed.
Possibly use cpphs to "evaluate" the pre-processor statements first?
Also, that is the known extension list copied (and extended) from Cabal; haskell-src-exts doesn't support CPP.
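If stripping the sections really is good enough for a quick and dirty analysis, a small pre-pass like the following Python sketch could be run over each file before handing it to the parser. The function name is my own, and it makes a crude assumption: every #if/#ifdef/#ifndef block is dropped entirely (as if the condition were false) and any other line starting with a CPP directive is blanked. Dropped lines are replaced with empty lines so that reported line numbers still match the original file:

```python
CONDITIONAL_OPEN = ("#if",)  # also matches #ifdef and #ifndef
OTHER_DIRECTIVES = ("#define", "#undef", "#include", "#else", "#elif")


def strip_cpp_conditionals(source: str) -> str:
    """Blank out CPP conditional blocks and directive lines, keeping line count."""
    kept = []
    depth = 0  # nesting level of #if blocks we are currently inside
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(CONDITIONAL_OPEN):
            depth += 1
            kept.append("")
        elif stripped.startswith("#endif"):
            depth = max(depth - 1, 0)
            kept.append("")
        elif depth > 0 or stripped.startswith(OTHER_DIRECTIVES):
            kept.append("")
        else:
            kept.append(line)
    return "".join(line + "\n" for line in kept)


sample = (
    "import Prelude hiding\n"
    "  ( zip, zipWith\n"
    "#if __GLASGOW_HASKELL__ >= 609\n"
    "  , (.), id\n"
    "#endif\n"
    "  )\n"
)
print(strip_cpp_conditionals(sample))
```

For the file in the question this leaves import Prelude hiding ( zip, zipWith ) and drops the conditional import of Control.Category, which may or may not matter for your analysis. Running cpphs first, as suggested above, is the more faithful option.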