I am trying to create a program that scrapes images from the web in Lua. A minor problem is that images sometimes have no extension or incorrect extensions. See this animated "jpeg" for example: http://i.imgur.com/Imvmy6C.jpg
So I created a function to detect the filetype of an image. It's pretty simple: it just compares the first few characters of the returned image. PNG files begin with PNG, GIFs with GIF, and JPGs with the strange symbol "╪".
It's a bit hacky since images aren't supposed to be represented as strings, but it worked fine. Except when I actually ran the code.
When I enter the code into the command line it works fine. But when I run a file with the code in it, it doesn't work. Weirder, it only fails on jpegs. It still correctly recognizes PNGs and GIFs.
Here is the minimal code necessary to reproduce the bug:
http = require "socket.http"
function detectImageType(image)
local imageType = "unknown"
if string.sub(image, 2, 2) == "╪" then imageType = "jpg" end
return imageType
end
image = http.request("http://i.imgur.com/T4xRtBh.jpg")
print(detectImageType(image))
Copying and pasting this into the command line returns "jpg" correctly. Running this as a file returns "unknown".
I am using Lua 5.1.4 from the Lua for Windows package, through powershell, on Windows 8.1.
EDIT:
Found the problem: string.byte("╪") returns 216 on the command line and 226 when run as a file. I have no idea why; maybe Lua and PowerShell use different encodings?
This line solves the problem:
if string.byte(string.sub(image, 2, 2)) == 216 then imageType = "jpg" end
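The two numbers are consistent with one character being encoded in two different ways. Here is a small sketch in Python (used purely for illustration, not Lua), assuming the console uses code page 437 and the script file was saved as UTF-8. Neither assumption is confirmed above.

```python
# Assumption: the console uses code page 437 and the script file is UTF-8.
ch = "\u256a"  # the "╪" character

console_bytes = ch.encode("cp437")  # bytes a CP437 console would pass to Lua
file_bytes = ch.encode("utf-8")     # bytes stored in a UTF-8 source file

first_console_byte = console_bytes[0]
first_file_byte = file_bytes[0]

print(len(console_bytes), first_console_byte)
print(len(file_bytes), first_file_byte)
```

Under these assumptions the literal in the file is several bytes long, so a one-byte substring such as string.sub(image, 2, 2) can never equal it.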
I think it's because the file is saved in a different encoding from the one the console uses, so the ╪ character ends up as different bytes. It's more robust to compare against the byte value:
http = require "socket.http"
function detectImageType(image)
local imageType = "unknown"
if string.byte(image, 2) == 216 then imageType = "jpg" end
return imageType
end
image = http.request("http://i.imgur.com/T4xRtBh.jpg")
print(detectImageType(image))
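The same idea extends to PNG and GIF by comparing the leading bytes against each format's signature. Below is a minimal sketch in Python rather than Lua. The function name detect_image_type and the SIGNATURES table are made up for illustration; the signature bytes themselves are the standard ones for each format.

```python
# Leading bytes ("magic numbers") of the three formats mentioned above.
SIGNATURES = (
    (b"\xff\xd8\xff", "jpg"),        # JPEG: SOI marker followed by another marker
    (b"\x89PNG\r\n\x1a\n", "png"),   # PNG: fixed 8-byte signature
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def detect_image_type(data: bytes) -> str:
    """Return "jpg", "png", "gif", or "unknown" based on the leading bytes."""
    for signature, image_type in SIGNATURES:
        if data.startswith(signature):
            return image_type
    return "unknown"
```

Working on raw bytes avoids the encoding problem entirely, because no character literal is involved in the comparison.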
I'm using Lua in Scite on Windows, but hopefully this is a general Lua question.
Let's say I want to write a temporary string content to a temporary file in Lua - which I want to be eventually read by another program, - and I tried using io.tmpfile():
mytmpfile = assert( io.tmpfile() )
mytmpfile:write( MYTMPTEXT )
mytmpfile:seek("set", 0) -- back to start
print("mytmpfile" .. mytmpfile .. "<<<")
mytmpfile:close()
I like io.tmpfile() because it is noted in https://www.lua.org/pil/21.3.html :
The tmpfile function returns a handle for a temporary file, open in read/write mode. That file is automatically removed (deleted) when your program ends.
However, when I try to print mytmpfile, I get:
C:\Users\ME/sciteLuaFunctions.lua:956: attempt to concatenate a FILE* value (global 'mytmpfile')
>Lua: error occurred while processing command
I found the explanation for that in the thread "Re: path for io.tmpfile() ?":
how do I get the path used to generate the temp file created by io.tmpfile()
You can't. The whole point of tmpfile is to give you a file handle without
giving you the file name to avoid race conditions.
And indeed, on some OSes, the file has no name.
So, it will not be possible for me to use the filename of the tmpfile in a command line that should be run by the OS, as in:
f = io.popen("python myprog.py " .. mytmpfile)
So my questions are:
Would it be somehow possible to specify this tmpfile file handle as the input argument for the externally run program/script, say in io.popen - instead of using the (non-existing) tmpfile filename?
If above is not possible, what is the next best option (in terms of not having to maintain it, i.e. not having to remember to delete the file) for opening a temporary file in Lua?
You can get a temp filename with os.tmpname.
local n = os.tmpname()
local f = io.open(n, 'w+b')
f:write(....)
f:close()
os.remove(n)
If your purpose is sending some data to a python script, you can also use 'w' mode in popen.
--lua
local f = io.popen(prog, 'w')
f:write(....)
#python
import sys
data = sys.stdin.readline()
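On the Python side, readline() returns only the first line. If the Lua script writes several lines, reading until end of input may be closer to what is wanted. Here is a minimal sketch; read_payload is a hypothetical helper name, and it assumes the Lua side closes the pipe (f:close()) so that the reader sees end of input.

```python
import io
import sys


def read_payload(stream=None) -> str:
    """Read everything written to the stream, not just the first line."""
    if stream is None:
        stream = sys.stdin  # the pipe opened by io.popen(prog, 'w')
    return stream.read()


# Demonstration with an in-memory stream standing in for the pipe.
sample = read_payload(io.StringIO("line 1\nline 2\n"))
print(repr(sample))
```

In the receiving script, calling read_payload() with no argument would read from standard input.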
I am constructing a general-purpose function to read a text file, which may be ASCII, UTF-8 or UTF-16. (The encoding is known when the function is invoked.) The file name may contain UTF-8 characters, so the standard Lua io functions are not a solution. I have no control over the Lua implementation (5.3) or the binary modules available in the environment.
My current code is:
require "luacom"
local function readTextFile(sPath, bUnicode, iBits)
local fso = luacom.CreateObject("Scripting.FileSystemObject")
if not fso:FileExists(sPath) then return false, "" end --check the file exists
local so = luacom.CreateObject("ADODB.Stream")
--so.CharSet defaults to Unicode aka utf-16
--so.Type defaults to text
so.Mode = 1 --adModeRead
if not bUnicode then
so.CharSet = "ascii"
elseif iBits == 8 then
so.CharSet = "utf-8"
end
so:Open()
so:LoadFromFile(sPath)
local contents = so:ReadText()
so:Close()
return true, contents
end
--test Unicode(utf-16) files
local file = "D:\\OneDrive\\Desktop\\utf16.txt" --this exists
local booOK, factsetcontents = readTextFile(file, true, 16)
When executed I get the error: COM exception:(d:\my\lua\luacom-master\src\library\tluacom.cpp,382):Operation is not allowed in this context on line 19 [local stream = so:LoadFromFile(sPath)]
I've pored over the ADO documentation and am obviously missing something that is staring me in the face! Is what I'm trying to do impossible?
ETA: If I comment out the line so.Mode = 1, this works. Which is great, but I don't understand why, which means I may end up making the same mistake unwittingly, whatever that mistake is!
I don't know about ADODB Stream.Mode or why the function failed. But I think it's rather tricky to use an ADODB COM object on Windows to read ASCII/UTF-8/Unicode-encoded files.
You can instead:
use standard Lua io.open function in binary mode and use manual decoding of the bytes content
use a binary module to do all the work
use a specific Lua implementation for Windows that can read/write those kind of encoded files natively, like LuaRT
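The first option (read the raw bytes, then decode them by hand) can be sketched as follows. It is written in Python only to show the logic; it is not a drop-in replacement for the Lua function. The names decode_text and read_text_file are hypothetical, and treating BOM-less UTF-16 as little-endian is an assumption based on common Windows practice.

```python
import codecs


def decode_text(raw: bytes, is_unicode: bool, bits: int = 16) -> str:
    """Decode file bytes as ASCII, UTF-8, or UTF-16 (the three cases in the question)."""
    if not is_unicode:
        return raw.decode("ascii")
    if bits == 8:
        # "utf-8-sig" drops a leading byte order mark if one is present.
        return raw.decode("utf-8-sig")
    # With a BOM, "utf-16" takes the byte order from it and strips it.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    # Assumption: no BOM means little-endian.
    return raw.decode("utf-16-le")


def read_text_file(path: str, is_unicode: bool, bits: int = 16) -> str:
    """Read the file in binary mode and decode it according to the flags."""
    with open(path, "rb") as handle:
        return decode_text(handle.read(), is_unicode, bits)
```

The flags mirror the bUnicode and iBits parameters from the question, so the three branches map one-to-one onto the CharSet choices made there.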
from kivy.uix.image import Image
self.img = Image(source="image") # This works when image is a PNG image
self.img = Image(source="image.jpg") # This works when image.jpg is a JPG image
self.img = Image(source="image") # This doesn't work when image is a JPG image
I need to specify images without an extension for the app to be generic (working with more image types). Can I achieve it somehow?
Kivy uses "imghdr" to determine the image type and, as a fallback, uses the file extension.
That explains why the image loads fine when it has a file extension, even though "imghdr" can't find the file type in the file's content.
I tested a list of JPEG files, and "imghdr" detected the file type every time. The detection is done inside imghdr itself. Notably, "imghdr" does not consider the file extension.
$ python
>>> import os, imghdr
>>> for f in os.listdir('.'):
...     print('%s -- %s' % (f, imghdr.what(f)))
Maybe the JPEG file is missing the "JFIF" or "Exif" string that imghdr is looking for? You could use hexedit to see if one of those strings is present at Byte 6 of the image file.
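Instead of a hex editor, the same check can be scripted. This is a minimal sketch of the test described above (four bytes starting at offset 6). The helper names are hypothetical, and whether this matches the exact logic of a given imghdr version is an assumption.

```python
def jpeg_header_tag(header: bytes) -> str:
    """Return "JFIF" or "Exif" if found at byte offsets 6..9, else an empty string."""
    tag = header[6:10]
    if tag in (b"JFIF", b"Exif"):
        return tag.decode("ascii")
    return ""


def read_header_tag(path: str) -> str:
    """Read only the first 10 bytes of the file and inspect them."""
    with open(path, "rb") as handle:
        return jpeg_header_tag(handle.read(10))
```

An empty result for a file that otherwise opens as a JPEG would suggest that content-based detection is the part that fails, not the image itself.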
I have a Dragonfly processor which should take a given PDF and return a PNG of the first page of the document.
When I run this processor via the console, I get back the PNG as expected, however, when in the context of Rails, I'm getting it as a PDF.
My code is roughly similar to this:
def to_pdf_thumbnail(temp_object)
tempfile = new_tempfile('png')
args = "'#{temp_object.path}[0]' '#{tempfile.path}'"
full_command = "convert #{args}"
result = `#{full_command}`
tempfile
end
def new_tempfile(ext=nil)
tempfile = ext ? Tempfile.new(['dragonfly', ".#{ext}"]) : Tempfile.new('dragonfly')
tempfile.binmode
tempfile.close
tempfile
end
Now, tempfile is definitely creating a .png file, but the convert is generating a PDF (when run from within Rails 3).
Any ideas as to what the issue might be here? Is something getting confused about the content type?
I should add that both this and a standard conversion (asset.png.url) yield a PDF with the PDF content as a small block in the middle of the (A4) image.
An approach I’m using for this is to generate the thumbnail PNG on the fly via the thumb method from Dragonfly’s ImageMagick plugin:
<%= image_tag rails_model.file.thumb('100x100#', format: 'png', frame: 0).url %>
So long as Ghostscript is installed, ImageMagick/Dragonfly will honour the format / frame (i.e. page of the PDF) settings. If file is an image rather than a PDF, it will be converted to a PNG, and the frame number ignored (unless it’s a GIF).
Try this
def to_pdf_thumbnail(temp_object)
ret = ''
tempfile = new_tempfile('png')
system("convert", "#{temp_object.path}[0]", tempfile.path)
tempfile.open {|f| ret = f.read }
ret
end
The problem is you are likely handing convert ONE argument not two
Doesn't convert rely on the extension to determine the type? Are you sure the tempfiles have the proper extensions?
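The point about passing the input and output as separate arguments can be illustrated without a shell. This sketch is in Python rather than Ruby and only builds the argument list. build_convert_args and the example paths are made up for illustration.

```python
def build_convert_args(pdf_path: str, png_path: str) -> list:
    """Build an argument list in which input and output are separate arguments."""
    # "[0]" selects the first page of the PDF, as in the question's command.
    return ["convert", f"{pdf_path}[0]", png_path]


args = build_convert_args("/tmp/in put.pdf", "/tmp/out.png")
print(args)
# Passing this list to subprocess.run(args, check=True) would invoke convert
# without a shell, so a path containing spaces needs no extra quoting.
```

Because the output path stays a separate argument, its .png extension remains visible to convert.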
OpenCV says something like
Corrupt JPEG data: premature end of data segment
or
Corrupt JPEG data: bad Huffman code
or
Corrupt JPEG data: 22 extraneous bytes before marker 0xd9
when loading a corrupt jpeg image with imread().
Can I somehow catch that? Why would I get this information otherwise?
Do I have to check the binary file on my own?
OpenCV (version 2.4) does not override libjpeg's basic error handling, which makes these errors 'uncatchable'. Add the following method to modules/highgui/src/grfmt_jpeg.cpp, right below the definition of error_exit():
METHODDEF(void)
output_message( j_common_ptr cinfo )
{
char buffer[JMSG_LENGTH_MAX];
/* Create the message */
(*cinfo->err->format_message) (cinfo, buffer);
/* Default OpenCV error handling instead of print */
CV_Error(CV_StsError, buffer);
}
Now apply the method to the decoder error handler:
state->cinfo.err = jpeg_std_error(&state->jerr.pub);
state->jerr.pub.error_exit = error_exit;
state->jerr.pub.output_message = output_message; /* Add this line */
Apply the method to the encoder error handler as well:
cinfo.err = jpeg_std_error(&jerr.pub);
jerr.pub.error_exit = error_exit;
jerr.pub.output_message = output_message; /* Add this line */
Recompile and install OpenCV as usual. From now on you should be able to catch libjpeg errors like any other OpenCV error. Example:
>>> cv2.imread("/var/opencv/bad_image.jpg")
OpenCV Error: Unspecified error (Corrupt JPEG data: 1137 extraneous bytes before marker 0xc4) in output_message, file /var/opencv/opencv-2.4.9/modules/highgui/src/grfmt_jpeg.cpp, line 180
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
cv2.error: /var/opencv/opencv-2.4.9/modules/highgui/src/grfmt_jpeg.cpp:180: error: (-2) Corrupt JPEG data: 1137 extraneous bytes before marker 0xc4 in function output_message
(I've submitted a pull request for the above but it got rejected because it would cause issues with people reading images without exception catching.)
Hope this helps anyone still struggling with this issue. Good luck.
It could be easier to fix the error in the file instead of trying to repair the loading function of OpenCV. If you are using Linux, you can use ImageMagick to repair a set of images (it is usually installed by default):
$ mogrify -set comment 'Image rewritten with ImageMagick' *.jpg
This command changes a property of the file leaving the image data untouched. However, the image is loaded and resaved, eliminating the extra information that causes the corruption error.
If you need more information about ImageMagick you can visit their website: http://www.imagemagick.org/script/index.php
You cannot catch it if you use imread(). However there is imdecode() function that is called by imread(). Maybe it gives you more feedback. For this you would have to load the image into memory on your own and then call the decoder.
It boils down to: You have to dig through the OpenCV sources to solve your problem.
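If you do load the file into memory yourself, a coarse check is possible before calling the decoder: a JPEG stream starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9). This is a minimal sketch with a hypothetical function name. It only catches truncation, not problems such as bad Huffman codes, and ignoring trailing zero bytes is an assumption about how some writers pad files.

```python
def jpeg_marker_check(data: bytes) -> str:
    """Classify bytes as "ok", "not-jpeg", or "truncated" using only SOI/EOI markers."""
    if not data.startswith(b"\xff\xd8"):
        return "not-jpeg"
    # Assumption: trailing zero padding after EOI is harmless and can be ignored.
    if not data.rstrip(b"\x00").endswith(b"\xff\xd9"):
        return "truncated"
    return "ok"
```

A file that passes this check can still be corrupt inside, so it complements the decoder's own messages and does not replace them.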
I had to deal with this recently and found a solution over here:
http://artax.karlin.mff.cuni.cz/~isa_j1am/other/opencv/
I just needed to make 2 edits in $cv\modules\highgui\src\grfmt_jpeg.cpp.
--- opencv-1.0.0.orig/otherlibs/highgui/grfmt_jpeg.cpp 2006-10-16 13:02:49.000000000 +0200
+++ opencv-1.0.0/otherlibs/highgui/grfmt_jpeg.cpp 2007-08-11 09:10:28.000000000 +0200
@@ -181,7 +181,7 @@
m_height = cinfo->image_height;
m_iscolor = cinfo->num_components > 1;
- result = true;
+ result = (cinfo->err->num_warnings == 0);
}
}
@@ -405,8 +405,9 @@
icvCvt_CMYK2Gray_8u_C4C1R( buffer[0], 0, data, 0, cvSize(m_width,1) );
}
}
- result = true;
+
jpeg_finish_decompress( cinfo );
+ result = (cinfo->err->num_warnings == 0);
}
}
I am using the OpenCV Python package to read some images and also ran into this error message. The error cannot be caught in Python. But if you want to find which image is corrupted without recompiling OpenCV as @Robbert suggested, you can try the following method.
First, pinpoint the directory where the corrupt images reside, which is fairly easy. Then go to that directory and use the mogrify command-line tool provided by ImageMagick to change the image meta info, as suggested by @goe.
mogrify -set comment "errors fixed in meta info" -format png *.jpg
The above command will convert the original jpg image to png format and also clean the original image to remove errors in meta info. When you run mogrify command, it will also output some message about which image is corrupted in the directory so that you can accurately find the corrupted image.
After that, you can do whatever you want with the original corrupted jpg image.
For anyone who stumbles upon this post and reads this answer:
I had to get hold of a corrupted image file.
These websites can help you corrupt your file
Corrupt a file - The file corrupter you were looking for!
CORRUPT A FILE ONLINE
Corrupt my File
The first and third websites were not very useful.
Second website is interesting as I could set the amount of file that I need to corrupt.
OpenCV version I used here is 3.4.0
I used normal cv2.imread(fileLocation)
fileLocation Location of corrupted image file
OpenCV didn't show any error message for any of the corrupted files used here
First and Third website only gave one file and both had None stored in them, when I tried to print them
Second website did let me decide the amount of file that was needed to be corrupted
Corruption% Opencv message on printing the image
4% None
10% None
25% None
50% None Corrupt JPEG data: 3 extraneous bytes before marker 0x4f
75% None Corrupt JPEG data: 153 extraneous bytes before marker 0xb2
100% Corrupt JPEG data: 330 extraneous bytes before marker 0xc6 None
I guess the only check we have to make here would be
if image is not None:
    pass  # do your processing here
else:
    print("error: image could not be read")
You can redirect stderr to a file, then after imread, search for the string "Huffman" inside that file. After searching the file, empty it. It works for me and now I am able to discard corrupted images and just process good ones.
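The redirect-and-search approach can be sketched in Python as below. libjpeg writes its warnings at the C level, so the sketch redirects file descriptor 2 instead of replacing sys.stderr. The names capture_stderr_fd, emits_warning and fake_decoder are hypothetical, and fake_decoder merely stands in for a call such as cv2.imread(path), which is not imported here.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_stderr_fd():
    """Redirect file descriptor 2 into a temp file; captured["text"] is filled on exit."""
    captured = {"text": ""}
    sys.stderr.flush()
    saved_fd = os.dup(2)
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 2)
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode("utf-8", errors="replace")


def emits_warning(action, needle="Huffman"):
    """Run action() and report whether needle was written to stderr meanwhile."""
    with capture_stderr_fd() as captured:
        action()
    return needle in captured["text"]


def fake_decoder():
    # Stand-in for the real image load; writes the message a decoder might emit.
    os.write(2, b"Corrupt JPEG data: bad Huffman code\n")


found = emits_warning(fake_decoder)
```

Using a fresh temporary file per call removes the need to empty a shared log file after each image.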
If you load your image with imdecode, you can check errno :
std::vector<char> datas;
//Load your image into datas here
errno = 0;
cv::Mat mat = cv::imdecode(datas, -1);
if (errno != 0)
{
//Error
}
(tested on OpenCV 3.4.1)
I found that the issue is in libjpeg. If OpenCV uses it, it gets error
Corrupt JPEG data: 22 extraneous bytes before marker 0xd9
You can try my solution to solve it. It disables JPEG support during compilation. After that OpenCV cannot read or write JPEG files, but it otherwise works.
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -D BUILD_SHARED_LIBS=OFF -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_JPEG=OFF -D WITH_IPP=OFF ..
I found an easy solution without the need to recompile openCV.
You can use ImageMagick to detect the same errors; unlike OpenCV, it returns an error as expected. See the description here: https://stackoverflow.com/a/66283167/2887398