In a particular Rails application, I'm pulling binary data out of LDAP into a variable for processing. Is there a way to check if the variable contains binary data? I don't want to continue with processing of this variable if it's not binary. I would expect to use is_a?...
In fact, the binary data I'm pulling from LDAP is a photo. So maybe there's an even better way to ensure the variable contains binary JPEG data? The result of this check will determine whether to continue processing the JPEG data, or to render a default JPEG from disk instead.
There is actually a lot more to this question than you might think. Only since Ruby 1.9 has there been a concept of characters (in some encoding) versus raw bytes. So in Ruby 1.9 you might be able to get away with requesting the encoding. Since you are getting stuff from LDAP the encoding for the strings coming in should be well known, most likely ISO-8859-1 or UTF-8.
In which case you can get the encoding and act on that:
some_variable.encoding # => when ASCII-8BIT, treat as a photo
Since you really want to verify that the binary data is a photo, it would make sense to run it through an image library. RMagick comes to mind. The documentation will show you how to verify that any binary data is actually JPEG encoded. You will then also be able to store other properties such as width and height.
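As a cheaper first check before handing the data to an image library, you can look at the first bytes: JPEG data begins with the start-of-image marker FF D8, followed by another FF that opens the next marker. The sketch below is only a sanity check, not a full validation, and `looks_like_jpeg?` is a hypothetical helper name.

```ruby
# The JPEG start-of-image marker (FF D8) plus the FF of the next marker.
JPEG_SOI = "\xFF\xD8\xFF".b

# Hypothetical helper: true when the value is a String that starts like a JPEG.
# A match does not prove the rest of the data is a well-formed image.
def looks_like_jpeg?(data)
  data.is_a?(String) && data.b.start_with?(JPEG_SOI)
end
```

If this returns false you can fall back to the default JPEG straight away; if it returns true, an image library can still be used for a thorough check.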
If you don't have RMagick installed, an alternative approach would be to save the data into a Tempfile, drop down into Unix (assuming you are on Unix) and try to identify the file. If your system has ImageMagick installed, the identify command will tell you all about images. But just calling file on it will tell you this too:
~/Pictures$ file P1020359.jpg
P1020359.jpg: JPEG image data, EXIF standard, comment: "AppleMark"
You need to call the identify and file commands in a shell from Ruby:
%x(identify #{tempfile.path})
%x(file #{tempfile.path})
One day my application declared all passwords invalid.
After tedious search the problem was found: a cipher initialization vector (just a bunch of random bits) is given to the application via ENV. And rails had decided to convert this string (which is arbitrary binary data) to UTF-8.
I'm doing basically this, before server start:
ENV["RAILS_ACC_VEC"] = "\xB3n%-\x9E^\xE1\x93 \x17\xEER\x1B\n\x84S"
Rack::Server.start( ...
and later
if Rails.env != "production"
salt = "dummy"
else
salt = ENV["RAILS_ACC_VEC"]
end
The bitstring should be 128 bits long. But it happened to be 176 bits long and contained valid UTF-8. (Obviously, the cipher routines failed utterly with that.)
The application currently runs on Rails 4.2.8 and ruby 2.4, and with default encoding.
The reason for the problem could be found: usually the application is started with the server or from deploy, with no locale in the environment. This time it was started from a console, and that console happened to be set to ISO 8859.
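The growth from 128 to 176 bits is consistent with the bytes being read as ISO-8859-1 and transcoded to UTF-8, where every byte at or above 0x80 becomes two bytes. The sketch below reproduces only that arithmetic on the vector shown above. It assumes this is the conversion that took place and does not show where Ruby or Rails performs it.

```ruby
# The 16-byte vector from the question, as raw bytes.
raw = "\xB3n%-\x9E^\xE1\x93 \x17\xEER\x1B\n\x84S".b

# Assumption: the bytes are labelled ISO-8859-1 and then transcoded to UTF-8.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
utf8   = latin1.encode(Encoding::UTF_8)

puts raw.bytesize * 8    # 128
puts utf8.bytesize * 8   # 176 (six bytes >= 0x80 each grew to two bytes)
puts utf8.valid_encoding?
```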
The consequence is also clear: the application must always be started with a definite locale in the ENV, either LC_CTYPE=C (equivalent to no locale) or, maybe better, UTF-8 (in case the application uses the default config.encoding).
What I am now trying to figure out, is, when and why does ruby/rails do such things?
I know that transcoding may happen with an IO object, but there the intended charset can be specified when opening.
It may make some sense, if the system seems to run in ISO 8859, and rails itself runs with UTF-8, that the ENV, when moved from outside to inside, may need transcoding. But that holds true only if language is concerned, and not all ENV content might be language.
So, how is the ENV opened in binary mode?
The more ambitious question then is: are there more dangers of this kind lurking in the Encoding feature?
You should not store binary data in the system environment. Environment variables are meant to hold text; I don't believe any operating system promises to preserve arbitrary binary data there, and even where it happens to work it is not a standard. On POSIX systems the values are null-terminated C strings, so they cannot contain a null byte (\x00) at all. Relying on binary content may also be a security risk for other programs that read the environment. Try a search for 'posix env binary'.
You should store your IV as base64 encoded data whenever you store it as text.
ENV['IV'] = 'VGhpcyBjYW4gYmUgYmluYXJ5Lg=='
export IV=VGhpcyBjYW4gYmUgYmluYXJ5Lg== # or from the shell
...
iv = Base64.decode64 ENV['IV']
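A minimal round-trip sketch of that advice, assuming a 128-bit IV. The variable name `EXAMPLE_IV` is made up for illustration. The strict variants are used so the encoded text contains no newlines and malformed input raises instead of being silently accepted.

```ruby
require "base64"
require "securerandom"

iv = SecureRandom.random_bytes(16)        # 128 random bits

# Base64 output is plain ASCII, so locale transcoding leaves it unchanged.
encoded = Base64.strict_encode64(iv)
ENV["EXAMPLE_IV"] = encoded               # hypothetical variable name

# Later, inside the application:
decoded = Base64.strict_decode64(ENV.fetch("EXAMPLE_IV"))
raise "unexpected IV length" unless decoded.bytesize == 16
```

Checking the length after decoding would have caught the 176-bit value at startup rather than at the first failed password check.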
I'm using Ruby 2.4 and Rails 5. I have file content in a variable named "content". The content could contain data from things like a PDF file, a Word file, or an HTML file. Is there any way to tell if the variable contains binary data? Ultimately, I would like to know if this is a PDF, Microsoft Office, or some other type of OpenOffice file. This answer -- Rails: possible to check if a string is binary? -- suggests that I can check the encoding of the variable
content.encoding
and it would produce
ASCII-8BIT
in the case of binary data, however, I've noticed there are cases where HTML content stored in the variable could also return "ASCII-8BIT" as the content.encoding, so using "content.encoding" is not a foolproof way to tell me if I have binary data. Does such a way exist and if so, what is it?
If your real question is not about binary data per se but about determining the file type of the data, I'd recommend having a look at the ruby-filemagic gem, which will give you this information much more reliably. The gem is a simple wrapper around the libmagic library, which is standard on Unix-like systems. The library works by scanning the content of a file and matching it against a set of known "magic" patterns for various file types.
Sample usage for a string buffer (e.g. data read from the database):
require "filemagic" # the ruby-filemagic gem is loaded under this name
content = File.read("/.../sample.pdf") # just an example to get some data
fm = FileMagic.new
fm.buffer(content)
#=> "PDF document, version 1.4"
For the gem to work (and compile) you need the file utility as well as the magic library with headers installed on your system. Quoting from the readme:
The file(1) library and headers are required:
Debian/Ubuntu:: +libmagic-dev+
Fedora/SuSE:: +file-devel+
Gentoo:: +sys-libs/libmagic+
OS X:: brew install libmagic
Tested to work well under Rails 5.
If you're on a Unix machine, you can use the file command:
file titi.pdf
You could then do something like:
require 'open3'
cmd = 'file -'
# popen2 yields stdin, stdout and the wait thread (popen3 would also yield stderr)
Open3.popen2(cmd) do |stdin, stdout, wait_thr|
stdin.write(content)
stdin.close
puts "file type is: " + stdout.read
end
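If neither libmagic nor the file command is available, a rough fallback is to compare the leading bytes against a few well-known signatures yourself. The sketch below is illustrative only: `SIGNATURES` and `sniff_type` are hypothetical names, and the table covers just three formats. Note that modern Office files (.docx, .xlsx) and OpenDocument files are both ZIP containers, so the prefix alone cannot tell them apart.

```ruby
# Hypothetical signature table: leading bytes => coarse type.
SIGNATURES = {
  "%PDF-".b            => :pdf,
  "PK\x03\x04".b       => :zip_container,  # OOXML and OpenDocument files are ZIP archives
  "\xD0\xCF\x11\xE0".b => :ole2_container  # legacy .doc/.xls/.ppt
}.freeze

# Returns a coarse type symbol, or nil when no known signature matches.
def sniff_type(content)
  bytes = content.b
  SIGNATURES.each do |magic, type|
    return type if bytes.start_with?(magic)
  end
  nil
end
```

A nil result only means "none of these formats"; HTML or plain text ends up here as well, which is usually what you want for this question.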
I need to upload a large file (>2 GB) using a multipart POST request. The source file name may contain Unicode characters. The problem is that libcurl does not open files with a Unicode-aware wfopen on Windows, so I cannot do this the usual way:
curl_formadd(&formpost, &lastptr,
CURLFORM_COPYNAME, fieldname,
CURLFORM_FILENAME, filename,
CURLFORM_FILE, full_path_to_file,
CURLFORM_CONTENTTYPE, "application/octet-stream",
CURLFORM_END);
I figured out that I can use the CURLFORM_STREAM option of curl_formadd in conjunction with CURLOPT_READFUNCTION. Now I need to set the file size manually through the CURLFORM_CONTENTSLENGTH option, but it only accepts a "long" when I need to pass a "long long" file size. Looking through the curl manual I found the CURLOPT_POSTFIELDSIZE_LARGE option, but it does nothing in my case; the multipart request code seems to ignore it. I don't know what to do, and I don't want to give up either Unicode names or large file support.
I have previously asked this question: How to write exif metadata to an image.
I have now found a way to inject metadata. However, it results in a copy of the image being made in memory. With large images, and the need to already have one copy in memory, this is going to hurt performance and possibly cause an out-of-memory crash.
Is there a correct way to inject metadata without having to make a copy of the image? Perhaps it could be tacked on to a file, after it is written to disk?
I would prefer native implementations, without having to resort to a third party library just for this, if at all possible.
This question could require a small or large amount of code depending on what you need. EXIF data is stored in a JPEG APP1 marker (FFE1). It looks very much like a TIFF file, with a TIFF header, IFD and individual tags with the data. If you can build your own APP1 marker segment, then inserting it or replacing it in a JPEG file is trivial. If you are looking to read the metadata from an existing file, add some new tags and then write it back, that can be more involved. The tricky part of EXIF data is the tags that require more than 4 bytes. Each TIFF tag is 12 bytes: 2-byte tag, 2-byte data type, 4-byte count, 4-byte data. If the data doesn't fit completely in the 4 bytes of the tag, then the tag instead holds an offset to where the data is stored (in EXIF, the offset is measured from the start of the TIFF header). If the existing data has any tags with data like this (e.g. make, model, capture date, capture time, etc.), you will need to repack that data by fixing the offsets and then add your own. In a nutshell:
1) If you are adding a pre-made APP1 marker to a JPEG file, this is simple and requires little code.
2) If you need to read the existing meta-data from a JPEG file, add your own and write it back, the code is a bit more involved. It's not "difficult", but it involves a lot more than just reading and writing blocks of data.
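To make case 1 concrete, here is a minimal sketch (in Ruby, purely for illustration) of the 12-byte tag layout and of wrapping a ready-made TIFF-structured payload in an APP1 segment. All three helper names are hypothetical. The payload is assumed to be valid EXIF/TIFF data already, and the segment is simply placed directly after the SOI marker without looking at any existing APP0/APP1 segments.

```ruby
EXIF_HEADER = "Exif\x00\x00".b

# One 12-byte IFD entry, little-endian: 2-byte tag, 2-byte type,
# 4-byte count, 4-byte value or offset.
def ifd_entry(tag, type, count, value_or_offset)
  [tag, type, count, value_or_offset].pack("S<S<L<L<")
end

# Wrap a TIFF-structured payload in an APP1 segment. The 2-byte big-endian
# length counts itself and the payload, but not the FF E1 marker.
def build_app1(tiff_payload)
  body = EXIF_HEADER + tiff_payload.b
  raise ArgumentError, "APP1 payload too large" if body.bytesize + 2 > 0xFFFF
  "\xFF\xE1".b + [body.bytesize + 2].pack("n") + body
end

# Insert the segment immediately after the SOI marker (FF D8).
def insert_app1(jpeg, app1)
  jpeg = jpeg.b
  raise ArgumentError, "not a JPEG" unless jpeg.start_with?("\xFF\xD8".b)
  jpeg.byteslice(0, 2) + app1 + jpeg.byteslice(2..-1)
end
```

Case 2 needs a real parser on top of this, because every offset stored in the existing tags has to be recomputed when the data is repacked.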
Start by reading the TIFF 6.0 spec to understand the tag and directory structure:
TIFF 6.0 spec
Next, take a look at the JPEG EXIF spec:
EXIF 2.2 Spec
I would expect the existing exif manipulator software can do it, but haven't tested.
Links:
http://www.exiv2.org/
http://libexif.sourceforge.net/
http://www.kraxel.org/blog/linux/fbida/
CGImageSourceRef could be used to get image properties including its thumbnail without loading all image data into memory. This way memory is not wasted by UIImage and NSData.
CGImageSourceRef imageSource = CGImageSourceCreateWithURL((CFURLRef)[NSURL fileURLWithPath:path], NULL);
Then create a CGImageDestinationRef, add the source image together with the EXIF properties, and finalize it.
CGImageDestinationAddImageFromSource(destRef,
imageSource,
0,
(CFDictionaryRef)properties); // EXIF
BOOL success = CGImageDestinationFinalize(destRef);
I am working on creating an Easter egg for a website. I want to hide some lines of data in an image's EXIF comment field which can be used to reconstitute a tarball that contains a text file with a riddle. Is this possible? If so, how do I get the code/data for the tarball which I can then include in the EXIF data?
It is possible, although you may have to encode the data (e.g. using base64) if you use a field intended to hold ASCII text.
The best way to generate a tarball is to use the appropriately-named tar utility.
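For illustration, here is a sketch of the whole round trip in Ruby. It builds a tiny tar archive in memory, base64-encodes it into a string suitable for a text field, then decodes and unpacks it again. It uses the tar classes bundled with RubyGems rather than the tar utility, the riddle text is a placeholder, and writing the string into the EXIF field itself is left out. Keep the archive small: a JPEG APP1 segment cannot exceed roughly 64 KB.

```ruby
require "base64"
require "stringio"
require "rubygems/package" # bundled with Ruby; provides TarWriter and TarReader

riddle = "What has keys but opens no locks?\n" # placeholder text

# Build a small tar archive in memory.
tar_io = StringIO.new(String.new)
Gem::Package::TarWriter.new(tar_io) do |tar|
  tar.add_file_simple("riddle.txt", 0o644, riddle.bytesize) { |f| f.write(riddle) }
end

# Base64 keeps the payload within printable ASCII for a text-only field.
comment = Base64.strict_encode64(tar_io.string)

# The reverse direction: decode the field and read the file back out.
decoded = Base64.strict_decode64(comment)
recovered = nil
Gem::Package::TarReader.new(StringIO.new(decoded)) do |reader|
  reader.each { |entry| recovered = entry.read if entry.full_name == "riddle.txt" }
end
```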