Can I use a UUID as my image file name for uniqueness? - ruby-on-rails

I am naming the image files in my Rails app with UUIDs, e.g. 67bc91b6-fdb3-11e0-bbff-f04da2baa2d4.png
However, the image does not display correctly on my page. Any clue why this happens?

The name of a file does not matter, and a UUID (or a hash of the content) is certainly a good way to generate a unique filename. Make sure you're sending the appropriate headers. For example, if you access your URL with curl -I, you should see the correct Content-Type:
$ curl -I http://localhost/APP/image/67bc91b6-fdb3-11e0-bbff-f04da2baa2d4.png
HTTP/1.1 200 OK
Cache-Control: public, max-age=4
Content-Type: image/png
Content-Length: 198542
If the Content-Type is missing or set to the wrong format, or Content-Length does not equal the file size, fix that.
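The header check described above can be sketched in Ruby. This is only an illustration: header_problems is a hypothetical helper (not part of Rails or curl) that takes the response headers as a plain Hash, the MIME type you expect, and the size of the file on disk.

```ruby
# Hypothetical helper: list what looks wrong with an image response's headers.
# `headers` is a plain Hash of header name => value, e.g. copied from `curl -I`.
def header_problems(headers, expected_type, file_size)
  # Header names are case-insensitive, so normalise them first.
  normalized = headers.transform_keys { |key| key.to_s.downcase }
  problems = []

  # Ignore parameters such as "; charset=..." when comparing the media type.
  type = normalized['content-type'].to_s.split(';').first.to_s.strip.downcase
  problems << "Content-Type is #{type.inspect}, expected #{expected_type.inspect}" unless type == expected_type

  length = normalized['content-length'].to_s
  problems << "Content-Length is #{length.inspect}, expected #{file_size}" unless length == file_size.to_s

  problems
end

good = { 'Content-Type' => 'image/png', 'Content-Length' => '198542' }
puts header_problems(good, 'image/png', 198_542).inspect
```

An empty list means both headers match; anything else names the header to fix.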

I've done this in other languages (C#) and had no issues; there is nothing on the web that restricts use of a UUID as an image file name. Something else must be afoot here.

If you would like shorter filenames, which is always a good thing, you can use Base36 encoding.
http://en.wikipedia.org/wiki/Base_36
Since it is case-insensitive, it works well on all operating systems, and the file names are a lot shorter than a UUID written in hex.
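A minimal Ruby sketch of the Base36 idea, using only Integer#to_s and String#to_i from the core library. The method names uuid_to_base36 and base36_to_uuid are made up for this example.

```ruby
# Hypothetical helpers: shorten a UUID by re-encoding its 128 bits in base 36.
def uuid_to_base36(uuid)
  # Drop the hyphens, read the 32 hex digits as one integer, print it in base 36.
  uuid.delete('-').to_i(16).to_s(36)
end

def base36_to_uuid(short)
  # Reverse the conversion and restore leading zeros and the 8-4-4-4-12 grouping.
  hex = short.to_i(36).to_s(16).rjust(32, '0')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

uuid = '67bc91b6-fdb3-11e0-bbff-f04da2baa2d4'
short = uuid_to_base36(uuid)
puts "#{short}.png"
puts base36_to_uuid(short)
```

A 128-bit value needs at most 25 base-36 characters, compared with 36 characters for the hyphenated hex form.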

Related

Apache Tika Server - Request Header Parameters?

The Apache Tika Server provides a REST API to extract text from a document. It is also possible to set specific request header parameters like X-Tika-PDFOcrStrategy, e.g.:
$ curl -T test/Dokument01.pdf http://localhost:9998/tika --header "X-Tika-PDFOcrStrategy: ocr_only"
From a lot of different documents about tika I found these documented additional header parameters:
X-Tika-OCRLanguage: eng
X-Tika-PDFextractInlineImages: true | false
X-Tika-PDFOcrStrategy: ocr_only | ocr_and_text_extraction
X-Tika-OCRoutputType: hocr
But there seems to be no documentation about how to use the X-Tika-.....? header parameters or which parameters are supported and which not.
For example I wonder if it is possible to overwrite the ImageType mode or the DPI with something like:
X-Tika-PDFocrImageType: rgb
X-Tika-PDFocrDPI: 100
My question is: Which header parameters are supported, and which naming convention do these parameters follow?
The code that handles the X-Tika-OCR and X-Tika-PDF headers is TikaResource.processHeaderConfig.
Those header suffixes and values are then mapped onto the TesseractOCRConfig and PDFParserConfig configuration objects via reflection.
So, to see what X-Tika headers you can set, look up the options on the config class you want to tweak things on (Tesseract or PDF), then build the name, then set the header. If you are not sure what the option does, or what values it takes, look at the JavaDocs for the underlying setter method that will get called.
For example, setExtractInlineImages on the PDF config maps to X-Tika-PDFextractInlineImages.
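The naming convention can be illustrated with a small Ruby sketch that turns a header name back into the config class and setter it would reach. This is not Tika's code: setter_for is a hypothetical helper, and the rule "strip the prefix, capitalise the first letter, prepend set" is an assumption inferred from the examples in this thread.

```ruby
# Assumed mapping of header prefixes to the config classes named above.
TIKA_PREFIXES = {
  'X-Tika-OCR' => 'TesseractOCRConfig',
  'X-Tika-PDF' => 'PDFParserConfig'
}.freeze

# Hypothetical helper: return [config class, setter name] for an X-Tika header,
# or nil when the header does not use one of the two prefixes.
def setter_for(header_name)
  prefix, config = TIKA_PREFIXES.find { |candidate, _| header_name.start_with?(candidate) }
  return nil unless prefix

  suffix = header_name.delete_prefix(prefix)
  return nil if suffix.empty?

  # Assumption: the first letter of the suffix is upper-cased before "set" is added.
  [config, "set#{suffix[0].upcase}#{suffix[1..-1]}"]
end

puts setter_for('X-Tika-PDFextractInlineImages').inspect
puts setter_for('X-Tika-OCRLanguage').inspect
```

Whether a given header is honoured still depends on the setter actually existing on the config class, so check the JavaDocs before relying on it.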

RegEx to find in-line images in a plain text email message

Certain mail clients allow for the sender to place images directly in the body of their email (instead of as a traditional attachment). When I receive one of these emails in my application, I need to be able to look at only the text/plain message body and determine that the sender embedded an inline image.
I'm trying to craft a RegEx to find image placeholders in the text/plain message body so I can swap them for <img> tags in my own HTML-enabled version of the message. (Wacky, I know, but this is the requirement).
The problem I'm finding is that the placeholders differ based on the sending mail client. For example, when sent from MS Outlook, the text/plain body of the multi-part message looks like this:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Check out this image:
[cid:image001.jpg#01CB50D4.769583B0]
Isn't it cool??
A similar message sent from Gmail is a little bit different:
Content-Type: text/plain; charset=ISO-8859-1
Check out this image:
[image: image001.jpg]
Isn't it cool??
The text/html body and image/jpeg part with the base64 encoded image follow.
Has anyone done any research on this before and compiled a list or built a RegEx specifically for this purpose?
I realize a more reliable way to achieve my goal is to look at the text/html portion of the message--which seems to be a bit more standardized from the few tests I've done--but unfortunately I don't have access to that in this scenario.
I'm using C#, if that matters to anyone.
Here's a list of text/plain image placeholders I've compiled thus far:
Gmail: [image: filename.jpg]
Outlook 2007: [cid:filename.jpg#01CB50D4.769583B0]
Thunderbird 3.0.7: none
I'd suggest going with the HTML part. If you just want to find a placeholder in the plain-text part, this very simple regular expression should be sufficient (PCRE, applied line by line or with the multiline modifier):
^\[.*\]$
At least, this works for the examples above. Extracting the image name would need a slightly more complicated expression. Mind that this catches every line that starts with [ and ends with ], no matter what the contents are. To limit the match to certain file types, try this:
/^\[.*(\.jpg|\.jpeg|\.png|\.gif|\.bmp).*\]$/i
These examples will work in Perl, since you didn't mention a language...
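For pulling out the file name as well, here is a hedged Ruby sketch that covers only the two placeholder styles listed above (Gmail and Outlook 2007). The constant and method names are made up for the example, other mail clients may need additional alternatives, and the same pattern should port to C# with RegexOptions.Multiline and RegexOptions.IgnoreCase.

```ruby
# Matches a whole line such as "[image: image001.jpg]" or
# "[cid:image001.jpg#01CB50D4.769583B0]" and captures the file name.
# In Ruby, ^ and $ already match at line boundaries.
INLINE_IMAGE = /^\[(?:image:\s*|cid:)([^\]#]+?\.(?:jpe?g|png|gif|bmp))(?:#[^\]]*)?\]$/i

# Hypothetical helper: return the image file names referenced in a text/plain body.
def inline_image_names(body)
  body.scan(INLINE_IMAGE).flatten
end

outlook = "Check out this image:\n[cid:image001.jpg#01CB50D4.769583B0]\nIsn't it cool??\n"
gmail   = "Check out this image:\n[image: image001.jpg]\nIsn't it cool??\n"

puts inline_image_names(outlook).inspect
puts inline_image_names(gmail).inspect
```

Because the pattern requires the image: or cid: prefix, ordinary bracketed lines such as "[see below]" are not matched.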

request.format returning */*

I'm currently developing an API for my application on RoR.
As an example, I created some XML, loaded with all the info I need to create the object (let's say a Person), and submitted it to my application using curl.
I'm able to call exactly the create action I want from the controller, and the params hash of the object is being passed correctly.
But now I need to apply different behaviour depending on whether the request was made with XML. What is bothering me is why request.format gives */* in the controller.
Any clues?
curl -v -H "Content-Type: application/xml; charset=utf-8" --data-ascii @client.xml "http://foo.com:3000/clients?api_key=xxx"
def create
  logger.debug request.format # produces "*/*"
  if request.format.xml?
    # never gets here
  end
end
*/* means that the user-agent accepts all formats and doesn't care what format you give it. I believe Safari does this, among others. By default, curl sends an Accept header of */*.
Here is a dump of the headers curl sends by default:
User-Agent: curl/7.18.1 (i386-apple-darwin9.6.0) libcurl/7.18.1 zlib/1.2.3
Host: example.com
Accept: */*
Content-Type:
However, in this case, it looks like you want to send back XML if the payload sent to you was XML? If that's the case, you want to check the request's Content-Type header directly. i.e., request.content_type is the method you want.
Addendum: I thought a bit more about this, and I think the best approach is to first check request.format and, only if that is inconclusive, check request.content_type. Essentially, the HTTP spec lets clients tell servers "I'm giving you XML, but I want JSON back." The Accept header is how clients tell you what they want back, and if someone actually sends it, you should honor that. Only use the request's Content-Type as a hint if the client didn't specify.
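That decision order can be sketched in plain Ruby, outside Rails. xml_requested? is a hypothetical helper that takes the raw Accept and Content-Type header values; in a controller you would feed it from the request object instead. It ignores q-values, so treat it as an illustration rather than full content negotiation.

```ruby
XML_TYPES = %w[application/xml text/xml].freeze

# Strip parameters such as "; charset=utf-8" or "; q=0.8" from a header value.
def media_type(header_value)
  header_value.to_s.split(';').first.to_s.strip.downcase
end

# Hypothetical helper: honour a specific Accept header first, and fall back to
# the request's Content-Type only when Accept is missing or just "*/*".
def xml_requested?(accept, content_type)
  accepted = accept.to_s.split(',').map { |part| media_type(part) }.reject(&:empty?)
  specific = accepted.reject { |type| type == '*/*' }
  return specific.any? { |type| XML_TYPES.include?(type) } unless specific.empty?

  XML_TYPES.include?(media_type(content_type))
end

# curl's default Accept plus an XML payload: fall back to Content-Type.
puts xml_requested?('*/*', 'application/xml; charset=utf-8')
# Client sends XML but explicitly asks for JSON back: honour Accept.
puts xml_requested?('application/json', 'application/xml')
```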
*/* simply means that all MIME types are accepted.
Looking at the code for the request.format method, the MIME type is determined by the format extension in the request URL (e.g. /clients.xml) or, if that's not present, by the value of the HTTP Accept header. So you either need to request a URL that ends in .xml, or get curl to set the Accept header to an XML MIME type (e.g. text/xml) when it makes the request to your API.

Translate binary characters to a human readable string?

So let's say we have a string that is like this:
‰û]M§Äq¸ºþe Ø·¦ŸßÛµÖ˜eÆÈym™ÎB+KºªXv©+Å+óS—¶ê'å‚4ŒBFJF󒉚Ү}Fó†ŽxöÒ&‹¢ T†^¤( OêIº ò|<)ð
How do I turn it into a human-readable string of characters? It was weird output of HTML from a web server. I think it is text, because half the web page loaded correctly. Do I need to read it with C or Python or something? That's only a snippet of the string.
If that is in fact supposed to be a human-readable string, you'll need to figure out what character encoding it uses and translate. It's also possible that the string is compressed, encrypted, or represents binary data. It would be helpful to know where you got your string from.
I'm guessing your web server isn't sending the correct mime-type. I'd suggest taking a look at the http headers using Firefox's Live Headers plugin. If a web server decides to send you a pdf, but doesn't set the mime-type, you'll just see garbage on your screen. Alternatively, save the page to a file, and then run these commands from Cygwin or a unix shell:
file mypage.htm
strings mypage.htm
The first will tell you if the header bytes follow any recognizable pattern. The second will strip out and display all the human readable text.
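If those tools are not at hand, the same two checks can be roughly approximated in Ruby. This is only a sketch: sniff and printable_runs are hypothetical helpers, and the signature table holds just three well-known magic numbers, whereas the real file command knows far more.

```ruby
# A few well-known file signatures (magic numbers), compared as raw bytes.
MAGIC = {
  "\x1F\x8B".b => 'gzip',
  '%PDF'.b     => 'pdf',
  "\x89PNG".b  => 'png'
}.freeze

# Rough stand-in for `file`: guess the format from the leading bytes.
def sniff(data)
  bytes = data.b
  MAGIC.each { |prefix, name| return name if bytes.start_with?(prefix) }
  'unknown'
end

# Rough stand-in for `strings`: runs of printable ASCII of a minimum length.
def printable_runs(data, min_length = 4)
  data.b.scan(/[\x20-\x7E]{#{min_length},}/n)
end

sample = "\x1F\x8B\x08\x00hello world\xFF\xFEab\x00test"
puts sniff(sample)
puts printable_runs(sample).inspect
```

To try it on a saved page, read the file in binary mode, for example with File.binread('mypage.htm'). A gzip result would suggest the server compressed the body without the client decoding it.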

Load testing multipart form

I'm trying to load-test a Rails application using JMeter. A critical part of the application involves a form that includes both text inputs and file uploads. It works fine in a browser, but when I try to post that page in JMeter, Rails is saving all of the parts of the multipart form as temp files, which causes things to break when it's looking for a string and gets a tempfile instead.
It appears that the difference is that, from a browser, the piece of the multipart request that contains a text input looks like this:
-----------------------------7d93b4186074c
Content-Disposition: form-data; name="field_name"
test
-----------------------------7d93b4186074c
while from JMeter it looks like this:
-----------------------------7d159c1302d0y0
Content-Disposition: form-data; name="field_name"
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
test
-----------------------------7d159c1302d0y0
So apparently Rails sees the former and interprets it as a plain text value and treats it as a string, but sees the latter and saves it to a temp file.
I have not been able to find a setting to convince JMeter not to send the additional headers in the multipart form for non-file fields.
Is there a way to convince Rails to ignore those headers and treat the text/plain text as strings instead of text files? Or a quick way to put a filter in front of my controller that will strip the extra headers?
Alternately, is there a better tool to load-test a Rails application that includes file upload?
Turns out these days you can just tick "use browser compatible headers" in JMeter. Could've saved myself a hell of a lot of time there :-)
So, I customized the multipart posting code in JMeter's source so that it produces a request that Rails understands. The change is easy, as shown below, but setting up a Java/JMeter build environment took time. :(
Anyways, now I can successfully upload a file by multipart post via JMeter.
in src/protocol/http/org/apache/jmeter/protocol/http/sampler/PostWriter.java
writeStartFileMultipart()
//writeln(out, "Content-Transfer-Encoding: binary"); // $NON-NLS-1$
writeFormMultipart()
/*****
writeln(out, "Content-Type: text/plain; charset=" + charSet); // $NON-NLS-1$
writeln(out, "Content-Transfer-Encoding: 8bit"); // $NON-NLS-1$
*****/
P.S.
A tip for creating the build environment for 2.4:
comment out the 3rd-party libraries check in the build.xml file;
copy lib/xstream-1.3.1.jar from the binary archive into the lib/ directory.
There may be a better way, but I ended up adding a quick filter to turn the text/plain tempfiles into strings within the parameter hash:
def change_text_files_to_strings
  params.each_pair do |key, value|
    # Replace text/plain uploads with their contents; leave real file uploads alone.
    params[key] = value.read if value.is_a?(Tempfile) && value.content_type.start_with?('text/plain')
  end
end
By the way, it turns out that JMeter is correct here and Rails is not: according to RFC 2388, any part of a multipart/form-data request may carry a Content-Type (it defaults to text/plain when absent), not just files, so Rails really shouldn't be using the presence of a Content-Type header to determine whether a part is a file. Ah well.
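One alternative signal is the filename parameter of Content-Disposition, which browsers send for file inputs. The sketch below is not what Rails or Rack does; file_part? is a hypothetical helper showing how a parser could tell the two kinds of part apart without looking at Content-Type.

```ruby
# Hypothetical helper: treat a multipart part as a file upload only when its
# Content-Disposition header carries a filename parameter.
def file_part?(part_headers)
  disposition = part_headers.fetch('Content-Disposition', '')
  disposition.match?(/;\s*filename\*?=/i)
end

# The text field as JMeter sends it, extra headers included.
jmeter_text = {
  'Content-Disposition' => 'form-data; name="field_name"',
  'Content-Type' => 'text/plain; charset=utf-8',
  'Content-Transfer-Encoding' => '8bit'
}
# A file upload; the field and file names here are made up.
upload = {
  'Content-Disposition' => 'form-data; name="photo"; filename="a.png"',
  'Content-Type' => 'image/png'
}

puts file_part?(jmeter_text)
puts file_part?(upload)
```

With this rule the JMeter text field stays a string even though it carries a Content-Type.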
I also used the solution above as ColdFusion was sending similar headers (minus the Content-Transfer-Encoding) with each piece of form data. I wonder if there's a better way.
EDIT: Anyone know if this has been fixed in Rails 3?
What kind of error do you get? Something like
NoMethodError (undefined method `rewind' for "1":String):
There is an issue with Rack that could explain your problem. See https://github.com/rack/rack/issuesearch?state=open&q=rewind#issue/116
We were having a similar issue. In addition to the answers above, we also correlated the X-CSRF-Token in the HTTP Header Manager for that request, and were then able to upload the required media as many times as we wanted.

Resources