Viewing a large production log file

I have a 22GB production.log file in my Ruby on Rails app. I want to browse/search the contents over SSH without downloading. Is this possible? Are there any tools?

22GB is a very large file, so opening the whole thing at once with most tools would be risky for your server. I'd recommend splitting the file into multiple parts and searching each part. For example, use this command to split your file into 1GB chunks:
split -b 1GB very_large_file small_file
Also, you should set up logrotate on your server so the log file doesn't grow this large again.
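If you just want to browse and search in place, the standard streaming tools also work well over SSH, because they read the file incrementally instead of loading it whole. For example:
# jump straight to the end of the log
less +G production.log
# search without loading the whole file into memory
grep -n "Completed 500" production.log
# watch new entries as they are written
tail -f production.log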

Related

detect and remove all executable files in uploaded ZIP file

I am working on a web application in Rails where a user can upload a zip file containing their data/files/docs, etc. I'm concerned with security right now: I want to scan the uploaded zip file and remove all kinds of executables, such as exe and bash files. How can I do this?
Edit: I am aware of the clamav API for Rails, but it only scans for malicious files; it doesn't remove executables. Just imagine a wrongly uploaded executable being opened on the server, and the cost of that server- and business-wide!
First, it would be better and more robust to whitelist allowed file types rather than blacklist disallowed ones (e.g. executables). So, if it's possible in your application, keep a list of the types you allow.
Then the question is how you determine the type of a file.
The trivial way is checking the file extension, but that's not very strong. It can still serve as a cheap first check to avoid spending precious CPU time on further checks.
After that, you can use the filemagic database to quite reliably find the type of uploaded files. You have two options:
If your application runs on Linux, you can call the file tool directly, something like filetype = `file -ib #{filename}` to get the file type. Note that filename in this example needs to be sanitized to avoid OS command injection!
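A quick sketch of that sanitization with Ruby's built-in Shellwords module:
require 'shellwords'

# escape shell metacharacters in the user-supplied name to block command injection
filetype = `file -ib #{Shellwords.escape(filename)}`.strip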
If you want to support Windows too (or just want to avoid calling shell commands and have nicer code), you can use the ruby-filemagic gem:
require 'filemagic'

filename = 'yourfile.ext'
magic = FileMagic.new
filetype = magic.file(filename)  # e.g. "ELF 64-bit LSB executable, x86-64 ..."
magic.close
The problem with ruby-filemagic is that it's no longer maintained, but it should still work fine for spotting executables.
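Putting the pieces together, here is a minimal sketch of stripping executables out of an uploaded zip using the rubyzip and ruby-filemagic gems. The list of blocked MIME types is my own illustrative assumption; tune it to your needs:
require 'zip'        # rubyzip gem
require 'filemagic'  # ruby-filemagic gem

# MIME types treated as executable -- illustrative, not exhaustive
BLOCKED = %r{application/x-(executable|sharedlib|dosexec|mach-binary)}

def strip_executables(zip_path)
  magic = FileMagic.new(FileMagic::MAGIC_MIME)
  Zip::File.open(zip_path) do |zip|
    zip.select(&:file?).each do |entry|
      # inspect the entry's magic bytes rather than trusting its extension
      mime = magic.buffer(entry.get_input_stream.read)
      zip.remove(entry) if mime =~ BLOCKED
    end
  end # rubyzip commits the removals when the block exits
ensure
  magic.close if magic
end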

How we can upload Large file in chunks in rails?

I am trying to upload a zip file of 350MB-500MB to the server. It gives an "ENOSPC" (no space left on device) error.
Is it possible to upload the file in chunks and receive it on the server as one file?
Or:
Can I use a custom location for tmpfs, so that it's independent of the system tmp? In my case tmp is only 128MB.
Why not use the web server's uploading feature, like nginx-upload or apache-upload?
Not sure what it's called in Apache, but I'd guess Apache has one too. If you are using Nginx, there is also nginx-upload-progress, which can be helpful if you want to track the progress of the upload.
Hope this helps.
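On the tmpfs idea from the question: Rack buffers multipart uploads through Tempfile, and Ruby's Dir.tmpdir honors the TMPDIR environment variable, so you can point temp files at a partition with more room. A minimal sketch (the path is a placeholder for wherever you have space):
# config/initializers/tmpdir.rb
# /mnt/big_tmp is a hypothetical directory on a partition large enough
# for 350-500MB uploads; Tempfile -- and therefore Rack's multipart
# parser -- picks it up via Dir.tmpdir
ENV['TMPDIR'] = '/mnt/big_tmp'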

How do I generate files and then zip/compress with Heroku?

I sort of want to do the reverse of this.
Instead of unzipping and adding the collection of files to S3, I want to, on the user's request:
generate a bunch of xml files
zip the xml files with some images (pre-existing images hosted on s3)
download zip
Does anybody know a good way of doing this? I think I could manage this no problem on a normal machine, but Heroku complicates things somewhat with its read-only filesystem.
From the heroku documentation on the read-only filesystem:
There are two directories that are writeable: ./tmp and ./log (under your application root). If you wish to drop a file temporarily for the duration of the request, you can write to a filename like #{RAILS_ROOT}/tmp/myfile_#{Process.pid}. There is no guarantee that this file will be there on subsequent requests (although it might be), so this should not be used for any kind of permanent storage.
You should be able to write your generated xml files to tmp/ fairly easily and keep track of the names, download and write the S3 files to the same directory, and (maybe?) invoke a zip command as long as the output goes to tmp/, then serve the file to the browser with the correct MIME type to prompt a download. My only concerns would be the file size, and whether Heroku has an undocumented limit on what they'll allow in the tmp directory. Since you're only performing this action for a one-time download within a single request, I think you have a good chance of being able to do it.
Edit: Looking around a bit, you might be able to use something like RubyZip to create your zip file if you want to avoid calling system commands.
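A rough sketch of that flow with RubyZip, where build_xml_documents (returning filename => xml pairs) and image_urls are hypothetical stand-ins for your own generation code and S3 URLs:
require 'zip'       # rubyzip gem
require 'open-uri'

def download_export
  zip_path = "#{RAILS_ROOT}/tmp/export_#{Process.pid}.zip"
  Zip::File.open(zip_path, Zip::File::CREATE) do |zip|
    # write each generated xml document straight into the archive
    build_xml_documents.each do |name, xml|
      zip.get_output_stream(name) { |io| io.write(xml) }
    end
    # fetch the pre-existing images from S3 and add them too
    image_urls.each do |url|
      zip.get_output_stream(File.basename(url)) do |io|
        io.write(open(url).read)
      end
    end
  end
  send_file zip_path, :type => 'application/zip', :disposition => 'attachment'
end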

Finding unused images in a Rails app?

I'm familiar with tools like Deadweight for finding CSS not in use in your Rails app, but does anything exist for images? I'm sitting in a project with a massive directory of assets from working with a variety of designers and I'm trying to trim the fat in this project. It's especially a pain when moving assets to our CDN.
Any thoughts?
It depends greatly on the code using the images. It's always possible that a filename is computed (by concatenating two values, string substitution, etc.), so simply grepping by filename isn't necessarily enough.
You could try running wget (probably already installed if you've got a Linux machine, otherwise http://users.ugent.be/~bpuype/wget/ ) to mirror your whole site. Do this on the same machine or network if you can; it'll crawl your whole site and grab all the images.
# mirror mysite.com accepting only jpg, png and gif files
wget -A jpg,png,gif --mirror www.mysite.com
Once you've done that, you're going to have a second copy of your site's hierarchy containing any images that are actively linked to by any page reachable by crawling your site. You can then backup your source image directory, and replace it with wget's copy. Next, monitor your log files for 404's pertaining to gif/jpg/png files. Hope that helps.
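For that last step, something along these lines works against a typical access log (adjust the log path for your server):
# watch the access log for 404s on image files
tail -f /var/log/nginx/access.log | grep ' 404 ' | grep -E '\.(jpg|png|gif)'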
Finding unused images should be easier than CSS.
Just find the *.jpg, *.png, and *.gif files with glob, put those filenames into an array, and search for each filename in your html, css, and js files. Remove a filename whenever it's found, and you'll end up with the unused list; then move those images to another folder with the same directory structure (good for restoring them, just in case).
Basically it goes like this; of course it won't work for file names that are encrypted/encoded/obfuscated.
require "fileutils"
img=Dir.glob("**/*.jpg")+Dir.glob("**/*.png")+Dir.glob("**/*.gif")
data=Dir.glob("**/*.htm*")+Dir.glob("**/*.css")+Dir.glob("**/*.js")
puts img.length.to_s+" images found & "+data.length.to_s+" files found to search against"
content=""
data.each do |f|
content+=File.open(f, 'r').read
end
img.each do |m|
if not content=~ Regexp.new("\\b"+File.basename(m)+"\\b")
FileUtils.mkdir_p "../unused/"+File.dirname(m)
FileUtils.mv m,"../unused/"+m
puts "Image "+m+" moved to ../unused/"+File.dirname(m)+" folder"
end
end
PS: I used FileUtils because the plain mkdir and mv calls don't work in my Windows version of Ruby.
Also, I'm not that good at Ruby, so please double-check it before you use it.
Here are the sample results from a run in the root folder of a sample Rails app on my Windows machine:
---\ruby>ruby img_coverage.rb
5 images found & 12 files found to search against
Image depot/public/images/test.jpg moved to ../unused/depot/public/images folder
If your image URLs often come from computed/concatenated strings and other things that are hard to track programmatically in your source code, and your application is in heavy use, you could try a soft "honeypot" approach like this:
Move all the assets to a different directory, e.g. /attic
Set up an empty /images directory (or what your asset directory is called)
Set up a .htaccess file (if you're on Apache, of course) that uses the -f flag to redirect all requests for nonexistent image files to a script
The script copies the requested file from the /attic into the /images directory and displays it
The next request to that image will go directly to the image, because it exists now
After some time and sufficient usage, all needed images should have been copied to the assets directory.
It's a "soft" approach of course because a dialog / situation could have not been opened/entered/used by any user during that time (things like error message icons for example). But it will recognize all used files, no matter where they're requested from, and might help sort out much of the unneeded files.
If your file manager supports it, try sorting your images directory by the files' "last accessed" date. Files that haven't been accessed in a long time most likely aren't used any longer.
Along the same lines, you can also filter or grep through your web server's logs and make a list of the image files that it has served up in the last several months. Any images not in this list are likely unused.
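For example, to build the list of images actually served (assuming a combined-format access log at a typical path):
# unique image paths that appear anywhere in the access log
grep -ioE '/[^ "]+\.(jpg|png|gif)' /var/log/nginx/access.log | sort -u > used_images.txt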

Heroku: Serving Large Dynamically-Generated Assets Without a Local Filesystem

I have a question about hosting large dynamically-generated assets and Heroku.
My app will offer bulk download of a subset of its underlying data, which will consist of a large file (>100 MB) generated once every 24 hours. If I were running on a server, I'd just write the file into the public directory.
But as I understand it, this is not possible with Heroku. The /tmp directory can be written to, but the guaranteed lifetime of files there seems to be defined in terms of one request-response cycle, not a background job.
I'd like to use S3 to host the download file. The S3 gem does support streaming uploads, but only for files that already exist on the local filesystem. It looks like the content size needs to be known up-front, which won't be possible in my case.
So this looks like a catch-22. I'm trying to avoid creating a gigantic string in memory when uploading to S3, but S3 only supports streaming uploads for files that already exist on the local filesystem.
Given a Rails app in which I can't write to the local filesystem, how do I serve a large file that's generated daily without creating a large string in memory?
${RAILS_ROOT}/tmp (not /tmp; it's under your app's directory) lasts for the duration of your process. If you're running a background DJ (Delayed Job), the files in tmp will last for the duration of that process.
Actually, the files may last longer than that; the reason we say you can't guarantee availability is that tmp isn't shared across servers, and each job/process can run on a different server based on the cloud's load. You also need to make sure you delete your files when you're done with them, as part of the job.
-Another Heroku employee
Rich,
Have you tried writing the file to ./tmp then streaming the file to S3?
-Blake Mizerany (Heroku)
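For completeness, a sketch of that suggestion with the classic aws-s3 gem, run from a background job; the bucket and key names are placeholders, and generate_export is a hypothetical writer that streams the export to an open file:
require 'aws/s3'

class DailyExportJob
  def perform
    path = "#{RAILS_ROOT}/tmp/export_#{Process.pid}.xml"
    # write the export to tmp as it is generated, not as one giant string
    File.open(path, 'w') { |f| generate_export(f) }
    # handing S3Object.store an open file lets the gem stream the upload
    AWS::S3::S3Object.store('exports/daily.xml', File.open(path, 'rb'), 'my-bucket')
  ensure
    File.delete(path) if File.exist?(path)
  end
end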