Finding unused images in a Rails app? - ruby-on-rails

I'm familiar with tools like Deadweight for finding CSS not in use in your Rails app, but does anything exist for images? I'm sitting in a project with a massive directory of assets from working with a variety of designers and I'm trying to trim the fat in this project. It's especially a pain when moving assets to our CDN.
Any thoughts?

It depends greatly on the code using the images. It's always possible that a filename is computed (by concatenating two values, string substitution, etc.), so simply grepping by filename isn't necessarily enough.
You could try running wget (probably already installed if you've got a Linux machine, otherwise http://users.ugent.be/~bpuype/wget/ ) to mirror your whole site. Do this on the same machine or network if you can. It will crawl your whole site and grab all the images:
# mirror mysite.com accepting only jpg, png and gif files
wget -A jpg,png,gif --mirror www.mysite.com
Once you've done that, you'll have a second copy of your site's hierarchy containing every image that is actively linked to by a page reachable by crawling your site. You can then back up your source image directory and replace it with wget's copy. Next, monitor your log files for 404s on gif/jpg/png files. Hope that helps.
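Before swapping the directories, it may help to list which source images never showed up in the crawl. A minimal Ruby sketch, assuming the wget mirror uses the same relative layout as your image directory (the function name and both example paths are hypothetical):

```ruby
require "set"

# Returns the relative paths of images that exist under source_dir
# but are absent from mirror_dir, i.e. were never fetched by the crawl.
def images_missing_from_mirror(source_dir, mirror_dir)
  pattern = "**/*.{jpg,png,gif}"
  crawled = Dir.glob(pattern, base: mirror_dir).to_set
  Dir.glob(pattern, base: source_dir).reject { |path| crawled.include?(path) }.sort
end

# Hypothetical paths: adjust to where your app and the wget mirror live.
# puts images_missing_from_mirror("public/images", "www.mysite.com/images")
```

Images reached only through computed URLs or JavaScript can still be missed by the crawl, so treat the result as a list of candidates rather than a verdict.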

Finding unused images should be easier than finding unused CSS.
Just find the *.jpg, *.png and *.gif files with a glob, put the filenames in an array, and search for each one in your HTML, CSS and JS files. Any filename that is not found goes on the unused list. Then move those images to another folder with the same directory structure, which makes them easy to restore just in case.
Basically like this. Of course, it will not work for filenames that are encrypted, encoded or obfuscated.
require "fileutils"

# Collect the image files and the source files that might reference them.
img  = Dir.glob("**/*.{jpg,png,gif}")
data = Dir.glob("**/*.htm*") + Dir.glob("**/*.{css,js}")
puts "#{img.length} images found & #{data.length} files found to search against"

# Read every source file into one big string to search.
content = data.map { |f| File.read(f) }.join

img.each do |m|
  # Escape the name so that "." matches literally instead of any character.
  next if content =~ Regexp.new("\\b" + Regexp.escape(File.basename(m)) + "\\b")

  # Not referenced anywhere: move it aside, preserving the directory structure.
  dest = File.join("..", "unused", File.dirname(m))
  FileUtils.mkdir_p dest
  FileUtils.mv m, File.join("..", "unused", m)
  puts "Image #{m} moved to #{dest} folder"
end
PS: I used FileUtils because the normal makedirs and mv do not work in my Windows version of Ruby.
And I am not good at Ruby, so please double-check it before you use it.
Here are the sample results from running it in the root folder of a sample Rails project on Windows:
---\ruby>ruby img_coverage.rb
5 images found & 12 files found to search against
Image depot/public/images/test.jpg moved to ../unused/depot/public/images folder

If your image URLs often come from many computed / concatenated strings and other stuff hard to track programmatically within your source code, and your application is in heavy use, you could try a soft "honeypot" approach like this:
1. Move all the assets to a different directory, e.g. /attic
2. Set up an empty /images directory (or what your asset directory is called)
3. Set up a .htaccess file (if you're on Apache of course) that, using the -f flag, redirects all requests to nonexistent image files to a script
4. The script copies the requested file from the /attic into the /images directory and displays it
5. The next request to that image will go directly to the image, because it exists now
After some time and sufficient usage, all needed images should have been copied to the assets directory.
It's a "soft" approach, of course, because some dialog or situation might not have been opened, entered or used by any user during that time (error message icons, for example). But it will recognize every file that was actually requested, no matter where the request came from, and might help sort out many of the unneeded files.
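The copy step that the script performs could be sketched in Ruby roughly as follows. This is only an illustration: the function name and keyword arguments are hypothetical, and how the web server hands the requested path to the script (and how the image is then sent back) is left out.

```ruby
require "fileutils"

# Copies a requested image from the attic into the live image directory
# the first time it is asked for. Returns the restored path, or nil if
# the attic has no such file (a genuine 404).
def restore_from_attic(requested, attic_dir:, images_dir:)
  # Refuse path traversal such as "../../etc/passwd".
  attic_root = File.expand_path(attic_dir)
  source = File.expand_path(requested, attic_root)
  return nil unless source.start_with?(attic_root + File::SEPARATOR)
  return nil unless File.file?(source)

  target = File.join(images_dir, requested)
  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(source, target)
  target
end
```

Whatever is still left only in the attic after a sufficiently long period is the list of candidates for deletion.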

If your file manager supports it, try sorting your images directory by the files' "last accessed" date. Files that haven't been accessed in a long time most likely aren't used any longer.
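If your file manager cannot do this, the same check can be scripted. A small Ruby sketch (the function name and the example path are hypothetical); note that many filesystems are mounted with noatime or relatime, in which case access times are not updated reliably:

```ruby
# Lists image files under dir whose last-access time is older than cutoff,
# oldest first. The result is only as trustworthy as the filesystem's
# atime settings.
def stale_images(dir, cutoff)
  Dir.glob(File.join(dir, "**", "*.{jpg,png,gif}"))
     .select { |path| File.atime(path) < cutoff }
     .sort_by { |path| File.atime(path) }
end

# Example: images not read in roughly the last 90 days.
# puts stale_images("public/images", Time.now - 90 * 24 * 60 * 60)
```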
Along the same lines, you can also filter or grep through your web server's logs and make a list of the image files that it has served up in the last several months. Any images not in this list are likely unused.
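The log-based check might look something like this in Ruby. It is a sketch that assumes the common/combined access-log format, where the request appears in double quotes as "GET /path HTTP/1.1"; both function names are hypothetical.

```ruby
require "set"

# Extracts the request paths of image files from access-log lines.
# Query strings (e.g. cache-busting "?123") are dropped.
def served_images(log_lines)
  log_lines.each_with_object(Set.new) do |line, seen|
    match = line.match(%r{"(?:GET|HEAD) (\S+?\.(?:jpg|png|gif))(?:\?\S*)? HTTP}i)
    seen << match[1] if match
  end
end

# Returns the known image paths that never appear in the log.
def unserved_images(all_images, log_lines)
  served = served_images(log_lines)
  all_images.reject { |path| served.include?(path) }
end
```

Here all_images is expected to hold URL paths such as "/images/logo.png", and log_lines can be something like File.foreach("access.log"). The sketch does not look at the status code, so requests that ended in a 404 also count as served.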

Related

Can you create URLs for files in sphinx regardless of where they are saved?

Can you change the location of 'rst' files in sphinx without changing their URIs? I'm working on a documentation where we want to move some files to different folders, without changing the URIs:
For Example: If you create a sphinx project with $ sphinx-quickstart and add some files and folders:
index.rst
/tutorials/howToFoo.rst
/scripts/
With the toctree in index.rst looking like this:
.. toctree::
   :maxdepth: 1
   :caption: Processing:
   :glob:

   scripts/*
   tutorials/*
Then after building the project with make html, you have a link in your browser as seen here: tutorials/howToFoo.html
If you want to save the file in a different folder:
index.rst
/tutorials/
/scripts/howToFoo.rst
Then the URL of your file howToFoo.rst changes depending on where it is saved:
scripts/howToFoo.html.
This is a problem because I don't want links to tutorials or scripts to break.
As the project aims to include many people, it is very likely that the file structure will change in the future.
Now my question: Can you create a setup where you can move the files without having to write redirects to their new location, every time you move them?
For Cross Referencing inside of Sphinx this is solved for example with targets, explained here:
https://docs.readthedocs.io/en/stable/guides/cross-referencing-with-sphinx.html#automatically-label-sections
But this doesn't help me because the link in the browser still stays the same.
What I want is a link SomeNeverchangingLinkFor_howToFoo.html regardless of where the file howToFoo.rst is saved.

I have to use "../" in Dreamweaver on some links. Is this normal?

I cannot find anything on the internet, probably because I'm not sure how to ask the question properly, but I cannot figure out why, on some images or links to pages, I have to put ../ in front of the file name for Dreamweaver to see the file. It's not every file, just some of them.
It depends on whether the pages or images are in a separate folder.
If your page is in root/pages/home/index.html and your image is in root/pages/home/images/image.jpg, then you only need images/image.jpg. But if the image is in root/pages/about/images/image.jpg, you need ../about/images/image.jpg, because the image is outside the page's own folder and .. steps up one level.
This is a good link: https://www.coffeecup.com/help/articles/absolute-vs-relative-pathslinks/
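To see how the relative path follows from the two locations, here is a small Ruby illustration using the same example paths (relative_link is a hypothetical helper name; Dreamweaver works this out for you when you browse to the file):

```ruby
require "pathname"

# Computes the relative link a page needs in order to reach a target file.
def relative_link(page, target)
  Pathname.new(target).relative_path_from(Pathname.new(page).dirname).to_s
end

page = "root/pages/home/index.html"
puts relative_link(page, "root/pages/home/images/image.jpg")   # images/image.jpg
puts relative_link(page, "root/pages/about/images/image.jpg")  # ../about/images/image.jpg
```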

In org-mode, how do I keep the original path to images when using #+INCLUDE:?

I can use:
#+INCLUDE:
to include an org file in another org file, which allows me to assemble, say, a website from various org files. I'm exporting from the C-c C-e exporter in org-mode 7.5.
I could maintain a quite complex publication this way. This modular approach is quite common in, e.g. LaTeX and Texinfo publications.
However, links to images no longer work from the #+INCLUDEd org files. What seems to be happening is that the path to the images is taken as being from the org file that I am exporting from, rather than the actual org file that references the image.
The only ways I can see to resolve this are to:
use a flat file structure; or
make the image path from the referencing file (which I might not know in advance) rather than itself.
Neither of these is really sustainable.
How do I tell org to use the correct image path from its own relevant org file rather than the parent org file?
From what I know of the exporter, INCLUDE files are inserted into the document before export. Therefore the content is part of the document before it starts following paths to reach any links to files (images).
After a bit of testing, it looks like you will need to use absolute file paths. Since you move between Windows and Linux, your best bet would be to use a consistent scheme on both, starting from your home directory.
That way you can write the Org link as [[~/path/to/image.jpg]], which will work on both systems (assuming you have set %HOME% on Windows).
Option 1 is potentially an alternative (although I agree it wouldn't be ideal at all), whereas the second option would have obvious pitfalls if you INCLUDE the file in more than one future document.

How do I generate files and then zip/compress with Heroku?

I sort of want to do the reverse of this.
Instead of unzipping and adding the collection files to S3, I want to do the following on a user's request:
1. generate a bunch of XML files
2. zip the XML files with some images (pre-existing images hosted on S3)
3. download the zip
Does anybody know a good way of doing this? I think I could manage this no problem on a normal machine, but Heroku complicates things somewhat in that it has a read-only filesystem.
From the Heroku documentation on the read-only filesystem:
There are two directories that are writeable: ./tmp and ./log (under your application root). If you wish to drop a file temporarily for the duration of the request, you can write to a filename like #{RAILS_ROOT}/tmp/myfile_#{Process.pid}. There is no guarantee that this file will be there on subsequent requests (although it might be), so this should not be used for any kind of permanent storage.
You should be able to pretty easily write your generated XML files to tmp/ and keep track of the names, download the S3 files and write them to the same directory, and (maybe?) invoke a zip command as long as the output is in tmp/, then serve the file to the browser with the correct MIME type to prompt a download. My only concern would be how big the files are and whether Heroku has an undocumented limit on what they'll allow in the tmp directory. Especially since you are only performing this action for a one-time download within a single request, I think you have a good chance of being able to do it.
Edit: Looking around a bit, you might be able to use something like RubyZip to create your zip file if you want to avoid calling system commands.
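The first step, writing the generated files into tmp/ under a per-process name as the documentation suggests, might be sketched like this. The function name, the tmp_root parameter and the Hash input shape are assumptions for illustration; fetching the S3 images and building the zip are not shown.

```ruby
require "fileutils"

# Writes each generated XML document into a per-process scratch directory
# under tmp_root and returns the list of paths written.
# documents: a Hash of file name => XML string (hypothetical input shape).
def write_xml_to_tmp(documents, tmp_root: "tmp")
  dir = File.join(tmp_root, "export_#{Process.pid}")
  FileUtils.mkdir_p(dir)
  documents.map do |name, xml|
    # Keep only the base name so every file stays inside the scratch dir.
    path = File.join(dir, File.basename(name))
    File.write(path, xml)
    path
  end
end
```

The returned paths are the names to keep track of for the zip step. Since nothing in tmp/ is guaranteed to survive past the request, the archive should be built and sent within the same request.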

Why have most of the files in a Dreamweaver site been put into a directory called 'upload'?

I cleaned up someone's style sheet for a Dreamweaver site by editing the CSS directly, and now the secretary is having trouble using her old template.
Most of the files in her site reside in subdirectories of the 'upload' directory. For example, I would have expected to see the stylesheet in
../assets/css/ etc.
but in fact I'm finding it in
../upload/assets/css/ etc.
In addition to assets, I am also finding Templates and images as subdirectories of 'upload'.
Do you know why this 'upload' directory was used?
I am considering two possible approaches.
(1) Make sure everything needed is in ../upload/ and remove the subdirectories that are directly in the root directory
(2) Edit the template to remove all references to ../upload/
Note that (2) appeals to me because the file structure will be simpler, but I wonder if the client has some sort of extension in her Dreamweaver that causes everything she uploads via FTP to be put into the 'upload' directory.
Note that so far I have copied my cleaned up css file over to ../upload/assets/ as a short-term solution. But they want to be able to make changes to their template, and add new pages, on their own in future.
Thanks.
The likely problem is how she has her FTP remote settings specified. It appears that it now points to the upload folder rather than the web root. Or, it could be that her FTP user account is tied to the upload folder rather than the web root.
