I'm trying to write a script to take a PDF and increase the brightness/contrast so that my scanned handwriting is actually readable. I am able to do this with Photoshop (which is really tedious), but I can't figure out which RMagick methods to use to produce a similar result.
Any pointers? Thanks for the help.
I ended up using Fred's ImageMagick scripts to make the handwriting readable; see http://www.fmwconcepts.com/imagemagick/
I did not use RMagick for this part; instead I just called ImageMagick's convert terminal command from Ruby. It is a little bit convoluted, but it worked for me. Some sample code is below:
localthres_script = '~/Downloads/test/localthresh.sh' # CONSTANT LOCATION
params = '-m 3 -r 25 -b 20 -n yes'
pdf = Magick::ImageList.new("#{dir}/#{pdf_name_wo_ext}.pdf")
i=1
pdf.each do |page|
  image_name = "#{pdf_name_wo_ext}_#{i}"
  puts "==> Enhancing images..."
  %x[#{localthres_script} #{params} #{dir}/#{image_name}.png #{dir}/PDF_SCRIPT/enhanced/#{image_name}.png]
  puts "==> Moving images..."
  %x[mv #{dir}/#{image_name}.png #{dir}/PDF_SCRIPT/original/#{image_name}.png]
  i = i+1
end # each
I know this isn't the cleanest code, but it worked for me.
Related
So I have a Ruby script (using Ruby because we have a library of pre-existing code that I need to use). From within Ruby I am using backticks to call Linux commands, specifically in this case the "mv" command. I am trying to move one file to another location but I keep getting the error message that x and y are "the same file" even though they are very clearly NOT the same file.
Here is the code in Ruby:
#!/usr/local/rvm/rubies/ruby-2.1.1/bin/ruby
masterFiles=[]
masterFiles << "/mnt/datadrive/Data Capture/QualityControl/UH_HRA_SVY/Scans and DataOutput/Data/UH_HRA_SVY_DATA.txt"
masterFiles << "/mnt/datadrive/Data Capture/QualityControl/UH_HRA_SVY_SPAN/Scans and DataOutput/Data/UH_HRA_SVY_SPAN_DATA.txt"
tm=Time.new.strftime("%Y%m%d")
masterFiles.each do |mf|
  if File.exist?(mf)
    qmf=39.chr + mf + 39.chr
    `cat #{qmf} >> /tmp/QM`
    savename=39.chr + \
      "/mnt/datadrive/Data Capture/QualityControl/UH_HRA_SVY/Scans and DataOutput/Data/DailyFiles/" + \
      File.basename(mf).gsub(".txt","_"+tm) + ".txt" + 39.chr
    `mv #{qmf} #{savename}`
  end
end
The error that I get is this:
mv: `/mnt/datadrive/Data Capture/QualityControl/UH_HRA_SVY_SPAN/Scans
and DataOutput/Data/UH_HRA_SVY_SPAN_DATA.txt' and `/mnt/datadrive/Data
Capture/QualityControl/UH_HRA_SVY/Scans and
DataOutput/Data/DailyFiles/UH_HRA_SVY_SPAN_DATA_20140530.txt' are the
same file
If I change this line:
`mv #{qmf} #{savename}`
To this:
puts "mv #{qmf} #{savename}"
And then run the output, it works as expected.
I am pretty sure that this has to do with spaces in the path. I have tried every combination of double-quoting, triple-quoting, quadruple-quoting, and backslashing I can think of to resolve this, but no luck. I have also tried using FileUtils.mv but get what is basically the same error, worded differently.
Can anybody help? Thanks a lot.
p.s. I realize it's entirely possible that I could be going about this in an entirely wrong-headed way, so feel free to point that out if so. However, I am trying to use the tools which I already have some knowledge of (cat, mv, etc) instead of re-inventing the wheel.
You could use FileUtils.mv
I often do aliases like so:
require 'fileutils'
def mv(from, to)
  FileUtils.mv(from, to)
end
Inside the mv() method I add safeguards, e.g. checking whether the file exists and whether the permissions allow the move.
If you still have problems with filenames that contain spaces, put the path in a double-quoted string, for example:
your_target_location = "foo/bar bla"
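As a rough sketch of the wrapper idea described above: the method name safe_mv, the directory names, and the specific checks are illustrative assumptions, not code from the answer. It shows that FileUtils.mv takes plain Ruby strings, so spaces in a path need no shell quoting. It does not attempt to diagnose the "same file" error from the question.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical wrapper: moves a file after a couple of basic checks.
# Paths are ordinary Ruby strings, so no shell quoting is involved.
def safe_mv(from, to)
  raise ArgumentError, "source does not exist: #{from}" unless File.exist?(from)
  FileUtils.mkdir_p(File.dirname(to)) # make sure the target directory exists
  FileUtils.mv(from, to)
  to
end

# Demonstration in a throwaway directory whose names contain spaces.
Dir.mktmpdir('mv demo') do |dir|
  source = File.join(dir, 'Scans and DataOutput', 'DATA.txt')
  FileUtils.mkdir_p(File.dirname(source))
  File.write(source, "sample\n")

  stamp  = Time.now.strftime('%Y%m%d')
  target = File.join(dir, 'Daily Files', "DATA_#{stamp}.txt")
  safe_mv(source, target)
  puts "moved: #{File.exist?(target)}"
end
```

Because no shell is involved, characters such as spaces, quotes, or semicolons in the path are passed through untouched.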
I need to cache an FTP folder locally in Ruby. Right now I'm using ftp_sync to download the FTP folder, but it's painfully slow. Do you know of any library that can download the folder's files in parallel?
Thanks!
The syncftp gem may help you:
http://rubydoc.info/gems/syncftp/0.0.3/frames
Ruby has a decent built-in FTP library in case you want to roll your own:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html
To download files in parallel, you can use multiple threads with timeouts:
Ruby Net::FTP Timeout Threads
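A minimal sketch of the threads-with-timeouts idea, assuming a small fixed pool of worker threads pulling from a queue. The helper name parallel_map, the worker count, and the timeout value are all made up for illustration; the actual fetch is left to the block, so the Net::FTP usage is shown only as a comment with placeholder host and file names.

```ruby
require 'timeout'

# Hypothetical helper: calls the block once per item on a small pool of
# threads, allowing each call at most `timeout` seconds.
# Returns a hash of item => block result (or the exception that was raised).
def parallel_map(items, workers: 4, timeout: 30, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        item = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        outcome = begin
          Timeout.timeout(timeout) { block.call(item) }
        rescue StandardError => e
          e # record the failure instead of killing the worker
        end
        lock.synchronize { results[item] = outcome }
      end
    end
  end
  threads.each(&:join)
  results
end

# Usage idea (not run here; host and file names are placeholders):
#   require 'net/ftp'
#   parallel_map(remote_files) do |name|
#     Net::FTP.open('ftp.example.com') do |ftp|
#       ftp.login
#       ftp.getbinaryfile(name, File.join('cache', name))
#     end
#   end

p parallel_map(%w[a b c], workers: 2) { |s| s.upcase }
```

Each call opens its own connection in the usage idea, since sharing one FTP control connection between threads is unlikely to work well.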
A great way to get parallel work done is Celluloid, the concurrent framework:
https://github.com/celluloid/celluloid
All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.
To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.
Segmented downloading can give massive speedups in some cases, such as downloading big log files where only a small percentage of the file has changed and all the changes are appends at the end of the file.
You can also consider shelling out to the command line. There are many tools that can help you with this. A good general-purpose one is "curl", which supports simple ranges for FTP files as well. For example, you can get the first 100 bytes of a document using FTP like this:
curl -r 0-99 ftp://www.get.this/README
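Building on the range request above, here is a rough sketch of how the append-only case might be handled: compare the size of the local copy with the remote size and request only the missing tail. The helper name missing_range is hypothetical, and obtaining the remote size is left out. The approach is only valid under the stated assumption that the remote file changes by appends alone.

```ruby
require 'tempfile'

# Hypothetical helper: given the remote size in bytes, work out which byte
# range of an append-only file still needs to be fetched.
# Returns nil when the local copy is complete, otherwise an open-ended
# range such as "500-" in the form accepted by curl's -r option.
def missing_range(local_path, remote_size)
  local_size = File.exist?(local_path) ? File.size(local_path) : 0
  return nil if local_size == remote_size
  # A local copy larger than the remote one suggests the remote file was
  # replaced, so start again from the beginning.
  return '0-' if local_size > remote_size
  "#{local_size}-"
end

Tempfile.create('log') do |f|
  f.write('12345')
  f.flush
  puts missing_range(f.path, 12).inspect
  puts missing_range(f.path, 5).inspect
end
```

The returned string could then be passed to curl -r, with the downloaded bytes appended to the local file.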
Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization. The rsync command has many optimizations to transfer just the changed data. For example rsync can sync a remote directory to a local directory like this:
rsync -auvC me@my.com:/remote/foo/ /local/foo/
Take a look at Curb. It's a wrapper around Curl, and can do multiple connections in parallel.
This is a modified version of one of their examples:
require 'curb'
urls = %w[
  http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p286.tar.bz2
  http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
]
responses = {}
m = Curl::Multi.new
# add a few easy handles
urls.each do |url|
  responses[url] = Curl::Easy.new(url)
  puts "Queuing #{ url }..."
  m.add(responses[url])
end
spinner_counter = 0
# the backslash must be doubled inside %w, otherwise it escapes the following space
spinner = %w[ | / - \\ ]
m.perform do
  print 'Performing downloads ', spinner[spinner_counter], "\r"
  spinner_counter = (spinner_counter + 1) % spinner.size
end
puts
urls.each do |url|
  print "[#{ url } #{ responses[url].total_time } seconds] Saving #{ responses[url].body_str.size } bytes..."
  File.open(File.basename(url), 'wb') { |fo| fo.write(responses[url].body_str) }
  puts 'done.'
end
That'll pull in both the Ruby and Python source (which are pretty big so they'll take about a minute, depending on your internet connection and host). You won't see any files appear until the last block, where they get written out.
I'm using paperclip at the moment to convert pdf files to images.
My code looks something like this
def convert_keynote_to_slides
  system('convert -size 640x300 ' + keynote.queued_for_write[:original].path + ' ' + KEYNOTE_PATH + '/' + File.basename( self.keynote_file_name )+"%02d.png")
  slide_basename = File.basename( self.keynote_file_name )
  files = Dir.entries(KEYNOTE_PATH).sort
  for file in files
    #puts file if file.include?(slide_basename +'-')
    self.slides.build("slide" => "#{file}") if file.include?(slide_basename)
  end
end
I'm sure this can be refactored to work better.
My questions are:
Is there a way to track ImageMagick's progress? If not, how would I move this into a delayed job? I'm worried this won't scale very well.
Can anyone point me toward making this code better or more efficient? KEYNOTE_PATH points to a directory in public where all of the images are held in a single folder, and I'm not sure I like that. It would probably be better to assign a random name to each file.
I hope you're doing extensive filtering of keynote.queued_for_write[:original].path and File.basename( self.keynote_file_name ) input variables, so you're not susceptible to shell meta-character injection attacks.
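One way to sidestep shell meta-character problems, sketched here under assumptions: build the command as an argument list and hand it to the multi-argument form of system, which runs the program directly without a shell. The helper name convert_argv and the sample paths are hypothetical; the convert options mirror the ones in the question.

```ruby
# Hypothetical helper: builds the argument list for ImageMagick's convert.
# Each array element reaches the program as one argument, with no shell
# interpretation of spaces, semicolons, quotes, and so on.
def convert_argv(input_path, output_dir, basename)
  output_pattern = File.join(output_dir, "#{basename}%02d.png")
  ['convert', '-size', '640x300', input_path, output_pattern]
end

# A deliberately hostile file name (illustrative only).
argv = convert_argv('/tmp/upload; rm -rf ~.pdf', '/tmp/slides', 'deck.pdf')
p argv

# To run it (not executed here, since convert may not be installed):
#   system(*argv) or raise 'convert failed'
```

This only removes the shell from the picture. ImageMagick applies its own interpretation to some file name forms, such as a leading dash or a trailing [0] page selector, so validating the names is still worthwhile.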
I created a Rails class with a video attachment, and I want to know how to get the length of a video that is uploaded to my application. How can I achieve that?
I couldn't get RVideo fully working; the gem hasn't been updated in four years. However, this works:
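before_post_process :get_video_duration
def get_video_duration
  # ffmpeg prints file information to stderr, so redirect it to stdout
  result = `ffmpeg -i #{self.video.to_file.path} 2>&1`
  # matches e.g. "Duration: 00:01:23.45"; the dot is escaped so it only matches a literal "."
  r = result.match(/Duration: ([0-9]+):([0-9]+):([0-9]+)\.([0-9]+)/)
  if r
    self.duration = r[1].to_i*3600+r[2].to_i*60+r[3].to_i
  end
end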
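The parsing step can be pulled out into a small function so it can be tried without ffmpeg or Paperclip. This is only a sketch: the helper name parse_ffmpeg_duration is made up, the sample line is an illustrative string rather than captured output, and unlike the callback above it keeps the fractional part of the seconds.

```ruby
# Hypothetical helper: extracts the duration in seconds from ffmpeg's
# informational output, or returns nil when no Duration line is present.
def parse_ffmpeg_duration(output)
  m = output.match(/Duration: (\d+):(\d+):(\d+)\.(\d+)/)
  return nil unless m
  hours, minutes, seconds = m[1].to_i, m[2].to_i, m[3].to_i
  fraction = "0.#{m[4]}".to_f # digits after the decimal point
  hours * 3600 + minutes * 60 + seconds + fraction
end

# Illustrative sample line (an assumption, not real captured output).
sample = '  Duration: 00:01:23.45, start: 0.000000, bitrate: 1205 kb/s'
puts parse_ffmpeg_duration(sample)
puts parse_ffmpeg_duration('no duration here').inspect
```

Returning nil for unparseable output lets the caller decide whether a missing duration is an error.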
Use ffmpeg and the RVideo gem, which is a thin Ruby wrapper around it. There are a lot of forks of the RVideo project; personally I use http://github.com/greatseth/rvideo because it supports capturing frames from video and saving them as images. When it's all set up, you can do this:
# For Paperclip 2
video_attributes = RVideo::Inspector.new(:file => self.upload.to_file.path, :ffmpeg_binary => "/usr/local/bin/ffmpeg" )
video_attributes.duration # duration in milliseconds
# For Paperclip 3
video_attributes = RVideo::Inspector.new(:file => self.upload.queued_for_write[:original].path)
video_attributes.duration # duration in milliseconds
I had to do this recently, and this was my approach:
before_post_process do
  file = data.queued_for_write[:original].path
  self.duration = Paperclip.run("ffprobe", '-i %s -show_entries format=duration -v quiet -of csv="p=0"' % file).to_f
end
ffprobe is installed by ffmpeg, so if you have that installed you're probably good to go.
I want to read only the first line of a file using Ruby in the fastest, simplest, most idiomatic way possible. What's the best approach?
(Specifically: I want to read the git commit UUID out of the REVISION file in my latest Capistrano-deployed Rails directory, and then output that to my tag. This will let me see at an http-glance what version is deployed to my server. If there's an entirely different & better way to do this, please let me know.)
This will read exactly one line and ensure that the file is properly closed immediately after.
strVar = File.open('somefile.txt') {|f| f.readline}
# or, in Ruby 1.8.7 and above: #
strVar = File.open('somefile.txt', &:readline)
puts strVar
Here's a concise idiomatic way to do it that properly opens the file for reading and closes it afterwards.
File.open('path.txt', &:gets)
If you want an empty file to cause an exception use this instead.
File.open('path.txt', &:readline)
Also, here's a quick & dirty implementation of head that would work for your purposes and in many other instances where you want to read a few more lines.
# Reads a set number of lines from the top.
# Usage: File.head('path.txt')
class File
  def self.head(path, n = 1)
    open(path) do |f|
      lines = []
      n.times do
        line = f.gets || break
        lines << line
      end
      lines
    end
  end
end
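An alternative sketch that avoids reopening the File class: File.foreach called without a block returns an enumerator, and first(n) stops reading after n lines. The helper name head_lines and the sample file contents are assumptions for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: returns up to n lines from the top of the file
# as an array (empty when the file is empty).
def head_lines(path, n = 1)
  File.foreach(path).first(n)
end

Tempfile.create('revision') do |f|
  f.write("abc123\nsecond line\nthird line\n")
  f.flush
  p head_lines(f.path, 2)
  # The first line without its trailing newline; nil for an empty file.
  p head_lines(f.path).first&.chomp
end
```

As with gets, an empty file yields no error here, just an empty array.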
You can try this:
File.foreach('path_to_file').first
How to read the first line in a ruby file:
commit_hash = File.open("filename.txt").first
Alternatively you could just do a git-log from inside your application:
commit_hash = `git log -1 --pretty=format:"%H"`
The %H tells the format to print the full commit hash. There are also modules which allow you to access your local git repo from inside a Rails app in a more ruby-ish manner although I have never used them.
first_line = open("filename").gets
I think the jkupferman suggestion of investigating the git --pretty options makes the most sense, however yet another approach would be the head command e.g.
ruby -e 'puts `head -n 1 filename`' #(backtick before `head` and after `filename`)
Improving on the answer posted by @Chuck, it is worth pointing out that if the file you are reading is empty, an EOFError exception will be raised. Catch and ignore the exception:
def readit(filename)
  text = ""
  begin
    text = File.open(filename, &:readline)
  rescue EOFError
  end
  text
end
first_line = File.readlines('file_path').first.chomp