Cloud Foundry File.open ruby rails resque class - ruby-on-rails

I am using Cloud Foundry. I upload a file and save it; my routine returns the path and filename:
/var/vcap/data/dea/apps/Dwarfquery-0-99065f0be8880d91916257931ed91162/app/tmp/region1-legends10-11-2012-20:53.xml
However, the scheduled Resque routine that tries to read it using File.open returns the following error:
Errno::ENOENT
Error
No such file or directory - /var/vcap/data/dea/apps/Dwarfquery-0-99065f0be8880d91916257931ed91162/app/tmp/region1-legends10-11-2012-20:53.xml
This is the path returned by the upload server. I have added require 'open-uri' at the top of my job class.
The line that is failing is:
File.open(fpath, 'r+') do |f|
where fpath is the file/directory path from the error above.

I'm not proficient with Ruby at all, but just to clarify:
Are the bit that uploads and the Resque routine part of the same "app" (in the Cloud Foundry sense)?
Are you trying to read the file soon after it has been uploaded, or long after (in particular, after your app has or could have been restarted)?
This is important because:
Each "app" has its own temporary folder and obviously one app can't access another app's filesystem. This also holds if you deployed your app with multiple "instances". Each instance is a separate process that has its own filesystem.
local filesystem storage is ephemeral and is wiped clean every time the app restarts
If you need to access binary data between apps, you will want to use some kind of storage (e.g. Mongo's GridFS) to have it persisted and visible by both apps.
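If you go the GridFS route, a minimal sketch with the mongo gem might look like this (the connection URI and database name are placeholders, not something from the question):
require 'mongo'
require 'stringio'

# Connect and grab a GridFS bucket (URI is a placeholder)
client = Mongo::Client.new(ENV['MONGODB_URI'] || 'mongodb://localhost:27017/myapp')
fs = client.database.fs

# In the web app: persist the uploaded XML instead of relying on the local tmp dir
file_id = File.open(fpath, 'rb') { |io| fs.upload_from_stream(File.basename(fpath), io) }

# In the Resque job: read it back by id, whichever instance the job runs on
out = StringIO.new
fs.download_to_stream(file_id, out)
xml = out.string
The job would then be enqueued with the file id (or the filename) instead of a local path.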

Related

Heroku file.exists? cannot find file that is just saved

I'm transferring a WAV file from a server to storage after manipulation. I transfer the file to Heroku, then confirm the file exists before running the manipulation; however, File.exists? reports that the file does not exist. I feel it's a naming or path issue, but I cannot figure it out.
I save the file URL in the file object, which gives a URL like the example below:
/uploads/wav_file/wbWavAudioFile/116/REff7e0b513481000322f530c849ddcccd.wav
On the HTML page, I can access and read this, as well as download the file from the Heroku instance (I re-rake the files on deploy; this is proof-of-concept work and will use persistent storage later).
However, if I call
if File.exists?(Rails.root + call.wbAudioFile.url)
  puts "file exists"
  # do some manipulation to file
else
  puts "file DOES NOT EXIST"
end
I get "file DOES NOT EXIST"; the check falls through to the else branch.
Is there a case-sensitivity issue? A / instead of \ issue?
Or am I needing to define the path differently?
Advice appreciated.
I circumvented this issue after considering that the production model would post the files to a bucket, so I just coded for the production specification instead of testing locally.
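For anyone hitting this locally rather than moving straight to a bucket: uploader URLs like /uploads/... are normally served out of the public directory, so a sketch of the existence check under that assumption (reusing call.wbAudioFile.url from the question) would be:
# The URL is relative to public/, so join it under public rather than
# directly under Rails.root (this assumes a CarrierWave-style uploader).
local_path = Rails.root.join('public', call.wbAudioFile.url.sub(%r{\A/}, ''))

if File.exist?(local_path)
  puts "file exists"
  # do some manipulation to file
else
  puts "file DOES NOT EXIST"
end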

Should I delete uploaded files from the file system on my own?

I have a Rails app where the user can upload files. The files get uploaded to an external cloud service by background jobs. It's vital for my app that the files don't remain in the file system after they've been uploaded. Not necessarily removed right away, but in general they must not remain in the file system.
Should I delete them on my own? Or will they get deleted automatically?
Also, while debugging my app, I noticed this for an attachment param:
[2] pry(#<MyController>)> my_params.tempfile.path
"/var/folders/qr/0v5z71xn7x503ykyv1j6lkp00000gn/T/RackMultipart20181007-10937-3ntmgg.png"
That file gets stored not in "/tmp" but in "/var" and that means that it won't get deleted automatically, right?
Note that I'm not using paperclip for this task.
You are right, the files won't get deleted automatically.
You have to delete the file explicitly at some point in time.
It depends on how you set it up. If you used Tempfile to save it, then yes, the file will be deleted when the object is garbage collected. If not, it probably won't be deleted.
If the files get stored on an external service, it might be worth setting up Active Storage, which allows you to upload directly to external storage providers without the file ever touching your server.
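If you do delete explicitly, one place to do it is the background job itself, in an ensure block so the local copy goes away even when the upload fails. A minimal Resque-style sketch (the job name, queue, and upload_to_cloud call are placeholders):
require 'fileutils'

class UploadAttachmentJob
  @queue = :uploads

  def self.perform(file_path)
    upload_to_cloud(file_path)  # placeholder for the external-service upload
  ensure
    # Remove the local copy whether or not the upload succeeded,
    # so nothing lingers in the file system.
    FileUtils.rm_f(file_path)
  end
end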

What is the recommended approach to parse a CSV file stored in S3?

I am using the aws-sdk gem to read a CSV file stored in AWS S3.
Referencing the AWS doc. So far I have:
Aws::S3::Resource.new.bucket(ENV['AWS_BUCKET_NAME']).object(s3_key).get({ response_target: "#{Rails.root}/tmp/items.csv" })
In Pry, this returns:
output error: #<IOError: closed stream>
However, navigating to tmp/, I can see the items.csv file and it contains the right content. I am not certain whether the return value is an actual error.
My second concern. Is it fine to store temporary files in "#{Rails.root}/tmp/"?
Or should I consider another approach?
I can load the file in memory and then CSV.parse. Will this have implications if the CSV file is huge?
I'm not sure how to synchronously return a file object using the aws gem.
But I can offer some advice on the other topics you mentioned.
First of all, /tmp: I've found that saving files there is a working approach. On AWS, I've used this directory to create a local LRU cache for S3-stored images. The key thing is to preempt the situation where the file has been automatically deleted; the file needs to be refetched if this happens. By the way, Heroku has a 'read-only filesystem' but still permits you to write into /tmp.
The second part is the question of synchronously returning a file object.
While it may be possible to do this using the S3 gem, I've found success fetching it over HTTP using something like open-uri or mechanize. If it's not supposed to be a publicly-available asset, you can change the permissions on S3 to restrict access to your server.
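A sketch of the download-then-stream approach, reusing the same call from the question (this assumes the aws-sdk-s3 flavour of the gem; the bucket and key are whatever you already use), which keeps even a huge CSV out of memory:
require 'aws-sdk-s3'  # or require 'aws-sdk' with the older all-in-one gem
require 'csv'

target = Rails.root.join('tmp', 'items.csv')

# Save the object straight to a local file instead of buffering it in memory
Aws::S3::Resource.new
  .bucket(ENV['AWS_BUCKET_NAME'])
  .object(s3_key)
  .get(response_target: target.to_s)

# Stream the rows one at a time so file size is not an issue
CSV.foreach(target, headers: true) do |row|
  # process each row here
end
The "output error" shown in Pry is most likely just Pry failing to inspect the returned response object, whose stream has already been closed after writing to the target file; the download itself succeeded, which is why items.csv has the right content.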

Creating a dashboard using csv files

I am trying to create a dashboard using CSV files, Highcharts.js, and HTML5. In a local development environment I can render the charts using CSVs both on my file system and hosted on the web. The current goal is to deploy the dashboard live on Heroku.
The CSVs will be updated manually - for now - once per day, in a consistent format as required by Highcharts. The web application should be able to render the charts with these new, "standardized" CSVs whenever the dashboard page is requested. My question is: where do I host these CSVs? Do I use S3? Do I keep them on my local file system and manually push the updates to Heroku daily? If the CSVs are hosted on another machine, is there a way for my application (and only my application) to access them securely?
Thanks!
Use the carrierwave_direct gem to upload the file directly from the client to an Amazon S3 bucket.
https://github.com/dwilkie/carrierwave_direct
You basically give the trusted, logged-in client a temporary key to upload the file (and nothing else), and the client then returns information about the uploaded file to your web app. Make sure you set the upload to be private to prevent any third party from trying to brute-force find the CSV. You will then need to create a background worker to do the actual work on the CSV file. The gem has some good docs on how to do this.
https://github.com/dwilkie/carrierwave_direct#processing-and-referencing-files-in-a-background-process
In short, in the background process you will download the file temporarily to Heroku, parse it, get the data you need, and then discard the Heroku copy (and, if you want, the copy on S3). This way you get around the Heroku issue of no permanent file storage, and the issue of tied-up dynos during uploads, because there is nothing like NGINX handling file uploads on Heroku.
Also make sure that the file size does not exceed the available memory of your worker dyno, otherwise it will crash. Since you don't seem to need to worry about concurrency, I would suggest https://github.com/resque/resque.
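On the "keep it private" point, a sketch of a CarrierWave/fog configuration that marks uploads as non-public (carrierwave_direct layers its own options on top of this; see its README):
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_credentials = {
    provider:              'AWS',
    aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
  }
  config.fog_directory = ENV['AWS_BUCKET_NAME']
  config.fog_public    = false  # keep the CSVs private so they can't be guessed at
end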

Heroku: Serving Large Dynamically-Generated Assets Without a Local Filesystem

I have a question about hosting large dynamically-generated assets and Heroku.
My app will offer bulk download of a subset of its underlying data, which will consist of a large file (>100 MB) generated once every 24 hours. If I were running on a server, I'd just write the file into the public directory.
But as I understand it, this is not possible with Heroku. The /tmp directory can be written to, but the guaranteed lifetime of files there seems to be defined in terms of one request-response cycle, not a background job.
I'd like to use S3 to host the download file. The S3 gem does support streaming uploads, but only for files that already exist on the local filesystem. It looks like the content size needs to be known up-front, which won't be possible in my case.
So this looks like a catch-22. I'm trying to avoid creating a gigantic string in memory when uploading to S3, but S3 only supports streaming uploads for files that already exist on the local filesystem.
Given a Rails app in which I can't write to the local filesystem, how do I serve a large file that's generated daily without creating a large string in memory?
${RAILS_ROOT}/tmp (not /tmp; it's in your app's directory) lasts for the duration of your process. If you're running a background DJ, the files in tmp will last for the duration of that process.
Actually, the files will last longer; the reason we say you can't guarantee availability is that tmp isn't shared across servers, and each job/process can run on a different server depending on cloud load. You also need to make sure you delete your files when you're done with them, as part of the job.
-Another Heroku employee
Rich,
Have you tried writing the file to ./tmp then streaming the file to S3?
-Blake Mizerany (Heroku)
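A sketch of that suggestion with today's aws-sdk-s3 gem (the key name and the write_daily_export helper are placeholders): generate the dump on disk in the job's tmp directory, then hand the file to upload_file, which switches to multipart transfer for large objects, so the 100 MB+ file never has to live in memory as one string:
require 'aws-sdk-s3'

path = Rails.root.join('tmp', 'daily-export.csv')

# Generate the dump on disk inside the same job/process that uploads it,
# since tmp is only guaranteed for the life of this process.
File.open(path, 'w') { |f| write_daily_export(f) }  # placeholder generator

Aws::S3::Resource.new
  .bucket(ENV['AWS_BUCKET_NAME'])
  .object('exports/daily-export.csv')
  .upload_file(path.to_s)  # handles multipart upload for big files

File.delete(path)  # clean up when done, as the Heroku answers above advise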
