Download Excel file from web url using Ruby on Rails - ruby-on-rails

I am trying to create an application that crawls a website providing free financial data in .xlsx format. They upload files once a month and not always on the same day.
Is it possible to download any new files from a specific URL and dump it into my S3 bucket, before reading it into a database? I have read up about creating a worker using Sidekiq. I expect that this will play a crucial part in the process.
Can anybody perhaps give some advice or point me to a tutorial that can help?

Yes, you can, and you don't even need Sidekiq.
Take a look at AWS SDK for Ruby, and do the following things:
Just write a ruby script that downloads the xlsx files then upload to S3. Be sure the script starts with #!/usr/bin/env ruby, and give it execute permission.
Add this script to your crontab jobs, and make it run everyday.

Related

How to use File.new() in Rails app to create a file?

I would like to create some_javascript_file.js after a user submits a form and save this file in the public directory of my app.
When testing this localy I can simply use File.new("./public/some_javascript_file.js", "w") to accomplish this but this solution doesn't work when the app is deployed to Heroku.
Please would you have any suggestions? Thank you
This is because of how Heroku works. Basically, as stated in the docs, the filesystem is ephemeral and you can't rely on it. If you want persistence, you should upload this JS file to some external service, such as AWS S3. Or you might want to deploy your app to a different environment, such as self-managed VPS, where filesystem is real and your file won't disappear.

Best AWS Storage Option for Exporting Directories as .zip Files?

I'm brand new to AWS products, ruby on rails, web development, and coding of any type. For my first project after a quick (and dirty) bootcamp, I'm trying to build a ruby-on-rails website that stores images and allows the user to download them as a zip file. I used the RubyZip gem to accomplish this in my EC2 dev environment, but I have deployed to Elastic Beanstalk with S3 file storage, and the RubyZip gem seems unable to handle this structure without traditional directory targets for zipping.
My question is what's the best setup to achieve this functionality in EB? Disregarding the ruby constraint, zipping an S3 directory seems tricky. Should I move to EFS or another storage system? I plan to erase the folders regularly, and limit them to ~100 photos, so long term and large size storage are not a concern. Thanks very much!
Edit: I am attached to Ruby (only language I know), but not RubyZip, AWS, or much of anything else if they are not the best approach for this task.
I think you're on the right path as far as using S3 as a solution. The problem that your facing is that when you're interacting with S3 it's not like a folder on your local system, instead you're hitting the S3 API to interact with the files. (upload, edit, delete ect). This will be a problem you'll encounter with every AWS based storage solution.
I think the solution, in your case, is to fetch all of the photos and download them to a temporary folder on your local system. Then, you could zip them using Ruby, locally. After it's zipped, upload it back to S3.
Edit: By locally I mean onto the server where the Ruby application is running (not client-side)

Writing to MS Access files on/via Rails app hosted on Heroku

I'm trying to determine whether it's possible to write to an MS Access mdb file using a rails app hosted on heroku?
I'm guessing ODBC is the way to go, but am not sure of which gem etc as well as where to temporarily stick the mdb file while the rails app is operating on it.
The end result is I want to upload a csv and an access file and have the rails app import bits of the data into certain spots of the access file, then spit the edited mdb file back to the user as a download.
Any thoughts on how to attack this?

temporary file access using carrierwave rails?

Hi I am a ROR developer and developing an ROR application where user can upload file and share the URL to download.
I am using carrierwave for file upload and storing the file in my application (not in AWS).
I want to give permission for the user to access the upload file only for a certain period of time (for ex:1 hour).
please help me to achieve this without creating any cron jobs or before_filter method?
I found AWS has default feature for this but I am not storing the file in AWS S3. I am storing it in my local application folder, so please guide me how I can achieve this.

Uploading & Unzipping files to S3 through Rails hosted on Heroku?

I'd like to be able to upload a zip file to my Rails application that contains a number of images. Then I'd like Rails to unzip that file and attach the images inside to my Photo's model via Paperclip, so that they are ultimately stored on my Amazon S3 account (configured through Paperclip).
I'd like do do this all on my Rails site hosted on Heroku, which unfortunately doesn't allow local storage of any kind (so far as I'm aware) to temporarily do the unzipping before the Paperclip parsing.
How would I do this??
I would recommend uploading directly to S3 which bypasses Heroku entirely so you're not restricted to the 30 second request timeout they enforce (which drops your uploads after that time is hit) or the 1gb /tmp directory limit. After the file is uploaded, you can make a POST to your Rails app with the file's name and location and then do your unzipping operation. If you'd like to use Paperclip for post-processing, I have attached a link below. If you end up going the route of uploading directly to S3 which offloads the work from your Rails server, please check out my sample projects:
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
Here is the link for the Paperclip post processing for an example like images:
http://www.railstoolkit.com/posts/fancyupload-amazon-s3-uploader-with-paperclip
dmagkic is correct about the rails_root/tmp. I recommend something like the following:
Upload files through heroku to S3
Setup a background job to zip the files (store the file names that you need to group)
run the BJ that downloads the files from S3, zips them, sends the zip to S3, removes the unzipped files.
That way your application will still be responsive'ish during the upload process.
If you try to upload multiple files, you COULD write to /tmp, but just make sure that all the files come across in the same post request.
Heroku does allow writing to #{RAILS_ROOT}/tmp.
But you need to take in mind that file will be there only as long as request lasts. Probably longer, but that is not guaranteed. You could try to block request while you unzip and send to S3, but you should take care of the time it takes.
It sounds to me like you need some flash uploader that can unzip and send to S3, without Heroku.

Resources