I want to provide an option to import data via a spreadsheet. I don't want to store it permanently (hence there is no need for a storage service like S3). What would be the most efficient and scalable way of doing this? Where can I temporarily store this file while it is being processed? Here's what should happen:
User uploads spreadsheet
My backend processes it and updates the DB
Discard the spreadsheet
My 2 requirements are efficiency and scalability.
If I were you, I would look for a way to parse the XLS/CSV on the front end and send JSON to your backend. This way you push the slow/intensive work to the client (scalability) and process only JSON on the server.
You can start here:
https://stackoverflow.com/a/37083658/1540290
I'm assuming you have a form with a file input to pick the xls file you want to process like this:
<input id="my_model_source" type="file" name="my_model[source]">
To process the xls file you could use the roo gem.
Option 1:
In some controller (where you are processing the file) you can receive the file like this: params[:my_model][:source]. This file will be an ActionDispatch::Http::UploadedFile instance. This class has an instance method, path, that gives you the path of a temp file to work with.
So, with the roo gem, you can read it like this:
xls = Roo::Spreadsheet.open(params[:my_model][:source].path, extension: :xlsx)
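For example, here is a minimal sketch of how the controller side of Option 1 could look. MyModel, the column mapping, and the redirect target are illustrative assumptions, not something from the question:

class ImportsController < ApplicationController
  def create
    upload = params[:my_model][:source] # ActionDispatch::Http::UploadedFile
    xlsx = Roo::Spreadsheet.open(upload.path, extension: :xlsx)

    # Walk the rows (row 1 is assumed to be the header) and update the DB.
    (2..xlsx.last_row).each do |i|
      name, amount = xlsx.row(i) # row(i) returns an array of cell values
      MyModel.create!(name: name, amount: amount)
    end

    # No cleanup needed here: Rack removes its uploaded temp file after the request.
    redirect_to root_path, notice: "Import finished"
  end
end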
Option 2:
Option 1 will work if your import process is not too heavy.
If it is too heavy, you can use Active Job to handle the processing in the background.
If you choose Active Job, you:
will lose the opportunity to use ActionDispatch::Http::UploadedFile's path method. You will need to generate the temp file on your own. To achieve this you could use the cp command to copy the file at ActionDispatch::Http::UploadedFile's path to wherever you want; after using it you can delete it with the rm command (see the sketch below)
will lose the real-time response. To handle this you could use the Job Notifier gem
I have tried to show roughly what paths you can take.
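To make Option 2 concrete, here is a rough sketch of that flow. It uses Ruby's FileUtils rather than shelling out to cp/rm, and SpreadsheetImportJob and MyModel are assumed names for this sketch:

# Controller side: copy the Rack temp file somewhere stable, then enqueue.
class ImportsController < ApplicationController
  def create
    upload  = params[:my_model][:source]
    tmp_dir = Rails.root.join("tmp", "imports")
    FileUtils.mkdir_p(tmp_dir)
    tmp_path = tmp_dir.join("#{SecureRandom.uuid}.xlsx")
    FileUtils.cp(upload.path, tmp_path) # the Rack temp file vanishes after the request

    SpreadsheetImportJob.perform_later(tmp_path.to_s)
    head :accepted
  end
end

# app/jobs/spreadsheet_import_job.rb
class SpreadsheetImportJob < ApplicationJob
  queue_as :default

  def perform(path)
    xlsx = Roo::Spreadsheet.open(path, extension: :xlsx)
    (2..xlsx.last_row).each do |i|
      name, amount = xlsx.row(i)
      MyModel.create!(name: name, amount: amount)
    end
  ensure
    FileUtils.rm_f(path) # discard the spreadsheet once processed
  end
end

The Job Notifier gem mentioned above can then be used to tell the user when the import has finished.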
Related
I need to know if I'm implementing the correct procedure when sending an email with an attached .xls document.
The attached .xls document would simply have the information provided in an object, i.e. person.first_name, person.last_name. The xls file will render the full names of the list.
I have a front-end React.js with a back-end Rails API.
I actually have it working in its simplest form, but I'm not sure if this would be the best way.
Please let me know if there is a more efficient way of doing this.
Currently I have this setup: in my React action creator I have a fetch call to a custom “list” method on my backend controller. In this controller I'm creating and writing the file like this:
File.open("new_file", "w+") do |f|
  @list.names do |name, data|
    f.puts("#{name.first_name} #{name.last_name}")
  end
end
The above code will create a file in the root of my application, which I'm not sure is best practice.
After this code runs, the mailer sends out the email with the proper xls file attached.
My question is: what do I do with this newly created file in the root of my Rails application? Is it normal to have it there? Every time this runs the file is overwritten, which is okay in my opinion, but what if two different people on different devices run the code at the same time? Will there be a chance of the lists being mixed up and one user getting the wrong list? I just feel it's not right to create a file in my back-end Rails API whenever a user needs a list emailed to them, even if I delete it right after it's sent by the mailer.
Thank you for your help.
How about creating temporary files with Ruby's Tempfile API?
Or, if there is no need to reuse the file, use send_data.
If you have thousands of files that need caching, it is worth considering something like the official Rails cache API (you can cache any kind of file, not just HTML).
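To tie this back to the mailer question above, here is a short sketch of the Tempfile route. ListMailer, names_email and current_user are assumed names; @list and the name formatting come from the question:

# Controller action: build the list in a per-request temp file so two
# concurrent users can never overwrite each other's list.
def list
  xls_content = nil
  Tempfile.create(["names", ".xls"]) do |f|
    @list.names do |name, _data|
      f.puts("#{name.first_name} #{name.last_name}")
    end
    f.rewind
    xls_content = f.read
  end # Tempfile.create deletes the file when the block exits

  ListMailer.names_email(current_user, xls_content).deliver_later
  head :ok
end

# Mailer: attach the content directly, nothing is left on disk.
class ListMailer < ApplicationMailer
  def names_email(user, xls_content)
    attachments["names.xls"] = xls_content
    mail(to: user.email, subject: "Your list")
  end
end

If the file never needs to exist on disk at all, the send_data suggestion goes one step further: build the string in memory and either stream it to the browser with send_data or hand it straight to attachments in the mailer.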
Is it possible to access the information being saved into a Rails log file without reading the log file? To be clear, I do not want to send the log file as a batch process; rather, I want every event that is written into the log file to also be sent, via a background job, to a separate database.
I have multiple apps running in Docker containers and wish to save the log entries of each into a shared telemetry database running on the server. Currently the logs are formatted with lograge, but I have not figured out how to access this information directly and send it to a background job to be processed (as stated before, I would like direct access to the data being written to the log, and to send that via a background job).
I am aware of the command Rails.logger.instance_variable_get(:@logger); however, what I am looking for is the actual data being saved to the logs, so I can ship it to a database.
The reasoning behind this is that there are multiple Rails APIs running in Docker containers. I have an after_action set up to run a background job that I hoped would send just the individual log entry, but this is where I am stuck. Sizing isn't an issue, as the data stored in this database is to be purged every two weeks. This is more a tool for the in-house devs to track telemetry through a dashboard. I appreciate you taking the time to respond.
You would probably have to go through your app code and manually save the output from the logger into a table/field in your database inline. Theoretically, any data that ends up in your log should be accessible from within your app.
Depending on how much data you're planning on saving, this may not be the best idea, as it has the potential to grow your database extremely quickly (it's not uncommon for apps to create GBs worth of logs in a single day).
You could write a background job that opens the log files, searches for data, and saves it to your database, but the configuration required for that will depend largely on your hosting setup.
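For illustration only, here is a minimal sketch of such a job; the LogEntry model, the log path, and the assumption that each lograge line is a JSON document are all guesses about the setup:

# app/jobs/log_harvest_job.rb
# Sketch: read lograge-style JSON lines from a log file and copy them
# into a LogEntry table (payload is assumed to be a json/jsonb column).
class LogHarvestJob < ApplicationJob
  queue_as :default

  def perform(log_path = Rails.root.join("log", "#{Rails.env}.log").to_s)
    File.foreach(log_path) do |line|
      entry = JSON.parse(line) rescue nil # skip anything that is not a JSON line
      next unless entry

      LogEntry.create!(payload: entry)
    end
  end
end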
So I got a solution working, and in fairness it wasn't as difficult as I had thought. As I was using the lograge gem for formatting the logs, I created a custom formatter through the guide in this link.
As I wanted the JSON format, I just copied that formatter, but was able to put in the call to a background job at this point and also cleanse some data I did not want.
module Lograge
  module Formatters
    class SomeService < Lograge::Formatters::Json
      def call(data)
        # Drop keys we do not want to ship
        data.delete_if do |k|
          [:format, :view, :db].include?(k)
        end
        # Faktory job to ship the data
        LogSenderJob.perform_async(data)
        # Let the parent Json formatter produce the actual log line
        super
      end
    end
  end
end
This was just one solution to the problem, made easier because I was able to get the data formatted via lograge. Another solution would have been to create a custom logger and tell it to write to a database directly if necessary.
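For completeness, here is a hedged sketch of what the LogSenderJob called from the formatter might look like, assuming the faktory_worker_ruby gem and a TelemetryEvent model in the shared telemetry database:

# Sketch only: TelemetryEvent and its payload column are assumptions.
class LogSenderJob
  include Faktory::Job

  def perform(data)
    # perform_async JSON-serialises its arguments, so the lograge hash
    # arrives here with string keys.
    TelemetryEvent.create!(payload: data)
  end
end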
I have a CKAN website. I upload data manually to the datastore and it's working perfectly. However, my actual requirement is to automate the process. I want a job scheduler which automatically uploads data such as GeoJSON, Excel, CSV, PDF, etc. to the CKAN application.
Please provide inputs
Thanks
You could write a bash (or Python) script that calls the CKAN API using the ckanapi program. Use the action function package_create or, probably more likely, resource_create. This example, including uploading the file, is in the ckanapi README:
$ ckanapi resource_create package_id=my-dataset-with-files \
upload=@/path/to/file/to/upload.csv \
url=dummy-value # ignored but required by CKAN<2.6
If this is a regular, automatable thing then you probably don't want to add a new CKAN dataset each time, as that implies the metadata for that dataset is the same each time, and that doesn't sound helpful for the user; you probably want a new resource each time instead. If the only thing that changes between data files is the date, with everything else the same (the purpose, data structure, method of collection, people involved), then it makes more sense to create a single dataset where each update is a new resource in it.
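If the scheduler side ends up in Ruby rather than bash or Python, a rough sketch could look like the following; it simply wraps the ckanapi invocation quoted above for every file dropped into an outbox folder. The folder paths and dataset id are assumptions, and you should check the exact ckanapi arguments against your version:

#!/usr/bin/env ruby
# Sketch: push every file found in an outbox folder to CKAN via the ckanapi
# CLI (same invocation as the README snippet above), then archive it.
# Run it from cron or any job scheduler, e.g. hourly.
require "fileutils"

OUTBOX  = "/data/outbox"          # assumption: where exports are dropped
ARCHIVE = "/data/sent"
DATASET = "my-dataset-with-files" # the target CKAN dataset

FileUtils.mkdir_p(ARCHIVE)

Dir.glob(File.join(OUTBOX, "*")).each do |path|
  ok = system(
    "ckanapi", "resource_create",
    "package_id=#{DATASET}",
    "upload=@#{path}",
    "url=dummy-value" # ignored but required by CKAN<2.6
  )
  FileUtils.mv(path, ARCHIVE) if ok
end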
In my application, I have a textarea input where users can type a note.
When they click Save, there is an AJAX call to Web Api that saves the note to the database.
I would like for users to be able to attach multiple files to this Note (Gmail style) before saving it. It would be nice if the upload could start as soon as a file is attached, before the Note is saved.
What is the best strategy for this?
P.S. I can't use the jQuery Fine Uploader plugin or anything like that, because I need to give the files unique names on the server before uploading them to Azure.
Is what I'm trying to do possible, or do I have to make the whole 'Note' a normal form post instead of an API call?
Thanks!
This approach is file-based, but you can apply the same logic to Azure Blob Storage containers if you wish.
What I normally do is give the user a unique GUID when they GET the AddNote page. I create a folder called:
C:\TemporaryUploads\UNIQUE-USER-GUID\
Then any files the user uploads at this stage get assigned to this folder:
C:\TemporaryUploads\UNIQUE-USER-GUID\file1.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file2.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file3.txt
When the user does a POST and I have confirmed that all validation has passed, I simply copy the files to the completed folder, with the newly generated note ID:
C:\NodeUploads\Note-100001\file1.txt
Then delete the C:\TemporaryUploads\UNIQUE-USER-GUID folder
Cleaning Up
Now, that's all well and good for users who actually go ahead and save a note, but what about the ones who uploaded a file and closed the browser? There are two options at this stage:
Have a background service clean up these files on a scheduled basis. Daily, weekly, etc. This should be a job for Azure's Web Jobs
Clean up the old files via the web app each time a new note is saved. Not a great approach as you're doing File IO when there are potentially no files to delete
Building on RGraham's answer, here's another approach you could take:
Create a blob container for storing note attachments. Let's call it note-attachments.
When the user comes to the screen of creating a note, assign a GUID to the note.
When user uploads the file, you just prefix the file name with this note id. So if a user uploads a file say file1.txt, it gets saved into blob storage as note-attachments/{note id}/file1.txt.
Depending on your requirements, once you save the note you may move this blob to another blob container or keep it where it is. Since the blob has the note id in its name, searching for a note's attachments is easy.
For uploading files, I would recommend doing it directly from the browser to blob storage making use of AJAX, CORS and Shared Access Signature. This way you will avoid data going through your servers. You may find these blog posts useful:
Revisiting Windows Azure Shared Access Signature
Windows Azure Storage and Cross-Origin Resource Sharing (CORS) – Lets Have Some Fun
I am working on a Rails web application, running on a Heroku stack, that handles looking after some documents that are attached to a Rails database object. i.e. suppose we have an object called product_i of class/table Product/products, and product_i_prospectus.pdf is the associated product prospectus, where each product has a single prospectus.
Since I am working on Heroku, and thus do not have root access, I plan to use Amazon S3 to store the static resource associated with product_i. So far, so good.
Now suppose that product_i_attributes.txt is also a file I want to upload, and indeed I want to actually fill out information in the product_i object (i.e. the row in the table corresponding to product_i), based on information in the file product_i_attributes.txt.
In a sentence: I want to create, or alter, database objects, based on the content of static text files uploaded to my S3 bucket.
I don't actually have to be able to access them once they are in the bucket, strictly speaking; I just need to create some stuff out of a text file.
I have done something similar with csv files. I would not try to process the file directly at upload as it can be resource intensive.
My solution was to upload the file to S3 and then call a background job method (delayed_job, resque, etc.) that processed the csv after upload. You could then have the job delete the file from S3 after processing, if you no longer need it.
For Heroku this will require that you add a worker (if you don't already have one) to process the background jobs that will process the text files.
Take a look at the aws-sdk-for-ruby gem. This will allow you to access your S3 bucket.
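A rough sketch of that flow follows; in current SDK versions the gem is published as aws-sdk-s3, and the job name, the bucket/key handling, and the "key: value" format of the attributes file are assumptions:

# app/jobs/attributes_import_job.rb
# Sketch: download the uploaded text file from S3, update the product row,
# then delete the object since it is no longer needed.
require "aws-sdk-s3"

class AttributesImportJob < ApplicationJob
  queue_as :default

  def perform(product_id, s3_key)
    object = Aws::S3::Resource.new(region: ENV["AWS_REGION"])
                              .bucket(ENV["S3_BUCKET"])
                              .object(s3_key)

    # Assumes one "attribute_name: value" pair per line in the text file.
    attrs = object.get.body.read.each_line.with_object({}) do |line, hash|
      key, value = line.split(":", 2)
      hash[key.strip] = value.to_s.strip if value
    end

    Product.find(product_id).update!(attrs)
    object.delete
  end
end

Enqueue it from wherever the upload completes, e.g. AttributesImportJob.perform_later(product.id, "product_i_attributes.txt"); on Heroku it will run on the worker dyno mentioned above.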