I'm building a project where the front end is React and the back end is Ruby on Rails with a Postgres DB. A required piece of functionality is the ability for users to export a large dataset: they get a table view and click "export", which sends a request to the backend that should create a CSV file and send it to the front end.
This is the query that displays the data in the table and how it's executed (using find_by_sql)
query = <<-SQL
SELECT * FROM ORDERS WHERE ORDERS.STORE_ID = ? OFFSET ? LIMIT ?
SQL
query_result = Order.find_by_sql([query, store_id.to_i, offset.to_i, 50])
Now, whenever a user clicks export, it will make a request to the same endpoint, except it'll set a flag to tell the backend that it wants a CSV file, and the limit will be much greater than 50... it could be hundreds of thousands to millions of records.
What is the best way to create a CSV to send to the front end, taking into account that the number of records will be large?
You have a couple of options:
Create a temporary file, use the standard CSV library to populate that temporary file, and then use send_file to dispatch that file to the user.
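For the tempfile option, a minimal controller-action sketch (the Order model and store_id parameter are taken from the question; the batch size is arbitrary):

require "csv"
require "tempfile"

def export
  tmp = Tempfile.new(["orders", ".csv"])
  CSV.open(tmp.path, "w") do |csv|
    csv << Order.column_names
    # find_each pulls rows in batches, so memory use stays flat even for
    # millions of records.
    Order.where(store_id: params[:store_id].to_i).find_each(batch_size: 10_000) do |order|
      csv << order.attributes.values_at(*Order.column_names)
    end
  end
  send_file tmp.path, filename: "orders.csv", type: "text/csv"
end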
Depending on the size of data and/or your server's ability to host large temporary files, spooling to a tempfile might take too long or be otherwise impractical. In that case, you might want to stream the CSV data as it's generated, which is more complicated to set up but lessens the impact on your server.
This article has some well-thought-out steps for setting up an interface for streaming data. As a bonus, it also delegates the act of generating the CSV to PostgreSQL itself. This will give you the best possible performance, at the expense of some code readability, but it should set you on your way.
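If you go the streaming route, the core of what the article describes looks roughly like this. This is only a sketch, assuming the pg adapter (so raw_connection is a PG::Connection) and ActionController::Live:

class OrdersController < ApplicationController
  include ActionController::Live

  def export
    response.headers["Content-Type"]        = "text/csv"
    response.headers["Content-Disposition"] = 'attachment; filename="orders.csv"'

    sql  = Order.where(store_id: params[:store_id].to_i).to_sql
    conn = ActiveRecord::Base.connection.raw_connection

    # Let PostgreSQL generate the CSV and push each chunk straight to the client.
    conn.copy_data("COPY (#{sql}) TO STDOUT WITH CSV HEADER") do
      while (chunk = conn.get_copy_data)
        response.stream.write(chunk)
      end
    end
  ensure
    response.stream.close
  end
end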
I am trying to extract a large amount of detail out of our Eloqua system using its API and got this API to work perfectly for single IDs: https://docs.oracle.com/en/cloud/saas/marketing/eloqua-rest-api/op-api-rest-1.0-data-contact-id-get.html
The problem is that I need to run this for a large number of IDs, and it would take a lot to run it for the entire population. Are there any bulk APIs that can extract all of the following details out of Eloqua/Contact for the entire population? I don't see any under the Bulk section of that page's documentation that meet this need.
contactid, company, employees, company_revenue, business_phone, email_address, web_domain, date_created, date_modified, address_1, address_2, city, state_or_province, zip_or_postal_code, mobile_phone, first_name, last_name, title
It's a multi-step process with the Bulk API, typically in the following fashion (a rough end-to-end sketch follows the steps):
Get a list of the current internal field names - useful for creating your export definition
Create an export definition and POST it here. There is a useful example on the page; you do not need filter criteria. Store the export ID somewhere.
Using your export definition id, create a sync. It will gather the data in the background and prepare it for you. Take note of the sync ID provided in the initial response.
Check on the sync status with your sync ID here. It should only take a couple of minutes - and there is a callback URL option in the previous step as well, if you don't want to keep polling.
Once your data is ready, use that sync id and request the data. Depending on how many rows were retrieved, you might need to paginate through the results using the offset query param. By default it will give you JSON, but I usually choose CSV (specify in the header).
If you need updated data, feel free to create a new sync using the same export definition id. You do not need to create a new export definition each time.
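A rough end-to-end sketch of those steps in Ruby, assuming the standard Bulk 2.0 endpoints and basic auth; the base URL, credentials and field list are placeholders you would adjust for your own pod:

require "net/http"
require "json"
require "uri"

BASE_URL = "https://secure.p01.eloqua.com/api/bulk/2.0" # your pod's base URL
AUTH     = ["SiteName\\user.name", "password"]          # placeholder credentials

def bulk_request(method, path, body = nil)
  uri = URI("#{BASE_URL}#{path}")
  req = Net::HTTP.const_get(method).new(uri, "Content-Type" => "application/json")
  req.basic_auth(*AUTH)
  req.body = body.to_json if body
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body) unless res.body.to_s.empty?
end

# 1. Create an export definition (only a few of the fields shown)
export = bulk_request(:Post, "/contacts/exports", {
  "name"   => "Contact export",
  "fields" => {
    "contactId" => "{{Contact.Id}}",
    "email"     => "{{Contact.Field(C_EmailAddress)}}",
    "firstName" => "{{Contact.Field(C_FirstName)}}"
  }
})

# 2. Kick off a sync for that export definition
sync = bulk_request(:Post, "/syncs", { "syncedInstanceUri" => export["uri"] })

# 3. Poll until the sync has finished staging the data
until %w[success error warning].include?(sync["status"])
  sleep 5
  sync = bulk_request(:Get, sync["uri"])
end

# 4. Page through the staged rows (request CSV via the Accept header if preferred)
rows = bulk_request(:Get, "#{sync['uri']}/data?offset=0&limit=1000")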
I have a user submission form that includes images. Originally I was using CarrierWave, but with that the image is sent to my server for processing first before being saved to Google Cloud Storage, and if the images are too large, the request times out and the user just gets a server error.
So what I need is a way to upload directly to GCS. Active Storage seemed like the perfect solution, but I'm getting really confused about how hard compression seems to be.
An ideal solution would be to resize the image automatically upon upload, but there doesn't seem to be a way to do that.
A next-best solution would be to create a resized variant upon upload using something like #record.images.first.variant(resize_to_limit: [xxx,xxx]) (using the image_processing gem), but the docs seem to imply that a variant can only be created upon page load, which would obviously be extremely detrimental to load time, especially if there are many images. More evidence for this is that when I create a variant, it's not in my GCS bucket, so it clearly only exists in my server's memory. If I try
#record.images.first.variant(resize_to_limit: [xxx,xxx]).service_url
I get a URL back, but it's invalid. I get a broken image when I try to display it on my site, and when I visit the URL, I get these errors from GCS:
The specified key does not exist.
No such object.
so apparently I can't create a permanent url.
A third best solution would be to write a Google Cloud Function that automatically resizes the images inside Google Cloud, but reading through the docs, it appears that I would have to create a new resized file with a new url, and I'm not sure how I could replace the original url with the new one in my database.
To summarize, what I'd like to accomplish is to allow direct upload to GCS, but control the size of the files before they are downloaded by the user. My problems with Active Storage are that (1) I can't control the size of the files on my GCS bucket, leading to arbitrary storage costs, and (2) I apparently have to choose between users having to download arbitrarily large files, or having to process images while their page loads, both of which will be very expensive in server costs and load time.
It seems extremely strange that Active Storage would be set up this way and I can't help but think I'm missing something. Does anyone know of a way to solve either problem?
Here's what I did to fix this:
1- I upload the attachment that the user added directly to my service provider (I use S3).
2- I add an after_commit job that calls a Sidekiq worker to generate the thumbs
3- My Sidekiq worker (AttachmentWorker) calls my model's generate_thumbs method
4- generate_thumbs will loop through the different sizes that I want to generate for this file
Now, here's the tricky part:
def generate_thumbs
  # Pre-generate the variants I actually use, so they are already in the
  # bucket before any page asks for them.
  [
    { resize: '300x300^', extent: '300x300', gravity: :center },
    { resize: '600>' }
  ].each do |size|
    self.file_url(size, true)
  end
end

def file_url(size, process = false)
  value = self.file # where file is my has_one_attached
  if size.nil?
    # No size requested: use the original upload.
    url = value
  else
    url = value.variant(size)
    if process
      # Force the variant to be generated and uploaded now.
      url = url.processed
    end
  end
  return url.service_url
end
In the file_url method, we only call .processed if we pass process = true. I've experimented a lot with this method to get the best possible performance out of it.
.processed checks with your bucket whether the file already exists, and if it doesn't, it generates the new file and uploads it.
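For completeness, a rough sketch of steps 2 and 3 above (the after_commit hook and the Sidekiq worker); the model name Attachment is just a placeholder for whatever record holds the attachment:

class Attachment < ApplicationRecord
  has_one_attached :file

  # Enqueue thumbnail generation once the record and its attachment are
  # committed, so the worker never races the upload.
  after_commit :enqueue_thumbs, on: [:create, :update]

  def enqueue_thumbs
    AttachmentWorker.perform_async(id)
  end
end

class AttachmentWorker
  include Sidekiq::Worker

  def perform(attachment_id)
    Attachment.find(attachment_id).generate_thumbs
  end
end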
Also, here's another question that I have previously asked concerning ActiveStorage that can also help you: ActiveStorage & S3: Make files public
I absolutely don't know Active Storage. However, a good pattern for your use case is to resize the image when it comes in. For this:
Let the user store the image in Bucket1
When the file is created in Bucket1, an event is triggered. Plug a function into this event
The Cloud Function resizes the image and stores it in Bucket2
You can delete the image in Bucket1 at the end of the Cloud Function, or keep it for a few days, or move it to cheaper storage (to keep the original image in case of issues). For these last two actions, you can use lifecycle rules to delete or change the storage class of files.
Note: You can use the same bucket (instead of Bucket1 and Bucket2), but a resize event will then be sent every time a file is created in the bucket, including for the files your function writes back. You can use Pub/Sub as middleware and add a filter on it to trigger your function only when the file is created in the correct folder. I wrote an article on this. A rough sketch of such a function follows.
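Here is what such a function could look like with the Ruby runtime; this is only a sketch, and the function name, the 800x800 limit, and the destination bucket ("my-resized-bucket") are placeholders:

require "functions_framework"
require "google/cloud/storage"
require "mini_magick"

# Triggered by the object-finalize event on Bucket1.
FunctionsFramework.cloud_event "resize_image" do |event|
  bucket_name = event.data["bucket"] # Bucket1, where the upload landed
  object_name = event.data["name"]

  storage = Google::Cloud::Storage.new
  source  = storage.bucket(bucket_name).file(object_name)

  local = "/tmp/#{File.basename(object_name)}"
  source.download(local)

  image = MiniMagick::Image.open(local)
  image.resize("800x800>") # shrink only, never enlarge
  image.write(local)

  # Store the resized copy in Bucket2; the original in Bucket1 can then be
  # deleted here or left to a lifecycle rule.
  storage.bucket("my-resized-bucket").create_file(local, object_name)
end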
I want to implement a flight search system in Rails 4.
And I found this resource,
My questions are:
I've downloaded the airports.dat file and it contains chunks of data. Do I need to import that data into psql? If yes, how?
If I just need the airport ID and name values, how do I selectively import them?
If I want to implement Ajax loading of airport names the way expedia.com does, would it be buggy (slow loading times) if I use a VPS like DigitalOcean?
Please advise me.
You can write a method that reads the file line by line and parses every line on the comma (","). Then you have enough information to insert into the database.
For example:
flight1 = '507,"Heathrow","London","United Kingdom","LHR","EGLL",51.4775,-0.461389,83,0,"E","Europe/London"'
Then you can get the ID by calling flight1.split(",")[0].
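Since airports.dat is plain CSV with quoted fields, Ruby's CSV library handles the quoting for you. A minimal import sketch, keeping only the ID and name (columns 0 and 1) and assuming a hypothetical Airport model:

require "csv"

# e.g. run from a rake task or db/seeds.rb
CSV.foreach(Rails.root.join("db", "airports.dat")) do |row|
  Airport.create!(openflights_id: row[0].to_i, name: row[1])
end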
The speed of your search function is determined by your search algorithm and how you implement the logic; it isn't affected by whether you use a VPS.
I've a "best practice" question on CouchDB (actually I'm using TouchDB a CouchDB port to iOS), when using CouchCocoa framework.
I need to delete a bunch of documents that I get via a query.
I know 3 ways to do this:
1) put all the documents into an NSArray, then use [CouchDatabase deleteDocuments:]
2) for each query row, call the delete method, like:
for (CouchQueryRow* row in query.rows)
[row.document DELETE];
3) create a query that emits the _id and _rev properties, add the _deleted property, then use the bulk update, like:
[couchDatabase putChanges:]
Which one is better performance-wise? Is there a better way to do it?
At the HTTP API level, the fastest way to achieve this is to run a single batch request that provides the _id and current _rev of all documents to be removed.
Your job is to make sure that CouchCocoa actually does this — I know that CouchCocoa will try to cache the _rev of documents it reads, so if you are deleting documents that have just been read, [CouchDatabase deleteDocuments:] should be enough, otherwise you will have to [CouchDatabase getDocumentsWithIDs:] first.
If your documents are very large, it might become better to get the _rev using a view instead of a bulk fetch. This forces you to use [CouchDatabase putChanges:] to perform the bulk deletion. I don't know where the document size threshold lies, so you will have to benchmark this one.
Of course, you also need to decide what happens when a conflict occurs.
I'm having a problem in sending (creating) an HL7 message using Mirth.
I want to read data from my patient table in SQL Server 2008 and, using that data,
I want to send a message to my destination connector, a file writer. I want my messages to get saved in the file writer's output directory.
So far I'm able to generate the message, but the size of the output file in my destination directory is increasing as the channel's polling time goes on.
Have I done something wrong in the transformer mapping?
UPDATE:
The size of the output file in my destination directory IS increasing (my .txt file starts at 1 KB and grows to 900 KB and beyond). This is happening because the same data is getting generated again and again, multiple times. For example, my generated message has one (MSH, PID, PV1, ORM) group for one row of data in my database, but the same MSH, PID, PV1 and ORM are getting generated multiple times.
If you are seeing the same data generated in your output directory multiple times, the most likely cause is that you are not doing anything to indicate to your database that a given record has been processed.
For example, if you have 1 record in your database: ["John", "Smith", "12134" ...] on the first poll, you will generate 1 message. If on the second poll you also have a second record ["Fred", "Jones", "98371" ...], you will generate TWO messages - one for John Smith and one for Fred Jones. And so on.
The key is to use the "Run On-Update Statement" of your Database Reader (Source) connector to update the database table you are polling with an indication that a given record has been processed. This ensures that the same record is not processed multiple times.
This requires that your source table have some kind of column to indicate the record has been processed. Mirth will not keep track of this for you - you must do it manually.
You can't have a file reader as a destination, so I assume you mean file writer. You say that "the size of my file in my destination is increasing." Is that a typo? Do you mean NOT increasing?
If it is increasing, then your messages are getting generated and you can view them to start your next round of troubleshooting...
If not, then you should look at the message log in the dashboard to see what is happening on a message-by-message basis - that would be the next place to troubleshoot.
You have to have a way of distinguishing which records to pull from the database, by filtering on some sort of status flag or possibly a timestamp. Then you have to use some sort of On-Update statement to mark those same records as processed.
For example:
Select id, patient, result from results where status_flag='N'
or
Select * from results where status_flag = 'N' and created_date >= '9/25/2012'
Then, in either a transformer step or the On-Update section of your Source, you would do something like:
Update results
set status_flag = 'Y' where id=$(id)
If you do not do something like this and you have Mirth polling at a certain interval, it will just keep pulling the same records over and over.
You have to set your connector type to Database Reader in the source.
You have to set your connector type to File Writer in the destination.
Then you can write your data to a file in a directory you have write access to.
While creating the HL7 template, you have to use the following code in the outbound message template:
MSH|^~\&|||