Rails generating/caching rarely changing datasets - ruby-on-rails

In a website I am working on I have cases with an expensive controller endpoint (5+ seconds between request received and having the response ready) that outputs data sourced from database tables that nearly never change.
The only time I expect the response to change is once every few months when I get new source data, or when redeploying the site with related code changes.
As such I am thinking of just creating the resulting JSON, etc. files to then serve statically. But is there a Rails way to do this, rather than just some custom CLI script that creates a bunch of files I then commit? Ideally still being able to use ActiveRecord etc. maybe as some sort of build/deploy time action?

Related

Rails save log data to database

Is it possible to access the information being saved into a Rails log file without reading the log file? To be clear, I do not want to send the log file as a batch process. Rather, for every event that is written into the log file, I want to also send it as a background job to a separate database.
I have multiple apps running in Docker containers and wish to save the log entries of each into a shared telemetry database running on the server. Currently the logs are formatted with lograge, but I have not figured out how to access this information directly and send it to a background job to be processed. (As stated before, I would like direct access to the data being written to the log, and to send that via a background job.)
I am aware of the command Rails.logger.instance_variable_get(:@logger), however what I am looking for is the actual data being saved to the logs so I can ship it to a database.
The reasoning behind this is that there are multiple Rails APIs running in Docker containers. I have an after action set up to run a background job that I hoped would send just the individual log entry, but this is where I am stuck. Sizing isn't an issue, as the data stored in this database is to be purged every 2 weeks. This is more a tool for the in-house devs to track telemetry through a dashboard. I appreciate you taking the time to respond.
You would probably have to go through your app code and manually save the output from the logger into a table/field in your database inline. Theoretically, any data that ends up in your log should be accessible from within your app.
Depending on how much data you're planning on saving, this may not be the best idea, as it has the potential to grow your database extremely quickly (it's not uncommon for apps to create GBs worth of logs in a single day).
You could write a background job that opens the log files, searches for data, and saves it to your database, but the configuration required for that will depend largely on your hosting setup.
So I got a solution working, and in fairness it wasn't as difficult as I had thought. As I was using the lograge gem for formatting the logs, I created a custom formatter by following the guide in this link.
As I wanted the JSON format, I just copied that formatter, but was able to put in the call for a background job at this point and also cleanse some data I did not want.
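module Lograge
  module Formatters
    # Custom formatter: strips unwanted keys, ships the entry to a job,
    # then lets the parent class produce the JSON log line.
    class SomeService < Lograge::Formatters::Json
      def call(data)
        # Drop keys that should not reach the log or the telemetry database.
        data.delete_if { |key, _value| [:format, :view, :db].include?(key) }
        # Faktory job to ship the data.
        LogSenderJob.perform_async(data)
        # Bare super passes the cleaned data on and returns the JSON string.
        super
      end
    end
  end
end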
This was just one solution to the problem, made easier because lograge had already formatted the data for me. Another solution would be to create a custom logger and tell it to write to a database where necessary.
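The custom logger alternative could look roughly like the sketch below. It relies on the fact that Ruby's standard Logger accepts any object that responds to write and close as its log device. ForwardingLogDevice and the shipped array are hypothetical names used for illustration; in the app described above, the block would presumably enqueue something like LogSenderJob instead of appending to an array.

```ruby
require "json"
require "logger"
require "stringio"

# Hypothetical log device: forwards every formatted entry to a sink
# before writing it to the usual destination.
class ForwardingLogDevice
  def initialize(io, &sink)
    @io = io       # normal destination, e.g. a file or $stdout
    @sink = sink   # called once per log entry, e.g. to enqueue a job
  end

  def write(message)
    @sink.call(message)
    @io.write(message)
  end

  def close
    # The caller owns the underlying IO, so leave it open.
  end
end

shipped = [] # stand-in for a job queue or a database table
device = ForwardingLogDevice.new(StringIO.new) { |line| shipped << line }
logger = Logger.new(device)

# Emit one JSON document per entry so the sink receives structured data.
logger.formatter = proc do |severity, time, _progname, msg|
  JSON.dump(severity: severity, time: time.utc.strftime("%FT%TZ"), message: msg) + "\n"
end

logger.info("GET /health 200")
```

Because the sink runs inline with every log call, it should do as little work as possible, which is why handing the line to a background job fits well here.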

Preventing Rails from connecting to database during initialization

I am quite new at Ruby/Rails. I am building a service that makes an API available to users and ends up with some files created in the local filesystem, without any need to connect to any database. Then, once every few hours, I want to run a piece of Ruby code that takes these local files, uploads them to Amazon S3 and registers their location in a Postgres database.
Right now both pieces of code live together in the same project. I am observing that every time a user does something, the system connects to the database. I have seen this answer, which recommends eliminating all traces of ActiveRecord from my code, but given that I want my background bookkeeping process to connect to the database, I am stuck on what to do.
Is it possible to define two different profiles (one with a database and one without) and specify which profile a certain function call should run on? Would this work?
I'm a bit confused by this. Rails does not magically connect to the database for kicks on every request; it does so because a specific request requires it, generally through ActiveRecord but not exclusively.
If your system is connecting every time you make a request, then that implies you have some sort of user metric or authorisation based code in there. Just killing off the database will cause this to fail, and likely you'll have to find it anyways, to then get your system to work. I'd advise locating it.
Things to look for are before_filters in controllers, or database session management, for example, or look for what is in the logs - the query should appear - and that will tell you what is being loaded, modified or whatnot.
It might even work to stop your database, just before doing a user activity, and see where the error leads you. Rinse and repeat until the user activity works, without the database.

How to display a holding screen whilst ActiveJob retrieves lots of data from an external API

I have an application that makes API requests to salesforce using restforce.
Specifically the application finds a contact object, returns IDs for all related objects and then pulls the full record for every related object based on their ID.
This takes a long time for two reasons:
There are a lot of requests to an external API. Each usually takes a fraction of a second to reply, and for some contacts there can be 500+ individual requests.
There is often a large amount of data being pulled back via each request.
All requests currently fall within the salesforce rest API limits but I'm getting timeout errors from my development server as it can take 5+ minutes for some of these requests to process.
Rails 4.2 - How best to handle this?
My question is how do I best get rails to handle this?
I can fire the API requests either from the controller (which definitely violates the skinny controllers) or from the view (via helper methods, which seems like a dodgy hack).
Ideally I'd like to get it running in a background job, but I'm unsure if I can just include all the authentication and other methods in a job in the same way I can include helper methods.
Even if I could get it to work in a background job, I'm unsure what best practice might be for the user experience. Ideally I'd like to route them to a page telling them to "hang tight, go get a coffee" with a progress bar, and then auto route them to the final page once the request is complete...
But I'm unsure how to generate a temporary display until a job has been completed?
Could anyone recommend any gems or strategies that might help me digest this problem?
You should definitely use a background job for this.
Give a database object to the job, which it will update to signal that it has finished, and maybe from time to time to indicate progress.
On the user side, simply tell them that the background job is working, possibly with a progress indicator, and display the result once the database object given to the job tells you it's ready.
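A minimal sketch of that pattern in plain Ruby follows. JobStatus and fetch_related_records are hypothetical names: the Struct stands in for a database row (in Rails, an ActiveRecord model with the same fields), the thread stands in for the background job, and the polling loop stands in for the holding page asking a small status endpoint whether the work is done.

```ruby
# Hypothetical stand-in for a database row that tracks a long-running job.
JobStatus = Struct.new(:total, :done, :finished, :result) do
  # Whole-number percentage of records processed so far.
  def percent
    total.zero? ? 100 : (done * 100 / total)
  end
end

# Stand-in for the background job: fetches each record and reports progress.
def fetch_related_records(ids, status)
  records = ids.map do |id|
    record = { id: id }  # a real job would call the external API here
    status.done += 1     # a real job would save the row here
    record
  end
  status.result = records
  status.finished = true
end

ids = (1..5).to_a
status = JobStatus.new(ids.size, 0, false, nil)
worker = Thread.new { fetch_related_records(ids, status) }

# The holding page would poll a status endpoint instead of running this loop.
sleep 0.01 until status.finished
worker.join

puts "#{status.percent}% complete, #{status.result.size} records"
```

Keeping the progress in a record rather than in the job itself means the controller never has to talk to the job directly; it only reads the row.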

rails 3 - generate a preview for a new post on the fly

For a rails 3 app I am building, a user gets to share a post which has numerous different parameters. Some parameters are optional, others are required. While the user is filling out the parameters, I want to generate a preview for how the post will look on the fly. Some parameters are URLs which need to be sent back to the server to process, so basically, the preview cannot be 100% generated client side.
I was wondering what is the best way to go about this. Since it could be a lot of data, I don't want to send all the data back to the server every time something changes to regenerate the preview. I would rather send only the data that has changed. But in this case, where is the rest of the data stored? In a session, perhaps? Also, I would prefer not to rebuild the model object with all the data every time. Is there a way to persist the model object that represents the post as it is being created?
Thanks.
How big is that "a lot of data"? If you send it all, does it have a noticeable impact on performance, or are you just imagining that it would?
As you haven't provided much information, here's the basic outline of what I would do:
Process client-side, as much as possible.
Data that can't be processed on the client: send it to the server (only that part, not the rest of it). Receive the result of processing and incorporate it into what you have already built.
No sessions, partially built models or any other state on the server. Stateless protocols are simple. Simplicity is a prerequisite for reliability.

Searching for a song while using multiple APIs

I'm going to attempt to create an open project which compares the most common MP3 download providers.
This will require a user to enter a track/album/artist name, e.g. Deadmau5, and will then pull the relevant prices from the APIs.
I have a few questions that some of you may have encountered before:
Should I have one server-side page that requests all the data, so that it is all loaded simultaneously? If so, how would you deal with timeouts or any other problems that may arise? Or should the page load, and then each price get pulled in one by one (ajax)? What are your experiences when running a comparison check?
The main feature will be to compare prices, but how can I be sure that the products are the same? I was thinking of running time and track numbers, but I would still have to set one source as my primary.
I'm making this a wiki, please add and edit any issues that you can think of.
Thanks for your help. Look out for a future blog!
I would check Amazon first. They will give you a SKU (the barcode on the back of the album; I think Amazon calls it an EAN). If the other providers use this, you can make sure they are looking at the right item.
I would cache all results into a database, and expire them after a reasonable time. This way when you get 100 requests for Britney Spears, you don't have to hammer the other sites and slow down your application.
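The caching idea can be sketched as below, assuming an in-memory store for brevity. ExpiringCache is a hypothetical class; a database table with an expires_at column would follow the same logic. The injectable clock is only there so that expiry can be exercised without waiting.

```ruby
# Hypothetical cache whose entries expire after a fixed number of seconds.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value, or runs the block and stores its result.
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @entries[key] = Entry.new(value, @clock.call + @ttl)
    value
  end
end

lookups = 0
cache = ExpiringCache.new(3600) # keep results for one hour
3.times do
  cache.fetch("britney spears") do
    lookups += 1 # stands in for querying the remote providers
    [{ provider: "example", price: 0.99 }]
  end
end
```

Here the block runs only on the first call; the two later calls are served from the cache until the hour is up.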
You should also make sure you are multithreading whatever requests you are doing server side. Curl, for instance, allows you to pull multiple URLs and assign a user-defined callback. I'd have the callback send some data so you can update your page as the results come back. GETTUNES => the curl callback returns some data for each URL while the connection is open, which you parse on the client side.
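The same idea can be sketched with plain Ruby threads rather than curl. Everything named here is hypothetical: the provider names, the prices, and the each_price helper are illustrations, and each lambda stands in for a real HTTP request. Results are yielded as they arrive, and a failing provider is reported instead of blocking the others.

```ruby
# Hypothetical provider lookups; each lambda stands in for an HTTP request.
providers = {
  "provider_a" => ->(query) { { query: query, price: 0.99 } },
  "provider_b" => ->(query) { { query: query, price: 1.29 } },
  "provider_c" => ->(_query) { raise "timed out" }
}

# Runs every lookup in its own thread and yields each outcome as it arrives.
def each_price(query, providers)
  results = Queue.new
  providers.each do |name, lookup|
    Thread.new do
      begin
        results << [name, lookup.call(query), nil]
      rescue StandardError => e
        results << [name, nil, e.message] # report the failure, keep going
      end
    end
  end
  # Queue#pop blocks until the next thread has pushed its outcome.
  providers.size.times { yield(*results.pop) }
end

prices = {}
errors = {}
each_price("Deadmau5", providers) do |name, price, error|
  if error
    errors[name] = error
  else
    prices[name] = price
  end
end
```

In a web page, the block is where each result would be pushed to the browser, so the slowest provider only delays its own row.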
