Sending data from an analytics engine to a Rails server - ruby-on-rails

I have an analytics engine which periodically packages a bunch of stats in JSON format. I want to send these packages to a Rails server. Upon a package arriving, the Rails server should examine it, generate a model instance out of it (for historical purposes), and then display the contents to the user. I've thought of two approaches.
1) Have a little app residing on the same host as the Rails server to be listening for these packages (using ZeroMQ). Upon receiving a package, the app would invoke a Rails action through CURL, passing on the package as a parameter. My concern with this approach is that my Rails server checks that only signed-in users can access actions which affect models. By creating an action accessible to this listening app (and therefore other entities), am I exposing myself to a major security flaw?
2) The second approach is to simply have the listening app dump the package into a special database table. The Rails server will then periodically check this table for new packages. Upon detecting one or more, it will process them and remove them from the table.
This is the first time I'm doing something like this, so if you have techniques or experiences you can share for better solutions, I'd love to learn.
Thank you.

you can restrict access to a certain call by limiting the host name that is allowed for the request in routes.rb
post "/analytics" => "analytics#create", :constraints => {:ip => /127.0.0.1/}
If you want the users to see updates, you can use polling to refresh the page every minute orso.

1) Yes you are exposing a major security breach unless :
Your zeroMQ app provides the needed data to do authentification and authorization on the rails side
Your rails app is configured to listen only on the 127.0.0.1 interface and is thus not accessible from the outside
Like Benjamin suggests, you restrict specific routes to certain IP
2) This approach looks a lot like what delayed_job does. You might wanna take a look there : https://github.com/collectiveidea/delayed_job and use a rake task to add a new job.
In short, your listening app will call a rake task that will add a custom delayed_job when receiving a packet. Then let delayed_job handle the load. You benefit from delayed_job goodness (different queues, scaling, ...). The hard part is getting the result.
One idea would be to associated a unique ID with each job, and have the delayed_job task output the result in a data store wich associated the job ID with the result. This data store can be a simple relational table
+----+--------+
| ID | Result |
+----+--------+
or a memecache/redis/whatever instance. You just need to poll that data store looking for the result associated with the job ID. And delete everything when you are done displaying that to the user.
3) Why don't you directly POST the data to the rails server ?

Following Benjamin's lead, I implemented a filter for this particular action.
def verify_ip
#ips = ['127.0.0.1']
if not #ips.include? request.remote_ip
redirect_to root_url
end
end
The listening app on the localhost now invokes the action, passing the JSON package received from the analytics engine as a param. Thank you.

Related

Share data between ActiveJob and Controller

Every n seconds application is requesting a remote JSON file that provides live prices for securities in the Trading system. JSON has a block with the data I need (marketdata) and a block with the current dataversion(version and seqnum).
Right now I use ActionController::Live (with EventSource on the client side) to push updated data to the browser. All actions are done within one method:
opening SSE connection;
forming dynamic URL;
pulling new data from remote server;
comparing/reassigning seqnum value;
updating database if needed.
So my goal now is to separate pulling & updating the database (ActiveJob) with pushing updated values to the browser (ActionController::Live). To accomplish this I need:
either to store somewhere on the server side seqnum & version to share between controller and background job;
or monitor databases for the latest changes in the updated_at fields.
So basically I have two questions:
What is more efficient between the two options above?Are there any other good approaches?
(in case the first one has a right to exist) How to implement this approach?
Considering the fact that you might have, for example, multiple rails process running, I believe it becomes quite hard for you to let activejob talk directly to rails controller in some way.
Defintely store seqnum and version, I wouldn't rely on updated_at in any case, it's too easy to get it updated randomly and so end up sending stuff to the client without any real reason. Also in this case they seem like very solid fields to point out if the file has been updated.
With polling
That being said, you want to "signal" ActionController::Live in some way and I'm afraid polling here is your only option, unless on your client side there is a specific moment when it needs to know if the file has been updated, in which case you might want to use websockets or something similar.
So, something like
cached_request = YourCachedRequest.latest # Assuming it returns a single record
updated = true
loop do
if updated
updated = false
response.stream.write cached_request.serialize_in_some_way
end
current_version = cached_request.version # use seqnum too if you need
cached_request = cached_request.reload
updated = true if cached_request.version > current_version
sleep 20.0
end
Without polling
If you want an option that doesn't involve polling, you can only go for websockets I believe. However you have a more efficient option:
Create a mini application (evenmachine/sinatra/something light) where the clients will poll (you can pass through your main application to distribute this to differente nodes of this mini application), the point of this app is only to reroute messages from your main application to polling clients.
Now, you can create an internal API endpoint for your main application that it's used only by delayed job. Delayed job will hit this endpoint only when it notices that the fetched JSON is actually updated relative to the one currently stored. If that's the case, it will hit your main app API endpoint which in turn will send a message (again, probably through an HTTP API endpoint, this time on your mini app) to all your mini app instances, which in turn will send them to your clients.
In this way, you don't overload your main server but only these mini-nodes which can have localized outages (which is a big advantage, instead of having a big system outage).

Refresh data with API every X minutes

Ruby on Rails 4.1.4
I made an interface to a Twitch gem, to fetch information of the current stream, mainly whether it is online or not, but also stuff like the current title and game being played.
Since the website has a lot of traffic, I can't make a request every time a user walks in, so instead I need to cache this information.
Cached information is stored as a class variable ##stream_data inside class: Twitcher.
I've made a rake task to update this using cronjobs, calling Twitcher.refresh_stream, but naturally that is not running within my active process (to which every visitor is connecting to) but instead a separate process. So the ##stream_data on the actual app is always empty.
Is there a way to run code, within my currently running rails app, every X minutes? Or a better approach, for that matter.
Thank you for your time!
This sounds like a good call for caching
Rails.cache.fetch("stream_data", expires_in: 5.minutes) do
fetch_new_data
end
If the data is in the cache and is not old then it will be returned without executing the block, if not the block is used to populate the cache.
The default cache store just keeps things in memory so doesn't fix your problem: you'll need to pick a cache store that is shared across your processes. Both redis and memcached (via the dalli gem) are popular choices.
Check out Whenever (basically a ruby interface to cron) to invoke something on a regular schedule.
I actually had a similar problem with using google analytics. Google analytics requires that you have an API key for each request. However the api key would expire every hour. If you requested a new api key for every google analytics request, it'd be very slow per request.
So what I did was make another class variable ##expires_at. Now in every method that made a request to google analytics, I would check ##expires_at.past?. If it was true, then I would refresh the api key and set ##expires_at = 45.minutes.from_now.
You can do something like this.
def method_that_needs_stream_data
renew_data if ##expires_at.past?
# use ##stream_data
end
def renew_data
# renew ##stream_data here
##expires_at = 5.minutes.from_now
end
Tell me how it goes.

How to dynamically and efficiently pull information from database (notifications) in Rails

I am working in a Rails application and below is the scenario requiring a solution.
I'm doing some time consuming processes in the background using Sidekiq and saves the related information in the database. Now when each of the process gets completed, we would like to show notifications in a separate area saying that the process has been completed.
So, the notifications area really need to pull things from the back-end (This notification area will be available in every page) and show it dynamically. So, I thought Ajax must be an option. But, I don't know how to trigger it for a particular area only. Or is there any other option by which Client can fetch dynamic content from the server efficiently without creating much traffic.
I know it would be a broad topic to say about. But any relevant info would be greatly appreciated. Thanks :)
You're looking at a perpetual connection (either using SSE's or Websockets), something Rails has started to look at with ActionController::Live
Live
You're looking for "live" connectivity:
"Live" functionality works by keeping a connection open
between your app and the server. Rails is an HTTP request-based
framework, meaning it only sends responses to requests. The way to
send live data is to keep the response open (using a perpetual connection), which allows you to send updated data to your page on its
own timescale
The way to do this is to use a front-end method to keep the connection "live", and a back-end stack to serve the updates. The front-end will need either SSE's or a websocket, which you'll connect with use of JS
The SEE's and websockets basically give you access to the server out of the scope of "normal" requests (they use text/event-stream content / mime type)
Recommendation
We use a service called pusher
This basically creates a third-party websocket service, to which you can push updates. Once the service receives the updates, it will send it to any channels which are connected to it. You can split the channels it broadcasts to using the pub/sub pattern
I'd recommend using this service directly (they have a Rails gem) (I'm not affiliated with them), as well as providing a super simple API
Other than that, you should look at the ActionController::Live functionality of Rails
The answer suggested in the comment by #h0lyalg0rithm is an option to go.
However, primitive options are.
Use setinterval in javascript to perform a task every x seconds. Say polling.
Use jQuery or native ajax to poll for information to a controller/action via route and have the controller push data as JSON.
Use document.getElementById or jQuery to update data on the page.

How can I create a lock for concurrency across different requests (on a process-based webserver)

I have a rails app that people can send data to in the query params of a url. The rails app then validates the correctness of the data and creates a json reponse listing any detected errors. The validation itself is done by checking the data against a set of rules that live in a github repo.
Ideally I'd like to update my local copy of this repo once a day. In order to prevent complications I'd like any requests that come in while this update takes place to back off for a few seconds.
What's the best way to communicate to the incoming requests that an update is currently occuring? I'm using a process based webserver (unicorn), so memory mutexes don't seen like the right answer :(.

Suggestions for how to write a service in Rails 3

I am building an application which will send status requests to users (via email & sms) on a regular basis. I want to execute the service each hour which will:
Query the database for all requests that need to be sent (based on some logic)
Send the requests through Amazon's Simple Email Service (this is already working)
Write a record of the status request notification back to the data store
I am considering wrapping up this series of operations into a single controller with an end point that can be called remotely to kick off the process within the rails app.
Longer term, I will break this process out into an app that can be run independently of my rails app, but for now I'm just trying to keep it simple.
My first inclination is to build the following:
Controller with the following elements:
A method which will orchestrate the steps outlined above (and can be called externally)
A call to the status_request model which will bring back a collection of request needing to be sent
A loop to iterate through the pending requests, which will:
Make a call to my AWS Simple Email Service module to actually send the email, and
Make a call to the status_request model to log the request back to the database
Model:
A method on my status_request model which will bring back a collection of requests that need to be sent
A method in my status_request model which will log that a notification was sent
Since this will behave as a service that gets called periodically from an outside scheduler I don't think I'll need a view for this operation. (Will, of course, need views to show users and admins what requests have been sent, but that's later...).
As someone new to Rails, I'm asking for review of this approach and any suggestions you may have.
Thanks!
Instead of a controller which Jeff pointed out exposes a security risk, you may just want to expose a rake task and use cron to invoke it on an hourly basis.
If you are still interested in building a controller, look at devise gem and its single access token, token_authenticatable, for securing the methods you are exposing.
You may also want to look at delayed_job or resque to offload the call to status_request and the loop to AWS simple service to a background worker process.
You may want a seperate controller and view for the log file so you can review progress on demand.
And if you want to get real fancy use Amazon SNS to send you alerts when the service reaches some unacceptable level of failures, backlog, etc.
Since you are trying to invoke this from an outside process, your approach should work. You could also have a worker process that processes task when they are there.
You will need routes to expose your service, and you may want to also make security decisions. How will the service that invokes your application authenticate so all others can't hit it at will?
Another consideration should be how many emails are you sending. If there are enough, we may want to look into the fact that writing this sort of loop is going to be extremely top heavy; and may affect users on the current system if it's a web application.
In the end, there are many ways to do this. I would focus on the performance/usage you expect as well as security. There's never one perfect way to solve a problem like this, and your way should just be aware of the variables it will need to be operating within.
Resque and Redis might be helpful to you in scheduling and performing operatio n .They are simple and superfast, [here](http://railscasts.com/episodes/271-resque] is a simple tut on same.

Resources