Reading pending work from PhantomJS - ruby-on-rails

I am building a pool of PhantomJS instances, and I want each instance to be autonomous (it fetches the next job to be done by itself).
I need to choose between these two options:
Right now I have a Rails app that can tell PhantomJS which URL needs to be parsed next. So I could make an HTTP GET call from PhantomJS to my Rails app, and Rails would respond with a URL that is pending (most likely Rails would get that from a queue).
I am thinking of building a standalone Redis server that PhantomJS would access via Webdis, so Rails would push the jobs there and the PhantomJS instances would fetch from it directly.
I am trying to work out which is the better decision in terms of performance: PhantomJS hitting the Rails server (so Rails needs to get the job from the queue and send it to PhantomJS), or PhantomJS accessing a Redis server directly.

Maybe I need more info, but why isn't the performance answer obvious? PhantomJS hitting the Redis server directly means less stuff to go through.
I'd consider developing whatever is easier to maintain. What's the ballpark req/minute? What sort of company (how funded / resource-strapped are you)?
There are also more out-of-the-box solutions, such as IronMQ, that may ease the pain.
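For what it's worth, the "each instance fetches its own next job" pattern looks roughly like the sketch below. It is only an illustration: Ruby's in-process Queue stands in for the Redis list (push is roughly LPUSH, a blocking pop is roughly BRPOP), Ruby threads stand in for the PhantomJS instances, and `run_pool` is a made-up name.

```ruby
# Sketch: autonomous workers pulling URLs from a shared queue.
# Queue (Ruby core) stands in for a Redis list; all names are hypothetical.

# Starts `worker_count` workers that each pop the next pending URL until they
# receive a :stop marker. Returns a hash of worker id => URLs it handled.
def run_pool(urls, worker_count)
  jobs = Queue.new
  urls.each { |url| jobs << url }        # producer side (Rails, in the question)
  worker_count.times { jobs << :stop }   # one stop marker per worker

  handled = Hash.new { |hash, key| hash[key] = [] }
  lock = Mutex.new

  workers = Array.new(worker_count) do |id|
    Thread.new do
      loop do
        job = jobs.pop                   # blocks until a job is available
        break if job == :stop
        # Placeholder for the real work (loading and parsing the page).
        lock.synchronize { handled[id] << job }
      end
    end
  end

  workers.each(&:join)
  handled
end

result = run_pool(%w[http://example.com/a http://example.com/b http://example.com/c], 2)
```

Every URL ends up handled by exactly one worker, and no coordinator has to assign jobs; which worker gets which URL is not deterministic.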

Related

Rails ActiveRecord/Postgres single query timeout?

I have a logging query (a simple INSERT) that happens on every single request.
For this query only (the one that runs on every page load), I want to set a limit of 500 ms, so that if the database is locked, slow, or down, the site does not hang while it waits to connect or write.
Is there a way to specify a timeout on a per-query basis, so that I can abort the LoggedRequest.create! if it's taking too long?
I don't want to set it in my config because I have many other queries that shouldn't have timeouts that low.
I'm using Postgres 11.7
I also don't know how I feel about setting a timeout for the entire session because I don't want that connection to be shared from the pool with other queries that can't have that timeout.
Rails 6 introduces event-based triggers for notifications, logging, etc. that come in very handy, provided you are using (or can afford to migrate to) Rails 6. Here's a useful post that demonstrates creating event-based triggers for notifications/logging: https://pramodbshinde.wordpress.com/2020/03/20/custom-events-tracking-with-activesupportnotifications-and-audited/
If, for some reason, you cannot use Rails 6, perhaps this article might help you find some answers: https://evilmartians.com/chronicles/the-silence-of-the-ruby-exceptions-a-rails-postgresql-database-transaction-thriller
If I were you, I would also consider using AJAX with a fire-and-forget API request to the server for logging, or for anything else that is not critical to the normal functioning of the application.
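On the literal question of a per-query timeout: PostgreSQL's SET LOCAL changes a setting only until the end of the current transaction, so one option is to wrap the INSERT in its own transaction and set statement_timeout there. A minimal sketch, assuming the helper name `with_statement_timeout` (made up here) and a connection object that responds to `transaction` and `execute` the way ActiveRecord's connection does:

```ruby
# Sketch: scope a PostgreSQL statement_timeout to a single transaction.
# `conn` is assumed to respond to #transaction (taking a block) and
# #execute(sql), as ActiveRecord::Base.connection does inside a Rails app.
def with_statement_timeout(conn, milliseconds)
  ms = Integer(milliseconds)   # raises on non-numeric input, keeps the SQL safe
  conn.transaction do
    # SET LOCAL lasts only until this transaction ends, so the pooled
    # connection goes back to its normal setting afterwards.
    conn.execute("SET LOCAL statement_timeout = #{ms}")
    yield
  end
end

# Inside a Rails app this might be used as follows (not runnable outside Rails):
#   with_statement_timeout(LoggedRequest.connection, 500) do
#     LoggedRequest.create!(attributes)
#   end
```

Two caveats: statement_timeout only bounds a statement that is already running on an open connection, so it does not cover time spent connecting; and if the call is nested inside an outer transaction, the setting would apply until that outer transaction ends. When the limit is hit the statement is cancelled and an error is raised, which the caller would need to rescue.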

How does Rails handle concurrent request on the different servers?

This has been asked before, but never answered particularly exhaustively.
Let's say you have Rails running on one of the several web servers that support it, such as WEBrick, Mongrel, or Apache and Nginx (through Phusion Passenger). The server receives two concurrent GETs; what happens? Is this clearly documented anywhere?
Basically I'm curious:
Is a new instance of Rails created by the server every time?
Does it somehow try to re-use existing instances (Ruby processes with Rails already loaded in them) to handle the request?
Isn't starting a new ruby process and re-loading Rails in it pretty slow?
Thanks! Any links to exhaustive clarifications would be greatly appreciated.
Some use workers (Apache, Phusion, Unicorn), some don't. If you don't use workers, it really depends on whether your application is thread-safe or not. If it is, more than one request may be served at a time; otherwise there's Rack::Lock, which blocks that. If there are workers (separate processes), each of them handles a request and then goes back to the pool, where the master assigns it a new request. Read on
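To make the Rack::Lock part concrete, here is a simplified sketch of the idea: a middleware that wraps the app call in a mutex so that only one request runs at a time within the process. The class names are made up for illustration, and the real Rack::Lock differs in details (for example, it keeps the lock until the response body is closed).

```ruby
# Simplified sketch of a Rack::Lock-style middleware: a mutex around the app
# call means only one request is processed at a time in this process.
class SerializingMiddleware   # hypothetical name, not Rack's own class
  def initialize(app)
    @app = app
    @mutex = Mutex.new
  end

  def call(env)
    @mutex.synchronize { @app.call(env) }
  end
end

# A toy Rack-compatible app that records how many requests overlap.
class CountingApp
  attr_reader :max_in_flight

  def initialize
    @in_flight = 0
    @max_in_flight = 0
    @guard = Mutex.new
  end

  def call(_env)
    @guard.synchronize do
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
    end
    sleep 0.01                 # simulate some work
    @guard.synchronize { @in_flight -= 1 }
    [200, { "content-type" => "text/plain" }, ["ok"]]
  end
end

app = CountingApp.new
stack = SerializingMiddleware.new(app)
Array.new(5) { Thread.new { stack.call({}) } }.each(&:join)
# app.max_in_flight is 1 here, because the middleware serializes the calls.
```

Without the middleware, the five threads could be inside `call` at the same time, which is only safe if the application is thread-safe.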

Rails 3.1 - Firing a specific event with EventMachine

I would like to use the plugin em-eventsource ( https://github.com/AF83/em-eventsource ) for server-sent events in a Rails 3.1 project. My problem is that it only explains how to listen for events and receive messages, not how to fire a specific event and send a message. I would like to produce the event in an Active Record observer. Am I right in thinking that I have to defer an operation with EventMachine to produce this event, or how else can I solve this?
And yes, it has to be Ruby on Rails. If I don't get this to work with EventMachine, I would try to bypass the whole ruby-part with node.js.
Actually I worked on this library a little with the maintainer. I think you mixed the client part with the server one. em-eventsource is a client library which you can use to consume a ServerSentEvent API, it's not meant to fire SSE.
On the server side, it doesn't really matter whether you are using Rails or any other stack (nodejs, php…) as long as the server you are running on supports streaming. The default web server shipped with Rails (WEBrick) does not, but there are many others which do: Thin, Puma, Goliath…
In order to fire SSE in Rails, you would have to both use a streaming-capable server among those cited and abide by the SSE specification. It mostly comes down to, first, responding with the proper Content-Type header ("text/event-stream") so that the client (browser) knows it should hang on, and then starting to stream on the socket. That latter part is the one not easily possible as of today in Rails 3 (yet not impossible!); Rails 4 actually now supports streaming in an easy way, with a clean and simple internal API, so it's definitely coming.
In the meantime, you could either:
mess with Rack's API in Rails (using EventMachine I guess, there are some examples in the wild)
or be smart and make use of the streaming feature provided by Sinatra, built on top of Rack (see https://gist.github.com/1476463 for an example of a Sinatra app which can be mounted in a Rails one!)
or use an external service such as Pusher
or leverage an entirely different stack…
A good overview: http://blog.phusion.nl/2012/08/03/why-rails-4-live-streaming-is-a-big-deal/
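Whichever server ends up doing the streaming, the bytes written to the socket have to follow the event-stream format: optional "event:" and "id:" fields, one "data:" field per line of payload, and a blank line to end the message. A small sketch of a formatter (the helper name `sse_message` and its parameters are made up for illustration, as is the event name):

```ruby
# Sketch: format one Server-Sent Events message for a text/event-stream body.
# The helper name and its parameters are hypothetical.
def sse_message(data, event: nil, id: nil)
  lines = []
  lines << "event: #{event}" if event
  lines << "id: #{id}" if id

  # Each line of the payload gets its own "data:" field.
  payload_lines = data.to_s.split("\n", -1)
  payload_lines = [""] if payload_lines.empty?
  payload_lines.each { |line| lines << "data: #{line}" }

  lines.join("\n") + "\n\n"    # the blank line terminates the message
end

message = sse_message("first line\nsecond line", event: "record_saved", id: 7)
# message == "event: record_saved\nid: 7\ndata: first line\ndata: second line\n\n"
```

An observer would hand a string like this to whatever component holds the open connections; how that hand-off happens depends on the streaming approach chosen from the list above.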
Maybe I'm wrong, but IIRC Rails can't support long polling. Rails blocks the whole server (or a thread, if you have more than one running inside the server) for each request and can't reuse it until the whole response has been sent. That's why you should set up a reverse proxy (like nginx) in front of the Rails application if you suspect there could be many concurrent connections: it buffers slow client requests and passes them to Rails once the whole request has been received. It's just how Rack works; there's probably not much you can do about it.

Using Pylons as a Web Backend

I am using Pylons for two things:
1) Serving API requests (returning JSONs describing my SQLAlchemy models)
2) Running a script 24/7 that fetches flight information from the internet (using HTTP) and pushes it into my DB (again using my models).
I am NOT using Pylons as a front end, but as a back end.
What would be the best way for my script to make HTTP requests? Is urllib / urllib2 my best option here?
How would I run my script constantly and not on a request serving basis? Is Celery / Cronjobs what I am looking for here?
Thanks!
Regarding your first question: yes, urllib/urllib2 is probably the best bet. It has very solid functionality for making HTTP requests to someone else.
Regarding your second question: Use your database. It's not super-scalable, but it's easy to implement a system where you have a flag in the database that is, essentially, an on-off switch for the application. Once that exists, make a page (with whatever security precautions you think prudent) that sets the flag and starts the application on a loop that continues indefinitely as long as the flag is set. A second page clears the flag, should you need to stop the HTTP requests without killing the entire server process. Instead of 'pages,' they could also be shell scripts or short standalone scripts. The important part is that you can implement this without requiring Celery or cron (although if you're already familiar with either, have at it).

Tomcat is processing requests for a JRuby on Rails app before app is completely loaded

I have an edge case, although a very customer visible one, where Tomcat begins processing requests before all dependencies are properly loaded for a Ruby on Rails stack running underneath JRuby.
Once Tomcat is restarted, there is something similar to the following happening:
undefined method `utc_offset' for nil:NilClass
[RAILS_ROOT]/gems/gems/activesupport-2.3.8/lib/active_support/values/time_zone.rb:206:in `<=>'
This happens when the following code is invoked on one of my services:
@timezones = ActiveSupport::TimeZone.all
If you wait a few more seconds and refresh the requesting page, it'll load no problem.
Is there a way to ensure that Tomcat does not start processing these requests until the entire stack (ActiveSupport, ActiveRecord, etc.) is loaded? Has anyone experienced similar symptoms?
This sounds like a possible bug in JRuby-Rack, assuming that's what you're using to run your Rails app in Tomcat. JRuby-Rack is supposed to load the entirety of config/environment.rb before it will process requests, so I'm not sure how this would happen to you, but perhaps I've overlooked something. Could you share some more data (or maybe code or an app that reproduces the issue) about how you induced the error at http://kenai.com/jira/browse/JRUBY_RACK or http://bugs.jruby.org?
I'm not sure if there is something like that in Tomcat directly, but you can write a javax.servlet.Filter that intercepts all requests and denies them until your application is loaded. When the application is fully loaded, you tell the filter to stop denying requests. (This isn't a pure Ruby solution, though.)
