ElasticSearch async delete? 200 just after deleting index in Rails app

We're using ElasticSearch with the Tire gem on a Rails app. For our integration tests, we delete and recreate the index before each example, something along the lines of
Foo.index.delete
Foo.create_elasticsearch_index
Where Foo includes Tire::Model::Persistence. But we started seeing missing-index errors from time to time when running our test suite on CI. I enabled debugging and found the following in the logs:
# 2013-10-04 09:25:05:839 [DELETE] ("test_index")
#
curl -X DELETE http://some-server:9200/test_index
# 2013-10-04 09:25:05:840 [200]
#
# {
# "ok": true,
# "acknowledged": true
# }
# 2013-10-04 09:25:05:852 [HEAD] ("test_index")
#
curl -I "http://some-server:9200/test_index"
# 2013-10-04 09:25:05:852 [200]
As you can see, I get a 200 OK response for the DELETE request, but when Tire then issues a HEAD request to check whether the index exists before creating it, that still returns a 200 instead of a 404. This happens randomly: most of the time it works fine, but at some point in the test suite it fails.
I tried waiting for a yellow status between the delete and the create operations, but it didn't help. So my question is: is the delete index operation asynchronous in some way? (I couldn't find anything about that in the ES docs.) Is there any way to wait for the index to actually be deleted, since waiting for yellow status doesn't work?
Edit: Thanks @PinnyM for the clarification. So all operations through the HTTP API are indeed asynchronous. The question that remains: how can I wait for the index to be deleted so I can create it again?

Before each example, you delete and recreate the index. As part of that process you could ensure that the (asynchronous) delete has completed by polling until exists? returns false. Roughly, in Ruby:
max_wait = 5 # seconds
waited = 0
while Tire.index("test_index").exists? && waited < max_wait
  sleep 0.1
  waited += 0.1
end
raise "waited too long for index deletion" if waited >= max_wait
The Tire docs for exists? are here - essentially it issues the same HEAD request you saw in the debug output:
http://rubydoc.info/github/karmi/tire/master/Tire/Index#exists?-instance_method and here's the underlying ES call: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-exists.html
Personally, I would adjust the tests that use the ES instance. If you put the not-exists loop at the start of each test and the index tear-down at the end, you get two benefits. You could:
Assert that the index didn't exist when the test started - nice and clean.
Kick off index tear-down at the end of a test and move straight on to the next test, which might not need ES at all - a possible slight speed-up, depending on your app and how your CI orders tests. A rough sketch of that arrangement follows.
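For example, a minimal RSpec arrangement of that idea - wait_for_index_deletion is a hypothetical helper wrapping the polling loop above:
RSpec.configure do |config|
  config.before(:each, elasticsearch: true) do
    wait_for_index_deletion("test_index") # hypothetical helper around the loop above
    Foo.create_elasticsearch_index
  end
  config.after(:each, elasticsearch: true) do
    Foo.index.delete # fire-and-forget; the next test's before hook does the waiting
  end
end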

To wait for asynchronous operations to finish, I use this in my RSpec suites:
collection.__elasticsearch__.refresh_index!
Very useful when I have just indexed a new document and want to run a search on it immediately afterwards in my spec.
The .refresh_index! method is blocking. Actually, I believe all the ! methods might be... have you tried the other ! methods?
collection.__elasticsearch__.delete_index!
collection.__elasticsearch__.create_index!
(Using ES 5.x with the appropriate elasticsearch-ruby/rails gems without Tire)
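A minimal sketch of how that can look in a spec setup, assuming a model Foo backed by elasticsearch-model (create_index! with force: true drops any existing index first):
RSpec.configure do |config|
  config.before(:each, elasticsearch: true) do
    Foo.__elasticsearch__.create_index!(force: true) # force: true deletes any existing index
    Foo.__elasticsearch__.refresh_index!             # blocking refresh, so searches see a fresh index
  end
end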

Related

How to disable class cache for part of Rails application

I am developing a Rails app for network automation. Part of the app consists of logic to run operations; part is the operations themselves. An operation is simply a Ruby class that performs several commands on a network device (router, switch, etc.).
Right now, operations are simply part of the Rails app repo. But in order to make the development process more agile, I would like to decouple the app and the operations. I would have two repos - one for the app and one for the operations. The app deploy would follow the standard procedure, but the operations would sync every time something is pushed to master. And, more importantly, I don't want to restart the app after an operations repo update.
So my question is:
How do I exclude several classes (or namespaces) from being cached in a production Rails app - I mean, every time I call such a class, it would be re-read from disk? What could be the potential dangers of doing so?
Some example code:
# Example operation - I would like to add or modify such classes without
# restarting the app
class FooOperation < BaseOperation
  def perform(host)
    conn = new_connection(host) # method from BaseOperation
    result = conn.execute("foo")
    if result =~ /Error/
      # retry, it's a known bug in device foo
      conn.execute("foo")
    else
      conn.exit
      return success # method from BaseOperation
    end
  end
end
# somewhere in the admin panel I would do:
o = Operation.create(name: "Foo", class_name: "FooOperation")
o.id # => 123 # for the next example
# Ruby worker which actually runs an operation
class OperationWorker
  def perform(operation_id, host)
    operation = Operation.find(operation_id)
    # here, every time I load this, I want Ruby to search for the
    # implementation on the filesystem - never cache
    klass = operation.class_name.constantize
    klass.new.perform(host)
  end
end
I think you have quite a misunderstanding about how Ruby code loading and interpretation works!
The fact that Rails reloads classes in development is kind of a "hack" to let you iterate on the code while the server has already loaded, parsed and executed parts of your application.
In order to do so, it has to implement quite some magic to unload your code and reload parts of it on change.
So if you want up-to-date code when executing an "operation", you are probably best off spawning a new process. This guarantees that your new code is read and parsed properly when executed, with a blank state.
Another thing you can do is use load instead of require, because load actually re-reads the source on subsequent calls. Keep in mind that subsequent calls to load just add to the code already in the Ruby VM, so you need to make sure that every change is compatible with the already-loaded code.
This could be circumvented by some clever instance_eval tricks, but I'm not sure that is what you want...
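A minimal sketch of the load-based approach, assuming operation classes live under a hypothetical lib/operations/ directory and follow Rails file-naming conventions:
class OperationWorker
  OPERATIONS_DIR = Rails.root.join("lib", "operations") # assumed location

  def perform(operation_id, host)
    operation = Operation.find(operation_id)
    # load (unlike require) re-reads and re-evaluates the file on every call,
    # so edits to the operation source take effect without an app restart
    load OPERATIONS_DIR.join("#{operation.class_name.underscore}.rb").to_s
    operation.class_name.constantize.new.perform(host)
  end
end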

Debugging Rspec Postgres lockups

I am trying to test an app that uses gem devise_token_auth, which basically includes a couple extra DB read/writes on almost every request (to verify and update user access tokens).
Everything is working fine, except when testing a controller action that includes several additional DB read/writes. In these cases, the terminal locks up and I'm forced to kill the Ruby process via Activity Monitor.
Sometimes I get error messages like this:
ruby /Users/evan/.rvm/gems/ruby-2.1.1/bin/rspec spec/controllers/api/v1/messages_controller_spec.rb(1245,0x7fff792bf310) malloc: *** error for object 0x7ff15fb73c00: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
I have no idea how to interpret that. I'm 90% sure the problem is due to this gem and the extra DB activity it causes on each request, because when I revert to my previous, less intensive auth, all the issues go away. I've also gotten things under control by giving Postgres some extra time on the offending tests:
after :each do
  sleep 2
end
This works fine for all cases except one, which requires a timeout before the expect, otherwise it throws this error:
Failure/Error: expect(@user1.received_messages.first.read?).to eq true
ActiveRecord::StatementInvalid:
PG::UnableToSend: another command is already in progress
: SELECT "messages".* FROM "messages" WHERE "messages"."receiver_id" = $1 ORDER BY "messages"."id" ASC LIMIT 1
which, to me, points to the DB issue again.
Is there anything else I could be doing to track down/control these errors? Any rspec settings I should look into?
If you are running parallel RSpec tasks, that could be triggering this. When we've run into issues like this, we have forced those tests to run in a single, non-parallel instance of RSpec in our CI using tags.
Try something like this:
context 'when both records get updated in one job', :non_parallel do
  it { is_expected.to eq 2 }
end
And then invoke rspec separately on just the non_parallel tag:
rspec --tag non_parallel
The bulk of your tests (those not tagged non_parallel) can still be run in parallel in your CI solution (e.g. Jenkins) for performance, excluding the tag as shown below.
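For the parallel runs, the same tag can be excluded with RSpec's ~ negation:
rspec --tag ~non_parallel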
Of course, be careful applying this band-aid. It is always better to identify what is not race-safe in your code, since that race could happen in the real world.

Render status 200 before executing code in rails controller

I'm integrating a communications API, and whenever a text/voice message reaches my server (a Rails controller), I have to send back an OK (200) to the API. I want to send this response before executing my code block, because if my code breaks (and is unable to send the OK), the communications API keeps re-sending the message for up to 3 days. That just compounds the problem on my server, which would keep breaking as the same message keeps coming in.
I did some research and found two solutions.
Solution 1: The first solution is below (my current implementation), and it doesn't seem to be working (unless I didn't read the log files properly or I'm hallucinating).
def receive_text_message
  head :ok, :content_type => 'text/html'
  # A bunch of code down here
end
I thought this should work (per the Rails docs), but I'm not sure it does.
Solution 2: The second implementation I'm contemplating is to quickly create a new process/thread to execute the code block and kill off the process that received the message... that way the API gets its OK very quickly and doesn't have to wait on the successful execution of my code block. I could use the spawnling (or spawn) gem to do this. I would go with creating a process, since I use the Passenger (community) server, but new processes eat up more RAM, plus I think it is harder to debug child processes/threads (I might be wrong on this).
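A rough sketch of that second approach with the spawnling gem - the block runs in a forked child while the request returns immediately; process_incoming_message is a hypothetical helper:
def receive_text_message
  head :ok, :content_type => 'text/html'
  Spawnling.new do
    process_incoming_message(params) # hypothetical helper doing the real work
  end
end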
Thanks for the help!
Side question: does rails attempt to restart a process after it just failed?
You could opt for returning a 200 in your controller and starting a Sidekiq job. That way the 200 is returned immediately and your controller is free to process the next request, so there is no waste of time or resources in your controller. Then let the worker do the real hard work.
In your controller
def receive_text_message
  head :ok, :content_type => 'text/html'
  # pass plain, JSON-serializable arguments; Sidekiq cannot round-trip
  # the params object itself
  HardWorker.perform_async(params.to_unsafe_h)
end
In your Sidekiq worker:
class HardWorker
  include Sidekiq::Worker

  def perform(params)
    # do the real hard work here
  end
end
I like Sidekiq mostly because it handles resources more nicely compared to Resque.

Permanent daemon for quering a web resource

I have a Rails 3 application and have looked around on the internet for daemons, but didn't find the right one for me.
I want a daemon which permanently fetches data (exchange rates) from a web resource and saves it to the database,
like:
require 'net/http'

while true
  Model.update_attribute(:course, Net::HTTP.get(URI("http://asdasd/")))
end
I've only seen cron-like jobs, but they only run at fixed intervals... I want this to run permanently, each iteration starting as soon as the previous query finishes...
Do you understand what I mean?
The light-daemon gem I wrote should work very well in your case.
http://rubygems.org/gems/light-daemon
You can write your code in a class which has a perform method, use a queue system like Resque, and at application startup enqueue the job with Resque.enqueue(Updater).
Obviously the job won't end until the application is stopped. Personally I don't like that, but if this is the requirement...
For this reason, if you need to execute other tasks, you should configure more than one worker process and optionally more than one queue.
If you can change your requirements and find a trigger for the update mechanism, the same approach still works; you only have to remove the while true loop.
Sample class needed:
class Updater
  @queue = :endless_queue

  def self.perform
    while true
      Model.update_attribute(:course, Net::HTTP.get(URI("http://asdasd/")))
    end
  end
end
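Enqueuing at startup could then look like this (the initializer path is an assumption):
# e.g. in config/initializers/updater.rb
Resque.enqueue(Updater)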
Finally I found a cool solution for my problem:
I use the god gem -> http://god.rubyforge.org/
with a bash script (link) for starting/stopping a simple rake task (with an infinite loop in it).
Now it works fine, and I even have some monitoring with god running that ensures the rake task keeps running.
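A minimal god config for that setup might look like this (the task name and app path are assumptions):
# updater.god - hypothetical watch for the looping rake task
God.watch do |w|
  w.name  = "course-updater"
  w.start = "cd /path/to/app && bundle exec rake updater:run"
  w.keepalive
end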

Ruby on Rails: How to run things in the background?

When a new resource is created and it needs to do some lengthy processing before the resource is ready, how do I send that processing away into the background where it won't hold up the current request or other traffic to my web-app?
in my model:
class User < ActiveRecord::Base
  after_save :background_check

  protected

  def background_check
    # check through a list of 10000000000001 mil different
    # databases that takes approx one hour :)
    if check_for_record_in_www(self.username)
      # code that is run after the 1 hour process is finished
      update_attribute(:has_record, true) # update_attribute needs a value argument
    end
  end
end
You should definitely check out the following Railscasts:
http://railscasts.com/episodes/127-rake-in-background
http://railscasts.com/episodes/128-starling-and-workling
http://railscasts.com/episodes/129-custom-daemon
http://railscasts.com/episodes/366-sidekiq
They explain how to run background processes in Rails in every possible way (with or without a queue ...)
I've just been experimenting with the delayed_job gem because it works with the Heroku hosting platform and it was ridiculously easy to set up!
Add the gem to your Gemfile, then: bundle install, rails g delayed_job, rake db:migrate
Then start a queue handler with:
RAILS_ENV=production script/delayed_job start
Where you have a method call which is your lengthy process, i.e.
company.send_mail_to_all_users
you change it to:
company.delay.send_mail_to_all_users
Check the full docs on github: https://github.com/collectiveidea/delayed_job
Start a separate process, which is probably most easily done with system, prepending 'nohup' and appending '&' to the command you pass it. (Make sure the command is a single string argument, not a list of arguments.)
There are several reasons you want to do it this way, rather than, say, trying to use threads:
Ruby's threads can be a bit tricky when it comes to doing I/O; you have to take care that some things you do don't cause the entire process to block.
If you run a program with a different name, it's easily identifiable in ps, so you don't accidentally think it's a FastCGI back-end gone wild and kill it.
Really, the process you start should be "daemonized"; see the Daemonize class for help.
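A minimal sketch of the system approach (the script name is hypothetical):
# a single string argument, so the shell interprets nohup and &
system("nohup ruby script/check_for_record.rb > /dev/null 2>&1 &")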
You ideally want to use an existing background job server rather than writing your own. These will typically let you submit a job and give you a unique key; you can then use the key to periodically query the job server for the status of your job without blocking your web app. Here is a nice roundup of the various options out there.
I like to use BackgrounDRb; it's nice because it allows you to communicate with it during long processes, so you can have status updates in your Rails app.
I think spawn is a great way to fork your process, do some processing in background, and show user just some confirmation that this processing was started.
What about:
def background_check
  exec("script/runner check_for_record_in_www.rb #{self.username}") if fork.nil?
end
The program "check_for_record_in_www.rb" will then run in another process and will have access to ActiveRecord, being able to access the database.
