Ruby code memory leak in loop - ruby-on-rails

The code below has a memory leak. It's running under Ruby 2.1.1, and I am not able to find the actual leak.
q = Queue.new("test")
while true do
  m = q.dequeue
  body = JSON.parse(m.body)
  user_id = body["Records"][0]
  user = V2::User.find(user_id)
  post = V2::Post.find(post_id)
end
After a few hours of running I added GC.start, but it's not solving the problem:
q = Queue.new("test")
while true do
  m = q.dequeue
  body = JSON.parse(m.body)
  user_id = body["Records"][0]
  user = V2::User.find(user_id)
  post = V2::Post.find(post_id)
  GC.start
end
I don't know how to find the actual memory leak.

Try removing the lines from the bottom up and seeing whether the memory leak persists. It's possible that the leak is coming from the find calls, or possibly from JSON.parse (extremely unlikely), or from the custom Queue data structure. If the leak is still there after removing all of the lines, it is likely coming from the worker itself and/or the program running the workers.
q = Queue.new("test")
while true do
  m = q.dequeue                 # Finally, remove this and stub the while true with a sleep or something
  body = JSON.parse(m.body)     # Then remove these two lines
  user_id = body["Records"][0]
  user = V2::User.find(user_id) # Remove the bottom two lines first
  post = V2::Post.find(post_id)
end
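To make the bisection quantitative, log object counts as the loop runs; if the numbers keep growing even right after a forced GC, something is retaining objects. A minimal sketch using only core APIs (the 1000-iteration interval is arbitrary):
counter = 0
while true do
  m = q.dequeue
  body = JSON.parse(m.body)
  # ... same work as above ...
  counter += 1
  if (counter % 1000).zero?
    GC.start
    counts = ObjectSpace.count_objects
    puts "total=#{counts[:TOTAL]} free=#{counts[:FREE]} strings=#{counts[:T_STRING]}"
  end
end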

I bet the problem is with the introduced local variables (sic!). Get rid of user_id and post_id and it'll likely stop leaking:
# user_id = body["Records"][0]
# user = V2::User.find(user_id)
user = V2::User.find(body["Records"][0]) # sic!
The reason is how Ruby stores objects in RVALUEs: every local variable keeps its most recent object reachable for as long as the enclosing scope lives, so the GC can never reclaim those objects while the loop runs.

Related

How to wait for all Concurrent::Promise in an array to finish/resolve

@some_instance_var = Concurrent::Hash.new
(0...some.length).each do |idx|
  fetch_requests[idx] = Concurrent::Promise.execute do
    response = HTTP.get(EXTDATA_URL)
    if response.status.success?
      ... # update @some_instance_var
    end
    # We're going to disregard GET failures here.
    puts "I'm here"
  end
end
Concurrent::Promise.all?(fetch_requests).execute.wait # let threads finish gathering all of the unique posts first
puts "how am i out already"
When I run this, the bottom line prints first, so it's not waiting for all the threads in the array to finish their work, and I keep getting an empty @some_instance_var to work with below this code. What am I writing wrong?
Never mind, I fixed this. The setup is correct; I just had to use the splat operator * on my fetch_requests array inside all?().
Concurrent::Promise.all?(*fetch_requests).execute.wait
I guess it wanted multiple args instead of one array.
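For anyone hitting the same thing, here is a minimal self-contained sketch of the fixed pattern (requires the concurrent-ruby gem; the sleep stands in for the HTTP call):
require 'concurrent'

fetch_requests = 3.times.map do |idx|
  Concurrent::Promise.execute do
    sleep(rand / 10) # stand-in for HTTP.get
    "response #{idx}"
  end
end

# all? takes the promises as separate arguments, hence the splat
Concurrent::Promise.all?(*fetch_requests).execute.wait
fetch_requests.each { |p| puts p.value }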

Getting a Primary Key error in Rails using Sidekiq and Sidekiq-Cron

I have a Rails project that uses Sidekiq for worker tasks, and Sidekiq-Cron to handle scheduling. I am running into a problem, though. I built a controller (below) that handled all of my API querying, validation of data, and then inserting data into the database. All of the logic functioned properly.
I then tore out the section of code that actually inserts API data into the database, and moved it into a Job class. This way the Controller method could simply pass all of the heavy lifting off to a job. When I tested it, all of the logic functioned properly.
Finally, I created a Job that would call the Controller method every minute, do the validation checks, and then kick off the other Job to save the API data (if necessary). When I do this, the first part of the logic (inserting new event data) seems to work, but the check for whether this is the first time we've seen an event for a specific object seems to fail. The result is a Primary Key violation in PG.
Code below:
Controller
require 'date'

class MonnitOpenClosedSensorsController < ApplicationController
  def holderTester()
    # MonnitschedulerJob.perform_later(nil)
  end

  # Create Sidekiq queue to process new sensor readings
  def queueNewSensorEvents(auth_token, network_id)
    m = Monnit.new("iMonnit", 1)

    # Construct the query to select the most recent communication date for each sensor in the network
    lastEventForEachSensor = MonnitOpenClosedSensor.select('"SensorID", MAX("LastCommunicationDate") as "lastCommDate"')
    lastEventForEachSensor = lastEventForEachSensor.group("SensorID")
    lastEventForEachSensor = lastEventForEachSensor.where('"CSNetID" = ?', network_id)

    todaysDate = Date.today
    sevenDaysAgo = (todaysDate - 7)

    lastEventForEachSensor.each do |event|
      # puts event["lastCommDate"]
      recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"')
      recentEvent = recentEvent.where('"CSNetID" = ? AND "SensorID" = ? AND "LastCommunicationDate" = ?', network_id, event["SensorID"], event["lastCommDate"])

      recentEvent.each do |recent|
        message = m.get_extended_sensor(auth_token, recent["SensorID"])
        if message["LastDataMessageMessageGUID"] != recent["id"]
          MonnitopenclosedsensorJob.perform_later(auth_token, network_id, message["SensorID"])
          # puts "hi inner"
          # puts message["LastDataMessageMessageGUID"]
          # puts recent['id']
          # puts recent["SensorID"]
          # puts message["SensorID"]
          # raise message
        end
      end
    end

    # Queue up any Sensor Events for new sensors
    # This would be sensors we've never seen before, from a Postgres standpoint
    sensors = m.get_sensor_ids(auth_token)
    sensors.each do |sensor|
      sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
      # sensorCheck = MonnitOpenClosedSensor.select(:SensorID)
      sensorCheck = sensorCheck.group(:SensorID)
      sensorCheck = sensorCheck.where('"CSNetID" = ? AND "SensorID" = ?', network_id, sensor)
      # sensorCheck = sensorCheck.where('id = "?"', sensor["LastDataMessageMessageGUID"])
      if sensorCheck.any? == false
        MonnitopenclosedsensorJob.perform_later(auth_token, network_id, sensor)
      end
    end
  end
end
The above code breaks on Sensor Events for new sensors. It doesn't recognize that a sensor already exists (first issue), and then doesn't recognize that the event it is trying to create is already persisted to the database (it uses a GUID for comparison).
Job to persist data
class MonnitopenclosedsensorJob < ApplicationJob
  queue_as :default

  def perform(auth_token, network_id, sensor)
    m = Monnit.new("iMonnit", 1)
    newSensor = m.get_extended_sensor(auth_token, sensor)

    sensorRecord = MonnitOpenClosedSensor.new
    sensorRecord.SensorID = newSensor['SensorID']
    sensorRecord.MonnitApplicationID = newSensor['MonnitApplicationID']
    sensorRecord.CSNetID = newSensor['CSNetID']

    lastCommunicationDatePretty = newSensor['LastCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0
    nextCommunicationDatePretty = newSensor['NextCommunicationDate'].scan(/[0-9]+/)[0].to_i / 1000.0

    sensorRecord.LastCommunicationDate = Time.at(lastCommunicationDatePretty)
    sensorRecord.NextCommunicationDate = Time.at(nextCommunicationDatePretty)
    sensorRecord.id = newSensor['LastDataMessageMessageGUID']
    sensorRecord.PowerSourceID = newSensor['PowerSourceID']
    sensorRecord.Status = newSensor['Status']
    sensorRecord.CanUpdate = newSensor['CanUpdate'] == "true" ? 1 : 0
    sensorRecord.ReportInterval = newSensor['ReportInterval']
    sensorRecord.MinimumThreshold = newSensor['MinimumThreshold']
    sensorRecord.MaximumThreshold = newSensor['MaximumThreshold']
    sensorRecord.Hysteresis = newSensor['Hysteresis']
    sensorRecord.Tag = newSensor['Tag']
    sensorRecord.ActiveStateInterval = newSensor['ActiveStateInterval']
    sensorRecord.CurrentReading = newSensor['CurrentReading']
    sensorRecord.BatteryLevel = newSensor['BatteryLevel']
    sensorRecord.SignalStrength = newSensor['SignalStrength']
    sensorRecord.AlertsActive = newSensor['AlertsActive']
    sensorRecord.AccountID = newSensor['AccountID']
    sensorRecord.CreatedOn = Time.now.getutc
    sensorRecord.CreatedBy = "Monnit Open Closed Sensor Job"
    sensorRecord.LastModifiedOn = Time.now.getutc
    sensorRecord.LastModifiedBy = "Monnit Open Closed Sensor Job"
    sensorRecord.save

    sensorRecord = nil
  end
end
Job to call controller every minute
class MonnitschedulerJob < ApplicationJob
  queue_as :default

  def perform(*args)
    m = Monnit.new("iMonnit", 1)
    getImonnitUsers = ImonnitCredential.select('"auth_token", "username", "password"')
    getImonnitUsers.each do |user|
      # puts user["auth_token"]
      # puts user["username"]
      # puts user["password"]
      if user["auth_token"] != nil
        m.logon(user["auth_token"])
      else
        auth_token = m.get_auth_token(user["username"], user["password"])
        auth_token = auth_token["Result"]
      end

      network_list = m.get_network_list(auth_token)
      network_list.each do |network|
        # puts network["NetworkID"]
        MonnitOpenClosedSensorsController.new.queueNewSensorEvents(auth_token, network["NetworkID"])
      end
    end
  end
end
Sorry about the length of the post. I tried to include as much information as I could about the code involved.
EDIT
Here is the code for the extended sensor, along with the JSON response:
def get_extended_sensor(auth_token, sensor_id)
  response = self.class.get("/json/SensorGetExtended/#{auth_token}?SensorID=#{sensor_id}")

  if response['Result'] != "Invalid Authorization Token"
    response['Result']
  else
    response['Result']
  end
end
{
  "Method": "SensorGetExtended",
  "Result": {
    "ReportInterval": 180,
    "ActiveStateInterval": 180,
    "InactivityAlert": 365,
    "MeasurementsPerTransmission": 1,
    "MinimumThreshold": 4294967295,
    "MaximumThreshold": 4294967295,
    "Hysteresis": 0,
    "Tag": "",
    "SensorID": 189092,
    "MonnitApplicationID": 9,
    "CSNetID": 24391,
    "SensorName": "Open / Closed - 189092",
    "LastCommunicationDate": "/Date(1500999632000)/",
    "NextCommunicationDate": "/Date(1501010432000)/",
    "LastDataMessageMessageGUID": "d474b3db-d843-40ba-8e0e-8c4726b61ec2",
    "PowerSourceID": 1,
    "Status": 0,
    "CanUpdate": true,
    "CurrentReading": "Open",
    "BatteryLevel": 100,
    "SignalStrength": 84,
    "AlertsActive": true,
    "CheckDigit": "QOLP",
    "AccountID": 14728
  }
}
Some thoughts:
recentEvent = MonnitOpenClosedSensor.select('id, "SensorID", "LastCommunicationDate"')
This is not doing any ordering; you are presuming that the records you retrieve here are the latest ones.
m = Monnit.new("iMonnit", 1)
newSensor = m.get_extended_sensor(auth_token, sensor)
Without the implementation details of get_extended_sensor, it's impossible to tell how
sensorRecord.id = newSensor['LastDataMessageMessageGUID']
is resolving.
It's highly likely that you are getting duplicate messages. It's almost never a good idea to use input data as a primary key. Rather, autogenerate a GUID in your job, use that as the primary key, and then use the LastDataMessageMessageGUID as a correlation id.
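A sketch of that shape, reusing the job code above (SecureRandom is in the standard library; the last_data_message_guid column is hypothetical and would need its own migration plus a unique index):
require 'securerandom'

sensorRecord = MonnitOpenClosedSensor.new
sensorRecord.id = SecureRandom.uuid # autogenerated primary key
sensorRecord.last_data_message_guid = newSensor['LastDataMessageMessageGUID'] # correlation id only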
So the issue that I was running into, as it turns out, is as follows:
A sensor event was pulled from the API and queued up as a worker job in Sidekiq.
If the queue was running a bit slow (API speed, or simply a lot of jobs to process), the one-minute poll might hit again and pull the same sensor event down and queue it up.
As the queue processed, the sensor event got inserted into the database with its GUID as the primary key.
As the queue continued to catch up with itself, it hit the same event that had been scheduled as a second job. That job then failed.
My solution was to move my "does this SensorID and GUID exist in the database" check into the job itself. So the first thing the job does when it runs is check AGAIN whether the record already exists. This means I am checking twice, but the second check has low overhead.
There is still the risk that one job's check could pass while another job is inserting the same record but has not yet committed it, and then the insert would fail. But the retry would catch it and clear it out as a successful run when the check doesn't validate on the second round. Having said that, the check occurs AFTER the API data has been pulled; since the database insert of a single record is far faster than the API call, this really does lower the chances of ever hitting a retry. You'd have a better chance of hitting the lottery than having the second check fail and trigger a retry.
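A minimal sketch of that guard at the top of the job (class and column names are taken from the code above):
class MonnitopenclosedsensorJob < ApplicationJob
  queue_as :default

  def perform(auth_token, network_id, sensor)
    m = Monnit.new("iMonnit", 1)
    newSensor = m.get_extended_sensor(auth_token, sensor)

    # Bail out if an earlier job already persisted this event.
    return if MonnitOpenClosedSensor.exists?(id: newSensor['LastDataMessageMessageGUID'])

    # ... build and save sensorRecord exactly as before ...
  end
end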
If anyone else has a better or cleaner solution, please feel free to include it as a secondary answer!

How can I make this method more concise?

I get a warning when running reek on a Rails project:
[36]:ArborReloaded::UserStoryService#destroy_stories has approx 8 statements (TooManyStatements)
Here's the method:
def destroy_stories(project_id, user_stories)
  errors = []
  @project = Project.find(project_id)

  user_stories.each do |current_user_story_id|
    unless @project.user_stories.find(current_user_story_id).destroy
      errors.push("Error destroying user_story: #{current_user_story_id}")
    end
  end

  if errors.compact.length == 0
    @common_response.success = true
  else
    @common_response.success = false
    @common_response.errors = errors
  end
  @common_response
end
How can this method be minimized?
First, I find that class and method size are useful for finding code that might need refactoring, but sometimes you really do need a long class or method. And there is always a way to make your code shorter to get around such limits, but that might make it less readable. So I disable that type of inspection when using static analysis tools.
Also, it's unclear to me why you'd expect to have an error when deleting a story, or who benefits from an error message that just includes the ID and nothing about what error occurred.
That said, I'd write that method like this, to reduce the explicit local state and to better separate concerns:
def destroy_stories(project_id, story_ids)
  project = Project.find(project_id) # I don't see a need for an instance variable
  errors = story_ids.
    reject { |story_id| project.user_stories.find(story_id).destroy }.
    map { |story_id| "Error destroying user_story: #{story_id}" }
  respond errors
end

# Lots of services probably need to do this, so it can go in a superclass.
# Even better, move it to @common_response's class.
def respond(errors)
  # It would be best to move this behavior to @common_response.
  @common_response.success = errors.empty?
  # Hopefully this works even when errors == []. If not, fix your framework.
  @common_response.errors = errors
  @common_response
end
You can see how taking some care in your framework can save a lot of noise in your components.

Using index value in method

In my Rails application, in a model, I am trying to use the loop index x in the following method, and I can't figure out how to get the value:
def set_winners ## loops over 4 quarters
  1.upto(4) do |x|
    qtr_[x]_winner.winner = 1
    qtr_[x]_winner.save
  end
end
I'm going to keep searching but any help would be greatly appreciated!
Edit: So I guess I can't do that! Here is the original method, in full, that I was trying to refactor by looping four times:
def set_winners
  ## set all 4 quarters' winning squares
  home_qtr_1 = game.home_q1_score.to_s.split('').last.to_i
  away_qtr_1 = game.away_q1_score.to_s.split('').last.to_i
  qtr_1_winner = squares.where(xvalue: home_qtr_1, yvalue: away_qtr_1).first
  qtr_1_winner.winner = 1
  qtr_1_winner.save

  home_qtr_2 = game.home_q2_score.to_s.split('').last.to_i
  away_qtr_2 = game.away_q2_score.to_s.split('').last.to_i
  qtr_2_winner = squares.where(xvalue: home_qtr_2, yvalue: away_qtr_2).first
  qtr_2_winner.winner = 1
  qtr_2_winner.save

  home_qtr_3 = game.home_q3_score.to_s.split('').last.to_i
  away_qtr_3 = game.away_q3_score.to_s.split('').last.to_i
  qtr_3_winner = squares.where(xvalue: home_qtr_3, yvalue: away_qtr_3).first
  qtr_3_winner.winner = 1
  qtr_3_winner.save

  home_qtr_4 = game.home_q4_score.to_s.split('').last.to_i
  away_qtr_4 = game.away_q4_score.to_s.split('').last.to_i
  qtr_4_winner = squares.where(xvalue: home_qtr_4, yvalue: away_qtr_4).first
  qtr_4_winner.winner = 1
  qtr_4_winner.save
end
Is there a better way to do this if it's bad practice to dynamically change attribute names?
It looks like you are trying to do a PHP-like trick in a language that doesn't support it, and where we recommend NOT doing it because it results in code that is very difficult to debug due to the dynamically named variables.
It looks like you want to generate a variable name using:
qtr_[x]_winner
to create something like:
qtr_1_winner
Instead, consider creating an array named qtr_winner containing your objects and access the elements like:
qtr_winner[1]
or
qtr_winner[2]
etc.
You could create a hash to do a similar thing:
qtr_winner = {}
qtr_winner[1] = 5
then later access it using qtr_winner[1] and get 5 back or
qtr_winner[1].winner = 1
The determination of whether to use a hash or an array is whether you need to walk the container, or need random access. If you are always indexing into it using a value, then it's probably a wash about which is faster.
Based on your edit, you don't need dynamic variables. The only thing that changes in your loop is the quarter number in game.home_qN_score and game.away_qN_score, so that's what the focus of your refactoring should be. Given that, here's a viable solution:
1.upto(4) do |i|
  home_qtr = game.send("home_q#{i}_score").to_s.split('').last.to_i
  away_qtr = game.send("away_q#{i}_score").to_s.split('').last.to_i
  winner = squares.where(xvalue: home_qtr, yvalue: away_qtr).first
  winner.winner = 1
  winner.save
end
Original answer:
If qtr_1_winner, etc. are instance methods, you can use Object#send to achieve what you want:
def set_winners ## loops over 4 quarters
  1.upto(4) do |x|
    send("qtr_#{x}_winner").winner = 1
    send("qtr_#{x}_winner").save
  end
end

find_or_create and race-condition in rails, theory and production

Hi, I have this piece of code:
class Place < ActiveRecord::Base
  def self.find_or_create_by_latlon(lat, lon)
    place_id = call_external_webapi
    result = Place.where(:place_id => place_id).limit(1)
    result = Place.create(:place_id => place_id, ... ) if result.empty? #!
    result
  end
end
Then I'd like to do in another model or controller
p = Post.new
p.place = Place.find_or_create_by_latlon(XXXXX, YYYYY) # race condition
p.save
But Place.find_or_create_by_latlon takes too much time to get the data when the action executed is create, and sometimes in production p.place is nil.
How can I force to wait for the response before execute p.save ?
Thanks for your advice.
You're right that this is a race condition and it can often be triggered by people who double click submit buttons on forms. What you might do is loop back if you encounter an error.
result = Place.find_by_place_id(...) ||
         Place.create(...) ||
         Place.find_by_place_id(...)
There are more elegant ways of doing this, but the basic method is here.
I had to deal with a similar problem. In our backend, a user is created from a token if the user doesn't exist. AFTER a user record is already created, a slow API call gets sent to update the user's information.
def self.find_or_create_by_facebook_id(facebook_id)
  User.find_by_facebook_id(facebook_id) || User.create(facebook_id: facebook_id)
rescue ActiveRecord::RecordNotUnique => e
  User.find_by_facebook_id(facebook_id)
end

def self.find_by_token(token)
  facebook_id = get_facebook_id_from_token(token)
  user = User.find_or_create_by_facebook_id(facebook_id)

  if user.unregistered?
    user.update_profile_from_facebook
    user.mark_as_registered
    user.save
  end

  return user
end
The first step of the strategy is to remove the slow API call (in my case update_profile_from_facebook) from the create method. Because the operation takes so long, including it in the call to create significantly increases the window for duplicate insert operations.
The second step is to add a unique constraint to your database column to ensure duplicates aren't created.
The final step is to create a function that will catch the RecordNotUnique exception in the rare case where duplicate insert operations are sent to the database.
This may not be the most elegant solution but it worked for us.
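For step two, the unique constraint is a one-line migration (table and column names follow the example above; newer Rails versions want a version suffix such as ActiveRecord::Migration[5.0]):
class AddUniqueIndexToUsersFacebookId < ActiveRecord::Migration
  def change
    add_index :users, :facebook_id, unique: true
  end
end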
I hit this inside a Sidekiq job that retries, gets the error repeatedly, and eventually clears itself. The best explanation I've found is in a blog post here. The gist is that Postgres keeps an internally stored sequence for incrementing the primary key, and that sequence can get out of sync. This rings true for me because I'm setting the primary key myself rather than using an incremented value, so that's likely how this cropped up. The solution, from the comments in the link above, is to call ActiveRecord::Base.connection.reset_pk_sequence!(table_name). This cleared up the issue for me.
begin
  result = Place.where(:place_id => place_id).limit(1)
  result = Place.create(:place_id => place_id, ... ) if result.empty? #!
rescue ActiveRecord::StatementInvalid => error
  @save_retry_count = (@save_retry_count || 1)
  ActiveRecord::Base.connection.reset_pk_sequence!(:place)
  retry if (@save_retry_count -= 1) >= 0
  raise error
end
