Memory Leak in Ruby net/ldap Module - ruby-on-rails

As part of my Rails application, I've written a little importer that sucks in data from our LDAP system and crams it into a User table. Unfortunately, the LDAP-related code leaks huge amounts of memory while iterating over our 32K users, and I haven't been able to figure out how to fix the issue.
The problem seems to be related to the LDAP library in some way, as when I remove the calls to the LDAP stuff, memory usage stabilizes nicely. Further, the objects that are proliferating are Net::BER::BerIdentifiedString and Net::BER::BerIdentifiedArray, both part of the LDAP library.
When I run the import, memory usage eventually peaks at over 1GB. I need to find some way to correct my code if the problem is there, or to work around the LDAP memory issues if that's where the problem lies. (Or if there's a better LDAP library for large imports for Ruby, I'm open to that as well.)
Here's the pertinent bit of my code:
require 'net/ldap'
require 'pp'
class User < ActiveRecord::Base
  validates_presence_of :name, :login, :email
  # This method is responsible for populating the User table with the
  # login, name, and email of anybody who might be using the system.
  def self.import_all
    # initialization stuff. set bind_dn, bind_pass, ldap_host, base_dn and filter
    ldap = Net::LDAP.new
    ldap.host = ldap_host
    ldap.auth bind_dn, bind_pass
    ldap.bind
    begin
      # Build the list
      records = records_updated = new_records = 0
      ldap.search(:base => base_dn, :filter => filter ) do |entry|
        name = entry.givenName.to_s.strip + " " + entry.sn.to_s.strip
        login = entry.name.to_s.strip
        email = login + "@txstate.edu"
        user = User.find_or_initialize_by_login :name => name, :login => login, :email => email
        if user.name != name
          user.name = name
          user.save
          logger.info( "Updated: " + email )
          records_updated = records_updated + 1
        elsif user.new_record?
          user.save
          new_records = new_records + 1
        else
          # update timestamp so that we can delete old records later
          user.touch
        end
        records = records + 1
      end
      # delete records that haven't been updated for 7 days
      records_deleted = User.destroy_all( ["updated_at < ?", Date.today - 7 ] ).size
      logger.info( "LDAP Import Complete: " + Time.now.to_s )
      logger.info( "Total Records Processed: " + records.to_s )
      logger.info( "New Records: " + new_records.to_s )
      logger.info( "Updated Records: " + records_updated.to_s )
      logger.info( "Deleted Records: " + records_deleted.to_s )
    end
  end
end
Thanks in advance for any help/pointers!
By the way, I did ask about this in the net/ldap support forum as well, but didn't get any useful pointers there.

One very important thing to note is that you never use the result of the method call. That means that you should pass :return_result => false to ldap.search:
ldap.search(:base => base_dn, :filter => filter, :return_result => false ) do |entry|
From the docs: "When :return_result => false, #search will return only a Boolean, to indicate whether the operation succeeded. This can improve performance with very large result sets, because the library can discard each entry from memory after your block processes it."
In other words, if you don't use this flag, all entries will be stored in memory, even if you do not need them outside the block! So, use this option.
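To make the difference concrete, here is a toy model of the behaviour the docs describe. FakeDirectory and its search method are made up for illustration and are not Net::LDAP's implementation; they only show why accumulating every entry costs memory while streaming does not.

```ruby
# Toy stand-in for a directory search. NOT Net::LDAP's real implementation;
# the class and its keyword argument are assumptions for illustration only.
class FakeDirectory
  def initialize(count)
    @count = count
  end

  # Yields each entry to the block. Entries are also collected and returned
  # unless return_result is false, in which case only true is returned.
  def search(return_result: true)
    results = return_result ? [] : nil
    @count.times do |i|
      entry = { login: "user#{i}" }
      yield entry
      results << entry if results
    end
    return_result ? results : true
  end
end

directory = FakeDirectory.new(1_000)
seen = 0

# Default mode: every entry stays referenced by the returned array.
kept = directory.search { |_entry| seen += 1 }

# Streaming mode: each entry becomes garbage as soon as the block returns.
streamed = directory.search(return_result: false) { |_entry| seen += 1 }

puts kept.size
puts streamed
```

In the first call all 1,000 entries are still reachable after the search finishes. In the second call nothing is retained, which is the effect you want for a 32K-user import.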

Related

Consume external API and performance

I am working on a Ruby on Rails project that consumes data through an external API.
This API allows me to get a list of cars and display them on my single webpage.
I created a model that holds all methods related to this API.
The controller uses list_cars method from the model to forward the data to the view.
This is the model dedicated to the API calls:
class CarsApi
  @base_uri = 'https://api.greatcars.com/v1/'
  def self.list_cars
    cars = Array.new
    response = HTTParty.get(@base_uri + 'cars',
      headers: {
        'Authorization' => 'Token token=' + ENV['GREATCARS_API_TOKEN'],
        'X-Api-Version' => ENV["GREATCARS_API_VERSION"]
      })
    response["data"].each_with_index do |(key, value), index|
      id = response["data"][index]["id"]
      make = response["data"][index]["attributes"]["make"]
      store = get_store(id)
      location = get_location(id)
      model = response["data"][index]["attributes"]["model"]
      if response["data"][index]["attributes"]["status"] == "on sale"
        cars << Job.new(id, make, store, location, model)
      end
    end
    cars
  end
  def self.get_store(job_id)
    store = ''
    response_related_store = HTTParty.get(@base_uri + 'cars/' + job_id + "/relationships/store",
      headers: {
        'Authorization' => 'Token token=' + ENV['GREATCARS_API_TOKEN'],
        'X-Api-Version' => ENV["GREATCARS_API_VERSION"]
      })
    if response_related_store["data"]
      store_id = response_related_store["data"]["id"]
      response_store = HTTParty.get(@base_uri + 'stores/' + store_id,
        headers: {
          'Authorization' => 'Token token=' + ENV['GREATCARS_API_TOKEN'],
          'X-Api-Version' => ENV["GREATCARS_API_VERSION"]
        })
      store = response_store["data"]["attributes"]["name"]
    end
    store
  end
  def self.get_location(job_id)
    address, city, country, zip, lat, long = ''
    response_related_location = HTTParty.get(@base_uri + 'cars/' + job_id + "/relationships/location",
      headers: {
        'Authorization' => 'Token token=' + ENV['GREATCARS_API_TOKEN'],
        'X-Api-Version' => ENV["GREATCARS_API_VERSION"]
      })
    if response_related_location["data"]
      location_id = response_related_location["data"]["id"]
      response_location = HTTParty.get(@base_uri + 'locations/' + location_id,
        headers: {
          'Authorization' => 'Token token=' + ENV['GREATCARS_API_TOKEN'],
          'X-Api-Version' => ENV["GREATCARS_API_VERSION"]
        })
      if response_location["data"]["attributes"]["address"]
        address = response_location["data"]["attributes"]["address"]
      end
      if response_location["data"]["attributes"]["city"]
        city = response_location["data"]["attributes"]["city"]
      end
      if response_location["data"]["attributes"]["country"]
        country = response_location["data"]["attributes"]["country"]
      end
      if response_location["data"]["attributes"]["zip"]
        zip = response_location["data"]["attributes"]["zip"]
      end
      if response_location["data"]["attributes"]["lat"]
        lat = response_location["data"]["attributes"]["lat"]
      end
      if response_location["data"]["attributes"]["long"]
        long = response_location["data"]["attributes"]["long"]
      end
    end
    Location.new(address, city, country, zip, lat, long)
  end
end
It takes... 1 minute and 10 seconds to load my home page!
I wonder if there is a better way to do this and improve performance.
It takes... 1 minute and 10 seconds to load my home page! I wonder if there is a better way to do this and improve performance.
If you want to improve the performance of some piece of code, the first thing you should always do is add some instrumentation. It seems you already have a metric for how long loading the whole page takes, but now you need to figure out WHAT is actually taking so long. There are many great gems and services out there which can help you.
Services:
https://newrelic.com/
https://www.skylight.io/
https://www.datadoghq.com/
Gems
https://github.com/MiniProfiler/rack-mini-profiler
https://github.com/influxdata/influxdb-rails
Or just add some plain old logging
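As a minimal sketch of the "plain old logging" option, the standard library's Benchmark module can time each step. The fetch_cars method and the timed helper below are hypothetical stand-ins, not part of the original code; in a Rails app you would probably write to Rails.logger instead of using puts.

```ruby
require 'benchmark'

# Hypothetical stand-in for one slow external API call.
def fetch_cars
  sleep 0.05
  %w[car1 car2]
end

# Runs the block, prints how long it took, and returns the block's result.
def timed(label)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  puts format('%s took %.3f s', label, elapsed)
  result
end

cars = timed('list_cars') { fetch_cars }
```

Wrapping get_store and get_location the same way would show quickly whether the time is spent in the list request or in the per-car requests.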
N+1 request
One assumption about why this is slow: for every car you make FOUR additional requests to fetch the store and location. This means that if you display 10 jobs on your homepage you need 41 API requests (1 for the list plus 4 per car). Even if each request takes only ~1 second, that is already about 40 seconds.
One simple idea is to cache the additional resources in a lookup table. I'm not sure how many jobs would share the same store and location though, so I'm not sure how much it would actually save.
This could be a simple lookup table like:
class Store
  cattr_reader :table do
    {}
  end
  class << self
    # Returns the cached store, fetching it from the API only on the first lookup
    def find(id)
      table[id] ||= fetch_store(id)
    end
    private
    def fetch_store(id)
      # copy code here to fetch store from API
    end
  end
end
This way, if several jobs have the same location / store you only do one request.
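The same idea can be tried outside Rails. The following is a plain-Ruby sketch under the assumption that the fetch can be passed in as a block; StoreLookup and the fake fetcher are hypothetical names, and the counter is there only to show how many "requests" are actually made.

```ruby
# Plain-Ruby sketch of a memoizing lookup table (names are illustrative).
class StoreLookup
  attr_reader :fetch_count

  # The block stands in for the real API request.
  def initialize(&fetcher)
    @fetcher = fetcher
    @table = {}
    @fetch_count = 0
  end

  # Returns the cached value for id, calling the fetcher only on a miss.
  def find(id)
    @table.fetch(id) do
      @fetch_count += 1
      @table[id] = @fetcher.call(id)
    end
  end
end

lookup = StoreLookup.new { |id| "Store #{id}" }
names = %w[7 7 9 7].map { |id| lookup.find(id) }

puts names.inspect
puts lookup.fetch_count
```

Four lookups hit the fetcher only twice, once per distinct id. Using Hash#fetch rather than ||= means a nil or false result is cached too instead of being refetched every time.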
Lazy load
This depends on the design of the page but you could lazy load additional information like location and store.
One thing many pages do is to display a placeholder or dominant colour and lazy load further content with JavaScript.
Another idea could be to load the store and location when scrolling or hovering, but this depends a bit on the design of your page.
Pagination / Limit
Maybe you're also requesting too many items from the API. See if the API has some options to limit the number of items you request, e.g. https://api.greatcars.com/v1/cars?limit=10&page=1
But as said, even if this is limited to 10 items you would still end up with 41 requests in total. So until you fix the first issue, this won't have much impact.
General Caching
Generally I think it's not a good idea to send an API request for every request your page gets. You could introduce some caching, e.g. to only do an API request once every x minutes / hour / day.
This could be as simple as just storing in a class variable or using memcached / Redis / Database.
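A minimal sketch of the "only once every x minutes" idea, kept in process memory. TimedCache is a hypothetical class written for this example; the clock is injectable only so the expiry can be demonstrated without waiting. In a Rails app, Rails.cache.fetch with the expires_in option covers the same ground with a shared store.

```ruby
# Minimal in-process cache with a time-to-live (illustrative, not thread-safe).
class TimedCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @value = nil
    @expires_at = nil
  end

  # Returns the cached value, running the block again only after the TTL passed.
  def fetch
    now = @clock.call
    if @expires_at.nil? || now >= @expires_at
      @value = yield
      @expires_at = now + @ttl
    end
    @value
  end
end

calls = 0
fake_now = 0.0
cache = TimedCache.new(600, clock: -> { fake_now })

first = cache.fetch { calls += 1; ['car 1'] }
second = cache.fetch { calls += 1; ['car 1'] }            # served from cache
fake_now = 601.0                                          # ten minutes later
third = cache.fetch { calls += 1; ['car 1', 'car 2'] }    # refreshed

puts calls
```

Only two of the three fetches reach the block, so the expensive API call runs at most once per ten-minute window.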

Refactor Models of gems - pluralized variables in rake-task

I'm an early ruby dev so excuse me if this is obvious.
Using the gioco gem to implement a gamification concept I renamed a few classes and references to better fit in the project (tested everything and it's working).
The problematic rename is Kind => BadgeKind.
This line in the following task:
r.points << Point.create({ :badge_kind_id => kinds.id, :value => '#{args.points}'})
What's the concept behind kinds.id and how do I solve this?
Error:
NameError: undefined local variable or method 'kinds' for main:Object
I tried: badge_kind, badge_kinds
Task:
task :add_badge, [:name, :points, :kind, :default] => :environment do |t, args|
arg_default = ( args.default ) ? eval(args.default) : false
if !args.name || !args.points || !args.kind
raise "There are missing some arguments"
else
badge_string = "kind = BadgeKind.find_or_create_by(name: '#{args.kind}')\n"
badge_string = badge_string + "badge = Badge.create({
:name => '#{args.name}',
:points => '#{args.points}',
:badge_kind_id => kind.id,
:default => '#{arg_default}'
})
"
if arg_default
# changing User.find(:all) to User.all
badge_string = badge_string + 'resources = User.all
'
badge_string = badge_string + "resources.each do |r|
r.points << Point.create({ :badge_kind_id => kinds.id, :value => '#{args.points}'})
r.badges << badge
r.save!
end
"
end
badge_string = badge_string + "puts '> Badge successfully created'"
# puts "badge_string:\n" + badge_string
eval badge_string
end
It looks like you already have tables in your application for Kind and Badge. If that's the case, then it's not a good idea to rename Kind to BadgeKind. The reason is that this notation is commonly used for join tables, and it looks like you might be colliding with one. Right now, I'm guessing a Badge can have a Kind or vice versa. In order to make that happen, there may be a BadgeKind table that contains their relationships. I can't tell for sure without seeing your schema file or migrations, but you're probably better off leaving that the way it is, or at least choosing a different name like BadgeType.
Also, it's worth noting that building up strings to be executed with eval is considered an antipattern. Eval is slow, and also exposes your application to malicious code injection. You're at less of a risk because it appears this is a rake task, but even so, if you refactor this code, you should try a different strategy. It looks like it would be relatively straightforward to refactor.
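As one small step in that direction, the eval(args.default) call in the task can be replaced by explicit parsing. This is only a sketch: parse_flag and TRUE_VALUES are hypothetical names, and the list of accepted spellings is an assumption you may want to adjust.

```ruby
# Hypothetical helper: interprets a rake argument as a boolean without eval.
TRUE_VALUES = %w[true t yes y 1].freeze

def parse_flag(raw)
  TRUE_VALUES.include?(raw.to_s.strip.downcase)
end

puts parse_flag('true')
puts parse_flag(nil)
puts parse_flag('puts 1')
```

Anything that is not one of the accepted spellings is treated as false, so arbitrary code passed as an argument is never executed.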

Skip extra request made by ActiveRecord

In a Rails application, I run some queries against an external database. In New Relic, when I look at the SQL requests I see this:
2.997 718 ms SHOW FULL FIELDS FROM `whale`
3.717 721 ms SHOW VARIABLES WHERE Variable_name = 'character_set_client'
4.440 728 ms SHOW TABLES LIKE 'whale'
5.169 668 ms SHOW CREATE TABLE `whale`
5.839 731 ms SELECT id, `whale`.`name` FROM `whale`
As you can see, all requests take a long time so I want to minimize them. I only need the last result.
This is my simple controller :
class AnimalsController < ApplicationController
  def index
    MicsPix.pluck(:id, :name)
    render text: 'ok'
  end
end
And my model :
class MicsPix < ActiveRecord::Base
  establish_connection(:otherdb)
  def self.table_name
    "whale"
  end
end
Is there a solution to skip queries that I don't use? I don't necessarily want to use ActiveRecord.
I'm not certain where the extra queries come from, but I have a suggestion on how to remove them:
c = ActiveRecord::Base.establish_connection(
  :adapter => "mysql2",
  :host => "localhost",
  :username => "root",
  :password => "",
  :database => "mydatabase"
)
sql = "SELECT id, `whale`.`name` FROM `whale`"
res = c.connection.execute(sql)
Then reset the db connection to the default.
This code works for me, and I can get results from an external db with only one query being executed. I did this in a controller method when I tried it out, but I think it would be neater in the model. As far as I understand, establish_connection(:otherdb) does the same as what I do when I produce my c. So I think you could write something like this in your MicsPix:
def get_list
  sql = "SELECT id, `whale`.`name` FROM `whale`"
  connection.execute(sql)
end
This returns a MySQL result object, which is quite similar to an array but not quite the same.
regards
/Albin

How to set up usernames from email of all users?

I would like to create a rake task that sets the username of all users without a username to the part before the '@' in their email address. So if my email is test@email.eu, my username should become test. If that username is not available, prepend a number to it (1).
I have a problem with checking the uniqueness of the username. The code below doesn't work after the second loop, e.g. when I have three emails (test@smt.com, test@smt.pl, test@oo.com), the username for test@oo.com will be empty.
Of course, I have a uniqueness validation for username in the User model.
desc "Set username of all users without a username"
task set_username_of_all_users: :environment do
  users_without_username = User.where(:username => ["", nil])
  users_without_username.each do |user|
    username = user.email.split('@').first
    users = User.where(:username => username)
    if users.blank?
      user.username = username
      user.save
    else
      users.each_with_index do |u, index|
        pre = (index + 1).to_s
        u.username = username.insert(0, pre)
        u.save
      end
    end
  end
end
Other ideas are in Gist: https://gist.github.com/3067635#comments
You could use a simple while loop for checking the username:
users_without_username = User.where(:username => nil)
users_without_username.each do |user|
  email_part = user.email.split('@').first
  user.username = email_part
  prefix = 1
  while user.invalid?
    # add and increment prefix until a valid name is found
    user.username = prefix.to_s + email_part
    prefix += 1
  end
  user.save
end
However, it might be a better approach to ask the user to enter a username upon next login.
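The prefix loop can also be tried without a database. The sketch below is an illustration only: unique_username is a hypothetical helper, a Set stands in for the uniqueness validation, and it assumes addresses are split on '@'.

```ruby
require 'set'

# Returns base if it is free, otherwise the first "<n>base" that is free.
# `taken` is any collection that responds to include?.
def unique_username(base, taken)
  candidate = base
  prefix = 1
  while taken.include?(candidate)
    candidate = "#{prefix}#{base}"
    prefix += 1
  end
  candidate
end

taken = Set.new
emails = %w[test@smt.com test@smt.pl test@oo.com]

usernames = emails.map do |email|
  name = unique_username(email.split('@').first, taken)
  taken << name
  name
end

puts usernames.inspect
```

Each candidate is built from the untouched base string, so the prefixes do not pile up the way they do when the same string is modified in place with insert.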
If I understand your code correctly, you are changing the usernames of existing users in the else branch? That does not look like a good idea.
You should also use a real finder to select the users that don't have a username. Otherwise you will load all the users before selecting on them.
I don't know if it matches your requirements, but you could just add a random number to the username so that you do not have the problem of duplicates.
Another thing you can use is Ruby's retry mechanism. Just let ActiveRecord raise an error and retry with a changed username.
begin
  do_something # exception raised
rescue
  # handles error
  retry # restart from beginning
end
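A runnable sketch of that retry pattern, with an attempt limit so it cannot loop forever. UsernameTaken, save_username and the TAKEN list are hypothetical stand-ins for the error ActiveRecord would raise on a failed save; they are not part of any library.

```ruby
# Hypothetical error and storage, standing in for a failing save.
class UsernameTaken < StandardError; end

TAKEN = %w[test 1test]

def save_username(name)
  raise UsernameTaken, name if TAKEN.include?(name)
  TAKEN << name
  name
end

# Tries base first, then "1base", "2base", ... until a save succeeds.
def save_with_retry(base, max_attempts: 10)
  attempts = 0
  candidate = base
  begin
    save_username(candidate)
  rescue UsernameTaken
    attempts += 1
    raise if attempts >= max_attempts # give up and re-raise the last error
    candidate = "#{attempts}#{base}"
    retry
  end
end

saved = save_with_retry('test')
puts saved
```

The variables changed inside rescue keep their values when retry re-runs the begin block, which is what lets each attempt use a different candidate.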
In your query User.find_by_username(username), you only expect 1 record to be provided. So you don't need any each. You should add your index in another way.

How to store the result of my algorithm?

I have an algorithm that searches through all of my site's users, finding those who share a common property with the user running the algorithm (by going to a certain page). It can find multiple users, and each can have multiple shared properties. The algorithm works fine in terms of finding the matches, but I'm having trouble working out how to store the data so that later I'll be able to use each unit of information. I need to be able to access both the found users and each of their respective shared properties, so I can't just build a string. This is an example of the output when run from the perspective of user 1:
user 4
sharedproperty3
sharedproperty6
user 6
sharedproperty6
sharedproperty10
sharedproperty11
What do I need to do to be able to store this data, and have access to any bit of it for further manipulation? I was thinking of a hash of a hash, but I can't really wrap my head around it. I'm pretty new to programming, and Ruby in particular. Thanks for reading!
EDIT - Here's the code. I'm fully expecting this to be the most incorrect way to do this, but it's my first try so be gentle :)
So if I'm understanding you guys correctly, instead of adding the interests to a string, I should be creating an array or a hash, adding each interest as I find it, then storing each of these in an array or hash? Thanks so much for the help.
def getMatchedUsers
  matched_user_html = nil
  combined_properties = nil
  online_user_list = User.logged_in.all
  shared_interest = false
  online_user_list.each do |n| # for every online user
    combined_properties = nil
    if n.email != current_user.email # that is not the current user
      current_user.properties.each do |o| # go through all of the current users properties
        n.properties.each do |p| # go through the online users properties
          if p.interestname.eql?(o.interestname) # if the online users property matches the current user
            shared_interest = true
            if combined_properties == nil
              combined_properties = o.interestname
            else
              combined_properties = combined_properties + ", " + o.interestname
            end
          end
        end
        if shared_interest == true
          matched_user_html = n.actualname + ": " + combined_properties
        end
      end
    end
  end
  return matched_user_html
  render :nothing => true
end
This returns an array of hashes with all users and their corresponding sharedproperties.
class User
  def find_matching_users
    [].tap do |matching_users|
      User.logged_in.each do |other_user|
        next if self == other_user # skip the user we are matching for
        # see http://ruby-doc.org/core/classes/Array.html#M002212 for more details on the & operator
        unless (common_properties = properties & other_user.properties).empty?
          matching_users << { :user => other_user, :common_properties => common_properties }
        end
      end
    end
  end
end
In your view you can do something like this:
<%- current_user.find_matching_users.each do |matching_user| -%>
  <%-# you can access the user with matching_user[:user] -%>
  <%-# you can access the common properties with matching_user[:common_properties] -%>
<%- end -%>
You can use a hash table with the key being the user object and the value being an array of the shared properties. This assumes that you first need to do a lookup based on the user.
Something like this:
@user_results = { user1 => [sharedproperty3, sharedproperty7], user2 => [sharedproperty10, sharedproperty11, sharedproperty12] }
You can then access the values like:
@user_results[user1]
or you can also iterate over all the keys using @user_results.keys
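Here is a self-contained sketch of building such a hash. Person is a hypothetical Struct standing in for the User model, and interests are plain strings so that the array intersection operator (&) compares them by value; with real model objects you would intersect the interest names rather than the records.

```ruby
# Hypothetical stand-in for the User model: a name and a list of interest names.
Person = Struct.new(:name, :interests)

# Returns a hash of other user => array of interests shared with `current`.
def shared_interests(current, others)
  others.each_with_object({}) do |other, matches|
    next if other.equal?(current) # skip the user we are matching for
    common = current.interests & other.interests
    matches[other] = common unless common.empty?
  end
end

user1 = Person.new('user 1', %w[sharedproperty3 sharedproperty6 sharedproperty10 sharedproperty11])
user4 = Person.new('user 4', %w[sharedproperty3 sharedproperty6])
user5 = Person.new('user 5', %w[somethingelse])
user6 = Person.new('user 6', %w[sharedproperty6 sharedproperty10 sharedproperty11])

results = shared_interests(user1, [user1, user4, user5, user6])

results.each do |user, interests|
  puts "#{user.name}: #{interests.join(', ')}"
end
```

Both the matched users (results.keys) and each user's shared properties (results[user4]) stay available as real objects, so nothing has to be parsed back out of a string.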
