Why is Lua so slow in Redis? Any workarounds?

I'm evaluating the use of Lua scripts in Redis, and they seem to be a bit slow. I ran a benchmark as follows:
For the non-Lua version, I did a simple SET key_i val_i 1M times
For the Lua version, I did the same thing, but in a script: EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 key_i val_i
Testing on my laptop, the Lua version is about 3x slower than the non-Lua version. I understand that Lua is a scripting language, not compiled, etc., but this seems like a lot of performance overhead--is this normal?
Assuming this is indeed normal, are there any workarounds? Is there a way to implement a script in a faster language, such as C (which Redis is written in), to achieve better performance?
Edit: I am testing this using the go code located here: https://gist.github.com/ortutay/6c4a02dee0325a608941

The problem is not with Lua or Redis; it's with your expectations. You are compiling a script 1 million times. There is no reason to expect this to be fast.
The purpose of EVAL within Redis is not to execute a single command; you could do that yourself. The purpose is to do complex logic within Redis itself, on the server rather than on your local client. That is, instead of doing one set operation per-EVAL, you actually perform the entire series of 1 million sets within a single EVAL script, which will be executed by the Redis server itself.
I don't know much about Go, so I can't write the syntax for calling it. But I know what the Lua script would look like:
for i = 1, tonumber(ARGV[1]) do
  local key = "key:" .. tostring(i)
  redis.call('SET', key, i)
end
Put that in a Go string, then pass it to the appropriate call, with no key arguments and a single non-key argument: the number of times to loop.
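For illustration, here is what building and invoking that script looks like in Ruby with the redis-rb gem (shown as a hedged sketch, since the asker is using Go; the invocation is commented out because it needs a live Redis server):

```ruby
# The loop from the answer, embedded in a host-language string.
lua_script = <<-LUA
  for i = 1, tonumber(ARGV[1]) do
    local key = "key:" .. tostring(i)
    redis.call('SET', key, i)
  end
LUA

# Hypothetical invocation (requires a running Redis server and the redis gem):
#   redis = Redis.new
#   redis.eval(lua_script, [], [1_000_000])  # no keys, one non-key argument
puts lua_script
```

The whole million-SET loop then costs a single round trip instead of a million.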

I stumbled on this thread and was also curious about the benchmark results, so I wrote a quick Ruby script to compare them. The script does a simple SET/GET pair on the same key using different options.
require "redis"
def elapsed_time(name, &block)
start = Time.now
block.call
puts "#{name} - elapsed time: #{(Time.now-start).round(3)}s"
end
iterations = 100000
redis_key = "test"
redis = Redis.new
elapsed_time "Scenario 1: From client" do
iterations.times { |i|
redis.set(redis_key, i.to_s)
redis.get(redis_key)
}
end
eval_script1 = <<-LUA
redis.call("SET", "#{redis_key}", ARGV[1])
return redis.call("GET", "#{redis_key}")
LUA
elapsed_time "Scenario 2: Using EVAL" do
iterations.times { |i|
redis.eval(eval_script1, [redis_key], [i.to_s])
}
end
elapsed_time "Scenario 3: Using EVALSHA" do
sha1 = redis.script "LOAD", eval_script1
iterations.times { |i|
redis.evalsha(sha1, [redis_key], [i.to_s])
}
end
eval_script2 = <<-LUA
for i = 1,#{iterations} do
redis.call("SET", "#{redis_key}", tostring(i))
redis.call("GET", "#{redis_key}")
end
LUA
elapsed_time "Scenario 4: Inside EVALSHA" do
sha1 = redis.script "LOAD", eval_script2
redis.evalsha(sha1, [redis_key], [])
end
eval_script3 = <<-LUA
for i = 1,2*#{iterations} do
redis.call("SET", "#{redis_key}", tostring(i))
redis.call("GET", "#{redis_key}")
end
LUA
elapsed_time "Scenario 5: Inside EVALSHA with 2x the operations" do
sha1 = redis.script "LOAD", eval_script3
redis.evalsha(sha1, [redis_key], [])
en
I got the following results running on my MacBook Pro:
Scenario 1: From client - elapsed time: 11.498s
Scenario 2: Using EVAL - elapsed time: 6.616s
Scenario 3: Using EVALSHA - elapsed time: 6.518s
Scenario 4: Inside EVALSHA - elapsed time: 0.241s
Scenario 5: Inside EVALSHA with 2x the operations - elapsed time: 0.5s
In summary:
scenario 1 vs. scenario 2 shows that the main contributor is round-trip time: scenario 1 makes two requests to Redis per iteration while scenario 2 makes only one, and scenario 1 takes ~2x as long
scenario 2 vs. scenario 3 shows that EVALSHA does provide some benefit, and I am sure this benefit grows as the script gets more complex
scenario 4 vs. scenario 5 shows that the overhead of invoking the script is minimal: we doubled the number of operations and saw only a ~2x increase in execution time

There is now a workaround using a module created by John Sully. It works for Redis and KeyDB and lets you use the V8 JIT engine, which runs complex scripts much faster than Lua scripts: https://github.com/JohnSully/ModJS

Related

waitForCompletion(timeout) in Abaqus API does not actually kill the job after timeout passes

I'm doing a parametric sweep of some Abaqus simulations, so I'm using the waitForCompletion() function to prevent the script from moving on prematurely. However, occasionally the combination of parameters causes the simulation to hang on one or two of the parameters in the sweep for something like half an hour to an hour, whereas most parameter combos only take ~10 minutes. I don't need all the data points, so I'd rather sacrifice one or two results to power through more simulations in that time. Thus I tried to use waitForCompletion(timeout) as documented here. But it doesn't work: it ends up functioning just like an indefinite waitForCompletion, regardless of how low I set the wait time. I am using Abaqus 2017, and I was wondering if anyone else has gotten this function to work, and if so, how?
While I could use a workaround like adding a custom timeout function and using the kill() function on the job, I would prefer to use the built-in functionality of the Abaqus API, so any help is much appreciated!
It seems like starting from a certain version the timeOut optional argument was removed from this method: compare the "Scripting Reference Manual" entry in the documentation of v6.7 and v6.14.
You have a few options:
From the Abaqus API: checking whether the my_abaqus_script.023 file still exists during the simulation:
import os, time

timeOut = 600
total_time = 60
time.sleep(60)
# wait until the job is completed
while os.path.isfile('my_job_name.023'):
    if total_time > timeOut:
        my_job.kill()
        break
    total_time += 60
    time.sleep(60)
From outside: launching the job using the subprocess module.
Note: don't use the interactive keyword in your command because it blocks the execution of the script while the simulation process is active.
import subprocess, os, time

my_cmd = 'abaqus job=my_abaqus_script analysis cpus=1'
log = open('my_study.log', 'w')
err = open('my_study.err', 'w')
proc = subprocess.Popen(
    my_cmd,
    cwd=my_working_dir,
    stdout=log,   # Popen expects file objects, not file names
    stderr=err,
    shell=True
)
and checking the return code of the child process using poll() (see also returncode):
timeOut = 600
total_time = 60
time.sleep(60)
# wait until the job is completed
while proc.poll() is None:
    if total_time > timeOut:
        proc.terminate()
        break
    total_time += 60
    time.sleep(60)
or waiting until the timeOut is reached using wait():
timeOut = 600
try:
    proc.wait(timeOut)
except subprocess.TimeoutExpired:
    print('TimeOut reached!')
Note: I know that the terminate() and wait() methods should work in theory, but I haven't tried this solution myself, so there may be additional complications (like having to look for all child processes created by Abaqus using psutil.Process(proc.pid)).

Mutual exclusion not happening in Ruby

Program:
def inc(n)
  n + 1
end

sum = 0
threads = (1..10).map do
  Thread.new do
    10_000.times do
      sum = inc(sum)
    end
  end
end
threads.each(&:join)
p sum
Output:
$ ruby MutualExclusion.rb
100000
$
I expected the output of this program to be less than 100,000, because it creates 10 threads and each thread updates the shared variable 'sum' 10,000 times. Since mutual exclusion is not handled here, I expected a race condition to corrupt the count during execution, giving a result below 100,000. But the program prints exactly 100,000. How did that happen? Who handles the mutual exclusion here? And how can I experiment with this problem (mutual exclusion)?
The default interpreter for Ruby (MRI) doesn't execute threads in parallel. The mechanism preventing your race condition from producing the behavior you expected is the Global Interpreter Lock (GIL).
You can learn more about this, including a very similar demonstration, here: http://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil
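To experiment with this, make the mutual exclusion explicit instead of relying on the interpreter: wrap the shared update in a Mutex. A minimal sketch, reusing the question's inc helper, that gives a deterministic result on any Ruby implementation (not just MRI, where the GIL happens to mask the race):

```ruby
def inc(n)
  n + 1
end

# Explicit mutual exclusion: only one thread may run the read-modify-write
# at a time, so no increment can be lost on any interpreter.
mutex = Mutex.new
sum = 0

threads = (1..10).map do
  Thread.new do
    10_000.times do
      mutex.synchronize { sum = inc(sum) }
    end
  end
end
threads.each(&:join)
p sum  # => 100000
```

On JRuby or TruffleRuby, the version without the Mutex really can print less than 100,000, which is the behavior you expected.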

Why is this RegExp taking 16 minutes to process on Rails?

I've written a function to remove email addresses from my data using gsub. The code is below. The problem is that it takes a total of 27 minutes to execute on a set of 10,000 records (16 minutes for the first pattern, 11 minutes for the second). Elsewhere in the code I process about 20 other RegExps using a similar flow (iterating through data.each) and they all finish in less than a second. (BTW, I recognize that my RegExps aren't perfect and may catch some strings that aren't email addresses.)
Is there something about these two RegExps that causes the processing time to be so high? I've tried it on seven different data sources, all with the same result, so the problem isn't peculiar to my data set.
def remove_email_addresses!(data)
  email_patterns = [
    /[[:graph:]]+@[[:graph:]]+/i,
    /[[:graph:]]+ +at +[^ ][ [[:graph:]]]{0,40} +dot +com/i
  ]
  data.each do |row|
    email_patterns.each do |pattern|
      row[:title].gsub!(pattern, "") unless row[:title].blank?
      row[:description].gsub!(pattern, "") unless row[:description].blank?
    end
  end
end
Check that your faster code isn't just doing var =~ /blah/ matching, rather than replacement: that is several orders of magnitude faster.
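The difference can be seen directly; a small sketch (the sample string is made up for illustration):

```ruby
title = "Contact someone@example.com about the listing"
pattern = /[[:graph:]]+@[[:graph:]]+/i

# =~ returns the offset of the first match (or nil) and can stop scanning
# as soon as one match is found ...
match_offset = (title =~ pattern)
p match_offset  # => 8

# ... while gsub must scan the entire string and build a new copy with
# every occurrence replaced.
cleaned = title.gsub(pattern, "")
p cleaned
```

If the fast code paths only ever test for a match, comparing their timings against a replacement loop is apples to oranges.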
In addition to reducing backtracking and replacing + and * with ranges for safety, as follows...
email_patterns = [
  /\b[-_.\w]{1,128}@[-_.\w]{1,128}/i,
  /\b[-_.\w]{1,128} {1,10}at {1,10}[^ ][-_.\w ]{0,40} {1,10}dot {1,10}com/i
]
... you could also try "unrolling your loop", though this is unlikely to cause any issues unless there is some kind of interaction between the iterators (which there shouldn't be, but...). That is:
data.each do |row|
  row[:title].gsub!(email_patterns[0], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[0], "") unless row[:description].blank?
  row[:title].gsub!(email_patterns[1], "") unless row[:title].blank?
  row[:description].gsub!(email_patterns[1], "") unless row[:description].blank?
end
Finally, if this causes little to no speedup, consider profiling with something like ruby-prof to find out whether the regexes themselves are the issue, or whether there's a problem in the do iterator or the unless clauses instead.
Could it be that the data is large enough that it causes issues with paging once read in? If so, might it be faster to read the data in and parse it in chunks of N entries, rather than process the whole lot at once?
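The chunking idea can be sketched with each_slice (the batch size and sample data here are made up; tune the slice size to your memory budget):

```ruby
# Hypothetical in-memory rows standing in for the real data source.
rows = (1..5).map { |i| { title: "listing #{i} contact someone@example.com" } }
pattern = /[[:graph:]]+@[[:graph:]]+/i

# Process the rows in fixed-size batches rather than all at once, so only
# one batch's worth of strings is being rewritten at a time.
rows.each_slice(2) do |batch|
  batch.each { |row| row[:title] = row[:title].gsub(pattern, "") }
end

p rows.first[:title]  # => "listing 1 contact "
```

With a database source, the equivalent would be fetching and scrubbing N records per query rather than loading all 10,000 up front.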

Ruby Multi threading, what am I doing wrong?

So, in order to improve the speed of our app, I'm experimenting with multi-threading in our Rails app.
Here is the code:
require 'thwait'
require 'benchmark'

city = Location.find_by_slug("orange-county", :select => "city, state, lat, lng", :limit => 1)
filters = ContractorSearchConditions.new()
image_filter = ImageSearchConditions.new()

filters.lat = city.lat
filters.lon = city.lng
filters.mile_radius = 20
filters.page_size = 15
filters.page = 1
image_filter.page_size = 5

sponsored_filter = filters.dup
sponsored_filter.has_advertised = true
sponsored_filter.page_size = 50

Benchmark.bm do |b|
  b.report('with') do
    1.times do
      cities = Thread.new{
        Location.where("lat between ? and ? and lng between ? and ?", city.lat-0.5, city.lat+0.5, city.lng-0.5, city.lng+0.5)
      }
      images = Thread.new{
        Image.search(image_filter)[:hits]
      }
      sponsored_results_extended = Thread.new{
        sponsored_filter.mile_radius = 50
        #sponsored_results = Contractor.search( sponsored_filter )
      }
      results = Thread.new{
        Contractor.search( filters )
      }
      ThreadsWait.all_waits(cities, images, sponsored_results_extended, results)
      #cities = cities.value
      #images = images.value
      #sponsored_results = sponsored_results_extended.value
      #results = results.value
    end
  end
  b.report('without') do
    1.times do
      #cities = Location.where("lat between ? and ? and lng between ? and ?", city.lat-0.5, city.lat+0.5, city.lng-0.5, city.lng+0.5)
      #image = Image.search(image_filter)[:hits]
      #sponsored_results = Contractor.search( sponsored_filter )
      #results = Contractor.search( filters )
    end
  end
end
Class.search runs a search on our ElasticSearch servers (3 servers behind a load balancer), while the ActiveRecord queries run against our RDS instance.
(Everything is in the same datacenter.)
Here is the output on our dev server:
Bob#dev-web01:/usr/local/dev/buildzoom/rails$ script/rails runner script/thread_bm.rb -e development
user system total real
with 0.100000 0.010000 0.110000 ( 0.342238)
without 0.020000 0.000000 0.020000 ( 0.164624)
Note: I have very limited (if any) knowledge about threads, mutexes, the GIL, etc.
There is a lot more overhead in the "with" block than the "without" block due to Thread creation and management. Using threads helps the most when the code is IO-bound, and it appears that is NOT the case here. Four searches complete in 20ms (the "without" block), which implies that in parallel those searches should take less than that amount of time. The "with" block takes 100ms to execute, so we can deduce that at least 80ms of that time is not spent in searches. Try benchmarking with longer queries to see how the results differ.
Note that I've made the assumption that all searches have the same latency, which may or may not be true, and always perform the same. It may be possible that the "without" block benefits from some sort of query caching since it runs after the "with" block. Do results differ when you swap the order of the benchmarks? Also, I'm ignoring overhead from the iteration (1.times). You should remove that unless you change the number of iterations to something greater than 1.
Even though you are using threads, and hence performing query IO in parallel, you still need to deserialize whatever results come back from your queries. This uses the CPU. MRI Ruby 2.0.0 has a global interpreter lock. This means only one thread can execute Ruby code at a time, on a single CPU core. In order to deserialize all your results, the CPU has to context-switch many times between the different threads. This is a lot more overhead than deserializing each result set sequentially.
If your wall time is dominated by waiting for a response from your queries, and they don't all come back at the same time, then there might be an advantage to parallelizing with threads. But it's hard to predict that.
You could try using JRuby or Rubinius. These will both utilize multiple cores, and hence can actually speed up your code as expected.

How to implement multi-user safe Linear Congruential Generator in Redis?

I use Linear Congruential Generators (http://en.wikipedia.org/wiki/Linear_congruential_generator) to generate IDs exposed to users.
nextID = (a * LastID + c) % m
Now I want to implement LCGs in Redis. Here's the problem:
Getting the current ID and generating the next ID outside Redis is not multi-user safe. Redis has 2 commands which can be used for simple counters: INCRBY and INCRBYFLOAT, but unfortunately Redis doesn't support the modulo operation natively. At the moment the only way I see is using the EVAL command and writing a Lua script.
UPDATE1:
Some lua analog of
INCRBY LCG_Value ((LCG_Value*a+c)%m)-LCG_Value
seems to be a neat way to achieve this.
A server-side Lua script is probably the easiest and most efficient way to do it.
It can also be done using Redis primitive operations with a WATCH/MULTI/EXEC block. Here it is in pseudo-code:
while not successful:
    WATCH LCG_value
    $LastID = GET LCG_value
    $NextID = (a * $LastID + c) % m
    MULTI
    SET LCG_value $NextID
    EXEC
Of course, it is less efficient than the following Lua script:
# Set the initial value
SET LCG_value 1
# Execute the Lua script on LCG_value with a, c, and m as parameters
EVAL "local last = tonumber(redis.call('GET', KEYS[1]));
      local ret = (last * ARGV[1] + ARGV[2]) % ARGV[3];
      redis.call('SET', KEYS[1], ret);
      return ret;
     " 1 LCG_value 1103515245 12345 2147483648
Note: the whole script execution is atomic. See the EVAL documentation.
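As a sanity check, the recurrence with the same constants as the EVAL example can be reproduced client-side in plain Ruby:

```ruby
# glibc-style LCG constants, matching the EVAL example above.
A = 1103515245
C = 12345
M = 2147483648  # 2**31

def next_id(last_id)
  (A * last_id + C) % M
end

# Starting from the initial value 1, as in "SET LCG_value 1".
id = next_id(1)
p id  # => 1103527590
```

Running the Lua script once against a key initialized to 1 should return the same value, which is a quick way to verify the server-side version is wired up correctly.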