Redis sorted set and solving ties - lua

I'm using a Redis sorted set to store a ranking for a project I'm working on. We hadn't anticipated (!) how we wanted to handle ties. Redis sorts lexicographically the entries that have the same score, but what we want to do is instead give the same rank to all the entries that have the same score, so for instance in the case of
redis 127.0.0.1:6379> ZREVRANGE foo 0 -1 WITHSCORES
1) "first"
2) "3"
3) "second3"
4) "2"
5) "second2"
6) "2"
7) "second1"
8) "2"
9) "fifth"
10) "1"
we want to consider second1, second2 and second3 as both having position 2, and fifth to have position 5. Hence there is no entry in the third or fourth position. ZREVRANK is not useful here, so what's the best way to get the number I'm looking for?

It seems to me one way is writing a little Lua script and use the EVAL command. The resulting operation has still logarithmic complexity.
For example, suppose we are interested in the position of second2. In the script, first we get its score with ZSCORE, obtaining 2. Then we get the first entry with that score using ZRANGEBYSCORE, obtaining second3. The position we're after is then ZREVRANK of second3 plus 1.
redis 127.0.0.1:6379> ZSCORE foo second2
"2"
redis 127.0.0.1:6379> ZREVRANGEBYSCORE foo 2 2 LIMIT 0 1
1) "second3"
redis 127.0.0.1:6379> ZREVRANK foo second3
(integer) 1
So the script could be something like
local score = redis.call('zscore', KEYS[1], ARGV[1])
if score then
local member = redis.call('zrevrangebyscore', KEYS[1], score, score, 'limit', 0, 1)
return redis.call('zrevrank', KEYS[1], member[1]) + 1
else return -1 end

Related

Picking a chosen amount of things from a table without repetition

So I'm currently working on a little side project, so this is my first time learning LUA and I'm currently stuck. So what I'm trying to do is create a function that will randomly choose two numbers between 1 and 5 and make it so they can not collide with the player. I can not seem to get the ability to chose two numbers at random without them being the same. I've been looking around, but have not been able to find a clear answer. Any help would be much appreciated!
My code so far:
local function RandomChoice1()
local t = {workspace.Guess1.CB1,workspace.Guess1.CB2,workspace.Guess1.CB3,workspace.Guess1.CB4,workspace.Guess1.CB5}
local i = math.random(1,5)
end
If you need to select one with probability 20% (one from 1..5 range) and the second one with probability 25% (one from 1..5 range minus the first choice), then something like this should work:
local i1 = math.random(1,5) -- pick one at random from 1..5 interval
-- shift the interval up to account for the selected item
local i2 = math.random(2,5) -- pick one at random from 2..5 interval
-- assign 1 in case of a collision
if i2 == i1, then i2 = 1 end
This will guarantee the numbers not being equal and satisfying your criteria.
Instead of generating i2 you can generate difference i2 - i1
local i1 = math.random(5) -- pick one at random from 1..5 interval
local diff = math.random(4) -- pick one at random from 1..4 interval
local i2 = (i1 + diff - 1) % 5 + 1 -- from 1..5 interval, different from i1
print(i1, i2)
You could use recursion. Save the previous number and if it's the same just generate a new one until its not the same. This way you are garaunteed to never have the same number twice.
local i = 0;
function ran(min,max)
local a = math.random(min,max);
if (a == i) then
return ran(min,max);
else
i = a;
return a;
end
end
Example: "2 from 5" without doubles...
local t = {}
for i = 1, 5 do
t[i] = i
end
-- From now a simple table.remove()...
-- ( table.remove() returns the value of removed key/value pair )
-- ...on a random key avoids doubles
for i = 1, 2 do
print(table.remove(t, math.random(#t)))
end
Example output...
1
4

How to combine search text with other criteria using Redis?

I successfully wrote an intersection of text search and other criteria using Redis. To achieve that I'm using a Lua script. The issue is that I'm not only reading, but also writing values from that script. From Redis 3.2 it's possible to achieve that by calling redis.replicate_commands(), but not before 3.2.
Below is how I'm storing the values.
Names
> HSET product:name 'Cool product' 1
> HSET product:name 'Nice product' 2
Price
> ZADD product:price 49.90 1
> ZADD product:price 54.90 2
Then, to get all products that matches 'ice', for example, I call:
> HSCAN product:name 0 MATCH *ice*
However, since HSCAN uses a cursor, I have to call it multiple times to fetch all results. This is where I'm using a Lua script:
local cursor = 0
local fields = {}
local ids = {}
local key = 'product:name'
local value = '*' .. ARGV[1] .. '*'
repeat
local result = redis.call('HSCAN', key, cursor, 'MATCH', value)
cursor = tonumber(result[1])
fields = result[2]
for i, id in ipairs(fields) do
if i % 2 == 0 then
ids[#ids + 1] = id
end
end
until cursor == 0
return ids
Since it's not possible to use the result of a script with another call, like SADD key EVAL(SHA) .... And also, it's not possible to use global variables within scripts. I've changed the part inside the fields' loop to access the list of ID's outside the script:
if i % 2 == 0 then
ids[#ids + 1] = id
redis.call('SADD', KEYS[1], id)
end
I had to add redis.replicate_commands() to the first line. With this change I can get all ID's from the key I passed when calling the script (see KEYS[1]).
And, finally, to get a list 100 product ID's priced between 40 and 50 where the name contains "ice", I do the following:
> ZUNIONSTORE tmp:price 1 product:price WEIGHTS 1
> ZREMRANGEBYSCORE tmp:price 0 40
> ZREMRANGEBYSCORE tmp:price 50 +INF
> EVALSHA b81c2b... 1 tmp:name ice
> ZINTERSTORE tmp:result tmp:price tmp:name
> ZCOUNT tmp:result -INF +INF
> ZRANGE tmp:result 0 100
I use the ZCOUNT call to know in advance how many result pages I'll have, doing count / 100.
As I said before, this works nicely with Redis 3.2. But when I tried to run the code at AWS, which only supports Redis up to 2.8, I couldn't make it work anymore. I'm not sure how to iterate with HSCAN cursor without using a script or without writing from the script. There is a way to make it work on Redis 2.8?
Some considerations:
I know I can do part of the processing outside Redis (like iterate the cursor or intersect the matches), but it'll affect the application overall performance.
I don't want to deploy a Redis instance by my own to use version 3.2.
The criteria above (price range and name) is just an example to keep things simple here. I have other fields and type of matches, not only those.
I'm not sure if the way I'm storing the data is the best way. I'm willing to listen suggestion about it.
The only problem I found here is storing the values inside a lua scirpt. So instead of storing them inside a lua, take that value outside lua (return that values of string[]). Store them in a set in a different call using sadd (key,members[]). Then proceed with intersection and returning results.
> ZUNIONSTORE tmp:price 1 product:price WEIGHTS 1
> ZREVRANGEBYSCORE tmp:price 0 40
> ZREVRANGEBYSCORE tmp:price 50 +INF
> nameSet[] = EVALSHA b81c2b... 1 ice
> SADD tmp:name nameSet
> ZINTERSTORE tmp:result tmp:price tmp:name
> ZCOUNT tmp:result -INF +INF
> ZRANGE tmp:result 0 100
IMO your design is the most optimal one. One advice would be to use pipeline wherever possible, as it would process everything at one go.
Hope this helps
UPDATE
There is no such thing like array ([ ]) in lua you have to use the lua table to achieve it. In your script you are returning ids right, that itself is an array you can use it as a separate call to achieve the sadd.
String [] nameSet = (String[]) evalsha b81c2b... 1 ice -> This is in java
SADD tmp:name nameSet
And the corresponding lua script is the same as that of your 1st one.
local cursor = 0
local fields = {}
local ids = {}
local key = 'product:name'
local value = '*' .. ARGV[1] .. '*'
repeat
local result = redis.call('HSCAN', key, cursor, 'MATCH', value)
cursor = tonumber(result[1])
fields = result[2]
for i, id in ipairs(fields) do
if i % 2 == 0 then
ids[#ids + 1] = id
end
end
until cursor == 0
return ids
The problem isn't that you're writing to the database, it's that you're doing a write after a HSCAN, which is a non-deterministic command.
In my opinion there's rarely a good reason to use a SCAN command in a Lua script. The main purpose of the command is to allow you to do things in small batches so you don't lock up the server processing a huge key space (or hash key space). Since scripts are atomic, though, using HSCAN doesn't help—you're still locking up the server until the whole thing's done.
Here are the options I can see:
If you can't risk locking up the server with a lengthy command:
Use HSCAN on the client. This is the safest option, but also the slowest.
If you're want to do as much processing in a single atomic Lua command as possible:
Use Redis 3.2 and script effects replication.
Do the scanning in the script, but return the values to the client and initiate the write from there. (That is, Karthikeyan Gopall's answer.)
Instead of HSCAN, do an HKEYS in the script and filter the results using Lua's pattern matching. Since HKEYS is deterministic you won't have a problem with the subsequent write. The downside, of course, is that you have to read in all of the keys first, regardless of whether they match your pattern. (Though HSCAN is also O(N) in the size of the hash.)

Forcing a table rehash not working after a previous rehash

I've created a function that resizes an array and sets new entries to 0, but can also decrease the size of the array in 2 different ways:
1. Simply setting the n property to the new size (the length operator cannot be used because of this reason).
2. Setting all values after the new size to nil up to 2*size to force a rehash.
local function resize(array, elements, free)
local size = array.n
if elements < size then -- Decrease Size
array.n = elements
if free then
size = math.max(size, #array) -- In case of multiple resizes
local base = elements + 1
for idx = base, 2*size do -- Force a rehash -> free extra unneeded memory
array[idx] = nil
end
end
elseif elements > size then -- Increase Size
array.n = elements
for idx = size + 1, elements do
array[idx] = 0
end
end
end
How I tested it:
local mem = {n=0};
resize(mem, 50000)
print(mem.n, #mem) -- 50000 50000
print(collectgarbage("count")) -- relatively large number
resize(mem, 10000, true)
print(mem.n, #mem) -- 10000 10000
print(collectgarbage("count")) -- smaller number
resize(mem, 20, true)
print(mem.n, #mem) -- 20 20
print(collectgarbage("count")) -- same number as above, but it should be a smaller number
However when I don't pass true as the third argument to the second call of resize (so it doesn't force a rehash on the second call), the third call does end up rehashing it.
Am I missing something? I'm expecting the third one to also rehash after the second one has.
Here is a clearer picture of how the table usually looks like before and after the resizes:
table: 0x15bd3d0 n: 0 #: 0 narr: 0 nrec: 1
table: 0x15bd3d0 n: 50000 #: 50000 narr: 65536 nrec: 1
table: 0x15bd3d0 n: 10000 #: 10000 narr: 16384 nrec: 2
table: 0x15bd3d0 n: 20 #: 20 narr: 16384 nrec: 2
And here is what happens:
During the resize to 50000 elements, the table is rehashed several times, and at the end it contains exactly one hash part slot for the n field and enough array part slots for the integer keys.
During the shrinking to 10000 elements, you first assign nil to the integer keys 10001 to 65536, and then from 65537 to 100000. The first group of assignments will never cause a rehash, because you assign to existing fields. This has to do with the guarantees for the next function. The second group of assignments will cause rehashes, but since you are assinging nils, Lua will realize at some point that the array part of the table is more than half empty (see comment at the beginning of ltable.c). Lua will then shrink the array part to a reasonable size and use a second hash slot for the new key. But since you are assigning nils, that second hash slot is never occupied, and Lua is free to re-use it for all the remaining assignments (and it often but not always does). You wouldn't notice a rehash at this point anyway, because you will always end up with the 16384 array slots and 2 hash slots (one for n, one for the new element to be assigned).
The shrinking to 20 elements just continues this way, with the exception that a second hash slot is already available. So you might never get a rehash (and the array size stays larger than necessary), but if you do (Lua for some reason doesn't like the one free hash slot), you'll see the number of array slots drop to a reasonable level.
This is what it looks like when you do get a rehash during the second shrinking:
table: 0x11c43d0 n: 0 #: 0 narr: 0 nrec: 1
table: 0x11c43d0 n: 50000 #: 50000 narr: 65536 nrec: 1
table: 0x11c43d0 n: 10000 #: 10000 narr: 16384 nrec: 2
table: 0x11c43d0 n: 20 #: 20 narr: 32 nrec: 2
If you want to repeat my experiments, the git HEAD version of lua-getsize (original version here) now also returns the number of slots in the array/hash parts of a table.

how to delete key from list in redis matching a pattern?

Using the ruby redis client
I have a key that contains a list of values they follow the pattern of
campaign_id|telephone|query_id
there are thousands of these in a individual list what i want to do is delete all the ones that have for example the query_id of 4 from that redis list. Can you do this through some sort of pattern matching? Please could someone give me an example as i've been reading through other questions and am a bit lost
You basically have one of two options: a) do it your (RoR) application or b) do it in Redis.
I'm not a RoR expert so I can't advise on the how, but note that taking the a) path you'll basically be moving the entire list to your application and there do the filtering. The bigger your list is, the more time it will take it to cross the network.
Option b) means that you'll be filtering the list right in Redis - this can be done simply and efficiently when you use Lua. Example:
$ cat dellistbyqueryid.lua
-- removes a range of list elements that confirm to a given
-- query_id. Elements are stored as: 'campaign_id|telephone|query_id'
-- KEYS[1] - a list
-- ARGV[1] - a query_id
-- return: number of elements removed
local l = tonumber(redis.call('LLEN', KEYS[1]))
local n = 0
while l > 0 do
local curr = redis.call('LINDEX', KEYS[1], -1)
local id = curr:match( '.*|.*|(.*)' )
if id == ARGV[1] then
redis.call('RPOP', KEYS[1])
n = n + 1
else
redis.call('RPOPLPUSH', KEYS[1], KEYS[1])
end
l = l - 1
end
return n
Output:
$ redis-cli LPUSH list "foo|bar|1" "baz|qaz|2" "lua|redis|1"
(integer) 3
$ redis-cli --eval dellistbyqueryid.lua list , 1
(integer) 2
$ redis-cli LRANGE list 0 -1
1) "baz|qaz|2"

How can filter any SET by its concat value according to another SET in Redis

I have a filter optimization problem in Redis.
I have a Redis SET which keeps the doc and pos pairs of a type in a corpus.
example:
smembers type_in_docs.1
result: doc.pos pairs
array (size=216627)
0 => string '2805.2339' (length=9)
1 => string '2410.14208' (length=10)
2 => string '3516.1810' (length=9)
...
Another redis set i create live according to user choices
It contains selected docs.
smembers filteredDocs
I want to filter doc.pos pairs "type_in_docs" set according to user Doc id choices.
In fact if i didnt use concat values in set it was easy with SINTER.
So i implement a php filter code as below.
It works but need an optimization.
In big doc.pairs set too much time need. (Nearly After 150000 members!)
$concordance= $this->redis->smembers('types_in_docs.'.$typeID);
$filteredDocs= $this->redis->smembers('filteredDocs');
$filtered = array_filter($concordance, function($pairs) use ($filteredDocs) {
if( in_array(substr($pairs, 0, strpos($pairs, '.')), $filteredDocs) ) return true;
});
I tried sorted set with scores as docId.
Bu couldnt find a intersect or filter option for score values.
I am thinking and searching a Redis based solution with supported keys, sets or Lua script for time optimization.
But nothing find.
How can i filter Redis sets with concat values?
Thanks for helps.
Your code is slow primarily because you're moving a lot of data from Redis to your PHP filter. The general motivation here should be perform as much filtering as possible on the server. To do that you'd need to pay some sort of price in CPU & RAM.
There are many ways to do this, here's one:
Ensure you're using Redis v2.8.9 or above.
To allow efficiently looking for doc only, keep your doc.pos pairs as is but use Sorted Sets with score = 0, your e.g.:
ZADD type_in_docs.1 0 2805.2339 0 2410.14208 0 3516.1810
This will allow you to mimic SISMEMBER for doc in the set with:
ZRANGEBYLEX type_in_docs.1 [<$typeID> (<$typeID + "\xff">
You can now just SMEMBERS on the (usually) smaller filterDocs set and then call ZRANGEBYLEX on each for immediate gains.
If you want to do better - in extreme cases (i.e. large filterDocs, small type_in_docs) you should do the reverse.
If you want to do even better, use Lua to wrap up the filtering logic - something like:
-- #usage: redis-cli --filter_doc_pos.lua <filter set keyname> <type pairs keyname>
-- #returns: list of matching doc.pos pairs
local r = {}
for _, fv in pairs(redis.call("SMEMBERS", KEYS[1])) do
local t = redis.call("ZRANGEBYLEX", KEYS[2], "[" .. fv , "(" .. fv .. "\xff")
for _, tv in pairs(t) do
r[#r+1] = tv
end
end
return r

Resources