How can I reduce the number of handshakes to a Redis client? - lua

I am writing an application which requires a very fast response time. Some of my queries to Redis require intersections and unions of multiple sets.
An example would be
((A union B) intersection C)
However, with the Java client each such query costs one more network round trip, which increases my response time.
I was wondering whether there is a way to do it in a single handshake. Lua scripting looks like a good option, but I'm not sure how it works internally.

You can also use pipelining to reduce the RTT.
With a pipeline, you send multiple commands to Redis at once and read all the replies later.
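For example, a minimal sketch with the Jedis client (the key names and the tmp:ab scratch key are my placeholders). A pipeline cannot feed one reply into the next command on the client side, so the intermediate union is stored server-side with SUNIONSTORE:
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

public class PipelinedSets {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Pipeline p = jedis.pipelined();
            p.sunionstore("tmp:ab", "A", "B");                     // A union B -> tmp:ab
            Response<Set<String>> inter = p.sinter("tmp:ab", "C"); // (A union B) intersection C
            p.del("tmp:ab");                                       // drop the scratch key
            p.sync();                        // one flush, all replies read together
            System.out.println(inter.get()); // get() is only valid after sync()
        }
    }
}
Note that the Response objects are only filled in once sync() has flushed the pipeline.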

A Redis Lua script is blocking and does everything at once inside the Redis server, so it will certainly avoid the network round trips, which is what you need.
Since it is blocking, you should always be aware of when to use it: if all those unions and intersections take, say, 5 seconds in a production environment, the script blocks every other command waiting to be executed, and those commands may come back with timeout errors, which makes things worse.
So, alternatively, you can make partial Lua script calls for every 50 elements or so (derive an optimal number by trying out different values).
Also, consider using SUNION and SINTER over multiple keys at once instead of only 2 at a time, if it fits, i.e.
sunion set1 set2 set3 ...
instead of
sunionstore temp set1 set2
sunionstore temp temp set3
and so on.
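For example, here is a minimal sketch of the single-round-trip approach with the Jedis client (the key names and the tmp:abc scratch key are my placeholders), where the whole ((A union B) intersection C) runs inside one EVAL:
import java.util.Arrays;
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class EvalSets {
    // Runs atomically inside the server: one round trip for union + intersection.
    private static final String SCRIPT =
        "redis.call('SUNIONSTORE', KEYS[4], KEYS[1], KEYS[2])\n" +
        "local result = redis.call('SINTER', KEYS[4], KEYS[3])\n" +
        "redis.call('DEL', KEYS[4])\n" +
        "return result";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Object reply = jedis.eval(SCRIPT,
                Arrays.asList("A", "B", "C", "tmp:abc"),  // KEYS[1..4]
                Collections.<String>emptyList());         // no ARGV
            System.out.println(reply);                    // a list of the members
        }
    }
}
In practice you would load the script once with SCRIPT LOAD and invoke it by hash with EVALSHA so the script body isn't resent on every call.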
Hope this helps.

Related

How to ensure insert rate 1 insert per second when using ClickhouseIO

I'm using the Apache Beam Java SDK to process events and write them to a ClickHouse database.
Luckily there is a ready-to-use ClickhouseIO.
ClickhouseIO accumulates elements and inserts them in batches, but because of the parallel nature of the pipeline it still results in a lot of inserts per second in my case. I'm frequently receiving "DB::Exception: Too many parts" or "DB::Exception: Too much simultaneous queries" in ClickHouse.
The ClickHouse documentation recommends doing no more than 1 insert per second.
Is there a way I can ensure this with ClickhouseIO?
Maybe some KV grouping before ClickhouseIO.Write or something?
It looks like you're interpreting these errors not quite correctly:
DB::Exception: Too many parts
It means that the insert affects more partitions than allowed (by default this value is 100; it is managed by the parameter max_partitions_per_insert_block).
So either the count of affected partitions is really large or the PARTITION BY key was defined too granularly.
How to fix it:
try to group the INSERT batch in such a way that it contains data related to fewer than 100 partitions
try to reduce the size of the insert block (if it is quite huge) via withMaxInsertBlockSize (see the sketch after this list)
increase the limit max_partitions_per_insert_block in the SQL query (like this: INSERT .. SETTINGS max_partitions_per_insert_block=300; I think ClickhouseIO should have the ability to set custom options on the query level) or on the server side by modifying the user-profile settings
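A hedged sketch of the block-size option with Beam's ClickHouseIO (the JDBC URL, table name, and block size are placeholders, and rows is assumed to be a PCollection<Row> whose schema matches the table):
import org.apache.beam.sdk.io.clickhouse.ClickHouseIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

class ClickHouseWrite {
    static void writeRows(PCollection<Row> rows) {
        rows.apply("WriteToClickHouse",
            ClickHouseIO.<Row>write("jdbc:clickhouse://localhost:8123/default", "events")
                // smaller blocks touch fewer partitions per INSERT
                .withMaxInsertBlockSize(100_000L));
    }
}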
DB::Exception: Too much simultaneous queries
This one is managed by the parameter max_concurrent_queries.
How to fix it:
reduce the count of concurrent queries by Beam means (a sketch of the KV-grouping idea follows below)
increase the limit on the server side in the user-profile or server settings (see https://github.com/ClickHouse/ClickHouse/issues/7765)
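As promised above, a rough sketch of the KV-grouping idea using Beam's standard WithKeys and GroupIntoBatches transforms (the shard count and batch size are made-up numbers to tune; adapt the element type to your pipeline):
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

class ThrottledBatches {
    // A small fixed key space caps how many bundles can write concurrently;
    // GroupIntoBatches then emits bounded batches per key.
    static PCollection<KV<Integer, Iterable<Row>>> batch(PCollection<Row> rows) {
        final int numShards = 4; // made-up value; tune for your cluster
        return rows
            .apply(WithKeys.of((Row r) -> Math.floorMod(r.hashCode(), numShards))
                .withKeyType(TypeDescriptors.integers()))
            .apply(GroupIntoBatches.ofSize(10_000L));
    }
}
Each Iterable<Row> batch still has to be flattened back out (or written by a custom DoFn) before it reaches ClickhouseIO.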

Redis Capped Sorted Set, List, or Queue?

Has anyone implemented a capped data-structure of any kind in Redis? I'm working on building something like a news feed. The feed will wind up being manipulated and read from very frequently, and holding it in a sorted set in Redis would be cheap and perfect for my use case. The only issue is I only ever need n items per feed, and I'm worried about memory overflow, so I'd like to ensure each feed never gets above n items. It seems pretty trivial to make a capped sorted collection in Redis with Lua:
redis-cli EVAL "$(cat update_feed.lua)" 1 feeds:some_feed "thing_to_add" n
Where update_feed.lua looks something like this (untested):
-- the os library is not exposed to Redis scripts; use the TIME command
-- (on Redis < 3.2, pass the timestamp in as an extra ARGV instead)
redis.call('ZADD', KEYS[1], redis.call('TIME')[1], ARGV[1])
local num = redis.call('ZCARD', KEYS[1])
if num > tonumber(ARGV[2]) then
  -- ranks 0 .. -(n+1) are everything below the newest n members
  redis.call('ZREMRANGEBYRANK', KEYS[1], 0, -(tonumber(ARGV[2]) + 1))
end
That's not bad at all, and pretty cheap, but it seems like such a basic thing that it should be doable much more cheaply by instantiating the sorted set with only n buckets to begin with. I can't find a way to do that in Redis, so I guess my question is: did I miss something? And if I didn't, why is there no structure for this in Redis? Even if it just ran the basic Lua script I described, it seems like a typical enough use case that it ought to be offered as an option for Redis data structures.
You can use LTRIM if it is a list.
Excerpt from the documentation.
LPUSH mylist someelement
LTRIM mylist 0 99
This pair of commands will push a new element on the list, while making sure that the list will not grow larger than 100 elements. This is very useful when using Redis to store logs for example. It is important to note that when used in this way LTRIM is an O(1) operation because in the average case just one element is removed from the tail of the list.
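If you issue the pair from a client and want them to travel and apply together, you can wrap them in MULTI/EXEC; a small sketch with the Jedis client (the key name and the cap of 100 come from the excerpt above):
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class CappedList {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Transaction t = jedis.multi(); // queue both commands, run atomically
            t.lpush("mylist", "someelement");
            t.ltrim("mylist", 0, 99);      // keep only the 100 newest entries
            t.exec();
        }
    }
}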
I use sorted sets myself for this. I too thought about using lists, but then I found that manipulating the INSIDE of a list is fairly expensive -- O(n) -- while manipulating the inside of a sorted set is O(log n).
That's what sealed the deal for me--will you ever be manipulating the inside of the set? If so, stick with sorted sets and just flush the oldest whenever you have to, just like you were thinking.
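For completeness, here is the same flush-the-oldest pattern from the client side, sketched with Jedis (the feed key and cap are placeholders). Without the Lua wrapper the two calls are not atomic, so a concurrent writer can briefly push the set above the cap:
import redis.clients.jedis.Jedis;

public class CappedFeed {
    static void push(Jedis jedis, String feed, String item, long cap) {
        // score by wall-clock seconds so rank order follows insertion order
        jedis.zadd(feed, System.currentTimeMillis() / 1000.0, item);
        // ranks 0 .. -(cap+1) are everything except the newest `cap` members
        jedis.zremrangeByRank(feed, 0, -(cap + 1));
    }
}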

Can I profile Lua scripts running in Redis?

I have a cluster app that uses a distributed Redis back-end, with dynamically generated Lua scripts dispatched to the redis instances. The Lua component scripts can get fairly complex and have a significant runtime, and I'd like to be able to profile them to find the hot spots.
SLOWLOG is useful for telling me that my scripts are slow, and exactly how slow they are, but that's not my problem. I know how slow they are, I'd like to figure out which parts of them are slow.
The redis EVAL docs are clear that redis does not export any timekeeping functions to lua, which makes it seem like this might be a lost cause.
So, short a custom fork of Redis, is there any way to tell which parts of my Lua script are slower than others?
EDIT
I took Doug's suggestion and used debug.sethook - here's the hook routine I inserted at the top of my script:
-- clear counters from any previous run
redis.call('del', 'line_sample_count')
local function profile()
  -- level 2 = the code that was running when the hook fired
  local line = debug.getinfo(2)['currentline']
  redis.call('zincrby', 'line_sample_count', 1, line)
end
-- sample the current line every 100 VM instructions
debug.sethook(profile, '', 100)
Then, to see the hottest 10 lines of my script:
ZREVRANGE line_sample_count 0 9 WITHSCORES
If your scripts are processing bound (not I/O bound), then you may be able to use the debug.sethook function with a count hook:
The count hook: is called after the interpreter executes every count instructions. (This event only happens while Lua is executing a Lua function.)
You'll have to build a profiler based on the counts you receive in your callback.
The PepperfishProfiler would be a good place to start. It uses os.clock, which you don't have in Redis's Lua sandbox, but you could just use hook counts for a very crude approximation.
This is also covered in PiL 23.3 – Profiles
In standard Lua, you can't get sub-second timing: the only built-in clock, os.time, returns whole seconds. So there are two options available: you either write your own Lua extension DLL that returns the time in milliseconds, or you do a basic benchmark using a millisecond-resolution time from LuaSocket. You can access the current time, with sub-second resolution, via socket.gettime(). Though this adds a dependency to your project, it's an effective way to do trivial benchmarking.
require "socket"
t = socket.gettime();

What is the difference between "raw" dirty operations, and dirty operations within mnesia:async_transaction

What is the difference between a series of mnesia:dirty_* commands executed within a function passed to mnesia:async_dirty() and those very same operations executed "raw"?
I.e., is there any difference between doing:
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1})
and
F = fun() ->
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1}),
mnesia:dirty_write({table, Rec1})
end,
mnesia:async_dirty(F)
Thanks
Let's first quote the Users' Guide on the async_dirty context:
By passing the same "fun" as argument to the function mnesia:async_dirty(Fun [, Args]) it will be performed in dirty context. The function calls will be mapped to the corresponding dirty functions. This will still involve logging, replication and subscriptions but there will be no locking, local transaction storage or commit protocols involved. Checkpoint retainers will be updated but will be updated "dirty". Thus, they will be updated asynchronously. The functions will wait for the operation to be performed on one node but not the others. If the table resides locally no waiting will occur.
The two options you have provided will be executed the same way. However, when you execute the dirty functions outside a fun, as in option one, each one is a separate call into mnesia. With async_dirty, the 3 calls are bundled, and mnesia only waits until all 3 are completed on the local node before returning. The behavior of these two options may therefore differ in a multi-node mnesia cluster. Do some tests :)

Your advice on a Hadoop MapReduce job

I have 2 files stored on an HDFS filesystem:
tbl_userlog: <website url (non canonical)> <tab> <username> <tab> <timestamp>
example: www.website.com, foobar87, 201101251456
tbl_websites: <website url (canonical)> <tab> <total hits>
example: website.com, 25889
I have written a sequence of Hadoop jobs which joins the 2 files on the website, performs a filter on the amount of total hits > n per website, and then counts for each user the number of websites he has visited which have > n total hits. The details of the sequence are as follows:
A map-only job which canonicalizes the URL in tbl_userlog (i.e. removes www, http:// and https:// from the URL field)
A map-only job which sorts tbl_websites on the URL
An identity map-reduce job which takes the output of the 2 previous jobs as KeyValueTextInput and feeds them to a CompositeInput in order to make use of Hadoop's native joining feature defined with jobConf.set("mapred.join.expr", CompositeInputFormat.compose("inner" (...))
A map-and-reduce job which filters the result of the previous job on total hits > n in its map phase, groups the results on the user in the shuffle phase, and performs the count of the number of websites for each user in the reduce phase.
In order to chain these steps, I just call the jobs sequentially in the described order. Each individual job outputs its results into HDFS which the following job in the chain then retrieves and processes in turn.
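For illustration, a rough sketch of such a sequential driver on the old mapred API (the mapper class and HDFS paths are hypothetical; JobClient.runJob blocks until each job finishes, which is what enforces the ordering):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf step1 = new JobConf(Driver.class);
        step1.setJobName("canonicalize-urls");
        step1.setMapperClass(CanonicalizeMapper.class); // hypothetical mapper
        step1.setNumReduceTasks(0);                     // map-only
        FileInputFormat.setInputPaths(step1, new Path("/data/tbl_userlog"));
        FileOutputFormat.setOutputPath(step1, new Path("/tmp/step1"));
        JobClient.runJob(step1); // blocks; the next job starts only on success

        JobConf step2 = new JobConf(Driver.class);
        step2.setJobName("sort-websites");
        FileInputFormat.setInputPaths(step2, new Path("/data/tbl_websites"));
        FileOutputFormat.setOutputPath(step2, new Path("/tmp/step2"));
        JobClient.runJob(step2);
        // ...and so on for the join job and the final counting job
    }
}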
As I am new to Hadoop, I would like to ask for your advice:
Is there a better way to chain these jobs? In this configuration all intermediate results are written to HDFS and then read back.
Do you see any design flaw in this job, or could it be written more elegantly by making use of some Hadoop feature that I have missed?
I am using Apache Hadoop 0.20.2, and using higher-level frameworks such as Pig or Hive is not possible within the scope of the project.
Thanks in advance for your replies!
I think what you have will work, with a couple of caveats. Before I start listing them, I want to make two definitions clear. A map-only job is a job that has a defined Mapper and runs with 0 reducers. If the job runs with > 0 reducers, even IdentityReducers, then it is not a map-only job. A reduce-only job is a job that has a defined Reducer and runs with an IdentityMapper.
Your first job can be a map-only job, since all you're doing is canonicalizing URLs. But if you want to use CompositeInputFormat, you should run it with an IdentityReducer and more than 0 reducers.
For your second job, I don't know what you mean by a map-only job that sorts. Sorting is by its very nature a reduce-side task. You probably mean that it has a defined Mapper but no Reducer. But in order for the URLs to be sorted, you should run it with an IdentityReducer and more than 0 reducers.
Your third job is an interesting idea, but you have to be careful with CompositeInputFormat. There are two conditions that must be met for you to be able to use this input format. The first is that there has to be the same number of files in both input directories. This can be achieved by setting the same number of reducers for Job1 and Job2. The second condition is that the input files CANNOT be splittable. This can be achieved by using a non-splittable compression such as bzip.
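For reference, a hedged sketch of how that join configuration might look (the paths are hypothetical, and both directories are assumed to meet the caveats above):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

class JoinConfig {
    static void configure(JobConf joinJob) {
        joinJob.setInputFormat(CompositeInputFormat.class);
        // compose() builds the mapred.join.expr string for an inner join
        // over two equally partitioned, sorted inputs
        joinJob.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            new Path("/tmp/step1"), new Path("/tmp/step2")));
    }
}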
This job sounds good. Although you could already filter out websites that have < n hits in the reducer of the previous job and save yourself some I/O.
There's obviously more than one solution to a problem in software, so while your solution would work, I wouldn't recommend it. Having 4 MapReduce jobs for this task is a bit expensive IMHO. The implementation I have in mind is a M-R-R workflow that uses Secondary Sort.
As far as chaining jobs is concerned, you should have a look at Oozie, which is a workflow manager. I have yet to use it, but that's where I'd start.
