How to model a supervisor tree in Elixir

Let's say that I have a Zone dynamic supervisor (Zone is a GenServer), and each Zone has its own Player dynamic supervisor (Player is a GenServer).
So each Zone has many players, and I have many zones.
Is the way to do that just to store the PID of the Player supervisor in the Zone GenServer?
Is this the correct approach? And then, when I start a Zone, do I start a Player supervisor as well?
This is purely conceptual and I am new to doing this sort of thing. I would appreciate any learning resources on this as well!

just to store the PID of the PlayerSupervisor in the ZoneGenServer?
This would not be robust enough if the PlayerSupervisor crashes for some reason. One way would be to make ZoneGenServer trap exits from its PlayerSupervisor and crash whenever the PlayerSupervisor crashes, but that would mean re-implementing a part of OTP that is already provided. I would go with the following (ZoneSupervisor is started with the :rest_for_one strategy, all others with :one_for_one):
          +----------------+
          | ZoneSupervisor |
          +----------------+
             ⇓          ⇓
+------------------+   +---------------+
| PlayerSupervisor |   | ZoneGenServer |
+------------------+   +---------------+
         ⇓
+------------------+
| PlayerGenServer  |
+------------------+
Now that we are safe against crashes, the only remaining thing is to make ZoneGenServer aware of PlayerSupervisor. That can be done by asking ZoneSupervisor for its children and/or by name registration with {:via, module, term}. Using a PID as a process handle is vulnerable to process restarts (after a crash the restarted process gets a new PID), whereas a registered name stays the same.
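A minimal sketch of that layout in Elixir, assuming a Registry named GameRegistry is started elsewhere in the application (ZoneGenServer and PlayerGenServer are the placeholder module names from the diagram):

defmodule ZoneSupervisor do
  use Supervisor

  def start_link(zone_id) do
    Supervisor.start_link(__MODULE__, zone_id)
  end

  @impl true
  def init(zone_id) do
    children = [
      # The per-zone player supervisor, registered under a name derived from
      # the zone id so ZoneGenServer can reach it without caching a PID.
      {DynamicSupervisor, name: player_sup(zone_id), strategy: :one_for_one},
      {ZoneGenServer, zone_id}
    ]

    # :rest_for_one - if the player supervisor crashes, ZoneGenServer is
    # restarted as well, so it never holds a reference to a dead supervisor.
    Supervisor.init(children, strategy: :rest_for_one)
  end

  def player_sup(zone_id), do: {:via, Registry, {GameRegistry, {:player_sup, zone_id}}}
end

Starting a player in a given zone then looks roughly like:
DynamicSupervisor.start_child(ZoneSupervisor.player_sup(zone_id), {PlayerGenServer, player_args})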


Implicit vs explicit garbage collection

Background:
I have created an API and I profiled memory usage and processing time for each web service using this guide and the memory_profiler gem.
I created a table to keep all profiling results like this:
Request        | Memory Usage (bytes) | Retained Memory (bytes) | Runtime (seconds)
---------------|----------------------|-------------------------|------------------
Post Login     |               444318 |                   35649 |             1.254
Post Register  |               232071 |                   32673 |             0.611
Get 10 Users   |             11947286 |                 2670333 |             3.456
Get User By ID |               834953 |                  131300 |             0.834
Note: all numbers are averages over 3 consecutive calls to the same service.
I have read many guides and answers (like this and this) saying that the garbage collector is responsible for releasing memory and that we should not explicitly interfere in memory management.
Then I forced the garbage collector to start after each action (for test purposes) by adding the following filter in APIController:
after_action :dispose

def dispose
  GC.start
end
I found that memory usage dropped dramatically (by more than 70%), retained memory stayed roughly the same, and runtime decreased as well.
Request        | Memory Usage (bytes) | Retained Memory (bytes) | Runtime (seconds)
---------------|----------------------|-------------------------|------------------
Post Login     |                38636 |                   34628 |             1.023
Post Register  |                37746 |                   31522 |             0.583
Get 10 Users   |              2673040 |                 2669032 |             2.254
Get User By ID |               132281 |                  128913 |             0.782
Questions:
Is it good practice to add such a filter, and what are the side effects?
I expected the runtime to be higher than before, but it seems lower; what could be the reason?
Update:
I'm using Ruby 2.3.0, and I've used the gc_tracer gem to monitor heap status because I was afraid of hitting the old garbage collection issues highlighted for Ruby 2.0 & 2.1:
The issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory
Then, to do a stress test, I ran the following:
while true
  "a" * (1024 ** 2)
end
and the result is that memory usage does not exceed the following limits (previously it would exceed them without GC being triggered):
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT_MAX
So now I'm pretty sure that the GC issues of 2.0 & 2.1 no longer exist in 2.3, but I'm still getting the following positive results from the filter mentioned above (after_action :dispose):
Heap memory improved by 5% to 10% (check this related question)
Overall execution time improved by 20% to 40% (tested using Postman, a third-party tool that consumes my API)
I'm still looking for answers to my two questions above.
Any feedback would be greatly appreciated
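One way to see exactly what the forced collection does on each request is to log a few GC.stat counters around the call. A minimal sketch (the keys are standard MRI GC.stat fields; the logger call assumes Rails):

GC_STAT_KEYS = [:heap_live_slots, :old_objects, :minor_gc_count, :major_gc_count]

after_action :dispose

def dispose
  before = GC_STAT_KEYS.map { |k| [k, GC.stat[k]] }.to_h
  GC.start
  after  = GC_STAT_KEYS.map { |k| [k, GC.stat[k]] }.to_h
  Rails.logger.info("forced GC: #{before} -> #{after}")
end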

Neo4j slow concurrent merges

I have been experiencing some extremely bad slowdowns in Neo4j, and having spent a few days on the issue now, I still can't figure out why. I'm really hoping someone here can help. I've also tried the Neo4j Slack support group, but to no avail.
My setup is as follows: the back end is a Django app that connects through the official driver (pip package neo4j-driver==1.5.0) to a dockerized Neo4j Enterprise 3.2.3 instance. The data we write is added in infrequent bursts of around 15 concurrent merges to the same portion of the graph, triggered when a user interacts with some part of our product (each interaction causes a separate merge).
Each merge operation is the following query:
MERGE (m:main:entity:person {user: $user, task: $task, type: $type, text: $text})
  ON CREATE SET m.source = $list, m.created = $timestamp, m.task_id = id(m)
  ON MATCH SET m.source = CASE
                            WHEN $source IN m.source THEN m.source
                            ELSE m.source + $list
                          END
               SET m.modified = $timestamp
RETURN m.task_id AS task_id;
A PROFILE of this query looks like this. As you can see, the individual processing time is in the ms range. We have tested running this 100+ times in quick succession with no issues. We have a Node key configured as in this schema.
The running system, however, seems to seize up, and we see execution times for these queries go as high as 2 minutes! A snapshot of the running queries looks like this.
Does anyone have any clues as to what may be going on?
Further system info:
ls data/databases/graph.db/*store.db* | du -ch | tail -1
249.6M total
find data/databases/graph.db/schema/index -regex '.*/native.*' | du -hc | tail -1
249.6M total
ps
1 root 297:51 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -Xms8G -Xmx8G -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPr
printenv | grep NEO
NEO4J_dbms_memory_pagecache_size=4G
NEO4J_dbms_memory_heap_maxSize=8G
The machine has 16 GB of memory in total, and there is nothing else running on it.

Why do eVars register so many Nones?

I have some eVars that are registering a huge number of Nones.
These variables are set at the same time a corresponding traffic variable is set, for instance:
s.prop20 = someValue;
s.eVar20 = someValue;
When I check in Omniture, I see something like:
prop20
someValue1 | 12
someValue2 | 9
someValue3 | 5
.......
eVar20
None | 1987
someValue1 | 12
someValue2 | 9
someValue3 | 5
.......
I'm very confused about that. One hypothesis of mine is that eVar20 is recorded even when it is not set; is there a way to avoid these Nones?
Thanks
It has everything to do with how attribution and expiration work differently between eVars and props.
Props do not have any attribution/expiration options. They are designed as 'traffic variables' and therefore only relate to the hit of data they were defined on.
eVars have flexible options for attribution (most recent, linear, original) and expiration (visit, 30 days etc).
So what you're seeing is default reporting behaviour: the 'None' value simply means that there was no active eVar value at the time the metric you're looking at was recorded.
Let me know if any of this is unclear.

How does an Erlang gen_server start_link a gen_server on another node?

I have an Erlang application that is getting a little too resource-hungry to stay on one node. I'm in the process of making gen_servers move from one node to another - which turns out to be relatively easy. I'm at the last hurdle: getting the factory process that creates these gen_servers to spawn them on the remote node instead of the local one. The default behavior of start_link is clearly to start locally only, and I don't see any option to change that.
It would seem that I'm going to have to be inventive with the solution and wanted to see if anyone out there had already implemented something like this with any success. IOW, what's the recommended solution?
EDIT
I'm looking at the chain of calls that are triggered by calling:
gen_server:start_link(?Module, Args, [])
gen_server:start_link/3:
start_link(Mod, Args, Options) ->
    gen:start(?MODULE, link, Mod, Args, Options).
gen:start/5:
start(GenMod, LinkP, Mod, Args, Options) ->
    do_spawn(GenMod, LinkP, Mod, Args, Options).
gen:do_spawn/5:
do_spawn(GenMod, link, Mod, Args, Options) ->
    Time = timeout(Options),
    proc_lib:start_link(?MODULE, init_it,
                        [GenMod, self(), self(), Mod, Args, Options],
                        Time,
                        spawn_opts(Options));
proc_lib:start_link/5:
start_link(M, F, A, Timeout, SpawnOpts) when is_atom(M), is_atom(F), is_list(A) ->
    Pid = ?MODULE:spawn_opt(M, F, A, ensure_link(SpawnOpts)),
    sync_wait(Pid, Timeout).
Which finally gets us to the interesting bit. There is a spawn_opt/4 that matches:
spawn_opt(M, F, A, Opts) when is_atom(M), is_atom(F), is_list(A) ->
    ...
    ...
BUT, there is one that would actually be useful to me:
spawn_opt(Node, M, F, A, Opts) when is_atom(M), is_atom(F), is_list(A) ->
    ...
    ...
It boggles my mind that this isn't exposed. I realize that there is a risk that a careless programmer might try to gen_server:start_link a process on an Erlang node that happens to be running on Mars, blocking the call for half an hour, but surely that's the programmer's lookout. Am I really stuck with modifying OTP or writing some sort of ad-hoc solution?
We don't start_link a server on the remote node directly. For good program structure and simplicity, we start a separate application on the remote node and delegate the creation of remote processes to a process running in that remote application.
Since linking to a process is mainly for the purpose of supervising or monitoring it, we prefer doing the linking with local supervisors instead of remote processes. If you need the aliveness status of any remote process, I recommend erlang:monitor and erlang:demonitor.
A typical distributed set-up:
Node1
+---------------+                              Node2
|     App1      |                              +---------------+
|  Supervisor1  |    Proc Creation Request     |     App2      |
|   Processes   | ---------------------------> |  Supervisor2  |
|    ......     |                              |       |       |
|    ......     |                              |       |  Create Children
|    ......     |           Monitor            |       v       |
|    ......     | ---------------------------> |   Processes   |
+---------------+                              |    ......     |
                                               +---------------+
Maybe the rpc module helps you, especially the function async_call.
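For example, a minimal sketch along those lines (remote_child_sup and its start_child/1 wrapper around supervisor:start_child/2 are placeholders for whatever runs inside App2 on the remote node):

start_remote_child(Node, Args) ->
    %% Ask the supervisor already running on the remote node to do the spawn,
    %% then monitor the child from here instead of linking to it.
    case rpc:call(Node, remote_child_sup, start_child, [Args]) of
        {ok, Pid} ->
            MRef = erlang:monitor(process, Pid),  % monitors work across nodes
            {ok, Pid, MRef};
        {badrpc, Reason} ->
            {error, {badrpc, Reason}};
        Other ->
            Other
    end.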

Is there a reason why arrays in memory 'go' down while the function stack usually 'goes' up?

Though the actual implementation is platform-specific, this layout is the cause of potentially dangerous buffer overflows. For example,
-------------
|  arr[0]   |  \
-------------   \
|  arr[1]   |    >  arr[3] is local to a function
-------------   /
|  arr[2]   |  /
-------------
| frame ptr |
-------------
|  ret val  |
-------------
| ret addr  |
-------------
|   args    |
-------------
My question is, is there a reason why the local array, for lack of a better verb, flows down? Instead, if the array were to flow up, wouldn't that significantly reduce the number of buffer overflow errors that overwrite the return address?
Granted, by using threads, one could overwrite the return address of a function that the current one has called, but let's ignore that for now.
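For concreteness, the classic case the diagram describes looks roughly like this (a sketch only; the exact layout is compiler- and platform-specific, and modern compilers add mitigations such as stack canaries):

#include <string.h>

void vulnerable(const char *input) {
    char buf[16];        /* buf[0] sits at the lowest address of the buffer */
    strcpy(buf, input);  /* copying runs toward higher addresses, so an
                            overlong input walks over the saved frame pointer
                            and return address stored above buf */
}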
The array on the stack works just like an array on the heap, i.e. its index increases as the memory address increases.
The stack grows downwards (towards lower addresses) instead of upwards, which is why the array goes in the opposite direction of the stack. There are historical reasons for that, probably dating from the time when code, heap and stack resided in the same memory area, so the heap and the stack grew from opposite ends of the memory.
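A quick way to see both directions on a typical platform (a sketch; the addresses, and even the directions, are implementation-defined):

#include <stdio.h>

static void frame(int depth) {
    int marker;
    printf("frame %d lives at %p\n", depth, (void *)&marker);
    if (depth < 2)
        frame(depth + 1);   /* deeper frames usually land at lower addresses */
}

int main(void) {
    int arr[3];
    for (int i = 0; i < 3; i++)
        printf("&arr[%d] = %p\n", i, (void *)&arr[i]);  /* addresses increase */
    frame(0);
    return 0;
}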
I can't cite a source for this, but I believe it's so you can step through memory. Consider while *p++ or something along those lines.
Now, you could just as easily say while *p-- but I guess if they had a choice, they'd rather overwrite someone else's data than their own return value :) Talk about a 'greedy algorithm' (har har)
To pass a subarray you usually pass just a pointer to it. Any indexing operation would then need to know the size of the array, unless you'd like to make all of memory index backwards; but if you did, you'd just end up in the same situation :P

Resources