I am interested in bench-marking different parts of my program for speed. I having tried using info(statistics) and erlang:now()
I need to know down to the microsecond what the average speed is. I don't know why I am having trouble with a script I wrote.
It should be able to start anywhere and end anywhere. I ran into a problem when I tried starting it on a process that may be running up to four times in parallel.
Is there anyone who already has a solution to this issue?
EDIT:
Willing to give a bounty if someone can provide a script to do it. It needs to spawn though multiple process'. I cannot accept a function like timer.. at least in the implementations I have seen. IT only traverses one process and even then some major editing is necessary for a full test of a full program. Hope I made it clear enough.
Here's how to use eprof, likely the easiest solution for you:
First you need to start it, like most applications out there:
23> eprof:start().
{ok,<0.95.0>}
Eprof supports two profiling mode. You can call it and ask to profile a certain function, but we can't use that because other processes will mess everything up. We need to manually start it profiling and tell it when to stop (this is why you won't have an easy script, by the way).
24> eprof:start_profiling([self()]).
profiling
This tells eprof to profile everything that will be run and spawned from the shell. New processes will be included here. I will run some arbitrary multiprocessing function I have, which spawns about 4 processes communicating with each other for a few seconds:
25> trade_calls:main_ab().
Spawned Carl: <0.99.0>
Spawned Jim: <0.101.0>
<0.100.0>
Jim: asking user <0.99.0> for a trade
Carl: <0.101.0> asked for a trade negotiation
Carl: accepting negotiation
Jim: starting negotiation
... <snip> ...
We can now tell eprof to stop profiling once the function is done running.
26> eprof:stop_profiling().
profiling_stopped
And we want the logs. Eprof will print them to screen by default. You can ask it to also log to a file with eprof:log(File). Then you can tell it to analyze the results. We tell it to collapse the run time from all processes into a single table with the option total (see the manual for more options):
27> eprof:analyze(total).
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- --- ---- [----------]
io:o_request/3 46 0.00 0 [ 0.00]
io:columns/0 2 0.00 0 [ 0.00]
io:columns/1 2 0.00 0 [ 0.00]
io:format/1 4 0.00 0 [ 0.00]
io:format/2 46 0.00 0 [ 0.00]
io:request/2 48 0.00 0 [ 0.00]
...
erlang:atom_to_list/1 5 0.00 0 [ 0.00]
io:format/3 46 16.67 1000 [ 21.74]
erl_eval:bindings/1 4 16.67 1000 [ 250.00]
dict:store_bkt_val/3 400 16.67 1000 [ 2.50]
dict:store/3 114 50.00 3000 [ 26.32]
And you can see that most of the time (50%) is spent in dict:store/3. 16.67% is taken in outputting the result, another 16.67% is taken by erl_eval (this is why you get by running short functions in the shell -- parsing them becomes longer than running them).
You can then start going from there. That's the basics of profiling run times with Erlang. Handle with care, eprof can be quite a load on a production system or for functions that run for too long. Especially on a production system.
You can use eprof or fprof.
The normal way to do this is with timer:tc. Here is a good explanation.
I can recommend you this tool: https://github.com/virtan/eep
You will get something like this https://raw.github.com/virtan/eep/master/doc/sshot1.png as a result.
Step by step instruction for profiling all processes on running system:
On target system:
1> eep:start_file_tracing("file_name"), timer:sleep(20000), eep:stop_tracing().
$ scp -C $PWD/file_name.trace desktop:
On desktop:
1> eep:convert_tracing("file_name").
$ kcachegrind callgrind.out.file_name
Related
I am working on extracting graph embeddings with training GraphSage algorihm. I am working on a large graph consisting of (82,339,589) nodes and (219,521,164) edges. When I checked with ":queries" command the query is listed as running. Algorithm started in 6 days ago. When I look the logs with "docker logs xxx" the last logs listed as
2021-12-01 12:03:16.267+0000 INFO Relationship Store Scan (RelationshipScanCursorBasedScanner): Imported 352,492,468 records and 0 properties from 16247 MiB (17,036,668,320 bytes); took 59.057 s, 5,968,663.57 Relationships/s, 275 MiB/s (288,477,487 bytes/s) (per thread: 1,492,165.89 Relationships/s, 68 MiB/s (72,119,371 bytes/s))
2021-12-01 12:03:16.269+0000 INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING Actual
memory usage of the loaded graph: 8602 MiB
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:64076] ] GraphSageTrain ::
Start
There is a way to see detailed logs about training process. Is it normal for taking 6 days for graphs with shared sizes ?
It is normal for GraphSAGE to take a long time compared to FastRP or Node2Vec. Starting in GDS 1.7, you can use
CALL gds.beta.listProgress(jobId: String)
YIELD
jobId,
taskName,
progress,
progressBar,
status,
timeStarted,
elapsedTime
If you call without passing in a jobId, it will return a list of all running jobs. If you call with a jobId, it will give you details about a running job.
This query will summarize the details for job 03d90ed8-feba-4959-8cd2-cbd691d1da6c.
CALL gds.beta.listProgress("03d90ed8-feba-4959-8cd2-cbd691d1da6c")
YIELD taskName, status
RETURN taskName, status, count(*)
Here's the documentation for progress logging. The system monitoring procedures might also be helpful to you.
I have been experiencing some extremely bad slowdowns in Neo4j, and having spent a few days on the issue now, I still can't figure out why. I'm really hoping someone here can help. I've also tried the neo slack support group already, but to no avail.
My setup is as follows: the back-end is a django app that connects through the official drivers (pip package neo4j-driver==1.5.0) to a dockerized Neo4j Enterprise 3.2.3 instance. The data we write is added in infrequent bursts of around 15 concurrent merges to the same portion of the graph, and is triggered when a user interacts with some part of our product (each interaction causing a separate merge).
Each merge operation is the following query:
MERGE (m :main:entity:person {user: $user, task: $task, type: $type,
text: $text})
ON CREATE SET m.source = $list, m.created = $timestamp, m.task_id=id(m)
ON MATCH SET m.source = CASE
WHEN $source IN m.source THEN m.source
ELSE m.source + $list
END SET m.modified = $timestamp
RETURN m.task_id AS task_id;
A PROFILE of this query looks like this. As you can see, the individual processing time is in the ms range. We have tested running this 100+ times in quick succession with no issues. We have a Node key configured as in this schema.
The running system however seems to seize up and we see execution times for these queries hit as high as 2 minutes! A snapshot of the running queries looks like this.
Does anyone have any clues as to what may be going on?
Further system info:
ls data/databases/graph.db/*store.db* | du -ch | tail -1
249.6M total
find data/databases/graph.db/schema/index -regex '.*/native.*' | du -hc | tail -1
249.6M total
ps
1 root 297:51 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -Xms8G -Xmx8G -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPr
printenv | grep NEO
NEO4J_dbms_memory_pagecache_size=4G
NEO4J_dbms_memory_heap_maxSize=8G
The machine is has 16GB total memory and there is nothing else running on it.
I have an ESP-12F module that I flashed with the current NodeMCU dev-branch firmware. The module is powered by a >2A power supply. I use 4 GPIO's to control the driver of a little stepper motor (this is the combo).
I wrote a little Lua script (partially based on the arduino version described here) in ESPlorer to control the motor, and the program does work, the motor turns accordingly, but it reboots the module when I call the function turn with too many steps. The limit is at around 180 steps, sometimes a little bit higher, sometimes a little bit below that number.
I'm really new to programming this kind of modules and I'm also just learning Lua, can anybody imagine what happens here and how I can avoid the reboots? BTW: I also tried supplying external 5 Volts to the driver board, but it did not change anything.
This is my script:
gpio.mode(5, gpio.OUTPUT)
gpio.mode(6, gpio.OUTPUT)
gpio.mode(7, gpio.OUTPUT)
gpio.mode(0, gpio.OUTPUT)
sg = function (n,v) gpio.write(n, (v == 0 and gpio.LOW or gpio.HIGH)) end
stepRight = function ()
sg(5,0);sg(6,0);sg(7,0);sg(0,1);
sg(5,0);sg(6,0);sg(7,1);sg(0,1);
sg(5,0);sg(6,0);sg(7,1);sg(0,0);
sg(5,0);sg(6,1);sg(7,1);sg(0,0);
sg(5,0);sg(6,1);sg(7,0);sg(0,0);
sg(5,1);sg(6,1);sg(7,0);sg(0,0);
sg(5,1);sg(6,0);sg(7,0);sg(0,0);
sg(5,1);sg(6,0);sg(7,0);sg(0,1);
sg(5,0);sg(6,0);sg(7,0);sg(0,0);
end
turn = function (dir, steps)
if dir == 'right' then
for i=0,steps,1 do
stepRight()
end
end
end
Here are some details about the module and the firmware:
NodeMCU custom build by frightanic.com
branch: dev
commit: c54bc05ba61fe55f0dccc1a1506791ba41f1d31b
SSL: true
modules: adc,cjson,crypto,dht,file,gpio,hmc5883l,http,i2c,l3g4200d,mqtt,net,node,ow,pwm,spi,tmr,tsl2561,uart,wifi
build built on: 2016-11-21 19:02
powered by Lua 5.1.4 on SDK 1.5.4.1(39cb9a32)
This is what it looks like when I call the turn function with a too high value:
turn('right',200)
ets Jan 8 2013,rst cause:2, boot mode:(3,7)
load 0x40100000, len 26144, room 16
tail 0
chksum 0x95
load 0x3ffe8000, len 2288, room 8
tail 8
chksum 0xa8
load 0x3ffe88f0, len 8, room 0
tail 8
chksum 0x66
csum 0x66
����o�r��n|�llll`��r�l�l��
NodeMCU custom build by frightanic.com
branch: dev
commit: c54bc05ba61fe55f0dccc1a1506791ba41f1d31b
SSL: true
modules: adc,cjson,crypto,dht,file,gpio,hmc5883l,http,i2c,l3g4200d,mqtt,net,node,ow,pwm,spi,tmr,tsl2561,uart,wifi
build built on: 2016-11-21 19:02
powered by Lua 5.1.4 on SDK 1.5.4.1(39cb9a32)
lua: cannot open init.lua
>
Update: I found a solution that works, but I can't explain why. Maybe someone can shed some light on this?
I thought that I had to approach the problem by finding out when and how the reboot occurs, so I added a little timer delay to the for loop:
for i=0,steps,1 do
stepRight()
tmr.delay(10)
end
This does not affect the speed of the motor in any noticable way, but now I can easily crank up the numbers as high as I want ;) I can use turn('right',200000) and the reboot is completely gone, it did not reoccur even once, even if I set the delay to only 1 µs. That's great - but I'd love to know why that helps?
You are calling the sg()7,200 times in a single turn function. You have to break your processing up to avoid time-outs. This is just the way that the ESP8266 SDK requires.
Read my FAQ in the documentation for a more detailed discussion.
First I have to mention that I run on a CentOS 7 tuned up to support 1 million connections. I tested with a simple C server and client and I connected 512000 clients. I could have connect more but I did not have enought RAM to spawn more linux client machines, since from a machine I can open 65536 connections; 8 machines * 64000 connections each = 512000.
I made a simple Erlang server to which I want to connect 1 million or half a million clients, using the same C client. The problem I'm having now is memory related. For each successfully gen_tcp:accept call I spawn a process. Around 50000 open connections costs me 3.7 GB RAM on server, meanwhile using the C server I could have open 512000 connections using 1.9 GB RAM. It is true that on the C server I did not created a process after accept to handle stuff, I just called accept again in while loop, but even so... guys on web did this erlang thing with less memory ( ejabberd riak )
I presume that the flags that I pass to the erlang VM should do the trick. From what I read in documentation and on the web this is what I have: erl +K true +Q 64200 +P 134217727 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
This is the server code, I open 1 listener that monitors port 5001 by calling start(1, 5001).
start(Num,LPort) ->
case gen_tcp:listen(LPort,[{reuseaddr, true},{backlog,9000000000}]) of
{ok, ListenSock} ->
start_servers(Num,ListenSock),
{ok, Port} = inet:port(ListenSock),
Port;
{error,Reason} ->
{error,Reason}
end.
start_servers(0,_) ->
ok;
start_servers(Num,LS) ->
spawn(?MODULE,server,[LS,0]),
start_servers(Num-1,LS).
server(LS, Nr) ->
io:format("before accept ~w~n",[Nr]),
case gen_tcp:accept(LS) of
{ok,S} ->
io:format("after accept ~w~n",[Nr]),
spawn(ex,server,[LS,Nr+1]),
proc_lib:hibernate(?MODULE, loop, [S]);
Other ->
io:format("accept returned ~w - goodbye!~n",[Other]),
ok
end.
loop(S) ->
ok = inet:setopts(S,[{active,once}]),
receive
{tcp,S, _Data} ->
Answer = 1, % Not implemented in this example
gen_tcp:send(S,Answer),
proc_lib:hibernate(?MODULE, loop, [S]);
{tcp_closed,S} ->
io:format("Socket ~w closed [~w]~n",[S,self()]),
ok
end.
Given this configuration your my beam consumed about 2.5 GB of memory just on start without even your module loaded.
However, if you reduce maximum number of processes to the reasonable value, like +P 60000 for 50 000 connections test, memory consumption drops rapidly.
With 60 000 processes limit VM only used 527MB of virtual memory on start.
I've tried to reproduce your test, but unfortunately I was only able to launch 30 000 netcat's on my system before running out of memory (because of client jobs). However I only observed increase of VM memory consumption up to 570MB.
So my suggestion is that your numbers come from high startup memory consumption and not great number of opened connections. Even then you actually should pay attention to the stats change along with increasing number of opened connections and not absolute values.
I finally used the following configuration for my benchmark:
erl +K true +Q 64200 +P 60000 -env ERL_MAX_PORTS 40960000 -env ERTS_MAX_PORTS 40960000 +a 16 +hms 1024 +hmbs 1024
So I've launched clients with the command
for i in `seq 1 50000`; do nc 127.0.0.1 5001 & done
Apart from tunes you already made you can adjust tcp buffers as well. By default they take OS default values, but you can pass {recbuf, Size}and {sndbuf, Size} to gen_tcp:listen. It may reduce memory footprints significantly.
I have several ~50 GB text files that I need to parse for specific contents. My files contents are organized in 4 line blocks. To perform this analysis I read in subsections of the file using file.read(chunk_size) and split into blocks of 4 then analyze them.
Because I run this script often, I've been optimizing and have tried varying the chunk size. I run 64 bit 2.7.1 python on OSX Lion on a computer with 16 GB RAM and I noticed that when I load chunks >= 2^31, instead of the expected text, I get large amounts of /x00 repeated. This continues as far as my testing has shown all the way to, and including 2^32, after which I once again get text. However, it seems that it's only returning as many characters as bytes have been added to the buffer above 4 GB.
My test code:
for i in range((2**31)-3, (2**31)+3)+range((2**32)-3, (2**32)+10):
with open('mybigtextfile.txt', 'rU') as inf:
print '%s\t%r'%(i, inf.read(i)[0:10])
My output:
2147483645 '#HWI-ST550'
2147483646 '#HWI-ST550'
2147483647 '#HWI-ST550'
2147483648 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
2147483649 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
2147483650 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967293 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967294 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967295 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967296 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967297 '#\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967298 '#H\x00\x00\x00\x00\x00\x00\x00\x00'
4294967299 '#HW\x00\x00\x00\x00\x00\x00\x00'
4294967300 '#HWI\x00\x00\x00\x00\x00\x00'
4294967301 '#HWI-\x00\x00\x00\x00\x00'
4294967302 '#HWI-S\x00\x00\x00\x00'
4294967303 '#HWI-ST\x00\x00\x00'
4294967304 '#HWI-ST5\x00\x00'
4294967305 '#HWI-ST55\x00'
What exactly is going on?
Yes, this is the known issue according to the comment in cpython's source code. You can check it in Modules/_io/fileio.c. And the code add a workaround on Microsoft windows 64bit only.