Why does Iroha consensus take seconds? - hyperledger

I'm doing some scalability testing with Hyperledger Iroha using Docker containers. I increase the number of nodes in the network step by step, write some transactions into the ledger, and determine the average latency of transaction processing.
The problem is that at around 30 nodes the consensus seems to stop working properly, i.e. it takes seconds until a transaction is committed.
I have already tried varying some configuration parameters like vote_delay, but this does not change Iroha's behavior.
This is my configuration for the Iroha nodes:
{
  "block_store_path": "/tmp/block_store/",
  "torii_port": 50051,
  "internal_port": 10001,
  "pg_opt": "host=some-postgres port=5432 user=postgres password=password1234",
  "max_proposal_size": 50,
  "proposal_delay": 5000,
  "vote_delay": 5000,
  "mst_enable": false,
  "mst_expiration_time": 1440,
  "max_rounds_delay": 50,
  "stale_stream_max_rounds": 2
}
This sometimes leads to transaction processing times of around 10 seconds: https://gist.github.com/dltuser12/913e036efd735b2996d387b1423096c9 (Iroha log file for a corresponding example).

Related

Neo4j GraphSage training does not log anything

I am extracting graph embeddings by training the GraphSAGE algorithm. I am working on a large graph consisting of 82,339,589 nodes and 219,521,164 edges. When I check with the ":queries" command, the query is listed as running. The algorithm started 6 days ago. When I look at the logs with "docker logs xxx", the last entries are:
2021-12-01 12:03:16.267+0000 INFO Relationship Store Scan (RelationshipScanCursorBasedScanner): Imported 352,492,468 records and 0 properties from 16247 MiB (17,036,668,320 bytes); took 59.057 s, 5,968,663.57 Relationships/s, 275 MiB/s (288,477,487 bytes/s) (per thread: 1,492,165.89 Relationships/s, 68 MiB/s (72,119,371 bytes/s))
2021-12-01 12:03:16.269+0000 INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING Actual memory usage of the loaded graph: 8602 MiB
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:64076] ] GraphSageTrain :: Start
Is there a way to see detailed logs about the training process? Is it normal for training to take 6 days on a graph of this size?
It is normal for GraphSAGE to take a long time compared to FastRP or Node2Vec. Starting in GDS 1.7, you can use
CALL gds.beta.listProgress(jobId: String)
YIELD
    jobId,
    taskName,
    progress,
    progressBar,
    status,
    timeStarted,
    elapsedTime
If you call it without passing in a jobId, it will return a list of all running jobs. If you call it with a jobId, it will give you details about that running job.
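For example, a minimal sketch of the no-argument form, listing every running job:
CALL gds.beta.listProgress()
YIELD jobId, taskName, progress
RETURN jobId, taskName, progress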
This query will summarize the details for job 03d90ed8-feba-4959-8cd2-cbd691d1da6c.
CALL gds.beta.listProgress("03d90ed8-feba-4959-8cd2-cbd691d1da6c")
YIELD taskName, status
RETURN taskName, status, count(*)
Here's the documentation for progress logging. The system monitoring procedures might also be helpful to you.

Measure the duration of X requests while using k6

I would like to use k6 in order to measure the time it takes an API to process 1,000,000 requests (in total).
Scenario
Execute 1,000,000 GET requests (1 million in total) with 50 concurrent users/threads, so every user/thread executes 20,000 requests.
I've managed to create such a scenario with Artillery.io, but I'm not sure how to create the same one using k6. Could you point me in the right direction? (Most examples use a pre-defined duration, but in this case I don't know the duration; that is exactly what I want to measure.)
Artillery yml
config:
  target: 'https://localhost:44000'
  phases:
    - duration: 1
      arrivalRate: 50
scenarios:
  - flow:
      - loop:
          - get:
              url: "/api/Test"
        count: 20000
K6 js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
    iterations: 1000000,
    vus: 50
};

export default function () {
    let res = http.get('https://localhost:44000/api/Test');
    check(res, { 'success': (r) => r.status === 200 });
}
The iterations + vus you've specified in your k6 script options result in a shared-iterations executor, where VUs "steal" iterations from the common pile of 1,000,000 iterations. So the faster VUs will complete slightly more than 20,000 requests and the slower ones slightly fewer, but overall you'd still get 1 million requests. And if you want to see how quickly you can complete 1 million requests, that's arguably the better way to go about it, as in the explicit scenario sketched below.
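For reference, a minimal sketch of the same shared pool written as an explicit scenario (the scenario name here is made up):
export let options = {
    discardResponseBodies: true,
    scenarios: {
        million_hits_shared: {
            executor: 'shared-iterations',
            vus: 50,
            iterations: 1000000, // one common pile, shared by all 50 VUs
            maxDuration: '2h',   // the default is only 10 minutes
        },
    },
};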
However, if having exactly 20k requests per VU is a strict requirement, you can easily do that with the aptly named per-vu-iterations executor:
export let options = {
    discardResponseBodies: true,
    scenarios: {
        'million_hits': {
            executor: 'per-vu-iterations',
            vus: 50,
            iterations: 20000,
            maxDuration: '2h',
        },
    },
};
In any case, I strongly suggest setting maxDuration to a high value, since the default is only 10 minutes for either executor. And discardResponseBodies will probably help with performance if you don't care about the response body contents.
By the way, you can also do in k6 what you've done in Artillery: have 50 VUs start a single iteration each and just loop the http.get() call 20,000 times inside that one iteration, as sketched below. You won't get a very nice UX that way (the k6 progress bars will be frozen until the very end, since k6 has no idea of your actual progress inside each iteration), but it will also work.
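A minimal sketch of that looped approach (same endpoint as in the question; the scenario name is made up):
import http from 'k6/http';

export let options = {
    scenarios: {
        million_hits_looped: {
            executor: 'per-vu-iterations',
            vus: 50,
            iterations: 1,     // one long iteration per VU
            maxDuration: '2h',
        },
    },
};

export default function () {
    // each of the 50 VUs fires its 20,000 requests inside a single iteration
    for (let i = 0; i < 20000; i++) {
        http.get('https://localhost:44000/api/Test');
    }
}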

k6: how to manage the rps limit at each stage while increasing the number of VUs

I have a question about a basic term for which I did not find a detailed explanation. Setup: the k6 framework v0.25.1, HTTP requests.
Question #1: what is the implementation of a VU (virtual user) from the perspective of:
1) the client side;
2) the server side;
3) client-server interaction?
What should I read about the subtleties of what a VU is, in particular within k6?
So far I have found out that each VU occupies one network port on both the client and server sides.
Load profiles:
1) rps: 1; vus: 1; duration of N minutes. I see in Grafana that the increase in the number of requests is really minimal, about +1 rps. Everything is fine;
2) rps: 1; vus: 1..1000, ramped up over N minutes via the target option in the stages. I see that the load increased by about +100 rps at peak, although according to the k6 documentation the rps option is "The maximum number of requests to make per second, in total across all VUs". That is, instead of ~+100 rps I expected to see a load of ~1 rps, by analogy with experiment #1. So this is either a k6 bug, where the rps limit incorrectly fails to account for requests across all VU threads, or hidden but intended behavior whereby some rps is required for each VU to exist.
Note: I set an arbitrary timeout at the beginning and end of the scenario to achieve an even load distribution.
Question #2: what could be the cause of this surprising growth in rps, exceeding the rps limit, when vus is increased?
Example:
import http from "k6/http";

export let options = {
    stages: [
        { duration: "1m", target: 1, rps: 1 },
        { duration: "1m", target: 200, rps: 1 },
        { duration: "1m", target: 500, rps: 1 },
        { duration: "1m", target: 1000, rps: 1 },
        { duration: "1m", target: 500, rps: 1 },
        { duration: "1m", target: 200, rps: 1 },
        { duration: "1m", target: 1, rps: 1 },
    ]
};

export default function () {
    http.get("https://httpbin.test.loadimpact.com/get");
    console.log("request made by VU " + __VU);
}
Virtual User, or VU, is a k6-specific definition and implementation. A VU is the entity that executes your script and makes one or more HTTP requests to your server.
If you are testing a web server, you can think of a VU as the same thing as a real user.
If you are testing an API, VUs can produce more requests per second (RPS) to the server than there are VUs. For example, you can define 5 VUs, but each one can produce 10 requests per second. That is why, when your VUs increase, you can reach the RPS limit very quickly.
You can read more details about VU definition at this link.
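One thing to check in the example above: in k6 of that vintage, rps is documented as a top-level option ("in total across all VUs"), not a per-stage field, so the profile would normally be written with rps set once. A minimal sketch, using the same test URL:
import http from "k6/http";

export let options = {
    rps: 1, // global cap, shared across all VUs
    stages: [
        { duration: "1m", target: 1 },
        { duration: "1m", target: 1000 },
        { duration: "1m", target: 1 },
    ],
};

export default function () {
    http.get("https://httpbin.test.loadimpact.com/get");
}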

Getting connections stats for circuits on Spring cloud zuul

I am running a few microservice instances that function as edge routers and have the @EnableZuulProxy annotation. I have written a number of filters, and these control the flow of requests into the system.
What I would like to do is get the circuit stats from what is going on under the covers. I see that there is an underlying Netflix class, DynamicServerListLoadBalancer, that has some of the stats I would like to see. Is it possible to get an instance of it and, at a specific time, get the stats from it?
I can see it has stuff like this (I formatted a log statement that I saw in my logs):
c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client authserver initialized:
DynamicServerListLoadBalancer:{
NFLoadBalancer:
name=authserver,current
list of Servers=[127.0.0.1:9999],
Load balancer stats=
Zone stats: {
defaultzone=[
Zone:defaultzone;
Instance count:1;
Active connections count: 0;
Circuit breaker tripped count: 0;
Active connections per server: 0.0;]
},
Server stats:
[[
Server:127.0.0.1:9999;
Zone:defaultZone;
Total Requests:0;
Successive connection failure:0;
Total blackout seconds:0;
Last connection made:Wed Dec 31 19:00:00 EST 1969;
First connection made: Wed Dec 31 19:00:00 EST 1969;
Active Connections:0;
total failure count in last (1000) msecs:0;
average resp time:0.0;
90 percentile resp time:0.0;
95 percentile resp time:0.0;
min resp time:0.0;
max resp time:0.0;
stddev resp time:0.0
]]
}
ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@5b1b78aa
All of this would be valuable to get and act on. Mostly the acting would be to feed usage heuristics back to the system.
Ok, like most of these things, I figured it out myself.
So here you go.
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandMetrics;
import com.netflix.hystrix.HystrixCommandProperties;

// Look up the metrics Hystrix keeps for a given command key
HystrixCommandKey hystrixCommandKey = HystrixCommandKey.Factory.asKey("what you are looking for");
HystrixCommandMetrics hystrixCommandMetrics = HystrixCommandMetrics.getInstance(hystrixCommandKey);
HystrixCommandProperties properties = hystrixCommandMetrics.getProperties();

// Pull out the limits and the live counters
long maxConnections = properties.executionIsolationSemaphoreMaxConcurrentRequests().get().longValue();
boolean circuitOpen = properties.circuitBreakerForceOpen().get().booleanValue(); // whether the breaker has been forced open
int currentConnections = hystrixCommandMetrics.getCurrentConcurrentExecutionCount();
So in this example, "what you are looking for" is the Hystrix command key that you are looking for.
This gets you the properties of the particular Hystrix command.
From this you pull out the max connections, the current connections, and whether the circuit has been forced open.
So there you are.

Erlang and Redis: read performance

I suddenly encountered performance problems when trying to read 1M records from a Redis sorted set. I used ZSCAN with a cursor and a batch size of 5K.
The code was executed using Erlang R14 on the same machine that hosts Redis. Receiving a batch of 5K elements takes nearly 1 second. Unfortunately, I failed to compile Erlang R16 on this machine, but I think that does not matter.
For comparison, Node.js code with node_redis (hiredis parser) does 1M in 2 seconds. Same results for Python and PHP.
Am I doing something wrong?
Thanks in advance.
Here is my Erlang code:
-module(redis_bench).
-export([run/0]).

-define(COUNT, 5000).

run() ->
    {_, Conn} = connect_to_redis(),
    read_from_redis(Conn).

connect_to_redis() ->
    eredis:start_link("host", 6379, 0, "pass").

%% eredis returns the cursor as a binary, so match <<"0">> to stop
read_from_redis(_Conn, <<"0">>) ->
    ok;
read_from_redis(Conn, Cursor) ->
    {ok, [Cursor1 | _]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Conn, Cursor1).

read_from_redis(Conn) ->
    {ok, [Cursor | _]} = eredis:q(Conn, ["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Conn, Cursor).
Nine times out of ten, slowness like this is the result of a badly written driver more than of the system itself. In this case, the ability to pipeline requests to Redis is going to be important. A client like redo can do pipelining and may be faster.
Also, beware of measuring only one process/thread. If you want fast concurrent access, it is often balanced against fast sequential access.
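For what it's worth, eredis itself also exposes a pipeline call, eredis:qp/2. A minimal sketch of the idea against the sorted set from the question (ZRANGE offsets, unlike ZSCAN cursors, do not depend on the previous reply, so many reads can share round trips):
%% Issue all ZRANGE batches in one pipeline instead of waiting
%% for each reply before sending the next command.
read_pipelined(Conn) ->
    Batches = [["ZRANGE", "if:push:sset:test",
                integer_to_list(Off), integer_to_list(Off + 4999)]
               || Off <- lists:seq(0, 999995, 5000)],
    eredis:qp(Conn, Batches).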
Switching to redis-erl decreased the read time for 1M keys to 16 seconds. Not fast, but acceptable.
Here is new code:
-module(redis_bench2).
-export([run/0]).

-define(COUNT, 200000).

run() ->
    io:format("Start~n"),
    redis:connect([{ip, "host"}, {port, 6379}, {db, 0}, {pass, "pass"}]),
    read_from_redis().

read_from_redis(<<"0">>) ->
    ok;
read_from_redis(Cursor) ->
    [{ok, Cursor1} | _] = redis:q(["ZSCAN", "if:push:sset:test", Cursor, "COUNT", ?COUNT]),
    io:format("Batch~n"),
    read_from_redis(Cursor1).

read_from_redis() ->
    [{ok, Cursor} | _] = redis:q(["ZSCAN", "if:push:sset:test", 0, "COUNT", ?COUNT]),
    read_from_redis(Cursor).
