High Aerospike latency - lua

In the Aerospike set we have four bins (userId, adId, timestamp, eventype), and the primary key is userId:timestamp. A secondary index is created on userId to fetch all the records for a particular user, and the resulting records are passed to a stream UDF. On the client side, up to 500 QPS the Aerospike query latency is reasonable and the mean latency is in microseconds, but as soon as we push the QPS above 500, the query latency shoots up to around 10 ms.
The message we see on the client side is shown below:
Name: Aerospike-13780
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@41aa29a4
Total blocked: 0 Total waited: 554,450
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
com.aerospike.client.lua.LuaInputStream.read(LuaInputStream.java:38)
com.aerospike.client.lua.LuaStreamLib$read.call(LuaStreamLib.java:60)
org.luaj.vm2.LuaClosure.execute(Unknown Source)
org.luaj.vm2.LuaClosure.onInvoke(Unknown Source)
org.luaj.vm2.LuaClosure.invoke(Unknown Source)
org.luaj.vm2.LuaClosure.execute(Unknown Source)
org.luaj.vm2.LuaClosure.onInvoke(Unknown Source)
org.luaj.vm2.LuaClosure.invoke(Unknown Source)
org.luaj.vm2.LuaValue.invoke(Unknown Source)
com.aerospike.client.lua.LuaInstance.call(LuaInstance.java:128)
com.aerospike.client.query.QueryAggregateExecutor.runThreads(QueryAggregateExecutor.java:104)
com.aerospike.client.query.QueryAggregateExecutor.run(QueryAggregateExecutor.java:77)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Below is the Lua file:
function ad_count(stream)
  -- keep only the bins we need from each record
  local function map_function(record)
    local result = map()
    result["adId"] = record["adId"]
    result["timestamp"] = record["timestamp"]
    return result
  end

  -- group timestamps by adId: aggregate is a map of adId -> list of timestamps
  local function add_fn(aggregate, record)
    local ad_id = record["adId"]
    local map_result = aggregate[ad_id]
    if map_result == nil then
      map_result = list()
    end
    list.append(map_result, record["timestamp"])
    aggregate[ad_id] = map_result
    return aggregate
  end

  local m = map()
  return stream:map(map_function):aggregate(m, add_fn)
end
There are 2 nodes, and the servers are hosted in AWS on the t2.large instance type.
transaction-queues=8;transaction-threads-per-queue=8;transaction-duplicate-threads=0;transaction-pending-limit=20;migrate-threads=1;migrate-xmit-priority=40;migrate-xmit-sleep=500;migrate-read-priority=10;migrate-read-sleep=500;migrate-xmit-hwm=10;migrate-xmit-lwm=5;migrate-max-num-incoming=256;migrate-rx-lifetime-ms=60000;proto-fd-max=15000;proto-fd-idle-ms=60000;proto-slow-netio-sleep-ms=1;transaction-retry-ms=1000;transaction-max-ms=1000;transaction-repeatable-read=false;dump-message-above-size=134217728;ticker-interval=10;microbenchmarks=false;storage-benchmarks=false;ldt-benchmarks=false;scan-max-active=100;scan-max-done=100;scan-max-udf-transactions=32;scan-threads=4;batch-index-threads=4;batch-threads=4;batch-max-requests=5000;batch-max-buffers-per-queue=255;batch-max-unused-buffers=256;batch-priority=200;nsup-delete-sleep=100;nsup-period=120;nsup-startup-evict=true;paxos-retransmit-period=5;paxos-single-replica-limit=1;paxos-max-cluster-size=32;paxos-protocol=v3;paxos-recovery-policy=manual;write-duplicate-resolution-disable=false;respond-client-on-master-completion=false;replication-fire-and-forget=false;info-threads=16;allow-inline-transactions=true;use-queue-per-device=false;snub-nodes=false;fb-health-msg-per-burst=0;fb-health-msg-timeout=200;fb-health-good-pct=50;fb-health-bad-pct=0;auto-dun=false;auto-undun=false;prole-extra-ttl=0;max-msgs-per-type=-1;service-threads=40;fabric-workers=16;pidfile=/var/run/aerospike/asd.pid;memory-accounting=false;udf-runtime-gmax-memory=18446744073709551615;udf-runtime-max-memory=18446744073709551615;sindex-builder-threads=4;sindex-data-max-memory=18446744073709551615;query-threads=6;query-worker-threads=15;query-priority=10;query-in-transaction-thread=0;query-req-in-query-thread=0;query-req-max-inflight=100;query-bufpool-size=256;query-batch-size=100;query-priority-sleep-us=1;query-short-q-max-size=500;query-long-q-max-size=500;query-rec-count-bound=18446744073709551615;query-threshold=10;query-untracked-time-ms=1000;pre-reserve-qnodes=false;service-address=0.0.0.0;service-port=3000;mesh-address=10.0.1.80;mesh-port=3002;reuse-address=true;fabric-port=3001;fabric-keepalive-enabled=true;fabric-keepalive-time=1;fabric-keepalive-intvl=1;fabric-keepalive-probes=10;network-info-port=3003;enable-fastpath=true;heartbeat-mode=mesh;heartbeat-protocol=v2;heartbeat-address=10.0.1.80;heartbeat-port=3002;heartbeat-interval=150;heartbeat-timeout=10;enable-security=false;privilege-refresh-period=300;report-authentication-sinks=0;report-data-op-sinks=0;report-sys-admin-sinks=0;report-user-admin-sinks=0;report-violation-sinks=0;syslog-local=-1;enable-xdr=false;xdr-namedpipe-path=NULL;forward-xdr-writes=false;xdr-delete-shipping-enabled=true;xdr-nsup-deletes-enabled=false;stop-writes-noxdr=false;reads-hist-track-back=1800;reads-hist-track-slice=10;reads-hist-track-thresholds=1,8,64;writes_master-hist-track-back=1800;writes_master-hist-track-slice=10;writes_master-hist-track-thresholds=1,8,64;proxy-hist-track-back=1800;proxy-hist-track-slice=10;proxy-hist-track-thresholds=1,8,64;udf-hist-track-back=1800;udf-hist-track-slice=10;udf-hist-track-thresholds=1,8,64;query-hist-track-back=1800;query-hist-track-slice=10;query-hist-track-thresholds=1,8,64;query_rec_count-hist-track-back=1800;query_rec_count-hist-track-slice=10;query_rec_count-hist-track-thresholds=1,8,64
We also changed the following config parameters, but doing so increased the latency further:
query-batch-size=1000
query-short-q-max-size=100000
query-long-q-max-size=100000
query-threads=28
query-worker-threads=400
query-req-max-inflight=1000

Linking back to the same question on the Aerospike community forum: https://discuss.aerospike.com/t/high-query-latency/2769/3
You're mainly bumping against the limits of that specific instance. We do not recommend using the t2 family. Please review the recommendations section of Aerospike's Amazon deployment guide. You should consider instances in the m3, c3, c4, r3, or i2 instance families.
A quick note on your configuration:
transaction-queues should be tuned to the number of cores. You have it set to 8, and a t2.large has 2 cores.
transaction-threads-per-queue is set too high for this instance type.
query-threads and query-worker-threads are both set too high for this instance.
Further, your queries involve a stream UDF, so they will be slower than a regular query. UDFs are useful, but you need to consider the contexts in which they're a good fit.
My suggestion is that you can probably model this differently and skip the secondary index and UDF.
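For example, here is a minimal sketch of one alternative (not from the original thread; it assumes a recent Aerospike Java client with CDT map operations, and the namespace, set, and bin names are illustrative): keep one record per userId with a map bin of timestamp -> adId entries, so a single primary-key read replaces the secondary-index query plus stream UDF, and grouping by adId happens in the application.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.MapOperation;
import com.aerospike.client.cdt.MapPolicy;

public class AdEventsByUser {
    // Append one (timestamp -> adId) entry to the user's event map.
    static void recordEvent(AerospikeClient client, String userId, long timestamp, String adId) {
        Key key = new Key("test", "user_events", userId); // one record per user
        client.operate(null, key,
                MapOperation.put(MapPolicy.Default, "events", Value.get(timestamp), Value.get(adId)));
    }

    // A single get by primary key returns all events for the user; group by adId client-side.
    static Record eventsForUser(AerospikeClient client, String userId) {
        Key key = new Key("test", "user_events", userId);
        return client.get(null, key, "events");
    }
}

With this layout, a per-user read is a plain key-value operation, which holds up much better at higher QPS than a secondary-index query feeding a stream UDF.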

Related

Neo4j GraphSage training does not log anything

I am working on extracting graph embeddings by training the GraphSage algorithm. I am working on a large graph consisting of 82,339,589 nodes and 219,521,164 edges. When I check with the ":queries" command, the query is listed as running. The algorithm started 6 days ago. When I look at the logs with "docker logs xxx", the last lines listed are
2021-12-01 12:03:16.267+0000 INFO Relationship Store Scan (RelationshipScanCursorBasedScanner): Imported 352,492,468 records and 0 properties from 16247 MiB (17,036,668,320 bytes); took 59.057 s, 5,968,663.57 Relationships/s, 275 MiB/s (288,477,487 bytes/s) (per thread: 1,492,165.89 Relationships/s, 68 MiB/s (72,119,371 bytes/s))
2021-12-01 12:03:16.269+0000 INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING Actual memory usage of the loaded graph: 8602 MiB
INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:64076] ] GraphSageTrain :: Start
Is there a way to see detailed logs about the training process? Is it normal for it to take 6 days for a graph of this size?
It is normal for GraphSAGE to take a long time compared to FastRP or Node2Vec. Starting in GDS 1.7, you can use
CALL gds.beta.listProgress(jobId: String)
YIELD
jobId,
taskName,
progress,
progressBar,
status,
timeStarted,
elapsedTime
If you call without passing in a jobId, it will return a list of all running jobs. If you call with a jobId, it will give you details about a running job.
This query will summarize the details for job 03d90ed8-feba-4959-8cd2-cbd691d1da6c.
CALL gds.beta.listProgress("03d90ed8-feba-4959-8cd2-cbd691d1da6c")
YIELD taskName, status
RETURN taskName, status, count(*)
Here's the documentation for progress logging. The system monitoring procedures might also be helpful to you.

Google cloud dataflow : Shutting down JVM after 8 consecutive periods of measured GC thrashing

I am using Google Cloud Dataflow to do some transformation.
I am reading about 3 million records from BigQuery (GBQ), performing a transformation, and writing the transformed results to GCS.
While doing this operation, the Dataflow job is failing with the error:
Error :
Shutting down JVM after 8 consecutive periods of measured GC thrashing
Workflow failed. Causes: S20:Read GBQ/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Read+Read GBQ/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow+Read GBQ/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable+Read GBQ/Reshuffle.ViaRandomKey/Values/Values/Map+Read GBQ/ReadFiles+Read GBQ/PassThroughThenCleanup/ParMultiDo(Identity)+Read GBQ/PassThroughThenCleanup/View.AsIterable/ParDo(ToIsmRecordForGlobalWindow)+transform+Split results/ParMultiDo(Partition)+Write errors/WriteFiles/RewindowIntoGlobal/Window.Assign+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/ApplyShardingKey+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Reify+Write errors/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Write+Write entities Gzip/WriteFiles/WriteShardedBundlesToTempFiles/ApplyShardingKey+Write entities Gzip/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Reify+Write entities Gzip/WriteFiles/WriteShardedBundlesToTempFiles/GroupIntoShards/Write failed., A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
DataConverterOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
        .as(DataConverterOptions.class);
Pipeline p = Pipeline.create(options);

EntityCreatorFn entityCreatorFn = EntityCreatorFn.newWithGCSMapping(options.getMapping(),
        options.getWithUri(), options.getLineNumberToResult(), options.getIsPartialUpdate(),
        options.getQuery() != null);

PCollectionList<String> resultByType = p
        .apply("Read GBQ", BigQueryIO.read(
                (SchemaAndRecord elem) -> elem.getRecord().get("lineNumber") + "|" + elem.getRecord().get("sourceData"))
                .fromQuery(options.getQuery()).withoutValidation()
                .withCoder(StringUtf8Coder.of()).withTemplateCompatibility())
        .apply("transform", ParDo.of(entityCreatorFn))
        .apply("Split results", Partition.of(2, (Partition.PartitionFn<String>) (elem, numPartitions) -> {
            if (elem.startsWith(PREFIX_ERROR)) {
                return PARTITION_ERROR;
            }
            return PARTITION_SUCCESS;
        }));

FileIO.Sink sink = TextIO.sink();
resultByType.get(0).apply("Write entities Gzip", FileIO.write().to(options.getOutput())
        .withCompression(Compression.GZIP).withNumShards(options.getShards()).via(sink));
resultByType.get(1).apply("Write errors", TextIO.write().to(options.getErrorOutput()).withoutSharding());

p.run();
Shutting down JVM after 8 consecutive periods of measured GC thrashing. Memory is used/total/max = 109/301/2507 MB, GC last/max = 54.00/54.00 %, #pushbacks=0, gc thrashing=true.
Does 'EntityCreatorFn.newWithGCSMapping' cache elements in memory by any chance? It seems like one of the steps in your pipeline is consuming too much memory (note that Dataflow cannot parallelize processing of a single element of a DoFn). I suggest adjusting your pipeline or trying out highmem machines. If the problem persists, please consider contacting Google Cloud Support with the relevant job IDs.
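As a minimal sketch of the highmem suggestion (the machine type and class name here are illustrative assumptions, not part of the original answer), the worker machine type can be set on the pipeline options, which is equivalent to passing --workerMachineType=n1-highmem-4 on the command line:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class HighMemJob {
    public static void main(String[] args) {
        // Parse the usual flags, then request high-memory workers so a large
        // element or bundle is less likely to push the JVM into GC thrashing.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setWorkerMachineType("n1-highmem-4");

        Pipeline p = Pipeline.create(options);
        // ... build the same transforms as in the question, then:
        p.run();
    }
}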

I have a collection of futures which are the result of persist on a dask dataframe. How do I do a delayed operation on them?

I have set up a scheduler and 4 worker nodes to do some processing on a CSV. The size of the CSV is just 300 MB.
import copy

import dask.dataframe as dd
from dask import delayed

df = dd.read_csv('/Downloads/tmpcrnin5ta', assume_missing=True)
df = df.groupby(['col_1', 'col_2']).agg('mean').reset_index()
df = client.persist(df)

def create_sep_futures(symbol, df):
    symbol_df = copy.deepcopy(df[df['symbol' == symbol]])
    return symbol_df

lazy_values = [delayed(create_sep_futures)(symbol, df) for symbol in st]
future = client.compute(lazy_values)
result = client.gather(future)
The st list contains 1000 elements.
When I do this, I get this error:
distributed.worker - WARNING - Compute Failed
Function: create_sep_futures
args: ('PHG', symbol col_3 col_2 \
0 A 1.451261e+09 23.512857
1 A 1.451866e+09 23.886857
2 A 1.452470e+09 25.080429
kwargs: {}
Exception: KeyError(False,)
My assumption is that each worker should get the full dataframe and query it, but I think each one just gets its block and tries to do the operation on that.
What is the workaround for this? Since the dataframe chunks are already in the workers' memory, I don't want to move the whole dataframe to each worker.
Operations on dataframes, using the dataframe syntax and API, are lazy (delayed) by default; you need do nothing more.
First problem: your syntax is wrong: df[df['symbol' == symbol]] should be df[df['symbol'] == symbol]. That is the origin of the False key.
So the solution you are probably looking for:
future = client.compute(df[df['symbol'] == symbol])
If you do want to work on the chunks separately, you can look into df.map_partitions, which you use with a normal function and which takes care of passing data or delayed/futures, or df.to_delayed, which will give you a set of delayed objects you can use with a delayed function.

Apache Kafka Streams Materializing KTables to a topic seems slow

I'm using Kafka Streams and I'm trying to materialize a KTable into a topic.
It works, but it seems to happen only every 30 seconds or so.
How/when does Kafka Streams decide to materialize the current state of a KTable into a topic?
Is there any way to shorten this interval and make it more "real-time"?
Here is the actual code I'm using
// Stream of random ints: (1,1) -> (6,6) -> (3,3)
// one record every 500ms
KStream<Integer, Integer> kStream = builder.stream(Serdes.Integer(), Serdes.Integer(), RandomNumberProducer.TOPIC);
// grouping by key
KGroupedStream<Integer, Integer> byKey = kStream.groupByKey(Serdes.Integer(), Serdes.Integer());
// same behaviour with or without the TimeWindow
KTable<Windowed<Integer>, Long> count = byKey.count(TimeWindows.of(1000L),"total");
// same behaviour with only count.to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);
count.toStream().map((k,v) -> new KeyValue<>(k.key(), v)).to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);
This is controlled by commit.interval.ms, which defaults to 30s. More details here:
http://docs.confluent.io/current/streams/developer-guide.html
The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node whenever the earliest of commit.interval.ms or cache.max.bytes.buffering (cache pressure) hits.
and here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
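As a minimal sketch (the application id and bootstrap servers below are illustrative assumptions), lowering commit.interval.ms and/or setting the record cache to zero in the StreamsConfig makes updates flow downstream closer to real time, at the cost of more writes to the output topic:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class LowLatencyStreamsConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "random-count-app");   // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative broker
        // Commit (and therefore flush the cache downstream) every second instead of the 30 s default.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
        // Or disable the record cache entirely so every update is forwarded immediately.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        return props;
    }
}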

Getting connections stats for circuits on Spring cloud zuul

I am running a few microservice instances that are functioning as edge routers and have the @EnableZuulProxy annotation. I have written a number of filters, and these control the flow of requests into the system.
What I would like to do is get the circuit stats from what is going on under the covers. I see that there is an underlying Netflix class, DynamicServerListLoadBalancer, that has some of the stats I would like to see. Is it possible to get an instance of it and, at a specific time, get the stats from it?
I can see it has stuff like this: (I formatted a log statement that I saw in my logs)
c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client authserver initialized:
DynamicServerListLoadBalancer:{
NFLoadBalancer:
name=authserver,current
list of Servers=[127.0.0.1:9999],
Load balancer stats=
Zone stats: {
defaultzone=[
Zone:defaultzone;
Instance count:1;
Active connections count: 0;
Circuit breaker tripped count: 0;
Active connections per server: 0.0;]
},
Server stats:
[[
Server:127.0.0.1:9999;
Zone:defaultZone;
Total Requests:0;
Successive connection failure:0;
Total blackout seconds:0;
Last connection made:Wed Dec 31 19:00:00 EST 1969;
First connection made: Wed Dec 31 19:00:00 EST 1969;
Active Connections:0;
total failure count in last (1000) msecs:0;
average resp time:0.0;
90 percentile resp time:0.0;
95 percentile resp time:0.0;
min resp time:0.0;
max resp time:0.0;
stddev resp time:0.0
]]
}
ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@5b1b78aa
All of this would be valuable to get and act on. Mostly the acting would be to feed usage heuristics back to the system.
Ok, like most of these things, I figured it out myself.
So here you go.
// Look up the metrics for the Hystrix command you are interested in.
HystrixCommandKey hystrixCommandKey = HystrixCommandKey.Factory.asKey("what you are looking for");
// Note: getInstance returns null until the command has executed at least once.
HystrixCommandMetrics hystrixCommandMetrics = HystrixCommandMetrics.getInstance(hystrixCommandKey);
HystrixCommandProperties properties = hystrixCommandMetrics.getProperties();
// Maximum concurrent requests allowed when using semaphore isolation.
long maxConnections = properties.executionIsolationSemaphoreMaxConcurrentRequests().get().longValue();
// Whether the circuit breaker has been forced open via configuration.
boolean circuitOpen = properties.circuitBreakerForceOpen().get().booleanValue();
// Number of requests currently executing for this command.
int currentConnections = hystrixCommandMetrics.getCurrentConcurrentExecutionCount();
So in this example, "what you are looking for" is the name of the Hystrix command you are interested in.
This gets you the properties of that particular Hystrix command.
From this you pull out the max connections, the current connections, and whether the circuit breaker has been forced open.
So there you are.
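As a follow-up, here is a hedged sketch (the "authserver" command key and the 10-second interval are illustrative assumptions) of polling the same metrics on a schedule so they can be fed back into the system, as the question mentions:

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandMetrics;

public class CircuitStatsPoller {
    public static void start() {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            HystrixCommandMetrics metrics =
                    HystrixCommandMetrics.getInstance(HystrixCommandKey.Factory.asKey("authserver"));
            if (metrics == null) {
                return; // no metrics until the command has been executed at least once
            }
            int current = metrics.getCurrentConcurrentExecutionCount();
            long max = metrics.getProperties()
                    .executionIsolationSemaphoreMaxConcurrentRequests().get();
            System.out.println("authserver concurrency: " + current + "/" + max);
        }, 0, 10, TimeUnit.SECONDS);
    }
}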
