Getting connection stats for circuits on Spring Cloud Zuul - netflix-zuul

I am running a few microservice instances that function as edge routers and have the @EnableZuulProxy annotation. I have written a number of filters, and these control the flow of requests into the system.
What I would like to do is get the circuit stats from what is going on under the covers. I see that there is an underlying Netflix class, DynamicServerListLoadBalancer, that has some of the stats I would like to see. Is it possible to get an instance of it and, at a specific time, get the stats from it?
I can see it has stuff like this (a log statement that I formatted from my logs):
c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client authserver initialized:
DynamicServerListLoadBalancer:{
NFLoadBalancer:
name=authserver,current
list of Servers=[127.0.0.1:9999],
Load balancer stats=
Zone stats: {
defaultzone=[
Zone:defaultzone;
Instance count:1;
Active connections count: 0;
Circuit breaker tripped count: 0;
Active connections per server: 0.0;]
},
Server stats:
[[
Server:127.0.0.1:9999;
Zone:defaultZone;
Total Requests:0;
Successive connection failure:0;
Total blackout seconds:0;
Last connection made:Wed Dec 31 19:00:00 EST 1969;
First connection made: Wed Dec 31 19:00:00 EST 1969;
Active Connections:0;
total failure count in last (1000) msecs:0;
average resp time:0.0;
90 percentile resp time:0.0;
95 percentile resp time:0.0;
min resp time:0.0;
max resp time:0.0;
stddev resp time:0.0
]]
}
ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@5b1b78aa
All of this would be valuable to get and act on. Mostly the acting would be to feed usage heuristics back to the system.

Ok, like most of these things, I figured it out myself.
So here you go.
HystrixCommandKey hystrixCommandKey = HystrixCommandKey.Factory.asKey("what you are looking for");
HystrixCommandMetrics hystrixCommandMetrics = HystrixCommandMetrics.getInstance(hystrixCommandKey);
HystrixCommandProperties properties = hystrixCommandMetrics.getProperties();
long maxConnections = properties.executionIsolationSemaphoreMaxConcurrentRequests().get().longValue();
boolean circuitOpen = properties.circuitBreakerForceOpen().get().booleanValue();
int currentConnections = hystrixCommandMetrics.getCurrentConcurrentExecutionCount();
So in this example, "what you are looking for" is the name of the Hystrix command you are interested in.
This gets you the metrics and properties of that particular Hystrix command.
From these you can pull out the max connections, the current connections, and whether the circuit breaker has been forced open.
So there you are.
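Note that circuitBreakerForceOpen() only reports whether the circuit has been forced open via configuration. If you also want the live state of the circuit breaker, something along these lines should work; this is just a minimal sketch using the standard Hystrix API, and both factory lookups can return null if the command has never been executed:
HystrixCircuitBreaker circuitBreaker = HystrixCircuitBreaker.Factory.getInstance(hystrixCommandKey);
if (circuitBreaker != null) {
    // true when the circuit is currently tripped
    boolean circuitCurrentlyOpen = circuitBreaker.isOpen();
    // false while requests are being short-circuited
    boolean allowingRequests = circuitBreaker.allowRequest();
}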

Related

Measure the duration of x amount of requests while using K6

I would like to use k6 in order to measure the time it takes an API to process 1,000,000 requests in total.
Scenario
Execute 1,000,000 GET requests (1 million in total) with 50 concurrent users/threads, so every user/thread executes 20,000 requests.
I've managed to create such a scenario with Artillery.io, but I'm not sure how to create the same one using k6. Could you point me in the right direction? (Most examples use a pre-defined duration, but in this case I don't know the duration; that is exactly what I want to measure.)
Artillery yml
config:
  target: 'https://localhost:44000'
  phases:
    - duration: 1
      arrivalRate: 50
scenarios:
  - flow:
      - loop:
          - get:
              url: "/api/Test"
        count: 20000
K6 js
import http from 'k6/http';
import { check, sleep } from 'k6';

export let options = {
    iterations: 1000000,
    vus: 50
};

export default function () {
    let res = http.get('https://localhost:44000/api/Test');
    check(res, { 'success': (r) => r.status === 200 });
}
The iterations + vus you've specified in your k6 script options would result in a shared-iterations executor, where VUs will "steal" iterations from the common pile of 1m iterations. So, the faster VUs will complete slightly more than 20k requests, while the slower ones will complete slightly less, but overall you'd still get 1 million requests. And if you want to see how quickly you can complete 1m requests, that's arguably the better way to go about it...
However, if having exactly 20k requests per VU is a strict requirement, you can easily do that with the aptly named per-vu-iterations executor:
export let options = {
    discardResponseBodies: true,
    scenarios: {
        'million_hits': {
            executor: 'per-vu-iterations',
            vus: 50,
            iterations: 20000,
            maxDuration: '2h',
        },
    },
};
In any case, I strongly suggest setting maxDuration to a high value, since the default value is only 10 minutes for either executor. And discardResponseBodies will probably help with the performance, if you don't care about the response body contents.
By the way, you can also do in k6 what you've done in Artillery: have 50 VUs start a single iteration each and then just loop the http.get() call 20,000 times inside that one iteration, as sketched below. You won't get a very nice UX that way, since the k6 progress bars will be frozen until the very end (k6 has no idea of your actual progress inside each iteration), but it will also work.
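A rough sketch of that single-iteration variant, reusing the URL from the question, could look like this:
import http from 'k6/http';
import { check } from 'k6';

export let options = {
    discardResponseBodies: true,
    scenarios: {
        'million_hits_one_iteration': {
            executor: 'per-vu-iterations',
            vus: 50,
            iterations: 1,
            maxDuration: '2h',
        },
    },
};

export default function () {
    // Each of the 50 VUs runs one iteration that fires 20,000 requests.
    for (let i = 0; i < 20000; i++) {
        let res = http.get('https://localhost:44000/api/Test');
        check(res, { 'success': (r) => r.status === 200 });
    }
}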

How do I properly transform missing datapoints as 0 in Prometheus?

We have an alert we want to fire based on the previous 5m of metrics (say, if the value is above 0). However, if the metric is 0 it's not written to Prometheus, and as such it's not returned for that time bucket.
The result is that we may have an example data-set of:
-60m | -57m | -21m | -9m | -3m   <<< Relative Time
  1  |   0  |   1  |   0 |   1   <<< Data Returned
which ultimately results in the alert firing every time the metric is above 0, not only when it's above 0 for 5m. I've tried writing our query with OR on() vector() appended to the end, but it does funny stuff with the returned dataset:
values:Array[12]
0:Array[1539021420,0.16666666666666666]
1:Array[1539021480,0]
2:Array[1539021540,0]
3:Array[1539021600,0]
4:Array[1539021660,0]
5:Array[1539021720,0]
6:Array[1539021780,0]
7:Array[1539021840,0]
8:Array[1539021900,0]
9:Array[1539021960,0]
10:Array[1539022020,0]
11:Array[1539022080,0]
For some reason it's putting the "real" data at the front of the array (even though my starting time is well before 1539021420) and continuing from that timestamp forward.
What is the proper way to have Prometheus return 0 for data-points which may not exist?
To be clear, this isn't an alertmanager question -- I'm using a different tool for alerting on this data.
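For reference, the shape of the query I have been trying is roughly the following (the metric name is just a placeholder):
sum(increase(some_metric_total[5m])) or on() vector(0)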

Apache Kafka Streams Materializing KTables to a topic seems slow

I'm using Kafka Streams and I'm trying to materialize a KTable into a topic.
It works, but it seems to be done every 30 seconds or so.
How/when does Kafka Streams decide to materialize the current state of a KTable into a topic?
Is there any way to shorten this time and make it more "real-time"?
Here is the actual code I'm using:
// Stream of random ints: (1,1) -> (6,6) -> (3,3)
// one record every 500ms
KStream<Integer, Integer> kStream = builder.stream(Serdes.Integer(), Serdes.Integer(), RandomNumberProducer.TOPIC);
// grouping by key
KGroupedStream<Integer, Integer> byKey = kStream.groupByKey(Serdes.Integer(), Serdes.Integer());
// same behaviour with or without the TimeWindow
KTable<Windowed<Integer>, Long> count = byKey.count(TimeWindows.of(1000L),"total");
// same behaviour with only count.to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);
count.toStream().map((k,v) -> new KeyValue<>(k.key(), v)).to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);
This is controlled by commit.interval.ms, which defaults to 30s. More details here:
http://docs.confluent.io/current/streams/developer-guide.html
The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node whenever the earliest of commit.interval.ms or cache.max.bytes.buffering (cache pressure) hits.
and here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
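To get updates onto the output topic sooner, you can lower the commit interval and/or disable the record cache when configuring the application. A minimal sketch using org.apache.kafka.streams.StreamsConfig (the application id and broker address are placeholders, and the values are only examples):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "random-count-app"); // hypothetical application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Flush the cache and forward KTable updates every second instead of the 30s default.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
// A cache size of 0 disables caching, so every update is forwarded downstream immediately.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(props));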

Telegraf phpfpm not storing all tag measurements to influxdb

I have Telegraf configured and running with -input-filter phpfpm
Input filter configured:
[phpfpm]
urls = ["http://127.0.0.1:8080/fpmstats"]
This url works, and returns correct php-fpm stats:
pool: www
process manager: dynamic
start time: 03/Sep/2016:13:25:25 +0000
start since: 1240
accepted conn: 129
listen queue: 0
max listen queue: 0
listen queue len: 0
idle processes: 2
active processes: 1
total processes: 3
max active processes: 1
max children reached: 0
slow requests: 0
The Telegraf output is configured for InfluxDB as follows:
[[outputs.influxdb]]
urls = ["udp://172.17.0.16:8089"] # Stick to UDP
database = "telegraf"
precision = "s"
retention_policy = "autogen"
write_consistency = "any"
timeout = "5s"
username = "telegraf"
password = "password"
user_agent = "telegraf"
udp_payload = 1024
This is 'almost' working, and data is being received by Influx - but only a couple of the measurements.
SHOW TAG KEYS FROM "phpfpm"
Shows only the following tag keys:
host
pool
I expected to see values for accepted conn, listen queue, idle processes and so on. I cannot see any 'useful' data being posted to Influx.
Am I missing something in terms of where to look for the phpfpm values stored in InfluxDB?
Or is this a configuration problem?
I had a problem getting the HTTP collector to work, so I stuck with UDP - is this a bad idea?
Data in InfluxDB is separated into measurements, tags, and fields.
Measurements are high level bucketing of data.
Tags are index values.
Fields are the actual data.
The data that you're working with has the measurement phpfpm and two tags host and pool.
I expected to see values for accepted conn, listen queue, idle processes and so on. I cannot see any 'useful' data being posted to Influx.
The values that you're looking for are most likely fields. To verify that this is the case, run the query:
SHOW FIELD KEYS FROM "phpfpm"
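Once you know the field keys, you can query them directly. For example, something along these lines should return recent values; the field names below are what the phpfpm input typically reports, so treat them as an assumption and substitute whatever SHOW FIELD KEYS actually returns:
SELECT "accepted_conn", "listen_queue", "idle_processes", "active_processes" FROM "phpfpm" WHERE time > now() - 5m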

Why does pytz.country_timezones('cn') give different results on CentOS systems?

Two computers with CentOS 6.5 installed (kernel 3.10.44) give different results.
One result is [u'Asia/Shanghai', u'Asia/Urumqi'], and the other is ['Asia/Shanghai', 'Asia/Harbin', 'Asia/Chongqing', 'Asia/Urumqi', 'Asia/Kashgar'].
Is there any configuration that makes the first result the same as the second?
I have the following Python code:
import time
from datetime import datetime
import pytz

def get_date():
    date = datetime.utcnow()
    from_zone = pytz.timezone("UTC")
    to_zone = pytz.timezone("Asia/Urumqi")
    date = from_zone.localize(date)
    date = date.astimezone(to_zone)
    return date

def get_curr_time_stamp():
    date = get_date()
    stamp = time.mktime(date.timetuple())
    return stamp

cur_time = get_curr_time_stamp()
print "1", time.strftime("%Y %m %d %H:%M:%S", time.localtime(time.time()))
print "2", time.strftime("%Y %m %d %H:%M:%S", time.localtime(cur_time))
When I use this code to get the time, the result on the computer with 2 time zones is:
1 2016 04 20 08:53:18
2 2016 04 20 06:53:18
and on the computer with 5 time zones it is:
1 2016 04 20 08:53:18
2 2016 04 20 08:53:18
I don't understand why.
You probably just have an outdated version of pytz on the system returning five time zones (or perhaps on both systems). You can find the latest releases here. It's important to stay on top of time zone updates, as the various governments of the world change their time zones often.
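A quick way to compare the two machines is to print the pytz release and the bundled tz database version directly; a minimal sketch:
import pytz
print(pytz.__version__)               # pytz release, e.g. 2014.6 or later
print(pytz.OLSON_VERSION)             # bundled tz database version, e.g. 2014f
print(pytz.country_timezones('cn'))   # an up-to-date install should list two zones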
Like most systems, pytz gets its data from the tz database. The five time zones for China were reduced to two in version 2014f (corresponding to pytz 2014.6). From the release notes:
China's five zones have been simplified to two, since the post-1970
differences in the other three seem to have been imaginary. The
zones Asia/Harbin, Asia/Chongqing, and Asia/Kashgar have been
removed; backwards-compatibility links still work, albeit with
different behaviors for time stamps before May 1980. Asia/Urumqi's
1980 transition to UTC+8 has been removed, so that it is now at
UTC+6 and not UTC+8. (Thanks to Luther Ma and to Alois Treindl;
Treindl sent helpful translations of two papers by Guo Qingsheng.)
Also, you may wish to read Wikipedia's Time in China article, which explains that the Asia/Urumqi entry is for "Ürümqi Time", which is used unofficially in some parts of the Xinjiang region. This zone is not recognized by the Chinese government and is considered a politically charged issue. As such, many systems choose to omit the Urumqi time zone, despite it being listed in the tz database.
