I have a topology where a spout reads data from Kafka and sends it to a bolt, which in turn calls a REST API (A), which calls another REST API (B). Until now API B did not have throttling. Now they have implemented throttling (a maximum of x calls per clock minute).
We need to implement the throttling handler.
Option A
Initially we were thinking of doing it at the REST API (A) level by putting a
Thread.sleep(x in millis) once a call is throttled by REST API (B),
but that would hold all the REST (A) calls waiting for that long (anywhere from 1 second to 59 seconds), which may increase the load as new calls keep coming in.
Option B
REST API (A) sends a response back to the bolt indicating that it was throttled. The bolt notifies the spout with a processing failure so as:
Not to advance the offsets for those messages
To tell the spout to stop reading from Kafka and to stop emitting messages to the bolt
The spout waits for some time (say a minute) and resumes from where it left off
Option A is straightforward to implement but not a good solution in my opinion.
I am trying to figure out whether Option B is feasible with topology.max.spout.pending, but how do I dynamically tell Storm to throttle the spout? Can anyone please share some thoughts on this option.
Option C
REST API (B) throttles the call from REST API (A), which does not handle it itself but passes the 429 response code back to the bolt. The bolt re-queues the message to an error topic that is part of another Storm topology. The message can carry a retry count, and if the same message gets throttled again we re-queue it with ++retry count.
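For illustration, a rough sketch of the re-queue step of Option C, assuming the bolt has a plain KafkaProducer available and that the retry count travels as a record header; the topic name, header key and class here are made up for this example, not part of the actual topology:

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ErrorTopicRequeuer {

    private final KafkaProducer<String, String> producer;

    public ErrorTopicRequeuer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Called by the bolt when REST API (A) reports a 429 from REST API (B)
    public void requeue(String key, String payload, int retryCount) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("error-topic", key, payload);
        // carry ++retryCount so the error-topic topology can decide when to give up
        record.headers().add("retry-count",
                Integer.toString(retryCount + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }
}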
Updating the post as I found a solution that makes Option B feasible.
Option D
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryExponentialBackoff.java
/**
 * The time stamp of the next retry is scheduled according to the exponential backoff formula (geometric progression):
 * nextRetry = failCount == 1 ? currentTime + initialDelay : currentTime + delayPeriod^(failCount-1),
 * where failCount = 1, 2, 3, ... and nextRetry = Min(nextRetry, currentTime + maxDelay).
 * <p/>
 * By specifying a value for maxRetries lower than Integer.MAX_VALUE, the user decides to sacrifice guarantee of delivery for the
 * previous polled records in favor of processing more records.
 *
 * @param initialDelay initial delay of the first retry
 * @param delayPeriod  the time interval that is the ratio of the exponential backoff formula (geometric progression)
 * @param maxRetries   maximum number of times a tuple is retried before being acked and scheduled for commit
 * @param maxDelay     maximum amount of time waiting before retrying
 */
public KafkaSpoutRetryExponentialBackoff(TimeInterval initialDelay, TimeInterval delayPeriod, int maxRetries, TimeInterval maxDelay) {
    this.initialDelay = initialDelay;
    this.delayPeriod = delayPeriod;
    this.maxRetries = maxRetries;
    this.maxDelay = maxDelay;
    LOG.debug("Instantiated {}", this.toStringImpl());
}
The steps will be as follows:
Create a kafkaSpoutRetryService using the above constructor
Set the retry service on the KafkaSpoutConfig using
KafkaSpoutConfig.builder(kafkaBootStrapServers, topic).setRetry(kafkaSpoutRetryService)
Fail the tuple in the bolt whenever REST API (B) throttles, using
collector.fail(tuple), which signals the spout to process the tuple again based on the retry configuration set up in steps 1 and 2 (see the sketch below)
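Here is a minimal sketch of steps 1 to 3, assuming storm-kafka-client is on the classpath; the broker address, topic name and delay values are placeholders, not taken from the original topology:

import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
import org.apache.storm.kafka.spout.KafkaSpoutRetryService;

public class ThrottleRetryWiring {

    public static KafkaSpoutConfig<String, String> buildSpoutConfig() {
        // Step 1: retry service with exponential backoff, capped at the 60 s throttle window
        KafkaSpoutRetryService retryService = new KafkaSpoutRetryExponentialBackoff(
                TimeInterval.seconds(1),    // initialDelay of the first retry
                TimeInterval.seconds(2),    // delayPeriod, the ratio of the progression
                Integer.MAX_VALUE,          // maxRetries: never drop a throttled tuple
                TimeInterval.seconds(60));  // maxDelay: one clock-minute throttle window

        // Step 2: hand the retry service to the spout config
        return KafkaSpoutConfig.builder("kafka-broker:9092", "my-topic")
                .setRetry(retryService)
                .build();
    }

    // Step 3 lives in the bolt's execute(): on a 429 from REST API (B) call
    // collector.fail(tuple) so the spout replays it per the backoff above,
    // otherwise collector.ack(tuple).
}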
Your option D sounds fine, but in the interest of avoiding duplicates in calls to API A, I think you should consider separating your topology into two.
Have a topology that reads from your original Kafka topic (call it topic 1), calls REST API A, and writes whatever the output of the bolt is back to a Kafka topic (call it topic 2).
You then make a second topology whose only job is to read from topic 2, and call REST API B.
This will allow you to use option D while avoiding extra calls to API A when you are saturating API B. Your topologies will look like
Kafka 1 -> Bolt A -> REST API A -> Kafka 2
Kafka 2 -> Bolt B -> REST API B
If you want to make the solution a little more responsive to the throttling, you can use the topology.max.spout.pending configuration in Storm to limit how many tuples can be in flight at the same time. You could then make Bolt B buffer in-flight tuples until the throttling expires, at which point it can try sending the tuples again. You can use OutputCollector.resetTimeout to keep those tuples from timing out while Bolt B waits for the throttling to expire, and tick tuples to make Bolt B periodically wake up and check whether the throttling has expired.
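One possible shape for that Bolt B, sketched on the assumption of a fixed one-minute throttle window and a 5-second tick frequency; the class, the stubbed callRestApiB(), and the exact buffering policy are illustrative, not taken from your topology:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.utils.TupleUtils;

public class RestApiBBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final List<Tuple> buffered = new ArrayList<>();
    private long throttledUntilMillis = 0L;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to deliver a tick tuple every 5 seconds so we can re-check the throttle
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5);
        return conf;
    }

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        if (TupleUtils.isTick(tuple)) {
            if (System.currentTimeMillis() >= throttledUntilMillis && !buffered.isEmpty()) {
                List<Tuple> toRetry = new ArrayList<>(buffered);
                buffered.clear();
                toRetry.forEach(this::process);
            } else {
                // keep buffered tuples alive so they are not replayed while we wait
                buffered.forEach(collector::resetTimeout);
            }
            return;
        }
        process(tuple);
    }

    private void process(Tuple tuple) {
        int status = callRestApiB(tuple); // placeholder for the real REST call to API B
        if (status == 429) {
            throttledUntilMillis = System.currentTimeMillis() + 60_000; // assume a 1-minute window
            buffered.add(tuple);          // hold the tuple until the window expires
        } else {
            collector.ack(tuple);
        }
    }

    private int callRestApiB(Tuple tuple) {
        return 200; // stub so the sketch compiles
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing emitted downstream in this sketch
    }
}

Because the buffered tuples stay un-acked, topology.max.spout.pending naturally stops the spout from pushing more work into the topology while Bolt B is waiting out the throttle.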
I previously asked this question about using Locust as the means of delivering a static, repeatable request load to the target server (n requests per second for five minutes, where n is predetermined for each second), and it was determined that it's not readily achievable.
So, I took a step back and reformulated the problem into something that you probably could do using a custom load shape, but I'm not sure how – hence this question.
As in the previous question, we have a 5-minute period of extracted Apache logs, where each second, anywhere from 1 to 36 GET requests were made to an Apache server. From those logs, I can get a distribution of how many times a certain requests-per-second rate appeared; e.g. there's a 1/4000 chance of 36 requests being processed on any given second, 1/50 for 18 requests to be processed on any given second, etc.
I can model the distribution of request rates as a simple Python list: each number between 1 and 36 appears in it as many times as that requests-per-second rate occurred during the 5-minute period captured in the Apache logs. In the tick() method of a custom load shape I then just pull a random number from the list to inform the (user count, spawn rate) calculation.
Additionally, by using a predetermined random seed, I can make the test runs repeatable to within an acceptable level of variation to be useful in testing my API server configuration changes, since the same random list elements should be retrieved each time.
The problem is that I'm not yet able to "think in Locust", to think in terms of user counts and spawn rates instead of rates of requests received by the server.
The question becomes this:
How do you implement the tick() method of a custom load shape in such a way that the (user count, spawn rate) tuple results in a roughly known distribution of requests per second to be sent, possibly with the help of other configuration options and plugins?
You need to create a Locust User with the tasks you want it to run (e.g. making your HTTP calls). You can define the time between tasks to roughly control the requests per second. If you have a task that makes a single HTTP call and define wait_time = constant(1), you get roughly 1 request per second per user. Locust's spawn_rate is a per-second unit. Since you already have the data you want to reproduce and it's in 1-second intervals, you can then create a LoadTestShape class with a tick() method somewhat like this:
from locust import LoadTestShape

class MyShape(LoadTestShape):
    repro_data = […]        # per-second request counts extracted from the Apache logs
    last_user_count = 0

    def tick(self):
        if len(self.repro_data) > 0:
            requests_per_second = self.repro_data.pop(0)
            # spawn_rate must cover the change in user count within one second,
            # whether we are scaling up or down
            requests_per_second_diff = abs(self.last_user_count - requests_per_second)
            self.last_user_count = requests_per_second
            return (requests_per_second, requests_per_second_diff)
        return None
If your first data point is 10 requests, you'd need requests_per_second=10 and requests_per_second_diff=10 to make Locust spin up all 10 users in a single second. If the next second is 25, you'd have requests_per_second=25 and requests_per_second_diff=15. In a Load Shape, spawn_rate also works for decreasing the number of users. So if next is 16, requests_per_second=16 and requests_per_second_diff=9.
Is it possible to reconnect to the same push query after the connection was lost?
There is a queryId for terminating a query. Is there a similar mechanism that could be used for reconnections via the ksqlDB REST API?
If the subscription stopped at offset 5, then after reconnection it should also receive the new messages after offset 5 (without skipping any of them), but without replaying the whole topic from the beginning again.
The ideal would be exactly-once processing for all messages (records), but at-least-once semantics are a must-have in a lot of scenarios.
"POST" "http://<ksqldb-host-name>:8088/query-stream"
{
"sql": "SELECT * FROM Movies EMIT CHANGES;",
"properties":{"auto.offset.reset":"earliest"}
}
Response header with queryId:
{"queryId":"f91157c7-cd12-407e-a173-5a4cbc398259","columnNames":["TITLE","ID","RELEASE_YEAR"],"columnTypes":["STRING","INTEGER","INTEGER"]}
What I'm probably missing is a programmatic way of committing the offsets.
I have an SQS consumer running in EventConsumerService that needs to handle up to 3K TPS successfully, sometimes upwards of 20K TPS (or 1.2 million messages per minute). For each message processed, I make a REST call to DataService's TCP VIP. I'm trying to perform a load test to find the max TPS that one host in EventConsumerService can handle without overstraining any of the following:
Request volume on dependencies, DynamoDB storage, etc
CPU utilization in both EventConsumerService and DataService
Network connections per host
IO stats due to overlogging
DLQ size must be minimal; currently I am seeing my DLQ grow to 500K messages due to 500 Service Unavailable exceptions thrown from DataService, so something must be wrong.
Approximate age of oldest message. I do not want a message sitting in the queue for over X minutes.
Fatals and latency of the REST call to DataService
Active threads
This is how I am performing the performance test:
I set up both my consumer and the other service on one host, the reason being I want to understand the load on both services per host.
I use a TPS generator to fill the SQS queue with a million messages
EventConsumerService is already running in production. Once messages started filling the SQS queue, I could immediately see requests being sent to DataService.
Here are the parameters I am tuning to find messagesPolledPerSecond:
messagesPolledPerSecond = (numberOfHosts * numberOfPollers * messageFetchSize) * (1000/(sleepTimeBetweenPollsPerMs+receiveMessageTimePerMs))
messagesInSurge / messagesPolledPerSecond = ageOfOldestMessageSLA
ageOfOldestMessage + settingsUpdatedLatency < latencySLA
The variables for SqsConsumer which I kept constant are:
numberOfHosts = 1
receiveMessageTimePerMs = 60 ms? It's out of my control
Max thread pool size: 300
Other factors are all game:
Number of pollers (default 1), I set to 150
Sleep time between polls (default 100 ms), I set to 0 ms
Sleep time when no messages (default 1000 ms), ???
message fetch size (default 1), I set to 10
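For reference, plugging these values into the first formula above (a rough back-of-the-envelope check; receiveMessageTimePerMs ≈ 60 ms is only an estimate):

public class PollRateEstimate {
    public static void main(String[] args) {
        int numberOfHosts = 1;
        int numberOfPollers = 150;
        int messageFetchSize = 10;
        double sleepTimeBetweenPollsPerMs = 0;
        double receiveMessageTimePerMs = 60;

        double messagesPolledPerSecond =
                (numberOfHosts * numberOfPollers * messageFetchSize)
                        * (1000.0 / (sleepTimeBetweenPollsPerMs + receiveMessageTimePerMs));

        // prints roughly 25000 messages/s, above even the 20K TPS peak
        System.out.printf("messagesPolledPerSecond = %.0f%n", messagesPolledPerSecond);
    }
}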
However, with the above parameters, I am seeing a large number of messages being sent to the DLQ due to server errors, so clearly I have set the values too high. This testing methodology seems highly inefficient: I am unable to find the optimal TPS that avoids sending such a tremendous number of messages to the DLQ while also keeping the approximate age of the oldest message down.
Any guidance on how best to test this is appreciated. It would be very helpful if we could set up a time to chat; PM me directly.
I have a Flux, and for each object I need to make an API call to a third-party REST API (about 1000 calls). To prevent too many requests per second I'm using:
Flux<Calls> callsIntervalFlux =
        Flux.interval(Duration.ofMillis(100))
            .zipWith(callsFlux, (i, call) -> call);
// and now Calls emits every 100 ms, and the REST API is not overloaded
The problem is that the application sometimes fails with an exception:
reactor.core.Exceptions$ErrorCallbackNotImplemented: reactor.core.Exceptions$OverflowException: Could not emit tick 32 due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)
Caused by: reactor.core.Exceptions$OverflowException: Could not emit tick 32 due to lack of requests (interval doesn't support small downstream requests that replenish slower than the ticks)
Is there any logic that I can add to prevent an error, or just skip this tick?
This means that the consumer of the results doesn't consume the data fast enough: the interval runs at a fixed frequency, so it tries to emit but nobody is listening.
There's a need for some sort of more advanced permit-based rate limiter built on Reactor, I think. But in the meantime, another simple (simplistic?) approach that you could try is to individually ensure each call is delayed from the previous one by 100 ms:
Flux<Calls> callsIntervalFlux = callsFlux.delayElements(Duration.ofMillis(100));
(this operator was made to replace the zipWith(interval) pattern)
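For completeness, a self-contained sketch of that pattern, assuming reactor-core is on the classpath; callRestApi is a stand-in for the real third-party call, and the 100 ms spacing mirrors the original interval:

import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class DelayElementsExample {
    public static void main(String[] args) {
        Flux<Integer> callsFlux = Flux.range(1, 1000); // stand-in for the ~1000 Calls

        callsFlux
                .delayElements(Duration.ofMillis(100)) // at most one element every 100 ms,
                                                       // with no interval overflow to worry about
                .flatMap(DelayElementsExample::callRestApi)
                .blockLast();                          // block only for this demo
    }

    private static Mono<String> callRestApi(Integer call) {
        return Mono.just("response " + call); // placeholder for the real REST call
    }
}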
I'm using Prometheus' Summary metric to collect the latency of an API call. Instead of making an actual API call, I'm simply calling Thread.sleep(1000) to simulate a 1 second api-call latency value -- this makes the Summary hold a value of .01 (for 1 second of latency). But if, for example, I invoke Thread.sleep(1000) twice in the same minute, the Summary metric ends up with a value of .02 (for 2 seconds of latency), instead of two individual instances of .01 latency that just happened to occur within the same minute. My problem is the Prometheus query. The Prometheus query I am currently using is: rate(my_custom_summary_sum[1m]).
What should my Prometheus query be, such that I can see the latency of each individual Thread.sleep(1000) invocation. As of right now, the Summary metric collects and displays the total latency sum per minute. How can I display the latency of each individual call to Thread.sleep(1000) (i.e. the API request)?
private static final Summary mySummary = Summary.build()
.name("my_custom_summary")
.help("This is a custom summary that keeps track of latency")
.register();
Summary.Timer requestTimer = mySummary.startTimer(); //starting timer for mySummary 'Summary' metric
Thread.sleep(1000); //sleep for one second
requestTimer.observeDuration(); //record the time elapsed
This is the graph that results from this query:
Prometheus graph
Prometheus is a metrics-based monitoring system; it cares about overall performance and behaviour, not individual requests.
What you are looking for is a logs-based system, such as Graylog or the ELK stack.