Rate limit exceeded in Batch query - ios

I have a client application that inserts files into Google Drive. Sometimes it needs to insert multiple files at once, so it uses a batch query (GTLBatchQuery) to insert them in a single request. Occasionally during an insert, the server returns a rate limit exceeded error:
"error" : {
"message" : "Rate Limit Exceeded",
"data" : [
{
"reason" : "rateLimitExceeded",
"message" : "Rate Limit Exceeded",
"domain" : "usageLimits"
}
],
"code" : 417
},
Please point me to the correct way to enable retry on this error. I have tried enabling retry on the service:
self.driveService.retryEnabled = YES;
self.driveService.maxRetryInterval = 60.0;
But it has no effect.
Is it possible to set retry for a batch query?
Do I need to enable retry on GTMHTTPFetcher?
Any code snippet implementing exponential backoff in Objective-C is appreciated.

Standard exponential backoff as shown in the Google documentation is not the correct way to deal with rate limit errors. You will simply overload Drive with retries and make the problem worse.
Also, sending multiple updates in a batch is almost guaranteed to trigger rate limit errors if you have more than 20 or so updates, so I wouldn't do that either.
My suggestions are:
1. Don't use batch, or if you do, keep each batch below 20 updates.
2. If you get a rate limit error, back off for at least 5 seconds before retrying.
3. Try to avoid rate limit errors in the first place by keeping your updates below 20, or by keeping the submission rate below one every 2 seconds.
These numbers are all undocumented and subject to change.
The reason for 3 is that there is (or was, who knows) a bug in Drive where an update that returned a rate limit error did in fact succeed, so retrying can end up inserting duplicate files. See "403 rate limit on insert sometimes succeeds".
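
The suggestions above could be sketched roughly as follows. This is a minimal, language-neutral illustration in Python rather than Objective-C, and it is not the Drive client API: RateLimitError, send_batch, and submit_in_small_batches are hypothetical names, and the defaults (19 items per batch, a 2 second pause, a 5 second backoff) simply mirror the undocumented numbers mentioned above.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by send_batch on a rateLimitExceeded reply."""


def chunked(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def submit_in_small_batches(
    items: Sequence[T],
    send_batch: Callable[[List[T]], None],  # hypothetical: performs one batch request
    batch_size: int = 19,                   # assumption: stay below ~20 updates
    pause_between_batches: float = 2.0,     # assumption: pace the submissions
    backoff_seconds: float = 5.0,           # assumption: fixed wait after a rate limit
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send items in small batches, waiting a fixed time after a rate limit error."""
    batches = chunked(items, batch_size)
    for index, batch in enumerate(batches):
        for attempt in range(1, max_attempts + 1):
            try:
                send_batch(batch)
                break
            except RateLimitError:
                if attempt == max_attempts:
                    raise
                # A real client should first check whether the insert actually
                # succeeded, because a blind retry can create duplicate files.
                sleep(backoff_seconds)
        # Pause between batches, but not after the last one.
        if index < len(batches) - 1:
            sleep(pause_between_batches)
```

The sleep function is passed in as a parameter so that the pacing can be checked without actually waiting.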

Related

Understand how k6 manages, at a low level, a large number of API calls in a short period of time

I'm new to k6, so I'm sorry if this is a naive question. I'm trying to understand how the tool manages network calls under the hood. Does it execute them at the maximum rate it can? Does it queue them based on the response time of the system under test?
I need to understand this because I'm running a lot of tests using both k6 run and k6 cloud, but I can't make more than ~2000 requests per second (according to the k6 results). I was wondering whether k6 implements some kind of back-pressure mechanism when it detects that my system is "slow", or whether there are other reasons why I can't get past that limit.
I read here that it is possible to make 300.000 requests per second and that the cloud environment is already configured for that. I also tried to configure my machine manually, but nothing changed.
For example, the following tests are identical; the only change is the number of VUs. I ran all tests on k6 cloud.
Shared parameters:
60 api calls (I have a single http.batch with 60 api calls)
Iterations: 100
Executor: per-vu-iterations
Results:
VUs   Total calls   Avg response time   reqs/s
10    60.000        108 ms              547
20    120.000       112 ms              1.051,67
40    240.000       134 ms              1.794,33
80    480.000       238 ms              2.060,33
160   960.000       479 ms              2.223,33
200   1.081.380     637 ms              2.102,83 (peak; this run reached the max duration, which is why it stopped)
I expected that if my system couldn't handle that many requests I would see a lot of timeout errors, but I haven't seen any. Instead, all the API calls are executed and no errors are returned. Can anyone help me?
As k6 - or more specifically, your VUs - execute code synchronously, the amount of throughput you can achieve is fully dependent on how quickly the system you're interacting with responds.
Let's take this script as an example:
import http from 'k6/http';
export default function() {
http.get("https://httpbin.org/delay/1");
}
The endpoint here is purposefully designed to take 1 second to respond. There is no other code in the exported default function. Because each VU will wait for a response (or a timeout) before proceeding past the http.get statement, the maximum amount of throughput for each VU will be a very predictable 1 HTTP request/sec.
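
The arithmetic behind that statement can be sketched as a small calculation, shown here in Python. The function name is hypothetical, and the model assumes each VU issues its requests strictly one after another with no sleep or other script time in between, so it only gives an upper bound.

```python
def max_requests_per_second(vus: int, response_time_seconds: float) -> float:
    """Upper bound on throughput when every VU waits for each response.

    Each VU completes one request per response time, so the total rate
    is the number of VUs divided by the response time.
    """
    if vus < 0:
        raise ValueError("vus must not be negative")
    if response_time_seconds <= 0:
        raise ValueError("response_time_seconds must be positive")
    return vus / response_time_seconds
```

With one VU and a 1 second response time this gives 1 request per second, as in the example above. Adding VUs raises the bound only as long as the response time does not grow along with them.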
Often, response times (and/or errors, like timeouts) will increase as you increase the number of VUs. You will eventually reach a point where adding VUs does not result in higher throughput. In this situation, you've basically established the maximum throughput the System-Under-Test can handle. It simply can't keep up.
The only situation where that might not be the case is when the system running k6 runs out of hardware resources (usually CPU time). This is something that you must always pay attention to.
If you are using k6 OSS, you can scale to as many VUs (concurrent threads) as your system can handle. You could also use http.batch to fire off multiple requests concurrently within each VU (the statement will still block until all responses have been received). This might be slightly less overhead than spinning up additional VUs.

How to ensure insert rate 1 insert per second when using ClickhouseIO

I'm using Apache Beam Java SDK to process events and write them to the Clickhouse Database.
Luckily there is a ready-to-use ClickhouseIO.
ClickhouseIO accumulates elements and inserts them in batch, but because of the parallel nature of the pipeline it still results in a lot of inserts per second in my case. I'm frequently receiving "DB::Exception: Too many parts" or "DB::Exception: Too much simultaneous queries" in Clickhouse.
Clickhouse documentation recommends doing 1 insert per second.
Is there a way I can ensure this with ClickhouseIO?
Maybe some KV grouping before ClickhouseIO.Write or something?
It looks like you are not interpreting these errors quite correctly:
DB::Exception: Too many parts
This means that the insert affects more partitions than allowed (the default limit is 100, controlled by the max_partitions_per_insert_block parameter).
So either the number of affected partitions really is large, or the PARTITION BY key is defined too granularly.
How to fix it:
- try to group the INSERT batch so that it contains data for fewer than 100 partitions
- try to reduce the size of the insert block (if it is quite large) with withMaxInsertBlockSize
- increase the max_partitions_per_insert_block limit, either in the SQL query (for example INSERT .. SETTINGS max_partitions_per_insert_block=300; I think ClickhouseIO should have the ability to set custom options at the query level) or on the server side by modifying the user profile settings
DB::Exception: Too much simultaneous queries
This one is controlled by the max_concurrent_queries parameter.
How to fix it:
- reduce the number of concurrent queries on the Beam side
- increase the limit on the server side in the user profile or server settings (see https://github.com/ClickHouse/ClickHouse/issues/7765)
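
The first suggestion for "Too many parts" (grouping an insert batch so it touches a limited number of partitions) could look roughly like the sketch below. It is plain Python rather than Beam code, it does not use any ClickhouseIO API, and split_by_partition_limit and partition_of are hypothetical names; the default of 100 is the limit quoted above.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")


def split_by_partition_limit(
    rows: Iterable[Row],
    partition_of: Callable[[Row], Hashable],  # hypothetical: maps a row to its partition key
    max_partitions: int = 100,                # assumption: the limit quoted above
) -> List[List[Row]]:
    """Group rows into blocks that each touch at most max_partitions partitions."""
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")

    # Collect the rows of each partition together, keeping first-seen order.
    by_partition: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_partition[partition_of(row)].append(row)

    blocks: List[List[Row]] = []
    current: List[Row] = []
    partitions_in_current = 0
    for partition_rows in by_partition.values():
        # Start a new block once the current one has reached the limit.
        if partitions_in_current == max_partitions:
            blocks.append(current)
            current, partitions_in_current = [], 0
        current.extend(partition_rows)
        partitions_in_current += 1
    if current:
        blocks.append(current)
    return blocks
```

Each returned block would then be written as one insert, so that no single insert spans more partitions than the configured limit.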

NEsper memory usage of "output" keyword

I have many EPL statements that output over a period of time (1 to 24 hours). The following is my statement:
"SELECT MessageID, VName, count(VName) as count FROM DDIEvent(MajorType=4).std:groupwin(VName).win:time(3 hour).win:length(10) group by VName having count(VName) >= 10 output last every 3 hour"
Without the limit from the length window, my case would retain around 700K records in 3 hours.
In the statement above, my test case has only 100 different VName values. As I understand it, at most 1000 records (100 * 10 [length]) should be kept in memory at the same time. Am I right?
But the memory usage of my application keeps growing until the output is delivered to the listener.
The memory usage is almost the same as in the sample without the length window.
After the output is delivered to the listener, memory usage drops significantly.
Then another cycle begins, and memory grows slowly until 3 hours later.
I have already checked the documentation and did not find anything about memory usage related to the "output" keyword.
Does anyone know what the real root cause is, and how to avoid it? Or is it just a problem with my EPL?
Thank you~
If you remove "MessageId" from the select clause, the query becomes a regular aggregation query. You could select "last(MessageId)" instead. Because "MessageId" is in the select clause, the engine delivers a row for each event rather than a row for each aggregation group.

Rate Limit Twitter API

I'm a bit confused by the Twitter API guide on rate limiting here: https://dev.twitter.com/docs/rate-limiting/1.1
In the guide, Twitter mentions that the following fields are present in the response headers and can be used to determine how many API calls are allowed, how many are left, and when the limit will reset:
X-Rate-Limit-Limit: the rate limit ceiling for that given request
X-Rate-Limit-Remaining: the number of requests left for the 15 minute window
X-Rate-Limit-Reset: the remaining window before the rate limit resets in UTC epoch seconds
They also provide a rate limit status API to query:
https://dev.twitter.com/docs/api/1.1/get/application/rate_limit_status
I'm confused about which of these values I should use to see how many API calls are available to me before the limit is reached.
Both seem to return the same values. The difference is that /get/application/rate_limit_status is an API call that returns the rate limits for all resources, whereas the X-Rate-Limit-* headers are set only for the resource you just called.
Use /get/application/rate_limit_status to cache the number of API calls remaining and refresh it at periodic intervals, rather than making a call and then parsing the header info to check whether you've exceeded the rate limits.
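
For comparison, reading the three response headers quoted in the question could be sketched like this in Python. The header names come from the question; RateLimitStatus and parse_rate_limit_headers are hypothetical names, and the sketch assumes the headers have already been collected into a plain mapping of strings by whatever HTTP client is in use.

```python
import time
from typing import Mapping, NamedTuple, Optional


class RateLimitStatus(NamedTuple):
    limit: int        # X-Rate-Limit-Limit
    remaining: int    # X-Rate-Limit-Remaining
    reset_epoch: int  # X-Rate-Limit-Reset, in UTC epoch seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left until the window resets (never negative)."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_epoch - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Read the three rate limit headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-rate-limit-limit"]),
        remaining=int(lowered["x-rate-limit-remaining"]),
        reset_epoch=int(lowered["x-rate-limit-reset"]),
    )
```

A caller could check remaining before the next request and wait for seconds_until_reset() when it reaches zero. This only covers the resource that was just called, which is the limitation described above.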

BIDS SSRS Report query timeout issue while using Stored Procedure with timeout settings set appropriately

I've run into a timeout issue while executing a stored procedure for an SSRS report I created in Business Intelligence Development Studio (BIDS). My stored procedure is pretty large and on average takes nearly 4 minutes to execute in SQL Server Management Studio, so I've accommodated this by increasing the "Time out (in seconds)" setting to 600 seconds (10 minutes). I've also increased the Query Timeout AND Connection Timeout under Tools->Options->Business Intelligence Designers to 600 seconds.
Lastly, I've since created two other reports that use stored procedures with no problems (they are a lot smaller and take roughly 30 seconds to execute). For my dataset properties, I always use Query type: "Text", and call the stored procedure with the EXEC command.
Any ideas as to why my stored procedure of interest is still timing out?
Below is the error message I receive after clicking "Refresh Fields":
"Could not create a list of fields for the query. Verify that you can connect to the data source and that your query syntax is correct."
Details
"Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated."
Thank you for your time.
Check the Add Key="DatabaseQueryTimeout" Value="120" entry in your rsreportserver.config file. You may need to increase it there as well.
More info on that file:
http://msdn.microsoft.com/en-us/library/ms157273.aspx
Also, in addition to what the first commenter on your post stated, in my experience rendering to PDF can time out as well. Your large dataset is returned within a reasonable amount of time, but rendering the PDF can take forever. Try rendering to Excel. The results render rather quickly in BIDS; it is exporting them that can cause an issue.
