Explain the cost of Google Cloud Pub/Sub when used with Cloud Dataflow

The documentation on Pub/Sub pricing is very minimal. Can someone explain the costs for the scenario below?
Size of the data per event = 0.5 KB
Size of data per day = 1 TB
There is only one publisher app and there are two dataflow pipeline subscriptions.
The very rough estimate I can come up with is:
1x publishing
2x subscription (1x for each subscription)
2x acknowledgment (1x for each subscription ack)
The questions are:
Is the total data volume per month 150 TB (30 days * 1 TB * 5x)? That is $8,000 per month from the price calculator.
Does the 1 KB minimum size for the calculation apply even when acknowledging a message?
Dataflow handles subscribe/acknowledge in bundles of ParDos. But is each message in a bundle acknowledged separately?

One does not pay for acknowledgements in Google Cloud Pub/Sub, only for publishes, pulls, and pushes. With messages of size 0.5 KB, the amount you'd get charged would depend on the batching, because of the 1 KB minimum size per request. If all requests carried at least 1 KB, then the total cost for publishing and getting messages to two subscribers would be:
1 TB/day * 30 days * 3 = 92,160 GB/month
10 GB * $0 + 92,150 GB * $0.04/GB = $3,686/month
If some messages were not batched, then the price could go up because of the 1 KB minimum. The Google Cloud Pub/Sub client library batches published messages by default, so unless your messages are published very sporadically (too infrequently to be batched together), each request should reach the 1 KB minimum. With this amount of data, you are probably going to end up with batching on the subscribe side as well.
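As a quick sanity check, here is a minimal sketch that reproduces the arithmetic above. The $0.04/GB rate and the 10 GB free allowance are the figures used in this answer and may not reflect current Pub/Sub pricing:

public class PubSubCostEstimate {
    public static void main(String[] args) {
        double gbPerDay = 1024.0;   // 1 TB/day published
        int days = 30;
        int copies = 3;             // 1x publish + 2x subscriber pull
        double freeGb = 10.0;       // free tier figure assumed from the answer
        double ratePerGb = 0.04;    // $/GB figure assumed from the answer

        double totalGb = gbPerDay * days * copies;                 // 92,160 GB
        double cost = Math.max(0.0, totalGb - freeGb) * ratePerGb;
        System.out.printf("%,.0f GB/month -> $%,.2f/month%n", totalGb, cost);
        // Prints: 92,160 GB/month -> $3,686.00/month
    }
}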

Related

Need to send SMS to 11K people at once but don't know in advance how much $ to have in balance?

I have a python script which sends an SMS to all 11K people at once; they are from all sorts of countries.
I don't want to have money left over in my balance as I won't be doing that again.
The problem is that it's too difficult to estimate the cost, as the people are from 190 different countries.
I know there is Auto-recharge, which is enabled for me, but the issue is that the script sends all messages at once, so I do not think auto-recharge will work, as it would need to recharge within milliseconds.
Any solution?
I'd try a batching strategy, since most numbers can't process more than 10 SMS/second (1 SMS/second for NA numbers) anyway (anything more will just get queued), so 11k messages would take ~18 minutes regardless.
So split your pool into 5 batches of ~2k messages, and see how much the first 3 batches cost, which would inform how much money to load for batches 4 & 5.
NOTE: running out of money mid-batch would need to be adequately handled, too.
Sending costs will vary by [destination] country, but these rates are published, e.g. US - 0.75 cents/msg, India - 1.75 cents/msg, UK - 4 cents/msg, etc.
Then the problem becomes one of parsing out country codes from your target numbers if they're not already split (e.g. +18005551234 vs. +1 8005551234).
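A minimal sketch of the batch-splitting step described above (assumes the recipients are already E.164-formatted strings; loadRecipients and the actual send call are placeholders for your existing script or SMS provider API):

import java.util.ArrayList;
import java.util.List;

public class SmsBatcher {
    // Split the recipient pool into fixed-size batches so the observed cost of
    // the first few batches can inform how much balance to load for the rest.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> recipients = loadRecipients(); // ~11k numbers, defined elsewhere
        for (List<String> batch : batches(recipients, 2200)) { // 5 batches of ~2.2k
            // send the batch, then check the spend before starting the next one
        }
    }

    static List<String> loadRecipients() { return new ArrayList<>(); } // placeholder
}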

What is the best way to performance test an SQS consumer to find the max TPS that one host can handle?

I have an SQS consumer running in EventConsumerService that needs to handle up to 3K TPS successfully, sometimes upwards of 20K TPS (or 1.2 million messages per minute). For each message processed, I make a REST call to DataService's TCP VIP. I'm trying to perform a load test to find the max TPS that one host can handle in EventConsumerService without overstraining:
Request volume on dependencies, DynamoDB storage, etc
CPU utilization in both EventConsumerService and DataService
Network connections per host
IO stats due to overlogging
DLQ size must be minimal; currently I am seeing my DLQ grow to 500K messages due to 500 Service Unavailable exceptions thrown from DataService, so something must be wrong.
Approximate age of oldest message. I do not want a message sitting in the queue for over X minutes.
Fatals and latency of the REST call to DataService
Active threads
This is how I am performing the performance test:
I set up both my consumer and the other service on one host, the reason being I want to understand the load on both services per host.
I use a TPS generator to fill the SQS queue with a million messages
EventConsumerService is already running in production. Once messages started filling the SQS queue, I could immediately see requests being sent to DataService.
Here are the parameters I am tuning to find messagesPolledPerSecond:
messagesPolledPerSecond = (numberOfHosts * numberOfPollers * messageFetchSize) * (1000/(sleepTimeBetweenPollsPerMs+receiveMessageTimePerMs))
messagesInSurge / messagesPolledPerSecond = ageOfOldestMessageSLA
ageOfOldestMessage + settingsUpdatedLatency < latencySLA
The variables for SqsConsumer which I kept constant are:
numberOfHosts = 1
receiveMessageTimePerMs = 60 ms? It's out of my control
Max thread pool size: 300
The other factors are all fair game:
Number of pollers (default 1), I set to 150
Sleep time between polls (default 100 ms), I set to 0 ms
Sleep time when no messages (default 1000 ms), ???
message fetch size (default 1), I set to 10
However, with the above parameters, I am seeing a high number of messages being sent to the DLQ due to server errors, so clearly I have set the values too high. This testing methodology seems highly inefficient, and I am unable to find the optimal TPS that avoids both the tremendous number of messages sent to the DLQ and the high approximate age of the oldest message.
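Plugging the values above into the polling formula gives a sense of scale (a sketch of the arithmetic only; actual throughput also depends on processing time and downstream capacity):

public class SqsPollingEstimate {
    public static void main(String[] args) {
        int numberOfHosts = 1;
        int numberOfPollers = 150;
        int messageFetchSize = 10;
        double sleepTimeBetweenPollsPerMs = 0;
        double receiveMessageTimePerMs = 60; // out of the asker's control

        double messagesPolledPerSecond =
            (numberOfHosts * numberOfPollers * messageFetchSize)
                * (1000 / (sleepTimeBetweenPollsPerMs + receiveMessageTimePerMs));

        // (1 * 150 * 10) * (1000 / 60) = 25,000 messages/second, roughly an
        // order of magnitude above the 3K TPS target, which is consistent
        // with DataService returning 500s and the DLQ filling up.
        System.out.printf("~%,.0f messages polled per second%n", messagesPolledPerSecond);
    }
}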
Any guidance on how best I should test would be appreciated.

How can I calculate the appropriate amount of channel capacity?

I am looking for a solution because the sth-channel is full.
I am having trouble calculating the appropriate channel capacity.
The documentation has the following description:
In order to calculate the appropriate capacity, just have in consideration the following parameters:
・The amount of events to be put into the channel by the sources per unit time (let's say 1 minute).
・The amount of events to be gotten from the channel by the sinks per unit time.
・An estimation of the amount of events that could not be processed per unit time, and thus to be reinjected into the channel (see next section).
How can I check the values of these parameters?
You can't just check these parameters. They depend on your application.
What they are saying is that you should have a size which is large enough so the generator doesn't get stuck. This may not be possible in your application.
Say your generator receives one event per second, and it takes 2 seconds for a receiver to handle one event. Now let's assume you have 3 receivers. In 1 second, each receiver can process 0.5 events. With 3 receivers, together they are capable of processing 0.5 × 3 = 1.5 events per second, which is more than what you get as input. Your capacity can be 1 or 2; using 2 will greatly increase your chances of not getting blocked.
Let's review another example:
Your generator wants to push 1,000 events per second
Your receivers take 3 seconds to process one event
You would need 1,000 x 3 = 3,000 receivers (3,000 goroutines that can run at full speed in parallel...)
In this example, the total number of receivers is so large that you have to either break up your code to work on multiple computers or optimize your receiver code so it can process the data in an amount of time that makes sense. Say you have 50 processors: your receivers will still get 1,000 events per second, and with all 50 running at full speed, you need each receiver to do its work in:
50 / 1000 = 0.05 seconds
Now let's assume that in most cases your goroutines take 0.02 seconds, but once in a while one will take 1 second. That means your goroutines can get a little behind, so your capacity (so the generator doesn't get blocked) should be a little over 1,000. Again, it will depend on how many of the routines get slowed down, etc. In this last example, a run is 0.02 seconds, so a single receiver can process 50 events per second; if those 1,000 events arrive spread over the 1 second period, you may not even need all 50 goroutines and could get away with a smaller capacity. On the other hand, if you have big bursts where you may end up receiving many (say 500) events all at once, then more goroutines and a larger capacity are important so you don't get blocked.
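A minimal sketch of the first example, using Java's bounded ArrayBlockingQueue as a stand-in for a buffered Go channel (3 receivers at 2 seconds per event against 1 event per second of input, capacity 2):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChannelCapacityDemo {
    public static void main(String[] args) throws InterruptedException {
        // Capacity 2, as in the example: enough slack that the producer
        // rarely blocks when the receivers momentarily fall behind.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(2);

        // 3 receivers at ~2 seconds per event => combined throughput of
        // 1.5 events/second, which exceeds the 1 event/second input rate.
        for (int i = 0; i < 3; i++) {
            Thread receiver = new Thread(() -> {
                try {
                    while (true) {
                        int event = channel.take();
                        Thread.sleep(2000); // simulate 2 seconds of work
                        System.out.println("processed event " + event);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            receiver.setDaemon(true); // let the JVM exit once the producer is done
            receiver.start();
        }

        // Producer: one event per second; put() blocks only when the queue is full.
        for (int event = 0; event < 20; event++) {
            channel.put(event);
            Thread.sleep(1000);
        }
    }
}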

How to write bosun alerts which handle low traffic volumes

If you are writing a bosun alert which is based on a percentage error rate for the requests handled by your system, how do you write it in such a way that it handles periods of low traffic?
For example:
If I have an alert which looks back over the last 5 minutes and works out the error rate for requests,
$errorRate = $numberErr / $numberReq, and then triggers an alarm if the error rate exceeds a predefined threshold, crit = $errorRate > 0.05, this can work quite well so long as every 5-minute period has a sufficiently large number of requests ($numberReq).
If the number of requests in a 5-minute period was 10,000, then 501 errors would be required to trigger an alarm. However, if the number of requests in a 5-minute period was only 100, then just 6 errors would be required to trigger an alarm.
How can I write an alert which handles periods where the number of requests is so low that a small number of errors equates to a large error rate? I had considered a sliding window of time, rather than a fixed 5-minute period, where the window would increase in size until the number of requests was high enough to give some confidence in the alarm, e.g. increase the time period until the number of requests reaches 10,000.
I can't find a way to achieve this in bosun, and I don't want to commit to a larger period of time for my alerts because the traffic rate varies so much. A longer period during peak traffic could result in an actual error causing a much larger impact.
I generally pair any percentage-based and/or history-based alert with a static threshold.
For example: crit = $numberErr > 100 && $errorRate > 0.05. That way the percentage part doesn't matter unless the number of errors has also crossed some threshold, because otherwise the entire statement won't be true.

Google Cloud Dataflow latency for real-time processing

How low can we expect the latency from Dataflow to be in cases where we just do a simple transform on a high-traffic Google Dataflow cluster, and each “data point” is small?
We’re planning on using the Sessions windowing strategy with a gap duration of 3 seconds, if that’s relevant.
Is it realistic that the time from a data point gets into Dataflow until we have a result to output can be less than 2 seconds? Less than 1 second?
We ran benchmarks for our application flow using a test harness, and then also benchmarked the current out-of-the-box Google-supplied PubSub-to-PubSub template flow (see: https://cloud.google.com/dataflow/docs/templates/overview; although it is not listed there, you can create it from the Console).
Our test harness generated and sent millions of JSON-formatted messages of a few hundred bytes with timestamps and compared the latencies at either end.
Very simply:
Test Publisher -> PubSub -> Dataflow -> PubSub -> Test Subscriber.
For a single-instance publisher and subscriber we varied the message rates and experimented with the windowing and trigger strategies to see if we could improve the average latency, but typically weren't able to improve much beyond 1.7 seconds end-to-end for 1,500-2,000 messages per second (our typical workload).
We then removed Dataflow from the equation and just hooked up the publisher to the subscriber directly and saw latencies typically around 20-30 milliseconds for identical message rates.
Reverting to the standard PubSub-to-PubSub Dataflow template, we saw end-to-end latencies similar to those of our application data flow, around 1.5-1.7 seconds.
We sampled the timestamps at various points in the pipeline and wrote the values to a number of custom metrics. We saw that the average latency for adding a message to the initial PCollection from PubsubIO.read was around 380 ms, but the minimum was as low as 25 ms; we ignored the higher values because of the startup overheads. It seems that there is an overhead we were unable to influence.
The windowing strategy we tried looked like this:
Pipeline p = Pipeline.create(options);

/*
 * Attempt to read from the Pub/Sub topic.
 * feedName, inboundTopic, windowDelay, windowMinElementCount,
 * ParseFeedInputFn, validBetRecordTag and invalidBetRecordTag are
 * defined elsewhere in our application.
 */
PCollectionTuple feedInputResults =
    p.apply(feedName + ":read", PubsubIO.readStrings().fromTopic(inboundTopic))
        .apply(Window.<String>configure()
            .triggering(Repeatedly.forever(
                AfterWatermark.pastEndOfWindow()
                    .withEarlyFirings(AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.millis(windowDelay)))
                    // Fire on any late data
                    .withLateFirings(AfterPane.elementCountAtLeast(windowMinElementCount))))
            .discardingFiredPanes()
            // Beam requires allowed lateness to be set alongside a custom
            // trigger; the exact duration here is illustrative.
            .withAllowedLateness(Duration.standardSeconds(30)))
        .apply(feedName + ":parse", ParDo.of(new ParseFeedInputFn())
            .withOutputTags(validBetRecordTag,
                // Specify the additional output with tag invalidBetRecordTag, as a TupleTagList.
                TupleTagList.of(invalidBetRecordTag)));
