Google Cloud Functions triggers and Database quota consumption - firebase-realtime-database

I would like to know if the Firestore/Realtime Database quota is consumed when a Google Cloud Function is triggered.
For example:
Firestore:
A client pushes a write operation to a document. At the same time, a Cloud Function has an onWrite trigger registered for that document.
When onWrite is triggered, we receive a DocumentSnapshot as a parameter.
Does it mean that a read operation just happened?
What about the other triggers (also considering Realtime Database)?

Cloud Functions trigger invocations do not count as reads or writes against the product that generated them. However, if you write code inside the function that reads from or writes to the database, those operations are billed normally.
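For example, in a Node.js Cloud Function (a minimal sketch; the items collection, field names, and query are made up for illustration), the snapshot delivered with the trigger costs nothing extra, while the query performed inside the handler is a normal billed read:

```js
const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();

// Fires when any document under `items` is written. The Change<DocumentSnapshot>
// handed to the function arrives with the trigger and does NOT count as a read.
exports.onItemWrite = functions.firestore
  .document("items/{itemId}")
  .onWrite(async (change, context) => {
    const after = change.after.data(); // free: delivered with the event

    // This query, however, is an ordinary Firestore read and is billed as such.
    const related = await admin
      .firestore()
      .collection("items")
      .where("parentId", "==", context.params.itemId)
      .get();

    console.log(
      `Item ${context.params.itemId} changed; ${related.size} related documents read (billed).`
    );
  });
```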

Related

Cleanup on termination of durable function

Is it possible to somehow get notified that the durable function has been, or is about to be, terminated, so that it is possible to initiate cleanup of the already finished activity functions, for example? In my example we're sending multiple requests to a subsystem and need to revoke or refund the orders in case the durable function is terminated.
I don't believe there is any way to subscribe to termination events in Durable Functions, as the Durable Task Framework handles this before user code is ever invoked.
One option, instead of using the explicit terminate API built into Durable Functions, is to listen for a custom CustomTerminate event within your orchestration, using a Task.WhenAny() approach whenever you schedule an activity or sub-orchestration. Then, if you ever receive this CustomTerminate event instead of the activity or sub-orchestration response, you can handle the cleanup manually at that point.
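As a rough sketch of that idea using the JavaScript durable-functions SDK (the C# version has the same shape with context.WaitForExternalEvent plus Task.WhenAny; the PlaceOrder/RefundOrder activities and the CustomTerminate event name are placeholders):

```js
const df = require("durable-functions");

// Race each activity against a custom "CustomTerminate" event that the caller
// raises instead of using the built-in terminate API.
module.exports = df.orchestrator(function* (context) {
  const order = context.df.getInput();

  const placeOrder = context.df.callActivity("PlaceOrder", order);        // placeholder activity
  const terminated = context.df.waitForExternalEvent("CustomTerminate");  // custom event

  const winner = yield context.df.Task.any([placeOrder, terminated]);

  if (winner === terminated) {
    // We were asked to stop: compensate for work that already completed.
    yield context.df.callActivity("RefundOrder", order);                  // placeholder cleanup
    return "terminated-with-cleanup";
  }

  return placeOrder.result;
});
```

The caller would raise the CustomTerminate event (via raiseEvent or its C# equivalent) rather than calling terminate, so the orchestration itself stays in control of the compensation work.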

How can I implement throttling by the message value using MassTransit? (backend is SNS/SQS but flexible)

I'm interested in using MassTransit as the event bus to help me bust a cache, but I'm not sure how to properly throttle the service.
The situation
I have a .NET service that has a refreshCache(itemId) API which recomputes the cache for itemId. I want to call this whenever code in my organization modifies any data related to itemId.
However, due to legacy code, I may have 10 events for a given itemId emitted within the same second. Since the refreshCache(itemId) call is expensive, I'd prefer to only call it once every second or so per itemId.
For instance, imagine that I have 10 events emitted for item1 and then 1 event emitted for item2. I'd like refreshCache to be called twice, once with item1 and once with item2.
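In pseudo-code, the behaviour I'm after is roughly this per-item debounce (refreshCache is the expensive call described above; the exact mechanism doesn't matter):

```js
// Collapse a burst of events per itemId into a single refresh per window.
const pending = new Map();

function scheduleRefresh(itemId, windowMs = 1000) {
  if (pending.has(itemId)) return; // a refresh for this item is already queued

  pending.set(itemId, setTimeout(async () => {
    pending.delete(itemId);
    await refreshCache(itemId); // the expensive recompute
  }, windowMs));
}

// 10 events for "item1" and 1 event for "item2" arriving within the window
// result in exactly two refreshCache calls.
```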
Trouble with MassTransit
I could send event messages that essentially just contain the itemId over SNS/SQS, and the .NET service could use a MassTransit consumer to listen to that SQS queue and call refreshCache for each message. Ideally, I could also throttle either in SNS/SQS or in MassTransit.
I've read these docs: https://masstransit-project.com/advanced/middleware/rate-limiter.html and have tried to find the middleware in the code but wasn't able to locate it.
They seem to suggest that the rate-limiting just delays the delivery of messages, which means that my refreshCache would get called 10 times with item1 before getting called with item2. Instead, I'd prefer it get called once per item, ideally both immediately.
Similarly, it seems as if SNS and SQS can either rate-limit in-order delivery or throttle based on the queue but not based on the contents of that queue. It would not be feasible for me to have separate queues per itemId, as there will be 100,000+ distinct itemIds.
The Ask
Is what I'm trying to do possible in MassTransit? If not, is it possible via SQS? I'm also able to be creative with using RabbitMQ or adding in Lambdas, but would prefer to keep it simple.

Azure Durable Function getting slower and slower over time

My Azure Durable Function (Runtime V3) processes an average of 3M events per day. After it has been running for two or three weeks, it gets slower and slower. When I delete the two table storages (History & Instances) used by the Durable Functions framework, performance recovers and it works as expected. I host my function app on the Consumption plan. I also use Durable Entities inside my function app, and in my code I use sub-orchestrators for the fan-out mechanism.
Is this problem expected under a heavy workload? Do I need to clear those table storages from time to time, or do I need to delete the state of completed entities inside my Durable Entity function?
Someone, please help me
Yes, you should perform periodic clean-ups yourself by calling the PurgeInstanceHistoryAsync method. See a similar post on how to do this: https://stackoverflow.com/a/60894392
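As a sketch of what that clean-up could look like (using the JavaScript durable-functions client, whose purgeInstanceHistoryBy method is the counterpart of PurgeInstanceHistoryAsync; the 30-day retention window is an arbitrary example), a timer-triggered function can purge finished instances on a schedule:

```js
const df = require("durable-functions");

// Timer-triggered clean-up. Requires a durableClient (orchestrationClient)
// input binding in function.json so df.getClient can be used.
module.exports = async function (context) {
  const client = df.getClient(context);

  const createdTimeFrom = new Date(0);                                    // from the beginning
  const createdTimeTo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);  // up to 30 days ago

  // Only purge orchestrations that have finished; running instances are untouched.
  const result = await client.purgeInstanceHistoryBy(createdTimeFrom, createdTimeTo, [
    df.OrchestrationRuntimeStatus.Completed,
  ]);

  context.log(`Purged history for ${result.instancesDeleted} orchestration instance(s).`);
};
```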
Also review any loops or Monitor patterns that you may have in your code.
Any looping logic (like foreach, for, or while loops) will replay from the initial startup state. Whilst the Durable Functions replay architecture is very efficient at doing this, the code we write may not be optimised for repetitive iterations.
The Durable Monitor pattern is almost an anti-pattern. The concept is OK, but it is easily misinterpreted and open to abuse. It is designed for a low-frequency loop that polls an endpoint either for a set number of iterations, or until a finite time, or of course until the state of the endpoint being monitored has changed. That state change is the trigger to perform the rest of the operation.
It is NOT an example of how to use general or high-frequency looping structures in Durable Functions.
It is NOT an example of how to implement a traditional HTTP endpoint reporting monitor in an infinite-loop (while(true)) style, perhaps to record changes into a data store over time.
If your durable function logic has an iterator that may involve many iterations, consider migrating the iteration step to a sub-orchestration that uses the Eternal Orchestration pattern.
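A minimal sketch of that shape with the JavaScript durable-functions SDK (the ProcessNextBatch activity and the cursor state are placeholders): each execution handles one iteration and then calls continueAsNew, so the replay history is reset instead of growing with every loop turn.

```js
const df = require("durable-functions");

// Eternal (sub-)orchestration: one iteration per execution, then restart.
module.exports = df.orchestrator(function* (context) {
  const state = context.df.getInput() || { cursor: 0 };

  // Placeholder activity: processes one slice of work and returns updated state.
  const next = yield context.df.callActivity("ProcessNextBatch", state);

  if (!next.done) {
    // Restart with the new state; the replay history is truncated here rather
    // than accumulating across iterations.
    context.df.continueAsNew(next);
    return;
  }

  return next;
});
```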

Twilio Functions - Memory and Database

I am using a Twilio Function which has an array of phone numbers.
I would like to be able to store these phone numbers in a 3rd party cloud database which we can edit with our CRM.
Then I'd write another Twilio function that will check the database and update the array in Twilio Functions with the latest data.
Alternatively if there is any other way that the first Twilio function could get the latest data from the database and store it in memory that would be great. I'd like to avoid checking the database for every request if possible in order to make the function as fast as possible.
Any help greatly appreciated!
Twilio developer evangelist here.
Currently, as Functions is in public beta, there is no API for Functions. So you cannot update Functions or their environment variables programmatically yet.
Also, due to beta limitations, you are unable to install Node modules, such as database drivers, so accessing remote data stores is currently not straightforward.
You can, from within a Function, make HTTP requests though. So, if your CRM could return a list of numbers in response to an HTTP request, then you could fetch them that way.
In terms of storing data in memory for Functions, this is not to be relied upon. Functions are short-lived processes, so the memory is volatile.
In your case, since you use a list of numbers, you could load the list in the first call to your Function and then pass those numbers through the URL for the remaining calls, so that you only need to make a request the first time.
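Putting the HTTP idea together, a Function along these lines could pull the numbers from your CRM using only Node's built-in https module (no npm install needed). The CRM URL and the JSON response shape are assumptions, and dialling each number is just an example of using the result:

```js
const https = require("https");

exports.handler = function (context, event, callback) {
  // Hypothetical CRM endpoint returning a JSON array of E.164 phone numbers.
  https
    .get("https://example-crm.com/api/phone-numbers", (res) => {
      let body = "";
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => {
        const numbers = JSON.parse(body); // e.g. ["+14155550100", "+14155550101"]

        const twiml = new Twilio.twiml.VoiceResponse();
        const dial = twiml.dial();
        numbers.forEach((number) => dial.number(number));

        callback(null, twiml);
      });
    })
    .on("error", (err) => callback(err));
};
```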
Let me know if that helps at all.

Real time stream processing for IOT through Google Cloud Platform

I am interested in real-time stream processing for IoT through GCP Pub/Sub and Cloud Dataflow, with analytics performed through BigQuery. I am seeking help on how to implement this.
Here is the architecture for IoT real-time stream processing:
I'm assuming you mean that you want to stream some sort of data from outside the Google Cloud Platform into BigQuery.
Unless you're transforming the data somehow, I don't think that Dataflow is necessary.
Note that BigQuery has its own streaming API, so you don't necessarily have to use Pub/Sub to get data into BigQuery.
In any case, these are the steps you should generally follow.
Method 1
Issue a service account (and download the .json file from IAM on Google Console)
Write your application to get the data you want to stream in
Inside that application, use the service account to stream directly into a BQ dataset and table (see the sketch after this list)
Analyse the data on the BigQuery console (https://bigquery.cloud.google.com)
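A minimal Node.js sketch of step 3, using the @google-cloud/bigquery client with the service-account key file (the project, dataset, table, and field names are placeholders):

```js
const { BigQuery } = require("@google-cloud/bigquery");

// Authenticate with the service-account key downloaded from IAM (step 1).
const bigquery = new BigQuery({
  projectId: "my-iot-project",            // placeholder
  keyFilename: "./service-account.json",  // the downloaded .json key
});

async function streamReading(deviceId, temperature) {
  // Streaming insert straight into the dataset/table; no Pub/Sub or Dataflow involved.
  await bigquery
    .dataset("iot_data")      // placeholder dataset
    .table("raw_readings")    // placeholder table
    .insert([{ device_id: deviceId, temperature, ts: new Date().toISOString() }]);
}
```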
Method 2
Set up a Pub/Sub topic
Write an application that collects the information you want to stream in
Push to Pub/Sub (see the sketch after this list)
Configure Dataflow to pull from Pub/Sub, transform the data however you need to, and push to BigQuery
Analyse the data on the BigQuery console as above.
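The producer side of step 3 might look like this with a recent @google-cloud/pubsub client (the project and topic names are placeholders); Dataflow then subscribes to the topic, transforms the data, and writes the result to BigQuery:

```js
const { PubSub } = require("@google-cloud/pubsub");

const pubsub = new PubSub({ projectId: "my-iot-project" }); // placeholder project

async function publishReading(deviceId, temperature) {
  // Push the raw reading onto the topic; Dataflow pulls from the matching
  // subscription, applies any transforms, and writes the results to BigQuery.
  await pubsub
    .topic("iot-readings")    // placeholder topic
    .publishMessage({ json: { device_id: deviceId, temperature, ts: Date.now() } });
}
```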
Raw Data
If you just want to put very raw data (no processing) into BQ, then I'd suggest using the first method.
Semi Processed / Processed Data
If you actually want to transform the data somehow, then I'd use the second method as it allows you to massage the data first.
Try to always use Method 1
However, I'd usually recommend using the first method, even if you want to transform the data somehow.
That way, you have a data_dump table (raw data) in your dataset and you can still use Dataflow after that to transform the data and put it back into an aggregated table.
This gives you maximum flexibility because it allows you to create potentially n transformed datasets from the single data_dump table in BQ.
