Understanding and mitigating Firebase Usage and Billing Fees

I have tried my best to understand Firebase's usage and billing, but I only got more confused. For reference, I use the following Firebase services in the project in question:
Realtime Database
Firestore
Authentication
Cloud Functions
Cloud Messaging
A few notes: I update the Realtime Database frequently. I also have a simple Firebase Functions method deployed (an iOS notification trigger), which fires whenever the Realtime Database changes.
That said, all the services are within the free tier thresholds, yet I still get billed. It is not much, but it happens on a daily basis.
I used to empty my buckets, which lowered the bill a little, but since a recent Firebase Functions update the buckets are cleared automatically.
Questions:
What actions correlate to these costs? (frequency of Firebase Functions triggers, Size of deployed methods, etc.)
How can I mitigate these costs?
Do these costs scale with more users?

Related

O365 Activity Management API - Performance for huge audit log streams

This question is regarding the O365 Activity Management API
We are using the API to retrieve audit log notifications from multiple channels (AzureAD, Outlook, SharePoint, etc.) for very large tenants, meaning that we need to retrieve potentially millions of notifications over a relatively short timespan.
O365 gathers audit notifications into a series of "blobs", each of which contains a number of individual notifications (JSON messages). To my understanding, which comes partly from correspondence with the API's dev team and partly from reading the docs, these blobs should contain a "considerable" number of notifications, so that they function as a sort of batch mechanism for the actual web requests.
In our approach, we request blob URLs for an interval of an hour, and then make a request for each individual blob.
However, we have tested with a number of different tenants and different PublisherIdentifiers, and we only seem to get around 2.5 messages per blob on average, no matter the total number of notifications "waiting" to be fetched.
This becomes a major issue for the larger tenants, as it puts a strain on the SIEM solution running the fetcher logic (a Python service) due to the number of requests needed, and it also gives us throttling issues with the API itself.
In effect, we simply cannot fetch the audit notifications fast enough to keep up - within the retention period. Had the blobs contained more notifications per blob, we would be fine - as the total amount of data (in MBs) is not that large.
A "funny" thing is, that if we use the visual query tool within the Admin Center of the tenant, it searches and retrieves the notifications very fast.
My questions
Has anyone had any experience with this issue, or perhaps had a better "batch performance"?
Does anyone have any ideas as to what we could try to get a better performance?
As mentioned we have been in direct contact with the dev team and the program manager in Redmond. They have been very helpful with other issues we had, but they referred us to support for this specific issue - who in turn referred us to the forums / community. We currently do not have access to premium support...
Example request for content blobs for an hour
https://manage.office.com/api/v1.0/{tenantid}/activity/feed/subscriptions/content?contentType=Audit.Exchange&PublisherIdentifier={pub.id}&startTime=2017-12-03T10:31:24&endTime=2017-12-03T11:31:24
When retrieving the individual blobs, we just use the URLs given to us by the above request.
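For reference, the hourly request pattern described above can be sketched roughly as follows. This is only an illustration, not our production fetcher: the helper names `hourly_windows` and `content_list_url` are made up for this example, and the URL layout is copied from the example request above.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BASE = "https://manage.office.com/api/v1.0"


def hourly_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end) in 1-hour steps."""
    step = timedelta(hours=1)
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        yield cursor, min(cursor + step, end)
        cursor += step


def content_list_url(tenant_id, content_type, publisher_id, start, end):
    """Build the content-list URL for one window (hypothetical helper)."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    query = urlencode(
        {
            "contentType": content_type,
            "PublisherIdentifier": publisher_id,
            "startTime": start.strftime(fmt),
            "endTime": end.strftime(fmt),
        },
        safe=":",  # keep the colons in the timestamps readable
    )
    return f"{BASE}/{tenant_id}/activity/feed/subscriptions/content?{query}"
```

Each URL returned by `content_list_url` would then be requested once, and the blob URLs in its response fetched individually.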

You can avoid throttling by appending "?PublisherIdentifier={Tenant ID}" to the contentUri in the retrieve content get request.
How can I add a PublisherId to a GetBlob call to the Office365 Rest API to avoid throttling?
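A minimal sketch of that suggestion in Python, assuming the contentUri comes back as a plain URL string. The function name `with_publisher_id` is hypothetical; it simply adds (or replaces) the `PublisherIdentifier` query parameter without touching the rest of the URI.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_publisher_id(content_uri, publisher_id):
    """Return content_uri with a PublisherIdentifier query parameter set."""
    parts = urlsplit(content_uri)
    # Keep any existing query parameters, dropping a stale PublisherIdentifier.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "PublisherIdentifier"
    ]
    query.append(("PublisherIdentifier", publisher_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Parsing the URI rather than concatenating strings avoids ending up with two `?` characters if the contentUri ever carries a query string of its own.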

I have been working with the Office 365 Management Activity API for the past 6 months, and I faced this kind of issue too. If you try to pull all the audit log content from your Office 365 tenant at a fixed interval, you will run into throttling. For your information, it is not possible to avoid throttling (resource overuse) for large, active tenants.
To overcome these issues, you can create and deploy a web application in the cloud and register it as a webhook with the Office 365 Management Activity API.
Whenever the Office 365 tenant wraps activity logs into an Azure blob, it immediately sends the blob details to your registered web application. You can refer to this link to learn how to enable a webhook for a web application. Once you have received the blob details from the Office 365 tenant, extract the logs from the Azure blob and save them in your own blob storage or in a SQL / NoSQL database.

I had a similar issue. Pulling down logs would take longer than the interval of time allotted to the Python script and the script would start overlapping itself or would fall behind when trying to pull logs for a SIEM implementation.
https://github.com/IntegralDefense/o365_log_fetch
I'm a little late to this post, but by using asyncio in Python 3.5+ together with aiohttp, you can make concurrent calls to the O365 Management API and pull down the logs much faster. I performed some testing and retrieved logs for a 13-hour window (Audit.Exchange, Audit.AzureActiveDirectory, and Audit.Sharepoint). It took around 20 minutes using 'requests' and making the API calls sequentially. After implementing asyncio/aiohttp, the same time frame took just under 2 minutes (500,000+ individual events were pulled from the data located at several thousand content blobs/locations).
I've been running the script in 10 minute intervals and usually the script completes in < 10 seconds.
The script I pasted above also supports pagination. So if you get a content list that was truncated in the response from Microsoft, the script will keep reaching out and pulling down more content locations.
At this time, the documentation isn't up to speed, but hopefully that will be caught up soon.
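To give a rough idea of the concurrency pattern (this is a simplified sketch, not the code from the linked repository): `fetch_blob` below is a hypothetical stand-in that only simulates a network call, where a real fetcher would issue an aiohttp GET. The semaphore caps the number of requests in flight, which may help with throttling.

```python
import asyncio


async def fetch_blob(uri):
    """Hypothetical stand-in for an HTTP GET of one content blob."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"source": uri, "event": 1}, {"source": uri, "event": 2}]


async def fetch_all(uris, max_concurrency=20):
    """Fetch all blobs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(uri):
        async with semaphore:
            return await fetch_blob(uri)

    # gather() returns results in the same order as the input URIs.
    batches = await asyncio.gather(*(bounded(uri) for uri in uris))
    # Flatten the per-blob lists into one list of events.
    return [event for batch in batches for event in batch]
```

It would be run with something like `asyncio.run(fetch_all(uris))` (Python 3.7+); the value of `max_concurrency` is a guess and would need tuning against the API's limits.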

Should I make a separate app to send push notifications to 40-50k users in a RoR app, or use background jobs?

I have a Rails application that is the backend of a popular iOS application with a user base of 200k users, who need to be notified from time to time.
Every day, 40-50k users will be notified using push notifications. These will be both real-time and scheduled. For example, a new user who signs up will be notified within a few seconds, while scheduled notifications will run at 10 pm daily for 10k-30k users, or sometimes up to 100k.
I will also be doing business reporting to generate lists of users fulfilling certain criteria, which requires running MySQL queries that can take up to 1-2 minutes.
My concern is whether I should have a separate application with a separate mirror DB to send push notifications to these users, so that my iOS users don't feel lag while using the app when push notifications or business reporting queries are triggered.
Or should I use background jobs such as Rails Active Job, Sidekiq or Sucker Punch to send the push notifications and run the business reporting queries?
Are background jobs in Rails powerful enough to handle this load without app users noticing any lag?
My application stack is:
Rails: 4.1.6
Ruby: 2.2
DB: MySQL
PaaS: AWS Elastic Beanstalk
iOS push gem: Houston

In my opinion, there are several factors that would affect the decision.
1. Does your service need to keep many persistent connections?
If your answer is YES, then use another language which has better asynchronous IO (like Node.js) to implement your push service.
If your answer is NO, which means you only send requests to third-party services (like APNS), then consider the next factor.
2. Do you have to reuse your domain model in your push service?
If your answer is YES, then stick to Active Job + Sidekiq.
If your answer is NO, which means you only use some fields (like id, name) of some tables (like users), then consider the next factor.
3. Does your server have limited memory?
A Rails process often consumes several hundred MB of memory, and Sidekiq requires a separate Rails process which can't be preforked (which means it does not share memory with your Rails app).
So if your answer is YES, then consider creating a separate lightweight push service.
As for the mirror database, if I had to run heavy queries before pushing, I would definitely use one.

Is there a way to schedule edits to firebase database?

I am trying to create automated edits to the database in firebase. Is there a way to do that on the server-side? I am new to iOS development and swift so any help would be greatly appreciated.
Also, I've tried Zapier but the service is not specific enough for my needs.

Yes - Firebase has quite a flexible set of options for server-side updates, and it is simple enough to schedule a cron job that connects to Firebase and performs scheduled updates or edits.
The most generic approach is to use the REST API to perform your updates although there are specific libraries to support Node and other platforms.
It is worth being aware of the recent major upgrade to version 3 of Firebase which introduced quite a few significant changes - it can be easy to confuse the older examples floating around with the new API so be aware of the differences as you put together your first proof of concept examples.
I assume that you are looking to run on your own server although another alternative is to use a container hosting environment ( Google Apps etc ).
If you have your own server and are looking to integrate I would suggest starting with:
https://firebase.google.com/docs/server/setup#prerequisites
Then perhaps a quick look at:
https://firebase.googleblog.com/docs/web/quickstart.html
and
https://www.firebase.com/docs/rest/
If you are just getting started, I would suggest that your first task be to authenticate, then retrieve and update a Firebase record.
You can configure server auth keys through the Firebase console and use these as part of your authentication process.
If you are unfamiliar with JWT then it is worth spending a little time getting up to speed on this and working through the examples at https://www.firebase.com/docs/rest/guide/user-auth.html
Further to your comment:
So the first approach that comes to mind is to run a scheduled cron job which connects using the REST API, queries the existing data to identify the records that require an update, and removes or modifies them.
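As a very rough sketch of what such a cron-driven script might look like in Python: everything here rests on assumptions. The `timestamp` field (epoch seconds), the function names, and the use of the `auth` query parameter are all illustrative, and the deletion relies on my understanding that writing `null` to a child in a PATCH removes it. Check all of this against the current REST documentation before relying on it.

```python
import json
import time
import urllib.request


def stale_keys(records, max_age_seconds, now=None):
    """Return keys of records whose 'timestamp' (epoch seconds) is too old."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    # Records without a timestamp are treated as fresh and left alone.
    return [key for key, rec in records.items() if rec.get("timestamp", now) < cutoff]


def deletion_patch(keys):
    """Build a PATCH body that sets each stale child to null (assumed to delete it)."""
    return {key: None for key in keys}


def apply_patch(base_url, path, auth_token, patch):
    """Send the PATCH over the REST API. Needs real credentials; not run here."""
    url = f"{base_url}/{path}.json?auth={auth_token}"
    request = urllib.request.Request(
        url,
        data=json.dumps(patch).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The cron entry would then fetch the records, call `stale_keys` and `deletion_patch`, and pass the result to `apply_patch`. Keeping the selection logic separate from the network call makes it easy to test without touching the database.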
Giving it a little more thought, you could extend this approach. Instead of running the job at a period shorter than the minimum anticipated deletion time, you could run the scheduler at some longer period just to clean up, and filter the results sent to the client so that stale data is not included. This approach is discussed a little at Firebase chat - removing old messages
Getting the right solution to your particular scenario will depend a lot on how well you structure your data which can be counter-intuitive; particularly for users who have come from an RDBMS background.
There may be an inclination to keep the data slim and unpolluted by old, irrelevant data. However, Firebase is quite good at managing large, minimally structured data, and the overhead of this bloat may not be as bad as you might think.
If the filtering itself isn't sufficient and you don't have a server on which you can run a cron cleanup process, then you can implement a Firebase worker process in Node or similar and run it on a container service such as Heroku or Google Apps. See Firebase push notifications - node worker for some ideas on how to approach this.
When asked, Google said that they don't advise on where best to host worker services, but they did mention both Google App Engine and Heroku.
Another approach, if you don't want to implement and host a watcher/worker process, is simply to include some code in the client that periodically checks for and removes stale data.
The Firebase Queue is very cool but may be overkill for simply expiring stale data.

Structuring backend queries

So this is more of a methodology question than a coding question. I want to ask this before I actually start coding in order to choose the best route. I have a messaging app. When the app launches, I query in the background all the messages from the backend where current_user_id is equal to recipient_id. Now that I have all of the messages the user needs to see, I store them locally in a SQLite database.
Great, but what about when the user gets new messages? How can I structure a query to receive those without having to query the entire table again? Also, how do I set this up as a continual process? Is the phone always requesting update information from the backend while it's in the foreground?
Thanks. I really appreciate your help. I'm currently using iOS and, as stated, SQLite. Also, my backend is Node.js on AWS.

It looks like your goal is to ultimately synchronize data between two sources over a network with a constraint that the client is updated in a reasonable amount of time. You have a design choice to make between a push vs pull architecture.
Push architectures have the servers push data to clients when an event occurs.
Pull architectures have the device periodically poll the server for changes. This can be achieved through timed events.
There are hybrid approaches too.
Each has its advantages and disadvantages: some require constant polling, while others require constant connection-based protocols, which present more scaling challenges.
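For the pull side, one common way to avoid re-querying the whole table is to ask only for rows newer than the newest one already cached. Here is a language-neutral sketch in Python with sqlite3. It assumes message ids increase monotonically on the server, and `fetch_since` is a hypothetical stand-in for a backend endpoint (something like "messages after id N"); neither is from the question.

```python
import sqlite3


def open_cache():
    """Create an in-memory cache table (a real app would use a file on disk)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient_id INTEGER, body TEXT)"
    )
    return db


def last_seen_id(db):
    """Highest message id already cached locally, or 0 if the cache is empty."""
    return db.execute("SELECT COALESCE(MAX(id), 0) FROM messages").fetchone()[0]


def fetch_since(server_messages, recipient_id, after_id):
    """Hypothetical stand-in for the backend: messages for a user newer than after_id."""
    return [
        m for m in server_messages
        if m["recipient_id"] == recipient_id and m["id"] > after_id
    ]


def sync(db, server_messages, recipient_id):
    """Pull only the messages newer than the local high-water mark."""
    new = fetch_since(server_messages, recipient_id, last_seen_id(db))
    db.executemany(
        "INSERT OR IGNORE INTO messages (id, recipient_id, body) "
        "VALUES (:id, :recipient_id, :body)",
        new,
    )
    db.commit()
    return len(new)
```

Calling `sync` on a timer gives the pull architecture; in a push or hybrid design the same function could be called when a notification arrives, so the phone is not polling constantly.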

Amazon DynamoDB Provisioned Throughput (iOS SDK)

I am new to DynamoDB, and I am very confused about provisioned throughput. I am creating an iPhone game in which users can chat within the game. I have a Chat table, which contains GameID, UserID and Message. How do I find the size of an item in order to calculate throughput? The size of the item depends entirely on the Message, right? How do I calculate the size of an item?
Amazon says that we can modify the throughput either by using the UpdateTable API or manually from the console. If I want to change it from code, how will I know that the provisioned throughput has been exceeded for a certain table? How do I check that from code?
I am also confused about CloudWatch. How should I understand it?
Could anyone please help me? Please don't point me to the documentation.
Thanks.

I will do my best to help with the confusion.
DynamoDB is a key:value database.
CloudWatch is Amazon's monitoring tool for its products.
Provisioned throughput is roughly the number of KB of items you plan to read/write per second.
Whenever you exceed your provisioned throughput,
DynamoDB answers with ProvisionedThroughputExceededException
DynamoDB notifies CloudWatch
What CloudWatch does is basically record and aggregate data points. For most applications, it will only keep track of data aggregated over consecutive 5-minute periods.
You can then access these data for "manual" monitoring or set up "alarms".
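To make the item-size part of the question more concrete, here is a back-of-the-envelope sketch. It assumes every attribute is a string, that an item's size is the UTF-8 length of each attribute name plus its value, and that the unit sizes are those in the current AWS documentation as I understand it (1 KB per write unit, 4 KB per strongly consistent read unit, half the cost for eventually consistent reads). The function names are made up for this example.

```python
import math


def item_size_bytes(item):
    """Approximate size of an all-string item: UTF-8 bytes of names plus values."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def write_units(size_bytes, writes_per_second):
    """Write capacity units, assuming 1 unit covers one write of up to 1 KB."""
    return max(1, math.ceil(size_bytes / 1024)) * writes_per_second


def read_units(size_bytes, reads_per_second, strongly_consistent=True):
    """Read capacity units, assuming 1 unit covers one strong read of up to 4 KB."""
    units = max(1, math.ceil(size_bytes / 4096)) * reads_per_second
    # Eventually consistent reads are assumed to cost half as much.
    return units if strongly_consistent else math.ceil(units / 2)
```

For a Chat item such as `{"GameID": "g1", "UserID": "u1", "Message": "hello"}` the size is dominated by the Message, as the question suspects, so short chat messages would typically fit in a single unit per operation.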
There was a really interesting question on SO a couple of weeks earlier on DynamoDB auto-scaling using alarms. You might be interested in reading it: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/ErrorHandling.html
Knowing this, you can start building your application.
As with every AWS service, one needs credentials to access DynamoDB. Even though they can be restricted to a specific table or set of actions, it is very dangerous to bundle them in an application. Would you give MySQL or MongoDB credentials, even read-only ones, to untrusted people?
May I suggest that you build your application to rely on a server of your own? Since this server is trusted and built by you, you could safely perform any authorization checks there and grant it full access to your table.
I hope this helps. Feel free to ask for more details.
