O365 Activity Management API - Performance for huge audit log streams - office365api

This question is regarding the O365 Activity Management API
We are using the API to retrieve audit log notifications from multiple channels (AzureAD, Outlook, SharePoint, etc.) for very large tenants, meaning that we need to retrieve potentially millions of notifications over a relatively short timespan.
O365 gathers audit notifications into a series of "blobs" which then contain a number of individual notifications (JSON messages). To my understanding, which in part comes from correspondence with the API's dev. team and from reading the docs, these blobs should contain a "considerable" number notifications as to function as a sort of batch approach when doing the actual web requests.
In our approach, we request blobs URLs for an interval of an hour, and then do a request for the individual blobs.
However, we have tested with a number of different tenants and different PublisherIdentifiers, but only seem to get around 2.5 messages per blob on average, no matter the total number of notifications "waiting" to be fetched.
This becomes a major issue for the larger tenants as is puts a strain on the SIEM solution running the fetcher logic (a Python service), due to the number of needed requests, and it also gives us throttling issues with the API itself.
In effect, we simply cannot fetch the audit notifications fast enough to keep up - within the retention period. Had the blobs contained more notifications per blob, we would be fine - as the total amount of data (in MBs) is not that large.
A "funny" thing is, that if we use the visual query tool within the Admin Center of the tenant, it searches and retrieves the notifications very fast.
My questions
Has anyone had any experience with this issue, or perhaps had a better "batch performance"?
Does anyone have any ideas as to what we could try to get a better performance?
As mentioned we have been in direct contact with the dev team and the program manager in Redmond. They have been very helpful with other issues we had, but they referred us to support for this specific issue - who in turn referred us to the forums / community. We currently do not have access to premium support...
Example request for content blobs for an hour
https://manage.office.com/api/v1.0/{tenantid}/activity/feed/subscriptions/content?contentType=Audit.Exchange&PublisherIdentifier={pub.id}&startTime=2017-12-03T10:31:24&endTime=2017-12-03T11:31:24
When retrieving the individual blobs, we just use the URLs given to us by the above request.

You can avoid throttling by appending "?PublisherIdentifier={Tenant ID}" to the contentUri in the retrieve content get request.
How can I add a PublisherId to a GetBlob call to the Office365 Rest API to avoid throttling?

I have been working with Office 365 Management Activity API for the past 6 months. I too faced this kind of issue before. This issue will occur if you are trying to get all the audit log contents from your Office 365 tenant at a particular interval, it will result in throttling issue. For your information, it is not possible to avoid throttling issues (resource over usage) for large active tenants.
To overcome these issues, you can create and deploy a web application in cloud and register with Office 365 Management Activity API webhook.
Whenever the office 365 tenant wrap the activity logs into an Azure Blob, it will immediately give the blob details to your registered Web Application. You can refer this link to know about how to enable webhook for a Web Application. Once you received the blob detail from Office 365 tenant, extract the logs from the Azure Blob and save it in your own blob storage / store in SQL / NOSQL databases.

I had a similar issue. Pulling down logs would take longer than the interval of time allotted to the Python script and the script would start overlapping itself or would fall behind when trying to pull logs for a SIEM implementation.
https://github.com/IntegralDefense/o365_log_fetch
I'm a little late to this post, but by using Asyncio in Python 3.5+ as well as aiohttp, you can make concurrent calls to O365 Management API and pull down the logs much faster. I performed some testing and retrieved logs for a 13 hour window (Audit.Exchange, Audit.AzureActiveDirectory, and Audit.Sharepoint). It took around 20 minutes using 'requests' and sequentially making the API calls. After implementing Asyncio/aiohttp, the same time frame took just under 2 minutes (500,000+ individual events were pulled from the data located at several thousand content blobs/locations).
I've been running the script in 10 minute intervals and usually the script completes in < 10 seconds.
The script I pasted above also supports pagination. So if you get a content list that was truncated in the response from Microsoft, the script will keep reaching out and pulling down more content locations.
At this time, the documentation isn't up to speed, but hopefully that will be caught up soon.

Related

YouTube Data API: The request cannot be completed because you have exceeded your quota. Сan I make an app using the YouTube Data API or not?

I got this error:
The request cannot be completed because you have exceeded your quota.
and I can not understand YouTube limits the number of requests? That is, I cannot create my project by taking API from my channel? If this is so, what is the point of YouTube Data API, if at the development stage I was already limited, what will happen when users come in, then my project will fall within 5 minutes?
and I cannot understand how I was able to make 10,000 requests per day, given that I worked on the localhost for about 3 hours, is this possible?
Indeed the Google's Developers Console shows text like Queries per day, but that's very much misleading (and may well be reported as an Web UI bug to Google).
You have to acknowledge that YouTube Data API's quota system is not accounting for the number of endpoints calls you made during a day long, but it accounts for the cumulated number of quota units corresponding to each of your endpoint calls.
For example, if you have 10000 units of quota allocated for daily usage, you may very easily exceed this upper bound after only 100 calls to the Search.list API endpoint.
Many API users find the default amount of quota allocated -- 10000 units -- to be quite constraining -- that even during the development stage of their apps. For tackling this issue, I recommend two things:
Develop your app such that to cache API responses it received from the endpoints it calls; this way, during the development stage of your app (afterwards, even during production, but albeit functioning with a different logic), repeated calls to endpoints would not result in actual API requests, but would get served from the app's local cache.
Apply for a quota extension, using Google's official form; be aware that, as per the experience of users of this forum, Google's answer, usually, does not arrive shortly.

Microsoft Graph API Daily Limitation

I'm using Microsoft Graph API to upload my excel file to Onedrive and convert it to PDF. My service has a lot of traffic so I want to know about the daily limitation of Microsoft Graph API? How many requests I can send to Microsoft Graph per day?
Somebody already asked about Throttling on Stackoverflow, but I'm not really sure about Daily Limitation API?
10 minutes - 10000 requests
1 day (1440 minutes) - 1440000 requests
Is that correct?
Throttling is done per user per app. The threshold is 10000 requests every 10 minutes.
Microsoft Graph API - Throttling
With Microsoft Modern APIs, you will never know what is the current throttling limits. The main reason for this is because they change depending on a lot of factors and I don't believe they are ever disclosed.
As a best practice you should have error handling on your App so it's smart when that happens.
Whenever your app is affected by throttling you will get a HTTP error code 429 response and you should back-off immediately and retry after a bit. From the documentation, however I never actually noticed that when I was dealing with throttling with Sharepoint a year ago, is a "Retry-after" header that will tell you how much to wait.
The documentation actually has a good explanation that you can find fairly easy here:
https://learn.microsoft.com/en-us/graph/throttling

We query the messages API a lot, but after a while it becomes inconsistent

We are using graph messages queries a lot in a sharepoint app of ours. Mail get tagged using extended properties and users are treating them using the sharepoint app.
We are accordingly requesting the graph API constantly even though we cache some of the information.
The app and the querying work superb, but after a short while (let's say 30 minutes) it starts to misbehave. It hangs, becomes slow and sometimes we have a timeout.
Users do have their app open at all times and we do refresh tokens using adal and all that (old school nowadays, I know)
Is there a correlation of performance or some kind of throttling going on inside exchange online, graph or sharepoint that can be the reason behind this?
Links, experiences or solutions are appreciated;)
Best regards
Ole

Microsoft Graph API - Throttling

Is there a specific number of requests/minute (specific to a tenant) that an application can make to Microsoft Graph APIs before requests start getting throttled?
No, not specific to a tenant (at least not for the Outlook-related parts of the Graph). Throttling is done per user per app. The threshold is 10000 requests every 10 minutes.
https://blogs.msdn.microsoft.com/exchangedev/2017/04/07/throttling-coming-to-outlook-api-and-microsoft-graph/
For non-Outlook stuff, I'm not sure what the limits are. All Graph has to say about it is here:
https://developer.microsoft.com/en-us/graph/docs/concepts/throttling
The takeaway here is you should not depend on a specific threshold since we can always change it if we need to in order to protect the integrity of the service. Ensure that your app can gracefully handle being throttled by handling the 429 error response properly.

Amazon DynamoDB Provisioned Throughput (iOS SDK)

I am new to DynamoDB. I am very much confused about provisioned throughput. I am creating a iPhone game in which the users can chat within the game. I am having a Chat table. The Chat table contains GameID, UserID and Message. How do I find the size of the item to calculate throughput. The size of the item entirely depends on the Message right? How to calculate the size of an item?
Amazon tells that we can either modify the throughput by using UpdateTable API or by manually from the console. If I want to change it form code, how will I know that the provisioned throughput has been exceeded for a certain table? How to check that from code?
I am also confused about the CloudWatch. How to understand this?
Could anyone please help me? Please don't point me to the documentation.
Thanks.
I will do my best to help with the confusion.
DynamoDB is a key:value database
CloudWatch is Amazon's products monitoring tool
Provisioned throughput is roughly the number Items KB you plan to Read/Write per seconds
Whenever you exceed your provisioned throughput,
DynamoDB answers with ProvisionedThroughputExceededException
DynamoDB notifies CloudWatch
What Cloudwatch does is basically record and aggregates data-points. For most applications, it will only keep track of aggregated data over each consecutive 5min periods.
You can then access these data for "manual" monitoring or set up "alarms".
There was a really interesting question on SO a couple of weeks earlier on DynamoDB auto-scaling using alarms. You might be interested in reading it: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/ErrorHandling.html
Knowing this, you can start building your application.
As for every DynamoDB services, one needs credentials to access it. Even though they can be restricted to a specific table or set of action, it is very dangerous to bundle them in an application. Would you give MySQL or MongoDB or credentials, even Read Only to any untrusted people ?
May I suggest you do build your application to rely on a server of your own ? This server being trusted and build by you, you could safely perform any authorization check there and grant it full access to your table.
I hope this helps. Feel free to ask for more precisions.

Resources