I am using AWS Cognito and DynamoDB. I have authenticated users with AWS Cognito and have also performed CRUD operations on DynamoDB successfully. I am creating a dataset when the internet is not available, but I have no idea how to synchronize the dataset with DynamoDB. Does AWS support dataset synchronization with DynamoDB?
You have several options depending on your use-case.
The most straightforward option is to use DynamoDB Streams. A stream stores every update to a DynamoDB table for up to 24 hours and lets you read these changes and reapply them in another database.
If the 24-hour window is too restrictive for you, you will also have to create some sort of DynamoDB snapshot. For example, you could create a DynamoDB snapshot every 24 hours and store it in S3. Then you can use DynamoDB Streams to read real-time updates and the snapshots to read the baseline data.
To create DynamoDB snapshots, you can use the AWS Data Pipeline service.
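A minimal sketch of the snapshot-plus-stream idea in Python, assuming records shaped like those returned by the DynamoDB Streams `GetRecords` API (or delivered to a Lambda trigger) and a stream view type that includes new images. The function names and the in-memory dict standing in for "another DB" are hypothetical; skipping stream records that are older than the snapshot is left to the caller.

```python
from typing import Any, Dict, Iterable, List, Tuple

# DynamoDB's low-level attribute format, e.g. {"Id": {"S": "42"}}.
AttrMap = Dict[str, Dict[str, Any]]
Key = Tuple[Tuple[str, str], ...]


def key_of(keys: AttrMap) -> Key:
    """Hashable, order-independent form of a primary key map."""
    return tuple(sorted((name, repr(value)) for name, value in keys.items()))


def load_snapshot(items: Iterable[AttrMap], key_names: List[str]) -> Dict[Key, AttrMap]:
    """Index the full items of a snapshot (the baseline) by primary key."""
    return {key_of({n: item[n] for n in key_names}): item for item in items}


def apply_stream_records(replica: Dict[Key, AttrMap],
                         records: Iterable[Dict[str, Any]]) -> Dict[Key, AttrMap]:
    """Replay stream records, in order, onto the replica.

    Assumes StreamViewType NEW_IMAGE or NEW_AND_OLD_IMAGES, so INSERT and
    MODIFY records carry the complete new item under "NewImage".
    """
    for record in records:
        change = record["dynamodb"]
        key = key_of(change["Keys"])
        if record["eventName"] in ("INSERT", "MODIFY"):
            replica[key] = change["NewImage"]
        elif record["eventName"] == "REMOVE":
            replica.pop(key, None)
    return replica
```

In practice the records would come from boto3's `dynamodbstreams` client (`get_shard_iterator`, then `get_records`), and the dict assignments would become writes to the other database.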
Related
I have tried my best to understand Firebase's usage and billing, but I found I only got more confused. For reference, I use the following Firebase Services in the project in question:
Realtime Database
Firestore
Authentication
Cloud Functions
Cloud Messaging
A few notes: I update the Realtime Database frequently, and I also have a simple Firebase Functions method deployed, which is triggered (an iOS notification trigger) whenever a change happens in the Realtime Database.
With that being said, all the services are within the Free Tier threshold.
But I still get billed; not much, but on a daily basis nonetheless.
I used to empty my buckets, and that would lower the billing a little, but after a recent Firebase Functions update, it automatically clears buckets now.
Questions:
What actions correlate to these costs? (Frequency of Firebase Functions triggers, size of deployed methods, etc.)
How can I mitigate these costs?
Do these costs scale with more users?
I can't ingest records into Amazon Timestream if their timestamp falls outside the memory store retention window. Thus, I can't implement functionality where I replay messages, process them, and ingest them again when there is an issue. Are there any solutions for this?
Currently there is no way to ingest records that fall outside the memory store retention window. Could you create a table whose memory store retention window is large enough to cover the period over which you expect to correct data?
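Following that suggestion, a minimal sketch in Python, assuming the longest period you will ever need to replay is known in advance. The helper names, the one-hour safety margin, and the 8766-hour cap are assumptions; only the keyword arguments mirror boto3's `timestream-write` `create_table` call, and the block does not contact AWS itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed upper limit for MemoryStoreRetentionPeriodInHours (8766 h, about one
# year); check the current Timestream quotas before relying on this value.
MAX_MEMORY_STORE_HOURS = 8766


def memory_store_hours_for(replay_window: timedelta, margin_hours: int = 1) -> int:
    """Whole hours needed to cover the replay window, plus a safety margin."""
    seconds = int(replay_window.total_seconds())
    hours = -(-seconds // 3600) + margin_hours  # ceiling division, then margin
    if hours > MAX_MEMORY_STORE_HOURS:
        raise ValueError(f"{hours} h exceeds the assumed memory store limit")
    return hours


def create_table_kwargs(database: str, table: str, replay_window: timedelta,
                        magnetic_days: int = 365) -> dict:
    """Keyword arguments for boto3.client("timestream-write").create_table(...)."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            "MemoryStoreRetentionPeriodInHours": memory_store_hours_for(replay_window),
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    }


def fits_memory_store(record_time: datetime, memory_store_hours: int,
                      now: Optional[datetime] = None) -> bool:
    """True if record_time is no older than the memory store retention period.

    Only the lower bound is checked; future timestamps are not validated here.
    """
    now = now or datetime.now(timezone.utc)
    return now - record_time <= timedelta(hours=memory_store_hours)
```

The trade-off is cost: memory store storage is priced higher than magnetic storage, so the retention should cover the realistic replay window rather than be set to the maximum by default.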
This question is about the Office 365 Management Activity API.
We are using the API to retrieve audit log notifications from multiple channels (AzureAD, Outlook, SharePoint, etc.) for very large tenants, meaning that we need to retrieve potentially millions of notifications over a relatively short timespan.
O365 gathers audit notifications into a series of "blobs", each containing a number of individual notifications (JSON messages). To my understanding, which comes partly from correspondence with the API's dev team and partly from reading the docs, these blobs should contain a "considerable" number of notifications so that they function as a sort of batch when making the actual web requests.
In our approach, we request the blob URLs for a one-hour interval and then make a request for each individual blob.
However, we have tested with a number of different tenants and different PublisherIdentifiers, but only seem to get around 2.5 messages per blob on average, no matter the total number of notifications "waiting" to be fetched.
This becomes a major issue for the larger tenants, as it puts a strain on the SIEM solution running the fetcher logic (a Python service) because of the number of requests needed, and it also causes throttling issues with the API itself.
In effect, we simply cannot fetch the audit notifications fast enough to keep up within the retention period. Had the blobs contained more notifications each, we would be fine, as the total amount of data (in MB) is not that large.
A "funny" thing is that if we use the visual query tool in the tenant's Admin Center, it searches and retrieves the notifications very quickly.
My questions
Has anyone had any experience with this issue, or perhaps had a better "batch performance"?
Does anyone have any ideas as to what we could try to get a better performance?
As mentioned, we have been in direct contact with the dev team and the program manager in Redmond. They have been very helpful with other issues we had, but they referred us to support for this specific issue, who in turn referred us to the forums/community. We currently do not have access to premium support...
Example request for the content blobs of one hour:
https://manage.office.com/api/v1.0/{tenantid}/activity/feed/subscriptions/content?contentType=Audit.Exchange&PublisherIdentifier={pub.id}&startTime=2017-12-03T10:31:24&endTime=2017-12-03T11:31:24
When retrieving the individual blobs, we just use the URLs given to us by the above request.
You can avoid throttling by appending "?PublisherIdentifier={Tenant ID}" to the contentUri in the GET request that retrieves the content.
How can I add a PublisherId to a GetBlob call to the Office 365 REST API to avoid throttling?
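Appending the parameter with a literal `?` only works while the contentUri has no query string of its own. A small Python sketch that sets `PublisherIdentifier` with `urllib.parse` whether or not a query string is already present; `with_publisher_identifier` is a hypothetical helper name, and the GUID passed should match whatever you already send as `PublisherIdentifier` on your other calls.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_publisher_identifier(content_uri: str, publisher_id: str) -> str:
    """Return content_uri with PublisherIdentifier set in its query string.

    Chooses '?' or '&' automatically and replaces an existing
    PublisherIdentifier instead of adding a duplicate parameter.
    """
    parts = urlsplit(content_uri)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "PublisherIdentifier"]
    query.append(("PublisherIdentifier", publisher_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```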
I have been working with the Office 365 Management Activity API for the past 6 months, and I faced this kind of issue too. It occurs if you try to retrieve all the audit log content from your Office 365 tenant at a fixed interval: that polling results in throttling. For your information, it is not possible to avoid throttling (resource overuse) entirely for large, active tenants.
To overcome these issues, you can create and deploy a web application in the cloud and register it as a webhook with the Office 365 Management Activity API.
Whenever the Office 365 tenant wraps activity logs into an Azure blob, it immediately sends the blob details to your registered web application. Refer to the Management Activity API documentation for how to enable a webhook for a web application. Once you have received the blob details from the Office 365 tenant, extract the logs from the blob and save them in your own blob storage or in a SQL/NoSQL database.
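A minimal sketch of the receiving side in Python, kept framework-free so it can be tested. The payload shapes (a JSON object with `validationCode` while the subscription is being validated, otherwise a JSON array of blob descriptions carrying `contentUri`) and the `Webhook-AuthID` header echoing the `authId` you registered are our reading of the Management Activity API docs; verify them before relying on this. `handle_webhook` is a hypothetical name.

```python
import json
from typing import List, Mapping, Tuple


def handle_webhook(headers: Mapping[str, str], body: bytes,
                   expected_auth_id: str) -> Tuple[int, List[str]]:
    """Return (HTTP status, contentUris to fetch) for one webhook POST.

    headers should be a case-insensitive mapping, as most web frameworks provide.
    """
    payload = json.loads(body or b"null")
    if isinstance(payload, dict) and "validationCode" in payload:
        return 200, []  # acknowledge the subscription validation request
    # Reject notifications that do not carry the authId set at subscription time.
    if headers.get("Webhook-AuthID") != expected_auth_id:
        return 401, []
    if not isinstance(payload, list):
        return 400, []
    return 200, [blob["contentUri"] for blob in payload if "contentUri" in blob]
```

Returning quickly and downloading the blobs in a background job keeps the webhook responsive; the downloads can reuse the same authenticated GET you already use for polling.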
I had a similar issue. Pulling down logs would take longer than the interval of time allotted to the Python script and the script would start overlapping itself or would fall behind when trying to pull logs for a SIEM implementation.
https://github.com/IntegralDefense/o365_log_fetch
I'm a little late to this post, but by using asyncio in Python 3.5+ together with aiohttp, you can make concurrent calls to the O365 Management API and pull down the logs much faster. I performed some testing and retrieved logs for a 13-hour window (Audit.Exchange, Audit.AzureActiveDirectory, and Audit.SharePoint). It took around 20 minutes using 'requests' and making the API calls sequentially. After implementing asyncio/aiohttp, the same time frame took just under 2 minutes (500,000+ individual events were pulled from several thousand content blobs/locations).
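The core of that speed-up is bounded concurrency. A minimal sketch using only the standard library, assuming you pass in your own `fetch` coroutine; the function name and the default limit of 10 are illustrative and not taken from the linked script.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def fetch_all(uris: Iterable[str],
                    fetch: Callable[[str], Awaitable[Any]],
                    max_concurrency: int = 10) -> List[Any]:
    """Fetch every URI concurrently, with at most max_concurrency in flight.

    Results are returned in the same order as uris.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(uri: str) -> Any:
        async with semaphore:  # wait for a free slot before requesting
            return await fetch(uri)

    return await asyncio.gather(*(bounded(u) for u in uris))
```

With aiohttp, `fetch` would typically open `session.get(uri)` on a shared `aiohttp.ClientSession` inside `async with`, call `raise_for_status()`, and return `await resp.json()`. Keeping `max_concurrency` modest matters, since firing thousands of requests at once is exactly what triggers the API's throttling.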
I've been running the script at 10-minute intervals, and it usually completes in under 10 seconds.
The script I linked above also supports pagination. So if the content list in Microsoft's response is truncated, the script keeps requesting and pulling down more content locations.
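The pagination loop can be sketched as follows, assuming the Management Activity API's behaviour of returning a `NextPageUri` response header while more content is available. `get_json` is a hypothetical helper you supply, for example a `requests` call returning `(response.json(), response.headers)`; `requests` exposes headers as a case-insensitive mapping, so the `.get("NextPageUri")` lookup works regardless of casing.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical HTTP helper: takes a URL, returns (parsed JSON body, headers).
GetJson = Callable[[str], Tuple[List[dict], Dict[str, str]]]


def iter_content_blobs(first_url: str, get_json: GetJson) -> Iterator[dict]:
    """Yield every content blob from a subscriptions/content listing,
    following the NextPageUri header until a page arrives without one."""
    url: Optional[str] = first_url
    while url:
        blobs, headers = get_json(url)
        yield from blobs
        url = headers.get("NextPageUri")
```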
At this time, the documentation isn't up to speed, but hopefully that will be caught up soon.
I have a Swift/iOS application that creates/holds data something like this:
buyer_name: "john#gmail.com"
seller_name: "tom#gmail.com"
price: "20.00"
product: "baseball cards"
timestamp: 1498677314931
I am taking the timestamp with
FIRServerValue.timestamp()
I would like to create some kind of timeout that deletes Firebase data older than 30 minutes, based on the timestamp. The problem is that I would not like to do it directly from the app, as the user could technically log out after 5 minutes. I would like to do it from some other, independent process.
So, I'm wondering if this can be done automatically through Firebase, or if I have to deploy some other application that does this continuously on an EC2 instance, or if there is some other method?
You can use Cloud Functions for Firebase and set up a cron job that runs, say, every five minutes and checks for timestamps older than 30 minutes.
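A sketch of the cleanup logic in Python, assuming the records live under one list of push keys and each stores the millisecond `timestamp` written by `FIRServerValue.timestamp()`. `expired_updates` is a hypothetical helper that only decides what to delete, so the same logic can run inside whatever scheduled job you choose (or be ported to a JavaScript Cloud Function).

```python
import time
from typing import Dict, Mapping, Optional

THIRTY_MINUTES_MS = 30 * 60 * 1000


def expired_updates(records: Mapping[str, Mapping[str, object]],
                    now_ms: Optional[int] = None,
                    max_age_ms: int = THIRTY_MINUTES_MS) -> Dict[str, None]:
    """Build a multi-path update that deletes records older than max_age_ms.

    Writing null (None) to a path in an update removes it from the
    Realtime Database; records without an integer timestamp are kept.
    """
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    cutoff = now_ms - max_age_ms
    return {key: None for key, rec in records.items()
            if isinstance(rec.get("timestamp"), int) and rec["timestamp"] < cutoff}
```

With the Firebase Admin SDK for Python, a scheduled job could fetch candidates with `db.reference("transactions").order_by_child("timestamp").end_at(cutoff).get()` (the path `transactions` is hypothetical, and an `.indexOn` rule on `timestamp` keeps the query efficient), then pass a non-empty result of `expired_updates` to that reference's `update()` method.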
As Will commented, that blog post is useful to learn about scheduling Cloud Functions. Here are some other useful resources to get you started:
Getting Started with Cloud Functions for Firebase - YouTube
Cloud Functions for Firebase Documentation
Cloud Functions Samples
Timing Cloud Functions for Firebase using an HTTP Trigger and Cron - YouTube
I am new to DynamoDB and very confused about provisioned throughput. I am creating an iPhone game in which users can chat within the game. I have a Chat table that contains GameID, UserID, and Message. How do I find the size of an item in order to calculate throughput? The size of the item depends entirely on the Message, right? How do I calculate the size of an item?
Amazon says that we can modify the throughput either with the UpdateTable API or manually from the console. If I want to change it from code, how will I know that the provisioned throughput has been exceeded for a certain table? How can I check that from code?
I am also confused about CloudWatch. How should I understand it?
Could anyone please help me? Please don't point me to the documentation.
Thanks.
I will do my best to help with the confusion.
DynamoDB is a key:value database
CloudWatch is Amazon's monitoring tool for its products
Provisioned throughput is roughly the number of item kilobytes you plan to read/write per second
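To make the sizing question concrete: an item's size is the length of its attribute names plus the length of their values, so the Message dominates but the names count too. A minimal sketch for string-only items, assuming DynamoDB's currently documented unit sizes (one write unit covers 1 KB, one strongly consistent read unit covers 4 KB, and an eventually consistent read costs half). Numbers, lists, and other types are sized differently and are not handled here; the function names are hypothetical.

```python
import math
from typing import Dict, Mapping

WRITE_UNIT_BYTES = 1024       # one write capacity unit: up to 1 KB per write
READ_UNIT_BYTES = 4 * 1024    # one strongly consistent read unit: up to 4 KB


def string_item_size(item: Mapping[str, str]) -> int:
    """Approximate size of an item whose attributes are all strings:
    UTF-8 bytes of each attribute name plus UTF-8 bytes of its value."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())


def capacity_units(item_bytes: int) -> Dict[str, float]:
    """Units consumed by one write and one read of an item of this size."""
    strong = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    return {
        "write": max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES)),
        "strong_read": strong,
        "eventual_read": strong / 2,  # eventually consistent reads cost half
    }
```

For example, `{"GameID": "g-42", "UserID": "u-7", "Message": "hello"}` is 31 bytes, so each write costs one write unit, and 10 such messages per second would need about 10 write units.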
Whenever you exceed your provisioned throughput,
DynamoDB answers with ProvisionedThroughputExceededException
DynamoDB notifies CloudWatch
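Since throttling surfaces as a `ProvisionedThroughputExceededException`, code can detect it and back off. A minimal sketch in Python, assuming boto3, whose `botocore.exceptions.ClientError` exposes the error code as `exc.response["Error"]["Code"]`; the helpers are hypothetical and duck-typed so they can be tested without AWS. boto3 already retries throttled calls a few times on its own, so this mainly matters once those built-in retries are exhausted.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_throttled(exc: Exception) -> bool:
    """True for botocore-style errors coded ProvisionedThroughputExceededException."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ProvisionedThroughputExceededException"


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.05,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying throttled attempts with exponential backoff and jitter.

    Non-throttling errors, and the last throttled attempt, are re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            attempt += 1
            if attempt >= max_attempts or not is_throttled(exc):
                raise
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```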
What CloudWatch basically does is record and aggregate data points. For most applications, it only keeps aggregated data over consecutive 5-minute periods.
You can then access this data for "manual" monitoring or set up "alarms".
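For the "from code" part of the question, the same CloudWatch data can be pulled programmatically. A sketch in Python that builds the arguments for boto3's `cloudwatch.get_metric_statistics` for a table's `ConsumedReadCapacityUnits` (or `ConsumedWriteCapacityUnits`) and converts each 5-minute `Sum` into average units per second, which is directly comparable with the provisioned value. The function names are hypothetical; DynamoDB also publishes throttling metrics such as `ReadThrottleEvents` and `WriteThrottleEvents` that can be queried the same way.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

PERIOD_SECONDS = 300  # the 5-minute aggregation period


def consumed_capacity_request(table: str, metric: str = "ConsumedReadCapacityUnits",
                              hours: int = 1, now: Optional[datetime] = None) -> dict:
    """Keyword arguments for boto3.client("cloudwatch").get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric,
        "Dimensions": [{"Name": "TableName", "Value": table}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": PERIOD_SECONDS,
        "Statistics": ["Sum"],
    }


def average_units_per_second(datapoints: List[Dict],
                             period: int = PERIOD_SECONDS) -> List[float]:
    """Turn each period's Sum into average units per second, oldest first."""
    ordered = sorted(datapoints, key=lambda d: d["Timestamp"])
    return [d["Sum"] / period for d in ordered]
```

Usage: `datapoints = boto3.client("cloudwatch").get_metric_statistics(**consumed_capacity_request("Chat"))["Datapoints"]`, then `average_units_per_second(datapoints)`. An alarm on the same metric (via `put_metric_alarm`) is the automated version of this check.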
A couple of weeks ago there was a really interesting question on SO about DynamoDB auto-scaling using alarms; you might be interested in reading it. The DynamoDB error-handling documentation is also relevant: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/ErrorHandling.html
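Along the lines of that alarm-driven auto-scaling, a sketch of the decision step in Python: given the current provisioned capacity and the observed consumption per second, compute capacity that would put usage at a target utilization and build the arguments for boto3's `dynamodb` `update_table` call. The 70% default target and the function name are assumptions, not AWS recommendations; DynamoDB also limits how often a table's throughput can be decreased per day, so a real scaler should avoid frequent downscaling.

```python
import math
from typing import Optional


def update_table_kwargs(table: str, read_now: int, write_now: int,
                        read_used: float, write_used: float,
                        target: float = 0.7) -> Optional[dict]:
    """Arguments for dynamodb update_table, or None if nothing would change.

    read_used / write_used are observed average units per second (for
    example from CloudWatch); target is the desired consumed/provisioned ratio.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    new_read = max(1, math.ceil(read_used / target))
    new_write = max(1, math.ceil(write_used / target))
    if (new_read, new_write) == (read_now, write_now):
        return None
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": new_read,
            "WriteCapacityUnits": new_write,
        },
    }
```

When the result is not `None`, it would be passed as `boto3.client("dynamodb").update_table(**kwargs)`.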
Knowing this, you can start building your application.
As with every AWS service, you need credentials to access DynamoDB. Even though they can be restricted to a specific table or set of actions, it is very dangerous to bundle them in an application. Would you give MySQL or MongoDB credentials, even read-only ones, to untrusted people?
May I suggest that you build your application to rely on a server of your own? Since this server is trusted and built by you, you can safely perform any authorization checks there and grant it full access to your table.
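A minimal sketch of such a server-side check in Python, using the Chat table and attributes from the question. The membership lookup, the 1000-byte message limit, and the function name are hypothetical; the point is that the server decides who may write, using an identity it established itself, and only then issues the DynamoDB call with its own credentials. The question does not give the table's key schema: if GameID and UserID form the whole key, each new message would overwrite the previous one, so a real chat table probably needs something like a timestamp in the sort key.

```python
from typing import Mapping, Set

MAX_MESSAGE_BYTES = 1000  # hypothetical application limit, not a DynamoDB limit


def chat_put_item(user_id: str, game_id: str, message: str,
                  memberships: Mapping[str, Set[str]]) -> dict:
    """Validate a chat write on the trusted server, then build put_item arguments.

    memberships (game ID -> user IDs) stands in for however the server knows
    who is in which game; user_id must come from the server's own session
    handling, never from the client's request body.
    """
    if user_id not in memberships.get(game_id, set()):
        raise PermissionError(f"{user_id} is not in game {game_id}")
    if not message or len(message.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message is empty or too long")
    return {
        "TableName": "Chat",
        "Item": {
            "GameID": {"S": game_id},
            "UserID": {"S": user_id},
            "Message": {"S": message},
        },
    }
```

The server would then call `boto3.client("dynamodb").put_item(**chat_put_item(user, game, text, memberships))`.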
I hope this helps. Feel free to ask for more details.