Amazon DynamoDB Provisioned Throughput (iOS SDK) - ios

I am new to DynamoDB. I am very much confused about provisioned throughput. I am creating a iPhone game in which the users can chat within the game. I am having a Chat table. The Chat table contains GameID, UserID and Message. How do I find the size of the item to calculate throughput. The size of the item entirely depends on the Message right? How to calculate the size of an item?
Amazon tells that we can either modify the throughput by using UpdateTable API or by manually from the console. If I want to change it form code, how will I know that the provisioned throughput has been exceeded for a certain table? How to check that from code?
I am also confused about the CloudWatch. How to understand this?
Could anyone please help me? Please don't point me to the documentation.
Thanks.

I will do my best to help with the confusion.
DynamoDB is a key:value database
CloudWatch is Amazon's products monitoring tool
Provisioned throughput is roughly the number Items KB you plan to Read/Write per seconds
Whenever you exceed your provisioned throughput,
DynamoDB answers with ProvisionedThroughputExceededException
DynamoDB notifies CloudWatch
What Cloudwatch does is basically record and aggregates data-points. For most applications, it will only keep track of aggregated data over each consecutive 5min periods.
You can then access these data for "manual" monitoring or set up "alarms".
There was a really interesting question on SO a couple of weeks earlier on DynamoDB auto-scaling using alarms. You might be interested in reading it: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/ErrorHandling.html
Knowing this, you can start building your application.
As for every DynamoDB services, one needs credentials to access it. Even though they can be restricted to a specific table or set of action, it is very dangerous to bundle them in an application. Would you give MySQL or MongoDB or credentials, even Read Only to any untrusted people ?
May I suggest you do build your application to rely on a server of your own ? This server being trusted and build by you, you could safely perform any authorization check there and grant it full access to your table.
I hope this helps. Feel free to ask for more precisions.

Related

O365 Activity Management API - Performance for huge audit log streams

This question is regarding the O365 Activity Management API
We are using the API to retrieve audit log notifications from multiple channels (AzureAD, Outlook, SharePoint, etc.) for very large tenants, meaning that we need to retrieve potentially millions of notifications over a relatively short timespan.
O365 gathers audit notifications into a series of "blobs" which then contain a number of individual notifications (JSON messages). To my understanding, which in part comes from correspondence with the API's dev. team and from reading the docs, these blobs should contain a "considerable" number notifications as to function as a sort of batch approach when doing the actual web requests.
In our approach, we request blobs URLs for an interval of an hour, and then do a request for the individual blobs.
However, we have tested with a number of different tenants and different PublisherIdentifiers, but only seem to get around 2.5 messages per blob on average, no matter the total number of notifications "waiting" to be fetched.
This becomes a major issue for the larger tenants as is puts a strain on the SIEM solution running the fetcher logic (a Python service), due to the number of needed requests, and it also gives us throttling issues with the API itself.
In effect, we simply cannot fetch the audit notifications fast enough to keep up - within the retention period. Had the blobs contained more notifications per blob, we would be fine - as the total amount of data (in MBs) is not that large.
A "funny" thing is, that if we use the visual query tool within the Admin Center of the tenant, it searches and retrieves the notifications very fast.
My questions
Has anyone had any experience with this issue, or perhaps had a better "batch performance"?
Does anyone have any ideas as to what we could try to get a better performance?
As mentioned we have been in direct contact with the dev team and the program manager in Redmond. They have been very helpful with other issues we had, but they referred us to support for this specific issue - who in turn referred us to the forums / community. We currently do not have access to premium support...
Example request for content blobs for an hour
https://manage.office.com/api/v1.0/{tenantid}/activity/feed/subscriptions/content?contentType=Audit.Exchange&PublisherIdentifier={pub.id}&startTime=2017-12-03T10:31:24&endTime=2017-12-03T11:31:24
When retrieving the individual blobs, we just use the URLs given to us by the above request.
You can avoid throttling by appending "?PublisherIdentifier={Tenant ID}" to the contentUri in the retrieve content get request.
How can I add a PublisherId to a GetBlob call to the Office365 Rest API to avoid throttling?
I have been working with Office 365 Management Activity API for the past 6 months. I too faced this kind of issue before. This issue will occur if you are trying to get all the audit log contents from your Office 365 tenant at a particular interval, it will result in throttling issue. For your information, it is not possible to avoid throttling issues (resource over usage) for large active tenants.
To overcome these issues, you can create and deploy a web application in cloud and register with Office 365 Management Activity API webhook.
Whenever the office 365 tenant wrap the activity logs into an Azure Blob, it will immediately give the blob details to your registered Web Application. You can refer this link to know about how to enable webhook for a Web Application. Once you received the blob detail from Office 365 tenant, extract the logs from the Azure Blob and save it in your own blob storage / store in SQL / NOSQL databases.
I had a similar issue. Pulling down logs would take longer than the interval of time allotted to the Python script and the script would start overlapping itself or would fall behind when trying to pull logs for a SIEM implementation.
https://github.com/IntegralDefense/o365_log_fetch
I'm a little late to this post, but by using Asyncio in Python 3.5+ as well as aiohttp, you can make concurrent calls to O365 Management API and pull down the logs much faster. I performed some testing and retrieved logs for a 13 hour window (Audit.Exchange, Audit.AzureActiveDirectory, and Audit.Sharepoint). It took around 20 minutes using 'requests' and sequentially making the API calls. After implementing Asyncio/aiohttp, the same time frame took just under 2 minutes (500,000+ individual events were pulled from the data located at several thousand content blobs/locations).
I've been running the script in 10 minute intervals and usually the script completes in < 10 seconds.
The script I pasted above also supports pagination. So if you get a content list that was truncated in the response from Microsoft, the script will keep reaching out and pulling down more content locations.
At this time, the documentation isn't up to speed, but hopefully that will be caught up soon.

iOS chat app design

I'm building a simple chat app on iOS for fun (and to have projects to gain experience from), using socketsIO and a node backend. I am trying to figure out the best design for messages. I was planning to use a mongoDB database where each conversation would have its message data stored. Whenever the client sends a new message to the server, the server adds it to the appropriate conversation in the database.
I was also hoping to create a user Sign Up/Log In system which would add you to the database.
However, I've googled around quite a bit and I am really not sure if creating a database made up of conversations (that get updated whenever a sentMessage event is triggered) and user data is the right way to go.
Additionally, I've seen some people talk about saving the chats on the actual devices themselves, not in a database? What is the common design pattern for a chat app like this?
for the design I would use socket.io for emitting messages as well. It has a great community behind it, I woul also use MongoDb because everything is using JSON format and it's integrated so well with Node due to it using JavaScript.
Now the part you are interested about, is REDIS. Redis is a database that sits in RAM on the web and should be used with mongodb if you're going to be having higher traffic / need quick speed / less hanging and waiting.
REDIS would be your temporary save for the chat with a session because doing disk write/read/querying is a lot on the machine (looking at you MongoDB), If you plan on saving the chat with every message. Doing so MongoDb would just not scale all the well in the long run and is not as fast as REDIS. Mind you REDIS database will only hold the temporary chat log of let's say the last 1 million chat session or some limit (it's all in RAM so the size is limited can't have Terabytes or hundreds of Gigabytes of RAM on 1 server).
so the data flow would look something like
user sends message
server receives messsage via HTTP(S) post/put - Ajax/Observable
Server will use socket.io to emit the message to the designated user while saving the message to REDIS with a specific key/session/message.
designated user get's the update on their screen via io event.
-- inbetween there should be a check on the REDIS db of whether it is getting full. if it's full remove the last 10,000 inactive messages (could be from 1 year ago if the server hasn't gotten full yet) to make some space.
Saving the chat on the phone is an okay idea as it would save the users data/bandwidth and they could potentially look at their message while offline.
a solution is using SQL Lite which is a lightweight library that will sit inside your app acting as a database which you can perform queries on if your familiar with RDBMS you will have no problem implementing it. But now you gotta find a good way to manage saving data to REDIS/SQL-LITE/MongoDb.

Structuring backend queries

So this is more of a methodology question than a coding question. I want to ask this before I actually start coding in order to choose the best route. I have a messaging app. When the app launches I query in the background all the messages from the backend where current_user_id is equal to recipient_id. Now I have all of the messages stored the user needs to see so I locally store them into a sqlite database.
Great, but what about when the user gets new messages? How can i structure a query to receive those without having to query the entire table again? Also how do I set this up as a continual process? Is the phone always requesting update information from the backend while its in the foreground?
Thanks. I really appreciate your help. I'm currently using iOS and as stated SQLite. Also my backend is AWS node.js.
It looks like your goal is to ultimately synchronize data between two sources over a network with a constraint that the client is updated in a reasonable amount of time. You have a design choice to make between a push vs pull architecture.
Push architectures have the servers push data to clients when an event occurs.
Pull architectures have the device periodically poll the server for changes. This can be achieved through timed events.
There are hybrid approaches too.
Each have their advantages and disadvantages as some require constant polling. Others require constant connection based protocols which presents more scaling challenges.

Ruby on Rails performance on lots of requests and DB updates per second

I'm developing a polling application that will deal with an average of 1000-2000 votes per second coming from different users. In other words, it'll receive 1k to 2k requests per second with each request making a DB insert into the table that stores the voting data.
I'm using RoR 4 with MySQL and planning to push it to Heroku or AWS.
What performance issues related to database and the application itself should I be aware of?
How can I address this amount of inserts per second into the database?
EDIT
I was thinking in not inserting into the DB for each request, but instead writing to a memory stream the insert data. So I would have a scheduled job running every second that would read from this memory stream and generate a bulk insert, avoiding each insert to be made atomically. But i cannot think in a nice way to implement this.
While you can certainly do what you need to do in AWS, that high level of I/O will probably cost you. RDS can support up to 30,000 IOPS; you can also use multiple EBS volumes in different configurations to support high IO if you want to run the database yourself.
Depending on your planned usage patterns, I would probably look at pushing into an in-memory data store, something like memcached or redis, and then processing the requests from there. You could also look at DynamoDB, which might work depending on how your data is structured.
Are you going to have that level of sustained throughput consistently, or will it be in bursts? Do you absolutely have to preserve every single vote, or do you just need summary data? How much will you need to scale - i.e. will you ever get to 20,000 votes per second? 200,000?
These type of questions will help determine the proper architecture.

Amazon Web Services - What do i need?

Sorry for long explanation but I'm trying to find the right moves for days, any help would be much appreciated.
My IOS app will be used daily, An image and some data will be displayed to the user and will be saved not to connect again. So an user will use approximately 30kb per day.
Now, for testing, I'm using a basic hosting plan for MSSQL and Web Service. On SQL Server, I have 4 tables and an average of 5 columns (I mean It's not a complicated database)(and also I have a subquery). And I'm using .net web service for communicating from IOS app. And lastly, one image for one day is hosted.
I've tried to explain basically but It's expected to reach at least 1 million user after a short period of time according to my big clients.
So I want to start with AWS not to fail but really I don't know which products/settings do i need (from few users to millions) and how to start to AWS EC2. Also I want to specify that after AWS's documents and googling, I'm confused.
At least please show me the way. Thanks..
You want autoscaling both in the webtier and in database resources. You also likely want high availability (i.e. trans-AZ, trans-regional deployment). This answer might help point you in the right direction. Start with ElasticBeanstalk and RDS (if you can afford it). They both abstract out huge swathes of autoscaling.
Also pay close attention to the ElasticBeanstalk architectural overview. It'll help you distinguish between the web tier of your application, any application layers, and the database layer of your stack.

Resources