AWS DynamoDB client best practice (MVC app) - asp.net-mvc

I'm working to port some data access to dynamo DB in a high-traffic app. A bit of background - the app collects a very high volume of data, and some specific tables were causing performance issues in a traditional DB. So with a bit of re-design and some changes to the data layout we have been able to make them fit the DynamoDB niche nicely.
My question is around the use/creation of the client object. The SDK docs suggest it is better to create one client and share it amongst multiple threads, so in my repository implementation I have the client defined as a lazy singleton. This means it will be created once and all requests will share the same client (currently around 4000 requests per minute, but likely to grow massively as we come out of beta and start promoting the product).
Does anyone have any experience of making the AWS SDK scale?
Thanks
Sam

When you create one client and share it with multiple threads, only one thread can use the client at one point of time in some SDK.
Definitely if you create separate clients for different threads, it is going to slow down the process.
So I would suggest you to take a middle approach here,
Maximize the HTTP connection pooling size, so that more number of clients are allowed to be created.
And then you follow the sharing of client objects.
Batch operation can be used for .Net aws sdk
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BatchOperationsORM.html

Related

Impact of Fiori Apps on Server

We are following Embedded Architecture for our S4HANA 1610 system.
Please let me know what will be the impact on Server if we implement 200+ Standard Fiori Apps in our System ?
Regards,
Sayed
When you say “server”, are you referring to the ABAP backend, consisting of one or more SAP application servers and usually one database server?
In this case, you might get a very first impression using transaction ST03.
Here, you get a detailed analysis of resource consumption on the SAP application server.
You also get information about database access times, as seen from the application server.
This can give you a good hint about resource consumption on the database server.
Usually, the ABAP backend is accessed from Fiori via OData calls.
Not every user interaction causes an OData call, some interactions are handled locally at the frontend.
In general, implemented apps only require some space on the hard disk, as long as nobody is using them.
So the important questions for defining the expected workload are:
How many users are working with these apps in which frequency (Avg.
thinktime)?
How many OData calls are sent from these apps to the backend and how
many dialog steps are handled by the frontend itself?
How expensive are these OData calls (see ST03)?
Every app reflects one or more typical business processes, which need to be defined.
Your specific Customizing also plays an important role, because it controls different internal functionality.
It’s also mandatory, to optimize database access, because in productive use, tables might get bigger in size, which might slow down database access over time.
Usually, this kind of sizing is done by SAP Hardware and Technology partners.

How do I call Web API from MVC without latency?

I'm thinking about moving my DAL which uses DocumentDb and Azure Table Storage to a separate Web API and host it as a cloud service on Azure.
The primary purpose of doing this is to make sure that I keep a high performance DAL that can scale up easily and independently of my front-end application -- currently ASP.NET MVC 5 running as a cloud service on Azure but I'll definitely add mobile apps as well. With DocumentDb and Azure Table Storage, I'm finding myself doing a lot of data handling in my C# code, therefore, I think it would be a good idea to keep that separate from my front-end application.
However, I'm very concerned about latency issues introduced by HTTP calls from one cloud service to another which would defeat the purpose of separating DAL into its own application/cloud service.
What is the best way to separate my DAL from my front-end application without introducing any latency issues?
I think the trade off between scaling-out/partitioning resources and network latency is unavoidable. That being said, you may find the trade-off well worth it for many reasons (i.e. enabling parallel execution of application tasks, increased reliability, etc.) when working w/ large-scale systems.
Here are some general tips to help you minimize the hit on network latency:
Use caching to avoid cross-service calls whenever possible.
Batch cross-service calls and re-use connections whenever possible to minimize the cost associated w/ traversing the NAT out of one cloud service and through the load balancer into another. Note - your application must also be able to handle dropped connections (inevitable in cloud architecture).
Monitor performance metrics as much as possible to take measurements and identify bottlenecks.
Co-locate your applications layers within the same datacenter to keep cross-service latency to a minimum.
You may also find the following literature useful: http://azure.microsoft.com/en-us/documentation/articles/best-practices-performance/
I recently split out my DAL to a WebAPI that serves data from DocumentDB for both the MVC website and mobile applications for the same reasons stated by the questioner.
The statements from aliuy are valid performance considerations generally accepted as good practice.
But more specifically - in order to call Web API from MVC without latency using Azure cloud services, one should specify same affinity group for each resource (websites, cloud services, etc).
Affinity groups are a way you can group your cloud services by
proximity to each other in the Azure datacenter in order to achieve
optimal performance. When you create an affinity group, it lets Azure
know to keep all of the services that belong to your affinity group as
physically close to each other as possible.
https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-migrate-to-regional-vnet/

Should an iphone app communicate directly with a cassandra backend?

Obviously there are multiple steps and phases of implementing such a thing.
I was thinking I would eventually have a webserver that takes http json requests from the ios app, and then queries the cassandra backend and sends results back. I could load balance and all that fancy stuff still, and also provide a logical layer on server side, and keep the client app lightweight.
I'm not sure i understand how cassandra clients fit though. It seems like the cassandra objective c client could eliminate the need for the above approach.
I saw another question and answer but it wasnt clear, perhaps because it varys on the need.
An iPhone app should not directly connect to a Cassandra backend or any other DB store.
First of all, talking to a database often requires adapting a very specific binary protocol (for Cassandra in particular, binary CQL or Thrift). Writing an adapter that would let your Objective-C app communicate in this binary protocol is a major piece of work, and could easily cost more than the rest of your app in effort. If you hide the DB behind a web-server, however, you will be able to select from a variety of existing adapters available in different server-side languages, meaning that you don't need to redo all that low-level work. You'll only be responsible for a relatively small piece of server-side code that would translate your REST queries and forward them to one of the Cassandra adapters (which expose easy-to-use interfaces).
Secondly, if you wanted to connect to a remote database from the phone, your database server would have to open its ports to the internet at large, which is a very bad security practice, even if you use SSL and user credentials. Again, if you hide behind a web server, you will be putting in a layer of technology that has evolved for decades to remain secure on the public internet.
Finally, having your phone talk to Cassandra directly is a poor architectural pattern. When you write apps that communicate on the internet, you want them to know as little as possible about each other, only how to talk to each other (preferably in a standard protocol). That way you can replace or upgrade individual components while keeping everything else the same. This may not sound like a lot, but is actually the main reason why phones, or web browsers, don't directly talk to databases. (If this setup were a good idea in principle, the first two problems could be easily solved given enough engineering effort.)
The approach you first suggested with JSON and the web server is the only correct way to go.
Use something like RESTful API, there are many reasons for that.
if your servers ip addresses change you have to update all client, if you add more nodes you will need to update all clients, if you decide to upgrade your cassandra and some functions change your clients will break and you need to update all clients.

Websocket scalability, broadcasting concerns

If you have a complex requirement set with many users(&servers) how will your websocket infrastructure (server[s]) will scale, especially with broadcasting?
Of course, broadcasting is not part of the any websocket spec but it's there even in basic chat examples (a.k.a. hello world for websocket).
Client side (asking for new data) solution still seems more scalable than server side (broadcasting) solution with websockets' low latency and relatively cheap (http headerless) nature.
Edit:
OK, just think that you want to replace all your ajax code with websocket implementations which may mean that so many connections within so many different contexts. This adds enormous complexity to your system if you want to keep track of every possible scenario for broadcasting.
Low (network/thread etc) level implementation suggestions are also part of the problem not the solution, because this means you have to code a special server unlike general http servers.
Moreover, broadcasting brings some sort of stateful nature to the table which can't easily scale. Think about adding more servers and load balancing.
Scaling realtime web solutions can be a complex problem but one that services like Pusher (who I work for) have solved, and one that there are most definitely solutions defined for self hosted realtime web solutions - the PubSub paradigm is well understood and has been solved many times and in order to solve the problem there needs to be some state (who is subscribing to what). This paradigm is used in broadcasting the the types of scenarios that you are talking about.
Realtime web technologies have been built with large amounts of simultaneous connections in mind - many from the ground up. If you wanted to create a scalable solution you would most likely use an existing realtime web server that supports WebSockets, in the same way that it's highly unlikely that you would decide to implement your own HTTP Server you are unlikely to want to implement your own server which supports WebSockets from scratch.
Dedicated Realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you but frequently messages queues are used and specifically redis is very popular in this space.
HTTP polling may conceptually be easier to understand - you can maintain statelessness and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means that you will need to start scaling much sooner (adding more resource to handle the load).
WebSocket polling is something I've not considered before and I don't think I've seen it suggested anywhere before either; the idea that the client should say "I'm ready for my next set of data and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm but there may be scenarios where the increased efficiency of WebSockets and request/response using them may have some benefits. The SocketStream application framework might be worth a look as it might be relevant; after the initial application load all communication is performed over WebSockets which means that event basic request/response functionality uses WebSockets.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.
The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sound like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth and I don't know why you think it will be more scalable. Maybe you could add more detail to your question like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?

Cloud computing: Learn to scale server up/down automatically

I'm really impressed with the power of cloud computing when it comes to the possibility to scale up and down your facilities depending on your load.
How can I shift my paradigm and learn to write my applications in that way? Write it once and forget(no matter of the future load) would be the best solution.
How can I practice my skills in that area?
Setup virtualization environment when I can add another VMs into the private cloud(via command line?) on some smart algorithms to foresee the load for some period of time?
Ideally I want to practice it without buying actual Cloud computing services and just on my hardware.
The only thing I want to practice here is app/web role and/or message queue systems scaling when current workers have too many jobs in queue. So let's rule out database scaling from the question's goal as too big topic.
One option I will throw out is to use a native Cloud execution framework. You might look at CloudIQ Platform. One component is CloudIQ Engine. It allows you to develop cloud native apps in C/C++, Java and .NET. You get the capabilities of scale up by simply adding workers to your cloud. The framework automatically distributes your applications to the new machine(s), and once installed, will begin sending work to them as requests come in. So in effect the cloud handles your queueing issue for you.
Check out the Download and Community links for more information.
You should try AWS- Amazon's offering a free tier that gives you storage, messaging and micro instances (only linux). you can start developing small try-outs without paying. writing an application that scales isn't that hard- try to break your flow into small, concurrent tasks. client-server applications are even easier- use a load balancer to raise\terminate servers by demand.

Resources