Multiple unary RPC calls vs. long-running bidirectional streaming in gRPC?

I have a use case where many clients need to keep sending a lot of metrics to the server (almost perpetually). The server needs to store these events, and process them later. I don't expect any kind of response from the server for these events.
I'm thinking of using gRPC for this. Initially, I thought client-side streaming would do (like how Envoy does it), but the issue is that client-side streaming cannot ensure reliable delivery at the application level (i.e. if the stream closes partway through, there is no way to know how many of the messages sent were actually processed by the server), and I can't afford that.
My thought process is that I should either go with bidirectional streaming, with acks in the server stream, or multiple unary RPC calls (perhaps with some batching of the events in a repeated field for performance).
Which of these would be better?

the issue is that client-side streaming cannot ensure reliable delivery at the application level (i.e. if the stream closes partway through, there is no way to know how many of the messages sent were actually processed by the server), and I can't afford that
This implies you need a response. Even if the response is just an acknowledgement, it is still a response from gRPC's perspective.
The general approach should be "use unary," unless streaming solves a large enough problem to overcome its complexity costs. I discussed this at CloudNativeCon NA 2018 (slides and a YouTube video of the talk are available).
For example, if you have multiple backends, then each unary RPC may be sent to a different backend. That may cause high overhead for those various backends to synchronize themselves. A streaming RPC chooses a backend at the beginning and continues using the same backend, so streaming might reduce the frequency of backend synchronization and allow higher performance in the service implementation. But streaming adds complexity when errors occur, and in this case it makes the RPCs long-lived, which is more complicated to load balance. So you need to weigh whether the added complexity from streaming/long-lived RPCs provides a large enough benefit to your application.
We don't generally recommend using streaming RPCs for higher gRPC performance. It is true that sending a message on a stream is faster than starting a new unary RPC, but the improvement is fixed and comes with higher complexity. Instead, we recommend using streaming RPCs when they would provide higher application (your code) performance or lower application complexity.
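To make the unary-with-batching option concrete, here is a minimal client-side sketch in Python. The service, message, and field names (MetricsService, PushMetrics, MetricBatch, events) are hypothetical stand-ins for whatever your .proto defines, assumed to be compiled with grpcio-tools into metrics_pb2/metrics_pb2_grpc.

    # Sketch only: assumes a hypothetical metrics.proto compiled with grpcio-tools
    # into metrics_pb2 / metrics_pb2_grpc, roughly:
    #   service MetricsService { rpc PushMetrics(MetricBatch) returns (PushAck); }
    #   message MetricBatch { repeated Metric events = 1; }
    import grpc
    import metrics_pb2
    import metrics_pb2_grpc

    BATCH_SIZE = 100

    def push_forever(metric_source, target="metrics.example.com:443"):
        channel = grpc.secure_channel(target, grpc.ssl_channel_credentials())
        stub = metrics_pb2_grpc.MetricsServiceStub(channel)
        batch = []
        for metric in metric_source:       # e.g. a generator yielding Metric messages
            batch.append(metric)
            if len(batch) >= BATCH_SIZE:
                # One unary call per batch; the ack means the whole batch was
                # processed, so a failed call can simply be retried.
                stub.PushMetrics(metrics_pb2.MetricBatch(events=batch), timeout=5.0)
                batch.clear()

Each successful call acknowledges an entire batch, which keeps the retry logic trivial at the cost of one round trip per batch.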

Streams guarantee that messages are delivered in the order they were sent, which means that if there are concurrent messages, there will be some kind of bottleneck.
Google's gRPC team advises against using streams over unary calls for performance. Nevertheless, it has been argued that streams should theoretically have lower overhead, but that does not seem to be true.
For a lower number of concurrent requests, both seem to have comparable latencies. However, for higher loads, unary calls are much more performant.
There is no apparent reason to prefer streams over unary calls, given that using streams comes with additional problems like:
Poor latency when we have concurrent requests
Complex implementation at the application level
Lack of load balancing: the client will connect with one server and ignore any new servers
Poor resilience to network interruptions (even a small interruption in the TCP connection will fail the stream)
Some benchmarks here: https://nshnt.medium.com/using-grpc-streams-for-unary-calls-cd64a1638c8a
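For contrast, here is a rough sketch of what the other option from the question, bidi streaming with acks in the server stream, could look like on the client side. The RPC and field names (StreamMetrics, sequence, Ack) are hypothetical, and the reconnect/replay logic that makes this reliable is exactly the extra complexity being discussed.

    # Sketch only: hypothetical bidi RPC
    #   rpc StreamMetrics(stream Metric) returns (stream Ack);
    # where each Ack carries the sequence number of a message the server has persisted.
    import grpc
    import metrics_pb2
    import metrics_pb2_grpc

    def stream_with_acks(metric_source, target="metrics.example.com:443"):
        channel = grpc.secure_channel(target, grpc.ssl_channel_credentials())
        stub = metrics_pb2_grpc.MetricsServiceStub(channel)

        unacked = {}                        # seq -> metric; replay these on reconnect

        def requests():
            for seq, metric in enumerate(metric_source):
                metric.sequence = seq       # client-assigned sequence number
                unacked[seq] = metric
                yield metric

        # The response stream carries the acks; if the stream fails, everything left
        # in `unacked` has to be resent on a new stream by the caller.
        for ack in stub.StreamMetrics(requests()):
            unacked.pop(ack.sequence, None)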

Related

Real time audio conversation iOS

I am designing an iOS app for a customer who wants to allow real-time (minimum lag, max 50ms) conversations between users (a sort of TeamSpeak). The lag must be low because the audio can also be live music played with instruments, so all the users need to stay in sync. I need a server which will receive audio from every client and forward it to the others (and make them hear the same sound at the same time).
HTTP is easy to manage/implement and easy to scale, but performs poorly because an average HTTP request takes > 50ms (on mid-level hardware), so I was thinking of TCP or UDP connections kept open between the clients and the server.
But I have some questions:
If I develop the server in Python (using TwistedMatrix, for example), how will it perform?
I can't develop the server in C++ because it is hard to develop and hard to manage (and scale).
Has anyone used Node.js (which is easy to scale) to manage TCP/UDP connections?
If I use HTTP, will it be fast enough with Keep-Alive? Because the time usually required for an HTTP request is > 50ms (opening and closing connections is expensive), and I want the total procedure to take less than that.
The server will be running on a Linux machine.
And finally: which type of compression can you suggest? I thought Ogg Vorbis would be nice, but if there's anything better (and usable on iOS), I am open to changes.
Thank you,
Umar.
First off, you are not going to get sub 50 ms latency. Others have tried this. See for example http://ejamming.com/ a service that attempts to do what you are doing, but has a musically noticeable delay over the line and is therefore, in the ears of many, completely unusable. They use special routing techniques to get the latency as low as possible and last I heard their service doesn't work with some router configurations.
Secondly, what language you use on the server probably doesn't make much difference, as the delay from client to server will be worse than any delay caused by your service. But if I understand your service correctly, you are going to need a lot of servers (or server threads) just relaying audio data between clients or doing some sort of minimal mixing. This is a small amount of work per connection, but a lot of connections, so you need something that can handle that. I would lean towards something like Java, Scala, or maybe Go. I could be wrong, but I don't think this is a good use-case for Node, which, as I understand it, does not do multithreading well at this time. Also, don't poo-poo C++; scalable services have been built in C++. You could also build the relay part of the service in C++ and the rest in whatever.
Third, when choosing a compression format, you'll have to choose one that can survive packet loss if you plan to use UDP, and I think UDP is the only way to go for this. I don't think vorbis is up to this task, but I could be wrong. Off the top of my head, I'm not sure of anything that works on the iPhone and is UDP friendly, but I'm sure there are lots of things. Speex is an example and is open-source. Not sure if the latency and quality meet your needs.
Finally, to be blunt, I think there are some other things you should research a bit more. E.g. DNS is usually cached locally and not looked up on every HTTP call (though it may depend on the system/library; at least most systems cache DNS locally). Also, there is no such protocol as TCP/UDP. There is TCP/IP (sometimes just called TCP) and UDP/IP (sometimes just called UDP), and you seem to refer to the two as if they were one. The difference is very important for what you are doing. For example, HTTP runs on top of TCP, not UDP, and UDP is considered "unreliable" but has less overhead, so it's good for streaming.
Edit: speex
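As a rough illustration of the UDP route this answer suggests, here is a minimal Python sketch of pushing already-encoded audio frames as datagrams. The host, port, and framing are placeholders; a real client would also need a jitter buffer and loss concealment on the receiving side.

    # Minimal sketch: pushing small, already-encoded audio frames over UDP with the
    # standard socket module. Host, port and framing are placeholders; a real
    # receiver also needs a jitter buffer and loss concealment.
    import socket

    SERVER = ("audio.example.com", 9999)     # hypothetical relay server

    def send_frames(encoded_frames):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for seq, frame in enumerate(encoded_frames):
            # 2-byte sequence number so the receiver can detect loss and reordering.
            packet = (seq % 65536).to_bytes(2, "big") + frame
            sock.sendto(packet, SERVER)

    def receive_frames(port=9999):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        while True:
            packet, _addr = sock.recvfrom(2048)
            seq = int.from_bytes(packet[:2], "big")
            yield seq, packet[2:]            # hand off to a jitter buffer / decoder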
As far as the server is concerned, the request itself is not the bottleneck. I guess you have sufficient time to set up the connection, as that happens only at the beginning of the session, so the protocol used for it is not of much relevance.
But consider that HTTP is a stateless protocol and not suitable for audio streaming. There are a couple of real-time streaming protocols you can choose from. All of them work over TCP or UDP (i.e. over plain sockets), and there are plenty of implementations.
In your case, the bottleneck for latency is not the server but the network itself. The connection between an iOS device and a wireless access point (AP) eats up about 40ms if the AP is not misconfigured and the connection is good (ping your iPhone). In total, you'd have a minimum of 80ms for the path iOS -> AP -> server -> AP -> iOS, and it is difficult to keep that latency stable. (Typical latency of AirPlay on my local network is about 300ms.)
I think live music over iOS devices is not practicable today. Try Skype between two iOS devices and see how close you can get to 50ms. I'd bet no one can do significantly better as far as latency is concerned.
Update: New research result!
I have to revise my claims regarding the latency of the iDevice's wifi connection. Apparently, the first time you ping your device the latency will be bad, but if I ping again no later than 200ms after that, I see an average latency of 2-3ms between the AP and the iDevice.
My interpretation is that if there is no communication between AP and iDevice for more than 200ms, the network adapter of the iDevice will go to a less responsive sleep mode, probably to save battery power.
So it seems, live music is within reach again... :-)
Update 2
The ping interval required to keep latency low apparently differs from device to device. The reported 200ms is for a 3rd-gen iPad; for my iPhone 4 it's more like 50ms.
While streaming audio you probably don't need to bother with this, as data is exchanged frequently anyway. In my own context, I have sparse communication between an iDevice and a server, but low latency is crucial, so a keep-alive is the way to go.
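A minimal Python sketch of that keep-alive idea, assuming (as a guess, not something the answer states) that traffic sent from the app side is enough to keep the radio out of its sleep mode; the endpoint and interval are placeholders and, as noted above, the interval needs tuning per device.

    # Sketch of the keep-alive idea: a tiny datagram every ~50ms so the wifi link
    # never sits idle long enough to drop into its power-saving state. The endpoint
    # and interval are placeholders; tune the interval per device as noted above.
    import socket
    import threading
    import time

    def start_keepalive(server=("audio.example.com", 9999), interval=0.05):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def loop():
            while True:
                sock.sendto(b"ka", server)   # 2-byte no-op packet
                time.sleep(interval)

        threading.Thread(target=loop, daemon=True).start()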
Best, Peter

Large number of WebSocket connections

I am writing an application that keeps track of content pushed around between users working on a certain task. I am thinking of using WebSockets to send down new content as it becomes available to all users who are currently using the app for that task.
I am writing this on Rails and the client-side app is on iOS (probably going to be on Android too). I'm afraid that this WebSocket solution might not scale well, so I am after some advice and things to consider when deciding between WebSockets and some kind of polling solution.
Would Ruby on Rails servers (hosted on something like Heroku) support a large number of WebSockets open at the same time? Let's say a million connections for argument's sake. Is there any material anyone can point me to on this?
Would it cost a lot more on server hosting if I architect it this way?
Is it even possible to maintain millions of WebSockets simultaneously? I feel like this may not be the best design decision.
This is my first try at a proper Rails API. Any advice is greatly appreciated. Thx.
A million connections over WebSockets using Ruby? I can't see that being realistic unless you use clustering to spread the connections between different instances that handle all the data processing.
The problem here is serializing and deserializing data.
You also have to research how often you will need to push data from the server to the client, and whether it is worth doing periodic checks using AJAX rather than holding a connection open the whole time, because if you hold a connection open and then don't use it, that is a waste of resources. WebSockets are built on top of TCP, and the connections are not "cheap"; going through all of them and asking the OS whether data is available again is not a simple process either. With millions of connections it is almost impossible without using the most advanced technology available.
I have heard that Erlang is able to handle millions of connections, but I don't have details on it. Also, holding the connections is one thing; processing data and interaction between connections is another. You might want to check this, because if you have heavy processing algorithms, then you definitely need to look into horizontal scaling over clustering solutions.
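For scale: a single-process broadcast server is only a handful of lines with an asyncio-based library; the hard part this answer points at is everything around it (clustering, routing, back-pressure). A minimal sketch using the third-party websockets package, with host/port as placeholders (handler signature is for websockets >= 11; older versions also pass a path argument):

    # Sketch of a single-process broadcast server using the third-party `websockets`
    # package (pip install websockets). One asyncio event loop holds every
    # connection; clustering/sharding would sit above this layer.
    import asyncio
    import websockets

    connected = set()

    async def handler(ws):
        connected.add(ws)
        try:
            async for _ in ws:              # inbound messages ignored in this sketch
                pass
        finally:
            connected.discard(ws)

    async def broadcast(message):
        for ws in list(connected):
            try:
                await ws.send(message)
            except websockets.ConnectionClosed:
                connected.discard(ws)

    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8765):
            await asyncio.Future()          # run forever

    # asyncio.run(main())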
If you are implementing chat, use websockets.
If you are implementing 1 way messages in realtime use server sent events.
If you are implementing 1 way messages sent every few hours or so, use APNS.
The saying goes: phone in hand, use WebSockets / server-sent events.
Phone in pocket, use APNS.
APNS will alleviate wifi dips, TCP/IP socket hangs and many other issues. Really useful. There is the chance that it may take a little time to get through, but then again, there is the chance that WebSockets will take a while too.
Recent versions of iOS let you send an APNS notification to the client without a popup message, so the app can ask the server for more information. That, along with some backgrounding implementations, really improves things.
If possible, do not implement totally anonymous clients. It is very tricky to detect whether a client has reinstalled the app, so you'll end up sending duplicates to the client. You need to take that into account.
APNS looks trivial to implement in Ruby, but I'd suggest resisting the urge and going with an existing gem/service that supports both Google and Apple. It is much trickier to implement than it may seem at first.
If you decide to stick with WebSockets, it may make sense to just leverage WebSockets in nginx via something like https://github.com/wandenberg/nginx-push-stream-module
ASIDE:
Using SMS where speed is critical is very expensive. At $1/month per phone number, each number can send at a maximum rate of 1 message per second. So sending 100 messages per second needs 100 numbers = $100/month plus per-message fees. Note that sending 100 messages at a rate of 50 messages/second only needs 50 numbers = $50/month, but then sending 1k messages takes 20 seconds.
Good luck

What is the performance overhead of Apache ActiveMQ vs. raw sockets?

We're looking to implement ActiveMQ to handle messaging between two of our servers, over a geographically diverse environment (Australia to the UK and back, via the internet).
I've been looking for even vague indicators of performance around the net but so far have had no luck.
My question: compared to a DIY TCP/SSL implementation of basic messaging, how would ActiveMQ perform? Similar systems of our own can send and receive messages across Australia in 100-150ms, over a SSL layer with an already established connection.
Also, does ActiveMQ keep its TLS/SSL connections open, thus saving the substantial amount of time that would otherwise be spent on connection creation/teardown?
What I am hoping is that it will at least perform better than HTTPS, at a per-request level.
I am aware that performance can vary remarkably, depending on hardware, networks, code and so on. I'm just after something to start with.
I know the above is a little fuzzy - if you need any clarification please let me know and I will only be too happy to oblige.
Thank you.
What Tim means is that this is not an apples-to-apples comparison. If you are solely concerned with the performance of a single point-to-point connection for transferring data, a direct link will give you a good result (although DIY is still a dubious design decision). If you are building a system that requires the transfer of data and you have more complex functional requirements, then a broker-based messaging platform like ActiveMQ comes into play.
You should consider broker-based messaging if you want:
a post-office style system where a producer sends a message, and knows that it will be consumed at some point, even if there is no consumer there at that time
to not care where the consumer of a message is, or how many of them there are
a guarantee that a message will be consumed, even if the consumer that first handles it dies midway through processing it (transactions, redelivery)
many consumers, with a guarantee that a message will only be consumed once - queues
many consumers that will each react to a single message - topics
These patterns are pretty standard and apply to all off-the-shelf messaging products. As a general rule, DIY in this domain is a bad idea, as messaging is complex (see http://www.ohloh.net/p/activemq/estimated_cost for an estimate of how long it would take you to do the same), and there are many existing implementations of various flavours (some without a broker) that are all well used, commercially supported and don't require you to maintain them. I would think very hard before going down to the TCP level for any sort of data transfer, as there is so much prior art.
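To give a feel for how little client code a broker buys you, here is a minimal sketch of talking to ActiveMQ over STOMP with the third-party stomp.py package. The broker host, credentials, and queue name are placeholders, and the listener signature follows the stomp.py 8.x API (older versions differ).

    # Sketch only: talking to ActiveMQ over its STOMP port with the third-party
    # stomp.py package (pip install stomp.py). Broker host, credentials and queue
    # name are placeholders.
    import stomp

    class EventListener(stomp.ConnectionListener):
        def on_message(self, frame):
            print("received:", frame.body)

    conn = stomp.Connection([("broker.example.com", 61613)])
    conn.set_listener("", EventListener())
    conn.connect("user", "password", wait=True)

    # The broker holds messages until a consumer takes them, even if no consumer
    # was online when they were sent.
    conn.subscribe(destination="/queue/events", id=1, ack="auto")
    conn.send(destination="/queue/events", body="order-created:42")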

Websocket scalability, broadcasting concerns

If you have a complex set of requirements with many users (and servers), how will your WebSocket infrastructure (server[s]) scale, especially with broadcasting?
Of course, broadcasting is not part of any WebSocket spec, but it's there even in basic chat examples (a.k.a. the hello world of WebSockets).
A client-side solution (clients asking for new data), given WebSockets' low latency and relatively cheap (no HTTP headers) nature, still seems more scalable to me than a server-side (broadcasting) solution.
Edit:
OK, just imagine that you want to replace all your AJAX code with WebSocket implementations, which may mean a great many connections across many different contexts. This adds enormous complexity to your system if you want to keep track of every possible broadcasting scenario.
Low-level (network/thread etc.) implementation suggestions are also part of the problem, not the solution, because they mean you have to write a special server, unlike general HTTP servers.
Moreover, broadcasting brings a stateful nature to the table, which can't easily scale. Think about adding more servers and load balancing.
Scaling realtime web solutions can be a complex problem, but it is one that services like Pusher (who I work for) have solved, and one for which there are definitely well-defined solutions for self-hosted realtime web stacks too. The PubSub paradigm is well understood and has been solved many times; in order to solve the problem there needs to be some state (who is subscribing to what). This paradigm is used when broadcasting in the types of scenarios you are talking about.
Realtime web technologies have been built with large numbers of simultaneous connections in mind - many from the ground up. If you wanted to create a scalable solution, you would most likely use an existing realtime web server that supports WebSockets: in the same way that it's highly unlikely you would implement your own HTTP server, you are unlikely to want to implement your own WebSocket-capable server from scratch.
Dedicated realtime web servers also let you separate your application logic from your realtime communication mechanism (separation of concerns). Your application might need to maintain some state, but the realtime technology deals with managing subscriptions and connections. How communication between the application and the realtime web technology is achieved is up to you, but frequently message queues are used, and Redis specifically is very popular in this space.
HTTP polling may be conceptually easier to understand - you can maintain statelessness, and with each HTTP poll request you specify exactly what you are looking for. But it most definitely means that you will need to start scaling much sooner (adding more resources to handle the load).
WebSocket polling is something I've not considered before, and I don't think I've seen it suggested anywhere before either; the idea that the client should say "I'm ready for my next set of data and here's what I want" is an interesting one. WebSockets have generally taken a leap away from the request/response paradigm, but there may be scenarios where combining the efficiency of WebSockets with request/response has some benefits. The SocketStream application framework might be worth a look as it might be relevant; after the initial application load all communication is performed over WebSockets, which means that even basic request/response functionality uses WebSockets.
However, since we are talking about broadcasting data we need to go back to the PubSub paradigm where it makes much more sense to have active subscriptions and when new data is available that new data is distributed to those active subscriptions (pushed). All your application needs to know is if there are any active subscriptions or not in order to decide whether to publish the data or not. That problem has been solved.
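A minimal sketch of the application/realtime-server split described above, using Redis pub/sub via the redis package. The channel naming and the broadcast callback are placeholders, and the WebSocket side is assumed to look something like the broadcast loop sketched earlier.

    # Sketch of the application / realtime-server split, using Redis pub/sub via the
    # redis package (pip install redis). Channel names and payloads are placeholders;
    # `broadcast` stands in for whatever pushes to the open WebSocket connections.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Application side: publish whenever new data is available. The app doesn't know
    # or care how many realtime servers or subscribers there are.
    def publish_update(task_id, payload):
        r.publish(f"task:{task_id}", payload)

    # Realtime-server side: subscribe and forward each message to the WebSocket
    # connections it holds (connection handling omitted here).
    def run_bridge(task_id, broadcast):
        p = r.pubsub()
        p.subscribe(f"task:{task_id}")
        for message in p.listen():
            if message["type"] == "message":
                broadcast(message["data"])   # bytes payload -> connected clients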
The idea of websockets is that you keep a persistent connection with each client. When there is new data that you want to send to every client, you already know who all the clients are so you should just send it.
It sounds like you want each client to constantly be sending requests to the server for new data. Why? It seems like that would waste everyone's bandwidth, and I don't know why you think it would be more scalable. Maybe you could add more detail to your question, like what kind of information you are broadcasting, how often, how many bytes, how many clients, etc.
Why not just consider an open websocket connection to be like a standing request from the client for more data?

What is the most common approach for designing large scale server programs?

OK, I know this is pretty broad, but let me narrow it down a bit. I've done a little bit of client-server programming, but nothing that would need to handle more than just a couple of clients at a time. So I was wondering, design-wise, what the most mainstream approach to these servers is, and whether people could point me to tutorials, books, or ebooks.
Haha ok. didn't really narrow it down. I guess what I'm looking for is a simple but literal example of how the server side program is setup.
The way I see it: the client sends a command; the server receives the command and puts it into a queue; the server has either a single dedicated thread or a thread pool that constantly polls this queue and then sends the appropriate response back to the client. Is non-blocking I/O often used?
I suppose just tutorials, time and practice are really what I need.
*EDIT: Thanks for your responses! Here is a little more of what I'm trying to do I suppose.
This is mainly for the purpose of learning so I'd rather steer away from use of frameworks or libraries as much as I can. Take for example this somewhat made up idea:
There is a client program that does some work and constantly streams its output to a server (there can be many of these clients); the server then computes statistics and stores most of the data. And let's say there is an admin client that can log into the server; if any clients are streaming data to the server, it in turn streams that data to each of the connected admin clients.
This is how I envision the server program logic:
The server would have 3 threads for managing incoming connections (one for each port it listens on), each spawning a thread to manage every accepted connection:
1) ClientConnection, which would basically just receive output, which we'll just say is text
2) AdminConnection, which would be for sending commands between the server and the admin client
3) AdminDataConnection, which would basically be for streaming client output to the admin client
When data comes in from a client, the server parses what is relevant and puts that data in a queue, let's say adminDataQueue. In turn there is a thread that watches this queue and every 200ms (or whatever) checks whether there is data; if there is, it cycles through the AdminDataConnections and sends it to each.
Now for the AdminConnection: this would be for any commands or direct requests for data. So you could request statistics; the server side would receive the statistics command, send a command saying "incoming statistics", and then immediately after that send a statistics object or the data.
As for the AdminDataConnection, it is just the output from the clients with maybe a few simple commands intertwined.
Aside from the bandwidth concern of all the client data being funneled together to each of the admin clients, what sort of problems would arise from this design due to scaling issues (again neglecting bandwidth between the clients and the server, and between the admin clients and the server)?
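A minimal Python sketch of the adminDataQueue plus broadcaster-thread piece described above; socket and connection handling are omitted, admin_connections is assumed to be maintained by the accept threads, and a blocking get is used instead of the fixed 200ms poll.

    # Minimal sketch of the adminDataQueue + broadcaster thread described above.
    # Socket handling is omitted; admin_connections is assumed to be a list of
    # objects with a send() method, maintained by the AdminDataConnection threads.
    import queue
    import threading

    admin_data_queue = queue.Queue()
    admin_connections = []
    admin_connections_lock = threading.Lock()

    def on_client_data(parsed_line):
        # Called by each ClientConnection thread after parsing a line of output.
        admin_data_queue.put(parsed_line)

    def broadcaster():
        while True:
            # A blocking get avoids the fixed 200ms polling delay in the design above.
            item = admin_data_queue.get()
            with admin_connections_lock:
                for conn in list(admin_connections):
                    try:
                        conn.send(item)
                    except OSError:
                        admin_connections.remove(conn)

    threading.Thread(target=broadcaster, daemon=True).start()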
There are a couple of basic approaches to doing this.
Worker threads or processes. Apache does this in most of its multiprocessing modes. In some versions of this, a thread or process is spawned for each request when the request arrives; in other versions, there's a pool of waiting threads which are assigned work as it arrives (avoiding the fork/thread create overhead when the request arrives).
Asynchronous (non-blocking) I/O and an event loop. This is basically using the UNIX select call (although FreeBSD and Linux provide more optimized alternatives, kqueue and epoll respectively). lighttpd uses this approach and is able to achieve very high scalability, but any in-server computation blocks all other requests. Concurrent dynamic request handling is passed on to separate processes (via CGI) or waiting processes (via FastCGI or its equivalent).
I don't have any particular references handy to point you to, but looking at the websites of open source projects that use these different approaches, for information on their design, wouldn't be a bad start.
In my experience, building a worker thread/process setup is easier when working from the ground up. If you have a good asynchronous framework that integrates fully with your other communications tasks (such as database queries), however, it can be very powerful and frees you from some (but not all) thread locking concerns. If you're working in Python, Twisted is one such framework. I've also been using Lwt for OCaml lately with good success.
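As a concrete illustration of the second approach, here is a minimal single-threaded echo server using Python's standard-library selectors module (which picks select/epoll/kqueue automatically). The port is a placeholder, and as noted above, any slow work inside a callback blocks every other connection.

    # Minimal sketch of the event-loop approach: one thread, non-blocking sockets,
    # and the standard-library selectors module dispatching readiness events.
    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server_sock):
        conn, _addr = server_sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)
        if data:
            conn.send(data)                 # echo; a real server would parse/dispatch
        else:                               # empty read means the peer closed
            sel.unregister(conn)
            conn.close()

    server = socket.socket()
    server.bind(("0.0.0.0", 8000))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, _mask in sel.select():
            key.data(key.fileobj)           # dispatch to accept() or handle()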
