Flask request.data is slow


We have a few clients in Asia and the USA and we're seeing this strange behavior when calling request.data when handling their POST requests:
The Singapore client is super fast (< 10 ms)
The USA clients are not as fast (50 - 100 ms)
The Chinese client is the slowest (200+ ms)
We got these numbers from cProfile, so they should be accurate (I think?). Payloads vary between 50 and 700 bytes per client, and latency does not seem to track payload size: the Singapore client sends a medium-sized POST payload, while the Chinese one sends a small one.
After looking at this question, I suspect we're facing something similar: the request is handed to our code as soon as the headers arrive, so calling request.data blocks until the full POST payload has been received. My guess is that the Chinese client is the slowest because the Great Firewall (GFW) slows down transmission of the POST payload.
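The suspected behavior can be reproduced without Flask. The sketch below is a hypothetical stand-in, not Werkzeug's code: a socketpair plays the client connection, slow_client sends the headers at once and the body 0.2 s later (an assumed delay imitating a slow path), and the reader parses headers, then blocks reading Content-Length bytes the way request.data does. read_headers, slow_client and measure are illustrative names.

```python
import socket
import threading
import time


def read_headers(f):
    """Read the request line and header block; return Content-Length (0 if absent)."""
    f.readline()  # request line, e.g. b"POST /ingest HTTP/1.1\r\n"
    content_length = 0
    while True:
        line = f.readline()
        if line in (b"\r\n", b"\n", b""):
            return content_length  # a blank line ends the header block
        name, _, value = line.decode("latin-1").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())


def slow_client(sock, body, delay):
    """Send the headers immediately and the body `delay` seconds later."""
    sock.sendall(
        b"POST /ingest HTTP/1.1\r\nHost: api.example\r\n"
        b"Content-Length: %d\r\n\r\n" % len(body)
    )
    time.sleep(delay)  # stands in for a slow or throttled network path
    sock.sendall(body)


def measure(delay=0.2, body=b"x" * 300):
    server_sock, client_sock = socket.socketpair()
    sender = threading.Thread(target=slow_client, args=(client_sock, body, delay))
    sender.start()
    with server_sock.makefile("rb") as f:
        t0 = time.perf_counter()
        length = read_headers(f)  # returns as soon as the headers arrive
        t1 = time.perf_counter()
        data = f.read(length)  # like request.data: waits for every body byte
        t2 = time.perf_counter()
    sender.join()
    server_sock.close()
    client_sock.close()
    return data, t1 - t0, t2 - t1


data, header_wait, body_wait = measure()
print(f"headers after {header_wait * 1000:.1f} ms, body after another {body_wait * 1000:.1f} ms")
```

Almost all of the measured time lands on the body read, which matches what cProfile attributes to request.data.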
I have two questions:
Does the analysis make sense?
How can I fix this? This behavior seems quite inefficient, since my API instance stays blocked for the extra time and wastes CPU cycles. It would work better if the request were fully received before being sent to the API instance.
FWIW, I inherited this code base, so there may be gaps in my understanding, but in our DCOS architecture an external Marathon-LB sits in front of the API instances. I looked for configuration options in the external Marathon-LB to increase buffering or to forward only fully received requests, but I didn't find any.

Looks like I figured this one out!
Apparently Marathon-LB is a wrapper around HAProxy, and HAProxy has a mechanism to receive the full HTTP request payload before forwarding it to the backend. Adding the http-buffer-request option to the Marathon-LB configuration seems to have done the trick!
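The idea behind http-buffer-request (wait for the whole body at the proxy, then forward) can be sketched in a few lines of asyncio. This is not HAProxy's implementation: the event loop waits for the slow body, which costs no worker, and only the complete request is handed to a thread pool standing in for the API instance. The StreamReader is fed by hand to simulate a slow client; buffer_request, handler and feed_slowly are hypothetical names, and the 0.2 s delay is an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def buffer_request(reader):
    """Read the header block and the complete body before any worker is involved."""
    await reader.readline()  # request line
    length = 0
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # end of headers
        name, _, value = line.decode("latin-1").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return await reader.readexactly(length)  # waiting here ties up no worker thread


def handler(body):
    """Stand-in for the API code: the body is already in memory, nothing blocks."""
    return len(body)


async def feed_slowly(reader, body, delay):
    # Simulates a slow client by feeding the stream by hand instead of from a socket
    reader.feed_data(b"POST /ingest HTTP/1.1\r\nContent-Length: %d\r\n\r\n" % len(body))
    await asyncio.sleep(delay)
    reader.feed_data(body)
    reader.feed_eof()


async def main(delay=0.2, body=b"x" * 300):
    reader = asyncio.StreamReader()
    feeder = asyncio.create_task(feed_slowly(reader, body, delay))
    t0 = time.perf_counter()
    complete = await buffer_request(reader)
    t1 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as workers:
        size = await asyncio.get_running_loop().run_in_executor(workers, handler, complete)
    t2 = time.perf_counter()
    await feeder
    return size, t1 - t0, t2 - t1


size, buffered_s, worker_s = asyncio.run(main())
print(f"{size} bytes; buffered for {buffered_s:.3f} s; worker busy for {worker_s:.4f} s")
```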

Related

Slowness in the geolocation API

I'm working on a project that uses HERE's geolocation service.
The project is a feature in our system that routes a list of addresses. The routing runs every day over at least 7,000 points.
Today we use the HERE service to geocode these addresses and then send them to our routing service. However, we are hitting a huge bottleneck: of the 7,000 points we use for testing, only about 200 get geocoded. If we send more points than that, we simply stop receiving responses, with no timeout or error of any kind.
About the implementation: we do not send all points in one request; each point is geocoded in its own request. We throttled our software to four requests per second, suspecting a QPS limit, but that did not solve the problem. We also considered a message queue, but that could increase the total geocoding + routing time, which would make the solution unfeasible for us.
In the code, an array stores the addresses to be geocoded, and for each element we execute a GET request to the following URL: https://geocoder.ls.hereapi.com/6.2/geocode.json?apiKey=TOKEN&searchtext=ADDRESS
Any help finding a solution would be appreciated.
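A sketch of the described loop that makes the silent failures visible, under assumptions: requests go out sequentially at a fixed rate (4 per second, as in the question), each request has an explicit client-side timeout so a stalled call raises instead of hanging, and failures are retried and recorded rather than lost. The endpoint and the apiKey/searchtext parameters come from the question; geocode_all, build_url, http_fetch and the injectable fetch parameter are hypothetical. This does not address any server-side quota.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://geocoder.ls.hereapi.com/6.2/geocode.json"  # endpoint from the question


def build_url(address, api_key):
    # urlencode escapes spaces, commas and accented characters in the address
    return BASE_URL + "?" + urllib.parse.urlencode({"apiKey": api_key, "searchtext": address})


def http_fetch(url, timeout=10.0):
    # An explicit timeout turns a silently stalled request into an exception
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def geocode_all(addresses, api_key, fetch=http_fetch, per_second=4.0, retries=2):
    """Geocode sequentially at a fixed rate; return (results, failures) keyed by address."""
    interval = 1.0 / per_second
    next_slot = time.monotonic()
    results, failures = {}, {}
    for address in addresses:
        url = build_url(address, api_key)
        last_error = None
        for _attempt in range(retries + 1):
            delay = next_slot - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # keep at most `per_second` requests per second
            next_slot = max(next_slot, time.monotonic()) + interval
            try:
                results[address] = fetch(url)
                last_error = None
                break
            except (OSError, ValueError) as exc:  # URLError/timeouts are OSError; bad JSON is ValueError
                last_error = exc
        if last_error is not None:
            failures[address] = repr(last_error)
    return results, failures
```

Usage: results, failures = geocode_all(addresses, "TOKEN"). After a run, the failures dict shows whether requests past the first ~200 are timing out, being reset, or returning errors.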

For a large number of geocodes, you may wish to consider the Batch Geocoder API:
https://developer.here.com/documentation/batch-geocoder/dev_guide/topics/quick-start-batch-geocode.html
I cannot replicate a problem with more than 200 Geocoder requests in a row, so we may need to see some code before we can help further.

Are you using our freemium service? Note that version 6.2 of the Geocoder API no longer receives new feature development, so if you are still implementing this use case, please switch to v7. Do you mean that you cannot send all 7,000 addresses and fetch the responses, even in chunks? It could also be that your Linux system restricts the number of simultaneous network connections in its pool. Try sending the requests from an endpoint that is not behind a firewall (for example, from home) and from a Windows system.

Multiple unary rpc calls vs long-running bidirectional streaming in grpc?

I have a use case where many clients need to keep sending a lot of metrics to the server (almost perpetually). The server needs to store these events, and process them later. I don't expect any kind of response from the server for these events.
I'm thinking of using gRPC for this. Initially, I thought client-side streaming would do (as Envoy does), but client-side streaming cannot ensure reliable delivery at the application level: if the stream closes midway, there is no way to know how many of the sent messages the server actually processed, and I can't afford that.
My thought process is, I should either go with bidi streaming, with acks in the server stream, or multiple unary rpc calls (perhaps with some batching of the events in a repeated field for performance).
Which of these would be better?
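The "multiple unary calls with batching" option might look like this, assuming a hypothetical send_batch callable that wraps one unary RPC carrying the events in a repeated field. The RPC's OK status acts as the acknowledgement: a failed call leaves the batch buffered for the next flush. Because a call can succeed on the server yet fail on the client (for example, a deadline expiring), retries can deliver a batch twice, so the server should tolerate duplicates.

```python
import time
from typing import Callable, List, Optional


class EventBatcher:
    """Buffers events and sends each batch as one unary call (hypothetical send_batch)."""

    def __init__(self, send_batch: Callable[[List[dict]], None],
                 max_events: int = 100, max_age_s: float = 1.0):
        self.send_batch = send_batch
        self.max_events = max_events
        self.max_age_s = max_age_s
        self._buf: List[dict] = []
        self._oldest: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self._buf:
            self._oldest = time.monotonic()
        self._buf.append(event)
        # Flush on size or age; an idle buffer would also need a periodic timer
        if (len(self._buf) >= self.max_events
                or time.monotonic() - self._oldest >= self.max_age_s):
            self.flush()

    def flush(self) -> bool:
        if not self._buf:
            return True
        batch = self._buf
        try:
            self.send_batch(batch)  # OK status == the server accepted the whole batch
        except Exception:  # with grpcio this would typically be grpc.RpcError
            return False  # keep the batch; the next flush retries it
        self._buf = []
        self._oldest = None
        return True
```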

the issue is that client side streaming cannot ensure reliable delivery at application level (i.e. if the stream closed in between, how many messages that were sent were actually processed by the server) and I can't afford this
This implies you need a response. Even if the response is just an acknowledgement, it is still a response from gRPC's perspective.
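If you do go with bidi streaming, the acknowledgement bookkeeping is roughly the following, sketched without any gRPC dependency: the client numbers each event and keeps it until the server acks it, the server acks the highest contiguous sequence number it has stored, and after a stream break the client replays everything unacked while the server ignores duplicates. AckingSender and DedupingReceiver are hypothetical names; a real server would track state per client and persist events before acking.

```python
from collections import OrderedDict


class AckingSender:
    """Client side: number every event and keep it until the server acks it."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = OrderedDict()  # seq -> event, in send order

    def send(self, event):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = event
        return seq, event  # this pair is written to the request stream

    def on_ack(self, acked_through):
        # Cumulative ack: the server has stored every seq <= acked_through
        while self.unacked and next(iter(self.unacked)) <= acked_through:
            self.unacked.popitem(last=False)

    def to_resend(self):
        # After the stream breaks, replay everything not yet acknowledged
        return list(self.unacked.items())


class DedupingReceiver:
    """Server side: store each seq once and ack the highest contiguous seq stored."""

    def __init__(self):
        self.stored = {}
        self.acked_through = 0

    def receive(self, seq, event):
        self.stored.setdefault(seq, event)  # a replayed duplicate is ignored
        while self.acked_through + 1 in self.stored:
            self.acked_through += 1
        return self.acked_through  # written to the response stream as the ack
```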
The general approach should be "use unary," unless streaming solves problems large enough to outweigh its complexity costs. I discussed this at the 2018 CloudNativeCon NA (there's a link to slides and YouTube for the video).
For example, if you have multiple backends, each unary RPC may be sent to a different backend, which may impose high overhead on those backends to synchronize with each other. A streaming RPC chooses a backend at the beginning and keeps using it, so streaming might reduce how often the backends must synchronize and allow higher performance in the service implementation. But streaming adds complexity when errors occur, and in this case it would make the RPCs long-lived, and long-lived RPCs are more complicated to load balance. So you need to weigh whether the added complexity of streaming/long-lived RPCs provides a large enough benefit to your application.
We don't generally recommend streaming RPCs as a way to get higher gRPC performance. Sending a message on a stream is indeed faster than starting a new unary RPC, but the gain is a fixed per-message saving and comes with higher complexity. Instead, we recommend streaming RPCs when they would provide higher application (your code) performance or lower application complexity.

Streams ensure that messages are delivered in the order they were sent, which means concurrent messages on the same stream are serialized and become a bottleneck.
Google’s gRPC team advises against using streams instead of unary calls for performance; nevertheless, some have argued that streams should theoretically have lower overhead. In practice, that does not seem to hold.
For a lower number of concurrent requests, both seem to have comparable latencies. However, for higher loads, unary calls are much more performant.
There is no apparent reason to prefer streams over unary calls, given that streams come with additional problems:
Poor latency when we have concurrent requests
Complex implementation at the application level
Lack of load balancing: the client will connect with one server and ignore any new servers
Poor resilience to network interruptions (even small interruptions in TCP connections will fail the connection)
Some benchmarks here: https://nshnt.medium.com/using-grpc-streams-for-unary-calls-cd64a1638c8a

How is SIP scaled for high load?

Basically, I want to implement a VoIP system with SIP on a VPS. But it seems it would not be able to handle more than ~20 simultaneous calls (just bare SIP). What are the workarounds to this problem? Can the SIP server be used just as a database that tells clients where to find their intended targets, like P2P? I am quite new to SIP, so any additional info is appreciated.

Your VPS looks pretty low-end, and if it can't handle more than 20 CPS (calls per second), that suggests it has topped out on CPU. Correct me if that's not the case.
Options to Scale SIP
Off-the-shelf SIP load balancer: available as virtual, hardware, and open-source products, in every flavor you want. It sits in front of your farm of SIP servers and can be configured to spread the load across them.
Unless the nature of your SIP server is defined, it is hard to understand the bottlenecks you face, and without that it's difficult to give a simple solution.

SIP scalability comes from delegating as much work to the endpoints and doing as little on the servers as possible.
What you describe is a "redirect server": it accepts and stores registrations from the endpoints (softphones, hardphones, etc), and responds with "3xx redirect" to incoming calls and forgets about them immediately.
This is probably the most extreme example of server minimization. SIP is a very versatile protocol: it lets you set up your server infrastructure in many different ways, with varying degrees of control over calls, so you can trade features for performance.
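A toy version of the redirect server described above, for illustration only: it stores REGISTER contacts in memory and answers each INVITE with "302 Moved Temporarily" pointing at the registered contact, then forgets the call. It ignores Expires, authentication, TCP/TLS and most of RFC 3261; handle, serve, registrations and the sample addresses are hypothetical, and replies go to the packet's source address rather than following Via routing.

```python
import secrets
import socket

# Address-of-record ("bob@example.com") -> registered contact URI (in memory only)
registrations = {}

CANONICAL = {"via": "Via", "from": "From", "to": "To", "call-id": "Call-ID", "cseq": "CSeq"}


def parse(msg):
    """Split a SIP request into method, Request-URI and a dict of lower-case headers."""
    lines = msg.split("\r\n\r\n", 1)[0].split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, uri, headers


def bare_uri(value):
    """'"Bob" <sip:bob@host>;tag=1' or 'sip:bob@host;transport=udp' -> 'sip:bob@host'."""
    if "<" in value:
        value = value[value.index("<") + 1:value.index(">")]
    return value.split(";", 1)[0]


def aor(value):
    uri = bare_uri(value)
    return uri[4:] if uri.startswith("sip:") else uri


def response(code, reason, headers, extra=()):
    lines = [f"SIP/2.0 {code} {reason}"]
    for name, canonical in CANONICAL.items():
        for value in headers.get(name, []):
            if name == "to" and ";tag=" not in value:
                value += ";tag=" + secrets.token_hex(4)  # final responses carry a To tag
            lines.append(f"{canonical}: {value}")
    lines.extend(extra)
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


def handle(msg):
    method, uri, headers = parse(msg)
    if method == "ACK":
        return None  # ACK for our final response: nothing to send back
    if method == "REGISTER":
        if headers.get("contact"):
            registrations[aor(headers["to"][0])] = bare_uri(headers["contact"][0])
        return response(200, "OK", headers)
    if method == "INVITE":
        target = registrations.get(aor(uri))
        if target is None:
            return response(404, "Not Found", headers)
        # Redirect and forget: the caller re-sends its INVITE straight to the callee
        return response(302, "Moved Temporarily", headers, [f"Contact: <{target}>"])
    return response(501, "Not Implemented", headers)


def serve(host="0.0.0.0", port=5060):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, addr = sock.recvfrom(65535)
            try:
                reply = handle(data.decode("utf-8", "replace"))
            except (ValueError, KeyError, IndexError):
                continue  # ignore malformed messages
            if reply:
                sock.sendto(reply.encode("utf-8"), addr)
```

The server keeps no per-call state at all, which is why this mode scales so far on small hardware.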
Even the flimsiest VPS should be able to handle the signalling for way more than 20 parallel calls even in full "stateful proxy" mode.
Just make sure media (the RTP streams) is not routed through your server. Set up STUN to help endpoints behind firewalls and NAT send media to each other directly.
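What an endpoint actually asks a STUN server is small: a 20-byte Binding Request, answered with the public address and port the server saw, encoded as XOR-MAPPED-ADDRESS (RFC 5389). The sketch below handles IPv4 only and does none of the ICE candidate exchange a softphone would also need; public_address and the server you pass it are assumptions.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442


def binding_request():
    txid = os.urandom(12)  # 96-bit transaction ID
    # Message type 0x0001 (Binding Request), length 0 (no attributes), magic cookie
    return struct.pack("!HHI", 0x0001, 0, MAGIC_COOKIE) + txid, txid


def parse_xor_mapped_address(resp, txid):
    msg_type, length, cookie = struct.unpack("!HHI", resp[:8])
    if msg_type != 0x0101 or cookie != MAGIC_COOKIE or resp[8:20] != txid:
        raise ValueError("not a matching Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", resp[pos:pos + 4])
        value = resp[pos + 4:pos + 4 + attr_len]
        if attr_type == 0x0020 and value[1] == 0x01:  # XOR-MAPPED-ADDRESS, IPv4 family
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            ip = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", ip)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS in response")


def public_address(server, timeout=2.0):
    """server is a (host, port) tuple for a STUN server of your choice."""
    req, txid = binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(req, server)
        resp, _ = s.recvfrom(2048)
    return parse_xor_mapped_address(resp, txid)
```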

Real time audio conversation iOS

I am designing an iOS app for a customer who wants real-time conversations between users (a sort of Teamspeak) with minimal lag, 50 ms at most. The lag must be low because the audio can also be live music played on instruments, so all users need to stay synchronized. I need a server that collects audio from every client and sends it to the others (so that they all hear the same sound at the same time).
HTTP is easy to manage, implement, and scale, but it performs poorly here because an average HTTP request takes more than 50 ms (on mid-level hardware), so I was thinking of keeping TCP/UDP connections open between the clients and the server.
But I have some questions:
If I develop the server in Python (using TwistedMatrix, for example), how will it perform?
I can't develop the server in C++ because it is hard to develop and hard to manage at scale.
Has anyone used Node.js (which is easy to scale) to manage TCP/UDP connections?
If I use HTTP, will it be fast enough with Keep-Alive? An HTTP request usually takes more than 50 ms (because opening and closing the connection is expensive), and I want the whole procedure to take less than that.
The server will be running on a Linux machine.
And finally: which compression format would you suggest? I thought Ogg Vorbis would be nice, but if there's anything better (that can be used on iOS), I am open to changes.
Thank you,
Umar.

First off, you are not going to get sub 50 ms latency. Others have tried this. See for example http://ejamming.com/ a service that attempts to do what you are doing, but has a musically noticeable delay over the line and is therefore, in the ears of many, completely unusable. They use special routing techniques to get the latency as low as possible and last I heard their service doesn't work with some router configurations.
Secondly, the language you use on the server probably doesn't make much difference, as the client-to-server delay will be worse than any delay your service adds. But if I understand your service correctly, you are going to need a lot of servers (or server threads) just relaying audio data between clients or doing some minimal mixing. That is a small amount of work per connection, but over a lot of connections, so you need something that handles that well. I would lean towards something like Java, Scala, or maybe Go. I could be wrong, but I don't think this is a good use case for Node, which, as I understand it, does not do multithreading well at this time. Also, don't pooh-pooh C++: scalable services have been built in C++. You could also build the relay part of the service in C++ and the rest in whatever you like.
Third, when choosing a compression format, you'll need one that can survive packet loss if you plan to use UDP, and I think UDP is the only way to go for this. I don't think Vorbis is up to this task, but I could be wrong. Off the top of my head, I'm not sure what works on the iPhone and is UDP-friendly, but I'm sure there are lots of options. Speex is one example and is open source; I'm not sure whether its latency and quality meet your needs.
Finally, to be blunt, I think there are some other things you should research a bit more. For example, DNS results are usually cached locally rather than looked up on every HTTP call (this may depend on the system or library, but most systems cache DNS locally). Also, there is no such protocol as TCP/UDP. There is TCP/IP (sometimes just called TCP) and UDP/IP (sometimes just called UDP). You seem to refer to the two as if they were one, and the difference is very important for what you are doing. For example, HTTP runs on top of TCP, not UDP, while UDP is considered "unreliable" but has less overhead, so it's good for streaming.
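The relay work described above (small per packet, many connections) can be sketched as an asyncio UDP endpoint that copies each datagram to every other recently active client, without decoding or mixing. Assumptions: one session per relay, clients announce themselves simply by sending packets, and clients silent for idle_timeout seconds are dropped; AudioRelay, the port and the timeout are hypothetical.

```python
import asyncio
import time


class AudioRelay(asyncio.DatagramProtocol):
    """Forwards every datagram from one client to all other recently seen clients."""

    def __init__(self, idle_timeout=5.0):
        self.idle_timeout = idle_timeout
        self.peers = {}  # (host, port) -> time the last packet arrived
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def destinations(self, sender, now):
        self.peers[sender] = now
        # Forget clients that went silent so we stop sending into the void
        for addr in [a for a, t in self.peers.items() if now - t > self.idle_timeout]:
            del self.peers[addr]
        return [a for a in self.peers if a != sender]

    def datagram_received(self, data, addr):
        # Copy bytes only: no decoding or mixing, so per-packet work stays tiny
        for dest in self.destinations(addr, time.monotonic()):
            self.transport.sendto(data, dest)


async def main(host="0.0.0.0", port=40000):
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_datagram_endpoint(AudioRelay, local_addr=(host, port))
    try:
        await asyncio.Event().wait()  # run until cancelled
    finally:
        transport.close()

# asyncio.run(main())
```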
Edit: speex

As far as the server is concerned, the request itself is not a bottleneck. I guess you have sufficient time to set up the connection, since that happens only at the beginning of the session, so the protocol is not very relevant.
But consider that HTTP is a stateless protocol and not suitable for audio streaming. There are a couple of real-time streaming protocols you can choose from; all of them work over TCP or UDP (e.g. using raw sockets), and there are plenty of implementations.
In your case, the latency bottleneck is not the server but the network itself. The connection between an iOS device and a wireless access point (AP) eats up about 40 ms if the AP is configured correctly and the connection is good (ping your iPhone to check). In total, you'd have a minimum of 80 ms for the path iOS -> AP -> Server -> AP -> iOS. And it is difficult to keep that latency stable. (Typical AirPlay latency on my local network is about 300 ms.)
I think live music over iOS devices is not practicable today. Try Skype between two iOS devices and see how close you can get to 50 ms; I'd bet no one can do significantly better as far as latency is concerned.
Update: New research result!
I have to revise my claims about the Wi-Fi latency of iDevices. Apparently the first ping to a device shows bad latency, but if I ping again within 200 ms, I see an average latency of 2-3 ms between the AP and the iDevice.
My interpretation is that if there is no communication between AP and iDevice for more than 200ms, the network adapter of the iDevice will go to a less responsive sleep mode, probably to save battery power.
So it seems, live music is within reach again... :-)
Update 2
The ping interval required to keep latency low apparently differs from device to device. The 200 ms reported above is for a 3rd-generation iPad; for my iPhone 4 it's more like 50 ms.
While streaming audio you probably don't need to bother with this, since data is exchanged more frequently. In my own case, I have sparse communication between an iDevice and a server, but low latency is crucial, so a keepalive is the way to go.
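A minimal server-side keepalive along these lines, mirroring the ping experiment: whenever the link to the device has been idle for interval seconds, send a tiny UDP datagram so the device's Wi-Fi adapter does not drop into its slower sleep state. The intervals (about 0.2 s for the iPad, 0.05 s for the iPhone 4) are the measurements reported above; the 1-byte payload, which the app must ignore, and the UdpKeepAlive name are assumptions.

```python
import socket
import threading
import time


class UdpKeepAlive:
    """Sends a 1-byte datagram whenever the link has been idle for `interval` seconds."""

    def __init__(self, interval):
        self.interval = interval
        self.last_traffic = time.monotonic()

    def note_traffic(self, now=None):
        # Call this whenever real data is exchanged; it postpones the next keepalive
        self.last_traffic = time.monotonic() if now is None else now

    def due(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_traffic >= self.interval

    def run(self, sock, dest, stop: threading.Event):
        while not stop.is_set():
            if self.due():
                sock.sendto(b"\x00", dest)  # keepalive byte; the receiver discards it
                self.note_traffic()
            stop.wait(self.interval / 4)  # wake often enough to stay close to the interval
```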
Best, Peter

Large number of WebSocket connections

I am writing an application that keeps track of content pushed around between users working on a certain task. I am thinking of using WebSockets to push new content, as it becomes available, to all users currently using the app for that task.
I am writing this on Rails and the client side app is on iOS (probably going to be in Android too). I'm afraid that this WebSocket solution might not scale well. I am after some advice and things to consider while making the decision to go with WebSockets vs. some kind of polling solution.
Would Ruby on Rails servers (e.g. on Heroku) support a large number of WebSockets open at the same time? Let's say a million connections, for argument's sake. Can anyone point me to material on this?
Would server hosting cost a lot more if I architect it this way?
Is it even possible to maintain millions of WebSockets simultaneously? I feel like this may not be the best design decision.
This is my first try at a proper Rails API. Any advice is greatly appreciated. Thx.
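The core of "send new content to everyone currently on that task" is a per-task fan-out. Below is a transport-agnostic asyncio sketch with one bounded queue per connected client: a slow client whose queue overflows is dropped (and would reconnect and resync) instead of stalling everyone else. TaskFanout and its methods are hypothetical; this only fans out within one process, so multiple app instances would need a shared broker (Redis pub/sub is a common choice) in between.

```python
import asyncio
from collections import defaultdict


class TaskFanout:
    """In-process fan-out: one bounded asyncio.Queue per connected client, grouped by task."""

    def __init__(self, max_backlog=100):
        self.max_backlog = max_backlog
        self.subscribers = defaultdict(set)  # task_id -> set of client queues

    def subscribe(self, task_id):
        q = asyncio.Queue(maxsize=self.max_backlog)
        self.subscribers[task_id].add(q)
        return q  # the connection handler awaits q.get() and writes to its socket

    def unsubscribe(self, task_id, q):
        self.subscribers[task_id].discard(q)
        if not self.subscribers[task_id]:
            del self.subscribers[task_id]

    def publish(self, task_id, content):
        dropped = []
        for q in list(self.subscribers.get(task_id, ())):
            try:
                q.put_nowait(content)
            except asyncio.QueueFull:
                dropped.append(q)  # too slow: disconnect rather than block everyone
        for q in dropped:
            self.unsubscribe(task_id, q)
        return len(self.subscribers.get(task_id, ()))  # clients still subscribed
```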

A million WebSocket connections in Ruby doesn't seem realistic to me unless you use clustering to spread the connections across different instances that handle all the data processing.
The problem here is serializing and deserializing data.
You also need to research how often you will need to push data from the server to the client, and whether it is worth doing periodic checks with AJAX instead of holding a connection open the whole time. If you hold a connection open and then don't use it, that is a waste of resources. WebSockets are built on top of TCP, and connections are not "cheap": going through the OS to check which of them have data available is not a simple process, and with millions of connections it is almost impossible without the most advanced technologies available.
I have heard that Erlang can handle millions of connections, but I don't know the details. Also, holding connections is one thing; processing data and the interaction between connections is another, and you should check that too: if you have heavy processing algorithms, you definitely need to look into horizontal scaling with clustering.

If you are implementing chat, use WebSockets.
If you are implementing one-way real-time messages, use server-sent events.
If you are implementing one-way messages sent every few hours or so, use APNS.
As the saying goes: phone in hand, use WebSockets / server-sent events.
Phone in pocket, use APNS.
APNS will alleviate Wi-Fi dips, TCP/IP socket hangs, and many other issues. Really useful. There is the chance that it may take a little time to get through. But then again, there is the chance that websockets will take
Recent versions of iOS let you send an APNS notification to the client without a popup message, so the app can ask the server for more information. That, along with some backgrounding implementations, really improves things.
If possible, do not implement totally anonymous clients. It is very tricky to detect whether a client has reinstalled the app, so you'll end up sending duplicates to the client. You need to take that into account.
APNS looks trivial to implement in Ruby, but I'd suggest resisting the urge and using an existing gem or service that supports both Google and Apple. It is much trickier to implement than it seems at first.
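The popup-free notification mentioned above is APNs's background ("silent") push: the aps dictionary carries only content-available: 1, the request is sent with apns-push-type: background and apns-priority: 5, and the app wakes up to fetch new content itself. The sketch only builds the headers and JSON body; the HTTP/2 connection, authentication and the apns-topic header are left out, and the task_id field is a hypothetical custom key.

```python
import json


def silent_push(payload_fields):
    """Headers and JSON body for an APNs background (silent) notification."""
    headers = {
        "apns-push-type": "background",  # background push type
        "apns-priority": "5",  # background pushes must use priority 5
    }
    body = {"aps": {"content-available": 1}, **payload_fields}  # no alert, sound or badge
    return headers, json.dumps(body, separators=(",", ":"))


headers, body = silent_push({"task_id": 42})
```

On receipt, the app would use task_id to ask the server for the new content.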
If you decide to stick with websockets, it may make sense to just leverage websockets in nginx like https://github.com/wandenberg/nginx-push-stream-module
ASIDE:
Using SMS where speed is critical is very expensive. A $1/month phone number can send at most 1 message per second, so sending 100 messages per second takes 100 numbers, i.e. $100/month plus message fees. Note that with 50 numbers ($50/month), 100 messages go out at 50 messages/second; but sending 1,000 messages that way takes 20 seconds.
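The arithmetic above, as a small helper: the defaults are the answer's figures ($1/month per number, 1 message/second per number), not any particular carrier's pricing, and per-message fees are excluded. sms_plan is a hypothetical name.

```python
import math


def sms_plan(messages, target_rate, per_number_rate=1.0, monthly_fee_per_number=1.0):
    """Numbers needed for a target send rate, their monthly rental, and drain time in seconds."""
    numbers = math.ceil(target_rate / per_number_rate)
    monthly_cost = numbers * monthly_fee_per_number
    seconds = messages / (numbers * per_number_rate)
    return numbers, monthly_cost, seconds


print(sms_plan(messages=1000, target_rate=50))  # (50, 50.0, 20.0)
```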
Good luck
