I made a simple program which waits for 60 seconds. I have 300 input elements to process.
According to this document, the number of threads per worker is 1 in batch mode and 300 in streaming mode:
https://cloud.google.com/dataflow/docs/resources/faq#beam-java-sdk
In streaming mode, with 1 worker and 300 threads, the job should complete in 2 to 3 minutes, allowing for the overhead of spawning workers and so on. My understanding is that each of the 300 input elements gets its own thread, all 300 threads sleep for 60 seconds in parallel, and the job finishes. However, the job takes much longer to complete.
Similarly, in batch mode with 1 worker (1 thread) and 300 input elements, it should take about 300 minutes (300 elements × 60 seconds each, processed sequentially).
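For reference, the processing step is essentially the following (a simplified sketch, not my exact code; SleepFn is an illustrative name):

import org.apache.beam.sdk.transforms.DoFn;

// Simulates 60 seconds of work per input element.
public class SleepFn extends DoFn<Integer, Integer> {
  @ProcessElement
  public void processElement(ProcessContext c) throws InterruptedException {
    Thread.sleep(60_000); // each element "waits" for 60 seconds
    c.output(c.element());
  }
}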
Can someone clarify what happens at the worker level?
There is considerable overhead in starting up and tearing down worker VMs, so it's hard to generalize from a short experiment such as this. In addition, there's no promise that there will be a given number of threads for streaming or batch, as this is an implementation-dependent parameter that may change at any time for any runner (and indeed may even be chosen dynamically).
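That said, if you want a more controlled experiment, the Dataflow runner does expose options that influence (though still don't guarantee) parallelism; check the docs for the SDK version you're on:

--numWorkers=1 --maxNumWorkers=1 --numberOfWorkerHarnessThreads=300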
Neo4j 3.5.12 Community Edition
Ubuntu Server 20.04.2
RAM: 32 GB
EC2 instance with 4 or 8 CPUs (I change it to match the processing load at the time)
Database files: 6.5 GB
Python, WSGI, Flask
dbms.memory.heap.initial_size=17g
dbms.memory.heap.max_size=17g
dbms.memory.pagecache.size=11g
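(For what it's worth, that's 17 GB heap + 11 GB page cache = 28 GB committed, leaving roughly 4 GB of the 32 GB for the OS and everything else.)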
I'm seeing high CPU use on the server in what appears to be a random pattern. I've profiled all the queries for the pages that I know people are visiting at those times, and they are all optimised, with execution times under 50 ms in all cases. The CPU use doesn't seem linked to user numbers, which are very low at most times anyway (max 40 concurrent users). I've checked all queries in cron jobs too.
I reduced the number of database nodes significantly and that made no difference to performance.
I warm the database by preloading all nodes into RAM with:
MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
The pattern is that there will be a few minutes of very low CPU use (as I would expect from this setup with these user numbers), and then processing on most CPU cores goes up to the high 90%s and the machine becomes unresponsive to new requests. Changing to an 8-CPU instance sorts it, but that shouldn't be needed for this level of traffic.
I would like to profile the queries with query logging, but the community edition doesn't support that.
Thanks.
Run a CPU profiler such as perf to record where CPU time is spent. You can then visualize the result as a FlameGraph or, since your bursts only occur at random intervals, visualize it over time with Netflix's FlameScope.
Since Neo4j is a Java application, it might also be worthwhile to have a look at async-profiler, which is priceless when it comes to profiling Java applications (it generates similar FlameGraphs and can output log files compatible with FlameScope or JMC).
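For example (a sketch; substitute the Neo4j process ID and adjust durations and paths to your setup):

# Sample on-CPU stacks for the Neo4j JVM for 2 minutes at 99 Hz
perf record -F 99 -g -p <neo4j_pid> -- sleep 120
# Fold the stacks and render a FlameGraph (scripts from the FlameGraph repo)
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu.svg
# async-profiler resolves JIT-compiled Java frames that plain perf cannot
./profiler.sh -d 120 -f flamegraph.html <neo4j_pid>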
I host a static web page with Caddy Server on Google's managed Kubernetes service. I monitor this site with Google Stackdriver.
I deploy my code via a Helm chart. See the source code here (permalink).
I have noticed that the latency will oscillate between about 40 ms and 200 ms every 3 minutes. Almost like clockwork.
As far as I can tell there isn't an obvious correlation between CPU usage and the latency period.
Here is a histogram of CPU usage and latency:
What could be causing this oddly repetitive pattern?
I was attempting to evaluate various Rails server solutions. First on my list was an nginx + passenger system. I spun up an EC2 instance with 8 gigs of RAM and 2 processors, installed nginx and passenger, and added this to the nginx.conf file:
passenger_max_pool_size 30;
passenger_pool_idle_time 0;
rails_framework_spawner_idle_time 0;
rails_app_spawner_idle_time 0;
rails_spawn_method smart;
I added a little "awesome" controller to rails that would just render :text => (2+2).to_s
Then I spun up a little box and ran this to test it:
ab -n 5000 -c 5 'http://server/awesome'
And the CPU while this was running on the box looked pretty much like this:
05:29:12 PM CPU %user %nice %system %iowait %steal %idle
05:29:36 PM all 62.39 0.00 10.79 0.04 21.28 5.50
And I'm noticing that it takes only 7-10 simultaneous requests to bring the CPU to <1% idle, and of course this begins to seriously drag down response times.
So I'm wondering, is a lot of CPU load just the cost of doing business with Rails? Can it only serve a half dozen or so super-cheap requests simultaneously, even with a giant pile of RAM and a couple of cores? Are there any great perf suggestions to get me serving 15-30 simultaneous requests?
Update: I tried the same thing on one of the "super mega lots and lots of CPUs" EC2 instances. Holy crap was that a lot of CPU power. The sweet spot seemed to be about 2 simultaneous requests per CPU; I was able to get it up to about 630 requests/second at 16 simultaneous requests. I don't know if that's actually cost-efficient compared to a lot of little boxes, though.
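(Sanity check on those numbers: by Little's law, 16 concurrent requests ÷ 630 requests/second ≈ 25 ms average response time.)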
I must say that my Rails app got a massive boost, from supporting about 20 concurrent users initially to about 80, after adding some memcached servers (4 medium instances on EC2). I run a high-traffic sports site that really hit it big a few months ago. Database size is about 6 GB with heavy updates/inserts.
MySQL (RDS large, high usage) caching also helped a bit.
I've tried playing with the passenger settings but got some curious results - for example, each thread eats up 250 MB of RAM, which is odd considering the application isn't that big.
You can also save massive $ by using spot instances, but don't rely entirely on them - their pricing seems to spike on occasion. I'd autoscale with two policies - one with spot instances and another with on-demand (read: reserved) instances.
When load testing a basic web application, what sanity checks do you do other than expected response time?
Is it fair to ask for peak memory usage?
What other checks do you make?
On the server
Requests per second the application can withstand
Requests per second that hit the database (if any, related to the number above, but it's useful to have them as separate figures)
Transferred bandwidth (separated by media type, if possible)
CPU utilization
Memory utilization
On the client
Response time
Weight of the average page
Whether CPU usage is high at any time
Run something like YSlow to see what you can optimize on the output to make it quick for users
Stress testing tools usually come with most of these measures (except for memory, CPU and database usage), as YSlow or Firebug do on the client.
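As a concrete sketch of how to collect the server-side numbers (ab and sar are one choice among many; sar is from the sysstat package):

# On the client: drive the load
ab -n 5000 -c 25 'http://server/page'
# On the server, in parallel: sample CPU (-u) and memory (-r) every second, 60 times
sar -u -r 1 60 > load-test-metrics.log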
We look at a pretty wide variety of metrics when analyzing the results of a load test.
On the server, we start with these four main categories:
CPU (% utilization, context switches/sec, process queue length)
Memory (% use, page reads/sec, page writes/sec)
Bandwidth (incoming, outgoing, send & receive errors, # connections, connection failures, segment retransmits/sec)
Disk (Disk I/O Time %, avg service time, queue length, reads and writes/sec)
We also like to look at metrics specific to the web server and application server in use. For example, in IIS we look at IIS connection counts, cache hit rates and turnover frequency, etc. In .NET, we would be looking at ASP.NET Requests/sec, ASP.NET Last Request Execution Time, ASP.NET Current Requests, ASP.NET Queued Requests, ASP.NET Request Wait Time, ASP.NET Errors/sec and many others.
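On Windows, counters like these can be captured from the command line with typeperf - a sketch (exact counter paths vary by IIS/.NET version):

typeperf "\Processor(_Total)\% Processor Time" "\Memory\Pages/sec" "\ASP.NET\Requests Queued" -si 5 -o counters.csv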
On the client side, we are primarily looking at total load time for the pages, duration and TTFB (time to first byte) for critical transactions, bandwidth usage, average page size and failure rate. We also find two metrics very useful - we call them Waiting Users and Average Wait Time. Not many tools have these - they tell you, at each sample period, exactly how many simulated users are in the process of retrieving a resource from the server and how long, on average, they have been waiting for the resource to arrive. We find these very useful for:
determining when the server has reached its capacity
discovering that the server has stopped responding to certain types of requests (typically for certain resources, such as those requiring a database query)
Another good sanity check is to run the tests for at least 24 hours. We do that because one app ran nicely for a few hours and then degraded; we discovered some issues with scheduled tasks as well as DB connection pooling.
There are a number of services online that can do this type of testing for you as well. Of course, one of the downsides to this approach is that it's harder to correlate the data from the service (which is what can be observed externally) with your own internal data about disk I/O, DB ops, etc. If you end up going this route, I would suggest finding a vendor that will give you programmatic access to the raw test result data.