InfluxDB Line Protocol Performance 1.8 vs. 2.0

I noticed a considerable (5x) performance penalty when upgrading my app from InfluxDB 1.8 to 2.0. I'm using the line protocol (ILP), and the only changes I made on the ingress side were switching from /write to /api/v2/write, adding the auth token, and adding the org and bucket query parameters. I am writing 25,000 data points per second in bursts of 100 data points, using 10 connections in parallel. My laptop's CPU runs at ~18% with 1.8 and at ~94% with 2.0.
Is this expected behavior? Did some default database performance settings change?
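For clarity, the only ingress-side change I made looks roughly like this (a sketch; the host, database, org, bucket, and token values are placeholders, not my real ones):

    import requests

    line = "cpu,host=server01 usage=0.64 1612137600000000000"  # line protocol point

    # InfluxDB 1.8: POST to /write, target database as a query parameter
    requests.post("http://localhost:8086/write",
                  params={"db": "mydb"},
                  data=line)

    # InfluxDB 2.0: POST to /api/v2/write, org/bucket as query parameters,
    # auth token in the Authorization header
    requests.post("http://localhost:8086/api/v2/write",
                  params={"org": "my-org", "bucket": "my-bucket"},
                  headers={"Authorization": "Token my-token"},
                  data=line)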

Related

Neo4j enterprise cluster master not fully utilizing CPU

We have a Neo4j v3.0.4 Enterprise cluster running on an AWS machine with 16 cores, and when we issue a lot of requests to it, it seems to utilize at most ~40% of the CPU (watching the box with htop, it only seems to use 6 cores). Disk and network IO on the box both look negligible during the test.
[Screen cap of CPU profile: the flat part is when we hit it with load.]
Requests are routed to the DB via a cluster of Spring Boot apps using Spring Data Neo4j 4, and from our investigations those servers do not appear to form any bottleneck from a memory, CPU, or network IO point of view.
We are currently NOT using Bolt, nor are we using causal clustering; however, we are planning to move to both. In the interim, is there anything that might cause this type of behavior? Could our DB be misconfigured? Could this be a JVM-level problem?
Any advice is much appreciated - thanks!

Performance issues with Neo4j Spatial and OSM data

This is my first project using Neo4j and the associated Spatial plugin. I am experiencing performance well below what I was expecting and below what's needed for this project. As a noob I may be missing or misunderstanding something. Help is appreciated and needed.
I am experiencing very slow response times from Neo4j and the Spatial plugin when trying to find the OSM ways surrounding a point specified by lat/lon, in order to process GPS readings from a driven trip. I am calling spatial.closest('layer', {lon: lon, lat: lat}, 0.01), which takes 6-11 seconds to process and returns approximately 25-100 nodes.
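For reference, this is roughly how I invoke it from the Python client (a sketch; the URI, credentials, and coordinates are placeholders):

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        result = session.run(
            "CALL spatial.closest($layer, {lon: $lon, lat: $lat}, 0.01)",
            layer="layer", lon=-71.06, lat=42.36)
        nodes = [record["node"] for record in result]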
I am running Neo4j Community Edition 3.0.4 and Spatial 0.20 on a MacBook Pro (16GB RAM / 512GB SSD). The OSM data is massachusetts-latest.osm (Massachusetts, USA). I am accessing it via Bolt and Cypher. Instrumented testing has been done from the browser client, a Python client, and a Java client, as well as with a custom version of Spatial that reports timing for the spatial stored procedure. The Neo4j database is approximately 44GB in size and contains 76.5M nodes and 118.2M relationships. The schema and data are 'as-is' from the OSMImport.
To isolate the performance I added a custom version of spatial.closest() named spatial.timedClosest(). The timedClosest() stored procedure takes the same input and makes the same calls as spatial.closest(), but returns a stream that carries timing information for the stored procedure instead of the standard result stream.
The stored procedure execution time is split evenly between the internal calls to getLayerOrThrow() and SpatialTopologyUtils.findClosestEdges().
1) Why does getLayer(layerName) take so long to execute? I am very surprised to observe that getLayer(layerName) takes 2.5-5 seconds. There is only one layer, the OSM layer, directly off the root node. I see the same hit on calls to spatial.getLayer(). Since the layer is an argument to many of the spatial procedures, this is a big deal. Does anyone have insight into this?
2) Is there a way to speed up SpatialTopologyUtils.findClosestEdges()? Are there additional indexes that could be added to speed up the spatial proximity search?
My understanding is that Neo4j is capable of handling billions of nodes/relationships. For this project I am planning to load the North America OSM data. From my understanding of the Spatial plugin, it has spatial management and searching capabilities that would provide a good starting foundation.
@Bo Guo, sorry for the delayed response. I've been away from Neo4j for a bit. I replaced the existing indexing with geohash indexing (https://en.wikipedia.org/wiki/Geohash). As the OSM data was loaded, the roadways and boundaries were tested for intersections in geohash regions. Geohash worked nicely for lookup. Loading the OSM data was still a bear: North America, on an 8-core mid-range AMD server with SATA SSDs, took several days to a week.
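To make the geohash approach concrete, here is a toy sketch (not the project's actual code) of the lookup side: encode a representative point of each way, bucket ways by their geohash cell, and fetch candidates by the query point's cell:

    from collections import defaultdict

    _BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

    def geohash(lat, lon, precision=6):
        """Standard geohash encoding: interleave lon/lat bisection bits."""
        lat_lo, lat_hi = -90.0, 90.0
        lon_lo, lon_hi = -180.0, 180.0
        code, bits, ch, even = [], 0, 0, True
        while len(code) < precision:
            if even:  # even bits refine longitude
                mid = (lon_lo + lon_hi) / 2
                if lon >= mid:
                    ch, lon_lo = ch * 2 + 1, mid
                else:
                    ch, lon_hi = ch * 2, mid
            else:     # odd bits refine latitude
                mid = (lat_lo + lat_hi) / 2
                if lat >= mid:
                    ch, lat_lo = ch * 2 + 1, mid
                else:
                    ch, lat_hi = ch * 2, mid
            even = not even
            bits += 1
            if bits == 5:  # every 5 bits become one base32 character
                code.append(_BASE32[ch])
                bits, ch = 0, 0
        return "".join(code)

    index = defaultdict(list)  # geohash cell -> way ids

    def add_way(way_id, lat, lon):
        index[geohash(lat, lon)].append(way_id)

    def candidate_ways(lat, lon):
        # A real lookup would also scan the 8 neighboring cells to handle
        # points near a cell boundary.
        return index[geohash(lat, lon)]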

Neo4j Huge database query performance configuration

I am new to Neo4j and graph databases. That said, I have around 40,000 independent graphs uploaded into a Neo4j database using batch insertion, and so far everything has gone well. My current database folder size is 180GB; the problem is querying, which is too slow. Just counting the number of nodes takes forever. I am using a server with 1TB of RAM and 40 cores, so I would like to load the entire database into memory and perform queries on it.
I have looked into the configuration options but am not sure what changes I should make to cache the entire database in memory. Please suggest the properties I should modify.
I also noticed that most of the time Neo4j uses only one or two cores. How can I increase that?
I am using the free version for a university research project, so I am unable to use the High-Performance Cache. Is there an alternative in the free version?
My Solution:
I added more graphs to my database, and now my database size is 400GB with more than a billion nodes. Following Stefan's comments, I used the Java APIs to access my database and moved the database to a RAM disk. It now takes 3 hours to walk through all the nodes and collect information from each node.
RAM disk and Java APIs gave a big boost in performance.
Counting nodes in a graph is a global operation that obviously needs to touch each and every node. If the caches are not populated (or not configured for your dataset), the speed of your hard drive is the most influential factor.
To speed things up, make sure the caches are configured efficiently; see http://neo4j.com/docs/stable/configuration-caches.html.
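For the pre-2.2 configuration style that page describes, the cache settings live in conf/neo4j.properties. A sketch with purely illustrative values (size them to your actual store files; from 2.2 on, the memory-mapping settings are replaced by a single dbms.pagecache.memory setting):

    # conf/neo4j.properties - illustrative values only
    neostore.nodestore.db.mapped_memory=10G
    neostore.relationshipstore.db.mapped_memory=50G
    neostore.propertystore.db.mapped_memory=50G
    neostore.propertystore.db.strings.mapped_memory=20G
    # object cache; 'soft' is the Community default, 'gcr' is Enterprise-only
    cache_type=soft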
With current versions of Neo4j, a Cypher query traverses the graph in single-threaded mode. Since most graph applications out there are used concurrently by multiple users, this model still saturates the available cores.
If you want to run a single query multithreaded, you need to use the Java API.
In general, the Neo4j Community edition has some limitations in scaling beyond 4 cores (the Enterprise edition has a more performant lock manager implementation). The HPC (high-performance cache) in the Enterprise edition also significantly reduces the impact of full garbage collections.
What Neo4j version are you using?
Please share your current config (conf/* and data/graph.db/messages.log). You can use the personal edition of Neo4j Enterprise.
What kinds of use cases do you want to run?
Counting all nodes is probably not your main operation anyway (and there are ways in the Java API to make it faster).
For efficient multi-core usage, run multiple clients or write Java code that utilizes more cores during traversal with thread pools, as sketched below.
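That advice is about the embedded Java API; purely as an illustration of the sharding pattern, here is a sketch using the Python Bolt driver instead (the URI, credentials, and shard count are placeholders):

    import concurrent.futures
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    SHARDS = 8  # roughly the number of cores you want busy

    def count_shard(shard):
        # Each worker handles a disjoint subset of the nodes; every query
        # still scans the store, but the shards run on separate cores.
        with driver.session() as session:
            return session.run(
                "MATCH (n) WHERE id(n) % $shards = $shard RETURN count(n) AS c",
                shards=SHARDS, shard=shard).single()["c"]

    with concurrent.futures.ThreadPoolExecutor(max_workers=SHARDS) as pool:
        total = sum(pool.map(count_shard, range(SHARDS)))
    print(total)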

Is DataSnap Optimized for responding to more than 1k users at the same time?

We want to start a big multi-tier application. The server-side application must respond to more than 1000 users at the same time. We want to build the server application with the 64-bit compiler and the client side with the 32-bit compiler. We don't know whether DataSnap can serve all of these clients without problems.
The server computer is very powerful (multi-processor, more than 16GB of RAM), and the database management system is Firebird 2.5.
You need a way to perform realistic load tests.
For the Firebird database, you can simulate concurrent users with the free Apache JMeter tool. It can run SQL statements and record their execution-time statistics (average, min/max, etc.). So you could, for example, create a thread group with twenty different SQL queries and then run twenty threads which each perform these queries sequentially.
JMeter lets you define time limits on the SQL query and treats it as an error if a query exceeds this limit. You can then try to find the maximum client count at which the overall error rate is still less than (for example) five percent.
But you also need to know how high the expected database load will be, and you need a test database of realistic size, not just a couple of records. Also, some database queries, such as reports, might cause higher load; these should be included in the simulation too, as they can affect overall performance. In JMeter, you can create a second thread group, running in parallel with the first one, for these long-running statements with different settings (fewer simulated clients).
Testing the database will show whether there is a bottleneck in this area already. For example, the test result could be that the database can serve twenty clients with a total average transaction rate of 20 TPS (transactions per second), which means each client executes one transaction per second. This TPS value will decrease as the user count grows.
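The same idea can be sketched outside JMeter as well; here is a minimal Python version, where run_query is a placeholder standing in for a real Firebird statement and all numbers are illustrative:

    import time
    from concurrent.futures import ThreadPoolExecutor

    TIME_LIMIT = 2.0           # seconds; slower responses count as errors
    QUERIES_PER_CLIENT = 100

    def run_query():
        time.sleep(0.05)       # placeholder for an actual SQL statement

    def client():
        errors = 0
        for _ in range(QUERIES_PER_CLIENT):
            start = time.perf_counter()
            run_query()
            if time.perf_counter() - start > TIME_LIMIT:
                errors += 1
        return errors

    def error_rate(clients):
        with ThreadPoolExecutor(max_workers=clients) as pool:
            return sum(pool.map(lambda _: client(), range(clients))) / (
                clients * QUERIES_PER_CLIENT)

    # Raise the client count until the error rate crosses the 5% threshold.
    for n in (10, 20, 50, 100):
        print(n, error_rate(n))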
Related question: Firebird usage in big projects, which also has a link to http://www.firebirdsql.org/en/case-studies-catalog/
Regarding DataSnap client load simulation: this can be done with a scripted client which runs a predefined set of statements/commands over the connection.
To run a high number of load-test clients simultaneously, you could use a service like Amazon Elastic Compute Cloud (EC2) to launch clones of your test machine image, saving hardware costs. But of course I would start with a small client machine which simply runs ten or twenty scripted clients.
As far as I know, DataSnap is based on Indy, and Indy's connection handling model (one thread per connection) is not very scalable and very resource-consuming. Even using Indy's thread pools is not an option, I think. Also, on 32-bit Windows, for example, there is a limit on the maximum number of threads you can create (2000, IIRC, since each thread's default 1MB stack must fit in the 2GB user address space). Anyway, using many threads is not good and hurts server performance (for reference: the Windows Internals book, the Windows Performance Team blog, etc.).
A scalable, robust, professional application server would use IO completion ports (IOCP) for data processing, but I don't know whether DataSnap can take advantage of them.
UPDATE:
At CodeRage 7 I asked similar scalability questions. Here are the answers:
Q: Recently there was a question on StackOverflow about DataSnap's scalability/performance. Can DataSnap handle, for example, 2000 or more concurrent user requests at the network and application level?
A: The scalability is based on the scalability of TCP/HTTP/HTTPS and the number of connections allowed by your server operating system, and also on the memory and hardware you employ. There is no specific limit in DataSnap.
My comment: While this is true, Indy's connection handling model, i.e. one thread per connection, introduces a bottleneck, especially on 32-bit Windows (2000 threads max). On Win64 it should be less of a problem, but again, this way of handling data flow leads to performance degradation.
Q: Does DataSnap support some kind of load balancing?
A: Not directly. You can do this in code in your DataSnap server(s).
My comment: I've found a very good paper on implementing failover/load balancing in DataSnap on Andreano Lanusse's blog.
Q: Does DataSnap support IO completion ports for better scalability?
This question of mine was left unanswered.
Hope this helps!
UPDATE2:
I found a very interesting post on DataSnap performance: DataSnap analysis based on Speed & Stability tests
UPDATE3:
DataSnap, Deployment, Performance, and More (Marco Cantu)
Monitoring and control of connections in DataSnap XE2 - translated into English
Monitoring and control of connections in DataSnap XE2 - original
When the specifications for a system are drawn up, you need to be very precise when it comes to multiple users.
For example: you create a website, and the client expects 15,000 unique users.
Then the client usually comes up with a requirement that the system must support 15,000 simultaneous users, which is very naive.
You'll need a more detailed specification than that.
Usually it's more sensible to say something like: in 99% of the requests, 99% of the users get a response to their request within an average of 5 seconds.
In normal usage, you'll never see all users send a request within the same second. If at some point they all arrive within the same minute (also very unlikely), you'll end up with far fewer concurrent users.
Even for websites with tens of thousands of users, where most of them connect on a daily basis, the web server is idle most of the time; once in a while it jumps to 5%, or in extreme cases to 20%. If we really had to serve all of these users at once we'd be screwed, but that never happens, and it's not realistic to provision a server for such loads.
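A back-of-the-envelope calculation shows how far apart "unique users" and "concurrent users" really are (the numbers below are illustrative assumptions, not figures from the question):

    # Rough sketch: concurrent sessions ~= arrival rate * session duration
    daily_users = 15000
    peak_hour_share = 0.20    # assume the busiest hour sees 20% of daily traffic
    session_seconds = 300     # assume an average 5-minute session

    arrivals_per_second = daily_users * peak_hour_share / 3600.0
    concurrent = arrivals_per_second * session_seconds
    print(round(concurrent))  # ~250 concurrent sessions, nowhere near 15,000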

Why is membase server so slow in response time?

I have a problem: membase is being very slow in my environment.
I am running several production servers (Passenger) on Rails 2.3.10 and Ruby 1.8.7.
Those servers communicate with 2 membase machines in a cluster.
The membase machines each have 64GB of memory, a 100GB EBS volume attached to them, and 1GB of swap.
My problem is that membase is VERY slow in response time and is actually the slowest part of the whole application lifecycle right now.
My question is: why?
The Rails gem I am using is memcache-northscale.
The membase server is 1.7.1 (the latest).
The cluster is doing between 2K and 7K ops per second.
The response time from membase (based on NewRelic) is 250ms on average, which is HUGE and unreasonable.
Does anybody know why this is happening?
What can I do in order to improve this time?
It's hard to say immediately with the data at hand, but I have a few things you may wish to dig into to narrow down where the issue may be.
First of all, do your membase stats show a significant number of background fetches? This is the "disk reads per second" statistic in the Web UI. If so, that's the likely culprit for the higher latencies.
You can read more about the statistics and sizing in the manual, particularly the sections on statistics and cluster design considerations.
Second, you're reporting 250ms on average. Is this a sliding average or an overall one? Do you have 90th or 99th percentile latencies? A few outlying disk fetches can produce a large average even when most requests (for example, those served from RAM without disk fetches) are actually quite speedy.
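As a quick illustration of how a few disk fetches can dominate the mean (the numbers below are made up, not from your system):

    import statistics

    # 95 fast RAM hits plus 5 slow background disk fetches
    latencies_ms = [2] * 95 + [5000] * 5

    mean = statistics.mean(latencies_ms)                        # ~252 ms
    p90 = sorted(latencies_ms)[int(0.90 * len(latencies_ms))]   # 2 ms
    print(mean, p90)  # the average looks terrible while 90% of requests are fast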
Are your systems spread across availability zones? What kind of instances are you using? Are the clients and servers in the same Amazon AWS region? I suspect the answer to the first may be "yes", which, from recent measurements, means about 1.5ms of overhead when using xlarge instances. This can matter if you're doing a lot of fetches synchronously and serially in a given method.
I expect it's all in one region, but it's worth double checking since those latencies sound like WAN latencies.
Finally, there is an updated Ruby gem, backwards compatible with Fauna. Couchbase, Inc. has been working to contribute changes back to Fauna upstream. If possible, you may want to try the gem referenced here:
http://www.couchbase.org/code/couchbase/ruby/2.0.0
You will also want to look at running Moxi on the client side. To access Membase, you go through a proxy (called Moxi). By default it is installed on the server, which means you might send a request to one of the servers that doesn't actually have the key; Moxi will go get it, but then you're doubling the network traffic.
Installing Moxi on the client side will eliminate this extra network traffic: http://www.couchbase.org/wiki/display/membase/Moxi
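From the application's point of view only the address changes; a minimal sketch (in Python with python-memcached for brevity, assuming a client-side Moxi listening on the default memcached port):

    import memcache  # python-memcached

    # The app talks to the local Moxi, which routes each key to the
    # membase node that actually owns it.
    mc = memcache.Client(["127.0.0.1:11211"])
    mc.set("some_key", "value")
    print(mc.get("some_key"))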
Perry
