Slow Query Performance

Slow Query Performance - oledb

I am running some very large databases (500 MB and 300 MB) in my application on several different machines.
From a hardware perspective, the machines have been identically configured.
I am using SQL Server CE 4.0 as my DBMS.
The performance critical query has been indexed to improve its performance.
The problem is that on [only] one of the machines, I am observing egregiously slow query performance. This usually happens after a long period of time of inactivity (from a query perspective). After I do several (about 7-8) queries, the slow performance disappears.
The weird thing is that this initial slow query performance does not happen on the other machine.
The only difference between the two machines is the data contained inside the databases.
I suspect that the distribution of data on the slow machine is somehow reducing the effectiveness of the indexing and that SQL Server CE has to rebalance the indexing in a much more significant way than on the other faster machine.
One thing I notice is that when the query is very slow, the disk activity increases significantly and the process corresponding to reading the database shows a spike in the read bytes.
This does not happen on the other machine.
Does anyone know how I might go about root causing this issue?
My code is written in C++ and uses the ATL/OLEDB API to manipulate the database.
UPDATE: My performance profiling activities indicate that it's not the query itself that is slow - it is the processing of the returned rowset that takes a while. For each row returned, I query another database for related data. I understand that this is not the right way to do it but the performance problem only happens on one machine. One thing I noticed is that when I have other unrelated queries happening on the same database in other threads, the unrelated queries will stall the query that is exhibiting the performance problem.

Related

Neo4j randomly high CPU

Neo4j 3.5.12 Community Edition
Ubuntu Server 20.04.2
RAM: 32 Gb
EC2 instance with 4 or 8 CPUs (I change it to accommodate for processing at the moment)
Database files: 6.5Gb
Python, WSGI, Flask
dbms.memory.heap.initial_size=17g
dbms.memory.heap.max_size=17g
dbms.memory.pagecache.size=11g
I'm seeing high CPU use on the server in what appears to be a random pattern. I've profiled all the queries for the pages that I know that people are visiting at those times and they are all optimised with executions under 50ms in all cases. The CPU use doesn't seem linked with user numbers which are very low at most times anyway (max 40 concurrent users). I've checked all queries in cron jobs too.
I reduced the database notes significantly and that made no difference to performance.
I warm the database by preloading all nodes into ram with MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
The pattern is that there will be a few minutes of very low CPU use (as I would expect from this setup with these user numbers) and then processing on most CPU cores goes up to the high 90%s and the machine becomes unresponsive to new requests. Changing to an 8CPU instance sorts it, but shouldn't be needed for this level of traffic.
I would like to profile the queries with query logging, but the community edition doesn't support that.
Thanks.

Run a CPU profiler such as perf to record where CPU time is spent. You can then visualize it as a FlameGraph or, since your bursts only occur at random intervals, visualize it over time with Netflix' FlameScope
Since Neo4j is a Java application, it might also be worthwhile to have a look at async-profiler which is priceless when it comes to profiling Java applications (and it generates similar FlameGraphs and can output log files compatible with FlameScope or JMC)

Does influxDB have a limit to the number of databases you can have?

Perf testing a tool. We generate a bunch of metrics per test run. But we want to keep each test run separate. This seems like a db per run would allow us to do that, and at the same time allow us to give the tools we create to customers who would only have 1 install, and thus need only one db. But we are talking hundreds of db's... granted they should be smaller as most would only be for a set of metrics covering a couple of hours. But will influxdb limit us? or will performance suffer significantly?

Looks memory/virtual memory appears to be the limiting factor.
On two 16gb boxes I added 500 db's with data on one, and 500 sets of data to the same db on the other. The data was pretty small actually, the individual dbs were 440K after being loaded.
Memory use on the 500 db's was way higher.
500dbs 19.9g virtual, 3.3g resident
1db 2.5g virtual, .9g resident

Is it advisable to restart informatica application monthly to improve performance?

We have around 32 datamarts loading around 200+ tables out of which 50% of tables are on 11g Oracle database and 30% on 10g and rest 20 are flat files.
Lately we are facing performance issues while loading the datamarts.
Database parameters as well are network parameters are looking and as throughput is decreasing drastically we are of the opinion now that it is informatica which has problem.
Recently when through put had gone down and server was utilized to its 90% informatica application was restarted and the performance there after was little better than previous performance.
So my question is should we have Informatica restart as a scheduled activity ? Does restart actually improves the performance of the application or there are some other things which can play a role in the same?

What you have here is a systemic problem, but you have not established which component(s) of the system are the cause.
Are all jobs showing exactly the same degradation in performance? If not, what is the common characteristic of those that are? Not all jobs will have the same reliance on the Informatica server -- some will be dependent more on the performance of their target system(s), some on their source system(s), so I would be amazed if all showed exactly the same level of degradation.
What you have here is an exercise in data gathering, and then turning that data into useful information.
If you can isolate the problem to only certain jobs then I would take a log file from a time when the system is performing well, and from a time when it is not, and compare them directly, looking for differences in the performance of their components. You can also look at any database monitoring tools for changes in execution plan.
Rebooting servers? Maybe, but that is not necessarily the solution -- the real problem is the lack of data you have to diagnose your system.

Yes, It is good to do a restart every quarter.
It will refresh the Integration service cache.
Delete files from Cache and storage before you restart.

Since you said you have recently seen some reduced performance recently it might be due various reasons.
Some tips that may help:
Ensure all Indexes are in valid and compiled state.
If you are calling a procedure via worflow check the EXPLAIN plan and Cost ensure it is not doing a full table scan(cost should be less).
3.Gather stats on the source or target tables (especially which have deletes )) - This will help in de fragmentation - deleting the un allocated space. DBMS_STATS
Always good to have an house keeping scheduled weekly to do the above checks on indexes,remove temp/unnecessary files and gather stats (analyze indexes and tables).
Some best practices here performance tips

How to configure Neo4j to run in a minimal memory environment?

For demo purposes, I am running Neo4j in a low memory environment -- A laptop with 4GB of RAM, 1644MB is use for video memory, leaving only 2452 MB available for use.. It's also running SQL Server, our WCF services, and our clients.. So there's little memory for Neo4j.
I'm running LOAD CSV cypher scripts via REST from a C# service. There are more than 20 scripts, and theyt work well in a server environment. I've written code to paginate, so that they run in smaller batches. I've reduced the batch size very low ( 25 csv rows ) and a given script may do 300 batches, but I continue to get "Java heap space" errors at some point.
I've tried configuring Neo4j with a relatively large heap space ( 640MB ) which is all the available RAM size plus setting the cache_type to none, and it gets much further before I get the java heap space error. What I don't understand is in that case, why does it grow that much? Also until I restart the neo4j service, I get these java heap space errors quickly. The batch size doesn't seem to impact how much memory is used appreciably.
However, after doing that, and I run the application with these settings, the query performance becomes very slow due to the cache settings.
I am running this on a Windows 7 laptop with 4G RAM -- using Neo4j 2.2.1 Community Edition.
Thoughts?

Perhaps you can share your LOAD CSV statement and the other queries you run.
I think you just run into this:
http://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
So PROFILE or EXPLAIN your queries and make it not to use that much intermediate state. We can help if you share your statements.
And you should use PERIODIC COMMIT 100.
Something like:
heap=512M
dbms.pagecache.memory=200M
keep_logical_logs=false
cache_type=none
http://console.neo4j.org runs neo4j in memory putting up to 50 instances in a single gigabyte of memory. So it should be doable.

Why is my PostgreSQL server cpu constrained?

My database is very cpu constrained, and I can't find the root cause of the issue. I currently have two applications servers each wit a Rails api connecting to PostgreSQL via the ruby-pg gem. Both application server also have sidekiq running background jobs, and I have a handful of support servers processing new posts from a national feed via sidekiq. If I were running out of memory, the solution would seemingly be straight forward. Any general ideas why I am CPU constrained?
Database Specs:
Rackspace 8GB Performance Tier cloud VM (8GB RAM, 8x Core CPU, SSD)
Debian 7 Wheezy Linux OS
PostgreSQL 9.1 with PostGIS extension
Possible Problems:
PostgreSQL 9.1 is bad at indexes
The database has nearly 10GB of indexes. I am going to upgrade my database to PostgreSQL version >= 9.2. In version 9.2, index only scans were introduced.
Too many connections
In the postgresql.conf, I have set max connection equal to '500'. Usually throughout the day, only 175 connections are utilized, but during peak times, sidekiq tasks will increase the current connections to 350. How many connections are recommended with an 8GB server instance?
Idol Connections
When I take a look at pg_stat_activity in the psql console, I see sidekiq is leaving a lot of IDLE connections. Could these connections result in CPU inflation? Does the fix exist in the api or in sidekiq?
Need a more powerful server
Maybe there is not a bug. I might need to simply increase the server instance. Again this would make more sense if I was memory bound. However, both app servers and 3 of the support sidekiq servers are 4gb performance tier instances. Essentially, servers that interact with the database have combined more than double the resources of the database. Should this even matter?
Additional questions:
What tools/techniques should I employ to troubleshoot the issue?
Any basic settings in the postgresql.conf related to cpu usage?
Are there any known issues related to rails, sidekiq, or the pg gem that could be a contributing factor? (I havent seen any open issues.)
Are there any general postgreSQL guideline for CPU usage?
Any other ideas thoughts that might help my search?

You are using massively too many concurrent connections. PostgreSQL will be wasting lots of its time on housekeeping and juggling concurrent queries. All the concurrent work will be fighting for CPU and buffer space, there'll be heavy contention on spinlocks, and it'll all generally be a mess.
On an 8 core machine, you should probably not have more than 20 actively working connections if you're mostly CPU constrained. If you're I/O limited, you can go higher, but 350 is just ridiculous.
If possible, put a PgBouncer in transaction pooling mode in front of your PostgreSQL instance, so queries get queued up and executed rapidly in series instead of slowly in parallel.
See number of database connections (Pg wiki).
Additionally, PostGIS can be very CPU-heavy. It sometimes needs to do very complex calculations. I suggest using the auto_explain module to record long running queries, and using pg_stat_statements / pg_stat_plans to record what's taking up resources. Examine these queries to see if they need improvement.
Your idle in transaction sessions must be dealt with, too. Depending on why they're idle and whether they have a transaction ID or not, they might be causing serious table bloat. They're also creating unnecessary signalling overhead within PostgreSQL, as it has to do more co-ordination with backends that're actively doing things. Finally, the number of open transactions its self increases the cost of some internal housekeeping operations.
So. Your DB will probably perform better if you reduce the connection counts, put a PgBouncer in transaction pooling mode in front, and fix those idle connections.

Most likely you are CPU constrained because your work needs a lot of CPU. :)
9.1 is not generally bad at indexes. There may be some specific issues, as all versions might, which exactly what they are might change from version to version.
Index-only-scans are mostly a benefit when you are IO constrained. I wouldn't hold out much hope for that being a magic bullet for you.
350 connections are certainly not helpful, but probably are not very harmful, either. But when they are harmful, it can be downright catastrophic. The correct value is more determined by the number of cores, not the amount of RAM. If it is easy to throttle down the sidekiq connections, do it even if you can't prove that it helps.
If the connections are just IDLE, not IDLE in transaction, then they probably aren't very harmful, but again there are a few cases where they can be. That is pretty much the same issue as the number of connections.
The connection you showed from top was idle in transaction. That status shouldn't be taking up much CPU, so that probably means it is rapidly cycling through statements and top just happens to catch it while it is between them. But you didn't say how many similar lines there were in top, if it is just that one it suggests your code is not running concurrently and 7 of you 8 CPUs are wasted.
Regarding the db server versus the other servers, if the database is fundamentally the limit, beating on it with a bigger hammer is not going to help. Often there is some flexibility about where computation is done. If you can get the app servers to do more computation that is currently done on the db and let the db focus on ACID issues, that would be good. But no one but you can know if that is possible or feasible.
My first stop would be to use pg_stat_statements to see what SQL statements are taking the most time. Maybe just adding an index to the slowest/most frequent query would make the problem magically go away.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart