Is it advisable to restart informatica application monthly to improve performance? - data-warehouse

We have around 32 datamarts loading around 200+ tables out of which 50% of tables are on 11g Oracle database and 30% on 10g and rest 20 are flat files.
Lately we are facing performance issues while loading the datamarts.
Database parameters as well are network parameters are looking and as throughput is decreasing drastically we are of the opinion now that it is informatica which has problem.
Recently when through put had gone down and server was utilized to its 90% informatica application was restarted and the performance there after was little better than previous performance.
So my question is should we have Informatica restart as a scheduled activity ? Does restart actually improves the performance of the application or there are some other things which can play a role in the same?

What you have here is a systemic problem, but you have not established which component(s) of the system are the cause.
Are all jobs showing exactly the same degradation in performance? If not, what is the common characteristic of those that are? Not all jobs will have the same reliance on the Informatica server -- some will be dependent more on the performance of their target system(s), some on their source system(s), so I would be amazed if all showed exactly the same level of degradation.
What you have here is an exercise in data gathering, and then turning that data into useful information.
If you can isolate the problem to only certain jobs then I would take a log file from a time when the system is performing well, and from a time when it is not, and compare them directly, looking for differences in the performance of their components. You can also look at any database monitoring tools for changes in execution plan.
Rebooting servers? Maybe, but that is not necessarily the solution -- the real problem is the lack of data you have to diagnose your system.

Yes, It is good to do a restart every quarter.
It will refresh the Integration service cache.
Delete files from Cache and storage before you restart.

Since you said you have recently seen some reduced performance recently it might be due various reasons.
Some tips that may help:
Ensure all Indexes are in valid and compiled state.
If you are calling a procedure via worflow check the EXPLAIN plan and Cost ensure it is not doing a full table scan(cost should be less).
3.Gather stats on the source or target tables (especially which have deletes )) - This will help in de fragmentation - deleting the un allocated space. DBMS_STATS
Always good to have an house keeping scheduled weekly to do the above checks on indexes,remove temp/unnecessary files and gather stats (analyze indexes and tables).
Some best practices here performance tips

Related

Why is my PostgreSQL server cpu constrained?

My database is very cpu constrained, and I can't find the root cause of the issue. I currently have two applications servers each wit a Rails api connecting to PostgreSQL via the ruby-pg gem. Both application server also have sidekiq running background jobs, and I have a handful of support servers processing new posts from a national feed via sidekiq. If I were running out of memory, the solution would seemingly be straight forward. Any general ideas why I am CPU constrained?
Database Specs:
Rackspace 8GB Performance Tier cloud VM (8GB RAM, 8x Core CPU, SSD)
Debian 7 Wheezy Linux OS
PostgreSQL 9.1 with PostGIS extension
Possible Problems:
PostgreSQL 9.1 is bad at indexes
The database has nearly 10GB of indexes. I am going to upgrade my database to PostgreSQL version >= 9.2. In version 9.2, index only scans were introduced.
Too many connections
In the postgresql.conf, I have set max connection equal to '500'. Usually throughout the day, only 175 connections are utilized, but during peak times, sidekiq tasks will increase the current connections to 350. How many connections are recommended with an 8GB server instance?
Idol Connections
When I take a look at pg_stat_activity in the psql console, I see sidekiq is leaving a lot of IDLE connections. Could these connections result in CPU inflation? Does the fix exist in the api or in sidekiq?
Need a more powerful server
Maybe there is not a bug. I might need to simply increase the server instance. Again this would make more sense if I was memory bound. However, both app servers and 3 of the support sidekiq servers are 4gb performance tier instances. Essentially, servers that interact with the database have combined more than double the resources of the database. Should this even matter?
Additional questions:
What tools/techniques should I employ to troubleshoot the issue?
Any basic settings in the postgresql.conf related to cpu usage?
Are there any known issues related to rails, sidekiq, or the pg gem that could be a contributing factor? (I havent seen any open issues.)
Are there any general postgreSQL guideline for CPU usage?
Any other ideas thoughts that might help my search?
You are using massively too many concurrent connections. PostgreSQL will be wasting lots of its time on housekeeping and juggling concurrent queries. All the concurrent work will be fighting for CPU and buffer space, there'll be heavy contention on spinlocks, and it'll all generally be a mess.
On an 8 core machine, you should probably not have more than 20 actively working connections if you're mostly CPU constrained. If you're I/O limited, you can go higher, but 350 is just ridiculous.
If possible, put a PgBouncer in transaction pooling mode in front of your PostgreSQL instance, so queries get queued up and executed rapidly in series instead of slowly in parallel.
See number of database connections (Pg wiki).
Additionally, PostGIS can be very CPU-heavy. It sometimes needs to do very complex calculations. I suggest using the auto_explain module to record long running queries, and using pg_stat_statements / pg_stat_plans to record what's taking up resources. Examine these queries to see if they need improvement.
Your idle in transaction sessions must be dealt with, too. Depending on why they're idle and whether they have a transaction ID or not, they might be causing serious table bloat. They're also creating unnecessary signalling overhead within PostgreSQL, as it has to do more co-ordination with backends that're actively doing things. Finally, the number of open transactions its self increases the cost of some internal housekeeping operations.
So. Your DB will probably perform better if you reduce the connection counts, put a PgBouncer in transaction pooling mode in front, and fix those idle connections.
Most likely you are CPU constrained because your work needs a lot of CPU. :)
9.1 is not generally bad at indexes. There may be some specific issues, as all versions might, which exactly what they are might change from version to version.
Index-only-scans are mostly a benefit when you are IO constrained. I wouldn't hold out much hope for that being a magic bullet for you.
350 connections are certainly not helpful, but probably are not very harmful, either. But when they are harmful, it can be downright catastrophic. The correct value is more determined by the number of cores, not the amount of RAM. If it is easy to throttle down the sidekiq connections, do it even if you can't prove that it helps.
If the connections are just IDLE, not IDLE in transaction, then they probably aren't very harmful, but again there are a few cases where they can be. That is pretty much the same issue as the number of connections.
The connection you showed from top was idle in transaction. That status shouldn't be taking up much CPU, so that probably means it is rapidly cycling through statements and top just happens to catch it while it is between them. But you didn't say how many similar lines there were in top, if it is just that one it suggests your code is not running concurrently and 7 of you 8 CPUs are wasted.
Regarding the db server versus the other servers, if the database is fundamentally the limit, beating on it with a bigger hammer is not going to help. Often there is some flexibility about where computation is done. If you can get the app servers to do more computation that is currently done on the db and let the db focus on ACID issues, that would be good. But no one but you can know if that is possible or feasible.
My first stop would be to use pg_stat_statements to see what SQL statements are taking the most time. Maybe just adding an index to the slowest/most frequent query would make the problem magically go away.

Multiple servers or everything in a single server?

I have a Rails app that uses MySQL, MongoDB, NodeJS (and SocketIO). Right now, the app (everything) is hosted inside 1 box. I would like to know what I should do when the number of users grow. What factors should I take into account to determine whether I need to host a separate element in another box (like MySQL, Node, Mongo in each of its separate box). Should I just make that one single box bigger? Is there a best-practice method that I can go with?
If you guys can provide me with reference, guides, research regarding this topic. Please do. I am super noob at deployment and server configuration.
We faced this dilemma at work a short while ago and found that simply upgrading to a more powerful single box sufficed and would give us room to grow further by up to 3-4 times.
The most important thing would be to identify your potential bottlenecks.
In our case there were 2 bottlenecks. Disk I/O and the database's ability to utilise memory.
On our new server we had the hard drive array configured in such a way as to maximise the disk I/O and we upgraded the database software to allow it to use more memory. In fact the DBMS now keeps the entire database in memory and only performs write operations to the disk as needed. This significantly improved performance.
The short answer is move everything to its own box. The longer answer is: it depends on your app's usage.
I recommend you use Nagios or similar to monitor your app's resource utilization -- that is, how much CPU and RAM each of your services use. When one starts to each up too much resources (and your page load speed is negatively affected), move that to its own box.
Then continue to monitor that box, beef up when necessary or shard out.
The high scalability blog is good for reading on what other people have done.

Rails hosting advice - EngineYard, Heroku, EC2

I'm developing a very sensitive application for a client that needs to have 99.9999999999% uptime guarantee.
It's a Rails application with MySQL database. I am thinking of hosting it on EngineYard due the low maintenance requirements and easiness to run.
Heroku does not seems to be the perfect solution due to uptime problems.
EC2 can also be a good solution but maybe it requires too much work to install and maintain.
My question is: how to make a redundant system using EngineYard, Heroku, EC2 or any other Rails hosting that you propose? Do I need to have 2 instances in different places of the world being replicated? Please advise the best way.
Regards.
Everyone wants 100% uptime, but achieving it is pretty much impossible. Since down-time can be caused by any of the links in the chain, and there usually are dozens, to achieve such a high standard you will need to buy gold-plated everything. Essentially, you'll have to spend a fortune. The difference between 99% uptime, which means your site is unavailable for around 88 hours a year, and 99.9% uptime, where it's less than ten hours is considerable, and from there to 99.99% is even higher, where the tolerance is under an hour for a full year.
Going beyond 99.99% is simply impractical. Nobody will sign a guarantee like this unless they're being dishonest, the agreement so loaded down with caveats as to be unenforcable, or don't mind dishing out heavy credits all the time. Amazon EC2's SLA is 99.99% for instance.
The metrics I've seen collected on a provider like Linode shows uptimes of about 99.97% to 99.99%. Occasionally you will see datacenters with 100% uptime, but this is the network level only and doesn't take into account intermittent internal glitches that may knock your server offline.
Choosing a managed hosting provider like Engine Yard might be the solution for you, because it can minimize your exposure to random events, but it won't get you such a high uptime in and of itself. They're very good at maintaining the system layer, but their ability to fix or work-around bugs in your application is very limited, and they are subject to the same intermittent networking issues with EC2 as anyone else.
There are two kinds of reliability you should be concerning yourself with. One is availability, which is purely a measure of how likely a client is to be able to use the application. The other is data integrity, which is a measure of how likely data is to be retained given any number of disaster scenarios.
Most people will accept that an application might be down every so often for brief periods of time, but people refuse to accept that data may go missing every now and then.
It isn't hard to get a "99.9999999999%" data retention rate, but you will need to plan out your backup, replication, and recovery strategy in detail and will have to exercise your systems regularly to ensure they are working as designed.
Where you have almost no control over the often patchy routing on the internet in general, the defect rate in the hardware of your server, the power in your data center and so on, you do have a huge amount of control over your backup strategy.
EY uses a company called TerraMark for their hosting, which is some pretty serious hosting infrastructure. Out of the 3 you listed, I would go with them.
For up time, you want to look at master/slave replication of your data, automatic failover, and you want to build redundancy wherever you can. High availability is a fairly involved topic, and has more to do with IT then dev, I would recommend asking where to start over at serverfault.com.

Data Warehouse: One Database or many?

At my new company, they keep all data associated with the data warehouse, including import, staging, audit, dimension and fact tables, together in the same physical database.
I've been a database developer for a number of years now and this consolidation of function and form seems counter to everything I know.
It seems to make security, backup/restore and performance management issues more manually intensive.
Is this something that is done in the industry? Are there substantial reasons for doing or not doing it?
The platform is Netezza. The size is in terabytes, hundreds of millions of rows.
What I'm looking to get from answers to this question is a solid understanding of how right or wrong this path is. From your experience, what are the issues I should be focused on arguing if this is a path that will cause trouble for us down the road. If it is no big deal, then I'd like to know that as well.
In general I would recommend using separate databases. This is the configuration I have always seen used in production and it really makes a lot of sense since - as you mentioned - both databases have fundamentally different purposes / usage patterns / etc.
Edit
If you're using one physical server, the fewer instances on that server the simpler the management and the more efficient the process.
If you put TWO instances on the same Physical Server you get:
Negatives:
Half the memory to use
Twice the count of database process
Positives:
You could take the entire staging db down without affecting the DW
So which is more precious to you, outage windows or CPU and Memory?
On the same the physical server multiple instances make performance management issues MUCH more manual to solve. If you look at the health of one of the instances, it might look fine but users are reporting poor performance, so you have to look at the next instance to see if the problem may be coming from there... and so on per instance.
Security is also harder with more than one instance. At best it's just as hard as a single instance but it's never easier. You'll have two admin accounts (SYS or something), Duplicate process accounts, etc.
Tell us why you think it's better to have more than one instance.
ORIGINAL POST
Can we be clear on terms. When you say "in the same Database" do you mean to say the same instance, or the same physical server. If you did move the staging to a new instance would it reside on the same physical hardware?
I think people get a little too hung up on instances. If you're going to put two instances on the same piece of hardware, you're only doubling the number of everything to very little advantage. All the server processes will be running twice... all the memory pools will be cut in half.
so let's say you really did mean two separate physical boxes...
Let's say you buy 2 12-way boxes (just say). When you're staging db server is done for the day, those 12 CPU's are wasting away. When your users pack up and go home, your prod DW CPUs are wasting away. CPU cycles are perishable, you can't get them back. BUT, if you had one 24 way box... then the staging DB COULD use 20 CPUs at night for some excellent Parallel Execution for building summary tables and your users will have double the capacity for processes during the day.
so let's say you meant the same hardware.
"It seems to make security, backup/restore and performance management issues more manually intensive."
Guaranteed that performance issues are harder to solve the more instances that share the same hardware. Guaranteed.
Security
What security do you do at the instance level?
Backup
What DW are you backing up at the instance level? You're not backing up tablespaces, but rather whole instances? Seems like that pattern will fail at a certain size.
PLATFORM: NETEZZA
Not familiar with the tool specifically. So if it's a single instance on a single box, then the division would seem more logical than physical and therefore the reasons they exist is for management, not performance. You don't increase your CPUs or memory by adding a database, right? So it doesn't seem like there's no performance upside to it. Each DB may be adding separate processes (performance hit), or it might be completely logical like schemas in Oracle. If each database is managed by new processes than data going between them will mean IPC.
Maybe the addition of the Netezza tag will get some traction.
We use databases for every segment (INVENTORY, CRM, BILLING...). There are no performance downsides and maintenance and overview is much better.
Better late than never, but for Netezza:
There are no performance hits while querying cross database. Netezza allows only SELECT operations cross database, no INSERT, UPDATE or DELETEstatements allowed.
This means you cannot do:
THISDB(ADMIN)=>INSERT INTO OTHERDB..TBL SELECT * FROM THISDBTABLE;
but you can do \c OTHERDB then
OTHERDB(ADMIN)=>INSERT INTO TBL SELECT * FROM THISDB..THISDBTABLE;
You are also not able to create a materialized view on a cross-database object, for example:
OTHERDB(ADMIN)=>CREATE MATERIALIZED VIEW BLAH AS SELECT * FROM THISDB..THISDBTABLE;
Administration might be where you will decide (though you probably already did long ago) on what kind of database(s) you'll create. Depending on your infrastructure, you might have a TEST/QA system and a PROD system on the same box, or on separate boxes.
You will gain speed in the load and the output if the tables are in the same schema (database). Obvious...but hey, I said it.
There is more overhead the more tables you put into one schema. Backups time, size of backups, ease of use.
Where I am, we have many multiple TB databases within one data-warehouse. Our rule of thumb is that a single loading process or a single report query should NOT have to span database. This keeps "like" tables together but gives some allowances for our backups and contingency processes. It also makes it a bit easier to "find" data.
For those processes that need to break this rule, we will either move data from one database to the other or allow the process to join across schemas.
I'm not as familiar with Netezza, so I'm not 100% sure what your options might be.
Few points for you to consider
a) If the data in one or more staging, audit, dimension and fact table has to be joined, you are better off keeping them in one database
b) Typically you will retain dimension tables and fact tables in the same database and distribute on most frequently joined columns to leverage "co-located join" functionality of Netezza
c) You should be able to use SQL grant permission to manage access to all objects (DB, tables, views etc)

Is no horizontal scalability when it comes to writing a RDBMS defect? or does it happen to all DBMS'?

When you hit a roof on reading from a database, you have two choices, scale vertically by putting more hardware in the server, or scale horizontally by putting a second server to help offload the reads.
Offloading reads to a second server, means that all writes will hit both servers, while read only hits one.
Problem is when you hit a roof with writing, since writing has to happen to all servers, it means that all servers will be overloaded with write requests, and the server comes unusable. Adding more servers to the problem doesn't help, since it only adds more servers that will be overloaded. So you have to scale vertically.
Is this something that is specific to RDBMS'? or is it something that happens with all DBMS'?
I know you can do things on software side, and split the database in two, eg. all entries starting with 0-m in one db while n-z in another, but IMHO it is more of a workaround than a solution to the problem.
I can't see that this would be specific to the relational model. All databases that have to read and write (and that's most of them) will have a similar problem.
For what it's worth, most databases are read far more than written so the write roof occurs less frequently than you might think. In addition, load balancing databases as per your method tends to be an immediate write to the primary with queued writes to all secondaries (at least in my experience).
In that case, you're not actually waiting around for multiple writes as a user, you just wait for the first. The DBMS itself manages the synchronisation between instances. This of course means that secondary databases might not be totally up-to-date but this can be controlled. Technically, this breaks the ACID properties of the system as a whole but this can be architected around.
I think this is the case with any DBMS, although some handle it better than others. Like you mention, partitioning the database in software seems to be the most common solution to this.
In many applications though, partitioning the database like that makes sense anyways if you are at such a huge scale that it becomes necessary. For example, if you had a social networking app, it would probably make sense to partition your database by country or other geographical regions. This would allow you to have your servers located geographically close to the regions they serve. It would also help mitigate any problems with a cross-database "social graph" since peoples friends tend to live nearby.
You're hardly going to "hit a roof with writing, since writing has to happen to all server" because in most of RDBMS installations:
1) Reads are overwhelming more frequent than writes
2) Modern RDBMs have Multi-Version Concurrency Control able to reduce blocking when reading/writing

Resources