I have many stored procedures running in VoltDB and it seems like one of them is causing spikes in CPU every now and then, but I don't know which one.
Is there somewhere I can see the history of all the stored procedures that ran so that I could pinpoint the problematic one based on the time it occurred?
I have tried turning the Command Logging on but it's a binary file so I have no way of reading it.
My next option is to log from inside the stored procedures but I prefer to keep this option as a last resort because it will require some extra developing/deploying and it won't be relevant for internal procedures.
Is there any way to log/somehow see when stored procedures ran?
There isn't a log of every transaction in VoltDB that a user can review. The command log is not meant to be readable and only includes writes. However, there are some tools you can use to identify poorly performing or long-running procedures.
You can call "exec @Statistics PROCEDUREPROFILE 0;" to get a summary of all the procedures that have been executed, including the number of invocations and the average execution time in nanoseconds. If one particular procedure is the problem, it may stick out.
You can also grep the volt.log file for the phrase "is taking a long time", which is a message printed when a procedure or SQL statement takes longer than 1 second to execute.
Also, there is a script in the tools subdirectory called watch_performance.py, which can be used to monitor the performance. It is similar to calling "exec @Statistics PROCEDUREPROFILE 0;" at regular intervals, except there are some columns gathered from additional @Statistics selectors, and the output is formatted for readability. "./watch_performance.py -h" will output help and usage information. For example, you might run this during a performance load to get a picture of the workload. Or, you might run it over a longer period of time, perhaps at less granular intervals, to see the fluctuations in the workload over time.
Disclosure: I work for VoltDB
Is there a way to measure the time it takes to perform each part of a Neo4j execution plan?
I can see the total execution time and total db hits. Also db hits and estimated rows for each part of the execution plan but not the time it takes to perform it. For example, the time it takes to perform a 'Filter' or 'Expand(All)' operation.
No, you can't.
But you do have the number of db hits on each box, so you are already aware of the resource consumption of each part.
Why do you want to know the time of each part?
Update after comment
A db hit is an abstract unit of work for the database. The more db hits a box shows, the more work needs to be done there, and so the more time it takes.
Execution time, on the other hand, depends a lot on the state of your computer: do you have a lot of processes using the CPU, memory, network, or hard drive?
So comparing execution times is a bad habit; you should compare db hits instead.
More db hits always mean more execution time for a query, but the opposite is not necessarily true.
I am trying to optimize the PostgreSQL 9.1 database for a Rails app I am developing. In postgresql.conf I have set
log_min_duration_statement = 200
I then use PgBadger to analyze the log file. The statement which, by far, takes up most of the time is:
COMMIT;
I get no more information than this and I am very confused as to what statement this is. Does anyone know what I can do to get more detailed information about the COMMIT queries? All other queries show the variables used in the statement, SELECT, UPDATE etc. But not the COMMIT queries.
As @mvp notes, if COMMIT is slow the usual reason is slow fsync()s, because every transaction commit must flush data to disk - usually with the fsync() call. That's not the only possible reason for slow commits, though. You might:
have slow fsync()s as already noted
have slow checkpoints stalling I/O
have a commit_delay set - I haven't verified that delayed commits get logged as long running statements, but it seems reasonable
If fsync() is slow, your best option is to re-structure your work so you can run it in fewer larger transactions. A reasonable alternative can be to use a commit_delay to group commits; this will group commits up to improve overall throughput but will actually slow individual transactions down.
Better yet, fix the root of the problem. Upgrade to a RAID controller with battery backed write-back cache or to high-quality SSDs that're power-fail safe. See, ordinary disks can generally do less than one fsync() per rotation, or between 5400 and 15,000 per minute depending on the hard drive. With lots of transactions and lots of commits, that's going to limit your throughput considerably, especially since that's the best case if all they're doing is trivial flushes. By contrast, if you have a durable write cache on a RAID controller or SSD, the OS doesn't need to make sure the data is actually on the hard drive, it only needs to make sure it's reached the durable write cache - which is massively faster because that's usually just some power-protected RAM.
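The arithmetic behind that rotation limit is simple: a disk spinning at R RPM completes R rotations per minute, hence at most R fsync()-backed commits per minute. A quick sketch:

```python
# A spinning disk can complete at most one fsync() per rotation,
# so commit throughput is capped at the spindle speed (RPM).
for rpm in (5400, 7200, 15000):
    per_second = rpm / 60
    print(f"{rpm} RPM -> at most {rpm} commits/min (~{per_second:.0f}/sec)")
```

That ~90-250 commits/sec ceiling is why a battery-backed cache or durable SSD, which acknowledges the flush from power-protected RAM, changes the picture so dramatically.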
It's possible fsync() isn't the real issue; it could be slow checkpoints. The best way to see is to check the logs to see if there are any complaints about checkpoints happening too frequently or taking too long. You can also enable log_checkpoints to record how long and how frequent checkpoints are.
If checkpoints are taking too long, consider tuning the bgwriter completion target up (see the docs). If they're too frequent, increase checkpoint_segments.
See Tuning your PostgreSQL server for more information.
COMMIT is a perfectly valid statement whose purpose is to commit the currently pending transaction. Because of the nature of what it really does - making sure that data is really flushed to disk - it is likely to take most of the time.
How can you make your app work faster?
Right now, your code is likely using so-called auto-commit mode - that is, every statement is implicitly COMMITted.
If you explicitly wrap bigger blocks into BEGIN; ... COMMIT; blocks, you will make your app work much faster and reduce the number of commits.
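The difference is easy to demonstrate. This sketch uses Python's built-in sqlite3 purely to illustrate the auto-commit vs. explicit-transaction contrast; the same principle applies to a PostgreSQL driver (table name and row count are made up for the example):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE t (n INTEGER)")

# One implicit COMMIT per statement
start = time.perf_counter()
for i in range(500):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
autocommit_time = time.perf_counter() - start

# One explicit transaction around the whole batch
start = time.perf_counter()
conn.execute("BEGIN")
for i in range(500):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
conn.execute("COMMIT")
batched_time = time.perf_counter() - start

print(f"autocommit: {autocommit_time:.4f}s  batched: {batched_time:.4f}s")
conn.close()
```

On a real disk the batched version wins by a wide margin, because only one flush is paid instead of five hundred.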
Good luck!
Try to log every query for a couple of days and then see what is going on in the transaction before the COMMIT statement.
log_min_duration_statement = 0
This has happened to me on more than one occasion and has led to many lost hours chasing a ghost. As is typical, when I am debugging some really difficult timing-related code I start adding tons of OutputDebugString() calls, so I can get a good picture of the sequence of related operations. The problem is, the Delphi 6 IDE seems to be able to only handle that situation for so long. I'll use a concrete example I just went through to avoid generalities (as much as possible).
I spent several days debugging my inter-thread semaphore locking code along with my DirectShow timestamp calculation code that was causing some deeply frustrating problems. After having eliminated every bug I could think of, I still was having a problem with Skype, which my application sends audio to.
After about 10 seconds, the delay between my talking and hearing my voice come out of Skype on the second PC I was using for testing (the far end of the call) started to grow. At around 20-30 seconds the delay started to grow exponentially, and at that point it triggered code I have that checks whether a critical section is being held too long.
Fortunately it wasn't too late at night and having been through this before, I decided to stop relentlessly tracing and turned off the majority of the OutputDebugString(). Thankfully I had most of them wrapped in a conditional compiler define so it was easy to do. The instant I did this the problems went away, and it turned out my code was working fine.
So it looks like the Delphi 6 IDE starts to really bog down when the amount of OutputDebugString() traffic is above some threshold. Perhaps it's just the task of adding strings to the Event Log debugger pane, which holds all the OutputDebugString() reports. I don't know, but I have seen similar problems in my applications when a TMemo or similar control starts to contain too many strings.
What have those of you out there done to prevent this? Is there a way of clearing the Event Log via some method call or at least a way of limiting its size? Also, what techniques do you use via conditional defines, IDE plug-ins, or whatever, to cope with this situation?
A similar problem happened to me before with Delphi 2007. Disable event viewing in the IDE and instead use DebugView from Sysinternals.
I hardly ever use OutputDebugString. I find it hard to analyze the output in the IDE, and it takes extra effort to keep the output of several different runs around for comparison.
I really prefer a good logging component suite (CodeSite, SmartInspect) and usually log to various files. Standard files for example are "General", "Debug" (standard debug info that I want to collect from a client installation as well), "Configuration", "Services", "Clients". These are all set up to "overflow" to a set of numbered files, which allows you to keep the logs of several runs by simply allowing more numbered files. Comparing log info from different runs becomes a whole lot easier that way.
In the situation you describe I would add debug statements that log to a separate logfile. For example "Trace". The code to make "Trace" available is between conditional defines. That makes turning it on pretty simple.
To avoid leaving in these extra debug statements, I tend to make the changes to turn on the "Trace" log without checking it out from source control. That way, the compiler of the build server will throw out "identifier not defined" errors on any statements unintentionally left in. If I want to keep these extra statements I either change them to go to the "Debug" log, or put them between conditional defines.
The first thing I would do is make certain that the problem is what you think it is. It has been a long time since I've used Delphi, so I'm not sure about the IDE limitations, but I'm a bit skeptical that the event log will start bogging down exponentially over time with the same number of debug strings being written in a period of 20-30 seconds. It seems more likely that the number of debug strings being written is increasing over time for some reason, which could indicate a bug in your application control flow that is just not as obvious with the logging disabled.
To be sure I would try writing a simple application that just runs in a loop writing out debug strings in chunks of 100 or so, and start recording the time it takes for each chunk, and see if the time starts to increase as significantly over a 20-30 second timespan.
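A rough sketch of such a test program (in Python and writing to a plain log file rather than OutputDebugString, just to show the measurement structure; chunk sizes and the file name are made up):

```python
import time

durations = []
with open("debug_test.log", "w") as log:
    for chunk in range(20):              # 20 chunks...
        start = time.perf_counter()
        for i in range(100):             # ...of 100 debug strings each
            log.write(f"chunk {chunk} line {i}: some debug payload\n")
        log.flush()
        durations.append(time.perf_counter() - start)

# If per-chunk time climbs steadily over the run, the logging sink
# (not your application logic) is the likely bottleneck.
print([f"{d * 1000:.2f}ms" for d in durations])
```

If the per-chunk times stay flat with a constant message rate, that supports the theory that the real application's message *volume* was growing instead.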
If you do verify that this is the problem - or even if it's not - then I would recommend using some type of logging library instead. OutputDebugString really loses its effectiveness when you use it for massive log dumps like that. Even if you do find a way to reset or limit the output window, you'd be losing all of that logging data.
IDE Fix Pack has an optimisation to improve performance of OutputDebugString
The IDE’s Debug Log View also got an optimization. The debugger now
updates the Log View only when the IDE is idle. This allows the IDE to
stay responsive when hundreds of OutputDebugString messages or other
debug messages are written to the Debug Log View.
Note that this only runs on Delphi 2007 and above.
Looking for a little help getting started on a little project i've had in the back of my mind for a while.
I have log files varying in size from 50-500MB, depending on how often they are cleaned. I'd like to write a program that will monitor a log file while it's actively being written to. When in use it changes quickly - easily several hundred lines a second. Most if not all of the examples I've seen for reading log/text files simply open the file and read its contents into a variable, which isn't really feasible to do every time the file changes in this situation. I haven't settled on a language to write this in, but it's on a Windows box and I can work in .NET flavors, Java, or PHP (heh, don't think PHP will fly too well for this), and can likely muddle through another language if someone has a suggestion for something well suited to handling this.
Essentially, I believe what I'm looking for would probably be better described as a high-speed way of monitoring a text file for changes and seeing what those changes are. Each line written is relatively small (less than 300 characters, so it's not big data on each line).
EDIT: to change the wording to hopefully better describe what I'm trying to do, which is to write a program that keeps an eye on a log file for a trigger and then matches a corresponding action to that trigger. So my question here pertains to file handling inside a programming language.
I greatly appreciate any thoughts/comments.
If the file only grows incrementally, you can read the whole file the first time you start analyzing logs, then keep the current size as n. The next time you check (maybe a timed action that looks at the last-modified date), just skip the first n bytes, read all the new bytes, and update the size.
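A minimal sketch of that offset-tracking approach in Python (the poll interval and log file name are assumptions for the example):

```python
import os
import time

def follow(path, poll=0.5):
    """Yield complete new lines appended to path, resuming from the
    end-of-file offset recorded on the previous pass."""
    offset = 0
    buffer = ""
    while True:
        size = os.path.getsize(path)
        if size > offset:
            with open(path, "r") as f:
                f.seek(offset)       # skip the first `offset` bytes
                buffer += f.read()   # read only what is new
                offset = f.tell()
            # Emit only complete lines; keep a partial trailing line buffered
            *lines, buffer = buffer.split("\n")
            for line in lines:
                yield line
        time.sleep(poll)
```

Trigger matching would then be a simple check on each yielded line, e.g. `if "ERROR" in line: react(line)`. Note this sketch does not handle the file being truncated or rotated; a real implementation should reset the offset when the size shrinks.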
Otherwise you could use tail -f, capturing its stdout and using it for your purposes.
The 'keep an eye on a log file' part of what you are describing is what tail does.
If you plan to implement it in Java, you can check this question: Java IO implementation of unix/linux "tail -f" and add your trigger logic to lines read.
I suggest not reinventing the wheel.
Try using the elastic.co stack.
All of these applications are open source and free, and together they are capable of monitoring and triggering actions based on input.
Filebeat - will read the log file line by line (it supports multiline log messages as well) and ship each line to Logstash. There are loads of other shippers you can use.
Logstash - will take the log messages, filter them, add tags, and send them on to Elasticsearch.
Elasticsearch - will take the log messages, index them, and store them. It is also capable of running actions based on input.
Kibana - is a user-friendly web interface to query and analyze the data, or to simply put it up on a dashboard.
Hope this helps.
I'm using Delphi 7 and I need a solution to a big problem. Can someone provide me a faster way of searching through files and folders than using FindFirst and FindNext? I also process the data for each file/folder (creation date, author, size, etc.) and it takes a lot of time. I've searched a lot in the WinAPI, but apparently I haven't seen the best function to accomplish this. All the examples I've found in Delphi use FindFirst and FindNext...
Also, I don't want to buy components or use some free ones...
Thanks in advance!
I think any component that you'd buy, would also use findfirst/findnext. Recursively, of course. I don't think there's a way to look at every directory and file, without actually looking at every directory and file.
As a benchmark to see if your code is reasonably fast, compare performance against WinDirStat http://windirstat.info/ (Just to the point where it's gathered data, and is ready to build its graph of the space usage.)
Source code is available, if you want to see what they're doing. It's C, but I expect it's using the same API calls.
The one big thing you can do to really increase your performance is parse the MFT directly, if your volumes are NTFS. By doing this, you can enumerate files very, very quickly -- we're talking at least an order of magnitude faster. If all the metadata you need is part of the MFT record, your searches will complete much faster. Even if you have to do more reads for extra metadata, you'll be able to build up a list of candidate files very quickly.
The downside is that you'll have to parse the MFT yourself: there are no WinAPI functions for doing it that I'm aware of. You also get to worry about things that the shell normally handles for you, such as hardlinks, junctions, reparse points, symlinks, shell links, etc.
However, if you want speed, the increase in complexity is the only way to achieve it.
I'm not aware of any available Delphi code that already implements an MFT parser, so you'll probably have to either use a 3rd party library or implement it yourself. I was going to suggest the Open Source (GPL) NTFS Undelete, which was written in Delphi, but it implements the MFT parsing via Python code and has a Delphi-Python bridge built in.
If you want to get really fast search results consider using the Windows Search (API) or the Indexing service.
Other improvements might be to make use of threads and split the search for files and the gathering of file properties or just do a threaded search.
I once ran into a very similar problem where the number of files in the directory, coupled with findfirst/findnext, was taking more time than was reasonable. With a few files it's not an issue, but as you scale upwards into the thousands, or tens of thousands of files, then performance drops considerably.
Our solution was to use a queue file in a separate directory. As files are "added" to the system they were written to a queue file (was a fixed record file). When the system needed to process data, it would see if the file existed, and if so then rename it and open the renamed version (this way adds could occur for the next process pass). The file was then processed in order. We then archived the queue file & processed files into a subdirectory based on the date and time (for example: G:\PROCESSED\2010\06\25\1400 contained the files run at 2:00 pm on 6/25/2010).
Using this approach, not only did we reach an almost "real-time" processing of files (delayed only by the frequency with which we processed the queue file), but we also ensured processing of files in the order they were added.
If you need to scan a remote drive with that many files, I would strongly suggest a "client-server" design, so that the actual file scanning is always done locally and only the results are fetched remotely. That would save you a lot of time. Also, all the servers could scan in parallel.
If your program is running on Windows 7 or Server 2008 R2, there are some enhancements to the Windows FindFirstFileEx function which will make it run a bit faster. You would have to copy and modify the VCL functions to incorporate the new options.
There isn't much room for optimization with a findfirst / findnext loop, because it's mostly I/O bound: the operating system needs to read this information from your HDD!
The proof: Make a small program that implements a simple findfirst / findnext loop that does NOTHING with the files it finds. Restart your computer and run it over your big directory, note the time it takes to finish. Then run it again, without restarting the computer. You'll notice the second run is significantly faster, because the operating system cached the information!
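A sketch of that do-nothing scan (in Python rather than Delphi, and over a throwaway tree, so the absolute numbers only illustrate the caching effect; a real test would restart the machine between runs as described):

```python
import os
import tempfile
import time

# Build a small throwaway tree so the sketch is self-contained
root = tempfile.mkdtemp()
for i in range(200):
    sub = os.path.join(root, f"d{i % 10}")
    os.makedirs(sub, exist_ok=True)
    open(os.path.join(sub, f"f{i}.txt"), "w").close()

def scan(path):
    """FindFirst/FindNext-style pass: stat every file, do NOTHING else."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            count += 1
    return count

t0 = time.perf_counter(); n1 = scan(root); first = time.perf_counter() - t0
t0 = time.perf_counter(); n2 = scan(root); second = time.perf_counter() - t0
print(f"{n1} files: first pass {first:.4f}s, second pass {second:.4f}s")
```

Run against a large cold directory, the second pass is dramatically faster, which is exactly the signature of an I/O-bound loop served from the OS metadata cache.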
If you know for sure that the directory you're trying to scan is heavily accessed by the OS because of some other application that's using the data (this would put the directory structure information into the OS's cache and make scanning not be bound to the I/O), you can try running several findfirst/findnext loops in parallel using threads. The downside of this is that if the directory structure is not already in the OS cache, your algorithm is again bound to HDD in/out and it might be worse than the original, because you're now making multiple parallel I/O requests that need to be handled by the same device.
When I had to tackle this same problem I decided against parallel loops, because the SECOND run of the application is always so much faster, proving I'm bound to I/O and no amount of CPU optimisation would fix the I/O bottleneck.
I solved a similar problem by using two threads. This way I could "process" the file(s) at the same time as they where scanned from the disk. In my case the processing was significantly slower than scanning so I also had to limit the number of files in memory at one time.
TMyScanThread
Scan the file structure; for each "hit" add the path+filename to a TList/TStringList or similar using Synchronize(). Remember to Sleep() inside the loop to let the OS have some time too.
PseudoCode for the thread:
TMyScanThread = class(TThread)
private
  fCount : Cardinal;
  fLastFile : String;
  procedure GetListCount;
  procedure AddToList;
public
  FileList : TStringList;
  procedure Execute; override;
end;

procedure TMyScanThread.GetListCount;
begin
  fCount := FileList.Count;
end;

procedure TMyScanThread.AddToList;
begin
  FileList.Add(fLastFile);
end;

procedure TMyScanThread.Execute;
begin
  try
    { FindFirst/FindNext loop goes here; for each file found: }
    { Get the current list size }
    Synchronize( GetListCount );
    if fCount < 500 then
    begin
      fLastFile := SR.Name;       // Store the filename in a local var
      Synchronize( AddToList );   // Call the method that adds it to the list
      SleepEx(0, True);           // Yield so the OS gets some time too
    end else
      SleepEx(1000, True);        // List is full; wait for the consumer
  finally
    Terminate;
  end;
end;
TMyProcessFilesThread
Get the oldest entry in the list, and Process it. Then output results to DB.
This class is implemented similarly with Syncronized methods that access the list.
One alternative to the Synchronize() calls is to use a TCriticalSection. Implementing synchronization between threads is often a matter of taste and the task at hand ...
You can also try BFS vs. DFS. This may affect your performance.
Link
http://en.wikipedia.org/wiki/Breadth-first_search
http://en.wikipedia.org/wiki/Depth-first_search
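For illustration, the two traversal orders over a directory tree can be sketched like this (in Python; the tiny tree is built just for the example):

```python
import os
import tempfile
from collections import deque

def bfs_dirs(root):
    """Visit directories level by level (queue-based)."""
    order, queue = [], deque([root])
    while queue:
        d = queue.popleft()
        order.append(os.path.basename(d))
        for entry in sorted(os.scandir(d), key=lambda e: e.name):
            if entry.is_dir():
                queue.append(entry.path)
    return order

def dfs_dirs(root):
    """Descend into each subtree fully before moving on (stack-based)."""
    order, stack = [], [root]
    while stack:
        d = stack.pop()
        order.append(os.path.basename(d))
        for entry in sorted(os.scandir(d), key=lambda e: e.name, reverse=True):
            if entry.is_dir():
                stack.append(entry.path)
    return order

# Build a tiny tree: root/a/x and root/b
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "x"))
os.makedirs(os.path.join(root, "b"))
print("BFS:", bfs_dirs(root))   # visits a, b, then x
print("DFS:", dfs_dirs(root))   # visits a, x, then b
```

Which order is faster in practice depends on the layout on disk and what the OS has cached, so it is worth measuring both on a representative tree.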
When I started to run into performance problems working with lots of small files in the file system, I moved to storing the files as blobs in a database. There is no reason why related information like size, creation date, and author couldn't also be stored in the database. Once the tables are populated, I suspect that the database engine could do a much faster job of finding records (files) than any solution we are going to come up with, since database code is highly specialized for efficient searches through large data sets. This will definitely be more flexible, since adding a new search would be as simple as writing a new Select statement. Example: Select * from files where author = 'bob' and size > 10000
I'm not sure that approach will help you. Could you tell us more about what you are doing with these files and the search criteria.