Entity Framework / Glimpse duration disparity - asp.net-mvc

I'm using the latest ASP.NET MVC and Entity Framework (MVC 5.2.2, EF 6.1.2), and latest of Glimpse. I'm working on improving query times to eagerly load an entity with several nested child objects, and have reduced the number of queries by using .Include("Object.Child") to bring in navigation properties. At first, I thought I was getting a good result, seeing the "Total query execution time" in the SQL tab of Glimpse reduce significantly. Yet the "Total connection open time" stays high, and is very long for the resulting combined mega-query. See screenshot below.
I'm wondering if anyone can help me understand what is going on with the differences in the two durations? Glimpse says my command takes <100 ms, but that the SQL connection takes >5 seconds. The query in this case is really messy with lots of joins etc, however it's not clear where the time goes if indeed the query itself finishes in 100 ms.
Note: I've seen the answer about why two durations here, but it doesn't explain the nature of each.

Thanks for asking the question. The timer for the connection duration starts when the connection is opened and finishes when it is closed. To work this out further, how are you using your context/connection, are you sharing it, keeping it around, etc?

After further testing, I think I've figured out what was happening. I saw in another question which suggested that the .Include() approach to eager loading hierarchical entities in Entity Framework can result in complex queries with many joins and duplication of data in the result set. I had a long XML string as one of my properties, so if this was duplicated many times at the database, it would take a long time to return/process.
As a test, I cleared the data-heavy field and reran the query, getting a far shorter "connection" duration (the one list on the right in Glimpse). It went from 9 seconds to under 200 ms total. Based on this I assume the data size was the culprit, and learned my lesson about using large data properties this way.
I'd still be interested to know whether Glimpse could show you the raw data being returned from a query, or even show the size in bytes, along with the record count. This would have likely made this problem evident.

A little late to this question, but I encountered the same problem and was also trying to understand the disparity between my query execution time and connection open time.
FWIW, I discovered that I was passing an enumerable in my view model to the view, rather than a concrete list. Thus, the view was triggering evaluation of the query and prolonging the amount of time that the connection remained open. By passing lists of the items (call .ToList() on the enumerable), I drastically reduced the amount of time for which the connection remained open.

Related

Performance issue in dot net web application

I have a production server (which is available over the internet) in which two applications are running one is in Java and the other is Dot Net. The response of Java application is comparatively faster than Dot Net. For dot net application MVC, My SQL, Entity Framework and LINQ is used. Java application also connected to MY SQL of same server.
In dot net application, the movement it hits the controller action and just before leaving the controller action, it writes the time stamp into log file. There is query executed and filtered the data using LINQ. All these are happening in 2 seconds. But the response to the browser takes time (15-20 seconds). In chrome browser "Network" tab, all the javascripts/css/images are getting loaded in less than 100 ms. I have checked the "Waiting (TTFB)" is taking for the main controller action, it is taking around 15 seconds. Document is also not huge, it is just 20 kb.
Now my question is,
1. What is the reason for such a huge TTFB ?
2. How to reduce the TTFB, (did not find proper answer). Is it because of Entity Framework/ LINQ.
Thanks in advance.
There would be performance issues, for millions of records in Entity Framework/ LINQ if queries are not used properly like using .ToList() function for counting rows instead use .Count() only.

Sqlite randomly slowing down on simple (but big) table on iOS

I'm working on an enterprise sales app, for iPad, that uses Sqlite as its internal database, and a strange behaviour recently showed up.
I have a huge table that is filled with information from several other tables (sort of like a "materialized view"), which can contain over 2 million rows, depending on how the user is set up. When the user wants to search for an item, the app performs a query on this huge table that has an indexed column and on other columns that are used as filters and/or metadata. I'll post the query and the basic idea below. Anyway, this query usually returns in 2~3 seconds on an iPad 4th gen, no more than that, and this is just fine. This table is dropped, re-created and filled every time the user taps a button to synchronize his data with our server.
However, recently the same query in the same table (with no relevant changes at all), randomly started to take 40~50 seconds. If you do the same thing later, on the same device, with the same filters (or even changing the filters!), the same query on the same table takes the 2~3 seconds again. I haven't found any specific situation that causes this slowdown, the app is the only one running at that time. The device is not the problem, we've seen this happen on at least 5 different iPads, one is an iPad 3 and the others are iPads 4th gen.
I don't think it is some sort of caching, since the app does not cache anything, and these times are rather random. Sometimes they take 40 seconds for 10 times in a row, then suddenly it starts to take only 2 seconds again, and the same thing the other way. The only thing that is clear to me is that this slowdown only occurs after intensive use (1 - 2 days of work using the app), so I'm also having troubles to cause this behaviour while debugging on the iPad I have with me.
What I've tried:
Attach Instruments to the process and check what resources are being used during the slowdown. The app does INTENSIVE use of the iPad's 'disk' (flash memory) during the whole time. I don't have the example to analyse it again now, but I think the CPU usage was around 30%. The RAM usage is stable at 90~100MB, which is normal for our app.
Run VACCUM on the db; - reduced ~50MB on a database I had as example. Went from ~600MB to ~550MB.
Run ANALYZE on the db; - didn't see any improvements
Run REINDEX on the db; - seems to be helping a little, but it's not solving the problem.
Kill the process and start over - nothing changes
The huge table is constructed as the following, and does NOT have any foreign keys or other any other constraint:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
And the query that is made to find the products is:
SELECT
PRODUCT_ID
,UNIQUE_ID
<all other required columns, ~20 columns>
FROM
FMV_CATALOG
WHERE
UNIQUE_ID = '<some id>_<other id>'
AND PRODUCT_NAME LIKE '%iPhone%'
<and other optional, rarely used, filters.>
I'm totally out of ideas, so any help will be appreciated.
Thanks!
UPDATE (more info):
Important informations that I forgot to mention, Rob reminded me of it. My database connection is always open, it is closed only when the user logs out. We've noticed a huge performance on all parts of the app when we kept the connection opened, since we have hundreds of small queries that are executed on other situations (but not while browsing/searching the products catalog).
The query used to create the index is below:
CREATE INDEX IDX_MV_CATALOG ON MV_CATALOG(UNIQUE_ID);
Also, even though the column is named UNIQUE_ID, it is not unique. It was supposed to be originally, but now it is repeated N times. I know this is wrong, we'll change that ASAP.
This "UNIQUE_ID" (which is not really unique) is filled by joining the IDs of two other tables. This way, our "materialized view" removes the need of at least three joins when the user searches on our catalog, which improved our query times from ~20 seconds to ~2 seconds.
We don't call sqlite3 API directly on our queries, we have developed a wrapper class around it and we've been using it for at least 2 years now. And it's the first time ever we've been on this situation, but again it's the first time we're handling so much data.
A couple of thoughts:
You're not showing us the creation of any index on FMV_CATALOG. If nothing else, if UNIQUE_ID is, as the name suggests, unique, then I'd be inclined to define the table with a PRIMARY KEY:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT PRIMARY KEY,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
You should try using the SQLite EXPLAIN QUERY PLAN command to look at the query and look at its plan and make sure it's availing itself of your index. Do this as it is, and then again with PRIMARY KEY (and perhaps if that still doesn't do it, an index on the fields in your WHERE clause), and make sure the final query is definitely using your index.
I'm not sure why, if you have the unique id, why you're also looking at the other fields. If adding of the primary key (and possibly other index(es)) doesn't solve the problem, I might try just retrieving the record based upon the unique id, and then check for conformance with your other parameters in code. I don't believe you need to do this, but it's a worst case scenario.
In terms of why it will slow down, that's harder to guess what's going on without seeing the code (which I'm sure is too complicated to share in a simple S.O. question). I could imagine strange behavior if, for example, you fail to sqlite3_finalize after one of your sqlite3_prepare_v2 statements or if you accidentally failed to close the database and then opened it again elsewhere. I could imagine performance issues that might come in place if the sequence of sqlite3 calls wasn't precisely right. Use of something like FMDB can minimize the chance of those sorts of issues occurring (as well as simplifying your SQLite code). Or, if that's too radical of a step, try writing your own macros that call the SQLite calls, but also log the fact that you've called that sqlite3 function, and pour through that log and double check the sequence of your SQLite calls.
The only thing I can suggest is whether you can construct a simplified project that can reproduce the aberrant behavior. Tracking down a Heisenbug can be infuriating: Unless you can consistently reproduce the bug, it's hard to track down.

Neo4j Speed when invoked for the first time

I have a simple jquery which calls a servlet via get and then Neo4j is used to return data in JSON format.
The system is workable after the FIRST query but the very first time it is used the system ins unbelievably slow. This is some kind of initialisation issue. I am using Heroku web hosting.
The code is fairly long so I am not posting now, but are there any known issues regarding the first invocation of Neo4j?
I have done limited testing so far for performance as I had a lot of JSON problems anyway and they only just got resolved.
Summary:
JQuery(LINUX)<--> get (JSON) <---> Neo4j
First Query - response is 10-20 secs
Second Query - time is 2-3 secs
More queries - 2/3 secs.
This is not a one-off; I tested this a few times and always the same pattern comes up.
This is a normal behaviour of Neo4j where store files are mapped into memory lazily for parts of the files that become hot, and becoming hot requires perhaps thousands of requests to such a part. This is a behaviour that has big stores in mind, whereas for smaller stores it merely gets in the way (why not map the whole thing if it fits in memory?).
Then on top of that is an "object" cache that further optimizes access, that get populated lazily for requested entities.
Using an SSD instead of spinning media will usually speed up the initial non-memory-mapped random access quite a bit, but in your scenario I recognize that's not viable.
There are thoughts on beeing more sensitive to hot parts of the store (i.e. memory map even if not as hot) at the start of a database lifecycle, or more precisely have the heat sensitivity be a function of how much is currently memory mapped versus how much can be mapped at maximum. This has shown to make initial requests much more responsive.

Temporary table resource limit

i have two applications (server and client), that uses TQuery connected with TClientDataSet through TDCOMConnection,
and in some cases clientdataset opens about 300000 records and than application throws exception "Temporary table resource limit".
Is there any workaround how to fix this? (except "do not open such huge dataset"?)
update: oops i'm sorry there is 300K records, not 3 millions..
The error might be from the TQuery rather than the TClientDataSet. When using a TQuery it creates a temporary table and it might be this limit that you are hitting. However in saying this, loading 3,000,000 records into a TClientDataSet is a bad idea also as it will try to load every record into memory - which maybe possible if they are only a few bytes each but it is probably still going to kill your machine (obviously at 1kb each you are going to need 3GB of RAM minimum).
You should try to break your data into smaller chunks. If it is the TQuery failing this will mean adjusting the SQL (fewer fields / fewer records) or moving to a better database (the BDE is getting a little tired after all).
You have the answer already. Don't open such a huge dataset in a ClientDataSet (CDS).
Three million rows in a CDS is a huge memory load (depending on the size of each row, it can be gigantic).
The whole purpose of using a CDS is to work quickly with small datasets that can be manipulated in memory. Adding that many rows is ridiculous; use a real dataset instead, or redesign things so you don't need to retrieve so many rows at a time.
over 3 million records is way too much to handle at once. My guess is that you are performing an export or something like that which requires that many records to be sent down the wire. One method you could use to reduce this issue would be to have the middle-tier generate an export file, and then deliver that file to the client (preferably compressing first using ZLIB or something simular).
If you are pulling data back to the client for viewing purposes, then consider sending summary information only, and then allowing the client to dig thier way thru the data a portion at a time. The users would thank you because your performance will go way up and they won't have to dig thru records they don't care about looking at.
EDIT
Even 300,000 records is way too much to handle at once. If you had that many pennies, would you be able to carry them all? But if you made it into larger denominations, you could. if your sending data to the client for a report, then I strongly suggest a summary method... give them the large picture and let them drill slowly into the data. send grouped data and then let them open up slowly.
If this is a search results screen, then set a limit of the number of records to be returned + 1. For example to display 100 records, set the limit to 101. Still only display 100, the last record means that there were MORE than 100 records so the customer needs to adjust thier search criteria to return a smaller subset.
Temporary table resource limit is not a limit for one single query. it is the limit for all open queries together. so it may be a solution for you to close all other queries at the time.
if it is not possible for you to use ADO connection, also you can design a paging mechanism for querying data page by page.
GOOD LUCK

How do I overcome poor SSIS debugging performance?

I’m using SSIS to synchronize data between two databases. I’ve used SSIS and DTS in the past, but I generally write an application for things of this nature (I’m coder and it just comes easier to me).
In my package I use a SQL Task that returns about 15,000 rows. I’ve hooked that up to a Foreach Container, and within that I assign the resultset column values to variables, and then map those variables to parameters that are fed to another SQL Task.
The problem I’m having is with debugging, and not just more complicated debugging like breakpoints and evaluating values at runtime. I simply mean that if I run this with debugging rather than without, it takes hours to complete.
I ended up rewriting the process in Delphi, and the following is what I came up with:
Full Push of Data:
This pulls 15,000 rows, updates a destination table for each row, then pulls 11,000 rows and updates a destination table for each row.
Debugging:
Delphi App: 139s
SSIS: 4 hours, 46 minutes
Not Debugging:
Delphi App: 132s
SSIS: 384s
Update of Data:
This pulls 3,000 rows, but no updates are needed or made to the destination table. It then pulls 11,000 rows but, again, no updates are needed or made to the destination table.
Debugging:
Delphi App: 42s
SSIS: 1 hours, 10 minutes
Not Debugging:
Delphi App: 34s
SSIS: 205s
The odd thing is, I get the feeling that most of this time spent debugging is just updating UI elements in Visual Studio. If I watch the progress tab, a node is added to a tree for each iteration (thousands total), and this gets slower and slower as the process goes on. Trying to stop debugging usually doesn’t work, as Visual Studio seems caught in a loop updating the UI. If I check the profiler for SQL Server no actual work is being done. I'm not sure if the machine matters, but it should be more than up to the job (quad core, 4 gig of ram, 512 mb video card).
Is this sort of behavior normal? As I’ve said I’m a coder by trade, so I have no problem writing an app for this sort of thing (in fact it takes much less time for me to code an application than “draw” it in SSIS, but I figure that margin will shrink with more work done in SSIS), but I’m trying to figure out where something like SSIS and DTS would fit into my toolbox. So far nothing about it has really impressed me. Maybe I’m misusing or abusing SSIS in some way?
Any help would be greatly appreciated, thanks in advance!
SSIS control flow and loops are not very high performance, and not designed for processing these amounts of data. Especially during the debugging - before and after each task execution, debugger sends notifications to designer process, which updates colors of the shapes and this could be slow.
You could get much better performance using data flow. Data flow does not operate with single rows, it works with buffers of rows - much faster, and the debugger is only notified about beginning/end of the buffers - so its impact is less noticeable.
SSIS is not designed to do a foreach like that. If you are doing something for each row coming in, you probably want to read those into a dataflow and then using a lookup or merge join, determine whether to do an INSERT (these happen in bulk) or a database command object for multiple SQL UPDATE commands (a better performing option is to batch these into staging table and do a single UPDATE).
In another typical sync situation, you read all the data into a staging table, and do a SQL Server UPDATE on the existing rows (INNER JOIN) and INSERT on the new rows (LEFT JOIN, rhs IS NULL). There is also the possibility of using linked servers, but joins over that can be slow, since all (or a lot of) the data may have to come across the network.
I have SSIS packages that regular import 24 million rows, including handling data conversion and validation and slowly changing dimensions using the TableDifference component, and it performs relatively quickly for that large amount of data versus a separate client program.
I have noticed this is the behavior, I had an SSIS package for moves, that did somewhere in the neighborhood of 3 million entries, it was not possible to debug as it would run for about 3-4 days.
SSIS is still the way I did it, I just don't "debug" with SSIS, I run them when working with the full datasets. If I must debug, I use very small datasets.

Resources