Performance of generated T-SQL from Entity Framework

I recently used Entity Framework for a project, despite my DBA's strong disapproval. So one day he came to my office complaining about generated T-SQL that reaches his database.
For instance, when I want to select a product based on the id, I write something like this:
context.Products.FirstOrDefault(p=>p.Id==id);
Which translates to
SELECT ... FROM (SELECT TOP 1 ... FROM PRODUCTS WHERE ID=#id)
So he is shouting, "Why on earth would you write a SELECT * FROM (SELECT TOP 1)"
So I changed my code to
context.Products.Where(p=>p.Id==id).ToList().FirstOrDefault()
and this produces a much cleaner T-SQL:
SELECT ... FROM PRODUCTS WHERE ID=#id
The inner query and the TOP 1 disappeared. Enough rambling; my question is this: does the first query really add overhead for SQL Server? Is it harder to parse than the second method? The Id column has a clustered index on it. I want a good answer so I can rub it in his face (or mine).
Thanks,
Themos

Have you tried running the queries manually and comparing the execution plans?
The biggest problem here isn't that the SQL doesn't meet your DBA's formatting standards (I'm fairly certain the query engine will optimize out the extra SELECT). It's that the second version calls ToList(), which materializes every row matching the filter before FirstOrDefault() picks one in memory; limiting the result set is a task that should be performed by the DB, not the application layer.
In short, he's being a pedant; leave it the way it was.
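If you want hard evidence either way, the execution-plan suggestion above is how to settle it. A minimal T-SQL sketch, reusing the PRODUCTS/ID names from the question (the @id value is illustrative); run it in SSMS with "Include Actual Execution Plan" turned on:
DECLARE @id INT = 42;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Shape 1: what EF emits (nested SELECT around TOP 1)
SELECT * FROM (SELECT TOP 1 * FROM PRODUCTS WHERE ID = @id) AS t;
-- Shape 2: the flattened query
SELECT TOP 1 * FROM PRODUCTS WHERE ID = @id;
If both shapes show identical plans and identical logical reads, the nested SELECT costs nothing beyond a trivial amount of parse time.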

Related

IBM Cognos 10 - Simple way to globally rename a table column?

My client has decided they want to rename a very commonly used data item name.
So, for example, the database has a column called 'Cost' and they see 'Cost' on a heap of reports.
The client now wants to see 'Net Cost' everywhere.
So we need to find every occurrence of 'Cost' and change it to 'Net Cost'.
I can do this in Framework Manager easily enough, and I can even run Tools > Report Dependency to find all the reports that use the 'Cost' column. But if there's 4,000 of them, that's a lot of work to update them all.
One idea is to deploy the entire content store to a Cognos Deployment zip file, extract that & do a global search & replace on the XML. But that's going to be messy & dangerous.
Option 2 is to use MotioPI to do a search & replace. I don't think the client will spring for buying this product just for this task.
Are there other options?
Has anyone written anything in the Cognos SDK which will do a rename?
Has someone investigated the Content Store database to the degree that they could do a rename on all the report specs in SQL?
Are there other options I've overlooked?
Any ideas would be greatly welcomed ...
Your first option is the way to go. This essentially boils down to an XML find-and-replace scenario. You'll need to isolate just the instances of the word "Cost" which are relevant to you. This may involve the start and end tags.
To change the data source reference across reports, you'll need to find and replace on the three part name [Presentation Layer].[Namespace].[Cost]. If there are filters on the item, they may just reference the one part name from the Query. Likewise, any derived queries would reference the two part name. Handle these by looking through the XML report spec and figuring out how to isolate the text.
I'm assuming your column names are set to inherit the business name from the model and not hard coded (Source Type should be Data Item Label, NOT Text). If not, you'll need to handle these as well. Looking at the XML, you would see <staticValue>Cost</staticValue> for these.
It's not really dangerous as you have a backup. Just take multiple passes, each with as granular a find and replace as possible.
MotioPI will just look at the values inside the tags, so you won't be able to isolate the relevant instances of 'Cost'; it can't be used for the replace itself. However, it would come in handy for mass validation of the reports after the find and replace. A one-seat license for the year could be justified by the amount of development time it could save here.
Have you tried using DRU? (http://www-01.ibm.com/support/docview.wss?uid=swg24021248)
I have used this tool before to do what you are describing.
Good luck.
You can at least search for text in the Content Store (v10.2.1) using something like:
set define off
select distinct T4.name as folder_name, T2.name as report_name
from cmobjprops7 T1
inner join cmobjnames T2 on T1.cmid=T2.cmid
inner join cmobjects T3 on T1.cmid=T3.cmid
inner join cmobjnames T4 on T3.pcmid=T4.cmid
inner join ( -- only want the latest version (this still shows reports that have been deleted)
Select T4.name as folder_name, T2.name as report_name, max(T3.modified) as latest_version_date
from cmobjnames T2
inner join cmobjects T3 on T2.cmid=T3.cmid
inner join cmobjnames T4 on T3.pcmid=T4.cmid
Where T2.Name like '%myReport%' -- search only reports with 'myReport' in the name
and T4.name in ('Project Zeus','Project Jupiter') -- search only these folders
Group by T4.name, T2.name
) TL on TL.folder_name=T4.name and TL.report_name=T2.name and TL.latest_version_date=T3.modified
where T1.spec like '%[namespace].[column_name]%' -- text you want to find
and T4.name in ('Project Zeus','Project Jupiter')
order by 1 desc, 2;
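On the "rename the report specs in SQL" idea from the question: the same table supports a find-and-replace in principle, but this is exactly the messy-and-dangerous route. A hypothetical sketch only, assuming (as the search query above does) that the spec column in cmobjprops7 is readable text in your content store; in many versions it is stored as a BLOB, in which case a plain REPLACE will not work. Take a full content store backup and test on a copy first:
-- DANGEROUS: run only against a restored copy of the content store.
update cmobjprops7
set spec = replace(spec,
    '[Presentation Layer].[Namespace].[Cost]',
    '[Presentation Layer].[Namespace].[Net Cost]')
where spec like '%[Presentation Layer].[Namespace].[Cost]%';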

Optimizing JOINs : comparison with indexed tables

Let's say we have a time-consuming query, described below:
(SELECT ...
FROM ...) AS FOO
LEFT JOIN (
SELECT ...
FROM ...) AS BAR
ON FOO.BarID = BAR.ID
Let's suppose that
(SELECT ...
FROM ...) AS FOO
Returns many rows (let's say 10 M). Every single row has to be joined with data in BAR.
Now let's say we insert the result of
(SELECT ...
FROM ...) AS BAR
into a table, and add the ad hoc index(es) to it.
My question :
How would the performance of the JOIN against a live subquery differ from the performance of the JOIN against a table containing the result of that same subquery, with ad hoc indexes added?
Another way to put it :
If a JOIN is slow, would there be any gain in actually storing and indexing the table that we JOIN to?
The answer is 'Maybe'.
It depends on the statistics of the data in question. The only way you'll find out for sure is to actually load the first query into a temp table, stick a relevant index on it, then run the second part of the query.
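A minimal T-SQL sketch of that experiment, with hypothetical names standing in for the question's schematic FOO/BAR:
-- Materialize the BAR side once (dbo.BarSource stands in for BAR's SELECT):
SELECT b.ID, b.SomeValue
INTO #bar
FROM dbo.BarSource AS b;

-- Index the join key before running the big join:
CREATE CLUSTERED INDEX IX_bar_ID ON #bar (ID);

-- Re-run the join against the indexed temp table:
SELECT f.*, b.SomeValue
FROM (SELECT * FROM dbo.FooSource) AS f  -- stands in for FOO
LEFT JOIN #bar AS b ON f.BarID = b.ID;
Compare timings and plans against the original all-inline version.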
I can tell you that if speed is what you want, and it's possible for you to load the results of your first query permanently into a table, then of course your query is going to be quicker.
If you want it to be even faster, depending on which DBMS you are using, you could consider creating an index which crosses both tables; in SQL Server these are called 'indexed views', and in other systems you can look up 'materialized views'.
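A hedged SQL Server sketch of an indexed view, with the same hypothetical names as above. Note the restrictions: the view must be schema-bound, the first index must be unique and clustered, and outer joins are not allowed, so the question's LEFT JOIN would have to be restructured as an inner join:
CREATE VIEW dbo.vFooBar
WITH SCHEMABINDING
AS
SELECT f.FooID, f.BarID, b.SomeValue
FROM dbo.FooSource AS f
JOIN dbo.BarSource AS b ON f.BarID = b.ID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vFooBar ON dbo.vFooBar (FooID);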
Finally, if you want the ultimate in speed, consider denormalising your data and eliminating the join that is occurring on the fly - basically you move the pre-processing (the join) offline at the cost of storage space and data consistency (your live table will be a little behind depending on how frequently you run your updates).
I hope this helps.

Indices not working on sqlite table

I am using indices on the columns I search on. The indices are created like this:
CREATE INDEX index1 on <TABLE>(<col1> COLLATE NOCASE ASC)
CREATE INDEX index2 on <TABLE>(<col2> COLLATE NOCASE ASC)
CREATE INDEX index3 on <TABLE>(<col3> COLLATE NOCASE ASC)
Now, the select query to search for records is like this:
select <col1> from <TABLE> where <col1> like '%monit%' AND <col2> like '%84%' GROUP BY <col1> limit 0,501;
When I run EXPLAIN QUERY PLAN on my sqlite database like this:
EXPLAIN QUERY PLAN select <col1> from <TABLE> where <col1> like '%monit%' AND <col2> like '%84%' GROUP BY <col1> limit 0,501;
It returns the output as:
0|0|0|SCAN TABLE USING INDEX (~250000 rows)
and when I drop the indexes, the output EXPLAIN QUERY PLAN produces is:
0|0|0|SCAN TABLE (~250000 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
Isn't the number of rows scanned (~250000) supposed to be smaller when an index is used to search the table?
I guess the problem here is with the LIKE keyword, because I have read that LIKE can nullify the use of indices.
EDIT: For indices to work on a query which is using LIKE, the right-hand side of the LIKE must be a string literal that does not begin with a wildcard character. So, in the above query, I tried using the search parameter in LIKE without '%' at the beginning:
EXPLAIN QUERY PLAN select <col1> from <TABLE> where <col1> like 'monit%' AND <col2> like '84%' GROUP BY <col1> limit 0,501;
and the output I got was this:
0|0|0|SEARCH TABLE partnumber USING INDEX model_index_partnumber (model>? AND model<?) (~15625 rows)
So, you see, the number of rows being searched (rather than scanned) is ~15625 here.
But the problem is that I cannot do away with the % wildcard at the beginning. Can anyone please suggest an alternative way to achieve the same result?
EDIT:
I have tried using FTS3 from terminal but when I typed this query:
CREATE VIRTUAL TABLE <tbl> USING FTS3 (<col_list>);
It throws an error:
Error: no such module: FTS3
Please help me enable FTS3 from the terminal as well as in Xcode (I need the steps for both tasks).
I am using SQLCipher and have already performed this from the terminal:
CFLAGS="-DSQLITE_ENABLE_FTS3=1" ./configure
EDIT:
Please see the question 'sqlite table taking time to fetch the records in LIKE query', posted by me.
EDIT:
Hey All, I got some success. I modified my select query to look like this:
select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;
And bingo, the search time was like never before. But the problem now is that when I run EXPLAIN QUERY PLAN on it, it shows a temp B-tree being used for DISTINCT, which I don't want:
explain query plan select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;
Output:
0|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (rowid>? AND rowid<?) (~15625 rows)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (~1 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
A couple other options ...
Full Text Indexes:
http://sqlite.org/fts3.html
The most common (and effective) way to describe full-text searches is
"what Google, Yahoo and Altavista do with documents placed on the
World Wide Web".
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
Word Breaking:
If you're looking for words (or words that start with something), you can break text blobs into words yourself and store your own indexed word tables. But even then, you'll only be able to do word LIKE 'monit%' to get hits like "monitor".
If possible, use the full-text option; it will be much less code. But if that's not an option for some reason, you can fall back to your own word-breaking tables, though those are limited to word-begins-with matches if you want to avoid scans (still better than whole-text-block-begins-with). A sketch follows.
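A hedged SQLite sketch of such a word table, reusing the partnumber/description names from the question; the word splitting itself would happen in application code, and the table/index names here are hypothetical:
-- One row per (word, source rowid):
CREATE TABLE words (
  word      TEXT COLLATE NOCASE,
  src_rowid INTEGER  -- rowid of the partnumber row containing the word
);
CREATE INDEX idx_words_word ON words (word COLLATE NOCASE);

-- Prefix search on a word, then join back to the source rows:
SELECT p.description
FROM words w
JOIN partnumber p ON p.rowid = w.src_rowid
WHERE w.word LIKE 'monit%';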
Be aware that the sqlite that comes with iOS does not have full text enabled. You can work around that. There are instructions on that and its use at:
http://longweekendmobile.com/2010/06/16/sqlite-full-text-search-for-iphone-ipadyour-own-sqlite-for-iphone-and-ipad/
The full docs on creating and querying full text tables are here: http://sqlite.org/fts3.html
To get FTS3 to also work from terminal, see:
Compiling the command line interface # http://www.sqlite.org/howtocompile.html
sqlite3 using fts3 create table in my mac terminal and how to use it in iphone xcode project?
This is quite simple. You are telling SQLite to examine every record in the table. It is faster to do this without using an index, because using an index would involve additional IO. An index is used when you want to examine a subset of the records in a table, where the extra IO of using the index is paid back by not having to examine every record in the table.
When you say LIKE '%something' that means all records with anything at all at the beginning of the field, followed by something. The only way to find those is to examine every single record. Note that indexes should still be used if you only write LIKE 'something%', because in that case SQLite can use the index to find the subset of records beginning with 'something'. In the old days, when databases were not so clever, we used to write it like this to force the use of an index: SELECT * WHERE col1 >= 'something' AND col1 < 'somethinh'. Note the intentional 'misspelling' in the second condition: the final g is bumped to h to form the upper bound of the range.
If you can it is best to avoid using % at the beginning of a LIKE condition. In some cases you may be able to change your schema so that data is stored in two columns rather than one. Then you use a LIKE "something%" search on the second of the two columns. Of course this depends on your data being structured right.
But even if splitting into two columns is not possible, it may be possible to divide and conquer the data in another way. For instance, you could split the search fields into words, and index every word in a single column in another search table. That way, 'look for something or other' becomes a list of records where 'something' is an exact match on a record in the search table; no LIKE required. You would then get a record ID to retrieve the original record. This is one of the things that SOLR does internally, so if you must stick with SQLite and cannot leverage SOLR or LUCENE in any way, you can always read up on how they build inverted indices and do the same thing yourself in your SQLite db.
Remember that LIKE '%something%' must examine every record, but if you can select a subset of the data first and then apply the LIKE search, this will run a lot faster. Filling the cache has the same effect, which is what your experiments with DISTINCT were doing. Maybe all you need to do is enlarge the cache to get acceptable search times. The first search will still be slow, but people are often quite forgiving of problems which go away when you retry.
When you use arbitrary wildcards like that, you are getting very close to a full text search engine requirement like SOLR. These work by indexing the data 100% in RAM. With SQLite you might be able to do something similar by creating a second in-memory database, reading all data from the disk tables into the in-memory db, then using the in-memory db for searching with wildcards. You would still have full-table scans with queries such as LIKE '%monit%', but that scan takes place in RAM, where it is far less time-consuming. You don't need to import all your data into RAM, only the parts where you need '%something%' searches, because SQLite can do cross-database joins. SQLite makes it easy to create an in-memory database, and the ATTACH DATABASE and DETACH DATABASE commands make it easy to connect a second database to your app. There is some example code for iOS in this question: Can iPhone sqlite apps attach to other databases?
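A hedged sketch of the in-memory copy, again assuming the partnumber/description names from the question:
-- Create and attach an in-memory database:
ATTACH DATABASE ':memory:' AS mem;

-- Copy only what the '%something%' searches need:
CREATE TABLE mem.partnumber AS
SELECT rowid AS src_rowid, description FROM main.partnumber;

-- Full scans still happen, but in RAM:
SELECT src_rowid, description
FROM mem.partnumber
WHERE description LIKE '%monit%';

DETACH DATABASE mem;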
Not sure why you don't like EXPLAIN using B-Trees since the b-tree is probably the fastest possible search structure available when your data has to be read from a filesystem.
I have a MySQL book that suggests REVERSE()-ing the text (and, if your application permits, storing it in a column). Then you search the reversed text with LIKE REVERSE('%something').
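A hedged MySQL-flavoured sketch of that trick (stock SQLite has no REVERSE(), so there you would reverse the string in application code instead); the column and index names are hypothetical:
-- Keep a reversed copy of the searchable text and index it:
ALTER TABLE partnumber ADD COLUMN description_rev VARCHAR(255);
UPDATE partnumber SET description_rev = REVERSE(description);
CREATE INDEX idx_desc_rev ON partnumber (description_rev);

-- A trailing wildcard on the reversed column replaces a leading
-- wildcard on the original, so the index is usable:
SELECT description
FROM partnumber
WHERE description_rev LIKE REVERSE('%monitor');  -- i.e. 'rotinom%'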

Is a full list returned first and then filtered when using linq to sql to filter data from a database or just the filtered list?

This is probably a very simple question that I am working through in an MVC project. Here's an example of what I am talking about.
I have a dbml file linked to a database with a table called Users that has 500,000 rows. But I only want to find the Users who were entered on 5/7/2010. So let's say I do this in my UserRepository:
from u in db.GetUsers() where u.CreatedDate == new DateTime(2010, 5, 7) select u
(doing this from memory so don't kill me if my syntax is a little off, it's the concept I am looking for)
Does this statement first return all 500,000 rows and then filter it or does it only bring back the filtered list?
It filters in the database, since you're building your expression on top of an ITable, which returns an IQueryable<T> data source.
Linq to SQL translates your query into SQL before sending it to the database, so only the filtered list is returned.
When the query is executed it will create SQL to return the filtered set only.
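For the query in the question, the SQL that actually ships to the server looks roughly like this (a hedged sketch; the column list and the @p0 parameter name are illustrative):
SELECT [t0].[Id], [t0].[Name], [t0].[CreatedDate]
FROM [dbo].[Users] AS [t0]
WHERE [t0].[CreatedDate] = @p0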
One thing to be aware of is that if you do nothing with the results of that query nothing will be queried at all.
The query will be deferred until you enumerate the result set.
These folks are right and one recommendation I would have is to monitor the queries that LinqToSql is creating. LinqToSql is a great tool but it's not perfect. I've noticed a number of little inefficiencies by monitoring the queries that it creates and tweaking it a bit where needed.
The DataContext has a "Log" property that you can work with to view the queries created. I created a simple HttpModule that outputs the DataContext's Log (formatted for sweetness) to my output window. That way I can see the SQL it used and adjust if need be. It's been worth its weight in gold.
Side note - I don't mean to be negative about the SQL that LinqToSql creates as it's very good and efficient almost every time. Another good side effect of monitoring the queries is you can show your friends that are die-hard ADO.NET - Stored Proc people how efficient LinqToSql really is.

LINQ to SQL Pagination and COUNT(*)

I'm using the PagedList class in my web application that many of you might be familiar with if you have been doing anything with ASP.NET MVC and LINQ to SQL. It has been blogged about by Rob Conery, and a similar incarnation was included in things like Nerd Dinner, etc. It works great, but my DBA has raised concerns about potential future performance problems.
His issue is around the SELECT COUNT(*) that gets issued as a result of this line:
TotalCount = source.Count();
Any action that has paged data will fire off an additional query (like below) as a result of the IQueryable.Count() method call:
SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0]
Is there a better way to handle this? I considered using the Count property of the PagedList class to get the item count, but realized that this won't work because it's only counting the number of items currently displayed (not the total count).
How much of a performance hit will this cause to my application when there's a lot of data in the database?
IIRC this is part of the index stats and should be very efficient; you should ask your DBA to substantiate his concerns rather than prematurely optimising.
Actually, this is a pretty common issue with Linq.
Yes, index stats will get used if the statement is only SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0], but 99% of the time it's going to contain WHERE clauses as well.
So basically two SQL statements are executed:
SELECT COUNT(*) AS [value] FROM [dbo].[Products] AS [t0] WHERE blah=blah and someint=500
SELECT blah, someint FROM [dbo].[Products] AS [t0] WHERE blah=blah and someint=500
You start running into problems if the table is updated often, since the COUNT(*) returned by the first statement doesn't necessarily match what the second statement returns; this can surface as the error message 'Row not found or changed.'
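One way to collapse the two round trips into one, and keep the count consistent with the page, is a window aggregate (SQL Server 2005 and later). A hedged sketch reusing the hypothetical blah/someint filter from above; whether it actually beats two queries depends on the plan, so measure:
SELECT blah, someint, TotalCount
FROM (
    SELECT blah, someint,
           ROW_NUMBER() OVER (ORDER BY blah) AS rn,
           COUNT(*) OVER () AS TotalCount  -- total rows matching the WHERE
    FROM [dbo].[Products]
    WHERE blah = @blah AND someint = 500
) AS t
WHERE rn BETWEEN 21 AND 40;  -- page 2 at 20 rows per page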
Some databases (Oracle, PostgreSQL, and I think SQL Server) keep a record of row counts in the system tables, though these are sometimes only accurate to the point at which the statistics were last refreshed (Oracle). You could use this approach if you only need a fairly-accurate-but-not-exact metric.
Which database are you using, or does that vary?
(PS I know that you are talking about MsSQL however)
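Since the question is about SQL Server, the metadata route there looks like this; the number comes straight from the allocation metadata, with no table scan, though it can lag slightly behind in-flight transactions:
SELECT SUM(ps.row_count) AS ApproxRows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.Products')
  AND ps.index_id IN (0, 1);  -- heap or clustered index only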
I am no DBA, but count(*) in MySQL is a real performance hit. Simply changing this to count(id) really does improve the speed.
I came across this when I was querying a table with very large BLOB (image) data. The query took around 15 seconds to load. Changing the query to count(id) reduced it to 0.02 seconds. Still a little slow, but a hell of a lot better.
I think this is what the DBA is getting at. I have noticed that when debugging LINQ, the statement that counts takes a very long time (1 second) to jump to the next statement.
Based on my findings I have to agree with the DBA's concerns...
