Is a full list returned first and then filtered when using linq to sql to filter data from a database or just the filtered list? - asp.net-mvc

This is probably a very simple question that I am working through in an MVC project. Here's an example of what I am talking about.
I have an rdml file linked to a database with a table called Users that has 500,000 rows. But I only want to find the Users who were entered on 5/7/2010. So let's say I do this in my UserRepository:
from u in db.GetUsers() where u.CreatedDate = "5/7/2010" select u
(doing this from memory so don't kill me if my syntax is a little off, it's the concept I am looking for)
Does this statement first return all 500,000 rows and then filter it or does it only bring back the filtered list?

It filters in the database since your building your expression atop of an ITable returning a IQueryable<T> data source.

Linq to SQL translates your query into SQL before sending it to the database, so only the filtered list is returned.

When the query is executed it will create SQL to return the filtered set only.
One thing to be aware of is that if you do nothing with the results of that query nothing will be queried at all.
The query will be deferred until you enumerate the result set.

These folks are right and one recommendation I would have is to monitor the queries that LinqToSql is creating. LinqToSql is a great tool but it's not perfect. I've noticed a number of little inefficiencies by monitoring the queries that it creates and tweaking it a bit where needed.
The DataContext has a "Log" property that you can work with to view the queries created. I created a simple HttpModule that outputs the DataContext's Log (formatted for sweetness) to my output window. That way I can see the SQL it used and adjust if need be. It's been worth its weight in gold.
Side note - I don't mean to be negative about the SQL that LinqToSql creates as it's very good and efficient almost every time. Another good side effect of monitoring the queries is you can show your friends that are die-hard ADO.NET - Stored Proc people how efficient LinqToSql really is.

Related

Neo4j Cypher optimization of complex paginated query

I have a rather long and complex paginated query. I'm trying to optimize it. In the worst case - first, I have to execute the data query in a one call to Neo4j, and then I have to execute pretty much the same query for the count. Of course, I do everything in one transaction. Anyway, I don't like the overall execution time, so I extracted the most common part for both - data and count queries and execute it on the first call. This common query returns the IDs of nodes, which I then pass as parameters to the rest of data and count queries. Now, everything works much faster. One thing I don't like is that a common query can sometimes return quite a large set of IDs.. it can be 20k..50k Long IDs.
So my question is - because I'm doing this in a one transaction - is there a way to preserve such Set of IDs somewhere in Neo4j between common query and data/count query calls and just refer them somehow in the subsequent data/count queries without moving between app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But usually it's uncommon to both provide counts (even google doesn't provide "real" counts) and data.
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the id's back (and don't collect them as an aggregation) you can start using the id's received already to issue your new queries (even in parallel).

Write native SQL in Core Data

I need to write a native SQL Query while I'm using Core Data in my project. I really need to do that, since I'm using NSPredicate right now and it's not efficient enough (in just one single case). I just need to write a couple of subqueries and joins to fetch a big number of rows and sort them by a special field. In particular, I need to sort it by the sum of values of their child-entites. Right now I'm fetching everything using NSPredicate and then I'm sorting my result (array) manually, but this just takes too long since there are many thousands of results.
Please correct me if I'm wrong, but I'm pretty sure this can't be a huge challenge, since there's a way of using sqlite in iOS applications.
It would be awesome if someone could guide me into the right direction.
Thanks in advance.
EDIT:
Let me explain what I'm doing.
Here's my Coredata model:
And here's how my result looks on the iPad:
I'm showing a table with one row per customer, where every customer has an amount of sales he made from January to June 2012 (Last) AND 2013 (Curr). Next to the Curr there's the variance between those two values. The same thing for gross margin and coverage ratio.
Every customer is saved in the Kunde table and every Kunde has a couple of PbsRows. PbsRow actually holds the sum of sales amounts per month.
So what I'm doing in order to show these results, is to fetch all the PbsRows between January and June 2013 and then do this:
self.kunden = [NSMutableOrderedSet orderedSetWithArray:[pbsRows valueForKeyPath:#"kunde"]];
Now I have all customers (Kunde) which have records between January and June 2013.
Then I'm using a for loop to calculate the sum for each single customer.
The idea is to get the amounts of sales of the current year and compare them to the last year.
The bad thing is that there are a lot of customers and the for-loop just takes very long :-(
This is a bit of a hack, but... The SQLite library is capable of opening more than one database file at a given time. It would be quite feasible to open the Core Data DB file (read/only usage) directly with SQLite and open a second file in conjunction with this (reporting/temporary tables). One could then execute direct SQL queries on the data in the Core Data DB and persist them into a second file (if persistence is needed).
I have done this sort of thing a few times. There are features available in the SQLite library (example: full-text search engine) that are not exposed through Core Data.
If you want to use Core Data there is no supported way to do a SQL query. You can fetch specific values and use [NSExpression expressionForFunction:arguments:] with a sum: function.
To see what SQL commands Core Data executes add -com.apple.CoreData.SQLDebug 1 to "Arguments Passed on Launch". Note that this should not tempt you to use the SQL commands youself, it's just for debugging purposes.
Short answer: you can't do this.
Long answer: Core Data is not a database per se - it's not guaranteed to have anything relational backing it, let alone a specific version of SQLite that you can query against. Furthermore, going mucking around in Core Data's persistent store files is a recipe for disaster, especially if Apple decides to change the format of that file in some way. You should instead try to find better ways to optimize your usage of NSPredicate or start caching the values you care about yourself.
Have you considered using the KVC collection operators? For example, if you have an entity Foo each with a bunch of children Bar, and those Bars have a Baz integer value, I think you can get the sum of those for each Foo by doing something like:
foo.bars.#sum.baz
Not sure if these are applicable to predicates, but it's worth looking into.

Limit serverside results from WebApi controller and ODATA w/ Nhibernate

I've created a WebApi project in VS 2012, using NHibernate as my ORM and I intend to enable Odata support on it. So I've created a test controller with a single Get method that returns a list of entities from a table on my database.
Everything works fine, I can use OData to filter and order my results, etc. The problem is I couldn't find a way to limit the amount of data that's being returned from the database to the controller, and this table has millions of records in it.
Using the PageSize property of the Queryable attribute only seems to be limiting the amount of data returned to the client, but no the amount of Data returned from the DB.
I've tried applying a Take(n) on the IQueryable inside the get method before returning it, and it limits the results brought back from the DB, but it breaks the OData filtering, since if you try to query an entity that's not in the first n results, it just returns an empty collection.
I know you can use the $Top parameter on OData to accomplish this, but I would like not to depend on the client/consumer providing it in order to ensure that I'm not unnecessarily bringing thousands or even million of records that I'm not going to use.
I've also tried to manually check if the client provided a Top parameter on the query string, apply the OData transformation to my Queryable and then applying the Take(n) method over the transformed query. This approach enabled me to filter for any entity through OData, but it breaks pagination, because if I use the $Skip=n parameter, it again returns an empty collection.
So, is there any way to reliably limit the results fetched from the DB while not breaking the OData support?
We recently found that too. We are not applying a Take(pageSize) when server driven paging is enabled as we have to figure out if a next page link should be generated or not. We just enumerate the result set for pageSize number of entities and check if there are more entities or not. We thought that most providers generally bring a partial set of results as IQueryable is generally a lazy implementation. Turns out that is not true. Also, the database can optimize the query if it knows only pageSize number of results are required.
This is the issue that was opened for it. Good news is Youssef fixed it already :). This is the commit that fixed it. So, if you grab the nightly builds you should be good.

SQL SELECT with table aliases in Core Data

I have the following SQL query that I want to do using Core Data:
SELECT t1.date, t1.amount + SUM(t2.amount) AS importantvalue
FROM specifictable AS t1, specifictable AS t2
WHERE t1.amount < 0 AND t2.amount < 0 AND t1.date IS NOT NULL AND t2.date IS NULL
GROUP BY t1.date, t1.amount;
Now, it looks like CoreData fetch requests can only fetch from a single entity. Is there a way to do this entire query in a single fetch request?
The best way I know is to crate an abstract parent entity for entities you wish to fetch together.
So if you have - 'Meat' 'Vegetables' and 'Fruits' entities, you can create a parent abstract entity for 'Food' and then fetch for all the sweet entities in the 'Food' entity.
This way you will get all the sweet 'Meat' 'Vegetables' and 'Fruits'.
Look here:
Entity Inheritance in Apple documentation.
Nikolay,
Core Data is not a SQL system. It has a more primitive query language. While this appears to be a deficit, it really isn't. It forces you to bring things into RAM and do your complex calculations there instead of in the DB. The NSSet/NSMutableSet operations are extremely fast and effective. This also results in a faster app. (This is particularly apparent on iOS where the flash is slow and, hence, big fetches are to be preferred.)
In answer to your question, yes, a fetch request operates on a single entity. No, you are not limited to data on that entity. One uses key paths to traverse relationships in the predicate language.
Shannoga's answer is one good way to solve your problem. But I don't know enough about what you are actually trying to accomplish with your data model to judge whether using entity inheritance is the right path for your app. It may not be.
Your SQL schema from a server may not make sense in a CD app. Both the query language and how the data is used in the UI probably force a different structure. (For example, using a fetched results controller on iOS can force you to denormalize your data differently than you would on a server.)
Entity inheritance, like inheritance in OOP, is a stiff technology. It is hard to change. Hence, I use it carefully. When I do use it, I gain performance in some fetches and some simplification in other calculations. At other times, it is the wrong answer, performance wise.
The real answer is a question: what are you really trying to do?
Andrew

Linq-to-SQL query - Need to filter by IDs returned by Full-Text Search sql functions - Hitting limit for Contains

My objective:
I have built a working controller action in MVC which takes user input for various filter criteria and, using PredicateBuilder (part of LinqKit - sorry, I'm not allowed enough links yet) builds the appropriate LINQ query to return rows from a "master" table in SQL with a couple hundred thousand records. My implementation of the predicates is totally inelegant, as I'm new to a lot of this, and under a very tight deadline, but it did make life easier. The page operates perfectly as-is.
To this, I need to add a Full-Text search filter. Understanding the way LINQ translates Contains to LIKE(%%), using the advice in Simon Blog: LINQ-to-SQL - Enabling Full-Text Searching, I've already prepared Table Functions in SQL to run Freetext queries on the relevant columns. I have 4 functions, to match the query against 4 separate tables.
My approach:
At the moment, I'm building the predicates (I'll spare you) for the initial IQueryable data object, running a LINQ command to return them, like so:
var MyData = DB.Master_Items.Where(outer);
Then, I'm attempting to further filter MyData on the Keys returned by my full-text search functions:
var FTS_Matches_Subtable_1 = (from tbl in DB.Subtable_1
join fts in DB.udf_Subtable_1_FTSearch(KeywordTerms)
on tbl.ID equals fts.ID
select tbl.ForeignKey);
... I have 4 of those sets of matches which I've tried to use to filter my original dataset in several ways with no success. For instance:
MyNewData = MyData.Where(d => FTS_Matches_Subtable_1.Contains(d.Key) ||
FTS_Matches_Subtable_2.Contains(d.Key) ||
FTS_Matches_Subtable_3.Contains(d.Key) ||
FTS_Matches_Subtable_4.Contains(d.Key));
I just get the error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I get that it's because I'm trying to pass a relatively large set of data into the Contains function and LINQ is converting each record into a separate parameter, exceeding the limit.
I just don't know how to get around it.
I found another post linq expression to return property value which seemed SO promising. I tried ifwdev's solution (2nd highest ranked answer): using LinqKit to build an extension that will break up the queries into manageable chunks. But I can't figure out how to implement it. Out of my depth right now maybe?
Is there another approach that I'm missing? Some simpler way to accomplish this that I've overlooked?
Sorry for the long post. But thank you for any help you can provide!
This is a perfect time to go back to raw ado.net.
Twisting things around just to use linq to sql is probably just as time consuming if you wrote the query and hydration by hand.

Resources