Entity Framework Include OrderBy random generates duplicate data - asp.net-mvc

When I retrieve a list of items from a database including some children (via .Include), and order them randomly, EF gives me an unexpected result: it creates/clones additional items.
To explain myself better, I've created a small and simple EF CodeFirst project to reproduce the problem.
First I'll give you the code for this project.
The project
Create a basic MVC3 project and add the EntityFramework.SqlServerCompact package via Nuget.
That adds the latest versions of the following packages:
EntityFramework v4.3.0
SqlServerCompact v4.0.8482.1
EntityFramework.SqlServerCompact v4.1.8482.2
WebActivator v1.5
The Models and DbContext
using System.Collections.Generic;
using System.Data.Entity;
namespace RandomWithInclude.Models
{
public class PeopleContext : DbContext
{
public DbSet<Person> Persons { get; set; }
public DbSet<Address> Addresses { get; set; }
}
public class Person
{
public int ID { get; set; }
public string Name { get; set; }
public virtual ICollection<Address> Addresses { get; set; }
}
public class Address
{
public int ID { get; set; }
public string AdressLine { get; set; }
public virtual Person Person { get; set; }
}
}
The DB Setup and Seed data: EF.SqlServerCompact.cs
using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using RandomWithInclude.Models;
[assembly: WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF), "Start")]
namespace RandomWithInclude.App_Start
{
public static class EF
{
public static void Start()
{
Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
Database.SetInitializer(new DbInitializer());
}
}
public class DbInitializer : DropCreateDatabaseAlways<PeopleContext>
{
protected override void Seed(PeopleContext context)
{
var address1 = new Address {AdressLine = "Street 1, City 1"};
var address2 = new Address {AdressLine = "Street 2, City 2"};
var address3 = new Address {AdressLine = "Street 3, City 3"};
var address4 = new Address {AdressLine = "Street 4, City 4"};
var address5 = new Address {AdressLine = "Street 5, City 5"};
context.Addresses.Add(address1);
context.Addresses.Add(address2);
context.Addresses.Add(address3);
context.Addresses.Add(address4);
context.Addresses.Add(address5);
var person1 = new Person {Name = "Person 1", Addresses = new List<Address> {address1, address2}};
var person2 = new Person {Name = "Person 2", Addresses = new List<Address> {address3}};
var person3 = new Person {Name = "Person 3", Addresses = new List<Address> {address4, address5}};
context.Persons.Add(person1);
context.Persons.Add(person2);
context.Persons.Add(person3);
}
}
}
The controller: HomeController.cs
using System;
using System.Data.Entity;
using System.Linq;
using System.Web.Mvc;
using RandomWithInclude.Models;
namespace RandomWithInclude.Controllers
{
public class HomeController : Controller
{
public ActionResult Index()
{
var db = new PeopleContext();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => Guid.NewGuid());
return View(persons.ToList());
}
}
}
The View: Index.cshtml
@using RandomWithInclude.Models
@model IList<Person>
<ul>
@foreach (var person in Model)
{
<li>
@person.Name
</li>
}
</ul>
That should be all, and your application should compile :)
The problem
As you can see, we have 2 straightforward models (Person and Address) and Person can have multiple Addresses.
We seed the generated database with 3 persons and 5 addresses.
If we get all the persons from the database, including their addresses, randomize the results and just print out the names of those persons, that's where it all goes wrong.
As a result, I sometimes get 4 persons, sometimes 5 and sometimes 3, and I expect 3. Always.
e.g.:
Person 1
Person 3
Person 1
Person 3
Person 2
So it's copying/cloning data! And that's not cool.
It seems that EF loses track of which addresses belong to which person.
The generated SQL query is this:
SELECT
[Project1].[ID] AS [ID],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[ID1] AS [ID1],
[Project1].[AdressLine] AS [AdressLine],
[Project1].[Person_ID] AS [Person_ID]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent2].[ID] AS [ID1],
[Extent2].[AdressLine] AS [AdressLine],
[Extent2].[Person_ID] AS [Person_ID],
CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [People] AS [Extent1]
LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Person_ID]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[ID] ASC, [Project1].[C2] ASC
Workarounds
If I remove the .Include(p => p.Addresses) from the query, everything works fine. But of course the addresses aren't loaded then, and accessing that collection will make a new call to the database every time.
I can first get the data from the database and randomize it afterwards by adding a .ToList() before the .OrderBy, like this: var persons = db.Persons.Include(p => p.Addresses).ToList().OrderBy(p => Guid.NewGuid());
Does anybody have any idea why this is happening?
Might this be a bug in the SQL generation?

As AakashM's and Nicolae Dascalu's answers suggest, it strongly seems that LINQ OrderBy requires a stable ranking function, which NEWID/Guid.NewGuid is not.
So we have to use another random generator, one that is stable within a single query.
To achieve this, before each query, use a .NET Random generator to get a random number. Then combine this random number with a unique property of the entity being randomly sorted. And to 'randomize' the result a bit more, checksum it. (CHECKSUM is a SQL Server function that computes a hash; the original idea was found on this blog.)
Assuming Person Id is an int, you could write your query this way :
// Random instances should be stored and reused, not instantiated at each usage.
// But beware, it is not thread safe. If you want to share it between threads, you
// would have to use locks, see its documentation.
// https://learn.microsoft.com/en-us/dotnet/api/system.random.
// But using locks is a bad idea for scalability, especially in a Web context.
var randomGenerator = new Random();
// ...
var rnd = randomGenerator.NextDouble();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => SqlFunctions.Checksum(p.Id * rnd));
Like the NewGuid hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.
Beware:
If your query ordering does not guarantee a unique ranking of your entities, you must complement it to guarantee one. For example, if you use a non-unique property of your entities for the checksum call, then add something like .ThenBy(p => p.Id) after the OrderBy.
If the ranking is not unique for your queried root entity, its included children may get mixed with the children of other entities that have the same ranking, and the bug will remain.
Note:
I would prefer to use the .Next() method to get an int and then combine it through an xor (^) with a unique int property of the entity, rather than using a double and multiplying by it. But SqlFunctions.Checksum unfortunately does not provide an overload for the int data type, though the SQL Server function is supposed to support it. You may use a cast to overcome this, but to keep it simple I finally chose to go with the multiplication.

tl;dr: There's a leaky abstraction here. To us, Include is a simple instruction to stick a collection of things onto each single returned Person row. But EF's implementation of Include is done by returning a whole row for each Person-Address combo, and reassembling at the client. Ordering by a volatile value causes those rows to become shuffled, breaking apart the Person groups that EF is relying on.
When we have a look at ToTraceString() for this LINQ:
var people = c.People.Include("Addresses");
// Note: no OrderBy in sight!
we see
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
So we get one row for each A (n rows for a P with n As), plus 1 row for each P without any As.
Adding an OrderBy clause, however, puts the thing-to-order-by at the start of the ordered columns:
var people = c.People.Include("Addresses").OrderBy(p => Guid.NewGuid());
gives
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[Id] ASC, [Project1].[C2] ASC
So in your case, where the ordered-by-thing is not a property of a P, but is instead volatile, and therefore can be different for different P-A records of the same P, the whole thing falls apart.
I'm not sure where on the working-as-intended ~~~ cast-iron bug continuum this behaviour falls. But at least now we know about it.
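The effect described above can be imitated in memory. The following is only a sketch under a stated assumption: the Materialize method is a hypothetical, heavily simplified stand-in for EF's row-to-object step, which starts a new person whenever the person id differs from the previous row. It is not EF's actual materializer; it only shows why rows of one person need to stay adjacent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowGroupingSketch
{
    // Hypothetical, simplified materializer: assumes rows of one person are adjacent
    // and starts a new person whenever the person id differs from the previous row.
    // Each result entry is (person id, list of address ids).
    public static List<KeyValuePair<int, List<int>>> Materialize(IEnumerable<Tuple<int, int>> rows)
    {
        var result = new List<KeyValuePair<int, List<int>>>();
        foreach (var row in rows)
        {
            if (result.Count == 0 || result[result.Count - 1].Key != row.Item1)
                result.Add(new KeyValuePair<int, List<int>>(row.Item1, new List<int>()));
            result[result.Count - 1].Value.Add(row.Item2);
        }
        return result;
    }

    public static void Main()
    {
        // (PersonId, AddressId) rows as the LEFT OUTER JOIN returns them, ordered by person id.
        var ordered = new[]
        {
            Tuple.Create(1, 1), Tuple.Create(1, 2),
            Tuple.Create(2, 3),
            Tuple.Create(3, 4), Tuple.Create(3, 5)
        };
        // The same rows in an order that sorting by a per-row NEWID() could produce.
        var shuffled = new[] { ordered[0], ordered[3], ordered[1], ordered[4], ordered[2] };

        Console.WriteLine(Materialize(ordered).Count);  // 3
        Console.WriteLine(Materialize(shuffled).Count); // 5
    }
}
```

The shuffled order yields persons 1, 3, 1, 3, 2, which matches the kind of output shown in the question.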

I don't think there is an issue in query generation, but there is definitely an issue when EF tries to convert the rows into objects.
It looks like there is an inherent assumption here that the data for the same person in a joined statement will be returned grouped together, whether or not there is an ORDER BY.
For example, the result of a joined query will always be
P.Id  P.Name    A.Id  A.StreetLine
1     Person 1  10    ---
1     Person 1  11
2     Person 2  12
3     Person 3  13
3     Person 3  14
Even if you order by some other column, the rows of the same person would always appear one after the other.
This assumption is mostly true for any joined query.
But there is a deeper issue here, I think. OrderBy is for when you want data in a certain order (as opposed to random), so that assumption does seem reasonable.
I think you should really get the data out and then randomize it by some other means in your code.
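One way to randomize after the data has been fetched is a Fisher-Yates shuffle on the materialized list. This is a minimal sketch; the ShuffleSketch class and its Shuffle method are illustrative names that do not come from the thread, and the string list stands in for the result of db.Persons.Include(p => p.Addresses).ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleSketch
{
    // Fisher-Yates shuffle: returns a new list holding the same items in random order.
    public static List<T> Shuffle<T>(IEnumerable<T> source, Random rng)
    {
        var items = source.ToList();
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
        return items;
    }

    public static void Main()
    {
        // Stand-in for db.Persons.Include(p => p.Addresses).ToList()
        var persons = new List<string> { "Person 1", "Person 2", "Person 3" };
        var shuffled = Shuffle(persons, new Random());
        Console.WriteLine(string.Join(", ", shuffled));
    }
}
```

Because the shuffle runs on objects that are already assembled, each person appears exactly once regardless of how many addresses were joined in.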

From theory:
To sort a list of items, the compare function should be stable with respect to the items; this means that for any 2 items x and y, the result of x < y should be the same no matter how many times it is queried (called).
I think the issue is related to a misunderstanding of the specification (documentation) of the OrderBy method:
keySelector - A function to extract a key from an element.
The documentation doesn't state explicitly whether the provided function should return the same value for the same object each time it is called (in your case it returns different/random values), but I think the term "key" used in the documentation implicitly suggests this.

When you define a query path to define the query results, (use Include), the query path is only valid on the returned instance of ObjectQuery. Other instances of ObjectQuery and the object context itself are not affected. This functionality lets you chain multiple "Includes" for eager loading.
Therefore, your statement translates into
from person in db.Persons.Include(p => p.Addresses).OrderBy(p => Guid.NewGuid())
select person
instead of what you intended:
(from person in db.Persons.Include(p => p.Addresses)
select person)
.OrderBy(p => Guid.NewGuid())
Hence your second workaround works fine :)
Reference: Loading Related Objects While Querying A Conceptual Model in Entity
Framework - http://msdn.microsoft.com/en-us/library/bb896272.aspx

I also ran into this problem, and solved it by adding a Randomizer Guid property to the main class I was fetching. I then set the column's default value to NEWID() like this (using EF Core 2)
builder.Entity<MainClass>()
.Property(m => m.Randomizer)
.HasDefaultValueSql("NEWID()");
When fetching, it gets a bit more complicated. I created two random integers to function as my order-by indexes, then ran the query like this
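var rand = new Random();
// The upper bound of Next is exclusive; 32 covers all 32 hex digits (indexes 0-31) of the dash-less GUID.
var randomIndex1 = rand.Next(0, 32);
var randomIndex2 = rand.Next(0, 32);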
var taskSet = await DbContext.MainClasses
.Include(m => m.SubClass1)
.ThenInclude(s => s.SubClass2)
.OrderBy(m => m.Randomizer.ToString().Replace("-", "")[randomIndex1])
.ThenBy(m => m.Randomizer.ToString().Replace("-", "")[randomIndex2])
.FirstOrDefaultAsync();
This seems to be working well enough, and should provide enough entropy for even a large dataset to be fairly randomized.

Related

Can I join POCO objects with Entities

If I have a list of objects with a property that should match the value of a field of an entity, can I use this list in a join like this (using LINQPad for testing)?
For some reason I think this is not going to work when it gets translated to SQL. The query seems to take a long time in LINQPad, and the SQL that gets generated (at least what I can see) while it's trying to execute doesn't seem to include anything about "dlist".
void Main()
{
var dlist = new List<dor>();
dlist.Add(new dor() {DeliveryScheduleID=223422});
dlist.Add(new dor() {DeliveryScheduleID=223423});
dlist.Add(new dor() {DeliveryScheduleID=223424});
dlist.Add(new dor() {DeliveryScheduleID=223425});
var retval = (from a in dlist
join b in DeliverySchedules on a.DeliveryScheduleID equals b.Id
join c in CustomerOrders on b.CustomerOrderID equals c.ID
select a.DeliveryScheduleID).Count(); // dor has no ID property
retval.Dump();
}
// Define other methods and classes here
public class dor {
public int DeliveryScheduleID {get;set;}
}
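A common way to approach this kind of query is to reduce the local objects to a list of primitive keys and filter the database-side source with Contains, which LINQ to SQL and Entity Framework can translate to an IN clause. Since dlist is the first source in the query above, the join is likely evaluated in memory after both tables are read in full, which would explain the slowness. The sketch below is not from the thread; the class shapes are assumptions based on the property names in the question, and the in-memory AsQueryable() arrays only stand in for the real tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes, inferred from the property names used in the question.
public class Dor { public int DeliveryScheduleID { get; set; } }
public class DeliverySchedule { public int Id { get; set; } public int CustomerOrderID { get; set; } }
public class CustomerOrder { public int ID { get; set; } }

public static class LocalIdsSketch
{
    public static int CountMatches(
        IEnumerable<Dor> dlist,
        IQueryable<DeliverySchedule> schedules,
        IQueryable<CustomerOrder> orders)
    {
        // Reduce the local objects to a list of primitive keys first.
        List<int> ids = dlist.Select(d => d.DeliveryScheduleID).ToList();

        // Start from the queryable source and filter it with Contains.
        return (from b in schedules
                where ids.Contains(b.Id)
                join c in orders on b.CustomerOrderID equals c.ID
                select b.Id).Count();
    }

    public static void Main()
    {
        var dlist = new List<Dor>
        {
            new Dor { DeliveryScheduleID = 223422 },
            new Dor { DeliveryScheduleID = 223423 },
            new Dor { DeliveryScheduleID = 223424 },
            new Dor { DeliveryScheduleID = 223425 }
        };
        // In-memory stand-ins for the DeliverySchedules and CustomerOrders tables.
        var schedules = new[]
        {
            new DeliverySchedule { Id = 223422, CustomerOrderID = 1 },
            new DeliverySchedule { Id = 223423, CustomerOrderID = 2 },
            new DeliverySchedule { Id = 999, CustomerOrderID = 1 }
        }.AsQueryable();
        var orders = new[] { new CustomerOrder { ID = 1 }, new CustomerOrder { ID = 2 } }.AsQueryable();

        Console.WriteLine(CountMatches(dlist, schedules, orders)); // 2
    }
}
```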

On a four-week struggle for a functional LINQ LAMBDA expression

I have been trying to find the correct LINQ LAMBDA expression for the last month, with zero successful results.
I cannot use the following tools:
Linqer is unable to run because the Microsoft tool it uses to create the SQL connection (the dbml file) refuses to install on my Win8.1 system
LinqPad doesn’t provide an actual translation until you actually buy the product (which makes the “trial” fundamentally broken in the first place)
I have three levels of tables that I need to bring back into a single viewmodel.
Level one is a company table. Easy as s**t.
Level two is a "cycle" table. This is where I have gotten hung up on, since many cycles can exist for a company, but I need to grab only the latest cycle by date.
Level three is a pair of tables that exist off the cycle, I only need a true/false test for content in those tables for the cycle in question. I haven't even tried this yet.
So far I have come up with a minimally functional SQL script (that only deals with the first two levels), but my MVC project is making use of several tools that hook straight into LAMBDA expressions, including PagedList. I need a LAMBDA expression and not a pure SQL expression.
My SQL:
SELECT
co.CompanyId
, co.CompanyName
, co.CompanyCity
, co.NumberOfEmployees
, co.ProspectingScore
, cd.PDFResourceLibrary
, cd.PresentationDone
, cd.MOUDone
FROM Company AS co
OUTER APPLY (
SELECT
TOP 1 MAX(CycleDate) AS CycleDate
, PDFResourceLibrary
, PresentationDone
, MOUDone
FROM Cycle AS cy
WHERE cy.CompanyId = co.CompanyId
GROUP BY PDFResourceLibrary, PresentationDone, MOUDone
) AS cd
ORDER BY co.ProspectingScore DESC
I have tried a number of lambda expressions to date:
db.Company
.GroupJoin(
db.Cycle
, co => co.CompanyId
, cy => cy.CycleId
, (x, y) => new { Company = x, Cycle = y }
).Select(
y => y.Cycle.OrderByDescending(y => y.CycleDate).SingleOrDefault()
).ToList();
But this throws a local variable cannot have the same name as a method type parameter as well as a Cannot implicitly convert type System.Collections.Generic.List to System.Collections.Generic.IEnumerable error.
Converting it to a Join flags the OrderByDescending as invalid, but I need that to drop all but the latest cycle.
I also can't seem to do a join to save my effin' life. All the examples out there fail with my system.
For example, a join that comes sooooooo close is:
db.Company.Join(
db.Cycle.OrderByDescending(x => x.CycleDate).SingleOrDefault()
, co => co.CompanyId
, cy => cy.CycleId
, (x, y) => new { Company = x, Cycle = y }
).ToList();
but then it claims The type arguments for method Queryable Join cannot be inferred from the usage. Like, --what??
I have also tried the following:
db.Company.Include(
x => x.Cycle.OrderByDescending(y => y.CycleDate).SingleOrDefault()
).ToList();
which works for the company, but I cannot seem to drill past the company and into the cycle. When I write @item.Cycle.PresentationDone, it says that ICollection<Cycle> does not contain a definition for PresentationDone, even though I have an ICollection for Cycle in my Company model. It's right there, but the system won’t see it to follow.
An attempt with
db.Company.Select(x => new { Company = x, Cycle = x.Cycle.OrderByDescending(y => y.CycleDate).Single() }).ToList();
also throws the List to IEnumerable conversion error.
As a final note, please keep in mind that I am bringing two models into the same page, and the second model is the same as the first but focuses only on the company. It works. It has no problem pulling data out of the DB:
viewModel.AllCompanies = db.Company.ToList();
Because it does not need to pull anything from the cycle -- I am ignoring anything beneath the Company level. But for the first query, I have to bring several items off of the cycle, and so I need to query for the most recent cycle.
EDIT:
With the generous assistance of Ivan Stoev I have assembled the following:
var query = (IPagedList<DashboardUserData>)(
from co in db.Company
join cy in db.Cycle on co.CompanyId equals cy.CycleId into cycles
from cd in cycles.OrderByDescending(cy => cy.CycleDate).Take(1).DefaultIfEmpty()
orderby co.ProspectingScore descending
select new {
CompanyId = co.CompanyId,
CompanyName = co.CompanyName,
CompanyCity = co.CompanyCity,
NumberOfEmployees = co.NumberOfEmployees,
ProspectingScore = co.ProspectingScore,
PDFResourceLibrary = (bool?)cd.PDFResourceLibrary,
PresentationDone = (bool?)cd.PresentationDone,
MOUDone = (bool?)cd.MOUDone
}).ToPagedList(regionPageIndex, pageSize);
And my model is such:
public IPagedList<DashboardUserData> RegionalCompanies { get; set; }
public class DashboardUserData {
public Guid CompanyId { get; set; }
public string CompanyName { get; set; }
public string CompanyCity { get; set; }
public int? NumberOfEmployees { get; set; }
public int? ProspectingScore { get; set; }
public bool? PDFResourceLibrary { get; set; }
public bool? PresentationDone { get; set; }
public bool? MOUDone { get; set; }
}
But for some reason I am unable to attach the data to the model. I get the following error:
Unable to cast object of type 'PagedList.PagedList`1[<>f__AnonymousType9`8[System.Guid,System.String,System.String,System.Nullable`1[System.Int32],System.Nullable`1[System.Int32],System.Nullable`1[System.Boolean],System.Nullable`1[System.Boolean],System.Nullable`1[System.Boolean]]]' to type 'PagedList.IPagedList`1[CCS.Models.DashboardUserData]'.
It's probably something stupidly simple, but I'm missing it.
EDIT 2:
When I add a filter to the original table, Company:
var query = (IPagedList<DashboardUserData>)(
from co in db.Company
where co.RegionId == new Guid(User.GetClaimValue("Region"))
I now get an issue of:
Only parameterless constructors and initializers are supported in LINQ to Entities
Still have that model issue from the first Edit, tho.
Edit 3:
Huh, I may have solved Edit 1:
viewModel.RegionalCompanies = (
from co in db.Company
where co.RegionId == regionId
join cy in db.Cycle on co.CompanyId equals cy.CycleId into cycles
from cd in cycles.OrderByDescending(cy => cy.CycleDate).Take(1).DefaultIfEmpty()
orderby co.ProspectingScore descending
select new DashboardUserData {
CompanyId = co.CompanyId,
CompanyName = co.CompanyName,
CompanyCity = co.CompanyCity,
NumberOfEmployees = co.NumberOfEmployees,
ProspectingScore = co.ProspectingScore,
PDFResourceLibrary = cd.PDFResourceLibrary,
PresentationDone = cd.PresentationDone,
MOUDone = cd.MOUDone
}).ToPagedList(regionPageIndex, pageSize);
But now the second viewModel:
viewModel.AllOtherCompanies = await db.Company.Where(c => c.RegionId != regionId).Include(c => c.Province).ToPagedListAsync(allPageIndex, pageSize);
return View(viewModel);
Is throwing one very confusing error message:
The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'.
which I have never seen before, even with the original code. Googling right now.
Edit 4:
OMFG, I think I have it working. I was concentrating on getting the first model to work properly with IPagedList, and forgot that IPagedList requires an order in order to page properly. This is coming once I get column sorting and paging implemented on the page side and the correct code in the controller, but once I stuck in a temporary .OrderBy() in the second viewModel, everything suddenly stood up properly.
A big shout-out to Ivan Stoev, your reply was a massive kick in the right direction!! Thank you!!
When working with complex queries, I would suggest using LINQ query syntax for most of the query, because it's much easier to follow and modify thanks to the transparent identifiers. It also maps more naturally to the SQL query.
For instance, here is the LINQ equivalent of your SQL query:
var query =
from co in db.Company
from cd in (
from cy in db.Cycle
where cy.CompanyId == co.CompanyId
group cy by new { cy.PDFResourceLibrary, cy.PresentationDone, cy.MOUDone } into g
select new
{
CycleDate = g.Max(cy => cy.CycleDate),
g.Key.PDFResourceLibrary,
g.Key.PresentationDone,
g.Key.MOUDone
}
)
.OrderByDescending(cy => cy.CycleDate).Take(1) // TOP 1
.DefaultIfEmpty() // OUTER
orderby co.ProspectingScore descending
select new
{
co.CompanyId,
co.CompanyName,
co.CompanyCity,
co.NumberOfEmployees,
co.ProspectingScore,
cd.PDFResourceLibrary,
cd.PresentationDone,
cd.MOUDone
};
EF generated SQL query from the above:
SELECT
[Extent1].[CompanyId] AS [CompanyId],
[Extent1].[CompanyName] AS [CompanyName],
[Extent1].[CompanyCity] AS [CompanyCity],
[Extent1].[NumberOfEmployees] AS [NumberOfEmployees],
[Extent1].[ProspectingScore] AS [ProspectingScore],
[Limit1].[PDFResourceLibrary] AS [PDFResourceLibrary],
[Limit1].[PresentationDone] AS [PresentationDone],
[Limit1].[MOUDone] AS [MOUDone]
FROM [dbo].[Company] AS [Extent1]
OUTER APPLY (SELECT TOP (1) [Project1].[PDFResourceLibrary] AS [PDFResourceLibrary], [Project1].[PresentationDone] AS [PresentationDone], [Project1].[MOUDone] AS [MOUDone]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [PDFResourceLibrary],
[GroupBy1].[K2] AS [PresentationDone],
[GroupBy1].[K3] AS [MOUDone]
FROM ( SELECT
[Extent2].[PDFResourceLibrary] AS [K1],
[Extent2].[PresentationDone] AS [K2],
[Extent2].[MOUDone] AS [K3],
MAX([Extent2].[CycleDate]) AS [A1]
FROM [dbo].[Cycle] AS [Extent2]
WHERE [Extent2].[CompanyId] = [Extent1].[CompanyId]
GROUP BY [Extent2].[PDFResourceLibrary], [Extent2].[PresentationDone], [Extent2].[MOUDone]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC ) AS [Limit1]
ORDER BY [Extent1].[ProspectingScore] DESC
This should cover the question of how to convert the original SQL query.
But do you really need to follow the original SQL query? According to this requirement:
many cycles can exist for a company, but I need to grab only the latest cycle by date
it looks more natural to use something like this:
var query =
from co in db.Company
join cy in db.Cycle on co.CompanyId equals cy.CycleId into cycles
from cd in cycles.OrderByDescending(cy => cy.CycleDate).Take(1).DefaultIfEmpty()
orderby co.ProspectingScore descending
select new
{
co.CompanyId,
co.CompanyName,
co.CompanyCity,
co.NumberOfEmployees,
co.ProspectingScore,
cd.PDFResourceLibrary,
cd.PresentationDone,
cd.MOUDone
};
which generates:
SELECT
[Extent1].[CompanyId] AS [CompanyId],
[Extent1].[CompanyName] AS [CompanyName],
[Extent1].[CompanyCity] AS [CompanyCity],
[Extent1].[NumberOfEmployees] AS [NumberOfEmployees],
[Extent1].[ProspectingScore] AS [ProspectingScore],
[Limit1].[PDFResourceLibrary] AS [PDFResourceLibrary],
[Limit1].[PresentationDone] AS [PresentationDone],
[Limit1].[MOUDone] AS [MOUDone]
FROM [dbo].[Company] AS [Extent1]
OUTER APPLY (SELECT TOP (1) [Project1].[PDFResourceLibrary] AS [PDFResourceLibrary], [Project1].[PresentationDone] AS [PresentationDone], [Project1].[MOUDone] AS [MOUDone]
FROM ( SELECT
[Extent2].[CycleDate] AS [CycleDate],
[Extent2].[PDFResourceLibrary] AS [PDFResourceLibrary],
[Extent2].[PresentationDone] AS [PresentationDone],
[Extent2].[MOUDone] AS [MOUDone]
FROM [dbo].[Cycle] AS [Extent2]
WHERE [Extent1].[CompanyId] = [Extent2].[CycleId]
) AS [Project1]
ORDER BY [Project1].[CycleDate] DESC ) AS [Limit1]
ORDER BY [Extent1].[ProspectingScore] DESC
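Since the question asked for lambda (method) syntax, here is a rough method-syntax counterpart of the "latest cycle per company" idea. It is a sketch that runs on in-memory lists (LINQ to Objects), not a tested LINQ to Entities query. Assumptions: the classes are trimmed-down stand-ins with int keys instead of Guid, Row is a hypothetical result type, and the join is keyed on cy.CompanyId, as in the WHERE clause of the question's SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the entities in the question (int keys for brevity).
public class Company { public int CompanyId { get; set; } public string CompanyName { get; set; } public int ProspectingScore { get; set; } }
public class Cycle { public int CompanyId { get; set; } public DateTime CycleDate { get; set; } public bool PresentationDone { get; set; } }
// Hypothetical flat result type, playing the role of DashboardUserData.
public class Row { public string CompanyName { get; set; } public bool? PresentationDone { get; set; } }

public static class LatestCycleSketch
{
    public static List<Row> Query(IEnumerable<Company> companies, IEnumerable<Cycle> cycles)
    {
        return companies
            // GroupJoin keeps companies without cycles (left outer join semantics).
            .GroupJoin(cycles, co => co.CompanyId, cy => cy.CompanyId,
                (co, cs) => new { co, latest = cs.OrderByDescending(cy => cy.CycleDate).FirstOrDefault() })
            .OrderByDescending(x => x.co.ProspectingScore)
            .Select(x => new Row
            {
                CompanyName = x.co.CompanyName,
                // latest is null when the company has no cycles at all.
                PresentationDone = x.latest == null ? (bool?)null : x.latest.PresentationDone
            })
            .ToList();
    }

    public static void Main()
    {
        var companies = new[]
        {
            new Company { CompanyId = 1, CompanyName = "A", ProspectingScore = 10 },
            new Company { CompanyId = 2, CompanyName = "B", ProspectingScore = 20 }
        };
        var cycles = new[]
        {
            new Cycle { CompanyId = 1, CycleDate = new DateTime(2020, 1, 1), PresentationDone = false },
            new Cycle { CompanyId = 1, CycleDate = new DateTime(2021, 1, 1), PresentationDone = true }
        };
        foreach (var row in Query(companies, cycles))
            Console.WriteLine(row.CompanyName + ": " + row.PresentationDone);
    }
}
```

Each lambda uses its own parameter name (co, cy, cs, x), which sidesteps the name clash reported for the first attempt in the question.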

How do I use 2 include statements in a single MVC EF query?

I am trying to write a query that includes 2 joins.
1 StoryTemplate can have multiple Stories
1 Story can have multiple StoryDrafts
I am starting the query on the StoryDrafts object because that is where it's linked to the UserId.
I don't have a reference from the StoryDrafts object directly to the StoryTemplates object. How would I build this query properly?
public JsonResult Index(int userId)
{
return Json(
db.StoryDrafts
.Include("Story")
.Include("StoryTemplate")
.Where(d => d.UserId == userId)
,JsonRequestBehavior.AllowGet);
}
Thank you for any help.
Try to flatten your hierarchy if it works for you. Here is a sample, and you may want to customize it for your needs.
var result = from c in db.Customers
join o in db.Orders
on c equals o.Customers
select new
{
custid = c.CustomerID,
cname = c.CompanyName,
address = c.Address,
orderid = o.OrderID,
freight = o.Freight,
orderdate = o.OrderDate
};
If flattening does not meet your requirements, then you need to use a query that returns a nested group. Finally, look at the following link for more references - LINQ Query Expressions.

How to join multiple tables using LINQ-to-SQL?

I'm quite new to linq, so please bear with me.
I'm working on a asp.net webpage and I want to add a "search function" (textbox where user inputs name or surname or both or just parts of it and gets back all related information). I have two tables ("Person" and "Application") and I want to display some columns from Person (name and surname) and some from Application (score, position,...). I know how I could do it using sql, but I want to learn more about linq and thus I want to do it using linq.
For now I got two main ideas:
1.)
var person = dataContext.GetTable<Person>();
var application = dataContext.GetTable<Application>();
var p1 = from p in Person
where(p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1]))
select new {Id = p.Id, Name = p.Name, Surname = p.Surname}; //or maybe without this line
//I don't know how to do the following properly
var result = from a in Application
where a.FK_Application.Equals(index) //just to get the "right" type of application
//this is not right, but I don't know how to do it better
join p1
on p1.Id == a.FK_Person
2.) The other idea is just to go through "Application" and instead of "join p1 ..." to use
var result = from a in Application
where a.FK_Application.Equals(index) //just to get the "right" type of application
join p from Person
on p.Id == a.FK_Person
where p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1])
I think that first idea is better for queries without the first "where" condition, which I also intended to use. Regardless of what is better (faster), I still don't know how to do it using linq. Also in the end I wanted to display / select just some parts (columns) of the result (joined tables + filtering conditions).
I really want to know how to do such things using linq as I'll be dealing also with some similar problems with local data, where I can use only linq.
Could somebody please explain to me how to do it? I spent days trying to figure it out and searching the Internet for answers.
var result = from a in dataContext.Applications
join p in dataContext.Persons
on a.FK_Person equals p.Id // the outer range variable (a) must be on the left of equals
where (p.Name.Contains("blah") || p.Surname.Contains("foo")) && a.FK_Application == index
select new { Id = p.Id, Name = p.Name, Surname = p.Surname, a.Score, a.Position };
Well as Odrahn pointed out, this will give you flat results, with possibly many rows for a single person, since a person could join on multiple applications that all have the same FK. Here's a way to search all the right people, and then add on the relevant application to the results:
var p1 = from p in dataContext.Persons
where (p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1]))
select new {
Id = p.Id, Name = p.Name, Surname = p.Surname,
BestApplication = dataContext.Applications.FirstOrDefault(a => a.FK_Application == index /* && ???? */)
};
Sorry - it looks like this second query will result in a roundtrip per person, so it clearly won't be scalable. I assumed L2S would handle it better.
In order to answer this properly, I need to know if Application and Person are directly related (i.e. does Person have many Applications)? From reading your post, I'm assuming that they are because Application seems to have a foreign key to person.
If so, then you could create a custom PersonModel which will be populated by the fields you need from the different entities like this:
class PersonModel
{
public string Name { get; set; }
public string Surname { get; set; }
public List<int> Scores { get; set; }
public List<int> Positions { get; set; }
}
Then to populate it, you'd do the following:
// Select the first person matching the Name and Surname inputs
// (note, there may be many matches - do you need to account for this?)
var person = dataContext.Persons.Where(p => p.Name.Contains("firstname") || p.Surname.Contains("surname")).FirstOrDefault();
if (person != null)
{
var scores = new List<int>();
var positions = new List<int>();
scores.AddRange(person.Applications.Select(i => i.Score));
positions.AddRange(person.Applications.Select(i => i.Position));
var personModel = new PersonModel
{
Name = person.Name,
Surname = person.Surname,
Scores = scores,
Positions = positions
};
}
Because of your relationship between Person and Application, where a person can have many applications, I've had to account for the possibility of there being many scores and positions (hence the List).
Also note that I've used lambda expressions instead of plain linqToSql for simple selecting so that you can visualise easily what's going on.

How to do multiple Group By's in linq to sql?

how can you do multiple "group by's" in linq to sql?
Can you please show me in both linq query syntax and linq method syntax.
Thanks
Edit.
I am talking about multiple parameters say grouping by "sex" and "age".
Also I forgot to mention how would I say add up all the ages before I group them.
If i had this example how would I do this
Table Product
ProductId
ProductName
ProductQty
ProductPrice
Now imagine that, for whatever reason, I had tons of rows, each with the same ProductName but different ProductQty and ProductPrice.
How would I group them by ProductName and add together ProductQty and ProductPrice?
I know that in this example it probably makes no sense to have row after row with the same product name, but in my database it makes sense (it is not products).
To group by multiple properties, you need to create a new object to group by:
var groupedResult = from person in db.People
group person by new { person.Sex, person.Age } into personGroup
select new
{
personGroup.Key.Sex,
personGroup.Key.Age,
NumberInGroup = personGroup.Count()
};
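The question asks for both query syntax and method syntax, so here is a small side-by-side sketch. It runs on an in-memory list rather than a LINQ to SQL table, and PersonRow is a hypothetical stand-in type; the grouping expressions themselves have the same shape as in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the People table.
public class PersonRow { public string Sex { get; set; } public int Age { get; set; } }

public static class MultiKeyGroupingSketch
{
    // Query syntax: group by an anonymous type that holds both keys.
    public static List<string> QuerySyntax(IEnumerable<PersonRow> people)
    {
        var query = from person in people
                    group person by new { person.Sex, person.Age } into g
                    orderby g.Key.Sex, g.Key.Age
                    select g.Key.Sex + "/" + g.Key.Age + ": " + g.Count();
        return query.ToList();
    }

    // Method syntax: the same grouping expressed with GroupBy.
    public static List<string> MethodSyntax(IEnumerable<PersonRow> people)
    {
        return people
            .GroupBy(p => new { p.Sex, p.Age })
            .OrderBy(g => g.Key.Sex).ThenBy(g => g.Key.Age)
            .Select(g => g.Key.Sex + "/" + g.Key.Age + ": " + g.Count())
            .ToList();
    }

    public static void Main()
    {
        var people = new[]
        {
            new PersonRow { Sex = "F", Age = 30 },
            new PersonRow { Sex = "F", Age = 30 },
            new PersonRow { Sex = "M", Age = 25 },
            new PersonRow { Sex = "F", Age = 41 }
        };
        foreach (var line in QuerySyntax(people)) Console.WriteLine(line);
        foreach (var line in MethodSyntax(people)) Console.WriteLine(line);
    }
}
```

Anonymous types compare by the values of their properties, which is what makes new { Sex, Age } usable as a composite grouping key.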
Apologies, I didn't see your final edit. I may be misunderstanding, but if you sum the age, you can't group by it. You could group by sex, sum or average the age...but you couldn't group by sex and summed age at the same time in a single statement. It might be possible to use a nested LINQ query to get the summed or average age for any given sex...bit more complex though.
EDIT:
To solve your specific problem, it should be pretty simple and straightforward. You are grouping only by name, so the rest is elementary (example updated with service and concrete dto type):
class ProductInventoryInfo
{
public string Name { get; set; }
public decimal Total { get; set; }
}
class ProductService: IProductService
{
public IList<ProductInventoryInfo> GetProductInventory()
{
// ...
var groupedResult = from product in db.Products
group product by product.ProductName into productGroup
select new ProductInventoryInfo
{
Name = productGroup.Key,
// ProductPrice is the price column listed in the question's table
Total = productGroup.Sum(p => p.ProductPrice * p.ProductQty)
};
return groupedResult.ToList();
}
}
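For the method-syntax half of the question, and for summing ProductQty and ProductPrice separately as the question describes, a sketch along these lines may help. It works on an in-memory list; ProductRow and the choice of a dictionary result are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Product table from the question.
public class ProductRow
{
    public string ProductName { get; set; }
    public int ProductQty { get; set; }
    public decimal ProductPrice { get; set; }
}

public static class ProductTotalsSketch
{
    // Groups by name and returns (total quantity, total price) per product name.
    public static Dictionary<string, Tuple<int, decimal>> TotalsByName(IEnumerable<ProductRow> products)
    {
        return products
            .GroupBy(p => p.ProductName)
            .ToDictionary(
                g => g.Key,
                g => Tuple.Create(g.Sum(p => p.ProductQty), g.Sum(p => p.ProductPrice)));
    }

    public static void Main()
    {
        var products = new[]
        {
            new ProductRow { ProductName = "Widget", ProductQty = 2, ProductPrice = 1.50m },
            new ProductRow { ProductName = "Widget", ProductQty = 3, ProductPrice = 2.00m },
            new ProductRow { ProductName = "Gadget", ProductQty = 1, ProductPrice = 10m }
        };
        foreach (var entry in TotalsByName(products))
            Console.WriteLine(entry.Key + ": qty " + entry.Value.Item1 + ", price " + entry.Value.Item2);
    }
}
```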
