How to join multiple tables using LINQ-to-SQL?

I'm quite new to linq, so please bear with me.
I'm working on an ASP.NET web page and I want to add a "search function" (a textbox where the user enters a name, a surname, both, or just parts of them, and gets back all related information). I have two tables ("Person" and "Application") and I want to display some columns from Person (name and surname) and some from Application (score, position, ...). I know how I could do it using SQL, but I want to learn more about LINQ, so I want to do it using LINQ.
For now I got two main ideas:
1.)
var person = dataContext.GetTable<Person>();
var application = dataContext.GetTable<Application>();
var p1 = from p in Person
where(p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1]))
select new {Id = p.Id, Name = p.Name, Surname = p.Surname}; //or maybe without this line
//I don't know how to do the following properly
var result = from a in Application
where a.FK_Application.Equals(index) //just to get the "right" type of application
//this is not right, but I don't know how to do it better
join p1
on p1.Id == a.FK_Person
2.) The other idea is just to go through "Application" and instead of "join p1 ..." to use
var result = from a in Application
where a.FK_Application.Equals(index) //just to get the "right" type of application
join p from Person
on p.Id == a.FK_Person
where p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1])
I think the first idea is better for queries without the first "where" condition, which I also intend to use. Regardless of which is better (faster), I still don't know how to do it using LINQ. In the end I also want to display / select just some parts (columns) of the result (joined tables + filtering conditions).
I really want to know how to do such things using LINQ, as I'll also be dealing with similar problems on local data, where I can only use LINQ.
Could somebody please explain to me how to do it? I have spent days trying to figure it out and searching the Internet for answers.

var result = from a in dataContext.Applications
             join p in dataContext.Persons
             // the outer range variable (a) goes on the left of "equals", the joined one (p) on the right
             on a.FK_Person equals p.Id
             where (p.Name.Contains("blah") || p.Surname.Contains("foo")) && a.FK_Application == index
             select new { Id = p.Id, Name = p.Name, Surname = p.Surname, a.Score, a.Position };
Well as Odrahn pointed out, this will give you flat results, with possibly many rows for a single person, since a person could join on multiple applications that all have the same FK. Here's a way to search all the right people, and then add on the relevant application to the results:
var p1 = from p in dataContext.Persons
where(p.Name.Contains(tokens[0]) || p.Surname.Contains(tokens[1]))
select new {
Id = p.Id, Name = p.Name, Surname = p.Surname,
BestApplication = dataContext.Applications.FirstOrDefault(a => a.FK_Application == index /* && ???? */)
};
Sorry - it looks like this second query will result in a roundtrip per person, so it clearly won't be scalable. I assumed L2S would handle it better.
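Since the question also mentions local data, here is a minimal, self-contained LINQ-to-Objects sketch of the same join, filter and projection shape as the first query. The class shapes, the sample rows and the `term` and `index` values are assumptions made up for illustration; only the property names come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two tables.
class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
}

class Application
{
    public int FK_Person { get; set; }
    public int FK_Application { get; set; }
    public int Score { get; set; }
    public string Position { get; set; }
}

static class JoinSketch
{
    static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Id = 1, Name = "Ann", Surname = "Smith" },
            new Person { Id = 2, Name = "Bob", Surname = "Jones" },
        };
        var applications = new List<Application>
        {
            new Application { FK_Person = 1, FK_Application = 7, Score = 90, Position = "Analyst" },
            new Application { FK_Person = 1, FK_Application = 8, Score = 75, Position = "Tester" },
            new Application { FK_Person = 2, FK_Application = 7, Score = 60, Position = "Analyst" },
        };

        string term = "Smi"; // assumed search text
        int index = 7;       // assumed application type

        // Join each application to its person, then filter and project.
        var result =
            from a in applications
            join p in persons on a.FK_Person equals p.Id
            where a.FK_Application == index
                  && (p.Name.Contains(term) || p.Surname.Contains(term))
            select new { p.Name, p.Surname, a.Score, a.Position };

        foreach (var row in result)
        {
            Console.WriteLine($"{row.Name} {row.Surname}: {row.Score}, {row.Position}");
        }
    }
}
```

The query text is the same whether the sources are in-memory lists or LINQ-to-SQL tables; what differs is where it runs (in memory versus translated to SQL).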

In order to answer this properly, I need to know if Application and Person are directly related (i.e. does Person have many Applications)? From reading your post, I'm assuming that they are because Application seems to have a foreign key to person.
If so, then you could create a custom PersonModel which will be populated by the fields you need from the different entities like this:
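class PersonModel
{
    // The properties must be public so they can be set from an object initializer
    public string Name { get; set; }
    public string Surname { get; set; }
    public List<int> Scores { get; set; }
    public List<int> Positions { get; set; }
}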
Then to populate it, you'd do the following:
// Select the first matching person based on the name and surname inputs
// (note: there may be many matches - do you need to account for this?)
var person = dataContext.Persons.Where(p => p.Name.Contains("firstname") || p.Surname.Contains("surname")).FirstOrDefault();
if (person != null)
{
var scores = new List<int>();
var positions = new List<int>();
scores.AddRange(person.Applications.Select(i => i.Score));
positions.AddRange(person.Applications.Select(i => i.Position));
var personModel = new PersonModel
{
Name = person.Name,
Surname = person.Surname,
Scores = scores,
Positions = positions
};
}
Because of your relationship between Person and Application, where a person can have many applications, I've had to account for the possibility of there being many scores and positions (hence the List).
Also note that I've used lambda expressions (method syntax) instead of LINQ query syntax for the simple selects, so that you can easily see what's going on.

Related

Trying to use Entity Framework method for multiple adds to database

I have an ASP.NET MVC app using Entity Framework from our SQL Server backend.
Goal is to create ~18 WPackage entries via a foreach loop:
foreach (var dbitem in dbCList)
The code works for a single WPackage entry, but we have a request from the customer to create 300+ WPackages, so trying to use the Entity Framework code for a single "Add" and loop to create 300+ adds.
The T-SQL would be very challenging as there are many keys created on the fly/at row creation, so for activities >> resources, we'd have to insert the activity, grab or remember the activity key, then add resources with that newly created activity key.
Each WPackage (this is the main parent table) could have one or more of the following child table entries:
1+ activities
each activity would have 1+ resource
1+ budgets
1+ Signatures
1+ CostCodes
Our schema or model diagram would be:
WPackage
--Activities
-----Resources (child of Activities)
--CostCodes
--Budgets
--Signatures
The following code fails on:
dbContextTransaction.Commit();
with an error:
The transaction operation cannot be performed because there are pending requests working on this transaction.
[HttpPost]
public ActionResult Copy([Bind(Include = "ID,WBSID,...***fields excluded for brevity")] Package model)
{
if (ModelState.IsValid)
{
try
{
using (var dbContextTransaction = db.Database.BeginTransaction())
{
var dbCList = db.Packages.Join(db.WBS,
*expression omitted for brevity*)
// this dbClist will build about 18 items in the collection for below loop
foreach (var dbitem in dbCList)
{
int testWPID = dbitem;
WPackage prvWP = db.WPackages.Find(dbitem);
int previousWPID = dbitem;
WPackage previousWP = db.WPackages.Find(dbitem);
model.ID = dbitem;
db.WPackages.Add(model);
db.SaveChanges();
var budgets = db.Budgets.Where(i => i.WPID == previousWPID);
foreach (Budget budget in budgets)
{
budget.WPID = model.ID;
db.Budgets.Add(budget);
}
var costCodes = db.CostCodes.Where(i => i.WPID == previousWPID);
foreach (CostCode costCode in costCodes)
{
costCode.WPID = model.ID;
db.CostCodes.Add(costCode);
}
var activities = db.Activities.Where(i => i.WPID == previousWPID);
// *code excluded for brevity*
var previousActivityID = activity.ID;
db.Activities.Add(activity);
db.SaveChanges();
var resources = db.Resources.Where(i => i.ActivityID == previousActivityID);
foreach (Resource resource in resources)
{
resource.WPID = model.ID;
resource.ActivityID = activity.ID;
resource.ActivityNumber = activity.ActivityNumber;
db.Resources.Add(resource);
db.SaveChanges();
}
}
var signatures = db.RolesAndSigs
.Where(i => i.KeyId == previousWPID && i.Type == "WPL")
.OrderBy(i => i.Role)
.OrderBy(i => i.Person);
foreach (RolesAndSig signature in signatures)
{
db.RolesAndSigss.Add(signature);
}
db.SaveChanges();
dbContextTransaction.Commit();
}
}
}
}
I've also tried to have the Commit() run outside the foreach dbitem loop like:
db.SaveChanges();
//dbContextTransaction.Commit();
}
dbContextTransaction.Commit();
...but this returns error of:
[EXCEPTION] The property 'ID' is part of the object's key information and cannot be modified.
The code you posted has some issues that don't make sense, and probably aren't doing what you think they are doing. The crux of the issue you are facing is that Entity Framework tracks all references to entities it loads and associates:
Firstly this code:
int testWPID = dbitem;
WPackage prvWP = db.WPackages.Find(dbitem);
int previousWPID = dbitem;
WPackage previousWP = db.WPackages.Find(dbitem);
prvWP and previousWP will be pointing to the exact same reference, not two copies of the same entity. Be careful when updating either of them, or any other reference retrieved or associated with that same ID. They all point to the same instance. If you do want a stand-alone snapshot, you can use AsNoTracking().
Next, when you do something like this in a loop:
model.ID = dbitem;
db.WPackages.Add(model);
In the first iteration, "model" is not an entity. It is a deserialized block of data with the Type of the Package entity. As soon as you call .Add(model) that reference will now be pointing to a newly tracked entity reference. In the next loop you are telling EF to change that tracked entity reference's ID to a new value, and that is illegal.
What it looks like you want to do is create a copy of this model for each of the 18 expected iterations. For that what you want to do would be something more like:
foreach (var dbitem in dbCList)
{
var newModel = new WPackage
{
ID = dbitem,
WBSID = model.WBSID,
// copy across all relevant fields from the passed-in model.
};
db.WPackages.Add(newModel);
// ...
}
It would be quite worthwhile to leverage navigation properties for the related entities rather than using explicit joins and trying to scope everything in an explicit transaction with multiple SaveChanges() calls. EF can manage all of the FKs automatically rather than essentially using it as a wrapper for individual ADO CRUD operations.
You will need to be explicit about when you want to "clone" an object vs. "copy" a reference. For example, if I have a Customer that has an Address, and Addresses have a Country reference, then when I clone a Customer I will want to clone a new Address record for that Customer, but ensure that the Country reference is copied across. If I have a record for Jack at 123 Apple Street, London in England, and go to clone Jack to make a record for Jill at the same address, they might be at the same location now, but not always, so I want them to point at different Address records in case Jill moves out. Still, there should only be one record for "England". (Jill may move to a different country, but her address record would then just point at a different Country Id.)
Wrong:
var jill = context.Customers.Single(c => c.Name == "Jack");
jill.Name = "Jill";
context.Customers.Add(jill);
This would attempt to rename Jack into Jill, then "Add" the already tracked instance, resulting in an exception.
Will work, but still Wrong:
var jack = context.Customers.AsNoTracking().Single(c => c.Name == "Jack");
var jill = jack;
jill.Name = "Jill";
context.Customers.Add(jill);
This would technically work by loading Jack as an untracked entity, and would save Jill as a new record with a new Id. However this is potentially very confusing. Depending on how the AddressId/Address is referenced we could end up with Jack and Jill referencing the same single Address record. Bad if you want Jack and Jill to have different addresses.
Right:
var jack = context.Customers
.Include(c => c.Address)
.ThenInclude(a => a.Country)
.Single(c => c.Name == "Jack");
var jill = new Customer
{
Name = "Jill",
// copy other fields...
Address = new Address
{
StreetNumber = jack.Address.StreetNumber,
StreetName = jack.Address.StreetName,
Country = jack.Address.Country
}
};
context.Customers.Add(jill);
The first detail is to ensure when we load Jack that we eager load all of the related details we will want to clone or copy references to. We then create a new instance for Jill, copying the values from Jack, including setting up a new Address record. The Country reference is copied across as there should only be ever a single record for "England".
Edit: For something like a roll-over scenario if you have a package by year, let's use the example of a Package class below:
public class Package
{
[Key]
public int PackageId { get; set; }
[ForeignKey("PackageType")]
public int PackageTypeId { get; set; }
public int Year { get; set; }
// .. More package related details and relationships...
public virtual PackageType PackageType { get; set; }
}
A goal might be to make a new Package and related data for Year 2022 from the data from 2021, and apply any changes from a view model passed in.
Find is a poor choice for this because Find locates data by PK. If your method simply passes an entity to be copied from (i.e. the data from 2021) then this can work; however, if you have modified that 2021 data to represent the values you want for 2022, that could be dangerous or misleading within the code. (We don't want to update 2021's data, we want to create a new record set for 2022.) To make a new Package for 2022 we just need the updated data to make up that new item, and a way to identify a source to use as a template. That identification could be the PK of the row to copy from (PackageId), or be derived from the data passed in (PackageTypeId and Year - 1). In both cases, if we want to consider data related to the "copy from" package, it would be prudent to eager load that related data in one query rather than going back to the database repeatedly. Find cannot accommodate that.
For instance, if I want to pass data to make a new package, I pass a PackageTypeId and a Year along with any values to use for the new structure. I can attempt to get a copy of the existing year to use as a template via:
var existingPackage = context.Packages
    .Include(x => x.Activities) // Eager load related data.
    .Include(x => x.CostCodes)
    // ...
    .Single(x => x.PackageTypeId == packageTypeId && x.Year == year - 1);
or, if I passed a PackageId (such as if I could choose to copy the data from a selected year like 2020 instead):
var existingPackage = context.Packages
    .Include(x => x.Activities)
    .Include(x => x.CostCodes)
    // ...
    .Single(x => x.PackageId == copyFromPackageId);
Both of these examples expect to find one, and only one, existing package. If the request comes in with values for which no row can be found, there will be an exception, which should be handled. This fetches all of the existing package information that we can copy from, alongside any data that was passed into the method to create a new Package.

LINQ to Entities query with join inside method for use in MVC app

In my Person table there is a RequestedLocation column which stores location IDs. The IDs match the LocationId column in the Locations table. The Locations table also has the text location names, in the LocationName column.
I need to display the LocationName string in the view, which has the Person model passed to it. The view displays a list of people in a Telerik grid. Currently it works great, except that the RequestedLocation column is all integers.
I am populating all my grids with methods containing LINQ queries. Here is the method that currently works:
public List<Person> GetPeople()
{
var query = from p in _DB.Person.ToList()
select p;
return query.ToList();
}
Here is the regular SQL query that works, and I need to convert into LINQ:
SELECT ApplicantID
,FirstName
,LastName
,MiddleName
,DateofBirth
,Gender
,RequestedVolunteerRole
,RequestedVolunteerLocation
,l.LocationName
FROM Form.Person p
JOIN dbo.Location l ON p.RequestedVolunteerLocation = l.LocationID
Order BY ApplicantID
Here is my attempt to convert to LINQ:
public List<NewApplicantViewModel> GetPeople()
{
var query = from pl in _DB.Person.ToList()
join l in _Elig_DB.Locations.ToList() on pl.RequestedVolunteerLocation equals l.LocationID
select new
{
pl.RequestedVolunteerLocation = l.LocationName
};
return query.ToList();
The number of errors I get from this are numerous, but most are along the lines of:
Cannot convert from type Anonymous to type List<NewApplicantViewModel>
and
Invalid anonymous type declarator.
Please help, and thank you for reading my post.
Oh, and I have only been programming for a couple months, so if I am going about this all wrong, please let me know. Only thing I have to stick with is the table structure because it is an existing app that I am updating, and changing the location or person tables would have large consequences.
public List<NewApplicantViewModel> GetPeople()
{
var query = from pl in _DB.Person
join l in _Elig_DB.Locations on pl.RequestedVolunteerLocation
equals l.LocationID
select new NewApplicantViewModel
{
LocationName = l.LocationName,
OtherProperty = pl.Property // placeholder: map the remaining fields from pl (the range variable is pl, not p)
};
return query.ToList();
}
Beware of calling _DB.Person.ToList(): it loads all persons from the DB, because ToList() executes the query immediately, and the join is then performed in memory (not in the DB).
The reason you are getting an error is you are projecting an anonymous type
select new
{
pl.RequestedVolunteerLocation = l.LocationName
};
Instead, you need to project a NewApplicantViewModel
select new NewApplicantViewModel
{
RequestedVolunteerLocation = l.LocationName
};

How do I use 2 include statements in a single MVC EF query?

I am trying to write a query that includes 2 joins.
1 StoryTemplate can have multiple Stories
1 Story can have multiple StoryDrafts
I am starting the query on the StoryDrafts object because that is where it's linked to the UserId.
I don't have a reference from the StoryDrafts object directly to the StoryTemplates object. How would I build this query properly?
public JsonResult Index(int userId)
{
return Json(
db.StoryDrafts
.Include("Story")
.Include("StoryTemplate")
.Where(d => d.UserId == userId)
,JsonRequestBehavior.AllowGet);
}
Thank you for any help.
Try to flatten your hierarchy if it works for you. Here is a sample, and you may want to customize it for your needs.
var result = from c in db.Customers
join o in db.Orders
on c equals o.Customers
select new
{
custid = c.CustomerID,
cname = c.CompanyName,
address = c.Address,
orderid = o.OrderID,
freight = o.Freight,
orderdate = o.OrderDate
};
If flattening does not meet your requirements, then you need to use a query that returns a nested group. Finally, see the following link for more references: LINQ Query Expressions.
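To make the nested group idea concrete, here is a small LINQ-to-Objects sketch using a group join (`join ... into`), so that each customer keeps its own list of orders instead of being repeated once per order. The `Customer` and `Order` shapes, the `CustomerID` foreign key and the sample data are assumptions for illustration, not the asker's model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the CustomerID foreign key on Order is assumed.
class Customer
{
    public int CustomerID { get; set; }
    public string CompanyName { get; set; }
}

class Order
{
    public int OrderID { get; set; }
    public int CustomerID { get; set; }
    public decimal Freight { get; set; }
}

static class NestedGroupSketch
{
    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { CustomerID = 1, CompanyName = "Alpha" },
            new Customer { CustomerID = 2, CompanyName = "Beta" },
        };
        var orders = new List<Order>
        {
            new Order { OrderID = 10, CustomerID = 1, Freight = 12.5m },
            new Order { OrderID = 11, CustomerID = 1, Freight = 3.0m },
            new Order { OrderID = 12, CustomerID = 2, Freight = 7.25m },
        };

        // "into" turns the join into a group join: one result per customer,
        // carrying the sequence of that customer's matching orders.
        var nested =
            from c in customers
            join o in orders on c.CustomerID equals o.CustomerID into customerOrders
            select new { c.CompanyName, Orders = customerOrders.ToList() };

        foreach (var entry in nested)
        {
            Console.WriteLine($"{entry.CompanyName}: {entry.Orders.Count} order(s)");
            foreach (var o in entry.Orders)
            {
                Console.WriteLine($"  order {o.OrderID}, freight {o.Freight}");
            }
        }
    }
}
```

Unlike the flat join, a group join also keeps customers that have no orders, with an empty `Orders` list.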

Entity Framework Include OrderBy random generates duplicate data

When I retrieve a list of items from a database including some children (via .Include), and order them randomly, EF gives me an unexpected result: it creates/clones additional items.
To explain myself better, I've created a small and simple EF CodeFirst project to reproduce the problem.
First I shall give you the code for this project.
The project
Create a basic MVC3 project and add the EntityFramework.SqlServerCompact package via Nuget.
That adds the latest versions of the following packages:
EntityFramework v4.3.0
SqlServerCompact v4.0.8482.1
EntityFramework.SqlServerCompact v4.1.8482.2
WebActivator v1.5
The Models and DbContext
using System.Collections.Generic;
using System.Data.Entity;
namespace RandomWithInclude.Models
{
public class PeopleContext : DbContext
{
public DbSet<Person> Persons { get; set; }
public DbSet<Address> Addresses { get; set; }
}
public class Person
{
public int ID { get; set; }
public string Name { get; set; }
public virtual ICollection<Address> Addresses { get; set; }
}
public class Address
{
public int ID { get; set; }
public string AdressLine { get; set; }
public virtual Person Person { get; set; }
}
}
The DB Setup and Seed data: EF.SqlServerCompact.cs
using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using RandomWithInclude.Models;
[assembly: WebActivator.PreApplicationStartMethod(typeof(RandomWithInclude.App_Start.EF), "Start")]
namespace RandomWithInclude.App_Start
{
public static class EF
{
public static void Start()
{
Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
Database.SetInitializer(new DbInitializer());
}
}
public class DbInitializer : DropCreateDatabaseAlways<PeopleContext>
{
protected override void Seed(PeopleContext context)
{
var address1 = new Address {AdressLine = "Street 1, City 1"};
var address2 = new Address {AdressLine = "Street 2, City 2"};
var address3 = new Address {AdressLine = "Street 3, City 3"};
var address4 = new Address {AdressLine = "Street 4, City 4"};
var address5 = new Address {AdressLine = "Street 5, City 5"};
context.Addresses.Add(address1);
context.Addresses.Add(address2);
context.Addresses.Add(address3);
context.Addresses.Add(address4);
context.Addresses.Add(address5);
var person1 = new Person {Name = "Person 1", Addresses = new List<Address> {address1, address2}};
var person2 = new Person {Name = "Person 2", Addresses = new List<Address> {address3}};
var person3 = new Person {Name = "Person 3", Addresses = new List<Address> {address4, address5}};
context.Persons.Add(person1);
context.Persons.Add(person2);
context.Persons.Add(person3);
}
}
}
The controller: HomeController.cs
using System;
using System.Data.Entity;
using System.Linq;
using System.Web.Mvc;
using RandomWithInclude.Models;
namespace RandomWithInclude.Controllers
{
public class HomeController : Controller
{
public ActionResult Index()
{
var db = new PeopleContext();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => Guid.NewGuid());
return View(persons.ToList());
}
}
}
The View: Index.cshtml
@using RandomWithInclude.Models
@model IList<Person>
<ul>
    @foreach (var person in Model)
    {
        <li>
            @person.Name
        </li>
    }
</ul>
That should be all, and your application should compile :)
The problem
As you can see, we have 2 straightforward models (Person and Address), and a Person can have multiple Addresses.
We seed the generated database with 3 persons and 5 addresses.
If we get all the persons from the database, including the addresses, randomize the results and just print out the names of those persons, that's where it all goes wrong.
As a result, I sometimes get 4 persons, sometimes 5 and sometimes 3, and I expect 3. Always.
e.g.:
Person 1
Person 3
Person 1
Person 3
Person 2
So... it's copying/cloning data! And that's not cool.
It just seems that EF loses track of which addresses are children of which person.
The generated SQL query is this:
SELECT
[Project1].[ID] AS [ID],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[ID1] AS [ID1],
[Project1].[AdressLine] AS [AdressLine],
[Project1].[Person_ID] AS [Person_ID]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name],
[Extent2].[ID] AS [ID1],
[Extent2].[AdressLine] AS [AdressLine],
[Extent2].[Person_ID] AS [Person_ID],
CASE WHEN ([Extent2].[ID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [People] AS [Extent1]
LEFT OUTER JOIN [Addresses] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Person_ID]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[ID] ASC, [Project1].[C2] ASC
Workarounds
If I remove the .Include(p => p.Addresses) from the query, everything works fine, but of course the addresses aren't loaded, and accessing that collection makes a new call to the database every time.
I can first get the data from the database and randomize later by just adding a .ToList() before the .OrderBy, like this: var persons = db.Persons.Include(p => p.Addresses).ToList().OrderBy(p => Guid.NewGuid());
Does anybody have any idea of why it is happening like this?
Might this be a bug in the SQL generation?
As one can work out from AakashM's answer and Nicolae Dascalu's answer, it strongly seems that LINQ OrderBy requires a stable ranking function, which NEWID/Guid.NewGuid is not.
So we have to use another random generator that is stable within a single query.
To achieve this, before each query, use a .NET Random generator to get a random number. Then combine this random number with a unique property of the entity to get a random sort. And to 'randomize' the result a bit more, checksum it. (CHECKSUM is a SQL Server function that computes a hash; the original idea was found on this blog.)
Assuming Person Id is an int, you could write your query this way :
// Random instances should be stored and reused, not instantiated at each usage.
// But beware, it is not thread safe. If you want to share it between threads, you
// would have to use locks, see its documentation.
// https://learn.microsoft.com/en-us/dotnet/api/system.random.
// But using locks is a bad idea for scalability, especially in a Web context.
var randomGenerator = new Random();
// ...
var rnd = randomGenerator.NextDouble();
var persons = db.Persons
.Include(p => p.Addresses)
.OrderBy(p => SqlFunctions.Checksum(p.Id * rnd));
Like the NewGuid hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.
Beware:
If your query ordering does not guarantee a unique ranking of your entities, you must complement it to guarantee one. For example, if you use a non-unique property of your entities for the checksum call, then add something like .ThenBy(p => p.Id) after the OrderBy.
If the ranking is not unique for your queried root entity, its included children may get mixed with the children of other entities that have the same ranking, and then the bug will remain.
Note:
I would prefer to use the .Next() method to get an int and then combine it through a xor (^) with an int unique property of the entity, rather than using a double and multiplying by it. But SqlFunctions.Checksum unfortunately does not provide an overload for the int data type, though the SQL Server function is supposed to support it. You may use a cast to overcome this, but to keep it simple I finally chose to go with the multiplication.
tl;dr: There's a leaky abstraction here. To us, Include is a simple instruction to stick a collection of things onto each single returned Person row. But EF's implementation of Include is done by returning a whole row for each Person-Address combo, and reassembling at the client. Ordering by a volatile value causes those rows to become shuffled, breaking apart the Person groups that EF is relying on.
When we have a look at ToTraceString() for this LINQ:
var people = c.People.Include("Addresses");
// Note: no OrderBy in sight!
we see
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
So we get one row for each A (n rows for a P with n As), plus 1 row for each P without any As.
Adding an OrderBy clause, however, puts the thing-to-order-by at the start of the ordered columns:
var people = c.People.Include("Addresses").OrderBy(p => Guid.NewGuid());
gives
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[Id] ASC, [Project1].[C2] ASC
So in your case, where the ordered-by-thing is not a property of a P, but is instead volatile, and therefore can be different for different P-A records of the same P, the whole thing falls apart.
I'm not sure where on the working-as-intended ~~~ cast-iron bug continuum this behaviour falls. But at least now we know about it.
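To see why a per-row volatile sort key breaks things, here is a small in-memory simulation. It is only a sketch of the idea, not EF's actual materializer: it assumes a reader that starts a new Person every time the person id changes between adjacent rows. The row data mirrors the question's seed (3 persons, 5 addresses), and `CountAdjacentGroups` is a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ShuffledRowsSketch
{
    // Counts runs of adjacent rows that share a PersonId. This mimics a
    // reader that starts a new Person whenever the id changes between rows.
    static int CountAdjacentGroups(IEnumerable<(int PersonId, int AddressId)> rows)
    {
        int groups = 0;
        int? previous = null;
        foreach (var row in rows)
        {
            if (previous != row.PersonId)
            {
                groups++;
            }
            previous = row.PersonId;
        }
        return groups;
    }

    static void Main()
    {
        // One flattened row per Person-Address combination.
        var rows = new (int PersonId, int AddressId)[]
        {
            (1, 1), (1, 2), (2, 3), (3, 4), (3, 5),
        };

        var rng = new Random();

        // Stable key: rows of the same person stay adjacent.
        var byPerson = rows.OrderBy(r => r.PersonId);

        // Volatile key evaluated per row (like NEWID() in the inner SELECT):
        // rows of the same person can be separated.
        var byVolatileKey = rows.OrderBy(r => rng.Next()).ThenBy(r => r.PersonId);

        Console.WriteLine(CountAdjacentGroups(byPerson));      // always 3
        Console.WriteLine(CountAdjacentGroups(byVolatileKey)); // 3, 4 or 5, varies per run
    }
}
```

The second count varies between 3 and 5 across runs, which matches the symptom in the question: sometimes 3 persons, sometimes 4 or 5.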
I don't think there is an issue in query generation, but there is definitely an issue when EF tries to convert the rows into objects.
It looks like there is an inherent assumption here that data for the same person in a joined statement will be returned grouped together, whether or not there is an ORDER BY.
For example, the result of a joined query will always be
P.Id  P.Name    A.Id  A.StreetLine
1     Person 1  10    ---
1     Person 1  11
2     Person 2  12
3     Person 3  13
3     Person 3  14
Even if you order by some other column, the rows for the same person would always appear one after another.
This assumption is mostly true for any joined query.
But there is a deeper issue here, I think. OrderBy is for when you want data in a certain order (as opposed to random), so that assumption does seem reasonable.
I think you should really get the data out and then randomize it by some other means in your code.
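One common way to randomize in code after the data has been loaded is a Fisher-Yates shuffle. The sketch below is self-contained; the `Shuffle` helper and the string list standing in for the materialized persons are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

static class ShuffleSketch
{
    // In-place Fisher-Yates shuffle: walk from the end and swap each element
    // with a randomly chosen element at or before its position.
    static void Shuffle<T>(IList<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    static void Main()
    {
        // Stand-in for a list that was already loaded from the database,
        // e.g. the result of a query ending in ToList().
        var persons = new List<string> { "Person 1", "Person 2", "Person 3" };

        Shuffle(persons, new Random());

        Console.WriteLine(string.Join(", ", persons));
    }
}
```

Because the shuffle happens after the rows have been turned into objects, each person appears exactly once, whatever order comes out.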
From theory:
To sort a list of items, the compare function should be stable relative to the items; this means that for any 2 items x, y the result of x < y should be the same however many times it is queried (called).
I think the issue is related to a misunderstanding of the specification (documentation) of the OrderBy method:
keySelector - A function to extract a key from an element.
The documentation doesn't mention explicitly whether the provided function should return the same value for the same object however many times it is called (in your case it returns different, random values), but I think the term "key" used in the documentation implicitly suggests this.
When you define a query path to define the query results, (use Include), the query path is only valid on the returned instance of ObjectQuery. Other instances of ObjectQuery and the object context itself are not affected. This functionality lets you chain multiple "Includes" for eager loading.
Therefore, your statement translates into
from person in db.Persons.Include(p => p.Addresses).OrderBy(p => Guid.NewGuid())
select person
instead of what you intended.
from person in db.Persons.Include(p => p.Addresses)
select person
.OrderBy(p => Guid.NewGuid())
Hence your second workaround works fine :)
Reference: Loading Related Objects While Querying A Conceptual Model in Entity
Framework - http://msdn.microsoft.com/en-us/library/bb896272.aspx
I also ran into this problem, and solved it by adding a Randomizer Guid property to the main class I was fetching. I then set the column's default value to NEWID() like this (using EF Core 2)
builder.Entity<MainClass>()
.Property(m => m.Randomizer)
.HasDefaultValueSql("NEWID()");
When fetching, it gets a bit more complicated. I created two random integers to function as my order-by indexes, then ran the query like this
var rand = new Random();
var randomIndex1 = rand.Next(0, 31);
var randomIndex2 = rand.Next(0, 31);
var taskSet = await DbContext.MainClasses
.Include(m => m.SubClass1)
.ThenInclude(s => s.SubClass2)
.OrderBy(m => m.Randomizer.ToString().Replace("-", "")[randomIndex1])
.ThenBy(m => m.Randomizer.ToString().Replace("-", "")[randomIndex2])
.FirstOrDefaultAsync();
This seems to be working well enough, and should provide enough entropy for even a large dataset to be fairly randomized.

Entity Framework - Join on many to many

I have a simple many to many relationship and I am wondering how you get data out of it. Here is the setup
Tables
Media
Media_Keyword (many to many map)
Keyword
Here is the code I have:
public List<Keyword> GetFromMedia(int mediaID)
{
var media = (from m in Connection.Data.Media
where m.id == mediaID
select m).First();
var keys = (from k in media.Media_Keyword
select new Keyword {ID = k.Keywords.id, Name = k.Keywords.keyword});
return keys.ToList();
}
Is there a way to do this better?
Usually, I select right from the many-to-many map.
var keys = from k in Connection.Data.Media_Keyword
where k.MediaID == mediaID
select k.Keywords;
I've not used the entity framework specifically, but can't you just combine them like this?
public List<Keyword> GetFromMedia(int mediaID)
{
return (from m in Connection.Data.Media
from k in m.Media_Keyword
where m.id == mediaID
select new Keyword {ID = k.Keywords.id, Name = k.Keywords.keyword}).ToList();
}
Response to Kleinux (Don't know why i can't add a comment to your question)
Sure you can, but it's not necessarily a good thing, because the context is giving you a new "Keyword". Then, if you try to update it, thinking that you are updating the existing one, the context will see it as a new keyword and create a new one instead of updating it.
** UPDATE
Sorry for my english, i'm french, well not french but from Quebec. I'm giving my 110%!!

Resources