Accessing columns in cross reference table via LINQ with Entity Framework


I have a Users table, a Roles table, and a cross reference table Users_Roles which has the following columns:
User_ID Role_ID Beamline_ID Facility_ID Laboratory_ID
The Beamline_ID, Facility_ID, Laboratory_ID are only filled in depending on the Role_ID. If someone has the Role_ID of 2 ("Lab Admin") then they will have an entry in Laboratory_ID.
I am trying to figure out how to get the Laboratory_ID for a specific row in this table. For example, I know I have User_ID = 1. I want to get the Laboratory_ID for User_ID = 1 and Role_ID = 2 ("Lab Admin").
This is obviously simple when dealing with SQL but I am new to Entity Framework and I am trying to do this with Entities and I'm having some trouble. I am using MVC so in my controller I have done this:
User user = new User();
user.GetUser(User.Identity.Name);
var labID = user.Users_Roles.Where(r => r.Role_ID == 2);
That should get me the "row" of that user when the Role = Lab Admin, but I don't know how to grab the Laboratory_ID column now. I thought maybe it would be:
var labID = user.Users_Roles.Where(r => r.Role_ID == 2).Select(l => l.Laboratory_ID);
But that is not correct. Any help would be greatly appreciated.
EDIT: I am using the database first approach and I am using DBContext. So typically I would access the context like this:
var context = new PASSEntities();

As for why the code you've already tried doesn't work, at first glance I think you just need to tack on a .Single() to the end:
var labID = user.Users_Roles.Where(r => r.Role_ID == 2)
.Select(l => l.Laboratory_ID)
.Single();
Select() returns a sequence, and it looks like you want a single value, so I think that might be the cause of the problem you were having. Single() will throw an exception if there's not exactly one result, which sounds appropriate for your table structure, but there's also First() if you don't care about enforcing that.
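The difference between the sequence and the single value can be seen without a database. The following is a minimal sketch using LINQ to Objects; the UserRole class and the sample rows are hypothetical stand-ins for the Users_Roles entity, not the actual generated model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row of the Users_Roles table.
class UserRole
{
    public int User_ID { get; set; }
    public int Role_ID { get; set; }
    public int? Laboratory_ID { get; set; }
}

static class Program
{
    static void Main()
    {
        var usersRoles = new List<UserRole>
        {
            new UserRole { User_ID = 1, Role_ID = 1, Laboratory_ID = null },
            new UserRole { User_ID = 1, Role_ID = 2, Laboratory_ID = 7 },
        };

        // Where + Select still yields a sequence (IEnumerable<int?>), not a value.
        IEnumerable<int?> labIds = usersRoles
            .Where(r => r.Role_ID == 2)
            .Select(r => r.Laboratory_ID);

        // Single() extracts the one element; it throws InvalidOperationException
        // when the sequence has zero elements or more than one.
        int? labId = labIds.Single();
        Console.WriteLine(labId);            // 7

        // SingleOrDefault() returns the default (null here) for an empty
        // sequence, but still throws if there is more than one match.
        int? missing = usersRoles
            .Where(r => r.Role_ID == 99)
            .Select(r => r.Laboratory_ID)
            .SingleOrDefault();
        Console.WriteLine(missing.HasValue); // False
    }
}
```

If a missing row is a normal situation rather than an error, SingleOrDefault() or FirstOrDefault() is probably the gentler choice.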
But for this kind of thing, you might also think about just querying manually using your DbContext, instead of trying to load entities individually and then traversing down through their navigation properties -- that results in more local searching than you probably need, and, depending on the circumstances, might be performing more/larger queries against your context and database than necessary too.
Here's an example LINQ query you could use (you might have to adjust the names of your tables and so on):
var labID = (from ur in context.Users_Roles
where ur.User_ID == 1 && ur.Role_ID == 2
select ur.Laboratory_ID).Single();
When you execute that statement, it will get translated into SQL equivalent to this:
SELECT TOP 1 Laboratory_ID FROM Users_Roles WHERE User_ID = 1 AND Role_ID = 2
...so it's pretty efficient. (A LINQ query like this always goes to the database, even if the record has already been loaded. If User_ID and Role_ID form the primary key for the Users_Roles table, the DbSet's Find() method is the call that returns an already-tracked copy without querying the database.)
That's a very bare-bones query, though; more than likely, you're eventually going to want to query based on properties of the user or role, instead of just searching for hard-coded IDs you know in advance. In that case, you can adjust your query to something like this:
var labID = (from u in context.Users
join ur in context.Users_Roles on u.User_ID equals ur.User_ID
join r in context.Roles on ur.Role_ID equals r.Role_ID
where u.UserName == "John Smith" && r.RoleName == "Lab Admin"
select ur.Laboratory_ID).Single();
That would, as you can probably tell, return the Lab ID for the user John Smith under the role Lab Admin.
Does that make any sense? LINQ syntax can take a little getting used to.

Related

Navigation Property Not Evaluated EF Core

I have an issue in EF Core where I am trying to get a related entity and all its dependent structures, but am not having much success with it.
Currently, I have a query like this:
var user = new Guid(id);
var userCustAffs = _data.UserCustomerAffiliation.Include(x => x.Customer)
.ThenInclude(x => x.Brand).Where(x => x.UserId.Equals(user)).ToList();
var result = userCustAffs.Select(p => p.Customer).ToList();
When I should be able to do something like this to simplify it (and remove unnecessary things being evaluated locally vs. in the db):
var user = new Guid(id);
var userCustAffs = _data.UserCustomerAffiliation.Include(x => x.Customer)
.ThenInclude(x => x.Brand).Where(x => x.UserId.Equals(user))
.Select(y => y.Customer).ToList();
However, when I do the latter query, I get an error that
The Include operation for navigation '[x].Customer.Brand' is unnecessary and was ignored
because the navigation is not reachable in the final query results
However, Brand is very important, as it drives some of the properties off of the Customer model. What is the proper way to restructure this query so that I get the results I want (e.g. Customer with its relevant Brand, limited by the userId affiliated on the UserCustomerAffiliation table).
I have seen a recommendation before to "start" the query from the Customer instead of UserCustomerAffiliation, but that seems contrary to every instinct I have from a DB optimization standpoint (and Customer does not have a navigation property back to UserCustomerAffiliation atm).

The answer to why this happens (after some research) is quite interesting and a good example of why knowing how EF Core works is important to using it.
Linq in general works on the idea of deferred execution. To put it very simply, if I make a Linq statement on a particular line, it may not get evaluated or executed until the data is "needed." Most of the time we shortcut this with .ToList() which forces immediate execution. The general idea here is that sometimes datasets are not needed (say, if an exception occurs before it gets evaluated but after it would be 'loaded').
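Deferred execution is easy to observe with plain LINQ to Objects. The sketch below is only an illustration with made-up data; it counts how often the filter predicate runs, to show that building the query executes nothing until ToList() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
    static void Main()
    {
        var evaluated = 0;
        var names = new List<string> { "a", "SomeValue", "b" };

        // Building the query runs nothing: the predicate has not been called yet.
        IEnumerable<string> query = names.Where(n =>
        {
            evaluated++;
            return n == "SomeValue";
        });
        Console.WriteLine(evaluated);    // 0

        // ToList() enumerates the source and forces execution.
        List<string> result = query.ToList();
        Console.WriteLine(evaluated);    // 3
        Console.WriteLine(result.Count); // 1
    }
}
```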
EF Core takes this one step further and ties the idea of deferred execution with database optimization. If, for example, I get a subset of data from the database:
var result = _context.SomeTable.Where(x => x.name == "SomeValue");
But later all I care about is the size of the dataset:
return result.Count();
The DB call can be optimized to
select count(*) from SomeTable where name = 'SomeValue';
instead of
select * from SomeTable where name = 'SomeValue';
Similarly, the query I have above was being optimized away. Because I chained the whole thing before it was evaluated, the EF Core optimizer threw away a table I needed.
The reason this works:
var user = new Guid(id);
var userCustAffs = _data.UserCustomerAffiliation.Include(x => x.Customer)
.ThenInclude(x => x.Brand).Where(x =>
x.UserId.Equals(user)).ToList();
var result = userCustAffs.Select(p => p.Customer).ToList();
Is because I force execution of the query that is something like
select u.*, c.*, b.* from usercustomeraffiliation u
inner join Customer c on u.customerid = c.id
inner join Brand b on c.brandid = b.id
where u.userid = 'userId';
And then strip out the customer object (and the brand object underneath it) in memory. It would be more efficient to be able to generate a query like:
select c.*, b.* from Customer c
inner join Brand b on c.brandid = b.id
where c.id in (select u.customerid from usercustomeraffiliation u
where u.userid = 'userId');
But, that gets optimized away.

Is there anyway to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.

The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizzes: section_id

As someone else posted, definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest getting only the attributes that you actually use; .pluck could be warranted. The 3rd and 4th queries look fishy. I assume you've run the queries in a console? Again, I'm not sure what the queries are being used for, but it is often useful to write raw SQL first and run it directly against the db. Then you can apply your findings to rewriting the ActiveRecord queries.
What is the .completed tagged on the end? Is it supposed to be there? The only thing I found close in the Rails API is .completed? If it is a custom method, definitely look into it. You potentially also have a use case for scopes.

THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
join users u on u.id = em.user_id
left outer join quizzes q on q.enterprise_member_id = em.id
and q.section_id in (?)
where q.enterprise_member_id is null
Based on my understanding, this will allow postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.

Thanks everyone for your ideas. I basically did what everyone said. I added indexes, resorted how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
@quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
@invited = list.members.count('user_id is null') # 12.5ms
@not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
@in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
@completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
@question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms

Difficulties with SQL query and COUNT

Thanks in advance for reading this. I'm not really good with SQL so please pardon any stupid mistake...
Here is the deal, I have four tables (I'm only going to give the basic fields, and dependencies between tables, for the sake of simplicity):
Company: companyId, companyName
User: userId, userName
Project: projectId, projectUserId, projectCompanyId, projectDate
Study: studyProjectId
The dependencies are like so:
A project is for a client (projectUserId) and carried out by a company (projectCompanyId)
There can be many studies for the same project, but each study is for one project (studyProjectId)
Here is the kind of request I'd like to write, but it doesn't work right now:
SELECT
project.projectId,
company.companyName,
user.userName,
COUNT( study.studyId ) AS numberStudies
FROM project, company, user, study
WHERE company.companyId = project.projectCompanyId,
AND user.userId = project.projectUserId,
AND study.studyProjectId = project.projectId
ORDER BY company.companyId, user.userId, project.projectDate;
It returns one record for which numberStudies equals the total number of studies. If I remove the COUNT from the SELECT, then I get the type of result I want, but without the column numberStudies (of course). Hoping you understand what I'm trying to get, what am I doing wrong here?
Thanks again in advance :)
EDIT: If possible, I'd like the request to show records even when numberStudies is 0.

As in the comments, you need a GROUP BY clause when you want to have aggregate results (like it seems you want: "Number of Studies per project, company and user"). So, the first thing to do is add:
GROUP BY project.projectId, company.companyName, user.userName
Notice that the three columns are exactly the three that you have (unaggregated) in the SELECT list.
SELECT
project.projectId,
company.companyName,
user.userName,
COUNT(study.studyId) AS numberStudies
FROM project, company, user, study
WHERE company.companyId = project.projectCompanyId
AND user.userId = project.projectUserId
AND study.studyProjectId = project.projectId
GROUP BY project.projectId, company.companyName, user.userName
ORDER BY company.companyId, user.userId, project.projectDate ;
This will show what you want but there are still a few issues:
First, you are using the old (SQL-89) syntax of implicit joins with the conditions in the WHERE clause. This syntax is not deprecated but the new (well, 20 years old SQL-92) syntax with the JOIN keyword has several advantages.
We can add aliases for the tables for readability.
There may be two companies or users with same name so we should group by their IDs, not only their names.
One advantage of explicit JOIN syntax is that it's easy to have results when there are no rows to join (as you want to show when there are no studies for a project). Just LEFT JOIN the study table.
So, the query becomes:
SELECT
p.projectId,
c.companyName,
u.userName,
COUNT(s.studyId) AS numberStudies
FROM
project AS p
JOIN
company AS c ON c.companyId = p.projectCompanyId
JOIN
user AS u ON u.userId = p.projectUserId
LEFT JOIN
study AS s ON s.studyProjectId = p.projectId
GROUP BY
c.companyId,
u.userId,
p.projectId,
c.companyName, u.userName
ORDER BY
c.companyId,
u.userId,
p.projectDate ;
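For readers who think more easily in application code than in SQL, the same "keep every project, count its studies, allow zero" logic can be sketched in memory. This is only an analogy written with LINQ to Objects, not something the query above depends on; the Project and Study classes and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for rows of the project and study tables.
class Project
{
    public int ProjectId { get; set; }
}

class Study
{
    public int StudyId { get; set; }
    public int StudyProjectId { get; set; }
}

static class Program
{
    static void Main()
    {
        var projects = new List<Project>
        {
            new Project { ProjectId = 1 },
            new Project { ProjectId = 2 },   // has no studies
        };
        var studies = new List<Study>
        {
            new Study { StudyId = 10, StudyProjectId = 1 },
            new Study { StudyId = 11, StudyProjectId = 1 },
        };

        // GroupJoin keeps every project and pairs it with its (possibly empty)
        // set of studies, which mirrors LEFT JOIN + GROUP BY + COUNT(s.studyId).
        var counts = projects.GroupJoin(
            studies,
            p => p.ProjectId,
            s => s.StudyProjectId,
            (p, matched) => new { p.ProjectId, NumberStudies = matched.Count() });

        foreach (var row in counts)
        {
            Console.WriteLine(row.ProjectId + ": " + row.NumberStudies);
        }
        // Prints "1: 2" then "2: 0".
    }
}
```

An inner join in place of GroupJoin would drop project 2 entirely, which is the in-memory counterpart of using JOIN instead of LEFT JOIN on the study table.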

You probably need a LEFT JOIN between study and project.

ASP.NET MVC & EF4 Entity Framework - Are there any performance concerns in using the entities vs retrieving only the fields i need?

Let's say we have 3 tables: Users, Products, Purchases.
There is a view that needs to display the purchases made by a user.
I could lookup the data required by doing:
from p in DBSet<Purchases>.Include("User").Include("Product") select p;
However, I am concerned that this may have a performance impact because it will retrieve the full objects.
Alternatively, I could select only the fields I need:
from p in DBSet<Purchases>.Include("User").Include("Product") select new SimplePurchaseInfo() { UserName = p.User.name, Userid = p.User.Id, ProductName = p.Product.Name ... etc };
So my question is:
What's the best practice in doing this?
== EDIT
Thanks for all the replies.
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Real example: User reviews Products
var query = from dr in productRepository.FindAllReviews()
where dr.User.UserId == 'userid'
select dr;
string sql = ((ObjectQuery)query).ToTraceString();
SELECT [Extent1].[ProductId] AS [ProductId],
[Extent1].[Comment] AS [Comment],
[Extent1].[CreatedTime] AS [CreatedTime],
[Extent1].[Id] AS [Id],
[Extent1].[Rating] AS [Rating],
[Extent1].[UserId] AS [UserId],
[Extent3].[CreatedTime] AS [CreatedTime1],
[Extent3].[CreatorId] AS [CreatorId],
[Extent3].[Description] AS [Description],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Price] AS [Price],
[Extent3].[Rating] AS [Rating1],
[Extent3].[ShopId] AS [ShopId],
[Extent3].[Thumbnail] AS [Thumbnail],
[Extent3].[Creator_UserId] AS [Creator_UserId],
[Extent4].[Comment] AS [Comment1],
[Extent4].[DateCreated] AS [DateCreated],
[Extent4].[DateLastActivity] AS [DateLastActivity],
[Extent4].[DateLastLogin] AS [DateLastLogin],
[Extent4].[DateLastPasswordChange] AS [DateLastPasswordChange],
[Extent4].[Email] AS [Email],
[Extent4].[Enabled] AS [Enabled],
[Extent4].[PasswordHash] AS [PasswordHash],
[Extent4].[PasswordSalt] AS [PasswordSalt],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent4].[UserId] AS [UserId1],
[Extent4].[UserName] AS [UserName]
FROM [ProductReviews] AS [Extent1]
INNER JOIN [Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
LEFT OUTER JOIN [Products] AS [Extent3] ON [Extent1].[ProductId] = [Extent3].[Id]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent1].[UserId] = [Extent4].[UserId]
WHERE N'615005822' = [Extent2].[UserId]
or
from d in productRepository.FindAllProducts()
from dr in d.ProductReviews
where dr.User.UserId == 'userid'
orderby dr.CreatedTime
select new ProductReviewInfo()
{
product = new SimpleProductInfo() { Id = d.Id, Name = d.Name, Thumbnail = d.Thumbnail, Rating = d.Rating },
Rating = dr.Rating,
Comment = dr.Comment,
UserId = dr.UserId,
UserScreenName = dr.User.ScreenName,
UserThumbnail = dr.User.Thumbnail,
CreateTime = dr.CreatedTime
};
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Thumbnail] AS [Thumbnail],
[Extent1].[Rating] AS [Rating],
[Extent2].[Rating] AS [Rating1],
[Extent2].[Comment] AS [Comment],
[Extent2].[UserId] AS [UserId],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent2].[CreatedTime] AS [CreatedTime]
FROM [Products] AS [Extent1]
INNER JOIN [ProductReviews] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [Users] AS [Extent3] ON [Extent2].[UserId] = [Extent3].[UserId]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent2].[UserId] = [Extent4].[UserId]
WHERE N'userid' = [Extent3].[UserId]
ORDER BY [Extent2].[CreatedTime] ASC
[QUESTION 2]: What's with the ugly outer joins?

In general, only retrieve what you need, but remember to retrieve enough information that your application is not too chatty. If you can batch a bunch of things together, do so; otherwise you'll pay the network traffic cost every time you go back to the database to retrieve more data.
In this case, assuming you will only need those info, I would go with the second approach (if that's what you really need).

Eager loading with .Include doesn't really play nice when you want filtering (or ordering for that matter).
That first query is basically this:
select p.*, u.*, p2.*
from purchases p
left outer join users u on p.userid = u.userid
left outer join products p2 on p.productid = p2.productid
where u.userid = @p1
Is that really what you want?
There is a view that needs to display the purchases made by a user.
Well then why are you including "Product"?
Shouldn't it just be:
from p in DBSet<Purchases>.Include("User") select p;
Your second query will error. You must project to an entity on the model, or an anonymous type - not a random class/DTO.
To be honest, the easiest and most well performing option in your current scenario is to query on the FK itself:
var purchasesForUser = DBSet<Purchases>.Where(x => x.UserId == userId);
That should produce:
select p.*
from purchases p
where p.UserId = @p1
The above query of course requires you to include the foreign keys in the model.
If you don't have the FK's in your model, then you'll need more LINQ-Entities trickery in the form of anonymous type projection.
Overall, don't go out looking to optimize. Create queries which align with the scenario/business requirement, then optimize if necessary - or look for alternatives to LINQ-Entities, such as stored procedures, views or compiled queries.
Remember: premature optimization is the root of all evil.
EDIT - In response to question update
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Yes - ViewModels should only contain what is required for that View. Otherwise why have the ViewModel? You may as well bind straight to the EF model. So, set up the ViewModel with only the fields it needs for the view.
[QUESTION 2]: What's with the ugly outer joins?
That is default behaviour for .Include. .Include always produces a left outer join.

I think the second query will throw an exception because you can't map the result to an unmapped .NET type in LINQ to Entities. You have to return an anonymous type and map it to your object in LINQ to Objects, or you have to use some advanced concepts for projections - QueryView (projections in ESQL) or DefiningQuery (custom SQL query mapped to a new read-only entity).
Generally it is more about the design of your entities. If you select a single small entity, there is not a big difference between loading it all and using a projection. If you are selecting a list of entities you should consider projections - especially if the tables contain columns like nvarchar(max) or varbinary(max) which are not needed in your result!
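If the provider does reject a direct projection into a plain class, the anonymous-type route mentioned above might look like the following sketch. It runs entirely on LINQ to Objects (AsQueryable() merely stands in for a context set), and Review, ReviewInfo and their properties are hypothetical names, not taken from the question's model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and view model; names are illustrative only.
class Review
{
    public int Rating { get; set; }
    public string Comment { get; set; }
    public string UserScreenName { get; set; }
    public string LargeColumnWeDoNotNeed { get; set; }
}

class ReviewInfo
{
    public int Rating { get; set; }
    public string UserScreenName { get; set; }
}

static class Program
{
    static void Main()
    {
        // AsQueryable() stands in for a context set such as context.Reviews.
        IQueryable<Review> reviews = new List<Review>
        {
            new Review { Rating = 5, Comment = "Great", UserScreenName = "ann", LargeColumnWeDoNotNeed = "..." },
            new Review { Rating = 2, Comment = "Meh", UserScreenName = "bob", LargeColumnWeDoNotNeed = "..." },
        }.AsQueryable();

        List<ReviewInfo> infos = reviews
            .Where(r => r.Rating >= 3)
            // Step 1: anonymous-type projection. With a database provider this
            // part is translated, so only these two columns would be selected.
            .Select(r => new { r.Rating, r.UserScreenName })
            // Step 2: AsEnumerable() switches to LINQ to Objects, so the
            // mapping below runs in memory rather than in the provider.
            .AsEnumerable()
            .Select(x => new ReviewInfo { Rating = x.Rating, UserScreenName = x.UserScreenName })
            .ToList();

        Console.WriteLine(infos.Count);             // 1
        Console.WriteLine(infos[0].UserScreenName); // ann
    }
}
```

Keeping the filter and the anonymous projection before AsEnumerable() matters: anything placed after it would run in memory on rows that have already been fetched.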

Both create almost the same query: select from one table, with two inner joins. The only thing that changes from a database perspective is the amount of fields returned, but that shouldn't really matter that much.
I think here DRY wins over a performance hit (if it even exists), so my call is: go for the first option.

Entity Framework: LINQ Include() does not work after DB update, why?

I am new to Entity Framework and LINQ and have run into a rather odd scenario.
I have been using the following query to return account information:
var account = ((from acct in _entities.Account
join m in _entities.Item on acct.Id equals m.Account.Id
where acct.Id == accountId && m.ItemNumber.EndsWith(itemNumber)
select acct) as ObjectQuery<Account>).Include("Item.ItemDetails");
We recently made some changes to the database and generated a new edmx file. Following the change the above query still returns the account and associated Item but the ItemDetails is no longer being included.
I have validated the SQL returned by the query and there doesn't seem to be anything wrong as the correct data is being returned.
Furthermore, I don't see anything different in the edmx file between the Item and ItemDetails objects, as these were not changed and the navigation property is there.
Has anyone seen this before?
Thanks

Include(...) takes the name of the navigation property, so it is worth checking the exact name of the property in the .edmx (especially whether it is singular or plural).
Also you can try to change the query like this:
var account = from acct in _entities.Account.Include("Item.ItemDetails")
join m in _entities.Item
on acct.Id equals m.Account.Id
where acct.Id == accountId && m.ItemNumber.EndsWith(itemNumber)
select acct;

You have one of two possible scenarios:
1. Item has a relationship to Account (expressed in your Entity Model as an EntityAssociation and in the DB as a foreign key).
2. There is no relationship between the Item set and the Account set, hence you must specify a join in LINQ as you have done.
Case 1: If this is the case, then you don't need a join statement; selecting Account.Item will naturally give you all items where Item.AccountID is equal to Account.ID.
So your join statement: join m in _entities.Item on acct.Id equals m.Account.Id
has basically told Item to loop back onto Account to check the ID. If they were not already connected, then you could not have gotten m.Account.ID
Case 2: If there is no relationship between Account and Item, then the .Include() will definitely not work because the navigational property DOES NOT exist in your model.
Conclusion: Check your new model to see if a relationship exists between Account and Item. If yes, then remove the Join. If no relationship, then you've done something wrong.
Here is a select statement assuming scenario 1 and that Account.Item is not a collection:
var account = from acct in _entities.Account.Include("Item.ItemDetails")
where acct.Id == accountId && acct.Item.ItemNumber.EndsWith(itemNumber)
select acct;
