Using RoR with a legacy table that uses E-A-V - ruby-on-rails

I need to connect to a legacy database and pull a subset of data from a table that uses the entity-attribute-value model to store a contact's information. The table looks like the following:
subscriberid fieldid data
1 2 Jack
1 3 Sparrow
2 2 Dan
2 3 Smith
where fieldid is a foreign key to a fields table that lists custom fields a given customer can have (e.g. first name, last name, phone). The SQL involved is rather hairy as I have to join the table to itself for every field I want back (currently I need 6 fields) as well as joining to a master contact list that's based on the current user.
The SQL is something like this:
select t0.data as FirstName, t1.data as LastName, t2.data as SmsOnly
from subscribers_data t0 inner join subscribers_data t1
on t0.subscriberid = t1.subscriberid
inner join subscribers_data t2
on t2.subscriberid = t1.subscriberid
inner join list_subscribers ls
on (t0.subscriberid = ls.subscriberid and t1.subscriberid = ls.subscriberid)
inner join lists l
on ls.listid = l.listid
where l.name = 'My Contacts'
and t0.fieldid = 2
and t1.fieldid = 3;
How should I go about handling this in my RoR application? I would like to abstract this away and still be able to use the normal "dot notation" for pulling the attributes out. Luckily the data is read-only for the foreseeable future.

This is exactly what #find_by_sql was designed for. I would reimplement #find to do what you need to do, something like this:
class Contact < ActiveRecord::Base
set_table_name "subscribers_data" # on Rails 3.2 and later: self.table_name = "subscribers_data"
def self.find(options={})
find_by_sql <<EOS
select t0.data as FirstName, t1.data as LastName, t2.data as SmsOnly
from subscribers_data t0 inner join subscribers_data t1
on t0.subscriberid = t1.subscriberid
inner join subscribers_data t2
on t2.subscriberid = t1.subscriberid
inner join list_subscribers ls
on (t0.subscriberid = ls.subscriberid and t1.subscriberid = ls.subscriberid)
inner join lists l
on ls.listid = l.listid
where l.name = 'My Contacts'
and t0.fieldid = 2
and t1.fieldid = 3;
EOS
end
end
The Contact instances will have #FirstName and #LastName as attributes. You could rename them as AR expects too, such that #first_name and #last_name would work. Simply change the AS clauses of your SELECT.
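If the number of self-joins grows with every extra field, another option is to fetch the raw rows once and pivot them in Ruby. The following is only a minimal sketch under assumptions: the rows are taken to be hashes with the column names from the question, and FIELD_NAMES, ContactRow and pivot_contacts are hypothetical names, not part of the original answer or of ActiveRecord.

```ruby
# Hypothetical mapping from fieldid to attribute name; in the real
# application this would be read from the fields table.
FIELD_NAMES = { 2 => :first_name, 3 => :last_name }.freeze

# Plain value object so callers can use dot notation (row.first_name).
ContactRow = Struct.new(:subscriber_id, *FIELD_NAMES.values, keyword_init: true)

# Turn entity-attribute-value rows into one ContactRow per subscriber.
def pivot_contacts(rows)
  rows.group_by { |row| row[:subscriberid] }.map do |subscriber_id, group|
    contact = ContactRow.new(subscriber_id: subscriber_id)
    group.each do |row|
      name = FIELD_NAMES[row[:fieldid]]
      contact[name] = row[:data] if name # skip fields we did not ask for
    end
    contact
  end
end

# Sample rows shaped like the table in the question.
rows = [
  { subscriberid: 1, fieldid: 2, data: "Jack" },
  { subscriberid: 1, fieldid: 3, data: "Sparrow" },
  { subscriberid: 2, fieldid: 2, data: "Dan" },
  { subscriberid: 2, fieldid: 3, data: "Smith" }
]

pivot_contacts(rows).each do |c|
  puts "#{c.subscriber_id}: #{c.first_name} #{c.last_name}"
end
```

The trade-off is that filtering by list name still has to happen in SQL before the rows reach Ruby; the pivot only replaces the per-field self-joins.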

I am not sure it is totally germane to your question, but you might want to take a look at MagicModel. It can generate models for you based on a legacy database. Might lower the amount of work you need to do.

Related

Rails: How to force ActiveRecord generate alias for an association every time (just like Hibernate in Java does it), not only when it's ambiguous?

I work on a project with an STI model Item that has 5 subclasses (Item1, Item2 ... Item5). This STI model (items table) is mapped through a join table item_parents to the Parent record (parents table). The mapping is done via has_many through:.
Each of the items has two fields: name, code both are strings. Parent has many fields, but for the sake of example let's say it has name, created_at.
On the frontend, they are displayed in one table, like this:
Parent.name | Parent.created_at | Item1.name | Item1.code | Item2.name | Item2.code | ...
Users can configure filtering for each of the columns. It can be any combination or no filter at all. For example, they can choose the following combination:
Parent.created_at before 2020.02.22
Item1.name containing 'abc'
Item2.name containing 'xyz'
Item3.code equals 'Z12'
The filtering code is implemented like this:
def search(filters)
filters.reduce(Parent.all) { |query, (key, value)| apply_filter(query, key, value) }
end
def apply_filter(query, key, value)
case key
when :parent_name_contains
query.where(Parent.arel_table[:name].matches("%#{value}%"))
when :parent_created_at_before
query.where(Parent.arel_table[:created_at].lt(value))
when :item1_name_contains
query.joins(:item1s).where(Item1.arel_table[:name].matches("%#{value}%"))
when :item2_name_contains
query.joins(:item2s).where(Item2.arel_table[:name].matches("%#{value}%"))
when :item1_code_equals
query.joins(:item1s).where(Item1.arel_table[:code].eq(value))
when :item2_code_equals
query.joins(:item2s).where(Item2.arel_table[:code].eq(value))
# ... and so on for all the filters
else
query
end
end
The problem
When I query by fields of two or more different subclasses of Item, ActiveRecord fails to generate a correct WHERE clause. It does not use the alias that it has assigned to the association in the JOIN clause.
Let's say I want to filter by Item1.name = 'i1' and Item2.name = 'i2', then what rails generates is this:
SELECT "parents".*
FROM "parents"
INNER JOIN "item_parents"
ON "item_parents"."parent_id" = "parents"."id"
INNER JOIN "items"
ON "items"."id" = "item_parents"."item_id"
AND "items"."item_type" = 'Item::Item1'
INNER JOIN "item_parents" "item_parents_parents_join"
ON "item_parents_parents_join"."parent_id" = "parents"."id"
INNER JOIN "items" "item2s_parents" -- OK. join has an alias
ON "item2s_parents"."id" = "item_parents_parents_join"."item_id"
AND "item2s_parents"."item_type" = 'Item::Item2'
WHERE "items"."name" = 'i1'
AND "items"."name" = 'i2' -- Wrong! Must be "item2s_parents"."name" = 'i2'
As a result, I have zero rows returned, because it's impossible to have an item with name equal to 'i1' AND 'i2' at the same time.
What I tried
It seemed to be a good idea to write a custom joins_item method that would dig into the query and check whether other joins had been called on it before (AR stores such information in query.values[:joins] and query.values[:left_outer_joins]). If there are any, it would return another Arel::Table instance with the correct alias. If nothing was joined before, then I don't need an alias and can return the default Arel::Table.
But then I found out that AR resolves aliases at the moment of building SQL. So even though I could guess the correct alias (or no alias) at the moment of joining it can change in the end. And this is actually what happens when you do left_outer_joins first and then joins. AR always places INNER JOINs before LEFT OUTER JOINs in the resulting SQL.
So the question is...
Is there a way to force AR to alias everything when I do joins or left_outer_joins with Arel, or any other more or less maintainable workaround/fix/monkey patch for this issue?
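For reference, the SQL shape being asked for (every join aliased, and each condition bound to its own alias) can be sketched without ActiveRecord at all. This is a hand-rolled illustration under assumptions, not something ActiveRecord provides: build_parent_query, the alias naming scheme and the two whitelists are all hypothetical, and the table and column names are taken from the generated SQL above.

```ruby
# Whitelists keep interpolated identifiers and operators safe;
# values themselves are passed as bind parameters.
ALLOWED_COLUMNS = %w[name code].freeze
ALLOWED_OPERATORS = ["=", "LIKE"].freeze

# filters: array of [item_type, column, operator, value]
# Returns [sql, binds] with one uniquely aliased join pair per filter.
def build_parent_query(filters)
  joins = []
  join_binds = []
  conditions = []
  where_binds = []

  filters.each_with_index do |(item_type, column, operator, value), i|
    raise ArgumentError, "unknown column #{column}" unless ALLOWED_COLUMNS.include?(column)
    raise ArgumentError, "unknown operator #{operator}" unless ALLOWED_OPERATORS.include?(operator)

    link = "item_parents_#{i}" # alias for the join table
    item = "items_#{i}"        # alias for the STI table
    joins << %(INNER JOIN "item_parents" "#{link}" ON "#{link}"."parent_id" = "parents"."id")
    joins << %(INNER JOIN "items" "#{item}" ON "#{item}"."id" = "#{link}"."item_id" AND "#{item}"."item_type" = ?)
    join_binds << item_type
    conditions << %("#{item}"."#{column}" #{operator} ?)
    where_binds << value
  end

  sql = [%(SELECT "parents".* FROM "parents"), *joins].join("\n")
  sql += "\nWHERE " + conditions.join("\n  AND ") unless conditions.empty?
  # Placeholders in the JOIN clauses come before those in WHERE.
  [sql, join_binds + where_binds]
end

sql, binds = build_parent_query([
  ["Item::Item1", "name", "=", "i1"],
  ["Item::Item2", "name", "=", "i2"]
])
puts sql
p binds
```

In a Rails application the resulting pair could presumably be passed along as Parent.find_by_sql([sql, *binds]), at the cost of losing relation chaining.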

Optimizing SQL query using JOIN instead of NOT IN

I have a sql query that I'd like to optimize. I'm not the designer of the database, so I have no way of altering structure, indexes or stored procedures.
I have a table that consists of invoices (called faktura) and each invoice has a unique invoice id. If we have to cancel the invoice a secondary invoice is created in the same table but with a field ("modpartfakturaid") referring to the original invoice id.
Example of faktura table:
invoice 1: Id=152549, modpartfakturaid=null
invoice 2: Id=152592, modpartfakturaid=152549
We also have a table called "BHLFORLINIE" which consists of services rendered to the customer. Some of the services have already been invoiced and match a record in the invoice (FAKTURA) table.
What I'd like to do is get a list of all services that either do not have an invoice yet or do not have an invoice that has been cancelled.
What I'm doing now is this:
`SELECT
dbo.BHLFORLINIE.LeveringsDato AS treatmentDate,
dbo.PatientView.Navn AS patientName,
dbo.PatientView.CPRNR AS patientCPR
FROM
dbo.BHLFORLINIE
INNER JOIN dbo.BHLFORLOEB
ON dbo.BHLFORLOEB.BhlForloebID = dbo.BHLFORLINIE.BhlForloebID
INNER JOIN dbo.PatientView
ON dbo.PatientView.PersonID = dbo.BHLFORLOEB.PersonID
INNER JOIN dbo.HENVISNING
ON dbo.HENVISNING.BhlForloebID = dbo.BHLFORLOEB.BhlForloebID
LEFT JOIN dbo.FAKTURA
ON dbo.BHLFORLINIE.FakturaId = FAKTURA.FakturaId
WHERE
(dbo.BHLFORLINIE.LeveringsDato >= '2017-01-01' OR dbo.BHLFORLINIE.FakturaId IS NULL) AND
dbo.BHLFORLINIE.ProduktNr IN (110,111,112,113,8050,4001,4002,4003,4004,4005,4006,4007,4008,4009,6001,6002,6003,6004,6005,6006,6007,6008,7001,7002,7003,7004,7005,7006,7007,7008) AND
((dbo.FAKTURA.FakturaType = 0 AND
dbo.FAKTURA.FakturaID NOT IN (
SELECT FAKTURA.ModpartFakturaID FROM FAKTURA WHERE FAKTURA.ModpartFakturaID IS NOT NULL
)) OR
dbo.FAKTURA.FakturaType IS NULL)
GROUP BY
dbo.PatientView.CPRNR,
dbo.PatientView.Navn,
dbo.BHLFORLINIE.LeveringsDato`
Is there a smarter way of doing this? Right now the query performs three times slower because of the added "not in" subquery.
Any help is much appreciated!
Peter
You can use an outer join and check for null values to find non-matches:
SELECT customer.name, i.invoiceId
FROM invoices i
INNER JOIN customer ON i.customerId = customer.customerId
LEFT OUTER JOIN invoices i2 ON i.invoiceId = i2.cancelInvoiceId
WHERE i2.invoiceId IS NULL
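The logic of that anti-join (keep an invoice only when no other invoice points back at it) can be checked on a small in-memory sample. This is a sketch in Ruby rather than SQL, assuming rows shaped like the faktura example in the question; the third invoice and the uncancelled helper are hypothetical additions for illustration.

```ruby
require "set"

# Sample invoices shaped like the faktura example in the question.
INVOICES = [
  { id: 152549, modpartfakturaid: nil },    # original invoice
  { id: 152592, modpartfakturaid: 152549 }, # cancels invoice 152549
  { id: 152600, modpartfakturaid: nil }     # hypothetical, never cancelled
].freeze

# Keep invoices that no other invoice references via modpartfakturaid.
# This mirrors LEFT JOIN ... WHERE i2.invoiceId IS NULL.
def uncancelled(invoices)
  cancelled_ids = invoices.map { |inv| inv[:modpartfakturaid] }.compact.to_set
  invoices.reject { |inv| cancelled_ids.include?(inv[:id]) }
end

p uncancelled(INVOICES).map { |inv| inv[:id] }
```

Note that the cancelling invoice itself survives this filter; the original query's FakturaType condition would be a separate, additional filter.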

Unusual Joins SQL

I am having to convert code written by a former employee to work in a new database. In doing so I came across some joins I have never seen and do not fully understand how they work or if there is a need for them to be done in this fashion.
The joins look like this:
From Table A
Join(Table B
Join Table C
on B.Field1 = C.Field1)
On A.Field1 = B.Field1
Does this code function differently from something like this:
From Table A
Join Table B
On A.Field1 = B.Field1
Join Table C
On B.Field1 = C.Field1
If there is a difference please explain the purpose of the first set of code.
All of this is done in SQL Server 2012. Thanks in advance for any help you can provide.
I could create a temp table and then join that. But why use up the cycles\RAM on additional storage and indexes if I can just do it on the fly?
I ran across this scenario today in SSRS - a user wanted to see all the Individuals granted access through an AD group. The user was using a cursor and some temp tables to get the users out of AD and then joining the user to each SSRS object (Folders, reports, linked reports) associated with the AD group. I simplified the whole thing with Cross Apply and a sub query.
GroupMembers table
GroupName
UserID
UserName
AccountType
AccountTypeDesc
SSRSOjbects_Permissions table
Path
PathType
RoleName
RoleDesc
Name (AD group name)
The query needs to return each individual in an AD group associated with each report. Basically a Cartesian product of users to reports within a subset of data. The easiest way to do this looks like this:
select
G.GroupName, G.UserID, G.UserName, G.AccountType, G.AccountTypeDesc,
[Path], PathType, RoleName, RoleDesc
from
GroupMembers G
cross apply
(select
[Path], PathType, RoleName, RoleDesc
from
SSRSOjbects_Permissions
where
Name = G.GroupName) S;
You could achieve this with a temp table and some outer joins, but why waste system resources?
I have seen this kind of join before: it is the MS Access style of handling multi-table joins. In MS Access you need to nest each subsequent join in its own level of parentheses. So, for example, this T-SQL join:
SELECT a.columna, b.columnb, c.columnc
FROM tablea AS a
LEFT JOIN tableb AS b ON a.id = b.id
LEFT JOIN tablec AS c ON a.id = c.id
you should convert to this:
SELECT a.columna, b.columnb, c.columnc
FROM ((tablea AS a) LEFT JOIN tableb AS b ON a.id = b.id) LEFT JOIN tablec AS c ON a.id = c.id
So, yes, I believe you are right in your assumption
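On whether the two forms in the question behave differently: when every join is an inner join, as in the question, both forms return the same rows. The nesting only changes the result once an outer join is involved. The following in-memory sketch in Ruby illustrates that second case; the inner_join and left_join helpers and the sample rows are hypothetical stand-ins for the SQL operators, not database code.

```ruby
# Rows are hashes; the join condition is passed as a block.
def inner_join(left, right, &on)
  left.flat_map do |l|
    right.select { |r| on.call(l, r) }.map { |r| l.merge(r) }
  end
end

# Like inner_join, but unmatched left rows are kept and padded with null_row.
def left_join(left, right, null_row, &on)
  left.flat_map do |l|
    matches = right.select { |r| on.call(l, r) }
    matches.empty? ? [l.merge(null_row)] : matches.map { |r| l.merge(r) }
  end
end

A = [{ a: 1 }, { a: 2 }].freeze
B = [{ b: 1 }, { b: 2 }].freeze
C = [{ c: 1 }].freeze

# A LEFT JOIN (B JOIN C ON b = c) ON a = b
def nested_form
  b_and_c = inner_join(B, C) { |l, r| l[:b] == r[:c] }
  left_join(A, b_and_c, { b: nil, c: nil }) { |l, r| l[:a] == r[:b] }
end

# A LEFT JOIN B ON a = b JOIN C ON b = c
def flat_form
  a_and_b = left_join(A, B, { b: nil }) { |l, r| l[:a] == r[:b] }
  inner_join(a_and_b, C) { |l, r| l[:b] == r[:c] }
end

p nested_form # keeps the row for a = 2, padded with nils
p flat_form   # the later inner join drops the row for a = 2
```

So the parenthesized form is mainly useful when an inner join between B and C should be evaluated as a unit before being outer joined to A.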

How to build inner join in Rails with conditions?

I have a model StockUpdate which keeps track of the stock of every product for a store. The table attributes are: :product_id, :stock, :store_id. I was trying to find the last entry for every product for a given store. I built the query below in PGAdmin and it works fine. I'm new to Rails and I don't know how to represent it in the model. Please help.
SELECT a.*
FROM stock_updates a
INNER JOIN
(
SELECT product_id, MAX(id) max_id
FROM stock_updates where store_id = 9 and stock > 0
GROUP BY product_id
) b ON a.product_id = b.product_id AND
a.id = b.max_id
I do not clearly understand what you want to do, but I think you can do something like this:
class StockUpdate < ActiveRecord::Base
scope :a_good_name, -> { joins(:product).where('store_id = ? and stock > ?', 9, 0) }
end
You can also call StockUpdate.a_good_name.explain to check the generated SQL.
What you need is really simple and can be easily accomplished with 2 queries. Otherwise it becomes very complicated in a single query (it's still doable though):
store_ids = [0, 9]
latest_stock_update_ids = StockUpdate.
where(store_id: store_ids).
group(:product_id).
maximum(:id).
values
StockUpdate.where(id: latest_stock_update_ids)
Two queries, without any joins necessary. The same could be possible with a single query too. But like your original code, it would include subqueries.
Something like this should work:
StockUpdate.
where(store_id: store_ids).
where("stock_updates.id = (
SELECT MAX(su.id) FROM stock_updates AS su WHERE (
su.product_id = stock_updates.product_id
)
)
")
Or perhaps:
StockUpdate.where("id IN (
SELECT MAX(su.id) FROM stock_updates AS su GROUP BY su.product_id
)")
And to answer your original question, you can manually specify a joins like so:
Model1.joins("INNER JOIN #{Model2.table_name} ON #{conditions}")
# That INNER JOIN can also be LEFT OUTER JOIN, etc.
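When comparing any of these variants against the SQL in the question, it helps to have the expected result spelled out. Note that the original subquery filters on both store_id = 9 and stock > 0 before taking MAX(id). A plain Ruby sketch of that same "latest row per product" logic, on hypothetical sample rows and with a hypothetical latest_in_stock helper, might look like this:

```ruby
# Sample rows; all values are hypothetical.
STOCK_UPDATES = [
  { id: 1, product_id: 10, store_id: 9, stock: 5 },
  { id: 2, product_id: 10, store_id: 9, stock: 3 },
  { id: 3, product_id: 10, store_id: 9, stock: 0 }, # excluded: stock is 0
  { id: 4, product_id: 11, store_id: 9, stock: 7 },
  { id: 5, product_id: 11, store_id: 4, stock: 9 }  # excluded: other store
].freeze

# Filter first (store and stock), then keep the highest id per product,
# which is what the GROUP BY product_id / MAX(id) subquery does.
def latest_in_stock(rows, store_id)
  rows
    .select { |r| r[:store_id] == store_id && r[:stock] > 0 }
    .group_by { |r| r[:product_id] }
    .map { |_product_id, group| group.max_by { |r| r[:id] } }
end

p latest_in_stock(STOCK_UPDATES, 9).map { |r| r[:id] }
```

An ActiveRecord version that skips the stock condition, or that computes the maximum across all stores, would return different ids for this sample.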

ASP.NET MVC & EF4 Entity Framework - Are there any performance concerns in using the entities vs retrieving only the fields i need?

Lets say we have 3 tables, Users, Products, Purchases.
There is a view that needs to display the purchases made by a user.
I could lookup the data required by doing:
from p in DBSet<Purchases>.Include("User").Include("Product") select p;
However, I am concerned that this may have a performance impact because it will retrieve the full objects.
Alternatively, I could select only the fields I need:
from p in DBSet<Purchases>.Include("User").Include("Product") select new SimplePurchaseInfo() { UserName = p.User.name, Userid = p.User.Id, ProductName = p.Product.Name ... etc };
So my question is:
What's the best practice for doing this?
== EDIT
Thanks for all the replies.
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Real example: User reviews Products
var query = from dr in productRepository.FindAllReviews()
where dr.User.UserId == "userid"
select dr;
string sql = ((ObjectQuery)query).ToTraceString();
SELECT [Extent1].[ProductId] AS [ProductId],
[Extent1].[Comment] AS [Comment],
[Extent1].[CreatedTime] AS [CreatedTime],
[Extent1].[Id] AS [Id],
[Extent1].[Rating] AS [Rating],
[Extent1].[UserId] AS [UserId],
[Extent3].[CreatedTime] AS [CreatedTime1],
[Extent3].[CreatorId] AS [CreatorId],
[Extent3].[Description] AS [Description],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Price] AS [Price],
[Extent3].[Rating] AS [Rating1],
[Extent3].[ShopId] AS [ShopId],
[Extent3].[Thumbnail] AS [Thumbnail],
[Extent3].[Creator_UserId] AS [Creator_UserId],
[Extent4].[Comment] AS [Comment1],
[Extent4].[DateCreated] AS [DateCreated],
[Extent4].[DateLastActivity] AS [DateLastActivity],
[Extent4].[DateLastLogin] AS [DateLastLogin],
[Extent4].[DateLastPasswordChange] AS [DateLastPasswordChange],
[Extent4].[Email] AS [Email],
[Extent4].[Enabled] AS [Enabled],
[Extent4].[PasswordHash] AS [PasswordHash],
[Extent4].[PasswordSalt] AS [PasswordSalt],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent4].[UserId] AS [UserId1],
[Extent4].[UserName] AS [UserName]
FROM [ProductReviews] AS [Extent1]
INNER JOIN [Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
LEFT OUTER JOIN [Products] AS [Extent3] ON [Extent1].[ProductId] = [Extent3].[Id]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent1].[UserId] = [Extent4].[UserId]
WHERE N'615005822' = [Extent2].[UserId]
or
from d in productRepository.FindAllProducts()
from dr in d.ProductReviews
where dr.User.UserId == "userid"
orderby dr.CreatedTime
select new ProductReviewInfo()
{
product = new SimpleProductInfo() { Id = d.Id, Name = d.Name, Thumbnail = d.Thumbnail, Rating = d.Rating },
Rating = dr.Rating,
Comment = dr.Comment,
UserId = dr.UserId,
UserScreenName = dr.User.ScreenName,
UserThumbnail = dr.User.Thumbnail,
CreateTime = dr.CreatedTime
};
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Thumbnail] AS [Thumbnail],
[Extent1].[Rating] AS [Rating],
[Extent2].[Rating] AS [Rating1],
[Extent2].[Comment] AS [Comment],
[Extent2].[UserId] AS [UserId],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent2].[CreatedTime] AS [CreatedTime]
FROM [Products] AS [Extent1]
INNER JOIN [ProductReviews] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [Users] AS [Extent3] ON [Extent2].[UserId] = [Extent3].[UserId]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent2].[UserId] = [Extent4].[UserId]
WHERE N'userid' = [Extent3].[UserId]
ORDER BY [Extent2].[CreatedTime] ASC
[QUESTION 2]: What's with the ugly outer joins?
In general, retrieve only what you need, but retrieve enough that your application is not too chatty. If you can batch a bunch of things together, do so; otherwise you'll pay the network traffic cost every time you go back to the database to retrieve more.
In this case, assuming you will only need those info, I would go with the second approach (if that's what you really need).
Eager loading with .Include doesn't really play nice when you want filtering (or ordering for that matter).
That first query is basically this:
select p.*, u.*, p2.*
from products p
left outer join users u on p.userid = u.userid
left outer join purchases p2 on p.productid = p2.productid
where u.userid = @p1
Is that really what you want?
There is a view that needs to display the purchases made by a user.
Well then why are you including "Product"?
Shouldn't it just be:
from p in DBSet<Purchases>.Include("User") select p;
Your second query will error. You must project to an entity on the model, or an anonymous type - not a random class/DTO.
To be honest, the easiest and most well performing option in your current scenario is to query on the FK itself:
var purchasesForUser = DBSet<Purchases>.Where(x => x.UserId == userId);
That should produce:
select p.*
from purchases p
where p.UserId = @p1
The above query of course requires you to include the foreign keys in the model.
If you don't have the FK's in your model, then you'll need more LINQ-Entities trickery in the form of anonymous type projection.
Overall, don't go out looking to optimize. Create queries which align with the scenario/business requirement, then optimize if necessary - or look for alternatives to LINQ-Entities, such as stored procedures, views or compiled queries.
Remember: premature optimization is the root of all evil.
EDIT - In response to Question Update
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Yes - ViewModels should only contain what is required for that View. Otherwise why have the ViewModel? You may as well bind straight to the EF model. So, set up the ViewModel with only the fields it needs for the view.
[QUESTION 2]: What's with the ugly outer joins?
That is default behaviour for .Include. .Include always produces a left outer join.
I think the second query will throw an exception because you can't map the result to an unmapped .NET type in LINQ to Entities. You have to return an anonymous type and map it to your object in LINQ to Objects, or you have to use some advanced concepts for projections - QueryView (projections in ESQL) or DefiningQuery (custom SQL query mapped to a new read-only entity).
Generally it is more about the design of your entities. If you select a single small entity, there is not a big difference between loading it all and projecting. If you are selecting a list of entities you should consider projections - especially if the tables contain columns like nvarchar(max) or varbinary(max) which are not needed in your result!
Both create almost the same query: select from one table, with two inner joins. The only thing that changes from a database perspective is the amount of fields returned, but that shouldn't really matter that much.
I think DRY wins over the performance hit here (if it even exists), so my call is to go for the first option.
