How do query expression joins depend on the order of keys? - f#

In the documentation for query expressions, I found:
Note that the order of the keys around the = sign in a join expression is significant.
I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.
Can anyone either explain or point me to some better documentation?

This is important for joins. For example, if you look at the sample for leftOuterJoin:
query {
    for student in db.Student do
    leftOuterJoin selection in db.CourseSelection on
                  (student.StudentID = selection.StudentID) into result
    for selection in result.DefaultIfEmpty() do
    select (student, selection)
}
The order determines what happens when "missing" values occur. The key is this line in the docs:
If any group is empty, a group with a single default value is used instead.
With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This would mean that, in the above, if you switched the order, any students without a course selection would have no representation in the results, where the current order always shows every student.
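The "every outer element is represented" behaviour can be sketched outside F#. The following Python is only an illustration, not how the query builder works: `left_outer` and the sample rows are hypothetical, and it assumes that reversing the order amounts to swapping which collection plays the outer role.

```python
# Hypothetical sample data standing in for db.Student and db.CourseSelection.
students = [{"StudentID": 1, "Name": "Ann"}, {"StudentID": 2, "Name": "Bob"}]
selections = [{"StudentID": 1, "Course": "Math"}, {"StudentID": 3, "Course": "Art"}]


def left_outer(outer, inner, key):
    """Pair each outer row with its matching inner rows, or with None if there are none."""
    rows = []
    for o in outer:
        group = [i for i in inner if i[key] == o[key]]
        # An empty group is replaced by a group holding a single default value.
        for i in group or [None]:
            rows.append((o, i))
    return rows


# Students as the outer source: Bob appears even though he has no selection.
students_first = left_outer(students, selections, "StudentID")
# Selections as the outer source: the "Art" selection appears, Bob does not.
selections_first = left_outer(selections, students, "StudentID")
```

With students on the outer side, the student without a course selection is paired with `None`; with selections on the outer side, that student is absent and the unmatched selection is paired with `None` instead.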

The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:
static member Join<'TOuter, 'TInner, 'TKey, 'TResult> :
    outer:IQueryable<'TOuter> *
    inner:IEnumerable<'TInner> *
    outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
    innerKeySelector:Expression<Func<'TInner, 'TKey>> *
    resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>
So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.
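To make the shape of that signature concrete, here is a rough Python analogue of an equality join built from two key selectors. It is a sketch under assumptions, not the LINQ implementation: the function name `join`, the hash-index strategy, and the sample data are all made up for illustration.

```python
from collections import defaultdict


def join(outer, inner, outer_key_selector, inner_key_selector, result_selector):
    """Equi-join: each key selector only ever receives an element from one source."""
    index = defaultdict(list)
    for i in inner:
        # The inner key is computed from the inner element alone.
        index[inner_key_selector(i)].append(i)
    return [
        result_selector(o, i)
        for o in outer
        # The outer key is computed from the outer element alone.
        for i in index.get(outer_key_selector(o), [])
    ]


students = [(1, "Ann"), (2, "Bob")]            # (StudentID, Name)
selections = [(1, "Math"), (1, "Physics")]     # (StudentID, Course)

pairs = join(
    students,
    selections,
    lambda s: s[0],
    lambda c: c[0],
    lambda s, c: (s[1], c[1]),
)
```

Because the two selectors are separate functions of one argument each, there is no place to put a condition that mixes both sides, which is why only "projection of outer equals projection of inner" can be expressed.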

Related

Kdb+/q: How to bulk insert into a KDB+ table with an index?

I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However, it raises an error at the sym column:
'sym
[0] `depths insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
I have no idea what I could be doing wrong here, but the error seems to be independent of the values: it is always raised on the last column, irrespective of the value provided.
Could someone please advise how I should go about inserting bulk records into kdb+ with a time index as depicted above?
Thanks
In your original insert statement, you had spaces between `sym `sym, `exch `exch and `buy `sell. The spaces between the symbols make each of these an apply or index operation instead of the list you want.
Additionally, because you have specified your qty and price columns as float, you have to supply the numbers as floats when inserting into the trades table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the schema for the qty column to int/long, as quantity generally does not require decimal points.
Hope this helps!
Daniel is on the money. To expand on his answer, q collates space-separated values into a single list only for numeric values, and even then the type suffix may be present only on the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, those learning kdb often encounter type errors when appending to tables. The problem in this case is that kdb does not promote a list of homogeneous atoms to a wider type (which is expected behaviour). The following is a useful little lambda that tells you where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh: where not m[`t] ~' rt:.Q.ty each rows; #[flip;;enlist] `item`currType`expectedType!(m[`c] wh;rt wh; m[`t] wh)}
q)insertTest[trades;rows]
item  currType expectedType
---------------------------
qty   j        f
price j        f
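For readers who do not read q, the idea behind the lambda (compare each column's actual type against the schema and report the mismatches) can be sketched in Python. This is only an analogue: `insert_test`, the use of Python types in place of kdb type characters, and the shortened schema are assumptions made for illustration.

```python
def insert_test(schema, columns):
    """Return (item, current types, expected type) for every mismatching column."""
    mismatches = []
    for (name, expected), values in zip(schema.items(), columns):
        # Collect the distinct type names actually present in this column.
        current = sorted({type(v).__name__ for v in values})
        if current != [expected.__name__]:
            mismatches.append((name, current, expected.__name__))
    return mismatches


# Hypothetical, shortened schema: column name -> expected Python type.
schema = {"side": str, "qty": float, "price": float}

bad_rows = [["buy", "sell"], [10, 10], [10, 10]]            # ints where floats are expected
good_rows = [["buy", "sell"], [10.0, 10.0], [10.0, 10.0]]

report = insert_test(schema, bad_rows)
```

Here `report` flags `qty` and `price`, mirroring the q output above, while the float version of the same rows produces no mismatches.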

Query against a Postgres array column type

TL;DR I'm wondering what the pros and cons are of using @> {as_champion, whatever} versus IN ('as_champion', 'whatever'), or whether they are even equivalent. Details below:
I'm working with Rails and using Postgres' array column type, but having to use raw sql for my query as the Rails finder methods don't play nicely with it. I found a way that works, but wondering what the preferred method is:
The roles column on the Memberships table is my array column. It was added via rails as so:
add_column :memberships, :roles, :text, array: true
When I examine the table, it shows the type as: text[] (not sure if that is truly how Postgres represents an array column or if that is Rails shenanigans).
To query against it I do something like:
Membership.where("roles @> ?", '{as_champion, whatever}')
From the fine Array Operators manual:
Operator: @>
Description: contains
Example: ARRAY[1,4,3] @> ARRAY[3,1]
Result: t (AKA true)
So @> treats its operand arrays as sets and checks if the right side is a subset of the left side.
IN is a little different and is used with subqueries:
9.22.2. IN
expression IN (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the case where the subquery returns no rows).
or with literal lists:
9.23.1. IN
expression IN (value [, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is "true" if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for
expression = value1
OR
expression = value2
OR
...
So a IN b more or less means:
Is the value a equal to any of the values in the list b (which can be a query producing single element rows or a literal list).
Of course, you can say things like:
array[1] in (select some_array from ...)
array[1] in (array[1], array[2,3])
but the arrays in those cases are still treated like single values (that just happen to have some internal structure).
If you want to check if an array contains any of a list of values then @> isn't what you want. Consider this:
array[1,2] @> array[2,4]
4 isn't in array[1,2] so array[2,4] is not a subset of array[1,2].
If you want to check if someone has both roles then:
roles @> array['as_champion', 'whatever']
is the right expression but if you want to check if roles is any of those values then you want the overlaps operator (&&):
roles && array['as_champion', 'whatever']
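The difference between the contains and overlaps operators can be modelled with sets. This Python sketch is only an analogy for the semantics described above; the helper names `contains` and `overlaps` are invented here and nothing in it talks to Postgres.

```python
def contains(left, right):
    """Analogue of the contains operator: every element of right appears in left."""
    return set(right) <= set(left)


def overlaps(left, right):
    """Analogue of the overlaps operator: the arrays share at least one element."""
    return not set(left).isdisjoint(right)


roles = ["as_champion", "member"]           # hypothetical roles for one membership
wanted = ["as_champion", "whatever"]

has_all = contains(roles, wanted)    # False: 'whatever' is missing from roles
has_any = overlaps(roles, wanted)    # True: 'as_champion' is shared
```

So "has both roles" corresponds to the subset test, and "has any of these roles" corresponds to a non-empty intersection.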
Note that I'm using the "array constructor" syntax for the arrays everywhere, that's because it is much more convenient for working with a tool (such as ActiveRecord) that knows to expand an array into a comma delimited list when replacing a placeholder but doesn't fully understand SQL arrays.
Given all that, we can say things like:
Membership.where('roles @> array[?]', %w[as_champion whatever])
Membership.where('roles @> array[:roles]', :roles => some_ruby_array_of_strings)
and everything will work as expected. You're still working with little SQL snippets (as ActiveRecord doesn't have a full understanding of SQL arrays or any way of representing the @> operator) but at least you won't have to worry about quoting problems. You could probably go through AREL to manually add @> support but I find that AREL quickly devolves into an incomprehensible and unreadable mess for all but the most trivial uses.

Auto-assigning objects to users based on priority in Postgres/Ruby on Rails

I'm building a rails app for managing a queue of work items. I have several types of users ("access levels") to whom I want to auto-assign these work items.
The end goal is an "Auto-assign" button on one of my views that will automatically grab the next work item based on a priority, which is defined by the user's access level.
I'm trying to set up a class method in my work_item model to automatically sort work items by type based on the user's access level. I am looking at something like this:
def self.auto_assign_next(access_level)
  case
  when access_level == 2
    # Oldest request first
    where("completed = 'f'").order("requested_time ASC").limit(1)
  when access_level > 2
    # Fixed precedence by form type
    where("completed = 'f'").order("CASE WHEN form='supervisor' THEN 1 WHEN form='installer' THEN 2 WHEN form='repair' THEN 3 WHEN form='mail' THEN 4 WHEN form='hp' THEN 5 ELSE 6 END").limit(1)
  end
end
This isn't very DRY, though. Ideally I'd like the sort order to be configurable by administrators, so maybe setting up a separate table on which the sort order is kept would be best. The problem with that idea is that I have no idea how to pass the priority order on that table to the [postgre]SQL query. I'm new to SQL in general and somewhat lost with this one. Does anybody have any suggestions as to how this should be handled?
One fairly simple approach starts with turning your case statement into a new table, listing form values versus what precedence value they should be sorted by:
id | form       | precedence
---+------------+-----------
1  | supervisor | 1
2  | installer  | 2
(etc)
Create a model for this, say, FormPrecedences (not a great name, but I don't totally grok your data model so pick one that better describes it). Then, your query can look like this (note: I'm assuming your current model is called WorkItems):
when access_level > 2
joins("LEFT JOIN form_precedences ON form_precedences.form = work_items.form")
.where("completed = 'f'")
.order("COALESCE(form_precedences.precedence, 6)")
.limit(1)
The way this works isn't as complicated as it looks. A "left join" in SQL simply takes all the rows of the table on the left (in this case, work_items) and, for each row, finds all the matching rows from the table on the right (form_precedences, where "matching" is defined by the bit after the "ON" keyword: form_precedences.form = work_items.form), and emits one combined row. If no match is found, a LEFT JOIN will still emit a row, but with all the right-hand values being NULL. A normal join would skip any rows with no right-hand match found.
Anyway, with the precedence data joined on to our work items, we can just sort by the precedence value. But, in case no match was found during the join above, that value will be NULL -- so, I use COALESCE (which returns the first of its arguments that's not NULL) to default to a precedence of 6.
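The LEFT JOIN plus COALESCE ordering can be tried end to end with an in-memory SQLite database. This is a sketch, not the Rails/Postgres setup from the question: the table contents are invented, `completed` is stored as 0/1 rather than 'f'/'t', and `sort_key` is just an alias chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_items (id INTEGER PRIMARY KEY, form TEXT, completed INTEGER);
    CREATE TABLE form_precedences (id INTEGER PRIMARY KEY, form TEXT, precedence INTEGER);

    -- Hypothetical data: 'unknown' has no row in form_precedences.
    INSERT INTO work_items (id, form, completed) VALUES
        (1, 'mail', 0), (2, 'unknown', 0), (3, 'supervisor', 0), (4, 'installer', 1);
    INSERT INTO form_precedences (form, precedence) VALUES
        ('supervisor', 1), ('installer', 2), ('repair', 3), ('mail', 4), ('hp', 5);
""")

rows = conn.execute("""
    SELECT work_items.id,
           work_items.form,
           COALESCE(form_precedences.precedence, 6) AS sort_key
    FROM work_items
    LEFT JOIN form_precedences ON form_precedences.form = work_items.form
    WHERE work_items.completed = 0
    ORDER BY sort_key, work_items.id
""").fetchall()

next_item = rows[0]  # what LIMIT 1 would return
conn.close()
```

The work item whose form has no precedence row still appears, sorted last with the default of 6, and changing the order only requires editing rows in `form_precedences`.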
Hope that helps!

Join 2 tables in Hive using a phone number and a prefix (variable length)

I'm trying to match phone numbers to an area using Hive.
I've got a table (prefmap) that maps a number prefix (prefix) to an area (area) and another table (users) with a list of phone numbers (nb).
There is only 1 match per phone number (no sub-area)
The problem is that the length of the prefixes is not fixed so I cannot use the UDF function substr(nb,"prefix's length") in the JOIN's ON() condition to match the substring of a number to a prefix.
And when I try to use instr() to find if a number has a matching prefix:
SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (instr(prefmap.prefix,users.nb)=1)
I get an error on line 4: "Both left and right aliases encountered in Join '1'"
How could I get this to work?
I'm using hive 0.9
Thanks for any advice.
Probably not the best solution, but at least it does the job:
use WHERE to define the matching condition instead of ON() (which is now forced to TRUE):
select users.nb, prefmap.area
from users
LEFT OUTER JOIN prefmap
ON(true)
WHERE instr(users.nb,prefmap.prefix)=1
It's not perfect, as it's a bit slow. For each phone number it creates as many temporary (useless) rows as there are entries in the prefix table, before the WHERE condition keeps the single matching one. So it's better to use this only if the prefix table is not too long.
Can anyone think of a better way to do this?
Hive cannot convert (instr(prefmap.prefix,users.nb)=1) into a MapReduce job, because Hive joins support only equality expressions. See the Hive joins wiki for more information.
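The cost of the ON(true) plus WHERE workaround is easy to see in a small model. This Python sketch is not Hive: the prefixes, areas and phone numbers are invented, and `str.startswith` stands in for `instr(users.nb, prefmap.prefix) = 1`.

```python
from itertools import product

# Hypothetical data: (prefix, area) pairs and a list of phone numbers.
prefmap = [("331", "Paris"), ("3349", "Marseille"), ("44", "UK")]
users = ["33149000000", "44770000000", "99900000000"]

# ON(true): every number is paired with every prefix (a full cross product).
candidates = list(product(users, prefmap))

# WHERE instr(nb, prefix) = 1: keep only pairs where the number starts with the prefix.
matches = [(nb, area) for nb, (prefix, area) in candidates if nb.startswith(prefix)]
```

Three numbers and three prefixes already produce nine intermediate pairs for two matches, so the intermediate result grows with the product of the two table sizes. Note also that in this model the number with no matching prefix is dropped by the filter rather than kept with an empty area.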

How do I use TADOQuery.Parameters with integer parameter types that have to be put in two or more places in a query?

I have a complex query that contains more than one place where the same primary key value must be substituted. It looks like this:
select Foo.Id,
Foo.BearBaitId,
Foo.LinkType,
Foo.BugId,
Foo.GooNum,
Foo.WorkOrderId,
(case when Goo.ZenID is null or Goo.ZenID=0 then
IsNull(dbo.EmptyToNull(Bar.FanName),dbo.EmptyToNull(Bar.BazName))+' '+Bar.Strength else
'#'+BarZen.Description end) as Description,
Foo.Init,
Foo.DateCreated,
Foo.DateChanged,
Bug.LastName,
Bug.FirstName,
Goo.BarID,
(case when Goo.ZenID is null or Goo.ZenID=0 then
IsNull(dbo.EmptyToNull(Bar.BazName),dbo.EmptyToNull(Bar.FanName))+' '+Bar.Strength else
'#'+BarZen.Description end) as BazName,
GooTracking.Status as GooTrackingStatus
from
Foo
inner join Bug on (Foo.BugId=Bug.Id)
inner join Goo on (Foo.GooNum=Goo.GooNum)
left join Bar on (Bar.Id=Goo.BarID)
left join BarZen on (Goo.ZenID=BarZen.ID)
inner join GooTracking on(Goo.GooNum=GooTracking.GooNum )
where (BearBaitId = :aBaitid)
UNION
select Foo.Id,
Foo.BearBaitId,
Foo.LinkType,
Foo.BugId,
Foo.GooNum,
Foo.WorkOrderId,
Foo.Description,
Foo.Init,
Foo.DateCreated,
Foo.DateChanged,
Bug.LastName,
Bug.FirstName,
0,
NULL,
0
from Foo
inner join Bug on (Foo.BugId=Bug.Id)
where (LinkType=0) and (BearBaitId= :aBaitid )
order by BearBaitId,LinkType desc, GooNum
I cannot get an integer parameter to work in this non-trivial query. I get this error:
Error
Incorrect syntax near ':'.
The query works fine if I take out the :aBaitid and substitute a literal 1.
Is there something else I can do to this query above? When I test with simple tests like this:
select * from foo where id = :anid
These simple cases work fine. The component is TADOQuery, and it works fine until you add any :parameters to the SQL string.
Update: when I use the following code at runtime, the parameter substitutions are actually done (some glitch in the ADO components is worked around) and a different error surfaces:
adoFooContentQuery.Parameters.FindParam('aBaitId').Value := 1;
adoFooContentQuery.Active := true;
Now the error changes to:
Incorrect syntax near the keyword 'inner'.
Note again, that this error goes away if I simply stop using the parameter substitution feature.
Update2: The accepted answer suggests I have to find two different copies of the parameter with the same name, which bothered me so I reworked the query like this:
DECLARE @aVar int;
SET @aVar = :aBaitid;
SELECT ....(long query here)
Then I used @aVar throughout the script where needed, to avoid the repeated use of :aBaitId. (If the number of times the parameter value is used changes, I don't want to have to find all parameters matching a name, and replace them).
I suppose a helper-function like this would be fine too: SetAllParamsNamed(aQuery:TAdoQuery; aName:String;aValue:Variant)
FindParam only finds one parameter, while you have two with the same name. Delphi dataset adds each parameter as a separate one to its collection of parameters.
It should work if you loop through all parameters, check if the name matches, and set the value of each one that matches, although I normally choose to give each repeated parameter a numeric suffix to distinguish between them.
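The loop described above is short enough to sketch. The following is Python rather than Delphi and does not use the ADO components: the `Parameter` class and `set_all_params_named` are hypothetical stand-ins for the dataset's parameter collection and the helper proposed in the question, and the case-insensitive name comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Parameter:
    """Hypothetical stand-in for one entry in a query's parameter collection."""
    name: str
    value: Any = None


def set_all_params_named(parameters: List[Parameter], name: str, value: Any) -> int:
    """Set every parameter whose name matches; return how many were set."""
    count = 0
    for param in parameters:
        # Assumption: names are compared case-insensitively.
        if param.name.lower() == name.lower():
            param.value = value
            count += 1
    return count


# The same placeholder used twice yields two separate parameter entries.
params = [Parameter("aBaitid"), Parameter("other"), Parameter("aBaitid")]
updated = set_all_params_named(params, "aBaitId", 1)
```

Returning the number of parameters set makes it easy to detect a misspelled name, since a result of zero means nothing matched.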
