When I run a query with GROUP BY in Google BigQuery, it fails with this error:
Cannot GROUP BY field references from SELECT list alias xxx
I've tried many times to work out the rule behind it, without success.
Here is my investigation:
a> Create tables and insert values
CREATE TABLE FFNR_A (A1 INT NOT NULL, A2 INT NOT NULL, A3 INT NOT NULL);
CREATE TABLE FFNR_B (B1 INT NOT NULL, B2 INT NOT NULL, B3 INT NOT NULL, B4 INT NOT NULL);
INSERT INTO FFNR_A VALUES (0, 3, 1);
INSERT INTO FFNR_A VALUES (1, 0, 2);
INSERT INTO FFNR_A VALUES (2, 1, 1);
INSERT INTO FFNR_A VALUES (3, 2, 2);
INSERT INTO FFNR_A VALUES (5, 3, 0);
INSERT INTO FFNR_A VALUES (6, 3, 2);
INSERT INTO FFNR_A VALUES (7, 4, 1);
INSERT INTO FFNR_A VALUES (8, 4, 3);
INSERT INTO FFNR_B VALUES (1, 1, 2, 0);
INSERT INTO FFNR_B VALUES (2, 2, 3, 0);
INSERT INTO FFNR_B VALUES (3, 2, 4, 0);
INSERT INTO FFNR_B VALUES (4, 1, 5, 0);
INSERT INTO FFNR_B VALUES (5, 7, 0, 0);
INSERT INTO FFNR_B VALUES (6, 8, 2, 0);
INSERT INTO FFNR_B VALUES (7, 7, 1, 0);
INSERT INTO FFNR_B VALUES (8, 8, 3, 0);
INSERT INTO FFNR_B VALUES (0, 1, 3, 0);
b> Run Query
-- Cannot GROUP BY field references from SELECT list alias B1 at [3:60]
SELECT A0.`A1`, B1.`B1`,
FROM `xxx`.`FFNR_B` B1, `xxx`.`FFNR_A` A0
WHERE (A0.`A2` = B1.`B1`) AND (A0.`A2` = B1.`B1`) GROUP BY B1.`B1`,
A0.`A1` limit 2;
-- Works
SELECT A0.`A1`, B1.`B2`,
FROM `xxx`.`FFNR_B` B1, `xxx`.`FFNR_A` A0
WHERE (A0.`A2` = B1.`B1`) AND (A0.`A2` = B1.`B1`) GROUP BY B1.`B2`,
A0.`A1` limit 2;
-- Rename the table alias B1 -> A1 and select column A2 instead of A1
-- (if the table alias is B1 here, this fails too)
SELECT A0.`A2`, A1.`B1`,
FROM `xxx`.`FFNR_B` A1, `xxx`.`FFNR_A` A0
WHERE (A0.`A1` = A1.`B1`) AND (A0.`A1` = A1.`B1`) GROUP BY A1.`B1`,
A0.`A2` limit 2;
I couldn't find anything about this in the BigQuery docs.
Can you explain the rules for GROUP BY here, or is this a bug in BigQuery?
Thanks
Concluding the discussion from the comments:
-- Cannot GROUP BY field references from SELECT list alias B1 at [3:60]
SELECT A0.`A1`, B1.`B1`, FROM `xxx`.`FFNR_B` B1, `xxx`.`FFNR_A` A0
WHERE (A0.`A2` = B1.`B1`) AND (A0.`A2` = B1.`B1`) GROUP BY B1.`B1`, A0.`A1` limit 2;
As Mikhail Berlyant said, in your query the GROUP BY gets confused because B1 is both a table alias and a column name in the output. A table alias should not be the same as a column name referenced in GROUP BY. To avoid the issue, use table aliases that differ from your column names, or simply use GROUP BY B1, A1 or GROUP BY 1, 2. This is a limitation in BigQuery, not a bug. Refer to the BigQuery documentation on groupable data types for more information.
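BigQuery can't be run offline, so here is a minimal sketch that uses SQLite (Python's built-in sqlite3 module) purely as a stand-in to show what the suggested workaround returns on the sample data. The tables get aliases a and b that don't collide with any column name, the duplicated join condition is dropped, and ORDER BY replaces LIMIT 2 so the output is deterministic. SQLite does not enforce BigQuery's alias rule, so this illustrates the query's result, not the error itself.

```python
import sqlite3


def run_grouped_join():
    """Load the sample tables into in-memory SQLite and run the GROUP BY
    query with table aliases (a, b) that do not clash with column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FFNR_A (A1 INT NOT NULL, A2 INT NOT NULL, A3 INT NOT NULL)")
    conn.execute("CREATE TABLE FFNR_B (B1 INT NOT NULL, B2 INT NOT NULL, "
                 "B3 INT NOT NULL, B4 INT NOT NULL)")
    conn.executemany("INSERT INTO FFNR_A VALUES (?, ?, ?)",
                     [(0, 3, 1), (1, 0, 2), (2, 1, 1), (3, 2, 2),
                      (5, 3, 0), (6, 3, 2), (7, 4, 1), (8, 4, 3)])
    conn.executemany("INSERT INTO FFNR_B VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, 0), (2, 2, 3, 0), (3, 2, 4, 0), (4, 1, 5, 0),
                      (5, 7, 0, 0), (6, 8, 2, 0), (7, 7, 1, 0), (8, 8, 3, 0),
                      (0, 1, 3, 0)])
    rows = conn.execute(
        """
        SELECT a.A1, b.B1
        FROM FFNR_B AS b, FFNR_A AS a
        WHERE a.A2 = b.B1
        GROUP BY b.B1, a.A1
        ORDER BY a.A1
        """
    ).fetchall()
    conn.close()
    return rows


print(run_grouped_join())
# [(0, 3), (1, 0), (2, 1), (3, 2), (5, 3), (6, 3), (7, 4), (8, 4)]
```

Each row of FFNR_A matches exactly one B1 value, so all eight (A1, B1) pairs survive the GROUP BY.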
I have an 11-week game schedule for 11 teams (5 games each week). I need to select 11 games from that list (1 each week) that give each of the 11 teams a broadcast of one home game and one away game. Ideally this would be code that I could reuse in future years and scale to more teams and weeks if necessary.
I know that the likelihood of finding a viable solution for a given, already-created schedule is extremely low, and in many cases no solution exists. So when a solution of the type described above doesn't exist, I would like to get a schedule that comes close: one in which every team gets two broadcasts, but some teams may get two home or two away games instead of one of each.
I've looked at several different approaches. I have a number of 5x2 (Away Team, Home Team) arrays (the weekly matchups) on which I've tried to run a sort/selection with conditions (like a_1 ≠ a_j for j > 1 and a_i ∈ {1..11}), but I can't figure out how to get the double restriction to work, and I can't figure out how to make it go back to a previous selection when it runs out of viable selections. I've tried brute force, but 40 million possible combinations is more than I can handle.
I'm using MATLAB for all the work. I can usually translate C or C++ into MATLAB code.
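The "go back to a previous selection" step is classic depth-first backtracking. Here is a minimal Python sketch (find_broadcasts is a hypothetical name), assuming, as in the question, that there are as many weeks as teams: picking one game per week with no repeated home team and no repeated away team then gives every team exactly one home and one away broadcast. The schedule is the 11-team example used in the PuLP answer further down. It only handles the exact case; the near-miss fallback is what the integer program is for.

```python
# Example schedule from the PuLP answer: (away team, home team, week).
GAMES = [(1, 10, 1), (2, 9, 1), (3, 8, 1), (4, 7, 1), (5, 6, 1),
         (6, 4, 2), (7, 3, 2), (8, 2, 2), (9, 1, 2), (10, 11, 2),
         (2, 11, 3), (3, 10, 3), (4, 9, 3), (5, 8, 3), (6, 7, 3),
         (7, 5, 4), (8, 4, 4), (9, 3, 4), (10, 2, 4), (11, 1, 4),
         (3, 1, 5), (4, 11, 5), (5, 10, 5), (6, 9, 5), (7, 8, 5),
         (8, 6, 6), (9, 5, 6), (10, 4, 6), (11, 3, 6), (1, 2, 6),
         (4, 2, 7), (5, 1, 7), (6, 11, 7), (7, 10, 7), (8, 9, 7),
         (9, 7, 8), (10, 6, 8), (11, 5, 8), (1, 4, 8), (2, 3, 8),
         (5, 3, 9), (6, 2, 9), (7, 1, 9), (8, 11, 9), (9, 10, 9),
         (10, 8, 10), (11, 7, 10), (1, 6, 10), (2, 5, 10), (3, 4, 10),
         (11, 9, 11), (1, 8, 11), (2, 7, 11), (3, 6, 11), (4, 5, 11)]


def find_broadcasts(games, num_weeks):
    """Pick one game per week so that no team is broadcast at home twice or
    away twice. Returns the picks as (away, home, week) in week order, or
    None if no such selection exists."""
    by_week = {t: [g for g in games if g[2] == t] for t in range(1, num_weeks + 1)}
    chosen = []
    used_home, used_away = set(), set()

    def search(week):
        if week > num_weeks:
            return True  # every week has a broadcast
        for away, home, t in by_week[week]:
            if away in used_away or home in used_home:
                continue  # would give a team a second home or away broadcast
            chosen.append((away, home, t))
            used_away.add(away)
            used_home.add(home)
            if search(week + 1):
                return True
            # Dead end later on: undo this pick and try the next game.
            chosen.pop()
            used_away.remove(away)
            used_home.remove(home)
        return False

    return list(chosen) if search(1) else None


for away, home, week in find_broadcasts(GAMES, 11):
    print(f"Week {week:2d}: team {away:2d} at team {home:2d}")
```

Pruning on the used_home and used_away sets cuts off most branches early, so this returns almost instantly here, but the worst case is still exponential (up to 5^11 ≈ 48.8 million leaves when no solution exists).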
This seemed like a fun problem so I took a crack at formulating it as an IP.
Let J and T be the sets of teams and weeks, respectively.
Let G be the set of all games; each element of G is a tuple (i,j,t) that indicates the away team (i), the home team (j), and the week (t).
Let H be the set of all home games; each element of H is a tuple (j,t) that indicates the home team (j) and the week (t).
Define the following binary decision variables:
w[j,t] = 1 if we broadcast the home game at j in week t, = 0 otherwise (defined for (j,t) in H)
x[j] = 1 if team j has an away-game broadcast, = 0 otherwise (defined for j in J)
y[j] = 1 if team j has a home-game broadcast, = 0 otherwise (defined for j in J)
z[j] = 1 if team j has both an away-game and a home-game broadcast, = 0 otherwise (defined for j in J)
Then the model is:
maximize sum {j in J} z[j]
subject to sum {j in J} w[j,t] = 1 for all t
x[j] <= sum {(i,t) in H: (j,i,t) in G} w[i,t] for all j
y[j] <= sum {t in T} w[j,t] for all j
z[j] <= (1/2) * (x[j] + y[j]) for all j
w[j,t], x[j], y[j], z[j] in {0,1}
The objective function calculates the total number of teams that get both a home and an away broadcast. The first constraint says we need exactly one broadcast per week. The second constraint says x[j] can't equal 1 unless there is some week when j's away game gets broadcast. The third constraint says the same for y[j] and the home broadcast. The fourth constraint says z[j] can't equal 1 unless both x[j] and y[j] equal 1. The last constraint says everything has to be binary.
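To see why the fourth constraint works, here is a quick Python enumeration (max_feasible_z is just an illustrative helper) of the largest binary z allowed by z <= (x + y)/2 for each binary x and y. z can be 1 only when x = y = 1, so the constraint acts as a logical AND.

```python
from itertools import product


def max_feasible_z(x, y):
    """Largest binary z satisfying z <= (x + y) / 2."""
    return max(z for z in (0, 1) if z <= 0.5 * (x + y))


for x, y in product((0, 1), repeat=2):
    print(x, y, max_feasible_z(x, y))
# 0 0 0
# 0 1 0
# 1 0 0
# 1 1 1
```

The pair of constraints z[j] <= x[j] and z[j] <= y[j] is the more common way to write the same AND. It gives a tighter LP relaxation (it rules out points such as x = 1, y = 0, z = 0.5), which can help a solver on larger instances, but either form is correct for binary variables.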
I coded this model in Python/PuLP using an 11-team, 11-week schedule. (Obviously you'd plug in your own schedule.)
from pulp import *
import numpy as np
# Number of teams, weeks, and games per week.
num_teams = 11
num_weeks = 11
num_games_per_week = 5
# Lists of teams and weeks.
teams = range(1, num_teams+1)
weeks = range(1, num_weeks+1)
# List of game tuples: (i, j, t) means team i plays at team j in week t.
games = [(1, 10, 1), (2, 9, 1), (3, 8, 1), (4, 7, 1), (5, 6, 1),
         (6, 4, 2), (7, 3, 2), (8, 2, 2), (9, 1, 2), (10, 11, 2),
         (2, 11, 3), (3, 10, 3), (4, 9, 3), (5, 8, 3), (6, 7, 3),
         (7, 5, 4), (8, 4, 4), (9, 3, 4), (10, 2, 4), (11, 1, 4),
         (3, 1, 5), (4, 11, 5), (5, 10, 5), (6, 9, 5), (7, 8, 5),
         (8, 6, 6), (9, 5, 6), (10, 4, 6), (11, 3, 6), (1, 2, 6),
         (4, 2, 7), (5, 1, 7), (6, 11, 7), (7, 10, 7), (8, 9, 7),
         (9, 7, 8), (10, 6, 8), (11, 5, 8), (1, 4, 8), (2, 3, 8),
         (5, 3, 9), (6, 2, 9), (7, 1, 9), (8, 11, 9), (9, 10, 9),
         (10, 8, 10), (11, 7, 10), (1, 6, 10), (2, 5, 10), (3, 4, 10),
         (11, 9, 11), (1, 8, 11), (2, 7, 11), (3, 6, 11), (4, 5, 11)]
# List of home games: (j, t) means there is a home game at j in week t.
home_games = [(j, t) for (i, j, t) in games]
# Initialize problem.
prob = LpProblem('Broadcast', LpMaximize)
# Generate decision variables.
w = LpVariable.dicts('w', home_games, 0, 1, LpInteger)
x = LpVariable.dicts('x', teams, 0, 1, LpInteger)
y = LpVariable.dicts('y', teams, 0, 1, LpInteger)
z = LpVariable.dicts('z', teams, 0, 1, LpInteger)
# Objective function.
prob += lpSum([z[j] for j in teams])
# Constraint: 1 broadcast per week.
for t in weeks:
    prob += lpSum([w[j, t] for j in teams if (j, t) in home_games]) == 1
# Constraint: x[j] can only = 1 if we broadcast a game in which j is away team.
for j in teams:
    prob += x[j] <= lpSum([w[i, t] for (i, t) in home_games if (j, i, t) in games])
# Constraint: y[j] can only = 1 if we broadcast a game in which j is home team.
for j in teams:
    prob += y[j] <= lpSum([w[j, t] for t in weeks if (j, t) in home_games])
# Constraint: z[j] can only = 1 if x[j] and y[j] both = 1.
for j in teams:
    prob += z[j] <= 0.5 * (x[j] + y[j])
# Solve problem.
prob.solve()
# Print status.
print("Status:", LpStatus[prob.status])
# Print optimal values of decision variables.
for v in prob.variables():
    if v.varValue is not None and v.varValue > 0:
        print(v.name, "=", v.varValue)
# Prettier print.
print("\nNumber of teams with both home and away broadcasts: {:.0f}".format(np.sum([z[j].value() for j in teams])))
for (i, j, t) in games:
    # Round before comparing: solvers can return values such as 0.9999999.
    if round(w[j, t].value()) == 1:
        print("Week {:2d}: broadcast team {:2d} at team {:2d}".format(t, i, j))
The results are:
Number of teams with both home and away broadcasts: 11
Week 1: broadcast team 1 at team 10
Week 2: broadcast team 10 at team 11
Week 3: broadcast team 5 at team 8
Week 4: broadcast team 8 at team 4
Week 5: broadcast team 6 at team 9
Week 6: broadcast team 11 at team 3
Week 7: broadcast team 4 at team 2
Week 8: broadcast team 9 at team 7
Week 9: broadcast team 7 at team 1
Week 10: broadcast team 2 at team 5
Week 11: broadcast team 3 at team 6
You can see that each team gets both a home and an away broadcast.
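As a sanity check, here is a short Python sketch that recounts the broadcasts listed above (copied by hand as (week, away, home) tuples) and confirms that each of the 11 teams appears exactly once as the away team and exactly once as the home team.

```python
from collections import Counter

# Broadcasts reported above: (week, away team, home team).
RESULT = [(1, 1, 10), (2, 10, 11), (3, 5, 8), (4, 8, 4), (5, 6, 9),
          (6, 11, 3), (7, 4, 2), (8, 9, 7), (9, 7, 1), (10, 2, 5), (11, 3, 6)]

# Count how often each team is broadcast as the away side and as the home side.
away = Counter(a for _, a, _ in RESULT)
home = Counter(h for _, _, h in RESULT)

print(all(away[j] == 1 and home[j] == 1 for j in range(1, 12)))  # True
```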
I'm trying to transpose the date column by joining my table data_A to several subsets of itself:
Here's the code to create my test dataset, which contains duplicate records for every value of count:
create table database.data_A (member_id string, x1 int, x2 int, count int, date date);
insert into table database.data_A
select 'A0001',1, 10, 1, '2017-01-01'
union all
select 'A0001',1, 10, 2, '2017-07-01'
union all
select 'A0001',2, 20, 1, '2017-01-01'
union all
select 'A0001',2, 20, 2, '2017-07-01'
union all
select 'B0001',3, 50, 1, '2017-03-01'
union all
select 'C0001',4, 100, 1, '2017-04-01'
union all
select 'D0001',5, 200, 1, '2017-10-01'
union all
select 'D0001',5, 200, 2, '2017-11-01'
union all
select 'D0001',5, 200, 3, '2017-12-01'
union all
select 'D0001',6, 500, 1, '2017-10-01'
union all
select 'D0001',6, 500, 2, '2017-11-01'
union all
select 'D0001',6, 500, 3, '2017-12-01'
union all
select 'D0001',7, 1000, 1, '2017-10-01'
union all
select 'D0001',7, 1000, 2, '2017-11-01'
union all
select 'D0001',7, 1000, 3, '2017-12-01';
I'd like to transpose the data into this:
member_id x1 x2 date1 date2 date3
'A0001', 1, 10, '2017-01-01' '2017-07-01' .
'A0001', 2, 20, '2017-01-01' '2017-07-01' .
'B0001', 3, 50, '2017-03-01' . .
'C0001', 4, 100, '2017-04-01' . .
'D0001', 5, 200, '2017-10-01' '2017-11-01' '2017-12-01'
'D0001', 6, 500, '2017-10-01' '2017-11-01' '2017-12-01'
'D0001', 7, 1000, '2017-10-01' '2017-11-01' '2017-12-01'
My first program (which was not successful):
create table database.data_B as
select a.member_id, a.x1, a.x2, a.date_1, b.date_2, c.date_3
from (select member_id, x1, x2, date as date_1 from database.data_A where count=1) as a
left join
(select member_id, date as date_2 from database.data_A where count=2) as b
on (a.member_id=b.member_id)
left join
(select member_id, date as date_3 from database.data_A where count=3) as c
on (a.member_id=c.member_id);
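Below will do the job. Your joins match rows on member_id alone, so each count=1 row is joined to every count=2 and count=3 row of the same member, and members with several (x1, x2) combinations, such as A0001 and D0001, come out duplicated. Conditional aggregation avoids the self-joins entirely:
select
member_id,
x1,
x2,
max(case when count=1 then date else '.' end) as date1,
max(case when count=2 then date else '.' end) as date2,
max(case when count=3 then date else '.' end) as date3
from database.data_A
group by member_id, x1, x2;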
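Here is a minimal sketch that runs the same conditional-aggregation query in SQLite via Python's sqlite3 module, standing in for Hive, on the question's test data. count and date are double-quoted as identifiers to be safe, and date is stored as TEXT. The '.' placeholder works with MAX because '.' sorts before any digit, so a real date string always wins; that relies on string comparison, which is an assumption worth checking if your date column is a true DATE type.

```python
import sqlite3

# Test data from the question: (member_id, x1, x2, count, date).
ROWS = [
    ("A0001", 1, 10, 1, "2017-01-01"), ("A0001", 1, 10, 2, "2017-07-01"),
    ("A0001", 2, 20, 1, "2017-01-01"), ("A0001", 2, 20, 2, "2017-07-01"),
    ("B0001", 3, 50, 1, "2017-03-01"), ("C0001", 4, 100, 1, "2017-04-01"),
    ("D0001", 5, 200, 1, "2017-10-01"), ("D0001", 5, 200, 2, "2017-11-01"),
    ("D0001", 5, 200, 3, "2017-12-01"), ("D0001", 6, 500, 1, "2017-10-01"),
    ("D0001", 6, 500, 2, "2017-11-01"), ("D0001", 6, 500, 3, "2017-12-01"),
    ("D0001", 7, 1000, 1, "2017-10-01"), ("D0001", 7, 1000, 2, "2017-11-01"),
    ("D0001", 7, 1000, 3, "2017-12-01"),
]


def pivot_dates():
    """One output row per (member_id, x1, x2), one column per count value."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE data_A (member_id TEXT, x1 INT, x2 INT, "count" INT, "date" TEXT)')
    conn.executemany("INSERT INTO data_A VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT member_id, x1, x2,
               MAX(CASE WHEN "count" = 1 THEN "date" ELSE '.' END) AS date1,
               MAX(CASE WHEN "count" = 2 THEN "date" ELSE '.' END) AS date2,
               MAX(CASE WHEN "count" = 3 THEN "date" ELSE '.' END) AS date3
        FROM data_A
        GROUP BY member_id, x1, x2
        ORDER BY member_id, x1
        """
    ).fetchall()
    conn.close()
    return result


for row in pivot_dates():
    print(row)
```

The output has exactly the seven rows of the desired layout, with '.' wherever a member has no date for that count.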
I'm having trouble converting a SQL query to ActiveRecord and I'm hoping you can help.
select tbl2.id
from Table1 tbl1
JOIN Table2 tbl2 ON tbl1.id = tbl2.some_column_id
where tbl1.id in (1, 2, 3, 4, 5)
and tbl2.id not in (10, 13, 22, 44, 66)
The Rails models exist, and the relationship is:
Table2:
has_many :table1
Assuming you've set up your classes with appropriate table names (table1 and table2 are not good names for Rails models, by the way), then:
Table2
.select(:id).joins(:table1)
.where(table1: { id: [1, 2, 3, 4, 5] })
.where.not(id: [10, 13, 22, 44, 66])
Let's say I have this array of shipment IDs:
s = Shipment.find(:all, :select => "id")
[#<Shipment id: 1>, #<Shipment id: 2>, #<Shipment id: 3>, #<Shipment id: 4>, #<Shipment id: 5>]
And this array of invoices with their shipment IDs:
i = Invoice.find(:all, :select => "id, shipment_id")
[#<Invoice id: 98, shipment_id: 2>, #<Invoice id: 99, shipment_id: 3>]
Invoice belongs_to Shipment, and Shipment has_one Invoice, so the invoices table has a shipment_id column.
To create an invoice, I click New Invoice, and a select menu lists the shipments so I can choose which shipment I'm creating the invoice for. I only want that menu to list shipments that don't have an invoice yet.
So I need an array of Shipments that don't have an Invoice yet. In the example above, the answer would be 1, 4, 5.
a = [2, 4, 6, 8]
b = [1, 2, 3, 4]
a - b | b - a # => [6, 8, 1, 3]
First, get a list of the shipment_ids that appear in invoices:
ids = i.map{|x| x.shipment_id}
Then 'reject' them from your original array:
s.reject{|x| ids.include? x.id}
Note: reject returns a new array; use reject! if you want to modify the original array.
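The same map-then-reject idea, sketched in Python with the sample data from the question (the Shipment and Invoice namedtuples are hypothetical stand-ins for the ActiveRecord objects). Collecting the invoiced IDs into a set makes each membership test constant-time, whereas include? on a Ruby array scans the whole array each time; in Ruby, a Set serves the same purpose.

```python
from collections import namedtuple

# Hypothetical stand-ins for the ActiveRecord objects in the question.
Shipment = namedtuple("Shipment", "id")
Invoice = namedtuple("Invoice", "id shipment_id")

shipments = [Shipment(i) for i in range(1, 6)]
invoices = [Invoice(98, 2), Invoice(99, 3)]


def shipments_without_invoice(shipments, invoices):
    """Keep the shipments whose id is not referenced by any invoice."""
    invoiced = {inv.shipment_id for inv in invoices}  # set: O(1) membership tests
    return [s for s in shipments if s.id not in invoiced]


print([s.id for s in shipments_without_invoice(shipments, invoices)])  # [1, 4, 5]
```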
Use the subtraction operator:
irb(main):001:0> [1, 2, 3, 2, 6, 7] - [2, 1]
=> [3, 6, 7]
Ruby 2.6 introduced Array#difference:
[1, 1, 2, 2, 3, 3, 4, 5].difference([1, 2, 4]) #=> [3, 3, 5]
So in the case given here:
Shipment.pluck(:id).difference(Invoice.pluck(:shipment_id))
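For comparison, a minimal Python sketch of the same one-directional, duplicate-preserving behaviour (difference here is a hypothetical helper, not a Python built-in): every occurrence of an element found in the second list is dropped, while duplicates of the remaining elements are kept.

```python
def difference(a, b):
    """Elements of a (duplicates kept, order preserved) that do not occur in b."""
    exclude = set(b)
    return [x for x in a if x not in exclude]


print(difference([1, 1, 2, 2, 3, 3, 4, 5], [1, 2, 4]))  # [3, 3, 5]
```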
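This seems a nice, elegant solution to the problem. I've been a keen follower of a - b | b - a, though it can be tricky to recall at times (note, though, that difference is one-directional like a - b, whereas a - b | b - a gives the difference in both directions).
This certainly takes care of that.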
A pure Ruby solution is:
(a + b) - (a & b)
([1,2,3,4] + [1,3]) - ([1,2,3,4] & [1,3])
=> [2,4]
Here a + b concatenates the two arrays,
a & b returns their intersection,
and subtracting the intersection leaves the elements that appear in only one of the arrays (the symmetric difference).
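The same union-minus-intersection idea in Python, sketched with sets. Sets discard duplicates and ordering, so this produces the same values as the Ruby expression only when neither array contains repeated elements (order aside); Python's a ^ b operator on sets computes the same result directly.

```python
def symmetric_difference(a, b):
    """Values that appear in exactly one of a and b."""
    union = set(a) | set(b)
    intersection = set(a) & set(b)
    return union - intersection


print(sorted(symmetric_difference([1, 2, 3, 4], [1, 3])))  # [2, 4]
```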
The previous answer here from pgquardiario only included a one-directional difference. If you want the difference in both directions (the items unique to each array), try something like the following:
def diff(x, y)
  only_x = x.reject { |a| y.include?(a) }  # items in x but not in y
  only_y = y.reject { |a| x.include?(a) }  # items in y but not in x
  only_x | only_y
end
This should do it in one ActiveRecord query
Shipment.where(["id NOT IN (?)", Invoice.select(:shipment_id)]).select(:id)
And it outputs the SQL
SELECT "shipments"."id" FROM "shipments" WHERE (id NOT IN (SELECT "invoices"."shipment_id" FROM "invoices"))
In Rails 4+ you can do the following
Shipment.where.not(id: Invoice.select(:shipment_id).distinct).select(:id)
And it outputs the SQL
SELECT "shipments"."id" FROM "shipments" WHERE ("shipments"."id" NOT IN (SELECT DISTINCT "invoices"."shipment_id" FROM "invoices"))
Instead of select(:id), I recommend the ids method:
Shipment.where.not(id: Invoice.select(:shipment_id).distinct).ids
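To see the anti-join at work, here is a sketch that runs the generated SQL in SQLite through Python's sqlite3 module, using the shipments and invoices from the question. One caveat: if any invoices.shipment_id is NULL, NOT IN returns no rows at all, so filter NULLs out of the subquery (or use NOT EXISTS) if that column is nullable.

```python
import sqlite3


def uninvoiced_shipment_ids():
    """Run the NOT IN subquery from the answer against the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, shipment_id INTEGER)")
    conn.executemany("INSERT INTO shipments (id) VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany("INSERT INTO invoices (id, shipment_id) VALUES (?, ?)",
                     [(98, 2), (99, 3)])
    ids = [row[0] for row in conn.execute(
        'SELECT "shipments"."id" FROM "shipments" '
        'WHERE ("shipments"."id" NOT IN '
        '(SELECT DISTINCT "invoices"."shipment_id" FROM "invoices")) '
        'ORDER BY "shipments"."id"'
    )]
    conn.close()
    return ids


print(uninvoiced_shipment_ids())  # [1, 4, 5]
```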
When dealing with arrays of Strings, it can be useful to keep the differences grouped together.
In that case, we can use Array#zip to pair the elements up and then use a block to decide what to do with each pair (an Array).
a = ["One", "Two", "Three", "Four"]
b = ["One", "Not Two", "Three", "For" ]
mismatches = []
a.zip(b) do |array|
mismatches << array if array.first != array.last
end
mismatches
# => [
# ["Two", "Not Two"],
# ["Four", "For"]
# ]
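A Python equivalent, as a sketch: zip pairs the elements position by position and a list comprehension keeps the pairs that differ. One difference from Ruby: Python's zip stops at the shorter list, whereas Ruby's Array#zip pads a shorter argument with nil, so extra trailing elements are silently ignored here.

```python
a = ["One", "Two", "Three", "Four"]
b = ["One", "Not Two", "Three", "For"]

# Pair elements position by position and keep the pairs that differ.
mismatches = [[x, y] for x, y in zip(a, b) if x != y]

print(mismatches)  # [['Two', 'Not Two'], ['Four', 'For']]
```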
s.select{|x| !ids.include? x.id}