Transpose rows into columns in BigQuery using standard sql [duplicate] - transpose

This question already has answers here:
How to Pivot table in BigQuery
(7 answers)
Closed 2 years ago.
Good morning,
I'm trying to transpose some data in big query. I've looked at a few other people who have asked this on stackoverflow but the way to do this seems to be to use legacy sql (using group_concat_unquoted) rather than standard sql. I would use legacy but I've had issues with nested data in the past so have since used standard only.
Here's my example, to give some context I'm trying to map out some customer journeys which I have below:
uniqueid | page_flag | order_of_pages
A | Collection| 1
A | Product | 2
A | Product | 3
A | Login | 4
A | Delivery | 5
B | Clearance | 1
B | Search | 2
B | Product | 3
C | Search | 1
C | Collection| 2
C | Product | 3
However I'd like to transpose the data so it looks like this:
uniqueid | 1 | 2 | 3 | 4 | 5
A | Collection | Product | Product | Login | Delivery
B | Clearance | Search | Product | NULL | NULL
C | Search | Collection | Product | NULL | NULL
I've tried using multiple left joins but get the following error:
select a.uniqueid,
b.page_flag as page1,
c.page_flag as page2,
d.page_flag as page3,
e.page_flag as page4,
f.page_flag as page5
from
(select distinct uniqueid,
(case when uniqueid is not null then 1 end) as page_hit1,
(case when uniqueid is not null then 2 end) as page_hit2,
(case when uniqueid is not null then 3 end) as page_hit3,
(case when uniqueid is not null then 4 end) as page_hit4,
(case when uniqueid is not null then 5 end) as page_hit5
from `mytable`) a
LEFT JOIN (
SELECT *
from `mytable`) b on a.uniqueid = b.uniqueid
and a.page_hit1 = b.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) c on a.uniqueid = c.uniqueid
and a.page_hit2 = c.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) d on a.uniqueid = d.uniqueid
and a.page_hit3 = d.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) e on a.uniqueid = e.uniqueid
and a.page_hit4 = e.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) f on a.uniqueid = f.uniqueid
and a.page_hit5 = f.order_of_pages
Error: Query exceeded resource limits for tier 1. Tier 13 or higher required.
I've looked at using Array function as well but I've never used this before and I'm not sure if this is just for transposing the other way around. Any advice would be grand.
Thank you

for BigQuery Standard SQL
#standardSQL
SELECT
uniqueid,
MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid
You can play/test with below dummy data from your question
#standardSQL
WITH `mytable` AS (
SELECT 'A' AS uniqueid, 'Collection' AS page_flag, 1 AS order_of_pages UNION ALL
SELECT 'A', 'Product', 2 UNION ALL
SELECT 'A', 'Product', 3 UNION ALL
SELECT 'A', 'Login', 4 UNION ALL
SELECT 'A', 'Delivery', 5 UNION ALL
SELECT 'B', 'Clearance', 1 UNION ALL
SELECT 'B', 'Search', 2 UNION ALL
SELECT 'B', 'Product', 3 UNION ALL
SELECT 'C', 'Search', 1 UNION ALL
SELECT 'C', 'Collection', 2 UNION ALL
SELECT 'C', 'Product', 3
)
SELECT
uniqueid,
MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid
result is
uniqueid p1 p2 p3 p4 p5
A Collection Product Product Login Delivery
B Clearance Search Product null null
C Search Collection Product null null
Depends on your needs you can also consider below approach (not pivot though)
#standardSQL
SELECT uniqueid,
STRING_AGG(page_flag, '>' ORDER BY order_of_pages) AS journey
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid
if to run with same dummy data as above - result is
uniqueid journey
A Collection>Product>Product>Login>Delivery
B Clearance>Search>Product
C Search>Collection>Product

Related

What is the best way to combine these datasets to get my desired format?

Doing some ELT work...
What is the best way to combine these sets of data into the form of the desired output:
Dataset A:
| project_id1 | types1 |
A, apple
B, banana
Dataset B:
| project_id1 | project_id2 | types2 |
A, 15, strawberry
A, 25, onion
B, 5, peach
Desired Result:
| project_id1 | project_id2 | types |
A, 15, strawberry
A, 15, apple
A, 25, onion
A, 25, apple
B, 5, peach
B, 5, banana
And is there a name for this type of combination?
You can get that information by doing so:
Table
create table da (
project_id1 char(1),
types1 varchar(100)
);
insert into da values
('A', 'apple'),
('B', 'banana');
create table db (
project_id1 char(1),
project_id2 int,
types2 varchar(100)
);
insert into db values
('A', 15, 'strawberry'),
('A', 25, 'onion'),
('B', 5, 'peach');
Query
select * from (
select da.project_id1, db.project_id2, da.types1 as types
from da
inner join db on da.project_id1 = db.project_id1
UNION ALL
select db.project_id1, db.project_id2, db.types2 as types
from db
) x
order by project_id1, project_id2, types desc;
Result
project_id1 project_id2 types
A 15 strawberry
A 15 apple
A 25 onion
A 25 apple
B 5 peach
B 5 banana
Example
https://rextester.com/ISQA20343
I don't know of a name of this kind of data merging.

How to delete mirrored values in 2 columns

I have a table with 2 columns that contain some rows with unique id pairs and some rows with pairs that are a mirrored duplicate of another row. I want to remove one of the duplicates.
id1 | id2
-----+-----
1 | 9
2 | 10
5 | 4
6 | 16
7 | 11
8 | 12
9 | 1
10 | 2
12 | 14
14 | 8
16 | 6
So 1 | 9 mirrors 9 | 1. I want to keep 1 | 9 but delete 9 | 1.
I've tried.
SELECT
id1,
id2
FROM
(
SELECT
id1, id2, ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY id1) AS occu
FROM
table
) t
WHERE
t.occu = 1;
But it has no effect.
I'm pretty new to this so any help you can give would be greatly appreciated.
====UPDATE====
I accepted the answer from #Mureinik and adapted it to work as a filter in a subquery:
SELECT
*
FROM
table
WHERE
id1 NOT IN (SELECT
id1
FROM
table a
WHERE
id1 > id2
AND
EXISTS (SELECT *
FROM table b
WHERE a.id1 = b.id2 AND a.id2 = b.id1));
You could arbitrarily decide to keep the rows where id1 < id2, and use an exists clause to find their counterparts:
DELETE FROM myable a
WHERE id1 > id2 AND
EXISTS (SELECT *
FROM mytable b
WHERE a.id1 = b.id2 AND a.id2 = b.id1)

How to query a table which has a parent child relation

Situation:
I have a table called "word" which contains a word with the associated translations.
| ID | name | lang_id | parent_id |
|----|----------|---------|-----------|
| 1 | screw | 1 | null |
| 2 | schraube | 2 | 1 |
| 3 | vis | 3 | 1 |
So screw is the main word which has no parent. The other data sets have an association to the parent with the parent_id.
What I want:
I need a query which displays the word I searched for and the word which I typed in.
I want to get the datasets 2 and 3, if I query the word "schraube" from german to french.
I want to get the datasets 1 and 3, if I query the word "screw" from english to french.
...
What I tried:
select word.id, word.name, word.lang_id, word.parent_id
from word
left join word w2 on word.parent_id = w2.parent_id
WHERE w2.name = 'screw';
-- and word.lang_id = 2
Unfortunately the result doesn't contain the word I typed. Also this displays all datasets, not only the ones with the specific language.
You can modifiy th below query to get your answer.
DECLARE #FromLanguageId SMALLINT = 2; --german
DECLARE #ToLanguageId SMALLINT = 3; --french
DECLARE #NAME NVARCHAR(300) = 'schraube';
--DECLARE #FromLanguageId SMALLINT = 1; --english
--DECLARE #ToLanguageId SMALLINT = 3; --french
--DECLARE #NAME NVARCHAR(300) = 'screw';
--Get the mathing record
;
WITH ctematch
AS (
--gets the matching record (child or parent)
SELECT match.*
FROM [word] match
WHERE match.NAME LIKE #NAME),
--Join its sibling , parent and childs
ctefamilydata
AS (SELECT *
FROM ctematch match
UNION
--Parent
SELECT parent.*
FROM ctematch match
INNER JOIN [word] parent
ON match.[parent_id] = parent.[id]
UNION
--Child
SELECT child.*
FROM ctematch match
INNER JOIN [word] child
ON child.[parent_id] = match.[id]
UNION
--Siblings
SELECT siblings.*
FROM ctematch match
INNER JOIN [word] siblings
ON match.[parent_id] = siblings.[parent_id])
--Filter and get the data
SELECT *
FROM ctefamilydata Cte
WHERE Cte.[lang_id] = #ToLanguageId
OR Cte.[lang_id] = #FromLanguageId

php vlookup in sql

I have a table like this in sql
ID NAME SIZE GROUP1 GROUP2 SIZE2
1 casa xl 1 2
2 casa l 1 2
I'd like to obtain a table like this
ID NAME SIZE GROUP1 GROUP2 SIZE2
1 casa xl 1 2 l
2 casa l 1 2 xl
So the value of GROUP1 and GROUP2 identify the id that have similar NAME but different value for size
Ho can I do?
Join in the same table again, with the id that is not the same as the record itself:
select
t.ID, t.NAME, t.SIZE, t.GROUP1, t.GROUP2, t2.SIZE
from
TheTable t
inner join TheTable t2 on t2.ID = case t.GROUP1 when t.ID then t.GROUP2 else t.GROUP1 end
To select from table1 and insert it into table2:
insert into table2
select
t.ID, t.NAME, t.SIZE, t.GROUP1, t.GROUP2, t2.SIZE
from
table1 t
inner join table1 t2 on t2.ID = case t.GROUP1 when t.ID then t.GROUP2 else t.GROUP1 end

Linq query with join, group and count (extension format)

I have a table, that stores some information and reference (column parent_ID) to parent row in the same table.
I need to get list of all records (using Linq extension format) with count of child records. This is SQL query that gives me the desired information
Select a.ID, a.Name, Count(b.ID) from Table a
left join Table b on b.parent_ID=a.ID
group by a.ID, a.Name
Example
| ID | Name | parent_ID |
----------------------------------
| 1 | First | null |
| 2 | Child1 | 1 |
| 3 | Child2 | 1 |
| 4 | Child1.1 | 2 |
The result should be:
| 1 | First | 2 |
| 2 | Child1 | 1 |
| 3 | Child2 | 0 |
| 4 | Child1.1 | 0 |
I tried to make at least child counting, but it doesn't work...
var model = _db.Table
.GroupBy(r => new { r.parent_ID })
.Select(r => new {
r.Key.parent_ID,
ChildCount = r.GroupBy(g => g.parent_ID).Count()
});
Shouldn't this query be equivalent to something like this:
select parent_ID, count(parent_ID) from Table group by Table
but it returns count = 1 for each row...
How can i make such a query using linq extension format?
What I believe you are looking for is a group join which can be easier to understand using linq query syntax instead of the linq extensions so for ease of understanding I will post both methods.
I'm self-joining the table on the ID to the parent_ID into an object which keeps the referential integrity for you and select an anonymous object to select out the parent then all of the children.
This is using straight LINQ query syntax.
var model = from t1 in _db.Test1
join t2 in _db.Test1 on t1.ID equals t2.parent_ID into c1
select new {Parent = t1, Children = c1};
And here is the code using LINQ extensions
var model2 = _db.Test1.GroupJoin(_db.Test1,
t1 => t1.ID,
t2 => t2.parent_ID,
(t1, c1) => new {Parent = t1, Children = c1});
I used a quick test program I threw together to post the results into a textbox for both methods but the code was the same so I'll just post it once.
foreach (var test in model)
{
textBox1.AppendTextAddNewLine(String.Format("{0}: {1}",
test.Parent.Name,
test.Children.Count()));
}
And the results of both of those tests were the same below:
First: 2
Child1: 1
Child2: 0
Child1.1: 0
First: 2
Child1: 1
Child2: 0
Child1.1: 0

Resources