How to delete mirrored values in 2 columns - psql

I have a table with two columns. Some rows contain unique id pairs, and some contain pairs that mirror another row (the same two ids in reverse order). I want to remove one row from each mirrored pair.
id1 | id2
-----+-----
1 | 9
2 | 10
5 | 4
6 | 16
7 | 11
8 | 12
9 | 1
10 | 2
12 | 14
14 | 8
16 | 6
So 1 | 9 mirrors 9 | 1. I want to keep 1 | 9 but delete 9 | 1.
I've tried:
SELECT
    id1,
    id2
FROM
    (
    SELECT
        id1, id2, ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY id1) AS occu
    FROM
        mytable
    ) t
WHERE
    t.occu = 1;
But it has no effect.
I'm pretty new to this so any help you can give would be greatly appreciated.
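The ROW_NUMBER() attempt has no effect because PARTITION BY id1, id2 partitions by the ordered pair: 1 | 9 and 9 | 1 land in different partitions, so every row gets occu = 1. Partitioning by the unordered pair instead (in PostgreSQL, PARTITION BY LEAST(id1, id2), GREATEST(id1, id2)) puts mirrored rows together. The JavaScript below is only an in-memory simulation of that numbering, not database code; rowNumberFilter, orderedKey and unorderedKey are hypothetical names, and keeping the row with the smallest id1 per partition stands in for ORDER BY id1.

```javascript
// Simulates "ROW_NUMBER() OVER (PARTITION BY <key> ORDER BY id1) = 1" in memory.
// rows: array of [id1, id2]; keyFn: builds the partition key for a row.
function rowNumberFilter(rows, keyFn) {
  const best = new Map(); // partition key -> row with the smallest id1 seen so far
  for (const row of rows) {
    const key = keyFn(row);
    const current = best.get(key);
    if (current === undefined || row[0] < current[0]) {
      best.set(key, row);
    }
  }
  // Keep only the rows that would get occu = 1, preserving input order.
  return rows.filter((row) => best.get(keyFn(row)) === row);
}

// Ordered pair (what PARTITION BY id1, id2 does): 1,9 and 9,1 are different keys.
const orderedKey = ([a, b]) => `${a},${b}`;
// Unordered pair, like PARTITION BY LEAST(id1, id2), GREATEST(id1, id2).
const unorderedKey = ([a, b]) => `${Math.min(a, b)},${Math.max(a, b)}`;

const rows = [
  [1, 9], [2, 10], [5, 4], [6, 16], [7, 11], [8, 12],
  [9, 1], [10, 2], [12, 14], [14, 8], [16, 6],
];

console.log(rowNumberFilter(rows, orderedKey).length);   // 11
console.log(rowNumberFilter(rows, unorderedKey).length); // 8
```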
====UPDATE====
I accepted the answer from @Mureinik and adapted it to work as a filter in a subquery:
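SELECT *
FROM mytable
WHERE (id1, id2) NOT IN (
    -- the "larger-first" half of every mirrored pair; compare both columns,
    -- not just id1, so unrelated rows that share an id1 value are kept
    SELECT a.id1, a.id2
    FROM mytable a
    WHERE a.id1 > a.id2
      AND EXISTS (SELECT *
                  FROM mytable b
                  WHERE a.id1 = b.id2 AND a.id2 = b.id1)
);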

You could arbitrarily decide to keep the rows where id1 < id2, and use an EXISTS clause to find their counterparts:
DELETE FROM mytable a
WHERE id1 > id2 AND
EXISTS (SELECT *
FROM mytable b
WHERE a.id1 = b.id2 AND a.id2 = b.id1);
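To see which rows this DELETE removes, here is a minimal in-memory sketch in JavaScript. deleteMirrored is a hypothetical name and rows are plain [id1, id2] arrays; the lookup set is built once, before anything is removed, on the assumption that every EXISTS check should see the table as it was when the statement started.

```javascript
// In-memory sketch of:
//   DELETE FROM mytable a
//   WHERE id1 > id2
//     AND EXISTS (SELECT * FROM mytable b WHERE a.id1 = b.id2 AND a.id2 = b.id1);
// pairs: array of [id1, id2]; returns the rows that would survive the DELETE.
function deleteMirrored(pairs) {
  // Built once from the original rows, so removals do not affect later checks.
  const present = new Set(pairs.map(([id1, id2]) => `${id1},${id2}`));
  return pairs.filter(([id1, id2]) => {
    const hasMirror = present.has(`${id2},${id1}`); // the EXISTS subquery
    return !(id1 > id2 && hasMirror); // drop only the larger-first half of a mirrored pair
  });
}

const pairs = [
  [1, 9], [2, 10], [5, 4], [6, 16], [7, 11], [8, 12],
  [9, 1], [10, 2], [12, 14], [14, 8], [16, 6],
];
console.log(deleteMirrored(pairs).map(([a, b]) => `${a} | ${b}`).join("\n"));
```

With the sample data from the question, 9 | 1, 10 | 2 and 16 | 6 are removed, while 5 | 4 and 14 | 8 stay because they have no mirrored counterpart.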

Related

SUMPRODUCT but only on nonblank cells

Given:
Lp | COL1 | COL 2 | COL 3
ROW 1 | X | | X
ROW 2 | | X | X
ROW 3 | X | X |
ROW 4 | | |
ROW 5 | 1 | 1.5 | 2
ROW 6 | 2 | 1 | 3
I would like to compute the SUMPRODUCT of Row 5 and Row 6, but only over the columns where a given row (Row 1, Row 2, ...) has an X (or, more generally, is non-empty).
Expected result for Row 1: 1 * 2 + 2 * 3 = 8 (because the first and last columns are non-empty)
Expected result for Row 2: 1.5 * 1 + 2 * 3 = 7.5 (the second and last columns are non-empty)
Expected result for Row 3: 1 * 2 + 1.5 * 1 = 3.5 (the first and second columns are non-empty)
Expected result for Row 4: 0
I appreciate your help.
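Use the following (assuming the header row is sheet row 1, so ROW 1 is in row 2 and ROW 5 and ROW 6 are in rows 6 and 7; enter it next to ROW 1 and fill down):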
=SUMPRODUCT(($B$6:$D$6)*($B$7:$D$7)*(B2:D2<>""))
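The arithmetic behind that formula, for one row of X marks, can be sketched in JavaScript as below. maskedSumProduct is a hypothetical name, and empty cells are represented as empty strings; the (mask <> "") test contributes a factor of 1 or 0 to each column's product, as the comparison does inside SUMPRODUCT.

```javascript
// For each column i, add a[i] * b[i] only when mask[i] is non-empty,
// like =SUMPRODUCT(row5 * row6 * (mask <> "")).
function maskedSumProduct(a, b, mask) {
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    const keep = mask[i] !== "" ? 1 : 0; // (mask <> "") as 1 or 0
    total += a[i] * b[i] * keep;
  }
  return total;
}

const row5 = [1, 1.5, 2];
const row6 = [2, 1, 3];
const masks = [
  ["X", "", "X"], // ROW 1
  ["", "X", "X"], // ROW 2
  ["X", "X", ""], // ROW 3
  ["", "", ""],   // ROW 4
];
console.log(masks.map((m) => maskedSumProduct(row5, row6, m))); // [ 8, 7.5, 3.5, 0 ]
```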
You can achieve the same thing without SUMPRODUCT.
Create three more columns, COL1', COL2' and COL3', and use an IF condition to replace every X with the corresponding product.
For example, in COL1' of ROW 1 enter a formula such as =IF(A1="X", A$5*A$6, 0)
(here A1 is COL1, ROW 1),
then drag it to fill COL1', COL2' and COL3'.
Finally, SUM over COL1', COL2' and COL3'.

Transpose rows into columns in BigQuery using standard sql [duplicate]

Good morning,
I'm trying to transpose some data in BigQuery. I've looked at a few other people who have asked this on Stack Overflow, but the answers seem to rely on legacy SQL (using group_concat_unquoted) rather than standard SQL. I could use legacy SQL, but I've had issues with nested data in the past, so I have used only standard SQL since.
Here's my example. For context, I'm trying to map out some customer journeys, shown below:
uniqueid | page_flag | order_of_pages
A | Collection| 1
A | Product | 2
A | Product | 3
A | Login | 4
A | Delivery | 5
B | Clearance | 1
B | Search | 2
B | Product | 3
C | Search | 1
C | Collection| 2
C | Product | 3
However I'd like to transpose the data so it looks like this:
uniqueid | 1 | 2 | 3 | 4 | 5
A | Collection | Product | Product | Login | Delivery
B | Clearance | Search | Product | NULL | NULL
C | Search | Collection | Product | NULL | NULL
I've tried using multiple LEFT JOINs, but the query below fails with the error shown after it:
select a.uniqueid,
b.page_flag as page1,
c.page_flag as page2,
d.page_flag as page3,
e.page_flag as page4,
f.page_flag as page5
from
(select distinct uniqueid,
(case when uniqueid is not null then 1 end) as page_hit1,
(case when uniqueid is not null then 2 end) as page_hit2,
(case when uniqueid is not null then 3 end) as page_hit3,
(case when uniqueid is not null then 4 end) as page_hit4,
(case when uniqueid is not null then 5 end) as page_hit5
from `mytable`) a
LEFT JOIN (
SELECT *
from `mytable`) b on a.uniqueid = b.uniqueid
and a.page_hit1 = b.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) c on a.uniqueid = c.uniqueid
and a.page_hit2 = c.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) d on a.uniqueid = d.uniqueid
and a.page_hit3 = d.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) e on a.uniqueid = e.uniqueid
and a.page_hit4 = e.order_of_pages
LEFT JOIN (
SELECT *
from `mytable`) f on a.uniqueid = f.uniqueid
and a.page_hit5 = f.order_of_pages
Error: Query exceeded resource limits for tier 1. Tier 13 or higher required.
I've looked at using ARRAY functions as well, but I've never used them before, and I'm not sure whether they only work for transposing the other way around. Any advice would be grand.
Thank you
For BigQuery Standard SQL:
#standardSQL
SELECT
uniqueid,
MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid
You can test it with the dummy data from your question:
#standardSQL
WITH `mytable` AS (
SELECT 'A' AS uniqueid, 'Collection' AS page_flag, 1 AS order_of_pages UNION ALL
SELECT 'A', 'Product', 2 UNION ALL
SELECT 'A', 'Product', 3 UNION ALL
SELECT 'A', 'Login', 4 UNION ALL
SELECT 'A', 'Delivery', 5 UNION ALL
SELECT 'B', 'Clearance', 1 UNION ALL
SELECT 'B', 'Search', 2 UNION ALL
SELECT 'B', 'Product', 3 UNION ALL
SELECT 'C', 'Search', 1 UNION ALL
SELECT 'C', 'Collection', 2 UNION ALL
SELECT 'C', 'Product', 3
)
SELECT
uniqueid,
MAX(IF(order_of_pages = 1, page_flag, NULL)) AS p1,
MAX(IF(order_of_pages = 2, page_flag, NULL)) AS p2,
MAX(IF(order_of_pages = 3, page_flag, NULL)) AS p3,
MAX(IF(order_of_pages = 4, page_flag, NULL)) AS p4,
MAX(IF(order_of_pages = 5, page_flag, NULL)) AS p5
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid
The result is:
uniqueid p1 p2 p3 p4 p5
A Collection Product Product Login Delivery
B Clearance Search Product null null
C Search Collection Product null null
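Conceptually, each MAX(IF(order_of_pages = n, page_flag, NULL)) picks the page at position n within a uniqueid group. A rough JavaScript sketch of that conditional-aggregation pivot is below; pivotJourneys and its maxPages parameter are hypothetical names. As with the SQL, the number of pN columns has to be fixed in advance, and pages at later positions are dropped.

```javascript
// Rough in-memory equivalent of
//   MAX(IF(order_of_pages = n, page_flag, NULL)) AS pn ... GROUP BY uniqueid ORDER BY uniqueid
// visits: array of { uniqueid, page_flag, order_of_pages }.
// maxPages: how many pN columns to produce (chosen up front, as in the SQL).
function pivotJourneys(visits, maxPages) {
  const byId = new Map(); // uniqueid -> [p1, ..., pN]
  for (const { uniqueid, page_flag, order_of_pages } of visits) {
    if (!byId.has(uniqueid)) byId.set(uniqueid, new Array(maxPages).fill(null));
    const cells = byId.get(uniqueid);
    const idx = order_of_pages - 1;
    if (idx < 0 || idx >= maxPages) continue; // no pN column for this position
    // MAX keeps the greatest value if two rows share a position.
    if (cells[idx] === null || page_flag > cells[idx]) cells[idx] = page_flag;
  }
  return [...byId.keys()].sort().map((id) => [id, ...byId.get(id)]);
}

const visits = [
  ["A", "Collection", 1], ["A", "Product", 2], ["A", "Product", 3],
  ["A", "Login", 4], ["A", "Delivery", 5],
  ["B", "Clearance", 1], ["B", "Search", 2], ["B", "Product", 3],
  ["C", "Search", 1], ["C", "Collection", 2], ["C", "Product", 3],
].map(([uniqueid, page_flag, order_of_pages]) => ({ uniqueid, page_flag, order_of_pages }));

for (const row of pivotJourneys(visits, 5)) console.log(row);
```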
Depending on your needs, you can also consider the approach below (it is not a pivot, though):
#standardSQL
SELECT uniqueid,
STRING_AGG(page_flag, '>' ORDER BY order_of_pages) AS journey
FROM `mytable`
GROUP BY uniqueid
ORDER BY uniqueid
If you run it with the same dummy data as above, the result is:
uniqueid journey
A Collection>Product>Product>Login>Delivery
B Clearance>Search>Product
C Search>Collection>Product
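The STRING_AGG variant needs no fixed column count. A hypothetical JavaScript equivalent (journeys is an illustrative name) groups by uniqueid, sorts each group by order_of_pages, and joins the page names with '>':

```javascript
// In-memory sketch of
//   STRING_AGG(page_flag, '>' ORDER BY order_of_pages) ... GROUP BY uniqueid ORDER BY uniqueid
function journeys(pageViews) {
  const groups = new Map(); // uniqueid -> rows for that id
  for (const view of pageViews) {
    if (!groups.has(view.uniqueid)) groups.set(view.uniqueid, []);
    groups.get(view.uniqueid).push(view);
  }
  return [...groups.keys()].sort().map((id) => {
    const path = groups
      .get(id)
      .slice() // sort a copy so the input is left untouched
      .sort((a, b) => a.order_of_pages - b.order_of_pages) // ORDER BY order_of_pages
      .map((v) => v.page_flag)
      .join(">"); // the '>' delimiter
    return [id, path];
  });
}

const pageViews = [
  ["A", "Collection", 1], ["A", "Product", 2], ["A", "Product", 3],
  ["A", "Login", 4], ["A", "Delivery", 5],
  ["B", "Clearance", 1], ["B", "Search", 2], ["B", "Product", 3],
  ["C", "Search", 1], ["C", "Collection", 2], ["C", "Product", 3],
].map(([uniqueid, page_flag, order_of_pages]) => ({ uniqueid, page_flag, order_of_pages }));

for (const [id, path] of journeys(pageViews)) console.log(id, path);
```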

Column SUM with "min-max" cell format

I'm trying to compute two sums over the same column.
Here are my columns:
| 1-2 | 1 |
| 2 | 2-3 |
| 1 | 5 |
|-------|-------|
| 4 | 8 | Sum 1: takes the "min" value of each cell
| 5 | 9 | Sum 2: takes the "max" value of each cell
Sum 1 Column 1 : 1 + 2 + 1 = 4
Sum 2 Column 1 : 2 + 2 + 1 = 5
Each cell is written either as {num}, an absolute value, or as {min}-{max}, giving the min and max values.
This is for work-time estimates, and we would like to keep this "min-max" concept. We already have something with split columns, but it would be more convenient to keep one column with two possible values in each cell.
For the min:
=ArrayFormula(SUM(--(IFERROR(LEFT(A1:A3,FIND("-",A1:A3)-1),A1:A3))))
For the Max:
=ArrayFormula(SUM(--(IFERROR(RIGHT(A1:A3,len(A1:A3)-FIND("-",A1:A3)),A1:A3))))
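Both formulas split each cell at the first "-": LEFT takes the part before it, RIGHT the part after it, and IFERROR falls back to the whole value when FIND finds no "-". The same logic in a short JavaScript sketch (parseRange and sumMinMax are hypothetical names; the sketch assumes non-negative values, since a leading minus sign would be read as the separator):

```javascript
// Parses a cell written as "{num}" or "{min}-{max}" into [min, max].
// Assumes non-negative values: a leading "-" would be treated as the separator.
function parseRange(cell) {
  const text = String(cell).trim();
  const dash = text.indexOf("-");
  if (dash === -1) {
    const n = Number(text);
    return [n, n]; // absolute value: min and max are the same
  }
  return [Number(text.slice(0, dash)), Number(text.slice(dash + 1))];
}

// Sum 1 uses each cell's min, Sum 2 each cell's max.
function sumMinMax(cells) {
  let min = 0;
  let max = 0;
  for (const cell of cells) {
    const [lo, hi] = parseRange(cell);
    min += lo;
    max += hi;
  }
  return { min, max };
}

console.log(sumMinMax(["1-2", 2, 1])); // { min: 4, max: 5 }
console.log(sumMinMax([1, "2-3", 5])); // { min: 8, max: 9 }
```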

How to list most frequent text values within a range?

I'm an intermediate Excel user trying to solve an issue that feels a little over my head. I'm working with a spreadsheet that contains a number of orders, each associated with a customer account # and with up to 5 metadata "tags". I want to use the customer account # to pull the 5 most commonly occurring metadata tags, in order.
Here is a mock up of the first set of data
Account Number Order Number Metadata
5043 1 A B C D
4350 2 B D
4350 3 B C
5043 4 A D
5043 5 C D
1204 6 A B
5043 7 A D
1204 8 D B
4350 9 B D
5043 10 A C D
and the end result I'm trying to create
Account Number Most Common Tag 2nd 3rd 4th 5th
5043 A C B N/A
4350 B D C N/A N/A
1204 B A C N/A N/A
I was trying to work with the formula suggested here:
=ARRAYFORMULA(INDEX(A1:A7,MATCH(MAX(COUNTIF(A1:A7,A1:A7)),COUNTIF(A1:A7,A1:A7),0)))
But I don't know how to a) use the customer account # as a precondition when counting the text values in the range, b) get around the fact that the MATCH formula only works with a single column of data, and c) read the 2nd, 3rd, 4th, and 5th most common values from this range.
The way I'm formatting this data isn't set in stone. I suspect the way I'm organizing this information is holding me back from simpler solutions, so any suggestions on re-thinking my organization would be just as helpful as insights on how to create a formula to do this.
Implementing this kind of frequency analysis with built-in functions is likely to be a frustrating exercise. Since you are working with Google Sheets, take advantage of custom functions, which are written in JavaScript and placed in a script bound to the sheet (Tools > Script Editor; in newer versions of Sheets, Extensions > Apps Script).
The function I wrote for this purpose is below. Entering something like =tagFrequency(A2:G100) in the sheet will produce the desired output:
+----------------+-----------------+-----+-----+-----+-----+
| Account Number | Most Common Tag | 2nd | 3rd | 4th | 5th |
| 5043 | D | A | C | B | N/A |
| 4350 | B | D | C | N/A | N/A |
| 1204 | B | A | D | N/A | N/A |
+----------------+-----------------+-----+-----+-----+-----+
Custom function
function tagFrequency(arr) {
var dict = {}; // the object in which to store tag counts
for (var i = 0; i < arr.length; i++) {
var acct = arr[i][0];
if (acct == '') {
continue; // ignore empty rows
}
if (!dict[acct]) {
dict[acct] = {}; // new account number
}
for (var j = 2; j < arr[i].length; j++) {
var tag = arr[i][j];
if (tag) {
if (!dict[acct][tag]) {
dict[acct][tag] = 0; // new tag
}
dict[acct][tag]++; // increment tag count
}
}
}
// end of recording, begin sorting and output
var output = [['Account Number', 'Most Common Tag', '2nd', '3rd', '4th', '5th']];
for (var acct in dict) {
var tags = dict[acct];
// sort by tag count (descending), then by tag name (ascending)
var sorted = Object.keys(tags).sort(function (a, b) {
return (tags[a] < tags[b] ? 1 : (tags[a] > tags[b] ? -1 : (a > b ? 1 : -1)));
});
var row = [acct].concat(sorted.slice(0, 5)); // at most 5 tags, so every row has 6 cells
while (row.length < 6) {
row.push('N/A'); // pad with N/A if needed
}
output.push(row); // add row to output
}
return output;
}
You could also get this report:
Account Number Tag count
1204 B 2
1204 A 1
1204 D 1
4350 B 3
4350 D 2
4350 C 1
5043 D 5
5043 A 4
5043 C 3
5043 B 1
with the formula:
=QUERY(
{TRANSPOSE(SPLIT(JOIN("",ArrayFormula(REPT(FILTER(A2:A,A2:A<>"")&",",5))),",")),
TRANSPOSE(SPLIT(ArrayFormula(CONCATENATE(FILTER(C2:G,A2:A<>"")&" ,")),",")),
TRANSPOSE(SPLIT(rept("1,",counta(A2:A)*5),","))
},
"select Col1, Col2, Count(Col3) where Col2 <>' ' group by Col1, Col2
order by Col1, Count(Col3) desc label Col1 'Account Number', Col2 'Tag'")
The formula counts the occurrences of each tag per account, whatever the tag values are.
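If you want to check the QUERY report outside Sheets, the same per-account tag counting can be sketched in plain JavaScript. tagCountReport is a hypothetical name; it assumes numeric account numbers and tags in the third column onward, as in the sample data, and it breaks count ties by tag name, which the QUERY's order by does not guarantee.

```javascript
// rows: sheet-like 2D array. Column 0 = account number (assumed numeric),
// column 1 = order number, columns 2+ = tags (empty cells are "" or missing).
function tagCountReport(rows) {
  const byAccount = new Map(); // account -> Map(tag -> count)
  for (const row of rows) {
    const acct = row[0];
    if (acct === "" || acct === undefined || acct === null) continue; // skip empty rows
    if (!byAccount.has(acct)) byAccount.set(acct, new Map());
    const counts = byAccount.get(acct);
    for (const tag of row.slice(2)) {
      if (tag) counts.set(tag, (counts.get(tag) || 0) + 1); // skip empty tag cells
    }
  }
  const report = [];
  for (const [acct, counts] of byAccount) {
    for (const [tag, count] of counts) report.push([acct, tag, count]);
  }
  // Like "order by Col1, Count(Col3) desc", plus tag name as an explicit tie-breaker.
  report.sort((a, b) =>
    a[0] - b[0] || b[2] - a[2] || (a[1] < b[1] ? -1 : a[1] > b[1] ? 1 : 0));
  return report;
}

const orders = [
  [5043, 1, "A", "B", "C", "D"],
  [4350, 2, "B", "D"],
  [4350, 3, "B", "C"],
  [5043, 4, "A", "D"],
  [5043, 5, "C", "D"],
  [1204, 6, "A", "B"],
  [5043, 7, "A", "D"],
  [1204, 8, "D", "B"],
  [4350, 9, "B", "D"],
  [5043, 10, "A", "C", "D"],
];
for (const [acct, tag, count] of tagCountReport(orders)) console.log(acct, tag, count);
```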

Rails query random models until an attribute reaches a quantity

I have an Exercise model with two attributes: title and points. The goal is to get a random list of exercises whose points add up to no more than a predefined total. For example:
TABLE: exercises
title | points
==============
aaaaa | 3
bbbbb | 5
ccccc | 10
ddddd | 10
eeeee | 5
fffff | 3
#points <= 14
RESULT
aaaaa | 3
bbbbb | 5
eeeee | 5
or
aaaaa | 3
ccccc | 10
or
ccccc | 10
fffff | 3
... etc ...
Something like:
select_values("SELECT * FROM exercises WHERE SUM(points) < 14")
The following SQL should work in MySQL.
SET @psum := 0;
SELECT t1.* FROM (
SELECT m.*,
(@psum := @psum + m.points) AS cumulative_points
FROM (SELECT title, points FROM exercises r ORDER BY RAND()) m
) t1
WHERE t1.cumulative_points <= 14;
However, the ActiveRecord version looks a bit messy, e.g. something like:
random_exercises = Exercise.transaction do
Exercise.connection.execute("SET @psum = 0;")
# this is returned, because it's the last line in the block
Exercise.find_by_sql(%Q|
SELECT t1.* FROM (
SELECT m.*,
(@psum := @psum + m.points) AS cumulative_points
FROM (SELECT title, points from exercises r ORDER BY RAND()) m
) t1
WHERE t1.cumulative_points <= 14
|)
end
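The idea behind that SQL (shuffle the rows, keep a running total, stop once it would exceed the limit) can be sketched in JavaScript for clarity. randomUpToLimit and its optional random parameter are hypothetical; the parameter exists only so a test can make the shuffle deterministic. Like the SQL, it returns a prefix of one random ordering rather than searching for the combination closest to the limit, and it assumes all points are positive, in which case stopping at the first overflow gives the same rows as filtering on cumulative_points <= 14.

```javascript
// exercises: array of { title, points } with positive points.
// random: optional random-number source (hypothetical hook for deterministic tests).
function randomUpToLimit(exercises, limit, random = Math.random) {
  // Fisher-Yates shuffle of a copy, standing in for ORDER BY RAND().
  const shuffled = exercises.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Running total, like cumulative_points; stop at the first overflow.
  const picked = [];
  let total = 0;
  for (const exercise of shuffled) {
    if (total + exercise.points > limit) break;
    total += exercise.points;
    picked.push(exercise);
  }
  return picked;
}

const exercises = [
  { title: "aaaaa", points: 3 },
  { title: "bbbbb", points: 5 },
  { title: "ccccc", points: 10 },
  { title: "ddddd", points: 10 },
  { title: "eeeee", points: 5 },
  { title: "fffff", points: 3 },
];
console.log(randomUpToLimit(exercises, 14).map((e) => `${e.title} | ${e.points}`).join("\n"));
```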
I think you need something like this:
Exercise.where(...).group(...).having('SUM(points) < ?', predefined_value)
Or, in your case, simply:
Exercise.having('SUM(points) < ?', predefined_value)
