How do I handle nan values while using pandas join function?

My dataframe initially looks like this:
col1  col2  col3
a     b     c
a     b     d
a     b     nan
e     f     nan
Dataframe after replacing nan values with empty string:
col1  col2  col3
a     b     c
a     b     d
a     b     ''
e     f     ''
I want the dataframe to look like:
col1  col2  col3
a     b     c,d
e     f
import pandas as pd
import numpy as np

airflow_df = pd.read_csv("/Users/syc/astro-airflow-localdev/dags/export_dataframe.csv")
cols = ['trgt_schema', 'trgt_tbl', 'src_tbl']
subset_airflow = pd.DataFrame(airflow_df, columns=cols)
# Split the comma-separated src_tbl values into one row per value
trgt_df_explode = subset_airflow.assign(src_tbl=subset_airflow.src_tbl.str.split(',')).explode('src_tbl')
# Normalize case and surrounding whitespace, then drop duplicate rows
trgt_df_explode['src_tbl'] = trgt_df_explode['src_tbl'].str.lower()
trgt_df_explode['src_tbl'] = trgt_df_explode['src_tbl'].str.strip()
trgt_df_explode = trgt_df_explode.drop_duplicates()
trgt_df_explode_nan = trgt_df_explode.replace(np.nan, None, regex=True)
# Join the src_tbl values of each (trgt_schema, trgt_tbl) group; map(str, x) turns missing values into text
trgt_df_explode_nan['src_tbl'] = trgt_df_explode_nan.groupby(['trgt_schema', 'trgt_tbl'])['src_tbl'].transform(lambda x: ', '.join(map(str, x)))
trgt_df_explode = trgt_df_explode.drop_duplicates()

You can use the dropna() function to remove the nan values before joining. More details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
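As a minimal sketch of how dropna() could fit into the join, the example below rebuilds the question's first table as a toy frame (the data and the name result are assumptions made for illustration) and drops the missing values inside each group before joining. It is one possible approach, not necessarily what the asker's pipeline needs.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question's first table (nan in col3 for two rows).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "e"],
    "col2": ["b", "b", "b", "f"],
    "col3": ["c", "d", np.nan, np.nan],
})

# Drop missing values inside each group before joining, so a group that
# holds only nan collapses to an empty string instead of the text "nan".
result = (
    df.groupby(["col1", "col2"])["col3"]
      .agg(lambda s: ",".join(s.dropna()))
      .reset_index()
)

print(result)
```

Because the missing values are dropped per group rather than on the whole frame, the (e, f) row is kept with an empty col3 instead of disappearing.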

Related

How to repeat row N times and increment date by 1 for each new row in Google Sheets?

I have rows with a start date and an end date. I need to repeat each row N times and increment a new date column by one day for each repeated row.
N = the number of days between the start date and end date (inclusive)
My table:
Column A  Start date  End date
A         10/09/2022  12/09/2022
B         15/09/2022  16/09/2022
C         08/09/2022  12/09/2022
The result I'd like to generate automatically (new rows will often be added):
Column A  Start date  End date    Date
A         10/09/2022  12/09/2022  10/09/2022
A         10/09/2022  12/09/2022  11/09/2022
A         10/09/2022  12/09/2022  12/09/2022
B         15/09/2022  16/09/2022  15/09/2022
B         15/09/2022  16/09/2022  16/09/2022
C         08/09/2022  12/09/2022  08/09/2022
C         08/09/2022  12/09/2022  09/09/2022
C         08/09/2022  12/09/2022  10/09/2022
C         08/09/2022  12/09/2022  11/09/2022
C         08/09/2022  12/09/2022  12/09/2022
I hope my need is clear.
Thanks,
I've tried THIS, but that solution repeats rows a fixed N times, while I need N to be dynamic.
UPDATE
I thought it would be easy to adapt the solution to my exact need, but that's not the case. I've received two great solutions that work with my first example, but not with the full need.
Here is an example of the exact need:
Col1  Col2  Col3  Col4  Col5  Col6  Col7  Col8  Start date  End date    Col11  Col12  Col13  Col14  Col15  Col16  Col17  Col18  Col19
A     B     C     D     E     F     G     H     10/09/2022  24/09/2022  K      L      M      N      O      P      Q      R      S
T     U     V     W     X     Y     Z     A     05/10/2022  17/11/2022  D      E      F      G      H      I      J      K      L
try:
=QUERY(ARRAYFORMULA(SPLIT(FLATTEN(A2:A&"|"&B2:B&"|"&C2:C&"|"&MAP(B2:B,C2:C,LAMBDA(bx,cx,if(bx="",,TRANSPOSE(SEQUENCE(DATEDIF(bx,cx,"d")+1,1,bx,1)))))),"|")),"Select * Where Col4 IS NOT NULL")
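To check what such a formula should return, the row expansion can be sketched outside Sheets. The Python below only illustrates the logic (one output row per day from the start date to the end date, inclusive) using the question's three sample rows. The helper name expand_rows is hypothetical and not part of the answer.

```python
from datetime import date, timedelta


def expand_rows(rows):
    """Repeat each (label, start, end) row once per day from start to end, inclusive."""
    out = []
    for label, start, end in rows:
        n_days = (end - start).days + 1  # inclusive day count, so N is dynamic per row
        for offset in range(n_days):
            out.append((label, start, end, start + timedelta(days=offset)))
    return out


# Sample rows from the question (dates are day/month/year there).
rows = [
    ("A", date(2022, 9, 10), date(2022, 9, 12)),
    ("B", date(2022, 9, 15), date(2022, 9, 16)),
    ("C", date(2022, 9, 8), date(2022, 9, 12)),
]

expanded = expand_rows(rows)
for row in expanded:
    print(row)
```

With the sample data this yields 3 rows for A, 2 for B and 5 for C, which matches the expected table in the question.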
-
Here's one way to do that using REDUCE.
=ARRAYFORMULA(
{A1:C1,"Date";
QUERY(
REDUCE(
{"","","",""},
BYROW(
FILTER(A2:C,A2:A<>"",B2:B<>"",C2:C<>""),
LAMBDA(row,JOIN("❄️",row))),
LAMBDA(acc,cur,
{acc;
LAMBDA(a,start,end,
LAMBDA(dif,
{IF(dif,{a,start,end}),SEQUENCE(MAX(dif),1,start)})
(SEQUENCE(end-start+1)))
(INDEX(SPLIT(cur,"❄️"),,1),
INDEX(SPLIT(cur,"❄️"),,2),
INDEX(SPLIT(cur,"❄️"),,3))})),
"OFFSET 1",0)})
UPDATE
=ARRAYFORMULA(
QUERY(
{TRIM(QUERY(
SPLIT(REDUCE(,
BYROW(A2:S3,LAMBDA(row,JOIN("❄",row)))&"♥"&J2:J3-I2:I3+1,
LAMBDA(
acc,cur,
{acc;
IF(SEQUENCE(INDEX(SPLIT(cur,"♥",,),,2)),
INDEX(SPLIT(cur,"♥",,),,1))})),"❄",,),
"OFFSET 1",0)),
QUERY(
FLATTEN(MAP(I2:I3,J2:J3,LAMBDA(start,end,SEQUENCE(1,end-start+1,start)))),
"WHERE Col1 IS NOT NULL")},
"SELECT Col"&JOIN(", Col",SEQUENCE(COLUMN(J1)))&
", Col"&COLUMNS(A2:S3)+1&
", Col"&JOIN(", Col",SEQUENCE(COLUMNS(A2:S3)-COLUMN(J1),1,COLUMN(J1)+1))))

List unique items, count and sort numerically by descending order, and combine with validate menu for search period

I need to apply the solution found in this question:
Using single formula to list unique items, count and sort numerically by descending order and then alphabetically for items with same count
This time I want to add a data validation menu for the search period.
I can do this for a single cell, but I'm not able to apply it to the solution above.
The formula for the search period is this:
=COUNTIFS($B3:$B,"*apple*",$A3:$A,">="&TODAY()- VLOOKUP(
SUBSTITUTE(D2," ",""),
{"24HOURS",0;
"2DAYS",1;
"3DAYS",4;
"7DAYS",7;
"2WEEKS",14;
"1MONTH",30;
"3MONTHS",90;
"6MONTHS",180;
"1YEAR",365;
"2YEARS",730;
"3YEARS",1095;
"TOTAL",999999},
2,FALSE))
Formula taken from the solution to the question above:
=QUERY(B:B,"Select B, count(B) where B matches '^(?!(?:ITEMS|ExcludeB|ExcludeC)$).+' group by B order by count(B) DESC label count(B) ''")
My dummy file:
https://docs.google.com/spreadsheets/d/1iB4BnqhTBVNuCCQ4GnEIu95gbzYb0T9H9A3Pi1W5AZk/edit?usp=sharing
Is such a thing possible? Any pointers on how this can be done? Thank you
In Excel (since you tagged it) you can use the following in Office 365:
=LET(a,A2:INDEX(B:B,LOOKUP(2,1/(A:A<>""),ROW(B:B))),
aa,INDEX(a,,1),
ab,INDEX(a,,2),
u,UNIQUE(INDEX(a,,2)),
c,COUNTIF(ab,u),
d,COUNTIFS(ab,u,
aa,">="&TODAY()
-VLOOKUP(SUBSTITUTE(D2," ",""),
{"24HOURS",0;
"2DAYS",1;
"3DAYS",4;
"7DAYS",7;
"2WEEKS",14;
"1MONTH",30;
"3MONTHS",90;
"6MONTHS",180;
"1YEAR",365;
"2YEARS",730;
"3YEARS",1095;
"TOTAL",999999},
2,0)),
SORT(CHOOSE({1,2,3},u,c,d),{2,1,1},{-1,1,1}))
This should do it.
=QUERY(
A:B,
"Select B, count(B)
where
B matches '^(?!(?:ITEMS|ExcludeB|ExcludeC)$).+' and
A >= date '"&
TEXT(
IFERROR(
VLOOKUP(
D2,
{"2 4 H O U R S",TODAY()-1;
"3 D A Y S",TODAY()-3;
"7 D A Y S",TODAY()-7;
"2 W E E K S",TODAY()-14;
"1 M O N T H",EDATE(TODAY(),-1);
"3 M O N T H S",EDATE(TODAY(),-3);
"6 M O N T H S",EDATE(TODAY(),-6);
"1 Y E A R",EDATE(TODAY(),-12);
"2 Y E A R S",EDATE(TODAY(),-24);
"3 Y E A R S",EDATE(TODAY(),-36)},
2,FALSE),0),"yyyy-mm-dd")&"'
group by B
order by
count(B) DESC,
B asc
label count(B) ''")
Using an array
=QUERY(
{A3:A,E3:E},
"Select Col2, count(Col2)
where
Col2 matches '^(?!(?:ITEMS|ExcludeB|ExcludeC)$).+' and
Col1 >= date '"&
TEXT(
IFERROR(
VLOOKUP(
G2,
{"2 4 H O U R S",TODAY()-1;
"3 D A Y S",TODAY()-3;
"7 D A Y S",TODAY()-7;
"2 W E E K S",TODAY()-14;
"1 M O N T H",EDATE(TODAY(),-1);
"3 M O N T H S",EDATE(TODAY(),-3);
"6 M O N T H S",EDATE(TODAY(),-6);
"1 Y E A R",EDATE(TODAY(),-12);
"2 Y E A R S",EDATE(TODAY(),-24);
"3 Y E A R S",EDATE(TODAY(),-36)},
2,FALSE),0),"yyyy-mm-dd")&"'
group by Col2
order by
count(Col2) DESC,
Col2 asc
label
Col2 '',
count(Col2) ''")
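For comparison, the same filter, count and sort logic can be sketched in Python. This is an illustration under stated assumptions, not a translation of the formulas above: the helper count_items, the sample records and the fixed "today" are all hypothetical, and the lookback is expressed in days only (the QUERY versions use EDATE for month and year periods).

```python
from collections import Counter
from datetime import date, timedelta


def count_items(records, today, lookback_days, excluded=()):
    """Count items dated within the lookback window, most frequent first.

    Ties are broken alphabetically, mirroring
    'order by count(B) DESC, B asc' in the QUERY version.
    """
    cutoff = today - timedelta(days=lookback_days)
    counts = Counter(
        item
        for day, item in records
        if item and item not in excluded and day >= cutoff
    )
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical (date, item) pairs standing in for columns A and B.
records = [
    (date(2022, 9, 1), "apple"),
    (date(2022, 9, 10), "banana"),
    (date(2022, 9, 11), "apple"),
    (date(2022, 9, 12), "cherry"),
    (date(2022, 9, 12), "banana"),
    (date(2022, 9, 12), "ITEMS"),
]

ranked = count_items(
    records, today=date(2022, 9, 12), lookback_days=7, excluded={"ITEMS"}
)
print(ranked)
```

Mapping the dropdown label to lookback_days would play the role of the VLOOKUP table in the formulas.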

How to parse Google Sheets query (show only the first column)

I'm trying to show only the first result from column (G) of the output of this query:
=QUERY(data!A1:O7122; "SELECT G, COUNT(G) WHERE E = '"&A3&"' GROUP BY G LABEL G '', COUNT(G) ''";0)
So I tried this:
=INDEX(QUERY(data!A1:O7122; "SELECT G, COUNT(G) WHERE E = '"&A3&"' GROUP BY G LABEL G '', COUNT(G) ''";0), 1, 1)
But it doesn't work (error). Any idea...? :)
I can produce your expected output with a slight adjustment to your formula, as follows, although I did not use your raw data for the calculation:
=index(QUERY(data!A1:O18, "SELECT G, Count(G) WHERE E = '"&A1&"' Group By G Order by Count(G) Desc Label Count(G) ''",0),1,1)

ARRAYFORMULA not expanding

I have the following ARRAYFORMULA entered into cell D2, and it seems like it should be expanding downward throughout the entire column (I'll do the "check for blank A2" addition to the formula once this part works correctly), but nothing is expanding. Where have I gone astray?
Here's the formula:
=ArrayFormula(IFERROR(QUERY('Form Responses 1'!$A:$J,"select count(E) where (C contains '"& $A2:A &"' or C contains '"& $B2:B &"') and (E) Contains 'Option' label count(E) ''"),0)+
IFERROR(QUERY('Form Responses 1'!$A:$J,"select count(F) where (C contains '"& $A2:A &"' or C contains '"& $B2:B &"') and (F) Contains 'Option' label count(F) ''"),0)+
IFERROR(QUERY('Form Responses 1'!$A:$J,"select count(G) where (C contains '"& $A2:A &"' or C contains '"& $B2:B &"') and (G) Contains 'Option' label count(G) ''"),0)+
IFERROR(QUERY('Form Responses 1'!$A:$J,"select count(H) where (C contains '"& $A2:A &"' or C contains '"& $B2:B &"') and (H) Contains 'Option' label count(H) ''"),0)+
IFERROR(QUERY('Form Responses 1'!$A:$J,"select count(I) where (C contains '"& $A2:A &"' or C contains '"& $B2:B &"') and (I) Contains 'Option' label count(I) ''"),0))
A link to a copy of the spreadsheet follows:
https://docs.google.com/spreadsheets/d/14E1QEfcTYiwOG_gORkc8YoRuWEPM7FweLljj0hQCQTc/edit?usp=sharing
try:
=ARRAYFORMULA(IF(TRIM(B2:B)="",,IFNA(IFNA(VLOOKUP((B2:B),
QUERY({IFNA(IFNA(REGEXEXTRACT('Form Responses 1'!C2:C, "\b"&TEXTJOIN("\b|\b", 1, (B2:B))&"\b"),
REGEXEXTRACT('Form Responses 1'!C2:C, "\b"&TEXTJOIN("\b|\b", 1, TRIM(A2:A))&"\b"))),
MMULT(N(REGEXMATCH('Form Responses 1'!E2:I, "(?i)Option")), {1;1;1;1;1})},
"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"), 2, 0), VLOOKUP(TRIM(A2:A),
QUERY({IFNA(IFNA(REGEXEXTRACT('Form Responses 1'!C2:C, "\b"&TEXTJOIN("\b|\b", 1, (B2:B))&"\b"),
REGEXEXTRACT('Form Responses 1'!C2:C, "\b"&TEXTJOIN("\b|\b", 1, TRIM(A2:A))&"\b"))),
MMULT(N(REGEXMATCH('Form Responses 1'!E2:I, "(?i)Option")), {1;1;1;1;1})},
"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"), 2, 0)), 0)))
UPDATE:
=ARRAYFORMULA(IF(C2:C="",,IFNA(IFNA(VLOOKUP(B2:B, QUERY({TRIM(FLATTEN(QUERY(TRANSPOSE(
REGEXEXTRACT('Form Responses 1'!C2:C, TEXTJOIN("|", 1, "("&SUBSTITUTE(TRIM(UNIQUE(
FILTER({Sheet3!B2:B; Sheet3!A2:A}, {Sheet3!B2:B; Sheet3!A2:A}<>""))), " ", ").+(")&")"))),,9^9))) ,
MMULT(N(REGEXMATCH('Form Responses 1'!E2:I, "(?i)Option")), {1;1;1;1;1})},
"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"), 2, 0),
VLOOKUP(A2:A, QUERY({TRIM(FLATTEN(QUERY(TRANSPOSE(
REGEXEXTRACT('Form Responses 1'!C2:C, TEXTJOIN("|", 1, "("&SUBSTITUTE(TRIM(UNIQUE(
FILTER({Sheet3!B2:B; Sheet3!A2:A}, {Sheet3!B2:B; Sheet3!A2:A}<>""))), " ", ").+(")&")"))),,9^9))) ,
MMULT(N(REGEXMATCH('Form Responses 1'!E2:I, "(?i)Option")), {1;1;1;1;1})},
"select Col1,sum(Col2) where Col1 is not null group by Col1 label sum(Col2)''"), 2, 0)), 0)))

Get values from other sheet based on other cell values

How can I get values from another sheet only when another cell is empty? I'm trying to get the values from Sheet1 column B where Sheet1 column H is empty.
For example, if Sheet1 B2 to B4 have values and, in column H of the same sheet, H2 and H3 contain text, only the B4 value should be printed.
Here is what I tried, which does not work:
=query(Sheet1!B2:B, "Select Sheet1!b where Sheet1!H <> ''")
try:
=FILTER(Sheet1!B2:B, Sheet1!B2:B<>"", Sheet1!C2:C="")
or:
=QUERY(Sheet1!B2:C, "select B where B is not null and C is null", 0)
or:
=QUERY(Sheet1!B2:C, "select B where B !='' and C =''", 0)
