Informix - Fetching Records from a table - informix

Consider the table below:
Col1 Col2 Col3
123 ABC 20/5/2010
123 CDS 21/5/2010
123 VDS 22/5/2010
123 ABC 23/5/2010
123 VDS 24/5/2010
123 CDS 25/5/2010
123 ABC 26/5/2010
I need to fetch the first occurrence of CDS and calculate the time difference between it and the next row.
Similarly, I need to find the next occurrence of CDS and calculate the time difference between it and the row that follows.
This has to go on until there are no occurrences of CDS left in the table.
I would be grateful if someone could help with this!

What is your desired output? Is it something like:
123 ABC 20/5/2010
123 CDS 21/5/2010
123 VDS 22/5/2010 1 day, 0:00:00
123 ABC 23/5/2010
123 VDS 24/5/2010
123 CDS 25/5/2010
123 ABC 26/5/2010 1 day, 0:00:00
If so, then I think the simplest way to get it is to write a program in Python or a similar language: select your data using SQL and calculate the date difference in the language of your choice. In "normal" SQL there is no such thing as a "next row", whereas in other languages you can save the date of the last CDS and use it in the next loop iteration.
This output was created with Python:
import time
import datetime

TXT = """123 ABC 20/5/2010
123 CDS 21/5/2010
123 VDS 22/5/2010
123 ABC 23/5/2010
123 VDS 24/5/2010
123 CDS 25/5/2010
123 ABC 26/5/2010"""

def txt2time(ts):
    # Parse a d/m/Y date string into seconds since the epoch.
    tpl = time.strptime(ts, '%d/%m/%Y')
    return time.mktime(tpl)

last_date = ''  # date of the previous row if that row was CDS, else ''
for line in TXT.split('\n'):
    date_diff = ''
    arr = line.split()
    if last_date:
        # The previous row was CDS: report the gap to this row.
        date_diff = '%s' % (datetime.timedelta(seconds = (txt2time(arr[2]) - txt2time(last_date))))
        last_date = ''
    if arr[1] == 'CDS':
        last_date = arr[2]
    print('%s %s' % (line.strip(), date_diff))
As you can see, I iterate over text lines, but you can easily replace the loop over split('\n') with a loop over a recordset:
for row in cursor.fetchall():
    if row[1] == 'CDS':  # row[1] is Col2 when you select Col1, Col2, Col3
        ...
(You can find Python/Jython examples on many web pages, including my questions and answers on SO.)
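As a rough sketch of that recordset loop, the following uses Python's built-in sqlite3 module as a stand-in for an Informix driver (an assumption made only so the example runs anywhere). The table name tab1 and the ISO date strings are likewise placeholders, not part of the original data layout.

```python
import sqlite3
from datetime import date

def cds_gaps(rows):
    """Yield (col1, cds_date, next_date, gap) for each CDS row and the row after it."""
    pending = None  # (col1, date) of a CDS row still waiting for its successor
    for col1, col2, col3 in rows:
        current = date.fromisoformat(col3)
        if pending is not None:
            yield (pending[0], pending[1], current, current - pending[1])
            pending = None
        if col2 == 'CDS':
            pending = (col1, current)

# In-memory database standing in for the real table (hypothetical name tab1).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT)')
conn.executemany('INSERT INTO tab1 VALUES (?, ?, ?)', [
    (123, 'ABC', '2010-05-20'), (123, 'CDS', '2010-05-21'),
    (123, 'VDS', '2010-05-22'), (123, 'ABC', '2010-05-23'),
    (123, 'VDS', '2010-05-24'), (123, 'CDS', '2010-05-25'),
    (123, 'ABC', '2010-05-26'),
])

# The ORDER BY defines what "next row" means; the cursor is iterated directly.
cursor = conn.execute('SELECT col1, col2, col3 FROM tab1 ORDER BY col3')
gaps = list(cds_gaps(cursor))
for col1, start, end, gap in gaps:
    print(col1, start, end, gap.days)
```

The explicit ORDER BY matters here: without it the database is free to return rows in any order, and "next row" would have no fixed meaning.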
I think it may be possible to find such a solution in SQL alone. You would need a function that returns the date from the next row, and I think such a function may not be easy to create, because it would have to behave just like your SELECT, with the same filtering and ordering.

With the sample data, there is but one pair of rows with the value CDS in Col2, so there is but one row in the output. It is not clear what you'd expect if there were 4 rows with CDS. Your wording might be intended to imply that the first pair would contribute one row and the second pair would contribute a second row. Or it might be that you need to find the differences between consecutive occurrences of CDS, so that the 4 rows of data would produce 3 rows of output. (The question also leaves it open to discussion whether this applies within a single value for Col1, or whether the Col1 is immaterial to the result.)
Since the ambiguity exists, I'll tackle the second option, assuming that the CDS entries all have to be for the same value in Col1 (but that there may be many different values in Col1).
You don't mention which version of Informix you have; I am assuming IDS 11.50. The syntax may not work in earlier versions.
As so often, the table is anonymous in the question - so it is hereby designated Tab1.
Query
SELECT t1a.col1, t1a.col2, t1a.col3, t1b.col3 AS col4,
       t1b.col3 - t1a.col3 AS delta
  FROM tab1 AS t1a JOIN tab1 AS t1b
    ON t1a.col1 = t1b.col1 AND
       t1a.col3 < t1b.col3 AND
       t1a.col2 = t1b.col2 AND
       t1a.col2 = 'CDS' AND
       NOT EXISTS(SELECT *
                    FROM tab1 AS t1c
                   WHERE t1c.col3 > t1a.col3 AND
                         t1c.col3 < t1b.col3 AND
                         t1c.col1 = t1a.col1 AND
                         t1c.col2 = t1a.col2
                 );
The '<' join orders the pairs of dates so the 't1a' value is less than the 't1b' value; the NOT EXISTS clause ensures that the two rows are adjacent to each other by asserting that there is no row in tab1 with the same values for col1 and col2 and with a date that comes after the earlier date and before the later date. This is the crucial part of the query.
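To experiment with the adjacency idea without an Informix instance, here is a minimal sketch that runs an equivalent self-join in SQLite through Python's sqlite3 module. SQLite is a stand-in, not the engine the answer targets: dates are stored as ISO text and the day difference uses SQLite's julianday() instead of Informix DATE arithmetic. Only a subset of the sample rows is loaded.

```python
import sqlite3

QUERY = """
SELECT a.col1, a.col3, b.col3,
       CAST(julianday(b.col3) - julianday(a.col3) AS INTEGER) AS delta
  FROM tab1 AS a JOIN tab1 AS b
    ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 < b.col3
 WHERE a.col2 = 'CDS'
   AND NOT EXISTS (SELECT 1 FROM tab1 AS c
                    WHERE c.col1 = a.col1 AND c.col2 = a.col2
                      AND c.col3 > a.col3 AND c.col3 < b.col3)
 ORDER BY a.col1, a.col3
"""

# Subset of the sample data, with dates rewritten as ISO strings for SQLite.
ROWS = [
    (123, 'CDS', '2010-05-21'), (123, 'VDS', '2010-05-22'),
    (123, 'CDS', '2010-05-25'), (123, 'CDS', '2010-04-14'),
    (123, 'CDS', '2010-04-22'), (120, 'CDS', '2010-05-11'),
    (120, 'CDS', '2010-05-16'), (122, 'CDS', '2010-04-25'),
]

def adjacent_cds_pairs(rows):
    """Return (col1, earlier, later, days) for consecutive CDS rows per col1."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE tab1 (col1 INTEGER, col2 TEXT, col3 TEXT)')
    conn.executemany('INSERT INTO tab1 VALUES (?, ?, ?)', rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

pairs = adjacent_cds_pairs(ROWS)
for row in pairs:
    print(row)
```

Col1 = 122 has a single CDS row, so it has no partner and contributes nothing to the result, just as in the output shown for the original query.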
Result
col1 col2 col3 col4 delta
123 CDS 21/05/2010 25/05/2010 4
123 CDS 14/04/2010 22/04/2010 8
123 CDS 22/04/2010 21/05/2010 29
120 CDS 11/05/2010 16/05/2010 5
121 CDS 21/04/2010 30/04/2010 9
Schema
CREATE TABLE tab1 (col1 SMALLINT, col2 CHAR(3), col3 DATE);
Data
Running with DBDATE=DMY4/.
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '20/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '21/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '22/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '23/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '24/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '25/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '26/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '10/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '14/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '12/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '13/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'VDS', '19/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'CDS', '22/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (123, 'ABC', '16/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'ABC', '10/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'CDS', '11/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'VDS', '12/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'ABC', '13/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'VDS', '14/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (120, 'CDS', '16/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '17/5/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'CDS', '21/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '22/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'VDS', '23/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '24/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'VDS', '25/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '26/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '27/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '28/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'ABC', '29/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (121, 'CDS', '30/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'ABC', '23/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'VDS', '24/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'CDS', '25/4/2010');
INSERT INTO tab1(col1, col2, col3) VALUES (122, 'ABC', '26/4/2010');

Either I don't understand what seems like a simple question, or others are overthinking it.
SELECT col1, col2, col3,
(SELECT MIN(a.col3) FROM tab1 a WHERE a.col3 > z.col3) - z.col3 AS DaysDiff
FROM tab1 z
WHERE z.col2 = "CDS"

Related

Google Sheet Query Pivot - count if a date is between a range of dates

https://docs.google.com/spreadsheets/d/1AQUNPb4d3EqeSZb9SwfzemR7MfSIkMFkDAZ8rPLA3rc/edit?usp=sharing
I have 3 columns:
City
Start date
End date
I want 3 tables:
Pivot table by city of people who entered during the year (Done)
=query(QUERY({$A$2:$C$10};
"select Col1, count(Col1)
where year(Col2)=2018 or year(Col2)=2019 or year(Col2)=2020
group by Col1
pivot year(Col2)");
"select * order by Col4 desc, Col3 desc, Col2 desc label Col1 'Start'";1)
Pivot table by city of people who left during the year (Done)
=query(QUERY({$A$2:$C$10};
"select Col1, count(Col1)
where (year(Col3)=2018 or year(Col3)=2019 or year(Col3)=2020)
group by Col1
pivot year(Col3)");
"select * order by Col4 desc, Col3 desc, Col2 desc label Col1 'End'";1)
Pivot table by city of people who stayed during the year (Fail)
=query(QUERY({$A$2:$C$10};
"select Col1, count(Col1)
where
(2018>=YEAR(Col2) and 2018<=YEAR(Col3) or
(2019>=YEAR(Col2) and 2019<=YEAR(Col3) or
(2020>=YEAR(Col2) and 2020<=YEAR(Col3)
group by Col1
pivot year(Col2)");
"select * order by Col4 desc, Col3 desc, Col2 desc label Col1 'Between'";1)
For the last one, I am having trouble.
I guess my WHERE condition is wrong and my pivot is not working either.
I know pivot year(Col2) can't work for the last one: if a row has a start date in 2015 and an end date in 2020, I want it to be counted, but my pivot won't show it under 2018, 2019, and 2020.
Any ideas?
Thanks for your time.
use:
=ARRAYFORMULA(QUERY(QUERY(UNIQUE(SPLIT(FLATTEN(IF(DAYS(C2:C10; B2:B10)>=
SEQUENCE(1; 5000; ); ROW(A2:A10)&"×"&A2:A10&"×"&TEXT(B2:B10+SEQUENCE(1; 5000; );
"yyyy-1-1"); )); "×"));
"select Col2,count(Col2)
where year(Col3) matches '2018|2019|2020'
group by Col2
pivot year(Col3)
label Col2'Between'");
"order by Col4 desc, Col3 desc, Col2 desc"))
update:
=ARRAYFORMULA(QUERY(QUERY(SPLIT(FLATTEN(IF(DATEDIF(B2:B; C2:C; "Y")>=
SEQUENCE(1; MAX(DATEDIF(B2:B; C2:C; "Y")); ); ROW(A2:A)&"×"&A2:A&"×"&
YEAR(B2:B)+SEQUENCE(1; MAX(DATEDIF(B2:B; C2:C; "Y")); )&"-1-1"; )); "×");
"select Col2,count(Col2)
where year(Col3) matches '2018|2019|2020'
group by Col2
pivot year(Col3)
label Col2'Between'");
"order by Col4 desc, Col3 desc, Col2 desc"))

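The idea behind the formulas above (expand each row into every year its date range touches, then count per city and year) may be easier to follow outside of Sheets. The sketch below shows that logic in Python; the city names and dates are made-up sample data, not taken from the linked spreadsheet.

```python
from collections import Counter
from datetime import date

def count_present(rows, years):
    """Count, per (city, year), the rows whose start..end range touches that year."""
    counts = Counter()
    for city, start, end in rows:
        for year in years:
            # A stay covers the year if it began on or before it and ended on or after it.
            if start.year <= year <= end.year:
                counts[(city, year)] += 1
    return counts

# Hypothetical sample rows: (city, start date, end date).
rows = [
    ('Paris', date(2015, 3, 1), date(2020, 6, 30)),
    ('Paris', date(2019, 1, 15), date(2019, 12, 1)),
    ('Lyon', date(2018, 5, 1), date(2019, 2, 1)),
]

counts = count_present(rows, [2018, 2019, 2020])
for (city, year), n in sorted(counts.items()):
    print(city, year, n)
```

The first Paris row starts in 2015, yet it is counted under 2018, 2019, and 2020, which is exactly what pivoting on the start year alone cannot produce.
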
Sort/Order query result based on calculated fields

I have a list of transactions in Transactions tab and in Summary I would like to summarize by tickers the performance. I am using query for grouping the data and using aggregate functions to calculate %-Win, %-Lost (see the link at the bottom with the sample spreadsheet):
Transaction tab:
=query({Transactions!B:B,Transactions!C:F},
"select Col1, count(Col2),sum(Col4),
(count(Col2)/(count(Col2)+count(Col3))), count(Col3),
sum(Col5),
(count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL
and
(Col2 is not NULL or Col3 is not Null)
group by Col1
label count(Col2) 'Win', sum(Col4) '$-Win',
(count(Col2)/(count(Col2)+count(Col3))) '%-Win',
count(Col3) 'Lost', sum(Col5) '$-Lost',
(count(Col3)/(count(Col3)+count(Col2))) '%-Lost'",1)
Sample of Summary tab:
However, I was not able to obtain Total Transactions, Net Gains, and Exp. Value (Expected Value) by ticker from the query, so I used ARRAYFORMULA instead. That works, but I am not able to sort the result by Expected Value or by Net Gains (FUBO should be first). I was able to calculate percentages using a combination of aggregate functions, but not the additional calculations above directly in the query.
I tried to use query clause order by: sum(Col3)+sum(Col5) (Net gains) but it doesn't work, it only returns a value when there are Win and Lost transactions.
Using Data->Sort Range doesn't provide the expected result either. Because there are different sources of data: the query and the result of Arrayformula.
I guess I would need to obtain all required calculated fields directly from the query and then to order by, or to find a way to sort the result combining the query and Arrayformula results. The clause order by works well for aggregated functions that are present in the select elements, but not when the sorting should happen based on a formula based on calculated columns.
Here you can find a sample file from my real situation:
https://docs.google.com/spreadsheets/d/1xrDSWGJVIsWD6fvAOdMOZkw2rEY9lGPZRb_Ww_nC7YQ/edit?usp=sharing
Note: A possible solution would be to combine all the results into one sort statement, but I am not able to make it work
=sort({
query({Transactions!B2:B,Transactions!C2:F}, "select Col1, count(Col2),sum(Col4), (count(Col2)/(count(Col2)+count(Col3))), count(Col3), sum(Col5), (count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL and (Col2 is not NULL or Col3 is not Null) group by Col1 label count(Col2) '', sum(Col4) '', (count(Col2)/(count(Col2)+count(Col3))) '', count(Col3) '', sum(Col5) '', (count(Col3)/(count(Col3)+count(Col2))) ''",0),
ARRAYFORMULA(if(not(ISBLANK(A2:A)), B2:B+E2:E,)),
ARRAYFORMULA(if(not(ISBLANK(A2:A)), C2:C+F2:F,)),
ARRAYFORMULA(if(not(ISBLANK(A2:A)), (C2:C)*(D2:D) + (F2:F)*(G2:G),))
},10, FALSE)
Likewise, avoiding ARRAYFORMULA by using nested query statements doesn't work:
=sort({
query({Transactions!B2:B,Transactions!C2:F}, "select Col1, count(Col2),sum(Col4), (count(Col2)/(count(Col2)+count(Col3))), count(Col3), sum(Col5), (count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL and (Col2 is not NULL or Col3 is not Null) group by Col1 label count(Col2) '', sum(Col4) '', (count(Col2)/(count(Col2)+count(Col3))) '', count(Col3) '', sum(Col5) '', (count(Col3)/(count(Col3)+count(Col2))) ''",0),
query(query({Transactions!B2:B,Transactions!C2:F}, "select Col1, count(Col2),sum(Col4), (count(Col2)/(count(Col2)+count(Col3))), count(Col3), sum(Col5), (count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL and (Col2 is not NULL or Col3 is not Null) group by Col1 label count(Col2) '', sum(Col4) '', (count(Col2)/(count(Col2)+count(Col3))) '', count(Col3) '', sum(Col5) '', (count(Col3)/(count(Col3)+count(Col2))) ''",0),"select Col2+Col5 label Col2+Col5 ''",0),
query(query({Transactions!B2:B,Transactions!C2:F}, "select Col1, count(Col2),sum(Col4), (count(Col2)/(count(Col2)+count(Col3))), count(Col3), sum(Col5), (count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL and (Col2 is not NULL or Col3 is not Null) group by Col1 label count(Col2) '', sum(Col4) '', (count(Col2)/(count(Col2)+count(Col3))) '', count(Col3) '', sum(Col5) '', (count(Col3)/(count(Col3)+count(Col2))) ''",0), "select Col3+Col6 label Col3+Col6 ''",0),
query(query({Transactions!B2:B,Transactions!C2:F}, "select Col1, count(Col2),sum(Col4), (count(Col2)/(count(Col2)+count(Col3))), count(Col3), sum(Col5), (count(Col3)/(count(Col3)+count(Col2))) where Col1 is not NULL and (Col2 is not NULL or Col3 is not Null) group by Col1 label count(Col2) '', sum(Col4) '', (count(Col2)/(count(Col2)+count(Col3))) '', count(Col3) '', sum(Col5) '', (count(Col3)/(count(Col3)+count(Col2))) ''",0), "select Col3*Col4+Col6*Col7 label Col3*Col4+Col6*Col7 ''",0)
},10, FALSE)
It doesn't give all the result values for Net Gain and Exp. Value.
As you can see, it only provides Net Gains and Exp. Value where there are both Win and Lost values on the same row.
You should fill the blanks with 0.
=SORT(QUERY(query(ArrayFormula({Transactions!B:B,
IF(Transactions!C:F="",0, Transactions!C:F)}),
"select Col1, sum(Col2),sum(Col4),
(sum(Col2)/(sum(Col2)+sum(Col3))),
sum(Col3), sum(Col5), (sum(Col3)/(sum(Col3)+sum(Col2)))
where Col1 is not NULL and NOT (Col2 = 0 and Col3 = 0) group by Col1",1),
"select Col1, Col2, Col3, Col4, Col5, Col6, Col7,Col2+Col5,
Col3+Col6,Col3*Col4+Col6*Col7
label Col2 'Win',Col3 '$-Win', Col4 '%-Win', Col5 'Lost',
Col6 '$-Lost', Col7 '%-Lost', Col2+Col5 'Total Transactions',
Col3+Col6 'Net Gains',Col3*Col4+Col6*Col7 'Exp. Value'",1),
10,FALSE)
Notes:
The condition NOT (Col2 = 0 and Col3 = 0) excludes transactions that were not sold, i.e. Win = 0 and Lost = 0.
The expression IF(Transactions!C:F="",0, Transactions!C:F) replaces empty values with 0 so that the aggregate functions and the arithmetic on their results work as expected.
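The same approach (treat blanks as 0, aggregate per ticker, derive the extra columns, then sort on a derived column) can be sketched in Python. The row layout (ticker, win flag, lost flag, win amount, lost amount) is an assumption about the Transactions tab, and every value below is invented; only the FUBO ticker name comes from the question.

```python
from collections import defaultdict

def summarize(transactions):
    """Aggregate per ticker and sort by expected value, highest first."""
    totals = defaultdict(lambda: [0, 0, 0.0, 0.0])  # wins, losses, $-win, $-lost
    for ticker, win, lost, win_amt, lost_amt in transactions:
        values = [v or 0 for v in (win, lost, win_amt, lost_amt)]  # blank -> 0
        if values[0] == 0 and values[1] == 0:
            continue  # neither a win nor a loss: not sold yet
        for i, v in enumerate(values):
            totals[ticker][i] += v
    summary = []
    for ticker, (wins, losses, win_amt, lost_amt) in totals.items():
        total = wins + losses
        summary.append({
            'ticker': ticker,
            'total': total,
            'net_gain': win_amt + lost_amt,
            'exp_value': win_amt * wins / total + lost_amt * losses / total,
        })
    return sorted(summary, key=lambda r: r['exp_value'], reverse=True)

# Hypothetical rows; None plays the role of an empty cell.
transactions = [
    ('AAPL', 1, None, 200.0, None),
    ('AAPL', None, 1, None, -100.0),
    ('FUBO', 1, None, 500.0, None),
    ('FUBO', 1, None, 300.0, None),
    ('TSLA', None, None, None, None),
]

summary = summarize(transactions)
for row in summary:
    print(row)
```

Because FUBO has no losing rows, its loss total would be empty rather than 0 without the blank-to-zero step, and any sum involving it would be empty too; that mirrors the missing Net Gains and Exp. Value cells described in the question.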

Query + Importrange function doesn't work with Contains parameter ( QUERY : PARSE_ERROR)

I would like to import a set of columns from a sheet to another with a "filter" parameter which exclude some value.
My formula looks like this:
=Query(importrange("URL";"sheet name!a2:be");"SELECT Col1, Col3, Col4, Col26, Col8, Col30, Col40, Col41, Col44, Col45, Col49 WHERE Col8 NOT CONTAINS 'alc'")
However, I ran into this error:
#VALEUR! Unable to parse query string for parameter 2 of function QUERY : PARSE_ERROR: Encountered " "Col8 "" at line 1, column 86.
Was expecting one of: "(" ... "(" ... .
I don't know why it doesn't work. CONTAINS is a valid operator, and even after reading the Google Visualization API query language documentation I found no error in my syntax.
Issue:
The correct syntax is NOT Col8 CONTAINS and not Col8 NOT CONTAINS.
Solutions:
=Query(importrange("URL";"sheet name!a2:be");"SELECT Col1, Col3, Col4, Col26, Col8, Col30, Col40, Col41, Col44, Col45, Col49 WHERE NOT Col8 CONTAINS 'alc'")
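Another approach would be to use Col8 <> 'alc'. Note that this only excludes rows where Col8 is exactly 'alc'; rows that merely contain 'alc' as part of a longer value are kept: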
=Query(importrange("URL";"sheet name!a2:be");"SELECT Col1, Col3, Col4, Col26, Col8, Col30, Col40, Col41, Col44, Col45, Col49 WHERE Col8 <> 'alc'")
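The two filters above do not select the same rows. A small Python sketch of the difference, using made-up cell values in place of Col8:

```python
# Hypothetical values standing in for Col8.
values = ['alc', 'alcohol', 'water', 'calcium']

# Counterpart of: WHERE NOT Col8 CONTAINS 'alc' (substring test).
not_contains = [v for v in values if 'alc' not in v]

# Counterpart of: WHERE Col8 <> 'alc' (whole-value comparison).
not_equal = [v for v in values if v != 'alc']

print(not_contains)
print(not_equal)
```

Only the substring test drops 'alcohol' and 'calcium', so the NOT ... CONTAINS form is the one that matches the stated goal of excluding values containing 'alc'.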

Google sheets Query not returning values

I have the following Query:
=QUERY('Fleury Braz Leme'!$A$31:$I$55;"select H, I, F, E where G = '"&I39&"'")
Some values from column E are being returned, but others are not. As you can see in the case below/above, on that linked sheet, there are corresponding values:
What can I do to correct it?
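QUERY assigns each column a single data type and treats values of the minority type as empty, which may be what is happening in column E. Try converting the range to text first: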
=INDEX(QUERY(TO_TEXT('Fleury Braz Leme'!$A$31:$I$55);
"select Col8, Col9, Col6, Col5
where Col7 = '"&I39&"'"))

How to QUERY (ImportRange) if the "where" clause also uses imported values?

I am trying to query ImportRange data by the condition that a column matches a value from the imported spreadsheet, but it does not work:
=Query(importRange("KEY1","namesheet1!A2:R1600"), "select Col4 where Col7 = (importRange("KEY1","namesheet2!F108:F108")) ",0)
Replace
"select Col4 where Col7 = (importRange("KEY1","namesheet2!F108:F108")) "
by
"select Col4 where Col7 = " & importRange("KEY1","namesheet2!F108")

Resources