Using a subquery as a parameter - google-sheets

I am using Google Sheets and have a connected query where I am using parameters. When one of the parameters is configured to be a subquery, the query will run, but no results are returned.
For example, here is my (simplified) query:
SELECT *
FROM table
WHERE campaign IN (#CAMPAIGN);
In this example, I have the #CAMPAIGN parameter in the Google Sheet configured as:
SELECT DISTINCT campaign FROM table2
If I manually substitute the parameter in the BQ console, it runs fine and returns the expected results. Is there a reason this functionality does not work with parameter substitution in the Google Sheet? Is there a way around this?

Depending on how many SQL SELECT-style lookups you do, it may help to use a custom function that I wrote. You need to place my SQL .js in your Google Sheets project, and the =gsSQL() custom function will then be available.
The one requirement, compared with =QUERY(), is that every column must have a unique title.
It is available on github:
gsSQL github project
This example works if each sheet is a table, so it would be entered something like
=gsSQL("SELECT books.id, books.title, books.author_id
FROM books
WHERE books.author_id IN (SELECT id from authors)
ORDER BY books.title")
In this example, I have a sheet named 'books' and another sheet named 'authors'.
If you need to specify a named range or an A1 notation range as a table, this can also be done with a little more work...
=gsSQL("SELECT books.id, books.title, books.author_id
FROM books
WHERE books.author_id IN (SELECT id from authors)
ORDER BY books.title", {{'books', 'books!$A$1:$I', 60};
{'authors', 'authors!$A$1:$J30', 60}}, true)
In this example, the books and authors come from specific ranges, the data will be cached for 60 seconds and column titles are output.

Related

is it possible to get a single column output from a multi-column Query?

I'm using a QUERY function in Google Sheets. I have a named data range ("Contributions", in a table on another sheet) that consists of many columns, but I'm only concerned with two of them. For simplicity's sake, it looks something like this:
I have another table that contains the unique set of names (e.g. "Fred", "Ginger", etc., each only once), and for each name I want to extract the most recent (largest) level # (column B) from the above table and insert it into this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is that it outputs both B and C data, e.g.:
11 Fred
But since I already have the name (in column A of this other table) I only want it to output the value from B - e.g.:
11
Is there a way to output only a subset (in this case 1 of 2) of the columns of output based on a directive within the query itself (as opposed to doing post-processing of the results)?
Outputting a Subset of Columns Used in Query
In order to output only certain columns of a query result, the query only needs to select the columns to be displayed while the constraints / conditions may utilize other columns of data.
For example (as an answer to my own question) - I have a table like this:
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number - but I only want to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that output both B and C where I only wanted B. The answer (thanks to @Calculuswhiz) was to continue using C for the condition but select only B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)

Inner join with postgres array field

We have a CMS using Rails, React and PostgreSQL. We have pages and pieces stored in our table.
Each page consists of a set of pieces (an array field).
We have pieces that can be used across multiple pages.
Let's say we are rendering page_id 50806. Our React front end requires data in the following format:
pieces: [
{id: B1fu5jthb, included_on_pages: [50808, 50806]},
{id: BJTNssF2Z, included_on_pages: [50808]}
]
So currently, to find included_on_pages, I am writing one query to fetch all the pieces of the page and then looping over each piece to find the pages where that piece is included.
(Basically N+1 queries.)
select pieces from page_pieces where page_id = 50806
Looping over each piece
select page_id from page_pieces where 'B1fu5jthb' = any(page_pieces.pieces);
So my question is:
Instead of looping over each piece to find which pages it is included on, can we write a single join statement to fetch all the pieces and their included_on_pages?
I think a combination of unnesting, ANY comparison and array aggregation should work:
with
pcs as (select unnest(pieces) as id from page_pieces where page_id = 50806)
select id, array_agg(page_id) as included_on_pages
from pcs inner join page_pieces on id = any(pieces)
group by id;
See it on SQL Fiddle
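To make it easier to see what the query computes, here is a rough plain-Python emulation of the same three steps (unnest, ANY comparison, array aggregation). It is only a sketch: the `page_pieces` dictionary and the function name are hypothetical stand-ins for the table, and the sample data (including page 50810) is made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the page_pieces table: page_id -> pieces array.
page_pieces = {
    50806: ["B1fu5jthb", "BJTNssF2Z"],
    50808: ["B1fu5jthb"],
    50810: ["BJTNssF2Z"],
}

def included_on_pages(table, page_id):
    # Step 1: unnest(pieces) for the page being rendered.
    piece_ids = table[page_id]
    result = defaultdict(list)
    # Step 2: join every page whose array contains the piece (id = any(pieces)).
    # Step 3: collect the matching page ids per piece (array_agg ... group by id).
    for other_page, pieces in table.items():
        for piece_id in piece_ids:
            if piece_id in pieces:
                result[piece_id].append(other_page)
    return dict(result)

pages_by_piece = included_on_pages(page_pieces, 50806)
print(pages_by_piece)
```

As in the SQL version, the page being rendered shows up in each piece's own list, because the join matches it as well.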

SQLite select distinct join query how to

I have a SQLite database for which I'm trying to build a query. The column I need to retrieve is iEDLID from the table below:
Right now all I have to go on is a known iEventID from the table below:
And the nClientLocationID from the table below.
So the requirement is that I need to get the current iEDLID to write, by looking up tblEventDateLocations for the dEventDate and the tblLocations.nClientLocationID, based on the tblLocations.iLocationID I already have and the event selected on this screen.
So I would need a query that does a "SELECT DISTINCT tblEventDateLocations.iEDLID FROM tblEventDateLocations ...."
So basically, from another query I have the iEventID I need, but where dEventDate = (select date('now')) I need to retrieve the iEventDateID from tblEventDates.iEventDateID to use on tblEventDateLocations.
This is the point where I'm trying to wrap my head around the joins for this query and the syntax...
It seems like you want this:
select distinct edl.iEDLID
from
tblEventDateLocations edl
join tblEventDates ed on edl.EventDateId = ed.EventDateId
where
ed.EventId = ?
and ed.dEventDate = date('now')
and edl.nClientLocationID = ?
where the ? placeholders represent the known event ID and location ID parameters.
Since nClientLocationId appears on table tblEventDateLocations you do not need to join table tblLocations unless you want to filter out results whose location ID does not appear in that table.
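For reference, here is a minimal sketch of running such a query with bound parameters from Python's built-in sqlite3 module. The schema is a guess: only the table names come from the question, the join and filter column names follow the query above, the result column uses the question's spelling iEDLID, and the sample rows and the `find_edl_ids` helper are made up for illustration.

```python
import sqlite3

# Hypothetical minimal schema and sample rows, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEventDates (
        EventDateId INTEGER PRIMARY KEY,
        EventId     INTEGER,
        dEventDate  TEXT
    );
    CREATE TABLE tblEventDateLocations (
        iEDLID            INTEGER PRIMARY KEY,
        EventDateId       INTEGER,
        nClientLocationID INTEGER
    );
    INSERT INTO tblEventDates VALUES
        (1, 7, date('now')),
        (2, 7, date('now', '-1 day')),
        (3, 8, date('now'));
    INSERT INTO tblEventDateLocations VALUES
        (100, 1, 42),
        (101, 2, 42),
        (102, 3, 42),
        (103, 1, 99);
""")

def find_edl_ids(conn, event_id, client_location_id):
    # The two ? placeholders are bound, in order, to the known
    # event ID and client location ID.
    rows = conn.execute(
        """
        SELECT DISTINCT edl.iEDLID
        FROM tblEventDateLocations edl
        JOIN tblEventDates ed ON edl.EventDateId = ed.EventDateId
        WHERE ed.EventId = ?
          AND ed.dEventDate = date('now')
          AND edl.nClientLocationID = ?
        """,
        (event_id, client_location_id),
    ).fetchall()
    return [row[0] for row in rows]

edl_ids = find_edl_ids(conn, 7, 42)
print(edl_ids)
```

Note that date('now') in SQLite is evaluated in UTC, so the stored dEventDate values need to use the same convention for the equality test to match.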

Change Data Capture with table joins in ETL

In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem arises when I have, for example, two tables that I want to join into one dimension, and only one of them has changed. For example, I have the tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not changed, so CDC for these tables shows me only the new row from the Towns table. The problem is that when I do the join between Countries and Towns, there is no row in the Countries change set, so the join results in an empty set.
Do you have an idea how to solve this? Of course there might be more difficult cases, involving three or more tables and consecutive joins.
This is a typical problem when doing real-time Change Data Capture, or even incremental-only daily changes.
There are multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
There is a lot more I could say on this, but I will stick to what is in your question. I would suggest the following to get the results...
1st Pass is where everything matches via the join...
Union All
2nd Pass Gets all towns where there isn't a country
(left outer join with a where condition that
requires the ID in the countries table to be null/missing).
In that unmatched join you would default the Country ID to a value designated as an "Unmatched Value". Typically 0 or -1 is used, or a series of standard negative numbers to which you can later assign descriptions identifying why the data is bad. For your example, -1 could be "Found Town Without Country".
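The two passes could be sketched roughly as follows, using Python's sqlite3 module so the SQL can be run as-is. This rests on several assumptions: the change set lives in a hypothetical `towns_changes` table, it is joined against the full `countries` table rather than a countries change set, and the Nice row is invented so that the first pass has something to match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Full Countries table (unchanged since the last extraction).
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical staging table holding only the changed Towns rows.
    CREATE TABLE towns_changes (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (1, 'France');
    INSERT INTO towns_changes VALUES (2, 'Paris', 2), (3, 'Nice', 1);
""")

dimension_rows = conn.execute("""
    -- 1st pass: changed towns whose country is found by the join.
    SELECT t.id, t.name, c.id AS country_id, c.name AS country_name
    FROM towns_changes t
    JOIN countries c ON c.id = t.country_id
    UNION ALL
    -- 2nd pass: changed towns with no matching country, defaulted to -1.
    SELECT t.id, t.name, -1 AS country_id,
           'Found Town Without Country' AS country_name
    FROM towns_changes t
    LEFT OUTER JOIN countries c ON c.id = t.country_id
    WHERE c.id IS NULL
    ORDER BY 1
""").fetchall()

for row in dimension_rows:
    print(row)
```

With the sample rows from the question, Paris (Country_ID 2) has no matching country and lands in the second pass, while the invented Nice row is matched in the first.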

using SQL aggregate functions with JOINs

I have two tables - tool_downloads and tool_configurations. I am trying to retrieve the most recent build date for each tool in my database. The layout of the DB is simple. One table called tool_downloads keeps track of when a tool is downloaded. Another table is called tool_configurations and stores the actual data about the tool. They are linked together by the tool_conf_id.
If I run the following query which omits dates, I get back 200 records.
SELECT DISTINCT a.tool_conf_id, b.tool_conf_id
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
When I try to add in date information I get back hundreds of thousands of records! Here is the query that fails horribly.
SELECT DISTINCT a.tool_conf_id, max(a.configured_date) as config_date, b.configuration_name
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
I know the problem has something to do with group-bys/aggregate data and joins. I can't really search google since I don't know the name of the problem I'm encountering. Any help would be appreciated.
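The solution is to group by the non-aggregated columns instead of using DISTINCT, so that max() is computed once per tool: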
SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) as config_date
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
GROUP BY b.tool_conf_id, b.configuration_name
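A small runnable check of the grouped query, using Python's sqlite3 module. The table and column names come from the question; the column types, the sample rows and the added ORDER BY (only there to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column types; dates are stored as ISO-8601 text so max() sorts them correctly.
    CREATE TABLE tool_configurations (
        tool_conf_id       INTEGER PRIMARY KEY,
        configuration_name TEXT
    );
    CREATE TABLE tool_downloads (
        tool_conf_id    INTEGER,
        configured_date TEXT
    );
    INSERT INTO tool_configurations VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO tool_downloads VALUES
        (1, '2020-01-01'),
        (1, '2020-03-15'),
        (2, '2019-12-31'),
        (2, '2019-06-01');
""")

latest = conn.execute("""
    SELECT b.tool_conf_id, b.configuration_name,
           max(a.configured_date) AS config_date
    FROM tool_downloads a
    JOIN tool_configurations b ON a.tool_conf_id = b.tool_conf_id
    GROUP BY b.tool_conf_id, b.configuration_name
    ORDER BY b.tool_conf_id  -- not in the original answer; added for stable output
""").fetchall()

for row in latest:
    print(row)
```

Each tool appears exactly once, however many download rows it has, because the aggregate is evaluated per group rather than per joined row.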
