Join with <= in the ON clause in Google BigQuery

I want to execute a query with a join in Google BigQuery that has '<=' instead of '=' in its ON clause:
select s.count_value as count_value, s.total as total, sum(p.total) as accumulated
from stats s
join stats p on p.rn <= s.rn
group by count_value, total, s.rn
When I run this query, I receive an error message saying:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.
Any idea how I can implement this query?

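You need to enable Standard SQL to run this kind of JOIN; as the error message says, legacy SQL only accepts = comparisons in the ON clause.
See "Enabling Standard SQL" in the BigQuery documentation.
In the CLI, just add the --use_legacy_sql=false flag to your command.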
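To see what the non-equi self-join computes, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for BigQuery Standard SQL here purely as an assumption for illustration; the helper name running_totals and the sample rows are hypothetical, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def running_totals(rows):
    """Return (count_value, total, accumulated) per row, ordered by rn.

    rows is a list of (rn, count_value, total) tuples (hypothetical sample data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (rn INTEGER, count_value INTEGER, total INTEGER)"
    )
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    # Each row s is joined to every row p whose rn is not greater than s.rn,
    # so SUM(p.total) is the running total up to and including s.
    query = """
        SELECT s.count_value AS count_value,
               s.total       AS total,
               SUM(p.total)  AS accumulated
        FROM stats s
        JOIN stats p ON p.rn <= s.rn
        GROUP BY s.count_value, s.total, s.rn
        ORDER BY s.rn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [(1, 10, 5), (2, 20, 7), (3, 30, 1)]
totals = running_totals(sample)
for row in totals:
    print(row)
```

With Standard SQL enabled, BigQuery should accept the same ON p.rn <= s.rn condition from the question.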

Related

Solr pass value between subquery and upper query

I have to perform the following SQL query on a Solr collection:
select c.CONTRACT
from contracts c
where c.LIMIT_TYPE = 4
and c.TRANSACTION_DATE_DB = (select max(TRANSACTION_DATE_DB) from contracts c2 where c.CONTRACT = c2.CONTRACT and c2.LIMIT_TYPE = 4)
and c.STATUS = 'ENABLED'
group by c.CONTRACT
order by c.CONTRACT desc;
The objective of the query is the following:
the subquery finds, for each contract, the record with the latest transaction date for limit type 4
the upper query checks, for each contract, whether the record returned has status ENABLED.
I've seen that there are two possible ways to perform this query using solr: join and subquery
Here below I'll paste my 100th attempt using subquery:
http://localhost:8983/solr/contracts_collection/select?
q=*:*
&indent=true
&fq=LIMIT_TYPE:4
&fq=STATUS:ENABLED
&fl=*,mySub:[subquery]
&mySub.q=*
&mySub.fq={!term f=CONTRACT_IDENTITY v=$row.CONTRACT_IDENTITY}
&mySub.fq=LIMIT_TYPE:4
&mySub.fl=CONTRACT_IDENTITY,TRANSACTION_DATE_DB,LIMIT_TYPE
&mySub.sort=TRANSACTION_DATE_DB+desc
&mySub.rows=1
The subquery is correct. However, for each contract, the result contains all the records for that contract. What I'm missing is how to make the upper query return, for each contract, only the record that has the same TRANSACTION_DATE_DB as the one returned by the subquery.
I tried to add the following condition:
&mySub.fq={!term f=TRANSACTION_DATE_DB v=$row.TRANSACTION_DATE_DB}
But still, I don't have the desired result.
I've also looked at join, but I failed miserably :P
Does anyone have any suggestion? Going crazy for two days now.
Thanks a lot

Joining on column names with spaces

I'm trying to join two tables using PROC SQL. One of the columns I'm using for the join has a space in the column name. The query I'm using:
PROC SQL;
CREATE TABLE TEST AS
SELECT a.*, b.*
FROM TABLE_1 a
INNER JOIN TABLE_2 b
ON a.CONTNO = b."Contract Number";
RUN;
This is the error I'm getting:
ERROR 22-322: Syntax error, expecting one of the following: a name, *.
How do I fix this?
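You need to write the column name as a SAS name literal: quote the name and add a trailing n. For example:
b.'Contract Number'n
Square brackets (b.[Contract Number]) are SQL Server and Access syntax and are not accepted by PROC SQL.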
Tips: Using alias (a, b) can be costly. When you only have one table to join, consider typing out the table rather than doing an alias.

Informix grammar explanation: what does the grammar in the picture mean?

A fragment of SQL in the Informix dialect
SELECT INSUREDNAME
FROM sc5100car3gdb@idp_5100_cb:PRPCINSURED P
WHERE P.PROPOSALNO = A.PROPOSALNO
What does this grammar mean?
The SQL fragment is:
SELECT INSUREDNAME
FROM sc5100car3gdb@idp_5100_cb:PRPCINSURED P
WHERE P.PROPOSALNO = A.PROPOSALNO
This means that there is a table PRPCINSURED in database sc5100car3gdb hosted on Informix server idp_5100_cb; inside the query, the table will be referred to by the alias P. It has columns INSUREDNAME and PROPOSALNO. Further, this must be a fragment of an SQL statement. The WHERE clause uses the alias P, but also references another table with the alias (or perhaps name) A. However, the context defining A is not shown; as it stands, the A will trigger an error. (When I ran an analogous query, I got the error SQL -217: Column (a) not found in any table in the query (or SLV is undefined).)
See the Informix Guide to SQL: Syntax manual on database object names for more information about the notation used for the table name.

Why does Hive warn that this subquery would cause a Cartesian product?

According to Hive's documentation it supports NOT IN subqueries in a WHERE clause, provided that the subquery is an uncorrelated subquery (does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One workaround is to use NOT EXISTS:
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
FROM vegetables v
WHERE v.name = f.name)
Or use a LEFT JOIN:
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
You get a Cartesian join because that is how Hive plans this query. The vegetables table is very small (just one row), so it is broadcast to perform the cross join (most probably as a map join; check the plan). Hive does the cross (map) join first and then applies the filter. The explicit LEFT JOIN syntax with a filter, as @VamsiPrabhala suggested, forces Hive to perform a left join instead, but in this case it works out the same, because the table is very small and the CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see what is exactly happening.
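As a quick sanity check that the workarounds return the same rows as the original NOT IN query on the sample data, here is a small sketch using Python's sqlite3 module. SQLite is only an assumed stand-in for Hive, so it says nothing about Hive's plan or the Cartesian-product error; it only compares result sets. The QUERIES dictionary and its labels are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same sample data as in the question.
conn.executescript("""
    CREATE TABLE foods (name TEXT);
    CREATE TABLE vegetables (name TEXT);
    INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
    INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
""")

QUERIES = {
    "not_in": """
        SELECT f.name FROM foods f
        WHERE f.name NOT IN (SELECT v.name FROM vegetables v)
    """,
    "not_exists": """
        SELECT f.name FROM foods f
        WHERE NOT EXISTS (SELECT 1 FROM vegetables v WHERE v.name = f.name)
    """,
    "left_join": """
        SELECT f.name FROM foods f
        LEFT JOIN vegetables v ON v.name = f.name
        WHERE v.name IS NULL
    """,
}

# Run each variant and collect the food names it keeps, sorted for comparison.
results = {
    label: sorted(row[0] for row in conn.execute(sql))
    for label, sql in QUERIES.items()
}
conn.close()

for label, names in results.items():
    print(label, names)
```

The three forms agree only as long as vegetables.name contains no NULLs: in standard SQL, NOT IN returns no rows when the subquery yields a NULL, whereas NOT EXISTS and the LEFT JOIN form are unaffected.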

.group not returning all columns

I have a .group query that is not returning all the columns in the select, and I was wondering if someone could validate my syntax.
Here is a query with a .group and the result from my console:
Expense.select('account_number, SUM(credit_amount)').group(:account_number).first
Expense Load (548.8ms) EXEC sp_executesql N'SELECT TOP (1) account_number, SUM(credit_amount) FROM [expenses] GROUP BY account_number'
(36.9ms) SELECT table_name FROM information_schema.views
Even though I select two columns, only the first one is returned. I'm wondering if I may be dealing with a DB adapter problem.
Try giving your sum an alias:
expense = Expense.select('account_number, SUM(credit_amount) AS credit_amount').group(:account_number).first
puts expense.credit_amount
ActiveRecord doesn't create a default alias for aggregate operations such as SUM, COUNT, etc.; you have to add one explicitly to be able to access the result, as shown above.
The SUM(credit_amount) column in the SQL has no alias and will not have a column name by default. If you give it an alias, for example SUM(credit_amount) AS 'A', and select by the alias name, it should pick it up.
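The same idea can be sketched outside ActiveRecord with Python's sqlite3 module: once the aggregate has an alias, the value can be read back by that name. This is an illustration under assumptions, not the Rails/SQL Server setup from the question. The table contents and the credit_total alias are hypothetical, and SQLite (unlike SQL Server) does assign a name to an unaliased aggregate, so the sketch only shows access by alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets columns be read by name
conn.execute("CREATE TABLE expenses (account_number TEXT, credit_amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [("A-100", 25.0), ("A-100", 75.0), ("B-200", 10.0)],  # hypothetical rows
)

# The alias gives the aggregate a stable column name to look up.
row = conn.execute("""
    SELECT account_number, SUM(credit_amount) AS credit_total
    FROM expenses
    GROUP BY account_number
    ORDER BY account_number
    LIMIT 1
""").fetchone()

column_names = row.keys()
account = row["account_number"]
credit_total = row["credit_total"]
conn.close()

print(column_names, account, credit_total)
```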
