I've got a database table that needs to be joined to a table that contains countries. The LEFT JOIN is done on country ISO code and is as follows:
SELECT table1.city, table2.Country, table2.Flag
FROM table1
LEFT JOIN table2
ON (table2.ISO = table1.country)
WHERE table1.id = ?
This works well, except that the ISO code for India is 'IN', and India is the one country for which the join fails to return the country data. I think this is because IN is a reserved word in SQL. How can I perform this query anyway?
Thanks
Ron
The use of a reserved word in the data should not affect the results at all. Reserved words are an issue when the query is compiled, not when it runs.
If your query is failing for India, then there is some other problem.
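To check this yourself, here is a minimal sketch that runs the query from the question against a row whose country code is 'IN'. It assumes SQLite (through Python's sqlite3 module) as a stand-in for your database, and the sample rows and flag file names are made up for illustration.

```python
import sqlite3

# SQLite stands in for the asker's database here (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE table2 (ISO TEXT, Country TEXT, Flag TEXT);
    -- Hypothetical sample rows; 'IN' is plain data, not SQL syntax.
    INSERT INTO table1 VALUES (1, 'Mumbai', 'IN'), (2, 'Lyon', 'FR');
    INSERT INTO table2 VALUES ('IN', 'India', 'in.png'), ('FR', 'France', 'fr.png');
""")

query = """
    SELECT table1.city, table2.Country, table2.Flag
    FROM table1
    LEFT JOIN table2 ON (table2.ISO = table1.country)
    WHERE table1.id = ?
"""
# The value 'IN' only ever appears inside string data, so the parser never sees it.
india_row = conn.execute(query, (1,)).fetchone()
print(india_row)
```

If a test like this returns the India row on your system as well, the next thing to compare is the actual stored values in both columns.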
According to Hive's documentation it supports NOT IN subqueries in a WHERE clause, provided that the subquery is an uncorrelated subquery (does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One workaround is to use NOT EXISTS:
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
FROM vegetables v
WHERE v.name = f.name)
or a left join
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
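As a quick sanity check that both rewrites return the same rows, here is a small sketch using the sample data from the question. It runs on SQLite through Python's sqlite3 module rather than Hive (an assumption made only so the example is easy to run), and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foods (name TEXT);
    CREATE TABLE vegetables (name TEXT);
    INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
    INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
""")

# Anti-join written with a correlated NOT EXISTS.
not_exists = [row[0] for row in conn.execute("""
    SELECT f.name
    FROM foods f
    WHERE NOT EXISTS (SELECT 1 FROM vegetables v WHERE v.name = f.name)
    ORDER BY f.name
""")]

# The same anti-join written as LEFT JOIN ... IS NULL.
left_join = [row[0] for row in conn.execute("""
    SELECT f.name
    FROM foods f
    LEFT JOIN vegetables v ON v.name = f.name
    WHERE v.name IS NULL
    ORDER BY f.name
""")]

print(not_exists, left_join)
```

Both lists should contain only the foods that are not vegetables.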
You got a Cartesian join because this is what Hive does in this case. The vegetables table is very small, so it is broadcast to perform the cross join (most probably as a map join; check the plan). Hive performs the cross (map) join first and then applies the filter. The explicit left join syntax with a filter, as @VamsiPrabhala suggested, forces Hive to perform a left join, but in this case it works the same, because the table is very small and the CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see what is exactly happening.
I have a SQL query something like this
SELECT
p.id,
p.code,
l.parent_id
FROM
properties p
LEFT JOIN locations l ON l.id = p.location_id;
I want to convert this query to a Solr query. I can join the two cores like this:
http://example.com:8999/solr/properties/select?q=*:*&fq={!join from=id to=location_id fromIndex=locations}p_id:12345
But I can't select the fields of the locations core. How can I do this? Any suggestions would be appreciated.
You can use a subquery in fl. Something like this: fl=*,locations:[subquery fromIndex=locations]&locations.q={!terms f=id v=$row.location_id}
More info here https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-subquery
You can't. Solr does not support returning fields from both ends of a join. Solr is not a relational database, so you're usually better off not trying to use it as one.
Instead, index the information about each location with each property, and query based on that.
If any location info changes (which, it turns out, usually happens very rarely), reindex the documents assigned to that location.
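A rough sketch of that denormalization step, done before the documents are sent to Solr. The field names id, code, location_id and parent_id come from the question; the function name, the location_ prefix and the sample values are my own assumptions, and the actual indexing call is left out because it depends on your client.

```python
def denormalize(properties, locations):
    """Copy each property's location fields onto the property document."""
    by_id = {loc["id"]: loc for loc in locations}
    docs = []
    for prop in properties:
        doc = dict(prop)
        loc = by_id.get(prop.get("location_id"))
        if loc is not None:
            # Prefix copied fields so they cannot collide with property fields.
            for key, value in loc.items():
                if key != "id":
                    doc["location_" + key] = value
        docs.append(doc)
    return docs


# Hypothetical sample rows, for illustration only.
properties = [{"id": 12345, "code": "P-1", "location_id": 7}]
locations = [{"id": 7, "parent_id": 3}]
docs = denormalize(properties, locations)
print(docs)
```

When a location changes, you would rerun this for the properties that reference it and reindex only those documents.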
I'm still a novice at SQL and I need to run a report which JOINs 3 tables. The third table has duplicates of the fields I need, so I tried to join with a DISTINCT option, but that didn't work. Can anyone suggest the right code I could use?
My code looks like this:
SELECT
C.CUSTOMER_CODE
, MS.SALESMAN_NAME
, SUM(C.REVENUE_AMT)
FROM C_REVENUE_ANALYSIS C
JOIN M_CUSTOMER MC ON C.CUSTOMER_CODE = MC.CUSTOMER_CODE
/* This following JOIN is the issue. */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE = (SELECT SALESMAN_CODE FROM M_SALESMAN WHERE COMP_CODE = '00')
WHERE REVENUE_DATE >= :from_date
AND REVENUE_DATE <= :to_date
GROUP BY C.CUSTOMER_CODE, MS.SALESMAN_NAME
I also tried a different variation to get a DISTINCT.
/* I also tried this variation to get a distinct */
JOIN M_SALESMAN MS ON MC.SALESMAN_CODE =
(SELECT distinct(SALESMAN_CODE) FROM M_SALESMAN)
Please can anyone help? I would truly appreciate it.
Thanks in advance.
SELECT DISTINCT
c.customer_code,
ms.salesman_name,
SUM(c.revenue_amt)
FROM
c_revenue_analysis c,
m_customer mc,
m_salesman ms
WHERE
c.customer_code = mc.customer_code
AND mc.salesman_code = ms.salesman_code
AND ms.comp_code = '00'
AND c.revenue_date BETWEEN :from_date AND :to_date
GROUP BY
c.customer_code, ms.salesman_name
The above will return every distinct combination of customer code, salesman name and SUM of revenue amount where c.customer_code matches an mc.customer_code AND that same mc record matches an ms.salesman_code AND that ms record has a comp_code of '00' AND the revenue_date is between the from and to variables. The whole result is grouped by customer code and salesman name, so each pair appears once; duplicates can only appear if two m_salesman rows with comp_code '00' share a salesman_code but have different names.
To explain, if you're just doing a straight (inner) JOIN, you don't need the JOIN keywords. I find they tend to complicate things; you only need them if you're doing an "odd" join, like a LEFT/RIGHT join. I don't know your data model, so the above MIGHT still return duplicates; if so, let me know.
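To illustrate why the comp_code filter matters, here is a small sketch on SQLite (via Python's sqlite3 module). Everything about the data is an assumption: I'm guessing that M_SALESMAN holds one row per salesman per company code, and the sample rows and dates are invented. Without the filter each revenue row is counted once per matching salesman row; with it, once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_revenue_analysis (customer_code TEXT, revenue_amt INTEGER, revenue_date TEXT);
    CREATE TABLE m_customer (customer_code TEXT, salesman_code TEXT);
    CREATE TABLE m_salesman (salesman_code TEXT, salesman_name TEXT, comp_code TEXT);
    INSERT INTO c_revenue_analysis VALUES
        ('C1', 100, '2020-01-10'), ('C1', 50, '2020-01-20'), ('C1', 999, '2020-03-01');
    INSERT INTO m_customer VALUES ('C1', 'S1');
    -- Assumed cause of the duplicates: one row per salesman per company code.
    INSERT INTO m_salesman VALUES ('S1', 'Alice', '00'), ('S1', 'Alice', '01');
""")

base = """
    SELECT c.customer_code, ms.salesman_name, SUM(c.revenue_amt)
    FROM c_revenue_analysis c, m_customer mc, m_salesman ms
    WHERE c.customer_code = mc.customer_code
      AND mc.salesman_code = ms.salesman_code
      {extra}
      AND c.revenue_date BETWEEN :from_date AND :to_date
    GROUP BY c.customer_code, ms.salesman_name
"""
params = {"from_date": "2020-01-01", "to_date": "2020-01-31"}

# Without the filter, both salesman rows match and the revenue is doubled.
unfiltered = conn.execute(base.format(extra=""), params).fetchall()
# With the filter, only the comp_code '00' row joins.
filtered = conn.execute(base.format(extra="AND ms.comp_code = '00'"), params).fetchall()
print(unfiltered, filtered)
```

Note that DISTINCT would not have helped with the doubled total, because the duplication happens before the SUM is computed.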
I am trying to join two tables and keep getting an error message that states...
The data types text and text are incompatible in the equal to operator.
I need to know how to effectively query the two tables without re-importing the data. The import took hours to run.
Both fields have a data type of TEXT.
SELECT doc4.*, doc.*
FROM doc4 INNER JOIN
doc ON doc.unid = doc4.unid
unid is the join field, and it is a text type in both tables.
Assuming this is for MS-SQL, the = operator does not work directly for text, even though both fields are the same type.
The recommendation, for SQL Server 2014 at least, is to replace it with one of the varchar(max), nvarchar(max), or varbinary(max) data types.
(ref. https://msdn.microsoft.com/en-us/library/ms143729.aspx)
Text has been deprecated, but if it's not convenient to replace it, it's still possible (though less efficient) to compare text fields by using cast.
SELECT doc4.something, doc.something
FROM doc
INNER JOIN doc4
ON CAST(doc.unid AS varchar(max)) =
CAST(doc4.unid AS varchar(max));
(Or say "FROM doc4 INNER JOIN doc", whichever is the preferred order.)
In my ETL process I am using Change Data Capture (CDC) to discover only the rows that have changed in the source tables since the last extraction. Then I do the transformation only for these rows. The problem arises when I have, for example, 2 tables which I want to join into one dimension, and only one of them has changed. For example, I have the tables Countries and Towns as follows:
Countries:
ID Name
1 France
Towns:
ID Name Country_ID
1 Lyon 1
Now let's say a new row is added to the Towns table:
ID Name Country_ID
1 Lyon 1
2 Paris 2
The Countries table has not been changed, so CDC for these tables shows me only the new row from the Towns table. The problem is that when I do the join between Countries and Towns, there is no row in the Countries change set, so the join results in an empty set.
Do you have an idea how to solve this? Of course there might be more difficult cases, involving 3 or more tables and consecutive joins.
This is a typical problem found when doing real-time Change Data Capture, or even incremental-only daily loads.
There are multiple ways to solve this.
One way would be to do your joins on the natural keys in the dimension or mapping table, to get the associated country (SELECT distinct country_name, [..other attributes..] from dim_table where country_id = X).
Another alternative would be to do the join as part of the change capture process - when a row is loaded to towns, a trigger goes off that loads the foreign key values into the associated staging tables (country, etc).
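Here is a minimal sketch of the trigger idea, using SQLite through Python's sqlite3 module as a stand-in for the source database. The staging table names (stg_towns, stg_countries) and the trigger name are hypothetical, and a real CDC setup would populate its change tables differently; this only shows the related country row being pulled along when a town is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE towns (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    -- Hypothetical staging tables that stand in for the CDC change sets.
    CREATE TABLE stg_towns (id INTEGER, name TEXT, country_id INTEGER);
    CREATE TABLE stg_countries (id INTEGER, name TEXT);

    CREATE TRIGGER towns_after_insert AFTER INSERT ON towns
    BEGIN
        INSERT INTO stg_towns VALUES (NEW.id, NEW.name, NEW.country_id);
        -- Also stage the referenced country, even though it did not change.
        INSERT INTO stg_countries
            SELECT c.id, c.name FROM countries c WHERE c.id = NEW.country_id;
    END;

    INSERT INTO countries VALUES (1, 'France');
    INSERT INTO towns VALUES (1, 'Lyon', 1);
""")

# The join between the two change sets is no longer empty.
staged = conn.execute("""
    SELECT t.name, c.name
    FROM stg_towns t
    JOIN stg_countries c ON c.id = t.country_id
""").fetchall()
print(staged)
```

In this sketch the same country can be staged more than once if several towns reference it, so the downstream join would need a DISTINCT or a de-duplication step.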
There is a lot more I could say on this, but I will stick to what is in your question. I would suggest the following to get the results...
1st Pass is where everything matches via the join...
Union All
2nd Pass Gets all towns where there isn't a country
(left outer join with a where condition that
requires the ID in the countries table to be null/missing).
You would default the Country ID value in that unmatched join to something designated as an "unmatched value". Typically 0 or -1 is used, or a series of standard negative numbers that you can assign descriptions to later to identify why the data is bad. For your example, -1 could be "Found Town Without Country".
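The two passes could be sketched roughly like this, again on SQLite via Python's sqlite3 module. The table name towns_changes is hypothetical and stands for the town rows being processed; the rows are the ones from the question, where Paris points at a Country_ID that does not exist in Countries. The key point is that the join goes against the full Countries table, not its change set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical table holding the town rows to process.
    CREATE TABLE towns_changes (id INTEGER, name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (1, 'France');
    INSERT INTO towns_changes VALUES (1, 'Lyon', 1), (2, 'Paris', 2);
""")

rows = conn.execute("""
    -- 1st pass: towns whose country is found.
    SELECT t.id, t.name, c.id AS country_id, c.name AS country_name
    FROM towns_changes t
    INNER JOIN countries c ON c.id = t.country_id
    UNION ALL
    -- 2nd pass: towns with no matching country get the default key -1.
    SELECT t.id, t.name, -1, 'Found Town Without Country'
    FROM towns_changes t
    LEFT OUTER JOIN countries c ON c.id = t.country_id
    WHERE c.id IS NULL
    ORDER BY 1
""").fetchall()
print(rows)
```

Every town ends up in the result exactly once, either with its real country or with the placeholder.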