I'm doing a simple query using some math functions (the actual query is more complicated, but even this simple example demonstrates the error):
SELECT 123 as distance,
c
FROM MyDomainClass as c
This will return [distance, domain class] pairs, which works fine. However, I want to filter and sort by distance as well.
SELECT 123 as distance,
c
FROM MyDomainClass as c
WHERE distance < 10
Unfortunately, it cannot find the 'distance' column in the result set:
Unknown column 'distance' in 'where clause'
I turned on SQL logging and see the following generated query:
select 123 as col_0_0_, ............. where distance<?
So I can see the error, but I don't understand why it's using a generated column name 'col_0_0_' in the result set or how I can use this generated name in the where clause.
Any ideas?
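One note on why this happens: the WHERE clause is evaluated before the SELECT list, so a select alias like distance (or Hibernate's generated col_0_0_) isn't visible there. A common workaround, offered as a sketch rather than something guaranteed for every Hibernate version, is to repeat the expression in the WHERE clause; ORDER BY runs after SELECT, so the alias is usually accepted there:
SELECT 123 as distance, c
FROM MyDomainClass as c
WHERE 123 < 10
ORDER BY distance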
I'm using a QUERY function in Google Sheets. I have a named data range ("Contributions", a table on another sheet) that consists of many columns, but I'm only concerned with two of them. For simplicity's sake, it looks something like this:
I have another table that contains the unique set of names (e.g. "Fred", "Ginger", etc., each appearing only once), and I want to extract the level # (column B) from the above table, inserting the most recent (largest) number into this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is that it outputs both B & C data, e.g.:
11 Fred
But since I already have the name (in column A of this other table) I only want it to output the value from B - e.g.:
11
Is there a way to output only a subset (in this case 1 of 2) of the columns of output based on a directive within the query itself (as opposed to doing post-processing of the results)?
Outputting a Subset of Columns Used in Query
To output only certain columns of a query result, the query needs to select only the columns to be displayed; the constraints / conditions may still reference other columns of data.
For example (as an answer to my own question) - I have a table like this:
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number - but I only want to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that output both B & C where I only wanted B. The answer (thanks to @Calculuswhiz) was to continue using C for the condition but only select on B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)
According to Hive's documentation, it supports NOT IN subqueries in a WHERE clause, provided that the subquery is uncorrelated (i.e. it does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One workaround is to use NOT EXISTS.
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
                  FROM vegetables v
                  WHERE v.name = f.name)
or a left join
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
You got a cartesian join because this is what Hive does in this case. The vegetables table is very small (just a few rows), and it is broadcast to perform the cross join (most probably a map join; check the plan). Hive does the cross (map) join first and then applies the filter. Explicit LEFT JOIN syntax with a filter, as @VamsiPrabhala said, will force a left join, but in this case it works out the same, because the table is very small and the CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see exactly what is happening.
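For example (plan output omitted here, since it varies by Hive version and configuration):
EXPLAIN
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)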
I'm having a problem with a .first query in Rails 4 ActiveRecord. The new behavior in Rails 4 is to add an order by the id field so that all db systems will output rows in the same order.
So this...
Foo.where(bar: baz).first
Will give the query...
select foos.* from foos where foos.bar = ? order by foos.id asc limit 1
The problem I am having is that my select contains two sum fields. With the order by id thrown into the query automatically, I'm getting an error that the id field must appear in the group by clause. The error makes sense: there is no need for the id field if I want the output to be the sum of these two fields.
Here is an example that is not working...
baz = Foo.find(77).fooviews.select("sum(number_of_foos) as total_number_of_foos, sum(number_of_bars) as total_number_of_bars").first
Here is the error...
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "foos.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...bars FROM "fooviews" ORDER BY "...
Since the select is an aggregate expression, there is no need for the order by id, but AR is throwing it in automatically.
I found that I can add a reorder('') onto the end before the .first, and that removes the order by id, but is that the right way to fix this?
Thank you
[UPDATE] What I neglected to mention is that I'm converting a large Rails 3 project to Rails 4, so the output from the Rails 3 code is an AR object. If possible, I would like the solution to keep that format so that there is less code to change in the conversion.
You will want to use take:
The take method retrieves a record without any implicit ordering.
For example:
baz = Foo.find(77).fooviews.select("sum(number_of_foos) as total_number_of_foos, sum(number_of_bars) as total_number_of_bars").take
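Roughly, the SQL this produces drops the implicit ORDER BY and just applies a LIMIT (a sketch; the exact quoting and the foo_id foreign-key column name are assumptions):
SELECT sum(number_of_foos) AS total_number_of_foos, sum(number_of_bars) AS total_number_of_bars FROM "fooviews" WHERE "fooviews"."foo_id" = 77 LIMIT 1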
The commit message here indicates that this was a replacement for the old first behavior.
I am getting this error in pg production mode, but it's working fine in sqlite3 development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all
If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates where multiple rows share the same user_id. You must use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc or add the column(s) of interest to the GROUP BY.
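For example, a raw-SQL sketch of the aggregate option against the query from the question (count(*) is just a placeholder; use whichever aggregate actually expresses what you want):
SELECT user_id, count(*) AS estates_count
FROM estates
WHERE "Mgmt" = 'Mazzey'
GROUP BY user_id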
Alternatively, you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
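A raw-SQL sketch of that approach (here arbitrarily keeping the row with the highest id per user_id):
SELECT DISTINCT ON (user_id) *
FROM estates
WHERE "Mgmt" = 'Mazzey'
ORDER BY user_id, id DESC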
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
col1 col2
fred 42
bob 9
fred 44
fred 99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
bob 9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
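With the sample rows above, that query returns one row per col1 (row order is not guaranteed):
col1 max_col2
bob 9
fred 99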
I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.
After receiving this error often myself, I realised that Rails (I am using Rails 4) automatically adds an 'order by id' at the end of your grouping query, which often results in the error above. So make sure you append your own .order(:group_by_column) at the end of your Rails query. You will then have something like this:
@problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')
@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")
This is what I did.
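For reference, the SQL this generates is roughly (a sketch based on the query shown in the question):
SELECT DISTINCT(user_id) FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey'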
In Rails 3.0.0, the following query works fine:
Author.where("name LIKE :input",{:input => "#{params[:q]}%"}).includes(:books).order('created_at')
However, when I input the following as the search string (which contains a colon followed by a dot):
aa:.bb
I get the following exception:
ActiveRecord::StatementInvalid: SQLite3::SQLException: ambiguous column name: created_at
In the logs these are the SQL queries:
with aa as input:
Author Load (0.4ms) SELECT "authors".* FROM "authors" WHERE (name LIKE 'aa%') ORDER BY created_at
Book Load (2.5ms) SELECT "books".* FROM "books" WHERE ("books".author_id IN (1,2,3)) ORDER BY id
with aa:.bb as input:
SELECT DISTINCT "authors".id FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" WHERE (name LIKE 'aa:.bb%') ORDER BY created_at DESC LIMIT 12 OFFSET 0
SQLite3::SQLException: ambiguous column name: created_at
It seems that with the aa:.bb input, an extra query is made to fetch the distinct author ids.
I thought Rails would escape all the characters. Is this expected behaviour or a bug?
Best Regards,
Pieter
The "ambiguous column" error usually happens when you use includes or joins and don't specify which table you're referring to:
"name LIKE :input"
Should be:
"authors.name LIKE :input"
Just "name" is ambiguous if your books table has a name column too.
Also: have a look at your development.log to see what the generated query looks like. This will show you if it's being escaped properly.
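For reference, here is a sketch of the joined query from your log with the references qualified (based on the SQL you posted):
SELECT DISTINCT "authors".id FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" WHERE (authors.name LIKE 'aa:.bb%') ORDER BY "authors"."created_at" DESC LIMIT 12 OFFSET 0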
Replace
.includes(:books)
with
.preload(:books)
This should force ActiveRecord to use two queries instead of the join.
Rails has two versions of includes: one which constructs a big query with joins (the second of your two queries, and thus more likely to result in ambiguous column references), and one that avoids the joins in favour of a separate query per association.
Rails decides which strategy to use based on whether it thinks your conditions, order, etc. refer to the included tables (since in that case the joins version is required). Where a condition is a string fragment, that heuristic isn't very sophisticated; I seem to recall that it just scans the conditions for anything that might look like a column from another table (i.e. foo.bar), so having a literal of that form could fool it.
You can either qualify your column names so that it doesn't matter which includes strategy is used or you can use preload/eager_load instead of includes. These behave similarly to includes but force a specific include strategy rather than trying to guess which is most appropriate.
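With preload, the generated SQL looks roughly like the two-query pair already shown in your log for the plain aa input, so the unqualified created_at is no longer ambiguous (a sketch; the IDs are the ones from your log):
SELECT "authors".* FROM "authors" WHERE (name LIKE 'aa:.bb%') ORDER BY created_at
SELECT "books".* FROM "books" WHERE ("books".author_id IN (1,2,3)) ORDER BY id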