Duplicate ID's being generated violating primary key contraints - ruby-on-rails

Can anyone help explain this? I am using the Populator and Faker gems to put some generated data into my database. Among other things, I generate 10,000 comments (which are from the 'acts_as_commentable' gem. All this works. However, when I go to add a new comment, I get an error saying that I am violating the primary key by using an existing id. Look at my console output below. You can see I have 10,000 records starting with ID 1 and ending with ID 100000. I then try to add a new comment and it fails. This is only happening with this model/table. I can add new users, etc.
>> Comment.first(:order => 'id').id
=> 1
>> Comment.last(:order => 'id').id
=> 10000
>> Comment.count
=> 10000
>> Comment.create(:title => 'wtf is up?')
ActiveRecord::RecordNotUnique: PGError: ERROR: duplicate key value violates unique constraint "comments_pkey"
DETAIL: Key (id)=(1) already exists.
I suspect this is related to how the Populator gem is batching the records into the database. It is only happening on models/tables that I see with Populator.

This happens if the value of the id column is explicitly set in an insert statement.
For every id-column there is a sequence in Postgres, which is usually named tablename_columnname_seq, for example user_id_seq.
Please check the name in the table definition in pgadmin3 as rails does not support sequences with other names.
You can fix a sequence with a too low id by executing something similar to:
SELECT setval('user_id_seq', 10000);
To learn the highest number:
SELECT max(id) FROM users;
SELECT max(x) FROM
(SELECT max(id) As x FROM users
UNION SELECT last_value As x FROM user_id_seq As y);

I do not know what the actual problem is, but it is certainly linked to using the Populator gem to add records. Generating data by using:
Populator.sentences(1..3) # makes 3 sentences
is fine.
However, generating records like
User.populate 5000 do |user| # makes 5000 users in batches of 1000
user.name = Populator.words(1)
...
end
Something in there is causing my problem. note that I am using Rails 3.0.1

This is a handy PostreSQL script to fix sequences for all tables at once
SELECT 'SELECT SETVAL(' ||quote_literal(quote_ident(S.relname))|| ', MAX(' ||quote_ident(C.attname)|| ') ) FROM ' ||quote_ident(T.relname)|| ';'
FROM pg_class AS S, pg_depend AS D, pg_class AS T, pg_attribute AS C
WHERE S.relkind = 'S'
AND S.oid = D.objid
AND D.refobjid = T.oid
AND D.refobjid = C.attrelid
AND D.refobjsubid = C.attnum
ORDER BY S.relname;

Related

Group by Error: PG::GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function [duplicate]

I am getting this error in the pg production mode, but its working fine in sqlite3 development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
#myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all
If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates where multiple rows share the same user_id. You must use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc or add the column(s) of interest to the GROUP BY.
Alternately you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
col1 col2
fred 42
bob 9
fred 44
fred 99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
bob 9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.
After often receiving the error myself I realised that Rails (I am using rails 4) automatically adds an 'order by id' at the end of your grouping query. This often results in the error above. So make sure you append your own .order(:group_by_column) at the end of your Rails query. Hence you will have something like this:
#problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')
#myestate1 = Estate.where(:Mgmt => current_user.Company)
#myestate = #myestate1.select("DISTINCT(user_id)")
this is what I did.

Rails Postgres Error GROUP BY clause or be used in an aggregate function

In SQLite (development) I don't have any errors, but in production with Postgres I get the following error. I don't really understand the error.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
#user = User.find(params[:id])
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #commits.group("mission_id").count.length
end
UPDATE:
So i digged further into this PostgreSQL specific annoyance and I am surprised that this exception is not mentioned in the Ruby on Rails Guide.
I am using psql (PostgreSQL) 9.1.11
So from what I understand, I need to specify which column that should be used whenever you use the GROUP_BY clause. I thought using SELECT would help, which can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyways, when I look at the error, everytime the cursor is pointed to updated_at. In the SQL query, rails will always ORDER BY updated_at. So I have tried this horrible query:
#commits.group("mission_id, date(updated_at)")
.select("date(updated_at), count(mission_id)")
.having("count(mission_id) > 0")
.order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
the error is the same.
Note that no matter what it will ORDER BY updated_at, even if I wanted to order by something else.
Also I don't want to group the records by updated_at just by mission_id.
This PostgreSQL error is just misleading and has little explanation to solving it. I have tried many formulas from the stackoverflow sidebar, nothing works and always the same error.
UPDATE 2:
So I got it to work, but it needs to group the updated_at because of the automatic ORDER BY updated_at. How do I count only by mission_id?
#missions_commits = #commits.group("mission_id, updated_at").count("mission_id").size
I guest you want to show general number of distinct Missions related with Commits, anyway it won't be number on page.
Try this:
#commits = #user.commits.order("updated_at DESC").page(params[:page]).per(25)
#missions_commits = #user.commits.distinct.count(:mission_id)
However if you want to get the number of distinct Missions on page I suppose it should be:
#missions_commits = #commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct did not exist, but pure SQL counting should be used this way:
#missions_commits = #user.commits.count(:mission_id, distinct: true)
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike Sqlite (and MySQL) postgres requires that any columns selected or ordered on must appear in an aggregate function or the group by clause.
If you think it through, you'll see that this actually makes sense. Sqlite/MySQL cheat under the hood and silently drop those fields (not sure that's technically what happens).
Or thinking about it another way if you are grouping by a field, what's the point of ordering it? How would that even make sense unless you also had an aggregate function on the ordered field?

Postgresql error with Rails 3 using order("RANDOM()")

Im trying to query my db for records that are similar to the currently viewed record (based on taggings), which I have working but I would like to randomize the order.
my development environment is mysql so I would do something like:
#tattoos = Tattoo.tagged_with(tags, :any => true).order("RAND()").limit(6)
which works, but my production environment is heroku which is using postgresql so I tried using this:
#tattoos = Tattoo.tagged_with(tags, :any => true).order("RANDOM()").limit(6)
but I get the following error:
ActionView::Template::Error (PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
SELECT DISTINCT tattoos.* FROM "tattoos" JOIN taggings
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477 ON
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.taggable_id = tattoos.id AND
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.taggable_type = 'Tattoo' WHERE
(tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 3 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 4 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 5 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 24 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 205) ORDER BY RANDOM() LIMIT 6):
After analyzing the query more closely, I have to correct my first draft. The query would require a DISTINCT or GROUP BY the way it is.
The (possibly) duplicate tattoos.* come from first joining to (possibly) multiple rows in the table taggings. Your query engine then tries to get rid of such duplicates again by using DISTINCT - in a syntactically illegal way.
DISTINCT basically sorts the resulting rows by the resulting columns from left to right and picks the first for each set of duplicates. That's why the leftmost ORDER BY column have to match the SELECT list.
MySQL is more permissive and allows the non-standard use of DISTINCT, but PostgreSQL throws an error.
ORMs often produce ineffective SQL statements (they are just crutches after all). However, if you use appropriate PostgreSQL libraries, such an illegal statement shouldn't be produced to begin with. I am no Ruby expert, but something's fishy here.
The query is also very ugly and inefficient.
There are several ways to fix it. For instance:
SELECT *
FROM (<query without ORDER BY and LIMIT>) x
ORDER BY RANDOM()
LIMIT 6
Or, better yet, rewrite the query with this faster, cleaner alternative doing the same:
SELECT ta.*
FROM tattoos ta
WHERE EXISTS (
SELECT 1
FROM taggings t
WHERE t.taggable_id = ta .id
AND t.taggable_type = 'Tattoo'
AND t.tag_id IN (3, 4, 5, 24, 205)
)
ORDER BY RANDOM()
LIMIT 6;
You'll have to implement it in Ruby yourself.
not sure about the random, as it should work.
But take a note of http://railsforum.com/viewtopic.php?id=36581
which has code that might suit you
/lib/agnostic_random.rb
module AgnosticRandom
def random
case DB_ADAPTER
when "mysql" then "RAND()"
when "postgresql" then "RANDOM()"
end
end
end
/initializers/extend_ar.rb (name doesn't matter)
ActiveRecord::Base.extend AgnosticRandom

Rails expanding fields with scope, PG does not like it

I have a model of Widgets. Widgets belong to a Store model, which belongs to an Area model, which belongs to a Company. At the Company model, I need to find all associated widgets. Easy:
class Widget < ActiveRecord::Base
def self.in_company(company)
includes(:store => {:area => :company}).where(:companies => {:id => company.id})
end
end
Which will generate this beautiful query:
> Widget.in_company(Company.first).count
SQL (50.5ms) SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1
=> 15088
But, I later need to use this scope in more complex scope. The problem is that AR is expanding the query by selecting individual fields, which fails in PG because selected fields must in the GROUP BY clause or the aggregate function.
Here is the more complex scope.
def self.sum_amount_chart_series(company, start_time)
orders_by_day = Widget.in_company(company).archived.not_void.
where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
group(pg_print_date_group).
select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
end
def self.pg_print_date_group
"CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end
And this is the select it is throwing at PG:
> Widget.sum_amount_chart_series(Company.first, 1.day.ago)
SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...> FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f' AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999') GROUP BY CAST((print_datetime + interval '-7 hours') AS date)
Which generates this error:
PGError: ERROR: column
"widgets.id" must appear in the
GROUP BY clause or be used in an
aggregate function LINE 1: SELECT
"widgets"."id" AS t0_r0,
"widgets"."user_id...
How do I rewrite the Widget.in_company scope so that AR does not expand the select query to include every Widget model field?
As Frank explained, PostgreSQL will reject any query which doesn't return a reproducible set of rows.
Suppose you've a query like:
select a, b, agg(c)
from tbl
group by a
PostgreSQL will reject it because b is left unspecified in the group by statement. Run that in MySQL, by contrast, and it will be accepted. In the latter case, however, fire up a few inserts, updates and deletes, and the order of the rows on disk pages ends up different.
If memory serves, implementation details are so that MySQL will actually sort by a, b and return the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified -- and sure enough, PostgreSQL does not always sort before running aggregate functions.
Potentially, this might result in different values of b in result set in PostgreSQL. And thus, PostgreSQL yields an error unless you're more specific:
select a, b, agg(c)
from tbl
group by a, b
What Frank highlighted is that, in PostgreSQL 9.1, if a is the primary key, than you can leave b unspecified -- the planner has been taught to ignore subsequent group by fields when applicable primary keys imply a unique row.
For your problem in particular, you need to specify your group by as you currently do, plus every field that you're basing your aggregate onto, i.e. "widgets"."id", "widgets"."user_id", [snip] but not stuff like sum(amount), which are the aggregate function calls.
As an off topic side note, I'm not sure how your ORM/model works but the SQL it's generating isn't optimal. Many of those left outer joins seem like they should be inner joins. This will result in allowing the planner to pick an appropriate join order where applicable.
PostgreSQL version 9.1 (beta at this moment) might fix your problem, but only if there is a functional dependency on the primary key.
From the release notes:
Allow non-GROUP BY columns in the
query target list when the primary key
is specified in the GROUP BY clause
(Peter Eisentraut)
Some other database system already
allowed this behavior, and because of
the primary key, the result is
unambiguous.
You could run a test and see if it fixes your problem. If you can wait for the production release, this can fix the problem without changing your code.
Firstly simplify your life by storing all dates in a standard time-zone. Changing dates with time-zones should really be done in the view as a user convenience. This alone should save you a lot of pain.
If you're already in production write a migration to create a normalised_date column wherever it would be helpful.
nrI propose that the other problem here is the use of raw SQL, which rails won't poke around for you. To avoid this try using the gem called Squeel (aka Metawhere 2) http://metautonomo.us/projects/squeel/
If you use this you should be able to remove hard coded SQL and let rails get back to doing its magic.
For example:
.select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
becomes (once your remove the need for normalising the date):
.select{sum(amount).as(total_amount)}
Sorry to answer my own question, but I figured it out.
First, let me apologize to those who thought I might be having an SQL or Postgres issue, it is not. The issue is with ActiveRecord and the SQL it is generating.
The answer is... use .joins instead of .includes. So I just changed the line in the top code and it works as expected.
class Widget < ActiveRecord::Base
def self.in_company(company)
joins(:store => {:area => :company}).where(:companies => {:id => company.id})
end
end
I'm guessing that when using .includes, ActiveRecord is trying to be smart and use JOINS in the SQL, but it's not smart enough for this particular case and was generating that ugly SQL to select all associated columns.
However, all the replies have taught me quite a bit about Postgres that I did not know, so thank you very much.
sort in mysql:
> ids = [11,31,29]
=> [11, 31, 29]
> Page.where(id: ids).order("field(id, #{ids.join(',')})")
in postgres:
def self.order_by_ids(ids)
order_by = ["case"]
ids.each_with_index.map do |id, index|
order_by << "WHEN id='#{id}' THEN #{index}"
end
order_by << "end"
order(order_by.join(" "))
end
User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id)
#=> [3,2,1]

rails - apply complex visibility settings to thinking sphinx results

in the product i'm developing, i have a Message model.
Message can be restricted to groups, or not restricted (available to
everyone).
If user belongs to one of Message's groups OR message is not
restricted, user can see the message.
here is the query selecting visible messages (in hope that it can
clarify what i mean)
(2,3,4,5,6,1) are the groups user belongs to, they are different for
each user
SELECT `messages`.* FROM `messages`
LEFT JOIN groups_messages ON
messages.id=groups_messages.message_id AND groups_messages.group_id in (2,3,4,5,6,1)
WHERE (messages.restricted=0 OR groups_messages.group_id is not NULL)
GROUP BY messages.id
here is analogical query using a subquery, in hope it helps to clarify what is needed
SELECT * FROM `messages` WHERE
(
restricted=0 OR id in ( select distinct message_id from groups_messages where group_id in (2,3,4,5,6,1) )
)
is it possible somehow to apply this visibility setting to thinking
sphinx results? meaning to apply this OR and IN to
Message.search "test" with/with_all
?
if it is not possible, another question would be - is it somehow
possible to get ids of all objects found in search,
so that i could perform query myself, just adding AND to my WHERE
condition
SELECT * FROM `messages` WHERE
(
restricted=0 OR id in ( select distinct message_id from groups_messages where group_id in (2,3,4,5,6,1) )
)
AND id in (ids_of_the_messages_found_by_thinking_sphinx)
i imagine both the query without LEFT JOIN and adding AND to WHERE
will be a bit resource intensive for mysql, but if other solutions are
not possible, then this would do
thanks,
Pavel K
received a response from Pat Allan, developer of Thinking Sphinx,
link text
I think the best way is to build a string that includes 0 if the
message is unrestricted, otherwise returns the group ids, concatenated
together with commas... ie:
"2,3,4,5,6" or "0"
So, you'll want to build a SQL snippet for an attribute, something
vaguely like:
has "IF(messages.restricted = 0, '0', GROUP_CONCAT (groups_messages.group_id SEPARATOR ','))", :as => :group_ids, :type => :multi
And then for searching:
Message.search "foo", :with => {
:group_ids => [0] + current_user.message_group_ids
}
The SQL snippet will have to be different if you're using PostgreSQL, though... let me know if that's the case.
will try that

Resources