How can I delete duplicates based on two attributes? - ruby-on-rails

My model is really simple:
create_table "stack_items", force: true do |t|
t.integer "stack_id"
t.integer "service_id"
t.text "description"
end
I need to remove duplicate StackItem records that have the same stack_id and service_id. However, if one of the dupes has anything in the description field, I have to keep that one and delete the other duplicate.
StackItem.group(:stack_id, :service_id).order("count_id desc").where("COUNT(*) > 1")
So far I've tried to just grab the duplicates but it's saying I cannot count within a where statement.
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: aggregate functions are not allowed in WHERE
How can I achieve this using Rails 4 and ActiveRecord? My database is Postgresql.
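The error itself says that an aggregate cannot go in WHERE; in ActiveRecord a condition on COUNT(*) belongs in having("COUNT(*) > 1") instead. The keep/delete rule described above can be sketched in plain Ruby, independent of the database. This is only a minimal sketch: Item and duplicate_ids_to_delete are hypothetical names, and it assumes that when several duplicates have a description, keeping the one with the lowest id is acceptable.

```ruby
# Plain-Ruby sketch of the keep/delete rule; Item is a hypothetical
# stand-in for StackItem rows, not an ActiveRecord model.
Item = Struct.new(:id, :stack_id, :service_id, :description)

# Returns the ids of the rows to delete. For each (stack_id, service_id)
# group with more than one row, exactly one row is kept, preferring a row
# whose description is not blank.
def duplicate_ids_to_delete(items)
  items.group_by { |i| [i.stack_id, i.service_id] }.flat_map do |_key, group|
    next [] if group.size < 2
    # Rows with a description sort first; ties fall back to the lowest id.
    keeper = group.min_by { |i| [i.description.to_s.strip.empty? ? 1 : 0, i.id] }
    (group - [keeper]).map(&:id)
  end
end

rows = [
  Item.new(1, 10, 100, nil),
  Item.new(2, 10, 100, "has notes"),
  Item.new(3, 10, 100, ""),
  Item.new(4, 20, 200, nil),
  Item.new(5, 30, 300, nil),
  Item.new(6, 30, 300, nil),
]

p duplicate_ids_to_delete(rows)  # => [1, 3, 6]
```

In a Rails app the resulting ids could then be passed to something like StackItem.where(id: ids).delete_all.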

Related

Active Record Where Not boolean: true

I'm struggling to wrap my mind around an ActiveRecord query.
I'm trying to search my database for GolfRetailer objects with IDs 1..100 that have something (not nil) in their :website field and that don't have true in their duplicate_domain field.
Here's the query I expected to work:
GolfRetailer.where.not(website: nil, duplicate_domain: true).where(id: 1..100)
I also tried this variant of essentially the same query: GolfRetailer.where.not(website: nil).where(id: 1..100, duplicate_domain: !true)
But both return an empty array, despite there definitely being records that meet those requirements.
When I run GolfRetailer.where.not(website: nil).where(id: 1..100) I get an array, and when I run GolfRetailer.where.not(website: nil, duplicate_domain: nil).where(id: 1..100) I also get an array, but with all records that do have the true duplicate_domain flag, which isn't what I'm looking for.
I'd rather not search for records that have duplicate_domain: nil as that's not always correct (I may not have processed their domain yet).
For clarity, here is the Schema for the Model.
create_table "golf_retailers", force: :cascade do |t|
t.string "name"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "place_id"
t.string "website"
t.string "formatted_address"
t.string "google_places_name"
t.string "email"
t.boolean "duplicate_domain"
t.index ["duplicate_domain"], name: "index_golf_retailers_on_duplicate_domain"
end
What am I missing to make this query work?
This happens because in SQL a comparison such as duplicate_domain != TRUE evaluates to NULL (unknown) rather than TRUE when the column itself is NULL, and WHERE keeps only the rows for which the condition is TRUE. NULL represents an unknown value, so the database cannot say whether it differs from TRUE, and those rows are excluded from the result.
One way to get around this is to use IS DISTINCT FROM:
GolfRetailer
.where(id: 1..100)
.where.not(website: nil)
.where("duplicate_domain IS DISTINCT FROM ?", true)
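To see why the != filter drops NULL rows while IS DISTINCT FROM keeps them, here is a toy model of the two comparisons in plain Ruby, with nil standing in for NULL. The helper names sql_not_equal and sql_is_distinct_from are hypothetical and only imitate the SQL semantics; they are not part of ActiveRecord.

```ruby
# Toy model of SQL three-valued logic; nil plays the role of NULL.

# SQL `a != b`: the result is unknown (nil) if either side is NULL.
def sql_not_equal(a, b)
  return nil if a.nil? || b.nil?
  a != b
end

# SQL `a IS DISTINCT FROM b`: NULL is treated as an ordinary comparable
# value, so the result is always true or false.
def sql_is_distinct_from(a, b)
  a != b
end

flags = [true, false, nil]

# WHERE keeps a row only when the condition is exactly true.
kept_by_not_equal = flags.select { |f| sql_not_equal(f, true) == true }
kept_by_distinct  = flags.select { |f| sql_is_distinct_from(f, true) == true }

p kept_by_not_equal  # => [false]
p kept_by_distinct   # => [false, nil]
```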
As others have mentioned, you should also ask yourself if it's really the case that it's ever unknown to you if a GolfRetailer has a duplicate_domain.
If a duplicate_domain of NULL actually means the retailer doesn't have one (false), then you should consider preventing NULL values for that column entirely.
You can do this by adding a NOT NULL constraint on the column with a change_column_null database migration.
Before adding the NOT NULL constraint, you first need to make sure every row has a non-null value in that column.
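def change
  # Backfill only the rows that are currently NULL; rows already set to true stay true.
  GolfRetailer.where(duplicate_domain: nil).in_batches.update_all(duplicate_domain: false)
  # The third argument is the new value of null: false adds the NOT NULL constraint.
  change_column_null :golf_retailers, :duplicate_domain, false
end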
If your application is under load, you should also be careful about the performance impact a migration like this might have, notably if you add a NOT NULL constraint with a default value.
Consider using something like the Strong Migrations gem to help find DB migrations that might cause downtime before production.

Ruby on Rails/PostGRE - No operator matches the given name and argument type(s). Error

I am obtaining this error on my Ruby on Rails app,
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
I've read on the stack overflow API and can't find an answer that works for me. So this is the specific parts of the code:
ActiveRecord::StatementInvalid in Store#show_item
Showing /media/store_test/app/views/store/show_item.html.erb where line #24 raised:
PG::UndefinedFunction: ERROR: operator does not exist: character varying = integer
LINE 1: ...CT "show_item".* FROM "current_item" WHERE (user_id = 1)
^
The background is that I have two kinds of users, store users and employee users. Both are users, but employee users have a "flag" on them, so they can see all items in the store. Store users do not have this flag, so this web page should show only the items they have "wishlisted". When I build the table to populate this, I get the above error.
This code works when I'm an employee user and populates my table as required, but it does not work when I'm a store user.
Question: How can I fix this error without heavily modifying my code?
EDIT: SCHEMA
create_table "current_item", force: :cascade do |t|
t.string "name", default: "", null: false
t.string "description"
t.integer "cost"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "user_id"
end
Thanks to Marcin Kologziej:
user_id column was defined with the wrong type. Instead of strings, it should have been integer, as per the error. Therefore, I have created a new migration, using the code:
change_column :current_item, :user_id, :integer, using: 'user_id::integer'
And then performing:
rake db:migrate
And everything works perfectly now.

PG::UndefinedColumn: ERROR: column "courseid" of relation "courses" does not exist

I am doing Rails + PostgreSql app. I need to run sql dump on production env. I have courses table with courseID attribute. But when I run my sql I get this error:
PG::UndefinedColumn: ERROR: column "courseid" of relation "courses" does not exist
LINE 1: INSERT INTO courses (courseID, name, created_at, updated_at)...
Here is how my sql dump looks like:
INSERT INTO courses (courseID, name, created_at, updated_at) VALUES
('CSCI150', 'Fundamentals of Programming',
localtimestamp, localtimestamp ),
etc...;
Tried to put quotes (' ') around attributes, didn't help. Strange error. What might cause that?
EDIT:
Here is what in my schema.rb
create_table "courses", force: :cascade do |t|
t.string "name"
t.string "courseID"
t.integer "school_id"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
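All identifiers (including column names) that are not double-quoted are folded to lower case in PostgreSQL. Column names that were created with double quotes and thereby retained upper-case letters (and/or other syntax violations) have to be double-quoted for the rest of their life. So, yes, PostgreSQL column names are case-sensitive once they have been quoted.
Rails quotes column names when it creates the table, so the column really is named "courseID", while the unquoted courseID in your dump is folded to courseid, which does not exist. Single quotes do not help, because they create a string literal rather than an identifier.
Either rename the column to lower case or enclose it in double quotes ("courseID") in the dump.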
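The folding rule can be illustrated with a small plain-Ruby model. fold_identifier is a hypothetical helper that only mimics how PostgreSQL resolves an identifier as written in SQL text; it deliberately ignores details such as escaped double quotes inside a quoted name.

```ruby
# Toy model of PostgreSQL identifier resolution (not a real parser).
def fold_identifier(written)
  if written.length >= 2 && written.start_with?('"') && written.end_with?('"')
    written[1..-2]      # double-quoted: case is preserved
  else
    written.downcase    # unquoted: folded to lower case
  end
end

actual_column = "courseID"  # the name as created by the schema above

p fold_identifier("courseID")                      # => "courseid"
p fold_identifier("courseID") == actual_column     # => false
p fold_identifier('"courseID"') == actual_column   # => true
```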

rails4 pluck with order and limit

In my sidebar I display the freshly created user profiles. Profile belongs_to user and user has_one profile. I realized that I only use 3 columns from the profile table, so it would be better to use pluck. I also have a link_to user_path(profile.user) in the partial, so somehow I have to tell who the user is. At the moment I'm using includes, but I don't need the whole user table. So I load too many columns from both the user and the profile tables.
How can I optimize this with pluck? I tried a few versions, but always got some error (most of the time profile.user is not defined).
My current code:
def set_sidebar_users
@profiles_sidebar = Profile.order(created_at: :desc).includes(:user).limit(3) if user_signed_in?
end
create_table "profiles", force: :cascade do |t|
t.integer "user_id", null: false
t.string "first_name", null: false
t.string "last_name", null: false
t.string "company", null: false
t.string "job_title", null: false
t.string "phone_number"
t.text "description"
t.datetime "created_at"
t.datetime "updated_at"
t.string "avatar"
t.string "location"
end
Okay, let's go through three different ways to accomplish what you are looking for.
1. Select with joins
First of all, there is a difference between includes and joins.
includes just eager loads the association with all of its columns. It does not let you select specific columns from both tables. That is what joins does: it lets you query both tables and select the columns of your choice.
def set_sidebar_users
@profiles_sidebar = Profile.select("profiles.first_name,profiles.last_name,profiles.id,users.email as user_email,user_id").joins(:user).order("profiles.created_at desc").limit(3) if user_signed_in?
end
It returns a Profile relation whose records carry all of the columns you listed in the select clause. You can read them just as you would on a profile object, e.g.
@profiles_sidebar.first.user_email gives you the user's email for this profile.
This approach is best if you want to query on multiple tables or select columns from both tables.
2. Pluck
def set_sidebar_users
@profiles_sidebar = Profile.order(created_at: :desc).joins(:user).limit(3).pluck("users.email", "profiles.first_name") if user_signed_in?
end
pluck can fetch columns across associations, but it does not give you the power of ActiveRecord. It simply returns an array of the selected column values, in the order you listed them.
In the first example you can get the user for a profile object with @profiles_sidebar.first.user, but with pluck you cannot, because the result is just a plain array. That is why most of your attempts raised the error that profile.user is not defined.
3. Association with selected columns
Now for option three. The first solution lets you read columns from both tables and use the power of ActiveRecord, but it does not eager load the association. So it will still cost you N+1 queries if you loop through the association on the returned result, e.g. @profiles_sidebar.map(&:user).
So if you want to use includes but load only selected columns, you should define a new association restricted to those columns and include that association instead.
e.g.
In profile.rb
belongs_to :user_with_selected_column, -> { select("users.id, users.email") }, class_name: "User", foreign_key: "user_id"
Now you can include it in above code
def set_sidebar_users
@profiles_sidebar = Profile.order(created_at: :desc).includes(:user_with_selected_column).limit(3) if user_signed_in?
end
This eager loads the users but selects only each user's email and id.
More information can be found in the question "ActiveRecord includes. Specify included columns".
UPDATE
Since you asked about the pros of pluck, let's go through them.
As you know, pluck returns a plain array. It does not instantiate ActiveRecord objects; it simply returns the data that comes back from the database.
So pluck is best where you don't need ActiveRecord objects and just want to show the returned data, for example in tabular form.
select returns a relation, so you can query it further or call model methods on its instances.
To summarize:
pluck for model values, select for model objects
More information can be found at http://gavinmiller.io/2013/getting-to-know-pluck-and-select/
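The "values versus objects" distinction can be imitated in plain Ruby. This is only a rough sketch: ProfileRecord, ROWS, toy_pluck and toy_select are hypothetical stand-ins rather than ActiveRecord itself, and unlike real select the toy version simply leaves unselected attributes as nil.

```ruby
# Hypothetical stand-in for a model class with one model method.
ProfileRecord = Struct.new(:id, :first_name, :last_name) do
  def full_name
    "#{first_name} #{last_name}"
  end
end

# Hypothetical stand-in for rows coming back from the database.
ROWS = [
  { id: 1, first_name: "Ada",   last_name: "Lovelace" },
  { id: 2, first_name: "Grace", last_name: "Hopper" },
].freeze

# Like pluck: raw values only (a flat array for one column,
# an array of arrays for several columns).
def toy_pluck(rows, *columns)
  rows.map { |row| columns.size == 1 ? row[columns.first] : row.values_at(*columns) }
end

# Like select: objects that still respond to model methods.
def toy_select(rows, *columns)
  rows.map do |row|
    ProfileRecord.new(*ProfileRecord.members.map { |m| columns.include?(m) ? row[m] : nil })
  end
end

p toy_pluck(ROWS, :first_name)                                # => ["Ada", "Grace"]
p toy_pluck(ROWS, :id, :first_name)                           # => [[1, "Ada"], [2, "Grace"]]
p toy_select(ROWS, :first_name, :last_name).map(&:full_name)  # => ["Ada Lovelace", "Grace Hopper"]
```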

Rails 4: How to include associated objects conditionally

Here is what I am trying to do:
Product.first.reviews.includes(:comments).where('comments.reply_to_id=?', nil)
So basically I want to load the product reviews along with any associated comments.
Comments belong to the review, not to the product.
Here is the error:
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'comments.reply_to_id' in 'where clause': SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`product_id` = 8 AND (comments.reply_to_id=NULL)
From schema.rb:
create_table "comments", force: :cascade do |t|
...
t.integer "reply_to_id", limit: 4
end
@pavan is right. But you can use the Hash syntax, which works out the table reference automatically.
Product
.first
.reviews
.includes(:comments)
.where(comments: { reply_to_id: nil })
From the API:
If you want to add conditions to your included models you’ll have to explicitly reference them.
You need to add .references to the query when you use includes with string conditions. Also note that comparing with = NULL never matches in SQL, so use IS NULL in the string condition:
Product.first.reviews.includes(:comments).where('comments.reply_to_id IS NULL').references(:comments)
