We are using the following query to set a certain field to null in a table with 2 million rows. Is there a faster way to do this using the ActiveRecord API? Right now it takes 2-3 minutes for this call to return.
Foo.update_all(:bar => nil)
Try this instead; it's another way to do what you want:
ALTER TABLE foo DROP COLUMN bar;
and then,
ALTER TABLE foo ADD COLUMN bar INT(10) DEFAULT NULL;
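If you go this route, one way to keep it in your schema history is to wrap it in a migration. A rough sketch, assuming MySQL and the INT(10) column above; the class name is just a placeholder (on Rails 5+ you would subclass ActiveRecord::Migration[x.y]):
# Hypothetical migration: dropping and re-adding the column is much faster
# than rewriting 2 million rows in place, but the old values are gone for good.
class ResetBarOnFoos < ActiveRecord::Migration
  def up
    execute "ALTER TABLE foo DROP COLUMN bar"
    execute "ALTER TABLE foo ADD COLUMN bar INT(10) DEFAULT NULL"
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end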
Probably not. This will just execute very straightforward SQL:
UPDATE foos SET bar = NULL
No idea how to utilize the ActiveRecord API to make that faster.
My code depends on the order of records in the table. My assumption was that a table can be considered a list, so the records maintain their order. I have a small piece of update code, shown below, that updates a record at a particular index in the table.
p = pieces[index]
p.position = 0
p.save
I check the order of records before and after this update, and I see that after the update the updated record has moved to the end of the list. I print Piece.all to inspect the list. The order is maintained in MySQL, but when I deploy to Heroku, which uses PostgreSQL, the order is not maintained, so this was a surprising find for me.
Is there no guarantee of order in tables, and should one not depend on the order? Please correct my misunderstanding, and thanks for the clarification.
You should NEVER depend on the order in my honest opinion.
Rows are returned in an unspecified order, per the SQL spec, unless you add an ORDER BY clause. In Postgres, that means you'll get rows in, basically, the order in which live rows are read from the disk.
MySQL tends to return rows in the order they were inserted, and this is why you see the difference in behavior.
If you want them to always be returned in the order they were created, you can use Piece.order("created_at")
You state:
My assumption was that a table can be considered a list so that the records maintain order.
This is incorrect. A table represents an unordered set. There is no inherent ordering in the table. A result set similarly lacks ordering. The only way to guarantee the ordering of a result set is to use ORDER BY in the query.
So, an update changes values in one or more columns in one or more rows. It does not change the "ordering" of rows, because they are not ordered.
Note: Under some circumstances, a query may appear to return results in a particular order. You really should not depend on this behavior, unless the query has an explicit ORDER BY.
Tables are normally unordered, and should be presumed to be unordered, unless they have a CLUSTERed index. That's worth knowing because understanding clustered indexes is somewhat useful. That said, what you receive back from a query, the result set, should be presumed to be unordered regardless, because the join order is always undefined.
So if order matters, always be explicit and use ORDER BY. Now, for illustration, let's have some fun.
CREATE TABLE bar ( qux serial PRIMARY KEY, asdf text );
INSERT INTO bar (asdf) ( VALUES ('z'),('x'),('g'),('a') );
Now we've got this,
SELECT * FROM BAR;
qux | asdf
-----+------
1 | z
2 | x
3 | g
4 | a
Now we create a CLUSTERed index,
CREATE INDEX asdfidx ON bar (asdf);
CLUSTER bar USING asdfidx;
Now the rows come back in the clustered order (though without an ORDER BY this still isn't a guarantee),
SELECT * FROM bar;
qux | asdf
-----+------
4 | a
3 | g
2 | x
1 | z
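Back in ActiveRecord terms, the takeaway for the original question is to ask for the order you need instead of relying on the table. A minimal sketch, assuming the Piece model from the question has a position column (and Rails timestamps for the second variant):
pieces = Piece.order(:position)     # explicit ordering; never rely on physical row order
pieces = Piece.order(:created_at)   # or order by creation time, as suggested above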
I'm experiencing a race condition in ActiveRecord with PostgreSQL where I'm reading a value then incrementing it and inserting a new record:
num = Foo.where(bar_id: 42).maximum(:number)
Foo.create!({
bar_id: 42,
number: num + 1
})
At scale, multiple threads will simultaneously read then write the same value of number. Wrapping this in a transaction doesn't fix the race condition because the SELECT doesn't lock the table. I can't use an auto increment, because number is not unique; it's only unique within a given bar_id. I see 3 possible fixes:
Explicitly use a Postgres lock (a row-level lock?)
Use a unique constraint and retry on failures (yuck!)
Override save to use a subselect, i.e.
INSERT INTO foo (bar_id, number) VALUES (42, (SELECT MAX(number) + 1 FROM foo WHERE bar_id = 42));
All these solutions seem like I'd be reimplementing large parts of ActiveRecord::Base#save! Is there an easier way?
UPDATE:
I thought I found the answer with Foo.lock(true).where(bar_id: 42).maximum(:number), but that uses SELECT ... FOR UPDATE, which isn't allowed on aggregate queries.
UPDATE 2:
I've just been informed by our DBA that even if we could do INSERT INTO foo (bar_id, number) VALUES (42, (SELECT MAX(number) + 1 FROM foo WHERE bar_id = 42)); it wouldn't fix anything, since the SELECT runs under a different lock than the INSERT.
Your options are:
Run in SERIALIZABLE isolation. Interdependent transactions will be aborted on commit as having a serialization failure. You'll get lots of error log spam, and you'll be doing lots of retries, but it'll work reliably.
Define a UNIQUE constraint and retry on failure, as you noted. Same issues as above.
If there is a parent object, you can SELECT ... FOR UPDATE the parent object before doing your max query. In this case you'd SELECT 1 FROM bar WHERE bar_id = $1 FOR UPDATE. You are using bar as a lock for all foos with that bar_id. You can then know that it's safe to proceed, so long as every query that's doing your counter increment does this reliably. This can work quite well.
This still does an aggregate query for each call, which (per next option) is unnecessary, but at least it doesn't spam the error log like the above options.
Use a counter table. This is what I'd do. Either in bar, or in a side-table like bar_foo_counter, acquire a row ID using
UPDATE bar_foo_counter SET counter = counter + 1
WHERE bar_id = $1 RETURNING counter
or the less efficient option if your framework can't handle RETURNING:
SELECT counter FROM bar_foo_counter
WHERE bar_id = $1 FOR UPDATE;

UPDATE bar_foo_counter SET counter = counter + 1
WHERE bar_id = $1;
Then, in the same transaction, use the generated counter row for the number. When you commit, the counter table row for that bar_id gets unlocked for the next query to use. If you roll back, the change is discarded.
I recommend the counter approach, using a dedicated side table for the counter instead of adding a column to bar. That's cleaner to model, and means you create less update bloat in bar, which can slow down queries to bar.
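For the counter-table approach, a rough ActiveRecord sketch follows. The bar_foo_counter table is the one described above (one row per bar, seeded with the current MAX(number)); the next_number_for helper is purely illustrative. The important part is that the UPDATE ... RETURNING and the INSERT of the new Foo happen inside the same transaction:
# Bump the counter row and read the new value back in one statement.
def next_number_for(bar_id)
  bar_id = Integer(bar_id) # coerce before interpolating into SQL
  row = ActiveRecord::Base.connection.select_one(
    "UPDATE bar_foo_counter SET counter = counter + 1 " \
    "WHERE bar_id = #{bar_id} RETURNING counter"
  )
  row["counter"].to_i
end

Foo.transaction do
  # The counter row stays locked until commit/rollback, so concurrent
  # callers serialize on it and each gets a distinct number.
  Foo.create!(bar_id: 42, number: next_number_for(42))
end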
I'd like to have a basic table summing up the number of occurrences of values inside arrays.
My app is a Daily Deal app built to learn more Ruby on Rails.
I have a model Deal, which has an attribute called deal_goal. It's a multiple select whose value is stored as an array.
Here is deal_goal taken from schema.rb:
t.string "deal_goal",:array => true
So deal A can have deal_goal = [traffic, qualification] and another deal can have deal_goal = [branding, traffic, acquisition].
What I'd like to build is a table in my dashboard which takes each type of goal (each value in the array) and counts the number of deals whose deal_goal array contains that goal.
My objective is to have this table:
How can I achieve this? I think I would need to group the deal_goal arrays by each type of value and then count the number of times each goal appears in the arrays. I'm quite new to RoR and can't manage to do it.
Here is my code so far:
column do
  panel "top of Goals" do
    table_for Deal.limit(10) do
      column "Goal", :deal_goal   # ????
      # add 2 columns:
      #   'nb of deals with this goal'
      #   'Share of deals with this goal'
    end
  end
end
Any help would be much appreciated!
I can't think of any clean way to get the results you're after through ActiveRecord, but it is pretty easy in SQL.
All you're really trying to do is open up the deal_goal arrays and build a histogram based on the opened arrays. You can express that directly in SQL this way:
with expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select goal, count(*) n
from expanded_deals
group by goal
And if you want to include all four goals even if they don't appear in any of the deal_goals then just toss in a LEFT JOIN to say so:
with
all_goals(goal) as (
values ('traffic'),
('acquisition'),
('branding'),
('qualification')
),
expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select all_goals.goal goal,
count(expanded_deals.id) n
from all_goals
left join expanded_deals using (goal)
group by all_goals.goal
SQL Demo: http://sqlfiddle.com/#!15/3f0af/20
Throw one of those into a select_rows call and you'll get your data:
Deal.connection.select_rows(%q{ SQL goes here }).each do |row|
goal = row.first
n = row.last.to_i
#....
end
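If you also want the "share of deals with this goal" column from the question, one option is to collect those rows into a hash and divide by the total number of deals. A rough sketch, assuming the first CTE query above:
sql = <<-SQL
  with expanded_deals(id, goal) as (
    select id, unnest(deal_goal)
    from deals
  )
  select goal, count(*) n
  from expanded_deals
  group by goal
SQL

counts = {}
Deal.connection.select_rows(sql).each { |goal, n| counts[goal] = n.to_i }

total = Deal.count
counts.each do |goal, n|
  share = total.zero? ? 0.0 : 100.0 * n / total
  puts "#{goal}: #{n} deals (#{share.round(1)}%)"
end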
There's probably a lot going on here that you're not familiar with so I'll explain a little.
First of all, I'm using WITH and Common Table Expressions (CTE) to simplify the SELECTs. WITH is a standard SQL feature that allows you to produce SQL macros or inlined temporary tables of a sort. For the most part, you can take the CTE and drop it right in the query where its name is:
with some_cte(colname1, colname2, ...) as ( some_pile_of_complexity )
select * from some_cte
is like this:
select * from ( some_pile_of_complexity ) as some_cte(colname1, colname2, ...)
CTEs are the SQL way of refactoring an overly complex query/method into smaller and easier to understand pieces.
unnest is an array function which unpacks an array into individual rows. So if you say unnest(ARRAY[1,2]), you get two rows back: 1 and 2.
VALUES in PostgreSQL is used to, more or less, generate inlined constant tables. You can use VALUES anywhere you could use a normal table, it isn't just some syntax that you throw in an INSERT to tell the database what values to insert. That means that you can say things like this:
select * from (values (1), (2)) as dt
and get the rows 1 and 2 out. Throwing that VALUES into a CTE makes things nice and readable and makes it look like any old table in the final query.
I have to back up a table whose number of columns can change. When my ETL script starts, it doesn't know the number of columns. How can I create an INSERT INTO table VALUES (?1, ?2, ...) statement on the fly?
Regards,
Jacek
Depending on the database, one can use CREATE TABLE ... AS SELECT (or similar) to back up the table. Example:
CREATE TABLE new_table AS (SELECT * FROM old_table);
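If the ETL script is Ruby/ActiveRecord, a minimal sketch of doing this with a generated backup name might look like the following (the table names are placeholders and an established connection is assumed):
backup_name = "old_table_backup_#{Time.now.strftime('%Y%m%d%H%M%S')}"

# Copies both structure and data without needing to know the columns up front.
ActiveRecord::Base.connection.execute(
  "CREATE TABLE #{backup_name} AS SELECT * FROM old_table"
)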
I'm writing a migration to convert a non-Rails app into the right format for Rails - one of the tables for some reason does not have auto increment set on the id column. Is there a quick way to turn it on while in a migration, maybe with change_column or something?
You need to execute an SQL statement.
statement = "ALTER TABLE `users` CHANGE `id` `id` SMALLINT( 5 ) UNSIGNED NOT NULL AUTO_INCREMENT"
ActiveRecord::Base.connection.execute(statement)
Note this is just an example. The final SQL statement syntax depends on the database.
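Since the question is about doing this from a migration, a rough sketch of wrapping that statement in one is shown below. The class name and column type are placeholders; adjust them to your schema (and on Rails 5+ subclass ActiveRecord::Migration[x.y]):
class AddAutoIncrementToUsersId < ActiveRecord::Migration
  def up
    execute "ALTER TABLE `users` CHANGE `id` `id` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT"
  end

  def down
    execute "ALTER TABLE `users` CHANGE `id` `id` SMALLINT(5) UNSIGNED NOT NULL"
  end
end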
If you're on PostgreSQL, a single statement won't do it. You'll need to create a new sequence in the database.
create sequence users_id_seq;
Then add the id column to your table
alter table users
add id INT UNIQUE;
Then set the default value for the id
alter table users
alter column id
set default nextval('users_id_seq');
Then populate the id column. This may take quite a while if the table has many rows.
update users
set id = nextval('users_id_seq');
Hope this helps PostgreSQL users...
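If you want this inside a migration as well, the same PostgreSQL steps can be wrapped with execute; a rough sketch (the class name is a placeholder):
class AddIdSequenceToUsers < ActiveRecord::Migration
  def up
    execute "CREATE SEQUENCE users_id_seq"
    execute "ALTER TABLE users ADD id INT UNIQUE"
    execute "ALTER TABLE users ALTER COLUMN id SET DEFAULT nextval('users_id_seq')"
    execute "UPDATE users SET id = nextval('users_id_seq')"
  end

  def down
    execute "ALTER TABLE users DROP COLUMN id"
    execute "DROP SEQUENCE users_id_seq"
  end
end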
The Postgres answer by #jlfenaux misses out on the serial type, which does all of it for you automatically:
ALTER TABLE tbl add tbl_id serial;
More details in this related answer.