Deal with PostgreSQL concurrency using Rails find_or_create

Deal with PostgreSQL concurrency using Rails find_or_create - ruby-on-rails

For some reason this code can create duplicated games if different users run it at the same moment:
game = Game.find_or_create_by(
status: Game::STATUS[:waiting],
category_id: params[:category_id],
private: 0
) do |g|
is_new = true
g.user = current_user
end
I can't figure out clearly what is the matter, but probably its about different Unicorn processes which use different database connections so transactions can run in parallel.
If so, I need the right way to avoid it, maybe I should use Rails transactions or Postgres locks, but I really need an example of using.
Thank you.

It can happen in high concurrency levels.
According to rails docs, these queries will run:
SELECT * FROM games WHERE status = 'waiting' AND ... LIMIT 1;
INSERT INTO games (status, ...) VALUES ('waiting', ...);
The second only runs, when the first haven't returned a row.
It is possible, if two (or more) connections starts the first query within a few microseconds, that multiple processes will create multiple entries. To prevent that, you can use an ACCESS EXCLUSIVE lock on that table, or use a custom advisory lock.
You can use some unique index too, to prevent multiple entries to be inserted into your database, but if it's used on its own, that will cause SQL exceptions in this situation.
EDIT:
ACCESS EXCLUSIVE lock can be acquired through the LOCK command.
Advisory lock can be acquired through using the pg_advisory_lock(id) function.
Both requires you to run arbitrary SQL commands.
Another way would be to use custom queries with:
insert only if it's not exists (& return with all fields)
select only if not inserted
Something, like:
INSERT INTO games (status, category_id, private)
SELECT 'waiting', 2, 0
WHERE NOT EXISTS (
SELECT 1
FROM games
WHERE status = 'waiting'
AND category_id = 2
AND private = 0
)
RETURNING *;
-- only select, when this not inserted anything
SELECT *
FROM games
WHERE status = 'waiting'
AND category_id = 2
AND private = 0

Related

How to set a specific Number as Next ID and Its Continuation to use as Primary Key on Creation of Next Record On wards in PL-SQL in my Rails-4 App?

If my Question title is not clear,
Let me details it more:
In my rails application, in local(development) I am using PL-SQL as my databse, and there is a Customer table where the Primary ID starts from 1,2,3.......so on. Till I start using the same database in Local(development) & Production there was no problem in creating customers, so after I started running Local(development) & Production in parallel with same DB many of the time customer creations fails as ID's trying to create as Duplicate. So how can I set in my Local/Production to change the next Index to start with for the ID into another number to avoid this conflict. ?
Eg: I want to continue using(1,2.....etc) the ID in Local & in production I want to set the next Id from 50000 on wards and continuation.

If I understand your question correctly, you have development environment and live environment both linked to the same Database and same table (say the table name is customer_tab, and the primary id column is cus_id).
Although, this is highly not recommended practice, but if you want to have the primary id be in sequence regardless where the insert is coming from (live or dev), then you can use sequence and triggers. That is, insert statement will not insert the primary id, rather it will leave it null. However, on insert, you run the trigger that uses the sequence numbers. Something like this:
/*Create sequence*/
create sequence customer_tab_seq
start with 5000
increment by 1;
/*Create trigger*/
CREATE OR REPLACE TRIGGER customer_tab_tri
BEFORE INSERT ON customer_tab
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
IF INSERTING THEN
IF :NEW.cus_id IS NULL THEN
SELECT customer_tab_seq.NEXTVAL INTO :NEW.cus_id FROM DUAL;
END IF;
END IF;
END;
/
In this case, first row inserted will be given 5000 (regardless if it is from live or dev), the second will be given 5001 (regardless if it is from live or dev). And it will save you future conflicts.

Deadlock in update query

I have a update query in a stored procedure which is the main reason for causing deadlock.
This stored procedure is used in SSIS package in a foreach loop.
It looks like that the stored procedure calls the Salespreprocessing table and goes into deadlock state. This occurs when we make a call to this SSIS package simultaneously. Here is my SQL query
UPDATE SPP
SET SPP.Promotion_Id = T.PromotionID
FROM staging.SalesPreProcessing SPP WITH(INDEX(staging_CIDXSalesPreprocessing1))
INNER JOIN #WithConcatenatedPromotionID T
ON SPP.DocLineNo = T.BillItem
AND SPP.DocNum = T.BillNumber
AND SPP.Cust_Code = T.CustomerCode
AND SPP.ZCS_EAN_CODE = T.ProductCode
AND SPP.BILLING_REPORTING_DATE = T.PricingDate
WHERE SPP.InterfaceStatusTrackingID = #in_InterfaceStatusTrackingId AND SPP.setupid=#in_SetupId
I have created clustered index for setupid and a non-clustered indexes for rest of the columns of the table.
Here is my non-clustered Index
CREATE NONCLUSTERED INDEX [staging_CIDXSalesPreprocessing] on salespreprocessing
(
[SetupId] ASC,
[InterfaceStatusTrackingID] ASC
) INCLUDE`enter code here`
([DocLineNo] ,
[DocNum] ,
[Cust_Code] ,
[ZCS_EAN_CODE] ,
[Billing_Reporting_Date]
)
I am still getting Deadlock

Firstly the non-clustered index seems pointless as its first column is setupId which you say is the column for the clustered index. Thus, assuming that the setupId values are sufficiently variegated, queries will always use the clustered index over and above the nonclustered one. What is the primary key?
In terms of avoiding the deadlock you need to:
1) Ensure that the locks are taken in the same order each time that the SP is called within the foreach loop. I don't know what you're looping round? The results of another SP/query? If so ensure that there is an ORDER BY in that.
2) Is the foreach loop within a transaction? If it is does it need to be? Could you release the locks after each call to the SP by calling it from a non-transactional environment?
3) Take as few locks as possible within the SP. I can't see what query is used to create the temporary table you join to but that may be the issue. You need to use SQL Profiler to find out what object exactly the deadlock is occurring on but using hints such as ROWLOCK may help.

Is there anyway to make a lesser impact on my database with this request?

For the analytics of my site, I'm required to extract the 4 states of my users.
#members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,0000 records.. Which is evidently a huge data pull for Rails
# Member Load (155.5ms)
#invited = #members.where("user_id is null")
# Member Load (21.6ms)
#not_started = #members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", #sections.map(&:id) )
# Member Load (82.9ms)
#in_progress = #members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', #sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
#completes = Quiz.where(enterprise_member_id: registration.members, section_id: #sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503 meaning my app gives up on the request. Any ideas how I can refactor this code to run faster? Maybe by better joins syntax? I'm curious how sites with larger datasets accomplish what seems like such trivial DB calls.

The answer is your indexes. Check your rails logs (or check the console in development mode) and copy the queries to your db tool. Slap an "Explain" in front of the query and it will give you a breakdown. From here you can see what indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema,
enterprise_members: needs an index on enterprise_member_id
members: user_id
quizes: section_id

As someone else posted definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the #members query, what are you using the #members records for? Do you really need to retrieve all attributes for every member record? If you are not using every attribute, I suggest only getting the attributes that you actually use for something, .pluck usage could be warranted. 3rd and 4th queries, look fishy. I assume you've run the queries in a console? Again not sure what the queries are being used for but I'll toss in that it is often useful to write raw sql first and query on the db first. Then, you can apply your findings to rewriting activerecord queries.
What is the .completed tagged on the end? Is it supposed to be there? only thing I found close in the rails api is .completed? If it is a custom method definitely look into it. You potentially also have an use case for scopes.

THIRD QUERY:
I unfortunately don't know ruby on rails, but from a postgresql perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", #sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
left outer join quizzes q on q.enterprise_member_id = em.id
join users u on u.id = q.enterprise_member_id
where quizzes.section_id in (?)
and q.enterprise_member_id is null
Based on my understanding this will allow postgres to sort both the enterprise_members table and the quizzes and do a hash join. This is better than when it will do now. Right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it to enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This will be especially good if there are a relatively small number of user_ids that are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Anytime you query enterprise_members with something that matches the index's where clause, the partial index can be used and quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.

Thanks everyone for your ideas. I basically did what everyone said. I added indexes, resorted how I called everything, but the major difference was using the pluck method.. Here's my new stats :
#alt_members = list.members.pluck :id # 23ms
if list.course.sections.tests.present? && #sections = list.course.sections.tests
#quiz_member_ids = Quiz.where(section_id: #sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
#invited = list.members.count('user_id is null') # 12.5ms
#not_started = ( #alt_members - ( #alt_members & #quiz_member_ids ).count #0ms
#in_progress = ( #alt_members & #quiz_member_ids ).count # 0ms
#completes = ( #alt_members & Quiz.where(section_id: #sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
#question_count = Quiz.where(section_id: #sections.map(&:id), completed: true).limit(5).map{|quiz|quiz.answers.count}.max # 3.5ms

PostgreSQL and ActiveRecord subselect for race condition

I'm experiencing a race condition in ActiveRecord with PostgreSQL where I'm reading a value then incrementing it and inserting a new record:
num = Foo.where(bar_id: 42).maximum(:number)
Foo.create!({
bar_id: 42,
number: num + 1
})
At scale, multiple threads will simultaneously read then write the same value of number. Wrapping this in a transaction doesn't fix the race condition because the SELECT doesn't lock the table. I can't use an auto increment, because number is not unique, it's only unique given a certain bar_id. I see 3 possible fixes:
Explicitly use a postgres lock (a row-level lock?)
Use a unique constraint and retry on fails (yuck!)
Override save to use a subselect, I.E.
INSERT INTO foo (bar_id, number) VALUES (42, (SELECT MAX(number) + 1 FROM foo WHERE bar_id = 42));
All these solutions seem like I'd be reimplementing large parts of ActiveRecord::Base#save! Is there an easier way?
UPDATE:
I thought I found the answer with Foo.lock(true).where(bar_id: 42).maximum(:number) but that uses SELECT FOR UDPATE which isn't allowed on aggregate queries
UPDATE 2:
I've just been informed by our DBA, that even if we could do INSERT INTO foo (bar_id, number) VALUES (42, (SELECT MAX(number) + 1 FROM foo WHERE bar_id = 42)); that doesn't fix anything, since the SELECT runs in a different lock than the INSERT

Your options are:
Run in SERIALIZABLE isolation. Interdependent transactions will be aborted on commit as having a serialization failure. You'll get lots of error log spam, and you'll be doing lots of retries, but it'll work reliably.
Define a UNIQUE constraint and retry on failure, as you noted. Same issues as above.
If there is a parent object, you can SELECT ... FOR UPDATE the parent object before doing your max query. In this case you'd SELECT 1 FROM bar WHERE bar_id = $1 FOR UPDATE. You are using bar as a lock for all foos with that bar_id. You can then know that it's safe to proceed, so long as every query that's doing your counter increment does this reliably. This can work quite well.
This still does an aggregate query for each call, which (per next option) is unnecessary, but at least it doesn't spam the error log like the above options.
Use a counter table. This is what I'd do. Either in bar, or in a side-table like bar_foo_counter, acquire a row ID using
UPDATE bar_foo_counter SET counter = counter + 1
WHERE bar_id = $1 RETURNING counter
or the less efficient option if your framework can't handle RETURNING:
SELECT counter FROM bar_foo_counter
WHERE bar_id = $1 FOR UPDATE;
UPDATE bar_foo_counter SET counter = $1;
Then, in the same transaction, use the generated counter row for the number. When you commit, the counter table row for that bar_id gets unlocked for the next query to use. If you roll back, the change is discarded.
I recommend the counter approach, using a dedicated side table for the counter instead of adding a column to bar. That's cleaner to model, and means you create less update bloat in bar, which can slow down queries to bar.

MySQL stored procedure causing problems?

EDIT:
I've narrowed my mysql wait timeout down to this line:
IF #resultsFound > 0 THEN
INSERT INTO product_search_query (QueryText, CategoryId) VALUES (keywords, topLevelCategoryId);
END IF;
Any idea why this would cause a problem? I can't work it out!
I've written a stored proc to search for products in certain categories, due to certain constraints I came across, I was unable to do what I wanted (limiting, but whilst still returning the total number of rows found, with sorting, etc..)
It's meant splits up a string of category Ids, from 1,2,3 in to a temporary table, then builds the full-text search query based on sorting options and limits, executes the query string and then selects out the total number of results.
Now, I know I'm no MySQL guru, very far from it, I've got it working, but I keep getting time outs with product searches etc. So I'm thinking this may be causing some kind of problem?
Does anyone have any ideas how I can tidy this up, or even do it in a much better way that I probably don't know about?
Thanks.
DELIMITER $$
DROP PROCEDURE IF EXISTS `product_search` $$
CREATE DEFINER=`root`#`localhost` PROCEDURE `product_search`(keywords text, categories text, topLevelCategoryId int, sortOrder int, startOffset int, itemsToReturn int)
BEGIN
declare foundPos tinyint unsigned;
declare tmpTxt text;
declare delimLen tinyint unsigned;
declare element text;
declare resultingNum int unsigned;
drop temporary table if exists categoryIds;
create temporary table categoryIds
(
`CategoryId` int
) engine = memory;
set tmpTxt = categories;
set foundPos = instr(tmpTxt, ',');
while foundPos <> 0 do
set element = substring(tmpTxt, 1, foundPos-1);
set tmpTxt = substring(tmpTxt, foundPos+1);
set resultingNum = cast(trim(element) as unsigned);
insert into categoryIds (`CategoryId`) values (resultingNum);
set foundPos = instr(tmpTxt,',');
end while;
if tmpTxt <> '' then
insert into categoryIds (`CategoryId`) values (tmpTxt);
end if;
CASE
WHEN sortOrder = 0 THEN
SET #sortString = "ProductResult_Relevance DESC";
WHEN sortOrder = 1 THEN
SET #sortString = "ProductResult_Price ASC";
WHEN sortOrder = 2 THEN
SET #sortString = "ProductResult_Price DESC";
WHEN sortOrder = 3 THEN
SET #sortString = "ProductResult_StockStatus ASC";
END CASE;
SET #theSelect = CONCAT(CONCAT("
SELECT SQL_CALC_FOUND_ROWS
supplier.SupplierId as Supplier_SupplierId,
supplier.Name as Supplier_Name,
supplier.ImageName as Supplier_ImageName,
product_result.ProductId as ProductResult_ProductId,
product_result.SupplierId as ProductResult_SupplierId,
product_result.Name as ProductResult_Name,
product_result.Description as ProductResult_Description,
product_result.ThumbnailUrl as ProductResult_ThumbnailUrl,
product_result.Price as ProductResult_Price,
product_result.DeliveryPrice as ProductResult_DeliveryPrice,
product_result.StockStatus as ProductResult_StockStatus,
product_result.TrackUrl as ProductResult_TrackUrl,
product_result.LastUpdated as ProductResult_LastUpdated,
MATCH(product_result.Name) AGAINST(?) AS ProductResult_Relevance
FROM
product_latest_state product_result
JOIN
supplier ON product_result.SupplierId = supplier.SupplierId
JOIN
category_product ON product_result.ProductId = category_product.ProductId
WHERE
MATCH(product_result.Name) AGAINST (?)
AND
category_product.CategoryId IN (select CategoryId from categoryIds)
ORDER BY
", #sortString), "
LIMIT ?, ?;
");
set #keywords = keywords;
set #startOffset = startOffset;
set #itemsToReturn = itemsToReturn;
PREPARE TheSelect FROM #theSelect;
EXECUTE TheSelect USING #keywords, #keywords, #startOffset, #itemsToReturn;
SET #resultsFound = FOUND_ROWS();
SELECT #resultsFound as 'TotalResults';
IF #resultsFound > 0 THEN
INSERT INTO product_search_query (QueryText, CategoryId) VALUES (keywords, topLevelCategoryId);
END IF;
END $$
DELIMITER ;
Any help is very very much appreciated!

There is little you can do with this query.
Try this:
Create a PRIMARY KEY on categoryIds (categoryId)
Make sure that supplier (supplied_id) is a PRIMARY KEY
Make sure that category_product (ProductID, CategoryID) (in this order) is a PRIMARY KEY, or you have an index with ProductID leading.
Update:
If it's INSERT that causes the problem and product_search_query in a MyISAM table the issue can be with MyISAM locking.
MyISAM locks the whole table if it decides to insert a row into a free block in the middle of the table which can cause the timeouts.
Try using INSERT DELAYED instead:
IF #resultsFound > 0 THEN
INSERT DELAYED INTO product_search_query (QueryText, CategoryId) VALUES (keywords, topLevelCategoryId);
END IF;
This will put the records into the insertion queue and return immediately. The record will be added later asynchronously.
Note that you may lose information if the server dies after the command is issued but before the records are actually inserted.
Update:
Since your table is InnoDB, it may be an issue with table locking. INSERT DELAYED is not supported on InnoDB.
Depending on the nature of the query, DML queries on InnoDB table may place gap locks which will lock the inserts.
For instance:
CREATE TABLE t_lock (id INT NOT NULL PRIMARY KEY, val INT NOT NULL) ENGINE=InnoDB;
INSERT
INTO t_lock
VALUES
(1, 1),
(2, 2);
This query performs ref scans and places the locks on individual records:
-- Session 1
START TRANSACTION;
UPDATE t_lock
SET val = 3
WHERE id IN (1, 2)
-- Session 2
START TRANSACTION;
INSERT
INTO t_lock
VALUES (3, 3)
-- Success
This query, while doing the same, performs a range scan and places a gap lock after key value 2, which will not let insert key value 3:
-- Session 1
START TRANSACTION;
UPDATE t_lock
SET val = 3
WHERE id BETWEEN 1 AND 2
-- Session 2
START TRANSACTION;
INSERT
INTO t_lock
VALUES (3, 3)
-- Locks

Try wrapping your EXECUTE with the following:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
EXECUTE TheSelect USING #keywords, #keywords, #startOffset, #itemsToReturn;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ ;
I do something similiar in TSQL for all report stored proc and searches where repeatable reads aren't important to reduce locking/blocking issues with other processes running on the database.

Turn on slow queries, that will give you an idea of what is taking so long to execute that there is a timeout.
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Pick the slowest query and optimise that. then run for a while and repeat.
There is some excellent information and tools here http://hackmysql.com/nontech
DC
UPDATE:
Either you have a network problem causing the timeout, if you are using a local mysql instance then that is unlikely, OR something is locking a table for far too long causing a timeout. the process that is locking the table or tables for far too long will be listed in the slow log as a slow query. you can also get the slow log query to display any queries that fail to use an index resulting in an inefficient query.
If you can get the problem to occur while you are present then you can also use a tool like phpmyadmin or the commandline to run "SHOW PROCESSLIST\G" this will give you a list of what queries are running while the problem is occurring.
You think the problem is in your insert statement, therefore something is locking that table. therefore you need to find what is locking that table, therefore you need to find what is running so slow its locking the table for far too long. Slow queries is one way to do that.
Other things to look at
CPU - is it idle or running at full pelt
IO - is io causing holdups
RAM - are you swapping all the time (will cause excessive io)
Does the table product_search_query use an index?
What is the primary key?
If your index uses strings that are too long? you may build a huge index file that causes very slow inserts (slow query log will also show that)
And yes the problem may be elsewhere, but you must start somewhere mustn't you.
DC

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart