How to make my count thread-safe? - ruby-on-rails

I have a controller whose sole function is to increase a count in a model (i.e. foo_count) and load a view.
Let's assume I have 2 web instances running. If 10 concurrent users were to hit this page/controller at the same time, will my count be 10?
Will there be a race condition of some sort? Since the hits are concurrent, each web request will load its own copy of the Foobar model, with foo_count equal to 0, via FoobarController.
This means each request operates on a copy of Foobar that isn't aware of the changes the other instance is making, which also means the count is unlikely to end up at 10.
What are some ways to resolve this?

You should use built-in record locking to avoid race conditions.
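In Rails you can either push the increment down into a single atomic SQL UPDATE or take a row lock while you modify the record. A minimal sketch, assuming the Foobar model and foo_count column from the question:

class FoobarController < ApplicationController
  def show
    @foobar = Foobar.find(params[:id])
    # Option 1: let the database do the math atomically. This issues
    # "UPDATE foobars SET foo_count = foo_count + 1 WHERE id = ?".
    Foobar.increment_counter(:foo_count, @foobar.id)
    # Option 2: pessimistic locking. with_lock reloads the row with
    # SELECT ... FOR UPDATE, so concurrent transactions wait their turn.
    # @foobar.with_lock { @foobar.increment!(:foo_count) }
  end
end

With either approach, ten concurrent hits produce ten serialized increments, so the count ends up at 10.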

Related

AnyLogic queueing: priority-based

The initial problem is that I have 2 products, one individualized and one standard. After the separate products are produced they are sent out, but the logistics department has only 1 worker. So how do I prioritize the individualized product? The worker should always send out the individualized good before the standard product.
I'm stuck because I have no idea how to set up the queue, whether agent comparison or priority-based, and how does the block know which product is which?
Thanks
Easiest approach:
add a parameter "myPriority" to your product agent type (integer)
when creating an individualized one, set it to 10, else to 1
in your queue block, choose priority-based queueing and use myPriority as the agent priority expression (higher values move to the front). This ensures the queue always moves higher-priority agents forward. Make sure your queue block expects agents of your Product type
Also check the example models and the help :)

Atomically updating the count of a DB field in a multi-threaded environment

Note - This question expands on an answer to another question here.
I'm importing a file into my DB by chunking it up into smaller groups and spawning background jobs to import each chunk (of 100 rows).
I want a way to track progress of how many chunks have been imported so far, so I had planned on each job incrementing a DB field by 1 when it's done so I know how many have processed so far.
This opens up a potential race condition: two parallel jobs could each read the field, increment it by 1 simultaneously, and overwrite each other's update.
What's the best way to avoid this condition and ensure an atomic parallel operation? The linked post above suggests using Redis, which is one good approach. For the purposes of this question I'm curious if there is an alternate way to do it using persistent storage.
I'm using ActiveRecord in Rails with Postgres as my DB.
Thanks!
I suggest NOT incrementing a DB field by 1. Instead, create a DB record for each job, keyed by a job id. There are two benefits:
You can count the number of records to find out how many have been processed, without worrying about parallel operations.
You can also add any necessary logs to each job record and easily debug when any of the jobs fails during the import.
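A minimal sketch of that idea, assuming a made-up ImportChunk model:

# Hypothetical model; columns could be job_id:string, status:string, log:text.
class ImportChunk < ApplicationRecord
end

# Each background job inserts its own row when it finishes; concurrent
# INSERTs never clobber each other the way concurrent UPDATEs can.
ImportChunk.create!(job_id: jid, status: "done") # jid comes from your job framework

# Progress is then just a count.
processed = ImportChunk.where(status: "done").count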
I suggest you use a PostgreSQL sequence.
See CREATE SEQUENCE and Sequence Manipulation.
Especially nextval():
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
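Wiring that up from ActiveRecord might look like this sketch (the sequence name import_progress_seq is made up, and CREATE SEQUENCE IF NOT EXISTS needs Postgres 9.5+):

conn = ActiveRecord::Base.connection
# One-time setup, e.g. in a migration.
conn.execute("CREATE SEQUENCE IF NOT EXISTS import_progress_seq")

# At the end of each background job: claim the next value atomically.
chunks_done = conn.select_value("SELECT nextval('import_progress_seq')").to_i
Rails.logger.info "#{chunks_done} chunks imported so far"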

Count number of Postgres statements

I am trying to count the number of Postgres statements my Ruby on Rails application is performing against our database. I found this entry on Stack Overflow, but it counts transactions. We have several transactions that issue very large numbers of statements, so that doesn't give a good picture. I am hoping the data is available from PG itself, rather than trying to parse a log.
https://dba.stackexchange.com/questions/35940/how-many-queries-per-second-is-my-postgres-executing
I think you are looking for ActiveSupport instrumentation. Part of Rails, this framework is used throughout Rails applications to publish certain events. For example, there's a sql.active_record event type that you can subscribe to in order to count your queries.
counter = 0
ActiveSupport::Notifications.subscribe "sql.active_record" do |*args|
  counter += 1 # Ruby has no ++ operator
end
You could put this in config/initializers/ (to count across the app) or in one of the various before_ hooks of a controller (to count statements for a single request).
(The fine print: I have not actually tested this snippet, but that's how it should work AFAIK.)
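For the per-request variant, a sketch of the controller hook (also untested; the log line is illustrative, and in a multi-threaded server the counter will pick up queries from concurrent requests too):

class ApplicationController < ActionController::Base
  around_action :count_sql_statements

  private

  def count_sql_statements
    count = 0
    subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") { count += 1 }
    yield
    Rails.logger.info "request issued #{count} SQL statements"
  ensure
    ActiveSupport::Notifications.unsubscribe(subscriber)
  end
end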
PostgreSQL provides a few facilities that will help.
The main one is pg_stat_statements, an extension you can install to collect statement statistics. I strongly recommend this extension; it's very useful. It can tell you which statements run most often, which take the longest, etc. You can query it to add up the number of queries for a given database.
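For example, something along these lines gives a running total for the current database (a sketch; the extension must first be preloaded via shared_preload_libraries and installed with CREATE EXTENSION pg_stat_statements):

# Sum the call counts pg_stat_statements has recorded for this database.
total_statements = ActiveRecord::Base.connection.select_value(<<~SQL)
  SELECT sum(calls)
  FROM pg_stat_statements
  WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
SQL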
To get a rate over time you should have a script sample pg_stat_statements regularly, creating a table with the values that changed since last sample.
The pg_stat_database view tracks values including the transaction rate, but it does not track the number of queries.
There are also pg_stat_user_tables, pg_stat_user_indexes, etc, which provide usage statistics for tables and indexes. These track individual index scans, sequential scans, and so on done by a query, but again not the number of queries.

Multiple worker threads working on the same database - how to make it work properly?

I have a database that has a list of rows that need to be operated on. It looks something like this:
id   remaining   delivered   locked
====================================
1        10          24         f
2         6           0         f
3         0          14         f
I am using DataMapper with Ruby, but really I think this is a general programming question that isn't specific to the exact implementation I'm using...
I am creating a bunch of worker threads that do something like this (pseudo-ruby-code):
while true do
  t = any_row_in_database_where_remaining_greater_than_zero_and_unlocked
  t.lock # update database to set locked = true
  t.do_some_stuff
  t.delivered += 1
  t.remaining -= 1
  t.unlock
end
Of course, the problem is that these threads compete with each other, and the whole thing isn't really thread-safe. The first line in the while loop can easily pull out the same row in multiple threads before any of them gets a chance to lock it.
I need to make sure only one thread is working on a given row at a time.
What is the best way to do this?
The key step is when you select an unlocked row from the database and mark it as locked. If you can do that safely then everything else will be fine.
Two ways I know of to make this safe are pessimistic and optimistic locking. Both rely on your database as the ultimate guarantor when it comes to concurrency.
Pessimistic Locking
Pessimistic locking means acquiring a lock upfront when you select the rows you want to work with, so that no one else can read them.
Something like
SELECT * from some_table WHERE ... FOR UPDATE
works with MySQL and Postgres (and possibly others) and will prevent any other connection to the database from reading the rows returned to you (how granular that lock is depends on the engine used, indexes, etc; check your database's documentation). It's called pessimistic because you assume that a concurrency problem will occur and acquire the lock preemptively. It does mean that you bear the cost of locking even when it isn't necessary, and it may reduce your concurrency depending on the granularity of the lock you have.
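In ActiveRecord terms the claim step might look something like this sketch (Task is a hypothetical model standing in for the question's table; SKIP LOCKED needs Postgres 9.5+ or MySQL 8+, drop it on older versions):

task = Task.transaction do
  # Select one unlocked row and take a row lock on it; SKIP LOCKED lets
  # other workers skip past rows that are already claimed.
  t = Task.where("remaining > 0 AND NOT locked")
          .lock("FOR UPDATE SKIP LOCKED")
          .first
  t&.update!(locked: true)
  t
end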
Optimistic Locking
Optimistic locking refers to a technique where you don't want the burden of a pessimistic lock because most of the time there won't be concurrent updates (if you update the row setting the locked flag to true as soon as you have read the row, the window is relatively small). AFAIK this only works when updating one row at a time
First add an integer column lock_version to the table. Whenever you update the table, increment lock_version by 1 alongside the other updates you are making. Assume the current lock_version is 3. When you update, change the update query to
update some_table set ... where id=12345 and lock_version = 3
and check the number of rows updated (the DB driver returns this). If this updates 1 row then you know everything was OK. If it updates 0 rows then either the row you wanted was deleted or its lock_version has changed, so you go back to step 1 in your process and search for a new row to work on.
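As a sketch in ActiveRecord-flavoured Ruby (Task and its columns are again hypothetical):

rows = Task.where(id: 12345, lock_version: 3).update_all(
  "delivered = delivered + 1, remaining = remaining - 1, lock_version = lock_version + 1"
)
# rows == 1: we won the race and our update landed.
# rows == 0: the row was deleted or someone bumped lock_version first,
# so go back and select a fresh row to work on.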
I'm not a DataMapper user, so I don't know whether it or its plugins support these approaches. Active Record supports both, so you can look there for inspiration if DataMapper doesn't.
I would use a Mutex:
# outside your threads
worker_updater = Mutex.new

# inside each thread's updater
while true
  worker_updater.synchronize do
    # your code here
  end
  sleep 0.1 # Slow down there, mister!
end
This guarantees that only one thread at a time can enter the code in the synchronize. For optimal performance, consider what portion of your code needs to be thread-safe (first two lines?) and only wrap that portion in the Mutex.
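Following that advice, a sketch that wraps only the claim step (the helper names are the question's pseudocode, and note that a Mutex only coordinates threads within a single process):

while true
  t = worker_updater.synchronize do
    row = any_row_in_database_where_remaining_greater_than_zero_and_unlocked
    row.lock # update database to set locked = true
    row
  end
  # The slow work happens outside the Mutex, so other threads can claim rows.
  t.do_some_stuff
  t.delivered += 1
  t.remaining -= 1
  t.unlock
  sleep 0.1
end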

Representing (and incrementing) relationship strength in Neo4j

I would like to represent the changing strength of relationships between nodes in a Neo4j graph.
For a static graph, this is easily done by setting a "strength" property on the relationship:
A --knows--> B
       |
   strength
       |
       3
However, for a graph that needs updating over time there is a problem: incrementing the value of the property can't be done atomically via the REST interface, since a read-before-write is required. Incrementing (rather than merely setting) is necessary if the graph is being updated in response to incoming streamed data.
I would need either to ensure that only one REST client reads and writes at once (external synchronization), or to stick to the embedded API so I can use the built-in transactions. This may be workable but seems awkward.
One other solution might be to record multiple relationships, without any properties, so that the "strength" is actually the count of relationships, i.e.
A knows B
A knows B
A knows B
means a relationship of strength 3.
Disadvantage: only integer strengths can be recorded
Advantage: no read-before-write is required
Disadvantage: (probably) more storage required
Disadvantage: (probably) much slower to extract the value since multiple relationships must be extracted and counted
Has anyone tried this approach, and is it likely to run into performance issues, particularly when reading?
Is there a better way to model this?
Nice idea.
To reduce storage and repeated reads, those relationships could be aggregated into one by a batch job that runs transactionally.
Each rel could also carry an individual weight value, whose aggregate is used as the overall weight. It doesn't have to be integer-based and could even be negative to represent decrements.
You could also write a small server extension for updating a weight value on a single relationship transactionally. It would probably even make sense for the REST API (in addition to the "set single value" operation, a "modify single value" operation):
PUT http://localhost:7474/db/data/node/15/properties/mod/foo
The body contains the delta value (e.g. 1.5 or -10). Another idea would be to replace the mod keyword with the actual operation:
PUT http://localhost:7474/db/data/node/15/properties/add/foo
PUT http://localhost:7474/db/data/node/15/properties/or/foo
PUT http://localhost:7474/db/data/node/15/properties/concat/foo
What would "increment" mean in a non integer case?
Hmm, a bit of a different approach, but you could consider using a queueing system. I'm using the Neo4j REST interface as well and am looking into storing a constantly changing relationship strength. The project is in Rails, using Resque. Whenever an update to the Neo4j database is required, it's thrown into a Resque queue to be completed by a worker. I have only one worker working on the Neo4j Resque queue, so it never tries to perform more than one Neo4j update at once.
This has the added benefit of not making the user wait for the Neo4j updates when they perform an action that triggers one. However, it is only a viable solution if you don't need to use/display the Neo4j updates instantly (though depending on the speed of your worker and the size of your queue, it should only take a few seconds).
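A sketch of what such a job might look like (the property URL follows Neo4j's legacy REST layout from the examples above; the job body and queue name are made up):

require "resque"
require "net/http"
require "json"

class UpdateRelationshipStrength
  @queue = :neo4j_updates # run exactly ONE worker on this queue

  def self.perform(rel_id, delta)
    uri = URI("http://localhost:7474/db/data/relationship/#{rel_id}/properties/strength")
    # Read-modify-write is safe here only because the single worker
    # serializes every update on this queue.
    current = Float(Net::HTTP.get(uri)) rescue 0.0
    put = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
    put.body = (current + delta).to_json
    Net::HTTP.start(uri.host, uri.port) { |http| http.request(put) }
  end
end

# Enqueue from anywhere in the app; the worker applies it later.
Resque.enqueue(UpdateRelationshipStrength, 42, 1)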
Depends a bit on what read and write load you are targeting. How big is the total graph going to be?
