It seems the "withdraw" program is the classic example; it's used in SICP and Design Concepts in Programming Languages to explain "shared state".
I want to know: in the "actor model", is there some method to avoid "shared state"? I can't find a good example written in Erlang/Elixir to show it.
There is a withdraw example in Programming Erlang, 2nd edition, chapter 22, but it seems to show how to write an OTP application, not how to deal with "shared state": it uses an ETS database to save the balance, so the ETS table is the "shared state", and it uses only one process, not two, to withdraw and deposit.
So is there a good example of "withdraw" that shows how Erlang/Elixir deals with the "shared state" problem? I think it has to encode the balance in the message and pass the balance everywhere, to avoid sharing it in a fixed place. Maybe Haskell's MVar would resolve it.
An actor, or an Erlang/Elixir process, is in effect a single thread. If you're in a GenServer's handle_call function you are guaranteed to not receive another message or invoke another handle_call until this particular message handler is complete. All messages sent to a process are received in some order and handled one at a time; there is no concurrency within a process and so no opportunity for state to be concurrently modified.
A minimal Elixir setup might look like
defmodule Account do
  use GenServer

  ## Client API

  def start_link(balance) do
    GenServer.start_link(__MODULE__, balance)
  end

  def deposit(account, amount) do
    GenServer.call(account, {:deposit, amount})
  end

  def withdraw(account, amount) do
    GenServer.call(account, {:withdraw, amount})
  end

  ## Server callbacks

  @impl true
  def init(balance) do
    {:ok, balance}
  end

  @impl true
  def handle_call({:deposit, amount}, _from, balance) do
    new_balance = balance + amount
    {:reply, :ok, new_balance}
  end

  @impl true
  def handle_call({:withdraw, amount}, _from, balance) do
    if amount > balance do
      {:reply, {:error, :insufficient_balance}, balance}
    else
      new_balance = balance - amount
      {:reply, :ok, new_balance}
    end
  end
end
In a classical multi-threaded environment with mutable state, there is an opportunity for one thread to calculate a new_balance while another thread overwrites the existing balance, and changes can get lost. (You cite Structure and Interpretation of Computer Programs, and it has an entire subsection describing the issues here.) But since the actor is single-threaded, even if multiple other processes call Account.withdraw/2 on the same account, you're guaranteed to get consistent behavior.
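For contrast, here is a deliberately racy sketch of that lost-update problem (written in Ruby purely for illustration; any language with threads and shared mutable state behaves the same way):

balance = 1000

threads = 5.times.map do
  Thread.new do
    current = balance        # read the shared state
    sleep 0.01               # widen the race window
    balance = current - 100  # write back, possibly clobbering another thread's update
  end
end
threads.each(&:join)

# Expected 500, but some withdrawals are usually lost:
puts balance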
Just to add to what David Maze explained: a process sends a message to an OTP genserver by calling the function:
gen_server:call(GenserverModuleName, Message)
When processA calls that function, a message is sent to the genserver process; for example, in Chapter 22 the message might be a withdrawal: {remove, "account0001", 200}. When processB calls that function, another message is sent to the genserver process, e.g. another withdrawal: {remove, "account0001", 1000}. The genserver process, like all erlang processes, has a mailbox that accumulates messages from all the processes that send it messages.
The genserver then searches through the mailbox for messages that it knows how to handle, e.g. messages that match the parameters specified in the various clauses of the handle_call() function definition. However, a genserver only works on one message at a time, so there can be no race condition, i.e. a situation where two processes try to change the same piece of data, such as the account balance, at the same time. The genserver will handle one withdrawal message, and if the account has a big enough balance, the withdrawal is allowed and the balance is updated in the ets table. Then the genserver will handle the next withdrawal message, and if the new balance is sufficiently large, the second withdrawal is allowed and the balance is updated in the ets table. In other words, the genserver does not spin off two processes to handle the two withdrawal messages concurrently; rather, it handles the two withdrawal messages sequentially.
it uses an ETS database to save the balance, so the ETS table is the "shared state"
The genserver is the only process that knows about the ets table, and the genserver only accesses the ets table sequentially.
I think it has to encode the balance in the message and pass the balance everywhere, to avoid sharing it in a fixed place.
No, the balance can remain in the ets table for the reasons stated above.
To allow users to create balance withdrawal requests, I have a WithdrawalsController#create action. The code checks whether the user has a sufficient balance before creating the withdrawal.
def create
  if amount > current_user.available_balance
    error! :bad_request, :metadata => { code: "002", description: "insufficient balance" }
    return
  end

  withdrawal = current_user.withdrawals.create(amount: amount, billing_info: current_user.billing_info)
  exposes withdrawal
end
This can pose a serious issue in a multi-threaded server. When two create requests arrive simultaneously and both pass the balance check before either withdrawal is created, then both withdrawals can be created even if their sum exceeds the original balance.
Having a Mutex in a class variable will not be a good solution, because it would lock this action across all users, whereas a per-user lock is desired.
What is the best solution for this?
The following diagram illustrates my suspected threading issue; could it be occurring in Rails?
As far as I can tell your code is safe here; multithreading is not much of a problem. Even with more app instances generated by your app server, each instance will end up testing amount > current_user.available_balance.
If you are really paranoid about it, you could wrap it all in a transaction:
ActiveRecord::Base.transaction do
  withdrawal = current_user.withdrawals.create!(
    amount: amount,
    billing_info: current_user.billing_info
  )
  # If, after saving, the `available_balance` goes under 0,
  # the whole transaction will be rolled back and any
  # change made to the database will be undone.
  raise ActiveRecord::Rollback if current_user.available_balance < 0
end
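If you do want the per-user lock mentioned in the question, ActiveRecord's pessimistic locking scopes the lock to a single user's row (with_lock is a real ActiveRecord method; the surrounding code is a rough sketch reusing the question's error!/exposes helpers):

current_user.with_lock do
  # with_lock reloads current_user and issues SELECT ... FOR UPDATE on its row;
  # concurrent requests for the same user block here until this transaction commits.
  if amount > current_user.available_balance
    error! :bad_request, :metadata => { code: "002", description: "insufficient balance" }
  else
    withdrawal = current_user.withdrawals.create!(amount: amount, billing_info: current_user.billing_info)
    exposes withdrawal
  end
end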
I've got a couple of microservices (implemented in Ruby, although I doubt that is important for my question). One of them provides items, and the other one processes them and then marks them as processed (via a DELETE call).
The provider has an /items endpoint which lists a bunch of items identified with an id, in JSON format. It also has a DELETE /items/id endpoint which removes one item from the list (presumably because it is processed)
The code (very simplified) in the "processor" looks like this:
items = <GET provider/items>
items.each do |item|
  process item
  <DELETE provider/items/#{item.id}>
end
This has several problems, but the one I would like to solve is that it is not thread-safe, and thus I can't run it in parallel. If two workers start processing items simultaneously, they will step on each other's toes: they will get the same list of items, and then (try to) process and delete each item twice.
What is the simplest way I can change this setup to allow for parallel processing?
You can assume that I have ruby available. I would prefer keeping changes to a minimum, and would rather not install other gems if possible. Sidekiq is available as a queuing system on the consumer.
Some alternatives (just brainstorming):
1. Just drop HTTP and use pub-sub with a queue. Have the producer queue items and a number of consumers process them (and trigger state changes, in this case with HTTP if you fancy it).
2. If you really want to use HTTP, I think there are a couple of missing pieces. If your items' states are pending and processed, there's a hidden/implicit state in your state machine: in_progress (or whatever). Once you think of it, the picture becomes clearer: your GET /items is not idempotent (because it changes the state of items from pending to in progress) and hence should not be a GET in the first place.
a. An alternative could be adding a new entity (e.g. batch) that gets created via POST, groups some items under it, and sends them. Items already returned won't be part of future batches, and then you can mark whole batches as done (e.g. PUT /batches/X/done). This gets crazy very fast, as you will start reimplementing features (acks, timeouts, errors) already present in both queueing systems and plain/explicit HTTP (see c).
b. A slightly simpler alternative: just turn /items into a POST/PUT endpoint (weird in both cases) that marks items as being processed (and doesn't return them anymore, because it only returns pending items). The same issues with errors and timeouts apply, though.
c. Have the producer be explicit and request the processing of an item from the other service via PUT. You can either include all the needed data in the body, or use it as a ping and have the processor request the info via GET. You can add asynchronous processing on either side (but probably better in the processor).
I would honestly do 1 (unless there's a compelling reason not to).
It seems to me that the issue with parallelizing this implementation is that you are assuming each thread will call:
<GET provider/items>
One solution would be to get all the items first then do the async processing.
My Ruby is non-existent but it might look something like this:
class HardWorker
  include Sidekiq::Worker

  def perform(item)
    # Note: Sidekiq serializes job arguments to JSON, so in practice you'd
    # pass item.id (or a plain hash) rather than a rich object.
    process item
    <DELETE provider/items/#{item.id}>
  end
end

items = <GET provider/items>
items.each do |item|
  HardWorker.perform_async(item)
end
This way your "producer" is the loop and the consumer is the async HardWorker.
What is the simplest way I can change this setup to allow for parallel processing?
If you can upgrade the code on the server, or add middle-man code, then the simplest way is a queue.
If you prefer just client-side, with no middle-man and no client-to-client talk, and some occasional redundancy is ok, then here are some ideas.
Reduce collisions by using shuffle
If it's ok for your server to receive a DELETE for a non-existent object
And the "process item" cost+time is relatively small
And the process is order-independent
Then you could shuffle the items to reduce collisions:
items.shuffle.each do |item|
  process item
  <DELETE provider/items/#{item.id}>
end
Check that the item exists by using HEAD
If your server has the HEAD method
And has a way to look up one item
And the HTTP connection is cheap+fast compared to "process item"
Then you could skip the item if it doesn't exist:
items.each do |item|
  next if !<HEAD provider/items/#{item.id}>
  process item
  <DELETE provider/items/#{item.id}>
end
Refresh the items by using a polling loop
If the items are akin to you polling an ongoing pool of work
And are order independent
And the GET request is idempotent, i.e. it's ok to request all the items more than once
And the DELETE request returns a result that informs you the item did not exist
Then you could process items until you hit a redundancy, then refresh the items list:
loop do
  items = <GET provider/items>
  if items.blank?
    sleep 1
    next
  end
  items.each do |item|
    process item
    <DELETE provider/items/#{item.id}>
    break if DELETE returns a code that indicates "already deleted"
  end
end
All of the above combined using a polling loop, shuffle, and HEAD check.
This is surprisingly efficient, given no queue, nor middle-man, nor client-to-client talk.
There's still a rare redundant "process item" that can happen when multiple clients check that an item exists and then start processing it at the same time; in practice the probability is near zero, especially when there are many items.
loop do
  items = <GET provider/items>
  if items.blank?
    sleep 1
    next
  end
  items.shuffle.each do |item|
    break if !<HEAD provider/items/#{item.id}>
    process item
    <DELETE provider/items/#{item.id}>
    break if DELETE returns a code that indicates "already deleted"
  end
end
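For what it's worth, here is a runnable approximation of that combined loop using Ruby's standard Net::HTTP. The provider URL is hypothetical, process is a stand-in for your own logic, and the provider is assumed to return a JSON array of objects with an "id" field:

require "net/http"
require "json"
require "uri"

BASE = "http://provider.example/items" # hypothetical provider URL

def fetch_items
  res = Net::HTTP.get_response(URI(BASE))
  res.is_a?(Net::HTTPSuccess) ? JSON.parse(res.body) : []
end

def exists?(id)
  uri = URI("#{BASE}/#{id}")
  Net::HTTP.start(uri.host, uri.port) { |http| http.head(uri.path).is_a?(Net::HTTPSuccess) }
end

def delete_item(id)
  uri = URI("#{BASE}/#{id}")
  Net::HTTP.start(uri.host, uri.port) { |http| http.delete(uri.path) }
end

loop do
  items = fetch_items
  if items.empty?
    sleep 1
    next
  end
  items.shuffle.each do |item|
    break unless exists?(item["id"])      # another worker took it; refresh the list
    process(item)                         # stand-in for your own processing
    res = delete_item(item["id"])
    break if res.is_a?(Net::HTTPNotFound) # already deleted elsewhere; refresh
  end
end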
Ryan Bates mentions the LISTEN/NOTIFY functionality of Postgres when discussing push notifications in this episode, but I haven't been able to find any hint on how to implement a LISTEN/NOTIFY in my rails app.
Here is documentation for a wait_for_notify function inside of the pg adaptor, but I can't figure out what exactly that does/is designed for.
Do we need to tap directly into the connection variable of the pg adaptor?
You're looking in the right place with the wait_for_notify method, but since ActiveRecord apparently doesn't provide an API for using it, you'll need to get at the underlying PG::Connection object (or one of them, if you're running a multithreaded setup) that ActiveRecord is using to talk to Postgres.
Once you've got the connection, simply execute whatever LISTEN statements you need, then pass a block (and an optional timeout period) to wait_for_notify. Note that this will block the current thread and monopolize the Postgres connection until the timeout is reached or a NOTIFY occurs (so you wouldn't want to do this inside a web request, for example). When another process issues a NOTIFY on one of the channels you're listening to, the block will be called with three arguments: the channel that was notified, the PID of the Postgres backend that triggered the NOTIFY, and the payload that accompanied the NOTIFY (if any).
I haven't used ActiveRecord in quite a while, so there may be a cleaner way to do this, but this seems to work alright in 4.0.0.beta1:
# Be sure to check out a connection, so we stay thread-safe.
ActiveRecord::Base.connection_pool.with_connection do |connection|
  # connection is the ActiveRecord::ConnectionAdapters::PostgreSQLAdapter object
  conn = connection.instance_variable_get(:@connection)
  # conn is the underlying PG::Connection object, and exposes #wait_for_notify
  begin
    conn.async_exec "LISTEN channel1"
    conn.async_exec "LISTEN channel2"

    # This will block until a NOTIFY is issued on one of these two channels.
    conn.wait_for_notify do |channel, pid, payload|
      puts "Received a NOTIFY on channel #{channel}"
      puts "from PG backend #{pid}"
      puts "saying #{payload}"
    end

    # Note that you'll need to call wait_for_notify again if you want to pick
    # up further notifications. This time, bail out if we don't get a
    # notification within half a second.
    conn.wait_for_notify(0.5) do |channel, pid, payload|
      puts "Received a second NOTIFY on channel #{channel}"
      puts "from PG backend #{pid}"
      puts "saying #{payload}"
    end
  ensure
    # Don't want the connection to still be listening once we return
    # it to the pool - could result in weird behavior for the next
    # thread to check it out.
    conn.async_exec "UNLISTEN *"
  end
end
For an example of a more general usage, see Sequel's implementation.
Edit to add: Here's another description of what's going on. This may not be the exact implementation behind the scenes, but it seems to describe the behavior well enough.
Postgres keeps a list of notifications for each connection. When you use a connection to execute LISTEN channel_name, you're telling Postgres that any notifications on that channel should be pushed to this connection's list (multiple connections can listen to the same channel, so a single notification can wind up being pushed to many lists). A connection can LISTEN to many channels at the same time, and notifications to any of them will all be pushed to the same list.
What wait_for_notify does is pop the oldest notification off the connection's list and pass its information to the block - or, if the list is empty, sleep until a notification becomes available and do the same for that (or until the timeout is reached, in which case it just returns nil). Since wait_for_notify only handles a single notification, you're going to have to call it repeatedly if you want to handle multiple notifications, as sketched below.
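A minimal long-running listener is therefore just a loop around it (handle_notification is a hypothetical handler for your app):

loop do
  conn.wait_for_notify do |channel, pid, payload|
    handle_notification(channel, payload) # hypothetical handler
  end
end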
When you UNLISTEN channel_name or UNLISTEN *, Postgres will stop pushing those notifications to your connection's list, but the ones that have already been pushed to that list will stay there, and wait_for_notify will still return them when it is next called. This might cause an issue where notifications that accumulated after wait_for_notify but before UNLISTEN stick around and are still present when another thread checks out that connection. In that case, after UNLISTEN you might want to call wait_for_notify with short timeouts until it returns nil, as below. Unless you're making heavy use of LISTEN and NOTIFY for many different purposes, though, it's probably not worth worrying about.
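That drain step can be as simple as the following; without a block, wait_for_notify returns the channel name, or nil once the timeout expires with nothing left queued:

conn.async_exec "UNLISTEN *"
# Pop any notifications that were queued before the UNLISTEN took effect.
loop do
  break if conn.wait_for_notify(0.1).nil?
end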
I added a better link to Sequel's implementation above, I'd recommend looking at it. It's pretty straightforward.
The accepted answer looks good to me. Some promising resources I found while exploring postgres LISTEN/NOTIFY:
https://gist.github.com/chsh/9c9f5702919c83023f83
https://github.com/schneems/hey_you
The source in hey_you is easy to read and looks similar to the other examples
I'm writing a module that has a local table with a list of filtered clients. When one of the clients in the table is killed, all operations on it raise a "client is invalid" exception.
So how can I check if client is killed?
The best approach would be to add a signal for the unmanage event on every client you add to your list.
In the signal function you can then delete the client from your table. It could look something like this:
client.add_signal('unmanage', function(c)
    -- Remove c from your list
end)
Rails has a nice set of filters (before_validation, before_create, after_save, etc) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.
The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).
What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so and checks to see which pages have been hit more than X times, recording that state in a new DB table, then sending out a corresponding email. It's not exactly elegant, but it will work.
Does anyone else have a better idea?
Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin http://blog.leetsoft.com/2008/2/17/delayed-job-dj
DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website, you can create a job using the Delayed::Job.enqueue method, as shown below.
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end

Delayed::Job.enqueue(NewsletterJob.new("blah blah", Customers.find(:all).collect(&:email)))
I was once part of a team that wrote a custom ad server, which has the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. This server was going to be powering an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.
Their opinion was: The fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where every time someone would hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the webserver. Every 5 minutes, we would gather these files from the web servers, and then dump them all in the master database one at a time. Then, at our leisure, we could parse that data to do anything we well pleased with it.
Depending on your exact requirements and deployment setup, you could do something similar. The computational requirement to check if you're past a certain threshold is still probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not), and then periodically gather them, parse them, input them to the database, and do whatever you want with them.
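If you went this route in a Rails app, the periodic gather-and-load step could be a sketch like the following; the log path is hypothetical, and each line of the gathered files is assumed to be one complete INSERT statement as described above:

Dir.glob("/var/log/impressions/*.sql").each do |path|
  # Load the whole file in one transaction, then remove it so it isn't replayed.
  ActiveRecord::Base.transaction do
    File.foreach(path) { |sql| ActiveRecord::Base.connection.execute(sql) }
  end
  File.delete(path)
end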
When saving your Hit model, update a redundant column in your Page model that stores a running total of hits. This costs you two extra queries, so each hit may take twice as long to process, but you can decide whether you need to send the email with a simple if.
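A sketch of that idea, assuming Page has an integer hits_count column and that THRESHOLD and AlertMailer are hypothetical stand-ins:

class Hit < ActiveRecord::Base
  belongs_to :page
  after_create :bump_page_counter

  private

  def bump_page_counter
    # increment_counter issues a single UPDATE ... SET hits_count = hits_count + 1,
    # avoiding a read-modify-write in Ruby.
    Page.increment_counter(:hits_count, page_id)
    AlertMailer.deliver_alert(page) if page.reload.hits_count == THRESHOLD # hypothetical mailer
  end
end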
Your original solution isn't bad either.
class ApplicationController < ActionController::Base
  before_filter :increment_fancy_counter

  private

  def increment_fancy_counter
    # somehow increment the counter here
  end
end

# lib/tasks/fancy_counter.rake
namespace :fancy_counter do
  # Depend on :environment so the Rails models are loaded in the task.
  task :process => :environment do
    # somehow process the counter here
  end
end
Have a cron job run rake fancy_counter:process however often you want it to run.
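The body of that task might look something like this sketch, again assuming a hypothetical Page model with a hits_count column and a notified flag:

namespace :fancy_counter do
  task :process => :environment do
    pages = Page.find(:all, :conditions => ["hits_count >= ? AND notified = ?", THRESHOLD, false])
    pages.each do |page|
      AlertMailer.deliver_alert(page)        # hypothetical mailer
      page.update_attribute(:notified, true) # don't alert twice for the same page
    end
  end
end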