Master and Slave Databases

On an e-commerce site, an item has only one unit left in stock. If the item is purchased, the master db (used for writes) will be updated with quantity 0. Since master-slave sync involves some time lag, there is a possibility that the slave db (used for reads) has not been updated yet when a second customer requests information on the same product. He/she might still see stale data, i.e. one unit available. A couple of questions:
1) Is it a good idea to let all customer-related queries hit the master database (for up-to-date results)?
2) If not, and master-slave sync will definitely involve some time lag, how can the customer experience be improved in this case?

Let's start with a simple case: only a single master database.
We both search for a product, and we both see the last item. You buy it. I think about it for 10 seconds longer than you do, then click buy. Too late! I get an error saying "Oops, that item is no longer in stock."
In master/slave it's the exact same experience; the window for it happening is just slightly bigger. So really there's nothing you can do to prevent the issue, and you don't want all customers sending read queries to the master. What you want is to do writes through the master, and if a write fails because of out-of-date state (oops, can't buy), apologize to the customer.
In the UI for displaying items you can warn users when there's only 1 item left; that will make them expect this error and be less upset when it happens.
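For illustration, here is a minimal sketch of that "write through the master, apologize on failure" flow in Rails terms, assuming a hypothetical Product model with a quantity column; the conditional UPDATE makes the decrement atomic, so at most one buyer gets the last unit:

def purchase(product_id)
  # Single conditional UPDATE against the master; update_all returns the
  # number of rows it actually changed.
  rows = Product.where("id = ? AND quantity > 0", product_id)
                .update_all("quantity = quantity - 1")
  if rows == 1
    :ok             # we got the item; go on to create the order
  else
    :out_of_stock   # lost the race: "Oops, that item is no longer in stock."
  end
end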

Related

Making sure that an item can only be bought by 1 person when 4000 people are trying to buy it within a second

I run a marketplace iOS app, and from time to time we have "competitions", where we have an especially sought-after item for sale at a good price that drops at a specific time. Sometimes thousands of people will try to buy this item within 1-2 seconds, and I therefore need to make sure that only 1 person will get the item. The solution I have for it now feels kind of clumsy, so I was wondering what a good solution would look like when I use Firebase as my database.
The process is as such:
User finds the item on his iOS app and clicks "Purchase".
A request is sent to our API (built on RoR) that processes the purchase (it usually takes 10-20 seconds for the purchase to go through).
Right now, I temporarily set the buyer's ID as an attribute on the item, wait a second, and check whether the buyer ID on the item is still the same. It works, but it doesn't feel optimal.
Any suggestions on how I can make sure 2 people can't purchase the same item?
To avoid something like this in your Rails app, the keywords mutex and race condition should help you find a bunch of appropriate gems.
I personally like to use redis for this kind of task, because in redis, transactions are atomic by default (https://en.wikipedia.org/wiki/Atomicity_(database_systems)).
So maybe this gem could suit your needs (untested): https://github.com/kenn/redis-mutex.
For the theory, refer to these articles:
https://en.wikipedia.org/wiki/Mutual_exclusion
https://en.wikipedia.org/wiki/Race_condition
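As a concrete illustration of the same idea without the gem, here is a minimal sketch of a Redis-based lock using the plain redis-rb client; the lock key format and the process_purchase/reject_purchase helpers are hypothetical names, not part of any library:

require "redis"

def attempt_purchase(redis, item_id, user_id)
  lock_key = "purchase-lock:#{item_id}"
  # NX: only set the key if it is absent; EX: auto-expire the lock so a
  # crashed worker cannot hold it forever.
  if redis.set(lock_key, "buyer-#{user_id}", nx: true, ex: 30)
    process_purchase(item_id, user_id)  # hypothetical: this request won the race
    redis.del(lock_key)                 # release the lock once done
  else
    reject_purchase(item_id, user_id)   # hypothetical: someone else got there first
  end
end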
Store in /items/foo
a record with the structure:
{id: <blah>, available: <timestamp>, (purchaser: null)}
and let buyers write their user id to buy:
/items/foo/purchaser
You want 3 things to happen:
1. Block anyone writing before the server timestamp in available.
2. Only allow 1 person to do the operation: once /items/foo/purchaser is set, you don't want it modifiable (i.e. write once).
3. Only allow the authenticated user's id to be used in the purchaser field.
To enforce this logic, you use security rules on the subpath "/items/$itemid/purchaser":
".write": "now > data.parent().child('available').val()   // 1.
           && data.val() == null                           // 2.
           && newData.val() == auth.uid"                   // 3.
My guess is that you should use locks.
On a request coming in, check if you can acquire a lock. If yes, the user is the first one; subsequent requests won't be able to acquire the lock, which means the product has already been purchased.
Take a look at this part of the Redis docs: http://redis.io/topics/distlock
At the application (RoR) level, you can set a flag (e.g. lock_foo = true) that is shared across the cluster (it can live in your cache store).
If this value is true, don't allow any other user to access the product or make the purchase.
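A rough sketch of that shared flag using Rails.cache, assuming a cluster-wide store such as memcached or Redis; the item_id/user_id names and the purchase helpers are hypothetical, and unless_exist makes the write behave as add-if-absent on stores that support it:

def try_to_buy(item_id, user_id)
  # write returns a falsy value when the key already exists, i.e. someone else
  # holds the flag; expires_in keeps a crashed request from wedging the item.
  if Rails.cache.write("lock_#{item_id}", user_id, unless_exist: true, expires_in: 1.minute)
    process_purchase(item_id, user_id)   # hypothetical purchase routine
  else
    deny_purchase(item_id, user_id)      # item is already being bought
  end
end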
You can definitely implement this with Firebase. As dvxam and Anshul Mengi mentioned, a lock system is a good way to go:
You could have on the document a property called lock:
{
  "lock": {
    "userId": "myUserId",
    "expiresAt": "myTimestamp"
  }
}
When a user clicks on the purchase button, you can use a Firebase transaction to make sure only one user can get the lock and that the first one gets it.
When another user clicks the purchase button, if a non-expired lock is present with a different userId, you can deny the purchase.
When the user completes the purchase you can then use another transaction to check if it is the same userId and if the lock is not expired.
Transactions are absolutely necessary here, and they are not available in the Firebase REST API (hence not in the Ruby wrapper either), so you would need to run this code client-side using the iOS SDK, or spin up a Node.js server for this task.
Hope it helps.
How about this for something different:
When the user clicks purchase, immediately create a purchase request record that contains the product, user, and timestamp, and then poll every few seconds to see if the purchase was successful
Run a background job that searches for un-purchased products that have at least one purchase request against them, and marks the product as purchased (selecting one purchase request / user as the "winner"), as sketched below
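A rough sketch of that background job, assuming hypothetical Product and PurchaseRequest ActiveRecord models; it relies on the job running serially (a single worker), so no two requests can win the same product:

class SettlePurchasesJob
  def perform
    Product.where(purchased: false).find_each do |product|
      # Pick the earliest purchase request as the winner, if any exist yet.
      winner = product.purchase_requests.order(:created_at).first
      next unless winner
      product.update!(purchased: true, buyer_id: winner.user_id)
    end
  end
end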
I'm not sure if there's a specific pattern I can apply in this case or how this is "normally" solved?
I can't speak to Firebase, but I can definitely speak to how this is "normally" solved in Rails and relational databases.
Before jumping into code, note that it seems like you need linearizability, one of the hardest things to ask of a database, and some databases can't guarantee it even when they say they do. You might be able to hack around needing linearizability if all you need to know is whether it's been purchased or not, but I wouldn't take that hack lightly. Consistency in distributed systems is a really complex and edge-case-ridden topic, especially while under load (which it sounds like you'll be).
In Rails + a relational database (Postgres, MySQL, SQLite) an atomic, linearized quantity update looks roughly like this (with some Rails validation niceties thrown in):
class Product < ActiveRecord::Base
  validates :quantity, numericality: { greater_than: 0 }, on: :purchase

  def purchase
    with_lock do # simultaneously acquires a row lock and reloads the model
      return false if !valid?(:purchase) # immediately release the lock if not valid
      update_attribute(:quantity, quantity - 1) # saves without validation; YMMV
    end
  end
end
This general pattern of "lock+reload -> check -> update" is the gold standard for reliability, but it's "heavy." The first object to acquire the lock will win, but while it's doing its thing, all the other processes asking for a lock will be queued. Somewhere there's a timeout and a max connection pool defined, so if, say, 4000 locks are asked for within 1 second but it takes 10 seconds to determine success, you'll need 4000 connections and, even worse, the last lock asked for will be waiting for over 11 hours (4000 × 10 s ≈ 11.1 hours)! That will make managing the connection pools and setting reasonable timeouts challenging.
The benefits, though, are that it will "just work": if the first purchase fails, the next purchase will acquire a lock, and so on, until someone wins. Then it will return helpful ActiveModel errors to everyone else in the queue. Additionally, it's simple enough code-wise that you know that, as long as your database provides linearizability, you're in the clear.
To mitigate the 11-hour issue, hopefully you can deny everyone with outstanding locks very quickly and flush the queue.
I don't know exactly what you're doing while you try to make a purchase, but if it were just a credit card validation and a data update, I'd highly recommend the approach I've outlined, with a database known to be linearizably consistent. Otherwise, you're going to need to consult a true distributed-systems expert, or you'll throw your users under the bus while figuring this out.

Is it possible to write a TFS query to get the actual time taken for a task?

I have been using TFS to track my backlog items, and I am now trying to write a query to see how long I took on particular tasks in the last 7 days. So far I have this query:
Work Item Type = Task
AND State = Done
AND Closed Date >= @Today - 7
AND Area Path = @Project
AND Assigned To = @Me
and have added the column "Closed Date", which shows the time work stopped on the item. But I cannot get any information as to when work started on the task.
It feels like the data should be there, as particular tasks do show this kind of information in their history (screenshot omitted).
Is this possible? I don't mind extracting the data to Excel to analyse it.
The reason I ask is that I would then like to go on to compare the number of hours assigned to a particular task with the number of hours actually taken, to help my predictions of time taken in future.
It's a pretty simple query actually.
Here's what I'm using.
(Work Item Type = User Story
OR Work Item Type = Bug
)
AND State <> New
AND State <> Removed
Simply include the Activated Date, the Resolved Date, and the Closed Date in the displayed columns. The time between Activated and Closed is your total cycle time.
My team uses Resolved as a "pending deployment" status, so comparing Resolved to Closed allows us to determine how long it takes to get an item from "done" to "in prod".
TFS is not intended to be a time-tracking tool. You could query the work item history with the TFS API and check the timestamps of the state transitions if you really wanted to.
Agile projects don't focus on how long individual tasks take -- they focus on how much value the development team is providing over the course of a set period of time. One thing might be estimated low, one task might be estimated high, but it ultimately doesn't matter as long as the team delivers what they committed to deliver.
It's good practice to track Cycle Time for a team to discover areas for improvement (at the User Story level, yes).
It's not great that the user must go to Excel to calculate Cycle Time: it can't be charted with the Azure DevOps tools, and it isn't automated.
The Cycle Time widget doesn't show results for several teams and isn't a flexible tool, so it's not enough.

In Rails, will database locking ensure only one process is checking an attribute value at a time without being a giant bottleneck?

Our setup is Rails 3 with 6 app servers behind a load balancer and one PostgreSQL database.
In our app, a user can "tip" an artist during a performance.
The process flow looks like this:
User clicks on "tip" button
Tip object is created
An after_create callback makes sure the user's account has enough money; if so, a financial transaction moves the money. Otherwise, a Rollback exception is raised.
What can happen is that if the user "spams" the tip button, multiple tips can be in process at once. When this occurs, the "does this user have enough money?" check returns the same value for many tips, since the financial transactions have not happened yet.
What I need is to make sure each "tip" gets processed sequentially. That way, the balance check for tip #2 doesn't happen before tip #1 updates the balance.
We're already using Resque for other stuff, so that might be one solution, although I don't know of a way to make sure multiple workers don't start processing jobs in parallel and cause the same issue. Having one worker do tip jobs would not be a viable solution, as our app processes a lot of tips at any given instant.
If you enforce this within database transactions it is a fairly simple problem to solve.
http://www.postgresql.org/docs/9.1/interactive/mvcc.html
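For instance, here is a minimal sketch of serializing tips with a row-level lock inside a transaction, assuming hypothetical Account and Tip models; lock! reloads the row with SELECT ... FOR UPDATE, so concurrent tips from the same user queue up on the account row:

def tip!(user, artist, amount)
  ActiveRecord::Base.transaction do
    account = user.account.lock! # SELECT ... FOR UPDATE: later tips block here
    raise ActiveRecord::Rollback if account.balance < amount
    account.update_attributes!(balance: account.balance - amount)
    Tip.create!(user: user, artist: artist, amount: amount)
  end
end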

View counter in ASP.NET MVC

I'm going to create a view counter for articles. I have some questions:
1. Should I ignore the article's author when he opens the article?
2. I don't want to update the database each time. I can store in a Dictionary<int, int> (articleId, viewCount) how many times each article was viewed, and update the database after 100 hits.
3. I should only count the hit once per hour for each user and article (if the user opens one article many times during one hour, the view count should be incremented only once).
For each question I want to know your suggestions on how to do it right.
I'm especially interested in how to do #3. Should I store the time when the user opened the article in a cookie? Does that mean I should create a new cookie for each page?
I think I know the answer - they are analyzing the IIS log, as Ope suggested.
The hidden image src is set to
http://stackoverflow.com/posts/3590653/ivc/[Random code]
[Random code] is needed because many people may share the same IP (on a network, for example), and the code is used to distinguish users.
1. Sure - I think that is a good idea.
2. and 3. are related: the issue is where you would actually store this dictionary and logic.
An ASP.NET application or session scope is of course the easiest choice, but then you really need to understand the logic of application pools. ASP.NET applications are recycled from time to time: when there is no action on the site for a certain period, or in special situations, e.g. if the process starts to take too much memory, the application is shut down and a new one is started on the next request. There are events for session and application shut-down, but at least some years ago they were not really reliable: in many special cases they did not always fire. Perhaps they are better now, but it is painful to test. And 1 hour is really a long time: usually sessions are kept alive only about 20 minutes after the last request.
A reliable way would be to have a separate Windows service (a lot of work to program) or to always store to the database with double-view analysis (quite a lot of overhead for such a small feature).
Do you have access to the IIS logs? How about analyzing the IIS logs, e.g. every 30 minutes, with some kind of timer process and taking the count from there? Or just store all the hits to the database with user information and calculate the unique hits with a similar timed process.
One final question: are you really sure none of the thousands of counter applications/services on the Internet would do the job close enough to your requirements?
Good luck!
This is a screenshot of this page in Firebug (screenshot omitted). You can see that there is a request which returns a 204 status code (No Content).
This is Stack Overflow's view counter: they are using a hidden image which points to a controller action.
I have many articles. How do I track which articles the user has already visited?
P.S. BTW, why is this request made two times?

Tracking impressions/visits per web page

I have a site with several pages for each company, and I want to show how each page is performing in terms of the number of people coming to the profile.
We have already made sure that bots are excluded.
Currently, we record each hit in the DB with either an insert (for the first request of the day to a profile) or an update (for the following requests of the day to that profile). But, given that requests have grown from a few thousand per day to tens of thousands per day, these inserts/updates are causing major performance issues.
Assuming no JS solution, what will be the best way to handle this?
I am using Ruby on Rails, MySQL, Memcache, Apache, HaProxy for running overall show.
Any help will be much appreciated.
Thx
http://www.scribd.com/doc/49575/Scaling-Rails-Presentation-From-Scribd-Launch
You should start reading from slide 17.
I think performance isn't a problem if it's possible to build a solution like this for a website as big as Scribd.
Here are 4 ways to address this, from easy estimates to complex and accurate:
1. Track only a percentage (10% or 1%) of users, then multiply to get an estimate of the count.
2. After the first 50 counts for a given page, start updating the count only 1/13th of the time, by a count of 13. This helps if it's a few pages doing most of the counting, while keeping small counts accurate. (Use 13 because it's hard to notice that the increment isn't 1.)
3. Save exact counts in a cache layer like memcache or local server memory, and save them all to disk when they hit 10 counts or have been in the cache for a certain amount of time (see the sketch after this list).
4. Build a separate counting layer that 1) always has the current count available in memory, 2) persists the count to its own tables/database, and 3) has calls that adjust both places.
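A rough sketch of option 3 in Rails, assuming a memcached-backed Rails.cache and a hypothetical Profile model with a views counter column; raw values are used so increment works, and slight undercounting is tolerated if two processes flush at the same moment:

FLUSH_THRESHOLD = 10

def record_hit(profile_id)
  key = "hits:#{profile_id}"
  # increment returns nil when the key doesn't exist yet, so initialize lazily.
  count = Rails.cache.increment(key) || begin
    Rails.cache.write(key, 1, raw: true)
    1
  end
  if count >= FLUSH_THRESHOLD
    Rails.cache.write(key, 0, raw: true) # reset the buffer, then persist the batch
    Profile.where(id: profile_id).update_all(["views = views + ?", count])
  end
end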

Resources