Best way to implement Stackoverflow-style reputation - ruby-on-rails

I'm implementing a Stackoverflow-like reputation system on my rap lyrics explanation site, Rap Genius:
Good explanation: +10
Bad explanation: -1
Have the most explanations on a song: +30
My question is how to implement this. Specifically, I'm trying to decide whether I should create a table of reputation_events to aid in reputation re-calculation, or whether I should just recalculate from scratch whenever I need to.
The table of reputation_events would have columns for:
name (e.g., "good_explanation", "bad_explanation")
awarded_to_id
awarded_by_id
awarded_at
Whenever something happens that affects reputation, I insert a corresponding row into reputation_events. This makes it easy to recalculate reputation and to generate a human-readable sequence of events that generated a given person's reputation.
On the other hand, any given action could affect multiple users' reputations. For example, suppose user A overtakes user B on a given song. Under the "Have the most explanations on a song" rule, I would have to remember to delete B's original "has_the_most_explanations" event (or should I add a new event for B instead?).
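For illustration, here is a minimal plain-Ruby sketch of the reputation_events idea described above. The Struct stands in for a table row, and the POINTS lookup and the two helper methods are hypothetical names, not part of any existing code; the point values are the ones listed in the question.

```ruby
# Hypothetical point values, taken from the rules listed in the question.
POINTS = {
  "good_explanation" => 10,
  "bad_explanation" => -1,
  "has_the_most_explanations" => 30
}.freeze

# Stand-in for a row of the proposed reputation_events table.
ReputationEvent = Struct.new(:name, :awarded_to_id, :awarded_by_id, :awarded_at)

# Recalculate one user's reputation by summing the points of their events.
def reputation_for(user_id, events)
  events
    .select { |event| event.awarded_to_id == user_id }
    .sum { |event| POINTS.fetch(event.name) }
end

# Human-readable sequence of events behind a user's reputation, oldest first.
def history_for(user_id, events)
  events
    .select { |event| event.awarded_to_id == user_id }
    .sort_by(&:awarded_at)
    .map { |event| format("%+d %s", POINTS.fetch(event.name), event.name) }
end
```

Because the point values live in one lookup rather than in each row, changing a rule only requires re-running the sum. The overtaking case would still need a decision about whether B's "has_the_most_explanations" row is deleted or offset by a new row.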

In general, I never like data to exist in more than one place. It sounds like your "reputation_events" table would contain data that can be calculated from other data. If so, I'd recalculate from scratch, unless the performance impact becomes a real problem.
When you have calculated data stored, you have the possibility that it may not correspond correctly with the base data -- basically a corrupted state. Why even make it possible if you can avoid it?

I would do a reputation event list for the purpose of recalculation and being able to track down why the total rep value is what it is.
But why have a "name" column, why not just have a value with either a positive or negative int?
This table will get huge, so make sure you cache.

Related

Design Pattern for Modeling Actuals that replace Estimates

What, if anything, is a good practice or approach for a use case where a given business activity uses estimates that are then replaced by actuals as they become available? In the same way that effective dates can be used to "automatically" (without users having to know about it) retrieve historically accurate dimension rows, is there a similar way to have actuals "automatically" replace the estimates without overwriting the data? I'd rather not have separate fact tables or columns that require users to "know" about this and manually switch to get the latest actuals.
Why not have 2 measures in your fact table, one for estimate and one for actual?
You could then have a View over the fact table with a single measure calculated as "if actual = 0 then estimate else actual".
Users who just need the current position can use the View; users who need the full picture can access the underlying fact table.
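As a rough sketch of the rule such a View would apply, here is the same logic in plain Ruby. FactRow and current_value are hypothetical names, and treating a missing (nil) actual the same as 0 is an added assumption, not something the answer states.

```ruby
# Stand-in for one fact-table row holding both measures.
FactRow = Struct.new(:activity, :estimate, :actual)

# Mirrors the view's rule: "if actual = 0 then estimate else actual".
# Treating nil like 0 is an added assumption for rows with no actual yet.
def current_value(row)
  (row.actual.nil? || row.actual.zero?) ? row.estimate : row.actual
end
```

One consequence of this rule is that a genuine actual of 0 cannot be told apart from "no actual yet", which is why a nullable actual column may be worth considering.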

If I have two models and need a calculation on each attribute, should I calculate on the fly or create a 3rd model?

I have two models - Score & Weight.
Each of these models has about 5 attributes.
I need to be able to create a weighted_score for my User, which is basically the product of Score.attribute_A * Weight.attribute_A, Score.attribute_B * Weight.attribute_B, etc.
Am I better off creating a 3rd model - say Weighted_Score, where I store the product value for each attribute in a row with the user_id and then query that table whenever I need a particular weighted_score (e.g. my_user.weighted_score.attribute_A) or am I better off just doing the calculations on the fly every time?
I am asking from an efficiency stand-point.
Thanks.
I think the answer is very situation-dependent. Creating a 3rd table may be a good idea if the calculation is very expensive, you don't want to bog down the rest of the system, and it's OK to respond to the user right away with a message saying that the calculation will happen later. In that case, you can offload the processing to a background worker and create an instance of the 3rd model asynchronously. Additionally, you should denormalize the table so that you can access it directly without having to look up the Weight/Score records.
Some other ideas:
Focus optimizations on the model that has many records. If Weight, for instance, will only have 100 records, but Score could have infinite, then load Weight into memory and focus all your effort on optimizing the Score queries.
Use memoization on the calc methods
Use caching on the most expensive actions/methods. If you don't care too much about how frequently the values update, you can explicitly sweep the cache nightly or on a similar schedule.
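The memoization idea from the list above could look roughly like this in plain Ruby. The Structs stand in for the Score and Weight models, and the WeightedScore class, value_for method, and calculations counter are hypothetical names used only for this sketch.

```ruby
# Stand-ins for the two models from the question.
Score  = Struct.new(:attribute_a, :attribute_b)
Weight = Struct.new(:attribute_a, :attribute_b)

class WeightedScore
  attr_reader :calculations

  def initialize(score, weight)
    @score = score
    @weight = weight
    @cache = {}
    @calculations = 0 # counts real computations, only for demonstration
  end

  # Compute the product on first access, then reuse the cached value.
  def value_for(attribute)
    @cache.fetch(attribute) do
      @calculations += 1
      @cache[attribute] = @score[attribute] * @weight[attribute]
    end
  end
end
```

Note that a memoized value goes stale if the underlying Score or Weight changes, so the object (or its cache) would need to be discarded whenever either record is updated.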
Unless there is a need to store the calculated score (let's say it changes over time and you want to preserve its history), I don't see any benefit in adding the complexity of storing it in a separate table.

DB-agnostic Calculations : Is it good to store calculation results ? If yes, what's the better way to do this?

I want to perform some simple calculations while staying database-agnostic in my rails app.
I have three models:
.---------------. .--------------. .---------------.
| ImpactSummary |<------| ImpactReport |<----------| ImpactAuction |
`---------------'1 *`--------------'1 *`---------------'
Basically:
ImpactAuction holds data about... auctions (prices, quantities and such).
ImpactReport holds monthly reports that have many auctions as well as other attributes; it also shows some calculation results based on the auctions.
ImpactSummary holds a collection of reports as well as some information about a specific year, and also shows calculation results based on the two other models.
What I intend to do is store the results of these really simple calculations (just means, sums, and the like) in the relevant tables, so that reading them is fast and I can easily run queries on the calculation results.
Is it good practice to store calculation results? I'm pretty sure it's not a very good thing, but is it acceptable?
Is it useful, or should I not bother and perform the calculations on the fly?
If it is good practice and useful, what's the better way to achieve what I want?
That's the tricky part. At first, I implemented a simple chain of callbacks that would update the calculation fields of the parent model upon save (that is, when an auction is created or updated, it marks some_attribute_will_change! on its report and saves it, which triggers its own callbacks, and so on).
This approach fits well when creating or updating a single record, but if I want to work on several records, it triggers the calculations on the whole chain for each record. So I suddenly find myself forced to put a condition on the callbacks, depending on whether I have one or many records, and I can't figure out how to do that (using a class method that could be called on a relation? using an instance attribute #skip_calculations on each record? just using an "outdated" field to mark the parent records for later calculation?).
Any advice is welcome.
Bonus question: Would it be considered DB-agnostic if I implement this with DB views?
As usual, it depends. If you can perform the calculations in the database, either using a view or using #find_by_sql, I would do so. You'll save yourself a lot of trouble: you have to keep your summaries up to date when you change values. You've already met the problem when updating multiple rows. Having a view, or a query that implements the view stored as text in ImpactReport, will allow you to always have fresh data.
The answer? Benchmark, benchmark, benchmark ;)
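To make the benchmarking advice concrete, here is a small sketch using Ruby's standard Benchmark module. It only compares an in-memory sum against a stored copy; the Auction and Report classes and their fields are hypothetical stand-ins, and a real comparison would have to run against the actual database queries or views.

```ruby
require "benchmark"

# Hypothetical stand-in for an auction row.
Auction = Struct.new(:price, :quantity)

class Report
  attr_reader :auctions, :stored_total

  def initialize(auctions)
    @auctions = auctions
    @stored_total = computed_total # stored copy, must be refreshed on change
  end

  # On-the-fly calculation: always fresh, costs a pass over the auctions.
  def computed_total
    auctions.sum { |auction| auction.price * auction.quantity }
  end
end

auctions = Array.new(10_000) { |i| Auction.new(i % 50 + 1, 2) }
report = Report.new(auctions)

# Print timings for 100 reads of each variant.
Benchmark.bm(12) do |x|
  x.report("on the fly:") { 100.times { report.computed_total } }
  x.report("stored:")     { 100.times { report.stored_total } }
end
```

The stored read will almost always be faster; the useful question is whether the difference is large enough to justify keeping the stored value in sync.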

Designing a points based system similar to Stack Overflow in Ruby on Rails

I'm not trying to recreate Stack Overflow and I did look at similar questions but they don't have many answers.
I'm interested in how to design a rails app, particularly the models and their associations, in order to capture various different kinds of actions and their points amount. Additionally these points decay over time and there are possible modifiers in the form of other actions or other data I'm tracking.
For example if I were designing Stack Overflow (which again I'm not) it would go something like the following.
Creating a question = 5 points
Answering a question = 10 points
The selected correct answer is a x2 modifier on the points for answering a question.
From a design perspective it seems to me like I need 3 models for the key parts.
The action model is polymorphic so it can belong to questions, answers, or whatever. The kind of association is stored in the type field. It also contains a points field that is calculated at creation time by a lookup in the points model I will discuss next. It should also update a total points on the user model, which I won't discuss here.
The points model is a lookup table where actions go to figure out their points. It uses the action's type as a key. It also stores the point amount and a field for the decay.
The modifier model is the one I'm not sure what to do with. I think it should probably be a lookup table too, like points, keyed on the action's type field. Additionally, it needs some sort of conditional for when it should be applied, and I'm not sure how to store a conditional statement. It also needs to store how the points are modified, for example x2, +5, -10, /100, etc. The other problem is how the modifier gets applied after the action has already happened. In my example, that would be when an answer is selected as the correct one. By this time the points were already set. The only way I can think of doing it is to have an after_save on every model that could be a modifier, which checks the modifier table and applies the matching modifiers. That seems wrong to me somehow, though.
There are other problems too like how to handle the decay. I guess I need a cron job that just recalculates everyone's points but that seems like it doesn't scale well.
I'm not sure if I'm over thinking this or what but I'd like some feedback.
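One possible way to store modifiers such as x2, +5, -10 or /100 without storing free-form code is an operator plus an operand, with the condition reduced to a named flag. The sketch below is plain Ruby under those assumptions; BASE_POINTS, Modifier, MODIFIERS, points_for and the :accepted flag are all hypothetical names, and the values come from the example in the question.

```ruby
# Hypothetical lookup table of base points per action type.
BASE_POINTS = { "question" => 5, "answer" => 10 }.freeze

# Each modifier stores an operator and an operand instead of free-form code.
Modifier = Struct.new(:action_type, :condition, :operator, :operand)

MODIFIERS = [
  Modifier.new("answer", :accepted, :*, 2)
].freeze

# Recompute points from the base value so modifiers can apply at any time,
# including after the action was first saved.
def points_for(action_type, flags = [])
  MODIFIERS
    .select { |m| m.action_type == action_type && flags.include?(m.condition) }
    .reduce(BASE_POINTS.fetch(action_type)) do |points, m|
      points.public_send(m.operator, m.operand)
    end
end
```

Deriving the points from the base value each time, rather than mutating a stored number, sidesteps the question of what to do when a modifier starts or stops applying after the action was created. With integer points, a "/100" style modifier would use integer division unless the values are stored as decimals.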
I tend to prefer a log-aggregate-snapshot approach, where you log discrete events and then periodically aggregate the changes and store them in a separate table. This would allow you to handle something like decay as an insert job rather than an update job. Depending on how many votes there are, you could even aggregate them over time and just roll forward from a specific point (though there probably aren't enough per question or answer for this to be a concern). Given that you may have other things to track, like a user's total points, that may be a good thing to snapshot.
I think you need to figure out how you are going to handle decay before you address it in an aggregate snapshot table, however.
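A minimal sketch of the log-aggregate-snapshot idea in plain Ruby might look like this. All names (PointEvent, Snapshot, current_total, decay_event) are hypothetical, and the decay rule in particular is purely an assumption for illustration, since how decay should work is left open above.

```ruby
# Append-only log entry; decay is recorded as its own negative entry.
PointEvent = Struct.new(:user_id, :points, :kind, :created_at)

# Periodic aggregate: a user's total as of a given time.
Snapshot = Struct.new(:user_id, :total, :taken_at)

# Roll forward from the last snapshot instead of re-summing the whole log.
# With no snapshot, every logged event for the user is summed.
def current_total(user_id, snapshot, log)
  base = snapshot ? snapshot.total : 0
  newer = log.select do |event|
    event.user_id == user_id &&
      (snapshot.nil? || event.created_at > snapshot.taken_at)
  end
  base + newer.sum(&:points)
end

# Hypothetical decay rule (an assumption): lose 10% of a non-negative total,
# rounded down, written as a new log row rather than an update.
def decay_event(user_id, total, now)
  PointEvent.new(user_id, -(total / 10), "decay", now)
end
```

A scheduled job would then append one decay_event per user and occasionally write a fresh Snapshot, so existing rows are never updated in place.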
There is now a Rails gem that provides this feature:
https://github.com/tute/merit

Schema for storing "binary" values, such as Male/Female, in a database

Intro
I am trying to decide how best to set up my database schema for a (Rails) model. I have a model related to money which indicates whether the value is an income (positive cash value) or an expense (negative cash value).
I would like separate column(s) to indicate whether it is an income or an expense, rather than relying on whether the value stored is positive or negative.
Question:
How would you store these values, and why?
Have a single column, say Income, and store 1 if it's an income, 0 if it's an expense, null if not known.
Have two columns, Income and Expense, setting their values to 1 or 0 as appropriate.
Something else?
I figure the question is similar to storing a person's gender in a database (ignoring aliens/transgender/etc) hence my title.
My thoughts so far
Lookup might be easier with a single column, but there is a risk of mistaking 0 (false, expense) for null (unknown).
Having separate columns might be more difficult to maintain (what happens if we end up with a 1 in both columns?).
Maybe it's not that big a deal which way I go, but it would be great to have any concerns/thoughts raised before I get too far down the line and have to change my code-base because I missed something that should have been obvious!
Thanks,
Philip
How would you store these values, and why?
I would store them as a single column. Despite your desire to separate the data into multiple columns, anyone who understands accounting or bookkeeping will know that the dollar value of a transaction is one thing, not two separate things based on whether it's income or expense (or asset, liability, equity and so forth).
As someone who's actually written fully balanced double-entry accounting applications and less formal budgeting applications, I suggest you rethink your decision. It will make future work on this endeavour a lot easier.
I'm sorry, that's probably not what you want to hear and may well result in negative rep for me, but I can't, in all honesty, let this go without telling you what a mistake it will be.
Your "thoughts so far" are an indication of the problems already appearing.
1/ "Having separate columns might be more difficult to maintain (what happens if we end up with a 1 in both columns?)" - well, this shouldn't happen. Data is supposed to be internally consistent with the data model. You would be best advised to prevent it with an insert/update trigger or, say, a single column that doesn't allow it to happen :-)
2/ "Lookup might be easier with a single column, but there is a risk of mistaking 0 (false, expense) for null (unknown)." - no mistake possible if the sign is stored with the magnitude of the value. And the whole idea of not knowing whether an item is expense or income is abhorrent to accountants. That knowledge exists when the transaction is created, it's not something that is nebulous until some point after a transaction happens.
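To illustrate the single-column approach, here is a small plain-Ruby sketch in which the sign of one stored amount carries the income/expense distinction. The Transaction struct, the amount_cents field, and storing the value as integer cents are all assumptions made for this example.

```ruby
# One signed amount per transaction; the sign carries income vs expense.
# Storing integer cents is an assumption made for this sketch.
Transaction = Struct.new(:description, :amount_cents) do
  # Positive amounts are income.
  def income?
    amount_cents.positive?
  end

  # Negative amounts are expenses.
  def expense?
    amount_cents.negative?
  end
end
```

With this shape, a balance is just the sum of amount_cents, and a row can never be flagged as both income and expense.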
Sometimes I use a character. For example, I have a column gender in my database that stores m or f.
And I usually choose to have just one column.
I would typically implement a flag as an nchar(1) and use some meaningful abbreviations. I think that's the easiest thing to work with. You could use 'I' for income and 'E' for expense, for example.
That said, I don't think that's a good way to do this system.
I would probably put incomes and expenses in separate tables, since they appear to be different sorts of things. The only advantages I can think of for putting them in the same table are lost once the meanings are differentiated by flags rather than positive and negative values.
