Designing a points-based system similar to Stack Overflow in Ruby on Rails

I'm not trying to recreate Stack Overflow and I did look at similar questions but they don't have many answers.
I'm interested in how to design a rails app, particularly the models and their associations, in order to capture various different kinds of actions and their points amount. Additionally these points decay over time and there are possible modifiers in the form of other actions or other data I'm tracking.
For example if I were designing Stack Overflow (which again I'm not) it would go something like the following.
Creating a question = 5 points
Answering a question = 10 points
Having your answer selected as the correct one is a 2x modifier on the points for answering a question.
From a design perspective it seems to me like I need 3 models for the key parts.
The action model is polymorphic so it can belong to questions, answers, or whatever. The kind of association is stored in the type field. It also contains a points field that is calculated at creation time by a lookup in the points model I will discuss next. It should also update a points total on the user model, which I won't discuss here.
The points model is a lookup table where actions go to figure out their points. It uses the action's type as a key. It also stores the point amount and a field for their decay.
The modifier model is the one I'm not sure what to do with. I think it should probably be a lookup table too, like points, keyed on the action's type field. Additionally it needs some sort of conditional on when it should be applied, and I'm not sure how to store a conditional statement. It also needs to store how the points are modified, for example x2, +5, -10, /100, etc. The other problem is how the modifier gets applied after the action has already happened. In my example it would be when a question is selected as answered; by that time the points were already set. The only way I can think of doing it is to have an after_save on every model that could trigger a modifier, which checks the modifier table and applies any matches. That seems wrong to me somehow, though.
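For illustration, here's a rough sketch of how I picture these three models (the names and columns are just placeholders, and I use subject_type instead of a literal type column, since type is reserved for STI in Rails):

    # app/models/action.rb
    # Polymorphic "action" record: belongs to a question, an answer, or whatever.
    class Action < ApplicationRecord
      belongs_to :user
      belongs_to :subject, polymorphic: true  # subject_type plays the "type" role

      before_create :assign_points

      private

      # Look up the base point value for this kind of action at creation time.
      def assign_points
        rule = PointRule.find_by(action_type: subject_type)
        self.points = rule ? rule.amount : 0
      end
    end

    # app/models/point_rule.rb
    # Lookup table: action_type, amount, decay_rate.
    class PointRule < ApplicationRecord
    end

    # app/models/modifier.rb
    # Lookup table: action_type, operation (e.g. "*2", "+5"), plus some
    # representation of the condition under which it applies.
    class Modifier < ApplicationRecord
    end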
There are other problems too, like how to handle the decay. I guess I need a cron job that just recalculates everyone's points, but that seems like it doesn't scale well.
I'm not sure if I'm overthinking this or what, but I'd like some feedback.

I tend to prefer a log-aggregate-snapshot approach, where you log discrete events and then periodically aggregate the changes and store those in a separate table. This would allow you to handle something like decay as an insert job rather than an update job. Depending on how many votes there are, you could even aggregate them over time and just roll forward from a specific point (though there probably aren't enough per question or answer for this to be a concern), but given that you may have other things to track, like a user's total points, that may be a good thing to snapshot.
I think you need to figure out how you are going to handle decay before you address it in an aggregate snapshot table, however.
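A minimal sketch of what I mean, with decay handled as inserted adjustment rows (all table and class names here are made up for illustration):

    # Insert-only log of point changes; decay becomes just another inserted row.
    class PointEvent < ApplicationRecord
      # columns: user_id, amount, reason, created_at
    end

    # Periodic snapshot of each user's running total so reads stay cheap.
    class PointSnapshot < ApplicationRecord
      # columns: user_id, total, as_of
    end

    class SnapshotPointsJob
      def self.run
        User.find_each do |user|
          last  = PointSnapshot.where(user_id: user.id).order(:as_of).last
          since = last ? last.as_of : Time.at(0)
          delta = PointEvent.where(user_id: user.id)
                            .where("created_at > ?", since)
                            .sum(:amount)
          PointSnapshot.create!(user_id: user.id,
                                total:   (last ? last.total : 0) + delta,
                                as_of:   Time.current)
        end
      end
    end

    # Decay as an insert rather than an update, e.g. from a nightly job:
    # PointEvent.create!(user_id: user.id, amount: -decay_amount, reason: "decay")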

Rails now has a gem to achieve this feature:
https://github.com/tute/merit
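From the gem's README, the setup is roughly along these lines (the rules below are my guess at mapping the question's example onto merit's point-rules DSL; check the current docs for the exact API):

    # Gemfile
    gem "merit"

    # app/models/user.rb
    class User < ApplicationRecord
      has_merit   # adds points, badges, and rankings to the model
    end

    # app/models/merit/point_rules.rb
    module Merit
      class PointRules
        include Merit::PointRulesMethods

        def initialize
          score 5,  on: "questions#create"   # creating a question
          score 10, on: "answers#create"     # answering a question
        end
      end
    end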

Related

Question regarding role-playing dimension

I hope you can help answer one question regarding role-playing dimensions.
When using views for a role-playing dimension, does it then matter which view is referred to later in the analysis? In particular, when sorting on the role-playing dimension, can this be done no matter which view is used?
Hope the question is clear enough. If not, let me know and I will elaborate.
Thanks in advance.
Do you mean you have created a view similar to "SELECT * FROM DIM" for each role the dimension plays? If that's all you've done, then you could use any of these views in a subsequent SQL statement that joins the DIM to a FACT table - but obviously if you use the "wrong" view it's going to be very confusing for anyone trying to read your SQL (or for you trying to understand what you've written in 3 months' time!)
For example, if you have a fact table with keys OrderDate and ShipDate that both reference your DateDim then you could create vwOrderDate and vwShipDate. You could then join FACT.OrderDate to vwShipDate and FACT.ShipDate to vwOrderDate and it will make no difference to the actual resultset your query produces (apart from, possibly, column names).
However, unless the applicable attributes are very different for different roles, I really wouldn't bother creating views for role-playing dims, as it's unnecessary overhead and just going to cause confusion for anyone you've given access at this level of the DB (who presumably have pretty strong SQL skills to be given this level of access?).
If you are trying to make life easier for end-users, then either create these types of "views" in the models of the BI tool(s) they are using - and not directly in the DB - or, if they are being given access to the DB, then create view(s) across the fact(s) and all their joined dimensions.

Survey Monkey editing likert scale

I am currently working on my dissertation, and as part of it I have constructed a questionnaire on Survey Monkey.
In one of the questions, a matrix type with ten items and four choices, I made a miscalculation: I graded the scale from 1-4 instead of 0-3, and I already have about 14 responses to it. Now, if I edit the questionnaire and recode the scale to 0-3, how is that going to affect the answers I already have?
Are they going to change automatically to conform to the new scale, or is it going to disrupt my whole questionnaire?
You can change it and it won't affect your responses. The weight you set on a choice is used during the analytics stage and not actually stored with the response.
The help docs don't say so explicitly but seem to suggest it's safe. In the Analyzing Results section:
"If needed, you can change the weight of each answer choice in the Design section of the survey, even after the survey has collected responses."

Detecting HTML table orientation based only on table data

Given an HTML table with none of its cells identified as <th> or "header" cells, I want to automatically detect whether the table is a "vertical" table or a "horizontal" table.
For example:
This is a horizontal table: [example image omitted]
and this is a vertical table: [example image omitted]
Of course, keep in mind that the bold property, along with shading and any other styling properties, will not be available at classification time.
I was thinking of approaching this by statistical means: I can hand-write a couple of features like "if the first row has numbers but the first column doesn't, that's probably a vertical table", give a score to each feature, and combine them to decide the orientation class of the table.
Is that how you would approach such a problem? I haven't used any statistics-based algorithm before and I am not sure what would be optimal for such a problem.
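To make the feature idea concrete, something like this is what I have in mind, over a table given as a 2D array of cell strings (the features, weights, and thresholds are invented for illustration):

    # table: an array of rows, each row an array of cell strings.
    NUMERIC = /\A-?\d+([.,]\d+)?\z/

    def numeric_fraction(cells)
      return 0.0 if cells.empty?
      cells.count { |c| c.to_s.strip.match?(NUMERIC) }.to_f / cells.size
    end

    # Positive score -> probably vertical (labels down the first column),
    # negative score -> probably horizontal (labels across the first row).
    def orientation_score(table)
      first_row = table.first || []
      first_col = table.map(&:first)

      score = 0.0
      # "First row has numbers but the first column doesn't" -> vertical.
      score += 1.0 if numeric_fraction(first_row) > 0.5 && numeric_fraction(first_col) < 0.2
      # The mirror-image feature votes for horizontal.
      score -= 1.0 if numeric_fraction(first_col) > 0.5 && numeric_fraction(first_row) < 0.2
      # Further hand-written features (cell length variance, repeated values, ...)
      # would each add a weighted vote here; labelled data would let you learn the weights.
      score
    end

    def orientation(table)
      s = orientation_score(table)
      return :unknown if s.zero?
      s > 0 ? :vertical : :horizontal
    end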
This is a bit of a confusing question. You are asking about an ML method, but it seems you have not created training/cross-validation/test sets yet. Without a data preprocessing step, any discussion about ML methods is useless.
If I'm right and you haven't created datasets yet, give us more info on the data (when you look at one example, how do you know whether the table is vertical or horizontal? How much data do you have? Are you always sure whether a table is vertical or horizontal? ...).
If you have already created training/cross-validation/test sets, give us more details on what the training set looks like (what the features are, the number of examples, whether you need a white-box solution where you can see why an ML model gives you a particular result, ...).
How general is the domain for the tables? I know some web table schema identification algorithms use types, properties, and instance data from a general knowledge schema such as Freebase to attempt to identify the property associated with a column. You might try leveraging that knowledge in a classifier.
If you want to do this without any external information, you'll need a bunch of hand labelled horizontal and vertical examples.
You say "of course" the font information isn't available, but I wouldn't be so quick to dismiss this since it's potentially a source of very useful information. Are you sure you can't get your data from a little bit further back in the pipeline so that you can get access to this info?

DB-agnostic calculations: Is it good to store calculation results? If yes, what's the best way to do this?

I want to perform some simple calculations while staying database-agnostic in my rails app.
I have three models:
.---------------. .--------------. .---------------.
| ImpactSummary |<------| ImpactReport |<----------| ImpactAuction |
`---------------'1 *`--------------'1 *`---------------'
Basically:
ImpactAuction holds data about... auctions (prices, quantities and such).
ImpactReport holds monthly reports that have many auctions as well as other attributes; it also shows some calculation results based on the auctions.
ImpactSummary holds a collection of reports as well as some information about a specific year, and also shows calculation results based on the two other models.
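In Rails terms, the associations in that diagram look like this (a straightforward reading of the arrows above):

    class ImpactSummary < ApplicationRecord
      has_many :impact_reports
    end

    class ImpactReport < ApplicationRecord
      belongs_to :impact_summary
      has_many :impact_auctions
    end

    class ImpactAuction < ApplicationRecord
      belongs_to :impact_report
    end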
What I intend to do is to store the results of these really simple calculations (just means, sums, and the like) in the relevant tables, so that reading them would be fast and I can easily perform queries on the calculation results.
Is it good practice to store calculation results? I'm pretty sure that's not a very good thing, but is it acceptable?
Is it useful, or should I not bother and perform the calculations on the fly?
If it is good practice and useful, what's the best way to achieve what I want?
That's the tricky part. At first, I implemented a simple chain of callbacks that would update the calculation fields of the parent model upon save (that is, when an auction is created or updated, it marks some_attribute_will_change! on its report and saves it, which triggers the report's own callbacks, and so on).
This approach works well when creating/updating a single record, but if I want to work on several records, it triggers the calculations on the whole chain for each record. So I suddenly find myself forced to put a condition on the callbacks depending on whether I have one record or many, and I can't figure out how (using a class method that could be called on a relation? Using an instance attribute #skip_calculations on each record? Just using an outdated flag to mark the parent records for later recalculation?).
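For illustration, the #skip_calculations idea would look roughly like this (recalculate! is a made-up helper that redoes the sums and means):

    class ImpactAuction < ApplicationRecord
      belongs_to :impact_report

      # Lets bulk jobs opt out of the per-record cascade.
      attr_accessor :skip_calculations

      after_save :refresh_report, unless: :skip_calculations

      private

      def refresh_report
        impact_report.recalculate!  # hypothetical helper on the report
      end
    end

    # Bulk import: skip the cascade per record, then recalculate each parent once.
    def import_auctions(rows)
      reports = rows.map do |attrs|
        auction = ImpactAuction.new(attrs)
        auction.skip_calculations = true
        auction.save!
        auction.impact_report
      end
      reports.uniq.each(&:recalculate!)
    end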
Any advice is welcome.
Bonus question: Would it be considered DB-agnostic if I implement this with DB views?
As usual, it depends. If you can perform the calculations in the database, either using a view or using #find_by_sql, I would do so. You'll save yourself a lot of trouble: otherwise you have to keep your summaries up to date whenever the underlying values change, and you've already met that problem when updating multiple rows. Having a view, or a query that implements the view stored as text in ImpactReport, will allow you to always have fresh data.
The answer? Benchmark, benchmark, benchmark ;)
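If the benchmarks favour computing on the fly, plain ActiveRecord aggregates are one DB-agnostic way to do it; a rough sketch (the quantity and price columns are assumed from the question, not actual schema):

    class ImpactReport < ApplicationRecord
      has_many :impact_auctions

      # Computed on read instead of stored; sum/average are delegated to the
      # database by ActiveRecord, so this stays DB-agnostic.
      def total_quantity
        impact_auctions.sum(:quantity)
      end

      def average_price
        impact_auctions.average(:price)
      end
    end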

Best way to implement Stack Overflow-style reputation

I'm implementing a Stack Overflow-like reputation system on my rap lyrics explanation site, Rap Genius:
Good explanation: +10
Bad explanation: -1
Have the most explanations on a song: +30
My question is how to implement this. Specifically, I'm trying to decide whether I should create a table of reputation_events to aid in reputation re-calculation, or whether I should just recalculate from scratch whenever I need to.
The table of reputation_events would have columns for:
name (e.g., "good_explanation", "bad_explanation")
awarded_to_id
awarded_by_id
awarded_at
Whenever something happens that affects reputation, I insert a corresponding row into reputation_events. This makes it easy to recalculate reputation and to generate a human-readable sequence of events that generated a given person's reputation.
On the other hand, any given action could affect multiple users' reputations. E.g., suppose user A overtakes user B on a given song; based on the "Have the most explanations on a song" goal, I would have to remember to delete B's original "has_the_most_explanations" event (or maybe I would add a new event for B?).
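For concreteness, the event-table option would look roughly like this (the value column and the recalculation helper are additions of mine, on top of the columns listed above):

    class ReputationEvent < ApplicationRecord
      # columns: name, value, awarded_to_id, awarded_by_id, awarded_at
      belongs_to :awarded_to, class_name: "User"
      belongs_to :awarded_by, class_name: "User", optional: true
    end

    class User < ApplicationRecord
      has_many :reputation_events, foreign_key: :awarded_to_id

      # Rebuild the cached total from the event log.
      def recalculate_reputation!
        update!(reputation: reputation_events.sum(:value))
      end
    end

    # When user A overtakes user B, a compensating event avoids deleting history:
    # ReputationEvent.create!(name: "lost_most_explanations", value: -30,
    #                         awarded_to: b, awarded_at: Time.current)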
In general, I never like data to exist in more than one place. It sounds like your "reputation_events" table would contain data that can be calculated from other data. If so, I'd recalculate from scratch, unless the performance impact becomes a real problem.
When you have calculated data stored, you have the possibility that it may not correspond correctly with the base data -- basically a corrupted state. Why even make it possible if you can avoid it?
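Recalculating from scratch would then look something like this (the explanation model and its rating/most_on_song attributes are assumptions standing in for whatever the base data actually is):

    class User < ApplicationRecord
      has_many :explanations

      # Recompute straight from the source records; nothing is stored twice.
      def reputation
        good = explanations.where(rating: "good").count
        bad  = explanations.where(rating: "bad").count
        top  = explanations.where(most_on_song: true).count
        good * 10 - bad + top * 30
      end
    end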
I would do a reputation event list for the purpose of recalculation and being able to track down why the total rep value is what it is.
But why have a "name" column? Why not just have a value column with either a positive or negative int?
This table will get huge; make sure you cache.
