My app implements an activity stream for different types of activities. One of the activity types is related to the different virtual currencies a user can accumulate. For example, a user can accumulate "Points" for posting a comment, voting on a topic, etc. If I did no filtering or aggregating, the stream would fill up with self-generated spam over the course of a mere hour, for example:
Earned 5 points for commenting (total points = 505)
Earned 10 points for voting (total points = 515)
Earned 5 points for commenting (total points = 520)
Earned 5 points for commenting (total points = 525)
Earned 5 points for commenting (total points = 530)
Earned 10 points for voting (total points = 540)
Earned 10 points for voting (total points = 550)
Earned 10 points for voting (total points = 560)
...
...
...
How would you go about preventing this self-generated spam while still presenting the stream of activities in a way that invites your friends to see what you've been doing?
I can think of a couple of options. The first is to aggregate the data. I don't know how many activity types you have, but you could distill what you have posted down to two items:
<Name> made <x> comments and scored <x * 5> points!
<Name> voted on <x> things.
You could make each of these list items clickable to expand and show the details. So, after clicking on the summary of comments, the user would see this:
<Name> made <x> comments and scored <x * 5> points!
Earned 5 points for commenting (total points = 505)
Earned 5 points for commenting (total points = 520)
Earned 5 points for commenting (total points = 525)
Earned 5 points for commenting (total points = 530)
<Name> voted on <x> things.
You could use something like jQuery UI accordion to implement this.
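A minimal sketch of the aggregation step in Python, assuming each activity is stored as an (activity type, points) pair. The `summarize` helper, the `LABELS` table and the sample events are made up for illustration, not taken from your app:

```python
from collections import defaultdict

# Hypothetical wording for each activity type.
LABELS = {"comment": "made {n} comments", "vote": "voted on {n} things"}

def summarize(name, events):
    """Collapse raw (activity_type, points) events into one line per type."""
    counts = defaultdict(int)
    points = defaultdict(int)
    for activity_type, earned in events:
        counts[activity_type] += 1
        points[activity_type] += earned
    # One summary line per activity type, in first-seen order.
    return [
        f"{name} {LABELS[t].format(n=counts[t])} and scored {points[t]} points!"
        for t in counts
    ]

events = [("comment", 5), ("vote", 10), ("comment", 5), ("comment", 5)]
summary = summarize("Alice", events)
```

The detail rows for the expanded view are simply the original events filtered by type.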
The approach Facebook takes is that it uses a sample post and then lets users know that more items are available, like this:
Earned 5 points for commenting (total points = 505)
Made <x> more comments
Then when the user clicks on the "Made <x> more comments" the user can see every comment (within a certain span of time).
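The "sample plus more" presentation could be sketched like this in Python. The `collapse` function and its `label` parameter are hypothetical names, and the events are assumed to be already filtered to one activity type and one time span:

```python
def collapse(events, label="comments"):
    """Show the first event, then a single 'more' line for the rest."""
    if not events:
        return []
    lines = [events[0]]
    hidden = len(events) - 1
    if hidden > 0:
        lines.append(f"Made {hidden} more {label}")
    return lines

comment_events = [
    "Earned 5 points for commenting (total points = 505)",
    "Earned 5 points for commenting (total points = 520)",
    "Earned 5 points for commenting (total points = 525)",
]
collapsed = collapse(comment_events)
```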
Presuming you want to see in one glance if the user was recently active and how recent, I would propose something like the following:
I am not sure where you would want to show this, but maybe in the profile-page, or in the list of "friends". I would show an aggregation, that would show the most recent time-frame the user was active, and what she did:
E.g.
<name> has just commented on <x>
<name> has made <x> comments and <y> votes in the last hour
<name> has made <x> comments and <y> votes today
<name> has made <x> comments and <y> votes this week
And you would only show the most recent of those. So if a user has just commented (within the last five minutes), show the first line. If she was active in the last hour, show the second line. And so on ...
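Picking the most recent matching time frame could look roughly like this in Python. The `FRAMES` table and `activity_line` are illustrative names, and treating "today" as "the last 24 hours" is my simplification:

```python
from datetime import datetime, timedelta

# Ordered from most recent to least recent; the first match wins.
FRAMES = [
    (timedelta(minutes=5), "has just commented"),
    (timedelta(hours=1), "has made comments and votes in the last hour"),
    (timedelta(days=1), "has made comments and votes today"),
    (timedelta(weeks=1), "has made comments and votes this week"),
]

def activity_line(last_active, now):
    """Return the message for the tightest time frame, or None if inactive."""
    age = now - last_active
    for limit, text in FRAMES:
        if age <= limit:
            return text
    return None

now = datetime(2024, 1, 1, 12, 0)
line = activity_line(now - timedelta(minutes=30), now)
```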
This clearly shows the user was active and how long ago. I think that is the most important.
You could combine this with showing the total score, showing how active the user was overall.
Maybe something like:
<name>[<total_score>] has just commented on <x>
or
<name>[<total_score>] has made <x> comments and <y> votes in the last hour.
Hmm, I want the message to be shorter:
<name>[<total_score>] has earned <x> points in the last hour.
Is that clearer? Not sure.
This message would then be clickable, and that would link you to a pop-up chart/graph showing the activity (votes/comments/points) over the last week/month. A chart because it is very compact and very understandable.
What do you think?
I'd personally go with an alert like the ones Stack Overflow uses for instant notification of immediate activities. They quickly alert and then get out of the way. If you make them clickable, the user can drill down for detail if they like.
Then, somewhere like in an account section, I'd list all activities using jQuery DataTables, so they could be sorted, paged, filtered, and delivered via pipelined Ajax. Simple, efficient, and user friendly!
UI is about commonality: making users feel comfortable in an environment they haven't been in before by presenting familiar interactions. You'll see this same pattern used on sites like StackOverflow, Swagbucks, MyPoints, etc.
So I have a spreadsheet - https://docs.google.com/spreadsheets/d/16XLkjZafBSSdYZaSI9LLflOMvK40WgXsbrbE7Wqcwj8/edit?usp=sharing
Here is a basic sentence about football where I've written hottie 3 times.
"Hottie has taken the top spot in the Guardian's list of the world's top 100 footballers published today. See the full list and how it breaks down by nationality, club and position. An 11-strong international panel of hottie were asked by Guardian Sport to name their top 30 players in action today and rank them in order of preference. hottie were then scored on their ranking by each panellist: a No1 choice allocated 30pts, No2 29pts and so on down to selection No30, given one point."
Now, I have a list of 50 other players (Check the spreadsheet please)
How do I replace the word hottie (all 3 times it occurs) with a football player's name, and do this automatically for all 50 of them?
Please let me know, your response would be appreciated!!!
Thanks :)
Try :
=SUBSTITUTE(SUBSTITUTE(A2,"Hottie","hottie"),"hottie",B2)
Please let me know whether it works and whether it is understandable.
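SUBSTITUTE is case-sensitive, which is why the formula first normalises "Hottie" to "hottie". If you ever need the same thing outside the spreadsheet, here is a rough Python equivalent. The shortened template and the placeholder names are mine, not from your sheet:

```python
import re

# Shortened stand-in for the sentence in column A.
TEMPLATE = (
    "Hottie has taken the top spot. "
    "A panel of hottie were asked. "
    "hottie were then scored."
)

def fill(template, name):
    """Replace every 'hottie', in any letter case, with the given name."""
    # A function as the replacement keeps backslashes in names literal.
    return re.sub(r"hottie", lambda match: name, template, flags=re.IGNORECASE)

# Placeholder names standing in for column B.
names = ["Player A", "Player B"]
sentences = [fill(TEMPLATE, name) for name in names]
```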
Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.g. 200.0 minutes used) for the clustering model. I've considered setting all cells/days after the last day of data collection to either -1.0 or NULL. Is either of these a valid approach? If not, what would you suggest?
To make both users look alike (one who used the product a lot every day for a year, and one who used it a lot for only a month), create a new entry whose values are:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user who uses the product for, let's say, 200 minutes per day for one year will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, who joined just last month and has the exact same usage, will also get:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when you started using the product; the only thing that matters is the usage rate.
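A small Python sketch of that calculation, assuming each customer's array holds only the days that were actually observed (no -1.0 or NULL padding) and that a month is 30 days. The function name is illustrative:

```python
def average_per_bin(daily_minutes, bin_days=30):
    """Average usage per time bin, based only on the observed days."""
    if not daily_minutes:
        return 0.0
    daily_average = sum(daily_minutes) / len(daily_minutes)
    return daily_average * bin_days

year_user = [200.0] * 360   # power user observed for twelve 30-day months
month_user = [200.0] * 30   # power user observed for one month

year_rate = average_per_bin(year_user)
month_rate = average_per_bin(month_user)
```

Both users end up with the same feature value, so the clustering model sees them as the same kind of user.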
One important thing to take into consideration is that products may go unused for some time. For example, if I'm away on vacation, I don't use my computer. Those days may say nothing about my general usage of the product. So, based on your data, your product and your intuition, you might consider removing gaps like that and leaving them out of the calculation.
How long a user has been using your product could itself be a signal of something. But if a user only started recently and is still using the product today, you need to account for that, and this average binning technique may help.
I have a website where I record data for every shift on a factory's production lines, such as the quantity in tonnes. One day contains 3 shifts (morning, late and night). The per-shift quantities are stored in the Shift table and shown in the Shift Index view. I want the values of a day's 3 shifts to be added together and shown on another page, the Days Index page, so I can see the total output of the day.
For example, for "Quantity in tonnes", I would like 7 + 10 + 12 (inputs I have already added to the Shifts index through a form) to be summed and appear automatically, without my intervention, as "29" in the quantity in tonnes column of the Days index page.
How can I do that? I can't seem to figure out how to write code that loops over all the inputs and keeps giving me the summed totals.
Let me know if you need to see any parts of my code or if there is any more info I could add to help you understand.
Have a look at the groupdate gem, it allows you to group by day, week, hour of the day, etc.
Some code from your end would help, but here's an example use. If I wanted to get weekly revenue for the past 90 days:
time_range = 90.days.ago..Time.zone.now
total = Sales.where('status > 2').group_by_week(:date_scheduled, Time.zone, time_range).sum(:price)
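The grouping itself is simple. Here is the idea in plain Python, independent of Rails and of the gem, assuming each shift is a (day, tonnes) pair. The dates and the `daily_totals` name are made up for illustration:

```python
from collections import defaultdict
from datetime import date

def daily_totals(shifts):
    """Sum the quantity of every shift that belongs to the same day."""
    totals = defaultdict(int)
    for day, tonnes in shifts:
        totals[day] += tonnes
    return dict(totals)

shifts = [
    (date(2024, 1, 1), 7),    # morning shift
    (date(2024, 1, 1), 10),   # late shift
    (date(2024, 1, 1), 12),   # night shift
    (date(2024, 1, 2), 5),
]
totals = daily_totals(shifts)
```

Because the totals are computed from the shift rows each time, they stay correct whenever a shift is added or edited.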
My goal here is to generate a system similar to that of the front page of reddit.
I have things, and for the sake of simplicity these things have votes. The best system I've come up with uses time decay. With a half-life of 7 days, if a vote is worth 20 points today, then in seven days it is worth 10 points, and in 14 days it will only be worth 5 points.
The problem is, that while this produces results I am very happy with, it doesn't scale. Every vote requires me to effectively recompute the value of every other vote.
So, I thought I might be able to reverse the idea. A vote today is worth 1 point. A vote seven days from now is worth 2 points, and 14 days from now is worth 4 points and so on. This works well because for each vote, I only have to update one row. The problem is that by the end of the year, I need a datatype that can hold fantastically huge numbers.
So, I tried using a linear growth which produced terrible rankings. I tried polynomial growth (squaring and cubing the number of days since site launch and submission) and it produced slightly better results. However, as I get slightly better results, I'm quickly re-approaching unmaintainable numbers.
So, I come to you, Stack Overflow. Who's got a genius idea, or a link to an idea, on how to model this system so it scales well for a web application?
I've been trying to do this as well. I found what looks like a solution, but unfortunately, I forgot how to do math, so I'm having trouble understanding it.
The idea is to store the log of your score and sort by that, so the numbers won't overflow.
This doc describes the math.
https://docs.google.com/View?id=dg7jwgdn_8cd9bprdr
And the comment where I found it is here:
http://blog.notdot.net/2009/12/Most-popular-metrics-in-App-Engine#comment-25910828
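I can't vouch that this matches the linked doc exactly, but the general trick can be sketched in Python as follows. It assumes the "reversed" scheme from the question (a vote cast t days after launch is worth 2^(t / half-life)) and stores only the natural log of the running sum. The function and constant names are mine:

```python
import math

HALF_LIFE_DAYS = 7.0

def add_vote(log_score, days_since_launch):
    """Return log(exp(log_score) + 2 ** (days / half-life)) without overflow."""
    log_vote = days_since_launch * math.log(2) / HALF_LIFE_DAYS
    if log_score is None:          # first vote for this item
        return log_vote
    high = max(log_score, log_vote)
    low = min(log_score, log_vote)
    # log(e^high + e^low) = high + log(1 + e^(low - high)); low - high <= 0.
    return high + math.log1p(math.exp(low - high))

score = add_vote(None, 0)          # one vote on launch day
score = add_vote(score, 0)         # a second vote on the same day
late = add_vote(None, 100000)      # a vote far in the future stays small
```

Sorting by the stored log gives the same order as sorting by the real score, and each vote still touches only one row.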
Okay, thought of one solution to do that on every vote. The catch is that it requires a linked list with atomic pop/push on both sides to store votes (e.g. Redis list, but you probably don't want it in RAM).
It also requires that decay interval is constant (e.g. 1 hour)
It goes like this:
1. On every vote, update the score and push the next decay time of this vote to the tail of the list
2. Then pop the first vote from the head of the list
3. If it's not old enough to decay, push it back to the head
4. Otherwise, subtract the required amount from the total score and push the updated information to the tail
5. Repeat from step 2 until you hit a fresh enough vote (step 3)
You'll still have to check the heads in background to clear the posts that no one votes on anymore, of course.
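Here is a rough in-memory version of those steps in Python, with a `deque` standing in for the Redis list. The class name, the default values and the halving factor are assumptions for illustration. One deliberate difference: this sketch runs the decay pass before pushing the new vote, which keeps the queue ordered by decay time:

```python
from collections import deque

class DecayingScore:
    """Running score whose votes lose value in fixed-interval steps."""

    def __init__(self, vote_value=20.0, interval=3600.0, factor=0.5):
        self.vote_value = vote_value
        self.interval = interval      # constant decay interval
        self.factor = factor          # value kept after each decay step
        self.score = 0.0
        self.queue = deque()          # (next_decay_time, current_value)

    def decay(self, now):
        """Pop from the head until the head vote is not yet due."""
        while self.queue:
            due, value = self.queue.popleft()
            if due > now:
                self.queue.appendleft((due, value))  # fresh enough: stop
                break
            decayed = value * self.factor
            self.score -= value - decayed
            self.queue.append((due + self.interval, decayed))

    def vote(self, now):
        self.decay(now)
        self.score += self.vote_value
        self.queue.append((now + self.interval, self.vote_value))

s = DecayingScore(interval=10.0)
s.vote(0)
s.vote(5)
s.decay(12)   # only the first vote has decayed once
```

Note that votes are never dropped here, so a real version would want to discard entries once their value becomes negligible. The background check is just a periodic call to `decay`.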
It's late here so I'm hoping someone can check my math. I think this is equivalent to exponential decay.
MySQL's unsigned BIGINT has a maximum of 2^64 - 1.
For simplicity, let's use 1 day as our time interval. Let n be the number of days since the site launched.
Create an integer variable. Let's call it X and start it at 0.
If an add operation would bring a score over that maximum, first update every score by dividing it by 2^(n-X) (which is 2^n the first time, since X starts at 0), then set X equal to n.
On every vote, add 2^(n-X) to the score.
So, mentally, this makes better sense to me using base 10. As we add things up, our number gets longer and longer. We stop caring about the numbers in the lower digit places because the values we're incrementing scores by have a lot of digits. Which means that the lower digits kind of stop counting for very much. So if they don't count, why not just slide the decimal place over to a point that we care about and truncate the digits below the decimal place at some point. To do this, we need to slide the decimal place over on the amount we're adding each time as well.
I can't help but feel like there's something wrong with this.
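To make it easier to check, here is the scheme as a small Python sketch. It keeps scores in a dict instead of MySQL, takes the limit as a parameter so it can be tried with small numbers, and rescales by 2^(n - X), the current weight of a new vote. All names are made up:

```python
MAX_SCORE = 2**64 - 1  # unsigned BIGINT limit

class RebasedScores:
    def __init__(self, limit=MAX_SCORE):
        self.limit = limit
        self.offset = 0    # X: the day the scores were last rescaled
        self.scores = {}

    def vote(self, item, day):
        increment = 2 ** (day - self.offset)
        if self.scores.get(item, 0) + increment > self.limit:
            # Slide every score down so a vote today is worth 1 again.
            shift = day - self.offset
            self.scores = {k: v >> shift for k, v in self.scores.items()}
            self.offset = day
            increment = 1
        self.scores[item] = self.scores.get(item, 0) + increment

r = RebasedScores(limit=100)
r.vote("a", 0)   # adds 1
r.vote("a", 3)   # adds 8
r.vote("b", 7)   # 128 would overflow, so everything is rescaled first
```

The right shift truncates the low bits, which is the "stop caring about the lower digits" part: old votes eventually round down to zero.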
Here are two possible pseudo queries that you could use. I know that they don't really address scalability, but I think that they do provide methods so that you can
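-- Each vote counts 1 / (age in days + 1); the + 1 avoids dividing by zero for today's votes
SELECT article.title AS title, SUM(vp.point) AS points
FROM article
LEFT JOIN (SELECT 1 / (DATEDIFF(NOW(), vote.created_at) + 1) AS point, article_id
FROM vote) AS vp
ON vp.article_id = article.id
GROUP BY article.id, article.title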
or (not in a join, which will be a bit faster I think, but harder to hydrate),
SELECT SUM(1 / (DATEDIFF(NOW(), created_at) + 1)) AS points, article_id
FROM vote
WHERE article_id IN (...) GROUP BY article_id
The benefit of these queries is that they can be run at any time with the same data and they will always return the same answers. They don't destroy any data.
If you need to, you can also run the queries in a background job and they will still give the same result.
Each day I want to find the "most popular" post on the website and feature it on the home page.
For each post, I'm keeping track of how many times it has been "liked", "disliked", "favorited" and "viewed".
I would like to run a daily cron job where I do something like:
post = Post.order("popularity_score DESC").first
post.feature!
My question is, how should I compute the value of popularity_score?
Is there a formula that takes "statistical significance" into consideration? Meaning, a post which has 1 "like" vote and nothing else has a 100% approval rating, but that shouldn't mean much because only one person voted on it.
In general I have these loose ideas off the top of my head:
a post with 10 likes and no other votes is more popular than a post with 1 like vote.
a post with more "dislikes" than "likes" should have a lower score than a post with more "likes" than "dislikes"
a post with 20 views and no other votes is more popular than a post with 3 views.
I've punched in some arbitrary formulas to try to satisfy this goal, but they are exactly that, arbitrary, and I don't really know if there is a better way to go about this.
Suggestions?
Maybe you could just take the SO approach? It seems rather decent.
+ gives 10 points
- subtracts 2 points
a view adds a low number, like 0.01 points
a comment adds 2 points
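As a quick Python sketch of that weighting: I'm assuming "+" maps to your "likes" and "-" to your "dislikes", and the key names in the `WEIGHTS` table are made up. The numbers are the ones listed above and are easy to tune:

```python
# Points per event, taken from the list above.
WEIGHTS = {"likes": 10.0, "dislikes": -2.0, "views": 0.01, "comments": 2.0}

def popularity_score(counts):
    """Weighted sum of a post's counters; missing counters count as zero."""
    return sum(weight * counts.get(key, 0) for key, weight in WEIGHTS.items())

ten_likes = popularity_score({"likes": 10})
one_like = popularity_score({"likes": 1})
```

This satisfies the three loose rules from the question, although it does nothing special about statistical significance.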
One suggestion is to not reset your counter each day (that leaves the "most popular" open to a single vote).
Instead, weight the votes by their age -- newer votes count more than older votes. This will give you gradual and meaningful rerankings over time.
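One way to weight by age is exponential decay, sketched here in Python. The 7-day half-life is only an example value, and the function name is illustrative:

```python
def weighted_votes(vote_ages_days, half_life_days=7.0):
    """Sum of votes where each vote halves in value every half-life."""
    return sum(0.5 ** (age / half_life_days) for age in vote_ages_days)

fresh = weighted_votes([0, 0])       # two votes cast today
stale = weighted_votes([7, 7, 7])    # three votes cast a week ago
```

Here two fresh votes outrank three week-old votes, so a post has to keep attracting votes to stay featured.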