Help me to get better understanding of Digg's Cassandra data model - join

http://about.digg.com/blog/looking-future-cassandra
I've found this article about Digg's move to Cassandra, but I didn't understand the author's idea of a bucket for each (user, item) pair. A little more detail on the idea would help me understand the solution better.
Thanks

It sounds like they are using one row in a super column family per user with one super column per item; a subcolumn for an item super column represents a friend who dugg the item. At least in pycassa, this makes an insert as simple as:
column_family.insert(user, {item: {friend: ''}})
They could also have done this a couple of other ways, and I'm not sure which they chose.
One is to use a standard column family, use a (user,item) combination for the row key, and use one column per friend who dugg the item:
column_family.insert(user + item, {friend: ''})
Another is to use a standard column family, use just (user) for the row key, and use an (item, friend) combination for the column name:
column_family.insert(user, {item + friend: ''})
Doesn't sound like this is what they used, but it's an acceptable option as well.
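To make the three layouts easier to compare, here is a minimal sketch that models each one with plain Ruby hashes instead of a real Cassandra client. The names (super_cf, by_user_item, by_user, record_digg) and the ":" delimiter for composite keys are assumptions made for illustration; they do not come from the article or from pycassa.

```ruby
# Plain Ruby hashes standing in for Cassandra rows; every name here is hypothetical.
SEP = ":" # assumed delimiter so composite keys stay unambiguous

# Layout 1: one row per user, one super column per item, one subcolumn per friend.
super_cf = Hash.new { |rows, user| rows[user] = Hash.new { |cols, item| cols[item] = {} } }

# Layout 2: standard column family, row key is (user, item), one column per friend.
by_user_item = Hash.new { |rows, key| rows[key] = {} }

# Layout 3: standard column family, row key is user, column name is (item, friend).
by_user = Hash.new { |rows, user| rows[user] = {} }

# Write the same "friend dugg item" fact into all three layouts.
def record_digg(super_cf, by_user_item, by_user, user, item, friend)
  super_cf[user][item][friend] = ""
  by_user_item[[user, item].join(SEP)][friend] = ""
  by_user[user][[item, friend].join(SEP)] = ""
end

record_digg(super_cf, by_user_item, by_user, "alice", "story42", "bob")
record_digg(super_cf, by_user_item, by_user, "alice", "story42", "carol")
record_digg(super_cf, by_user_item, by_user, "alice", "story7", "bob")

# "Which of alice's friends dugg story42?" answered from each layout.
friends1 = super_cf["alice"]["story42"].keys
friends2 = by_user_item["alice" + SEP + "story42"].keys
prefix = "story42" + SEP
friends3 = by_user["alice"].keys
                           .select { |col| col.start_with?(prefix) }
                           .map { |col| col.delete_prefix(prefix) }
```

In the first two layouts the lookup reads one whole super column or one whole row. In the third it has to pick out the columns that share the item prefix, which in a store that keeps columns sorted by name would be a contiguous range.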

Related

rails - count records by value

I am trying to group records based on a value from a column, so I can use it to display the information elsewhere. At the moment I have this working if I specify the values in the column -
@city_count = People.select('city,count(*)').where("city in ('london', 'paris')").group(:city).count
This works fine if I want a list of people in London and Paris, but if the city list also has Sydney, New York, Rio etc. I don't want to keep adding the extra cities to the 'city in' clause. I would like it to count the people in every city.
Does anyone know the best way of doing this? Ideally it would include NULL values as well.
Just use:
@city_count = People.group(:city).count
to get counts for all cities. This will include an entry for nil.
If you only need the number of different cities, rather than a count per city, use the distinct and count methods together:
@city_counts = People.distinct.count(:city)
This returns a single integer. In both cases the counting is done in the db instead of in Ruby.
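The two calls answer different questions, which a small plain-Ruby sketch can show without a database. The cities array is made-up sample data, with nil standing in for SQL NULL. The comments describe what I would expect the corresponding ActiveRecord calls to return, not output captured from a real app.

```ruby
# Hypothetical in-memory stand-in for the city column; nil models SQL NULL.
cities = ["london", "paris", "london", nil, "sydney", "paris", "london"]

# Comparable to People.group(:city).count: one entry per city, nil included.
per_city = cities.tally

# Comparable to People.distinct.count(:city): how many different non-NULL
# cities exist. This is a single number, not a per-city breakdown.
distinct_cities = cities.compact.uniq.size
```

So the grouped form is the one that matches the question; the distinct form only tells you how many cities appear.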

rails combine parameters in controller

Hopefully this is a little clearer. I'm sorry, but I'm very new to coding in general. I have multiple tables that I have to query in succession in order to get to the correct array that I need. The logic for the query is as follows:
this gives me an array based upon the store :id
store = Stores.find(params[:id])
this gives me another array based upon the param .location found in the table store where that value equals the row ID in the table Departments
department = Departments.find(store.location)
I need to perform one last query, but in order to do so I need to figure out which day of the meeting is needed. To do this I have to use the parameter day_of_meeting found in the table Stores. I try to call it from the array above and create a new variable. In the table Departments, there are params such as day_1, day_2 and so on. I need to be able to call something like department.day_1 or department.day_2. Thus, I'm trying to create the variable by joining the string "department.day_" to the variable store.day_of_meeting, which would equal some integer, creating department.day_1...
which_day = ["department.day_", store.day_of_meeting].join("")
This query uses the value found in the variable department.day_1 to query the table Meeting and find the values in the corresponding row.
meeting = Meeting.find(which_day)
Does this make my problem any clearer to understand?
The find method only accepts IDs as parameters, like Meeting.find(1) or Meeting.find("1-xx"). The string "department.day_1" is just text, so it is never evaluated as a method call.
So what you need is Meeting.find(department.send("day_" + store.day_of_meeting.to_s))
Hope this helps!
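The dynamic dispatch in that answer can be tried outside Rails. Below is a minimal sketch using Struct objects as hypothetical stand-ins for the Department and Store records; the values 101, 202 and 2 are invented. It uses public_send, a standard Ruby method that behaves like send but refuses to call private methods, which is a slightly safer choice when the method name comes from data.

```ruby
# Hypothetical stand-ins for the Department and Store records.
Department = Struct.new(:day_1, :day_2)
Store = Struct.new(:day_of_meeting)

department = Department.new(101, 202)
store = Store.new(2)

# Build the method name as a string, then dispatch it on the object.
method_name = "day_#{store.day_of_meeting}"

# Guard against a day number that has no matching column/method.
meeting_id =
  if department.respond_to?(method_name)
    department.public_send(method_name)
  end

# In the Rails app, meeting_id would then be passed to Meeting.find.
```

If day_of_meeting ever holds a value with no matching day_N method, meeting_id is nil here rather than raising NoMethodError, so the caller can decide how to handle it.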

Check how many times a row repeats in a table

I have some records in a table, with the columns: id, link, title.
A few rows in the db have the same value in the link column, and I want to know, for each link, how many rows share that value.
I have an idea of how to do it, but I think there is a much easier solution.
o = Repo.select(:link).distinct
o.each do |l|
Repo.where(link: l.link).size
end
Thank you.
You can use:
Repo.select('count(*)').group(:link)
As per my comment above, the group method groups the query results by the given column. So:
Repo.group(:link).count
will return a Hash such as:
{link1 => count_for_link_1, link2 => count_for_link_2 ...}
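For comparison with the loop in the question, which issues one query per distinct link, here is a plain-Ruby sketch of the same counting done in a single pass. The links array is invented sample data standing in for the link column. The extra filtering step for links that repeat is my own addition, not part of the answer.

```ruby
# Hypothetical sample of the link column.
links = [
  "http://a.example", "http://b.example", "http://a.example",
  "http://c.example", "http://a.example", "http://b.example"
]

# One pass over the data, building the same shape of hash that
# Repo.group(:link).count is described as returning: link => row count.
counts = links.each_with_object(Hash.new(0)) { |link, acc| acc[link] += 1 }

# Keep only the links that appear more than once.
repeated = counts.select { |_link, n| n > 1 }
```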

How to sort a list of 1million records by the first letter of the title

I have a table with 1 million+ records that contain names. I would like to be able to sort the list by the first letter in the name.
.. ABCDEFGHIJKLMNOPQRSTUVWXYZ
What is the most efficient way to set up the db table to allow for searching by the first character in the table.name field?
The best idea right now is to add an extra field which stores the first character of the name (populated by an observer), index that field and then sort by that field. The problem is that the result is no longer necessarily alphabetical.
Any suggestions?
You said in a comment:
so lets ignore the first letter part. How can I all records that start with A? All A's no B...z ? Thanks – AnApprentice Feb 21 at 15:30
I assume you meant: How can I RETURN all records...
This is the answer:
select * from t
where substr(name, 1, 1) = 'A'
I agree with the questions above as to why you would want to do this: a regular index on the whole field is functionally equivalent. PostgreSQL has some rather powerful indexing capabilities for special cases (with some new ones in version 9), which you might want to read about here: http://www.postgresql.org/docs/9.1/interactive/sql-createindex.html
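The point about a regular index being functionally equivalent rests on sorted order: when the whole name is kept sorted, all names sharing a first letter sit next to each other, so no separate first-letter column is needed. The sketch below illustrates that idea with a sorted Ruby array and binary search. It is an analogy for how an ordered index can serve a prefix lookup, not a description of PostgreSQL internals, and the names and the starting_with helper are hypothetical.

```ruby
# Hypothetical sample data, kept sorted the way an ordered index would be.
names = %w[Carl Adam Zoe Anna Bob Abe].sort

# Return the contiguous slice of sorted names that begin with the prefix.
def starting_with(sorted, prefix)
  # First position whose name is >= the prefix.
  first = sorted.bsearch_index { |n| n >= prefix } || sorted.size
  # First position whose leading characters sort after the prefix.
  last = sorted.bsearch_index { |n| n[0, prefix.size] > prefix } || sorted.size
  sorted[first...last]
end

a_names = starting_with(names, "A")
z_names = starting_with(names, "Z")
d_names = starting_with(names, "D")
```

The slice for "A" also comes back already in alphabetical order, which is the property the extra first-character column loses.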

Rails select distinct

In one scenario I want to retrieve the records with different values, so I used distinct for that:
Book.where(["user_id = ?", @user_id]).select('distinct title_id')
This only retrieves records like this: [#<Book title_id: 30>, #<Book title_id: 31>]
But I want to fetch the id of the Book as well, along with title_id.
So please advise me how to do this.
Thanks
Use grouping:
Book.where(:user_id => @user.id).group('title_id')
The problem is that if you group, you can't get the different book ids, because each group collapses into a single row. You can use GROUP_CONCAT (available in MySQL and SQLite) to work around that:
Book...select('books.*, GROUP_CONCAT(id) as ids')
That way you'll have an ids attribute holding the book ids for every group.
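To show the shape of the result that grouping with GROUP_CONCAT aims for, here is a plain-Ruby sketch over in-memory records. The Book struct and the sample ids are hypothetical. The comma-joined strings mirror GROUP_CONCAT's default separator in MySQL, though the order of ids inside each string is not guaranteed by the database unless you ask for one.

```ruby
# Hypothetical stand-in for rows of the books table.
Book = Struct.new(:id, :title_id)

books = [Book.new(1, 30), Book.new(2, 31), Book.new(3, 30)]

# One entry per title_id, each carrying every book id in that group.
ids_by_title = books.group_by(&:title_id)
                    .transform_values { |group| group.map(&:id) }

# GROUP_CONCAT hands the ids back as one comma-separated string per group.
concatenated = ids_by_title.transform_values { |ids| ids.join(",") }

# Splitting that string recovers the individual ids.
recovered = concatenated.transform_values { |s| s.split(",").map(&:to_i) }
```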
