Cassandra CQL kind of multiget - ruby-on-rails

I want to query two column families at once. I'm using the cassandra-cql gem for Rails, and my column families are:
users
following
followers
user_count
message_count
messages
Now I want to get all messages from the people a user is following. Is there some kind of multiget in cassandra-cql, or is there another way to get this data by changing the data model?

I would call your current data model a traditional entity/relational design. This would make sense to use with an SQL database. When you have a relational database you rely on joins to build your views that span multiple entities.
Cassandra cannot perform joins. So instead of modeling your data around your entities and relations, you should model it around how you intend to query it. For your example of 'all messages from the people a user is following', you might have a column family where the row key is the user id and the columns are all the messages from the people that user follows (where the column name is timestamp+userid and the value is the message):
RowKey   Columns
-------------------------------------------------------------------
|        | TimeStamp0:UserA | TimeStamp1:UserB | TimeStamp2:UserA |
| UserID |------------------|------------------|------------------|
|        | Message          | Message          | Message          |
-------------------------------------------------------------------
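To make the wide-row layout concrete, here is a minimal in-memory sketch in plain Ruby. It does not use the cassandra-cql gem: a Hash of Hashes stands in for the column family, and the zero-padded `timestamp:user_id` column names are an assumption chosen so that string order matches time order, mimicking a row whose columns come back sorted by the column family's comparator. `read_timeline` is a hypothetical helper.

```ruby
# Hypothetical stand-in for the timeline column family:
# row key = reader's user id, column name = "timestamp:author_id", value = message.
timeline = Hash.new { |rows, key| rows[key] = {} }

timeline["user1"]["0000000002:userA"] = "third post"
timeline["user1"]["0000000000:userA"] = "first post"
timeline["user1"]["0000000001:userB"] = "second post"

# Reading a timeline touches a single row. Sorting by column name mimics
# Cassandra returning a row's columns in comparator order; zero-padded
# timestamps keep string order equal to chronological order.
def read_timeline(timeline, user_id)
  timeline[user_id].sort.map do |column, message|
    author = column.split(":", 2).last
    [author, message]
  end
end

read_timeline(timeline, "user1")
# => [["userA", "first post"], ["userB", "second post"], ["userA", "third post"]]
```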
You would probably also want a column family with all the messages a specific user has written (I'm assuming that the message is broadcast to all users instead of being addressed to one particular user):
RowKey   Columns
-------------------------------------------------
|        | TimeStamp0 | TimeStamp1 | TimeStamp2 |
| UserID |------------|------------|------------|
|        | Message    | Message    | Message    |
-------------------------------------------------
Now when you create a new message you will need to insert it in multiple places. But when you need to list all messages from people a user is following, you only need to fetch from one row (which is fast).
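The write path described here is usually called fan-out on write. A minimal plain-Ruby sketch, assuming in-memory Hashes in place of the two column families (this is not the cassandra-cql API, and `post_message` is a hypothetical helper), shows one insert into the author's own row plus one insert per follower:

```ruby
# Hypothetical in-memory column families (not the cassandra-cql API).
followers = { "userA" => ["user1", "user2"], "userB" => ["user1"] }
messages  = Hash.new { |rows, key| rows[key] = {} } # row = author, column = timestamp
timeline  = Hash.new { |rows, key| rows[key] = {} } # row = reader, column = "timestamp:author"

# Fan-out on write: one insert into the author's own row, plus one insert
# into the timeline row of every follower.
def post_message(author, timestamp, text, followers:, messages:, timeline:)
  ts = format("%010d", timestamp)
  messages[author][ts] = text
  followers.fetch(author, []).each do |reader|
    timeline[reader]["#{ts}:#{author}"] = text
  end
end

post_message("userA", 1, "hello", followers: followers, messages: messages, timeline: timeline)
post_message("userB", 2, "hi",    followers: followers, messages: messages, timeline: timeline)

timeline["user1"].keys # => ["0000000001:userA", "0000000002:userB"]
timeline["user2"].keys # => ["0000000001:userA"]
```

The cost moves to writes (one insert per follower) so that reads stay a single-row fetch.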
Obviously if you support updating or deleting messages you will need to do that everywhere that there is a copy of the message. You will also need to consider what should happen when a user follows or unfollows someone. There are multiple solutions to this problem and your solution will depend on how you want your application to behave.
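As one example of the follow/unfollow behaviour mentioned above, here is a plain-Ruby sketch of a single possible policy (an assumption, since the answer deliberately leaves this open): following copies the author's existing messages into the reader's timeline row, and unfollowing deletes every column in that row written by the author. It uses the row layouts from the diagrams above with in-memory Hashes; `follow` and `unfollow` are hypothetical helpers.

```ruby
# Hypothetical in-memory column families using the row layouts above.
followers = { "userA" => [] }
messages  = { "userA" => { "0000000001" => "hello", "0000000003" => "again" } }
timeline  = Hash.new { |rows, key| rows[key] = {} }

# Assumed policy, part 1: following backfills the author's existing
# messages into the reader's timeline row.
def follow(reader, author, followers:, messages:, timeline:)
  followers[author] ||= []
  followers[author] << reader unless followers[author].include?(reader)
  messages.fetch(author, {}).each do |ts, text|
    timeline[reader]["#{ts}:#{author}"] = text
  end
end

# Assumed policy, part 2: unfollowing deletes every column in the reader's
# row whose name ends with the author's id.
def unfollow(reader, author, followers:, timeline:)
  followers.fetch(author, []).delete(reader)
  timeline[reader].delete_if { |column, _| column.end_with?(":#{author}") }
end

follow("user1", "userA", followers: followers, messages: messages, timeline: timeline)
timeline["user1"].size # => 2
unfollow("user1", "userA", followers: followers, timeline: timeline)
timeline["user1"].size # => 0
```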

Related

Rails using Views instead of Tables

I need to create a Rails app that will show/utilize our current CRM system data. The thing is, I could just use Rails with the current DB as the backend, but the table and column names are the exact opposite of what Rails expects.
Table names:
+-------------+----------------+--------------+
| Resource | Expected table | Actual table |
+-------------+----------------+--------------+
| Invoice | invoices | Invoice |
| InvoiceItem | invoice_items | InvItem |
+-------------+----------------+--------------+
Column names:
+-------------+-----------------+---------------+
| Property | Expected column | Actual column |
+-------------+-----------------+---------------+
| ID | id | IniId |
| Invoice ID | invoice_id | IniInvId |
+-------------+-----------------+---------------+
I figured I could use Views to:
Normalize all table names
Normalize all column names
Make it possible to not use column aliases
Make it possible to use scaffolding
But there's a big but:
Doing it at the database level, Rails will probably not be able to build SQL properly
The app will probably be read-only, unless I skip Views, create a separate DB instead, and sync the two eventually
Those disadvantages are probably even worse than just using plain aliasing.
So I ask: can Rails somehow transparently know that the id column is id in the app but InvId in the database, and vice versa? I'm talking about complete abstraction; simple aliases just don't cut it with joins etc., as you still need to use the actual DB name.
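The complete abstraction asked for here amounts to a two-way name map applied everywhere SQL is built and every row is read. A minimal plain-Ruby sketch, assuming the table and column names from the tables above (only the `InvItem` columns are mapped; `LEGACY`, `to_legacy`, `from_legacy` and `legacy_select` are hypothetical, and this is not ActiveRecord's API), shows that the rename has to apply to conditions going in and to rows coming out:

```ruby
# Hypothetical map from Rails-style names to the legacy CRM names in the question.
LEGACY = {
  tables:  { "invoices" => "Invoice", "invoice_items" => "InvItem" },
  columns: { "invoice_items" => { "id" => "IniId", "invoice_id" => "IniInvId" } }
}.freeze

# Normalized condition hash -> legacy column names (unmapped names pass through).
def to_legacy(table, conditions)
  cols = LEGACY[:columns].fetch(table, {})
  conditions.to_h { |name, value| [cols.fetch(name.to_s, name.to_s), value] }
end

# Legacy row -> normalized attribute names, so callers never see IniId.
def from_legacy(table, row)
  inverse = LEGACY[:columns].fetch(table, {}).invert
  row.to_h { |name, value| [inverse.fetch(name, name), value] }
end

# Build a parameterized SELECT against the legacy table and column names.
def legacy_select(table, conditions)
  legacy = to_legacy(table, conditions)
  where = legacy.keys.map { |col| "#{col} = ?" }.join(" AND ")
  ["SELECT * FROM #{LEGACY[:tables].fetch(table)} WHERE #{where}", legacy.values]
end

legacy_select("invoice_items", invoice_id: 7)
# => ["SELECT * FROM InvItem WHERE IniInvId = ?", [7]]
from_legacy("invoice_items", { "IniId" => 1, "IniInvId" => 7 })
# => { "id" => 1, "invoice_id" => 7 }
```

ActiveRecord itself lets a model declare `self.table_name` and `self.primary_key`, which covers the table and id names; the sketch only shows the shape of the remaining column mapping, which a join would need for both tables.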

Ruby on Rails: Join Tables Concept

So I have been out of the coding game for a while and recently decided to pick up Rails. I have a question about the concept of join tables in Rails. Specifically:
1) why are these join tables needed in the database?
2) Why can't I just JOIN two tables on the fly like we do in SQL?
A join table cleanly links two independent tables. Join tables reduce data duplication while making it easy to find relationships in your data later on.
E.g. if you compare a table called users:
| id | name |
-----------------
| 1 | Sara |
| 2 | John |
| 3 | Anthony |
with a table called languages:
| id| title |
----------------
| 1 | English |
| 2 | French |
| 3 | German |
| 4 | Spanish |
You can see that both exist as separate concepts. Neither is subordinate to the other, the way a single user may have many orders (where each order row might store a foreign key holding the user_id of the user who placed it).
When a language can have many users, and a user can have many languages -- we need a way to join them.
We can do that by creating a join table, such as user_languages, to store every link between a user and the language(s) they speak. Each row records one user/language pair:
| id | user_id | language_id |
------------------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 4 |
| 4 | 2 | 1 |
| 5 | 3 | 1 |
With this data we can see that Sara (user_id: 1) is trilingual, while John (user_id: 2) and Anthony (user_id: 3) only speak English.
By creating a join table in-between both tables to store the linkage, we preserve our ability to make powerful queries in relation to data on other tables. For example, with a join table separating users and languages it would now be easy to find every User that speaks English or Spanish or both.
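The "English or Spanish or both" query can be traced by hand with the three tables above. A minimal plain-Ruby sketch, with arrays of hashes standing in for the tables and a hypothetical `speakers_of` helper, follows the same path a SQL JOIN through the join table takes. In a Rails app this path is usually declared with `has_many :user_languages` and `has_many :languages, through: :user_languages`.

```ruby
# The three tables from the answer as arrays of hashes.
users = [{ id: 1, name: "Sara" }, { id: 2, name: "John" }, { id: 3, name: "Anthony" }]
languages = [
  { id: 1, title: "English" }, { id: 2, title: "French" },
  { id: 3, title: "German" },  { id: 4, title: "Spanish" }
]
user_languages = [
  { user_id: 1, language_id: 1 }, { user_id: 1, language_id: 2 },
  { user_id: 1, language_id: 4 }, { user_id: 2, language_id: 1 },
  { user_id: 3, language_id: 1 }
]

# languages -> user_languages -> users: the same path a SQL JOIN
# through the join table follows.
def speakers_of(titles, users:, languages:, user_languages:)
  language_ids = languages.select { |l| titles.include?(l[:title]) }.map { |l| l[:id] }
  user_ids = user_languages.select { |ul| language_ids.include?(ul[:language_id]) }
                           .map { |ul| ul[:user_id] }
                           .uniq
  users.select { |u| user_ids.include?(u[:id]) }.map { |u| u[:name] }
end

tables = { users: users, languages: languages, user_languages: user_languages }
speakers_of(["Spanish"], **tables)            # => ["Sara"]
speakers_of(["English", "Spanish"], **tables) # => ["Sara", "John", "Anthony"]
```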
But where join tables get even more powerful is when you add new tables. If in the future we wanted to link languages to a new table called schools, we could simply create a new join table called school_languages. Even better, we can add this join table without needing to make any changes to the languages SQL table itself.
As Rails models, the data relationship between these tables would look like this:
User --> user_languages <-- Language --> school_languages <-- School
By default, every school and user would be linked to Language through the same language_id values.
This is powerful: because both join tables (user_languages and school_languages) reference the same unique language_id, it is easy to write queries about how either relates to the other. For example, we could find all schools that use the language(s) of a user, or all users who speak the language(s) of a school. As our tables grow, we can follow the joins to find relations across pretty much anything in our data.
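The schools-for-a-user query can be sketched the same way. In the plain-Ruby example below, the user_languages pairs come from the answer's table, while the schools and school_languages rows are invented purely for illustration; `schools_for_user` is a hypothetical helper. Both join tables meet on language_id, so the languages table itself is never touched.

```ruby
# [user_id, language_id] pairs from the user_languages table above.
user_languages = [[1, 1], [1, 2], [1, 4], [2, 1], [3, 1]]
# Hypothetical schools and school_languages rows, invented for illustration.
schools = { 10 => "North High", 11 => "South Academy", 12 => "East College" }
school_languages = [[10, 2], [11, 4], [12, 3]] # [school_id, language_id]

# Schools sharing at least one language with the user: both join tables
# are matched on language_id.
def schools_for_user(user_id, user_languages:, schools:, school_languages:)
  langs = user_languages.select { |uid, _| uid == user_id }.map(&:last)
  school_ids = school_languages.select { |_, lid| langs.include?(lid) }.map(&:first).uniq
  school_ids.map { |sid| schools.fetch(sid) }
end

data = { user_languages: user_languages, schools: schools, school_languages: school_languages }
schools_for_user(1, **data) # => ["North High", "South Academy"]
schools_for_user(2, **data) # => []
```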
tl;dr: Join tables preserve relations between separate concepts, making it easy to make powerful relational queries as you add new tables.

Join between streaming data and historical data in Spark

Let's say I have transaction data and visit data:
visit
| userId | Visit source | Timestamp |
| A | google ads | 1 |
| A | facebook ads | 2 |
transaction
| userId | total price | timestamp |
| A | 100 | 248384 |
| B | 200 | 43298739 |
I want to join the transaction data with the visit data to do sales attribution, in real time, whenever a transaction occurs (streaming).
Is it scalable to join each streaming record against a very large historical dataset using Spark's join function?
The visit data is the historical side, since a visit can happen at any time (e.g. a year before the transaction occurs).
I joined historical data with streaming data in my project. The catch is that you have to cache the historical data in an RDD and run the join operations as the streaming data arrives. In practice this is a slow process.
If you are also updating the historical data, then you have to keep two copies and use an accumulator to work with one copy at a time, so updating it won't affect the second copy.
For example,
transactionRDD is stream rdd which you are running at some interval.
visitRDD which is historical and you update it once a day.
So you have to maintain two databases for visitRDD. When you are updating one database, transactionRDD can work with the cached copy of visitRDD, and when visitRDD is updated, you switch to that copy. Honestly, this is very complicated.
I know this question is very old, but let me share my viewpoint. Today this can be done easily in Apache Beam, and the job can run on the same Spark cluster.
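The two-copy scheme and the attribution join can be illustrated without Spark. The plain-Ruby sketch below is not Spark code: a Hash index grouped by user stands in for the cached visitRDD, a Mutex-guarded swap stands in for the copy switching (the answer uses an accumulator for this), and last-touch attribution (credit the latest visit at or before the transaction) is an assumed model, since the question does not name one. `VisitIndex` and `attribute` are hypothetical names.

```ruby
Visit = Struct.new(:user_id, :source, :timestamp)
Txn   = Struct.new(:user_id, :total_price, :timestamp)

# Holds the active visit index; refresh builds a second copy off to the side
# and swaps it in, so readers never see a half-built index.
class VisitIndex
  def initialize(visits)
    @active = build(visits)
    @lock = Mutex.new
  end

  def current
    @lock.synchronize { @active }
  end

  def refresh(visits)
    fresh = build(visits)                 # new copy, built while the old one serves reads
    @lock.synchronize { @active = fresh } # switch copies in one step
  end

  private

  # Group visits by user so each transaction is a hash lookup, not a scan.
  def build(visits)
    visits.group_by(&:user_id).freeze
  end
end

# Assumed last-touch attribution: latest visit at or before the transaction.
def attribute(txn, visits_by_user)
  candidates = visits_by_user.fetch(txn.user_id, []).select { |v| v.timestamp <= txn.timestamp }
  candidates.max_by(&:timestamp)&.source
end

visits = [Visit.new("A", "google ads", 1), Visit.new("A", "facebook ads", 2)]
index = VisitIndex.new(visits)

snapshot = index.current
attribute(Txn.new("A", 100, 248384), snapshot)   # => "facebook ads"
attribute(Txn.new("B", 200, 43298739), snapshot) # => nil (no visits for B yet)

# A hypothetical later visit for B arrives in the daily refresh.
index.refresh(visits + [Visit.new("B", "newsletter", 43298000)])
attribute(Txn.new("B", 200, 43298739), index.current) # => "newsletter"
```

Readers holding the old snapshot keep a consistent view until they next call `current`, which is the property the two-copy approach is after.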

rails user-defined custom columns

I am using Ruby on Rails 4 and MySQL. I have three types: Biology, Chemistry, and Physics. Each type has unique fields, so I created three tables in the database, each with its own column names. However, those column names may not be known beforehand; users will need to create the column names associated with each type. I don't want to use a serialized hash, because that can become messy. I notice some other systems let users create user-defined columns named like column1, column2, etc.
How can I achieve these custom columns in Ruby on Rails and MySQL and still maintain all the ActiveRecord capabilities, e.g. validation, etc?
Well, you don't have many options; your best solution is a NoSQL database (at least for those classes).
Let's see how you can work around this using SQL. You can have a base Course model with a has_many :attributes association, where an attribute is just a combination of a key and a value.
# attributes table
| id | key | value |
| 10 | "column1" | "value" |
| 11 | "column1" | "value" |
| 12 | "column1" | "value" |
It's going to be difficult to determine data types and to run queries covering multiple attributes at the same time.
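Both difficulties come from the key/value table holding every value as a string with no type, so casting, validation and multi-attribute filtering move into code. A minimal plain-Ruby sketch, assuming a user-defined type per field (`FIELD_TYPES`) and a course_id column added to the attributes rows (both assumptions, as are the helper names), shows what that code has to do. It is not ActiveRecord; in Rails the same check would typically live in a custom `validate` method on the attribute model.

```ruby
# Hypothetical field definitions a user might create for one course type.
FIELD_TYPES = { "temperature" => :float, "reagent" => :string, "trials" => :integer }.freeze

CASTS = {
  integer: ->(raw) { Integer(raw) },
  float:   ->(raw) { Float(raw) },
  string:  ->(raw) { raw.to_s }
}.freeze

# Rows of the attributes table, extended with a course_id: [course_id, key, value].
ATTRIBUTE_ROWS = [
  [1, "temperature", "21.5"], [1, "trials", "3"],
  [2, "temperature", "30.0"], [2, "trials", "5"]
].freeze

# Validation: cast the stored string to the declared type; nil means valid.
def attribute_error(key, raw)
  type = FIELD_TYPES[key]
  return "unknown field #{key}" unless type
  CASTS.fetch(type).call(raw)
  nil
rescue ArgumentError, TypeError
  "#{key} must be of type #{type}"
end

# Querying several attributes at once means regrouping rows per course first.
def courses_where(rows)
  per_course = rows.group_by(&:first).transform_values do |course_rows|
    course_rows.to_h { |_, key, raw| [key, CASTS.fetch(FIELD_TYPES.fetch(key)).call(raw)] }
  end
  per_course.select { |_, attrs| yield(attrs) }.keys
end

attribute_error("trials", "three") # => "trials must be of type integer"
attribute_error("trials", "4")     # => nil
courses_where(ATTRIBUTE_ROWS) { |a| a["temperature"] > 25 && a["trials"] >= 5 } # => [2]
```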

How to make selection content an attribute for a Rails model

I am having a hard time even formulating the question I want to be answered, so here's my situation:
I'm trying to make a simple stock market plotter tool using an existing database I populate elsewhere. My app already has a nice and dynamic plotter that works with any database, but it expects data in a certain way. So say my model (database) looks like this:
Stock:
| ticker | open | close | date   |
|--------|------|-------|--------|
| aapl   | 100  | 101   | 1/1/11 |
| aapl   | 101  | 102   | 1/2/11 |
| goog   | 500  | 450   | 1/1/11 |
| goog   | 450  | 451   | 1/2/11 |
...
My plotter routines work off of class attributes (I think that's the terminology), which correspond to columns in the database.
I can select all the data corresponding to 'aapl' and easily plot the open and close versus date, since my model has those attributes.
@stock = Stock.select_by_ticker('aapl')
>> @stock.open  #=> 100 ...
>> @stock.close #=> 101 ...
>> @stock.date  #=> 1/1/11 ...
so the attributes would be
{open, close, date}
But if I want to compare, say, the closing prices of different stocks, I need attributes pertaining to each stock. So basically I want to end up with a model with ticker names as attributes, each corresponding to that ticker's chunk of the database. Using easy-to-build scopes, I want something like:
@stock = Stock.select_close_by_ticker('aapl','goog')
attributes are:
{aapl, goog, date}
where aapl and goog contain the closing price data for just that ticker. I can run multiple database queries if I need to, for now I just want to be able to sort my data into this form. Also, it must be completely dynamic, so I can't hardcode 'aapl', 'goog' and all the millions of other tickers into my model.
Would something like:
stocks = ['aapl', 'goog']
Stock.where('ticker IN (?)', stocks)
work for your scenario?
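The query returns one row per ticker and date, while the plotter wants one attribute per ticker. A minimal plain-Ruby sketch of that pivot, assuming plain hashes stand in for the returned Stock records (with ActiveRecord you would read `ticker`, `close` and `date` off each record instead) and using a hypothetical `closes_by_ticker` helper, builds a Struct at runtime so no ticker is hardcoded:

```ruby
# Rows shaped like the Stock table in the question.
rows = [
  { ticker: "aapl", open: 100, close: 101, date: "1/1/11" },
  { ticker: "aapl", open: 101, close: 102, date: "1/2/11" },
  { ticker: "goog", open: 500, close: 450, date: "1/1/11" },
  { ticker: "goog", open: 450, close: 451, date: "1/2/11" }
]

# One closing-price series per ticker, aligned on a shared list of dates
# (kept in the order the rows arrive; nil where a ticker has no row for a date).
def closes_by_ticker(rows, tickers)
  dates = rows.map { |r| r[:date] }.uniq
  close_at = rows.to_h { |r| [[r[:ticker], r[:date]], r[:close]] }
  series = tickers.to_h { |t| [t, dates.map { |d| close_at[[t, d]] }] }
  { "date" => dates }.merge(series)
end

data = closes_by_ticker(rows, %w[aapl goog])
# Build a Struct at runtime so each ticker becomes an attribute.
stock = Struct.new(*data.keys.map(&:to_sym)).new(*data.values)
stock.aapl # => [101, 102]
stock.goog # => [450, 451]
stock.date # => ["1/1/11", "1/2/11"]
```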
