I'm using Cassandra 0.8.2
I am attempting to use the "valueless column" technique to set up my Cassandra schema. The idea behind a valueless column is that the column name itself carries the relevant information and the value of the name/value pair is left empty. This is used to make queries faster, an example of denormalization. I want the column name to be the URL of the backlink, and the row key to be a UUID of the backlink's target URL. Is this even a good idea/schema design?
I'm using a very basic example to get the point of my question across. Here's what I have set up using the Cassandra-Cli:
create column family ArticleBackLinks
with comparator = UTF8Type
and key_validation_class = UTF8Type
and default_validation_class = UTF8Type
and column_metadata =
[
{column_name: www.arstechnica.com, validation_class: UTF8Type},
{column_name: www.apple.com, validation_class:UTF8Type},
{column_name: www.cnn.com, validation_class: UTF8Type},
{column_name: www.stackoverflow.com, validation_class: UTF8Type},
{column_name: www.reddit.com, validation_class: UTF8Type}
];
I get the error:
Command not found: `create column family ArticleBackLink...
I think my error is due to the period I am using in the column_name. In short, I would like to know if some of you have come across better ways to use the "valueless column" idea in Cassandra. Are there good/better examples of the valueless column technique? Is my idea even the right way to use it?
Thanks in advance guys.
I think Cassandra does not like the bare dot in column_name; quoting the names, as below, works:
[default#stackoverflow] create column family ArticleBackLinks with
... comparator = UTF8Type and
... default_validation_class = UTF8Type and
... column_metadata =
... [
... {column_name: 'www.arstechnica.com', validation_class: UTF8Type},
... {column_name: 'www.apple.com', validation_class:UTF8Type},
... {column_name: 'www.cnn.com', validation_class: UTF8Type},
... {column_name: 'www.stackoverflow.com', validation_class: UTF8Type},
... {column_name: 'www.reddit.com', validation_class: UTF8Type}
... ];
881b31f0-bc64-11e0-0000-242d50cf1ff7
Waiting for schema agreement...
... schemas agree across the cluster
By the way, since you are using Cassandra 0.8.2, you should leverage CQL. Statements like this will be helpful in the future:
UPDATE <COLUMN FAMILY> [USING <CONSISTENCY>
[AND TIMESTAMP <timestamp>] [AND TTL <timeToLive>]]
SET name1 = value1, name2 = value2 WHERE <KEY> = keyname;
Refer to this.
Update: added more thoughts, as a comment asked for them.
It's a good idea to keep grouped information in one place; it adds to the efficiency Cassandra provides.
For example, in your case the category could be the row key and the URLs the column names. Then your front end can display a categorized view quickly, because you know that arstechnica and stackoverflow fall under the "technology" group, which is a row key. It adds a tiny bit of extra work when you insert data.
I use Cassandra 0.6.x, so sadly I can't say much about the secondary indexes that Cassandra 0.7.0+ supports. But supposedly you can achieve what I explained above by adding a column, say category, to the main CF that ArticleBackLinks indexes, and just querying with CQL's select ... where ....
Secondary indexes might remove the need for a separate "index CF" altogether. You may want to look into these:
Secondary Index in Cassandra 0.7
Cassandra Wiki FAQ
Q: Is there a difference between creating a secondary index vs creating an "index" CF manually such as "users_by_country"?
A: Yes. First, when creating your own index, a node may index data held by another node. Second, updates to the index and data are not atomic.
Here is my simplified graph schema:
package:
property:
- name: str (indexed)
- version: str (indexed)
I want to query versions using multiple sets of property criteria within a single query. I can use within for a list of values of a single property, but how do I do it for multiple properties?
Suppose I have 10 package nodes (p1,v1, p2,v2, p3,v3, ..., p10,v10) and I want to select only the nodes that match (p1 with v1, p8 with v8, p10 with v10).
Is there a way to do this with a single Gremlin query?
Something equivalent to SELECT * FROM package WHERE (name, version) IN ((p1,v1),(p8,v8),(p10,v10)).
It's always best to provide some sample data when asking questions about Gremlin. I assume that this is an approximation of what your model is:
g.addV('package').property('name','gremlin').property('version', '1.0').
addV('package').property('name','gremlin').property('version', '2.0').
addV('package').property('name','gremlin').property('version', '3.0').
addV('package').property('name','blueprints').property('version', '1.0').
addV('package').property('name','blueprints').property('version', '2.0').
addV('package').property('name','rexster').property('version', '1.0').
addV('package').property('name','rexster').property('version', '2.0').iterate()
I don't think there is a way to compare pairs of inputs and expect an index hit. You therefore have to do what you normally do in graphs: choose the index that best narrows your results, then filter in memory. In your case that index would presumably be the "name" property, so grab those vertices first and then filter the pairs:
gremlin> g.V().has('package','name', within('gremlin','blueprints')).
......1> elementMap().
......2> where(select('name','version').is(within([name:'gremlin',version:'2.0'], [name:'blueprints',version:'2.0'])))
==>[id:3,label:package,name:gremlin,version:2.0]
==>[id:12,label:package,name:blueprints,version:2.0]
This might not be the most "creative" way of doing it, but I think the easiest approach is to use or():
g.V().hasLabel('package').or(
    has('name', 'p1').has('version', 'v1'),
    has('name', 'p8').has('version', 'v8'),
    has('name', 'p10').has('version', 'v10')
)
example: https://gremlify.com/6s
I have a domain object:
class Business {
String name
List subUnits
static hasMany = [
subUnits : SubUnit,
]
}
I want to get name and subUnits using HQL, but I get an error
Exception: org.springframework.orm.hibernate4.HibernateQueryException: not an entity
when using:
List businesses = Business.executeQuery("select business.name, business.subUnits from Business as business")
Is there a way I can get subUnits returned in the query result as a List using HQL? When I use a left join, the query result is a flattened List that duplicates name. The actual query is more complicated than this simplified version, so I can't just use Business.list().
I thought I should add this as an answer, since I have been doing this sort of thing for a while and have a lot of knowledge to share:
As per the suggestion from Yariash above: that approach walks forward through a domain object versus grabbing the info as a flat list (map). There is an expense involved in holding an entire object and asking it to loop through and return many relations, versus having it all in one contained list.
@anonymous1, that sounds correct with a left join; you could take a look at adding group by name to the end of your query. Alternatively, once you have all the results you can use businesses.groupBy { it.name } (a handy Groovy feature); take a look at the output of groupBy to understand what it has done to the results.
But if you are attempting to grab the entire object and map it back, the cost is still very hefty, probably as costly as Yariash's suggestion and possibly worse.
List businesses = Business.executeQuery("select new map(business.name as name, su.field1 as field1, su.field2 as field2) from Business b left join b.subUnits su ")
The above is really what you should be doing: left joining, then grabbing each of the inner elements of the hasMany as part of the overall map returned within the list.
Then, when you have your results:
def groupedBusinesses = businesses.groupBy { it.name }
where name is the property of the main class that has the hasMany relation. If you look at groupedBusinesses you will see that each name has its own list:
groupedBusinesses: [name1: [[field1, field2, field3], [field1, field2, field3]]]
You can now call groupedBusinesses.get(name) to get the entire list for that hasMany relation.
Enable SQL logging for the above HQL query, then compare it to
List businesses = Business.executeQuery("select new map(b.name as name, su as subUnits) from Business b left join b.subUnits su ")
What you will see is that the second query generates huge SQL to get the data, since it attempts to map an entire entity per row.
I have tested this, and the HQL tends to produce an entire page (if not multiple pages) of generated SQL, compared to a few lines for the first example.
I'd like to have something like this in a bean:
ownWheelList
ownSpareList
Those two lists are of the same model type: both hold wheel beans.
So while the first works, the second does not. This is clear, because RedBeanPHP expects beans of type spare, which do not exist.
Is it possible to do something like aliasing on lists, as there is on objects?
I can't see that working: when you use an ownList, RedBeanPHP looks for every record that has a mainbean_id column in the corresponding table (wheel or spare).
So if you want to distinguish wheels from spares, you could add a type column (say wheel = 1 and spare = 2) and load each list with a with condition: $mainbean->with(' AND type = ? ', [1])->ownWheelList for the wheel list and $mainbean->with(' AND type = ? ', [2])->ownWheelList for the spare list.
For various reasons, I'm creating an app that takes a SQL query string as a URL parameter and passes it off to Postgres (similar to the CartoDB SQL API and CFPB's Qu). Rails then renders a JSON response of the results that come back from Postgres.
Snippet from my controller:
@table = ActiveRecord::Base.connection.execute(@query)
render json: @table
This works fine. But when I use Postgres JSON functions (row_to_json, json_agg), it renders the nested JSON property as a string. For example, the following query:
query?q=SELECT max(municipal) AS series, json_agg(row_to_json((SELECT r FROM (SELECT sch_yr,grade_1 AS value ) r WHERE grade_1 IS NOT NULL))ORDER BY sch_yr ASC) AS values FROM ed_enroll WHERE grade_1 IS NOT NULL GROUP BY municipal
returns:
{
series: "Abington",
values: "[{"sch_yr":"2005-06","value":180}, {"sch_yr":"2005-06","value":180}, {"sch_yr":"2006-07","value":198}, {"sch_yr":"2006-07","value":198}, {"sch_yr":"2007-08","value":158}, {"sch_yr":"2007-08","value":158}, {"sch_yr":"2008-09","value":167}, {"sch_yr":"2008-09","value":167}, {"sch_yr":"2009-10","value":170}, {"sch_yr":"2009-10","value":170}, {"sch_yr":"2010-11","value":153}, {"sch_yr":"2010-11","value":153}, {"sch_yr":"2011-12","value":167}, {"sch_yr":"2011-12","value":167}]"
},
{
series: "Acton",
values: "[{"sch_yr":"2005-06","value":353}, {"sch_yr":"2005-06","value":353}, {"sch_yr":"2006-07","value":316}, {"sch_yr":"2006-07","value":316}, {"sch_yr":"2007-08","value":323}, {"sch_yr":"2007-08","value":323}, {"sch_yr":"2008-09","value":327}, {"sch_yr":"2008-09","value":327}, {"sch_yr":"2009-10","value":336}, {"sch_yr":"2009-10","value":336}, {"sch_yr":"2010-11","value":351}, {"sch_yr":"2010-11","value":351}, {"sch_yr":"2011-12","value":341}, {"sch_yr":"2011-12","value":341}]"
}
So it only partially renders the JSON, running into problems when the query builds nested JSON arrays with the Postgres functions.
I'm not sure where to start with this problem. Any ideas? I am fairly sure this is a problem on the Rails side.
ActiveRecord::Base.connection.execute doesn't know how to unpack database types into Ruby types, so everything – numbers, booleans, JSON, everything – you get back from it will be a string. If you want sensible JSON to come out of your controller, you'll have to convert the data in @table to Ruby types by hand and then convert the Ruby-ified data to JSON in the usual fashion.
Your @table will actually be a PG::Result instance, and those have methods such as ftype (get a column type) and fmod (get a type modifier for a column) that can help you figure out what sort of data is in each column of a PG::Result. You'd probably ask the PG::Result for the type and modifier of each column and then hand those to the format_type PostgreSQL function to get some intelligible type strings; then you'd map those type strings to conversion methods and use that mapping to unpack the strings you get back. If you dig around inside the ActiveRecord source, you'll see AR doing similar things. The AR source code is not for the faint-hearted, though; sorry, but this is par for the course when you step outside the narrow confines of how AR thinks you should interact with databases.
You might want to rethink your "sling hunks of SQL around" approach. You'll probably have an easier time of things (and be able to whitelist what the queries do) if you can figure out a way to build the SQL yourself.
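If hand-converting is acceptable, one sketch of the idea (the row data here is made up to stand in for what execute returns; every value arrives as a string) is to JSON.parse the affected columns before rendering:

```ruby
require 'json'

# Simulated rows as the driver would hand them over: all values are strings,
# including the nested JSON built by json_agg/row_to_json in the query.
rows = [
  { "series" => "Abington", "values" => '[{"sch_yr":"2005-06","value":180}]' },
  { "series" => "Acton",    "values" => '[{"sch_yr":"2005-06","value":353}]' }
]

# Parse the JSON-typed column by hand so it becomes real Ruby data.
converted = rows.map do |row|
  row.merge("values" => JSON.parse(row["values"]))
end

# converted[0]["values"] is now an Array of Hashes, so a controller's
# `render json: converted` would emit properly nested JSON.
```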
The PG::Result class (the type of @table) uses type maps to cast result values to Ruby objects. For your example, you could use PG::TypeMapByColumn as follows:
@table = ActiveRecord::Base.connection.execute(@query)
@table.type_map = PG::TypeMapByColumn.new [nil, PG::TextDecoder::JSON.new]
render json: @table
A more generic approach is to use the PG::TypeMapByOid type map class. This requires you to provide the OID for each PostgreSQL attribute type; a list of these can be found in pg_type.dat.
tm = PG::TypeMapByOid.new
tm.add_coder PG::TextDecoder::Integer.new oid: 23
tm.add_coder PG::TextDecoder::Boolean.new oid: 16
tm.add_coder PG::TextDecoder::JSON.new oid: 114
@table.type_map = tm
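To illustrate what an OID-keyed type map does under the hood, here is a plain-Ruby sketch; the DECODERS table and decode_row helper are illustrative stand-ins, not part of the pg gem:

```ruby
require 'json'

# Each result column carries a PostgreSQL type OID; a decoder is looked
# up per OID to turn the wire-format string into a Ruby value.
DECODERS = {
  23  => ->(s) { Integer(s) },    # int4
  16  => ->(s) { s == 't' },      # bool
  114 => ->(s) { JSON.parse(s) }  # json
}

# Apply the matching decoder to each value; unknown OIDs stay strings.
def decode_row(row, oids)
  row.each_with_index.map do |value, i|
    decoder = DECODERS[oids[i]]
    decoder ? decoder.call(value) : value
  end
end

# A simulated raw row (all strings) and its column OIDs.
row  = ["42", "t", '[{"sch_yr":"2005-06","value":180}]']
oids = [23, 16, 114]
decoded = decode_row(row, oids)
# => [42, true, [{"sch_yr"=>"2005-06", "value"=>180}]]
```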
I'm importing data from old spreadsheets into a database using rails.
I have one column that contains a list in each row; the lists are sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split up this string into an array, delimiting either with a comma or with the word "and". I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
[
[0] "first"
[1] ", "
[2] "second"
]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can instruct the regexp to not capture the group using ?:.
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
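For a side-by-side check, a quick sketch comparing the capturing and non-capturing forms on a sample string:

```ruby
s = "first, second and third"

# Capturing group: the matched separators come back in the array too.
with_seps = s.split(/\s?(,|and)\s?/)
# => ["first", ",", "second", "and", "third"]

# Non-capturing (?:...) group: only the fields are returned.
fields = s.split(/\s?(?:,|and)\s?/)
# => ["first", "second", "third"]
```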
As a side note, regarding
into a database using rails.
Please note this has nothing to do with Rails; String#split is plain Ruby.