This question already has answers here:
Nested hash in redis
(2 answers)
Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
The users table has username and city columns and about 10,000 records. After the SQL query, I want to get a data structure like this:
{
"cities_arry": {
"NY": ["john", "Mich", "Roh", "Dh", "Vir"],
"KL": ["Big", "ching", "qull"],
...
}
}
Update: it is not possible to store a nested hash in Redis, so I have to use MongoDB or some other tool.
Suppose you have a users table with city and username columns, and you want to build a data structure like the one above. How would you approach it, and how can you get that result quickly?
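One possible approach, sketched under assumptions: fetch only the two columns and group them in Ruby in a single pass. The ROWS sample and the cities_hash helper are hypothetical; in Rails the rows might come from something like User.pluck(:username, :city). If the database is PostgreSQL, GROUP BY city with array_agg(username) could do the same grouping on the database side instead.

```ruby
require "json"

# Hypothetical sample rows of [username, city], standing in for the result of
# a query such as User.pluck(:username, :city) (assumed, not from the question).
ROWS = [
  ["john", "NY"], ["Big", "KL"], ["Mich", "NY"], ["ching", "KL"], ["Roh", "NY"]
].freeze

# Group usernames by city in one pass; group_by keeps cities in first-seen order.
def cities_hash(rows)
  by_city = rows.group_by { |_username, city| city }
                .transform_values { |pairs| pairs.map(&:first) }
  { "cities_arry" => by_city }
end

puts JSON.generate(cities_hash(ROWS))
# prints {"cities_arry":{"NY":["john","Mich","Roh"],"KL":["Big","ching"]}}
```

Because the result is built from a flat query, there is nothing nested to store; it can be serialized to JSON as shown if it needs to be cached somewhere.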
Redis doesn't support nested data structures, and specifically it doesn't support a Hash inside a Hash :) You basically have a choice between two options: either serialize the internal Hash and store it in a Hash field, or use another Hash key and just keep a reference to it in a field of the outer Hash.
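The two options can be sketched without a Redis server by letting a plain Ruby Hash stand in for the keyspace; this is an assumption for illustration, and the key names ("cities", "cities_index", "city:NY") are hypothetical.

```ruby
require "json"

# A plain Ruby Hash stands in for the Redis keyspace (assumption: no Redis
# client). Values stay flat: a top-level key holds either a flat Hash of
# string fields or a flat list, never a Hash inside a Hash.

# Option 1: serialize each inner value to a JSON string and store it
# in a field of one outer Hash (the "cities" key name is hypothetical).
def store_serialized(store, cities)
  store["cities"] = cities.transform_values { |names| JSON.generate(names) }
end

# Option 2: keep each inner value under its own key and store only that
# key name, as a reference, in a field of the outer Hash.
def store_referenced(store, cities)
  index = {}
  cities.each do |city, names|
    inner_key = "city:#{city}"  # hypothetical key naming scheme
    store[inner_key] = names     # a list here; in the answer's case another Hash
    index[city] = inner_key
  end
  store["cities_index"] = index
end

store = {}
store_serialized(store, { "NY" => ["john", "Mich"], "KL" => ["Big"] })
p JSON.parse(store["cities"]["NY"]) # => ["john", "Mich"]
```

Against a real server, option 1 maps onto HSET/HGET on a single key, while option 2 uses separate keys (for example RPUSH/LRANGE for list values) plus an HSET index. The trade-off: option 1 needs a parse on every read, option 2 needs an extra lookup.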
I am quite new to Cube.js. I have been trying to integrate Cube.js analytics with my Ruby on Rails app, and the database is PostgreSQL. In the database, there is a column called answers_json with the jsonb data type, which contains a nested hash. An example of the data in that column is:
**answers_json:**
"question_weights_calc"=>
{"314"=>{"329"=>1.5, "331"=>4.5, "332"=>1.5, "333"=>3.0},
"315"=>{"334"=>1.5, "335"=>4.5, "336"=>1.5, "337"=>3.0},
"316"=>{"338"=>1.5, "339"=>3.0}}
There are many more keys in the same column with the same hash structure as shown above; I posted only this part because it is the only part I will be dealing with. I need help accessing the values in this nested hash. In the example above, the keys "314", "315" and "316" are Category IDs, and the keys under Category ID "314" ("329", "331", "332", "333") are Question IDs. Each category has multiple questions, and the category and question IDs vary from record to record: another record will have different Category IDs and different Question IDs under them. I need to access the value stored under each Question ID. For example, to access the value "1.5" I need to do this in my schema file:
**sql: `(answers_json -> 'question_weights_calc' -> '314' ->> '329')`**
But the issue is that those IDs are dynamic across records in the database. Instead of "314" and "329", they can be other numbers. Here is another record's JSON for clarification:
**answers_json:**
"question_weights_calc"=>{"129"=>{"273"=>6.0, "275"=>15.0, "277"=>8.0}, "252"=>{"279"=>3.0, "281"=>8.0, "283"=>3.0}}}
How can I discover and access those dynamic IDs and their values? I also need to perform mathematical operations on the values. Thanks!
As a general rule, it's difficult to run SQL-based reporting on highly dynamic JSON data. Postgres does have some useful functions for dealing with JSON, and you might be able to use json_each or json_object_keys (jsonb_each and jsonb_object_keys for a jsonb column like this one) plus a few joins to get there, but it's quite likely that the performance and maintainability of such a query would be difficult, to say the least 😅 Cube.js ultimately executes SQL queries, so if you do go that route, the query should be easily transferable to a Cube.js schema.
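The core idea behind jsonb_each is to iterate over whatever keys are present rather than naming them. A minimal Ruby sketch of that idea, using the first record from the question and a hypothetical weight_totals helper that sums the weights per category:

```ruby
require "json"

# Sample answers_json for one record, copied from the question as JSON.
RECORD = JSON.parse(<<~DOC)
  {"question_weights_calc": {
    "314": {"329": 1.5, "331": 4.5, "332": 1.5, "333": 3.0},
    "315": {"334": 1.5, "335": 4.5, "336": 1.5, "337": 3.0},
    "316": {"338": 1.5, "339": 3.0}}}
DOC

# Sum question weights per category without knowing any IDs up front:
# iterate over whatever category and question keys the record contains.
def weight_totals(answers)
  categories = answers.fetch("question_weights_calc", {})
  categories.each_with_object({}) do |(category_id, questions), totals|
    totals[category_id] = questions.values.sum
  end
end

p weight_totals(RECORD)
```

In SQL the equivalent would presumably be two lateral jsonb_each calls, one over question_weights_calc and one over each category's value, with the weights cast to numeric before aggregating.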
Another approach would be to create a separate data processing pipeline that collects all the JSON data and flattens it into a single table. The pipeline should then store this data back in your database of choice, from where you could then use Cube.js to query it.
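The flattening step of such a pipeline might look like the following sketch: one row per (record, category, question), which fits a plain table that Cube.js can query with ordinary SQL. WeightRow, flatten_weights and the record ID 42 are hypothetical; the JSON is the second record from the question.

```ruby
require "json"

# One flat row per (record, category, question); record_id is a
# hypothetical identifier for the source row.
WeightRow = Struct.new(:record_id, :category_id, :question_id, :weight)

# Flatten the nested question_weights_calc hash into rows that could be
# inserted into a plain table and then queried with ordinary SQL.
def flatten_weights(record_id, answers)
  answers.fetch("question_weights_calc", {}).flat_map do |category_id, questions|
    questions.map do |question_id, weight|
      WeightRow.new(record_id, category_id.to_i, question_id.to_i, weight.to_f)
    end
  end
end

answers = JSON.parse('{"question_weights_calc": {"129": {"273": 6.0, "275": 15.0, "277": 8.0}, "252": {"279": 3.0, "281": 8.0, "283": 3.0}}}')
flatten_weights(42, answers).each { |row| puts row.to_a.join(",") }
```

Once the data is in this shape, sums or averages per category become simple GROUP BY queries.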
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I recently discovered that you can add columns to a table in Rails by doing something like:
rails generate migration add_lastname_to_users lastname:string
Previously I joined tables, which was very complicated for me, but adding a column seems to accomplish the same task.
Why should I choose one over another?
A table represents a single "entity". In this case, it probably makes most sense to store the users.lastname in the same table.
On the other hand, suppose a user can have many phone numbers. In this case, it is better to normalise the database and store this data in a separate table.
In other words, you want to avoid doing something like this:
users.phone_number_1
users.phone_number_2
users.phone_number_3
The key issues with this approach are:
You'll have lots of redundant columns for most users, which wastes storage space and decreases performance.
You need to keep adding new columns if a user goes over the limit (e.g. 3 numbers, because there are 3 columns).
Querying the data gets much harder. For example, suppose you want to query "all users who have phone number X" -- you now need to search across multiple columns!
Instead, create a separate phones table, which is joined to the users table by a user_id column.
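To illustrate why the query gets simpler, here is a sketch with hypothetical in-memory "tables" (UserRow, PhoneRow and the sample numbers are invented). In Rails this would typically be User has_many :phones and a query like User.joins(:phones).where(phones: { number: x }).

```ruby
# Hypothetical in-memory tables: users, plus a separate phones table that
# points back to users through user_id (one row per phone number).
UserRow = Struct.new(:id, :lastname)
PhoneRow = Struct.new(:user_id, :number)

USERS = [UserRow.new(1, "Smith"), UserRow.new(2, "Jones")].freeze
PHONES = [
  PhoneRow.new(1, "555-0100"),
  PhoneRow.new(1, "555-0101"),
  PhoneRow.new(2, "555-0199")
].freeze

# "All users who have phone number X" checks one column of one table,
# however many numbers each user has; no phone_number_1..3 columns needed.
def users_with_phone(number)
  user_ids = PHONES.select { |phone| phone.number == number }.map(&:user_id)
  USERS.select { |user| user_ids.include?(user.id) }
end

p users_with_phone("555-0101").map(&:lastname) # => ["Smith"]
```

A user with a fourth number simply gets a fourth phones row; no schema change is needed.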
I guess it depends on your application, but generally it's best to "normalize" your database. That is, define individual tables for specific objects. A user table might have the fields user_id, first_name, & last_name. You can then join on the user_id field. This tends to make your lookups faster and your tables smaller.
https://en.wikipedia.org/wiki/Database_normalization
This is not so much related to rails and ActiveRecord but more a question of database design.
Without going into too much detail: in a relational database management system, you join tables (or columns, in your case?) when some piece of information you need is already available in a different place, usually another table. Hence the "relational" in the name: tables are related to each other and share information. You do not want to repeat data.
This is different in NoSQL databases, where joins might not even exist. MongoDB has the notion of embedded documents, and at some point you will have to repeat data.
In your case, it is easy (from the point of view of the DB) to just take the already available information (first_name + last_name) and return that. Adding another column with the same information seems 'wasteful'.
You should be able to define a helper method in your model that returns the full name (see "create a name helper").
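A minimal sketch of such a helper, assuming first_name and last_name attributes as mentioned above. Person is a hypothetical plain-Ruby class so the example runs without Rails; in an ActiveRecord model, the same full_name method would read the table's columns directly.

```ruby
# Plain-Ruby stand-in for a model (hypothetical): in an ActiveRecord model,
# first_name and last_name would come from the table's columns.
class Person
  attr_reader :first_name, :last_name

  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
  end

  # Derived value: computed from existing columns instead of stored in a new one.
  def full_name
    [first_name, last_name].compact.join(" ")
  end
end

puts Person.new("Ada", "Lovelace").full_name # prints "Ada Lovelace"
```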
This question already has answers here:
SQL join: selecting the last records in a one-to-many relationship
(13 answers)
Closed 6 years ago.
I'm having trouble with the query below:
I have a surveys table
The surveys table has a foreign key to a contact (via contact_id)
There are multiple surveys per contact
The survey has a column called scheduled_at with time data
I want a query on surveys that returns one survey per contact: the one with the most recent scheduled_at among all surveys with the same contact_id.
While the linked question seems to have a good SQL answer, I'm wondering if there is a cleaner ActiveRecord solution.
Try running the following:
@contact.surveys.order(scheduled_at: :desc).first(5)
This will return the 5 most recent surveys of that contact.
Assumption:
@contact is an instance of your Contact model
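For the original "one latest survey per contact" question: if the database is PostgreSQL (an assumption; the question doesn't say), one ActiveRecord form people use is Survey.select("DISTINCT ON (contact_id) surveys.*").order(:contact_id, scheduled_at: :desc), which keeps the work in SQL. The sketch below shows the same "latest row per group" logic in plain Ruby instead, which is only reasonable when the surveys are already loaded in memory. Survey here is a hypothetical Struct, not the ActiveRecord model.

```ruby
require "time"

# Hypothetical stand-in for the surveys table (not the ActiveRecord model).
Survey = Struct.new(:id, :contact_id, :scheduled_at)

# Group surveys by contact and keep the one with the latest scheduled_at.
def latest_survey_per_contact(surveys)
  surveys.group_by(&:contact_id).map { |_contact_id, group| group.max_by(&:scheduled_at) }
end

surveys = [
  Survey.new(1, 10, Time.parse("2024-01-05 09:00")),
  Survey.new(2, 10, Time.parse("2024-03-01 09:00")),
  Survey.new(3, 20, Time.parse("2024-02-10 09:00"))
]
p latest_survey_per_contact(surveys).map(&:id) # => [2, 3]
```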
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to learn about Data Warehouses right now, but I really don't get it. My question isn't really specific, but I just want somebody to explain to me the idea of data warehouses.
I'm trying right now to create a data warehouse out of SO's database.
There are 8 tables in this database; they are pretty self-explanatory for those who use SO:
Badges
Comments
PostHistory
PostLinks
Posts
Tags
Users
Votes
1. Dimensions
What would be the dimensions? That's the big part I don't understand. I see 7 dimensions: Badges, Comments, Posts, PostLinks, Tags, Users and Votes. But then I don't see the point of using a data warehouse, since the dimensions are exactly the tables.
-Would date be a dimension? Date of what? Of each comment AND post?
-Would it be relevant to separate Post into a Question dimension and an Answer dimension?
-What other dimensions can I put?
2. Fact Table
How can I put all the foreign keys (userId, postId, commentId...) in one table? For example, let's say a user posts a question but there's no comment. Would I have a row in my fact table with their userId, the postId and NULL in the commentId column?
As for measures, I'm thinking of the following for the fact table: number of questions, number of users, number of tags...
Can someone tell me if I'm going in the right direction?
The first question you have to answer when building a data warehouse is "What question(s) do I want to answer?"
Using Stack Overflow as an example, one question could be, "How many posts are there about X each month over the last 2 years?"
To answer this question, we need to create Posts and Post Tags fact tables. Since these tables are select and insert only, we can denormalize the fact data so it's easier to select.
So, we might have a Post fact table that looks something like this.
Post
----
Post Number
Post Text
Post Timestamp
Post Tag 1
Post Tag 2
Post Tag 3
Post Tag 4
Post Tag 5
It would be somewhat straightforward to select based on the timestamp and group by month. We only care about the first 5 post tags, and we don't care if some of them are null.
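A small sketch of that query against denormalized rows shaped like the Post table above. PostFact, monthly_tag_counts and the sample posts are hypothetical; in a real warehouse this would be a SQL GROUP BY on the truncated timestamp.

```ruby
# Hypothetical denormalized Post fact rows: a timestamp plus up to five tags
# (nil where a post has fewer tags), mirroring the Post table sketched above.
PostFact = Struct.new(:number, :timestamp, :tags)

# Count posts carrying a given tag in any tag slot, grouped by "YYYY-MM".
def monthly_tag_counts(posts, tag)
  posts.select { |post| post.tags.compact.include?(tag) }
       .group_by { |post| post.timestamp.strftime("%Y-%m") }
       .transform_values(&:count)
end

posts = [
  PostFact.new(1, Time.utc(2024, 1, 3), ["ruby", "rails", nil, nil, nil]),
  PostFact.new(2, Time.utc(2024, 1, 20), ["sql", nil, nil, nil, nil]),
  PostFact.new(3, Time.utc(2024, 2, 7), ["ruby", nil, nil, nil, nil])
]
p monthly_tag_counts(posts, "ruby")
```

The null tag slots are simply skipped, which matches the point above that empty tag columns don't matter.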
You don't have to denormalize the data, but queries generally run faster if you do.
You do the same thing for the other data available. What question(s) do you want to answer?
Stack Overflow is probably not the best data model to consider if you're trying to wrap your head around the concept of DW. It doesn't contain many "traditional" facts. The only examples which jump to my mind immediately are the Up/Down votes and the user rankings.
You would find many of what we call "factless facts". These essentially treat the intersection of multiple dimensions as a fact, with just an implied "count" as the sole fact. For example, the Post fact would simply be a count at the intersection of User, Date, SO Database, etc.
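A sketch of a factless fact, assuming hypothetical PostEvent rows that record only which user posted on which date; the only "measure" is the number of rows at each (user, date) intersection.

```ruby
require "date"

# Hypothetical factless fact rows: each row only records that a user posted
# on a given date. There is no numeric measure column at all.
PostEvent = Struct.new(:user_id, :date)

# The only "fact" is the implied count of rows at each (user, date) intersection.
# Enumerable#tally needs Ruby 2.7 or later.
def post_counts(events)
  events.map { |event| [event.user_id, event.date] }.tally
end

events = [
  PostEvent.new(1, Date.new(2024, 1, 1)),
  PostEvent.new(1, Date.new(2024, 1, 1)),
  PostEvent.new(2, Date.new(2024, 1, 1))
]
p post_counts(events)
```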
You would probably consider a concept such as the Junk dimension to support referencing the Tags in a Fact table. This would see you assign a pseudo key to each unique combination of Tags, and then this key is what you would store in the Fact table.
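A minimal sketch of assigning those pseudo keys. TagJunkDimension is a hypothetical name, and treating tag order as irrelevant (sorting each combination) is an assumption of this sketch; drop the sort if order should matter.

```ruby
# Assign a surrogate key to each distinct combination of tags.
# Assumption: tag order doesn't matter, so each combination is sorted and
# ["rails", "ruby"] and ["ruby", "rails"] share one key.
class TagJunkDimension
  def initialize
    @keys = {}
  end

  # Returns the existing key for this combination, or assigns the next one.
  def key_for(tags)
    combo = tags.compact.uniq.sort.freeze
    @keys[combo] ||= @keys.size + 1
  end

  # Dimension rows as [key, tag combination] pairs.
  def rows
    @keys.map { |combo, key| [key, combo] }
  end
end

dim = TagJunkDimension.new
p dim.key_for(["ruby", "rails", nil]) # => 1
p dim.key_for(["rails", "ruby"])      # => 1
p dim.key_for(["sql"])                # => 2
```

The fact table then stores only the integer key, and the tag combination lives once in the dimension.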
If you want to learn about DW, use your personal finances; that's how I learned. You can learn about snapshot facts with your account balances and about transactional facts with your purchases, and you can create Vendor and Account dimensions, among others.
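As a sketch of how the two fact types relate, the example below derives a month-end balance snapshot from transactional rows. Txn, month_end_balances and the amounts are hypothetical.

```ruby
require "date"

# Hypothetical transactional fact rows: one row per purchase or deposit,
# with amounts in integer cents to avoid floating-point rounding.
Txn = Struct.new(:account, :date, :amount_cents)

# Derive a periodic snapshot fact (month-end balance per account) from the
# transactional facts. Months with no transactions get no row in this sketch;
# a real snapshot table would usually carry the previous balance forward.
def month_end_balances(txns)
  running = Hash.new(0)
  snapshots = {}
  txns.sort_by(&:date).each do |txn|
    running[txn.account] += txn.amount_cents
    snapshots[[txn.account, txn.date.strftime("%Y-%m")]] = running[txn.account]
  end
  snapshots
end

txns = [
  Txn.new("checking", Date.new(2024, 1, 5), 10_000),
  Txn.new("checking", Date.new(2024, 1, 20), -2_500),
  Txn.new("checking", Date.new(2024, 2, 3), -1_000)
]
p month_end_balances(txns)
```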
This question already has answers here:
How to convert records including 'include' associations to JSON
(2 answers)
Closed 9 years ago.
I'm trying to store the results of a model and an association in a variable like so:
data = Sale.all.includes(:book)
However, this only stores ActiveRecord objects for Sale in data, with no data on the associated Book (each Sale has_one Book).
So, I tried:
Sale.all.includes(:book).each do |sale|
  puts sale.book
  puts sale
end
This gives me exactly what I'm looking for. However, I need all of this stored inside the data variable.
What is the best way to get the sale data along with the book data (for each sale) into some sort of hash or JSON or XML?
I hope I'm explaining this well enough.
You can use as_json, which returns a Hash (an Array of Hashes when called on a collection) and accepts some additional serialization options.
data = Sale.all.includes(:book).as_json(include: [:book])
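To show roughly what the include option produces, here is a plain-Ruby sketch that builds the same nested shape by hand. Sale and Book are hypothetical Structs and price_cents is an invented column; with real ActiveRecord models, every attribute of each record would appear as a string key, with the associated book nested under "book".

```ruby
require "json"

# Hypothetical plain-Ruby stand-ins for the Sale and Book models.
Book = Struct.new(:id, :title)
Sale = Struct.new(:id, :price_cents, :book)

# Build the nested hash shape that as_json(include: [:book]) is meant to
# produce: the sale's attributes plus the book's attributes under "book".
def sale_as_json(sale)
  {
    "id" => sale.id,
    "price_cents" => sale.price_cents,
    "book" => { "id" => sale.book.id, "title" => sale.book.title }
  }
end

sales = [Sale.new(1, 1299, Book.new(7, "Dune"))]
data = sales.map { |sale| sale_as_json(sale) }
puts JSON.generate(data)
# prints [{"id":1,"price_cents":1299,"book":{"id":7,"title":"Dune"}}]
```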