Use RAKE Task to move CSV to POSTGRES relational DB - ruby-on-rails

I have the following models.
class Lesson < ActiveRecord::Base
has_many :books
has_many :vocabularies
has_many :sentenses
end
class Book < ActiveRecord::Base
belongs_to :lesson
end
class Vocabulary < ActiveRecord::Base
belongs_to :lesson
end
class Sentense < ActiveRecord::Base
belongs_to :lesson
end
With the following table schema:
Table Lesson [lesson_number, lesson_name]
Table Books [lesson_id, page_start, page_finish]
Table Vocabulary [lesson_id, word, meaning]
Table Sentences [lesson_id, sentence, sentence meaning]
And I have a CSV file with 15,000 lessons. The CSV file uses the same structure of 2 books, 10 vocabulary, 2 sentences consistently throughout all lesson plans.
My thoughts are to start like this.
namespace :import_csv do
desc "IMPORT Lessons"
task :lessons => :environment do
CSV.foreach('CcyTbl.csv') do |row|
lesson_name_id = row[0]
lesson_name = row[1]
Lesson.create(lesson_name_id: lesson_name_id, lesson_name: lesson_name)
end
end
desc "IMPORT BOOKS"
task :books => :environment do
CSV.foreach('CcyTbl.csv') do |row|
lesson_name_id = row[0]
book_name = row[3]
book_start_pg = row[7]
book_end_pg = row[8]
Lesson.create(lesson_name_id: lesson_name_id, book_name: book_name, book_end_pg: book_end_pg)
end
end
end
That much seems straightforward, but I am struggling with:
How to handle null values.
Some lessons have two books
(Think column 3 has book1 and book2 is column 9 and sometimes book2 is null)
Lessons might have 5-10 vocabulary words
(Column 10 vocabulary 1, column 11 vocabulary 1 meaning, column 12 vocabulary, etc)
What is the best way to import the data in this CSV into their respective tables? Does it make more sense to create multiple rake tasks to do each portion or can it be done in one go?
UPDATE
Here is a link to a sample of the header row and first row of data.
(It is a bit too long to share a picture.)

You may want to create a data object that makes it easier to work with the CSV data. Decoupling the CSV format from the model creation will make the whole process simpler:
csv = CSV.new(body, headers: true, header_converters: :symbol, converters: :all)
data = csv.to_a.map {|row| row.to_hash }
See CSV reference.
Now we have an easy way to access each field.
data.each do |d|
lesson = Lesson.create!(d[:join], ...)
book = Book.create!(lesson: lesson, page_start:..)
end
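One way to deal with the null values and the repeated book/vocabulary columns from the question is to turn each CSV row into plain hashes first, dropping any empty group, and only then create records. The following is a minimal sketch under assumptions: the real header row is not shown here, so the column names (`book1_page_start`, `vocab1_word`, and so on) and the helper names `extract_group` and `parse_lessons` are hypothetical.

```ruby
require 'csv'

# Hypothetical header layout: the real CSV header row is not shown in the
# question, so these column names are assumptions for illustration only.
SAMPLE_CSV = <<~CSV
  lesson_number,lesson_name,book1_page_start,book1_page_finish,book2_page_start,book2_page_finish,vocab1_word,vocab1_meaning,vocab2_word,vocab2_meaning
  1,Greetings,10,12,,,hola,hello,adios,goodbye
CSV

# Collects numbered column groups (book1_*, book2_*, ...) into an array of
# hashes, skipping any group whose values are all blank or missing.
def extract_group(row, prefix, fields, max)
  (1..max).filter_map do |i|
    attrs = fields.to_h { |f| [f, row[:"#{prefix}#{i}_#{f}"]] }
    attrs unless attrs.values.all? { |v| v.nil? || v.to_s.strip.empty? }
  end
end

# Converts every CSV row into one hash per target table.
def parse_lessons(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol, converters: :all).map do |row|
    {
      lesson: { lesson_number: row[:lesson_number], lesson_name: row[:lesson_name] },
      books: extract_group(row, 'book', %i[page_start page_finish], 2),
      vocabularies: extract_group(row, 'vocab', %i[word meaning], 10)
    }
  end
end

parsed = parse_lessons(SAMPLE_CSV)
```

Inside a single rake task, each entry could then feed something like `lesson = Lesson.create!(entry[:lesson])` followed by `lesson.books.create!(attrs)` for every book hash, ideally wrapped in a transaction so that a bad row does not leave a half-imported lesson behind.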
BTW & FWIW,
class Sentense < ActiveRecord::Base
should be
class Sentence < ActiveRecord::Base

Related

how to prepare csv data using associated model with missing values in rails

I'm trying to prepare a CSV file which will be generated from multiple tables.
So I have my models set up like below:
# Student
# attributes
:id, :name
class Student < ActiveRecord::Base
has_many :answers
end
# Question model
# attributes
:id, :question
class Question < ActiveRecord::Base
has_many :answers
end
# Answer model
# attributes
:id, :question_id, :student_id, :answer
class Answer < ActiveRecord::Base
belongs_to :question
belongs_to :student
end
Now I want to prepare a CSV file. The header of the CSV file will be the actual questions from the Question model, sorted by id, and the CSV data will then be prepared accordingly from the Answer model. A student might not answer all the questions, so I need to set n/a if a student doesn't answer a particular question. Of course, the questions are sorted by id and the answers are also sorted by question_id. The output will be:
what_is_you_name, how_old_are_you, what_is_your_hobby
monsur, 18, playing football
ranjon, n/a, gardening
n/a, n/a, running
I query the Answer model. Because some students skipped a few questions, there are no answer objects for those questions, which is why the answers end up in the wrong position.
what_is_you_name, how_old_are_you, what_is_your_hobby
monsur, 18, playing football
ranjon, gardening
alex,running
So I need to set n/a if a particular student skipped a question.
I can't figure out how to solve this problem.
You could override the attribute reader. For example, if the Answer model has an age field, you could do something like:
def age
  # super calls the reader generated by ActiveRecord; calling self.age here would recurse forever
  super || "n/a"
end
This returns age if it is present, or "n/a" if it is nil.
If this does not solve your problem, please show how you create the CSV.
OK, after your explanation I came up with something like this:
Create a default array: default = ['n/a', 'n/a', 'n/a']
Add :question_id to .pluck, like this:
answers_arr = student.answers.order(question_id: :asc).pluck(:question_id, :answer), which gives you something like [[1, "18"], [2, "playing football"]]
Replace the values in the default array with the values from pluck:
answers_arr.each { |e| default[e[0] - 1] = e[1] }, so default should look like ["18", "playing football", "n/a"]
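The positional trick above only lines up when the question ids are exactly 1, 2, 3 and so on. A sketch that looks each answer up by its question id avoids that assumption. It uses plain arrays and hashes in place of the ActiveRecord queries, and the helper name `answers_row` and the sample data are made up for illustration.

```ruby
require 'csv'

# Plain-Ruby sketch: in a Rails app the inputs would come from something like
# Question.order(:id) and student.answers.pluck(:question_id, :answer).
def answers_row(question_ids, answer_pairs, placeholder = 'n/a')
  by_question = answer_pairs.to_h # { question_id => answer }
  question_ids.map { |id| by_question.fetch(id, placeholder) }
end

# Sample data (hypothetical); note that the ids need not be contiguous.
questions = { 1 => 'what_is_your_name', 2 => 'how_old_are_you', 5 => 'what_is_your_hobby' }
answers_by_student = {
  'monsur' => [[1, 'monsur'], [2, 18], [5, 'playing football']],
  'ranjon' => [[1, 'ranjon'], [5, 'gardening']]
}

# Header row first, then one row per student with n/a filling the gaps.
csv_text = CSV.generate do |csv|
  csv << questions.values
  answers_by_student.each_value { |pairs| csv << answers_row(questions.keys, pairs) }
end
```

Because every row is built from the same ordered list of question ids, a skipped question always leaves an n/a in the right column instead of shifting later answers to the left.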
EDIT
Here is the code from the comment below that solved the problem:
all_response = []
qs_ids = [1, 2, 3]
answers.each do |answer|
  response = [] # start a fresh row for each answer
  qs_ids.each do |q_id|
    if answer.question_id == q_id
      response << answer.answer
    else
      response << 'n/a'
    end
    # store the finished row once the last question id has been checked
    all_response << response if qs_ids.last == q_id
  end
end

Count items in arrays across hundreds of thousands of records

I have a Rails app with a Postgres database that has an Artists table with a jsonb genres column.
There are hundreds of thousands of rows.
The genres column in each row holds an array like ["rock", "indie", "seen live", "alternative", "indie rock"] with different genres.
What I want to do is output a count of each genre in JSON across all the rows.
Something like: {"rock": 532, "power metal": 328, "indie": 862}
Is there a way to efficiently do that?
Update...here's what I've got at the moment...
genres = Artist.all.pluck(:genres).flatten.delete_if &:empty?
output = Hash[genres.group_by {|x| x}.map {|k,v| [k,v.count]}]
final = output.sort_by{|k,v| v}.to_h
Output is a hash instead of JSON, which is fine.
But already feels pretty slow, so I'm wondering if there's a better way to do it.
This is an extremely trivial task if you just use a decent relational db design:
class Artist < ApplicationRecord
has_many :artist_genres
has_many :genres, through: :artist_genres
end
class Genre < ApplicationRecord
has_many :artist_genres
has_many :artists, through: :artist_genres
end
class ArtistGenre < ApplicationRecord
belongs_to :artist
belongs_to :genre
end
You could then get the result by:
class Genre < ApplicationRecord
has_many :artist_genres
has_many :artists, through: :artist_genres
# This will instantiate a record for each row, just like your average scope,
# and return an ActiveRecord::Relation object.
def self.with_artist_counts
self.joins(:artist_genres)
.select('genres.name, COUNT(artist_genres.id) AS artists_count')
.group(:id)
end
# This pulls the columns as raw sql results and creates a hash with the genre
# name as keys
def self.pluck_artist_counts
self.connection.select_all(with_artist_counts.to_sql).inject({}) do |hash, row|
hash.merge(row["name"] => row["artists_count"])
end
end
end
On re-reading your question you state that the column IS a JSONb type. So the answer below will not work since you need to first get the array from the jsonb column. This should work better:
output = Artist.connection.select_all('select genre, count (genre) from (select id, JSONB_ARRAY_ELEMENTS(genres) as genre from artists) as foo group by genre;')
=> #<ActiveRecord::Result:0x00007f8ef20df448 #columns=["genre", "count"], #rows=[["\"rock\"", 5], ["\"blues\"", 5], ["\"seen live\"", 3], ["\"alternative\"", 3]], #hash_rows=nil, #column_types={"genre"=>#<ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb:0x00007f8eeef5d500 #precision=nil, #scale=nil, #limit=nil>, "count"=>#<ActiveModel::Type::Integer:0x00007f8eeeb4c060 #precision=nil, #scale=nil, #limit=nil, #range=-2147483648...2147483648>}>
output.rows.to_h
=> {"\"rock\""=>5, "\"blues\""=>5, "\"seen live\""=>3, "\"alternative\""=>3}
As mentioned in the comments, if you can change the DB to normalize it, go for it. An anonymous array in a jsonb column is just going to be painful going forward. If you need to use this answer I would at least think about adding a view to the DB so that you can get the genre count as a table that has a corresponding model in rails (that you can just create in your model definitions).
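The keys in the result above are still JSON-encoded strings (note the escaped quotes). As a small sketch, assuming rows shaped like `output.rows`, they can be decoded in Ruby before the counts are serialized; the sample rows below are copied from the output shown earlier. Postgres also provides `jsonb_array_elements_text`, which returns plain text values and would avoid the quoting in the first place.

```ruby
require 'json'

# Rows shaped like output.rows above: each genre is still a JSON-encoded string.
rows = [["\"rock\"", 5], ["\"blues\"", 5], ["\"seen live\"", 3]]

# Decode each key so "\"rock\"" becomes "rock".
counts = rows.to_h { |genre, count| [JSON.parse(genre), count] }

# Order by count (highest first), breaking ties alphabetically, then serialize.
sorted = counts.sort_by { |genre, count| [-count, genre] }.to_h
json = JSON.generate(sorted)
```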
Original answer when I thought your column was a regular array column type in Postgres.
Here is a SQL way to do it in Rails:
genre_count = Artist.connection.select_all('SELECT
UNNEST(genres),
COUNT (UNNEST(genres))
FROM
artists
GROUP BY
UNNEST(genres);')
You can then use the method of your choice to turn a much smaller dataset into JSON.
I am not familiar enough with UNNEST to know why I can't alias it like any other column to make it prettier. But it works.
http://sqlfiddle.com/#!15/30597/21/0

Ruby do-loop to break down an array/hash

I made a self referring database using the has_many :through relationship:
**Product**
name
**Ingredient**
quantity
product_id
product_component_id
I can have an egg, carton of 12 eggs, and a flat of 16 cartons.
I am trying to write a loop that starts with a product and breaks down all the components of each product and those to the most basic state. The goal is to return an array of all the base products that go into any given product so the carton would return 12 eggs and the flat would return 192 Eggs.
I gave it a shot and this is how far I got:
def product_breakdown
  results = []
  ingredients.each do |ingredient|
    if ingredient.product_component_id.nil?
      results << ingredient
    else
      # Keep digging deeper?
    end
  end
  results
end
I am missing a whole concept when it comes to using the loop. If anyone has any advice on the name of the concepts that this requires, I would be very appreciative.
Edit: to be more clear, I copied the relationships of the database.
class Product < ActiveRecord::Base
has_many :ingredients
has_many :product_components, :through => :ingredients
end
class Ingredient < ActiveRecord::Base
belongs_to :product
belongs_to :product_component, class_name: "Product", :foreign_key => "product_component_id"
end
I suggest using each_with_object to build the array. That way you don't even need the results variable, just return each_with_object's return value.
How do you differentiate between a unit, carton, and flat?
If I understand correctly, each ingredient has a component which can be nil, Carton, or Flat? And one carton always contains 12 units, and one flat 16 cartons? And a source, which is the type of ingredient (egg, milk, etc?)
In that case, I'd define a couple helper methods on Ingredient, an as_unit class method and a unit_quantity instance method:
def unit_quantity
case product_component_id
when nil
quantity
when CARTON_COMPONENT_ID
12 * quantity
when FLAT_COMPONENT_ID
192 * quantity
end
end
def self.as_unit ingredients
source_ids = ingredients.map(&:product_source_id).uniq
raise "Can't join different types together" if source_ids.count != 1
source_id = source_ids.first
quantity = ingredients.reduce(0) { |total, ingredient| total + ingredient.unit_quantity }
Ingredient.new quantity: quantity, product_component_id: nil, product_source_id: source_id
end
That way, you can rewrite product_breakdown to be:
def product_breakdown(ingredients)
  # group is the list of ingredients that share one product_source_id
  ingredients.group_by(&:product_source_id).map do |_, group|
    Ingredient.as_unit(group)
  end
end
This should result in:
$ ingredients
#=> [<Ingredient: 3 Cartons of Egg>, <Ingredient: 2 Flats of Milk>, <Ingredient: 17 Units of Egg>]
$ product_breakdown ingredients
#=> [<Ingredient: 53 Units of Egg>, <Ingredient: 384 Units of Milk>]
Is this at all what you were looking for? I'm not sure I fully understood your question...
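The concept the question seems to be reaching for ("keep digging deeper") is usually called recursion: a method that calls itself on each component until it reaches a product with no components. Here is a minimal sketch, assuming plain structs in place of the ActiveRecord models and treating any product without ingredients as a base product. The `breakdown` method and the struct definitions are hypothetical, not part of the original models.

```ruby
# Plain-Ruby stand-ins for the Product and Ingredient models (hypothetical
# structs, not ActiveRecord): an ingredient is "quantity of this component".
Product    = Struct.new(:name, :ingredients)
Ingredient = Struct.new(:quantity, :component)

# Returns { base_product_name => total_quantity } for one unit of product.
# A product with no ingredients is treated as a base product.
def breakdown(product, multiplier = 1, totals = Hash.new(0))
  if product.ingredients.empty?
    totals[product.name] += multiplier
  else
    product.ingredients.each do |ingredient|
      # Recurse into the component, carrying the accumulated quantity along.
      breakdown(ingredient.component, multiplier * ingredient.quantity, totals)
    end
  end
  totals
end

egg    = Product.new('Egg', [])
carton = Product.new('Carton', [Ingredient.new(12, egg)])
flat   = Product.new('Flat', [Ingredient.new(16, carton)])

carton_totals = breakdown(carton)
flat_totals   = breakdown(flat)
```

The sketch does not guard against cycles (a product that ends up containing itself), so real data would need a check for that before recursing.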

Rails to excel with associations

I am new to Rails, and I have spent my day looking for a way to export my data to an Excel file.
I have tried to_xls, simple_xlxs and other gems, and I have also tried rendering an XML template, but with no success.
So i have associations between 2 models:
Call model:
class Call < ActiveRecord::Base
belongs_to :number
has_many :results
end
and my result model:
class Result < ActiveRecord::Base
belongs_to :call
end
So I need to generate Excel tables with my own headers, named as I want.
I also want this Excel file to include columns from my associated model.
What can I do?
Thanks
# install in a gem file or in the terminal to test
# gem install spreadsheet
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet :name => 'test'
money_format = Spreadsheet::Format.new :number_format => "#,##0.00 [$€-407]"
date_format = Spreadsheet::Format.new :number_format => 'MM.DD.YYYY'
# set default column formats
sheet1.column(1).default_format = money_format
sheet1.column(2).default_format = date_format
# depending on your data you obviously have to create a loop that fits the format of your data
sheet1.row(0).push "just text", 5.98, DateTime.now
book.write 'sample.xls'
The code example above writes the data to the columns you create. You can lay out your data CSV-style: if you are returning objects, you can just get the values for each object, join the array with ',' as the separator, and loop through.
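To get custom headers plus columns from the associated model, one approach is to build the table as plain arrays first and hand the rows to the spreadsheet afterwards. This is a sketch under assumptions: the question does not list the columns of Call or Result, so the structs and attribute names (`phone`, `status`, `duration`) and the header labels are hypothetical.

```ruby
# Hypothetical stand-ins for the Call and Result models; the real column
# names are not given in the question.
Result = Struct.new(:status, :duration)
Call   = Struct.new(:phone, :results)

# Custom header labels, independent of the model attribute names.
HEADERS = ['Phone number', 'Result status', 'Duration (s)'].freeze

# One spreadsheet row per result, repeating the parent call's columns.
def export_rows(calls)
  calls.flat_map do |call|
    call.results.map { |result| [call.phone, result.status, result.duration] }
  end
end

calls = [Call.new('555-0100', [Result.new('answered', 42), Result.new('busy', 0)])]
table = [HEADERS] + export_rows(calls)
```

With the spreadsheet gem from the example above, each row could then be written with something like `table.each_with_index { |row, i| sheet1.row(i).push(*row) }` before calling `book.write`.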

ActiveRecord group by on a join

Really been struggling trying to get a group by to work when I have to join to another table. I can get the group by to work when I don't join, but when I want to group by a column on the other table I start having problems.
Tables:
Book
id, category_id
Category
id, name
ActiveRecord schema:
class Category < ActiveRecord::Base
has_many :books
end
class Book < ActiveRecord::Base
belongs_to :category
end
I am trying to get a group by on a count of categories. I.E. I want to know how many books are in each category.
I have tried numerous things, here is the latest,
books = Book.joins(:category).where(:select => 'count(books.id), Category.name', :group => 'Category.name')
I am looking to get something back like
[{:name => fiction, :count => 12}, {:name => non-fiction, :count => 4}]
Any ideas?
Thanks in advance!
How about this:
Category.joins(:books).group("categories.id").count
It should return a hash where each key is a category id and each value is the count of books associated with that category.
If you're just after the count of books in each category, the association methods you get from the has_many association may be enough (check out the Association Basics guide). You can get the number of books that belong to a particular category using
@category.books.size
If you wanted to build the array you described, you could build it yourself with something like:
array = Category.all.map { |cat| { name: cat.name, count: cat.books.size } }
As an extra point, if you're likely to be looking up the number of books in a category frequently, you may also want to consider using a counter cache so getting the count of books in a category doesn't require an additional trip to the database. To do that, you'd need to make the following change in your books model:
# books.rb
belongs_to :category, counter_cache: true
And create a migration to add and initialize the column to be used by the counter cache:
class AddBooksCountToCategories < ActiveRecord::Migration
def change
add_column :categories, :books_count, :integer, default: 0, null: false
Category.all.each do |cat|
Category.reset_counters(cat.id, :books)
end
end
end
EDIT: After some experimentation, the following should give you close to what you want:
counts = Category.joins(:books).group('categories.name').count
That will return a hash with the category name as keys and the counts as values. You could use .map { |k, v| { name: k, count: v } } to then get it to exactly the format you specified in your question.
I would keep an eye on something like that though -- once you have a large enough number of books, the join could slow things down somewhat. Using counter_cache will always be the most performant, and for a large enough number of books eager loading with two separate queries may also give you better performance (which was the reason eager loading using includes changed from using a joins to multiple queries in Rails 2.1).

Resources