Rails + Capistrano, seed production database with file uploads?

For a Rails app, I'm using seeds.rb to populate the database with records and their associated image uploads. seeds.rb reads all record data from a given YAML file and grabs image files from a folder to upload them. This works well in the development environment:
Folder structure:
rails_app/
  db/seeds.rb
  ...
data/
  images1/
    image1.jpg
    image2.jpg
  images2/
    ...
  data.yml
data.yml:
item1:
  description: Some description
  filepath: images1/image1.jpg
item2:
  description: ...
seeds.rb:
items = YAML.load_file(File.join(Rails.root, '..', 'data', 'data.yml'))
items.each do |item, details|
  # create items with file-uploads, etc.
  ...
end
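For reference, the YAML-loading part of seeds.rb can be sketched in plain Ruby. This is a minimal sketch, assuming each filepath in data.yml is relative to the data/ folder; the helper load_seed_items, the /srv/data path, and the second item's description are hypothetical, and creating the records and attaching the uploads is left out because that depends on your models and upload library.

```ruby
require 'yaml'

# Hypothetical helper: parse the seed YAML and resolve each item's image path
# relative to the data directory (the folder that holds data.yml).
def load_seed_items(yaml_text, data_dir)
  YAML.safe_load(yaml_text).map do |name, details|
    # Items without a filepath get a nil image_path.
    path = details['filepath'] && File.join(data_dir, details['filepath'])
    { name: name, description: details['description'], image_path: path }
  end
end

yaml_text = <<~YML
  item1:
    description: Some description
    filepath: images1/image1.jpg
  item2:
    description: Another description
YML

items = load_seed_items(yaml_text, '/srv/data')
items.each { |item| puts "#{item[:name]}: #{item[:image_path].inspect}" }
```

In seeds.rb itself you would pass File.read(File.join(Rails.root, '..', 'data', 'data.yml')) as yaml_text and File.join(Rails.root, '..', 'data') as data_dir.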
Since all the database content is already production-ready in this form, we want to seed the production database via rake db:seed, using my local YAML file and image folder to create the records with their associated file uploads.
To deploy, I'm using Capistrano, and I already found a task to seed data in production...
# Add this in config/deploy.rb
# and run 'cap production deploy:seed' to seed your database
namespace :deploy do
  desc 'Runs rake db:seed'
  task :seed => [:set_rails_env] do
    on primary fetch(:migration_role) do
      within release_path do
        with rails_env: fetch(:rails_env) do
          execute :rake, "db:seed"
        end
      end
    end
  end
end
...Unfortunately, this task runs the seeds.rb on the production server and therefore cannot find the YAML file or images on my local machine.
How can I write a task for Capistrano to access my local YAML and files and db:seed them to the database?
(Apparently it's not common practice to seed the production database, but it worked well to get a YAML file with all the files from the client and already use this "proper" data during development/design.)
Thanks!

At a high level, you'll want to create a task that is a prerequisite of deploy:seed. That will ensure your task is run first, just before the seed script is executed.
The task you create should upload the needed files to the same server, and relative to the same directory, where the seed task will run. Looking at the seed task you pasted in your post, the directory is release_path and the server is primary fetch(:migration_role).
Therefore, I suggest writing a task like this:
task :upload_seed_data do
  on primary fetch(:migration_role) do
    execute :mkdir, "-p", release_path.join("../data")
    upload! "../data/data.yml", release_path.join("../data/data.yml")
    # ... and so on for all files you want to upload
  end
end

# Register the prerequisite
before "deploy:seed", :upload_seed_data
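If the images span several folders, writing one upload! per file gets tedious. One option, sketched here in plain Ruby so it runs without Capistrano, is to walk the local data/ folder and build (local, remote) path pairs that preserve the folder structure. The helper name seed_upload_plan and the /remote/data base are hypothetical; inside upload_seed_data you would presumably pass something like release_path.join("../data").to_s as the remote base, run execute :mkdir, "-p" on each remote file's parent directory, and then call upload! local, remote for each pair.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: pair every file under local_dir with the matching path
# under remote_dir, keeping the relative folder structure (images1/..., etc.).
def seed_upload_plan(local_dir, remote_dir)
  prefix = File.join(local_dir, '') # local_dir with a trailing slash
  files = Dir.glob(File.join(local_dir, '**', '*')).select { |f| File.file?(f) }.sort
  files.map { |local| [local, File.join(remote_dir, local.delete_prefix(prefix))] }
end

plan = nil
Dir.mktmpdir do |dir|
  # Throwaway copy of the question's data/ layout, just for the demo.
  FileUtils.mkdir_p(File.join(dir, 'images1'))
  File.write(File.join(dir, 'data.yml'), "item1: {}\n")
  File.write(File.join(dir, 'images1', 'image1.jpg'), '')
  plan = seed_upload_plan(dir, '/remote/data') # '/remote/data' is a placeholder
end

plan.each { |local, remote| puts "#{File.basename(local)} -> #{remote}" }
```

Directories themselves are skipped (only regular files are paired), which is why the remote parent directories need to be created before uploading.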

Related

How do I read from a local file and write to Heroku remote (rails app)

I have an app that's working locally. I deployed it to heroku but the DB is completely empty.
I have an Excel spreadsheet. Just FYI, using Excel as storage wasn't my idea. I also have a rake task that reads columns from the spreadsheet and writes them to my DB.
require './lib/excel_util'

namespace :import do
  desc 'Import data from spreadsheet'
  task data: :environment do
    data = Roo::Spreadsheet.open('lib/assets/sheet.xlsx')
    headers = data.row(1) # Roo rows are 1-indexed
    data.each_with_index do |row, idx|
      next if idx == 0 # skip the header row
      # create hash from headers and cells
      row_hash = headers.zip(row).to_h
      user_data = preprocess(row_hash) # from my util library, mostly basic string operations
      user = User.new(user_data)
      user.save!
    end
  end
end
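The "create hash from headers and cells" step can be tried out without Roo or Rails. A minimal sketch, assuming each row is an array of cell values with the header row first (as the each_with_index loop above assumes); rows_to_hashes and the sample rows are hypothetical:

```ruby
# Hypothetical helper: turn spreadsheet rows (arrays) into hashes keyed by
# the header row, skipping the header itself.
def rows_to_hashes(rows)
  headers, *body = rows
  body.map { |row| headers.zip(row).to_h }
end

rows = [
  ['name', 'email'],
  ['Ada', 'ada@example.com'],
  ['Grace', 'grace@example.com']
]

records = rows_to_hashes(rows)
records.each { |attrs| p attrs }
```

Each resulting hash could then go to preprocess and User.new, provided the header names match your User attributes.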
Locally, I ran bundle exec rails import:data to populate the users DB with the Excel entries. Everything works locally.
Now, I've tried to run heroku run rake db:migrate and heroku run rake import:data. The second one doesn't work because I don't commit the Excel file to my git repo.
How do I read from my Excel file locally and write DB entries to Heroku? The spreadsheet has about 100 entries. The approaches I can think of are:
1. Somehow scp this file to Heroku and execute the same rake task on Heroku.
2. Find a way to execute the rake task locally, but write to the remote DB on Heroku.
3. Store the file in another remote location, like S3. Seems like overkill to me.
4. Just commit this file to my git repo. I really don't want to do that.
I prefer option 2. I can live with option 3 if we can get a free tier. I really don't wanna go with 1 or 4. Open to suggestions!

Rails: Loading schema into secondary database

How does one load a schema into a secondary database? It seems that the ability to set a secondary database connection while maintaining the main ActiveRecord::Base.connection is not supported in Rails.
Domain Definition
We have models using a secondary database. Our primary database is MySQL, the secondary database is PostgreSQL. To use the ActiveRecord documentation's example:
  ActiveRecord::Base
    |
    +-- Book
    |    |
    |    +-- ScaryBook
    |    +-- GoodBook
    +-- Author
    +-- BankAccount
Where Book is abstract and uses establish_connection to connect to Postgres.
When dealing with the database, we can either use ActiveRecord::Base.connection or Book.connection.
Schema dump
To wit: Rails database tasks in the schema namespace allow us to dump the schema like so:
ActiveRecord::SchemaDumper.dump(ActiveRecord::Base.connection, file)
Which would allow me to do the following:
ActiveRecord::SchemaDumper.dump(Book.connection, file)
Problem: Schema Load
However, the load task holds no such ability. It merely evaluates the schema file as a whole:
desc 'Load a schema.rb file into the database'
task :load => :environment do
  file = ENV['SCHEMA'] || "#{Rails.root}/db/schema.rb"
  if File.exist?(file)
    load(file)
  else
    abort %{#{file} doesn't exist yet. Run "rake db:migrate" to create it then try again. If you do not intend to use a database, you should instead alter #{Rails.root}/config/application.rb to limit the frameworks that will be loaded}
  end
end
The schema file calls ActiveRecord::Schema.define without specifying a connection (note that define runs in the context of "the current connection adapter").
How do I change that "current connection adapter" without doing an ad-hoc ActiveRecord::Base.establish_connection, which is not what I want to do? I essentially want to run ActiveRecord::Schema.define in the context of Book.connection.
Edit: I'll note that I need a programmatic solution outside of running rake tasks, which is why I'm looking within the rake tasks to see what they're actually doing.
How to dump schema:
ActiveRecord::Base.establish_connection "custom_db_#{Rails.env}".to_sym
File.open Rails.root.join('db/schema_custom_db.rb'), 'w:utf-8' do |file|
  ActiveRecord::SchemaDumper.dump ActiveRecord::Base.connection, file
end
How to load schema:
ActiveRecord::Tasks::DatabaseTasks.load_schema_current :ruby, Rails.root.join('db/schema_custom_db.rb'), "custom_db_#{Rails.env}"
I've recently been struggling to use ActiveRecord on its own. I can load migrations and dump the schema, but when it comes to loading a schema I run into a similar problem, and it's not a Rails or Rake one. Just FYI.
I know you don't want to do this standalone, but I'm trying.
Please also see this post:
rake db:schema:load vs. migrations
I'm still obsessed with trying to do it on my own without Rails or Rake help so that I gain a better understanding of how ActiveRecord works.
I made a very minimal Rails structure just to generate migrations.
File structure:
Rakefile
Gemfile
/bin
  bundle
  rake
  rails
/config
  application.rb
  boot.rb
  database.yml
  environment.rb
/db
  /migrate
  development.sqlite3
  schema.rb
bin/rails generate migration CreateSystemSettings
and I get a file 20150207170924_create_system_settings.rb in the db/migrate folder.
bin/rake db:migrate
== 20150204070000 DropArticles: migrating
== 20150204070000 DropArticles: migrated (0.0001s)
== 20150207163123 AddPartNumberToProducts: migrating
== 20150207163123 AddPartNumberToProducts: migrated (0.0000s)
== 20150207163909 ChangeSystemSettings: migrating
== 20150207163909 ChangeSystemSettings: migrated (0.0000s)
== 20150207170924 CreateSystemSettings: migrating
-- create_table(:system_settings)
-> 0.0085s
== 20150207170924 CreateSystemSettings: migrated (0.0199s)
So that works.
Update pending

How to run schema:load on the initial capistrano 3 deploy of my rails app

I would like to run db:schema:load in place of db:migrate on the initial deploy of my rails app.
This used to be fairly trivial, as seen in this Stack Overflow question, but in Capistrano 3 the deploy:cold task has been deprecated. The initial deploy isn't any different from all subsequent deploys.
Any suggestions? Thanks!
I, too, am new to Capistrano, and trying to use it for the first time to deploy a Rails app to production servers I configured with Puppet.
I finally had to dig into the Capistrano source (and capistrano/bundler, and capistrano/rails, and even sshkit and net-ssh to debug auth problems) to determine exactly how everything works before I felt confident deciding for myself what changes I wanted to make. I just finished making those changes, and I'm pleased with the results:
# lib/capistrano/tasks/cold.rake
namespace :deploy do
  desc "deploy app for the first time (expects pre-created but empty DB)"
  task :cold do
    before 'deploy:migrate', 'deploy:initdb'
    invoke 'deploy'
  end

  desc "initialize a brand-new database (db:schema:load, db:seed)"
  task :initdb do
    on primary :web do |host|
      within release_path do
        if test(:psql, 'portal_production -c "SELECT table_name FROM information_schema.tables WHERE table_schema=\'public\' AND table_type=\'BASE TABLE\';"|grep schema_migrations')
          puts '*** THE PRODUCTION DATABASE IS ALREADY INITIALIZED, YOU IDIOT! ***'
        else
          execute :rake, 'db:schema:load'
          execute :rake, 'db:seed'
        end
      end
    end
  end
end
The deploy:cold task merely hooks my custom deploy:initdb task to run before deploy:migrate. That way the schema and seeds get loaded, and the deploy:migrate step that follows does nothing (safely) because there are no new migrations to run. As a safety measure, I test whether the schema_migrations table already exists before loading the schema, in case you run deploy:cold again.
Note: I choose to create the DB using Puppet so I can avoid having to grant the CREATEDB privilege to my production postgresql user, but if you want Capistrano to do it, just add "execute :rake, 'db:create'" before the db:schema:load, or replace all three lines with 'db:setup'.
You'll have to define deploy:cold as basically a duplicate of the normal deploy task, but with deploy:db_load_schema instead of deploy:migrate. For example:
namespace :deploy do
  desc 'Deploy app for first time'
  task :cold do
    invoke 'deploy:starting'
    invoke 'deploy:started'
    invoke 'deploy:updating'
    invoke 'bundler:install'
    invoke 'deploy:db_load_schema' # This replaces deploy:migrate
    invoke 'deploy:compile_assets'
    invoke 'deploy:normalize_assets'
    invoke 'deploy:publishing'
    invoke 'deploy:published'
    invoke 'deploy:finishing'
    invoke 'deploy:finished'
  end

  desc 'Setup database'
  task :db_load_schema do
    on roles(:db) do
      within release_path do
        with rails_env: (fetch(:rails_env) || fetch(:stage)) do
          execute :rake, 'db:schema:load'
        end
      end
    end
  end
end
It might even be better to run the deploy:db_load_schema task independently, as the tasks included in the default deploy might change over time.
I actually use db:setup for fresh deploys because it also seeds the database after creating the tables:
desc 'Setup database'
task :db_setup do
  ...
  execute :rake, 'db:setup'
  ...
end

Ruby On Rails: way to create different seeds file for environments

How can I make rake db:seed use a different seeds.rb file in production and development?
Edit: any better strategy is welcome.
You can have a rake task behave differently based on the current environment, and you can change the environment a task runs in by passing RAILS_ENV=production to the command. Combining the two, you could do something like this:
Create the following files with your environment specific seeds:
db/seeds/development.rb
db/seeds/test.rb
db/seeds/production.rb
Place this line in your base seeds file (db/seeds.rb) to load the file for the current environment:
load(Rails.root.join( 'db', 'seeds', "#{Rails.env.downcase}.rb"))
Call the seeds task:
rake db:seed RAILS_ENV=production
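The per-environment dispatch can be sketched without Rails. This minimal version builds the db/seeds/<env>.rb path the same way as the load line above, but (as a design choice of this sketch, not of the answer) raises a clearer ArgumentError when the file for an environment is missing, instead of letting load fail with a LoadError. The helper seed_file_for and the $seeded_env marker are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper mirroring the load(...) line above: pick
# db/seeds/<env>.rb under root, failing loudly if that file is missing.
def seed_file_for(root, env)
  path = File.join(root, 'db', 'seeds', "#{env.to_s.downcase}.rb")
  raise ArgumentError, "no seed file for #{env}: #{path}" unless File.exist?(path)
  path
end

Dir.mktmpdir do |root|
  seeds_dir = File.join(root, 'db', 'seeds')
  FileUtils.mkdir_p(seeds_dir)
  # A stand-in production seed file that just records that it ran.
  File.write(File.join(seeds_dir, 'production.rb'), "$seeded_env = 'production'\n")
  load(seed_file_for(root, 'production')) # executes the file, like load(...) above
end

puts $seeded_env
```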
I like to keep all seeds in one seeds.rb file and then just separate the environments inside it:
if Rails.env.production?
  State.create(state: "California", state_abbr: "CA")
  State.create(state: "North Dakota", state_abbr: "ND")
end

if Rails.env.development?
  25.times do
    Order.create(order_num: Faker::Number.number(digits: 8), order_date: Faker::Business.credit_card_expiry_date)
  end
end
That way you do not need to set RAILS_ENV on your rake task or manage multiple files. You can also add Rails.env.test?, but I personally let RSpec take care of the test data.

Is there any way to have multiple seeds.rb files? Any kind of 'versioning' for seed data?

We need to add more seed data for some newly added tables to "version 100" of our rails project.
However, if we simply add it to seeds.rb and re-run rake db:seed, it will of course re-add the original seed data, duplicating it.
So if you've already added seed data to seeds.rb for, say, TableOne ...
How can we incrementally add seed data for TableTwo and TableThree at later stages of development?
I'd hoped I could simply create a NEW seeds_two.rb file and run rake db:seeds_two, but that gave the error "Don't know how to build task 'db:seeds_two'".
So it looks like ONLY "seeds.rb" can be used.
How do people maintain incremental additions to seed data?
You can re-use the seed task, but make it idempotent.
To make the seed idempotent, simply check for the existence of the condition before executing a command. An example: do you want to create a new admin user?
User.find_or_create_by(username: "admin")
instead of
User.create(username: "admin")
However, seed should be used to populate your database when the project is created. If you want to perform complex data seeding during the lifecycle of the app, simply create a new rake task, execute it, and then remove it.
For anyone else with the same question:
We can keep multiple seed files in the db/seeds/ folder and write a rake task that runs each file separately, as needed:
# lib/tasks/custom_seed.rake
namespace :db do
  namespace :seed do
    Dir[File.join(Rails.root, 'db', 'seeds', '*.rb')].each do |filename|
      task_name = File.basename(filename, '.rb').intern
      # Now we will create multiple tasks by each file name inside db/seeds directory.
      task task_name => :environment do
        load(filename)
      end
    end

    # This is for if you want to run all seeds inside db/seeds directory
    task :all => :environment do
      Dir[File.join(Rails.root, 'db', 'seeds', '*.rb')].sort.each do |filename|
        load(filename)
      end
    end
  end
end
Then, to run a specific seed file, run:
rake db:seed:seed_file_name
To run all the seed files in the db/seeds folder, in filename order, run:
rake db:seed:all
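Because :all sorts the file names before loading them, numeric prefixes are a simple way to control seed order. The sketch below reproduces the rake file's naming logic in plain Ruby, returning the task names in the order db:seed:all would load the files; seed_tasks and the file names are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Mirrors the rake file's naming: one task per db/seeds/*.rb file, named after
# the file, listed in the sorted order that db:seed:all loads them.
def seed_tasks(root)
  files = Dir[File.join(root, 'db', 'seeds', '*.rb')].sort
  files.map { |filename| "db:seed:#{File.basename(filename, '.rb')}" }
end

tasks = nil
Dir.mktmpdir do |root|
  seeds_dir = File.join(root, 'db', 'seeds')
  FileUtils.mkdir_p(seeds_dir)
  # Hypothetical file names, created out of order on purpose.
  %w[02_table_two.rb 01_table_one.rb 03_table_three.rb].each do |name|
    File.write(File.join(seeds_dir, name), '')
  end
  tasks = seed_tasks(root)
end

puts tasks
```

Zero-pad the prefixes (01_, 02_, ...), since the sort is lexicographic and 10_x.rb would otherwise sort before 2_y.rb.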
