Automatically delete all records after 30 days - ruby-on-rails

I have a concept for a Rails app that I want to make. I want a model that a user can create records of, with a boolean attribute. After 30 days (a month), unless the record's boolean attribute is true, the record will automatically delete itself.

In Rails 5 you have access to Active Job (http://guides.rubyonrails.org/active_job_basics.html).
There are two simple ways.
After creating the record, you could set a job to be executed after 30 days. This job checks whether the record matches the specifications.
The other alternative is to create a recurring job that runs every day, queries the database for every record (of this specific model) created 30 or more days ago, and destroys the ones that do not match the specifications. (If that check lives in the database it should be as easy as: MyModel.where("created_at <= ?", 30.days.ago).where(destroyable: true).destroy_all)
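A minimal sketch of the first approach with Active Job (the names MyModel and destroyable are assumptions for illustration, not from the original post):
class SelfDestructJob < ApplicationJob
  queue_as :default

  def perform(record_id)
    record = MyModel.find_by(id: record_id)
    # Destroy only if the record still exists and is still marked destroyable.
    record.destroy if record && record.destroyable?
  end
end

class MyModel < ApplicationRecord
  # Schedule the check to run 30 days after the record is created.
  after_create_commit do
    SelfDestructJob.set(wait: 30.days).perform_later(id)
  end
end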

There are a couple of options for achieving this:
Whenever and run a script or rake task
Clockwork and run a script or rake task
Background jobs, which in my opinion is the "rails way".
For options 1 and 2 you need to check every day whether any record is 30 days old, and delete it if its boolean isn't true (which means checking all the records, or optimizing the query to check only the 30-day-old records, etc.). For option 3, you can schedule, on record creation, a job to run after 30 days and do the check for each record independently. It depends on how you are processing the jobs; for example, if you use Sidekiq you can use scheduled jobs, and if you use Resque check resque-scheduler.
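For illustration, options 1 and 2 might look like this with the whenever gem (the task name cleanup:prune_old_records is a placeholder, not from the original post):
# config/schedule.rb
every 1.day, at: '4:30 am' do
  rake 'cleanup:prune_old_records'
end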

Performing the deletion is straightforward: create a class method (e.g. Record.prune) on the record class in question that performs the deletion based on a query, e.g. Record.where(retain: false).destroy_all, where retain is the boolean attribute you mention. I'd recommend then defining a rake task in lib/tasks that invokes this method, e.g.:
namespace :record do
  task prune: :environment do
    Record.prune
  end
end
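For completeness, a sketch of the Record.prune method described above (assuming retain is the boolean column and 30 days is the cutoff):
class Record < ApplicationRecord
  # Destroy records older than 30 days that were not marked for retention.
  def self.prune
    where(retain: false).where("created_at <= ?", 30.days.ago).destroy_all
  end
end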
The scheduling is more difficult; a crontab entry is sufficient to provide the correct timing, but ensuring that the task runs in an appropriate environment (e.g. one that has loaded rbenv/rvm and any necessary environment variables) is harder. Ensuring that your deployment process produces binstubs is probably helpful here. From there, bin/rake record:prune ought to be enough. It's hard to provide a more in-depth answer without more knowledge of all the environments in which you hope to accomplish this task.
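As an illustration, a crontab entry along these lines would run the task nightly (the path and schedule are assumptions):
# m h dom mon dow  command
0 3 * * * cd /var/www/myapp && RAILS_ENV=production bin/rake record:prune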

I want to mention a non-Rails approach. It depends on your database. If you use MongoDB, you can utilize MongoDB's "Expire Data from Collections" (TTL) feature. If you use MySQL, you can utilize the MySQL event scheduler. You can find a good example here: What is the best way to delete old rows from MySQL on a rolling basis?

Related

How to force Rails ActiveRecord to commit a transaction flush

Is it possible to force ActiveRecord to push/flush a transaction (or just a save/create)?
I have a clock worker that creates tasks in the background for several task workers. The problem is, the clock worker will sometimes create a task and push it to a task worker before the clock worker's information has been fully flushed to the db, which causes an ugly race condition.
Using after_commit isn't really viable due to the architecture of the product and how the tasks are generated.
So in short, I need to be able to have one worker create a task and flush that task to the db.
ActiveRecord uses #transaction to create a block that begins and either rolls back or commits a transaction. I believe that would help your issue. Essentially (presuming Task is an ActiveRecord class):
# Declare new_task outside the block so it is still in scope after the commit.
new_task = nil
Task.transaction do
  new_task = Task.create(...)
end
BackgroundQueue.enqueue(new_task)
You could also go directly to the #connection underneath with:
Task.connection.commit_db_transaction
That's a bit low-level, though, and you have to be pretty confident about the way the code is being used. #after_commit is the best answer, even if it takes a little rejiggering of the code to make it work. If it won't work for certain, then these two approaches should help.
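For reference, a minimal after_commit sketch of the pattern being recommended (the callback method name and the BackgroundQueue call are assumptions based on the question):
class Task < ActiveRecord::Base
  # Runs only after the surrounding transaction has actually committed,
  # so the record is guaranteed to be visible to other workers.
  after_commit :enqueue_for_worker, on: :create

  private

  def enqueue_for_worker
    BackgroundQueue.enqueue(self)
  end
end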
execute uses async_exec under the hood, which may or may not be what you want. You could try using the lower-level methods execute_and_clear (or even exec_no_cache) instead.

Rails migration: only for schema change or also for updating data?

I'm a junior Rails developer and at work we faced the following problem:
Needed to update the value of a column only for one record.
What we did is creating a migration like this:
class DisableAccessForUser < ActiveRecord::Migration
  def change
    User.where(name: "User").first.update_column(:access, false)
  end
end
Are migrations only for schema changes?
What other solutions do you suggest?
PS: I can only change it with code. No access to console.
The short version is: since migrations are only for schema changes, you wouldn't want to use them to change actual data in the database.
The main issue is that your data-manipulating migration(s) might be ignored by other developers if they load the DB structure using either rake db:schema:load or rake db:reset. Both merely load the latest version of the structure from the schema.rb file and do not touch the migrations.
As Nikita Singh also noted in the comments, I too would say the best method of changing row data is to implement a simple rake task that can be run as needed, independent of the migration structure. Or, for a first-time installation, the seeds.rb file is perfect for loading initial system data.
Hope that rambling helps.
Update
Found some documentation in some "official" sources:
Rails Guide for Migrations - Using Models in your Migrations. This section gives a description of a scenario in which data-manipulation in the migration files can cause problems for other developers.
Rails Guide for Migrations - Migrations and Seed Data. Same document as above; it doesn't really explain why it is bad to put seed or data manipulation in migrations, it merely says to put all that in the seeds.rb file.
This SO answer. This person basically says the same thing I wrote above, except they provide a quote from the book Agile Web Development with Rails (3rd edition), partially written by David Heinemeier Hansson, creator of Rails. I won't copy the quote, as you can read it in that post, but I believe it gives you a better idea of why seed or data manipulation in migrations might be considered a bad practice.
Migrations are fine for schema changes. But when you work on heavily collaborative projects, pulling code every day from many developers, chances are you will miss some migrations (value-update migrations, that is; this is no problem for schema changes), because migrations depend on timestamps.
So what we do is create rake tasks in a single namespace to update table values (taking care that they do not overwrite existing data), and invoke all the rake tasks in that namespace whenever we update the code from Git.
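A sketch of that workflow (the namespace, task name, and guard condition are illustrative assumptions):
# lib/tasks/data_fixes.rake
namespace :data_fixes do
  desc "Disable access for the legacy 'User' account"
  task disable_legacy_access: :environment do
    user = User.find_by(name: "User")
    # Guard so re-running the task does not overwrite later manual changes.
    user.update_column(:access, false) if user && user.access?
  end
end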
Making data changes using classes in migrations is dangerous because it's not terribly future proof. Changes to the class can easily break the migration in the future.
For example, let's imagine you were to add a new column to user (sample_group) and access that column in a Rails lifecycle callback that executes on object load (e.g. after_initialize). That would break this migration. If you weren't skipping callbacks and validations on save (by using update_column) there'd be even more ways to break this migration going forward.
When I want to make data changes in migrations I typically fall back to SQL. One can execute any SQL statement in a migration by using the execute() method. The exact SQL to use depends on the database in use, but you should be able to come up with a db appropriate query. For example in MySQL I believe the following should work:
execute("UPDATE users SET access = 0 WHERE id IN (SELECT id FROM (SELECT id FROM users ORDER BY id LIMIT 1) AS t);")
(The extra derived table is needed because MySQL rejects an UPDATE whose subquery selects from the table being updated.)
This is far more future proof.
There is nothing wrong with using a migration to migrate the data in your database, in the right situation, if you do it right.
There are two related things you should avoid in your migrations (as many have mentioned), neither of which preclude migrating data:
It's not safe to use your models in your migrations. The code in the User model might change, and nobody is going to update your migration when that happens. So if some co-worker takes a vacation for 3 months, comes back, and tries to run all the migrations that happened while she was gone, but somebody renamed the User model in the meantime, your migration will be broken, preventing her from catching up. This just means you have to use SQL, or (if you are determined to keep even your migrations implementation-agnostic) include an independent copy of an ActiveRecord model directly in your migration file (nested under the migration class), as sketched below.
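A minimal sketch of that nested-model trick (the migration name and the admin column are placeholders):
class MyDataMigration < ActiveRecord::Migration
  # Local copy of the model: renaming or changing app/models/user.rb
  # can no longer break this migration.
  class User < ActiveRecord::Base
    self.table_name = "users"
  end

  def up
    User.where(admin: nil).update_all(admin: false)
  end
end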
It also doesn't make sense to use migrations for seed data, which is, specifically, data that is to be used to populate a new database when someone sets up the app for the first time so the app will run (or will have the data one would expect in a brand new instance of the app). You can't use migrations for this because you don't run migrations when setting up your database for the first time, you run db:schema:load. Hence the special file for maintaining seed data: seeds.rb. This just means that if you do need to add data in a migration (in order to get production and everyone's dev data up to speed), and it qualifies as seed data (necessary for the app to run), you need to add it to seeds.rb too!
Neither of these, however, mean that you shouldn't use migrations to migrate the data in existing databases. That is what they are for. You should use them!
A migration is simply a structured way to make database changes, both schema and data.
In my opinion there are situations in which using migrations for data changes is legitimate.
For example:
If you are holding data which is mostly constant in your database but changes annually, it is fine to make a migration each year to update it. For example, if you list the teams in a soccer league, a migration would be a good way to update the current teams each year.
If you want to mass-alter an attribute of a large table. For example, if you had a slug column on your users table where the name "some user" used to be translated to the slug "some_user", and now you want it to be "some.user". This is something I'd do with a migration, as sketched below.
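A sketch of such a mass-update migration (assuming the underscore-to-dot slug change above; REPLACE is standard in both MySQL and PostgreSQL):
class RewriteUserSlugs < ActiveRecord::Migration
  def up
    execute("UPDATE users SET slug = REPLACE(slug, '_', '.')")
  end

  def down
    execute("UPDATE users SET slug = REPLACE(slug, '.', '_')")
  end
end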
Having said that, I wouldn't use a migration to change a single user attribute. If this is something which happens occasionally you should make a dashboard which will allow you to edit this data in the future. Otherwise a rake task may be a good option.
This question is old, and I think the Rails approach has changed over time here. Based on https://edgeguides.rubyonrails.org/active_record_migrations.html#migrations-and-seed-data, it's OK to feed new columns with data here. To be more precise, your migration code should also contain a "down" block:
class DisableAccessForUser < ActiveRecord::Migration
  def up
    User.where(name: "User").first.update_column(:access, false)
  end

  def down
    User.where(name: "User").first.update_column(:access, true)
  end
end
If you use seeds.rb to pre-fill data, don't forget to include new column value there, too:
User.find_or_create_by(id: 0, name: 'User', access: false)
If I remember correctly, changing particular records may work, but I'm not sure about that.
In any case, it isn't good practice; migrations should be used for schema changes only.
For updating one record I would use the console. Just type rails console in the terminal and enter the code to change the attributes.

About my way to truncate tables in my test DB

In my Rails app development, when I run my RSpec tests, I need to truncate all tables in my test database in after(:all).
(That's to clean up all the data in every table in the test db.)
To approach this, I am thinking of first getting all the ActiveRecord models which represent the tables in the test db, then, for each model, using the delete_all method to clean up each table. That's something like:
ALL_ACTIVE_RECORD_MODELS.each do |model|
  model.delete_all
end
I have two questions to ask regards to this:
1. How do I get all the ActiveRecord models in Rails in my RSpec code?
2. Am I using an acceptable way to truncate all tables in my test DB? If not, what is the alternative?
There is a gem to do exactly this task called database_cleaner: https://github.com/bmabey/database_cleaner.
It will make sure that everything is removed from your database however its default strategy is not to delete content but to use transactions and simply roll back the changes after each test.
Be warned that this can occasionally lead to a gotcha when testing behaviour that's intended to be transactional as you won't see your transaction execute. You can fix this by adding self.use_transactional_fixtures = false before any set of tests that you don't want to use transactions. Remember to clear your data away again afterward though.
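For illustration, a minimal database_cleaner setup for RSpec might look like this (a sketch; strategies and hooks vary by project):
# spec/spec_helper.rb
RSpec.configure do |config|
  config.before(:suite) do
    # Start each run from a clean slate, then use transactions per example.
    DatabaseCleaner.clean_with(:truncation)
    DatabaseCleaner.strategy = :transaction
  end

  config.around(:each) do |example|
    DatabaseCleaner.cleaning { example.run }
  end
end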

MongoMapper and migrations

I'm building a Rails application using MongoDB as the back-end and MongoMapper as the ORM tool. Suppose in version 1, I define the following model:
class SomeModel
  include MongoMapper::Document

  key :some_key, String
end
Later in version 2, I realize that I need a new required key on the model. So, in version 2, SomeModel now looks like this:
class SomeModel
  include MongoMapper::Document

  key :some_key, String
  key :some_new_key, String, :required => true
end
How do I migrate all my existing data to include some_new_key? Assume that I know how to set a reasonable default value for all the existing documents. Taking this a step further, suppose that in version 3, I realize that I really don't need some_key at all. So, now the model looks like this
class SomeModel
  include MongoMapper::Document

  key :some_new_key, String, :required => true
end
But all the existing records in my database have values set for some_key, and it's just wasting space at this point. How do I reclaim that space?
With ActiveRecord, I would have just created migrations to add the initial values of some_new_key (in the version1 -> version2 migration) and to delete the values for some_key (in the version2 -> version3 migration).
What's the appropriate way to do this with MongoDB/MongoMapper? It seems to me that some method of tracking which migrations have been run is still necessary. Does such a thing exist?
EDITED: I think people are missing the point of my question. There are times where you want to be able to run a script on a database to change or restructure the data in it. I gave two examples above, one where a new required key was added and one where a key can be removed and space can be reclaimed. How do you manage running these scripts? ActiveRecord migrations give you an easy way to run these scripts and to determine what scripts have already been run and what scripts have not been run. I can obviously write a Mongo script that does any update on the database, but what I'm looking for is a framework like migrations that lets me track which upgrade scripts have already been run.
Check out Mongrations... I just finished reading about it and it looks like what you're after.
http://terrbear.org/?p=249
http://github.com/terrbear/mongrations
Cheers! Kapslok
One option is to use the update operation to update all of your data at once. Multi update is new in the development releases so you'll need to use one of those.
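A hedged sketch of that multi-update via the (old) Ruby driver that MongoMapper exposes through Model.collection; exact syntax varied across driver versions:
# Backfill the new key on every existing document.
SomeModel.collection.update({}, { '$set' => { 'some_new_key' => 'default' } }, :multi => true)
# Later, drop the obsolete key to reclaim its space.
SomeModel.collection.update({}, { '$unset' => { 'some_key' => 1 } }, :multi => true)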
You can try this contraption I just made, but it only works with mongoid and rails 3 (beta 3) at the moment. http://github.com/adacosta/mongoid_rails_migrations . It'll be upgraded to rails 3 when it goes final.
Also another gem for MongoMapper migrations https://github.com/alexeypetrushin/mongo_mapper_ext
Mongrations is a super old gem, completely deprecated. I recommend NOT using it.
Exodus is a really cool migration framework for Mongo, that might be what you want:
https://github.com/ThomasAlxDmy/Exodus
We just built this one: https://github.com/eberhara/mongration - it is a regular node module (you can find it on npm).
We needed a good MongoDB migration framework, but could not find any - so we built one.
It has lots of features that the regular migration frameworks lack:
Checksum (issues an error when a previously run migration does not match its old version)
Persists migration state to Mongo (there is no regular state file)
Full support for replica sets
Automatic rollback handling (developers must specify the rollback procedures)
Ability to run multiple migrations (sync or async) at the same time
Ability to run migrations against different databases at the same time
Hope it helps!
Clint,
You can write code to do updates -- though it seems that updating a record based on its own fields is not supported.
In such a case, I did the following and ran it against the server:
------------------------------
records = Patient.all
records.each do |p|
  encounters = p.encounters
  if encounters.nil? || encounters.empty?
    mra = p.updated_at
    # puts "\tpatient...#{mra}"
  else
    mra = encounters.last.created_at
    # puts "\tencounter...#{mra}"
  end
  old = p.most_recent_activity
  p.most_recent_activity = mra
  p.save!
  puts "#{p.last_name} mra: #{old} now: #{mra}"
end
------------------------------
I bet you could hook into ActiveRecord::Migration to automate and track your "migration" scripts.
MongoDB is a schema-less database. That's why there are no migrations. In the database itself, it doesn't matter whether the objects have the key :some_key or the key :some_other_key at any time.
MongoMapper tries to enforce some restrictions on this, but since the database is so flexible, you will have to maintain those restrictions yourself. If you need a key on every object, make sure you run a script to update those keys on pre-existing objects, or handle the case of an object that doesn't have that key as you come across them.
I am fairly new to MongoDB myself, but as far as I can see, due to the flexibility of the schema-less db this is how you will need to handle it.

Long running tasks in Rails

I have a controller that generates HTML, XML, and CSV reports. The queries used for these reports take over a minute to return their result.
What is the best approach to run these tasks in the background and then return the result to the user? I have looked into Backgroundrb. Is there anything more basic for my needs?
You could look at using DelayedJob to perform those queries for you, along with an additional table called "NotificationQueue". When a job is finished (with its result set), store the result set and the user ID of the person who made the query in the NotificationQueue table. Then on every page load (and, if you like, every 15-20 seconds), poll that table and see if there are any completed queries.
DelayedJob is really great because you write your code as if it wasn't going to be a delayed job, and just change the code to do the following:
# Your method
Query.do_something(params)

# Change to
Query.send_later(:do_something, params)
We use it all the time at work, and it works great.
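To make the pattern concrete, here is a hedged sketch of the job plus notification table described above (ReportJob, NotificationQueue, and the payload format are hypothetical names, not from the answer):
class ReportJob < Struct.new(:user_id, :report_params)
  def perform
    result = Query.do_something(report_params)
    # Store the finished result so the UI can pick it up on its next poll.
    NotificationQueue.create!(:user_id => user_id, :payload => result.to_csv)
  end
end

# Enqueue with DelayedJob's generic API:
Delayed::Job.enqueue(ReportJob.new(current_user.id, params))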
