Delete all but 5 newest entries in MySQL table - stored-procedures

I currently have PHP code that handles the logic for this because I do not know how to handle it in SQL. I want to create a stored procedure that deletes all rows except the 5 newest for a given config_id, i.e. config_id = 5 gets passed to the procedure so it knows which config_id to clean up.
CREATE TABLE `TAA`.`RunHistory` (
`id` int(11) NOT NULL auto_increment,
`start_time` datetime default NULL,
`stop_time` datetime default NULL,
`success_lines` int(11) default NULL,
`error_lines` int(11) default NULL,
`config_id` int(11) NOT NULL,
`file_id` int(11) NOT NULL,
`notes` text NOT NULL,
`log_file` longblob,
`save` tinyint(1) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;
Newest is determined by start_time. If a row's stop_time is null but it is NOT the newest row, it should still be deleted (stop_time can be null if a run was unceremoniously killed).

From "SQL query: Delete all records from the table except latest N?":
DELETE FROM `runHistory`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `runHistory`
    ORDER BY start_time DESC
    LIMIT 5
  ) foo
);
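Note that the quoted query trims the whole table; for this question it also needs to be restricted to one config_id, in both the outer DELETE and the inner SELECT. A minimal adaptation (a sketch, using config_id = 5 from the question) might look like:
DELETE FROM RunHistory
WHERE config_id = 5
  AND id NOT IN (
    -- the derived table works around MySQL error 1093
    -- ("You can't specify target table ... for update in FROM clause")
    SELECT id FROM (
      SELECT id
      FROM RunHistory
      WHERE config_id = 5
      ORDER BY start_time DESC
      LIMIT 5
    ) keep
  );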

Here's a procedure I tested on MySQL 5.1.46. It uses no subquery, so you won't get the error about LIMIT not being supported in a subquery.
DELIMITER //
CREATE PROCEDURE DeleteBut5(IN c INT)
BEGIN
  DECLARE i INT;
  DECLARE s DATETIME;
  -- find the 5th-newest row for this config (LIMIT 4,1 skips the first four)
  SELECT id, stop_time INTO i, s
  FROM RunHistory
  WHERE config_id = c
  ORDER BY stop_time DESC, id DESC
  LIMIT 4, 1;
  -- delete everything older than that row, for this config only
  DELETE FROM RunHistory
  WHERE config_id = c
    AND (stop_time < s OR (stop_time = s AND id < i));
END //
DELIMITER ;
I recommend you create this covering index:
CREATE INDEX cov ON RunHistory (config_id, stop_time, id);
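To clean up a single config (config_id = 5 from the question), you would then call:
CALL DeleteBut5(5);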

# Body sketch; assumes it runs inside a procedure with an IN parameter p_config_id INT
begin
  declare v_start_time datetime;
  declare v_id int;
  # Find the id of the newest run for this config
  select id into v_id from RunHistory
  where config_id = p_config_id
    and start_time = (select max(start_time) from RunHistory where config_id = p_config_id);
  # Delete rows with a null stop_time, except for the newest run
  delete from RunHistory
  where config_id = p_config_id and stop_time is null and id != v_id;
  # Rows are 0-indexed: skip rows 0-3, return row 4 (the 5th-newest start_time)
  select start_time into v_start_time from RunHistory
  where config_id = p_config_id
  order by start_time desc limit 4,1;
  delete from RunHistory
  where config_id = p_config_id and start_time < v_start_time;
end
There you go. I suggest indexing start_time; stop_time may or may not be worth indexing (it probably isn't). You can optimize the null-stop_time delete by adding a limit, since anything past the first five gets removed by the final delete anyway:
delete from RunHistory where config_id = p_config_id and stop_time is null and id != v_id order by start_time desc limit 5;

https://stackoverflow.com/a/8303440/2576076 is a good solution.
If your table has a large number of rows, it's even better.
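As a side note: on MySQL 8.0+ (which did not exist when this was asked) the same cleanup can be written with a window function instead of the LIMIT-offset trick. A sketch, untested, that keeps the 5 rows with the newest start_time for config 5:
DELETE FROM RunHistory
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY start_time DESC) AS rn
    FROM RunHistory
    WHERE config_id = 5
  ) ranked
  WHERE rn > 5   -- everything past the 5 newest
);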

Related

how to create migration on rails with raw sql (postgres) with large data set

I wanted to insert data into a related table (a one-to-many relationship). After searching for the best practice I found this link, and then I implemented the following.
class InsertDataIntoPermissionAndPermissionGroup < ActiveRecord::Migration
  def up
    execute <<-SQL
      -- each statement must end with a semicolon, otherwise Postgres
      -- reports "syntax error at or near WITH" on the next statement
      WITH a AS (
        INSERT INTO spree_permission_groups (name, created_at, updated_at)
        VALUES ('role_manager', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP) RETURNING id
      )
      INSERT INTO spree_permissions (name, action, permission_group_id, created_at, updated_at)
      SELECT 'Role', 'manage', id, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP FROM a;

      WITH b AS (
        INSERT INTO spree_permission_groups (name, created_at, updated_at)
        VALUES ('department_manager', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP) RETURNING id
      )
      INSERT INTO spree_permissions (name, action, permission_group_id, created_at, updated_at)
      SELECT 'Department', 'manage', id, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP FROM b;

      WITH c AS (
        INSERT INTO spree_permission_groups (name, created_at, updated_at)
        VALUES ('holiday_manager', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP) RETURNING id
      )
      INSERT INTO spree_permissions (name, action, permission_group_id, created_at, updated_at)
      SELECT 'Holiday', 'manage', id, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP FROM c;
    SQL
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
but I have more than 300 records. Is this still the right way of doing it, or should I import the data from Excel and use the Rails create method instead?
I also got an error at first; I thought this would be fine since I wrapped it in SQL:
Caused by:
PG::SyntaxError: ERROR: syntax error at or near "WITH"
LINE 14: WITH b AS (
^
UPDATE: fixed the error; each statement inside the heredoc needs to be terminated with its own semicolon (reflected in the code above).
I would suggest inserting the records in a rake task. You can use the activerecord-import gem (https://github.com/zdennis/activerecord-import) to bulk-insert data into the table. You can even use activerecord-import in a migration. Check the gem documentation for more available options.
For creating rake tasks, you can refer to this: https://railsguides.net/how-to-generate-rake-task/
desc 'task description'
task :bulk_insert_records => :environment do
  spree_permission_groups = ['holiday_manager', 'department_manager', 'role_manager']
  spree_permission_groups_list = []
  spree_permission_groups.each do |spree_permission_group|
    spree_permission_groups_list << SpreePermissionGroup.new(name: spree_permission_group)
  end
  SpreePermissionGroup.import(spree_permission_groups_list)
  # Likewise, use the ids of the created records to set up the permissions
end
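For the permission rows hinted at by the comment above, a possible follow-up in the same task could look like the sketch below. The model names SpreePermissionGroup and SpreePermission are assumptions inferred from the table names in the migration, not verified against the actual app:
# Hypothetical sketch: look up the imported groups by name and bulk-insert their permissions.
permission_names = { 'role_manager' => 'Role', 'department_manager' => 'Department', 'holiday_manager' => 'Holiday' }
permissions = permission_names.map do |group_name, permission_name|
  group = SpreePermissionGroup.find_by(name: group_name)
  SpreePermission.new(name: permission_name, action: 'manage', permission_group_id: group.id)
end
SpreePermission.import(permissions)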

How to use SQL using Active Record

I am trying to optimise the performance of a query in Active Record. Currently it runs as two SQL queries, and it should be possible to do it in one.
These are the tables that I am running the queries on:
# Table name: notifications
#
# id :integer not null, primary key
# content :text(65535)
# position :integer
# Table name: dismissed_notifications
#
# id :integer not null, primary key
# notification_id :integer
# user_id :integer
This is the existing query:
where.not(id: user.dismissed_notifications.pluck(:id))
which produces:
SELECT `dismissed_notifications`.`id` FROM `dismissed_notifications` WHERE `dismissed_notifications`.`user_id` = 655
SELECT `notifications`.* FROM `notifications` WHERE (`notifications`.`id` != 1)
This is the SQL I would like to get, which returns the same records:
select *
from notifications n
where not exists(
select 1
from dismissed_notifications dn
where dn.id = n.id
and dn.user_id = 655)
You can write a NOT EXISTS query like this:
where('NOT EXISTS (' + user.dismissed_notifications.where('dismissed_notifications.id = notifications.id').to_sql + ')')
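Called on a Notification scope, that should produce roughly the desired SQL:
SELECT `notifications`.*
FROM `notifications`
WHERE (NOT EXISTS (
  SELECT `dismissed_notifications`.*
  FROM `dismissed_notifications`
  WHERE `dismissed_notifications`.`user_id` = 655
    AND (dismissed_notifications.id = notifications.id)
))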
OR
Another way to reduce the number of queries is to use select instead of pluck; it creates a sub-query instead of pulling the ids from the database first. See Rails ActiveRecord Subqueries.
where.not(id: user.dismissed_notifications.select(:id))
which will generate the SQL query below:
SELECT `notifications`.*
FROM `notifications`
WHERE (
`notifications`.`id` NOT IN
(SELECT `dismissed_notifications`.`id`
FROM `dismissed_notifications`
WHERE `dismissed_notifications`.`user_id` = 655
)
)

make a join select using django orm

create table `topic` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) NOT NULL,
`description` longtext NOT NULL,
PRIMARY KEY (`id`)
);
create table `subscribe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userid` varchar(128) NOT NULL,
`topicid` int(11) NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`topicid`) REFERENCES topic(`id`)
);
Now, given userid = "Amy01", I want to get all the topics that "Amy01" has subscribed to.
In SQL it would be:
select t.id, t.name, t.description
from topic t join subscribe s on t.id = s.topicid
where s.userid = "Amy01"
How can I get the same select using the Django ORM?
I already have a solution, but I don't think it's very good:
searched_sub = FilterSubscribe.objects.filter(userid = "Amy01").select_related()
searched = []
for sub in searched_sub:
    searched.append(sub.topicid)
then searched contains all the topics that Amy01 has subscribed to.
Are there any better statements to achieve this?
I have found one solution:
my_topic = Topic.objects.filter(subscribe__userid="Amy01").distinct()
The lowercase model name ("subscribe") is how you follow the relation backwards from Topic into the subscribe table; reverse lookups in filter() use the lowercase model name unless a related_name is set.
What's more, when defining the Subscribe model in models.py, if you declare the foreign key like this:
topic = models.ForeignKey('Topic', related_name='_subscribe')
you can also write the query as:
my_topic = Topic.objects.filter(_subscribe__userid="Amy01").distinct()
Reference: https://docs.djangoproject.com/en/1.6/topics/db/queries/#backwards-related-objects
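To make the backward relation concrete, here is a minimal sketch of the models this assumes (field names follow the tables above; illustrative only, not the asker's actual models.py):
from django.db import models

class Topic(models.Model):
    name = models.CharField(max_length=128)
    description = models.TextField()

class Subscribe(models.Model):
    userid = models.CharField(max_length=128)
    # on Django 2.0+ add on_delete=models.CASCADE
    topic = models.ForeignKey('Topic', related_name='_subscribe')

# All topics Amy01 has subscribed to, following the relation backwards:
my_topic = Topic.objects.filter(_subscribe__userid='Amy01').distinct()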

Rails: efficient way to collect data of one model from another with many-to-many relationship

I have two models, Bookmark and Tag. The tags are implemented by the acts-as-taggable-on gem, but I will explain the main point here.
The Bookmark model contains a url field. Given a Bookmark instance #bookmark, #bookmark.tags returns its tags. Different bookmarks can share the same tag (that is the many side of Tag).
Tag has name and taggings_count. The taggings_count field stores how many bookmarks the tag is attached to. Behind the scenes there is a taggings table, but that doesn't matter here.
Now the question: I want to retrieve all tags attached to bookmarks with a specific url value, with the result sorted by the number of such bookmarks each tag is attached to (that number is not the same as the taggings_count field, which counts taggings across all bookmarks; here I only care about bookmarks with the specific url). How can this be done so that the generated SQL is efficient?
I know I can write the SQL directly for efficiency, but I am also wondering whether Rails can do the same without hurting performance too much, so that I don't have to embed raw SQL in my Rails application.
Following are the table definitions; in the taggings table, taggable_id acts as a foreign key to Bookmark, and tag_id as a foreign key to Tag:
CREATE TABLE `bookmarks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`private` tinyint(1) DEFAULT NULL,
`read` tinyint(1) DEFAULT NULL,
`list_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `taggings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tag_id` int(11) DEFAULT NULL,
`taggable_id` int(11) DEFAULT NULL,
`taggable_type` varchar(255) DEFAULT NULL,
`tagger_id` int(11) DEFAULT NULL,
`tagger_type` varchar(255) DEFAULT NULL,
`context` varchar(128) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `taggings_idx` (`tag_id`,`taggable_id`,`taggable_type`,`context`,`tagger_id`,`tagger_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`taggings_count` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `index_tags_on_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This is a solution:
bookmark_ids = Bookmark.where(url: "http://foobar.com").pluck(:id)
taggings = Tagging.where(taggable_id: bookmark_ids).where(taggable_type: "Bookmark")
tag_ids = taggings.pluck(:tag_id).uniq
tags = Tag.where(id: tag_ids).order(taggings_count: :desc)
In theory it could be written using the joins() method from ActiveRecord, but I don't know how the gem you're using defines the associations.
This might or might not work:
Tag.order(taggings_count: :desc)
.joins(taggings: :bookmark)
.where(bookmark: { url: "http://foobar.com" })
You could also write raw SQL, but it feels dirty in rails.
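A middle ground is to keep the structure of the first solution but use select instead of pluck, so the ids stay in subqueries and only one query is sent (the same trick as in the notifications question above). A sketch, with the caveat that it still orders by the global taggings_count rather than the per-url count:
bookmark_ids = Bookmark.where(url: "http://foobar.com").select(:id)
tag_ids = Tagging.where(taggable_type: "Bookmark", taggable_id: bookmark_ids).select(:tag_id)
tags = Tag.where(id: tag_ids).order(taggings_count: :desc)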
It turns out to be simple: acts-as-taggable-on can actually operate on an association. The docs have a sample for this, I just hadn't noticed:
Bookmark.where(url: url).tag_counts_on(:tags).sort_by(&:taggings_count).reverse.map(&:name)
The only problem is that the result is still sorted by the overall popularity of those tags, regardless of the context in which they are popular. That doesn't really matter, though. The point is that this Rails statement only generates a single SQL query:
SELECT tags.*, taggings.tags_count AS count FROM "tags"
JOIN (
  SELECT taggings.tag_id, COUNT(taggings.tag_id) AS tags_count
  FROM "taggings"
  INNER JOIN bookmarks ON bookmarks.id = taggings.taggable_id
  WHERE (taggings.taggable_type = 'Bookmark' AND taggings.context = 'tags')
    AND (taggings.taggable_id IN (SELECT bookmarks.id FROM "bookmarks" WHERE "bookmarks"."url" = 'http://kindleren.com/forum.php?gid=49'))
  GROUP BY taggings.tag_id
  HAVING COUNT(taggings.tag_id) > 0
) AS taggings ON taggings.tag_id = tags.id

rails - boolean operators in find

params[:codes] = "9,10"
#result = Candidate.find :all,
:joins =>
params[:codes].split(',').collect {|c| ["INNER JOIN candidates_codes on candidates_codes.candidate_id = candidates.id, INNER JOIN codes on codes.code_id = candidates_codes.code_id AND codes.value = ?", c]}
Error
Association named 'INNER JOIN candidates_codes on candidates_codes.candidate_id = candidates.id, INNER JOIN codes on codes.code_id = candidates_codes.code_id AND codes.value = ?' was not found; perhaps you misspelled it?
Update
CREATE TABLE `candidates` (
 `id` int(11) NOT NULL auto_increment,
`first_name` varchar(255) collate utf8_unicode_ci default NULL,
`last_name` varchar(255) collate utf8_unicode_ci default NULL,
`mobile_number` varchar(255) collate utf8_unicode_ci default NULL,
`address` text collate utf8_unicode_ci,
`country` varchar(255) collate utf8_unicode_ci default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=5 ;
CREATE TABLE `candidates_codes` (
`candidate_id` int(11) default NULL,
`code_id` int(11) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `codes` (
`id` int(11) NOT NULL auto_increment,
`section` varchar(255) collate utf8_unicode_ci default NULL,
`value` varchar(255) collate utf8_unicode_ci default NULL,
`created_at` datetime default NULL,
`updated_at` datetime default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=11 ;
Hi,
I am trying to create a find which can either be "OR" or "AND".
For example (pseudocode):
array a = (1,2)
array b = (1)
find(1 AND 2) = array a
find(1 OR 2) = array a, array b
My code currently looks like this -
#result = Code.all :joins => :candidates,
:conditions => ["codes.id IN (?)", params['searches'][:Company]],
:select => "candidates.*"
codes is a table full of codes that describe a candidate;
a HABTM relationship exists between Code and Candidate.
The only way of using AND I can see in the guides is between two columns.
Many Thanks
Alex
Since the association is done with a join table, doing an AND requires an INNER JOIN, once for each term in the AND. What you're trying to do is find a given candidate that has a mapping for all of the codes.
This could get messy, since you not only have to join for each term, but also again to the codes table if you're matching on a field there, such as value.
Assuming the number of terms isn't too high, and you pass in params[:codes] = "1,5,9", and that you're trying to match on codes.value:
Candidate.find :all,
  :joins =>
    params[:codes].split(',').collect { |c| "INNER JOIN candidates_codes candidates_codes#{c} ON candidates_codes#{c}.candidate_id = candidates.id INNER JOIN codes codes#{c} ON codes#{c}.id = candidates_codes#{c}.code_id AND codes#{c}.value = '#{c}'" }
...or something like that. Warning that I haven't tested that code, but give it a whirl if that's what you're looking for.
Note I've removed the ? substitution from the last rev and interpolated the value directly, because :joins doesn't support bind parameters. You should first sanitize the params (i.e. make sure they are integers, or whatever), or use the protected sanitize_sql method in the model.
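On newer Rails (3.2+ syntax) the AND case can also be expressed without building one join per term, by grouping and counting distinct matches (relational division). A sketch, assuming a codes HABTM association on Candidate through the candidates_codes join table:
values = params[:codes].split(',')  # e.g. ["1", "5", "9"]

# OR: candidates matching at least one of the codes
or_result = Candidate.joins(:codes).where(:codes => { :value => values }).uniq

# AND: candidates matching every code in the list
and_result = Candidate.joins(:codes).
  where(:codes => { :value => values }).
  group('candidates.id').
  having('COUNT(DISTINCT codes.id) = ?', values.size)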
