2 column table, ignore duplicates on mass insert postgresql - ruby-on-rails

I have a Join table in Rails which is just a 2 column table with ids.
In order to mass insert into this table, I use
ActiveRecord::Base.connection.execute("INSERT INTO myjointable (first_id,second_id) VALUES #{values})
Unfortunately this gives me errors when there are duplicates. I don't need to update any values, simply move on to the next insert if a duplicate exists.
How would I do this?
As an fyi I have searched stackoverflow and most the answers are a bit advanced for me to understand. I've also checked the postgresql documents and played around in the rails console but still to no avail. I can't figure this one out so i'm hoping someone else can help tell me what I'm doing wrong.
The closest statement I've tried is:
INSERT INTO myjointable (first_id,second_id) SELECT 1,2
WHERE NOT EXISTS (
SELECT first_id FROM myjointable
WHERE first_id = 1 AND second_id IN (...))
Part of the problem with this statement is that I am only inserting 1 value at a time whereas I want a statement that mass inserts. Also the second_id IN (...) section of the statement can include up to 100 different values so I'm not sure how slow that will be.
Note that for the most part there should not be many duplicates so I am not sure if mass inserting to a temporary table and finding distinct values is a good idea.
Edit to add context:
The reason I need a mass insert is because I have a many to many relationship between 2 models where 1 of the models is never populated by a form. I have stocks, and stock price histories. The stock price histories are never created in a form, but rather mass inserted themselves by pulling the data from YahooFinance with their yahoo finance API. I use the activerecord-import gem to mass insert for stock price histories (i.e. Model.import columns,values) but I can't type jointable.import columns,values because I get the jointable is an undefined local variable

I ended up using the WITH clause to select my values and give it a name. Then I inserted those values and used WHERE NOT EXISTS to effectively skip any items that are already in my database.
So far it looks like it is working...
WITH withqueryname(first_id,second_id) AS (VALUES(1,2),(3,4),(5,6)...etc)
INSERT INTO jointablename (first_id,second_id)
SELECT * FROM withqueryname
WHERE NOT EXISTS(
SELECT first_id FROM jointablename WHERE
first_id = 1 AND
second_id IN (1,2,3,4,5,6..etc))
You can interchange the Values with a variable. Mine was VALUES#{values}
You can also interchange the second_id IN with a variable. Mine was second_id IN #{variable}.

Here's how I'd tackle it: Create a temp table and populate it with your new values. Then lock the old join values table to prevent concurrent modification (important) and insert all value pairs that appear in the new table but not the old one.
One way to do this is by doing a left outer join of the old values onto the new ones and filtering for rows where the old join table values are null. Another approach is to use an EXISTS subquery. The two are highly likely to result in the same query plan once the query optimiser is done with them anyway.
Example, untested (since you didn't provide an SQLFiddle or sample data) but should work:
BEGIN;
CREATE TEMPORARY TABLE newjoinvalues(
first_id integer,
second_id integer,
primary key(first_id,second_id)
);
-- Now populate `newjoinvalues` with multi-valued inserts or COPY
COPY newjoinvalues(first_id, second_id) FROM stdin;
LOCK TABLE myjoinvalues IN EXCLUSIVE MODE;
INSERT INTO myjoinvalues
SELECT n.first_id, n.second_id
FROM newjoinvalues n
LEFT OUTER JOIN myjoinvalues m ON (n.first_id = m.first_id AND n.second_id = m.second_id)
WHERE m.first_id IS NULL AND m.second_id IS NULL;
COMMIT;
This won't update existing values, but you can do that fairly easily too by using with a second query that does an UPDATE ... FROM while still holding the write table lock.
Note that the lock mode specified above will not block SELECTs, only writes like INSERT, UPDATE and DELETE, so queries can continue to be made to the table while the process is ongoing, you just can't update it.
If you can't accept that an alternative is to run the update in SERIALIZABLE isolation (only works properly for this purpose in Pg 9.1 and above). This will result in the query failing whenever a concurrent write occurs so you have to be prepared to retry it over and over and over again. For that reason it's likely to be better to just live with locking the table for a while.

Related

How to select first character of attribute and insert into other column of same record on same table over many rows

I have a huge User table of almost a million records and I need to iterate over each user, grab the first character of a string in first_name and insert that character into first_initial for that record on the same User table. I'd prefer not to iterate through this via Rails because it will take hours instantiating objects etc. and so I was wondering if anyone had a good way of doing this either using Rails' update_all or keeping it strictly in psql.
This outputs the chars but I'm failing at inserting them into the corresponding initial column:
SELECT SUBSTRING(first_name, 1, 1) FROM users;
Any help would be greatly appreciated.
This will do what you need, you can put it in a migration or just run it from console
User.update_all("first_initial = SUBSTRING(first_name, 1, 1)")

how to change the order of columns

I got the data by 'MODEL.all' command in rails console
I want to put the column 'cgi_name' in the 3rd position when I run MODEL.all in the rails console
I use the postgres for my DB
How to get it ?
To answer your question directly, you'll have to move the columns at DB level
Currently, I only know MYSQL to support this functionality:
ALTER TABLE Employees CHANGE COLUMN empName empName VARCHAR(50) AFTER department;
Postgres, to my knowledge, does not support this functionality:
Many people new to postgresql often ask if it has support for altering
column positions within a table. Currently it does not; if you want to
change column positions, you must either recreate the table, or add
new columns and move data
In the view, you'll have to either manually display the columns, or create a helper method to cycle through them in an order of your choosing
Simple answer is YOU CANNOT
There is no way to re-order the column names to be displayed when you select using Model.all.
Otherwise, you can re-order this by selecting each column in the order you want.
Model.select("column1, column2, cgi_name, column4 etc..")
Hope it helps :)

Most efficient way to duplicate a row in Sqlite for iOS

What is the most efficient way to duplicate a row in an Sqlite3 database exactly except with an updated PrimaryKey?
Use an insert .. select where both the insert and select reference the same relation. Then the entire operation will occur as a single statement within SQLite code itself which cannot be beaten for "efficiency".
It will be easier with an auto-PK (just don't select the PK column), although an appropriate natural key can be assigned as well if such can be derived.
See SQLite INSERT SELECT Query Results into Existing Table?

Advice on handling versioning of an EAV-model table

there are many ActiveRecord versioning gems available to Rails but most if not all of them are having trouble being maintained. on top of that, some of them seem to have various foreign key association issues.
I'm in the process of coding a content management system where pages are stored in a tree-like hierarchy and the page fields are stored in a separate table using EAV model.
keeping that in mind, I'm not looking for an all encompassing revisioning gem because I honestly don't think I'll find one. what I am looking for is some advice on how to handle this as a custom implementation. should I have a separate table for storing revisions and referring to a revision number in my EAV table? I foresee that this may lead to some complex validation problems though. I currently have a problem finding a clean way to validate a regular EAV table anyway so if anyone can comment on this it would be very much appreciated as well.
I hope this question is written well enough to SO standards. if you need any additional information, please do not hesitate to ask and I will try to help you help me. :)
I currently have a problem finding a clean way to validate a regular
EAV table anyway so if anyone can comment on this it would be very
much appreciated as well.
There isn't a clean way to either validate or constrain an EAV table. That's why DBAs call it an anti-pattern. (EAV starts on slide 16.) Bill doesn't talk about version, so I will.
Versioning looks simple, but it's not. To version a row, you can add a column. It doesn't really matter much whether it's a version number or a timestamp.
create table test (
test_id integer not null,
attr_ts timestamp not null default current_timestamp,
attr_name varchar(35) not null,
attr_value varchar(35) not null,
primary key (test_id, attr_ts, attr_name)
);
insert into test (test_id, attr_name, attr_value) values
(1, 'emp_id', 1),
(1, 'emp_name', 'Alomar, Anton');
select * from test;
test_id attr_ts attr_name attr_value
--
1 2012-10-28 21:00:59.688436 emp_id 1
1 2012-10-28 21:00:59.688436 emp_name Alomar, Anton
Although it might not look like it on output, all those attribute values are varchar(35). There's no simple way for the dbms to prevent someone from entering 'wibble' as an emp_id. If you need type checking, you have to do it in application code. (And you have to keep sleep-deprived DBAs from using the command-line and GUI interfaces the dbms provides.)
With a normalized table, of course, you'd just declare emp_id to be of type integer.
With versioning, updating Anton's name becomes an insert.
insert into test (test_id, attr_name, attr_value) values
(1, 'emp_name', 'Alomar, Antonio');
With versioning, selection is mildly complicated. You can use a view instead of a common table expression.
with current_values as (
select test_id, attr_name, max(attr_ts) cur_ver_ts
from test
-- You'll probably need an index on this pair of columns to get good performance.
group by test_id, attr_name
)
select t.test_id, t.attr_name, t.attr_value
from test t
inner join current_values c
on c.test_id = t.test_id
and c.attr_name = t.attr_name
and c.cur_ver_ts = t.attr_ts
test_id attr_name attr_value
--
1 emp_id 1
1 emp_name Alomar, Antonio
A normalized table of 1 million rows and 8 non-nullable columns has a million rows. A similar EAV table has 8 million rows. A versioned EAV table has 8 million rows, plus a row for every change to every value and every attribute name.
Storing a version number, and joining to a second table that contains the current values doesn't gain much, if anything at all. Every (traditional) insert would require inserts into two tables. What would be one row of 8 columns becomes 16 rows (8 in each of two tables.).
Selection is a little simpler, requiring only a join.

Postgres window function column with Rails

I'd like to use the rank() function PostgreSQL on one of my columns.
Character.select("*, rank() OVER (ORDER BY points DESC)")
But since I don't have a rank column in my table rails doesn't include it with the query. What would be the correct way to get the rank included in my ActiveRecord object?
try this:
Character.find_by_sql("SELECT *, rank() OVER (ORDER BY points DESC) FROM characters")
it should return you Character objects with a rank attribute, as documented here. However, this may not be database-agnostic and tends to get messy if you pass around the objects.
another (expensive) solution is to add a rank column to your table, and have a callback recalculate all records' rank using .order whenever a record is saved or destroyed.
edit :
another idea suitable for single-record queries can ben seen here

Resources