PostgreSQL FTS solution for existing data - ruby-on-rails

In a Rails app, I am tinkering with adding full-text search (FTS) in PostgreSQL for existing data.
Here is what I have done:
class AddNameFtsIndexToCompanies < ActiveRecord::Migration
  def up
    execute(<<-'eosql'.strip)
      DROP INDEX IF EXISTS index_companies_name;
      CREATE INDEX index_companies_name
        ON companies
        USING gin( (to_tsvector('english', "companies"."name")) );
    eosql
    execute(<<-'eosql'.strip)
      ALTER TABLE companies ADD COLUMN name_tsv tsvector;
      CREATE TRIGGER tsv_name_update
        BEFORE INSERT OR UPDATE ON companies FOR EACH ROW
        EXECUTE PROCEDURE tsvector_update_trigger(name_tsv, 'pg_catalog.english', name);
      CREATE INDEX index_companies_fts_name ON companies USING GIN (name_tsv);
    eosql
  end

  def down
    execute(<<-'eosql'.strip)
      DROP INDEX IF EXISTS index_companies_name
    eosql
    execute(<<-'eosql'.strip)
      DROP INDEX IF EXISTS index_companies_fts_name;
      DROP TRIGGER IF EXISTS tsv_name_update ON companies;
      ALTER TABLE companies DROP COLUMN name_tsv
    eosql
  end
end
The value for name_tsv column is still empty.
But for just quick test , I tried this:
input_data = "foo"
Company.where(["to_tsvector(companies.name) @@ plainto_tsquery(?)", input_data ])
and compare it with this:
input_data = "foo"
Company.where(["companies.name ilike ? ", "%#{input_data}%"])
And the former is slower.
Questions:
1. Why is it slower?
2. What is the best practice to populate tsvector column for existing data?
Although my question comes from a Rails app, it is really about PostgreSQL FTS in general,
so any Postgres-specific solution is welcome.

Why is it slower?
I am willing to bet it is doing a sequential scan in both cases and the tsvector conversion is slower than the pattern matching.
What is the best practice to populate tsvector column for existing data?
You need to create indexes that PostgreSQL can actually use for these operations. B-tree indexes (the default) don't give you that; you need a GIN or GiST index (the big difference in this case is a read/write performance trade-off between the two). Also, PostgreSQL won't know it can use an index in your case because you aren't querying on the indexed expression. What you need instead is a functional (expression) index. So you need to do something like:
CREATE INDEX company_name_idx_fts ON companies USING GIN (to_tsvector('english', name));
Then you can scan the output of that function against your full text search in your query.
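Note that the trigger only fires for rows inserted or updated after it is created, so name_tsv stays NULL for pre-existing rows. A one-off backfill along these lines should populate it (a minimal sketch using the column name and text search configuration from the question):
UPDATE companies
SET    name_tsv = to_tsvector('pg_catalog.english', coalesce(name, ''));
After that, query against name_tsv directly (for example name_tsv @@ plainto_tsquery('english', 'foo')) so the GIN index on that column can actually be used.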

Related

Incrementally copy data from one (horrible) database to another (nicer) database in rails

As with all my questions at the moment, I'm working with the "Advantage Database Server" on a remote machine, which is slow and clumsy.
It would be great if I could write a quick script to dump changes made to the "live" system into a nice PostgreSQL database.
The existing database is made up of about 30 tables, however only about 7 of these are actively updated.
I have the ones I want copied defined as models already.
The ADS tables all have a pseudo-column of "ROWID" which should stay the same within the existing database (according to the documentation) ... this is also often used as the "Primary Key" on the ADS tables except for the fact that it isn't indexed!
I'm proposing to create a new table in PostgreSQL with a copy of this data, including the pseudo-column ROWID (not a PostgreSQL reserved word, I believe), and then doing a comparison of the live ADS data to the PostgreSQL equivalent.
class Determinand < AisBase
  self.table_name = 'DETS'
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end

class PgDeterminand < PostgresBase
  self.sequence_name = :autogenerated
  self.primary_key = 'DET'
end

livet = Determinand.select("ROWID").map(&:ROWID)
devt = PgDeterminand.select("ROWID").map(&:ROWID)

new_dets = Determinand.where(ROWID: livet - devt)
# or maybe
(livet - devt).map do |rid|
  Determinand.find_by(ROWID: rid)
end
and then loop through the new_dets to create new PgDeterminand rows ...
the reading is very slow:
puts Benchmark.measure { livet=Determinand.select("ROWID").map(&:ROWID) }
0.196957 0.098432 0.295389 ( 26.503560)
livet.count
=> 6136
and this is not a big table ...
can anyone think of a clearer way to look at doing this?
-- EDIT --
Okay, I've copied all the existing models to an "Ads" folder, created new objects in Postgres (based on the existing schema.rb file), removed all the belongs_to from the models (no referential integrity on the AIS LIMS tables!), and I can quickly and easily copy the data to the new tables as follows:
def force_utf8(hsh)
  hsh.each_with_object({}) do |(i, j), a|
    a[i] = j.present? && j.is_a?(String) ? j.encode("utf-8", invalid: :replace, undef: :replace, replace: '?') : j
  end
end

Ads::Determinand.all.as_json.each do |d|
  Determinand.create(force_utf8(d))
end
this isn't an incremental yet, but using the ROWID from the existing table, I should be able to work from there
-- EDIT 2 --
ROWID appears to be essentially sequential for each table ... except that it uses the order '[A-Za-z0-9+/]' ... awesome!
I was hoping to do just a "greater than last stored ROWID" for new data in the "Live" system:
Ads::Determinand.where(Ads::Determinand.arel_table['ROWID'].gt(Determinand.maximum(:ROWID))).as_json.each do |d|
  Determinand.create(force_utf8(d))
end
but this obviously doesn't cope with ROWIDs after an ending "zz":
CFTquNARAXIFAAAezz is greater than CFTquNARAXIFAAAe+D
Okay, I have this mostly sorted now:
Schema Initialisation
first I moved all my models to an "Ads" directory (adding in "module Ads" to each model), set up 2 databases in my project and gathered the "existing" schema using rake db:schema:dump
then I created new models (e.g.):
rails g model Determinand
I then copied the existing model definitions from ads_schema.rb into the Rails migration and ran rake db:migrate:postgres
Initial Data Dump
I then did an initial data export/import.
On smaller tables, I was able to use the following:
Ads::Client.all.as_json.each do |c|
  Client.create(c)
end
but on larger tables I had to use a CSV export from the ADS, and a pgloader script to bring in the data:
load CSV
  from 'RESULTS.csv'
  having fields
  (
    SAMPNUM, DET, JOB, GLTHAN, INPUT, OUTPUT, RESULT, ERROR, GENFLAG,
    SPECFLAG, STATFLAG, COMPFLAG, REPEAT, DETORDER, ANALYST, DETDATE [date format 'DD/MM/YYYY'],
    DETTIME, LOGDATE [date format 'DD/MM/YYYY'], APPROVED [date format 'DD/MM/YYYY'], APPROVEDBY, INSTRUMENT, FILENAME, LINE_NO,
    TEXTRESULT, DATATYPE, SUITE, TEST, SECTION, UKAS, MCERTS, ACCRED, DEVIATING,
    PRINT_1, PRINT_1_BY, PRINT_1_AT, PRINT_2, PRINT_2_BY, PRINT_2_AT, LABEL, LABLOCN
  )
  into postgresql://$user:$password@localhost/ads_project
  TARGET TABLE results
  TARGET COLUMNS
  (
    'SAMPNUM', 'DET', 'JOB', 'GLTHAN', 'INPUT', 'OUTPUT', 'RESULT', 'ERROR', 'GENFLAG',
    'SPECFLAG', 'STATFLAG', 'COMPFLAG', 'REPEAT', 'DETORDER', 'ANALYST', 'DETDATE',
    'DETTIME', 'LOGDATE', 'APPROVED', 'APPROVEDBY', 'INSTRUMENT', 'FILENAME', 'LINE_NO',
    'TEXTRESULT', 'DATATYPE', 'SUITE', 'TEST', 'SECTION', 'UKAS', 'MCERTS', 'ACCRED', 'DEVIATING',
    'PRINT_1', 'PRINT_1_BY', 'PRINT_1_AT', 'PRINT_2', 'PRINT_2_BY', 'PRINT_2_AT', 'LABEL', 'LABLOCN'
  )
  with csv header,
       fields optionally enclosed by '"',
       fields terminated by ',',
       drop indexes

  before load do
    $$ alter table results alter column created_at drop not null, alter column updated_at drop not null; $$

  after load do
    $$ update results set created_at = "DETDATE", updated_at = NOW() where created_at is null and updated_at is null; $$,
    $$ alter table results alter column created_at set not null, alter column updated_at set not null; $$
  ;
Incremental Updates
for the incremental updates I have to do something like the following:
On smaller tables (~ <1000 rows):
Ads::DetLimit.where.not(ROWID: DetLimit.pluck(:ROWID)).as_json.each do |d|
  DetLimit.create(force_utf8(d))
end
On Larger tables I need to use Ruby to limit the IDs that have changed (essentially white-list not black-list):
zzz = SuiteDet.pluck(:ROWID)
yyy = Ads::SuiteDet.pluck(:ROWID)

Ads::SuiteDet.where(ROWID: yyy - zzz).as_json.each do |d|
  SuiteDet.create(force_utf8(d))
end
Deployment
I created a CopyTable script to run so that I can batch it with just the increments now; it takes about 2 minutes to run, which is acceptable.
I'm not familiar with ADS, but IMO it'll be a good start if you have access to modify the design by adding the necessary indexes to it.
Also Determinand.pluck(:id) is always MUCH faster than Determinand.select("ROWID").map(&:ROWID)
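Something along these lines (illustrative only, reusing the Determinand model and the Benchmark call from the question) makes the difference easy to measure; pluck sends a single SELECT for just that column and skips building an ActiveRecord object per row:
require 'benchmark'

puts Benchmark.measure { Determinand.pluck(:ROWID) }
puts Benchmark.measure { Determinand.select("ROWID").map(&:ROWID) }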

Add uniqueness constraint to Postgres text array contents [duplicate]

I'm trying to come up with a PostgreSQL schema for host data that's currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.
One thing I'd like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can't be assigned. This would be easy if hosts could only have one name, but since they can have more than one it's more complicated.
I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I'd like to avoid having everybody need to do joins for even the simplest query:
select hostnames.name,hosts.*
from hostnames,hosts
where hostnames.name = 'foobar'
and hostnames.host_id = hosts.id;
I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:
select * from hosts where names #> '{foobar}';
When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?
If not, does anyone know of another data-modeling approach that would make more sense?
The righteous path
You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.
Table could look like this:
CREATE TABLE hostname (
hostname_id serial PRIMARY KEY
, host_id int REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname text UNIQUE
);
The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
Create a view like this:
CREATE VIEW v_host AS
SELECT h.*
, array_agg(hn.hostname) AS hostnames
-- , string_agg(hn.hostname, ', ') AS hostnames -- text instead of array
FROM host h
JOIN hostname hn USING (host_id)
GROUP BY h.host_id; -- works in v9.1+
Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause
Queries can use the view like a table. Searching for a hostname will be much faster this way:
SELECT *
FROM host h
JOIN hostname hn USING (host_id)
WHERE hn.hostname = 'foobar';
Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:
CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.
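If you go that route, it could look something like this (a sketch based on the view above; the name mv_host and the refresh schedule are assumptions, pick whatever suits your workload):
CREATE MATERIALIZED VIEW mv_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;

-- rerun after writes, e.g. from a cron job:
REFRESH MATERIALIZED VIEW mv_host;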
The dark side (what you actually asked)
If I can't convince you of the righteous path, here is some assistance for the dark side:
Here is a demo of how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.
CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY); -- pk enforces uniqueness
Trigger function:
CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- split UPDATE into DELETE & INSERT
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN  -- keep going
      ELSE
         RETURN NEW;  -- exit, nothing to do
      END IF;
   END IF;

   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname h
      USING  unnest(OLD.hostnames) d(x)
      WHERE  h.hostname = d.x;

      IF TG_OP = 'DELETE' THEN RETURN OLD;  -- exit, we are done
      END IF;
   END IF;

   -- control only reaches here for INSERT or UPDATE (with actual changes)
   INSERT INTO hostname(hostname)
   SELECT h
   FROM   unnest(NEW.hostnames) h;

   RETURN NEW;
END
$func$;
Trigger:
CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();
SQL Fiddle with test run.
Use a GIN index on the array column host.hostnames and array operators to work with it:
Why isn't my PostgreSQL array index getting used (Rails 4)?
Check if any of a given array of values are present in a Postgres array
In case anyone still needs what was in the original question:
CREATE TABLE testtable(
id serial PRIMARY KEY,
refs integer[],
EXCLUDE USING gist( refs WITH && )
);
INSERT INTO testtable( refs ) VALUES( ARRAY[100,200] );
INSERT INTO testtable( refs ) VALUES( ARRAY[200,300] );
and this would give you:
ERROR: conflicting key value violates exclusion constraint "testtable_refs_excl"
DETAIL: Key (refs)=({200,300}) conflicts with existing key (refs)=({100,200}).
Checked in Postgres 9.5 on Windows.
Note that this creates an index using the operator &&. So when you are working with testtable, it will be many times faster to check ARRAY[x] && refs than x = ANY( refs ).
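For illustration, with the table above (and assuming a suitable GiST operator class for integer arrays is installed), the first form can use the index behind the exclusion constraint, while the second generally cannot:
-- indexable: array overlap
SELECT * FROM testtable WHERE refs && ARRAY[200];

-- not indexable this way: the array is scanned per row
SELECT * FROM testtable WHERE 200 = ANY (refs);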
P.S. Generally I agree with the answer above: in 99% of cases you'd prefer a normalized schema. Please try to avoid "hacky" stuff in production.

Count occurrence of values in a serialized attribute(array) in Active Admin dashboard (Rails, Active admin 1.0, Postgresql database, postgres_ext gem)

I'd like to have a basic table summing up the number of occurrences of values inside arrays.
My app is a Daily Deal app built to learn more Ruby on Rails.
I have a model Deal, which has an attribute called deal_goal. It's a multiple select which is serialized as an array.
Here is the deal_goal taken from schema.db:
t.string "deal_goal",:array => true
So deal A can have deal_goal = [traffic, qualification] and another deal can have deal_goal = [branding, traffic, acquisition].
What I'd like to build is a table in my dashboard which would take each type of goal (each distinct value in the arrays) and count the number of deals whose deal_goal array contains that goal.
My objective is to have this table:
How can I achieve this? I think I would need to group the deal_goal arrays by value and then count the number of times each goal appears in the arrays. I'm quite new to RoR and can't manage to do it.
Here is my code so far:
column do
  panel "top of Goals" do
    table_for Deal.limit(10) do
      column ("Goal"), :deal_goal ????
      # add 2 columns:
      #   'nb of deals with this goal'
      #   'Share of deals with this goal'
    end
  end
end
Any help would be much appreciated!
I can't think of any clean way to get the results you're after through ActiveRecord but it is pretty easy in SQL.
All you're really trying to do is open up the deal_goal arrays and build a histogram based on the opened arrays. You can express that directly in SQL this way:
with expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select goal, count(*) n
from expanded_deals
group by goal
And if you want to include all four goals even if they don't appear in any of the deal_goals then just toss in a LEFT JOIN to say so:
with
all_goals(goal) as (
values ('traffic'),
('acquisition'),
('branding'),
('qualification')
),
expanded_deals(id, goal) as (
select id, unnest(deal_goal)
from deals
)
select all_goals.goal goal,
count(expanded_deals.id) n
from all_goals
left join expanded_deals using (goal)
group by all_goals.goal
SQL Demo: http://sqlfiddle.com/#!15/3f0af/20
Throw one of those into a select_rows call and you'll get your data:
Deal.connection.select_rows(%q{ SQL goes here }).each do |row|
  goal = row.first
  n    = row.last.to_i
  #....
end
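For example, wiring that into the Active Admin panel might look roughly like this (a sketch, assuming the first query above; the column titles come from the placeholders in the question and the share is computed against Deal.count):
column do
  panel "top of Goals" do
    total = Deal.count
    rows = Deal.connection.select_rows(%q{
      with expanded_deals(id, goal) as (
        select id, unnest(deal_goal) from deals
      )
      select goal, count(*) n from expanded_deals group by goal
    })
    table_for rows do
      column("Goal")                          { |row| row.first }
      column("nb of deals with this goal")    { |row| row.last.to_i }
      column("Share of deals with this goal") { |row| "#{(100.0 * row.last.to_i / total).round(1)}%" }
    end
  end
end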
There's probably a lot going on here that you're not familiar with so I'll explain a little.
First of all, I'm using WITH and Common Table Expressions (CTE) to simplify the SELECTs. WITH is a standard SQL feature that allows you to produce SQL macros or inlined temporary tables of a sort. For the most part, you can take the CTE and drop it right in the query where its name is:
with some_cte(colname1, colname2, ...) as ( some_pile_of_complexity )
select * from some_cte
is like this:
select * from ( some_pile_of_complexity ) as some_cte(colname1, colname2, ...)
CTEs are the SQL way of refactoring an overly complex query/method into smaller and easier to understand pieces.
unnest is an array function which unpacks an array into individual rows. So if you say unnest(ARRAY[1,2]), you get two rows back: 1 and 2.
VALUES in PostgreSQL is used to, more or less, generate inlined constant tables. You can use VALUES anywhere you could use a normal table, it isn't just some syntax that you throw in an INSERT to tell the database what values to insert. That means that you can say things like this:
select * from (values (1), (2)) as dt
and get the rows 1 and 2 out. Throwing that VALUES into a CTE makes things nice and readable and makes it look like any old table in the final query.

How to migrate a complex Rails database to use UUID primary keys in PostgreSQL

I have a database I would like to convert to use UUIDs as the primary key in PostgreSQL.
I have roughly 30 tables with deep multi-level associations. Is there an 'easy' way to convert all current IDs to UUIDs?
From this: https://coderwall.com/p/n_0awq, I can see that I could alter the table in migration. I was thinking something like this:
for client in Client.all
  # Retrieve children
  underwritings = client.underwritings

  # Change primary key
  execute 'ALTER TABLE clients ALTER COLUMN id TYPE uuid;'
  execute 'ALTER TABLE clients ALTER COLUMN id SET DEFAULT uuid_generate_v1();'

  # Get new id - is this already generated?
  client_id = client.id

  for underwriting in underwritings
    locations = underwriting.locations
    other_records = underwriting.other_records...

    execute 'ALTER TABLE underwritings ALTER COLUMN id TYPE uuid;'
    execute 'ALTER TABLE underwritings ALTER COLUMN id SET DEFAULT uuid_generate_v1();'

    underwriting.client_id = client_id
    underwriting.save
    underwriting_id = underwriting.id

    for location in locations
      buildings = location.buildings

      execute 'ALTER TABLE locations ALTER COLUMN id TYPE uuid;'
      execute 'ALTER TABLE locations ALTER COLUMN id SET DEFAULT uuid_generate_v1();'

      location.underwriting_id = underwriting_id
      location.save
      location_id = location.id

      for building in buildings
        ...
      end
    end

    for other_record in other_records
      ...
    end
    ...
    ...
  end
end
Questions:
Will this work?
Is there an easier way to do this?
Will child records be retrieved properly as long as they are retrieved before the primary key is changed?
Will the new primary key be already generated as soon as the alter table is called?
Thanks very much for any help or tips in doing this.
I found these conversions quite tedious. It is possible to use direct queries to PostgreSQL to convert a table with existing data.
For primary key:
ALTER TABLE students
ALTER COLUMN id DROP DEFAULT,
ALTER COLUMN id SET DATA TYPE UUID USING (uuid(lpad(replace(text(id),'-',''), 32, '0'))),
ALTER COLUMN id SET DEFAULT uuid_generate_v4()
For other references:
ALTER TABLE students
ALTER COLUMN city_id SET DATA TYPE UUID USING (uuid(lpad(replace(text(city_id),'-',''), 32, '0')))
The above left-pads the integer value with zeros and converts it to a UUID. This approach does not require id mapping, and if needed the old id can be retrieved.
As there is no data copying, this approach works quite fast.
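For example, since the conversion just zero-pads the decimal digits of the old id, the original integer can be read straight back out of the UUID (a sketch against the students table above; only valid for rows converted this way, not for freshly generated uuid_generate_v4() values):
SELECT replace(id::text, '-', '')::bigint AS old_id
FROM   students;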
To handle these and the more complicated case of polymorphic associations, please use https://github.com/kreatio-sw/webdack-uuid_migration. This gem adds helpers to ActiveRecord::Migration to ease these migrations.
I think trying to do something like this through Rails would just complicate matters. I'd ignore the Rails side of things completely and just do it in SQL.
Your first step is to grab a complete backup of your database. Then restore that backup into another database to:
Make sure that your backup works.
Give you a realistic playpen where you can make mistakes without consequence.
First you'd want to clean up your data by adding real foreign keys to match all your Rails associations. There's a good chance that some of your FKs will fail; if they do, you'll have to clean up your broken references.
Now that you have clean data, rename all your tables to make room for the new UUID versions. For a table t, we'll refer to the renamed table as t_tmp. For each t_tmp, create another table to hold the mapping from the old integer ids to the new UUID ids, something like this:
create table t_id_map (
  old_id integer not null,
  new_id uuid not null default uuid_generate_v1()
)
and then populate it:
insert into t_id_map (old_id)
select id from t_tmp
And you'll probably want to index t_id_map.old_id while you're here.
This gives us the old tables with integer ids and a lookup table for each t_tmp that maps the old id to the new one.
Now create the new tables with UUIDs replacing all the old integer and serial columns that held ids; I'd add real foreign keys at this point as well; you should be paranoid about your data: broken code is temporary, broken data is usually forever.
Populating the new tables is pretty easy at this point: simply use insert into ... select ... from constructs and JOIN to the appropriate t_id_map tables to map the old ids to the new ones. Once the data has been mapped and copied, you'll want to do some sanity checking to make sure everything still makes sense. Then you can drop your t_tmp and t_id_map tables and get on with your life.
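As a sketch of that copy step (the table and column names here are made up, following the t / t_tmp / t_id_map convention above), a child table that references a parent could be filled like this:
INSERT INTO child (id, parent_id, name)
SELECT cm.new_id       -- new UUID primary key
     , pm.new_id       -- foreign key remapped through the parent's map
     , ct.name
FROM   child_tmp ct
JOIN   child_id_map  cm ON cm.old_id = ct.id
JOIN   parent_id_map pm ON pm.old_id = ct.parent_id;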
Practice that process on a copy of your database, script it up, and away you go.
You would of course want to shut down any applications that access your database while you're doing this work.
I didn't want to add foreign keys, and I wanted to use a Rails migration. Anyway, here is what I did, in case others are looking to do this (example for 2 tables; I did 32 in total):
def change
  execute 'CREATE EXTENSION "uuid-ossp";'

  execute <<-SQL
    ALTER TABLE buildings ADD COLUMN guid uuid DEFAULT uuid_generate_v1() NOT NULL;
    ALTER TABLE buildings ALTER COLUMN guid SET DEFAULT uuid_generate_v1();
    ALTER TABLE buildings ADD COLUMN location_guid uuid;

    ALTER TABLE clients ADD COLUMN guid uuid DEFAULT uuid_generate_v1() NOT NULL;
    ALTER TABLE clients ALTER COLUMN guid SET DEFAULT uuid_generate_v1();
    ALTER TABLE clients ADD COLUMN agency_guid uuid;
    ALTER TABLE clients ADD COLUMN account_executive_guid uuid;
    ALTER TABLE clients ADD COLUMN account_representative_guid uuid;
  SQL

  for record in Building.all
    location = record.location
    record.location_guid = location.guid
    record.save
  end

  for record in Client.all
    agency = record.agency
    record.agency_guid = agency.guid

    account_executive = record.account_executive
    record.account_executive_guid = account_executive.guid unless account_executive.blank?

    account_representative = record.account_representative
    record.account_representative_guid = account_representative.guid unless account_representative.blank?

    record.save
  end

  execute <<-SQL
    ALTER TABLE buildings DROP CONSTRAINT buildings_pkey;
    ALTER TABLE buildings DROP COLUMN id;
    ALTER TABLE buildings RENAME COLUMN guid TO id;
    ALTER TABLE buildings ADD PRIMARY KEY (id);
    ALTER TABLE buildings DROP COLUMN location_id;
    ALTER TABLE buildings RENAME COLUMN location_guid TO location_id;

    ALTER TABLE clients DROP CONSTRAINT clients_pkey;
    ALTER TABLE clients DROP COLUMN id;
    ALTER TABLE clients RENAME COLUMN guid TO id;
    ALTER TABLE clients ADD PRIMARY KEY (id);
    ALTER TABLE clients DROP COLUMN agency_id;
    ALTER TABLE clients RENAME COLUMN agency_guid TO agency_id;
    ALTER TABLE clients DROP COLUMN account_executive_id;
    ALTER TABLE clients RENAME COLUMN account_executive_guid TO account_executive_id;
    ALTER TABLE clients DROP COLUMN account_representative_id;
    ALTER TABLE clients RENAME COLUMN account_representative_guid TO account_representative_id;
  SQL
end

How to efficiently search for last record matching a condition in Rails and PostgreSQL?

Suppose you want to find the last record entered into the database (highest ID) matching a string: Model.where(:name => 'Joe'). There are 100,000+ records. There are many matches (say thousands).
What is the most efficient way to do this? Does PostgreSQL need to find all the records, or can it just find the last one? Is this a particularly slow query?
Working in Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.
The important part here is to have a matching index. You can try this small test setup:
Create schema x for testing:
-- DROP SCHEMA x CASCADE; -- to wipe it all for a retest or when done.
CREATE SCHEMA x;
CREATE TABLE x.tbl(id serial, name text);
Insert 10000 random rows:
INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000);
Insert another 10000 rows with repeating names:
INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20;
Delete random 10% to make it more real life:
DELETE FROM x.tbl WHERE random() < 0.1;
ANALYZE x.tbl;
Query can look like this:
SELECT *
FROM x.tbl
WHERE name = 'y17'
ORDER BY id DESC
LIMIT 1;
--> Total runtime: 5.535 ms
CREATE INDEX tbl_name_idx on x.tbl(name);
--> Total runtime: 1.228 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id);
--> Total runtime: 0.053 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
--> Total runtime: 0.048 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_idx on x.tbl(name);
CLUSTER x.tbl using tbl_name_idx;
--> Total runtime: 1.144 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
CLUSTER x.tbl using tbl_name_id_idx;
--> Total runtime: 0.047 ms
Conclusion
With a fitting index, the query performs more than 100x faster.
Top performer is a multicolumn index with the filter column first and the sort column last.
Matching sort order in the index helps a little in this case.
Clustering helps with the simple index, because many rows still have to be read from the table, and these can be found in adjacent blocks after clustering. It doesn't help with the multicolumn index in this case, because only one record has to be fetched from the table.
Read more about multicolumn indexes in the manual.
All of these effects grow with the size of the table. 10000 rows of two tiny columns is just a very small test case.
You can put the query together in Rails and the ORM will write the proper SQL:
Model.where(:name=>"Joe").order('created_at DESC').first
This should not result in retrieving all Model records, nor even a table scan.
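That holds only if there is an index covering both the filter and the sort; as the test results above show, a multicolumn index fits best. A Rails migration for it might look like this (a sketch; the models table name and the migration class are placeholders):
class AddNameCreatedAtIndexToModels < ActiveRecord::Migration
  def self.up
    # matches Model.where(:name => 'Joe').order('created_at DESC').first
    add_index :models, [:name, :created_at]
  end

  def self.down
    remove_index :models, :column => [:name, :created_at]
  end
end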
This is probably the easiest:
SELECT [columns] FROM [table] WHERE [criteria] ORDER BY [id column] DESC LIMIT 1
Note: Indexing is important here. A huge DB will be slow to search no matter how you do it if you're not indexing the right way.
