Ruby on Rails' alphabetical order method doesn't place word boundary before "a"? [duplicate]

I use PostgreSQL 9.3.3 and have a table with one column named title (character varying(50)).
When I execute the following query:
select * from test
order by title asc
I got the following results:
#
A
#Example
Why is "#Example" in the last position? In my opinion, "#Example" should be in the second position.

Sort behaviour for text (including char and varchar as well as the text type) depends on the current collation of your locale.
See previous closely related questions:
PostgreSQL Sort
https://stackoverflow.com/q/21006868/398670
If you want a simplistic sort by ASCII value, rather than a properly localized sort following your local language rules, you can use the COLLATE clause:
select *
from test
order by title COLLATE "C" ASC
or change the database collation globally (requires dump and reload, or full reindex). On my Fedora 19 Linux system, I get the following results:
regress=> SHOW lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#'))
SELECT title FROM v ORDER BY title ASC;
title
-------
#
a
#a
a#
a#a
(5 rows)
regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#'))
SELECT title FROM v ORDER BY title COLLATE "C" ASC;
title
-------
#
#a
a
a#
a#a
(5 rows)
PostgreSQL uses your operating system's collation support, so results can vary slightly from host OS to host OS. In particular, at least some versions of Mac OS X have significantly broken Unicode collation handling.
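As a point of comparison, here is a minimal Python sketch of what a plain byte-order sort does with the same five titles. Python's built-in sorted() compares strings by code point, which for these ASCII values behaves like the COLLATE "C" query above; the variable names are only illustrative.

```python
# Same sample titles as in the WITH v(title) example above.
titles = ["#a", "a", "#", "a#a", "a#"]

# sorted() compares strings code point by code point, so '#' (0x23)
# sorts before 'a' (0x61), much like ORDER BY title COLLATE "C".
by_code_point = sorted(titles)

print(by_code_point)  # ['#', '#a', 'a', 'a#', 'a#a']
```

This reproduces the COLLATE "C" result, not the en_US.UTF-8 one; a locale-aware sort would need the operating system's collation rules.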

It seems that, when sorting, Oracle as well as Postgres simply ignore non-alphanumeric characters, e.g.
select '*'
union all
select '#'
union all
select 'A'
union all
select '*E'
union all
select '*B'
union all
select '#C'
union all
select '#D'
order by 1 asc
returns (note that the DBMS pays no attention to the prefix before 'A'..'E')
*
#
A
*B
#C
#D
*E
In your case, what Postgres actually sorts is
'', 'A' and 'Example'
If you put '#' in the middle of the string, the behaviour is the same:
select 'A#B'
union all
select 'AC'
union all
select 'A#D'
union all
select 'AE'
order by 1 asc
returns (# is ignored, so 'AB', 'AC', 'AD' and 'AE' are what actually get compared)
A#B
AC
A#D
AE
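The two result sets above can be reproduced with a rough Python model that sorts on the letters and digits only. This is an illustration of the observation, not the actual collation algorithm: real locale collations use multi-level weights, and the helper name below is made up for this sketch.

```python
def letters_and_digits_only(value):
    """Hypothetical helper: drop every character that is not a letter or digit."""
    return "".join(ch for ch in value if ch.isalnum())

first_example = ["*", "#", "A", "*E", "*B", "#C", "#D"]
second_example = ["A#B", "AC", "A#D", "AE"]

# Python's sort is stable, so values with an equal key ('*' and '#'
# both reduce to '') keep their input order.
first_sorted = sorted(first_example, key=letters_and_digits_only)
second_sorted = sorted(second_example, key=letters_and_digits_only)

print(first_sorted)   # ['*', '#', 'A', '*B', '#C', '#D', '*E']
print(second_sorted)  # ['A#B', 'AC', 'A#D', 'AE']
```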
To change the comparison rules, use a collation, e.g.
select '#' collate "POSIX"
union all
select 'A' collate "POSIX"
union all
select '#Example' collate "POSIX"
order by 1 asc
returns (as required in your case)
#
#Example
A

Related

Change the ordering of NULL values in all table columns postgresql call

I am using Rails with PostgreSQL to populate a DataTable and was wondering if there is a way to change the default behaviour of NULLs sorting as the highest value (coming after large numbers) so that they instead sort as if they were lower than 0. From what I understand this is built-in PostgreSQL behaviour, so I think I will have to do this in the SQL call. I need to apply this to all columns so that it works with the DataTables ASC/DESC sort functionality.
For example, something similar to:
def get_raw_records
Analytics::Database.where(id: 7).order('give nulls < 0 value here for all columns?')
end
NULLS FIRST/LAST does not give me this functionality. I need something like coalesce that ideally does not return a sorted result but changes the default behaviour of NULLs being placed after large values when the data is sorted client-side.
You can use
order by coalesce(col_name,-1)
Close to what Sabari suggested:
db=# with c(v) as (values('1'),('a'),(null),(null),('b'))
select * from c
order by coalesce(v,'-infinity') asc;
v
---
1
a
b
(5 rows)
db=# with c(v) as (values('1'),('a'),(null),(null),('b'))
select * from c
order by coalesce(v,'-infinity') desc;
v
---
b
a
1
(5 rows)
Instead of a fixed integer below zero I use -infinity here, which works fine with text (not because text understands infinity, but because - sorts before [a-z] and [0-9]) but would not work with a plain integer. Of course, you can cast to float:
db=# with c(v) as (values(1::float),(3),(null),(null),(-9))
select * from c
order by coalesce(v,'-infinity') desc;
v
----
3
1
-9
(5 rows)
db=# with c(v) as (values(1::float),(3),(null),(null),(-9))
select * from c
order by coalesce(v,'-infinity') asc;
v
----
-9
1
3
(5 rows)
which is dangerous in itself (though not for sorting) and ugly. That leads to the answer: I don't see a good solution off the top of my head. You would do better to define separate boundary values for different data types.
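A small runnable sketch of the coalesce idea for an integer column follows. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the scores table and its values are invented for the illustration. With COALESCE(v, -1) as the sort key, NULLs behave like -1 in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (v INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores (v) VALUES (?)",
    [(3,), (None,), (0,), (7,), (None,)],
)

# NULL is replaced by -1 for sorting only; the selected value stays NULL.
ascending = [row[0] for row in conn.execute(
    "SELECT v FROM scores ORDER BY COALESCE(v, -1) ASC")]
descending = [row[0] for row in conn.execute(
    "SELECT v FROM scores ORDER BY COALESCE(v, -1) DESC")]

print(ascending)   # [None, None, 0, 3, 7]
print(descending)  # [7, 3, 0, None, None]
```

The substitute value has to be lower than every real value in the column, which is why a single constant does not carry over to all data types.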
In Postgres you can specify in the order by clause how you want NULL sorted:
select 1 as col UNION select 2 UNION SELECT NULL ORDER BY col ASC NULLS FIRST;
You can specify either NULLS FIRST or NULLS LAST.
See https://www.postgresql.org/docs/current/static/queries-order.html

Find records with ID in array of IDS and keep the order of records matching that of IDs [duplicate]

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with VALUES (), () (introduced in PostgreSQL 8.2).
The syntax looks like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
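The (id, ordering) pairs do not have to be typed by hand. Below is a hedged sketch that builds them from a Python list and runs the join against an in-memory SQLite database; SQLite is only a stand-in here, so the VALUES list is placed in a CTE instead of a derived table with a column alias list, and the sample comments table is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 6)],
)

wanted = [1, 3, 2, 4]

# One "(?, ?)" group per id; the parameters alternate id, position.
pairs = ", ".join("(?, ?)" for _ in wanted)
params = [value
          for position, comment_id in enumerate(wanted, start=1)
          for value in (comment_id, position)]

sql = f"""
WITH x(id, ordering) AS (VALUES {pairs})
SELECT c.id
FROM comments c
JOIN x ON c.id = x.id
ORDER BY x.ordering
"""
ordered_ids = [row[0] for row in conn.execute(sql, params)]
print(ordered_ids)  # [1, 3, 2, 4]
```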
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Just because it is so difficult to find and deserves to be spread: in MySQL this can be done much more simply, but I don't know if it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
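On the verbosity point: the CASE expression can be generated from the id list rather than written out. The sketch below is one possible way to do that in Python, run against SQLite as a stand-in; case_order_clause is a made-up helper, and the column name passed to it is interpolated into the SQL, so it must be a trusted identifier. It says nothing about performance on large datasets.

```python
import sqlite3

def case_order_clause(column, ids):
    """Hypothetical helper: return (sql_fragment, params) for
    CASE <column> WHEN id THEN position ... END."""
    whens = " ".join("WHEN ? THEN ?" for _ in ids)
    params = [value
              for position, item in enumerate(ids, start=1)
              for value in (item, position)]
    return f"CASE {column} {whens} END", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 6)],
)

wanted = [1, 3, 2, 4]
in_marks = ", ".join("?" for _ in wanted)
order_sql, order_params = case_order_clause("id", wanted)

# Placeholders are bound in order of appearance: IN list first, then CASE.
sql = f"SELECT id FROM comments WHERE id IN ({in_marks}) ORDER BY {order_sql}"
ordered_ids = [row[0] for row in conn.execute(sql, wanted + order_params)]
print(ordered_ids)  # [1, 3, 2, 4]
```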
To do this, I think you should probably have an additional "ORDER" table which defines the mapping of IDs to order (effectively doing what your response to your own question said), which you can then use as an additional column on your select which you can then sort on.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
sans SEQUENCE, works only on 8.4:
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(',' || "comments"."id"::text || ',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built in to 8.3, but you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any type:
select unnest(array['John','Paul','George','Ringo']) as beatle
select unnest(array[1,3,2,4]) as id
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a field called ids;
ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.
Let's get a visual impression of what has already been said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit @user80168
I agree with all the other posters who say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, then add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. "ORDER BY comments.sort DESC". If you want to sort these in a different order every time then... SQL won't be for you in this case.

Solving a PG::GroupingError: ERROR

The following code gets all the residences which have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: {amenity_id: id_list}).
references(:listed_amenities).
group(:residence_id).
having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
where(listed_amenities: { amenity_id: id_list }).
group('residences.id').
having('COUNT(*) = ?', id_list.size)
The query that the Ruby code expands to selects all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual: "When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column."
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
);
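To see the subquery form in action, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The schema and rows are invented for the illustration, the LIMIT is left out so that every matching residence is returned, and it assumes each (residence_id, amenity_id) pair occurs at most once; with duplicates, COUNT(DISTINCT amenity_id) would be the safer test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residences (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE listed_amenities (residence_id INTEGER, amenity_id INTEGER)")
conn.executemany("INSERT INTO residences VALUES (?, ?)",
                 [(1, "Alpha"), (2, "Beta"), (3, "Gamma")])
conn.executemany("INSERT INTO listed_amenities VALUES (?, ?)", [
    (1, 48), (1, 49),           # has both wanted amenities
    (2, 48),                    # has only one of them
    (3, 48), (3, 49), (3, 50),  # has both, plus an extra one
])

id_list = [48, 49]
marks = ", ".join("?" for _ in id_list)

# Group only inside the subquery, so the outer SELECT can return
# every column of residences without a grouping error.
sql = f"""
SELECT residences.*
FROM residences
WHERE id IN (SELECT residence_id
             FROM listed_amenities
             WHERE amenity_id IN ({marks})
             GROUP BY residence_id
             HAVING COUNT(*) = ?)
ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(sql, id_list + [len(id_list)])]
print(matching_ids)  # [1, 3]
```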

Faster search for records where 1st character of field doesn't match [A-Za-z]?

I currently have the following:
User (id, fname, lname, deleted_at, guest)
I can query for a list of users by their fname initial like so:
User Load (9.6ms) SELECT "users".* FROM "users" WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) = 's') ORDER BY fname ASC LIMIT 25 OFFSET 0
This is fast thanks to the following index:
CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)), fname)
WHERE deleted_at IS NULL;
What I want to do now is query for all Users whose fname does not start with a letter A-Z. I got this to work like so:
SELECT "users".* FROM "users" WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) ~ E'^[^a-zA-Z].*') ORDER BY fname ASC LIMIT 25 OFFSET 0
But the problem is this query is very slow and does not appear to be using the index to speed up the first query. Any suggestions on how I can elegantly make the 2nd query (non a-z) faster?
I'm using Postgres 9.1 with rails 3.2
Thanks
Updated answer
Preceding question here.
My first idea (an index with text_pattern_ops) did not work with the regular expression in my tests. Better to rewrite your query as:
SELECT *
FROM users
WHERE deleted_at IS NULL
AND (lower(left(fname, 1)) < 'a' COLLATE "C"
OR lower(left(fname, 1)) > 'z' COLLATE "C")
ORDER BY fname
LIMIT 25 OFFSET 0;
Besides these expressions being faster in general, your regular expression also had capital letters in it, which did not match the index on lower(). And the trailing characters were pointless when comparing to a single character.
And use this index:
CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)) COLLATE "C", fname)
WHERE deleted_at IS NULL;
The COLLATE "C" part is optional and only contributes a very minor gain in performance. Its purpose is to reset collation rules to the POSIX collation, which just uses byte order and is generally faster. This is useful where collation rules are not relevant anyway.
If you create the index with it, only queries that match the collation can use it. So you might just skip it to simplify things if performance is not your paramount requirement.
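To check that the two range comparisons select the same rows as a negated character class, here is a small Python sketch. Python compares strings by code point, which stands in for the byte-order "C" collation here; the sample names and the helper name are made up, and empty strings are deliberately left out of the sample.

```python
import re

def starts_outside_a_to_z(name):
    """Range test on the lowercased first character, as in the rewritten query."""
    first = name[:1].lower()
    return first < "a" or first > "z"

# Regex counterpart: first character is anything except a-z.
not_a_to_z = re.compile(r"^[^a-z]")

names = ["sam", "Zed", "1abc", "_x", "Émile", "Bob", "#tag"]

by_range = [n for n in names if starts_outside_a_to_z(n)]
by_regex = [n for n in names if not_a_to_z.search(n[:1].lower())]

print(by_range)  # ['1abc', '_x', 'Émile', '#tag']
print(by_regex)  # ['1abc', '_x', 'Émile', '#tag']
```

Both filters agree on this sample, but only the range form is a pair of plain comparisons on the indexed expression.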
As an alternative to @ErwinBrandstetter's general solution, PostgreSQL supports partial indexes. You can say:
CREATE INDEX users_nonalphanumeric_not_deleted_key
ON users (id)
WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) ~ E'^[^a-zA-Z].*');
This index won't help for any other lookups, but it will precompute the answer for this particular query. This technique is often useful for queries that return a small, predefined subset from a much larger table, since the resulting index will disregard the vast majority of the table and contain only the rows of interest.

How can I speed up or optimize this SQLite query for iOS?

I have a pretty simple DB structure. I have 12 columns in a single table, most are varchar(<50), with about 8500 rows.
When I perform the following query on an iPhone 4, I've been averaging 2.5-3 seconds for results:
SELECT * FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20
Doesn't seem like this sort of thing should be so slow. Interestingly, the same query from the same app running on a 2nd gen iPod is faster by about 1.5 seconds. That part is beyond me.
I have other queries that have the same issue:
SELECT * FROM names WHERE SEX = ?1 AND ORIGIN = ?2 ORDER BY name COLLATE NOCASE ASC LIMIT 20
and
SELECT * FROM names WHERE name LIKE ?3 AND SEX = ?1 AND ORIGIN = ?2 ORDER BY name COLLATE NOCASE ASC LIMIT 20
etc.
I've added an index on the SQLite db: CREATE INDEX names_idx ON names (name, origin, sex, meaning) where name, origin, sex and meaning are the columns I tend to query against with WHERE and LIKE operators.
Any thoughts on improving the performance of these searches or is this about as atomic as it gets?
The index CREATE INDEX names_idx ON names (name, origin, sex, meaning) can only be used, I believe, by queries that constrain its leading column(s), starting with name. A query that filters only on SEX and ORIGIN cannot use it.
Going on your first query: SELECT * FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20 - I would suggest adding an index on name, just by itself and with the same collation as the ORDER BY, i.e. CREATE INDEX names_idx1 ON names (name COLLATE NOCASE). That should in theory speed up that query.
If you want other indexes with combined columns for other common queries, fair enough, and it may improve query speed, but remember it'll increase your database size.
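A minimal sketch for trying this out from Python's sqlite3 module: it creates an index whose collation matches the ORDER BY name COLLATE NOCASE clause, runs the first query, and fetches the query plan so you can check whether the index is picked up. The index name and the sample rows are invented, and the wording of the plan differs between SQLite versions, so the plan is only printed here, not interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (name TEXT, origin TEXT, sex TEXT, meaning TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)", [
    ("bella", "Italian", "F", "beautiful"),
    ("Aaron", "Hebrew", "M", "exalted"),
    ("carl", "German", "M", "free man"),
    ("Abby", "Hebrew", "F", "joy"),
])

# Hypothetical index: same collation as the ORDER BY clause below.
conn.execute("CREATE INDEX names_nocase_idx ON names (name COLLATE NOCASE)")

query = "SELECT name FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20"
ordered = [row[0] for row in conn.execute(query)]

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(ordered)  # ['Aaron', 'Abby', 'bella', 'carl']
print(plan)
```

Running the same EXPLAIN QUERY PLAN on the device, before and after adding an index, is a cheap way to see which indexes a given query actually uses.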
What is your most used search criterion? If you search by name, for example, you could create separate tables according to the name initials: a table for names which start with "A", etc. The same for genre. This would improve your search performance in some cases.
