Ruby variable in Postgresql Regex - ruby-on-rails

I have a table named orders with a column named comment.
I have a list of order ids, and I want to search through all the comments to see which orders reference them.
So far I have worked out the regex part of the SQL:
Order.where('comment ~ \'\s\d{5}\' OR comment ~ \'^\d{5}\'')
.where("comment LIKE '%?%'", order_id)
But I can't work out how to make the query flexible enough to find order ids whose length is something other than 5.
Is it possible to make {5} dynamic, something like {#{order_id.to_s.length}}?
UPDATE
Just ran into this post
Using a boundary match would work even better for this case.
Order.where('comment ~ \'\y?\y\'', order_id)
Would do the trick.

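We can pass digit_count as a bind parameter:
digit_count = order_id.to_s.length
Order.where('comment ~ \'\s\d{?}\' OR comment ~ \'^\d{?}\'', digit_count, digit_count)
If you are looking for a specific order_id (note the doubled backslashes: inside a double-quoted Ruby string, a single \s would become a literal space):
Order.where("comment ~ '\\s#{order_id}\\s'")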
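The boundary-match idea from the update can also be combined with a bind parameter, so the order id is never interpolated into the SQL string itself. The following is only a minimal sketch: order_id_pattern is a hypothetical helper name, and the Order model and comment column are taken from the question.

```ruby
# Hypothetical helper: builds the regex string that is passed as a bind value.
# \y is PostgreSQL's word-boundary escape, so no digit count is needed.
# Regexp.escape is a no-op for plain digits; it only guards against ids
# that happen to contain regex metacharacters.
def order_id_pattern(order_id)
  "\\y#{Regexp.escape(order_id.to_s)}\\y"
end

# Assumed usage with the Order model from the question:
#   Order.where('comment ~ ?', order_id_pattern(order_id))
```

Because the whole pattern is one bind value, the same call works for ids of any length.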

Related

How to capture regex first occurence and interpolate into string with postgreSQL

I'm trying to concatenate the digits from a string that starts with 'CityName' into a separate string. I have the concatenation part. My issue is being able to access the matches from the regex
I have a regex in Rails that looks like /CityName\s*(\d+)/i. I'm super new to regex and it's hard for me to wrap my head around the docs, but I'm assuming that this regex will find any digits after CityName, matching case-insensitively. The captured digits are then interpolated into a string if the regex matches an attribute on my model.
regex = /CityName\s*(\d+)/i
if line_1 =~ regex
"C#{$1}"
...
end
But further along in the execution, it's slowing down because I have to iterate over a lot of records. I have a query in psql that will do the calculations that I need; however, I'm having a hard time implementing this regex replacement. My attempts so far look like:
CASE
when addr.line_1 ~* 'CityName\s*(\d+)' then 'C' || regex_matches('CityName\s*(\d+)')[0]
...
I'm having a hard time finding a solution to grab the first occurrence of the regex match. Thanks for any tips :D
EDIT: I am trying to grab the digits after 'CityName' from a string if that string contains 'CityName'
Ultimately I need assistance with the regex and how to concatenate the digits with 'C'.
Your question is a bit unclear. Are you trying to add the digits to your selection or to filter records based on them?
If you just want to select them:
Address.select(%q{(regexp_matches(addr.line_1, 'CityName\s*(\d+)'))[1] as digits})
.map(&:digits)
If you want to filter based on them:
Address.where(%q{addr.line_1 ~ 'CityName\s*(\d+)'})
.map(&:line_1)
Also a few notes:
Selecting digits case-insensitively does not really make sense: digits do not have case.
PostgreSQL arrays start from 1 instead of 0.
It seems you need a subquery or a WITH query:
SELECT tbl1.col1, sum(...), min(...) FROM (SELECT ..., CASE ...yourregex stuff... END col1 FROM ...) tbl1 GROUP BY 1;
WITH tbl1 AS (SELECT ..., CASE ...yourregex stuff... END col1 FROM ...) SELECT t.col1, sum(...) FROM tbl1 t GROUP BY 1;
If you need them regularly, you can also create a view from the query or create a temp table, and then use it in later queries.
Got it! Was able to finally start to figure out the regex.
WHEN addr.line_1 ~* '(?i)CityName\s*(\d+)' THEN 'C' || (SELECT (regexp_matches(addr.line_1, '(?i)CityName\s*(\d+)'))[1])
The (?i) allowed for case-insensitive matching of CityName, and then the concatenation worked. Thank you @ti6on for pointing out the index difference with Postgres :D
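For comparison, the Ruby side of the same extraction can be written with String#[] and a capture-group index, which avoids relying on the $1 global. This is only a sketch; city_code is a hypothetical method name, and the input strings are made up for illustration.

```ruby
# Hypothetical helper mirroring the SQL CASE branch:
# returns "C" plus the digits following CityName, or nil when there is no match.
def city_code(line_1)
  # The second argument selects capture group 1. Ruby numbers groups from 1,
  # just as PostgreSQL numbers array elements from 1.
  digits = line_1[/CityName\s*(\d+)/i, 1]
  digits && "C#{digits}"
end

city_code('cityname 42 Main St')
city_code('Elsewhere 7')
```

The first call yields the prefixed digits and the second yields nil, which corresponds to falling through to the next WHEN branch in the SQL version.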

Combining distinct with another condition

I'm migrating a Rails 3.2 app to Rails 5.1 (not before time) and I've hit a problem with a where query.
The code that works on Rails 3.2 looks like this,
sales = SalesActivity.select('DISTINCT batch_id').where('salesperson_id = ?', sales_id)
sales.find_each(batch_size: 2000) do |batchToProcess|
.....
When I run this code under Rails 5.1, it raises the following error when it attempts the find_each,
ArgumentError (Primary key not included in the custom select clause):
I want to end up with an array(?) of unique batch_ids for the given salesperson_id that I can then traverse, as was working with Rails 3.2.
For reasons I don't understand, it looks like I might need to include the whole record to traverse through (my thinking being that I need to include the Primary key)?
I'm trying to rephrase the 'where', and have tried the following,
sales = SalesActivity.where(salesperson_id: sales_id).select(:batch_id).distinct
However, the combined ActiveRecordQuery applies the DISTINCT to both the salesperson_id AND the batch_id - that's #FAIL1
Also, because I'm still using a select (to let distinct know which column I want to be 'distinct') it also still only selects the batch_id column of course, which I am trying to avoid - that's #FAIL2
How can I efficiently pull all unique batch_id records for a given salesperson_id, so I can then for_each them?
Thanks!
How about:
SalesActivity.where(salesperson_id: sales_id).pluck('DISTINCT batch_id')
pluck has to come last in the chain, because it runs the query and returns a plain array of the batch_ids rather than a relation.
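Since the result of pluck is an ordinary array, batching is done with Enumerable methods rather than find_each. The sketch below imitates that with plain Ruby objects so it runs without a database: SalesRow and unique_batch_ids are hypothetical stand-ins, and the ActiveRecord call in the comment assumes the SalesActivity model from the question.

```ruby
# Stand-in for SalesActivity rows (hypothetical, for illustration only).
SalesRow = Struct.new(:salesperson_id, :batch_id)

# In-memory equivalent of:
#   SalesActivity.where(salesperson_id: sales_id).distinct.pluck(:batch_id)
def unique_batch_ids(rows, sales_id)
  rows.select { |row| row.salesperson_id == sales_id }
      .map(&:batch_id)
      .uniq
end

sample_rows = [
  SalesRow.new(1, 10), SalesRow.new(1, 10),
  SalesRow.new(1, 11), SalesRow.new(2, 12)
]

# each_slice plays the role that find_each(batch_size: ...) had for relations.
unique_batch_ids(sample_rows, 1).each_slice(2000) do |batch_ids|
  batch_ids.each { |batch_id| batch_id } # process each batch_id here
end
```

Only the batch_id values are loaded, so no primary key is needed for iteration.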

How to store regex or search terms in Postgres database and evaluate in Rails Query?

I am having trouble with a DB query in a Rails app. I want to store various search terms (say 100 of them) and then evaluate against a value dynamically. All the examples of SIMILAR TO or ~ (regex) in Postgres I can find use a fixed string within the query, while I want to look the query up from a row.
Example:
Table: Post
column term varchar(256)
(plus regular id, Rails stuff etc)
input = "Foo bar"
Post.where("term ~* ?", input)
So term is a VARCHAR column, and at least one row contains the value:
^foo*$
Unless I put an exact match (e.g. "Foo bar" in term) this never returns a result.
I would also like to ideally use expressions like
(^foo.*$|^second.*$)
i.e. multiple search terms as well, so it would match with 'Foo Bar' or 'Search Example'.
I think this is to do with Ruby or ActiveRecord stripping down something? Or I'm on the wrong track and can't use regex or SIMILAR TO with row data values like this?
Alternative suggestions on how to do this also appreciated.
The Postgres regular expression match operators have the regex on the right and the string on the left. See the examples: https://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
But in your query you're treating term as the string and the 'Foo bar' as the regex (you've swapped them). That's why the only term that matches is the exact match. Try:
Post.where("? ~* term", input)
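The operand order can be checked outside the database with plain Ruby. The sketch below treats each stored term as the pattern and the input as the subject, which is what "? ~* term" does. matching_terms and the sample terms are hypothetical, and Ruby's regex dialect differs from PostgreSQL's in some details, so this only illustrates the direction of the match.

```ruby
# Hypothetical stored search terms, as they might appear in the term column.
STORED_TERMS = ['^foo.*$', '(^foo.*$|^second.*$)', '^bar$'].freeze

# Returns the stored terms that match the input, case-insensitively.
# The term is the regex and the input is the string being tested.
def matching_terms(terms, input)
  terms.select { |term| Regexp.new(term, Regexp::IGNORECASE).match?(input) }
end

matching_terms(STORED_TERMS, 'Foo bar')
```

Swapping the roles, with the input as the regex and the term as the subject, only succeeds when the two are nearly identical, which is the behavior described in the question.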

Postgresql text searching, matching multiple words

I don't know the name for this kind of search, but I see that it's getting pretty common.
Let's say I have records with the following file names:
'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'
If I search with the string 'oc', I would like the result to return 'orders_controller_spec.rb', because it matches the o in orders and the c in controller.
If the string is 'os' then I'd like all 3 to match, 'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'.
If the string is 'oco' then I'd like 'orders_controller_spec.rb'
What is the name for this kind of search and how would I go about getting this done in Postgresql?
This is called a subsequence search. One simple way to do it in Postgres is to use the LIKE operator (or one of the other pattern-matching operators described in the PostgreSQL docs) and fill the spaces between your letters with a wildcard, which for LIKE is %. To match anything with an o followed by an s in the words column, that would look like this:
SELECT * FROM table WHERE words LIKE '%o%s%';
This is a relatively expensive search. A varchar_pattern_ops or text_pattern_ops index supports faster pattern matching, but only for patterns anchored at the start of the string (for example 'o%s%'); a pattern with a leading % still has to scan every row.
CREATE INDEX pattern_index ON table (words varchar_pattern_ops);
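When the search string comes from user input, the LIKE pattern can be assembled in Ruby and passed as a bind value. This is a minimal sketch: subsequence_like_pattern is a hypothetical helper, and it assumes the default LIKE escape character (backslash).

```ruby
# Hypothetical helper: turns "oc" into "%o%c%" for use with LIKE.
# LIKE wildcards (% and _) and the escape character in the input are
# backslash-escaped so they are matched literally.
def subsequence_like_pattern(query)
  escaped_chars = query.each_char.map do |char|
    char.gsub(/[\\%_]/) { |special| "\\#{special}" }
  end
  "%#{escaped_chars.join('%')}%"
end

# Assumed usage with a hypothetical model that has a words column:
#   Document.where('words LIKE ?', subsequence_like_pattern('oc'))
```

Escaping matters here because file names such as order_spec.rb contain underscores, which LIKE would otherwise treat as single-character wildcards.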

Configure Sphinx to index dash and search it with and without it

I have a record
Item id: 1, name: "wd-40"
How do I configure Sphinx to match this record on the following queries:
Item.search("wd40")
Item.search("wd-40")
To answer your title question, charset_table is what you want.
http://sphinxsearch.com/docs/current.html#charsets
But that doesn't actually solve the problem of matching both of those queries: indexing the - wouldn't work, it would just be the inverse of not indexing it.
Instead, you probably want ignore_chars
http://sphinxsearch.com/docs/current.html#conf-ignore-chars
First indexing:
By default, Sphinx indexes only letters, digits, and the underscore; other characters, including the dash, are treated as word separators. To fix that, you need to use the charset_table parameter to map the dash to the dash character.
Second searching:
AFAIK, it is not possible to make Sphinx treat both searches the way you are asking. However, you can just use something like:
# in Python, but I believe it is understandable
query = word
if '-' in word:
    query += " | " + word.replace('-','')
Item.search(query) # if word = 'wd-40', query = 'wd-40 | wd40'
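Since Item.search is called from Ruby, the same query expansion can be written in Ruby directly. This is only a sketch: dash_tolerant_query is a hypothetical helper name, and it assumes the search is run in a mode where | acts as the OR operator, as in the snippet above.

```ruby
# Hypothetical helper: for a word containing dashes, search for both the
# original spelling and the spelling with the dashes removed.
def dash_tolerant_query(word)
  return word unless word.include?('-')

  "#{word} | #{word.delete('-')}"
end

# Assumed usage with the Item model from the question:
#   Item.search(dash_tolerant_query('wd-40'))
```

Words without a dash are passed through unchanged, so the helper can wrap every search term.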
