How to capture regex first occurrence and interpolate into string with PostgreSQL - ruby-on-rails

I'm trying to concatenate the digits from a string that starts with 'CityName' into a separate string. I have the concatenation part. My issue is being able to access the matches from the regex.
I have a regex in Rails that looks like /CityName\s*(\d+)/i. I'm super new to regex and it's hard for me to wrap my head around the docs, but I'm assuming that this regex will find any digits after CityName, matching case-insensitively. The result is then interpolated if it matches an attribute on my model.
regex = /CityName\s*(\d+)/i
if line_1 =~ regex
  "C#{$1}"
  ...
end
But further along in the execution, it's slowing down because I have to iterate over a lot of records. I have a query in psql that will do the calculations that I need; however, I'm having a hard time implementing this regex replacement. My attempts so far look like:
CASE
when addr.line_1 ~* 'CityName\s*(\d+)' then 'C' || regex_matches('CityName\s*(\d+)')[0]
...
I'm having a hard time finding a solution to grab the first occurrence of the regex match. Thanks for any tips :D
EDIT: I am trying to grab the digits after 'CityName' from a string if that string contains 'CityName'
Ultimately I need assistance with the regex and how to concatenate the digits with 'C'.

Your question is a bit unclear. Are you trying to add the digits to your selection or to filter records based on them?
If you just want to select them:
Address.select(%q{(regexp_matches(addr.line_1, 'CityName\s*(\d+)'))[1] as digits})
.map(&:digits)
If you want to filter based on them:
Address.where(%q{addr.line_1 ~ 'CityName\s*(\d+)'})
.map(&:line_1)
Also a few notes:
Selecting digits case-insensitively does not really make sense. Digits
do not have case.
PostgreSQL arrays start from 1 instead of 0.

It seems you need a subquery or a WITH query:
SELECT tbl1.col1, sum(...), min(...) FROM (SELECT ..., CASE ...yourregex stuff... END col1 FROM ...) tbl1 GROUP BY 1;
WITH tbl1 AS (SELECT ..., CASE ...yourregex stuff... END col1 FROM ...) SELECT t.col1, sum(...) FROM tbl1 t GROUP BY 1;
If you need them regularly, you can also create a view from the query or create a temp table, and then use it in later queries.

Got it! Was able to finally start to figure out the regex.
WHEN addr.line_1 ~* '(?i)CityName\s*(\d+)' THEN 'C' || (SELECT (regexp_matches(addr.line_1, '(?i)CityName\s*(\d+)'))[1])
The (?i) allowed for case insensitive matching for CityName and then the concatenation worked. Thank you #ti6on for pointing out the index difference with postgres :D
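For comparison, here is a minimal plain-Ruby sketch of the same logic, which can be handy for checking the SQL result against a few sample values. The helper name city_code and the sample strings are made up for illustration; the regex is the one from the question. String#[] with a regex and a capture index returns the first match's capture group, or nil when nothing matches.

```ruby
# Regex from the question: "CityName", optional whitespace, then digits.
CITY_REGEX = /CityName\s*(\d+)/i

# Hypothetical helper: returns "C<digits>" for the first match, or nil.
def city_code(line_1)
  digits = line_1.to_s[CITY_REGEX, 1] # capture group 1 of the first match
  digits && "C#{digits}"
end

puts city_code("cityname 42 Main St").inspect # first occurrence only
puts city_code("12 Other Road").inspect       # no match
```

Note that Ruby capture groups are numbered from 1 here as well, which lines up with the [1] index used on the regexp_matches result in the SQL version.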

Related

Rails, Postgres 12, Query where pattern matches regex, and contains substring

I have a field in the database which contains strings that look like: 58XBF2022L1001390. I need to be able to query results which match the last letter (in this case 'L'), and match or resemble the last four digits.
The regular expression I've been using to find records which match the structure is: \d{2}[A-Z]{3}\d{4}[A-Z]\d{7}. So far I've tried using a scope to refine the results, but I'm not getting any results. Here's my scope:
def self.filter_by_shortcode(input)
q = input
starting = q.slice!(0)
ending = q
where("field ~* ?", "\d{2}[A-Z]{3}\d{4}/[#{starting}]/\d{3}[#{ending}]\g")
end
Here are some more example strings, and the substring that we would be looking for. Not every string stored in this database field matches this format, so we would need to be able to first match the string using the regex provided, then search by substring.
36GOD8837G6154231
G4231
13WLF8997V2119371
V9371
78FCY5027V4561374
V1374
06RNW7194P2075353
P5353
57RQN0368Y9090704
Y0704
edit: added some more examples as well as substrings that we would need to search by.
I do not know Rails, but the SQL for what you want is relatively simple. Since your string is in a fixed format, once that format is validated, simple concatenation of substrings gives your desired result.
with base(target, goal) as
( values ('36GOD8837G6154231', 'G4231')
, ('13WLF8997V2119371', 'V9371')
, ('78FCY5027V4561374', 'V1374')
, ('06RNW7194P2075353', 'P5353')
, ('57RQN0368Y9090704', 'Y0704')
)
select substr(target,10,1) || substr(target,14,4) target, goal
from base
where target ~ '^\d{2}[A-Z]{3}\d{4}[A-Z]\d{7}$';
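The same validate-then-slice idea can be sketched in plain Ruby, for example to compute the short code of a value before querying. The method name shortcode is hypothetical; the format regex and sample strings come from the question. Ruby string indexes start at 0, so SQL's substr(target, 10, 1) corresponds to target[9] and substr(target, 14, 4) to target[13, 4].

```ruby
# Fixed format from the question: 2 digits, 3 letters, 4 digits, 1 letter, 7 digits.
SHORTCODE_FORMAT = /\A\d{2}[A-Z]{3}\d{4}[A-Z]\d{7}\z/

# Hypothetical helper: returns letter + last four digits, or nil if the
# string does not match the fixed format.
def shortcode(target)
  return nil unless SHORTCODE_FORMAT.match?(target.to_s)
  target[9] + target[13, 4]
end

puts shortcode("36GOD8837G6154231").inspect
puts shortcode("not-a-code").inspect
```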

I want to search for 2 characters, is there any solution? (trigram index only works for minimum 3 characters)

I've got a trigram index on first and last name columns and I want to search for 2 characters. I tried to execute the query with sequential scans turned on or off and measure how long it takes in each case but it takes a lot of time in both. Is there a solution for my 2 characters search to work faster?
In schema I have :
t.index "((((first_name)::text || ' '::text) || (last_name)::text)) gin_trgm_ops", name: "index_users_full_name", using: :gin
I don't think there is anything built in to do this. You could create a function that turns text into an array of bigrams (deciding what to do with spaces, punctuation, non-ASCII, etc.) and then index that array and query with && operator. But unless your two characters are something like 'qz', you might not get enough selectivity to make an index worthwhile.
create function bigram(text) returns text[] language sql as $$
select array_agg(substring($1,x,2)) from generate_series(1,length($1)-1) f(x);
$$ immutable;
create index on foobar using gin (bigram(t));
select * from foobar where bigram(t) && bigram('aa');
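To see what the bigram function produces, here is a small plain-Ruby sketch of the same idea: split the text into overlapping two-character slices and test whether two strings share any of them, which is what the && (array overlap) operator checks. The method names are made up for illustration, and like the SQL version this sketch does nothing special with spaces, punctuation or case.

```ruby
# Overlapping two-character slices of the text, in order.
def bigrams(text)
  text.chars.each_cons(2).map(&:join)
end

# Rough equivalent of bigram(a) && bigram(b) in the SQL above.
def share_bigram?(a, b)
  (bigrams(a) & bigrams(b)).any?
end

puts bigrams("hello").inspect
puts share_bigram?("isaac", "aa")
```

One difference to be aware of: for a string shorter than two characters this sketch returns an empty array, whereas array_agg over zero rows yields NULL in PostgreSQL.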

How to store regex or search terms in Postgres database and evaluate in Rails Query?

I am having trouble with a DB query in a Rails app. I want to store various search terms (say 100 of them) and then evaluate against a value dynamically. All the examples of SIMILAR TO or ~ (regex) in Postgres I can find use a fixed string within the query, while I want to look the query up from a row.
Example:
Table: Post
column term varchar(256)
(plus regular id, Rails stuff etc)
input = "Foo bar"
Post.where("term ~* ?", input)
So term is VARCHAR column name containing the data of at least one row with the value:
^foo*$
Unless I put an exact match (e.g. "Foo bar" in term) this never returns a result.
I would also like to ideally use expressions like
(^foo.*$|^second.*$)
i.e. multiple search terms as well, so it would match with 'Foo Bar' or 'Search Example'.
I think this is to do with Ruby or ActiveRecord stripping down something? Or I'm on the wrong track and can't use regex or SIMILAR TO with row data values like this?
Alternative suggestions on how to do this also appreciated.
The Postgres regular expression match operators have the regex on the right and the string on the left. See the examples: https://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
But in your query you're treating term as the string and the 'Foo bar' as the regex (you've swapped them). That's why the only term that matches is the exact match. Try:
Post.where("? ~* term", input)
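The operand order can be illustrated in plain Ruby, treating each stored term as the pattern and the input as the string being tested. This is only a sketch: matching_terms is a hypothetical helper, and Ruby's regex engine is not identical to PostgreSQL's, although these simple patterns behave the same way in both. It also shows that the example term ^foo*$ would not match 'Foo bar' even with the order fixed, because foo* means "fo" followed by zero or more "o", anchored to the end of the string.

```ruby
# Returns the stored terms whose pattern matches the input, ignoring case
# (roughly what "? ~* term" does row by row).
def matching_terms(terms, input)
  terms.select { |term| Regexp.new(term, Regexp::IGNORECASE).match?(input) }
end

terms = ["^foo*$", "^foo.*$", "(^foo.*$|^second.*$)"]
puts matching_terms(terms, "Foo bar").inspect
```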

Normalize seeking value in SQL search query

[PostgreSQL(9.4), Rails(4.1)]
The problem:
I have a table with the names of tools. The name column is of hstore type and looks like this: name -> ('en': value, 'de': value). Worth noting that 'de' is unnecessary in this problem, because all names are stored only in the 'en' key.
Next I have to construct a search query that will find the right record, but the format of the text in the query is unknown, e.g.:
In DB:
WQXZ 123GT, should match query: WQXZ_123-GT
In DB:
Three Words Name 123-D45, should match query: Three_WORDS_NAME 123D45
and so on...
Solution:
To make this happen I want to normalize both the stored value and the query so that they end up identical. To do this I need to downcase both values, remove all whitespace, and remove all non-alphanumeric characters, so the values above will be:
wqxz123gt == wqxz123gt
and
threewordsname123d45 == threewordsname123d45
I have no problem formatting a search value in Ruby:
"sTR-in.g24 3".downcase.gsub(/\s/, "").gsub(/\W/, "") # => "string243"
But I can't understand how to do this in SQL-search query to look like:
Tool.where("CODE_I_AM_LOOKING_FOR(name -> 'en') = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, ""))
Thank you for your time.
UPD: I can make a downcase in query:
Tool.where("lower(name -> 'en') = (?)", value.downcase)
But it solves only a part of the problem (downcase). The whitespaces and non-word characters (dots, dashes, underscores, etc.) are still an issue.
You can use Postgres replace function to remove spaces. Then use lower function to match on that value. Like this.
Tool.where("lower(replace(name -> 'en', ' ', '')) = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, "") )
I hope this would be helpful.
Nitin Srivastava's answer pointed me in the right direction. All I needed was to use the regexp_replace function.
So the proper query is:
Tool.where(
  "lower(regexp_replace((name -> 'en'), '[^a-zA-Z0-9]+', '', 'g')) = ?",
  # Strip everything except a-z and 0-9; /\W/ alone would keep underscores.
  value.downcase.gsub(/[^a-z0-9]/, "")
)
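The Ruby side of the normalization can be sketched as a small helper and checked against the examples from the question. One caveat worth verifying: Ruby's \W treats the underscore as a word character, so gsub(/\W/, "") leaves underscores in place, while the SQL pattern '[^a-zA-Z0-9]+' removes them. The hypothetical helper below (normalize_name is not from the original post) strips everything outside a-z and 0-9 so that both sides agree.

```ruby
# Lowercase, then drop every character that is not a-z or 0-9.
def normalize_name(value)
  value.to_s.downcase.gsub(/[^a-z0-9]/, "")
end

puts normalize_name("WQXZ_123-GT")
puts normalize_name("Three Words Name 123-D45")

# For contrast: \W keeps the underscore.
puts "Three_WORDS".downcase.gsub(/\W/, "")
```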

Configure Sphinx to index dash and search it with and without it

I have a record
Item id: 1, name: "wd-40"
How do I configure Sphinx to match this record on the following queries:
Item.search("wd40")
Item.search("wd-40")
To answer your title question, charset_table is what you want.
http://sphinxsearch.com/docs/current.html#charsets
But that doesn't actually solve the problem of matching both of those queries: indexing the '-' character wouldn't work, it would just give you the inverse problem.
Instead, you probably want ignore_chars
http://sphinxsearch.com/docs/current.html#conf-ignore-chars
First indexing:
By default, only ascii characters are indexed by Sphinx; the others are considered word separators. To fix that, you need to use the charset_table parameter to map the dash to the dash character.
Second searching:
AFAIK, it is not possible to make Sphinx treat both searches the way you are asking for. However, you can just use something like:
# in Python, but I believe it is understandable
query = word
if '-' in word:
    query += " | " + word.replace('-', '')
Item.search(query)  # if word = 'wd-40', query = 'wd-40 | wd40'
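Since the question uses Ruby, the same query-building step might look like this there. It is only a sketch: dash_variants_query is a hypothetical helper name, and it assumes, as the snippet above does, that the search call accepts the | (OR) operator in the query string.

```ruby
# Builds "word | word-without-dashes" when the word contains a dash,
# otherwise returns the word unchanged.
def dash_variants_query(word)
  return word unless word.include?("-")
  "#{word} | #{word.delete('-')}"
end

puts dash_variants_query("wd-40")
puts dash_variants_query("wd40")
```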
