Intersection of 2 mongo queries - ruby-on-rails

I want to emulate an "&" operator for searching elements in my Mongo database.
There are 4 searchable fields: name, id, tags and negative_tags.
For a match to be true, any of these fields could match.
For instance, if I search a&b, "a" could be matched in any of the 4 fields, and so could "b". However, both of them need to be matched.
I tried the following:
Model.or({:name.all => regexps}, {:id.all =>regexps}, {:tags.all => regexps}, {:negative_tags.all => regexps})
regexps is an array of regexps. For the example above it would be:
[ /a/i, /b/i ]
However, this does not behave the way I want, because all the matches then have to happen on the same field.
My other attempt was to run a separate Mongo query for each regexp and take the intersection of the result sets:
Model.or({:name.in => one_regexp}, {:id.in => one_regexp}, {:tags.in => one_regexp}, {:negative_tags.in => one_regexp})
My problem is that I am not sure how to merge the two results: Mongoid evaluates queries lazily and returns a Mongoid::Criteria object.
I'd like to know how I can take the intersection of these criteria.

There are two distinct ways to handle this. Are you trying to have both regular expressions evaluate per field, or can a be true for name and b be true for id?
If it is the latter, I would use a gem for this:
gem 'mongoid_search'
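If you'd rather avoid a gem, one alternative (my suggestion, not what this answer uses) is to build a raw MongoDB selector: an $and over the terms, where each term is an $or over the four fields, so every term must match at least one field. The helper all_terms_any_field and the SEARCH_FIELDS constant are hypothetical, and the sketch assumes the field names match the keys stored in your documents. A hash like this should be accepted by Mongoid's Model.where, but check it against your Mongoid version.

```ruby
# Hypothetical list of the searchable fields from the question.
SEARCH_FIELDS = %w[name id tags negative_tags].freeze

# Builds a MongoDB selector in which every regexp must match at least one
# of the given fields: $and over the terms, $or over the fields.
def all_terms_any_field(regexps, fields = SEARCH_FIELDS)
  {
    "$and" => regexps.map do |regexp|
      { "$or" => fields.map { |field| { field => regexp } } }
    end
  }
end

selector = all_terms_any_field([/a/i, /b/i])
# selector could then be passed to something like Model.where(selector)
```

A regex condition on an array field such as tags matches when any element of the array matches, so tags needs no special handling here.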
If it is the former, I'd simply join the array into a single regex:
combined = Regexp.new(regexps.collect { |regexp| "(?=.*#{regexp.source})" }.join, Regexp::IGNORECASE)
If what you want to do is to apply two regular expressions to each field, simply put both in non-consuming patterns and use one regular expression. This is known as a positive lookahead assertion (?=), combined with a .* prefix so that the terms can appear in either order:
/(?=.*a)(?=.*b)/
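A small Ruby sketch of the lookahead idea that runs without MongoDB. all_terms_regexp is a hypothetical helper, and it assumes every term should be case-insensitive, as in the /a/i and /b/i example; it builds from each regexp's source, so the result is a plain pattern with a single /i flag.

```ruby
# Combines several regexps into one that requires all of them to match
# somewhere in the string, in any order, via positive lookaheads.
# Assumes every term is case-insensitive: the flag comes from this call,
# not from the individual regexps.
def all_terms_regexp(regexps)
  Regexp.new(regexps.map { |regexp| "(?=.*#{regexp.source})" }.join,
             Regexp::IGNORECASE)
end

combined = all_terms_regexp([/a/i, /b/i])
combined                        # => /(?=.*a)(?=.*b)/i
combined.match?("Alpha Beta")   # => true
combined.match?("beta alpha")   # => true (order does not matter)
combined.match?("alpha only")   # => false (no "b")
```

Presumably the combined regexp would then be used as the value for each field in the Model.or(...) call from the question.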

Related

How to store regex or search terms in Postgres database and evaluate in Rails Query?

I am having trouble with a DB query in a Rails app. I want to store various search terms (say 100 of them) and then evaluate them against a value dynamically. All the examples of SIMILAR TO or ~ (regex) in Postgres I can find use a fixed pattern within the query, while I want to look the pattern up from a row.
Example:
Table: Post
column term varchar(256)
(plus regular id, Rails stuff etc)
input = "Foo bar"
Post.where("term ~* ?", input)
So term is a VARCHAR column, and at least one row contains the value:
^foo.*$
Unless I put an exact match (e.g. "Foo bar" in term) this never returns a result.
Ideally, I would also like to use expressions like
(^foo.*$|^second.*$)
i.e. multiple search terms as well, so it would match with 'Foo Bar' or 'Search Example'.
Is Ruby or ActiveRecord stripping something out? Or am I on the wrong track, and regex or SIMILAR TO can't be used with row values like this?
Alternative suggestions on how to do this are also appreciated.
The Postgres regular expression match operators have the regex on the right and the string on the left. See the examples: https://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
But in your query you're treating term as the string and the 'Foo bar' as the regex (you've swapped them). That's why the only term that matches is the exact match. Try:
Post.where("? ~* term", input)
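Here is a plain-Ruby analogue of the two operand orders, assuming three hypothetical rows in the term column. pg_match_ci is an illustrative stand-in for ~* (string on the left, case-insensitive pattern on the right); Ruby's regex dialect is not identical to PostgreSQL's, but it is close enough for patterns like these.

```ruby
# Illustrative stand-in for PostgreSQL's `string ~* pattern`.
def pg_match_ci(string, pattern)
  Regexp.new(pattern, Regexp::IGNORECASE).match?(string)
end

terms = ["^foo.*$", "^search.*$", "Foo bar"]  # hypothetical values of posts.term
input = "Foo bar"

# "term ~* ?": the input is used as the pattern, the stored terms as strings.
terms.select { |term| pg_match_ci(term, input) }  # => ["Foo bar"]

# "? ~* term": each stored term is used as the pattern against the input.
terms.select { |term| pg_match_ci(input, term) }  # => ["^foo.*$", "Foo bar"]
```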

Query against a Postgres array column type

TL;DR I'm wondering what the pros and cons are (or whether they are even equivalent) of using @> {as_champion, whatever} versus IN ('as_champion', 'whatever'). Details below:
I'm working with Rails and using Postgres' array column type, but I have to use raw SQL for my query because the Rails finder methods don't play nicely with it. I found a way that works, but I'm wondering what the preferred method is:
The roles column on the Memberships table is my array column. It was added via a Rails migration like so:
add_column :memberships, :roles, :text, array: true
When I examine the table, it shows the type as text[] (I'm not sure whether that is truly how Postgres represents an array column or just Rails shenanigans).
To query against it I do something like:
Membership.where("roles @> ?", '{as_champion, whatever}')
From the fine Array Operators manual:
Operator: @>
Description: contains
Example: ARRAY[1,4,3] @> ARRAY[3,1]
Result: t (AKA true)
So @> treats its operand arrays as sets and checks whether the right side is a subset of the left side.
IN is a little different and is used with subqueries:
9.22.2. IN
expression IN (subquery)
The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the case where the subquery returns no rows).
or with literal lists:
9.23.1. IN
expression IN (value [, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is "true" if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for
expression = value1
OR
expression = value2
OR
...
So a IN b more or less means:
Is the value a equal to any of the values in the list b (which can be a query producing single element rows or a literal list).
Of course, you can say things like:
array[1] in (select some_array from ...)
array[1] in (array[1], array[2,3])
but the arrays in those cases are still treated like single values (that just happen to have some internal structure).
If you want to check whether an array contains any of a list of values, then @> isn't what you want. Consider this:
array[1,2] @> array[2,4]
4 isn't in array[1,2], so array[2,4] is not a subset of array[1,2].
If you want to check whether someone has both roles, then:
roles @> array['as_champion', 'whatever']
is the right expression but if you want to check if roles is any of those values then you want the overlaps operator (&&):
roles && array['as_champion', 'whatever']
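To make the set semantics concrete, here are plain-Ruby analogues of @>, && and IN. They are illustrations only (they ignore SQL details such as NULL handling), and the method names and the sample roles are hypothetical.

```ruby
require "set"

# roles @> other : every element of `other` is also in `roles` (subset test).
def contains?(roles, other)
  other.to_set.subset?(roles.to_set)
end

# roles && other : the two arrays share at least one element.
def overlaps?(roles, other)
  !(roles & other).empty?
end

# value IN (v1, v2, ...) : value equals at least one item of the list;
# an array value is compared as a single whole value.
def in_list?(value, list)
  list.include?(value)
end

contains?([1, 4, 3], [3, 1])   # => true
contains?([1, 2], [2, 4])      # => false: 4 is missing
overlaps?([1, 2], [2, 4])      # => true: both contain 2
in_list?([1], [[1], [2, 3]])   # => true, like array[1] in (array[1], array[2,3])

roles = %w[as_champion admin]  # hypothetical roles of one membership
contains?(roles, %w[as_champion whatever])  # => false: not both roles
overlaps?(roles, %w[as_champion whatever])  # => true: at least one role
```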
Note that I'm using the "array constructor" syntax for the arrays everywhere; that's because it is much more convenient when working with a tool (such as ActiveRecord) that knows to expand an array into a comma-delimited list when replacing a placeholder but doesn't fully understand SQL arrays.
Given all that, we can say things like:
Membership.where('roles @> array[?]', %w[as_champion whatever])
Membership.where('roles @> array[:roles]', :roles => some_ruby_array_of_strings)
and everything will work as expected. You're still working with little SQL snippets (as ActiveRecord doesn't have a full understanding of SQL arrays or any way of representing the @> operator), but at least you won't have to worry about quoting problems. You could probably go through AREL to manually add @> support, but I find that AREL quickly devolves into an incomprehensible and unreadable mess for all but the most trivial uses.

Rails query by number of digits in field

I have a Rails app with a table "clients", which has a string field phone. I'm using PostgreSQL. I would like to write a query that selects all clients whose phone value contains more than 10 digits. The phone field does not have a specific format:
+1 781-658-2687
+1 (207) 846-3332
2067891111
(345)222-777
123.234.3443
etc.
I've been trying variations of the following:
Client.where("LENGTH(REGEXP_REPLACE(phone,'[^\d]', '')) > 10")
Any help would be great.
You almost have it, but you're missing the 'g' flag to regexp_replace. From the fine manual:
The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. [...] The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.
So regexp_replace(string, pattern, replacement) behaves like Ruby's String#sub whereas regexp_replace(string, pattern, replacement, 'g') behaves like Ruby's String#gsub.
You'll also need to get a \d through your double-quoted Ruby string all the way down to PostgreSQL, so you'll need to write \\d in your Ruby. Things tend to get messy when everyone wants to use the same escape character.
This should do what you want:
Client.where("LENGTH(REGEXP_REPLACE(phone, '[^\\d]', '', 'g')) > 10")
# --------------------------------------------^^---------^^^
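To see why both fixes matter, here is a plain-Ruby illustration using the sample numbers from the question, with String#sub standing in for regexp_replace without 'g' and String#gsub for the version with 'g'. The Ruby and PostgreSQL regex dialects differ in details, but for this simple pattern they behave alike.

```ruby
# Sample values from the question.
phones = ["+1 781-658-2687", "+1 (207) 846-3332", "2067891111",
          "(345)222-777", "123.234.3443"]

# Without 'g', regexp_replace acts like String#sub: only the first
# non-digit is stripped, so formatted numbers still look "long".
phones.map { |p| p.sub(/[^\d]/, "").length }   # => [14, 16, 10, 11, 11]

# With 'g' it acts like String#gsub: every non-digit is stripped.
phones.map { |p| p.gsub(/[^\d]/, "").length }  # => [11, 11, 10, 9, 10]
phones.select { |p| p.gsub(/[^\d]/, "").length > 10 }
# => ["+1 781-658-2687", "+1 (207) 846-3332"]

# The escaping issue: in a double-quoted Ruby string "\d" is just "d".
"[^\d]" == "[^d]"     # => true  (PostgreSQL would receive [^d])
"[^\\d]" == '[^\d]'   # => true  (PostgreSQL receives [^\d])
```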
Try this:
phone_number.gsub(/[^\d]/, '').length

Multi parameter search via user input - ruby on rails & mongodb

I have a web page where a user can search through documents in a mongoDB collection.
I get the user's input through @q = params[:search].to_s
I then run a mongoid query:
@story = Story.any_of({ :Tags => /#{@q}/i }, { :Name => /#{@q}/i }, { :Genre => /#{@q}/i })
This works fine if the user searches for something like 'humor', 'romantic comedy' or 'mystery'. But if they search for 'romance fiction', nothing comes up. Basically, I'd like to add 'and'/'or' functionality to my search so that it finds documents in the database that are related to all the strings a user types into the input field.
How can this be done while keeping the substring search capabilities I currently have? Thanks in advance for your help!
UPDATE:
Per Eugene's comment below...
I tried making it case-insensitive with @q.map! { |x| x="/#{x}/i"}. It saves the array as ["/romantic/i","/comedy/i"], but the query Story.any_of({:Tags.in => @q}, {:Story.in => @q}) finds nothing.
When I change the array to ["Romantic","Comedy"], then it does.
How can I properly make it case insensitive?
Final:
Removing the quotes worked.
However, I still have no way to use an .and() search to find a book that has both words in all these fields.
To create an OR statement, you can convert the string into an array of strings, then convert that into an array of regexes, and then use the $in operator. So first, pick a delimiter: perhaps commas or spaces, or you can set up a custom one like ||. Let's say you use commas. When the user enters:
romantic, comedy
you split that into ['romantic', 'comedy'], convert that to [/romantic/i, /comedy/i], and then do:
@story = Story.any_of( { :Tags.in => [/romantic/i, /comedy/i]}....
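The split-and-convert step can be sketched like this. terms_to_regexps is a hypothetical helper, and the Regexp.escape call is my addition (not part of the answer) so that characters such as + in user input are matched literally. The point is that it produces real Regexp objects: a string such as "/romantic/i" is only text, which is why the quoted version in the question matched nothing.

```ruby
# Hypothetical helper: turns "romantic, comedy" into [/romantic/i, /comedy/i].
# Regexp.escape (an addition) keeps user input from being parsed as regex syntax.
def terms_to_regexps(input, delimiter = ",")
  input.split(delimiter)
       .map(&:strip)
       .reject(&:empty?)
       .map { |term| Regexp.new(Regexp.escape(term), Regexp::IGNORECASE) }
end

terms_to_regexps("romantic, comedy")  # => [/romantic/i, /comedy/i]
terms_to_regexps(" c++ ,, ")          # => [/c\+\+/i]
```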
To create an AND query, it can get a little more complicated. There is an $elemMatch operator you could use.
I don't think you could do {:Tags => /romantic/i, :Tags => /comedy/i }, since in a Ruby hash the second :Tags key simply overwrites the first.
So my best suggestion would be to run sequential queries. There will be a performance hit, but if your DB isn't that big, it shouldn't be a big issue. So if you want Romantic AND Comedy, you can do:
query 1: find all documents that match /romantic/i
query 2: take the results of query 1 and find all documents that match /comedy/i
And so on, iterating through your array of selectors.
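To illustrate the sequential idea without a database, here is an in-memory sketch: plain hashes stand in for Story documents, and each regexp filters the survivors of the previous round, keeping a document if the regexp matches any of the searched fields. search_all, field_matches?, FIELDS and the sample stories are all hypothetical; with Mongoid, each round would be a query rather than Array#select.

```ruby
# Hypothetical fields to search, mirroring the question.
FIELDS = %i[Tags Name Genre].freeze

# Like MongoDB, test each element of an array field; nil yields no values.
def field_matches?(value, regexp)
  Array(value).any? { |v| regexp.match?(v.to_s) }
end

# Each regexp narrows the previous result: a document survives a round
# if that regexp matches at least one of the fields.
def search_all(docs, regexps, fields = FIELDS)
  regexps.reduce(docs) do |remaining, regexp|
    remaining.select { |doc| fields.any? { |f| field_matches?(doc[f], regexp) } }
  end
end

stories = [
  { Name: "Love at Sea",  Tags: ["romantic", "comedy"], Genre: "fiction" },
  { Name: "Dark Night",   Tags: ["mystery"],            Genre: "fiction" },
  { Name: "Paris Hearts", Tags: ["romance"],            Genre: "drama" }
]
search_all(stories, [/roman/i, /fiction/i]).map { |s| s[:Name] }  # => ["Love at Sea"]
```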

Suppress delimiters in Ruby's String#split

I'm importing data from old spreadsheets into a database using rails.
I have one column that contains a list in each row, sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split this string into an array, using either a comma or the word "and" as the delimiter. I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
["first", ",", "second"]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can tell the regexp not to capture the group by using ?: (a non-capturing group):
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
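One caveat: the unanchored and also matches inside words, so with the pattern above "sand, candy" would be split into ["s", "", "c", "y"]. A sketch that adds \b word boundaries (my addition, not part of the answer) and allows any amount of surrounding whitespace:

```ruby
# Comma, or the whole word "and", with optional whitespace on either side.
# Non-capturing group, so the delimiters are not returned by split.
DELIMITER = /\s*(?:,|\band\b)\s*/

"first, second".split(DELIMITER)     # => ["first", "second"]
"third and fourth".split(DELIMITER)  # => ["third", "fourth"]
"sand, candy".split(DELIMITER)       # => ["sand", "candy"]
```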
As a side note, regarding
into a database using rails.
this has nothing to do with Rails; String#split is plain Ruby.
