Ransack/MetaWhere query string length - ruby-on-rails

I have a Rails app, and it started like all Rails apps: simple, with a search on users' first and last names. Time has gone by, the user list has grown very large, and the search has expanded from two fields to many.
This has created a problem: the query string generated by MetaWhere is now in the range of 10,000+ characters, which breaks Nginx and HAProxy unless specific settings are tweaked.
What alternative solutions are there for this issue?
I have thought about making the search a POST request; however, I expect pagination to keep working, as well as the ability to copy and paste the URL.
Another potential solution is to store the query in the database as a JSON blob and reference it by a short hash appended to the URL, but this has a lot of moving parts.
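A sketch of that saved-query idea (the SavedSearch model and its columns are hypothetical): persist the search parameters server-side under a short digest, keep only the digest and the page number in the URL, and rehydrate the parameters on each request.

require 'digest'
require 'json'

# Hypothetical model with columns :digest (string, indexed) and :params_json (text).
class SavedSearch < ActiveRecord::Base
  # Persist the full search params once and return a short key for the URL.
  def self.store(search_params)
    json   = search_params.to_json
    digest = Digest::SHA1.hexdigest(json)
    where(:digest => digest).first || create!(:digest => digest, :params_json => json)
    digest
  end

  # Rehydrate the params for a given key; unknown keys fall back to an empty search.
  def self.fetch(digest)
    record = where(:digest => digest).first
    record ? JSON.parse(record.params_json) : {}
  end
end

The form still POSTs once, but the controller immediately redirects to something like /search?s=<digest>&page=2, so pagination links stay short and the URL can still be copied and pasted.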

Related

Dart Parse Server creating an index for full text search

I'm trying to use the Dart Parse Server SDK to do a full-text search, as explained here.
As far as I understand, I have the following two options if I want to do that:
whereContainsWholeWord
whereContains
With the first, I search the database for that specific whole word; with the second, I search for a partial word.
The partial search is exactly what I need, but because it uses a regex, it is going to be slow.
Is there any way to create an index for full-text search via the Dart Parse Server, and afterwards run queries against that index? Or is this feature not implemented yet? I could not find anything about it in the documentation.
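For what it's worth, whereContainsWholeWord is backed by MongoDB's $text operator, and text indexes are created on the underlying MongoDB rather than through the SDK. A minimal sketch in the mongo shell, assuming a class called MyClass with a searchable description field:

db.MyClass.createIndex({ description: "text" })

With that index in place, whole-word queries run against the index instead of scanning documents; the regex-based partial search does not benefit from it.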

LIKE condition in SphinxQL

Dear programmers and IT experts, I need your help. I've just started to research what Sphinx is. I even made my own "Google suggest", which fixes frequent and common search input mistakes. The problem is that it tries to fix errors all the time and interrupts the real input.
Well, I want the search engine to look for matches in the searched field by substring first and then, if no matches are found, use my error-fixing logic. To put it briefly, I want Sphinx to first execute the equivalent of this SQL command:
SELECT * FROM suggest WHERE keyword LIKE('%$keyword%')
then, if nothing is found, continue with the mistake fixing.
The main question is: is it possible to tell Sphinx to search by substring?
Sphinx can mostly do that, but you need to understand how it works. Sphinx indexes individual words and matches by keyword; it uses a large inverted index to make queries fast (rather than running a substring scan).
So you can run MATCH('one two') as a query, and it will match a document that contains '... one two ...'. But the order doesn't matter, and other words can be present, so it will ALSO match '... two three one ...', which wouldn't happen with MySQL LIKE (a pure substring match).
You can use the phrase operator to require order and adjacency: MATCH('"one two"').
Furthermore, Sphinx matches whole words by default, so MATCH('one two') will only match those exact words. It won't match a document containing '... one twotwentyone ...', whereas LIKE doesn't restrict matches to whole words.
You can use wildcards to allow partial matches: MATCH('"*one two*"'). Note that you also need to enable this on the index with the min_infix_len setting!
What's more, Sphinx doesn't index punctuation and so on (with the default charset_table), so a document containing '... One! (two?) ...' WOULD still match MATCH('"one two"'); SQL LIKE would NOT ignore the punctuation.
You could configure Sphinx to index more punctuation (via charset_table) to get closer to a substring match.
So SELECT * FROM index WHERE MATCH('"*$keyword*"') is probably the closest Sphinx query to your original (i.e. a substring match), as long as you are aware of the differences. There are also MySQL collations to consider; they are not exactly the same as charset_table.
(Frankly, while all of this is correct, I wonder if it is a bit over the top. If you just have a textual corpus you want to search, you could index it as normal, then run queries through CALL KEYWORDS(), which simply tells you how many times the given words appear in the index, to get an idea of whether the query consists of valid words. You can then run your algorithm to fix the mistakes.)
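Pulling the pieces together, a minimal sketch (index name, source, and path are placeholders):

# sphinx.conf -- infix wildcards must be enabled when the index is built
index suggest
{
    source        = suggest_src
    path          = /var/lib/sphinx/suggest
    min_infix_len = 2
}

After rebuilding the index, SELECT * FROM suggest WHERE MATCH('"*$keyword*"') gives the near-substring behaviour described above.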
As a side note, Sphinx does have a built-in suggest system:
http://sphinxsearch.com/blog/2016/10/03/2-3-2-feature-built-in-suggests/

Search a document in Elasticsearch by a list of Wildcarded statements on a single field

If I have documents in Elasticsearch that have a field called url, and the contents of the url field are strings like "http://www.foo.com" or "http://www.bar.com/some/url/segment/the-page.html", is it possible to search for documents matching a list of wildcarded url fragments, e.g. ["http://www.foo.*", "http://www.bar.com/*/segment/*.html", "*://*bar.com/**"]?
If it is possible, what is the best approach? I have explored the wildcard query, which only seems to support a single pattern, not multiple. Filters don't seem to support wildcarding either; I have tried using * in a term filter without any luck.
To make it a little more complex, I'm also interested in being able to search by a large number of these fragments. I have come across the terms filter lookup, which seems like a good solution for dealing with many search terms, but I'm not sure wildcarding works with filters.
Any thoughts?
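One possible direction, sketched with the Ruby Elasticsearch client to match the rest of this page (the pages index name and a not-analyzed url mapping are assumptions): combine one wildcard query per pattern as should clauses inside a bool query, so a document matches if any pattern matches.

require 'elasticsearch'

client = Elasticsearch::Client.new

# Each fragment becomes its own wildcard clause; bool/should ORs them together.
fragments = ['http://www.foo.*', 'http://www.bar.com/*/segment/*.html']

response = client.search(
  index: 'pages',
  body: {
    query: {
      bool: {
        should: fragments.map { |pattern| { wildcard: { url: pattern } } },
        minimum_should_match: 1
      }
    }
  }
)

response['hits']['hits'].each { |hit| puts hit['_source']['url'] }

Note that wildcard queries only behave like whole-URL matches if the url field is not tokenised (a keyword/not_analyzed mapping), and patterns with a leading * have to scan the whole term dictionary, so they get slow as the index grows.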

Thinking Sphinx & Rails questions

I'm building my first Rails app and have it working great with Thinking Sphinx. I understand most of it, but would love it if someone could help me clarify a few conceptual questions:
When displaying search results after a Sphinx query, should I be using the sphinx_attributes that are returned from the query? Or should my view use normal Rails objects, such as @property.title, @property.amenities.title, etc.? If I use normal Rails objects, doesn't that mean extra queries?
In a forum, I'd like to display 'unread posts'. Obviously this is true/false for each user/topic combination, so I'm thinking I should cache the 'reader' ids within the topic's Sphinx index. This way I can quickly query for all unread posts for a given user_id. I've got this working, but then realised it's pointless, as there is a delay between Sphinx indexings: if a user clicks on an unread post, it will still appear unread until the Sphinx index is rebuilt.
I'm still in development, so I'm manually indexing/rebuilding, but in production, what is a standard interval between re-indexings?
I have a model with several text fields. Should I concatenate them all into one column in the Sphinx index for a keyword search? Surely this is quicker than indexing all the separate fields.
Slightly off-topic, but just wondering: when you access nested models, for example @property.agents.name, does this affect performance? Or does Rails automatically fetch all associated entries when a property is pulled from the database?
To answer each of your points:
For both of your examples, sphinx_attributes would not be helpful. Firstly, you've already loaded the property, so the title is available directly without an extra database hit. And for property.amenities.title you're dealing with an array of strings, which Sphinx has no concept of. Generally, I would only use sphinx_attributes for complicated calculated attributes, not standard column references.
Yes, you're right, there will be a delay with this value.
It depends on how often your data changes. I have some apps where I can index every day because changes are so rare, but others where we'll run it every 10 minutes. If the data is particularly volatile, I'll look at using deltas (usually via Sidekiq) to have changes reflected in Sphinx in a few seconds.
I don't think it makes much difference either way, unless you want to search on any of those columns separately? If so, each will need to be a separate field.
By default, as you use each property's agents, the agents for that property will be loaded from the database (one SQL call per property). You could look at the eager loading docs for how to manage this better when you're dealing with multiple records. Thinking Sphinx has the ability to pass through :include options to the underlying ActiveRecord call.
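To make that last point concrete, a quick sketch against the question's models (Rails 3-style syntax; the Thinking Sphinx option name varies across versions):

# Without eager loading: one extra SQL call per property to fetch its agents.
Property.limit(20).each { |property| puts property.agents.map(&:name) }

# With eager loading: the agents for all 20 properties arrive in one extra query.
Property.includes(:agents).limit(20).each do |property|
  puts property.agents.map(&:name)
end

# Thinking Sphinx can forward the option to the underlying ActiveRecord lookup;
# depending on the TS version this is spelled :include => :agents
# or :sql => { :include => :agents }.
Property.search 'beach', :sql => { :include => :agents }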

Cleaning Up Query Strings

This is more of an open question. What is your opinion on query strings in a URL? When creating sites in ASP.NET MVC, you spend a lot of time thinking about and crafting clean URLs, only for them to be shattered the first time you have to use a query string, especially on a search form.
For example, I recently built a fairly simple search form with half a dozen text fields and two or three lists of checkboxes and selects. It produced the query string below when submitted:
countrylcid=2057&State=England&StateId=46&Where=&DateFrom=&DateTo=&Tags=&Keywords=&Types=1&Types=0&Types=2&Types=3&Types=4&Types=5&Costs=0.0-9.99&Costs=10.00-29.99&Costs=30.00-59.99&Costs=60.00-10000.00
Beautiful, I think you'll agree. Half the fields had no information in them, and the list inputs are very verbose indeed.
A while ago I implemented a simple solution to this for paging, which produced a URL such as
www.yourdomain.com/browse/filter-on/page-1/perpage-50/
This used a catch-all route to grab what is essentially a replacement query string after the filter-on portion. It works quite well but breaks down when doing form submissions.
I'd be keen to hear what other solutions people have come up with. There are lots of articles on clean URLs, but they are aimed at ASP.NET developers creating basic RESTful URLs, which MVC already has covered. I am half considering diving into model binding to produce a proper solution along those lines. With the above convention, the large query string could be rewritten as:
filter-on/countrylcid-2057/state-England/stateId-46/types-{1,0,2,3,4,5}/costs-{0.0-9.99,10.00-29.99,30.00-59.99,60.00-10000.00}/
Is this worth the effort?
Thanks,
My personal view is that where users are likely to want to bookmark URLs or pass them on to other people, a nice, clean "friendly" URL is the way to go. Aesthetically they are much nicer. For simple pagination and ordering, a rewritten URL is a good idea.
However, for pages that have a large number of temporary, dynamic fields (such as a search), I think the humble query string is fine. Likewise for pages whose contents are likely to change significantly given the exact same URL in the future. In these cases, URLs with query strings are fine and perhaps even preferable, as they at least indicate to the observant user that the page is dynamic. In such cases it may be better to use form POST variables anyway, so that visitors are not tempted to "fiddle" with the values.
In addition to what others have said, a URL implies a semantic hierarchy. Whether or not that's true today, the ancestry is directories, and people still think of it that way. That's why you have controller/action/id. Likewise, to me a query string implies options or queries.
Personally, I think a rewritten URL is best when you can't tell if there's an interpreter behind it -- maybe it's just a generated HTML file?
So however you choose to do it (and it's a pain on the client in a search form -- I'd say more trouble than it's worth), I'd support you doing it for hierarchies.
E.g. /Search/Country/State/City
but once you start getting into prices and types, or otherwise having to preface a "directory" with the type of value (e.g. /prices=50.00/ or worse, with an array), then that's where you've lost me.
In fact, if all elements are filled in, then all you've really done is take the querystring, replace "&" with "/", and combine your arrays into single fields.
If you're going to be writing the JavaScript anyway, why don't you just loop through the form elements and:
Remove the empty ones, cleaning up the querystring from the "&price_low=&price_high=&" sorts of things.
Combine multiple values into an array structure
But then submit as a querystring.
James
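Taking James's suggestion but doing the pruning server-side instead of in the browser, a sketch in Rails (matching the rest of this page; the controller, route, and parameter names are hypothetical):

class SearchesController < ApplicationController
  # Receive the form POST, drop blank fields, then redirect so the visible
  # URL is a short, shareable GET query string.
  def create
    search  = params[:search] || {}
    cleaned = search.reject do |_field, value|
      value.blank? || (value.is_a?(Array) && value.all?(&:blank?))
    end
    # search_results_path is a hypothetical named route for the GET results page.
    redirect_to search_results_path(cleaned)
  end
end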
Aren't the values of the different fields available in the FormCollection anyway on POST?
