Dart Parse Server: creating an index for full-text search

I'm trying to use the Dart Parse Server SDK to do a full-text search, as explained here.
As far as I understand, I have the following two options if I want to do that:
whereContainsWholeWord
whereContains
With the first one, I search the whole database for that specific whole word; with the second one, I search the database for a partial word.
This is exactly what I need, but because the second search uses a regex, it is going to be slow.
Is there any way to create an index for full-text search via the Dart Parse Server SDK, and afterwards run queries against that index? Or is this feature not implemented yet? I could not find anything about it in the documentation.
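For context, the client SDKs don't normally expose index creation; a text index is usually created directly on the MongoDB collection that backs the Parse class. A minimal sketch with the Ruby mongo driver, where the connection string, database name parse_db, collection Post, and field name text are all placeholders for your own deployment:

require 'mongo'

# Connect to the MongoDB deployment that backs Parse Server.
client = Mongo::Client.new('mongodb://localhost:27017/parse_db')

# Parse stores each class in a collection of the same name; a MongoDB
# text index on the searchable field is what $text queries run against.
client[:Post].indexes.create_one({ :text => 'text' })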

Related

Ransack/MetaWhere query string length

I have a Rails app, and it started like all Rails apps: simple, with a search over users' first and last names. Time has gone by, the user list has grown very large, and the search has expanded from two fields to many.
The problem is that the query string generated by MetaWhere is now in the range of 10,000+ characters, which breaks Nginx and HAProxy unless specific settings are tweaked.
I am wondering what alternative solutions there are for this issue.
I thought about making the search a POST request; however, I still expect pagination to work, as well as the ability to copy and paste the URL.
Another potential solution is to store the query in the database as a JSON blob and reference it by a special hash appended to the URL, but this has a lot of moving parts.
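A minimal sketch of that last idea, persisting the search server-side and handing back a short token. The SavedSearch model, its columns, the search_path route, and the Ransack/Kaminari calls (ransack, result, page) are all assumptions for illustration, not part of the original app:

class SavedSearch < ActiveRecord::Base
  # columns: token (string, unique index), params_json (text)
  before_create { self.token ||= SecureRandom.urlsafe_base64(8) }
end

class SearchesController < ApplicationController
  # POST /searches: persist the long query once, then redirect to a
  # short GET URL so pagination and copy/paste both keep working.
  def create
    search = SavedSearch.create!(:params_json => params[:q].to_json)
    redirect_to search_path(search.token)
  end

  # GET /searches/:token?page=2
  def show
    search  = SavedSearch.find_by_token!(params[:token])
    filters = JSON.parse(search.params_json)
    @users  = User.ransack(filters).result.page(params[:page])
  end
end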

Search a document in Elasticsearch by a list of Wildcarded statements on a single field

If I have documents in Elasticsearch that have a field called url, and the contents of the url field are strings like "http://www.foo.com" or "http://www.bar.com/some/url/segment/the-page.html", is it possible to search for documents matching a list of wildcarded url fragments, e.g. ["http://www.foo.*", "http://www.bar.com/*/segment/*.html", "*://*bar.com/*"]?
If it is possible, what is the best approach? I have explored the wildcard query, which only seems to support one fragment, not multiple. Filters don't seem to support wildcarding either; I have tried using * in a term filter without any luck.
To make it a little more complex, I'm also interested in searching by a large number of these fragments. I have come across the terms filter lookup, which seems like a good solution for dealing with many search terms, but I'm not sure wildcarding works with filters.
Any thoughts?
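One hedged sketch of the shape this could take: OR several wildcard queries together inside a bool query, one clause per fragment. This uses the elasticsearch-ruby gem; the index name pages is an assumption, and the url field must be unanalyzed (not tokenized) for the patterns to match the raw string:

require 'elasticsearch'

client = Elasticsearch::Client.new  # defaults to localhost:9200

# One wildcard clause per fragment; a document matches if ANY clause
# does. Note that leading-wildcard patterns are expensive to evaluate.
patterns = [
  'http://www.foo.*',
  'http://www.bar.com/*/segment/*.html',
  '*://*bar.com/*'
]

results = client.search(
  :index => 'pages',
  :body  => {
    :query => {
      :bool => {
        :should               => patterns.map { |p| { :wildcard => { :url => p } } },
        :minimum_should_match => 1
      }
    }
  }
)

results['hits']['hits'].each { |hit| puts hit['_source']['url'] }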

Rails Active Record Relation sorting

Let's say I have a Ruby on Rails model with a text field, and a query string. I want a query that sorts the records in descending order of the maximal number of overlapping characters between the text field and the query string. How would I go about this?
You can probably go with a LIKE SQL statement, but it will not be efficient.
Here is an example:
Company.where(Company.arel_table[:name].matches("%stack%"))
If you need to run a lot of queries that search text inside a database field, you really should start looking at dedicated full-text search software.
I can recommend Elasticsearch, unless you are already familiar with some other tool.
There is a recent tutorial and also a Railscast on the subject.
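If you want to stay inside PostgreSQL rather than adopt a search server, one way to approximate "sort by character overlap" is the pg_trgm extension, which scores trigram similarity directly in SQL. To be clear, this swaps LIKE for trigram matching; a sketch assuming CREATE EXTENSION pg_trgm has been run and a Company model with a name column:

query = "stack"

# similarity() returns 0.0..1.0 based on shared three-character
# substrings; ordering by it descending puts the closest matches first.
order_sql = Company.send(:sanitize_sql_array,
                         ["similarity(companies.name, ?) DESC", query])

# The % operator keeps only rows above pg_trgm's similarity threshold.
companies = Company.where("companies.name % ?", query).order(order_sql)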

Lucene partial word matching

Lucene does not support it out of the box, so I need some help building my query.
Let's say I have a document with a field value "Develop".
I would like this document to be returned for the searches "Dev" and "lop".
Maybe combining three queries?
"*keyword"
and
"keyword*"
and
"keyword"
?
How would you go about doing this with multiple words? Would you split the sentence/search into a list of words and run the queries above for each word?
What you're asking is, if I understand you correctly, not feasible on any large-scale search engine.
Lucene creates an index over keywords using term-document matrix and inverted-file techniques (see the links at the bottom). Fully fledged substring matching might be very nice to have, but it does not scale: you would never be able to query a decently sized index (say, more than a couple of dozen or hundred documents) in acceptable time.
Still, here are two ideas that might help...
Syllable tokenization
To come back to your example with 'Develop': as long as you are happy with letting users search for syllables, you can do something.
You would have to use a tokenizer that splits the words in your index into their syllables, and build the index over those syllables. (I am not sure there are built-in tokenizers for English that can do that, and writing one on your own might be tricky...)
An important thing to note:
If you index the full words AND the separate syllables, your index will be much larger than if you index only one of the two.
However, I would not suggest indexing only syllables. If you also want to allow your users to search for the full word 'Develop' (which I guess you do), this would result in two queries joined by a logical AND, namely <'dev' AND 'lop'>. Although Lucene supports such logical constructs in queries, they are very expensive; I have personally had some trouble with logical queries in Lucene in the past.
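A related trick that avoids syllable rules entirely is n-gram tokenization: index every fixed-length substring of each word, so that infix fragments like 'lop' become ordinary index terms. Here is a toy sketch of the mechanics in plain Ruby; in real Lucene you would use an NGramTokenizer instead, but the principle is the same:

# Every 3-character substring of a word becomes its own token.
def ngrams(word, n = 3)
  (0..word.length - n).map { |i| word[i, n] }
end

# Build a tiny inverted index: each 3-gram maps to the ids of the
# documents containing it.
index = Hash.new { |h, k| h[k] = [] }
docs  = { 1 => 'develop software', 2 => 'envelope printing' }

docs.each do |id, text|
  text.split.each do |word|
    ngrams(word).each { |gram| index[gram] << id unless index[gram].include?(id) }
  end
end

# A query fragment is tokenized the same way; a document must contain
# every gram of the fragment.
def search(index, fragment)
  ngrams(fragment).map { |g| index[g] }.inject { |a, b| a & b } || []
end

search(index, 'dev')  # => [1]     ("develop")
search(index, 'lop')  # => [1, 2]  ("develop" and "envelope" both contain "lop")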
Stemming
Another way to get somewhere near what you're after could be a brutal form of word stemming (http://en.wikipedia.org/wiki/Stemming) that stems words down to their first syllable. (This would allow searching for 'dev' but not for 'lop'...)
Again, I don't think such a word-stemming feature already exists in Lucene. Writing one yourself will be a pain and involve working with/importing huge dictionaries.
Links
These might be worth looking into if you don't know about search-engine internals:
http://en.wikipedia.org/wiki/Index_%28search_engine%29
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Inverted_file
http://en.wikipedia.org/wiki/Term-document_matrix
http://en.wikipedia.org/wiki/Tf-idf

Using Ferret to build unique tag clouds

I've been using Ferret as my full-text search engine in a small project I'm working on.
Through the documentation and a few examples online, I've been able to pull together a tag cloud generator that uses the full-text index via the IndexReader.terms method.
It's worked quite well up to now, but now I want to get term data based on a search result.
For example, if the user searches for "cake", I want to show them a tag cloud of terms used in association with the term "cake".
Are there examples of the terms method being used in association with a search result set, or something similar?
Currently I'm using the following method to generate my list of tags:
# Open a reader on the most recent index and collect every term in the
# :all_quotes field together with its document frequency.
reader = Ferret::Index::IndexReader.new(Scrape.find_last_index_version)
terms = []
reader.terms(:all_quotes).each do |term, doc_freq|
  terms << [term, doc_freq]
end
Cheers.
It's more like a term frequency chart (like a Wordle) than a tag cloud, isn't it? Or are these terms in a tag field? Either way, the index doesn't keep track of term frequency within each possible document subset (such as the results of a search), so that method wouldn't be fast even if it existed.
For a single document, you can get the TermFreqVector and provide suggested documents that are good matches for other frequent terms in that document. So you could take some of the top results, grab the term vectors from each one, and just add them up; those aggregate functions don't exist natively, though (engines generally try to keep slow operations out of the core).
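A hedged sketch of that "add up the term vectors from the top hits" idea. It assumes the :all_quotes field was indexed with term vectors enabled, and that the Ferret calls below (Index#search_each, Index#reader, IndexReader#term_vector, and TVTerm's text/positions) behave as named in your Ferret version; treat it as a starting point to check against the docs rather than a drop-in:

require 'ferret'

index  = Ferret::Index::Index.new(:path => Scrape.find_last_index_version)
counts = Hash.new(0)

# Walk the top matches for the user's query and sum the term
# frequencies from each document's stored term vector.
index.search_each('all_quotes:cake', :limit => 50) do |doc_id, score|
  tv = index.reader.term_vector(doc_id, :all_quotes)
  next unless tv
  tv.terms.each do |t|
    counts[t.text] += t.positions ? t.positions.size : 1
  end
end

# The most frequent co-occurring terms become the "cake" tag cloud.
tag_cloud = counts.sort_by { |_, freq| -freq }.first(30)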
