Elastic Search - Get the matching field - ruby-on-rails

I'm using Elasticsearch to implement search in a web app (Rails + Tire). When querying the ES server, is there a way to know which field of the returned JSON matched the query?

The simplest way is to use the highlight feature; see the support in Tire: https://github.com/karmi/tire/blob/master/test/integration/highlight_test.rb.
Do not use the Explain API for anything other than debugging purposes, as it will negatively affect performance.
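As a sketch of how this looks in practice: when a search requests highlighting, each hit in the response carries a highlight hash whose keys are exactly the fields that matched. A minimal, self-contained example in plain Ruby (the response fragment below is hypothetical and truncated):

```ruby
require 'json'

# Hypothetical, truncated search response with highlighting requested;
# each hit's "highlight" hash is keyed by the fields that matched.
response = JSON.parse(<<~JSON)
  { "hits": { "hits": [
      { "_id": "1", "highlight": { "title": ["<em>ruby</em> on rails"] } } ] } }
JSON

# Map each hit id to the list of fields that matched the query
def matched_fields(response)
  response.dig("hits", "hits").to_h do |hit|
    [hit["_id"], (hit["highlight"] || {}).keys]
  end
end

matched_fields(response)  # => {"1"=>["title"]}
```

The same walk works on the parsed response Tire hands back, since it mirrors the raw Elasticsearch JSON.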

Have you tried using the Explain API from Elasticsearch? The output of explain gives you a detailed explanation of why a document was matched, and its relevance score.
The algorithm(s) used to search the records are often much more complex than a single string match. Also, given that a term can match multiple fields (possibly with different weights), it may not be easy to come up with a simple answer. But by looking at the output of the Explain API, you should be able to construct a meaningful message.
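For debugging, the explanation tree the Explain API returns can be flattened to see which fields contributed to the score. A minimal sketch in plain Ruby (the response fragment is hypothetical and heavily truncated):

```ruby
require 'json'

# Hypothetical, truncated Explain API response: a tree of scoring
# details whose leaf descriptions name the matching field and term.
explanation = JSON.parse(<<~JSON)
  { "description": "sum of:", "details": [
      { "description": "weight(title:ruby ...)", "details": [] },
      { "description": "weight(body:ruby ...)",  "details": [] } ] }
JSON

# Collect the leaf descriptions of the explanation tree
def leaf_descriptions(node)
  return [node["description"]] if node["details"].empty?
  node["details"].flat_map { |d| leaf_descriptions(d) }
end

leaf_descriptions(explanation)
# => ["weight(title:ruby ...)", "weight(body:ruby ...)"]
```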

Related

Resume Parsing using Solr and TIKA

I was going through this slide deck, and I'm having a little difficulty understanding the approach.
My two queries are:
1. How does Solr maintain the schema of a semi-structured document like a resume (Name, Skills, Education, etc.)?
2. Can Apache Tika extract section-wise information from PDFs? Since every resume has dissimilar sections, how do I define a common schema of entities?
You define the schema, so that you get the fields you expect and can search in the different fields based on what kind of queries you want to do. You can lump any unknown (i.e. where you're not sure about where it belongs) values into a common search field and rank that field lower.
You'll have to parse the response from Tika (or a different PDF / docx parser) yourself. Just using Tika by itself will not give you an automagically structured response tuned to the problem you're trying to solve. There will be a lot of manual parsing and trying to make sense of what is what from the uploaded document, and then inserting the relevant data into the relevant field.
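A sketch of that "manual parsing" step: once Tika (or another extractor) has given you plain text, you can split it into sections by heading keywords before mapping it to your schema. The heading list below is hypothetical; tune it to the resumes you actually see.

```ruby
# Hypothetical section headings; extend for the documents you receive.
SECTION_HEADINGS = /\A\s*(Education|Skills|Experience)\b/i

# Group the lines of extracted resume text under the most recent heading;
# anything before the first heading lands in "Other".
def split_sections(text)
  sections = Hash.new { |h, k| h[k] = [] }
  current = "Other"
  text.each_line do |line|
    if (m = line.match(SECTION_HEADINGS))
      current = m[1].capitalize
    elsif !line.strip.empty?
      sections[current] << line.strip
    end
  end
  sections
end

split_sections("Jane Doe\nSkills\nRuby\nSolr\nEducation\nB.Sc.\n")
# => {"Other"=>["Jane Doe"], "Skills"=>["Ruby", "Solr"], "Education"=>["B.Sc."]}
```

In practice you would feed each section into its own Solr field, with the "Other" bucket going to a lower-ranked catch-all field as the first answer suggests.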
We have done many implementations using Solr and Elasticsearch, and ran into two challenges:
defining the schema, and more specifically, mapping each document to that schema;
expanding search terms to more accurate and useful matches. Solr and Elasticsearch can match what they get from the content, but not beyond that content.
You need to use a resume parser like www.rchilli.com, Sovren, Daxtra, HireAbility, or any other, take their output, and map it to your schema. The best part is that you get access to taxonomies to enhance your content in Solr.
You can use any of them based on your budget and needs; for us, RChilli worked best.
Let me know if you need any further help.

Modifying Cypher Query Engine

I would like to modify the way Cypher processes the pattern-matching queries sent to it. I have read about execution plans and how Cypher chooses the best plan with the least number of operations. This is pretty good. However, I am looking into implementing a similarity-search feature that lets you specify a query graph that would be matched approximately (similar) when no exact match exists. I have seen a few examples of this in theory, and I would like to implement something of this sort for Neo4j, which I am guessing would require a change in how the query engine deals with incoming queries. Or worse :)
Here are some links that demonstrate the idea
http://www.cs.cmu.edu/~dchau/graphite/graphite.pdf
http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper72.pdf
I am looking for ideas. Anything at all in relation to the topic would be helpful. Thanks in advance
(:I)<-[:NEEDING_HELP_FROM]-(:YOU)
From my point of view, your best option is to create an Unmanaged Extension, because that lets you add your own custom functionality to the Neo4j server.
You are not able to extend the Cypher language itself without maintaining your own fork of the source code.

When to use sphinx search, when to use normal query?

I am using thinking_sphinx in Rails. As far as I know, Sphinx is used for full text search. Let's say I have these queries:
keyword
country
sort order
I use Sphinx for all the searches above. However, when I am querying without a keyword, with just country and sort order, is it better to use a normal MySQL query instead of Sphinx?
In other words, should Sphinx be used only when keyword is searched?
Looking at overall performance and speed.
Not to sound snarky, but does performance really matter?
If you're building an application which will only be used by a handful of users within an organization, then you can probably dismiss the performance benefits of using one method over the other and focus instead on simplicity in your code.
On the other hand, if your application is accessed by a large number of users on the interwebz and you really need to focus on being performant, then you should follow #barryhunter's advice above and benchmark to determine the best approach in a given circumstance.
P.S. Don't optimize before you need to. Fight with all your heart to keep code out of your code.
Benchmark! Benchmark! Benchmark!
I.e. test it yourself. The exact performance will vary depending on the exact data, and perhaps even on the relative performance of your Sphinx and MySQL servers.
Sphinx will offer killer-speeds over MySQL when searching by a text string and MySQL will probably be faster when searching by a numerical key.
So, assuming that both "country" and "sort order" can be indexed using a numerical index in MySQL, it will be better to use Sphinx only with "keyword" and for the other two - MySQL normal query.
However, benchmarks won't hurt, as barryhunter suggested ;)
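A minimal skeleton for such a benchmark in plain Ruby, using the standard Benchmark module. The two lambdas here are stand-ins; in your app you would replace them with the real ActiveRecord query and the real Thinking Sphinx search:

```ruby
require 'benchmark'

# Stand-in workloads; swap in your actual MySQL query and Sphinx
# search so the timings reflect your data and servers.
mysql_query  = -> { (1..5_000).select { |i| i % 7 == 0 } }
sphinx_query = -> { (1..5_000).to_a.shuffle.first(100) }

Benchmark.bm(8) do |x|
  x.report("mysql")  { 50.times { mysql_query.call } }
  x.report("sphinx") { 50.times { sphinx_query.call } }
end
```

Run it against a production-sized dataset; timings on toy data will not tell you much about either engine.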

Can I use Splunk to analyse events from a Rails application?

Splunk (http://www.splunk.com) looks like a very nice platform for analysing how a system performs in relation to the actions users are taking.
A Ruby on Rails implementation is provided, but it seems to offer only traditional analytics.
Is there either:
A way to use Splunk to monitor events defined in the code of a Rails app?
or
A better tool for the job?
Thanks!
There's no Ruby-specific query handler for Ruby-generated logs. It's certainly possible to build one by:
Defining how to acquire the fields from Ruby-style generated logs (as linked above)
Defining how to translate your desired syntax to Splunk's search language, which for that query would probably be "sign_up referer=bla"
Splunk is extensible in various ways. For example, it would be quite possible to author a search filter that narrows the set of events in Ruby by parsing a Ruby expression. The Splunk search language has its own ideas about quotation marks, backslashes, and pipes, but the rest of the text would be up to the filter. However, the core performance optimization of limiting the search to events containing substrings is currently only possible in the Splunk search language syntax.
That said, if your data set is very small, and the analysis you want to do limited in scope, then maybe some custom ruby solution is closer to what you want.
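One practical middle ground: emit your Rails events as key=value pairs, which Splunk's automatic field extraction handles without any custom query handler. A small sketch (the event name and fields are made up):

```ruby
require 'logger'

logger = Logger.new($stdout)

# Render an event as "name key=\"value\" ..." so Splunk can extract
# the fields automatically; event and field names here are examples.
def splunk_format(event, fields = {})
  ([event] + fields.map { |k, v| "#{k}=#{v.to_s.inspect}" }).join(" ")
end

logger.info splunk_format("sign_up", referer: "bla", user_id: 42)
# logs a line containing: sign_up referer="bla" user_id="42"
```

With lines in that shape, a search like "sign_up referer=bla" works out of the box.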
As for Splunk itself, check out answers.splunk.com; here is one answer related to Rails:
http://splunk-base.splunk.com/answers/8830/how-do-i-extract-key-value-pairs-from-ruby-on-rails-logs

Opening a PDF file and searching for names there

I have a PDF file. And I want to search for names there.
How can I open the PDF and get all its text with Ruby?
Are there are any algorithms to find names?
What should I use as a search engine: Sphinx or something simpler (just LIKE sql queries)?
To find proper names in unstructured text, the technical name for the problem you are trying to solve is Named Entity Recognition or Named Entity Extraction. There are a number of different natural language toolkits and research papers which implement various algorithms to try to solve this problem. None of them will get perfect accuracy, but it may be good enough for your needs. I haven't tried it myself but the web page for Stanford Named Entity Recognizer has a link for Ruby Bindings.
Tough question. These problems are still active research areas in the semantic web community. I can only suggest some directions, but I would be curious to know your final choice.
I'd use pdf-reader: https://github.com/yob/pdf-reader
You could use a Bloom filter loaded with a dictionary, and assume that words not found in the dictionary are names. Not always realistic, but it's a first approach.
To catch more names, you could check for words beginning with a capital letter (not great, but we are still in basic-approach territory). A potential resource: http://snippets.dzone.com/posts/show/4235
For your search engine, the two main choices with Rails are Sphinx and Solr.
Hope this helps!
