Can I use Splunk to analyse events from a Rails application? - ruby-on-rails

Looking at Splunk, http://www.splunk.com, it looks like a very nice platform for analysing how a system is performing in relation to the actions users are taking.
A Ruby on Rails implementation is provided, but it would seem to only offer traditional analytics.
Is there either:
A way to use Splunk to monitor events defined in the code of a Rails app?
or
A better tool for the job?
Thanks!

There's no Ruby-specific query handler for Ruby-generated logs. It's certainly possible to build one by:
Defining how to acquire the fields from Ruby-style generated logs (as linked above)
Defining how to translate your desired syntax to Splunk's search language, which for that query would probably be "sign_up referer=bla"
Splunk is extensible in various ways. For example, it would be quite feasible to write a search filter in Ruby that narrows the set of events by parsing a Ruby expression. The Splunk search language has its own ideas about quotation marks, backslashes, and pipes, but the rest of the text would be up to the filter. However, the core performance optimization of limiting the search to events containing given substrings is currently only possible in the Splunk search language syntax.
That said, if your data set is very small and the analysis you want to do is limited in scope, then a custom Ruby solution may be closer to what you want.
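To make the field-acquisition step concrete, here is a minimal sketch of logging application events as key=value pairs, a format that Splunk's default search-time extraction generally picks up. The helper name format_event and the field names are made up for illustration; in a Rails app you would presumably call Rails.logger instead of the plain Logger used here.

```ruby
require "logger"

# Formats an application event as one line of key=value pairs.
# Values containing whitespace, quotes or "=" are wrapped in double quotes.
# (format_event is a hypothetical helper, not part of Rails or Splunk.)
def format_event(name, fields = {})
  pairs = fields.map do |key, value|
    text = value.to_s
    text = text.inspect if text.match?(/[\s"=]/)
    "#{key}=#{text}"
  end
  (["event=#{name}"] + pairs).join(" ")
end

logger = Logger.new($stdout)
logger.info(format_event("sign_up", referer: "bla", plan: "free trial"))
```

With lines like this in the log, a search such as "sign_up referer=bla" should narrow down to the matching events without any custom parsing on the Splunk side.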

As for Splunk, check out answers.splunk.com; here is one answer related to Rails:
http://splunk-base.splunk.com/answers/8830/how-do-i-extract-key-value-pairs-from-ruby-on-rails-logs

Related

Umbraco (Examine) Search - Synonyms

I am trying to implement synonym searching in the Examine search engine that comes with Umbraco 8 out of the box.
Does anyone have any experience with implementing synonym searching in Examine/Umbraco 8? The options that I have been considering after looking around are:
A package that can be installed in Umbraco 8 that offers this extended functionality (if one exists).
Implementing a custom index (currently just using the out of the box 'ExternalIndex') that somehow implements synonym searching in the analysis (via a custom analyzer implementation etc., if that is even possible).
Manually formatting multiple search terms by checking for synonyms in the string beforehand, running all searches and consolidating the results after (really a nasty, last resort option - you don't have to tell me how bad this is, I already know).
I have been trawling around the forums for a definitive answer on this and cannot really find one. Essentially I want to stick with the Examine engine for simplicity. However, I am starting to think that the best way to achieve what I am after would be to move to a new engine completely (Elasticsearch, for example).
Many thanks in advance.
Use Algolia? It's free and will do what you need easily: https://www.algolia.com/
Examine is based on the Lucene search index. I'm afraid Lucene is known for not really doing synonyms (read why here and potential solution).
Your thinking is probably correct. Examine is good at what it does, but if you want more advanced searching you will be better off using a more advanced search provider. There are loads of options. Algolia is SaaS and comes with a free plan depending on your usage. It's easy to install and you target data from the front-end.
You could also look into Azure Cognitive Search or Solr. These are probably harder to implement but will also do the job.
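If you do end up with the last-resort option from the question (checking for synonyms before searching), it can at least be done as one expanded query rather than several searches. Below is a rough sketch in Ruby of the expansion step only. The synonym table and helper names are invented for illustration, and whether Examine accepts a raw Lucene-style query string like this in your setup is an assumption you would need to verify.

```ruby
# Hypothetical synonym table; in practice this would come from your own data.
SYNONYMS = {
  "car"   => ["automobile", "vehicle"],
  "couch" => ["sofa"]
}.freeze

# Quotes a term for a Lucene-style query if it contains whitespace.
def lucene_term(term)
  term.match?(/\s/) ? "\"#{term}\"" : term
end

# Expands each word into an OR group of the word and its synonyms,
# then requires every group to match (AND).
def expand_query(query, synonyms = SYNONYMS)
  groups = query.downcase.split.map do |word|
    variants = [word] + synonyms.fetch(word, [])
    if variants.size == 1
      lucene_term(word)
    else
      "(#{variants.map { |v| lucene_term(v) }.join(' OR ')})"
    end
  end
  groups.join(" AND ")
end

puts expand_query("red car")
```

This keeps the synonym logic outside the index, so it works without a custom analyzer, at the cost of maintaining the table yourself.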

Resume Parsing using Solr and TIKA

I was going through this slide and am having a little difficulty understanding the approach.
My two queries are:
How does Solr maintain the schema of semi-structured documents like resumes (such as Name, Skills, Education etc.)?
Can Apache Tika extract section-wise information from PDFs? Since every resume would have dissimilar sections, how do I define a common schema of entities?
You define the schema, so that you get the fields you expect and can search in the different fields based on what kind of queries you want to do. You can lump any unknown (i.e. where you're not sure about where it belongs) values into a common search field and rank that field lower.
You'll have to parse the response from Tika (or a different PDF / docx parser) yourself. Just using Tika by itself will not give you an automagically structured response tuned to the problem you're trying to solve. There will be a lot of manual parsing and trying to make sense of what is what from the uploaded document, and then inserting the relevant data into the relevant field.
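As a very rough sketch of that manual parsing step, the following Ruby splits the plain text you get back from the extractor into schema fields by looking for heading lines. The heading list, the field names and the :other catch-all are assumptions for illustration; real resumes will need far more heuristics than this.

```ruby
# Hypothetical mapping from section headings to schema field names.
HEADING_TO_FIELD = {
  "skills"     => :skills,
  "education"  => :education,
  "experience" => :experience
}.freeze

# Splits plain text into schema fields using recognised heading lines.
# Lines that appear before any recognised heading are collected in :other,
# which corresponds to the lower-ranked common search field.
def split_sections(text, headings = HEADING_TO_FIELD)
  fields = Hash.new { |hash, key| hash[key] = [] }
  current = :other
  text.each_line do |line|
    stripped = line.strip
    next if stripped.empty?
    key = stripped.downcase.chomp(":")
    if headings.key?(key)
      current = headings[key]
    else
      fields[current] << stripped
    end
  end
  fields.transform_values { |lines| lines.join(" ") }
end

resume = "Jane Doe\njane@example.com\nSkills\nRuby, Solr\nEducation:\nBSc Computer Science\n"
p split_sections(resume)
```

The resulting hash can then be sent to Solr as a document, with each key mapped to the field you defined in the schema.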
We have done many implementations using Solr and Elasticsearch, and ran into two challenges:
Defining the schema and, more specifically, getting documents into the given schema.
Expanding search terms to get more accurate and useful matches. Solr and Elasticsearch can match what they get from the content, but nothing beyond that content.
You need to use a resume parser like www.rchilli.com, Sovrn, daxtra, hireability or any other, and map its output to your schema. The best part is that you get access to taxonomies to enhance your content in Solr.
You can use any one based on your budget and needs. But for us RChilli worked best.
Let me know if you need any further help.

Elastic Search - Get the matching field

I'm using ElasticSearch to implement search on a web app (Rails + Tire). When querying the ES server, is there a way to know which field of the returned JSON matched the query?
The simplest way is to use the highlight feature, see support in Tire: https://github.com/karmi/tire/blob/master/test/integration/highlight_test.rb.
Do not use the Explain API for anything other than debugging purposes, as it will negatively affect performance.
Have you tried using the Explain API from Elasticsearch? The output of explain gives you a detailed explanation of why a document was matched, and its relevance score.
The algorithm(s) used for searching the records are often much more complex than a single string match. Also, given that a term can match multiple fields (with possibly different weights), it may not be easy to come up with a simple answer. But, looking at the output of the Explain API, you should be able to construct a meaningful message.
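To illustrate the highlight approach without depending on Tire, here is a small sketch that builds a search body with a highlight section and then reads the matching field names out of the response. The field names title and body and the sample response are made up; only the general hits/highlight layout of the response is assumed from Elasticsearch.

```ruby
require "json"

# Builds a search body that asks for highlights on two (hypothetical) fields.
def search_body(query_text)
  {
    query: { query_string: { query: query_text } },
    highlight: { fields: { title: {}, body: {} } }
  }
end

# Maps each hit's _id to the names of the fields that produced highlights.
def matched_fields(response)
  response.fetch("hits", {}).fetch("hits", []).each_with_object({}) do |hit, result|
    result[hit["_id"]] = hit.fetch("highlight", {}).keys
  end
end

# Hand-written sample response, trimmed to the parts used here.
sample = JSON.parse(<<~RESPONSE)
  {"hits": {"hits": [
    {"_id": "1", "_source": {"title": "Ruby tips", "body": "Some text"},
     "highlight": {"title": ["<em>Ruby</em> tips"]}},
    {"_id": "2", "_source": {"title": "Other", "body": "Ruby here"},
     "highlight": {"body": ["<em>Ruby</em> here"]}}
  ]}}
RESPONSE

puts JSON.generate(search_body("ruby"))
p matched_fields(sample)
```

A hit only carries a highlight entry for fields where the query actually matched, so the keys of that hash tell you which fields to report.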

Implement text matching like solr in ruby

I am working on a test based Q & A application. The questions on the application mainly have two or three words as the answer.
Example: Q. Who founded Google?
A. Larry Page, Sergey Brin
There are no options for the answer, the user has to actually type it in. Plus in some cases there might be a synonym for the answer.
Example: USB Drive, Universal Serial Bus Drive, Pen Drive are all correct answers for the question: What is meant by nerd bling?
I have worked with Solr before and its full-text search is powerful enough to do a match, consider synonyms and give a score for the match. However, I need to match the answers in my RoR application. Instead of writing my own regex to handle the task, I am wondering if there are some libraries that I could look at within RoR for this.
Also, if I were to look under the hood of solr and take inspiration from the code there to create a library of my own, please suggest files/modules I should be looking at (since I barely have any idea about Java).
Try using Elasticsearch, which offers RESTful search. Refer to www.elasticsearch.org/
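If the set of accepted answers per question is small, a plain Ruby comparison might already be enough before reaching for a search server. The sketch below normalises the typed answer, checks it against a list of accepted variants, and computes a simple token-overlap score for near misses. The method names and the idea of using a Jaccard-style score are my own assumptions, not something Solr or Elasticsearch does in this form.

```ruby
# Normalises an answer: lower-case, punctuation to spaces, collapse whitespace.
def normalize(text)
  text.downcase.gsub(/[^a-z0-9\s]/, " ").split.join(" ")
end

# True when the given answer equals any accepted variant after normalisation.
def correct_answer?(given, accepted)
  accepted.map { |variant| normalize(variant) }.include?(normalize(given))
end

# Jaccard overlap of the word sets of two answers, between 0.0 and 1.0.
def token_overlap(given, expected)
  a = normalize(given).split.uniq
  b = normalize(expected).split.uniq
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

accepted = ["USB Drive", "Universal Serial Bus Drive", "Pen Drive"]
puts correct_answer?("  pen-drive ", accepted)
puts correct_answer?("floppy disk", accepted)
puts token_overlap("Sergey Brin and Larry Page", "Larry Page, Sergey Brin")
```

The overlap score is order-independent, so it tolerates answers like the Google founders being typed in either order; where to set the pass threshold is up to you.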

Opening a PDF file and searching for names there

I have a PDF file and I want to search for names in it.
How can I open the PDF and get all its text with Ruby?
Are there any algorithms to find names?
What should I use as a search engine: Sphinx or something simpler (just LIKE sql queries)?
To find proper names in unstructured text, the technical name for the problem you are trying to solve is Named Entity Recognition or Named Entity Extraction. There are a number of different natural language toolkits and research papers which implement various algorithms to try to solve this problem. None of them will get perfect accuracy, but it may be good enough for your needs. I haven't tried it myself but the web page for Stanford Named Entity Recognizer has a link for Ruby Bindings.
Tough question. These domains are still research topics in the semantic web area. I can only suggest some leads, but would be curious to know your final choice.
I'd use pdf-reader: https://github.com/yob/pdf-reader
You could use a Bloom filter matching some dictionary. You'd assume that words not matching the dictionary are names... Not always realistic, but it's a first approach.
To get more names, you could check the words beginning with a capital letter (not great, but it is another basic approach). Some potential resource: http://snippets.dzone.com/posts/show/4235
For your search engine, the two main choices using Rails are Sphinx and Solr.
Hope this helps!
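For what it's worth, the capital-letter heuristic mentioned above can be sketched in a few lines of Ruby. The method name candidate_names and the sample sentence are invented for illustration, and the commented-out lines show how the text could presumably be pulled out of a PDF with the pdf-reader gem ("file.pdf" is a placeholder path).

```ruby
# With the pdf-reader gem, the text could be obtained roughly like this:
#   require "pdf-reader"
#   text = PDF::Reader.new("file.pdf").pages.map(&:text).join("\n")

# Returns runs of two or more consecutive capitalised words as name
# candidates. This is a crude heuristic: it will also pick up place names
# and titles, and it misses single-word or non-ASCII names.
def candidate_names(text)
  text.scan(/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b/).uniq
end

sample = "The report was written by Jane Doe and reviewed by John Smith in March."
p candidate_names(sample)
```

Combining this with a dictionary check (drop candidates made only of common dictionary words) would cut down the false positives somewhat.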
