ElasticSearch: Altering indexed version of text - ruby-on-rails

Before the text in a field is indexed, I want to run code on it to transform it, basically what's going on here https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html (but that feature isn't out yet).
For example, I want to be able to transform all . in a field into - for the indexed version.
Any advice? Doing this in elasticsearch-rails.

Use a char_filter that replaces every . with -. Note that this changes only the characters of the indexed terms, not the _source itself. Something like this:
"char_filter" : {
  "my_mapping" : {
    "type" : "mapping",
    "mappings" : [
      ". => -"
    ]
  }
}
Alternatively, use Logstash with the mutate filter's gsub option to pre-process the data before it is sent to Elasticsearch. Or do the transformation in your own indexer (whatever that is).
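To make the first and last options concrete, here is a minimal Ruby sketch under stated assumptions. The char filter only takes effect once a custom analyzer references it, so the settings hash below wires it into one. The names dots_to_dashes, dashed_text, Article and version are hypothetical, and the whitespace tokenizer is an assumption chosen so the dashes are not split away. The plain Ruby class imitates the as_indexed_json hook that elasticsearch-model calls when serializing a record. Unlike the char filter, that approach also changes what ends up in _source.

```ruby
require 'json'

# Index settings: a "mapping" char filter plus a custom analyzer that uses it.
# "dots_to_dashes" and "dashed_text" are hypothetical names for illustration.
INDEX_SETTINGS = {
  analysis: {
    char_filter: {
      dots_to_dashes: { type: 'mapping', mappings: ['. => -'] }
    },
    analyzer: {
      dashed_text: {
        type: 'custom',
        tokenizer: 'whitespace', # assumption: keeps "1-2-3" as a single term
        char_filter: ['dots_to_dashes']
      }
    }
  }
}.freeze

# Hypothetical model showing the "do it in your own indexer" option:
# transform the value while building the document that gets indexed.
class Article
  attr_reader :title, :version

  def initialize(title:, version:)
    @title = title
    @version = version
  end

  # elasticsearch-model calls a method with this name to serialize a record.
  def as_indexed_json(_options = {})
    { 'title' => title, 'version' => version.tr('.', '-') }
  end
end

puts JSON.pretty_generate(INDEX_SETTINGS)
puts Article.new(title: 'Release', version: '1.2.3').as_indexed_json.to_json
```

With elasticsearch-model, a hash like this is typically passed to the settings class method, and the analyzer is then referenced by name in the mapping of the field it should apply to.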


How to make firebase list like this on swift

This is what I want it to look like.
First I did it like this:
self.ref.child("User/CareGiver/\(CaregiverUID)/Followed").setValue([value2])
The result looks like the picture, but when I add a second value, it replaces the old one.
So I changed it to this:
self.ref.child("User/CareGiver/\(CaregiverUID)").updateChildValues([
"Followed" : ["\(value2)"]])
It still replaces the data at position [0] and never reaches position 1.
How can I write the array [UID1, UID2, UID3] to Firebase one element at a time (not all at once), so that it ends up like this
-[0] UID1
-[1] UID2
-[2] UID3
without replacing the existing entries?
P.S. Sorry for my broken English.
The JSON should look like this:
{
"Name_Care" : "asdsāļ—",
"Tel_Care" : "kknlk",
"Role" : "Care Giver",
"Followed" : [
"UID_A",
"UID_B",
"UID_C",
"UID_E"
]
}
In Firebase it would look like the picture.
If you want to add new values to your "Followed" node, call setValue on a child with a unique key for the new user, so that the existing children are left untouched.
Example:
self.ref.child("User/CareGiver/\(CaregiverUID)/Followed").child("3").setValue("user_name")
This adds the record user_name under key 3.
I think this approach can help you.

Read multiple concatenated json objects in Ruby

I have a file that contains multiple JSON objects that are not separated by commas:
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
Each of the objects standalone is a valid json object.
Is there a way that I can process this file easily?
I know this is NOT a valid json, but unfortunately this file is being generated by a 3rd party tool. I have no option of changing the way the output looks like.
I can't open a text editor and smart-insert commas / square brackets before the run, since this is an automated process (I also really don't want to write code that opens the file and manipulates it).
In .NET there's a library that has this exact feature :
https://stackoverflow.com/a/29480032/2970729
https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm
Is there anything equivalent in Ruby?
As long as your file is that simple you might want to do something like this:
# content = File.read(filename)
content =<<-EOF
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
EOF
require 'json'
JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
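The regex above relies on the text between a closing brace and the next opening brace never occurring inside a value, so it breaks on nested objects or on strings that contain braces. If that is a concern and adding a gem is not an option, a small scanner that tracks brace depth and string state is one possible alternative. This is only a sketch: parse_concatenated_json is a hypothetical helper, it handles top-level objects only (not top-level arrays), and it assumes that any text between objects contains no braces.

```ruby
require 'json'

# Split a string of concatenated JSON objects into parsed Ruby hashes.
# Brace depth and string state are tracked so that braces inside string
# values are not mistaken for object boundaries. Text between top-level
# objects (whitespace, stray comments without braces) is skipped.
def parse_concatenated_json(text)
  objects = []
  depth = 0
  in_string = false
  escaped = false
  start = nil

  text.each_char.with_index do |ch, i|
    if in_string
      if escaped
        escaped = false          # character after a backslash is literal
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
      next
    end

    case ch
    when '"'
      in_string = true if depth > 0   # ignore quotes outside objects
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      next if depth.zero?             # stray closing brace outside an object
      depth -= 1
      objects << JSON.parse(text[start..i]) if depth.zero?
    end
  end

  objects
end

sample = <<~TEXT
  {
    "field" : "value",
    "another_field": "another_value"
  } // no comma
  {
    "field" : "value"
  }
TEXT

p parse_concatenated_json(sample)
```

Each complete object is handed to JSON.parse separately, so a malformed object still raises JSON::ParserError instead of being silently merged with its neighbour.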
The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.
require 'yajl'
File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with object
  end
end
See the documentation for other options (buffer size, symbolized keys, etc).

grails gorm mongodb `like` functionality in criteria

Is like or rlike supported for searching a string in a collection's property value?
Does the collection need to define a text index for this to work? Unfortunately I cannot create a text index for the property. There are 100 million documents and the text index killed performance (MongoDB is on a single node). If this is not doable without a text index, that's fine with me. I will look for alternatives.
Given below collection:
Message {
'payload' : 'XML or JSON string'
//few other properties
}
In grails, I created a Criteria to return me a list of documents which contain a specific string in the payload
Message.list {
projections {
like('payload' : searchString)
}
}
I tried using rlike('payload' : ".*${searchString}.*") as well. It did not return any documents.
Note: I was able to get the document when I fired the native query on Mongo shell.
db.Message.find({payload : { $regex : ".*My search string.*" }}).pretty()
I got it working in a roundabout way. I believe there is a much better Grails solution. The Criteria approach did not work, so I used the low-level API and converted the DBObjects to domain objects.
def query = ['payload' : [ '$regex' : /${searchString}/ ] ]
def dbObjects = Message.collection.find(query).skip(offset).limit(defaultPageSize).toArray()
dbObjects?.collect { new Message(new JsonSlurper().parseText(it.toString()))}

How to use must_not with an empty JSON attribute with ElasticSearch + Grails?

I'm using Grails plugin to work with ElasticSearch over MySQL. I have a domain column mapped in my domain class as follows:
String updateHistoryJSON
(...)
static mapping = {
updateHistoryJSON type: 'text', column: 'update_history'
}
In MySQL, this basically maps to a TEXT column, whose purpose is to store JSON content.
So, in both DB and ElasticSearch index, I have 2 instances:
- instance 1 has updateHistoryJSON = '{"zip":null,"street":null,"name":null,"categories":[],"city":null}'
- instance 2 has updateHistoryJSON = '{}'
Now, what I need is an ElasticSearch query that returns only instance 1 (the one whose JSON is not empty).
I've been doing a closure like this, using Groovy DSL:
{
bool {
must_not = term(updateHistoryJSON: "{}")
minimum_should_match = 1
}
}
And ElasticSearch seems to ignore it, it keeps bringing back both instances.
On the other hand, if I use a filter like "missing":{"field":"updateHistoryJSON"}, it gives back no documents. The same goes for "exists": {"field":"updateHistoryJSON"}.
Any idea what I am doing wrong here?
I'm still not sure what the problem was, but at least I found a workaround.
Since searching on the contents of updateHistoryJSON was not working, I decided to use a script that checks its size instead: rather than looking for documents with a non-empty JSON, I look for documents whose updateHistoryJSON size is greater than 2 ({} has size 2).
The closure I used looks like this:
{
  script = {
    script = "doc['updateHistoryJSON'].size() > 2"
  }
}

Elastic Search: how to see the indexed data

I had a problem with ElasticSearch and Rails, where some data was not indexed properly because of attr_protected. Where does Elastic Search store the indexed data? It would be useful to check if the actual indexed data is wrong.
Checking the mapping with Tire.index('models').mapping does not help, the field is listed.
Probably the easiest way to explore your ElasticSearch cluster is to use elasticsearch-head.
You can install it by doing:
cd elasticsearch/
./bin/plugin -install mobz/elasticsearch-head
Then (assuming ElasticSearch is already running on your local machine), open a browser window to:
http://localhost:9200/_plugin/head/
Alternatively, you can just use curl from the command line, eg:
Check the mapping for an index:
curl -XGET 'http://127.0.0.1:9200/my_index/_mapping?pretty=1'
Get some sample docs:
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1'
See the actual terms stored in a particular field (ie how that field has been analyzed):
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1' -d '
{
"facets" : {
"my_terms" : {
"terms" : {
"size" : 50,
"field" : "foo"
}
}
}
}'
More available here: http://www.elasticsearch.org/guide
UPDATE : Sense plugin in Marvel
By far the easiest way of writing curl-style commands for Elasticsearch is the Sense plugin in Marvel.
It comes with source highlighting, pretty indenting and autocomplete.
Note: Sense was originally a standalone chrome plugin but is now part of the Marvel project.
Absolutely the easiest way to see your indexed data is to view it in your browser. No downloads or installation needed.
I'm going to assume your elasticsearch host is http://127.0.0.1:9200.
Step 1
Navigate to http://127.0.0.1:9200/_cat/indices?v to list your indices. You'll see something like this:
Step 2
Try accessing the desired index:
http://127.0.0.1:9200/products_development_20160517164519304
The output will look something like this:
Notice the aliases, meaning we can as well access the index at:
http://127.0.0.1:9200/products_development
Step 3
Navigate to http://127.0.0.1:9200/products_development/_search?pretty to see your data:
ElasticSearch data browser
Search, charts, one-click setup....
Aggregation Solution
Solving the problem by grouping the data: DrTech's answer uses facets for this, but facets are deprecated according to the Elasticsearch 1.0 reference.
Warning
Facets are deprecated and will be removed in a future release. You are encouraged to
migrate to aggregations instead.
Facets are replaced by aggregations, which are introduced in an accessible manner in the Elasticsearch Guide (it loads an example into Sense).
Short Solution
The solution is the same, except that aggregations require aggs instead of facets, with a size of 0, which sets the limit to the maximum integer. The example code requires the Marvel plugin.
# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
"aggs" : {
"indexed_occupier_names" : { <= Whatever you want this to be
"terms" : {
"field" : "first_name", <= Name of the field you want to aggregate
"size" : 0
}
}
}
}
Full Solution
Here is the Sense code to test it out - example of a houses index, with an occupier type, and a field first_name:
DELETE /houses
# Index example docs
POST /houses/occupier/_bulk
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "mark" }
# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
"aggs" : {
"indexed_occupier_names" : {
"terms" : {
"field" : "first_name",
"size" : 0
}
}
}
}
Response
Response showing the relevant aggregation code. With two keys in the index, John and Mark.
....
"aggregations": {
"indexed_occupier_names": {
"buckets": [
{
"key": "john",
"doc_count": 2 <= 2 documents matching
},
{
"key": "mark",
"doc_count": 1 <= 1 document matching
}
]
}
}
....
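If the response is consumed from Ruby rather than read in Sense, the buckets can be flattened into a term-to-count hash. The sketch below uses only the standard library. The response body is the relevant part of the output above with the annotations removed, and term_counts is a hypothetical helper name.

```ruby
require 'json'

# Relevant part of the aggregation response shown above.
response_body = <<~RESPONSE
  {
    "aggregations": {
      "indexed_occupier_names": {
        "buckets": [
          { "key": "john", "doc_count": 2 },
          { "key": "mark", "doc_count": 1 }
        ]
      }
    }
  }
RESPONSE

# Turn the buckets of a terms aggregation into { term => doc_count }.
# Returns an empty hash when the aggregation is missing from the response.
def term_counts(body, agg_name)
  buckets = JSON.parse(body).dig('aggregations', agg_name, 'buckets') || []
  buckets.map { |bucket| [bucket['key'], bucket['doc_count']] }.to_h
end

counts = term_counts(response_body, 'indexed_occupier_names')
counts.each { |term, count| puts "#{term}: #{count}" }
```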
A tool that helps me a lot when debugging ElasticSearch is ElasticHQ. Basically, it is an HTML file with some JavaScript. There is no need to install it anywhere, let alone in ES itself: just download it, unzip it, and open the HTML file in a browser.
I'm not sure it is the best tool for heavy ES users, but it is really practical for anyone in a hurry to see the entries.
Kibana is also a good solution. It is a data visualization platform for Elastic. If installed, it runs on port 5601 by default.
Among the many things it provides is "Dev Tools", where you can do your debugging.
For example you can check your available indexes here using the command
GET /_cat/indices
If you are using Google Chrome, you can simply use the extension named Sense, which is also available as a tool if you use Marvel.
https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig
Following @JanKlimo's example, in a terminal all you have to do is:
To see all the indices:
$ curl -XGET 'http://127.0.0.1:9200/_cat/indices?v'
To see the content of the index products_development_20160517164519304:
$ curl -XGET 'http://127.0.0.1:9200/products_development_20160517164519304/_search?pretty=1'
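Since the original question comes from a Rails application, the same two requests can also be sketched in Ruby using only the standard library. This assumes Elasticsearch is reachable at http://127.0.0.1:9200 as in the answers above. The helper names are hypothetical, and the actual network calls are left in comments so that nothing runs against a cluster by accident.

```ruby
require 'net/http'
require 'uri'

# Assumed host, matching the curl examples above.
ES_HOST = 'http://127.0.0.1:9200'

# URI that lists all indices (equivalent to GET /_cat/indices?v).
def cat_indices_uri(host = ES_HOST)
  URI.join(host, '/_cat/indices?v')
end

# URI that returns sample documents from one index.
def search_uri(index, host = ES_HOST)
  URI.join(host, "/#{index}/_search?pretty=1")
end

# Perform a GET request and return the response body as a string.
def fetch(uri)
  Net::HTTP.get(uri)
end

puts cat_indices_uri
puts search_uri('products_development')

# Usage, with a running cluster:
#   puts fetch(cat_indices_uri)
#   puts fetch(search_uri('products_development'))
```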
