We have an index of domain names in Elasticsearch (we are using the tire gem with Ruby to connect to and maintain it), but we are having trouble with exact searches.
If I search for the term google.com in domains, it brings back google.com, but it also brings back any domain containing a dash (-), such as in-google.com. Research leads me to believe that - is treated as a wildcard in ES and that all I need to do is set the field to not_analyzed, but that doesn't work:
:domain => { :type => 'string' , :analyzer => 'whitespace' },
:domain_2 => { :type => 'string' , :analyzer => 'pattern' },
:domain_3 => { :type => 'string', :index => 'not_analyzed' },
:domain_4 => { :type => 'string', :analyzer => 'snowball' }
I've tried different analysers, as you can see above, but they all have the same issue when searched using the 'head' plugin.
https://gist.github.com/anonymous/8080839 is the code I'm using to generate the dataset to test with. What I'm looking for is the ability to search for JUST google, and if I want *google I can implement my own wildcard.
I'm resigned to the fact that I'm going to have to delete and regenerate my index, but no matter what analyser or type I choose, I still cannot get an exact match.
You're not showing the sample queries you are using. Are you sure your queries and your indexing use the same text processing?
Also, you may want to check out the multi_field approach to analyzing things in multiple ways.
I've made a runnable example with a bunch of different queries that illustrate this. Note that the domain has been indexed in two ways, and note which field the queries are hitting: https://www.found.no/play/gist/ecc52fad687e83ddcf73
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
  "mappings": {
    "type": {
      "properties": {
        "domain": {
          "type": "multi_field",
          "fields": {
            "domain": {
              "type": "string",
              "analyzer": "standard"
            },
            "whitespace": {
              "type": "string",
              "analyzer": "whitespace"
            }
          }
        }
      }
    }
  }
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"domain":"google.com"}
{"index":{"_index":"play","_type":"type"}}
{"domain":"in-google.com"}
'
# Do searches
# Matches both
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "query": {
    "match": {
      "_all": "google.com"
    }
  }
}
'
# Also matches "google.com". in-google.com gets tokenized to ["in", "google.com"]
# and the default match operator is `or`.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "query": {
    "match": {
      "domain": {
        "query": "in-google.com"
      }
    }
  }
}
'
# What terms are generated? (Answer: `google.com` and `in`)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "size": 0,
  "facets": {
    "domain": {
      "terms": {
        "field": "domain"
      }
    }
  }
}
'
# This should just match the second document.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "query": {
    "match": {
      "domain.whitespace": {
        "query": "in-google.com"
      }
    }
  }
}
'
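If you also keep a completely unanalyzed variant of the field, an exact match can be expressed as a term query, which skips analysis on the query side as well. A minimal sketch, assuming the multi_field above had an extra sub-field named raw mapped with "index": "not_analyzed" (that sub-field is not part of the example above):
# Hypothetical: assumes an extra sub-field "raw" mapped as
# {"type": "string", "index": "not_analyzed"} inside the multi_field.
# A term query against it matches the stored value byte-for-byte.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "query": {
    "term": {
      "domain.raw": "in-google.com"
    }
  }
}
'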
Related
I have a RoR application where, in the admin dashboard, an admin can observe the locations of his employees. In my case I use ELK to gather employee information containing latitude and longitude, which is sent to my map as they move. My problem is that I have a template from which Logstash creates a daily index, but recently I found that every typed field in my index is changed to text when the index is created.
This is the JSON that Logstash reads:
{"driver_id": 31,"driver_email": "ankith.ravindran#mailinator.com","location": {"latitude": "-35.2824767","longitude": "149.1326453"},"created_at": "2021-06-29 14:28:47", "required_matches": 1, "type": "location"}
This is my logstash.conf file:
input {
  file {
    path => ["/usr/share/logstash/MPD_LOCATION/*",
             "/usr/share/logstash/MPD_LOCATION/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*/*/*"]
    start_position => "beginning"
    type => "json"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Insert a :: separator between concatenated JSON objects ...
  mutate {
    gsub => ["message","/}+({)/", "}::{"]
  }
  mutate {
    gsub => ["message","/}+( )/", "}::"]
  }
  # ... then split the line into one event per JSON object
  split {
    field => "message"
    terminator => "::"
  }
  json { source => "message" }
  mutate {
    add_field => { "uuid" => "D%{driver_id}T%{created_at}" }
    # Rename to the lat/lon keys expected by a geo_point object field
    rename => {
      "[location][latitude]" => "[location][lat]"
      "[location][longitude]" => "[location][lon]"
    }
    convert => {
      "[location][lat]" => "float"
      "[location][lon]" => "float"
    }
  }
}

output {
  if ([type] == "location") {
    elasticsearch {
      hosts => "http://elasticsearch:9200"
      index => "live_locations_%{+YYYY_MM_dd}"
      # manage_template => true
      template => "/usr/share/logstash/Template/live_locations.json"
      template_name => "live_locations"
      # template_overwrite => true
      document_id => "%{uuid}"
    }
  } else if ([type] == "app_info") {
    elasticsearch {
      hosts => "http://elasticsearch:9200"
      index => "app_info_%{+YYYY_MM_dd}"
      document_id => "%{uuid}"
    }
  }
  stdout { codec => rubydebug }
}
This is my template file:
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "driver_id": { "type": "integer" },
      "email": { "type": "text" },
      "location": { "type": "geo_point" },
      "app-platform": { "type": "text" },
      "app-version": { "type": "text" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" },
      "required_matches": { "type": "integer" }
    }
  }
}
For example, I defined the type of created_at as date, but when the index is created this field comes back as text, and I can't understand what happened. The location field likewise comes back as float, so I cannot use my index as geo_point. I should add that I use ELK version 7.13, running on Docker.
Update: I have two types of JSON; one of them only carries the location of the employee, and the second only carries the app_version and app_platform that the employee used.
Update 2: I changed my input from Logstash to Filebeat, but I still have the same problem.
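For debugging, the installed template and the mapping that a fresh daily index actually received can be inspected like this (host and names taken from the config above). Note that in 7.x a legacy template is only applied to new indices whose names match its index_patterns key, and the template file above does not show one:
# Is the template installed, and does it carry an index_patterns entry?
curl "http://elasticsearch:9200/_template/live_locations?pretty"
# What mapping did a freshly created daily index actually get?
curl "http://elasticsearch:9200/live_locations_*/_mapping?pretty"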
I have used Elasticsearch in my model to include spell check, such that if the user inputs data like "Rentaal" then it should fetch the correct data, "Rental".
document.rb code:
require 'elasticsearch/model'

class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :user

  Document.import force: true

  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end

  # Note: the custom filter must be defined under analysis so that
  # edge_ngram_analyzer can resolve it.
  settings index: {
    number_of_shards: 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
        }
      },
      filter: {
        edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
      }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
    end
  end
end
search controller code:
def search
  if params[:query].nil?
    @documents = []
  else
    @documents = Document.search params[:query]
  end
end
However, if I enter Rentaal or any misspelled word, it does not display anything.
In my console
@documents.results.to_a
gives an empty array.
What am I doing wrong here? Let me know if more data is required.
Try adding fuzziness to your multi_match query:
{
  "query": {
    "multi_match": {
      "query": "Rentaal",
      "fields": ["name^10", "service"],
      "fuzziness": "AUTO"
    }
  }
}
Explanation
The kstem filter is used for reducing words to their root forms, and it does not work as you expected here - it would correctly handle phrases like Renta or Rent, but not the misspelling you provided.
You can check how stemming works with the following query:
curl -X POST \
  'http://localhost:9200/my_index/_analyze?pretty=true' \
  -d '{
    "analyzer": "edge_ngram_analyzer",
    "text": ["rentaal"]
  }'
As a result I see:
{
  "tokens": [
    { "token": "ren" },
    { "token": "rent" },
    { "token": "renta" },
    { "token": "rentaa" },
    { "token": "rentaal" }
  ]
}
So a typical misspelling will be handled much better by applying fuzziness.
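You can verify the fuzzy variant the same way, for example (a sketch; it assumes the index is named documents, which should be Elasticsearch::Model's default for the Document model):
# Hypothetical check: run the suggested fuzzy query directly
# against the index to confirm "Rentaal" now matches "Rental".
curl -X POST \
  'http://localhost:9200/documents/_search?pretty=true' \
  -d '{
    "query": {
      "multi_match": {
        "query": "Rentaal",
        "fields": ["name^10", "service"],
        "fuzziness": "AUTO"
      }
    }
  }'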
I'm attempting to use Eve to provide a RESTful API for a simple list of items.
I'd like to use 1) one HTTP request to create a list (possibly with initial items), 2) one HTTP request to add an item (a common operation), 3) one HTTP request to get the list (including all child items). In other words:
1) POST /lists with body
{
  "title": "My List",
  "items": [
    { "name": "Alice" },
    { "name": "Bob" }
  ]
}
2) POST /lists/555555555555555555555555/items with body
{
  "name": "Carol"
}
3) GET /lists/555555555555555555555555
{
  "_id": "555555555555555555555555",
  "title": "My List",
  "items": [
    { "_id": "aaaaaaaaaaaaaaaaaaaaaaaa", "name": "Alice" },
    { "_id": "bbbbbbbbbbbbbbbbbbbbbbbb", "name": "Bob" },
    { "_id": "cccccccccccccccccccccccc", "name": "Carol" }
  ]
}
I haven't figured out how to do this with Eve. I can do (1) using an embedded list of dicts, but then I can't do (2)—I'd have to POST an item and then PATCH the list (?). I can do (2) using sub-resources, but then I can't do (1) ("value '{'name': 'Alice'}' cannot be converted to a ObjectId"). Or am I missing something?
If all three can't be done, could at least both (2) and (3) be supported?
I figured out how to implement (2) and (3), using database event hooks to inject the embedded child documents into the parent list before it's returned to the client (and also delete the children when the parent is deleted). This works and supports the expected REST usage on individual list items. It results in two DB queries, however.
I suspect (1) could also be implemented using an event hook, but this will suffice for now.
Any further improvements/suggestions are welcome. It would be nice if there were an easier way to accomplish this (keywords: One-to-Many Relationships with Embedded Documents).
settings.py:
RESOURCE_METHODS = ['GET', 'POST', 'DELETE']
ITEM_METHODS = ['GET', 'PUT', 'PATCH', 'DELETE']

lists = {
    'schema': {
        'title': {
            'type': 'string'
        }
    }
}

items = {
    'url': 'lists/<regex("[a-f0-9]{24}"):list_id>/items',
    'schema': {
        'name': {
            'type': 'string',
            'required': True
        },
        'list_id': {
            'type': 'objectid',
            'required': True,
            'data_relation': {
                'resource': 'lists',
                'field': '_id'
            }
        }
    }
}

DOMAIN = {
    'lists': lists,
    'items': items
}
main.py:
from bson.objectid import ObjectId

# `app` is the Eve application and `db` its PyMongo database handle,
# e.g. app = Eve() and db = app.data.driver.db.

def before_returning_lists(response):
    # Embed the child items in the fetched list (the second DB query)
    list_id = response['_id']
    response['items'] = list(db.items.find({'list_id': ObjectId(list_id)}))

def after_deleting_lists(item):
    # Cascade-delete the children when their parent list is deleted
    list_id = item['_id']
    db.items.delete_many({'list_id': ObjectId(list_id)})

app.on_fetched_item_lists += before_returning_lists
app.on_deleted_item_lists += after_deleting_lists
Usage
# (1)
curl -X POST http://127.0.0.1:5000/lists -d title="My List"
# (2)
curl -X POST http://127.0.0.1:5000/lists/5895fdb5a663e2dcad9e7647/items -d 'name=Alice'
curl -X POST http://127.0.0.1:5000/lists/5895fdb5a663e2dcad9e7647/items -d 'name=Bob'
curl -X POST http://127.0.0.1:5000/lists/5895fdb5a663e2dcad9e7647/items -d 'name=Carol'
# (3)
curl -X GET http://127.0.0.1:5000/lists/5895fdb5a663e2dcad9e7647
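With the hooks above, (3) then returns the list with its children embedded, along these lines (a sketch of the shape, not captured output; Eve also adds its own meta fields such as _updated and _links, and the child _ids are elided here):
{
  "_id": "5895fdb5a663e2dcad9e7647",
  "title": "My List",
  "items": [
    { "_id": "...", "name": "Alice", "list_id": "5895fdb5a663e2dcad9e7647" },
    { "_id": "...", "name": "Bob", "list_id": "5895fdb5a663e2dcad9e7647" },
    { "_id": "...", "name": "Carol", "list_id": "5895fdb5a663e2dcad9e7647" }
  ]
}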
I'm using this gem for the Elasticsearch API.
I'm trying to convert the following curl statement to an equivalent API call:
curl -X GET 'localhost:9200/_search?pretty=true' -d '{
  "size": 100,
  "fields": [
    "@message",
    "@timestamp"
  ],
  "query": {
    "term": {
      "@message": "drop"
    }
  }
}'
I tried these, but I'm not getting the intended results.
Elasticsearch API
@esearch = Elasticsearch::Client.new log: true
@data2 = @esearch.search q: {
    term: {
      "@message" => "drop"
    }
  },
  size: '100',
  fields: '["@message", "@timestamp"]'
Transport API
client = Elasticsearch::Client.new
@data = client.perform_request 'GET', '_search', {
  :size => 100,
  :query => {
    :term => {
      "message" => "drop"
    }
  },
  :fields => [
    '@message',
    '@timestamp'
  ]
}
Please help
You need to wrap all of those parameters in a body element:
@data2 = @esearch.search(
  body: {
    query: { term: { "@message" => "drop" } },
    size: 100,
    fields: ["@message", "@timestamp"]
  }
)
The search DSL (query, size, fields) has to go in the request body; the q: parameter is only for Lucene query-string searches, which is why the attempts above didn't behave as intended.
I make this call to map, by default, all datatypes as strings:
curl -XPUT 'http://localhost:9200/_all/_default_/_mapping' -d '
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "match": "*",
          "mapping": {
            "type": "string"
          }
        }
      ]
    }
  }
}
'
The mapping does not work, so I make this call to verify:
curl -XGET 'http://localhost:9200/_all/_mapping'
{
  "logstash-2014.02.05": {
    "_default_": {
      "properties": {}
    }
  }
}
Why is the properties part empty?
You should delete the mappings key from your PUT request. You only specify mappings when you are creating an index, not when updating mappings.
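A sketch of the corrected request; note it also wraps the dynamic template in a name (here the hypothetical strings_only), since Elasticsearch expects each entry in dynamic_templates to be a named object:
# Corrected: no "mappings" wrapper, and the dynamic template is named.
curl -XPUT 'http://localhost:9200/_all/_default_/_mapping' -d '
{
  "_default_": {
    "dynamic_templates": [
      {
        "strings_only": {
          "match": "*",
          "mapping": {
            "type": "string"
          }
        }
      }
    ]
  }
}
'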