I was previously using the mapper-attachments plugin, now deprecated, which was fairly easy to use alongside normal indexing. Now that ingest-attachment has replaced it and requires a pipeline, it has become confusing how to use it properly.
Let's say I have a model named Media that has a file field containing the Base64-encoded file. I have the following mappings in that model:
mapping '_source' => { :excludes => ['file'] } do
  indexes :id, type: :long, index: :not_analyzed
  indexes :name, type: :text
  indexes :visibility, type: :integer, index: :not_analyzed
  indexes :created_at, type: :date, include_in_all: false
  indexes :updated_at, type: :date, include_in_all: false

  # attachment specific mappings
  indexes 'attachment.title', type: :text, store: 'yes'
  indexes 'attachment.author', type: :text, store: 'yes'
  indexes 'attachment.name', type: :text, store: 'yes'
  indexes 'attachment.date', type: :date, store: 'yes'
  indexes 'attachment.content_type', type: :text, store: 'yes'
  indexes 'attachment.content_length', type: :integer, store: 'yes'
  indexes 'attachment.content', term_vector: 'with_positions_offsets', type: :text, store: 'yes'
end
I have created an attachment pipeline via curl:
curl -XPUT 'localhost:9200/_ingest/pipeline/attachment' -d'
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "file"
      }
    }
  ]
}'
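As an aside, the same pipeline can also be created from Ruby through the gem's client instead of curl; a minimal sketch using the ingest API exposed by elasticsearch-api (assuming the default Elasticsearch::Model client):

# Sketch: create the ingest pipeline via the elasticsearch-api client
# (equivalent to the curl call above; assumes the default client).
client = Elasticsearch::Model.client
client.ingest.put_pipeline(
  id: 'attachment',
  body: {
    description: 'Extract attachment information',
    processors: [
      { attachment: { field: 'file' } }
    ]
  }
)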
Now, previously a simple Media.last.__elasticsearch__.index_document would have been sufficient to index a record along with the actual file via the mapper-attachments plugin.
I'm not sure how to do this with ingest-attachment using a pipeline and the elasticsearch-rails gem.
I can do the following PUT via curl:
curl -XPUT 'localhost:9200/assets/media/68?pipeline=attachment' -d'
{ "file" : "my_really_long_encoded_file_string" }'
This will index the encoded file, but obviously it doesn't index the rest of the model's data (or overwrites it completely if the record was previously indexed). I don't really want to have to include every single model attribute along with the file in a curl command. Are there better or simpler ways of doing this? Am I just completely off on how pipelines and ingest are supposed to work?
Finally figured this out. I ended up needing to update the ES gems, specifically elasticsearch-api.
With the mappings and pipeline set up as shown above, you can simply do:
Media.last.__elasticsearch__.index_document pipeline: :attachment
or
Media.last.__elasticsearch__.update_document pipeline: :attachment
This will index everything correctly and your file will be properly parsed and indexed via the ingest pipeline.
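One detail worth spelling out: the document sent to Elasticsearch still has to contain the Base64 field that the pipeline reads, even though _source excludes it. A minimal sketch of an as_indexed_json for the Media model above (the attribute names are assumptions based on the mapping):

# Sketch: include the Base64-encoded `file` field in the indexed JSON so the
# ingest pipeline has something to extract (attribute names assumed).
def as_indexed_json(_options = {})
  as_json(only: %i[id name visibility created_at updated_at]).merge('file' => file)
end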
■The environment
MacOS
RailsServer
Ruby 2.4.1
Ruby on Rails 5.1.7
MySQL
Elasticsearch 7.10.2-SNAPSHOT
kuromoji
Gems
elasticsearch (7.4.0)
elasticsearch-api (7.4.0)
elasticsearch-model (7.1.0 80822d6)
elasticsearch-rails (7.1.0 80822d6)
elasticsearch-transport (7.4.0)
■My wish
Hi, I'm Japanese.
My goal is to speed up database reads by using Elasticsearch with Rails.
Right now I'm having trouble with the search method in Rails.
I will explain the current settings.
First, the records appear to have already been indexed into Elasticsearch, as shown below:
curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open es_project_development Fpu_adXWT9Gtw7KZTh0aDw 1 1 6804 0 3.7mb 3.7mb
Second, the index settings and mappings for Elasticsearch are as follows.
/models/concerns/project_searchable.rb
module ProjectSearchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    index_name "es_project_#{Rails.env}"

    settings do
      mappings dynamic: 'false' do
        indexes :id, type: 'integer'
        indexes :title, type: 'text', analyzer: 'kuromoji'
        indexes :contents, type: 'text', analyzer: 'kuromoji'
        indexes :industry, type: 'text'
        ...
      end
    end

    def as_indexed_json(*)
      attributes
        .symbolize_keys
        .slice(:id, :title, :contents, :industry, ...)
    end
  end

  class_methods do
    def create_index!
      client = __elasticsearch__.client
      client.indices.delete index: self.index_name rescue nil
      client.indices.create(index: self.index_name,
                            body: {
                              settings: self.settings.to_hash,
                              mappings: self.mappings.to_hash
                            })
    end

    def es_search(query)
      __elasticsearch__.search({
        query: {
          multi_match: {
            fields: %w(id title contents industry ...),
            type: 'cross_fields',
            query: query,
            operator: 'and'
          }
        }
      })
    end
  end
end
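(For reference, a search defined this way is typically executed and materialized roughly as in the sketch below; elasticsearch-model's search returns a lazy response object, and the hits are only fetched once results or records is called.)

# Sketch: invoking the class method defined above and materializing the hits.
response = Project.es_search('Game')
response.results.total # => number of matching documents
response.records.to_a  # => the matching ActiveRecord objects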
Third, the view and the controller are as follows.
/views/top/index.html.slim
= form_tag search_path, {:method=>"get"}
  table border="0"
    tr
      td
        = text_field_tag 'keyword[name]', nil, class: 'write'
      td
        input.push type="submit" value=""
...
/controllers/projects_controller.rb
def search
  ...
  @keyword = params.dig('keyword', 'name')
  params = ''
  connection = '?'
  ...
  if @keyword.present?
    params = "#{params}#{connection}keyword=#{@keyword}"
    connection = '&'
  end
  ...
Last, ProjectSearchable is included in models/project.rb.
/models/project.rb
class Project < ApplicationRecord
include ProjectSearchable
...
■The problem
In the Rails console, I typed the following command:
Project.es_search('Game')
The response was the following. (Note: "Game" is included in the records.)
#<Elasticsearch::Model::Response::Response:0x007fc1a85621a8
 @klass=[PROXY] Project (call 'Project.connection' to establish a connection),
 @search=
  #<Elasticsearch::Model::Searching::SearchRequest:0x007fc1a8562248
   @definition=
    {:index=>"es_project_development",
     :type=>nil,
     :body=>
      {:query=>
        {:multi_match=>
          {:fields=>
            ["id",
             "title",
             "contents",
             "industry",
             "required",
             ...
             "comment"],
           :type=>"cross_fields",
           :query=>"Game",
           :operator=>"and"}}}},
   @klass=[PROXY] Project (call 'Project.connection' to establish a connection),
   @options={}>>
I don't think Elasticsearch is working. I suspect the problem is in my search method for Elasticsearch, but I don't have enough knowledge to pin it down.
I really need your help. Thank you.
Background:
I have a Location model that has_one Address and has_many Rooms. When I want to update a location, either by updating its name, its address or its rooms, I'm using the following InputObjects to do this:
module Types
  # Input interface for creating locations
  class LocationUpdateType < Types::BaseInputObject
    argument :id, ID, required: false
    argument :name, String, required: true
    argument :address, AddressUpdateType, required: true, as: :address_attributes
    argument :rooms, [RoomUpdateType], required: true, as: :rooms_attributes
  end

  class AddressUpdateType < Types::BaseInputObject
    argument :id, ID, required: true
    # not sure if I need this or not but it's commented out for now
    # argument :location_id, ID, required: true
    argument :street, String, required: true
    argument :city, String, required: true
    argument :state, String, required: true
    argument :zip_code, String, required: true
    # I'm not using this yet but I'm anticipating it because
    # accepts_nested_attributes can use it as an indicator to destroy this address
    argument :_destroy, Boolean, required: false
  end

  class RoomUpdateType < Types::BaseInputObject
    argument :id, ID, required: false
    # same thing with this ID.
    # argument :location_id, ID, required: true
    argument :name, String, required: true
    # accepts_nested_attributes flag for destroying related room records.
    # like above, I'm not using it yet but I plan to.
    argument :_destroy, Boolean, required: false
  end
end
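For these *_attributes keys to be accepted by the model, Location presumably has nested attributes enabled; a sketch of what that might look like (association and option names are assumed from the description above, not taken from the original post):

# Sketch: nested attributes matching the `as: :address_attributes` /
# `as: :rooms_attributes` arguments above (assumed, not from the original post).
class Location < ApplicationRecord
  has_one :address
  has_many :rooms

  accepts_nested_attributes_for :address, :rooms, allow_destroy: true
end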
When I make a GraphQL request, I'm getting the following in my logs:
Processing by GraphqlController#execute as JSON
Variables: {"input"=>
{"id"=>"TG9jYXRpb24tMzM=",
"location"=>
{"name"=>"A New Building",
"address"=>
{"city"=>"Anytown",
"id"=>"QWRkcmVzcy0zMw==",
"state"=>"CA",
"street"=>"444 New Rd Suite 4",
"zipCode"=>"93400"},
"rooms"=>[{"id"=>"Um9vbS00Mw==", "name"=>"New Room"}]}}}
mutation locationUpdate($input: LocationUpdateInput!) {
  locationUpdate(input: $input) {
    errors
    location {
      id
      name
      address {
        city
        id
        state
        street
        zipCode
      }
      rooms {
        id
        name
      }
    }
  }
}
Which makes sense: I don't want to use real IDs on the client, only the obfuscated Relay Node IDs.
Problem:
When my request goes to be resolved I'm using this Mutation:
module Mutations
  # Update a Location, its address and rooms.
  class LocationUpdate < AdminMutation
    null true
    description 'Updates a location for an account'

    field :location, Types::LocationType, null: true
    field :errors, [String], null: true

    argument :id, ID, required: true
    argument :location, Types::LocationUpdateType, required: true

    def resolve(id:, location:)
      begin
        l = ApptSchema.object_from_id(id)
      rescue StandardError
        l = nil
      end
      return { location: nil, errors: ['Location not found'] } if l.blank?

      print location.to_h
      # return { location: nil }

      # This is throwing an error because it doesn't like the Relay Node IDs.
      l.update(location.to_h)
      return { location: nil, errors: l.errors.full_messages } unless l.valid?

      { location: l }
    end
  end
end
When I print the hash that gets passed into this resolver I get the following:
{
  :name=>"A New Building",
  :address_attributes=>{
    :id=>"QWRkcmVzcy0zMw==",
    :street=>"444 New Rd Suite 4",
    :city=>"Anytown",
    :state=>"CA",
    :zip_code=>"93400"
  },
  :rooms_attributes=>[{:id=>"Um9vbS00Mw==", :name=>"New Room"}]
}
When l.update runs, I get the following error:
Couldn't find Room with ID=Um9vbS00Mw== for Location with ID=33
This makes perfect sense to me because the Relay Node IDs aren't stored in the database, so I'm trying to figure out how to convert room.id from a Relay Node ID to the ID in the database.
Now, I could dig through the hash and use the ApptSchema.object_from_id and convert all the Relay Node IDs to Rails IDs but that requires a database hit for each one. I see the documentation for Connections listed here but this looks more like how to deal with queries and pagination.
Do I need to send the database IDs to the client if I plan on updating records with related records? Doesn't that defeat the purpose of Relay Node IDs? Is there a way to configure my Input object types to convert the Relay Node IDs to Rails IDs so I get the proper IDs in the hash sent to my resolver?
After lots of research, it seems that Relay UUIDs aren't really feasible when trying to update records through relationships. I think Relay assumes that you're using something like MongoDB, which does this with its records' primary keys by default.
Using friendly_id as @Nuclearman has suggested seems to be the best way to obfuscate URLs. What I've decided to do is to add friendly_id to records that I want to view in a "detail mode", like /posts/3aacmw9mudnoitrswkh9vdrt, by creating a concern like this:
module Sluggable
  extend ActiveSupport::Concern

  included do
    validates :slug, presence: true

    extend FriendlyId
    friendly_id :create_slug_id, use: :slugged

    private def create_slug_id
      # Try to see if the slug has already been created before generating a new one.
      @create_slug_id ||= self.slug
      @create_slug_id ||= SecureRandom.alphanumeric(24)
    end
  end
end
Then include the concern in any model you want to have friendly_ids, like so:
class Location < ApplicationRecord
include Sluggable
end
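Note that friendly_id's :slugged module reads and writes a slug column, so each model including the concern needs one; a sketch of the kind of migration involved (table name and migration version are assumptions):

# Sketch: add the `slug` column that friendly_id's :slugged module expects
# (table name and migration version assumed).
class AddSlugToLocations < ActiveRecord::Migration[5.2]
  def change
    add_column :locations, :slug, :string
    add_index :locations, :slug, unique: true
  end
end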
Then in your GraphQL type do something like this:
module Types
  class LocationType < Types::BaseObject
    field :id, ID, null: false
    # Only include this for model types that have friendly_id
    field :friendly_id, String, null: false
    field :name, String, null: false
    # AddressType and RoomType classes won't have friendly_id because
    # I'm not going to have a url like /location/3aacmw9mudn/address/oitrswkh9vdrt
    # or /location/3aacmw9mudn/rooms/oitrswkh9vdrt
    field :address, AddressType, null: false
    field :rooms, [RoomType], null: false
  end
end
module Types
  class LocationUpdateType < Types::BaseInputObject
    argument :name, String, required: true
    argument :address, AddressUpdateType, required: true, as: :address_attributes
    argument :rooms, [RoomUpdateType], required: true, as: :rooms_attributes
  end
end
You might want to add friendly_id to all your models, but my guess is you may only want it for "important" ones. Variables passed to the query no longer use UUIDs for dependent relationships, only for the object you're updating when finding it by ID, like so:
{"input"=>
{"id"=>"3aacmw9mudn",
"location"=>
{"name"=>"A New Building",
"address"=>
{"city"=>"Anytown",
"id"=>"4",
"state"=>"CA",
"street"=>"444 New Rd Suite 4",
"zipCode"=>"93400"},
"rooms"=>[{"id"=>"13", "name"=>"New Room"}]}}}
And now your resolver might look something like:
module Mutations
  # Update a Location, its address and rooms.
  class LocationUpdate < AdminMutation
    null true
    description 'Updates a location for an account'

    field :location, Types::LocationType, null: true
    field :errors, [String], null: true

    argument :id, ID, required: true
    argument :location, Types::LocationUpdateType, required: true

    def resolve(id:, location:)
      # Here, we're passing the friendly_id to this resolver (see above),
      # not the actual ID of the record.
      l = Location.friendly.find(id)
      return { location: nil, errors: ['Location not found'] } if l.blank?

      l.update(location.to_h)
      return { location: nil, errors: l.errors.full_messages } unless l.valid?

      { location: l }
    end
  end
end
Using swagger-blocks in Rails, how would I document a POST endpoint which consumes a single JSON body such as:
{
  "id": "1",
  "name": "bill",
  "age": "22"
}
No matter what I do, my tests keep saying that my setup is not valid Swagger 2.0 JSON schema.
Below is the code I am using to generate my documentation:
swagger_path '/list/add' do
  operation :post do
    key :summary, 'Add person to list'
    parameter name: :id, in: :body, required: true, type: :string
    parameter name: :name, in: :body, required: true, type: :string
    parameter name: :age, in: :body, required: true, type: :string
    response 200 do
      key :description, 'Successfully added to list'
    end
  end
end
The JSON seems correct syntactically except that "id" should be generated automatically by default if you are trying to create. You may check this specification for violations.
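For what it's worth, Swagger 2.0 allows at most one body parameter per operation, and that parameter has to carry a schema rather than a type, which is the usual reason a setup like the one above fails validation. In swagger-blocks that could look roughly like the sketch below (untested; the parameter name and required list are assumptions):

# Sketch: a single body parameter with an inline schema, which is what
# Swagger 2.0 expects instead of several `in: :body` parameters.
swagger_path '/list/add' do
  operation :post do
    key :summary, 'Add person to list'
    parameter do
      key :name, :person
      key :in, :body
      key :required, true
      schema do
        key :required, [:name, :age]
        property :id do
          key :type, :string
        end
        property :name do
          key :type, :string
        end
        property :age do
          key :type, :string
        end
      end
    end
    response 200 do
      key :description, 'Successfully added to list'
    end
  end
end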
I'm using Elasticsearch to enhance search capabilities in my app. Search is working perfectly; however, sorting is not working for fields with multiple words.
When I try to sort the search results by the log 'message' field, I get the error:
"Can't sort on string types with more than one value per doc, or more than one token per field"
I googled the error and found out that I can use a multi-field mapping on the :message field (one sub-field analyzed and the other not analyzed) to sort on it. So I did this:
class Log < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  tire.mapping do
    indexes :id, index: :not_analyzed
    indexes :source, type: 'string'
    indexes :level, type: 'string'
    indexes :created_at, :type => 'date', :include_in_all => false
    indexes :updated_at, :type => 'date', :include_in_all => false
    indexes :message, type: 'multi_field', fields: {
      analyzed: { type: 'string', index: 'analyzed' },
      message: { type: 'string', index: :not_analyzed }
    }
    indexes :domain, type: 'keyword'
  end
end
But for some reason this mapping is not being passed to ES.
rails console
Log.index.delete #=> true
Log.index.create #=> 200 : {"ok":true,"acknowledged":true}
Log.index.import Log.all #=> 200 : {"took":243,"items":[{"index":{"_index":"logs","_type":"log","_id":"5 ... ...
# Index mapping for :message is not the multi-field
# as I created in the Log model... why?
Log.index.mapping
=> {"log"=>
{"properties"=>
{"created_at"=>{"type"=>"date", "format"=>"dateOptionalTime"},
"id"=>{"type"=>"long"},
"level"=>{"type"=>"string"},
"message"=>{"type"=>"string"},
"source"=>{"type"=>"string"},
"updated_at"=>{"type"=>"date", "format"=>"dateOptionalTime"}}}}
# However if I do a Log.mapping I can see the multi-field
# how I can fix that and pass the mapping correctly to ES?
Log.mapping
=> {:id=>{:index=>:not_analyzed, :type=>"string"},
:source=>{:type=>"string"},
:level=>{:type=>"string"},
:created_at=>{:type=>"date", :include_in_all=>false},
:updated_at=>{:type=>"date", :include_in_all=>false},
:message=>
{:type=>"multi_field",
:fields=>
{:message=>{:type=>"string", :index=>"analyzed"},
:untouched=>{:type=>"string", :index=>:not_analyzed}}},
:domain=>{:type=>"keyword"}}
So, Log.index.mapping is the current mapping in ES, and it doesn't contain the multi-field that I created. Am I missing something? And why is the multi-field shown in Log.mapping but not in Log.index.mapping?
I have changed the workflow from:
Log.index.delete; Log.index.create; Log.import
to
Log.index.delete; Log.create_elasticsearch_index; Log.import
MyModel.create_elasticsearch_index creates the index with the proper mapping from the model definition. See Tire's issue #613.
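Once the multi-field mapping is actually in the index, sorting can target the not-analyzed sub-field while searching still uses the analyzed one; with Tire's DSL that might look roughly like this sketch (the exact sub-field path, message vs. message.untouched, depends on which variant of the mapping ends up in the index):

# Sketch: search the analyzed field, sort on the not-analyzed sub-field
# (sub-field name assumed; adjust to match the actual index mapping).
Log.tire.search do
  query { string 'error' }
  sort { by 'message.untouched', 'asc' }
end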
I am still doing something wrong. Could somebody please help me?
I want to create a custom analyzer with ascii filter in Rails + Mongoid.
I have a simple model product which has field name.
class Product
  include Mongoid::Document
  field :name

  settings analysis: {
    analyser: {
      ascii: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  mapping do
    indexes :name, analyzer: 'ascii'
  end
end
Product.create(name: "svíčka")
Product.search(q: "svíčka").count # => 1
Product.search(q: "svicka").count # => 0, can't find - expected 1
Product.create(name: "svicka")
Product.search(q: "svíčka").count # => 0, can't find - expected 1
Product.search(q: "svicka").count # => 1
And when I check the index with elasticsearch-head, I expected the term to be stored without accents, like "svicka", but it looks like "Svíčka".
What am I doing wrong?
When I check it with API it looks OK:
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=asciifolding' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
http://localhost:9200/development_imango_products/_mapping
{"development_imango_products":{"product":{"properties":{"name":{"type":"string","analyzer":"ascii"}}}}}
curl -XGET 'localhost:9200/development_imango_products/_analyze?field=name' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
You can check how you are actually indexing your document using the Analyze API.
You also need to take into account that there's a difference between what you index and what you store. What you store is returned when you query, and it is exactly what you send to Elasticsearch, while what you index determines which documents you get back while querying.
Using asciifolding is a good choice for your use case; it should return results when querying for either svíčka or svicka. I guess there's just a typo in your settings: analyser should be analyzer. That analyzer is probably not being applied as you'd expect.
UPDATE
Given your comment, you haven't solved the problem yet. Can you check what your mapping looks like (localhost:9200/index_name/_mapping)? The way you're using the Analyze API is not that useful, since you're manually providing the text analysis chain, but that doesn't mean the same chain is applied to your field as you'd expect. It's better to provide the name of the field, like this:
curl -XGET 'localhost:9200/index_name/_analyze?field=field_name' -d 'svíčka'
That way the analyze api will rely on the actual mapping for that field.
UPDATE 2
After you've made sure that the mapping is correctly submitted and everything looks fine, I noticed you're not specifying the field that you want to query. If you don't specify it, you're querying the _all special field, which by default contains all the fields that you're indexing and is analyzed with the default StandardAnalyzer. You should use the following query: name:svíčka.
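For example, a query that targets the mapped name field explicitly (instead of _all) could look something like this; the index name is taken from the mapping output above:

curl -XGET 'localhost:9200/development_imango_products/_search' -d '{
  "query": { "query_string": { "query": "name:svíčka" } }
}'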
Elasticsearch needs the settings and mapping in a single API call. I am not sure if it's mentioned in the Tire docs, but I faced a similar problem using both settings and mapping when setting up Tire. The following should work:
class Product
  include Mongoid::Document
  # include tire stuff
  field :name

  # this method is just created for readability,
  # the settings hash can also be passed directly;
  # note it has to be defined before it is used in `settings` below
  def self.tire_settings
    {
      analysis: {
        analyzer: {
          ascii: {
            type: 'custom',
            tokenizer: 'whitespace',
            filter: ['lowercase', 'asciifolding']
          }
        }
      }
    }
  end

  settings(self.tire_settings) do
    mapping do
      indexes :name, analyzer: 'ascii'
    end
  end
end
Your notation for settings/mappings is incorrect; as @rubish suggests, check the documentation in https://github.com/karmi/tire/blob/master/lib/tire/model/indexing.rb (no question the docs should be better).
Always, always, always check the mapping of the index to see if your desired mapping has been applied.
Use the Analyze API, as @javanna suggests, to check quickly how your analysis chain works, without having to store documents, check results, etc.
Please note that it is very important to add two lines to a model to make it searchable through Tire. Your model should look like:
class Model
  include Mongoid::Document
  include Mongoid::Timestamps
  include Tire::Model::Search
  include Tire::Model::Callbacks

  field :field_name, type: String

  index_name "#{Tire::Model::Search.index_prefix}model_name"

  settings :analysis => {
    :analyzer => {
      "project_lowercase_analyzer" => {
        "tokenizer" => "keyword",
        "filter" => ["lowercase"],
        "type" => "custom"
      }
    },
  } do
    mapping do
      indexes :field_name, :boost => 10, :type => 'string', :analyzer => 'standard', :filter => ['standard', 'lowercase', 'keyword']
    end
  end

  def self.search(params = {})
    query = params[:text_field_name_from_form] # the text field name from your search form
    Model.tire.search(load: true, page: params[:page], per_page: 5) do
      query { string query, default_operator: "AND" } if query.present?
    end
  end
end
You can change the index_name (it should be unique) and the analyzer.
And your controller would look like this:
def method_name
  @results = Model.search(params).results
end
You can use @results in your view. Hope this helps.
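A minimal sketch of what the corresponding view could look like (Slim, matching the style of the form shown earlier; the rendered attribute is an assumption):

/ Sketch: render the loaded records (load: true returns model instances)
- @results.each do |result|
  p = result.field_name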