I have set up Elasticsearch in Rails with the 'elasticsearch-model' and 'elasticsearch-rails' gems.
I am trying to search these attachments by their content. It works well when I index a PDF, Word, plain-text, or a plethora of other formats, but it does not work when I index a Mac format, e.g. .pages, .keynote and .numbers files.
I made sure that the Mac files get indexed, but it feels like they are not indexed properly. When I look at the raw index data for a Word file vs. a .pages file, both have their respective attachment fields populated with a Base64 representation of the document content. For the Mac extensions, however, this Base64 representation doesn't seem to be accurate.
My index model definition:
settings index: { number_of_shards: 3 } do
  mappings do
    indexes :filename
    indexes :uploaded_by
    indexes :project_id, index: :not_analyzed
    indexes :attachment, type: 'attachment'
  end
end
def as_indexed_json(options={})
  self.as_json({
    only: [:project_id, :filename, :uploaded_by, :attachment],
    methods: [:attachment]
  })
end
My attachment method:
def attachment
  if url
    key = url.sub("https://s3.amazonaws.com/#{ENV['BUCKETNAME']}/", "")
    content = AWS.s3.buckets[ENV['BUCKETNAME']].objects[key].read
    Base64.encode64(content)
  end
end
The file first gets uploaded to S3 (since the client side sends it there directly), then read back by the server from S3 to get indexed. This is proof-of-concept code only; in future development the client will upload to the server, which will index the file, upload it to S3, and then delete it from the server.
Elasticsearch version: 1.7.1
Lucene version: 4.10.4
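To narrow this down, I'm considering storing some of the attachment sub-fields so I can inspect what the plugin (which extracts text via Apache Tika) actually pulled out of the iWork files. This is only a sketch against the ES 1.x attachment type; the sub-field options follow the mapper-attachments plugin docs and may need adjusting:

# Store the extracted metadata alongside the indexed content so it can be
# requested back per document via the "fields" parameter of a search.
indexes :attachment, type: 'attachment', fields: {
  content:      { store: 'yes' },
  content_type: { store: 'yes' },
  title:        { store: 'yes' }
}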
Related
I use Ruby on Rails 5.2.3, Mongoid, Attachinary and Cloudinary for images.
class User
  include Mongoid::Document

  has_attachment :image, accept: [:jpg, :png, :gif]
  field :pic, type: String

  before_update :migrate_images

  def migrate_images
    self.image_url = self.pic
  end
end
Images are saved in pic field as links.
Now I use the code below; the problem is that it takes a very long time and not all images are saved.
User.where(:pic.exists => true).all.each &:update
Log:
irb(main):001:0> User.where(:pic.exists => true).all.each &:update
=> #<Mongoid::Contextual::Mongo:0x00007ffe5a3f98e0 @cache=nil, @klass=User, @criteria=#<Mongoid::Criteria
  selector: {"pic"=>{"$exists"=>true}}
  options:  {}
  class:    User
  embedded: false>
, @collection=#<Mongo::Collection:0x70365213493680 namespace=link_development.users>, @view=#<Mongo::Collection::View:0x70365213493380 namespace='link_development.users' @filter={"pic"=>{"$exists"=>true}} @options={"session"=>nil}>, @cache_loaded=true>
User.where(:pic.exists => true).all.each &:update
This is slow because .all.each loads every matching User into memory. find_each is a bit easier on memory since it loads records in batches, but it is still a waste of time and memory to instantiate each record just to copy one attribute, and then run a separate update for each one.
Instead, you can do this entirely in the database in a single query.
If the intent is to copy from User.pic to User.image_url, you can do this in a single statement.
# Find all the users who do not already have an image_url set
User.where(image_url: nil)
  # Set their image_url to be their pic.
  .update_all("image_url = pic")
This will run a single query:
update users
set image_url = pic
where image_url is null
There's no need to also check for users who lack a pic, because there's no harm in setting nil to nil, and the simpler condition might be faster. But if you'd like to check anyway, you can use where.not: User.where(image_url: nil).where.not(pic: nil)
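Since the question uses Mongoid rather than ActiveRecord, a rough equivalent of the single-query approach on the MongoDB side is an aggregation-pipeline update. This is only a sketch, assuming MongoDB 4.2+ (when pipeline updates became available) and a driver recent enough to accept them; field names are taken from the question:

# Copy pic into image_url in one server-side command, without loading documents.
User.collection.update_many(
  { "image_url" => nil, "pic" => { "$ne" => nil } },
  [{ "$set" => { "image_url" => "$pic" } }]
)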
I'm trying to save a file to a Google Storage bucket; I followed the official guide for Rails.
So I have this code for uploading my file:
after_create :upload_document, if: :path

def upload_document
  file = Document.storage_bucket.create_file \
    path.tempfile,
    "cover_images/#{id}/#{path.original_filename}",
    content_type: path.content_type,
    acl: "public"

  # Update the path field in my database with the file's public URL
  update_columns(path: file.public_url)
end
I can store my file in my bucket, and I can retrieve the public_url and update the path field in my table, but when I try to fetch the path string I get nil. Example from my Rails console:
Document.find(14)
=> #<Document id: 14, name: "Test", path: "https://storage.googleapis.com/xxxxxxxx-xxxx.x...", created_at: "2018-10-05 07:17:59", updated_at: "2018-10-05 07:17:59">
Document.find(14).path
=> nil
Document.find(14).name
=> "Test"
So I don't understand why I can't access the path field through the model, even though I can see it in my SQL database after the update_columns call.
Thanks a lot for your help
You have some method defined on the Document class (or an included module) that is overriding the default attribute accessor.
To find out which one, run this in the console:
Document.find(14).method(:path).source_location
In any case you can access the attribute directly with
Document.find(14)['path']
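For illustration only (this is hypothetical code, not your actual model), any instance method named path will shadow the generated column reader:

class Document < ApplicationRecord
  # Hypothetical: a leftover method (or one mixed in by a gem) with the same
  # name as the column wins over the ActiveRecord attribute reader.
  def path
    nil
  end
end

Document.find(14).path                    # => nil (the method above is called)
Document.find(14)['path']                 # => "https://storage.googleapis.com/..."
Document.find(14).read_attribute(:path)   # same as ['path']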
This Rails app (using a PostgreSQL db) imports data from a CSV to create new records of a model "Client". The import in models/client.rb looks like this:
def self.import(file, industry)
  CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
    industry.clients.create! row.to_hash
  end
end
This works properly, creating the records and populating each record's attributes per the CSV data, for all attribute types except arrays.
Clients have an array type attribute of "emails" (among other array attributes).
The array attributes were created in a migration like this:
add_column :clients, :emails, :text, array: true, default: []
In the CSV, they are stored in cells like this:
["email1@domain.com", "email2@domain.com", "email3@domain.com"]
Upon uploading, my local server log would show these emails as:
INSERT INTO "clients"... ["emails", "{\"email1@domain.com\",\"email2@domain.com\"}"]
As you can see, it chops off the third element of the array, "email3@domain.com", and this happens to the last element of every array uploaded from the CSV.
My guess is that the PostgreSQL array type is having trouble with the format in which the array is saved in the CSV (the ["element1", "element2", ...] form). I have tried several different formats, but no success yet. Any thoughts on how to do this?
Instead of trying to upload these attributes as an array, I changed the migration to a normal string:
add_column :clients, :emails, :string
Then I upload the CSV data to the Rails app with:
def self.import(file, industry)
  CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
    industry.clients.create! row.to_hash
  end
end
I am now just taking that string and using this to manipulate the data:
JSON.parse(@client.emails)
Because the data uploaded from the CSV cell is already in a format that JSON.parse understands, ["element1", "element2", "element3", ...], this was an effective method.
*NOTE: This does not achieve the exact result desired in my posted question, but it functionally serves the same purpose for what is needed in this Rails app.
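If keeping the PostgreSQL array column had been a requirement, another option (a sketch, untested against the original CSV) is to parse each cell into a Ruby array before creating the record, so Rails serializes it as a proper array rather than a mangled string:

def self.import(file, industry)
  CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
    attrs = row.to_hash
    # The cell already holds a JSON-style array literal, so parse it into a
    # real Ruby array and let ActiveRecord handle the Postgres array column.
    attrs['emails'] = JSON.parse(attrs['emails']) if attrs['emails'].present?
    industry.clients.create!(attrs)
  end
end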
I am still doing something wrong. Could somebody please help me?
I want to create a custom analyzer with ascii filter in Rails + Mongoid.
I have a simple model product which has field name.
class Product
  include Mongoid::Document

  field :name

  settings analysis: {
    analyser: {
      ascii: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['lowercase', 'asciifolding']
      }
    }
  }

  mapping do
    indexes :name, analyzer: 'ascii'
  end
end
Product.create(name:"svíčka")
Product.search(q:"svíčka").count #1
Product.search(q:"svicka").count #0 can't find - expected 1
Product.create(name:"svicka")
Product.search(q:"svíčka").count #0 can't find - expected 1
Product.search(q:"svicka").count #1
And when I check the indexes with elasticsearch-head, I expected the index to store the term without accents, like "svicka", but it looks like this: "Svíčka".
What am I doing wrong?
When I check it with the API it looks OK:
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=asciifolding' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
http://localhost:9200/development_imango_products/_mapping
{"development_imango_products":{"product":{"properties":{"name":{"type":"string","analyzer":"ascii"}}}}}
curl -XGET 'localhost:9200/development_imango_products/_analyze?field=name' -d 'svíčka'
{"tokens":[{"token":"svicka","start_offset":0,"end_offset":6,"type":"word","position":1}]}
You can check how you are actually indexing your document using the analyze api.
You also need to take into account that there's a difference between what you index and what you store. What you store is returned when you query, and it is exactly what you sent to Elasticsearch, while what you index determines which documents you get back while querying.
Using asciifolding is a good choice for your use case; it should return results when querying for either svíčka or svicka. I guess there's just a typo in your settings: analyser should be analyzer. That's probably why the analyzer is not being applied as you'd expect.
UPDATE
Given your comment, you haven't solved the problem yet. Can you check what your mapping looks like (localhost:9200/index_name/_mapping)? The way you're using the analyze API is not that useful, since you're manually providing the text analysis chain, and that doesn't mean the same chain is actually applied to your field. It's better to provide the name of the field, like this:
curl -XGET 'localhost:9200/index_name/_analyze?field=field_name' -d 'svíčka'
That way the analyze api will rely on the actual mapping for that field.
UPDATE 2
After you made sure that the mapping is correctly submitted and everything looks fine, I noticed you're not specifying the field that you want to query. If you don't specify it, you're querying the _all special field, which by default contains all the fields you're indexing and uses the StandardAnalyzer. You should use the following query instead: name:svíčka.
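For example, keeping the same search call style as in the question (a sketch, assuming the q parameter is passed through as a Lucene query string):

Product.search(q: "name:svíčka").count  # targets the analyzed name field
Product.search(q: "name:svicka").count  # should also match, via asciifolding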
Elasticsearch needs settings and mappings in a single API call. I'm not sure if it's mentioned in the Tire docs, but I faced a similar problem using both settings and mappings when setting up Tire. The following should work:
class Product
  include Mongoid::Document
  # include tire stuff

  field :name

  # this method is just created for readability,
  # the settings hash can also be passed directly;
  # it must be defined before it is used below
  def self.tire_settings
    {
      analysis: {
        analyzer: {
          ascii: {
            type: 'custom',
            tokenizer: 'whitespace',
            filter: ['lowercase', 'asciifolding']
          }
        }
      }
    }
  end

  settings(self.tire_settings) do
    mapping do
      indexes :name, analyzer: 'ascii'
    end
  end
end
Your notation for settings/mappings is incorrect; as @rubish suggests, check the documentation in https://github.com/karmi/tire/blob/master/lib/tire/model/indexing.rb (no question the docs should be better).
Always, always, always check the mapping of the index to see if your desired mapping has been applied.
Use the Analyze API, as @javanna suggests, to quickly check how your analysis chain works, without having to store documents, check results, etc.
Please note that it is very important to add two lines to a model to make it searchable through Tire. Your model should look like:
class Model
  include Mongoid::Document
  include Mongoid::Timestamps
  include Tire::Model::Search
  include Tire::Model::Callbacks

  field :field_name, type: String

  index_name "#{Tire::Model::Search.index_prefix}model_name"

  settings :analysis => {
    :analyzer => {
      "project_lowercase_analyzer" => {
        "tokenizer" => "keyword",
        "filter" => ["lowercase"],
        "type" => "custom"
      }
    },
  } do
    mapping do
      indexes :field_name, :boost => 10, :type => 'string', :analyzer => 'standard', :filter => ['standard', 'lowercase', 'keyword']
    end
  end

  def self.search(params = {})
    search_text = params[:text_field_name_from_form] # the text field from your search form
    Model.tire.search(load: true, page: params[:page], per_page: 5) do
      query { string search_text, default_operator: "AND" } if search_text.present?
    end
  end
end
You can change the index_name (it should be unique) and the analyzer.
And your controller would look like:
def method_name
  @results = Model.search(params).results
end
You can use @results in your view. Hope this helps.
In my app there are a number of routes like :user_login/resource/resource_name
example:
:vlad/photo_albums/my-first-album
Users should be able to create albums with names in their native language (in my case Russian). But if a user names his album, for example, "Привет, Мир!" (which is "Hello, World!" in English), I want the resource link to use a string where all the letters of the Russian alphabet are replaced by similar-sounding Latin ones. E.g. the user provides the album title "Привет Мир!" and the corresponding link looks like 'vlad/photo_albums/privet-mir'.
I've written all the necessary methods to transliterate Russian into Latin, but now I'm trying to find the best way to arrange all this.
The first issue is that I need to find an album by its title:
@album = @user.albums.
  where( :title => params[:album_title] ).first
redirect_to user_albums_path(@user) unless @album
I would really want to avoid using anything but latin in my sql statements.
The second issue is that I don't want to run validations on a non-Latin string (or should I?), so I want to latinize and parameterize it before validation, but still save the original string if its latinized version passes the validation:
validates :title, :presence => true, :length => { :within => (2..25) },
:allow_blank => false, :uniqueness => { :scope => :user_id }
What I was thinking about to accomplish this was a hash serialization like {:latin_version => ..., :original_version => ..} or a separate YAML file.
I need your thoughts on how to arrange this properly and what would be the most elegant way. Or am I too pedantic about it? Would it be fine to just store/search for/validate/display non-Latin characters?
It's completely fine to store, validate and search non-latin characters. Most Ruby on Rails companies that provide multilingual and international versions of their applications use UTF-8 in the application and database layers. UTF-8 can be properly parameterized, displayed and validated in Ruby on Rails and all major browsers so you should not see any issues there. The best way to handle this is to set your database encoding and/or table string encoding to UTF-8 and then your Ruby on Rails encoding in application.rb:
config.encoding = "UTF-8"
If you are using MySQL or Postgres you will probably also want to be explicit about the database encoding in your database.yml file:
development:
  adapter: mysql2
  database: development_db
  username: root
  password:
  encoding: utf8
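If you do still prefer Latin-only URLs, one common arrangement (instead of the serialized-hash idea) is to keep the original UTF-8 title and store its latinized form in a separate slug column that is used only for lookups. This is just a sketch: latinize stands in for the transliteration helpers you've already written, and slug is an assumed string column on albums:

class Album < ActiveRecord::Base
  belongs_to :user

  before_validation :set_slug

  validates :title, :presence => true, :length => { :within => (2..25) },
            :uniqueness => { :scope => :user_id }
  validates :slug, :presence => true, :uniqueness => { :scope => :user_id }

  # Use the slug in generated URLs instead of the id
  def to_param
    slug
  end

  private

  # latinize is assumed to be your existing Russian-to-Latin transliteration
  def set_slug
    self.slug = latinize(title).parameterize if title.present?
  end
end

# The lookup then stays Latin-only:
# @album = @user.albums.where( :slug => params[:album_title] ).first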