ThinkingSphinx: dynamic indices on the SQL-backed indices?

I am trying to use ThinkingSphinx (with SQL-backed indices) in my Rails 5 project.
I need some dynamic run-time indices to search over.
I have a Message model:
class Message < ApplicationRecord
belongs_to :sender, class_name: 'User', :inverse_of => :messages
belongs_to :recipient, class_name: 'User', :inverse_of => :messages
end
and its indexer:
ThinkingSphinx::Index.define :message, :with => :active_record, :delta => true do
indexes text
indexes sender.email, :as => :sender_email, :sortable => true
indexes recipient.email, :as => :recipient_email, :sortable => true
indexes [sender.email, recipient.email], :as => :messager_email, :sortable => true
has sender_id, created_at, updated_at
has recipient_id
end
schema.rb:
create_table "messages", force: :cascade do |t|
t.integer "sender_id"
t.integer "recipient_id"
t.text "text"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.boolean "read", default: false
t.boolean "spam", default: false
t.boolean "delta", default: true, null: false
t.index ["recipient_id"], name: "index_messages_on_recipient_id", using: :btree
t.index ["sender_id"], name: "index_messages_on_sender_id", using: :btree
end
The problem is about so-called "dialogs". They don't exist in the database; they are determined at run time. A dialog is the set of messages between two users, where each user may be either the sender or the recipient.
The task is to search through my dialogs and find a dialog (that is, its messages) by a fragment of the correspondent's email. So complicated!
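Because dialogs only exist at run time, it can help to pin down the definition in code. The following is a minimal in-memory sketch in plain Ruby (not Thinking Sphinx or SQL); the Msg struct and the dialogs_for helper are hypothetical names used only for illustration. From one user's point of view, each dialog is keyed by the correspondent's id, so messages in both directions end up in the same group:

```ruby
# Hypothetical stand-in for a Message row; not the real ActiveRecord model.
Msg = Struct.new(:sender_id, :recipient_id, :text)

# Group the messages that involve user_id into dialogs, keyed by the
# correspondent's id, so A->B and B->A messages share one dialog.
def dialogs_for(user_id, messages)
  messages
    .select { |m| m.sender_id == user_id || m.recipient_id == user_id }
    .group_by { |m| m.sender_id == user_id ? m.recipient_id : m.sender_id }
end

messages = [
  Msg.new(1, 2, "hi"),
  Msg.new(2, 1, "hello"),
  Msg.new(1, 3, "ping"),
  Msg.new(2, 3, "not mine")
]

dialogs = dialogs_for(1, messages)
dialogs.keys            # => [2, 3]
dialogs[2].map(&:text)  # => ["hi", "hello"]
```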
Here's my effort:
conditions = {messager_email: search_email}
with_current_user_dialogs =
"*, IF(sender_id = #{current_user.id} OR recipient_id = #{current_user.id}, 1, 0) AS current_user_dialogs"
messages = Message.search search_email, conditions: conditions,
select: with_current_user_dialogs,
with: {'current_user_dialogs' => 1}
This is almost right, but not quite. The query correctly searches only within my dialogs (the messages I sent or received), and only within the sender and recipient email fields, but within both of them at once (which is not what I want).
Say my email is "client1@example.com". Other emails look like "client2@example.com", "client3@example.com", "manager1@example.com".
The trouble is that when I search for "client1", I get all the messages where I was either the sender or the recipient. But I should get nothing back: I need to search only across my correspondents' emails, not mine.
It gets worse when querying "client": I get back the correct correspondents with "client2@example.com" and "client3@example.com", but the results are polluted by my own "client1@example.com".
I need a way to choose, at run time, which subset of the index to search within.
I mean this condition is not enough for me:
indexes [sender.email, recipient.email], :as => :messager_email, :sortable => true
It searches (for "client") across all the sender.email and all the recipient.email values at once.
But I need to choose dynamically, something like: "search only within sender.email values where sender.id != current_user.id" OR "search only within recipient.email values where recipient.id != current_user.id" (because I can be either the sender or the recipient).
That's what I call a "dynamic index".
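The matching rule described above can be sketched in memory to make the expected results concrete: for every message involving the current user, only the other party's email is compared with the search term. This is plain Ruby over hypothetical UserStub and MessageStub structs, not a Thinking Sphinx query, so it shows what the search should return rather than how Sphinx would compute it:

```ruby
# Hypothetical stand-ins for the User and Message models (not ActiveRecord).
UserStub    = Struct.new(:id, :email)
MessageStub = Struct.new(:sender, :recipient, :text)

# Keep only messages that involve current_user, matching term against
# the other party's email only.
def search_correspondents(current_user, messages, term)
  messages.select do |m|
    other =
      if m.sender.id == current_user.id
        m.recipient
      elsif m.recipient.id == current_user.id
        m.sender
      end
    other && other.email.include?(term)
  end
end

me      = UserStub.new(1, "client1@example.com")
client2 = UserStub.new(2, "client2@example.com")
manager = UserStub.new(3, "manager1@example.com")

msgs = [
  MessageStub.new(me, client2, "to client2"),
  MessageStub.new(manager, me, "from manager1"),
  MessageStub.new(client2, manager, "not my dialog")
]

search_correspondents(me, msgs, "client1").map(&:text) # => []
search_correspondents(me, msgs, "client").map(&:text)  # => ["to client2"]
```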
How can I do that? Such a "dynamic index" would depend on the current value of current_user, so it would be different for different users, even over the same set of messages.
Clearly I can't apply any post-search cut-offs (what would I cut off?); I need to somehow limit the search itself.
I tried to search over a scope, but got an error along the lines of "searching is impossible over scopes".
Maybe I should use the real-time indexing instead of the SQL-backed indexing?
Sorry for the complexity of my question.

Would the following work?
other = User.find_by :email => search_email
with_current_user_dialogs = "*, IF((sender_id = #{current_user.id} AND recipient_id = #{other.id}) OR (recipient_id = #{current_user.id} AND sender_id = #{other.id}), 1, 0) AS current_user_dialogs"
Or do you need partial matches on the searched email address?
[EDIT]
Okay, from the discussion in the comments below, it's clear that the field data is critical. While you can construct a search query that uses both fields and attributes, you can't have logic in the query that combines the two. That is, you can't say: "Search field 'x' when attribute 'i' is 1, otherwise search field 'y'."
The only way I can possibly see this working is if you're using fields for both parts of the logic. Perhaps something like the following:
current_user_email = "\"" + current_user.email + "\""
Message.search(
"(@sender_email #{current_user_email} @recipient_email #{search_email}) | (@sender_email #{search_email} @recipient_email #{current_user_email})"
)
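The query above can be wrapped in a small helper so that the current user's address is always quoted as a phrase. This is only a sketch: dialog_query and sphinx_phrase are hypothetical names, and escaping just backslashes and double quotes inside the phrase is an assumption about what keeps the phrase well-formed, not a complete Sphinx escaping routine. The search term is passed through unquoted, as in the original, so it is matched as ordinary keywords.

```ruby
# Quote a value as a Sphinx phrase, backslash-escaping any backslashes and
# double quotes inside it (assumed to be enough to keep the phrase intact).
def sphinx_phrase(value)
  '"' + value.gsub(/[\\"]/) { |char| "\\#{char}" } + '"'
end

# Messages between current_email and anyone whose email field matches term,
# in either direction, using Sphinx's @field operator.
def dialog_query(current_email, term)
  me = sphinx_phrase(current_email)
  "(@sender_email #{me} @recipient_email #{term}) | " \
    "(@sender_email #{term} @recipient_email #{me})"
end

query = dialog_query("client1@example.com", "client2")
# Then, in the app (needs Thinking Sphinx and a running Sphinx daemon):
#   Message.search(query)
```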

Related

Ruby on Rails performance of search engine with indexed column

Hi, I'm trying to check the search performance of my RoR application.
I have four search inputs: title, content, created_on (date) and updated_on (date).
I want to compare search performance with and without an index (in my case, created_on is indexed and updated_on is not).
My Post controller:
def index
search_start_time = Time.now
@posts = Post.search(params[:title], params[:content], params[:created_on], params[:updated_on])
# this line checks the performance of the search
puts Time.now - search_start_time
end
My schema
create_table 'posts', force: :cascade do |t|
t.string 'title', null: false
t.string 'content', null: false
t.date 'created_on', null: false, index: true
t.date 'updated_on', null: false
end
In post.rb, I wrote the search method like this:
def self.search(title, content, started_on, finished_on)
where([
"title LIKE ? AND content LIKE ? AND CAST(created_on AS text) LIKE ? AND CAST(updated_on AS text) LIKE ?",
"%#{title}%", "%#{content}%", "%#{started_on}%", "%#{finished_on}%"
])
end
With this code, I measured the performance, but there was no big difference between searching the indexed and the non-indexed column.
Is there a problem with my code? Or does the index not affect search performance?
There are 10 million records, and the indexed column always comes out at about the same speed as the unindexed one.
I also tried changing my search method's signature like this:
def self.search(title = '', content = '', started_on = '', finished_on = '')
But there was no difference.
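A note on the timing code above: Post.search returns an ActiveRecord::Relation, which is lazy, so the SQL is not run until the records are loaded (for example with to_a or load). Timing only the call therefore measures building the query rather than executing it. Separately, a LIKE pattern that begins with % generally cannot use an ordinary B-tree index, and wrapping the column in CAST(... AS text) also keeps the index on created_on from being used, so similar timings for both columns are not surprising with this query. A minimal timing helper, assuming the block you pass forces the query to load (measure_seconds is a hypothetical name; Ruby's standard Benchmark.realtime does much the same):

```ruby
# Run the block and return [result, elapsed_seconds].
# A monotonic clock is used so system clock adjustments don't skew the reading.
def measure_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# In the controller, load the relation inside the block, e.g.:
#   @posts, seconds = measure_seconds do
#     Post.search(params[:title], params[:content],
#                 params[:created_on], params[:updated_on]).to_a
#   end
```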

Can you have an association based on a JSONB column in Rails?

I have a Rails app (rails v6.0.3, ruby 2.7.1) that is using the Noticed gem to send notifications. I have the following model configuration:
class Vendor < ApplicationRecord
has_noticed_notifications
end
The has_noticed_notifications is, as described in their README, a "Helper for associating and destroying Notification records where(params: {param_name.to_sym => self})"
So when I create a Notification like so...
VendorAddedNotification.with(
vendor: vendor,
data_source: "user",
).deliver(some_user) # => Notification inserted!
I expect to be able to find the Notifications that reference the vendor, using the Noticed method, like so:
vendor = Vendor.find ...
vendor.notifications_as_vendor # => Expected: [ Notification#123 ]
However, the result is always an empty array (Actual => []).
I looked at their source code and it looks like notifications_as_vendor is the following query:
Notification.where(params: { :vendor => self }) # where self = an instance of the Vendor model
However, that doesn't seem to work, and I'm not sure if it's supposed to or not. I tried running a simpler query to see if it worked ...
Notification.where(params: { :data_source => "user" })
But that did not work either. However, when I ran the same query with a different signature, it did:
Notification.where("params->>'data_source' = ?", "user")
So my question is: is this Noticed's mistake, or am I missing something in my configuration? I'm using PostgreSQL; here is the relevant schema:
...
create_table "notifications", force: :cascade do |t|
t.string "recipient_type", null: false
t.bigint "recipient_id", null: false
t.string "type", null: false
t.jsonb "params"
t.datetime "read_at"
t.datetime "created_at", precision: 6, null: false
t.datetime "updated_at", precision: 6, null: false
t.index ["read_at"], name: "index_notifications_on_read_at"
t.index ["recipient_type", "recipient_id"], name: "index_notifications_on_recipient_type_and_recipient_id"
end
...
And here are the related models:
class VendorAddedNotification < Noticed::Base
deliver_by :database
param :vendor
param :data_source
end
class Notification < ApplicationRecord
include Noticed::Model
belongs_to :recipient, polymorphic: true
end
Thank you in advance!

I've found why it's not working; it seems to be an issue with Noticed.
In plain SQL I ran:
# PLAIN SQL
select "params" from "notifications" limit 1
This returns the notification's params (the returned notification's id is 77):
# PLAIN SQL Result
"{""added_by"": {""_aj_globalid"": ""gid://stack-shine/WorkspaceMember/269""}, ""data_source"": ""user"", ""_aj_symbol_keys"": [""workspace_vendor"", ""data_source"", ""added_by""], ""workspace_vendor"": {""_aj_globalid"": ""gid://stack-shine/WorkspaceVendor/296""}}"
Now in Rails when I do
vendor = Notification.find(77).params[:vendor]
vendor.notifications_as_vendor.to_sql
The result is ...
"SELECT \"notifications\".* FROM \"notifications\" WHERE \"notifications\".\"params\" = '{\"vendor\":{\"_aj_globalid\":\"gid://stack-shine/Vendor/296\"},\"_aj_symbol_keys\":[\"vendor\"]}'"
... the extracted params from that query are:
'{\"vendor\":{\"_aj_globalid\":\"gid://stack-shine/Vendor/296\"},\"_aj_symbol_keys\":[\"vendor\"]}'
So in the database the serialized params are A, but Rails is searching for B:
# A: `params` In the database
"{""added_by"": {""_aj_globalid"": ""gid://stack-shine/WorkspaceMember/269""}, ""data_source"": ""user"", ""_aj_symbol_keys"": [""vendor"", ""data_source"", ""added_by""], ""vendor"": {""_aj_globalid"": ""gid://stack-shine/Vendor/296""}}"
# B: `params` searched for by Rails
"{\"vendor\":{\"_aj_globalid\":\"gid://stack-shine/Vendor/296\"},\"_aj_symbol_keys\":[\"vendor\"]}"
Clearly this query cannot work, because the params in the database are not the params Rails is searching for.
In the database, the notification has extra parameters on top of "vendor" ("data_source" and "added_by") that are not part of the Vendor's lookup. Is this why it returns nothing?
For now, I'll simply look up the notifications myself by storing the vendor_id in params and doing something like Notification.where("params->>'vendor_id' = ?", "123") (->> returns text, so the value is compared as a string).
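The diagnosis above can be reproduced in plain Ruby. As the generated SQL shows, where(params: {...}) compares the whole stored JSON object for equality, so a stored object with extra keys (data_source, added_by) never equals the smaller object Rails searches for, whereas a containment check does match. In PostgreSQL, the jsonb containment operator @> performs that second kind of check. The hashes below are simplified versions of A and B, and the contains? helper is hypothetical and only compares top-level keys:

```ruby
require "json"

# Simplified versions of A (stored in the database) and B (searched for by Rails).
stored = JSON.parse(
  '{"vendor": {"_aj_globalid": "gid://stack-shine/Vendor/296"}, ' \
  '"data_source": "user", ' \
  '"added_by": {"_aj_globalid": "gid://stack-shine/WorkspaceMember/269"}}'
)
searched = JSON.parse('{"vendor": {"_aj_globalid": "gid://stack-shine/Vendor/296"}}')

# Top-level containment: every key/value in subset also appears in superset.
# (jsonb @> additionally recurses into nested objects and arrays.)
def contains?(superset, subset)
  subset.all? { |key, value| superset[key] == value }
end

stored == searched          # => false, so the equality query finds nothing
contains?(stored, searched) # => true
```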

Storing List item details in a database?

So, this may be more of a "Software Engineering" question, but I'm looking for a good way to store details for a Widget in Active Record.
Pretend Widget A has a show page, and on that show page we have an accordion-style "FAQ" section or something to that effect. Within the accordion is a list, with bullet points highlighting different things about how Widget A works, or how to use Widget A.
Since obviously we wouldn't want to make a separate page for each widget, these items would need to be stored somewhere. But we also wouldn't want to make 10, 20 or 30 separate columns in the database for them. So what's the solution for this?
My first thought is some sort of hash or array, but does Rails allow this? Especially if each item is a long string. Is there a better way?
Or is the proper way to do this to just make it a model (like "faq_item") with a reference ID for the Widget it belongs to? (That way the "faq_item" model/schema would only need a few fields, and each item could just be assigned the ID of the Widget it belongs to.)

If each widget has only a few "FAQ items" (or "details", as I'll refer to them) and each detail is nothing more than a text string, you could store a widget's details in a serialized array as such:
# models/widget.rb
class Widget < ApplicationRecord
# serialize the `details` attribute as JSON into
# the `details` column on the widgets table
serialize :details, JSON
end
# db/schema.rb
# ...
create_table "widgets", force: :cascade do |t|
t.string "name"
t.text "details"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
# rails console
wid = Widget.create!(
:name =>
'Wideband, Voltage-Feedback Operational Amplifier With Disable',
:details => [
'Flexible supply range: 5-V to 12-V Single Supply, +/- 2.5-V to 5-V Dual Supply',
'Unity-Gain Stable: 500 MHz (G = 1)',
'High Output Current: 190 mA',
'High Slew Rate: 1800 V/us',
'Wideband 5-V Operation: 220 MHz (G = 2)'
])
# => #<Widget ...>
wid.details.first
# => "Flexible supply range: 5-V to 12-V Single Supply, +/- 2.5-V to 5-V Dual Supply"
You can look at the Rails 5 serialization API for more information on serialize.
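With serialize :details, JSON, the array is stored as a JSON string in the text column and parsed back when the record is loaded. The plain-Ruby round trip below approximates what happens to the details value; it uses the json standard library directly, and Rails wraps this in its own JSON coder, so edge cases (such as nil handling) may differ:

```ruby
require "json"

details = [
  "Unity-Gain Stable: 500 MHz (G = 1)",
  "High Output Current: 190 mA"
]

# Roughly what ends up in the `details` text column.
stored = JSON.dump(details)

# Roughly what the model hands back when the record is read.
loaded = JSON.load(stored)
loaded.first # => "Unity-Gain Stable: 500 MHz (G = 1)"
```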
If, however, you need to store more information for each detail (for instance, created_at/updated_at fields) or each widget has more than a few details, then it may be prudent to create a new table for widget details as you suggested:
# models/widget.rb
class Widget < ApplicationRecord
has_many :details, :dependent => :destroy
end
# models/widget/detail.rb
class Widget::Detail < ApplicationRecord
belongs_to :widget
end
# db/schema.rb
# ...
create_table "widget_details", force: :cascade do |t|
t.integer "widget_id"
t.text "content"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
wid = Widget.create!(
:name =>
'CMOS, 125 MHz Complete DDS Synthesizer',
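:details => [
# build the details with .new: create! would fail validation on its own,
# because belongs_to :widget is required by default in new Rails 5 apps
Widget::Detail.new(:content => '125 MHz Clock Rate'),
Widget::Detail.new(:content => 'On-Chip High Performance DAC'),
Widget::Detail.new(:content => '32-Bit Frequency Tuning Word')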
])
# => #<Widget ...>
wid.details.first
# => #<Widget::Detail ... content: "125 MHz Clock Rate" ...>

If you are using Postgres you could use a JSONB type field in your database. With a JSONB data type you will be able to have unstructured data while being able to query the field with Postgres and ActiveRecord without the need for a new table.
Like this:
rails g migration add_fields_to_widgets details:jsonb
rails db:migrate
Test your widget creation inside the rails console.
Widget.create(name: "Widget Foo", details: { "how to use": "Instructions on how to use", "height": "12cm", "width": "100cm" })
If you'd want to find all the widgets with 12cm height, you would just have to make a query like this:
Widget.where("details->>'height' = ?", "12cm")
which would return your original Widget Foo object, and then you would be able to manipulate it with pure JavaScript on your front-end.

ThinkingSphinx: OR-condition on the SQL-backed indices?

I am trying to use ThinkingSphinx in my Rails 5 project. I read the instructions at http://freelancing-gods.com/thinking-sphinx/
I need to implement the OR logic on SQL-backed indices.
Here is my class:
class Message < ApplicationRecord
belongs_to :sender, class_name: 'User', :inverse_of => :messages
belongs_to :recipient, class_name: 'User', :inverse_of => :messages
end
and its indexer:
ThinkingSphinx::Index.define :message, :with => :active_record, :delta => true do
indexes text
indexes sender.email, :as => :sender_email, :sortable => true
indexes recipient.email, :as => :recipient_email, :sortable => true
has sender_id, created_at, updated_at
has recipient_id
end
schema.rb:
create_table "messages", force: :cascade do |t|
t.integer "sender_id"
t.integer "recipient_id"
t.text "text"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.boolean "read", default: false
t.boolean "spam", default: false
t.boolean "delta", default: true, null: false
t.index ["recipient_id"], name: "index_messages_on_recipient_id", using: :btree
t.index ["sender_id"], name: "index_messages_on_sender_id", using: :btree
end
So I need to search within only two indexed fields at once, :sender_email and :recipient_email, while ignoring the text field (indexes text).
In pseudocode I need something like this:
Message.search 'manager1@example.com', :conditions => {:sender_email => 'client1@example.com' OR :recipient_email => 'client1@example.com'}
Which means: find all the messages between 'manager1@example.com' and 'client1@example.com' (each of them could be either the sender or the recipient), ignoring messages whose text contains the words 'manager1@example.com' or 'client1@example.com'.
Unfortunately, the docs say:
The :conditions option must be a hash, with each key a field and each value a string.
In other words, I need a condition chosen at run time, but one that spans two indexed fields at once (not one, as documented).
I mean that it is a bad idea to allow only hashes as a condition - and no strings (like ActiveRecord queries do allow http://guides.rubyonrails.org/active_record_querying.html#pure-string-conditions ).
PS: I would say that the ThinkingSphinx documentation at http://freelancing-gods.com/thinking-sphinx/ is pretty bad and needs to be rewritten from scratch. I read it all and did not understand anything. It has no complete examples (only partial ones, which leaves everything unclear). I don't even understand what fields and attributes are and how they differ. Associations, conditions, etc. are all unclear. Very bad. The gem itself looks pretty good, but its documentation is awful.

I'm sorry to hear you've not been able to find a solution that works for you in the documentation. The challenging thing with Sphinx is that it uses the SphinxQL syntax, which is very similar to SQL, but also quite different at times - and so people often expect SQL-like behaviour.
It's also part of the challenge of maintaining this gem - I'm not sure it's wise to mimic the ActiveRecord syntax too closely, otherwise that could make things more confusing.
The key thing to note here is that you can make use of Sphinx's extended query syntax for matches to get the behaviour you're after:
Message.search :conditions => {
:sender_email => "(client1@example.com | manager1@example.com)",
:recipient_email => "(client1@example.com | manager1@example.com)"
}
This will return anything where the sender is either of the two values, and the receiver is either of the two values. Of course, this will include any messages sent from client1 to client1, or manager1 to manager1, but I'd expect that's rare and maybe not that big a problem.
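The caveat about client1 to client1 follows from the shape of the condition: it checks sender and recipient independently, each against the set of two addresses. A plain-Ruby sketch of both predicates (the method names are hypothetical, and this models the logic only, not Sphinx's matching) shows where they differ:

```ruby
PAIR = ["client1@example.com", "manager1@example.com"].freeze

# The answer's condition: sender in the pair AND recipient in the pair.
def either_of_pair?(sender, recipient)
  PAIR.include?(sender) && PAIR.include?(recipient)
end

# A stricter check: exactly the two parties, in either direction.
def between_pair?(sender, recipient)
  [sender, recipient].sort == PAIR.sort
end

either_of_pair?("manager1@example.com", "client1@example.com") # => true
between_pair?("manager1@example.com", "client1@example.com")   # => true
either_of_pair?("client1@example.com", "client1@example.com")  # => true (the caveat)
between_pair?("client1@example.com", "client1@example.com")    # => false
```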
One caveat to note is that @ and . aren't usually treated as searchable word characters, so you may need to add them to your charset_table.
Also, given you're actually performing exact matches on the entire values of database columns, this does feel like a query that's actually better served by some database indices on the columns and using SQL instead. Sphinx (and I'd say most/all other full-text search libraries) is best suited to matching words and phrases within larger text fields.
As for the documentation… I've put a lot of effort into trying to make them useful, though I realise there's still a lot of improvement that could take place. I do have a page that outlines how fields and attributes differ - if that's not clear, feedback is definitely welcome.
Keeping documentation up-to-date requires a lot of effort even in small and new projects - and Thinking Sphinx is neither of these, being 10 years old in a few months. I'm proud at how it still works well, it still supports the latest versions of Rails, and it's still actively maintained and supported. But it's open source - it's done in my (and others') spare time. It's not perfect. If you find ways to improve things, then please do contribute! The code and docs are on GitHub, and pull requests are very much welcome.

globalize2 problem

I have a strange globalize2 problem. I'm trying to use globalize2 together with acts_as_textiled and acts_as_commentable. For example, let's say we have a Post model that acts_as_commentable. From the console:
p = Post.find 1
c = p.comments.find 1
works fine, but in the browser nothing is displayed.
Similarly, when Post contains
acts_as_textiled :body
from the console, body contains the correct data, but in the browser I see nothing :(
Any ideas how to correct it?
Update: "nothing displayed" means that, for code like
class Post < ActiveRecord::Base
translates :title, :body
acts_as_textiled :body
end
accessing Post.body returns nil, but with globalize2 or acts_as_textiled disabled, body returns its value. I've tried different locales, with the same result.

Have you performed the necessary migrations? For localised content you should remove the localised fields in the main table (posts) and create a table for the localisations, like this:
create_table "post_translations", :force => true do |t|
t.string "locale"
t.integer "post_id"
t.string "title"
t.text "body"
end
Just guessing here :)
