Indexing a virtual attribute with Thinking Sphinx (ruby-on-rails)

I'm wondering how one might index a virtual attribute on a model with Thinking Sphinx. Suppose a Project model has an instance method that returns a boolean derived from information on another model, say User; the value is computed, and is not a column on the projects table in the database.
For example, suppose we have a method is_user_eligible, such that we can call Project.first.is_user_eligible and get a true or false response. This already works in the ORM.
How can I index this virtual attribute with Thinking Sphinx? I'm able to index virtual attributes in my Django project, which uses Haystack backed by Elasticsearch; I facilitated this by putting a @property decorator on the model method. I figured I should be able to do this with Rails/Thinking Sphinx too, yet I get all sorts of bizarre SQL errors when trying to index. I've tried various constructions in setting up my index (e.g. has vs. indexes), and they all result in some sort of SQL error while indexing.
Is this possible with Thinking Sphinx? If so, how can I index a virtual attribute?

You've made it clear that the value is not available as a column on the projects table, but is it on an associated model instead? If so, then you could refer to it via the association:
has user.is_eligible, :as => :is_user_eligible
However, if it's not a column, but can be determined within the context of the SQL query, then you can use a SQL snippet as the attribute definition (I know my example is rather contrived, but it should give you some idea):
has "(users.foo = 'bar' || users.baz = 'qux')",
:as => :is_user_eligible,
:type => :boolean
If you're referring to associations that aren't used elsewhere in the index definition, you can force the references, or provide a SQL join statement:
join users
# or through more than one association:
join users.addresses
# or via your own custom join:
join "INNER JOIN users ON users.project_id = projects.id"
But if you cannot determine this value via SQL at all, then the only way to do this with Thinking Sphinx is to use real-time indices instead of SQL-backed indices. What this means is that instead of referring to associations and columns in your index definitions, you refer to methods instead. So, your attribute would become:
has is_user_eligible, :type => :boolean
The type must be specified - SQL indices can guess attribute types due to column types, but real-time indices don't have that reference point.
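For comparison, a minimal real-time index definition might look like this sketch (again assuming Project has a name column; both indexes and has now refer to instance methods):

ThinkingSphinx::Index.define :project, :with => :real_time do
  indexes name

  has is_user_eligible, :type => :boolean
end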
I realise the link to the real-time indices feature is a blog post I wrote over two years ago. However, the feature certainly works - I and others have been using it in production for quite some time (including with Flying Sphinx).
On the topic of has vs indexes: if you want to use the value as a filter or for sorting, then it must be an attribute, and thus you should use the has method. However, if it's textual data that you expect search queries to match on, then it should be a field, and thus use the indexes method.
Certainly I'd recommend switching to real-time indices anyway: it removes the need for deltas and you get up-to-date Sphinx records without needing to run 'ts:index' regularly (or at all - use ts:generate should your data end up in an out-of-date state). But make sure you switch all index definitions to real-time, instead of having some real-time and others SQL-backed.

Related

Rails scope complexity

I have a model for which I need to create a default scope. I am unsure of the best way to write this scope, but I will explain how it needs to work.
Basically I need to get all items of the model, and if two items have the same "order" value, it should look at the "version" field (which contains 1, 2, 3, etc.) and pick the one with the highest value.
Is there a way of achieving this with just a scope?
Try this code:
scope :group_by_order, -> { order(:order).group_by(&:order) }
default_scope {
  group_by_order.map { |_order, records| records.max_by(&:version) }
}
Explanation:
order by the "order" field.
group the records by their "order" value (giving a hash of order value => records).
for each group, keep only the record with the highest "version", and return the surviving records.
A word of caution about using default scopes with order: when you perform updates on the collection, such as update_all, Rails will use the default scope to fetch the records, and what you'd expect to be a quick operation can bring your database to its knees as it copies the rows to a temporary table before updating.
I would recommend just using a normal scope instead of a default scope.
Have a look at "Select the 3 most recent records where the values of one column are distinct" for how to construct the SQL query you want, then put that into a find_by_sql statement as mentioned in "How to chain or combine scopes with subqueries or find_by_sql".
The ActiveRecord order method simply uses SQL's ORDER BY clause, which can take several arguments. Say you have a model with the attributes order and version; then the correct way to order the records as you describe is order(:order, :version). If you want this as the default scope, you end up with:
default_scope { order(:order, :version) }
First, default_scopes are dangerous. They are applied whenever you use the model, unless you specifically force unscoped. In my experience, it is rare to need a scope applied to every usage of a model. Not impossible, but rare. And rarer yet when the scope involves such a big computation.
Instead of making a complex query, can you simplify the problem? Here's one approach:
To make the version field work, you presumably already have code that compares the order fields (otherwise you would not end up with rows whose "order" values match but whose "version" values differ). So you can store an additional field whose value is higher than that of the previous most-important row: whenever you create a new version, take the current most-important row's sort order and increment it by one. That becomes the new row's sort order.
Now you can query for qualifying data with the highest sort order: order(sort_order: :desc).first.
Rather than focusing on the query, focus on whether you are storing the right data to make the query you want easy. In this case, it appears you're already running an operation that identifies the winning row, so use that code and the existing database write to save yourself future database operations.
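As a rough sketch of that idea (sort_order is a hypothetical integer column, and Item a stand-in for the real model):

class Item < ActiveRecord::Base
  before_create :bump_sort_order

  # the winning row is simply the one with the highest sort_order
  def self.current_winner
    order(sort_order: :desc).first
  end

  private

  def bump_sort_order
    # take the previous most-important row's sort order and increment it
    self.sort_order = (self.class.maximum(:sort_order) || 0) + 1
  end
end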
In SQL you can easily order by two columns: it orders by the first, and then by the second when the first is equal. In your case that would be something like
select * from posts order by order_field_1, version desc
You cannot name a column order, since it is a SQL reserved word, and since you did not give the real column name, I just named it order_field_1.
This is easily translated to rails:
Post.order(:order_field_1, version: :desc)
I would generally advise against using default_scope, since once set it is really hard to avoid (it is always prepended), but if you really need it and know the risks, it is really easy to apply as well:
class Post < ActiveRecord::Base
  default_scope { order(:order_field_1, version: :desc) }
end
This is all actually documented very well in the Rails Guides.

What's the best way to create a model associated with a query instead of a table

I'm trying to create a reporting app with Rails 4.
As a reporting system, it has a lot of SQL queries whose results don't match any table schema. I mean select queries with joins, unions, etc., where a result row's columns come from subqueries, sums, and so on.
Would it be possible to have a model with no table associated, but still use find_by_sql on it to instantiate an array of that model from the results of my query?
Something like:
Use "select table1.field1, sum(if(...,table2.field,...) as field2, as field3 from...." as query, and return a array of a model "Result", where I can call a
array_of_result.first.field3?
Sorry if I'm not writing clearly enough.
EDIT: until now, sparky's answer (http://railscasts.com/episodes/193-tableless-model) was the closest one, because I want to use some of the ActiveRecord features, like specifying a connection in the class (or even in a superclass).
For pure reporting, especially when the result column names span multiple models, one alternative is to just pass the query directly back and deal with the result set:
ActiveRecord::Base.connection.execute([raw SQL query])
You'll get back a result set, which is typically an enumerable set of row results, but check the documentation for your DB adapter to find out for sure what it's returning.
For example, if you're using PostgreSQL as your database with the pg gem, you'll get back an instance of PG::Result which you can then operate on in the following way:
> results = ActiveRecord::Base.connection.execute("SELECT COUNT(*) FROM customers")
=> #<PG::Result>
> results.count
=> 63 # the number of customers I have in this contrived example
> results.first
=> {"count"=>"63"}
> results[0]
=> {"count"=>"63"}
> results[0]["count"]
=> "63"
You'll need to cast your return values to something other than strings. ActiveRecord will typically do this for you in your models since it knows the column types, but by doing a raw query you'll probably just get back strings that you'll have to cast yourself. If you're just doing a query to display it on a page somewhere maybe the strings will be sufficient.
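For instance, continuing the contrived count example above, the manual cast can be as simple as:

total = results[0]["count"].to_i
# => 63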
I'm sure you'll be doing more sophisticated reports, but you'll notice in my simple example that the key count wound up being created as the accessor to the result of the SELECT COUNT... query. If you specify column names, or alias them, the keys in the resulting hash set will match the column names or the aliases you've set.
You can certainly create a Reporting model.
You would want to start off by creating a tableless model. Essentially, this can be as simple as a file in your models directory with
class Reporting
end
in it, and a controller with some appropriate actions and views. However, have a look at
http://railscasts.com/episodes/193-tableless-model
http://railscasts.com/episodes/219-active-model
which cover tableless models and what you can do with active model with respect to validations etc.
In your case, you say that you have some complex joins etc. Sometimes it's easier in the short term to drop down to raw SQL, but if you can use ActiveRecord you should. Apart from anything else, this will allow you to define custom methods in your model which you can chain, making your Reporting controller much cleaner.
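As a rough sketch of the plain-Ruby route described above (SalesReport, customers and orders are illustrative names, not anything from the question):

class SalesReport
  attr_reader :customer_name, :order_count

  def initialize(row)
    @customer_name = row['customer_name']
    @order_count   = row['order_count'].to_i # raw rows come back as strings
  end

  # returns an array of SalesReport instances, one per result row
  def self.all
    rows = ActiveRecord::Base.connection.select_all(<<-SQL)
      SELECT customers.name AS customer_name, COUNT(orders.id) AS order_count
      FROM customers
      LEFT JOIN orders ON orders.customer_id = customers.id
      GROUP BY customers.name
    SQL
    rows.map { |row| new(row) }
  end
end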

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
  belongs_to :foo
  has_and_belongs_to_many :bars
  ...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: ['barvalue1', 'barvalue2'] }, which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other associations; instead, just consider where your IDs need to sit in order to query the data appropriately and efficiently. And remember: when weighing efficiency, your own productivity should count above all.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt for a has_many :through relationship if you need to do anything fancy.
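For reference, a minimal sketch of that one-sided setup (sticking with the duck/pond names; the join table must be named ducks_queries for the HABTM naming convention to kick in):

class CreateDucksQueries < ActiveRecord::Migration
  def change
    create_table :ducks_queries, id: false do |t|
      t.references :duck
      t.references :query
    end
  end
end

class Query < ActiveRecord::Base
  has_and_belongs_to_many :ducks # Duck needs no matching association
end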
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
class Query < ActiveRecord::Base
  serialize :duck_ids, Array

  # fetch the Duck records for the stored ids
  def ducks
    Duck.where(id: duck_ids)
  end
end
If you have native array support in your database, you can do something similar from within the DB itself.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; you get all the Rails goodies (e.g. query.ducks, query.ducks=, query.ducks.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

How do you normally sort items in Rails?

I have a little example Rails app called tickets, which views and edits fictional tickets sold to various customers. In tickets_controller.rb, inside def index, I have this standard line, generated by scaffolding:
@tickets = Ticket.find(:all)
To sort the tickets by name, I have found two possible approaches. You can do it this way:
@tickets = Ticket.find(:all, :order => 'name')
... or this way:
@tickets = Ticket.find(:all).sort!{ |t1, t2| t1.name <=> t2.name }
(Tip: the Ruby documentation explains that sort! modifies the array it is sorting, as opposed to plain sort, which returns a sorted copy and leaves the original unchanged.)
What strategy do you normally use? When might you use .sort! versus the :order => 'criteria' syntax?
Use :order => 'criteria' for anything simple that can be done by the database (i.e. basic alphabetical or chronological order). Chances are it's a lot faster than letting your Ruby code do it, assuming you have the right indexes in place.
The only time I can think you should use the sort method is if you have a complex attribute that's calculated at run-time and not stored in the database, like a 'trustworthiness' value based on the number of good/bad responses or something. In that case it's better to use the sort method, but be aware that this will break pagination (each page will have its results in order, but the set of pages as a whole will be out of order).
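For example, a computed-attribute sort might look like this (trustworthiness is assumed to be a numeric instance method, not a column):

@tickets = Ticket.find(:all).sort_by { |t| -t.trustworthiness } # highest first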
I specify an order in the ActiveRecord finder or in the model association because sorting using SQL is faster. You should take advantage of the features offered by the RDBMS when you're able to do so.

Add fields to ActiveRecord model dynamically in Rails 2.2.2?

Say I wanted to allow an administrative user to add a field to an ActiveRecord Model via an interface in the Rails app. I believe the normal ActiveRecord::Migration code would be adequate for modifying the AR Model's table structure (something that would not be wise for many applications - I know). Of course, only certain types of fields could be added...in theory.
Obviously, the forms that add (or edit) records in this newly modified ActiveRecord model would need to be built dynamically at run-time. A common form_for approach won't do. This discussion suggests it can only be accomplished with JavaScript.
http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/fc0b55fd4b2438a5
I've used Ruby in the past to query an object for its available methods. I seem to remember it was insanely slow. I'm too green with Ruby and Rails to know an elegant way to approach this. I hope someone here may. I'm also open to entirely different approaches to this problem that don't involve modifying the database.
To access the columns which are currently defined for a model, use the columns method - it will give you, for each column, its name, type and other information (such as whether it is a primary key, etc.)
However, modifying the schema at runtime is delicate.
The schema is pre-loaded (and cached, from the DB driver) by each model class when it is first loaded. In production mode, Rails only does this once per model, around startup.
In order to force Rails to refresh its cached schema following your modification, you should force Ruby to reload the affected model's class (pretty much what Rails does for you automatically, after each request, when running in development mode - see how to reload a class using remove_const followed by load.)
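A rough sketch of that reload dance, assuming a model named Entry:

# force Ruby to forget the class, then load it again
Object.send(:remove_const, :Entry)
load 'app/models/entry.rb'

# make ActiveRecord re-read the table's columns
Entry.reset_column_information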
If you have a Mongrel cluster, you also have to inform the other processes in the cluster, which run in their own separate memory space, to also reload their model's classes (some clusters will allow you to create a 'restart.txt' file, which will cause an automatic soft-restart of all processes in your cluster with no additional work required on your behalf.)
Now, these having been said, depending on the actual problem that you need to solve you may not need to dynamically alter the schema after all. Instead of adding, say, columns col1, col2 and col3 to some table entries (model Entry), you can use a table called dyn_attribs, where Entry has_many :dyn_attribs, and where dyn_attribs has both a key column (which in this case can have values col1, col2 or col3) and a value column (which lists the corresponding values for col1, col2 etc.)
Thus, instead of:
my_entry = Entry.find(123)
col1 = my_entry.col1
# do something with col1
you would use:
my_entry = Entry.find(123, :include => :dyn_attribs)
dyn_attribs = my_entry.dyn_attribs.inject(HashWithIndifferentAccess.new) { |s, a|
  s[a.key] = a.value; s
}
col1 = dyn_attribs[:col1]
# do something with col1
The above inject call can be factored away into the model, or even into a base class inherited from by all models that may require additional, dynamic columns/attributes (see Polymorphic associations on how to make several models share the same dyn_attribs table for dynamic attributes.)
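For instance, the lookup could be factored into the model along these lines (same Rails 2-era syntax as above; :attributable is a hypothetical polymorphic interface name):

class Entry < ActiveRecord::Base
  has_many :dyn_attribs, :as => :attributable

  # returns { 'col1' => value, ... } for this entry's dynamic attributes
  def dyn_attrib_hash
    dyn_attribs.inject(HashWithIndifferentAccess.new) { |s, a| s[a.key] = a.value; s }
  end
end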
UPDATE
Adding or renaming a column via a regular HTML form.
Assume that you have a DynAttrTable model representing a table with dynamic attributes, as well as a DynAttrDef defining the dynamic attribute names for a given table.
Run:
script/generate scaffold_resource DynAttrTable name:string
script/generate scaffold_resource DynAttrDef name:string
rake db:migrate
Then edit the generated models:
class DynAttrTable < ActiveRecord::Base
  has_many :dyn_attr_defs
end

class DynAttrDef < ActiveRecord::Base
  belongs_to :dyn_attr_table
end
You may continue to edit the controllers and the views like in this tutorial, replacing Recipe with DynAttrTable, and Ingredient with DynAttrDef.
Alternatively, use one of the plugins reviewed here to automatically put the dyn_attr_tables and dyn_attr_defs tables under management by an automated interface (with all its bells and whistles), with virtually zero implementation effort on your behalf.
This should get you going.
Say I wanted to allow an administrative user to add a field to an ActiveRecord Model via an interface in the Rails app.
I've solved this sort of problem before by having an extra model called AdminAdditions. The table includes an id, an admin user id, a model name string, a type string, and a default value string.
I override the model's find and save methods to add attributes from its admin_additions and save them appropriately when changed. The model's table has a large text field, initially empty, where I save non-default values of the added attributes.
Essentially the views and controllers can pretend that every attribute of the model has its own column. This means form_for and so on all work.
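A loose sketch of the idea (all names are hypothetical; Widget stands in for the real model):

class Widget < ActiveRecord::Base
  serialize :added_values, Hash # the large, initially empty text column

  # read an admin-added attribute, falling back to its declared default
  def read_added(attr_name)
    addition = AdminAddition.find_by_model_name_and_name(self.class.name, attr_name)
    return nil unless addition
    (added_values || {})[attr_name] || addition.default_value
  end
end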
ActiveRecord::Migration.add_column(:users, :email, :string)
You could use Flex Attributes for this, though if you want to be able to search or order by these new columns you'll have to write (a lot of) custom SQL.
I have seen the dynamic alteration/migration of tables offered as a solution many times but I have never actually seen it implemented. There are many reasons why this solution is rarely implemented.
If the table is large, it may/will be locked for extended periods of what is supposed to be up-time.
Why is your model changing dynamically? It is quite rare for a model's structure to need to change dynamically. It is more often an indication that you are trying to model something specific in a generalised way.
This is often an attempt at producing a "categorised" model that could be better solved by another approach.
DDL statements are often not allowed for the same database user that handles day-to-day DML requirements. Whilst using a single user for both could be the case, and often is in the RoR arena, it is not always the "right" way to do it.
What are you trying to achieve here? A better understanding of the problem would probably reveal a more natural solution.
If you were doing this now with PostgreSQL, you could probably get away with a JSON-type field and just store whatever you need in the JSON hash.
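A quick sketch of that approach (Rails 4+ on PostgreSQL; Product and the extra column are illustrative):

# in a migration: add_column :products, :extra, :json, default: {}
product = Product.first
product.extra = (product.extra || {}).merge('warranty_months' => 24)
product.save!
product.extra['warranty_months'] # => 24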