Text classification into predefined categories with omnicat-bayes - ruby-on-rails

I'm using omnicat-bayes to analyse documents (text classification). With this gem I can create categories and "feed" them with documents. The categories currently have enough documents to be "good enough" at recognizing which category a new document belongs in.
Now in my Documents controller, the create action goes through a few steps:
1. Creating a new Bayes instance
2. Creating the categories that will be used
3. Taking the pre-documents to train the categories
4. Actually training the categories
(all of those steps are under the run_all function)
The create action:
def create
  @document = Document.new(document_params)
  @document.case_id = @case.id
  if @document.save
    run_all
    # Running the classify function on reden aanmelding
    classify_one = @bayes.classify(@document.reden_aanmelding)
    document_category = classify_one.to_hash[:top_score_key]
    # Updating the document category by the top key returned by Bayes
    @document.update_attribute(:category, document_category)
    finding_required_records
    # Training Cees Buddy with the document that got saved
    @bayes.train(document_category, @document.reden_aanmelding)
    redirect_to case_path(@case)
  else
    render :new
  end
end
Inside the `if @document.save` branch, the run_all function performs the four steps named above (I know this isn't really best practice).
Once the create action has finished, the Bayes instance is gone and the AI is "stupid" again, so to speak.
My question is: where is the proper place to create the new instance and the categories and to feed them with documents from my database, and how can I accomplish this? Would a singleton be interesting here?

This is quite a tricky problem, given that you'll probably want to scale the application to deal with more than a handful of documents.
The thing is that a production-mode Rails web server will usually fork into multiple processes or even run on more than one machine, which means that documents trained in one process will be unknown to all the others, even if you use a singleton pattern.
So with only the omnicat-bayes gem, the best way to go about it is to create some kind of separate micro service that runs in its own process and does nothing more than process documents. The main application should then enqueue the processing into asynchronous jobs so it is okay if things take a bit longer in case the training process is busy with other documents.
How you communicate with this external OmniCat instance is up to you. The most comfortable way might be dRuby but I should add that I have no production-mode experience with it. A more future-proof solution would be to use some simple HTTP + JSON. In that case you could even switch out the service that does training and categorisation with some more powerful library that's not based on Ruby in the future.
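As a rough sketch of the HTTP + JSON idea, the service can be reduced to one long-lived object that owns the trained state and answers JSON requests. Everything here is an assumption made for illustration: WordCountClassifier is a toy stand-in for the real omnicat-bayes instance, and the ClassifierService class and the "action", "category" and "text" keys are hypothetical names, not part of any gem. The HTTP layer is left out; an endpoint would simply pass each request body to handle.

```ruby
require 'json'

# Hypothetical stand-in for the real classifier (e.g. an omnicat-bayes instance).
# It scores a category by how often it has seen each word of the text.
class WordCountClassifier
  def initialize
    @counts = Hash.new { |hash, category| hash[category] = Hash.new(0) }
  end

  def train(category, text)
    tokenize(text).each { |word| @counts[category][word] += 1 }
  end

  def classify(text)
    words = tokenize(text)
    best = @counts.max_by { |_category, seen| words.sum { |word| seen[word] } }
    best && best.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end
end

# The one long-lived object that owns the trained state. An HTTP endpoint
# would pass each request body to #handle and send the returned string back.
class ClassifierService
  def initialize(classifier)
    @classifier = classifier
    @lock = Mutex.new # serialise access so concurrent requests cannot interleave
  end

  def handle(request_json)
    request = JSON.parse(request_json)
    @lock.synchronize do
      case request['action']
      when 'train'
        @classifier.train(request['category'], request['text'])
        JSON.generate('status' => 'trained')
      when 'classify'
        JSON.generate('category' => @classifier.classify(request['text']))
      else
        JSON.generate('error' => 'unknown action')
      end
    end
  end
end

service = ClassifierService.new(WordCountClassifier.new)
service.handle(JSON.generate('action' => 'train', 'category' => 'billing', 'text' => 'invoice payment overdue'))
service.handle(JSON.generate('action' => 'train', 'category' => 'support', 'text' => 'login password reset'))
reply = JSON.parse(service.handle(JSON.generate('action' => 'classify', 'text' => 'Where is my invoice?')))
```

Because the Rails processes only ever see JSON, the object behind handle could later be swapped for a non-Ruby service without touching the main application.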

Related

Rails: too many methods in model

TL;DR: I don't know how to organise my domain logic classes.
I have the model "Application", this model is in the "core" of the App and is the way I "enter" and operate over other models like:
@application = Application.find(params[:application_id])
@application.payment.update_attribute 'active', true
or
unless @application.report.status
or
@application.set_income(params[:income][:new_income])
so the models Payment, Income and Report are basically empty because I initialise the Application model and from there I do things "on cascade" to change the "subordinated" models. But now the Application model has more than forty methods and 600 lines.
Am I doing it right? For instance, when I want to add a new Payment I like to do:
payment = Payment.create params
inside the Application model because ActiveRecord "knows" how to handle the foreign keys automatically. I could create the payment inside the Payment model using:
application = Application.find(application_id)
params[:application_id] = application.id
self.create params
but this way, I need to set the Application.id manually and that looks more verbose and not elegant.
So, if I want to reduce my Application model, should I create modules in the APP/lib directory or should I move methods to the other models?
"should I create modules in APP/lib directory"
Basically, yes, that's what you should do, although I'd probably make them classes rather than modules. The pattern it sounds like you're after is called "Service Objects" (or sometimes "use cases"). It takes the logic of a specific operation you want to perform and puts it in its own self-contained class. That class then collaborates with whatever models it needs. So your models stay quite small, and your service classes follow the Single Responsibility Principle. Your controllers then usually call a single service class to do what they need to do, so your controllers stay pretty minimal too.
If you google "rails service objects" or similar, you'll find lots of great stuff, but here's some resources to get you started.
Service objects rails casts: https://www.youtube.com/watch?v=uIp6N89PH-c
https://webuild.envato.com/blog/a-case-for-use-cases/
https://blog.engineyard.com/2014/keeping-your-rails-controllers-dry-with-services
http://blog.codeclimate.com/blog/2012/10/17/7-ways-to-decompose-fat-activerecord-models/ (there's one section on service objects there)
Keep in mind, once you do start using service objects, you don't necessarily have to ALWAYS go through your Application model to get to the related ones. A service object might take an application_id and then do e.g. @payment = Payment.find_by(application_id: application_id), so you don't have to fetch the application instance at all and can manipulate the @payment variable directly.
The fact that Rails makes it "easy" and "pretty" to get to related models doesn't necessarily mean you should do it.
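A minimal sketch of such a service object, in plain Ruby so it runs without Rails. The Payment struct and PaymentStore are hypothetical stand-ins for the ActiveRecord model and for Payment.find_by(application_id: ...), and the ActivatePayment name is made up for illustration.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only the methods the
# service needs are modelled here.
Payment = Struct.new(:application_id, :active) do
  def update_attribute(name, value)
    self[name] = value
    true
  end
end

# Stand-in for Payment.find_by(application_id: ...).
class PaymentStore
  def initialize(payments)
    @payments = payments
  end

  def find_by_application(application_id)
    @payments.find { |payment| payment.application_id == application_id }
  end
end

# Service object: one class, one operation, called from the controller.
class ActivatePayment
  def initialize(store)
    @store = store
  end

  def call(application_id)
    payment = @store.find_by_application(application_id)
    return false unless payment

    payment.update_attribute(:active, true)
  end
end

store = PaymentStore.new([Payment.new(1, false), Payment.new(2, false)])
activated = ActivatePayment.new(store).call(2)
```

A controller action would then shrink to a single call such as ActivatePayment.new(store).call(params[:application_id]), and the Application model never needs to be loaded.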
I would not worry about long controller and spec files in Rails.
These files tend to get very long and the usual advice of keeping classes and methods short does not necessarily apply for controllers and their specs.
For example, in our production system user_controller.rb is 8500 lines long and the corresponding user_controller_spec.rb is 7000 lines long.
This is the length of our top 10 controllers
1285 app/controllers/*********_controller.rb
1430 app/controllers/***********_controller.rb
1444 app/controllers/****_controller.rb
1950 app/controllers/****_controller.rb
1994 app/controllers/********_controller.rb
2530 app/controllers/***********_controller.rb
2697 app/controllers/*********_controller.rb
2998 app/controllers/*****_controller.rb
3134 app/controllers/application_controller.rb
8737 app/controllers/users_controller.rb
TL;DR: If your app has four models that are all tied to tables in your database (i.e. leveraging ActiveRecord and inheriting from ActiveRecord::Base), the framework is pretty opinionated toward using model classes.
Abstractions of the service class pattern can be useful in some cases, but give yourself a break. One of the advantages of Rails is that it's supposed to remove a lot of the barriers to development by, among many things, making organization decisions for you. Leverage your model classes.
Let's see if this starts an epic developer bickering war.
Also, it's OK to create interfaces in your models for related model creation:
class Application < ActiveRecord::Base
  has_one :payment

  def create_payment(attrs)
    # build_payment is generated by has_one and sets the foreign key
    build_payment(attrs).tap(&:save)
  end
end
And by OK, I mean that the framework will allow this. But remember, you're already inheriting from ActiveRecord::Base, and has_one :payment already generates helper methods for you, including build_payment and create_payment.
I would recommend, esp. if this is a small project and you're just getting your feet wet, to use well-named rails controllers to read and write objects to the database:
class ApplicationPaymentsController < ActionController::Base
  def create
    application = Application.find(params[:id])
    application.create_payment(payment_params)
  end

  private

  def payment_params
    # :x and :y are placeholders for whatever your attribute names are
    params.require(:payment).permit(:x, :y)
  end
end
The sleekness you're looking for in abstracting foreign keys in creating a relational record is taken care of for you with Rails associations:
http://guides.rubyonrails.org/association_basics.html (good starting point)
http://apidock.com/rails/ActiveRecord/Associations/ClassMethods/has_one (more explicit docs)
That will help you slim down models if that is your goal. Just for clarification, this is one of those things that devs are extremely opinionated on, one way or another, but the truth is that there are code smells (which should be addressed) and then there are folks who arbitrarily preach file length maxes. The most important thing in all of this is readable code.
A good litmus test for refactoring working code is to put it down for a few weeks and come back to it; if it's confusing, put in some time to make it better (hopefully guided by already written test coverage). Otherwise, enjoy what you do, especially if you're working solo.

Optimising export of DB using Rails

I have a RoR application which contains an API to manage applications, each of which contain recipes (and groups, ingredients, measurements).
Once the user has finished managing the recipes, they download a JSON file of the entire application. Because each application could have hundreds of recipes, the files can be large. It also means there are a lot of DB calls to get all the required data to export.
Now because of this, the request to download the application can take upwards of 30 seconds, sometimes more.
My current code looks something like this:
application.categories.each do |c|
  c.recipes.each do |r|
    r.groups.each do |g|
      g.ingredients.each do |i|
      end
    end
  end
end
Within each loop I'm storing the data in a hash and then giving it to the user.
My question is: where do I go from here?
Is there a way to grab all the data I require from the DB in one query? From looking at the log, I can see it is running hundreds of queries.
If the above solution is still slow, is this something I should put into a background process, and then email the user a link (or similar)?
There are of course ways to grab more data at once. This is done with Rails includes or joins, depending on your needs. See this article for some detailed information.
The basic idea is that you can join between your tables so that each time new queries aren't generated. When you do application.categories, that's one query. For each of those categories, you'll do another query: c.recipes - this creates N+1 queries, where N is the number of categories you have. Rather, you can include them off the get go to create 1 or 2 queries (depending on what Rails does).
The basic syntax is easy:
Application.includes(:categories => :recipes).each do |application| ...
This generates 1 query (or 2; again, see the article) that grabs all applications, their categories, and each category's recipes all at once. You can tack on the groups and ingredients too.
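To make the query arithmetic concrete, here is a small plain-Ruby simulation (no ActiveRecord involved). FakeDb is a hypothetical in-memory stand-in that only counts how many "queries" it serves; the second half mimics what eager loading does by fetching all recipes with a single IN-style lookup and grouping them in memory.

```ruby
# Hypothetical in-memory "database" that counts every query it serves.
class FakeDb
  attr_reader :query_count

  def initialize(categories, recipes)
    @categories = categories
    @recipes = recipes
    @query_count = 0
  end

  # SELECT * FROM categories
  def all_categories
    @query_count += 1
    @categories
  end

  # SELECT * FROM recipes WHERE category_id IN (...)
  def recipes_for(category_ids)
    @query_count += 1
    @recipes.select { |recipe| category_ids.include?(recipe[:category_id]) }
  end
end

categories = (1..5).map { |id| { id: id } }
recipes = (1..20).map { |id| { id: id, category_id: (id % 5) + 1 } }

# N+1: one query for the categories, then one more per category.
lazy_db = FakeDb.new(categories, recipes)
lazy_db.all_categories.each { |category| lazy_db.recipes_for([category[:id]]) }

# Eager loading: one query for the categories, one for all their recipes,
# then the grouping happens in memory.
eager_db = FakeDb.new(categories, recipes)
ids = eager_db.all_categories.map { |category| category[:id] }
recipes_by_category = eager_db.recipes_for(ids).group_by { |recipe| recipe[:category_id] }
```

With 5 categories the first approach issues 6 queries and the second issues 2; the gap grows with every extra category and every extra nesting level.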
As for putting the work in the background, my suggestion would be to just have a loading image, or get fancy by using a progress bar.
First of all I have to assume that the required has_many and belongs_to associations exist.
Generally you can do something like
c.recipes.includes(:groups)
or even
c.recipes.includes(:groups => :ingredients)
which will fetch recipes and groups (and ingredients) at once.
But since you have a quite big data set IMO it would be better if you limited that technique to the deepest levels.
The most useful approach would be to use find_each and includes together.
(find_each fetches the items in batches in order to keep the memory usage low)
perhaps something like
application.categories.each do |c|
  c.recipes.find_each do |r|
    r.groups.includes(:ingredients).each do |g|
      g.ingredients.each do |i|
        ...
      end
    end
  end
end
Now even that can take quite a long time (for an HTTP request), so you can consider async processing: the client makes a request, the server processes it as a background job, and when the export is ready you provide a download link (or send an email) to the client.
Resque is one possible solution for handling the async part.
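The shape of that flow can be sketched in plain Ruby with a worker thread. ExportQueue is a hypothetical in-process stand-in for a real job system such as Resque, and the token it hands back plays the role of the download link; none of these names come from a library.

```ruby
require 'json'
require 'securerandom'

# Hypothetical in-process stand-in for a job system such as Resque.
class ExportQueue
  def initialize
    @jobs = Queue.new
    @results = {}
    @lock = Mutex.new
    @worker = Thread.new { work }
  end

  # Called from the request: returns a token immediately.
  def enqueue(data)
    token = SecureRandom.hex(8)
    @jobs << [token, data]
    token
  end

  # Returns the finished JSON, or nil while the export is still running.
  def result(token)
    @lock.synchronize { @results[token] }
  end

  # Stop accepting jobs and wait for the worker to drain the queue.
  def shutdown
    @jobs.close
    @worker.join
  end

  private

  def work
    while (job = @jobs.pop)
      token, data = job
      json = JSON.generate(data) # the slow export would happen here
      @lock.synchronize { @results[token] = json }
    end
  end
end

queue = ExportQueue.new
token = queue.enqueue('recipes' => [{ 'name' => 'Pancakes' }])
queue.shutdown
export = queue.result(token)
```

In a real application the request would return right after enqueue, and the client would poll with the token (or receive an email) instead of waiting for shutdown.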

Rails callbacks, observers, models, and where to put methods and logic

I'm working on an app at work. Basic stuff, user signs up (with an associated organization).
Initially I started off with a simple controller -
# Need to check if organization exists already; deny user creation if it does
if @organization.save
  @user.save
  redirect_to user_dashboard_path...
end
I soon found myself in a callback soup:
After the organization is validated, we save the user.
When the organization is created, I create another two models, EmailTemplate and PassTemplate (an organization has_one :email_template, has_one :pass_template)
after_create :init_company, :init_email_template, :init_pass_template, :init_form
Each of those callbacks generally calls method on the model, something like:
def init_email_template
self.email_template.create_with_defaults
end
Initially I thought this was quite clever, doing so much behind the scenes, but I've been reading Code Complete by Steve McConnell and feel this is not simple at all. If I didn't already know what was happening, there would be no hint that any time an organization is created it creates 3 associated objects (and some of those objects in turn initialize child objects).
It seems like a bad programming practice, as it obfuscates what's going on.
I thought about moving all of those initializations to the controller, as an organization is only ever created once:
class OrganizationsController < AC
  ...
  def create
    if @organization.save
      @organization.create_user
      @organization.create_email_template
      @organization.create_pass_template
    end
  end
end
That seems like cleaner code, and much easier to follow.
Question 1
*Are there better solutions, or best practices for handling creating associated objects upon creation of the hub object that I'm unaware of?*
Side note - I would have to rewrite a bunch of tests that assume that associations are automatically created via callbacks - I'm okay with that if it's better, easier to understand code.
Question 2
**What about a similar situation with after_save callbacks?**
I have a customer model that checks to see if it has an associated user_account after creation, and if not, creates it. It also creates a Tag model for that user_account once we've created the user_account
class Customer < AR
  after_create :find_or_create_user_account

  def find_or_create_user_account
    if !self.user_account_exists?
      #create the user
    end
    Tag.create(:user_id => self.user_account.id)
  end
end
Somewhat simplified, but again, I believe it's not particularly good programming. For one, I'm putting logic to create two different models in a third model. That seems sloppy and, again, goes against the principle of separating logic. Secondly, the method name does not fully describe what it's doing. Perhaps find_or_create_user_account_and_tag would be a better name, but that also goes against the principle of having a method do one thing and keeping it simple.
After reading about observers and services, my world was thrown for a bit of a loop.
A few months ago I put everything in controllers. It was impossible to test well (which was fine because I didn't test). Now I have skinny controllers, but my models are obese and, I think, unhealthy (not clear, not obvious, harder to read and decipher for another programmer/myself in a few months).
Overall I'm just wondering if there are some good guides, information, or best practices on separation of logic, avoiding callback soup, and where to put different sorts of code.
Why not the following?
after_create :init_associated_objects

def init_associated_objects
  init_company
  init_email_template
  init_pass_template
  init_form
end
My interpretation of "a method should do one thing" isn't strict: I usually have a method that calls other methods (much like the one above). At the end of the day, it's a divide and conquer strategy.
Sometimes I create utility POROs (plain old ruby objects) when it doesn't make sense to have an AR model but a group of functionalities is a class' responsibility. Reports, for instance, are not AR-backed models but it's easier when a report that needs to call multiple models is just instantiated once where the reporting period start and end are instance variables.
A rule of thumb that I follow: if I instantiate the models outside of the whole MVC stack (e.g. Rails console), the things that I expect to happen should stay inside the model.
I don't claim best practices but these have worked for me so far. I'm sure other people would have a better idea on this.
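For illustration, a report PORO along those lines might look like the sketch below. The Order and Refund structs are hypothetical stand-ins for AR-backed models, and RevenueReport and its method names are made up; the point is only that the reporting period is set once at instantiation and reused by every calculation.

```ruby
require 'date'

# Hypothetical records; in a Rails app these would come from AR models.
Order = Struct.new(:placed_on, :total)
Refund = Struct.new(:issued_on, :amount)

# Plain old Ruby object: not backed by a table, instantiated once per
# reporting period, and free to read from several models.
class RevenueReport
  def initialize(period_start, period_end, orders:, refunds:)
    @period = period_start..period_end
    @orders = orders
    @refunds = refunds
  end

  def gross
    @orders.select { |order| @period.cover?(order.placed_on) }.sum(&:total)
  end

  def refunded
    @refunds.select { |refund| @period.cover?(refund.issued_on) }.sum(&:amount)
  end

  def net
    gross - refunded
  end
end

report = RevenueReport.new(
  Date.new(2024, 1, 1), Date.new(2024, 1, 31),
  orders: [Order.new(Date.new(2024, 1, 5), 100), Order.new(Date.new(2024, 2, 2), 50)],
  refunds: [Refund.new(Date.new(2024, 1, 20), 30)]
)
```

Here report.net only counts the January order and refund, because the February order falls outside the period held in the instance variable.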

Specifying and Executing Rules in Ruby

I am looking for a Ruby/Rails tool that will help me accomplish the following:
I would like to store the following string, and ones similar to it, in my database. When an object is created, updated, deleted, etc., I want to run through all the strings, check to see if the CRUD event matches the conditions of the string, and if so, run the actions specified.
When a new ticket is created and its category=6 then notify user 1234 via email
I am planning to create an interface that builds these strings, so it doesn't need to be a human-readable string. If a JSONish structure is better, or a tool has an existing language, that would be fantastic. I'm kinda thinking something along the lines of:
{
  object_types: ['ticket'],
  events: ['created', 'updated'],
  conditions: 'ticket.category=6',
  actions: 'notify user',
  parameters: {
    user: 1234,
    type: 'email'
  }
}
So basically, I need the following:
Monitor CRUD events - It would be nice if the tool had a way to do this, but I can use Rails' ModelObservers here if the tool doesn't natively provide it
Find all matching "rules" - This is my major unknown...
Execute the requested method/parameters - Ideally, this would be defined in my Ruby code as classes/methods
Are there any existing tools that I should investigate?
Edit:
Thanks for the responses so far guys! I really appreciate you pointing me down the right paths.
The use case here is that we have many different clients, with many different business rules. For the rules that apply to all clients, I can easily create those in code (using something like Ruleby), but for all of the client-specific ones, I'd like to store them in the database. Ideally, the rule could be written once, stored either in the code or in the DB, and then run (using something like Resque for performance).
At this point, it looks like I'm going to have to roll my own, so any thoughts as to the best way to do that, or any tools I should investigate, would be greatly appreciated.
Thanks again!
I don't think it would be a major thing to write something yourself to do this, I don't know of any gems which would do this (but it would be good if someone wrote one!)
I would tackle the project in the following way, the way I am thinking is that you don't want to do the rule matching at the point the user saves as it may take a while and could interrupt the user experience and/or slow up the server, so...
1. Use observers to store a record each time a CRUD event happens, or to make things simpler use the Acts as Audited gem which does this for you.
1.5. Use a rake task, running from your crontab to run through the latest changes, perhaps every minute, or you could use Resque which does a good job of handling lots of jobs
2. Create a set of tables which define the possible rules a user could select from, perhaps something like
Table: Rule
  Name
  ForEvent (eg. CRUD)
  TableInQuestion
  FieldOneName
  FieldOneCondition etc.
  MethodToExecute
You can use a bit of metaprogramming to execute your method and since your method knows your table name and record id then this can be picked up.
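A rough sketch of that matching and dispatch step in plain Ruby. The rule hash layout, the RuleActions class and the ALLOWED_OPERATORS list are all assumptions made for this example (loosely following the structure in the question); in a real app the rules would be rows loaded from the Rule table and the record would be the audited model.

```ruby
# Actions live in ordinary Ruby code; rules only name them.
class RuleActions
  attr_reader :log

  def initialize
    @log = []
  end

  def notify_user(record, params)
    @log << "notify user #{params[:user]} via #{params[:type]} about ticket #{record[:id]}"
  end
end

# Rules as they might be loaded from the database (hypothetical schema).
RULES = [
  { object_type: 'ticket', events: %w[created updated],
    field: :category, operator: :==, value: 6,
    action: :notify_user, parameters: { user: 1234, type: 'email' } }
].freeze

ALLOWED_OPERATORS = %i[== != < > <= >=].freeze

def matching_rules(rules, object_type, event, record)
  rules.select do |rule|
    rule[:object_type] == object_type &&
      rule[:events].include?(event) &&
      ALLOWED_OPERATORS.include?(rule[:operator]) &&
      record[rule[:field]].public_send(rule[:operator], rule[:value])
  end
end

def run_rules(rules, actions, object_type, event, record)
  matching_rules(rules, object_type, event, record).each do |rule|
    # Metaprogramming step: the stored action name becomes a method call.
    actions.public_send(rule[:action], record, rule[:parameters])
  end
end

actions = RuleActions.new
run_rules(RULES, actions, 'ticket', 'created', { id: 42, category: 6 })
run_rules(RULES, actions, 'ticket', 'created', { id: 43, category: 2 })
run_rules(RULES, actions, 'ticket', 'deleted', { id: 44, category: 6 })
```

Only the first call matches, so actions.log ends up with a single entry. Since action names come from the database, it is worth restricting them to public methods of a dedicated class like this rather than calling send on arbitrary objects.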
Additional Notes
The best way to get going with this is to start simple then work upwards. To get the simple version working first I'd do the following:
1. Install acts as audited
2. Add an additional field to the created audit table, :when_processed
3. Create yourself a module in your /lib folder called something like processrules which roughly does this
   3.1 Grabs all unprocessed audit entries
   3.2 Marks them as processed (perhaps make another small audit table at this point to record events happening)
4. Now create a rules table which simply has a name and condition statement, perhaps add a few sample ones to get going
   Name: First | Rule Statement: 'SELECT 1 WHERE table.value = something'
5. Adapt your new processrules method to execute that SQL for each changed entry (perhaps you want to restrict it to just the tables you are working with)
6. If the rule matched, add it to your log file.
From here you can extrapolate the additional functionality you need, and perhaps ask another question about the metaprogramming side of dynamically calling methods, as this question is quite broad. I'm more than happy to help further.
I tend to think the best way to go about task processing is to setup the process nicely first so it will work with any server load and situation then plug in the custom bits.
You could make this abstract enough so that you can specify arbitrary conditions and rules, but then you'd be developing a framework/engine as opposed to solving the specific problems of your app.
There's a good chance that using ActiveRecord::Observer will solve your needs, since you can hardcode all the different types of conditions you expect, and then only put the unknowns in the database. For example, say you know that you'll have people watching categories, then create an association like category_watchers, and use the following Observer:
class TicketObserver < ActiveRecord::Observer
  # observe :ticket # not needed here, since it's inferred by the class name

  def after_create(ticket)
    ticket.category.watchers.each{ |user| notify_user(ticket, user) }
  end

  # def after_update ... (similar)

  private

  def notify_user(ticket, user)
    # lookup the user's stored email preferences
    # send an email if appropriate
  end
end
If you want to store the email preference along with the fact that the user is watching the category, then use a join model with a flag indicating that.
If you then want to abstract it a step further, I'd suggest using something like treetop to generate the observers themselves, but I'm not convinced that this adds more value than abstracting similar observers in code.
There's a Ruby & Rules Engines SO post that might have some useful info. There's also another Ruby-based rules engine that you may want to explore: Ruleby.
Hope that this helps you start your investigation.

How to create a new database from within a Rails app?

I'm working on a Rails app that has one database per account. (I know this is a controversial approach in itself, but I'm confident it's the right one in this case.)
I'd like to entirely automate the process of creating a new user account, which means I need to be able to create a new database and populate it with some seed data programmatically from within a Rails app.
My question, then, is how best to do this? I don't think I can just run migrations from within the app (or, if I can, how?), and just running the straight SQL queries within the app with hardcoded CREATE TABLE statements seems a really unwieldy way of doing things. What approach should I take, then?
Thanks in advance for your help!
David
This is an approach that my application requires. The app provides a web front-end onto a number of remote embedded devices which in turn monitor sensors. Each embedded device runs a ruby client process which reads a config file to determine its setup. There is a need to be able to add a new sensor type.
The approach I have is that each sensor type has its own data table, which is written into by every device which has that sensor. So in order to be able to create a new sensor type, I need to be able to set up new tables.
One initial issue is that the remote embedded devices do not have a rails app on them - therefore table name pluralization is a bad plan, as the pluralization rules are not accessible to the remote devices. Therefore I set
ActiveRecord::Base.pluralize_table_names = false
in config/environment.rb
The data on each sensor device type is held in a SensorType model - which has two fields - the sensor name, and the config file contents.
Within the SensorType model class, there are methods for:
Parsing the config file to extract field names and types
Creating a migration to build a new model
Altering a particular field in the DB from a generic string to char(17) as it is a MAC address used for indexing
Altering the new model code to add appropriate belongs_to relationships
Build partial templates for listing the data in the table (a header partial and a line_item partial)
These methods are all bound together by a create_sensor_table method which calls all the above, and performs the appropriate require or load statements to ensure the new model is immediately loaded. This is called from the create method in the SensorTypeController as follows:
# POST /device_types
# POST /device_types.xml
def create
  @sensor_type = SensorType.new(params[:sensor_type])
  respond_to do |format|
    if @sensor_type.save
      @sensor_type.create_sensor_tables
      flash[:notice] = 'SensorType was successfully created.'
      # etc
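The first two of the SensorType methods listed above (parsing the config file and creating a migration) could be sketched roughly as follows. The original config format is not shown, so the "field_name: type" layout, the ALLOWED_TYPES list, and the parse_fields and migration_source names are all assumptions made for this example. It only renders the migration source as a string; writing the file and running it is left out, and newer Rails versions expect a versioned superclass such as ActiveRecord::Migration[7.0].

```ruby
# Hypothetical config format: one "field_name: type" pair per line.
SENSOR_CONFIG = <<~CONFIG
  mac_address: string
  temperature: float
  recorded_at: datetime
CONFIG

ALLOWED_TYPES = %w[string integer float datetime boolean].freeze

# Extract [name, type] pairs, skipping blank lines and rejecting unknown types.
def parse_fields(config)
  config.each_line.map(&:strip).reject(&:empty?).map do |line|
    name, type = line.split(':', 2).map(&:strip)
    raise ArgumentError, "unsupported type: #{type.inspect}" unless ALLOWED_TYPES.include?(type)

    [name, type]
  end
end

# Render the source of a migration for the new sensor table.
def migration_source(table_name, fields)
  class_name = "Create#{table_name.split('_').map(&:capitalize).join}"
  lines = [
    "class #{class_name} < ActiveRecord::Migration",
    '  def change',
    "    create_table :#{table_name} do |t|"
  ]
  lines += fields.map { |name, type| "      t.#{type} :#{name}" }
  lines += ['      t.timestamps', '    end', '  end', 'end']
  lines.join("\n")
end

source = migration_source('temperature_sensor', parse_fields(SENSOR_CONFIG))
```

Restricting the column types to a fixed list matters here, because the config contents end up inside generated Ruby code.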

Resources