solr, sunspot, bad request, illegal character - ruby-on-rails

I am introducing Sunspot search into my project. I got a proof of concept working by searching on just the name field. When I introduced the description field and reindexed Solr, I got the following error.
** Invoke sunspot:reindex (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute sunspot:reindex
Skipping progress bar: for progress reporting, add gem 'progress_bar' to your Gemfile
rake aborted!
RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>18},'error'=>{'msg'=>'Illegal character ((CTRL-CHAR, code 11))
at [row,col {unknown-source}]: [42,1]','code'=>400}}
Request Data: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><add><doc><field name=\"id\">ItemsDesign 1322</field><field name=\"type\">ItemsDesign</field><field name=\"type\">ActiveRecord::Base</field><field name=\"class_name\">ItemsDesign</field><field name=\"name_text\">River City Clocks Musical Multi-Colored Quartz Cuckoo Clock</field><field name=\"description_text\">This colorful chalet style German quartz cuckoo clock accurately keeps time and plays 12 different melodies. Many colorful flowers are painted on the clock case and figures of a Saint Bernard and Alpine horn player are on each side of the clock dial. Two decorative pine cone weights are suspended beneath the clock case by two chains. The heart shaped pendulum continously swings back and forth.
On every
I am assuming that the bad character is the one you can see at the bottom of the request data (the one Solr reports as CTRL-CHAR, code 11). That character is littered through a lot of the descriptions, and I'm not even sure what character it is.
What can I do to get Solr to ignore it, or to clean the data so that Solr can handle it?
Thanks

Put the following in an initializer to automatically clean sunspot calls of any UTF8 control characters:
# config/initializers/sunspot.rb
module Sunspot
  #
  # DataExtractors present an internal API for the indexer to use to extract
  # field values from models for indexing. They must implement the #value_for
  # method, which takes an object and returns the value extracted from it.
  #
  module DataExtractor #:nodoc: all
    #
    # AttributeExtractors extract data by simply calling a method on the block.
    #
    class AttributeExtractor
      def initialize(attribute_name)
        @attribute_name = attribute_name
      end
      def value_for(object)
        Filter.new( object.send(@attribute_name) ).value
      end
    end
    #
    # BlockExtractors extract data by evaluating a block in the context of the
    # object instance, or if the block takes an argument, by passing the object
    # as the argument to the block. Either way, the return value of the block is
    # the value returned by the extractor.
    #
    class BlockExtractor
      def initialize(&block)
        @block = block
      end
      def value_for(object)
        Filter.new( Util.instance_eval_or_call(object, &@block) ).value
      end
    end
    #
    # Constant data extractors simply return the same value for every object.
    #
    class Constant
      def initialize(value)
        @value = value
      end
      def value_for(object)
        Filter.new(@value).value
      end
    end
    #
    # A Filter to allow easy value cleaning
    #
    class Filter
      def initialize(value)
        @value = value
      end
      def value
        strip_control_characters @value
      end
      def strip_control_characters(value)
        return value unless value.is_a? String
        value.chars.inject("") do |str, char|
          unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
            str << char
          end
          str
        end
      end
    end
  end
end
Source (Sunspot Github Issues): Sunspot Solr Reindexing failing due to illegal characters

I tried the solution @thekingoftruth proposed; however, it did not solve the problem. I found an alternative version of the Filter class in the same GitHub thread that he links to, and that solved my problem.
The main difference was that I use nested models through HABTM relationships.
This is my search block in the model:
searchable do
text :name, :description, :excerpt
text :venue_name do
venue.name if venue.present?
end
text :artist_name do
artists.map { |a| a.name if a.present? } if artists.present?
end
end
Here is the initializer that worked for me:
(in: config/initializers/sunspot.rb)
module Sunspot
  #
  # DataExtractors present an internal API for the indexer to use to extract
  # field values from models for indexing. They must implement the #value_for
  # method, which takes an object and returns the value extracted from it.
  #
  module DataExtractor #:nodoc: all
    #
    # AttributeExtractors extract data by simply calling a method on the block.
    #
    class AttributeExtractor
      def initialize(attribute_name)
        @attribute_name = attribute_name
      end
      def value_for(object)
        Filter.new( object.send(@attribute_name) ).value
      end
    end
    #
    # BlockExtractors extract data by evaluating a block in the context of the
    # object instance, or if the block takes an argument, by passing the object
    # as the argument to the block. Either way, the return value of the block is
    # the value returned by the extractor.
    #
    class BlockExtractor
      def initialize(&block)
        @block = block
      end
      def value_for(object)
        Filter.new( Util.instance_eval_or_call(object, &@block) ).value
      end
    end
    #
    # Constant data extractors simply return the same value for every object.
    #
    class Constant
      def initialize(value)
        @value = value
      end
      def value_for(object)
        Filter.new(@value).value
      end
    end
    #
    # A Filter to allow easy value cleaning
    #
    class Filter
      def initialize(value)
        @value = value
      end
      def value
        if @value.is_a? String
          strip_control_characters_from_string @value
        elsif @value.is_a? Array
          @value.map { |v| strip_control_characters_from_string v }
        elsif @value.is_a? Hash
          @value.inject({}) do |hash, (k, v)|
            hash.merge( strip_control_characters_from_string(k) => strip_control_characters_from_string(v) )
          end
        else
          @value
        end
      end
      def strip_control_characters_from_string(value)
        return value unless value.is_a? String
        value.chars.inject("") do |str, char|
          unless char.ascii_only? && (char.ord < 32 || char.ord == 127)
            str << char
          end
          str
        end
      end
    end
  end
end
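The array handling matters for a searchable block like the one above: the artist_name block returns an array of names, which a string-only filter passes through untouched. Here is a minimal standalone sketch of the same idea outside Sunspot. The helper names strip_control_characters and clean_value are my own and are not part of Sunspot; unlike the Filter above, this sketch also recurses into nested arrays.

```ruby
# Standalone sketch; strip_control_characters and clean_value are
# hypothetical helper names, not part of Sunspot's API.
def strip_control_characters(value)
  return value unless value.is_a?(String)
  # Drop ASCII control characters (codes 0-31 and 127), like the Filter above.
  value.each_char.reject { |c| c.ascii_only? && (c.ord < 32 || c.ord == 127) }.join
end

# Clean strings directly, and walk into arrays and hashes.
def clean_value(value)
  case value
  when String then strip_control_characters(value)
  when Array  then value.map { |v| clean_value(v) }
  when Hash   then value.to_h { |k, v| [clean_value(k), clean_value(v)] }
  else value
  end
end

# An array such as the one a `text :artist_name do ... end` block might return.
names   = ["River\vCity", nil, "Cuckoo\u0000Clock"]
cleaned = clean_value(names)
```

Note that, like the original Filter, this also removes tabs and newlines, since they fall in the 0-31 range.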

You need to strip the control characters from your UTF-8 content when saving it. Solr cannot index them and responds with this error.
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
You can use something like this:
name.gsub!(/\p{Cc}/, "")
Edit:
If you want to apply this globally, I think it should be possible by overriding the value_for methods in AttributeExtractor and, if needed, BlockExtractor.
https://github.com/sunspot/sunspot/blob/master/sunspot/lib/sunspot/data_extractor.rb
I haven't tested this.
If you manage to write a global patch, please let me know.
I ran into the same issue recently.
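The gsub approach above can be tried in plain Ruby. Note that \p{Cc} also matches tabs and newlines, so the sketch below uses a narrower pattern that keeps tab, line feed and carriage return, which XML 1.0 permits. The constant and helper names are assumptions made for this example only.

```ruby
# C0 control characters that XML 1.0 rejects (everything below 0x20 except
# tab, line feed and carriage return). DEL (0x7F) is included as well, to
# mirror the filters shown in the other answers.
XML_UNSAFE_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

# Hypothetical helper name, used only for this sketch.
def strip_xml_unsafe(text)
  return text unless text.is_a?(String)
  text.gsub(XML_UNSAFE_CONTROL_CHARS, "")
end

# "\v" is the vertical tab (code 11) that Solr complained about.
raw   = "swings back and forth.\vOn every hour\n"
clean = strip_xml_unsafe(raw)
```

In a Rails model, a helper like this could be called from a callback before the record is saved, so the stored data is already clean by the time Sunspot indexes it.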

Related

How to track objects "called" inside a block?

Question:
I need to know the records' attributes that have been called inside a block (say I need something like the following):
def my_custom_method(&block)
some_method_that_starts_tracking
block.call
some_method_that_stops_tracking
puts some_method_that_returns_called_records_attributes
do_something_about(some_method_that_returns_called_records_attributes)
end
my_custom_method { somecodethatcallsauthorofbook1andemailandfirstnameofuser43 }
# this is the `puts` output above (just as an example)
# => {
# #<Book id:1...> => [:author],
# #<User id:43...> => [:email, :first_name]
# }
code inside the block can be anything
Specifically, I meant to track any instance of a subclass of ApplicationRecord, so it can be instance of any models like Book, User, etc...
Attempts:
From my understanding, this is similar to how rspec works when a method is expected to be called. That it somehow tracks any calls of that method. So, my initial attempt is to do something like the following (which does not yet fully work):
def my_custom_method(&block)
called_records_attributes = {}
ApplicationRecord.descendants.each do |klass|
klass.class_eval do
attribute_names.each do |attribute_name|
define_method(attribute_name) do
called_records_attributes[self] ||= []
called_records_attributes[self] << attribute_name
self[attribute_name]
end
end
end
end
block.call
# the above code will work but at this point, I don't know how to clean the methods that were defined above, as the above define_methods should only be temporary
puts called_records_attributes
end
my_custom_method { Book.find_by(id: 1).title }
# => {
# #<Book id: 1...> => ['title']
# }
The .descendants call above is probably not a good idea, because Rails uses autoloading if I'm not mistaken.
As already said in the comment above, I do not know how to remove these defined methods, which are supposed to exist only for the duration of the block.
Furthermore, my code above would probably have overridden the actual attribute getters of the models, if any had already been defined, which is bad.
Background:
I am writing a gem, live_record, to which I am adding a new feature that will allow a developer to simply write something like
<!-- app/views/application.html.erb -->
<body>
<%= live_record_sync { @book.some_custom_method_about_book } %>
</body>
... which will render @book.some_custom_method_about_book as-is on the page. At the same time, the live_record_sync wrapper method would take note of all the attributes that were called inside that block (e.g. inside some_custom_method_about_book, @book.title is called) and set these attributes as the block's own "dependencies". Later, when one of those attributes of that specific book is updated, I can directly update the part of the HTML page that depends on it. I am aware that this is not an accurate solution, but I'd like to open up my chances by experimenting on this first.
-- Rails 5
Disclaimer: I believe this is just a mediocre solution, but hopefully helps anyone with the same problem.
I tried reading the rspec source code, but I couldn't easily follow what happens under the hood. It also occurred to me that rspec's expect(Book.first).to receive(:title) differs from what I want: there the method is specified up front (i.e. :title), whereas I want to track ANY method that is an attribute. For these two reasons I stopped reading and attempted my own solution, which hopefully works; see below.
Note that I am using Thread local-storage here, so this code should be thread-safe (untested yet).
# lib/my_tracker.rb
class MyTracker
  Thread.current[:my_tracker_current_tracked_records] = {}
  attr_accessor :tracked_records
  class << self
    def add_to_tracked_records(record, attribute_name)
      Thread.current[:my_tracker_current_tracked_records][{model: record.class.name.to_sym, record_id: record.id}] ||= []
      Thread.current[:my_tracker_current_tracked_records][{model: record.class.name.to_sym, record_id: record.id}] << attribute_name
    end
  end
  def initialize(block)
    @block = block
  end
  def call_block_while_tracking_records
    start_tracking
    @block_evaluated_value = @block.call
    @tracked_records = Thread.current[:my_tracker_current_tracked_records]
    stop_tracking
  end
  def to_s
    @block_evaluated_value
  end
  # because I am tracking record-attributes, and you might want to track a different object / method, then you'll need to write your own `prepend` extension (look for how to use `prepend` in ruby)
  module ActiveRecordExtensions
    def _read_attribute(attribute_name)
      if Thread.current[:my_tracker_current_tracked_records] && !Thread.current[:my_tracker_is_tracking_locked] && self.class < ApplicationRecord
        # I added this "lock" to prevent infinite loop inside `add_to_tracked_records` as I am calling the record.id there, which is then calling this _read_attribute, and then loops.
        Thread.current[:my_tracker_is_tracking_locked] = true
        ::MyTracker.add_to_tracked_records(self, attribute_name)
        Thread.current[:my_tracker_is_tracking_locked] = false
      end
      super(attribute_name)
    end
  end
  module Helpers
    def track_records(&block)
      my_tracker = MyTracker.new(block)
      my_tracker.call_block_while_tracking_records
      my_tracker
    end
  end
  private
  def start_tracking
    Thread.current[:my_tracker_current_tracked_records] = {}
  end
  def stop_tracking
    Thread.current[:my_tracker_current_tracked_records] = nil
  end
end
ActiveSupport.on_load(:active_record) do
  prepend MyTracker::ActiveRecordExtensions
end
ActiveSupport.on_load(:action_view) do
  include MyTracker::Helpers
end
ActiveSupport.on_load(:action_controller) do
  include MyTracker::Helpers
end
Usage Example
some_controller.rb
book = Book.find_by(id: 1)
user = User.find_by(id: 43)
my_tracker = track_records do
book.title
if user.created_at == book.created_at
puts 'same date'
end
'thisisthelastlineofthisblockandthereforewillbereturned'
end
puts my_tracker.class
# => MyTracker
puts my_tracker.tracked_records
# => {
# {model: :Book, record_id: 1} => ['title', 'created_at'],
# {model: :User, record_id: 43} => ['created_at']
# }
puts my_tracker
# => 'thisisthelastlineofthisblockandthereforewillbereturned'
# notice that `puts my_tracker` above prints the value returned by the block
# this is because I defined `to_s` above.
# I need this `to_s` so I can immediately print the block's value as-is in the views.
# see example below
some_view.html.erb
<%= track_records { current_user.email } %>
P.S. Maybe it's better that I wrap this up as a gem. If you're interested, let me know
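The core of this answer is the combination of prepend and a thread-local log. A stripped-down sketch without Rails may make that easier to follow. Record, AttributeTracking, read_attribute and track_reads are hypothetical names for this illustration; they are not ActiveRecord's API, and the sketch is not the code of any gem.

```ruby
# Plain-Ruby sketch of the prepend + thread-local approach. All names here
# are made up for illustration.
module AttributeTracking
  def read_attribute(name)
    log = Thread.current[:tracked_reads]
    # Only record reads while a tracking block is active.
    (log[self] ||= []) << name if log
    super
  end
end

class Record
  # prepend puts the module in front of the class in the method lookup,
  # so its read_attribute runs first and `super` reaches the original.
  prepend AttributeTracking

  def initialize(attributes)
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name]
  end
end

def track_reads
  Thread.current[:tracked_reads] = {}
  yield
  Thread.current[:tracked_reads]
ensure
  # Always switch tracking off again, even if the block raises.
  Thread.current[:tracked_reads] = nil
end

book  = Record.new({ title: "Dune", author: "Herbert" })
reads = track_reads { book.read_attribute(:title) }
# Reads outside a tracking block are not recorded.
untracked = book.read_attribute(:author)
```

Because nothing is redefined on the class itself, there are no temporary methods to clean up afterwards; tracking is switched on and off purely through the thread-local value.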

wrong number of arguments (given 0, expected 4)

I am getting this error with the setup below. My thought is that the file cannot properly access the CSV that I am attempting to import. I have to import from one CSV to create another CSV using the model data. What do I put in the controller and views to show the new CSV / manipulated data? Basically, how can I pass one CSV file (orders.csv) into a model for manipulation and out into another CSV file (redemptions.csv) without this argument error? The code in the model just tells the model to calculate the existing numbers in orders.csv a certain way for export.
The controller (I don't really know what to do here)
class OrdersController < ApplicationController
def index
orders = Order.new
end
def redemptions
orders = Order.new
end
end
The View (not confident about this either)
<h1>Chocolates</h1>
puts "#{order.purchased_chocolate_count}"
<%= link_to "CSV", orders_redemptions_path, :format => :csv %>
Model
require 'csv'
# Define an Order class to make it easier to store / calculate chocolate tallies
class Order < ActiveRecord::Base
module ChocolateTypes
MILK = 'milk'
DARK = 'dark'
WHITE = 'white'
SUGARFREE = 'sugar free'
end
BonusChocolateTypes = {
ChocolateTypes::MILK => [ChocolateTypes::MILK, ChocolateTypes::SUGARFREE],
ChocolateTypes::DARK => [ChocolateTypes::DARK],
ChocolateTypes::WHITE => [ChocolateTypes::WHITE, ChocolateTypes::SUGARFREE],
ChocolateTypes::SUGARFREE => [ChocolateTypes::SUGARFREE, ChocolateTypes::DARK]
}
# Ruby has this wacky thing called attr_reader that defines the available
# operations that can be performed on class member variables from outside:
attr_reader :order_value
attr_reader :chocolate_price
attr_reader :required_wrapper_count
attr_reader :order_chocolate_type
attr_reader :chocolate_counts
def initialize(order_value, chocolate_price, required_wrapper_count, order_chocolate_type)
@order_value = order_value
@chocolate_price = chocolate_price
@required_wrapper_count = required_wrapper_count
@order_chocolate_type = order_chocolate_type
# Initialize a new hash to store the chocolate counts by chocolate type.
# Set the default value for each chocolate type to 0
@chocolate_counts = Hash.new(0);
process
end
# Return the number of chocolates purchased
def purchased_chocolate_count
# In Ruby, division of two integer values returns an integer value,
# so you don't have to floor the result explicitly
order_value / chocolate_price
end
# Return the number of chocolate bonuses to award (which can include
# multiple different chocolate types; see BonusChocolateTypes above)
def bonus_chocolate_count
(purchased_chocolate_count / required_wrapper_count).to_i
end
# Process the order:
# 1. Add chocolate counts to the totals hash for the specified order type
# 2. Add the bonus chocolate types awarded for this order
def process
chocolate_counts[order_chocolate_type] += purchased_chocolate_count
bonus_chocolate_count.times do |i|
BonusChocolateTypes[order_chocolate_type].each do |bonus_chocolate_type|
chocolate_counts[bonus_chocolate_type] += 1
end
end
end
# Output the chocolate counts (including bonuses) for the order as an array
# of strings suitable for piping to an output CSV
def csv_data
ChocolateTypes.constants.map do |output_chocolate_type|
# Get the display string (lowercase)
chocolate_key = ChocolateTypes.const_get(output_chocolate_type)
chocolate_count = chocolate_counts[chocolate_key].to_i
"#{chocolate_key} #{chocolate_count}"
end
end
end
# Create a file handle to the output file
CSV.open("redemptions.csv", "wb") do |redemption_csv|
# Read in the input file and store it as an array of lines
input_lines = CSV.read("orders.csv")
# Remove the first line from the input file (it just contains the CSV headers)
input_lines.shift()
input_lines.each do |input_line|
order_value, chocolate_price, required_wrapper_count, chocolate_type = input_line
# Correct the input values to the correct types
order_value = order_value.to_f
chocolate_price = chocolate_price.to_f
required_wrapper_count = required_wrapper_count.to_i
# Sanitize the chocolate type from the input line so that it doesn't
# include any quotes or leading / trailing whitespace
chocolate_type = chocolate_type.gsub(/[']/, '').strip
order = Order.new(order_value, chocolate_price, required_wrapper_count, chocolate_type)
order.process()
puts order.purchased_chocolate_count
# Append the order to the output file as a new CSV line
output_csv << order.csv_data
end
end
In your initialize method you do not provide default values for the arguments:
def initialize(order_value, chocolate_price, required_wrapper_count, order_chocolate_type)
When you run orders = Order.new, it expects four arguments and you haven't provided any.
One more issue: by naming convention, your local variable should be called order, not orders.
To assign default values properly, you can look here.
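The error and the default-value fix can be reproduced in plain Ruby. In the sketch below, StrictOrder and LenientOrder are hypothetical classes without ActiveRecord, and the default values are arbitrary choices for illustration.

```ruby
# Four required arguments: calling .new without them raises ArgumentError.
class StrictOrder
  def initialize(order_value, chocolate_price, required_wrapper_count, order_chocolate_type)
    @order_value = order_value
    @chocolate_price = chocolate_price
    @required_wrapper_count = required_wrapper_count
    @order_chocolate_type = order_chocolate_type
  end
end

# Same signature, but every argument has a default (values are arbitrary).
class LenientOrder
  attr_reader :order_value, :chocolate_price, :required_wrapper_count, :order_chocolate_type

  def initialize(order_value = 0, chocolate_price = 1, required_wrapper_count = 1, order_chocolate_type = "milk")
    @order_value = order_value
    @chocolate_price = chocolate_price
    @required_wrapper_count = required_wrapper_count
    @order_chocolate_type = order_chocolate_type
  end
end

begin
  StrictOrder.new
rescue ArgumentError => e
  error_message = e.message
end

order = LenientOrder.new
```

This only illustrates the Ruby-level error. In a real ActiveRecord model, attributes are normally passed to new as a hash, so a custom positional initialize is usually not the right tool there.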

rails get output from controller methods?

I have the following code in my Application Controller
class ApplicationController < ActionController::Base
# Prevent CSRF attacks by raising an exception.
# For APIs, you may want to use :null_session instead.
protect_from_forgery with: :exception
def get_mkts(all_idx)
mkts = Set.new
all_idx.each do |idx|
m = decode_index_names(idx)
puts m[:mkt]
mkts.add(m[:mkt])
end
end
def decode_index_names(name)
mkt = name.split(/[0-9]/)[0]
type = get_first_num(mkt);
{:mkt => mkt,:type => type}
end
def get_first_num(str)
str[/\d+/]
end
end
And I'm inputting an array of strings like this:
["USEQUITIES2tv10", "USEQUITIES2tv15", "USEQUITIES2tv20", "NONUSEQUITIES2tv5", "NONUSEQUITIES2tv10", "NONUSEQUITIES2tv15", "NONUSEQUITIES2tv20", "BONDS2tv5", "BONDS2tv10", "BONDS2tv15", "BONDS2tv20"
, "ES1", "ES2tv5", "ES2tv10", "ES2tv15", "ES2tv20", "NQ1", "NQ2tv5", "NQ2tv10", "NQ2tv15", "USBONDS2tv5", "USBONDS2tv10", "USBONDS2tv15", "USBONDS2tv20", "GERMANBONDS2tv5", "GERMANBONDS2tv10", "GERMANB
ONDS2tv15", "GERMANBONDS2tv20", "EQUITIESnBONDS2tv5", "EQUITIESnBONDS2tv10", "EQUITIESnBONDS2tv15", "EQUITIESnBONDS2tv20", "COMMODITIES2tv5", "COMMODITIES2tv10", "COMMODITIES2tv15", "COMMODITIES2tv20",
"CURRENCIES2tv5"]
The method get_mkts is supposed to loop through, extract the text up to the first number and create a unique array of symbols (which is why I used Set). However, I can't get the method to output anything other than the original input. In the Rails console I can see from the output of "puts m[:mkt]" that each iteration gets the correct value; I just don't know how to return the set mkts instead of the input value. Any ideas?
Ruby methods return the result of the last statement if you don't use return. In your case it's each and that's why you get the input back. You can do something like this:
def get_mkts(all_idx)
mkts = Set.new
all_idx.each do |idx|
m = decode_index_names(idx)
puts m[:mkt]
mkts.add(m[:mkt])
end
mkts
end
This will return the mkts set instead of all_idx.
The method can be rewritten as:
def get_mkts(all_idx)
all_idx.map { |idx| decode_index_names(idx)[:mkt] }.uniq
end
This looks more Ruby-like and is shorter and cleaner. Note that it returns an array of unique values rather than a Set.
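Both points can be checked outside Rails with a short script. This is a sketch with made-up helper names (decode_market, markets_with_each, markets_with_map) and a shortened input list.

```ruby
require "set"

# Text before the first digit, e.g. "USEQUITIES2tv10" -> "USEQUITIES".
def decode_market(name)
  name.split(/[0-9]/)[0]
end

def markets_with_each(all_idx)
  mkts = Set.new
  all_idx.each { |idx| mkts.add(decode_market(idx)) }
  # Without this last line the method would return all_idx, because
  # Array#each returns its receiver.
  mkts
end

def markets_with_map(all_idx)
  # map builds a new array; to_set removes the duplicates.
  all_idx.map { |idx| decode_market(idx) }.to_set
end

input       = ["USEQUITIES2tv10", "USEQUITIES2tv15", "BONDS2tv5", "ES1"]
each_result = input.each { |idx| idx } # the very same array object as input
with_each   = markets_with_each(input)
with_map    = markets_with_map(input)
```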

How to pass Arguments and use those in (resque-status) Resque::JobWithStatus?

my resque worker class is:
require 'resque'
require 'resque/job_with_status'
class PatstatResqueWorker < Resque::JobWithStatus
@queue = :my_worker_q
def self.perform(query, label)
puts "query:"
puts options['query']
puts "label:"
puts options['label']
end
end
and my controller part, where I call this resque is...
class MyController < ApplicationController
def resque
job_id = PatstatResqueWorker.create(:query => @query, :label => "yes")
status = Resque::Plugins::Status::Hash.get(job_id)
end
end
and it's not working :(
If I remove the parameters from the resque function, it says "Wrong number of arguments (2 for 0)", and if I add the parameters back, it says options is not defined :(
Could you help?
The reason you're getting the "options not defined" error is that you haven't defined options in the method that uses it. Your self.perform method expects to receive two distinct arguments, query and label, but the code inside the method expects to have an options hash. You've got to choose one or the other.
Either do this:
def self.perform(query, label)
# use the parameters we've already defined
puts "query:"
puts query
puts "label:"
puts label
end
# call it like this
PatstatResqueWorker.create(@query, "yes")
Or else do this:
# change the method signature to match what you're doing
def self.perform(options)
puts "query:"
puts options['query']
puts "label:"
puts options['label']
end
# call it like this, with string keys
PatstatResqueWorker.create('query' => @query, 'label' => "yes")
Notice that with the hash version, I changed the call to use strings for the hash keys instead of symbols. You can use symbols if you want, but you'd have to change it in the body of the method as well (i.e. options[:query] instead of options['query']). You've just got to be consistent.
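One more reason to prefer string keys: as far as I know, Resque stores job arguments as JSON, and a JSON round trip turns symbol keys into strings. The sketch below only simulates that round trip with Ruby's json library; it does not use Resque, and describe_job is a made-up helper.

```ruby
require "json"

# Simulate arguments being serialized when enqueued and parsed when performed.
enqueued = { query: "SELECT 1", label: "yes" }
received = JSON.parse(JSON.generate(enqueued))

# Hypothetical stand-in for the body of a perform method.
def describe_job(options)
  "query=#{options['query']} label=#{options['label']}"
end

summary = describe_job(received)
```

After the round trip, received[:query] is nil while received['query'] holds the value, so reading the options with string keys is the safer default under that assumption.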

Convert User input to integer

So I have a form where users can input a price. I'm trying to make a before_validation that normalizes the data, clipping the $ if the user puts it.
before_validation do
unless self.price.blank? then self.price= self.price.to_s.gsub(/\D/, '').to_i end
end
If user inputs $50 This code is giving me 0. If user inputs 50$ this code gives me 50. I think since the data type is integer that rails is running .to_i prior to my before_validation and clipping everything after the $. This same code works fine if the data type is a string.
Anyone have a solution that will let me keep the integer datatype?
One way is to override the mechanism on the model that sets the price, like this:
def price=(val)
write_attribute :price, val.to_s.gsub(/\D/, '').to_i
end
So when you do @model.price = whatever, it will go to this method instead of the Rails default attribute writer. Then you can convert the number and use write_attribute to do the actual writing (you have to do it this way because the standard price= is now this method!).
I like this method best, but for reference another way to do it is in your controller before assigning it to the model. The parameter comes in as a string, but the model is converting that string to a number, so work with the parameter directly. Something like this (just adapt it to your controller code):
def create
@model = Model.new(params[:model])
@model.price = params[:model][:price].gsub(/\D/, '').to_i
@model.save
end
For either solution, remove that before_validation.
I would define a virtual attribute and do my manipulation there allowing you to format and modify both the getter and setter at will:
class Model < ActiveRecord::Base
def foo_price=(price)
self.price = price... #=> Mods to string here
end
def foo_price
"$#{price}"
end
end
You also might want to note that:
"$50.00".gsub(/\D/, '').to_i #=> 5000
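The behaviour described in this thread can be checked in plain Ruby: String#to_i stops at the first non-numeric character, which is why "$50" becomes 0 and "50$" becomes 50. The sketch below also shows one possible way to handle a decimal point. parse_price is a hypothetical helper, and rounding to whole units is an assumption about what the integer column should hold.

```ruby
# to_i parses leading digits only, so a leading "$" yields 0.
leading_symbol  = "$50".to_i
trailing_symbol = "50$".to_i

# Stripping every non-digit also removes the decimal point.
digits_only = "$50.00".gsub(/\D/, "").to_i

# Hypothetical helper: keep digits and the decimal point, then round to a
# whole number. Returns nil when no digits or dots are left.
def parse_price(input)
  cleaned = input.to_s.gsub(/[^\d.]/, "")
  return nil if cleaned.empty?
  cleaned.to_f.round
end

prices = ["$50", "50$", "$50.00", "$1,250.75"].map { |s| parse_price(s) }
```

A helper like this would have to run on the raw string, for example inside the price= writer shown above, because by the time before_validation runs the attribute has already been cast to an integer.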
My solution:
The price column has type decimal:
t.decimal :price, precision: 12, scale: 6
# app/concern/sanitize_fields.rb
module SanitizeFields
extend ActiveSupport::Concern
def clear_decimal(field)
return (field.to_s.gsub(/[^\d]/, '').to_d / 100.to_d) unless field.blank?
end
def clear_integer(field)
field.to_s.strip.gsub(/[^\d]/, '') unless field.blank?
end
# module ClassMethods
# def filter(filtering_params)
# results = self.where(nil)
# filtering_params.each do |key, value|
# results = results.public_send(key, value) if value.present?
# end
# results
# end
#
# #use
# #def index
# # @products = Product.filter(params.slice(:status, :location, :starts_with))
# #end
#
# end
end
#app/controllers/products_controller.rb
include SanitizeFields
params[:product][:price] = clear_decimal(params[:product][:price])

Resources