I'm trying to move data from one database to another from within a rake task.
However, I'm getting some fruity encoding issues on some of the data:
rake aborted!
PGError: ERROR: invalid byte sequence for encoding "UTF8": 0x92
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
What can I do to resolve this error and get the data in? As far as I can tell (not knowing anything about encoding), the source DB is latin1.
If both databases are PostgreSQL, you can export and import the whole database and let pg_dump change the encoding (its -E / --encoding option sets the character encoding of the dump). That would probably be the most performant way to do it.
If you do this via a rake task, you can do the transcoding inside the task. That means you will have to touch every attribute and re-encode it,
since it seems your new database is UTF-8 whereas the old one is latin1.
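You could do it by running every string/text-like value through a method like the one below. Note that force_encoding on its own only relabels the bytes, so the string has to be tagged with its real source encoding first and then converted with encode. Checking respond_to?(:encoding) makes sure only values that carry encoding information are touched, i.e. numeric values won't be transcoded.
# from_enc is an assumption: the question says latin1, but byte 0x92 suggests Windows-1252.
def transcode(data, to_enc = 'UTF-8', from_enc = 'Windows-1252')
  return data unless data.respond_to?(:encoding)
  # Already valid in the target encoding: nothing to do.
  return data if data.encoding.name == to_enc && data.valid_encoding?
  # Re-tag the raw bytes with the source encoding, then convert them.
  data.dup.force_encoding(from_enc).encode(to_enc)
end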
Now you can just read a record from the old DB, run it through this method and then write it to the new database:
u = OldDBUser.first
u.attribute_names.each { |x|
  u[x.to_sym] = transcode u[x.to_sym]
}
# ... whatever you do with the transcoded u
... well, I have not tested those, but please do; maybe it's all you need.
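To see why relabelling and converting are different things, here is a minimal sketch using the byte from the error message (0x92). It assumes the old data is really Windows-1252, which is only a guess based on that byte, and to_utf8 is a hypothetical helper name, not something from the question.

```ruby
# Hypothetical helper: interpret raw bytes as from_enc and convert them to UTF-8.
# The Windows-1252 default is an assumption about the source data.
def to_utf8(bytes, from_enc = 'Windows-1252')
  # String#encode(dst, src) reads the bytes as src and returns a new string.
  bytes.encode('UTF-8', from_enc)
end

raw = "It\x92s".b                      # bytes as they might arrive from the old DB

relabelled = raw.dup.force_encoding('UTF-8')
puts relabelled.valid_encoding?        # false: a lone 0x92 is not valid UTF-8

converted = to_utf8(raw)
puts converted                         # It’s
puts converted.valid_encoding?         # true
```

If the source really is plain ISO-8859-1, pass that as the second argument instead; the conversion will still succeed, but 0x92 then becomes a control character rather than an apostrophe.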
Related
Suppose we have some arbitrary Active Record object
obj = User.first
Is there a way to convert this into a text representation?
That is, is there a way to convert the object into some code that can be dropped into a completely different rails console to regenerate that same object?
The closest example I can give of this functionality is the dput() function from the R programming language. Is there an equivalent in ruby / rails, preferably one that works with Active Record objects?
Ruby has the Marshal module:
The marshaling library converts collections of Ruby objects into a
byte stream, allowing them to be stored outside the currently active
script. This data may subsequently be read and the original objects
reconstituted.
str = Marshal.dump(obj)
# => "\x04\bo:\nThing\x1A:\x10#new_recordF:\x10#attributeso:\x1EActiveModel::AttributeSet\x06;\a{\tI\"\aid\x06:\x06ETo:)ActiveModel::Attribute::FromDatabase\n:\n#name#\b:\x1C#value_before_type_casti\x06:\n#typeo:EActiveRecord::ConnectionAdapters::SQLite3Adapter::SQLite3Integer\t:\x0F#precision0:
You can then load the object back into memory:
restored_obj = Marshal.load(
StringIO.new(str) # usually this would be from a IO stream like a file
)
It has some pretty serious security implications though if you're accepting user input, so other serialization formats like JSON or YAML should be considered. There are also issues if you use it for caching and then change Ruby versions.
Rails models in recent versions also support Global ID - which doesn't give you the exact same object but it gives you a URI which can be used to load the same record from the database.
gid = User.first.to_global_id
obj = GlobalID::Locator.locate(gid)
This is how ActiveJob passes around references to models.
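As a rough illustration of the trade-off between Marshal and a plain-data format, here is a small sketch in pure Ruby. Person is a hypothetical stand-in for a model (a real Active Record object carries a lot more internal state), and both helper names are made up for the example.

```ruby
require 'json'

# Hypothetical stand-in for a model.
Person = Struct.new(:name, :age)

# Marshal restores the exact object, but should only load data you produced yourself.
def marshal_round_trip(obj)
  Marshal.load(Marshal.dump(obj))
end

# JSON only carries plain attributes, so the object has to be rebuilt by hand.
def json_round_trip(person)
  attrs = JSON.parse(JSON.generate(person.to_h))
  Person.new(attrs['name'], attrs['age'])
end

original = Person.new('Ada', 36)
puts marshal_round_trip(original) == original   # true
puts json_round_trip(original) == original      # true
```

The JSON route is closer to what dput() gives you: readable text that can be pasted elsewhere and turned back into an object, at the cost of rebuilding it explicitly.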
So I initially had a foreign key, tutor_id, of type string. So I ran the following migration.
change_column(:profiles, :tutor_id, 'integer USING CAST(tutor_id AS integer)')
The problem is that there was data already created which initially contained the tutor_id as type string. I did read, however, that by using CAST the data should be converted into an integer.
So just to confirm, I went into heroku run rails console to check the tutor_id of the profiles, and tutor_id.is_a? Integer returns true.
However, I am currently getting this error:
ActionView::Template::Error (PG::UndefinedFunction: ERROR: operator does not exist: integer = text at character 66
Why is that so? Is the only way out to delete the data and to recreate it?
(I'm assuming the information provided above is enough to draw a conclusion; otherwise I will add the relevant information too.)
You also have to update your code to use integers rather than strings. This error happens because somewhere your code still treats the column as a string, so the query sends the value as text (e.g. '123'). PostgreSQL won't implicitly cast between integer and text, so it's telling you it can't do the comparison.
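One way to make sure a string never reaches the query is to normalise the id first. This is only a sketch: normalize_id is a hypothetical helper, and it relies on the exception: false option of Kernel#Integer, which needs Ruby 2.6 or later.

```ruby
# Hypothetical helper: turn an id that may arrive as a String into an Integer,
# or nil when it is not a valid base-10 number.
def normalize_id(value)
  Integer(value.to_s, 10, exception: false)
end

puts normalize_id('123').inspect   # 123
puts normalize_id(123).inspect     # 123
puts normalize_id('abc').inspect   # nil
```

The result can then be passed to whatever query compares against tutor_id, after checking for nil.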
I'm having trouble with UTF8 chars in Ruby 2.1.5 and Rails 4.
The problem is that the data coming from an external service looks like this:
"first_name"=>"ezgi \xE7enberci"
"last_name" => "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"
These characters mostly include Turkish alphabet characters like "üğşiçö". When the application tries to save these data, the errors below occur:
ArgumentError: invalid byte sequence in UTF-8
Mysql2::Error: Incorrect string value
How can I fix this?
What's Wrong
Ruby thinks you have invalid byte sequences because your strings aren't UTF-8. For example, using the rchardet gem:
require 'rchardet'
["ezgi \xE7enberci", "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"].map do |str|
  CharDet.detect str
end
#=> [{"encoding"=>"ISO-8859-2", "confidence"=>0.8600826867857209},
#    {"encoding"=>"windows-1255", "confidence"=>0.5807177322740268}]
How to Fix It
You need to use String#scrub or one of the encoding methods like String#encode! to clean up your strings first. For example:
hash = {"first_name"=>"ezgi \xE7enberci",
"last_name"=>"\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"}
hash.each_pair { |k, v| v.encode!("UTF-8", "ISO-8859-2") }
#=> {"first_name"=>"ezgi çenberci", "last_name"=>"üţçđiţţöç"}
Obviously, you may need to experiment a bit to figure out what the proper encoding is (e.g. ISO-8859-2, windows-1255, or something else entirely) but ensuring that you have a consistent encoding of your data set is going to be critical for you.
Character encoding detection is imperfect. Your best bet will be to try to find out what encoding your external data source is using, and use that in your string encoding rather than trying to detect it automatically. Otherwise, your mileage may vary.
That doesn't look like UTF-8 data, so this exception is normal. Sounds like you need to tell Ruby what encoding the string is actually in:
some_string.force_encoding("windows-1254")
You can then convert to UTF-8 with the encode method. There are gems (e.g. charlock_holmes) that have heuristics for auto-detecting encodings if you're getting a mix of encodings.
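Putting the two steps together, a minimal sketch might look like this. It assumes the service really sends Windows-1254 (Turkish) bytes, which should be confirmed with the provider; utf8_values and SOURCE_ENCODING are hypothetical names.

```ruby
# Assumption: the external service sends Windows-1254 (Turkish) bytes.
SOURCE_ENCODING = 'Windows-1254'

# Hypothetical helper: return a copy of the hash with every String value in UTF-8.
def utf8_values(hash, from_enc = SOURCE_ENCODING)
  hash.each_with_object({}) do |(key, value), out|
    # encode(dst, src) reads the bytes as src regardless of the string's current tag.
    out[key] = value.is_a?(String) ? value.encode('UTF-8', from_enc) : value
  end
end

params = { 'first_name' => "ezgi \xE7enberci",
           'last_name'  => "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7" }

clean = utf8_values(params)
puts clean['first_name']   # ezgi çenberci
puts clean['last_name']    # üşçğişşöç
```

With this encoding the second value comes out as the Turkish letters mentioned in the question, which is a hint that the guess is right for this data.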
So I've done a couple of days worth of research on the matter, and the general consensus is that there isn't one. So I was hoping for an answer more specific to my situation...
I'm using Rails to import a file into a database. Everything is working regarding the import, but I'm wanting to give the database itself an attribute, not just every entry. I'm creating a hash of the file, and I figured it'd be easiest to just assign it to the database (or the class).
I've created a class called Issue (and thus an 'issues' table) with each entry having a couple of attributes. I was wanting to figure out a way to add a class variable (at least, that's what I think is the best option) to Issue to simply store the hash. I've written a rake task to import the file, iff the new file is different from the previous file imported (read: if the hashes are different).
desc "Parses a CSV file into the 'issues' database"
task :issues, [:file] => :environment do |t, args|
  md5 = Digest::MD5.hexdigest(args[:file])
  puts "1: Issue.md5 = #{Issue.md5}"
  if md5 != Issue.md5
    Issue.destroy_all()
    #import new csv file
    CSV.foreach(args[:file]) do |row|
      issue = {
        #various attributes to be columns...
      }
      Issue.create(issue)
    end #end foreach loop
    Issue.md5 = md5
    puts "2: Issue.md5 = #{Issue.md5}"
  end #end if statement
end #end task
And my model is as follows:
class Issue < ActiveRecord::Base
  attr_accessible :md5
  @@md5 = 5
  def self.md5
    @@md5
  end
  def self.md5=(newmd5)
    @@md5 = newmd5
  end
  attr_accessible #various database-entry attributes
end
I've tried various different ways to write my model, but it all comes down to this. Whatever I set the @@md5 in my model, becomes a permanent change, almost like a constant. If I change this value here, and refresh my database, the change is noted immediately. If I go into rails console and do:
Issue.md5 # => 5
Issue.md5 = 123 # => 123
Issue.md5 # => 123
But this change isn't committed to anything. As soon as I exit the console, things return to "5" again. It's almost like I need a .save method for my class.
Also, in the rake file, you see I have two print statements, printing out Issue.md5 before and after the parse. The first prints out "5" and the second prints out the new, correct hash. So Ruby is recognizing the fact that I'm changing this variable, it's just never saved anywhere.
Ruby 1.9.3, Rails 3.2.6, SQLite3 3.6.20.
tl;dr I need a way to create a class variable, and be able to access it, modify it, and re-store it.
Fixes please? Thanks!
There are a couple of solutions here. Essentially, you need to persist that one value somewhere outside the Ruby process, because a class variable only lives as long as the process that set it. Postgres provides a key/value store in the database (hstore), which would be ideal, but you're using SQLite so that isn't an option for you. Instead, you'll probably need to use either redis or memcached to persist this information alongside your database.
Either one lets you store values in a schema-less datastore and query them again later. Redis has the advantage of being able to save to disk, so if the server craps out on you, you can get the value of md5 again when it restarts. Data saved into memcached is never persisted, so if the memcached instance goes away, the stored value is gone and md5 will be 5 once again when it comes back.
Both redis and memcached enjoy a lot of support in the Ruby community. Installing one will complicate your stack slightly, but I think it's the best solution available to you. That said, if you just can't use either one, you could also write the value of md5 to a file on your server and read it again later. The issue there is that the value then won't be shared among server processes running on different machines.
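For the file-based fallback, a minimal sketch could look like the following. The helper name and the paths are hypothetical. Note also that Digest::MD5.hexdigest(path) hashes the path string itself, whereas Digest::MD5.file(path) hashes the file's contents, which is what a "did the file change?" check needs.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper: returns true (and records the new digest) only when the
# file's contents differ from the digest stored in state_path.
def changed_since_last_import?(csv_path, state_path)
  current  = Digest::MD5.file(csv_path).hexdigest   # hashes contents, not the path
  previous = File.exist?(state_path) ? File.read(state_path).strip : nil
  return false if current == previous

  File.write(state_path, current)
  true
end

Dir.mktmpdir do |dir|
  csv   = File.join(dir, 'issues.csv')
  state = File.join(dir, 'issues.md5')
  File.write(csv, "id,title\n1,first\n")

  puts changed_since_last_import?(csv, state)   # true  (nothing stored yet)
  puts changed_since_last_import?(csv, state)   # false (same contents)
end
```

In a real rake task it is probably safer to write the digest only after the import has finished successfully, so that a failed import gets retried on the next run.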
We recently lost a database and I want to recover the data from the production.log.
Every request is logged like this:
Processing ChamadosController#create (for XXX.XXX.XXX.40 at 2008-07-30 11:07:30) [POST]
Session ID: 74c865cefa0fdd96b4e4422497b828f9
Parameters: {"commit"=>"Gravar", "action"=>"create", "funcionario"=>"6" ... (all other parameters go here).
But some of the data that was written to the database was in the session. In the request I have the Session ID, and I also have all the session files from the server.
Is there any way I can, from this Session ID, open the session file and get its contents?
It's probably best to load the session file into a hash -- using the session-id as the key -- and then go through all the log files in chronological order, and parse out the relevant info for each session, and modify your database with it.
I guess you're starting out with an old database backup? Make sure to do this in a separate Rails environment -- e.g. don't do this in production; create and use a separate "recovery" environment / DB.
Think about some sanity checks you can run on the database afterwards, to make sure that the state of the records makes sense.
Going forward:
Make sure that you do regular backups (e.g. with mysqldump if you use MySQL).
Make sure to set up your database for master/slave replication.
Hope this helps -- good luck!
Have you tried using Marshal#load? I'm not sure how you're generating those session files, but it's quite possible Rails just uses Marshal.
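If the session files do turn out to be plain Marshal dumps, loading them into a hash keyed by session id could be sketched like this. The file layout is an assumption (one file per session, named after its id); real session files may use a different naming scheme or wrap the data differently, so load_sessions is only a hypothetical starting point.

```ruby
require 'tmpdir'

# Hypothetical helper: build a { session_id => session_data } hash from a
# directory of session files. Assumes each file is a plain Marshal dump named
# after its session id. Only use this on files you trust.
def load_sessions(dir)
  Dir.glob(File.join(dir, '*')).each_with_object({}) do |path, sessions|
    sessions[File.basename(path)] = Marshal.load(File.binread(path))
  end
end

Dir.mktmpdir do |dir|
  id = '74c865cefa0fdd96b4e4422497b828f9'
  # Fake session file so the sketch runs on its own.
  File.binwrite(File.join(dir, id), Marshal.dump({ 'user_id' => 6 }))

  sessions = load_sessions(dir)
  puts sessions[id]['user_id']   # 6
end
```

With that hash in memory, each logged request can be matched to its session data through the Session ID line in the log.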
A client had exactly the same problem a few weeks ago. I came up with the following solution:
Play back the latest backup you have (in our case it was one year old).
Write a small parser that moves all the requests from production.log into a temporary database (I chose MongoDB for that): I used a rake task and "eval" to create the hash.
Play back the data in the following order:
Play in the first create of an object, if it does not already exist.
Find the last update (by date) and play it back.
Here is the regex for scanning the production.log:
# Read the whole log in binary mode without leaving a file handle open.
contents = File.binread("location_of_your_production.log")
contents.scan(/(Started POST \"(.*?)\" for (.*?) at (.*?)\n.*?Parameters: \{(.*?)\}\n.*?Completed (.*?) in (.*?)ms)/m).each do |x|
  # x holds the captures: whole entry, path, IP, timestamp, parameters, status, duration.
  # now you can collect all the important data.
  # do the same for GET requests as well, if you need it.
end
In my case, the temporary database sped up the logfile parsing, so the steps noted above could be taken. Of course, everything that was not sent over production.log will be lost. Also, updates of the objects would send the whole information; it might be different in your case. I could also recreate the image uploads, since the images were sent base64 encoded in the production.log.
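For the older log format shown in the question ("Processing ... Session ID ... Parameters"), a parser that avoids eval could be sketched as below. The regex and the parse_log / parse_params helpers are hypothetical, and the parameter parsing assumes values are only strings or nested hashes (no nil, no arrays of symbols, no "=> sequences inside values).

```ruby
require 'json'

# Sample entry in the format shown in the question.
SAMPLE_LOG = <<~LOG
  Processing ChamadosController#create (for XXX.XXX.XXX.40 at 2008-07-30 11:07:30) [POST]
    Session ID: 74c865cefa0fdd96b4e4422497b828f9
    Parameters: {"commit"=>"Gravar", "action"=>"create", "funcionario"=>"6"}
LOG

# Captures: controller, action, timestamp, HTTP verb, session id, parameters.
ENTRY = /Processing (\w+)#(\w+) \(for \S+ at ([\d\- :]+)\) \[(\w+)\]\s+Session ID: (\h+)\s+Parameters: (\{.*\})/

# Turn the inspected Ruby hash into JSON instead of eval-ing it.
def parse_params(text)
  JSON.parse(text.gsub('"=>', '":'))
end

def parse_log(text)
  text.scan(ENTRY).map do |controller, action, time, verb, session_id, params|
    { controller: controller, action: action, time: time, verb: verb,
      session_id: session_id, params: parse_params(params) }
  end
end

entry = parse_log(SAMPLE_LOG).first
puts entry[:session_id]              # 74c865cefa0fdd96b4e4422497b828f9
puts entry[:params]['funcionario']   # 6
```

Avoiding eval matters here because the log contains whatever users submitted; treating it as data rather than code keeps the recovery script from executing anything unexpected.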
good luck!