Suppose we have some arbitrary Active Record object
obj = User.first
Is there a way to convert this into a text representation?
That is, is there a way to convert the object into some code that can be dropped into a completely different rails console to regenerate that same object?
The closest example I can give of this functionality is the dput() function from the R programming language. Is there an equivalent in ruby / rails, preferably one that works with Active Record objects?
Ruby has the Marshal module:
The marshaling library converts collections of Ruby objects into a
byte stream, allowing them to be stored outside the currently active
script. This data may subsequently be read and the original objects
reconstituted.
str = Marshal.dump(obj)
# => "\x04\bo:\nThing\x1A:\x10@new_recordF:\x10@attributeso:\x1EActiveModel::AttributeSet\x06;\a{\tI\"\aid\x06:\x06ETo:)ActiveModel::Attribute::FromDatabase\n:\n@name@\b:\x1C@value_before_type_casti\x06:\n@typeo:EActiveRecord::ConnectionAdapters::SQLite3Adapter::SQLite3Integer\t:\x0F@precision0:
You can then load the object back into memory:
restored_obj = Marshal.load(
StringIO.new(str) # usually this would be from an IO stream like a file
)
It has some pretty serious security implications, though, if you're accepting user input, so other serialization formats like JSON or YAML should be considered. There are also issues if you use it for caching and then change Ruby versions.
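As a minimal sketch of that trade-off, assuming a plain Struct named Thing as a hypothetical stand-in for a model (no Rails or Active Record involved): Marshal restores an equal object in one call, while JSON only carries plain data, so the object has to be rebuilt explicitly.

```ruby
require "json"

# Hypothetical stand-in for a model; a Struct keeps the sketch free of Rails.
Thing = Struct.new(:id, :name)

thing = Thing.new(1, "widget")

# Marshal round trip: restores an equal Thing, but only load data you trust.
marshaled = Marshal.dump(thing)
from_marshal = Marshal.load(marshaled)

# JSON round trip: only plain data crosses the boundary,
# so the object is rebuilt from the parsed attributes.
json = JSON.generate(thing.to_h)
attrs = JSON.parse(json)
from_json = Thing.new(attrs["id"], attrs["name"])
```

The JSON version is more verbose, but the payload is readable and loading it cannot instantiate arbitrary classes.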
Rails models in recent versions also support Global ID - which doesn't give you the exact same object but it gives you a URI which can be used to load the same record from the database.
gid = User.first.to_global_id
obj = GlobalID::Locator.locate(gid)
This is how ActiveJob passes around references to models.
From the Best Practices Guide to using Sidekiq, I understand it's best to pass "string, integer, float, boolean, null(nil), array and hash" as arguments to the job.
I often just pass the id of a persisted object to my jobs, but due to latency constraints I need to save the object after running the job.
The non-persisted object I'm working with contains a mixture of data types:
#MyObject<00x000>{
id: nil
start_time: Fri, 11 Dec 2020 08:45:00 PST -08:00 (*this is a TimeWithZone object)
rate: 18.0 (*this is a BigDecimal object)
...
}
I plan to pass this object to my job by converting it to a hash first:
MyJob.perform_async(my_object.attributes)
and then later persist the object like so:
MyObject.new(my_object_hash).save
My question is, is this safe? Even though I am passing a 'simple' datatype to Sidekiq, it actually contains complex objects. Am I going to lose precision?
Thank you!
This sounds like a "potayto, potahto" solution. You are not using Sidekiq's serialisation, but instead serializing it yourself.
Let's have a look at why sidekiq has this rule:
Even if they did serialize correctly, what happens if your queue backs up and that quote object changes in the meantime? [...]
Don't pass symbols, named parameters, keyword arguments or complex Ruby objects (like Date or Time!) as those will not survive the dump/load round trip correctly.
I like to add a third:
Serializing state makes it impossible to distinguish between persisted and ethereal (in-memory, memoized, lazy-loaded etc) data. E.g. a def sent_mails; @sent_mails ||= Mail.for(user_id: id); end now gets serialized: do you want that?
The solution is also provided by sidekiq:
Don't save state to Sidekiq, save simple identifiers. Look up the objects once you actually need them in your perform method.
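The round-trip warning quoted above can be checked in plain Ruby. This is a sketch using only the standard json library (not Sidekiq itself, and without Active Support loaded, which changes how Time is encoded); the argument names are hypothetical.

```ruby
require "json"

# Hypothetical job arguments containing a symbol and a Time.
args = { status: :pending, at: Time.utc(2020, 12, 11, 16, 45, 0) }

# Dump to JSON and load back, as a queue would.
round_tripped = JSON.parse(JSON.generate(args))

# Symbol keys, the symbol value and the Time all come back as Strings.
status_after = round_tripped["status"]
at_after = round_tripped["at"]
```

After the round trip the job receives strings where it might expect a Symbol or a Time, which is why simple identifiers are the safer argument.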
The XY problem here
Your real problem is not where or how to serialize state. Because sidekiq warns against serializing state regardless of where and how you do this.
The problem you need to solve is either how to store state somewhere it can be stored properly, or how to avoid storing the state at all: not in redis/sidekiq, nor in the storage that is giving you problems.
Latency
Is your storage slow? Is it not a validation, a serialisation, some side-effect of storage that is slow?
Can you improve this by making it a two-step: insert the state and update/enrich/validate it async later? If you are using Rails, it won't help you here, or might even work against you, but a common model is to store objects in a special "queue" table or events queue; e.g. kafka is famous for this.
When e.g. storage happens over a slow network to a slow API, this is probably unsolvable, but when storage happens in a local database, there are decades of solutions to improve write performance here that you can use. Both inside your database, or with some specialised queue for state-storage (sidekiq is not such a specialised storage queue) depending on the tech used to store. E.g. Linux will allow you to store through memory, making writes to disk really quick, but removing the guarantee that it was really written to disk.
E.g. In a bookkeeping api, we would store the validated object in PostgreSQL and then have async jobs add expensive attributes to this later (e.g. state that had to be retrieved from legacy APIs or through complex calculations).
E.g. in a write-heavy GIS system, we would store objects into a "to_process_places" table, that was monitored by tooling which processes the Places. It all really depends on your domain, and requirements.
Not using state.
A common solution is not to make objects, but use the actual payload by the customer. Just send the HTTP payload (in rails, the params) along and leave it at that. Maybe merge in a header (like the Request Date) or filter out some data (header tokens or cookies).
If your controller can operate with this data, so can a delayed job. Instead of building objects in the controller, leave that to the delayed job. This can even result in really neat and lean controllers: all they do is (some authentication and authorization and then) call the proper job and pass it a sanitized params.
Obviously this requires trade-offs like not being able to validate in-sync, but to give such info over email, push-notification, or delayed response instead, depending on your requirements (e.g. a large CSV import could just email any validation issues, but a login request might need to get immediate response if the login is invalid).
It also requires some thought: you probably don't want to send the Base64 encoded CSV along to sidekiq, but instead write the file to a (temp) storage and pass the filename/url along instead. This might sound obvious, because it is: file uploads are essentially an implementation of the earlier mentioned "temporary state storage": you don't pass the entire PDF/high-res-header-image/CSV along to sidekiq, but store it somewhere so sidekiq can pick it up later to process it. Why should the other attributes not employ the same pattern if passing them along to sidekiq is problematic?
The most important part from the best practices you linked is
Complex Ruby objects do not convert to JSON
Therefore you're not supposed to pass instances of a model to a worker.
If you're using Sidekiq workers, you should comply with this statement and the hash you're passing should be just fine. I am not exactly sure about the TimeWithZone object, but you could try converting this to a JSON or to a string as they do in the best practices guide.
However, if you're using ActiveJob instead of Sidekiq workers (does your Job inherit from ApplicationJob or does it include Sidekiq::Worker ?), then you don't have that problem, because ActiveJob uses Global ID to convert objects into a String and then deserializes the object again before performing the job. Meaning you can pass an object to your job.
my_object = MyObject.find(1)
my_object.to_global_id #=> #<GlobalID:0x000045432da2344 [...] gid://your_app_name/MyObject/1>>
serialized_my_object = my_object.to_global_id.to_s
my_object = GlobalID.find(serialized_my_object)
You can find more information here
https://github.com/toptal/active-job-style-guide#active-record-models-as-arguments
After doing some experimentation on the Time objects in my job, I found that I am losing nanosecond precision at the other end of the job.
my_object.start_time
=> Mon, 21 Dec 2020 11:35:50 PST -08:00
my_object.start_time.strftime('%Y-%m-%d %H:%M:%S.%N')
=> "2020-12-21 11:35:50.151893000"
You can see here, we have precision including 6 digits after the decimal.
(see this answer for more about 'strftime')
Once we call JSON methods on the object:
generated = JSON.generate(my_object.attributes)
=> \"start_time\":\"2020-12-21T11:35:50.151-08:00\"
You can see here we are down to 3 digits of precision after the decimal. The remaining 3 digits are lost at this point.
parsed = JSON.parse(generated)
parsed['start_time']
=> "2020-12-21T11:35:50.151-08:00"
It appears at the most basic level, the JSON library recursively calls as_json on each of the key-value pairs in the hash. So really it depends on how your particular object implements as_json.
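The truncation can be reproduced without Rails. The sketch below uses Ruby's own Time#iso8601 with 3 and with 6 fractional digits; three digits matches the output shown above. The timestamp is the one from the example, converted to UTC as a simplifying assumption.

```ruby
require "time"

# Microsecond-precision timestamp: the instant from the example, in UTC.
original = Time.utc(2020, 12, 21, 19, 35, 50, 151_893)

# Serialise with 3 and with 6 fractional digits.
as_millis = original.iso8601(3)
as_micros = original.iso8601(6)

# Parse both strings back into Time objects.
from_millis = Time.iso8601(as_millis)
from_micros = Time.iso8601(as_micros)

# Microseconds dropped by the 3-digit format.
lost_usec = original.usec - from_millis.usec
```

With six digits the parsed time compares equal to the original; with three it does not, which is the misalignment described below.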
This issue caused test failures that involved querying our db for persisted objects (initialized with something like, start_time = Time.zone.now (!)) that are meant to overlap in time exactly with our MyObject class. Once the half-baked my_object blueprints made it through Sidekiq, they lost a sliver of precision, causing a slight misalignment.
One way to hack away at this issue is by monkey patching the Time class.
In our case, a better solution was to go in the opposite direction and to not use so much precision in our tests. The my_object in the example is something that a human user will have on their calendar; in production we never receive so much precision from clients. So instead we fixed our tests by instructing some of our test objects to use something like Time.zone.now.beginning_of_minute, rather than Time.zone.now. We intentionally removed precision to fix the issue, as well as more closely mirror reality.
I know this question has been asked before, but most answers I've found are related to ActiveRecord or old (most cases, both) and I was wondering whether there's a new take on this.
In short, my Rails app is an API, so please keep this in mind (I can't normally use lots of helpful little view-related helpers).
I've been reading about this and found MoneyRails, which seems quite neat. The problem I'm having with it is that when I retrieve the data, it returns an object instead of a usable value:
class MyModel
include Mongoid::Document
...
field :price_GBP, type: Money
...
end
So to create the document I send a number and it creates the document fine. Now when I query the same document it returns an object for price_GBP, which is fine, but my main gripe is that it returns the fractional value, as in my_obj.price_GBP[:fractional], as a string rather than a number.
I'd rather not have my client do the conversion from string to number and then to decimal.
I guess I could create a little helper that would convert the value in such circumstances like so (in my Model):
def my_helper
self.price_GBP = BigDecimal(self.price_GBP) # or something along those lines
end
Then in my controller:
@my_model = Model.find(id)
@my_model.price_GBP = @my_model.my_helper
render json: @my_model
With the above in mind, would this be the best approach? If yes, what's the point of using the MoneyRails gem then?
Secondly, if not using the MoneyRails gem, should I use BigDecimal or Float as the field type?
When I tried BigDecimal, the data was saved OK, but when I retrieved it, I got a string rather than a number. Is this the correct behaviour?
When I tried Float it all worked fine, but I read somewhere that Float is not the most accurate.
What are your thoughts?
Avoid using Float if you're planning on performing any type of arithmetic on the currency values. BigDecimal is good or you can also represent the value in cents and store it as an Integer. This is actually how the Money gem works.
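A small sketch of why Float is risky for money and how the integer approach avoids it. The format_pence helper is hypothetical (it is not part of the Money gem) and assumes non-negative amounts.

```ruby
# Binary floats cannot represent 0.10 or 0.20 exactly, so the sum drifts.
float_total = 0.1 + 0.2
float_matches = (float_total == 0.3)

# The same amounts held as integer pence add up exactly.
pence_total = 10 + 20

# Hypothetical display helper; assumes a non-negative amount in pence.
def format_pence(pence)
  format("%d.%02d", pence / 100, pence % 100)
end

display = format_pence(pence_total)
```

Arithmetic stays in integers and the decimal point only appears when formatting for output.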
My recommendation would be to continue to use the MoneyRails gem and use the built-in helpers to output the values. You mentioned not being able to use the helpers but I don't see what's preventing that - Rails includes jbuilder which allows you to formulate your JSON structure in a view with access to all "helpful little view related helpers" - for example
# app/views/someresource/show.json.jbuilder
# ...other attributes
json.price_GBP humanized_money(@my_model.price_GBP)
Yet another ruby question but this is a bunch of questions in one. I'm really starting to like rails but there are some questions that I'd just like to ask straight out.
Right now, I'm implementing a queue in sqlite. I already have a scaffold setup with this working OK. The purpose is for a web crawler to read through the array and determine which links he should crawl next.
The architecture of the program is 2 controllers, one for Job and one for Crawler. Job has the standard CRUD interface supplied by scaffold. Where I'm falling down is that I'm still trying to understand how these things communicate with each other.
The Job is formatted as a url:string and depth:decimal. The table is already populated with about 4 objects.
@sitesToCrawl = Job.all
@sitesToCrawl.each {|x|puts Job.url}
I have a bunch of questions about the above.
At the moment, this was supposed to display all the jobs, and I foolishly thought it would display plain text, but it's actually a hexadecimal pointer to the object itself. What I'm trying to do is iterate through @sitesToCrawl and put out each Job's url.
Questions start here:
1: I know Ruby is dynamically typed. Will @sitesToCrawl become an array like I want it to be, with each slot containing a job?
2: @sitesToCrawl.each is pretty straightforward and I'm assuming it's an iterator.
Is x the name of the method, or what is the purpose of the symbol or string between |*|?
3: puts and print are more or less the same, yes? If I say @x = puts 3, would @x be 3?
4: Job.url. Can objects be referenced this way or should I be using
@sitesToCrawl = db.execute("SELECT url FROM jobs;")
where db is a new database
As Rubish Gupta pointed out, in your block, you should do x.url, otherwise you're trying to access the url method on the class Job, not on instances of Job. In other words, in blocks, the items in the pipes are the arguments of the block, and each will iterate through your array, passing in one item at a time to your block. Check out the doc here.
Just to extend this idea, each on Hashes (associative arrays, maps, whatever you know them as) will pass two variables to your block: a key and a value, like this:
a_hash.each {|key_var, val_var| puts "#{key_var} is associated with #{val_var}"}
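Putting the two forms side by side, as a sketch with a Struct as a hypothetical stand-in for the Job model and made-up URLs (no database involved):

```ruby
# Hypothetical stand-in for the ActiveRecord Job model.
Job = Struct.new(:url, :depth)

jobs = [Job.new("http://example.com", 1), Job.new("http://example.org", 2)]

# Array#each passes one element per iteration; call url on that instance.
urls = []
jobs.each { |job| urls << job.url }

# Hash#each passes a key and a value per iteration.
depth_by_url = { "http://example.com" => 1, "http://example.org" => 2 }
lines = []
depth_by_url.each { |url, depth| lines << "#{url} is associated with #{depth}" }
```

The name between the pipes is just a local variable for the current element, so |x|, |job| or any other name works the same way.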
Also, it's been a bit since I've done plain ActiveRecord models, but you might look into doing
@sitesToCrawl = Job.all.to_a
since Job.all is a lazy finder in that it's building a query in potentia: you've essentially built a query string saying SELECT * FROM jobs, but it might not be executed until you try to access the items. each might do that, I can't remember off the top of my head, but if you're using a debugger to look at it, I know you need to_a to get it to run the query.
You should absolutely be using job_instance.url - that's the beauty of ActiveRecord, it makes database access easy, provided everything gets set up right :)
Finally, puts and print are almost the same. The difference is that puts "string" is essentially print "string\n": it appends a newline unless the string already ends with one. Neither returns what it printed; both return nil.
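For question 3, a quick check of what puts and print write and what they return. The sketch writes to a StringIO instead of the console so the output can be inspected.

```ruby
require "stringio"

# Capture output in memory instead of writing to the console.
out = StringIO.new

puts_result = out.puts("3")    # writes "3" plus a newline
print_result = out.print("3")  # writes "3" with no newline

written = out.string
```

Both calls return nil, so assigning the result of puts 3 to a variable leaves it nil rather than 3.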
I'm working on a caching layer in my Rails app and I'm having trouble caching original DataMapper objects. They seem to have a lot of stuff attached that makes marshaling fail (I get an error about Marshal being unable to serialize a Proc object).
So I am considering writing my own pre-serialization and post-deserialization methods for caching. Specifically I will turn the DataMapper object into a list of tuples with this:
o = Foo.get(1234)
as_list = o.model.properties.map { |p| [p.name, o.send(p.name)] }
And then cache that list.
My question is: How do I reconstruct the DataMapper object in a way that allows me to use it as if it were constructed by a normal DataMapper query?
My naive approach of Foo.new(foo=bar, goo=baz) doesn't seem to connect it up with all of the foreign keys and stuff.
After some "fun" code-spelunking I seem to have found something that works:
mc.set(key, HashWithIndifferentAccess[o.attributes])
as_hash = mc.get(key)
from_cache = Foo.load([as_hash], Foo.all.query).first
The load method on the model seems to be what get uses and the query seems to be required in order to get the repository names and a few other things.
I am using redis as my web cache, and I want to store ActiveRecord objects in redis directly, but using redis-rb I get an error.
It seems that I can't serialize them. Is there a lib to do this for me?
Do I have to serialize them to JSON format?
Which serialization format would be the most efficient?
Redis stores strings (and a few other data structures of strings); so you can serialize into Redis values however you like so long as you end up with a string.
JSON is probably the best place to start as it's lean, not overly brittle, works well with live upgrade patterns, and is readable in situ. Later you can add more complexity to meet your goals as needed, e.g., compression. #to_json and #from_json are already on ActiveRecord if you want to use JSON (with YAJL or its ilk that shouldn't be excessively slow, relatively speaking.) #to_xml is also there, if you're into S&M.
Raw marshaling can also work, but occasionally goes horrifically wrong (I've had marshaled objects exceed 2MB after LZO compression that were only a few K in JSON.)
If it's really a bottleneck for you, you'll want to run your own efficiency tests for your goal(s), e.g., write speed, read speed, or storage size, with your own objects and data patterns.
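A starting point for such a comparison might look like the sketch below. It uses a plain Hash of made-up attributes and another Hash as a stand-in for the Redis connection (both are assumptions for illustration), and it only measures payload size; timing would need your real objects.

```ruby
require "json"

# Made-up attributes standing in for a model's attributes hash.
attributes = { "id" => 1, "name" => "Alice", "email" => "alice@example.com" }

# Both formats end up as Strings, which is all Redis needs.
json_payload = JSON.generate(attributes)
marshal_payload = Marshal.dump(attributes)

# Compare storage size in bytes for this particular payload.
sizes = { json: json_payload.bytesize, marshal: marshal_payload.bytesize }

# A Hash standing in for the Redis connection in this sketch.
fake_store = {}
fake_store["user:1"] = json_payload
restored = JSON.parse(fake_store["user:1"])
```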
You can convert your model to a hash using attributes method and then save it with mapped_hmset
def redis_set()
redis.mapped_hmset("namespace:modelName:#{self.id}", self.attributes)
end
def redis_get(id)
redis.hgetall("namespace:modelName:#{id}")
end
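One thing to keep in mind with the hash approach: Redis hash fields and values are strings, so what hgetall returns is not type-identical to attributes. Below is a sketch of casting back, using a plain Hash to simulate the stored form (no Redis connection, and the attribute names are hypothetical).

```ruby
# Hypothetical attributes as a model might expose them.
attributes = { "id" => 42, "name" => "Alice", "admin" => true }

# Simulates what comes back from a Redis hash: every value is a String.
stored = attributes.to_h { |key, value| [key, value.to_s] }

# Cast each field back to the type the application expects.
restored = {
  "id" => Integer(stored["id"]),
  "name" => stored["name"],
  "admin" => stored["admin"] == "true"
}
```

If the casting logic grows, serializing the whole attributes hash as one JSON string may be simpler than one Redis hash field per attribute.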
def self.set(friend_list, player_id)
redis.set("friend_list_#{player_id}", Marshal.dump(friend_list)) == 'OK' ? friend_list : nil
end
def self.get(player_id)
friend_list = redis.get("friend_list_#{player_id}")
Marshal.load(friend_list) if friend_list
end