ML.NET: MLContext random seed does not make it deterministic - f#

Update: the problem persists in Microsoft.ML 1.5.0.
I just implemented the first ML.NET API tutorial in F# and then initialised the MLContext like so:
let mlContext = MLContext (Nullable<int> 5)
I take it that this should cause the results to be the same between runs, but the results change. Am I doing something wrong?
The only other element I can find that involves randomness is TrainTestSplit, but even when I also set its seed in the same way, I still get changing results.

Related

Rails: How to write a model spec for this method?

I've just started on my first model spec task at work. After writing a lot of feature specs, I find it hard to switch to the different perspective of writing model specs (not taking the context into consideration). I'll take a method of the Order model as an example to explain the difficulties I am experiencing:
def update_order_prices
  self.shipping_price_cents = SHIPPING_PRICE_CENTS unless shipping_price_cents
  return if order_lines.empty?
  self.total_price_cents = calculate_order_price
  self.total_line_items_price_cents = calculate_total_order_line_price
  self.total_tax_cents = calculate_tax_amount
end
EDIT TL;DR
I would be perfectly happy with an answer that simply writes a spec for this method. The rest of the post shows what I have tried so far and is not necessary for answering the question.
First approach:
At first I didn't know what to test for. I tried to find out when and where the method was called, and to find a scenario in which I would know what values the attributes touched by this method should have. In short, I spent a lot of time trying to understand the context. Then a coworker told me that methods in model specs should be tested in a self-contained way, independent of the context; I should just make sure I identify all cases. For this method that would be:
it sets shipping price cents to default (if not done already)
it returns early if order_lines is empty
it sets values if order_line is set
Current approach:
I tried writing tests for these points, but questions still arise:
Test 1
it 'sets shipping price cents to default (if not done already)' do
  order.shipping_price_cents = nil
  order.update_order_prices
  expect(order.shipping_price_cents).to eq(Order::SHIPPING_PRICE_CENTS)
end
I am confident I got this one right, but feel free to prove me wrong. I set shipping_price_cents to nil to trigger the code that sets it, call the tested method, and expect the cents to equal the default value defined in the model.
Test 2
it 'returns early if order_lines is empty' do
  expect(order.update_order_prices).to eq(nil)
end
So here I want to test that the method returns early when there is no object in the order_lines association. I didn't have a clue how to do that, so I went into the console, took an order, removed the order_lines associated with it, and called the method to see what would be returned.
2.3.1 :011 > o.order_lines
=> #<ActiveRecord::Associations::CollectionProxy []>
2.3.1 :012 > o.update_order_prices
=> nil
Then I did the same for an order with an associated order_line:
2.3.1 :017 > o.update_order_prices
=> 1661
So I tested that nil is returned, but it doesn't feel like I am testing the right thing.
Test 3
it 'sets (the correct?) values if order_line is set' do
  order_line = create(:order_line, product: product)
  order = create(:order, order_lines: [order_line])
  order.update_order_prices
  expect(order.total_price_cents).to eq(order.calculate_order_price)
  expect(order.total_line_items_price_cents).to eq(order.calculate_total_order_line_price)
  expect(order.total_tax_cents).to eq(order.calculate_tax_amount)
end
I simply test that the attributes equal what the helper methods return, without using actual values, since I shouldn't look outside the method. If I wanted to test for an absolute value, I would have to look outside this function, and then I wouldn't be testing just the method but also the state of the Order object, etc.?
Running the tests
Failures:
1) Order Methods: #update_order_prices sets (the correct?) values if order_line is set
Failure/Error: expect(order.total_price_cents).to eq(order.calculate_order_price)
NoMethodError:
private method `calculate_order_price' called for #<Order:0x007ff9ee643df0>
Did you mean? update_order_prices
So the first two tests passed, but the third one didn't. At this point I feel a bit lost and would love to hear how experienced developers would write this seemingly simple test.
Thanks
I guess you have to spec against the exact values you are expecting after update_order_prices.
Let's say you set up your order and order lines to have a total price of 10 euros; then I'd add the following expectation:
expect(order.total_price_cents).to eq(1000)
The same goes for the other attributes. Generally, I try to test against specific values. Also, since these attributes are set from private methods, you only care about the result, not how it is computed.
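To make the "exact values" advice concrete, here is a minimal plain-Ruby sketch that runs without Rails or RSpec. `Order` and `OrderLine` are hypothetical stand-ins for the real models; the shipping constant, the 21% tax rate, and the bodies of the private helpers are invented for illustration, and only `update_order_prices` is copied from the question. The point is the shape of the test: build known inputs, call the public method, and compare each attribute to a hand-computed number. The private helpers are never called directly, which is also why the third spec in the question raised NoMethodError.

```ruby
# Hypothetical stand-ins: the real Order is an ActiveRecord model, and the
# constants and helper bodies below are invented for illustration.
OrderLine = Struct.new(:price_cents, :quantity)

class Order
  SHIPPING_PRICE_CENTS = 495 # assumed value
  TAX_PERCENT = 21           # assumed tax rate

  attr_accessor :shipping_price_cents, :total_price_cents,
                :total_line_items_price_cents, :total_tax_cents
  attr_reader :order_lines

  def initialize(order_lines = [])
    @order_lines = order_lines
  end

  # Same body as the method in the question.
  def update_order_prices
    self.shipping_price_cents = SHIPPING_PRICE_CENTS unless shipping_price_cents
    return if order_lines.empty?
    self.total_price_cents = calculate_order_price
    self.total_line_items_price_cents = calculate_total_order_line_price
    self.total_tax_cents = calculate_tax_amount
  end

  private

  def calculate_total_order_line_price
    order_lines.sum { |line| line.price_cents * line.quantity }
  end

  def calculate_tax_amount
    calculate_total_order_line_price * TAX_PERCENT / 100 # integer cents
  end

  def calculate_order_price
    calculate_total_order_line_price + shipping_price_cents
  end
end

# Known inputs in, hand-computed values out: 2 x 500 cents = 1000 cents of items.
order = Order.new([OrderLine.new(500, 2)])
order.update_order_prices
order.total_line_items_price_cents # => 1000
order.total_tax_cents              # => 210  (21% of 1000)
order.total_price_cents            # => 1495 (1000 + 495 shipping)
```

In an RSpec model spec the same checks would read `expect(order.total_price_cents).to eq(1495)` and so on, and the early-return case becomes an expectation that the totals stay nil for an order without lines.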

How does Rspec 'let' helper work with ActiveRecord?

It says here https://www.relishapp.com/rspec/rspec-core/v/3-5/docs/helper-methods/let-and-let that a variable defined by let changes across examples.
I've written the same simple test as in the docs, but with an AR model:
RSpec.describe Contact, type: :model do
  let(:contact) { FactoryGirl.create(:contact) }

  it "cached in the same example" do
    a = contact
    b = contact
    expect(a).to eq(b)
    expect(Contact.count).to eq(1)
  end

  it "not cached across examples" do
    a = contact
    expect(Contact.count).to eq(2)
  end
end
The first example passed, but the second failed (expected 2, got 1). So the contacts table is empty again before the second example, in spite of the docs.
I had been using let and was sure it has the same value in each it block, and my test seems to prove it. So I suppose I misunderstand the docs. Please explain.
P.S. I use DatabaseCleaner
P.P.S. I turned it off. Nothing changed.
EDIT
I turned off DatabaseCleaner and transactional fixtures, and the test passes.
As far as I understand (I'm new to programming), let is evaluated once for each it block. If I have three examples, each calling the contact variable, my test db will grow to three records by the end (I've tested it, and it does).
And for correct test behavior I should use DatabaseCleaner.
P.S. I use DatabaseCleaner
That's why your database is empty in the second example. It has nothing to do with let.
The behaviour you have shown is the correct behaviour. No example should depend on another example to set up the correct environment! If you did rely on caching, you would just be asking for trouble later down the line.
The example in that document is just trying to make a point about caching, using global variables. That is a completely different scenario from unit testing a Rails application, and it is not good practice to rely on previous examples having set something up.
Suppose, for example, that you then write 10 other tests that follow on from this, all of which rely on the previous examples having created objects. Then at some point in the future you delete one of those examples ... BOOM! Every test after that will suddenly fail.
Each test should be able to run in isolation from any other test!
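A toy model of what `let` does may help here. This is not RSpec's actual implementation, only a sketch of its documented behavior: `let` defines a memoized helper method, and every example runs in a fresh instance of the example group, so the memo never carries over from one example to the next. `ToyExampleGroup`, `ContactSpec`, and the `CREATED` array (standing in for the contacts table, with no DatabaseCleaner emptying it) are all hypothetical.

```ruby
# Toy imitation of RSpec's `let` (hypothetical; not RSpec's real code).
class ToyExampleGroup
  def self.let(name, &block)
    define_method(name) do
      @__memoized ||= {}
      # Evaluate the block at most once per instance, i.e. once per example.
      @__memoized.fetch(name) { @__memoized[name] = instance_exec(&block) }
    end
  end

  # Run each example in its own fresh instance, so memoized values start empty.
  def self.run(*examples)
    examples.map { |example| new.instance_exec(&example) }
  end
end

CREATED = [] # stands in for the contacts table; nothing cleans it between examples

class ContactSpec < ToyExampleGroup
  let(:contact) { CREATED << Object.new; CREATED.last }
end

results = ContactSpec.run(
  proc { a = contact; b = contact; [a.equal?(b), CREATED.size] }, # same example: one object
  proc { contact; CREATED.size }                                  # new example: block runs again
)
p results # => [[true, 1], 2]
```

Because nothing empties `CREATED` between the two examples, the second one sees 2 records, which matches what happened once DatabaseCleaner and transactional fixtures were turned off.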

Mongoid identity_map and memory usage, memory leaks

When I execute this query
Mymodel.all.each do |model|
  # ..do something
end
It uses a lot of memory, the amount of memory used keeps increasing the whole time, and in the end it crashes. I found out that to fix it I need to disable identity_map, but when I add identity_map_enabled: false to my mongoid.yml file I get this error:
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with repect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4; Mymodel.all.count => 3202400
How can I fix it, or does someone know another way to reduce the amount of memory used while executing .all.each?
Thank you very much for the help!
I started out just like you, looping through millions of records, and the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I went through many forum answers and tried them out.
1st attempt: I tried a combination of WeakRef and GC.start, but no luck.
2nd attempt: I added listing = nil to the first attempt, and it still failed.
Successful attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  # Exclusive range (...) so a listing created exactly on a boundary is processed only once
  @portal.listings.where(created_at: @start_date...@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
The memory allocated for the records is never released while the query request is still running. Therefore, querying a small number of records per request does the job, and memory stays in good condition because it is released after each request.
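The windowing loop from this answer can be pulled out into a plain-Ruby helper that only computes the date windows, so it runs without Rails. `month_windows` is a hypothetical name, and `Date#next_month` from Ruby's standard `date` library stands in for ActiveSupport's `next_month` on times. The windows are exclusive ranges (`...`) so that a timestamp falling exactly on a boundary lands in exactly one window; an inclusive range (`..`) would match it in two consecutive queries.

```ruby
require "date"

# Returns consecutive one-month windows [from, from.next_month) starting at
# start_date, stopping once a window would start at or after end_date.
def month_windows(start_date, end_date)
  windows = []
  from = start_date
  while from < end_date
    to = from.next_month
    windows << (from...to) # exclusive end: boundaries are not shared
    from = to
  end
  windows
end

month_windows(Date.new(2024, 1, 15), Date.new(2024, 4, 1)).each do |window|
  # Each window would be one small query, e.g. (hypothetically):
  #   @portal.listings.where(created_at: window).each { |listing| listing.do_something }
  puts "#{window.begin} ... #{window.end}"
end
```

Only one window's worth of records is loaded per query, which is what keeps memory bounded.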
Your problem isn't the identity map; I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection; that means Mongoid will try to create 3.2 million model instances before it starts iterating. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call into the database and returns a number, it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
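One way to page through the models is keyset pagination: fetch a bounded batch ordered by id, remember the last id, and ask for the next batch after it, so only one page of instances is alive at a time. The sketch below is plain Ruby; `each_in_pages` and the `fetch_page` callable are hypothetical, and the in-memory array stands in for the collection. With Mongoid, `fetch_page` might be built from a criteria like `Mymodel.where(:_id.gt => last_id).asc(:_id).limit(limit)`, but treat that as an assumption to verify against your Mongoid version.

```ruby
# Yields every record, loading at most `page_size` records per call to fetch_page.
# fetch_page.call(last_id, limit) must return the next records ordered by id,
# starting after last_id (or from the beginning when last_id is nil).
def each_in_pages(fetch_page, page_size: 1000)
  last_id = nil
  loop do
    page = fetch_page.call(last_id, page_size)
    break if page.empty?
    page.each { |record| yield record }
    last_id = page.last[:id] # resume after the last record seen
  end
end

# In-memory stand-in for the collection (hypothetical data).
collection = (1..10).map { |i| { id: i, name: "model #{i}" } }
fetch_page = lambda do |last_id, limit|
  collection.select { |doc| last_id.nil? || doc[:id] > last_id }.first(limit)
end

each_in_pages(fetch_page, page_size: 3) { |doc| puts doc[:name] }
```

Resuming after the last id, rather than skipping a growing offset, means the database does not have to walk past all earlier documents for every page.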

Display objects in collections one per line in the console

If I do User.where(active: false) in the Rails console, the result is hard to parse.
Is there any trick without iteration (besides gems) to output each value/object one per line?
Following up on my comment, I am adding an answer here so that other programmers can benefit.
You can use pp in the Rails console to get pretty-printed output, e.g.:
pp User.where(active: false)
When the collection is too wide to fit on one line, it will print each object on its own line.
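The effect can be reproduced outside Rails. This plain-Ruby snippet uses a hypothetical `User` Struct in place of ActiveRecord rows. `PP.pp(obj, out, width)` is the method behind `pp`; passing the width explicitly keeps the wrapping predictable, since newer Ruby versions may otherwise size the output to the terminal.

```ruby
require "pp"
require "stringio"

# Hypothetical stand-in for the records returned by User.where(active: false).
User = Struct.new(:name, :email, :active)

users = [
  User.new("alice", "alice@example.com", false),
  User.new("bob", "bob@example.com", false),
  User.new("carol", "carol@example.com", false)
]

# Same as `pp users`, but with the width fixed at 79 columns. The whole array
# does not fit in 79 columns, so each struct is printed on its own line.
PP.pp(users, $stdout, 79)
```

If the collection is short enough to fit within the width, pp keeps it on a single line.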

How to test random unique values with RSpec

I have this code:
def self.generate_random_uniq_code
  # random_number(n) returns 0...n, so 1_000_000 covers every code from 000000 to 999999
  code = sprintf("%06d", SecureRandom.random_number(1_000_000))
  code = self.generate_random_uniq_code if self.where(code: code).count > 0
  code
end
The goal is to create a random code for each new record; the code must not already exist in any record.
I'm trying to test it this way, but when I mock SecureRandom it always returns the same value:
it "code is unique" do
  old_code = Code.new
  old_code.code = 111111
  new_code = Code.new
  expect(SecureRandom).to receive(:random_number) { old_code.code }
  new_code.code = Code.generate_random_uniq_code
  expect(new_code.code).to_not eq old_code.code
end
I tried to find a way to enable and disable the mock behavior, but I could not find one. I'm not sure I'm testing this the right way; the code doesn't seem to work correctly to me.
Any help is welcome, thanks!
TL;DR
Generally, unless you are actually testing a PRNG that you wrote, you're probably testing the wrong behavior. Consider what behavior you're actually trying to test, and examine your alternatives. In addition, a six-digit number doesn't really have enough of a key space to ensure real randomness for most purposes, so you may want to consider something more robust.
Some Alternatives
One should always test behavior, rather than implementation. Here are some alternatives to consider:
Use a UUID instead of a six-digit number. The UUID is statistically less likely to encounter collisions than your current solution.
Enforce uniqueness on your database column by adjusting the schema (e.g. with a unique index).
Use a Rails uniqueness validator in your model.
Use FactoryGirl sequences or lambdas to return values for your test.
Fix Your Spec
If you really insist on testing this piece of code, you should at least use the correct expectations. For example:
# This won't do anything useful, if it even runs.
expect(new_code.code).to_not old_code.code
Instead, you should negate the eq matcher, with something like this:
old_code = "111111" # generate_random_uniq_code returns a zero-padded String
new_code = Code.generate_random_uniq_code
expect(new_code).not_to eq old_code
Your code may be broken in other ways (e.g. the code variable in your method doesn't seem to be an instance or class variable), so I won't guarantee that the above will work, but it should at least point you in the right direction.
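If you do want a deterministic test of the retry logic, one option is to inject the random source instead of stubbing `SecureRandom` globally. This is a sketch, not the question's method: `generate_unique_code`, `FakeRandom`, and the `taken` callable (standing in for a query such as `Code.where(code: code).exists?`) are hypothetical names, and it retries in a loop with an attempt limit rather than recursing. The fake returns a colliding number first and a free one second. In RSpec, the stub-based equivalent is `and_return` with several values, e.g. `allow(SecureRandom).to receive(:random_number).and_return(111111, 222222)`, which returns them on consecutive calls.

```ruby
require "securerandom"

# Returns a zero-padded six-digit code for which `taken` returns false.
# `random` is injectable so a test can supply a fixed sequence; it defaults to SecureRandom.
def generate_unique_code(taken:, random: SecureRandom, max_attempts: 100)
  max_attempts.times do
    # random_number(1_000_000) yields 0..999_999, i.e. every six-digit code.
    code = format("%06d", random.random_number(1_000_000))
    return code unless taken.call(code)
  end
  raise "no free code found after #{max_attempts} attempts"
end

# Fake random source: returns predetermined numbers in order (hypothetical helper).
class FakeRandom
  def initialize(*numbers)
    @numbers = numbers
  end

  def random_number(_limit)
    @numbers.shift
  end
end

existing = ["111111"]
generate_unique_code(taken: ->(code) { existing.include?(code) },
                     random: FakeRandom.new(111_111, 222_222))
# => "222222": the first draw collides with an existing code, so it retries
```

The attempt limit turns an exhausted code space into a clear error instead of unbounded recursion.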

Resources