How to show partially evaluated method source in Ruby? - ruby-on-rails

Bear with me, this is an off-beat question.
What I want to achieve is being able to log how a calculation was performed in a method.
Let's say we have this simple method inside a model. It performs a calculation and returns a result. Awesome.
def calc_reduction_factor
  (numerator + 3) / denominator
end
The issue I am having is that the auditing requires all the calculations to be logged (when the report is produced, not when the function is run). So, for this example, with numbers:
puts "(#{model.numenator} + 3) / #{model.denominator} = #{model.calc_reduction_factor}"
>> (5 + 3) / 2 = 4
I am lazy and sometimes forgetful. If the function requires a change, the logging would have to be changed too. This is a small example, but in the future there might be hundreds of different calc methods.
I am looking for a solution that will be able to print out the source of the function and also do the math. The closest I found was the dentaku library. I could store the formula strings in a repo and then either print them for logging or evaluate them for the calculation. But it looks like it hasn't been updated in a while and is a little slow.
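To make it concrete, here is the rough direction I'm imagining with dentaku (the FORMULAS registry and helper names below are just illustrative, not existing code):

require 'dentaku'

# Keep each formula as a plain string: the single source of truth.
FORMULAS = {
  calc_reduction_factor: '(numerator + 3) / denominator'
}.freeze

# Evaluate the stored formula with dentaku for the actual calculation.
def evaluate_formula(name, vars)
  Dentaku::Calculator.new.evaluate(FORMULAS.fetch(name), vars)
end

# Build the audit line by substituting the same values into the same string.
def audit_line(name, vars)
  filled = vars.reduce(FORMULAS.fetch(name)) { |s, (k, v)| s.gsub(k.to_s, v.to_s) }
  "#{filled} = #{evaluate_formula(name, vars)}"
end

puts audit_line(:calc_reduction_factor, numerator: 5, denominator: 2)
# prints something like: (5 + 3) / 2 = 4

That way the formula string drives both the calculation and the log line, so a change to the formula can't silently leave the logging behind.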

Related

Solving a recurrence using master method when g(n)=log(n)

I'm trying to solve the recurrence f(n) = 2f(n/2) + log n when f(1) = 1 and n is a power of 2. I think that I should be able to do this using the master method. I've seen this before, but never with log. Can I get some help getting started, please?
One trick that’s often useful here is to replace the log n term with something that grows strictly faster or slower and to see what you get. For example, your recurrence is bounded from above and below, respectively, by these recurrences:
A(n) = 2A(n / 2) + √n.
B(n) = 2B(n / 2) + 1.
What do these solve to? What does that tell you about your recurrence?
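For reference, a sketch of where those hints lead, using the master theorem with $a = b = 2$ (so $n^{\log_b a} = n$):

\begin{align*}
A(n) &= 2A(n/2) + \sqrt{n}, & \sqrt{n} &= O(n^{1-\varepsilon}) \Rightarrow A(n) = \Theta(n) \text{ (case 1)},\\
B(n) &= 2B(n/2) + 1, & 1 &= O(n^{1-\varepsilon}) \Rightarrow B(n) = \Theta(n) \text{ (case 1)}.
\end{align*}

Since $\log n = \Omega(1)$ and $\log n = O(\sqrt{n})$, the original recurrence is squeezed between the two, giving $f(n) = \Theta(n)$.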

Ruby: reverse calculation or spreadsheet function

I've got a module that calculates about 150-200 values. Once it's done, I want the ability to edit one or some of the results—which may or may not be one of the original, non-calculated values—and have the other results update, like the functionality of a spreadsheet.
The problem is I really don't know where to start. The module's code looks largely like this:
if @user.override > 0
  h[:floor] += (@user.override / h[:size]) * 0.03
end
if @user.other_override > 0
  h[:floor] += (@user.other_override / h[:size]) * 0.03
end
And it is quite chronological, which makes this task even tougher.
Is there any approach here that'll work? I can barely wrap my head around how it might, other than to leverage an actual spreadsheet into my app.
What you are looking for is called "Reactive Programming". Many languages have an implementation of the ReactiveX framework, and so does Ruby: RxRuby.
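For illustration, here is a minimal sketch of that idea in plain Ruby rather than RxRuby (the Sheet class and cell names are invented for this example): every result is a formula over other cells and is recomputed when read, so editing any input automatically updates every dependent value.

class Sheet
  def initialize
    @cells = {}
  end

  # A cell is either a stored value or a formula (a block reading other cells).
  def set(name, value = nil, &formula)
    @cells[name] = formula || value
  end

  def [](name)
    cell = @cells.fetch(name)
    cell.respond_to?(:call) ? cell.call(self) : cell
  end
end

sheet = Sheet.new
sheet.set(:override, 120.0)
sheet.set(:size, 40.0)
sheet.set(:floor) { |s| (s[:override] / s[:size]) * 0.03 }

puts sheet[:floor]   # roughly 0.09
sheet.set(:override, 200.0)
puts sheet[:floor]   # roughly 0.15

RxRuby (or any dependency-tracking/caching layer) builds on the same move: express the calculations as an explicit dependency graph instead of a chronological script.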

How to call a scenario several times without tableized items in a behave test?

I'd like to call the scenario (let's say 500 times) in a Gherkin test without tableized items. The reason is that I'd like to use randomized variables instead of values written by myself.
I know how to implement random functionality in the tests, but it is called just once.
For example:
Scenario Outline: I want to test speed with different values
  When I set the speed to <speed>
  And I wait for 5 seconds
  Then it plays at <speed>

  Examples:
    | speed |
    | 10    |
    | 20    |
    | 30    |
    | 40    |
    | 50    |
import random

speeds = ['10', '20', '30', '40', '50']

def next_speed():
    return random.choice(speeds)
If I use random functionality like this, how can I call this scenario 500 times?
Thanks in advance.
I've thought about this one on and off for the last couple of years, while I've been using Behave. I use it to drive a functional test environment for critical communications radios, and also the environment they're in (play / record audio, do some deep learning on the WAV files to confirm content, work with their UI and navigate around to create messages, stuff like that).
I've had the need to try to loop things before, outside of a single line with a supplied table of data.
While I was investigating some of the recesses of the Context object, I delved into the interpreter, and I'm wondering if I can manage to get it to implement something like this:
When I loop i 5 times
And I perform this action
And I perform another action
Then I should see this result
And loop back to i
This won't break the gherkin syntax, and provided I can implement something in the loop steps which will rewind the parser somehow, it should run them again. The hard part would be ensuring the results of all the steps are preserved, I suspect: I'd need to delve into the structures used for storing the results, so that the outputs would show all the iterations.
Has anyone else looked into implementing this into Behave via defined steps?
from __future__ import print_function
import functools
from behave.model import ScenarioOutline


def patch_scenario_with_autoretry(scenario, max_attempts=3):
    """Monkey-patches :func:`~behave.model.Scenario.run()` to auto-retry a
    scenario that fails. The scenario is retried a number of times
    before its failure is accepted.

    This is helpful when the test infrastructure (server/network environment)
    is unreliable (which should be a rare case).

    :param scenario:      Scenario or ScenarioOutline to patch.
    :param max_attempts:  How many times the scenario can be run.
    """
    def scenario_run_with_retries(scenario_run, *args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            if not scenario_run(*args, **kwargs):
                if attempt > 1:
                    message = u"AUTO-RETRY SCENARIO PASSED (after {0} attempts)"
                    print(message.format(attempt))
                return False    # -- NOT-FAILED = PASSED
            # -- SCENARIO FAILED:
            if attempt < max_attempts:
                print(u"AUTO-RETRY SCENARIO (attempt {0})".format(attempt))
        message = u"AUTO-RETRY SCENARIO FAILED (after {0} attempts)"
        print(message.format(max_attempts))
        return True

    if isinstance(scenario, ScenarioOutline):
        scenario_outline = scenario
        for scenario in scenario_outline.scenarios:
            scenario_run = scenario.run
            scenario.run = functools.partial(scenario_run_with_retries, scenario_run)
    else:
        scenario_run = scenario.run
        scenario.run = functools.partial(scenario_run_with_retries, scenario_run)
Reference: https://github.com/behave/behave/blob/master/behave/contrib/scenario_autoretry.py
If you try to use gherkin as a scripting tool, you're gonna have a bad time. There are much better tools for that, like Python itself or Robot Framework. Ask yourself what advantage you expect from your gherkin test. Your gherkin should answer 'why' you are doing something; it should have examples that sufficiently explain the different cases - and preferably only the interesting ones.
You need to add rows dynamically each time for the test. They will never show up in the feature file, but rows do need to be added.
The following links have a few functions that will create dynamic rows for you in the step definition:
http://www.programcreek.com/java-api-examples/index.php?api=gherkin.formatter.model.DataTableRow
Or call a step definition from another step definition:
http://www.specflow.org/documentation/Calling-Steps-from-Step-Definitions/
Hopefully I understand your question :)
What about this:
Change the step "When I set the speed to <speed>" to
"When I set the speed to {speed}" so that it takes an argument.
In your feature: When I test the speed 500 times
And in that step definition ("I test the speed 500 times") you:
  create a for loop of 500 iterations:
    choose a random speed
    execute the other steps with context.execute_steps and format(speed)
You'll have to work around this a bit because execute_steps takes unicode strings, not integers.
However, one might agree with Szabo Peter that it is a little awkward to use gherkin/python-behave for this :). It kind of messes up the purpose. Also, even along the lines of my thinking here, it might be done more elegantly.
You can find some nice stuff here: https://jenisys.github.io/behave.example/tutorials/tutorial08.html
Cheerz
So, edit after the comment (after editing and writing this example it looks even sillier than I thought it would, so yeah: don't use behave for this).
Example:
Feature: test feature

  Scenario: test scenario
    Given I open the app
    When I test the app 500 times at random speed
    Then the console says it is done

steps:

import random
from behave import given, when, then

@given(u'I open the app')
def I_open_the_app(context):
    pass  # code to open the app

@when(u'I test the app 500 times at random speed')
def I_test_the_app_500_times_at_random_speed(context):
    for times in range(500):  # min_speed / max_speed assumed to be defined elsewhere
        random_speed = random.randint(min_speed, max_speed)
        context.execute_steps(u'''when I play at {speed}'''.format(speed=str(random_speed)))

@when(u'I play at {speed}')
def I_play_at(context, speed):
    play_at_speed(int(speed))

@then(u'the console says it is done')
def the_console_says_it_is_done(context):
    print('it is done')

Doing efficient mathematical calculations in Redis

I've been looking around the web for information on doing maths in Redis and haven't actually found much. I'm using the Redis-RB gem in Rails, and caching lists of results:
e = [1738738.0, 2019461.0, 1488842.0, 2272588.0, 1506046.0, 2448701.0, 3554207.0, 1659395.0, ...]
$redis.lpush "analytics:math_test", e
Currently, our lists of numbers max out in the thousands to tens of thousands per list per day, with the number of lists likely in the thousands per day. (This is not actually that much; however, we're growing, and expect much larger sample sizes very soon.)
For each of these lists, I'd like to be able to run stats. I currently do this in-app:
def basic_stats(arr)
  return nil if arr.nil? or arr.empty?

  min = arr.min.to_f
  max = arr.max.to_f
  total = arr.inject(:+)
  len = arr.length
  mean = total.to_f / len # to_f so we don't get an integer result
  sorted = arr.sort
  median = len % 2 == 1 ? sorted[len/2] : (sorted[len/2 - 1] + sorted[len/2]).to_f / 2
  sum = arr.inject(0) { |accum, i| accum + (i - mean)**2 }
  variance = sum / (arr.length - 1).to_f
  std_dev = Math.sqrt(variance).nan? ? 0 : Math.sqrt(variance)

  {min: min, max: max, mean: mean, median: median, std_dev: std_dev, size: len}
end
and, while I could simply store the stats, I will often have to aggregate lists together to run stats on the aggregated list. Thus, it makes sense to store the raw numbers rather than every possible aggregated set. Because of this, I need the math to be fast, and I have been exploring ways to do this. The simplest way is just doing it in-app; with 150k items in a list, this isn't actually too terrible:
$redis_analytics.llen "analytics:math_test", 0, -1
=> 156954
Benchmark.measure do
basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 2.650000 0.060000 2.710000 ( 2.732993)
While I'd rather not push 3 seconds for a single calculation, given that this might be outside of my current use-case by about 10x number of samples, it's not terrible. What if we were working with a sample size of one million or so?
$redis_analytics.llen("analytics:math_test")
=> 1063454
Benchmark.measure do
basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 21.360000 0.340000 21.700000 ( 21.847734)
Options

1. Use the SORT method on the list; then you can instantaneously get min/max/length in Redis (see the sketch after this list). Unfortunately, it seems that you still have to go in-app for things like median, mean, and std_dev, unless we can calculate these in Redis.
2. Use Lua scripting to do the calculations. (I haven't learned any Lua yet, so I can't say what this would look like. If it's likely faster, I'd like to know so I can try it.)
3. Some more efficient way to utilize Ruby, which seems a wee bit unlikely, since utilizing what seems like a fairly decent stats gem has analogous results.
4. Use a different database.
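For option 1, a rough sketch of what that could look like with redis-rb (sort/llen as I understand their signatures; same key as above). SORT still does O(N log N) work server-side per call, it just avoids shipping the whole list:

key = "analytics:math_test"
size = $redis_analytics.llen(key)
# Numeric SORT with LIMIT returns just the smallest / largest element.
min = $redis_analytics.sort(key, order: "ASC",  limit: [0, 1]).first.to_f
max = $redis_analytics.sort(key, order: "DESC", limit: [0, 1]).first.to_f

Mean, median, and std_dev would still have to come from option 2 (Lua) or from pulling the list in-app.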
Example using StatsSample gem
Using a gem seems to gain me nothing. In Python, I'd probably write a C module; I'm not sure if many Ruby stats gems are written in C.
require 'statsample'

def basic_stats(stats)
  return nil if stats.nil? or stats.empty?

  arr = stats.to_scale
  {min: arr.min, max: arr.max, mean: arr.mean, median: arr.median, std_dev: arr.sd, size: stats.length}
end
Benchmark.measure do
  basic_stats $redis_analytics.lrange("analytics:math_test", 0, -1).map(&:to_f)
end
=> 20.860000 0.440000 21.300000 ( 21.436437)
Coda
It's quite possible, of course, that such large stats calculations will simply take a long time and that I should offload them to a queue. However, given that much of this math is actually happening inside Ruby/Rails, rather than in the database, I thought I might have other options.
I want to keep this open in case anyone has any input that could help others doing the same thing. For me, however, I've just realized that I'm spending too much time trying to force Redis to do something that SQL does quite well. If I simply dump this into Postgres, I can do really efficient aggregation AND math directly in the database. I think I was just stuck using Redis for something that, when it started, was a good idea, but scaled out to something bad.
Lua scripting is probably the best way to solve this problem, if you can switch to Redis 2.6. Btw, testing the speed should be pretty straightforward, so given the small time investment needed, I strongly suggest trying Lua scripting to see what result you get.
Another thing you could do is use Lua to set the data, and make sure it also updates a related Hash per list to directly retain the min/max/average stats, so you don't have to compute those stats every time: they are incrementally updated instead. This is not always possible, btw; it depends on your specific use case.
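A rough sketch of the first suggestion, assuming redis-rb's eval (keys:/argv: keywords) and Redis >= 2.6; the key name and script are illustrative:

# Compute count/sum/min/max server-side so only four values cross the wire.
STATS_SCRIPT = <<~LUA
  local values = redis.call('LRANGE', KEYS[1], 0, -1)
  local count, sum = #values, 0
  local min, max = math.huge, -math.huge
  for i = 1, count do
    local v = tonumber(values[i])
    sum = sum + v
    if v < min then min = v end
    if v > max then max = v end
  end
  -- floats must go back as strings; Lua numbers are truncated to integers in replies
  return {count, tostring(sum), tostring(min), tostring(max)}
LUA

count, sum, min, max = $redis_analytics.eval(STATS_SCRIPT, keys: ["analytics:math_test"])
mean = sum.to_f / count

Median and standard deviation would still need either a sort inside the script or the incremental hash described above.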
I would take a look at NArray. From their homepage:
This extension library incorporates fast calculation and easy manipulation of large numerical arrays into the Ruby language.
It looks like their array class has most all of the functions you need built in. Cmd-F "Statistics" on that page.
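A rough sketch of the NArray route (method names like to_na, mean, and stddev are taken from the NArray docs as I remember them; worth verifying against the installed version):

require 'narray'

def basic_stats(arr)
  return nil if arr.nil? || arr.empty?

  na = NArray.to_na(arr)          # numeric array backed by C
  sorted = na.sort
  len = na.size
  median = len.odd? ? sorted[len / 2] : (sorted[len / 2 - 1] + sorted[len / 2]) / 2.0

  {min: na.min, max: na.max, mean: na.mean, median: median, std_dev: na.stddev, size: len}
end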

(Secure) Random string?

In Lua, one would usually generate random values, and/or strings by using math.random & math.randomseed, where os.time is used for math.randomseed.
This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
This issue is even pointed out by the Lua Users wiki: http://lua-users.org/wiki/MathLibraryTutorial, and the corresponding RandomStrings recipe: http://lua-users.org/wiki/RandomStrings.
So I sat down and wrote a different algorithm (if it can even be called that) that generates random numbers by (mis-)using the memory addresses of tables:
math.randomseed(os.time())

function realrandom(maxlen)
  local tbl = {}
  local num = tonumber(string.sub(tostring(tbl), 8))
  if maxlen ~= nil then
    num = num % maxlen
  end
  return num
end

function string.random(length, pattern)
  local length = length or 11
  local pattern = pattern or '%a%d'
  local rand = ""
  local allchars = ""
  for loop = 0, 255 do
    allchars = allchars .. string.char(loop)
  end
  local str = string.gsub(allchars, '[^'..pattern..']', '')
  while string.len(rand) ~= length do
    local randidx = realrandom(string.len(str))
    local randbyte = string.byte(str, randidx)
    rand = rand .. string.char(randbyte)
  end
  return rand
end
At first, everything seems perfectly random, and I'm sure they are... at least for the current program.
So my question is, how random are these numbers returned by realrandom really?
Or is there an even better way to generate random numbers in a shorter interval than one second (which kind of implies that os.time shouldn't be used, as explained above), without relying on external libraries, AND, if possible, in an entirely cross-platform manner?
EDIT:
There seems to be a major misunderstanding regarding the way the RNG is seeded; In production code, the call to math.randomseed() happens just once, this was just a badly chosen example here.
What I mean by "the random value is only random once per second" is easily demonstrated by this paste: http://codepad.org/4cDsTpcD
As this question will get downvoted regardless of my edits, I have also cancelled my previously accepted answer, in hope of a better one, even if just better opinions. I understand that issues regarding random values/numbers have been discussed many times before, but I have not found such a question that is relevant to Lua - please keep that in mind!
You should not call seed each time you call random, you ought to call it only once, on the program initialization (unless you get the seed from somewhere, for example, to replicate some previous "random" behaviour).
Standard Lua random generator is of poor quality in the statistical sense (as it is, in fact, standard C random generator), do not use it if you care for that. Use, for example, lrandom module (available in LuaRocks).
If you need more secure random, read from /dev/random on Linux. (I think that Windows should have something along the same lines — but you may need to code something in C to use it.)
Relying on table pointer values is a bad idea. Think about alternate Lua implementations, in Java, for example - there is no telling what they would return. (Also, the pointer values may be predictable, and they may be, under certain circumstances, the same each time the program is invoked.)
If you want finer precision for the seed (and you will want this only if you're launching the program more often than once per second), you should use a timer with better resolution. For example, socket.gettime() from LuaSocket. Multiply it by some value, since math.randomseed is working with integer part only, and socket.gettime() returns time in (floating point) seconds.
require 'socket'

math.randomseed(socket.gettime() * 1e6)

for i = 1, 1e3 do
  print(math.random())
end
This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
It has those weaknesses only if you implement it incorrectly.
math.randomseed is supposed to be called sparingly - usually just once at the beginning of your program, and it usually seeds using os.time. Once the seed is set, you can use math.random many times, and it will yield random values.
See what happens on this sample:
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
> math.randomseed(2)
> return math.random(), math.random(), math.random()
0.70097636929759 0.80967634907443 0.088795455214007
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
When I change the seed from 1 to 2, I get different random results. But when I go back to 1, the "random sequence" is reset. I obtain the same values as before.
os.time() returns an ever-increasing number. Using it as a seed is appropriate; then you can invoke math.random forever and have different random numbers every time you invoke it.
The only scenario you have to be a bit worried about non-randomness is when your program is supposed to be executed more than once per second. In that case, as the others are saying, the simplest solution is using a clock with higher definition.
In other words:
Invoke math.randomseed with an appropriate seed (os.time() is ok in 99% of the cases) at the beginning of your program
Invoke math.random every time you need a random number.
Regards!
Some thoughts on the first part of your question:
So my question is, how random are these numbers returned by realrandom really?
Your function is attempting to discover the address of a table by using a quirk of its default implementation of tostring(). I don't believe that the string returned by tostring{} has a specified format, or that the value included in that string has any documented meaning. In practice, it is derived from the address of something related to the specific table, and so distinct tables convert to distinct strings. However, the next version of Lua is free to change that to anything that is convenient. Worse, the format it takes will be highly platform dependent because it appears to use the %p format specifier to sprintf() which is only specified as being a sensible representation of a pointer.
There's also a much bigger issue. While the address of the nth table created in a process might seem random on your platform, it might not be random at all. Or it might vary in only a few bits. For example, on my win7 box only a few bits vary, and not very randomly:
C:...>for /L %i in (1,1,20) do @lua -e "print{}"
table: 0042E5D8
table: 0061E5D8
table: 0024E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0064E5D8
table: 0042E5D8
table: 002FE5D8
table: 0042E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0024E5D8
table: 0042E5D8
table: 0042E5D8
table: 0061E5D8
table: 0042E5D8
Other platforms will vary, of course. I'd even expect there to be platforms where the address of the first allocated table is completely deterministic, and hence identical on every run of the program.
In short, the address of an arbitrary object in your process image is not a very good source of randomness.
Edit: For completeness, I'd like to add a couple of other thoughts that came to mind over night.
The stock tostring() function is supplied by the base library and implemented by the function luaB_tostring(). The relevant bit is this fragment:
switch (lua_type(L, 1)) {
    ...
    default:
        lua_pushfstring(L, "%s: %p", luaL_typename(L, 1), lua_topointer(L, 1));
        break;
If you really are calling this function, then the end of the string will be an address, represented by standard C sprintf() format %p, strongly related to the specific table. One observation is that I've seen several distinct implementations for %p. Windows MSVCR80.DLL (the version of the C library used by the current release of Lua for Windows) makes it equivalent to %08X. My Ubuntu Karmic Koala box appears to make it equivalent to %#x which notably drops leading zeros. If you are going to parse out that part of the string, then you should do it in a way that is more flexible in the face of variation of the meaning of %p.
Note, also, that doing anything like this in library code may expose you to a couple of surprises.
First, if the table passed to tostring() has a metatable that provides the function __tostring(), then that function will be called, and the fragment quoted above will never be executed at all. In your case, that issue cannot arise because tables have individual metatables (rather than one shared per type), and you didn't accidentally apply a metatable to your local table.
Second, by the time your module loads, some other module or user-supplied code might have replaced the stock tostring() with something else. If the replacement is benign, (such as a memoization wrapper) then it likely doesn't matter to the code as written. However, this would be a source of attack, and is entirely outside the control of your module. That doesn't strike me as a good idea if the goal is some kind of improved security for your random seed material.
Third, you might not be loaded in a stock Lua interpreter at all, and the larger application (Lightroom, WoW, Wireshark, ...) may choose to replace the base library functions with their own implementations. This is a much less likely issue for tostring(), but note that the base library's print() is a frequent target for replacement or removal in alternate implementations and there are modules (Lua Lanes, for one) that break if print is not the implementation in the base library.
A few important things come to mind:
In most other languages you typically only call the random 'seed' function once at the beginning of the program or perhaps at limited times throughout its execution. You generally do not want to call it each time you generate a random number/sequence. If you call it once when the program starts you get around the "once per second" limitation. By calling it each time you may actually end up with less randomness in your results.
Your realrandom() function seems to rely on a private implementation detail of Lua. What happens in the next major release if this detail changes to always return the same number, or only even numbers, etc.... Just because it works for now is not a strong enough guarantee, especially in the case of wanting a secure RNG.
When you say "everything seems perfectly random" how are you measuring this performance? We humans are terrible at determining if a sequence is random or not and just looking at a sequence of numbers would be virtually impossible to truly tell if they were random or not. There are many ways to quantify the "randomness" of a series including frequency distribution, autocorrelation, compression, and many more far beyond my understanding.
If you are writing a true "secure PRNG" for production, do not write your own! Investigate and use a library or algorithm by experts who have spent years/decades studying, designing, and trying to break it. True secure random number generation is hard.
If you need more info start on the PRNG article on Wikipedia and use the references/links there as needed.