I want to write a multiplayer game for iOS. The game relies on random numbers, generated dynamically, to decide what happens next. Because it is multiplayer, these "random" numbers must be the same on every player's device to keep gameplay consistent.
Therefore I need a reliable pseudorandom number generator that, when seeded with the same number, generates the same sequence of random numbers on all devices (iPad/iPhone/iPod touch) and all OS versions.
It looks like srand and rand would do the job, but I am not sure: is rand guaranteed to generate the same numbers on all devices and across all OS versions? If not, is there a good pseudorandom number generation algorithm I can use?
From the C standard (and Objective C is a thin layer on top of C so this should still hold):
If srand is then called with the same seed value, the sequence of pseudo-random numbers shall be repeated.
However, there's no guarantee that different implementations (or even different versions of the same implementation) will give a consistent sequence based on the seed. If you really want to guarantee that, you can code up your own linear congruential generator, such as the example one in the standard itself:
// RAND_MAX assumed to be 32767.
static unsigned long int next = 1;
void srand(unsigned int seed) { next = seed; }
int rand(void) {
    next = next * 1103515245 + 12345;
    return (unsigned int)(next/65536) % 32768;
}
And, despite the fact that there are better generators around, the simple linear congruential one is generally more than adequate, unless you're a statistician or cryptographer.
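As a rough illustration of how little code such a generator needs, here is the standard's example transcribed into Python. This is only a sketch: the class name PortableLCG is made up for this example, and masking the state to 32 bits is an assumption about the width of unsigned long. The returned value only uses bits 16 to 30 of the state, so a 64-bit state would produce the same outputs.

```python
class PortableLCG:
    """Transcription of the C standard's example rand()/srand() (hypothetical name)."""

    def __init__(self, seed=1):
        self.state = seed & 0xFFFFFFFF

    def next(self):
        # Same update as the C listing, masked to 32 bits (assumed width).
        self.state = (self.state * 1103515245 + 12345) & 0xFFFFFFFF
        # Bits 16..30 of the state, i.e. a value in 0..32767.
        return (self.state // 65536) % 32768


# Two generators seeded identically produce identical sequences,
# no matter which device or OS runs this code.
a = PortableLCG(42)
b = PortableLCG(42)
seq_a = [a.next() for _ in range(5)]
seq_b = [b.next() for _ in range(5)]
print(seq_a == seq_b)  # True
```

Because every player runs exactly the same arithmetic, the sequence depends only on the seed and not on the platform's rand().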
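If you provide a seed value to srand, then rand should consistently provide the same sequence of pseudorandom numbers on a given implementation. You can also try arc4random(), though it cannot be seeded, so it will not give you a repeatable sequence.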
I'm trying to use the CAMPARY library (CudA Multiple Precision ARithmetic librarY). I've downloaded the code and included it in my project. Since it supports both cpu and gpu, I'm starting with cpu to understand how it works and make sure it does what I need. But the intent is to use this with CUDA.
I'm able to instantiate an instance and assign a value, but I can't figure out how to get things back out. Consider:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "c:\\vss\\CAMPARY\\Doubles\\src_cpu\\multi_prec.h"
int main()
{
    const char *value = "123456789012345678901234567";
    multi_prec<2> a(value);
    a.prettyPrint();
    a.prettyPrintBin();
    a.prettyPrintBin_UnevalSum();
    char *cc = a.prettyPrintBF();
    printf("\n%s\n", cc);
    free(cc);
}
Compiles, links, runs (VS 2017). But the output is pretty unhelpful:
Prec = 2
Data[0] = 1.234568e+26
Data[1] = 7.486371e+08
Prec = 2
Data[0] = 0x1.987bf7c563caap+86;
Data[1] = 0x1.64fa5c3800000p+29;
0x1.987bf7c563caap+86 + 0x1.64fa5c3800000p+29;
1.234568e+26 7.486371e+08
Printing each of the doubles like this might be easy to do, but it doesn't tell you much about the value of the 128-bit number being stored. Performing highly accurate computations is of limited value if there's no way to output the results.
In addition to just printing out the value, eventually I also need to convert these numbers to ints (I'm willing to try it all in floats if there's a way to print, but I fear that both accuracy and speed will suffer). Unlike MPIR (which doesn't support CUDA), CAMPARY doesn't have any associated multi-precision int type, just floats. I can probably cobble together what I need (mostly just add/subtract/compare), but only if I can get the integer portion of CAMPARY's values back out, which I don't see a way to do.
CAMPARY doesn't seem to have any docs, so it's conceivable these capabilities are there, and I've simply overlooked them. And I'd rather ask on the CAMPARY discussion forum/mail list, but there doesn't seem to be one. That's why I'm asking here.
To sum up:
Is there any way to output the 128bit ( multi_prec<2> ) values from CAMPARY?
Is there any way to extract the integer portion from a CAMPARY multi_prec? Perhaps one of the (many) math functions in the library that I don't understand computes this?
There are really only 2 possible answers to this question:
There's another (better) multi-precision library that works on CUDA that does what you need.
Here's how to modify this library to do what you need.
The only people who could give the first answer are CUDA programmers. Unfortunately, if there were such a library, I feel confident talonmies would have known about it and mentioned it.
As for #2, why would anyone update this library if they weren't a CUDA programmer? There are other, much better multi-precision libraries out there. The ONLY benefit CAMPARY offers is that it supports CUDA. Which means the only people with any real motivation to work with or modify the library are CUDA programmers.
And, as the CUDA programmer with the most vested interest in solving this, I did figure out a solution (albeit an ugly one). I'm posting it here in the hopes that the information will be of value to future CAMPARY programmers. There's not much information out there for this library, so this is a start.
The first thing you need to understand is how CAMPARY stores its data. And, while not complex, it isn't what I expected. Coming from MPIR, I assumed that CAMPARY stored its data pretty much the same way: a fixed size exponent followed by an arbitrary number of bits for the mantissa.
But nope, CAMPARY went a different way. Looking at the code, we see:
private:
double data[prec];
Now, I assumed that this was just an arbitrary way of reserving the number of bits they needed. But no, they really do use prec doubles. Like so:
multi_prec<8> a("2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249");
// Looking at a in the VS debugger:
[0] 2.6337161380336443e+99 const double
[1] 1.8496577979210756e+83 const double
[2] 1.2618399223120249e+67 const double
[3] -3.5978270144026257e+48 const double
[4] -1.1764513205926450e+32 const double
[5] -2479038053160511.0 const double
[6] 0.00000000000000000 const double
[7] 0.00000000000000000 const double
So, what they are doing is storing the max amount of precision possible in the first double, then the remainder is used to compute the next double and so on until they encompass the entire value, or run out of precision (dropping the least significant bits). Note that some of these are negative, which means the sum of the preceding values is a bit bigger than the actual value and they are correcting it downward.
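To make the storage scheme concrete, here is a small Python sketch of the same idea. It is not CAMPARY's code: the helper names to_expansion and from_expansion are invented for this illustration, and it relies on Python's arbitrary-precision integers and round-to-nearest int-to-float conversion, which may not match CAMPARY's own string parsing digit for digit.

```python
def to_expansion(n, prec):
    """Split integer n into `prec` doubles whose exact sum approximates n."""
    terms = []
    remainder = n
    for _ in range(prec):
        term = float(remainder)   # nearest double to what is still left
        terms.append(term)
        remainder -= int(term)    # exact, since Python ints are arbitrary precision
    return terms


def from_expansion(terms):
    """Add the doubles back together exactly, using big integers."""
    return sum(int(t) for t in terms)


n = 2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249
terms = to_expansion(n, 8)
for t in terms:
    print(repr(t))
print(from_expansion(terms) == n)  # True
```

A term comes out negative whenever the previous double was rounded up past the true remainder, which is the downward correction described above.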
With this in mind, we return to the question of how to print it.
In theory, you could just add all these together to get the right answer. But kinda by definition, we already know that C doesn't have a datatype to hold a value this size. But other libraries do (say MPIR). Now, MPIR doesn't work on CUDA, but it doesn't need to. You don't want to have your CUDA code printing out data. That's something you should be doing from the host anyway. So do the computations with the full power of CUDA, cudaMemcpy the results back, then use MPIR to print them out:
#define MPREC 8
void ShowP(const multi_prec<MPREC> value)
{
    multi_prec<MPREC> temp(value), temp2;
    // from mpir at mpir.org
    mpf_t mp, mp2;
    mpf_init2(mp, value.getPrec() * 64); // Make sure we reserve enough room
    mpf_init(mp2); // Only needs to hold one double.
    const double *ptr = value.getData();
    mpf_set_d(mp, ptr[0]);
    for (int x = 1; x < value.getPrec(); x++)
    {
        // MPIR doesn't have a mpf_add_d, so we need to load the value into
        // an mpf_t.
        mpf_set_d(mp2, ptr[x]);
        mpf_add(mp, mp, mp2);
    }
    // Using base 10, write the full precision (0) of mp, to stdout.
    mpf_out_str(stdout, 10, 0, mp);
    mpf_clears(mp, mp2, NULL);
}
Used with the number stored in the multi_prec above, this outputs the exact same value. Yay.
It's not a particularly elegant solution. Having to add a second library just to print a value from the first is clearly sub-optimal. And this conversion can't be all that speedy either. But printing is typically done (much) less frequently than computing. If you do an hour's worth of computing and a handful of prints, the performance doesn't much matter. And it beats the heck out of not being able to print at all.
CAMPARY has a lot of shortcomings (undocumented, unsupported, unmaintained). But for people who need multi-precision numbers on CUDA (especially if you need sqrt), it's the best option I've found.
I am working on a data set of more than 22,000 records, and when I run the apriori model on it, it takes far too long even for a small number of records such as 20. Is there a problem in my code, or is there a faster way to convert the associations into a list? The code I used is below.
from apyori import apriori

transactions = []
for i in range(0, 20):
    transactions.append([str(dataset.values[i,j]) for j in range(0, 543)])
associations = apriori(transactions, min_support=0.004, min_confidence=0.3, min_lift=3, min_length=2)
result = list(associations)
It's difficult to assess without your data, but the complexity of Apriori is based on a number of factors, including your support threshold, number of transactions, number of items, average/max transaction length, etc.
In cases where even a small number of transactions takes a long time to run, it's often a matter of the minimum support being too low. When support is very low (near 0), the algorithm is effectively brute-forcing, since it has to look at all possible combinations of items, of every length. This is the equivalent of a mathematical power set, which is exponential. For just 41 items you're actually trying 2^41 - 1 possible combinations, which is roughly 2.2 trillion possibilities.
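To get a feel for how quickly the power set grows, here is a tiny sketch that counts the non-empty itemsets for a given number of distinct items. The function name itemset_count is just for illustration. This is the worst case, which Apriori approaches when the support threshold prunes almost nothing.

```python
def itemset_count(num_items):
    """Number of non-empty itemsets over num_items distinct items."""
    return 2 ** num_items - 1


for n in (10, 20, 41):
    print(n, itemset_count(n))
```

With 543 columns per transaction, as in the question, the worst case is far beyond anything that can be enumerated, so the support threshold has to do the pruning.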
I recommend starting with a "high" min_support at first (e.g. 0.20) and then working your way down slowly. It's easier to test things that take seconds at first than ones that'll take a long time.
Other important note: There is no min_length parameter in Apyori. I'm not sure where everyone's getting that from (you're not alone in thinking there is one), unless it's this one random blog post I found. The parameters are as follows (straight from the code):
Keyword arguments:
min_support -- The minimum support of relations (float).
min_confidence -- The minimum confidence of relations (float).
min_lift -- The minimum lift of relations (float).
max_length -- The maximum length of the relation (integer).
For what it's worth, I wrote unofficial docs for Apyori that can be found here.
So in Lua it's common knowledge that you can use math.randomseed, but it's also obvious that math.random changes the seed as well (calling it twice does not return the same result). What does it set it to, and how can I keep track of it? If that's impossible, please explain why.
This is not really a Lua question, but a general question about how RNG algorithms work.
First, Lua doesn't have its own RNG (at least through Lua 5.3); it just hands you a slightly mangled value from the RNG of the underlying C library. Most RNG implementations do not reveal their inner state, but sometimes you can calculate it yourself.
For example, when you use Lua on Windows, you'll be using the LCG-based RNG from the MS C library. The numbers you get are a slice of the seed, not the full value. There are two ways you can deal with that:
If you know how many times you have called random, you can take the initial seed value, feed it to your own copy of the same algorithm (with the same constants that are hardcoded in the MS library), and get the exact current value of the seed.
If you don't, but you can be sure that nobody interferes between your two calls to random, you can take two generated numbers and reverse the LCG algorithm by shifting the bits back into place. This leaves you with several missing bits (plus one more thanks to Lua's mangling) that you simply have to brute-force: iterate over all possible values of the missing bits until your copy of the algorithm produces exactly the same two "random" numbers you recorded. That gives you the current seed stored inside the library's RNG as well. A well-programmed solution in Lua can brute-force this in about 0.2-0.5s on a somewhat dated PC; I have done it in the past. Here's an example on Crypto.SE discussing this task in more detail: Predicting values from a Linear Congruential Generator.
The first approach can be used with any other RNG algorithm that doesn't use any real entropy; the second works with most RNGs that don't mask so many bits in the slice that brute-forcing becomes unreasonable.
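Both approaches can be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than a drop-in tool: the constants are the ones widely cited for the Microsoft C runtime's rand(), the function names are made up, and the sketch works on the raw 15-bit outputs of rand(), ignoring the extra scaling Lua applies. It also uses more than two observed outputs, since two alone can leave several candidate states.

```python
# Constants widely cited for the Microsoft C runtime's rand(); treat them as
# an assumption and check them against your own runtime.
A = 214013
C = 2531011
M = 2 ** 31  # only the low 31 bits of the state influence the outputs


def step(state):
    """Advance the LCG state by one call."""
    return (state * A + C) % M


def output(state):
    """rand() returns bits 16..30 of the state: a 15-bit slice."""
    return state >> 16


def replay(seed, calls):
    """Approach 1: a known seed plus a call count gives the current state."""
    state = seed % M
    for _ in range(calls):
        state = step(state)
    return state


def brute_force(outputs):
    """Approach 2: try all 2**16 hidden low bits behind the first output."""
    candidates = []
    for low in range(1 << 16):
        state = (outputs[0] << 16) | low
        for expected in outputs[1:]:
            state = step(state)
            if output(state) != expected:
                break
        else:
            candidates.append(state)  # state after the last observed output
    return candidates


seed = 12345
observed = [output(replay(seed, k)) for k in range(1, 5)]
print(replay(seed, 4) in brute_force(observed))  # True
```

The top bit of a 32-bit state never reaches the outputs, which is why the sketch keeps only 31 bits of state and only has 16 unknown bits to search.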
Real answer though is: you don't need to keep track of seed at all. What you want is probably something else.
If you set a seed all numbers math.random() generates are pseudo-random (This is always the case as the system will generate a seed by itself).
math.randomseed(4)
print(math.random())
print(math.random())
math.randomseed(4)
print(math.random())
Outputs
0.50827539156303
0.75454387490399
0.50827539156303
So if you reset the seed to the same value you can predict all values that are going to come up to the maximum number of consecutive values that you already generated using that seed.
What the seed does not do is keep the output of math.random() the same. It would be the same if you kept resetting it to the same value.
An analogy as an example
Imagine the random number is an integer between 0 and 9 (instead of a double between 0 and 1).
math.random() could traverse pi's decimals from an arbitrary starting position (default could be system time).
What you do when you use math.randomseed() is (not literally, this is an analogy as mentioned) set the starting position in pi's decimals from which you are going to retrieve your numbers.
If you now reset the seed to the same starting position, the numbers are going to be the same as the last time you reset the starting position.
You will know the numbers up to the last call you made; after that you can't be certain anymore.
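The same reset behaviour can be reproduced in other languages. Here is a minimal Python sketch of it; note that Python's random module uses a different generator than Lua's math.random, so the actual numbers will not match the Lua output above.

```python
import random

random.seed(4)
first = random.random()
second = random.random()

random.seed(4)               # reset to the same starting position
replayed = random.random()

print(first == replayed)     # True: the sequence starts over
print(first == second)       # False: consecutive calls still differ
```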
I'm making an iOS dice game and one beta tester said he liked the idea that the rolls were already predetermined, as I use arc4random_uniform(6). I'm not sure if they are. So leaving aside the possibility that the code may choose the same number consecutively, would I generate a different number if I tapped the dice in 5 or 10 seconds time?
Your tester was probably thinking of the idea that software random number generators are in fact pseudo-random. Their output is not truly random as a physical process like a die roll would be: it's determined by some state that the generators hold or are given.
One simple implementation of a PRNG is a "linear congruential generator": the function rand() in the standard library typically uses this technique. At its core, it is a straightforward mathematical function, and each output is generated by feeding in the previous one as input. It thus takes a "seed" value, and -- this is what your tester was thinking of -- the sequence of output values that you get is completely determined by the seed value.
If you create a simple C program using rand(), you can (and normally should) use the companion function srand() (that's "seed rand") to give the LCG a starting value; without it, rand() behaves as if srand(1) had been called. If you use a constant as the seed value, e.g. srand(4), you will get the same values from rand(), in the same order, every time.
One common way to get an arbitrary -- note, not random -- seed for rand() is to use the current time: srand(time(NULL)). If you did that, and re-seeded and generated a number fast enough that the return of time() did not change, you would indeed see the same output from rand().
This doesn't apply to arc4random(): it does not use an LCG, and it does not share this trait with rand(). It was considered* "cryptographically secure"; that is, its output is indistinguishable from true, physical randomness.
This is partly due to the fact that arc4random() re-seeds itself as you use it, and the seeding is itself based on unpredictable data gathered by the OS. The state that determines the output is entirely internal to the algorithm; as a normal user (i.e., not an attacker) you don't view, set, or otherwise interact with that state.
So no, the output of arc4random() is not reliably repeatable by you. Pseudo-random algorithms which are repeatable do exist, however, and you can certainly use them for testing.
*Wikipedia notes that weaknesses have been found in the last few years, and that it may no longer be usable for cryptography. Should be fine for your game, though, as long as there's no money at stake!
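For testing, the difference between a repeatable generator and an OS-seeded one might look like the following Python sketch. The helper roll_dice and the seed 2024 are made up for the example; random.SystemRandom draws from the operating system's entropy source and stands in here for the role arc4random plays on iOS.

```python
import random


def roll_dice(rng, count):
    """Roll a six-sided die `count` times using the given generator."""
    return [rng.randint(1, 6) for _ in range(count)]


# Repeatable: the same seed always gives the same rolls (useful in tests).
test_rolls_a = roll_dice(random.Random(2024), 10)
test_rolls_b = roll_dice(random.Random(2024), 10)
print(test_rolls_a == test_rolls_b)  # True

# Not repeatable: seeded by the OS, with no state for you to set or read.
live_rolls = roll_dice(random.SystemRandom(), 10)
print(live_rolls)
```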
Basically, it's random. No it is not based around time. Apple has documented how this is randomized here: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/arc4random_uniform.3.html
In Lua, one would usually generate random values, and/or strings by using math.random & math.randomseed, where os.time is used for math.randomseed.
This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
This issue is even pointed out by the Lua Users wiki: http://lua-users.org/wiki/MathLibraryTutorial, and the corresponding RandomStrings recipe: http://lua-users.org/wiki/RandomStrings.
So I sat down and wrote a different algorithm (if it can even be called that), which generates random numbers by (mis-)using the memory addresses of tables:
math.randomseed(os.time())
function realrandom(maxlen)
    local tbl = {}
    local num = tonumber(string.sub(tostring(tbl), 8))
    if maxlen ~= nil then
        num = num % maxlen
    end
    return num
end
function string.random(length,pattern)
    local length = length or 11
    local pattern = pattern or '%a%d'
    local rand = ""
    local allchars = ""
    for loop=0, 255 do
        allchars = allchars .. string.char(loop)
    end
    local str=string.gsub(allchars, '[^'..pattern..']','')
    while string.len(rand) ~= length do
        local randidx = realrandom(string.len(str))
        local randbyte = string.byte(str, randidx)
        rand = rand .. string.char(randbyte)
    end
    return rand
end
At first, everything seems perfectly random, and I'm sure they are... at least for the current program.
So my question is, how random are these numbers returned by realrandom really?
Or is there an even better way to generate random numbers in a shorter interval than one second (which kind of implies that os.time shouldn't be used, as explained above), without relying on external libraries, AND, if possible, in an entirely cross-platform manner?
EDIT:
There seems to be a major misunderstanding regarding the way the RNG is seeded: in production code, the call to math.randomseed() happens just once; this was just a badly chosen example.
What I mean by "the random value is only random once per second" is easily demonstrated by this paste: http://codepad.org/4cDsTpcD
As this question will get downvoted regardless of my edits, I have also unaccepted my previously accepted answer, in the hope of a better one, even if that just means better opinions. I understand that issues regarding random values/numbers have been discussed many times before, but I have not found such a question that is relevant to Lua. Please keep that in mind!
You should not call seed each time you call random, you ought to call it only once, on the program initialization (unless you get the seed from somewhere, for example, to replicate some previous "random" behaviour).
The standard Lua random generator is of poor quality in the statistical sense (it is, in fact, the standard C random generator), so do not use it if you care about that. Use, for example, the lrandom module (available in LuaRocks).
If you need more secure random, read from /dev/random on Linux. (I think that Windows should have something along the same lines — but you may need to code something in C to use it.)
Relying on table pointer values is a bad idea. Think about alternate Lua implementations, in Java for example: there is no telling what they would return. (Also, the pointer values may be predictable, and under certain circumstances they may be the same each time the program is invoked.)
If you want finer precision for the seed (and you will want this only if you're launching the program more often than once per second), you should use a timer with better resolution. For example, socket.gettime() from LuaSocket. Multiply it by some value, since math.randomseed is working with integer part only, and socket.gettime() returns time in (floating point) seconds.
local socket = require 'socket'
math.randomseed(socket.gettime() * 1e6)
for i = 1, 1e3 do
    print(math.random())
end
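For comparison, the same two ideas (a finer-grained clock for the seed, and the OS entropy source when you need something more secure) look like this in Python. This is only a sketch of the approach in another language, not a replacement for the Lua code above; time.time_ns and os.urandom are standard library calls, and the variable names are arbitrary.

```python
import os
import random
import time

# Seed once from a nanosecond-resolution clock instead of whole seconds.
seed = time.time_ns()
rng = random.Random(seed)
values = [rng.random() for _ in range(1000)]
print(values[:3])

# For security-sensitive use, take the seed material from the OS instead.
secure_seed = int.from_bytes(os.urandom(8), "big")
print(secure_seed)
```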
"This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time."
It has those weaknesses only if you implement it incorrectly.
math.randomseed is supposed to be called sparingly - usually just once at the beginning of your program, and it usually seeds using os.time. Once the seed is set, you can use math.random many times, and it will yield random values.
See what happens on this sample:
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
> math.randomseed(2)
> return math.random(), math.random(), math.random()
0.70097636929759 0.80967634907443 0.088795455214007
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
When I change the seed from 1 to 2, I get different random results. But when I go back to 1, the "random sequence" is reset. I obtain the same values as before.
os.time() returns an ever-increasing number. Using it as a seed is appropriate; then you can invoke math.random forever and have different random numbers every time you invoke it.
The only scenario where you have to worry a bit about non-randomness is when your program is supposed to be executed more than once per second. In that case, as the others are saying, the simplest solution is to use a clock with higher resolution.
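The difference between the two usage patterns is easy to see in a short Python sketch. The function names are made up, and the fixed number 1700000000 stands in for a whole-second timestamp such as the one os.time() returns.

```python
import random


def reseed_every_call(timestamp, count):
    """Wrong pattern: reseeding with the same whole second before each value."""
    values = []
    for _ in range(count):
        random.seed(timestamp)
        values.append(random.random())
    return values


def seed_once(timestamp, count):
    """Right pattern: seed once, then keep drawing values."""
    random.seed(timestamp)
    return [random.random() for _ in range(count)]


now = 1700000000  # fixed stand-in for a whole-second timestamp
print(len(set(reseed_every_call(now, 5))))  # 1: the same value five times
print(len(set(seed_once(now, 5))))          # 5: five different values
```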
In other words:
Invoke math.randomseed with an appropriate seed (os.time() is OK in 99% of cases) at the beginning of your program.
Invoke math.random every time you need a random number.
Regards!
Some thoughts on the first part of your question:
So my question is, how random are these numbers returned by realrandom really?
Your function is attempting to discover the address of a table by using a quirk of its default implementation of tostring(). I don't believe that the string returned by tostring{} has a specified format, or that the value included in that string has any documented meaning. In practice, it is derived from the address of something related to the specific table, and so distinct tables convert to distinct strings. However, the next version of Lua is free to change that to anything that is convenient. Worse, the format it takes will be highly platform dependent because it appears to use the %p format specifier to sprintf() which is only specified as being a sensible representation of a pointer.
There's also a much bigger issue. While the address of the nth table created in a process might seem random on your platform, it might not be random at all. Or it might vary in only a few bits. For example, on my win7 box only a few bits vary, and not very randomly:
C:...>for /L %i in (1,1,20) do @ lua -e "print{}"
table: 0042E5D8
table: 0061E5D8
table: 0024E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0064E5D8
table: 0042E5D8
table: 002FE5D8
table: 0042E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0024E5D8
table: 0042E5D8
table: 0042E5D8
table: 0061E5D8
table: 0042E5D8
Other platforms will vary, of course. I'd even expect there to be platforms where the address of the first allocated table is completely deterministic, and hence identical on every run of the program.
In short, the address of an arbitrary object in your process image is not a very good source of randomness.
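To put a number on "only a few bits vary", here is a small Python sketch that analyses the 20 addresses from the listing above. The variable names are arbitrary; the data is copied from that output.

```python
from collections import Counter

# The 20 addresses printed in the listing above, in order.
addresses = """
0042E5D8 0061E5D8 0024E5D8 0049E5D8 0042E5D8
0042E5D8 0042E5D8 0064E5D8 0042E5D8 002FE5D8
0042E5D8 0049E5D8 0042E5D8 0042E5D8 0042E5D8
0024E5D8 0042E5D8 0042E5D8 0061E5D8 0042E5D8
""".split()

values = [int(a, 16) for a in addresses]

# OR together the XOR of every value with the first one: a set bit in the
# result marks a bit position that changed at least once.
varying = 0
for v in values:
    varying |= v ^ values[0]

counts = Counter(addresses)
print(len(counts))                # 6 distinct addresses in 20 runs
print(counts.most_common(1))      # [('0042E5D8', 12)]
print(hex(varying))               # 0x6f0000
print(bin(varying).count("1"))    # 6 bits ever change
```

So in this sample the low 16 bits never change at all, and more than half of the runs produce the very same address.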
Edit: For completeness, I'd like to add a couple of other thoughts that came to mind over night.
The stock tostring() function is supplied by the base library and implemented by the function luaB_tostring(). The relevant bit is this fragment:
switch (lua_type(L, 1)) {
  ...
  default:
    lua_pushfstring(L, "%s: %p", luaL_typename(L, 1), lua_topointer(L, 1));
    break;
If you really are calling this function, then the end of the string will be an address, represented by standard C sprintf() format %p, strongly related to the specific table. One observation is that I've seen several distinct implementations for %p. Windows MSVCR80.DLL (the version of the C library used by the current release of Lua for Windows) makes it equivalent to %08X. My Ubuntu Karmic Koala box appears to make it equivalent to %#x which notably drops leading zeros. If you are going to parse out that part of the string, then you should do it in a way that is more flexible in the face of variation of the meaning of %p.
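As an illustration of what "more flexible" parsing could look like, here is a Python sketch that accepts both styles of %p output mentioned above. The function name parse_pointer and the second sample string are invented for the example; the same idea in Lua would be to take everything after the colon and convert it as a base-16 number with or without a 0x prefix.

```python
def parse_pointer(text):
    """Extract the address from a 'type: pointer' string as an integer."""
    _, _, pointer = text.partition(": ")
    # Base 16 accepts the digits with or without a leading 0x prefix.
    return int(pointer, 16)


print(parse_pointer("table: 0042E5D8"))    # zero-padded style, no prefix
print(parse_pointer("table: 0x9a3b0c8"))   # 0x-prefixed style (made-up sample)
```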
Note, also, that doing anything like this in library code may expose you to a few surprises.
First, if the table passed to tostring() has a metatable that provides the function __tostring(), then that function will be called, and the fragment quoted above will never be executed at all. In your case, that issue cannot arise because tables have individual metatables, and you didn't accidentally apply a metatable to your local table.
Second, by the time your module loads, some other module or user-supplied code might have replaced the stock tostring() with something else. If the replacement is benign, (such as a memoization wrapper) then it likely doesn't matter to the code as written. However, this would be a source of attack, and is entirely outside the control of your module. That doesn't strike me as a good idea if the goal is some kind of improved security for your random seed material.
Third, you might not be loaded in a stock Lua interpreter at all, and the larger application (Lightroom, WoW, Wireshark, ...) may choose to replace the base library functions with their own implementations. This is a much less likely issue for tostring(), but note that the base library's print() is a frequent target for replacement or removal in alternate implementations and there are modules (Lua Lanes, for one) that break if print is not the implementation in the base library.
A few important things come to mind:
In most other languages you typically only call the random 'seed' function once at the beginning of the program or perhaps at limited times throughout its execution. You generally do not want to call it each time you generate a random number/sequence. If you call it once when the program starts you get around the "once per second" limitation. By calling it each time you may actually end up with less randomness in your results.
Your realrandom() function seems to rely on a private implementation detail of Lua. What happens in the next major release if this detail changes to always return the same number, or only even numbers, etc.... Just because it works for now is not a strong enough guarantee, especially in the case of wanting a secure RNG.
When you say "everything seems perfectly random", how are you measuring this? We humans are terrible at determining whether a sequence is random, and by just looking at a sequence of numbers it is virtually impossible to tell. There are many ways to quantify the "randomness" of a series, including frequency distribution, autocorrelation, compression, and many more far beyond my understanding.
If you are writing a true "secure PRNG" for production, do not write your own! Investigate and use a library or algorithm by experts who have spent years/decades studying, designing and trying to break it. True secure random number generation is hard.
If you need more info start on the PRNG article on Wikipedia and use the references/links there as needed.
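Two of the simpler measurements mentioned above, frequency distribution and autocorrelation, can be sketched in Python as follows. The function names are made up for this example, and these checks are nowhere near a full test suite: passing them does not make a generator good, but failing them badly is a clear warning sign.

```python
import random


def chi_square(samples, bins):
    """Pearson chi-square statistic against a uniform spread over range(bins)."""
    counts = [0] * bins
    for s in samples:
        counts[s] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def lag1_autocorrelation(samples):
    """Correlation between each value and the one that follows it."""
    mean = sum(samples) / len(samples)
    dev = [s - mean for s in samples]
    numerator = sum(a * b for a, b in zip(dev, dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


rng = random.Random(1)
samples = [rng.randrange(10) for _ in range(10000)]
print(chi_square(samples, 10))          # small for a uniform-looking source
print(lag1_autocorrelation(samples))    # near 0 for an uncorrelated source

# A clearly non-random sequence for comparison: strictly alternating values.
print(lag1_autocorrelation([0, 1] * 500))  # close to -1
```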