Is there any way of getting a random number not from a uniform distribution, but from a Gaussian (Normal, Bell Curve) distribution in iOS? All the random number generators I have found are basically uniform and I want to make the numbers cluster around a certain point. Thanks!
Just use a uniform distribution generator and apply the Box-Muller Transform:
#include <math.h>    // log, sqrt, cos, sin, M_PI
#include <stdint.h>  // UINT32_MAX
#include <stdlib.h>  // arc4random

// Shift into (0, 1] so log(u1) is never taken at zero
double u1 = ((double)arc4random() + 1.0) / ((double)UINT32_MAX + 1.0); // uniform
double u2 = ((double)arc4random() + 1.0) / ((double)UINT32_MAX + 1.0); // uniform
double f1 = sqrt(-2 * log(u1));
double f2 = 2 * M_PI * u2;
double g1 = f1 * cos(f2); // Gaussian, mean 0, stddev 1
double g2 = f1 * sin(f2); // Gaussian, mean 0, stddev 1
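If you want the samples to cluster around a particular point, scale and shift the standard normal value. Here is a minimal sketch wrapping the above into a helper; the function name gaussianRandom and its parameters are mine, not part of any iOS API:
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: one Gaussian sample with the given mean and
// standard deviation, via the Box-Muller transform.
double gaussianRandom(double mean, double stddev) {
    // Shift into (0, 1] so log(u1) never sees zero.
    double u1 = ((double)arc4random() + 1.0) / ((double)UINT32_MAX + 1.0);
    double u2 = ((double)arc4random() + 1.0) / ((double)UINT32_MAX + 1.0);
    double g = sqrt(-2 * log(u1)) * cos(2 * M_PI * u2); // standard normal
    return mean + stddev * g; // cluster around `mean`
}
For example, gaussianRandom(100.0, 15.0) produces values clustered around 100.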
One simple option is to add several numbers from a uniform distribution together (see the sketch below). Many dice-based games use this approach to generate roughly normal distributions of results.
via Wikipedia
If you can be more specific about what distribution you want, there may be more precise solutions, but combining several rolls is an easy and fairly flexible approach.
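As a rough illustration of the idea in C (the function name and the choice of 12 summands are illustrative; summing 12 uniforms in [0,1) gives an approximately normal value with mean 6 and variance 1):
#include <stdint.h>
#include <stdlib.h>

// Approximate a standard normal sample by summing 12 uniform values
// and recentering (the classic Irwin-Hall trick); illustrative only.
double approxGaussian(void) {
    double sum = 0.0;
    for (int i = 0; i < 12; i++) {
        sum += (double)arc4random() / ((double)UINT32_MAX + 1.0); // uniform [0, 1)
    }
    return sum - 6.0; // roughly normal: mean 0, variance ~1
}
Scale and shift the result (mean + stddev * approxGaussian()) to cluster around any point you like.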
I was wondering if it would be possible to poll the AnalyserNode from the Web Audio API and use it to construct a PeriodicWave that is synthesized via an OscillatorNode?
My intuition is that something about the difference in amplitudes between analyzer frames can help calculate the right phase for a PeriodicWave, but I'm not sure how to go about implementing it. Any help on the right algorithm to use would be appreciated!
As luck would have it, I was working on a similar project just a few weeks ago. I put together a JSFiddle to explore the idea of reconstructing a phase-randomized version of a waveform using frequency data from an AnalyserNode. You can find that experiment here:
https://jsfiddle.net/mattdiamond/w4u7x8zk/
Here's the code that takes in the frequency data output from an AnalyserNode and generates a PeriodicWave:
function generatePeriodicWave(freqData) {
  const real = [];
  const imag = [];
  freqData.forEach((x) => {
    const amp = fromDecibels(x);      // dB -> linear amplitude
    const phase = getRandomPhase();   // random phase in [-PI, PI)
    real.push(amp * Math.cos(phase)); // polar -> Cartesian (real part)
    imag.push(amp * Math.sin(phase)); // polar -> Cartesian (imaginary part)
  });
  return context.createPeriodicWave(real, imag);
}

function fromDecibels(x) {
  return 10 ** (x / 20); // inverse of 20 * log10(amp)
}

function getRandomPhase() {
  return Math.random() * 2 * Math.PI - Math.PI;
}
Since the AnalyserNode converts the FFT amplitude values to decibels, we need to recover those original values first (which we do by simply using the inverse of the formula that was used to convert them to decibels). We also need to provide a phase for each frequency, which we select at random from the range -π to π.
Now that we have an amplitude and phase, we construct a complex number by multiplying the amplitude by the cosine and sine of the phase. This is because the amplitude and phase correspond to a polar coordinate, and createPeriodicWave expects a list of real and imaginary numbers corresponding to Cartesian coordinates in the complex plane. (See here for more information on the mathematics behind this conversion.)
Once we've generated the PeriodicWave, all that's left to do is load it into an OscillatorNode, set the desired frequency, and start the oscillator. You'll notice that the default frequency is set to context.sampleRate / FFT_SIZE (you can ignore the toFixed, that was just for the sake of the UI). This causes the oscillator to play the wave at the same rate as the original samples. Increasing or decreasing the frequency from this value will pitch-shift the audio up or down, respectively.
You'll also notice that I chose 2^15 as the FFT size, which is the maximum size that the AnalyserNode allows. For my purposes -- creating interesting looped drones -- a larger FFT results in a more interesting and less "loopy" drone. (A while back I created a webpage that allowed users to generate drones from much larger FFTs... that experiment utilized a third-party FFT library instead of the AnalyserNode.) I'm not sure if this is the right FFT size for your purposes, but it's something to consider.
Anyway, I think that covers the core of the algorithm. Hope this helps! (And feel free to ask more questions in the comments if anything's unclear.)
I'm working on homework for my machine learning course and am having trouble understanding the question on Naive Bayes. The problem I have is a variation of question number 2 on the following page:
https://www.cs.utexas.edu/~mooney/cs343/hw3-old/hw3.html
The numbers I have are slightly different, so I'll replace the numbers from my assignment with the example above. I'm currently attempting to figure out the probability that the first text is physics. To do so, I have something that looks a little like this:
P(physics|c) = P(physics) * P(carbon|physics) * P(atom|physics) * P(life|physics) * P(earth|physics) / [SOMETHING]
P(physics|c) = .35 * .005 * .1 * .001 * .005 / [SOMETHING]
I'm basing this off of an example that I've seen in my notes, but I can't seem to figure out what I'm supposed to divide by. I'll provide the example from the notes as well.
Perhaps I'm going about this in the wrong way, but I'm unsure where the P(X) term that we're dividing by is coming from. How does this relate to the probability that the text is physics? I feel that getting this issue resolved will make the remainder of the assignment simple.
The denominator P(X) is just the sum of P(X|Y)*P(Y) for all possible classes.
Now, it's important to note that in Naive Bayes, you do not have to compute this P(X). You only have to compute P(X|Y)*P(Y) for each class, and then select the class that produced the highest probability.
In your case, I assume you must have several classes. You mentioned physics, but there must be others like chemistry or math.
So you can compute:
P(physics|X) = P(X|physics) * P(physics) / P(X)
P(chemistry|X) = P(X|chemistry) * P(chemistry) / P(X)
P(math|X) = P(X|math) * P(math) / P(X)
P(X) is the sum of P(X|Y)*P(Y) for all classes:
P(X) = P(X|physics)*P(physics) + P(X|chemistry)*P(chemistry) + P(X|math)*P(math)
(By the way, the above statement is exactly analogous to the example in the image that you provided. The equations are a bit complicated there, but if you rearrange them, you will find that P(X) = P(X|positive)*P(positive) + P(X|negative)*P(negative) in that example).
To produce the answer (that is, to determine Y among physics, chemistry, or math), you would select the maximum value among P(physics|X), P(chemistry|X), and P(math|X).
As I mentioned, you do not need to compute P(X) because this term exists in the denominator of all of P(physics|X), P(chemistry|X), and P(math|X). Thus, you only need to determine the max among P(X|physics)*P(physics), P(X|chemistry)*P(chemistry), and P(X|math)*P(math).
The point is that you don't really need a value for P(X), because it is the same for all classes. So you can ignore it and just compare the numbers before the division step. The highest number gives the predicted class.
The reason it appears in the equation at all comes from Bayes' rule:
P(C1|X) = P(X|C1) * P(C1) / P(X)
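To make the "skip the denominator" point concrete, here is a small C sketch; the priors and likelihood products are made-up placeholders, not the assignment's numbers:
#include <stdio.h>

int main(void) {
    // Hypothetical unnormalized scores P(X|Y) * P(Y) for three classes.
    // These numbers are placeholders, not taken from the assignment.
    const char *classes[3] = { "physics", "chemistry", "math" };
    double score[3] = {
        0.35 * 0.0005, // P(X|physics)   * P(physics)
        0.40 * 0.0002, // P(X|chemistry) * P(chemistry)
        0.25 * 0.0010  // P(X|math)      * P(math)
    };

    // P(X) would divide every score by the same constant, so the argmax
    // of the unnormalized scores already picks the predicted class.
    int best = 0;
    for (int i = 1; i < 3; i++) {
        if (score[i] > score[best]) best = i;
    }
    printf("predicted class: %s\n", classes[best]);
    return 0;
}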
I would like to fill an array with normally distributed numbers. Is there any function in Objective-C or C that can produce the result easily, without any math?
Use the Box-Muller transformation:
1.) You need two uniformly distributed random numbers u and v as doubles in the interval (0,1] (0 must be excluded, so that log(u) is defined):
double u = (double)(random() % 100000 + 1) / 100000; // uniform in (0,1]
double v = (double)(random() % 100000 + 1) / 100000; // uniform in (0,1]
2.) Calculate the normally distributed value with mean 0 and standard deviation 1:
double x = sqrt(-2*log(u))*cos(2*M_PI*v); // or sin(2*M_PI*v)
3.) If needed, scale by the standard deviation and shift by the average of your target distribution:
double y = x * sigmaValue + averageValue;
4.) Put it in an array:
[randomNumberArray addObject:[NSNumber numberWithDouble:y]];
There is no norminv-style function in Objective-C, so some math is needed here.
Edit: I like using random() because the generator can be seeded (with srandom()) for reproducible sequences.
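Putting the steps together into a loop, a minimal C sketch (the sample count and the sigma/average values are illustrative):
#include <math.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srandom((unsigned)time(NULL)); // seed once, as mentioned in the edit

    double sigmaValue = 2.0;    // illustrative target standard deviation
    double averageValue = 10.0; // illustrative target average
    double samples[1000];

    for (int i = 0; i < 1000; i++) {
        double u = (double)(random() % 100000 + 1) / 100000; // uniform (0,1]
        double v = (double)(random() % 100000 + 1) / 100000; // uniform (0,1]
        double x = sqrt(-2 * log(u)) * cos(2 * M_PI * v);    // standard normal
        samples[i] = x * sigmaValue + averageValue;          // step 3
    }
    return 0;
}
In Objective-C, each samples[i] can then be boxed with [NSNumber numberWithDouble:...] and added to the NSMutableArray as in step 4.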
Let me preface this by saying, please, correct me if I'm wrong!
It's my understanding that the Box-Muller transformation relies on the source numbers being themselves uniformly distributed; thus, if random() or rand() is not a good uniform source, feeding it into Box-Muller will not necessarily produce a proper normal distribution.
The transform is intended to take uniformly distributed random numbers and produce pairs of independent, standard, normally distributed random numbers.
Wikipedia: Box-Muller Transform
There is however another way:
On most Unix systems (and thus in Objective-C on iOS or OS X) you can use the rand48 family of functions:
Reference: drand48
double drand48(void);
void srand48(long int seedval);
srand48() seeds the generator, and drand48() produces random numbers uniformly distributed over the interval [0.0, 1.0).
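Combining drand48() with the Box-Muller transform from the other answers might look like this sketch (note the 1.0 - drand48() shift, which keeps the logarithm away from zero):
#include <math.h>
#include <stdlib.h>
#include <time.h>

// Sketch: one standard normal sample from drand48() via Box-Muller.
double gaussFromDrand48(void) {
    double u = 1.0 - drand48(); // drand48() is in [0.0, 1.0); 1 - u is in (0.0, 1.0]
    double v = drand48();
    return sqrt(-2 * log(u)) * cos(2 * M_PI * v);
}

int main(void) {
    srand48((long)time(NULL)); // seed the 48-bit generator
    double g = gaussFromDrand48();
    (void)g; // use g as needed
    return 0;
}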
I've searched the "bgfg_gaussmix2.cpp" code; it says that in the Gaussian mixture model it stores the mixture weight (w), the mean (nchannels values), and the covariance for each Gaussian component of each pixel's background model. I want to know the order of storage; for instance, is it "weight, mean, covariance", or "mean, covariance, weight", or something else?
Thanks in advance.
If you are speaking about the Gaussian mixture structure CvPBGMMGaussian, the storage order is:
weight
mean dimension 1
mean dimension 2
mean dimension 3
variance
The three mean dimensions are packed in a float array.
Here is the definition of this structure:
#define CV_BGFG_MOG2_NDMAX 3

typedef struct CvPBGMMGaussian
{
    float weight;
    float mean[CV_BGFG_MOG2_NDMAX];
    float variance;
} CvPBGMMGaussian;
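If you ever need to read these values out of a raw float buffer, the struct is 5 consecutive floats (so no padding), and the field order above implies the following layout. This is a sketch; buf is a hypothetical pointer into the per-pixel model data:
#include <stdio.h>

// Sketch: interpret one packed Gaussian component from a raw float
// buffer, following the CvPBGMMGaussian field order above.
// `buf` is a hypothetical pointer into per-pixel model data.
void printComponent(const float *buf) {
    float weight   = buf[0];
    float mean0    = buf[1];
    float mean1    = buf[2];
    float mean2    = buf[3];
    float variance = buf[4];
    printf("w=%f mean=(%f, %f, %f) var=%f\n",
           weight, mean0, mean1, mean2, variance);
}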
If you are not speaking about this structure, please be more precise in your question.
Does it help classification if I add linear or non-linear combinations of the existing features? For example, does it help to add the mean and variance as new features computed from the existing features? I believe it definitely depends on the classification algorithm, as in the case of PCA the algorithm by itself generates new features that are orthogonal to each other and are linear combinations of the input features. But what is the effect in the case of decision-tree-based classifiers or others?
Yes, combinations of existing features can give new features and help classification. Moreover, a combination of a feature with itself (e.g. a polynomial in that feature) can be used as additional data during classification.
As an example, consider logistic regression classifier with such linear formula as its core:
g(x, y) = 1*x + 2*y
Imagine, that you have 2 observations:
x = 6; y = 1
x = 3; y = 2.5
In both cases g() will be equal to 8. If the observations belong to different classes, you have no way to distinguish them. But let's add one more variable (feature) z, which is a combination of the previous 2 features: z = x * y:
g(x, y, z) = 1*x + 2*y + 0.5*z
Now for same observations we have:
x = 6; y = 1; z = 6 * 1 = 6 ==> g() = 11
x = 3; y = 2.5; z = 3 * 2.5 = 7.5 ==> g() = 11.75
So now we get 2 different points and can distinguish between the 2 observations.
Polynomial features (x^2, x^3, y^2, etc.) do not give additional points, but instead change the graph of the function. For example, g(x) = a0 + a1*x is a line, while g(x) = a0 + a1*x + a2*x^2 is a parabola and thus can fit the data much more closely.
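A minimal C sketch of building such derived features by hand (the struct name and the particular combinations are illustrative):
#include <stdio.h>

// Sketch: expand a raw (x, y) observation into derived features.
// The chosen combinations (product, squares) are illustrative.
typedef struct {
    double x, y;   // original features
    double xy;     // interaction feature: x * y
    double x2, y2; // polynomial features: x^2, y^2
} FeatureVector;

FeatureVector expand(double x, double y) {
    FeatureVector f = { x, y, x * y, x * x, y * y };
    return f;
}

int main(void) {
    FeatureVector a = expand(6.0, 1.0); // first observation above
    FeatureVector b = expand(3.0, 2.5); // second observation above
    printf("a.xy = %g, b.xy = %g\n", a.xy, b.xy); // 6 vs 7.5: now separable
    return 0;
}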
In general, it's always better to have more features. Unless you have very predictive features (i.e. they allow for perfect separation of the classes to predict) already, I would always recommend adding more features. In practice, many classification algorithms (and in particular decision tree inducers) select the best features for their purposes anyway.
There are open-source Python libraries that automate feature creation and combination:
We can automate polynomial feature creation with sklearn.
We can automatically create spline features with sklearn.
We can combine features mathematically with Feature-engine. With MathFeatures we combine feature groups, and with RelativeFeatures we combine feature pairs.