Automata: Developing Context-Free Grammars

Can anyone give me the thought process involved in developing a context-free grammar for this? I'm given a language of strings consisting of some number of 0's followed by some number of 1's, where the number of 0's is not equal to the number of 1's. The 0's always come first, then the 1's (that should make things more straightforward). So acceptable strings would be 0000111 or 01111111.
I don't want you to just give me the direct answer, or the answer at all for that matter. Just the process of figuring it out.

Well, the direct answer which you don't want is:
S - initial symbol
S -> X | Y
X -> 0X1 | X1 | 1
Y -> 0Y1 | 0Y | 0
It's the first thing that comes to mind, so there isn't too much of a process. Anyway, I would say that the very first thing you must see is that there are two possibilities: either you have more ones or more zeroes, and it's good to divide the problem into these two cases (as I divided S into X and Y).
Then you see that "context free" makes it impossible to control the counts anywhere other than at the boundary between the zeroes and the ones. Then you just get the idea and write down the solution.
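If you want a quick sanity check once you have a candidate grammar, you can randomly expand it and test the invariants. Here is a small Python sketch of that idea (purely illustrative, using the grammar above):

import random

RULES = {
    "S": [["X"], ["Y"]],
    "X": [["0", "X", "1"], ["X", "1"], ["1"]],   # strictly more 1's than 0's
    "Y": [["0", "Y", "1"], ["0", "Y"], ["0"]],   # strictly more 0's than 1's
}

def derive(symbol="S"):
    if symbol not in RULES:                      # terminal symbol: "0" or "1"
        return symbol
    return "".join(derive(s) for s in random.choice(RULES[symbol]))

for _ in range(10):
    s = derive()
    zeros, ones = s.count("0"), s.count("1")
    assert zeros != ones                         # the counts are never equal
    assert s == "0" * zeros + "1" * ones         # all 0's come before all 1's
    print(s)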

Related

Trying to understand the Viterbi algorithm a bit better

I'm currently trying to implement the Viterbi algorithm in Python, more specifically the version presented in an online course.
As it stands, the algorithm is presented this way:
given a sentence with K tokens, we have to generate K tags.
We assume that tag_-1 = tag_-2 = '*', then for k going from 0 to K,
we set the tag for the k-th token as follows:
tag(WORD_k) = argmax(p(k-1, tag_k-2, tag_k-1) * e(word_k, tag_k) * q(tag_k, tag_k-1, tag_k-1))
From my understanding this is straightforward, because the p parameters are already calculated at each step (we go from 1 forward, and we already know p0), and the max for the e and q params can be calculated by one iteration through the tags (since we can't come up with 2 different tags, we basically have to find the tag T for which the q * e product is maximal, and return that). This saves a lot of time, since we are almost at linear time in terms of big-O notation, instead of the exponential complexity we would get if we iterated through all possible word/tag combinations.
Am I getting the core of the algorithm correctly or am I missing something out?
Thanks in advance
since we can't come up with 2 different tags, we basically have to
find the tag T for which the q * e product is maximal, and return that
Yeah, sounds about right. q is the trigram (transition) probability and e is the emission probability. As you said, e(word_k | tag_k) is unchanged between the different paths at each stage, so the max only depends on the other two factors.
Each tag sequence should start with two asterisks at positions -2 and -1, so the first assumption is correct:

tag_-2 = tag_-1 = '*'

If we define pi(k, u, v) to be the maximum probability of a tag sequence whose last two tags, at position k, are u and v, then, based on what we just said about the beginning asterisks, the base case would be

pi(-1, '*', '*') = 1

You had two errors in the general case though. The emission probability is a conditional, e(word_k | tag_k), not e(word_k, tag_k). Also in the trigram, tag_k-1 is repeated two times, so the formula given is incorrect; the corrected recurrence is

pi(k, u, v) = max over w of [ pi(k-1, w, u) * q(v | w, u) * e(word_k | v) ]
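Here is a minimal sketch of that recurrence in Python, assuming q and e are plain dicts of precomputed probabilities (q[(v, w, u)] standing in for p(v | w, u), e[(word, v)] for p(word | v)) and that the sentence has at least two tokens; all names here are illustrative:

def viterbi(words, tagset, q, e):
    K = len(words)
    pi = {(-1, "*", "*"): 1.0}       # base case: tags at positions -2, -1 are '*'
    bp = {}                          # backpointers: best tag w two positions back

    def tags_at(k):                  # before position 0, only '*' is possible
        return ("*",) if k < 0 else tagset

    for k in range(K):
        for u in tags_at(k - 1):
            for v in tagset:
                best, arg = 0.0, None
                for w in tags_at(k - 2):
                    p = (pi.get((k - 1, w, u), 0.0)
                         * q.get((v, w, u), 0.0)
                         * e.get((words[k], v), 0.0))
                    if p > best:
                        best, arg = p, w
                pi[(k, u, v)], bp[(k, u, v)] = best, arg

    # Pick the best final tag pair, then walk the backpointers.
    u, v = max(((u, v) for u in tags_at(K - 2) for v in tagset),
               key=lambda uv: pi[(K - 1, uv[0], uv[1])])
    tags = [None] * K
    tags[K - 2], tags[K - 1] = u, v
    for k in range(K - 3, -1, -1):
        tags[k] = bp[(k + 2, tags[k + 1], tags[k + 2])]
    return tags

A full tagger would also multiply in a stop probability q(STOP | u, v) when choosing the final pair, and would work in log space to avoid underflow; both are left out here to keep the recurrence visible.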

Finding the largest prime factor of 600851475143 [duplicate]

I'm trying to use a program to find the largest prime factor of 600851475143. This is for Project Euler here: http://projecteuler.net/problem=3
I first attempted this with this code:
#Ruby solution for http://projecteuler.net/problem=3
#Prepared by Richard Wilson (Senjai)
#We'll keep to our functional style of approaching these problems.
def gen_prime_factors(num) # generate the prime factors of num and return them in an array
  result = []
  2.upto(num-1) do |i| #ASSUMPTION: num > 3
    #test if num is evenly divisible by i; if so, add it to the result.
    if num % i == 0
      result.push i
      puts "Prime factor found: #{i}" # status update so we know there wasn't a crash
    end
  end
  result #Implicit return
end

#Print the largest prime factor of 600851475143. This will always be the last value in the array, so:
puts gen_prime_factors(600851475143).last #this might take a while
This is great for small numbers, but for large numbers it would take a VERY long time (and a lot of memory).
Now I took university calculus a while ago, but I'm pretty rusty and haven't kept up on my math since.
I don't want a straight up answer, but I'd like to be pointed toward resources or told what I need to learn to implement some of the algorithms I've seen around in my program.
There are a couple of problems with your solution. First of all, you never test that i is prime, so you're only finding the largest factor of the big number, not the largest prime factor. There's a Ruby library you can use: just require 'prime', and you can add an && i.prime? to your condition.
That'll fix the inaccuracy in your program, but it'll still be slow and expensive (in fact, it'll now be even more expensive). One obvious thing you can do is set result = i rather than doing result.push i: since you ultimately only care about the last viable i you find, there's no reason to maintain a list of all the prime factors.
Even then, however, it's still very slow. The correct program should complete almost instantly. The key is to shrink the number you're testing up to, each time you find a prime factor. If you've found a prime factor p of your big number, then you don't need to test all the way up to the big number anymore. Your "new" big number that you want to test up to is what's left after dividing p out from the big number as many times as possible:
big_number = big_number/p**n
where n is the largest integer such that the right hand side is still a whole number. In practice, you don't need to explicitly find this n, just keep dividing by p until you stop getting a whole number.
Finally, as a SPOILER I'm including a solution below, but you can choose to ignore it if you still want to figure it out yourself.
require 'prime'

max = 600851475143; test = 3
while (max >= test) do
  if (test.prime? && (max % test == 0))
    best = test
    max = max / test
  else
    test = test + 2
  end
end
puts "Here's your number: #{best}"
Exercise: Prove that test.prime? can be eliminated from the if condition. [Hint: what can you say about the smallest (non-1) divisor of any number?]
Exercise: This algorithm is slow if we instead use max = 600851475145. How can it be improved to be fast for either value of max? [Hint: Find the prime factorization of 600851475145 by hand; it's easy to do and it'll make it clear why the current algorithm is slow for this number]

Why does this code cause the machine to crash?

I am trying to run this code but it keeps crashing:
log10(x):=log(x)/log(10);
char(x):=floor(log10(x))+1;
mantissa(x):=x/10**char(x);
chop(x,d):=(10**char(x))*(floor(mantissa(x)*(10**d))/(10**d));
rnd(x,d):=chop(x+5*10**(char(x)-d-1),d);
d:5;
a:10;
Ibwd:[[30,rnd(integrate((x**60)/(1+10*x^2),x,0,1),d)]];
for n from 30 thru 1 step -1 do Ibwd:append([[n-1,rnd(1/(2*n-1)-a*last(first(Ibwd)),d)]],Ibwd);
Maxima crashes when it evaluates the last line. Any ideas why it may happen?
Thank you so much.
The problem is that the difference becomes negative and your rounding function dies horribly with a negative argument. To find this out, I changed your loop to:
for n from 30 thru 1 step -1 do
block([],
print (1/(2*n-1)-a*last(first(Ibwd))),
print (a*last(first(Ibwd))),
Ibwd: append([[n-1,rnd(1/(2*n-1)-a*last(first(Ibwd)),d)]],Ibwd),
print (Ibwd));
The last difference printed before everything fails miserably is -316539/6125000. So now try
rnd(-1,3)
and see the same problem. This all stems from the fact that you're taking the log of a negative number, which Maxima interprets as a complex number by analytic continuation. Maxima doesn't evaluate this until it absolutely has to and, somewhere in the evaluation code, something's dying horribly.
I don't know the "fix" for your specific example, since I'm not exactly sure what you're trying to do, but hopefully this gives you enough info to find it yourself.
If you want to deconstruct a floating point number, let's first make sure that it is a bigfloat; say z: 34.1b0.
You can access the parts of a bigfloat by using Lisp, and you can also access the mantissa length in bits via ?fpprec.
Thus ?second(z)*2^(?third(z)-?fpprec) gives you:
4799148352916685/140737488355328
and bfloat(%) gives you:
3.41b1.
If you want the mantissa of z as an integer, look at ?second(z).
Now I am not sure what it is that you are trying to accomplish in base 10, but Maxima does not do internal arithmetic in base 10. If you want more bits or fewer, you can set fpprec, which is linked to ?fpprec. fpprec is the "approximate base 10" precision. Thus fpprec is initially 16, and ?fpprec is correspondingly 56. You can easily change them both; e.g. fpprec:100 corresponds to an ?fpprec of 335.
If you are diddling around with float representations, you might benefit from knowing that you can look at the underlying Lisp form by typing, for example, ?print(z), which prints the internal form using the Lisp print function. You can also trace any function, your own or a system function, with trace. For example, you could consider doing this:
trace(append,rnd,integrate);
If you want to use machine floats, I suggest you use, for the last line,
for n from 30 thru 1 step -1 do
    Ibwd:append([[n-1,rnd(1/(2.0*n - 1.0)-a*last(first(Ibwd)),d)]],Ibwd);
Note the decimal points. But even that is not quite enough, because integration
inserts exact structures like atan(10). Trying to round these things, or compute log
of them is probably not what you want to do. I suspect that Maxima is unhappy because log is given some messy expression that turns out to be negative, even though it initially thought otherwise. It hands the number to the lisp log program which is perfectly happy to return an appropriate common-lisp complex number object. Unfortunately, most of Maxima was written BEFORE LISP HAD COMPLEX NUMBERS.
Thus the result (log -0.5)= #C(-0.6931472 3.1415927) is entirely unexpected to the rest of Maxima. Maxima has its own form for complex numbers, e.g. 3+4*%i.
In particular, the Maxima display program predates the common lisp complex number format and does not know what to do with it.
The error (stack overflow !!!) is from the display program trying to display a common lisp complex number.
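Incidentally, the same split between a real-valued and a complex-valued log is easy to see in Python; this is only an analogy to the Lisp behavior described above, not Maxima's actual code path:

import math, cmath

print(cmath.log(-0.5))        # (-0.6931471805599453+3.141592653589793j)
try:
    math.log(-0.5)            # the real-valued log rejects negative input
except ValueError as err:
    print("math.log failed:", err)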
How to fix all this? Well, you could try changing your program so it computes what you really want, in which case it probably won't trigger this error. Maxima's display program should be fixed, too. Also, I suspect there is something unfortunate in simplification of logs of numbers that are negative but not obviously so.
This is probably waaay too much information for the original poster, but maybe the paragraph above will help out and also possibly improve Maxima in one or more places.
It appears that your program triggers an error in Maxima's simplification (algebraic identities) code. We are investigating and I hope we have a bug fix soon.
In the meantime, here is an idea. Looks like the bug is triggered by rnd(x, d) when x < 0. I guess rnd is supposed to round x to d digits. To handle x < 0, try this:
rnd(x, d) := if x < 0 then -rnd1(-x, d) else rnd1(x, d);
rnd1(x, d) := (... put the present definition of rnd here ...);
When I do that, the loop runs to completion and Ibwd is a list of values, but I don't know what values to expect.

Package for fast determination of similarity between two bit sequences

I need to compare a query bit sequence with a database of up to a million bit sequences. All bit sequences are 100 bits long. I need the lookup to be as fast as possible. Are there any packages out there for fast determination of the similarity between two bit sequences? --Edit-- The bit sequences are position sensitive.
I have seen a possible algorithm on Bit Twiddling Hacks but if there is a ready made package that would be better.
If the database is rather static, you may want to build a tree data structure on it.
Search the tree recursively (or in multiple threads) and keep a running difference count per search. If the difference becomes greater than what you would consider 'similar', abort the search.
E.g. suppose we have the following tree, where each level corresponds to one bit position:

              root
       0               1
   0       1       0       1
  0 1     0 1     0 1     0 1
If you want to look for patterns similar to 011, and only want to allow at most 1 different bit, search like this (recursively or multi-threaded):
Start at the root.
Take the left branch (0): this matches the first bit, so the difference is still 0.
Take the left branch (0): this differs from the second bit, so the difference becomes 1, which is still acceptable.
Take the left branch (0): this differs from the third bit, so the difference becomes 2, which is too high. Abort looking in this branch.
Take the right branch (1): this matches, so the difference remains 1; continue searching this branch (not shown here).
Back at the second level, take the right branch (1): this matches the second bit, so the difference remains 0; go on.
Take the left branch (0): this differs from the third bit, so the difference becomes 1, which is still acceptable; go on.
This goes on until you have found your bit patterns.
If your bit patterns are more dynamic and being updated in your application, you will have to update the tree.
If memory is a problem, consider going to 64-bit.
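Here is a rough Python sketch of that pruned tree walk, assuming the patterns are stored bit-by-bit in a nested-dict trie; all names are illustrative:

def insert(trie, bits):
    node = trie
    for b in bits:
        node = node.setdefault(b, {})
    node["end"] = True                       # marks a complete stored pattern

def search(node, bits, max_diff, diff=0, prefix=()):
    if diff > max_diff:                      # prune: subtree is already too different
        return []
    if not bits:                             # the whole query has been consumed
        return [prefix] if "end" in node else []
    found = []
    for b in (0, 1):                         # follow both branches, counting mismatches
        if b in node:
            step = 0 if b == bits[0] else 1
            found += search(node[b], bits[1:], max_diff, diff + step, prefix + (b,))
    return found

trie = {}
for pattern in [(0, 1, 1), (0, 0, 1), (1, 1, 1)]:
    insert(trie, pattern)
print(search(trie, (0, 1, 1), max_diff=1))   # [(0, 0, 1), (0, 1, 1), (1, 1, 1)]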
If you want to look up, let's say, the 50 most-matching patterns, and we can assume that the input data set is rather static (or can be dynamically updated), you can repeat the initial phase of my other answer, so:
For every bit pattern, count the bits.
Store the bit patterns in a multi_map (if you use STL; Java probably has something similar), keyed by the bit count.
Then, use the following algorithm:
Make 2 collections: one for storing the found patterns, and one for storing possibly-good patterns (this second collection should probably be a map, mapping 'distances' to patterns).
Take your own pattern and count its bits; assume this count is N.
Look in the multimap at index N: all these patterns will have the same bit count, but not necessarily be completely identical.
Compare all the patterns at index N. If they are equal, store the result in the first collection. If they are not equal, store the result in the second collection/map, using the difference as key.
Look in the multimap at index N-1: all these patterns will have a distance of 1 or more.
Compare all the patterns at index N-1. If they have a distance of 1, store them in the first collection. If they have a larger distance, store the result in the second collection/map, using the difference as key.
Repeat for index N+1.
Now look in the second collection/map and see if anything is stored with distance 1. If so, remove it from the second collection/map and store it in the first collection.
Repeat this for distance 2, distance 3, ... until you have enough patterns.
If the number of required patterns is not too big, and the average distance is also not too big, then the number of real compares between patterns will probably be only a few percent of the total. Unfortunately, since the patterns will be distributed along a Gaussian curve, there will still be quite a few patterns to check. I didn't do a mathematical check on it, but in practice, if you don't want too many patterns out of the millions, and the average distance is not too far, you should be able to find the set of closest patterns by checking only a few percent of the total bit patterns.
Please keep me updated of your results.
I came up with a second alternative.
For every one of the million bit patterns, count the number of set bits and store the bit pattern in an STL multi_map (if you're writing in C++), keyed by that count.
Then count the number of bits in your pattern. Suppose you have N bits set in your bit pattern.
If you now want to allow at most D differences, look up all the bit patterns in the multi_map having N-D, N-D+1, ..., N-1, N, N+1, ... N+D-1, N+D bits.
Unfortunately, the distribution of bit patterns over the multi_map buckets will follow a Gaussian curve, which means that in practice you will still have to compare quite a few bit patterns.
(Originally I thought this could be solved by counting even 0's and uneven 1's, but this isn't true.)
Assuming that you want to allow 1 difference, you only have to look up 3 of the roughly 100 possible slots in the multi_map, leaving you with about 3% of the actual bit patterns to do a full compare on.
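A small Python sketch of this bucketing idea, with the patterns held as plain ints and a dict of lists standing in for the STL multi_map (all names illustrative):

from collections import defaultdict

def build_index(patterns):
    index = defaultdict(list)                # bit count -> patterns with that count
    for p in patterns:
        index[bin(p).count("1")].append(p)
    return index

def similar(index, query, max_dist):
    n = bin(query).count("1")
    hits = []
    # Only buckets whose bit count is within max_dist of n can contain matches.
    for count in range(max(0, n - max_dist), n + max_dist + 1):
        for p in index.get(count, []):
            if bin(p ^ query).count("1") <= max_dist:    # Hamming distance via XOR
                hits.append(p)
    return hits

idx = build_index([0b0110, 0b0111, 0b1111, 0b0000])
print(similar(idx, 0b0110, max_dist=1))      # [6, 7], i.e. 0b0110 itself and 0b0111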

Precision of reals through writeln/readln in Delphi

My client's application exports and imports quite a few variables of type real through a text file using writeln and readln. I've tried to increase the width of the fields written, so the code looks like:
writeln(file, exportRealvalue:30); //using excess width of field
....
readln(file, importRealvalue);
When I export, then import, then export again and compare the files, I get a difference in the last two digits, e.g. (I might be off on the actual number of digits here, but you get the idea):
-1.23456789012E-0002
-1.23456789034E-0002
This actually makes a difference in the app, so the client wants to know what I can do about it. Now, I'm not sure it's only the write/read that does it, but I thought I'd throw a quick question out there before I dive into the haystack again. Do I need to go binary on this?
This is not an app dealing with currency or something, I just write and read the values to/from file. I know floating points are a bit strange sometimes and I thought one of the routines (writeln/readln) may have some funny business going on.
You might try switching to Extended for greater precision. As was pointed out, though, floating point numbers only have so many significant digits of precision, so it is still possible to display more digits than are accurately stored, which could result in the behavior you described.
From the Delphi help:
Fundamental Win32 real types

Type     | Range                            | Significant | Size in
         |                                  | digits      | bytes
---------+----------------------------------+-------------+---------
Real     | -5.0 x 10^-324 .. 1.7 x 10^308   | 15-16       | 8
Real48   | -2.9 x 10^-39 .. 1.7 x 10^38     | 11-12       | 6
Single   | -1.5 x 10^-45 .. 3.4 x 10^38     | 7-8         | 4
Double   | -5.0 x 10^-324 .. 1.7 x 10^308   | 15-16       | 8
Extended | -3.6 x 10^-4951 .. 1.1 x 10^4932 | 10-20       | 10
Comp     | -2^63+1 .. 2^63-1                | 10-20       | 8
Currency | -922337203685477.5808 ..         | 10-20       | 8
         |  922337203685477.5807            |             |
Note: The six-byte Real48 type was called Real in earlier versions of Object Pascal. If you are recompiling code that uses the older, six-byte Real type in Delphi, you may want to change it to Real48. You can also use the {$REALCOMPATIBILITY ON} compiler directive to turn Real back into the six-byte type. The following remarks apply to fundamental real types.
Real48 is maintained for backward compatibility. Since its storage format is not native to the Intel processor architecture, it results in slower performance than other floating-point types.
Extended offers greater precision than other real types but is less portable. Be careful using Extended if you are creating data files to share across platforms.
Notice that the range is greater than the significant digits can represent, so you can have a number larger than can be accurately stored. I would recommend rounding to the significant digits to prevent that from happening.
If you want to specify the precision of a real with a WriteLn, use the following:
WriteLn(RealVar:12:3);
It outputs the value RealVar with a total width of at least 12 characters and a precision of 3.
When using floating point types, you should be aware of the precision limitations of the specified types. A 4-byte IEEE 754 type, for instance, has only about 7.5 significant digits of precision. An 8-byte IEEE 754 type has roughly double that number of significant digits. Apparently, the older Delphi Real (Real48) type has a precision of around 11 significant digits. The result of this is that any extra digits of formatting that you specify are likely to be noise resulting from conversions between base-10 formatted values and base-2 floating point values.
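The round-trip problem itself is language-independent. Here is a small Python illustration (not Delphi-specific) of why too few formatted digits breaks a write/read cycle for an 8-byte double, which needs up to 17 significant decimal digits to survive the trip:

x = 1.0 / 3.0

short = "%.11e" % x            # 12 significant digits: not enough for a double
exact = "%.17g" % x            # 17 significant digits always round-trip

print(float(short) == x)       # False: digits were lost on the way out
print(float(exact) == x)       # True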
First of all I would try to see if I could get any help from using Str with different arguments or increasing the precision of the types in your app. (Have you tried using Extended?)
As a last resort (warning: workaround!), I'd try saving the customer's string representation along with the binary representation in a sorted list. Before writing back a floating point value, I'd check whether there is already a matching value in the table whose string representation is known and can be used instead. To make this lookup quick, you can sort the list on the numeric value and use binary search to find the best match.
Depending on how much processing you need to do, an alternative could be to keep the numbers in BCD format to retain original accuracy.
It's hard to answer this without knowing what type your ExportRealValue and ImportRealValue are. As others have mentioned, the real types all have different precisions.
It's worth noting, contrary to common belief, that Extended is not always higher precision: Extended is 10-20 significant figures, whereas Double is 15-16. As you are having trouble around the tenth significant figure, perhaps you are using Extended already.
To get more control over the reading and writing you can convert the numbers to and from strings yourself and write them to a file stream. At least that way you don't have to worry if readln and writeln are up to no good behind your back.
