Huffman coding proof on an 8-bit sequence [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
A data file contains a sequence of 8-bit characters such that all 256 characters are about equally common: the maximum character frequency is less than twice the minimum character frequency. Prove that Huffman coding in this case is no more efficient than an ordinary 8-bit fixed-length code.

The proof is direct. Assume w.l.o.g. that the characters are sorted in ascending order of frequency, f(1) <= f(2) <= ... <= f(256). We know that f(1) and f(2) will be joined first into f'(1) = f(1) + f(2). Since f(2) >= f(1) and 2*f(1) > f(256), we have f'(1) >= 2*f(1) > f(256), so f'(1) won't be joined again until after f(256) has been joined with something. By the same token, f(3) and f(4) will be joined into f'(2) with f'(2) >= f'(1) > f(256). Continuing thusly, we get f(253) and f(254) joined into f'(127) >= ... >= f'(1) > f(256). Finally, f(255) and f(256) are joined into f'(128) >= f'(127) >= ... >= f'(1). We now recognize that since f(256) < 2*f(1) <= f'(1) and f'(128) <= 2*f(256), we get f'(128) <= 2*f(256) < 4*f(1) <= 2*f'(1). Ergo, f'(128) < 2*f'(1), the same condition that held for the first round of the Huffman algorithm.
Since the condition holds on this round, it is straightforward to argue by induction that it will hold on all rounds. Huffman will perform 8 rounds, halving the number of nodes each time (128, 64, 32, 16, 8, 4, 2, 1), until all nodes are joined into one, the root, at which point the algorithm terminates. Since at each stage each node is joined to another node that has, to that point, received the same treatment by the Huffman algorithm, every branch of the tree will have the same length: 8. Every character therefore gets an 8-bit codeword, so the Huffman code is exactly as long as the ordinary fixed-length 8-bit code and no more efficient.
This is somewhat informal, more of a sketch than a proof, really, but it should be more than enough for you to write something more formal.
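A quick numerical sanity check of the argument is to run Huffman's algorithm on 256 frequencies that satisfy max < 2*min and look at the resulting code lengths. The sketch below is a plain heap-based Huffman construction using Python's standard heapq module; the function name is hypothetical, and the random frequencies in [100, 199] are an arbitrary test case, not data from the question.

```python
import heapq
import random


def huffman_code_lengths(freqs):
    """Return the Huffman codeword length (tree depth) of each symbol."""
    # Heap entries: (subtree weight, unique tie-breaker, symbols in the subtree).
    # The tie-breaker keeps heapq from ever comparing the symbol lists.
    heap = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    next_id = len(freqs)
    while len(heap) > 1:
        # Huffman step: merge the two lightest subtrees.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for s in merged:
            lengths[s] += 1  # every symbol below the new node is one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return lengths


random.seed(0)
# Arbitrary test case: 256 frequencies in [100, 199], so max < 2 * min.
freqs = [random.randint(100, 199) for _ in range(256)]
lengths = huffman_code_lengths(freqs)
print(set(lengths))  # {8}: every character gets an 8-bit codeword
```

Feeding it frequencies that violate the condition, for example one character far more common than all the others, produces codewords of unequal length, which is where Huffman coding starts to pay off.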

Related

Finding the largest prime factor of 600851475143 [duplicate]

This question already has answers here:
Project Euler #3 in Ruby solution times out
(2 answers)
Closed 9 years ago.
I'm trying to use a program to find the largest prime factor of 600851475143. This is for Project Euler here: http://projecteuler.net/problem=3
I first attempted this with this code:
#Ruby solution for http://projecteuler.net/problem=3
#Prepared by Richard Wilson (Senjai)
#We'll keep to our functional style of approaching these problems.
def gen_prime_factors(num) # generate the prime factors of num and return them in an array
  result = []
  2.upto(num-1) do |i| #ASSUMPTION: num > 3
    #test if num is evenly divisible by i, if so add it to the result.
    result.push i if num % i == 0
    puts "Prime factor found: #{i}" # get some status updates so we know there wasn't a crash
  end
  result #Implicit return
end
#Print the largest prime factor of 600851475143. This will always be the last value in the array so:
puts gen_prime_factors(600851475143).last #this might take a while
This is great for small numbers, but for large numbers it would take a VERY long time (and a lot of memory).
Now I took university calculus a while ago, but I'm pretty rusty and haven't kept up on my math since.
I don't want a straight up answer, but I'd like to be pointed toward resources or told what I need to learn to implement some of the algorithms I've seen around in my program.
There are a couple of problems with your solution. First of all, you never test that i is prime, so you're only finding the largest factor of the big number, not the largest prime factor. There's a Ruby library you can use: just require 'prime', and you can add && i.prime? to your condition.
That'll fix the inaccuracy in your program, but it'll still be slow and expensive (in fact, it'll now be even more expensive). One obvious thing you can do is set result = i rather than doing result.push i: since you ultimately only care about the last viable i you find, there's no reason to maintain a list of all the prime factors.
Even then, however, it's still very slow. The correct program should complete almost instantly. The key is to shrink the number you're testing up to, each time you find a prime factor. If you've found a prime factor p of your big number, then you don't need to test all the way up to the big number anymore. Your "new" big number that you want to test up to is what's left after dividing p out from the big number as many times as possible:
big_number = big_number/p**n
where n is the largest integer such that the right hand side is still a whole number. In practice, you don't need to explicitly find this n, just keep dividing by p until you stop getting a whole number.
Finally, as a SPOILER I'm including a solution below, but you can choose to ignore it if you still want to figure it out yourself.
require 'prime'
max = 600851475143; test = 3
while (max >= test) do
  if (test.prime? && (max % test == 0))
    best = test
    max = max / test
  else
    test = test + 2
  end
end
puts "Here's your number: #{best}"
Exercise: Prove that test.prime? can be eliminated from the if condition. [Hint: what can you say about the smallest (non-1) divisor of any number?]
Exercise: This algorithm is slow if we instead use max = 600851475145. How can it be improved to be fast for either value of max? [Hint: Find the prime factorization of 600851475145 by hand; it's easy to do and it'll make it clear why the current algorithm is slow for this number]
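For comparison, here is a minimal sketch of the "keep dividing by p until you stop getting a whole number" step described above. It is written in Python rather than Ruby, and the function name is hypothetical. Unlike the spoiler, it starts at 2 and uses an inner loop to strip each factor out completely, so it also handles even inputs. It makes no attempt at the speed-up the second exercise asks about.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (n >= 2) by trial division with a shrinking n."""
    largest = None
    candidate = 2
    while n > 1:
        if n % candidate == 0:
            largest = candidate
            # Divide candidate out as many times as possible (n = n / p**k),
            # which shrinks the number still left to test.
            while n % candidate == 0:
                n //= candidate
        candidate += 1
    return largest


print(largest_prime_factor(600851475143))  # 6857
```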

How to structure data for an exercise app [closed]

Closed 9 years ago.
This seems like it should be simple to create, but I've had my gears turning on how exactly to set it up. Basically, I want users to be able to save and update their exercise data as they use the app. They will also be able to go back and review their exercise results for any day. I am currently considering using property lists for this, but is there a better way?
Example:
04-28-12
-- Exercise: Bench Press. Set 1: 115 lbs x 12 reps, Set 2: 125 lbs x 10 reps, Set 3: 130 lbs x 8 reps
-- Exercise: Squats. Set 1: 215 lbs x 10 reps, Set 2... etc.
I really appreciate any input that you guys have on this!
Thanks.
One way to do this is to use nested NSArrays and NSDictionaries, all of which can be stored in a property list.
Then your structure would be like:
People (NSArray)
  |
  --> Exercises (NSArray)
        |
        --> Sets (NSArray)
              |
              --> 0 (NSDictionary)
                    |
                    --> Weight (key): 115 (value)
                    |
                    --> Repetitions (key): 12 (value)
                    |
                    --> Type (key): 1 (value; an integer mapped to an exercise type such as Squats)
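To make the nesting concrete, the sketch below builds the same shape (arrays of arrays of dictionaries, with the Weight, Repetitions and Type keys from the diagram) and round-trips it through a property list. In the app itself this would be NSArray/NSDictionary in Objective-C; Python's standard plistlib module is used here only to show that such a structure serializes to a plist. The EXERCISE_TYPES mapping and the make_set helper are hypothetical, and the numbers come from the question's example.

```python
import plistlib

# Hypothetical mapping for the integer "Type" value (not part of any API).
EXERCISE_TYPES = {1: "Bench Press", 2: "Squats"}


def make_set(weight, repetitions, exercise_type):
    """One set as a dictionary (the NSDictionary level of the diagram)."""
    return {"Weight": weight, "Repetitions": repetitions, "Type": exercise_type}


# People (array) -> Exercises (array) -> Sets (array) -> set dictionaries
people = [
    [   # exercises for the first (and only) person
        [make_set(115, 12, 1), make_set(125, 10, 1), make_set(130, 8, 1)],  # bench press
        [make_set(215, 10, 2)],                                             # squats
    ],
]

data = plistlib.dumps(people)    # serialize to an XML property list (bytes)
restored = plistlib.loads(data)  # parse it back
print(restored == people)        # True
print(EXERCISE_TYPES[restored[0][1][0]["Type"]])  # Squats
```

The deeper this nesting gets, the more index juggling such as restored[0][1][0] is needed, which illustrates why a larger data set becomes cumbersome in a single property list.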
This can get pretty cumbersome once you have a lot of data in this one property list, though, so you might want to consider using Core Data instead (http://www.raywenderlich.com/934/core-data-on-ios-5-tutorial-getting-started).

Examples of very concise Forth applications? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
In this talk, Chuck Moore (the creator of Forth) makes some very bold, sweeping claims, such as:
"Every application that I have seen that I didn't code has ten times as much code in it as it needs"
"About a thousand instructions seems about right to me to do about anything"
"If you are writing code that needs [local variables] you are writing non-optimal code. Don't use local variables."
I'm trying to figure out whether Mr. Moore is a) an absolutely brilliant genius or b) a crackpot. But that's a subjective question, and I'm not looking for the answer to that question here. What I am looking for are examples of complex, real-world problems which can be solved in "1000 instructions or less" using Forth, and source code demonstrating how to do so. An example showing just one non-trivial piece of a real-world system would be fine, but no "toy" code samples which could be duplicated in 5 or 10 lines of another high-level language, please.
If you have written real-world systems in Forth, using just a small amount of source code, but aren't at liberty to show the source (because it is proprietary), I'd still like to hear about it.
You need to understand that Chuck Moore is a little different from you and me. He was trained in an era when mainframe computers had 16 KB (or its equivalent) of core memory, and he was able to do quite a lot with the computers of the time. Perhaps the biggest success for Forth, outside of his OKAD-II chip design package (that's not a typo), was a multi-user, multi-tasking Forth system at NRAO that concurrently controlled data acquisition instruments and ran data analysis/visualization software on a fairly modest computer that could barely compile Fortran source code on its own.
What he calls an "application", we might consider a "component" of a larger, more nebulous thing called an application. More generally, it's good to keep in mind that one Moore "application" is roughly equivalent to one "view" in an MVC triad today. To keep memory consumption small, he relies heavily on overlays and just-in-time compilation techniques. Switching from one program interface to another typically involves recompiling the entire application/view from source, and this happens so fast you don't notice it. It's much like how Android today recompiles Dalvik code to native ARM code every time you launch an application.
At any given time, OKAD-II has no more than about 2.5 KB of its code loaded into memory and running. The on-disk source for OKAD-II is considerably larger than 2.5 KB, though it is still significantly more compact than its nearest competitor, SPICE.
I'm often curious about Chuck Moore's views and find his never-ending striving for simplicity fascinating. So, in MythBusters fashion, I put his claims to the test by designing my own system as minimally as I could. I'm happy to report that he's very nearly spot-on in his claims, on both hardware and software issues. Case in point: during last September's Silicon Valley Forth Interest Group (SVFIG) meeting, I used my Kestrel-2 itself to generate video for the slide deck. This required me to write a slide presentation program for it, which took up 4 KB of memory for the code and 4 KB for the slide deck data structures. With an average of six bytes per Forth word (for reasons I won't go into here), the estimate of "about 1000 (Forth) instructions" for the application is just about spot-on with what Chuck Moore estimates his own "applications" to be.
If you're interested in speaking to real-world Forth coders (or people who coded in Forth in the past, as increasingly seems to be the case), and you happen to be in the Bay Area, the Silicon Valley Forth Interest Group still meets on the fourth Saturday of every month, except in November and December, when it meets on the third Saturday. If you're interested in attending a meeting, even if only to interview Forth coders and get a taste of what "real-world" Forth is like, check us out on meetup.com and tag along. We also now stream our meetings on YouTube, but we're not very good at it: we're abusing inappropriate hardware and software to do our bidding, since we have a budget of zero for this sort of thing. :)
Forth is indeed amazingly compact! Words without formal parameters (and zero-operand instructions at the hardware level, e.g. on the GA144) save a lot. The other main contributor to its compactness is the absolutely relentless factoring of redundant code that its calling convention and concatenative nature afford.
I don't know if it qualifies as a non-toy example, but the Turtle Graphics implementation for the Fignition (in FigForth) is just 307 bytes compiled and fits in a single source block! This includes the fixed-point trig and all the normal turtle commands. It isn't the best example of readable Forth, because it was squeezed into a single source block with single-character names and such:
\ 8.8 fixed point sine table lookup
-2 var n F9F2 , E9DD , CEBD , AA95 , 7F67 , 4E34 , 1A c,
: s abs 3C mod dup 1D > if 3C swap - then dup E > if
-1 1E rot - else 1 swap then n + c# 1+ * ;
0 var x 0 var y 0 var a
0 var q 0 var w
: c 9380 C80 0 fill ; \ clear screen
: k >r 50 + 8 << r> ! ;
: m dup q # * x +! w # * y +! ; \ move n-pixels (without drawing)
: g y k x k ; \ go to x,y coord
: h dup a ! dup s w ! 2D + s q ! ; \ heading
: f >r q # x # y # w # r 0 do >r r + >r over + \ forward n-pixels
dup 8 >> r 8 >> plot r> r> loop o y ! x ! o r> o ;
: e key 0 vmode cls ; \ end
: b 1 vmode 1 pen c 0 0 g 0 h ; \ begin
: t a # + h ; \ turn n-degrees
Using it is extremely concise as well.
: sin 160 0 do i i s 4 / 80 + plot loop ;
: burst 60 0 do 0 0 g i h 110 f loop ;
: squiral -50 50 g 20 0 do 100 f 21 t loop ;
: circle 60 0 do 4 f 1 t loop ;
: spiral 15 0 do circle 4 t loop ;
: star 5 0 do 80 f 24 t loop ;
: stars 3 0 do star 20 t loop ;
: rose 0 50 0 do 2 + dup f 14 t loop ;
: hp 15 0 do 5 f 1 t loop 15 0 do 2 f -1 t loop ;
: petal hp 30 t hp 30 t ;
: flower 15 0 do petal 4 t loop ;
(shameless blog plug: http://blogs.msdn.com/b/ashleyf/archive/2012/02/18/turtle-graphics-on-the-fignition.aspx)
What is not well understood today is the way Forth anticipated an approach to coding that became popular early in the 21st century in association with agile methods. Specifically:
Forth introduced the notion of tiny-method coding: the use of small objects with small methods. You could make a case for Smalltalk and Lisp here too, but in the late 1980s both Smalltalk and Lisp practice tended toward larger and more complex methods. Forth always embraced very small methods, if only because it encouraged doing so much on the stack.
Forth, even more than Lisp, popularized the notion that the interpreter was just a little software pattern, not a dissertation-sized brick. Got a problem that's hard to code? The Forth solution had to be, "write a little language", because that's what Forth programming was.
Forth was very much a product of memory and time constraints, of an era when computers were incredibly tiny and terribly slow. It was a beautiful design that let you build an operating system and a compiler in a matchbox.
An example of just how compact Forth can be is Samuel Falvo's screencast Over the Shoulder 1 - Text Preprocessing in Forth (1 h 06 min 25 s, 101 MB, MPEG-1 format; at least VLC can play it). Alternative source ("Links and Resources" -> "Videos").
Forth Inc's polyFORTH/32 VAX/VMS assembler definitions took some 8 blocks of source. A VAX assembler, in 8K of source. Commented. I'm still amazed, 30 years later.
I can't verify this at the moment, but I'm guessing the instruction count to parse those CODE definitions would be in the low hundreds. And when I said it "took" some 8 blocks, I should have said "takes": the application using that nucleus is still live and in production, 30 years later.

Project Euler Problem #20 (Lua)

http://projecteuler.net/problem=20
I've written code to solve this problem, but it seems to be accurate in some cases and inaccurate in others. When I solve the problem for 10 (the question gives the answer, 27), I get 27, the correct answer. However, when I solve it for the value the question actually asks about (100), I get 64, which is incorrect.
Here's my code:
function factorial(num)
  if num>=1 then
    return num*factorial(num-1)
  else
    return 1
  end
end
function getSumDigits(str)
  str=string.format("%18.0f",str):gsub(" ","")
  local sum=0
  for i=1,#str do
    sum=sum+tonumber(str:sub(i,i))
  end
  return sum
end
print(getSumDigits(tostring(factorial(100))))
64
Since Lua converts large numbers into scientific notation, I had to convert it back to standard notation. I don't think this is a problem, though it might be.
Is there any explanation to this?
Unfortunately, the correct solution is more difficult. The main problem here is that Lua uses 64-bit floating-point numbers, so the usual floating-point precision limits apply.
Long story short: a 64-bit float has far too few significant digits to store a number like 100! exactly. A double stores 52 mantissa bits (53 bits of precision counting the implicit leading bit), so beyond 2^53 not every integer can be represented and rounding errors become unavoidable; that works out to only about 15 to 16 decimal digits. To store 100! exactly, you need 158 decimal digits.
The number calculated by your factorial() function is reasonably close to the real value of 100! (i.e. the relative error is small), but you need the exact value to get the right solution.
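The loss of precision is easy to observe in any language that uses IEEE 754 doubles. The sketch below uses Python only because its integers are arbitrary-precision, which gives the exact 100! for comparison; float() then converts it to a 64-bit double, the same representation Lua numbers use here.

```python
import math

exact = math.factorial(100)      # exact integer (Python ints are arbitrary precision)
approx = float(exact)            # nearest 64-bit double
print(len(str(exact)))           # 158 decimal digits
print(int(approx) == exact)      # False: the low-order digits are lost
print(abs(int(approx) - exact) / exact < 1e-15)  # True: yet the relative error is tiny
print(float(2**53 + 1) == float(2**53))          # True: 2**53 + 1 is not representable
```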
What you need to do is implement your own algorithms for dealing with large numbers. I actually solved that problem in Lua by storing each number as a table, where each entry stores one digit of a decimal number. The complete solution takes a little more than 50 lines of code, so it's not too difficult and a nice exercise.
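The digit-per-entry approach can be sketched as follows. It is written in Python (whose built-in integers would make it unnecessary) so that the digit handling stays explicit and mirrors what a Lua table would hold. Each list entry is one decimal digit, least significant first; that layout and the function names are illustrative choices, not necessarily what the answerer used.

```python
def factorial_digits(n):
    """Decimal digits of n!, least significant first, using schoolbook multiplication."""
    digits = [1]
    for k in range(2, n + 1):
        carry = 0
        # Multiply every stored digit by k, propagating the carry upward.
        for i in range(len(digits)):
            prod = digits[i] * k + carry
            digits[i] = prod % 10
            carry = prod // 10
        # Whatever carry remains becomes new high-order digits.
        while carry:
            digits.append(carry % 10)
            carry //= 10
    return digits


def digit_sum_of_factorial(n):
    return sum(factorial_digits(n))


print(digit_sum_of_factorial(10))   # 27
print(digit_sum_of_factorial(100))  # 648
```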

How to find the remainder of large number division in C++?

I have a question regarding modulus in C++. What I'm trying to do is take the remainder of a very large number, for example M % 2, where M = 54,302,495,302,423. However, when I compile, it says the number is too 'long' for int. When I switch to a double, it repeats the same error message. Is there a way to get the remainder of this very large number, or of an even larger one? Thanks for your help, much appreciated.
You can try storing the number in a "long long" (a 64-bit integral value). Just be aware that if your application is multi-threaded and running on a 32-bit CPU, you will need to synchronize between threads when reading or writing this value, since a 64-bit read or write may take two separate 32-bit operations and is therefore not atomic.
Alternatively, try a bignum library.
If you want to make things interesting: if you are only ever doing modulo 2, you can check the lowest bit and get your answer. More generally, if the modulus is a power of two no larger than 256 (2, 4, ..., 256), you can take the lowest 8 bits (an unsigned char) and do the operation on them, and for a power-of-two modulus up to 65536 the lowest 16 bits (an unsigned short) suffice. This does not work for other moduli such as 255: 256 % 255 is 1, but the lowest 8 bits of 256 are 0.
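The low-bits trick can be checked with a short sketch. It is in Python, with a hypothetical helper name; in C++ the mod-2 case is simply M & 1. It shows that masking off the lowest k bits gives n mod 2^k, and that the same shortcut fails for a modulus that is not a power of two.

```python
def low_bits_mod(n, k):
    """n mod 2**k for non-negative n, computed from its lowest k bits only."""
    return n & ((1 << k) - 1)


M = 54302495302423
print(low_bits_mod(M, 1), M % 2)      # 1 1: the lowest bit decides M % 2
print(low_bits_mod(M, 8) == M % 256)  # True: 256 is a power of two
# 255 is not a power of two, so the lowest 8 bits are not enough:
print((256 & 0xFF) % 255, 256 % 255)  # 0 1
```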
For large number arithmetic in C++, use the GMP library. In particular, the mpz_mod function would do this.
For a more natural C++ wrapper, the mpz_class class can help by providing operator overloading for multiprecision operations.
A 32-bit int only ranges from -2,147,483,648 to 2,147,483,647. Check http://msdn.microsoft.com/en-us/library/s3f49ktz(VS.71).aspx for data type ranges. I recommend a long long.
Hint: Use a linked list. Store the number dynamically as a sequence of smaller groups of digits. For example:
112233445566778899001122 => 11223344 55667788 99001122
Now process the groups from left to right: find the remainder of the current group, multiply it by 100000000 (10^8), add it to the next group, and continue.
Now the implementation is very easy :)
Edit:
112233445566778899001122 mod 6 => 11223344 55667788 99001122 mod 6
11223344 mod 6 => 2
2*100000000 + 55667788 = 255667788
255667788 mod 6 => 0
0*100000000 + 99001122 = 99001122
99001122 mod 6 => 0
So the remainder is 0.
Remember that each group, after the carried-over remainder has been added to it, must still fit within the range an int can support.
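The group-by-group procedure can be sketched as follows. It is written in Python and uses a plain list of 8-digit groups instead of the linked list the hint suggests; the function name and group size are illustrative choices. Python integers never overflow, but in C++ every intermediate value remainder * 10^8 + group must fit in the integer type you use, which is exactly the constraint noted above.

```python
def mod_of_decimal_string(number, divisor, group_size=8):
    """Remainder of a non-empty, non-negative decimal string modulo a positive int."""
    # Split into groups aligned from the right, e.g.
    # "112233445566778899001122" -> ["11223344", "55667788", "99001122"].
    first = len(number) % group_size or group_size
    groups = [number[:first]] + [number[i:i + group_size]
                                 for i in range(first, len(number), group_size)]
    base = 10 ** group_size
    remainder = 0
    for group in groups:
        # Carry the previous remainder into this group, as in the worked example.
        remainder = (remainder * base + int(group)) % divisor
    return remainder


print(mod_of_decimal_string("112233445566778899001122", 6))  # 0
print(mod_of_decimal_string("54302495302423", 2))            # 1
```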

Resources