proof of optimality in activity selection - greedy

Can someone please explain, in a not so formal way, how the greedy choice yields an optimal solution for the activity selection problem? This is the simplest explanation I have found, but I don't really get it:
How does Greedy Choice work for Activities sorted according to finish time?
Let the given set of activities be S = {1, 2, 3, ..., n} and let the activities be sorted by finish time. The greedy choice is to always pick activity 1. How come activity 1 always provides one of the optimal solutions? We can prove it by showing that if there is another solution B with a first activity other than 1, then there is also a solution A of the same size with activity 1 as the first activity. Let the first activity selected by B be k; then there always exists A = {B – {k}} U {1}. (Note that the activities in B are independent and k has the smallest finish time among them. Since k is not 1, finish(k) >= finish(1).)

The following is my understanding of why the greedy solution always works:
Assertion: If A is the greedy choice (starting with the 1st activity in the sorted array), then it gives an optimal solution.
Proof: Let there be another choice B, starting with some activity k (where k != 1 and finishTime(k) >= finishTime(1)), which gives an optimal solution. Since B does not contain the 1st activity, the following relation between A and B can be written:
A = {B - {k}} U {1}
Here:
1. The activities within A are mutually compatible, just as in B: every activity in B - {k} starts at or after finishTime(k) >= finishTime(1), so swapping k for 1 creates no overlap.
2. |A| = |B|
Since |A| = |B| and B is optimal, A also gives an optimal solution.

Let's say A is the optimal solution that starts with 1, where the intervals are S = {1, 2, 3, ..., m}, and the length of that solution is n1. If A is not a globally optimal solution, then there exists another solution B which starts with k != 1 and finishTime(k) >= finishTime(1), and which has length n2.
So, n2 > n1.
Now, if we exclude k from solution B, we are left with n2 - 1 intervals.
Since k doesn't overlap with the other intervals in B, 1 will not overlap with them either.
This is because all intervals in B (excluding k) have startTime >= finishTime(k) >= finishTime(1). Hence, if we replace k with 1 in B, we still have a compatible solution of length n2 that starts with 1. But the optimal solution starting with 1 was A, with length n1, so n2 <= n1, which contradicts n2 > n1. Hence the solution starting with 1 is optimal.
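To make the greedy choice concrete, here is a minimal sketch of the earliest-finish-time selection in Ruby (my own illustration, not part of either argument above; activities are assumed to be given as [start, finish] pairs):
# Greedy activity selection: sort by finish time, then repeatedly take the
# next activity whose start is not before the finish of the last one taken.
def select_activities(activities)
  chosen = []
  last_finish = -Float::INFINITY
  activities.sort_by { |_start, finish| finish }.each do |start, finish|
    if start >= last_finish
      chosen << [start, finish]
      last_finish = finish
    end
  end
  chosen
end

p select_activities([[1, 4], [3, 5], [0, 6], [5, 7], [3, 9], [6, 10], [8, 11]])
# => [[1, 4], [5, 7], [8, 11]]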

Related

How to solve this recurrence using masters method?

T(n) = 4T(n/2) + n^2 and T(1) = 1
I don't know, guys, I can solve the other ones, but I seem to get stuck and can't start with this one.
Let's work through this one and see what we find. In this case, we have a = 4, b = 2, and d = 2. Since log_b(a) = 2 = d, we should get that T(n) = Θ(n^2 log n).
Let's quickly check that this is the case by thinking about how much work is done per level of the recursion tree. At the top level, we do n^2 work and then make four calls on problems of size n/2. Each of those problems does (n/2)^2 = n^2 / 4 work, and since there are four copies of that problem, the work done at the next level is n^2. Each of those subproblems fires off four recursive calls on problems of size n/4, each doing (n/4)^2 = n^2 / 16 work, and since there are sixteen of those subproblems, the work done at that level is also n^2. Overall, we see that each layer of the tree does n^2 work and that there are Θ(log n) layers, so the total work done is Θ(n^2 log n), matching our bound from the Master Theorem.
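As a rough numeric sanity check (my own sketch, not part of the original answer), you can evaluate the recurrence directly for powers of two in Ruby and watch T(n) / (n^2 log n) settle toward a constant:
def t(n)
  return 1 if n == 1
  4 * t(n / 2) + n * n   # T(n) = 4·T(n/2) + n^2
end

(1..20).each do |k|
  n = 2**k
  ratio = t(n).to_f / (n * n * Math.log2(n))
  puts "n = 2^#{k}: T(n) / (n^2 log n) = #{ratio.round(4)}"
end
# The ratio approaches 1, consistent with T(n) = Θ(n^2 log n).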

Greedy algorithm to finish a task with time constraint

This is a question from my midterm today and I wonder how to solve it. All I know is to prove the greedy algorithm using induction.
Question:
You are working on a programming project. There are n Java classes C1, C2, ..., Cn (the bossy architect says so). The architect also says that these classes have to be implemented in order (you are not allowed to implement C2 before you have completed C1 and so on).
Each of the Java classes takes at most 8 hours to implement. You work exactly 8 hours a day, and you should not leave a Java class unfinished at the end of the day.
To complete the project as soon as possible, a strategy is to implement as many classes as you can every day. Prove that this greedy strategy is indeed the optimal one.
(Hint: let t_i be the total number of classes completed in the first i days using the above strategy. The strategy always stays ahead if t_i is no less than the total number of classes completed in the first i days using any other strategy.)
This problem is similar to the classic task scheduling case where the waiting time in the system must be minimized.
Let C1, C2, ..., Cn be your project's classes and c[1], c[2], ..., c[n] their required implementation times. Let's suppose you implement C1, C2, ..., Cn in this order. Therefore, the total time (waiting + implementation) for each class Ck will be:
c[1] + c[2] + ... + c[k]
Therefore, we have the total time:
T = n·c[1] + (n – 1)·c[2] + ... + 2·c[n – 1] + c[n] = sum(k = 1 to n) of (n – k + 1)·c[k]
(Sorry about the presentation — superscripts, subscripts, and math equations aren't supported...)
Let's suppose the implementation times in our permutation are not sorted in ascending order. We can therefore find two integers a and b such that a < b and c[a] > c[b]. If we switch them in the computation of T, we have:
T' = (n – a + 1)·c[b] + (n – b + 1)·c[a] + sum(k = 1 to n except a, b) of (n – k + 1)·c[k]
We finally compute T – T':
T – T' = (n – a + 1)(c[a] – c[b]) + (n – b + 1)(c[b] – c[a]) = (b – a)(c[a] – c[b])
Following our initial hypothesis (a < b and c[a] > c[b]), we have b – a > 0 and c[a] – c[b] > 0 as well, hence T – T' > 0.
This proves that we decrease the total waiting time by switching any pair of tasks so that the shorter one is done first.
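Here is a quick numeric check of that exchange argument (my own sketch, with made-up implementation times):
# total_time sums waiting + implementation over all classes for a given order;
# with 0-based indexing the coefficient of the task at position idx is (n - idx).
def total_time(times)
  n = times.length
  times.each_with_index.sum { |c, idx| (n - idx) * c }
end

puts total_time([5, 2, 7, 3])   # => 43  (c[a] = 5 placed before c[b] = 2)
puts total_time([2, 5, 7, 3])   # => 40  smaller, as T - T' = (b - a)(c[a] - c[b]) = 3 predicts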
Your problem statement is the same, except that before starting implementing a new class, you have to check whether you should start it now (if there is enough time left on the present day) or tomorrow. But the principle proven here holds when it comes to minimizing the total "waiting" time.
This is not a programming question for SO. The problem is not asking for a coding solution; rather, it's asking for a proof that greedy is optimal, which can be done with a proof by contradiction (no doubt taught in the class before the midterm).
What you want to do is calculate the total time taken by greedy (there's only one greedy solution) and show that no swap of work between days would lead to a better solution. You probably also have to add something that mentions how swapping lets you permute any schedule into the optimal one, if it exists.
I was going to write some formulae, but I realize Jeff Morin already has the equations, just going in the opposite direction. I think starting from the greedy solution might be easier to explain, since 'in order' is pretty much defined by the problem and you can only shift which day each piece of work is done on.
The problem statement is incomplete. There is no indication that any class will take less than 8 hours. Since you can't leave any class unfinished, you must start each class at the start of a day to be sure you have at least 8 hours to work on it. So if C2 really takes 3 hours and C3 really takes 5 hours, a greedy algorithm would let both classes be done on the same day. But after C2 takes 3 hours, you have to wait until the next day (day 3) to start C3, to be sure you have enough time to finish, since you don't know how long C3 will take.
So the restrictions really end up dictating that the effort will take n days, 1 day per class. The implementation algorithm is therefore strictly sequential, not greedy.
Edit: Restrictions stated in the problem.
(1) There are n Java classes C1, C2, ..., Cn
(2) these classes have to be implemented in order (you are not allowed to implement C2 before you have completed C1 and so on).
(3) Each of the Java classes takes at most 8 hours to implement
But there is no estimate for any class taking less than 8 hours.
(4) You work exactly 8 hours a day
(5) You should not leave a Java class unfinished at the end of the day.
The gist of (3), (4), and (5): suppose I finish class 1 after only 5 minutes. I now have 7 hours 55 minutes left. Can I start on class 2? No, because it might take up to 8 hours and I must finish it before the end of my 8-hour day. So I must wait until day 2 to start class 2, and so on. Thus the implementation is strictly sequential and will take n days to complete, 1 day per class.
In order to use the Greedy algorithm you'd need additional information.
(6) You also know that each class has a known number of hours needed to code the class - h1, h2, h3, ..., hn. So class 1 takes h1 hours, class 2 takes h2 hours and so on. (From item 3 no class takes more than 8 hours)
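With that extra information the greedy strategy is easy to state. Here is a minimal sketch (my own, using a hypothetical array h of per-class hours, each at most 8):
# Greedy packing: implement classes in order, fitting as many as possible into
# the current 8-hour day, and start a new day when the next class doesn't fit.
def days_needed(h, hours_per_day = 8)
  days = 1
  remaining = hours_per_day
  h.each do |hours|
    if hours <= remaining
      remaining -= hours
    else
      days += 1
      remaining = hours_per_day - hours
    end
  end
  days
end

puts days_needed([3, 5, 6, 2, 4, 8])   # => 4  (days: [3, 5], [6, 2], [4], [8])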

Project Euler #3 Ruby Solution - What is wrong with my code?

This is my code:
def is_prime(i)
  j = 2
  while j < i do
    if i % j == 0
      return false
    end
    j += 1
  end
  true
end

i = (600851475143 / 2)
while i >= 0 do
  if (600851475143 % i == 0) && (is_prime(i) == true)
    largest_prime = i
    break
  end
  i -= 1
end
puts largest_prime
Why is it not returning anything? Is the calculation just too large when going through all the numbers? Is there a simple way of doing it without using the Ruby prime library (which defeats the purpose)?
All the solutions I found online were too advanced for me, does anyone have a solution that a beginner would be able to understand?
"premature optimization is (the root of all) evil". :)
Here you go right away for the (1) biggest, (2) prime factor. How about first finding all the factors, prime or not, and then taking the last (biggest) of them that is prime? When we solve that, we can start optimizing it.
A factor a of a number n is such that there exists some b (we assume a <= b to avoid duplication) with a * b = n. But that means that for a <= b we also have a*a <= a*b == n, i.e. a <= sqrt(n).
So, for each b = n/2, n/2-1, ..., the potential corresponding factor is known automatically as a = n / b; there's no need to test a for divisibility at all ... and perhaps you can figure out which of the a's don't have to be tested for primality as well.
Lastly, if p is the smallest prime factor of n, then the prime factors of n are p and all the prime factors of n / p. Right?
Now you can complete the task.
update: you can find more discussion and a pseudocode of sorts here. Also, search for "600851475143" here on Stack Overflow.
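In case it helps to see the shape of that approach, here is one beginner-level sketch (my own, and only one way to follow the hint above): keep dividing out the smallest factor you can find. The smallest factor greater than 1 is always prime, so whatever remains at the end is the largest prime factor.
def largest_prime_factor(n)
  d = 2
  while d * d <= n
    if n % d == 0
      n /= d      # d is the smallest remaining factor, hence prime
    else
      d += 1
    end
  end
  n               # what's left has no factor up to its square root, so it's prime
end

puts largest_prime_factor(600851475143)   # => 6857
It finishes instantly because d never climbs past the square root of whatever is left of n.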
I'll address not so much the answer, but how YOU can pursue the answer.
The most elegant troubleshooting approach is to use a debugger to get insight as to what is actually happening: How do I debug Ruby scripts?
That said, I rarely use a debugger -- I just stick in puts here and there to see what's going on.
Start with adding puts "testing #{i}" as the first line inside the loop. While the screen I/O will be a million times slower than a silent calculation, it will at least give you confidence that it's doing what you think it's doing, and perhaps some insight into how long the whole problem will take. Or it may reveal an error, such as the counter not changing, incrementing in the wrong direction, overshooting the break conditional, etc. Basic sanity check stuff.
If that doesn't set off a lightbulb, go deeper and puts inside the if statement. No revelations yet? Next puts inside is_prime(), then inside is_prime()'s loop. You get the idea.
Also, there's no reason in the world to start with 600851475143 during development! 17, 51, 100 and 1024 will work just as well. (And don't forget edge cases like 0, 1, 2, -1 and such, just for fun.) These will all complete before your finger is off the enter key -- or demonstrate that your algorithm truly never returns and send you back to the drawing board.
Use these two approaches and I'm sure you'll find your answers in a minute or two. Good luck!
Do you know you can solve this with one line of code in Ruby?
require 'prime'
Prime.prime_division(600851475143).flatten.max
=> 6857

Pathfinding in Prolog

I'm trying to teach myself Prolog. Below, I've written some code that I think should return all paths between nodes in an undirected graph... but it doesn't. I'm trying to understand why this particular code doesn't work (which I think differentiates this question from similar Prolog pathfinding posts). I'm running this in SWI-Prolog. Any clues?
% Define a directed graph (nodes may or may not be "room"s; edges are encoded by "leads_to" predicates).
room(kitchen).
room(living_room).
room(den).
room(stairs).
room(hall).
room(bathroom).
room(bedroom1).
room(bedroom2).
room(bedroom3).
room(studio).
leads_to(kitchen, living_room).
leads_to(living_room, stairs).
leads_to(living_room, den).
leads_to(stairs, hall).
leads_to(hall, bedroom1).
leads_to(hall, bedroom2).
leads_to(hall, bedroom3).
leads_to(hall, studio).
leads_to(living_room, outside). % Note "outside" is the only node that is not a "room"
leads_to(kitchen, outside).
% Define the indirection of the graph. This is what we'll work with.
neighbor(A,B) :- leads_to(A, B).
neighbor(A,B) :- leads_to(B, A).
Iff A --> B --> C --> D is a loop-free path, then
path(A, D, [B, C])
should be true. I.e., the third argument contains the intermediate nodes.
% Base Rule (R0)
path(X,Y,[]) :- neighbor(X,Y).
% Inductive Rule (R1)
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), not(member(Z, P)), path(Z,Y,P).
Yet,
?- path(bedroom1, stairs, P).
is false. Why? Shouldn't we get a match to R1 with
X = bedroom1
Y = stairs
Z = hall
P = []
since,
?- neighbor(bedroom1, hall).
true.
?- not(member(hall, [])).
true.
?- path(hall, stairs, []).
true .
?
In fact, if I evaluate
?- path(A, B, P).
I get only the length-1 solutions.
Welcome to Prolog! The problem, essentially, is that when you get to not(member(Z, P)) in R1, P is still a pure variable, because the evaluation hasn't gotten to path(Z, Y, P) to define it yet. One of the surprising yet inspiring things about Prolog is that member(Ground, Var) will generate lists that contain Ground and unify them with Var:
?- member(a, X).
X = [a|_G890] ;
X = [_G889, a|_G893] ;
X = [_G889, _G892, a|_G896] .
This has the confusing side-effect that checking for a value in an uninstantiated list will always succeed, which is why not(member(Z, P)) will always fail, causing R1 to always fail. The fact that you get all the R0 solutions and none of the R1 solutions is a clue that something in R1 is causing it to always fail. After all, we know R0 works.
If you swap these two goals, you'll get the first result you want:
path(X,Y,[Z|P]) :- not(X == Y), neighbor(X,Z), path(Z,Y,P), not(member(Z, P)).
?- path(bedroom1, stairs, P).
P = [hall]
If you ask for another solution, you'll get a stack overflow. This is because after the change we're happily generating solutions with cycles as quickly as possible with path(Z,Y,P), only to discard them post-facto with not(member(Z, P)). (Incidentally, for a slight efficiency gain we can switch to memberchk/2 instead of member/2. Of course doing the wrong thing faster isn't much help. :)
I'd be inclined to convert this to a breadth-first search, which in Prolog would imply adding an "open set" argument to contain solutions you haven't tried yet, and at each node first trying something in the open set and then adding that node's possibilities to the end of the open set. When the open set is exhausted, you've tried every node you could get to. For some pathfinding problems it's a better solution than depth-first search anyway. Another thing you could try is separating the path into a visited and a future component, and only checking the visited component. As long as you aren't generating a cycle in the current step, you can be assured you aren't generating one at all, so there's no need to worry about future steps.
The way you worded the question leads me to believe you don't want a complete solution, just a hint, so I think this is all you need. Let me know if that's not right.

Push-relabel gap heuristics

I don't understand how to implement the gap heuristic with push-relabel. Wikipedia describes it like this:
"In gap relabeling heuristic we maintain an array A of size n, holding in A[i]
the number of nodes for each label (up to n). If a label d is found, such that
A[d] = 0, then all nodes with label > d are relabeled to label n."
Use a gap heuristic. If there is a k such that no node has height(u) = k, you can set height(u) = max(height(u), height(source) + 1) for every node (other than the source) with height(u) > k. This is because any such k represents a minimal cut in the graph, and no more flow will go from the nodes in S = {u : height(u) > k} to the nodes in T = {v : height(v) < k}. Suppose there were a residual edge (u, v) from some u in S to some v in T with residual capacity > 0. But then height(u) > height(v) + 1 (since height(u) > k and height(v) < k), contradicting the push-relabel invariant that height(u) <= height(v) + 1 for every residual edge.
Can someone explain to me in pseudocode how to add the gap heuristic to a FIFO push-relabel as shown in wiki's sample code?
Thanks,
Vince
It might be a little late but here is a link to a Stanford University notebook where you can find a push-relabel maximum flow using a Gap Heuristic in C.
I hope it helps you.
http://www.stanford.edu/~liszt90/acm/notebook.html#file3
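To sketch how the quoted gap rule plugs into the relabel step of a FIFO push-relabel (my own reading in Ruby, not the notebook's C code): here height and count are arrays indexed by node id and by label, count is sized to cover all labels (heights can reach 2n), neighbors[u] lists u's residual-graph neighbors, and residual[u][v] is the remaining capacity on edge (u, v). All of those names are assumptions, not Wikipedia's sample code.
# Called when node u has excess but no admissible edge. count[h] plays the role
# of the array A in the quote: how many nodes currently carry label h.
def relabel_with_gap(u, height, count, neighbors, residual, n, source)
  old_h = height[u]

  # Standard relabel: one more than the lowest neighbor reachable in the residual graph.
  min_h = neighbors[u].select { |v| residual[u][v] > 0 }.map { |v| height[v] }.min
  return if min_h.nil?

  count[old_h] -= 1
  height[u] = min_h + 1
  count[height[u]] += 1

  # Gap heuristic: if label old_h just became empty, no node above it can reach
  # the sink any more, so lift every such node (except the source) straight to n.
  if count[old_h] == 0 && old_h < n
    height.each_index do |v|
      next if v == source || height[v] <= old_h || height[v] >= n
      count[height[v]] -= 1
      height[v] = n
      count[n] += 1
    end
  end
end
The push step and the FIFO queue of active nodes stay as in the plain algorithm; the heuristic only adds the count bookkeeping and the lifting loop around each relabel.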
