Greedy algorithm to finish a task with time constraint - analysis

This is a question from my midterm today and I wonder how to solve this. All i know is to prove the greedy algorithm using induction.
Question:
You are working on a programming project. There are n Java classes C1, C2, ..., Cn (the bossy architect says so). The architect also says that these classes have to be implemented in order (you are not allowed to implement C2 before you have completed C1 and so on).
Each of the Java classes takes at most 8 hours to implement. You work exactly 8 hours a day, and you should not leave a Java class unfinished at the end of the day.
To complete the project as soon as possible, a strategy is to implement as many classes as you can everyday. Prove that this greedy strategy is indeed the optimal one.
(Hint: let ti be the total number of classes completed in the first i days using the above strategy. The strategy always stays ahead if ti is no less than the total number of classes completed in the first i days using any other strategy)

This problem is similar to the classic task scheduling case where the waiting time in the system must be minimized.
Let C1, C2, ..., Cn your projected classes and c[1], c[2], ..., c[n] their required implementation time. Let's suppose you implement C1, C2, ... Cn in this order. Therefore, the total time (waiting + implementation) for each class Ck will be:
c[1] + c[2] + ... + c[k]
Therefore, we have the total time:
T = n·c[1] + (n – 1)·c[2] + ... + 2*c*[n – 1] + c[n] = sum(k = 1 to n) of (n – k + 1)·c[k]
(Sorry about the presentation — superscripts, subscripts, and math equations aren't supported...)
Let's suppose the implementation times in our permutation are not sorted by ascending order. We can therefore find two integers a and b such that a < b with c[a] > c[b]. If we switch them in the computation of T, we have:
T' = (n – a + 1)·c[b] + (n – b + 1)·c[a] + sum(k = 1 to n except a, b) of (n – k + 1)·c[k]
We finally compute T – T':
T – T' = (n – a + 1)(c[a] – c[b]) + (n – b + 1)(c[b] – c[a]) = (b – a)(c[a] – c[b])
Following our initial hypothesis (a < b and c[a] > c[b]), we have b – a > 0 and c[a] – c[b] > 0 as well, hence T – T' > 0.
This proves that we decrease the total waiting time by switching any pair of tasks so that the shorter one is done first.
Your problem statement is the same, except that before starting implementing a new class, you have to check whether you should start it now (if there is enough time left on the present day) or tomorrow. But the principle proven here holds when it comes to minimizing the total "waiting" time.

This is not a programming question for SO. The problem is not asking for a coding solution, rather its a proof that greedy is optimal. Which can be done with a proof by contradiction (no doubt taught in the class before the midterm).
What you want to do is to calculate the total time taken by greedy (there's only one solution) and disprove that any swaps in day would lead to a better solution. You probably also have to add something that mentions how swapping will allow u to permute the order to the optimal solution, if it exists.
I was going to write some formulae, but i realize Jeff Morin already has the equations, just going in the opposite direction. I think starting from the greedy solution might be easier to explain, since 'in order' is pretty much defined by the problem and you can only shift the work +- which day its done on.

The problem statement is incomplete. There is no indication that any class will take less than 8 hours. Since you can't leave any class unfinished, then you must start each class at the start of the day to be sure to have at least 8 hours to work on it. So if C2 really takes 3 hours and C3 really takes 5 hours, then a greedy algorithm would allow both classes to be done the same day. But after C2 takes 3 hours, you have to wait to day 3 to start C3 to be sure that you have enough time to finish since you don't know how long C3 will take.
So the restrictions really end up dictating that the effort will take n days, 1 day per class. So the implementation algorithm is strictly sequential, not greedy.
Edit Restrictions stated in problem.
(1) There are n Java classes C1, C2, ..., Cn
(2) these classes have to be implemented in order (you are not allowed to implement C2 before you have completed C1 and so on).
(3) Each of the Java classes takes at most 8 hours to implement
But there is no estimate for any class taking less than 8 hours.
(4) You work exactly 8 hours a day
(5) You should not leave a Java class unfinished at the end of the day.
The gist of this (3,4,& 5) is let's assume that I work on class 1 for 5 minutes. I now have 7 hours 55 minutes left. Can I start on Class 2? No because it might take up to 8 hours and I must finish before the end of my 8-hour day. So I must wait to day 2 to start class 2 and so on. Thus the implementation is strictly sequential and will take n days to complete, 1 day per class.
In order to use the Greedy algorithm you'd need additional information.
(6) You also know that each class has a known number of hours needed to code the class - h1, h2, h3, ..., hn. So class 1 takes h1 hours, class 2 takes h2 hours and so on. (From item 3 no class takes more than 8 hours)

Related

Is there a long method operation for modulo if mod or % is not a supported function/operator?

This is related to the Zeller's Congruence algorithm where there is a requirement to use Modulo to get the actual day of an input date. However, in the software I'm using which is Blueprism, there is no modulo operator/function that is available and I can't get the result I would hope to get.
In some coding language (Python, C#, Java), Zeller's congruence formula were provided because mod is available.
Would anyone know a long method of combine arithmetic operation to get the mod result?
From what I've read, mod is the remainder result from two numbers. But
181 mod 7 = 6 and 181 divided by 7 = 25.857.. the remainder result are different.
There are two answers to this.
If you have a floor() or int() operation available, then a % b is:
a - floor(a/b)*b
(revised to incorporate Andrzej Kaczor's comment, thanks!)
If you don't, then you can iterate, each time subtracting b from a until the remainder is less than b. At that point, the remainder is a % b.

proof of optimality in activity selection

Can someone please explain in a not so formal way how the greedy choice is the optimal solution for the activity selection problem? This is the simplest explanation that I have found but I don't really get it
How does Greedy Choice work for Activities sorted according to finish time?
Let the given set of activities be S = {1, 2, 3, ..n} and activities be sorted by finish time. The greedy choice is to always pick activity 1. How come the activity 1 always provides one of the optimal solutions. We can prove it by showing that if there is another solution B with the first activity other than 1, then there is also a solution A of the same size with activity 1 as the first activity. Let the first activity selected by B be k, then there always exist A = {B – {k}} U {1}.(Note that the activities in B are independent and k has smallest finishing time among all. Since k is not 1, finish(k) >= finish(1)).
The following is my understanding of why greedy solution always words:
Assertion: If A is the greedy choice(starting with 1st activity in the sorted array), then it gives the optimal solution.
Proof: Let there be another choice B starting with some activity k (k != 1 or finishTime(k)>= finishTime(1)) which alone gives the optimal solution.So, B does not have the 1st activity and the following relation could be written between A & B could be written as:
A = {B - {k}} U {1}
Here:
1.Sets A and B are disjoint
2.Both A and B have compatible activities in them
Since we conclude that |A|=|B|, therefore activity A also gives the optimal solution.
Let's say A is a the optimal solution which starts with 1 if the intervals are S={1,2,3,.....m} and the length of the solution is say n1. If A is not an optimal solution, then there exists another solution B which starts with k!=1 and finishTime(k)>=finishTime(1), which has length n2.
So, n2>n1.
Now, if we exclude k from solution B then we are left with n2-1 number of elements.
Since, k doesn't overlap with other intervals in B, 1 will also not overlap.
This is because all intervals in B(excluding k) will have startTime>= finishTime(k)>=finishTime(1).Hence, if we replace k with 1 in B, we still have n2 length. But optimal solution starting with 1 was A with length n1. We are getting n1=n2 , which contradicts n2>n1. Hence Solution starting with 1 is optimal.

Recurrence Relation tree method

I am currently having issues with figuring our some recurrence stuff and since I have midterms about it coming up soon I could really use some help and maybe an explanation on how it works.
So I basically have pseudocode for solving the Tower of Hanoi
TOWER_OF_HANOI ( n, FirstRod, SecondRod, ThirdRod)
if n == 1
move disk from FirstRod to ThirdRod
else
TOWER_OF_HANOI(n-1, FirstRod, ThirdRod, SecondRod)
move disk from FirstRod to ThirdRod
TOWER_OF_HANOI(n-1, SecondRod, FirstRod, ThirdRod)
And provided I understand how to write the relation (which, honestly I'm not sure I do...) it should be T(n) = 2T(n-1)+Ɵ(n), right? I sort of understand how to make a tree with fractional subproblems, but even then I don't fully understand the process that would give you the end solution of Ɵ(n) or Ɵ(n log n) or whatnot.
Thanks for any help, it would be greatly appreciated.
Assume the time complexity is T(n), it is supposed to be: T(n) = T(n-1) + T(n-1) + 1 = 2T(n-1) + 1. Why "+1" but not "+n"? Since "move disk from FirstRod to ThirdRod" costs you only one move.
For T(n) = 2T(n-1) + 1, its recursion tree will exactly look like this:
https://www.quora.com/What-is-the-complexity-of-T-n-2T-n-1-+-C (You might find it helpful, the image is neat.) C is a constant; it means the cost per operation. In the case of Tower of Hanoi, C = 1.
Calculate the sum of the cost each level, you will easily find out in this case, the total cost will be 2^n-1, which is exponential(expensive). Therefore, the answer of this recursion equation is Ɵ(2^n).

How to solve this recurrence using masters method?

T(n)=4t(n/2) + n^2 and t(1)=1
I dont know guys, I can solve other ones but I seem to get stuck and cant start with this one
Let's work through this one and see what we find. In this case, we have a = 4, b = 2, and d = 2. Since logb a = 2 = d, we should get that t(n) = Θ(n2 log n).
Let's quickly check to see if that's the case by thinking about how much work is done per level in the tree. At the top level, we do n2 work and then make four calls on problems of size n/2. Each of those problems does (n/2)2 = n2 / 4 work, and since there's four copies of that problem the work done at the next level is n2. Each of those subproblems fires off four recursive calls on problems of size (n/4)2 = n2 / 16, and since there are sixteen of those subproblems the work done at that level is also n2. Overall, we see that each layer in the tree does n2 work and that there are Θ(log n) layers, so the total work done is Θ(n2 log n), matching our bound from the Master Theorem.

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
local i = `j' - 1
local k = `j' + 1
gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k')
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
There is a user written program that can do that very easily for you. It is called mvsumm and can be found through findit mvsumm
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end

Resources