Hungarian Algorithm - Arbitrary Choices - graph-algorithm

I've looked at several explanations of the Hungarian Algorithm for solving the Assignment Problem and the vast majority of these cover very simplistic cases.
The most understandable explanation I've found is a YouTube video.
I can code the algorithm but I'm concerned about one special case. If you watch the video, the relevant case is explained from 31:55 to 37:42, but I’ll explain it below.
I should first mention that I will be dealing with a 300 x 300 matrix, so visual inspection is out of the question. Additionally, I need to find all minimum assignments. In other words, if there are multiple assignments that produce the same minimum value, I need to find them all.
Here's the particular case that I'm concerned about. You can see this explained in the YouTube video but I’ll go over it here. We start with this matrix:
3 1 1 4
4 2 2 5
5 3 4 8
4 2 5 9
When we reduce the rows and columns, we get this:
0 0 0 0
0 0 0 0
0 0 1 2
0 0 3 4
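The reduction step above can be reproduced with a short Python sketch. The function name `reduce_matrix` is only illustrative and does not come from the video; it assumes a square matrix given as a list of lists.

```python
def reduce_matrix(cost):
    """Subtract each row's minimum, then each column's minimum."""
    # Row reduction: every row ends up with at least one zero.
    rows = [[v - min(row) for v in row] for row in cost]
    # Column reduction on the row-reduced matrix.
    col_mins = [min(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, col_mins)] for row in rows]


cost = [
    [3, 1, 1, 4],
    [4, 2, 2, 5],
    [5, 3, 4, 8],
    [4, 2, 5, 9],
]
reduced = reduce_matrix(cost)
for row in reduced:
    print(*row)
```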
(Let me mention that I can visually see there are 4 solutions to this matrix and the total score is 13.)
Given the above reduced matrix, there are no unique zeros in any row or column, so, according to the algorithm described in the video, I can arbitrarily select any zero element for assignment, so I select (1,1).
I’ll mark the assigned zero with an asterisk and I’ll put an “x” next to those zeros in the rows and columns that are no longer available for consideration. Now we have this:
0* 0x 0x 0x
0x 0 0 0
0x 0 1 2
0x 0 3 4
Next, we continue examining rows for a unique zero. We find one at (3,2) so we mark it with an asterisk and mark the unavailable zeros with "x":
0* 0x 0x 0x
0x 0x 0 0
0x 0* 1 2
0x 0x 3 4
Next, we start looking for unique zeros in the columns (since all rows have been exhausted). We find column three has a unique zero at (2,3) so we mark it:
0* 0x 0x 0x
0x 0x 0* 0x
0x 0* 1 2
0x 0x 3 4
At this point, there are no more available zeros and row 4 has been left unassigned. (This particular YouTube video now uses a “ticking procedure”, which is a common technique for determining the minimum number of lines needed to cover all the zeros. If you are unfamiliar with this technique it is explained starting at 14:10 through 16:00, although the presenter uses a different matrix than shown here.) The “ticking procedure” is this:
1. Tick all rows that have no assigned zeros (row 4).
2. For each row that is ticked, tick the columns that contain a zero in that row.
3. For each column ticked in step 2, tick the corresponding rows that have assigned zeros.
4. Repeat steps 2 and 3 until no more ticking is possible.
5. Draw lines through all ticked columns and un-ticked rows.
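As a rough Python sketch of the ticking procedure (not the presenter's code): `tick_cover` and its `assigned` argument, a dict mapping each assigned row to the column of its starred zero, are names made up for this illustration, and indices are 0-based rather than 1-based as in the text.

```python
def tick_cover(reduced, assigned):
    """Return (rows_with_lines, cols_with_lines) for a square reduced matrix.

    assigned maps row index -> column index of that row's assigned zero.
    """
    n = len(reduced)
    row_of_col = {c: r for r, c in assigned.items()}
    # Step 1: tick every row that has no assigned zero.
    ticked_rows = {r for r in range(n) if r not in assigned}
    ticked_cols = set()
    frontier = list(ticked_rows)
    # Steps 2-4: alternate between rows and columns until nothing changes.
    while frontier:
        r = frontier.pop()
        for c in range(n):
            if reduced[r][c] == 0 and c not in ticked_cols:
                ticked_cols.add(c)          # step 2
                owner = row_of_col.get(c)
                if owner is not None and owner not in ticked_rows:
                    ticked_rows.add(owner)  # step 3
                    frontier.append(owner)
    # Step 5: lines go through ticked columns and un-ticked rows.
    return sorted(set(range(n)) - ticked_rows), sorted(ticked_cols)


reduced = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]
# The assignments made so far: (1,1), (3,2) and (2,3) in 1-based terms.
line_rows, line_cols = tick_cover(reduced, {0: 0, 2: 1, 1: 2})
print(line_rows, line_cols)
```

For the partial assignment in this example, the sketch ticks every row and every column, so all the lines are vertical.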
At this point, the ticking procedure generates 4 vertical lines, covering all zeros. The four vertical lines tell us the zeros in the matrix represent one or more solutions, yet, as we see, row 4 is unassigned. The fact that the fourth row remains unassigned in spite of the four vertical lines tells us that we chose the wrong zeros for assignment!
The video’s presenter indicates this problem is a result of our initial (arbitrary) assignment of element (1,1). The presenter says, “there are more sophisticated methods available” to get us out of this situation, but he does not explain what these techniques are. He alludes to the existence of “intelligent” ways of selecting a zero, rather than the arbitrary selection we used to select the zero at (1,1).
One approach I could take (I’m not sure it’s the best) when faced with making an arbitrary assignment is to make the assignment from the row or column with the fewest number of available arbitrary choices. In this example, this means I would make the arbitrary assignment from either row 3 or 4, where there are only two arbitrary choices, rather than from row 1 or 2 where there are four arbitrary choices. Of course, since I need all correct solutions, I would have to iterate over all the available arbitrary assignments, whenever an arbitrary assignment is made. For example, if I select (3,1) as an arbitrary assignment, I would also have to assign (3,2) later.
My question then, after all this, is, when I am faced with the choice of arbitrarily selecting a zero for assignment, what is the best approach? Is it what I mention in the previous paragraph? How can I eliminate the dead-end solutions like the one shown? Please remember I still need to enumerate all solutions having the same minimal score.

Once the subtraction steps have been performed on all rows and columns, like you did, there is this step in the algorithm, which requires you to find the minimum number of rows or columns you can strike out to find no more zeroes in the cells that are left over (see step 3 in the Wikipedia article). If this minimum number of strike-out rows/columns equals n, then you have arrived at a matrix where the assignments can be made at positions that all are represented by zeroes.
This is the case in your second matrix.
Then there is also this algorithm step once you have done all the possible subtraction steps: if a row or column has only one zero, that zero represents an (optimal) assignment.
I think this rule can be generalised as follows:
If i rows (or columns) have all of their zeroes confined to the same i columns (or rows), then i of those zeroes represent (optimal) assignments.
That rule is obvious (and utterly unhelpful) when i is n.
But for small i, this can be helpful, although the algorithm to find such rows may be time consuming. In the example problem, we find that for i = 2 the rule applies for rows 3 and 4. The rule also implies that we can bar any other assignments in the columns that contain the zeroes. This means we can write our matrix as:
- - 0 0
- - 0 0
0 0 1 2
0 0 3 4
Now the rule can be applied again to rows 1 and 2, which each now also only have 2 zeroes.
We see two sub-matrices consisting only of zeroes (the ones where we applied the rule), each of this form:
0 0
0 0
There are two ways to make assignments:
x 0
0 x
or:
0 x
x 0
In general, for a sub-matrix with i rows and i columns, there are i! solutions if all its elements are zero, but fewer if some of them are not.
In the concrete example, there are thus 2! solutions for the bottom-left sub-matrix, and 2! for the top-right sub-matrix, resulting in 2! × 2! = 4 possible solutions.
Conclusion
Although the above considerations may sound interesting, I don't think an algorithm that searches for such sub-matrixes will be more efficient than an algorithm that just picks assignments in an ordered fashion, and backtracks as soon as it finds it is on a wrong track. As you will need all solutions, there is no sense in starting with a certain row. Backtracking should make sure the algorithm does not waste time on choices that have no future.
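A minimal Python sketch of such a backtracking enumeration, assuming the matrix has already been reduced far enough that a complete assignment on zeroes exists. The name `enumerate_zero_assignments` is made up for this illustration. The running time grows with the number of solutions, which can be very large for a 300 x 300 matrix with many zeroes.

```python
def enumerate_zero_assignments(reduced):
    """Yield every complete assignment that uses only zero cells.

    Each result is a tuple whose r-th entry is the column given to row r.
    """
    n = len(reduced)
    zero_cols = [[c for c in range(n) if reduced[r][c] == 0] for r in range(n)]
    used = [False] * n
    current = []

    def extend(r):
        if r == n:
            yield tuple(current)
            return
        for c in zero_cols[r]:
            if not used[c]:
                used[c] = True
                current.append(c)
                yield from extend(r + 1)
                # Undo the choice and try the next zero in this row.
                current.pop()
                used[c] = False

    yield from extend(0)


cost = [[3, 1, 1, 4], [4, 2, 2, 5], [5, 3, 4, 8], [4, 2, 5, 9]]
reduced = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 2], [0, 0, 3, 4]]
solutions = list(enumerate_zero_assignments(reduced))
for cols in solutions:
    total = sum(cost[r][c] for r, c in enumerate(cols))
    print(cols, total)
```

On the example matrix this visits the dead end that starts with (1,1), abandons it when rows 3 and 4 cannot both be served, and goes on to find the four assignments.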

Related

Random selection with a bias towards certain outcomes (i.e. 60/40)

Let's say I have 2 lists and I would like to randomly select a winner from them, picking the winner from list A 60% of the time and from list B 40% of the time. How can that be done in Google Sheets?
You can randomly select names from a list using this formula
INDEX(A2:A, RANDBETWEEN(1, COUNTA(A2:A)))
Without knowing some more information on your setup here is a general formula that does what you're describing:
=IF(RAND()<=0.6,INDEX(A2:A, RANDBETWEEN(1, COUNTA(A2:A))),INDEX(B2:B, RANDBETWEEN(1, COUNTA(B2:B))))
Essentially it is rolling a random number between 0 and 1. If it is equal to or less than .6 (simulating 60%, since there is a 60% chance it will be less than or equal to .6) it then selects a random name from Column A, otherwise (bottom 40%) it selects from column B.
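Outside of Sheets, the same two-stage idea (first choose the list with a weighted coin flip, then choose uniformly within it) can be sketched in Python. The function name `pick_winner` and its parameters are made up for this illustration.

```python
import random


def pick_winner(list_a, list_b, weight_a=0.6, rng=random):
    """Pick from list_a with probability weight_a, otherwise from list_b."""
    # rng.random() is uniform on [0, 1), so this comparison is true
    # with probability weight_a.
    source = list_a if rng.random() < weight_a else list_b
    return rng.choice(source)


list_a = ["Ann", "Bob"]
list_b = ["Cid", "Dee", "Eve"]
print(pick_winner(list_a, list_b))
```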
You can also replace the "0.6" with A1 in my example to have the weight be a dynamic number. Changing A1 to 75% for example will then compare the random value against less than or equal to .75.
EDIT: The image shows the wrong condition. As was pointed out to me, you want less than or equal to .6 and not greater than; I had the weights flipped.

What is the probability that the first defective item occurs in the fifth item inspected

Q In a production line, the probability of finding a defective item is 0.3. What is the probability that the first defective item occurs in the fifth item inspected?
f(x)=(1−p)^(x−1)P
How to apply this formula here?
Congratulations, you got the right formula for the problem in the first place. This is the hardest part most of the time. Now, let's use it.
[Side note: I don't know why you wrote the "p" uppercase one time and lowercase another time. It should be the same.]
[Side note 2: I assume you know how to evaluate this formula with an electronic calculator. Just to make sure: The notation x^y means "take x to the y-th power", or: "multiply x with itself y times".]
So your formula has two variables (p and x) and you got two numbers (0.3 and 5). What is left is to figure out which is which. I'd say 50% chance to get it right by guessing. Let's increase this chance by understanding the formula.
f(x) is the overall probability that an event with probability (1-p) occurs (x-1) times in a row, followed by one event with probability p.
In your example, we want to find the overall probability that the event "item is ok" occurs 4 times in a row, followed by one event of type "item is defective".
See the correspondences?
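To check the arithmetic once the correspondences are settled, here is a small Python sketch that assumes p = 0.3 (an item is defective) and x = 5 (the position of the first defective item). The function name is made up for this illustration.

```python
def first_defect_probability(p, x):
    """Probability of (x - 1) good items in a row followed by one defective item."""
    return (1 - p) ** (x - 1) * p


# Four good items (0.7 each) followed by one defective item (0.3).
print(first_defect_probability(0.3, 5))
```

Under those assumptions the value is 0.7^4 × 0.3 = 0.2401 × 0.3 = 0.07203, up to floating-point rounding.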

How to eliminate highlighting duplicates in google sheets conditional formatting

I have a spreadsheet where I need to use conditional formatting to highlight the lowest 3 scores in a row, to reflect dropped scores that are part of a Total calculation. I'm using the SMALL function to successfully calculate the Total: =SUM(A2:I2)-SMALL(A2:I2,1)-SMALL(A2:I2,2)-SMALL(A2:I2,3). But when I try to use the SMALL function in the Custom Formula field of the conditional format, it highlights 0,60,60,60 and not 0,60,60.
119 101 60 100 0 109 60 60 112 TOTAL:601
If four of the values are 0, it will highlight all four 0's. If 60 is the lowest score and there are 4 or more scores of 60, it will highlight all of them and not reflect that only 3 of the scores are actually dropped.
Is there another way (custom formula) that can only highlight the lowest 3 scores in the row even when the 3rd lowest may have duplicates in the row?
I've come up with this formula (assuming values start in A1) which unfortunately is a bit long
=OR(A1<SMALL($A1:$I1,3),AND(A1=SMALL($A1:$I1,3),COUNTIF($A1:A1,SMALL($A1:$I1,3))<=(3-COUNTIF($A1:$I1,"<"&SMALL($A1:$I1,3)))))
or
=OR(A1<SMALL($A1:$I1,3),AND(A1=SMALL($A1:$I1,3),(COUNTIF($A1:A1,SMALL($A1:$I1,3))+COUNTIF($A1:$I1,"<"&SMALL($A1:$I1,3))<=3)))
The logic is that it highlights all cells which are less than the third smallest value, then any values (starting from the left) which are equal to the third smallest value until the total equals three.
I've changed the second row to show that it selects the second zero instead of the second 60.
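The logic of the formula can be sketched in Python to make the tie handling easier to follow. The function name `dropped_mask` is made up for this illustration, and the sketch assumes the row holds at least three scores.

```python
def dropped_mask(scores, drop=3):
    """Mark exactly `drop` lowest scores, breaking ties from the left."""
    threshold = sorted(scores)[drop - 1]          # like SMALL(range, 3)
    below = sum(1 for s in scores if s < threshold)
    mask = []
    seen_equal = 0                                 # like COUNTIF($A1:A1, ...)
    for s in scores:
        if s < threshold:
            mask.append(True)
        elif s == threshold:
            seen_equal += 1
            # Only as many ties as are still needed to reach `drop`.
            mask.append(seen_equal <= drop - below)
        else:
            mask.append(False)
    return mask


row = [119, 101, 60, 100, 0, 109, 60, 60, 112]
mask = dropped_mask(row)
print(mask)
print(sum(s for s, dropped in zip(row, mask) if not dropped))
```

For the row from the question, the kept scores add up to the stated total of 601.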

How to generate a certain amount of numbers and spread them randomly across a grid?

I want to generate the number 2 five times and the number 1 ten times. I'm trying to spread these randomly across a String Grid in Delphi. I also want to fill the rest of the grid, wherever there isn't a 1 or 2, with 0's. I have no idea how to even start here.
It would look something like this (P stands for player and there would only be 5 2's and 10 1's): https://gyazo.com/aeef05c3a92ce7847c0f42ad40faa733
Given a grid with dimensions m×n, create an array of length m * n. Put five 2's and 10 1's in the array, and fill the remainder with 0's. (We'll assume the product of m and n is at least 15.) Shuffle the array. Copy each element of the shuffled array into successive cells in the grid.
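The shuffle approach is language-independent. Here is a short Python sketch of it rather than Delphi code, with the made-up name `random_grid` and the grid represented as a list of rows.

```python
import random


def random_grid(rows, cols, twos=5, ones=10, rng=random):
    """Build a rows x cols grid holding `twos` 2's, `ones` 1's and 0's elsewhere."""
    total = rows * cols
    if total < twos + ones:
        raise ValueError("grid too small for the requested values")
    # Flat array with the exact number of each value, then shuffle it.
    cells = [2] * twos + [1] * ones + [0] * (total - twos - ones)
    rng.shuffle(cells)
    # Copy successive elements into successive cells, row by row.
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


for row in random_grid(6, 8):
    print(*row)
```

Because the counts are fixed before shuffling, no cell can be chosen twice and no retry logic is needed.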
While the approach presented in Rob's answer will do the job, I personally think it is way too complicated for its purpose.
So what would be a simpler approach?
Well, your goal is to place these numbers at random positions in the grid.
How do you determine the position of some object in a grid? You do it by its X (Column) and Y (Row) coordinates.
So how do you get a random position in a grid? Simply choose two random values for the X and Y coordinates.
As for placing the required number of 1's and 2's, use two simple loops.
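A Python sketch of this coordinate-based approach follows, with the made-up name `place_randomly`. One detail is an assumption added here rather than something stated in the answer: a randomly chosen cell may already be occupied, so the sketch redraws until it hits an empty cell.

```python
import random


def place_randomly(rows, cols, twos=5, ones=10, rng=random):
    """Fill a grid with 0's, then drop 2's and 1's on random empty cells."""
    if rows * cols < twos + ones:
        raise ValueError("grid too small for the requested values")
    grid = [[0] * cols for _ in range(rows)]
    for value, count in ((2, twos), (1, ones)):
        placed = 0
        while placed < count:
            x = rng.randrange(cols)   # random column
            y = rng.randrange(rows)   # random row
            # Assumption: skip occupied cells so the counts stay exact.
            if grid[y][x] == 0:
                grid[y][x] = value
                placed += 1
    return grid


for row in place_randomly(6, 8):
    print(*row)
```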

Cobol dashes are confusing me [duplicate]

Possible Duplicate:
cobol difference with Picture having a dash (-) and a having a X
I'm trying to get to grips with Cobol and can't understand the dashes when formatting a number. I have this example:
--9
Am I correct with the following?
The first dash - If number is a negative put a dash otherwise don't.
The second dash - I'm confused with this. There is already a dash at the start to specify whether it's negative or positive.
9 - Numeric digit (0-9)
An example would be good. :S
Thanks
In view of your previous question, I'm not sure what you are having trouble with. But let's try again...
In COBOL, numeric display fields may contain various types of "punctuation". This "punctuation" is defined in the item's PICTURE clause. A few examples of the type of "punctuation" symbols you can use are: explicit decimal points, plus/minus signs, CR/DB indicators and thousands separators (commas in North America). There is a well defined set of rules that determine what type of "punctuation" can occur in the PICTURE clause and where. This link to PICTURE CLAUSE editing explains how to construct (or read) any given PICTURE clause.
One thing that you, and many others new to COBOL, trip up on is that a data definition in COBOL specifies two distinctly different types of information about numeric display data. One is the range of values it may hold and the other is how that range of values may be displayed. Your example: PICTURE --9 tells me two things about the data item: 1) Values are integers in the range of -99 through to +99, and 2) Displaying this item will take 3 spaces. If the number is positive, spaces will appear before the first non-zero digit. If the number is negative, a minus sign will appear immediately to the left of the first non-zero digit. Consider the following COBOL DISPLAY statement:
DISPLAY '>' DISP-NBR '<'
If DISP-NBR has a PICTURE clause of: --9 this is how various values will be displayed.
0 displays as: >  0<
-1 displays as: > -1<
-11 displays as: >-11<
10 displays as: > 10<
Note that all displays take 3 character positions. At least 1 digit will always be displayed (because of the '9' in the PICTURE clause), other than that, no leading zeros are displayed. A minus sign will display only for negative values. The minus sign, if displayed will be to the immediate left of the first displayed digit.
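The display rules above can be mimicked in a few lines of Python. This is an emulation for illustration only, not COBOL, and the function name `format_dash_dash_9` is made up. Raising an error for out-of-range values is a choice made for this sketch and is not a claim about what a COBOL MOVE does.

```python
def format_dash_dash_9(value):
    """Emulate how an integer displays under PICTURE --9 (always 3 characters)."""
    if not -99 <= value <= 99:
        raise ValueError("PICTURE --9 holds at most two digits")
    digits = str(abs(value))            # at least one digit, no leading zeros
    sign = "-" if value < 0 else ""     # minus sits right before the first digit
    return (sign + digits).rjust(3)     # pad on the left with spaces


for n in (0, -1, -11, 10):
    print(">" + format_dash_dash_9(n) + "<")
```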
Now to answer your specific question: The total number of character positions needed to display a numeric display data item is determined by the length of the PICTURE. You have a 3 character PICTURE so 3 character positions are needed. When a sign is specified in the PICTURE, a space is always reserved for it. This is what limits the range of integers to those containing at most 2 digits. The second minus sign indicates 'zero suppression'. Zero suppression just means not printing leading zeros. Only 1 minus sign is ever printed and it will be to the immediate left of the first displayed digit.
COBOL contains a lot of flexibility with respect to displaying numbers. Understanding the numeric display PICTURE clause is key to understanding how this all works.
from stackoverflow:cobol-difference-with-picture-having-a-dash-and-a-having-a-x
The dash means that if you have a negative number, a dash will be shown beside (at the left of) the number. Only one dash will be displayed. If the number is positive, a space will be shown for every dash.
