Minimise the maximum difference between any 2 elements of an array - greedy

You are given an array and an integer k. How can you minimise the maximum difference between any two elements of the array by changing at most k elements to any number?
E.g. 4,7,4,7,4 and k=2
Change the elements at indices 1 and 3 to 4. The array becomes 4,4,4,4,4, so the maximum difference between any two elements becomes 0.
My idea was to sort the array, compute the absolute difference between the median and every element, and change the elements with the largest difference to the median.
E.g. 4,7,4,7,4. Median = 4
Array after sorting: 4,4,4,7,7
Absolute differences: 0,0,0,3,3. So I changed the numbers with the maximum absolute difference.
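For comparison, here is a minimal sketch of a different approach from the median idea above; it is not taken from the post. It assumes that the n - k elements left unchanged should be contiguous in sorted order, since every changed element can then be set to a value inside the kept range. The function name min_max_difference is made up for illustration.

```python
def min_max_difference(values, k):
    """Smallest achievable (max - min) after changing at most k elements."""
    n = len(values)
    keep = n - k  # number of elements left untouched
    if keep <= 1:
        # Everything, or all but one element, can be set to a single value.
        return 0
    ordered = sorted(values)
    # Slide a window of `keep` consecutive sorted elements and take the
    # window with the smallest spread; the other k elements are changed
    # to any value inside that window.
    return min(ordered[i + keep - 1] - ordered[i] for i in range(n - keep + 1))


print(min_max_difference([4, 7, 4, 7, 4], 2))
```

On the example from the question this keeps the three 4s and changes the two 7s, which matches the result obtained by hand.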


How do I get the average of consecutive positive or negative numbers into one cell?

https://docs.google.com/spreadsheets/d/1Z99BFJws_fjrY6Z143opVntSqI5t8vG8Dkyk5-zWx68/edit?usp=sharing
A2:A16 is a column of the values
C2:C16 adds up consecutive positive and negative values in the column but I would like a formula that could get the average of those consecutive values into one cell.
H7 shows what should be the answer for the average of those 3 consecutive positive numbers.
H13 shows what should be the answer for the average of those 4 consecutive negative numbers.
Is there a formula that can do this?
This approach uses a helper column in B2:B16.
In B2 type this formula:
=IF(SIGN(A2)=SIGN(A1),B1,0)+SIGN(A2:A16)
and drag it down to B16.
This helps determine the longest run of consecutive positive numbers and of consecutive negative numbers.
It should look like this:
In H7 insert this formula:
=ARRAYFORMULA(AVERAGE(OFFSET(C2,MATCH(E4,B2:B16,0)-E4,0,E4,1)))
This gives the average of the longest run of consecutive positive numbers.
In H13 insert this formula:
=ARRAYFORMULA(AVERAGE(OFFSET(C2,MATCH(-(E5),B2:B16,0)-E5,0,E5,1)))
This gives the average of the longest run of consecutive negative numbers.
Output:
Note: You can place the helper column anywhere, as long as it uses the same rows, and hide it; just update the references to the helper column in the H7 and H13 formulas. Also, if the longest run length occurs more than once, only the first occurrence is used.
References:
SIGN
ARRAYFORMULA
AVERAGE
OFFSET

Extracting properties of handwritten digits to speed up nearest neighbour algorithm

I have 1024-bit binary representations of three handwritten digits: 0, 1 and 8.
Each one is a 32x32 bitmap of a digit whose rows are concatenated to form a binary vector.
There are 50 binary vectors for each digit.
To classify a digit with nearest neighbour, we can use the Hamming distance or some other metric to compare the vectors.
Now I want to use another technique where, instead of looking at every bit of a vector, I compare the vectors on a smaller number of bits.
For example, when comparing the 1024-bit bitmaps of the digits '8' and '0', there must be 1s in the middle of the vector for '8', because an 8 visually appears as two zeros stacked in a column.
So the algorithm would look for the intersection of the two zeros (which would be in the middle of the digit).
That is the way I want to work: I want to convert the low-level representation (the 1024-bit bitmap vector) into a high-level representation (consisting of two properties extracted from the bitmap).
Any suggestions? I hope the question is reasonably clear.
Idea 1: Flood fill
This idea does not use the 50 patterns you have per digit: it is based on the idea that usually a "1" has all 0-bits connected around that "1" shape, while a "0" separates the 0-bits inside it from those outside it, and an "8" has two such enclosed areas. So counting connected areas of 0-bits would identify which of the three it is.
So you could use a flood fill algorithm, starting at any 0 bit in the vector, and set all those connected 0-bits to 1. In a 1 dimensional array you need to take care to correctly identify connected bits (either horizontally: 1 position apart, but not crossing a 32 boundary, or vertically... 32 positions apart). Of course, this flood-filling will destroy the image - so make sure to use a copy. If after one such flood-fill there are still 0 bits (which were therefore not connected to those you turned into 1), then choose one of those and start a second flood-fill there. Repeat if necessary.
When all bits have been set to 1 in that way, use the number of flood-fills you had to perform, as follows:
One flood-fill? It's a "1", because all 0-bits are connected.
Two flood-fills? It's a "0", because the shape of a zero separates two areas (inside/outside)
Three flood-fills? It's an "8", because this shape separates three areas of connected 0-bits.
Of course, this process assumes that these handwritten digits are well-formed. For example, if an 8-shape would have a small gap, like here:
...then it will be identified as a "0" instead of an "8". This particular problem could be resolved by identifying "loose ends" of 1-bits (a "line" that stops). When you have two of those at a short distance from each other, increase the number you got from flood-fill counting by 1 (as if those two ends were connected).
Similarly, if a "0" accidentally has a small second loop, like here:
...it will be identified as an "8" instead of a "0". You could prevent this particular problem by requiring that each flood-fill finds a minimum number of 0-bits (like at least 10 0-bits) to count as one.
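The flood-fill counting could be sketched roughly as follows. This is a minimal illustration, assuming the bitmap is a flat list of 1024 integers (0 or 1) in row-major order and that only horizontal and vertical neighbours count as connected. The names count_background_regions and classify_by_regions are made up, and min_area is the "at least 10 0-bits" threshold suggested above. The loose-ends repair is not included.

```python
from collections import deque

SIZE = 32  # the bitmap is SIZE x SIZE, stored row by row


def count_background_regions(bits, min_area=10):
    """Count connected regions of 0-bits that contain at least min_area bits."""
    grid = list(bits)  # work on a copy, the fill overwrites the image
    regions = 0
    for start in range(SIZE * SIZE):
        if grid[start] != 0:
            continue
        grid[start] = 1
        queue = deque([start])
        area = 0
        while queue:
            pos = queue.popleft()
            area += 1
            row, col = divmod(pos, SIZE)
            neighbours = []
            if col > 0:
                neighbours.append(pos - 1)     # left, same row only
            if col < SIZE - 1:
                neighbours.append(pos + 1)     # right, same row only
            if row > 0:
                neighbours.append(pos - SIZE)  # up
            if row < SIZE - 1:
                neighbours.append(pos + SIZE)  # down
            for nxt in neighbours:
                if grid[nxt] == 0:
                    grid[nxt] = 1
                    queue.append(nxt)
        if area >= min_area:
            regions += 1  # tiny regions are treated as noise
    return regions


def classify_by_regions(bits):
    """Map the region count to a digit; returns None for any other count."""
    return {1: "1", 2: "0", 3: "8"}.get(count_background_regions(bits))
```

An explicit queue is used instead of recursion so that a large connected area cannot exceed Python's recursion limit.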
Idea 2: probability vector
For each digit, add up the 50 example vectors you have, so that for each position you have a count somewhere between 0 and 50. You would have one such "probability" vector per digit, so prob0, prob1 and prob8. If prob8[501] = 45, it means that it is highly probable (45/50) that an "8" vector will have a 1-bit at index 501.
Now transform these 3 probability vectors as follows: instead of storing a count per position, store the positions in order of decreasing count (probability). So if prob8[513] has the highest value (like 49), then the new array for "8" should start like [513, ...]. Let's call these new vectors A0, A1 and A8 (for the corresponding digits).
Finally, when you need to match a given input vector, simultaneously go through A0, A1 and A8 (always looking at the same index in the three vectors) and keep 3 scores. When the input vector has a 1 at the position specified in A0[i], then add 1 to score0. If it also has a 1 at the position specified in A1[i] (same i), then add 1 to score1. Same thing for score8. Increment i, and repeat. Stop this iteration as soon as you have a clear winner, i.e. when the highest score among score0, score1 and score8 has crossed a threshold difference with the second highest score among them. At that point you know which digit is being represented.

Funny (rounding?) errors when adding

One column has numbers (always with 2 decimals, some are computed but all multiplications and divisions rounded to 2 decimals), the other is cumulative. The cumulative column has formula =<above cell>+<left cell>.
In the cumulative column the result is 58.78, the next number in the first column is -58.78. Because of different formatting for zero than for positive or negative numbers, I spotted something was wrong. Changing the format to several decimals, the numbers appear as:
£58.780000000000000000000000000000
-£58.780000000000000000000000000000 £0.000000000000007105427357601000
The non-zero zero is about 2^(-47). Another time the numbers in the same situation are:
£50.520000000000000000000000000000
-£50.520000000000000000000000000000 -£0.000000000000007105427357601000
How can that happen?
Also, if I change the cell in cumulative column into the actual number 58.78, the result suddenly becomes zero.
Google Sheets uses double-precision floating-point arithmetic, which creates such artifacts. The relative precision of this format is 2^(-53), so for a number of size around 2^6 = 64 we expect a rounding error of about 2^(-47).
Some spreadsheet users would be worried if they found out that "58.78" is actually not 58.78, because this number does not admit an exact representation in this floating point format. So the spreadsheet is hiding the truth, rounding the number for display and printing fake zeros when asked for more digits. Those zeros after 58.78 are fake.
The truth comes to light when you subtract two numbers that appear to be identical but are not — because they were computed in different ways, e.g. one obtained as a sum while the other by direct input. Rounding the result of subtraction to zero would be too much of a lie: this is no longer a matter of a small relative error, the difference between 2^(-47) and 0 may be an important one. Hence the unexpected reveal of the mechanics behind the scenes.
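The same behaviour can be reproduced outside a spreadsheet. The sketch below uses Python, whose float type is also an IEEE 754 double; it only illustrates the mechanism and does not claim to reproduce how Google Sheets computes or displays values. math.ulp requires Python 3.9 or later.

```python
import math
from decimal import Decimal

# The double closest to 58.78 is not exactly 58.78.
stored = Decimal(58.78)          # exact value of the stored double
print(stored)
print(stored == Decimal("58.78"))

# Spacing between neighbouring doubles near 58.78.
spacing = math.ulp(58.78)
print(spacing == 2.0 ** -47)

# Two values that display identically can differ in the last bit.
total = 0.1 + 0.2
print(total == 0.3)
print(total - 0.3)
```

The residue printed on the last line plays the same role as the unexpected non-zero "zero" in the cumulative column.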
See also: Why does Google Spreadsheets says Zero is not equals Zero?

what is the format of word alignments in machine translation?

I am reading this paper and having difficulty understanding how word alignments are represented. To be precise, right below section 4.1 the authors say the format of an alignment is (i,j), where i ranges over the source sentence length and j ranges over the target sentence length. This means each alignment is a pair of two numbers and, given that sentences are typically no longer than 40-100 words, the values of i and j can be stored using the short type. So I expect the space required to store these alignments to be 2 x sizeof(short) x number of word alignments. But on the next page, right above section 4.2, they say the space is sizeof(short) x number of word alignments. Why? Am I confusing something?

What does value "-inf" mean after multiplication of two double in Objective-C

I'm doing some calculations on double values with 17 digits after the decimal point,
like 0.1256478965842365987 * 0.125639874569874563
and I get the value "-inf" when I display it in the console.
What does that mean?
It means minus infinity.
IEEE 754 floating point numbers can represent positive or negative infinity, and NaN (not a number). These three values arise from calculations whose result is undefined or cannot be represented accurately. You can also deliberately set a floating-point variable to any of them, which is sometimes useful. Some examples of calculations that produce infinity or NaN:
Now, it is strange that you got that multiplying those two numbers.
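A minimal sketch of how these special values arise, written in Python rather than Objective-C (Python floats are IEEE 754 doubles as well, although Python raises an exception for float division by zero, so overflow is used here instead). The last lines evaluate the product from the question, which comes out finite, suggesting that the -inf originates somewhere else in the calculation.

```python
import math

big = 1e308

pos_inf = big * 10         # overflows the double range: +infinity
neg_inf = -big * 10        # overflows in the negative direction: -infinity
nan = pos_inf - pos_inf    # infinity minus infinity is undefined: NaN

print(pos_inf, neg_inf, nan)
print(math.isinf(neg_inf), math.isnan(nan))

# The multiplication from the question is nowhere near overflowing.
product = 0.1256478965842365987 * 0.125639874569874563
print(product, math.isfinite(product))
```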
