minimize the maximum continious subarray in array of 0/1 - greedy

Algo question
Binary array of 0/1 given
In one operation i can flip any array[index] of array i.e. 0->1 or 1->0
so aim is to minimize the maximum lenth of continious 1's or 0's by using atmost k flips
eg if 11111 if array and k=1 ,best is to make array as 11011
And minimized value of maximum continous 1's or 0's is 2
for 111110111111 and k=3 ans is 2
I tried Brute Force (by trying various position flips) but its not efficient
I think Greedy ,but can not figure out exactly
can you please help me for algo,O(n) or similar

A solution could be devised by reading each bit in order and recording the size of each continuous group of 1 into a list A.
Once you are done filling A, you can follow the algorithm narrated by the pseudocode below:
result = N
for i = 1 to N
flips_needed = 0
for a in A:
flips_needed += <number of flips needed to make sure largest group remaining in a is of size i>
if k >= flips_needed:
result = flips_needed
break
return result
N is the number of bits in the entire initial sequence.
The algorithm above works by dividing the groups of 1 into sizes of at most i. Whenever doing that requires <= k, we have the result we are looking for, as i starts from 1 and goes up. (i.e. when we found flips_needed <= k, we know the groups of 1 are as minimal as they can get)

Related

Vectorization of FOR loop

Is there a way to vectorize this FOR loop I know about gallery ("circul",y) thanks to user carandraug
but this will only shift the cell over to the next adjacent cell I also tried toeplitz but that didn't work).
I'm trying to make the shift adjustable which is done in the example code with circshift and the variable shift_over.
The variable y_new is the output I'm trying to get but without having to use a FOR loop in the example (can this FOR loop be vectorized).
Please note: The numbers that are used in this example are just an example the real array will be voice/audio 30-60 second signals (so the y_new array could be large) and won't be sequential numbers like 1,2,3,4,5.
tic
y=[1:5];
[rw col]= size(y); %get size to create zero'd array
y_new= zeros(max(rw,col),max(rw,col)); %zero fill new array for speed
shift_over=-2; %cell amount to shift over
for aa=1:length(y)
if aa==1
y_new(aa,:)=y; %starts with original array
else
y_new(aa,:)=circshift(y,[1,(aa-1)*shift_over]); %
endif
end
y_new
fprintf('\nfinally Done-elapsed time -%4.4fsec- or -%4.4fmins- or -%4.4fhours-\n',toc,toc/60,toc/3600);
y_new =
1 2 3 4 5
3 4 5 1 2
5 1 2 3 4
2 3 4 5 1
4 5 1 2 3
Ps: I'm using Octave 4.2.2 Ubuntu 18.04 64bit.
I'm pretty sure this is a classic XY problem where you want to calculate something and you think it's a good idea to build a redundant n x n matrix where n is the length of your audio file in samples. Perhaps you want to play with autocorrelation but the key point here is that I doubt that building the requested matrix is a good idea but here you go:
Your code:
y = rand (1, 3e3);
shift_over = -2;
clear -x y shift_over
tic
[rw col]= size(y); %get size to create zero'd array
y_new= zeros(max(rw,col),max(rw,col)); %zero fill new array for speed
for aa=1:length(y)
if aa==1
y_new(aa,:)=y; %starts with original array
else
y_new(aa,:)=circshift(y,[1,(aa-1)*shift_over]); %
endif
end
toc
my code:
clear -x y shift_over
tic
n = numel (y);
y2 = y (mod ((0:n-1) - shift_over * (0:n-1).', n) + 1);
toc
gives on my system:
Elapsed time is 1.00379 seconds.
Elapsed time is 0.155854 seconds.

arbitrarily weighted moving average (low- and high-pass filters)

Given input signal x (e.g. a voltage, sampled thousand times per second couple of minutes long), I'd like to calculate e.g.
/ this is not q
y[3] = -3*x[0] - x[1] + x[2] + 3*x[3]
y[4] = -3*x[1] - x[2] + x[3] + 3*x[4]
. . .
I'm aiming for variable window length and weight coefficients. How can I do it in q? I'm aware of mavg and signal processing in q and moving sum qidiom
In the DSP world it's called applying filter kernel by doing convolution. Weight coefficients define the kernel, which makes a high- or low-pass filter. The example above calculates the slope from last four points, placing the straight line via least squares method.
Something like this would work for parameterisable coefficients:
q)x:10+sums -1+1000?2f
q)f:{sum x*til[count x]xprev\:y}
q)f[3 1 -1 -3] x
0n 0n 0n -2.385585 1.423811 2.771659 2.065391 -0.951051 -1.323334 -0.8614857 ..
Specific cases can be made a bit faster (running 0 xprev is not the best thing)
q)g:{prev[deltas x]+3*x-3 xprev x}
q)g[x]~f[3 1 -1 -3]x
1b
q)\t:100000 f[3 1 1 -3] x
4612
q)\t:100000 g x
1791
There's a kx white paper of signal processing in q if this area interests you: https://code.kx.com/q/wp/signal-processing/
This may be a bit old but I thought I'd weigh in. There is a paper I wrote last year on signal processing that may be of some value. Working purely within KDB, dependent on the signal sizes you are using, you will see much better performance with a FFT based convolution between the kernel/window and the signal.
However, I've only written up a simple radix-2 FFT, although in my github repo I do have the untested work for a more flexible Bluestein algorithm which will allow for more variable signal length. https://github.com/callumjbiggs/q-signals/blob/master/signal.q
If you wish to go down the path of performing a full manual convolution by a moving sum, then the best method would be to break it up into blocks equal to the kernel/window size (which was based on some work Arthur W did many years ago)
q)vec:10000?100.0
q)weights:30?1.0
q)wsize:count weights
q)(weights$(((wsize-1)#0.0),vec)til[wsize]+) each til count v
32.5931 75.54583 100.4159 124.0514 105.3138 117.532 179.2236 200.5387 232.168.
If your input list not big then you could use the technique mentioned here:
https://code.kx.com/q/cookbook/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
That uses 'scan' adverb. As that process creates multiple lists which might be inefficient for big lists.
Other solution using scan is:
q)f:{sum y*next\[z;x]} / x-input list, y-weights, z-window size-1
q)f[x;-3 -1 1 3;3]
This function also creates multiple lists so again might not be very efficient for big lists.
Other option is to use indices to fetch target items from the input list and perform the calculation. This will operate only on input list.
q) f:{[l;w;i]sum w*l i+til 4} / w- weight, l- input list, i-current index
q) f[x;-3 -1 1 3]#'til count x
This is a very basic function. You can add more variables to it as per your requirements.

if (freq) x$counts else x$density length > 1 and only the first element will be used

for my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employee at risk (Y) for each occupation category. I have a dataset like this:
X Y
1 0.1300 0
2 0.1000 0
3 0.0841 1513
4 0.0221 287
5 0.1175 3641
....
700 0.9875 4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density
length > 1 and only the first element will be used
Can someone tell me what is the problem and write me the right command?
Thank you!
It is worth pointing out that the message displayed is a Warning message, and should not prevent the results being plotted. However, it does indicate there are some issues with the data.
Without the full dataset, it is not 100% obvious what may be the problem. I believe it is caused by the data not being in the correct format, with two potential issues. Firstly, some values have a value of 0, and these won't be plotted on the histogram. Secondly, the observations appear to be inconsistently spaced.
Histograms are best built from one of two datasets:
A dataframe which has been aggregated grouped into consistently sized bins.
A list of values X which in the data
I prefer the second technique. As originally shown here The expandRows() function in the package splitstackshape can be used to repeat the number of rows in the dataframe by the number of observations:
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = runif(900, 0, 1000))
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim=c(0,1))
dataset1$bins <- cut(dataset1$X, breaks = seq(0,1,0.01), labels = FALSE)

How to use segment tree and scanline

Given 300000 segments.
Consider any pair of segments: a = [l1,r1] and b = [l2,r2].
If l2 >= l1 and r2 <= r1 , it is "good" pair.
If a == b, it is "bad" pair.
Overwise, it is "bad" pair.
How to find number of all "good" pairs among given segments using segment tree and scanline?
Sort the segments in increasing order with respect to their l-values and for pairs with same l-values sort them in decreasing order with respect to their r-value.
Suppose for a particular , you want to count the number of good pairs (ai,aj) such that j < i. Let ai=[l1,r1] and aj = [l2,r2]. Then we have l2 <= l1. Now we need to count all the possible values of j such that r2 <= r1. This can be done by maintaining a segment tree for the values of r for all j such that 0 < j < i. After querying for the i-th pair, update the segment tree with the r-value of the i-th segment.
Coming to segment tree part, build a segment tree on the values of r. On updating a value of r in segment tree, add 1 to the value of r in the segment tree and for querying for a particular value of r, query for sum in the range [0,r-1]. This will give total number of segments that fit good with the given segment.
If the values of r are big that would not fit into segment tree, then apply coordinate compression to values first and then use segment tree for the compressed values.

Histogram calculation in julia-lang

refer to julia-lang documentations :
hist(v[, n]) → e, counts
Compute the histogram of v, optionally using approximately n bins. The return values are a range e, which correspond to the edges of the bins, and counts containing the number of elements of v in each bin. Note: Julia does not ignore NaN values in the computation.
I choose a sample range of data
testdata=0:1:10;
then use hist function to calculate histogram for 1 to 5 bins
hist(testdata,1) # => (-10.0:10.0:10.0,[1,10])
hist(testdata,2) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,3) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,4) # => (-5.0:5.0:10.0,[1,5,5])
hist(testdata,5) # => (-2.0:2.0:10.0,[1,2,2,2,2,2])
as you see when I want 1 bin it calculates 2 bins, and when I want 2 bins it calculates 3.
why does this happen?
As the person who wrote the underlying function: the aim is to get bin widths that are "nice" in terms of a base-10 counting system (i.e. 10k, 2×10k, 5×10k). If you want more control you can also specify the exact bin edges.
The key word in the doc is approximate. You can check what hist is actually doing for yourself in Julia's base module here.
When you do hist(test,3), you're actually calling
hist(v::AbstractVector, n::Integer) = hist(v,histrange(v,n))
That is, in a first step the n argument is converted into a FloatRange by the histrange function, the code of which can be found here. As you can see, the calculation of these steps is not entirely straightforward, so you should play around with this function a bit to figure out how it is constructing the range that forms the basis of the histogram.

Resources