How to find Quartile 3 of a linked list given that you can only iterate through the list once - linked-list

Similar to question asked here (How to find the quartiles in the linked list with only one iteration).
I can easily find Quartile 1 using the method specified in the above post, however, quartile 3 is always wrong for me. Sometimes it is off by 1, and others 2 or more. I am moving the Quartile 3 pointer by 3 every iteration, is this correct, or am I missing something?

The idea is that the node references should make a step forward depending on the number of the iteration modulo 4:
The node reference for q1 should only move once in every 4 iterations, while the q2 node reference should step forward once every 2 iterations, and finally the q3 reference should step forward 3 out of 4 iterations.
Once the iterations are completed, it has to be decided whether the q* nodes are exactly on the corresponding median, or the average with its successor node should be taken. Again, this is determined by the size of the list modulo 4.
Here is a little implementation in JavaScript:
function makeList(arr) {
return arr.reduceRight((next, val) => ({ next, val }), null);
}
function* fromList(head) {
while (head) {
yield head.val;
head = head.next;
}
}
function getMedians(head) {
if (head == null || head.next == null) return; // Need at least 2 values
let node1 = head,
node2 = head,
node3 = head.next,
node4 = node3.next,
size = 2;
while (node4) {
if (size % 4 == 1) node1 = node1.next;
if (size % 2 == 0) node2 = node2.next;
if (size % 4 != 3) node3 = node3.next;
node4 = node4.next;
size++;
}
return [
size % 4 < 2 ? (node1.val + node1.next.val) / 2 : node1.val,
size % 2 < 1 ? (node2.val + node2.next.val) / 2 : node2.val,
size % 4 < 2 ? (node3.val + node3.next.val) / 2 : node3.val,
];
}
// Example run
let head = makeList([20, 30, 40, 50, 60, 70]);
console.log("list", ...fromList(head));
let [q1, q2, q3] = getMedians(head);
console.log("q1, q2, q3 => ", q1, q2, q3);

Related

Generate all numbers for 0 to 1 using a loop

This question might seem a bit silly..:) But I think I'm missing something somewhere..so bit confused...
I wanted to generate all numbers from 0 to 1. In other words, if I do 1/2, I get 0.5. Then 0.5/2 = 0.25. Then 0.25/2 = 0.125. This will go on until 0.00000001 (A total of 26 divisions)
But I want to generate all numbers in an increasing order from 0.00000001 to 1.
I tried doing something like so...
let first = 0.00000001
let last = 1.0
let interval = first * 2
let sequence = stride(from: first, to: last, by: interval)
for element in sequence {
print(element)
}
But it's not working. It seemed it just prints infinitely...
How can I properly use a for loop and print from 0.00000001 to 1 in a limited number of iterations..? Or any other loops to be used in this case..?
You can't use stride. stride produces an arithmetic sequence with a difference of interval, which is 0.00000002:
0.00000001
0.00000003
0.00000005
0.00000007
...
You want a geometric sequence between 0 and 1.
You could use sequence instead, which generates an infinite sequence:
let first = 0.00000001
let last = 1.0
for item in sequence(first: first, next: { $0 * 2 }).prefix(while: { $0 < last }) {
print(item)
}
{ $0 * 2 } is the function that generates the next element, and prefix(while:) is used to get first elements that satisfy the < last condition.
Here is another way you could approach it. Use stride to count down the powers of 2 from 26 to 0 and divide 1.0 by that power of 2 and display only the first 8 decimal places:
for n in stride(from: 26, through: 0, by: -1) {
print(String(format: "%.8f", 1.0 / pow(2.0, Double(n))))
}
or equivalently (removing the 1/n by using negative exponents):
for n in -26...0 {
print(String(format: "%.8f", pow(2.0, Double(n))))
}
Output:
0.00000001
0.00000003
0.00000006
0.00000012
0.00000024
0.00000048
0.00000095
0.00000191
0.00000381
0.00000763
0.00001526
0.00003052
0.00006104
0.00012207
0.00024414
0.00048828
0.00097656
0.00195312
0.00390625
0.00781250
0.01562500
0.03125000
0.06250000
0.12500000
0.25000000
0.50000000
1.00000000

Best way to parallelize computation over dask blocks that do not return np arrays?

I'd like to return a dask dataframe from an overlapping dask array computation, where each block's computation returns a pandas dataframe. The example below shows one way to do this, simplified for demonstration purposes. I've found a combination of da.overlap.overlap and to_delayed().ravel() as able to get the job done, if I pass in the relevant block key and chunk information.
Edit:
Thanks to a #AnnaM who caught bugs in the original post and then made it general! Building off of her comments, I'm including an updated version of the code. Also, in responding to Anna's interest in memory usage, I verified that this does not seem to take up more memory than naively expected.
def extract_features_generalized(chunk, offsets, depth, columns):
shape = np.asarray(chunk.shape)
offsets = np.asarray(offsets)
depth = np.asarray(depth)
coordinates = np.stack(np.nonzero(chunk)).T
keep = ((coordinates >= depth) & (coordinates < (shape - depth))).all(axis=1)
data = coordinates + offsets - depth
df = pd.DataFrame(data=data, columns=columns)
return df[keep]
def my_overlap_generalized(data, chunksize, depth, columns, boundary):
data = data.rechunk(chunksize)
data_overlapping_chunks = da.overlap.overlap(data, depth=depth, boundary=boundary)
dfs = []
for block in data_overlapping_chunks.to_delayed().ravel():
offsets = np.array(block.key[1:]) * np.array(data.chunksize)
df_block = dask.delayed(extract_features_generalized)(block, offsets=offsets,
depth=depth, columns=columns)
dfs.append(df_block)
return dd.from_delayed(dfs)
data = np.zeros((2,4,8,16,16))
data[0,0,4,2,2] = 1
data[0,1,4,6,2] = 1
data[1,2,4,8,2] = 1
data[0,3,4,2,2] = 1
arr = da.from_array(data)
df = my_overlap_generalized(arr,
chunksize=(-1,-1,-1,8,8),
depth=(0,0,0,2,2),
columns=['r', 'c', 'z', 'y', 'x'],
boundary=tuple(['reflect']*5))
df.compute().reset_index()
-- Remainder of original post, including original bugs --
My example only does xy overlaps, but it's easy to generalize. Is there anything below that is suboptimal or could be done better? Is anything likely to break because it's relying on low-level information that could change (e.g. block key)?
def my_overlap(data, chunk_xy, depth_xy):
data = data.rechunk((-1,-1,-1, chunk_xy, chunk_xy))
data_overlapping_chunks = da.overlap.overlap(data,
depth=(0,0,0,depth_xy,depth_xy),
boundary={3: 'reflect', 4: 'reflect'})
dfs = []
for block in data_overlapping_chunks.to_delayed().ravel():
offsets = np.array(block.key[1:]) * np.array(data.chunksize)
df_block = dask.delayed(extract_features)(block, offsets=offsets, depth_xy=depth_xy)
dfs.append(df_block)
# All computation is delayed, so downstream comptutions need to know the format of the data. If the meta
# information is not specified, a single computation will be done (which could be expensive) at this point
# to infer the metadata.
# This empty dataframe has the index, column, and type information we expect in the computation.
columns = ['r', 'c', 'z', 'y', 'x']
# The dtypes are float64, except for a small number of columns
df_meta = pd.DataFrame(columns=columns, dtype=np.float64)
df_meta = df_meta.astype({'c': np.int64, 'r': np.int64})
df_meta.index.name = 'feature'
return dd.from_delayed(dfs, meta=df_meta)
def extract_features(chunk, offsets, depth_xy):
r, c, z, y, x = np.nonzero(chunk)
df = pd.DataFrame({'r': r, 'c': c, 'z': z, 'y': y+offsets[3]-depth_xy,
'x': x+offsets[4]-depth_xy})
df = df[(df.y > depth_xy) & (df.y < (chunk.shape[3] - depth_xy)) &
(df.z > depth_xy) & (df.z < (chunk.shape[4] - depth_xy))]
return df
data = np.zeros((2,4,8,16,16)) # round, channel, z, y, x
data[0,0,4,2,2] = 1
data[0,1,4,6,2] = 1
data[1,2,4,8,2] = 1
data[0,3,4,2,2] = 1
arr = da.from_array(data)
df = my_overlap(arr, chunk_xy=8, depth_xy=2)
df.compute().reset_index()
First of all, thanks for posting your code. I am working on a similar problem and this was really helpful for me.
When testing your code, I discovered a few mistakes in the extract_features function that prevent your code from returning correct indices.
Here is a corrected version:
def extract_features(chunk, offsets, depth_xy):
r, c, z, y, x = np.nonzero(chunk)
df = pd.DataFrame({'r': r, 'c': c, 'z': z, 'y': y, 'x': x})
df = df[(df.y >= depth_xy) & (df.y < (chunk.shape[3] - depth_xy)) &
(df.x >= depth_xy) & (df.x < (chunk.shape[4] - depth_xy))]
df['y'] = df['y'] + offsets[3] - depth_xy
df['x'] = df['x'] + offsets[4] - depth_xy
return df
The updated code now returns the indices that were set to 1:
index r c z y x
0 0 0 0 4 2 2
1 1 0 1 4 6 2
2 2 0 3 4 2 2
3 1 1 2 4 8 2
For comparison, this is the output of the original version:
index r c z y x
0 1 0 1 4 6 2
1 3 1 2 4 8 2
2 0 0 1 4 6 2
3 1 1 2 4 8 2
It returns lines number 2 and 4, two times each.
The reason why this happens is three mistakes in the extract_features function:
You first add the offset and subtract the depth and then filter out the overlapping parts: the order needs to be swapped
df.y > depth_xy should be replaced with df.y >= depth_xy
df.z should be replaced with df.x, since it is the x dimension that has an overlap
To optimize this even further, here is a generalized version of the code that would work for an arbitrary number of dimension:
def extract_features_generalized(chunk, offsets, depth, columns):
coordinates = np.nonzero(chunk)
df = pd.DataFrame()
rows_to_keep = np.ones(len(coordinates[0]), dtype=int)
for i in range(len(columns)):
df[columns[i]] = coordinates[i]
rows_to_keep = rows_to_keep * np.array((df[columns[i]] >= depth[i])) * \
np.array((df[columns[i]] < (chunk.shape[i] - depth[i])))
df[columns[i]] = df[columns[i]] + offsets[i] - depth[i]
del coordinates
return df[rows_to_keep > 0]
def my_overlap_generalized(data, chunksize, depth, columns):
data = data.rechunk(chunksize)
data_overlapping_chunks = da.overlap.overlap(data, depth=depth,
boundary=tuple(['reflect']*len(columns)))
dfs = []
for block in data_overlapping_chunks.to_delayed().ravel():
offsets = np.array(block.key[1:]) * np.array(data.chunksize)
df_block = dask.delayed(extract_features_generalized)(block, offsets=offsets,
depth=depth, columns=columns)
dfs.append(df_block)
return dd.from_delayed(dfs)
data = np.zeros((2,4,8,16,16))
data[0,0,4,2,2] = 1
data[0,1,4,6,2] = 1
data[1,2,4,8,2] = 1
data[0,3,4,2,2] = 1
arr = da.from_array(data)
df = my_overlap_generalized(arr, chunksize=(-1,-1,-1,8,8),
depth=(0,0,0,2,2), columns=['r', 'c', 'z', 'y', 'x'])
df.compute().reset_index()

Random values with different weights

Here's a question about entity framework that has been bugging me for a bit.
I have a table called prizes that has different prizes. Some with higher and some with lower monetary values. A simple representation of it would be as such:
+----+---------+--------+
| id | name | weight |
+----+---------+--------+
| 1 | Prize 1 | 80 |
| 2 | Prize 2 | 15 |
| 3 | Prize 3 | 5 |
+----+---------+--------+
Weight is this case is the likely hood I would like this item to be randomly selected.
I select one random prize at a time like so:
var prize = db.Prizes.OrderBy(r => Guid.NewGuid()).Take(1).First();
What I would like to do is use the weight to determine the likelihood of a random item being returned, so Prize 1 would return 80% of the time, Prize 2 15% and so on.
I thought that one way of doing that would be by having the prize on the database as many times as the weight. That way having 80 times Prize 1 would have a higher likelihood of being returned when compared to Prize 3, but this is not necessarily exact.
There has to be a better way of doing this, so i was wondering if you could help me out with this.
Thanks in advance
Normally I would not do this in database, but rather use code to solve the problem.
In your case, I would generate a random number within 1 to 100. If the number generated is between 1 to 80 then 1st one wins, if it's between 81 to 95 then 2nd one wins, and if between 96 to 100 the last one win.
Because the random number could be any number from 1 to 100, each number has 1% of chance to be hit, then you can manage the winning chance by giving the range of what the random number falls into.
Hope this helps.
Henry
This can be done by creating bins for the three (generally, n) items and then choose a selected random to be dropped in one of those bins.
There might be a statistical library that could do this for you i.e. proportionately select a bin from n bins.
A solution that does not limit you to three prizes/weights could be implemented like below:
//All Prizes in the Database
var allRows = db.Prizes.ToList();
//All Weight Values
var weights = db.Prizes.Select(p => new { p.Weight });
//Sum of Weights
var weightsSum = weights.AsEnumerable().Sum(w => w.Weight);
//Construct Bins e.g. [0, 80, 95, 100]
//Three Bins are: (0-80],(80-95],(95-100]
int[] bins = new int[weights.Count() + 1];
int start = 0;
bins[start] = 0;
foreach (var weight in weights) {
start++;
bins[start] = bins[start - 1] + weight.Weight;
}
//Generate a random number between 1 and weightsSum (inclusive)
Random rnd = new Random();
int selection = rnd.Next(1, weightsSum + 1);
//Assign random number to the bin
int chosenBin = 0;
for (chosenBin = 0; chosenBin < bins.Length; chosenBin++)
{
if (bins[chosenBin] < selection && selection <= bins[chosenBin + 1])
{
break;
}
}
//Announce the Prize
Console.WriteLine("You have won: " + allRows.ElementAt(chosenBin));

Runtime of while loop pseudocode

I have a pseudocode which I'm trying to make a detailed analysis, analyze runtime, and asymptotic analysis:
sum = 0
i = 1
while (i ≤ n){
sum = sum + i
i = 2i
}
return sum
My assignment requires that I write the cost/runtime for every line, add these together, and find a Big-Oh notation for the runtime. My analysis looks like this for the moment:
sum = 0 1
long i = 1 1
while (i ≤ n){ log n + 1
sum = sum + i n log n
i = 2i n log n
}
return sum 1
=> 2 n log n + log n + 4 O(n log n)
is this correct ? Also: should I use n^2 on the while loop instead ?
Because of integer arithmetic, the runtime is
O(floor(ln(n))+1) = O(ln(n)).
Let's step through your pseudocode. Consider the case that n = 5.
iteration# i ln(i) n
-------------------------
1 1 0 5
2 2 1 5
3 4 2 5
By inspection we see that
iteration# = ln(i)+1
So in summary:
sum = 0 // O(1)
i = 1 // O(1)
while (i ≤ n) { // O(floor(ln(n))+1)
sum = sum + i // 1 flop + 1 mem op = O(1)
i = 2i // 1 flop + 1 mem op = O(1)
}
return sum // 1 mem op = O(1)

Scaling a number between two values

If I am given a floating point number but do not know beforehand what range the number will be in, is it possible to scale that number in some meaningful way to be in another range? I am thinking of checking to see if the number is in the range 0<=x<=1 and if not scale it to that range and then scale it to my final range. This previous post provides some good information, but it assumes the range of the original number is known beforehand.
You can't scale a number in a range if you don't know the range.
Maybe what you're looking for is the modulo operator. Modulo is basically the remainder of division, the operator in most languages is is %.
0 % 5 == 0
1 % 5 == 1
2 % 5 == 2
3 % 5 == 3
4 % 5 == 4
5 % 5 == 0
6 % 5 == 1
7 % 5 == 2
...
Sure it is not possible. You can define range and ignore all extrinsic values. Or, you can collect statistics to find range in run time (i.e. via histogram analysis).
Is it really about image processing? There are lots of related problems in image segmentation field.
You want to scale a single random floating point number to be between 0 and 1, but you don't know the range of the number?
What should 99.001 be scaled to? If the range of the random number was [99, 100], then our scaled-number should be pretty close to 0. If the range of the random number was [0, 100], then our scaled-number should be pretty close to 1.
In the real world, you always have some sort of information about the range (either the range itself, or how wide it is). Without further info, the answer is "No, it can't be done."
I think the best you can do is something like this:
int scale(x) {
if (x < -1) return 1 / x - 2;
if (x > 1) return 2 - 1 / x;
return x;
}
This function is monotonic, and has a range of -2 to 2, but it's not strictly a scaling.
I am assuming that you have the result of some 2-dimensional measurements and want to display them in color or grayscale. For that, I would first want to find the maximum and minimum and then scale between these two values.
static double[][] scale(double[][] in, double outMin, double outMax) {
double inMin = Double.POSITIVE_INFINITY;
double inMax = Double.NEGATIVE_INFINITY;
for (double[] inRow : in) {
for (double d : inRow) {
if (d < inMin)
inMin = d;
if (d > inMax)
inMax = d;
}
}
double inRange = inMax - inMin;
double outRange = outMax - outMin;
double[][] out = new double[in.length][in[0].length];
for (double[] inRow : in) {
double[] outRow = new double[inRow.length];
for (int j = 0; j < inRow.length; j++) {
double normalized = (inRow[j] - inMin) / inRange; // 0 .. 1
outRow[j] = outMin + normalized * outRange;
}
}
return out;
}
This code is untested and just shows the general idea. It further assumes that all your input data is in a "reasonable" range, away from infinity and NaN.

Resources