Quoting from the HDF5 hyperslab documentation:
The block array determines the size of the element block selected from
the dataspace.
The example there uses a 2-dimensional dataset with the following parameters:
start offset is [1,1], stride is [4,4], count is [3,7], and block is [2,2]
This results in 21 2x2 blocks, with the selections at (1,1), (5,1), (9,1), (1,5), (5,5), and so on. I can understand that the selection starts at (1,1) because that is the start offset, and that since the stride is (4,4) it moves 4 elements in each dimension. Since the count is (3,7), it steps 3 times by 4 in the X direction and 7 times by 4 in the Y direction, i.e. each in its corresponding dimension.
But what I don't understand is what the block size is doing. Does it mean that I will get 21 blocks of 2x2 elements? That means each block contains 4 elements, but the count is already set to 3 in one dimension, so how is that possible?
A hyperslab selection created through H5Sselect_hyperslab() lets you create a region defined by a repeating block of elements.
This is described in section 7.4.2.2 of the HDF5 users guide found here (scroll down a bit to 7.4.2.2). The H5Sselect_hyperslab() reference manual entry might also be helpful.
Here is a diagram from the UG:
And here are the values used in that figure:
offset = (0,1)
stride = (4,3)
count = (2,4)
block = (3,2)
Notice how the repeating unit is a 3x2 element block. So yes, you will get 21 2x2 blocks in your case. There will be a grid of three blocks in one dimension and seven in the other, each spaced 4 elements apart in each direction. The first block will be offset by 1,1.
The most confusing thing about this API call is that three of the parameters have elements as their units, while count has blocks as its unit.
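To make the roles of the four parameters concrete, here is a minimal pure-Python sketch that enumerates what the question's parameters select. It is not HDF5 code: the function name hyperslab_blocks is made up for illustration, and it only mimics the arithmetic (count in units of blocks; start, stride and block in units of elements).

```python
from itertools import product

def hyperslab_blocks(start, stride, count, block):
    """Return one list of element coordinates per selected block."""
    blocks = []
    # count is in units of blocks: one block origin per index combination
    for idx in product(*(range(c) for c in count)):
        origin = tuple(s + i * st for s, i, st in zip(start, idx, stride))
        # block is in units of elements: the extent of each block
        elems = [tuple(o + d for o, d in zip(origin, off))
                 for off in product(*(range(b) for b in block))]
        blocks.append(elems)
    return blocks

blocks = hyperslab_blocks(start=(1, 1), stride=(4, 4), count=(3, 7), block=(2, 2))
print(len(blocks))                  # 21 blocks (3 x 7)
print(sum(len(b) for b in blocks))  # 84 elements (21 blocks x 4 elements)
print(blocks[0])                    # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

So count fixes how many blocks there are along each dimension, and block fixes how many elements each of those blocks covers.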
Edit: Perhaps this will make how block and count are used more obvious...
The HDFS default block size is 64 MB, which can be increased as required. One mapper processes one block at a time.
I have 60 signal sequence samples, each of length 200, labeled by 6 label groups; each label takes one of 10 values. I'd like to get a prediction for each label in each label group when feeding a 200-length (or even shorter) sample to the network.
I tried to build my own network based on the https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/seqclassification/UCISequenceClassificationExample.java example, which, however, pads the labels. I use no padding for the labels and I get an exception like this:
Exception in thread "main" java.lang.IllegalStateException: Sequence lengths do not match for RnnOutputLayer input and labels:Arrays should be rank 3 with shape [minibatch, size, sequenceLength] - mismatch on dimension 2 (sequence length) - input=[1, 200, 12000] vs. label=[1, 1, 10]
In fact, the labels are required to have a time dimension of the same length as the features, i.e. 200. So I have to use some technique such as zero padding in all 6 label channels. On the other hand, the input was also wrong: I put all 60*200 values there, whereas it should be [1, 200, 60], and the 6 labels are [1, 200, 10] each.
The open question is where in the 200-length label I should place the real label value: at [0], at [199], or at the parts of the signal the label is typically associated with? The training runs that should check this are still in progress. Which kind of padding is better, zero padding or padding with the label value? It is still not clear, and I can't find a paper explaining which is best.
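The two padding options can be written down as plain data. This is only a sketch in pure Python, not DL4J code. The helper names are hypothetical, and the [class][time] layout is an assumption taken from the [size, sequenceLength] part of the shape in the exception message. It does not say which option trains better.

```python
NUM_CLASSES = 10   # values per label group (from the question)
SEQ_LEN = 200      # time steps per sample (from the question)

def one_hot(value, num_classes=NUM_CLASSES):
    return [1.0 if k == value else 0.0 for k in range(num_classes)]

def label_repeated(value, seq_len=SEQ_LEN):
    """Label-value padding: the same one-hot label at every time step."""
    col = one_hot(value)
    # layout [class][time]
    return [[col[k]] * seq_len for k in range(NUM_CLASSES)]

def label_at_step(value, step, seq_len=SEQ_LEN):
    """Zero padding: the one-hot label at a single time step, zeros elsewhere."""
    labels = [[0.0] * seq_len for _ in range(NUM_CLASSES)]
    labels[value][step] = 1.0
    return labels

repeated = label_repeated(3)              # label 3 at all 200 steps
at_start = label_at_step(3, 0)            # label 3 only at step [0]
at_end = label_at_step(3, SEQ_LEN - 1)    # label 3 only at step [199]
```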
According to the JPEG 2000 specification, the number of tiles in the X and Y directions is calculated by the following formulas:
numXtiles = ceil((Xsiz − XTOsiz) / XTsiz)
and
numYtiles = ceil((Ysiz − YTOsiz) / YTsiz)
But the range of numXtiles and numYtiles is not mentioned.
Can we have numXtiles = 0 while numYtiles = 250 (or any other value)?
In short, no. You will always need at least one row and one column of tiles to place your image in the canvas.
In particular, the SIZ marker of the JPEG 2000 stream syntax does not directly define the number of tiles, but rather the size of each tile. Since the tile width and height are defined to be larger than 0 (see page 453 of "JPEG 2000 Image compression fundamentals, standards and practice", by David Taubman and Michael Marcellin), you will always have at least one tile.
That said, depending on the particular implementation that you are using, there may be a parameter numXtiles that you can set to 0 without crashing your program. In that case, the parameter is most likely being ignored or interpreted differently.
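The "at least one tile" argument can be checked numerically. The sketch below assumes that the standard rounds the quotient up (ceiling), that the tile size is at least 1, and that the image extends past the tile grid origin (Xsiz > XTOsiz). The function name num_tiles is made up for illustration.

```python
def num_tiles(siz, tosiz, tsiz):
    """Number of tiles along one axis: ceil((siz - tosiz) / tsiz)."""
    if tsiz < 1:
        raise ValueError("tile size must be at least 1")
    if siz <= tosiz:
        raise ValueError("image must extend past the tile grid origin")
    # integer ceiling division, avoids floating point
    return -(-(siz - tosiz) // tsiz)

print(num_tiles(1000, 0, 256))   # 4
print(num_tiles(1, 0, 4096))     # 1, even when the tile is far larger than the image
```

Under those assumptions the numerator is at least 1 and the denominator is positive, so the ceiling can never be 0.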
I am processing images that are long, usually a few hundred thousand pixels in length. The height is usually in the 500-1000 pixel range. The process involves modifying the images on a column-by-column basis. For example, I have a column of constant values that needs to be subtracted from each column in the image.
Currently I split the image into smaller blocks and put them into linearized 2D arrays. Then I make a linearized 2D array from the column of constant values that is the same size as the smaller block. Then an (image array - constant array) operation is done until the full image is processed.
Should I copy the constant column to the device and then just operate column by column? Or should I try to make as large a "constant array" as possible and then perform the subtraction? I am not looking for 100% optimization or even close to that, just an idea of what the right approach is.
How can I optimize this process? Any resources to learn more about this type of processing would be appreciated.
Constant memory is up to 64KB, so assuming your pixels are 4 bytes or less, then you should be able to handle an image height up to about 16K pixels, and still put the entire "constant column" in constant memory.
After that, you don't need to process things "column by column". Constant memory is optimized for access when every thread is requesting the same value from constant memory, which perfectly describes your case.
Therefore, your thread code can be trivially simple:
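    #define MAX_COL_SIZE 1024  // max image height; the array must fit in 64 KB of constant memory
    __constant__ float const_column[MAX_COL_SIZE];

    // One thread per image column; each thread walks down its column.
    __global__ void img_col_kernel(float *in, float *out, int num_cols, int col_size){
      int idx = threadIdx.x + blockDim.x*blockIdx.x;
      if (idx < num_cols)
        for (int i = 0; i < col_size; i++)
          // row-major layout: element (row i, column idx)
          out[idx+i*num_cols] = in[idx+i*num_cols] - const_column[i];
    }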
(coded in browser, not tested)
Set up const_column in your host code using cudaMemcpyToSymbol prior to calling img_col_kernel. Call the kernel with a 1D grid including a total number of threads equal to or greater than your image width (num_cols). Pass the "linearized 2D" pointers to your input and output images to the kernel (in and out). The above kernel should run pretty fast, and essentially be bound by memory bandwidth for images of width 1000 or more. For small images, you may want to increase the number of threads by dividing your image vertically into say, 4 pieces, and operate with 4 times as many threads (and 4 regions of constant memory).
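Since the kernel above is untested, a CPU reference is handy for checking its output on a small image. This is a plain-Python sketch that mirrors the kernel's indexing (row-major, one outer iteration per kernel thread). The function name is made up, and it is meant for validation only, not for speed.

```python
def subtract_column_reference(img, const_column, num_cols, col_size):
    """CPU reference: subtract const_column[i] from every element in row i."""
    out = [0.0] * (num_cols * col_size)
    for idx in range(num_cols):        # one kernel thread per column
        for i in range(col_size):      # the loop inside the kernel
            out[idx + i * num_cols] = img[idx + i * num_cols] - const_column[i]
    return out

# 2 rows x 3 columns, stored row-major ("linearized 2D")
img = [10.0, 11.0, 12.0,
       20.0, 21.0, 22.0]
result = subtract_column_reference(img, [1.0, 2.0], num_cols=3, col_size=2)
print(result)   # [9.0, 10.0, 11.0, 18.0, 19.0, 20.0]
```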
Has anyone been able to do spatial operations with Apache Spark, e.g. the intersection of two sets that contain line segments?
I would like to intersect two sets of lines.
Here is a 1-dimensional example:
The two sets are:
A = {(1,4), (5,9), (10,17),(18,20)}
B = {(2,5), (6,9), (10,15),(16,20)}
The result intersection would be:
intersection(A,B) = {(1,1), (2,4), (5,5), (6,9), (10,15), (16,17), (18,20)}
A few more details:
- sets have ~3 million items
- the lines in a set cover the entire range
Thanks.
One approach to parallelize this would be to create a grid of some size, and group line segments by the grids they belong to.
So for a grid with cell size n, you could flatMap pairs of coordinates (segments of line segments) to create (gridId, ( (x,y), (x,y) )) key-value pairs.
The segment (1,3), (5,9) would be mapped to ( (1,1), ((1,3),(5,9)) ) for a grid size of 10: that line segment only exists in grid "slot" 1,1 (the grid from 0-10,0-10). If you chose a smaller grid size, the line segment would be flatMapped to multiple key-value pairs, one for each grid slot it belongs to.
Having done that, you can groupByKey, and for each group, calculate intersections as normal.
It wouldn't exactly be the most efficient way of doing things, especially if you've got long line segments spanning multiple grid "slots", but it's a simple way of splitting the problem into subproblems that'll fit in memory.
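A minimal sketch of the grid idea, simplified to the 1-dimensional example from the question and written in plain Python rather than Spark (the flatMap and groupByKey steps are emulated with a generator and a dict). It assumes closed intervals with non-negative integer endpoints; the function names are hypothetical.

```python
from collections import defaultdict

def to_cells(interval, n):
    """flatMap step: emit (cell_id, clipped_interval) for each cell the interval touches."""
    s, e = interval
    for c in range(s // n, e // n + 1):
        yield c, (max(s, c * n), min(e, c * n + n - 1))

def grid_intersections(a, b, n):
    groups = defaultdict(lambda: ([], []))   # groupByKey step: cell -> (pieces of A, pieces of B)
    for side, intervals in enumerate((a, b)):
        for iv in intervals:
            for c, piece in to_cells(iv, n):
                groups[c][side].append(piece)
    result = []
    for pieces_a, pieces_b in groups.values():   # each cell is an independent subproblem
        for s1, e1 in pieces_a:
            for s2, e2 in pieces_b:
                lo, hi = max(s1, s2), min(e1, e2)
                if lo <= hi:
                    result.append((lo, hi))
    return sorted(result)

A = [(1, 4), (5, 9), (10, 17), (18, 20)]
B = [(2, 5), (6, 9), (10, 15), (16, 20)]
print(grid_intersections(A, B, n=10))
```

Because every interval is clipped to its cell, an overlap that crosses a cell boundary comes out in pieces: with n=10 the overlap (18,20) is reported as (18,19) and (20,20). A real implementation would need a final merge step for that.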
You could solve this with a full cartesian join of the two RDDs, but this would become incredibly slow at large scale. If your problem is smallish, sure, this is an easy and cheap approach. Just emit the overlap, if any, between every pair in the join.
To do better, I imagine that you can solve this by sorting the sets by start point, and then walking through both at the same time, matching one's current interval versus another and emitting overlaps. Details left to the reader.
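One way to fill in those details, on a single machine and without Spark, is the usual two-pointer walk. The sketch assumes closed intervals and that each list is sorted by start point with no overlaps inside a list; the function name is made up.

```python
def intersect_sorted(a, b):
    """Walk two sorted interval lists together and emit every overlap."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:                 # closed intervals: touching endpoints count
            out.append((lo, hi))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

A = [(1, 4), (5, 9), (10, 17), (18, 20)]
B = [(2, 5), (6, 9), (10, 15), (16, 20)]
print(intersect_sorted(A, B))
# [(2, 4), (5, 5), (6, 9), (10, 15), (16, 17), (18, 20)]
```

Note that this yields only true overlaps, so the (1,1) piece listed in the question's expected result does not appear: it lies in A but in no interval of B.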
You can almost solve this by first mapping each tuple (x,y) in A to something like ((x,y),'A') or something, and the same for B, and then taking the union and sortBy the x values. Then you can mapPartitions to encounter a stream of labeled segments and implement your algorithm.
This doesn't quite work though since you would miss overlaps between values at the ends of partitions. I can't think of a good simple way to take care of that off the top of my head.
I use zeros to initialize my matrix like this:
height = 352
width = 288
nFrames = 120
imgYuv=zeros([height,width,3,nFrames]);
However, when I set the value of nFrames larger than 120, MATLAB gives me an error message saying out of memory.
The original function is
[imgYuv, S, A]= changeYuv(fileName, width, height, idxFrame, nFrames)
my command is
[imgYuv,S,A]=changeYuv('tilt.yuv',352,288,1:120,120);
Can anyone please tell me what's going on here?
PS: one of the purposes of the function is to load a YUV video which consists of more than 2000 frames. Is there any way to implement that?
There are three ways to avoid the error:
- Process a limited number of frames at any given time.
- Work with integer arrays. Most movies are in 8-bit format, while Matlab normally works with doubles. uint8 takes 1 byte per element, while double takes 8 bytes. Thus, if you create your array as B = zeros(height,width,3,nFrames,'uint8'), it only uses 1/8th of the memory. This might work for 120 frames, though for 2000 frames, you'll run into trouble again. Note that not all Matlab functions work for integer arrays; you may have to reimplement those that require double.
- Buy more RAM.
Yes, you (or rather, your Matlab session) are running out of memory.
Get out your calculator and find the product height x width x 3 x nFrames x 8 which will tell you how much memory you have tried to get in your call to zeros. That will be a number either close to or in excess of the RAM available to Matlab on your computer.
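In place of the calculator, the same product can be evaluated with a few lines of Python. The numbers are the ones from the question's call to zeros; the function name is made up, and 8 and 1 are the per-element sizes of double and uint8.

```python
def array_bytes(height, width, channels, n_frames, bytes_per_element):
    """Memory needed for a dense height x width x channels x n_frames array."""
    return height * width * channels * n_frames * bytes_per_element

for n_frames in (120, 2000):
    for name, size in (("double", 8), ("uint8", 1)):
        mib = array_bytes(352, 288, 3, n_frames, size) / 2**20
        print(f"{n_frames} frames as {name}: {mib:.0f} MiB")
```

For 120 frames of doubles this gives 291,962,880 bytes, which must be available as one contiguous block.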
Your command is:
[imgYuv,S,A]=changeYuv('tilt.yuv',352,288,1:120,120);
That is:
352*288*120*120 = 1459814400
That is about 1.46 * 10^9 elements. If each element takes 4 bytes, then you need about 6 GB. That is a lot of memory...
Referencing the code I've seen in your withdrawn post, you're calculating the difference between adjacent frame histograms. One option to avoid massive memory allocation might be to hold just two frames in memory at a time, instead of reading all the frames at once.
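A rough sketch of that streaming idea, in plain Python rather than Matlab. The frames can come from any iterator, e.g. a reader that yields one frame at a time, so only the previous histogram and the current frame are alive at once. The function names and the difference measure (sum of absolute bin differences) are assumptions, since the original code is not shown.

```python
def histogram(frame, bins=256):
    """Count how often each 8-bit value occurs in a flat list of pixels."""
    counts = [0] * bins
    for v in frame:
        counts[v] += 1
    return counts

def adjacent_hist_diffs(frames):
    """Histogram difference between each frame and the one before it."""
    prev = None
    diffs = []
    for frame in frames:               # frames may be a lazy iterator
        cur = histogram(frame)
        if prev is not None:
            diffs.append(sum(abs(a - b) for a, b in zip(prev, cur)))
        prev = cur                     # the older frame can now be freed
    return diffs

tiny_frames = iter([[0, 0, 1], [0, 1, 1], [0, 1, 1]])
print(adjacent_hist_diffs(tiny_frames))   # [2, 0]
```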
The function B = zeros([d1 d2 d3...]) creates a multidimensional array of size d1-by-d2-by-d3-by-..., i.e. with d1*d2*d3*... elements.
With a 3rd dimension of 3 and a 4th dimension of 120, the array has width*height*360 elements, which, depending on width and height, may be very large. Every machine has memory limits; maybe you reached them... ;)