Convert HDF5 to netCDF and rename dimensions

I have a set of HDF5 files that have the following header:
netcdf control-A-2017-05-12-090000-g1 {
dimensions:
phony_dim_0 = 16 ;
phony_dim_1 = 16 ;
phony_dim_2 = 200 ;
phony_dim_3 = 2 ;
phony_dim_4 = 1 ;
phony_dim_5 = 4 ;
variables:
...
Since it's HDF5, the dimensions are created as phony_dim_x. In this case, phony_dim_0 and phony_dim_1 are the y and x coordinates, respectively. I would like to rename the dimensions appropriately. Since renaming dimensions in HDF5 is not possible (they don't technically exist there), I need to convert to netCDF first. To do so I use ncks in.h5 out.nc.
However, the header info of the converted file is:
netcdf control-A-2017-05-12-090500-g1 {
dimensions:
phony_dim_0 = 16 ;
phony_dim_1 = 200 ;
phony_dim_2 = 2 ;
phony_dim_3 = 1 ;
phony_dim_4 = 4 ;
variables:
...
Here's the important part: the two phony_dim_[0,1] dimensions have been combined into a single dimension, phony_dim_0. I assume this is because they have the same value, so the netCDF conversion assumes they're the same dimension.
A variable that was listed in the HDF5 file as
ACCPA(phony_dim_0, phony_dim_1) ; is now ACCPA(phony_dim_0, phony_dim_0) ;, with two identical dimensions.
Thus, I am not able to rename the dimensions individually. If I do ncrename -d phony_dim_0,y out.nc, I get ACCPA(y, y) ;
Can anyone point me in the right direction to get around this?

The problem ended up being with ncks: converting the file with ncks resulted in repeated dimensions (e.g. ACCPA(phony_dim_0, phony_dim_0) ;).
Using nccopy instead, the converted netCDF file did not have repeated dimensions (ACCPA(phony_dim_0, phony_dim_1) ;), so the dimensions could then be renamed individually with ncrename.
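Putting the fix together, a minimal sketch of the full workflow (file names are placeholders; ncrename takes one -d old,new pair per dimension):
nccopy in.h5 out.nc
ncrename -d phony_dim_0,y -d phony_dim_1,x out.nc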

Related

Modify values programmatically in SPSS

I have a file with more than 250 variables and more than 100 cases. Some of these variables have a misplaced decimal point (20445.12 should be 2.044512).
I want to modify these data programmatically. I found a possible way in the Visual Basic editor provided by SPSS, but I have almost no knowledge of it.
How can I select a range of cells in this language?
How can I store the cell once modified its data?
--- EDIT: NEW DATA ---
Thank you for your fast reply.
The problem now is the number of digits each number has. For example, the erroneous data could have the following formats:
Case A) 43998 (five digits) ---> 4.3998 as the correct value.
Case B) 4399 (four digits) ---> 4.3990 as the correct value, but it would come out as 0.4399 because the trailing 0 was removed when the file was created.
Is there any way, like:
IF (NUM < 10000) THEN NUM = NUM / 1000 ELSE NUM = NUM / 10000
Or something like IF (Number_of_digits(NUM)) THEN ...
Thank you.
There's no need for a VB script; go this way:
Open a syntax window and paste the following code:
do repeat vr=var1 var2 var3 var4.
compute vr=vr/10000.
end repeat.
save outfile="filepath\My corrected data.sav".
exe.
Replace var1 var2 var3 var4 with the names of the actual variables you need to change. For variables that are contiguous in the file you may use var1 to var4.
Replace vr=vr/10000 with whatever mathematical calculation you would like to use to correct the data.
Replace "filepath\My corrected data.sav" with your path and file name.
WARNING: this syntax will change the data in your file. You should make sure to create a backup of your original in addition to saving the corrected data to a new file.
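To handle the digit-dependent case from the edit, a condition can go inside the same do repeat block. A sketch, assuming (as described in the edit) that four-digit values lost a trailing zero and need dividing by 1000 instead of 10000 (var1 var2 var3 var4 are placeholders again):
do repeat vr=var1 var2 var3 var4.
if (vr < 10000) vr = vr/1000.
if (vr >= 10000) vr = vr/10000.
end repeat.
exe.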

List still being treated as a set even after converting

So I have a case where, even after converting my sets to lists, they aren't recognized as lists.
The idea is to delete extra columns from one data frame by comparing with the columns of another. I have two data frames, say df_test and df_train. I need to remove the columns in df_test which are not in df_train.
extracols = set(df_test.columns) - set(df_train.columns)  # gives cols to be deleted
l = [extracols]  # or list(extracols)
Xdp.dropna(subset=l, how='any', axis=0)
I get an error: TypeError: unhashable type: 'set'.
Even on printing l, it prints like a set, with {} curly braces.
[extracols] doesn't cast the set to a list; it just creates a list of length 1 with your set inside it.
Are you sure that list(extracols) isn't working for you? Maybe you should post more of your code, as it is hard to see where this is going wrong for you.
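A minimal runnable sketch of the difference, plus the column-dropping step the question is actually after (toy data; df.drop(columns=...) needs pandas >= 0.21):
import pandas as pd

df_train = pd.DataFrame({"a": [1], "b": [2]})
df_test = pd.DataFrame({"a": [3], "b": [4], "c": [5]})

extracols = set(df_test.columns) - set(df_train.columns)
print([extracols])      # [{'c'}] -- a one-element list holding the set
print(list(extracols))  # ['c']   -- the column names as a list

# dropna removes rows/columns with missing values; to delete columns, use drop
df_test = df_test.drop(columns=list(extracols))
print(df_test.columns.tolist())  # ['a', 'b']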

Find a lat/long point in an HDF5 file

I have an HDF5 file with global coverage of temperature. The file was converted from netCDF. The conversion process set longitude from 0 to 360 and additionally flipped the map upside down, so north is now south. I have used HDFView and I can display the file, but there is no way to interact with the map to locate a specific lat/long combination. The file doesn't display properly in ArcMap even after setting the correct projection.
Is there any way I can display the data and click on a location to extract lat/long, or draw a point at a specific lat/long?
Short answer: No, that's not possible.
Long answer: Unlike NetCDF, HDF5 is a general purpose file format. It allows you to store n-dimensional numerical arrays (called datasets), grouped into folders (hence the name "hierarchical"). Nothing more. There is no semantics. To HDF5, your data is not a "map", it's just an array. Therefore, HDFView does not "know" about latitudes and longitudes. That information was lost in the NetCDF => HDF5 conversion process. Actually, the lat/lon arrays are probably still in the file but they no longer have any inherent meaning. NetCDF, on the other hand, imposes a common data model including coordinate systems. That's why the various visualization tools let you interact with your data in a more sophisticated way.
What tool did you use to convert your NetCDF-file to HDF5?
You can use HDF5 to store meteorological data (I do that, it works well). But then you have to write your own tools for georeferencing and visualization. Check out the h5py project if you're into Python.
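For instance, if the lat/lon arrays did survive the conversion, a minimal h5py sketch of the "find the nearest grid cell" part might look like this (the file and dataset names here are hypothetical; inspect your file with HDFView or h5dump first):
import numpy as np
import h5py

# hypothetical file and dataset names -- check yours with h5dump
with h5py.File("temperature.h5", "r") as f:
    lat = f["lat"][:]
    lon = f["lon"][:]            # 0..360 in this file
    temp = f["temperature"][:]

target_lat, target_lon = 48.1, -16.5
lon360 = target_lon % 360        # map -180..180 onto the file's 0..360
i = np.abs(lat - target_lat).argmin()   # nearest row
j = np.abs(lon - lon360).argmin()       # nearest column
print(temp[i, j])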
As @heron13 has said, HDF5 is a file format.
What version of NetCDF was your file? Version 4 uses HDF5 as its storage layer.
Does your NetCDF file follow the CF conventions or the COARDS conventions? If so, I would look at the program you used to convert it to HDF5, as HDF5 files can carry the same conventions (as the h5dump output below shows).
Once you confirm that the conventions are in the HDF5 file, ArcMap is meant to support them too (sorry, I do not have access to ArcMap to confirm).
Here's a look at a NetCDF file with the CF conventions:
$ ncdump tos_O1_2001-2002.nc | less
netcdf tos_O1_2001-2002 {
dimensions:
lon = 180 ;
lat = 170 ;
time = UNLIMITED ; // (24 currently)
bnds = 2 ;
variables:
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
lon:bounds = "lon_bnds" ;
lon:original_units = "degrees_east" ;
...
And here is a view of the same file, this time using h5dump:
$ h5dump tos_O1_2001-2002.nc | less
HDF5 "tos_O1_2001-2002.nc" {
GROUP "/" {
ATTRIBUTE "Conventions" {
DATATYPE H5T_STRING {
STRSIZE 6;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "CF-1.0"
}
}
...
One other question: is there any reason why you are not using the NetCDF file directly in ArcMap?

Randomly selecting images from a folder

I have a folder that contains 400 images. What I want is to separate these into two sets: train_images and test_images.
train_images should contain 150 images selected randomly, and all these images must be different from each other. Then test_images should also contain 150 images selected randomly, different from each other and from the images selected for train_images.
I began by writing code that selects a random number of images from a Faces folder and puts them in the train_images folder. I need your help to get the behavior described above.
clear all;
close all;
clc;
Train_images='train_faces';
mkdir(Train_images);
ImageFiles = dir(fullfile('Faces', '*.jpg')); % list only image files, so '.' and '..' are excluded
totalNumberOfImages = length(ImageFiles);
scrambledList = randperm(totalNumberOfImages);
numberIWantToUse = 150;
loop_counter = 1;
for index = scrambledList(1:numberIWantToUse)
baseFileName = ImageFiles(index).name;
str = fullfile('Faces', baseFileName); % better than STRCAT; note the folder name's capitalization
face = imread(str);
imwrite( face, fullfile(Train_images, ['hello' num2str(index) '.jpg']));
loop_counter = loop_counter + 1;
end
Any help will be very appreciated.
Your code looks good to me. For the test set, note that re-running scrambledList = randperm(totalNumberOfImages); and taking the first 150 elements could re-select some training images, since the new permutation is independent of the first one. Instead, reuse the same scrambledList and take the next 150 elements by re-initializing the loop:
for index = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse)
... % same thing you wrote in your training loop
end
with this approach, your test sample will be completely different from the training sample.
Supposing that you have the Bioinformatics Toolbox, you can use crossvalind with the HoldOut parameter.
This is an example. train and test are logical arrays, so you can use find to get the actual indexes:
ImageFiles = dir('Faces');
ImageFilesIndexes = ones(1, length(ImageFiles)); % use a numeric array instead of the struct array
proportion = 150/400; %Testing set
[train,test] = crossvalind('holdout',ImageFilesIndexes,proportion );
training_files = ImageFiles(train); %250 files: It is better to use more data to train
testing_files = ImageFiles(test); %150 files
%Then do whatever you like with the files
Other possibilities are dividerand (Neural Network Toolbox) and cvpartition (Statistics Toolbox).
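As a follow-up, a short sketch of how the logical arrays might be used to copy the selected files into the two folders (folder names are taken from the question; test_faces is a hypothetical counterpart to train_faces):
ImageFiles = dir(fullfile('Faces', '*.jpg'));
[train, test] = crossvalind('holdout', ones(1, length(ImageFiles)), 150/400);
training_files = ImageFiles(train);
mkdir('train_faces');
for k = 1:length(training_files)
    copyfile(fullfile('Faces', training_files(k).name), 'train_faces');
end
% repeat with ImageFiles(test) and a test_faces folder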

Size of the array that Fortran can handle

I have 30000 files to process; each file has 80000 lines x 5 columns. I need to read all the files and process them, finding the average of each line. I have written code (in Fortran) to read and extract all the data from the files. There is an array of (30000 x 80000), but my program could not go beyond (3300 x 80000). I need to add the 4th column of each file in steps of 300 files: the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase the size of array that Fortran can handle? What about the number of files? My code looks like this (this version, with nf = 3300, runs well):
implicit double precision (a-h,o-z),integer(i-n)
dimension x(78805,5),y(78805,5),den(78805,5)
dimension b(3300,78805),bb(78805)
character*70 fn
nf = 3300 ! NUMBER OF FILES
nj = 78804 ! Number of rows in file.
ns = 300 ! No. of steps for files.
ncores = 11 ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
!Initialization
do i = 1,nf ! start at 1: b's first index runs from 1
do j = 1, nj
x(j,1) = 0.0
y(j,2) = 0.0
den(j,4) = 0.0
c a(i,j) = 0.0
b(i,j) = 0.0
c aa(j) = 0.0
bb(j) = 0.0
end do
end do
c-------!Body program-----------------------------------------------
iout = 6 ! Output Files upto "ns" no.
DO i= 1,nf ! LOOP FOR THE NUMBER OF FILES
write(fn,10)i
open(1,file=fn)
do j=1,nj ! Loop for the no of rows in the domain
read(1,*)x(j,1),y(j,2),den(j,4)
if(i.le.ns) then
c a(i,j) = prob(j,3)
b(i,j) = den(j,4)
else
c a(i,j) = prob(j,3) + a(i-ns,j)
b(i,j) = den(j,4) + b(i-ns,j)
end if
end do
close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
if(i.ge.(nf-ns)) then
do j = 1, nj
c aa(j) = a(i,j)/(ncores*1.0)
bb(j) = b(i,j)/(ncores*1.0)
write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
end do
close(iout)
iout = iout + 1
end if
END DO
10 format(i0,'.txt')
END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32 bit compiler producing 32 bit executables and you are simply running out of address space.
Although your operating system supports 64 bit address space, your 32 bit process is still limited to 32 bit addresses.
You have found a limit at 3300*78805*8, which is just under 2 GB, and this supports my theory.
Whatever the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not studied your algorithm closely, but on first inspection it seems likely that you could rearrange it to avoid having everything in memory at once.
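A minimal sketch of that rearrangement (free-form Fortran, sizes from the question; untested against the real data): since file i only ever needs the running sum from file i-ns, a ring buffer of ns rows can replace the full nf x nj array, cutting memory from roughly 2 GB to about 190 MB:
program running_sums
  implicit none
  integer, parameter :: nf = 3300, nj = 78804, ns = 300
  double precision, allocatable :: b(:,:)   ! ns x nj ring buffer
  double precision :: x, y, den
  character(len=70) :: fn
  integer :: i, j, k

  allocate(b(ns, nj))
  b = 0.0d0
  do i = 1, nf
    write(fn, '(i0,a)') i, '.txt'
    open(1, file=fn)
    k = mod(i-1, ns) + 1            ! slot currently holding the sum from file i-ns
    do j = 1, nj
      read(1,*) x, y, den
      b(k,j) = b(k,j) + den         ! accumulate onto that earlier sum
    end do
    close(1)
    ! when i >= nf-ns, write out b(k,:) here, as in the original program
  end do
end program running_sums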
