[Errno -101] NetCDF: HDF error when opening netCDF file - hdf5

I have this error when opening my netcdf file.
The code was working before.
How do I fix this?
Traceback (most recent call last):
File "", line 1, in
...
File "file.py", line 71, in gather_vgt
return xr.open_dataset(filename)
File "/.../lib/python3.6/site-packages/xarray/backends/api.py", line
286, in open_dataset
autoclose=autoclose)
File "/.../lib/python3.6/site-packages/xarray/backends/netCDF4_.py",
line 275, in open
ds = opener()
File "/.../lib/python3.6/site-packages/xarray/backends/netCDF4_.py",
line 199, in _open_netcdf4_group
ds = nc4.Dataset(filename, mode=mode, **kwargs)
File "netCDF4/_netCDF4.pyx", line 2015, in
netCDF4._netCDF4.Dataset.init
File "netCDF4/_netCDF4.pyx", line 1636, in
netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'file.nc'
When I try to open the same netCDF file with h5py, I get this error:
OSError: Unable to open file (file locking disabled on this file
system (use HDF5_USE_FILE_LOCKING environment variable to override),
errno = 38, error message = '...')

You must be in this situation:
- your HDF5 library has been updated to 1.10.1 (netCDF uses HDF5 under the hood), and
- your file system does not support the file locking that the HDF5 library uses.
In order to read your HDF5 or netCDF files, you need to set this environment variable:
HDF5_USE_FILE_LOCKING=FALSE
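For example (a minimal sketch; the variable just has to be set before the HDF5 library is first used, i.e. before any file is opened):

import os

# Must be set before netCDF4/h5py/xarray first touch the HDF5 library, so put it
# at the very top of the script, or export it in the shell that launches Python:
#   export HDF5_USE_FILE_LOCKING=FALSE
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import xarray as xr

ds = xr.open_dataset("file.nc")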
For reference, this was introduced in HDF5 version 1.10.1:
Added a mechanism for disabling the SWMR file locking scheme.
The file locking calls used in HDF5 1.10.0 (including patch1)
will fail when the underlying file system does not support file
locking or where locks have been disabled. To disable all file
locking operations, an environment variable named
HDF5_USE_FILE_LOCKING can be set to the five-character string
'FALSE'. This does not fundamentally change HDF5 library
operation (aside from initial file open/create, SWMR is lock-free),
but users will have to be more careful about opening files
to avoid problematic access patterns (i.e.: multiple writers) that the file locking was designed to prevent.
Additionally, the error message that is emitted when file lock
operations set errno to ENOSYS (typical when file locking has been
disabled) has been updated to describe the problem and potential
resolution better.
(DER, 2016/10/26, HDFFV-9918)

In my case, the solution suggested by Florian did not work. I found another solution, which suggested that the order in which h5py and netCDF4 are imported matters (see here).
And, indeed, the following works for me:
from netCDF4 import Dataset
import h5py
Switching the order results in the error described by the OP.

Related

How to save logs from a C++ binary in Beam Python?

I have a C++ binary that uses glog. I run that binary within Beam Python on Cloud Dataflow. I want to save the C++ binary's stdout, stderr, and any log file for later inspection. What's the best way to do that?
This guide gives an example for Beam Java. I tried to do something like that:
def sample(target, output_dir):
    import os
    import subprocess
    import tensorflow as tf

    log_path = target + ".log"
    with tf.io.gfile.GFile(log_path, mode="w") as log_file:
        subprocess.run(["/app/.../sample.runfiles/.../sample",
                        "--target", target,
                        "--logtostderr"],
                       stdout=log_file,
                       stderr=subprocess.STDOUT)
I got the following error.
...
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/home/swang/.cache/bazel/_bazel_swang/09eb83215bfa3a8425e4385b45dbf00d/execroot/__main__/bazel-out/k8-opt/bin/garage/sample_launch.runfiles/pip_parsed_deps_apache_beam/site-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)]
File "/home/swang/.cache/bazel/_bazel_swang/09eb83215bfa3a8425e4385b45dbf00d/execroot/__main__/bazel-out/k8-opt/bin/garage/sample_launch.runfiles/__main__/garage/sample_launch.py", line 17, in sample
File "/usr/local/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/lib/python3.8/subprocess.py", line 808, in __init__
errread, errwrite) = self._get_handles(stdin, stdout, stderr)
File "/usr/local/lib/python3.8/subprocess.py", line 1489, in _get_handles
c2pwrite = stdout.fileno()
AttributeError: 'GFile' object has no attribute 'fileno' [while running 'Map(functools.partial(<function sample at 0x7f45e8aa5a60>, output_dir='gs://swang/sample/20220815_test'))-ptransform-28']
google.cloud.storage API also does not seem to expose fileno().
import google.cloud.storage
google.cloud.storage.blob.Blob("test", google.cloud.storage.bucket.Bucket(google.cloud.storage.client.Client(), "swang"))
<Blob: swang, test, None>
blob = google.cloud.storage.blob.Blob("test", google.cloud.storage.bucket.Bucket(google.cloud.storage.client.Client(), "swang"))
reader = google.cloud.storage.fileio.BlobReader(blob)
reader.fileno()
Traceback (most recent call last):
File "/usr/lib/python3.8/code.py", line 90, in runcode
exec(code, self.locals)
I also considered writing the logs in the C++ binary rather than passing them to Python. As glog is implemented on top of C-style FILE streams rather than iostream, I would have to redirect stdout etc. to GCS at the FILE level like this, rather than redirecting cout to GCS at the iostream level like this. But the GCS C++ API is only implemented on top of iostream, so this approach does not work. Using dup2 like this is another approach, but it seems too complicated and expensive.
You can use the FileSystems module of Beam to open a writable channel (a file handle where you have write permissions) on any of the filesystems supported by Beam. If you are running in Dataflow, this will automatically use the credentials of the Dataflow job to access Google Cloud Storage: https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html?apache_beam.io.filesystems.FileSystems.create
If you are writing to GCS, you need to make sure that you don't overwrite an object, as that would produce an error.
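A minimal sketch of that approach, reusing the binary path and output directory from the question (names are illustrative): capture the child process output through a pipe and write the bytes to the channel returned by FileSystems.create, so no fileno() is needed on the GCS object.

import subprocess

from apache_beam.io.filesystems import FileSystems


def sample(target, output_dir):
    # Collect the C++ binary's stdout and stderr in memory via a pipe.
    result = subprocess.run(
        ["/app/.../sample.runfiles/.../sample", "--target", target, "--logtostderr"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    )
    # FileSystems.create opens a writable channel on GCS (or any supported filesystem).
    log_path = FileSystems.join(output_dir, target + ".log")
    channel = FileSystems.create(log_path)
    channel.write(result.stdout)
    channel.close()
    return target

This trades streaming for buffering the whole log in memory, which is usually acceptable for per-element logs.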

client.upload_file() for nested modules

I have a project structured as follows:
- topmodule/
  - childmodule1/
    - my_func1.py
  - childmodule2/
    - my_func2.py
  - common.py
  - __init__.py
From my Jupyter notebook on an edge-node of a Dask cluster, I am doing the following
from topmodule.childmodule1.my_func1 import MyFuncClass1
from topmodule.childmodule2.my_func2 import MyFuncClass2
Then I am creating a distributed client and sending work as follows:
client = Client(YarnCluster())
client.submit(MyFuncClass1.execute)
This errors out, because the workers do not have the files of topmodule.
"/mnt1/yarn/usercache/hadoop/appcache/application_1572459480364_0007/container_1572459480364_0007_01_000003/environment/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 59, in loads return pickle.loads(x) ModuleNotFoundError: No module named 'topmodule'
So what I tried to do is upload every single file under "topmodule". The files directly under "topmodule" seem to get uploaded, but the nested ones do not. Below is what I am talking about:
Code:
from pathlib import Path

for filename in Path('topmodule').rglob('*.py'):
    print(filename)
    client.upload_file(filename)
Console output:
topmodule/common.py # processes fine
topmodule/__init__.py # processes fine
topmodule/childmodule1/my_func1.py # throws error
Traceback:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-13-dbf487d43120> in <module>
3 for filename in Path('nodes').rglob('*.py'):
4 print(filename)
----> 5 client.upload_file(filename)
~/miniconda/lib/python3.7/site-packages/distributed/client.py in upload_file(self, filename, **kwargs)
2929 )
2930 if isinstance(result, Exception):
-> 2931 raise result
2932 else:
2933 return result
ModuleNotFoundError: No module named 'topmodule'
My question is - how can I upload an entire module and its files to workers? Our module is big so I want to avoid restructuring it just for this issue, unless the way we're structuring the module is fundamentally flawed.
Or - is there a better way to have all dask workers understand the modules perhaps from a git repository?
When you call upload_file on every file individually, you lose the directory structure of your module.
If you want to upload a more comprehensive module, you can package it up into a zip or egg file and upload that.
https://docs.dask.org/en/latest/futures.html#distributed.Client.upload_file
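A minimal sketch of that approach, assuming the notebook's working directory is the parent of topmodule/ (the archive name and paths are just examples):

import shutil

# Build topmodule.zip with the package directory (and its nested submodules)
# at the root of the archive, preserving the directory structure.
shutil.make_archive("topmodule", "zip", root_dir=".", base_dir="topmodule")

# upload_file accepts .py, .egg and .zip files; the archive is placed on the
# workers' sys.path, so topmodule.childmodule1.my_func1 stays importable.
client.upload_file("topmodule.zip")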

Drake Visualizer : Unknown file extension in readPolyData when using .dae file

I am trying to add a custom mesh (a torus) .dae file for collision and visual to my .sdf model.
When I run my program, Drake Visualizer gives the following error:
File "/opt/drake/lib/python2.7/site-packages/director/lcmUtils.py", line 119, in handleMessage
callback(msg)
File "/opt/drake/lib/python2.7/site-packages/director/drakevisualizer.py", line 352, in onViewerLoadRobot
self.addLinksFromLCM(msg)
File "/opt/drake/lib/python2.7/site-packages/director/drakevisualizer.py", line 376, in addLinksFromLCM
self.addLink(Link(link), link.robot_num, link.name)
File "/opt/drake/lib/python2.7/site-packages/director/drakevisualizer.py", line 299, in __init__
self.geometry.extend(Geometry.createGeometry(link.name + ' geometry data', g))
File "/opt/drake/lib/python2.7/site-packages/director/drakevisualizer.py", line 272, in createGeometry
polyDataList, visInfo = Geometry.createPolyDataFromFiles(geom)
File "/opt/drake/lib/python2.7/site-packages/director/drakevisualizer.py", line 231, in createPolyDataFromFiles
polyDataList = [ioUtils.readPolyData(filename)]
File "/opt/drake/lib/python2.7/site-packages/director/ioUtils.py", line 25, in readPolyData
raise Exception('Unknown file extension in readPolyData: %s' % filename)
Exception: Unknown file extension in readPolyData: /my_path/model.dae
Since prius.sdf also uses prius.dae, I assume this is possible. What am I doing wrong?
tl;dr: drake_visualizer doesn't load .dae files. If you put a similarly named .obj file in the same folder, it will load that (and you can leave your .sdf file still referencing the .dae file).
Long answer:
drake_visualizer has a very specific, arbitrary protocol for loading files. Given an arbitrary file name (e.g., my_geometry.dae) it will:
1. Strip off the extension.
2. Try the following files (in order), loading the first one it finds:
   - my_geometry.vtm
   - my_geometry.vtp
   - my_geometry.obj
   - the file with its original extension.
It can load: vtm, vtp, ply, obj, and stl files.
The worst thing is if you have both a vtp and an obj file in the same folder with the same name and you specify the obj, it'll still favor the vtp file.
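If you only have the .dae, one quick way to produce the sibling .obj is a small conversion script. This is just a sketch, not something Drake provides; it assumes the optional trimesh and pycollada packages are installed, and reuses the path from the question.

import trimesh

# Load the Collada mesh (needs pycollada) and export an OBJ with the same base
# name, so drake_visualizer finds model.obj while the .sdf still references model.dae.
mesh = trimesh.load("/my_path/model.dae", force="mesh")
mesh.export("/my_path/model.obj")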

Which compression types support chunking in dask?

When processing a large single file, it can be broken up like so:
import dask.bag as db
my_file = db.read_text('filename', blocksize=int(1e7))
This works great, but the files I'm working with have a high level of redundancy and so we keep them compressed. Passing in compressed gzip files gives an error that seeking in gzip isn't supported and so it can't be read in blocks.
The documentation here http://dask.pydata.org/en/latest/bytes.html#compression suggests that some formats support random access.
The relevant internal code I think is here:
https://github.com/dask/dask/blob/master/dask/bytes/compression.py#L47
It looks like lzma might support it, but it's been commented out.
Adding lzma into the seekable_files dict like in the commented-out code:
from dask.bytes.compression import seekable_files
import lzmaffi
seekable_files['xz'] = lzmaffi.LZMAFile
data = db.read_text('myfile.jsonl.lzma', blocksize=int(1e7), compression='xz')
Throws the following error:
Traceback (most recent call last):
File "example.py", line 8, in <module>
data = bag.read_text('myfile.jsonl.lzma', blocksize=int(1e7), compression='xz')
File "condadir/lib/python3.5/site-packages/dask/bag/text.py", line 80, in read_text
**(storage_options or {}))
File "condadir/lib/python3.5/site-packages/dask/bytes/core.py", line 162, in read_bytes
size = fs.logical_size(path, compression)
File "condadir/lib/python3.5/site-packages/dask/bytes/core.py", line 500, in logical_size
g.seek(0, 2)
io.UnsupportedOperation: seek
I assume that the functions at the bottom of that file (get_xz_blocks, for example) can be used for this, but they don't seem to be in use anywhere in the dask project.
Are there compression libraries that do support this seeking and chunking? If so, how can they be added?
Yes, you are right that the xz format can be useful to you. The confusion is that the file may be block-formatted, but the standard implementation lzmaffi.LZMAFile (or lzma) does not make use of this blocking. Note that block-formatting is only optional for xz files, e.g., by using --block-size=size with xz-utils.
The function compression.get_xz_blocks will give you the set of blocks in a file by reading the header only, rather than the whole file, and you could use this in combination with delayed, essentially repeating some of the logic in read_text. We have not put in the time to make this seamless; the same pattern could be used to write blocked xz files too.
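The delayed pattern described above might look roughly like the sketch below. read_xz_block is a hypothetical placeholder for the per-block decode step (the part that compression.get_xz_blocks and the other helpers in that file would enable); only the dask.delayed / dask.bag wiring is shown.

import dask
import dask.bag as db


def read_xz_block(path, offset, length):
    # Hypothetical: decode the xz block stored at [offset, offset + length)
    # and return its text lines. This is the piece get_xz_blocks is meant to enable.
    raise NotImplementedError


def read_blocked_xz_text(path, blocks):
    # blocks: iterable of (offset, length) pairs, one per xz block in the file.
    parts = [dask.delayed(read_xz_block)(path, off, length) for off, length in blocks]
    return db.from_delayed(parts)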

Lua socket.http loads fine from example script, but does not load from third party host

I'm working on a Lua script which will be hosted by a third-party program (some .exe which will call a certain function in my script). In order to implement some functionality I need (making a REST call to a web service to retrieve certain info), I want to use socket.http.request.
I first built an example script for the call I wanted to make:
local io = require("io")
local http = require("socket.http")
local ltn12 = require("ltn12")

local data = "some data"
local response = {}

socket.http.request({
  method = "POST",
  url = "http://localhost:8080/someServce/rest/commands/someCommand",
  headers = {
    ["Content-Type"] = "application/x-www-form-urlencoded",
    ["Content-Length"] = string.len(data)
  },
  source = ltn12.source.string(data),
  sink = ltn12.sink.table(response)
})

print(table.concat(response))
print("Done")
This works fine. I get the response I expect.
Now, when I try to do this from the third-party host, I first got this error:
module 'socket.http' not found:
no field package.preload['socket.http']
no file '.\socket\http.lua'
no file 'D:\SomeFolder\lua\socket\http.lua'
no file 'D:\SomeFolder\lua\socket\http\init.lua'
no file 'D:\SomeFolder\socket\http.lua'
no file 'D:\SomeFolder\socket\http\init.lua'
no file 'C:\Program Files (x86)\Lua\5.1\lua\socket\http.luac'
no file '.\socket\http.dll'
no file 'D:\SomeFolder\socket\http.dll'
no file 'D:\SomeFolder\loadall.dll'
no file '.\socket.dll'
no file 'D:\SomeFolder\socket.dll'
no file 'D:\SomeFolder\loadall.dll'
I've tried copying the socket folder from the LUA folder to the folder the host is executing from (D:\SomeFolder). It then finds the module, but fails to load it with another error:
loop or previous error loading module 'socket.http'
I've also tried moving the require statement outside of the function and making it global. This gives me yet another error:
module 'socket.core' not found:
no field package.preload['socket.core']
no file '.\socket\core.lua'
no file 'D:\SomeFolder\lua\socket\core.lua'
no file 'D:\SomeFolder\lua\socket\core\init.lua'
no file 'D:\SomeFolder\socket\core.lua'
no file 'D:\SomeFolder\socket\core\init.lua'
no file 'C:\Program Files (x86)\Lua\5.1\lua\socket\core.luac'
no file 'C:\Program Files (x86)\Lua\5.1\lua\socket\core.lua'
no file '.\socket\core.dll'
no file 'D:\SomeFolder\socket\core.dll'
no file 'D:\SomeFolder\loadall.dll'
no file '.\socket.dll'
no file 'D:\SomeFolder\socket.dll'
no file 'D:\SomeFolder\loadall.dll'
Then I tried copying the core.dll from socket into the D:\SomeFolder folder and it gave me another error:
error loading module 'socket.core' from file '.\socket\core.dll':
%1 is not a valid Win32 application.
Now I'm stuck. I think I must be doing something completely wrong, but I can't find any proper description on how to fix issues like this. Can anyone help me out?
As it turns out, the actual path Lua is going to look in is the problem here. Together with the third party, we found that everything works if we put a set of libraries in D:\SomeFolder\. So, for example, there is now a socket.lua in D:\SomeFolder\, and there are a socket and a mime folder there as well.
The rule of thumb appears to be that the location of the lua5.1.dll bound by the application determines where any modules you want to load must be located.
You probably need to have the following folder structure (relative to your D:\SomeFolder folder):
socket.lua
socket/core.dll
socket/http.lua
socket/url.lua
socket/<any other file from socket folder required by http.lua>
I just tested this configuration and it works for me.
loop or previous error loading module 'socket.http'
This is usually caused by loading socket.http from the socket/http.lua file itself.
