I have thousands of parquet files that I need to process. Before processing the files, I'm trying to get various information about the files using the parquet metadata, such as number of rows in each partition, mins, maxs, etc.
I tried reading the metadata using dask.delayed, hoping to distribute the metadata-gathering tasks across my cluster, but this seems to make Dask unstable. An example code snippet and the resulting node-timeout error are below.
Is there a way to read the parquet metadata from Dask? I know Dask's "read_parquet" function has a "gather_statistics" option, which you can set to false to speed up the file reads. But, I don't see a way to access all of the parquet metadata / statistics if it's set to true.
Example code:
import dask
import fastparquet

@dask.delayed
def get_pf(item_to_read):
    # Collect row groups, statistics and column names from the file metadata
    pf = fastparquet.ParquetFile(item_to_read)
    row_groups = pf.row_groups.copy()
    all_stats = pf.statistics.copy()
    col = pf.info['columns'].copy()
    return [row_groups, all_stats, col]

stats_arr = get_pf(item_to_read)
Example error:
2019-10-03 01:43:51,202 - INFO - 192.168.0.167 - distributed.worker - ERROR - Worker stream died during communication: tcp://192.168.0.223:34623
2019-10-03 01:43:51,203 - INFO - 192.168.0.167 - Traceback (most recent call last):
2019-10-03 01:43:51,204 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/comm/core.py", line 218, in connect
2019-10-03 01:43:51,206 - INFO - 192.168.0.167 - quiet_exceptions=EnvironmentError,
2019-10-03 01:43:51,207 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 729, in run
2019-10-03 01:43:51,210 - INFO - 192.168.0.167 - value = future.result()
2019-10-03 01:43:51,211 - INFO - 192.168.0.167 - tornado.util.TimeoutError: Timeout
2019-10-03 01:43:51,212 - INFO - 192.168.0.167 -
2019-10-03 01:43:51,213 - INFO - 192.168.0.167 - During handling of the above exception, another exception occurred:
2019-10-03 01:43:51,214 - INFO - 192.168.0.167 -
2019-10-03 01:43:51,215 - INFO - 192.168.0.167 - Traceback (most recent call last):
2019-10-03 01:43:51,217 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/worker.py", line 1841, in gather_dep
2019-10-03 01:43:51,218 - INFO - 192.168.0.167 - self.rpc, deps, worker, who=self.address
2019-10-03 01:43:51,219 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 729, in run
2019-10-03 01:43:51,220 - INFO - 192.168.0.167 - value = future.result()
2019-10-03 01:43:51,222 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 736, in run
2019-10-03 01:43:51,223 - INFO - 192.168.0.167 - yielded = self.gen.throw(*exc_info) # type: ignore
2019-10-03 01:43:51,224 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/worker.py", line 3029, in get_data_from_worker
2019-10-03 01:43:51,225 - INFO - 192.168.0.167 - comm = yield rpc.connect(worker)
2019-10-03 01:43:51,640 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 729, in run
2019-10-03 01:43:51,641 - INFO - 192.168.0.167 - value = future.result()
2019-10-03 01:43:51,643 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 736, in run
2019-10-03 01:43:51,644 - INFO - 192.168.0.167 - yielded = self.gen.throw(*exc_info) # type: ignore
2019-10-03 01:43:51,645 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/core.py", line 866, in connect
2019-10-03 01:43:51,646 - INFO - 192.168.0.167 - connection_args=self.connection_args,
2019-10-03 01:43:51,647 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 729, in run
2019-10-03 01:43:51,649 - INFO - 192.168.0.167 - value = future.result()
2019-10-03 01:43:51,650 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 736, in run
2019-10-03 01:43:51,651 - INFO - 192.168.0.167 - yielded = self.gen.throw(*exc_info) # type: ignore
2019-10-03 01:43:51,652 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/comm/core.py", line 230, in connect
2019-10-03 01:43:51,653 - INFO - 192.168.0.167 - _raise(error)
2019-10-03 01:43:51,654 - INFO - 192.168.0.167 - File "/usr/local/lib/python3.7/dist-packages/distributed/comm/core.py", line 207, in _raise
2019-10-03 01:43:51,656 - INFO - 192.168.0.167 - raise IOError(msg)
2019-10-03 01:43:51,657 - INFO - 192.168.0.167 - OSError: Timed out trying to connect to 'tcp://192.168.0.223:34623' after 10 s: connect() didn't finish in time
Does dd.read_parquet take a long time? If not, then you can follow whatever strategy is in there to do the reading in the client.
If the data has a single _metadata file in the root directory, then you can simply open this with fastparquet, which is exactly what Dask would do. It contains all the details of all of the data pieces.
There is no particular reason distributing the metadata reads should be a problem, but you should be aware that in some cases the total metadata items can add up to a substantial size.
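To illustrate the point about metadata size, here is a minimal sketch that collapses the per-row-group statistics into one small record per file before anything is sent back to the client. The helper names `summarize_stats` and `get_summary` are made up for this example, and it assumes `pf.statistics` is shaped like `{'min': {column: [one value per row group]}, 'max': {...}}`; check that against your fastparquet version.

```python
def summarize_stats(stats, num_rows):
    """Collapse per-row-group statistics into one compact record per file.

    `stats` is assumed to look like
    {'min': {col: [v_rg0, v_rg1, ...]}, 'max': {col: [...]}};
    entries may be None when a row group has no statistic for a column.
    """
    summary = {"num_rows": num_rows, "min": {}, "max": {}}
    for col, values in stats.get("min", {}).items():
        known = [v for v in values if v is not None]
        summary["min"][col] = min(known) if known else None
    for col, values in stats.get("max", {}).items():
        known = [v for v in values if v is not None]
        summary["max"][col] = max(known) if known else None
    return summary


def get_summary(path):
    """Hypothetical per-file task: read the footer, return only the summary."""
    import fastparquet  # imported lazily so summarize_stats stays dependency-free

    pf = fastparquet.ParquetFile(path)
    num_rows = sum(rg.num_rows for rg in pf.row_groups)
    return summarize_stats(pf.statistics, num_rows)
```

Wrapping `get_summary` with `dask.delayed` instead of returning the raw row-group objects should keep each result to a few values per column, at the cost of losing the per-row-group detail.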
I'm following this guide on how to set up docker with timescale/wale for continuous archiving.
https://docs.timescale.com/timescaledb/latest/how-to-guides/backup-and-restore/docker-and-wale/#run-the-timescaledb-container-in-docker
Everything runs as expected, but when I get to the final step, I'm seeing:
written to stdout
2022-04-13 07:43:33.349 UTC [27] LOG: redo done at 0/50000F8
Connecting to wale (172.18.0.3:80)
writing to stdout
- 100% |********************************| 36 0:00:00 ETA
written to stdout
Connecting to wale (172.18.0.3:80)
wget: server returned error: HTTP/1.0 500 INTERNAL SERVER ERROR
2022-04-13 07:43:34.264 UTC [27] LOG: selected new timeline ID: 2
2022-04-13 07:43:34.282 UTC [27] LOG: archive recovery complete
Connecting to wale (172.18.0.3:80)
wget: server returned error: HTTP/1.0 500 INTERNAL SERVER ERROR
2022-04-13 07:43:34.838 UTC [27] LOG: could not open file "pg_wal/000000010000000000000006": Permission denied
2022-04-13 07:43:34.844 UTC [1] LOG: database system is ready to accept connections
It looks like the wget to wale is failing? It's connected to the same network as timescaledb_recovered, so shouldn't it work? Is there some additional config that the docs are missing? Or am I misreading these logs somehow?
Some additional error output from wale log:
['wal-e', '--terse', 'wal-push', '/var/lib/postgresql/data/pg_wal/000000010000000000000012']
Pushing wal file /var/lib/postgresql/data/pg_wal/000000010000000000000012: ['wal-e', '--terse', 'wal-push', '/var/lib/postgresql/data/pg_wal/000000010000000000000012']
172.18.0.2 - - [13/Apr/2022 14:09:17] "GET /wal-push/000000010000000000000012 HTTP/1.1" 200 -
['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000011', '/var/lib/postgresql/data/pg_wal/000000010000000000000011']
Fetching wal 000000010000000000000011: ['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000011', '/var/lib/postgresql/data/pg_wal/000000010000000000000011']
172.18.0.4 - - [13/Apr/2022 14:09:53] "GET /wal-fetch/000000010000000000000011 HTTP/1.1" 200 -
['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000012', '/var/lib/postgresql/data/pg_wal/000000010000000000000012']
Fetching wal 000000010000000000000012: ['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000012', '/var/lib/postgresql/data/pg_wal/000000010000000000000012']
172.18.0.4 - - [13/Apr/2022 14:09:54] "GET /wal-fetch/000000010000000000000012 HTTP/1.1" 200 -
['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000011', '/var/lib/postgresql/data/pg_wal/000000010000000000000011']
Fetching wal 000000010000000000000011: ['wal-e', '--terse', 'wal-fetch', '-p=0', '000000010000000000000011', '/var/lib/postgresql/data/pg_wal/000000010000000000000011']
172.18.0.4 - - [13/Apr/2022 14:09:54] "GET /wal-fetch/000000010000000000000011 HTTP/1.1" 200 -
['wal-e', '--terse', 'wal-fetch', '-p=0', '00000002.history', '/var/lib/postgresql/data/pg_wal/00000002.history']
Fetching wal 00000002.history: ['wal-e', '--terse', 'wal-fetch', '-p=0', '00000002.history', '/var/lib/postgresql/data/pg_wal/00000002.history']
lzop: short read
wal_e.main CRITICAL MSG: An unprocessed exception has avoided all error handling
DETAIL: Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/wal_e/cmd.py", line 657, in main
args.prefetch)
File "/usr/lib/python3.5/site-packages/wal_e/operator/backup.py", line 353, in wal_restore
self.gpg_key_id is not None)
File "/usr/lib/python3.5/site-packages/wal_e/worker/worker_util.py", line 58, in do_lzop_get
return blobstore.do_lzop_get(creds, url, path, decrypt, do_retry=do_retry)
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/file_util.py", line 52, in do_lzop_get
raise exc
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/file_util.py", line 64, in write_and_return_error
key.get_contents_to_file(stream)
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/calling_format.py", line 53, in get_contents_to_file
with open(self.path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/backups/wal_005/00000002.history.lzo'
STRUCTURED: time=2022-04-13T14:09:55.216294-00 pid=32
Failed to fetch wal 00000002.history: None
Exception on /wal-fetch/00000002.history [GET]
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1816, in full_dispatch_request
return self.finalize_request(rv)
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1831, in finalize_request
response = self.make_response(rv)
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1957, in make_response
'The view function did not return a valid response. The'
TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.
172.18.0.4 - - [13/Apr/2022 14:09:55] "GET /wal-fetch/00000002.history HTTP/1.1" 500 -
['wal-e', '--terse', 'wal-fetch', '-p=0', '00000001.history', '/var/lib/postgresql/data/pg_wal/00000001.history']
Fetching wal 00000001.history: ['wal-e', '--terse', 'wal-fetch', '-p=0', '00000001.history', '/var/lib/postgresql/data/pg_wal/00000001.history']
lzop: short read
wal_e.main CRITICAL MSG: An unprocessed exception has avoided all error handling
DETAIL: Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/wal_e/cmd.py", line 657, in main
args.prefetch)
File "/usr/lib/python3.5/site-packages/wal_e/operator/backup.py", line 353, in wal_restore
self.gpg_key_id is not None)
File "/usr/lib/python3.5/site-packages/wal_e/worker/worker_util.py", line 58, in do_lzop_get
return blobstore.do_lzop_get(creds, url, path, decrypt, do_retry=do_retry)
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/file_util.py", line 52, in do_lzop_get
raise exc
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/file_util.py", line 64, in write_and_return_error
key.get_contents_to_file(stream)
File "/usr/lib/python3.5/site-packages/wal_e/blobstore/file/calling_format.py", line 53, in get_contents_to_file
with open(self.path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/backups/wal_005/00000001.history.lzo'
STRUCTURED: time=2022-04-13T14:09:55.689548-00 pid=38
Failed to fetch wal 00000001.history: None
Exception on /wal-fetch/00000001.history [GET]
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1816, in full_dispatch_request
return self.finalize_request(rv)
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1831, in finalize_request
response = self.make_response(rv)
File "/usr/lib/python3.5/site-packages/flask/app.py", line 1957, in make_response
'The view function did not return a valid response. The'
TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.
172.18.0.4 - - [13/Apr/2022 14:09:55] "GET /wal-fetch/00000001.history HTTP/1.1" 500 -
I've added some additional logs from the wale container that produce the error message when running timescaledb-recovered. I'm guessing that there is some issue with the requests timescaledb-recovered is sending, because wget works until that container is started.
This is bizarre, but apparently the critical failure and the 500 error are intended: they let Postgres know that no further segments need to be recovered. Incredibly frustrating.
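For anyone else confused by this: PostgreSQL expects its restore_command to exit non-zero when asked for a file that is not in the archive, and it does ask for files that were never archived (such as 00000002.history) while probing for the end of recovery. Here is a rough sketch of that contract, not the script the image actually uses. The `fetch_exit_code` name is made up, and the URL is only inferred from the "Connecting to wale" and "GET /wal-fetch/..." lines in the logs above.

```python
import urllib.error
import urllib.request

# Host and route inferred from the logs above; treat them as assumptions.
WALE_URL = "http://wale/wal-fetch/{name}"


def fetch_exit_code(name, opener=urllib.request.urlopen):
    """Ask the wale sidecar to fetch `name`; return a restore_command-style exit code.

    0 means the sidecar reported success, 1 means it could not supply the file.
    """
    try:
        with opener(WALE_URL.format(name=name)):
            return 0
    except urllib.error.HTTPError:
        # A 500 for a missing .history file lands here. From Postgres's point
        # of view this is just "file not in the archive", not a fatal error.
        return 1
```

Under that reading, `fetch_exit_code("00000002.history")` returning 1 is the normal way recovery learns there is no newer timeline, which is consistent with "archive recovery complete" still appearing in the timescaledb log.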
I have looked hard for an answer, but haven't managed to find one, so I'm hoping someone here can help me understand this error and what is happening during the singularity pull command.
Here is the error:
Error executing process > 'QC_TRIM_READS (1)'
Caused by:
Failed to pull singularity image
command: singularity pull --name quay.io-biocontainers-sickle-trim-1.33--2.img.pulling.1632264509884 docker://quay.io/biocontainers/sickle-trim:1.33--2 > /dev/null
status : 127
message:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
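Exit status 127 together with the `/usr/bin/env: ‘python’` line means that `env` could not find an executable named `python` on the PATH of the process that ran `singularity pull`. A small diagnostic sketch, meant to be run from the same shell that launches Nextflow (the helper name `resolve_interpreters` is made up for this example):

```python
import shutil


def resolve_interpreters(names=("python", "python2", "python3")):
    """Map each interpreter name to the path found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in resolve_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `python` comes back as not found while `python3` resolves, that would match the error above. This only shows what PATH resolves to; it does not say which Python version this Singularity release expects.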
Here is the script (excuse the mess, I am just getting used to Nextflow):
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

params.ref_genome = "./data/GmaxFiskeby_678_v1.0.fa"
params.ref_annotation = "./data/GmaxFiskeby_678_v1.1.gene_exons.gff3"
params.intermediate_dir = "$workDir/intermediate/"

workflow {
    ref_genome_ch = Channel.fromPath("$params.ref_genome")
    ref_annotation_ch = Channel.fromPath("$params.ref_annotation")
    input_fastq_ch = Channel.fromPath("./data/*.fastq")
    ref_genome_ch.view()
    QC_TRIM_READS(input_fastq_ch)
    STAR_INDEX_GENOME(ref_genome_ch, ref_annotation_ch)
}

process GZIP_VERSION {
    echo true

    script:
    """
    gzip --version
    """
}

process UNZIP {
    publishDir "intermediate/"

    input:
    path file

    output:
    path "${file.baseName}"

    script:
    """
    gzip -dfk ${file}
    """
}

process QC_TRIM_READS {
    publishDir "intermediate/"
    container 'quay.io/biocontainers/sickle-trim:1.33--2'

    input:
    path fastqFile

    output:
    path "${fastqFile.baseName}_trimmed.${fastqFile.getExtension()}"

    script:
    """
    sickle se \\
        -f $fastqFile \\
        -t sanger \\
        -o ${fastqFile.baseName}_trimmed.${fastqFile.getExtension()} \\
        -q 35 \\
        -l 45
    """
}

process STAR_INDEX_GENOME {
    publishDir "intermediate/indexedGenome/"
    /*if (workflow.containerEngine == 'singularity'){
        container "https://depot.galaxyproject.org/singularity/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0"
    } else {*/
    container "quay.io/biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0" //'quay.io/biocontainers/star:2.6.1d--0'
    //}

    input:
    path genome
    path gtf

    output:
    path "star" , emit: index

    script:
    """
    STAR \\
        --runMode genomeGenerate \\
        --genomeDir star/ \\
        --genomeFastaFiles ${genome} \\
        --sjdbGTFfile ${gtf} \\
        --sjdbGTFtagExonParentTranscript Parent \\
        --sjdbOverhang 100 \\
        --runThreadN 2
    """
}
Here is my configuration file:
//docker.enabled = false
singularity.enabled = true
singularity.autoMounts = true
I built my environment as a conda environment, here is the yml file:
name: nf-core
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- attrs=21.2.0=pyhd3eb1b0_0
- brotlipy=0.7.0=py38h27cfd23_1003
- bzip2=1.0.8=h7b6447c_0
- c-ares=1.17.1=h27cfd23_0
- ca-certificates=2021.7.5=h06a4308_1
- cairo=1.14.12=h7636065_2
- cattrs=1.7.1=pyhd3eb1b0_0
- certifi=2021.5.30=py38h06a4308_0
- cffi=1.14.6=py38h400218f_0
- charset-normalizer=2.0.4=pyhd3eb1b0_0
- click=8.0.1=pyhd3eb1b0_0
- colorama=0.4.4=pyhd3eb1b0_0
- commonmark=0.9.1=pyhd3eb1b0_0
- coreutils=8.32=h7b6447c_0
- cryptography=3.4.7=py38hd23ed53_0
- curl=7.78.0=h1ccaba5_0
- expat=2.4.1=h2531618_2
- fontconfig=2.12.6=h49f89f6_0
- freetype=2.8=hab7d2ae_1
- fribidi=1.0.10=h7b6447c_0
- future=0.18.2=py38_1
- gettext=0.21.0=hf68c758_0
- git=2.32.0=pl5262hc120c5b_1
- gitdb=4.0.7=pyhd3eb1b0_0
- gitpython=3.1.18=pyhd3eb1b0_1
- glib=2.69.1=h5202010_0
- graphite2=1.3.14=h23475e2_0
- graphviz=2.40.1=h25d223c_0
- harfbuzz=1.7.6=h5f0a787_1
- hdf5=1.10.6=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=3.2=pyhd3eb1b0_0
- importlib-metadata=4.8.1=py38h06a4308_0
- importlib_metadata=4.8.1=hd3eb1b0_0
- itsdangerous=2.0.1=pyhd3eb1b0_0
- jinja2=3.0.1=pyhd3eb1b0_0
- jpeg=9d=h7f8727e_0
- jsonschema=3.2.0=pyhd3eb1b0_2
- krb5=1.19.2=hac12032_0
- ld_impl_linux-64=2.35.1=h7274673_9
- libcurl=7.78.0=h0b77cf5_0
- libedit=3.1.20210714=h7f8727e_0
- libev=4.33=h7b6447c_0
- libffi=3.3=he6710b0_2
- libgcc-ng=9.3.0=h5101ec6_17
- libgfortran-ng=7.5.0=ha8ba4b0_17
- libgfortran4=7.5.0=ha8ba4b0_17
- libgomp=9.3.0=h5101ec6_17
- libiconv=1.15=h63c8f33_5
- libnghttp2=1.41.0=hf8bcb03_2
- libpng=1.6.37=hbc83047_0
- libssh2=1.9.0=h1ba5d50_1
- libstdcxx-ng=9.3.0=hd4cf53a_17
- libtiff=4.2.0=h85742a9_0
- libtool=2.4.6=h7b6447c_1005
- libwebp-base=1.2.0=h27cfd23_0
- libxcb=1.14=h7b6447c_0
- libxml2=2.9.12=h03d6c58_0
- lz4-c=1.9.3=h295c915_1
- markupsafe=2.0.1=py38h27cfd23_0
- ncbi-ngs-sdk=2.10.4=hdf6179e_0
- ncurses=6.2=he6710b0_1
- nextflow=21.04.0=h4a94de4_0
- nf-core=2.1=pyh5e36f6f_0
- openjdk=8.0.152=h7b6447c_3
- openssl=1.1.1l=h7f8727e_0
- ossuuid=1.6.2=hf484d3e_1000
- packaging=21.0=pyhd3eb1b0_0
- pango=1.42.0=h377f3fa_0
- pcre=8.45=h295c915_0
- pcre2=10.35=h14c3975_1
- perl=5.26.2=h14c3975_0
- perl-app-cpanminus=1.7044=pl526_1
- perl-business-isbn=3.004=pl526_0
- perl-business-isbn-data=20140910.003=pl526_0
- perl-carp=1.38=pl526_3
- perl-constant=1.33=pl526_1
- perl-data-dumper=2.173=pl526_0
- perl-encode=2.88=pl526_1
- perl-exporter=5.72=pl526_1
- perl-extutils-makemaker=7.36=pl526_1
- perl-file-path=2.16=pl526_0
- perl-file-temp=0.2304=pl526_2
- perl-mime-base64=3.15=pl526_1
- perl-parent=0.236=pl526_1
- perl-uri=1.76=pl526_0
- perl-xml-libxml=2.0132=pl526h7ec2d77_1
- perl-xml-namespacesupport=1.12=pl526_0
- perl-xml-sax=1.02=pl526_0
- perl-xml-sax-base=1.09=pl526_0
- perl-xsloader=0.24=pl526_0
- pip=21.2.2=py38h06a4308_0
- pixman=0.40.0=h7b6447c_0
- prompt-toolkit=3.0.17=pyhca03da5_0
- prompt_toolkit=3.0.17=hd3eb1b0_0
- pycparser=2.20=py_2
- pygments=2.10.0=pyhd3eb1b0_0
- pyopenssl=20.0.1=pyhd3eb1b0_1
- pyparsing=2.4.7=pyhd3eb1b0_0
- pyrsistent=0.17.3=py38h7b6447c_0
- pysocks=1.7.1=py38h06a4308_0
- python=3.8.11=h12debd9_0_cpython
- python_abi=3.8=2_cp38
- pyyaml=5.4.1=py38h27cfd23_1
- questionary=1.10.0=pyhd8ed1ab_0
- readline=8.1=h27cfd23_0
- requests=2.26.0=pyhd3eb1b0_0
- requests-cache=0.7.4=pyhd8ed1ab_0
- rich=10.10.0=py38h578d9bd_0
- setuptools=58.0.4=py38h06a4308_0
- singularity=2.4.2=0
- six=1.16.0=pyhd3eb1b0_0
- smmap=4.0.0=pyhd3eb1b0_0
- sqlite=3.36.0=hc218d9a_0
- sra-tools=2.11.0=pl5262h314213e_0
- tabulate=0.8.9=py38h06a4308_0
- tk=8.6.10=hbc83047_0
- typing-extensions=3.10.0.2=hd3eb1b0_0
- typing_extensions=3.10.0.2=pyh06a4308_0
- url-normalize=1.4.3=pyhd8ed1ab_0
- urllib3=1.26.6=pyhd3eb1b0_1
- wcwidth=0.2.5=pyhd3eb1b0_0
- wheel=0.37.0=pyhd3eb1b0_1
- xz=5.2.5=h7b6447c_0
- yaml=0.2.5=h7b6447c_0
- zipp=3.5.0=pyhd3eb1b0_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.4.9=haebb681_0
prefix: /home/mkozubov/miniconda3/envs/nf-core
Here is the log file:
Sep-21 16:00:49.076 [main] DEBUG nextflow.cli.Launcher - $> nextflow run rnaseq.nf
Sep-21 16:00:49.318 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 21.04.0
Sep-21 16:00:49.367 [main] INFO nextflow.cli.CmdRun - Launching `rnaseq.nf` [reverent_jepsen] - revision: 0fc00d31fc
Sep-21 16:00:49.414 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/nextflow.config
Sep-21 16:00:49.418 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/nextflow.config
Sep-21 16:00:49.506 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-21 16:00:50.238 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; plugins-dir=/home/mkozubov/.nextflow/plugins
Sep-21 16:00:50.240 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Sep-21 16:00:50.242 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins local root: .nextflow/plr/empty
Sep-21 16:00:50.258 [main] INFO org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-21 16:00:50.262 [main] INFO org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-21 16:00:50.266 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Sep-21 16:00:50.289 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Sep-21 16:00:50.366 [main] DEBUG nextflow.Session - Session uuid: 22a13149-e9f8-47cc-8f09-98a6b000a83a
Sep-21 16:00:50.367 [main] DEBUG nextflow.Session - Run name: reverent_jepsen
Sep-21 16:00:50.372 [main] DEBUG nextflow.Session - Executor pool size: 5
Sep-21 16:00:50.418 [main] DEBUG nextflow.cli.CmdRun -
Version: 21.04.0 build 5552
Created: 02-05-2021 16:22 UTC (09:22 PDT)
System: Linux 5.10.16.3-microsoft-standard-WSL2
Runtime: Groovy 3.0.7 on OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12
Encoding: UTF-8 (UTF-8)
Process: 10590#DESKTOP-UJ90D1J [127.0.1.1]
CPUs: 5 - Mem: 1.9 GB (311.8 MB) - Swap: 1 GB (783.4 MB)
Sep-21 16:00:50.539 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/work
v9fs
Sep-21 16:00:50.541 [main] DEBUG nextflow.Session - Work-dir: /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/work [null]
Sep-21 16:00:50.545 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/bin
Sep-21 16:00:50.585 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Sep-21 16:00:50.616 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Sep-21 16:00:50.999 [main] DEBUG nextflow.Session - Session start invoked
Sep-21 16:00:51.461 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Sep-21 16:00:51.511 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: QC_TRIM_READS, UNZIP, STAR_INDEX_GENOME, GZIP_VERSION
Sep-21 16:00:51.643 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Sep-21 16:00:51.643 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-21 16:00:51.651 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Sep-21 16:00:51.656 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=5; memory=1.9 GB; capacity=5; pollInterval=100ms; dumpInterval=5m
Sep-21 16:00:51.868 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Sep-21 16:00:51.869 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-21 16:00:51.904 [main] DEBUG nextflow.Session - Ignite dataflow network (5)
Sep-21 16:00:51.963 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > QC_TRIM_READS
Sep-21 16:00:51.965 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > STAR_INDEX_GENOME
Sep-21 16:00:51.966 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Sep-21 16:00:51.968 [main] DEBUG nextflow.Session - Session await
Sep-21 16:00:51.969 [PathVisitor-3] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: ./data/; pattern: *.fastq; options: [:]
Sep-21 16:00:52.300 [Actor Thread 8] WARN nextflow.container.SingularityCache - Singularity cache directory has not been defined -- Remote image will be stored in the path: /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/work/singularity -- Use env variable NXF_SINGULARITY_CACHEDIR to specify a different location
Sep-21 16:00:52.300 [Actor Thread 8] INFO nextflow.container.SingularityCache - Pulling Singularity image docker://quay.io/biocontainers/sickle-trim:1.33--2 [cache /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/work/singularity/quay.io-biocontainers-sickle-trim-1.33--2.img]
Sep-21 16:00:52.300 [Actor Thread 7] INFO nextflow.container.SingularityCache - Pulling Singularity image docker://quay.io/biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0 [cache /mnt/c/Users/mkozubov/Desktop/nextflow_tutorial/rnaseq/work/singularity/quay.io-biocontainers-mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2-59cdd445419f14abac76b31dd0d71217994cbcc9-0.img]
Sep-21 16:00:52.433 [Actor Thread 5] ERROR nextflow.processor.TaskProcessor - Error executing process > 'QC_TRIM_READS (1)'
Caused by:
Failed to pull singularity image
command: singularity pull --name quay.io-biocontainers-sickle-trim-1.33--2.img.pulling.1632265252300 docker://quay.io/biocontainers/sickle-trim:1.33--2 > /dev/null
status : 127
message:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to pull singularity image
command: singularity pull --name quay.io-biocontainers-sickle-trim-1.33--2.img.pulling.1632265252300 docker://quay.io/biocontainers/sickle-trim:1.33--2 > /dev/null
status : 127
message:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
at nextflow.container.SingularityCache.getCachePathFor(SingularityCache.groovy:304)
at nextflow.container.SingularityCache$getCachePathFor.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
at nextflow.container.ContainerHandler.createSingularityCache(ContainerHandler.groovy:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:193)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:61)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:194)
at nextflow.container.ContainerHandler.normalizeImageName(ContainerHandler.groovy:68)
at nextflow.container.ContainerHandler$normalizeImageName.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
at nextflow.processor.TaskRun.getContainer(TaskRun.groovy:587)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.MethodMetaProperty$GetBeanMethodMetaProperty.getProperty(MethodMetaProperty.java:76)
at org.codehaus.groovy.runtime.callsite.GetEffectivePogoPropertySite.getProperty(GetEffectivePogoPropertySite.java:85)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:341)
at nextflow.processor.TaskProcessor.createTaskHashKey(TaskProcessor.groovy:1939)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:193)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:61)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:591)
at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:59)
at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Failed to pull singularity image
command: singularity pull --name quay.io-biocontainers-sickle-trim-1.33--2.img.pulling.1632265252300 docker://quay.io/biocontainers/sickle-trim:1.33--2 > /dev/null
status : 127
message:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
at nextflow.container.SingularityCache.runCommand(SingularityCache.groovy:256)
at nextflow.container.SingularityCache.downloadSingularityImage0(SingularityCache.groovy:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1029)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1012)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:101)
at nextflow.container.SingularityCache$_downloadSingularityImage_closure1.doCall(SingularityCache.groovy:191)
at nextflow.container.SingularityCache$_downloadSingularityImage_closure1.call(SingularityCache.groovy)
at nextflow.file.FileMutex.lock(FileMutex.groovy:107)
at nextflow.container.SingularityCache.downloadSingularityImage(SingularityCache.groovy:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1029)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1012)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:101)
at nextflow.container.SingularityCache$_getLazyImagePath_closure2.doCall(SingularityCache.groovy:281)
at nextflow.container.SingularityCache$_getLazyImagePath_closure2.call(SingularityCache.groovy)
at groovyx.gpars.dataflow.LazyDataflowVariable$1.run(LazyDataflowVariable.java:70)
... 3 common frames omitted
Sep-21 16:00:52.443 [Actor Thread 5] DEBUG nextflow.Session - Session aborted -- Cause: java.lang.IllegalStateException: Failed to pull singularity image
command: singularity pull --name quay.io-biocontainers-sickle-trim-1.33--2.img.pulling.1632265252300 docker://quay.io/biocontainers/sickle-trim:1.33--2 > /dev/null
status : 127
message:
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
Sep-21 16:00:52.494 [Actor Thread 5] DEBUG nextflow.Session - The following nodes are still active:
[process] QC_TRIM_READS
status=ACTIVE
port 0: (queue) closed; channel: fastqFile
port 1: (cntrl) - ; channel: $
Sep-21 16:00:52.507 [main] DEBUG nextflow.Session - Session await > all process finished
Sep-21 16:00:52.510 [main] DEBUG nextflow.Session - Session await > all barriers passed
Sep-21 16:00:52.521 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Sep-21 16:00:52.685 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Sep-21 16:00:52.752 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
I have been using nf-core's rnaseq pipeline to guide me a bit: https://github.com/nf-core/rnaseq
If it helps, here is the pipeline I am trying to automate: https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/rna-seq-tutorial-with-reference-genome/#
My computer runs Windows 10, and I have enabled WSL2 and installed Ubuntu on it.
I am fairly new to Docker, Singularity, and Nextflow so I am hoping someone can explain the error. I don't even understand why python is being mentioned. Is the issue that singularity cannot pull from Quay.io? I am a bit lost and would appreciate a nudge in the right direction.
Also, the reason I am trying to get Singularity to work is that STAR immediately gives me a segmentation fault on my local machine (I'm assuming I run out of memory), and I would like to test this pipeline on our HPC (where I don't have root privileges).
You can ignore the Singularity warnings, but not the errors. The problem looks to be that you're missing python in your environment:
/usr/bin/env: ‘python’: No such file or directory
ERROR: pulling container failed!
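Python is mentioned because /usr/bin/env was asked to run an executable named python and could not find one. You need to make sure Python 3 is installed and reachable under that name. If it is, this should print its version: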
/usr/bin/python --version
You didn't mention the version of Ubuntu you are using, but if you have Ubuntu 20.04 then you should already have Python 3 pre-installed. If this is the case, and you have Python 3 already installed (i.e. you find that /usr/bin/python3 --version works, note the '3') but the above doesn't, try:
sudo apt-get install python-is-python3
This will install a symlink to point the /usr/bin/python interpreter at the current default python3.
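As a quick cross-check of the above, here is a minimal sketch (run it with python3) that reports which interpreter names resolve on your PATH, which is roughly the lookup /usr/bin/env performs. The helper name find_interpreters is made up for this illustration.

```python
import shutil


def find_interpreters(names=("python", "python3")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    # shutil.which searches PATH much like `/usr/bin/env <name>` does.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

If python shows as not found while python3 resolves, you are in the situation the python-is-python3 package is meant to fix.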
I'm currently researching the resilience4j library and for some reason the following code doesn't work as expected:
@Test
public void testRateLimiterProjectReactor()
{
    // The configuration below will allow 2 requests per second and a "timeout" of 2 seconds.
    RateLimiterConfig config = RateLimiterConfig.custom()
            .limitForPeriod(2)
            .limitRefreshPeriod(Duration.ofSeconds(1))
            .timeoutDuration(Duration.ofSeconds(2))
            .build();
    // Step 2.
    // Create a RateLimiter and use it.
    RateLimiterRegistry registry = RateLimiterRegistry.of(config);
    RateLimiter rateLimiter = registry.rateLimiter("myReactorServiceNameLimiter");
    // Step 3.
    Flux<Integer> flux = Flux.from(Flux.range(0, 10))
            .transformDeferred(RateLimiterOperator.of(rateLimiter))
            .log()
    ;
    StepVerifier.create(flux)
            .expectNextCount(10)
            .expectComplete()
            .verify()
    ;
}
According to the official examples here and here this should be limiting the request() to 2 elements per second. However, the logs show it's fetching all of the elements immediately:
15:08:24.587 [main] DEBUG reactor.util.Loggers - Using Slf4j logging framework
15:08:24.619 [main] INFO reactor.Flux.Defer.1 - onSubscribe(RateLimiterSubscriber)
15:08:24.624 [main] INFO reactor.Flux.Defer.1 - request(unbounded)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(0)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(1)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(2)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(3)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(4)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(5)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(6)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(7)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(8)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onNext(9)
15:08:24.626 [main] INFO reactor.Flux.Defer.1 - onComplete()
I don't see what's wrong?
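As already answered in the comments above, RateLimiter tracks the number of subscriptions, not elements. Your test subscribes once, so a single permit is acquired and all ten elements flow through immediately. To rate-limit the elements themselves, you can use limitRate (together with buffer and delayElements).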
For example,
Flux.range(1, 100)
    .delayElements(Duration.ofMillis(100)) // imitates a publisher that produces elements at a certain rate
    .log()
    .limitRate(10) // requests up to 10 elements at a time from the publisher
    .buffer(10) // groups the integers into lists of 10 elements
    .delayElements(Duration.ofSeconds(2)) // emits one group of ints every 2 seconds
    .subscribe(System.out::println);
I am trying to use dask_jobqueue's SLURMCluster to submit batch jobs to a SLURM job scheduler on a supercomputing cluster. The jobs all submit as expected, but each throws an error after 1 minute of running: asyncio.exceptions.TimeoutError: Nanny failed to start in 60 seconds. How do I get the nanny to connect?
Full Trace:
distributed.nanny - INFO - Start Nanny at: 'tcp://206.76.203.125:38324'
distributed.dashboard.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.worker - INFO - Start worker at: tcp://206.76.203.125:37609
distributed.worker - INFO - Listening to: tcp://206.76.203.125:37609
distributed.worker - INFO - dashboard at: 206.76.203.125:35505
distributed.worker - INFO - Waiting to connect to: tcp://129.114.63.43:35489
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 8
distributed.worker - INFO - Memory: 2.00 GB
distributed.worker - INFO - Local Directory: /home1/06729/tg860286/tests/dask-rsmas-presentation/dask-worker-space/worker-pu937jui
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Waiting to connect to: tcp://129.114.63.43:35489
distributed.worker - INFO - Waiting to connect to: tcp://129.114.63.43:35489
distributed.worker - INFO - Waiting to connect to: tcp://129.114.63.43:35489
distributed.worker - INFO - Waiting to connect to: tcp://129.114.63.43:35489
distributed.nanny - INFO - Closing Nanny at 'tcp://206.76.203.125:38324'
distributed.worker - INFO - Stopping worker at tcp://206.76.203.125:37609
distributed.worker - INFO - Closed worker has not yet started: None
distributed.dask_worker - INFO - End worker
Traceback (most recent call last):
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/node.py", line 173, in wait_for
await asyncio.wait_for(future, timeout=timeout)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/asyncio/tasks.py", line 490, in wait_for
raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 440, in <module>
go()
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 436, in go
main()
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 422, in main
loop.run_sync(run)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/tornado/ioloop.py", line 532, in run_sync
return future_cell[0].result()
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/cli/dask_worker.py", line 416, in run
await asyncio.gather(*nannies)
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/asyncio/tasks.py", line 684, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/home1/06729/tg860286/miniconda3/envs/daskbase/lib/python3.8/site-packages/distributed/node.py", line 176, in wait_for
raise TimeoutError(
asyncio.exceptions.TimeoutError: Nanny failed to start in 60 seconds
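It looks like your workers weren't able to connect to the scheduler: the log shows the worker repeating "Waiting to connect to: tcp://129.114.63.43:35489" until the nanny gives up after 60 seconds. My guess is that you need to specify a network interface. You should ask your system administrator which network interface to use, and then pass it to SLURMCluster with the interface= keyword.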
You might also want to read through https://blog.dask.org/2019/08/28/dask-on-summit , which gives a case study of common problems that arise.
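To make the interface= suggestion concrete, here is a minimal sketch. It assumes dask-jobqueue is installed on the machine that creates the cluster; the helper names list_interfaces and make_cluster are made up for this example, and the cores and memory values simply mirror the "Threads: 8" and "Memory: 2.00 GB" lines in the log from the question.

```python
import socket


def list_interfaces():
    """Return the names of the network interfaces visible on this machine."""
    # socket.if_nameindex() yields (index, name) pairs.
    return [name for _index, name in socket.if_nameindex()]


def make_cluster(interface):
    """Create a SLURMCluster bound to the given network interface."""
    # Imported lazily so list_interfaces() works without dask-jobqueue installed.
    from dask_jobqueue import SLURMCluster

    return SLURMCluster(
        cores=8,              # assumption: mirrors "Threads: 8" in the log
        memory="2GB",         # assumption: mirrors "Memory: 2.00 GB" in the log
        interface=interface,  # the interface your administrator recommends
    )


if __name__ == "__main__":
    # Print the candidates so you can compare them with what your admin suggests.
    print(list_interfaces())
```

You would then call something like make_cluster("ib0"), where "ib0" is only a placeholder name; the right value depends on your cluster, and the interfaces on the compute nodes may differ from those on the login node.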
I'm creating an hourly task in Airflow that schedules a Dataflow job. However, the hook provided by the Airflow library crashes most of the time, even though the Dataflow job actually succeeds.
[2018-05-25 07:05:03,523] {base_task_runner.py:98} INFO - Subtask: [2018-05-25 07:05:03,439] {gcp_dataflow_hook.py:109} WARNING - super(GcsIO, cls).__new__(cls, storage_client))
[2018-05-25 07:05:03,721] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-05-25 07:05:03,725] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/bin/airflow", line 27, in <module>
[2018-05-25 07:05:03,726] {base_task_runner.py:98} INFO - Subtask: args.func(args)
[2018-05-25 07:05:03,729] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-05-25 07:05:03,729] {base_task_runner.py:98} INFO - Subtask: pool=args.pool,
[2018-05-25 07:05:03,731] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-05-25 07:05:03,732] {base_task_runner.py:98} INFO - Subtask: result = func(*args, **kwargs)
[2018-05-25 07:05:03,734] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1492, in _run_raw_task
[2018-05-25 07:05:03,738] {base_task_runner.py:98} INFO - Subtask: result = task_copy.execute(context=context)
[2018-05-25 07:05:03,740] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/operators/dataflow_operator.py", line 313, in execute
[2018-05-25 07:05:03,746] {base_task_runner.py:98} INFO - Subtask: self.py_file, self.py_options)
[2018-05-25 07:05:03,748] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 188, in start_python_dataflow
[2018-05-25 07:05:03,751] {base_task_runner.py:98} INFO - Subtask: label_formatter)
[2018-05-25 07:05:03,753] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 158, in _start_dataflow
[2018-05-25 07:05:03,756] {base_task_runner.py:98} INFO - Subtask: _Dataflow(cmd).wait_for_done()
[2018-05-25 07:05:03,757] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 129, in wait_for_done
[2018-05-25 07:05:03,759] {base_task_runner.py:98} INFO - Subtask: line = self._line(fd)
[2018-05-25 07:05:03,761] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 110, in _line
[2018-05-25 07:05:03,763] {base_task_runner.py:98} INFO - Subtask: line = lines[-1][:-1]
[2018-05-25 07:05:03,766] {base_task_runner.py:98} INFO - Subtask: IndexError: list index out of range
I looked that file up in the Airflow GitHub repo, and the line number in the error does not match, which makes me think that the Airflow version running in Cloud Composer is outdated. Is there any way to update it?
This should be resolved in 1.10 or 2.0.
Have a look at this PR:
https://github.com/apache/incubator-airflow/pull/3165
This has been merged to master. You may use this PR code and create your own plugin.
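For context, the traceback in the question fails at lines[-1][:-1], which raises IndexError whenever lines is an empty list. The sketch below only illustrates that failure mode and one defensive way to handle it; it is not the code from the PR, and the function name last_line is made up for this example.

```python
def last_line(lines):
    """Return the final line without its trailing newline, or None if empty."""
    if not lines:
        # Nothing was read on this poll; indexing lines[-1] here would
        # raise "IndexError: list index out of range".
        return None
    return lines[-1].rstrip("\n")


if __name__ == "__main__":
    print(last_line(["first\n", "second\n"]))
    print(last_line([]))
```

If you go the plugin route, a guard like this around the read from the subprocess output is the kind of change to look for in the patched hook.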