TensorFlow: Ran out of memory trying to allocate 1.5KiB
I am running TensorFlow in a loop over 300 random network structures to find a good architecture.
After the first epoch on the data is finished, I remove the worst 10% of the networks and start the second epoch on the remaining ones. But it fails at around iteration 350.
I am running it on a Tesla K80 with 11.25 GiB of memory, with TensorFlow version 0.9.0, and aggregation_method = tf.AggregationMethod.EXPERIMENTAL_TREE set on tf.train.MomentumOptimizer.
Following is the error that I am getting. (Since it is very long, I have only included the point where it starts, a representative slice of the details, and the final logs.)
I appreciate any help.
Afshin
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8192):
......
......
Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:656] Bin for 1.8KiB was 1.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303ee0000 of size 24832
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303ee6100 of size 768
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303ee6400 of size 73728
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303ef8400 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303ef8800 of size 86016
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303f0d800 of size 768
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303f0db00 of size 768
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303f0de00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303f0df00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x2303f0e000 of size 24832
....
....
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 188 Chunks of size 313856 totalling 56.27MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 318976 totalling 311.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 146 Chunks of size 397824 totalling 55.39MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 10.60GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 11386585088
InUse: 11386585088
MaxInUse: 11386585088
NumAllocs: 556930762
MaxAllocSize: 30105600
W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 1.5KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:899] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: zeros_1931 = Const[dtype=DT_DOUBLE, value=Tensor<type: double shape: [197] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
54.249; ||W|| 2175.582= lmbd*||W||= 5.291; seconds= 107.76
final; 2016-09-03 02:47:22; Iter= 10000; lr= 0.000078; l2= 0.002432; str= [43, 106, 200, 116, 1]; Train_loss= 1240.027; Test_loss= 1257.031; best_tets= 1254.249; ||W|| 2232.211= lmbd*||W||= 5.429; seconds= 116.30
0.95 0.006917335944 0.75 0.00218294805583 0.9 9000 [43, 46, 29, 1] 64
Traceback (most recent call last):
File "runner.py", line 66, in <module>
result += [dnnMultiLayerCoeff(maxiter,display,decay_rate,result[0][0],power,result[0][1],init_momentum,decay_step,result[0][2],result[0][3],batch_size,var,MaxUnImp,run_number,result[0][7],result[0][8])]
File "/scratch/afo214/tensorflow/dnnMultiLayerCoeff.py", line 130, in dnnMultiLayerCoeff
sess.run(tf.initialize_all_variables())
File "/usr/local/lib/python2.7/dist- packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist- packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
[[Node: zeros_1931 = Const[dtype=DT_DOUBLE, value=Tensor<type: double shape: [197] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'zeros_1931', defined at:
File "runner.py", line 64, in <module>
result += [dnnMultiLayerCoeff(maxiter,display,decay_rate,starter_learning_rate,power,l2lambda,init_momentum,decay_step,NoHiLayr,node[j],batch_size,var,MaxUnImp,run_number,w,b)]
File "/scratch/afo214/tensorflow/dnnMultiLayerCoeff.py", line 127, in dnnMultiLayerCoeff
train_step = tf.train.MomentumOptimizer(learning_rate,0.9).minimize(loss, global_step=global_step,aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 195, in minimize
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 297, in apply_gradients
self._create_slots(var_list)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/momentum.py", line 51, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 491, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 106, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 752, in zeros
output = constant(0, shape=shape, dtype=dtype, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/constant_op.py", line 166, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist- packages/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
I cleared the utilized GPU memory by deleting the session objects for every new network, and it works.
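A minimal sketch of that pattern (build_network, network_structures, and num_iterations are hypothetical placeholders; the TF 0.9-era API is assumed):

import tensorflow as tf

for structure in network_structures:      # e.g. [43, 106, 200, 116, 1]
    # Build each candidate network in its own graph so that ops from
    # previous iterations (weights, momentum slots, zeros_* constants)
    # do not keep accumulating in one ever-growing default graph.
    with tf.Graph().as_default():
        loss, train_step = build_network(structure)
        with tf.Session() as sess:
            sess.run(tf.initialize_all_variables())  # TF 0.9 initializer
            for _ in range(num_iterations):
                sess.run(train_step)
    # Exiting the blocks closes the session, releasing its GPU memory,
    # and drops the last reference to the per-iteration graph.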
Related
Getting an upload error of ESP8266 using Arduino IDE with this report
The report is like this:

Executable segment sizes:
ICACHE : 32768           - flash instruction cache
IROM   : 252980          - code in flash (default or ICACHE_FLASH_ATTR)
IRAM   : 28261  / 32768  - code in IRAM (IRAM_ATTR, ISRs...)
DATA   : 1560  )         - initialized variables (global, static) in RAM/HEAP
RODATA : 2484  ) / 81920 - constants (global, static) in RAM/HEAP
BSS    : 26704 )         - zeroed variables (global, static) in RAM/HEAP
Sketch uses 285285 bytes (27%) of program storage space. Maximum is 1044464 bytes.
Global variables use 30748 bytes (37%) of dynamic memory, leaving 51172 bytes for local variables. Maximum is 81920 bytes.
"C:\Users\hp\AppData\Local\Arduino15\packages\esp8266\tools\python3\3.7.2-post1/python3" "C:\Users\hp\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.0.1/tools/upload.py" --chip esp8266 --port "COM4" --baud "115200" "" --before default_reset --after hard_reset write_flash 0x0 "C:\Users\hp\AppData\Local\Temp\arduino-sketch-A88D8C3CC17CAE2361CEB8CD194D0EE6/sketch_dec24a.ino.bin"
esptool.py v3.0
Serial port COM4
Connecting...
Traceback (most recent call last):
  File "C:\Users\hp\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.0.1/tools/upload.py", line 66, in <module>
    esptool.main(cmdline)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 3552, in main
    esp.connect(args.before, args.connect_attempts)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 519, in connect
    last_error = self._connect_attempt(mode=mode, esp32r0_delay=False)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 499, in _connect_attempt
    self.sync()
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 438, in sync
    timeout=SYNC_TIMEOUT)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 376, in command
    self.write(pkt)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/esptool\esptool.py", line 339, in write
    self._port.write(buf)
  File "C:/Users/hp/AppData/Local/Arduino15/packages/esp8266/hardware/esp8266/3.0.1/tools/pyserial\serial\serialwin32.py", line 325, in write
    raise SerialTimeoutException('Write timeout')
serial.serialutil.SerialTimeoutException: Write timeout
Failed uploading: uploading error: exit status 1

I already installed all drivers and needed libraries.
How to start U-Boot from SD cards's FAT partition on Beaglebone Black
I'm currently reading Mastering Embedded Linux Programming and I'm on the chapter about bootloaders, more specifically U-Boot for the BeagleBone Black. I have built a cross-compiler and I'm able to build U-Boot; however, I can't make it run the way it is described in the book.

After some experimentation and Googling, I can make it work by writing MLO and u-boot.img in raw mode (using these commands). However, if I put the files in a FAT32 MBR boot partition, the BeagleBone will not boot; it will only show a string of C's, which indicates that it is trying to get its bootloader from the serial interface and has decided it cannot boot from the SD card. I have also studied this answer. According to that answer, I should be doing everything correctly.

I've tried to experiment with the MMC raw mode options in the U-Boot build configuration, but I've not been able to find a change that works. I feel like there must be something obvious I'm missing, but I can't figure it out. Are there any things I can try to debug this further?

Update: some more details on the partition tables. When using the "raw way" of putting MLO and u-boot.img on the SD card, I have not created any partitions at all. This works:

$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

When trying to use a boot partition, which does not work, I have this configuration:

$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3d985ec3

Device     Boot Start    End Sectors Size Id Type
/dev/sda1  *     2048 133119  131072  64M  c W95 FAT32 (LBA)

Update 2: The contents of the boot partition are the exact same 2 files that I use for the raw writes, so they are confirmed to work:

$ ls -al
total 1000
drwxr-xr-x  2 peter peter  16384 Jan  1  1970 .
drwxr-x---+ 3 root  root    4096 Jul 18 08:44 ..
-rw-r--r--  1 peter peter 108184 Jul 14 13:56 MLO
-rw-r--r--  1 peter peter 893144 Jul 14 13:56 u-boot.img

Update 3: I have already tried the following U-Boot options to try to get it to work (in the SPL / TPL menu):

"Support FAT filesystems": This is enabled by default. I can't really find a good reference for the U-Boot options, but I am guessing this is what enables booting from a FAT partition (which is what I'm trying to do).
"MMC raw mode: by sector": I have disabled this. As expected, this indeed breaks booting in raw mode, which is the only thing I got working up till now.
"MMC raw mode: by partition": I have tried to enable this, using partition 1 to load U-Boot from. I'm not sure how to understand this option; I assume raw mode does not require partitions, but this asks for what partition to use...

In general, if anyone can point me to a U-Boot configuration reference, that would already be very helpful. Right now, I'm just randomly turning things on and off that sound like they may help.
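(For reference, the raw-mode write mentioned above is typically done with dd along these lines; a sketch only, not necessarily the book's exact commands. /dev/sda is the SD card device from the listings above, and the offsets follow the usual AM335x boot ROM and SPL defaults:

$ sudo dd if=MLO of=/dev/sda bs=128k seek=1
$ sudo dd if=u-boot.img of=/dev/sda bs=384k seek=1

bs=128k seek=1 places MLO at offset 0x20000, one of the boot ROM's raw search locations; bs=384k seek=1 places u-boot.img at 0x60000, where SPL looks by default in raw mode.)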
U-boot doesn't show its terminal
I have compiled U-Boot for the BeagleBone Black, but the only message that I can see is:

At the elinux.org tutorial, the expected result is some errors followed by the U-Boot terminal being available to use. To build U-Boot I followed these steps:

$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- distclean
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- am335x_evm_config
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-

My SD card is set up like this:

Disk /dev/sdd: 28,89 GiB, 31002198016 bytes, 60551168 sectors
Disk model: SD/MMC/MS PRO
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x33bcefe5

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdd1  *       2048  2099199  2097152    1G  c W95 FAT32 (LBA)
/dev/sdd2       2099200 60551167 58451968 27,9G 83 Linux
Unfortunately, I confirmed that it was a hardware issue :(
Dask Scheduler Memory
Our dask-scheduler process seems to balloon in memory as time goes on and executions continue. Currently we see it using 5 GB of memory, which seems high since all the data is supposedly living on the worker nodes:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+    COMMAND
31172 atoz 20  0 5486944 5.071g 7100 S 23.8 65.0 92:38.64 dask-scheduler

When starting up the scheduler, we would be below 1 GB of memory use. Restarting the network by doing a client.restart() doesn't seem to help; only a kill of the scheduler process itself and a restart will free up the memory.

What is the expected memory usage per single task executed? Is the scheduler really only maintaining pointers to which worker contains the future's result?

----edit----

I think my main concern here is why a client.restart() doesn't seem to release the memory being used by the scheduler process. I'm obviously not expecting it to release all memory, but to get back to a base level. We are using client.map to execute our function across a list of different inputs. After executing, doing a client restart over and over and taking snapshots of our scheduler memory, we see the following growth:

PID   USER PR NI VIRT    RES    SHR   S %CPU %MEM TIME+    COMMAND
27955 atoz 20  0 670556  507212 13536 R 43.7  6.2  1:23.61 dask-scheduler
27955 atoz 20  0 827308  663772 13536 S  1.7  8.1 16:25.85 dask-scheduler
27955 atoz 20  0 859652  696408 13536 S  4.0  8.5 19:18.04 dask-scheduler
27955 atoz 20  0 1087160 923912 13536 R 62.3 11.3 20:03.15 dask-scheduler
27955 atoz 20  0 1038904 875788 13536 S  3.7 10.7 23:57.07 dask-scheduler
27955 atoz 20  0 1441060 1.163g 12976 S  4.3 14.9 35:54.45 dask-scheduler
27955 atoz 20  0 1646204 1.358g 12976 S  4.3 17.4 37:05.86 dask-scheduler
27955 atoz 20  0 1597652 1.312g 12976 S  4.7 16.8 37:40.13 dask-scheduler

I guess I was just surprised that after doing a client.restart() we don't see the memory usage go back to some baseline.

----further edits----

Some more info about what we're running, since the suggestion was, if we were passing in large data structures, to send them directly to the workers: we send a dictionary as the input for each task, and when JSON-dumping the dict, most are under 1000 characters.

---- even further edits: Reproduced issue ----

We reproduced this issue again today. I killed off the scheduler and restarted it; we had about 5.4 GB of free memory. We then ran the function that I'll paste below across 69614 dictionary objects that hold some file-based information (all of our workers are mapped to the same NFS datastore, and we are using Dask as a distributed file analysis system).

Here is the function (note: squarewheels4 is a homegrown lazy file extraction and analysis package; it uses Acora and libarchive as its base for getting files out of a compressed archive and indexing them):
def get_mrc_failures(file_dict):
    from squarewheels4.platforms.ucs.b_series import ChassisTechSupport
    from squarewheels4.files.ucs.managed.chassis import CIMCTechSupportFile
    import re

    dimm_info_re = re.compile(r"(?P<slot>[^\|]+)\|(?P<size>\d+)\|.*\|(?P<pid>\S+)")

    return_dict = file_dict
    return_dict["return_code"] = "NOT_FILLED_OUT"
    filename = "{file_path}{file_sha1}/{file_name}".format(**file_dict)

    try:
        sw = ChassisTechSupport(filename)
    except Exception as e:
        return_dict["return_code"] = "SW_LOAD_ERROR"
        return_dict["error_msg"] = str(e)
        return return_dict

    server_dict = {}

    cimcs = sw.getlist("CIMC*.tar.gz")
    if not cimcs:
        return_dict["return_code"] = "NO_CIMCS"
        return_dict["keys_list"] = str(sw.getlist("*"))
        return return_dict

    for cimc in cimcs:
        if not isinstance(cimc, CIMCTechSupportFile):
            continue
        cimc_id = cimc.number
        server_dict[cimc_id] = {}

        # Get MRC file
        try:
            mrc = cimc["*MrcOut.txt"]
        except KeyError:
            server_dict[cimc_id]["response_code"] = "NO_MRC"
            continue

        # see if our end of file marker is there, should look like:
        # --- END OF FILE (Done!
        whole_mrc = mrc.read().splitlines()
        last_10 = whole_mrc[-10:]
        eof_line = [l for l in last_10 if b"END OF FILE" in l]
        server_dict[cimc_id]["response_code"] = "EOF_FOUND" if eof_line else "EOF_MISSING"
        if eof_line:
            continue

        # get DIMM types
        hit_inventory_line = False
        dimm_info = []
        dimm_error_lines = []
        equals_count = 0
        for line in whole_mrc:
            # regex each line... sigh
            if b"DIMM Inventory" in line:
                hit_inventory_line = True
            if not hit_inventory_line:
                continue
            if hit_inventory_line and b"=========" in line:
                equals_count += 1
                if equals_count > 2:
                    break
                continue
            if equals_count < 2:
                continue
            # we're in the dimm section and not out of it yet
            line = str(line)
            reg = dimm_info_re.match(line)
            if not reg:
                # bad :/
                dimm_error_lines.append(line)
                continue
            dimm_info.append(reg.groupdict())
        server_dict[cimc_id]["dimm_info"] = dimm_info
        server_dict[cimc_id]["dimm_error_lines"] = dimm_error_lines

    return_dict["return_code"] = "COMPLETED"
    return_dict["server_dict"] = server_dict
    return return_dict

The futures are generated like:

futures = client.map(function_name, file_list)

After getting into this state, my goal was to try to recover and have Dask release the memory that it had allocated. Here were my efforts:

Before cancelling futures:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+     COMMAND
21914 atoz 20  0 6257840 4.883g 2324 S  0.0 62.6 121:21.93 dask-scheduler

atoz@atoz-sched:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.1G        248M        9.9M        415M        383M
Swap:          8.0G        4.3G        3.7G

While cancelling futures:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+     COMMAND
21914 atoz 20  0 6258864 5.261g 5144 R 60.0 67.5 122:16.38 dask-scheduler

atoz@atoz-sched:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.5G        176M        9.4M        126M         83M
Swap:          8.0G        4.1G        3.9G

After cancelling futures:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+     COMMAND
21914 atoz 20  0 6243760 5.217g 4920 S  0.0 66.9 123:13.80 dask-scheduler

atoz@atoz-sched:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.5G        186M        9.4M        132M         96M
Swap:          8.0G        4.1G        3.9G

After doing a client.restart():

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+     COMMAND
21914 atoz 20  0 6177424 5.228g 4912 S  2.7 67.1 123:20.04 dask-scheduler

atoz@atoz-sched:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.5G        196M        9.4M        136M        107M
Swap:          8.0G        4.0G        4.0G

Regardless of what I ran through the distributed system, my expectation was that after cancelling the futures it would be back to at least close to normal...
and after doing a client.restart() we would definitely be near our normal baseline. Am I wrong here?

---- second repro ----

Reproduced the behavior (although not total memory exhaustion) using these steps. Here's my worker function:

def get_fault_list_v2(file_dict):
    import libarchive

    return_dict = file_dict
    filename = "{file_path}{file_sha1}/{file_name}".format(**file_dict)
    with libarchive.file_reader(filename) as arc:
        for e in arc:
            pn = e.pathname
    return return_dict

I ran that across 68617 iterations / files. Before running, we saw this much memory being utilized:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+    COMMAND
12256 atoz 20  0 1345848 1.107g 7972 S  1.7 14.2 47:15.24 dask-scheduler

atoz@atoz-sched:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        3.1G        162M         22M        4.5G        4.3G
Swap:          8.0G        3.8G        4.2G

After running, we saw this much:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+    COMMAND
12256 atoz 20  0 2461004 2.133g 8024 S  1.3 27.4 66:41.46 dask-scheduler

After doing a client.restart(), we saw:

PID   USER PR NI VIRT    RES    SHR  S %CPU %MEM TIME+    COMMAND
12256 atoz 20  0 2462756 2.134g 8144 S  6.6 27.4 66:42.61 dask-scheduler
Generally a task should take up less than a kilobyte on the scheduler. There are a few things you can trip up on that result in storing significantly more, the most common of which is including data within the task graph, as shown below. Data included directly in a task graph is stored on the scheduler. This commonly occurs when using large data directly in calls like submit:

Bad:

x = np.random.random(1000000)         # some large array
future = client.submit(np.add, 1, x)  # x gets sent along with the task

Good:

x = np.random.random(1000000)         # some large array
x = client.scatter(x)                 # scatter data explicitly to a worker, get a future back
future = client.submit(np.add, 1, x)  # only send along the future

This same principle exists when using other APIs as well.

For more information, I recommend providing an MCVE. It's quite hard to help otherwise.
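The same idea applies to client.map, which the question uses. A minimal sketch, reusing the names from the question (whether scattering pays off here depends on how large each input dict really is):

# Hypothetical adaptation: scatter the per-task inputs to the workers first,
# so the graph sent to the scheduler holds small futures instead of the dicts.
input_futures = client.scatter(file_list)
futures = client.map(get_mrc_failures, input_futures)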
Get first and last times from pcap file with Wireshark command line tools (like tshark)
I have a huge collection of PCAP files, some of which have been "touched" since they were captured. This means the system timestamp on the file may not equate to the time of the data capture. Additionally, most of the files are autosaves from Wireshark, and sometimes the host computer doesn't get the data from the tap until after the capture time, so if this occurs just after a file autosaved, the next sequential file actually has captures prior to the end time of the previous file. I have an automatic parser which uses tshark to go through these files. However, it takes about 2 minutes per file to run and I have tens of thousands of files, and I won't know that there's a timestamp issue until after it's run through the problem files. Is there an easy way to grab the first "epoch time" and the last "epoch time" from a PCAP file using tshark (or another command line tool) without having to scan the entire file?
No (not with tshark). However, Wireshark provides a program, capinfos, which reads a capture file to obtain information about it, such as start time, end time, number of packets, etc. (see the help for details). capinfos does no dissection and so will be much faster than tshark.

$ capinfos -a -e wireless_080224_first.pcap.gz
File name:           wireless_080224_first.pcap.gz
First packet time:   2008-02-24 13:10:09.637336
Last packet time:    2008-02-24 13:40:23.026171

$ capinfos -T -r -a -e wireless_080224_first.pcap.gz
wireless_080224_first.pcap.gz   2008-02-24 13:10:09.637336   2008-02-24 13:40:23.026171

Default output:

$ capinfos wireless_080224_first.pcap.gz
File name:           wireless_080224_first.pcap.gz
File type:           Wireshark/tcpdump/... - pcap (gzip compressed)
File encapsulation:  Ethernet
File timestamp precision:  microseconds (6)
Packet size limit:   file hdr: 65535 bytes
Number of packets:   15 k
File size:           12 MB
Data size:           13 MB
Capture duration:    1813.388835 seconds
First packet time:   2008-02-24 13:10:09.637336
Last packet time:    2008-02-24 13:40:23.026171
Data byte rate:      7705 bytes/s
Data bit rate:       61 kbps
Average packet size: 894.31 bytes
Average packet rate: 8 packets/s
SHA1:                222837342c170e8fb0c2673aef9c056a2ddc08ae
RIPEMD160:           ecf83704b912da3d2f69f4257fa9ee1658aac6cb
MD5:                 b82eda24d784e69ac0828a4ebffed885
Strict time order:   True
Number of interfaces in file: 1
Interface #0 info:
<snip>
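With tens of thousands of files, the table flags shown above (-T for table output, -r to suppress the header row) lend themselves to batch use. A sketch (the glob pattern is an assumption about your file layout):

$ for f in /path/to/captures/*.pcap*; do capinfos -T -r -a -e "$f"; done > capture_times.tsv

This writes one tab-separated line per file (file name, first packet time, last packet time), which can then be checked for timestamp anomalies before running the slow per-file parser.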
capinfos is the superior solution, but if you don't have access to it or want to use tshark, this is how you might go about it (note that tshark still has to read the entire file):

tshark -r "$file" -T fields -e frame.time_epoch | sort -n | head -n 1   # first epoch time
tshark -r "$file" -T fields -e frame.time_epoch | sort -n | tail -n 1   # last epoch time