I iterated over my dataset using DataLoader in PyTorch 0.2 like this:
dataloader = torch.utils.data.DataLoader(...)
data_iter = iter(dataloader)
data = data_iter.next()
but an IndexError was raised:
Traceback (most recent call last):
File "main.py", line 193, in <module>
data_target = data_target_iter.next()
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 201, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/asr4/zhuminxian/adversarial/code/dataset/data_loader.py", line 33, in __getitem__
return self.X_train[idx], self.y_train[idx]
IndexError: index 4196 is out of bounds for axis 0 with size 4135
I am wondering why the index was out of bounds. Is this a bug in PyTorch?
I tried running my code again; the same error was raised, but at a different iteration and with a different out-of-bounds index.
My guess is that your data.Dataset.__len__ was not overridden properly and in fact len(dataloader.dataset) returns a number larger than len(self.X_train).
Check your implementation of the underlying dataset in '/home/asr4/zhuminxian/adversarial/code/dataset/data_loader.py'.
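For reference, here is a minimal sketch of a dataset whose __len__ agrees with the arrays it actually indexes (MyDataset is a hypothetical name standing in for the class in your data_loader.py):
from torch.utils.data import Dataset

class MyDataset(Dataset):  # hypothetical stand-in for the class in data_loader.py
    def __init__(self, X_train, y_train):
        assert len(X_train) == len(y_train)
        self.X_train = X_train
        self.y_train = y_train

    def __len__(self):
        # Must report the number of samples actually stored; if this is larger,
        # the sampler will produce out-of-range indices like the one above.
        return len(self.X_train)

    def __getitem__(self, idx):
        return self.X_train[idx], self.y_train[idx]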
1: Problem:
I need to use a custom dataset in a TFF simulation. I have built on the tff/python/research/compression example "run_experiment.py".
The error:
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-47998fd56829>", line 1, in <module>
runfile('B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py', args=['--experiment_name=temp', '--client_batch_size=20', '--client_optimizer=sgd', '--client_learning_rate=0.2', '--server_optimizer=sgd', '--server_learning_rate=1.0', '--total_rounds=200', '--rounds_per_eval=1', '--rounds_per_checkpoint=50', '--rounds_per_profile=0', '--root_output_dir=B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/logs/fed_out/'], wdir='B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection')
File "B:\tools and software\PyCharm 2020.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "B:\tools and software\PyCharm 2020.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 292, in <module>
app.run(main)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 285, in main
train_main()
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 244, in train_main
input_spec=input_spec),
File "B:/projects/openProjects/githubprojects/BotnetTrafficAnalysisFederaedLearning/anomaly-detection/train_v04.py", line 193, in model_builder
metrics=[tf.keras.metrics.Accuracy()]
File "B:\tools and software\Anaconda\envs\bookProjects\lib\site-packages\tensorflow_federated\python\learning\keras_utils.py", line 125, in from_keras_model
if len(input_spec) != 2:
TypeError: object of type 'TensorSpec' has no len()
The key line is: TypeError: object of type 'TensorSpec' has no len()
2: What I have tried:
I have looked at the answer to TensorFlow Federated: How can I write an Input Spec for a model with more than one input,
which describes what would be needed to produce a custom input spec.
I might be misunderstanding the input spec.
If I don't need to do this and there is a better way, please tell me.
3: Source:
df = get_train_data(sysarg)
x_train, x_opt, x_test = np.split(df.sample(frac=1, random_state=17),
                                  [int(1 / 3 * len(df)), int(2 / 3 * len(df))])
x_train, x_opt, x_test = create_scalar(x_opt, x_test, x_train)
input_spec = tf.nest.map_structure(tf.TensorSpec.from_tensor, tf.convert_to_tensor(x_train))
TFF's models declare a slightly different input specification than you may be expecting; they generally expect both the x and the y values as parameters (i.e., data and labels). It is unfortunate that you're hitting that TypeError, as the ValueError TFF would otherwise raise is probably more helpful in this case. Inlining the operative parts of the message here:
The top-level structure in `input_spec` must contain exactly two elements,
as it must specify type information for both inputs to and predictions from the model.
The TL;DR for your particular example is: if you have access to the labels as well (y_train below), simply change your input_spec definition to:
input_spec = tf.nest.map_structure(
    tf.TensorSpec.from_tensor,
    [tf.convert_to_tensor(x_train), tf.convert_to_tensor(y_train)])
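For context, here is a sketch of how that two-element input_spec would then be passed to from_keras_model inside your model_builder (build_model and the MeanSquaredError loss are placeholders; substitute your own model constructor and loss):
import tensorflow as tf
import tensorflow_federated as tff

def model_builder():
    keras_model = build_model()  # placeholder for your Keras model constructor
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=input_spec,  # the two-element (x, y) spec defined above
        loss=tf.keras.losses.MeanSquaredError(),  # example loss; use your own
        metrics=[tf.keras.metrics.Accuracy()])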
I'm trying to run dask.cluster.Kmeans on a huge amount of data.
Working on the CPU is fine, since I wrap NumPy arrays with dask.array.
Working on the GPU does not seem to be possible because of functionality that is not yet implemented in CuPy.
I've tried to reproduce Matthew Rocklin's example (https://blog.dask.org/2019/01/03/dask-array-gpus-first-steps) of generating a random Dask array from the CuPy random generator, and it works, but that is not the case I want to use.
Wrapping a CuPy array with dask.array does not work:
>>> import dask.array as da
>>> import cupy as cp
>>> da.from_array(cp.arange(100000)).sum().compute()
I expect the sum of this array but get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/base.py", line 175, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/base.py", line 446, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/threaded.py", line 82, in get
**kwargs
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/local.py", line 491, in get_async
raise_exception(exc, tb)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/compatibility.py", line 130, in reraise
raise exc
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/local.py", line 233, in execute_task
result = _execute_task(task, data)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task
return func(*args2)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/dask/array/core.py", line 100, in getter
c = np.asarray(c)
File "/home/ubuntu/miniconda3/envs/cupy/lib/python3.6/site-packages/numpy/core/numeric.py", line 538, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: object __array__ method not producing an array
So how can I make this work with CuPy through a Dask array?
When creating the Dask array from a CuPy array, you need to supply da.from_array with the keyword argument asarray=False. So your code would look like the following:
>>> import dask.array as da
>>> import cupy as cp
>>> da.from_array(cp.arange(100000), asarray=False).sum().compute()
I'm getting a MemoryError when I try to drop duplicate timestamps on a large dataframe with the following code.
import dask.dataframe as dd
path = f's3://{container_name}/*'
ddf = dd.read_parquet(path, storage_options=opts, engine='fastparquet')
ddf = ddf.reset_index().drop_duplicates(subset='timestamp_utc').set_index('timestamp_utc')
...
Profiling shows that it was using up about 14GB of RAM on a dataset of 265MB of gzipped parquet files containing about 40 million rows of data.
Is there an alternative way I can drop duplicate indexes on my data without Dask using so much memory?
The traceback is below:
Traceback (most recent call last):
File "/anaconda/envs/surb/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda/envs/surb/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/chengkai/surbana_lift/src/consolidate_data.py", line 62, in <module>
consolidate_data()
File "/home/chengkai/surbana_lift/src/consolidate_data.py", line 37, in consolidate_data
ddf = ddf.reset_index().drop_duplicates(subset='timestamp_utc').set_index('timestamp_utc')
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/dataframe/core.py", line 2524, in set_index
divisions=divisions, **kwargs)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/base.py", line 407, in compute
results = get(dsk, keys, **kwargs)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/threaded.py", line 75, in get
pack_exception=pack_exception, **kwargs)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/compatibility.py", line 67, in reraise
raise exc
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
result = _execute_task(task, data)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 270, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 270, in <listcomp>
args2 = [_execute_task(a, cache) for a in args]
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 267, in _execute_task
return [_execute_task(a, cache) for a in arg]
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 267, in <listcomp>
return [_execute_task(a, cache) for a in arg]
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
return func(*args2)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/dataframe/core.py", line 69, in _concat
return args[0] if not args2 else methods.concat(args2, uniform=True)
File "/anaconda/envs/surb/lib/python3.6/site-packages/dask/dataframe/methods.py", line 329, in concat
out = pd.concat(dfs3, join=join)
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 226, in concat
return op.get_result()
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 423, in get_result
copy=self.copy)
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/internals.py", line 5418, in concatenate_block_manage
rs
[ju.block for ju in join_units], placement=placement)
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/internals.py", line 2984, in concat_same_type
axis=self.ndim - 1)
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/dtypes/concat.py", line 461, in _concat_datetime
return _concat_datetimetz(to_concat)
File "/anaconda/envs/surb/lib/python3.6/site-packages/pandas/core/dtypes/concat.py", line 506, in _concat_datetimetz
new_values = np.concatenate([x.asi8 for x in to_concat])
MemoryError
It is not too surprising that the data becomes very big in memory. Parquet is a pretty space-efficient format, especially with gzip compression, and strings all become Python objects (which are expensive in memory).
In addition, you have a number of worker threads operating on parts of the overall dataframe. That involves data copying, intermediates, and concatenation of results, the last of which is pretty inefficient in pandas.
One suggestion: instead of reset_index, you can remove one step by specifying index=False to read_parquet.
Next suggestion: limit the number of threads you use to a smaller number than the default, which is probably your number of CPU cores. The easiest way to do that is to use the distributed client in-process:
from dask.distributed import Client
c = Client(processes=False, threads_per_worker=4)
It may be better to set the index first, and then do the drop_duplicates with map_partitions to minimise cross-partition communication.
df.map_partitions(lambda d: d.drop_duplicates(subset='timestamp_utc'))
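Putting those suggestions together, a rough sketch might look like the following (path and opts as in the question; note that after set_index the timestamps are the index, so duplicates are dropped via index.duplicated rather than drop_duplicates(subset=...)):
import dask.dataframe as dd
from dask.distributed import Client

# in-process client with a modest number of threads
client = Client(processes=False, threads_per_worker=4)

# index=False avoids materialising an index that would immediately be reset
ddf = dd.read_parquet(path, storage_options=opts, engine='fastparquet', index=False)

# set the index first so equal timestamps end up in the same partition,
# then drop duplicates partition by partition
ddf = ddf.set_index('timestamp_utc')
ddf = ddf.map_partitions(lambda d: d[~d.index.duplicated(keep='first')])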
I've just started learning machine learning algorithms. I would like to train a VGG-16 network on my own dataset. I am using tflearn.DNN to build the VGG net.
I want to save the output (which is a tensor) of the fully connected layer that extracts 4096 features to a file. I would like to know how to save these features.
When I ran the following lines:
feed_dict = feed_dict_builder(X, Y, model.inputs, model.targets)
output = model.predictor.evaluate(feed_dict, convnet1)
print(output)
output.save('features.npy')
I got the following exception and error:
Exception in thread Thread-48:
Traceback (most recent call last):
File "/home/anupama/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/anupama/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/anupama/anaconda3/lib/python3.6/site-packages/tflearn/data_flow.py", line 187, in fill_feed_dict_queue
data = self.retrieve_data(batch_ids)
File "/home/anupama/anaconda3/lib/python3.6/site-packages/tflearn/data_flow.py", line 222, in retrieve_data
utils.slice_array(self.feed_dict[key], batch_ids)
File "/home/anupama/anaconda3/lib/python3.6/site-packages/tflearn/utils.py", line 180, in slice_array
return [x[start] for x in X]
File "/home/anupama/anaconda3/lib/python3.6/site-packages/tflearn/utils.py", line 180, in <listcomp>
return [x[start] for x in X]
IndexError: index 2 is out of bounds for axis 1 with size 2
[0.0]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-f2d62c020964> in <module>()
4 output = model.predictor.evaluate(feed_dict, convnet1)
5 print(output)
----> 6 output.save('/home/anupama/Internship/feats')
AttributeError: 'list' object has no attribute 'save'
You should save the FC layer of the network as a separate tensor and use DNN.predictor to evaluate it. Sample code:
import numpy as np
import tflearn
from tflearn.utils import feed_dict_builder
# VGG model definition
...
previous_layer = ...
fc_layer1 = tflearn.fully_connected(previous_layer, 4096, activation='relu', name='fc1')
fc_layer2 = tflearn.fully_connected(fc_layer1, 4096, activation='relu', name='fc2')
network = ...
# Training
model = tflearn.DNN(network)
model.fit(x, y)
# Evaluation
feed_dict = feed_dict_builder(x, y, model.inputs, model.targets)
output = model.predictor.evaluate(feed_dict, [fc_layer2])
np.save('features.npy', output)
When running a session in TensorFlow I get the following error:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 190, in minimize
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 241, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py", line 481, in gradients
in_grads = _AsList(grad_fn(op, *out_grads))
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 162, in _DiagGrad
return array_ops.diag_part(grad)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 380, in diag_part
return _op_def_lib.apply_op("DiagPart", input=input, name=name)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
set_shapes_for_outputs(ret)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
shapes = shape_func(op)
File "/local0/software/python/python_bleeding_edge/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 982, in _DiagPartShape
" do not match ")
ValueError: Invalid shape, shape[:mid] (?,) and shape[mid:] (?,) do not match
I am not sure where it comes from, since there is no error during model construction. I've also tried different optimisers, e.g. GradientDescentOptimizer, but the error persists.
Actually, the error is fairly self-explanatory, as stated in the message:
_DiagPartShape Invalid shape, shape[:mid] (?,) and shape[mid:] (?,) do not match
You provided the wrong dimensions: DiagPart expects the first half of the input's shape to match the second half, and here the two halves cannot be matched.
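For reference, here is a small sketch of the shape rule behind that check, using the tf.diag / tf.diag_part ops from the traceback (TF 1.x-style API assumed; the exact behaviour with unknown dimensions may vary between versions):
import tensorflow as tf

v = tf.constant([1., 2., 3.])  # shape (3,)
m = tf.diag(v)                 # shape (3, 3); the gradient op of Diag is DiagPart
d = tf.diag_part(m)            # fine: shape[:mid] == (3,) matches shape[mid:] == (3,)

# If the tensor involved has an unknown static shape such as (?, ?), the DiagPart
# shape check cannot verify that the two halves match, which is what produces the
# "shape[:mid] (?,) and shape[mid:] (?,) do not match" ValueError during gradient
# construction. Check the static shapes of whatever you pass to tf.diag in your cost.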