How to think about weights in Myrrix

I have the following input for Myrrix:
11, 101, 1
11, 102, 1
11, 103, 1
11, 104, 1000
11, 105, 1000
11, 106, 1000
12, 101, 1
12, 102, 1
12, 103, 1
12, 222, 1
13, 104, 1000
13, 105, 1000
13, 106, 1000
13, 333, 1000
I am looking for items to recommend to user 11. The expectation is that item 333 will be recommended first (because of the higher weights for user 13 and items 104, 105, 106).
Here are the recommendation results from Myrrix:
11, 222, 0.04709
11, 333, 0.0334058
Notice that item 222 is recommended with strength 0.047, but item 333 is only given a strength of 0.033 --- the opposite of the expected results.
I also would have expected the difference in strength to be larger (since 1000 and 1 are so different), but obviously that's moot when the order isn't even what I expected.
How can I interpret these results and how should I think about the weight parameter? We are working with a large client under a tight deadline and would appreciate any pointers.

It's hard to judge based on a small, synthetic data set. I think the biggest factor here will be the parameters: what is the number of features? What is lambda? I would expect features = 2 here. If it's higher, I think you quickly over-fit this data, and the results are mostly the noise left over after the model perfectly explains that user 11 doesn't interact with 222 and 333.
The values are quite low, suggesting both of these are not likely results, so their order may be more noise than anything. Do you see different results if the model is rebuilt from another random starting point?
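To make that concrete, here is a minimal sketch (plain numpy, with a rank-2 truncated SVD standing in for a 2-feature factorization; this is an illustration, not Myrrix itself) that factorizes the weighted input matrix and compares the reconstructed scores for user 11:

import numpy as np

users = [11, 12, 13]
items = [101, 102, 103, 104, 105, 106, 222, 333]
ratings = [(11, 101, 1), (11, 102, 1), (11, 103, 1),
           (11, 104, 1000), (11, 105, 1000), (11, 106, 1000),
           (12, 101, 1), (12, 102, 1), (12, 103, 1), (12, 222, 1),
           (13, 104, 1000), (13, 105, 1000), (13, 106, 1000), (13, 333, 1000)]

# Build the dense user-item weight matrix from the input triples.
R = np.zeros((len(users), len(items)))
for u, i, w in ratings:
    R[users.index(u), items.index(i)] = w

# Truncated SVD at rank 2 stands in for a 2-feature factorization.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

row = users.index(11)
print("user 11 / item 222:", R2[row, items.index(222)])
print("user 11 / item 333:", R2[row, items.index(333)])

At rank 2 the reconstruction scores 333 far above 222 for user 11, matching the expectation in the question; with more features the model can fit the observed cells exactly, and the leftover scores for unseen items degenerate into noise.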


How can I convert a GUID into a byte array in Ruby?

In order to save data traffic we want to send our GUIDs as arrays of bytes instead of as strings (using Google Protocol Buffers).
How can I convert a string representation of a GUID in Ruby to an array of bytes?
Example:
Guid: 35918bc9-196d-40ea-9779-889d79b753f0
=> Result: C9 8B 91 35 6D 19 EA 40 97 79 88 9D 79 B7 53 F0
In .NET this seems to be natively implemented:
http://msdn.microsoft.com/en-us/library/system.guid.tobytearray%28v=vs.110%29.aspx
Your example GUID is in a Microsoft specific format. From Wikipedia:
Other systems, notably Microsoft's marshalling of UUIDs in their COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian.
So in order to get that result, we have to move the bytes around a little. Specifically, we have to change the endianness of the first three components. Let's start by breaking the GUID string apart:
guid = '35918bc9-196d-40ea-9779-889d79b753f0'
parts = guid.split('-')
#=> ["35918bc9", "196d", "40ea", "9779", "889d79b753f0"]
We can convert these hex-strings to binary via:
mixed_endian = parts.pack('H* H* H* H* H*')
#=> "5\x91\x8B\xC9\x19m#\xEA\x97y\x88\x9Dy\xB7S\xF0"
Next let's swap the first three parts:
big_endian = mixed_endian.unpack('L< S< S< A*').pack('L> S> S> A*')
#=> "\xC9\x8B\x915m\x19\xEA#\x97y\x88\x9Dy\xB7S\xF0"
L denotes a 32-bit unsigned integer (1st component)
S denotes a 16-bit unsigned integer (2nd and 3rd component)
< and > denote little-endian and big-endian, respectively
A* treats the remaining bytes as an arbitrary binary string (we don't have to convert these)
If you prefer an array of bytes instead of a binary string, you'd just use:
big_endian.bytes
#=> [201, 139, 145, 53, 109, 25, 234, 64, 151, 121, 136, 157, 121, 183, 83, 240]
PS: if your actual GUID isn't Microsoft specific, you can skip the swapping part.
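As a cross-check (an aside, not part of the Ruby solution): Python's standard uuid module exposes this Microsoft mixed-endian layout directly via UUID.bytes_le, so you can verify the expected bytes:

import uuid

guid = uuid.UUID('35918bc9-196d-40ea-9779-889d79b753f0')
# bytes_le puts the first three fields in little-endian order,
# which is exactly the Microsoft layout shown above.
print(list(bytearray(guid.bytes_le)))
# [201, 139, 145, 53, 109, 25, 234, 64, 151, 121, 136, 157, 121, 183, 83, 240]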

Change PTILE algorithm in SPSS

I'm using SPSS 22, and the documentation lists several types of percentile calculations:
HPTILE
WPTILE
RPTILE
EPTILE
APTILE
From what I gather, the default is APTILE. I would like to change it to HPTILE. The thing is, the documentation doesn't really say where to change it in SPSS syntax.
So in the syntax, I have:
CTABLES
/TABLES
...
[VALIDN F40.0,ptile 5, ptile 10, ptile 15, ptile 20 PTILE 25, ptile 30, ptile 35
ptile 40, ptile 45, MEDIAN, MEAN, ptile 55, ptile 60, PTILE 65, ptile 70, PTILE 75, ptile 80, ptile 85, PTILE 90, PTILE 95]
I was hoping it would be as simple as changing PTILE to HPTILE, but that results in an error: TABLE: Text hptile. An invalid subcommand, keyword, or option was specified.
How can I change the percentile algorithm used?
Well, I managed to get help in the IBM forums. It turns out none of SPSS's percentile algorithms match the algorithm used by Excel (PERCENTILE.INC) and by most databases that have native percentile functions (percentile_cont).
At any rate, this was what worked:
EXAMINE VARIABLES = Item Sales BY Item
  /PERCENTILES(10, 25, 50, 65, 75, 90) = HAVERAGE
Here the keyword after the equals sign selects the algorithm: HAVERAGE, WAVERAGE, ROUND, EMPIRICAL, or AEMPIRICAL, matching the H/W/R/E/A prefixes of the PTILE variants listed above.
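As an aside, the remark above about mismatched definitions is easy to demonstrate outside SPSS. Here is a small sketch using numpy (version 1.22+ exposes several named percentile estimation methods; 'linear' is the PERCENTILE.INC / percentile_cont definition):

import numpy as np

data = [15, 20, 35, 40, 50]
# Different estimation methods can disagree even on five values.
for method in ['linear', 'hazen', 'weibull', 'inverted_cdf', 'averaged_inverted_cdf']:
    print(method, np.percentile(data, 25, method=method))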

Keras uses way too much GPU memory when calling train_on_batch, fit, etc

I've been messing with Keras, and like it so far. There's one big issue I have been having, when working with fairly deep networks: When calling model.train_on_batch, or model.fit etc., Keras allocates significantly more GPU memory than what the model itself should need. This is not caused by trying to train on some really large images, it's the network model itself that seems to require a lot of GPU memory. I have created this toy example to show what I mean. Here's essentially what's going on:
I first create a fairly deep network, and use model.summary() to get the total number of parameters needed for the network (in this case 206538153, which corresponds to about 826 MB). I then use nvidia-smi to see how much GPU memory Keras has allocated, and I can see that it makes perfect sense (849 MB).
I then compile the network, and can confirm that this does not increase GPU memory usage. And as we can see in this case, I have almost 1 GB of VRAM available at this point.
Then I try to feed a simple 16x16 image and a 1x1 ground truth to the network, and then everything blows up, because Keras starts allocating lots of memory again, for no reason that is obvious to me. Something about training the network seems to require a lot more memory than just having the model, which doesn't make sense to me. I have trained significantly deeper networks on this GPU in other frameworks, so that makes me think that I'm using Keras wrong (or there's something wrong in my setup, or in Keras, but of course that's hard to know for sure).
Here's the code:
from scipy import misc
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution2D, MaxPooling2D, Reshape, Flatten, ZeroPadding2D, Dropout
import os
model = Sequential()
model.add(Convolution2D(256, 3, 3, border_mode='same', input_shape=(16,16,1)))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Convolution2D(512, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(Convolution2D(1024, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
model.add(Convolution2D(256, 3, 3, border_mode='same'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(4))
model.add(Dense(1))
model.summary()
os.system("nvidia-smi")
raw_input("Press Enter to continue...")
model.compile(optimizer='sgd',
              loss='mse',
              metrics=['accuracy'])
os.system("nvidia-smi")
raw_input("Compiled model. Press Enter to continue...")
n_batches = 1
batch_size = 1
for ibatch in range(n_batches):
    x = np.random.rand(batch_size, 16, 16, 1)
    y = np.random.rand(batch_size, 1)
    os.system("nvidia-smi")
    raw_input("About to train one iteration. Press Enter to continue...")
    model.train_on_batch(x, y)
    print("Trained one iteration")
Which gives the following output for me:
Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN 5103)
/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
warnings.warn(warn)
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
convolution2d_1 (Convolution2D) (None, 16, 16, 256) 2560 convolution2d_input_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D) (None, 8, 8, 256) 0 convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 8, 8, 512) 1180160 maxpooling2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D) (None, 4, 4, 512) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 4, 4, 1024) 4719616 maxpooling2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_4[0][0]
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_5[0][0]
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_6[0][0]
____________________________________________________________________________________________________
convolution2d_8 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_7[0][0]
____________________________________________________________________________________________________
convolution2d_9 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_8[0][0]
____________________________________________________________________________________________________
convolution2d_10 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_9[0][0]
____________________________________________________________________________________________________
convolution2d_11 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_10[0][0]
____________________________________________________________________________________________________
convolution2d_12 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_11[0][0]
____________________________________________________________________________________________________
convolution2d_13 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_12[0][0]
____________________________________________________________________________________________________
convolution2d_14 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_13[0][0]
____________________________________________________________________________________________________
convolution2d_15 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_14[0][0]
____________________________________________________________________________________________________
convolution2d_16 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_15[0][0]
____________________________________________________________________________________________________
convolution2d_17 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_16[0][0]
____________________________________________________________________________________________________
convolution2d_18 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_17[0][0]
____________________________________________________________________________________________________
convolution2d_19 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_18[0][0]
____________________________________________________________________________________________________
convolution2d_20 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_19[0][0]
____________________________________________________________________________________________________
convolution2d_21 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_20[0][0]
____________________________________________________________________________________________________
convolution2d_22 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_21[0][0]
____________________________________________________________________________________________________
convolution2d_23 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_22[0][0]
____________________________________________________________________________________________________
convolution2d_24 (Convolution2D) (None, 4, 4, 1024) 9438208 convolution2d_23[0][0]
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D) (None, 2, 2, 1024) 0 convolution2d_24[0][0]
____________________________________________________________________________________________________
convolution2d_25 (Convolution2D) (None, 2, 2, 256) 2359552 maxpooling2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_26 (Convolution2D) (None, 2, 2, 32) 73760 convolution2d_25[0][0]
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D) (None, 1, 1, 32) 0 convolution2d_26[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 32) 0 maxpooling2d_4[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 4) 132 flatten_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 5 dense_1[0][0]
====================================================================================================
Total params: 206538153
____________________________________________________________________________________________________
None
Thu Oct 6 09:05:42 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960 Off | 0000:01:00.0 On | N/A |
| 30% 37C P2 28W / 120W | 1082MiB / 2044MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1796 G /usr/bin/X 155MiB |
| 0 2597 G compiz 65MiB |
| 0 5966 C python 849MiB |
+-----------------------------------------------------------------------------+
Press Enter to continue...
Thu Oct 6 09:05:44 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960 Off | 0000:01:00.0 On | N/A |
| 30% 38C P2 28W / 120W | 1082MiB / 2044MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1796 G /usr/bin/X 155MiB |
| 0 2597 G compiz 65MiB |
| 0 5966 C python 849MiB |
+-----------------------------------------------------------------------------+
Compiled model. Press Enter to continue...
Thu Oct 6 09:05:44 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960 Off | 0000:01:00.0 On | N/A |
| 30% 38C P2 28W / 120W | 1082MiB / 2044MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1796 G /usr/bin/X 155MiB |
| 0 2597 G compiz 65MiB |
| 0 5966 C python 849MiB |
+-----------------------------------------------------------------------------+
About to train one iteration. Press Enter to continue...
Error allocating 37748736 bytes of device memory (out of memory). Driver report 34205696 bytes free and 2144010240 bytes total
Traceback (most recent call last):
File "memtest.py", line 65, in <module>
model.train_on_batch(x, y)
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 712, in train_on_batch
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1221, in train_on_batch
outputs = self.train_function(ins)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 717, in __call__
return self.function(*inputs)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
MemoryError: Error allocating 37748736 bytes of device memory (out of memory).
Apply node that caused the error: GpuContiguous(GpuDimShuffle{3,2,0,1}.0)
Toposort index: 338
Inputs types: [CudaNdarrayType(float32, 4D)]
Inputs shapes: [(1024, 1024, 3, 3)]
Inputs strides: [(1, 1024, 3145728, 1048576)]
Inputs values: ['not shown']
Outputs clients: [[GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), GpuDnnConvGradI{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
A few things to note:
I have tried both Theano and TensorFlow backends. Both have the same problems, and run out of memory at the same line. In TensorFlow, it seems that Keras preallocates a lot of memory (about 1.5 GB) so nvidia-smi doesn't help us track what's going on there, but I get the same out-of-memory exceptions. Again, this points towards an error in (my usage of) Keras (although it's hard to be certain about such things, it could be something with my setup).
I tried using CNMEM in Theano, which behaves like TensorFlow: It preallocates a large amount of memory (about 1.5 GB) yet crashes in the same place.
There are some warnings about the cuDNN version. I tried running the Theano backend with CUDA but without cuDNN and got the same errors, so that is not the source of the problem.
If you want to test this on your own GPU, you might want to make the network deeper/shallower depending on how much GPU memory you have to test this.
My configuration is as follows: Ubuntu 14.04, GeForce GTX 960, CUDA 7.5.18, CudNN 5.1.3, Python 2.7, Keras 1.1.0 (installed via pip)
I've tried changing the compilation of the model to use different optimizers and losses, but that doesn't seem to change anything.
I've tried changing the train_on_batch function to use fit instead, but it has the same problem.
I saw one similar question here on StackOverflow - Why does this Keras model require over 6GB of memory? - but as far as I can tell, I don't have those issues in my configuration. I've never had multiple versions of CUDA installed, and I've double checked my PATH, LD_LIBRARY_PATH and CUDA_ROOT variables more times than I can count.
Julius suggested that the activation parameters themselves take up GPU memory. If this is true, can somebody explain it a bit more clearly? I have tried changing the activation function of my convolution layers to functions that are clearly hard-coded with no learnable parameters as far as I can tell, and that doesn't change anything. Also, it seems unlikely that these parameters would take up almost as much memory as the rest of the network itself.
After thorough testing, the largest network I can train is about 453 MB of parameters, out of my ~2 GB of GPU RAM. Is this normal?
After testing Keras on some smaller CNNs that do fit in my GPU, I can see that there are very sudden spikes in GPU RAM usage. If I run a network with about 100 MB of parameters, 99% of the time during training it'll be using less than 200 MB of GPU RAM. But every once in a while, memory usage spikes to about 1.3 GB. It seems safe to assume that it's these spikes that are causing my problems. I've never seen these spikes in other frameworks, but they might be there for a good reason? If anybody knows what causes them, and if there's a way to avoid them, please chime in!
It is a very common mistake to forget that the activations, gradients, and optimizer tracking variables also take VRAM, not just the parameters, increasing memory usage quite a bit. The backprop calculations themselves mean that the training phase takes almost double the VRAM of forward/inference use of the neural net, and the Adam optimizer triples the space usage.
So, in the beginning, when the network is created, only the parameters are allocated. However, when training starts, the activations, the backprop computations, and the optimizer's tracking variables get allocated, increasing memory use by a large factor.
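As a back-of-the-envelope sketch of that factor (my own illustration, assuming float32 parameters and the count reported by model.summary() in the question; activations come on top of this and scale with batch size):

params = 206538153                            # total params from model.summary() above
bytes_per_float = 4                           # float32
MB = 2 ** 20

weights = params * bytes_per_float
gradients = params * bytes_per_float          # backprop: one gradient per parameter
adam_moments = 2 * params * bytes_per_float   # Adam tracks two moments per parameter
                                              # (plain SGD without momentum adds none)

print("weights:      %5d MB" % (weights // MB))
print("+ gradients:  %5d MB" % (gradients // MB))
print("+ Adam state: %5d MB" % (adam_moments // MB))
print("total:        %5d MB, before activations" % ((weights + gradients + adam_moments) // MB))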
To allow the training of larger models, people:
use model parallelism to spread the weights and computations over different accelerators
use gradient checkpointing, which allows a tradeoff between more computation vs lower memory use during back-propagation.
potentially use a memory-efficient optimizer that reduces the number of tracking variables, such as Adafactor, for which you will find implementations in all popular deep learning frameworks.
Tools to train very large models:
Mesh-Tensorflow https://arxiv.org/abs/1811.02084
https://github.com/tensorflow/mesh
Microsoft DeepSpeed:
https://github.com/microsoft/DeepSpeed https://www.deepspeed.ai/
Facebook FairScale: https://github.com/facebookresearch/fairscale
Megatron-LM: https://arxiv.org/abs/1909.08053
https://github.com/NVIDIA/Megatron-LM
Article on integration in HuggingFace Transformers: https://huggingface.co/blog/zero-deepspeed-fairscale
Both Theano and TensorFlow augment the symbolic graph that is created, though each does it differently.
To analyze how the memory consumption happens, you can start with a smaller model and grow it to see the corresponding growth in memory. Similarly, you can grow the batch_size to see the corresponding growth in memory.
Here is a code snippet for increasing batch_size based on your initial code:
from scipy import misc
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution2D, MaxPooling2D, Reshape, Flatten, ZeroPadding2D, Dropout
import os
import matplotlib.pyplot as plt

def gpu_memory():
    out = os.popen("nvidia-smi").read()
    ret = '0MiB'
    for item in out.split("\n"):
        if str(os.getpid()) in item and 'python' in item:
            ret = item.strip().split(' ')[-2]
    return float(ret[:-3])

gpu_mem = []
gpu_mem.append(gpu_memory())

model = Sequential()
model.add(Convolution2D(100, 3, 3, border_mode='same', input_shape=(16,16,1)))
model.add(Convolution2D(256, 3, 3, border_mode='same'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(4))
model.add(Dense(1))
model.summary()
gpu_mem.append(gpu_memory())

model.compile(optimizer='sgd',
              loss='mse',
              metrics=['accuracy'])
gpu_mem.append(gpu_memory())

batches = []
n_batches = 20
batch_size = 1
for ibatch in range(n_batches):
    batch_size = (ibatch + 1) * 10
    batches.append(batch_size)
    x = np.random.rand(batch_size, 16, 16, 1)
    y = np.random.rand(batch_size, 1)
    print(y.shape)
    model.train_on_batch(x, y)
    print("Trained one iteration")
    gpu_mem.append(gpu_memory())

fig = plt.figure()
plt.plot([-100, -50, 0] + batches, gpu_mem)
plt.show()
Also, for speed, TensorFlow hogs all the available GPU memory. To stop that, you need to add config.gpu_options.allow_growth = True in get_session():
# keras/backend/tensorflow_backend.py
def get_session():
    global _SESSION
    if tf.get_default_session() is not None:
        session = tf.get_default_session()
    else:
        if _SESSION is None:
            if not os.environ.get('OMP_NUM_THREADS'):
                config = tf.ConfigProto(allow_soft_placement=True)
            else:
                nb_thread = int(os.environ.get('OMP_NUM_THREADS'))
                config = tf.ConfigProto(intra_op_parallelism_threads=nb_thread,
                                        allow_soft_placement=True)
            config.gpu_options.allow_growth = True  # the added line
            _SESSION = tf.Session(config=config)
        session = _SESSION
    if not _MANUAL_VAR_INIT:
        _initialize_variables()
    return session
Now if you run the previous snippet, you get a memory-usage plot for each backend (Theano and TensorFlow; plots not reproduced here).
Theano: Whatever memory is needed after model.compile(), it almost doubles at the start of training. This is because Theano augments the symbolic graph to do back-propagation, and each tensor needs a corresponding tensor to achieve the backward flow of gradients. The memory needs don't seem to grow with batch_size, which is unexpected to me, as the placeholder size should increase to accommodate the data inflow from CPU to GPU.
TensorFlow: No GPU memory is allocated even after model.compile(), because Keras doesn't call get_session() until then, which is what actually calls _initialize_variables(). TensorFlow seems to hog memory in chunks for speed, so memory doesn't grow linearly with batch_size.
Having said all that, TensorFlow seems to be memory hungry, but for big graphs it's very fast. Theano, on the other hand, is very GPU-memory efficient but takes a very long time to initialize the graph at the start of training. After that it's also pretty fast.
200M parameters for a 2 GB GPU is far too much. Also, your architecture is not efficient; using local bottlenecks would be more efficient.
You should also go from a small model to a big one, not backwards. Right now you have a 16x16 input; with this architecture, that means that at the end most of your network will be "zero padded" and not based on input features.
Your model's layers depend on your input, so you can't just set an arbitrary number of layers and sizes; you need to count how much data will be passed to each of them, and understand why you are doing so.
I would recommend you watch this free course: http://cs231n.github.io

How to create a "on/off" graphs with HighCharts?

I've read the documentation quite a few times, but I just can't seem to find a way to make a graph like this. Perhaps it's because I don't know what it's called, so I'm not even sure what to look for. Let me try to explain what I'm trying to do.
Normally if you have a series of points like this:
3 May, 5:00 PM ---> 0
3 May, 5:20 PM ---> 3
4 May, 5:00 PM ---> 0
4 May, 5:20 PM ---> 3
If you make a standard LINE GRAPH, Highcharts will plot the values as an INCREASE between the two. So I end up with this:
But the problem is, the values being shown are actually values changing at a point in time. In other words, what I want is this:
And even more importantly, it seems the spacing between time isn't correct. You'll notice that it creates a perfect zigzag, even though the times between the first and second point is 20 minutes (5PM to 5:20 PM), and the second point and 3rd point is 23 hours and 40 minutes (3 May 5:20 PM and 4 May 5PM). So what I really want is this:
Any idea what a graph like this is called?
Any idea how to make it using HighCharts?
UPDATE
The only solution I can think of right now is to fake points between the real points. So for example, if the value is 0 at 5 PM and turns to 3 at 5:20 PM, then I will add 19 points in between these two: at 5:01 I will make it 0, at 5:02 I will also make it 0, and so on until 5:19. But even this method will result in a SLIGHTLY skewed line going up from 5:19 to 5:20, which is what I'm actually trying to avoid.
Any ideas?
UPDATE 2
The "step : left" solution has definitely solved half of my problem, but for some reason I still have this:
You should now see that even though I have steps, they are not quite making the expected spacing. For 17:13 on 5 May, I expect the graph to be closer to the 6 May mark, than to the 5 May mark.
Any ideas as to why this is happening?
UPDATE 3
I created a jsFiddle for my problem: https://jsfiddle.net/coderama/ubz7m0Lh/4/
UPDATE 4
Based on wergeld's input, it seems using "ordinal" on the x axis is the way to go --> http://api.highcharts.com/highstock#xAxis.ordinal
But it produces a pretty weird graph: https://jsfiddle.net/coderama/6tz8h53x/1/
I'll keep looking, but at least it feels like there's progress being made!
What you are looking for is the step option. You can set up something like:
$(function() {
  $('#container').highcharts({
    title: {
      text: 'Step line types, with null values in the series'
    },
    xAxis: {
      type: 'datetime',
      tickInterval: 86400000
    },
    series: [{
      data: [
        [Date.UTC(2016, 04, 3, 17, 00), 0],
        [Date.UTC(2016, 04, 3, 20, 00), 3],
        [Date.UTC(2016, 04, 4, 17, 00), 0],
        [Date.UTC(2016, 04, 5, 18, 00), 3],
        [Date.UTC(2016, 04, 5, 19, 00), 0],
        [Date.UTC(2016, 04, 6, 20, 00), 3],
        [Date.UTC(2016, 04, 7, 17, 00), 0]
      ],
      step: 'left'
    }]
  });
});
The step parameter tells highcharts how to go from your given point to the next point.
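For comparison outside HighCharts (an illustration only, using the question's first four points): matplotlib's drawstyle='steps-post' expresses the same "hold the value until the next point" behaviour as step: 'left', and a true datetime axis keeps the 20-minute and 23h40m gaps proportional:

import matplotlib.pyplot as plt
from datetime import datetime

times = [datetime(2016, 5, 3, 17, 0), datetime(2016, 5, 3, 17, 20),
         datetime(2016, 5, 4, 17, 0), datetime(2016, 5, 4, 17, 20)]
values = [0, 3, 0, 3]

# Each value is held flat until the next timestamp; the datetime axis
# spaces the points by actual elapsed time rather than evenly.
plt.plot(times, values, drawstyle='steps-post')
plt.show()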

Alignment Rules

I'm having a bit of trouble with a homework problem, and I was wondering if anyone could point me in the right direction.
Suppose we are compiling for a machine
with 1-byte characters, 2-byte
shorts, 4-byte integers, and 8-byte
reals, and with alignment rules that
require the address of every primitive
data element to be an even multiple of
the element’s size. Suppose further
that the compiler is not permitted to
reorder fields. How much space will be
consumed by the following array?
A : array [0..9] of record
s : short;
c : char;
t : short;
d : char;
r : real;
i : integer;
end;
Now I understand the problem for the most part, but the thing that is really throwing me for a loop is the "alignment rules that require the address of every primitive data element to be an even multiple of the element's size". My book isn't very descriptive when it comes to alignment rules, and to be completely honest, I'm not even positive what an even multiple is. Any help would be appreciated.
Also, I believe the answer is 240-bytes, I just need some help getting there.
Let's break that down:
"alignment rules" that "require the address of every primitive data element to be an even multiple of the element’s size". It's not very interesting that we're talking about alignment rules; we knew that already.
"require the address" of "every primitive data element" to be "an even multiple of the element’s size". Now we're getting somewhere. We have a requirement and a scope:
Requirement: The address is an even multiple of the element's size.
Scope: Every primitive data element.
So, every time we position an element, we must impose the requirement.
Let us try to position an element in memory. The first thing we will position is the short labelled s. Since a short takes up 2 bytes of memory, and we must make its address a multiple of that size, the address must be a multiple of 2. Let's call that address N.
So, s takes up the space from N up to N + 2. (NOTE: For all of these ranges, the first endpoint is included, but the last endpoint is not. This is the normal way to describe ranges of integers in computer science; in most cases it is by far the most useful and least error-prone way to do it. Trust me.)
We continue with each other field.
c takes up one byte, from N + 2 to N + 3.
We are at N + 3, but we cannot start t there, because N + 3 is odd (since N is even). So we must skip a byte. Thus t ranges from N + 4 to N + 6.
Continuing with this sort of logic, we end up with d from N + 6 to N + 7; r from N + 8 to N + 16; i from N + 16 to N + 20. (NOTE that this only works if we restrict N to be a multiple of 8, or r will be unaligned. This is ok; when we allocate the memory for the array, we can align the start of it however we want - we just have to be consistent about the sequence of data after that point.)
So we need at least 20 bytes for this structure. (That's one of the advantages of the half-open ranges: the difference between the endpoints equals the size. If we included or excluded both endpoints from the range, we'd have to make a +1 or -1 correction.)
Now let's say we try to lay out the array as ten consecutive chunks of 20 bytes. Will this work? No; say that element 0 is at address 256 (a multiple of 8). Now r in element 1 will be unaligned, because it will start at 256 + 20 + 8, which is not divisible by 8. That's not allowed.
So what do we do now? We can't just insert an extra 4 bytes before r in element 1, because every element of the array must have the same layout (not to mention size). But there is a simple solution: we insert 4 bytes of additional padding at the end of each element. Now, as long as the array starts at some multiple of 8, every element will also start at a multiple of 8 (which, in turn, keeps r aligned), because the size is now a multiple of 8.
We conclude that we need 24 bytes for the structure, and thus 24 * 10 = 240 bytes for the array.
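To double-check the arithmetic, here is a small sketch (my own illustration, not part of the homework) that places each field at the next offset that is a multiple of its size and then pads the record to the largest alignment:

fields = [('s', 2), ('c', 1), ('t', 2), ('d', 1), ('r', 8), ('i', 4)]

offset = 0
for name, size in fields:
    offset = (offset + size - 1) // size * size  # round up to a multiple of the field's size
    print("%s at offset %2d" % (name, offset))
    offset += size

record = (offset + 7) // 8 * 8  # pad to a multiple of 8 so r stays aligned in every element
print("record size: %d, array of 10: %d bytes" % (record, record * 10))

This prints offsets 0, 2, 4, 6, 8, and 16, a record size of 24, and 240 bytes for the array, matching the reasoning above.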
The phrase "an even multiple of the element’s size" may indicate that a 2-byte short must be aligned on a 4-byte boundary, for example.
That seems a bit wasteful to me but, since it's homework, it's certainly possible.
Using those rules, you would have (for an array of size 2):
Offset Variable Size Range
------ -------- ---- -----
0      s        2    0-1
2      c        1    2-2
4      t        2    4-5
6      d        1    6-6
16     r        8    16-23
24     i        4    24-27
28     *        4    28-31
32     s        2    32-33
34     c        1    34-34
36     t        2    36-37
38     d        1    38-38
48     r        8    48-55
56     i        4    56-59
60     *        4    60-63
The reason you have the padding is to bring each array element up to a multiple of 16 so that the r variable in each can be aligned to 16 bytes.
So the ten array elements would take up 320 bytes in that case.
It may also mean "even" as in "integral" rather than "multiple of two" (far more likely since it matches reality).
That would make the array:
Offset Variable Size Range
------ -------- ---- -----
0      s        2    0-1
2      c        1    2-2
4      t        2    4-5
6      d        1    6-6
8      r        8    8-15
16     i        4    16-19
20     *        4    20-23
24     s        2    24-25
26     c        1    26-26
28     t        2    28-29
30     d        1    30-30
32     r        8    32-39
40     i        4    40-43
44     *        4    44-47
In that case, you have 24 bytes per element for a total of 240 bytes. Again, you need padding to ensure that r is aligned correctly.
I disagree - I read "an even multiple of the element’s size" as "2-byte shorts must have even addresses", or "4-byte ints must be 4-byte aligned". So, an int at address 0x101 to 0x103 is a bus error, but 0x100 and 0x104 is correct
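Both readings can be checked mechanically with the same layout routine, parameterized by the alignment rule (again my own sketch, not from the book):

def layout(fields, align_of):
    # Place each field at the next offset satisfying its alignment,
    # then pad the record to the largest alignment seen.
    offset, worst = 0, 1
    for _name, size in fields:
        a = align_of(size)
        worst = max(worst, a)
        offset = (offset + a - 1) // a * a + size
    return (offset + worst - 1) // worst * worst

fields = [('s', 2), ('c', 1), ('t', 2), ('d', 1), ('r', 8), ('i', 4)]
print(10 * layout(fields, lambda s: 2 * s))  # "even = twice the size": 32 per record, 320 total
print(10 * layout(fields, lambda s: s))      # "even = integral multiple": 24 per record, 240 total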
Hopefully this clears things up to an extent. (This script gives 236 (232 + 4) to be exact; note that it starts each record immediately after the previous one instead of padding every record out to the same aligned size, which is why it falls short of 240.)
import pprint

l = [2, 1, 2, 1, 8, 4]  # field sizes in declaration order: s, c, t, d, r, i
count = 0
i = 0
d = {}
while count < 10:
    # For each field, advance to the next offset divisible by its size.
    for ele in l:
        while True:
            if i % ele == 0:
                d[i] = ele
                i = i + ele
                break
            i = i + 1
    count += 1
pprint.pprint(d)
Output:
{0: 2,
2: 1,
4: 2,
6: 1,
8: 8,
16: 4,
20: 2,
22: 1,
24: 2,
26: 1,
32: 8,
40: 4,
44: 2,
46: 1,
48: 2,
50: 1,
56: 8,
64: 4,
68: 2,
70: 1,
72: 2,
74: 1,
80: 8,
88: 4,
92: 2,
94: 1,
96: 2,
98: 1,
104: 8,
112: 4,
116: 2,
118: 1,
120: 2,
122: 1,
128: 8,
136: 4,
140: 2,
142: 1,
144: 2,
146: 1,
152: 8,
160: 4,
164: 2,
166: 1,
168: 2,
170: 1,
176: 8,
184: 4,
188: 2,
190: 1,
192: 2,
194: 1,
200: 8,
208: 4,
212: 2,
214: 1,
216: 2,
218: 1,
224: 8,
232: 4}
The answer should be 240. The size of the structure is rounded up to the alignment of its largest member (see https://www.google.com/amp/s/www.geeksforgeeks.org/data-structure-alignment/amp/):
- s: 2 (short, 2 bytes)
- c: 2 (char, 1 byte + 1 byte of padding)
- t: 2 (short, 2 bytes)
- d: 2 (char, 1 byte + 1 byte of padding)
- r: 8 (real, 8 bytes)
- i: 8 (int, 4 bytes + 4 bytes of padding)
= 24 bytes per record
So 24 * 10 = 240. The largest element is 8 bytes, so the record size must be divisible by 8 according to the alignment rules.
