I am trying to load the MNIST dataset in the th REPL and do mean subtraction as follows:
file = torch.load('data/mnist.t7/train_32x32.t7', 'ascii')
data = file.data:type(torch.getdefaulttensortype())
mean = data:mean()
data:add(-mean)
The last line causes the following error:
.../torch/install/bin/luajit: not enough memory
I am running this on a laptop with 16GB of RAM. Also, MNIST has already been loaded into data, so I am not sure why doing data:add(-mean) would cause this issue. Any ideas?
Thanks
The problem was that the REPL was trying to print the whole result matrix (which is large) to the console.
This can be overcome by doing either
data = data:add(-mean)
or
data:add(-mean); - note the trailing semicolon, which suppresses printing of the return value
Answer provided by Soumith Chintala on the torch gitter.
I need to use the Maxima software to process data. I am trying to read data from a text file structured as
1 2 3
11 22 33
etc.
The following commands load the data correctly:
load(numericalio);
read_matrix("path to the file");
The problem arises when I apply them to a more realistic (larger) data set. In that case, the message "Expression longer than allowed by the configuration setting" appears.
How can I overcome this problem? I cannot see any relevant option in the configuration menu. I would be grateful for advice.
I ran into the same error message today, and it seems to be related to the size of the output that wxMaxima receives from the Maxima executable.
If you wish to display the output regardless, you can change it in the configuration here:
Edit>Configure>Worksheet>Show long expressions
Note that showing a massive expression or amount of data may dramatically slow the program down, so consider hiding the output (use a $ instead of a ; at the end of your lines) if you don't need to visualize the data.
I am using a Python program to write a 4000x4000 array into an hdf5 file.
Then I read the data with a C program, where I need it as input for some simulations. I need approximately 1000 of these 4000x4000 arrays (that is, I am doing 1000 simulation runs).
My question is the following: which way is "better", 1000 separate hdf5 files or one big hdf5 file with 1000 different datasets (named 'dataset_%04d')?
Any advice or best practices for this kind of problem are greatly appreciated (as I am not too familiar with hdf5).
In case it is of interest, here is the Python code I am using to write the hdf5 file:
import h5py

h5f = h5py.File('data_0001.h5', 'w')
h5f.create_dataset('dataset_1', data=myData)
h5f.close()
This is really interesting, as I'm currently dealing with a similar problem.
Performance
To investigate the problem a little closer, I created the following file:
import h5py
import numpy as np

def one_file(shape=(4000, 4000), n=1000):
    h5f = h5py.File('data.h5', 'w')
    for i in xrange(n):
        dataset = np.random.random(shape)
        dataset_name = 'dataset_{:08d}'.format(i)
        h5f.create_dataset(dataset_name, data=dataset)
        print i
    h5f.close()

def more_files(shape=(4000, 4000), n=1000):
    for i in xrange(n):
        file_name = 'data_{:08d}'.format(i)
        h5f = h5py.File(file_name, 'w')
        dataset = np.random.random(shape)
        h5f.create_dataset('dataset', data=dataset)
        h5f.close()
        print i
Then, in IPython,
>>> from testing import one_file, more_files
>>> %timeit one_file(n=25) # with n=25, the resulting file is 3.0GB
1 loops, best of 3: 42.5 s per loop
>>> %timeit more_files(n=25)
1 loops, best of 3: 41.7 s per loop
>>> %timeit one_file(n=250)
1 loops, best of 3: 7min 29s per loop
>>> %timeit more_files(n=250)
1 loops, best of 3: 8min 10s per loop
The difference is quite surprising to me: for n=25 having more files is faster, but this is no longer true for more datasets.
Experience
As others noted in the comments, there is probably no correct answer, as this is very problem specific. I deal with hdf5 files for my research in plasma physics. I don't know if it helps you, but I can share my hdf5 experience.
I run lots of simulations, and the output for a given simulation used to go to one hdf5 file. When a simulation finished, it dumped its state to this hdf5 file, so later I was able to take that state and extend the simulation from that point (I could change some parameters as well and didn't need to start from scratch). The output from the extended simulation went to the same file again. This was great - I had only one file per simulation. However, there are certain drawbacks to this approach:
When a simulation crashes, you end up with a file that is not 'complete' - you can't start a new simulation from that file.
There is no simple way to safely take a look into an hdf5 file while another process is writing to it. If you try to read from a file while another process is writing to it, you end up with a corrupted file and all your data is lost!
I don't know of any simple way to delete groups from a file (if anyone knows a way, let me know). So, if I need to restructure a file, I need to create a new one from it (h5copy, h5repack, ...).
So I ended up with this approach, which works much better:
I periodically flush the state from a simulation, and after each flush I write to a new file. If the simulation crashes, I only need to delete the last file and I don't lose that much CPU time.
I currently only plot data from all files but the last one. Note that there is another way: see here, but my approach is definitely simpler and I'm OK with that.
It is much easier to process many small files than one huge file - you can see the progress, and so on.
Hope this helps.
A little late to the party, I know, but I thought I'd share my experience. My data sizes are smaller, but from a simplicity-of-analysis standpoint I actually prefer one large (1000, 4000, 4000) dataset. In your case, it looks like you'd need to use the maxshape property to make it extendable as you create new results. Saving multiple separate datasets makes it hard to look at trends across datasets, since you have to slice them all separately. With one dataset you could do e.g. data[:, 5, 20] to look across the 3rd axis. Also, to address the corruption problem, I highly recommend using h5py.File as a context manager:
with h5py.File('myfilename') as f:
    f.create_dataset('mydata', data=data, maxshape=(1000, 4000, 4000))
This automatically closes the file even if there is an exception. I used to curse incessantly due to corrupted data and then I started doing this and haven't had a problem since.
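To make that concrete, here is a minimal sketch of the single-dataset approach with a resizable first axis; the file name 'all_runs.h5', the dataset name 'mydata', and the random data standing in for a simulation result are all made up for illustration:

import h5py
import numpy as np

with h5py.File('all_runs.h5', 'w') as f:
    # Start empty along the first axis and allow it to grow without bound.
    dset = f.create_dataset('mydata', shape=(0, 4000, 4000),
                            maxshape=(None, 4000, 4000),
                            chunks=(1, 4000, 4000))
    for i in range(1000):
        result = np.random.random((4000, 4000))  # stand-in for one run's output
        dset.resize(dset.shape[0] + 1, axis=0)   # grow by one slot
        dset[-1] = result                        # append this run's array

# Later, look at a trend across every run without opening 1000 files:
with h5py.File('all_runs.h5', 'r') as f:
    trend = f['mydata'][:, 5, 20]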
I'm trying to retrieve the alignment score of two sequences compared with EMBOSS via Biopython. The only way I know is to retrieve it from an output text file produced by EMBOSS. The problem is that there will be hundreds of these files to iterate over. Is there an easier/cleaner method to retrieve the alignment score, without resorting to that? This is the main part of the code I'm using.
from Bio.Emboss.Applications import StretcherCommandline
needle_cline = StretcherCommandline(asequence=,bsequence=,gapopen=,gapextend=,outfile=)
stdout, stderr = needle_cline()
I had the same problem, and after some time spent searching for a neat solution I raised the white flag.
However, to significantly speed up the processing of the output files, I did the following things:
1) I used the Python re module (regular expressions) to extract all the data needed.
2) I created a ramdisk for the output files. Using a ramdisk allowed all the data to be processed and exchanged in RAM (much faster than writing and reading the output files from a hard drive, not to mention that it spares your hdd when processing a massive number of alignments).
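To illustrate the first step, here is a minimal sketch of such an extraction, assuming the EMBOSS output contains a header line of the form "# Score: 112.0" (check the exact label in your own output files before relying on it):

import re

# Hypothetical helper: pull the score out of one EMBOSS output file.
score_re = re.compile(r'#\s*Score:\s*(-?\d+(?:\.\d+)?)')

def alignment_score(path):
    with open(path) as handle:
        match = score_re.search(handle.read())
    return float(match.group(1)) if match else None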
I don't know if there is a parser specifically for your command.
For Primer3CommandLine, there is Primer3. It makes your life much easier with something like:
from Bio.Emboss import Primer3

inputFile = "./wherever/your/outputfileis.out"
with open(inputFile) as fileHandle:
    record = Primer3.parse(fileHandle)
    # XXX check that len > 0
    primers = record.next().primers
    numPrimers = len(primers)
    # you should have access to each primer; use a for loop
    # to check how to access the data you care about. For example:
I would also check http://biopython.org/wiki/SeqIO#Sequence_Input
I am using Nvidia's OpenCL development software on a GTX 550 Ti graphics card, and I have encountered a strange problem. (I am new to OpenCL.)
My kernel code is like this:
__kernel void kernel_name(...)
{
    size_t d = get_local_id(0);
    char abc[8];
    ...
}
Actually, char abc[8] is useless (dead code) in my case. But if I keep char abc[8] in my kernel code, the result is totally messy and the kernel's running time is much longer (2095712 ns). If I comment out char abc[8], the result becomes correct and the running time becomes shorter (697856 ns). Shouldn't the kernel compiler strip out dead code?
The above is just an explicit example that I can reproduce. I have also encountered a stranger case where one program gets different results when run at different times in exactly the same environment.
Is this related to memory allocation, or something else? Can anyone give me advice on how to track down the problem?
By the way, oclDeviceQuery output information is listed as follows:
Platform Version = OpenCL 1.1
CUDA 4.2.1,
SDK Revision = 7027912
My OS is Windows XP.
Today is 2012-07-17, and I think I have resolved this problem.
Don't use #include in the kernel source file.
Don't use ultra-long lines in the kernel source file (for example, if you write a program that generates line data for the kernel source file).
You're right, that shouldn't affect anything.
That's not your real code though, and given those run-times I suspect your kernel isn't a simple thing. Possibly you're pushing your locals over some limit, which means variables have to be stored in some slower memory, and that pushes your run-times up.
Something like that might also cause a change in behaviour if you had an uninitialised variable bug somewhere. In the fast store it happens to get a value that works. In the slow store it gets something else.
To check this theory I'd try to remove some other local data structure and see if it has the same effect. Anything else 8 bytes or larger should have the same effect.
...of course it's possible you've found a bug in the OpenCL implementation, but that's easy to check. Just compile the kernel for a different OpenCL device, e.g. the CPU. This is worth doing anyway, because different compilers pick up different issues.
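For example, here is a minimal sketch of that cross-check using pyopencl (my choice, not something from the question; any OpenCL host code would do), building the same source on every device it can find. The kernel body is a trivial stand-in, not the asker's real code:

import pyopencl as cl

kernel_src = """
__kernel void kernel_name(__global float *out)
{
    size_t d = get_local_id(0);
    char abc[8];  /* the suspicious 'dead' array */
    out[get_global_id(0)] = (float)d;
}
"""

# Try to build the kernel on every available device and report the outcome.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        ctx = cl.Context([device])
        try:
            cl.Program(ctx, kernel_src).build()
            print('%s: build OK' % device.name)
        except cl.RuntimeError as err:
            print('%s: build failed: %s' % (device.name, err))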
Other than that I think you're back to standard debug techniques.
BTW: at one point in your question you call the array abs[8] rather than abc[8]. I assume that's a typo, but if it isn't then that could be your problem as the abs name will clash with the abs() function. That could confuse a stupid compiler.
I'm trying to download (and save) a binary file from the web using Python 2.6 and urllib.
As I understand it, read(), readline() and readlines() are the 3 ways to read a file-like object.
Since binary files aren't really broken into newlines, read() and readlines() read the whole file into memory.
Is choosing a random read() buffer size the most efficient way to limit memory usage during this process?
i.e.
import urllib
import os

title = 'MyFile'
downloadurl = 'http://somedomain.com/myfile.avi'
webFile = urllib.urlopen(downloadurl)
mydirpath = os.path.join('c:', os.sep, 'mydirectory',
                         downloadurl.split('/')[-1])

if not os.path.exists(mydirpath):
    print "Downloading...%s" % title
    localFile = open(mydirpath, 'wb')
    data = webFile.read(1000000)  # 1MB at a time
    while data:
        localFile.write(data)
        data = webFile.read(1000000)  # 1MB at a time
    webFile.close()
    localFile.close()
    print "Finished downloading: %s" % title
else:
    print "%s already exists." % mydirpath
I chose read(1000000) arbitrarily because it worked and kept RAM usage down. I assume if I was working with a raw network buffer choosing a random amount would be bad since the buffer might run dry if the transfer rate was too low. But it seems urllib is already handling lower level buffering for me.
With that in mind, is choosing an arbitrary number fine? Is there a better way?
Thanks.
You should use urllib.urlretrieve for this. It will handle everything for you.
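For example, reusing the URL and target path from the question, the whole read/write loop collapses to a single call:

import os
import urllib

downloadurl = 'http://somedomain.com/myfile.avi'
mydirpath = os.path.join('c:', os.sep, 'mydirectory',
                         downloadurl.split('/')[-1])

# urlretrieve performs the buffered read/write loop internally
urllib.urlretrieve(downloadurl, mydirpath)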
Instead of using your own read-write loop, you should probably check out the shutil module. The copyfileobj method will let you define the buffering. The most efficient method varies from situation to situation. Even copying the same source file to the same destination may vary due to network issues.
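A minimal sketch of that approach, reusing the question's file objects; the 64KB buffer size is just an example value, not a recommendation:

import shutil
import urllib

webFile = urllib.urlopen('http://somedomain.com/myfile.avi')
try:
    with open('myfile.avi', 'wb') as localFile:
        # copyfileobj runs the read/write loop with the given buffer size
        shutil.copyfileobj(webFile, localFile, 64 * 1024)
finally:
    webFile.close()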