Out of memory when plotting 24 images in one figure

Hi, I want to plot 24 images in one figure using subplot.
I've already made the empty plots using this method:
# Import everything from matplotlib (numpy is accessible via the 'np' alias)
from pylab import *

# Create a new figure of A3 size.
figure(figsize=(16.5, 11.7), dpi=300)

# Do the plotting for 24 subplots in one figure.
for i in range(1, 25):
    # print i
    subplot(4, 6, i)
Now I want to fill every subplot with the same data (a background to plot against) as a line plot.
I do this using the following line:
plot(myData)
Once I run the program, it crashes with:
"_tkinter.TclError: not enough free memory for image buffer"
After searching the web, I read that I need to close the plots after I make them so that the memory can be reused.
However, how do I do this when using subplots?
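For reference, a minimal sketch of that close-the-figure idea, combined with the non-interactive Agg backend (an assumption on my part, not something from the question) so that Tk never has to allocate the 300 dpi A3 image buffer; this assumes the figure only needs to be saved to a file, and myData is a placeholder here:
# Sketch only: render off-screen with Agg, save, then close to free the memory.
import matplotlib
matplotlib.use("Agg")            # must be selected before pyplot is imported
import matplotlib.pyplot as plt
import numpy as np

myData = np.random.rand(100)     # placeholder for the real background data

fig = plt.figure(figsize=(16.5, 11.7), dpi=300)
for i in range(1, 25):
    ax = fig.add_subplot(4, 6, i)
    ax.plot(myData)
fig.savefig("all_24_subplots.png")
plt.close(fig)                   # release the figure's memory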
Frank
Edit:
I think it would be easily solved if I could get two lists: one with each unique item in myData, and a second with the number of occurrences of that unique item. Anyone got tips on that?
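A minimal sketch of getting those two lists with NumPy, assuming myData is a 1-D array (the placeholder values below are illustrative):
import numpy as np

myData = np.array([3, 1, 3, 2, 3, 1])                      # placeholder data
unique_items, counts = np.unique(myData, return_counts=True)
# unique_items -> [1 2 3], counts -> [2 1 3]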

Related

How is Spark reading my image using the image format?

It might be a silly question, but I can't figure out how Spark reads my image using the spark.read.format("image").load(....) call.
After importing my image which gives me the following:
>>> image_df.select("image.height","image.width","image.nChannels", "image.mode", "image.data").show()
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode| data|
+------+-----+---------+----+--------------------+
| 430| 470| 3| 16|[4D 55 4E 4C 54 4...|
+------+-----+---------+----+--------------------+
I arrive at the conclusion that:
my image is 430x470 pixels,
my image is in colour (RGB, since nChannels = 3), which is an OpenCV-compatible type,
my image mode is 16, which corresponds to a particular OpenCV byte ordering.
Does someone know which website/documentation I could browse to learn more about this?
The data in the data column is of type Binary, but:
when I run image_df.select("image.data").take(1), I get an output which seems to be only one array (see below).
>>> image_df.select("image.data").take(1)
# 1/ Here are the last elements of the result
....<<One Eternity Later>>....x92\x89\x8a\x8d\x84\x86\x89\x80\x84\x87~'))]
# 2/ I got also several part of the result which looks like:
.....\x89\x80\x80\x83z|\x7fvz}tpsjqtkrulsvmsvmsvmrulrulrulqtkpsjnqhnqhmpgmpgmpgnqhnqhn
qhnqhnqhnqhnqhnqhmpgmpgmpgmpgmpgmpgmpgmpgnqhnqhnqhnqhnqhnqhnqhnqhknejmdilcilchkbh
kbilcilckneloflofmpgnqhorioripsjsvmsvmtwnvypx{ry|sz}t{~ux{ry|sy|sy|sy|sz}tz}tz}tz}
ty|sy|sy|sy|sz}t{~u|\x7fv|\x7fv}.....
What comes next is linked to the results displayed above. These questions might be due to my lack of knowledge of OpenCV (or something else). Nonetheless:
1/ I don't understand why, if I have an RGB image, I should have 3 matrices, yet the output finishes with .......\x84\x87~'))]. I was expecting to obtain something like [(...),(...),(...\x87~')].
2/ Does this part have a special meaning, like a separator between each matrix or something?
To be clearer about what I'm trying to achieve: I want to process images to do pixel comparisons between them. Therefore, I want to know the pixel values for a given position in my image (I assume that if I have an RGB image, I will have 3 pixel values for a given position).
Example: let's say I have a webcam pointing at the sky only during the day, and I want to know the values of a pixel at a position corresponding to the top-left sky part. Say I find that the concatenation of those values gives the colour Light Blue, which tells me the photo was taken on a sunny day. Let's say the only possibility is that a sunny day gives the colour Light Blue.
Next, I want to compare the previous concatenation with another concatenation of pixel values at the exact same position, but from a picture taken the next day. If I find that they are not equal, I conclude that the picture was taken on a cloudy/rainy day. If equal, then a sunny day.
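A minimal sketch of that per-pixel comparison, assuming the flat byte arrays have already been reshaped to (height, width, channels) as in the snippet further down (day1 and day2 are hypothetical placeholders, not variables from the question):
import numpy as np

height, width = 430, 470
# Hypothetical placeholders for two reshaped images from consecutive days.
day1 = np.zeros((height, width, 3), dtype=np.uint8)
day2 = np.zeros((height, width, 3), dtype=np.uint8)

row, col = 0, 0                                           # top-left "sky" pixel
same_sky = np.array_equal(day1[row, col], day2[row, col])
print(day1[row, col], day2[row, col], same_sky)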
Any help on that would be highly appreciated. I have simplified my example for better understanding, but my goal is pretty much the same. I know that ML models exist to achieve this kind of thing, but I would be happy to try this first. My first goal is to split this column into 3 columns corresponding to each colour code: a red matrix, a green matrix and a blue matrix.
I think I have the logic. I used the keras.preprocessing.image.img_to_array() function to understand how the values are organised (since I have an RGB image, I must have 3 matrices: one for each colour R, G, B). Posting this in case someone wonders how it works; I might be wrong, but I think I have something:
from keras.preprocessing import image
import numpy as np
from PIL import Image
# Using spark built-in data source
first_img = spark.read.format("image").schema(imageSchema).load(".....")
raw = first_img.select("image.data").take(1)[0][0]
np.shape(raw)
(606300,) # which is 470*430*3
# Using keras function
img = image.load_img(".../path/to/img")
yy = image.img_to_array(img)
>>> np.shape(yy)
(430, 470, 3) # the form is good but I have a problem of order since:
>>> raw[0], raw[1], raw[2]
(77, 85, 78)
>>> yy[0][0]
array([78., 85., 77.], dtype=float32)
# Therefore I used the numpy reshape function directly on raw
# to get 430 rows of 470 pixels, each with 3 channel values:
array = np.reshape(raw, (430,470,3))
xx = image.img_to_array(array) # OPTIONAL and not used here
>>> array[0][0] == (raw[0],raw[1],raw[2])
array([ True, True, True])
>>> array[0][1] == (raw[3],raw[4],raw[5])
array([ True, True, True])
>>> array[0][2] == (raw[6],raw[7],raw[8])
array([ True, True, True])
>>> array[0][3] == (raw[9],raw[10],raw[11])
array([ True, True, True])
So if I understood correctly, Spark reads the image as one big array, (606300,) here, where the elements are in fact ordered and each corresponds to its respective colour shade (R G B).
After doing my little transformations, I obtain 430 matrices of 470 rows x 3 columns. Since my image is 470x430 (width x height), each matrix corresponds to a pixel height position, and inside each one there are 470 rows, one per width position, and 3 columns, one per colour value.
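A minimal sketch of the channel split mentioned above (one red, one green and one blue matrix), assuming the flat raw array from the snippet; the Spark image data source documents the bytes as OpenCV-compatible (row-wise BGR in most cases), which would also explain the reversed channel order observed above, so the indices below assume BGR:
import numpy as np

height, width = 430, 470
raw = np.zeros(height * width * 3, dtype=np.uint8)   # placeholder for image.data

img = np.reshape(raw, (height, width, 3))
blue  = img[:, :, 0]    # each channel is a (430, 470) matrix
green = img[:, :, 1]
red   = img[:, :, 2]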
Hope that helps someone :)!

if (freq) x$counts else x$density length > 1 and only the first element will be used

For my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employees at risk (Y) for each occupation category. I have a dataset like this:
          X     Y
1    0.1300     0
2    0.1000     0
3    0.0841  1513
4    0.0221   287
5    0.1175  3641
....
700  0.9875  4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density
length > 1 and only the first element will be used
Can someone tell me what the problem is and give me the right command?
Thank you!
It is worth pointing out that the message displayed is a Warning message, and should not prevent the results being plotted. However, it does indicate there are some issues with the data.
Without the full dataset, it is not 100% obvious what the problem may be. I believe it is caused by the data not being in the correct format, with two potential issues. Firstly, some rows have a Y value of 0, and these won't appear on the histogram. Secondly, the observations appear to be inconsistently spaced.
Histograms are best built from one of two kinds of data:
A dataframe which has already been aggregated into consistently sized bins.
A list of the individual values of X in the data.
I prefer the second technique. As originally shown here, the expandRows() function in the splitstackshape package can be used to repeat each row of the dataframe by its number of observations:
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = runif(900, 0, 1000))
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim=c(0,1))
dataset1$bins <- cut(dataset1$X, breaks = seq(0,1,0.01), labels = FALSE)

Using Keras ImageDataGenerator in a regression model

I want to use the flow_from_directory method of the ImageDataGenerator
to generate training data for a regression model, where the target value can be any float value between 1 and -1. flow_from_directory has a "class_mode" parameter with the description
class_mode: one of "categorical", "binary", "sparse" or None. Default:
"categorical". Determines the type of label arrays that are returned:
"categorical" will be 2D one-hot encoded labels, "binary" will be 1D
binary labels, "sparse" will be 1D integer labels.
Which of these values should I take? None of them seems to really fit...
With Keras 2.2.4 you can use flow_from_dataframe, which solves what you want to do, allowing you to flow images from a directory for regression problems. You should store all your images in a folder, load a dataframe containing the image IDs in one column and the regression scores (labels) in another, and set class_mode='other' in flow_from_dataframe.
Here is an example where the images are in image_dir and the dataframe with the image IDs and the regression scores is loaded with pandas from the "train file":
train_label_df = pd.read_csv(train_file, delimiter=' ', header=None, names=['id', 'score'])

train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                   fill_mode="nearest", zoom_range=0.2,
                                   width_shift_range=0.2, height_shift_range=0.2,
                                   rotation_range=30)

train_generator = train_datagen.flow_from_dataframe(dataframe=train_label_df, directory=image_dir,
                                                    x_col="id", y_col="score", has_ext=True,
                                                    class_mode="other", target_size=(img_width, img_height),
                                                    batch_size=bs)
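A sketch of how such a generator might then feed a regression model; the small CNN, the loss and the step count below are illustrative assumptions rather than part of the original answer (img_width, img_height and bs are reused from the snippet above):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Illustrative network: a single linear output unit for the continuous score.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')

model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.n // bs,
                    epochs=10)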
I think that organizing your data differently, using a DataFrame (without necessarily moving your images to new locations) will allow you to run a regression model. In short, create columns in your DataFrame containing the file path of each image and the target value. This allows your generator to keep regression values and images properly synced even when you shuffle your data at each epoch.
Here is an example showing how to link images with binomial targets, multinomial targets and regression targets just to show that "a target is a target is a target" and only the model might change:
df['path'] = df.object_id.apply(file_path_from_db_id)
df
       object_id   bi  multi                                     path    target
index
0         461756  dog  white     /path/to/imgs/756/61/blah_461756.png  0.166831
1        1161756  cat  black    /path/to/imgs/756/61/blah_1161756.png  0.058793
2        3303651  dog  white    /path/to/imgs/651/03/blah_3303651.png  0.582970
3        3367756  dog   grey    /path/to/imgs/756/67/blah_3367756.png -0.421429
4        3767756  dog   grey    /path/to/imgs/756/67/blah_3767756.png -0.706608
5        5467756  cat  black    /path/to/imgs/756/67/blah_5467756.png -0.415115
6        5561756  dog  white    /path/to/imgs/756/61/blah_5561756.png -0.631041
7       31255756  cat   grey   /path/to/imgs/756/55/blah_31255756.png -0.148226
8       35903651  cat  black   /path/to/imgs/651/03/blah_35903651.png -0.785671
9       44603651  dog  black   /path/to/imgs/651/03/blah_44603651.png -0.538359
10      49557622  cat  black   /path/to/imgs/622/57/blah_49557622.png -0.295279
11      58164756  dog   grey   /path/to/imgs/756/64/blah_58164756.png  0.407096
12      95403651  cat  white   /path/to/imgs/651/03/blah_95403651.png  0.790274
13      95555756  dog   grey   /path/to/imgs/756/55/blah_95555756.png  0.060669
I describe how to do this in great detail with examples here:
https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43
At this moment (using the newest version of Keras, from January 21st 2017), flow_from_directory can only work in the following manner:
You need to have your directories structured in the following manner:
directory with images\
    1st label\
        1st picture from 1st label
        2nd picture from 1st label
        3rd picture from 1st label
        ...
    2nd label\
        1st picture from 2nd label
        2nd picture from 2nd label
        3rd picture from 2nd label
        ...
    ...
flow_from_directory returns batches of a fixed size in the format (picture, label).
So as you can see, it can only be used in the classification case, and all the options provided in the documentation only specify the way in which the class is provided to your classifier. But there is a neat hack which can make flow_from_directory useful for a regression task:
You need to structure your directory in the following manner:
directory with images\
    1st value (e.g. -0.95423)\
        1st picture from 1st value
        2nd picture from 1st value
        3rd picture from 1st value
        ...
    2nd value (e.g. -0.9143242)\
        1st picture from 2nd value
        2nd picture from 2nd value
        3rd picture from 2nd value
        ...
    ...
You also need to have a list list_of_values = [1st value, 2nd value, ...]. Then your generator is defined in the following manner:
def regression_flow_from_directory(flow_from_directory_gen, list_of_values):
    for x, y in flow_from_directory_gen:
        yield x, list_of_values[y]
And it's crucial for flow_from_directory_gen to have class_mode='sparse' to make this work. Of course this is a little bit cumbersome, but it works (I used this solution :) )
There's just one glitch in the accepted answer that I would like to point out. The above code fails with an error message like:
TypeError: only integer scalar arrays can be converted to a scalar index
This is because y is an array. The fix is simple:
def regression_flow_from_directory(flow_from_directory_gen, list_of_values):
    for x, y in flow_from_directory_gen:
        values = [list_of_values[y[i]] for i in range(len(y))]
        yield x, values
The method to generate the list_of_values can be found in https://stackoverflow.com/a/47944082/4082092
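For completeness, a hedged sketch of one way list_of_values could be derived from the generator itself, assuming each subdirectory name literally is the float target value as in the layout above (train_dir, the target_size and the batch size are placeholders, and this is not necessarily the method from the linked answer):
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)
gen = datagen.flow_from_directory('train_dir', class_mode='sparse',
                                  target_size=(224, 224), batch_size=32)

# class_indices maps directory name -> integer class index; invert it so that
# list_of_values[class_index] returns the numeric target for that class.
index_to_name = {v: k for k, v in gen.class_indices.items()}
list_of_values = np.array([float(index_to_name[i]) for i in range(len(index_to_name))])

train_gen = regression_flow_from_directory(gen, list_of_values)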

Healpy plotting: how do I make a figure with subplots using the healpy.mollview projection?

I've just recently started trying to use healpy, and I can't figure out how to make subplots to contain my maps. I have a thermal emission map of a planet as a function of time, and I need to look at it at several moments in time (let's say 9 different times) and superimpose some coordinates, to check that my planet is rotating the right way.
So far, I can do two things:
Make 9 different figures with the superimposed coordinates.
Make a figure with 9 subplots containing 9 different maps, but which superimposes all of my coordinates on all of my subplots, instead of just the time-appropriate ones.
I'm not sure if this is a very simple problem, but it's been driving me crazy and I can't find anything that works.
I'll show you what I mean:
OPTION 1:
import healpy as hp
import matplotlib.pyplot as plt

MAX = 10**(23)
MIN = 10**10

for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title = "Map at t="+str(t), min = MIN, max=MAX)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*',markersize = 6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*',markersize = 6)
This makes 9 figures that look pretty much like this:
Flux map superimposed with some stars at time = t
But I need a lot of them, so I want to make one image that contains 9 subplots that each look like the image above.
OPTION 2:
fig = plt.figure(figsize = (10,8))
for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(Fmap_wvpix[t,:],
                         title = "Map at t="+str(t), min = MIN, max=MAX,
                         sub = int('33'+str(i+1)))
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-SSP[t])<0.5),2],
                         'k*',markersize = 6)
    hp.visufunc.projplot(d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),1 ],
                         d[t,np.where(np.abs(d[t,:,2]-(SOP[t]))<0.2),2],
                         'r*',markersize = 6)
This gives me subplots, but it draws all the projplot stars on all of my subplots! (see the following image)
Subplots with too many stars
I know that I need a way to select the axes that holds the time = t map and draw the stars for time = t on the appropriate map, but everything I've tried so far has failed. I've mostly tried to use projaxes, thinking I could define a matplotlib axes and draw the stars on it, but it doesn't work. Any advice?
Also, I would like to draw some lines on my map as well, but I can't figure out how to do that either. The documentation says projplot, but it won't draw anything if I don't tell it I want a marker.
PS: This code is probably useless to you as it won't work if you don't have my arrays. Here's a simpler version that should run:
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt
NSIDE = 8
m = np.arange(hp.nside2npix(NSIDE))*1
MAX = 900
MIN = 0
fig = plt.figure(figsize = (10,8))
for i in range(9):
    t = 4000+10*i
    hp.visufunc.mollview(m+100*i, title = "Map at t="+str(t), min = MIN, max=MAX,
                         sub = int('33'+str(i+1)))
    hp.visufunc.projplot(1.5,0+30*i, 'k*',markersize = 16)
So this is supposed to give me one star in each frame, and the star is supposed to be moving. But instead it draws all the stars on all the frames.
What can I do? I don't understand the documentation.
If you want to have healpy plots in matplotlib subplots, the following would be the way to go. The key is to use plt.axes() to select the active subplot and to use the hold=True keyword in the healpy functions.
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(ncols=2)
plt.axes(ax1)
hp.mollview(np.random.random(hp.nside2npix(32)), hold=True)
plt.axes(ax2)
hp.mollview(np.arange(hp.nside2npix(32)), hold=True)
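Extending this to the 3x3 case from the question, a sketch along these lines (the map values, times and star positions are taken from the simpler snippet above, and this has not been run against the real data) should put each star only on its own panel:
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt

NSIDE = 8
m = np.arange(hp.nside2npix(NSIDE)) * 1.0

fig, axes = plt.subplots(3, 3, figsize=(10, 8))
for i, ax in enumerate(axes.flat):
    t = 4000 + 10 * i
    plt.axes(ax)                                        # make this subplot active
    hp.mollview(m + 100 * i, title="Map at t=" + str(t),
                min=0, max=900, hold=True)              # draw the map into it
    hp.projplot(1.5, 0 + 30 * i, 'k*', markersize=16)   # star for this panel only
plt.show()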
I have just encountered this question looking for a solution to the same problem, but managed to find it from the documentation of mollview (here).
As you notice there, they say that 'sub' receives the same syntax as the subplot function (from matplotlib). This format is:
( # of rows, # of columns, # of current subplot)
E.g. to make your plot, the value sub wants to receive in each iteration is
sub=(3,3,i)
Where i runs from 1 to 9 (3*3).
This worked for me; I haven't tried it with your code, but it should work.
Hope this helps!

Logistic mapping in gnuplot

I have quite a big problem when it comes to plotting data.
First, I've obtained the file data.dat from my C++ program, which implements the logistic map.
data.dat looks as follows: the first column is the number k, which should go on the bottom axis of the plot. When k is in the range [2,3) everything is fine: there is only one attractor (a value corresponding to each k, always in the range (0,1)), but when k is in [3,4) things get complicated.
For each k there are from 2 up to 100 corresponding points.
Each of these points is in a separate column, but I have no idea how I could connect them to a certain k.
Here is a sample of my data for the points 2.5, 3, 3.2, 3.5, 3.8 and 3.99999, divided by newlines for clarity (it's not divided by newlines in my original data file):
http://pastebin.com/2AcAjXzk
Thanks for any help, cheers.
Gnuplot cannot handle such a data format properly. Either modify your program so that it prints on each line the k followed by a single value, or process your data file with a short awk script before plotting (note the loop starts at column 2, since column 1 holds k itself):
plot '< awk ''{ for(i = 2; i <= NF; i++) print $1, $i}'' file.txt' using 1:2 with dots notitle
