Octave : using bar and histc - histogram

I want to plot a histogram with bins of unequal width.
x = [10 12 15 18] # bin edges
y = [3 2 6] # corresponding frequencies
bar(x, y, 'histc')
I get the following output:
warning: implicit conversion from matrix to string
error: set: unknown property "
"
error: called from:
error: /usr/share/octave/3.2.3/m/plot/bars.m at line 120, column 7
error: /usr/share/octave/3.2.3/m/plot/bar.m at line 161, column 7
error: /usr/share/octave/3.2.3/m/plot/bar.m at line 67, column 19
It seems that histc isn't working.
I have octave 3.2 installed.
Any ideas?

Use
help bar
to get information about the parameters. You will see that 'histc' is not a valid style option in Octave (it is only supported in Matlab).
If you need to count the frequencies of your data, the histc function can do that.
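As a rough illustration of the counting step (in Python rather than Octave, and not a reimplementation of histc), the sketch below counts values into the unequal-width bins from the question. The sample data and the function name count_in_bins are made up for the example; values outside the edges are simply dropped.

```python
from bisect import bisect_right

def count_in_bins(data, edges):
    """Count values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        # Index of the bin whose left edge is the last one <= value.
        i = bisect_right(edges, value) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

edges = [10, 12, 15, 18]  # bin edges from the question
data = [10, 11, 11.5, 12, 14, 15, 15, 16, 16.5, 17, 17.9]  # made-up sample values
counts = count_in_bins(data, edges)

# Dividing each count by its bin width gives a bar height whose area
# is proportional to the count, which is what unequal bins need.
densities = [c / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]
print(counts)
print(densities)
```

With unequal widths, plotting the densities rather than the raw counts keeps wide bins from looking over-represented.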

Related

Octave: "error: matrix cannot be indexed with . error: called from fitgmdist at line 486 column 14"

I am trying to use a Gaussian mixture model to cluster my data in Octave. As a start, I am trying to fit the data to a Gaussian distribution using the fitgmdist function from the statistics package. However, I get the following error:
> (error: matrix cannot be indexed with . error:
> called from fitgmdist at line 486 column 14)
whenever my source data gets bigger than a certain limit.
Attached below is a sample of the code I am using:
clear; close all; clc
pkg load statistics
k = 30; % target number of clusters
X; % X is the input source data to be clustered
X_final = normalize(X); % normalizing the input data
gm = fitgmdist( X_final, k, 'start', 'plus',
'CovarianceType', 'diagonal',
'RegularizationValue', 0.0001
);

Parsing a dimension field with variable formatting in Teradata?

I have a dimension field that holds data in the format below. I am using Teradata to query this field.
10 x 10 x 10
5.0x6x7
10 x 12x 1
6.0 x6.0 x6.0
0 X 0 X 0
How should I go about parsing this field so that the three numbers end up in three separate columns?
Something like this should work or at least get you close.
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 1, 'i', 1) AS length,
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 2, 'i', 1) AS width,
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 3, 'i', 1) AS height
Each call returns the first capture group: a run of characters followed by either a case-insensitive 'x' with an optional space, or the end of the line. The 4th argument selects which occurrence of this match to return.
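To see what the pattern captures on the sample rows, here is a small Python sketch that applies the same regular expression outside the database. It only checks the pattern's behaviour; it is not Teradata syntax, the function name split_dimensions is made up, and trimming the captured spaces is an extra step added here.

```python
import re

# Same pattern as the answer: lazily capture text up to an 'x' (plus an
# optional space) or the end of the string, ignoring case.
PATTERN = re.compile(r"(.*?)(x ?|$)", re.IGNORECASE)

def split_dimensions(text):
    """Return the first three captured pieces with surrounding spaces removed."""
    pieces = [m.group(1).strip() for m in PATTERN.finditer(text)]
    return tuple(pieces[:3])

samples = ["10 x 10 x 10", "5.0x6x7", "10 x 12x 1", "6.0 x6.0 x6.0", "0 X 0 X 0"]
for s in samples:
    print(s, "->", split_dimensions(s))
```

The capture group keeps any space that precedes the 'x', so the results need trimming before they are cast to a numeric type.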

Why is it giving me error of "Expected 2D array, got 1D array instead"

I used regressor.fit([X_train], [Y_train]) and it did work, but when I ran the code below it gave me the following error: "ValueError: shapes (1,9) and (21,21) not aligned: 9 (dim 1) != 21 (dim 0)"
Please help
The problem is in the columns you select and the resulting dimensions. Change X to
X = dataset.iloc[:, :-1].values

Keras Tokenization (fit on text)

When I run this script:
tokenizer.fit_on_texts(df['text'].values)
sequences = tokenizer.texts_to_sequences(df['text'].values)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
I am getting this error
AttributeError Traceback (most recent call last)
<ipython-input-4-7c08b89b116a> in <module>()
----> 1 tokenizer.fit_on_texts(df['text'].values)
2 sequences = tokenizer.texts_to_sequences(df['text'].values)
3 word_index = tokenizer.word_index
4 print('Found %s unique tokens.' % len(word_index))
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/text.py in fit_on_texts(self, texts)
220 self.filters,
221 self.lower,
--> 222 self.split)
223 for w in seq:
224 if w in self.word_counts:
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/text.py in text_to_word_sequence(text, filters, lower, split)
41 """
42 if lower:
---> 43 text = text.lower()
44
45 if sys.version_info < (3,):
AttributeError: 'float' object has no attribute 'lower'
The size of my CSV file is 6970963. When I reduce the size it works. Is there a size limit for the Keras Tokenizer, or am I doing something wrong?
I guess file size is not the issue. Try wrapping the call in a try block and look at the data you are passing. Use something like this instead of that line:
# instead of this
tokenizer.fit_on_texts(df['text'].values)
# use this to look at the data when it is causing that error
try:
    tokenizer.fit_on_texts(df['text'].values)
except Exception as e:
    print("exception is", e)
    print("data passed in", df['text'].values)
Then you can accordingly fix the error you are getting.
Check the datatype of the text you are fitting the tokenizer on. At least one value is a float instead of a string. You need to convert the values to strings before fitting a tokenizer on them.
Try something like this:
train_x = [str(x[1]) for x in train_x]
Although this is an old thread, the following may still be the answer.
Your data may contain NaN values, which are floats rather than strings. Either force the type with str(word) or replace the NaN values using data.fillna('empty').
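The answers above can be sketched without Keras or pandas. This is a minimal sketch, assuming the text column arrives as a plain list in which a missing cell is a float NaN; the list contents and the helper name clean are made up for illustration.

```python
import math

# Hypothetical stand-in for df['text'].values: a missing cell shows up as float NaN.
texts = ["First document", float("nan"), "Second document"]

# Locate the entries that are not strings; calling .lower() on these is
# what raises the AttributeError in the traceback.
bad_rows = [i for i, t in enumerate(texts) if not isinstance(t, str)]

def clean(value, placeholder="empty"):
    """Replace NaN with a placeholder and cast everything else to str."""
    if isinstance(value, float) and math.isnan(value):
        return placeholder
    return str(value)

cleaned = [clean(t) for t in texts]
print(bad_rows)
print(cleaned)
```

Listing the offending row indices first makes it easier to decide whether to fill the gaps or drop those rows before fitting the tokenizer.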

Keeping zeros in data with sklearn

I have a CSV dataset that I'm trying to use with sklearn. The goal is to predict future web traffic. However, my dataset contains zeros on days when there were no visitors, and I'd like to keep those values. There are more days with zero visitors than there are with visitors (it's a tiny, tiny site). Here's a look at the data:
Col1 is the date:
10/1/11
10/2/11
10/3/11
etc....
Col2 is the # of visitors:
12
1
0
0
1
5
0
0
etc....
sklearn seems to interpret the zero values as NaN values which is understandable. How can I use those zero values in a logistic function (is that even possible)?
Update:
The estimator is https://github.com/facebookincubator/prophet and when I run the following:
df = pd.read_csv('~/tmp/datafile.csv')
df['y'] = np.log(df['y'])
df.head()
m = Prophet()
m.fit(df);
future = m.make_future_dataframe(periods=365)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
m.plot(forecast);
m.plot_components(forecast);
plt.show()
I get the following:
growthprediction.py:7: RuntimeWarning: divide by zero encountered in log
df['y'] = np.log(df['y'])
/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py:307: RuntimeWarning: invalid value encountered in double_scalars
k = (df['y_scaled'].ix[i1] - df['y_scaled'].ix[i0]) / T
Traceback (most recent call last):
File "growthprediction.py", line 11, in <module>
m.fit(df);
File "/usr/local/lib/python3.6/site-packages/fbprophet/forecaster.py", line 387, in fit
params = model.optimizing(dat, init=stan_init, iter=1e4)
File "/usr/local/lib/python3.6/site-packages/pystan/model.py", line 508, in optimizing
ret, sample = fit._call_sampler(stan_args)
File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 804, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.StanFit4Model._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:16585)
File "stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.pyx", line 398, in stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666._call_sampler (/var/folders/ym/m6j7kw0d3kj_0frscrtp58800000gn/T/tmp5wq7qltr/stanfit4anon_model_35bf14a7f93814266f16b4cf48b40a5a_4758371668158283666.cpp:8818)
RuntimeError: k initialized to invalid value (nan)
In this line of your code:
df['y'] = np.log(df['y'])
you are taking the logarithm of 0 wherever df['y'] is zero. The logarithm of 0 is not defined: np.log(0) returns -inf and emits the divide-by-zero warning, and those infinities turn into NaNs in the later arithmetic, which is why the fit fails.
sklearn itself does NOT interpret zero values as NaNs unless you replace them with NaNs in your preprocessing.
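One common way to keep the zeros while still working on a log scale is to use log(1 + y) instead of log(y). This is not something the answer proposes; it is a hedged sketch using only the standard library, with the visitor counts copied from the question.

```python
import math

visitors = [12, 1, 0, 0, 1, 5, 0, 0]  # daily counts from the question

# math.log(0) raises ValueError; NumPy's np.log(0) returns -inf with a warning instead.
try:
    math.log(0)
    log_zero_ok = True
except ValueError:
    log_zero_ok = False

# log1p(v) = log(1 + v) is defined at zero and maps 0 to 0.
transformed = [math.log1p(v) for v in visitors]

# expm1 inverts log1p, so values on the log scale can be mapped back to counts.
recovered = [round(math.expm1(t)) for t in transformed]
print(log_zero_ok, transformed[2], recovered)
```

Whether a log1p scale suits the forecasting model is a separate modelling question; the sketch only shows that the transform is finite at zero and reversible.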
