Flux model parameters collapse to zero - machine-learning

I have been working with the Flux.jl library and want to create a simple proof-of-concept autoencoder. Using the model zoo as a reference, I created the following toy model, which takes points on the curve y = x^2 in R^2 and tries to reconstruct them after compressing them to a 1-D code layer:
using Flux
using Flux: @epochs, onehotbatch, mse, throttle, params
using Base.Iterators: partition
using Distributions
## creating simple 2-1-2 AE
function build_train_data()
    function gen()
        # (the lines defining x and y are not shown in the post; x lies in [0, 1] and y = x^2)
        xy = vcat(x, y)
        return xy
    end
    train_data = [gen() for i in 1:10]
    return train_data
end
function train()
    model = Chain(
        # (the layers are not shown in the post; the parameter shapes printed below imply
        # a Dense layer mapping 2 -> 1 followed by a Dense layer mapping 1 -> 2)
    )
    @info("Training model.....")
    loss(x) = mse(model(x), x)
    opt = ADAM(lr)
    evalcb = throttle(() -> @show(loss(train_data[1])), 1)
    @epochs 100 Flux.train!(loss, Flux.params(model), zip(train_data), opt, cb = evalcb)
    return model
end
m = train()
Now, I'm not expecting the moon from this model. That being said, I did not anticipate the following results:
x=[0.9860286863631649 0.9209976855681348 0.6793548732252492 0.909752849042454 0.6926766153752839 0.9622926489586887 0.9639670701324241 0.8053711974593387 0.19502650255217913 0.38968830975794666; 0.9722525703310686 0.8482367368218608 0.46152304377489445 0.8276502463408622 0.479800893487759 0.9260071422399301 0.9292325122996898 0.6486227656970891 0.03803533669773513 0.15185697876200538]
m(x)=Float32[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.96854496 0.84631497 0.46364155 0.8260073 0.4818032 0.92298573 0.9261639 0.6491447 0.0359351 0.15342757]
Flux.params(m)=Params([Float32[0.058760125 1.4413338], Float32[-0.0049902047], Float32[-1.0241822; 0.6694982], Float32[0.0, -0.005099244]])
for one training round and
x=[0.4789886773906975 0.8739656341280784 0.8535570077535617 0.6553854355816602 0.5611963054162175 0.22277653137378484 0.8716704866290759 0.30803815544599367 0.6973631796646094 0.07522895316317268; 0.22943015306848968 0.7638159296368942 0.7285595654852137 0.4295300691725624 0.31494129321281245 0.04962938293093494 0.7598094372601699 0.09488750521057016 0.48631540435193427 0.00565939539402683]
m(x)=Float32[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]
Flux.params(m)=Params([Float32[-0.071433514 -0.4906463], Float32[0.0], Float32[-0.14397836; 0.5831637], Float32[0.0, 0.0]])
for another.
As you can see, in the former case reconstruction seems to work well enough for the "x^2" row of inputs, which suggests that the model is at least partially working. The cause has eluded my usual suite of debugging techniques, so I suspect it could lie with typing, (lack of) GPU utilization, or something more idiosyncratically Julian.
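As a sanity check on the printed numbers, the sketch below recomputes m(x) by hand from the parameters shown by Flux.params(m). It assumes the model is Chain(Dense(2, 1, relu), Dense(1, 2, relu)); the activations are an assumption, since the post does not show the layer definitions. Under that assumption the first run's second output row agrees with the printed value to about five decimal places, and its first row is clipped to 0 because the first decoder weight is negative while the hidden value is positive. In the second run both encoder weights are negative and the bias is 0, so for inputs with positive coordinates the hidden pre-activation is negative and everything downstream is 0. A relu unit in that state outputs 0 and passes back zero gradient to the weights feeding it (often called a "dead" ReLU), so training gets no signal to move it back.

```python
# Hand recomputation of m(x) from the parameters printed by Flux.params(m).
# ASSUMPTION: Chain(Dense(2, 1, relu), Dense(1, 2, relu)); the post does not show the activations.

def relu(v):
    return [max(0.0, t) for t in v]

def dense(W, b, v):
    # W * v + b, with W given as a list of rows
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(params, point):
    W1, b1, W2, b2 = params
    hidden = relu(dense(W1, b1, point))  # 1-D code layer
    return relu(dense(W2, b2, hidden))   # 2-D reconstruction

# First run: parameters copied from the post
run1 = ([[0.058760125, 1.4413338]], [-0.0049902047],
        [[-1.0241822], [0.6694982]], [0.0, -0.005099244])
# Second run
run2 = ([[-0.071433514, -0.4906463]], [0.0],
        [[-0.14397836], [0.5831637]], [0.0, 0.0])

first = forward(run1, [0.9860286863631649, 0.9722525703310686])
second = forward(run2, [0.4789886773906975, 0.22943015306848968])
print(first)   # first entry 0.0, second entry close to the reported 0.96854496
print(second)  # [0.0, 0.0]
```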


Correct use of Histogram Data - Setting interval size?

In my model, I am saving results from numerous Parameter Variation runs in a Histogram Data object.
Here are my Histogram Data settings:
Number of intervals: 7
Value range: Automatically detected
Initial Interval Size: 10
I then write out these results using the following code:
//if final replication, write Histogram Data into Excel
if(getCurrentReplication() == lastReplication){
    double intervalWidth = histogramData.getIntervalWidth();
    int intervalQty = histogramData.getNumberOfIntervals();
    for(int i = 0; i < intervalQty; i++){
        traceln(intervalWidth*i + " " + histogramData.getPDF(i));
        excelRecords.setCellValue(String.valueOf(intervalWidth*i) + " - " + String.valueOf(intervalWidth*(i+1)), 1, rowIndex, columnIndex);
        excelRecords.setCellValue(histogramData.getPDF(i), 1, rowIndex, columnIndex+1);
    }
}
Example of my intended results:
10 - 80%
20 - 10%
30 - 5%
40 - 2%
Actual results:
0.0 0.0
10.0 0.0
20.0 0.0
30.0 0.998782775272379
40.0 0.0011174522089635631
50.0 9.9772518657461E-5
60.0 0.0
Results after setting the initial interval size to 0.1:
0.0 0.9974651710510558
4.0 0.001117719851502934
8.0 9.181270208774101E-4
12.0 2.3951139675062872E-4
16.0 1.5967426450041916E-4
20.0 9.979641531276197E-5
24.0 0.0
How would I go about obtaining my desired results? Am I fundamentally misunderstanding something about the HistogramData object?
Thank you for your help.
The function you are using, getPDF(i), returns the value for each interval as a fraction, not a percentage, so you have to multiply it by 100 to get a percentage. As for the histogram bars, the model analyzes the results together with the specified number of intervals and interval size, and then builds that number of bars so that they cover all results. In your case, the intervals from 0 to 30 contain no results, so their bars are empty (the PDF there is 0.0).
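A minimal sketch of the write-out step with the answer's fix applied: each PDF fraction is multiplied by 100 and each row is labelled with its interval bounds. It is written in Python so it runs on its own; it is not AnyLogic code, and the helper name percentage_rows and its range_start parameter are hypothetical. The PDF values are the "Actual results" from the question. range_start defaults to 0.0 to mirror the intervalWidth*i labels in the question's loop; whether the automatically detected value range really starts at 0 is an assumption worth checking.

```python
def percentage_rows(pdfs, interval_width, range_start=0.0, skip_empty=True):
    """Turn per-interval PDF fractions into ("lo - hi", percent) rows.

    range_start is the lower bound of the first interval (hypothetical parameter);
    0.0 reproduces the intervalWidth * i labels used in the question.
    """
    rows = []
    for i, fraction in enumerate(pdfs):
        if skip_empty and fraction == 0.0:
            continue  # intervals with no observations
        lo = range_start + interval_width * i
        hi = lo + interval_width
        rows.append((f"{lo:g} - {hi:g}", round(fraction * 100, 2)))  # fraction -> percent
    return rows

# PDF values from the question's "Actual results" (initial interval size 10)
pdfs = [0.0, 0.0, 0.0, 0.998782775272379, 0.0011174522089635631, 9.9772518657461E-5, 0.0]
for label, percent in percentage_rows(pdfs, 10.0):
    print(f"{label}: {percent}%")
```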

Python: Random forest regression with discrete (categorical) features?

I am using a random forest regressor because my target values are not categorical. However, the features are.
When I run the algorithm, it treats them as continuous variables.
Is there any way to treat them as categorical?
For example, when I try the random forest regressor, it treats user ID as continuous (splitting at values like 1.5).
The dtype in the data frame is int64.
Could you help me with that?
Here is the code I have tried:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
import numpy as np
df = pd.read_excel('Data_frame.xlsx', sheet_name=5)
X = df.drop('productivity', axis='columns')
y = df['productivity']
X_train, X_test, y_train, y_test = train_test_split(X, y)
rf = RandomForestRegressor(bootstrap=False, n_estimators=1000, criterion='squared_error', max_depth=5, max_features='sqrt')
rf.fit(X_train.values, y_train)
_ = tree.plot_tree(rf.estimators_[1], feature_names=X.columns, filled=True,fontsize=8)
y_predict = rf.predict(X_test.values)
mae = mean_absolute_error(y_predict,y_test)
First of all, RandomForestRegressor only accepts numerical values, so converting your numerical columns to a categorical type is not a solution: you would not be able to train your model.
The usual way to deal with this type of problem is OneHotEncoder. It creates one column for every distinct value of the specified feature.
Below is an example:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
# creating initial dataframe
values = (1,10,1,2,2,3,4)
df = pd.DataFrame(values, columns=['Numerical_data'])
The DataFrame will look like this:
   Numerical_data
0 1
1 10
2 1
3 2
4 2
5 3
6 4
Now, OneHotEncode it:
enc = OneHotEncoder(handle_unknown='ignore')
enc_df = pd.DataFrame(enc.fit_transform(df[['Numerical_data']]).toarray())
0 1 2 3 4
0 1.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 1.0
2 1.0 0.0 0.0 0.0 0.0
3 0.0 1.0 0.0 0.0 0.0
4 0.0 1.0 0.0 0.0 0.0
5 0.0 0.0 1.0 0.0 0.0
6 0.0 0.0 0.0 1.0 0.0
Then, depending on your needs, you can join this encoded frame to your DataFrame. Be aware that you should then remove the original feature, which the join keeps:
# join the encoded columns onto df (aligned on the row index)
df = df.join(enc_df)
Numerical_data 0 1 2 3 4
0 1 1.0 0.0 0.0 0.0 0.0
1 10 0.0 0.0 0.0 0.0 1.0
2 1 1.0 0.0 0.0 0.0 0.0
3 2 0.0 1.0 0.0 0.0 0.0
4 2 0.0 1.0 0.0 0.0 0.0
5 3 0.0 0.0 1.0 0.0 0.0
6 4 0.0 0.0 0.0 1.0 0.0
Of course, if the feature has hundreds of distinct values, many columns will be created, but this is still the way to proceed.
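To make explicit what the encoder does, here is a dependency-free sketch (an illustration, not scikit-learn's implementation; fit_categories and one_hot are hypothetical helper names) that reproduces the encoded table above: the sorted distinct values become the columns, and each row gets 1.0 in the column of its own value. A value not seen during fitting maps to an all-zero row, which is what handle_unknown='ignore' does in OneHotEncoder.

```python
def fit_categories(values):
    # one column per distinct value, in sorted order
    return sorted(set(values))

def one_hot(values, categories):
    index = {c: j for j, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        if v in index:  # unseen values stay all-zero, like handle_unknown='ignore'
            row[index[v]] = 1.0
        rows.append(row)
    return rows

values = [1, 10, 1, 2, 2, 3, 4]
categories = fit_categories(values)  # [1, 2, 3, 4, 10]
for v, row in zip(values, one_hot(values, categories)):
    print(v, row)
```

In a real model, fit the encoder on the training split only and apply that same fitted encoder to the test split; scikit-learn's ColumnTransformer and Pipeline exist for this, so the RandomForestRegressor always sees the same columns.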

Generate weighted random number in Swift [duplicate]

Check out this question:
Swift probability of random number being selected?
The top answer suggests using a switch statement, which does the job. However, if I have a very large number of cases to consider, the code looks very inelegant: I have a giant switch statement with very similar code repeated in each case.
Is there a nicer, cleaner way to pick a random number with a certain probability when you have a large number of probabilities to consider? (like ~30)
This is a Swift implementation strongly influenced by the various answers to Generate random numbers with a given (numerical) distribution.
For Swift 4.2/Xcode 10 and later (explanations inline):
func randomNumber(probabilities: [Double]) -> Int {
    // Sum of all probabilities (so that we don't have to require that the sum is 1.0):
    let sum = probabilities.reduce(0, +)
    // Random number in the range 0.0 <= rnd < sum :
    let rnd = Double.random(in: 0.0 ..< sum)
    // Find the first interval of accumulated probabilities into which `rnd` falls:
    var accum = 0.0
    for (i, p) in probabilities.enumerated() {
        accum += p
        if rnd < accum {
            return i
        }
    }
    // This point might be reached due to floating point inaccuracies:
    return (probabilities.count - 1)
}
Examples:
let x = randomNumber(probabilities: [0.2, 0.3, 0.5])
returns 0 with probability 0.2, 1 with probability 0.3, and 2 with probability 0.5.
let x = randomNumber(probabilities: [1.0, 2.0])
returns 0 with probability 1/3 and 1 with probability 2/3.
For Swift 3/Xcode 8:
func randomNumber(probabilities: [Double]) -> Int {
    // Sum of all probabilities (so that we don't have to require that the sum is 1.0):
    let sum = probabilities.reduce(0, +)
    // Random number in the range 0.0 <= rnd < sum :
    let rnd = sum * Double(arc4random_uniform(UInt32.max)) / Double(UInt32.max)
    // Find the first interval of accumulated probabilities into which `rnd` falls:
    var accum = 0.0
    for (i, p) in probabilities.enumerated() {
        accum += p
        if rnd < accum {
            return i
        }
    }
    // This point might be reached due to floating point inaccuracies:
    return (probabilities.count - 1)
}
For Swift 2/Xcode 7:
func randomNumber(probabilities probabilities: [Double]) -> Int {
    // Sum of all probabilities (so that we don't have to require that the sum is 1.0):
    let sum = probabilities.reduce(0, combine: +)
    // Random number in the range 0.0 <= rnd < sum :
    let rnd = sum * Double(arc4random_uniform(UInt32.max)) / Double(UInt32.max)
    // Find the first interval of accumulated probabilities into which `rnd` falls:
    var accum = 0.0
    for (i, p) in probabilities.enumerate() {
        accum += p
        if rnd < accum {
            return i
        }
    }
    // This point might be reached due to floating point inaccuracies:
    return (probabilities.count - 1)
}
Is there a nicer, cleaner way to pick a random number with a certain probability when you have a large number of probabilities to consider?
Sure. Write a function that generates a number based on a table of probabilities. That's essentially what the switch statement you've pointed to is: a table defined in code. You could do the same thing with data using a table that's defined as a list of probabilities and outcomes:
probability outcome
----------- -------
0.4 1
0.2 2
0.1 3
0.15 4
0.15 5
Now you can pick a number between 0 and 1 at random. Starting from the top of the list, add up probabilities until you've exceeded the number you picked, and use the corresponding outcome. For example, let's say the number you pick is 0.6527637. Start at the top: 0.4 is smaller, so keep going. 0.6 (0.4 + 0.2) is smaller, so keep going. 0.7 (0.6 + 0.1) is larger, so stop. The outcome is 3.
I've kept the table short here for the sake of clarity, but you can make it as long as you like, and you can define it in a data file so that you don't have to recompile when the list changes.
Note that there's nothing particularly specific to Swift about this method -- you could do the same thing in C or Swift or Lisp.
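Since nothing about the table method is specific to Swift, here it is sketched in Python using the probability/outcome table above. The random number is an optional argument so the worked example with 0.6527637 can be replayed. Instead of walking the running totals from the top, it uses bisect to find the first running total that exceeds the number, which gives the same outcome in O(log n) per draw. The function name pick is illustrative.

```python
import bisect
import itertools
import random

# (probability, outcome) rows from the table above
TABLE = [(0.4, 1), (0.2, 2), (0.1, 3), (0.15, 4), (0.15, 5)]

def pick(table, r=None):
    """Return the outcome of the first row whose running total of probabilities exceeds r.

    r defaults to a uniform draw in [0, total), so the probabilities need not sum to 1.
    """
    running = list(itertools.accumulate(p for p, _ in table))
    if r is None:
        r = random.random() * running[-1]
    # bisect_right = number of running totals <= r, i.e. the index of the first total > r
    i = bisect.bisect_right(running, r)
    return table[min(i, len(table) - 1)][1]  # clamp guards against rounding at the top end

print(pick(TABLE, 0.6527637))  # 3, matching the worked example
```

Python's standard library already does this for you: random.choices(outcomes, weights=probabilities) draws with the given weights.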
This seems like a good opportunity for a shameless plug for my small library, swiftstats:
For example, this would generate 3 random variables from a normal distribution with mean 0 and variance 1:
import SwiftStats
let n = SwiftStats.Distributions.Normal(0, 1.0)
Supported distributions include normal, exponential, binomial, etc.
It also supports fitting sample data to a given distribution, using the Maximum Likelihood Estimator for the distribution.
See the project readme for more info.
You could do it with exponential or quadratic functions: let x be your random number and take y = f(x) as the new random number. Then you just have to tweak the equation until it fits your use case. Say I had (x^2)/10 + (x/300). Put your random number in (as a floating-point value), and take the floor of the result with Int(). So, if my random number generator goes from 0 to 9, I have a 40% chance of getting 0, a 30% chance of getting 1 to 3, a 20% chance of getting 4 or 6, and a 10% chance of an 8. You're basically trying to fake some kind of normal distribution.
Here's an idea of what it would look like in Swift:
import Foundation
func giveY(x: UInt32) -> Int {
    let xD = Double(x)
    return Int(xD * xD / 10 + xD / 300)
}
let ans = giveY(x: arc4random_uniform(10))
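Because arc4random_uniform(10) gives each of 0 to 9 with equal probability, the percentages above can be checked exactly by pushing all ten inputs through the same formula and counting. A Python sketch (int() truncates toward zero, which matches Swift's Int(...) on a non-negative Double):

```python
from collections import Counter

def give_y(x):
    # (x^2)/10 + x/300, truncated to an integer as in giveY
    return int(x * x / 10 + x / 300)

counts = Counter(give_y(x) for x in range(10))
for y, n in sorted(counts.items()):
    print(y, f"{n * 10}%")  # each input has probability 1/10
```

The counts show 0 four times and 1, 2, 3, 4, 6 and 8 once each, so 5 and 7 never occur.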
I wasn't very clear above. What I meant was that you could replace the switch statement with a function whose output follows a probability distribution you could work out with regression, using Wolfram Alpha or something similar. So, for the question you linked to, you could do something like this:
import Foundation
func returnLevelChange() -> Double {
    return 0.06 * exp(0.4 * Double(arc4random_uniform(10))) - 0.1
}
// 1 + change turns the "x% worse/better" figure into a multiplier
newItemLevel = oldItemLevel * (1 + returnLevelChange())
So that function returns a double somewhere between about -0.04 and 2.1. That would be your "x% worse/better than current item level" figure. But since it's an exponential function, it won't return an even spread of numbers. arc4random_uniform(10) returns an int from 0 to 9, and each of those results in a double like this:
0: -0.04
1: -0.01
2: 0.03
3: 0.1
4: 0.2
5: 0.34
6: 0.56
7: 0.89
8: 1.37
9: 2.1
Since each of those ints from the arc4random_uniform has an equal chance of showing up, you get probabilities like this:
40% chance of -0.04 to 0.1 (roughly -4% to +10%)
30% chance of 0.2 to 0.56 (roughly +20% to +56%)
20% chance of 0.89 to 1.37 (roughly +90% to +140%)
10% chance of 2.1 (roughly +210%)
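The table of values above can be regenerated by evaluating 0.06 * e^(0.4x) - 0.1 for each of the ten equally likely draws x = 0 to 9. A Python sketch, with math.exp standing in for Foundation's exp:

```python
import math

def level_change(x):
    # same formula as returnLevelChange, for a given draw x
    return 0.06 * math.exp(0.4 * x) - 0.1

changes = [round(level_change(x), 2) for x in range(10)]
for x, change in enumerate(changes):
    print(f"{x}: {change}")
```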
This is similar to the probabilities in the linked question. Now, for your function, it's much more difficult, and the other answers are almost definitely more applicable and elegant. BUT you could still do it.
Arrange the letters in order of their probability, from largest to smallest. Then take their cumulative sums, starting with 0 and dropping the last one (so probabilities of 50%, 30%, 20% become 0, 0.5, 0.8). Then scale them up until they're integers with reasonable accuracy (0, 5, 8). Then plot them: your cumulative probabilities are your x's, and the things you want to select with a given probability (your letters) are your y's. (You obviously can't plot actual letters on the y axis, so you'd plot their indices in some array.) Then you'd try to find a regression there and use that as your function. For instance, trying those numbers, I got
e^(0.14x) - 1
and this:
import Foundation
let letters: [Character] = ["a", "b", "c"]
func randLetter() -> Character {
    return letters[Int(exp(0.14 * Double(arc4random_uniform(10))) - 1)]
}
returns "a" 50% of the time, "b" 30% of the time, and "c" 20% of the time. Obviously this is pretty cumbersome for more letters, it would take a while to find the right regression, and if you wanted to change the weightings you'd have to do it manually. BUT if you did find a nice equation that fit your values, the actual function would only be a couple of lines long, and fast.
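The same enumeration confirms the letter weights: with x uniform on 0 to 9, Int(e^(0.14x) - 1) is 0 for x = 0 to 4, 1 for x = 5 to 7, and 2 for x = 8 and 9. A Python sketch of that check:

```python
import math
from collections import Counter

letters = ["a", "b", "c"]

def rand_letter(x):
    # index = Int(e^(0.14x) - 1), as in randLetter, for a given draw x in 0...9
    return letters[int(math.exp(0.14 * x) - 1)]

counts = Counter(rand_letter(x) for x in range(10))
print({letter: f"{counts[letter] * 10}%" for letter in letters})
```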

logistic regression with gradient descent error

I am trying to implement logistic regression with gradient descent.
I compute my cost function j_theta for each iteration, and fortunately j_theta decreases when I plot it against the number of iterations.
The data set I use is given below:
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y= 0
The code that I managed to write for logistic regression using Gradient descent is:
%1. The below code would load the data present in your desktop to the octave memory
%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
% Now we limit the x1 and x2 we need to leave or skip the first column x0 because they should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);
% We will not use the vectorized technique because it's hard to debug; we shall use many for loops instead
theta = zeros(size(x(1,:)))';
for num_iter=1:max_iter
% We calculate the cost Function
for i=1:m
for j=1:n+1
% theta(j)
h= 1.0 ./(1.0 + exp(-z));
j_cost_each=j_cost_each + ( (-y(i) * log(h)) - ((1-y(i)) * log(1-h)) );
% j_cost_each
j_theta(num_iter)=(1/m) * j_cost_each;
for j=1:n+1
grad(j) = 0;
for i=1:m
h=1.0 ./ (1.0 + exp(-z));
grad(j) += (h-y(i)) * x(i,j);
theta(j)=theta(j)- alpha * grad(j);
plot(0:1999, j_theta(1:2000), 'b', 'LineWidth', 2)
hold off
%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1); % This takes the position (index) in y of every example whose class is 1
neg = find(y == 0); % Similarly, this takes the position (index) in y of every example whose class is 0
% Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+');
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')
plot_x = [min(x(:,2))-2, max(x(:,2))+2]; % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off
%%%%%%% The only difference from the last plot: there I used X, whereas now I use x, whose features are feature-scaled %%%%%%%
If you plot x1 vs x2, the graph looks like this: [image not included]
After I run my code, I get a decision boundary. The shape of the decision line seems to be okay, but it is a bit displaced. The graph of x1 vs x2 with the decision boundary is given below:
[image: x1 vs x2 with the decision boundary, not included]
Please tell me where I am going wrong.
The new graph:
[image: the same plot using the feature-scaled x, not included]
In the new graph, the x-axis coordinates have changed. That's because I use x (feature-scaled) instead of X.
The problem lies in your cost function calculation and/or your gradient calculation; your plotting code is fine. I ran your dataset through the logistic regression algorithm I implemented, but using the vectorized technique, because in my opinion it is easier to debug.
The final values I got for theta were
theta =
I also used alpha = 0.3
I plotted the decision boundary and it looks fine. Again, I would recommend the vectorized form, as in my opinion it is easier to implement and to debug.
I also think your implementation of gradient descent is not quite correct. 50 iterations is just not enough and the cost at the last iteration is not good enough. Maybe you should try to run it for more iterations with a stopping condition.
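A minimal, dependency-free sketch of batch gradient descent for logistic regression with a stopping condition, in Python rather than Octave. The point it illustrates is that the whole gradient, (1/m) * X' * (h - y) in vectorized form, is computed from the current theta before any component of theta changes. If the question's theta(j) update sits inside the loop over j (the end statements are missing from the post, so this is unclear), later components are computed with a partly updated theta. The four-point dataset, alpha = 0.3 and the tolerance are illustrative assumptions; the question's y values are not recoverable from the post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    # J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(y)

def gradient_descent(X, y, alpha=0.3, max_iter=10000, tol=1e-9):
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    costs = [cost(theta, X, y)]
    for _ in range(max_iter):
        # every h_i comes from the same (current) theta
        h = [sigmoid(sum(t * v for t, v in zip(theta, xi))) for xi in X]
        grad = [sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m for j in range(n)]
        theta = [theta[j] - alpha * grad[j] for j in range(n)]  # simultaneous update
        costs.append(cost(theta, X, y))
        if costs[-2] - costs[-1] < tol:  # stopping condition: cost no longer improving
            break
    return theta, costs

# Hypothetical, already feature-scaled data; column 0 is the x0 = 1 intercept term.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 1, 0, 1]
theta, costs = gradient_descent(X, y)
print(theta, len(costs) - 1, costs[-1])
```

Plotting costs against the iteration number gives the same kind of convergence curve as j_theta in the question, and the loop stops on its own once the cost stops improving instead of running a fixed number of iterations.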
Also check this lecture for optimization techniques.
