RLlib PPO continuous actions seem to become nan after total_loss = inf? - machine-learning

After some amount of training on a custom multi-agent environment using RLlib's (1.4.0) PPO network, I found that my continuous actions turn into NaN (explode?), which is probably caused by a bad gradient update, which in turn depends on the loss/objective function.
As I understand it, PPO's loss function relies on three terms:
The PPO Gradient objective [depends on outputs of old policy and new policy, the advantage, and the "clip" parameter=0.3, say]
The Value Function Loss
The Entropy Loss [mainly there to encourage exploration]
Total Loss = PPO Gradient objective (clipped) - vf_loss_coeff * VF Loss + entropy_coeff * entropy.
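To make the sign conventions concrete: the line above is the objective to be maximized; frameworks minimize its negation, so the reported losses flip sign. A minimal sketch of the combination (my own code, not RLlib's, and omitting RLlib's extra KL penalty term):

```python
def ppo_total_loss(policy_loss, vf_loss, entropy,
                   vf_loss_coeff=1.0, entropy_coeff=0.0):
    # The clipped surrogate objective is maximized, so its negation
    # (policy_loss) is minimized; the value-function loss is added, and
    # entropy is subtracted to encourage exploration.
    # With entropy_coeff=0 (as in my runs) the entropy term drops out.
    return policy_loss + vf_loss_coeff * vf_loss - entropy_coeff * entropy
```

With entropy_coeff=0, the total is just policy_loss + vf_loss_coeff * vf_loss, which roughly matches the rows in the progress table (up to the KL term, which I left out here).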
I have set the entropy coeff to 0, so I am focusing on the other two terms contributing to the total loss. As seen in the progress table below, the problem area is where the total loss becomes inf. The only change I found is that the policy loss was negative on every row until row #445.
So my question is: Can anyone explain what policy loss is supposed to look like and if this is normal? How do I resolve this issue with continuous actions becoming nan after a while? Is it just a question of lowering the learning rate?
EDIT
Here's a link to the related question (if you need more context)
END OF EDIT
I would really appreciate any tips! Thank you!
row    Total loss    policy loss       VF loss
430    6.068537      -0.053691726      6.102932
431    5.9919114     -0.046943977      6.0161843
432    8.134636      -0.05247503       8.164852
433    4.2227306     -0.048518334      4.2523246
434    6.563492      -0.05237444       6.594456
435    8.171029      -0.048245672      8.198223
436    8.948264      -0.048484523      8.976327
437    7.556602      -0.054372005      7.5880575
438    6.124418      -0.05249534       6.155609
439    4.267647      -0.052565258      4.2978816
440    4.9129577     -0.054498855      4.9448576
441    16.630293     -0.043477766      16.656229
442    6.3149705     -0.057527818      6.349852
443    4.2269225     -0.054469086      4.2607937
444    9.503102      -0.052135203      9.53277
445    inf            0.2436709        4.410831
446    nan           -0.00029848056    22.596403
447    nan            0.00013323531    0.00043436908
448    nan            1.5656527e-05    0.0002645221
449    nan            1.3344318e-05    0.0003139485
450    nan            6.941917e-05     0.00025863337
451    nan            0.00015686743    0.00013607396
452    nan           -5.0206604e-06    0.00027541115
453    nan           -4.5543664e-05    0.0004247162
454    nan            8.841757e-05     0.0002027839
455    nan           -8.465959e-05     9.261127e-05
456    nan            3.868079e-05     0.00032097593
457    nan            2.7373153e-06    0.0005146417
458    nan           -6.271608e-06     0.0013273798
459    nan           -0.00013192794    0.00030621013
460    nan            0.00038987884    0.0003801983
461    nan           -3.2747878e-06    0.00031471922
462    nan           -6.9349815e-05    0.00038836736
463    nan           -4.666238e-05     0.0002851575
464    nan           -3.7067155e-05    0.00020161088
465    nan            3.0623291e-06    0.00019258814
466    nan           -8.599938e-06     0.00036465342
467    nan           -1.1529375e-05    0.00016500981
468    nan           -3.0851965e-07    0.00022042097
469    nan           -0.0001133984     0.00030230958
470    nan           -1.0735256e-05    0.00034000343

It appears that RLlib's default PPO configuration of grad_clip is far too large (grad_clip=40). I changed it to grad_clip=4 and it worked.
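For intuition: grad_clip clips gradients by their global norm. A rough pure-Python sketch of the idea (my own illustration, not RLlib's actual tensor code):

```python
import math

def clip_by_global_norm(grads, clip_norm):
    # Rescale all gradients together so their global L2 norm is at most
    # clip_norm; the gradient direction is preserved.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= clip_norm or total_norm == 0.0:
        return grads
    scale = clip_norm / total_norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([30.0, 40.0], clip_norm=4.0)  # norm 50 -> norm 4
```

With grad_clip=40, a large gradient spike still gets through at norm 40, which can be enough to blow up a small continuous-action policy; grad_clip=4 scales the same spike down ten times harder.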

I met the same problem when running the RLlib example, and I also posted my problem in this issue. I am likewise running PPO in a continuous and bounded action space. PPO outputs actions that are quite large and finally crashes due to a NaN-related error.
For me, it seems that when the log_std of the action normal distribution is too large, very large actions (around 1e20) appear. I copied the loss-calculation code from RLlib's (v1.10.0) ppo_torch_policy.py and pasted it below.
logp_ratio = torch.exp(
    curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) -
    train_batch[SampleBatch.ACTION_LOGP])

action_kl = prev_action_dist.kl(curr_action_dist)
mean_kl_loss = reduce_mean_valid(action_kl)

curr_entropy = curr_action_dist.entropy()
mean_entropy = reduce_mean_valid(curr_entropy)

surrogate_loss = torch.min(
    train_batch[Postprocessing.ADVANTAGES] * logp_ratio,
    train_batch[Postprocessing.ADVANTAGES] * torch.clamp(
        logp_ratio, 1 - self.config["clip_param"],
        1 + self.config["clip_param"]))
For such large actions, the log-probability curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]), computed by <class 'torch.distributions.normal.Normal'>, will be -inf. Then curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) - train_batch[SampleBatch.ACTION_LOGP] returns NaN, and torch.min and torch.clamp both keep the NaN output (refer to the docs).
So in conclusion, I guess the NaN is caused by the -inf log-probability of very large actions, which torch then fails to clip according to the "clip" parameter.
The difference is that I do not set entropy_coeff to zero. In my case the std is encouraged to be as large as possible, since the entropy is computed for the full normal distribution rather than the distribution restricted to the action space. I am not sure whether you get a large σ as I do. In addition, I am using PyTorch; things may be different for TF.
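The mechanism can be reproduced without RLlib. A pure-Python sketch (the log-density formula is mine; I use an even larger action than the ~1e20 that overflows float32, because Python floats are float64):

```python
import math

def normal_logp(x, mean=0.0, std=1.0):
    # Log-density of N(mean, std) -- the same formula torch's Normal uses.
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

big_action = 1e200                    # z*z overflows to inf
curr_logp = normal_logp(big_action)   # -inf
old_logp = normal_logp(big_action)    # also -inf, stored at sampling time
diff = curr_logp - old_logp           # -inf - (-inf) = nan
ratio = math.exp(diff)                # exp(nan) = nan
# Clamping does not help: comparisons against nan are all False, so the
# nan passes straight through the clipped surrogate loss.
```

This is why the "clip" parameter cannot save the run once an action is large enough to push the log-probability to -inf.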

Related

problem with missing value. Does not work for every missing value?

I want my missing values to be replaced by the mode of given data. But my code is replacing only one of the missing values. Why?
my real data is:
0 NaN
1 NaN
2 normal
3 normal
4 normal
...
395 normal
396 normal
397 normal
398 normal
399 normal
Name: rbc, Length: 400, dtype: object
my code is:
rbc = data_penyakit['rbc'].mode()
rbc = data_penyakit['rbc'].mask(pd.isna, rbc)
rbc
and the result is
0 normal
1 NaN
2 normal
3 normal
4 normal
...
395 normal
396 normal
397 normal
398 normal
399 normal
Name: rbc, Length: 400, dtype: object
Why is the second missing value not replaced?
mode() returns a Series, and mask aligns that replacement Series on index: only row label 0 (the only label mode() produced) finds a fill value, while every other NaN row has nothing to align to and stays NaN. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mode.html
So how about
fill = data_penyakit['rbc'].mode().iloc[0]
rbc.fillna(value=fill, inplace=True)
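Putting it together on a toy Series (the values here are made up for the example):

```python
import numpy as np
import pandas as pd

rbc = pd.Series([np.nan, np.nan, 'normal', 'normal', 'abnormal'], name='rbc')

# mode() returns a Series (there can be ties), so take the first entry;
# fillna then broadcasts the scalar to every missing position,
# with no index alignment involved.
fill = rbc.mode().iloc[0]
rbc = rbc.fillna(value=fill)
```

Because fill is a scalar, all 400 rows of the real data would be filled, not just the one whose index happens to match the mode Series.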

Perform basic arithmetic operations on AudioKit FM oscillator parameters: Interpolation & time Transition

Does AudioKit provide a method to calculate interpolated values of discrete array members?
Does AudioKit provide a method to smooth transition operation between parameters of an oscillator like baseFrequency, AKOperation.periodicTrigger or hold?
Below the code I use for FM generation:
let oscillator = AKOperation.fmOscillator(
    baseFrequency: Synth.frequency,
    carrierMultiplier: 2,
    modulatingMultiplier: 0.8,
    modulationIndex: 1,
    amplitude: Synth.amplitude.triggeredWithEnvelope(
        trigger: AKOperation.periodicTrigger(period: Synth.cyclic),
        attack: 0.01,
        hold: Synth.hold,
        release: 0.01))
For input parameter P1, interpolated values of Freq., Cycle, and Duty shall be calculated based on the table (array) below:
   P1    Freq.   Cycle   Duty %
-10.00    200     100     100
 -3.04    405     100     100
 -0.51    300     500     100
 -0.50    200     800       5
  0.09    400     600      10
  0.10    400     600      50
  1.16    550     552      52
  2.67    763     483      55
  4.24    985     412      58
  6.00   1234     322      62
  8.00   1517     241      66
 10.00   1800     150      70
The transition of values (for Freq., Cycle, and Duty) shall be smoothed based on input parameter P1. Is this what AKComputedParameter, e.g. smoothDelay, is made for?
How do I tell AudioKit to apply AKComputedParameter?
Do you have a sample code (code snippet) for achievement of interpolation/transition operation with application to oscillator based on the code above? Either based on AK or vDSP methods.
I'm not quite sure how to apply https://audiokit.io/docs/Protocols/AKComputedParameter.html
I think this question was downvoted somewhat because it seems like you're asking for too much of an actual implementation with that table of values. I'm going to ignore that and say that however you decide to change the parameters of the oscillator in your app logic, you can make the transitions smooth by applying portamento to the values.
So, in your case for frequency, you would replace Synth.frequency with a parameter you set, and then apply portamento to it like AKOperation.parameters[0].portamento(halfTime: 0.5)
See an example for using parameters here: https://audiokit.io/playgrounds/Synthesis/Plucked%20String%20Operation/
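The table-lookup step itself is independent of AudioKit. A minimal linear-interpolation sketch, in Python just to illustrate the math (the breakpoints are a subset of the question's table; the function name is mine):

```python
# Breakpoints: P1 -> (freq, cycle, duty), a subset of the question's table.
TABLE = [
    (-0.50, (200.0, 800.0, 5.0)),
    ( 0.09, (400.0, 600.0, 10.0)),
    ( 0.10, (400.0, 600.0, 50.0)),
    ( 1.16, (550.0, 552.0, 52.0)),
]

def interpolate(p1):
    # Clamp outside the table; otherwise linearly interpolate
    # between the two surrounding breakpoints.
    if p1 <= TABLE[0][0]:
        return TABLE[0][1]
    if p1 >= TABLE[-1][0]:
        return TABLE[-1][1]
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if x0 <= p1 <= x1:
            t = (p1 - x0) / (x1 - x0)
            return tuple(a + t * (b - a) for a, b in zip(y0, y1))
```

The time-smoothing of the resulting values is then a separate concern, which is what the portamento approach above handles.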

Multi-class classification in sparse dataset

I have a dataset of factory workstations.
There are two types of errors in the same time interval:
User-selected errors and time intervals (dependent variable, y)
Machine-produced errors during production (independent variables, x)
There are 8 unique user-selected error types in total, so I tried to predict them using the machine-produced errors (188 types in total) and some other numerical features such as avg. machine speed, machine volume, etc.
Each row represents a user-selected error in a particular time window.
For example, in the first row the user selects the time interval:
2018-01-03 12:02:00 - 2018-01-03 12:05:37
and m_er_1 (machine error 1) also occurred 12 times in that same interval.
m_er_1_dur (machine error 1 duration) is the total duration of that machine error in seconds.
So I matched those two tables; the result looks like this:
user_error  m_er_1  m_er_2  m_er_3  ...  m_er_188  avg_m_speed  ...  m_er_1_dur
A               12       0       0             0          150            217
B                0       0       2             0           10              0
A                3       0       0             6           34             37
A                0       0       0             0            5              0
D                0       0       0             0            3              0
E                0       0       0             0         1000              0
In the end, I have 1900 rows and 390 columns (376 (188 machine-error counts + 188 machine-error durations) + 14 numerical features), and due to the machine errors it is a sparse dataset with lots of 0s.
There are no outliers and no NaN values; I normalized the data and tried several classification algorithms (SVM, Logistic Regression, MLPC, XGBoost, etc.).
I also tried PCA, but it didn't work well: for 165 components the explained_variance_ratio is around 0.95.
But the accuracy metrics are very low: for logistic regression the accuracy score is 0.55 and the MCC score is around 0.1; recall, f1, and precision are also very low.
Are there some steps that I'm missing? What would you suggest for multiclass classification on a sparse dataset?
Thanks in advance

GridSearchCV freezing with linear svm

I have a problem with GridSearchCV freezing (the CPU is active but the program is not advancing) with a linear SVM (with an rbf SVM it works fine).
Depending on the random_state I use for splitting my data, the freeze happens at different CV split points and for different numbers of PCA components.
The features of one sample look like the following (about 39 features):
[1 117 137 2 80 16 2 39 228 88 5 6 0 10 13 6 22 23 1 227 246 7 1.656934307 0 5 0.434195726 0.010123735 0.55568054 5 275 119.48398 0.9359527 0.80484825 3.1272728 98 334 526 0.13454546 0.10181818]
Another sample's features:
[23149 4 31839 9 219 117 23 5 31897 12389 108 2 0 33 23 0 0 18 0 0 0 23149 0 0 74 0.996405221 0.003549844 4.49347E-05 74 5144 6.4480677 0.286384 0.9947901 3.833787 20 5135 14586 0.0060264384 0.011664075]
If I delete the last 10 features I don't have this problem (my code worked fine before I added those 10 new features). I have not checked other combinations of the last 10 features to see whether one specific feature causes it.
Also, I use StandardScaler to scale the features but still face this issue. I have less of this problem if I use MinMaxScaler (though I read somewhere that it is not good for SVM).
I also set n_jobs to different values; the search advances a little but then freezes again.
What do you suggest?
I followed part of this code to write my code:
TypeError grid search

NMF Sparse Matrix Analysis (using SKlearn)

Just looking for some brief advice to put me back on the right track. I have been working on a problem where I have a very sparse input matrix (~25% filled, the rest 0s) stored in a sparse.coo_matrix:
sparse_matrix = sparse.coo_matrix((value, (rater, blurb))).toarray()
After some work building this array from my data set and trying some other options, I currently have my NMF model-fitting function defined as follows:
import numpy as np
from sklearn.decomposition import NMF

def nmf_model(matrix):
    # Factorize matrix ~= W @ H and return the reconstruction.
    model = NMF(init='nndsvd', random_state=0)
    W = model.fit_transform(matrix)
    H = model.components_
    result = np.dot(W, H)
    return result
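For reference, a tiny self-contained usage sketch of a function like the one above (the toy ratings matrix and the explicit n_components are made up for the example):

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_model(matrix, n_components=2):
    # Low-rank factorization: matrix ~= W @ H, entries all non-negative.
    model = NMF(n_components=n_components, init='nndsvd', random_state=0)
    W = model.fit_transform(matrix)
    H = model.components_
    return np.dot(W, H)

ratings = np.array([[5.0, 0.0, 3.0],
                    [4.0, 2.0, 0.0],
                    [0.0, 1.0, 4.0]])
approx = nmf_model(ratings)  # same shape as ratings, entries >= 0
```

Note that standard NMF treats the 0s as observed values to be approximated, not as missing entries, which is consistent with the behavior described below.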
Now, the issue is that my output doesn't seem to account for the 0 values correctly. Any value that was a 0 gets bumped up to some value less than 1, and my known values fluctuate from the actual quite a bit (all data are ratings between 1 and 10). Can anyone spot what I am doing wrong? From the scikit-learn documentation, I assumed the nndsvd initialization would handle the empty values correctly. Sample output:
#Row / Column / New Value
35 18 6.50746917334 #Actual Value is 6
35 19 0.580996641675 #Here down are all "estimates" of my function
35 20 1.26498699492
35 21 0.00194119935464
35 22 0.559623469753
35 23 0.109736902936
35 24 0.181657421405
35 25 0.0137801897011
35 26 0.251979684515
35 27 0.613055371646
35 28 6.17494590041 #Actual value is 5.5
Appreciate any advice any more experienced ML coders can offer!
