After some training on a custom multi-agent environment using RLlib's (1.4.0) PPO, I found that my continuous actions turn into nan (exploding?), which is probably caused by a bad gradient update, which in turn depends on the loss/objective function.
As I understand it, PPO's loss function relies on three terms:
The PPO Gradient objective [depends on outputs of old policy and new policy, the advantage, and the "clip" parameter=0.3, say]
The Value Function Loss
The Entropy Loss [mainly there to encourage exploration]
Total Loss = PPO Gradient objective (clipped) - vf_loss_coeff * VF Loss + entropy_coeff * entropy.
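For reference, the clipped surrogate term I am referring to is the standard PPO objective (my own summary in standard notation, not RLlib's exact code; ε is the clip parameter, 0.3 here):
L_clip(θ) = E_t[ min( r_t(θ) * A_t, clip(r_t(θ), 1 - ε, 1 + ε) * A_t ) ],  with r_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)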
I have set the entropy coefficient to 0, so I am focusing on the other two terms contributing to the total loss. As seen in the progress table below, the relevant portion is where the total loss becomes inf. The only change I found is that the policy loss was negative in every row until row #445.
So my question is: Can anyone explain what policy loss is supposed to look like and if this is normal? How do I resolve this issue with continuous actions becoming nan after a while? Is it just a question of lowering the learning rate?
EDIT
Here's a link to the related question (if you need more context)
END OF EDIT
I would really appreciate any tips! Thank you!
Row    Total loss    Policy loss       VF loss
430    6.068537      -0.053691726      6.102932
431    5.9919114     -0.046943977      6.0161843
432    8.134636      -0.05247503       8.164852
433    4.2227306     -0.048518334      4.2523246
434    6.563492      -0.05237444       6.594456
435    8.171029      -0.048245672      8.198223
436    8.948264      -0.048484523      8.976327
437    7.556602      -0.054372005      7.5880575
438    6.124418      -0.05249534       6.155609
439    4.267647      -0.052565258      4.2978816
440    4.9129577     -0.054498855      4.9448576
441    16.630293     -0.043477766      16.656229
442    6.3149705     -0.057527818      6.349852
443    4.2269225     -0.054469086      4.2607937
444    9.503102      -0.052135203      9.53277
445    inf           0.2436709         4.410831
446    nan           -0.00029848056    22.596403
447    nan           0.00013323531     0.00043436908
448    nan           1.5656527e-05     0.0002645221
449    nan           1.3344318e-05     0.0003139485
450    nan           6.941917e-05      0.00025863337
451    nan           0.00015686743     0.00013607396
452    nan           -5.0206604e-06    0.00027541115
453    nan           -4.5543664e-05    0.0004247162
454    nan           8.841757e-05      0.0002027839
455    nan           -8.465959e-05     9.261127e-05
456    nan           3.868079e-05      0.00032097593
457    nan           2.7373153e-06     0.0005146417
458    nan           -6.271608e-06     0.0013273798
459    nan           -0.00013192794    0.00030621013
460    nan           0.00038987884     0.0003801983
461    nan           -3.2747878e-06    0.00031471922
462    nan           -6.9349815e-05    0.00038836736
463    nan           -4.666238e-05     0.0002851575
464    nan           -3.7067155e-05    0.00020161088
465    nan           3.0623291e-06     0.00019258814
466    nan           -8.599938e-06     0.00036465342
467    nan           -1.1529375e-05    0.00016500981
468    nan           -3.0851965e-07    0.00022042097
469    nan           -0.0001133984     0.00030230958
470    nan           -1.0735256e-05    0.00034000343
It appears that the grad_clip default in RLlib's PPO configuration is way too big (grad_clip=40). I changed it to grad_clip=4 and it worked.
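For reference, a minimal sketch of one way to change it (using the RLlib 1.x agents API; "YourEnv-v0" is just a placeholder for your own registered environment):

import ray
from ray.rllib.agents import ppo

ray.init()
config = ppo.DEFAULT_CONFIG.copy()
config["grad_clip"] = 4.0  # default is 40, which let the gradient updates blow up in my case
trainer = ppo.PPOTrainer(config=config, env="YourEnv-v0")  # placeholder env name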
I met the same problem when running the RLlib example, and I also posted my problem in this issue. I am also running PPO in a continuous and bounded action space. PPO outputs actions that are quite large and finally crashes due to a NaN-related error.
For me, it seems that when the log_std of the action's normal distribution is too large, very large actions (around 1e20) appear. I copied the loss-calculation code from RLlib (v1.10.0) ppo_torch_policy.py and pasted it below.
logp_ratio = torch.exp(
    curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) -
    train_batch[SampleBatch.ACTION_LOGP])

action_kl = prev_action_dist.kl(curr_action_dist)
mean_kl_loss = reduce_mean_valid(action_kl)

curr_entropy = curr_action_dist.entropy()
mean_entropy = reduce_mean_valid(curr_entropy)

surrogate_loss = torch.min(
    train_batch[Postprocessing.ADVANTAGES] * logp_ratio,
    train_batch[Postprocessing.ADVANTAGES] * torch.clamp(
        logp_ratio, 1 - self.config["clip_param"],
        1 + self.config["clip_param"]))
For such large actions, the log probability curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) computed by <class 'torch.distributions.normal.Normal'> will be -inf. Then curr_action_dist.logp(train_batch[SampleBatch.ACTIONS]) - train_batch[SampleBatch.ACTION_LOGP] returns NaN, and torch.min and torch.clamp keep the NaN in their output (refer to the docs).
So, in conclusion, I guess the NaN is caused by the -inf log probability of very large actions, which torch then fails to clip according to the "clip" parameter.
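A standalone PyTorch sketch of this failure mode (the values are assumed for illustration, not taken from RLlib; 0.3 stands in for clip_param):

import torch

logp_new = torch.tensor(float("-inf"))  # log-prob of a huge sampled action under the current policy
logp_old = torch.tensor(float("-inf"))  # the stored ACTION_LOGP can also be -inf
ratio = torch.exp(logp_new - logp_old)  # -inf - (-inf) = nan, and exp(nan) = nan
clipped = torch.clamp(ratio, 1 - 0.3, 1 + 0.3)  # clamp does not remove the nan
print(ratio, clipped, torch.min(ratio, clipped))  # tensor(nan) tensor(nan) tensor(nan)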
The difference is that I do not set entropy_coeff to zero. In my case, the standard deviation is encouraged to be as large as possible, since the entropy is computed for the full normal distribution rather than for the distribution restricted to the action space. I am not sure whether you get a large σ as I do. In addition, I am using PyTorch; things may be different for TF.
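A quick way to see why a nonzero entropy bonus keeps pushing σ up (plain PyTorch, values assumed): the entropy of a Normal grows with log(σ), so maximizing it rewards an ever larger σ.

import torch

print(torch.distributions.Normal(0.0, 1.0).entropy())   # ~1.42
print(torch.distributions.Normal(0.0, 1e10).entropy())  # ~24.45, keeps growing with sigma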
I have a dataframe like this:
   A    B    C
1  32   nan  nan
2  32   nan  nan
3  nan  nan  14
4  nan  nan  nan
my desired output is the following:
   A    B    C
1  32   nan  nan
2  32   nan  nan
3  ND   nan  14
4  ND   nan  ND
Basically, I need:
To skip columns that contain only nan values (they stay as they are; see column B)
To fill nan (with "ND") after a value and not before! (see column C)
Could you help me, please?
Thanks
I found the following solution:
df = df.where(df.ffill().isna(), df.fillna("ND"))
However, if you try to run this line you will get the following error:
TypeError: <U2 cannot be converted to an IntegerDtype
I solved it using replace:
df = df.where(df.ffill().isna(), df.fillna(123456789123456789))
df = df.replace(123456789123456789, "ND")
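Putting it together, a minimal self-contained sketch of the whole approach (column names, values, and the nullable Int64 dtype are assumed from the question):

import pandas as pd

# Rebuild the example frame; nullable Int64 is what triggers the
# "<U2 cannot be converted to an IntegerDtype" error when filling with a string directly.
df = pd.DataFrame(
    {
        "A": [32, 32, None, None],
        "B": [None, None, None, None],
        "C": [None, None, 14, None],
    },
    index=[1, 2, 3, 4],
    dtype="Int64",
)

# Keep cells that are still NaN after a forward fill (i.e. NaNs with no value above them),
# fill the remaining cells with a numeric sentinel, then swap the sentinel for "ND".
df = df.where(df.ffill().isna(), df.fillna(123456789123456789))
df = df.replace(123456789123456789, "ND")
print(df)  # A: 32, 32, ND, ND / B: all missing / C: missing, missing, 14, ND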
I am using DatePipe to display an epoch time in milliseconds, which is in UTC. I want to keep displaying it in UTC, so I am setting the timezone option to '0'.
Here is my code:
<span class="description">{{epochTime | date:'dd/MM/yy hh:mm a':'0'}}</span>
with epochTime = 1570274515223, which is 'Sat Oct 05 2019 11:21:55 AM' in UTC.
But the UI shows '05/10/19 06:21 PM'.
It is using the local time zone here. Why can I not force the pipe to use UTC?
'0' is not valid. Instead, specify the offset in one of the following ways: 0, '+0', '+00:00', or 'UTC'.
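For example, reusing the template from the question with 'UTC' (assuming epochTime is a millisecond timestamp):

<span class="description">{{epochTime | date:'dd/MM/yy hh:mm a':'UTC'}}</span>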
There is a relatively new package called acmeR for producing estimates of mortality (for birds and bats), and it takes into consideration things like bleedthrough (was the carcass still there but undetected, and then found in a later search?), diminishing searcher efficiency, etc. This is extremely useful, except I can't seem to get it to work, despite it seeming pretty straightforward. The data structure should be like:
Date, in US format mm/dd/yyyy or ISO 8601 format yyyy-mm-dd
Time, in am/pm US format HH:MM:SS AM or 24-hr ISO format HH:MM:SS
ID, arbitrary distinct alpha strings unique to each carcass
Species, arbitrary distinct alpha strings (e.g. AOU, ABMP, IBP)
Event, “Place”, “Check”, or “Search” (only 1st letter counts)
Found, TRUE or FALSE (only 1st letter counts)
and look something like this:
"Date","Time","ID","Species","Event","Found"
"1/7/2011","08:00:00 PM","T091","UNBA","Place",TRUE
"1/8/2011","12:00:00 PM","T091","UNBA","Check",TRUE
"1/8/2011","16:00:00 PM","T091","UNBA","Search",FALSE
My data look like this:
Date: Date, format: "2017-11-09" "2017-11-09" "2017-11-09" ...
Time: Factor w/ 644 levels "1:00 PM","1:01 PM",..: 467 491 518 89 164 176 232 39 53 247 ...
Species: Factor w/ 52 levels "AMCR","AMKE",..: 31 27 39 27 39 31 39 45 27 24 ...
ID: Factor w/ 199 levels "GHBT001","GHBT002",..: 1 3 2 3 2 1 2 7 3 5 ...
Event: Factor w/ 3 levels "Check","Place",..: 2 2 2 3 3 3 1 2 1 2 ...
Found: logi TRUE TRUE TRUE FALSE TRUE TRUE ...
I have also played with the date, time, and event formats, trying multiple alternatives, because I have had some issues there. So here are some of the errors I have encountered, depending on which subset of the data I use:
Error in optim(p0, f, rd = rd, method = "BFGS", hessian = TRUE) : non-finite value supplied by optim
In addition: Warning message: In log(c(a0, b0, t0)) : NaNs produced
Error in read.data(fname, spec = spec, blind = blind) : Expecting date format YYYY-MM-DD (ISO) or MM/DD/YYYY (USA) USA
Error in solve.default(rv$hessian): system is computationally singular: reciprocal condition number = 1.57221e-20
Warning message: In sqrt(diag(Sig)[2]) : NaNs produced
Error in solve.default(rv$hessian) : Lapack routine dgesv: system is exactly singular: U[2,2] = 0
The last error is the most common (and note, my data are non-numeric, so I am not sure what math is being done behind the scenes to come up with these equations and then fail in the solve), but the others are persistent too. Sometimes, despite the formatting being exactly the same, a subset of the data will return an error when the parent dataset does not (as far as I can tell, it has nothing to do with a species being present in one dataset and absent in the other).
I cannot find any bug reports or issues with the acmeR package out there, so maybe it runs perfectly and my data are the problem; but after three ecologists have vetted the data and pronounced it good, I am at a dead end.
Here is a subset of the data, so you can see what they look like:
    Date        Time      Species  ID       Event  Found
8   2017-11-09  1:39 PM   VATH     GHBT007  P      T
11  2017-11-09  2:26 PM   CORA     GHBT004  P      T
12  2017-11-09  2:30 PM   EUST     GHBT006  P      T
14  2017-11-09  6:43 AM   CORA     GHBT004  S      T
18  2017-11-09  8:30 AM   EUST     GHBT006  S      T
19  2017-11-09  9:40 AM   CORA     GHBT004  C      T
20  2017-11-09  10:38 AM  EUST     GHBT006  C      T
22  2017-11-09  11:27 AM  VATH     GHBT007  S      F
32  2017-11-09  10:19 AM  EUST     GHBT006  C      F
From Firebug:
>>> jQuery.fullCalendar.parseISO8601("2011-04-18T17:00:00Z").getUTCHours()
22
Shouldn't the result be 17?
parseISO8601 returns this date object:
>>> jQuery.fullCalendar.parseISO8601("2011-04-18T17:00:00Z")
Date {Mon Apr 18 2011 17:00:00 GMT-0500 (CDT)}
I think the date object should be "12:00:00 GMT-0500" to be the same time. Am I misunderstanding it?
From FF 4's Date object:
>>> new Date("2011-04-18T17:00:00Z")
Date {Mon Apr 18 2011 12:00:00 GMT-0500 (CDT)}
>>> new Date("2011-04-18T17:00:00Z").getUTCHours()
17
You are experiencing this issue:
http://code.google.com/p/fullcalendar/issues/detail?id=750
It will get fixed soon.