I am new to Machine Learning, and I am trying to solve MountainCar-v0 using Q-learning. I can solve the problem now, but I am still confused.
According to the MountainCar-v0 wiki, the reward is -1 for every step, even the step in which the car reaches the destination. How does a constant reward help the agent learn? If every step gives the same reward, how can the agent tell whether a move was good or bad?
Thanks in advance!
The goal is to get the car to its destination in as few steps as possible. Because the episode only ends, and the stream of -1 rewards only stops, once the car reaches the goal, a fast run accumulates a less negative total return than a slow one. That difference in cumulative return is enough for the agent to learn: the reward scheme encourages the agent to reach the terminal state as soon as possible, since that is the only way to stop collecting -1 per step.
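To make that concrete, here is a minimal tabular Q-learning sketch for MountainCar-v0 (a rough sketch assuming the classic gym API; the discretization and hyperparameters are just illustrative choices). Even though every step's reward is -1, shorter episodes accumulate a less negative return, and that difference propagates back through the Q-values:

```python
import gym
import numpy as np

env = gym.make("MountainCar-v0")

# Discretize the continuous (position, velocity) observation into a small grid.
BINS = (18, 14)  # illustrative bin counts
low, high = env.observation_space.low, env.observation_space.high

def discretize(obs):
    ratios = (obs - low) / (high - low)
    return tuple((ratios * (np.array(BINS) - 1)).astype(int))

q_table = np.zeros(BINS + (env.action_space.n,))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(5000):
    state = discretize(env.reset())
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_obs, reward, done, _ = env.step(action)  # reward is always -1 here
        next_state = discretize(next_obs)

        # Fewer steps means fewer -1s in the return, so Q-values along faster
        # routes to the goal end up higher than along slower ones.
        target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state + (action,)] += alpha * (target - q_table[state + (action,)])
        state = next_state
```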
My Jenkins pipeline includes an Integration Stage where I deploy my software and test it in combination with a real-life instance of resource X. To use X, I must subscribe to it before I start the test, and unsubscribe from it afterwards. Each of these steps can take up to several minutes.
In daily operation, I keep observing the pattern that a first build #42 unsubscribes from X. Seconds later, the next build #43 starts and wants to subscribe to X. X, still captured in the previous unsubscription, does not handle this well and breaks, consequently also breaking the build #43.
The pattern we're doing there - unsubscribing and resubscribing in quick succession - is not something to be expected in real life and I cannot really pretend X was doing something wrong there. So instead of urging X's team to change the resource, I'd prefer to improve my test.
The first idea that came to mind was creating a new instance of X each time. This proved too complicated: X doesn't have the APIs for that, technical users are too hard to obtain, and instantiation is so heavy it would slow the already-slow pipeline down by several minutes more. I discarded that possibility.
Another idea that came to mind was creating a pool of N instances of X. Successive builds would then be able to choose a different one each time, giving the other instances sufficient time to cool down before they are used again.
A trivial solution would be to create the pool of instances and then randomly choose one of them. This would leave me with a 1/N chance of choosing the same one as the build before. With high N, this can be moved into an acceptable range of failure probability, but it still leaves this nagging "it's not really reliable" feeling you don't want to have with your pipeline.
A more complex solution would be to record in some central place when each instance of X was last used, i.e. a simple map of instance number to last-used timestamp. For that, however, I'd need to exchange information between builds. Which leads up to my question:
How can I share a small amount of data between builds of the same job? Preferably a simple variable that can be accessed by the pipeline code right away; alternatively, a file or some other means of permanent storage. The solution should survive the fact that some builds might break before they ever reach that stage, i.e. it must not assume that the direct predecessor passed that stage successfully.
Try using a global environment variable.
A global environment variable retains its value after each build, so you can share the same value across jobs or across successive builds of the same job.
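If updating a global environment variable from pipeline code turns out to be awkward, another way to realize your timestamp-map idea is to persist the map to a small file in a location every build can reach and always pick the least recently used instance. A rough Python sketch (the state-file path and instance names are placeholders, and file locking is left out for brevity):

```python
import json
import os
import time

STATE_FILE = "/var/lib/jenkins/x_instance_pool.json"     # placeholder shared path
POOL = ["x-instance-1", "x-instance-2", "x-instance-3"]  # placeholder instance names

def pick_least_recently_used():
    """Pick the pool instance with the oldest last-used timestamp and record the new use."""
    state = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as fh:
            state = json.load(fh)

    # Instances that were never used sort first (timestamp 0.0).
    chosen = min(POOL, key=lambda name: state.get(name, 0.0))

    state[chosen] = time.time()
    with open(STATE_FILE, "w") as fh:
        json.dump(state, fh)
    return chosen

if __name__ == "__main__":
    print(pick_least_recently_used())
```

The pipeline would call the script and use its stdout as the instance to subscribe to; because the timestamp is written at pick time, before the test runs, the scheme also survives builds that break later in the stage.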
I have a JMeter script which I run from my local environment, achieving 20 TPS (transactions per second).
I moved the same script to Jenkins and ran it from there. It worked as expected.
My next step is to reduce the throughput from 20 TPS to 2.
So I introduced a ramp-up time of 30 seconds, and it worked as expected from the local environment.
When I moved the script to Jenkins, however, it still gave me 20 TPS.
Can someone tell me why this is happening and what I need to do to fix it?
I have tried several approaches, such as hard-coding the ramp-up time and creating a new Jenkins project with a new script.
Thanks in advance
It's hard to say what exactly is wrong without seeing your full Thread Group configuration. Normally people use Timers for throttling JMeter's throughput to a given number of requests/transactions per second.
Depending on what you're trying to achieve, you can consider using:
Constant Throughput Timer (however, it is only precise at the "minute" level, so for the first minute of test execution you will need to shape the load using the ramp-up approach)
Precise Throughput Timer, which is more powerful and more "precise"; however, its configuration is more complex and you need to provide the test duration, units, etc.
Throughput Shaping Timer, which is something of a balance between precision and simplicity; however, it's a custom plugin, so you will have to install it on the Jenkins master/slaves
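As a quick sanity check on the numbers (plain arithmetic, not JMeter configuration): the Constant Throughput Timer expects its target in samples per minute, so the desired rate in transactions per second has to be multiplied by 60.

```python
# Constant Throughput Timer takes "target throughput" in samples per MINUTE.
def constant_throughput_timer_value(target_tps: float) -> float:
    return target_tps * 60.0

print(constant_throughput_timer_value(2))   # 120 samples/minute for the desired 2 TPS
print(constant_throughput_timer_value(20))  # 1200 samples/minute for the original 20 TPS
```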
I am sporadically getting the following errors:
W Refusing to split at '\x00\x00\x00\x15\xbc\x19)b\x00\x01': proposed split position is out of range ['\x00\x00\x00\x15\x00\xff\x00\xff\x00\xff\x00\xff\x00\x01', '\x00\x00\x00\x15\xbc\x19)b\x00\x01'). Position of last group processed was '\x00\x00\x00\x15\xbc\x19)a\x00\x01'.
When it happens, the error is logged every so often and the job never seems to end, although otherwise the job appears to have actually completed its work.
In the last instance I am using 10 workers and have auto scaling disabled. I am using the Python implementation of Apache Beam.
This is not an error, it's part of normal operation of a pipeline. We should probably reduce its logging level to INFO and rephrase it, because it very frequently confuses people.
This message (rather obscurely) signals that Dataflow is trying to apply dynamic rebalancing, but there's no work that can be further subdivided.
In other words, your job is stuck doing something non-parallelizable on a small number of workers, while other workers stay idle. To investigate this further, one would need to look at the code of your job and the Dataflow job ID.
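As an illustration of the kind of shape this message hints at (a hypothetical pipeline, not your code): funnelling everything under a single key makes the expensive step run in one group that cannot be split, while a Reshuffle after it lets the runner redistribute per-element work again.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    records = p | "Read" >> beam.Create(range(100000))

    # Anti-pattern: a single constant key forces every element into one group,
    # so whatever follows the GroupByKey runs on one worker and cannot be split.
    grouped = (
        records
        | "SingleKey" >> beam.Map(lambda x: (0, x))
        | "GroupAll" >> beam.GroupByKey()
    )

    # A Reshuffle breaks that bottleneck and lets the runner rebalance the
    # per-element work across workers again.
    results = (
        grouped
        | "Explode" >> beam.FlatMap(lambda kv: kv[1])
        | "Reshuffle" >> beam.Reshuffle()
        | "ExpensiveStep" >> beam.Map(lambda x: x * x)  # stand-in for the real work
    )
```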
I'm curious if anyone can point me towards greater visibility into how various Beam Runners manage autoscaling. We seem to be experiencing hiccups during both the 'spin up' and 'spin down' phases, and we're left wondering what to do about it. Here's the background of our particular flow:
1- Binary files arrive on gs://, and object notification duly notifies a PubSub topic.
2- Each file requires about 1 minute of parsing on a standard VM to emit about 30K records to downstream areas of the Beam DAG.
3- 'Downstream' components include things like inserts to BigQuery, storage in GS:, and various sundry other tasks.
4- The files in step 1 arrive intermittently, usually in batches of 200-300 every hour, making this - we think - an ideal use case for autoscaling.
What we're seeing, however, has us a little perplexed:
1- It looks like when 'workers=1', Beam bites off a little more than it can chew, eventually causing some out-of-RAM errors, presumably as the first worker tries to process a few of the PubSub messages which, again, take about 60 seconds each to complete, because the 'message' in this case is that a binary file needs to be deserialized from gs://.
2- At some point, the runner (in this case, Dataflow with jobId 2017-11-12_20_59_12-8830128066306583836), gets the message additional workers are needed and real work can now get done. During this phase, errors decrease and throughput rises. Not only are there more deserializers for step1, but the step3/downstream tasks are evenly spread out.
3- Alas, the previous step gets cut short when Dataflow senses (I'm guessing) that enough of the PubSub messages are 'in flight' to begin cooling down a little. That seems to come a little too soon, and workers get pulled while they are still chewing through the PubSub messages themselves - even before the messages are ACK'd.
We're still thrilled with Beam, but I'm guessing the less-than-optimal spin-up/spin-down phases are resulting in 50% more VM usage than needed. What do the runners look for besides PubSub consumption? Do they look at RAM/CPU/etc.? Is there anything a developer can do, besides ACKing a PubSub message, to provide feedback to the runner that more or fewer resources are required?
Incidentally, in case anyone doubted Google's commitment to open-source, I spoke about this very topic with an employee there yesterday, and she expressed interest in hearing about my use case, especially if it ran on a non-Dataflow runner! We hadn't yet tried our Beam work on Spark (or elsewhere), but would obviously be interested in hearing if one runner has superior abilities to accept feedback from the workers for THROUGHPUT_BASED work.
Thanks in advance,
Peter
CTO,
ATS, Inc.
Generally, streaming autoscaling in Dataflow works like this:
Upscale: If the pipeline's backlog is more than a few seconds based on current throughput, the pipeline is upscaled. Here CPU utilization does not directly affect the amount of upscaling. Knowing the CPU (say it is at 90%) does not help in answering the question 'how many more workers are required'. CPU does have an indirect effect, since pipelines fall behind when they don't have enough CPU, thereby increasing the backlog.
Downscale: When the backlog is low (i.e. < 10 seconds), the pipeline is downscaled based on current CPU consumption. Here, CPU does directly influence the downscale size.
I hope the above basic description helps.
Due to the inherent delays involved in starting up new GCE VMs, the pipeline pauses for a minute or two during resizing events. This is expected to improve in the near future.
I will ask specific questions about the job you mentioned in the description.
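For reference, on the Dataflow runner the behaviour described above is driven by the autoscaling pipeline options; a minimal Python sketch (project, region, bucket, and worker counts are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                      # placeholder project id
    region="us-central1",                      # placeholder region
    temp_location="gs://my-bucket/tmp",        # placeholder bucket
    streaming=True,
    autoscaling_algorithm="THROUGHPUT_BASED",  # the backlog-driven scaling described above
    max_num_workers=10,                        # upper bound for the upscale
)

with beam.Pipeline(options=options) as p:
    ...  # the existing PubSub -> parse -> BigQuery/GCS graph goes here
```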
I'm on Neo4j 3.1.2. I'm trying to automate monitoring a causal cluster for proper redundancy, preferably over the HTTP interface, dbms.cluster.overview being the most obvious call. But when they die, servers drop off this list without regard to how they exit. The operations manual says there is a difference between clean shutdowns and unclean ones. How do I figure out whether a server left cleanly or uncleanly? Is there a procedure to clean up an unclean failure that's never coming back?
More generally, I would like to know the number of core servers Neo4j is checking for consensus. I don't see an API to find that number. With it, I could tell how close to failure we are.
The expected_core_cluster_size setting is used when bootstrapping the cluster on first formation. A cluster will not form without the configured number of cores, and this should in general be configured to the full and fixed amount.
This setting is then also used as the minimum consensus group size. The consensus group size (core machines successfully voted into the raft) can shrink and grow dynamically but bounded on the lower end at this number.
The intention is in almost all cases for users to leave this setting alone. If you have 5 machines then you can survive failures down to 3 remaining, e.g. with 2 dead members. The three remaining can still vote another replacement member in successfully up to a total of 6 (2 of which are still dead) and then after this, one of the superfluous dead members will be immediately and automatically voted out (so you are left with 5 members in the consensus group, 1 of which is currently dead). Operationally you can now bring the last machine up by bringing in another replacement or repairing the dead one.
If the intention really is to bring expected_core_cluster_size down to 3, then today you will have to update the setting and do a rolling restart. This is considered an uncommon scenario. Causal clustering optimizes operationally for repair/replace.
The only difference between a clean and an unclean shutdown is that the former leads to quicker discovery of a core member disappearing, since detection of an unclean shutdown is based on a timeout.
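For the monitoring part of the original question, one option is to poll dbms.cluster.overview() over the HTTP transactional endpoint and compare the number of visible core roles against the configured expected_core_cluster_size. A rough Python sketch (host, credentials, and the expected size are placeholders, and the column order of the procedure's output is assumed from the 3.1 documentation):

```python
import requests

NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"  # placeholder host
AUTH = ("neo4j", "secret")                                       # placeholder credentials
EXPECTED_CORE_CLUSTER_SIZE = 3                                   # match your neo4j.conf

def visible_core_members():
    """Return the rows from dbms.cluster.overview() whose role is a core role."""
    payload = {"statements": [{"statement": "CALL dbms.cluster.overview()"}]}
    resp = requests.post(NEO4J_URL, json=payload, auth=AUTH, timeout=10)
    resp.raise_for_status()
    rows = resp.json()["results"][0]["data"]
    # Assumed column order: id, addresses, role.
    return [row["row"] for row in rows if row["row"][2] in ("LEADER", "FOLLOWER")]

if __name__ == "__main__":
    cores = visible_core_members()
    print("%d of %d expected core members visible" % (len(cores), EXPECTED_CORE_CLUSTER_SIZE))
    if len(cores) < EXPECTED_CORE_CLUSTER_SIZE:
        print("WARNING: core redundancy is reduced")
```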