--workerCacheMB setting missing in apache beam 0.6? - google-cloud-dataflow

In Google Cloud Dataflow 1.x, I had access to a critical pipeline option called:
workerCacheMb
I tried to set it in my Beam 0.6 pipeline, but couldn't do so (it said that no such option existed). I then scoured through the options source code to see if any option had a similar name, but I still couldn't find it.
I need to set it because I think my workflow's incredible slowness is due to a side input that is 3 GB but appears to be taking well over 20 minutes to read. (I have a View.asList() and then I'm trying to do a for-loop on the list; it has been running for more than 20 minutes and is still going. Even at 3 GB, that's way too slow.) So I was hoping that setting workerCacheMb would help. (The only other theory I have is to switch from SerializableCoder to AvroCoder....)

Are you using the right class of options?
The following code works for me in Beam:
DataflowWorkerHarnessOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create()
        .cloneAs(DataflowWorkerHarnessOptions.class);
options.setWorkerCacheMb(3000);
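If you would rather pass the value on the command line, a sketch like the following may work, assuming workerCacheMb is still defined on DataflowWorkerHarnessOptions in your Beam version (register the interface first so the parser accepts the flag):
// Sketch, not verified against every Beam release: register the
// worker-harness options so the parser recognizes --workerCacheMb.
PipelineOptionsFactory.register(DataflowWorkerHarnessOptions.class);
DataflowWorkerHarnessOptions options = PipelineOptionsFactory
        .fromArgs(args) // e.g. args include "--workerCacheMb=3000"
        .withValidation()
        .as(DataflowWorkerHarnessOptions.class);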

Related

My code gives an "attempt to call a nil value" on the computer controlled seed analyzer in Minecraft

So I have spent a few hours looking for documentation on the "Computer Controlled Seed Analyzer" item, with no current information that is useful. My goal is to set up a seed analyzer that will check for a plant next to the analyzer and analyze it.
My code:
local sides = require("sides")
if hasPlant(sides.left) and isAnalyzed() == false then
analyze(side.left)
end
From my logic, I believe the outcome should be to analyze the seed, but instead it gives "attempt to call a nil value (global hasPlant)". From my research, sides was not defined at the time, therefore I added the local line. What else would I be missing?
Two problems here:
The mods involved are currently buggy, so OpenComputers integration doesn't work at all. I opened pull request #1260 for AgriCraft and #31 for InfinityLib that will fix it. Until it's fixed, there's nothing you can do in-game to make it work. If you don't want to wait for official releases with the fixes, you can use my unofficial builds of AgriCraft and of InfinityLib, which I used to test my PRs and the below code.
The Lua code you're writing is wrong. I'm not sure where you got it from, but here's how you make it work:
if component.agricraft_peripheral.hasPlant("EAST") and component.agricraft_peripheral.isAnalyzed() == false then
  component.agricraft_peripheral.analyze("EAST")
end
Of note:
The AgriCraft API takes the strings DOWN, UP, NORTH, SOUTH, WEST, and EAST, rather than the numeric constants from sides.
The functions provided by components in OpenComputers aren't globals; they're nested inside of component.
You may need local component = require("component"), so add it to the top if you get an error about it missing. (It works for me without it, but a bunch of documentation says you need it.)
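Putting the pieces together, a minimal complete script might look like this (an untested sketch assembled from the notes above):
-- Sketch: analyze the plant to the EAST if present and not yet analyzed.
local component = require("component") -- may be implicit, but documented as required
local analyzer = component.agricraft_peripheral

if analyzer.hasPlant("EAST") and analyzer.isAnalyzed() == false then
  analyzer.analyze("EAST")
end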

Error when performing schema changes in DSE 5.0

I am trying to get my head around using graphs for the first time - and as you can imagine, I am having a fair bit of trial and error.
Subsequently, I am doing a lot of:
create schema
find a mistake / modelling error
delete schema
rinse and repeat
All of which is completely fine, except for the fact that I seem to constantly be getting the following error:
Schema migration interrupted. The migration operation will continue in the background.
Now if I get this error when doing a schema.clear(), it actually doesn't continue in the background at all - it is lying!
I have to rerun the command, sometimes several times, to get the schema deleted.
And if that isn't annoying enough - I might end up with the following, too.
Script evaluation exceeded the configured threshold for the request: [149a3432-b1b3-45b7-8e68-d21c0325d877 - schema.clear()]
I have a single DC, two racks, with 2 nodes each - as a training cluster.
I am using DSE 5.0.1
I am using the GossipingPropertyFileSnitch.
(I also have the rack properties file for the above snitch type.)
And I also ensure that I have run:
:remote config timeout max
in the gremlin-console, too...
So I am not really sure how it can complain about timing out. And since this is all on my local PC in virtual machines, and is only being used by me, I don't understand how something is interrupting the command I just asked it to complete, either!
Thanks if anyone has any ideas!
-Gavin
With special thanks to Jeremy at DataStax, I have a solution for my timeout issue.
I still don't understand why it complained in the first place, given that I was the only person using the cluster, on virtual machines on my own PC... but nonetheless, I can now successfully complete commands in the gremlin console.
The required change is in dse.yaml:
alter the following configuration item to a value higher than the default of 30 seconds. (I set it to 180 sec.)
realtime_evaluation_timeout: 180 sec
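For reference, the relevant lines in dse.yaml end up looking something like this (a sketch; the exact placement within the file may differ between DSE versions):
# dse.yaml (sketch): raise the Gremlin realtime script evaluation
# timeout from the default of 30 sec.
# Typically the node needs a restart for the change to take effect.
realtime_evaluation_timeout: 180 sec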

BPEL, threads stuck on HashMap.getEntry?

I am new to SOA, and we have currently hit a problem when using BPEL to do some XML transformation.
We have 3 SOA projects that together do something like:
1. Read input files, in text format, from a folder
2. Save the file content in the database and put the file id on an AQ
3. Read the file id from the AQ, load the content from the database, and transform it to our internal XML format
4. Apply some business logic and transform the content back to text format
SOA project1 does steps 1-2, project2 does step 3, and project3 does step 4.
We are doing a load test with 7000 input files.
The problem we experienced is that the memory use of the "Old Generation" keeps accumulating; although a major GC can reduce it, it keeps growing until it reaches 100%. Then no new BPEL instance can be created, and we hit transaction timeouts.
After analyzing a heap dump, we got a result like the one below: it seems that BPELFactoryImpl holds a HashMap of more than 180 MB, and it keeps growing. Has anyone experienced something similar?
We use SOA version 12.1.3. This problem has stopped us for weeks; please help, thanks a lot.
Image of heap analysis
Guys, finally we got an answer on this: it was caused by a bug, as confirmed by Oracle Support, and we are waiting for the patch.
Thanks for your attention.
It's a bug. You should raise an SR referring to the threads stuck at:
at java.util.HashMap.getEntry(HashMap.java:465)
at java.util.HashMap.get(HashMap.java:417)
at oracle.xml.parser.v2.XMLNode.setUserData(XMLNode.java:2137)
at oracle.bpel.lang.v20.model.impl.ExtensibleElementImpl.doCreateElement(ExtensibleElementImpl.java:502)
at oracle.dp.entity.impl.EmFacadeObjectImpl.getElement(EmFacadeObjectImpl.java:35)
at oracle.bpel.lang.v20.model.impl.ExtensibleElementImpl.performDOMChange(ExtensibleElementImpl.java:707)
at oracle.bpel.lang.v20.model.impl.ExtensibleElementImpl.doOnChange(ExtensibleElementImpl.java:636)
at oracle.bpel.lang.v20.model.impl.ExtensibleElementImpl$DOMUpdater.notifyChanged(ExtensibleElementImpl.java:535)
at oracle.dp.notify.impl.NotifierImpl.emNotify(NotifierImpl.java:39)
at oracle.dp.entity.impl.EmHolderImpl.doNotifyOnSet(EmHolderImpl.java:53)
at oracle.dp.entity.impl.EmHolderImpl.set(EmHolderImpl.java:47)
at oracle.bpel.lang.v20.model.impl.CopyImpl.setTo(CopyImpl.java:115)
at com.collaxa.cube.engine.ext.bpel.v2.wmp.BPEL2xCallWMP$CallArgument$1.evaluate(BPEL2xCallWMP.java:190)
at com.collaxa.cube.engine.ext.bpel.v2.wmp.BPEL2xCallWMP.invokeMethod(BPEL2xCallWMP.java:103)
at com.collaxa.cube.engine.ext.bpel.v2.wmp.BPEL2xCallWMP.__executeStatements(BPEL2xCallWMP.java:62)
at com.collaxa.cube.engine.ext.bpel.common.wmp.BaseBPELActivityWMP.perform(BaseBPELActivityWMP.java:188)
at com.collaxa.cube.engine.CubeEngine.performActivity(CubeEngine.java:2880)
....
Bug 20857627 (20867804) : Performance issue due to large number of threads stuck in HashMap.get

Jenkins - retrieve full console output during build step

I have been scouring the internet for days; I have a problem similar to this.
I need to retrieve the console output in raw (plain) text. But if I can get it in HTML that is fine too, I can always parse it. The only thing is that I need to get it during the build step, which is a problem since the content at the location where it should be available is truncated...
I have tried retrieving the console output from the following URL's (relative to the job):
/consoleText
/logText/progressiveText
/logText/progressiveHTML
The two text ones are plain text and would be perfect if not for the truncation; the same goes for the HTML one... exactly what I need, only it's truncated...
I am sure it is possible to retrieve this information somehow, since when viewing /consoleFull there is a real-time update of the console, without truncation or buffering.
However, upon examining that web page, instead of finding the content I desired, I found this code where it should have been (I did not include the full page's code, since it would be mostly irrelevant, and I believe those answering will be able to find out what should be there on their own):
new Ajax.Request(href, {
    method: "post",
    parameters: {"start": e.fetchedBytes},
    requestHeaders: headers,
    onComplete: function(rsp, _) {
        var stickToBottom = scroller.isSticking();
        var text = rsp.responseText;
        if (text != "") {
            var p = document.createElement("DIV");
            e.appendChild(p); // Needs to be first for IE
            // Use "outerHTML" for IE; workaround for:
            // http://www.quirksmode.org/bugreports/archives/2004/11/innerhtml_and_t.html
            if (p.outerHTML) {
                p.outerHTML = '<pre>' + text + '</pre>';
                p = e.lastChild;
            }
            else p.innerHTML = text;
            Behaviour.applySubtree(p);
            if (stickToBottom) scroller.scrollToBottom();
        }
        e.fetchedBytes = rsp.getResponseHeader("X-Text-Size");
        e.consoleAnnotator = rsp.getResponseHeader("X-ConsoleAnnotator");
        if (rsp.getResponseHeader("X-More-Data") == "true")
            setTimeout(function() { fetchNext(e, href); }, 1000);
        else
            $("spinner").style.display = "none";
    }
});
Specifically, I am hoping there is a way for me to get the content of text, whatever it may be. I am not familiar with this language, so I am not sure how I might get the content I want. Plugins won't help, since I want to retrieve this content as part of my script during the build step.
You have already done pretty good investigation. I can only add the following: all console-related plug-ins I know of are designed as post-build actions.
The Log Trigger plugin provides a post-build action that allows Hudson builds to search their console log for a given regular expression and, if found, trigger additional downstream jobs.
So it looks like there is no straightforward solution to your problem. I can see the following options:
1. Use tee or something similar (applicable to shell build steps only)
This solution is far from universal, but it can provide quick access to the latest console output produced by a command or set of commands.
tee - read from standard input and write to standard output and files
Using synonyms (aliases) at the system level, other Jenkins build steps can be modified to produce a console output file as well; see the sketch below. The file with the console output can then be referenced through Jenkins or in any other way.
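For example, a shell build step along these lines keeps its own copy of the output ($WORKSPACE is set by Jenkins; the script name is illustrative):
# Sketch: duplicate this build step's console output into a workspace file.
./run_tests.sh 2>&1 | tee "$WORKSPACE/console_copy.log"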
2. Modify Jenkins code
You can just do a quick fix for internal usage or provide a patch introducing specific system-wide setting.
3. Mimic /console behavior
Code in your example is what the console page uses to request updates from the Jenkins server. As you might expect, the server side can return a chunk of information starting at some offset: the console page periodically sends a request with a straightforward start parameter, the response body is the chunk of output to be appended, and the next request is sent with an updated offset (start) value. You can easily tell there is no more data by analyzing the Content-Length of the response.
So the answer is: use url/job-name/build-number/logText/progressiveHtml, specify the start offset, send the request, and receive the console update.
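For instance, a rough Python sketch of that polling loop (the base URL is a placeholder, authentication is omitted, and this simply mirrors the Ajax code above):
import time
import requests

# Placeholder build URL; adjust to your Jenkins job and build number.
base = "http://jenkins.example.com/job/my-job/42"
start = 0
while True:
    rsp = requests.post(base + "/logText/progressiveText", params={"start": start})
    print(rsp.text, end="")
    # The server reports the next offset and whether more output is coming.
    start = rsp.headers.get("X-Text-Size", start)
    if rsp.headers.get("X-More-Data") != "true":
        break
    time.sleep(1)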
I had a similar issue: the last part of my Jenkinsfile build script needs to parse the console log for particular error messages to put in an email build report.
First attempt: HTTP request.
It felt like a hack; it mostly worked, but it ran into issues when we locked down access to the Jenkins server and my build nodes could no longer perform anonymous HTTP GETs on the page.
Second attempt: use the APIs to enumerate the log lines.
It felt like the right thing to do, but it failed horribly, as my nodes would take 30 minutes to get through the 100 MB log files. My presumption is that the Jenkins server was not caching the file, so each request involved re-reading the entire file up to the point of the last read.
Third and most successful solution: run grep on the server.
node('master') {
    sh 'grep some_criteria $JENKINS_HOME/workspace/path/to/job/console.log'
}
It was fast and reliable, and it didn't matter how big the log files were.
Yes, this required trust of the Jenkins admin and knowledge of the directory paths on the Jenkins server - but since I was the admin, I trusted myself to do the right thing. Your mileage may vary.
To add some insight: when a Jenkins build was in progress, the response for the .../consoleText URL maxed out at exactly 10,000 lines.
I was using the 'requests' package in Python. I tried the same URL with curl and again received only the first 10K lines.
Only after the build had finished did both methods return the full log (>22K lines in my case).
I will research further and hope to report back.
[2015-08-18] Update: It seems that this is a known issue (see here) and it's fixed in Jenkins 1.618 and later. I am still running 1.615 so I cannot verify.
Amir

symfony2 free -m out of memory

I have a symfony2 app out there, but we have RAM problems... It works like a charm when there are 50 active people (Google Analytics).
I usually select data from the DB like this:
$qb=$this->createQueryBuilder('s')
->addSelect('u')
->where('s.user = :user')
->andWhere('s.admin_status = false')
->andWhere('s.typ_statusu != :group')
->setParameter('user', $user)
->setParameter('group', 'group')
->innerJoin('s.user', 'u')
->orderBy('s.time', 'DESC')
->setMaxResults(15);
return $query=$qb->getQuery()->getResult();
The queries are fast; I don't have a problem with them.
Please let me know exactly what you need and I will paste it here. I really need to fix this.
BUT THE PROBLEM COMES NOW: when there are 470 people at the same time (Google Analytics), about 7 GB of memory is gone... then it falls back to 5 GB after the peak. But WHY SO MUCH??? My scripts take 10-17 MB of memory in app_dev.
I also use APC. How can I solve this situation? Why is so much memory consumed? Thanks for any advice!!
What's your average memory?
BTW: if I don't solve this, I will be in big trouble.
One problem could be Doctrine, if you are hydrating too many objects on every single request.
Set max execution time of a script to only 30 seconds:
max_execution_time = 30
Set APC shm_size to something reasonable compared to your memory:
apc.shm_size = 256M
Then optimize your query. And if you use PHP/Symfony in the CLI, you had better limit the resource usage for PHP in the CLI too.
Ensure you are interpreting memory consumption correctly: http://blog.scoutapp.com/articles/2010/10/06/determining-free-memory-on-linux
To speed up APC you can remove the modified-file check with apc.stat = 0, but you will then need to clear the APC cache every time you modify existing files: http://www.php.net/manual/en/apc.configuration.php#ini.apc.stat
To reduce memory consumption, reduce hydration by adding ->select('x') and fetching only the essential fields (see the sketch after these tips).
To optimize memory consumption, enable the MySQL query cache with something like this in /etc/mysql/my.cnf:
query_cache_size=128M
query_cache_limit=1M
Do not forget to enable and check your slow-query-log to avoid bottlenecks.
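As a sketch of the partial-hydration tip above (the entity and field names are illustrative, not from the question):
// Sketch: hydrate arrays of the essential columns instead of full objects.
$rows = $em->createQueryBuilder()
    ->select('s.id', 's.time')
    ->from('AppBundle:Status', 's')
    ->getQuery()
    ->getArrayResult();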
I suspect that your page makes more than one query. How many queries happen on the page? The worst thing in Doctrine is the ability to make queries through getters (e.g. getComments()). If you are using a many-to-many relation, this leads to huge problems. You can see all the queries via the profiler in the dev environment.
It is also possible that the problem is in the settings of Apache or PHP. Incorrect php-fpm settings lead to problems too. The best solution is to stress test your server with tools like siege and watch what goes on through htop or top. 300 people can be a heavy load for a "naked" Apache.
Have you ever tried to retrieve scalar results instead of a collection of objects?
// [...]
return $query=$qb->getQuery()->getScalarResult();
http://docs.doctrine-project.org/en/latest/reference/query-builder.html#executing-a-query
http://doctrine-orm.readthedocs.org/en/2.0.x/reference/dql-doctrine-query-language.html#query-result-formats
At the Symfony configuration level, have you double-checked your configuration to ensure caching has been enabled properly?
http://symfony.com/doc/current/reference/configuration/doctrine.html#caching-drivers
Detaching entities from your entity manager could prove useful depending on your overall process:
http://docs.doctrine-project.org/en/2.0.x/reference/working-with-objects.html#detaching-entities
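A minimal sketch of the detaching idea (assuming $em is your EntityManager):
// Sketch: stop tracking entities once you are done with them, so the
// unit of work no longer holds references to them.
foreach ($qb->getQuery()->getResult() as $entity) {
    // ... use $entity ...
    $em->detach($entity);
}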
