How to put a timeout on Computer.waitUntilOnline - Jenkins

Short version
In a Jenkins post-build groovy script, is there a way to have the Computer.waitUntilOnline function time out after a certain period of time?
Background
We do testing for embedded devices, and our Jenkins slaves are laptops connected to specific hardware setups. In certain situations we need a Groovy post-build script to reboot the machine and wait for it to come online again. However, sometimes these machines don't come back online, and our Groovy script just keeps waiting indefinitely.
The waitUntilOnline function can throw an InterruptedException, but from what I can tell you need to be running multiple threads in order to trigger that exception. Running multiple threads just to trigger a timeout seems like the wrong way to go about things.
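For reference, the multi-threaded version I have in mind would be something like this (untested sketch; I'm assuming that cancelling the Future is what delivers the InterruptedException to waitUntilOnline):
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// run waitUntilOnline() on a worker thread and bound the wait with Future.get(timeout)
def waitForOnline(computer, long timeoutMinutes) {
    def executor = Executors.newSingleThreadExecutor()
    def future = executor.submit({ computer.waitUntilOnline() } as Callable)
    try {
        future.get(timeoutMinutes, TimeUnit.MINUTES)
        return true
    } catch (TimeoutException ignored) {
        future.cancel(true)   // interrupts the thread that is stuck in waitUntilOnline()
        return false
    } finally {
        executor.shutdownNow()
    }
}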
I've also found some information on using the timeout step. That's intended for Jenkins Pipelines, so I'm not sure it can be used in post-build Groovy, and I haven't gotten it working. I've tried various combinations, including:
timeout(20)
{
    computer.waitUntilOnline()
}
and
timeout(20)
{
    waitUntil {
        try {
            computer.waitUntilOnline()
        } catch (exception) {
            manager.listener.logger.println("caught exception!");
        }
    }
}
but all seem to throw an exception like this:
groovy.lang.MissingMethodException: No signature of method: Script1.timeout() is applicable for argument types: (java.lang.Integer, Script1$_rebootAndWait_closure1) values: [20, Script1$_rebootAndWait_closure1@7b6564fc]
Any suggestions are appreciated.
EDIT:
I've also tried the @groovy.transform.TimedInterrupt annotation, as mentioned in this question, and I'm getting weird results.
When I run the simple loopy example, I get the expected result: it prints some value for i. However, if I attempt to combine this with rebooting the computer, like so:
import hudson.util.RemotingDiagnostics

def runCmd(computer, cmd)
{
    def channel = computer.getChannel()
    str = RemotingDiagnostics.executeGroovy( """
        p = '$cmd'.execute()
        p.waitFor()
        p.in.text
        """, channel )
}

@groovy.transform.TimedInterrupt( 3L )
def loopy()
{
    int i = 0
    try {
        while( true ) {
            i++
        }
    }
    catch( e ) {
        manager.listener.logger.println("i is " + i);
    }
}

def rebootAndWait(computer)
{
    manager.listener.logger.println("about to loopy : " + computer.name);
    cmd = 'shutdown /r /t 10 /c "Restarting after Jenkins test completed"'
    // cmd = "cmd.exe /c echo loopy test > c:\\\\Users\\\\frederikvs\\\\fvs_test.txt"
    runCmd(computer, cmd)
    // eventually we want to wait here for the computer to come back online, but for now we'll just have loopy
    loopy();
}
rebootAndWait(manager.build.getBuiltOn().toComputer())
I get weird results: sometimes I get the expected output of some value for i, and sometimes I get an uncaught exception:
java.util.concurrent.TimeoutException: Execution timed out after 3 units. Start time: Fri Aug 17 10:48:07 CEST 2018
at sun.reflect.GeneratedConstructorAccessor5665.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
at Script1.loopy(Script1.groovy)
at Script1.rebootAndWait(Script1.groovy:47)
at Script1$rebootAndWait.callCurrent(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:154)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:166)
at Script1.run(Script1.groovy:51)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SecureGroovyScript.evaluate(SecureGroovyScript.java:343)
at org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder.perform(GroovyPostbuildRecorder.java:380)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
at hudson.model.Build$BuildExecution.post2(Build.java:186)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
at hudson.model.Run.execute(Run.java:1819)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Here's the kicker: whether I get the value of i or the exception seems to depend on the previous run.
If I run it with the reboot command a few times, I always get the exception. If I run it with the echo command (currently commented out) a few times, I always get the expected result of printing i.
But if I run it with the reboot command, then switch it around to the echo command and run that a few times, the first time with the echo it will give the exception, after that it'll give the value for i.
And if I switch from the echo command to the reboot command, the first run with the reboot will also be fine (printing the value of i), and after that it'll start giving the exception.
I don't quite understand how the previous run can have an effect on the timeout of the present run...
Again, any input is appreciated!

Related

Glue job failed with "No space left on device" or "ArrayIndexOutOfBoundsException" when writing a huge data frame

I have a Glue job that:
creates dynamic frames from several data catalogs,
converts them to Spark DataFrames,
joins 4 DataFrames and performs the aggregation,
writes to S3 as CSV/Parquet files.
It had no problem with a medium-sized data source (about 20 GB in total), the G.1X worker type, 20 workers, and an execution time of about 40 min.
But when the data volume increased to 60 GB in total, with the G.2X worker type and 50 workers, the execution time increased to 4-6 hours and the job failed with this error:
21/08/07 16:41:27 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
File "/tmp/test-deal-stepfun.py", line 213, in <module>
df.coalesce(1).write.partitionBy("log_date_syd").mode("overwrite").csv(args['DEST_FOLDER'])
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 927, in csv
self._jwrite.csv(path)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o253.csv.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:664)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task creation failed: java.lang.ArrayIndexOutOfBoundsException: 0
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.spark.scheduler.CompressedMapStatus.getSizeForBlock(MapStatus.scala:119)
at org.apache.spark.MapOutputTrackerMaster$$anonfun$getLocationsWithLargestOutputs$1.apply(MapOutputTracker.scala:612)
at org.apache.spark.MapOutputTrackerMaster$$anonfun$getLocationsWithLargestOutputs$1.apply(MapOutputTracker.scala:599)
at org.apache.spark.ShuffleStatus.withMapStatuses(MapOutputTracker.scala:192)
at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:599)
at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:568)
at org.apache.spark.sql.execution.ShuffledRowRDD.getPreferredLocations(ShuffledRowRDD.scala:152)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
...
BTW, I have job parameters to optimize memory and disk:
"--conf: spark.executor.memory=20g --conf: spark.driver.memory=20g --conf: spark.driver.memoryOverhead=10g --conf: spark.executor.memoryOverhead=10g" to add more memory to the Spark driver and executors.
"--write-shuffle-files-to-s3: true" redirects intermediate files to S3 to give the worker nodes more space.
In the job script, set S3 retries:
conf = SparkConf()
conf.set("spark.hadoop.fs.s3.maxRetries","20").set("spark.hadoop.fs.s3.sleepTimeSeconds","30")
In the job script, add options when creating the dynamic frames:
"useS3ListImplementation": True,
"groupFiles": "InPartition",
"groupSize": "10485760"
Optimized the Spark job code: dropped unused columns before the join, and applied distinct for the left join.
The errors are related to "no space left on device", or "ArrayIndexOutOfBoundsException" when writing.
The metrics pattern: (screenshot of the job metrics omitted)
How can I avoid these failures when writing huge data in a Glue job? Thanks a lot!
I recently encountered this same issue while running an AWS Glue job configured to use S3 for shuffle. In my case the issue was that I had incorrectly set the configuration for spark.shuffle.glue.s3ShuffleBucket. Once I fixed my job parameters to --conf spark.shuffle.glue.s3ShuffleBucket=s3://mybucket/mypath, with the key being --conf and the value being spark.shuffle.glue.s3ShuffleBucket=s3://mybucket/mypath, it worked.

Jenkins Out of Memory error while running multiple JMeter scripts

I have a JMeter/Jenkins/Maven/Perforce framework for API testing, where each JMX file contains around 200 test cases. Each test case is written inside a thread group with one user only. Each test case has some preconditions that fetch data from multiple DBs, then hits the request, then applies multiple assertions, some of them using Beanshell. We have also used a lot of custom jars so that we can access the server and read the logs or edit the properties files there.
If we run the scripts from JMeter they run smoothly, but if we run them from Jenkins, say 20 JMX files at a time sequentially, then after some time (it may be 1, 2 or 17 hours) it fails with an out of memory error.
My current Jenkins server config is like this:
free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        3.1G       12.9G         16M         15G         24G
I have already tweaked the heap space with values like 6/8 and 12/12.
Log at the time of failure:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:811)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.xerces.xni.XMLString.toString(Unknown Source)
at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.handleCharacter(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
at <unknown class>.<unknown method>(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at utils.APIReportProcessing.fetchAPIReportDetail(APIReportProcessing.java:84)
at jmeterRun.RunProcess.prepareFinalResults(RunProcess.java:179)
at jmeterRun.RunProcess.executeJMeterAndWriteResults(RunProcess.java:158)
at jmeterRun.ControllerJMeter.main(ControllerJMeter.java:115)
... 6 more
Here is the code from the APIReportProcessing part where it fails:
public static void fetchAPIReportDetail(String rawXMLReportFile) {
    File rawXMLReport = null;
    try {
        rawXMLReport = new File(rawXMLReportFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(rawXMLReport);
        doc.getDocumentElement().normalize();
        individualModuleCount.add(passCount + "," + totalTestCount);
    } catch (Exception var13) {
        var13.printStackTrace();
        Logging.log("info", "Error in fetching up data from XML file. Exception:" + var13.getMessage());
    } finally {
        try {
            rawXMLReport.delete();
        } catch (Exception var12) {
            var12.printStackTrace();
            Logging.log("error", "Error in deleting XML data file. Exception:" + var12.getMessage());
        }
    }
}
Thanks,
Bibek
You need to increase the heap not for JMeter but for Jenkins; check out the How to add Java arguments to Jenkins? article for instructions.
You might want to switch to XmlSlurper or XmlParser; both are SAX-based, so the memory footprint should be smaller (see the sketch after these suggestions).
According to JMeter Best Practices you should rather be using the CSV output format for JMeter result files; if you need the count of passed requests you could go for the JMeterPluginsCMD Command Line Tool.
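For the XmlSlurper suggestion, counting passed samples could look roughly like this (sketch only; it assumes a standard XML .jtl result file where each sample element carries an s="true|false" success attribute, and reuses the rawXMLReportFile variable from the posted code):
import groovy.util.XmlSlurper   // groovy.xml.XmlSlurper in Groovy 3+

// parse the JMeter XML result file without building a full DOM tree
def samples = new XmlSlurper().parse(new File(rawXMLReportFile)).children()
int totalTestCount = samples.size()
int passCount = samples.findAll { it.@s.text() == 'true' }.size()
println "passed ${passCount} of ${totalTestCount}"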
Sorry for the delayed response.
In my case the issue was resolved by updating to the correct Java version. The previous Java version was an IBM JVM, which is where the issue occurred. I changed it to OpenJDK version 1.8.0_191.
Thanks,
Bibek

Jenkins active-directory Could not find matching constructor

Caution: I'm good with EnterpriseLinux, but assume I'm a reluctant pedestrian at best for Jenkins, jars, wars, jpis, java, and groovy. I'm so sorry.
I've got a Jenkins box set up on RHEL7, mainly via the (admittedly rotting) Chef cookbook, so it's repeatable and nearly idiot-proof. When it comes to adding the module, I'm adding the HPIs by local file (secure site, no net access) like so:
plugins = %w(active-directory mailer display-url-api)
require 'digest'
plugins
  .each_with_index do |plugin_with_version, index|
    p, v = plugin_with_version.split(':') # yeah I know
    source = "#{Chef::Config[:file_cache_path]}/cookbooks/#{cookbook_name}/files/default/#{p}.hpi"

    directory "#{node['jenkins']['master']['home']}/plugins" do
      owner node['jenkins']['master']['user']
      group node['jenkins']['master']['group']
      mode 0755
    end

    cookbook_file "#{node['jenkins']['master']['home']}/plugins/#{p}.hpi" do
      action :create
      owner node['jenkins']['master']['user']
      group node['jenkins']['master']['group']
      mode 0755
      notifies :create, "ruby_block[jenkins_restart_flag]", :immediately
    end
  end
When I pre-game the files portion with HPIs, it populates the /var/lib/jenkins/plugins location, so I think I'm getting there.
# ls -l /var/lib/jenkins/plugins/
total 708
drwxr-xr-x 6 jenkins jenkins 77 Aug 30 08:37 active-directory
-rwxr-xr-x 1 jenkins jenkins 583280 Aug 30 08:37 active-directory.hpi
drwxr-xr-x 4 jenkins jenkins 53 Aug 30 08:37 display-url-api
-rwxr-xr-x 1 jenkins jenkins 19478 Aug 30 08:37 display-url-api.hpi
drwxr-xr-x 4 jenkins jenkins 53 Aug 30 08:37 mailer
-rwxr-xr-x 1 jenkins jenkins 115745 Aug 30 08:37 mailer.hpi
In fact, all three plugins seem to be active in /pluginManager/installed:
active directory plugin 2.8
Display URL API 2.2.0
Mailer Plugin 1.21
.. and the two deps have their boxes checked and dimmed, where the AD plugin is just checked. That suggests that they're installed and activated, but I'm guessing.
Now to configuring the AD plugin, I think, and here's where things go horribly wrong today.
Here's the script I'm using, about the 5th such script (Google's my only friend here when the brain's outta clues):
import hudson.plugins.active_directory.*
import jenkins.model.*

def instance = Jenkins.getInstance();

def ActiveDirectoryDomain adDomain = new ActiveDirectoryDomain("Example_Domain_Name_2", "Example_Domain_Controller_2");
def domains = new ArrayList<ActiveDirectoryDomain>();
domains.add(adDomain);

def securityRealm = new ActiveDirectorySecurityRealm(
    "",
    domains,
    "",
    "",
    "",
    "",
    GroupLookupStrategy.RECURSIVE,
    false,
    true,
    null)

println(securityRealm.domains)
instance.setSecurityRealm(securityRealm)
instance.save()
But the invocation totally bails. The meat of the error message, removing the chef bleating, is:
---- Begin output of "/usr/lib/jvm/java-1.8.0/bin/java" -jar "/var/chef/cache/jenkins-cli.jar" -s http://localhost:8080 -"remoting" groovy = ----
STDOUT:
STDERR: Aug 30, 2018 1:32:03 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class hudson.cli.ClientAuthenticationCache$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
ERROR: Unexpected exception occurred while performing groovy command.
groovy.lang.GroovyRuntimeException: Could not find matching constructor for: hudson.plugins.active_directory.ActiveDirectorySecurityRealm(java.lang.String, java.util.ArrayList, java.lang.String, java.lang.String, java.lang.String, java.lang.String, hudson.plugins.active_directory.GroupLookupStrategy, java.lang.Boolean, java.lang.Boolean, null)
at groovy.lang.MetaClassImpl.invokeConstructor(MetaClassImpl.java:1732)
at groovy.lang.MetaClassImpl.invokeConstructor(MetaClassImpl.java:1532)
at org.codehaus.groovy.runtime.callsite.MetaClassConstructorSite.callConstructor(MetaClassConstructorSite.java:49)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at RemoteClass.run(RemoteClass:9)
at groovy.lang.GroovyShell.runScriptOrMainOrTestOrRunnable(GroovyShell.java:263)
at groovy.lang.GroovyShell.run(GroovyShell.java:518)
at groovy.lang.GroovyShell.run(GroovyShell.java:497)
at hudson.cli.GroovyCommand.run(GroovyCommand.java:89)
at hudson.cli.CLICommand.main(CLICommand.java:280)
at hudson.cli.CliManagerImpl.main(CliManagerImpl.java:95)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:929)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:903)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:855)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at hudson.cli.CliManagerImpl$1.call(CliManagerImpl.java:66)
at hudson.remoting.CallableDecoratorAdapter.call(CallableDecoratorAdapter.java:18)
at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
---- End output of "/usr/lib/jvm/java-1.8.0/bin/java" -jar "/var/chef/cache/jenkins-cli.jar" -s http://localhost:8080 -"remoting" groovy = ----
No Joy, right? Here's the choicest cut:
groovy.lang.GroovyRuntimeException: Could not find matching constructor for: hudson.plugins.active_directory.ActiveDirectorySecurityRealm(java.lang.String, java.util.ArrayList, java.lang.String, java.lang.String, java.lang.String, java.lang.String, hudson.plugins.active_directory.GroupLookupStrategy, java.lang.Boolean, java.lang.Boolean, null)
Now, before we dive into this one, I do want to say that the other (really 3 or 4) scripts I also tried, like the Internet van-candy it is, also bailed with similar constructor errors. I can run those and present the errors for comparison if need be, but what I want to suggest is that it smells like a larger problem, that somehow my addons aren't dropping in the code they should be, even though it all looks okay. Again, still guessing.
And yes, in the other 3-4 attempts I tuned the scripts with internal custom data; this one, with morale so low, I didn't even bother. But I promise that I used valid data with the rest, and the plan is to use real values if we can get past the constructor error.
And the questions, in a very particular order:
what's a known-good invocation for groovy to create that AD config? With the most recent of what appears to be a very changing codebase?
Has anyone else seen this groovy constructor issue with a similar-ish setup?
Any hints to get me closer to a win?
Thanks for reading this far, and I hope your day is going very well. ;-)
All Groovy scripting in Jenkins is a very thin layer over the actual Java objects, so to find the right constructor we need to look at the plugin's code: https://github.com/jenkinsci/active-directory-plugin/blob/1b082cbfb7d236d326c218c7b474fb51cb930080/src/main/java/hudson/plugins/active_directory/ActiveDirectorySecurityRealm.java#L224-L270
If we take the first constructor as an example:
ActiveDirectorySecurityRealm(String domain, String site, String bindName, String bindPassword, String server)
So you would call that like:
def securityRealm = new ActiveDirectorySecurityRealm("Example_Domain_Name_2", null, null, null, "Example_Domain_Controller_2")
Or something like that.
Firstly, remember to restart Jenkins after adding a plugin! That seems like a big thing to stress.
Following Noah's hints, here's what worked for me:
import hudson.plugins.active_directory.*
import jenkins.model.*

def instance = Jenkins.getInstance();

// public ActiveDirectorySecurityRealm(String domain, String site, String bindName, String bindPassword, String server)
def securityRealm = new ActiveDirectorySecurityRealm(
    'myRealm',
    'Default-First-Site-Name',
    'bindaddr@myRealm',
    'bindpassword_cleartext',
    'ad_server1fqdn,ad_server2fqdn'
)

securityRealm.getDomains().each({
    it.site = securityRealm.site
    it.bindName = securityRealm.bindName
    it.bindPassword = securityRealm.bindPassword
})

instance.setSecurityRealm(securityRealm)
instance.save()
Note that I added in the tweak from Konstantinos here too.
.. and it worked! I think. No alerts. That is, until I understood that the lack of guard code and cleanup meant it constantly refreshed bits it didn't need to and never removed bits it should; definitely a pattern we don't like to perpetuate where I'm using it in automation. So I'm stuck replicating the config anyway, but at least I have the model to generate it. Still a great day.
In the future:
find the best constructor from this week's version of the code
use that to form the simplest call in your script
check whether Konstantinos' tweak is still needed
Many thanks to Konstantinos and Noah here. I apparently can't recommend you give them credit, but I am very grateful.

Galago 3.5 Indexing

I downloaded the Galago 3.5 bin version and tried to index wiki-small.corpus following this guide. Strangely, I get a FileNotFoundException for the .index file when trying to run the build-index command. This error goes away when I explicitly use the inputPath and indexPath parameters, but now I get this exception instead:
Created executor: org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor@69107c05
Running without server!
Use --server=true to enable web-based status page.
Stage inputSplit completed with 0 errors.
Mar 14, 2014 3:26:01 PM org.lemurproject.galago.core.parse.UniversalParser process
INFO: Processing split: /Users/nanz/Downloads/wiki-small.corpus
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:137)
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:52)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$TupleUnshredder.processTuple(DocumentSplit.java:2033)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$DuplicateEliminator.processTuple(DocumentSplit.java:1989)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyTuples(DocumentSplit.java:1705)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntilFileId(DocumentSplit.java:1732)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntil(DocumentSplit.java:1740)
at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedReader.run(DocumentSplit.java:1940)
at org.lemurproject.galago.tupleflow.FileOrderedReader.run(FileOrderedReader.java:76)
at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:96)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.lemurproject.galago.core.parse.UniversalParser.constructParserWithSplit(UniversalParser.java:213)
at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:132)
... 10 more
Caused by: java.lang.NullPointerException
at org.lemurproject.galago.core.index.KeyValueReader.getManifest(KeyValueReader.java:35)
at org.lemurproject.galago.core.index.corpus.CorpusReader.init(CorpusReader.java:41)
at org.lemurproject.galago.core.index.corpus.CorpusReader.(CorpusReader.java:32)
at org.lemurproject.galago.core.parse.CorpusSplitParser.(CorpusSplitParser.java:33)
... 16 more
Stage parsePostings completed with 1 errors.
java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1062)
at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:971)
at org.lemurproject.galago.tupleflow.execution.JobExecutor.runWithoutServer(JobExecutor.java:1122)
at org.lemurproject.galago.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1177)
at org.lemurproject.galago.core.tools.AppFunction.runTupleFlowJob(AppFunction.java:101)
at org.lemurproject.galago.core.tools.apps.BuildIndex.run(BuildIndex.java:789)
at org.lemurproject.galago.core.tools.AppFunction.run(AppFunction.java:55)
at org.lemurproject.galago.core.tools.App.run(App.java:82)
at org.lemurproject.galago.core.tools.App.run(App.java:73)
at org.lemurproject.galago.core.tools.App.main(App.java:69)
Caused by: java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:99)
at java.lang.Thread.run(Thread.java:695)
I tried building from source and got the same results in that case as well. Can somebody point out where I am going wrong? Hardly anybody seems to have faced this issue, so there's not much I can find via a simple Google search.
Solved. Just in case someone else faces this issue: one of my friends figured out that Galago would not work directly on the wiki-small.corpus file, as it tries to look for corpus.keys, which does not exist for this file. Just replace the .corpus file with the directory of documents and everything will work just fine. Do specify the indexPath and inputPath parameters explicitly; use "galago build help" to view the exact syntax. Cheers.
I know this is late, but the wiki-small.corpus file from the textbook's website was built with an old version of Galago, namely the 1.0 series, which is preserved in this Google Code repository: https://code.google.com/p/galagosearch/
The newer releases of Galago (2.0 ... 3.5 ... 3.7) are part of newer development under the Lemur Project on SourceForge, and the corpus format has since changed. If you had a corpus file built with Galago 3.5, your commands should have worked.

Flume NullPointerExceptions on checkpoint

I've set up a file-to-file source/sink, just as a test of basic Flume functionality.
I'm currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile, so that the tail command constitutes a stream.
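For reference, a minimal exec-source / file-channel / file_roll config for this kind of setup looks roughly like this (the agent and component names and the paths are placeholders, not my exact config):
agent.sources  = r1
agent.channels = c1
agent.sinks    = k1

# exec source tailing the test file
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /path/to/mytmpfile
agent.sources.r1.channels = c1

# file channel
agent.channels.c1.type = file
agent.channels.c1.checkpointDir = /var/flume/checkpoint
agent.channels.c1.dataDirs = /var/flume/data

# rolling file sink
agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /path/to/output
agent.sinks.k1.channel = c1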
However, I've started seeing the following exception in the flume logs:
java.lang.IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? From scanning the code it appears it may be related to a missing folder or directory, but I can't find the exact line on the GitHub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to one of two problems:
1) If you're running multiple agents on the same box, make sure you configure them to have separate dataDirs and checkpointDir (see the sketch after this list).
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, Flume will complain. Try stopping the Flume agent, unmounting tmpfs, enlarging it, remounting, and restarting the agent.
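For point 1, giving each agent its own file channel directories looks something like this (untested sketch; the paths are placeholders):
# agent 1
agent1.channels.c1.type = file
agent1.channels.c1.checkpointDir = /var/flume/agent1/checkpoint
agent1.channels.c1.dataDirs = /var/flume/agent1/data

# agent 2
agent2.channels.c1.type = file
agent2.channels.c1.checkpointDir = /var/flume/agent2/checkpoint
agent2.channels.c1.dataDirs = /var/flume/agent2/data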
