Nutch: Job Failed - ruby-on-rails

I have a problem running the Nutch inject step.
This is the command I am running:
bin/nutch inject bin/crawl/crawldb bin/urls
After running it, I get the following error:
Injector: starting at 2014-04-02 13:02:29
Injector: crawlDb: bin/crawl/crawldb
Injector: urlDir: bin/urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 2
Injector: total number of urls injected after normalization and filtering: 0
Injector: Merging injected urls into crawl db.
Injector: overwrite: false
Injector: update: false
Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
at org.apache.nutch.crawl.Injector.run(Injector.java:316)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Injector.main(Injector.java:306)
I am running Nutch for the first time.
I have checked that Solr and Nutch are installed properly.
The details below are from the log file:
java.io.IOException: The temporary job-output directory file:/usr/share/apache-nutch-1.8/bin/crawl/crawldb/1639805438/_temporary doesn't exist!
at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-04-02 12:54:46,251 ERROR crawl.Injector - Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
at org.apache.nutch.crawl.Injector.run(Injector.java:316)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Injector.main(Injector.java:306)

I was running
bin/nutch inject bin/crawl/crawldb bin/urls
to inject, instead of
bin/nutch inject crawl/crawldb bin/urls
Switching to the second command solves the error.
To fetch the URLs I also made changes to the regex-urlfilter.txt file, and I am now able to fetch them.
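The injector output in the question reports two URLs rejected by filters and zero injected, which is why the regex-urlfilter.txt change mattered. As a rough illustration of how such a rule file is read (each rule is a + or - followed by a regular expression, the first matching rule decides, and a URL matching no rule is ignored), here is a toy Python model. It is not Nutch's code, and the rules and URLs are made up for the example rather than taken from the poster's file.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose regex is found in the URL decides; no match means rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rule set (assumption), not the poster's actual regex-urlfilter.txt.
RULES = parse_rules("""
# skip URLs containing query-like characters
-[?*!@=]
# accept anything under example.com
+^https?://([a-z0-9-]+\\.)*example\\.com/
""")

print(accepts(RULES, "http://www.example.com/index.html"))
print(accepts(RULES, "http://www.example.com/search?q=1"))
print(accepts(RULES, "http://other.org/"))
```

If every seed URL falls through to a reject rule or matches nothing, the injector would report them all as rejected, so the seed list is worth checking against the rules before running inject.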

Make sure you don't have any syntax errors in any of your nutch config files.

Related

Dataflow launcher Payload Java

I built a stream with an http source and a dataflow-launcher sink to execute a Spring Batch task named batchPY546Task.
To launch this task I need to set the localFilePath=path-of-the-file parameter.
According to the documentation, with the http source it is possible to pass this information through the payload:
https://github.com/spring-cloud-stream-app-starters/tasklauncher-dataflow/blob/master/spring-cloud-starter-stream-sink-task-launcher-dataflow/README.adoc
{
"name":"foo",
"deploymentProps": {"key1":"val1","key2":"val2"},
"args":["--debug", "--foo", "bar"]
}
I tried many syntaxes, such as:
curl http://localhost:57110 -H"Content-Type:application/json" -d '{"name":"batchPy546Task", "args":{"localFilePath=/tmp/remote-files1/BLM-54.00.01_Multicontrat_Creation_IDCRT011-b.xml"}}'
and all of them fail:
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of java.util.ArrayList<java.lang.Object> out of START_OBJECT token
at [Source: (byte[])"{"name":"batchPy546Task", "args":{"localFilePath=/tmp/remote-files1/BLM-54.00.01_Multicontrat_Creation_IDCRT011-b.xml"}}"; line: 1, column: 34] (through reference chain: org.springframework.cloud.stream.app.task.launcher.dataflow.sink.LaunchRequest["args"])
at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59) ~[jackson-databind-2.10.2.jar!/:2.10.2]
How can I pass the parameter localFilePath to my task?
Versions:
dataflow server 14.2
skipper server 2.3.2
The dataflow launcher is up to date and compatible with dataflow server 2.4.2.
Regards
Problem solved with this syntax:
curl http://localhost:57110 -H"Content-Type:application/json" -d '{"name":"batchPy546Task", "args":["localFilePath=/tmp/remote-files1/BLM-54.00.01_Multicontrat_Creation_IDCRT011-b.xml"]}'
args must be a JSON list: "args":["localFilePath=/tmp/..."]
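To avoid hand-writing the JSON, the payload can be built programmatically so that "args" is always serialized as a list. The following is a minimal Python sketch; build_launch_request is a hypothetical helper name, and only the field names ("name", "args", "deploymentProps") come from the README excerpt quoted in the question.

```python
import json


def build_launch_request(task_name, args=None, deployment_props=None):
    """Return a JSON payload whose "args" is always a list of strings."""
    if args is None:
        args = []
    if isinstance(args, (str, dict)):
        # A bare string or an object here is what triggers the
        # "Cannot deserialize instance of java.util.ArrayList" error.
        raise TypeError('"args" must be a list such as ["key=value"]')
    request = {"name": task_name, "args": [str(a) for a in args]}
    if deployment_props:
        request["deploymentProps"] = dict(deployment_props)
    return json.dumps(request)


payload = build_launch_request(
    "batchPy546Task",
    ["localFilePath=/tmp/remote-files1/BLM-54.00.01_Multicontrat_Creation_IDCRT011-b.xml"],
)
print(payload)
```

The printed string can be passed to curl with -d, exactly as in the working command above.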

Azure Functions Dependency Injection Failing Again

I have an Azure Functions V2 application running in Azure. For about the third time now over various releases, my custom dependency injection is failing again!
This time it is worse than ever, as the issue only happens when the app is deployed to Azure, and finding out from the logs what was happening was a nightmare.
The Functions host in Azure currently starts as follows, with the usual error that it is unable to index my functions.
2018-11-10T10:41:56.091 [Information] Host Status: {
"id": "ad-api-dev",
"state": "Default",
"version": "2.0.12165.0",
"versionDetails": "2.0.12165.0 Commit hash: f9d6c271296eaae48f933c67d1df796fd4ff2a94"
}
2018-11-10T10:41:57.607 [Information] Initializing Host.
2018-11-10T10:41:57.607 [Information] Host initialization: ConsecutiveErrors=0, StartupCount=1
2018-11-10T10:41:57.644 [Information] Starting JobHost
2018-11-10T10:41:57.655 [Information] Starting Host (HostId=ad-api-dev, InstanceId=387865be-cfdf-45ca-98ba-2e0091827461, Version=2.0.12165.0, ProcessId=5140, AppDomainId=1, InDebugMode=True, InDiagnosticMode=False, FunctionsExtensionVersion=~2)
2018-11-10T10:41:57.674 [Information] Loading functions metadata
2018-11-10T10:41:57.723 [Information] 15 functions loaded
2018-11-10T10:41:58.158 [Information] Generating 15 job function(s)
2018-11-10T10:41:58.237 [Error] Error indexing method 'CompleteSimulation.Run'
Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexingException : Error indexing method 'CompleteSimulation.Run' ---> System.InvalidOperationException : Cannot bind parameter 'executionService' to type IExecutionService. Make sure the parameter Type is supported by the binding. If you're using binding extensions (e.g. Azure Storage, ServiceBus, Timers, etc.) make sure you've called the registration method for the extension(s) in your startup code (e.g. builder.AddAzureStorage(), builder.AddServiceBus(), builder.AddTimers(), etc.).
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsyncCore(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 277
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsync(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 167
End of inner exception
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsync(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 175
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexTypeAsync(Type type,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 103
2018-11-10T10:41:58.574 [Warning] Function 'CompleteSimulation.Run' failed indexing and will be disabled.
My local version runs fine, as shown below:
Azure Functions Core Tools (2.2.32 Commit hash: c5476ae629a0a438d6850e58eae1f5203c896cd6)
Function Runtime Version: 2.0.12165.0
[10/11/2018 13:35:15] Building host: startup suppressed:False, configuration suppressed: False
[10/11/2018 13:35:15] Reading host configuration file 'Y:\src\modo.solutions\Modo.Esg.Api\ModoSol.Functions.Api\bin\Debug\netstandard2.0\host.json'
[10/11/2018 13:35:15] Host configuration file read:
[10/11/2018 13:35:15] {
[10/11/2018 13:35:15] "version": "2.0"
[10/11/2018 13:35:15] }
[10/11/2018 13:35:17] Initializing extension with the following settings: Initializing extension with the following settings:
You can see I am using the same runtime version locally as in Azure, so if anyone knows why my dependency injection would work locally but not when deployed, it would be appreciated.
I am using Autofac 4.8.1 as my dependency injection container.
Thanks in advance.
BTW, if anyone else cannot see why their Functions host is not starting, follow these instructions:
Go into the Portal for your Azure function and enable Live Log Streaming.
While this window is open, re-deploy/publish your app.
Once that's done, go into Cloud Explorer (Visual Studio, Windows only :/) and find the logs.
If you don't have the live log streaming window open while you deploy, you don't get any logs!

Error when queuing build with SonarQube: Unauthorized

I am trying to integrate SonarQube with TFS. I created a build definition with only one step, the SonarQube integration, based on this tutorial:
https://blogs.msdn.microsoft.com/visualstudioalm/2015/08/24/build-tasks-for-sonarqube-analysis/
I know my SonarQube server is set up, because I can access it through the browser and the database is correctly configured.
However, I am getting this error:
14:45:53.684 Default properties file was not found at C:\BuildAgents\DefaultBuildAgent\5\.sonarqube\bin\SonarQube.Analysis.xml
14:45:53.762 Updating build integration targets...
14:45:53.84 Fetching analysis configuration settings...
Unhandled Exception: System.Net.WebException: The remote server returned an error: (401) Unauthorized.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at System.Net.WebClient.DownloadString(String address)
at SonarQube.TeamBuild.PreProcessor.WebClientDownloader.Download(String url)
at SonarQube.TeamBuild.PreProcessor.SonarWebService.GetProperties(String projectKey, String projectBranch)
at SonarQube.TeamBuild.PreProcessor.TeamBuildPreProcessor.FetchArgumentsAndRulesets(ISonarQubeServer server, ProcessedArgs args, TeamBuildSettings settings, IDictionary`2& serverSettings, AnalyzerSettings& analyzerSettings)
at SonarQube.TeamBuild.PreProcessor.TeamBuildPreProcessor.DoExecute(ProcessedArgs args)
at SonarQube.TeamBuild.PreProcessor.TeamBuildPreProcessor.Execute(String[] args)
at SonarQube.TeamBuild.PreProcessor.Program.Main(String[] args)
Pre-processing succeeded.
Unexpected exit code received from batch file: 255
******************************************************************************
Finishing task: SonarQubePreBuild
******************************************************************************
Task SonarQubePreBuild failed. This caused the job to fail. Look at the logs for the task for more details.
******************************************************************************
Finishing Build
******************************************************************************
Worker Worker-28c6fdb7-9350-4b65-bbba-0e9aab5e0e83 finished running job 28c6fdb7-9350-4b65-bbba-0e9aab5e0e83
You need to specify the authentication token in the SonarQube service endpoint in TFS.
To obtain a user token, generate one in SonarQube.
Be sure the sonar.login and sonar.password properties in SonarQube.Analysis.xml are commented out; otherwise the token won't be used.
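A SonarQube user token is sent as the user name of HTTP Basic authentication with an empty password. If you want to check a token outside the build, the header can be constructed as in this small Python sketch; the token value is a placeholder and token_auth_header is a hypothetical helper name, not part of any SonarQube tooling.

```python
import base64


def token_auth_header(token):
    """Build an HTTP Basic Authorization header with the token as the user
    name and an empty password (note the trailing colon)."""
    credentials = f"{token}:".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + encoded}


# Placeholder token (assumption); substitute the one generated in SonarQube.
headers = token_auth_header("my-hypothetical-token")
print(headers)
```

Sending a request with this header to the server and no longer getting a 401 would suggest the token itself is fine and the problem lies in the endpoint configuration.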

Angular - Error: [$injector:unpr] Unknown provider: IdleProvider

I'm getting the error Error: [$injector:unpr] Unknown provider: IdleProvider in my application when it is deployed to our staging server using Dokku, but not when running it on my local machine. I'm using ng-idle 1.2.1.
I've found this question asked a number of times, but the cause was always related to the changes made in version 1.0.0, where the service names were changed. The only thing I can think of is that minification of the code is the problem. As far as I can see the code should be OK, but I am not an expert. Any help would be greatly appreciated.
It's written in CoffeeScript:
configuration = (RestangularProvider, $logProvider, growlProvider, IdleProvider, KeepaliveProvider) ->
  .
  .
  .
  return
configuration.$inject = [
  'RestangularProvider'
  '$logProvider'
  'growlProvider'
  'IdleProvider'
  'KeepaliveProvider'
]
angular
  .module 'vssApp.config', [
    'restangular'
  ]
  .config configuration
EDIT
While trying to replicate the problem on my local machine I removed the 'ngIdle' module in the modules array below. This resulted in the same behavior so I am assuming that the problem stems from the ngIdle module not being loaded correctly here. I still feel that minification could be causing the problem but, again, I'm not sure why or how to fix it.
modules = [
  'ui.router'
  'ui.bootstrap'
  'ui.select'
  'ngAnimate'
  'ngMessages'
  'ngSanitize'
  'ngCookies'
  'smart-table'
  'angularMoment'
  'templates'
  'angular-storage'
  'angular-growl'
  'vssApp.core.auth'
  'vssApp.core.loading'
  'ngIdle'
  'cgPrompt'
  'vssApp.filters'
]
runBlock.$inject = [
  '$templateCache'
]
angular
  .module 'vssApp.core', modules
  .run runBlock
EDIT 2
Here's the full output from the error message I'm getting
Error: [$injector:modulerr] Failed to instantiate module vssApp due to:
Error: [$injector:modulerr] Failed to instantiate module vssApp.config due to:
Error: [$injector:unpr] Unknown provider: IdleProvider
http://errors.angularjs.org/1.3.16/$injector/unpr?p0=IdleProvider
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:18814
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:16489
at getService (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:14903)
at Object.invoke (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:15466)
at runInvokeQueue (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13793)
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:14062
at forEach (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:19482)
at loadModules (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13587)
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13964
at forEach (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:19482)
http://errors.angularjs.org/1.3.16/$injector/modulerr?p0=vssApp.config&p1=E…net%2Fassets%2Fapplication-85a5fd382c73380bf2a71b66e581c941.js%3A3%3A19482)
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:18814
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:14406
at forEach (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:19482)
at loadModules (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13587)
at https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13964
at forEach (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:19482)
at loadModules (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:13587)
at createInjector (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:4:16844)
at doBootstrap (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:28466)
at bootstrap (https://SERVER/assets/application-85a5fd382c73380bf2a71b66e581c941.js:3:28995)
http://errors.angularjs.org/1.3.16/$injector/modulerr?p0=vssApp&p1=Error%3A…net%2Fassets%2Fapplication-85a5fd382c73380bf2a71b66e581c941.js%3A3%3A28995)
A rule of thumb is to load a module's dependencies in every module where they are used. This decouples the modules from each other and eliminates the race condition with service provider injection.
If the app looks like this
angular.module('vssApp', ['vssApp.config', 'ngIdle', ...])...
angular.module('vssApp.config', ['restangular'])...
the service provider for the Idle service is not yet defined when the vssApp.config module is loaded.
While this
angular.module('vssApp', ['ngIdle', 'vssApp.config', ...])
angular.module('vssApp.config', ['restangular'])...
avoids the race condition, it is still a code smell.
It should be
angular.module('vssApp.config', ['restangular', 'ngIdle'])...
This issue applies only to service providers and the config phase. Service instances can be injected regardless of module order.
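The ordering argument can be made concrete with a toy model. The Python sketch below is not AngularJS code; it assumes a simplified loader that, for each module, first loads the modules it requires in listed order, then registers the module's providers and runs its config block. The module names come from this thread; the loader and its data layout are invented for illustration.

```python
class UnknownProvider(Exception):
    pass


def bootstrap(modules, root):
    """modules maps name -> {"requires": [...], "provides": [...], "config": [...]}.
    Returns the order in which modules finished loading."""
    providers, loaded, order = set(), set(), []

    def load(name):
        if name in loaded:
            return
        loaded.add(name)
        module = modules[name]
        for dep in module.get("requires", []):   # dependencies first, in listed order
            load(dep)
        providers.update(module.get("provides", []))
        for needed in module.get("config", []):  # config block asks for providers
            if needed not in providers:
                raise UnknownProvider(f"Unknown provider: {needed} (while loading {name})")
        order.append(name)

    load(root)
    return order


def app(root_requires, config_requires):
    """Build the toy module table for a given pair of dependency lists."""
    return {
        "vssApp": {"requires": root_requires},
        "vssApp.config": {"requires": config_requires, "config": ["IdleProvider"]},
        "restangular": {"provides": ["RestangularProvider"]},
        "ngIdle": {"provides": ["IdleProvider", "KeepaliveProvider"]},
    }


# 1. ngIdle listed after vssApp.config on the root module: config runs too early.
try:
    bootstrap(app(["vssApp.config", "ngIdle"], ["restangular"]), "vssApp")
except UnknownProvider as err:
    print(err)

# 2. ngIdle listed first: works, but only because of the ordering.
print(bootstrap(app(["ngIdle", "vssApp.config"], ["restangular"]), "vssApp"))

# 3. vssApp.config declares its own dependency: works regardless of root order.
print(bootstrap(app(["vssApp.config"], ["restangular", "ngIdle"]), "vssApp"))
```

In the third variant the config module no longer relies on whatever happens to be listed before it elsewhere, which is the decoupling the rule of thumb is after.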
I finally found the cause and the solution: it seems to have been a Bower issue.
It's a Rails app, so I specified ng-idle 1.2.1 in the bower.json file, but for some reason the Bower file was ignored when the app was deployed using Dokku, and the last installed version, 0.3.5, remained. That meant the pre-1.0.0 ng-idle naming convention was still in use, where all service names were preceded by a $. This resulted in the Unknown provider: IdleProvider error because $IdleProvider was the actual service name.
In the end I had to connect to the docker container and remove and reinstall all bower components. Running bower update as part of the deployment was not enough for some reason. When I have more time I will investigate what caused this behavior and I will report here.

Jenkins-Testlink integration - HTTP server returned unexpected status: Found

I'm trying to connect Jenkins (1.482) with TestLink (1.9.4) through the Jenkins configuration in order to retrieve tests, but while running the job in Jenkins I get the error below in the console log.
Please note that Jenkins is hosted on Tomcat (Linux) on network “gnb” and TestLink is hosted on PHP (Linux) on another network, “<company network name>”. It works well when both are on my localhost (on Windows),
but the integration does not work when Jenkins and TestLink are on separate networks/hosts.
I get the following error on the console while running the job:
Preparing TestLink client API.
Using TestLink URL: http://<hostname>/mr61_php5/testlink/lib/api/xmlrpc.php
FATAL: Error verifying developer key: HTTP server returned unexpected status: Found
br.eti.kinoshita.testlinkjavaapi.util.TestLinkAPIException: Error verifying developer key: HTTP server returned unexpected status: Found
at br.eti.kinoshita.testlinkjavaapi.MiscService.checkDevKey(MiscService.java:66)
at br.eti.kinoshita.testlinkjavaapi.TestLinkAPI.<init>(TestLinkAPI.java:162)
at hudson.plugins.testlink.TestLinkBuilder.getTestLinkSite(TestLinkBuilder.java:244)
at hudson.plugins.testlink.TestLinkBuilder.perform(TestLinkBuilder.java:134)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1502)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
Caused by: org.apache.xmlrpc.client.XmlRpcHttpTransportException: HTTP server returned unexpected status: Found
at org.apache.xmlrpc.client.XmlRpcSunHttpTransport.getInputStream(XmlRpcSunHttpTransport.java:94)
at org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:152)
at org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
at org.apache.xmlrpc.client.XmlRpcSunHttpTransport.sendRequest(XmlRpcSunHttpTransport.java:69)
at org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:167)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:158)
at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:147)
at br.eti.kinoshita.testlinkjavaapi.BaseService.executeXmlRpcCall(BaseService.java:90)
at br.eti.kinoshita.testlinkjavaapi.MiscService.checkDevKey(MiscService.java:62)
... 12 more
ERROR: Error communicating with TestLink. Check your TestLink configuration.
I have the following settings in Jenkins's global configuration for the TestLink installation:
Name: testlink
URL: http://<host name>/mr61_php5/testlink/lib/api/xmlrpc.php
Developer key: generated from Testlink (Settings->Generate a new key)
Can you please point out what I am missing?
Usually, in the TestLink folder structure, the path you mentioned does not contain the xmlrpc.php file.
Probably the wrong URL: http://<hostname>/mr61_php5/testlink/lib/api/
The correct URL is usually of this format:
.../testlink/lib/api/xmlrpc//xmlrpc.php
Kindly check the correct URL, or try opening the xmlrpc.php page, so that you can get the correct path of the file. My assumption is that it should be something like this:
http://<hostname>/mr61_php5/testlink/lib/api/xmlrpc/xmlrpc.php
Good answer. In my case, with version 1.9.11 of TestLink, it is as below:
http://IP:PORT/testlink/lib/api/xmlrpc/v1/xmlrpc.php
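"Found" is the reason phrase of HTTP status 302, so the error in the question suggests the server answered the XML-RPC call with a redirect instead of serving xmlrpc.php at that URL. Since the location of the file evidently differs between installations, one option is to list the candidate URLs mentioned in this thread and try each one. The Python sketch below only builds the list and classifies a status code; the base URL is a made-up placeholder, and candidate_urls and describe_status are hypothetical helper names.

```python
from urllib.parse import urljoin

# Relative locations of xmlrpc.php mentioned in this thread.
CANDIDATE_PATHS = [
    "lib/api/xmlrpc.php",
    "lib/api/xmlrpc/xmlrpc.php",
    "lib/api/xmlrpc/v1/xmlrpc.php",
]


def candidate_urls(testlink_base):
    """Return the full URLs to try for a given TestLink base URL."""
    base = testlink_base.rstrip("/") + "/"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]


def describe_status(status):
    """Rough reading of an HTTP status when probing for the endpoint."""
    if 200 <= status < 300:
        return "reachable"
    if 300 <= status < 400:
        return "redirect: the request was sent elsewhere, check the URL"
    if status == 404:
        return "not found"
    return "other"


# Placeholder base URL (assumption); use your own TestLink address.
for url in candidate_urls("http://testlink.example/testlink"):
    print(url)
print(describe_status(302))
```

Whichever candidate is reachable from the Jenkins host, rather than from your own machine, is the one to put in the Jenkins global configuration.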
