I am very new to Kafka and Samza. I tried the hello-samza example and it is working. What I am looking for is to create a Samza task that reads messages from a Kafka topic. The task I added does not throw any error, but it is not reading any messages from the topic. The YARN UI shows the task as accepted. Not sure what I am doing wrong here.
Here is the class:
public class MyTask implements StreamTask {
    @Override
    public void process(IncomingMessageEnvelope incomingMessageEnvelope, MessageCollector messageCollector, TaskCoordinator taskCoordinator) throws Exception {
        System.out.println(" key - " + incomingMessageEnvelope.getKey() + " | message " + incomingMessageEnvelope.getMessage());
    }
}
Here is the properties file:
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=addresses
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=samza.examples.wikipedia.task.MyTask
task.inputs=addressestopic
# Serializers
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
# Kafka System
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=json
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.producer.bootstrap.servers=localhost:9092
# Job Coordinator
job.coordinator.system=kafka
# Add configuration to disable checkpointing for this job once it is available in the Coordinator Stream model
# See https://issues.apache.org/jira/browse/SAMZA-465?focusedCommentId=14533346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14533346 for more details
job.coordinator.replication.factor=1
If the YARN UI shows that your job is in the "ACCEPTED" state and not "RUNNING", then it is possible that YARN has not yet found resources to run your Samza job.
I have usually noticed this happening when the local box you are executing on runs out of disk space.
Can you check the YARN UI (localhost:8088) and verify that "Active Nodes" (at the top) == 1?
Additionally, you can cross-check that the number of "Lost Nodes" is zero.
If there are lost nodes, you can click on the link to see why they are unavailable for use.
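If you prefer to check the same thing programmatically, the ResourceManager exposes these counts over its REST API. A minimal sketch, assuming the RM web UI is at the default localhost:8088 and the requests library is installed:
import requests  # pip install requests

# Query the ResourceManager's cluster metrics endpoint for node counts.
metrics = requests.get('http://localhost:8088/ws/v1/cluster/metrics').json()['clusterMetrics']
print('Active nodes: %d' % metrics['activeNodes'])
print('Lost nodes: %d' % metrics['lostNodes'])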
Situation
Cron is deployed (alongside the Rails app) to GCP with cron.yaml:
cron:
- description: count things regularly
  url: /api/v1/cron/rake_task
  schedule: every 30 minutes
  timezone: Europe/Berlin
Question
How do I see the cron log? The View link reveals nothing at all, but clearly there has to be a sensible way to debug the failure.
On a standard environment one could go after /var/log/syslog or /var/log/cron.log, but here there is nothing if I log in to the VM or even go after the main gaeapp container.
Any leads would be welcome!
The cron log gets written into the log of the default service.
Instead of using the View link, the log of the default service should be filtered for the Rake task route that the cron is calling.
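The same filter can be applied programmatically. A rough sketch with the google-cloud-logging Python client, assuming application-default credentials and that the route below matches your cron.yaml (for the flexible environment the exact resource type and payload field may differ):
from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client()
# Filter App Engine request logs down to the cron route from cron.yaml.
log_filter = ('resource.type="gae_app" '
              'protoPayload.resource="/api/v1/cron/rake_task"')
for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.payload)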
I have a base HTTP request handler that all the other request handlers inherit. I wrapped it in a try-catch so that I can email myself the stack trace every time an exception occurs.
Here is how I did it in python/webapp2; I'm sure you could set up something similar in Ruby. (Alternatively you could use a product like https://rollbar.com/)
import traceback

import webapp2
from google.appengine.api import mail, app_identity


def email_dev(subject, body):
    try:
        app_id = app_identity.get_application_id()
        mail.AdminEmailMessage(
            sender="%s <no-reply@%s.appspotmail.com>" % (app_id, app_id),
            subject=subject,
            body=body,
        ).send()
    except Exception as e:
        print("Error sending email: " + str(e))


class BaseHandler(webapp2.RequestHandler):
    def dispatch(self, *args, **kwargs):
        try:
            return super(BaseHandler, self).dispatch(*args, **kwargs)
        except Exception:
            # Email the full stack trace, then re-raise so normal error
            # handling still runs.
            email_dev(
                "Handler: " + self.__class__.__name__ + " failed",
                traceback.format_exc())
            raise
Every example I've seen (task-launcher sink and triggertask source) shows how to launch the task defined by the uri attribute.
My task definitions look like this:
sampleTask <t2: timestamp || t1: timestamp>
sampleTask-t1 timestamp
sampleTask-t2 timestamp
sampleTaskRunner composed-task-runner --graph=sampleTask
My question is: how do I launch the composed task runner (sampleTaskRunner, defined via DSL) from a stream application?
Thanks
UPDATE
I ended up with the solution below, which triggers the task using the SCDF REST API:
composedTask definition:
<timestamp || mySampleTask>
Stream definition:
http | httpclient | log
Deployment properties:
app.http.port=81
app.httpclient.body=name=composedTask&arguments=--increment-instance-enabled=true
app.httpclient.http-method=POST
app.httpclient.url=http://localhost:9393/tasks/executions
app.httpclient.headers-expression={'Content-Type':'application/x-www-form-urlencoded'}
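For a quick sanity check outside the stream, the same launch request can be issued directly against the SCDF server. A minimal sketch with Python requests, mirroring the httpclient settings above (host and port assume a local SCDF server):
import requests  # pip install requests

# POST the task launch request, exactly as the httpclient sink does above;
# requests encodes the dict as application/x-www-form-urlencoded.
response = requests.post(
    'http://localhost:9393/tasks/executions',
    data={'name': 'composedTask',
          'arguments': '--increment-instance-enabled=true'})
print(response.status_code, response.text)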
Though it's easy to implement an http sink component, it would be great if the stream application starters provided one out of the box.
Another concern I have is about discovering the SCDF REST URL when deployed in a distributed environment.
Here's a quick take from one of SCDF's R&D team members (Glenn Renfro).
stream create foozer --definition "trigger --fixed-delay=5 | tasklaunchrequest-transform --uri=maven://org.springframework.cloud.task.app:composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT --command-line-arguments='--graph=sampleTask-t1||sampleTask-t2 --increment-instance-enabled=true --spring.datasource.url=jdbc:mariadb://localhost:3306/test --spring.datasource.username=root --spring.datasource.password=password --spring.datasource.driverClassName=org.mariadb.jdbc.Driver' | task-launcher-local" --deploy
In the foozer stream definition,
1) "trigger" source happens to trigger an upstream event every 5s
2) "tasklaunchrequest-transform" processor takes a few arguments; more specifically, it uses "composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT" to launch a composed-task graph (i.e., sampleTask-t1||sampleTask-t2)
3) Pay attention to --increment-instance-enabled. This was recently added to the CTR application, and it provides the ability to re-launch a composed-task on a recurring cadence
4) Since the CTR and SCDF must share the same database, we are also passing datasource properties as command-line args. (SCDF-server is already started with the same datasource credentials)
Hope this helps.
Lastly, we will add a sample to the reference guide via: spring-cloud/spring-cloud-dataflow#1780
I think I followed every step in the document, but I still ran into this exception. (The only difference is that I run this from Eclipse J2EE, but I wouldn't expect that to really matter, would it?)
Code: (I didn't write this; it's straight from the Beam project example.) I think you'd have to specify a Google Cloud Platform project and provide the right credentials to access it. However, I didn't find anywhere in this example project where that setup happens.
public static void main(String[] args) {
    // Create a PipelineOptions object. This object lets us set various execution
    // options for our pipeline, such as the runner you wish to use. This example
    // will run with the DirectRunner by default, based on the class path configured
    // in its dependencies.
    PipelineOptions options = PipelineOptionsFactory.create();

    // Create the Pipeline object with the options we defined above.
    Pipeline p = Pipeline.create(options);

    // Apply the pipeline's transforms.
    // Concept #1: Apply a root transform to the pipeline; in this case, TextIO.Read to read a set
    // of input text files. TextIO.Read returns a PCollection where each element is one line from
    // the input text (a set of Shakespeare's texts).
    // This example reads a public data set consisting of the complete works of Shakespeare.
    p.apply(TextIO.Read.from("gs://apache-beam-samples/shakespeare/*"))
    .....
)
Exception:
Exception in thread "main" java.lang.IllegalStateException: Failed to validate gs://apache-beam-samples/shakespeare/*
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:309)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:205)
at org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
at org.apache.beam.runners.direct.DirectRunner.apply(DirectRunner.java:296)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:388)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:302)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:152)
at google.dataflow.beam.example.MinimalWordCount.main(MinimalWordCount.java:77)
Caused by: java.io.IOException: Unable to match files in bucket apache-beam-samples, prefix shakespeare/ against pattern shakespeare/[^/]*
at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:234)
at org.apache.beam.sdk.util.GcsIOChannelFactory.match(GcsIOChannelFactory.java:53)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:304)
... 8 more
Caused by: com.google.api.client.http.HttpResponseException: 400 Bad Request
{
"error" : "invalid_grant"
}
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:207)
at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:149)
at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:135)
at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.ResilientOperation$AbstractGoogleClientRequestExecutor.call(ResilientOperation.java:166)
at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:66)
at com.google.cloud.hadoop.util.ResilientOperation.retry(ResilientOperation.java:103)
at org.apache.beam.sdk.util.GcsUtil.expand(GcsUtil.java:227)
... 10 more
Try running it from the command prompt if you are using Windows.
Go to the folder containing the pom.xml file and open cmd there.
Then issue the command with the respective arguments:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args=" --output=counts" -Pdirect-runner
If you want to run it with your own input file, make a txt file with any name and put it in the folder containing the pom. Then fire the following command:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=YOURFILENAME.txt --output=counts" -Pdirect-runner
Hope this will do. As for the rest, I am looking into your issue.
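Also, since the stack trace shows UserCredentials failing with invalid_grant while refreshing a token, it may be worth checking which Application Default Credentials your environment actually resolves; if they are stale, re-running gcloud auth application-default login usually refreshes them. A minimal sketch to inspect them, assuming the google-auth library is installed:
import google.auth  # pip install google-auth

# Resolve the same Application Default Credentials the Beam pipeline would use.
credentials, project = google.auth.default()
print(credentials, project)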
I would like to monitor Elasticsearch using Nagios.
Basically, I want to know if Elasticsearch is up.
I think I can use the Elasticsearch Cluster Health API (see here)
and use the 'status' that I get back (green, yellow or red), but I still don't know how to use Nagios for that matter (Nagios is on one server and Elasticsearch is on another).
Is there another way to do that?
EDIT:
I just found check_http_json. I think I'll try it.
After a while, I've managed to monitor Elasticsearch using NRPE.
I wanted to use the Elasticsearch Cluster Health API, but I couldn't use it from another machine due to security issues...
So, on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. And on the remote server, where Elasticsearch is, I've edited the nrpe.cfg file with the following:
command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green
This is allowed, since the command is run from the remote server itself, so there are no security issues here...
It works!!!
I'll still try the check_http_json command that I posted in my question, but for now, my solution is good enough.
After playing around with the suggestions in this post, I wrote a simple check_elasticsearch script. It returns the status as OK, WARNING, and CRITICAL corresponding to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).
It also grabs all the other parameters from the health page and dumps them out in the standard Nagios format.
Enjoy!
Shameless plug: https://github.com/jersten/check-es
You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and individual node heap usage.
You can use this cool Python script for monitoring your Elasticsearch cluster. The script checks your IP:port for the Elasticsearch status. This and more Python scripts for monitoring Elasticsearch can be found here.
#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):
        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                         "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                         "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                         "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")

if __name__ == "__main__":
    ESClusterHealthCheck().run()
I wrote this a million years ago, and it might still be useful: https://github.com/radu-gheorghe/check-es
But it really depends on what you want to monitor. The above measures:
if Elasticsearch responds to HTTP
if the ingestion rate drops below the defined levels
if the total number of documents drops below the defined levels
But of course there's much more that might be interesting, from query time to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/
Elasticsearch has APIs for all these, so you may be able to use a generic check_http_json to get the needed metrics. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which gets these metrics out of the box, then forward threshold/anomaly alerts to Nagios. (disclosure: I work for Sematext)
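For instance, per-node JVM heap usage can be pulled from the nodes stats API in a few lines. A rough sketch, assuming a node reachable at localhost:9200 and the requests library:
import requests  # pip install requests

# Fetch JVM stats for every node in the cluster and print heap usage.
stats = requests.get('http://localhost:9200/_nodes/stats/jvm').json()
for node_id, node in stats['nodes'].items():
    print('%s: %d%% heap used' % (node['name'], node['jvm']['mem']['heap_used_percent']))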
I'm trying to stop a Windows service on a local machine (the service is Topshelf.Host, if that matters) with this code:
serviceController.Stop();
serviceController.WaitForStatus(ServiceControllerStatus.Stopped, timeout);
timeout is set to 1 hour, but the service never actually gets stopped. The strange thing is that in the Services MMC snap-in I see it in the "Stopping" state first, but after a while it reverts to "Started". However, when I try to stop it manually, an error occurs:
Windows could not stop the Topshelf.Host service on Local Computer.
Error 1061: The service cannot accept control messages at this time.
Am I missing something here?
I know I am quite late to answer this, but I faced a similar issue, i.e., the error "The service cannot accept control messages at this time.", and would like to add this as a reference for others.
You can try killing this service using PowerShell (run PowerShell as administrator):
# Get the PID of the required service from its service name (here, 'service name').
$ServicePID = (get-wmiobject win32_service | where { $_.name -eq 'service name'}).processID
# Now, with this PID, you can kill the service process.
taskkill /f /pid $ServicePID
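If you'd rather do the same from Python, a rough equivalent using psutil (Windows only; 'service name' is again a placeholder) might look like this:
import psutil  # pip install psutil

# Look up the service's process by service name and force-kill it.
pid = psutil.win_service_get('service name').pid()
psutil.Process(pid).kill()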
Either your service is busy processing some big operation or it is in transition between states, and hence is not able to accept any more input... just think of it as having bitten off more than it can chew...
If you are sure that you haven't fed it anything big, just go to Task Manager and kill the process for this service, or restart your machine.
I had the exact same problem with a Topshelf-hosted service. The cause was a long service start time, more than 20 seconds. This left the service in a state where it was unable to process further requests.
I was able to reproduce the problem only when the service was started from the command line (net start my_service).
Proper initialization for a Topshelf service with a long start time is as follows:
namespace Example.My.Service
{
    using System;
    using System.Threading.Tasks;
    using Topshelf;

    internal class Program
    {
        public static void Main()
        {
            HostFactory.Run(
                x =>
                {
                    x.Service<MyService>(
                        s =>
                        {
                            MyService testServerService = null;
                            s.ConstructUsing(name => testServerService = new MyService());
                            s.WhenStarted(service => service.Start());
                            s.WhenStopped(service => service.Stop());
                            s.AfterStartingService(
                                context =>
                                {
                                    if (testServerService == null)
                                    {
                                        throw new InvalidOperationException("Service not created yet.");
                                    }

                                    testServerService.AfterStart(context);
                                });
                        });
                    x.SetServiceName("my_service");
                });
        }
    }

    public sealed class MyService
    {
        private Task starting;

        public void Start()
        {
            this.starting = Task.Run(() => InitializeService());
        }

        private void InitializeService()
        {
            // TODO: Provide service initialization code.
        }

        [CLSCompliant(false)]
        public void AfterStart(HostControl hostStartedContext)
        {
            if (hostStartedContext == null)
            {
                throw new ArgumentNullException(nameof(hostStartedContext));
            }

            if (this.starting == null)
            {
                throw new InvalidOperationException("Service start was not initiated.");
            }

            while (!this.starting.Wait(TimeSpan.FromSeconds(7)))
            {
                hostStartedContext.RequestAdditionalTime(TimeSpan.FromSeconds(10));
            }
        }

        public void Stop()
        {
            // TODO: Provide service shutdown code.
        }
    }
}
I've seen this issue as well, specifically when a service is start-pending and I send it a stop programmatically, which succeeds but does nothing. Also, sometimes I see stop commands to a running service fail with this same exception but then still actually stop the service. I don't think the API can be trusted to do what it says. This explanation of the error message is quite helpful:
http://technet.microsoft.com/en-us/library/cc962384.aspx
I ran into a similar issue and found out it was due to one of the services getting stuck in a start-pending, stop-pending, or stopped state.
Rebooting the server or trying to restart the services did not work.
To solve this, I ran Task Manager on the server, and in the "Details" tab I located the stuck services and killed the processes by ending the task. After ending the task I was able to restart the services without problems.
In brief:
1. Go to Task Manager.
2. Click on the "Details" tab.
3. Locate your service.
4. Right-click on it and stop/kill the process.
That is it.
I know this was opened a while ago, but I am somewhat missing the option using the Windows command prompt, so just for the sake of completeness:
Open Task Manager and find the respective process and its PID, e.g. PID = 111.
Alternatively, you can narrow it down by the executable file, e.g. image name = notepad.exe.
In the command prompt, use the TASKKILL command.
example: TASKKILL /F /PID 111 ; TASKKILL /F /IM notepad.exe
I had this exact issue internally when starting and stopping a service using PowerShell (via Octopus Deploy). The root cause of the service not responding to messages appeared to be devs accessing files/folders within the root service install directory via an SMB connection (looking at a config file with notepad/explorer).
If the service gets stuck in that situation, the only option is to kill it and sever the connections using Computer Management. After that, the service could be redeployed fine.
This may not be the exact root cause, but it is something we now check for.
I faced a similar issue. This error sometimes occurs because the service can no longer accept control messages; this may be due to disk space issues on the server where that particular service's log file is located.
If this occurs, you can consider the option below as well.
Go to the location where the service exe and its log file are located.
Free up some space.
Kill the service's process via Task Manager.
Start the service.
I just fought this problem while moving code from an old multi-partition box to a newer single-partition box. On service stop I was writing to D:, and since it didn't exist anymore, I got a 1061 error. Any long operation during OnStop will cause this, though, unless you spin the call off to another thread with a callback delegate.