Sometimes my Travis builds fail because there are too many active Sauce Connect tunnels. I tried limiting the number of concurrent builds on Travis to 5 (then 4...) but it still happens. I searched for the error message but couldn't find any useful results.
Here is an example: https://travis-ci.org/SEL-Columbia/dokomoforms/jobs/50537933
Is there a standard way to deal with this?
This actually is not a Travis problem; Sauce is throttling the tunnels. Your license with Sauce determines how many concurrent tunnels you can have, but I believe there is also an upper limit, and this could be a bug in the integration. Did you check your .travis.yml where the tunnel is created? What happens if you run without Sauce Connect?
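If the failures really are leftover tunnels piling up, one workaround is to clean them up before a build starts. A rough sketch in Python, assuming the classic Sauce Labs v1 REST endpoints under /rest/v1/<user>/tunnels and the SAUCE_USERNAME / SAUCE_ACCESS_KEY variables that the Travis Sauce addon exports (verify both against the current Sauce docs):

# Sketch: list and shut down leftover Sauce Connect tunnels for this account.
# The endpoint layout and environment variable names are assumptions, not verified.
import os
import requests

user = os.environ["SAUCE_USERNAME"]
key = os.environ["SAUCE_ACCESS_KEY"]
base = "https://saucelabs.com/rest/v1/{}/tunnels".format(user)

tunnel_ids = requests.get(base, auth=(user, key)).json()
for tunnel_id in tunnel_ids:
    print("Shutting down tunnel", tunnel_id)
    requests.delete("{}/{}".format(base, tunnel_id), auth=(user, key))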
With a Kafka cluster of 3 brokers and a ZooKeeper ensemble of the same size, I brought up one distributed connector node. This node ran successfully with a single task. I then brought up a second connector; it appeared to run, since some of the code in the task definitely executed, but it didn't seem to stay alive (no errors were thrown; the failure to stay alive was observed as a lack of expected activity, while the first connector continued to function correctly). When I call the URL http://localhost:8083/connectors/mqtt/tasks on each connector node, it tells me the connector has one task. I would expect this to be two tasks, one for each node/worker. (Currently the worker configuration says tasks.max = 1, but I've also tried setting it to 3.)
When I try and bring up a third connector, I get the error:
"POST /connectors HTTP/1.1" 500 90 5
(org.apache.kafka.connect.runtime.rest.RestServer:60)
ERROR IO error forwarding REST request:
(org.apache.kafka.connect.runtime.rest.RestServer:241)
java.net.ConnectException: Connection refused
Trying to call the connector POST method again from the shell returns the error:
{"error_code":500,"message":"IO Error trying to forward REST request:
Connection refused"}
I also tried upgrading to Apache Kafka 0.10.1.1, which was released today, but I'm still seeing the problem. The connectors each run in isolated Docker containers built from a single image, so they should be identical.
The problem could be that I'm running the POST request to http://localhost:8083/connectors on each worker, when I only need to run it once on a single worker, and the tasks for that connector will then be distributed automatically to the other workers. If this is the case, how do I get the tasks to distribute? I currently have tasks.max set to three, but only one task appears to be running, on a single worker.
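For reference, a rough sketch of that single-submission approach; the connector class and settings here are placeholders rather than the real MQTT connector config. In distributed mode the connector is registered once, against any one worker, and the workers in the group then decide among themselves where the tasks run:

# Sketch: register the connector once, against a single worker.
import requests

connector = {
    "name": "mqtt",
    "config": {
        # Placeholder class/settings: substitute the actual MQTT connector config.
        "connector.class": "com.example.mqtt.MqttSourceConnector",
        "tasks.max": "3",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())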
Update
I ultimately got things running using essentially the approach Yuri suggested. I gave each worker a unique group ID, then gave each connector the same name. This allowed the three connectors and their single tasks to share a single offset, so that in the case of sink connectors the messages they consumed from Kafka were not duplicated. They are basically running as standalone connectors, since the workers have different group IDs and thus won't communicate with each other.
If the connector workers have the same group ID, you can't add more than one connector with the same name, and if you give the connectors different names, they will have different offsets and consume duplicate messages. If you have three workers in the same group, one connector, and three tasks, you would in theory have the ideal situation: the tasks share an offset, and the workers make sure the tasks are always running and well distributed (with each task consuming a unique set of partitions). In practice, though, the connector framework doesn't create more than one task, even with tasks.max set to 3 and a topic with 25 partitions.
If anyone knows why I'm seeing this behaviour, please let me know.
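For anyone debugging the same thing, the connector status endpoint reports which worker each task landed on, which makes it easy to see whether tasks.max is actually being honoured. A minimal sketch (the connector name mqtt and port 8083 are taken from the setup above):

# Sketch: show each task's state and the worker it is assigned to.
import requests

status = requests.get("http://localhost:8083/connectors/mqtt/status").json()
print("connector state:", status["connector"]["state"])
for task in status["tasks"]:
    print("task {} -> {} ({})".format(task["id"], task["worker_id"], task["state"]))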
I've encountered a similar issue in the same situation as yours.
tasks.max is configured per connector, and the distributed workers automatically decide which nodes handle the topic. So if you have 3 workers in a cluster and your configuration says tasks.max=2, then only 2 of the 3 workers will process the topic. In theory, if one of those workers fails, the third should pick up the workload. But...
The distributed connector turned out to be very unreliable: once I added or removed nodes, the cluster broke down and all the workers did nothing but try (and fail) to elect a leader. The only way to fix it was to restart the whole cluster, preferably with all workers restarted simultaneously.
I chose another way: I used standalone workers, and that works like a charm for me, because load distribution is implemented at the Kafka client level, and when a worker drops out, the group rebalances automatically and the remaining clients pick up the orphaned partitions.
P.S. Maybe this will be useful for you too: the Confluent connector does not tolerate invalid payloads that don't match the topic's schema. Once the connector gets an invalid message it dies silently, and the only way to find out is to analyze the metrics.
I'm posting an answer to an old question, since Kafka Connect has moved on a lot in three years.
In the latest version (2.3.1) there is incremental rebalancing which massively improves the behaviour of Kafka Connect.
It's also worth noting that when configuring Kafka Connect, rest.advertised.host.name must be set correctly; if it isn't, you will see errors including the one quoted below:
{"error_code":500,"message":"IO Error trying to forward REST request: Connection refused"}
See this post for more details.
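A quick sanity check for that setting is to confirm that every worker's advertised host/port is reachable from the other workers, since follower-to-leader request forwarding uses exactly those addresses. A small sketch with placeholder hostnames (substitute each worker's rest.advertised.host.name and rest.port):

# Sketch: probe each advertised REST endpoint; "Connection refused" here
# usually means rest.advertised.host.name points at the wrong address.
import socket

advertised = [("worker-1", 8083), ("worker-2", 8083), ("worker-3", 8083)]  # placeholders

for host, port in advertised:
    try:
        with socket.create_connection((host, port), timeout=5):
            print("OK   {}:{}".format(host, port))
    except OSError as exc:
        print("FAIL {}:{} -> {}".format(host, port, exc))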
I work on a large open source project based on Ruby on Rails. We use GitHub, Travis, Code Climate, and others. Our test suite takes a long time to run, and we have many pull requests opened and updated throughout the day, which creates a large backlog. We even implemented a build killer in our bot to prevent unnecessary builds, but we still have a backlog. Is it possible for us to host our own runner to increase the number of workers?
There's Travis CI Enterprise (https://enterprise.travis-ci.com/), which lets people host their own runners, but that's probably only for paying customers. Have you swapped over to the container-based builds? That might speed things up a bit. What's the project?
I have been using Jenkins for CI for the past year, but sometimes the service does not start automatically even though it is configured as "Automatic" in Windows services, and no error or warning is logged in Event Viewer. It is baffling why this happens. Is there any configuration that can prevent this?
I ran into a very similar situation and was only able to get a successful restart after opening up connectivity to http://178.255.83.1:80. This seems to be an OCSP server, so the traffic seems legitimate.
I'm not sure if it's actually the issue you were having, but it might be something to look at.
I've installed a Jenkins master instance and 2 slave nodes.
Neither slave is in time sync with the master. Sometimes it shows that the slaves are 2 days or 1 hour in the future, sometimes it shows that the time on the slaves is behind the master; it seems to be random.
Because of this, some Selenium tests, builds, and other jobs no longer work correctly. The problem appeared suddenly, and it doesn't matter which version of Jenkins is installed.
Does anyone have an idea how to fix this problem?
Thank you very much.
Cheers
Christoph
It is hard to explain why the time difference would vary abruptly between the two machines. I assume you are referring to the information given by the http://jenkins.mydomain/computer/ url.
Normally you want to keep your machines in time sync, and enable NTP clients on each host, each pointing to the same set of NTP servers, either internal to your organization (if that is available), or the standard free NTP services available on the web.
Do you have this setup already and still see abrupt time variations? If so, review your list of NTP services: make sure you use reliable ones, and the same list on every host; that should help. Maybe narrow it down to just one service and then expand if need be.
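To see how far each machine has actually drifted, you can measure the offset against one NTP server from the master and from both slaves and compare the numbers. A small sketch using the third-party ntplib package (pip install ntplib); the server name is only an example:

# Sketch: print this host's clock offset against a single NTP server.
# Run it on the master and on each slave and compare the offsets.
import ntplib

server = "pool.ntp.org"  # pick one reliable server and use it everywhere
response = ntplib.NTPClient().request(server, version=3)
print("clock offset vs {}: {:+.3f} seconds".format(server, response.offset))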
While I'm interested in Jenkins as a means to provide continuous build functionality, I'm really even more interested in Jenkins as a means to exercise my application in its prod environment against unexpected changes in infrastructure beyond my control that may affect my application. I can't find much information on using Jenkins this way, but I was wondering if there are others out there doing this? Essentially I have a project that runs Maven tests parameterized with my prod URL, but for these projects I don't actually do any building. Are there other tools besides Jenkins I should be considering for this? If so, why?
If you've got your tests set up to run via Maven already, I think Jenkins would be a good option. You could set up email, IM or SMS alerts using Jenkins plugins, and keep a record of the results within Jenkins.
The only down sides I can think of are:
You'll probably want to run your monitoring a lot more frequently than a regular CI job, so you might want to keep more build records than the default of 10.
If you already have a system like Nagios or OpenView to monitor system resources, it might be better to integrate app monitoring into that rather than having another source of truth.
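As a concrete example of what such a monitoring job can run, here is a minimal availability check; the PROD_URL variable stands in for the prod URL parameter mentioned in the question, and a non-zero exit code is all Jenkins needs to mark the build failed and send the alerts described above:

# Sketch: fail the build if the production URL is unreachable or unhealthy.
# PROD_URL and its default are placeholders for the real parameterized value.
import os
import sys
import urllib.request

url = os.environ.get("PROD_URL", "https://example.com/health")

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        if resp.status != 200:
            print("Unexpected status {} from {}".format(resp.status, url))
            sys.exit(1)
        print("OK:", url)
except OSError as exc:
    print("Request to {} failed: {}".format(url, exc))
    sys.exit(1)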
Jenkins provides a plugin called the Status Monitor Plugin.
We have ours set to check a specific URL every 5 minutes and email us when something fails. Our problem is that it won't send emails to cell phone carrier email addresses. However, if regular email will suffice, the setup time for the plugin is less than half an hour, and it is reliable as long as the Jenkins server stays up.