Icinga2 Cluster?

I'm trying to configure an Icinga2 master server with two clients to start with. I want a setup where I configure everything on the master and synchronize the configs to the clients.
This already works, but if a client goes down, the master still reports it as up, because the clients are checking themselves.
The tricky thing is that I can't work with IPs because all IPs are dynamic, and I can't register a dyn-DNS name for every server. Later there will be 30-50 servers.
I hope someone can help me.

You can use puppet-icinga2, which allows collecting information about nodes. On the client side you'd create exported resources (Puppet code follows):
@@icinga2::object::host { $::fqdn:
  display_name  => $::fqdn,
  address       => $::ipaddress_eth0,
  check_command => 'hostalive',
  target        => "/etc/icinga2/zones.d/${::domain}/hosts.conf",
  zone          => $::fqdn,
}
@@::icinga2::object::endpoint { "$::fqdn":
  host => "$::ipaddress_eth0",
}
@@::icinga2::object::zone { "$::fqdn":
  endpoints => [ "$::fqdn", ],
  parent    => 'master',
}
which will be propagated to the master (PuppetDB is required):
Icinga2::Object::Host <<| |>> { }
Icinga2::Object::Endpoint <<| |>> { }
Icinga2::Object::Zone <<| |>> { }
As long as the Puppet master has stable DNS, you'll have an up-to-date zones.conf. After a Puppet agent run on the client, the host information gets registered in PuppetDB. On the next Puppet agent run on the master, the master will have up-to-date information about the node.
Then you can implement a check that runs from the Icinga master:
apply Service "ping" to Host {
  import "generic-service"
  check_command = "ping"
  zone = "master" // execute the check from the master zone
  assign where "linux-server" in host.groups
}
Note there are also other automation integrations like Ansible which might offer similar functionality.

Related

Why is the Network Watcher on Azure not destroyed by Terraform?

I have a simple Terraform configuration to create an Azure virtual network. When I do plan and then apply, a virtual network is created inside a resource group as expected. But in addition to this resource group, one more is created with the name NetworkWatcherRG, and inside it I see a network watcher.
Now when I run the terraform destroy command, I expect everything to be cleaned up and all the resource groups to be destroyed. But instead, everything is destroyed except for the NetworkWatcherRG and the Network Watcher inside it.
It looks like the Network Watcher, along with its resource group, is NOT managed by Terraform. What am I missing?
The network watcher is not immediately obvious; it isn't revealed right away. To see it, you need to go to the simplified view of the resource groups and click the Refresh button at least 5 times (with a 2-second gap each time), or wait a long time and then click Refresh.
So what is this network watcher? Is Azure creating it by itself, outside of Terraform's management?
My Terraform configuration file is as follows.
# Terraform settings Block
terraform {
  required_version = ">= 1.0.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.0"
    }
  }
}

# Provider Block
provider "azurerm" {
  features {}
}

# Create virtual network
resource "azurerm_virtual_network" "myvnet" {
  name                = "vivek-1-vnet"
  address_space       = ["10.0.0.0/16"] # This is a list, since it uses []. If it used { }, it would be a map.
  location            = azurerm_resource_group.myrg.location
  resource_group_name = azurerm_resource_group.myrg.name
  tags = { # This is a map, since it uses {}
    "name" = "vivek-1-vnet"
  }
}

# Resource-1: Azure Resource Group
resource "azurerm_resource_group" "myrg" {
  name     = "vivek-vnet-rg"
  location = var.resource_group_location
}

variable "resource_group_location" {
  default     = "centralindia"
  description = "Location of the resource group."
}
And finally the commands I use are as follows.
terraform fmt
terraform init
terraform validate
terraform plan -out main.tfplan
terraform apply main.tfplan
terraform plan -destroy -out main.destroy.tfplan
terraform apply main.destroy.tfplan
I read the response from @RahulKumarShaw-MT. I believe the answer, and it makes complete sense that Terraform won't destroy resources it didn't create (unless someone can demonstrate otherwise). That said, I was able to delete the NetworkWatcherRG group using Terraform. To achieve this, I made sure to add a network watcher as one of my declared resources using azurerm_network_watcher (see https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/network_watcher) in the same Terraform script where I requested a virtual machine resource in another, separate resource group. I think you created a vnet; my script creates a vnet too, which is presumably why Azure concludes that a network watcher is needed. I named the first resource group, the one containing my network watcher, whatever I wanted; it doesn't have to be 'NetworkWatcherRG'. I watched the resource group be created and destroyed successfully with Terraform (using terraform apply and terraform destroy, respectively), along with my VM and vnet resources. At the end, I refreshed the Azure Portal web page and saw no resource groups or resources in my test subscription. I'm not an Azure expert, but I suspect that if Azure already sees a network watcher present, it won't create an additional one when Terraform creates my resources (in my case a VM and a vnet), as a watcher will already be present as long as Terraform creates that resource before Azure gets a chance to.
Before applying the Terraform code, I checked my resource groups and found a network watcher resource group already there; by default this resource group is created on the Azure side.
As Mike-Ubezzi wrote on Microsoft forums:
Network Watcher resources are located in the hidden NetworkWatcherRG resource group which is created automatically. For example, the NSG Flow Logs resource is a child resource of Network Watcher and is enabled in the NetworkWatcherRG.
The Network Watcher resource represents the backend service for Network Watcher and is fully managed by Azure. Customers do not need to manage it. Operations like move are not supported on the resource. However, the resource can be deleted.
So terraform destroy will only delete the resources created by you (the ones recorded in the .tfstate file). This is the reason you won't be able to delete the NetworkWatcherRG resource group.

easiest way to schedule a Google Cloud Dataflow job

I just need to run a Dataflow pipeline on a daily basis, but suggested solutions like the App Engine Cron Service, which requires building a whole web app, seem a bit too much.
I was thinking about just running the pipeline from a cron job on a Compute Engine Linux VM, but maybe that's far too simple :). What's the problem with doing it that way? Why isn't anybody (besides me, I guess) suggesting it?
This is how I did it using Cloud Functions, PubSub, and Cloud Scheduler
(this assumes you've already created a Dataflow template and it exists in your GCS bucket somewhere)
Create a new topic in PubSub. This will be used to trigger the Cloud Function.
Create a Cloud Function that launches a Dataflow job from a template. I find it easiest to just create this from the CF Console. Make sure the service account you choose has permission to create a Dataflow job. The function's index.js looks something like:
const google = require('googleapis');

exports.triggerTemplate = (event, context) => {
  // in this case the PubSub message payload and attributes are not used
  // but can be used to pass parameters needed by the Dataflow template
  const pubsubMessage = event.data;
  console.log(Buffer.from(pubsubMessage, 'base64').toString());
  console.log(event.attributes);

  google.google.auth.getApplicationDefault(function (err, authClient, projectId) {
    if (err) {
      console.error('Error occurred: ' + err.toString());
      throw new Error(err);
    }

    const dataflow = google.google.dataflow({ version: 'v1b3', auth: authClient });

    dataflow.projects.templates.create({
      projectId: projectId,
      resource: {
        parameters: {},
        jobName: 'SOME-DATAFLOW-JOB-NAME',
        gcsPath: 'gs://PATH-TO-YOUR-TEMPLATE'
      }
    }, function(err, response) {
      if (err) {
        console.error("Problem running dataflow template, error was: ", err);
      }
      console.log("Dataflow template response: ", response);
    });
  });
};
The package.json looks like
{
  "name": "pubsub-trigger-template",
  "version": "0.0.1",
  "dependencies": {
    "googleapis": "37.1.0",
    "@google-cloud/pubsub": "^0.18.0"
  }
}
Go to the topic you created in PubSub and manually publish a message. This should trigger the Cloud Function and start a Dataflow job (a quick way to publish a test message from Python is sketched after these steps).
Use Cloud Scheduler to publish a PubSub message on schedule
https://cloud.google.com/scheduler/docs/tut-pub-sub
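For reference, publishing a test message doesn't have to happen through the console; a minimal Python sketch using the google-cloud-pubsub client library (the project and topic names are placeholders, not values from the steps above) would be:

# Minimal sketch: publish a test message to the trigger topic.
# Assumes the google-cloud-pubsub library is installed and application-default
# credentials are available; project and topic names are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('YOUR-PROJECT-ID', 'YOUR-TRIGGER-TOPIC')

# The function above ignores the payload, but message data/attributes could
# carry template parameters if you need them.
future = publisher.publish(topic_path, b'run')
print(future.result())  # prints the message ID once the publish succeeds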
There's absolutely nothing wrong with using a cron job to kick off your Dataflow pipelines. We do it all the time for our production systems, whether it be our Java or Python developed pipelines.
That said, however, we are trying to wean ourselves off cron jobs and move more toward using either AWS Lambdas (we run multi-cloud) or Cloud Functions. Unfortunately, Cloud Functions don't have scheduling yet. AWS Lambdas do.
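If you do go the cron route, the cron entry can simply run a small launcher script that calls the Dataflow templates API. Here's a rough sketch in Python using the google-api-python-client; the project ID, region, and template path are placeholders, not values from the original question:

# Rough sketch of a cron-invoked launcher for a templated Dataflow job.
# Assumes google-api-python-client is installed and application-default
# credentials are configured on the VM; all names and paths are placeholders.
from googleapiclient.discovery import build

PROJECT = 'YOUR-PROJECT-ID'
REGION = 'us-central1'
TEMPLATE = 'gs://YOUR-BUCKET/templates/YOUR-TEMPLATE'

dataflow = build('dataflow', 'v1b3')
request = dataflow.projects().locations().templates().launch(
    projectId=PROJECT,
    location=REGION,
    gcsPath=TEMPLATE,
    body={
        'jobName': 'daily-scheduled-job',
        'parameters': {},  # template parameters, if any
    },
)
response = request.execute()
print(response)  # contains the created Job resource

A daily crontab entry pointing at this script is all the "scheduler" you need in that setup.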
There is a FAQ answer to that question:
https://cloud.google.com/dataflow/docs/resources/faq#is_there_a_built-in_scheduling_mechanism_to_execute_pipelines_at_given_time_or_interval
You can automate pipeline execution by using Google App Engine (Flexible Environment only) or Cloud Functions.
You can use Apache Airflow's Dataflow Operator, one of several Google Cloud Platform Operators in a Cloud Composer workflow.
You can use custom (cron) job processes on Compute Engine.
The Cloud Function approach is described as "Alpha", and it's still true that Cloud Functions don't have scheduling (no equivalent to an AWS CloudWatch scheduled event), only Pub/Sub messages, Cloud Storage changes, and HTTP invocations.
Cloud Composer looks like a good option. It's effectively a re-badged Apache Airflow, which is itself a great orchestration tool. Definitely not "too simple" like cron :)
You can use Cloud Scheduler to schedule your job as well. See my post:
https://medium.com/@zhongchen/schedule-your-dataflow-batch-jobs-with-cloud-scheduler-8390e0e958eb
Terraform script
data "google_project" "project" {}
resource "google_cloud_scheduler_job" "scheduler" {
name = "scheduler-demo"
schedule = "0 0 * * *"
# This needs to be us-central1 even if the app engine is in us-central.
# You will get a resource not found error if just using us-central.
region = "us-central1"
http_target {
http_method = "POST"
uri = "https://dataflow.googleapis.com/v1b3/projects/${var.project_id}/locations/${var.region}/templates:launch?gcsPath=gs://zhong-gcp/templates/dataflow-demo-template"
oauth_token {
service_account_email = google_service_account.cloud-scheduler-demo.email
}
# need to encode the string
body = base64encode(<<-EOT
{
"jobName": "test-cloud-scheduler",
"parameters": {
"region": "${var.region}",
"autoscalingAlgorithm": "THROUGHPUT_BASED",
},
"environment": {
"maxWorkers": "10",
"tempLocation": "gs://zhong-gcp/temp",
"zone": "us-west1-a"
}
}
EOT
)
}
}

Adding Basic Monitoring Package to Virtual Guest via API

Is it possible to add a monitoring package through the SoftLayer API? On the portal, I can go into the Monitoring section and order a "Monitoring Package - Basic", which will associate it with that virtual guest.
Is it possible to do this either during the placeOrder call or after the initial placeOrder call (i.e. if the customer wants to add Basic Monitoring after the server is provisioned)?
I tried to look into examples, but they all assumed that a monitoring agent was already available, which wasn't the case for me. I also looked into Going Further with SoftLayer part 3, but I'm not sure how to extract the Basic Monitoring package from the Product_Package service.
I'm using Python to do this, so any pointers on associating a monitoring service during creation or after creation would be very helpful.
Thanks in Advance!
try this:
"""
Order a Monitoring Package
Build a SoftLayer_Container_Product_Order_Monitoring_Package object for a new
monitoring order and pass it to the SoftLayer_Product_Order API service to order it
In this care we'll order a Basic (Hardware and OS) package with Basic Monitoring Package - Linux
configuration for more details see below
Important manual pages:
https://sldn.softlayer.com/reference/datatypes/SoftLayer_Container_Product_Order_Monitoring_Package
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Product_Item_Price
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Order/verifyOrder
http://sldn.softlayer.com/reference/services/SoftLayer_Product_Order/placeOrder
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Monitoring_Agent_Configuration_Template_Group
License: http://sldn.softlayer.com/article/License
Author: SoftLayer Technologies, Inc. <sldn#softlayer.com>
"""
import SoftLayer
USERNAME = 'set me'
API_KEY = 'set me'
"""
Build a skeleton SoftLayer_Container_Product_Order_Monitoring_Package object
containing the order you wish to place.
"""
oderTemplate = {
'complexType': 'SoftLayer_Container_Product_Order_Monitoring_Package',
'packageId': 0, # the packageID for order monitoring packages is 0
'prices': [
{'id': 2302} # this is the price for Monitoring Package - Basic ((Hardware and OS))
],
'quantity': 0, # the quantity for order a service (in this case monitoring package) must be 0
'sendQuoteEmailFlag': True,
'useHourlyPricing': True,
'virtualGuests': [
{'id': 4906034} # the virtual guest ID where you want add the monitoring package
],
'configurationTemplateGroups': [
{'id': 3} # the templateID for the monitoring group (in this case Basic Monitoring package for Unix/Linux operating system.)
]
}
# Declare the API client to use the SoftLayer_Product_Order API service
client = SoftLayer.Client(username=USERNAME, api_key=API_KEY)
productOrderService = client['SoftLayer_Product_Order']
"""
verifyOrder() will check your order for errors. Replace this with a call to
placeOrder() when you're ready to order. Both calls return a receipt object
that you can use for your records.
Once your order is placed it'll go through SoftLayer's provisioning process.
"""
try:
order = productOrderService.verifyOrder(oderTemplate)
print(order)
except SoftLayer.SoftLayerAPIError as e:
print("Unable to verify the order! faultCode=%s, faultString=%s"
% (e.faultCode, e.faultString))
exit(1)
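If you'd rather not hard-code the price ID (2302) or go hunting for it in the portal, you can look it up first via the Product_Package service. A small sketch, assuming the same client object as above; filtering by description is just one convenient way to narrow the list:

# Sketch: discover monitoring price IDs instead of hard-coding them.
# Assumes the same SoftLayer client as above; package 0 is the
# additional-services package that the order above also uses.
packageService = client['SoftLayer_Product_Package']

prices = packageService.getItemPrices(id=0, mask='mask[id,item[description]]')
for price in prices:
    description = price['item']['description']
    if 'Monitoring Package' in description:
        print(price['id'], description)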
And this is an example of creating network monitoring:
"""
Create network monitoring
The script creates a monitoring network with Service ping
in a determinate IP address
Important manual pages
http://sldn.softlayer.com/reference/services/SoftLayer_Network_Monitor_Version1_Query_Host
http://sldn.softlayer.com/reference/datatypes/SoftLayer_Network_Monitor_Version1_Query_Host
License: http://sldn.softlayer.com/article/License
Author: SoftLayer Technologies, Inc. <sldn#softlayer.com>
"""
import SoftLayer.API
from pprint import pprint as pp
# Your SoftLayer API username and key.
USERNAME = 'set me'
API_KEY = 'set me'
# The ID of the server you wish to monitor
serverId = 7698842
"""
ID of the query type which can be found with SoftLayer_Network_Monitor_Version1_Query_Host_Stratum/getAllQueryTypes.
This example uses SERVICE PING: Test ping to address, will not fail on slow server response due to high latency or
high server load
"""
queryTypeId = 1
# IP address on the previously defined server to monitor
ipAddress = '10.104.50.118'
# Declare the API client
client = SoftLayer.Client(username=USERNAME, api_key=API_KEY)
networkMonitorVersion = client['SoftLayer_Network_Monitor_Version1_Query_Host']
# Define the SoftLayer_Network_Monitor_Version1_Query_Host templateObject.
newMonitor = {
'guestId': serverId,
'queryTypeId': queryTypeId,
'ipAddress': ipAddress
}
# Send the request for object creation and display the return value
try:
result = networkMonitorVersion.createObject(newMonitor)
pp(result)
except SoftLayer.SoftLayerAPIError as e:
print("Unable to create new network monitoring "
% (e.faultCode, e.faultString))
exit(1)
Regards

How can a docker service know about all other containers of the same service?

I'm working on a file sync Docker microservice. Basically I will have a file-sync service that is global to the swarm (one on each node). Each container in the service needs to peer with all the other containers on different nodes. Files will be distributed across the nodes, not a complete duplicate copy. Some files will reside on only certain nodes. I want to be able to selectively copy a subset of the files from one node to another.
How can I get a list of the endpoints of all the other containers so the microservice can peer with them? This needs to happen programmatically.
On a related note, I'm wondering if a file-sync microservice is the best route for the solution I'm working on.
Basically I have some videos a user has uploaded. I want to be able to encode them into different formats. I was planning on having the video encoding node have the file-sync service pull the files, encode the videos, and then use the file-sync to push the encoded files back to the same server. I know I can use some kind of object store but that isn't available to me with bare metal dedicated servers and I'd rather not deal with OpenStack if I don't need to.
Thanks to @johnharris85 for the above suggestion. For anyone else who is interested, I created a snippet that can be used in Node.
https://gist.github.com/brennancheung/62d2abe16569e600d2be5e9495c85331
const dns = require('dns')

function lookup (serviceName) {
  const tasks = `tasks.${serviceName}`
  return new Promise((resolve, reject) => {
    dns.lookup(tasks, { all: true }, (err, addresses, family) => {
      if (err) {
        return reject(err)
      }
      const filtered = addresses.filter(address => address.family === 4)
      const ips = filtered.map(x => x.address)
      resolve(ips)
    })
  })
}

async function main () {
  const result = await lookup('hello')
  console.log(result)
}

main()
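The same lookup works from any language, since Docker's embedded DNS returns one A record per task for tasks.<service-name>. For example, a rough Python equivalent using only the standard library (the service name 'hello' is just the example from above):

# Rough Python equivalent: resolve tasks.<service> against Docker's internal DNS
# to get the IPs of all tasks in the service. Standard library only.
import socket

def lookup(service_name):
    records = socket.getaddrinfo(f'tasks.{service_name}', None, family=socket.AF_INET)
    # each record is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port)
    return sorted({record[4][0] for record in records})

if __name__ == '__main__':
    print(lookup('hello'))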

f5 LTM irule - can a pool name be generated in an irule

I need to setup a configuration for many similar environments. Each will have a different hostname that follows a pattern, e.g. env1, env2, etc.
I can use a pool per environment and a single virtual server with an irule that selects a pool based on hostname.
What I'd prefer to do is dynamically generate and select the pool name based on the requested hostname rather than listing out every pool in the switch statement. It's easier to maintain and automatically handles new environments.
The code might look like:
when HTTP_REQUEST {
  pool [string tolower [HTTP::host]]
}
and each pool name matches the hostname.
Is this possible? Or is there a better method?
EDIT
I've expanded my hostname pool selection. I'm now trying to include the port number. The new rule looks like:
when HTTP_REQUEST {
  set lb_port "[LB::server port]"
  set hostname "[string tolower [getfield [HTTP::host] : 1]]"
  log local0.info "Pool name $hostname-$lb_port-pool"
  pool "$hostname-$lb_port-pool"
}
This is working, but I'm seeing no-such-pool errors in the logs because somehow a port 0 request is coming into the pool. It seems to be the first request, followed by the request with the legitimate port.
Wed Feb 17 20:39:14 EST 2016 info tmm tmm[6519] Rule /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST>: Pool name my.example.com-80-pool
Wed Feb 17 20:39:14 EST 2016 err tmm1 tmm[6519] 01220001 TCL error: /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST> - no such pool: my.example.com-0-pool (line 1) invoked from within "pool "$hostname-$lb_port-pool""
Wed Feb 17 20:39:14 EST 2016 info tmm1 tmm[6519] Rule /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST>: Pool name my.example.com-0-pool
What is causing the port 0 request? And is there any workaround? e.g. could I test for port 0 and select a default port or ignore it?
ONE MORE EDIT
I rebuilt the virtual server, and now the error is gone. The rebuild of the VS was only to rename it, though; I'm fairly sure I recreated the settings exactly the same.
Yes, you can specify the pool name in a string. What you have there would work as long as you have a pool with that same name. Though it doesn't show an example of doing it this way, you can also check out the pool wiki page on DevCentral for more information.
As an aside, in my environment I generally create pools with the suffix _pool to distinguish them from other objects when looking at config files. So in my iRules, I would do something like this (essentially the same thing):
when HTTP_REQUEST {
  pool "[string tolower [HTTP::host]]_pool"
}
The simple case mentioned by Michael works. I'd recommend stripping the port value from the Host header if present:
when HTTP_REQUEST {
  pool "pool_[string tolower [getfield [HTTP::host] : 1]]_[LB::server port]"
}
Keep in mind that clients might send a partial hostname. If the DNS search path is set to example.org then the client might hit shared/ which maps to shared.example.org, but the HTTP::host header will just have shared. Some API libraries may append the port number even if it's on the default port. Simple code might not send a Host header. Malicious code might send completely bogus Host headers. You could trap these cases with catch.
You can also use a datagroup to map hostnames to pools. This allows multiple hosts to use the same pool. Sample code:
when HTTP_REQUEST {
  set host [string tolower [getfield [HTTP::host] ":" 1]]
  if { $host == "" } {
    # if there's no Host header, pull from virtual server name
    # we use: pool_<virtualserver>_PROTOCOL
    set host [getfield [virtual name] _ 2]
  } elseif { not ($host contains ".") } {
    # if Host header does not contain a dot, assume example.org
    set host $host.example.org
  }
  set pool [class match -value $host[HTTP::uri] starts_with dg_shared.example.org]
  if { $pool ne "" } {
    set matched [class match -name $host[HTTP::uri] starts_with dg_shared.example.org]
    set log(matched) $matched
    set log(pool) $pool
    if { [catch { pool $pool }] } {
      set log(reason) "Failed to Connect to Pool"
      call hsllog log
      call errorpage 404 $log(reason) "https://[HTTP::host][HTTP::uri]" log
    }
  } else {
    call errorpage 404 "No Pool Found" "https://[HTTP::host][HTTP::uri]" log
  }
}

when SERVER_CONNECTED {
  if { !($pool ends_with "_HTTPS") } {
    SSL::disable serverside
  }
}
This allows host.example.org/path1 to be on a different pool than host.example.org or host.example.org/path2 by including separate entries in the datagroup. I didn't include the hsllog and errorpage procs here. They dump the log array as well as the other passed parameters.
We then disable serverside ssl for pools that don't end in _HTTPS.
Note: As with dynamically generated pool names, the BIG-IP UI does not look inside datagroups for pool references, so the interface will allow you to delete one of these pools thinking it's not in use.
We use BigIPReport to identify orphan pools:
https://devcentral.f5.com/s/articles/bigip-report
