browserless/chrome docker taking too much cpu on certain websites - docker

We are using browserless docker to read content on certain websites. For few websites, the CPU increases proportional to the number of sessions open.
Specifics
docker version browserless/chrome:1.35-chrome-stable
Script to reproduce once docker is up and running.
#!/bin/bash
HOST='localhost:3000'
curl_new_session() {
echo $(curl -XPOST http://$HOST/webdriver/session -d ' {"desiredCapabilities":{"browserName":"chrome","platform":"ANY","chromeOptions":{"binary":"","args":["--window-size=1400,900","--no-sandbox","--headless"]},"goog:chromeOptions":{"args":["--window-size=1400,900","--no-sandbox","--headless"]}}}' | jq '.sessionId') | tr -d '"'
{"desiredCapabilities":{"browserName":"chrome","platform":"ANY","chromeOptions":{"binary":"","args":["--window-size=1400,900","--no-sandbox","--headless"]},"goog:chromeOptions":{"args":["--window-size=1400,900","--no-sandbox","--headless"]}}}
}
# we open the session and keep it running
curl_visit_url() {
local id=$1
local url=$2
echo "http://$HOST/webdriver/session/$id/url"
echo $(curl http://$HOST/webdriver/session/$id/url -d '{"url":"'$url'"}' |jq '' )
}
for i in {1..5}
do
id=$(curl_new_session)
echo $id
curl_visit_url $id 'http://monday.com' &
sleep 0.5
echo '.'
done
This specific site(monday.com), uses too much cpu but this can happen with other sites as well.
Question
Considering we encounter such websites from time to time. What's the best way to handle them?
We want the sessions to be kept alive and not close after opening them.

Related

Kubernetes Deployment dynamic port forwarding

I am moving a Docker Image from Docker to a K8s Deployment. I have auto-scale rules on so it starts 5 but can go to 12. The Docker image on K8s starts perfectly with a k8s service in front to cluster the Deployment.
Now each container has its own JVM which has a Prometheus app retrieving its stats. In Docker, this is no problem because the port that serves Prometheus info is dynamically created with a starting port of 8000, so the docker-compose.yml grows the port by 1 based on how many images are started.
The problem is that I can't find how to do this in a K8s [deployment].yml file. Because Deployment pods are dynamic, I would have thought there would be some way to set a starting HOST port to be incremented based on how many containers are started.
Maybe I am looking at this the wrong way so any clarification would be helpful meanwhile will keep searching the Google for any info on such a thing.
Well after reading and reading and reading so much I came to the conclusion that K8s is not responsible to open ports for a Docker Image or provide ingress to your app on some weird port, it's not its responsibility. K8s Deployment just deploys the Pods you requested. You can set the Ports option on a DEPLOYMENT -> SPEC -> CONTAINERS -> PORTS which just like Docker is only informational. But this allows you to JSONPath query for all PODS(containers) with a Prometheus port available. This will allow you to rebuild the "targets" value in Prometheus.yaml file. Now having those targets makes them available to Grafana to create a dashboard.
That's it, pretty easy. I was complicating something because I did not understand it. I am including a script I QUICKLY wrote to get something going USE AT YOUR OWN RISK.
By the way, I use Pod and Container interchangeably.
#!/usr/bin/env bash
#set -x
_MyappPrometheusPort=8055
_finalIpsPortArray=()
_prometheusyamlFile=prometheus.yml
cd /docker/images/prometheus
#######################################################################################################################################################
#One container on the K8s System is weave and it holds the subnet we need to validate against.
#weave-net-lwzrk 2/2 Running 8 (7d3h ago) 9d 192.168.2.16 accl-ffm-srv-006 <none> <none>
_weavenet=$(kubectl get pod -n kube-system -o wide | grep weave | cut -d ' ' -f1 )
echo "_weavenet: $_weavenet"
#The default subnet is the one that lets us know the conntainer is part of kubernetes network.
# Range: 10.32.0.0/12
# DefaultSubnet: 10.32.0.0/12
_subnet=$( kubectl exec -n kube-system $_weavenet -c weave -- /home/weave/weave --local status | sed -En "s/^(.*)(DefaultSubnet:\s)(.*)?/\3/p" )
echo "_subnet: $_subnet"
_cidr2=$( echo "$_subnet" | cut -d '/' -f2 )
echo "_cidr2: /$_cidr2"
#######################################################################################################################################################
#This is an array of the currently monitored containers that prometheus was sstarted with.
#We will remove any containers form the array that fit the K8s Weavenet subnet with the myapp prometheus port.
_targetLineFound_array=($( egrep '^\s{1,20}-\s{0,5}targets\s{0,5}:\s{0,5}\[.*\]' $_prometheusyamlFile | sed -En "s/(.*-\stargets:\s\[)(.*)(\]).*/\2/p" | tr "," "\n"))
for index in "${_targetLineFound_array[#]}"
do
_ip="${index//\'/$''}"
_ipTocheck=$( echo $_ip | cut -d ':' -f1 )
_portTocheck=$( echo $_ip | cut -d ':' -f2 )
#We need to check if the IP is within the subnet mask attained from K8s.
#The port must also be the prometheus port in case some other port is used also for Prometheus.
#This means the IP should be removed since we will put the list of IPs from
#K8s currently in production by Deployment/AutoScale rules.
#Network: 10.32.0.0/12
_isIpWithinSubnet=$( ipcalc $_ipTocheck/$_cidr2 | sed -En "s/^(.*)(Network:\s+)([0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?\.[0-9]{1}[0-9]?[0-9]?)(\/[0-9]{1}[0-9]{1}.*)?/\3/p" )
if [[ "$_isIpWithinSubnet/$_cidr2" == "$_subnet" && "$_portTocheck" == "$_MyappPrometheusPort" ]]; then
echo "IP managed by K8s will be deleted: _isIpWithinSubnet: ($_ip) $_isIpWithinSubnet"
else
_finalIpsPortArray+=("$_ip")
fi
done
#######################################################################################################################################################
#This is an array of the current running myapp App containers with a prometheus port that is available.
#From this list we will add them to the prometheus file to be available for Grafana monitoring.
readarray -t _currentK8sIpsArr < <( kubectl get pods --all-namespaces --chunk-size=0 -o json | jq '.items[] | select(.spec.containers[].ports != null) | select(.spec.containers[].ports[].containerPort == '$_MyappPrometheusPort' ) | .status.podIP' )
for index in "${!_currentK8sIpsArr[#]}"
do
_addIPToMonitoring=${_currentK8sIpsArr[index]//\"/$''}
echo "IP Managed by K8s as myapp app with prometheus currently running will be added to monitoring: $_addIPToMonitoring"
_finalIpsPortArray+=("$_addIPToMonitoring:$_MyappPrometheusPort")
done
######################################################################################################################################################
#we need to recreate this string and sed it into the file
#- targets: ['192.168.2.13:3201', '192.168.2.13:3202', '10.32.0.7:8055', '10.32.0.8:8055']
_finalPrometheusTargetString="- targets: ["
i=0
# Iterate the loop to read and print each array element
for index in "${!_finalIpsPortArray[#]}"
do
((i=i+1))
_finalPrometheusTargetString="$_finalPrometheusTargetString '${_finalIpsPortArray[index]}'"
if [[ $i != ${#_finalIpsPortArray[#]} ]]; then
_finalPrometheusTargetString="$_finalPrometheusTargetString,"
fi
done
_finalPrometheusTargetString="$_finalPrometheusTargetString]"
echo "$_finalPrometheusTargetString"
sed -i -E "s/(.*)-\stargets:\s\[.*\]/\1$_finalPrometheusTargetString/" ./$_prometheusyamlFile
docker-compose down
sleep 4
docker-compose up -d
echo "All changes were made. Exiting"
exit 0
Ideally, you should be using the Average of JVM across all the replicas. There is no meaning to create a different deployment with a different port if you are running the single same Docker image across all the replicas.
i think keeping a single deployment with resource requirements set to deployment would be the best practice.
You can get the JVM average of all the running replicas
sum(jvm_memory_max_bytes{area="heap", app="app-name",job="my-job"}) / sum(kube_pod_status_phase{phase="Running"})
as you are running the same Docker image across all replicas and K8s service by default will be managing the Load Balancing, average utilization would be an option to monitor.
Still, if you want to filter and get different values you can create different deployments (Not at all good way) or use the stateful sets.
You can also filter the data by hostname (POD name) in Prometheus, so will get the each replica usage.

How to Deploy Systemd Microservices to Docker Image

I was maintaining application which develop on C, running by Systemd and it was microservices. Each services can communicate by Linux shared memory (IPCS), and use HTTP to communicate to outside. My question is, is it good to move this all services to one Docker container? I new in the container topic and people was recommended me to learn and use it.
The simple design of my application is below:
Note: MS is Microservice
Docker official web says :
It is generally recommended that you separate areas of concern by using one service per container
When the docker starts, it needs to link to a live and foreground process. If this process ends, the entire container ends. Also the default behavior in docker is related to logs is "catch" the stdout of the single process.
several process
If you have several process, a no one is the "main", I think it is possible to start them as background process but you will need a heavy while in bash to simulate a foreground process. Inside this loop you could check if your services still running because has no sense a live container when its internal process are exited or has errors.
while sleep 60; do
ps aux |grep my_first_process |grep -q -v grep
PROCESS_1_STATUS=$?
ps aux |grep my_second_process |grep -q -v grep
PROCESS_2_STATUS=$?
# If the greps above find anything, they exit with 0 status
# If they are not both 0, then something is wrong
if [ $PROCESS_1_STATUS -ne 0 -o $PROCESS_2_STATUS -ne 0 ]; then
echo "One of the processes has already exited."
exit 1
fi
done
One process
As apache and other tools make, you could create one live process and then, start your another child process from inside. This is called spawn process. Also as you mentioned http, this process could expose http endpoints to exchange information with the outside.
I'm not a C expert, but system method could be an option to launch another process:
#include <stdlib.h>
int main()
{
system("commands to launch service1");
system("commands to launch service2");
return 0;
}
Here some links:
How do you spawn another process in C?
https://suchprogramming.com/new-linux-process-c/
http://cplusplus.com/forum/general/250912/
Also to create a basic http server in c, you could check this:
https://stackoverflow.com/a/54164425/3957754
restinio::run(
restinio::on_this_thread()
.port(8080)
.address("localhost")
.request_handler([](auto req) {
return req->create_response().set_body("Hello, World!").done();
}));
This c program will keep live when it starts because is a server. So this will be perfect for docker.
Rest Api is the most common strategy to exchange information over internet between servers and/or devices.
If you achieve this, your c program will have these features:
start another required process (ms1,ms2,ms3, etc)
expose rest http endpoints to send a receive some information between your services and the world. Sample
method: get
url: https://alexsan.com/domotic-services/ms1/message/1
description: rest endpoint which returns the message 1 from service ms1 queue
returns:
{
"command": "close gateway=5"
}
method: post
url: https://alexsan.com/domotic-services/ms2/message
description: rest endpoint which receives a message containing a command to be executed on service ms2
receive:
{
"id": 100,
"command" : "open gateway=2"
}
returns:
{
"command": "close gateway=5"
}
These http endpoints could be invoked from webs, mobiles, etc
Use high level languages
You could use python, nodejs, or java to start a server and from its inside, launch your services and if you want expose some http endpoints. Here a basic example with python:
FROM python:3
WORKDIR /usr/src/app
# create requirements
RUN echo "bottle==0.12.17" > requirements.txt
# app.py is creating with echo just for demo purposes
# in real scenario, app.py should be another file
RUN echo "from bottle import route, run" >> app.py
RUN echo "import os" >> app.py
RUN echo "os.spawnl(os.P_DETACH, '/opt/services/ms1.acme')" >> app.py
RUN echo "os.spawnl(os.P_DETACH, '/opt/services/ms2.acme')" >> app.py
RUN echo "os.spawnl(os.P_DETACH, '/opt/services/ms3.acme')" >> app.py
RUN echo "os.spawnl(os.P_DETACH, '/opt/services/ms4.acme')" >> app.py
RUN echo "#route('/domotic-services/ms2/message')" >> app.py
RUN echo "def index():" >> app.py
RUN echo " return 'I will query the message'" >> app.py
RUN echo "run(host='0.0.0.0', port=80)" >> app.py
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python", "./app.py" ]
Also you can use nodejs:
https://github.com/jrichardsz/nodejs-express-snippets/blob/master/01-hello-world.js
https://nodejs.org/en/knowledge/child-processes/how-to-spawn-a-child-process/

Erlang: Why does inet:gethostbyname deliver 2 addresses whereas inet:getaddr delivers just one

The problem
I used inet:gethostbyname(Hoststr) in a docker environment with a couple containers for over a year without problems. Due to non recoverable read errors on the SSD which -- according to the provider -- is perfectly fine I was forced to reinstall from scratch with a CentOS image.
After restore I get a crash which boils down to
3> inet:gethostbyname("www").
{ok,{hostent,"www",
["www"],
inet,4,
[{10,0,1,205},{10,0,1,180}]}}
obviously because I get 2 IPs.
getaddr works fine:
4> inet:getaddr("www", inet).
{ok,{10,0,1,205}}
Question
I can replace gethostbyname with getaddr, no problem, but I would like to know why I get 2 IPs in the first place and how this misbehavior could creep in.
PHP does just fine:
id=$(docker ps -a | grep "vx_www.1\." | grep -v "xited" | awk '{print $1}') && docker exec -it $id ash
php > echo gethostbyname('www');
10.0.1.205
Is it a docker problem?
The problem may lie on the docker side, as both addresses ping.
It gets even more interesting (from the host):
/ # ip a | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b"
127.0.0.1
10.0.1.205
10.0.1.255
172.18.0.13
172.18.255.255
10.0.8.33
10.0.8.255
10.0.0.162
10.0.0.255
10.0.9.19
10.0.9.255
This should show addresses from within the docker system, if I understand correctly, and the latter address is not in the list. But it is somewhere:
# docker inspect sa3oizstflg3 | grep "10.0.1"
"Addr": "10.0.1.180/24"
What is a VirtualIP?
Actually I get this address with the ID of www as VirtualIP
together with a bunch of others
"VirtualIPs": [
{
"NetworkID": "y3rf5yes97yjs1rfzpzf4aou8",
"Addr": "10.0.0.148/24"
},
{
"NetworkID": "swagio8bzavl2bf5u5bivmt13",
"Addr": "10.0.1.180/24"
},
{
"NetworkID": "tnmpad21shpvsps6fps5m4own",
"Addr": "10.0.8.5/24"
},
{
"NetworkID": "mz9aogx7jxit8x2bflgpyh5lh",
"Addr": "10.0.9.2/24"
}
The same container listens to 2 different IPs
Taking a different container with PHP on board, I get the second address given by inet:gethostbyname("www") for the same container, so both seem to be correct and usable:
# id=$(docker ps -a | grep "vx_wsm.1\." | grep -v "xited" | awk '{print $1}') && docker exec -it $id ash
/ # php -a
Interactive shell
php > echo gethostbyname('www');
10.0.1.180
Now I am confused. Anybody knows what is happening here?
inet:gethostbyname seems to be not wrong but more correct, then.
Erlang question
As an addendum: I am not that proficient in Erlang. In my code it reads:
get_ip_web(Web) -> % e.g. www = > 100.0.1.226
[case X of $, -> $.; _ -> X end || X <- lists:flatten(io_lib:format("~p",element(6, element(2, inet:gethostbyname(Web))))), X=/=${, X=/=$}].
How to rewrite this fine piece of code to pick one of the two addresses, but working also with only one result?
Well, this is mostly an academic question as I didn't understand this one-liner at all initially -- my comment was incomprehensible. This is no longer the case, but I still struggle with constructs easy to handle in other languages, especially if there is no routine for a long time.
For your information: I replaced the one-liner above with this much simpler one-liner:
inet:ntoa(element(2,inet:getaddr(Web, inet)))
Erlang:
From the code, inet:getaddr/2 just gets the first ip from inet:getaddrs/2, which in turn gets them from gethostbyname.
You should use the hostent record instead of element (only if you go with functions that return a hostent, I'd rather use the getaddr like you did in the end, it's actually the same code):
-module(test).
-include_lib("kernel/include/inet.hrl").
-export([get_ip/0]).
get_ip() ->
{ok, #hostent{h_addr_list = [IP|_]}} = inet:gethostbyname("www.stackoverflow.com"),
inet:ntoa(IP).
Docker:
If you run ip -a in the host, you'll get only the ips for the host, not the ones used by containers. Usually the host has an IP in each of the bridges that make the different docker networks, so it's in the same range as the containers.
Regarding the several ips, I don't have experience with docker swarm (which it seems that you're using), only with kubernetes.
My guess is that you have a service exposed in several docker networks that shares the name (www) with the container that implements it. Thus, in the network of the www container you have DNS both resolve www to the container IP and the service virtualIP address. Maybe you can find the virtual ips in iptables/nftables?

why executing docker log -f it returns data from scratch?

I have a simple code in Dotnet core console and it simply counted forever :
class Program
{
static void Main(string[] args)
{
int counter = 1;
while (true)
{
counter++;
Console.WriteLine(counter);
}
}
}
I contained it with docker and ran it via VS Docker run.
but every time that I execute > docker logs -f dockerID it starts and shows counting from scratch! (1,2,3,....). I expected that whenever I run this command it shows me logs from the last integer that it counts!
Is "docker logs -f" cause to run a new instance of my application every time?
When running docker help logs you see this:
--tail string Number of lines to show from the end of the logs
(default "all")
So you should do:
docker logs --tail 100 -f dockerID
Additional reference: docker logs documentation > Options
docker logs -f should work similar to tail -f, so, it should show just new logs.
Therefore, if you're watching logs repeated, it looks your program is being executed in loop.
The only thing that comes to my mind is a little trick, although it's not an elegant solution, it can be useful for other similar situations:
Define in your system history time format according to docker logs
export HISTTIMEFORMAT="%Y-%m-%dT%H:%M:%S "
Get your last docker logs command date with a simple script.
LAST_DOCKER_LOG=`history | grep " docker logs" | tac | head -1 | awk '{print $2}'`
Get logs after your last docker logs execution:
docker logs --since $LAST_DOCKER_LOG <your_docker>
In one line:
docker logs -t --since `history | grep " docker logs" | tac | head -1 | awk '{print $2}'` <your_docker>
Note: history time and docker log times have to be in the same timezone.
In spite of everything, I'd have a look to your loop execution. Maybe, docker log -t timestamp show option can help you to solve it.

Convert check_load in Nagios to Zabbix

Hi I have just built my Zabbix server and in the process of configuring some checks currently setup in Nagios.
One these checks is check_load. Can anyone explain what this check means in Nagios and how I can replicate it in Zabbix.
In Nagios check_load monitors server load. Server load is a good indication of what your overall utilisation looks like : http://en.wikipedia.org/wiki/Load_(computing)
You can view server load easily on most *nix servers using the top command. The 3 numbers at the top right show your 1, 5 and 15 minute load averages. As a brief guide the load should be less than your number of processors. So for instance if you have a 4 cpu server then I would expect your load average to sit below 4.00.
I recently did a quick load monitor in nagios script format for http://www.dataloop.io
It was done quickly and needs a fair bit of work to work across other systems. But it gives a feel for how to scrape the output of top:
#!/bin/bash
onemin=$(top -b -n1 | sed -n '1p' | cut -d ' ' -f 13 | sed 's/%//')
fivemin=$(top -b -n1 | sed -n '1p' | cut -d ' ' -f 14 | sed 's/%//')
fifteenmin=$(top -b -n1 | sed -n '1p' | cut -d ' ' -f 15 | sed 's/%//')
int_fifteenmin=$( printf "%.0f" $fifteenmin )
echo "OK | 1min=$onemin;;;; 5min=$fivemin;;;; 15min=$fifteenmin;;;;"
alert=10
if [ "$int_fifteenmin" -gt "$alert" ]
then
exit 2
fi
exit 0
Hope this explains enough for you to create a Zabbix equivalent.
In zabbix, it is a zabbix agent built-in check. Search for system.cpu.load here.
As for what it measures, the already posted link to wikipedia article is a great read.

Resources