Create KsqlDB Stream via Kafka REST fails

I'm having difficulty executing a "CREATE STREAM" command against ksqlDB using the REST interface. Here is the failing command:
curl -X "POST" "http://localhost:8088/query" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{
  "ksql": "CREATE STREAM pageviews_home_2 AS SELECT * FROM pageviews_original WHERE pageid=\'home\';”,
  "streamsProperties": {
    "ksql.streams.auto.offset.reset": "earliest"
  }'
The error I get is:
{"#type":"generic_error","error_code":50000,"message":"Failed to deserialize buffer"}
The very same statement, when executed from within the ksqlDB prompt, runs successfully:
ksql> CREATE STREAM pageviews_home AS SELECT * FROM pageviews_original WHERE pageid='home';
Message
----------------------------------------------
Created query with ID CSAS_PAGEVIEWS_HOME_17
----------------------------------------------
ksql> show streams;
 Stream Name         | Kafka Topic                 | Key Format | Value Format | Windowed
------------------------------------------------------------------------------------------
 KSQL_PROCESSING_LOG | default_ksql_processing_log | KAFKA      | JSON         | false
 PAGEVIEWS_ALICE     | PAGEVIEWS_ALICE             | KAFKA      | DELIMITED    | false
 PAGEVIEWS_HOME      | PAGEVIEWS_HOME              | KAFKA      | DELIMITED    | false
 PAGEVIEWS_ORIGINAL  | pageviews                   | KAFKA      | DELIMITED    | false
------------------------------------------------------------------------------------------
ksql>
What is the proper way to accomplish this task via the ksqlDB REST interface?
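Two problems stand out in the failing request. First, the body is not valid JSON: the statement string ends with a curly quote (”) instead of a straight quote, and the outer object is missing its closing brace, which is the likely cause of the "Failed to deserialize buffer" error. Second, persistent statements such as CREATE STREAM ... AS SELECT belong on the /ksql endpoint; /query only accepts queries. A corrected request might look like this (a sketch against the same local server):

curl -X "POST" "http://localhost:8088/ksql" \
     -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
     -d $'{
  "ksql": "CREATE STREAM pageviews_home_2 AS SELECT * FROM pageviews_original WHERE pageid=\'home\';",
  "streamsProperties": {
    "ksql.streams.auto.offset.reset": "earliest"
  }
}'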

Related

Docker Swarm: bypass load balancer and make direct requests to specific containers

I have two containers running in a swarm. Each exposes a /stats endpoint which I am trying to scrape.
However, using the swarm port obviously results in the queries being load balanced and therefore the stats are all intermingled:
+--------------------------------------------------+
|                      Server                      |
|    +-------------+        +-------------+        |
|    |             |        |             |        |
|    | Container A |        | Container B |        |
|    |             |        |             |        |
|    +-------------+        +-------------+        |
|             \              /                     |
|              \            /                      |
|              +--------------+                    |
|              |              |                    |
|              | Swarm Router |                    |
|              |              |                    |
|              +--------------+                    |
|                     v                            |
+---------------------|----------------------------+
                      |
                   A Stats
                   B Stats
                   A Stats
                   B Stats
                      |
                      v
I want to keep the load balancer for application requests, but also need a direct way to make requests to each container to scrape the stats.
+--------------------------------------------------+
|                      Server                      |
|    +-------------+        +-------------+        |
|    |             |        |             |        |
|    | Container A |        | Container B |        |
|    |             |        |             |        |
|    +-------------+        +-------------+        |
|      |      \              /        |            |
|      |       \            /         |            |
|      |       +--------------+       |            |
|      |       |              |       |            |
|      |       | Swarm Router |       |            |
|      v       |              |       v            |
|              +--------------+                    |
|                     |                            |
+------|--------------|---------------|------------+
       |              |               |
    A Stats           |            B Stats
    A Stats     Normal Traffic     B Stats
    A Stats           |            B Stats
       |              |               |
       |              |               |
       v              |               v
A dynamic solution would be ideal, but since I don't intend to do any dynamic scaling, something like hardcoded ports for each container would be fine:
::8080 Both containers via load balancer
::8081 Direct access to container A
::8082 Direct access to container B
Can this be done with swarm?
From inside an overlay network you can get the IP addresses of all replicas with a tasks.<service_name> DNS query:
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> -tA tasks.foo_test
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19860
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;tasks.foo_test. IN A
;; ANSWER SECTION:
tasks.foo_test. 600 IN A 10.0.1.3
tasks.foo_test. 600 IN A 10.0.1.5
tasks.foo_test. 600 IN A 10.0.1.6
This is mentioned in the documentation.
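For example, a container attached to the same overlay network could scrape every replica directly (a sketch; the service name foo_test from the dig output above and the port 8080 are illustrative):

# resolve all task IPs of the service and hit each /stats endpoint
for ip in $(dig +short tasks.foo_test); do
  curl -s "http://$ip:8080/stats"
done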
Also, if you use Prometheus to scrape those endpoints for metrics, you can combine the above with dns_sd_configs to set the targets to scrape (here is an article describing how). This is easy to get running but somewhat limited in features (especially in large environments).
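A minimal prometheus.yml sketch of that approach (the job name, service name, and port are illustrative):

scrape_configs:
  - job_name: 'container-stats'
    metrics_path: /stats          # scrape the /stats endpoint instead of /metrics
    dns_sd_configs:
      - names: ['tasks.foo_test'] # the swarm service's tasks.* DNS record
        type: A                   # A records return one IP per replica
        port: 8080                # port the containers listen on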
A more advanced way to achieve the same is to use dockerswarm_sd_config (docs, example configuration). This way the list of endpoints is gathered by querying the Docker daemon, along with some useful labels (e.g. node name, service name, custom labels).
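A sketch of that variant, assuming Prometheus runs on a manager node with access to the Docker socket (the job name and the service name being kept are illustrative):

scrape_configs:
  - job_name: 'swarm-tasks'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks               # discover individual tasks (replicas)
    relabel_configs:
      # keep only the tasks of the service whose stats we want
      - source_labels: [__meta_dockerswarm_service_name]
        regex: foo_test
        action: keep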
While less than ideal, you can introduce a microservice that acts as an intermediary to the other containers that are exposing /stats. This microservice would have to be configured with the individual endpoints and operate in the same network as said endpoints.
This doesn't bypass the load balancer, but instead makes it so it does not matter.
The intermediary could roll up the information, or you could make it more sophisticated by passing a list of opaque identifiers which the caller can then use to query the intermediary for individual containers.
It is somewhat of an "anti-pattern", in the sense that you have a highly coupled "stats" proxy that must be configured to reach each endpoint.
That said, it is good in the sense that you don't have to expose individual containers outside of the proxy. From a security perspective, this may be better because you're not leaking additional information out of your swarm.
You can try publishing a specific container port on the host machine by adding this to your service:
ports:
  - target: 8081
    published: 8081
    protocol: tcp
    mode: host
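Note that mode: host bypasses the routing mesh: each task's port is bound directly on the node where it runs, so no two tasks that publish the same host port can be scheduled onto the same node. For the hardcoded-port layout sketched in the question, each container would publish its own distinct port (8081, 8082) this way, while normal traffic keeps going through the load-balanced routing mesh.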

Reliable way of getting the full container name inside a container running in Docker Swarm

Background: I have a setup where many different scalable services connect to their databases via a connection pool (per instance). These services run within a Docker Swarm.
In my current database setup, this ends up looking as follows (using PostgreSQL in this example):
PID | Database | User | Application                     | Client | ...
... | db1      | app  | --standard JDBC driver string-- | x      | ...
... | db1      | app  | --standard JDBC driver string-- | y      | ...
... | db1      | app  | --standard JDBC driver string-- | y      | ...
... | ...      | app  | ...                             | x      | ...
... | ...      | app  | ...                             | x      | ...
... | db2      | app  | --standard JDBC driver string-- | y      | ...
... | db2      | app  | --standard JDBC driver string-- | y      | ...
... | ...      | app  | ...                             | x      | ...
... | ...      | app  | ...                             | x      | ...
What I would like to do, effectively, is provide the current Docker Swarm container name, including scaling identifier, to the DBMS to be able to better monitor the database connections, i.e.:
PID | Database | User | Application        | Client | ...
... | db1      | app  | books-service.1    | x      | ...
... | db1      | app  | books-service.2    | y      | ...
... | db1      | app  | books-service.3    | y      | ...
... | ...      | app  | ...                | x      | ...
... | ...      | app  | ...                | x      | ...
... | db2      | app  | checkout-service.2 | y      | ...
... | db2      | app  | checkout-service.2 | y      | ...
... | ...      | app  | ...                | x      | ...
... | ...      | app  | ...                | x      | ...
(obviously, setting the connection string is trivial - it's getting the information to put in it that is the issue)
Since my applications are managed by Docker Swarm (and sometime in the future, likely Kubernetes), I cannot manually set this value via the environment (as I do not perform a docker run myself).
When running docker ps on a given Swarm node, I see the following output:
CONTAINER ID   IMAGE                                            COMMAND                  CREATED         STATUS         PORTS   NAMES
59cdbf724091   my-docker-registry.net/books-service:latest      "/bin/sh -c '/usr/bi…"   7 minutes ago   Up 7 minutes           books-service.1.zabvo1jal0h2xya9qfftnrnej
0eeeee15e92a   my-docker-registry.net/checkout-service:latest   "/bin/sh -c 'exec /u…"   8 minutes ago   Up 8 minutes           checkout-service.2.189s7d09m0q86y7fdf3wpy0vc
Of note is the NAMES column, which includes an identifier for the actual instance of the given container (or image, however you'd prefer to look at it). Do note that this name is not the hostname of the container, which by default is the container ID.
I know there are ways to determine if an application is running inside Docker (e.g. using /proc/1/cgroup), but that doesn't help me either as those also only list the container ID.
Is there a way to get this value from inside a Docker container that is being run in a swarm?
What you need (books-service.1) is a combination of the swarm service name and the task slot. Both can be passed to the container as environment variables, as can the full task name (books-service.1.zabvo1jal0h2xya9qfftnrnej):
version: "3.0"
services:
  test:
    image: debian:buster
    command: cat /dev/stdout
    environment:
      SERVICE_NAME: '{{ .Service.Name }}'
      TASK_SLOT: '{{ .Task.Slot }}'
      TASK_NAME: '{{ .Task.Name }}'
After passing these you can either strip the TASK_NAME to get what you need or combine the SERVICE_NAME with the TASK_SLOT. Come to think of it, you can combine them right in the template:
version: "3.0"
services:
  test:
    image: debian:buster
    command: cat /dev/stdout
    environment:
      MY_NAME: '{{ .Service.Name }}.{{ .Task.Slot }}'
Other possible placeholders can be found here.
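With those variables in place, the container can hand its instance name to the DBMS. A sketch for the PostgreSQL case, whose JDBC driver accepts an ApplicationName URL property (the host, port, and database names are illustrative):

# inside the container, e.g. in an entrypoint script:
JDBC_URL="jdbc:postgresql://db1:5432/app?ApplicationName=${SERVICE_NAME}.${TASK_SLOT}"

The value then appears in the Application column (application_name in pg_stat_activity), exactly as in the desired listing above.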

How to use a runtime variable in FitNesse

Let's say I have the following FitNesse page:
!| com.myproject.fitnesse.fixtures.SSHFixture |
| set host | ${hostSi1} |
| set port | ${port} |
| set user | ${user} |
| connect |
| show | run command | pwd |
| disconnect |
www.<variable>.com
The page contains one table and a link. The table will execute the console command pwd. How do I save the result of that command in a FitNesse variable? I then want to use the variable within the same page, for example in the link shown.
Some resources mention SLIM style, but I have no idea how to accomplish that in my case:
Using data from fitnesse table as a variable
@Fried Hoeben: Yes, it's a script. I got a solution from my colleague.
Let's say you have a fixture for DB work with a method called 'execute query' that returns the result of a query.
Below, the result is assigned to the variable 'myname' so we can use it in another fixture table as '#{myname}'.
!| com.mystest.fitnesse.fixtures.DBFixture |
| set database | ${dbName} |
| set username | ${dbUser} |
| set password | ${dbPassword} |
| connect | ${dbType} | to | ${dbUrl} | database | ${dbPort} |
| set | myname | execute query | SELECT name FROM customer WHERE id = 1 |
| disconnect |
Use of variable 'myname':
!| com.mytest.fitnesse.fixtures.SSHFixture |
| set host | ${host} |
| set port | ${port} |
| set user | ${user} |
| connect |
| show | execute | echo #{myname} |
| disconnect |
I'm not sure whether the 'set' feature is part of the default FitNesse implementation or of our company's implementation.
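For what it's worth, stock FitNesse SLIM script tables support symbol assignment without any custom 'set' plumbing; if the fixture runs as a script table, something along these lines may work (a sketch reusing the fixture above):

!|script|com.mystest.fitnesse.fixtures.DBFixture|
|$myname= |execute query|SELECT name FROM customer WHERE id = 1|

The symbol can then be referenced as $myname in later tables on the same page.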

How to know if a process is running under Docker?

I may be asking a very beginner-level question, but I need a way to distinguish processes running under Docker from ordinary processes on a box. The ps command output gives me the impression that the process is simply running on the Linux box, and I cannot confirm whether it is under the hood of Docker.
In the same context, is it possible / feasible for a process under Docker to be started with Docker's root filesystem?
Is that feasible, or is there any other solution for this?
You can identify Docker processes via the process tree on the Docker host (or on the VM, if using Docker for Mac/Windows).
For example: the parent process of 2924 (haproxy) is 2902 (haproxy-start);
the parent process of 2902 is 2881;
and 2881 is a docker-containerd process, which is itself managed by the dockerd process.
To view your process listing in a tree format, use ps -ejH or pstree (available in the psmisc package).
To get a quick list of what's running under dockerd:
/ # pstree $(pgrep dockerd)
dockerd-+-docker-containe-+-docker-containe-+-java---17*[{java}]
| | `-8*[{docker-containe}]
| |-docker-containe-+-sinopia-+-4*[{V8 WorkerThread}]
| | | |-{node}
| | | `-4*[{sinopia}]
| | `-8*[{docker-containe}]
| |-docker-containe-+-node-+-4*[{V8 WorkerThread}]
| | | `-{node}
| | `-8*[{docker-containe}]
| |-docker-containe-+-tinydns
| | `-8*[{docker-containe}]
| |-docker-containe-+-dnscache
| | `-8*[{docker-containe}]
| |-docker-containe-+-apt-cacher-ng
| | `-8*[{docker-containe}]
| `-20*[{docker-containe}]
|-2*[docker-proxy---6*[{docker-proxy}]]
|-docker-proxy---5*[{docker-proxy}]
|-2*[docker-proxy---4*[{docker-proxy}]]
|-docker-proxy---8*[{docker-proxy}]
`-28*[{dockerd}]
Show the parents of a PID (-s):
/ # pstree -aps 3744
init,1
`-dockerd,1721 --pidfile=/run/docker.pid -H unix:///var/run/docker.sock --swarm-default-advertise-addr=eth0
`-docker-containe,1728 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim ...
`-docker-containe,3711 8d923b3235eb963b735fda847b745d5629904ccef1245d4592cc986b3b9b384a...
`-java,3744 -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp/zookeeper/bin/../build/cl
|-{java},4174
|-{java},4175
|-{java},4176
|-{java},4177
|-{java},4190
|-{java},4208
|-{java},4209
|-{java},4327
|-{java},4328
|-{java},4329
|-{java},4330
|-{java},4390
|-{java},4416
|-{java},4617
|-{java},4625
|-{java},4629
`-{java},4632
Show all children of docker, including namespace changes (-S):
/ # pstree -apS $(pgrep dockerd)
dockerd,1721 --pidfile=/run/docker.pid -H unix:///var/run/docker.sock --swarm-default-advertise-addr=eth0
|-docker-containe,1728 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim ...
| |-docker-containe,3711 8d923b3235eb963b735fda847b745d5629904ccef1245d4592cc986b3b9b384a...
| | |-java,3744,ipc,mnt,net,pid,uts -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp/zookeeper/bin/../build/cl
| | | |-{java},4174
| | | |-{java},4175
| | | |-{java},4629
| | | `-{java},4632
| | |-{docker-containe},3712
| | `-{docker-containe},4152
| |-docker-containe,3806 49125f8274242a5ae244ffbca121f354c620355186875617d43876bcde619732...
| | |-sinopia,3841,ipc,mnt,net,pid,uts
| | | |-{V8 WorkerThread},4063
| | | |-{V8 WorkerThread},4064
| | | |-{V8 WorkerThread},4065
| | | |-{V8 WorkerThread},4066
| | | |-{node},4062
| | | |-{sinopia},4333
| | | |-{sinopia},4334
| | | |-{sinopia},4335
| | | `-{sinopia},4336
| | |-{docker-containe},3814
| | `-{docker-containe},4038
| |-docker-containe,3846 2a756d94c52d934ba729927b0354014f11da6319eff4d35880a30e72e033c05d...
| | |-node,3910,ipc,mnt,net,pid,uts lib/dnsd.js
| | | |-{V8 WorkerThread},4204
| | | |-{V8 WorkerThread},4205
| | | |-{V8 WorkerThread},4206
| | | |-{V8 WorkerThread},4207
| | | `-{node},4203
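Beyond the process tree, a process's cgroup membership on the host points to the same answer. A sketch (the PID is illustrative, and the exact paths vary with the Docker version and cgroup v1 vs. v2):

# on the Docker host, inspect which cgroups a PID belongs to
grep -E 'docker|containerd' /proc/2924/cgroup
# a match such as ".../docker/<64-hex-container-id>" indicates the process
# runs in a Docker container; no match suggests an ordinary host process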
The lxc-ls and lxc-ps commands may be installable on your Linux distribution. They let you list the running LXC containers and the processes running within those containers, respectively. You should be able to join the output of lxc-ls with that of lxc-ps to get a list of all containerized processes.
The big caveat is that you asked about Docker: not every Docker instance runs on LXC, nor is it necessarily a localhost process. Docker defines an API that can be called to list remote Docker instances, so this technique will not help with enumerating processes on remote machines either.
On Windows, Docker behaves a little differently.
Its processes are not run as children of a parent Docker process; they run as separate processes on the host.
They can be viewed with, for example, PowerShell:
Get-Process powershell
For example, when the microsoft/iis container is running, the host's process listing will include an additional powershell process (since the microsoft/iis container runs PowerShell as its main executable).

Why do these two IP prefixes, which look the same to me, map to different AS numbers?

I used the whois command to try to map two IP prefixes to their AS numbers. The results are given below:
$ whois -h whois.cymru.com " -v 1.0.4.0/22 "
AS    | IP      | BGP Prefix | CC | Registry | Allocated  | AS Name
56203 | 1.0.4.0 | 1.0.4.0/24 | AU | apnic    | 2011-04-12 | BIGRED-NET-AU Big Red Group

$ whois -h whois.cymru.com " -v 1.0.0.0/24 "
AS    | IP      | BGP Prefix | CC | Registry | Allocated  | AS Name
15169 | 1.0.0.0 | 1.0.0.0/24 | AU | apnic    | 2011-08-11 | GOOGLE - Google Inc.
My question is why these two prefixes map to different AS numbers. They look the same to me, so I would expect the AS numbers to be identical as well.
Many thanks for your help!
The two queries cover different address ranges: 1.0.4.0/22 spans 1.0.4.0 through 1.0.7.255, while 1.0.0.0/24 spans only 1.0.0.0 through 1.0.0.255. The differing subnet masks (the trailing /22 and /24) make them distinct prefixes, so they can be, and here are, announced by different ASes.
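A quick way to see that the two prefixes cover different address ranges (a sketch assuming the common ipcalc utility is installed):

ipcalc 1.0.4.0/22   # hosts 1.0.4.0 - 1.0.7.255
ipcalc 1.0.0.0/24   # hosts 1.0.0.0 - 1.0.0.255

Since the ranges do not even overlap, there is no reason to expect the same origin AS.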
