Apache Nutch doesn't expose its API - docker

I'm trying to use the Apache Nutch 1.x REST API. I use Docker images to set up Nutch and Solr. You can see the demo repo here
Apache Nutch uses Solr as a dependency. Solr works great; I'm able to reach its GUI at localhost:8983.
However, I cannot reach Apache Nutch's API at localhost:8081. The problem starts here. The Apache Nutch 1.X REST API documentation indicates that I can start the server like this
:~$ bin/nutch startserver -port <port_number> [If the port option is not mentioned then by default the server starts on port 8081]
Which I am doing in the docker-compose.yml file.
I'm also exposing the ports to the outside.
ports:
  - "8080:8080"
  - "8081:8081"
But I wasn't able to successfully call the API from my computer.
The REST API documentation says that if I send a GET request to the /admin endpoint, I should get a response.
GET /admin
When I try this with Postman or from the browser, it cannot reach the server and gives me back a 500 error.
However, when I get inside the container with docker exec -it and try curl localhost:8081/admin, I get the correct response. So within the container the API is up and running, but it is not reachable from outside.
In one of my attempts, I added a frontend application in another container and sent REST requests to the Solr and Nutch containers. Solr worked; Nutch failed with a 500. This tells me that the Nutch container is not only unreachable from the outside world, it is also unreachable from other containers on the same network.
Any idea how to work around this problem?

Nutch by default only replies to requests from localhost:
bash-5.1# /root/nutch/bin/nutch startserver -help
usage: NutchServer [-help] [-host <host>] [-port <port>]
 -help          Show this help
 -host <host>   The host to bind the Nutch Server to. Default is
                localhost.
So you need to start it with -host 0.0.0.0 to be able to reach it from the host machine or another container:
services:
  nutch:
    image: 'apache/nutch:latest'
    command: '/root/nutch/bin/nutch startserver -port 8081 -host 0.0.0.0'
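As a quick sanity check (assuming the 8081 port mapping from the question is still in the compose file), the admin endpoint should now answer from the host machine instead of failing:
curl http://localhost:8081/admin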

Related

Use Docker with same port as other program

I am currently facing the following problem:
I built a Docker container for a Node server (a simple Express server which sends tracing data to Zipkin on port 9411) and want to run it alongside Zipkin.
As I understand it, the Node server should send tracing data to Zipkin using port 9411.
If I run the server with Node only (not in Docker), I can run it alongside Zipkin and everything works fine.
But if I have Zipkin running and then want to fire up my Docker container, I get the error
Error starting userland proxy: listen tcp4 0.0.0.0:9411: bind: address already in use.
My understanding is that there is a conflict concerning port 9411: it is already bound by Zipkin, but the server in the Docker container obviously also needs it to communicate with Zipkin.
I would appreciate it if anybody has an idea how I could solve this problem.
Greetings,
Robert
When you start a docker container, you add a port binding like this:
docker run ... -p 8000:9000
where 8000 is the port you can use on the pc to access port 9000 within the container.
Don't bind the express server to 9411 as zipkin is already using that port.
I found the solution: using the flag --network="host" does the job; -p is then not needed.
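As a minimal sketch of that approach (the image name my-express-image is a placeholder): with host networking the container shares the host's network stack, so the app reaches Zipkin at localhost:9411 and no -p mapping is needed:
docker run --rm --network="host" my-express-image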

Considering URL redirections: How to use GUIs of web applications running in different containers within the same docker network on a remote server?

I have the feeling that I am overlooking something obvious as my solutions/ideas so far seem too cumbersome. I have searched intensively for a good solution, but so far without success - probably because I do not know what to look for.
Question:
How do you interact with the graphical interfaces of web servers running in different containers (within the same Docker Network) on a remote server, given URL redirections between these containers?
Initial situation:
I have two containers (a Flask web application and a Tomcat server with OpenAM running on it) running on my docker host (Azure-VM).
On the VM I can access the content of both containers via the ports that I have opened.
Using ssh port forwarding I can interact with the graphical components of both containers on my local machine.
Both containers were created with the same docker-compose file and can reach each other via their service names without additional network settings.
So far I have configured OpenAM on my local machine using ssh port forwarding.
Problem:
The Flask web app references OpenAM by the domain name defined in docker-compose, and vice versa. I forward the port of the Flask container to my local machine. The Flask application is running and I can interact with it in my browser.
The system fails as soon as I am redirected from Flask to OpenAM on my local machine, because the reference to the OpenAM container used by Flask is specific to the Docker network. Also, the port of the OpenAM container is different.
In other words, the routing between the two networks is nonexistent.
Solution ideas:
Execute the requests on the VM using command-line tools.
Use a container with a headless browser that automatically executes the requests.
Use Network Setting 'Host' and execute the headless browser on the VM instead.
Route all requests through a single container (similar to a VPN) and use ssh port forwarding.
Simplified docker-compose:
version: "3.4"
services:
  openam:
    image: openidentityplatform/openam
    ports:
      - 5001:8080
    command: /usr/local/tomcat/bin/catalina.sh run
  flask:
    build: ./SimpleHTTPServer
    ports:
      - 5002:8000
    command: python -m http.server 8000
Route all requests through a single container - This is the correct approach.
See API gateway pattern
The best solution that I could find so far. It is not suitable for production. However, for prototyping, or if you simply want to emulate a server structure using containers, it is an easy setup.
General Idea:
Deploy a third VNC container running a web browser and forward the port of this third container to your local machine. As the third container is part of the Docker network, it can naturally resolve the internal domain names, and the VNC installation on your local machine enables you to interact with the GUIs.
Approach
Add the VNC to the docker-compose of the original question.
Enable X11 forwarding on the server and client-side.
Forward the port of the VNC container using ssh.
Install VNC on the client, start a new session, and enter the predefined password.
Try it out.
Step by Step
Add the VNC container (inspired by creack's post on stackoverflow) to the docker-compose file from the original question:
version: "3.4"
services:
  openam:
    image: openidentityplatform/openam
    ports:
      - 5001:8080
    command: /usr/local/tomcat/bin/catalina.sh run
  flask:
    build: ./SimpleHTTPServer
    ports:
      - 5002:8000
    command: python -m http.server 8000
  firefoxVnc:
    container_name: firefoxVnc
    image: creack/firefox-vnc
    ports:
      - 5900:5900
    environment:
      - HOME=/
    command: x11vnc -forever -usepw -create
Run the docker-compose: docker-compose up
Enable X11 forwarding on the server and client-side.
On the client side, run $ vim ~/.ssh/config and add the following lines:
Host *
    ForwardAgent yes
    ForwardX11 yes
On server-side run $ vim /etc/ssh/sshd_config and edit the following lines:
X11Forwarding yes
X11DisplayOffset 10
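For the sshd_config change to take effect, the SSH daemon on the server has to be restarted; on a systemd-based server that is typically the command below (the service name may differ by distribution, e.g. ssh on Debian/Ubuntu):
sudo systemctl restart sshd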
Forward the port of the VNC container using ssh
ssh -v -X -L 5900:localhost:5900 gw.example.com
Make sure to include the -X flag for X11. The -v flag is just for debugging.
Install VNC on the client, start a new session and enter the predefined password.
Install VNC viewer on your local machine
Open the installed viewer and start a new session using the forwarded address localhost:5900
When prompted, type in the password 1234 which was set in the original Dockerfile of the VNC Docker image (see creack's post linked above).
You can now either go to openam:8080/openam/ or flask:8000 within the browser of the VNC localhost:5900 session.
An even better solution that is clean, straightforward, and also works perfectly when running parts of the application on different virtual machines.
Setup and Use an SSH SOCKS Tunnel
For Google Chrome and macOS:
Set the network mode to host within your docker-compose (or docker run) configuration.
Start an SSH tunnel:
$ ssh -N -D 9090 [USER]@[SERVER_IP]
Add the SwitchyOmega proxy addon to your Chrome browser.
Configure SwitchyOmega by going to New Profile > Proxy Profile, clicking create, choosing SOCKS5 as the protocol, and entering localhost as the server and 9090 as the port (the tunnel from the previous step listens on your local machine).
Open a new terminal tab and run:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--user-data-dir="$HOME/proxy-profile" \
--proxy-server="socks5://localhost:9090"
A new Chrome session will open up in which you can simply browse your Docker applications.
Reference | When running Linux or Windows | Using Firefox (no addon needed)
The guide How to Set up SSH SOCKS Tunnel for Private Browsing explains how to set up an SSH SOCKS tunnel on macOS, Windows, or Linux, using Google Chrome or Firefox. I simply referenced the setup for macOS and Chrome in case the link should die.

Docker: cannot get access to server

Here is a little backstory. I implemented a couple of web APIs using a microservices architecture. I am trying to make my microservices accessible via HTTPS. The microservices are developed in .NET Core, so according to the Microsoft documentation, to enforce HTTPS I need to configure Kestrel. The following is how I did it.
.UseKestrel(options =>
{
    options.Listen(IPAddress.Loopback, 5000);
    options.Listen(IPAddress.Loopback, 5001, listenOptions =>
    {
        listenOptions.UseHttps("cert.pfx", "pwd");
    });
})
To keep it simple, I use Kestrel by itself and skip the reverse proxy. I will certainly add Nginx as a reverse proxy later, but that is future work. I tested locally and it worked. Then I deployed it onto Docker. Here is the docker-compose.override file:
version: '3.4'
services:
  dataservice:
    environment:
      - ASPNETCORE_ENVIRONMENT=Development
      - ASPNETCORE_URLS=https://+:443;http://+:80
    ports:
      - "5000:80"
      - "5001:443"
In the Dockerfile, ports 5000 and 5001 are exposed. I built the project into images and ran it on Docker using docker run -it --rm -p 5000:80 --name *name* *imagename*. Docker shows Now listening on: http://127.0.0.1:5000 and Now listening on: https://127.0.0.1:5001. Now the problem is, leaving the HTTPS part aside, the APIs cannot even be accessed over HTTP. The browser just shows This page isn’t working 127.0.0.1 didn’t send any data. ERR_EMPTY_RESPONSE. I found a similar question here: Docker: cannot open port from container to host, which suggests the server should listen on 0.0.0.0. Though I do not fully understand the reason, I changed the Kestrel configuration to
options.Listen(IPAddress.Any, 5000);
built and ran the Docker image again, and Docker shows Now listening on: http://0.0.0.0:5000, but it still doesn't work. I also tried replacing the IP with localhost, to no avail. I did not use .UseHttpsRedirection(), so HTTPS should have nothing to do with the problem.
Am I missing any configuration or doing anything wrong? It would be really helpful if anyone could shed some light. Thank you in advance.
You should listen on 80 and 443 inside the container, i.e. options.Listen(IPAddress.Any, 80); because this docker declaration
ports:
  - "5000:80"
means that container port 80 (the port from your source code) is published as port 5000 on the host, and not the other way around.
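As a rough usage sketch (image name and URL path are placeholders): after switching the Listen calls to ports 80 and 443 with IPAddress.Any, rebuild the image and keep the same port mapping; a plain-HTTP request from the host should then get a response:
docker run -it --rm -p 5000:80 -p 5001:443 myimagename
curl -v http://localhost:5000/api/values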

Traefik backend health checks not working

Traefik Version: 1.6.4
My company uses Docker Swarm to present applications as services, using Traefik for routing. All has been working fine so far, but we're having trouble implementing backend health checks in Traefik.
All of our applications expose a health check, which works fine and returns 200 via a simple curl or hitting it in a web browser. We've applied Docker labels to our swarm services to reflect these health checks. As an example:
traefik.backend.healthcheck.path=/health
traefik.backend.healthcheck.interval=30s
The Traefik service logs report the following:
{"level":"warning","msg":"Health check still failing. Backend: \"backend-for-my-app\" URL: \"https://10.0.0.x:8080\" Reason: received non-200 status code: 406","time":"2018-07-10T19:41:25Z"}
Within the app containers we have Apache running with ModSecurity. ModSecurity is blocking the request because the host header is a numeric IP address. I shelled into the Traefik container and did a couple curls against the app container to test:
curl -k https://10.0.0.x:8080/health <-- ModSecurity blocks this, returns a 406
curl -k -H "Host: myapp.company.com" https://10.0.0.x:8080/health <-- works fine, returns a 200
TLDR: I need a way to set a host header for the Traefik backend health check. In Traefik docs, I don't see a way of doing this. Has anyone run into this issue, and/or know of a solution?
In Traefik 1.7, there is an option to add custom headers, and define a host header for the backend healthcheck:
https://docs.traefik.io/v1.7/configuration/backends/docker/
For example traefik.backend.healthcheck.hostname=foobar.com
Please note that 1.7 is still in RC, but you can test it out to see if it resolves your issue. If you need this in production, you will have to wait for 1.7 to reach stable status.
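As a hedged sketch of how that could look once on 1.7 (the service name my-app and the hostname are placeholders, and this assumes Traefik is reading service-level labels in swarm mode):
docker service update \
  --label-add traefik.backend.healthcheck.path=/health \
  --label-add traefik.backend.healthcheck.interval=30s \
  --label-add traefik.backend.healthcheck.hostname=myapp.company.com \
  my-app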

Can't log to graylog2 docker container via HTTP endpoint

I have a running Graylog2 Docker container on a remote machine with ports 3000 and 12900 exposed (3000 routes to port 9000 within Docker) and I can open the Graylog web UI on that port. So that works as expected. But for some reason I can't add logs from outside the container. Running this from the CLI WORKS from INSIDE the container, but DOESN'T WORK from OUTSIDE:
curl -XPOST http://localhost:3000/gelf -p0 -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'
Running this command from outside the docker container I get:
{"type":"ApiError","message":"HTTP 404 Not Found"}
Edit: Found some information that this could possibly be solved by setting GRAYLOG_REST_TRANSPORT_URI to a public IP when running the Docker container. Unfortunately, when I start it like that, I run into another problem: I can't start any inputs to receive logs (Bind address: 0.0.0.0, Port: 3000). It throws:
Request to start input 'project' failed. Check your Graylog logs for more information.
Edit2: Moved my testing environment to a local machine, to rule out possible server misconfigurations. Getting same errors and same problems.
Edit3: Decided to test out the graylog1 docker image and with that one everything actually works as expected right off the bat! So as a backup I could use an old version, but I'd rather avoid that if possible.
You have to start a GELF HTTP input to be able to receive GELF messages via HTTP.
The Graylog REST API does not provide this type of input.
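As a sketch, assuming a GELF HTTP input has been created (System > Inputs in the web UI) on its default port 12201 and that this port is published from the container, the message from the question would then be posted to that input port rather than to the web/REST port:
curl -X POST http://localhost:12201/gelf -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'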
