Elasticsearch: securing the connection (Docker)

I am (desperately) new to Elasticsearch (7.9.0) and I currently have a cluster with two nodes running.
After a lot of effort it is performing as I would like it to.
It is running on Docker and has nginx in front of it to route traffic to it, since it is accessed directly from my website (Angular 10).
Elasticsearch is also used by my Laravel backend directly through the Docker container name, so that part is secure (I guess).
My problem now is that I cannot find or understand a way to secure HTTP access from outside Docker (e.g. from the normal website).
Going via Laravel is an option, but it is too slow for my purpose.
Is there a way I can securely allow HTTP access to Elasticsearch from the web?
Also, is there a way I can restrict access to read-only actions?
If you need more info to help out, please let me know, as I don't know what is important here and what isn't.
Thanks

Angular is a front-end framework and runs in your users' web browsers. If Angular can somehow reach your Elasticsearch instance, everyone can. No matter what. You can try to obscure it as much as you want, but if Elasticsearch is directly exposed, it will be reachable.
So you either have to accept this fact, or go the slow way and proxy the requests through Laravel, so it can verify that the requested information is actually available to the user making the request.

Related

Concerns with gRPC architecture (gRPC, nginx, docker)

I'm currently trying to create a tracing tool for fun (one that supports gRPC tracing) and was confused as to whether I was thinking about this architecture properly. A tracing tool keeps track of the entire workflow/journey of a request (from the moment a user clicks a button, to when the request goes to the API gateway, between microservices, and back).
Let's say the application is a bookstore, and it is broken up into 2 microservices, maybe account and books. Let's say there is a user interface, and clicking a button lets a user favorite a book. I'm only using 2 microservices to keep this example simple.
**Different parts of the Fake/Mock up application**
UI ->
nginx -> I wanted to use this as an API Gateway.
microservice 1 -> (Contains data for all Users of a bookstore)
microservice 2 -> (Contains data for all the books)
**So my goal is to figure out a way to trace that request. So we can imagine the request goes to nginx.**
Concern #1: When the request goes to nginx, it is HTTP. Cool, but when the request is sent to the microservice, it is a gRPC call (i.e. over HTTP/2). Can nginx receive an HTTP request and then send that request on over HTTP/2? Not sure if I'm wording this correctly. I know NGINX Plus supports HTTP/2. I also know that gRPC has a gRPC gateway too.
Concern #2: Containerization. Do I have to containerize both microservices individually, or containerize the whole application as one container? Is it simple to link nginx and Docker?
Concern #3: When tracing gRPC requests (finding out how long a request takes to be fulfilled), I'm considering using a middleware logger or a tracing API (OpenTracing, Jaeger, etc.). How else could I figure out how long gRPC requests take?
I was wondering whether these concerns can be addressed, whether my thought process is correct, and whether this architecture is feasible.
Most solutions in the industry are implemented on top of a container orchestration solution (Kubernetes, Docker Swarm, etc.).
It is usually not a good idea to "containerize" and manage the reverse proxy yourself.
The reverse proxy should be aware of every container's status (by hooking into the orchestrator) and dynamically update its configuration when a container is created, crashes, or is relocated (because a machine goes out of service).
On Kubernetes, gRPC traffic is typically handled by a service mesh. Please take a look at Kubernetes service meshes.
If you decide to use Traefik and Docker Swarm, check out Traefik's h2c support.
In conclusion, consider more modern alternatives to nginx when you want to load balance gRPC.

How to bring two Cloud Run Apps under one domain to avoid CORS

I have two apps I wanted to have "fully managed" by Cloud Run. One is a pure Vue.js SPA and the other is the belonging backend server for it that is connected to a MySQL and also fetches some other API endpoints.
Now I have deployed both apps but have no idea how to give the frontend app access to the backend app. They should both run on the same domain so that the frontend doesn't run into CORS issues.
Current URL of the frontend app: https://myapp-xl23p3zuiq-ew.a.run.app
So I'd love to have the server accessible by: https://myapp-xl23p3zuiq-ew.a.run.app/api
Is this somehow possible to achieve with Cloud Run?
I was having the same issue. The usual idea is to use path mapping: map / to your client and /server to your backend. After googling for a while I found this:
https://cloud.google.com/run/docs/mapping-custom-domains
Base path mapping: not supported
The term base path refers to the URL path name that is after the domain name. For example, users is the base path of example.com/users. Cloud Run only allows you to map a domain to /, not to a specific base path. So any path routing has to be handled by using a router inside the service's container or by using Firebase Hosting.
Option 1:
I ended up creating an "all in one" Docker image with nginx as a reverse proxy in front of the client (some static files) and the server (in my case a Python application powered by uWSGI).
If you are looking for inspiration, you can check out the public repository here: https://gitlab.com/psono/psono-combo
Option 2:
An alternative would be to host your client on client.example.com and your server on server.example.com, and then create a third Cloud Run service with a reverse proxy under example.com.
All requests would be proxied to the client and server. Your users will only interact with example.com, so CORS won't be an issue.
Option 3:
Configure CORS, so people accessing example.com can also connect to server.example.com.
Currently this is not possible in Cloud Run, as already mentioned in the comments on your question.
You could check whether there is a Feature Request for this functionality on Buganizer (Google Issue Tracker). Currently there seems to be none; if that is indeed the case, you can create a new Feature Request by changing the request type from Bug to Feature Request, and you will be kept informed as Google develops it on their roadmap.
Hope this helped you.

How can I implement a sub-api gateway that can be replicated?

Preface
I am currently trying to learn how micro-services work and how to implement container replication and API gateways. I've hit a block though.
My Application
I have three main services for my application.
API Gateway
Crawler Manager
User
I will be focusing on the API Gateway and Crawler Manager services for this question.
API Gateway
This is a Docker container running a Go server. All communication is done with GraphQL.
I am using an API Gateway because I expect to have different services in my application, each with its own specialized API. The gateway unifies everything.
All it does is proxy requests to the appropriate service and return a response to the client.
Crawler Manager
This is another Docker container running a Go server. Communication is done with GraphQL.
More or less, this behaves like another API gateway. Let me explain.
This service expects the client to send a request like this:
```graphql
{
  # In production 'url' will be encoded in base64
  example(url: "https://apple.example/") {
    test
  }
}
```
The url can only link to one of these three sites:
https://apple.example/
https://peach.example/
https://mango.example/
Any other site is strictly prohibited.
Once the Crawler Manager service receives a request and the link is one of those three, it decides which other service should fulfill the request. In that way, it behaves much like another API gateway, but a specialized one.
Each URL domain gets its own dedicated service for processing it. Why? Because the sites vary quite a bit in markup and each one needs to be crawled for information. Because their markup differs, I'd like a service for each of them, so that if one site changes, the whole Crawler Manager service doesn't go down.
As far as querying goes, each site returns a response formatted identically to the others.
Visual Outline
Problem
Now that we have a bit of an idea of how my application works I want to discuss my actual issues here.
Is having a sort of secondary API gateway standard and good practice? Is there a better way?
How can I replicate this system and have multiple Crawler Manager service family instances?
I'm really confused about how I'd actually create this setup. I looked at clusters in Docker Swarm / Kubernetes, but with the way I have it set up, it seems like I'd need to make clusters of clusters. That makes me question my design overall. Maybe I shouldn't try to keep them so structured?
At a very generic level, if service A calls service B that has multiple replicas B1, B2, B3, ... then it needs to know how to call them. The two basic options are to have some sort of service registry that can return all of the replicas, and then pick one, or to put a load balancer in front of the second service and just directly reach that. Usually setting up the load balancer is a little bit easier: the service call can be a plain HTTP (GraphQL) call, and in a development environment you can just omit the load balancer and directly have one service call the other.
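The first option (a registry returns all replicas and the caller picks one) can be as small as a round-robin picker. A minimal Go sketch, assuming the replica addresses are already known, for example from a registry lookup; the addresses mirror the diagram but are hypothetical, and health checking is left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin hands out replica addresses in turn; safe for concurrent use.
type RoundRobin struct {
	replicas []string
	counter  uint64
}

func NewRoundRobin(replicas ...string) *RoundRobin {
	return &RoundRobin{replicas: replicas}
}

// Pick returns the next replica, cycling through the list.
// The boolean is false when no replicas are registered.
func (rr *RoundRobin) Pick() (string, bool) {
	if len(rr.replicas) == 0 {
		return "", false
	}
	n := atomic.AddUint64(&rr.counter, 1) - 1
	return rr.replicas[n%uint64(len(rr.replicas))], true
}

func main() {
	rr := NewRoundRobin("service-1-a:8080", "service-1-b:8080", "service-1-c:8080")
	for i := 0; i < 4; i++ {
		addr, _ := rr.Pick()
		fmt.Println(addr) // a, b, c, then a again
	}
}
```

With a load balancer in front instead, this logic disappears from the caller entirely, which is why that option is usually simpler.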
```
                                 /-> service-1-a
Crawler Manager --> Service 1 LB --> service-1-b
                                 \-> service-1-c
```
If you're willing to commit to Kubernetes, it essentially has built-in support for this pattern. A Deployment is some number of replicas of identical pods (containers), so it would manage the service-1-a, -b, -c in my diagram. A Service provides the load balancer (its default ClusterIP type provides a load balancer accessible only within the cluster) and also a DNS name. You'd configure your crawler-manager pods with perhaps an environment variable SERVICE_1_URL=http://service-1.default.svc.cluster.local/graphql to connect everything together.
(In your original diagram, each "box" that has multiple replicas of some service would be a Deployment, and the point at the top of the box where inbound connections are received would be a Service.)
In plain Docker you'd have to do a bit more work to replicate this, including manually launching the replicas and load balancers.
Architecturally what you've shown seems fine. The big "if" to me is that you've designed it so that each site you're crawling potentially gets multiple independent crawling containers and a different code base. If that's really justified in your scenario, then splitting up the services this way makes sense, and having a "second routing service" isn't really a problem.

Only allow an Openshift app to be connected with another one

I am currently using the free version of OpenShift. I have a scalable Ruby on Rails + Postgres app using 2 of my gears and a separate (potentially scalable) Elasticsearch app using the 3rd gear.
The Elasticsearch app was generated using https://github.com/rbrower3/openshift-elasticsearch-cartridge
Since Elasticsearch runs as an app on its own URL, it is open to attack from the outside world if someone finds out its web address.
I have considered the elasticsearch-jetty plugin, although I haven't yet managed to lock it down with a username and password. I was wondering if there are any other options for limiting access to my Elasticsearch OpenShift app (e.g. using Apache) so that only my other app can connect to it. That app needs both read and write access: updating the Elasticsearch index as well as selecting data from it.
Thanks
The most basic answer is that we support .htaccess for Apache, where you can specify a username and password. The other option is to put some other auth mechanism in front of Elasticsearch by modifying the code in the repo. I am not familiar enough with a default Elasticsearch install to know what specific mechanism you can use.

How to host a Rails application as an API that is only accessible locally?

I am starting to create a RESTful API that is built on Ruby on Rails. I would like my other applications (which are hosted on the same server) to be able to use this API. I had the idea that if the API is only available locally, I won't have to deal with the authentication logic since it won't be publicly accessible. I have never done this sort of thing before, so I don't even know if what I am asking for is possible (or if this is even a good idea).
How can I host this application so that my REST API is only locally accessible?
You can do one of the following:
Set the webserver to listen on loopback only
If you need to give access to the local network then configure your firewall to forward ports accordingly
Set the webserver to listen only on the private network interface (not public)
