How to use EmbeddedKsql in a Spring Boot application? - ksqldb

I have a Kafka Streams Java application up and running. I was trying to use KSQL for the simple queries and Kafka Streams for the complex solutions, and I wanted to run both KSQL and Kafka Streams as
a Java application.
I was going through https://github.com/confluentinc/ksql/blob/master/ksqldb-examples/src/main/java/io/confluent/ksql/embedded/EmbeddedKsql.java. Is there any documentation for EmbeddedKsql, or any working prototype?

ksqlDB 0.10 has been released, and one of the newest features in it is the Java client.
Please go through https://www.confluent.io/blog/ksqldb-0-10-0-latest-features-updates/
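As a minimal sketch of what that client looks like; the host, port, and stream name RIDERSHIP here are assumptions for illustration, not from your setup:

    import io.confluent.ksql.api.client.Client;
    import io.confluent.ksql.api.client.ClientOptions;
    import io.confluent.ksql.api.client.Row;
    import io.confluent.ksql.api.client.StreamedQueryResult;

    public class KsqlClientExample {
        public static void main(String[] args) throws Exception {
            // Assumes a ksqlDB 0.10+ server reachable on localhost:8088
            ClientOptions options = ClientOptions.create()
                    .setHost("localhost")
                    .setPort(8088);
            Client client = Client.create(options);

            // Push query against a hypothetical stream; EMIT CHANGES keeps it running
            StreamedQueryResult result =
                    client.streamQuery("SELECT * FROM RIDERSHIP EMIT CHANGES;").get();
            Row row = result.poll();   // synchronously fetch one row
            System.out.println(row);

            client.close();
        }
    }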

The ksqlDB server does not have a supported Java API at this time, and the project doesn't offer any guarantees of maintaining compatibility between releases.
If you were to run ksqlDB embedded in your Java application, then KsqlContext would be the class to play around with. But I'm not sure how up to date it is, nor can I guarantee it won't be removed in a future release. I'm afraid there isn't any documentation or examples to look at, as it's not a supported use.
The only supported way to communicate with ksqlDB is really through its HTTP endpoints. You could still embed the server in your own Java app and talk locally over HTTP, though running them in separate JVMs has many benefits.
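For completeness, here is a rough sketch of hitting the server's /ksql endpoint from plain Java (java.net.http, JDK 11+); the host, port, and statement are assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class KsqlHttpExample {
        public static void main(String[] args) throws Exception {
            // Assumes a ksqlDB server on localhost:8088
            String body = "{\"ksql\": \"SHOW STREAMS;\", \"streamsProperties\": {}}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8088/ksql"))
                    .header("Content-Type", "application/vnd.ksql.v1+json; charset=utf-8")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());   // JSON array describing the streams
        }
    }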

Related

Rationale behind appending versions as Service/Deployment name on k8s with Spring Cloud Skipper

I am kind of new to the Spring Cloud Data Flow world, and while playing around with the framework, I see that if I have a stream 'test-stream' with one application called 'app', then when I deploy it using Skipper to Kubernetes, it creates the pod/deployment and service on Kubernetes with the name
test-stream-app-v1.
My question is: why do we need the v1 in the service/deployment names on k8s? What role does it play in the overall workflow using Spring Cloud Data Flow?
------Follow up -----------
Just wanted to confirm a few points to make sure I am on the right track in understanding the flow.
My understanding is that with a traditional stream (bound through Kafka topics), the service (object on Kubernetes) does not play a significant role.
The rolling update (red/black) pattern is implemented in Skipper in the following way, and versioning in the deployment/service plays its part as follows.
Let's assume that the app-v1 deployment already exists and an upgrade is requested. Skipper creates the app-v2 deployment and
waits for it to be ready. Once it is ready, it destroys app-v1.
If my above understanding is right, I have the following follow-up questions...
I see that Skipper can deploy any package (and it does not have to be a traditional stream). Is that the longer-term plan, or is Skipper only intended to work with spring-cloud-dataflow streams?
In the case of a non-traditional stream package, where a package has multiple apps (REST microservices) in a group, how will this model of versioning work? I mean, when I want to call one microservice from another microservice, I cannot possibly know (or it is less than ideal to have to know) the release version of the app.
@Anand: congrats on the 1st post!
The naming convention follows the idea that each stream application is "versioned" when Skipper is used with SCDF. The version gets bumped when, as a user, you rolling-upgrade or rolling-downgrade the streaming-application versions or the application-specific properties, either on demand or via CI/CD automation.
It is very relevant for continuous-delivery and continuous-deployment workflows, and we provide native options in SCDF through commands such as stream update .. and stream rollback .. respectively. For any of these operations, the applications are rolling-updated in K8s, and each action bumps the number in the application name. In your example, you'd see them as test-stream-app-v1, test-stream-app-v2, etc.
With all the historical versions in a central place (i.e., Skipper's database), you'd be able to interact with them via the stream history .. and stream manifest .. commands in SCDF.
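A typical shell session for these operations might look like the following; the stream name matches your example, and the property being updated is illustrative:

    dataflow:>stream update --name test-stream --properties "version.app=1.0.1"
    dataflow:>stream history --name test-stream
    dataflow:>stream manifest --name test-stream
    dataflow:>stream rollback --name test-stream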
To learn more about all this, watch this demo-webinar (starts at ~41:25), and also have a look at the samples in the reference guide.
I hope this helps.

Spring Cloud Dataflow reducing cost of streams

I'm using the Spring Cloud Data Flow local server and deploying 60+ streams, each with a Kafka topic and a custom sink. The memory/CPU usage cost is not currently scalable. I've set Xmx to 64m for most streams.
Currently exploring my options.
Disable embedded Tomcat server. I tried this once and SCDF couldn't tell the deployment status of the stream.
Group multiple Kafka "source" topics to a single sink app. This is allowed by Kafka but unclear if SCDF will permit subscribing to multiple topics.
Switch to using the Kubernetes deployer. Won't exactly reduce the memory/CPU usage, but it would distribute it across multiple machines. I haven't pursued this option because Kubernetes isn't used in my org yet. Maybe this will force the issue.
Open to other ideas. Might also be able to tweak Kafka configs such as max.poll.records and reduce memory usage.
Thanks!
First, I'd like to clarify the differences between SCDF and Stream/Task apps in the data pipeline.
SCDF is a lightweight Spring Boot app that includes the DSL, REST-APIs, and the Dashboard. Simply put, it serves as the orchestrator to define and deploy stream and task/batch data pipelines made of stream and task applications respectively.
The actual business logic, its performance, and the underlying resource consumption live at the individual stream/task application level. SCDF doesn't interfere with the apps' operation, nor does it contribute to the resource load. Everything, in the end, is a standalone Boot app - a standalone Java process.
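For orientation, defining and deploying a pipeline through the SCDF shell is a one-liner; the stream name and definition below are illustrative, not from your setup:

    dataflow:>stream create --name test-stream --definition "http | log" --deploy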
Now, to your exploratory steps.
Disable embedded Tomcat server. I tried this once and SCDF couldn't tell the deployment status of the stream.
SCDF is a REST server, and it requires the application container (in this case Tomcat); you cannot disable it.
Group multiple Kafka "source" topics to a single sink app. This is allowed by Kafka but unclear if SCDF will permit subscribing to multiple topics.
Again, there is no relation between SCDF and the apps. SCDF orchestrates full-blown stream/task applications (i.e., Boot apps) into a coherent data pipeline. If you have to produce to or consume from multiple Kafka topics, that is done at the application level. Check out the multi-io sample for more details.
There's also the facility to consume from multiple topics directly via a named destination. SCDF provides DSL/UI capabilities to build fan-in and fan-out pipelines. Refer to the docs for more details. This video could be useful, too.
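As a sketch of the application-level approach, the Kafka binder lets a single consumer binding subscribe to several topics via a comma-separated destination; the binding and topic names below are hypothetical:

    # application.properties of the sink app (binding and topic names are made up)
    spring.cloud.stream.bindings.input.destination=orders,payments,shipments

And the named-destination form of the DSL would look roughly like :orders > my-sink when composing the pipeline in SCDF.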
Switch to using Kubernetes deployer.
SCDF's Local-server is generally recommended for development. Primarily because there's no resiliency baked into the Local-server implementation. For example, if the streaming apps crash for any reason, there's no mechanism to restart them automatically. This is exactly why we recommend either SCDF's Kubernetes or Cloud Foundry server implementations in production. The platform provides the resiliency and fault-tolerance by automatically restarting the apps under fault scenarios.
From a resourcing standpoint, once again, it depends on each application. They are standalone microservice applications, each doing a specific operation at runtime, and resource consumption comes down to how much the business logic requires.

Which application container is better for Docker containers?

Our future architecture is to move towards Docker/microservices. Currently we are using JBoss EAP 6.4 (with the potential to upgrade to EAP 7) and Tomcat.
In my view, a Java EE container is too heavy (slow, more memory, higher maintenance, etc.) for a microservices environment. However, I was told that EAP 7 is quite fast and lightweight and can be used for developing microservices. What is your input in deciding between EAP 7 and Tomcat 8 for Docker/microservices? Cost and speed would be considerations.
EAP7 vs Tomcat 8 is an age-old question, answered multiple times here, here and here.
Tomcat is only a web container, whereas EAP7 is an application server that provides all Java EE 7 features such as persistence, messaging, web services, security, management, etc. EAP7 comes in two profiles: the Web Profile and the Full Profile. The Web Profile is a much trimmer version that includes only the implementations typically required for building a web application. The Full Profile, as you'd expect, contains the full glory of the platform. So using the EAP 7 Web Profile will help you cut down the bloat quite a bit.
With Tomcat, you'll have to use something like Spring to bring the equivalent functionality and package all the relevant JARs with your application.
These discussions are typically helpful when you are starting a brand-new project and have both Java EE and Spring resources at hand. Here are the reasons you may consider using EAP7:
You are already using EAP 6.4, so migrating to EAP 7 would be seamless, and using Docker would be just a different style of packaging your applications. All your existing monitoring, clustering, and logging would continue to work. If you were to go with Tomcat, then you'd have to learn the Spring way of doing things. If you have the time and resources and are willing to experiment, you can go that route too, but think about what you want to gain out of it.
EAP 7 is optimized for container and cloud deployments. Particularly, it is available as a service with OpenShift and so you know it works OOTB.
EAP 7 will give a decent performance boost in terms of latency and throughput over EAP 6.4. Read https://access.redhat.com/articles/2607521 for more details.
You may also consider TomEE. They provide Java EE stack integrated with Tomcat.
Another option, as @Federico recommended, is to consider using WildFly Swarm. Then you can really start customizing which parts of the Java EE platform you want, and your deployment model is a JAR file.
As for packaging using Docker, they all provide a base image and you need to bundle your application in it. Here are a couple of important considerations for using a Docker image for microservices:
Size of the Docker image: A container may die unexpectedly, or the orchestration framework may decide to reschedule it on a different host. A bigger image takes that much longer to download, which means the perceived startup time of the service is longer, and dynamic scaling of the app takes longer to become effective.
Bootup time of the image: After the image is downloaded, the container may start up quickly, but how long does it take for the application to be "ready"?
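As an illustrative sketch of that bundling step, assuming the publicly available jboss/wildfly base image and a hypothetical WAR name:

    # Dockerfile: layer the application onto the vendor-provided base image
    FROM jboss/wildfly
    # Drop the WAR into the deployment scanner's directory so it deploys at startup
    COPY target/myapp.war /opt/jboss/wildfly/standalone/deployments/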
As a personal note, I'm much more familiar with Java EE stack than Tomcat/Spring, and WildFly continues to be favorite application server.
Besides the traditional application servers, which are not really that heavy, you can taste a different flavor of Java EE, called microcontainers.
Java EE is just a set of standards. A standard results in an API specification, and everyone is then free to implement that specification. An application server (AS) is mainly a fine-tuned collection of this functionality. Those APIs were not brought to life for no reason; they represent functionality commonly used in projects. An application server can be viewed as a "curated set" of those functionalities. This approach has many advantages: an AS has many users, so it is well tested over time, whereas wiring the functionality together on your own may result in bugs.
Anyhow, a new age has come where, with Docker, the application carries its dependencies with it. A full-blown application server with all of its functionality ready to be served to applications is no longer required in many cases. In times past, the application server did not know exactly which services the deployed applications would need, so everything was bundled in. Some of the more innovative ASes, like WildFly, instantiate only the services required, and the Java EE profiles have eased the monolithic application server a little bit.
Right now, we usually ship the application together with its dependencies (JDK, libraries, AS) inside a Docker image - or we're heading there. Therefore, an effort to bundle exactly the right amount of functionality is a logical choice. But, and it is a big "but", the need for the functionality of the AS is still relevant. It is still a good idea to develop common functionality based on standards and common effort; it just no longer seems to be an option to distribute it as one big package, potentially leaving most of the APIs inactive. This effort goes by many names - microcontainers, uberjar creators, and so on:
WildFly Swarm
Payara Micro
Spring Boot*
WebSphere Liberty
Apache TomEE
These Java EE servers are so light that it is doubtful you'd want to use anything else.
* Spring Boot is not based on Java EE, and in the default configuration presented in the Getting Started guide, Tomcat is used internally.
The key point is: your Java EE application should be developed as an independent Java EE application. Wrapping it with "just enough" functionality is delegated to these micro solutions. This is, at least in my humble opinion, the right way to go: you retain compatibility with both full-blown ASes and micro-solutions. The uber-jar containing all the dependencies can be created during or after the build process.
WildFly Swarm and Payara Micro are able to "scan" the application and run only the services required. For a real-world application, the memory footprint in production can be as low as 100 MB. This is probably what you want. Spring Boot can do similar things if you require Spring; however, from my experience, Spring Boot is much more heavyweight and memory-hungry than modern Java EE, because it obviously has Spring inside. So if you are seeking lightness in terms of memory consumption, try Java EE, especially WildFly Swarm (or pure WildFly) and Payara Micro. Those are my favorite ASes, and they can be really, really small. I would say WildFly Swarm is much easier to start with; Payara Micro requires more reading but offers interesting functionality. Both can work as a wrapper - you can just wrap your current project with them after the build phase, no need to change anything, as in the sketch below.
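For example, wrapping an existing WAR with Payara Micro after the build phase is, to the best of my knowledge, a one-liner (the file names are hypothetical):

    java -jar payara-micro.jar --deploy target/myapp.war
    # or bake it into a self-contained uber-jar:
    java -jar payara-micro.jar --deploy target/myapp.war --outputUberJar myapp-micro.jar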
Payara Micro even provides ready-to-use Docker images! As you can see, Java EE is mature and ready to enter Docker land :)
One of the very good and reliable resources is Adam Bien, for example in his Java EE micro/nanoservices video. Have a look.

Elasticsearch client for iOS

Does anyone know of an Elasticsearch client library for iOS? It would be a bonus if it were written in Swift as well.
The Elasticsearch 'clients' section shows multiple libraries for a number of platforms but nothing for iOS. I feel like someone must have done this?
Cheers
I doubt that anyone has - last time I checked there were none, and for good reasons. Keep in mind that in order to allow an iOS client (or Android, for that matter) to use a client library to connect to Elasticsearch, you'd have to open up your cluster to either HTTP or node access - which would allow anyone to do anything to your cluster.
Maybe you could proxy it to prevent deletions and insertions, but even so it would expose your cluster's data and open you up to DoS attacks.
Generally a better idea is to create your own REST API that incorporates some type of authentication and authorization and does not open up your cluster to the world.
If you still feel strongly about moving forward, you can always hit the HTTP interface of ES's REST API directly. Or take a look at this project someone was working on a few years ago, at least to give you a head start:
https://github.com/tallpsmith/ElasticSearchIOSHead
Some recent discussions on this topic:
http://elasticsearch-users.115913.n3.nabble.com/Objective-C-client-for-ElasticSearch-iphone-ipad-etc-td3911216.html
Running Elasticsearch server on a mobile device (android / iphone / ios)
This may be what you're looking for: ElasticSwift. It seems to be in active development, but I haven't looked deeply into how far they've gone.
This is another iOS Swift client, made back in 2018: Appbase-Swift. It's a lightweight Elasticsearch/appbase.io client. It doesn't seem like it's been updated in a while, though.
We tried Elasticsearch extensively and finally concluded that it is mostly on the challenging side here. Another thing: it is a bit bulky to use on mobile, and as per my understanding there is no room for optimization. You can only use HTTP web service calls for the service. Also, the flavor of offline search cannot be implemented in mobile Elasticsearch to date...

Can Couchbase 2.0 be accessed from Erlang via erlmc & the memcache 1.3 protocol?

I'm developing an application in Erlang/Elixir, and I'd like to access Couchbase 2.0 from Erlang. I found the erlmc project (https://github.com/JacobVorreuter/erlmc), which is a binary-protocol memcached client. The notes say "you must have a version 1.3 or greater of memcached."
I understand that Couchbase 2.0 uses the memcached binary protocol for accessing data, and I'm looking for the best way to do this from Erlang.
The manual talks about a "Couchbase API Port" on 8092, and calls 11210 (close to 11211, the normal memcached port) the "internal cluster port".
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-network-ports.html
So, the question is this:
Is setting up erlmc to talk to Couchbase 2.0 on port 8092 the correct way to go about it?
erlmc talks about how it hashes keys to find the right server, which makes me think that it might target too old a version of the memcached protocol (or is there a built-in Moxi on Couchbase 2.0 that I should be connecting to? If so, on which port?).
Which port serves the views? And presumably the REST interface for views does not support straight key lookups, so I'll need to write code to access that as well, right?
I'm keen to use a pure erlang solution since NIFs are not concurrent and I'll have some unknown number of processes wanting to access Couchbase 2.0 at the same time.
The last time I worked with Couch it was CouchDB, so I'm trying to piece things together after the merger of Couch and Membase.
If I'm off on the wrong track, please advise on the best way to access Couchbase 2.0 from Erlang in a highly concurrent manner. The memcached protocol should be pretty stable, so even libraries a couple of years old should work, right?
Thanks!
The short answer is: yes, Couchbase is compatible with the memcached text protocol.
But the key point here is "memcached text protocol". Since memcached uses two different protocol types (text and binary), you should use clients that speak the text protocol.
At Mochi, we are using merle for memcached, and it looks like it should work for you. Recently, one of my colleagues forked it and made some minor corrections: https://github.com/twonds/merle
Also, consider taking a look at https://github.com/EchoTeam/mcd. This client could use some refactoring, but it is also production-proven and even allows simple sharding.
Thanks to Xavier's contributions, I refactored the whole thing and added pooling; now it builds and performs okay. I also included a basho_bench driver so you can benchmark it yourself. You can find the code here. I am pretty sure this would perform better than the text protocol.
I had to create my own vbucket-aware, erlmc-based Erlang Couchbase client.
The differences:
- an HTTP connection to retrieve the vbucket map from Couchbase
- filling the two "reserved" bytes with the vbucket id (see the Python client for an example)
- an active-once async TCP connection, for performance reasons
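For context, those two "reserved" bytes are offsets 6-7 of the 24-byte memcached binary request header, which Couchbase repurposes as the vbucket id. A minimal sketch of building such a header in Java (the opcode and lengths are whatever your request needs):

    import java.nio.ByteBuffer;

    public class BinaryHeader {
        // Build a 24-byte memcached binary protocol request header carrying a vbucket id.
        static byte[] header(byte opcode, short vbucketId, int keyLen, int bodyLen) {
            ByteBuffer buf = ByteBuffer.allocate(24);   // big-endian = network byte order
            buf.put((byte) 0x80);          // magic: request
            buf.put(opcode);               // e.g. 0x00 = GET
            buf.putShort((short) keyLen);  // key length
            buf.put((byte) 0);             // extras length
            buf.put((byte) 0);             // data type
            buf.putShort(vbucketId);       // bytes 6-7: vbucket id ("reserved" in vanilla memcached)
            buf.putInt(bodyLen);           // total body length
            buf.putInt(0);                 // opaque
            buf.putLong(0L);               // CAS
            return buf.array();
        }
    }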
The only answer I have so far is:
https://github.com/chitika/cberl
This project is based on the "official" C++ Couchbase client.
It seems to have two possible problems:
1) it might be abandoned (the last activity was 3 months ago), and
2) it uses a NIF which, as I understand it, cannot be accessed concurrently.
We don't use Couchbase with Erlang, but with Python, which also needs to connect with a memcache client. I can't speak to the Erlang libraries specifically, but hopefully the lessons apply in both situations.
Memcache Client Limitations
Memcache clients can only access memcache functionality. You won't be able to use views or any other features not specified in the memcache protocol. If you want access to the views, you will need to use the REST protocol separately on port 8092 (docs).
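For instance, a view query over that REST interface looks roughly like this, where the bucket, design document, and view names are placeholders:

    GET http://<couchbase-host>:8092/<bucket>/_design/<ddoc>/_view/<view>?limit=10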
Connecting to Couchbase with Vanilla Memcache Clients
The ports mentioned on that page are used either internally or by "smart" clients written for Couchbase specifically. By default, memcache clients can connect to the normal memcache port 11211 on any of the nodes in your Couchbase cluster. Do not use the memcache cluster features of any memcache client not written specifically for Couchbase; the usual methods of distribution for vanilla memcached are incompatible with Couchbase.
Explanation
In order to connect with a memcached client, you need to connect to the port of the Couchbase bucket directly. When you set up a new bucket, you specify the port you want the bucket to be accessible on. The default bucket is set up on port 11211. Each bucket acts like an independent memcached instance but is internally distributed to all nodes in the cluster. You can connect to the bucket port on any of the Couchbase servers, and you will be accessing the same data set.
This means that you should not try to use the distributed memcache features of your memcache client. Those features are designed for ad-hoc memcached clusters. Just connect to the appropriate port on the Couchbase server as if it was a single memcached server.
The reason this is possible is because there is a Moxi instance which finds the appropriate Couchbase server to process the request. This Moxi instance automatically runs for each bucket on every Couchbase server. Even though you may not be connected to the node which has your specific key, Moxi will transparently direct your request to the appropriate server.
In this way, you can use a vanilla Memcache client to talk to Couchbase, without needing any additional logic to keep track of cluster topology. Moxi takes care of that piece for you.
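The thread is about Erlang, but the same idea in any vanilla memcached client boils down to "connect to one node on the bucket port and let Moxi route the keys". A minimal sketch using Java's spymemcached library, with the hostname and default bucket port assumed:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class CouchbaseViaMemcached {
        public static void main(String[] args) throws Exception {
            // Connect to the bucket port (11211 for the default bucket) on ANY node;
            // the per-node Moxi transparently routes each key to the right server.
            MemcachedClient client =
                    new MemcachedClient(new InetSocketAddress("couchbase-host", 11211));
            client.set("greeting", 0, "hello");   // expiration 0 = no expiry
            System.out.println(client.get("greeting"));
            client.shutdown();
        }
    }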
Binary protocol
We did have the binary protocol working at one point, but there were problems when we tried to use the flush_all command. That was a while ago, though. I suggest experimenting yourself to see if the level of support meets your needs.
