Confluent Platform KSQL in headless mode - ksqldb

I've read through KSQL deployment options here https://www.confluent.jp/blog/deep-dive-ksql-deployment-options/. So it is recommended to use headless KSQL for production deployment.
But I have not found any hints on how I can stop/change queries when in production (headless) mode when KSQL disables interactive access to server via REST/CLI. Does that mean that I need to shut down all KSQL servers in order to add/change one query?

You can deloy headless or interactive into Production, depending on what meets your needs.
Headless is designed to allow you to run a known set of queries in a locked down fashion. This can be a requirement for production systems with strict SLAs, where you don't want someone connecting and kicking off an expensive query or dropping something that causes SLAs to be broken.
As you correctly identify, the Headless deployment mode doesn't allow you to change the DDL of your cluster through a CLI/API. Instead, it would be more normal to have some kind of automation around updating the SQL file and bouncing the cluster. We are aware there is much room for improvement here.
Keep in mind that KSQL does not, at the time of writing, support updating an existing table or stream. However, this is something we're actively working towards. Until that is supported, in general you should only add queries to the file. Any deletions or changes to existing queries would require careful testing as there are many changes KSQL does not currently support. Always ensure changes are thoroughly testing before any prod deployment. Alternatively, some users spin up new clusters when changes need to be made, (hopefully infrequently!). Once caught up, they fail over clients and turn of the old cluster. Again, this is an area in which KSQL will see improvements.
Hope this helps and thanks for using KSQL!

Related

Is there a benefit to developing an iOS app against a docker instance?

Our backend is containerised with docker for use with minikube, I was wondering if as an iOS developer I can take advantage of this by running the backend locally on my laptop rather than having to communicate with a staging cloud based environment which can often be flaky.
Am I misunderstanding how this technology works, or would this be a viable and useful case for docker in iOS development, speeding up request and response times and allowing more control over the state of the backend I am building against?
Thanks for any clarity on this idea
What you’re explaining is possible and is something I do in my day job regularly so as to possibility, yes you can do this.
The question of whether this raises any benefit is broad and depends on every individuals needs. If you are finding that your cloud instance is extremely slow at the moment and you don’t have capacity to improve its performance, a locally run docker instance could very well help with this.
One thing to keep in mind though is that any changes you make to a local instance/server in order to make the app work as expected will need to be reflected into your production instance as soon as your app goes live to the public otherwise you will see undesired behaviour due to the app relying on non-existent server configs.

How to implement rolling updates and a relational database?

I have an existing system that uses a relational DBMS. I am unable to use a NoSQL database for various internal reasons.
The system is to get some microservices that will be deployed using Kubernetes and Docker with the intention to do rolling upgrades to reduce downtime. The back end data layer will use the existing relational DBMS. The micro services will follow good practice and "own" their data store on the DBMS. The one big issue with this seems to be how to deal with managing the structure of the database across this. I have done my research:
https://blog.philipphauer.de/databases-challenge-continuous-delivery/
http://www.grahambrooks.com/continuous%20delivery/continuous%20deployment/zero%20down-time/2013/08/29/zero-down-time-relational-databases.html
http://blog.dixo.net/2015/02/blue-turquoise-green-deployment/
https://spring.io/blog/2016/05/31/zero-downtime-deployment-with-a-database
https://www.rainforestqa.com/blog/2014-06-27-zero-downtime-database-migrations/
All of the discussions seem to stop around the point of adding/removing columns and data migration. There is no discussion of how to manage stored procedures, views, triggers etc.
The application is written in .NET Full and .NET Core with Entity Framework as the ORM.
Has anyone got any insights on how to do continious delivery using a relational DBMS where it is a full production system? Is it back to the drawing board here? In as much that using a relational DBMS is "too hard" for rolling updates?
PS. Even though this is a continious delivery problem I have also tagged with Kubernetes and Docker as that will be the underlying tech in use for the orchestration/container side of things.
All of the following under the assumption that I understand correctly what you mean by "rolling updates" and what its consequences are.
It has very little (as in : nothing at all) to do with "relational DBMS". Flatfiles holding XML will make you face the exact same problem. Your "rolling update" will inevitably cause (hopefully brief) periods of time during which your server-side components (e.g. the db) must interact with "version 0" as well as with "version -1" of (the client-side components of) your system.
Here "compatibility theory" (*) steps in. A "working system" is a system in which the set of offered services is a superset (perhaps a proper superset) of the set of required services. So backward compatibility is guaranteed if "services offered" is never ever reduced and "services required" is never extended. However, the latter is typically what always happens when the current "version 0" is moved to "-1" and a new "current version 0" is added to the mix. So the conclusion is that "rolling updates" are theoretically doable as long as the "services" offered on server side are only ever extended, and always in such a way as to be, and always remain, a superset of the services required on (any version currently in use on) the client side.
"Services" here is to be interpreted as something very abstract. It might refer to a guarantee to the effect that, say, if column X in this row of this table has value Y then I will find another row in that other table using a key computed such-and-so, and that other row might be guaranteed to have column values satisfying this-or-that condition.
If that "guarantee" is introduced as an expectation (i.e. requirement) on (new version of) client side, you must do something on server side to comply. If that "guarantee" is currently offered but a contradicting guarantee is introduced as an expectation on (new version of) client side, then your rolling update scenario has by definition become inachievable.
(*) http://davidbau.com/archives/2003/12/01/theory_of_compatibility_part_1.html
There are also parts 2 and 3.
I work in an environment that achieves continuous delivery. We use MySQL.
We apply schema changes with minimal interruption by using pt-online-schema-change. One could also use gh-ost.
Adding a column can be done at any time if the application code can work with the extra column in place. For example, it's a good rule to avoid implicit columns like SELECT * or INSERT with no columns-list clause. Dropping a column can be done after the app code no longer references that column. Renaming a column is trickier to do without coordinating an app release, and in this case you may have to do two schema changes, one to add the new column and a later one to drop the old column after the app is known not to reference the old column.
We do upgrades and maintenance on database servers by using redundancy. Every database master has a replica, and the two instances are configured in master-master (circular) replication. So one is active and the other is passive. Applications are allowed to connect only to the active instance. The passive instance can be restarted, upgraded, etc.
We can switch the active instance in under 1 second by changing an internal CNAME, and updating the read_only option in each MySQL instance.
Database connections are terminated during this switch. Apps are required to detect a dropped connection and reconnect to the CNAME. This way the app is always connected to the active MySQL instance, freeing the passive instance for maintenance.
MySQL replication is asynchronous, so an instance can be brought down and back up, and it can resume replicating changes and generally catches up quickly. As long as its master keeps the binary logs needed. If the replica is down for longer than the binary log expiration, then it loses its place and must be reinitialized from a backup of the active instance.
Re comments:
how is the data access code versioned? ie v1 of app talking to v2 of DB?
That's up to each app developer team. I believe most are doing continual releases, not versions.
How are SP's, UDF's, Triggers etc dealt with?
No app is using any of those.
Stored routines in MySQL are really more of a liability than a feature. No support for packages or libraries of routines, no compiler, no debugger, bad scalability, and the SP language is unfamiliar and poorly documented. I don't recommend using stored routines in MySQL, even though it's common in Oracle/Microsoft database development practices.
Triggers are not allowed in our environment, because pt-online-schema-change needs to create its own triggers.
MySQL UDFs are compiled C/C++ code that has to be installed on the database server as a shared library. I have never heard of any company who used UDFs in production with MySQL. There is too a high risk that a bug in your C code could crash the whole MySQL server process. In our environment, app developers are not allowed access to the database servers for SOX compliance reasons, so they wouldn't be able to install UDFs anyway.

setting up Neo4j replication on two instances

I am planning to configure some sort of 2 node replication for neo4j, similar to mysql replication. Since I am a little constrained on resources I don't want to pay for more than two Cloud compute instances. Also I am happy with just one real time or near real time copy of the neo4j database. So the approach i can think of is:
Configure HA on the two compute nodes with the help of an arbiter instance. Setup one neo4j instance (master) on first node and another neo4j instance (slave) + another neo4j instance (arbiter, only for arbitration, no data logging) instance on second node.
OR
Setup a cron for online backup using the neo4j-backup tool. Setup incremental backups every hour or so. Not sure the load it may put on the prod server, planning to test that out.
I am more inclined on the first approach since I get a more real time copy the database (I also get HA/load balancing with instant failover but that is not a priority right now).
Please let me know
which of the two approach is better,
if there is another way to achieve the same or
if any of the above approaches are not suitable or have some flaws.
I am a little new to Neo4j HA so please pardon me for my ignorance. Thanks !
So. You already mentioned available solutions.
TL;DR; I prefer first option.
Cluster
In general, recommended layout is 3 nodes (2 slaves + 1 master).
But your layout - 2 nodes (1 master + 1 slave + 1 arbiter) is viable too. Especially if one server can handle your workload.
Good things:
Almost "real-time" replica.
Possibility to utilise resources to handle bigger workload.
Better availability.
Notes:
If you have 10mb/sec write load on master, then same load will be applied on slave node. This shouldn't affect reads from slave at all (except write load is REALLY huge).
Maintenance costs are bigger, then single-instance installation. You should plan how to handle cluster upgrades, configuration updates, plugin updates.
Branched data. In clustered environment there is possibility to end up in "split-brain" scenario, when 2 nodes have different data and decision should be made which data should be kept. Neo4j handles such cases quite good. But you should keep in mind that small data-loss can occur in VERY RARE scenarios.
Backup
Good things:
Simple. Just do backups from database.
Consistency check. When backup is made, tool runs consistency check to verify if database is not damaged. There is no possibility that Backup will screw up live database. If there any issues - you will be notified via logs from backup utility. See below detailed info on to how backup is performed.
Database. Neo4j backup is fully-functional database. You can spin-up server that points to backup database, and do everything you wan't.
Incremental backups. You can do incremental backups as often, as you wan't.
Notes:
Neo4j scales vertically very well (depends on size of database). It can handle huge load on single instance (we had up to 3k requests/second on medium machine). So, you can get one bigger machine for Neo4j server and other smaller (cheaper) for backups.
How backup is performed?
One thing that should be kept in mind - live database is still fully operational. Backup utility doesn't not stop or prevent any actions.
When transaction in database is committed, all changes are appended to transaction log.
When there are no previous backup present: copy whole storage.
When there is previous backup AND transaction logs are available: copy new transaction logs and replay them on to storage.
When there is previous backup AND transactions are NOT available: discard existing storage, copy existing storage.
Why transaction logs can not be available? Your configuration may say to keep only latest transaction logs (i.e. 1 hour), or not to keep at all.
Relevant settings:
keep_logical_logs
logical_log_rotation_threshold
Other
Anyway, you should consider making backups event in clustered environment. Everything can fail, in any moment.
In general - everything depends on your load and database size.
If your database is small enough to fully fit in memory and one machine is enough to handle all load, then one Neo4j instance will be enough. Just do backup.
If you wan't better scalability/availability and real-time working replica, then cluster setup is best choice.

Postgresql replication in rails with data-fabric gem

I am currently setting up a master-slave app using Ruby on Rails. I am planning to use data-fabric or octopus gem for handling the read/write connections.
This is my first time setting up master-slave DBs. I am confused over the various open source tools available to implement the postgresql replication e.g. pgpool II, pgcluster, Bucardo and Hot Standby/Streaming Replication (built in feature in postgresql 9.1)
My requirements are
fault tolerance(high availability and no data loss on failover)
load balancing
Thanks in advance
Note: I have gone through the stackoverflow post regarding postgresql replication but they are pretty old and not helping to conclude on which tool I should go with.
In your case, streaming replication is the place to start. It is not very flexible but it does what you need regarding database reads as long as you don't need to replicate between major versions.
Database Replication 101
Database replication is a way to ensure that data saved to a specific server becomes stored in a number of other servers. This is often done to better utilize more limited network connections, ensure fault tolerance (so there is essentially a hot back-up), ensure that read-only queries can be distributed over a larger number of databases, etc. This all must be done without sacrificing the the basic guarantees of ACID.
There are a number of different overlapping ways to categorize replication solutions. These include:
Page or file-level vs row-level vs statement-level
Synchronous vs Asynchronous
Master-slave vs Multi-Master
In general understanding replication and the tradeoffs between solutions requires relatively strong understanding of database mechanics and ACID guarantees. I will assume you are relatively familiar with storage mechanics, and deterministic vs non-deterministic operations and the like.
What is Being Replicated? File changes (Physical) vs Row Changes (Logical) vs Statements
The simplest approach is to replicate block changes to files, for example as stored in the write-ahead log in PostgreSQL. This replicates changes at the page level and it requires identical file formats. This means you cannot replicate across major versions, CPU architectures, or operating systems. Anything that could affect the alignment of tuples, for example, will cause the replication to either fail or, worse, corrupt the slave's database. This is the approach streaming replication uses. It is simple to set up, and it always replicates everything in the database cluster.
Additionally this approach means you can easily guarantee that the master and slave databases are identical down to the file level. Because of the fact that the PostgreSQL WAL is cluster-global it is unlikely that this approach will ever replicate anything short of the entire database cluster.
As a description of how this works, suppose I:
UPDATE my_table SET rand_value = random() WHERE id > 10000;
In this case, this changes a bunch of data pages and the file operations are replicated to the replicas. The files remain identical between the master and slave.
Another approach, one taken by Slony, Bucardo, and others is to replicate rows in a logical manner. In this approach, changed rows are flagged and logged, and the changes sent to the replicas. The replicas re-run row operations from the master database. Because these are add-on tools which do not replicate file operations but rather logical database operations, they can replicate across CPU architectures, operating systems, etc. Also they are usually designed so that you can replicate some but not all tables in a database, allowing for a lot of flexibility. On the other hand this leads to a lot of potential for errors. "Oops, that table was not replicated" is a real problem.
In this case when I run the update statement above, a trigger is fired capturing the actual rows inserted and deleted and these are logged, replicated, and the row operations re-run. Because this happens after rand() is run, the databases are logically, but not necessarily physically identical.
A final approach is statement replication. In this case we replicate statements and re-run the statements on the replicas. Some configurations of PgPool will do this. In this case, you cannot ensure that a database is logically equivalent to its replica if any non-deterministic functions are run. In the statement above, the statement itself will run on each replica, ensuring different pseudorandom numbers in the relevant column.
Synchronous vs Asynchronous
This distinction is important to understand regarding failover guarantees. In an asynchronous replication system, the updates are queued and transferred when possible to the replicas and re-run there. In a synchronous replication system the database which accepts the write will not return a successful commit until at least a certain number of replica databases report a successful commit.
Asynchronous replication is generally more robust and produces better availability than synchronous replication. This is because synchronous replication introduces additional points of failure. If you have one master and one slave, then if either system goes down, your database becomes unavailable at least for write operations.
The tradeoff though is that synchronous replication offers a guarantee that data which is committed is in fact available on replicas in the event that the master, say, suffers catastrophic hardware failure immediately following commit. This is a very low probability event, but in some cases it is important that you know the data is still available. In short this provides additional durability guarantees not present in async replication.
Multi-Master vs Master-Slave
Most replication systems are master-slave. In this case, all writes begin at one node and are replicated to other nodes. Writes may only begin at one node. They may not begin at other nodes. This makes replication straight-forward because we know that the slaves represent a past state of the master.
Multi-master replication allows writes to occur to more than one node. In an asynchronous replication system, this leads to the problem of conflict resolution. These problems are actually worse than most assume when you add DDL statements. Suppose two different users run the above update statement on two different masters. We will now have a set of records that have to be replicated across but they will conflict.
Multi-master replication typically requires that people think through this conflict resolution process quite carefully. It is never a process that just works out of the box. Often times you write your own conflict resolution routines. For this reason I typically recommend avoiding multi-master replication unless you really need it.

ASP.Net MVC Poor Production Performance

We have a MVC 3 application which has been deployed onto a newly built Windows 2008 R2 Web Edition server which is performing badly.
This application has been through development, quality assurance and user acceptance testing cycles on the same operating system (different boxes) with no performance issues.
The only difference we can see with the server is that it sits in the DMZ and as such has two network adapters configured, one for the internet, and one to punch through the firewall.
We have put all sorts of logging into the application and confirmed that up until the 'return ActionResult' everything is working correctly (ie ~500ms). It then takes 15 seconds to render the page.
We have tried turning on debug=false in the config file, i'm not sure what else to look for here, it seems like an environment issue.
Any suggestions please ? I am about to investigate if the thread pool size could be causing problems.
Also, if it helps the page is using multiple partial views, i have read others having problems with them.
Thanks,
Matt
Since the application performs ok in other environments I would suggest you investigate following:
Database - are you running against different database? How long the queries execute? If you have non-optimized database with million records on production, and only few records in test you want find performance problems soon enough.
Network - what is the latency between web box and database? If you loose 100ms for each database query just because of network than if your page triggers 50 queries you've lost 5secs. I've seen poorly configured routers / load balancers that were doing just that.
Try profiling each component of your system (db, network, web box) in order to find out where you're wasting all that time. Try http://code.google.com/p/mvc-mini-profiler/.
PS. You MUST have debug=false in your prod env.

Resources