# Kubernetes ZooKeeper
This project contains tools to facilitate the deployment of
Apache ZooKeeper on
Kubernetes using
StatefulSets.
It requires Kubernetes 1.7 or greater.
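You can confirm the prerequisite before deploying; a minimal check, assuming kubectl is configured against your cluster:

```sh
# The reported Server Version must be v1.7 or greater.
kubectl version --short
```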
## Limitations

- Scaling is not currently supported. An ensemble's membership cannot be updated in a safe way in ZooKeeper 3.4.10 (the current stable release).
- Observers are currently not supported. Contributions are welcome.
- Persistent Volumes must be used. emptyDirs will likely result in a loss of data.
## ZooKeeper Docker Image
The docker directory contains the Makefile for a Docker image that
runs a ZooKeeper server using some custom scripts.
## Manifests

The manifests directory contains several Kubernetes manifests that can be used for demonstration purposes or production deployments. If you deploy manifests directly, you can modify any of these to fit your use case.
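For example, you could create an ensemble directly from a manifest; a sketch, where the file name manifests/zookeeper.yaml is an assumption:

```sh
# Create the ensemble from a manifest (file name is an assumption).
kubectl apply -f manifests/zookeeper.yaml

# Watch the ensemble's Pods until all replicas are Running and Ready.
kubectl get pods -w -l app=zk
```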
## Helm

The helm directory contains a Helm chart that deploys a ZooKeeper ensemble.
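A sketch of installing the chart with Helm 2 era syntax, where the chart path helm/zookeeper and the release name zk are assumptions:

```sh
# Install the chart from the local repository checkout.
helm install --name zk helm/zookeeper
```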
## Administration and Configuration

Regardless of whether you use manifests or Helm to deploy your ZooKeeper ensemble, there are some common administration and configuration items that you should be aware of.
### Ensemble Size

As noted in the Limitations section, ZooKeeper membership can't be dynamically configured using the latest stable version. You need to select an ensemble size that suits your use case. For demonstration purposes, or if you are willing to tolerate at most one planned or unplanned failure, you should select an ensemble size of 3. This is done by setting the spec.replicas field of the StatefulSet to 3,
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-hs
  replicas: 3
  ...
```
and passing 3 as the --servers parameter to the start-zookeeper script.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  ...
        command:
        - sh
        - -c
        - "start-zookeeper --servers=3 ..."
```
For production use cases, 5 servers may be desirable. This allows you to tolerate one planned and one unplanned
failure.
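Scaling that example to a production ensemble only means changing the two values in tandem; a sketch based on the fields above:

```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-hs
  replicas: 5
  ...
        command:
        - sh
        - -c
        - "start-zookeeper --servers=5 ..."
```

A 5-server ensemble has a quorum of 3, so it remains available with any two servers down, for example one Pod drained for node maintenance while another has crashed.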
### Memory

While ZooKeeper periodically snapshots all of its data to its data directory, the entire working data set must fit in heap. The --heap parameter of the start-zookeeper script controls the heap size of the ZooKeeper servers,
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
  ...
        command:
        - sh
        - -c
        - "start-zookeeper ... --heap=512M ..."
```
and the spec.template.spec.containers[0].resources.requests.memory field controls the memory allocated to the JVM process.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  ...
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
        resources:
          requests:
            memory: "1Gi"
  ...
```
You should probably not use heap sizes larger than 8 GiB. For production deployments you should consider setting the
requested memory to the maximum of 2 GiB and 1.5 times the size of the configured JVM heap.
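As a worked example of that rule, a 512 MiB heap gives max(2 GiB, 1.5 × 0.5 GiB) = 2 GiB, while a 4 GiB heap gives max(2 GiB, 6 GiB) = 6 GiB. A sketch of the latter, where the 4G value follows the --heap=512M format shown above:

```yaml
        command:
        - sh
        - -c
        - "start-zookeeper ... --heap=4G ..."
        resources:
          requests:
            # max(2Gi, 1.5 * 4Gi) = 6Gi
            memory: "6Gi"
```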
### CPUs

ZooKeeper is not a CPU intensive application. For a production deployment you should start with 2 CPUs and adjust as necessary. For a demonstration deployment, you can set the CPUs as low as 0.5. The amount of CPU is configured by setting the StatefulSet's spec.template.spec.containers[0].resources.requests.cpu field.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  ...
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
  ...
```
### Networking

The Headless Service that controls the domain of the ensemble must have two ports. The server port is used for inter-server communication, and the leader-election port is used to perform leader election.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
```
These ports must correspond to the container ports in the StatefulSet’s .spec.template
and the parameters passed to
the start-zookeeper
script.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-hs
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  template:
    ...
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper ... --election_port=3888 --server_port=2888 ..."
```
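Because zk-hs is headless (clusterIP: None), each Pod gets a stable DNS A record of the form <pod-name>.zk-hs.<namespace>.svc.cluster.local, and these are the names the servers use to reach each other on the server and leader-election ports. A quick way to check the records, assuming the default namespace and a throwaway busybox Pod (image choice is an assumption):

```sh
# Resolve the stable network identity of the second ensemble member.
kubectl run -i --tty --rm dns-test --image=busybox --restart=Never -- \
  nslookup zk-1.zk-hs.default.svc.cluster.local
```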
The Service used to load balance client connections has one port.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
```
The client port must correspond to the container port specified in the StatefulSet's .spec.template and the --client_port parameter passed to the start-zookeeper script.
```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-hs
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  template:
    ...
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper ... --client_port=2181 ..."
```
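To smoke test client connectivity through the zk-cs Service, you can use the zkCli.sh shell that ships in the image; a sketch, writing a znode and reading it back:

```sh
# Write a value through the client Service, then read it back.
kubectl exec zk-0 -- zkCli.sh -server zk-cs:2181 create /hello world
kubectl exec zk-0 -- zkCli.sh -server zk-cs:2181 get /hello
```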
### Storage

Currently, the use of Persistent Volumes to provide durable, network attached storage is mandatory. If you use the provided image with emptyDirs, you will likely suffer data loss. The storage field of the StatefulSet's spec.volumeClaimTemplates controls the amount of storage allocated.
```yaml
volumeClaimTemplates:
- metadata:
    name: datadir
  spec:
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 20Gi
```
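If your cluster has no default StorageClass, or you want a specific one, you can name it in the claim template; a sketch, where the class name standard is an assumption about your environment:

```yaml
volumeClaimTemplates:
- metadata:
    name: datadir
  spec:
    accessModes: [ "ReadWriteOnce" ]
    # Assumed class name; substitute one that exists in your cluster.
    storageClassName: standard
    resources:
      requests:
        storage: 20Gi
```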
The volumeMounts
in the StatefulSet’s spec.template
control the mount point of the PersistentVolumes requested by
the PersistentVolumeClaims,
```yaml
volumeMounts:
- name: datadir
  mountPath: /var/lib/zookeeper
```
and the parameters passed to the start-zookeeper
script instruct the ZooKeeper server to use the PersistentVolume
backed directory for its snapshots and write ahead log.
```
--data_dir       The directory where the ZooKeeper process will store its
                 snapshots. The default is /var/lib/zookeeper/data. This
                 directory must be backed by a persistent volume.

--data_log_dir   The directory where the ZooKeeper process will store its
                 write ahead log. The default is
                 /var/lib/zookeeper/data/log. This directory must be
                 backed by a persistent volume.
```
Note that, because we use network attached storage, there is no benefit to using multiple PersistentVolumes to segregate the snapshot and write ahead log onto separate storage media.
### ZooKeeper Time

ZooKeeper does not use wall clock time. Rather, it uses internal ticks that are based on an elapsed number of milliseconds. The various timeouts for the ZooKeeper ensemble can be controlled by the parameters passed to the start-zookeeper script.
```
--tick_time             The length of a ZooKeeper tick in ms. The default
                        is 2000.

--init_limit            The number of ticks that an ensemble member is
                        allowed to take to connect to and sync with the
                        leader. The default is 10.

--sync_limit            The number of ticks that a follower is allowed to
                        lag behind the leader. The default is 5.

--max_session_timeout   The maximum time in milliseconds for a client
                        session timeout. The default value is 20 * tick
                        time.

--min_session_timeout   The...
```
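To make the tick arithmetic concrete, with the defaults above (tick_time=2000, init_limit=10, sync_limit=5) followers get 10 × 2000 = 20,000 ms to connect to and sync with a leader, may lag by at most 5 × 2000 = 10,000 ms, and clients may negotiate session timeouts up to 20 × 2000 = 40,000 ms. A sketch of setting these explicitly:

```yaml
        command:
        - sh
        - -c
        - "start-zookeeper --servers=3 --tick_time=2000 --init_limit=10 --sync_limit=5 ..."
```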