Apache Kafka Performance Testing | Kafka Performance Optimization | Kafka Performance Tuning

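Cloud Insight is a data management platform that supports Kafka monitoring. It aggregates, filters, and groups data so that users can see how Kafka is running across the whole cluster and make decisions quickly.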


Supported Metrics

By default, Cloud Insight's Kafka monitoring collects the following performance metrics:

kafka.messages_in
kafka.net.bytes_in
kafka.net.bytes_out
kafka.net.bytes_rejected
kafka.replication.isr_expands
kafka.replication.isr_shrinks
kafka.replication.leader_elections
kafka.replication.unclean_leader_elections
kafka.request.fetch.failed
kafka.request.fetch.time.99percentile
kafka.request.fetch.time.avg
kafka.request.handler.avg.idle.pct
kafka.request.metadata.time.99percentile
kafka.request.metadata.time.avg
kafka.request.offsets.time.99percentile
kafka.request.offsets.time.avg
kafka.request.produce.failed
kafka.request.produce.time.99percentile
kafka.request.produce.time.avg
kafka.request.update_metadata.time.99percentile
kafka.request.update_metadata.time.avg
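Each request type above is reported both as an average (.time.avg) and as a 99th percentile (.time.99percentile). The two can diverge sharply: a few slow requests barely move the mean but dominate the tail. The Python sketch below illustrates this on a made-up list of request times using the simple nearest-rank percentile. The broker itself derives these values from a sampled histogram in its metrics library, so real readings will not follow this exact method.

```python
import math


def mean(values):
    """Arithmetic mean: the idea behind the *.time.avg metrics."""
    return sum(values) / len(values)


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical produce-request times in milliseconds: 98 fast, 2 slow.
times_ms = [2.0] * 98 + [250.0, 400.0]

print(f"avg = {mean(times_ms):.2f} ms")             # avg = 8.46 ms
print(f"p99 = {percentile(times_ms, 99):.2f} ms")   # p99 = 250.00 ms
```

A p99 that is far above the average, as here, usually points to occasional slow requests rather than a uniformly slow broker, which is why both values are collected.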

Easy Installation

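Visualizing Kafka performance usually means building your own operations system, for example a monitoring platform assembled from open-source tools such as Zabbix. That typically involves a great deal of work and tedious debugging.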

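Alerting, metric computation, aggregation of data across hosts, and visualization of custom metrics each require integrating yet more open-source tools, which adds further time and labor costs.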

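Installing the Cloud Insight agent takes a single command, and Puppet support is provided for deploying the agent in bulk. Monitoring Kafka then only requires enabling the Kafka configuration file. The process is straightforward.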

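In addition, Cloud Insight collects and uploads data automatically, offers rich visualizations, and sends alerts through multiple channels, sparing you the trouble of building and maintaining your own monitoring system.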

Data Management

Cloud Insight's data management features aggregate, filter, and group Kafka performance metrics collected from the different hosts in a cluster.

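With a simple metric query, you can quickly see the maximum, average, and minimum Kafka performance values for hosts that belong to different functional modules, regions, or network segments, making operations work simpler and more agile.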
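To make the aggregation concrete, here is a minimal Python sketch of what such a grouped query computes: filter metric samples by a tag, group them by another tag, and report the maximum, average, and minimum per group. The tag names (role, region), host names, and values are hypothetical, and this is not Cloud Insight's query API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-host samples of kafka.messages_in, each tagged with
# the host's functional role and region.
samples = [
    {"host": "kafka-01", "role": "orders", "region": "east", "value": 1200.0},
    {"host": "kafka-02", "role": "orders", "region": "east", "value": 800.0},
    {"host": "kafka-03", "role": "orders", "region": "north", "value": 950.0},
    {"host": "kafka-04", "role": "logs", "region": "east", "value": 4000.0},
]


def aggregate(samples, group_by, **filters):
    """Keep samples whose tags match every filter exactly, group them by
    the tag named in group_by, and return {group: (max, avg, min)}."""
    groups = defaultdict(list)
    for s in samples:
        if all(s.get(k) == v for k, v in filters.items()):
            groups[s[group_by]].append(s["value"])
    return {g: (max(v), mean(v), min(v)) for g, v in groups.items()}


# Only the "orders" hosts, grouped by region.
print(aggregate(samples, "region", role="orders"))
# {'east': (1200.0, 1000.0, 800.0), 'north': (950.0, 950.0, 950.0)}
```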

Enabling Kafka Monitoring

1. JMX

The OneAPM Cloud Insight Agent collects Kafka performance metrics through JMX.
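For the agent to connect, the broker must expose a remote JMX port. Kafka's startup scripts read the JMX_PORT environment variable, so starting a broker with, for example, JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties opens port 9999, matching the port value in the sample kafka.yaml configuration. The Python sketch below is a quick pre-flight check under that assumption: it builds the standard JMX-over-RMI service URL (the form JConsole accepts for remote connections) and tests whether the port accepts TCP connections. A successful TCP connection does not prove that JMX authentication or SSL settings are correct.

```python
import socket


def jmx_service_url(host: str, port: int) -> str:
    """Standard JMX-over-RMI service URL for a remote JMX port."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only checks reachability of the RMI registry port; it does not
    perform a JMX handshake or check credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # host and port as in the instances section of conf.d/kafka.yaml
    host, port = "localhost", 9999
    print(jmx_service_url(host, port))
    print("JMX port reachable:", port_is_open(host, port))
```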

Each instance can report at most 350 performance metrics, so edit the configuration file as described below to select the metrics you need.

2. Edit the configuration file

Edit the configuration file conf.d/kafka.yaml so that the Cloud Insight Agent can communicate with Kafka.


    # WARNING: This sample works only for Kafka >= 0.8.2.
    # NOTE: the bean names below must match the MBeans your broker actually
    # exposes, and Kafka renamed several of them between versions. For
    # example, Kafka 0.8.2 and later register
    # kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec rather than
    # ...name=AllTopicsBytesOutPerSec. Check the names with JConsole first.

    instances:
      - host: localhost
        port: 9999
        name: jmx_instance
        user: username
        password: password
        # java_bin_path: /path/to/java  # Optional, should be set if the agent cannot find your java executable
        # trust_store_path: /path/to/trustStore.jks  # Optional, should be set if SSL is enabled
        # trust_store_password: password

    init_config:
      is_jmx: true

      # Metrics collected by this check. You should not have to modify this.
      conf:
        #
        # Aggregate cluster stats
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.net.bytes_out
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesInPerSec'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.net.bytes_in
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec'
            attribute:
              MeanRate:
                metric_type: gauge
                alias: kafka.messages_in

        #
        # Request timings
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedFetchRequestsPerSec'
            attribute:
              MeanRate:
                metric_type: gauge
                alias: kafka.request.fetch.failed
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsFailedProduceRequestsPerSec'
            attribute:
              MeanRate:
                metric_type: gauge
                alias: kafka.request.produce.failed
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs'
            attribute:
              Mean:
                metric_type: counter
                alias: kafka.request.produce.time.avg
              99thPercentile:
                metric_type: counter
                alias: kafka.request.produce.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=Fetch-TotalTimeMs'
            attribute:
              Mean:
                metric_type: counter
                alias: kafka.request.fetch.time.avg
              99thPercentile:
                metric_type: counter
                alias: kafka.request.fetch.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=UpdateMetadata-TotalTimeMs'
            attribute:
              Mean:
                metric_type: counter
                alias: kafka.request.update_metadata.time.avg
              99thPercentile:
                metric_type: counter
                alias: kafka.request.update_metadata.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=Metadata-TotalTimeMs'
            attribute:
              Mean:
                metric_type: counter
                alias: kafka.request.metadata.time.avg
              99thPercentile:
                metric_type: counter
                alias: kafka.request.metadata.time.99percentile
        - include:
            domain: 'kafka.network'
            bean: 'kafka.network:type=RequestMetrics,name=Offsets-TotalTimeMs'
            attribute:
              Mean:
                metric_type: counter
                alias: kafka.request.offsets.time.avg
              99thPercentile:
                metric_type: counter
                alias: kafka.request.offsets.time.99percentile

        #
        # Replication stats
        #
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=ISRShrinksPerSec'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.replication.isr_shrinks
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ReplicaManager,name=ISRExpandsPerSec'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.replication.isr_expands
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ControllerStats,name=LeaderElectionRateAndTimeMs'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.replication.leader_elections
        - include:
            domain: 'kafka.server'
            bean: 'kafka.server:type=ControllerStats,name=UncleanLeaderElectionsPerSec'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.replication.unclean_leader_elections

        #
        # Log flush stats
        #
        - include:
            domain: 'kafka.log'
            bean: 'kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs'
            attribute:
              MeanRate:
                metric_type: counter
                alias: kafka.log.flush_rate
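Because of the 350-metric limit per instance, it helps to know how many metrics a conf section will produce. In this sample, each entry under an attribute map becomes one metric named by its alias. The Python sketch below mirrors a few entries of the kafka.yaml sample as plain dictionaries, the shape a YAML parser such as PyYAML (a third-party package) would return, and flattens them into (bean, attribute, alias, metric_type) rows. Counting one metric per attribute is an assumption that holds here because every bean name is fixed; a bean pattern that matched several MBeans would yield more.

```python
from typing import Iterator, NamedTuple

MAX_METRICS_PER_INSTANCE = 350  # limit stated in this documentation


class MetricSpec(NamedTuple):
    bean: str
    attribute: str
    alias: str
    metric_type: str


# A few entries from the conf section of conf.d/kafka.yaml, as parsed YAML.
conf = [
    {"include": {
        "domain": "kafka.server",
        "bean": "kafka.server:type=BrokerTopicMetrics,name=AllTopicsBytesOutPerSec",
        "attribute": {
            "MeanRate": {"metric_type": "counter", "alias": "kafka.net.bytes_out"},
        },
    }},
    {"include": {
        "domain": "kafka.network",
        "bean": "kafka.network:type=RequestMetrics,name=Produce-TotalTimeMs",
        "attribute": {
            "Mean": {"metric_type": "counter",
                     "alias": "kafka.request.produce.time.avg"},
            "99thPercentile": {"metric_type": "counter",
                               "alias": "kafka.request.produce.time.99percentile"},
        },
    }},
]


def flatten(conf) -> Iterator[MetricSpec]:
    """Yield one MetricSpec per attribute listed under each include."""
    for entry in conf:
        include = entry["include"]
        for attr, spec in include.get("attribute", {}).items():
            yield MetricSpec(include["bean"], attr, spec["alias"], spec["metric_type"])


specs = list(flatten(conf))
for s in specs:
    print(f"{s.alias:45} <- {s.bean} [{s.attribute}]")
print(f"{len(specs)} metrics configured (limit {MAX_METRICS_PER_INSTANCE})")

# Aliases should be unique, and the total must stay within the limit.
assert len({s.alias for s in specs}) == len(specs), "duplicate alias"
assert len(specs) <= MAX_METRICS_PER_INSTANCE, "too many metrics for one instance"
```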

3. Edit the Consumer configuration file

Edit the Consumer configuration file conf.d/kafka_consumer.yaml.


    init_config:

    instances:
      - kafka_connect_str: localhost:19092
        zk_connect_str: localhost:2181
        zk_prefix: /0.8
        consumer_groups:
          my_consumer:
            my_topic: [0, 1, 4, 12]
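The consumer_groups map nests consumer group, then topic, then a list of partition IDs. For consumers that commit offsets to ZooKeeper (the Kafka 0.8 layout), each committed offset is stored at /consumers/<group>/offsets/<topic>/<partition>. The Python sketch below expands the sample mapping into those paths, assuming zk_prefix is the ZooKeeper chroot under which that tree lives, and shows the usual consumer-lag formula: the partition's latest offset on the broker minus the group's committed offset. The offsets are made up, and it is an assumption, not something this page states, that the kafka_consumer check reports lag computed this way.

```python
from typing import Dict, List, Tuple

# The relevant parts of conf.d/kafka_consumer.yaml, as parsed YAML.
zk_prefix = "/0.8"
consumer_groups: Dict[str, Dict[str, List[int]]] = {
    "my_consumer": {"my_topic": [0, 1, 4, 12]},
}


def offset_paths(prefix: str, groups) -> List[Tuple[str, str, int, str]]:
    """Expand group -> topic -> partitions into ZooKeeper offset paths
    using the Kafka 0.8 layout /consumers/<group>/offsets/<topic>/<partition>."""
    prefix = prefix.rstrip("/")
    rows = []
    for group, topics in groups.items():
        for topic, partitions in topics.items():
            for p in partitions:
                path = f"{prefix}/consumers/{group}/offsets/{topic}/{p}"
                rows.append((group, topic, p, path))
    return rows


def consumer_lag(broker_offset: int, consumer_offset: int) -> int:
    """Messages written to the partition but not yet consumed by the group.
    Clamped at 0 in case the two offsets are sampled at slightly different times."""
    return max(broker_offset - consumer_offset, 0)


for group, topic, partition, path in offset_paths(zk_prefix, consumer_groups):
    print(path)

# Hypothetical offsets for my_topic partition 0.
print(consumer_lag(broker_offset=10_500, consumer_offset=10_200))  # 300
```

Steadily growing lag for a group means its consumers are falling behind the producers on those partitions.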