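Background
In production we found the Elasticsearch cluster in red status, and JVM heap usage on one ES node kept climbing until even a GC could not free enough memory and the service stopped. The Elasticsearch log showed an OOM error: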
[2023-12-06T08:21:26,706][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-10.136.5.85] fatal error in thread [Thread-1243], exiting
java.lang.OutOfMemoryError: Java heap space
at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:281) ~[?:?]
at io.netty.buffer.PoolArena$HeapArena.newByteArray(PoolArena.java:662) ~[?:?]
at io.netty.buffer.PoolArena$HeapArena.newChunk(PoolArena.java:672) ~[?:?]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247) ~[?:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227) ~[?:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147) ~[?:?]
at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:339) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159) ~[?:?]
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:137) ~[?:?]
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.ioBuffer(NettyAllocator.java:122) ~[?:?]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
[2023-12-06T08:21:26,707][WARN ][o.e.h.AbstractHttpServerTransport] [node-10.136.5.85] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/10.136.5.85:9200, remoteAddress=/10.136.5.71:49648}
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:69) [transport-netty4-client-7.8.1.jar:7.8.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:174) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.lang.OutOfMemoryError: Java heap space
at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:281) ~[?:?]
at io.netty.buffer.PoolArena$HeapArena.newByteArray(PoolArena.java:662) ~[?:?]
at io.netty.buffer.PoolArena$HeapArena.newChunk(PoolArena.java:672) ~[?:?]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247) ~[?:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227) ~[?:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147) ~[?:?]
at io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:339) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:168) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:159) ~[?:?]
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:137) ~[?:?]
at org.elasticsearch.transport.NettyAllocator$NoDirectBuffers.ioBuffer(NettyAllocator.java:122) ~[?:?]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) ~[?:?]
... 7 more
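Searching for other people's experience, I found a thread on the discuss forum with exactly the same error, but it had no answers. There is also a GitHub issue with a similar error, but that one was fixed in 7.4. My version is 7.8.1, so this is not the same problem.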
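First I checked whether the machine itself was short of memory. It was not; what ran out was the heap configured by -Xms and -Xmx in jvm.options. I raised the heap to half of the machine's memory, 30G, and kept watching, but the same problem came back.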
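The heap sizing step above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the 30 GB cap is an assumption standing in for the compressed-oops threshold that Elastic's guidance says the heap should stay below (the exact threshold varies by system).

```python
from __future__ import annotations


def suggest_heap_gb(total_ram_gb: float, cap_gb: int = 30) -> int:
    """Return a heap size in GB: half of physical RAM, capped at cap_gb.

    cap_gb is an assumed stand-in for the compressed-oops threshold.
    """
    half = int(total_ram_gb // 2)
    return max(1, min(half, cap_gb))


def jvm_heap_options(heap_gb: int) -> list[str]:
    """Build the two jvm.options lines; -Xms and -Xmx are set to the same value."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    # 64 GB of RAM is a hypothetical machine size, not taken from the incident.
    for line in jvm_heap_options(suggest_heap_gb(64)):
        print(line)
```

As the incident shows, a larger heap only delays the OOM when the root cause is elsewhere, so this is a sanity check rather than a fix.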
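Looking at the error log, the exception is thrown from netty, so I suspected a problem on the write path.
I took a heap dump to analyze what was holding so much memory.
Concurrent writes used a ScheduledThreadPool and created a large number of threads; GC could not reclaim the memory held by those threads, so the heap ran out.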
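Since writes were the problem, I tested the disk next, using fio to benchmark random read/write.
I used iostat -xdm 10
to watch live disk I/O. Live throughput was about 3 MB/s, while the disk managed 10 MB/s of random read/write, so the disk was not the problem.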
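Next I checked the network. Pinging the problematic machine gave the first clue: it alone was slow with high latency, while every other machine had low latency.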
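The ping comparison can be automated along these lines. This is a minimal sketch, assuming the Linux iputils summary line format ("rtt min/avg/max/mdev = ... ms"); the host names, latency values, and the 3x-median threshold are hypothetical and not taken from the incident.

```python
import re
import statistics

# Matches the "= min/avg/max/mdev ms" part of a ping summary line.
_RTT_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def parse_avg_rtt_ms(ping_output: str) -> float:
    """Extract the average round-trip time in ms from ping's summary output."""
    match = _RTT_RE.search(ping_output)
    if match is None:
        raise ValueError("no rtt summary line found")
    return float(match.group(2))


def slow_hosts(avg_rtt_ms: dict, factor: float = 3.0) -> list:
    """Return hosts whose average RTT exceeds factor times the median RTT."""
    median = statistics.median(avg_rtt_ms.values())
    return sorted(h for h, rtt in avg_rtt_ms.items() if rtt > factor * median)


if __name__ == "__main__":
    # Hypothetical measurements for four nodes.
    sample = {"node-a": 0.40, "node-b": 0.50, "node-c": 4.20, "node-d": 0.45}
    print(slow_hosts(sample))
```

Comparing against the median rather than a fixed number keeps the check usable on networks with different baseline latency.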
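I planned to test the network with iperf, but before running it I learned that the customer had plugged this machine into a 100 Mbit port, while every other machine was on a gigabit port.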
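This fits how the Elasticsearch cluster transport layer works: shards are distributed evenly across all machines, and every machine receives write requests. Data distribution over the cluster network is therefore subject to the bucket effect, where the shortest stave limits the whole barrel: one slow network port slows down the entire cluster (unless shards are explicitly pinned to specific nodes).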
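The bucket effect can be put into rough numbers. The sketch below is a deliberately simplified model and not a description of Elasticsearch internals: it assumes writes are spread perfectly evenly, and it ignores replicas, inter-node forwarding, and protocol overhead. The three-node layout is hypothetical.

```python
def link_mb_per_s(link_mbit: float) -> float:
    """Convert a link speed in Mbit/s to a theoretical MB/s ceiling."""
    return link_mbit / 8.0


def cluster_write_ceiling_mb_per_s(link_speeds_mbit: list) -> float:
    """Upper bound on total write rate when each of N nodes takes 1/N of it.

    The slowest link saturates first, so the total cannot exceed
    N times the capacity of that link.
    """
    n = len(link_speeds_mbit)
    return n * link_mb_per_s(min(link_speeds_mbit))


if __name__ == "__main__":
    # Hypothetical cluster: two gigabit nodes and one 100 Mbit node.
    print(cluster_write_ceiling_mb_per_s([1000, 1000, 100]))   # 37.5
    # Same cluster with every node on gigabit.
    print(cluster_write_ceiling_mb_per_s([1000, 1000, 1000]))  # 375.0
```

Under these assumptions a single 100 Mbit port cuts the theoretical ceiling by a factor of ten, even though the other nodes are untouched.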
Conclusion
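If you see the error above, check the cluster network first and verify whether the write volume exceeds the line rate of the network port. In this case, switching the machine to a gigabit port fixed it.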
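A check like this can be scripted on Linux, where the kernel exposes the negotiated link speed in Mbit/s under /sys/class/net/<iface>/speed. The sketch below is an assumption-laden illustration: the interface name eth0, the 11 MB/s write rate, and the 80% headroom factor are all hypothetical, and the write rate must come from your own monitoring.

```python
from pathlib import Path


def read_link_speed_mbit(iface: str, sysfs_root: str = "/sys/class/net"):
    """Return the negotiated link speed in Mbit/s, or None if unavailable."""
    path = Path(sysfs_root) / iface / "speed"
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        # The file may be missing or unreadable, e.g. when the link is down.
        return None
    # The kernel reports -1 when the speed is unknown.
    return value if value > 0 else None


def exceeds_link(write_mb_per_s: float, link_mbit: float, headroom: float = 0.8) -> bool:
    """True if the write rate is above the assumed usable share of the link."""
    return write_mb_per_s > (link_mbit / 8.0) * headroom


if __name__ == "__main__":
    speed = read_link_speed_mbit("eth0")  # hypothetical interface name
    if speed is None:
        print("link speed unavailable")
    else:
        print(speed, exceeds_link(11.0, speed))  # 11 MB/s is a made-up rate
```

Running it on every node and comparing the reported speeds would have exposed the 100 Mbit port directly.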