<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Reason's 知识铺</title>
<subtitle>Reason's 知识铺</subtitle>
<link href="https://reason94.github.io/atom.xml" rel="self"/>
<link href="https://reason94.github.io/"/>
<updated>2020-11-18T15:06:36.907Z</updated>
<id>https://reason94.github.io/</id>
<author>
<name>[object Object]</name>
</author>
<generator uri="https://hexo.io/">Hexo</generator>
<entry>
<title>What Does Netty's Graceful Shutdown Actually Do?</title>
<link href="https://reason94.github.io/2020/11/15/netty-gracefullyshutdown-1/"/>
<id>https://reason94.github.io/2020/11/15/netty-gracefullyshutdown-1/</id>
<published>2020-11-15T10:32:01.000Z</published>
<updated>2020-11-18T15:06:36.907Z</updated>
<content type="html"><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h1><p>We run an RPC framework built on Netty, and its Server always loses some requests ("pv lost") when it exits. That was puzzling: the Server calls Netty's graceful shutdown on exit, so why does it not have the expected effect?</p><p>Answering that calls for a detailed read of the source code and a walk through the flow of Netty's graceful shutdown.</p><blockquote><p>This article focuses on the source code of Netty's graceful shutdown flow. The multithreading, concurrency, and networking topics that the code touches on are not introduced or explored in depth.</p></blockquote><h1 id="Netty优雅退出介绍"><a href="#Netty优雅退出介绍" class="headerlink" title="Netty优雅退出介绍"></a>Introduction to Netty's graceful shutdown</h1><p>Graceful shutdown is built into Netty. It provides a comparatively gentle way to shut down that reduces communication loss during shutdown "as far as possible".</p><p>The public interface and main entry point is EventLoopGroup. Calling its shutdownGracefully method is all that is required, so the interface is clear and convenient.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">EventLoopGroup bossGroup;</span><br><span class="line">EventLoopGroup workerGroup;</span><br><span class="line">bossGroup.shutdownGracefully();</span><br><span class="line">workerGroup.shutdownGracefully();</span><br></pre></td></tr></table></figure><p>Besides the no-argument overload, there is also an overload that takes parameters. Its interface definition is as follows:<br><img src="/images/netty-gracefullyshutdown-intreface.png" alt="netty gracefullyShutdown 接口定义"><br>quietPeriod is the quiet period; what exactly that means is left for the analysis below. timeout is the maximum time to wait for the graceful shutdown. Once it has elapsed, the executor is forced to exit whether or not tasks are still running.</p><h1 id="Netty优雅关闭分析"><a href="#Netty优雅关闭分析" class="headerlink" title="Netty优雅关闭分析"></a>Analysis of Netty's graceful shutdown</h1><h2 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source code analysis</h2><h3 id="NioEventLoopGroup"><a href="#NioEventLoopGroup" class="headerlink" title="NioEventLoopGroup"></a>NioEventLoopGroup</h3><p>The analysis starts from EventLoopGroup's shutdownGracefully method. Taking NioEventLoopGroup as the example, the implementation is as follows:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: MultithreadEventExecutorGroup#shutdownGracefully</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> Future&lt;?&gt; shutdownGracefully(<span class="keyword">long</span> quietPeriod, <span class="keyword">long</span> timeout, TimeUnit unit) {</span><br><span class="line">    <span class="keyword">for</span> (EventExecutor l: children) { <span class="comment">// Iterate over every child (an EventLoop) and call its shutdownGracefully method</span></span><br><span class="line">        l.shutdownGracefully(quietPeriod, timeout, unit);</span><br><span class="line">    }</span><br><span class="line">    <span class="keyword">return</span> terminationFuture();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The logic is easy to follow: iterate over every child EventExecutor, that is, every EventLoop, and call its shutdownGracefully method.</p><h3 id="NioEventLoop"><a href="#NioEventLoop" class="headerlink" title="NioEventLoop"></a>NioEventLoop</h3><p>Before reading this part of the source, it helps to recall how Netty's Channel, EventLoop, and EventLoopGroup relate to one another, since that makes the shutdown logic easier to follow.</p><blockquote><p>An EventLoopGroup manages multiple EventLoops. An EventLoop owns and manages one Thread and can simply be thought of as representing that Thread. A Channel is bound to one EventLoop, and every stage of its lifecycle is handled by that EventLoop. One EventLoop can be shared by many Channels.</p></blockquote><p>NioEventLoop's shutdownGracefully lives in its parent class SingleThreadEventExecutor. The focus here is the shutdown flow, so only the code related to it is kept.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: SingleThreadEventExecutor#shutdownGracefully</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> Future&lt;?&gt; shutdownGracefully(<span class="keyword">long</span> quietPeriod, <span class="keyword">long</span> timeout, TimeUnit unit) {</span><br><span class="line">    <span class="comment">// omitted</span></span><br><span class="line">    <span class="keyword">if</span> (isShuttingDown()) { <span class="comment">// (1)</span></span><br><span class="line">        <span class="keyword">return</span> terminationFuture();</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">boolean</span> inEventLoop = inEventLoop();</span><br><span class="line">    <span class="keyword">boolean</span> wakeup;</span><br><span class="line">    <span class="keyword">int</span> oldState;</span><br><span class="line">    <span class="keyword">for</span> (;;) { <span class="comment">// (2) spin loop that updates the state</span></span><br><span class="line">        <span class="keyword">if</span> (isShuttingDown()) {</span><br><span class="line">            <span class="keyword">return</span> terminationFuture();</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">int</span> 
newState;</span><br><span class="line">        wakeup = <span class="keyword">true</span>;</span><br><span class="line">        oldState = state;</span><br><span class="line">        <span class="keyword">if</span> (inEventLoop) { <span class="comment">// whether the calling thread is the thread that belongs to this instance</span></span><br><span class="line">            newState = ST_SHUTTING_DOWN; <span class="comment">// (3) set the state to ST_SHUTTING_DOWN</span></span><br><span class="line">        } <span class="keyword">else</span> {</span><br><span class="line">            <span class="keyword">switch</span> (oldState) {</span><br><span class="line">                <span class="keyword">case</span> ST_NOT_STARTED:</span><br><span class="line">                <span class="keyword">case</span> ST_STARTED:</span><br><span class="line">                    newState = ST_SHUTTING_DOWN;</span><br><span class="line">                    <span class="keyword">break</span>;</span><br><span class="line">                <span class="keyword">default</span>:</span><br><span class="line">                    newState = oldState;</span><br><span class="line">                    wakeup = <span class="keyword">false</span>;</span><br><span class="line">            }</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">if</span> (STATE_UPDATER.compareAndSet(<span class="keyword">this</span>, oldState, newState)) {</span><br><span class="line">            <span class="keyword">break</span>;</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">    gracefulShutdownQuietPeriod = unit.toNanos(quietPeriod); <span class="comment">// (4)</span></span><br><span class="line">    gracefulShutdownTimeout = unit.toNanos(timeout);</span><br><span class="line">    <span class="comment">// omitted</span></span><br><span class="line">    <span class="keyword">return</span> terminationFuture();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Overall, this code changes state to ST_SHUTTING_DOWN (shutting down) and returns once the change has been made.</p><p>(1) Guard against changing the state more than once: by Netty's design a NioEventLoop can be obtained from many places (the program's bootstrap class, a Channel, a ChannelHandlerContext, and so on), at many points in time, and from many threads, so this check prevents repeated modification.</p><p>(2) 
Update the state with a spin loop plus CAS on a volatile field, which makes the change atomic and visible: for the reason given in (1), the state change needs concurrency control, and Netty uses the lightweight spin-plus-CAS approach for it.</p><p>(3) Set the state to ST_SHUTTING_DOWN: the whole purpose of this method is to move the volatile state field to the shutting-down state.</p><p>(4) Set the quiet period and the shutdown timeout.</p><p>The logic of this method is also fairly simple. Its essential job is to set the state to ST_SHUTTING_DOWN and return. Can flipping a flag really implement graceful shutdown? It is unlikely to be that simple.</p><p>A state flag is normally introduced to control behavior. Searching the NioEventLoop source for uses of state leads to the logic that actually performs the graceful shutdown.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: NioEventLoop#run</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</span><br><span class="line">    <span class="keyword">int</span> selectCnt = <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">for</span> (;;) {</span><br><span class="line">        <span class="keyword">try</span> {</span><br><span class="line">            <span class="comment">// omitted: the core EventLoop logic, which listens for network events and dispatches them</span></span><br><span class="line">        } <span class="keyword">catch</span> (CancelledKeyException e) {</span><br><span class="line">            <span class="comment">// ...</span></span><br><span class="line">        } <span 
class="keyword">catch</span> (Throwable t) {</span><br><span class="line">            <span class="comment">// ...</span></span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        <span class="comment">// Always handle shutdown even if the loop processing threw an exception. (0) every iteration checks for a change in the shutdown state</span></span><br><span class="line">        <span class="keyword">try</span> {</span><br><span class="line">            <span class="keyword">if</span> (isShuttingDown()) {</span><br><span class="line">                closeAll(); <span class="comment">// (1) close every Channel associated with this EventLoop</span></span><br><span class="line">                <span class="keyword">if</span> (confirmShutdown()) { <span class="comment">// (2) final confirmation before exiting</span></span><br><span class="line">                    <span class="keyword">return</span>;</span><br><span class="line">                }</span><br><span class="line">            }</span><br><span class="line">        } <span class="keyword">catch</span> (Throwable t) {</span><br><span class="line">            handleLoopException(t);</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>(0) NioEventLoop's run method checks the shutdown state once at the end of every loop iteration.</p><p>(1) When it sees that the state has changed to ST_SHUTTING_DOWN, it closes every Channel associated with the EventLoop.</p><p>(2) After the Channels are closed it performs the final pre-exit confirmation. This method runs user-defined Tasks and shutdown hooks, and it uses the quietPeriod and timeout parameters to decide when to exit.</p><p>closeAll first collects the channels associated with this EventLoop and then loops over them, calling close on each one. The implementation is simple and needs no further explanation.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: NioEventLoop#closeAll</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">closeAll</span><span class="params">()</span> </span>{</span><br><span class="line">    selectAgain(); <span class="comment">// run one more select first to see whether any watched events have occurred</span></span><br><span class="line">    Set&lt;SelectionKey&gt; keys = selector.keys();</span><br><span class="line">    Collection&lt;AbstractNioChannel&gt; channels = <span class="keyword">new</span> ArrayList&lt;AbstractNioChannel&gt;(keys.size());</span><br><span class="line">    <span class="keyword">for</span> (SelectionKey k: keys) {</span><br><span class="line">        Object a = k.attachment();</span><br><span class="line">        <span class="keyword">if</span> (a <span class="keyword">instanceof</span> AbstractNioChannel) {</span><br><span class="line">            channels.add((AbstractNioChannel) a); <span class="comment">// collect every Channel registered with this selector</span></span><br><span class="line">        } <span class="keyword">else</span> {</span><br><span class="line">            <span class="comment">// ...</span></span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> (AbstractNioChannel ch: channels) {</span><br><span class="line">        ch.unsafe().close(ch.unsafe().voidPromise()); <span class="comment">// close the Netty Channel; the network connection is torn down here</span></span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="NioSocketChannel-Unsafe"><a href="#NioSocketChannel-Unsafe" class="headerlink" 
title="NioSocketChannel.Unsafe"></a>NioSocketChannel.Unsafe</h3><p>In the graceful shutdown flow, a channel is not closed by calling the channel's close method and letting the ChannelPipeline invoke Unsafe#close. Unsafe#close is called directly instead. Why it is done this way is left as an open question here.</p><p>First, the source of Unsafe#close:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span 
class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: AbstractChannel.AbstractUnsafe#close()</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">close</span><span class="params">(<span class="keyword">final</span> ChannelPromise promise, <span class="keyword">final</span> Throwable cause,</span></span></span><br><span class="line"><span class="function"><span class="params">                   <span class="keyword">final</span> ClosedChannelException closeCause, <span class="keyword">final</span> <span class="keyword">boolean</span> notify)</span> </span>{</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">    closeInitiated = <span class="keyword">true</span>;</span><br><span class="line">    <span class="keyword">final</span> <span class="keyword">boolean</span> wasActive = isActive();</span><br><span class="line"></span><br><span class="line">    <span class="keyword">final</span> ChannelOutboundBuffer outboundBuffer = <span class="keyword">this</span>.outboundBuffer; <span class="comment">// save a reference</span></span><br><span class="line">    <span class="keyword">this</span>.outboundBuffer = <span class="keyword">null</span>; <span class="comment">// Disallow adding any messages and flushes to outboundBuffer. 
(1) no new flush requests are accepted from here on</span></span><br><span class="line"></span><br><span class="line">    Executor closeExecutor = prepareToClose(); <span class="comment">// (2) preparation before the channel is closed</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (closeExecutor != <span class="keyword">null</span>) { <span class="comment">// (3) SO_LINGER &gt; 0 is configured: the channel close and the event notifications are scheduled to run later</span></span><br><span class="line">        closeExecutor.execute(<span class="keyword">new</span> Runnable() {</span><br><span class="line">            <span class="meta">@Override</span></span><br><span class="line">            <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</span><br><span class="line">                <span class="keyword">try</span> {</span><br><span class="line">                    <span class="comment">// Execute the close.</span></span><br><span class="line">                    doClose0(promise); <span class="comment">// (5) close the connection</span></span><br><span class="line">                } <span class="keyword">finally</span> {</span><br><span class="line">                    <span class="comment">// Call invokeLater so closeAndDeregister is executed in the EventLoop again!</span></span><br><span class="line">                    invokeLater(<span class="keyword">new</span> Runnable() {</span><br><span class="line">                        <span class="meta">@Override</span></span><br><span class="line">                        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</span><br><span class="line">                            <span class="keyword">if</span> (outboundBuffer != <span class="keyword">null</span>) {</span><br><span class="line">                                <span class="comment">// Fail all the queued messages</span></span><br><span class="line">                                outboundBuffer.failFlushed(cause, notify);</span><br><span class="line">                                outboundBuffer.close(closeCause);</span><br><span class="line">                            }</span><br><span class="line">                            
fireChannelInactiveAndDeregister(wasActive);</span><br><span class="line">                        }</span><br><span class="line">                    });</span><br><span class="line">                }</span><br><span class="line">            }</span><br><span class="line">        });</span><br><span class="line">    } <span class="keyword">else</span> { <span class="comment">// (4) SO_LINGER is not configured: close directly</span></span><br><span class="line">        <span class="keyword">try</span> {</span><br><span class="line">            <span class="comment">// Close the channel and fail the queued messages in all cases.</span></span><br><span class="line">            doClose0(promise);</span><br><span class="line">        } <span class="keyword">finally</span> {</span><br><span class="line">            <span class="keyword">if</span> (outboundBuffer != <span class="keyword">null</span>) {</span><br><span class="line">                <span class="comment">// Fail all the queued messages.</span></span><br><span class="line">                outboundBuffer.failFlushed(cause, notify);</span><br><span class="line">                outboundBuffer.close(closeCause);</span><br><span class="line">            }</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">if</span> (inFlush0) {</span><br><span class="line">            invokeLater(<span class="keyword">new</span> Runnable() {</span><br><span class="line">                <span class="meta">@Override</span></span><br><span class="line">                <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</span><br><span class="line">                    fireChannelInactiveAndDeregister(wasActive);</span><br><span class="line">                }</span><br><span class="line">            });</span><br><span class="line">        } <span class="keyword">else</span> {</span><br><span class="line">            fireChannelInactiveAndDeregister(wasActive);</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Channel.unsafe().close is clearly the key method for closing a connection and controls the critical behavior at close time, so it is analyzed in detail here.</p><p>(1) 
First, the outboundBuffer held by the Channel is set to null. Netty's own comment states that no new messages may be added to outboundBuffer after this point. What does that imply?</p><p>The comment suggests that whether outboundBuffer is null controls the behavior of write and flush. The source confirms it:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: AbstractChannel.AbstractUnsafe#write</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">final</span> <span class="keyword">void</span> <span class="title">write</span><span class="params">(Object msg, ChannelPromise promise)</span> </span>{</span><br><span class="line">    assertEventLoop();</span><br><span class="line">    ChannelOutboundBuffer outboundBuffer = <span class="keyword">this</span>.outboundBuffer;</span><br><span class="line">    <span class="keyword">if</span> (outboundBuffer == <span class="keyword">null</span>) {</span><br><span class="line">        safeSetFailure(promise, newClosedChannelException(initialCloseCause));</span><br><span class="line">        <span class="comment">// release message now to prevent resource-leak</span></span><br><span class="line">        ReferenceCountUtil.release(msg);</span><br><span class="line">        <span 
class="keyword">return</span>;</span><br><span class="line">    }</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">    outboundBuffer.addMessage(msg, size, promise);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// Source location: AbstractChannel.AbstractUnsafe#flush</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">final</span> <span class="keyword">void</span> <span class="title">flush</span><span class="params">()</span> </span>{</span><br><span class="line">    ChannelOutboundBuffer outboundBuffer = <span class="keyword">this</span>.outboundBuffer;</span><br><span class="line">    <span class="keyword">if</span> (outboundBuffer == <span class="keyword">null</span>) {</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">    }</span><br><span class="line">    outboundBuffer.addFlush();</span><br><span class="line">    flush0();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>When outboundBuffer is null, the Channel's write and flush return immediately. write also fails the promise and releases the msg's reference count so that its memory can be reclaimed.</p><p>In other words, no message is written out once outboundBuffer is null. If a request is in the middle of being processed at that moment, the client never receives the rest of the response.</p><p>(2) prepareToClose() is a single line at the call site, yet it does the important job of cancelling the selector's event registration.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: NioSocketChannelUnsafe#prepareToClose()</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span> <span class="keyword">final</span> <span class="class"><span class="keyword">class</span> <span class="title">NioSocketChannelUnsafe</span> <span class="keyword">extends</span> <span class="title">NioByteUnsafe</span> </span>{</span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="function"><span class="keyword">protected</span> Executor <span class="title">prepareToClose</span><span class="params">()</span> </span>{</span><br><span class="line">        <span class="keyword">try</span> {</span><br><span class="line">            <span class="keyword">if</span> (javaChannel().isOpen() &amp;&amp; config().getSoLinger() &gt; <span class="number">0</span>) { <span class="comment">// (a) note the soLinger &gt; 0 check</span></span><br><span class="line">                doDeregister(); <span class="comment">// (b) cancel the SelectionKey</span></span><br><span class="line">                <span class="keyword">return</span> GlobalEventExecutor.INSTANCE;</span><br><span class="line">            }</span><br><span class="line">        } <span class="keyword">catch</span> (Throwable ignore) {</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">null</span>;</span><br><span class="line">    }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// Source location: AbstractNioChannel#doDeregister</span></span><br><span class="line"></span><br><span class="line"><span class="meta">@Override</span></span><br><span class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">doDeregister</span><span class="params">()</span> <span class="keyword">throws</span> Exception </span>{</span><br><span 
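class="line">    eventLoop().cancel(selectionKey()); <span class="comment">// (c) stop handling every kind of event on the channel: read, write, and so on</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Pay particular attention to the getSoLinger() &gt; 0 check. It refers to TCP's SO_LINGER option, which makes close wait for the data in the send buffer to be sent. It does not guarantee that the peer receives that data; it only waits for a period of time to give the transfer a chance to finish.<br>Netty relies on TCP's SO_LINGER mechanism to get the buffered messages sent.</p><p>Next, doDeregister cancels event listening on the Channel, so from this point the Selector no longer handles read or write events for it. The connection is still alive, however, which means the peer can still send messages or responses.<br>So when is the connection actually closed?</p><p>(3) (4) In both branches doClose0 performs the real close of the connection, and fireChannelInactiveAndDeregister propagates the channelInactive event through Netty's ChannelPipeline to announce that the connection has been closed.<br>The difference between (3) and (4) is when the connection is torn down. In (3) the close is handed to the Executor returned by prepareToClose and therefore runs later, off the EventLoop thread, since a close with SO_LINGER &gt; 0 may block while it waits; in (4) it runs inline.<br>The details are not covered here; see the constructor of GlobalEventExecutor and its fetchFromScheduledTaskQueue method.</p><p>(5) Following the calls down, the final step of closing the connection is the code below, which calls the Java NIO API to close it.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Source location: NioSocketChannel#doClose</span></span><br><span class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">doClose</span><span class="params">()</span> <span class="keyword">throws</span> Exception </span>{</span><br><span class="line">    <span class="keyword">super</span>.doClose();</span><br><span class="line">    javaChannel().close(); <span class="comment">// close the connection</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>At this point the connection-related shutdown work is finally done and every connection established with Netty has been closed. Can the process exit now with nothing left to worry about?</p><p>Recall that in the EventLoop's run method, closeAll is followed by a call to confirmShutdown(). Things are not quite that simple.</p><h3 id="confirmShutdown"><a href="#confirmShutdown" class="headerlink" title="confirmShutdown()"></a>confirmShutdown()</h3><p>confirmShutdown's flow is fairly direct: it wraps up by running the tasks that have not finished yet.</p><p>First it cancels scheduled tasks whose execution time has not arrived. Those tasks will never run.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 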
class="line">cancelScheduledTasks();</span><br></pre></td></tr></table></figure><p>Tasks that are already in the execution queue continue to run. They are of two kinds: tasks added while the loop was running, and registered shutdownHook tasks. If they have run and the executor is already shut down, the method returns.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (runAllTasks() || runShutdownHooks()) {</span><br><span class="line">    <span class="keyword">if</span> (isShutdown()) {</span><br><span class="line">        <span class="comment">// Executor shut down - no new tasks anymore.</span></span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">true</span>;</span><br><span class="line">    }</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>If the shutdown timeout has elapsed, it also returns immediately.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (isShutdown() || nanoTime - gracefulShutdownStartTime &gt; gracefulShutdownTimeout) {</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>If the quiet period before exit is still in effect, the current EventLoop checks every 100 ms whether new tasks were added during that period, and runs them if so.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
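class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (nanoTime - lastExecutionTime <= gracefulShutdownQuietPeriod) {</span><br><span class="line"> <span class="comment">// Check if any tasks were added to the queue every 100ms.</span></span><br><span class="line"> <span class="comment">// <span class="doctag">TODO:</span> Change the behavior of takeTask() so that it returns on timeout.</span></span><br><span class="line"> taskQueue.offer(WAKEUP_TASK);</span><br><span class="line"> <span class="keyword">try</span> {</span><br><span class="line"> Thread.sleep(<span class="number">100</span>);</span><br><span class="line"> } <span class="keyword">catch</span> (InterruptedException e) {</span><br><span class="line"> <span class="comment">// Ignore</span></span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">false</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>At this point Netty's graceful shutdown flow is completely finished, and the process can exit safely.</p><h2 id="总结梳理"><a href="#总结梳理" class="headerlink" title="Summary"></a>Summary</h2><p>From the source analysis above, Netty's graceful shutdown comes down to two parts: closing network resources, and running and then shutting down the task thread pool. In chronological order it can be summarized as follows:<br><img src="/images/netty-procedure.png" alt="Netty shutdown flow"><br>Steps 6 and 7 have no strict ordering between them; both are scheduled on the thread's task queue.</p><p>Focus on steps 4, 5 and 6, which deal with closing connections. When Netty closes a connection, step 4 first sets the OutboundBuffer held by the Channel to null. From then on the Channel ignores any write or flush submitted to it, so a message passed to channel.writeAndFlush, for example, is not sent. Step 5 stops listening for and handling network events on the Channel (read, write and so on) but does not close the connection, which means both ends can still exchange data over it. After a waiting period the connection is finally closed and released, which is step 6.</p><p>In other words, Netty's graceful shutdown cannot guarantee zero traffic loss. The following problems and risks remain:</p><ul><li>Once OutboundBuffer is set to null, a message that has only been partly sent at that moment is left incomplete on the wire; and if a request has been fully received but its response is still pending, that request is lost (pv loss).</li><li>While event listening is cancelled but the connection is not yet closed, the peer can keep sending messages, and those messages are never processed. If it is the server that is shutting down, the client's requests are
lost (pv loss).</li></ul><p>The takeaway for us is that an RPC framework that wants graceful shutdown on the server side has to account for these risks. For problem 1, first make sure every write and flush sends a complete message, and second, defer the shutdown until the messages already in flight have been handled.<br>For problem 2, cut off traffic: when stopping, first stop incoming traffic by whatever means are available, such as deregistering the RPC service, proactively notifying clients of the server's state, or answering requests with an error without processing them so that the client retries.</p>]]></content>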
<summary type="html"><h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h1><p>基于Netty实现的Rpc框架,其Server在退出时总是会造成部分的pv lost,很疑惑Server退出时调用的为Netty的grace</summary>
<category term="Netty" scheme="https://reason94.github.io/categories/Netty/"/>
<category term="Netty" scheme="https://reason94.github.io/tags/Netty/"/>
<category term="优雅退出" scheme="https://reason94.github.io/tags/%E4%BC%98%E9%9B%85%E9%80%80%E5%87%BA/"/>
</entry>
<entry>
<title>Hello World</title>
<link href="https://reason94.github.io/2020/10/21/hello-world/"/>
<id>https://reason94.github.io/2020/10/21/hello-world/</id>
<published>2020-10-21T05:23:08.517Z</published>
<updated>2020-10-21T05:23:08.518Z</updated>
<content type="html"><![CDATA[<p>Welcome to <a href="https://hexo.io/">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues">GitHub</a>.</p><h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/writing.html">Writing</a></p><h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/server.html">Server</a></p><h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/generating.html">Generating</a></p><h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/one-command-deployment.html">Deployment</a></p>]]></content>
<summary type="html"><p>Welcome to <a href="https://hexo.io/">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/">documentation</a> for</summary>
</entry>
</feed>