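A record of the full analysis I went through, digging into Tomcat's internal jars, to find where the character encoding used by request.getReader() is set and when it takes effect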

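Preface:

Some time ago, while writing Java server-side code to handle POST requests, I got puzzled about where the encoding used to turn the request body into a character stream comes from. In doPost, what encoding does the Reader inside the BufferedReader returned by request.getReader() use to convert the byte stream into characters? Where is it set, and when does it take effect? From reading around, I learned that there are three ways to get the request body from an HttpServletRequest. Two of them work regardless of the value of the Content-Type request header, as long as the input stream is not obtained more than once: request.getInputStream() and request.getReader(). For the former, we can wrap the stream in an InputStreamReader with an explicit encoding and read the characters correctly. For the latter, I guessed that the encoding is set through request.setCharacterEncoding(charsetName), but I really wanted to know how these two calls are connected, so I started reading the source.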
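As a minimal sketch of the first approach, the snippet below wraps a byte stream in an InputStreamReader with an explicit charset. A ByteArrayInputStream stands in for request.getInputStream() so that the example runs without a servlet container; the class BodyDecoding and its methods readAll and fakeBody are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BodyDecoding {

    /** Reads the whole stream as text, decoding the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Hypothetical stand-in for a request body: "name=" plus two CJK characters, sent as UTF-8. */
    static InputStream fakeBody() {
        byte[] bytes = "name=\u7f16\u7801".getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) {
        String right = readAll(fakeBody(), StandardCharsets.UTF_8);
        String wrong = readAll(fakeBody(), StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 7: five ASCII chars plus two CJK chars
        System.out.println(wrong.length()); // 11: each 3-byte CJK char became three chars
    }
}
```

In a real servlet the InputStream would come from request.getInputStream(); the point is only that with this approach the caller chooses the charset explicitly.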

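Steps:

At first I tried to find the answer by following request.getReader(). Printing request.getClass().toString() showed that the actual class of the request object is org.apache.catalina.connector.RequestFacade. Going by the name, I eventually found that this class lives in catalina.jar, under the lib directory of the Tomcat installation. After adding that jar to the project, the source of RequestFacade.getReader() turned out to be: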

public BufferedReader getReader() throws IOException {
    if (this.request == null) {
        throw new IllegalStateException(sm.getString("requestFacade.nullRequest"));
    } else {
        return this.request.getReader();
    }
}

Next, this.request turns out to be of type org.apache.catalina.connector.Request, initialized through the RequestFacade constructor. The code of org.apache.catalina.connector.Request.getReader() is:

public BufferedReader getReader() throws IOException {
    if (this.usingInputStream) {
        throw new IllegalStateException(sm.getString("coyoteRequest.getReader.ise"));
    } else {
        this.usingReader = true;
        this.inputBuffer.checkConverter();
        if (this.reader == null) {
            this.reader = new CoyoteReader(this.inputBuffer);
        }
        return this.reader;
    }
}

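Note the call this.inputBuffer.checkConverter() here: it applies the encoding set by request.setCharacterEncoding(charsetName) to the conversion of the byte stream into characters. I will come back to that process later.

First, look at new CoyoteReader(this.inputBuffer). Since CoyoteReader extends BufferedReader, the object that actually converts the byte stream into a character stream should be this.inputBuffer. The code shows that its type is org.apache.catalina.connector.InputBuffer, declared as: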

public class InputBuffer extends Reader implements ByteInputChannel, CharInputChannel, CharOutputChannel {
    // ...
}

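Since it shares the parent class Reader with InputStreamReader, I guessed that InputBuffer is the class that converts the byte stream into a character stream. But the trail went cold here and I did not know where to look next (in hindsight, I should have searched upward for where the InputBuffer is created and assigned).

So I went back to my original guess, that the encoding used by request.getReader() is set through request.setCharacterEncoding(charsetName). Reading the source of setCharacterEncoding showed that RequestFacade sets the character encoding through its internal org.apache.catalina.connector.Request, which in turn does so through its internal org.apache.coyote.Request. This requires another jar: tomcat-coyote.jar.

The code of coyoteRequest.setCharacterEncoding(charsetName) is: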

public void setCharacterEncoding(String enc) {
    this.charEncoding = enc;
}

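Here the trail went cold again. I only knew that the encoding set on RequestFacade ends up stored in org.apache.coyote.Request, but I had no idea when it gets applied to the InputBuffer.

Along the way I also worked out that RequestFacade's methods for setting the encoding, getting the encoding, getContentLength() and so on are all ultimately implemented by org.apache.coyote.Request.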
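The delegation chain described so far can be sketched with three small classes. This is a simplified model under my own assumptions, not Tomcat's code: the names CoyoteRequestModel, ConnectorRequestModel and RequestFacadeModel are hypothetical, and the real classes do more (for example the null check seen in RequestFacade.getReader()).

```java
/** Hypothetical stand-in for org.apache.coyote.Request: the object that stores the value. */
final class CoyoteRequestModel {
    private String charEncoding;

    void setCharacterEncoding(String enc) {
        this.charEncoding = enc;
    }

    String getCharacterEncoding() {
        return charEncoding;
    }
}

/** Hypothetical stand-in for org.apache.catalina.connector.Request: forwards to the coyote request. */
final class ConnectorRequestModel {
    private final CoyoteRequestModel coyoteRequest;

    ConnectorRequestModel(CoyoteRequestModel coyoteRequest) {
        this.coyoteRequest = coyoteRequest;
    }

    void setCharacterEncoding(String enc) {
        coyoteRequest.setCharacterEncoding(enc);
    }

    String getCharacterEncoding() {
        return coyoteRequest.getCharacterEncoding();
    }
}

/** Hypothetical stand-in for RequestFacade: the object the servlet actually sees. */
final class RequestFacadeModel {
    private final ConnectorRequestModel request;

    RequestFacadeModel(ConnectorRequestModel request) {
        this.request = request;
    }

    void setCharacterEncoding(String enc) {
        request.setCharacterEncoding(enc);
    }

    String getCharacterEncoding() {
        return request.getCharacterEncoding();
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        CoyoteRequestModel coyote = new CoyoteRequestModel();
        RequestFacadeModel facade = new RequestFacadeModel(new ConnectorRequestModel(coyote));
        facade.setCharacterEncoding("UTF-8");
        // The value set on the facade is held by the innermost object.
        System.out.println(coyote.getCharacterEncoding());
    }
}
```

The model only shows where the value ends up; it says nothing yet about when the InputBuffer picks it up, which is the open question at this point in the analysis.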

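Back to the main thread. After the trail went cold, I asked myself where the InputBuffer is created with new, and where it is given its encoding and its byte stream. That led me back to the definition of org.apache.catalina.connector.Request.

Searching showed that this.inputBuffer inside org.apache.catalina.connector.Request is created in the constructor, but only as an empty shell. RequestFacade.getInputStream() also ends up passing this.inputBuffer as the byte source, in new CoyoteInputStream(this.inputBuffer). So it can probably serve both character reads and byte reads, that is, it holds the first-hand data.

Then I found this method in org.apache.catalina.connector.Request: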

public void setCoyoteRequest(org.apache.coyote.Request coyoteRequest) {
    this.coyoteRequest = coyoteRequest;
    this.inputBuffer.setRequest(coyoteRequest);
}

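I had been stuck in the narrow alley of looking for the InputBuffer's encoding and forgot to check where such an important field as coyoteRequest gets assigned. A search showed that this set method is the only place in org.apache.catalina.connector.Request that assigns this.coyoteRequest, so it must be called, which means this.inputBuffer.setRequest(coyoteRequest) runs as well. Since coyoteRequest holds the encoding set through RequestFacade, the InputBuffer now has a source for the encoding it needs.

Next, where does InputBuffer use this coyoteRequest? A whole pile of its methods do. After some thought I realized that external code reads the character stream through a BufferedReader, and a BufferedReader in turn reads through the Reader passed to its constructor, that is, through InputBuffer's read(char[], int, int) method. That method is defined in InputBuffer as follows: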

public int read(char[] cbuf, int off, int len) throws IOException {
    if (this.closed) {
        throw new IOException(sm.getString("inputBuffer.streamClosed"));
    } else {
        return this.cb.substract(cbuf, off, len);
    }
}

So InputBuffer reads characters through the substract method of this.cb. The code shows that cb is a CharChunk, which requires another jar: tomcat-util.jar. The source of CharChunk.substract is:

public int substract(char[] src, int off, int len) throws IOException {
        int n;
        if (this.end - this.start == 0) {
            if (this.in == null) {
                return -1;
            }

            n = this.in.realReadChars(this.buff, this.end, this.buff.length - this.end);
            if (n < 0) {
                return -1;
            }
        }

        n = len;
        if (len > this.getLength()) {
            n = this.getLength();
        }

        System.arraycopy(this.buff, this.start, src, off, n);
        this.start += n;
        return n;
    }

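The call this.in.realReadChars(...) is the key part. Its name suggests that this is the method that really reads the char array, and tracing it shows that this.in is the same InputBuffer object as before.

Reading the CharChunk code, I found that this.start and this.end are both initially 0, so the first call to this method executes this.in.realReadChars(...). Here is that method: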

public int realReadChars(char[] cbuf, int off, int len) throws IOException {
        if (!this.gotEnc) {
            this.setConverter();
        }

        boolean eof = false;
        if (this.bb.getLength() <= 0) {
            int nRead = this.realReadBytes(this.bb.getBytes(), 0, this.bb.getBytes().length);
            if (nRead < 0) {
                eof = true;
            }
        }

        if (this.markPos == -1) {
            this.cb.setOffset(0);
            this.cb.setEnd(0);
        } else {
            this.cb.makeSpace(this.bb.getLength());
            if (this.cb.getBuffer().length - this.cb.getEnd() == 0 && this.bb.getLength() != 0) {
                this.cb.setOffset(0);
                this.cb.setEnd(0);
                this.markPos = -1;
            }
        }

        this.state = 1;
        this.conv.convert(this.bb, this.cb, eof);
        return this.cb.getLength() == 0 && eof ? -1 : this.cb.getLength();
    }

The code shows that this.gotEnc is initially false and only becomes true inside this.setConverter(), so setConverter() runs on the first call. Here is the source of setConverter():

protected void setConverter() throws IOException {
        if (this.coyoteRequest != null) {
            this.enc = this.coyoteRequest.getCharacterEncoding();
        }

        this.gotEnc = true;
        if (this.enc == null) {
            this.enc = "ISO-8859-1";
        }

        this.conv = (B2CConverter)this.encoders.get(this.enc);
        if (this.conv == null) {
            if (SecurityUtil.isPackageProtectionEnabled()) {
                try {
                    this.conv = (B2CConverter)AccessController.doPrivileged(new PrivilegedExceptionAction<B2CConverter>() {
                        public B2CConverter run() throws IOException {
                            return new B2CConverter(InputBuffer.this.enc);
                        }
                    });
                } catch (PrivilegedActionException var3) {
                    Exception e = var3.getException();
                    if (e instanceof IOException) {
                        throw (IOException)e;
                    }
                }
            } else {
                this.conv = new B2CConverter(this.enc);
            }

            this.encoders.put(this.enc, this.conv);
        }

    }

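It contains the line this.enc = this.coyoteRequest.getCharacterEncoding(); and then uses this.enc to initialize a B2CConverter object. Judging by the name, this class is the converter that turns bytes into characters. Note also that when no encoding has been set, this.enc falls back to "ISO-8859-1".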
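To make the effect of that fallback concrete, here is a small sketch using the JDK's own java.nio.charset.Charset rather than Tomcat's B2CConverter, whose internals are not shown here. The class ByteToCharSketch and its convert method are hypothetical; only the null-to-ISO-8859-1 default mirrors setConverter().

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteToCharSketch {

    /**
     * Decodes bytes into text using the named encoding.
     * A null encoding falls back to ISO-8859-1, as in setConverter().
     */
    static String convert(byte[] bytes, String enc) {
        String effective = (enc == null) ? "ISO-8859-1" : enc;
        Charset charset = Charset.forName(effective);
        // Byte buffer in, char buffer out: roughly the role convert(bb, cb, eof) plays.
        return charset.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(convert(utf8, "UTF-8").length()); // 1
        System.out.println(convert(utf8, null).length());    // 2: each byte became its own char
    }
}
```

With the fallback, every byte maps to exactly one character, so multi-byte UTF-8 text comes out garbled rather than failing loudly.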

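Back in realReadChars(...), there is one line that always executes: this.conv.convert(this.bb, this.cb, eof);

This line converts the bytes in this.bb into character data in this.cb (bb is a ByteChunk object). So the conversion from byte stream to character stream is done by InputBuffer's this.conv.convert(...), and the encoding is put into this.conv by setConverter(), which reads it from coyoteRequest. setConverter() runs only once, because it sets this.gotEnc = true. That means we need to find the earliest place where setConverter() is executed. Besides realReadChars(), the checkConverter() method also calls setConverter(), and checkConverter() is already executed inside org.apache.catalina.connector.Request.getReader(). It follows that RequestFacade.setCharacterEncoding(charsetName) must be called before getReader(); in the wrong order, the encoding that was set has no effect on the Reader.

The same applies to ResponseFacade.getWriter().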
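The ordering rule can be reproduced with a toy model of the "latch once" behaviour. This is a sketch under my own assumptions, not Tomcat's implementation: LatchedEncodingModel, openReader and readAll are hypothetical names, and the gotEnc flag simply imitates the one in InputBuffer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LatchedEncodingModel {
    private final byte[] body;
    private String requestEncoding; // plays the role of the encoding held by coyoteRequest
    private String enc;             // plays the role of InputBuffer.enc
    private boolean gotEnc = false; // plays the role of InputBuffer.gotEnc

    LatchedEncodingModel(byte[] body) {
        this.body = body.clone();
    }

    void setCharacterEncoding(String encoding) {
        this.requestEncoding = encoding;
    }

    /** Imitates getReader() -> checkConverter(): the encoding is fixed on the first call. */
    void openReader() {
        if (!gotEnc) {
            enc = (requestEncoding != null) ? requestEncoding : "ISO-8859-1";
            gotEnc = true;
        }
    }

    /** Decodes the body with whatever encoding was latched by openReader(). */
    String readAll() {
        openReader();
        return new String(body, Charset.forName(enc));
    }

    public static void main(String[] args) {
        byte[] body = "\u4e2d".getBytes(StandardCharsets.UTF_8); // one CJK char, three bytes

        LatchedEncodingModel good = new LatchedEncodingModel(body);
        good.setCharacterEncoding("UTF-8"); // set first
        good.openReader();                  // then open the reader
        System.out.println(good.readAll().length()); // 1

        LatchedEncodingModel bad = new LatchedEncodingModel(body);
        bad.openReader();                   // reader opened first: ISO-8859-1 is latched
        bad.setCharacterEncoding("UTF-8");  // too late, ignored by the model
        System.out.println(bad.readAll().length());  // 3
    }
}
```

In servlet code this corresponds to calling request.setCharacterEncoding("UTF-8") as the first thing in doPost, before request.getReader(). As far as I know, the Servlet API documentation for setCharacterEncoding states the same requirement.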