Software Engineering: wc.exe Code Statistics Assignment

1. GitHub project address

https://github.com/EdwardLiu-Aurora/WordCount

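Basic requirements
- [x] -c count the characters in a file (implemented)
- [x] -w count the words in a file (implemented)
- [x] -l count the lines in a file (implemented)
Extended features
- [x] -s recursively process the matching files under a directory (implemented)
- [x] -a report the code lines / blank lines / comment lines of a file (implemented)
- [x] support file wildcards (*, ?) (implemented)
Advanced features
- [ ] -x graphical user interface (not implemented)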

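2. PSP table

PSP2.1	Personal Software Process Stages	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	5	5
· Estimate	· Estimate how long the task will take	600	730
Development	Development	480	610
· Analysis	· Requirements analysis (including learning new technologies)	60	60
· Design Spec	· Write the design document	60	60
· Design Review	· Design review (review the design document with colleagues)	30	30
· Coding Standard	· Coding standard (define suitable conventions for the current development)	30	10
· Design	· Detailed design	30	60
· Coding	· Coding	120	240
· Code Review	· Code review	30	30
· Test	· Testing (self-testing, fixing code, committing changes)	120	120
Reporting	Reporting	120	120
· Test Report	· Test report	60	60
· Size Measurement	· Measure the workload	30	30
· Postmortem & Process Improvement Plan	· Postmortem and process improvement plan	30	30
Total		605	735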

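3. Approach

(1) Counting the characters in a file

Definition: the total number of characters in the file (line terminators are not counted, because the file is read line by line); Chinese characters are also tallied separately.

Approach: read the file line by line in Java, so that each line is a String object.
Use String.length() to count the characters in the line, and decide whether a character is Chinese from the range of its Character value. Two long variables accumulate the total character count and the total Chinese character count.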
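The counting rule above can be sketched as follows. This is a minimal illustration, not the project's actual `getCharCount`: the class name `CharCountSketch`, the `long[]` return value and the use of `Character.UnicodeScript.HAN` to recognize Chinese characters are all assumptions (the text only says that the `Character` value range is checked).

```java
import java.util.stream.Stream;

final class CharCountSketch {

    /** Returns {allChars, chineseChars}; line terminators are not counted. */
    static long[] count(Stream<String> lines) {
        long[] totals = new long[2];
        lines.forEach(line -> {
            // String.length() counts UTF-16 code units
            totals[0] += line.length();
            // Walk the code points so every Han character is tested as a whole
            totals[1] += line.codePoints()
                    .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                    .count();
        });
        return totals;
    }

    public static void main(String[] args) {
        // "\u4F60\u597D" is a string of two Han characters
        long[] result = count(Stream.of("int a;", "\u4F60\u597D"));
        System.out.println(result[0] + " characters, " + result[1] + " Chinese");
    }
}
```

Passing a `Stream<String>` keeps the sketch independent of how the file is opened; in practice the stream could come from `Files.lines`.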

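(2) Counting the words in a file

Definition: a word is a consecutive run of characters drawn only from 0-9, a-z, A-Z and _; Chinese characters are never part of a word.

Approach: write a regular expression that encodes the rule above, and use it to accumulate the word count line by line.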
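As a rough sketch of that rule, the regular expression below treats every maximal run of `[0-9A-Za-z_]` as one word. The class and method names are hypothetical, and the project's real expression may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordCountSketch {

    // One word = one maximal run of ASCII letters, digits and underscores
    private static final Pattern WORD = Pattern.compile("[0-9A-Za-z_]+");

    /** Counts the words in a single line. */
    static long countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("int my_var = foo(42);"));
    }
}
```

Summing `countWords` over the lines of a file gives the file's word count. Any character outside the class, Chinese characters included, simply acts as a separator.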

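(3) Counting the lines in a file

Definition: the total number of lines in the file (determined by line breaks).

Approach: read the file line by line in Java and increment a counter.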
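A minimal sketch of line counting, assuming a `BufferedReader` is an acceptable stand-in for whatever reader the project uses; `LineCountSketch` and its method names are hypothetical. `BufferedReader.lines()` is lazy, so only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineCountSketch {

    /** Counts the lines supplied by any character source. */
    static long countLines(Reader source) {
        return new BufferedReader(source).lines().count();
    }

    /** File-based variant; try-with-resources closes the reader. */
    static long countLines(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return countLines(reader);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("first\n\nthird\n")));
    }
}
```

With this approach a trailing line break does not add an extra line, and an empty file has zero lines.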

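(4) Recursively processing the matching files under a directory

Definition: analyze every file in the directory and its subdirectories, and report the data the user asked for.

Approach: a function collects the paths of all matching files under the directory into an ArrayList and returns it to the Main function, which carries on with the processing.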
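One way to collect the paths is `Files.walk`, sketched below. This is an assumption about the technique, not the project's `getFilesPath`: the class `FileWalkSketch`, the `Pattern` parameter and the sorted result are illustrative choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileWalkSketch {

    /** Returns the regular files under root, at any depth, whose file name matches nameRegex. */
    static ArrayList<String> listFiles(Path root, Pattern nameRegex) {
        // Files.walk is lazy and must be closed, hence try-with-resources
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> nameRegex.matcher(p.getFileName().toString()).matches())
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toCollection(ArrayList::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listFiles(root, Pattern.compile(".*\\.java")).forEach(System.out::println);
    }
}
```

Matching only the file name (not the whole path) keeps directory names from affecting the result.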

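(5) Reporting more detailed data

Code line: a line that contains more than one character of code, not counting formatting symbols such as "{}", "()" and ";".

Approach: keep a Set that stores all the formatting characters; if a character that is not in the Set is found, the line is judged to be a code line. Conflicts with comment lines need care:

1. When a line contains both // and /*, check which one comes first.
2. A line that lies inside a /* */ comment is not a code line.
3. When a line is the first or last line of a /* */ comment, check the characters before /* or after */.
4. When a line contains only //, check the characters before //.

Comment line: any line that contains a comment, whether or not it is also a code line; that is, a line that contains // or lies within a /* */ range.

Approach: read line by line and handle the // and /* */ cases separately.

Blank line: a line that consists only of whitespace or formatting characters.

Approach: match with a regular expression and String.indexOf().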

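(6) File wildcards

Definition: * matches zero or more arbitrary characters and ? matches exactly one arbitrary character.

Approach: first replace ? and * with their regular-expression equivalents, then escape . for the regular expression, and finally match the resulting pattern against each path.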
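The translation from wildcard to regular expression can also be done in a single pass over the pattern, which avoids one replacement accidentally rewriting the output of another. The sketch below is an alternative under that assumption, not the project's code; `WildcardSketch` is a hypothetical name, and excluding `/` and `\` from the wildcard matches is an illustrative choice so that a wildcard never crosses a directory boundary.

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    /** Converts a file-name wildcard (* and ?) into a regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append("[^/\\\\]*");   // any run of non-separator characters
            } else if (c == '?') {
                regex.append("[^/\\\\]");    // exactly one non-separator character
            } else {
                // Quote everything else so '.', '(' and friends stay literal
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern pattern = toRegex("*.java");
        System.out.println(pattern.matcher("Main.java").matches());
        System.out.println(pattern.matcher("Main.javax").matches());
    }
}
```

Using `Matcher.matches()` requires the whole file name to match, so no explicit `^` and `$` anchors are needed.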

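4. Design and implementation

Packages

bean: holds the composite types that are returned
service: holds the functions that implement the actual logic
com.edwardliu_aurora: holds the implementation of the Main function

Classes

CharCount:
- allCharCount total number of characters
- chnCharCount total number of Chinese characters
LineCount:
- blankLineCount number of blank lines
- codeLineCount number of code lines
- commentLineCount number of comment lines
BasicStatistic:
- public CharCount getCharCount(String filePath) returns the character counts
- public long getWordCount(String filePath) returns the word count
- public long getLineCount(String filePath) returns the line count
ExtraStatistic:
- public LineCount getDetailLineCount(String filePath) returns the detailed line counts
Utils:
- public static Charset charsetRecognize(String filePath) detects the text encoding (only GBK and UTF-8 are supported)
- public static ArrayList<String> getFilesPath(String folderPath, String filePattern) returns the paths of all files under a directory that match the filePattern wildcard
Main:
- handles input and output and calls the functions above as appropriate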
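The encoding detection listed under Utils can be approximated as below: check for the UTF-8 byte order mark, otherwise try a strict UTF-8 decode and fall back to GBK when it fails. This is a sketch under stated assumptions, not the project's `charsetRecognize`: the class name `CharsetSketch` is hypothetical, the content is passed as a byte array for simplicity (so the whole file is in memory), and `Charset.forName("GBK")` assumes the runtime ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    /** Guesses between UTF-8 and GBK; no other encoding is considered. */
    static Charset recognize(byte[] content) {
        // EF BB BF is the UTF-8 byte order mark
        if (content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        try {
            // A strict decoder throws instead of silently replacing bad bytes
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume GBK, the only other supported encoding
            return Charset.forName("GBK");
        }
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};
        System.out.println(recognize(gbkBytes));
        System.out.println(recognize("plain ascii".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

A pure ASCII file is valid in both encodings, so reporting it as UTF-8 is harmless. A file in any third encoding is still misclassified, which matches the limitation stated for the original function.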

5. Code walkthrough

CharCount:
- allCharCount total number of characters
- chnCharCount total number of Chinese characters

```java
package bean;

/**
 * Holds the total character count and the Chinese character count.
 */
public class CharCount {
    // Total number of characters
    private long allCharCount = 0;
    // Number of Chinese characters
    private long chnCharCount = 0;

    public CharCount(long allCharCount, long chnCharCount) {
        this.allCharCount = allCharCount;
        this.chnCharCount = chnCharCount;
    }

    public long getAllCharCount() {
        return allCharCount;
    }

    public void setAllCharCount(long allCharCount) {
        this.allCharCount = allCharCount;
    }

    public long getChnCharCount() {
        return chnCharCount;
    }

    public void setChnCharCount(long chnCharCount) {
        this.chnCharCount = chnCharCount;
    }

}
```

LineCount:
- blankLineCount number of blank lines
- codeLineCount number of code lines
- commentLineCount number of comment lines

```java
package bean;

// Holds the detailed line counts
public class LineCount {
    // Blank lines
    private int blankLineCount = 0;
    // Code lines
    private int codeLineCount = 0;
    // Comment lines
    private int commentLineCount = 0;

    public LineCount(int blankLineCount, int codeLineCount, int commentLineCount) {
        this.blankLineCount = blankLineCount;
        this.codeLineCount = codeLineCount;
        this.commentLineCount = commentLineCount;
    }

    public int getBlankLineCount() {
        return blankLineCount;
    }

    public void setBlankLineCount(int blankLineCount) {
        this.blankLineCount = blankLineCount;
    }

    public int getCodeLineCount() {
        return codeLineCount;
    }

    public void setCodeLineCount(int codeLineCount) {
        this.codeLineCount = codeLineCount;
    }

    public int getCommentLineCount() {
        return commentLineCount;
    }

    public void setCommentLineCount(int commentLineCount) {
        this.commentLineCount = commentLineCount;
    }

}
```

BasicStatistic:
- public CharCount getCharCount(String filePath) returns the character counts

```java
// Returns the character counts of a file
public CharCount getCharCount(String filePath){
    // Total character count and Chinese character count
    long allCharCount = 0, chnCharCount = 0;
    // Create the NIO path object for the file
    Path path = Paths.get(filePath);
    // Use a lazy Stream so that a large file is not loaded into memory at once
```

- public long getWordCount(String filePath) returns the word count

```java
// Returns the word count of a file
public long getWordCount(String filePath){
    long wordCount = 0;
    // Use a lazy Stream so that a large file is not loaded into memory at once
```

- public long getLineCount(String filePath) returns the line count

```java
// Returns the line count of a file
public long getLineCount(String filePath){
    long lineCount = 0;
    // Use a lazy Stream so that a large file is not loaded into memory at once
```

ExtraStatistic:
- public LineCount getDetailLineCount(String filePath) returns the detailed line counts

```java
package service;

import bean.LineCount;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Advanced statistics
public class ExtraStatistic {

    // Matches any single non-whitespace character
    private static final Pattern NON_BLANK = Pattern.compile("\\S");

    // A piece of text counts as code when it has more than one non-whitespace character
    private static boolean hasCode(String text) {
        Matcher matcher = NON_BLANK.matcher(text);
        int i = 0;
        while (matcher.find()) {
            i++;
            if (i > 1) return true;
        }
        return false;
    }

    public LineCount getDetailLineCount(String filePath) {
        LineCount lineCount = new LineCount(0, 0, 0);
        // Count the blank lines
        Path path = Paths.get(filePath);
        try (Stream<String> lines = Files.lines(path, Utils.charsetRecognize(filePath))) {
            // A line with at most one non-whitespace character is a blank line
            lineCount.setBlankLineCount((int) lines.filter(line -> !hasCode(line)).count());
        }
        catch (Exception e) {
            e.printStackTrace();
            System.out.println("The file does not exist or cannot be accessed");
            return null;
        }
        // Count the comment lines and the code lines
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(filePath),
                        Utils.charsetRecognize(filePath)
                ))) {
            int commentLineCount = 0;
            int codeLineCount = 0;
            // Read the file line by line
            for (String line = bufferedReader.readLine(); line != null; line = bufferedReader.readLine()) {
                // Position of the single-line comment marker
                int oneLinePos = line.indexOf("//");
                // Position of the block comment marker
                int mulLinePos = line.indexOf("/*");
                // The line has // and it comes first: ignore everything from // onward, comment lines +1
                if (oneLinePos >= 0 && (mulLinePos < 0 || oneLinePos < mulLinePos)) {
                    commentLineCount++;
                    // More than one non-whitespace character before //: also a code line
                    if (hasCode(line.substring(0, oneLinePos))) codeLineCount++;
                }
                // The line opens a block comment: comment lines +1, then read on until the line with */
                else if (mulLinePos >= 0) {
                    commentLineCount++;
                    String before = line.substring(0, mulLinePos);
                    // Look for */ after the opening /* so that a comment closed on the same line is handled
                    String rest = line.substring(mulLinePos + 2);
                    int endPos = rest.indexOf("*/");
                    if (endPos >= 0) {
                        // Opened and closed on one line: judge the code around the comment once
                        if (hasCode(before + rest.substring(endPos + 2))) codeLineCount++;
                        continue;
                    }
                    // More than one non-whitespace character before /*: also a code line
                    if (hasCode(before)) codeLineCount++;
                    // Every following line up to and including the one with */ is a comment line;
                    // the loop also stops at end of file if the comment is never closed
                    while ((line = bufferedReader.readLine()) != null) {
                        commentLineCount++;
                        endPos = line.indexOf("*/");
                        if (endPos >= 0) {
                            // More than one non-whitespace character after */: also a code line
                            if (hasCode(line.substring(endPos + 2))) codeLineCount++;
                            break;
                        }
                    }
                }
                // No comment on the line: it is a code line if it has more than one non-whitespace character
                else if (hasCode(line)) {
                    codeLineCount++;
                }
            }
            lineCount.setCodeLineCount(codeLineCount);
            lineCount.setCommentLineCount(commentLineCount);
        }
        catch (Exception e) {
            e.printStackTrace();
            System.out.println("The file does not exist or cannot be accessed");
            return null;
        }
        return lineCount;
    }

}
```

Utils:
- public static Charset charsetRecognize(String filePath) detects the text encoding (only GBK and UTF-8 are supported)

```java
// Simple detection of the file encoding
public static Charset charsetRecognize(String filePath){
    try{
        File file = new File(filePath);
        InputStream in = new java.io.FileInputStream(file);
        byte[] b = new byte[3];
        in.read(b);
        in.close();
        // EF BB BF is the UTF-8 byte order mark
        if (b[0] == -17 && b[1] == -69 && b[2] == -65)
            return Charset.forName("UTF-8");
        else{
            try (Stream
```

- public static ArrayList<String> getFilesPath(String folderPath,String filePattern) returns the paths of all files under a directory that match the filePattern wildcard

```java
// Collects the paths of all files under a directory and its subdirectories
public static ArrayList
```

Main:
- handles input and output and calls the functions above as appropriate

```java
package com.edwardliu_aurora;

import bean.CharCount;
import bean.LineCount;
import service.BasicStatistic;
import service.ExtraStatistic;
import service.Utils;

import java.io.File;
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        boolean charCount = false;
        boolean wordCount = false;
        boolean lineCount = false;
        boolean directory = false;
        boolean detailLine = false;
        // The last argument is the file name or pattern; every argument before it is a flag
        for(int i=0;i<args.length-1;i++){
            if(args[i].equals("-c")) charCount = true;
            else if(args[i].equals("-w")) wordCount = true;
            else if(args[i].equals("-l")) lineCount = true;
            else if(args[i].equals("-s")) directory = true;
            else if(args[i].equals("-a")) detailLine = true;
        }
        BasicStatistic basicStatistic = new BasicStatistic();
        ExtraStatistic extraStatistic = new ExtraStatistic();
        if(directory) {
            String filePattern = args[args.length-1];
            // Turn the wildcard into a regular expression: ? becomes one file-name
            // character, * becomes any run of them, and . is escaped.
            // The character class leaves out * and ? so that the second replaceAll
            // cannot rewrite the output of the first.
            filePattern = filePattern.
                    replaceAll("\\?","[^/\\\\\\\\:<>|]").
                    replaceAll("\\*","[^/\\\\\\\\:<>|]*").
                    replaceAll("\\.","\\\\.");
            ArrayList
```

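6. Test runs

Unit tests (already carried out during development, so they are not shown here)

Test files

An empty file

A file with only one character

A file with only one word

A file with only one line

A typical source file

Basic features

Counting the characters, words and lines of a specific file

Already shown in the screenshots above

Advanced features

Code coverage test

7. Actual time spent (see the PSP table at the beginning, where it has been filled in)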

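8. Project summary

In this project I used the Stream API, which Java has only supported since version 1.8. Its advantage is that it supports functional programming and makes it easy to run the file statistics in parallel. At the same time it brings a problem: users must run my program on JRE 1.8 or later.
While reading files I ran into the problem of detecting the file encoding. Detecting an encoding is fairly complicated in itself, because on Windows some text files carry a BOM header and others do not. For a file with a BOM, I can easily tell whether it is UTF-8 encoded. Many files, however, have no BOM, and for those I can only guess the encoding from the exception that is thrown. As a result, my detection function currently supports only UTF-8 and GBK. If a user's text file uses any other encoding, my program will fail in unpredictable ways.
I did not check or validate the arguments supplied by the user; I only perform a simple check that the file exists. If the user mistypes the command, my program will fail in unpredictable ways.
Designing a project with software engineering methods may take a lot of time up front, but it makes the later coding clearer and better organized, and keeps the overall project design under control.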