哈希表之單詞查找並輸出其上下文

-Advertisement-

#哈希表之單詞查找英文文章，寫入一個文本文檔，作為基礎測試數據。建立一個待查關鍵字文件，存儲待查詢的單詞。輸入查詢的單詞，輸出該單詞的出現次數，及每個出現位置的上下文（前一句，單詞/短語所在的句子，下一句）。目前只支持查詢一個單詞。以下為代碼： #include <iostream> #i ...

哈希表之單詞查找

英文文章，寫入一個文本文檔，作為基礎測試數據。建立一個待查關鍵字文件，存儲待查詢的單詞。
輸入查詢的單詞，輸出該單詞的出現次數，及每個出現位置的上下文（前一句，單詞/短語所在的句子，下一句）。
目前只支持查詢一個單詞。

以下為代碼：


#include <iostream>
#include <string.h>
#include<vector>
#include <map>
#include <set>
#include<sstream>
#include <fstream>
#include <windows.h>
const int HASHNUM = 29989;

typedef struct Hashtable{            //哈希表結構體
    char *word;
    int count;
    struct Hashtable * next;
}Hashnode;
Hashnode* hashtable[HASHNUM];       //HASHNUMBER大小的指針數組(儲存指針的數組) 作為hash表

std::vector<std::string>lines;
std::map<std::string,std::set<int>> wordline;   //需要使用到迭代器匹配行號所以使用map嵌套set

unsigned int Hashfuntion(const char *str);
void Hashinsert(char *str);
void readintohash(char *path);
void findquery(char *path);
void getlinecplusplus(char *path,char *str);


void readintohash(char *path){
    char ch[1024] = {NULL};              //存儲一行字元串
    char *p;
    FILE *fp = fopen(path,"r");
    if(fp == NULL)
        exit(-1);
    while(( fgets(ch,sizeof (ch),fp) != NULL ))                           //讀取到換行符時，或者到達文件末尾時停止
    {
        if(strcmp( " " , ch ) == 0)                                       //空行繼續
            continue;

        const char *ch1 = ", \n\t.";                                      //要分割的字元(逗號、空格、換行、製表符、句號)
        p = strtok(ch,ch1);                                               //用strtok函數從一行字元串中分離出每個單詞，分隔符設置為(空格、逗號、換行、製表符)
        while(p != NULL)
        {
            Hashinsert(p);
            p = strtok(NULL, ch1);                              //每次找到一個分隔符後，一個空（NULL）就被放到分隔符處，strtok函數用這種方法來連續查找該字元串，此處的NULL是一個指向後面還未分割的單詞的指針，因此，首次調用時，第一個元素必須指向要分解的字元串，隨後調用要把元素設成NULL。
        }
    }
    fclose(fp);
}

void Hashinsert(char *str){
    int hashcode = Hashfuntion(str);
    Hashnode *newnode;
    for (newnode = hashtable[hashcode]; newnode != NULL ; ++newnode) {
        if(strcmp(str, newnode->word) == 0){       //查找單詞是否已經在hash表中了
            (newnode->count)++;
            return;
        }
    }
    //如果上面沒返回，也就是說hash表中沒有這個單詞，添加新節點，加入這個單詞
    newnode = new Hashnode;
    newnode->count = 1;

    newnode->word = (char *)malloc(strlen(str) + 1);
    strcpy(newnode->word,str);

    newnode->next = hashtable[hashcode];        //將新生成的節點插入到hashcode為下標的鏈表中去
    hashtable[hashcode] = newnode;
}

unsigned int Hashfuntion(const char *str)
{
    unsigned int index = 0;
    for (; *str != '\0' ; str++) {
        index = index * 31 + *str;
    }
    return index % HASHNUM;
}

void findquery(char *path){
    char ch[50] = {NULL};
    FILE *fp = fopen(path,"r");
    std::ofstream outfile1("E:\\answer.txt");

    while(fgets(ch,sizeof (ch),fp)){                    //把待查關鍵字讀入
        int hashcode2 = Hashfuntion(ch);
        outfile1 << hashtable[hashcode2]->word << "出現了" << hashtable[hashcode2]->count << "次" << std::endl;
    }
    getlinecplusplus("E:\\database.txt",ch);
    fclose(fp);
}

void getlinecplusplus(char *path,char *str){
    std::ifstream readfile(path);                                        //用ifstream讀入txt文件path並與readfile相連
    std::ofstream outfile("E:\\answer.txt",std::ofstream::app);         //用ofstream將答案輸出到answer.txt
    int linecount = 0;
    std::string line;
    std::string word;
    while(std::getline(readfile, line)){
        lines.push_back(line);                                        //將一行放入vector<string>中
        std::istringstream iss(line);                                 //獲取一行
        ++linecount;                                                  //行號加1
        while(iss >> word){  
            wordline[word].insert(linecount);                        //加入詞的行號
        }
    }
    std::set<int>::const_iterator itset;
    std::vector<std::string>::const_iterator itvector;              //新建兩個迭代器

    int i;
    for(itset = (wordline.find(str)->second).begin();itset != (wordline.find(str)->second).end(); itset++)
    {
        outfile << "出現在第 "<< *itset <<"行" << std::endl;
        i = 0;
        for (itvector = lines.begin(); itvector != lines.end(); itvector++) {       //前一句，單詞所在的句子，下一句
            if (*itset == ++i)          //行號與vector迴圈次數匹配上了
            {
                if(itvector != lines.begin()){                                      //第一句不能列印前一句
                    outfile << *(itvector-1) << std::endl;
                }
                if(itvector != lines.end()-1){                                      //最後一句不能列印後一句
                    outfile << *(itvector+1) << std::endl;
                }
                outfile << *itvector << std::endl;                                  //單詞本身句
            }
        }
    }
    outfile.close();
}

int main(){
    SetConsoleOutputCP(65001);              //避免控制臺中文亂碼
    readintohash("E:\\database.txt");      //自行更改路徑
    findquery("E:\\query.txt");
    return 0;
}

您的分享是我們最大的動力!

-Advertisement-

更多相關文章

你是來找茬的吧？對自己的博客進行調優

“作為博客園的使用者而不是開發者，就不能對博客進行調優了？看好了，我只示範一次。” —— 我說的 0x00 大綱 0x01 前言用過很多博客和寫作平臺，但是最終還是選擇了博客園，畢竟，自定義 CSS 和自定義 JS 是真的香！某天突發奇想，決定對自己的博客進行下優化，現將其中的一些心得與大家分享。 ...
物聯網 IOT 設備如何脫離信息孤島？

目前在家庭物聯網這一塊，絕大部分的電子消費品都是基於wifi聯網的設備。從商家那裡達到消費者手中之後，簡單開機使用無法體現其全部價值，還是需要經過消費者給設備配網的過程，把設備從信息孤島接入互聯互通的世界。 ...
後臺開發進階：白話DDD從入門到實踐

DDD（領域驅動設計）是 Eric Evans 於 2003 年提出的解決複雜的中大型軟體的方法，開始一直不慍不火。直到 Martin Fowler 於 2014 年發表的論文《Microservices》引起大家對微服務的關註，DDD 才重新慢慢的回到了大眾的視野中。 DDD 這幾年升溫的同時，... ...
Spring學習筆記 - 第三章 - AOP與Spring事務

原文地址：Spring學習筆記 - 第三章 - AOP與Spring事務 Spring 學習筆記全系列傳送門： Spring學習筆記 - 第一章 - IoC（控制反轉）、IoC容器、Bean的實例化與生命周期、DI（依賴註入） Spring學習筆記 - 第二章 - 註解開發、配置管理第三方Bean、 ...
Spring IOC官方文檔學習筆記（五）之bean的作用域

家居網購項目實現09 以下皆為部分代碼，詳見 https://github.com/liyuelian/furniture_mall.git 21.功能20-修改購物車 21.1需求分析/圖解進入購物車頁面，可以修改購買數量更新該商品的金額更新購物車商品數量和總金額 21.2思路分析 21.3 ...
python學習： fire庫的使用教程

一. 介紹 fire是python中用於生成命令行界面(Command Line Interfaces, CLIs)的工具，不需要做任何額外的工作，只需要從主模塊中調用fire.Fire()，它會自動將你的代碼轉化為CLI，Fire()的參數可以說任何的python對象二. 安裝 pip inst ...
11個案例講透 Python 函數參數

今天給大家分享一下自己整理的一篇 Python 參數的內容，內容非常的乾，全文通過案例的形式來理解知識點，自認為比網上 80% 的文章講的都要明白，如果你是入門不久的 python 新手，相信本篇文章應該對你會有不小的幫助。接下來是正文。 1、參數分類函數，在定義的時候，可以有參數的，也可以沒有 ...
Chapter_2_變數和簡單數據類型

# In[1]# 2.3 字元串name = "ada loveada"print(name.title())print(name.upper())print(name.isupper())name = name.upper()print(name.isupper())print(name.lowe ...