閱讀(18.6k) 書簽贊(0) 我要糾錯

字符串

2018-02-24 15:53 更新

字符串

程序1：循環(huán)輸入并將每個單詞插入集合S(忽略重復單詞)，然后排序輸出。

int main(void) {
    set<string> s;
    set<string>::iterator j;
    string t;
    while(cin >> t)
        s.insert(t);
    for(j=s.begin(); j!=s.end(); ++j)
        cout<<*j<<endl;
    return 0;
}

程序2：單詞計數(shù)

int main(void) {
    map<string, int> m;
    map<string, int>::iterator j;
    string t;
    while(cin >> t)
        m[t]++;
    for(j=m.begin(); j!=m.end(); ++j)
        cout<<j->first<<" "<<j->second<<endl;
    return 0;
}

程序3：建立自己的哈希表(散列表)，以下是一種實現(xiàn)：

class Hash {
public:
    Hash(): seed_(131), size_(0) {
        memset(head_, 0, sizeof(head_));
    }

    void Insert(const char* str) {
        unsigned int id = hash(str);
        char *dst = (char*)node_[size_].word;
        while(*dst++ = *str++);
        node_[size_].next = head_[id];
        head_[id] = &node_[size_];
        ++size_;
    }

    bool Find(const char* str) {
        unsigned int id = hash(str);
        for(Node* p=head_[id]; p; p=p->next) {
            char* dst = (char*)p->word;
            int i = 0;
            for(; *(str+i) && *(str+i)==*(dst+i); ++i);
            if(!*(str+i) && !*(dst+i)) return true;
        }
        return false;
    }

private:
    unsigned int hash(const char* str) {// BKDR Hash Function
        unsigned int hash = 0;
        while(*str) {
            hash = hash * seed_ + (*str++);
        }
        return (hash & 0x7FFFFFFF) % kHashSize;
    }

private:
    unsigned int seed_;
    unsigned int size_;
    static const int kWordSize = 26 + 1;
    static const int kNodeSize = 20000;
    static const int kHashSize = 10001;
    struct Node {
        char word[kWordSize];
        Node *next;
    };
    Node node_[kNodeSize];
    Node* head_[kHashSize];
};

后綴數(shù)組

假設我們有以下字符串及一個char*數(shù)組：

 char c[20] = "hawstein";
 char* pc[20];

我們讓指針pc[i]指向字符串的第i個字符，即：

for(int i=0; i<8; ++i)
    pc[i] = &c[i];

這時候我們輸出pc[i]，會得到字符串"hawstein"的所有后綴：

hawstein
awstein
wstein
stein
tein
ein
in
n

然后，我們對數(shù)組pc進行排序，將所有后綴按字典序排序：

sort(pc, pc+8, cmp);

其中，比較函數(shù)cmp如下：

inline bool cmp(char* p, char*q) {
    return strcmp(p, q) < 0;
}

這時，我們再輸出pc[i]，會得到排序后的結果：

awstein
ein
hawstein
in
n
stein
tein
wstein

我們把數(shù)組pc稱為“后綴數(shù)組”。這里需要注意，數(shù)組pc 中存儲的是指向每個后綴首字符的地址。我們也可以存儲每個后綴首字符在原數(shù)組中的下標，效果是一樣的。

本章中用后綴數(shù)組解決了一個小問題：可重疊最長重復子串。比如對于字符串"banana"，其后綴數(shù)組為：

a
ana
anana
banana
na
nana

掃描一次數(shù)組，比較相鄰子串，找到相鄰子串的最長公共前綴即可。本例為"ana"，其中一個a是重疊的。

后綴數(shù)組是處理字符串的有力工具，常見的兩種實現(xiàn)方法是：倍增算法和DC3算法。推薦閱讀以下材料來學習后綴數(shù)組：

許智磊，《后綴數(shù)組》

羅穗騫，《后綴數(shù)組——處理字符串的有力工具》

以上內容是否對您有幫助：

← 堆

寫筆記

我要補充