Chapter2 WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS Chapter2 WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS Chapter2 WHICH DNA PATTERN ...
Chapter2 WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS
尋找模序
一、
轉錄因數會結合基因上游的特定序列,調控基因的轉錄表達,但是在不同個體中,這個序列會有一些差別。本章講述用貪婪、隨機演算法來尋找這個序列:尋找模序。
二、一些概念:
1. Score、Profile 的含義如圖
根據profile matrix 可以計算出某個kmer在某一profile下的概率
三、
提出問題:Motif Finding Problem:
Given a collection of strings, find a set of k-mers, one from each string, that minimizes the score of the resulting motif.
Input: A collection of strings Dna and an integer k.
Output: A collection Motifs of k-mers, one from each string in Dna, minimizing Score(Motifs) among all possible choices of k-mers.
一組序列中,尋找一組k-mer,它們的Score是最低的(或者與consensus sequence的海明距離之和最小)
1 遍歷
MedianString(Dna, k) distance ← ∞ for each k-mer Pattern from AA…AA to TT…TT if distance > d(Pattern, Dna) distance ← d(Pattern, Dna) Median ← Pattern return Median
2 貪婪法 GreedyMotifSearch
GREEDYMOTIFSEARCH(Dna, k, t) BestMotifs ← motif matrix formed by first k-mers in each string from Dna for each k-mer Motif in the first string from Dna Motif1 ← Motif for i = 2 to t form Profile from motifs Motif1, …, Motifi - 1 Motifi ← Profile-most probable k-mer in the i-th string in Dna Motifs ← (Motif1, …, Motift) if Score(Motifs) < Score(BestMotifs) BestMotifs ← Motifs output BestMotifs
詳解 http://www.mrgraeme.co.uk/greedy-motif-search/
*貪婪法 GreedyMotifSearch with pseudocounts
pseudocounts:在形成profile matrix時,給0項設為一個較小的值
GreedyMotifSearch(Dna, k, t)
form a set of k-mers BestMotifs by selecting 1st k-mers in each string from Dna
for each k-mer Motif in the first string from Dna
Motif1 ← Motif
for i = 2 to t
apply Laplace's Rule of Succession to form Profile from motifs Motif1, …, Motifi-1
Motifi ← Profile-most probable k-mer in the i-th string in Dna
Motifs ← (Motif1, …, Motift)
if Score(Motifs) < Score(BestMotifs)
BestMotifs ← Motifs
output BestMotifs
3. 隨機法Randomized Motif Search
RandomizedMotifSearch(Dna, k, t)
#隨機從每個DNA取k-mer,生成一組motifs randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna BestMotifs ← Motifs while forever Profile ← Profile(Motifs)#根據motifs形成Profile矩陣 Motifs ← Motifs(Profile, Dna) #根據profile矩陣從一組DNA生成一組幾率最大的motifs if Score(Motifs) < Score(BestMotifs) BestMotifs ← Motifs else return BestMotifs
隨機演算法起到作用的原因是,隨機選取的一組Motifs,有可能選到潛在正確的一個k-mer,那麼就在這中形成傾斜,直至尋找到較優解
改進: 上一個演算法,每次迭代都重新隨機生成一組新的Motifs,這可能把潛在的正確模序拋棄了,改進的方法是每次隨機只更改一行k-mer
GibbsSampler(Dna, k, t, N) randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna BestMotifs ← Motifs for j ← 1 to N i ← Random(t) Profile ← profile matrix constructed from all strings in Motifs except for Motif[i] Motif[i] ← Profile-randomly generated k-mer in the i-th sequence if Score(Motifs) < Score(BestMotifs) BestMotifs ← Motifs return BestMotifs