Notes: Bioinformatics Algorithms, Chapter 2


Chapter 2: WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS?

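Motif finding

I.

Transcription factors bind specific sequences upstream of genes and regulate their transcription, but the bound sequence differs slightly from one occurrence to the next. This chapter uses greedy and randomized algorithms to find that sequence. This task is called motif finding.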

（NF-κB binding sites from the Drosophila melanogaster genome）

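II. Some concepts:

1. The meanings of Score and Profile are shown in the figure

Given a profile matrix, the probability of a k-mer under that profile is the product of the profile entries for its letter at each position.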
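A minimal Python sketch of these concepts, assuming the usual definitions: Profile holds the per-column nucleotide frequencies of the motif matrix, and Score counts the letters that disagree with the most common letter in their column. The function names and the four example motifs are illustrative, not taken from the book.

```python
NUCLEOTIDES = "ACGT"

def count_matrix(motifs):
    """Per-column nucleotide counts, e.g. {'A': [3, 0, ...], ...}."""
    k = len(motifs[0])
    counts = {n: [0] * k for n in NUCLEOTIDES}
    for motif in motifs:
        for j, n in enumerate(motif):
            counts[n][j] += 1
    return counts

def profile_matrix(motifs):
    """Per-column nucleotide frequencies (counts divided by the number of motifs)."""
    t = len(motifs)
    counts = count_matrix(motifs)
    return {n: [c / t for c in counts[n]] for n in NUCLEOTIDES}

def consensus(motifs):
    """Most common letter in each column (ties go to the first of A, C, G, T)."""
    counts = count_matrix(motifs)
    k = len(motifs[0])
    return "".join(max(NUCLEOTIDES, key=lambda n: counts[n][j]) for j in range(k))

def score(motifs):
    """Number of letters that differ from the most common letter of their column."""
    counts = count_matrix(motifs)
    t, k = len(motifs), len(motifs[0])
    return sum(t - max(counts[n][j] for n in NUCLEOTIDES) for j in range(k))

def probability(kmer, profile):
    """Product of the profile entries selected by each letter of the k-mer."""
    p = 1.0
    for j, n in enumerate(kmer):
        p *= profile[n][j]
    return p

# Illustrative motifs (hypothetical data).
motifs = ["ACGT", "ACGA", "ACTT", "TCGT"]
profile = profile_matrix(motifs)
print(consensus(motifs))             # ACGT
print(score(motifs))                 # 3
print(probability("ACGT", profile))  # 0.421875
print(probability("GCGT", profile))  # 0.0, because G never occurs in column 1
```

The last line shows a weakness of the raw profile: a single unseen letter drives the whole probability to zero.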

III.

Problem statement: the Motif Finding Problem:

Given a collection of strings, find a set of k-mers, one from each string, that minimizes the score of the resulting motif.

Input: A collection of strings Dna and an integer k.

Output: A collection Motifs of k-mers, one from each string in Dna, minimizing Score(Motifs) among all possible choices of k-mers.

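In other words: from a set of sequences, pick one k-mer per sequence so that the Score of the chosen k-mers is minimal (equivalently, so that the sum of their Hamming distances to the consensus sequence is minimal).

1. Exhaustive enumeration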

MedianString(Dna, k)
        distance ← ∞
        for each k-mer Pattern from AA…AA to TT…TT
            if distance > d(Pattern, Dna)
                 distance ← d(Pattern, Dna)
                 Median ← Pattern
        return Median
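The pseudocode above can be sketched in Python as follows. The sketch assumes that d(Pattern, Dna) means the sum, over all strings in Dna, of the smallest Hamming distance between Pattern and any k-mer of that string. The helper names and the three example strings are illustrative.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def distance_to_string(pattern, text):
    """Smallest Hamming distance between pattern and any k-mer of text."""
    k = len(pattern)
    return min(hamming(pattern, text[i:i + k]) for i in range(len(text) - k + 1))

def d(pattern, dna):
    """Sum of distance_to_string over every string in dna."""
    return sum(distance_to_string(pattern, text) for text in dna)

def median_string(dna, k):
    """Try all 4**k patterns from AA...A to TT...T and keep the closest one."""
    best_distance = float("inf")
    median = None
    for letters in product("ACGT", repeat=k):
        pattern = "".join(letters)
        dist = d(pattern, dna)
        if dist < best_distance:
            best_distance = dist
            median = pattern
    return median

# Illustrative input: ACG is the only 3-mer shared by all three strings.
dna = ["TTACGTT", "GGACGGG", "CCACGCC"]
print(median_string(dna, 3))  # ACG
```

The loop visits 4^k patterns, so this approach is only practical for small k.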

2. Greedy approach: GreedyMotifSearch

GREEDYMOTIFSEARCH(Dna, k, t)
        BestMotifs ← motif matrix formed by first k-mers in each string
                      from Dna
        for each k-mer Motif in the first string from Dna
            Motif1 ← Motif
            for i = 2 to t
                form Profile from motifs Motif1, …, Motifi - 1
                Motifi ← Profile-most probable k-mer in the i-th string
                          in Dna
            Motifs ← (Motif1, …, Motift)
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        output BestMotifs

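Detailed walkthrough: http://www.mrgraeme.co.uk/greedy-motif-search/

* Greedy approach: GreedyMotifSearch with pseudocounts

pseudocounts: when building the profile matrix, add a small value to every count (Laplace's Rule of Succession adds 1) so that no entry of the profile is 0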
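To see what pseudocounts change, here is a small sketch that builds the profile with a configurable pseudocount. Passing 0 reproduces the plain profile, and passing 1 corresponds to Laplace's Rule of Succession. The function names, the `pseudocount` parameter, and the example motifs are illustrative assumptions.

```python
NUCLEOTIDES = "ACGT"

def profile_with_pseudocounts(motifs, pseudocount=1):
    """Column frequencies after adding `pseudocount` to every count."""
    t, k = len(motifs), len(motifs[0])
    counts = {n: [pseudocount] * k for n in NUCLEOTIDES}
    for motif in motifs:
        for j, n in enumerate(motif):
            counts[n][j] += 1
    total = t + 4 * pseudocount  # each column now sums to t + 4 * pseudocount
    return {n: [c / total for c in counts[n]] for n in NUCLEOTIDES}

def probability(kmer, profile):
    """Product of the profile entries selected by each letter of the k-mer."""
    p = 1.0
    for j, n in enumerate(kmer):
        p *= profile[n][j]
    return p

motifs = ["ACGT", "ACGA", "ACTT", "TCGT"]  # hypothetical data
plain = profile_with_pseudocounts(motifs, pseudocount=0)
laplace = profile_with_pseudocounts(motifs, pseudocount=1)
print(probability("GCGT", plain))    # 0.0
print(probability("GCGT", laplace))  # 0.01953125
```

Without pseudocounts, GCGT is ruled out entirely because of one mismatch in the first column. With them it keeps a small but nonzero probability.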

GreedyMotifSearch(Dna, k, t)
        form a set of k-mers BestMotifs by selecting 1st k-mers in each string from Dna
        for each k-mer Motif in the first string from Dna
            Motif1 ← Motif
            for i = 2 to t
                apply Laplace's Rule of Succession to form Profile from motifs Motif1, …, Motifi-1
                Motifi ← Profile-most probable k-mer in the i-th string in Dna
            Motifs ← (Motif1, …, Motift)
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        output BestMotifs
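Both greedy variants can be sketched with one Python function. This is a sketch under stated assumptions, not a reference implementation: the `pseudocount` parameter is an addition (0 gives the first version, 1 the Laplace version), ties for the most probable k-mer go to the leftmost one, and the example strings are made up.

```python
NUCLEOTIDES = "ACGT"

def build_profile(motifs, pseudocount=1):
    """Column frequencies of the motifs, with `pseudocount` added to every count."""
    k = len(motifs[0])
    counts = {n: [pseudocount] * k for n in NUCLEOTIDES}
    for motif in motifs:
        for j, n in enumerate(motif):
            counts[n][j] += 1
    total = len(motifs) + 4 * pseudocount
    return {n: [c / total for c in counts[n]] for n in NUCLEOTIDES}

def probability(kmer, profile):
    p = 1.0
    for j, n in enumerate(kmer):
        p *= profile[n][j]
    return p

def most_probable_kmer(text, k, profile):
    """Profile-most probable k-mer in text; the leftmost one wins ties."""
    best, best_p = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        p = probability(kmer, profile)
        if p > best_p:
            best, best_p = kmer, p
    return best

def score(motifs):
    """Number of letters that differ from the most common letter of their column."""
    total = 0
    for j in range(len(motifs[0])):
        column = [m[j] for m in motifs]
        total += len(column) - max(column.count(n) for n in NUCLEOTIDES)
    return total

def greedy_motif_search(dna, k, t, pseudocount=1):
    best_motifs = [text[:k] for text in dna]  # first k-mer of every string
    first = dna[0]
    for i in range(len(first) - k + 1):
        motifs = [first[i:i + k]]  # Motif1
        for j in range(1, t):
            # Profile from Motif1..Motif(j), then extend with the next string
            profile = build_profile(motifs, pseudocount)
            motifs.append(most_probable_kmer(dna[j], k, profile))
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs

dna = ["TTACGTT", "GGACGGG", "CCACGCC"]  # hypothetical data
print(greedy_motif_search(dna, 3, 3))                 # ['ACG', 'ACG', 'ACG']
print(greedy_motif_search(dna, 3, 3, pseudocount=0))  # ['ACG', 'ACG', 'ACG']
```

On this tiny input both variants agree. On realistic data the version without pseudocounts tends to do worse, because early zero entries in the profile make every k-mer in later strings score 0, so the leftmost k-mer is picked by default.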

3. Randomized approach: Randomized Motif Search

RandomizedMotifSearch(Dna, k, t)
        # randomly pick one k-mer from each string in Dna to form the initial Motifs
        randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
        BestMotifs ← Motifs
        while forever
            Profile ← Profile(Motifs)  # build the profile matrix from the current Motifs
            Motifs ← Motifs(Profile, Dna)  # take the Profile-most probable k-mer from each string in Dna
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
            else
                return BestMotifs
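A Python sketch of the pseudocode above. Two things here are assumptions that the notes do not state: the profile is built with pseudocounts of 1, and the search is restarted several times with the best result kept (the `repeated_search` wrapper, its `runs` and `seed` parameters, and the example strings are illustrative).

```python
import random

NUCLEOTIDES = "ACGT"

def build_profile(motifs, pseudocount=1):
    k = len(motifs[0])
    counts = {n: [pseudocount] * k for n in NUCLEOTIDES}
    for motif in motifs:
        for j, n in enumerate(motif):
            counts[n][j] += 1
    total = len(motifs) + 4 * pseudocount
    return {n: [c / total for c in counts[n]] for n in NUCLEOTIDES}

def probability(kmer, profile):
    p = 1.0
    for j, n in enumerate(kmer):
        p *= profile[n][j]
    return p

def most_probable_kmer(text, k, profile):
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    return max(kmers, key=lambda kmer: probability(kmer, profile))

def score(motifs):
    total = 0
    for j in range(len(motifs[0])):
        column = [m[j] for m in motifs]
        total += len(column) - max(column.count(n) for n in NUCLEOTIDES)
    return total

def randomized_motif_search(dna, k, rng):
    # Random starting k-mer in every string.
    motifs = []
    for text in dna:
        start = rng.randrange(len(text) - k + 1)
        motifs.append(text[start:start + k])
    best_motifs = motifs
    while True:
        profile = build_profile(motifs)
        motifs = [most_probable_kmer(text, k, profile) for text in dna]
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
        else:
            return best_motifs  # the score stopped improving

def repeated_search(dna, k, runs=100, seed=0):
    """Restart the search `runs` times and keep the lowest-scoring result."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        motifs = randomized_motif_search(dna, k, rng)
        if best is None or score(motifs) < score(best):
            best = motifs
    return best

dna = ["TTACGTT", "GGACGGG", "CCACGCC"]  # hypothetical data
result = repeated_search(dna, 3)
print(result, score(result))
```

The loop always terminates because the score is a non-negative integer that must strictly decrease for the loop to continue. A single run can stop at a poor local optimum, which is why restarts are commonly used.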

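The randomized algorithm works because a randomly chosen set of Motifs may happen to include a k-mer that is close to the true motif. The profile built from that set is then biased toward the true motif, and later iterations follow that bias until a better solution is reached.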

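Improvement: the previous algorithm can replace every k-mer in Motifs in a single iteration, which may throw away k-mers that were already correct. The improved method (Gibbs sampling) changes only one randomly chosen k-mer per iteration.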

GibbsSampler(Dna, k, t, N)
        randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
        BestMotifs ← Motifs
        for j ← 1 to N
            i ← Random(t)
            Profile ← profile matrix constructed from all strings in Motifs except for Motif[i]
            Motif[i] ← Profile-randomly generated k-mer in the i-th sequence
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        return BestMotifs
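A Python sketch of GibbsSampler, under these assumptions: the profile uses pseudocounts of 1 (so every sampling weight is positive), t is at least 2, and a "Profile-randomly generated k-mer" means a k-mer drawn from the i-th string with probability proportional to its probability under the profile. The helper names and the example strings are illustrative.

```python
import random

NUCLEOTIDES = "ACGT"

def build_profile(motifs, pseudocount=1):
    k = len(motifs[0])
    counts = {n: [pseudocount] * k for n in NUCLEOTIDES}
    for motif in motifs:
        for j, n in enumerate(motif):
            counts[n][j] += 1
    total = len(motifs) + 4 * pseudocount
    return {n: [c / total for c in counts[n]] for n in NUCLEOTIDES}

def probability(kmer, profile):
    p = 1.0
    for j, n in enumerate(kmer):
        p *= profile[n][j]
    return p

def score(motifs):
    total = 0
    for j in range(len(motifs[0])):
        column = [m[j] for m in motifs]
        total += len(column) - max(column.count(n) for n in NUCLEOTIDES)
    return total

def weighted_random_kmer(text, k, profile, rng):
    """Draw one k-mer of text, weighted by its probability under the profile."""
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    weights = [probability(kmer, profile) for kmer in kmers]
    return rng.choices(kmers, weights=weights, k=1)[0]

def gibbs_sampler(dna, k, t, n, rng):
    motifs = []
    for text in dna:
        start = rng.randrange(len(text) - k + 1)
        motifs.append(text[start:start + k])
    best_motifs = list(motifs)
    for _ in range(n):
        i = rng.randrange(t)                 # row to resample
        others = motifs[:i] + motifs[i + 1:]  # all motifs except Motif[i]
        profile = build_profile(others)
        motifs[i] = weighted_random_kmer(dna[i], k, profile, rng)
        if score(motifs) < score(best_motifs):
            best_motifs = list(motifs)       # copy: motifs is mutated in place
    return best_motifs

dna = ["TTACGTT", "GGACGGG", "CCACGCC"]  # hypothetical data
result = gibbs_sampler(dna, 3, len(dna), 200, random.Random(0))
print(result, score(result))
```

Because the new k-mer is sampled rather than chosen as the most probable one, the sampler can occasionally accept a worse set of motifs, which helps it escape local optima. As with the randomized search, running it from several random starts and keeping the best result is a common practice.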
