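cnlics
[Share] Protein structure prediction workflow

I will translate this and post it bit by bit. The material here is something I collected a while ago; it should be from EMBL. I have skimmed through it and it is not out of date yet.

The Word document can be downloaded here: http://ifile.it/dwzy278

The general workflow for protein structure prediction is shown in the figure below:
[Figure: flowchart of the protein structure prediction workflow; image not available]

Contents:
• Relevant experimental data
• Sequence data and preliminary analysis
• Searching sequence databases
• Identifying domains
• Multiple sequence alignment
• Comparative or homology modelling
• Secondary structure prediction
• Fold recognition
• Fold analysis and alignment of secondary structures
• Alignment of sequence to structure

[ Last edited by cnlics on 2010-9-16 at 08:24 ]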
Fold recognition methods and links

Links to some fold recognition methods (names only):
• Methods that run over the web:
o 3D-PSSM (this site)
o TOPITS (EMBL)
o UCLA-DOE Structure Prediction Server (UCLA)
o 123D
o UCSC HMM (UCSC)
o FAS (Burnham Institute)
• Methods with executables or code available:
o THREADER (Warwick)
o ProFIT CAME (Salzburg)
• Other related links:
o Protein Structure Prediction Centre (US)
o CASP1
o CASP2
o CASP3
o UCLA-DOE Fold-Recognition Benchmark Home Page

Even when no homologue of known 3D structure exists, fold recognition methods may still find, among the known 3D structures, the fold closest to that of your protein.

3D structural similarities

Ab initio prediction of protein 3D structures is not possible at present, and a general solution to the protein folding problem is not likely to be found in the near future. However, it has long been recognised that proteins often adopt similar folds despite having no significant sequence or functional similarity, and that nature is apparently restricted to a limited number of protein folds. Numerous protein structure classifications are now available via the WWW:
• SCOP (MRC Cambridge)
• CATH (University College, London)
• FSSP (EBI, Cambridge)
• 3Dee (EBI, Cambridge)
• HOMSTRAD (Biochemistry, Cambridge)
• VAST (NCBI, USA)

Thus for many proteins (~70%) there will be a suitable structure in the database from which to build a 3D model. Unfortunately, the lack of sequence similarity means that many of these go undetected until after the 3D structure has been determined.

The goal of fold recognition

Methods of protein fold recognition attempt to detect similarities between protein 3D structures that are not accompanied by any significant sequence similarity. There are many approaches, but the unifying theme is to find folds that are compatible with a particular sequence. Unlike sequence-only comparison, these methods take advantage of the extra information that 3D structures provide.
In effect, they turn the protein folding problem on its head: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.

Some related papers (omitted)

The structure was correctly predicted to adopt a ras-p21 type fold.

The realities of fold recognition

Despite initially promising results, methods of fold recognition are not always accurate. Guides to the accuracy of protein fold recognition can be found in the proceedings of the Critical Assessment of Structure Predictions (CASP) conferences. At the first meeting in 1994 (CASP1), the methods were found to be about 50% accurate at best in placing a correct fold at the top of a ranked list. Though many methods failed to rank the correct fold first, a correct fold was often found among the top 10 scoring folds. Even when the methods were successful, alignments of sequence onto protein 3D structure were usually incorrect, meaning that comparative modelling based on such alignments would be inaccurate.

The CASP2 meeting, held in December 1996, showed that many of the methods had improved, though it is difficult to compare the results of the two assessments (i.e. CASP1 and CASP2) since very different criteria were used to assess correct answers. It would be foolish and over-ambitious for me to present a detailed assessment of the results here. However, an important thing to note is that Murzin & Bateman managed to attain near 100% success through careful human insight, a knowledge of known structures, secondary structure predictions and thoughts about the function of the target sequences. Their results strongly support the arguments given below that human insight can be a powerful aid during fold recognition. A summary of the results from this meeting can be found in the PROTEINS issue dedicated to the meeting (PROTEINS, Suppl 1, 1997).

The CASP3 meeting was held in December 1998.
It showed some progress in the ability of fold recognition methods to detect correct protein folds and in the quality of the alignments obtained. A detailed summary of the results will appear towards the end of 1999 in the PROTEINS supplement.

For my talk, I did a crude assessment of 5 methods of fold recognition. I took 12 proteins of known structure (3 from each folding class) and ran each of the five methods using default parameters. I then asked how often a correct fold (not counting trivial folds detectable by sequence alone) was found in the first rank, or in the top 10 scoring folds. I also asked how often the method found the correct folding class in the first rank. The results are summarised here in a PostScript file.

Perhaps the worst result from this study was this: one method suggested that the sequence of the probe (a four-helix bundle) would best fit onto an OB fold (comprising a six-stranded barrel).

The results suggest that one should use caution when using these methods. In spite of this, the methods remain very useful.

A practical approach

Although they are not 100% accurate, the methods are worth using. I would suggest the following:
• Run as many methods as you can, and run each method on as many sequences (from your homologous protein family) as you can. The methods almost always give somewhat different answers with the same sequences. I have also found that a single method will often give different results for sets of homologous sequences, so I would also suggest running each method on as many homologues as possible. After all of these runs, one can build up a consensus picture of the likely fold, in a manner similar to that used for secondary structure prediction.
• Remember the expected accuracy of the methods, and don't use them as black boxes. Remember that a correct fold may not be at the top of the list, but that it is likely to be in the top 10 scoring folds.
• Think about the function of your protein, and look into the function of the proteins that have been found by the various methods. If you see a functional similarity, then you may have detected a weak sequence homologue, or remote homologue. At CASP2, as said above, Murzin & Bateman managed to obtain remarkably accurate predictions by identifying remote homologues. Their paper appeared in the PROTEINS supplement for the CASP2 experiment: Murzin AG, Bateman A (1997) Distant homology recognition using structural classification of proteins. Proteins, Suppl 1, 105-112. It provides some key insights into protein fold recognition using humans rather than computers.
• Don't trust the alignments that are output by the programs. They can be used as a starting point, but the best alignment of sequence onto tertiary structure is still likely to come from careful human intervention. One strategy for doing this is discussed in the next section.

[ Last edited by cnlics on 2010-9-19 at 16:59 ]
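The "consensus picture" advice in the fold recognition post can be sketched in a few lines of Python. This is only an illustration, not part of the original guide: the function name `consensus_folds`, the input layout (one ranked list of fold names per method and sequence) and the example data are all made up here. It simply counts how many runs place each fold in their top 10, and how many rank it first.

```python
from collections import Counter

def consensus_folds(runs, top_n=10):
    """Tally how often each fold appears in the top_n of each run.

    runs maps (method, sequence_id) -> ranked list of fold names, best first.
    Returns (fold, runs_in_top_n, runs_ranked_first) tuples, sorted by the
    two counts (descending) and then by fold name.
    """
    votes = Counter()
    firsts = Counter()
    for ranked in runs.values():
        top = ranked[:top_n]
        # Count each fold once per run, even if several templates share it.
        for fold in set(top):
            votes[fold] += 1
        if top:
            firsts[top[0]] += 1
    return sorted(
        ((fold, votes[fold], firsts[fold]) for fold in votes),
        key=lambda row: (-row[1], -row[2], row[0]),
    )

# Hypothetical, made-up results for illustration only.
example_runs = {
    ("method_A", "seq1"): ["TIM barrel", "Rossmann fold", "OB fold"],
    ("method_A", "seq2"): ["Rossmann fold", "TIM barrel"],
    ("method_B", "seq1"): ["Rossmann fold", "four-helix bundle"],
    ("method_B", "seq2"): ["four-helix bundle", "Rossmann fold"],
}

if __name__ == "__main__":
    for fold, n_runs, n_first in consensus_folds(example_runs):
        print(f"{fold}: in the top list of {n_runs} runs, ranked first in {n_first}")
```

A fold that keeps turning up across methods and homologues is a better candidate than one that a single run happens to rank first, but the tally is only a starting point for the functional checks described in the post.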
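Experimental data

Much experimental data can aid the structure prediction process, including:
• Disulphide bonds, which fix the spatial positions of cysteines
• Spectroscopic data, which can give the secondary structure content of the protein
• Site-directed mutagenesis studies, which can reveal residues in active or binding sites
• Protease cleavage sites and post-translational modifications such as phosphorylation or glycosylation, which indicate residues that must be exposed
• Others

Keep all of these data in mind when predicting. Always ask: is the prediction consistent with the experimental results? If not, the approach needs to be changed.

[ Last edited by cnlics on 2010-9-14 at 19:31 ]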
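As a rough illustration of the "is the prediction consistent with experiment?" question, here is a small Python sketch. Everything in it is an assumption made for the example: the function name `check_consistency`, the 'b'/'e' burial string format, and the toy sequence. It only checks two of the data types listed above (disulphide partners must be cysteines; modified or cleaved residues should not be predicted as buried).

```python
def check_consistency(sequence, disulphides, exposed_sites, predicted_burial):
    """Return a list of conflicts between experimental data and a prediction.

    sequence          one-letter amino acid string
    disulphides       list of (i, j) 1-based residue pairs reported as bonded
    exposed_sites     1-based positions that experiment says must be exposed
    predicted_burial  string of 'b' (buried) / 'e' (exposed), same length
    """
    if len(predicted_burial) != len(sequence):
        raise ValueError("prediction and sequence differ in length")
    problems = []
    for i, j in disulphides:
        for pos in (i, j):
            residue = sequence[pos - 1]
            if residue != "C":
                problems.append(f"disulphide partner {pos} is {residue}, not C")
    for pos in exposed_sites:
        if predicted_burial[pos - 1] == "b":
            problems.append(
                f"residue {sequence[pos - 1]}{pos} should be exposed "
                "but is predicted buried"
            )
    return problems

# Hypothetical toy data: a disulphide between Cys2 and Cys7, and a
# phosphorylation site at Ser4 that the prediction marks as buried.
toy_sequence = "MCASTKCLS"
toy_burial = "eebbeeebe"

if __name__ == "__main__":
    for line in check_consistency(toy_sequence, [(2, 7)], [4], toy_burial):
        print(line)
```

Any conflict reported this way is a reason to revisit the prediction, not proof that the experiment is wrong.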
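Protein sequence data

Some preliminary analysis of the protein sequence is worthwhile. For example, if the protein comes directly from a gene prediction, it may contain multiple domains. More seriously, it may contain regions that are unlikely to be globular or soluble. This flowchart assumes that your protein is soluble, is probably a single domain, and contains no non-globular regions.

Consider the following:
• Is it a transmembrane protein, or does it contain transmembrane segments? There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Does it contain coiled coils? You can predict coiled coils at the COILS server or download the COILS program (recently rewritten; note that the GCG package includes a version of COILS).
• Does it contain low-complexity regions? Proteins often contain runs of polyglutamine or polyserine, which do not predict well. Check with SEG (the GCG package includes a version of SEG).

If any of the above applies, you should split the sequence into fragments, ignore particular sections of it, and so on. This problem is closely related to the problem of locating domains.

[ Last edited by cnlics on 2010-9-16 at 08:25 ]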
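To give a feel for what a low-complexity check does, here is a minimal Python sketch that flags windows whose residue composition has low Shannon entropy. It is not the SEG algorithm, only a simplified stand-in; the function names, the window length of 12 and the threshold of 2.2 bits are assumptions chosen for illustration, and the example sequence is made up.

```python
import math
from collections import Counter

def window_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def low_complexity_regions(sequence, window=12, threshold=2.2):
    """Return merged 0-based, half-open (start, end) spans of low entropy.

    A window is flagged when its entropy is <= threshold; overlapping or
    touching flagged windows are merged into one span.
    """
    spans = []
    for start in range(len(sequence) - window + 1):
        if window_entropy(sequence[start:start + window]) <= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans

# Made-up example: a 15-residue polyglutamine run between two mixed flanks.
example = "MKVLAAGIDE" + "Q" * 15 + "RSTNWYFHCP"

if __name__ == "__main__":
    for start, end in low_complexity_regions(example):
        print(f"low complexity: residues {start + 1}-{end}: {example[start:end]}")
```

Spans reported this way are candidates for masking or for cutting the sequence into pieces before running the later steps; for real work use SEG itself.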
Searching sequence databases

The obvious first step in analysing any new sequence is to search sequence databases for homologues. Such searches can be done anywhere and on almost any computer. In addition, many web servers run such searches: you type or paste in your sequence and receive the results interactively.

There are also many methods for sequence searching; the best known at present is BLAST. Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against databases of gene or protein sequences. To name just a few:
• National Center for Biotechnology Information (USA) Searches
• European Bioinformatics Institute (UK) Searches
• BLAST search through SBASE (domain database; ICGEB, Trieste)
• and many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST). Both make BLAST more sensitive. PSI-BLAST takes the results of a search, builds a profile from them, and then searches the database again with that profile to find further homologues (the process can be repeated until no new sequences are found), which lets it detect very distant homologues. It is very important to compare your protein sequence against the databases with PSI-BLAST, to check whether a known structure exists, before moving on to the methods in the following sections.

Other methods for comparing a single sequence to a database include:
• FASTA package (William Pearson, University of Virginia, USA)
• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
• BLITZ (Compugen's fast Smith Waterman search)
• and others

It is also possible to use multiple sequence information to perform more sensitive searches. Essentially this involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
• HMMER hidden Markov models (Sean Eddy, Washington University)
• Wise package (Ewan Birney, Sanger Centre; for comparing proteins to DNA)
• and others

A different approach to incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family.
Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by any amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine,... [etc.]".

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
• ExPASy
• EBI

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand-edited by experts on the particular protein families, and thus probably represent the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
• SMART (Oxford/EMBL)
• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Centre, Seattle)
• SBASE (ICGEB, Trieste)

A protein sequence can generally be compared against these databases in a variety of ways, and these comparisons are very useful for identifying domains.

[ Last edited by cnlics on 2010-9-14 at 19:54 ]
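The way the signature "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" is read out in words maps almost directly onto a regular expression. The Python sketch below shows one way to do that conversion. The function name `prosite_to_regex` is made up for this example, it covers only the basic pattern elements (single residues, x, bracketed sets, braced exclusions and simple repeats), and the test sequence is invented; it is not a replacement for the PROSITE search tools at ExPASy or EBI.

```python
import re

# One pattern element: x, a residue, [allowed set] or {forbidden set},
# optionally followed by a repeat such as (5) or (2,4).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regex string.

    Anchors (< and >) and other extensions are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # anything except these residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

signature = "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]"
regex = prosite_to_regex(signature)

if __name__ == "__main__":
    # Invented sequence containing one match to the signature.
    sequence = "MKHFALAGAAAAALHAAADGS"
    hit = re.search(regex, sequence)
    print(regex)
    print(hit.group(0) if hit else "no match")
```

Because a motif only looks at the few conserved positions, a match is a hint to follow up with the profile and alignment methods above, not evidence of homology by itself.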