cnlics
[Discussion] [Share] Protein structure prediction workflow
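I will translate this gradually and post it piece by piece. The material posted here was collected some time ago and most likely comes from EMBL; I have skimmed through it and it is not yet out of date. The Word document can be downloaded here: http://ifile.it/dwzy278

The general workflow for protein structure prediction is shown in the figure below:
[Figure: general workflow for protein structure prediction]

Contents:
• Relevant experimental data
• Sequence data and preliminary analysis
• Searching sequence databases
• Identifying domains
• Multiple sequence alignment
• Comparative or homology modelling
• Secondary structure prediction
• Fold recognition
• Fold analysis and secondary structure alignment
• Sequence-to-structure alignment

[ Last edited by cnlics on 2010-9-16 at 08:24 ]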
Secondary structure prediction methods and links

There are many web servers for structure prediction; here is a brief summary:
• PSI-pred (PSI-BLAST profiles used for prediction; David Jones, Warwick)
• JPRED Consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)
• DSC King & Sternberg (this server)
• PREDATOR Frishman & Argos (EMBL)
• PHD home page Rost & Sander, EMBL, Germany
• ZPRED server Zvelebil et al., Ludwig, U.K.
• nnPredict Cohen et al., UCSF, USA
• BMERC PSA Server Boston University, USA
• SSP (Nearest-neighbor) Solovyev and Salamov, Baylor College, USA

With no homologue of known structure from which to make a 3D model, a logical next step is to predict secondary structure. Although the methods differ, the aim of secondary structure prediction is to locate the alpha helices and beta strands within a protein or protein family.

Single sequence methods

Secondary structure prediction has been around for about a quarter of a century. The early methods suffered from a lack of data: predictions were made on single sequences rather than on families of homologous sequences, and there were relatively few known 3D structures from which to derive parameters. The best known early methods are those of Chou & Fasman, Garnier, Osguthorpe & Robson (GOR) and Lim. Although the authors originally claimed quite high accuracies (70-80 %), under careful examination the methods were shown to be only 56 to 60 % accurate (Kabsch & Sander, 1983, see below). An early problem in secondary structure prediction had been the inclusion of structures used to derive parameters in the set of structures used to assess the accuracy of the method.

Some good references on the subject:
• Early methods on single sequences
o Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
o Lim, V.I. (1974). Journal of Molecular Biology, 88, 857-872.
o Garnier, J., Osguthorpe, D.J. & Robson, B. (1978). Journal of Molecular Biology, 120, 97-120.
o Kabsch, W. & Sander, C. (1983). FEBS Letters, 155, 179-182. (An assessment of the above methods)
• Later methods on single sequences
o Deleage, G. & Roux, B. (1987). Protein Engineering, 1, 289-294. (DPM)
o Presnell, S.R., Cohen, B.I. & Cohen, F.E. (1992). Biochemistry, 31, 983-993.
o Holley, H.L. & Karplus, M. (1989). Proceedings of the National Academy of Sciences, 86, 152-156.
o King, R. & Sternberg, M.J.E. (1990). Journal of Molecular Biology, 216, 441-457.
o Kneller, D.G., Cohen, F.E. & Langridge, R. (1990). Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network, Journal of Molecular Biology, 214, 171-182. (NNPRED)

Recent improvements

The availability of large families of homologous sequences revolutionised secondary structure prediction. Traditional methods, when applied to a family of proteins rather than a single sequence, proved much more accurate at identifying core secondary structure elements. The combination of sequence data with sophisticated computing techniques such as neural networks has led to accuracies well in excess of 70 %. Though this seems a small percentage increase, these predictions are actually much more useful than those for a single sequence, since they tend to predict the core accurately. Moreover, the limit of 70-80 % may be a function of secondary structure variation within homologous proteins.

Automated methods

There are numerous automated methods for predicting secondary structure from multiply aligned protein sequences. Some good references on the subject include (the acronyms in parentheses after each reference refer to the associated WWW servers):
• Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987). Prediction of Protein Secondary Structure and Active Sites Using the Alignment of Homologous Sequences, Journal of Molecular Biology, 195, 957-961. (ZPRED)
• Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70 % accuracy, Journal of Molecular Biology, 232, 584-599. (PHD)
• Salamov, A.A. & Solovyev, V.V. (1995). Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments, Journal of Molecular Biology, 247, 1 (NNSSP)
• Geourjon, C. & Deleage, G. (1994). SOPM: a self optimised prediction method for protein secondary structure prediction, Protein Engineering, 7, 157-16. (SOPMA)
• Solovyev, V.V. & Salamov, A.A. (1994). Predicting alpha-helix and beta-strand segments of globular proteins, Computer Applications in the Biosciences, 10, 661-669. (SSP)
• Wako, H. & Blundell, T.L. (1994). Use of amino-acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. 2. Secondary Structures, Journal of Molecular Biology, 238, 693-708.
• Mehta, P., Heringa, J. & Argos, P. (1995). A simple and fast approach to prediction of protein secondary structure from multiple aligned sequences with accuracy above 70 %, Protein Science, 4, 2517-2525. (SSPRED)
• King, R.D. & Sternberg, M.J.E. (1996). Identification and application of the concepts important for accurate and reliable protein secondary structure prediction, Protein Science, 5, 2298-2310. (DSC)

Nearly all of these now run via the World Wide Web. For details, see the papers for the individual methods; the acronyms given after most of the references above name the corresponding WWW servers, where the methods can also be run.

Manual intervention

It has long been recognised that patterns of residue conservation are indicative of particular secondary structure types. Alpha helices have a periodicity of 3.6 residues per turn. For a helix with one face buried in the protein core and the other exposed to solvent, residues at positions i, i+3, i+4 and i+7 (where i is a residue in an alpha helix) lie on one face of the helix. Many alpha helices in proteins are amphipathic, meaning that one face points towards the hydrophobic core and the other towards the solvent. Thus patterns of hydrophobic residue conservation showing the i, i+3, i+4, i+7 pattern are highly indicative of an alpha helix.
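The i, i+3, i+4, i+7 check described above can be sketched in a few lines of Python. This is only an illustration: the hydrophobic residue set and the function name are assumptions, and it scans a single sequence rather than conservation across an alignment, which is what the manual approach actually relies on.

```python
# Assumed hydrophobic alphabet; the guide does not define one.
HYDROPHOBIC = set("AVLIMFWC")

def helix_face_positions(seq, offsets=(0, 3, 4, 7)):
    """Return start indices i where residues at i, i+3, i+4 and i+7 are all hydrophobic."""
    hits = []
    span = max(offsets)
    for i in range(len(seq) - span):
        # All four positions on one helix face must be hydrophobic.
        if all(seq[i + k] in HYDROPHOBIC for k in offsets):
            hits.append(i)
    return hits

if __name__ == "__main__":
    # Made-up peptide with leucines at positions 0, 3, 4 and 7.
    print(helix_face_positions("LKKLLKKL"))
```

A hit only flags a candidate; checking that the intervening positions are mostly polar, and that the pattern is conserved across the family, would be the natural next refinements.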
For example, this helix in myoglobin has this classic pattern of hydrophobic and polar residue conservation (i = 1):
[Figure: myoglobin helix]

Similarly, the geometry of beta strands means that adjacent residues have their side chains pointing in opposite directions. Beta strands that are half buried in the protein core will tend to have hydrophobic residues at positions i, i+2, i+4, i+6, etc., and polar residues at positions i+1, i+3, i+5, etc. For example, this beta strand in CD8 shows this classic pattern:
[Figure: CD8 beta strand]

Beta strands that are completely buried (as is often the case in proteins containing both alpha helices and beta strands) usually contain a run of hydrophobic residues, since both faces are buried in the protein core. This strand from chemotaxis protein CheY is a good example:
[Figure: CheY beta strand]

The principle behind most manual secondary structure predictions is to look for patterns of residue conservation that are indicative of secondary structures like those shown above. It has been shown in numerous successful examples that this strategy often leads to nearly perfect predictions. The work of Barton et al., Niermann & Kirschner, Bazan, and Benner & co-workers provides good starting points for doing this sort of work oneself.

Some useful references are:
• Recent reviews on the subject (and on secondary structure prediction generally). See also references therein.
o Rost, B., Schneider, R. & Sander, C. (1993), Trends in Biochemical Sciences, 18, 120-123.
o Benner, S. A., Gerloff, D. L. & Jenny, T. F. (1994), Science, 265, 1642-1644.
o Barton, G. J. (1995), Protein Secondary Structure Prediction, Current Opinion in Structural Biology, 5, 372-376.
o Russell, R. B. & Sternberg, M. J. E. (1995), Protein Structure Prediction: How Good Are We?, Current Biology, 5, 488-490.
• Some guides for predicting structure:
o Benner, S. A. (1989), Patterns of divergence in homologous proteins as indicators of tertiary and quaternary structure, Advances in Enzyme Regulation, 31, 219-236.
o Benner, S. A. (1992), Predicting de novo the folded structure of proteins, Current Opinion in Structural Biology, 2, 402-412.
• Some particular examples of protein secondary structure predictions:
o Crawford, I. P., Niermann, T. & Kirschner, K. (1987), Predictions of secondary structure by evolutionary comparison: Application to the alpha subunit of tryptophan synthase, PROTEINS: Structure, Function and Genetics, 1, 118-129.
o Bazan, J. F. (1990), Structural Design and Molecular Evolution of a Cytokine Receptor Superfamily, Proceedings of the National Academy of Sciences, 87, 6934-6938.
o Benner, S. A. & Gerloff, D. (1990), Patterns of Divergence in Homologous Proteins and tertiary structure. A prediction of the structure of the catalytic domain of protein kinases, Advances in Enzyme Regulation, 31, 121-181.
o Jenny, T. F. & Benner, S. A. (1994), A prediction of the secondary structure of the pleckstrin homology domain, PROTEINS: Structure, Function and Genetics, 20, 1-3.
o Benner, S. A., Badcoe, I., Cohen, M. A. & Gerloff, D. L. (1993), Predicted secondary structure for the src homology 3 domain, Journal of Molecular Biology, 229, 295-305.
o Gerloff, D. L., Jenny, T. F., Knecht, L. J., Gonnet, G. H. & Benner, S. A. (1993), The nitrogenase MoFe protein. A secondary structure prediction, FEBS Letters, 318, 118-124.
o Gerloff, D. L., Chelvanayagam, G. & Benner, S. A. (1995), A predicted consensus structure for the protein-kinase c2 homology (c2h) domain, the repeating unit of synaptotagmin, PROTEINS: Structure, Function and Genetics, 22, 299-310.
o Barton, G. J., Newman, R. H., Freemont, P. F. & Crumpton, M. J. (1991), Amino acid sequence analysis of the annexin super-gene family of proteins, European Journal of Biochemistry, 198, 749-760.
o Russell, R. B., Breed, J. & Barton, G. J. (1992), Conservation analysis and secondary structure prediction of the SH2 family of phosphotyrosine binding domains, FEBS Letters, 304, 15-20.
o Livingstone, C. D. & Barton, G. J. (1994), Secondary structure prediction from multiple sequence data: Blood clotting factor XII and Yersinia protein tyrosine phosphatase, International Journal of Peptide and Protein Research
o Barton, G. J., Barford, D. A. & Cohen, P. T. (1994), European Journal of Biochemistry, 220, 225-237.
o Perkins, S. J., Smith, K. F., Williams, S. C., Haris, P. I., Chapman, D. & Sim, R. B. (1994), The secondary structure of the von Willebrand Factor Type A Domain in Factor B of Human Complement by Fourier Transform Infrared Spectroscopy, Journal of Molecular Biology, 238, 104-119.
o Edwards, Y. J. K. & Perkins, S. J. (1995), The protein fold of the von Willebrand factor type A is predicted to be similar to the open twisted beta-sheet flanked by alpha-helices found in human ras-p21, 358, 283-286.
o Lupas, A., Koster, A. J., Walz, J. & Baumeister, W. (1994), Predicted secondary structure of the 20S proteasome and model structure of the putative peptide channel, FEBS Letters, 354, 45-49.

A strategy for secondary structure prediction

In practice, I recommend running as many state-of-the-art prediction approaches as possible and combining the results with some human insight to give a consensus prediction for the family. If you then align all of your predictions (including ideas you have based on residue conservation) with your multiple sequence alignment, you can get a consensus picture of the structure. For example, here is part of an alignment of a family of proteins I looked at recently:
[Figure: alignment of glutamyl tRNA reductase sequences with secondary structure predictions]

In this figure, three automated secondary structure predictions (PHD, SOPMA and SSPRED) appear below the alignment of 12 glutamyl tRNA reductase sequences.
Positions within the alignment showing a conservation of hydrophobic side-chain character are shown in yellow, and those showing near total conservation of non-hydrophobic residues (often indicative of active sites) are coloured green. Predictions of accessibility performed by PHD (PHD Acc. Pred.) are also shown (b = buried, e = exposed), as is a prediction I performed by looking for patterns indicative of the three secondary structure types shown above.

For example, positions 38-45 within the alignment exhibit the classical amphipathic helix pattern of hydrophobic residue conservation, with positions i, i+3, i+4 and i+7 showing a conservation of hydrophobicity and the intervening positions being mostly polar. Positions 13-16 comprise a short stretch of conserved hydrophobic residues, indicative of a beta strand, similar to the example from CheY protein shown above. By looking for these patterns I built up a prediction of the secondary structure for most regions of the protein.

Note that most methods, automated and manual, agree for many regions of the alignment. Given the results of several methods of predicting secondary structure, one can build up a consensus picture of the secondary structure, such as that shown at the bottom of the alignment above. Note that you can get predictions like the above (i.e. consensus predictions) from the very useful JPRED server.
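The consensus idea described above can be sketched as a per-position majority vote over several prediction strings. This is a minimal illustration only: the three-state alphabet (H, E, -), the strict-majority rule and the example strings are assumptions, and it is not a description of how JPRED or any other server combines methods.

```python
from collections import Counter

def consensus_prediction(predictions, undecided="-"):
    """Per-position strict-majority vote over equal-length prediction strings."""
    if not predictions:
        return ""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        # Keep the state only if more than half of the methods agree on it.
        result.append(state if count * 2 > len(column) else undecided)
    return "".join(result)

if __name__ == "__main__":
    # Hypothetical outputs from three methods for the same eight positions.
    methods = ["HHHH-EEE", "HHH--EEE", "-HHH-EE-"]
    print(consensus_prediction(methods))
```

A manual prediction based on conservation patterns can simply be added to the list as one more string, which mirrors the advice to align automated and manual predictions together.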
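Experimental data

Many kinds of experimental data can aid the structure prediction process, including:
• Disulphide bonds, which constrain the spatial positions of cysteines
• Spectroscopic data, which can indicate the secondary structure content of the protein
• Site-directed mutagenesis studies, which can reveal residues in active or binding sites
• Protease cleavage sites and post-translational modifications such as phosphorylation or glycosylation, which indicate residues that must be exposed
• Others

When predicting, you must be aware of all the available data and keep asking: is the prediction consistent with the experimental results? If not, the approach needs to be revised.

[ Last edited by cnlics on 2010-9-14 at 19:31 ]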
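As a rough illustration of the consistency question above, the sketch below compares a per-residue accessibility prediction with positions known from experiment to be exposed (for example cleavage or glycosylation sites). The input format (a string of b = buried and e = exposed, 0-based positions) and the function name are assumptions made for this example.

```python
def inconsistent_sites(accessibility, exposed_sites):
    """Return 0-based positions known to be exposed but predicted buried ('b')."""
    return [pos for pos in sorted(exposed_sites)
            if 0 <= pos < len(accessibility) and accessibility[pos] == "b"]

if __name__ == "__main__":
    # Hypothetical prediction and hypothetical experimentally exposed positions.
    print(inconsistent_sites("bbeebbee", {1, 2, 7}))
```

Any position returned is a place where the prediction and the experiment disagree and the prediction should be looked at again.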
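Protein sequence data

Some preliminary analysis of the protein sequence is worthwhile. For example, if the protein comes directly from a gene prediction, it may contain multiple domains. More seriously, it may contain regions that are unlikely to be globular or soluble. This flowchart assumes that your protein is soluble, probably a single domain, and contains no non-globular regions.

Consider the following:
• Is it a transmembrane protein, or does it contain transmembrane segments? There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Does it contain coiled coils? You can predict coiled coils at the COILS server or download the COILS program (recently rewritten; note that the GCG package includes a version of COILS).
• Does the protein contain low-complexity regions? Proteins frequently contain runs of polyglutamic acid or polyserine, which are not easy to predict. You can check for these with SEG (the GCG package includes a version of SEG).

If any of the above applies, you should break the sequence into pieces, ignore particular sections of it, and so on. This issue is closely related to the problem of locating domains.

[ Last edited by cnlics on 2010-9-16 at 08:25 ]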
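To give a feel for what screening for low-complexity regions involves, here is a small sketch that flags windows with low compositional (Shannon) entropy. It is not the SEG algorithm; the window size, the threshold and the function names are illustrative assumptions.

```python
import math
from collections import Counter

def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def low_complexity_windows(seq, size=12, threshold=2.2):
    """Start indices of windows whose composition entropy is below threshold.

    size and threshold are illustrative values, not SEG's defaults.
    """
    return [i for i in range(len(seq) - size + 1)
            if window_entropy(seq[i:i + size]) < threshold]

if __name__ == "__main__":
    # Made-up sequence: a varied stretch followed by a polyserine run.
    seq = "ACDEFGHIKLMN" + "S" * 12
    print(low_complexity_windows(seq))
```

Flagged windows are candidates for masking or for splitting the sequence before running the prediction methods; a dedicated tool such as SEG remains the better choice for real work.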
Searching sequence databases

The obvious first step in analysing any new sequence is to search the sequence databases for homologues. Such searches can be done anywhere and on any computer. There are also many web servers for this kind of search, where you can type or paste in a sequence and receive the results interactively.

There are many methods for sequence searching; by far the best known is the BLAST program. Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against a variety of gene or protein sequence databases. To name just a few:
• National Center for Biotechnology Information (USA) Searches
• European Bioinformatics Institute (UK) Searches
• BLAST search through SBASE (domain database; ICGEB, Trieste)
• and many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST). Both make BLAST more sensitive. PSI-BLAST takes the results of a search, builds a profile from them, and then uses the profile to search the database again for further homologues (the process can be repeated until no new sequences are found), so it can detect very distant homologues. Before using the methods in the following sections, it is important to compare your protein sequence against the databases with PSI-BLAST to see whether a known structure exists.

Other methods for comparing a single sequence to a database include:
• FASTA package (William Pearson, University of Virginia, USA)
• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
• BLITZ (Compugen's fast Smith Waterman search)
• Others

It is also possible to use multiple sequence information to perform more sensitive searches. This involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
• HMMER hidden Markov models (Sean Eddy, Washington University)
• Wise package (Ewan Birney, Sanger Centre; for protein versus DNA comparisons)
• Others

A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature".
For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by an amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine,... [etc.]".

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
• ExPASy
• EBI

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand edited by experts on the particular protein families, and thus probably represent the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
• SMART (Oxford/EMBL)
• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Center, Seattle)
• SBASE (ICGEB, Trieste)

In general there are many methods for comparing a protein sequence against these databases, and they are very useful for identifying domains.

[ Last edited by cnlics on 2010-9-14 at 19:54 ]
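The motif notation quoted above maps quite directly onto a regular expression. The sketch below converts that kind of pattern to a Python regex and scans a sequence with it. It handles only the syntax used in the example (single residues, [..] sets, x, and (n) or (n,m) repeats); the function name and the test sequence are made up for illustration, and other PROSITE syntax is deliberately left out.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression.

    Supports residues, [..] sets, x for any residue, and (n) or (n,m) repeats.
    Exclusions ({..}) and anchors (< and >) are not covered in this sketch.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.strip(".").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        core = "." if core == "x" else core
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(core + repeat)
    return "".join(parts)

if __name__ == "__main__":
    regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]")
    print(regex)
    # Made-up sequence containing one occurrence of the motif.
    match = re.search(regex, "MKHFALAGAAAAALHAAADGG")
    print(match.start() if match else None)
```

A motif search of this kind reports only exact matches to the signature, which is why it complements rather than replaces the profile-based searches listed above.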