The role of a protein depends heavily on its 3D shape, which is composed of semi-independent three-dimensional blocks called domains. Domains fold independently and constitute units of evolution. Most proteins contain multiple domains that are associated with a particular functions; moreover, the same domain can be found in different proteins. Automated recognition of domains can make prediction of proteins function easier and can support the analysis of proteins. Here, we propose a novel algorithm designed for domain recognition by identification of domain boundaries in the protein structure. The proposed algorithm uses a contact graph and an iterative approach to find meaningful clusters corresponding to the protein domains. The distinctive feature of the method is its effective complexity, that improves over other well-known methods, while holding a comparable level of correct domain assignments.
Accepté le :
DOI : 10.1051/ro/2015040
Keywords: Graph theory, computational biology, protein structure
Milostan, Maciej 1 ; Lukasiak, Piotr 1, 2
@article{RO_2016__50_2_363_0,
author = {Milostan, Maciej and Lukasiak, Piotr},
title = {DomGen-Graph based method for protein domain delineation},
journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
pages = {363--374},
year = {2016},
publisher = {EDP Sciences},
volume = {50},
number = {2},
doi = {10.1051/ro/2015040},
mrnumber = {3479876},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ro/2015040/}
}
TY - JOUR AU - Milostan, Maciej AU - Lukasiak, Piotr TI - DomGen-Graph based method for protein domain delineation JO - RAIRO - Operations Research - Recherche Opérationnelle PY - 2016 SP - 363 EP - 374 VL - 50 IS - 2 PB - EDP Sciences UR - https://www.numdam.org/articles/10.1051/ro/2015040/ DO - 10.1051/ro/2015040 LA - en ID - RO_2016__50_2_363_0 ER -
%0 Journal Article %A Milostan, Maciej %A Lukasiak, Piotr %T DomGen-Graph based method for protein domain delineation %J RAIRO - Operations Research - Recherche Opérationnelle %D 2016 %P 363-374 %V 50 %N 2 %I EDP Sciences %U https://www.numdam.org/articles/10.1051/ro/2015040/ %R 10.1051/ro/2015040 %G en %F RO_2016__50_2_363_0
Milostan, Maciej; Lukasiak, Piotr. DomGen-Graph based method for protein domain delineation. RAIRO - Operations Research - Recherche Opérationnelle, Special issue: Research on Optimization and Graph Theory dedicated to COSI 2013 / Special issue: Recent Advances in Operations Research in Computational Biology, Bioinformatics and Medicine, Tome 50 (2016) no. 2, pp. 363-374. doi: 10.1051/ro/2015040
, , , , and , Scop database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. 32 (2004) D226–D229. | DOI
, , , , and , Domanspattern based method for protein domain boundaries prediction and analysis. Found. Comput. Decision Sci. 36 (2011) 99.
, , , , , , and , The protein data bank. Nucleic Acids Res. 28 (2000) 235–242. | DOI
and , Complexity issues in computational biology. Fund. Inform. 118 (2012) 385–401. | MR | Zbl
, and , Predicting secondary structures of proteins. IEEE Eng. Med. Biol. Mag. 24 (2005) 88–94. | DOI
, and , Application of tabu search strategy for finding low energy structure of protein. Computational Intelligence Techniques in Bioinformatics. Artif. Intell. Med. 35 (2005) 135–145. | DOI
, and , New machine learning methods for prediction of protein secondary structures. Control and Cybernet. 36 (2007) 183–201.
and , A threading-based method (findsite) for ligand-binding site prediction and functional annotation. Proc. Natl. Acad. Sci. USA 105 (2008) 129–34. | DOI
and , A novel method to compare protein structures using local descriptors. BMC Bioinform. 12 (2011) 344. | DOI
I. Dhillon, Y. Guan and B. Kulis, A fast kernel-based multilevel algorithm for graph clustering. In Proc. of the eleventh ACM SIGKDD international conference on Knowledge discovery in Data Mining. ACM (2005) 629–634.
, and , An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30 (2002) 1575–1584. | DOI
, , and , Assessment of domain boundary predictions and the prediction of intramolecular contacts in casp8. Proteins: Structure, Function and Bioinform. 77 (2009) 196–209. | DOI
L.R. Ford and D.R. Fulkerson, Flows in Networks, Vol. 1962. Princeton Princeton University Press (1962). | MR | Zbl
, , and , G-pas 2.0–an improved version of protein alignment tool with an efficient backtracking routine on multiple gpus. Bull. Pol. Acad. Sci.: Tech. Sci. 60 (2012) 491–494.
, , , and , G-dna–a highly efficient multi-gpu/mpi tool for aligning nucleotide reads. Bull. Pol. Acad. Sci.: Tech. Sci. 61 (2013) 989–992.
, , and , Improving the performance of domainparser for structural domain partition using neural network. Nucleic Acids Res. 31 (2003) 944–952. | DOI
and , A systematic comparison of protein structure classifications: Scop, cath and fssp. Structure 7 (1999) 1099–1112. | DOI
and , Protein structure comparison by alignment of distance matrices. J. Molec. Biol. 233 (1993) 123–138. | DOI
and , The fssp database of structurally aligned protein fold families. Nucleic Acids Res. 22 (1994) 3600–9.
and , Parser for protein folding units. Proteins: Structure, Function and Bioinform. 19 (1994) 256–268.
and , Mapping the protein universe. Science 273 (1996) 595–602. | DOI
, , and , A novel approach to fold recognition using sequence-derived properties from sets of structurally similar local fragments of proteins. Bioinform. 19 (2003) ii81–ii91. | DOI
, , , , and , Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 7 (1998) 233–242. | DOI
and , Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22 (1983) 2577–2637. | DOI
, , , and , Web and grid technologies in bioinformatics, computational and systems biology: A review. Curr. Bioinform. 3 (2008) 10–31. | DOI
and , Sequence-based prediction of protein domains. Nucleic Acids Res. 32 (2004) 3522–3530. | DOI
, and , Some operations research methods for analyzing protein sequences and structures. Ann. Oper. Res. 175 (2010) 9–35. | MR | Zbl | DOI
, , and , Scop: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540. | DOI
and , Spectral methods for graph clustering–a survey. Eur. J. Oper. Res. 211 (2011) 221–231. | MR | Zbl | DOI
, and , Protein-binding site prediction based on three-dimensional protein modeling. Proteins: Structure, Function, and Bioinform. 77 (2009) 152–156. | DOI
, , , , , , , , , , , , , , , , , , , , and , The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2004) D247–D251. | DOI
, et al. The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2005) D247–D251. | DOI
and , Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10 (2006) 191–218. | MR | Zbl | DOI
, Graph clustering. Comput. Sci. Rev. 1 (2007) 27–64. | Zbl | DOI
, , , and , Assessment of ligand-binding residue predictions in casp9. Proteins: Structure, Function, and Bioinform. 79 (2011) 126–136. | DOI
and , Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 4 (1995) 872–884. | DOI
, A procedure for detecting structural domains in proteins. Protein science: a Publication of the Protein Society 4 (1995) 103. | DOI
, , and , The orderly colored longest path problem – a survey of applications and new algorithms. RAIRO: RO 48 (2014) 25–51. | Zbl | MR | Numdam | DOI
, and , Protein structure: insights from graph theory. J. Theoret. Comput. Chem. 1 (2002) 187–211. | DOI
, , and , Multi-agent model of hepatitis c virus infection. Art. Intell. Med. 60 (2014) 123–131. | DOI
, and , Identification of structural domains in proteins by a graph heuristic. Proteins: Structure, Function, and Bioinform. 35 (1999) 338–352. | DOI
, and , Protein domain decomposition using a graph-theoretic approach. Bioinform. 16 (2000) 1091–1104. | DOI
, Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 100 (1971) 68–86. | Zbl | DOI
, I-tasser server for protein 3d structure prediction. BMC Bioinform. 9 (2008) 40. | DOI
, Protein structure prediction: when is it useful? Curr. Opin. Struct. Biol. 19 (2009) 145–155. | DOI
, and , Minimum spanning tree based split-and-merge: A hierarchical clustering method. Inform. Sci. 181 (2011) 3397–3410. | DOI
Cité par Sources :





