Shiangyi Lin
Identification of Inconsistent Families in the SCOPe Database
Proteins often fold into compact structural units known as domains, which are the basic units of protein function and evolution. Delineating domain boundaries is a prerequisite for further analyses of protein structures, but the accuracy of automated domain-delineation programs is still unsatisfactory. Over the last two semesters, with support from my mentor, I have surveyed the current version of the SCOPe (Structural Classification of Proteins – extended) database. I have found and labeled or corrected protein families in which different structures have been inconsistently divided into domains, and I have built a version of a consistency check that can be used to perform automated error checks on manual edits. This summer, I hope to further improve the domain identification algorithms currently used to build the SCOPe database by surveying structural comparison programs and integrating a suitable one into the identification pipeline.