A method of localization of constant and variable regions in homologous protein sequences

P. V. Kostetsky, R. R. Vladimirova

M. M. Shemyakin Institute of Bioorganic Chemistry, Academy of Sciences of the USSR, Moscow

Abstract: A set of aligned homologous protein sequences is divided into two groups of m and n sequences, each containing sequences from the most closely related organisms. The position dissimilarity between the two groups is defined as the number of mismatches among all m × n possible pairs of amino acid residues at a given position (one residue from each group), divided by m × n. The dissimilarity values, averaged over windows of ten positions, are plotted against the number of the first position in each window. The area between the resulting dissimilarity profile and its mean-value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If this area exceeds the average area for 1000 random profiles by more than two standard deviations, the profile extrema containing the «surplus» area are cut off. The cut-off stretches are likely to be variable and constant regions. If necessary, each stretch may be tested separately and assessed statistically using a standard-size sample of artificial protein families.
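The computation described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' program. It makes several assumptions that the abstract does not settle: the function names are hypothetical, the area is approximated as the sum of absolute deviations from the profile mean, and random profiles are produced by shuffling the per-position dissimilarity values. The step of cutting off the extrema that carry the «surplus» area is not covered.

```python
import random
from statistics import mean, stdev


def position_dissimilarity(group_a, group_b):
    """Per-position mismatch fraction over all m x n cross-group residue pairs."""
    m, n = len(group_a), len(group_b)
    length = len(group_a[0])  # sequences are assumed aligned to equal length
    values = []
    for pos in range(length):
        mismatches = sum(a[pos] != b[pos] for a in group_a for b in group_b)
        values.append(mismatches / (m * n))
    return values


def windowed_profile(values, window=10):
    """Average over `window` consecutive positions, indexed by first position."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def profile_area(profile):
    """Assumed discrete area: sum of absolute deviations from the profile mean."""
    mu = mean(profile)
    return sum(abs(v - mu) for v in profile)


def irregularity_z_score(values, window=10, trials=1000, seed=0):
    """Standard deviations by which the observed area exceeds the random mean.

    Random profiles are built by shuffling positions (an assumption).
    """
    rng = random.Random(seed)
    observed = profile_area(windowed_profile(values, window))
    areas = []
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        areas.append(profile_area(windowed_profile(shuffled, window)))
    return (observed - mean(areas)) / stdev(areas)
```

Under these assumptions, a score above 2 from `irregularity_z_score` corresponds to the abstract's criterion of an area exceeding the random average by more than two standard deviations.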

Intergroup comparison of protein sequences reveals a high overall irregularity of amino acid substitutions and identifies variable and conserved regions in all the protein families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K+-ATPase, the L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and human rhodopsins.

Russian Journal of Bioorganic Chemistry 1990, 16 (12):1618-1628

Full Text (PDF, in Russian)