How closely related are you to a flower?
The question could not have been answered 10 years ago, but thanks to a powerful sequence-comparison tool developed by Biochemistry Prof. William Pearson, it has become less mysterious.
For decades, scientists have investigated the relationships between different species because this information may provide insight into the evolutionary process of life. But DNA, the genetic code of all life, and its protein products often eluded serious analysis in the late 1970s.
Before the rise of personal computers, comparisons of genetic codes or protein sequences were difficult to complete. Rough comparisons could be made through electrophoresis, a process that separates DNA fragments by size. Exact comparisons involved great amounts of data and time.
Sequencing is the process of determining the order of nucleotides in a DNA molecule or of amino acids in a protein. Comparing sequences can show how far species have diverged over the course of evolution. Such comparisons have helped trace how specific proteins mutate as species evolve, and they have the potential to reveal how cancerous cells differ from healthy ones.
One of the first computer programs to compare proteins was written in 1983 by Pearson. When he arrived at the University he had a computer, but no laboratory or students. Believing that a rapid method to compare proteins was needed, he networked his computer to another that contained digital protein sequences.
"I wrote FASTP [FAST Protein], a program which compares sequences in a linear manner," he said.
"This program was adopted very quickly," Biology Prof. Robert Kretsinger said. "Labs involved in protein sequencing recognized the power of such a program."
Through a simple but time-consuming process, an organism's sequence is mapped out and the data converted into a file.
This experimental sequence is then compared to a known organism's genome, or genetic map. These reference genomes are available at the National Institutes of Health and universities throughout the country.
Sequencing is useful for scientists studying unknown bacteria, who need to know how they fit into the evolutionary history of life.
"Those labs with lots of new sequences will ask what they are similar to," Kretsinger said. "They will want to test it against databases that are publicly available to see what the protein in question is most related to."
These databases also include the complete genomes of organisms such as the bacterium E. coli and the fruit fly Drosophila. The final results of the Human Genome Research Project are expected to be added to the database by 2003.
Pearson's program uses a series of algorithms to search for statistically significant matches between two sequences. When portions of the sequences match, the evolutionary and biological meaning of the similarity may be determined. Researchers may not want to study an organism if it is not related to another.
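The article does not spell out how the matching works. As a rough illustration only, the sketch below shows one simple heuristic for finding where two sequences line up: it counts the short "words" the sequences share and groups them by diagonal, meaning the offset between the word's position in one sequence and its position in the other. The function name best_diagonal, the word length k and the example sequences are all hypothetical, and the sketch leaves out the scoring and statistics a real search program would need.

```python
from collections import defaultdict


def best_diagonal(query, target, k=2):
    """Return (offset, hits) for the diagonal with the most shared k-letter words.

    Illustrative sketch only; not Pearson's actual implementation.
    """
    # Index every k-letter word in the query by its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # A word shared at query position i and target position j lies on diagonal j - i.
    diagonals = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in positions.get(target[j:j + k], ()):
            diagonals[j - i] += 1

    if not diagonals:
        return None, 0  # the sequences share no k-letter word
    offset = max(diagonals, key=diagonals.get)
    return offset, diagonals[offset]


# Made-up example: the query appears inside the target, shifted by two letters.
print(best_diagonal("MKVLAT", "GGMKVLATGG"))  # (2, 5)
```

A diagonal with many hits marks a region where the two sequences agree at the same relative offset, which is the kind of region a fuller comparison would then score in detail.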
"It's easier to extract DNA and sequence it than to experiment on the organism," Pearson said.
FASTP was the first in a series of programs Pearson wrote. In 1985, Pearson made improvements to produce FASTA (FAST-All).
"FASTA can detect weak similarities between proteins," Kretsinger said. "Some similarities are obvious, but FASTA can pick up those that are at the edge of statistical similarity."
Pearson smiles when asked about his decision to distribute the software without cost. After he wrote his first program in 1977, a private company approached him with an offer to license the software in exchange for a percentage of the profits. Pearson agreed, but very few people bought the program.
"I realized that I want people to use my software," he said. "The way to get a program used is to have it out there so people can use it. The fewer barriers you put, the better."
"It has gone through various upgrades; the versions are publicly available," Kretsinger said. "It is still the standard for many, many labs."
Pearson has continued developing FASTA and uses sequencing with his students in the graduate biochemistry department.