Genetics has been an informational science since the elucidation of DNA's structure. Today's researchers say the field shifted to a more computational mode in 1990, the year that research groups began mapping genes to specific chromosomal sites for the Human Genome Project. "That year was pivotal, because it was then that the need to sequence significant amounts of DNA became compelling," says Richard Gibbs, director of the genome sequencing project at Baylor College of Medicine in Houston.
The pace of genome research is expected to increase as researchers devise shortcuts to direct sequencing (J.C. Venter et al., Nature, 381:364-6, 1996). This will result in a need for more tools to seek meaning in the reams of A, T, C, and G DNA bases coming from automated sequencers (H. Ahern, The Scientist, Oct. 16, 1995, page 18). The As, Ts, Cs, and Gs, in contiguous triplets, encode the amino acid sequences of proteins. Software assists researchers at all stages of gene discovery and analysis: pedigree charting; gene mapping; reading DNA sequences from electrophoresis gels and predicting encoded protein sequences; identifying primers for gene amplification; and searching among similar sequences in other species for homologies.
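To make the triplet idea concrete, here is a minimal Python sketch of how a program might predict a protein sequence from DNA. It assumes the standard genetic code, a coding-strand sequence read in frame from its first base, and only the letters A, T, C, and G; the function name translate is illustrative and does not come from any product discussed in this article.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order (first base varies slowest) and paired with one-letter
# amino acid codes, where "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate coding-strand DNA, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete triplets only; trailing bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate("ATGGCCTAA") reads ATG (methionine), GCC (alanine), and then the stop codon TAA. A real package would also have to handle ambiguous bases and all six reading frames, which this sketch leaves out.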
Several dozen investigators from all over the world responded via E-mail and telephone to The Scientist's request for opinions on available products. They represent a wide variety of institutions, ranging from small academic labs to medical centers to biotech companies small and large. A software "greatest hits" list of sorts emerges from their replies.
Pedigree Analysis
Genetic research begins with families. Two thousand clinical and research geneticists worldwide use Cherwell Scientific Publishing Inc.'s Cyrillic software for pedigree construction, according to company statistics.
A pedigree depicts blood relationships between individuals. For decades, geneticists have drawn pedigrees by hand, but today's abundant data complicate the task. "Researchers' needs for pedigree handling have moved in two directions: handling large amounts of marker information on relatively few large families, and handling large numbers of markers on a large number of quite small families," says Cyrillic's creator, Cyril Chapman, head of the clinical genetics department at the Churchill Hospital in Oxford, England.
A Cyrillic user starts a pedigree by working with the computer mouse to select a symbol for the proband, the person whose medical condition prompted the genetic study. A square symbolizes a male; a circle represents a female. With the mouse, a researcher using Cyrillic can build lines to connect additional symbols: vertically for generations, horizontally to show sibships and matings. Cyrillic goes beyond pen-and-paper pedigrees in the ability to list data under the symbols (such as medical history, risk estimates, and genetic markers) and to accommodate large families. The program can easily handle a pedigree of 2,000 individuals. Cyrillic runs on Windows and costs $599.
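The pedigree conventions just described map naturally onto a small data structure. The Python sketch below is one hypothetical way to hold such records; it is not Cyrillic's internal format, and the names Individual, generation, siblings, and the sample family are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Individual:
    ident: str
    sex: str                          # "M" (drawn as a square) or "F" (a circle)
    father: Optional[str] = None      # ident of the father, if recorded
    mother: Optional[str] = None      # ident of the mother, if recorded
    proband: bool = False             # the person who prompted the study
    notes: dict = field(default_factory=dict)   # e.g. markers, risk estimates

def generation(pedigree: dict, ident: str) -> int:
    """Founders are generation 1; a child sits one below its deepest parent."""
    person = pedigree[ident]
    parents = [p for p in (person.father, person.mother) if p in pedigree]
    if not parents:
        return 1
    return 1 + max(generation(pedigree, p) for p in parents)

def siblings(pedigree: dict, ident: str) -> list:
    """Full siblings: other individuals sharing both recorded parents."""
    me = pedigree[ident]
    if me.father is None or me.mother is None:
        return []
    return sorted(
        other.ident for other in pedigree.values()
        if other.ident != ident
        and other.father == me.father and other.mother == me.mother
    )

# A made-up two-generation family with the proband in generation 2.
family = {p.ident: p for p in [
    Individual("I-1", "M"),
    Individual("I-2", "F"),
    Individual("II-1", "F", father="I-1", mother="I-2", proband=True),
    Individual("II-2", "M", father="I-1", mother="I-2"),
]}
```

Storing only parent links keeps each record small; generations (the vertical lines) and sibships (the horizontal ones) can then be derived on demand rather than drawn by hand.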
Debra Collins, a genetic counselor and director of the genetics education center at the University of Kansas Medical Center, uses Cyrillic to track her work on families with Von Hippel-Lindau disease. This condition causes hypertension and brain and eye tumors. "You can edit the pedigrees easily, as we have seen families for over 10 years, and changes are made each year," she comments. She also likes the feature that deletes names from a pedigree, which is important for protecting privacy of patients in publications.
Sequencing Software
Deciphering a nucleotide sequence requires either a human brain or a computer to size-order the pieces of DNA resulting from a sequencing experiment and identify one of four possible labeled ends of each piece. The four types of labels correspond to the four types of DNA bases. In the early 1980s, researchers used digitizer pens to manually read DNA sequences from gels. "It took a long time, but was very accurate," recalls Steven Krawetz, an associate professor at the Center for Molecular Medicine and Genetics at Wayne State University in Detroit.
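The size-ordering step can be sketched in a few lines of Python. The example assumes an idealized experiment in which every fragment length from 1 upward is present exactly once and every end label is read without ambiguity; real gels are far messier, as the discussion of ambiguous bands below makes clear. The function name read_sequence and the sample bands are hypothetical.

```python
def read_sequence(fragments):
    """Reconstruct a sequence from (length, terminal_base) pairs.

    Each pair stands for one band: a fragment's length in bases and the
    label identifying which of the four bases ends it.
    """
    ordered = sorted(fragments)          # shortest fragment first
    lengths = [length for length, _ in ordered]
    # In this idealized model every length 1..n must appear exactly once.
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("missing or duplicate fragment length")
    return "".join(base for _, base in ordered)

# Bands listed in arbitrary order, as they might come off a digitizer.
bands = [(3, "C"), (1, "G"), (4, "T"), (2, "A")]
```

Sorting the bands by length and reading off each terminal label gives the sequence one base at a time; with the sample bands above, read_sequence(bands) returns "GACT".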
Then came image-capture devices: cameras that digitized the information on gels. In 1987, Krawetz helped develop the first DNA-sequencing software for automated film readers, which eventually would become products offered by Bio Image Corp. of Ann Arbor, Mich. "Gel readers digitize the image and use software to define lanes, and identify where bands differ in relative intensity," he says. Lanes are areas on a gel where material to be separated is placed, and bands are detectable fragments of that material.
But even with the best systems, some bands are ambiguous. According to Krawetz, this is where Bio Image's DNA Sequence Film Reader and Sequence Assembly Manager, now offered together for $9,000, excel. "[The software] displays multiple images for difficult regions, which is really helpful when reviewing data to make a call for an ambiguous area," he states.
The DNA Sequence Film Reader calls out base sequences from electrophoresis gels. The Sequence Assembly Manager can assemble up to 1,000 DNA sequences into a 50-kilobase contiguous sequence (known as a contig) in just minutes. It also draws evolutionary tree diagrams, derives complementary sequences, searches for subsequences, and sends information to external databases. It is designed to run on SPARC-station computers from Sun Microsystems in Mountain View, Calif.
The company offers other image-analysis software for Windows and Macintosh environments.
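Two of the simpler operations mentioned above, deriving a complementary sequence and searching for subsequences, can be illustrated with a short Python sketch. It assumes sequences made up only of A, C, G, and T and is in no way Bio Image's implementation; the names reverse_complement and find_all are invented for this example.

```python
# Translation table pairing each base with its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_all(seq: str, query: str) -> list:
    """0-based start positions of every (possibly overlapping) match."""
    seq, query = seq.upper(), query.upper()
    hits, start = [], seq.find(query)
    while start != -1:
        hits.append(start)
        # Resume one position later so overlapping matches are found too.
        start = seq.find(query, start + 1)
    return hits
```

Because a motif may lie on either strand, a fuller search would typically run find_all on both the sequence and its reverse complement.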
Sequence-Analysis Software
In selecting software, gene researchers consider cost, functions, and ease of use. Sequencher, from Gene Codes Corp. in Ann Arbor, Mich., is one reasonably priced, full-featured package, according to scientists who use it. "I'm a big fan of Sequencher for editing and assembling data, for creating graphical representations of sequences and planning experiments," reports David Diamond, a visiting scientist at the Center for Molecular Biology and Gene Therapy at Loma Linda University in California.
Gene Codes Corp. president Howard Cash calls Sequencher "the dominant program in the North American genome project for DNA sequencing." Clients include biotech companies (such as Human Genome Sciences Inc. in Rockville, Md.; Millennium Pharmaceuticals in Cambridge, Mass.; and Genentech Corp. in South San Francisco, Calif.) and pharmaceutical giants (Monsanto Corp. in St. Louis and Glaxo-Wellcome Corp. in Research Triangle Park, N.C.).
Macintosh-based Sequencher rapidly aligns DNA sequences to derive a consensus sequence and interfaces with automated devices. The software also identifies regions of ambiguous sequence, screens for vectors and transposons (jumping genes), generates restriction maps, and more. Sequencher costs $1,800 for academic researchers and $2,600 for commercial users.
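Deriving a consensus and flagging ambiguous positions can be sketched as a column-by-column vote. The Python example below assumes the reads have already been aligned to equal length, with "-" marking gaps; the alignment itself is the hard part and is not shown. The function name consensus, the min_fraction threshold, and the sample reads are assumptions made for illustration and say nothing about how Sequencher works internally.

```python
from collections import Counter

def consensus(aligned_reads, min_fraction=0.75):
    """Column-by-column majority call over equal-length, pre-aligned reads.

    "-" marks a gap and is ignored; a column is called "N" (ambiguous)
    when no base reaches min_fraction of the non-gap votes.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    calls = []
    for column in zip(*aligned_reads):
        votes = Counter(b for b in column if b != "-")
        if not votes:
            calls.append("N")
            continue
        base, count = votes.most_common(1)[0]
        agreed = count / sum(votes.values()) >= min_fraction
        calls.append(base if agreed else "N")
    return "".join(calls)

# Four made-up reads: one disagrees at position 4, one has a gap at position 3.
reads = ["ACGTAC", "ACGTTC", "ACG-AC", "ACGTAC"]
```

With these reads every column clears the 75 percent threshold, so the consensus is "ACGTAC"; two reads that split evenly at a position would instead yield "N" there, marking a region for the researcher to review.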
Searching For Functions
Once a gene's sequence is known, the next step is to compare it with other known sequences to search for similarities, or homologies, that hint at the gene's function. MacVector, offered by Oxford Molecular Group Ltd. of Campbell, Calif., excels in this area, report several users. It allows easy access via the Internet to Entrez, a database of gene and protein sequences maintained at the National Center for Biotechnology Information (NCBI) in the National Library of Medicine in Bethesda, Md. MacVector links up to NCBI's BLAST service, which searches for and identifies homologies. The package also can access libraries of specialized DNA sequences, such as transcription factors, maintained by the manufacturer.
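The idea of ranking known sequences by similarity to a new one can be shown with a deliberately crude Python sketch. It scores each database entry by the fraction of short subsequences (k-mers) it shares with the query. This is not the BLAST algorithm, which uses a far more sensitive and statistically grounded method; the function names, the k value, and the toy database entries are all hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """All distinct length-k subsequences of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def rank_by_similarity(query: str, database: dict, k: int = 4) -> list:
    """Rank entries by Jaccard similarity of their k-mer sets, best first."""
    q = kmers(query, k)
    scored = []
    for name, seq in database.items():
        s = kmers(seq, k)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        scored.append((name, score))
    # Highest score first; ties broken alphabetically by name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Toy "database" with invented entry names.
toy_db = {"same": "ACGTACGT", "other": "TTTTTTTT"}
```

A shared-k-mer score like this rewards only exact matches of short words, so it would miss the more distant homologies that make real database searches valuable.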
MacVector's versatility and ease of use contribute to its popularity. Gary Swergold, senior staff fellow in the Food and Drug Administration's division of cell and gene therapy regulation in Rockville, Md., uses MacVector to compare newly sequenced genes to known genes. "BLAST searches are much faster than over the [World Wide Web]. Sequence manipulations are, overall, fairly simple because of the program's use of a Mac interface," he reports.
Oxford acquired MacVector early this year from Eastman Kodak Co., which developed the program, and is honoring Kodak's prices for existing academic users. The price otherwise is $2,495.
For sheer number and variety of services, many researchers mention the offerings of the Genetics Computer Group (GCG). The private, Madison, Wis.-based company, which was established in 1990, began in the laboratory of genetics at the University of Wisconsin in 1980. Today, GCG's "Wisconsin Package" reaches nearly 50,000 investigators from 550 VAX mainframes worldwide, according to vice president Maggie Smith.
GCG's 140 programs do nearly everything imaginable with DNA and protein sequences, users say. Michal Prochazka, visiting scientist at the Phoenix Epidemiology and Clinical Research Branch of the National Institute of Diabetes and Digestive and Kidney Diseases, uses GCG to study diabetes genes in the Pima Indians. "This involves a lot of sequencing, determining exon-intron structure of genes, and searches of databases to find if new genes we identify have homology with anything known," he tells The Scientist.
But some researchers say the documentation for GCG is not as helpful as they would like. "Learning to do some new analysis involved consulting colleagues to see if anyone had done something similar," contends Marion B. Coulter-Mackie, an assistant professor of pediatrics at the University of British Columbia in Vancouver.
Next January, GCG will begin charging sites with per-user licensing fees. "We're looking for fees of $100 for an academic user for a year and $200 for a commercial user," Smith says, adding that long-time academic users will pay $50 per year. The licensing fee for an academic/nonprofit institution using GCG is $4,000; for a commercial organization, the figure is $12,000.
On a smaller scale than GCG is Gene Jockey II. It is a classic example of an individual researcher's invention that evolved into a commercial product as colleagues used it and liked it. Philip Taylor, higher research officer at the center for reproductive biology at the Medical Research Council in Edinburgh, devised Macintosh-based Gene Jockey II for his research on pituitary-gland-releasing factor hormones.
"Gene Jockey started out as a sequence editor with a few analysis tools attached and grew from there. Other people in the lab started to use it and requested new features," Taylor relates.
BioSoft Inc. of Ferguson, Mo., markets the software. Company president John Lamble calls Gene Jockey II a "sequence processor for Macintosh." It searches, edits, manipulates, and analyzes. "With the size of current sequence databases, software like Gene Jockey II is essential. Manual searching would take many man-life-times," Lamble says. The price of Gene Jockey II is $1,000.
Cost Concerns
Cost can be a limiting factor in a researcher's choice of software. For example, scientists hail Lasergene software for Windows and Macintosh from DNASTAR Inc. for its ease of use, but the price tag is prohibitive for some labs. This customizable, full-featured program sells complete for $4,100 for academic users and $4,500 for commercial users; periodic updates are an additional $500 to $1,000. Users can bring the price down by tailoring the packages to eliminate some programs, according to Regina Holter, technical software consultant for the Madison, Wis.-based DNASTAR.
Brenda Shirley, an assistant professor of biology at Virginia Polytechnic Institute and State University in Blacksburg, is "thrilled" with DNASTAR. She comments that "it's relatively user-friendly and easy to learn by trial and error." Shirley also likes the program's editing, mapping, and protein-analysis functions. "It also does a very nice job with alignments, one of the main reasons for choosing DNASTAR over other systems."
Oliver Wildner, a visiting fellow at the National Center for Human Genome Research at the National Institutes of Health, cites areas for improvement. "The major disadvantage of [DNASTAR] is that the user has to switch between several different programs," he says, referring to the difficulty and time lag in switching between DNASTAR's programs that perform different functions on the same DNA sequence.
And both Wildner and Shirley find it easier to access public databases directly than through DNASTAR. The company is improving Web access, according to marketing director Patricia Hoyle.
Some researchers prefer more basic packages than those of GCG or DNASTAR, if the programs do what they need. For this reason, many investigators cite DNA Strider, which is known more through word of mouth than through glossy ads. Credited by many to "some guy in France," the program hails from Christian Marck at the Centre d'Etudes de Saclay in France.
Melissa Caimano, a doctoral candidate in the microbiology department at the University of Alabama, Birmingham, calls DNA Strider "just what smaller labs need to get them started at a low initial investment." She uses it for DNA and protein-sequence analysis. "A few things that you can't do are align multiple sequences, compare two sequences, search with sequences containing mismatched base pairs, and link to GenBank," she notes.
Several others agree that DNA Strider doesn't have as many features as other products, but it is fast, inexpensive, and easy to use. "It has a larger following in Europe than the U.S. It has no copyright protection, so I fear many treat it as freeware, which it is not," says Loma Linda's Diamond. The program is available from the author for $200.
Mix And Match
Some scientists use several products to generate the optimal combination of features for their research. "We use MacVector most often. It is a simple program and great for a quick check of sequences for restriction sites, DNA subsequences, simple alignment, or assembling contigs from automated sequencing," relates Michael Sullivan, a pediatric oncologist at the cancer genetics laboratory at the University of Otago in New Zealand. "However, MacVector is limited when it comes to serious sequence crunching, which is when we use the less friendly GCG package," he reports, referring to the size of DNA sequences that can be easily analyzed. "We also have a cutdown DNASTAR package for primer design only," he adds.
Betsy Hosler, a postdoctoral fellow at the Day Neuromuscular Laboratory at Massachusetts General Hospital, uses different software at different stages of her gene research. Cyrillic tracks pedigrees, DNA Strider edits DNA sequences and generates restriction maps, and GCG assembles consensus sequences, corresponding to the beginning, middle, and end of gene analysis. "I have looked around for a coordinated package that would handle all our needs, but haven't yet found anything which doesn't cost more than the computer we would run it on!" she states.
Ever-expanding Internet resources compete with commercially available software, but there is room for both, researchers stress. Krawetz says 300 to 400 Internet sites provide DNA-analysis freeware and shareware for such functions as restriction analysis and database searching. "I don't know how companies are going to respond to that from a marketing perspective. The one thing a company can offer is support for their products," he suggests. Others add that none of the freeware is assembled into a comprehensive package, such as the companies offer.
John Lamble, president of BioSoft Inc., defends commercial products, saying, essentially, that researchers get what they pay for. "As publishers, we go out of our way to be accountable for our programs in terms of updating them and letting people know about improvements. We feel this is a worthwhile advantage which should be paid for. It is in marked contrast to much freeware on the Web, which is dubious in quality to begin with, and for which the author takes little further responsibility," he states.
Software choices will keep pace with genetic research, developers predict. "We are going to continue to need software to display gene sequences, and to fish out homologies," says Gibbs. Future software will enable researchers to study more than one gene at a time, and detect more subtle homologies. Taylor foresees new products for multiple-processor microcomputers that can perform parallel tasks in genome sequencing and analysis.
Gene Jockey developer Taylor sums up the current and future role of DNA analysis software. "There are, or will be, genome projects for every economically and scientifically important organism," he comments. "Even if we only had the Human Genome Project to deal with, sequencing and sequence assembly is just the beginning. Any serious attempt to understand the higher levels of organization will be just as dependent on software."
By Ricki Lewis
Ricki Lewis, a freelance science writer based in Scotia, N.Y., is the author of several biology textbooks. She is online at 76715.3517@compuserve.com.
Copyright ©2001 The McGraw-Hill Companies.