Cochise Linux Users Group
Education


Sequence Analysis Tools

BLAST - BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA.

EMBOSS - EMBOSS (The European Molecular Biology Open Software Suite) is an open source software analysis package specially developed for the needs of the molecular biology user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.

BLAT - BLAT (BLAST-Like Alignment Tool) works by keeping an index of the entire human genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences but an index derived from the current assembly of the entire draft human genome.

HMMER - HMMER provides profile hidden Markov models for biological sequence analysis.

Data Formats

DDBJ - DDBJ (DNA Data Bank of Japan) is the sole DNA data bank in Japan that is officially certified to collect DNA sequences from researchers and to issue internationally recognized accession numbers to data submitters. Since the collected data are exchanged with EMBL/EBI and GenBank/NCBI on a daily basis, the three data banks share virtually the same data at any given time.

EMBL - The EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database constitutes Europe's primary nucleotide sequence resource.

GenBank - GenBank is the NIH (National Institutes of Health) genetic sequence database, an annotated collection of all publicly available DNA sequences. There were approximately 22,617,000,000 bases in 18,197,000 sequence records as of August 2002.

Pfam - Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein families.

PROSITE - PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.

SWISS-PROT - Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

Common Tables

Nucleic Acids - Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences

Genetic Codes - The Genetic Codes, compiled by Andrzej (Anjay) Elzanowski and Jim Ostell, National Center for Biotechnology Information (NCBI), Bethesda, MD.

Amino Acids - Nomenclature and Symbolism for Amino Acids and Peptides
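
As a rough illustration of what the nucleic acid table covers, the Python sketch below encodes the standard IUPAC single-letter codes for incompletely specified bases and uses them to test whether a DNA sequence is compatible with an ambiguous pattern. The names IUPAC_BASES, expand, and matches are hypothetical helpers introduced here for illustration; they do not come from any of the linked sites.

```python
# IUPAC single-letter codes for nucleotides, mapped to the set of
# concrete DNA bases each code may stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "S": "CG",   # strong pairing
    "W": "AT",   # weak pairing
    "K": "GT",   # keto
    "M": "AC",   # amino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def expand(code):
    """Return the concrete bases that a single IUPAC code may represent."""
    return IUPAC_BASES[code.upper()]


def matches(pattern, sequence):
    """Check position by position whether a concrete DNA sequence is
    compatible with a pattern written in IUPAC codes."""
    if len(pattern) != len(sequence):
        return False
    return all(base.upper() in expand(symbol)
               for symbol, base in zip(pattern, sequence))


if __name__ == "__main__":
    print(expand("R"))
    print(matches("GAYTN", "GATTC"))
    print(matches("GAYTN", "GAATC"))
```

A plain dictionary keeps the table easy to compare against the published nomenclature; RNA (U in place of T) is left out of this sketch.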


European Bioinformatics Institute - The European Bioinformatics Institute (EBI) is a non-profit academic organization that forms part of the European Molecular Biology Laboratory (EMBL). The EBI is a center for research and services in bioinformatics. The Institute manages databases of biological data including nucleic acid sequences, protein sequences and macromolecular structures.

UCSC Genome Bioinformatics Site - This site contains the reference sequence for the human genome and the working drafts for the mouse and rat genomes.

Virginia Bioinformatics Institute

Bioperl - The Bioperl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.

Biojava - The BioJava Project is an open-source project dedicated to providing Java tools for processing biological data.

The WWW Virtual Library: Model Organisms - This site is a catalog of Internet resources relating to biological model organisms.


WIMS - WWW Interactive Mathematics Server is an Internet server system designed for mathematical educational purposes.


GRASS - GRASS GIS (Geographic Resources Analysis Support System) is an open source, free software Geographical Information System (GIS) with raster, topological vector, image processing, and graphics production functionality that operates on various platforms through a graphical user interface and shell in X-Windows.

SWAT - SWAT (Soil and Water Assessment Tool) is an ARS hydrologic model developed to predict the impact of land management practices on water, sediment and agricultural chemical yields in large complex watersheds with varying soils, land use and management conditions over long periods of time.


Celestia - Celestia is an OpenGL-based 3D space simulation for Unix and Win32 that lets you travel through the solar system, to the stars, and even beyond the galaxy. Visit over 100,000 stars, 100 solar system bodies, and all known extrasolar planets.

Education Links

K-12 Linux Project - This web site provides information on setting up Linux servers and workstations in a classroom or for school projects.

SEUL - The end goal of SEUL is to have a comprehensive suite of high-quality applications (productivity applications as well as leisure/programming applications) available under the GPL for the Linux platform, as well as a broader base of educated users around the world who understand why free software is better.

SAL - SAL (Scientific Applications on Linux) is a collection of information and links to software that will be of interest to scientists and engineers.

SchoolForge - Schoolforge's mission is to unify independent organizations that advocate, use, and develop open resources for primary and secondary education. Schoolforge is intended to empower member organizations to make open educational resources more effective, efficient, and ubiquitous by enhancing communication, sharing resources, and increasing the transparency of development.

LTSP - The LTSP (Linux Terminal Server Project) is all about running thin client computers in a GNU/Linux environment.

