Bioinformatics II (Homework Assignment 1)

Consider a pairwise alignment between Sequences A and B.  If a residue from Sequence A is aligned to
a gap character, it is sometimes useful to determine whether the resulting alignment position is part
of an "internal" gap or part of a "terminal" gap.  In this case, the alignment position would
be part of an internal gap if there is at least one residue in Sequence B that precedes it
in the alignment and there is at least one residue in Sequence B that follows it
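in the alignment.  Otherwise, the alignment position is said to be part of a terminal gap.  The same
rule applies, with the roles of the two sequences exchanged, when a residue from Sequence B is
aligned to a gap character.  For example, the following alignment ...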

AGT-CTTG--
---ACTAGGA

has 5 positions that are part of terminal gaps (3 positions on the 5' end and
2 on the 3' end) and 1 position that is an internal gap.
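
The definition above can be checked mechanically.  What follows is only a minimal sketch in Python: the function name classify_gaps and the choice of "-" as the gap character are assumptions made for illustration, and a row that consists entirely of gaps is arbitrarily counted on the 5' end because the definition does not single out an end in that case.

```python
def classify_gaps(row_a, row_b):
    """Count gap positions in a pairwise alignment.

    row_a and row_b are the two aligned rows (same length, '-' is a gap).
    Returns (internal, terminal_5prime, terminal_3prime).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    def residue_span(row):
        # Indices of the first and last residues (non-gap characters).
        idx = [i for i, ch in enumerate(row) if ch != "-"]
        return (idx[0], idx[-1]) if idx else (len(row), -1)

    internal = five = three = 0
    # Look at gaps in row_a (opposite residues of row_b), then the reverse.
    for gapped, other in ((row_a, row_b), (row_b, row_a)):
        first, last = residue_span(gapped)
        for i, ch in enumerate(gapped):
            if ch != "-" or other[i] == "-":
                continue  # not a residue-versus-gap position
            if i < first:
                five += 1      # no residue of the gapped row precedes it
            elif i > last:
                three += 1     # no residue of the gapped row follows it
            else:
                internal += 1  # residues on both sides
    return internal, five, three


print(classify_gaps("AGT-CTTG--", "---ACTAGGA"))  # (1, 3, 2)
```

For the example alignment this reports 1 internal position, 3 terminal positions on the 5' end and 2 on the 3' end, in agreement with the counts given above.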
 

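Write a program that implements a dynamic programming algorithm for global alignment of DNA sequence pairs.
The program should find the alignment with the optimal score.  Because matches contribute 0 and every other
kind of position adds a positive amount, the optimal score is the lowest achievable one.  The score of an alignment is: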

3*(# of alignment positions that are "internal" gaps) +
2*(# of alignment positions that are in "terminal" gaps on the 5' end) +
2*(# of alignment positions that are in "terminal" gaps on the 3' end) +
2*(# of alignment positions that are mismatches) +
0*(# of alignment positions that are matches).
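
As a check on the scoring rule, the score of an alignment that is already given can be computed directly.  This is a sketch under stated assumptions, not a required interface: the name score_alignment and the constant names are hypothetical, "-" is taken to be the gap character, and residues are compared after conversion to upper case.

```python
INTERNAL_GAP = 3   # per position in an internal gap
TERMINAL_GAP = 2   # per position in a terminal gap (5' or 3' end)
MISMATCH = 2
MATCH = 0


def score_alignment(row_a, row_b):
    """Return the score of an alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    row_a, row_b = row_a.upper(), row_b.upper()

    def residue_span(row):
        # Indices of the first and last residues (non-gap characters).
        idx = [i for i, ch in enumerate(row) if ch != "-"]
        return (idx[0], idx[-1]) if idx else (len(row), -1)

    span_a, span_b = residue_span(row_a), residue_span(row_b)
    total = 0
    for i, (a, b) in enumerate(zip(row_a, row_b)):
        if a == "-" and b == "-":
            raise ValueError("a column may not contain two gap characters")
        if a == "-" or b == "-":
            # The gap is internal only if the gapped row has residues
            # both before and after this column.
            first, last = span_a if a == "-" else span_b
            total += INTERNAL_GAP if first < i < last else TERMINAL_GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(score_alignment("AGT-CTTG--", "---ACTAGGA"))  # 15
```

The example alignment has 1 internal gap position, 5 terminal gap positions, 1 mismatch and 3 matches, so its score is 3*1 + 2*5 + 2*1 + 0*3 = 15.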

For a given pair of sequences, there may be multiple alignments that achieve the optimal score.  The
program should print one of these optimal alignments.  The program should also report the optimal score.
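
One possible way to organize the dynamic programming is sketched below in Python.  It is an illustration, not the required solution.  It treats the score as a cost to be minimized, and it relies on the observation that a gap opposite a residue of Sequence A is terminal exactly when either none or all of the residues of Sequence B have already been placed (and likewise with A and B exchanged), so the gap cost depends only on the current row or column of the table.  The function name global_align, its parameter names, the "-" gap character and the case-sensitive comparison of residues are all assumptions.

```python
def global_align(a, b, match=0, mismatch=2, internal_gap=3, terminal_gap=2):
    """Return (score, row_a, row_b) for one minimum-score global alignment."""
    n, m = len(a), len(b)

    def gap_in_b(j):
        # A residue of `a` faces a gap while j residues of `b` are placed.
        return terminal_gap if j == 0 or j == m else internal_gap

    def gap_in_a(i):
        # A residue of `b` faces a gap while i residues of `a` are placed.
        return terminal_gap if i == 0 or i == n else internal_gap

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                sub = match if a[i - 1] == b[j - 1] else mismatch
                c = cost[i - 1][j - 1] + sub
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "D"
            if i > 0:            # align a[i-1] with a gap
                c = cost[i - 1][j] + gap_in_b(j)
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "U"
            if j > 0:            # align b[j-1] with a gap
                c = cost[i][j - 1] + gap_in_a(i)
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "L"

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif step == "U":
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


score, row_a, row_b = global_align("ACGT", "AGT")
print(score)
print(row_a)
print(row_b)
```

For the small pair ACGT and AGT, the best alignment places a single internal gap opposite the C, for a score of 3.  When several alignments tie, the order in which the three moves are tried decides which one is reported.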

The program should read sequence files that are in "FASTA" format.  In this format, the name of a sequence is on a line that begins with the character ">".
The sequence is listed on the subsequent lines and is assumed to end when the file ends or the next ">" is encountered.
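
A minimal reader for this format might look like the following Python sketch.  The function name parse_fasta is hypothetical, and the handling of blank lines (skipped) and of text before the first ">" (ignored) is an assumption, since the description above does not cover those cases.

```python
def parse_fasta(lines):
    """Return a list of (name, sequence) pairs from FASTA-formatted lines.

    `lines` may be an open file or any iterable of strings.
    """
    records = []
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # assumption: blank lines are skipped
        if line.startswith(">"):
            if name is not None:          # the next ">" ends the previous sequence
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:            # sequence text may span several lines
            chunks.append(line)
    if name is not None:                  # the end of the input ends the last sequence
        records.append((name, "".join(chunks)))
    return records


example = [">seq1 first\n", "ACGT\n", "AC\n", "\n", ">seq2\n", "TTG"]
print(parse_fasta(example))  # [('seq1 first', 'ACGTAC'), ('seq2', 'TTG')]
```

To read from disk, an open file can be passed directly, for example inside a with open(path) as handle: block, where path stands for whatever file name is used.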

The sequence file that you should use as input when you hand in the output of your program is here.

The homework should be emailed to me (thorne@ncsu.edu) before class on Wednesday February 1.

In the email, you should include: the computer code that you have written, the command
that I can use to compile the program on a unix-type system (e.g. Mac OS X, Linux, Unix),
the command that I can use to run the program on a unix-type system, and the output that resulted.

(this page's address is http://statgen.ncsu.edu/thorne/bioinf2hwk1.html)