Index of /zaykin/htr/win

      Name                    Last modified       Size  Description

[DIR] Parent Directory        10-Apr-2007 21:43      -
[   ] bash.exe                19-Feb-2002 15:14   468k
[   ] cat.exe                 19-Feb-2002 19:26    17k
[   ] cntcols.exe             08-Jul-2002 16:42   201k
[   ] cygintl-1.dll           13-Dec-2001 05:33    22k
[   ] cygwin1.dll             06-Jul-2002 02:19   883k
[TXT] dat.txt                 02-Aug-2002 10:46     3k
[   ] echo.exe                30-Mar-2001 13:29    20k
[   ] emgi.exe                02-Aug-2002 11:48   698k
[   ] emgi.sh                 25-Jul-2002 18:05     2k
[   ] pcols.exe               08-Jul-2002 16:10   215k
[   ] rm.exe                  15-Jun-2001 15:24    64k
[TXT] wnddat.txt              08-Jul-2002 17:12     5k
[TXT] wndhdr.txt              08-Jul-2002 17:12     1k

// Time-stamp: <2002-07-09 02:19:49 zaykin>
// written by Dmitri Zaykin, zaykin@statgen.ncsu.edu

Programs here and at ftp://statgen.ncsu.edu/pub/zaykin/htr/
implement the Haplotype Trend Regression (HTR) method from

Zaykin DV, Westfall PH, Young SS, Karnoub MC, Wagner MJ, Ehm
MG. 2002. Testing association of statistically inferred haplotypes
with discrete and continuous traits in samples of unrelated
individuals. Human Heredity 53:79-91.

This software is also referenced in Xu C-F, Lewis K, Cantone CL, Khan
P, Donnelly C, White N, Crocker N, Boyd PR, Zaykin DV, Purvis IJ.
2002. The effectiveness of computational methods in haplotype
prediction. Hum Genet 110:148-156.

The programs in the htr.zip archive (or the files in the "win"
subdirectory) are Cygwin ports of my UNIX code to MS Windows. The files
in "src" are the original UNIX source.

Run "emgi.exe" (fixed number of markers) or "bash emgi.sh"
(sliding window script) for brief usage help.

There are two ways to run the programs: (1) the "fixed" set of markers
mode and (2) the "sliding window" mode.

(1) emgi.exe. A possible command line using "dat.txt" file:

  emgi.exe 11 .001 x dat.txt 10000 1234 > out.txt

11 -- # of random EM restarts

.001 -- EM convergence precision

x -- missing data label. If this is a character having special meaning
to the system, it should be quoted, e.g. '?'.

dat.txt -- data file. Each individual is represented by a row. The
first column is the phenotypic value, which may be continuous or
binary. The genotypes follow, two columns per marker. Allele names are
arbitrary characters or strings; they are recoded to integers in the
output unless they are integers already.

10000 -- # of shufflings for the empirical p-value. Use 0 to compute
the asymptotic test.

1234 -- random seed

out.txt -- output file
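
As an illustration of the data file layout described above, here is a
minimal Python sketch that reads rows into a phenotype plus one allele
pair per marker. It is not part of the distributed programs. The
function name parse_rows, the whitespace-separated columns, and the
sample values are all assumptions made for this example, and the sketch
assumes the phenotype itself is never missing.

```python
def parse_rows(text, missing="x"):
    """Parse rows: phenotype first, then two allele columns per marker.

    Assumes whitespace-separated columns (an assumption of this sketch).
    Alleles equal to the missing-data label are returned as None.
    """
    individuals = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        phenotype = float(fields[0])
        alleles = fields[1:]
        if len(alleles) % 2 != 0:
            raise ValueError("expected two allele columns per marker")
        genotypes = [
            tuple(None if a == missing else a for a in alleles[i:i + 2])
            for i in range(0, len(alleles), 2)
        ]
        individuals.append((phenotype, genotypes))
    return individuals


# Made-up sample: 3 individuals, 2 markers, 'x' marks a missing allele.
sample = """\
1.25 A G 1 2
0.80 A A 2 2
2.10 G x 1 1
"""
rows = parse_rows(sample)
```

Each element of rows is a (phenotype, genotypes) pair, where genotypes
holds one two-allele tuple per marker.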

(2) The second mode is the sliding window. The program is a shell
script, emgi.sh, which is a wrapper around the emgi.exe binary. Example
using the provided data and header files, with a window size of 3
markers and missing data coded as 'x':

  bash emgi.sh wnddat.txt wndhdr.txt 3 x > wndout.txt
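
To make the window idea concrete, the Python sketch below lists which
markers fall into each window. It assumes that consecutive windows are
shifted by one marker, which is not stated here and may differ from
what emgi.sh does; the function name marker_windows is hypothetical.

```python
def marker_windows(n_markers, size):
    """Return lists of 0-based marker indices, one list per window.

    Assumes consecutive windows are shifted by one marker (an
    assumption of this sketch, not taken from emgi.sh).
    """
    if size < 1 or size > n_markers:
        raise ValueError("window size must be between 1 and n_markers")
    return [list(range(start, start + size))
            for start in range(n_markers - size + 1)]


# Five markers with a window of 3 give three overlapping windows.
windows = marker_windows(5, 3)
```

Under this assumption, M markers and a window of size w give M - w + 1
windows.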

The script is set up to compute asymptotic p-values, for speed. To
compute shuffled ones instead, replace num_runs=0 with, say,
num_runs=10000 in the script.
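
For readers unfamiliar with shuffled (empirical) p-values, here is a
generic Python sketch of the idea: shuffle the phenotypes, recompute
the test statistic, and count how often it is at least as large as the
observed one. This is not the HTR code. The stand-in statistic
(absolute covariance), the function names, the toy data, and the
(count + 1) / (num_runs + 1) convention are assumptions made for the
example.

```python
import random


def permutation_p_value(phenotypes, predictor, statistic, num_runs, seed):
    """Empirical p-value: share of shuffles with statistic >= observed.

    Uses (count + 1) / (num_runs + 1), a common convention chosen for
    this sketch; the HTR programs may count differently.
    """
    rng = random.Random(seed)
    observed = statistic(phenotypes, predictor)
    shuffled = list(phenotypes)
    tol = 1e-12  # tolerance so floating-point ties count as ">="
    count = 0
    for _ in range(num_runs):
        rng.shuffle(shuffled)  # break the phenotype/predictor pairing
        if statistic(shuffled, predictor) >= observed - tol:
            count += 1
    return (count + 1) / (num_runs + 1)


def abs_covariance(y, x):
    """Stand-in test statistic: absolute sample covariance."""
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    return abs(sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x))) / (n - 1)


# Toy data: phenotype is clearly higher when the predictor is 1.
y = [1.0, 1.2, 0.9, 2.1, 2.3, 2.0]
x = [0, 0, 0, 1, 1, 1]
p = permutation_p_value(y, x, abs_covariance, num_runs=1000, seed=1234)
```

More shuffles give a finer-grained p-value at the cost of run time,
which is why the asymptotic test is the faster default.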


**** Random notes ****

1) On occasion the program produces the following uninformative
message:

  "Numerical Recipes run-time error...
  a or b too big, or MAXIT too small in betacf
  ...now exiting to system..."

This error arises in the p-value computation for the multiple
regression F statistic and indicates low variance of the estimated
haplotypes (or of the phenotype) in the complete-data subset. My
program checks for this, and the p-value should be set to 1 in the
output (you can't get a significant result in such cases). In "usual"
data sets, only a small proportion of would-be non-significant results
generate this message.