Association testing
admix assoc \
--pfile <geno_file_prefix> \
--pheno <pheno_file> \
--method ATT,TRACTOR \
--family quant \
--quantile-normalize True \
--out toy-admix
Parameter options
admix.cli.assoc(pfile: str, pheno: str, out: str, method: str | List[str] = 'ATT', family: str = 'quant', quantile_normalize: bool = False, snp_list: str | None = None, fast: bool = True)
Perform association testing.
Parameters:
- pfile (str) – Prefix of the PLINK2 file set (do not append .pgen). When using a method that requires local ancestry, a matching <pfile>.lanc file must also exist.
- pheno (str) – Path to the phenotype file. The text file should be space-delimited, with a header and one individual per row. 1st column: individual ID. 2nd column: phenotype values. 3rd to nth columns: covariates. All columns are used in the analysis. Missing values should be encoded as "NA": individuals with a missing phenotype are removed from the analysis, and missing covariate values are imputed with the mean of that covariate. Binary phenotypes should be encoded as 0 and 1, and --family binary should be used. Categorical covariates are converted to one-hot encodings internally.
- out (str) – Path to the output file. <out>.<method>.assoc will be created.
- method (Union[str, List[str]]) – Method to use for association analysis (default ATT). Other methods include: TRACTOR, ADM, SNP1, HET.
- family (str) – Family to use for association analysis (default quant). One of quant or binary.
- quantile_normalize (bool) – Whether to quantile normalize the phenotype and every covariate (default False). When --family binary is used, quantile normalization is applied only to the covariates.
- snp_list (str) – Path to the SNP list file. Each line should be a SNP ID. Only SNPs in the list will be used for the analysis.
- fast (bool) – Whether to use fast mode (default True).
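The phenotype file layout described above can be produced with pandas. The following is a minimal sketch, not part of admix-kit: the helper `write_pheno`, the individual IDs, and the column names (`indiv`, `trait`, `age`, `sex`) are hypothetical and chosen only for illustration. It writes a space-delimited table with a header, the ID in the first column, the phenotype in the second, covariates afterwards, and missing values encoded as "NA".

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data: IDs, phenotype values, and covariate names are made up.
pheno = pd.DataFrame(
    {
        "indiv": ["ind1", "ind2", "ind3", "ind4"],  # 1st column: individual ID
        "trait": [0.52, np.nan, -1.30, 0.08],       # 2nd column: phenotype (one missing)
        "age": [34, 51, np.nan, 46],                # covariate with a missing value
        "sex": ["F", "M", "F", "M"],                # categorical covariate
    }
)


def write_pheno(df, path_or_buf):
    """Write df space-delimited, with a header row and 'NA' for missing values."""
    df.to_csv(path_or_buf, sep=" ", na_rep="NA", index=False)


# Write to an in-memory buffer here; pass a file path instead to create a real file.
buf = io.StringIO()
write_pheno(pheno, buf)
pheno_text = buf.getvalue()
print(pheno_text)
```

To create a file for `--pheno`, call `write_pheno(pheno, "toy-admix.pheno")` with your own table in place of the toy data.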
Additional notes
Running parallel jobs
To parallelize the analysis, use the --snp-list option to split it into multiple jobs, each analyzing a subset of SNPs. --snp-list accepts the path to a file listing SNP IDs (one SNP per line). For example, to split the job above into 10 jobs, first run the following Python code to create the SNP list files:
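import os
import admix
import numpy as np
DSET_PREFIX = "/path/to/pfile"  # e.g. "toy-admix"
dset = admix.io.read_dataset(DSET_PREFIX)
# split the SNP IDs into 10 roughly equal chunks, one per job
index_list = np.array_split(dset.snp.index, 10)
# use the basename so that DSET_PREFIX may contain directories
name = os.path.basename(DSET_PREFIX)
os.makedirs("cache", exist_ok=True)
for i, index in enumerate(index_list):
    np.savetxt(f"cache/{name}.{i}.snplist", index, fmt="%s")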
# note the added --snp-list line
# replace ${JOB_ID} with 0, 1, 2, ..., N_JOB - 1 (here N_JOB = 10)
admix assoc \
--pfile toy-admix \
--pheno toy-admix.pheno \
--method ATT,TRACTOR \
--quantile-normalize True \
--snp-list cache/toy-admix.${JOB_ID}.snplist \
--out toy-admix.${JOB_ID}  # per-job prefix so jobs do not overwrite each other's <out>.<method>.assoc
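The per-job commands can also be generated programmatically instead of substituting the job ID by hand. The sketch below is illustrative only: `build_assoc_command` is a hypothetical helper, not part of admix-kit, and it uses only the flags shown above. Giving each job its own `--out` prefix is an assumption made here so that jobs do not write to the same `<out>.<method>.assoc` file.

```python
import shlex


def build_assoc_command(job_id, pfile="toy-admix", pheno="toy-admix.pheno",
                        methods=("ATT", "TRACTOR"), snplist_dir="cache"):
    """Return the argument list for one `admix assoc` job (hypothetical helper)."""
    return [
        "admix", "assoc",
        "--pfile", pfile,
        "--pheno", pheno,
        "--method", ",".join(methods),
        "--quantile-normalize", "True",
        "--snp-list", f"{snplist_dir}/{pfile}.{job_id}.snplist",
        # Assumption: a per-job output prefix keeps results from separate jobs apart.
        "--out", f"{pfile}.{job_id}",
    ]


N_JOB = 10
commands = [build_assoc_command(i) for i in range(N_JOB)]

# Print the shell-quoted command lines; nothing is executed here.
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to run the job locally, or the printed lines can be written into the job script of a cluster scheduler.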
Methodology details
For background, we recommend reading "On powerful GWAS in admixed populations", Nature Genetics (2021).