Association testing#

admix assoc \
    --pfile <geno_file_prefix> \
    --pheno <pheno_file> \
    --method ATT,TRACTOR \
    --family quant \
    --quantile-normalize True \
    --out toy-admix

Parameter options#

admix.cli.assoc(pfile: str, pheno: str, out: str, method: str | List[str] = 'ATT', family: str = 'quant', quantile_normalize: bool = False, snp_list: str | None = None, fast: bool = True)

Perform association testing.

Parameters:
  • pfile (str) – Prefix to the PLINK2 file (.pgen should not be added). When using a method requiring local ancestry, a matching <pfile>.lanc file should also exist.

  • pheno (str) – Path to the phenotype file. The text file should be space delimited, with a header and one individual per row. 1st column: individual ID. 2nd column: phenotype values. 3rd - nth columns: covariates. All columns will be used for the analysis. Missing values should be encoded as “NA”: individuals with a missing phenotype are removed from the analysis, and missing covariate values are imputed with the mean of that covariate. A binary phenotype should be encoded as 0 and 1, and --family binary should be used. Categorical covariates are converted to one-hot encodings internally.

  • out (str) – Path to the output file. <out>.<method>.assoc will be created for each method.

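  • method (Union[str, List[str]]) – Method to use for association analysis (default ATT). Other methods include TRACTOR, ADM, SNP1, and HET. On the command line, several methods can be given as a comma-separated list, e.g. --method ATT,TRACTOR.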

  • family (str) – Family to use for association analysis (default quant). One of quant or binary.

  • quantile_normalize (bool) – Whether to quantile normalize the phenotype and every covariate. When --family binary is used, quantile normalization will only be applied to covariates.

  • snp_list (str) – Path to the SNP list file. Each line should be a SNP ID. Only SNPs in the list will be used for the analysis.

  • fast (bool) – Whether to use fast mode (default True).
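
The phenotype file layout described for pheno can be sketched in plain Python as follows. This is a minimal illustration, not part of admix: the helper names (format_value, pheno_text), the column names (indiv, PHENO, AGE, SEX), and the sample values are all hypothetical. It shows only the documented conventions: space-delimited columns, a header row, ID first, phenotype second, covariates after, and “NA” for missing values.

```python
import math


def format_value(value):
    """Render one cell, writing missing values (None or NaN) as "NA"."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "NA"
    text = str(value)
    if " " in text:
        # a space inside a value would be read as a column break
        raise ValueError(f"value {text!r} contains a space, the column delimiter")
    return text


def pheno_text(rows, columns):
    """Build the file content: header line, then one line per individual."""
    header = " ".join(columns)
    body = [" ".join(format_value(row.get(col)) for col in columns) for row in rows]
    return "\n".join([header] + body) + "\n"


# hypothetical example data: ID, phenotype, then two covariates
example_columns = ["indiv", "PHENO", "AGE", "SEX"]
example_rows = [
    {"indiv": "ID001", "PHENO": 1.25, "AGE": 54, "SEX": "F"},
    {"indiv": "ID002", "PHENO": None, "AGE": 61, "SEX": "M"},          # missing phenotype
    {"indiv": "ID003", "PHENO": -0.4, "AGE": float("nan"), "SEX": "F"},  # missing covariate
]

print(pheno_text(example_rows, example_columns), end="")
```

Writing the returned text to a file such as toy-admix.pheno gives a file that can be passed to --pheno. Under the rules above, ID002 would be dropped for its missing phenotype and ID003 would have AGE imputed.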

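The quantile_normalize option is described above only at a high level. As an illustration of what quantile normalization commonly means for a phenotype or covariate, here is a rank-based inverse normal transform in plain Python. Whether admix uses this exact formula (the (rank - 0.5) / n offset and the average-rank handling of ties) is an assumption, and the function name rank_inverse_normal is hypothetical rather than part of admix.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map values to standard-normal quantiles based on their ranks.

    Ties receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # extend the run [start, end] over all values tied with values[order[start]]
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (rank - 0.5) / n lies strictly between 0 and 1, so inv_cdf is defined
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


z = rank_inverse_normal([10.0, 200.0, 3.0])
print(z)
```

The transform keeps the ordering of the values but removes the influence of their scale, so an outlier such as 200.0 ends up no further from the center than the smallest value does.
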
Additional notes#

Running parallel jobs#

To parallelize the analysis, use the --snp-list option to split it into multiple jobs, each analyzing a subset of SNPs. --snp-list accepts the path to a file listing SNP IDs (one SNP per line). For example, to split the above job into 10 jobs, we run the following code to create the snplist files:

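import os

import admix
import numpy as np

DSET_PREFIX = "/path/to/pfile"  # e.g. "toy-admix"
N_JOB = 10

dset = admix.io.read_dataset(DSET_PREFIX)
# split the SNP IDs into N_JOB roughly equal chunks
index_list = np.array_split(dset.snp.index, N_JOB)

os.makedirs("cache", exist_ok=True)
# name the lists after the prefix's base name so a full path still works
name = os.path.basename(DSET_PREFIX)
for i, index in enumerate(index_list):
    np.savetxt(f"cache/{name}.{i}.snplist", index, fmt="%s")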
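
# note the added --snp-list line and the per-job --out prefix
# (with a shared --out, every job would write the same <out>.<method>.assoc files)
# replace ${JOB_ID} with 0, 1, 2, ..., N_JOB - 1 (0 through 9 for 10 jobs)
admix assoc \
    --pfile toy-admix \
    --pheno toy-admix.pheno \
    --method ATT,TRACTOR \
    --quantile-normalize True \
    --snp-list cache/toy-admix.${JOB_ID}.snplist \
    --out toy-admix.${JOB_ID}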
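
To avoid editing the command by hand for every job, the per-job command lines can be generated programmatically. The sketch below uses only the flags shown on this page. The helper name assoc_command is hypothetical, and giving each job its own --out prefix is an assumption made here so that the <out>.<method>.assoc files of different jobs do not collide.

```python
import shlex


def assoc_command(job_id, pfile="toy-admix", pheno="toy-admix.pheno",
                  methods=("ATT", "TRACTOR"), snplist_dir="cache"):
    """Return the shell command for one job as a single quoted string."""
    args = [
        "admix", "assoc",
        "--pfile", pfile,
        "--pheno", pheno,
        "--method", ",".join(methods),
        "--quantile-normalize", "True",
        "--snp-list", f"{snplist_dir}/{pfile}.{job_id}.snplist",
        # assumption: a distinct output prefix per job
        "--out", f"{pfile}.{job_id}",
    ]
    return shlex.join(args)


N_JOB = 10
commands = [assoc_command(job_id) for job_id in range(N_JOB)]
for command in commands:
    print(command)
```

Each printed line can be submitted as one job to a scheduler or run through a tool such as GNU parallel; which of these to use depends on the local environment.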

Methodology details#

For background, we recommend reading “On powerful GWAS in admixed populations”, Nature Genetics (2021).