Genetic correlation estimation

Step 0: Prepare data

  1. Phased genotypes and inferred local ancestry (see the guide on preparing the dataset). After this step you should have ${prefix}.chr${chrom}.[pgen|psam|pvar|lanc] files.

  2. Phenotype and covariates file per trait ${trait}.txt.
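
As an illustration of the phenotype file layout (ID in the first column, phenotype in the second, covariates in the remaining columns, as described under the `pheno` parameter), here is a minimal sketch that writes such a table. The tab delimiter, the header row, and the column names `indiv`, `pheno`, `AGE`, and `SEX` are assumptions made for this example and are not specified in this guide; the values are toy data.

```python
import csv
import io


def write_pheno_table(rows, covariate_names, handle):
    """Write one individual per row: ID, phenotype, then covariates."""
    # Assumption: tab-separated text with a header row.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Hypothetical column names; only the column order comes from the docs.
    writer.writerow(["indiv", "pheno", *covariate_names])
    for indiv_id, pheno, covariates in rows:
        writer.writerow([indiv_id, pheno, *covariates])


# Toy values for illustration only.
toy_rows = [
    ("ID1", 1.25, (34, 0)),
    ("ID2", -0.4, (51, 1)),
]

buffer = io.StringIO()
write_pheno_table(toy_rows, ["AGE", "SEX"], buffer)
pheno_text = buffer.getvalue()
print(pheno_text, end="")
```

To write to disk instead, pass a handle from `open("trait.txt", "w", newline="")` in place of the in-memory buffer.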

With these files, you can run the following commands to estimate \(r_\text{admix}\).

Step 1: Compute the GRMs \(\mathbf{K}_1\) and \(\mathbf{K}_2\) for each chromosome

mkdir -p ${out_dir}/admix-grm
admix admix-grm \
    --pfile ${prefix}.chr${chrom} \
    --out-prefix ${out_dir}/admix-grm/chr${chrom}

This step will generate ${out_dir}/admix-grm/chr${chrom}.[grm.bin|grm.id|grm.n|weight.tsv] files.
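
Because Step 1 runs once per chromosome, it can help to generate the commands programmatically. The sketch below builds one argument list per chromosome using only the `--pfile` and `--out-prefix` flags shown above. It assumes chromosomes 1 to 22, and `data/cohort` and `out` are placeholder paths.

```python
import shlex


def admix_grm_commands(prefix, out_dir, chroms=range(1, 23)):
    """Return one `admix admix-grm` argument list per chromosome."""
    commands = []
    for chrom in chroms:
        commands.append([
            "admix", "admix-grm",
            "--pfile", f"{prefix}.chr{chrom}",
            "--out-prefix", f"{out_dir}/admix-grm/chr{chrom}",
        ])
    return commands


# Placeholder paths; replace with your own ${prefix} and ${out_dir}.
commands = admix_grm_commands("data/cohort", "out")
for argv in commands:
    # Print the command line; to execute it instead, use
    # subprocess.run(argv, check=True) after creating the output directory.
    print(shlex.join(argv))
```

Printing first makes it easy to inspect the commands or hand them to a job scheduler, since each chromosome is independent.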

Step 2: Merge GRMs across chromosomes

admix admix-grm-merge \
    --prefix ${out_dir}/admix-grm/chr \
    --out-prefix ${out_dir}/admix-grm/merged

This step will generate ${out_dir}/admix-grm/merged.[grm.bin|grm.id|grm.n|weight.tsv] files.
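
To give some intuition for what merging means, the sketch below combines per-chromosome GRMs as a weighted average. The exact weighting used by `admix-grm-merge` is not described here; weighting by per-chromosome SNP counts is an assumption borrowed from common GRM-merging practice, and the function name `merge_grms` and the matrices are hypothetical toy inputs.

```python
def merge_grms(grms, n_snps):
    """Weighted average of square matrices (lists of lists), weighted by n_snps."""
    if not grms or len(grms) != len(n_snps):
        raise ValueError("need one SNP count per GRM")
    total = sum(n_snps)
    size = len(grms[0])
    merged = [[0.0] * size for _ in range(size)]
    for grm, n in zip(grms, n_snps):
        for i in range(size):
            for j in range(size):
                # Each chromosome contributes in proportion to its SNP count.
                merged[i][j] += n * grm[i][j] / total
    return merged


# Toy 2x2 GRMs for two "chromosomes" (illustrative values only).
grm_a = [[1.0, 0.2], [0.2, 1.0]]
grm_b = [[1.0, 0.6], [0.6, 1.0]]
merged = merge_grms([grm_a, grm_b], n_snps=[300, 100])
print(merged)
```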

Step 3: Compute the GRM \(\mathbf{K}_1 + r_\text{admix} \mathbf{K}_2\) and estimate the log-likelihood at each \(r_\text{admix}\) value on a grid

admix genet-cor \
    --pheno ${trait}.txt \
    --grm-prefix ${out_dir}/admix-grm/merged \
    --out-dir ${out_dir}/estimate/${trait}
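
Once a log-likelihood is available for each grid value of \(r_\text{admix}\), the grid can be summarized by its maximizer and by the range of values whose log-likelihood is close to the maximum. The sketch below shows one common convention (values within 1.92 log-likelihood units of the maximum, the 95% likelihood-ratio threshold for one parameter). It is not necessarily the summary that `admix genet-cor` reports. The output format of `genet-cor` is not described here, so the inputs are plain lists; `summarize_grid` is a hypothetical helper and the log-likelihood curve is a toy function, not real output.

```python
def summarize_grid(rg_grid, loglik, drop=1.92):
    """Return (grid maximizer, (lowest, highest) grid value within `drop` of the max)."""
    if not rg_grid or len(rg_grid) != len(loglik):
        raise ValueError("rg_grid and loglik must be non-empty and equally long")
    best_idx = max(range(len(loglik)), key=loglik.__getitem__)
    max_ll = loglik[best_idx]
    supported = [rg for rg, ll in zip(rg_grid, loglik) if max_ll - ll <= drop]
    return rg_grid[best_idx], (min(supported), max(supported))


# Same grid as the documented default: 21 evenly spaced values from 0 to 1.
rg_grid = [round(0.05 * i, 2) for i in range(21)]
# Toy log-likelihood curve peaked at 0.9 (illustration only).
toy_loglik = [-50.0 * (rg - 0.9) ** 2 for rg in rg_grid]

rg_hat, interval = summarize_grid(rg_grid, toy_loglik)
print(rg_hat, interval)
```

The resolution of both the estimate and the interval is limited by the grid spacing, which is worth keeping in mind when choosing `rg_grid`.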

Parameter options

admix.cli.admix_grm(pfile: str, out_prefix: str, maf_cutoff: float = 0.005, her_model='mafukb', freq_cols=['LANC_FREQ1', 'LANC_FREQ2'], snp_chunk_size: int = 256, snp_list: str | None = None, write_raw: bool = False) -> None

Calculate the admix GRM for a given pfile

Parameters:
  • pfile (str) – Path to the pfile

  • out_prefix (str) – Prefix of the output files

  • maf_cutoff (float, optional) – MAF cutoff for the admixed individuals, by default 0.005

  • her_model (str, optional) – Heritability model, one of “uniform”, “gcta”, “ldak”, “mafukb”; by default “mafukb”

  • freq_cols (List[str], optional) – Frequency columns of the pfile used to apply the ancestry-specific MAF cutoffs, by default [“LANC_FREQ1”, “LANC_FREQ2”]

  • snp_chunk_size (int, optional) – Number of SNPs to read at a time, by default 256. This can be tuned to reduce memory usage

  • snp_list (str, optional) – Path to a file containing a list of SNPs to use. Each line should be a SNP ID. Only SNPs in the list will be used for the analysis. By default None

  • write_raw (bool, optional) – Whether to write the raw GRM, G1, G2, G12, by default False

Returns:

  • GRM files ({out_prefix}.[K1, K2].[grm.bin | grm.id | grm.n] will be generated)

  • Weight file ({out_prefix}.weight.tsv will be generated)

admix.cli.admix_grm_merge(prefix: str, out_prefix: str, n_part: int = 22) -> None

Merge multiple GRM matrices

Parameters:
  • prefix (str) – Prefix of the GRM files, any files with the pattern of <prefix>.* will be merged

  • out_prefix (str) – Prefix of the output file

  • n_part (int, optional) – Number of partitions, by default 22

Returns:

  • GRM files ({out_prefix}.[K1, K2].[grm.bin | grm.id | grm.n] will be generated)

  • Weight file ({out_prefix}.weight.tsv will be generated)

admix.cli.genet_cor(pheno: str, grm_prefix: str, out_dir: str, rg_grid=array([0., 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.]), quantile_normalize: bool = True, n_thread: int = 2, clean: bool = True)

Estimate genetic correlation

Parameters:
  • pheno (str) – Phenotype file: the first column contains the individual ID, the second column contains the phenotype, and the remaining columns are covariates

  • grm_prefix (str) – Prefix of the K1 and K2 GRM files (for example, the out_prefix passed to admix-grm-merge in Step 2)

  • out_dir (str) – folder to store the output files

  • rg_grid (list, optional) – List of rg values to grid search, by default np.linspace(0, 1.0, 21)

  • quantile_normalize (bool, optional) – Whether to quantile-normalize the phenotype and each covariate column, by default True

  • n_thread (int, optional) – number of threads, by default 2
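
To illustrate what the `quantile_normalize` option refers to, the sketch below applies one common form of quantile normalization: a rank-based inverse normal transform. The exact transform used by `admix` is not specified here; the rank offset `rank / (n + 1)`, the tie handling, and the standalone function below are assumptions made for illustration, and the input values are toy data.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-based inverse normal transform; ties are broken by input order."""
    n = len(values)
    # Indices of the values in ascending order (stable sort keeps tie order).
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Map rank r to the standard normal quantile at probability r / (n + 1).
        out[idx] = standard_normal.inv_cdf(rank / (n + 1))
    return out


# Toy phenotype with one extreme value (illustration only).
toy = [3.2, 150.0, 0.7, 4.1, 2.9]
normalized = quantile_normalize(toy)
print(normalized)
```

The transform preserves the ordering of the values while pulling extreme observations such as `150.0` onto the scale of a standard normal distribution.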

Additional notes

For more background, we recommend reading “Causal effects on complex traits are similar across segments of different continental ancestries within admixed individuals”, Nature Genetics (2023).