Genetic correlation across local ancestry segments#
[1]:
import pandas as pd
import numpy as np
import admix
import os
[2]:
dset = admix.io.read_dataset("example_data/CEU-YRI")
[3]:
dset
[3]:
admix.Dataset object with n_snp x n_indiv = 15357 x 10000, n_anc=2
snp: 'CHROM', 'POS', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'LANC_FREQ1', 'LANC_FREQ2', 'LANC_NHAPLO1', 'LANC_NHAPLO2', 'FREQ'
[4]:
np.random.seed(1234)
cor = 0.9
sim = admix.simulate.quant_pheno(dset, hsq=0.25, cor = 0.9, n_sim=10)
admix.simulate.quant_pheno: 100%|██████████████████████████████████████████████████████████████████████████| 15/15 [00:13<00:00, 1.08it/s]
[5]:
pheno_df = pd.DataFrame({"PHENO": sim['pheno'][:, 0], "COVAR": np.random.normal(size=dset.n_indiv)}, index=dset.indiv.index)
os.makedirs("example_data/genet_cor/", exist_ok=True)
pheno_df.to_csv("example_data/genet_cor/pheno.tsv", sep='\t')
out_dir=example_data/genet_cor
pfile=example_data/CEU-YRI
Step 1: Compute GRM#
mkdir -p ${out_dir}/admix-grm
admix append-snp-info \
--pfile ${pfile} \
--out ${pfile}.snp_info
admix admix-grm \
--pfile ${pfile} \
--out-prefix ${out_dir}/admix-grm/grm
Step 2: Estimate genetic correlation#
mkdir -p ${out_dir}/estimate
# this step will take a while
admix genet-cor \
--pheno ${out_dir}/pheno.tsv \
--grm-prefix ${out_dir}/admix-grm/grm \
--out-dir ${out_dir}/estimate/
admix summarize-genet-cor \
--est-dir ${out_dir}/estimate/ \
--out-prefix ${out_dir}/estimate
cat ${out_dir}/estimate.summary.json
{
"n": 10000,
"rg_mode": 0.875,
"rg_hpdi(50%)": [
0.838,
0.909
],
"rg_hpdi(95%)": [
0.754,
0.961
],
"rg=1_pval": 0.001345845749376501
}
Step 3: meta-analysis#
To obtain results for simulations from all correlation parameters and simulation replicates (or for all traits in real data analysis), we recommend using computing clusters to parallelize this process. After these results are obtained, one can use admix meta-analyze-genet-cor
to meta-analyze these results. For example,
admix meta-analyze-genet-cor --loglkl-files "${out_dir}/*.loglkl.txt"