Genetic correlation estimation
Step 0: Prepare data
- Phased genotypes and inferred local ancestry (please follow preparing dataset), so that you have the per-chromosome files referenced as ${prefix}.chr${chrom} in Step 1.
- A phenotype and covariates file for each trait.
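As a rough illustration of the per-trait phenotype and covariates file, the sketch below writes a small file with the ID in the first column, the phenotype in the second, and covariates in the remaining columns, which is the layout given for the pheno parameter of admix.cli.genet_cor. The sample IDs, column names, trait name, tab delimiter, and header row are all assumptions made for this example; check the admix-kit documentation for the exact format it expects.

```python
import csv
import os
import tempfile

# Hypothetical sample IDs, phenotype values, and covariates (illustration only).
rows = [
    ("ID", "PHENO", "AGE", "SEX"),
    ("indiv1", 1.27, 54, 0),
    ("indiv2", -0.35, 61, 1),
    ("indiv3", 0.80, 47, 1),
]

trait = "height"  # hypothetical trait name, playing the role of ${trait}
pheno_path = os.path.join(tempfile.mkdtemp(), f"{trait}.txt")

# Column order: ID first, phenotype second, covariates after that.
# Tab separation and the header row are assumptions of this sketch.
with open(pheno_path, "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# Read the header back to confirm the column order.
with open(pheno_path) as f:
    header = f.readline().rstrip("\n").split("\t")
print(header)
```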
With these files, you can run the following commands to estimate \(r_\text{admix}\).
Step 1: Compute GRMs \(\mathbf{K}_1\) and \(\mathbf{K}_2\) for each chromosome
mkdir -p ${out_dir}/admix-grm
admix admix-grm \
--pfile ${prefix}.chr${chrom} \
--out-prefix ${out_dir}/admix-grm/chr${chrom}
This step will generate ${out_dir}/admix-grm/chr${chrom}.[grm.bin||grm.n|weight.tsv]
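Because this step is run once per chromosome, it can help to generate the commands in a loop. The sketch below only builds the argument lists, using the two flags shown in the command for this step. The helper name admix_grm_commands, the example paths, and the assumption that chromosomes are numbered 1 to 22 are illustrative and not part of admix-kit.

```python
import shlex

def admix_grm_commands(prefix, out_dir, chroms=range(1, 23)):
    """Build one `admix admix-grm` argument list per chromosome."""
    commands = []
    for chrom in chroms:
        commands.append([
            "admix", "admix-grm",
            "--pfile", f"{prefix}.chr{chrom}",
            "--out-prefix", f"{out_dir}/admix-grm/chr{chrom}",
        ])
    return commands

# Hypothetical paths, for illustration only.
cmds = admix_grm_commands(prefix="data/cohort", out_dir="out")
for cmd in cmds[:2]:
    print(shlex.join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`, or written to a job script, once admix is installed and the output directory exists.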
Step 2: Merge GRMs across chromosomes
admix admix-grm-merge \
--prefix ${out_dir}/admix-grm/chr \
--out-prefix ${out_dir}/admix-grm/merged
This step will generate ${out_dir}/admix-grm/merged.[grm.bin||grm.n|weight.tsv]
Step 3: Calculate the GRM \(\mathbf{K}_1 + r_\text{admix} \mathbf{K}_2\) and estimate the log-likelihood at different \(r_\text{admix}\) values
admix genet-cor \
--pheno ${trait}.txt \
--grm-prefix ${out_dir}/admix-grm/merged \
--out-dir ${out_dir}/estimate/${trait}
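To make the idea behind this step concrete, the following toy sketch forms \(\mathbf{K}_1 + r_\text{admix} \mathbf{K}_2\) on a grid of \(r_\text{admix}\) values and evaluates a Gaussian log-likelihood at each grid point. It is a simplified illustration and not the procedure implemented by admix genet-cor: the matrices are simulated, the variance components are held fixed at assumed values rather than estimated, and there are no covariates. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 60, 200  # toy sample size and SNP count

# Toy ancestry-specific genotype-like matrices (hypothetical, not tool output).
# Built this way, K1 + r * K2 is positive semi-definite for r in [0, 1].
X1 = rng.normal(size=(n, m))
X2 = rng.normal(size=(n, m))
K1 = (X1 @ X1.T + X2 @ X2.T) / m
K2 = (X1 @ X2.T + X2 @ X1.T) / m

# Simulate a phenotype under an assumed r_admix of 0.6 and assumed variances.
sigma_g2, sigma_e2, r_true = 0.5, 0.5, 0.6
V_true = sigma_g2 * (K1 + r_true * K2) + sigma_e2 * np.eye(n)
y = np.linalg.cholesky(V_true) @ rng.normal(size=n)

def gaussian_loglik(y, V):
    """Log-density of y under a zero-mean normal with covariance V."""
    _, logdet = np.linalg.slogdet(V)
    quad = y @ np.linalg.solve(V, y)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

rg_grid = np.linspace(0, 1.0, 21)  # same grid as the documented default
loglik = np.array([
    gaussian_loglik(y, sigma_g2 * (K1 + r * K2) + sigma_e2 * np.eye(n))
    for r in rg_grid
])

# Grid point with the highest log-likelihood.
best_r = rg_grid[np.argmax(loglik)]
print(best_r)
```

With a sample this small the log-likelihood curve is typically flat, so the maximizing grid point can sit far from the simulated value; the sketch is meant to show the mechanics of the grid search, not its statistical power.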
Parameter options
- admix.cli.admix_grm(pfile: str, out_prefix: str, maf_cutoff: float = 0.005, her_model='mafukb', freq_cols=['LANC_FREQ1', 'LANC_FREQ2'], snp_chunk_size: int = 256, snp_list: str | None = None, write_raw: bool = False) -> None
Calculate the admix GRM for a given pfile
- Parameters:
pfile (str) – Path to the pfile
out_prefix (str) – Prefix of the output files
maf_cutoff (float, optional) – MAF cutoff for the admixed individuals, by default 0.005
her_model (str, optional) – Heritability model, one of “uniform”, “gcta”, “ldak”, “mafukb”; by default “mafukb”
freq_cols (List[str], optional) – Columns of the pfile to use as frequencies when applying the ancestry-specific MAF cutoffs, by default [“LANC_FREQ1”, “LANC_FREQ2”]
snp_chunk_size (int, optional) – Number of SNPs to read at a time, by default 256. This can be tuned to reduce memory usage
snp_list (str, optional) – Path to a file containing a list of SNPs to use. Each line should be a SNP ID. Only SNPs in the list will be used for the analysis. By default None
write_raw (bool, optional) – Whether to write the raw GRM, G1, G2, G12, by default False
- Returns:
GRM files ({out_prefix}.[K1, K2].[grm.bin | | grm.n] will be generated)
Weight file ({out_prefix}.weight.tsv will be generated)
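The maf_cutoff and freq_cols parameters together describe an ancestry-specific MAF filter. One plausible reading, sketched below, is to keep a SNP only if its minor allele frequency is at least the cutoff in both ancestries. This is an assumption about the filter's behavior rather than a copy of the admix-kit implementation, and the helper name and frequency values are hypothetical.

```python
import numpy as np

def ancestry_maf_mask(freq1, freq2, maf_cutoff=0.005):
    """Keep SNPs whose minor allele frequency passes the cutoff in both ancestries."""
    freq1 = np.asarray(freq1, dtype=float)
    freq2 = np.asarray(freq2, dtype=float)
    # Fold allele frequencies so that values above 0.5 count as minor too.
    maf1 = np.minimum(freq1, 1 - freq1)
    maf2 = np.minimum(freq2, 1 - freq2)
    return (maf1 >= maf_cutoff) & (maf2 >= maf_cutoff)

# Hypothetical values standing in for the LANC_FREQ1 and LANC_FREQ2 columns.
lanc_freq1 = [0.20, 0.001, 0.998, 0.40]
lanc_freq2 = [0.10, 0.300, 0.500, 0.004]
mask = ancestry_maf_mask(lanc_freq1, lanc_freq2)
print(mask.tolist())  # [True, False, False, False]
```

Only the first SNP is kept here: the second and third are rare in the first ancestry, and the fourth is rare in the second.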
- admix.cli.admix_grm_merge(prefix: str, out_prefix: str, n_part: int = 22) -> None
Merge multiple GRM matrices
- Parameters:
prefix (str) – Prefix of the GRM files, any files with the pattern of <prefix>.* will be merged
out_prefix (str) – Prefix of the output file
n_part (int, optional) – Number of partitions, by default 22
- Returns:
GRM files ({out_prefix}.[K1, K2].[grm.bin | | grm.n] will be generated)
Weight file ({out_prefix}.weight.tsv will be generated)
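For intuition about what merging GRM matrices can mean, a common approach is a weighted average of the per-chromosome matrices, with each chromosome weighted by, for example, its SNP count or total SNP weight. The sketch below shows that generic technique on toy matrices. Whether admix-grm-merge combines its inputs in exactly this way is an assumption, and merge_grms is a hypothetical helper.

```python
import numpy as np

def merge_grms(grms, weights):
    """Weighted average of per-chromosome GRMs (weights need not sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(grms)  # shape (n_part, n, n)
    # Contract the weight vector against the first (partition) axis.
    return np.tensordot(weights, stacked, axes=1) / weights.sum()

# Two toy 2x2 "per-chromosome" GRMs with hypothetical SNP counts as weights.
grm_chr1 = np.array([[1.0, 0.2], [0.2, 1.0]])
grm_chr2 = np.array([[1.0, 0.6], [0.6, 1.0]])
merged = merge_grms([grm_chr1, grm_chr2], weights=[300, 100])
print(merged)
```

The off-diagonal entry of the merged matrix is (300 × 0.2 + 100 × 0.6) / 400 = 0.3, so the chromosome with more SNPs dominates the result.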
- admix.cli.genet_cor(pheno: str, grm_prefix: str, out_dir: str, rg_grid=array([0., 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.]), quantile_normalize: bool = True, n_thread: int = 2, clean: bool = True)
Estimate genetic correlation
- Parameters:
pheno (str) – phenotype file, the 1st column contains ID, 2nd column contains phenotype, and the rest of columns are covariates.
grm_prefix (str) – prefix of the K1 and K2 GRM files, e.g. the merged prefix produced in Step 2
out_dir (str) – folder to store the output files
rg_grid (list, optional) – List of rg values to grid search, by default np.linspace(0, 1.0, 21)
quantile_normalize (bool, optional) – whether to perform quantile normalization for both the phenotype and each covariate column, by default True
n_thread (int, optional) – number of threads, by default 2
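Quantile normalization, as mentioned for the quantile_normalize option, is often implemented as a rank-based inverse normal transform. The sketch below shows one common variant: rank the values, convert the ranks to probabilities, and map them through the standard normal quantile function. The exact offsets and tie handling used by admix-kit may differ, so treat the helper below as a hypothetical illustration.

```python
from statistics import NormalDist

import numpy as np

def quantile_normalize(values):
    """Map values to standard-normal quantiles by rank (ties broken by order)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = np.argsort(np.argsort(values))  # 0-based ranks
    probs = (ranks + 0.5) / n               # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])

pheno = [3.2, 150.0, 4.1, 3.9, 5.0]  # toy phenotype with one outlier
z = quantile_normalize(pheno)
print(z)
```

The transform preserves the ordering of the values while pulling the outlier in to the largest normal quantile, which limits its influence on the downstream model fit.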
Additional notes
For more background, we recommend reading “Causal effects on complex traits are similar across segments of different continental ancestries within admixed individuals”, Nature Genetics (2023).