admix.data.calc_partial_pgs
- admix.data.calc_partial_pgs(dset: Dataset, df_weights: DataFrame, dset_ref: Dataset | None = None, ref_pop_indiv: List[List[str]] | None = None, weight_col='WEIGHT') → DataFrame
Given a vector of polygenic score weights, calculate the partial polygenic score of each individual with respect to each ancestry background.
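One common way to define such per-ancestry (partial) scores is to credit each allele's weight to the local ancestry of the haplotype that carries it. The plain-Python sketch below illustrates that idea only. It is not the admix-kit implementation: it assumes phased 0/1 haplotypes with a local-ancestry label per haplotype, and the function name `partial_pgs_sketch` and the nested-list layout are hypothetical stand-ins for the Dataset arrays.

```python
from typing import List


def partial_pgs_sketch(
    hap: List[List[List[int]]],
    lanc: List[List[List[int]]],
    weights: List[float],
    n_anc: int,
) -> List[List[float]]:
    """Return an n_indiv x n_anc nested list of partial polygenic scores.

    hap[i][s][h]: ALT allele count (0/1) on haplotype h of individual i at SNP s.
    lanc[i][s][h]: local ancestry label (0 .. n_anc - 1) of that haplotype.
    weights[s]: effect of one ALT allele at SNP s.
    """
    scores = []
    for hap_i, lanc_i in zip(hap, lanc):
        row = [0.0] * n_anc
        for s, w in enumerate(weights):
            for h in (0, 1):
                # Credit the allele's weight to the ancestry of the haplotype carrying it.
                row[lanc_i[s][h]] += w * hap_i[s][h]
        scores.append(row)
    return scores


# One individual, two SNPs, two ancestries (0 and 1).
hap = [[[1, 0], [1, 1]]]
lanc = [[[0, 1], [0, 1]]]
scores = partial_pgs_sketch(hap, lanc, weights=[0.5, -0.2], n_anc=2)
```

Here ancestry 0 receives 0.5 + (-0.2) from the two ALT alleles on haplotype 0, and ancestry 1 receives -0.2. Summing a row across ancestries gives back the ordinary PGS, because every allele is counted exactly once.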
- Parameters:
dset (admix.Dataset) – The target dataset whose individuals are scored.
df_weights (pd.DataFrame) – Polygenic score weights, one row per SNP and indexed by SNP ID, with columns CHROM, POS, REF, ALT and WEIGHT (the name of the weight column is set by weight_col). CHROM, POS, REF and ALT are used to align the weights with dset.snp.
dset_ref (admix.Dataset, optional) – The reference dataset. Its SNPs must match those of dset on CHROM and POS, although REF and ALT may be flipped; dapgen.align_snp can be used to align the two datasets.
ref_pop_indiv (List[List[str]], optional) – IDs of the reference individuals in dset_ref, given as one list per ancestry.
weight_col (str, optional) – Name of the column in df_weights that holds the weights. Defaults to 'WEIGHT'.
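The alignment requirement on df_weights (and, analogously, on dset_ref) can be illustrated with a small matcher that pairs SNPs on CHROM and POS and then checks whether REF and ALT agree or are swapped. This is a hypothetical helper, `match_snps`, written for illustration; it is not dapgen.align_snp, whose return values are not described here, and the plain tuples stand in for rows of dset.snp and df_weights.

```python
from typing import List, Optional, Tuple

Snp = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), as in dset.snp
WeightRow = Tuple[str, int, str, str, float]  # (CHROM, POS, REF, ALT, WEIGHT), as in df_weights


def match_snps(
    target: List[Snp], weights: List[WeightRow]
) -> List[Optional[Tuple[float, bool]]]:
    """For each target SNP, return (weight, flipped), or None if no weight matches.

    flipped=True means the weight refers to the target's REF allele, so a target
    haplotype coded h (ALT count) should be recoded as 1 - h before weighting.
    """
    # Index weights by position; allele orientation is checked separately.
    by_pos = {(chrom, pos): (ref, alt, w) for chrom, pos, ref, alt, w in weights}
    out: List[Optional[Tuple[float, bool]]] = []
    for chrom, pos, ref, alt in target:
        hit = by_pos.get((chrom, pos))
        if hit is None:
            out.append(None)
            continue
        w_ref, w_alt, w = hit
        if (w_ref, w_alt) == (ref, alt):
            out.append((w, False))  # same allele coding
        elif (w_ref, w_alt) == (alt, ref):
            out.append((w, True))  # REF/ALT swapped
        else:
            out.append(None)  # same position but different alleles
    return out


target = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")]
weights = [("1", 100, "A", "G", 0.1), ("1", 200, "T", "C", 0.3)]
matches = match_snps(target, weights)  # [(0.1, False), (0.3, True), None]
```

Recoding the haplotype (h to 1 - h) rather than simply negating the weight matters for partial scores: the two differ by a constant w per flipped haplotype, and the number of haplotypes of each ancestry an individual carries at that SNP varies across individuals. Strand-ambiguous SNPs (A/T, C/G) are not handled by this sketch.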
- Returns:
pd.DataFrame – Partial polygenic scores with shape (n_indiv, n_anc): one row per individual and one column per ancestry.
- Raises:
AssertionError – If exactly one of dset_ref and ref_pop_indiv is provided; the two must be either both None or both given.
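The Raises entry amounts to requiring that dset_ref and ref_pop_indiv be passed together. A minimal sketch of that check follows; the helper name `check_ref_args` is hypothetical, and `object` stands in for admix.Dataset. Note that Python skips assert statements when run with the -O flag.

```python
from typing import List, Optional


def check_ref_args(
    dset_ref: Optional[object], ref_pop_indiv: Optional[List[List[str]]]
) -> bool:
    """Return True if a reference panel is in use, False if neither argument is given.

    Raises AssertionError when exactly one of the two arguments is provided.
    """
    assert (dset_ref is None) == (ref_pop_indiv is None), (
        "dset_ref and ref_pop_indiv must both be None or both be provided"
    )
    return dset_ref is not None
```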
Notes
The SNPs in dset and df_weights must be aligned, although REF and ALT may be flipped.
If dset_ref is provided, the SNPs in dset and dset_ref must likewise be aligned, up to REF/ALT flips.
ref_pop_indiv must contain exactly dset.n_anc lists, one per ancestry.
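A sketch of the consistency check implied by the last note, together with a check that every listed ID exists in the reference dataset. It assumes that list k holds the reference individuals for ancestry k; the helper name `check_ref_pop_indiv`, the use of ValueError, and the example IDs are hypothetical, and the library itself may report these problems differently.

```python
from typing import List, Sequence


def check_ref_pop_indiv(
    ref_pop_indiv: List[List[str]], n_anc: int, ref_indiv_ids: Sequence[str]
) -> None:
    """Raise ValueError if ref_pop_indiv does not fit n_anc or the reference IDs."""
    # Assumption: one list per ancestry, in the same order as the ancestries in dset.
    if len(ref_pop_indiv) != n_anc:
        raise ValueError(
            f"expected {n_anc} reference populations, got {len(ref_pop_indiv)}"
        )
    known = set(ref_indiv_ids)
    for k, pop in enumerate(ref_pop_indiv):
        missing = [indiv for indiv in pop if indiv not in known]
        if missing:
            raise ValueError(f"ancestry {k}: IDs not found in dset_ref: {missing}")


# Two-way admixture: list 0 holds ancestry-0 references, list 1 ancestry-1 references.
ref_ids = ["REF_A1", "REF_A2", "REF_B1", "REF_B2"]
check_ref_pop_indiv([["REF_A1", "REF_A2"], ["REF_B1", "REF_B2"]], 2, ref_ids)
```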