The MetaXcan software hosts a suite of tools, namely PrediXcan, SPrediXcan, MultiXcan and SMultiXcan. This post describes the output file format of each tool.
PrediXcan
Individual-level method for computing gene-trait associations.
The output is a tab-delimited file of predicted expression, with individuals on the rows and genes in the columns.
The first two columns contain the FID and IID of each individual.
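A minimal Python sketch of loading this file with the standard library. It assumes a header row of FID, IID followed by one gene id per column; the function name `load_predicted_expression`, the gene ids and the values are hypothetical.

```python
import csv
import io

# Hypothetical predicted-expression file: FID, IID, then one column per gene (assumed header).
pred_text = (
    "FID\tIID\tgene_1\tgene_2\n"
    "fam1\tind1\t0.12\t-0.40\n"
    "fam2\tind2\t-0.05\t0.33\n"
)

def load_predicted_expression(handle):
    """Return (gene_ids, {(FID, IID): {gene_id: value}}) from a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_ids = header[2:]  # every column after FID and IID is a gene
    expression = {}
    for row in reader:
        fid, iid, values = row[0], row[1], row[2:]
        expression[(fid, iid)] = {g: float(v) for g, v in zip(gene_ids, values)}
    return gene_ids, expression

genes, expr = load_predicted_expression(io.StringIO(pred_text))
print(genes)                           # ['gene_1', 'gene_2']
print(expr[("fam2", "ind2")]["gene_2"])  # 0.33
```

Keying on the (FID, IID) pair keeps individuals unique even when IIDs repeat across families.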
Association
Computes the association between predicted expression and an outcome (PrediXcanAssociation.py).
The output has the following columns:
gene
: Ensembl ID or intron id
effect
: estimated effect size
se
: standard error of the estimated effect size
zscore
: predicted association z-score
pvalue
: association p-value
n_samples
: number of samples used
status
: If there was any error in the computation, it is stated here
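As a hedged sketch, the association output can be screened by dropping rows whose status reports an error and applying a Bonferroni threshold to pvalue. The tab delimiter, the empty status on successful rows, the helper name `significant_genes` and the example rows are all assumptions for illustration.

```python
import csv
import io

# Hypothetical association output; tab-delimited, empty status on success (assumed).
assoc_text = (
    "gene\teffect\tse\tzscore\tpvalue\tn_samples\tstatus\n"
    "gene_1\t0.80\t0.20\t4.00\t6.3e-05\t1000\t\n"
    "gene_2\t0.10\t0.15\t0.67\t0.50\t1000\t\n"
    "gene_3\tNA\tNA\tNA\tNA\t0\tsome error message\n"
)

def significant_genes(handle, alpha=0.05):
    """Return genes below a Bonferroni threshold, ignoring rows that report an error."""
    rows = csv.DictReader(handle, delimiter="\t")
    # A non-empty status is taken to mean the computation failed for that gene (assumption).
    ok = [row for row in rows if not (row["status"] or "").strip()]
    if not ok:
        return []
    threshold = alpha / len(ok)  # Bonferroni correction over successfully tested genes
    return sorted(row["gene"] for row in ok if float(row["pvalue"]) < threshold)

print(significant_genes(io.StringIO(assoc_text)))  # ['gene_1']
```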
SPrediXcan
Computes associations between the gene models and GWAS summary statistics.
Each output file is a CSV, with each row containing a gene association at a given trait-tissue combination:
gene
: Ensembl ID or intron id
gene_name
: HUGO name or intron id
zscore
: predicted association z-score
effect_size
: estimated effect size
pvalue
: association p-value
var_g
: estimated variance of predicted expression or splicing, calculated as W' * G * W (where W is the vector of SNP weights in a gene’s model, W' is its transpose, and G is the covariance matrix of the model's SNPs)
pred_perf_r2
: prediction model cross-validated performance (R²)
pred_perf_pval
: p-value of the prediction model's cross-validated performance
pred_perf_qval
: deprecated, empty field left for compatibility
n_snps_used
: number of SNPs in the intersection of GWAS and model
n_snps_in_cov
: number of SNPs in the LD compilation
n_snps_in_model
: number of SNPs in the model
best_gwas_p
: smallest p-value across GWAS SNPs used in this model
largest_weight
: largest prediction model weight
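The var_g formula W' * G * W is a quadratic form: multiply the covariance matrix by the weight vector, then take the dot product with the weights again. A minimal pure-Python sketch, using a hypothetical three-SNP model whose weights and covariance values are made up:

```python
def predicted_variance(weights, cov):
    """Compute var_g = W' * G * W for SNP weights W and SNP covariance matrix G."""
    n = len(weights)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be n x n for n weights")
    # First G * W (a vector), then the dot product with W.
    g_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return sum(w * gw for w, gw in zip(weights, g_w))

# Hypothetical 3-SNP model: made-up weights and covariance values.
weights = [0.5, -0.2, 0.1]
cov = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
var_g = predicted_variance(weights, cov)
print(round(var_g, 6))  # 0.236
```

As a sanity check, with a single SNP var_g reduces to the squared weight times that SNP's variance.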
MultiXcan
Multi-Tissue PrediXcan; takes multiple predicted expression files as input.
This script computes a gene-level association between predicted gene expression and a human trait, using multiple studies for each gene jointly. It supports adjusting for covariates. Its inputs are predicted expression files as generated by Predict.py.
The output contains the following columns:
gene
: a gene’s id, as listed in the tissue transcriptome model: an Ensembl ID for most gene model releases, or an intron’s id for splicing model releases
pvalue
: significance p-value of the MultiXcan association
n_models
: number of models (tissues) available for this gene
n_samples
: number of individuals available for this gene-phenotype combination (i.e. the inner join of phenotype and predictions)
p_i_best
: best p-value among single-tissue PrediXcan associations
m_i_best
: name of the model with the best single-tissue PrediXcan association
p_i_worst
: worst p-value among single-tissue PrediXcan associations
m_i_worst
: name of the model with the worst single-tissue PrediXcan association
status
: If there was any error in the computation, it is stated here
n_used
: number of independent components of variation kept among the tissues' predictions (synthetic independent tissues)
max_eigen
: in the PCA decomposition of predicted expression, the maximum eigenvalue
min_eigen
: in the PCA decomposition of predicted expression, the minimum eigenvalue
min_eigen_kept
: in the PCA decomposition of predicted expression, the minimum eigenvalue kept (i.e. surviving the SVD truncation)
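The n_used and eigenvalue columns summarize how many principal components survive the truncation. The sketch below assumes a condition-number style cutoff (keep a component when max_eigen / eigenvalue is below a threshold); the rule, the default of 30 and the name `summarize_eigenvalues` are assumptions for illustration, not taken from MultiXcan's code.

```python
def summarize_eigenvalues(eigenvalues, condition_number=30.0):
    """Mimic the n_used / max_eigen / min_eigen / min_eigen_kept columns.

    Assumed cutoff: keep a component when max_eigen / eigenvalue < condition_number.
    """
    if condition_number <= 1:
        raise ValueError("condition_number must be greater than 1")
    ordered = sorted(eigenvalues, reverse=True)
    if not ordered or ordered[0] <= 0:
        raise ValueError("need at least one positive eigenvalue")
    max_eigen = ordered[0]
    # Multiply instead of dividing so zero (or tiny negative) eigenvalues are simply dropped.
    kept = [e for e in ordered if e * condition_number > max_eigen]
    return {
        "n_used": len(kept),
        "max_eigen": max_eigen,
        "min_eigen": ordered[-1],
        "min_eigen_kept": kept[-1],
    }

summary = summarize_eigenvalues([0.1, 4.0, 0.01, 1.5])
print(summary["n_used"], summary["min_eigen_kept"])  # 2 1.5
```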
If you specify --loadings_output, you get a file with the loadings of the PC decomposition of predicted expression for each gene:
gene
: Ensembl ID (or intron id) being analyzed
pc
: identifier of the principal component
tissue
: tissue being analyzed
weight
: loading coefficient from the tissue to the PC
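The loadings file is in long format, one row per gene, PC and tissue. A sketch, assuming the rows are already parsed into tuples, that nests them per gene and projects one individual's tissue-level predictions onto each PC. Whether MultiXcan standardizes predictions before projecting is an assumption here, so the input values are treated as already standardized; all names and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical parsed rows of a --loadings_output file: (gene, pc, tissue, weight).
loadings_rows = [
    ("gene_1", "pc_0", "tissue_A", 0.8),
    ("gene_1", "pc_0", "tissue_B", 0.6),
    ("gene_1", "pc_1", "tissue_A", -0.6),
    ("gene_1", "pc_1", "tissue_B", 0.8),
]

def group_loadings(rows):
    """Nest long-format rows as {gene: {pc: {tissue: weight}}}."""
    nested = defaultdict(lambda: defaultdict(dict))
    for gene, pc, tissue, weight in rows:
        nested[gene][pc][tissue] = weight
    return nested

def pc_scores(gene_loadings, tissue_values):
    """Project one individual's (assumed standardized) tissue-level values onto each PC."""
    return {
        pc: sum(weight * tissue_values[tissue] for tissue, weight in weights.items())
        for pc, weights in gene_loadings.items()
    }

loadings = group_loadings(loadings_rows)
scores = pc_scores(loadings["gene_1"], {"tissue_A": 1.0, "tissue_B": 0.5})
print({pc: round(s, 3) for pc, s in scores.items()})  # {'pc_0': 1.1, 'pc_1': -0.2}
```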
If you specify --coefficient_output, you get a file with effect sizes for the tissues involved in each gene:
param
: effect size from the PCA-regularized regression (i.e. the effect sizes of the PC components, converted to tissue space)
variable
: tissue being analyzed
gene
: Ensembl ID (or intron id)
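If the PC scores are Z = X V, where X holds standardized tissue-level predictions and V is the tissue-by-PC loading matrix, then a regression y ≈ Z γ is the same as y ≈ X (V γ), so the tissue-space effects are β = V γ. A minimal sketch of that conversion, assuming this is how the param column relates to the PC effects; the function name and the numbers are hypothetical.

```python
def pc_effects_to_tissue_space(loadings, pc_effects):
    """Return beta[t] = sum over PCs of V[t][pc] * gamma[pc] for every tissue t."""
    return {
        tissue: sum(weights[pc] * gamma for pc, gamma in pc_effects.items())
        for tissue, weights in loadings.items()
    }

# Hypothetical loading matrix V (tissue -> {pc: weight}) and PC-space effects gamma.
loadings = {
    "tissue_A": {"pc_0": 0.8, "pc_1": -0.6},
    "tissue_B": {"pc_0": 0.6, "pc_1": 0.8},
}
pc_effects = {"pc_0": 0.5, "pc_1": 0.25}
beta = pc_effects_to_tissue_space(loadings, pc_effects)
print({t: round(b, 3) for t, b in beta.items()})  # {'tissue_A': 0.25, 'tissue_B': 0.5}
```

Only the PCs that survived the truncation contribute, which is what makes the tissue-space effects "PCA-regularized".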
SMultiXcan
Summary-stats based Multi-Tissue PrediXcan.
The results contain the following columns:
gene
: a gene’s id, as listed in the Tissue Transcriptome model
gene_name
: gene name as listed by the Transcriptome Model, typically a HUGO name for a gene; it can also be an intron’s id
pvalue
: significance p-value of the S-MultiXcan association
n
: number of “tissues” available for this gene
n_indep
: number of independent components of variation kept among the tissues' predictions (synthetic independent tissues)
p_i_best
: best p-value among single-tissue S-PrediXcan associations
t_i_best
: name of the tissue with the best single-tissue S-PrediXcan association
p_i_worst
: worst p-value among single-tissue S-PrediXcan associations
t_i_worst
: name of the tissue with the worst single-tissue S-PrediXcan association
eigen_max
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the top independent component
eigen_min
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the last independent component
eigen_min_kept
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the smallest independent component that was kept
z_min
: minimum z-score among single-tissue S-PrediXcan associations
z_max
: maximum z-score among single-tissue S-PrediXcan associations
z_mean
: mean z-score among single-tissue S-PrediXcan associations
z_sd
: standard deviation of the z-scores among single-tissue S-PrediXcan associations
tmi
: trace of T * T', where T is the correlation matrix of predicted expression levels across tissues multiplied by its SVD pseudo-inverse. It is an estimate of the number of independent components of variation in predicted expression across tissues (typically close to n_indep)
status
: If there was any error in the computation, it is stated here
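The single-tissue summary columns can be reproduced from a gene's per-tissue S-PrediXcan z-scores. A minimal sketch, assuming two-sided normal p-values (p = erfc(|z| / √2)) and a population standard deviation for z_sd; both conventions, the function names and the z-scores are assumptions for illustration.

```python
import math
import statistics

def two_sided_p(z):
    """Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

def summarize_single_tissue(zscores):
    """Summaries analogous to z_min, z_max, z_mean, z_sd and the best/worst tissue columns."""
    pvalues = {tissue: two_sided_p(z) for tissue, z in zscores.items()}
    values = list(zscores.values())
    best = min(pvalues, key=pvalues.get)   # smallest p-value
    worst = max(pvalues, key=pvalues.get)  # largest p-value (first one on ties)
    return {
        "z_min": min(values),
        "z_max": max(values),
        "z_mean": statistics.mean(values),
        "z_sd": statistics.pstdev(values),  # population SD; the tool's convention is assumed
        "t_i_best": best,
        "p_i_best": pvalues[best],
        "t_i_worst": worst,
        "p_i_worst": pvalues[worst],
    }

# Hypothetical per-tissue z-scores for one gene.
zscores = {"tissue_A": 3.0, "tissue_B": -1.0, "tissue_C": 1.0}
summary = summarize_single_tissue(zscores)
print(summary["t_i_best"], round(summary["z_sd"], 4))  # tissue_A 1.633
```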