A landscape of gene expression regulation for synovium in arthritis

May 12, 2024

106

Random Image Popup

Ethics statement

The study was approved by the Ethics Committee of Xi’an Jiaotong University Honghui Hospital. All patients were provided written, informed consent before participating in the study.

Study participants and samples collection

We collected synovium samples and corresponding blood samples from 245 osteoarthritis patients undergoing knee joint replacement surgery (77 men, 168 women, age 46–84 years, mean 67 years), with no history of significant knee surgery, infection, or fracture, and no malignancy within the previous 5 years at the Honghui Hospital (Xi’an, China). All the patients were unrelated Chinese Han adults. Clinical information, including age, sex, and past medical history, was collected from the electronic medical records.

We obtained synovium samples from joint replacement and removed the nearby adipose tissue, then the samples freeze immediately with liquid nitrogen and stored under −80 °C. We also obtained the blood sample to extract DNA for genotyping for all patients.

DNA extraction

The genomic DNA was isolated from the blood using FlexGen Blood DNA Kit (KANGWEISHIJI Biotech, Jiangsu, China) according to the manufacturer’s instructions.

Genotyping and quality control

We used Illumina Infinium Asian Screening Array-24+ v1.0 for genotyping 659,184 variants (Engage to Life Energy Co. Ltd., China). Genotypes were called using GenomeStudio (Illumina) and transformed to plink format. The quality control was carried out referring to the GTEx project⁶ and eQTLQC⁷⁵, which contained the following steps:

1.

All variants were annotated to 1000 genome EAS data and removed variants which not inconsistent with 1000 genome.
2.

Filtering out variants and individuals with a missing rate greater than 0.2, then filtered out variants and individuals with a missing rate greater than 0.02. Variants filtering should be performed before individual filtering. This was done in two steps to avoid too strict filtering.
3.

Removed wrong sex individuals by “–check-sex” in plink 1.9.
4.

Removed variants with MAF < 0.01 and Hardy-Weinberg equilibrium exact test P-value < 1e-6 by “–hwe 1e-6 midp” in plink 1.9.
5.

Removed individuals with heterozygosity rate deviated more than 3 standard deviations from the heterozygosity rate mean. The heterozygosity rate was calculated by “–het” in plink 1.9, and this calculation was performed on variants in approximate linkage equilibrium (“–indep-pairwise 50 5 0.2” in plink 1.9).
6.

For each pair of ‘related’ individuals with a pihat > 0.2, we removed the individual with the lowest call rate.

The final datasets contained 243 patients and 485,437 variants.

Imputation

The imputation was conducted using Impute2 (v2.3.2_x86_64_static)^76,77 with the 1000 genomes haplotypes phase 3 data (https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html) as the reference panel. To increase the overall accuracy, we set -k 100 and -buffer 300, and other parameters were set to default.

We also conducted quality control for imputed variants as above (genotype missing rate, MAF, and Hardy-Weinberg equilibrium), expected the threshold of MAF was 0.05. We excluded 2 patients due to the high heterozygosity rate. The resulting dataset contained 243 patients and 4,499,337 variants.

RNA-seq sequencing and quality control

We performed a gene expression analysis on synovium samples from 210 patients. The RNA was extracted by TRIzol extraction method. Total RNA was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations. The library preparations were sequenced on an Illumina Novaseq 6000 platform and 150 bp paired-end reads were generated (Novogene Co. Ltd. Beijing, China).

We used Fastp⁷⁸ (version 0.19.7) to perform basic statistics on the quality of the raw reads. The steps of data processing were as follows:

(1)

Discard a paired reads if either one read contains adapter contamination.
(2)

Discard a paired reads if more than 10% of bases are uncertain in either one read.
(3)

Discard a paired reads if the proportion of low-quality (Phred quality < 5) bases is over 50% in either one read.

Quantification of RNA levels and gene expression

We applied FastQC to check samples quality and excluded the samples with low Q20 and Q30. Clean RNA-seq reads were mapped to human reference genome hg19 (contains only autosomes, sex chromosomes and mitochondrial chromosomes) with STAR (v2.7.9a)⁷⁹ aligner based on the GENCODE v19 (July 2013 freeze) annotation, the parameters setting was same as GTEx v8 project pipeline⁶. Gene-level expression read counts and TPM values were generated by RNA-SeQC (v2.3.5)⁸⁰ with default parameters. The read mapping rates and base mismatch rates were calculated from the output of SAMtools (v1.9)⁸¹ stats subcommand. The proportion of reads genomic origin were produced with QualiMap (v.2.2.2-dev)⁸². Samples should meet the following metrics as used for GTEx project: read mapping rate ≥ 0.2, base mismatch rate ≤ 0.01, intergenic mapping rate ≤ 0.3, rRNA mapping rate ≤ 0.3.

After removing samples that failed mapping quality control, we conducted expression outlier filtration. Briefly, the pairwise expression correlation coefficients were calculated using the log-transformed TPM values of all genes, assume the correlation coefficient between sample i and sample j is expressed as r_ij, we calculated

$${\bar{r}}_{i}=\mathop{\sum }\limits_{j}\frac{{r}_{{ij}}}{n}$$

(1)

The average correlation coefficient of sample i with all others of the total n samples. Lower ${\bar{r}}_{i}$ represent a lower quality. Then we calculated

$${D}_{i}=\frac{{\bar{r}}_{i}-\bar{\bar{r}}}{{{{{{\rm{median}}}}}}\left(\left|{\bar{r}}_{i}-\bar{\bar{r}}\right|\right)}$$

(2)

to provide a sense of distance from the grand correlation mean $\overline{\bar{r}}$. Six samples with D < −5.0 were considered as outliers and removed. The filtered gene expression dataset included 204 synovium samples.

Finally, samples that passed both genotype data and RNA-seq data quality control were used for the following analysis. Gene expression values for all samples were normalized for eQTL analyses using the following procedure: (1) read counts were normalized between samples using TMM⁸³ implemented in edgeR;⁸⁴ (2) genes were selected based on expression thresholds of ≥ 0.1 TPM in ≥ 20% of samples and ≥ 6 reads (unnormalized) in ≥ 20% of samples; (3) expression values for each gene were inverse normal transformed across samples. Only autosomal genes were used in the following eQTL analysis.

Covariates for eQTL analysis

To control for population effects on the discovery of QTLs, genotype principal components (PCs) were used as covariates in QTL mapping. The PCs were calculated by smartpca in EIGENSOFT⁸⁵ in 243 individuals, then we calculated the statistical significance of each principal component by twstats (Tracy–Widom statistics) and select the first two PCs as covariates (P < 0.05). To infer hidden factors associated with the cohort, sequencing batch, or other technical differences, we applied probabilistic estimation of expression residuals (PEER) for normalized expression data (run_PEER.R in GTEx v8 pipeline). According to the GTEx v8 method, 30 PEER factors were selected (150 ≤ samples count <250). Finally, sex and age were also used as covariates.

Identification of cis-eQTLs

For eQTL analysis, we got 202 synovium samples with matched genotype and gene expression datasets. For each gene, we considered genetic variants within 1 Mb of the transcription start site (TSS) as cis-eQTL and followed a similar method with GTEx project⁶. All variants on autosome with MAF ≥ 0.05 across the 202 individuals were included, except the MHC region (chr6:28477797-33448354).

We used the GTEx modified version of FastQTL⁸⁶ (https://github.com/francois-a/fastqtl; gtex_v6p version) to calculate cis-eQTL, and the adaptive permutation mode was used with the setting –permute 1000 10000. Nominal P-values for each gene-variant pair were calculated based on linear regression, including all covariates. The gene-level q-values⁸⁷ were calculated based on the beta distribution-extrapolated empirical P-values from FastQTL. A false discovery rate (FDR) threshold of ≤0.05 was applied to identify genes with at least one significant cis-eQTL (“eGenes”).

To identify all significant variant-gene pairs associated with cis-eGenes, the nominal P-value threshold was calculated as F⁻¹(p_t), where F⁻¹ is the inverse cumulative distribution of the beta distribution and p_t was the empirical P-value of the gene closest to the 0.05 FDR threshold. For each eGene, significant eQTLs were defined as variants with a nominal P-value below the nominal P-value threshold for that gene.

Annotation of variants

The annotation of SNPs was performed by ANNOVAR⁸⁸ (version 2020 Jun 08), with annotation datasets wgEncodeGencodeBasicV19.

Comparison of pLI over gene expression bins

This analysis was performed referring to eQTLGen⁷ project. All genes were divided into 10 bins according to the average expression quantile. The pLI of each gene were downloaded from https://static-content.springer.com/esm/art%3A10.1038%2Fnature19057/MediaObjects/41586_2016_BFnature19057_MOESM241_ESM.zip⁸⁹.

Identification of independent eQTL

We identified the conditionally independent eQTL signals using the forward stepwise regression followed by a backward selection step stepwise procedure described in GTEx v8⁶, which was calculated by tensorQTL v1.0.6⁹⁰. The primary eQTLs were defined as independent eQTLs with the highest ranking of each eGene, and the rest of the independent eQTLs were secondary independent signals.

Estimation the variance explained by independent eQTLs in gene expression

We used GCTA⁹¹ –make-grm to calculate the genetic relationship matrix for our samples, and then used –reml to calculate the variance explained by each independent eQTLs.

Heritability estimation of eGenes

We compared the heritability of eGenes with single and multiple independent signals. We used GCTA⁹¹ –reml to estimate the variance explained by the SNPs for each eGenes as the estimated heritability, which following the method of FUSION pipeline⁹².

ATAC-seq sequencing and peak calling

ATAC-seq libraries were constructed for synovium from following the original protocol⁹³. In brief, two hundred thousand cells were lysed with cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, and 0.03% Tween20), and centrifuged at 500 × g for 8 min at 4 °C. The supernatant was carefully removed, and the nuclei was resuspended with Tn5 transposase reaction mix (25 μl 2 × TD buffer, 2.5 μl Tn5 transposase, and 22.5 μl nuclease-free water) (Illumina) at 37 °C for 30 min. Immediately after the transposition reaction, DNA was purified using a Qiagen MinElute kit. Libraries were sequenced on an Illumina HiSeq X Ten sequencer. The ATAC-seq experiment and library sequencing were performed by Frasergen Bioinformatics Co., Ltd, Wuhan, China.

Adapters were trimmed from ATAC-seq reads sequences using custom Python scripts. Pair-end reads were aligned to hg19 using Bowtie2⁹⁴. Duplicate reads and reads with MAPQ < 30 were discarded. After filtering, the qualified reads were subjected to MACS2⁹⁵ to call peaks for each sample with parameters (-q 0.05 –nomodel –shift -100 –extsize 200 –keep-dup all). In total, we identified 154,649 ATAC-seq peaks from synovium.

Tissue specificity analysis

Synovium eQTLs were compared with 49 tissues from the GTEx v8 project (https://console.cloud.google.com/storage/browser/gtex-resources). The mashR (version 0.2.73)²⁰ method was used to assess sharing of significant signals among each tissue. Specifically, we randomly selected 1 million eQTL pairs from each tissue as the null signal. Then we fitted the model using the mash() function and used get_pairwise_sharing() function to assess sharing of significant signals among each pair of tissues.

Epigenetic markers enrichment for independent signals

The epigenetic datasets were downloaded from the GEO database. We downloaded 15 histone peak data for 6 histone markers (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K27me3, and H3K9me3) from GSE163548 and GSE112655⁹⁶, 3 histone markers (H3K27ac, H3K4me1, H3K4me3) from NBDC database ID hum0207.v1⁹⁷. All epigenetic data were generated from knee OA patients’ synovium tissues.

The enrichment analysis was performed by chi-square test, compared with background SNPs from SNPsnap⁹⁸ with matched MAF, LD buddies, distance to nearest gene, and gene density.

Estimated heritability enrichment for arthritis and mental disorders

The heritability enrichment analysis was performed following the method of Kosoy et al.¹³. Briefly, we used the partitioned heritability analysis of LDSC²³ to calculate the heritability enrichment, and the estimated coefficients from LD score regression are normalized by the per-SNP heritability (h2/total SNPs per GWAS). To enable comparisons of the regression coefficients across traits with a wide range of heritabilities, we chose to normalize by the per-SNP heritability and named this adjusted metric the “normalized heritability coefficient”. The normalized heritability coefficients of mental disorders were from single GWAS summary data, and the normalized heritability coefficient of OA and RA were combined separately by meta-analysis (METASOFT v2.0.0)⁹⁹.

For enrichment analysis in independent eQTL, background SNPs were selected from SNPsnap database with matched MAF, LD buddies, distance to nearest gene, and gene density. For enrichment analysis in eQTac regions, background regions were non-significant eQTac regions.

Colocalization analysis between synovium eQTLs and related traits GWAS

Bayesian colocalization analysis

We searched synovium-related traits in the NHGRI-EBI GWAS Catalog (version e105_r2022-03-08)¹⁰⁰, and downloaded 25 GWAS datasets that had full summary statistics available (Supplementary Data 2). To examine colocalization between eQTLs and GWAS associations, we analyzed all 25 genome-wide significant signals by using coloc v5.2.0¹⁰¹.

Specifically, for each genome-wide significant signal (P < 5×10⁻⁸), we considered the region spanning 100 kb on either side of the index variants and merged the overlapped region. Correlations (LD) between SNPs were calculated in the UK Biobank or 1000 genomes EAS population, depending on the GWAS population. We also harmonized the allele orders between SNP summary statistics and reference population, to avoid conflict. The MHC region (chr6:28477797-33448354) was excluded from GWAS summary data.

Firstly, we used runsusie() function to distinguish multiple causal variants and obtained the posterior probability for each variant, the coverage of credible sets was set to 0.3 to capture moderate signals. Then for each GWAS signal that overlapped with any eQTL signals, we conducted the colocalization analysis. We considered a 60% posterior probability of GWAS and eQTL shared association in the region (PPH4 ≥ 0.6) to indicate evidence of colocalization.

LD-based colocalization analysis

For genes that couldn’t well fine-mapped by coloc5, we used previously described methods by conducting LD and conditional analysis¹⁰² to perform colocalization. We performed an initial colocalization analysis based on LD between a lead GWAS variant and a lead-independent eQTL variant.

To get the lead GWAS SNPs, we extracted all SNPs that met the threshold of genome-wide significance (P = 5 × 10⁻⁸) from both 25 GWAS summary datasets and GWAS catalog database. The GWAS catalog SNPs were searched as “arthritis” and “synovium”, and downloaded all associations except the tendon rupture phenotype. Then manually selected the trait for “rheumatoid arthritis”, “osteoarthritis”, “juvenile idiopathic arthritis”, “synovitis”, and “ankylosing spondylitis”. To reduce redundancy, we next pruned the GWAS SNPs by plink –clump with an LD threshold r2 = 0.7.

We then performed the conditional analysis in the eQTL data by providing genotypes for the lead GWAS variant to regression model as a covariate. We considered signals to be colocalized if (1) the pairwise LD was high between the GWAS variant and eQTL variants (r2 ≥ 0.7 in both in eQTL population and GWAS population) and (2) after conditioning on the GWAS variant, the lead eQTL variant no longer met the eQTL mapping threshold of eGene.

Comparison of colocalized genes in different eQTL datasets

For the colocalized genes, we compared our results with genes from previous representative studies to identify novel genes specifically identified by our synovium dataset. For OA, the GWAS data were collected from Boer et al.²⁴, Zengini et al.²⁶, and Tachmazidou et al.²⁷ (n = 826,690, 327,918, and 455,221, respectively). The colocalized genes were identified by using eQTL data from 48 GTEx tissues, and the synovium eQTL study¹¹ from 77 individuals, respectively. The RA GWASs data were collected from Ishigaki et al.^25,28, Ha et al.³², and Okada et al.²⁹ (n = 212,453, 276,020, 311,292, and 103,638 respectively). The colocalized genes were identified by using eQTL data from 48 GTEx tissues, DICE immune cells eQTLs, and BlurPrint immune cells eQTLs (monocyte, neutrophils, and T cells), respectively. The AS GWASs data were collected from Ellinghaus et al.³¹ (n = 42,939). The colocalized genes were identified by using eQTL data from peripheral blood. The JIA GWASs data were collected from Hinks et al.³⁰ (n = 15,872). The colocalized genes were identified by using eQTL data from LCLs, T cells, and fibroblast. For AS and JIA, we also conducted LD-based colocalization analysis for the GWAS tag SNPs with 48 GTEx tissues and BlurPrint immune cells eQTLs (monocytes, neutrophils, and T cells) to get the colocalized genes. All used GWAS studies are listed in Supplementary Data 2. We annotated the related functions of colocalized genes in three databases: DisGeNET¹⁰³, MGI¹⁰⁴, PHGKB¹⁰⁵, and previously published articles.

Protein–protein (PPI) analysis for colocalized genes

We conducted PPI analysis for the colocalized gene using the online version of STRING database v11.5 with an interaction threshold of 0.3. We used Cytoscape to visualize the obtained PPI network and identified hub genes with highest degree in the network.

Cell culture

The rheumatoid fibroblast-like synoviocyte line MH7A were cultured in DMEM medium (HyClone, USA) supplemented with 10% fetal bovine serum (Biological Industries, Israel), 100 units/mL penicillin, and 0.1 mg/mL streptomycin at 37 °C incubator with 5% CO₂. The MH7A cell line was obtained from Shanghai Guan&Dao Biological Engineering Co., Ltd and was authenticated using short tandem repeat (STR) profiling by scientific service at Beijing Tsingke Biotech (Beijing, China).

Fragment deletion by CRISPR-Cas9

Genotyping of rs142845557 was conducted by PCR in MH7A cells. A 918 bp sequence centered on rs142845557 was PCR-amplified from MH7A genomic DNA using primers in Supplementary Table 6. To efficiently eliminate the fragment containing rs142845557, CRISPR-associated RNA-guided endonuclease Cas9 cleavage technology (CRISPR-Cas9) was used¹⁰⁶. In brief, we first designed a set of single-guided RNAs (sgRNAs) targeting upstream and downstream of the enhancer fragment by using the CRISPR design platform maintained by the CRISPick (https://portals.broadinstitute.org/gppx/crispick/public). One pair of sgRNA was designed for this SNP (Supplementary Table 6). Oligonucleotides containing these sgRNAs were cloned into lentiCRISPR v2 plasmid (Addgene#52961).

DNA and RNA isolation and real-time qPCR (qRT-PCR)

DNA was isolated using the TIANGEN Genomic DNA Extraction Kit (catalog no. DP304; TIANGEN Biotech, Beijing, China). Total RNA was isolated from the MH7A cells using fast 200 (Fastagen, China) and was reverse-transcribed into cDNA by using the PrimeScript RT Reagent kit (TakaRa, Japan). The qRT-PCR reaction was performed using the QuantiTect SYBR Green PCR Kit (QIAGEN, USA). We used GAPDH as an endogenous control to normalize the differences in samples. Primers of qRT-PCR are shown in Supplementary Table 6.

Scratch wound-healing assay

1 × 10⁵ cells/well MH7A cells undergoing different treatments were cultured in a 24-well tissue culture plate. The straight wound in the middle of the culture was subsequently created by a sterile pipette tip after cells reached 100% confluence. After being washed with phosphate-buffered saline (PBS) twice to smooth the edge of scratch and to remove the floating cells, the cells were cultured in DMEM medium supplemented with 1% fetal bovine serum at 37 °C with 5% CO₂.

Transwell assay

Transwell assay was performed to test the invasion ability of MH7A cells. 8-μm-pore transwell chambers (Corning, USA) with 20 µL Matrigel precoated on the upper transwell chamber were put on a 24-well plate. The MH7A cells undergoing different treatments were transferred to the serum-free medium of the upper transwell chamber, 600 µL medium with 15% fetal bovine serum was added to the lower chamber as a chemoattractant. After 48 h, the upper chamber was washed with PBS multiple times. Cells in the upper layer that had not migrated were removed with cotton swabs gently. Moreover, the transwell chamber was fixed in 4% paraformaldehyde solution for 15 min and stained by 0.1% crystal violet for 20 min. The cells were counted by an inverted optical microscope (NOVEL, China) and photographed.

Cell proliferation assay by using Cell-Counting Kit-8 (CCK-8)

The MH7A cells undergoing different treatments (5 × 10³ cells/well) were seeded in 96-well plates. And then, the CCK-8 solution was added to each well at 24, 48, 72, and 96 h, and incubated for another 3 h in the incubator. After the incubation, we used a microplate reader (MULTISKAN FC, Thermo Scientific, USA) to detect the optical density (OD) value at 450 nm. There were 5 replicates in each group.

Cell apoptosis assay using one-step TUNEL apoptosis assay kit

The MH7A cells undergoing different treatments (2 × 10⁴ cells/well) were seeded in 96-well plates. TUNEL-reaction was performed by using the one-step TUNEL apoptosis assay kit (Beyotime, China) according to the manufacturer’s instructions. Each well was fixed with paraformaldehyde for 30 min and permeabilized with 0.1% Triton® X-100 for 15 min. The wells were then washed with PBS and incubated with TUNEL test solution for 1 h at 37 °C. After washed by PBS twice, we used DAPI to counterstain cell nuclei for 10 min at room temperature. Randomly chosen fields were captured by using microscope (NOVEL, China).

Quantify the accessibility of potential regulatory elements

Firstly, we trained a gkm-SVM⁷³ model to predict the chromatin accessibility for different alleles of variants located on the potential regulatory elements. The gkm-SVM produced a scoring function characterized by a set of weights quantifying the contribution of each possible 10-mer to a region’s chromatin accessibility in synovium tissue. All 154,649 peaks of ATAC-seq were further trimmed, and 100-bp sequences of summits were used as the positive training set to maximize the open chromatin signals. We then generated a negative training set by randomly sampling from the genome of regions that matched the length, GC content, and repeat fraction of the positive training set (gkm-SVM v0.8.0). To remove false negative regions as much as possible, we excluded any regions with P < 0.2 from the sampling. We then trained a gkm-SVM model by LS-GKM⁴⁴ to accelerate computation, with default parameters in the gkm-SVM model (word length l = 10, informative columns k = 6, and truncated filter d = 3) and measured the classification performance using ROC curves.

Then we applied this model to predict the scores of sequences which 9 bp surrounding all SNPs by “gkmpredict” in the LS-GKM⁴⁴. The chromatin accessibility score (CAS) for each PRE was defined as the weighted sum of genotype dosage of sequences around SNPs located on this PRE, with the predict the scores as weight:

$${CAS}=\mathop{\sum }\limits_{i=1}^{n}\mathop{\sum }\limits_{j=1}^{m}({S}_{{ij}}\times {G}_{{ij}})$$

(3)

The n represent the SNP counts in the PRE region, m is the number of alleles for SNP i, G is the genotype dosage for allele j, S is the score of j allele predicted by the SVM model.

The potential regulatory elements (PRE) were defined as the ±250 bp region of summits of each peak, which referred to the enhancer definition of Activity by Contact (ABC) model⁴⁷. We excluded the PREs that contained any indel or insert mutations to keep the consistent length of sequences. We also removed PREs for those containing SNPs less than 10 bp apart, as these SNPs genotypes would disturb the prediction of each other. Considering that if a PRE only contained a single variant or multiple variants in stronger LD, the variation of this PRE across the population would be highly collinear with this variant (or haplotype), which was not expected in the following analysis. So, we only retained PREs containing more than one independent SNP (LD R² < 0.3).

Validation of the eQTac method in another independent dataset

We used the dataset from dbGap phs000815¹⁴ (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000815.v2.p1) containing genotype, RNA-seq, and ATAC-seq data from 92 individuals to validate our eQTac method. For the genotype data, we performed imputation analysis and then SNPs with MAF < 0.1 were removed due to the small sample size. For the ATAC-seq data, we used macs2 to call the peaks and extract the 100 bp sequences around the summits of peaks as the positive training set. The gene expression data were processed using the same pipeline as our eQTL calculation. The actual open chromatin scores of PRE regions were defined as the inverse normalized transformed reads counts from ATAC-seq data. Significant eQTL genes were subjected to subsequent analysis. The correlation results using linear regression analysis for the open chromatin scores and target gene expression values were used to define the ground truth. That’s, PRE-gene pairs with P < 0.05 were defined as true correlation. Then we used our eQTac method to identify the significant PRE-gene pairs.

Identification of eQTac

The cis-eQTac was calculated for each gene calculated in cis-eQTL mapping, with the same cis region and covariates as eQTL mapping:

$${{{{{\rm{expression}}}}}} \sim {\beta }_{1}\times {{{{{\rm{CAS}}}}}}+{{{{{\rm{Covariates}}}}}}$$

(4)

The CAS was the chromatin accessibility score for each PRE, with β₁ as the effect of this open chromatin region to the target gene. To identify the significant eQTac and controlled FDR (false discovery rates), we generated the null distribution of P-values for all PRE-gene pairs by randomly permuting the individual labels of gene expression 100 times. The FDR was determined as the proportion of permuted P-values over the proportion of non-permuted P-values under a specific P-value threshold. The threshold of P-values was set as 6.36 × 10⁻⁴ under FDR = 0.05.

Estimated causal SNPs for synovium eQTL

Dap-g¹⁰⁷ methods were applied to the cis-eQTL data to produce estimates of the causal SNPs. Dap-g is designed for multi-SNP genetic association analysis which employs a spike-and-slab prior model to select potential multiple independent cis-eQTLs in eQTL mapping. Briefly, the fine-mapping was conducted in the following steps:

(1)

Calculated single-SNP Bayes factor for each SNP-gene pair.
(2)

Calculated the priors probability for each SNP-gene pair by torus¹⁰⁸, which includes the distance of each SNP to gene TSS and the chromatin accessibility annotation as prior information.
(3)

Conducted multi-SNP fine-mapping by dap-g, with -ld_control 0.5 and –no_size_limit.

The 80%, 90%, 95%, and 99% credible set for each cis-eQTL consists of variants that include the causal variant with 80%, 90%, 95%, and 99% probability, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.