Skip to contents

The goal of tfboot is to facilitate statistical analysis of SNPs disrupting transcription factor binding sites (TFBS) using bootstrapping resampling to create empirical null distributions.

If you use tfboot, please consider citing the paper: Turner, S.D., et al. (2023). tfboot: Bootstrapping and statistical analysis for transcription factor binding site-disrupting variants in gene sets. bioRxiv 2023.07.14.549004. DOI: 10.1101/2023.07.14.549004.

Installation

You can install tfboot from GitHub with the code below. To install only tfboot:

# install.packages("devtools")
devtools::install_github("colossal-compsci/tfboot")

To install required and suggested packages, including motifbreakR and those needed to build the vignette:

# install.packages("devtools")
devtools::install_github("colossal-compsci/tfboot",
                         build_vignettes = TRUE, 
                         dependencies = c("Imports", "Suggests"))

See the pkgdown documentation for an introductory vignette and function documentation.

Example

Let’s use an example from the vignette. First, let’s load some pre-baked motifbreakR results. mbres is a set of motifbreakR results run on SNPs in the 5kb promoter region of a random selection of 5 genes. mball is the precomputed set of motifbreakR results run on SNPs in the promoter region of all genes.

mbres <- vignettedata$mbres
mbres
#> # A tibble: 1,335 × 10
#>    gene_id   SNP_id            tf    pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#>    <chr>     <chr>             <chr>  <dbl>  <dbl>    <dbl>    <dbl> <chr>       <dbl>            <dbl>
#>  1 100316005 chr33:7472152_G/C ACE2   0.986  0.783     4.85     3.87 strong     -0.980          -0.199 
#>  2 100316005 chr33:7472176_T/G ADR1   0.759  0.917     3.34     4.01 weak        0.673           0.154 
#>  3 100316005 chr33:7471139_A/C AFT2   0.964  0.878     5.50     5.02 weak       -0.480          -0.0843
#>  4 100316005 chr33:7472176_T/G AFT2   0.902  0.740     5.15     4.24 strong     -0.911          -0.160 
#>  5 100316005 chr33:7473834_G/T AFT2   0.879  0.705     5.02     4.04 strong     -0.980          -0.172 
#>  6 100316005 chr33:7472152_G/C AGL42  0.859  0.693     5.14     4.15 strong     -0.986          -0.165 
#>  7 100316005 chr33:7471139_A/C AGL55  0.695  0.861     3.97     4.91 strong      0.939           0.165 
#>  8 100316005 chr33:7472830_A/T ALX3   0.972  0.776     4.23     3.43 strong     -0.806          -0.185 
#>  9 100316005 chr33:7473857_T/C ALX3   0.919  0.773     4.02     3.42 weak       -0.600          -0.138 
#> 10 100316005 chr33:7473583_G/A ARG80  0.731  0.961     3.16     4.15 strong      0.993           0.230 
#> # ℹ 1,325 more rows
mball <- vignettedata$mball
mball
#> # A tibble: 12,689 × 10
#>    gene_id   SNP_id            tf        pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#>    <chr>     <chr>             <chr>      <dbl>  <dbl>    <dbl>    <dbl> <chr>       <dbl>            <dbl>
#>  1 100315917 chr33:5706617_C/T ARG80      0.979  0.749     4.23     3.24 strong     -0.993           -0.230
#>  2 100315917 chr33:5706617_C/T ARG81      0.991  0.825     5.89     4.90 strong     -0.982           -0.165
#>  3 100315917 chr33:5703167_G/T ARGFX      0.781  0.889     5.06     5.74 weak        0.677            0.105
#>  4 100315917 chr33:5706617_C/T ARR1       0.634  0.880     2.53     3.42 strong      0.892            0.231
#>  5 100315917 chr33:5707492_C/T AT1G19040  0.882  0.779     7.60     6.73 strong     -0.861           -0.100
#>  6 100315917 chr33:5704406_C/G AT3G46070  0.784  0.892     5.13     5.79 weak        0.666            0.103
#>  7 100315917 chr33:5706617_C/T ATHB-12    0.781  0.890     4.12     4.67 weak        0.552            0.106
#>  8 100315917 chr33:5706617_C/T ATHB-13    0.721  0.868     4.98     5.98 strong      0.993            0.145
#>  9 100315917 chr33:5706617_C/T ATHB-16    0.813  0.966     5.32     6.32 strong      0.998            0.153
#> 10 100315917 chr33:5706617_C/T ATHB-4     0.778  0.931     4.98     5.94 strong      0.963            0.151
#> # ℹ 12,679 more rows

Summarize motifbreakR results on our 5 genes of interest. This shows us the actual values for the number of SNPs in the upstream regions of these genes, and summary statistics on the allele differences, effect sizes, etc. See the vignette and ?mb_summarize for details.

mbsmry <- mb_summarize(mbres)
mbsmry
#> # A tibble: 1 × 7
#>   ngenes nsnps nstrong alleleDiffAbsMean alleleDiffAbsSum alleleEffectSizeAbsMean alleleEffectSizeAbsSum
#>    <int> <int>   <int>             <dbl>            <dbl>                   <dbl>                  <dbl>
#> 1      5  1335     950             0.798            1066.                   0.155                   206.

Bootstrap resample motifbreakR results for all genes. Resample sets of 5 genes 250 times.

set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250)
mbboot$bootwide
#> # A tibble: 250 × 9
#>     boot genes                                    ngenes nsnps nstrong alleleDiffAbsMean alleleDiffAbsSum alleleEffectSizeAbsMean alleleEffectSizeAbsSum
#>    <int> <chr>                                     <int> <int>   <int>             <dbl>            <dbl>                   <dbl>                  <dbl>
#>  1     1 431304;426469;100315917;396544;374124         5  1062     750             0.797             847.                   0.155                   164.
#>  2     2 426183;395959;431304;429501;396170            5  1244     914             0.808            1005.                   0.157                   195.
#>  3     3 102465361;426183;396544;426469;429172         5   886     655             0.812             720.                   0.149                   132.
#>  4     4 396007;407779;693245;429501;100498692         5  1514    1139             0.815            1234.                   0.153                   232.
#>  5     5 426883;396544;408041;426183;426469            5   925     623             0.787             728.                   0.147                   136.
#>  6     6 425059;429035;100529062;396007;425613         5  1350    1026             0.821            1109.                   0.155                   209.
#>  7     7 408042;426880;100498692;425499;426885         5   823     627             0.823             677.                   0.157                   129.
#>  8     8 396170;425058;426886;395587;396045            5  1262     910             0.805            1016.                   0.147                   185.
#>  9     9 102466833;426183;100529061;396045;395959      5  1263     963             0.822            1038.                   0.153                   193.
#> 10    10 429035;408042;100529062;100529061;425613      5  1289     987             0.823            1061.                   0.154                   198.
#> # ℹ 240 more rows
mbboot$bootdist
#> # A tibble: 6 × 2
#>   metric                  bootdist   
#>   <chr>                   <list>     
#> 1 alleleDiffAbsMean       <dbl [250]>
#> 2 alleleDiffAbsSum        <dbl [250]>
#> 3 alleleEffectSizeAbsMean <dbl [250]>
#> 4 alleleEffectSizeAbsSum  <dbl [250]>
#> 5 nsnps                   <dbl [250]>
#> 6 nstrong                 <dbl [250]>

Compare the values from our five genes of interest to the empirical null distribution from bootstrap resampling.

bootstats <- mb_bootstats(mbsmry, mbboot)
bootstats
#> # A tibble: 6 × 5
#>   metric                      stat bootdist     bootmax     p
#>   <chr>                      <dbl> <list>         <dbl> <dbl>
#> 1 nsnps                   1335     <dbl [250]> 1597     0.136
#> 2 nstrong                  950     <dbl [250]> 1210     0.188
#> 3 alleleDiffAbsMean          0.798 <dbl [250]>    0.831 0.836
#> 4 alleleDiffAbsSum        1066.    <dbl [250]> 1304.    0.148
#> 5 alleleEffectSizeAbsMean    0.155 <dbl [250]>    0.163 0.456
#> 6 alleleEffectSizeAbsSum   206.    <dbl [250]>  251.    0.128

Visualize the results:

plot_bootstats(bootstats)

See vignette("intro", package="tfboot") or the articles on the pkgdown documentation website for more.