1. Data import

Introduction to `bubbler`

Denoised amplicon count datasets can be generated in various ways, so bubbler provides several data import methods. I designed bubbler to work alongside other common microbial ecology analysis tools (phyloseq, qiime2, DADA2, vegan, ape, etc.); its main purpose is to simplify and add functionality to community composition visualization.

bubbler generates relative abundance tables in tibble::tibble format. This format was chosen because it fits into the tidyverse ecosystem, which was used to develop the majority of this package. Generated tables can be passed to functions for arranging, pooling, subsampling, and so on, each of which returns a modified table. Using the magrittr pipe (%>%), you can string together multiple functions to create the desired visualization. Here I provide examples for the first and arguably most important step: importing the data.
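
As a rough illustration of the piping idea, the sketch below chains a few dplyr verbs on a hand-built tibble that mimics the shape of a bubbler relative abundance table (sample_id, asv, level, taxon, rel_abund). The table contents and the name `rel_abund_tb` are made up for this example, and the verbs are plain dplyr functions rather than bubbler's own arranging or pooling functions.

```r
library(dplyr)

# Hypothetical stand-in shaped like a bubbler relative abundance table.
rel_abund_tb <- tibble::tibble(
  sample_id = c("Smp1", "Smp1", "Smp1", "Smp2", "Smp2", "Smp2"),
  asv       = c("ASV1", "ASV2", "ASV3", "ASV1", "ASV2", "ASV3"),
  level     = "Phylum",
  taxon     = c("Bacillota", "Bacillota", "Bacteroidota",
                "Bacillota", "Bacillota", "Bacteroidota"),
  rel_abund = c(0.2, 0.1, 0.2, 0.1, 0.3, 0.1)
)

# Each step takes a table and returns a modified table, so steps can be chained.
phylum_totals <- rel_abund_tb %>%
  filter(rel_abund > 0) %>%                                   # drop absent ASVs
  group_by(sample_id, taxon) %>%                              # one group per sample and taxon
  summarise(rel_abund = sum(rel_abund), .groups = "drop") %>% # sum ASVs within each taxon
  arrange(sample_id, desc(rel_abund))                         # most abundant taxon first

phylum_totals
```

Because every step returns a tibble, the result can be handed straight to the next function in the chain or to a plotting call.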

QIIME2 - .qza

QIIME 2 ASV table and taxonomy artifacts (.qza) and, optionally, QIIME-formatted metadata (.tsv) can be imported. Here, bubbler::rel_abund_qiime uses data from the “Moving Pictures” QIIME 2 tutorial.

# path to qiime-formatted asv counts 
counts_q <- system.file("extdata", "qiime", "table-dada2.qza", package = "bubbler")
# path to qiime-formatted taxonomy data 
taxa_q <- system.file("extdata", "qiime", "taxonomy.qza", package = "bubbler")
# path to qiime-formatted metadata
metadata_q <- system.file("extdata", "qiime", "sample-metadata.tsv", package = "bubbler")

# make a relative abundance table 
rel_abund_qiime(counts_q, taxa_q, metadata_q)
#> # A tibble: 26,180 × 13
#>    sample_id asv    level taxon rel_abund barcode_sequence body_site  year month
#>    <chr>     <chr>  <chr> <chr>     <dbl> <fct>            <fct>     <dbl> <dbl>
#>  1 L1S105    33e2c… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  2 L1S105    5656d… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  3 L1S105    7d893… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  4 L1S105    ecf9e… Phyl… Prot…   0       AGTGCGATGCGT     gut        2009     3
#>  5 L1S105    acfe4… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  6 L1S105    80b20… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  7 L1S105    a1b97… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  8 L1S105    d781f… Phyl… Firm…   0       AGTGCGATGCGT     gut        2009     3
#>  9 L1S105    bfbed… Phyl… Firm…   0.00193 AGTGCGATGCGT     gut        2009     3
#> 10 L1S105    90d32… Phyl… Firm…   0.00399 AGTGCGATGCGT     gut        2009     3
#> # ℹ 26,170 more rows
#> # ℹ 4 more variables: day <dbl>, subject <fct>,
#> #   reported_antibiotic_usage <fct>, days_since_experiment_start <dbl>

DADA2 - .tsv

DADA2 denoises .fastq files to generate ASV count tables and ASV taxonomic classifications. Normally, I export these as .tsv files. bubbler::rel_abund_tsv expects an ASV table with ASVs as columns and samples as rows (wide format), and a taxonomy table with taxonomic levels as columns and ASVs as rows.

# path to asv counts in tab-separated format
counts <- system.file("extdata", "tsv", "seqtab.tsv", package = "bubbler")
# path to taxonomy data in tab-separated format
taxa <- system.file("extdata", "tsv", "taxa.tsv", package = "bubbler")
# path to metadata in tab-separated format
metadata <- system.file("extdata", "tsv", "metadata.tsv", package = "bubbler")

# make a relative abundance table
rel_abund_tsv(counts, taxa, metadata)
#> # A tibble: 200 × 8
#>    sample_id asv   level  taxon         rel_abund Depth Carbon_source Date      
#>    <chr>     <chr> <chr>  <chr>             <dbl> <dbl> <chr>         <date>    
#>  1 Smp1      ASV1  Phylum Actinomyceto…   6.09e-6    15 Hexadecane    2021-01-21
#>  2 Smp1      ASV2  Phylum Bacillota       1.83e-5    15 Hexadecane    2021-01-21
#>  3 Smp1      ASV3  Phylum Bacillota       4.83e-3    15 Hexadecane    2021-01-21
#>  4 Smp1      ASV4  Phylum Pseudomonado…   5.48e-5    15 Hexadecane    2021-01-21
#>  5 Smp1      ASV5  Phylum Pseudomonado…   6.09e-5    15 Hexadecane    2021-01-21
#>  6 Smp1      ASV6  Phylum Pseudomonado…   7.71e-3    15 Hexadecane    2021-01-21
#>  7 Smp1      ASV7  Phylum Bacteroidota    1.76e-4    15 Hexadecane    2021-01-21
#>  8 Smp1      ASV8  Phylum Bacillota       0          15 Hexadecane    2021-01-21
#>  9 Smp1      ASV9  Phylum Pseudomonado…   0          15 Hexadecane    2021-01-21
#> 10 Smp1      ASV10 Phylum Pseudomonado…   6.09e-6    15 Hexadecane    2021-01-21
#> # ℹ 190 more rows
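
For readers who have DADA2 results in memory rather than on disk, here is a minimal sketch of writing them out in the layout described above: samples as rows and ASVs as columns for the counts, ASVs as rows and taxonomic levels as columns for the taxonomy. The matrices are tiny hypothetical stand-ins for real DADA2 output, and the first-column names `sample_id` and `asv` are assumptions; check the rel_abund_tsv documentation for the exact headers it expects.

```r
# Hypothetical stand-ins for DADA2 outputs (a sequence table and a taxonomy matrix).
seqtab <- matrix(
  c(10L, 0L, 5L,
     3L, 7L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Smp1", "Smp2"), c("ASV1", "ASV2", "ASV3"))
)
taxa <- matrix(
  c("Bacteria", "Bacillota",
    "Bacteria", "Pseudomonadota",
    "Bacteria", "Bacteroidota"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Domain", "Phylum"))
)

# Move row names into an explicit first column so they survive the export.
# The column names "sample_id" and "asv" are assumptions for this sketch.
counts_df <- data.frame(sample_id = rownames(seqtab), seqtab, check.names = FALSE)
taxa_df   <- data.frame(asv = rownames(taxa), taxa, check.names = FALSE)

# Write tab-separated files to a temporary directory.
out_dir     <- tempdir()
counts_path <- file.path(out_dir, "seqtab.tsv")
taxa_path   <- file.path(out_dir, "taxa.tsv")
write.table(counts_df, counts_path, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(taxa_df,   taxa_path,   sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting paths can then be passed to the import function in the same way as the packaged example files.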

Phyloseq - phyloseq R object

If you are analyzing your data through the phyloseq package, the phyloseq object can be imported as well, as long as it contains an otu_table, tax_table, and optionally, sam_data.

# example phyloseq object (`physeq` must already exist in the session)
rel_abund_phy(physeq, taxa_data = TRUE, meta_data = TRUE)
#> Loading required package: phyloseq
#> # A tibble: 1,000 × 9
#>    sample_id asv   level  taxon  rel_abund depth location date       sample_id.1
#>    <chr>     <chr> <chr>  <chr>      <dbl> <int> <chr>    <date>     <chr>      
#>  1 Smp1      ASV1  Phylum Pseud…   2.12e-6    30 place_b  2020-02-17 Smp1       
#>  2 Smp1      ASV2  Phylum Spiro…   6.35e-6    30 place_b  2020-02-17 Smp1       
#>  3 Smp1      ASV3  Phylum Pseud…   1.68e-3    30 place_b  2020-02-17 Smp1       
#>  4 Smp1      ASV4  Phylum Pseud…   1.91e-5    30 place_b  2020-02-17 Smp1       
#>  5 Smp1      ASV5  Phylum Actin…   2.12e-5    30 place_b  2020-02-17 Smp1       
#>  6 Smp1      ASV6  Phylum Actin…   2.68e-3    30 place_b  2020-02-17 Smp1       
#>  7 Smp1      ASV7  Phylum Pseud…   6.14e-5    30 place_b  2020-02-17 Smp1       
#>  8 Smp1      ASV8  Phylum Bacil…   0          30 place_b  2020-02-17 Smp1       
#>  9 Smp1      ASV9  Phylum Pseud…   0          30 place_b  2020-02-17 Smp1       
#> 10 Smp1      ASV10 Phylum Bacte…   2.12e-6    30 place_b  2020-02-17 Smp1       
#> # ℹ 990 more rows
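
The call above assumes a phyloseq object named `physeq` already exists in the session. As a minimal sketch, such an object can be assembled from an OTU table, a taxonomy table, and sample data with the phyloseq constructors; the toy matrices and metadata below are hypothetical and only illustrate the three components mentioned in the text.

```r
library(phyloseq)

# Hypothetical toy inputs: 2 samples, 3 ASVs.
counts <- matrix(
  c(10L, 0L, 5L,
     3L, 7L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Smp1", "Smp2"), c("ASV1", "ASV2", "ASV3"))
)
taxa <- matrix(
  c("Bacteria", "Bacillota",
    "Bacteria", "Pseudomonadota",
    "Bacteria", "Bacteroidota"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Domain", "Phylum"))
)
meta <- data.frame(
  depth    = c(30L, 15L),
  location = c("place_a", "place_b"),
  row.names = c("Smp1", "Smp2")
)

# Samples are rows in `counts`, so taxa_are_rows = FALSE.
physeq <- phyloseq(
  otu_table(counts, taxa_are_rows = FALSE),
  tax_table(taxa),
  sample_data(meta)
)
physeq
```

Sample names must match between the OTU table and the sample data, and ASV names must match between the OTU table and the taxonomy table, otherwise phyloseq drops or rejects the unmatched entries.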

Kraken2/Bracken - .txt

The .txt files from a Bracken workflow can be imported by passing the path to the directory that contains them.

# path to bracken-formatted .txt files
path <- system.file("extdata", "bracken", package = "bubbler")

rel_abund_bracken(path)
#> # A tibble: 9,200 × 3
#>    sample_id taxon                                  rel_abund
#>    <chr>     <chr>                                      <dbl>
#>  1 20_S91    Stenotrophomonas sp. LM091           0.000206   
#>  2 20_S91    Stenotrophomonas sp. 364             0.0000245  
#>  3 20_S91    Stenotrophomonas sp. 169             0.000000670
#>  4 20_S91    Stenotrophomonas sp. Pemsol          0.000000168
#>  5 20_S91    Stenotrophomonas sp. DR822           0.000000168
#>  6 20_S91    Stenotrophomonas sp. NA06056         0.000000168
#>  7 20_S91    Stenotrophomonas sp. SXG-1           0.000000168
#>  8 20_S91    Stenotrophomonas rhizophila          0.0000106  
#>  9 20_S91    Stenotrophomonas maltophilia         0.00000452 
#> 10 20_S91    Stenotrophomonas sp. SAU14A_NAIMI4_5 0.000000335
#> # ℹ 9,190 more rows
