1. Data import
Introduction to bubbler
Denoised amplicon count datasets can be generated in several ways, so bubbler provides several data import methods. I designed bubbler to work alongside other common microbial ecology analysis tools (phyloseq, qiime2, DADA2, vegan, ape, etc.), with its main purpose being to simplify and add functionality to community composition visualization.
bubbler generates relative_abundance tables in tibble::tibble format. This format was chosen because it fits into the tidyverse ecosystem, which was used to develop the majority of this package. Generated tables can be modified by passing them to arranging, pooling, subsampling, and similar functions, each of which returns a modified table. Using the magrittr::%>% pipe, you can string together multiple functions to create the desired visualization. Here I provide examples for the first and arguably most important step: importing the data.
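To make the idea of a long-format relative abundance table concrete, here is a minimal base R sketch. It is not bubbler's implementation: the toy counts are made up, and normalizing within each sample is an assumption chosen for illustration (bubbler may normalize differently).

```r
# Toy count matrix: samples as rows, ASVs as columns (hypothetical values)
counts <- matrix(
  c(10, 30, 60,
     0, 25, 75),
  nrow = 2, byrow = TRUE,
  dimnames = list(sample_id = c("Smp1", "Smp2"),
                  asv = c("ASV1", "ASV2", "ASV3"))
)

# Assumption: divide each count by its sample total, so every row sums to 1
rel <- prop.table(counts, margin = 1)

# Reshape to long format: one row per sample/ASV combination
rel_long <- as.data.frame(as.table(rel),
                          responseName = "rel_abund",
                          stringsAsFactors = FALSE)
rel_long
```

The result has one row per sample and ASV, with sample_id, asv, and rel_abund columns, which is the same general shape as the tables shown in the import examples that follow.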
QIIME2 - .qza
QIIME2 ASV table and taxonomy artifacts (.qza) and, optionally, the QIIME-formatted metadata (.tsv) can be imported. Here, bubbler::rel_abund_qiime uses data from the “Moving Pictures” QIIME2 tutorial.
# path to qiime-formatted asv counts
counts_q <- system.file("extdata", "qiime", "table-dada2.qza", package = "bubbler")
# path to qiime-formatted taxonomy data
taxa_q <- system.file("extdata", "qiime", "taxonomy.qza", package = "bubbler")
# path to qiime-formatted metadata
metadata_q <- system.file("extdata", "qiime", "sample-metadata.tsv", package = "bubbler")
# make a relative abundance table
rel_abund_qiime(counts_q, taxa_q, metadata_q)
#> # A tibble: 26,180 × 13
#> sample_id asv level taxon rel_abund barcode_sequence body_site year month
#> <chr> <chr> <chr> <chr> <dbl> <fct> <fct> <dbl> <dbl>
#> 1 L1S105 33e2c… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 2 L1S105 5656d… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 3 L1S105 7d893… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 4 L1S105 ecf9e… Phyl… Prot… 0 AGTGCGATGCGT gut 2009 3
#> 5 L1S105 acfe4… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 6 L1S105 80b20… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 7 L1S105 a1b97… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 8 L1S105 d781f… Phyl… Firm… 0 AGTGCGATGCGT gut 2009 3
#> 9 L1S105 bfbed… Phyl… Firm… 0.00193 AGTGCGATGCGT gut 2009 3
#> 10 L1S105 90d32… Phyl… Firm… 0.00399 AGTGCGATGCGT gut 2009 3
#> # ℹ 26,170 more rows
#> # ℹ 4 more variables: day <dbl>, subject <fct>,
#> # reported_antibiotic_usage <fct>, days_since_experiment_start <dbl>
DADA2 - .tsv
DADA2 denoises .fastq files to generate ASV count tables and ASV taxonomic classifications. Normally, I export these as .tsv files. bubbler::rel_abund_tsv expects an ASV table with ASVs as columns and samples as rows (wide format), and a taxonomy table with taxonomic levels as columns and ASVs as rows.
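The layout described above can be sketched with base R as follows. The toy values, object names, and temporary file paths are hypothetical, and the header convention for the ID column (left blank here by write.table) is an assumption; compare against the seqtab.tsv and taxa.tsv files bundled with bubbler before relying on it.

```r
# Toy ASV table: samples as rows, ASVs as columns (hypothetical values)
seqtab <- matrix(
  c(12, 0, 7,
     3, 5, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Smp1", "Smp2"), c("ASV1", "ASV2", "ASV3"))
)

# Toy taxonomy table: ASVs as rows, taxonomic levels as columns
taxa_tab <- matrix(
  c("Bacteria", "Bacillota",
    "Bacteria", "Bacteroidota",
    "Bacteria", "Pseudomonadota"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Kingdom", "Phylum"))
)

# Write both tables as tab-separated files, keeping the row names
counts_path <- tempfile(fileext = ".tsv")
taxa_path <- tempfile(fileext = ".tsv")
write.table(seqtab, counts_path, sep = "\t", quote = FALSE, col.names = NA)
write.table(taxa_tab, taxa_path, sep = "\t", quote = FALSE, col.names = NA)

# Read the files back to confirm the orientation survived the round trip
seqtab_check <- read.delim(counts_path, row.names = 1, check.names = FALSE)
taxa_check <- read.delim(taxa_path, row.names = 1, check.names = FALSE)

# The ASV names in the count table should match those in the taxonomy table
identical(colnames(seqtab_check), rownames(taxa_check))
```

Checking that the ASV names line up across the two files before importing is a cheap way to catch a transposed table.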
# path to asv counts in tab-separated format
counts <- system.file("extdata", "tsv", "seqtab.tsv", package = "bubbler")
# path to taxonomy data in tab-separated format
taxa <- system.file("extdata", "tsv", "taxa.tsv", package = "bubbler")
# path to metadata in tab-separated format
metadata <- system.file("extdata", "tsv", "metadata.tsv", package = "bubbler")
# make a relative abundance table
rel_abund_tsv(counts, taxa, metadata)
#> # A tibble: 200 × 8
#> sample_id asv level taxon rel_abund Depth Carbon_source Date
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <date>
#> 1 Smp1 ASV1 Phylum Actinomyceto… 6.09e-6 15 Hexadecane 2021-01-21
#> 2 Smp1 ASV2 Phylum Bacillota 1.83e-5 15 Hexadecane 2021-01-21
#> 3 Smp1 ASV3 Phylum Bacillota 4.83e-3 15 Hexadecane 2021-01-21
#> 4 Smp1 ASV4 Phylum Pseudomonado… 5.48e-5 15 Hexadecane 2021-01-21
#> 5 Smp1 ASV5 Phylum Pseudomonado… 6.09e-5 15 Hexadecane 2021-01-21
#> 6 Smp1 ASV6 Phylum Pseudomonado… 7.71e-3 15 Hexadecane 2021-01-21
#> 7 Smp1 ASV7 Phylum Bacteroidota 1.76e-4 15 Hexadecane 2021-01-21
#> 8 Smp1 ASV8 Phylum Bacillota 0 15 Hexadecane 2021-01-21
#> 9 Smp1 ASV9 Phylum Pseudomonado… 0 15 Hexadecane 2021-01-21
#> 10 Smp1 ASV10 Phylum Pseudomonado… 6.09e-6 15 Hexadecane 2021-01-21
#> # ℹ 190 more rows
Phyloseq - phyloseq R object
If you are analyzing your data with the phyloseq package, the phyloseq object can be imported as well, as long as it contains an otu_table, a tax_table, and, optionally, sam_data.
# import an example phyloseq object (physeq)
rel_abund_phy(physeq, taxa_data = TRUE, meta_data = TRUE)
#> Loading required package: phyloseq
#> # A tibble: 1,000 × 9
#> sample_id asv level taxon rel_abund depth location date sample_id.1
#> <chr> <chr> <chr> <chr> <dbl> <int> <chr> <date> <chr>
#> 1 Smp1 ASV1 Phylum Pseud… 2.12e-6 30 place_b 2020-02-17 Smp1
#> 2 Smp1 ASV2 Phylum Spiro… 6.35e-6 30 place_b 2020-02-17 Smp1
#> 3 Smp1 ASV3 Phylum Pseud… 1.68e-3 30 place_b 2020-02-17 Smp1
#> 4 Smp1 ASV4 Phylum Pseud… 1.91e-5 30 place_b 2020-02-17 Smp1
#> 5 Smp1 ASV5 Phylum Actin… 2.12e-5 30 place_b 2020-02-17 Smp1
#> 6 Smp1 ASV6 Phylum Actin… 2.68e-3 30 place_b 2020-02-17 Smp1
#> 7 Smp1 ASV7 Phylum Pseud… 6.14e-5 30 place_b 2020-02-17 Smp1
#> 8 Smp1 ASV8 Phylum Bacil… 0 30 place_b 2020-02-17 Smp1
#> 9 Smp1 ASV9 Phylum Pseud… 0 30 place_b 2020-02-17 Smp1
#> 10 Smp1 ASV10 Phylum Bacte… 2.12e-6 30 place_b 2020-02-17 Smp1
#> # ℹ 990 more rows
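The physeq object used above is not constructed in this section. As a rough sketch of how an object with the three required components could be assembled, the following uses toy values and a hypothetical name (toy_physeq). It only builds the object when phyloseq is installed.

```r
# Toy inputs (hypothetical values, for illustration only)
otu <- matrix(
  c(10, 0, 5,
    20, 3, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Smp1", "Smp2"), c("ASV1", "ASV2", "ASV3"))
)
tax <- matrix(
  c("Bacteria", "Bacillota",
    "Bacteria", "Bacteroidota",
    "Bacteria", "Pseudomonadota"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Kingdom", "Phylum"))
)
meta <- data.frame(
  depth = c(15L, 30L),
  location = c("place_a", "place_b"),
  row.names = c("Smp1", "Smp2")
)

# Build the phyloseq object only if the package is available
toy_physeq <- NULL
if (requireNamespace("phyloseq", quietly = TRUE)) {
  toy_physeq <- phyloseq::phyloseq(
    phyloseq::otu_table(otu, taxa_are_rows = FALSE),  # samples are rows here
    phyloseq::tax_table(tax),                         # row names must match ASV names
    phyloseq::sample_data(meta)                       # row names must match sample names
  )
}
```

An object built this way could then be passed to rel_abund_phy in place of physeq, with the same taxa_data and meta_data arguments as in the example above.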
Kraken2/Bracken - .txt
The .txt files from a Bracken workflow can be imported by passing the path of the directory that contains them.
# path to bracken-formatted .txt files
path <- system.file("extdata", "bracken", package = "bubbler")
rel_abund_bracken(path)
#> # A tibble: 9,200 × 3
#> sample_id taxon rel_abund
#> <chr> <chr> <dbl>
#> 1 20_S91 Stenotrophomonas sp. LM091 0.000206
#> 2 20_S91 Stenotrophomonas sp. 364 0.0000245
#> 3 20_S91 Stenotrophomonas sp. 169 0.000000670
#> 4 20_S91 Stenotrophomonas sp. Pemsol 0.000000168
#> 5 20_S91 Stenotrophomonas sp. DR822 0.000000168
#> 6 20_S91 Stenotrophomonas sp. NA06056 0.000000168
#> 7 20_S91 Stenotrophomonas sp. SXG-1 0.000000168
#> 8 20_S91 Stenotrophomonas rhizophila 0.0000106
#> 9 20_S91 Stenotrophomonas maltophilia 0.00000452
#> 10 20_S91 Stenotrophomonas sp. SAU14A_NAIMI4_5 0.000000335
#> # ℹ 9,190 more rows
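For readers who want to see how a folder of Bracken outputs maps onto a table like the one above, here is a base R sketch. It is not the code behind rel_abund_bracken. The toy files and values are made up, taking sample_id from the file name is an assumption, and so is using Bracken's fraction_total_reads column as rel_abund.

```r
# Create a temporary folder holding two toy Bracken-style output files
bracken_dir <- tempfile("bracken_toy")
dir.create(bracken_dir)

toy <- data.frame(
  name = c("Stenotrophomonas maltophilia", "Stenotrophomonas rhizophila"),
  taxonomy_id = c(1L, 2L),            # placeholder IDs, not real taxonomy IDs
  taxonomy_lvl = "S",
  kraken_assigned_reads = c(80L, 15L),
  added_reads = c(4L, 1L),
  new_est_reads = c(84L, 16L),
  fraction_total_reads = c(0.84, 0.16)
)
for (sample in c("sampleA", "sampleB")) {
  write.table(toy, file.path(bracken_dir, paste0(sample, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read every .txt file and keep the sample name, taxon, and read fraction
files <- list.files(bracken_dir, pattern = "\\.txt$", full.names = TRUE)
per_sample <- lapply(files, function(f) {
  x <- read.delim(f, stringsAsFactors = FALSE)
  data.frame(
    sample_id = tools::file_path_sans_ext(basename(f)),  # assumption: file name = sample
    taxon = x$name,
    rel_abund = x$fraction_total_reads,                  # assumption: fraction as abundance
    stringsAsFactors = FALSE
  )
})

# Stack the per-sample tables into one long table
bracken_long <- do.call(rbind, per_sample)
bracken_long
```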