Skip to contents

Relative abundance

bubbler takes an ASV table and converts counts to relative abundances using this simple formula:

Relative abundance=xixi=1\text{Relative abundance} = \frac{x_i}{\sum x_i} = 1

Where xix_i is the count of a given ASV, divided by the sum of all ASV counts. This way, the sum of all relative abundances equal one. bubbler then tacks on taxonomic information and optionally metadata. By default, bubbler::bar_plot produces stacked bar plots, which show the proportional differences of counts between samples.

# 1. make rel_abund
rel_abund  <-  rel_abund_qiime(counts_q, 
                               taxa_q, 
                               metadata_q,
                               taxa_level = "Genus")

# 2. modify rel_abund
rel_abund_pool <- rel_abund %>%
    # pool taxa so that only the 12 most abundant taxa are displayed 
    pool_taxa(n_taxa = 12, keep_metadata = TRUE) 

# 3. plot rel_abund
rel_abund_pool %>%
    bar_plot()  + 
    # facet samples by body site. free_x removes unwanted white space
    facet_wrap(~body_site, scales = "free_x" ) 

bubbler::bar_plots can also be filled in. Based on the above plot, subject proportions on the right palm are vastly different, and this information would be lost in a filled-in bar plot. By filling the plotting area, we get a better representation of the between-sample composition, at the cost of obfuscating the between-sample proportions.

rel_abund_pool %>%
    bar_plot(position = "fill")  + 
    facet_wrap(~body_site, scales = "free_x") 

One last thing about relative abundances: when setting position to “fill”, we are effectively recomputing rel_abund, so that it sums to one within each bar on the x-axis. This transformation is equivalent to the following formula:

Grouped relative abundance=xijj=1kxij=k levels\text{Grouped relative abundance} = \frac{x_{ij}}{\sum_{j=1}^{k} x_{ij}} = k \text{ levels}

Where kk is the levels of the grouping variable (in this case sample_id) and i_i indexs through the ASVs and j_j indexs through the levels, Within each level of the grouping variable, the sum of relative abundance equals one, and the total sum of rel_abund is equal to the kk levels. This information is not very important in practice because position = “fill” scales the relative abundances for you.

Visualizing variables

bubbler allows you to focus on any metadata variable in your relative abundance table. Say you wanted to look at relative abundance across body_site, rather than sample_id on the x-axis. This can be done by modifying the x_var argument of bar_plot.

rel_abund_qiime(counts_q, taxa_q, metadata_q, taxa_level = "Genus") %>%
    pool_taxa(n_taxa = 12,
              keep_metadata = TRUE) %>%
    arrange_taxa() %>%
    bar_plot(x_var = "body_site")

For any given x_var, (sample_id, body_site, etc ), you have two options for computing the scaled rel_abund: Setting the var argument within the rel_abund function, or setting position = “fill” within bubbler::bar_plot. I prefer the latter method, which keeps the proportional differences.

# compute in rel_abund
rel_abund_qiime(counts_q, taxa_q, metadata_q,
                taxa_level = "Genus",
                var = "body_site") %>% #  scaling by body_site
    pool_taxa(n_taxa = 12,
              keep_metadata = TRUE) %>%
    arrange_taxa() %>%
    bar_plot(x_var = "body_site") # setting body_site as x_var
# scale in plot
rel_abund_qiime(counts_q, taxa_q, metadata_q,
                taxa_level = "Genus") %>%
    pool_taxa(n_taxa = 12,
              keep_metadata = TRUE) %>%
    arrange_taxa() %>%
    bar_plot(position = "fill", x_var = "body_site")  # setting position as fill

Showing both scaled and unscaled information.

If you want the best of both worlds, you can create a scaled bar plot, and superimpose the proportional abundances. Currently, this feature does not work with a faceting functions, so the only option would be to split a rel_abund table into multiple plots.

# order by sample read abundance, plot as line
library(patchwork)

counts_q <- system.file("extdata", "qiime", "table-dada2.qza", package = "bubbler")
taxa_q <- system.file("extdata", "qiime", "taxonomy.qza", package = "bubbler")
metadata_q <- system.file("extdata", "qiime", "sample-metadata.tsv", package = "bubbler")

q <- rel_abund_qiime(
    asv_qiime = counts_q,
    taxa_qiime = taxa_q,
    metadata_qiime = metadata_q,
    taxa_level = "Genus", ) %>% 
    pool_taxa(n_taxa = 8, keep_metadata = TRUE) %>%
    arrange_taxa() %>%
    arrange_var(levels = "body_site")
    # arrange_var_abund(flip = TRUE)

yes <- subset_rel_abund(q, var = "reported_antibiotic_usage", selection = "Yes")
no <- subset_rel_abund(q, var = "reported_antibiotic_usage", selection = "No")

p1 <- bar_plot(yes, position = "fill", true_line = TRUE ) + ggtitle("Antibiotics used") + 
geom_text(aes(label = body_site, y = 0.9), 
          angle = 90, color = "white", size = 3)
p2 <- bar_plot(no, position = "fill" , true_line = TRUE) + ggtitle("Antibiotics not used") + 
geom_text(aes(label = body_site, y = 0.9), 
          angle = 90, color = "white", size = 3)
 p1 + p2 + plot_layout(guides = "collect", axes = "collect")