R | Jim Hester

Stylish reports with Knitr and Bootstrap

I am pleased to announce knitr_bootstrap, my project for generating styled, dynamic HTML reports. Here is an example of a report. I have been using knitr to produce reports for around six months now. At first I used pandoc to generate the HTML, but lately I have been using the Rmarkdown R package instead. This has a couple of advantages over pandoc. First, it is much easier to install, as anyone who has tried to install Haskell on CentOS can attest.

Setting ggplot2 default color scales

ggplot2 is a very nice R plotting library; however, some people do not like its default color scales. You can set the color scale explicitly for each of your plots, but a better solution is simply to change the defaults.

    library(ggplot2)
    #default colors
    qplot(data=iris, Petal.Length, Petal.Width, color=Species, size=I(4))
    #change default without arguments
    scale_colour_discrete <- scale_colour_grey
    last_plot()
    #change default with arguments
    scale_colour_discrete <- function(.

Parsing gff files with Rcpp for use with ggbio

The ggbio package is a great tool for visualizing next-generation sequencing data. It combines the graphing power of the ggplot2 package with the biology-aware data structures in Bioconductor. The package supports plotting genes from the standard annotation databases supplied by Bioconductor, which works well for heavily studied organisms such as human and mouse. If you are interested in a less well-annotated organism, however, there is no prebuilt database to pull from.

Plotting manually fitted model predictions using ggplot

ggplot provides convenient smoothing functions for fitting models to data with the built-in geom_smooth and stat_smooth methods.

    library(ggplot2)
    (points <- ggplot(data=mtcars, aes(x=hp, y=mpg)) + geom_point())
    (points_smoothed <- points + geom_smooth(method="lm", se=FALSE))
    (one_facet <- points_smoothed + facet_wrap(~cyl))

When you are faceting data, either spatially or by color, linetype, or shape, doing the subsetting and model fitting manually can be somewhat daunting.

    (two_facet <- points_smoothed + facet_grid(cyl~gear))

However, once you understand the process and are familiar with the plyr package of functions, it is actually very straightforward.

One-liner for Perl dependencies

If your module is FooBar and you are using cpanminus, then

    cpanm `perl -MFooBar -e 'print join("\n", keys %INC),"\n"'`

will install all the dependencies needed for that module. This will not work, however, if you do not already have the modules needed to run the script installed. In that case, install the Devel::Modlist package, and it is as simple as

    cpanm `perl -MDevel::Modlist=stdout,noversion FooBar.pl`

Setting up a local cpan using cpanminus without root access

When colleagues are asked why they do not use Perl modules in their work, the response is often that they do not know how to install them without root access to the server they are working on. CPAN can be configured to install to a different base directory; however, this requires a number of options to be set correctly and can be a pain to set up. Using cpanminus and the local::lib module instead makes the process as painless as running three simple commands, easy enough for just about anyone to set up.
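A minimal sketch of what three such commands can look like, assuming curl, perl, network access, a shell that reads ~/.bashrc, and ~/perl5 as the install base. The commands follow the bootstrap pattern described in the cpanminus and local::lib documentation and are not necessarily the exact commands from the original post; the wrapper function name `setup_local_lib` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper wrapping three setup commands; the function name is
# an illustration only, not part of cpanminus or local::lib.
# Assumes curl, perl, network access, and a shell that reads ~/.bashrc.
setup_local_lib() {
  base="${1:-$HOME/perl5}"   # install base; ~/perl5 is local::lib's default

  # 1. Bootstrap cpanminus and local::lib into $base (no root needed).
  curl -L https://cpanmin.us | perl - -l "$base" App::cpanminus local::lib

  # 2. Set PERL5LIB, PATH, etc. to use $base in the current shell.
  eval "$(perl -I"$base/lib/perl5" -Mlocal::lib="$base")"

  # 3. Persist the same environment setup for future shells.
  printf 'eval "$(perl -I%s/lib/perl5 -Mlocal::lib=%s)"\n' "$base" "$base" >> "$HOME/.bashrc"
}
```

After running `setup_local_lib` once, `cpanm Some::Module` installs into ~/perl5 without root access. Wrapping the steps in a function keeps the sketch from touching the network or ~/.bashrc until you explicitly call it.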

On the fly bam to sam conversion using named pipes

In bioinformatics, the common format developed for storing short-read alignments is SAM, which has both an ASCII text form (SAM) and a binary representation (BAM). A C API exists for working with the binary format directly, along with bindings for most common programming languages. Heng Li, the author of the format and of the bwa short-read aligner, created the samtools program to work with SAM files and to convert between BAM and SAM, among many other tasks.
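A minimal sketch of the named-pipe idea: a background writer streams SAM text into a FIFO, and a reader that expects an ordinary file path opens the FIFO instead, so no intermediate SAM file is written. In real use the writer would be something like `samtools view -h reads.bam`; here `printf` emits a tiny stand-in SAM record so the sketch runs without samtools or a BAM file, and `grep` stands in for the downstream program. Both substitutions, the file names, and the sample record are assumptions for illustration.

```shell
#!/bin/sh
# Minimal sketch of on-the-fly conversion through a named pipe (FIFO).
set -eu

workdir=$(mktemp -d)
fifo="$workdir/reads.sam"
mkfifo "$fifo"

# Writer (background). In practice this would be something like:
#   samtools view -h reads.bam > "$fifo"
# printf emits one header line and one alignment record as a stand-in.
printf '@HD\tVN:1.6\nread1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > "$fifo" &

# Reader: any program that only accepts a file path can open the FIFO and
# receives SAM text as the writer produces it. Here grep counts the
# non-header (alignment) lines.
count=$(grep -vc '^@' "$fifo")

wait              # ensure the background writer has finished
rm -r "$workdir"  # remove the FIFO and its temporary directory
echo "alignment records read through the pipe: $count"
```

Because opening a FIFO blocks until both a reader and a writer are attached, the writer must run in the background (or in a separate process) before the reader starts.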