Real-time file import with the vroom package

Abstract

File import in R could be considered a solved problem, with multiple widely used packages (data.table, readr, and others) providing fast, robust import of common formats in addition to the functions available in base R. However, I feel there is still room for improvement in existing approaches. vroom is able to index and then query multi-gigabyte files, including those with categorical, text and temporal data, in near real-time. This is a huge boon for interactive data analysis, as you can jump directly into exploratory analysis without sampling or long waits for full import. vroom leverages the ALTREP framework introduced in R 3.5, along with lazy, just-in-time parsing of the data, to provide this improved latency without requiring changes to existing data manipulation code. I will thoroughly explain the techniques used in vroom to ensure good performance, describe challenges overcome in implementing it, and provide an interactive demonstration of its capabilities.
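The "index first, parse later" idea can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not vroom's implementation: vroom itself is written in C++ and exposes its lazy columns to R through ALTREP, and the names `DelimitedIndex`, `LazyColumn` and `read_lazy` are hypothetical. The sketch assumes comma-separated UTF-8 text with `\n` line endings, a header row and no quoted fields. Construction makes one pass that records only the byte offsets of each field; a value is converted the first time it is accessed and then cached.

```python
from typing import Callable, Dict, List, Tuple


class DelimitedIndex:
    """Record the (start, end) byte offsets of every field without converting values.

    Assumption: no quoted fields, so ',' and '\\n' never occur inside a value.
    """

    def __init__(self, data: bytes, delim: int = ord(",")) -> None:
        self.data = data
        self.rows: List[List[Tuple[int, int]]] = []
        row: List[Tuple[int, int]] = []
        start = 0
        for pos, byte in enumerate(data):  # iterating bytes yields ints
            if byte == delim:
                row.append((start, pos))
                start = pos + 1
            elif byte == ord("\n"):
                row.append((start, pos))
                self.rows.append(row)
                row = []
                start = pos + 1
        # Keep a final line that has no trailing newline.
        if start < len(data) or row:
            row.append((start, len(data)))
            self.rows.append(row)

    def field(self, row: int, col: int) -> str:
        """Slice and decode a single field; no type conversion happens here."""
        start, end = self.rows[row][col]
        return self.data[start:end].decode("utf-8")


class LazyColumn:
    """A column whose values are parsed only when accessed, then cached."""

    def __init__(self, index: DelimitedIndex, col: int,
                 parse: Callable[[str], object]) -> None:
        self.index = index
        self.col = col
        self.parse = parse
        self.cache: Dict[int, object] = {}
        self.parse_count = 0  # number of values actually converted so far

    def __len__(self) -> int:
        return len(self.index.rows) - 1  # exclude the header row

    def __getitem__(self, i: int) -> object:
        if not 0 <= i < len(self):
            raise IndexError(i)  # also ends iteration via the sequence protocol
        if i not in self.cache:
            self.cache[i] = self.parse(self.index.field(i + 1, self.col))
            self.parse_count += 1
        return self.cache[i]


def read_lazy(data: bytes,
              parsers: Dict[str, Callable[[str], object]]) -> Dict[str, LazyColumn]:
    """Index the data once and return lazy columns keyed by header name."""
    index = DelimitedIndex(data)
    names = [index.field(0, c) for c in range(len(index.rows[0]))]
    return {name: LazyColumn(index, c, parsers.get(name, str))
            for c, name in enumerate(names)}


if __name__ == "__main__":
    csv = b"id,price,label\n1,9.5,a\n2,3.25,b\n3,7.0,c\n"
    cols = read_lazy(csv, {"id": int, "price": float})
    print(cols["price"][1])           # only this one value is converted
    print(cols["price"].parse_count)  # 1
    print(cols["id"].parse_count)     # 0: the id column was never touched
```

In this sketch, indexing still scans every byte, but it skips converting each field into a number or string object; that cost is paid only for the values an analysis actually reads, which is why a lazy importer can hand control back to the user quickly.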

Date
Location
Cleveland, OH
Jim Hester
Software Engineer

I’m a Senior Software Engineer at Netflix and an R package developer.