Photo by Thomas Jensen on Unsplash

Database Best Practices

Abstract

Getting data into and out of databases is one of the most fundamental parts of data science. Much of the world’s data is stored in databases of various forms, including traditional relational databases such as MySQL, PostgreSQL, SQL Server and Oracle, as well as newer data platforms such as BigQuery, Redshift and Spark. Modern methods using the odbc, pool and DBI packages make connecting to, querying and retrieving data from these diverse sources concise, uniform and performant. The odbc package provides a DBI interface to ODBC drivers, allowing fast access to a wide variety of ODBC-compatible databases. The pool package provides a fault-tolerant connection pool, which is particularly useful in Shiny applications and web services. DBI allows interaction with any of these sources through a consistent interface. We will discuss the most effective ways to use these packages in data analysis, Shiny applications and production web services. This talk was a preview of my talk of the same name at rstudio::conf(2017L).
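As a rough illustration of the consistent interface described above, the following is a minimal sketch rather than material from the talk itself. It assumes the DBI, RSQLite and pool packages are installed, and it uses an in-memory SQLite database as a stand-in for a real server so that it runs without any ODBC driver. The data source name `my_dsn` in the commented-out line is hypothetical.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real database server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# With an ODBC driver and a configured data source, the connection could
# instead be opened through odbc ("my_dsn" is a hypothetical DSN):
# con <- dbConnect(odbc::odbc(), dsn = "my_dsn")

# Copy a built-in data frame into the database as a table.
dbWriteTable(con, "mtcars", mtcars)

# Run a parameterized query; the same call works for any DBI backend,
# although placeholder syntax can differ between backends.
res <- dbGetQuery(
  con,
  "SELECT cyl, COUNT(*) AS n FROM mtcars WHERE mpg > ? GROUP BY cyl ORDER BY cyl",
  params = list(20)
)
print(res)

dbDisconnect(con)

# A connection pool is created much like a single connection and can be
# passed to DBI functions directly; pool checks connections out and back in.
pool <- pool::dbPool(RSQLite::SQLite(), dbname = ":memory:")
one <- dbGetQuery(pool, "SELECT 1 AS x")
pool::poolClose(pool)
```

In a Shiny application, the pool would typically be created once at startup and closed when the application stops, instead of opening a new connection for every query.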

Date
Location
Cleveland, OH
Jim Hester
Software Engineer

I’m a Senior Software Engineer at Netflix and an R package developer.