My interview with Hadley Wickham, on the best books for data scientists learning computer science

Today I’m very happy to publish my Five Books interview with Hadley Wickham, one of the most respected programmers in the world of data science and R. Hadley recommended five of the best computer science books for data scientists, which gave us an opportunity to discuss programming languages, writing style, and the future of the tidyverse. The interview can be found at

Data science is often said to be built on three pillars: domain expertise, statistics, and programming. Hadley Wickham, Chief Scientist at RStudio and creator of many packages for the R programming language, chooses the best books to help aspiring data scientists build solid computer science fundamentals.

Read More

How to rank the 32 teams in the 2018 FIFA World Cup with R and the 'elo' package

The 2018 World Cup is upon us! If you’re tempted to do a little betting, or you’re taking part in a forecasting competition with friends or colleagues, read on. In this tutorial, we’ll learn how to use R and the ‘elo’ package to create Elo rankings for the 32 teams in the tournament, and how to use those rankings to predict the results of football matches.
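
To give a flavour of what an Elo ranking involves, here is a minimal base R sketch of the standard Elo formulas. It is a hand-rolled illustration, not the API of the ‘elo’ package; the function names `elo_expected` and `elo_update`, the ratings, and the value `k = 20` are all assumptions made for the example.

```r
# Expected score of team A against team B under the standard Elo model.
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Update both ratings after one match.
# score_a: 1 if A wins, 0.5 for a draw, 0 if A loses.
# k: how fast ratings move; 20 is an arbitrary illustrative choice.
elo_update <- function(rating_a, rating_b, score_a, k = 20) {
  delta <- k * (score_a - elo_expected(rating_a, rating_b))
  c(a = rating_a + delta, b = rating_b - delta)
}

# Two hypothetical teams, not real World Cup ratings.
elo_expected(1600, 1500)             # about 0.64
elo_update(1500, 1500, score_a = 1)  # a = 1510, b = 1490
```

The expected score doubles as a rough win probability, which is what makes the ratings usable for forecasting.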

Read More

How to use R to identify variable types before importing a file to MySQL

I recently had to import a lot of CSV files into a MySQL database. Given that I didn’t know the data and the format of the files very well, I wrote this short R script. It prints a data.frame that indicates, for each variable in the file:

  • Its data type in your R data.frame;
  • Some more information about the data (range for integers and dates, maximum decimal places for floats, maximum length for strings);
  • The corresponding data type in MySQL 8.0;
  • Whether the column includes missing values.
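
The idea behind the list above can be sketched in a few lines of base R. This is not the script from the post: `describe_columns` is a hypothetical helper, and the mapping from R classes to MySQL types is a simplified assumption that you would want to adapt to your own data.

```r
# Hypothetical helper: summarise each column of a data.frame before a MySQL import.
describe_columns <- function(df) {
  # Simplified, assumed mapping from R classes to MySQL column types.
  mysql_type <- function(x) {
    if (is.logical(x)) return("TINYINT(1)")
    if (is.integer(x)) return("INT")
    if (is.numeric(x)) return("DOUBLE")
    if (inherits(x, "Date")) return("DATE")
    if (inherits(x, "POSIXct")) return("DATETIME")
    # Strings and factors: size the VARCHAR to the longest observed value.
    n <- max(nchar(as.character(x)), 1, na.rm = TRUE)
    paste0("VARCHAR(", n, ")")
  }
  data.frame(
    variable    = names(df),
    r_type      = vapply(df, function(x) class(x)[1], character(1)),
    mysql_type  = vapply(df, mysql_type, character(1)),
    has_missing = vapply(df, anyNA, logical(1)),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data.
example <- data.frame(
  id    = 1:3,
  price = c(1.5, NA, 2),
  name  = c("a", "bcd", "ef"),
  stringsAsFactors = FALSE
)
describe_columns(example)
```
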
Read More

Digging deeper: online resources for intermediate to advanced R users

Anybody wanting to learn R from scratch in 2018 will find an incredible wealth of tutorials, interactive learning websites, and high-quality videos at their disposal—almost to a point where it’s difficult to know where to start! This is of course a good thing, and is mainly due to R’s quickly growing popularity, with a constant stream of new users from both industry and academia wanting to learn the fundamentals.

But I’ve found that once you reach a certain level of confidence with the language, it becomes more difficult to find material for intermediate and advanced users who wish to become really good at R programming. These materials do exist; they just tend to be mentioned and highlighted less frequently by the community.

Hence this post, in which I’ve gathered a variety of books, courses and resources that should benefit you if you no longer need another tidyverse tutorial but want advanced insights from seasoned R programmers.

Read More

A web scraping tutorial using rvest on

The purpose of this tutorial is to show a concrete example of how web scraping can be used to build a dataset purely from an external, non-preformatted source of data.

Our example will be the Fivebooks website, which I’ve been using for many years to find book recommendations. As explained on the website itself, Fivebooks asks experts to recommend the five best books in their subject and explain their selection in an interview. Their archive consists of more than one thousand interviews (i.e. five thousand book recommendations), and they add two new interviews every week.

Our objective will be to use R, and in particular the rvest package, to gather the entire list of books recommended on Fivebooks, and see which ones are the most popular.
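
As a rough sketch of that workflow, the snippet below parses a small inline HTML fragment with rvest and counts how often each title appears. The markup, the `.book-title` CSS class and the book names are invented for illustration; the real site's structure will differ, and a real scrape would pass a URL to `read_html()` instead of a string.

```r
library(rvest)

# Hypothetical markup standing in for a scraped page.
page <- read_html('
  <html><body>
    <div class="interview">
      <h2 class="book-title">Book A</h2>
      <h2 class="book-title">Book B</h2>
    </div>
    <div class="interview">
      <h2 class="book-title">Book A</h2>
    </div>
  </body></html>
')

# Select every node matching the (assumed) CSS class and extract its text.
titles <- html_text2(html_elements(page, ".book-title"))

# Count recommendations per title, most popular first.
counts <- sort(table(titles), decreasing = TRUE)
counts
```
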

Read More

How to connect R to an Ingres database

If your main data is stored in an SQL database, creating a connection to query this database directly from R can save you hours of tedious data exports. The process is usually straightforward, but I recently had to set up a connection to Ingres. Unfortunately, a simple Google query wasn’t quite enough to find good documentation, since Ingres isn’t as common as other relational database management systems these days.

Read More

Efficient file input, output and storage in R

Whether used in academia, industry or journalism, working with R involves importing and exporting a lot of data. While the basic functions to read and write files are known to all users, different methods have been developed over the years to optimise this process.

In this article, we’ll have a look at the most efficient ways to read and write permanent files (i.e. in plain-text formats such as CSV), and to save and load binary files, a solution often overlooked by R users but much better suited to regular analysis of a given dataset.
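
To make the distinction concrete, here is a minimal base R sketch that writes the same small data.frame as a CSV file and as a binary RDS file, then reads both back. The dataset is made up, and this only illustrates the plain-text versus binary trade-off rather than the specific methods covered in the article.

```r
# Small illustrative dataset (made-up values).
df <- data.frame(
  id    = 1:3,
  day   = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03")),
  value = c(0.5, 1.25, 2)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Plain text: portable, but column classes are re-guessed on import.
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Binary: R-specific, but the object comes back unchanged.
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

inherits(from_csv$day, "Date")  # FALSE: read.csv does not restore the Date class
identical(df, from_rds)         # TRUE
```

The binary round trip preserves column classes, which is one reason it suits repeated analysis of the same dataset.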

Read More

A graph for monitoring parliamentary groups at the Assemblée nationale

The graph below is updated automatically to track internal dissent within the parliamentary groups of the Assemblée nationale, by collecting and analysing every vote cast against each group's official line in public ballots. The aggregated data make it possible to monitor the overall trend over time, as well as the emergence of “frondeurs” (rebels) within each parliamentary group.


Read More