4 data wrangling tasks in R for advanced beginners

Learn how to add columns, get summaries, sort your results and reshape your data.

1 2 3 4 5 6 7 8 Page 7
Page 7 of 8

This may be more apparent in the world of scientific experimentation. If you're testing a new cholesterol drug, for example, the categories you set up in advance might look at patients by age, gender and whether they're given the drug or a placebo. The measurements (or calculations resulting from those measurements) are your results: Changes in overall cholesterol level, LDL and HDL, for example. But whatever your data, you should have at least one category and one measurement if you want to create a long data frame.

In the example data we've been using here, my categories are fy and company, while my measurements are revenue, profit and margin.

And now here's the next concept you need to understand about reshaping from wide to long: Because you want only one measurement in each row, you'll need to add a column that says which type of measurement each value is if you have more than one category.

In my existing wide format, the column headers tell me the measurement type: revenue, profit or margin. But since I'm rearranging this to only have one of those numbers in each row, not three, I'll add a column to show which measurement it is.

I think an example will make this a lot clearer. Here's one "wide" row:

fy company revenue profit margin
2010 Apple 65225 14013 21.48409

And here's how to have only one measurement per row — by creating three "long" rows:

fy company financialCategory value
2010 Apple revenue 65225
2010 Apple profit 14013
2010 Apple margin 21.5

The column financialCategory now tells me what type of measurement each value is. And now, the term "value" should make more sense.

At last we're ready for some code to reshape a data frame from wide to long! As with pretty much everything in R, there are multiple ways to perform this task. In the tidyverse, you'll want to install the package tidyr.

As of early 2019, tidyr introduced two new functions: pivot_longer() and pivot_wider(). They're somewhat more intuitive to use than the package's earlier reshaping functions, gather() and spread() (which still work, but will no longer be recommended or maintained).

To use the new functions, install the development version of tidyr from GitHub version with remotes::install_github("tidyverse/tidyr") or devtools::install_github("tidyverse/tidyr"), and load it with library(tidyr). That should overwrite any older tidyr version which may be on your system. .

Use tidyr's pivot_longer() function to go from wide to long(er). pivot_longer() uses the following format to assign results to a variable named longData:

longData <- pivot_longer(df = your original data frame, cols = c(vector of names of columns you want to move into a single column), names_to = "new_category_column_name", values_to = "new_value_column_name")

Note that the names of the columns you're moving from separate columns into a single column do not need to be in quotation marks, but the names of your new category and value columns do.

Using tidyr, wide-to-long code can simply be:

companiesLong <- pivot_longer(companiesData, cols = c(revenue, profit, margin), names_to = "variable", values_to = "value")

This produces:

  fy company variable value
1 2010 Apple revenue 65225.0
2 2011 Apple revenue 108249.0
3 2012 Apple revenue 156508.0
4 2010 Google revenue 29321.0
5 2011 Google revenue 37905.0
6 2012 Google revenue 50175.0
7 2010 Microsoft revenue 62484.0
8 2011 Microsoft revenue 69943.0
9 2012 Microsoft revenue 73723.0
10 2010 Apple profit 14013.0
11 2011 Apple profit 25922.0
12 2012 Apple profit 41733.0
13 2010 Google profit 8505.0
14 2011 Google profit 9737.0
15 2012 Google profit 10737.0
16 2010 Microsoft profit 18760.0
17 2011 Microsoft profit 23150.0
18 2012 Microsoft profit 16978.0
19 2010 Apple margin 21.5
20 2011 Apple margin 23.9
21 2012 Apple margin 26.7
22 2010 Google margin 29.0
23 2011 Google margin 25.7
24 2012 Google margin 21.4
25 2010 Microsoft margin 30.0
26 2011 Microsoft margin 33.1
27 2012 Microsoft margin 23.0
1 2 3 4 5 6 7 8 Page 7
Page 7 of 8
7 inconvenient truths about the hybrid work trend
Shop Tech Products at Amazon