How to Sort Data in R

How to Sort Data in R – Sort and Arrange Data Frames Using Base R

Overview:
Sorting is one of the first operations you will use when exploring data: it helps you inspect extremes, find top performers, and prepare tables for reporting. In base R, sort(), order(), sort.list() and rank() handle sorting of vectors and data frames without loading any extra packages. sort() works on a vector and returns the sorted values. For a data frame you typically call order() to get row indices and then use them to reorder the rows (for example, df[order(df$col), ]). Multi-column sorting (primary and secondary keys) works by passing several vectors to order() in priority order. For descending order, use decreasing = TRUE with sort() or order(); when you need to mix ascending and descending keys, negate a numeric key inside order() (e.g., -df$col). rank() is useful when you want to add a rank column instead of reordering rows. This lesson uses the built-in mtcars dataset (available in every standard R session) and demonstrates single-column sorts, multi-column sorts, descending-order tricks, ranking, NA handling (with simulated missing values), and verification steps, using examples you can run immediately.
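To see how the three core functions differ before turning to mtcars, here is a small sketch on a made-up three-element vector (x is purely illustrative data):

```r
# A tiny illustrative vector (hypothetical data, not from mtcars)
x <- c(30, 10, 20)

sort(x)    # the values themselves, ascending: 10 20 30
order(x)   # positions that would sort x: 2 3 1
rank(x)    # where each element lands, in its original place: 3 1 2

# Indexing x with order(x) rebuilds the sorted vector; the same idea,
# applied to rows, is how order() reorders a data frame
x[order(x)]
```

In short, order() answers "which element goes first?", while rank() answers "where does each element end up?".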

Dataset used: mtcars (built-in)

We use R’s built-in mtcars dataset so you can run examples without preparing data. mtcars contains motor vehicle data with numeric columns such as mpg (miles per gallon), cyl (cylinders), hp (horsepower), and wt (weight). Row names are car models — useful for showing ordered results with labels.

R code: Quick preview of mtcars

# Preview built-in dataset
head(mtcars)
dim(mtcars)   # 32 rows, 11 columns
rownames(mtcars)[1:6]  # car model names (rownames are useful for display)
  

Example 1 — Sort a numeric vector (use sort())

# Sort mpg vector (ascending and descending)
sorted_mpg_asc  <- sort(mtcars$mpg)
sorted_mpg_desc <- sort(mtcars$mpg, decreasing = TRUE)

# Show a few values
sorted_mpg_asc[1:6]
sorted_mpg_desc[1:6]
  

Explanation: sort() returns an ordered vector of values. It doesn’t change the data frame rows. Use sort() when you only need the ordered values (for reporting or simple checks).
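sort() keeps the names of a named vector, so one way to get labelled results without reordering the data frame is to name the values first. A minimal sketch; mpg_named and top_mpg are illustrative variable names:

```r
# Attach car model names to the mpg values (mpg_named is just a local name)
mpg_named <- setNames(mtcars$mpg, rownames(mtcars))

# sort() reorders the values and carries their names along
top_mpg <- head(sort(mpg_named, decreasing = TRUE), 3)
top_mpg
```

Toyota Corolla (33.9 mpg) comes first and Fiat 128 (32.4) second; Honda Civic and Lotus Europa tie at 30.4 mpg, so only one of them appears in the top 3.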

Example 2 — Reorder rows by one column (use order())

# Order rows by mpg ascending (lowest mpg first)
ord_mpg_asc <- order(mtcars$mpg)
mtcars_mpg_asc <- mtcars[ord_mpg_asc, ]
head(mtcars_mpg_asc, 6)

# Order rows by mpg descending (highest mpg first)
mtcars_mpg_desc <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]
head(mtcars_mpg_desc, 6)
  

Explanation: order() returns the row indices that put the key in ascending order; subsetting the data frame with those indices reorders its rows. decreasing = TRUE reverses the ordering; if you pass several keys, a single TRUE reverses all of them.
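To check that order() did what you expect, compare the reordered column with sort() and test it with is.unsorted(), which returns FALSE when a vector is already in non-decreasing order. A short sketch (idx and by_mpg are illustrative names):

```r
# Row indices that sort mtcars by mpg, lowest first
idx <- order(mtcars$mpg)
by_mpg <- mtcars[idx, ]

# Indexing the column with the same indices gives the same values as sort()
identical(by_mpg$mpg, sort(mtcars$mpg))

# FALSE here means the column is in ascending order
is.unsorted(by_mpg$mpg)

# Reordering keeps every row (and every row name)
nrow(by_mpg) == nrow(mtcars)
```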

Example 3 — Multi-column sorting (primary and secondary keys)

# Sort by cylinders (cyl ascending) then within each cyl by mpg descending
# Trick: negate mpg for descending behavior while keeping cyl ascending
mtcars_multi <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]
head(mtcars_multi, 12)

# Alternative: one decreasing flag per key (order() accepts this only with method = "radix")
mtcars_multi2 <- mtcars[order(mtcars$cyl, mtcars$mpg, decreasing = c(FALSE, TRUE), method = "radix"), ]
head(mtcars_multi2, 12)
  

Explanation: For multiple keys, pass each key to order() in priority order: ties in the first key are broken by the second, and so on. Negating a numeric key (-mtcars$mpg) is the shortest way to make a secondary key descending while the primary key stays ascending. order() also accepts one decreasing flag per key, but only with method = "radix"; note that this method always compares character keys in the "C" locale, which can differ from your usual alphabetical order.
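Negation only works for numeric keys. As a sketch, the block below first checks the cyl-ascending, mpg-descending result inside each cyl group, then shows -xtfrm() as one way to get a descending character key: xtfrm() turns a vector into numbers that sort the same way, so the result can be negated. The helper is_nonincreasing and the fruit vector are hypothetical, defined only for this illustration.

```r
# Same sort as mtcars_multi: cyl ascending, mpg descending within each cyl
res <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# Hypothetical helper: TRUE when a vector never increases
is_nonincreasing <- function(v) !is.unsorted(rev(v))

# cyl should be ascending overall, mpg non-increasing inside each cyl group
!is.unsorted(res$cyl)
all(tapply(res$mpg, res$cyl, is_nonincreasing))

# Descending character key: xtfrm() gives sortable numbers, which we negate
fruit <- c("b", "a", "c")
order(-xtfrm(fruit))          # 3 1 2
fruit[order(-xtfrm(fruit))]   # "c" "b" "a"
```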

Example 4 — Create rank columns using rank()

# Create a rank column for mpg (highest mpg -> rank 1)
mtcars$mpg_rank <- rank(-mtcars$mpg, ties.method = "first")
head(mtcars[order(mtcars$mpg_rank), c("mpg","mpg_rank")], 6)
  

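Explanation: rank() returns each element's position in sorted order without moving anything. Negating the vector makes the largest value rank 1. The ties.method argument controls how ties are handled: the default "average" gives tied values the mean of their positions, while "first" breaks ties by original position, so every rank is unique.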
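The effect of ties.method is easiest to see on a short vector with a tie. A sketch on made-up values (x is hypothetical data):

```r
# Hypothetical scores with a tie at 20
x <- c(10, 20, 20, 30)

rank(x)                          # default "average": 1.0 2.5 2.5 4.0
rank(x, ties.method = "min")     # tied values share the lower rank: 1 2 2 4
rank(x, ties.method = "first")   # ties broken by position: 1 2 3 4

# Negate to rank from the top, as in the mpg_rank example
rank(-x, ties.method = "min")    # 4 2 2 1
```

"min" matches the usual "competition" ranking (1, 2, 2, 4), while "first" is handy when you need a unique rank per row, as in the mpg_rank column above.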

Example 5 — Use sort.list() (efficient ordering) and show row labels

# sort.list returns an ordering similar to order()
ord2 <- sort.list(mtcars$hp)
mtcars_by_hp <- mtcars[ord2, ]
# Show car names and hp for the first rows
head(data.frame(Car = rownames(mtcars_by_hp), hp = mtcars_by_hp$hp), 6)
  

Explanation: sort.list() is the single-key counterpart of order(): it accepts exactly one vector and returns the row indices that sort it. Reordering a data frame keeps its row names, so rownames() lets you keep the car model labels visible after sorting.
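Two small checks, as a sketch: sort.list() on one key gives the same indices as order() on that key, and subsetting with drop = FALSE keeps the row names as labels without building a new data frame (ord_sl and ord_o are illustrative names):

```r
ord_sl <- sort.list(mtcars$hp)
ord_o  <- order(mtcars$hp)
identical(ord_sl, ord_o)   # same indices for a single numeric key

# drop = FALSE keeps a one-column data frame, so car names stay as row names
head(mtcars[ord_sl, "hp", drop = FALSE], 6)
```

Honda Civic (52 hp) appears first. Use order() whenever you need more than one key.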

Example 6 — Handling NAs when sorting (simulate NAs)

# mtcars has no NAs by default; simulate some
tmp <- mtcars
tmp$mpg[c(3,8)] <- NA

# order() by mpg will put NAs at the end by default
tmp_sorted <- tmp[order(tmp$mpg), ]
tail(tmp_sorted[, "mpg", drop = FALSE], 3)  # the two NA rows come last
# Remove NAs before sorting if you need only complete rows
tmp_no_na <- tmp[!is.na(tmp$mpg), ]
tmp_no_na_sorted <- tmp_no_na[order(tmp_no_na$mpg), ]
  

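Explanation: order() places NA values at the end by default (na.last = TRUE); use na.last = FALSE to put them first. sort() behaves differently: by default it drops NAs from the result. To exclude incomplete rows, filter them out first with !is.na() (or complete.cases() when several columns may contain NAs).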
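The na.last argument controls where missing values go. A sketch on a made-up vector, plus complete.cases() on a copy of two mtcars columns (x and tmp2 are hypothetical names):

```r
# Small hypothetical vector with one missing value
x <- c(3, NA, 1, 2)

sort(x)                     # NA dropped (sort's default): 1 2 3
sort(x, na.last = TRUE)     # NA kept and placed last: 1 2 3 NA
order(x)                    # NA's position last: 3 4 1 2
order(x, na.last = FALSE)   # NA's position first: 2 3 4 1
order(x, na.last = NA)      # NA's position removed: 3 4 1

# complete.cases() keeps only rows with no NA in any column
tmp2 <- mtcars[, c("mpg", "hp")]
tmp2$hp[c(2, 5)] <- NA
tmp2_complete <- tmp2[complete.cases(tmp2), ]
nrow(tmp2_complete)         # 30
```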

Verification tips

  • Always inspect results with head() or tail() after sorting.
  • Use rownames() to preserve/inspect row labels (e.g., car models).
  • For character sorting, consider tolower() if you need case-insensitive order.
  • When mixing ascending and descending keys, negating a numeric key is a safe base-R trick; for a character key, negate xtfrm(key) instead.
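Two of these tips can be sketched in code: case-insensitive ordering with tolower(), and a quick check that a sorted result kept all its rows and is really in order. check_sorted and fruits are hypothetical names introduced for this illustration.

```r
# Case-insensitive ordering: sort on a lowercased copy, return the original strings
fruits <- c("banana", "Apple", "cherry")
fruits[order(tolower(fruits))]   # "Apple" "banana" "cherry"

# Hypothetical helper: TRUE when sorted_df has as many rows as original_df
# and its key column is in non-decreasing order
check_sorted <- function(sorted_df, original_df, key) {
  nrow(sorted_df) == nrow(original_df) && !is.unsorted(sorted_df[[key]])
}

check_sorted(mtcars[order(mtcars$wt), ], mtcars, "wt")   # TRUE
check_sorted(mtcars, mtcars, "wt")                       # FALSE (original order)
```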

Practice Exercises (Self-assessment)

  1. Using mtcars, sort the rows by mpg descending and show the top 5 cars (code + output).
  2. Sort by cyl ascending and within each cyl by hp descending; show the first 10 rows (code + output).
  3. Create a column revenue_like = mpg * wt and then add a rank column rev_rank where highest value gets rank 1; show the 6 highest-ranked rows (code + output).
  4. Simulate NA in hp for two rows, then sort by hp with NAs at the end, and then show how to exclude NAs and sort remaining rows (code + output).
  5. Explain in 2–3 sentences when you would use order() vs. when sort() is sufficient.

Answer Format (How to present answers)

## Exercise #n — Short title
# R code
...R code here...

# Output (printed):
...expected printed output (e.g., head(...), print(...))...

# Short explanation (2-4 sentences)
Explanation...
  

Example Solutions (Compact)

# Ex 1: Top 5 by mpg descending
mtcars[order(mtcars$mpg, decreasing = TRUE), ][1:5, ]

# Ex 2: cyl asc, hp desc (negation trick)
mtcars[order(mtcars$cyl, -mtcars$hp), ][1:10, ]

# Ex 3: Create revenue_like and rank (using mpg and wt)
mtcars$revenue_like <- mtcars$mpg * mtcars$wt
mtcars$rev_rank <- rank(-mtcars$revenue_like, ties.method = "first")
mtcars[order(mtcars$rev_rank), ][1:6, c("revenue_like","rev_rank")]

# Ex 4: Simulate NA in hp and sort
tmp <- mtcars
tmp$hp[c(2,5)] <- NA
# NAs at end by default
tmp_sorted <- tmp[order(tmp$hp), ]
tail(tmp_sorted[, "hp", drop = FALSE], 3)  # the two NA rows come last
# Exclude NAs then sort
tmp_no_na <- tmp[!is.na(tmp$hp), ]
tmp_no_na[order(tmp_no_na$hp), ]

# Ex 5: When to use order() vs sort()
# Short answer: Use sort() when you only need an ordered vector. Use order() when you need row indices to reorder a data frame by one or more keys.
  

Closing notes

Mastering sort(), order(), and rank() gives you full control over how data is presented and prepared. Practice the examples on mtcars until reordering rows, multi-key sorts, descending strategies, and NA handling feel intuitive. These base-R techniques are broadly applicable and form a solid foundation before exploring tidyverse alternatives.
