How to Create and Plot Data Frames in R

Understanding How to Create and Plot Data Frames in R

Creating and plotting data frames is one of the first skills to learn for data analysis in R. A data frame stores data as a table in which each column is a variable and each row is an observation, which makes it straightforward to plot one variable against another. R's base plotting functions are enough to explore trends in such data.

Creating and Visualizing Data in R Step by Step

To create a data frame in R, use the data.frame() function. For example, you can build a small dataset of student marks and draw one subject with plot(). With the plot open, lines() adds further series to it and legend() labels them.

Why Plotting Data Frames in R Matters

Plotting a data frame presents results visually, so patterns that are hard to spot in a table of numbers, such as one unusually low mark, become obvious at a glance. These steps also prepare beginners for more advanced visualization with packages like ggplot2.

Practice and Experiment with Base R Plotting

Experiment regularly to master how to create and plot data frames in R efficiently. Try modifying colors, labels, and plot types. With consistent practice, learners gain confidence and build strong visualization skills that support data science and analytics projects.


✅ Step 1: Create the data frame

# Create a data frame with Student Id and their subject marks
data <- data.frame(
  Student_Id = c(1,2,3,4,5,6),
  Mathematics = c(75,85,88,79,80,74),
  Physics = c(65,78,77,49,85,87),
  Chemistry = c(88,77,79,88,77,79)
)

# Print the data to see it in the console
print(data)

Explanation:

  • data.frame() creates a table-like structure (called a *data frame*) that holds data in columns and rows.
  • Student_Id, Mathematics, Physics, and Chemistry are the column names.
  • c() combines values into a vector. Each vector becomes one column of the data frame (a vector is not the same thing as an R list).
  • print(data) simply shows the data in your R console.
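
Before plotting, it can help to check that the data frame has the shape you expect. The sketch below rebuilds the same data so that it runs on its own, then uses the base R functions str(), dim(), and colMeans(). The variable name subject_means is only an illustration and is not part of the tutorial.

```r
# Rebuild the data frame from Step 1 so this example runs on its own
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics = c(65, 78, 77, 49, 85, 87),
  Chemistry = c(88, 77, 79, 88, 77, 79)
)

# Structure: one line per column with its type and first few values
str(data)

# Number of rows and columns
print(dim(data))

# Average mark per subject (drop the Student_Id column first)
subject_means <- colMeans(data[, -1])
print(subject_means)
```

If dim() does not report 6 rows and 4 columns, one of the c() vectors probably has a missing or extra value.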

✅ Step 2: Plot using base R (simple line plot)

# Plot Mathematics marks of all students
plot(data$Student_Id, data$Mathematics,
     type = "o",                     # "o" means both points and lines
     col = "blue",                   # Color of the line and points
     ylim = c(40, 100),              # Y-axis range wide enough for every subject
     xlab = "Student ID",            # Label for X-axis
     ylab = "Marks",                 # Label for Y-axis
     main = "Mathematics Marks of Students") # Title of the graph

Explanation:

  • plot() is the basic function for drawing graphs in R.
  • data$Student_Id takes the Student_Id column from the data frame.
  • type = "o" draws both points and lines.
  • col = "blue" sets the color of the line and points.
  • ylim = c(40, 100) fixes the y-axis range. Without it, plot() fits the axis to the Mathematics marks (74 to 88), and the lower Physics marks added in Step 3 would fall outside the plot.
  • xlab, ylab, and main add axis labels and a title.
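
The type argument accepts other values besides "o". As a rough sketch, the loop below draws the Mathematics marks four times in a 2-by-2 grid, once each with "p" (points only), "l" (lines only), "b" (points joined by lines with gaps), and "o" (points with lines drawn over them). The grid layout set with par(mfrow = ...) is an addition for side-by-side comparison and is not part of the tutorial's steps.

```r
# Only the columns needed for this comparison
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74)
)

plot_types <- c("p", "l", "b", "o")

# Split the plotting area into 2 rows x 2 columns, remembering the old setting
old_par <- par(mfrow = c(2, 2))

for (t in plot_types) {
  plot(data$Student_Id, data$Mathematics,
       type = t,
       col = "blue",
       xlab = "Student ID",
       ylab = "Marks",
       main = paste0('type = "', t, '"'))
}

# Restore the single-plot layout
par(old_par)
```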

✅ Step 3: Add more subjects to the same plot (for comparison)

# Add Physics marks in red
lines(data$Student_Id, data$Physics, 
      type = "o", 
      col = "red")

# Add Chemistry marks in green
lines(data$Student_Id, data$Chemistry, 
      type = "o", 
      col = "green")

# Add a legend to identify lines
legend("bottomright", 
       legend = c("Mathematics", "Physics", "Chemistry"),
       col = c("blue", "red", "green"),
       lty = 1, pch = 1)

Explanation:

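  • lines() adds new lines to the existing plot. It does not rescale the axes, so any value outside the current y-axis range is not drawn.
  • The col argument gives each line its own color.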
  • legend() adds a small box showing which color belongs to which subject.
  • "bottomright" places the legend at the bottom-right corner of the plot.
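
As an alternative to calling lines() once per subject, base R also provides matplot(), which draws every column of a matrix or data frame against a common x in one call and sizes the y-axis to fit all of them. This is a different technique from the one used in Step 3 and is shown only as a sketch for comparison. The names subjects and line_cols and the title text are illustrative.

```r
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics = c(65, 78, 77, 49, 85, 87),
  Chemistry = c(88, 77, 79, 88, 77, 79)
)

subjects <- c("Mathematics", "Physics", "Chemistry")
line_cols <- c("blue", "red", "green")

# One call draws all three subject columns against Student_Id
matplot(data$Student_Id, data[, subjects],
        type = "o", lty = 1, pch = 1,
        col = line_cols,
        xlab = "Student ID", ylab = "Marks",
        main = "Marks by Subject")

# The legend is built the same way as in Step 3
legend("bottomright", legend = subjects,
       col = line_cols, lty = 1, pch = 1)
```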

✅ Summary

Step   Function             Purpose
1      data.frame()         Create the dataset
2      plot()               Draw a basic line graph
3      lines() + legend()   Add more data and a legend
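
The three steps can also be run as one script that writes the finished plot to a file. The sketch below uses pdf() and dev.off() from base R for that. The output location (a file called student_marks.pdf in R's temporary directory), the title text, and the shared y-axis range computed with range() are assumptions made for this example, not part of the original steps.

```r
# Step 1: create the data frame
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics = c(65, 78, 77, 49, 85, 87),
  Chemistry = c(88, 77, 79, 88, 77, 79)
)

# Illustrative output path; change it to wherever you want the file
out_file <- file.path(tempdir(), "student_marks.pdf")

# Open a PDF device: everything drawn until dev.off() goes into the file
pdf(out_file, width = 7, height = 5)

# Step 2: base plot, with a y-axis range that covers all three subjects
plot(data$Student_Id, data$Mathematics,
     type = "o", col = "blue",
     ylim = range(data$Mathematics, data$Physics, data$Chemistry),
     xlab = "Student ID", ylab = "Marks",
     main = "Marks by Subject")

# Step 3: add the other subjects and a legend
lines(data$Student_Id, data$Physics, type = "o", col = "red")
lines(data$Student_Id, data$Chemistry, type = "o", col = "green")
legend("bottomright",
       legend = c("Mathematics", "Physics", "Chemistry"),
       col = c("blue", "red", "green"),
       lty = 1, pch = 1)

# Close the device so the file is written to disk
dev.off()
```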

Explanation of Each Part of the legend() Function in R

The legend() function adds a legend to an existing plot. Below is the call from Step 3, extended with three sizing arguments (cex, pt.cex, and y.intersp), followed by an explanation of each argument.

legend("bottomright",
       legend = c("Mathematics", "Physics", "Chemistry"),
       col = c("blue", "red", "green"),
       lty = 1, pch = 1,
       cex = 0.8,
       pt.cex = 0.8,
       y.intersp = 0.8
)

Argument-by-Argument Explanation

1. Position: "bottomright"

Defines where the legend appears in the plot. "bottomright" places it in the lower-right corner. The other position keywords are "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", and "center".

2. legend = c("Mathematics", "Physics", "Chemistry")

These are the text labels that appear inside the legend. Each label corresponds to a specific data series in the graph.

3. col = c("blue", "red", "green")

Assigns the color shown next to each label in the legend. The colors should match those used in the plot and be listed in the same order as the labels, otherwise the legend will mislabel the lines.

4. lty = 1

Defines the line type shown in the legend. 1 means a solid line. Other values include 2 for dashed, 3 for dotted, and more.

5. pch = 1

Sets the type of symbol (point marker) shown in the legend. 1 is a hollow circle. Examples: 16 for filled circle, 17 for triangle.

6. cex = 0.8

Controls the text size inside the legend. Smaller values reduce text size and help reduce the overall legend box size.

7. pt.cex = 0.8

Controls the size of the symbols in the legend. Lower values make the symbols smaller, which keeps them in proportion with reduced text. If pt.cex is not given, it takes the value of cex.

8. y.intersp = 0.8

Adjusts the vertical spacing between legend items. A value less than 1 reduces spacing, making the legend more compact.
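
The lty and pch arguments (items 4 and 5) also accept one value per series instead of a single shared value, which can help when a plot is printed in grayscale. The sketch below is a variation on the tutorial's plot, not the original code. The particular line types and symbols, the empty starting plot (type = "n"), and the names line_types and point_syms are arbitrary choices for illustration.

```r
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics = c(65, 78, 77, 49, 85, 87),
  Chemistry = c(88, 77, 79, 88, 77, 79)
)

subjects <- c("Mathematics", "Physics", "Chemistry")
line_cols <- c("blue", "red", "green")
line_types <- c(1, 2, 3)     # solid, dashed, dotted
point_syms <- c(1, 16, 17)   # hollow circle, filled circle, filled triangle

# Empty plot (type = "n") sized to hold every subject's marks
plot(data$Student_Id, data$Mathematics, type = "n",
     ylim = range(data$Mathematics, data$Physics, data$Chemistry),
     xlab = "Student ID", ylab = "Marks", main = "Marks by Subject")

# Draw each subject with its own color, line type, and symbol
for (i in seq_along(subjects)) {
  lines(data$Student_Id, data[[subjects[i]]], type = "o",
        col = line_cols[i], lty = line_types[i], pch = point_syms[i])
}

# Pass the same vectors to legend() so each entry matches its line
legend("bottomright", legend = subjects,
       col = line_cols, lty = line_types, pch = point_syms)
```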

Summary

The combination of cex, pt.cex, and y.intersp helps make the legend smaller, cleaner, and better suited for compact graphs.
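
To see how much these three arguments shrink the legend, one option is to call legend() with plot = FALSE, which computes the layout and returns the box dimensions without drawing anything. The sketch below compares the default legend with the compact one. The variable names default_box and compact_box are illustrative, and the sizes are reported in the plot's axis units, so the exact numbers depend on the plot and device.

```r
data <- data.frame(
  Student_Id = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74)
)

# A plot must exist before legend() can compute its layout
plot(data$Student_Id, data$Mathematics, type = "o", col = "blue",
     xlab = "Student ID", ylab = "Marks")

subjects <- c("Mathematics", "Physics", "Chemistry")
line_cols <- c("blue", "red", "green")

# plot = FALSE computes the legend layout without drawing it
default_box <- legend("bottomright", legend = subjects,
                      col = line_cols, lty = 1, pch = 1,
                      plot = FALSE)

compact_box <- legend("bottomright", legend = subjects,
                      col = line_cols, lty = 1, pch = 1,
                      cex = 0.8, pt.cex = 0.8, y.intersp = 0.8,
                      plot = FALSE)

# Box width and height in user (axis) coordinates
print(c(width = default_box$rect$w, height = default_box$rect$h))
print(c(width = compact_box$rect$w, height = compact_box$rect$h))
```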
