Understanding How to Create and Plot Data Frames in R
Creating and plotting data frames is one of the first skills to learn for data analysis in R. A data frame stores data as a table with named columns, which makes it easy to pick out variables and visualize the relationships between them. With R's base plotting functions, you can explore trends in that data without installing any extra packages.
Creating and Visualizing Data in R Step by Step
To create a data frame in R, use the data.frame() function. For example, you can build a small dataset such as student marks and visualize it through plot(). Once plotted, you can use lines() to add multiple trends and legend() to label them clearly.
Why Plotting Data Frames in R Matters
Plotting a data frame presents results visually, so trends and unusual values are easier to spot than in a printed table of numbers. The base R steps covered here also prepare beginners for more advanced visualization with packages like ggplot2 later on.
Practice and Experiment with Base R Plotting
The quickest way to get comfortable with creating and plotting data frames in R is to experiment. Try modifying colors, labels, and plot types, and rerun the code to see what changes. With consistent practice, learners gain confidence and build visualization skills that support data science and analytics projects.
✅ Step 1: Create the data frame
# Create a data frame with Student Id and their subject marks
data <- data.frame(
  Student_Id  = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics     = c(65, 78, 77, 49, 85, 87),
  Chemistry   = c(88, 77, 79, 88, 77, 79)
)
# Print the data to see it in the console
print(data)
Explanation:
- data.frame() creates a table-like structure (called a *data frame*) that holds data in rows and columns.
- Student_Id, Mathematics, Physics, and Chemistry are the column names.
- c() combines values into a vector. Each vector becomes one column; here every vector holds six values, one per student.
- print(data) shows the data in your R console.
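Before plotting, it can help to check that the data frame has the shape you expect. The sketch below rebuilds the same data frame and inspects it with a few base R functions (str(), dim(), names(), summary()); these calls are an addition for illustration and are not part of the original steps.

```r
# Same data frame as in Step 1
data <- data.frame(
  Student_Id  = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics     = c(65, 78, 77, 49, 85, 87),
  Chemistry   = c(88, 77, 79, 88, 77, 79)
)

str(data)              # column names, column types, and the first values
dim(data)              # number of rows, then number of columns
names(data)            # column names as a character vector
summary(data$Physics)  # minimum, quartiles, mean, and maximum of one column

# A single column is an ordinary numeric vector
physics_mean <- mean(data$Physics)
physics_mean
```

If dim(data) does not report 6 rows and 4 columns, one of the c() calls probably has a missing or extra value.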
✅ Step 2: Plot using base R (simple line plot)
# Plot Mathematics marks of all students
plot(data$Student_Id, data$Mathematics,
     type = "o",           # "o" means both points and lines
     col = "blue",         # Color of the line
     ylim = c(40, 100),    # Y-axis range wide enough for every subject's marks
     xlab = "Student ID",  # Label for X-axis
     ylab = "Marks",       # Label for Y-axis
     main = "Mathematics Marks of Students")  # Title of the graph
Explanation:
- plot() is the basic function for drawing graphs in R.
- data$Student_Id means "take the Student_Id column from the data frame".
- type = "o" draws both points and lines.
- col = "blue" sets the color of the line.
- ylim = c(40, 100) fixes the y-axis range. Without it, plot() scales the axis to the Mathematics marks only (74 to 88), and lower marks from other subjects added later would fall outside the plot.
- xlab, ylab, and main add axis labels and a title.
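The type argument accepts other values besides "o". As a rough sketch, the loop below draws the same Mathematics marks with four common types ("p" for points, "l" for lines, "b" for both with gaps around the points, "o" for both overplotted) in a 2 x 2 grid. The variable names and the use of par(mfrow = ...) are choices made for this illustration.

```r
# Mathematics marks from Step 1, repeated so the example runs on its own
student_id <- c(1, 2, 3, 4, 5, 6)
maths      <- c(75, 85, 88, 79, 80, 74)

# Split the plotting area into a 2 x 2 grid and remember the old settings
old_par <- par(mfrow = c(2, 2))

plot_types <- c("p", "l", "b", "o")
for (plot_type in plot_types) {
  plot(student_id, maths,
       type = plot_type,
       col  = "blue",
       xlab = "Student ID",
       ylab = "Marks",
       main = paste0('type = "', plot_type, '"'))
}

# Restore the single-plot layout
par(old_par)
```

Comparing the four panels side by side makes it easier to decide which type suits a given dataset.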
✅ Step 3: Add more subjects to the same plot (for comparison)
# Add Physics marks in red
lines(data$Student_Id, data$Physics,
      type = "o",
      col = "red")
# Add Chemistry marks in green
lines(data$Student_Id, data$Chemistry,
      type = "o",
      col = "green")
# Add a legend to identify lines
legend("bottomright",
       legend = c("Mathematics", "Physics", "Chemistry"),
       col = c("blue", "red", "green"),
       lty = 1, pch = 1)
Explanation:
- lines() adds new lines to the existing plot.
- Each col gives a different color.
- legend() adds a small box showing which color belongs to which subject.
- "bottomright" places the legend at the bottom-right corner of the plot.
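One thing to keep in mind is that lines() never rescales the axes, so every series has to fit inside the range set by the first plot() call. As an alternative sketch (not the approach used in the steps above), base R's matplot() draws all columns in a single call and picks a y-axis range that covers them all. The names subjects, subject_cols, and marks, and the plot title, are made up for this example.

```r
# Same data frame as in Step 1
data <- data.frame(
  Student_Id  = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics     = c(65, 78, 77, 49, 85, 87),
  Chemistry   = c(88, 77, 79, 88, 77, 79)
)

subjects     <- c("Mathematics", "Physics", "Chemistry")
subject_cols <- c("blue", "red", "green")

# matplot() expects a matrix with one column per line to draw
marks <- as.matrix(data[, subjects])

matplot(data$Student_Id, marks,
        type = "o", lty = 1, pch = 1, col = subject_cols,
        xlab = "Student ID", ylab = "Marks",
        main = "Marks by Subject")

legend("bottomright",
       legend = subjects,
       col = subject_cols,
       lty = 1, pch = 1)
```

Because the colors and labels come from the same two vectors, the legend stays in step with the lines if a subject is added or removed.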
✅ Summary
| Step | Function | Purpose |
|---|---|---|
| 1 | data.frame() | Create the dataset |
| 2 | plot() | Draw a basic line graph |
| 3 | lines() + legend() | Add more data and legend |
Explanation of Each Part of the legend() Function in R
The legend() function adds a descriptive legend to a plot. The call below is the one from Step 3 with three extra sizing arguments (cex, pt.cex, and y.intersp); each argument is explained in turn.
legend("bottomright",
       legend = c("Mathematics", "Physics", "Chemistry"),
       col = c("blue", "red", "green"),
       lty = 1, pch = 1,
       cex = 0.8,
       pt.cex = 0.8,
       y.intersp = 0.8
)
Argument-by-Argument Explanation
1. Position: "bottomright"
Defines where the legend appears in the plot. "bottomright" places it in the lower-right corner. Other positions include "topleft", "topright", "center", etc.
2. legend = c("Mathematics", "Physics", "Chemistry")
These are the text labels that appear inside the legend. Each label corresponds to a specific data series in the graph.
3. col = c("blue", "red", "green")
Assigns the colors used for each line in the legend. The colors must match the colors used in the plot.
4. lty = 1
Defines the line type shown in the legend. 1 means a solid line. Other values include 2 for dashed, 3 for dotted, and more.
5. pch = 1
Sets the type of symbol (point marker) shown in the legend. 1 is a hollow circle. Examples: 16 for filled circle, 17 for triangle.
6. cex = 0.8
Controls the text size inside the legend. Smaller values reduce text size and help reduce the overall legend box size.
7. pt.cex = 0.8
Controls the size of the symbols in the legend. Lower values make the symbols smaller; using the same value as cex keeps them in proportion to the text. If pt.cex is omitted, it defaults to the value of cex.
8. y.intersp = 0.8
Adjusts the vertical spacing between legend items. A value less than 1 reduces spacing, making the legend more compact.
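The col, lty, and pch arguments also accept one value per series, which helps when a plot has to stay readable without color. The following is a minimal sketch under that assumption: it gives each subject its own line type and marker and passes the same vectors to legend() so the key matches the plot. The vector names (series_col, series_lty, series_pch) and the plot title are illustrative.

```r
# Same data frame as in Step 1
data <- data.frame(
  Student_Id  = c(1, 2, 3, 4, 5, 6),
  Mathematics = c(75, 85, 88, 79, 80, 74),
  Physics     = c(65, 78, 77, 49, 85, 87),
  Chemistry   = c(88, 77, 79, 88, 77, 79)
)

subjects   <- c("Mathematics", "Physics", "Chemistry")
series_col <- c("blue", "red", "green")
series_lty <- c(1, 2, 3)     # solid, dashed, dotted
series_pch <- c(1, 16, 17)   # hollow circle, filled circle, filled triangle

# Y-axis range that covers every subject's marks
y_range <- range(unlist(data[subjects]))

# First series sets up the axes
plot(data$Student_Id, data[[subjects[1]]],
     type = "o", ylim = y_range,
     col = series_col[1], lty = series_lty[1], pch = series_pch[1],
     xlab = "Student ID", ylab = "Marks",
     main = "Marks by Subject")

# Remaining series are added to the existing plot
for (i in 2:length(subjects)) {
  lines(data$Student_Id, data[[subjects[i]]],
        type = "o",
        col = series_col[i], lty = series_lty[i], pch = series_pch[i])
}

# The legend reuses the same vectors, so each entry matches its line
legend("bottomright",
       legend = subjects,
       col = series_col, lty = series_lty, pch = series_pch,
       cex = 0.8, pt.cex = 0.8, y.intersp = 0.8)
```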
Summary
The combination of cex, pt.cex, and y.intersp helps make the legend smaller, cleaner, and better suited for compact graphs.

