How to Generate Random Data in R Programming
A Complete Guide to Creating and Exporting Random Datasets for Educational Purposes
Quick Start Example
Here’s a simple educational example to get you started with generating random student data:
Complete R Script
# Install package (only once)
install.packages("openxlsx")

# Load library
library(openxlsx)

# Generate random educational data
set.seed(123)  # for reproducibility
student_data <- data.frame(
  StudentID     = 1:20,
  Name          = paste("Student", 1:20),
  Age           = sample(18:22, 20, replace = TRUE),
  Gender        = sample(c("Male", "Female"), 20, replace = TRUE),
  Marks_Math    = sample(50:100, 20, replace = TRUE),
  Marks_Science = sample(50:100, 20, replace = TRUE),
  Marks_English = sample(50:100, 20, replace = TRUE)
)

# View data in RStudio
print(student_data)

# Export to Excel file
write.xlsx(student_data, file = "Student_Data.xlsx")
# The file "Student_Data.xlsx" will be saved in your working directory
📘 Step-by-Step Explanation
1. Install and Load the Package
Package Setup
install.packages("openxlsx")  # Install package (only first time)
library(openxlsx)             # Load the package into R
- install.packages("openxlsx"): Downloads and installs the openxlsx package from CRAN. This package reads and writes Excel files.
- library(openxlsx): Loads the package so its functions are available in the current R session.
Teaching point:
Packages in R are like “apps” that add extra features.
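Because install.packages() only needs to run once, a script that is executed repeatedly can check whether the package is already present first. The sketch below is one possible way to do that; ensure_installed is a hypothetical helper name, not part of base R or openxlsx.

```r
# Hypothetical helper (not part of R or openxlsx):
# install a package only when it is missing, then report whether it is available
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so nothing is downloaded here
# and the call returns TRUE
ensure_installed("stats")
```

Calling ensure_installed("openxlsx") before library(openxlsx) would follow the same pattern and skip the download on later runs.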
2. Set Random Seed
Reproducibility Setup
set.seed(123)
- This fixes the starting state of R's random number generator, so every run of the script produces the same random data.
- Without it, the random values change each time you run the script.
Teaching point:
set.seed() ensures reproducibility (very important in data science).
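The effect of the seed is easy to check directly. The short sketch below draws the same sample twice after resetting the seed and compares the results; the variable names are illustrative only.

```r
# Draw five marks after setting the seed
set.seed(123)
first_run <- sample(50:100, 5, replace = TRUE)

# Reset the same seed and draw again
set.seed(123)
second_run <- sample(50:100, 5, replace = TRUE)

# Both draws are identical because the generator started from the same state
identical(first_run, second_run)  # TRUE

# Without resetting the seed, the generator continues from where it left off,
# so this draw will generally differ from the two above
third_run <- sample(50:100, 5, replace = TRUE)
```

Note that the exact numbers produced for a given seed can differ between R versions (the default sampling algorithm changed in R 3.6.0), so reproducibility is guaranteed within one version rather than across all of them.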
3. Create the Dataset
Data Generation
student_data <- data.frame(
  StudentID     = 1:20,
  Name          = paste("Student", 1:20),
  Age           = sample(18:22, 20, replace = TRUE),
  Gender        = sample(c("Male", "Female"), 20, replace = TRUE),
  Marks_Math    = sample(50:100, 20, replace = TRUE),
  Marks_Science = sample(50:100, 20, replace = TRUE),
  Marks_English = sample(50:100, 20, replace = TRUE)
)
- data.frame(): Creates a table-like structure in R (like an Excel sheet).
- StudentID = 1:20: Creates student IDs from 1 to 20.
- Name = paste("Student", 1:20): Creates names like Student 1, Student 2, … Student 20.
- Age = sample(18:22, 20, replace = TRUE): Draws 20 ages from 18 to 22 (inclusive). Because replace = TRUE, the same age can be picked more than once.
- Gender = sample(c("Male", "Female"), 20, replace = TRUE): Randomly assigns Male or Female to each student.
- Marks_Math, Marks_Science, Marks_English: Each column holds 20 random whole-number marks from 50 to 100 (inclusive).
Teaching point:
This is how you simulate real-life data for practice.
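sample() with default settings gives every value the same chance, so the marks above are spread evenly across 50 to 100. As a possible variation, the sketch below draws marks from a normal distribution and weights the ages unevenly. The mean, standard deviation, and probabilities are assumed values chosen only for illustration, and realistic_data is a hypothetical name.

```r
set.seed(123)
n <- 20

# Marks from a normal distribution (mean 70 and sd 10 are assumed values),
# rounded to whole numbers and clamped to the 0-100 range
raw_marks  <- rnorm(n, mean = 70, sd = 10)
marks_math <- pmin(pmax(round(raw_marks), 0), 100)

# Weighted sampling: younger ages are made more likely (assumed weights)
age <- sample(18:22, n, replace = TRUE,
              prob = c(0.3, 0.3, 0.2, 0.1, 0.1))

realistic_data <- data.frame(
  StudentID  = seq_len(n),
  Age        = age,
  Marks_Math = marks_math
)

head(realistic_data)
```

Marks generated this way cluster around the chosen mean instead of being spread uniformly, which is often closer to how real exam scores look.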
4. View the Data in RStudio
Display Data
print(student_data)
- Displays the dataset in the console.
- You can also simply type student_data to see the data in RStudio.
Teaching point:
Always check your data before exporting.
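Printing shows the values, but a few extra base R functions make the check more systematic. The sketch below rebuilds a smaller version of the dataset so that it runs on its own, then inspects its structure and asserts the ranges described earlier.

```r
set.seed(123)
student_data <- data.frame(
  StudentID  = 1:20,
  Age        = sample(18:22, 20, replace = TRUE),
  Marks_Math = sample(50:100, 20, replace = TRUE)
)

str(student_data)                 # column names, types, and first values
head(student_data)                # first six rows
summary(student_data$Marks_Math)  # min, quartiles, mean, max

# Stop with an error if any expectation about the data is violated
stopifnot(
  nrow(student_data) == 20,
  !anyNA(student_data),
  anyDuplicated(student_data$StudentID) == 0,
  all(student_data$Age >= 18 & student_data$Age <= 22),
  all(student_data$Marks_Math >= 50 & student_data$Marks_Math <= 100)
)
```

If every condition holds, stopifnot() returns silently and the script continues to the export step.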
5. Export the Data to Excel
Export to Excel
write.xlsx(student_data, file = "Student_Data.xlsx")
- write.xlsx(): Writes the dataset into an Excel file.
- file = "Student_Data.xlsx": The name of the Excel file.
- The file is saved in the current working directory, which you can check with getwd().
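To confirm where a file ends up and that it was written correctly, you can build the path explicitly and read the file back. The sketch below uses base R's write.csv() and read.csv() as a stand-in so that it runs without any extra package; it is not the openxlsx export shown above, and the temporary directory and out_path name are assumptions for illustration.

```r
set.seed(123)
student_data <- data.frame(
  StudentID  = 1:5,
  Marks_Math = sample(50:100, 5, replace = TRUE)
)

# Show the current working directory (where relative file names are saved)
getwd()

# Build an explicit path; tempdir() is used here so the example leaves no
# file behind in your project folder
out_path <- file.path(tempdir(), "Student_Data.csv")

# Write the data without the row-number column, then confirm the file exists
write.csv(student_data, out_path, row.names = FALSE)
file.exists(out_path)

# Read the file back and compare it with the original
read_back <- read.csv(out_path)
all(read_back$Marks_Math == student_data$Marks_Math)
```

The same check works for the Excel file: pass a full path built with file.path() as the file argument of write.xlsx(), then call file.exists() on that path.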
💡 Key Takeaways
- Use set.seed() for reproducible random data
- The sample() function draws random values from a vector, with or without replacement
- The openxlsx package makes Excel export simple and efficient
- Always verify your data with print() before exporting
