📊 Data Types in R: Appropriate Uses of Different Data Types
Learning Objective: Master the fundamental data types in R and understand when and how to use each type effectively in real-world data analysis scenarios.
1. Introduction to Data Types in R
Data types are the foundation of programming in R. They define what kind of data can be stored and what operations can be performed on that data. Understanding data types is crucial for:
- Efficient memory management
- Proper data manipulation and analysis
- Avoiding errors in statistical computations
- Writing optimized code for large datasets
R has six basic (atomic) data types, and choosing the right one significantly impacts your program’s performance and accuracy.
2. The Six Basic Data Types in R
| Data Type | Description | Example Values | Primary Use Case |
|---|---|---|---|
| Numeric (Double) | Decimal numbers (default) | 3.14, 100.5, -0.001 | Continuous measurements, calculations |
| Integer | Whole numbers | 1L, 100L, -50L | Counting, indexing, discrete data |
| Character | Text strings | "Hello", 'MBA', "2024" | Names, labels, categorical text |
| Logical | Boolean values | TRUE, FALSE | Conditions, filtering, binary outcomes |
| Complex | Complex numbers | 1+2i, 3-4i | Mathematical computations, signal processing |
| Raw | Raw bytes | as.raw(65) | Binary data, file operations |
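As a quick check of the table above, the following minimal sketch creates one value of each atomic type and asks R for its storage type. The list name `examples` and the sample values are chosen here purely for illustration.

```r
# One example value for each of R's six atomic types
examples <- list(
  double    = 3.14,
  integer   = 100L,
  character = "MBA",
  logical   = TRUE,
  complex   = 1 + 2i,
  raw       = as.raw(65)
)

# typeof() reports the underlying storage type of each value
types <- vapply(examples, typeof, character(1))
print(types)
```

Each name in the printed vector should match the type reported beneath it.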
3. Numeric Data Type (Double)
Definition and Characteristics
The numeric data type (technically “double” or double-precision floating-point) is the default numerical type in R. It can store decimal numbers with approximately 15-17 significant digits of precision.
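Because doubles have finite precision, most decimal fractions are stored as close binary approximations. The sketch below illustrates one practical consequence: exact equality tests on computed doubles can fail, so a tolerance-based comparison with `all.equal()` is usually safer. The variable names are illustrative only.

```r
# Doubles store binary approximations of most decimal fractions
a <- 0.1 + 0.2
b <- 0.3

exact_match <- (a == b)                  # exact comparison; FALSE here
near_match  <- isTRUE(all.equal(a, b))   # comparison within a small tolerance

cat("a == b:", exact_match, "\n")
cat("isTRUE(all.equal(a, b)):", near_match, "\n")

# Printing more digits reveals the stored approximation
print(a, digits = 17)
```

For financial figures, this is one reason to round only at presentation time and to avoid `==` on the results of floating-point arithmetic.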
✅ When to Use Numeric Data Type:
- Financial Analysis: Stock prices, revenue, profit margins
- Statistical Measurements: Mean, median, standard deviation
- Scientific Calculations: Measurements, ratios, percentages
- Business Metrics: ROI, conversion rates, growth rates
# Creating numeric variables
revenue <- 125000.50
expenses <- 87500.75
profit_margin <- 0.165
# Calculations
net_profit <- revenue - expenses
roi <- (net_profit / expenses) * 100
# Display results
cat("Revenue: $", revenue, "\n")
cat("Expenses: $", expenses, "\n")
cat("Net Profit: $", net_profit, "\n")
cat("ROI: ", roi, "%\n")
# Check data type
class(revenue)
typeof(revenue)
Revenue: $ 125000.5
Expenses: $ 87500.75
Net Profit: $ 37499.75
ROI: 42.85649 %
[1] "numeric"
[1] "double"
# Quarterly sales data
Q1_sales <- 250000.00
Q2_sales <- 275000.50
Q3_sales <- 310000.75
Q4_sales <- 340000.25
# Calculate annual total, quarterly average, and Q4-vs-Q1 growth
total_sales <- Q1_sales + Q2_sales + Q3_sales + Q4_sales
average_quarterly <- total_sales / 4
growth_rate <- ((Q4_sales - Q1_sales) / Q1_sales) * 100
cat("Total Annual Sales: $", format(total_sales, big.mark=","), "\n")
cat("Average Quarterly Sales: $", format(average_quarterly, big.mark=","), "\n")
cat("Q4 vs Q1 Growth: ", round(growth_rate, 2), "%\n")
# Statistical analysis
sales_vector <- c(Q1_sales, Q2_sales, Q3_sales, Q4_sales)
cat("\nStatistical Summary:\n")
cat("Mean: $", mean(sales_vector), "\n")
cat("Median: $", median(sales_vector), "\n")
cat("Std Dev: $", round(sd(sales_vector), 2), "\n")
Total Annual Sales: $ 1,175,002
Average Quarterly Sales: $ 293,750.4
Q4 vs Q1 Growth: 36 %

Statistical Summary:
Mean: $ 293750.4
Median: $ 292500.6
Std Dev: $ 39449.46
Tip: Use the round() function to control decimal places for presentation, e.g. round(value, 2) for two decimal places.
4. Integer Data Type
Definition and Characteristics
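Integers are whole numbers without decimal points. A bare literal such as 100 is stored as a double, so append L (e.g., 100L) or call as.integer() when you want an integer. Each integer element takes 4 bytes versus 8 for a double, a saving that only becomes visible in long vectors. Integers are also limited to roughly ±2.1 billion.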
✅ When to Use Integer Data Type:
- Counting: Number of customers, products, employees
- Indexing: Array positions, database IDs
- Discrete Data: Age in years, number of transactions
- Memory Optimization: Large datasets with whole numbers
# Integer variables for business metrics
total_customers <- 15000L
new_customers_month <- 450L
products_sold <- 8500L
inventory_count <- 12500L
# Calculations
customer_growth <- new_customers_month
remaining_inventory <- inventory_count - products_sold
cat("Total Customers: ", total_customers, "\n")
cat("New Customers This Month: ", new_customers_month, "\n")
cat("Products Sold: ", products_sold, "\n")
cat("Remaining Inventory: ", remaining_inventory, "\n")
# Verify data type
cat("\nData Type Check:\n")
cat("Class: ", class(total_customers), "\n")
cat("Type: ", typeof(total_customers), "\n")
# Comparison: Integer vs Numeric memory
numeric_var <- 15000
integer_var <- 15000L
cat("\nMemory Comparison:\n")
cat("Numeric size: ", object.size(numeric_var), " bytes\n")
cat("Integer size: ", object.size(integer_var), " bytes\n")
Total Customers: 15000
New Customers This Month: 450
Products Sold: 8500
Remaining Inventory: 4000

Data Type Check:
Class: integer
Type: integer

Memory Comparison:
Numeric size: 56 bytes
Integer size: 56 bytes
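The single-value comparison above reports the same size for both types because the fixed per-object overhead dominates. The following sketch, which assumes an arbitrary vector length of one million, shows where the difference appears and also demonstrates the integer range limit. All variable names are illustrative.

```r
n <- 1e6  # one million elements (arbitrary size chosen for illustration)

int_vec <- integer(n)   # 4 bytes per element
dbl_vec <- numeric(n)   # 8 bytes per element

int_bytes <- as.numeric(object.size(int_vec))
dbl_bytes <- as.numeric(object.size(dbl_vec))

cat("Integer vector:", format(int_bytes, big.mark = ","), "bytes\n")
cat("Double vector: ", format(dbl_bytes, big.mark = ","), "bytes\n")
cat("Ratio (double / integer):", round(dbl_bytes / int_bytes, 2), "\n")

# Integers have a fixed range; going past it yields NA (with a warning)
max_int    <- .Machine$integer.max
overflowed <- suppressWarnings(max_int + 1L)
cat("Largest integer:", max_int, "\n")
cat("max_int + 1L:", overflowed, "\n")
```

On long vectors the double version should be close to twice the size. If counts could ever exceed the integer limit, storing them as doubles is the safer choice.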
# Employee demographics (integers are perfect for age)
employee_ages <- c(25L, 32L, 45L, 28L, 55L, 38L, 42L, 29L)
num_employees <- length(employee_ages)
# Age analysis
youngest <- min(employee_ages)
oldest <- max(employee_ages)
median_age <- median(employee_ages)
# Age group categorization
under_30 <- sum(employee_ages < 30L)
age_30_to_40 <- sum(employee_ages >= 30L & employee_ages < 40L)
age_40_plus <- sum(employee_ages >= 40L)
cat("Employee Demographics Analysis\n")
cat("==============================\n")
cat("Total Employees: ", num_employees, "\n")
cat("Youngest Employee: ", youngest, " years\n")
cat("Oldest Employee: ", oldest, " years\n")
cat("Median Age: ", median_age, " years\n\n")
cat("Age Distribution:\n")
cat("Under 30: ", under_30, " employees\n")
cat("30-39: ", age_30_to_40, " employees\n")
cat("40+: ", age_40_plus, " employees\n")
Employee Demographics Analysis
==============================
Total Employees: 8
Youngest Employee: 25 years
Oldest Employee: 55 years
Median Age: 35 years

Age Distribution:
Under 30: 3 employees
30-39: 2 employees
40+: 3 employees
5. Character Data Type
Definition and Characteristics
Character data type stores text strings. Strings must be enclosed in either single quotes ('text') or double quotes ("text"). This is essential for storing names, labels, descriptions, and categorical data.
✅ When to Use Character Data Type:
- Identifiers: Customer names, product codes, IDs
- Categorical Data: Department names, product categories, regions
- Text Analysis: Customer reviews, feedback, descriptions
- Labels: Chart labels, report headers, annotations
# Customer database
customer_names <- c("John Smith", "Sarah Johnson", "Michael Brown", "Emily Davis")
customer_ids <- c("CUST001", "CUST002", "CUST003", "CUST004")
departments <- c("Sales", "Marketing", "Finance", "Operations")
email_domains <- c("gmail.com", "company.com", "yahoo.com", "outlook.com")
# String operations
cat("Customer Database\n")
cat("=================\n")
for(i in 1:length(customer_names)) {
cat("ID: ", customer_ids[i], " | Name: ", customer_names[i],
" | Dept: ", departments[i], "\n")
}
# String functions
cat("\nString Analysis:\n")
cat("First customer name length: ", nchar(customer_names[1]), " characters\n")
cat("Uppercase ID: ", toupper(customer_ids[1]), "\n")
cat("Lowercase email: ", tolower(email_domains[2]), "\n")
# Check data type
cat("\nData Type: ", class(customer_names), "\n")
Customer Database
=================
ID: CUST001 | Name: John Smith | Dept: Sales
ID: CUST002 | Name: Sarah Johnson | Dept: Marketing
ID: CUST003 | Name: Michael Brown | Dept: Finance
ID: CUST004 | Name: Emily Davis | Dept: Operations

String Analysis:
First customer name length: 10 characters
Uppercase ID: CUST001
Lowercase email: company.com

Data Type: character
# Product information
product_names <- c("Laptop Pro 15", "Wireless Mouse", "USB-C Hub", "Mechanical Keyboard")
product_categories <- c("Electronics", "Accessories", "Accessories", "Electronics")
product_skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023", "ELEC-KB-089")
# String concatenation and manipulation
cat("Product Catalog\n")
cat("===============\n\n")
for(i in 1:length(product_names)) {
full_label <- paste(product_skus[i], "-", product_names[i])
cat("Product ", i, ": ", full_label, "\n")
cat("Category: ", product_categories[i], "\n")
# Extract category prefix from SKU (the text before the first hyphen)
sku_prefix <- sub("-.*", "", product_skus[i])
cat("SKU Prefix: ", sku_prefix, "\n\n")
}
# Count products by category
electronics_count <- sum(product_categories == "Electronics")
accessories_count <- sum(product_categories == "Accessories")
cat("Category Summary:\n")
cat("Electronics: ", electronics_count, " products\n")
cat("Accessories: ", accessories_count, " products\n")
Product Catalog
===============

Product 1 : ELEC-LP-001 - Laptop Pro 15
Category: Electronics
SKU Prefix: ELEC

Product 2 : ACC-MS-045 - Wireless Mouse
Category: Accessories
SKU Prefix: ACC

Product 3 : ACC-HB-023 - USB-C Hub
Category: Accessories
SKU Prefix: ACC

Product 4 : ELEC-KB-089 - Mechanical Keyboard
Category: Electronics
SKU Prefix: ELEC

Category Summary:
Electronics: 2 products
Accessories: 2 products
Key string functions:
- paste(): Concatenate strings
- substr(): Extract a substring
- toupper(), tolower(): Change case
- nchar(): Count characters
- grep(), grepl(): Pattern matching
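The pattern-matching functions named above are not exercised in the earlier examples, so here is a short sketch of how they might be applied to the same SKU codes. It also uses `sub()` and `paste0()`, two further base R functions; the variable names are illustrative.

```r
product_skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023", "ELEC-KB-089")

# grepl() returns one TRUE/FALSE per element: does the SKU start with "ELEC"?
is_electronics <- grepl("^ELEC", product_skus)

# grep() returns the positions of the matching elements instead
accessory_positions <- grep("^ACC", product_skus)

# sub() with a regular expression keeps only the text before the first hyphen
prefixes <- sub("-.*", "", product_skus)

# paste0() concatenates without inserting a separator
labels <- paste0(prefixes, " (", nchar(product_skus), " chars)")

print(is_electronics)
print(accessory_positions)
print(prefixes)
print(labels)
```

A logical result such as `is_electronics` can be used directly for filtering, for example `product_skus[is_electronics]`.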
6. Logical Data Type
Definition and Characteristics
Logical (or Boolean) data type has only two possible values: TRUE or FALSE. These are fundamental for conditional operations, filtering data, and control flow in programs.
✅ When to Use Logical Data Type:
- Data Filtering: Selecting records that meet criteria
- Conditional Checks: Testing if conditions are met
- Binary Outcomes: Yes/No, Pass/Fail, Active/Inactive
- Control Flow: If-else statements, while loops
# Sales representatives performance
sales_reps <- c("Alice", "Bob", "Charlie", "Diana", "Edward")
monthly_sales <- c(125000, 87000, 145000, 92000, 156000)
target <- 100000
# Logical evaluations
met_target <- monthly_sales >= target
exceeded_120k <- monthly_sales > 120000
top_performer <- monthly_sales == max(monthly_sales)
# Performance report
cat("Monthly Sales Performance Report\n")
cat("================================\n\n")
for(i in 1:length(sales_reps)) {
cat("Representative: ", sales_reps[i], "\n")
cat("Sales: $", monthly_sales[i], "\n")
cat("Met Target: ", met_target[i], "\n")
cat("Exceeded $120K: ", exceeded_120k[i], "\n")
cat("Top Performer: ", top_performer[i], "\n")
cat("---\n")
}
# Summary statistics
total_met_target <- sum(met_target)
total_exceeded_120k <- sum(exceeded_120k)
percentage_met <- (total_met_target / length(sales_reps)) * 100
cat("\nSummary:\n")
cat("Representatives meeting target: ", total_met_target, " out of ",
length(sales_reps), "\n")
cat("Representatives exceeding $120K: ", total_exceeded_120k, "\n")
cat("Success Rate: ", percentage_met, "%\n")
# Check data type
cat("\nData Type: ", class(met_target), "\n")
Monthly Sales Performance Report
================================

Representative: Alice
Sales: $ 125000
Met Target: TRUE
Exceeded $120K: TRUE
Top Performer: FALSE
---
Representative: Bob
Sales: $ 87000
Met Target: FALSE
Exceeded $120K: FALSE
Top Performer: FALSE
---
Representative: Charlie
Sales: $ 145000
Met Target: TRUE
Exceeded $120K: TRUE
Top Performer: FALSE
---
Representative: Diana
Sales: $ 92000
Met Target: FALSE
Exceeded $120K: FALSE
Top Performer: FALSE
---
Representative: Edward
Sales: $ 156000
Met Target: TRUE
Exceeded $120K: TRUE
Top Performer: TRUE
---

Summary:
Representatives meeting target: 3 out of 5
Representatives exceeding $120K: 3
Success Rate: 60 %

Data Type: logical
# Product quality inspection
product_ids <- c("P001", "P002", "P003", "P004", "P005")
weight_kg <- c(5.2, 4.8, 5.0, 5.5, 4.9)
dimensions_ok <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
quality_passed <- c(TRUE, FALSE, TRUE, TRUE, TRUE)
# Specifications
min_weight <- 4.9
max_weight <- 5.3
# Validation checks
weight_in_range <- (weight_kg >= min_weight) & (weight_kg <= max_weight)
overall_pass <- weight_in_range & dimensions_ok & quality_passed
cat("Quality Control Report\n")
cat("=====================\n\n")
for(i in 1:length(product_ids)) {
cat("Product ID: ", product_ids[i], "\n")
cat("Weight: ", weight_kg[i], " kg | In Range: ", weight_in_range[i], "\n")
cat("Dimensions OK: ", dimensions_ok[i], "\n")
cat("Quality Test: ", quality_passed[i], "\n")
cat("OVERALL STATUS: ", ifelse(overall_pass[i], "PASS ✓", "FAIL ✗"), "\n")
cat("---\n")
}
# Summary
pass_count <- sum(overall_pass)
fail_count <- sum(!overall_pass)
pass_rate <- (pass_count / length(product_ids)) * 100
cat("\nQuality Summary:\n")
cat("Products Passed: ", pass_count, "\n")
cat("Products Failed: ", fail_count, "\n")
cat("Pass Rate: ", round(pass_rate, 1), "%\n")
Quality Control Report
=====================

Product ID: P001
Weight: 5.2 kg | In Range: TRUE
Dimensions OK: TRUE
Quality Test: TRUE
OVERALL STATUS: PASS ✓
---
Product ID: P002
Weight: 4.8 kg | In Range: FALSE
Dimensions OK: TRUE
Quality Test: FALSE
OVERALL STATUS: FAIL ✗
---
Product ID: P003
Weight: 5 kg | In Range: TRUE
Dimensions OK: FALSE
Quality Test: TRUE
OVERALL STATUS: FAIL ✗
---
Product ID: P004
Weight: 5.5 kg | In Range: FALSE
Dimensions OK: TRUE
Quality Test: TRUE
OVERALL STATUS: FAIL ✗
---
Product ID: P005
Weight: 4.9 kg | In Range: TRUE
Dimensions OK: TRUE
Quality Test: TRUE
OVERALL STATUS: PASS ✓
---

Quality Summary:
Products Passed: 2
Products Failed: 3
Pass Rate: 40 %
Note: Logical values are typically produced by comparison operators (>, <, ==, !=, >=, <=). They can be combined using logical operators: & (AND), | (OR), ! (NOT).
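The comparison and logical operators work element by element on whole vectors, and a few base R helpers summarise the result. The sketch below reuses the sales figures from the performance example; the derived variable names are illustrative. Note that the doubled forms `&&` and `||` are intended for single TRUE/FALSE values, as in `if()` conditions, rather than for vectors.

```r
monthly_sales <- c(125000, 87000, 145000, 92000, 156000)
target <- 100000

met_target <- monthly_sales >= target

# Element-wise AND: met the target but stayed under 150,000
strong_month <- met_target & (monthly_sales < 150000)

# any() / all() collapse a logical vector to a single value
anyone_met   <- any(met_target)
everyone_met <- all(met_target)

# which() gives the positions of TRUE values; mean() gives the share of TRUEs
positions_met <- which(met_target)
share_met     <- mean(met_target)

# Logical indexing keeps only the elements where the condition is TRUE
sales_above_target <- monthly_sales[met_target]

print(strong_month)
print(positions_met)
print(share_met)
print(sales_above_target)
```

Because TRUE counts as 1 and FALSE as 0, `sum()` counts matches and `mean()` gives a proportion without any explicit loop.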
7. Complex Data Type
Definition and Characteristics
Complex numbers have a real and an imaginary part (e.g., 3 + 4i). While less common in business analytics, they're essential for certain mathematical computations, signal processing, and engineering applications.
✅ When to Use Complex Data Type:
- Engineering Calculations: Electrical engineering, signal processing
- Advanced Mathematics: Fourier transforms, eigenvalue problems
- Scientific Computing: Quantum mechanics, wave equations
- Financial Engineering: Options pricing (advanced models)
# Creating complex numbers
z1 <- 3 + 4i
z2 <- 2 - 3i
z3 <- complex(real = 5, imaginary = 2)
# Basic operations
sum_z <- z1 + z2
diff_z <- z1 - z2
prod_z <- z1 * z2
quot_z <- z1 / z2
cat("Complex Number Operations\n")
cat("=========================\n\n")
cat("z1 = ", z1, "\n")
cat("z2 = ", z2, "\n\n")
cat("Addition (z1 + z2) = ", sum_z, "\n")
cat("Subtraction (z1 - z2) = ", diff_z, "\n")
cat("Multiplication (z1 × z2) = ", prod_z, "\n")
cat("Division (z1 ÷ z2) = ", quot_z, "\n\n")
# Properties of complex numbers
cat("Properties of z1:\n")
cat("Real part: ", Re(z1), "\n")
cat("Imaginary part: ", Im(z1), "\n")
cat("Modulus (|z1|): ", Mod(z1), "\n")
cat("Conjugate: ", Conj(z1), "\n")
cat("Argument (angle): ", Arg(z1), " radians\n\n")
# Check data type
cat("Data Type: ", class(z1), "\n")
cat("Type: ", typeof(z1), "\n")
Complex Number Operations
=========================

z1 = 3+4i
z2 = 2-3i

Addition (z1 + z2) = 5+1i
Subtraction (z1 - z2) = 1+7i
Multiplication (z1 × z2) = 18-1i
Division (z1 ÷ z2) = -0.4615385+1.307692i

Properties of z1:
Real part: 3
Imaginary part: 4
Modulus (|z1|): 5
Conjugate: 3-4i
Argument (angle): 0.9272952 radians

Data Type: complex
Type: complex
# Electrical impedance calculation (AC circuit analysis)
# Impedance Z = R + jX, where R is resistance and X is reactance
resistance <- c(100, 150, 200) # Ohms
reactance <- c(50, -75, 100) # Ohms (positive for inductive, negative for capacitive)
# Create complex impedances
impedances <- complex(real = resistance, imaginary = reactance)
cat("AC Circuit Analysis - Impedance Calculations\n")
cat("============================================\n\n")
for(i in 1:length(impedances)) {
z <- impedances[i]
magnitude <- Mod(z)
phase_rad <- Arg(z)
phase_deg <- phase_rad * 180 / pi
cat("Circuit ", i, ":\n")
cat("Impedance: ", z, " Ω\n")
cat("Magnitude: ", round(magnitude, 2), " Ω\n")
cat("Phase Angle: ", round(phase_deg, 2), "°\n")
if(Im(z) > 0) {
cat("Type: Inductive\n")
} else if(Im(z) < 0) {
cat("Type: Capacitive\n")
} else {
cat("Type: Purely Resistive\n")
}
cat("---\n")
}
# Calculate total impedance (series circuit)
total_impedance <- sum(impedances)
cat("\nTotal Series Impedance: ", total_impedance, " Ω\n")
cat("Total Magnitude: ", round(Mod(total_impedance), 2), " Ω\n")
AC Circuit Analysis - Impedance Calculations
============================================

Circuit 1 :
Impedance: 100+50i Ω
Magnitude: 111.8 Ω
Phase Angle: 26.57 °
Type: Inductive
---
Circuit 2 :
Impedance: 150-75i Ω
Magnitude: 167.71 Ω
Phase Angle: -26.57 °
Type: Capacitive
---
Circuit 3 :
Impedance: 200+100i Ω
Magnitude: 223.61 Ω
Phase Angle: 26.57 °
Type: Inductive
---

Total Series Impedance: 450+75i Ω
Total Magnitude: 456.21 Ω
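Two situations where complex values appear without being typed in by hand are square roots of negative numbers and Fourier transforms. The following sketch uses base R's `sqrt()` and `fft()`; the four-sample `signal` is a made-up input for illustration.

```r
# sqrt() on a negative double gives NaN (with a warning) ...
real_root <- suppressWarnings(sqrt(-1))

# ... but works once the input is converted to complex
complex_root <- sqrt(as.complex(-1))

# fft() (base R's discrete Fourier transform) returns a complex vector
signal   <- c(1, 0, -1, 0)   # hypothetical four-sample signal
spectrum <- fft(signal)

cat("sqrt(-1):", real_root, "\n")
cat("sqrt(as.complex(-1)):", complex_root, "\n")
cat("typeof(fft(signal)):", typeof(spectrum), "\n")
cat("Magnitudes:", Mod(spectrum), "\n")
```

R does not switch to complex arithmetic automatically, so real inputs must be converted with `as.complex()` when a complex result is wanted.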
8. Raw Data Type
Definition and Characteristics
The raw data type stores data as raw bytes (values from 0 to 255). It's used for low-level data operations, binary file handling, and interfacing with external systems.
✅ When to Use Raw Data Type:
- Binary File Operations: Reading/writing binary files
- Data Encryption: Cryptographic operations
- Network Communication: Socket programming, protocols
- Low-level Data Processing: Image processing, audio data
# Creating raw data
raw_bytes <- as.raw(c(65, 66, 67, 68, 69)) # ASCII codes for A, B, C, D, E
raw_single <- as.raw(72) # ASCII for 'H'
cat("Raw Data Type Demonstration\n")
cat("===========================\n\n")
cat("Raw bytes: ")
print(raw_bytes)
cat("\n")
# Convert raw to character
chars <- rawToChar(raw_bytes)
cat("Converted to characters: ", chars, "\n\n")
# Convert character to raw
text <- "HELLO"
raw_from_text <- charToRaw(text)
cat("Text '", text, "' as raw bytes: ")
print(raw_from_text)
cat("\n")
# Check data type
cat("Data Type: ", class(raw_bytes), "\n")
cat("Type: ", typeof(raw_bytes), "\n\n")
# Size comparison
numeric_vector <- c(65, 66, 67, 68, 69)
cat("Memory Usage Comparison:\n")
cat("Numeric vector: ", object.size(numeric_vector), " bytes\n")
cat("Raw vector: ", object.size(raw_bytes), " bytes\n")
Raw Data Type Demonstration
===========================

Raw bytes: [1] 41 42 43 44 45

Converted to characters: ABCDE

Text ' HELLO ' as raw bytes: [1] 48 45 4c 4c 4f

Data Type: raw
Type: raw

Memory Usage Comparison:
Numeric vector: 96 bytes
Raw vector: 56 bytes
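Binary file handling is listed above as a main use of raw vectors. A minimal round-trip sketch with base R's `writeBin()` and `readBin()` might look like this; the temporary file path and variable names are assumptions made for the example.

```r
# Bytes to store: the ASCII codes for "HELLO"
bytes_out <- charToRaw("HELLO")

# Write the bytes to a temporary file (path generated by tempfile())
path <- tempfile(fileext = ".bin")
writeBin(bytes_out, path)

# Read them back as raw; n is the maximum number of bytes to read
bytes_in <- readBin(path, what = "raw", n = file.size(path))

cat("Bytes written:", length(bytes_out), "\n")
cat("Bytes read:   ", length(bytes_in), "\n")
cat("Round trip identical:", identical(bytes_out, bytes_in), "\n")
cat("As text:", rawToChar(bytes_in), "\n")

unlink(path)  # clean up the temporary file
```

Reading with `what = "raw"` leaves the bytes uninterpreted, which is useful when the file format is unknown or has to be parsed by hand.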
9. Data Type Conversion (Coercion)
Understanding Type Conversion
R lets you convert between data types with functions such as as.numeric(), as.integer(), as.character(), and as.logical(). Understanding when and how to convert is crucial for data manipulation.
# Starting with different types
num_value <- 42.7
char_number <- "123"
char_text <- "Hello"
logical_value <- TRUE
integer_val <- 100L
cat("Type Conversion Examples\n")
cat("========================\n\n")
# Numeric to Integer (as.integer() truncates toward zero; it does not round)
num_to_int <- as.integer(num_value)
cat("Numeric to Integer:\n")
cat(num_value, " (", class(num_value), ") -> ",
num_to_int, " (", class(num_to_int), ")\n\n")
# Character to Numeric
char_to_num <- as.numeric(char_number)
cat("Character to Numeric:\n")
cat("'", char_number, "' (", class(char_number), ") -> ",
char_to_num, " (", class(char_to_num), ")\n\n")
# Numeric to Character
num_to_char <- as.character(num_value)
cat("Numeric to Character:\n")
cat(num_value, " (", class(num_value), ") -> '",
num_to_char, "' (", class(num_to_char), ")\n\n")
# Logical to Numeric
logical_to_num <- as.numeric(logical_value)
cat("Logical to Numeric:\n")
cat(logical_value, " (", class(logical_value), ") -> ",
logical_to_num, " (", class(logical_to_num), ")\n\n")
# Attempting invalid conversion
invalid_conversion <- as.numeric(char_text)
cat("Invalid Conversion (generates warning):\n")
cat("'", char_text, "' to numeric -> ", invalid_conversion, "\n")
cat("(NA = Not Available/Missing Value)\n")
Type Conversion Examples
========================

Numeric to Integer:
42.7 ( numeric ) -> 42 ( integer )

Character to Numeric:
' 123 ' ( character ) -> 123 ( numeric )

Numeric to Character:
42.7 ( numeric ) -> ' 42.7 ' ( character )

Logical to Numeric:
TRUE ( logical ) -> 1 ( numeric )

Invalid Conversion (generates warning):
' Hello ' to numeric -> NA
(NA = Not Available/Missing Value)
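Besides the explicit as.*() functions, R also converts types implicitly whenever values of different types are combined. The sketch below illustrates that promotion order (logical, then integer, then double, then character) using `c()`; the variable names are illustrative.

```r
# c() must return a single type, so it promotes elements to the most general one
mix_lgl_int <- c(TRUE, 5L)      # logical + integer
mix_int_dbl <- c(5L, 2.5)       # integer + double
mix_dbl_chr <- c(2.5, "MBA")    # double + character

cat("logical + integer  ->", typeof(mix_lgl_int), "\n")
cat("integer + double   ->", typeof(mix_int_dbl), "\n")
cat("double + character ->", typeof(mix_dbl_chr), "\n")

# Arithmetic follows the same rule: TRUE counts as 1, FALSE as 0
total_true <- TRUE + TRUE
cat("TRUE + TRUE =", total_true, "(", typeof(total_true), ")\n")
```

One stray text value in a numeric column is therefore enough to turn the whole vector into character, which is a common cause of surprises after importing data.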
# Data imported from Excel/CSV often comes as character
revenue_char <- c("125000", "87500", "156000", "92000", "134500")
year_char <- c("2020", "2021", "2022", "2023", "2024")
profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
cat("Data Cleaning and Type Conversion\n")
cat("==================================\n\n")
# Convert to appropriate types
revenue_num <- as.numeric(revenue_char)
year_int <- as.integer(year_char)
profitable_log <- as.logical(profitable_char)
# Perform calculations (only possible after conversion)
total_revenue <- sum(revenue_num)
avg_revenue <- mean(revenue_num)
num_profitable <- sum(profitable_log)
profitable_rate <- (num_profitable / length(profitable_log)) * 100
cat("Financial Analysis (After Type Conversion)\n")
cat("------------------------------------------\n")
cat("Total Revenue: $", format(total_revenue, big.mark=","), "\n")
cat("Average Revenue: $", format(avg_revenue, big.mark=","), "\n")
cat("Profitable Years: ", num_profitable, " out of ", length(year_int), "\n")
cat("Profitability Rate: ", round(profitable_rate, 1), "%\n\n")
# Year-over-year analysis
cat("Yearly Breakdown:\n")
for(i in 1:length(year_int)) {
status <- ifelse(profitable_log[i], "PROFITABLE", "LOSS")
cat("Year ", year_int[i], ": $", format(revenue_num[i], big.mark=","),
" - ", status, "\n")
}
# Data type verification
cat("\nData Types After Conversion:\n")
cat("Revenue: ", class(revenue_num), "\n")
cat("Year: ", class(year_int), "\n")
cat("Profitable: ", class(profitable_log), "\n")
Data Cleaning and Type Conversion
==================================

Financial Analysis (After Type Conversion)
------------------------------------------
Total Revenue: $ 595,000
Average Revenue: $ 119,000
Profitable Years: 3 out of 5
Profitability Rate: 60 %

Yearly Breakdown:
Year 2020 : $ 125,000 - PROFITABLE
Year 2021 : $ 87,500 - LOSS
Year 2022 : $ 156,000 - PROFITABLE
Year 2023 : $ 92,000 - PROFITABLE
Year 2024 : $ 134,500 - LOSS

Data Types After Conversion:
Revenue: numeric
Year: integer
Profitable: logical
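Imported values are not always as clean as in the example above. The following sketch, built on hypothetical inputs with currency symbols, thousands separators, and Yes/No text, shows one possible way to clean before converting and to detect entries that fail to convert.

```r
# Hypothetical raw import: currency symbol, thousands separators, a bad entry
revenue_raw <- c("$125,000", "87500", "n/a", "1,560.75")

# Strip everything except digits, the decimal point and a minus sign
revenue_clean <- gsub("[^0-9.-]", "", revenue_raw)

# Entries that cannot be parsed become NA; any coercion warning is silenced
# because the NAs are checked explicitly on the next lines
revenue_num <- suppressWarnings(as.numeric(revenue_clean))

failed <- is.na(revenue_num)
cat("Unparseable entries:", revenue_raw[failed], "\n")
cat("Total of valid entries:", sum(revenue_num, na.rm = TRUE), "\n")

# as.logical() understands "TRUE"/"FALSE" and "true"/"false"; other text gives NA
status_raw <- c("TRUE", "false", "Yes", "No")
status_lgl <- as.logical(status_raw)

# Map the remaining Yes/No entries explicitly
status_fixed <- ifelse(is.na(status_lgl), status_raw == "Yes", status_lgl)
print(status_fixed)
```

Checking `is.na()` right after a conversion makes failed entries visible instead of letting them silently distort later totals.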
10. Checking and Verifying Data Types
Essential Functions for Type Checking
R provides several functions to check data types. Using these functions is crucial for debugging and ensuring data quality.
# Create variables of different types
my_numeric <- 45.6
my_integer <- 100L
my_character <- "Business Analytics"
my_logical <- FALSE
my_complex <- 3 + 2i
cat("Data Type Verification Functions\n")
cat("=================================\n\n")
# Using class() function
cat("Using class() function:\n")
cat("my_numeric: ", class(my_numeric), "\n")
cat("my_integer: ", class(my_integer), "\n")
cat("my_character: ", class(my_character), "\n")
cat("my_logical: ", class(my_logical), "\n")
cat("my_complex: ", class(my_complex), "\n\n")
# Using typeof() function (more specific)
cat("Using typeof() function:\n")
cat("my_numeric: ", typeof(my_numeric), "\n")
cat("my_integer: ", typeof(my_integer), "\n")
cat("my_character: ", typeof(my_character), "\n")
cat("my_logical: ", typeof(my_logical), "\n")
cat("my_complex: ", typeof(my_complex), "\n\n")
# Using is.* functions (returns TRUE/FALSE)
cat("Using is.* verification functions:\n")
cat("is.numeric(my_numeric): ", is.numeric(my_numeric), "\n")
cat("is.integer(my_integer): ", is.integer(my_integer), "\n")
cat("is.character(my_character): ", is.character(my_character), "\n")
cat("is.logical(my_logical): ", is.logical(my_logical), "\n")
cat("is.complex(my_complex): ", is.complex(my_complex), "\n\n")
# Checking multiple conditions
cat("Advanced Checking:\n")
cat("Is my_integer numeric? ", is.numeric(my_integer), "\n")
cat("Is my_integer specifically integer? ", is.integer(my_integer), "\n")
cat("Is '123' numeric? ", is.numeric("123"), "\n")
cat("Is '123' character? ", is.character("123"), "\n")
Data Type Verification Functions
=================================

Using class() function:
my_numeric: numeric
my_integer: integer
my_character: character
my_logical: logical
my_complex: complex

Using typeof() function:
my_numeric: double
my_integer: integer
my_character: character
my_logical: logical
my_complex: complex

Using is.* verification functions:
is.numeric(my_numeric): TRUE
is.integer(my_integer): TRUE
is.character(my_character): TRUE
is.logical(my_logical): TRUE
is.complex(my_complex): TRUE

Advanced Checking:
Is my_integer numeric? TRUE
Is my_integer specifically integer? TRUE
Is '123' numeric? FALSE
Is '123' character? TRUE
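For plain atomic values, class() and typeof() differ only for doubles, but the gap widens for objects built on top of the atomic types. As an illustration, the sketch below inspects a factor and a Date, two base R classes not otherwise covered in this section; the variable names are illustrative.

```r
# A factor: categorical data stored internally as integer codes
dept <- factor(c("Sales", "Finance", "Sales"))

# A Date: stored internally as a number of days since 1970-01-01
order_date <- as.Date("2024-01-15")

cat("factor -> class:", class(dept), "| typeof:", typeof(dept), "\n")
cat("Date   -> class:", class(order_date), "| typeof:", typeof(order_date), "\n")

# is.numeric() looks at what the object represents, not only how it is stored
cat("is.numeric(dept):", is.numeric(dept), "\n")
```

As a rule of thumb, class() describes how R will treat an object, while typeof() describes how it is stored.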
# Real-world data validation scenario
# Function to validate imported data
validate_data <- function(data_value, expected_type, variable_name) {
cat("Validating: ", variable_name, "\n")
cat("Value: ", data_value, "\n")
cat("Expected Type: ", expected_type, "\n")
is_valid <- switch(expected_type,
"numeric" = is.numeric(data_value),
"integer" = is.integer(data_value),
"character" = is.character(data_value),
"logical" = is.logical(data_value),
FALSE)
actual_type <- class(data_value)
cat("Actual Type: ", actual_type, "\n")
if(is_valid) {
cat("✓ VALIDATION PASSED\n")
} else {
cat("✗ VALIDATION FAILED - Type Mismatch!\n")
}
cat("---\n")
invisible(is_valid)  # return the result without auto-printing it at the console
}
# Test the validation function
cat("Data Import Validation Report\n")
cat("==============================\n\n")
customer_id <- 12345L
customer_name <- "John Smith"
purchase_amount <- 1250.50
is_premium <- TRUE
invalid_data <- "not_a_number"
validate_data(customer_id, "integer", "Customer ID")
validate_data(customer_name, "character", "Customer Name")
validate_data(purchase_amount, "numeric", "Purchase Amount")
validate_data(is_premium, "logical", "Premium Status")
validate_data(invalid_data, "numeric", "Invalid Numeric Field")
Data Import Validation Report
==============================

Validating: Customer ID
Value: 12345
Expected Type: integer
Actual Type: integer
✓ VALIDATION PASSED
---
Validating: Customer Name
Value: John Smith
Expected Type: character
Actual Type: character
✓ VALIDATION PASSED
---
Validating: Purchase Amount
Value: 1250.5
Expected Type: numeric
Actual Type: numeric
✓ VALIDATION PASSED
---
Validating: Premium Status
Value: TRUE
Expected Type: logical
Actual Type: logical
✓ VALIDATION PASSED
---
Validating: Invalid Numeric Field
Value: not_a_number
Expected Type: numeric
Actual Type: character
✗ VALIDATION FAILED - Type Mismatch!
---
11. Decision Guide: Choosing the Right Data Type
| Scenario | Recommended Data Type | Reason | Example |
|---|---|---|---|
| Stock prices, revenue | Numeric | Need decimal precision | 125.75, 1500.50 |
| Number of customers, products | Integer | Whole numbers, memory efficient | 1500L, 2500L |
| Customer names, IDs | Character | Text data, identifiers | "John", "CUST001" |
| Yes/No, Pass/Fail | Logical | Binary outcomes, filtering | TRUE, FALSE |
| Age in years | Integer | Discrete whole numbers | 25L, 45L |
| Percentages, ratios | Numeric | Fractional values | 0.15, 1.25 |
| Product categories | Character | Categorical text | "Electronics" |
| Date as text | Character | Then convert to Date type | "2024-01-15" |
| Survey responses (Yes/No) | Logical | Binary responses | TRUE, FALSE |
| Ratings (1-5) | Integer | Discrete values | 1L, 2L, 3L, 4L, 5L |
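The "Date as text" row in the table recommends converting to the Date type. A minimal sketch of that step with base R's `as.Date()` could look as follows; the sample dates and variable names are made up for illustration.

```r
# ISO-formatted text (year-month-day) converts without a format string
date_text <- c("2024-01-15", "2024-03-01")
dates <- as.Date(date_text)

# Dates support arithmetic; subtracting gives a time difference in days
days_between <- as.numeric(dates[2] - dates[1])

# Other layouts need an explicit format string
us_date <- as.Date("01/15/2024", format = "%m/%d/%Y")

cat("Class after conversion:", class(dates), "\n")
cat("Days between the two dates:", days_between, "\n")
cat("Year of first date:", format(dates[1], "%Y"), "\n")
cat("US-style text parsed to the same date:", identical(us_date, dates[1]), "\n")
```

Keeping dates as character would make the subtraction above impossible, which is why the conversion is worth doing immediately after import.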
12. Common Data Type Mistakes and How to Avoid Them
❌ Common Mistake #1: Forgetting the "L" suffix for integers
Wrong: count <- 100 (creates numeric)
Right: count <- 100L (creates integer)
❌ Common Mistake #2: Performing calculations on character data
Problem: sum(c("100", "200")) will cause an error
Solution: sum(as.numeric(c("100", "200")))
❌ Common Mistake #3: Mixing data types in comparisons
Confusing: "100" == 100 returns TRUE because R converts the number to the text "100" and compares strings, so "100.0" == 100 is FALSE
Better: Explicitly convert: as.numeric("100") == 100
❌ Common Mistake #4: Not checking data types after import
Problem: CSV imports often make numbers into characters
Solution: Always use str() or class() after importing
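To illustrate Mistake #4, the sketch below reads a small made-up CSV (supplied inline through the `text` argument of `read.csv()`) twice: once with default type guessing and once with column types stated via `colClasses`. The column names and values are hypothetical.

```r
# Hypothetical CSV content; customer IDs carry leading zeros
csv_text <- "customer_id,revenue,year\n00123,125000.50,2023\n00456,87500.75,2024"

# Default import: R guesses the types, so the IDs become integers
default_import <- read.csv(text = csv_text)

# Explicit import: declare the intended type of every column
typed_import <- read.csv(
  text = csv_text,
  colClasses = c(customer_id = "character", revenue = "numeric", year = "integer")
)

# Inspect the result right after importing
str(default_import)
str(typed_import)

default_types <- vapply(default_import, class, character(1))
typed_types   <- vapply(typed_import, class, character(1))
print(default_types)
print(typed_types)
```

In the default import the leading zeros of the IDs are lost, whereas the typed import keeps them as text.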
# Simulating common data type errors
cat("Common Data Type Errors and Solutions\n")
cat("======================================\n\n")
# Error 1: Operating on character numbers
sales_data <- c("1000", "1500", "2000")
cat("ERROR SCENARIO 1: Character numbers\n")
cat("Data: ", paste(sales_data, collapse=", "), "\n")
cat("Type: ", class(sales_data), "\n")
# Attempting: total <- sum(sales_data) # This would error!
cat("Problem: Cannot sum character data\n")
cat("Solution: Convert first\n")
sales_numeric <- as.numeric(sales_data)
total <- sum(sales_numeric)
cat("Result after conversion: ", total, "\n\n")
# Error 2: Type mismatch in filtering
ages <- c(25L, 30L, 35L, 40L, 45L)
threshold <- "35" # Accidentally a character!
cat("ERROR SCENARIO 2: Type mismatch in comparison\n")
cat("Ages: ", paste(ages, collapse=", "), "\n")
cat("Threshold: ", threshold, " (", class(threshold), ")\n")
cat("Comparison result: ages > threshold\n")
result <- ages > threshold
cat("Result: ", paste(result, collapse=", "), "\n")
cat("Warning: R coerced types, but results may be unexpected!\n")
cat("Solution: Ensure matching types\n")
threshold_correct <- 35L
result_correct <- ages > threshold_correct
cat("Correct result: ", paste(result_correct, collapse=", "), "\n\n")
# Error 3: Forgotten L suffix
cat("ERROR SCENARIO 3: Integer vs Numeric\n")
count1 <- 100
count2 <- 100L
cat("count1 = 100 -> Type: ", typeof(count1), "\n")
cat("count2 = 100L -> Type: ", typeof(count2), "\n")
cat("For counting operations, use integer for memory efficiency\n")
Common Data Type Errors and Solutions
======================================

ERROR SCENARIO 1: Character numbers
Data: 1000, 1500, 2000
Type: character
Problem: Cannot sum character data
Solution: Convert first
Result after conversion: 4500

ERROR SCENARIO 2: Type mismatch in comparison
Ages: 25, 30, 35, 40, 45
Threshold: 35 ( character )
Comparison result: ages > threshold
Result: FALSE, FALSE, FALSE, TRUE, TRUE
Warning: R coerced types, but results may be unexpected!
Solution: Ensure matching types
Correct result: FALSE, FALSE, FALSE, TRUE, TRUE

ERROR SCENARIO 3: Integer vs Numeric
count1 = 100 -> Type: double
count2 = 100L -> Type: integer
For counting operations, use integer for memory efficiency
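In Error Scenario 2 the coerced comparison happens to give the same answer as the numeric one, because all ages have two digits. The sketch below uses hypothetical values with differing digit counts to show how the text comparison can go wrong.

```r
order_counts  <- c(9L, 40L, 100L)   # hypothetical values
threshold_chr <- "35"               # threshold accidentally stored as text

# R converts the numbers to "9", "40", "100" and compares them as text
as_text <- order_counts > threshold_chr

# Converting the threshold first gives the intended numeric comparison
as_number <- order_counts > as.integer(threshold_chr)

cat("Text comparison:   ", as_text, "\n")
cat("Numeric comparison:", as_number, "\n")
```

Text is compared character by character, so "9" sorts after "35" and "100" sorts before it, which is the reverse of the numeric order.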
13. Practical Exercises
📝 Exercise 1: Customer Database Analysis
Objective: Create a customer database and perform type-specific operations
Task:
- Create vectors for:
- Customer IDs (integer): 1001L to 1005L
- Customer names (character): Any 5 names
- Purchase amounts (numeric): 5 decimal values between $100-$1000
- Premium membership status (logical): 3 TRUE, 2 FALSE
- Calculate:
- Total purchases
- Average purchase amount
- Number of premium customers
- Percentage of premium customers
- Verify data types using class() for each vector
- Create a report displaying all customers with their details
Expected Output Format:
Customer Database Report
========================
ID: 1001 | Name: [Name] | Purchase: $XXX.XX | Premium: TRUE/FALSE
...
Summary Statistics:
Total Purchases: $XXXX.XX
Average Purchase: $XXX.XX
Premium Customers: X out of 5 (XX%)
📝 Exercise 2: Product Inventory Management
Objective: Practice integer operations and logical filtering
Task:
- Create inventory data:
- Product IDs (character): "PROD001" to "PROD010"
- Stock quantities (integer): Random values between 10L and 500L
- Reorder threshold: 100L
- Identify products that need reordering (stock < threshold)
- Calculate:
- Total inventory count
- Number of products needing reorder
- Percentage of stock below threshold
- Display a report showing which products need reordering
Hint: Use logical vectors for filtering
📝 Exercise 3: Sales Data Type Conversion
Objective: Practice converting data types (simulating CSV import)
Task:
- Create "imported" data as character vectors:
- sales_char <- c("15000", "22000", "18500", "31000", "27500")
- year_char <- c("2020", "2021", "2022", "2023", "2024")
- profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
- Convert to appropriate data types
- Perform analysis:
- Calculate total and average sales
- Count profitable years
- Calculate year-over-year growth rate
- Verify all conversions using class()
Bonus: Create a function that automates the conversion process
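Year-over-year growth is the change from the previous year divided by the previous year's value. Here is a minimal sketch of that formula with made-up figures. The `sales` vector and the `growth_pct` name are assumptions for illustration, not the exercise data.

```r
# Hypothetical yearly sales figures, used only to illustrate the formula
sales <- c(100, 110, 121)

# diff() gives each year's change; head(sales, -1) drops the last value,
# leaving the previous-year figure that each change is measured against
growth_pct <- diff(sales) / head(sales, -1) * 100

print(round(growth_pct, 1))
```

The result has one element fewer than the input, because the first year has no earlier year to compare against.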
📝 Exercise 4: Employee Performance Evaluation
Objective: Combine multiple data types in a real-world scenario
Task:
- Create employee data:
- Employee IDs (character): "EMP001" to "EMP008"
- Names (character): Any 8 names
- Ages (integer): Between 25L and 60L
- Sales figures (numeric): Decimal values
- Met target (logical): Based on sales > $50,000
- Create analysis:
- Categorize employees by age group (under 30, 30-45, over 45)
- Calculate average sales by age group
- Identify top 3 performers
- Calculate percentage who met target
- Generate a comprehensive performance report
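One possible way to group by age and average within each group is sketched below on a small made-up dataset. The four employees, the band labels, and the choice to put age 45 in the middle band are all assumptions, so adapt them to your own data.

```r
# Hypothetical data for illustration only
ages  <- c(28L, 30L, 45L, 52L)
sales <- c(40000, 55000, 65000, 48000)

# Nested ifelse() assigns each age to exactly one non-overlapping band
age_group <- ifelse(ages < 30L, "Under 30",
                    ifelse(ages <= 45L, "30-45", "Over 45"))

# tapply() applies mean() to the sales values within each group
avg_sales_by_group <- tapply(sales, age_group, mean)

print(age_group)
print(avg_sales_by_group)
```

Note that `age_group` is a character vector built from an integer vector, while the averages stay numeric.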
📝 Exercise 5: Data Validation System
Objective: Build a comprehensive data validation system
Task:
- Create a mixed dataset with intentional errors:
- customer_ids: Mix of numeric and character (some invalid)
- ages: Include some negative values and character entries
- revenues: Include some non-numeric text
- status: Mix of TRUE/FALSE and "Yes"/"No" strings
- Write validation functions to:
- Check data type of each variable
- Identify invalid entries
- Count errors by type
- Suggest corrections
- Generate a validation report showing:
- Expected vs actual data types
- List of invalid entries
- Total error count
- Corrected dataset
Advanced Challenge: Create an automated cleaning function that fixes common data type errors
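As a possible starting point for the validation functions, the sketch below finds entries that cannot be converted to numbers. The helper name `find_non_numeric` and the sample input are hypothetical, and a full solution would need similar checks for the other columns.

```r
# Hypothetical helper: reports which entries of a character vector
# cannot be converted to numbers
find_non_numeric <- function(x) {
  # as.numeric() returns NA (with a warning) for text it cannot parse;
  # the warning is suppressed because the NAs are inspected explicitly
  converted <- suppressWarnings(as.numeric(x))
  # Entries that were present before conversion but NA afterwards failed
  which(is.na(converted) & !is.na(x))
}

revenues_raw <- c("1200.50", "N/A", "980", "unknown")  # made-up input
bad_positions <- find_non_numeric(revenues_raw)

print(bad_positions)
print(revenues_raw[bad_positions])
```

Returning positions rather than a cleaned vector keeps the original data intact, so the report can list the invalid entries before anything is corrected.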
14. Solution to Exercise 1
# Exercise 1 Solution: Customer Database Analysis
# Step 1: Create vectors with appropriate data types
customer_ids <- c(1001L, 1002L, 1003L, 1004L, 1005L)
customer_names <- c("Alice Johnson", "Bob Smith", "Carol Williams",
"David Brown", "Emma Davis")
purchase_amounts <- c(450.75, 892.50, 325.99, 1150.00, 678.25)
premium_status <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
# Step 2: Verify data types
cat("Data Type Verification\n")
cat("======================\n")
cat("Customer IDs: ", class(customer_ids), "\n")
cat("Customer Names: ", class(customer_names), "\n")
cat("Purchase Amounts: ", class(purchase_amounts), "\n")
cat("Premium Status: ", class(premium_status), "\n\n")
# Step 3: Calculate statistics
total_purchases <- sum(purchase_amounts)
average_purchase <- mean(purchase_amounts)
num_premium <- sum(premium_status)
premium_percentage <- (num_premium / length(premium_status)) * 100
# Step 4: Generate comprehensive report
cat("Customer Database Report\n")
cat("========================\n\n")
for(i in seq_along(customer_ids)) {
  premium_label <- ifelse(premium_status[i], "✓ Premium", "Standard")
  cat(sprintf("ID: %d | Name: %-20s | Purchase: $%8.2f | Status: %s\n",
              customer_ids[i], customer_names[i],
              purchase_amounts[i], premium_label))
}
cat("\n")
cat(rep("=", 71), "\n", sep="")
cat("Summary Statistics\n")
cat(rep("=", 71), "\n", sep="")
cat(sprintf("Total Purchases: $%10.2f\n", total_purchases))
cat(sprintf("Average Purchase: $%10.2f\n", average_purchase))
cat(sprintf("Premium Customers: %d out of %d (%.1f%%)\n",
num_premium, length(customer_ids), premium_percentage))
cat(sprintf("Standard Customers: %d out of %d (%.1f%%)\n",
length(customer_ids) - num_premium, length(customer_ids),
100 - premium_percentage))
# Step 5: Additional analysis - Premium vs Standard spending
premium_purchases <- purchase_amounts[premium_status]
standard_purchases <- purchase_amounts[!premium_status]
cat("\nDetailed Analysis\n")
cat(rep("-", 71), "\n", sep="")
cat(sprintf("Avg Premium Purchase: $%10.2f\n", mean(premium_purchases)))
cat(sprintf("Avg Standard Purchase: $%10.2f\n", mean(standard_purchases)))
cat(sprintf("Highest Purchase: $%10.2f (%s)\n",
max(purchase_amounts),
customer_names[which.max(purchase_amounts)]))
cat(sprintf("Lowest Purchase: $%10.2f (%s)\n",
min(purchase_amounts),
customer_names[which.min(purchase_amounts)]))
Data Type Verification
======================
Customer IDs: integer
Customer Names: character
Purchase Amounts: numeric
Premium Status: logical

Customer Database Report
========================

ID: 1001 | Name: Alice Johnson        | Purchase: $  450.75 | Status: ✓ Premium
ID: 1002 | Name: Bob Smith            | Purchase: $  892.50 | Status: ✓ Premium
ID: 1003 | Name: Carol Williams       | Purchase: $  325.99 | Status: Standard
ID: 1004 | Name: David Brown          | Purchase: $ 1150.00 | Status: ✓ Premium
ID: 1005 | Name: Emma Davis           | Purchase: $  678.25 | Status: Standard

=======================================================================
Summary Statistics
=======================================================================
Total Purchases: $   3497.49
Average Purchase: $    699.50
Premium Customers: 3 out of 5 (60.0%)
Standard Customers: 2 out of 5 (40.0%)

Detailed Analysis
-----------------------------------------------------------------------
Avg Premium Purchase: $    831.08
Avg Standard Purchase: $    502.12
Highest Purchase: $   1150.00 (David Brown)
Lowest Purchase: $    325.99 (Carol Williams)
15. Best Practices Summary
🎯 Golden Rules for Using Data Types in R
| Best Practice | Why It Matters | How to Implement |
|---|---|---|
| 1. Always verify data types after import | CSV/Excel imports often misinterpret types | Use str(), class(), or typeof() |
| 2. Use integer for counting | More memory efficient for large datasets | Add "L" suffix: count <- 100L |
| 3. Convert before calculation | Prevents errors and unexpected results | as.numeric(), as.integer(), etc. |
| 4. Use logical for filtering | Clear, efficient, and readable | data[data$value > 100, ] |
| 5. Keep consistent types in vectors | R silently converts mixed types to the most general one (e.g. numbers become text) | Store different types in separate vectors |
| 6. Document expected types | Makes code maintainable | Add comments explaining data type choices |
| 7. Validate external data | User input and imports can be unreliable | Create validation functions |
| 8. Use character for IDs | IDs shouldn't be used in calculations | customer_id <- "CUST001" |
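Rules 2 and 5 can be checked directly in the console. This small sketch uses only base R, and the variable names are illustrative.

```r
# Rule 5: combining types in one vector converts everything to the most
# general type present (logical -> integer -> double -> character)
mixed <- c(1L, 2.5, "three")
num_and_lgl <- c(10, TRUE)

print(typeof(mixed))        # every element is now text
print(typeof(num_and_lgl))  # TRUE has become the number 1

# Rule 2: an integer vector needs roughly half the memory of a double
# vector, because each integer takes 4 bytes and each double 8 bytes
n <- 1e6
size_integer <- as.numeric(utils::object.size(integer(n)))
size_double  <- as.numeric(utils::object.size(numeric(n)))

print(size_integer)
print(size_double)
```

The memory difference only becomes noticeable for large vectors. For a handful of values the choice is more about documenting intent than saving space.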
16. Quick Reference Card
📋 Data Type Quick Reference
| Function | Purpose | Example | Returns |
|---|---|---|---|
| class(x) | Get object class | class(100L) | "integer" |
| typeof(x) | Get internal type | typeof(100L) | "integer" |
| is.numeric(x) | Check if numeric | is.numeric(45.6) | TRUE |
| is.integer(x) | Check if integer | is.integer(100L) | TRUE |
| is.character(x) | Check if character | is.character("text") | TRUE |
| is.logical(x) | Check if logical | is.logical(TRUE) | TRUE |
| as.numeric(x) | Convert to numeric | as.numeric("123") | 123 |
| as.integer(x) | Convert to integer | as.integer(45.7) | 45L |
| as.character(x) | Convert to character | as.character(100) | "100" |
| as.logical(x) | Convert to logical | as.logical(1) | TRUE |
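A few of these functions behave in ways the table cannot show in one cell. The sketch below, using illustrative values only, covers the cases that most often surprise newcomers.

```r
# class() reports how R treats an object; typeof() reports how it is stored
x <- 45.6
print(class(x))    # "numeric"
print(typeof(x))   # "double"

# as.integer() drops the fractional part instead of rounding
print(as.integer(45.7))   # 45
print(round(45.7))        # 46

# Conversions that fail return NA rather than stopping the script
print(as.logical("TRUE"))                      # TRUE
print(as.logical("yes"))                       # NA
print(suppressWarnings(as.numeric("12abc")))   # NA
```

Because failed conversions produce NA instead of an error, it is worth checking for NA with is.na() after converting imported data.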
17. Real-World Case Study: E-Commerce Analytics
# Real-world scenario: E-commerce platform data analysis
# Demonstrating appropriate use of all data types
# ============= DATA CREATION =============
# Character: Product information
product_ids <- c("ELEC001", "FASH002", "HOME003", "ELEC004", "FASH005",
"HOME006", "ELEC007", "FASH008")
product_names <- c("Laptop", "T-Shirt", "Coffee Maker", "Smartphone",
"Jeans", "Blender", "Tablet", "Dress")
categories <- c("Electronics", "Fashion", "Home", "Electronics",
"Fashion", "Home", "Electronics", "Fashion")
# Numeric: Pricing and financial data
prices <- c(899.99, 29.99, 79.50, 699.99, 59.99, 49.99, 399.99, 89.99)
revenue <- c(17999.80, 1499.50, 2385.00, 27999.60, 2999.50,
1999.60, 11999.70, 3599.60)
# Integer: Quantities and counts
units_sold <- c(20L, 50L, 30L, 40L, 50L, 40L, 30L, 40L)
stock_remaining <- c(15L, 200L, 45L, 25L, 180L, 60L, 20L, 150L)
reorder_point <- 30L
# Logical: Business rules and status
in_stock <- stock_remaining > 0
needs_reorder <- stock_remaining < reorder_point
premium_products <- prices > 500
high_performer <- units_sold > 35L
# ============= ANALYSIS =============
cat("E-COMMERCE ANALYTICS DASHBOARD\n")
cat(rep("=", 80), "\n\n", sep="")
# Section 1: Inventory Management (Integer operations)
cat("1. INVENTORY STATUS\n")
cat(rep("-", 80), "\n", sep="")
total_units_sold <- sum(units_sold)
total_stock <- sum(stock_remaining)
items_needing_reorder <- sum(needs_reorder)
cat(sprintf("Total Units Sold: %d units\n", total_units_sold))
cat(sprintf("Total Stock Remaining: %d units\n", total_stock))
cat(sprintf("Items Needing Reorder: %d out of %d products\n\n",
items_needing_reorder, length(product_ids)))
# Section 2: Financial Analysis (Numeric operations)
cat("2. FINANCIAL PERFORMANCE\n")
cat(rep("-", 80), "\n", sep="")
total_revenue <- sum(revenue)
average_price <- mean(prices)
highest_revenue_product <- which.max(revenue)
cat(sprintf("Total Revenue: $%12.2f\n", total_revenue))
cat(sprintf("Average Product Price: $%12.2f\n", average_price))
cat(sprintf("Top Revenue Product: %s ($%.2f)\n\n",
product_names[highest_revenue_product],
revenue[highest_revenue_product]))
# Section 3: Category Performance (Character grouping)
cat("3. CATEGORY BREAKDOWN\n")
cat(rep("-", 80), "\n", sep="")
unique_categories <- unique(categories)
for(cat_name in unique_categories) {
  cat_mask <- categories == cat_name
  cat_revenue <- sum(revenue[cat_mask])
  cat_units <- sum(units_sold[cat_mask])
  cat_products <- sum(cat_mask)
  cat(sprintf("%-15s: %d products | %d units sold | $%10.2f revenue\n",
              cat_name, cat_products, cat_units, cat_revenue))
}
cat("\n")
# Section 4: Product Performance Matrix (Logical filtering)
cat("4. PRODUCT PERFORMANCE MATRIX\n")
cat(rep("-", 80), "\n", sep="")
cat(sprintf("%-12s %-20s %10s %8s %12s %10s\n",
"ID", "Product", "Price", "Sold", "Revenue", "Status"))
cat(rep("-", 80), "\n", sep="")
for(i in seq_along(product_ids)) {
  # Determine status using logical operations
  if(premium_products[i] && high_performer[i]) {
    status <- "⭐ Premium"
  } else if(high_performer[i]) {
    status <- "✓ Strong"
  } else if(needs_reorder[i]) {
    status <- "⚠ Reorder"
  } else {
    status <- "○ Normal"
  }
  cat(sprintf("%-12s %-20s $%9.2f %7dL $%11.2f %-10s\n",
              product_ids[i], product_names[i], prices[i],
              units_sold[i], revenue[i], status))
}
cat("\n")
# Section 5: Key Metrics Summary (Mixed data types)
cat("5. KEY BUSINESS METRICS\n")
cat(rep("-", 80), "\n", sep="")
premium_count <- sum(premium_products)
high_performer_count <- sum(high_performer)
avg_units_per_product <- mean(units_sold)
stock_coverage <- total_stock / total_units_sold
cat(sprintf("Premium Products (>$500): %d (%.1f%%)\n",
premium_count, (premium_count/length(product_ids))*100))
cat(sprintf("High Performers (>35 units): %d (%.1f%%)\n",
high_performer_count, (high_performer_count/length(product_ids))*100))
cat(sprintf("Average Units per Product: %.1f units\n", avg_units_per_product))
cat(sprintf("Stock Coverage Ratio: %.2fx\n", stock_coverage))
cat("\n")
# Section 6: Recommendations (Logical decision-making)
cat("6. AUTOMATED RECOMMENDATIONS\n")
cat(rep("-", 80), "\n", sep="")
urgent_reorders <- product_names[needs_reorder & in_stock]
if(length(urgent_reorders) > 0) {
  cat("⚠ URGENT: Reorder these products:\n")
  for(prod in urgent_reorders) {
    cat(sprintf(" - %s\n", prod))
  }
  cat("\n")
}
focus_products <- product_names[premium_products & high_performer]
if(length(focus_products) > 0) {
  cat("⭐ FOCUS: Increase marketing for top performers:\n")
  for(prod in focus_products) {
    cat(sprintf(" - %s\n", prod))
  }
  cat("\n")
}
underperformers <- product_names[!high_performer & !premium_products]
if(length(underperformers) > 0) {
  cat("📊 REVIEW: Consider promotions for:\n")
  for(prod in underperformers) {
    cat(sprintf(" - %s\n", prod))
  }
}
# Data Type Summary
cat("\n")
cat(rep("=", 80), "\n", sep="")
cat("DATA TYPE VERIFICATION\n")
cat(rep("=", 80), "\n", sep="")
cat(sprintf("Product IDs: %s (identifiers)\n", class(product_ids)))
cat(sprintf("Prices: %s (financial precision)\n", class(prices)))
cat(sprintf("Units Sold: %s (counting)\n", class(units_sold)))
cat(sprintf("Needs Reorder: %s (business rules)\n", class(needs_reorder)))
cat(sprintf("Categories: %s (categorical data)\n", class(categories)))
E-COMMERCE ANALYTICS DASHBOARD
================================================================================

1. INVENTORY STATUS
--------------------------------------------------------------------------------
Total Units Sold: 300 units
Total Stock Remaining: 695 units
Items Needing Reorder: 3 out of 8 products

2. FINANCIAL PERFORMANCE
--------------------------------------------------------------------------------
Total Revenue: $    70482.30
Average Product Price: $      288.68
Top Revenue Product: Smartphone ($27999.60)

3. CATEGORY BREAKDOWN
--------------------------------------------------------------------------------
Electronics    : 3 products | 90 units sold | $  57999.10 revenue
Fashion        : 3 products | 140 units sold | $   8098.60 revenue
Home           : 2 products | 70 units sold | $   4384.60 revenue

4. PRODUCT PERFORMANCE MATRIX
--------------------------------------------------------------------------------
ID           Product                   Price     Sold      Revenue     Status
--------------------------------------------------------------------------------
ELEC001      Laptop               $   899.99      20L $   17999.80 ⚠ Reorder
FASH002      T-Shirt              $    29.99      50L $    1499.50 ✓ Strong
HOME003      Coffee Maker         $    79.50      30L $    2385.00 ○ Normal
ELEC004      Smartphone           $   699.99      40L $   27999.60 ⭐ Premium
FASH005      Jeans                $    59.99      50L $    2999.50 ✓ Strong
HOME006      Blender              $    49.99      40L $    1999.60 ✓ Strong
ELEC007      Tablet               $   399.99      30L $   11999.70 ⚠ Reorder
FASH008      Dress                $    89.99      40L $    3599.60 ✓ Strong

5. KEY BUSINESS METRICS
--------------------------------------------------------------------------------
Premium Products (>$500): 2 (25.0%)
High Performers (>35 units): 5 (62.5%)
Average Units per Product: 37.5 units
Stock Coverage Ratio: 2.32x

6. AUTOMATED RECOMMENDATIONS
--------------------------------------------------------------------------------
⚠ URGENT: Reorder these products:
 - Laptop
 - Smartphone
 - Tablet

⭐ FOCUS: Increase marketing for top performers:
 - Smartphone

📊 REVIEW: Consider promotions for:
 - Coffee Maker
 - Tablet

================================================================================
DATA TYPE VERIFICATION
================================================================================
Product IDs: character (identifiers)
Prices: numeric (financial precision)
Units Sold: integer (counting)
Needs Reorder: logical (business rules)
Categories: character (categorical data)
- Character types preserve product identifiers and categories
- Numeric types handle precise financial calculations
- Integer types efficiently count items
- Logical types enable clear business rule evaluation
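Because revenue is stored separately from prices and units sold, the three vectors can drift out of sync. The sketch below is a minimal consistency check. It assumes each revenue value equals price times units sold, which the figures suggest but the case study does not state.

```r
# Vectors copied from the case study so the block runs on its own
prices <- c(899.99, 29.99, 79.50, 699.99, 59.99, 49.99, 399.99, 89.99)
units_sold <- c(20L, 50L, 30L, 40L, 50L, 40L, 30L, 40L)
revenue <- c(17999.80, 1499.50, 2385.00, 27999.60, 2999.50,
             1999.60, 11999.70, 3599.60)

# Multiplying double by integer gives double, so no conversion is needed
expected_revenue <- prices * units_sold

# all.equal() compares with a small tolerance, which avoids false alarms
# from floating-point rounding that an exact == comparison could raise
revenue_is_consistent <- isTRUE(all.equal(revenue, expected_revenue))

print(typeof(expected_revenue))
print(revenue_is_consistent)
```

A check like this is one way to apply the "validate external data" rule when related columns arrive from different sources.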
18. Summary and Key Takeaways
🎯 Chapter Summary: Data Types in R
What We Learned:
- Six Basic Data Types: Numeric, Integer, Character, Logical, Complex, and Raw
- Appropriate Usage: Each data type has specific use cases in business analytics
- Type Conversion: How to convert between types and when it's necessary
- Data Validation: Importance of checking and verifying data types
- Best Practices: Guidelines for choosing and using data types effectively
Critical Skills Acquired:
- Identifying the appropriate data type for different business scenarios
- Performing type-specific operations (calculations, comparisons, filtering)
- Converting between data types safely and correctly
- Validating data quality through type checking
- Debugging common data type errors
- Building efficient and maintainable R programs
💼 Business Applications Covered
- Financial analysis and calculations (Numeric)
- Customer database management (Character, Integer)
- Inventory tracking and management (Integer, Logical)
- Performance evaluation and filtering (Logical)
- Quality control systems (Logical, Numeric)
- E-commerce analytics (All types combined)
📚 Next Steps for Students
- Practice: Complete all five exercises to reinforce learning
- Experiment: Try modifying the code examples with your own data
- Real Data: Apply these concepts to actual business datasets
- Build Projects: Create your own analytics dashboards using appropriate data types
- Prepare: These fundamentals are essential for upcoming topics (data structures, data frames, statistical analysis)
⚠️ Common Pitfalls to Avoid
- Forgetting to check data types after importing data
- Performing calculations on character data without conversion
- Not using the "L" suffix when integers are needed
- Mixing data types inappropriately in comparisons
- Ignoring type coercion warnings from R
- Not validating external or user-provided data
🔑 Final Key Points
- Data types are foundational – Understanding them is crucial for all R programming
- Choose types intentionally – Each type serves specific purposes in business analytics
- Always validate – Check data types, especially after importing data
- Convert when necessary – Use explicit conversion functions to avoid errors
- Think about efficiency – Proper data types improve performance and memory usage
- Document your choices – Comment why you chose specific data types
🎓 Self-Assessment Questions
- What is the default numerical data type in R, and when should you use it?
- How do you explicitly create an integer in R? Why would you choose integer over numeric?
- What are three common business scenarios where logical data types are essential?
- Explain the difference between the class() and typeof() functions.
- What happens when you try to sum a vector of character numbers without conversion?
- Name three functions you can use to verify data types in R.
- When importing data from CSV, why is type checking critically important?
- How would you convert the string "TRUE" to an actual logical value?
- What is type coercion, and when does R perform it automatically?
- In a business context, should customer IDs be stored as integers or characters? Why?
Check your understanding by answering these questions and testing the concepts with actual R code!
