Appropriate uses of different data types in R

📊 Data Types in R: Appropriate Uses of Different Data Types

Learning Objective: Master the fundamental data types in R and understand when and how to use each type effectively in real-world data analysis scenarios.

1. Introduction to Data Types in R

Data types are the foundation of programming in R. They define what kind of data can be stored and what operations can be performed on that data. Understanding data types is crucial for:

  • Efficient memory management
  • Proper data manipulation and analysis
  • Avoiding errors in statistical computations
  • Writing optimized code for large datasets

R has six basic (atomic) data types, and choosing the right one significantly impacts your program’s performance and accuracy.

2. The Six Basic Data Types in R

| Data Type | Description | Example Values | Primary Use Case |
|-----------|-------------|----------------|------------------|
| Numeric (Double) | Decimal numbers (default) | 3.14, 100.5, -0.001 | Continuous measurements, calculations |
| Integer | Whole numbers | 1L, 100L, -50L | Counting, indexing, discrete data |
| Character | Text strings | "Hello", 'MBA', "2024" | Names, labels, categorical text |
| Logical | Boolean values | TRUE, FALSE | Conditions, filtering, binary outcomes |
| Complex | Complex numbers | 1+2i, 3-4i | Mathematical computations, signal processing |
| Raw | Raw bytes | as.raw(65) | Binary data, file operations |

3. Numeric Data Type (Double)

Definition and Characteristics

The numeric data type (technically “double” or double-precision floating-point) is the default numerical type in R. It can store decimal numbers with approximately 15-17 significant digits of precision.
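The precision limit mentioned above has a practical consequence worth seeing once. The following is a minimal sketch (variable names are illustrative, not from the lesson) showing that some decimal fractions are stored inexactly, so exact equality tests on doubles can surprise you:

```r
# Doubles are binary floating-point numbers, so 0.1 and 0.2 are stored inexactly.
a <- 0.1 + 0.2

exact_equal <- (a == 0.3)                  # exact comparison of two doubles
near_equal  <- isTRUE(all.equal(a, 0.3))   # comparison within a small tolerance

print(exact_equal)
print(near_equal)

# Showing 17 significant digits reveals the tiny representation error.
print(format(a, digits = 17))

# Above 2^53, consecutive whole numbers can no longer be told apart.
max_exact <- 2^53
print(max_exact + 1 == max_exact)
```

For comparisons in analysis code, all.equal() wrapped in isTRUE() (or rounding both sides first) is usually the safer choice than ==.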

✅ When to Use Numeric Data Type:

  • Financial Analysis: Stock prices, revenue, profit margins
  • Statistical Measurements: Mean, median, standard deviation
  • Scientific Calculations: Measurements, ratios, percentages
  • Business Metrics: ROI, conversion rates, growth rates
Example 1: Basic Numeric Operations
# Creating numeric variables
revenue <- 125000.50
expenses <- 87500.75
profit_margin <- 0.165

# Calculations
net_profit <- revenue - expenses
roi <- (net_profit / expenses) * 100

# Display results
cat("Revenue: $", revenue, "\n")
cat("Expenses: $", expenses, "\n")
cat("Net Profit: $", net_profit, "\n")
cat("ROI: ", roi, "%\n")

# Check data type
class(revenue)
typeof(revenue)
OUTPUT
Revenue: $ 125000.5 
Expenses: $ 87500.75 
Net Profit: $ 37499.75 
ROI:  42.85649 %
[1] "numeric"
[1] "double"
Example 2: Real-World Business Analytics
# Quarterly sales data
Q1_sales <- 250000.00
Q2_sales <- 275000.50
Q3_sales <- 310000.75
Q4_sales <- 340000.25

# Calculate annual total, quarterly average, and Q4-vs-Q1 growth
total_sales <- Q1_sales + Q2_sales + Q3_sales + Q4_sales
average_quarterly <- total_sales / 4
growth_rate <- ((Q4_sales - Q1_sales) / Q1_sales) * 100

cat("Total Annual Sales: $", format(total_sales, big.mark=","), "\n")
cat("Average Quarterly Sales: $", format(average_quarterly, big.mark=","), "\n")
cat("Q4 vs Q1 Growth: ", round(growth_rate, 2), "%\n")

# Statistical analysis
sales_vector <- c(Q1_sales, Q2_sales, Q3_sales, Q4_sales)
cat("\nStatistical Summary:\n")
cat("Mean: $", mean(sales_vector), "\n")
cat("Median: $", median(sales_vector), "\n")
cat("Std Dev: $", round(sd(sales_vector), 2), "\n")
OUTPUT
Total Annual Sales: $ 1,175,002 
Average Quarterly Sales: $ 293,750.4 
Q4 vs Q1 Growth:  36 %

Statistical Summary:
Mean: $ 293750.4 
Median: $ 292500.6 
Std Dev: $ 39449.46
💡 Pro Tip: Numeric types automatically handle decimal calculations. Use the round() function to control decimal places for presentation, e.g. round(value, 2) for two decimal places.

4. Integer Data Type

Definition and Characteristics

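Integers are whole numbers without decimal points. To write an integer literal in R, append L to the number (e.g., 100L); functions such as as.integer() and length() and the : operator also return integers. Each integer element takes 4 bytes versus 8 bytes for a double, so the saving shows up in long vectors rather than in single values (the two scalars in Example 3 report the same size). Integers are limited to about ±2.1 billion.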

✅ When to Use Integer Data Type:

  • Counting: Number of customers, products, employees
  • Indexing: Array positions, database IDs
  • Discrete Data: Age in years, number of transactions
  • Memory Optimization: Large datasets with whole numbers
Example 3: Customer and Product Counting
# Integer variables for business metrics
total_customers <- 15000L
new_customers_month <- 450L
products_sold <- 8500L
inventory_count <- 12500L

# Calculations
customer_growth <- new_customers_month
remaining_inventory <- inventory_count - products_sold

cat("Total Customers: ", total_customers, "\n")
cat("New Customers This Month: ", new_customers_month, "\n")
cat("Products Sold: ", products_sold, "\n")
cat("Remaining Inventory: ", remaining_inventory, "\n")

# Verify data type
cat("\nData Type Check:\n")
cat("Class: ", class(total_customers), "\n")
cat("Type: ", typeof(total_customers), "\n")

# Comparison: Integer vs Numeric memory
numeric_var <- 15000
integer_var <- 15000L
cat("\nMemory Comparison:\n")
cat("Numeric size: ", object.size(numeric_var), " bytes\n")
cat("Integer size: ", object.size(integer_var), " bytes\n")
OUTPUT
Total Customers:  15000 
New Customers This Month:  450 
Products Sold:  8500 
Remaining Inventory:  4000 

Data Type Check:
Class:  integer 
Type:  integer 

Memory Comparison:
Numeric size:  56  bytes
Integer size:  56  bytes
Example 4: Age Groups and Demographics
# Employee demographics (integers are perfect for age)
employee_ages <- c(25L, 32L, 45L, 28L, 55L, 38L, 42L, 29L)
num_employees <- length(employee_ages)

# Age analysis
youngest <- min(employee_ages)
oldest <- max(employee_ages)
median_age <- median(employee_ages)

# Age group categorization
under_30 <- sum(employee_ages < 30L)
age_30_to_40 <- sum(employee_ages >= 30L & employee_ages < 40L)
age_40_plus <- sum(employee_ages >= 40L)

cat("Employee Demographics Analysis\n")
cat("==============================\n")
cat("Total Employees: ", num_employees, "\n")
cat("Youngest Employee: ", youngest, " years\n")
cat("Oldest Employee: ", oldest, " years\n")
cat("Median Age: ", median_age, " years\n\n")
cat("Age Distribution:\n")
cat("Under 30: ", under_30, " employees\n")
cat("30-39: ", age_30_to_40, " employees\n")
cat("40+: ", age_40_plus, " employees\n")
OUTPUT
Employee Demographics Analysis
==============================
Total Employees:  8 
Youngest Employee:  25  years
Oldest Employee:  55  years
Median Age:  35  years

Age Distribution:
Under 30:  3  employees
30-39:  2  employees
40+:  3  employees
⚠️ Important: Without the "L" suffix, R treats numbers as numeric (double) by default. Always use "L" when you specifically need integers for memory efficiency or when working with functions that require integer inputs.
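To make the memory point concrete, here is a small sketch that compares long vectors instead of single values, and shows two integer behaviors that often surprise newcomers. The vector length of one million is an arbitrary choice for illustration, and exact byte counts depend on your platform:

```r
# Memory difference only becomes visible for long vectors.
n <- 1e6
dbl_vec <- numeric(n)   # one million doubles
int_vec <- integer(n)   # one million integers

dbl_size <- as.numeric(object.size(dbl_vec))
int_size <- as.numeric(object.size(int_vec))
cat("Double vector: ", dbl_size, " bytes\n")
cat("Integer vector:", int_size, " bytes\n")

# Integers overflow: going past .Machine$integer.max gives NA (with a warning).
biggest  <- .Machine$integer.max
overflow <- suppressWarnings(biggest + 1L)
print(is.na(overflow))

# Some operations quietly return doubles even when the inputs are integers.
division_type <- typeof(5L / 2L)   # division always yields a double
mixed_type    <- typeof(5L * 2)    # mixing with a double promotes to double
whole_div     <- 5L %/% 2L         # integer division keeps the integer type
print(c(division_type, mixed_type, typeof(whole_div)))
```

If a count might exceed about 2.1 billion (for example, cumulative transaction totals), storing it as a double is the safer choice.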

5. Character Data Type

Definition and Characteristics

The character data type stores text strings. Strings must be enclosed in either single quotes ('text') or double quotes ("text"). Use it for names, labels, descriptions, and categorical data.

✅ When to Use Character Data Type:

  • Identifiers: Customer names, product codes, IDs
  • Categorical Data: Department names, product categories, regions
  • Text Analysis: Customer reviews, feedback, descriptions
  • Labels: Chart labels, report headers, annotations
Example 5: Customer Information System
# Customer database
customer_names <- c("John Smith", "Sarah Johnson", "Michael Brown", "Emily Davis")
customer_ids <- c("CUST001", "CUST002", "CUST003", "CUST004")
departments <- c("Sales", "Marketing", "Finance", "Operations")
email_domains <- c("gmail.com", "company.com", "yahoo.com", "outlook.com")

# String operations
cat("Customer Database\n")
cat("=================\n")
for(i in seq_along(customer_names)) {
  cat("ID: ", customer_ids[i], " | Name: ", customer_names[i], " | Dept: ", departments[i], "\n")
}

# String functions
cat("\nString Analysis:\n")
cat("First customer name length: ", nchar(customer_names[1]), " characters\n")
cat("Uppercase ID: ", toupper(customer_ids[1]), "\n")
cat("Lowercase email: ", tolower(email_domains[2]), "\n")

# Check data type
cat("\nData Type: ", class(customer_names), "\n")
OUTPUT
Customer Database
=================
ID:  CUST001  | Name:  John Smith  | Dept:  Sales 
ID:  CUST002  | Name:  Sarah Johnson  | Dept:  Marketing 
ID:  CUST003  | Name:  Michael Brown  | Dept:  Finance 
ID:  CUST004  | Name:  Emily Davis  | Dept:  Operations 

String Analysis:
First customer name length:  10  characters
Uppercase ID:  CUST001 
Lowercase email:  company.com 

Data Type:  character
Example 6: Product Catalog Management
# Product information
product_names <- c("Laptop Pro 15", "Wireless Mouse", "USB-C Hub", "Mechanical Keyboard")
product_categories <- c("Electronics", "Accessories", "Accessories", "Electronics")
product_skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023", "ELEC-KB-089")

# String concatenation and manipulation
cat("Product Catalog\n")
cat("===============\n\n")
for(i in seq_along(product_names)) {
  full_label <- paste(product_skus[i], "-", product_names[i])
  cat("Product ", i, ": ", full_label, "\n")
  cat("Category: ", product_categories[i], "\n")
  # Extract the first four characters of the SKU
  # (for three-letter prefixes such as "ACC" this includes the hyphen)
  sku_prefix <- substr(product_skus[i], 1, 4)
  cat("SKU Prefix: ", sku_prefix, "\n\n")
}

# Count products by category
electronics_count <- sum(product_categories == "Electronics")
accessories_count <- sum(product_categories == "Accessories")
cat("Category Summary:\n")
cat("Electronics: ", electronics_count, " products\n")
cat("Accessories: ", accessories_count, " products\n")
OUTPUT
Product Catalog
===============

Product  1 :  ELEC-LP-001 - Laptop Pro 15 
Category:  Electronics 
SKU Prefix:  ELEC 

Product  2 :  ACC-MS-045 - Wireless Mouse 
Category:  Accessories 
SKU Prefix:  ACC- 

Product  3 :  ACC-HB-023 - USB-C Hub 
Category:  Accessories 
SKU Prefix:  ACC- 

Product  4 :  ELEC-KB-089 - Mechanical Keyboard 
Category:  Electronics 
SKU Prefix:  ELEC 

Category Summary:
Electronics:  2  products
Accessories:  2  products
💡 Useful String Functions:
  • paste() - Concatenate strings
  • substr() - Extract substring
  • toupper(), tolower() - Change case
  • nchar() - Count characters
  • grep(), grepl() - Pattern matching
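
The pattern-matching functions in this list are not used in the examples above, so here is a short sketch of how they might be applied. It reuses the SKU values from Example 6; the regular expressions and the variable names are illustrative choices, not part of the original lesson:

```r
product_skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023", "ELEC-KB-089")

# grepl() returns one TRUE/FALSE per element: does the SKU start with "ACC-"?
is_accessory <- grepl("^ACC-", product_skus)

# grep() returns the positions of the matching elements instead.
accessory_positions <- grep("^ACC-", product_skus)

# sub() with a pattern removes everything from the first hyphen onward,
# which yields the category prefix regardless of its length.
category_prefix <- sub("-.*$", "", product_skus)

# paste0() concatenates without inserting a space (paste() inserts one by default).
labels <- paste0(category_prefix, ":", nchar(product_skus))

print(is_accessory)
print(accessory_positions)
print(category_prefix)
print(labels)
```

Compared with substr(), a pattern-based approach like sub() does not depend on every prefix having the same number of characters.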

6. Logical Data Type

Definition and Characteristics

The logical (or Boolean) data type has two values, TRUE and FALSE, plus NA for a missing logical value. Logical values are fundamental for conditional operations, filtering data, and control flow in programs.

✅ When to Use Logical Data Type:

  • Data Filtering: Selecting records that meet criteria
  • Conditional Checks: Testing if conditions are met
  • Binary Outcomes: Yes/No, Pass/Fail, Active/Inactive
  • Control Flow: If-else statements, while loops
Example 7: Sales Performance Evaluation
# Sales representatives performance
sales_reps <- c("Alice", "Bob", "Charlie", "Diana", "Edward")
monthly_sales <- c(125000, 87000, 145000, 92000, 156000)
target <- 100000

# Logical evaluations
met_target <- monthly_sales >= target
exceeded_120k <- monthly_sales > 120000
top_performer <- monthly_sales == max(monthly_sales)

# Performance report
cat("Monthly Sales Performance Report\n")
cat("================================\n\n")
for(i in seq_along(sales_reps)) {
  cat("Representative: ", sales_reps[i], "\n")
  cat("Sales: $", monthly_sales[i], "\n")
  cat("Met Target: ", met_target[i], "\n")
  cat("Exceeded $120K: ", exceeded_120k[i], "\n")
  cat("Top Performer: ", top_performer[i], "\n")
  cat("---\n")
}

# Summary statistics
total_met_target <- sum(met_target)
total_exceeded_120k <- sum(exceeded_120k)
percentage_met <- (total_met_target / length(sales_reps)) * 100
cat("\nSummary:\n")
cat("Representatives meeting target: ", total_met_target, " out of ", length(sales_reps), "\n")
cat("Representatives exceeding $120K: ", total_exceeded_120k, "\n")
cat("Success Rate: ", percentage_met, "%\n")

# Check data type
cat("\nData Type: ", class(met_target), "\n")
OUTPUT
Monthly Sales Performance Report
================================

Representative:  Alice 
Sales: $ 125000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  FALSE 
---
Representative:  Bob 
Sales: $ 87000 
Met Target:  FALSE 
Exceeded $120K:  FALSE 
Top Performer:  FALSE 
---
Representative:  Charlie 
Sales: $ 145000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  FALSE 
---
Representative:  Diana 
Sales: $ 92000 
Met Target:  FALSE 
Exceeded $120K:  FALSE 
Top Performer:  FALSE 
---
Representative:  Edward 
Sales: $ 156000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  TRUE 
---

Summary:
Representatives meeting target:  3  out of  5 
Representatives exceeding $120K:  3 
Success Rate:  60 %

Data Type:  logical
Example 8: Quality Control and Validation
# Product quality inspection
product_ids <- c("P001", "P002", "P003", "P004", "P005")
weight_kg <- c(5.2, 4.8, 5.0, 5.5, 4.9)
dimensions_ok <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
quality_passed <- c(TRUE, FALSE, TRUE, TRUE, TRUE)

# Specifications
min_weight <- 4.9
max_weight <- 5.3

# Validation checks
weight_in_range <- (weight_kg >= min_weight) & (weight_kg <= max_weight)
overall_pass <- weight_in_range & dimensions_ok & quality_passed

cat("Quality Control Report\n")
cat("=====================\n\n")
for(i in seq_along(product_ids)) {
  cat("Product ID: ", product_ids[i], "\n")
  cat("Weight: ", weight_kg[i], " kg | In Range: ", weight_in_range[i], "\n")
  cat("Dimensions OK: ", dimensions_ok[i], "\n")
  cat("Quality Test: ", quality_passed[i], "\n")
  cat("OVERALL STATUS: ", ifelse(overall_pass[i], "PASS ✓", "FAIL ✗"), "\n")
  cat("---\n")
}

# Summary
pass_count <- sum(overall_pass)
fail_count <- sum(!overall_pass)
pass_rate <- (pass_count / length(product_ids)) * 100
cat("\nQuality Summary:\n")
cat("Products Passed: ", pass_count, "\n")
cat("Products Failed: ", fail_count, "\n")
cat("Pass Rate: ", round(pass_rate, 1), "%\n")
OUTPUT
Quality Control Report
=====================

Product ID:  P001 
Weight:  5.2  kg | In Range:  TRUE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  PASS ✓ 
---
Product ID:  P002 
Weight:  4.8  kg | In Range:  FALSE 
Dimensions OK:  TRUE 
Quality Test:  FALSE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P003 
Weight:  5  kg | In Range:  TRUE 
Dimensions OK:  FALSE 
Quality Test:  TRUE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P004 
Weight:  5.5  kg | In Range:  FALSE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P005 
Weight:  4.9  kg | In Range:  TRUE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  PASS ✓ 
---

Quality Summary:
Products Passed:  2 
Products Failed:  3 
Pass Rate:  40 %
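🔑 Key Point: Logical values are the result of comparison operations (>, <, ==, !=, >=, <=). They can be combined using the element-wise logical operators & (AND), | (OR), and ! (NOT). For a single condition inside an if statement, use && and ||, which evaluate one value and stop as soon as the result is known.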
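
Real data often contains missing values, and comparisons involving NA produce NA instead of TRUE or FALSE. The sketch below is an illustrative variation on Example 7 (the NA entry is an assumption added for demonstration) showing how to summarize a logical vector safely:

```r
# Sales figures with one missing value (hypothetical data).
monthly_sales <- c(125000, 87000, NA, 92000, 156000)
target <- 100000

# Comparing NA with a number yields NA, not FALSE.
met_target <- monthly_sales >= target
print(met_target)

# na.rm = TRUE drops the missing entries before summarizing.
n_met   <- sum(met_target, na.rm = TRUE)   # TRUE counts as 1, FALSE as 0
any_met <- any(met_target, na.rm = TRUE)
all_met <- all(met_target, na.rm = TRUE)

# which() returns the positions that are TRUE and skips NA.
hit_rows <- which(met_target)

# && works on single values and short-circuits, which suits if() conditions.
has_data <- length(monthly_sales) > 0 && !all(is.na(monthly_sales))
if (has_data) {
  cat("Reps meeting target:", n_met, "at positions", hit_rows, "\n")
}
```

Without na.rm = TRUE, sum(met_target) would itself return NA, which is a common source of silently missing totals in reports.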

7. Complex Data Type

Definition and Characteristics

Complex numbers have a real and an imaginary part (e.g., 3 + 4i). While less common in business analytics, they're essential for certain mathematical computations, signal processing, and engineering applications.

✅ When to Use Complex Data Type:

  • Engineering Calculations: Electrical engineering, signal processing
  • Advanced Mathematics: Fourier transforms, eigenvalue problems
  • Scientific Computing: Quantum mechanics, wave equations
  • Financial Engineering: Options pricing (advanced models)
Example 9: Complex Number Operations
# Creating complex numbers
z1 <- 3 + 4i
z2 <- 2 - 3i
z3 <- complex(real = 5, imaginary = 2)

# Basic operations
sum_z <- z1 + z2
diff_z <- z1 - z2
prod_z <- z1 * z2
quot_z <- z1 / z2

cat("Complex Number Operations\n")
cat("=========================\n\n")
cat("z1 = ", z1, "\n")
cat("z2 = ", z2, "\n\n")
cat("Addition (z1 + z2) = ", sum_z, "\n")
cat("Subtraction (z1 - z2) = ", diff_z, "\n")
cat("Multiplication (z1 × z2) = ", prod_z, "\n")
cat("Division (z1 ÷ z2) = ", quot_z, "\n\n")

# Properties of complex numbers
cat("Properties of z1:\n")
cat("Real part: ", Re(z1), "\n")
cat("Imaginary part: ", Im(z1), "\n")
cat("Modulus (|z1|): ", Mod(z1), "\n")
cat("Conjugate: ", Conj(z1), "\n")
cat("Argument (angle): ", Arg(z1), " radians\n\n")

# Check data type
cat("Data Type: ", class(z1), "\n")
cat("Type: ", typeof(z1), "\n")
OUTPUT
Complex Number Operations
=========================

z1 =  3+4i 
z2 =  2-3i 

Addition (z1 + z2) =  5+1i 
Subtraction (z1 - z2) =  1+7i 
Multiplication (z1 × z2) =  18-1i 
Division (z1 ÷ z2) =  -0.4615385+1.307692i 

Properties of z1:
Real part:  3 
Imaginary part:  4 
Modulus (|z1|):  5 
Conjugate:  3-4i 
Argument (angle):  0.9272952  radians

Data Type:  complex 
Type:  complex
Example 10: Signal Processing Application
# Electrical impedance calculation (AC circuit analysis)
# Impedance Z = R + jX, where R is resistance and X is reactance
resistance <- c(100, 150, 200)  # Ohms
reactance <- c(50, -75, 100)    # Ohms (positive for inductive, negative for capacitive)

# Create complex impedances
impedances <- complex(real = resistance, imaginary = reactance)

cat("AC Circuit Analysis - Impedance Calculations\n")
cat("============================================\n\n")
for(i in seq_along(impedances)) {
  z <- impedances[i]
  magnitude <- Mod(z)
  phase_rad <- Arg(z)
  phase_deg <- phase_rad * 180 / pi
  cat("Circuit ", i, ":\n")
  cat("Impedance: ", z, " Ω\n")
  cat("Magnitude: ", round(magnitude, 2), " Ω\n")
  cat("Phase Angle: ", round(phase_deg, 2), "°\n")
  if(Im(z) > 0) {
    cat("Type: Inductive\n")
  } else if(Im(z) < 0) {
    cat("Type: Capacitive\n")
  } else {
    cat("Type: Purely Resistive\n")
  }
  cat("---\n")
}

# Calculate total impedance (series circuit)
total_impedance <- sum(impedances)
cat("\nTotal Series Impedance: ", total_impedance, " Ω\n")
cat("Total Magnitude: ", round(Mod(total_impedance), 2), " Ω\n")
OUTPUT
AC Circuit Analysis - Impedance Calculations
============================================

Circuit  1 :
Impedance:  100+50i  Ω
Magnitude:  111.8  Ω
Phase Angle:  26.57 °
Type: Inductive
---
Circuit  2 :
Impedance:  150-75i  Ω
Magnitude:  167.71  Ω
Phase Angle:  -26.57 °
Type: Capacitive
---
Circuit  3 :
Impedance:  200+100i  Ω
Magnitude:  223.61  Ω
Phase Angle:  26.57 °
Type: Inductive
---

Total Series Impedance:  450+75i  Ω
Total Magnitude:  456.21  Ω
⚠️ Note: Complex data types are rarely used in typical business analytics but are essential for specialized fields like engineering, physics, and advanced financial modeling. Most MBA students won't need them frequently, but understanding they exist is important.
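
One place where the complex type matters even outside engineering is when a calculation would otherwise fail on negative inputs. The following sketch, using an arbitrary quadratic chosen for illustration, shows that R does not switch to complex numbers automatically; you have to ask for them:

```r
# On a double, sqrt() of a negative number gives NaN (with a warning).
real_root <- suppressWarnings(sqrt(-1))

# Converting to complex first gives the imaginary unit.
complex_root <- sqrt(as.complex(-1))

print(real_root)
print(complex_root)

# polyroot() takes coefficients in increasing order of power and always
# returns complex values. Here: 5 + 2x + x^2 = 0, whose roots are -1 ± 2i.
roots <- polyroot(c(5, 2, 1))
print(roots)

# Re() and Im() pull the parts back out as ordinary doubles.
real_parts <- Re(roots)
imag_parts <- Im(roots)
```

Because the results are computed numerically, tiny rounding errors in the real or imaginary parts are normal; compare with a tolerance instead of ==.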

8. Raw Data Type

Definition and Characteristics

The raw data type stores data as raw bytes (values from 0 to 255). It's used for low-level data operations, binary file handling, and interfacing with external systems.

✅ When to Use Raw Data Type:

  • Binary File Operations: Reading/writing binary files
  • Data Encryption: Cryptographic operations
  • Network Communication: Socket programming, protocols
  • Low-level Data Processing: Image processing, audio data
Example 11: Raw Data Basics
# Creating raw data
raw_bytes <- as.raw(c(65, 66, 67, 68, 69))  # ASCII codes for A, B, C, D, E
raw_single <- as.raw(72)                    # ASCII for 'H'

cat("Raw Data Type Demonstration\n")
cat("===========================\n\n")
cat("Raw bytes: ")
print(raw_bytes)
cat("\n")

# Convert raw to character
chars <- rawToChar(raw_bytes)
cat("Converted to characters: ", chars, "\n\n")

# Convert character to raw
text <- "HELLO"
raw_from_text <- charToRaw(text)
cat("Text '", text, "' as raw bytes: ")
print(raw_from_text)
cat("\n")

# Check data type
cat("Data Type: ", class(raw_bytes), "\n")
cat("Type: ", typeof(raw_bytes), "\n\n")

# Size comparison
numeric_vector <- c(65, 66, 67, 68, 69)
cat("Memory Usage Comparison:\n")
cat("Numeric vector: ", object.size(numeric_vector), " bytes\n")
cat("Raw vector: ", object.size(raw_bytes), " bytes\n")
OUTPUT
Raw Data Type Demonstration
===========================

Raw bytes: [1] 41 42 43 44 45

Converted to characters:  ABCDE 

Text ' HELLO ' as raw bytes: [1] 48 45 4c 4c 4f

Data Type:  raw 
Type:  raw 

Memory Usage Comparison:
Numeric vector:  96  bytes
Raw vector:  56  bytes
💡 Practical Insight: Raw data types are memory-efficient for storing binary data but are rarely used in typical business analytics. They're more relevant for data engineers and systems programmers.
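
To show what "binary data" looks like in practice, here is a small sketch that serializes an integer to bytes and back, then applies a bitwise operation. The value 258 and the 0x20 mask are arbitrary illustrations, and the XOR step is a toy transformation, not real encryption:

```r
# writeBin() with a raw() target returns the bytes instead of writing a file.
# 258 = 0x00000102, so in little-endian order the bytes are 02 01 00 00.
bytes <- writeBin(258L, raw(), endian = "little")
print(bytes)

# readBin() reverses the process and rebuilds the integer from the bytes.
restored <- readBin(bytes, what = "integer", endian = "little")
print(restored)

# xor() on raw vectors works bit by bit. Flipping bit 0x20 of an ASCII
# uppercase letter gives the matching lowercase letter.
msg    <- charToRaw("HELLO")
mask   <- rep(as.raw(0x20), length(msg))
masked <- xor(msg, mask)
print(rawToChar(masked))

# as.integer() shows the byte values as ordinary numbers (0 to 255).
print(as.integer(msg))
```

The same writeBin() and readBin() calls accept a file connection in place of raw(), which is how binary files are typically written and read.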

9. Data Type Conversion (Coercion)

Understanding Type Conversion

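R lets you convert between data types explicitly with the as.numeric(), as.integer(), as.character(), and as.logical() functions. R also converts types implicitly (coercion) when values of different types are combined. Understanding when and how each happens is crucial for data manipulation.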
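
Besides the explicit as.*() functions, R also converts types on its own when different types meet in one vector. The sketch below (variable names are illustrative) shows that automatic promotion, which follows the order logical, integer, double, character:

```r
# A vector holds one type only, so c() promotes everything to the most
# general type present.
logical_and_int <- c(TRUE, 1L)          # logical + integer -> integer
int_and_double  <- c(1L, 2.5, TRUE)     # adding a double   -> double
with_text       <- c(1, "a", TRUE)      # adding a string   -> character

print(typeof(logical_and_int))
print(typeof(int_and_double))
print(with_text)

# The same promotion is why arithmetic on logicals works:
# TRUE counts as 1 and FALSE as 0.
true_count <- sum(c(TRUE, FALSE, TRUE))
print(true_count)
```

This is also why one stray text entry in a numeric column turns the whole column into character data.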

Example 12: Explicit Type Conversion
# Starting with different types
num_value <- 42.7
char_number <- "123"
char_text <- "Hello"
logical_value <- TRUE
integer_val <- 100L

cat("Type Conversion Examples\n")
cat("========================\n\n")

# Numeric to Integer (the fractional part is dropped, not rounded)
num_to_int <- as.integer(num_value)
cat("Numeric to Integer:\n")
cat(num_value, " (", class(num_value), ") -> ", num_to_int, " (", class(num_to_int), ")\n\n")

# Character to Numeric
char_to_num <- as.numeric(char_number)
cat("Character to Numeric:\n")
cat("'", char_number, "' (", class(char_number), ") -> ", char_to_num, " (", class(char_to_num), ")\n\n")

# Numeric to Character
num_to_char <- as.character(num_value)
cat("Numeric to Character:\n")
cat(num_value, " (", class(num_value), ") -> '", num_to_char, "' (", class(num_to_char), ")\n\n")

# Logical to Numeric
logical_to_num <- as.numeric(logical_value)
cat("Logical to Numeric:\n")
cat(logical_value, " (", class(logical_value), ") -> ", logical_to_num, " (", class(logical_to_num), ")\n\n")

# Attempting invalid conversion
invalid_conversion <- as.numeric(char_text)
cat("Invalid Conversion (generates warning):\n")
cat("'", char_text, "' to numeric -> ", invalid_conversion, "\n")
cat("(NA = Not Available/Missing Value)\n")
OUTPUT
Type Conversion Examples
========================

Numeric to Integer:
42.7  ( numeric ) ->  42  ( integer )

Character to Numeric:
' 123 ' ( character ) ->  123  ( numeric )

Numeric to Character:
42.7  ( numeric ) -> ' 42.7 ' ( character )

Logical to Numeric:
TRUE  ( logical ) ->  1  ( numeric )

Invalid Conversion (generates warning):
' Hello ' to numeric ->  NA 
(NA = Not Available/Missing Value)
Example 13: Practical Business Application of Conversion
# Data imported from Excel/CSV often comes as character
revenue_char <- c("125000", "87500", "156000", "92000", "134500")
year_char <- c("2020", "2021", "2022", "2023", "2024")
profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")

cat("Data Cleaning and Type Conversion\n")
cat("==================================\n\n")

# Convert to appropriate types
revenue_num <- as.numeric(revenue_char)
year_int <- as.integer(year_char)
profitable_log <- as.logical(profitable_char)

# Perform calculations (only possible after conversion)
total_revenue <- sum(revenue_num)
avg_revenue <- mean(revenue_num)
num_profitable <- sum(profitable_log)
profitable_rate <- (num_profitable / length(profitable_log)) * 100

cat("Financial Analysis (After Type Conversion)\n")
cat("------------------------------------------\n")
cat("Total Revenue: $", format(total_revenue, big.mark=","), "\n")
cat("Average Revenue: $", format(avg_revenue, big.mark=","), "\n")
cat("Profitable Years: ", num_profitable, " out of ", length(year_int), "\n")
cat("Profitability Rate: ", round(profitable_rate, 1), "%\n\n")

# Per-year breakdown
cat("Yearly Breakdown:\n")
for(i in seq_along(year_int)) {
  status <- ifelse(profitable_log[i], "PROFITABLE", "LOSS")
  cat("Year ", year_int[i], ": $", format(revenue_num[i], big.mark=","), " - ", status, "\n")
}

# Data type verification
cat("\nData Types After Conversion:\n")
cat("Revenue: ", class(revenue_num), "\n")
cat("Year: ", class(year_int), "\n")
cat("Profitable: ", class(profitable_log), "\n")
OUTPUT
Data Cleaning and Type Conversion
==================================

Financial Analysis (After Type Conversion)
------------------------------------------
Total Revenue: $ 595,000 
Average Revenue: $ 119,000 
Profitable Years:  3  out of  5 
Profitability Rate:  60 %

Yearly Breakdown:
Year  2020 : $ 125,000  -  PROFITABLE 
Year  2021 : $ 87,500  -  LOSS 
Year  2022 : $ 156,000  -  PROFITABLE 
Year  2023 : $ 92,000  -  PROFITABLE 
Year  2024 : $ 134,500  -  LOSS 

Data Types After Conversion:
Revenue:  numeric 
Year:  integer 
Profitable:  logical
🔑 Critical Business Insight: When importing data from Excel, CSV, or databases, numbers often come as character strings. Always verify and convert data types before performing calculations to avoid errors.
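
Imported figures are rarely as clean as in Example 13. The sketch below assumes a hypothetical revenue column containing currency symbols, thousands separators, stray spaces, and a placeholder for missing data, and shows one way to clean it before conversion and to flag what could not be converted:

```r
# Hypothetical messy values, as they might arrive from a spreadsheet export.
revenue_raw <- c("$125,000.50", " 87500 ", "N/A", "1.56e5")

# Remove dollar signs, commas, and whitespace before converting.
cleaned <- gsub("[$,[:space:]]", "", revenue_raw)

# as.numeric() returns NA (with a warning) for anything it cannot parse.
revenue_num <- suppressWarnings(as.numeric(cleaned))

# Flag entries that were present in the source but failed to convert,
# so they can be reviewed instead of silently disappearing.
failed <- is.na(revenue_num) & !is.na(revenue_raw)

print(revenue_num)
cat("Entries needing review:", revenue_raw[failed], "\n")

# Summaries need na.rm = TRUE once the data contains NA.
total_revenue <- sum(revenue_num, na.rm = TRUE)
cat("Total of convertible entries:", total_revenue, "\n")
```

Suppressing the warning is only reasonable here because the failed conversions are checked explicitly on the next line.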

10. Checking and Verifying Data Types

Essential Functions for Type Checking

R provides several functions to check data types. Using these functions is crucial for debugging and ensuring data quality.

Example 14: Comprehensive Type Checking
# Create variables of different types
my_numeric <- 45.6
my_integer <- 100L
my_character <- "Business Analytics"
my_logical <- FALSE
my_complex <- 3 + 2i

cat("Data Type Verification Functions\n")
cat("=================================\n\n")

# Using class() function
cat("Using class() function:\n")
cat("my_numeric: ", class(my_numeric), "\n")
cat("my_integer: ", class(my_integer), "\n")
cat("my_character: ", class(my_character), "\n")
cat("my_logical: ", class(my_logical), "\n")
cat("my_complex: ", class(my_complex), "\n\n")

# Using typeof() function (more specific)
cat("Using typeof() function:\n")
cat("my_numeric: ", typeof(my_numeric), "\n")
cat("my_integer: ", typeof(my_integer), "\n")
cat("my_character: ", typeof(my_character), "\n")
cat("my_logical: ", typeof(my_logical), "\n")
cat("my_complex: ", typeof(my_complex), "\n\n")

# Using is.* functions (returns TRUE/FALSE)
cat("Using is.* verification functions:\n")
cat("is.numeric(my_numeric): ", is.numeric(my_numeric), "\n")
cat("is.integer(my_integer): ", is.integer(my_integer), "\n")
cat("is.character(my_character): ", is.character(my_character), "\n")
cat("is.logical(my_logical): ", is.logical(my_logical), "\n")
cat("is.complex(my_complex): ", is.complex(my_complex), "\n\n")

# Checking multiple conditions
cat("Advanced Checking:\n")
cat("Is my_integer numeric? ", is.numeric(my_integer), "\n")
cat("Is my_integer specifically integer? ", is.integer(my_integer), "\n")
cat("Is '123' numeric? ", is.numeric("123"), "\n")
cat("Is '123' character? ", is.character("123"), "\n")
OUTPUT
Data Type Verification Functions
=================================

Using class() function:
my_numeric:  numeric 
my_integer:  integer 
my_character:  character 
my_logical:  logical 
my_complex:  complex 

Using typeof() function:
my_numeric:  double 
my_integer:  integer 
my_character:  character 
my_logical:  logical 
my_complex:  complex 

Using is.* verification functions:
is.numeric(my_numeric):  TRUE 
is.integer(my_integer):  TRUE 
is.character(my_character):  TRUE 
is.logical(my_logical):  TRUE 
is.complex(my_complex):  TRUE 

Advanced Checking:
Is my_integer numeric?  TRUE 
Is my_integer specifically integer?  TRUE 
Is '123' numeric?  FALSE 
Is '123' character?  TRUE
Example 15: Data Quality Validation Function
# Real-world data validation scenario
# Function to validate imported data
validate_data <- function(data_value, expected_type, variable_name) {
  cat("Validating: ", variable_name, "\n")
  cat("Value: ", data_value, "\n")
  cat("Expected Type: ", expected_type, "\n")

  # Pick the matching is.* check; unknown type names fall through to FALSE
  is_valid <- switch(expected_type,
    "numeric" = is.numeric(data_value),
    "integer" = is.integer(data_value),
    "character" = is.character(data_value),
    "logical" = is.logical(data_value),
    FALSE)

  actual_type <- class(data_value)
  cat("Actual Type: ", actual_type, "\n")

  if(is_valid) {
    cat("✓ VALIDATION PASSED\n")
  } else {
    cat("✗ VALIDATION FAILED - Type Mismatch!\n")
  }
  cat("---\n")

  # invisible() returns the result without auto-printing [1] TRUE/FALSE
  # when the function is called at the top level
  invisible(is_valid)
}

# Test the validation function
cat("Data Import Validation Report\n")
cat("==============================\n\n")

customer_id <- 12345L
customer_name <- "John Smith"
purchase_amount <- 1250.50
is_premium <- TRUE
invalid_data <- "not_a_number"

validate_data(customer_id, "integer", "Customer ID")
validate_data(customer_name, "character", "Customer Name")
validate_data(purchase_amount, "numeric", "Purchase Amount")
validate_data(is_premium, "logical", "Premium Status")
validate_data(invalid_data, "numeric", "Invalid Numeric Field")
OUTPUT
Data Import Validation Report
==============================

Validating:  Customer ID 
Value:  12345 
Expected Type:  integer 
Actual Type:  integer 
✓ VALIDATION PASSED
---
Validating:  Customer Name 
Value:  John Smith 
Expected Type:  character 
Actual Type:  character 
✓ VALIDATION PASSED
---
Validating:  Purchase Amount 
Value:  1250.5 
Expected Type:  numeric 
Actual Type:  numeric 
✓ VALIDATION PASSED
---
Validating:  Premium Status 
Value:  TRUE 
Expected Type:  logical 
Actual Type:  logical 
✓ VALIDATION PASSED
---
Validating:  Invalid Numeric Field 
Value:  not_a_number 
Expected Type:  numeric 
Actual Type:  character 
✗ VALIDATION FAILED - Type Mismatch!
---

11. Decision Guide: Choosing the Right Data Type

| Scenario | Recommended Data Type | Reason | Example |
|----------|-----------------------|--------|---------|
| Stock prices, revenue | Numeric | Need decimal precision | 125.75, 1500.50 |
| Number of customers, products | Integer | Whole numbers, memory efficient | 1500L, 2500L |
| Customer names, IDs | Character | Text data, identifiers | "John", "CUST001" |
| Yes/No, Pass/Fail | Logical | Binary outcomes, filtering | TRUE, FALSE |
| Age in years | Integer | Discrete whole numbers | 25L, 45L |
| Percentages, ratios | Numeric | Fractional values | 0.15, 1.25 |
| Product categories | Character | Categorical text | "Electronics" |
| Date as text | Character | Then convert to Date type | "2024-01-15" |
| Survey responses (Yes/No) | Logical | Binary responses | TRUE, FALSE |
| Ratings (1-5) | Integer | Discrete values | 1L, 2L, 3L, 4L, 5L |

12. Common Data Type Mistakes and How to Avoid Them

❌ Common Mistake #1: Forgetting the "L" suffix for integers

Wrong: count <- 100 (creates numeric)

Right: count <- 100L (creates integer)

❌ Common Mistake #2: Performing calculations on character data

Problem: sum(c("100", "200")) will cause an error

Solution: sum(as.numeric(c("100", "200")))

❌ Common Mistake #3: Mixing data types in comparisons

Confusing: "100" == 100 returns TRUE because R silently converts the number to the string "100" and compares the two as text

Better: Explicitly convert: as.numeric("100") == 100
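
The risk with mixed-type comparisons is easier to see with values where text order and numeric order disagree. The following sketch uses made-up values to show two such cases; note that the exact ordering of text can depend on your locale, although digit strings like these behave the same in common locales:

```r
# The number is converted to text, so this compares "9" with "10"
# character by character, and "9" sorts after "1".
text_compare <- 9 > "10"
print(text_compare)

# Equality is also a text comparison: 100 becomes "100", not "100.0".
text_equal <- "100.0" == 100
print(text_equal)

# Converting the text to a number first gives the comparison you meant.
numeric_compare <- 9 > as.numeric("10")
numeric_equal   <- as.numeric("100.0") == 100
print(numeric_compare)
print(numeric_equal)
```

In a filter such as ages > threshold, a character threshold can therefore return plausible-looking but wrong rows without any warning.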

❌ Common Mistake #4: Not checking data types after import

Problem: A single non-numeric entry in a column (for example "N/A" or "$1,200") makes a CSV import read the whole column as character

Solution: Always use str() or class() after importing
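
As a sketch of that post-import check, the example below reads a small inline CSV (the column names and values are made up for illustration) and inspects the type of every column. It uses read.csv() with the text argument so that it runs without a file on disk:

```r
# A tiny CSV held in a string; the "n/a" entry is the troublemaker.
csv_text <- "id,amount,joined
1,100.50,2024-01-15
2,n/a,2024-02-01
3,75,2024-03-10"

orders <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# str() gives a compact overview of every column and its type.
str(orders)

# vapply() collects the class of each column into a named character vector.
col_types <- vapply(orders, function(col) class(col)[1], character(1))
print(col_types)

# Telling read.csv() which strings mean "missing" lets amount import as numeric.
orders_clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                         na.strings = c("n/a", "NA"))

# Dates arrive as text and need an explicit conversion.
orders_clean$joined <- as.Date(orders_clean$joined)
str(orders_clean)
```

Running a check like this immediately after every import catches type problems before they reach your calculations.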

Example 16: Debugging Type Errors
# Simulating common data type errors
cat("Common Data Type Errors and Solutions\n")
cat("======================================\n\n")

# Error 1: Operating on character numbers
sales_data <- c("1000", "1500", "2000")
cat("ERROR SCENARIO 1: Character numbers\n")
cat("Data: ", paste(sales_data, collapse=", "), "\n")
cat("Type: ", class(sales_data), "\n")
# Attempting: total <- sum(sales_data)  # This would error!
cat("Problem: Cannot sum character data\n")
cat("Solution: Convert first\n")
sales_numeric <- as.numeric(sales_data)
total <- sum(sales_numeric)
cat("Result after conversion: ", total, "\n\n")

# Error 2: Type mismatch in filtering
ages <- c(25L, 30L, 35L, 40L, 45L)
threshold <- "35"  # Accidentally a character!
cat("ERROR SCENARIO 2: Type mismatch in comparison\n")
cat("Ages: ", paste(ages, collapse=", "), "\n")
cat("Threshold: ", threshold, " (", class(threshold), ")\n")
cat("Comparison result: ages > threshold\n")
result <- ages > threshold
cat("Result: ", paste(result, collapse=", "), "\n")
cat("Warning: R coerced types, but results may be unexpected!\n")
cat("Solution: Ensure matching types\n")
threshold_correct <- 35L
result_correct <- ages > threshold_correct
cat("Correct result: ", paste(result_correct, collapse=", "), "\n\n")

# Error 3: Forgotten L suffix
cat("ERROR SCENARIO 3: Integer vs Numeric\n")
count1 <- 100
count2 <- 100L
cat("count1 = 100 -> Type: ", typeof(count1), "\n")
cat("count2 = 100L -> Type: ", typeof(count2), "\n")
cat("For counting operations, use integer for memory efficiency\n")
OUTPUT
Common Data Type Errors and Solutions
======================================

ERROR SCENARIO 1: Character numbers
Data:  1000, 1500, 2000 
Type:  character 
Problem: Cannot sum character data
Solution: Convert first
Result after conversion:  4500 

ERROR SCENARIO 2: Type mismatch in comparison
Ages:  25, 30, 35, 40, 45 
Threshold:  35  ( character )
Comparison result: ages > threshold
Result:  FALSE, FALSE, FALSE, TRUE, TRUE 
Warning: R coerced types, but results may be unexpected!
Solution: Ensure matching types
Correct result:  FALSE, FALSE, FALSE, TRUE, TRUE 

ERROR SCENARIO 3: Integer vs Numeric
count1 = 100 -> Type:  double 
count2 = 100L -> Type:  integer 
For counting operations, use integer for memory efficiency
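
In Scenario 2 the character comparison happens to agree with the numeric one, because every age has two digits. A small sketch with values of different lengths (chosen here for illustration) shows how the same mistake can silently give wrong answers: R converts the integers to strings and compares them as text.

```r
ages <- c(5L, 25L, 100L)
threshold_chr <- "35"   # accidentally a character
threshold_int <- 35L

# Integers are coerced to "5", "25", "100" and compared as text
wrong <- ages > threshold_chr
print(wrong)   # TRUE FALSE FALSE: "5" sorts after "35", "100" sorts before it

# With matching types the comparison is numeric
right <- ages > threshold_int
print(right)   # FALSE FALSE TRUE

# A defensive check before comparing
stopifnot(is.numeric(ages), is.numeric(threshold_int))
```

No warning or error is raised in the first comparison, which is why checking types before filtering is worth the extra line.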

13. Practical Exercises

📝 Exercise 1: Customer Database Analysis

Objective: Create a customer database and perform type-specific operations

Task:

  1. Create vectors for:
    • Customer IDs (integer): 1001L to 1005L
    • Customer names (character): Any 5 names
    • Purchase amounts (numeric): 5 decimal values between $100 and $1,200
    • Premium membership status (logical): 3 TRUE, 2 FALSE
  2. Calculate:
    • Total purchases
    • Average purchase amount
    • Number of premium customers
    • Percentage of premium customers
  3. Verify data types using class() for each vector
  4. Create a report displaying all customers with their details

Expected Output Format:

Customer Database Report
========================
ID: 1001 | Name: [Name] | Purchase: $XXX.XX | Premium: TRUE/FALSE
...

Summary Statistics:
Total Purchases: $XXXX.XX
Average Purchase: $XXX.XX
Premium Customers: X out of 5 (XX%)

📝 Exercise 2: Product Inventory Management

Objective: Practice integer operations and logical filtering

Task:

  1. Create inventory data:
    • Product IDs (character): "PROD001" to "PROD010"
    • Stock quantities (integer): Random values between 10L and 500L
    • Reorder threshold: 100L
  2. Identify products that need reordering (stock < threshold)
  3. Calculate:
    • Total inventory count
    • Number of products needing reorder
    • Percentage of products with stock below the threshold
  4. Display a report showing which products need reordering

Hint: Use logical vectors for filtering

📝 Exercise 3: Sales Data Type Conversion

Objective: Practice converting data types (simulating CSV import)

Task:

  1. Create "imported" data as character vectors:
    • sales_char <- c("15000", "22000", "18500", "31000", "27500")
    • year_char <- c("2020", "2021", "2022", "2023", "2024")
    • profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
  2. Convert to appropriate data types
  3. Perform analysis:
    • Calculate total and average sales
    • Count profitable years
    • Calculate year-over-year growth rate
  4. Verify all conversions using class()

Bonus: Create a function that automates the conversion process

📝 Exercise 4: Employee Performance Evaluation

Objective: Combine multiple data types in a real-world scenario

Task:

  1. Create employee data:
    • Employee IDs (character): "EMP001" to "EMP008"
    • Names (character): Any 8 names
    • Ages (integer): Between 25L and 60L
    • Sales figures (numeric): Decimal values
    • Met target (logical): Based on sales > $50,000
  2. Create analysis:
    • Categorize employees by age group (< 30, 30-44, 45+)
    • Calculate average sales by age group
    • Identify top 3 performers
    • Calculate percentage who met target
  3. Generate a comprehensive performance report

📝 Exercise 5: Data Validation System

Objective: Build a comprehensive data validation system

Task:

  1. Create a mixed dataset with intentional errors:
    • customer_ids: Mix of numeric and character (some invalid)
    • ages: Include some negative values and character entries
    • revenues: Include some non-numeric text
    • status: Mix of TRUE/FALSE and "Yes"/"No" strings
  2. Write validation functions to:
    • Check data type of each variable
    • Identify invalid entries
    • Count errors by type
    • Suggest corrections
  3. Generate a validation report showing:
    • Expected vs actual data types
    • List of invalid entries
    • Total error count
    • Corrected dataset

Advanced Challenge: Create an automated cleaning function that fixes common data type errors

14. Solution to Exercise 1

Complete Solution: Customer Database Analysis
# Exercise 1 Solution: Customer Database Analysis

# Step 1: Create vectors with appropriate data types
customer_ids <- c(1001L, 1002L, 1003L, 1004L, 1005L)
customer_names <- c("Alice Johnson", "Bob Smith", "Carol Williams",
                    "David Brown", "Emma Davis")
purchase_amounts <- c(450.75, 892.50, 325.99, 1150.00, 678.25)
premium_status <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Step 2: Verify data types
cat("Data Type Verification\n")
cat("======================\n")
cat("Customer IDs: ", class(customer_ids), "\n")
cat("Customer Names: ", class(customer_names), "\n")
cat("Purchase Amounts: ", class(purchase_amounts), "\n")
cat("Premium Status: ", class(premium_status), "\n\n")

# Step 3: Calculate statistics
total_purchases <- sum(purchase_amounts)
average_purchase <- mean(purchase_amounts)
num_premium <- sum(premium_status)  # each TRUE counts as 1
premium_percentage <- (num_premium / length(premium_status)) * 100

# Step 4: Generate comprehensive report
cat("Customer Database Report\n")
cat("========================\n\n")
for (i in seq_along(customer_ids)) {
  premium_label <- ifelse(premium_status[i], "✓ Premium", "Standard")
  cat(sprintf("ID: %d | Name: %-20s | Purchase: $%8.2f | Status: %s\n",
              customer_ids[i], customer_names[i],
              purchase_amounts[i], premium_label))
}
cat("\n")
cat(rep("=", 71), "\n", sep="")
cat("Summary Statistics\n")
cat(rep("=", 71), "\n", sep="")
cat(sprintf("Total Purchases:        $%10.2f\n", total_purchases))
cat(sprintf("Average Purchase:       $%10.2f\n", average_purchase))
cat(sprintf("Premium Customers:      %d out of %d (%.1f%%)\n",
            num_premium, length(customer_ids), premium_percentage))
cat(sprintf("Standard Customers:     %d out of %d (%.1f%%)\n",
            length(customer_ids) - num_premium, length(customer_ids),
            100 - premium_percentage))

# Step 5: Additional analysis - Premium vs Standard spending
premium_purchases <- purchase_amounts[premium_status]
standard_purchases <- purchase_amounts[!premium_status]
cat("\nDetailed Analysis\n")
cat(rep("-", 71), "\n", sep="")
cat(sprintf("Avg Premium Purchase:   $%10.2f\n", mean(premium_purchases)))
cat(sprintf("Avg Standard Purchase:  $%10.2f\n", mean(standard_purchases)))
cat(sprintf("Highest Purchase:       $%10.2f (%s)\n", max(purchase_amounts),
            customer_names[which.max(purchase_amounts)]))
cat(sprintf("Lowest Purchase:        $%10.2f (%s)\n", min(purchase_amounts),
            customer_names[which.min(purchase_amounts)]))
OUTPUT
Data Type Verification
======================
Customer IDs:  integer 
Customer Names:  character 
Purchase Amounts:  numeric 
Premium Status:  logical 

Customer Database Report
========================

ID: 1001 | Name: Alice Johnson        | Purchase: $  450.75 | Status: ✓ Premium
ID: 1002 | Name: Bob Smith            | Purchase: $  892.50 | Status: ✓ Premium
ID: 1003 | Name: Carol Williams       | Purchase: $  325.99 | Status: Standard
ID: 1004 | Name: David Brown          | Purchase: $ 1150.00 | Status: ✓ Premium
ID: 1005 | Name: Emma Davis           | Purchase: $  678.25 | Status: Standard

=======================================================================
Summary Statistics
=======================================================================
Total Purchases:        $   3497.49
Average Purchase:       $    699.50
Premium Customers:      3 out of 5 (60.0%)
Standard Customers:     2 out of 5 (40.0%)

Detailed Analysis
-----------------------------------------------------------------------
Avg Premium Purchase:   $    831.08
Avg Standard Purchase:  $    502.12
Highest Purchase:       $   1150.00 (David Brown)
Lowest Purchase:        $    325.99 (Carol Williams)

15. Best Practices Summary

🎯 Golden Rules for Using Data Types in R

Best Practice | Why It Matters | How to Implement
1. Always verify data types after import | CSV/Excel imports often misinterpret types | Use str(), class(), or typeof()
2. Use integer for counting | More memory efficient for large datasets | Add "L" suffix: count <- 100L
3. Convert before calculation | Prevents errors and unexpected results | as.numeric(), as.integer(), etc.
4. Use logical for filtering | Clear, efficient, and readable | data[data$value > 100, ]
5. Keep consistent types in vectors | R silently coerces mixed types to the most general type present (often character) | Store different types in separate vectors
6. Document expected types | Makes code maintainable | Add comments explaining data type choices
7. Validate external data | User input and imports can be unreliable | Create validation functions
8. Use character for IDs | IDs shouldn't be used in calculations | customer_id <- "CUST001"
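
Rules 5 and 8 can be illustrated with a short sketch (the values are invented for illustration). When types are mixed in c(), R does not raise an error. It converts every element to the most general type present, in the order logical < integer < double < character. Numeric-looking IDs with leading zeros also lose information when converted to numbers.

```r
# Mixed types are coerced to the most general type present
typeof(c(TRUE, 2L))          # "integer"
typeof(c(2L, 3.5))           # "double"
typeof(c(2L, 3.5, "four"))   # "character"

mixed <- c(100, "200", TRUE)
print(mixed)                 # "100" "200" "TRUE"

# IDs with leading zeros: keep them as character
zip_ids <- c("00123", "04567")
as.integer(zip_ids)          # 123 4567: the leading zeros are gone
nchar(zip_ids)               # 5 5
```

Once a vector has been coerced to character, arithmetic on it fails until it is converted back, so it is usually simpler to keep each kind of value in its own vector.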

16. Quick Reference Card

📋 Data Type Quick Reference

Function | Purpose | Example | Returns
class(x) | Get object class | class(100L) | "integer"
typeof(x) | Get internal type | typeof(100L) | "integer"
is.numeric(x) | Check if numeric | is.numeric(45.6) | TRUE
is.integer(x) | Check if integer | is.integer(100L) | TRUE
is.character(x) | Check if character | is.character("text") | TRUE
is.logical(x) | Check if logical | is.logical(TRUE) | TRUE
as.numeric(x) | Convert to numeric | as.numeric("123") | 123
as.integer(x) | Convert to integer | as.integer(45.7) | 45L
as.character(x) | Convert to character | as.character(100) | "100"
as.logical(x) | Convert to logical | as.logical(1) | TRUE
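
A few edge cases of the functions in this table are worth trying out. The sketch below uses made-up inputs. Note that as.integer() truncates toward zero rather than rounding, and that a failed conversion returns NA instead of stopping the script.

```r
# class() vs typeof() for a double
x <- 45.6
class(x)     # "numeric"
typeof(x)    # "double"

# is.numeric() is TRUE for both integers and doubles
is.numeric(100L)    # TRUE
is.integer(100)     # FALSE: 100 without the L suffix is a double

# as.integer() truncates toward zero
as.integer(45.7)    # 45
as.integer(-45.7)   # -45
round(45.7)         # 46, but the result is still a double

# Failed conversions give NA (as.numeric also emits a warning)
bad <- suppressWarnings(as.numeric("12abc"))
is.na(bad)          # TRUE

# as.logical() accepts strings such as "TRUE", "true", "T", but not "yes"
flags <- as.logical(c("TRUE", "true", "T", "yes"))
print(flags)        # TRUE TRUE TRUE NA
as.logical(0:2)     # FALSE TRUE TRUE
```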

17. Real-World Case Study: E-Commerce Analytics

Case Study: Complete E-Commerce Data Analysis
# Real-world scenario: E-commerce platform data analysis
# Demonstrating appropriate use of all data types

# ============= DATA CREATION =============

# Character: Product information
product_ids <- c("ELEC001", "FASH002", "HOME003", "ELEC004",
                 "FASH005", "HOME006", "ELEC007", "FASH008")
product_names <- c("Laptop", "T-Shirt", "Coffee Maker", "Smartphone",
                   "Jeans", "Blender", "Tablet", "Dress")
categories <- c("Electronics", "Fashion", "Home", "Electronics",
                "Fashion", "Home", "Electronics", "Fashion")

# Numeric: Pricing and financial data
prices <- c(899.99, 29.99, 79.50, 699.99, 59.99, 49.99, 399.99, 89.99)
revenue <- c(17999.80, 1499.50, 2385.00, 27999.60,
             2999.50, 1999.60, 11999.70, 3599.60)

# Integer: Quantities and counts
units_sold <- c(20L, 50L, 30L, 40L, 50L, 40L, 30L, 40L)
stock_remaining <- c(15L, 200L, 45L, 25L, 180L, 60L, 20L, 150L)
reorder_point <- 30L

# Logical: Business rules and status
in_stock <- stock_remaining > 0
needs_reorder <- stock_remaining < reorder_point
premium_products <- prices > 500
high_performer <- units_sold > 35L

# ============= ANALYSIS =============
cat("E-COMMERCE ANALYTICS DASHBOARD\n")
cat(rep("=", 80), "\n\n", sep="")

# Section 1: Inventory Management (Integer operations)
cat("1. INVENTORY STATUS\n")
cat(rep("-", 80), "\n", sep="")
total_units_sold <- sum(units_sold)
total_stock <- sum(stock_remaining)
items_needing_reorder <- sum(needs_reorder)
cat(sprintf("Total Units Sold:           %d units\n", total_units_sold))
cat(sprintf("Total Stock Remaining:      %d units\n", total_stock))
cat(sprintf("Items Needing Reorder:      %d out of %d products\n\n",
            items_needing_reorder, length(product_ids)))

# Section 2: Financial Analysis (Numeric operations)
cat("2. FINANCIAL PERFORMANCE\n")
cat(rep("-", 80), "\n", sep="")
total_revenue <- sum(revenue)
average_price <- mean(prices)
highest_revenue_product <- which.max(revenue)
cat(sprintf("Total Revenue:              $%12.2f\n", total_revenue))
cat(sprintf("Average Product Price:      $%12.2f\n", average_price))
cat(sprintf("Top Revenue Product:        %s ($%.2f)\n\n",
            product_names[highest_revenue_product],
            revenue[highest_revenue_product]))

# Section 3: Category Performance (Character grouping)
cat("3. CATEGORY BREAKDOWN\n")
cat(rep("-", 80), "\n", sep="")
unique_categories <- unique(categories)
for (cat_name in unique_categories) {
  cat_mask <- categories == cat_name
  cat_revenue <- sum(revenue[cat_mask])
  cat_units <- sum(units_sold[cat_mask])
  cat_products <- sum(cat_mask)
  cat(sprintf("%-15s: %d products | %d units sold | $%10.2f revenue\n",
              cat_name, cat_products, cat_units, cat_revenue))
}
cat("\n")

# Section 4: Product Performance Matrix (Logical filtering)
cat("4. PRODUCT PERFORMANCE MATRIX\n")
cat(rep("-", 80), "\n", sep="")
cat(sprintf("%-12s %-20s %10s %8s %12s %10s\n",
            "ID", "Product", "Price", "Sold", "Revenue", "Status"))
cat(rep("-", 80), "\n", sep="")
for (i in seq_along(product_ids)) {
  # Determine status using logical operations
  status <- ""
  if (premium_products[i] && high_performer[i]) {
    status <- "⭐ Premium"
  } else if (high_performer[i]) {
    status <- "✓ Strong"
  } else if (needs_reorder[i]) {
    status <- "⚠ Reorder"
  } else {
    status <- "○ Normal"
  }
  # The "L" after %7d is literal text, so counts print as 20L, 50L, ...
  cat(sprintf("%-12s %-20s $%9.2f %7dL $%11.2f %-10s\n",
              product_ids[i], product_names[i], prices[i],
              units_sold[i], revenue[i], status))
}
cat("\n")

# Section 5: Key Metrics Summary (Mixed data types)
cat("5. KEY BUSINESS METRICS\n")
cat(rep("-", 80), "\n", sep="")
premium_count <- sum(premium_products)
high_performer_count <- sum(high_performer)
avg_units_per_product <- mean(units_sold)
stock_coverage <- total_stock / total_units_sold
cat(sprintf("Premium Products (>$500):        %d (%.1f%%)\n",
            premium_count, (premium_count/length(product_ids))*100))
cat(sprintf("High Performers (>35 units):     %d (%.1f%%)\n",
            high_performer_count, (high_performer_count/length(product_ids))*100))
cat(sprintf("Average Units per Product:       %.1f units\n", avg_units_per_product))
cat(sprintf("Stock Coverage Ratio:            %.2fx\n", stock_coverage))
cat("\n")

# Section 6: Recommendations (Logical decision-making)
cat("6. AUTOMATED RECOMMENDATIONS\n")
cat(rep("-", 80), "\n", sep="")
urgent_reorders <- product_names[needs_reorder & in_stock]
if (length(urgent_reorders) > 0) {
  cat("⚠ URGENT: Reorder these products:\n")
  for (prod in urgent_reorders) {
    cat(sprintf("   - %s\n", prod))
  }
  cat("\n")
}
focus_products <- product_names[premium_products & high_performer]
if (length(focus_products) > 0) {
  cat("⭐ FOCUS: Increase marketing for top performers:\n")
  for (prod in focus_products) {
    cat(sprintf("   - %s\n", prod))
  }
  cat("\n")
}
underperformers <- product_names[!high_performer & !premium_products]
if (length(underperformers) > 0) {
  cat("📊 REVIEW: Consider promotions for:\n")
  for (prod in underperformers) {
    cat(sprintf("   - %s\n", prod))
  }
}

# Data Type Summary
cat("\n")
cat(rep("=", 80), "\n", sep="")
cat("DATA TYPE VERIFICATION\n")
cat(rep("=", 80), "\n", sep="")
cat(sprintf("Product IDs:        %s (identifiers)\n", class(product_ids)))
cat(sprintf("Prices:             %s (financial precision)\n", class(prices)))
cat(sprintf("Units Sold:         %s (counting)\n", class(units_sold)))
cat(sprintf("Needs Reorder:      %s (business rules)\n", class(needs_reorder)))
cat(sprintf("Categories:         %s (categorical data)\n", class(categories)))
OUTPUT
E-COMMERCE ANALYTICS DASHBOARD
================================================================================

1. INVENTORY STATUS
--------------------------------------------------------------------------------
Total Units Sold:           300 units
Total Stock Remaining:      695 units
Items Needing Reorder:      3 out of 8 products

2. FINANCIAL PERFORMANCE
--------------------------------------------------------------------------------
Total Revenue:              $    70482.30
Average Product Price:      $      288.68
Top Revenue Product:        Smartphone ($27999.60)

3. CATEGORY BREAKDOWN
--------------------------------------------------------------------------------
Electronics    : 3 products | 90 units sold | $  57999.10 revenue
Fashion        : 3 products | 140 units sold | $   8098.60 revenue
Home           : 2 products | 70 units sold | $   4384.60 revenue

4. PRODUCT PERFORMANCE MATRIX
--------------------------------------------------------------------------------
ID           Product                   Price     Sold      Revenue     Status    
--------------------------------------------------------------------------------
ELEC001      Laptop                $   899.99      20L $   17999.80 ⚠ Reorder  
FASH002      T-Shirt               $    29.99      50L $    1499.50 ✓ Strong   
HOME003      Coffee Maker          $    79.50      30L $    2385.00 ○ Normal   
ELEC004      Smartphone            $   699.99      40L $   27999.60 ⭐ Premium  
FASH005      Jeans                 $    59.99      50L $    2999.50 ✓ Strong   
HOME006      Blender               $    49.99      40L $    1999.60 ✓ Strong   
ELEC007      Tablet                $   399.99      30L $   11999.70 ⚠ Reorder  
FASH008      Dress                 $    89.99      40L $    3599.60 ✓ Strong   

5. KEY BUSINESS METRICS
--------------------------------------------------------------------------------
Premium Products (>$500):        2 (25.0%)
High Performers (>35 units):     5 (62.5%)
Average Units per Product:       37.5 units
Stock Coverage Ratio:            2.32x

6. AUTOMATED RECOMMENDATIONS
--------------------------------------------------------------------------------
⚠ URGENT: Reorder these products:
   - Laptop
   - Smartphone
   - Tablet

⭐ FOCUS: Increase marketing for top performers:
   - Smartphone

📊 REVIEW: Consider promotions for:
   - Coffee Maker
   - Tablet

================================================================================
DATA TYPE VERIFICATION
================================================================================
Product IDs:        character (identifiers)
Prices:             numeric (financial precision)
Units Sold:         integer (counting)
Needs Reorder:      logical (business rules)
Categories:         character (categorical data)
🎓 Learning Takeaway: This case study demonstrates how choosing the right data type for each variable creates a robust, efficient, and maintainable analytics system. Notice how:
  • Character types preserve product identifiers and categories
  • Numeric (double) types handle decimal financial values such as prices and revenue
  • Integer types efficiently count items
  • Logical types enable clear business rule evaluation
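
One caveat on the numeric type: doubles store binary approximations, so some decimal amounts are not represented exactly. Below is a minimal sketch of the effect and two common ways to handle it. The prices are illustrative, and working in integer cents is one possible convention, not something the case study itself does.

```r
# Exact equality can fail for decimal arithmetic
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison with a tolerance

# Option: keep money as integer cents and convert only for display
price_cents <- 2999L
qty <- 3L
total_cents <- price_cents * qty             # 8997L, exact integer arithmetic
total_label <- sprintf("$%.2f", total_cents / 100)
print(total_label)                           # "$89.97"
```

For reporting, formatting with sprintf("%.2f", ...) as in the case study is usually enough. The tolerance-based comparison matters mainly when amounts are tested for equality.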

18. Summary and Key Takeaways

🎯 Chapter Summary: Data Types in R

What We Learned:

  • Six Basic Data Types: Numeric, Integer, Character, Logical, Complex, and Raw
  • Appropriate Usage: Each data type has specific use cases in business analytics
  • Type Conversion: How to convert between types and when it's necessary
  • Data Validation: Importance of checking and verifying data types
  • Best Practices: Guidelines for choosing and using data types effectively

Critical Skills Acquired:

  • Identifying the appropriate data type for different business scenarios
  • Performing type-specific operations (calculations, comparisons, filtering)
  • Converting between data types safely and correctly
  • Validating data quality through type checking
  • Debugging common data type errors
  • Building efficient and maintainable R programs

💼 Business Applications Covered

  • Financial analysis and calculations (Numeric)
  • Customer database management (Character, Integer)
  • Inventory tracking and management (Integer, Logical)
  • Performance evaluation and filtering (Logical)
  • Quality control systems (Logical, Numeric)
  • E-commerce analytics (All types combined)

📚 Next Steps for Students

  1. Practice: Complete all five exercises to reinforce learning
  2. Experiment: Try modifying the code examples with your own data
  3. Real Data: Apply these concepts to actual business datasets
  4. Build Projects: Create your own analytics dashboards using appropriate data types
  5. Prepare: These fundamentals are essential for upcoming topics (data structures, data frames, statistical analysis)

⚠️ Common Pitfalls to Avoid

  • Forgetting to check data types after importing data
  • Performing calculations on character data without conversion
  • Not using the "L" suffix when integers are needed
  • Mixing data types inappropriately in comparisons
  • Ignoring type coercion warnings from R
  • Not validating external or user-provided data
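
The coercion-warning pitfall can be made concrete. In this sketch (the input strings are invented) as.numeric() returns NA for entries it cannot parse and emits a warning. Reporting the entries that became NA makes the problem visible instead of letting it pass silently. The helper name to_numeric_checked is hypothetical.

```r
raw_sales <- c("1000", "1,500", "2000", "n/a")

# Hypothetical helper: convert, then report which entries failed
to_numeric_checked <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  failed <- is.na(out) & !is.na(x)   # NA after conversion, but not before
  if (any(failed)) {
    message("Could not convert: ", paste(x[failed], collapse = ", "))
  }
  out
}

sales <- to_numeric_checked(raw_sales)
print(sales)                 # 1000 NA 2000 NA
sum(sales, na.rm = TRUE)     # 3000
```

Whether the failed entries should be dropped, corrected (for example by removing the comma), or sent back to the data owner depends on the analysis. The sketch only surfaces them.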

🔑 Final Key Points

  1. Data types are foundational – Understanding them is crucial for all R programming
  2. Choose types intentionally – Each type serves specific purposes in business analytics
  3. Always validate – Check data types, especially after importing data
  4. Convert when necessary – Use explicit conversion functions to avoid errors
  5. Think about efficiency – Proper data types improve performance and memory usage
  6. Document your choices – Comment why you chose specific data types
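
The efficiency point can be checked directly with object.size(). This is a small sketch, assuming one million counts. An integer vector needs 4 bytes per element and a double vector 8, so the double version is roughly twice as large. The trade-off is that integers have a limited range.

```r
n <- 1e6
counts_int <- rep(100L, n)   # integer storage
counts_dbl <- rep(100, n)    # double storage

print(object.size(counts_int), units = "MB")
print(object.size(counts_dbl), units = "MB")

# Size ratio, close to 2
size_ratio <- as.numeric(object.size(counts_dbl)) /
  as.numeric(object.size(counts_int))
print(size_ratio)

# Trade-off: integers are limited to .Machine$integer.max
.Machine$integer.max                                     # 2147483647
overflow <- suppressWarnings(.Machine$integer.max + 1L)  # overflow gives NA
is.na(overflow)                                          # TRUE
```

For small vectors the difference is negligible. It starts to matter with millions of rows or when totals could exceed the integer range.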

🎓 Self-Assessment Questions

  1. What is the default numerical data type in R, and when should you use it?
  2. How do you explicitly create an integer in R? Why would you choose integer over numeric?
  3. What are three common business scenarios where logical data types are essential?
  4. Explain the difference between the class() and typeof() functions.
  5. What happens when you try to sum a vector of character numbers without conversion?
  6. Name three functions you can use to verify data types in R.
  7. When importing data from CSV, why is type checking critically important?
  8. How would you convert the string "TRUE" to an actual logical value?
  9. What is type coercion, and when does R perform it automatically?
  10. In a business context, should customer IDs be stored as integers or characters? Why?

Check your understanding by answering these questions and testing the concepts with actual R code!
