📊 Data Types in R: Appropriate Uses of Different Data Types

Learning Objective: Master the fundamental data types in R and understand when and how to use each type effectively in real-world data analysis scenarios.

1. Introduction to Data Types in R

Data types are the foundation of programming in R. They define what kind of data can be stored and what operations can be performed on that data. Understanding data types is crucial for:

Efficient memory management
Proper data manipulation and analysis
Avoiding errors in statistical computations
Writing optimized code for large datasets

R has six basic (atomic) data types, and choosing the right one significantly impacts your program’s performance and accuracy.

2. The Six Basic Data Types in R

Data Type	Description	Example Values	Primary Use Case
Numeric (Double)	Decimal numbers (default)	3.14, 100.5, -0.001	Continuous measurements, calculations
Integer	Whole numbers	1L, 100L, -50L	Counting, indexing, discrete data
Character	Text strings	"Hello", 'MBA', "2024"	Names, labels, categorical text
Logical	Boolean values	TRUE, FALSE	Conditions, filtering, binary outcomes
Complex	Complex numbers	1+2i, 3-4i	Mathematical computations, signal processing
Raw	Raw bytes	as.raw(65)	Binary data, file operations
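As a quick check of the table above, the following sketch creates one value of each atomic type and asks R for its internal type with typeof(). The list name `examples` and its element names are illustrative only.

```r
# One value of each atomic type (names are illustrative)
examples <- list(
  double    = 3.14,
  integer   = 100L,
  character = "MBA",
  logical   = TRUE,
  complex   = 1+2i,
  raw       = as.raw(65)
)

# typeof() reports the internal storage type of each value
types <- vapply(examples, typeof, character(1))
print(types)
```

Using typeof() rather than class() here makes the double/integer distinction visible, since class() reports "numeric" for doubles.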

3. Numeric Data Type (Double)

Definition and Characteristics

The numeric data type (technically “double” or double-precision floating-point) is the default numerical type in R. It can store decimal numbers with approximately 15-17 significant digits of precision.

✅ When to Use Numeric Data Type:

Financial Analysis: Stock prices, revenue, profit margins
Statistical Measurements: Mean, median, standard deviation
Scientific Calculations: Measurements, ratios, percentages
Business Metrics: ROI, conversion rates, growth rates

Example 1: Basic Numeric Operations

# Creating numeric variables
revenue <- 125000.50
expenses <- 87500.75
profit_margin <- 0.165

# Calculations
net_profit <- revenue - expenses
roi <- (net_profit / expenses) * 100

# Display results
cat("Revenue: $", revenue, "\n")
cat("Expenses: $", expenses, "\n")
cat("Net Profit: $", net_profit, "\n")
cat("ROI: ", roi, "%\n")

# Check data type
class(revenue)
typeof(revenue)

OUTPUT

Revenue: $ 125000.5 
Expenses: $ 87500.75 
Net Profit: $ 37499.75 
ROI:  42.85649 %
[1] "numeric"
[1] "double"

Example 2: Real-World Business Analytics

# Quarterly sales data
Q1_sales <- 250000.00
Q2_sales <- 275000.50
Q3_sales <- 310000.75
Q4_sales <- 340000.25

# Calculate the annual total, quarterly average, and Q4-vs-Q1 growth
total_sales <- Q1_sales + Q2_sales + Q3_sales + Q4_sales
average_quarterly <- total_sales / 4
growth_rate <- ((Q4_sales - Q1_sales) / Q1_sales) * 100

cat("Total Annual Sales: $", format(total_sales, big.mark=","), "\n")
cat("Average Quarterly Sales: $", format(average_quarterly, big.mark=","), "\n")
cat("Q4 vs Q1 Growth: ", round(growth_rate, 2), "%\n")

# Statistical analysis
sales_vector <- c(Q1_sales, Q2_sales, Q3_sales, Q4_sales)
cat("\nStatistical Summary:\n")
cat("Mean: $", mean(sales_vector), "\n")
cat("Median: $", median(sales_vector), "\n")
cat("Std Dev: $", round(sd(sales_vector), 2), "\n")

OUTPUT

Total Annual Sales: $ 1,175,002 
Average Quarterly Sales: $ 293,750.4 
Q4 vs Q1 Growth:  36 %

Statistical Summary:
Mean: $ 293750.4 
Median: $ 292500.6 
Std Dev: $ 39449.46

💡 Pro Tip: Numeric values keep their full precision during calculations. Use the round() function only to control decimal places for presentation, e.g. round(value, 2) for two decimal places.
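One consequence of double-precision storage is worth seeing once: some decimal fractions cannot be represented exactly, so exact equality tests on computed values can fail. The sketch below is a minimal illustration using base R's all.equal(), which compares within a small tolerance; the variable names are illustrative.

```r
# Doubles are binary floating-point values, so some decimals are stored approximately
a <- 0.1 + 0.2
b <- 0.3

exact_equal <- a == b                    # FALSE: the stored values differ slightly
near_equal  <- isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance

cat("a == b:", exact_equal, "\n")
cat("isTRUE(all.equal(a, b)):", near_equal, "\n")

# Rounding is for presentation; keep full precision while calculating
cat("Rounded for display:", round(a, 2), "\n")
```

For financial comparisons, testing with a tolerance (or comparing values rounded to cents) is usually safer than ==.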

4. Integer Data Type

Definition and Characteristics

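Integers are whole numbers without decimal points. R treats a bare numeric literal as a double, so to create an integer you append L to the number (e.g., 100L) or convert with as.integer(). Each integer occupies 4 bytes, compared with 8 bytes for a double, so the memory saving shows up in long vectors rather than in single values.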
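To make the memory difference concrete, the sketch below compares two vectors of one million elements each. The vector length and variable names are arbitrary choices for illustration; exact byte counts can vary slightly between R builds.

```r
n <- 1e6
int_vec <- integer(n)   # one million integers (4 bytes each)
dbl_vec <- numeric(n)   # one million doubles (8 bytes each)

int_bytes <- as.numeric(object.size(int_vec))
dbl_bytes <- as.numeric(object.size(dbl_vec))

cat("Integer vector:", format(int_bytes, big.mark = ","), "bytes\n")
cat("Double vector: ", format(dbl_bytes, big.mark = ","), "bytes\n")
cat("Ratio (double / integer):", round(dbl_bytes / int_bytes, 2), "\n")
```

For a single value the fixed per-object overhead dominates, which is why scalar integers and doubles report the same size.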

✅ When to Use Integer Data Type:

Counting: Number of customers, products, employees
Indexing: Array positions, database IDs
Discrete Data: Age in years, number of transactions
Memory Optimization: Large datasets with whole numbers

Example 3: Customer and Product Counting

# Integer variables for business metrics
total_customers <- 15000L
new_customers_month <- 450L
products_sold <- 8500L
inventory_count <- 12500L

# Calculations
customer_growth <- new_customers_month
remaining_inventory <- inventory_count - products_sold

cat("Total Customers: ", total_customers, "\n")
cat("New Customers This Month: ", new_customers_month, "\n")
cat("Products Sold: ", products_sold, "\n")
cat("Remaining Inventory: ", remaining_inventory, "\n")

# Verify data type
cat("\nData Type Check:\n")
cat("Class: ", class(total_customers), "\n")
cat("Type: ", typeof(total_customers), "\n")

# Comparison: integer vs numeric memory
# (single values report the same size; the saving appears in long vectors)
numeric_var <- 15000
integer_var <- 15000L
cat("\nMemory Comparison:\n")
cat("Numeric size: ", object.size(numeric_var), " bytes\n")
cat("Integer size: ", object.size(integer_var), " bytes\n")

OUTPUT

Total Customers:  15000 
New Customers This Month:  450 
Products Sold:  8500 
Remaining Inventory:  4000 

Data Type Check:
Class:  integer 
Type:  integer 

Memory Comparison:
Numeric size:  56  bytes
Integer size:  56  bytes

Example 4: Age Groups and Demographics

# Employee demographics (integers are perfect for age)
employee_ages <- c(25L, 32L, 45L, 28L, 55L, 38L, 42L, 29L)
num_employees <- length(employee_ages)

# Age analysis
youngest <- min(employee_ages)
oldest <- max(employee_ages)
median_age <- median(employee_ages)

# Age group categorization
under_30 <- sum(employee_ages < 30L)
age_30_to_40 <- sum(employee_ages >= 30L & employee_ages < 40L)
age_40_plus <- sum(employee_ages >= 40L)

cat("Employee Demographics Analysis\n")
cat("==============================\n")
cat("Total Employees: ", num_employees, "\n")
cat("Youngest Employee: ", youngest, " years\n")
cat("Oldest Employee: ", oldest, " years\n")
cat("Median Age: ", median_age, " years\n\n")

cat("Age Distribution:\n")
cat("Under 30: ", under_30, " employees\n")
cat("30-39: ", age_30_to_40, " employees\n")
cat("40+: ", age_40_plus, " employees\n")

OUTPUT

Employee Demographics Analysis
==============================
Total Employees:  8 
Youngest Employee:  25  years
Oldest Employee:  55  years
Median Age:  35  years

Age Distribution:
Under 30:  3  employees
30-39:  2  employees
40+:  3  employees

⚠️ Important: Without the "L" suffix, R treats numbers as numeric (double) by default. Always use "L" when you specifically need integers for memory efficiency or when working with functions that require integer inputs.
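Keeping a value an integer also depends on the operations applied to it. The sketch below shows, under illustrative variable names, which arithmetic operations preserve the integer type and what happens when a result exceeds the 32-bit integer range.

```r
x <- 7L
y <- 2L

typeof(x + y)     # "integer": integer arithmetic stays integer
typeof(x / y)     # "double": ordinary division always returns a double
typeof(x %/% y)   # "integer": integer division
typeof(x * 2)     # "double": mixing with a double promotes the result

# Integers are 32-bit, so very large results overflow to NA with a warning
big <- .Machine$integer.max
overflowed <- suppressWarnings(big + 1L)
is.na(overflowed) # TRUE
```

If counts may exceed roughly 2.1 billion, storing them as doubles avoids the overflow.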

5. Character Data Type

Definition and Characteristics

Character data type stores text strings. Strings must be enclosed in either single quotes ('text') or double quotes ("text"). This is essential for storing names, labels, descriptions, and categorical data.

✅ When to Use Character Data Type:

Identifiers: Customer names, product codes, IDs
Categorical Data: Department names, product categories, regions
Text Analysis: Customer reviews, feedback, descriptions
Labels: Chart labels, report headers, annotations

Example 5: Customer Information System

# Customer database
customer_names <- c("John Smith", "Sarah Johnson", "Michael Brown", "Emily Davis")
customer_ids <- c("CUST001", "CUST002", "CUST003", "CUST004")
departments <- c("Sales", "Marketing", "Finance", "Operations")
email_domains <- c("gmail.com", "company.com", "yahoo.com", "outlook.com")

# String operations
cat("Customer Database\n")
cat("=================\n")
for(i in 1:length(customer_names)) {
  cat("ID: ", customer_ids[i], " | Name: ", customer_names[i], 
      " | Dept: ", departments[i], "\n")
}

# String functions
cat("\nString Analysis:\n")
cat("First customer name length: ", nchar(customer_names[1]), " characters\n")
cat("Uppercase ID: ", toupper(customer_ids[1]), "\n")
cat("Lowercase email: ", tolower(email_domains[2]), "\n")

# Check data type
cat("\nData Type: ", class(customer_names), "\n")

OUTPUT

Customer Database
=================
ID:  CUST001  | Name:  John Smith  | Dept:  Sales 
ID:  CUST002  | Name:  Sarah Johnson  | Dept:  Marketing 
ID:  CUST003  | Name:  Michael Brown  | Dept:  Finance 
ID:  CUST004  | Name:  Emily Davis  | Dept:  Operations 

String Analysis:
First customer name length:  10  characters
Uppercase ID:  CUST001 
Lowercase email:  company.com 

Data Type:  character

Example 6: Product Catalog Management

# Product information
product_names <- c("Laptop Pro 15", "Wireless Mouse", "USB-C Hub", "Mechanical Keyboard")
product_categories <- c("Electronics", "Accessories", "Accessories", "Electronics")
product_skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023", "ELEC-KB-089")

# String concatenation and manipulation
cat("Product Catalog\n")
cat("===============\n\n")

for(i in 1:length(product_names)) {
  full_label <- paste(product_skus[i], "-", product_names[i])
  cat("Product ", i, ": ", full_label, "\n")
  cat("Category: ", product_categories[i], "\n")
  
  # Extract category prefix from SKU (everything before the first hyphen)
  sku_prefix <- sub("-.*", "", product_skus[i])
  cat("SKU Prefix: ", sku_prefix, "\n\n")
}

# Count products by category
electronics_count <- sum(product_categories == "Electronics")
accessories_count <- sum(product_categories == "Accessories")

cat("Category Summary:\n")
cat("Electronics: ", electronics_count, " products\n")
cat("Accessories: ", accessories_count, " products\n")

OUTPUT

Product Catalog
===============

Product  1 :  ELEC-LP-001 - Laptop Pro 15 
Category:  Electronics 
SKU Prefix:  ELEC 

Product  2 :  ACC-MS-045 - Wireless Mouse 
Category:  Accessories 
SKU Prefix:  ACC 

Product  3 :  ACC-HB-023 - USB-C Hub 
Category:  Accessories 
SKU Prefix:  ACC 

Product  4 :  ELEC-KB-089 - Mechanical Keyboard 
Category:  Electronics 
SKU Prefix:  ELEC 

Category Summary:
Electronics:  2  products
Accessories:  2  products

💡 Useful String Functions:

paste() - Concatenate strings
substr() - Extract substring
toupper(), tolower() - Change case
nchar() - Count characters
grep(), grepl() - Pattern matching
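The functions listed above can be combined in a few lines. This is a minimal sketch on made-up SKU strings; sprintf() and sub() are base R functions not listed above, included here because they are common companions for formatting and pattern-based extraction.

```r
skus <- c("ELEC-LP-001", "ACC-MS-045", "ACC-HB-023")  # illustrative SKUs

# paste0() joins without a separator; paste() uses sep = " " by default
labels <- paste0("SKU:", skus)

# sub() with a regular expression keeps everything before the first hyphen
prefixes <- sub("-.*$", "", skus)

# grepl() returns one logical per element; grep() returns matching positions
is_accessory  <- grepl("^ACC-", skus)
accessory_pos <- grep("^ACC-", skus)

# sprintf() gives fixed formatting, e.g. two decimal places
price_label <- sprintf("$%.2f", 1250.5)

print(prefixes)
print(is_accessory)
print(price_label)
```

A pattern-based extraction such as sub() is more robust than fixed positions with substr() when the pieces of a code have different lengths.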

6. Logical Data Type

Definition and Characteristics

Logical (or Boolean) data type has only two possible values: TRUE or FALSE. These are fundamental for conditional operations, filtering data, and control flow in programs.

✅ When to Use Logical Data Type:

Data Filtering: Selecting records that meet criteria
Conditional Checks: Testing if conditions are met
Binary Outcomes: Yes/No, Pass/Fail, Active/Inactive
Control Flow: If-else statements, while loops

Example 7: Sales Performance Evaluation

# Sales representatives performance
sales_reps <- c("Alice", "Bob", "Charlie", "Diana", "Edward")
monthly_sales <- c(125000, 87000, 145000, 92000, 156000)
target <- 100000

# Logical evaluations
met_target <- monthly_sales >= target
exceeded_120k <- monthly_sales > 120000
top_performer <- monthly_sales == max(monthly_sales)

# Performance report
cat("Monthly Sales Performance Report\n")
cat("================================\n\n")

for(i in 1:length(sales_reps)) {
  cat("Representative: ", sales_reps[i], "\n")
  cat("Sales: $", monthly_sales[i], "\n")
  cat("Met Target: ", met_target[i], "\n")
  cat("Exceeded $120K: ", exceeded_120k[i], "\n")
  cat("Top Performer: ", top_performer[i], "\n")
  cat("---\n")
}

# Summary statistics
total_met_target <- sum(met_target)
total_exceeded_120k <- sum(exceeded_120k)
percentage_met <- (total_met_target / length(sales_reps)) * 100

cat("\nSummary:\n")
cat("Representatives meeting target: ", total_met_target, " out of ", 
    length(sales_reps), "\n")
cat("Representatives exceeding $120K: ", total_exceeded_120k, "\n")
cat("Success Rate: ", percentage_met, "%\n")

# Check data type
cat("\nData Type: ", class(met_target), "\n")

OUTPUT

Monthly Sales Performance Report
================================

Representative:  Alice 
Sales: $ 125000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  FALSE 
---
Representative:  Bob 
Sales: $ 87000 
Met Target:  FALSE 
Exceeded $120K:  FALSE 
Top Performer:  FALSE 
---
Representative:  Charlie 
Sales: $ 145000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  FALSE 
---
Representative:  Diana 
Sales: $ 92000 
Met Target:  FALSE 
Exceeded $120K:  FALSE 
Top Performer:  FALSE 
---
Representative:  Edward 
Sales: $ 156000 
Met Target:  TRUE 
Exceeded $120K:  TRUE 
Top Performer:  TRUE 
---

Summary:
Representatives meeting target:  3  out of  5 
Representatives exceeding $120K:  3 
Success Rate:  60 %

Data Type:  logical

Example 8: Quality Control and Validation

# Product quality inspection
product_ids <- c("P001", "P002", "P003", "P004", "P005")
weight_kg <- c(5.2, 4.8, 5.0, 5.5, 4.9)
dimensions_ok <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
quality_passed <- c(TRUE, FALSE, TRUE, TRUE, TRUE)

# Specifications
min_weight <- 4.9
max_weight <- 5.3

# Validation checks
weight_in_range <- (weight_kg >= min_weight) & (weight_kg <= max_weight)
overall_pass <- weight_in_range & dimensions_ok & quality_passed

cat("Quality Control Report\n")
cat("=====================\n\n")

for(i in 1:length(product_ids)) {
  cat("Product ID: ", product_ids[i], "\n")
  cat("Weight: ", weight_kg[i], " kg | In Range: ", weight_in_range[i], "\n")
  cat("Dimensions OK: ", dimensions_ok[i], "\n")
  cat("Quality Test: ", quality_passed[i], "\n")
  cat("OVERALL STATUS: ", ifelse(overall_pass[i], "PASS ✓", "FAIL ✗"), "\n")
  cat("---\n")
}

# Summary
pass_count <- sum(overall_pass)
fail_count <- sum(!overall_pass)
pass_rate <- (pass_count / length(product_ids)) * 100

cat("\nQuality Summary:\n")
cat("Products Passed: ", pass_count, "\n")
cat("Products Failed: ", fail_count, "\n")
cat("Pass Rate: ", round(pass_rate, 1), "%\n")

OUTPUT

Quality Control Report
=====================

Product ID:  P001 
Weight:  5.2  kg | In Range:  TRUE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  PASS ✓ 
---
Product ID:  P002 
Weight:  4.8  kg | In Range:  FALSE 
Dimensions OK:  TRUE 
Quality Test:  FALSE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P003 
Weight:  5  kg | In Range:  TRUE 
Dimensions OK:  FALSE 
Quality Test:  TRUE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P004 
Weight:  5.5  kg | In Range:  FALSE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  FAIL ✗ 
---
Product ID:  P005 
Weight:  4.9  kg | In Range:  TRUE 
Dimensions OK:  TRUE 
Quality Test:  TRUE 
OVERALL STATUS:  PASS ✓ 
---

Quality Summary:
Products Passed:  2 
Products Failed:  3 
Pass Rate:  40 %

🔑 Key Point: Logical values are the result of comparison operations (>, <, ==, !=, >=, <=). They can be combined using logical operators: & (AND), | (OR), ! (NOT).
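Because TRUE counts as 1 and FALSE as 0 in arithmetic, logical vectors double as filters and as counters. The sketch below illustrates this with a small named vector; the names and figures are illustrative only.

```r
sales <- c(Alice = 125000, Bob = 87000, Charlie = 145000)  # illustrative values
met_target <- sales >= 100000

sales[met_target]          # logical indexing keeps the TRUE positions
names(which(met_target))   # which() gives the positions (here with names)
sum(met_target)            # TRUE counts as 1, so this is a count
mean(met_target)           # ...and the mean is a proportion
any(sales > 140000)        # is at least one value TRUE?
all(sales > 80000)         # are all values TRUE?

# & and | work element by element; && and || expect single values (for if statements)
if (any(met_target) && length(sales) > 0) cat("At least one rep met the target\n")
```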

7. Complex Data Type

Definition and Characteristics

Complex numbers have a real and an imaginary part (e.g., 3 + 4i). While less common in business analytics, they're essential for certain mathematical computations, signal processing, and engineering applications.

✅ When to Use Complex Data Type:

Engineering Calculations: Electrical engineering, signal processing
Advanced Mathematics: Fourier transforms, eigenvalue problems
Scientific Computing: Quantum mechanics, wave equations
Financial Engineering: Options pricing (advanced models)

Example 9: Complex Number Operations

# Creating complex numbers
z1 <- 3 + 4i
z2 <- 2 - 3i
z3 <- complex(real = 5, imaginary = 2)

# Basic operations
sum_z <- z1 + z2
diff_z <- z1 - z2
prod_z <- z1 * z2
quot_z <- z1 / z2

cat("Complex Number Operations\n")
cat("=========================\n\n")

cat("z1 = ", z1, "\n")
cat("z2 = ", z2, "\n\n")

cat("Addition (z1 + z2) = ", sum_z, "\n")
cat("Subtraction (z1 - z2) = ", diff_z, "\n")
cat("Multiplication (z1 × z2) = ", prod_z, "\n")
cat("Division (z1 ÷ z2) = ", quot_z, "\n\n")

# Properties of complex numbers
cat("Properties of z1:\n")
cat("Real part: ", Re(z1), "\n")
cat("Imaginary part: ", Im(z1), "\n")
cat("Modulus (|z1|): ", Mod(z1), "\n")
cat("Conjugate: ", Conj(z1), "\n")
cat("Argument (angle): ", Arg(z1), " radians\n\n")

# Check data type
cat("Data Type: ", class(z1), "\n")
cat("Type: ", typeof(z1), "\n")

OUTPUT

Complex Number Operations
=========================

z1 =  3+4i 
z2 =  2-3i 

Addition (z1 + z2) =  5+1i 
Subtraction (z1 - z2) =  1+7i 
Multiplication (z1 × z2) =  18-1i 
Division (z1 ÷ z2) =  -0.4615385+1.307692i 

Properties of z1:
Real part:  3 
Imaginary part:  4 
Modulus (|z1|):  5 
Conjugate:  3-4i 
Argument (angle):  0.9272952  radians

Data Type:  complex 
Type:  complex

Example 10: AC Circuit Impedance (Engineering Application)

# Electrical impedance calculation (AC circuit analysis)
# Impedance Z = R + jX, where R is resistance and X is reactance

resistance <- c(100, 150, 200)  # Ohms
reactance <- c(50, -75, 100)    # Ohms (positive for inductive, negative for capacitive)

# Create complex impedances
impedances <- complex(real = resistance, imaginary = reactance)

cat("AC Circuit Analysis - Impedance Calculations\n")
cat("============================================\n\n")

for(i in 1:length(impedances)) {
  z <- impedances[i]
  magnitude <- Mod(z)
  phase_rad <- Arg(z)
  phase_deg <- phase_rad * 180 / pi
  
  cat("Circuit ", i, ":\n")
  cat("Impedance: ", z, " Ω\n")
  cat("Magnitude: ", round(magnitude, 2), " Ω\n")
  cat("Phase Angle: ", round(phase_deg, 2), "°\n")
  
  if(Im(z) > 0) {
    cat("Type: Inductive\n")
  } else if(Im(z) < 0) {
    cat("Type: Capacitive\n")
  } else {
    cat("Type: Purely Resistive\n")
  }
  cat("---\n")
}

# Calculate total impedance (series circuit)
total_impedance <- sum(impedances)
cat("\nTotal Series Impedance: ", total_impedance, " Ω\n")
cat("Total Magnitude: ", round(Mod(total_impedance), 2), " Ω\n")

OUTPUT

AC Circuit Analysis - Impedance Calculations
============================================

Circuit  1 :
Impedance:  100+50i  Ω
Magnitude:  111.8  Ω
Phase Angle:  26.57 °
Type: Inductive
---
Circuit  2 :
Impedance:  150-75i  Ω
Magnitude:  167.71  Ω
Phase Angle:  -26.57 °
Type: Capacitive
---
Circuit  3 :
Impedance:  200+100i  Ω
Magnitude:  223.61  Ω
Phase Angle:  26.57 °
Type: Inductive
---

Total Series Impedance:  450+75i  Ω
Total Magnitude:  456.21  Ω

⚠️ Note: Complex data types are rarely used in typical business analytics but are essential for specialized fields like engineering, physics, and advanced financial modeling. Most MBA students won't need them frequently, but understanding they exist is important.
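One place where the complex type appears even in ordinary work is when a real-valued calculation has no real answer. The sketch below, with illustrative variable names, contrasts sqrt() on a negative double with the same call on a complex value, and uses base R's polyroot() to solve x^2 + 1 = 0.

```r
# sqrt() of a negative double returns NaN (with a warning) ...
real_root <- suppressWarnings(sqrt(-4))

# ... but the same input as a complex number has a well-defined root
complex_root <- sqrt(as.complex(-4))

# polyroot() takes coefficients in increasing order: 1 + 0*x + 1*x^2
roots <- polyroot(c(1, 0, 1))

print(real_root)
print(complex_root)
print(roots)
```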

8. Raw Data Type

Definition and Characteristics

The raw data type stores data as raw bytes (values from 0 to 255). It's used for low-level data operations, binary file handling, and interfacing with external systems.

✅ When to Use Raw Data Type:

Binary File Operations: Reading/writing binary files
Data Encryption: Cryptographic operations
Network Communication: Socket programming, protocols
Low-level Data Processing: Image processing, audio data

Example 11: Raw Data Basics

# Creating raw data
raw_bytes <- as.raw(c(65, 66, 67, 68, 69))  # ASCII codes for A, B, C, D, E
raw_single <- as.raw(72)  # ASCII for 'H'

cat("Raw Data Type Demonstration\n")
cat("===========================\n\n")

cat("Raw bytes: ")
print(raw_bytes)
cat("\n")

# Convert raw to character
chars <- rawToChar(raw_bytes)
cat("Converted to characters: ", chars, "\n\n")

# Convert character to raw
text <- "HELLO"
raw_from_text <- charToRaw(text)
cat("Text '", text, "' as raw bytes: ")
print(raw_from_text)
cat("\n")

# Check data type
cat("Data Type: ", class(raw_bytes), "\n")
cat("Type: ", typeof(raw_bytes), "\n\n")

# Size comparison
numeric_vector <- c(65, 66, 67, 68, 69)
cat("Memory Usage Comparison:\n")
cat("Numeric vector: ", object.size(numeric_vector), " bytes\n")
cat("Raw vector: ", object.size(raw_bytes), " bytes\n")

OUTPUT

Raw Data Type Demonstration
===========================

Raw bytes: [1] 41 42 43 44 45

Converted to characters:  ABCDE 

Text ' HELLO ' as raw bytes: [1] 48 45 4c 4c 4f

Data Type:  raw 
Type:  raw 

Memory Usage Comparison:
Numeric vector:  96  bytes
Raw vector:  56  bytes

💡 Practical Insight: Raw data types are memory-efficient for storing binary data but are rarely used in typical business analytics. They're more relevant for data engineers and systems programmers.
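To show what byte-level work looks like, the sketch below converts text to raw bytes, inspects the byte values, and applies a bitwise XOR with a repeated key byte. The key value is an arbitrary assumption, and this is a toy reversible transformation for illustration, not real encryption.

```r
message_text <- "HELLO"
bytes <- charToRaw(message_text)

as.integer(bytes)   # byte values as ordinary integers: 72 69 76 76 79

# xor() on raw vectors works bit by bit; applying the same key twice restores the input
key <- as.raw(rep(0x2A, length(bytes)))   # arbitrary illustrative key, one byte per input byte
scrambled <- xor(bytes, key)
restored  <- rawToChar(xor(scrambled, key))

print(scrambled)
print(restored)
```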

9. Data Type Conversion (Coercion)

Understanding Type Conversion

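R converts between data types in two ways: explicitly, through functions such as as.numeric(), as.integer(), as.character(), and as.logical(), and implicitly (coercion), when values of different types are combined. Understanding when and how each happens is crucial for data manipulation.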
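Implicit coercion is easiest to see with c(), which must return a vector of a single type. The sketch below shows that R promotes along the order logical, integer, double, character; the variable name `mixed` is illustrative.

```r
typeof(c(TRUE, 2L))        # "integer"
typeof(c(2L, 3.5))         # "double"
typeof(c(3.5, "text"))     # "character"
typeof(c(TRUE, "text"))    # "character"

# A single stray string turns the whole vector into text
mixed <- c(10, "20", TRUE)
print(mixed)
typeof(mixed)
```

This is why one non-numeric entry in a column can silently turn every value in that column into a character string.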

Example 12: Explicit Type Conversion

# Starting with different types
num_value <- 42.7
char_number <- "123"
char_text <- "Hello"
logical_value <- TRUE
integer_val <- 100L

cat("Type Conversion Examples\n")
cat("========================\n\n")

# Numeric to Integer (as.integer() truncates toward zero; it does not round)
num_to_int <- as.integer(num_value)
cat("Numeric to Integer:\n")
cat(num_value, " (", class(num_value), ") -> ", 
    num_to_int, " (", class(num_to_int), ")\n\n")

# Character to Numeric
char_to_num <- as.numeric(char_number)
cat("Character to Numeric:\n")
cat("'", char_number, "' (", class(char_number), ") -> ", 
    char_to_num, " (", class(char_to_num), ")\n\n")

# Numeric to Character
num_to_char <- as.character(num_value)
cat("Numeric to Character:\n")
cat(num_value, " (", class(num_value), ") -> '", 
    num_to_char, "' (", class(num_to_char), ")\n\n")

# Logical to Numeric
logical_to_num <- as.numeric(logical_value)
cat("Logical to Numeric:\n")
cat(logical_value, " (", class(logical_value), ") -> ", 
    logical_to_num, " (", class(logical_to_num), ")\n\n")

# Attempting invalid conversion
invalid_conversion <- as.numeric(char_text)
cat("Invalid Conversion (generates warning):\n")
cat("'", char_text, "' to numeric -> ", invalid_conversion, "\n")
cat("(NA = Not Available/Missing Value)\n")

OUTPUT

Type Conversion Examples
========================

Numeric to Integer:
42.7  ( numeric ) ->  42  ( integer )

Character to Numeric:
' 123 ' ( character ) ->  123  ( numeric )

Numeric to Character:
42.7  ( numeric ) -> ' 42.7 ' ( character )

Logical to Numeric:
TRUE  ( logical ) ->  1  ( numeric )

Invalid Conversion (generates warning):
' Hello ' to numeric ->  NA 
(NA = Not Available/Missing Value)

Example 13: Practical Business Application of Conversion

# Data imported from Excel/CSV often comes as character
revenue_char <- c("125000", "87500", "156000", "92000", "134500")
year_char <- c("2020", "2021", "2022", "2023", "2024")
profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")

cat("Data Cleaning and Type Conversion\n")
cat("==================================\n\n")

# Convert to appropriate types
revenue_num <- as.numeric(revenue_char)
year_int <- as.integer(year_char)
profitable_log <- as.logical(profitable_char)

# Perform calculations (only possible after conversion)
total_revenue <- sum(revenue_num)
avg_revenue <- mean(revenue_num)
num_profitable <- sum(profitable_log)
profitable_rate <- (num_profitable / length(profitable_log)) * 100

cat("Financial Analysis (After Type Conversion)\n")
cat("------------------------------------------\n")
cat("Total Revenue: $", format(total_revenue, big.mark=","), "\n")
cat("Average Revenue: $", format(avg_revenue, big.mark=","), "\n")
cat("Profitable Years: ", num_profitable, " out of ", length(year_int), "\n")
cat("Profitability Rate: ", round(profitable_rate, 1), "%\n\n")

# Yearly breakdown
cat("Yearly Breakdown:\n")
for(i in 1:length(year_int)) {
  status <- ifelse(profitable_log[i], "PROFITABLE", "LOSS")
  cat("Year ", year_int[i], ": $", format(revenue_num[i], big.mark=","), 
      " - ", status, "\n")
}

# Data type verification
cat("\nData Types After Conversion:\n")
cat("Revenue: ", class(revenue_num), "\n")
cat("Year: ", class(year_int), "\n")
cat("Profitable: ", class(profitable_log), "\n")

OUTPUT

Data Cleaning and Type Conversion
==================================

Financial Analysis (After Type Conversion)
------------------------------------------
Total Revenue: $ 595,000 
Average Revenue: $ 119,000 
Profitable Years:  3  out of  5 
Profitability Rate:  60 %

Yearly Breakdown:
Year  2020 : $ 125,000  -  PROFITABLE 
Year  2021 : $ 87,500  -  LOSS 
Year  2022 : $ 156,000  -  PROFITABLE 
Year  2023 : $ 92,000  -  PROFITABLE 
Year  2024 : $ 134,500  -  LOSS 

Data Types After Conversion:
Revenue:  numeric 
Year:  integer 
Profitable:  logical

🔑 Critical Business Insight: When importing data from Excel, CSV, or databases, numeric columns can arrive as character strings, for example when values contain currency symbols, thousands separators, or placeholder text. Always verify and convert data types before performing calculations to avoid errors.
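Imported values often need cleaning before conversion will succeed. The sketch below is one possible approach on made-up input, assuming "." is the decimal separator: it strips everything except digits, the decimal point, and a minus sign with gsub(), then converts. Entries with no numeric content become NA.

```r
raw_revenue <- c("$125,000.50", "87,500", " 156000 ", "n/a")  # illustrative import

# Direct conversion yields NA (with a warning) for formatted values
direct <- suppressWarnings(as.numeric(raw_revenue))

# Strip everything except digits, decimal point and minus sign, then convert
cleaned <- gsub("[^0-9.-]", "", raw_revenue)
revenue <- suppressWarnings(as.numeric(cleaned))

print(direct)
print(revenue)
total <- sum(revenue, na.rm = TRUE)   # na.rm = TRUE skips the missing entry
print(total)
```

Checking sum(is.na(revenue)) after a conversion like this is a quick way to see how many values could not be recovered.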

10. Checking and Verifying Data Types

Essential Functions for Type Checking

R provides several functions to check data types. Using these functions is crucial for debugging and ensuring data quality.

Example 14: Comprehensive Type Checking

# Create variables of different types
my_numeric <- 45.6
my_integer <- 100L
my_character <- "Business Analytics"
my_logical <- FALSE
my_complex <- 3 + 2i

cat("Data Type Verification Functions\n")
cat("=================================\n\n")

# Using class() function
cat("Using class() function:\n")
cat("my_numeric: ", class(my_numeric), "\n")
cat("my_integer: ", class(my_integer), "\n")
cat("my_character: ", class(my_character), "\n")
cat("my_logical: ", class(my_logical), "\n")
cat("my_complex: ", class(my_complex), "\n\n")

# Using typeof() function (more specific)
cat("Using typeof() function:\n")
cat("my_numeric: ", typeof(my_numeric), "\n")
cat("my_integer: ", typeof(my_integer), "\n")
cat("my_character: ", typeof(my_character), "\n")
cat("my_logical: ", typeof(my_logical), "\n")
cat("my_complex: ", typeof(my_complex), "\n\n")

# Using is.* functions (returns TRUE/FALSE)
cat("Using is.* verification functions:\n")
cat("is.numeric(my_numeric): ", is.numeric(my_numeric), "\n")
cat("is.integer(my_integer): ", is.integer(my_integer), "\n")
cat("is.character(my_character): ", is.character(my_character), "\n")
cat("is.logical(my_logical): ", is.logical(my_logical), "\n")
cat("is.complex(my_complex): ", is.complex(my_complex), "\n\n")

# Checking multiple conditions
cat("Advanced Checking:\n")
cat("Is my_integer numeric? ", is.numeric(my_integer), "\n")
cat("Is my_integer specifically integer? ", is.integer(my_integer), "\n")
cat("Is '123' numeric? ", is.numeric("123"), "\n")
cat("Is '123' character? ", is.character("123"), "\n")

OUTPUT

Data Type Verification Functions
=================================

Using class() function:
my_numeric:  numeric 
my_integer:  integer 
my_character:  character 
my_logical:  logical 
my_complex:  complex 

Using typeof() function:
my_numeric:  double 
my_integer:  integer 
my_character:  character 
my_logical:  logical 
my_complex:  complex 

Using is.* verification functions:
is.numeric(my_numeric):  TRUE 
is.integer(my_integer):  TRUE 
is.character(my_character):  TRUE 
is.logical(my_logical):  TRUE 
is.complex(my_complex):  TRUE 

Advanced Checking:
Is my_integer numeric?  TRUE 
Is my_integer specifically integer?  TRUE 
Is '123' numeric?  FALSE 
Is '123' character?  TRUE

Example 15: Data Quality Validation Function

# Real-world data validation scenario
# Function to validate imported data

validate_data <- function(data_value, expected_type, variable_name) {
  cat("Validating: ", variable_name, "\n")
  cat("Value: ", data_value, "\n")
  cat("Expected Type: ", expected_type, "\n")
  
  is_valid <- switch(expected_type,
                     "numeric" = is.numeric(data_value),
                     "integer" = is.integer(data_value),
                     "character" = is.character(data_value),
                     "logical" = is.logical(data_value),
                     FALSE)
  
  actual_type <- class(data_value)
  cat("Actual Type: ", actual_type, "\n")
  
  if(is_valid) {
    cat("✓ VALIDATION PASSED\n")
  } else {
    cat("✗ VALIDATION FAILED - Type Mismatch!\n")
  }
  cat("---\n")
  invisible(is_valid)  # return the result without auto-printing it at the console
}

# Test the validation function
cat("Data Import Validation Report\n")
cat("==============================\n\n")

customer_id <- 12345L
customer_name <- "John Smith"
purchase_amount <- 1250.50
is_premium <- TRUE
invalid_data <- "not_a_number"

validate_data(customer_id, "integer", "Customer ID")
validate_data(customer_name, "character", "Customer Name")
validate_data(purchase_amount, "numeric", "Purchase Amount")
validate_data(is_premium, "logical", "Premium Status")
validate_data(invalid_data, "numeric", "Invalid Numeric Field")

OUTPUT

Data Import Validation Report
==============================

Validating:  Customer ID 
Value:  12345 
Expected Type:  integer 
Actual Type:  integer 
✓ VALIDATION PASSED
---
Validating:  Customer Name 
Value:  John Smith 
Expected Type:  character 
Actual Type:  character 
✓ VALIDATION PASSED
---
Validating:  Purchase Amount 
Value:  1250.5 
Expected Type:  numeric 
Actual Type:  numeric 
✓ VALIDATION PASSED
---
Validating:  Premium Status 
Value:  TRUE 
Expected Type:  logical 
Actual Type:  logical 
✓ VALIDATION PASSED
---
Validating:  Invalid Numeric Field 
Value:  not_a_number 
Expected Type:  numeric 
Actual Type:  character 
✗ VALIDATION FAILED - Type Mismatch!
---

11. Decision Guide: Choosing the Right Data Type

Scenario	Recommended Data Type	Reason	Example
Stock prices, revenue	Numeric	Need decimal precision	125.75, 1500.50
Number of customers, products	Integer	Whole numbers, memory efficient	1500L, 2500L
Customer names, IDs	Character	Text data, identifiers	"John", "CUST001"
Yes/No, Pass/Fail	Logical	Binary outcomes, filtering	TRUE, FALSE
Age in years	Integer	Discrete whole numbers	25L, 45L
Percentages, ratios	Numeric	Fractional values	0.15, 1.25
Product categories	Character	Categorical text	"Electronics"
Date as text	Character	Then convert to Date type	"2024-01-15"
Survey responses (Yes/No)	Logical	Binary responses	TRUE, FALSE
Ratings (1-5)	Integer	Discrete values	1L, 2L, 3L, 4L, 5L
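The choices in the table can be combined in a single data frame. The sketch below uses invented column names and values, and shows the text-to-Date conversion mentioned in the table with as.Date().

```r
orders <- data.frame(
  customer_id = c("CUST001", "CUST002", "CUST003"),   # character identifier
  units       = c(3L, 1L, 5L),                         # integer count
  unit_price  = c(125.75, 1500.50, 19.99),             # numeric amount
  is_premium  = c(TRUE, FALSE, TRUE),                  # logical flag
  order_date  = as.Date(c("2024-01-15", "2024-02-03", "2024-02-20")),
  stringsAsFactors = FALSE
)

# str() summarizes the type of every column at once
str(orders)
col_classes <- vapply(orders, function(col) class(col)[1], character(1))
print(col_classes)

# Integer * double gives a double, which suits a monetary total
orders$total <- orders$units * orders$unit_price
```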

12. Common Data Type Mistakes and How to Avoid Them

❌ Common Mistake #1: Forgetting the "L" suffix for integers

Wrong: count <- 100 (creates numeric)

Right: count <- 100L (creates integer)

❌ Common Mistake #2: Performing calculations on character data

Problem: sum(c("100", "200")) will cause an error

Solution: sum(as.numeric(c("100", "200")))

❌ Common Mistake #3: Mixing data types in comparisons

Confusing: "100" == 100 returns TRUE because R converts the number to the string "100" and compares text, so "100.0" == 100 returns FALSE

Better: Explicitly convert: as.numeric("100") == 100

❌ Common Mistake #4: Not checking data types after import

Problem: CSV imports often make numbers into characters

Solution: Always use str() or class() after importing
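A small sketch of that check follows. It reads an inline CSV (invented data) in which one revenue value is the placeholder "N/A", which read.csv() does not treat as missing by default, so the whole column is imported as character. Declaring the placeholder through the na.strings argument is one way to fix it.

```r
csv_text <- "region,units,revenue
North,120,15000.50
South,95,N/A
East,130,17250.00"

imported <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(imported)                               # revenue shows up as chr
class_before <- class(imported$revenue)

# Tell read.csv() which strings mean "missing" so the column parses as numeric
fixed <- read.csv(text = csv_text, na.strings = c("NA", "N/A"),
                  stringsAsFactors = FALSE)
class_after <- class(fixed$revenue)

cat("Before:", class_before, "| After:", class_after, "\n")
```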

Example 16: Debugging Type Errors

# Simulating common data type errors

cat("Common Data Type Errors and Solutions\n")
cat("======================================\n\n")

# Error 1: Operating on character numbers
sales_data <- c("1000", "1500", "2000")
cat("ERROR SCENARIO 1: Character numbers\n")
cat("Data: ", paste(sales_data, collapse=", "), "\n")
cat("Type: ", class(sales_data), "\n")
# Attempting: total <- sum(sales_data)  # This would error!
cat("Problem: Cannot sum character data\n")
cat("Solution: Convert first\n")
sales_numeric <- as.numeric(sales_data)
total <- sum(sales_numeric)
cat("Result after conversion: ", total, "\n\n")

# Error 2: Type mismatch in filtering
ages <- c(25L, 30L, 35L, 40L, 45L)
threshold <- "35"  # Accidentally a character!
cat("ERROR SCENARIO 2: Type mismatch in comparison\n")
cat("Ages: ", paste(ages, collapse=", "), "\n")
cat("Threshold: ", threshold, " (", class(threshold), ")\n")
cat("Comparison result: ages > threshold\n")
result <- ages > threshold
cat("Result: ", paste(result, collapse=", "), "\n")
cat("Warning: R coerced types, but results may be unexpected!\n")
cat("Solution: Ensure matching types\n")
threshold_correct <- 35L
result_correct <- ages > threshold_correct
cat("Correct result: ", paste(result_correct, collapse=", "), "\n\n")

# Error 3: Forgotten L suffix
cat("ERROR SCENARIO 3: Integer vs Numeric\n")
count1 <- 100
count2 <- 100L
cat("count1 = 100 -> Type: ", typeof(count1), "\n")
cat("count2 = 100L -> Type: ", typeof(count2), "\n")
cat("For counting operations, use integer for memory efficiency\n")

OUTPUT

Common Data Type Errors and Solutions
======================================

ERROR SCENARIO 1: Character numbers
Data:  1000, 1500, 2000 
Type:  character 
Problem: Cannot sum character data
Solution: Convert first
Result after conversion:  4500 

ERROR SCENARIO 2: Type mismatch in comparison
Ages:  25, 30, 35, 40, 45 
Threshold:  35  ( character )
Comparison result: ages > threshold
Result:  FALSE, FALSE, FALSE, TRUE, TRUE 
Warning: R coerced types, but results may be unexpected!
Solution: Ensure matching types
Correct result:  FALSE, FALSE, FALSE, TRUE, TRUE 

ERROR SCENARIO 3: Integer vs Numeric
count1 = 100 -> Type:  double 
count2 = 100L -> Type:  integer 
For counting operations, use integer for memory efficiency
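
In Scenario 2 the coerced comparison happens to give the same answer as the numeric one, which can hide the problem. The sketch below uses different, made-up ages to show why the result is unreliable: when one side is a character, R compares both sides as text, character by character.

```r
ages <- c(5L, 100L, 250L)   # illustrative values, not the data from Example 16

# Character threshold: ages become "5", "100", "250" and are compared as text
as_text <- ages > "35"

# Integer threshold: an ordinary numeric comparison
as_number <- ages > 35L

cat("ages > \"35\": ", paste(as_text, collapse = ", "), "\n")
cat("ages > 35L:  ", paste(as_number, collapse = ", "), "\n")
```

As text, "5" sorts after "35" while "100" and "250" sort before it, so the two comparisons disagree on every element.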

13. Practical Exercises

📝 Exercise 1: Customer Database Analysis

Objective: Create a customer database and perform type-specific operations

Task:

Create vectors for:
- Customer IDs (integer): 1001L to 1005L
- Customer names (character): Any 5 names
- Purchase amounts (numeric): 5 decimal values between $100 and $1,200
- Premium membership status (logical): 3 TRUE, 2 FALSE
Calculate:
- Total purchases
- Average purchase amount
- Number of premium customers
- Percentage of premium customers
Verify data types using class() for each vector
Create a report displaying all customers with their details

Expected Output Format:

Customer Database Report
========================
ID: 1001 | Name: [Name] | Purchase: $XXX.XX | Premium: TRUE/FALSE
...

Summary Statistics:
Total Purchases: $XXXX.XX
Average Purchase: $XXX.XX
Premium Customers: X out of 5 (XX%)

📝 Exercise 2: Product Inventory Management

Objective: Practice integer operations and logical filtering

Task:

Create inventory data:
- Product IDs (character): "PROD001" to "PROD010"
- Stock quantities (integer): Random values between 10L and 500L
- Reorder threshold: 100L
Identify products that need reordering (stock < threshold)
Calculate:
- Total inventory count
- Number of products needing reorder
- Percentage of products with stock below the threshold
Display a report showing which products need reordering

Hint: Use logical vectors for filtering
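
The hint can be illustrated on a tiny made-up example (deliberately not the exercise data). This is only a sketch of the pattern: the item labels and quantities are hypothetical.

```r
stock <- c(12L, 340L, 95L, 150L)     # hypothetical quantities
items <- c("A", "B", "C", "D")       # hypothetical labels
threshold <- 100L

low <- stock < threshold             # logical vector, one TRUE/FALSE per item
low_items <- items[low]              # subsetting keeps only the TRUE positions
n_low <- sum(low)                    # TRUE counts as 1, so sum() counts matches
pct_low <- mean(low) * 100           # share of TRUE values as a percentage

# sample() on an integer range returns integers, useful for random stock levels
set.seed(1)
random_stock <- sample(10:500, 4)

cat("Low items:", paste(low_items, collapse = ", "), "\n")
cat("Count:", n_low, "| Percent:", pct_low, "\n")
cat("Random stock type:", typeof(random_stock), "\n")
```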

📝 Exercise 3: Sales Data Type Conversion

Objective: Practice converting data types (simulating CSV import)

Task:

Create "imported" data as character vectors:
- sales_char <- c("15000", "22000", "18500", "31000", "27500")
- year_char <- c("2020", "2021", "2022", "2023", "2024")
- profitable_char <- c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE")
Convert to appropriate data types
Perform analysis:
- Calculate total and average sales
- Count profitable years
- Calculate year-over-year growth rate
Verify all conversions using class()

Bonus: Create a function that automates the conversion process

📝 Exercise 4: Employee Performance Evaluation

Objective: Combine multiple data types in a real-world scenario

Task:

Create employee data:
- Employee IDs (character): "EMP001" to "EMP008"
- Names (character): Any 8 names
- Ages (integer): Between 25L and 60L
- Sales figures (numeric): Decimal values
- Met target (logical): Based on sales > $50,000
Create analysis:
- Categorize employees by age group (< 30, 30-44, 45+)
- Calculate average sales by age group
- Identify top 3 performers
- Calculate percentage who met target
Generate a comprehensive performance report

📝 Exercise 5: Data Validation System

Objective: Build a comprehensive data validation system

Task:

Create a mixed dataset with intentional errors:
- customer_ids: Mix of numeric and character (some invalid)
- ages: Include some negative values and character entries
- revenues: Include some non-numeric text
- status: Mix of TRUE/FALSE and "Yes"/"No" strings
Write validation functions to:
- Check data type of each variable
- Identify invalid entries
- Count errors by type
- Suggest corrections
Generate a validation report showing:
- Expected vs actual data types
- List of invalid entries
- Total error count
- Corrected dataset

Advanced Challenge: Create an automated cleaning function that fixes common data type errors
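
One possible starting point for the validation functions is sketched below. The helper name `find_non_numeric` and the sample data are hypothetical, and the approach (attempt the conversion, then flag entries that turned into NA) is just one of several reasonable designs.

```r
# Hypothetical helper: which entries cannot be read as numbers?
find_non_numeric <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  # Invalid = present in the input, but NA after conversion
  bad <- !is.na(x) & is.na(converted)
  list(values = converted, invalid = x[bad], n_invalid = sum(bad))
}

revenues_raw <- c("1200.50", "abc", "980", NA, "1,500")   # made-up data
check <- find_non_numeric(revenues_raw)
cat("Invalid entries:", paste(check$invalid, collapse = ", "), "\n")
cat("Error count:", check$n_invalid, "\n")

# Mapping "Yes"/"No" style strings onto logical values
status_raw <- c("TRUE", "No", "yes", "FALSE")
status <- as.logical(status_raw)            # "No" and "yes" are not recognised: NA
status[tolower(status_raw) == "yes"] <- TRUE
status[tolower(status_raw) == "no"]  <- FALSE
cat("Cleaned status:", paste(status, collapse = ", "), "\n")
```

Genuinely missing values (NA in the input) are kept separate from invalid ones here, so the error count reflects only entries that were present but unparseable.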

14. Solution to Exercise 1

Complete Solution: Customer Database Analysis

# Exercise 1 Solution: Customer Database Analysis

# Step 1: Create vectors with appropriate data types
customer_ids <- c(1001L, 1002L, 1003L, 1004L, 1005L)
customer_names <- c("Alice Johnson", "Bob Smith", "Carol Williams", 
                    "David Brown", "Emma Davis")
purchase_amounts <- c(450.75, 892.50, 325.99, 1150.00, 678.25)
premium_status <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Step 2: Verify data types
cat("Data Type Verification\n")
cat("======================\n")
cat("Customer IDs: ", class(customer_ids), "\n")
cat("Customer Names: ", class(customer_names), "\n")
cat("Purchase Amounts: ", class(purchase_amounts), "\n")
cat("Premium Status: ", class(premium_status), "\n\n")

# Step 3: Calculate statistics
total_purchases <- sum(purchase_amounts)
average_purchase <- mean(purchase_amounts)
num_premium <- sum(premium_status)
premium_percentage <- (num_premium / length(premium_status)) * 100

# Step 4: Generate comprehensive report
cat("Customer Database Report\n")
cat("========================\n\n")

for(i in seq_along(customer_ids)) {
  premium_label <- ifelse(premium_status[i], "✓ Premium", "Standard")
  cat(sprintf("ID: %d | Name: %-20s | Purchase: $%8.2f | Status: %s\n", 
              customer_ids[i], customer_names[i], 
              purchase_amounts[i], premium_label))
}

cat("\n")
cat(rep("=", 71), "\n", sep="")
cat("Summary Statistics\n")
cat(rep("=", 71), "\n", sep="")
cat(sprintf("Total Purchases:        $%10.2f\n", total_purchases))
cat(sprintf("Average Purchase:       $%10.2f\n", average_purchase))
cat(sprintf("Premium Customers:      %d out of %d (%.1f%%)\n", 
            num_premium, length(customer_ids), premium_percentage))
cat(sprintf("Standard Customers:     %d out of %d (%.1f%%)\n", 
            length(customer_ids) - num_premium, length(customer_ids), 
            100 - premium_percentage))

# Step 5: Additional analysis - Premium vs Standard spending
premium_purchases <- purchase_amounts[premium_status]
standard_purchases <- purchase_amounts[!premium_status]

cat("\nDetailed Analysis\n")
cat(rep("-", 71), "\n", sep="")
cat(sprintf("Avg Premium Purchase:   $%10.2f\n", mean(premium_purchases)))
cat(sprintf("Avg Standard Purchase:  $%10.2f\n", mean(standard_purchases)))
cat(sprintf("Highest Purchase:       $%10.2f (%s)\n", 
            max(purchase_amounts), 
            customer_names[which.max(purchase_amounts)]))
cat(sprintf("Lowest Purchase:        $%10.2f (%s)\n", 
            min(purchase_amounts), 
            customer_names[which.min(purchase_amounts)]))

OUTPUT

Data Type Verification
======================
Customer IDs:  integer 
Customer Names:  character 
Purchase Amounts:  numeric 
Premium Status:  logical 

Customer Database Report
========================

ID: 1001 | Name: Alice Johnson        | Purchase: $  450.75 | Status: ✓ Premium
ID: 1002 | Name: Bob Smith            | Purchase: $  892.50 | Status: ✓ Premium
ID: 1003 | Name: Carol Williams       | Purchase: $  325.99 | Status: Standard
ID: 1004 | Name: David Brown          | Purchase: $ 1150.00 | Status: ✓ Premium
ID: 1005 | Name: Emma Davis           | Purchase: $  678.25 | Status: Standard

=======================================================================
Summary Statistics
=======================================================================
Total Purchases:        $   3497.49
Average Purchase:       $    699.50
Premium Customers:      3 out of 5 (60.0%)
Standard Customers:     2 out of 5 (40.0%)

Detailed Analysis
-----------------------------------------------------------------------
Avg Premium Purchase:   $    831.08
Avg Standard Purchase:  $    502.12
Highest Purchase:       $   1150.00 (David Brown)
Lowest Purchase:        $    325.99 (Carol Williams)

15. Best Practices Summary

🎯 Golden Rules for Using Data Types in R

Best Practice	Why It Matters	How to Implement
1. Always verify data types after import	CSV/Excel imports often misinterpret types	Use `str()`, `class()`, or `typeof()`
2. Use integer for counting	Integers use 4 bytes per element versus 8 for doubles, which adds up in large datasets	Add "L" suffix: `count <- 100L`
3. Convert before calculation	Prevents errors and unexpected results	`as.numeric()`, `as.integer()`, etc.
4. Use logical for filtering	Clear, efficient, and readable	`data[data$value > 100, ]`
5. Keep consistent types in vectors	R silently coerces mixed types to the most general one (logical → integer → double → character)	Store different types in separate vectors
6. Document expected types	Makes code maintainable	Add comments explaining data type choices
7. Validate external data	User input and imports can be unreliable	Create validation functions
8. Use character for IDs	IDs shouldn't be used in calculations	`customer_id <- "CUST001"`
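
Rules 2 and 5 can be checked directly. The following is a small sketch rather than a benchmark: the vector length is arbitrary, and exact byte counts may vary slightly between R builds.

```r
# Rule 2: integers use about half the memory of doubles
n <- 1e6
size_int <- as.numeric(utils::object.size(integer(n)))   # roughly 4 bytes per element
size_dbl <- as.numeric(utils::object.size(numeric(n)))   # roughly 8 bytes per element
cat("integer:", size_int, "bytes | double:", size_dbl, "bytes\n")

# Caveat for Rule 2: integers have a fixed upper limit
big <- .Machine$integer.max
overflow <- suppressWarnings(big + 1L)   # becomes NA (R also issues a warning)
cat("integer.max:", big, "| integer.max + 1L:", overflow, "\n")

# Rule 5: c() coerces mixed inputs to the most general type present
mixed_num  <- c(1L, 2.5, TRUE)       # logical and integer are promoted to double
mixed_char <- c(1L, 2.5, "three")    # one string turns everything into character
cat(typeof(mixed_num), "|", typeof(mixed_char), "\n")
```

The overflow caveat is the main reason to stay with doubles for counts that could exceed roughly 2.1 billion.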

16. Quick Reference Card

📋 Data Type Quick Reference

Function	Purpose	Example	Returns
`class(x)`	Get object class	`class(100L)`	"integer"
`typeof(x)`	Get internal type	`typeof(100L)`	"integer"
`is.numeric(x)`	Check if numeric	`is.numeric(45.6)`	TRUE
`is.integer(x)`	Check if integer	`is.integer(100L)`	TRUE
`is.character(x)`	Check if character	`is.character("text")`	TRUE
`is.logical(x)`	Check if logical	`is.logical(TRUE)`	TRUE
`as.numeric(x)`	Convert to numeric	`as.numeric("123")`	123
`as.integer(x)`	Convert to integer	`as.integer(45.7)`	45L
`as.character(x)`	Convert to character	`as.character(100)`	"100"
`as.logical(x)`	Convert to logical	`as.logical(1)`	TRUE
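
A few of the functions in the table behave in ways that are easy to misread, so here is a short sketch of the less obvious cases. The variable names are illustrative only.

```r
x <- 45.7

# class() reports "numeric" for a double, while typeof() reports the storage type
x_class  <- class(x)     # "numeric"
x_typeof <- typeof(x)    # "double"

# as.integer() truncates toward zero; round first if rounding is intended
truncated <- as.integer(x)           # 45L
rounded   <- as.integer(round(x))    # 46L

# is.numeric() is TRUE for integers as well as doubles
int_is_numeric <- is.numeric(100L)

# A failed conversion yields NA (plus a warning) rather than an error
failed <- suppressWarnings(as.numeric("12abc"))

cat(x_class, x_typeof, truncated, rounded, int_is_numeric, failed, "\n")
```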

17. Real-World Case Study: E-Commerce Analytics

Case Study: Complete E-Commerce Data Analysis

# Real-world scenario: E-commerce platform data analysis
# Demonstrating appropriate use of all data types

# ============= DATA CREATION =============
# Character: Product information
product_ids <- c("ELEC001", "FASH002", "HOME003", "ELEC004", "FASH005", 
                 "HOME006", "ELEC007", "FASH008")
product_names <- c("Laptop", "T-Shirt", "Coffee Maker", "Smartphone", 
                   "Jeans", "Blender", "Tablet", "Dress")
categories <- c("Electronics", "Fashion", "Home", "Electronics", 
                "Fashion", "Home", "Electronics", "Fashion")

# Numeric: Pricing and financial data
prices <- c(899.99, 29.99, 79.50, 699.99, 59.99, 49.99, 399.99, 89.99)
revenue <- c(17999.80, 1499.50, 2385.00, 27999.60, 2999.50, 
             1999.60, 11999.70, 3599.60)

# Integer: Quantities and counts
units_sold <- c(20L, 50L, 30L, 40L, 50L, 40L, 30L, 40L)
stock_remaining <- c(15L, 200L, 45L, 25L, 180L, 60L, 20L, 150L)
reorder_point <- 30L

# Logical: Business rules and status
in_stock <- stock_remaining > 0
needs_reorder <- stock_remaining < reorder_point
premium_products <- prices > 500
high_performer <- units_sold > 35L

# ============= ANALYSIS =============
cat("E-COMMERCE ANALYTICS DASHBOARD\n")
cat(rep("=", 80), "\n\n", sep="")

# Section 1: Inventory Management (Integer operations)
cat("1. INVENTORY STATUS\n")
cat(rep("-", 80), "\n", sep="")

total_units_sold <- sum(units_sold)
total_stock <- sum(stock_remaining)
items_needing_reorder <- sum(needs_reorder)

cat(sprintf("Total Units Sold:           %d units\n", total_units_sold))
cat(sprintf("Total Stock Remaining:      %d units\n", total_stock))
cat(sprintf("Items Needing Reorder:      %d out of %d products\n\n", 
            items_needing_reorder, length(product_ids)))

# Section 2: Financial Analysis (Numeric operations)
cat("2. FINANCIAL PERFORMANCE\n")
cat(rep("-", 80), "\n", sep="")

total_revenue <- sum(revenue)
average_price <- mean(prices)
highest_revenue_product <- which.max(revenue)

cat(sprintf("Total Revenue:              $%12.2f\n", total_revenue))
cat(sprintf("Average Product Price:      $%12.2f\n", average_price))
cat(sprintf("Top Revenue Product:        %s ($%.2f)\n\n", 
            product_names[highest_revenue_product], 
            revenue[highest_revenue_product]))

# Section 3: Category Performance (Character grouping)
cat("3. CATEGORY BREAKDOWN\n")
cat(rep("-", 80), "\n", sep="")

unique_categories <- unique(categories)
for(cat_name in unique_categories) {
  cat_mask <- categories == cat_name
  cat_revenue <- sum(revenue[cat_mask])
  cat_units <- sum(units_sold[cat_mask])
  cat_products <- sum(cat_mask)
  
  cat(sprintf("%-15s: %d products | %d units sold | $%10.2f revenue\n", 
              cat_name, cat_products, cat_units, cat_revenue))
}
cat("\n")

# Section 4: Product Performance Matrix (Logical filtering)
cat("4. PRODUCT PERFORMANCE MATRIX\n")
cat(rep("-", 80), "\n", sep="")
cat(sprintf("%-12s %-20s %10s %8s %12s %10s\n", 
            "ID", "Product", "Price", "Sold", "Revenue", "Status"))
cat(rep("-", 80), "\n", sep="")

for(i in seq_along(product_ids)) {
  # Determine status using logical operations
  status <- ""
  if(premium_products[i] && high_performer[i]) {
    status <- "⭐ Premium"
  } else if(high_performer[i]) {
    status <- "✓ Strong"
  } else if(needs_reorder[i]) {
    status <- "⚠ Reorder"
  } else {
    status <- "○ Normal"
  }
  
  cat(sprintf("%-12s %-20s $%9.2f %7dL $%11.2f %-10s\n", 
              product_ids[i], product_names[i], prices[i], 
              units_sold[i], revenue[i], status))
}
cat("\n")

# Section 5: Key Metrics Summary (Mixed data types)
cat("5. KEY BUSINESS METRICS\n")
cat(rep("-", 80), "\n", sep="")

premium_count <- sum(premium_products)
high_performer_count <- sum(high_performer)
avg_units_per_product <- mean(units_sold)
stock_coverage <- total_stock / total_units_sold

cat(sprintf("Premium Products (>$500):        %d (%.1f%%)\n", 
            premium_count, (premium_count/length(product_ids))*100))
cat(sprintf("High Performers (>35 units):     %d (%.1f%%)\n", 
            high_performer_count, (high_performer_count/length(product_ids))*100))
cat(sprintf("Average Units per Product:       %.1f units\n", avg_units_per_product))
cat(sprintf("Stock Coverage Ratio:            %.2fx\n", stock_coverage))
cat("\n")

# Section 6: Recommendations (Logical decision-making)
cat("6. AUTOMATED RECOMMENDATIONS\n")
cat(rep("-", 80), "\n", sep="")

urgent_reorders <- product_names[needs_reorder & in_stock]
if(length(urgent_reorders) > 0) {
  cat("⚠ URGENT: Reorder these products:\n")
  for(prod in urgent_reorders) {
    cat(sprintf("   - %s\n", prod))
  }
  cat("\n")
}

focus_products <- product_names[premium_products & high_performer]
if(length(focus_products) > 0) {
  cat("⭐ FOCUS: Increase marketing for top performers:\n")
  for(prod in focus_products) {
    cat(sprintf("   - %s\n", prod))
  }
  cat("\n")
}

underperformers <- product_names[!high_performer & !premium_products]
if(length(underperformers) > 0) {
  cat("📊 REVIEW: Consider promotions for:\n")
  for(prod in underperformers) {
    cat(sprintf("   - %s\n", prod))
  }
}

# Data Type Summary
cat("\n")
cat(rep("=", 80), "\n", sep="")
cat("DATA TYPE VERIFICATION\n")
cat(rep("=", 80), "\n", sep="")
cat(sprintf("Product IDs:        %s (identifiers)\n", class(product_ids)))
cat(sprintf("Prices:             %s (financial precision)\n", class(prices)))
cat(sprintf("Units Sold:         %s (counting)\n", class(units_sold)))
cat(sprintf("Needs Reorder:      %s (business rules)\n", class(needs_reorder)))
cat(sprintf("Categories:         %s (categorical data)\n", class(categories)))

OUTPUT

E-COMMERCE ANALYTICS DASHBOARD
================================================================================

1. INVENTORY STATUS
--------------------------------------------------------------------------------
Total Units Sold:           300 units
Total Stock Remaining:      695 units
Items Needing Reorder:      3 out of 8 products

2. FINANCIAL PERFORMANCE
--------------------------------------------------------------------------------
Total Revenue:              $    70482.30
Average Product Price:      $      288.68
Top Revenue Product:        Smartphone ($27999.60)

3. CATEGORY BREAKDOWN
--------------------------------------------------------------------------------
Electronics    : 3 products | 90 units sold | $  57999.10 revenue
Fashion        : 3 products | 140 units sold | $   8098.60 revenue
Home           : 2 products | 70 units sold | $   4384.60 revenue

4. PRODUCT PERFORMANCE MATRIX
--------------------------------------------------------------------------------
ID           Product                   Price     Sold      Revenue     Status    
--------------------------------------------------------------------------------
ELEC001      Laptop                $   899.99      20L $   17999.80 ⚠ Reorder  
FASH002      T-Shirt               $    29.99      50L $    1499.50 ✓ Strong   
HOME003      Coffee Maker          $    79.50      30L $    2385.00 ○ Normal   
ELEC004      Smartphone            $   699.99      40L $   27999.60 ⭐ Premium  
FASH005      Jeans                 $    59.99      50L $    2999.50 ✓ Strong   
HOME006      Blender               $    49.99      40L $    1999.60 ✓ Strong   
ELEC007      Tablet                $   399.99      30L $   11999.70 ⚠ Reorder  
FASH008      Dress                 $    89.99      40L $    3599.60 ✓ Strong   

5. KEY BUSINESS METRICS
--------------------------------------------------------------------------------
Premium Products (>$500):        2 (25.0%)
High Performers (>35 units):     5 (62.5%)
Average Units per Product:       37.5 units
Stock Coverage Ratio:            2.32x

6. AUTOMATED RECOMMENDATIONS
--------------------------------------------------------------------------------
⚠ URGENT: Reorder these products:
   - Laptop
   - Smartphone
   - Tablet

⭐ FOCUS: Increase marketing for top performers:
   - Smartphone

📊 REVIEW: Consider promotions for:
   - Coffee Maker
   - Tablet

================================================================================
DATA TYPE VERIFICATION
================================================================================
Product IDs:        character (identifiers)
Prices:             numeric (financial precision)
Units Sold:         integer (counting)
Needs Reorder:      logical (business rules)
Categories:         character (categorical data)

🎓 Learning Takeaway: This case study demonstrates how choosing the right data type for each variable creates a robust, efficient, and maintainable analytics system. Notice how:

Character types preserve product identifiers and categories
Numeric types handle decimal financial calculations (within floating-point precision)
Integer types efficiently count items
Logical types enable clear business rule evaluation
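
One caveat on numeric values: doubles are binary floating-point numbers, so some decimal amounts cannot be stored exactly. The sketch below shows the classic symptom and two common ways of dealing with it. Storing money as integer cents is a general technique assumed here for illustration, not something the case study itself does.

```r
# Exact equality can fail for decimal fractions
exact_equal <- (0.1 + 0.2 == 0.3)                      # FALSE

# Compare with a tolerance instead
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))      # TRUE

# Alternative: keep money in integer cents and convert only for display
price_cents <- c(10L, 20L)
total_cents <- sum(price_cents)                        # integer arithmetic is exact
cents_match <- (total_cents == 30L)                    # TRUE

cat("0.1 + 0.2 == 0.3:", exact_equal, "\n")
cat("all.equal():", close_enough, "\n")
cat(sprintf("Total: $%.2f\n", total_cents / 100))
```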

18. Summary and Key Takeaways

🎯 Chapter Summary: Data Types in R

What We Learned:

Six Basic Data Types: Numeric, Integer, Character, Logical, Complex, and Raw
Appropriate Usage: Each data type has specific use cases in business analytics
Type Conversion: How to convert between types and when it's necessary
Data Validation: Importance of checking and verifying data types
Best Practices: Guidelines for choosing and using data types effectively

Critical Skills Acquired:

Identifying the appropriate data type for different business scenarios
Performing type-specific operations (calculations, comparisons, filtering)
Converting between data types safely and correctly
Validating data quality through type checking
Debugging common data type errors
Building efficient and maintainable R programs

💼 Business Applications Covered

Financial analysis and calculations (Numeric)
Customer database management (Character, Integer)
Inventory tracking and management (Integer, Logical)
Performance evaluation and filtering (Logical)
Quality control systems (Logical, Numeric)
E-commerce analytics (All types combined)

📚 Next Steps for Students

Practice: Complete all five exercises to reinforce learning
Experiment: Try modifying the code examples with your own data
Real Data: Apply these concepts to actual business datasets
Build Projects: Create your own analytics dashboards using appropriate data types
Prepare: These fundamentals are essential for upcoming topics (data structures, data frames, statistical analysis)

⚠️ Common Pitfalls to Avoid

Forgetting to check data types after importing data
Performing calculations on character data without conversion
Not using the "L" suffix when integers are needed
Mixing data types inappropriately in comparisons
Ignoring type coercion warnings from R
Not validating external or user-provided data

🔑 Final Key Points

Data types are foundational – Understanding them is crucial for all R programming
Choose types intentionally – Each type serves specific purposes in business analytics
Always validate – Check data types, especially after importing data
Convert when necessary – Use explicit conversion functions to avoid errors
Think about efficiency – Proper data types improve performance and memory usage
Document your choices – Comment why you chose specific data types

🎓 Self-Assessment Questions

What is the default numerical data type in R, and when should you use it?
How do you explicitly create an integer in R? Why would you choose integer over numeric?
What are three common business scenarios where logical data types are essential?
Explain the difference between class() and typeof() functions.
What happens when you try to sum a vector of character numbers without conversion?
Name three functions you can use to verify data types in R.
When importing data from CSV, why is type checking critically important?
How would you convert the string "TRUE" to an actual logical value?
What is type coercion, and when does R perform it automatically?
In a business context, should customer IDs be stored as integers or characters? Why?

Check your understanding by answering these questions and testing the concepts with actual R code!
