Free Python Tutorial for Data Science for Beginners
The Free Python Tutorial for Data Science for Beginners helps students build strong foundations in Python while exploring essential data science concepts. Moreover, it introduces core topics such as variables, loops, and data types.
Introduction to Python Concepts
This chapter covers basic programming topics and gradually explains how Python is used in data cleaning and visualization. Learners also meet NumPy, Pandas, and Matplotlib through simple examples.
Learning Resources and Notes
Students also receive helpful practice questions and notes to reinforce concepts. As a result, they gain confidence in handling real datasets. This section ensures step-by-step learning with simple explanations so beginners can follow easily.
Introduction to Python
Python is one of the most popular programming languages used in business analysis, artificial intelligence, automation, and data science. It is known for its simple syntax, making it easy for beginners—especially MBA students without a technical background—to learn quickly. Python allows you to automate reports, analyze datasets, work with Excel/CSV files, and perform advanced analytics with minimal code. Its readability and large community support make it ideal for business decision-making scenarios. Python programs can be written in any text editor and executed instantly, making it a flexible tool for business professionals.
Example Python Program
# Simple Python Program
print("Welcome to Python for Business Analysis!") # prints text to the screen
Welcome to Python for Business Analysis!
Python Data Types: Numbers, Strings, Lists
1. Numbers
Numbers in Python include integers, floating-point numbers, and complex values. They are commonly used in business for calculations such as sales totals, profit margins, interest rates, and forecasting. Python allows direct mathematical operations without requiring complex formulas. You can store numbers in variables and perform addition, subtraction, multiplication, or even financial projections. Numbers can also be combined with strings using formatting, making them ideal for automated report generation.
# Numbers in Python
price = 1200 # integer
tax_rate = 0.18 # float
total = price + (price * tax_rate) # calculating tax
print("Total Price:", total)
Total Price: 1416.0
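Combining numbers with text for a report line can be sketched with an f-string. This is a minimal illustration; the wording of the report line is made up, and the values reuse the price and tax rate from the example above.

```python
# Formatting numbers inside text with an f-string (illustrative values)
price = 1200       # integer
tax_rate = 0.18    # float
total = price + price * tax_rate

# :,.2f adds a thousands separator and keeps two decimal places
# :.0% shows a fraction as a whole-number percentage
report_line = f"Total: {total:,.2f} (includes {tax_rate:.0%} tax)"
print(report_line)  # Total: 1,416.00 (includes 18% tax)
```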
2. Strings
Strings are sequences of characters used to store text such as customer names, email IDs, company titles, or product descriptions. Python provides many string methods for manipulating and analyzing business-related text data. You can clean data, convert text to uppercase, extract email domains, or generate formatted business reports. Strings are flexible, making them essential for preparing automated text-based output.
# Working with Strings
company = "Global Business Solutions"
print(company.upper()) # convert to uppercase
print("Word Count:", len(company.split())) # count words
GLOBAL BUSINESS SOLUTIONS
Word Count: 3
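Cleaning a text value and extracting an email domain might look like the sketch below. The email address is invented for illustration, and the approach assumes the value contains exactly one @ sign.

```python
# Cleaning a text value and extracting the email domain (made-up address)
raw_email = "  Priya.Sharma@Example.com "
email = raw_email.strip().lower()   # remove surrounding spaces, normalize case
domain = email.split("@")[1]        # text after the @ sign

print("Clean email:", email)   # Clean email: priya.sharma@example.com
print("Domain:", domain)       # Domain: example.com
```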
3. Lists
Lists store multiple values in a single variable and allow easy addition, removal, or modification of items. They are widely used in business analytics for storing monthly sales numbers, customer groups, product categories, or survey responses. Lists are ordered and dynamically changeable, making them ideal for tasks where data needs to be updated frequently. Python’s list functions help analysts iterate through data and perform calculations efficiently.
# Using Lists in Python
sales = [1200, 1500, 1100, 1800] # list of monthly sales
sales.append(2000) # add new sale value
print("Updated Sales:", sales)
print("Highest Sale:", max(sales))
Updated Sales: [1200, 1500, 1100, 1800, 2000]
Highest Sale: 2000
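Modifying and removing items, then looping over the list to calculate a total, can be sketched as follows. The corrected value and the removed item are arbitrary choices for illustration.

```python
# Modifying, removing, and iterating over a list (illustrative values)
sales = [1200, 1500, 1100, 1800]
sales[2] = 1150        # modify the third value (index 2)
sales.remove(1800)     # remove the first item equal to 1800

total = 0
for amount in sales:   # visit each remaining value in order
    total += amount
average = total / len(sales)

print("Sales:", sales)                 # Sales: [1200, 1500, 1150]
print("Total:", total)                 # Total: 3850
print("Average:", round(average, 2))   # Average: 1283.33
```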
Python Data Types: Tuples and Dictionaries
1. Tuples
Tuples are similar to lists but cannot be changed after creation. This immutability makes them useful for fixed datasets such as product codes, department names, or months of the year. Tuples are typically a little lighter on memory than lists, and because they cannot be modified, they protect reference data from accidental changes. They are ideal for storing reference information used in business applications.
# Using Tuples
months = ("Jan", "Feb", "Mar", "Apr")
print("Total Months:", len(months))
print("First Month:", months[0])
Total Months: 4
First Month: Jan
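What immutability means in practice can be seen by trying to change a tuple. The sketch below catches the resulting TypeError so the program keeps running, and also shows tuple unpacking as a convenient way to read values.

```python
# Tuples reject modification; the error is caught so the script continues
months = ("Jan", "Feb", "Mar", "Apr")
try:
    months[0] = "January"   # not allowed for tuples
except TypeError as error:
    message = str(error)
    print("Cannot modify a tuple:", message)

# Unpacking: the first two values get names, the rest is collected in a list
first, second, *rest = months
print(first, second, rest)   # Jan Feb ['Mar', 'Apr']
```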
2. Dictionaries
Dictionaries store data in key-value pairs, making them perfect for representing structured business information. They are commonly used to store customer data, employee records, product details, or business metrics. Dictionaries allow fast lookup and modification using keys, making them ideal for data analysis and report automation.
# Dictionary Example
employee = {
"name": "John Smith",
"department": "Finance",
"salary": 75000
}
print("Employee Name:", employee["name"])
print("Department:", employee["department"])
Employee Name: John Smith
Department: Finance
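Lookup and modification by key can be sketched as below. The "location" and "bonus" keys are hypothetical additions for illustration; get() is used so a missing key returns a default instead of raising an error.

```python
# Updating, adding, and safely reading dictionary values
employee = {"name": "John Smith", "department": "Finance", "salary": 75000}

employee["salary"] = 80000          # update an existing key
employee["location"] = "London"     # add a new key (hypothetical field)
bonus = employee.get("bonus", 0)    # missing key: fall back to 0

for key, value in employee.items():
    print(key, "->", value)
print("Bonus:", bonus)   # Bonus: 0
```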
Files and Exceptions
Working with Files
Python makes it easy to read and write files, which is essential for business analysts who manage reports, financial data, logs, and CSV files.
Using Python file handling, analysts can automate report creation, extract information, or save processed data.
The with open() structure ensures files are safely opened and closed, reducing errors.
# Writing to a File
with open("report.txt", "w") as file:
    file.write("Sales Report Generated Successfully")
print("File Created!")
File Created!
Exceptions (Error Handling)
Exception handling lets your program respond to an error instead of stopping abruptly. This is important in business applications where missing files, incorrect input, or calculation issues are common. Wrapping risky code in a try-except block lets the program report the problem and keep running, which improves overall reliability.
# Example of Exception Handling
try:
    num = int("ABC") # this will cause an error
except ValueError:
    print("Error: Invalid number entered")
Error: Invalid number entered
Try this example: Exceptions (Error Handling)
try:
    age = int(input("Enter your age: "))
    print("Your age:", age)
except ValueError:
    print("Please enter a valid number!")
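Calculation issues can be handled the same way. The sketch below wraps a division in try-except; safe_divide is a hypothetical helper name, and returning None on failure is just one possible design choice.

```python
# Handling a calculation error (division by zero) without crashing
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        print("Error: cannot divide by zero")
        return None

print(safe_divide(500, 4))   # 125.0
print(safe_divide(500, 0))   # prints the error message, then None
```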
Practice Exercises
- Create a Python program that calculates profit = selling price − cost price.
- Write a program that stores 5 customer names in a list and prints the first and last name.
- Create a dictionary for a product and print each value.
- Write a script to read a text file and count the number of words.
- Use exception handling to prevent errors while dividing numbers.
Types of Operators in Python
Operators in Python are special symbols that perform operations on variables and values. They allow Python to process mathematical expressions, compare results, assign values, and perform logical checks. Operators are essential for business calculations, data filtering, analytics, and automation tasks. There are several types of operators: Arithmetic (like +, -, *), Comparison (like ==, >, <), Logical (and, or, not), Assignment (like +=, -=), and Membership (in, not in). Understanding operators allows analysts to make decisions within code, build rules, and create data-driven workflows efficiently.
# Types of Operators in Python
a = 10
b = 5
print("Arithmetic:", a + b) # Arithmetic Operator
print("Comparison:", a > b) # Comparison Operator
print("Logical:", a > 0 and b > 0) # Logical Operator
x = 20
x += 5 # Assignment Operator
print("Assignment:", x)
print("Membership:", 5 in [1, 2, 3, 4, 5]) # Membership Operator
Arithmetic: 15
Comparison: True
Logical: True
Assignment: 25
Membership: True
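Operators become most useful when combined into a business rule. The following sketch uses comparison, logical, and membership operators together; the threshold, tier names, and discount percentage are all invented for illustration.

```python
# Combining operators into a simple discount rule (illustrative values)
order_value = 1800
customer_tier = "gold"
premium_tiers = ["gold", "platinum"]

# comparison (>=), logical (and), and membership (in) in one condition
qualifies = order_value >= 1500 and customer_tier in premium_tiers

discount_percent = 10 if qualifies else 0
final_price = order_value * (100 - discount_percent) / 100

print("Qualifies:", qualifies)       # Qualifies: True
print("Final price:", final_price)   # Final price: 1620.0
```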
Classes and Objects
Classes and objects form the foundation of Object-Oriented Programming (OOP) in Python. A class is a blueprint or template for creating objects—similar to how a company policy defines employee roles. An object is an instance of a class, containing specific data and behaviors. This structure helps organize business logic, model real-world systems (such as customer records, invoices, or product catalogs), and create scalable applications. OOP makes programs modular, reusable, and easier to maintain. Classes contain attributes (data) and methods (functions), enabling analysts to structure complex workflows cleanly.
# Example of Classes and Objects
class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department

    def show_details(self):
        print(f"Employee: {self.name}, Department: {self.department}")

# Creating an object
emp1 = Employee("John Smith", "Finance")
emp1.show_details()
Employee: John Smith, Department: Finance
Python Example: Classes and Objects (Detailed Explanation)
Full Code:
# Example of Classes and Objects
class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department

    def show_details(self):
        print(f"Employee: {self.name}, Department: {self.department}")

# Creating an object
emp1 = Employee("John Smith", "Finance")
emp1.show_details()
What This Code Teaches
This example demonstrates how to create a class, define a constructor (__init__), create object attributes, write methods, and finally create and use an object.
1. What is a Class?
A class is a blueprint for creating objects. In this case, the class is named Employee.
A class defines properties and behaviors that created objects will have.
class Employee:
...
2. The Constructor: __init__()
The constructor is a special method that runs automatically whenever a new object is created. It initializes the object's attributes.
def __init__(self, name, department):
    self.name = name
    self.department = department
Here:
- self refers to the object itself
- name and department are inputs provided when creating the object
- self.name and self.department store those values inside the object
3. Instance Attributes
These lines create variables that belong to each object:
self.name = name
self.department = department
4. Method Inside the Class
A method is a function defined inside a class. Here, show_details() prints information stored in the object.
def show_details(self):
    print(f"Employee: {self.name}, Department: {self.department}")
5. Creating an Object
The line below creates a new Employee object and automatically calls the constructor.
emp1 = Employee("John Smith", "Finance")
6. Calling the Method
This line calls the show_details() method on the object:
emp1.show_details()
Output:
Employee: John Smith, Department: Finance
Summary
- class Employee → defines a blueprint
- __init__() → initializes object data
- self → refers to the object itself
- show_details() → prints stored information
- emp1 = Employee(...) → creates an object
- emp1.show_details() → calls a method
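One class can produce many independent objects, each with its own data. The sketch below extends the Employee class with a transfer() method; that method and the second employee are hypothetical additions, not part of the original example.

```python
# Two objects from the same class keep separate attribute values
class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department

    def show_details(self):
        print(f"Employee: {self.name}, Department: {self.department}")

    def transfer(self, new_department):
        # hypothetical helper: changes only this object's department
        self.department = new_department

emp1 = Employee("John Smith", "Finance")
emp2 = Employee("Meera Patel", "Marketing")
emp2.transfer("Sales")   # emp1 is unaffected

for emp in [emp1, emp2]:
    emp.show_details()
```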
Reading Files Using open()
The open() function lets Python read data stored in text files.
This is useful for business analysts who deal with reports, logs, survey results, or exported CRM data.
Python can read files line-by-line, load complete content, or process large documents efficiently.
Using modes like "r" (read), analysts can extract insights, clean raw text, or convert data into structured form.
Reading files is a core part of automation workflows such as scheduled analytics reports, financial statement parsing, and batch processing.
# Reading a file using open()
with open("sample.txt", "r") as file:
    data = file.read()
print(data)
This is sample file content.
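Reading line by line is helpful when a file is too large to load at once. The sketch below first writes a small sample file into a temporary folder so that it runs on its own; the file content is made up.

```python
# Reading a file line by line and counting lines and words
import os
import tempfile

# Create a small sample file so the sketch is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w", encoding="utf-8") as file:
    file.write("Q1 sales up\nQ2 sales flat\nQ3 sales up\n")

line_count = 0
word_count = 0
with open(path, "r", encoding="utf-8") as file:
    for line in file:                    # one line at a time
        line_count += 1
        word_count += len(line.split())  # split on whitespace

print("Lines:", line_count)   # Lines: 3
print("Words:", word_count)   # Words: 9
```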
Writing Files Using open()
Python can also write information into files using the same open() function but with the mode "w" (write) or "a" (append).
This is extremely useful for automating business reports, storing processed data, creating logs, or exporting results after analysis.
The write mode replaces existing content, while append mode adds new content without deleting old data.
File-writing helps analysts build automated systems where Python saves updates, summaries, or financial calculations into readable documents.
# Writing data to a file
with open("report.txt", "w") as file:
    file.write("Business Report Generated Successfully")
print("File Saved")
File Saved
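The difference between write mode and append mode can be checked directly. This is a minimal sketch that uses a temporary folder and a made-up log file name so it does not touch any real files.

```python
# Comparing "w" (overwrite) and "a" (append) modes
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w", encoding="utf-8") as file:
    file.write("Run 1 complete\n")
with open(path, "a", encoding="utf-8") as file:   # append keeps old content
    file.write("Run 2 complete\n")
with open(path, "r", encoding="utf-8") as file:
    after_append = file.read()

with open(path, "w", encoding="utf-8") as file:   # write replaces everything
    file.write("Fresh start\n")
with open(path, "r", encoding="utf-8") as file:
    after_overwrite = file.read()

print(after_append)      # both "Run" lines
print(after_overwrite)   # only "Fresh start"
```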
Loading Data with Pandas
Tutorial: Create a small DataFrame, save to CSV (current working directory), and read it back using pandas
This short tutorial is written for students. It shows each step with a small example DataFrame: how to save it as students.csv in the current working directory (CWD), and how to import the same CSV again.
Full runnable Python code (copy & run in VS Code / terminal)
import os
import pandas as pd
# 1) Show current working directory (where the CSV will be saved)
print("Current working directory:", os.getcwd())
# 2) Create a small DataFrame
data = {
"student_id": [101, 102, 103],
"name": ["Asha", "Ravi", "Maya"],
"grade": ["5th", "6th", "5th"],
"score": [88, 92, 79]
}
df = pd.DataFrame(data)
print("\nDataFrame created:")
print(df)
# 3) Save DataFrame to CSV in the current working directory
csv_filename = "students.csv"
df.to_csv(csv_filename, index=False) # index=False avoids writing the row numbers to file
print(f"\nSaved DataFrame to CSV: {csv_filename}")
# 4) Confirm file exists where expected
full_path = os.path.join(os.getcwd(), csv_filename)
print("Full CSV path:", full_path)
print("File exists?:", os.path.exists(full_path))
# 5) Read the CSV back into a new DataFrame
df_loaded = pd.read_csv(csv_filename)
print("\nCSV loaded back into DataFrame:")
print(df_loaded)
Line-by-line explanation (stepwise)
Imports
import os
import pandas as pd
os helps you find and join file paths and check the current working directory. pandas is the library used for DataFrame operations and CSV read/write.
1) Check the current working directory
print("Current working directory:", os.getcwd())
os.getcwd() returns the folder path where your Python process is running. When you call
df.to_csv("students.csv"), the file will be saved inside this directory unless you provide an absolute path.
2) Create a small DataFrame
data = {
"student_id": [101, 102, 103],
"name": ["Asha", "Ravi", "Maya"],
"grade": ["5th", "6th", "5th"],
"score": [88, 92, 79]
}
df = pd.DataFrame(data)
We build a Python dictionary where keys are column names and values are lists. pd.DataFrame(...) converts
it into a table-like structure.
3) Save DataFrame to CSV
csv_filename = "students.csv"
df.to_csv(csv_filename, index=False)
to_csv writes the DataFrame to a CSV file. index=False prevents pandas from writing the row
index (0,1,2...) into a separate column — usually what you want for a clean CSV.
4) Confirm file path and existence
full_path = os.path.join(os.getcwd(), csv_filename)
print("File exists?:", os.path.exists(full_path))
This confirms the CSV was written where you expected. Use your OS file explorer or the VS Code Explorer panel to
visually verify students.csv appears in the same folder.
5) Read the CSV back into Python
df_loaded = pd.read_csv(csv_filename)
print(df_loaded)
pd.read_csv() reads the CSV file and recreates a DataFrame. This verifies that the saved CSV contains
the expected data.
Extra tips for VS Code students
- If you run the script from the integrated terminal, check the terminal prompt to see the CWD, or run pwd (macOS/Linux), pwd in PowerShell, or cd in Command Prompt.
- If the CSV doesn't appear in Explorer: click the Explorer refresh button or reopen the folder.
- To save in a specific folder: give an absolute or relative path, e.g. df.to_csv("data/students.csv", index=False). Make sure the folder (data/) exists.
- To view only the first few rows: use print(df.head()).
What students should practice next
- Create a DataFrame with more rows and different data types (dates, floats, categories).
- Try reading other CSVs with pd.read_csv and inspect columns with df.columns.
- Experiment with index=True or saving Excel files (df.to_excel(...)).
Pandas is a powerful data analysis library used heavily in business analytics.
It provides a DataFrame—a table-like structure ideal for working with financial reports, sales datasets, customer information, and large CSV/Excel files.
With Pandas, loading data is straightforward using functions like read_csv() or read_excel().
Pandas automatically structures raw data, making it easy to clean, filter, summarize, visualize, or export.
This allows analysts to convert raw business data into insights with just a few lines of code.
import pandas as pd
# Loading CSV data
df = pd.read_csv("sales.csv")
print(df.head()) # view first 5 rows
Displays first 5 rows of the CSV file.
Working With and Saving Data Using Pandas
After loading data, Pandas enables powerful operations such as filtering rows, adding new columns, calculating summaries, grouping data, and merging datasets.
This makes it ideal for business intelligence tasks such as revenue analysis, customer segmentation, and financial modeling.
Once analysis is complete, Pandas allows saving results back into CSV or Excel formats using to_csv() or to_excel().
This helps automate data pipelines and create shareable business reports.
import pandas as pd
# Working with data
df = pd.DataFrame({
"Product": ["A", "B", "C"],
"Sales": [1200, 1500, 1000]
})
df["Tax"] = df["Sales"] * 0.18 # adding new column
# Saving the updated data
df.to_csv("updated_sales.csv", index=False)
print(df)
  Product  Sales    Tax
0       A   1200  216.0
1       B   1500  270.0
2       C   1000  180.0
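Filtering, merging, and grouping can be sketched on the same small table. The categories table and its values are invented for illustration; merge() and groupby() are standard pandas DataFrame methods.

```python
# Merging two tables, filtering rows, and grouping (illustrative data)
import pandas as pd

sales = pd.DataFrame({"Product": ["A", "B", "C"],
                      "Sales": [1200, 1500, 1000]})
categories = pd.DataFrame({"Product": ["A", "B", "C"],
                           "Category": ["Hardware", "Software", "Hardware"]})

merged = sales.merge(categories, on="Product")           # join on the shared column
high = merged[merged["Sales"] > 1100]                    # keep rows above a threshold
by_category = merged.groupby("Category")["Sales"].sum()  # total per category

print(high)
print(by_category)
```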
Array-Oriented Programming with NumPy
NumPy (Numerical Python) is the foundation of numerical and scientific computing in Python. It introduces the powerful ndarray object, which is significantly faster and more memory-efficient than Python lists. NumPy allows analysts to perform complex mathematical operations—such as matrix multiplication, statistical analysis, or forecasting—using just a few lines of code. Its vectorized operations eliminate slow Python loops, enabling large-scale business analytics and financial modeling. NumPy arrays support element-wise calculations, broadcasting, reshaping, filtering, and aggregation—making them essential for data preprocessing, machine learning, simulation models, demand forecasting, and quantitative analysis.
import numpy as np
# Creating a NumPy array
sales = np.array([1200, 1500, 1700, 1600])
# Vectorized operations (faster than Python lists)
tax = sales * 0.18 # calculate 18% tax for all values
total_sales = sales + tax
print("Sales:", sales)
print("Tax:", tax)
print("Total Sales:", total_sales)
Sales: [1200 1500 1700 1600]
Tax: [216. 270. 306. 288.]
Total Sales: [1416. 1770. 2006. 1888.]
More NumPy Features (Reshaping, Aggregation)
import numpy as np
data = np.array([[10, 20, 30],
[40, 50, 60]])
print("Sum of all values:", data.sum())
print("Row-wise sum:", data.sum(axis=1))
print("Column-wise max:", data.max(axis=0))
Sum of all values: 210
Row-wise sum: [ 60 150]
Column-wise max: [40 50 60]
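Reshaping and broadcasting can be sketched as follows. The unit counts and prices are made-up numbers: six values are reshaped into 2 stores by 3 products, then multiplied by one price per product.

```python
# Reshaping a flat array and broadcasting a row of prices across it
import numpy as np

units = np.array([10, 20, 30, 40, 50, 60])
by_store = units.reshape(2, 3)    # 2 rows (stores) x 3 columns (products)

prices = np.array([1, 2, 3])      # one price per product
revenue = by_store * prices       # prices are applied to every row

print(by_store.shape)             # (2, 3)
print(revenue)
print("Average units per store:", by_store.mean(axis=1))
```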
Data Cleaning and Preparation
Data cleaning and preparation are critical steps in business analytics because raw data often contains missing values, duplicates, inconsistent formatting, or incorrect types. Before analysis or modeling, data must be standardized, cleaned, and transformed into a usable format. Pandas provides powerful tools for handling missing data, removing duplicates, converting data types, renaming columns, filtering records, and creating new calculated fields. Clean data leads to accurate insights and ensures decisions are based on reliable information. This step is often described as the backbone of data science, and it is commonly reported to take up a large share of the time in analytics projects.
import pandas as pd
# Sample raw data
data = {
"Product": ["A", "B", "C", "C"],
"Sales": [1200, None, 1500, 1500],
"City": ["Delhi", "Mumbai", "Delhi", "Delhi"]
}
df = pd.DataFrame(data)
# Cleaning operations
df["Sales"] = df["Sales"].fillna(df["Sales"].mean()) # fill missing values
df = df.drop_duplicates() # remove duplicate rows
df["City"] = df["City"].str.upper() # standardize text
print(df)
  Product   Sales    City
0       A  1200.0   DELHI
1       B  1400.0  MUMBAI
2       C  1500.0   DELHI
Additional Cleaning Tasks
# More cleaning operations
df["Sales_Tax"] = df["Sales"] * 0.18 # new calculated column
filtered = df[df["Sales"] > 1300] # filter by condition
print(filtered)
  Product   Sales    City  Sales_Tax
1       B  1400.0  MUMBAI      252.0
2       C  1500.0   DELHI      270.0
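Renaming columns and converting data types are two further cleaning tasks. The sketch below assumes a raw table where sales arrived as text; the column name "product name" and the "n/a" placeholder are invented for illustration.

```python
# Renaming a column and converting text to numbers (illustrative data)
import pandas as pd

raw = pd.DataFrame({"product name": ["A", "B", "C"],
                    "Sales": ["1200", "n/a", "1500"]})

raw = raw.rename(columns={"product name": "Product"})
# errors="coerce" turns values that are not numbers into NaN (missing)
raw["Sales"] = pd.to_numeric(raw["Sales"], errors="coerce")
missing = int(raw["Sales"].isna().sum())

print(raw.dtypes)
print("Missing sales values:", missing)   # Missing sales values: 1
```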
Plotting and Visualization
Plotting and visualization play a crucial role in business analytics, helping convert raw data into meaningful visual patterns. Python’s Matplotlib library is widely used to create charts such as line graphs, bar charts, histograms, pie charts, and scatter plots. These visuals help students and managers quickly observe trends, compare categories, and detect business patterns like seasonality, fluctuations, or outliers. Visualization simplifies communication and supports decision-making by presenting complex information in an easy-to-understand manner. Matplotlib is flexible, customizable, and works well with Pandas, making it an essential tool for reporting, dashboards, and presentations.
Example: Monthly Sales Line Graph
import matplotlib.pyplot as plt
# Sample sales data
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [1200, 1500, 1700, 1600, 2000]
plt.plot(months, sales, marker='o')
plt.title("Monthly Sales Trend")
plt.xlabel("Months")
plt.ylabel("Sales in USD")
plt.grid(True)
plt.show()
A line graph showing sales increasing from Jan to May with markers on each point.
Example: Bar Chart of City-wise Revenue
import matplotlib.pyplot as plt
cities = ["Delhi", "Mumbai", "Chennai", "Kolkata"]
revenue = [50000, 60000, 45000, 52000]
plt.bar(cities, revenue)
plt.title("Revenue by City")
plt.xlabel("City")
plt.ylabel("Revenue")
plt.show()
A bar chart with four bars representing revenue in each city.
Data Aggregation and Group Operations
Data aggregation is the process of summarizing information to identify meaningful patterns. In business analytics, this includes performing operations such as sum, mean, count, max, or min across different groups. Pandas provides the groupby() function, which allows analysts to group data based on categories like region, product type, or month, and perform descriptive statistics. Group operations are essential for tasks such as analyzing total sales by region, average revenue per product, customer segmentation, or comparing performance across time periods. Aggregation transforms raw transactional data into insights suitable for dashboards and strategic decisions.
Example: Aggregating Sales by City
import pandas as pd
# Sample business data
data = {
"City": ["Delhi", "Mumbai", "Delhi", "Chennai", "Mumbai"],
"Sales": [1200, 1500, 1800, 1300, 1700]
}
df = pd.DataFrame(data)
# Group by City and calculate total sales
city_sales = df.groupby("City")["Sales"].sum()
print(city_sales)
City
Chennai 1300
Delhi 3000
Mumbai 3200
Name: Sales, dtype: int64
Example: Multiple Aggregations (Sum, Mean)
import pandas as pd
product_data = {
"Product": ["A", "A", "B", "B", "C"],
"Revenue": [2000, 2500, 1800, 2200, 3000]
}
df2 = pd.DataFrame(product_data)
# Apply multiple aggregation functions
summary = df2.groupby("Product")["Revenue"].agg(["sum", "mean", "max"])
print(summary)
          sum    mean   max
Product
A        4500  2250.0  2500
B        4000  2000.0  2200
C        3000  3000.0  3000
Intro to Python — tiny dataset + step-by-step practice
A compact, easy dataset and annotated Python snippets covering: basic data types, functions, strings & lists, tuples & dicts, files & exceptions, operators, classes/objects, reading/writing files, pandas & numpy, cleaning, plotting, and group aggregation. Copy code into a Python file or Jupyter notebook and run.
Tiny dataset (CSV) — "students.csv"
This is a very small dataset (6 rows). Students can either copy it into a file named students.csv manually or run the "Create CSV" Python snippet below to create it automatically in the working folder.
# create_students_csv.py
csv_text = """Name,Age,Grade,Passed,Math,English,Date
Alice,20,Junior,True,85,78,2024-09-10
Bob,19,Sophomore,False,62,55,2024-09-12
Cara,21,Senior,True,91,88,2024-09-11
Dan,20,Junior,True,73,80,2024-09-10
Eva,22,Senior,False,58,65,2024-09-12
Finn,19,Sophomore,True,79,82,2024-09-11
"""
with open("students.csv", "w", encoding="utf-8") as f:
    f.write(csv_text)
print("students.csv created (6 rows).")
Run this once to create students.csv in the current working directory (usually the folder you run the script from).
1) Introduction to Python: Data types
Show basic Python types and simple operations.
# types_demo.py
# 1. Numbers
a = 10 # int
b = 3.5 # float
# 2. Strings
s = "Hello"
# 3. Boolean
flag = True
# 4. None (no value)
x = None
# Print types and simple arithmetic
print(type(a), type(b), type(s), type(flag), type(x))
print("sum:", a + b) # addition
print("concat:", s + " world")
Explanation: type() tells you the variable type. Integers and floats support arithmetic; strings can be concatenated.
2) Functions
Define and use functions with parameters and return values.
# functions_demo.py
def add(x, y):
    """Return sum of x and y"""
    return x + y

def greet(name="Student"):
    """Return a greeting"""
    return f"Hello, {name}!"

print(add(3, 4)) # prints 7
print(greet("Aisha")) # prints "Hello, Aisha!"
print(greet()) # uses default argument
Explanation: functions keep code reusable. Default arguments are helpful for optional values.
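Functions are also a good place to handle edge cases once instead of everywhere. The sketch below uses the Math scores from the tiny dataset; average and grade_label are hypothetical helper names, and the pass mark of 60 is an assumption for illustration.

```python
# Functions with an edge-case check and a default argument
def average(scores):
    """Return the mean of a list of numbers, or None for an empty list."""
    if not scores:
        return None
    return sum(scores) / len(scores)

def grade_label(score, pass_mark=60):
    """Return "Pass" or "Fail"; pass_mark is an assumed threshold."""
    return "Pass" if score >= pass_mark else "Fail"

math_scores = [85, 62, 91, 73, 58, 79]
print(round(average(math_scores), 2))    # 74.67
print(average([]))                       # None
print(grade_label(58))                   # Fail
print(grade_label(58, pass_mark=50))     # Pass
```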
3) Strings and Lists
Common string methods and list operations.
# strings_lists.py
# Strings
s = " Data Science "
print("Strip:", s.strip()) # remove surrounding spaces
print("Upper:", s.upper()) # UPPERCASE
print("Slice:", s[2:10]) # substring
# Lists
students = ["Alice", "Bob", "Cara"]
students.append("Dan") # add element
students.insert(1, "Zoe") # insert at index
print(students)
print("pop:", students.pop()) # remove last
print("index of Bob:", students.index("Bob"))
Explanation: strings are immutable; lists are mutable and support append/insert/pop.
4) Tuples and Dictionaries
Tuples are immutable; dictionaries map keys to values.
# tuples_dicts.py
# Tuple (immutable)
coords = (10, 20)
# coords[0] = 5 # would raise an error
# Dictionary
student = {"name":"Alice", "age":20, "grade":"Junior"}
print(student["name"]) # access by key
student["age"] = 21 # update
student["city"] = "Mumbai" # add new key
print(student)
# iterate dict
for k, v in student.items():
    print(k, "->", v)
Explanation: use dicts when you want labeled fields (like columns).
5) Files and Exceptions (reading/writing basics)
How to safely read/write files and handle exceptions.
# files_exceptions.py
# Read whole file
try:
    with open("students.csv", "r", encoding="utf-8") as f:
        contents = f.read()
    print("File loaded, length:", len(contents))
except FileNotFoundError:
    print("students.csv not found. Run the create script first.")
# Write a small text file
try:
    with open("notes.txt", "w", encoding="utf-8") as f:
        f.write("This is a note from Python.")
    print("notes.txt written.")
except Exception as e:
    print("Error writing file:", e)
Explanation: with ensures file is closed; exceptions help handle errors gracefully.
6) Types of Operators (arithmetic, comparison, logical)
# operators_demo.py
x = 7
y = 3
# arithmetic
print(x + y, x - y, x * y, x / y, x // y, x % y, x ** y)
# comparisons
print(x > y, x == y, x != y)
# logical
print((x > 5) and (y < 5))
print((x > 10) or (y < 5))
print(not (x == y))
Explanation: use the right operator for integer division (//), power (**), and logical tests.
7) Classes and Objects (simple example)
# classes_demo.py
class Student:
    def __init__(self, name, age, grade):
        # constructor: run when object is created
        self.name = name
        self.age = age
        self.grade = grade

    def is_adult(self):
        return self.age >= 18

    def summary(self):
        return f"{self.name} ({self.age}) - {self.grade}"

# create objects
s = Student("Alice", 20, "Junior")
print(s.summary())
print("Is adult?", s.is_adult())
Explanation: classes bundle data (attributes) and behavior (methods).
8) Reading files with open (CSV) & 9) Writing files with open
Two ways: manual CSV parsing and using csv module. For quick practice we'll use the csv module.
# read_write_csv.py
import csv
# Read students.csv
with open("students.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f) # each row is a dict keyed by column name
    rows = list(reader)
print("Rows read:", len(rows))
print("First row:", rows[0])
# Modify in-memory and write a new CSV
rows[0]["City"] = "Pune" # add a new column value to the first row only
fieldnames = list(rows[0].keys())
with open("students_modified.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames) # rows without "City" get an empty value
    writer.writeheader()
    writer.writerows(rows)
print("students_modified.csv written.")
Explanation: DictReader makes CSV rows accessible by column name; DictWriter writes structured output.
10) Loading data with Pandas
Simple pandas usage: read CSV, inspect, head(), dtypes.
# pandas_basic.py
import pandas as pd
df = pd.read_csv("students.csv", parse_dates=["Date"])
print(df.head()) # show first rows
print(df.dtypes) # column types
print(df.describe()) # summary stats for numeric columns
Explanation: parse_dates turns the Date column into datetime objects.
11) Working with and saving with Pandas
# pandas_save.py
import pandas as pd
df = pd.read_csv("students.csv", parse_dates=["Date"])
# Add a computed column: total score
df["Total"] = df["Math"] + df["English"]
# Filter: passed students
passed = df[df["Passed"] == True]
# Save filtered dataframe
passed.to_csv("students_passed.csv", index=False)
print("Saved students_passed.csv with", len(passed), "rows.")
Explanation: DataFrames let you add columns and save to disk with to_csv.
12) Array-oriented programming with NumPy
# numpy_demo.py
import numpy as np
# create numpy array from list
scores = np.array([85, 62, 91, 73, 58, 79])
print("mean:", scores.mean())
print("std:", scores.std())
print("add 5 to all:", scores + 5) # vectorized operation (fast)
# boolean mask
mask = scores >= 70
print("scores >= 70:", scores[mask])
Explanation: NumPy applies operations to entire arrays at once (vectorized), which is faster than Python loops for large data.
13) Data cleaning and preparation (Pandas)
# pandas_cleaning.py
import pandas as pd
df = pd.read_csv("students.csv", parse_dates=["Date"])
# 1. Check for missing values
print(df.isna().sum())
# 2. Example: fill missing numeric values with column mean (if any)
# df["Math"] = df["Math"].fillna(df["Math"].mean())
# 3. Convert types: ensure Age is integer
df["Age"] = df["Age"].astype(int)
# 4. Remove duplicate rows (if present)
df = df.drop_duplicates()
# 5. Rename column
df = df.rename(columns={"English":"Eng"})
print("Cleaned dataframe:")
print(df.head())
Explanation: common cleaning steps: find & fill missing values, correct data types, drop duplicates, rename columns.
14) Plotting and Visualization (Matplotlib)
Basic line / bar / histogram plots (students can run in Jupyter or a Python script that opens a window).
# plotting_demo.py
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("students.csv")
df["Total"] = df["Math"] + df["English"]
# Bar plot of total score by student
plt.figure(figsize=(6,4))
plt.bar(df["Name"], df["Total"])
plt.title("Total score by student")
plt.xlabel("Student")
plt.ylabel("Total")
plt.tight_layout()
plt.show()
# Histogram of Math scores
plt.figure(figsize=(6,4))
plt.hist(df["Math"], bins=5)
plt.title("Distribution of Math scores")
plt.xlabel("Math score")
plt.ylabel("Count")
plt.tight_layout()
plt.show()
Explanation: plt.show() displays the figure (in notebooks it appears inline; in scripts it opens a window).
15) Data aggregation and group operations (Pandas)
# groupby_demo.py
import pandas as pd
df = pd.read_csv("students.csv")
# Example: average Math score by Grade (Junior/Sophomore/Senior)
grouped = df.groupby("Grade")["Math"].mean().reset_index().rename(columns={"Math":"Avg_Math"})
print(grouped)
# Aggregation with multiple functions
agg = df.groupby("Grade").agg({
"Math": ["mean", "min", "max"],
"English": ["mean"]
})
print(agg)
# Count passed vs not passed
pass_counts = df.groupby("Passed").size().reset_index(name="Count")
print(pass_counts)
Explanation: groupby + agg compute summary stats per group (very common in data analysis).
Suggested order for students to practice
- Run the CSV creation script to get students.csv.
- Try basic types, strings & lists, tuples & dicts, operators.
- Practice functions and a simple class.
- File read/write and exceptions.
- Install pandas, numpy, and matplotlib (pip install pandas numpy matplotlib) and try the Pandas examples.
- Do cleaning, then plotting, then groupby aggregation.
Note: to run plotting code in a headless environment (like some servers) you may need to run in Jupyter or save plots to files using plt.savefig("plot.png").
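Saving a plot to a file without opening a window can be sketched as below. It assumes Matplotlib's non-interactive "Agg" backend is acceptable, uses the first three students' totals (Math + English) from the tiny dataset, and writes to a temporary folder chosen for illustration.

```python
# Saving a bar chart to a PNG file in a headless environment
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # non-interactive backend: no window is needed
import matplotlib.pyplot as plt

names = ["Alice", "Bob", "Cara"]
totals = [163, 117, 179]   # Math + English for each student

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(names, totals)
ax.set_title("Total score by student")
ax.set_xlabel("Student")
ax.set_ylabel("Total")
fig.tight_layout()

out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
fig.savefig(out_path)   # write the figure to disk instead of showing it
plt.close(fig)
print("Saved plot to:", out_path)
```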
Small Exercises for Students (practice)
- Calculate the average total score for all students (use Pandas or numpy).
- Find the student with the highest Math score and print their name.
- Create a function that accepts a student's name and returns their English score (or a message if not found).
- Add a column Result that contains "Pass" if Passed == True else "Fail", and save to a new CSV.
- Group by Grade and plot average total score per grade.

