Introduction to Databases using SQLAlchemy

Relational Databases

  • Data about entities is organised into tables.

  • Each row or record is an instance of an entity.

  • Each column has information about an attribute.

  • Tables can be linked together using unique keys.

  • Databases support more data, multiple simultaneous users, and data quality controls.

  • Data types are specified for each column.

  • SQL is used to interact with relational databases, e.g. Microsoft SQL Server, Oracle, PostgreSQL, and SQLite.
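The points above can be sketched with a tiny in-memory database. This example uses Python's built-in sqlite3 module rather than SQLAlchemy so that it runs without a database file; the boroughs and calls tables and all of their values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(":memory:")

# Each column has a declared data type; borough_id is a unique key
conn.execute("CREATE TABLE boroughs (borough_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE calls ("
    "call_id INTEGER PRIMARY KEY, "
    "borough_id INTEGER REFERENCES boroughs (borough_id), "
    "complaint_type TEXT)"
)

# Each row is one instance of the entity its table describes
conn.executemany(
    "INSERT INTO boroughs VALUES (?, ?)",
    [(1, "BRONX"), (2, "QUEENS")],
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [(10, 1, "SAFETY"), (11, 2, "PLUMBING"), (12, 1, "PLUMBING")],
)

# The shared borough_id key links the two tables together
linked = conn.execute(
    "SELECT calls.call_id, boroughs.name, calls.complaint_type "
    "FROM calls JOIN boroughs ON calls.borough_id = boroughs.borough_id "
    "ORDER BY calls.call_id"
).fetchall()
print(linked)
```

Storing the borough name once and referring to it by key is what lets several tables describe related entities without repeating data.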

Connecting to Databases

  1. Create a database engine

  2. Query the database

SQLAlchemy's create_engine() makes an engine to handle database connections.

  • Needs string url of database to connect to.

  • SQLite url format: sqlite:///filename.db

Use pd.read_sql(query, engine) to load data from a database. Arguments:

  • query: string containing the SQL query to run or the name of the table to load.

  • engine: connection/database engine object
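As a minimal end-to-end sketch of the two steps, the following builds an engine and loads a query result into a dataframe. It assumes pandas and SQLAlchemy are installed, uses the URL "sqlite://" (no filename) for a throwaway in-memory database instead of data.db, and fills a weather table with made-up values so there is something to query.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# "sqlite://" with no filename is an in-memory SQLite database
engine = create_engine("sqlite://")

# Hypothetical data, written to a table so there is something to query
sample = pd.DataFrame({"date": ["2018-01-01", "2018-01-02"], "tmax": [19, 26]})
sample.to_sql("weather", engine, index=False)

# Step 1 is done (the engine exists); list the tables it can see
table_names = inspect(engine).get_table_names()

# Step 2: pass a SQL query string and the engine to read_sql()
weather = pd.read_sql("SELECT * FROM weather", engine)

print(table_names)
print(weather)
```

With a real file the only change would be the URL, for example sqlite:///data.db.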

Getting data from a database

# Import sqlalchemy's create_engine() and inspect() functions
from sqlalchemy import create_engine, inspect

# Create the database engine
engine = create_engine('sqlite:///data.db')

# View the tables in the database
# (engine.table_names() was removed in SQLAlchemy 2.0)
print(inspect(engine).get_table_names())

Load entire tables

# Import pandas, used for read_sql() here and in the examples that follow
import pandas as pd

# Create the database engine
engine = create_engine("sqlite:///data.db")

# Create a SQL query to load the entire weather table
query = """
SELECT * 
  FROM weather;
"""

# Load weather with the SQL query
weather = pd.read_sql(query, engine)

# View the first few rows of data
print(weather.head())

Selecting columns with SQL

# Create database engine for data.db
engine = create_engine("sqlite:///data.db")

# Write query to get date, tmax, and tmin from weather
query = """
SELECT date, 
       tmax, 
       tmin
  FROM weather;
"""

# Make a dataframe by passing query and engine to read_sql()
temperatures = pd.read_sql(query, engine)

# View the resulting dataframe
print(temperatures)

Selecting rows

# Create query to get hpd311calls records about safety
query = """
SELECT *
  FROM hpd311calls
 WHERE complaint_type = 'SAFETY';
"""

# Query the database and assign result to safety_calls
safety_calls = pd.read_sql(query, engine)

# Import matplotlib for plotting
import matplotlib.pyplot as plt

# Graph the number of safety calls by borough
call_counts = safety_calls.groupby('borough').unique_key.count()
call_counts.plot.barh()
plt.show()

Filtering on multiple conditions

# Create query for records with max temps <= 32 or snow >= 1
query = """
SELECT *
  FROM weather
 WHERE tmax <= 32
    OR snow >= 1;
"""

# Query database and assign result to wintry_days
wintry_days = pd.read_sql(query, engine)

# View summary stats about the temperatures
print(wintry_days.describe())
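To make the effect of the condition keyword concrete, here is a small sketch comparing OR with AND on the same two conditions. It uses Python's built-in sqlite3 module and an in-memory table with made-up values; the simplified weather schema and the placeholder dates d1 to d4 are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (date TEXT, tmax INTEGER, snow REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("d1", 30, 0.0),  # cold, no snow
        ("d2", 40, 2.0),  # snow, above freezing
        ("d3", 28, 1.5),  # cold and snowy
        ("d4", 55, 0.0),  # neither
    ],
)

# OR keeps rows that meet at least one condition
either = conn.execute(
    "SELECT date FROM weather WHERE tmax <= 32 OR snow >= 1 ORDER BY date"
).fetchall()

# AND keeps only rows that meet both conditions
both = conn.execute(
    "SELECT date FROM weather WHERE tmax <= 32 AND snow >= 1 ORDER BY date"
).fetchall()

print(either)
print(both)
```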

Counting in groups

# Create query to get call counts by complaint_type
query = """
SELECT complaint_type,
       COUNT(*)
  FROM hpd311calls
 GROUP BY complaint_type;
"""

# Create dataframe of call counts by issue
calls_by_issue = pd.read_sql(query, engine)

# Graph the number of calls for each housing issue
calls_by_issue.plot.barh(x="complaint_type")
plt.show()

Working with aggregate functions

# Create query to get temperature and precipitation by month
query = """
SELECT month,
       MAX(tmax),
       MIN(tmin),
       SUM(prcp)
  FROM weather
 GROUP BY month;
"""

# Get dataframe of monthly weather stats
weather_by_month = pd.read_sql(query, engine)

# View weather stats by month
print(weather_by_month)
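Aggregate columns come back with names such as MAX(tmax) unless they are renamed. One possible variation, sketched here with Python's built-in sqlite3 module and made-up values, uses AS to give each aggregate a readable name; the aliases max_tmax, min_tmin, and total_prcp are hypothetical choices, not names from the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (month TEXT, tmax INTEGER, tmin INTEGER, prcp REAL)"
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?)",
    [
        ("January", 40, 25, 0.5),
        ("January", 35, 20, 0.25),
        ("February", 50, 30, 1.0),
    ],
)

# AS renames each aggregate column in the result
cursor = conn.execute(
    """
    SELECT month,
           MAX(tmax) AS max_tmax,
           MIN(tmin) AS min_tmin,
           SUM(prcp) AS total_prcp
      FROM weather
     GROUP BY month
     ORDER BY month;
    """
)

# cursor.description holds one tuple per result column; item 0 is the name
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```

The same query string can be passed to pd.read_sql(), in which case the aliases become the dataframe's column names.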

Joining tables

# Query to join weather to call records by date columns
query = """
SELECT * 
  FROM hpd311calls
  JOIN weather 
  ON hpd311calls.created_date = weather.date;
"""

# Create dataframe of joined tables
calls_with_weather = pd.read_sql(query, engine)

# View the dataframe to make sure all columns were joined
print(calls_with_weather.head())
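A plain JOIN is an inner join, so a row appears in the result only when its key has a match in both tables. The sketch below illustrates this with Python's built-in sqlite3 module; the two-column versions of hpd311calls and weather and all values in them are simplified, made-up stand-ins for the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hpd311calls (unique_key INTEGER, created_date TEXT)")
conn.execute("CREATE TABLE weather (date TEXT, prcp REAL)")

conn.executemany(
    "INSERT INTO hpd311calls VALUES (?, ?)",
    [(1, "01/01/2018"), (2, "01/02/2018"), (3, "01/05/2018")],
)
# No weather record exists for 01/05/2018
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("01/01/2018", 0.0), ("01/02/2018", 0.3)],
)

# Only calls whose created_date matches a weather date are returned
joined = conn.execute(
    """
    SELECT hpd311calls.unique_key, weather.prcp
      FROM hpd311calls
      JOIN weather
        ON hpd311calls.created_date = weather.date
     ORDER BY hpd311calls.unique_key;
    """
).fetchall()

print(joined)
```

Call 3 is dropped because no weather row shares its date, which is worth checking for when the row count after a join is lower than expected.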

Joining and filtering

# Query to get hpd311calls and precipitation values
query = """
SELECT hpd311calls.*, weather.prcp
  FROM hpd311calls
  JOIN weather
    ON hpd311calls.created_date = weather.date;
"""

# Load query results into the leak_calls dataframe
leak_calls = pd.read_sql(query, engine)

# View the dataframe
print(leak_calls.head())

To keep only the water leak calls, add a WHERE clause after the join:

query = """
SELECT hpd311calls.*, weather.prcp
  FROM hpd311calls
  JOIN weather
    ON hpd311calls.created_date = weather.date
 WHERE hpd311calls.complaint_type = 'WATER LEAK';
"""

# Load query results into the leak_calls dataframe
leak_calls = pd.read_sql(query, engine)

# View the dataframe
print(leak_calls.head())