Python PyODBC guide to retrieve data into Pandas DataFrame
This guide is answering my questions that I had when I wanted to connect Python via PyODBC to a MSSQL database on Windows Server 2019.
I started with the PyODBC Documentation and read up on the main parts:
The PyODBC module
Connection
Cursors
Fetching Rows with for loops
Other miscellaneous things that are good to know.
This didn’t answer my questions nor solve my problems about how to get the SQL query results into a DataFrame. That said, there are probably some way to get it done with a connection, a cursor and some looping over rows… However, there was an easier way!
How do I perform a select query from a database with PyODBC and store the results into a Pandas DataFrame in Python?
Here is a Python code example of how you can connect, select data from multiple data bases and store results into a DataFrame with PyODBC:
import pandas as pd
import pyodbc
DB_name = "DB_1"
DB_user = "Peter"
DB_password = "secretpassword"
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};" +
"SERVER=192.168.0.1,1433;" +
"DATABASE={};".format(DB_name) +
"UID={};".format(DB_user) +
"PWD={};".format(DB_password)
)
sql = """
select a.CustomerID, b.TransactionID
from dbo.Table_1 a
left join DB_2.sales.Transaction b on a.CustomerID = b.CustomerID
"""
df = pd.read_sql(sql, conn)
conn.close()
Some more detailed explanations to the above example:
df is your dataframe with results from your query.
Because we assign DB_name = “DB_1” and use it in the the connect string to PyODBC, that database will be assumed to be DB_1 and we leave it out in our FROM statement. However, even if we did say we wanted DB_1 as default database, we can still make a LEFT JOIN to DB_2, i.e. join between two different databases, by specifying that we want to join with DB_2.schema.table.
Lets break down the row “from DB_1.dbo.Table_1 a”
Here we select from database = DB_1, schema=dbo, table or view = Table_1 and we assign an alias “a” to this table.
If we already in the connect string said DATABASE=”DB_1” we only need to write “from dbo.Table_1” since “DB_1” is assumed.
I use the ODBC driver 17 for SQL Server. You google it and download from Microsoft webpage
I hope this was clear on how to retrieve data with PyODBC and store it into a Pandas DataFrame.
With this method you don’t need to learn about cursors and the specifics in PyODBC because Pandas read_sql function takes care of all that for you.
If you have questions, please comment below.