In a previous post, Writing data from a Pandas Dataframe to a MySQL table, you learned how to write (export) data from a Pandas DataFrame to a MySQL table using the to_sql() function. Now let's see how to read (import) data from a MySQL database table into a Pandas DataFrame.
The read_sql() function allows you to read data from a MySQL table. Like the to_sql() function, read_sql() needs at least two parameters: the first is a SQL query string or a table name, and the second is an SQLAlchemy engine or a database URI string.
Here is a simple example that uses the read_sql() function to read all records from a MySQL database table named employee.
```python
from sqlalchemy import create_engine
import pandas as pd

hostname = "localhost"
dbname = "mydatabase_name"
uname = "root"
pwd = "password"
tablename = "employee"

engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}"
                       .format(host=hostname, db=dbname, user=uname, pw=pwd))

df = pd.read_sql(tablename, engine)
print(df.head())
```
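For large tables, read_sql() also accepts an optional chunksize parameter, which makes it return an iterator of DataFrames instead of loading everything at once. The sketch below uses an in-memory SQLite engine purely so it can run without a live MySQL server (an assumption for illustration); the same call works with the MySQL engine created above.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the MySQL engine above (assumption for a runnable sketch)
engine = create_engine("sqlite://")
pd.DataFrame({"id": range(5)}).to_sql("employee", engine, index=False)

# With chunksize, read_sql yields DataFrames of at most that many rows
for chunk in pd.read_sql("employee", engine, chunksize=2):
    print(len(chunk))  # prints 2, 2, 1
```

This is useful when the table is too large to fit in memory, since each chunk can be processed and discarded before the next one is fetched.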
Getting data using an SQL query instead of a table name
In the above example we pass the table name to the read_sql() function. Alternatively, you can pass a SQL query in place of the table name. For example, you can make the read_sql() call like this:
```python
sql = "SELECT * FROM employee"
df = pd.read_sql(sql, engine)
```
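When the query needs user-supplied values, it is safer to bind them with the params argument than to format them into the SQL string. A minimal sketch, using an in-memory SQLite engine as a stand-in for the MySQL engine above (the table contents and the :min_age parameter name are illustrative assumptions):

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the MySQL engine (assumption for a runnable sketch)
engine = create_engine("sqlite://")
pd.DataFrame({"id": [1, 2], "name": ["Thomas", "Ben"], "age": [35, 42]}).to_sql(
    "employee", engine, index=False)

# Bind a parameter instead of string-formatting it into the SQL
sql = text("SELECT * FROM employee WHERE age > :min_age")
df = pd.read_sql(sql, engine, params={"min_age": 40})
print(df)
```

Binding parameters this way lets the driver handle quoting and escaping, which avoids SQL injection issues.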
Using a database connection URI instead of an SQLAlchemy engine
In the above examples, the second parameter of the read_sql() function is the engine created using SQLAlchemy's create_engine() function. Instead of the SQLAlchemy engine, you can pass a database URI. For example:
```python
df = pd.read_sql(sql, "mysql+pymysql://{user}:{pw}@{host}/{db}"
                 .format(host=hostname, db=dbname, user=uname, pw=pwd))
```
Specifying an index for the dataframe
While importing from MySQL, you can set any column (or columns) to be used as the index of the dataframe. To do this, the read_sql() function accepts the optional parameter index_col, which can be set to any column name.
For example, consider a table named employee with the following columns.
```
id          bigint(20)
name        text
age         text
country     text
start_date  datetime
```
The following code reads all records from the employee table and sets the id column as the index of the dataframe.
```python
df = pd.read_sql('employee', engine, index_col="id")
print(df.head(10))
```
Sample Output
```
        name  age         country  start_date
id
111   Thomas   35  United Kingdom
222      Ben   42       Australia
333    Harry   28           India
```
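The index_col parameter also accepts a list of column names, which produces a MultiIndex. A minimal sketch, again using an in-memory SQLite engine as a stand-in for the MySQL engine (the sample rows are illustrative assumptions):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the MySQL engine (assumption for a runnable sketch)
engine = create_engine("sqlite://")
pd.DataFrame({
    "country": ["India", "India", "Australia"],
    "id": [1, 2, 3],
    "name": ["Harry", "Asha", "Ben"],
}).to_sql("employee", engine, index=False)

# index_col with a list of columns builds a two-level MultiIndex
df = pd.read_sql("employee", engine, index_col=["country", "id"])
print(df.index.names)  # prints ['country', 'id']
```

Rows can then be looked up by the full index tuple, e.g. df.loc[("India", 2)].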
Parsing Dates
If you have columns in your MySQL table with date values, you can parse them by setting the parse_dates parameter of the read_sql() function. For example:
```python
df = pd.read_sql('employee', engine, index_col="id",
                 parse_dates={'start_date': '%d/%m/%Y'})
```
Note that this parameter is only required if the column's data type is not datetime. For datetime columns, Pandas imports the values as datetime even if you do not set this parameter.