Pandas is currently one of the most popular Python library used for data analysis. A DataFrame in Pandas is a data structure for storing data in tabular form, i.e., in rows and columns. This article describes how to write the data in a Pandas DataFrame to a MySQL table.
Consider a DataFrame with three records like below.
You can create a database table in MySQL and insert this data using the to_sql() function in Pandas. The to_sql()
function requires two mandatory parameters - table name and an SQLAlchemy engine object.
SQLAlchemy is a Python toolkit and Object Relational Mapper (ORM) that allows Python to work with SQL Databases. An engine is the base of any SQLAlchemy application that talks to the database. The engine object is created by calling the create_engine() function with database dialect and connection parameters.
The following Python program creates a new table named users
in a MySQL database and populates the table with the three records from our example dataframe.
import pandas as pd from sqlalchemy import create_engine # Credentials to database connection hostname="localhost" dbname="mydb_name" uname="my_user_name" pwd="my_password" # Create dataframe df = pd.DataFrame(data=[[111,'Thomas','35','United Kingdom'], [222,'Ben',42,'Australia'], [333,'Harry',28,'India']], columns=['id','name','age','country']) # Create SQLAlchemy engine to connect to MySQL Database engine = create_engine("mysql+pymysql://{user}:{pw}@{host}/{db}" .format(host=hostname, db=dbname, user=uname, pw=pwd)) # Convert dataframe to sql table df.to_sql('users', engine, index=False)
After executing the above Python script, login to your MySQL database and query the newly created users table.
SELECT * from users;
id | name | age | country |
---|---|---|---|
111 | Thomas | 35 | United Kingdom |
222 | Ben | 42 | Australia |
333 | Harry | 28 | India |
Now let's look at some of the optional parameters that can be passed to the to_sql()
function.
index : True or False. Default is True.
If the index paramter in the to_sql function is set to True (index=True) or if the index parameter is omitted then the DataFrame's index is added as the first column in the MySQL database table. If you don't want the index then set
index=False
index_label : string. Default is None.
When index=True, the DataFrame's index is included in the table under column name index. You can have a different column name for the index column by setting a string value to the index_label parameter. For example
df.to_sql('users', engine, index=True, index_label="RowNum")
if_exists : fail, replace, append. Default is fail
This parameter decides what to do if the table already exist in the database. By dafault an error is raised if the table exists. If you set this parameter to
replace
then the existing table is first dropped before inserting the values. If set toappend
then the values in the dataframe are appended to the existing table.dtype : dictionary
The
dtype
parameter allows you to specify the data type of the table columns as a dictionary object. The keys in the dictionary are column names and values can be one of the SQLAlchemy data types. For examplefrom sqlalchemy.types import String, Integer df.to_sql('users', engine, dtype={"age": Integer(), "name": String(30)})
The above
to_sql()
function call creates theusers
table with theage
column set asinteger
type andname
asvarchar
Columns in users table
index bigint(20) id bigint(20) name varchar(30) age int(11) country text
Adding a PRIMARY KEY
The to_sql() function does not offer any options for creating a Primary Key column. However you can achieve this by calling the .execute method of SQLAlchemy engine.
The following statement sets the id
column in users table as the primary key.
engine.execute('ALTER TABLE users ADD PRIMARY KEY (`id`);')