A simple wrapper to run SQL (SQLite) queries on pandas.DataFrame objects (Python).
- 'python' >= 3.5
- 'pandas' >= 1.0
With pip (from the PyPI repository):
pip install sqldf
- SELECT query with WHERE condition
# Import libraries
import pandas as pd
import numpy as np
import sqldf
# Create a dummy pd.DataFrame (np.nan marks missing values; the np.NaN alias was removed in NumPy 2.0)
df = pd.DataFrame({'col1': ['A', 'B', np.nan, 'C', 'D'], 'col2': ['F', np.nan, 'G', 'H', 'I']})
# Define a SQL (SQLite3) query
query = """
SELECT *
FROM df
WHERE col1 IS NOT NULL;
"""
# Run the query
df_view = sqldf.run(query)
- UPDATE query that modifies a pd.DataFrame in place
# Import libraries
import pandas as pd
import sqldf
# Load a sample pd.DataFrame from a CSV file
url = ('https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv')
tips = pd.read_csv(url)
# Define a SQL (SQLite3) query
query = """
UPDATE tips
SET tip = tip*2
WHERE tip < 2;
"""
# Run the query
sqldf.run(query)
- More examples in the notebook: Demonstration notebook for SQLDF
- It creates a virtual in-memory SQLite3 database at runtime
- It converts the input pd.DataFrame(s) to SQL table(s)
- It runs the SQL query on the table(s)
- It converts the SQL table(s) back to updated pd.DataFrame(s) if required
- It returns the result of the query if required
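The steps above can be approximated with the standard `sqlite3` module and pandas alone. The following is only a minimal sketch of the idea, not the actual sqldf implementation: the helper name `run_query` and its `tables` argument are hypothetical, and unlike `sqldf.run` it takes the DataFrames explicitly and returns updated copies instead of modifying them in place.

```python
import sqlite3

import pandas as pd


def run_query(query, tables):
    """Hypothetical helper: run `query` against the DataFrames in `tables`.

    `tables` maps SQL table names to pd.DataFrame objects.
    """
    # 1. Create a virtual in-memory SQLite3 database
    conn = sqlite3.connect(":memory:")
    try:
        # 2. Convert each input DataFrame to a SQL table
        for name, frame in tables.items():
            frame.to_sql(name, conn, index=False)

        # 3a. A SELECT query returns its result as a new DataFrame
        if query.lstrip().upper().startswith("SELECT"):
            return pd.read_sql_query(query, conn)

        # 3b. Any other query (e.g. UPDATE) is executed for its side effects
        conn.execute(query)
        conn.commit()

        # 4. Convert the tables back to (updated) DataFrames
        return {
            name: pd.read_sql_query(f'SELECT * FROM "{name}"', conn)
            for name in tables
        }
    finally:
        conn.close()
```

Under these assumptions, `run_query("SELECT * FROM df WHERE col1 IS NOT NULL;", {"df": df})` would return the filtered rows, while an UPDATE query would return a dict holding the modified copy of each table.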