I am having a problem writing a .sql file from a .csv file in python
I need to write a .sql file that creates a table.
I am importing the .csv file using pandas:
import os
import pandas as pd
csv_directory = '/.../test/'
# Lecture des fichiers CSV et génération des requêtes SQL
sql_queries = []
for file in os.listdir(csv_directory):
if file.endswith('.csv'):
df = pd.read_csv(csv_directory + file, header=None) # Lecture du fichier CSV avec pandas
table_name = file.split('.')[0] # Utilisation du nom de fichier comme nom de table
print (file)
#print (table_name)
#print (df)
create_table_query = f"CREATE TABLE {table_name} (code TEXT NOT NULL, valeur TEXT NULL, description TEXT,CONSTRAINT {table_name}_pk PRIMARY KEY (code));"
insert_data_query = f"INSERT INTO {table_name} (code, valeur, description) VALUES "
insert_data_query += ', '.join([str(tuple(row)) for row in df.values]) + ';'
sql_queries.append(create_table_query)
sql_queries.append(insert_data_query)
# Écriture des requêtes SQL dans un fichier
with open('list.sql', 'w') as file:
for query in sql_queries:
file.write(query + '\n')
but I need :
('non_concerne','Non concerné(e)','information non possible ou non pertinente pour l''élément décrit')
instead of
('non_concerne','Non concerné(e)',"information non possible ou non pertinente pour l'élément décrit")
Answers
To correctly handle single quotes within the strings when generating the SQL INSERT statements, you can use parameterized queries. This approach ensures that special characters like single quotes are properly escaped. Here's how you can modify your code to achieve this:
import os
import pandas as pd
csv_directory = '/.../test/'
# Function to format a value for SQL query
def format_value(value):
if pd.isna(value):
return 'NULL'
elif isinstance(value, str):
return f"'{value.replace("'", "''")}'" # Escape single quotes
else:
return str(value)
# Read CSV files and generate SQL queries
sql_queries = []
for file in os.listdir(csv_directory):
if file.endswith('.csv'):
df = pd.read_csv(os.path.join(csv_directory, file), header=None) # Read CSV file with pandas
table_name = file.split('.')[0] # Use the filename as table name
# Create table query
create_table_query = f"CREATE TABLE {table_name} (code TEXT NOT NULL, valeur TEXT NULL, description TEXT, CONSTRAINT {table_name}_pk PRIMARY KEY (code));"
sql_queries.append(create_table_query)
# Insert data queries
insert_data_query = f"INSERT INTO {table_name} (code, valeur, description) VALUES "
insert_data_query += ', '.join(['(' + ', '.join([format_value(val) for val in row]) + ')' for row in df.values]) + ';'
sql_queries.append(insert_data_query)
# Write SQL queries to a file
with open('list.sql', 'w') as file:
for query in sql_queries:
file.write(query + '\n')
In this code:
- The
format_value
function is used to properly format values for SQL queries. It escapes single quotes by replacing them with two single quotes (''
). - When constructing the
INSERT
query, each value in the DataFrame is passed through theformat_value
function to ensure proper formatting. - This approach handles cases where the values contain special characters like single quotes, ensuring that the generated SQL queries are valid and safe.