I am having a problem writing a .sql file from a .csv file in py

ghz 8months ago ⋅ 96 views

I am having a problem writing a .sql file from a .csv file in python

I need to write a .sql file that creates a table.

I am importing the .csv file using pandas:

import os
import pandas as pd

csv_directory = '/.../test/'

# Lecture des fichiers CSV et génération des requêtes SQL
sql_queries = []
for file in os.listdir(csv_directory):
  if file.endswith('.csv'):
    df = pd.read_csv(csv_directory + file, header=None)  # Lecture du fichier CSV avec pandas
    table_name = file.split('.')[0]  # Utilisation du nom de fichier comme nom de table
    print (file)
    #print (table_name)
    #print (df)
    create_table_query = f"CREATE TABLE {table_name} (code TEXT NOT NULL, valeur TEXT NULL, description TEXT,CONSTRAINT {table_name}_pk PRIMARY KEY (code));"
    insert_data_query = f"INSERT INTO {table_name} (code, valeur, description) VALUES "
    insert_data_query += ', '.join([str(tuple(row)) for row in df.values]) + ';'
    sql_queries.append(create_table_query)
    sql_queries.append(insert_data_query)

# Écriture des requêtes SQL dans un fichier
with open('list.sql', 'w') as file:
    for query in sql_queries:
        file.write(query + '\n')

but I need :

('non_concerne','Non concerné(e)','information non possible ou non pertinente pour l''élément décrit')

instead of

('non_concerne','Non concerné(e)',"information non possible ou non pertinente pour l'élément décrit")

Answers

To correctly handle single quotes within the strings when generating the SQL INSERT statements, you can use parameterized queries. This approach ensures that special characters like single quotes are properly escaped. Here's how you can modify your code to achieve this:

import os
import pandas as pd

csv_directory = '/.../test/'

# Function to format a value for SQL query
def format_value(value):
    if pd.isna(value):
        return 'NULL'
    elif isinstance(value, str):
        return f"'{value.replace("'", "''")}'"  # Escape single quotes
    else:
        return str(value)

# Read CSV files and generate SQL queries
sql_queries = []
for file in os.listdir(csv_directory):
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join(csv_directory, file), header=None)  # Read CSV file with pandas
        table_name = file.split('.')[0]  # Use the filename as table name
        
        # Create table query
        create_table_query = f"CREATE TABLE {table_name} (code TEXT NOT NULL, valeur TEXT NULL, description TEXT, CONSTRAINT {table_name}_pk PRIMARY KEY (code));"
        sql_queries.append(create_table_query)
        
        # Insert data queries
        insert_data_query = f"INSERT INTO {table_name} (code, valeur, description) VALUES "
        insert_data_query += ', '.join(['(' + ', '.join([format_value(val) for val in row]) + ')' for row in df.values]) + ';'
        sql_queries.append(insert_data_query)

# Write SQL queries to a file
with open('list.sql', 'w') as file:
    for query in sql_queries:
        file.write(query + '\n')

In this code:

  • The format_value function is used to properly format values for SQL queries. It escapes single quotes by replacing them with two single quotes ('').
  • When constructing the INSERT query, each value in the DataFrame is passed through the format_value function to ensure proper formatting.
  • This approach handles cases where the values contain special characters like single quotes, ensuring that the generated SQL queries are valid and safe.