
Targeting specific values from JSON API and inserting into Postgresql, using Python

Right now I am able to connect to the API URL and to my database. I am trying to insert data from the URL into the PostgreSQL database using psycopg2. I don't fully understand how to do this, and this is all I could come up with.

import urllib3
import json
import certifi
import psycopg2
from psycopg2.extras import Json


http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())
url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<password>')
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))


def insert_into_table(data):

    for item in data['issues']:
        item['id'] = Json(item['id'])

    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT into
                     Countries
                    (revenue)
                VALUES
                    (%(id)s);
            """
            cursor.executemany(query, data)

        conn.commit()


insert_into_table(data)

This code gives me a TypeError: string indices must be integers on the cursor.executemany(query, data) line.

I know that json.loads brings back an object and that json.dumps brings back a string, but I wasn't sure which one I should be using. I also know I am completely missing something in how I'm targeting the 'id' value and inserting it into the query.

Also, a little about the API: it is very large and complex, and eventually I'll have to go down multiple trees to grab certain values. Here is an example of what I'm pulling from.

I am trying to grab the "id" under "issues", not the "id" under "issuetype".

{
  "expand": "<>",
  "startAt": 0,
  "maxResults": 50,
  "total": 13372,
  "issues": [
    {
      "expand": "<>",
      "id": "41508",
      "self": "<>",
      "key": "<>",
      "fields": {
        "issuetype": {
          "self": "<>",
          "id": "1",
          "description": "<>",
          "iconUrl": "<>",
          "name": "<>",
          "subtask": <>,
          "avatarId": <>
        },

Answer

Your code has several issues related to handling the JSON data and how you prepare it for insertion into the database. Let's address them and correct your script step by step.

Key Points:

  1. TypeError: string indices must be integers:
    This error occurs because you pass data (a dictionary) directly to cursor.executemany(). Iterating over a dictionary yields its keys, which are plain strings, so psycopg2 ends up trying to index a string with the %(id)s placeholder. Instead, pass a list of dictionaries, one per row to be inserted (see the minimal repro after this list).

  2. Navigating JSON Structure:
    You need to extract the "id" values from "issues" in the JSON structure.

  3. Query and Data Preparation:
    Use a list of dictionaries with the correct structure for cursor.executemany().
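
Here is a minimal repro of the error, assuming a response shaped like yours (the dict literal below is a stand-in for your real payload):

# Iterating over a dict yields its KEYS (plain strings), not rows.
data = {'total': 2, 'issues': [{'id': '41508'}, {'id': '41509'}]}

for item in data:
    print(item)      # prints 'total', then 'issues'
    # item['id']     # TypeError: string indices must be integers

# executemany() performs exactly this iteration, hence the error.
# What it needs instead is one parameter mapping per row:
rows = [{'id': issue['id']} for issue in data['issues']]
print(rows)          # [{'id': '41508'}, {'id': '41509'}]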


Corrected Code:

import urllib3
import json
import certifi
import psycopg2

# Set up HTTP request
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<password>')
response = http.request('GET', url, headers=headers)

# Parse the JSON response
data = json.loads(response.data.decode('utf-8'))

def insert_into_table(data):
    # Extract "id" values from "issues"
    rows = [{'id': issue['id']} for issue in data['issues']]

    # Connect to PostgreSQL and insert data
    with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
        with conn.cursor() as cursor:
            query = """
                INSERT INTO
                    Countries (revenue)
                VALUES
                    (%(id)s)
            """
            cursor.executemany(query, rows)  # Pass the list of dictionaries here

        conn.commit()

# Call the function to insert data
insert_into_table(data)
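
One design note on the insert itself: executemany() issues one INSERT statement per row. If you end up inserting many rows at once, psycopg2.extras.execute_values collapses them into a single multi-row statement and is usually much faster. A sketch with the same data, run in place of the executemany call above (note it swaps the named %(id)s placeholder for the single %s that execute_values requires):

from psycopg2.extras import execute_values

# execute_values takes a sequence of tuples and exactly one %s placeholder;
# it expands %s into a multi-row VALUES list, sent in a single round trip.
values = [(issue['id'],) for issue in data['issues']]
execute_values(cursor, "INSERT INTO Countries (revenue) VALUES %s", values)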

Explanation of Changes:

  1. Extract Data:

    rows = [{'id': issue['id']} for issue in data['issues']]
    
    • This creates a list of dictionaries where each dictionary contains the "id" value from the "issues" list in your JSON.
  2. Use cursor.executemany:

    • cursor.executemany(query, rows) expects a sequence of parameter sets, one per row. Because the query uses the named placeholder %(id)s, each parameter set must be a dictionary with an 'id' key.
  3. Connection Handling:

    • The with blocks ensure the transaction is committed on success and rolled back if an error occurs, and that the cursor is closed. Note that in psycopg2, leaving the connection's with block does not close the connection itself, so call conn.close() once you are done with it.
  4. JSON Parsing:

    • You already use json.loads, which is correct: it parses the JSON string into Python dicts and lists you can index. json.dumps goes the other way, serializing Python objects back into a JSON string, and is not needed for reading the response.
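
Since you mention eventually having to go down multiple trees: after json.loads, the response is just nested dicts and lists, so deeper values are reached by chaining keys. A sketch against the sample you posted (the issuetype_id key is hypothetical, purely to illustrate):

rows = []
for issue in data['issues']:
    rows.append({
        'id': issue['id'],                                    # top-level id
        'issuetype_id': issue['fields']['issuetype']['id'],   # nested value
    })

If a branch may be missing from some issues, chain .get() with defaults, e.g. issue.get('fields', {}).get('issuetype', {}).get('id'), to avoid a KeyError.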

Notes on Scalability:

  • If the API response is large, use pagination rather than one giant request; your sample already exposes startAt, maxResults, and total, which is the usual paging contract (see the sketch after this list).
  • You might also want to add exception handling so database or API failures are reported cleanly instead of crashing the script.
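
A sketch of that paging loop, assuming the endpoint accepts startAt and maxResults as query parameters the way the response fields suggest (the parameter names are an assumption; check your API's docs):

def fetch_all_issues(http, url, headers, page_size=50):
    # Hypothetical helper: walks every page using the startAt/maxResults/
    # total fields visible in the sample response.
    start = 0
    while True:
        r = http.request('GET', url, headers=headers,
                         fields={'startAt': start, 'maxResults': page_size})
        page = json.loads(r.data.decode('utf-8'))
        for issue in page['issues']:
            yield issue
        start += page_size
        if start >= page['total']:   # past the last page
            break

You would then build rows from the generator instead of a single response, e.g. rows = [{'id': issue['id']} for issue in fetch_all_issues(http, url, headers)].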

Let me know if you have further questions or encounter any issues!