How to plot Large dataset using matplotlib bar graph

ghz 13hours ago ⋅ 1 views

My sql query returning me 141 rows and two columns and i want to plot them in a horizontal bar graph. While doing so the data's are overlapping each other. How i can make the graph crisp and clear.

On Y axis it will show the report name and in X axis it'll show their value. How i can achieve that?

import os
import cx_Oracle
import matplotlib.pyplot as plt
import numpy as np
 


reportid_count = []
count_ID =  []

c = conn.cursor()

query = 'select distinct (LTRIM(REGEXP_SUBSTR(ID, '[0-9]{3,}'), '0')) as ReportID,count(ID) from dev_user.RECORD_TABLE  group by ID'
c.execute(query) 


#loop through the rows fetched and store the records as arrays.
for row in c:
    reportid_count.append(row[0])
    print(row[0])
    count_ID.append(row[1])
    print(row[1])

fig = plt.figure(figsize=(13.5, 5))

#plot the bar chart
plt.barh(reportid_count,count_ID)#,color=['red', 'blue', 'purple']
for i, v in enumerate(count_ID):
    plt.text(v, i, str(v), color='blue', fontweight='bold')


plt.title('Report_Details')
plt.xlabel('Report Count')
plt.ylabel("Report ID's")


path = r"\\dev_server.com\View\Foldert\uidDocuments\Store_Img"

os.chdir(path)
plt.savefig(path + '\squares.png')
  
plt.show()
conn.close()

Sample Dataset-

Report 1    1200
Report 2    0
Report 3    0
Report 4    0
Report 5    0
Report 6    0
Report 7    0
Report 8    0
Report 9    0
Report 10   0
Report 11   0
Report 12   0
Report 13   0
Report 14   0
Report 15   0
Report 16   0
Report 17   0
Report 18   0
Report 19   0
Report 20   0
Report 21   0
Report 22   0
Report 23   1
Report 24   2
Report 25   3
Report 26   4
Report 27   5
Report 28   6
Report 29   100
Report 30   101
Report 31   102
Report 32   103
Report 33   104
Report 34   105
Report 35   106
Report 36   107
Report 37   108
Report 38   109
Report 39   110
Report 40   111
Report 41   112
Report 42   113
Report 43   114
Report 44   115
Report 45   116
Report 46   117
Report 47   118
Report 48   119
Report 49   120
Report 50   121
Report 51   122
Report 52   123
Report 53   124
Report 54   125
Report 55   126
Report 56   127
Report 57   128
Report 58   129
Report 59   130
Report 60   131
Report 61   132
Report 62   133
Report 63   134
Report 64   135
Report 65   136
Report 66   137
Report 67   138
Report 68   139
Report 69   140
Report 70   141
Report 71   142
Report 72   143
Report 73   144
Report 74   145
Report 75   146
Report 76   147
Report 77   148
Report 78   149
Report 79   150
Report 80   151
Report 81   152
Report 82   153
Report 83   154
Report 84   155
Report 85   156
Report 86   157
Report 87   158
Report 88   159
Report 89   160
Report 90   161
Report 91   162
Report 92   163
Report 93   164
Report 94   165
Report 95   166
Report 96   167
Report 97   168
Report 98   169
Report 99   170
Report 100  171
Report 101  172
Report 102  173
Report 103  174
Report 104  175
Report 105  176
Report 106  177
Report 107  178
Report 108  179
Report 109  180
Report 110  181
Report 111  182
Report 112  183
Report 113  184
Report 114  185
Report 115  186
Report 116  187
Report 117  188
Report 118  189
Report 119  190
Report 120  191
Report 121  192
Report 122  193
Report 123  194
Report 124  195
Report 125  196
Report 126  197
Report 127  198
Report 128  199
Report 129  200
Report 130  201
Report 131  202
Report 132  203
Report 133  204
Report 134  205
Report 135  206
Report 136  207
Report 137  208
Report 138  209
Report 139  210
Report 140  211
Report 141  212
Report 142  0
Report 143  0
Report 144  0
Report 145  0
Report 146  0
Report 147  0
Report 148  0
Report 149  0
Report 150  0
Report 151  0
Report 152  700
Report 153  701
Report 154  702
Report 155  703
Report 156  704
Report 157  705
Report 158  706
Report 159  707
Report 160  708
Report 161  709
Report 162  710
Report 163  711
Report 164  712
Report 165  713
Report 166  714
Report 167  715
Report 168  716
Report 169  717
Report 170  718
Report 171  719
Report 172  720
Report 173  721
Report 174  722
Report 175  723
Report 176  724
Report 177  725
Report 178  726
Report 179  727
Report 180  728
Report 181  729
Report 182  730
Report 183  731
Report 184  732
Report 185  733
Report 186  734
Report 187  735
Report 188  736
Report 189  737
Report 190  738
Report 191  739
Report 192  740
Report 193  741
Report 194  742
Report 195  743
Report 196  744
Report 197  745
Report 198  746
Report 199  747
Report 200  748
Report 201  749
Report 202  750
Report 203  751
Report 204  752
Report 205  753
Report 206  754
Report 207  755
Report 208  756
Report 209  757
Report 210  758
Report 211  759
Report 212  760
Report 213  761
Report 214  762
Report 215  763
Report 216  764
Report 217  765
Report 218  766
Report 219  767
Report 220  768
Report 221  769
Report 222  770
Report 223  771
Report 224  772
Report 225  773
Report 226  774
Report 227  775
Report 228  776
Report 229  777
Report 230  778
Report 231  779
Report 232  780
Report 233  781
Report 234  782
Report 235  0
Report 236  0
Report 237  1300
Report 238  1400

Answer

To avoid overlapping in your horizontal bar chart and to make the graph more clear, we can use a combination of several techniques:

  1. Set the figure size: Increase the figure size to give enough space to all labels on the Y-axis.
  2. Rotate Y-axis labels: If there are many report names, rotating the labels can prevent them from overlapping.
  3. Sort the bars: Sorting the bars either by count or by report name will make the graph easier to read.
  4. Add padding between bars: Adjusting the hspace between the bars can create some space to prevent them from being too close.
  5. Use log scale if counts vary widely: If the values on the X-axis (counts) vary dramatically, you can use a logarithmic scale to make the smaller bars more visible.

Here's an updated version of your code with these suggestions:

import os
import cx_Oracle
import matplotlib.pyplot as plt
import numpy as np

# Initialize lists for storing report names and counts
reportid_count = []
count_ID = []

# Establish a database connection (ensure `conn` is defined)
c = conn.cursor()

# Query to get report ID counts
query = 'select distinct (LTRIM(REGEXP_SUBSTR(ID, \'[0-9]{3,}\'), \'0\')) as ReportID, count(ID) from dev_user.RECORD_TABLE group by ID'
c.execute(query)

# Loop through the rows and store the data
for row in c:
    reportid_count.append(row[0])
    count_ID.append(row[1])

# Close the connection
conn.close()

# Sort by count in descending order to prevent overlapping
sorted_indices = np.argsort(count_ID)[::-1]  # Sorting in descending order
reportid_count_sorted = np.array(reportid_count)[sorted_indices]
count_ID_sorted = np.array(count_ID)[sorted_indices]

# Create a figure and axes for the plot
fig, ax = plt.subplots(figsize=(13.5, 8))

# Plot the horizontal bar chart
ax.barh(reportid_count_sorted, count_ID_sorted, color='skyblue')

# Add text labels for each bar
for i, v in enumerate(count_ID_sorted):
    ax.text(v + 10, i, str(v), color='blue', fontweight='bold')

# Set the title and labels
ax.set_title('Report Details', fontsize=16)
ax.set_xlabel('Report Count', fontsize=12)
ax.set_ylabel('Report ID\'s', fontsize=12)

# Rotate the Y-axis labels to prevent overlap
plt.yticks(rotation=0)

# Optionally, use log scale if the counts vary dramatically
# ax.set_xscale('log')

# Set padding between bars to make the chart less crowded
plt.subplots_adjust(left=0.2, right=0.8, top=0.9, bottom=0.1)

# Save the figure as an image file
path = r"\\dev_server.com\View\Foldert\uidDocuments\Store_Img"
os.chdir(path)
plt.savefig(path + '\\squares.png')

# Display the plot
plt.show()

Key Changes:

  1. Sorting the bars:

    • I sorted the bars in descending order by the count to make sure the most significant bars are shown at the top.
    • This can help reduce overlap since bars with larger counts are visually more prominent.
  2. Increasing figure size:

    • The figure size has been adjusted to (13.5, 8) to provide more space for the chart.
  3. Spacing adjustments:

    • Used plt.subplots_adjust() to adjust the margins, which can help make the chart less crowded.
  4. Rotation of Y-axis labels:

    • Set the Y-axis labels to have a horizontal orientation (rotation=0) for better clarity. You can adjust this depending on the length of your report names.
  5. Adding labels:

    • Added labels on the bars for better clarity, with a little padding (v + 10) to ensure the labels are visible even if the bar is very short.
  6. Log scale option:

    • If your data varies widely (e.g., from 0 to a few thousand), consider using a logarithmic scale (ax.set_xscale('log')), which compresses large values and makes smaller ones more visible.

Output:

This code will produce a horizontal bar chart where:

  • The report names (IDs) are listed on the Y-axis.
  • The corresponding count for each report is displayed on the X-axis.
  • The bars are ordered by count in descending order, ensuring that the graph is not overcrowded.
  • The bar labels are displayed next to the bars for clarity.

Let me know if you need further customizations or explanations!