Python PyQt/PySide QThread limiting

ghz yesterday ⋅ 1 view

I have a problem with limiting threads, and I want to solve it using QThread. SpiderThread is a QThread object that crawls some URLs, but I want to limit the number of working threads to X at once. I did this earlier with a thread pool and QRunnable, but it is buggy in PySide when the number of URLs is large. So I have this simple code:

self.threads = []
for url in self.urls:
    th = SpiderThread(url)
    th.updateresultsSignal.connect(self.update_results)
    self.threads.append(th)
    th.start()

Does anyone have a working example of limiting threads using QThread?

Answer

To limit the number of concurrent threads using QThread in PySide (or PyQt), you can use a thread pool approach with QThreadPool and QRunnable. However, you can still accomplish thread limiting using QThread by manually controlling the number of active threads.

Here is a basic approach to limit the number of threads using QThread:

Key Concepts:

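  1. QThreadPool – The usual way to cap concurrency: set its maximum thread count and submit QRunnable tasks. It is listed only for comparison; the example below does not use it, since the question reports problems with that route.
  2. QSemaphore – Qt's counting semaphore (PySide6.QtCore.QSemaphore). It holds a fixed number of permits; each thread takes one before it starts crawling and returns it afterwards, so only that many threads do work at the same time.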
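As a quick illustration of the counter behaviour described in point 2, here is a minimal sketch. It uses `threading.Semaphore` from the Python standard library as a stand-in so that it runs without Qt; in Qt code the analogous non-blocking call would be `QSemaphore.tryAcquire()`. The variable names are only for illustration.

```python
import threading

# Stand-in for a semaphore created with two permits.
permits = threading.Semaphore(2)

first = permits.acquire(blocking=False)   # takes permit 1
second = permits.acquire(blocking=False)  # takes permit 2
third = permits.acquire(blocking=False)   # no permit left, so this fails

permits.release()                         # one worker finishes
fourth = permits.acquire(blocking=False)  # the freed permit can be taken again

print(first, second, third, fourth)       # True True False True
```

With a blocking `acquire()`, the third caller would wait instead of failing, which is what the worker threads in the answer rely on.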

Here's a simplified example of limiting the number of threads to X using QThread:

Step-by-Step Solution:

import sys
from PySide6.QtCore import QThread, Signal, QSemaphore
from PySide6.QtWidgets import QApplication, QWidget, QVBoxLayout, QPushButton

# Create a custom QThread for crawling
class SpiderThread(QThread):
    # Signal to send updates (e.g., results from crawling)
    update_results_signal = Signal(str)

    def __init__(self, url, semaphore):
        super().__init__()
        self.url = url
        self.semaphore = semaphore

    def run(self):
        # Block until a permit is free, so that at most X threads
        # do crawling work at the same time
        self.semaphore.acquire()

        try:
            # Simulate work (replace with actual crawling code)
            print(f"Crawling {self.url}...")
            self.update_results_signal.emit(f"Result from {self.url}")
        finally:
            # Release the permit once the work is done, even on error
            self.semaphore.release()

# Main window
class MainWindow(QWidget):
    def __init__(self):
        super().__init__()

        self.urls = ["http://example.com", "http://example.org", "http://example.net"]  # List of URLs
        self.max_threads = 2  # Limit the number of concurrent threads

        self.semaphore = QSemaphore(self.max_threads)  # QSemaphore holding X permits

        self.init_ui()

    def init_ui(self):
        layout = QVBoxLayout()

        self.start_button = QPushButton("Start Crawling", self)
        self.start_button.clicked.connect(self.start_crawling)

        layout.addWidget(self.start_button)
        self.setLayout(layout)

    def start_crawling(self):
        # Create and start SpiderThread for each URL
        self.threads = []  # List to store threads
        for url in self.urls:
            th = SpiderThread(url, self.semaphore)
            th.update_results_signal.connect(self.update_results)
            self.threads.append(th)
            th.start()

    def update_results(self, result):
        print(result)

# Run the application
if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec())

Key Points:

  1. QSemaphore: The QSemaphore is initialized with max_threads permits to control how many threads may work at once. A semaphore is a synchronization primitive: acquire() blocks the calling thread until the count is positive, then decrements it. When a thread finishes, it calls release(), which increments the count again and lets one waiting thread continue.
  2. SpiderThread: Each SpiderThread represents a worker thread that will crawl a URL. Before starting its work (the crawling task), the thread acquires a permit from the semaphore (self.semaphore.acquire()). After finishing the task, it releases the permit (self.semaphore.release()).
  3. Signals: We use signals (update_results_signal) to send results from each thread back to the main GUI thread.
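The acquire/release pattern from points 1 and 2 can be checked without Qt. The sketch below is an assumption-laden stand-in, not the answer's code: it swaps QThread and QSemaphore for `threading.Thread` and `threading.BoundedSemaphore` from the standard library, and `crawl_all` plus the peak counter are hypothetical names added only to measure how many workers are inside the guarded section at once.

```python
import threading
import time

def crawl_all(urls, max_workers):
    """Run one thread per URL, but let only max_workers do work at once."""
    permits = threading.BoundedSemaphore(max_workers)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}
    results = []

    def worker(url):
        with permits:                     # acquire; release is guaranteed on exit
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)              # placeholder for the real crawl
            with lock:
                state["active"] -= 1
                results.append(f"Result from {url}")

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()                         # all threads start; extras wait for a permit
    for t in threads:
        t.join()
    return results, state["peak"]

results, peak = crawl_all([f"http://example.com/{i}" for i in range(8)], max_workers=2)
print(len(results), "results, peak concurrency", peak)
```

Every URL produces a result, while the recorded peak never exceeds `max_workers`. Note that all eight threads exist from the start; the semaphore limits the work, not the number of threads.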

Advantages:

  • Thread Limiting: With the semaphore, at most X threads do crawling work at a time. Every thread is still created and started up front; the ones without a permit simply wait inside acquire().
  • Flexible: You can easily adjust the max_threads variable to control the number of threads that should be working at the same time.
  • Responsive GUI: Waiting happens inside each worker's run(), not in the main thread, so the GUI thread is never blocked while threads wait for an available "slot".

This method is simple and fits the PySide (or PyQt) signal and slot system, though with a very large URL list the number of waiting threads may itself become a cost.
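If the goal is to have no more than X QThread objects started at any moment, one alternative is to keep the URLs in a queue and start the next thread only when a running one finishes. The sketch below shows just that bookkeeping; `ThreadLimiter`, `make_worker` and `FakeWorker` are hypothetical names, and `FakeWorker` stands in for SpiderThread so the example runs without Qt. With real QThreads, the assumption is that `make_worker` would also connect each thread's `finished` signal to `on_finished`, for example `th.finished.connect(lambda th=th: limiter.on_finished(th))`.

```python
from collections import deque

class ThreadLimiter:
    """Start at most max_active workers; start the next one when one finishes."""

    def __init__(self, make_worker, max_active):
        self.make_worker = make_worker    # hypothetical factory, e.g. SpiderThread
        self.max_active = max_active
        self.pending = deque()            # URLs that have no worker yet
        self.active = set()               # workers that were started and not finished

    def submit(self, url):
        self.pending.append(url)
        self._fill()

    def _fill(self):
        # Start workers until the limit is reached or the queue is empty.
        while self.pending and len(self.active) < self.max_active:
            worker = self.make_worker(self.pending.popleft())
            self.active.add(worker)       # keep a reference while it runs
            worker.start()

    def on_finished(self, worker):
        # Free the slot and let the next queued URL start.
        self.active.discard(worker)
        self._fill()

class FakeWorker:
    """Stand-in for a QThread subclass so the sketch runs without Qt."""
    def __init__(self, url):
        self.url = url
        self.started = False

    def start(self):
        self.started = True

limiter = ThreadLimiter(FakeWorker, max_active=2)
for i in range(5):
    limiter.submit(f"http://example.com/{i}")
print(len(limiter.active), len(limiter.pending))   # 2 3
```

Because the `finished` signal is delivered in the thread that owns the receiver, the bookkeeping would normally run in the GUI thread, so this sketch uses no locks.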