Accessing values in a dict by key in multiprocessing

ghz 11hours ago ⋅ 1 views

Using the multiprocessing module, I wrote a server to serve a dict. Now, trying to access that dict by key with a client, I get the following error (server.py and client.py are at the bottom of the post):

Traceback (most recent call last):
  File "client.py", line 19, in <module>
    item = my_dict[key]
TypeError: 'AutoProxy[get_dict]' object is not subscriptable

I believe this is due to the fact that the dict that I register()-ed with my SyncManager gets pickled and passed on as an AutoProxy object. When I check the methods of the AutoProxy object with print(dir(my_dict)), this is what I get:

['_Client', '__builtins__', '__class__', '__deepcopy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_address_to_local', '_after_fork', '_authkey', '_callmethod', '_close', '_connect', '_decref', '_exposed_', '_getvalue', '_id', '_idset', '_incref', '_isauto', '_manager', '_mutex', '_serializer', '_tls', '_token', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

Which is different from the output of print(dir(my_dict)) in server.py:

['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

It looks like this AutoProxy object retains some methods of the dict object, but apparently not all of them. Crucially, it doesn't retain the __getitem__ method, which prevents me from accessing items by key.

How can I access the dict items by key? Also, any explanation of how proxies work with Python multiprocessing would be very helpful.

Note: I don't need to modify the dict values, I just need to extract them by key.

server.py:

from multiprocessing.managers import SyncManager

my_dict = {'item_1': 1, 'item_2':2}
def get_my_dict():
    return my_dict

class MyManager(SyncManager):
    pass

if __name__ == "__main__":
    port_num = 4343
    MyManager.register("get_dict", get_my_dict)
    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    manager.start()
    input("Press Enter to kill server".center(50, "-"))
    manager.shutdown()  # call the method; a bare `manager.shutdown` does nothing

client.py:

from multiprocessing.managers import SyncManager
import sys

class MyManager(SyncManager):
    pass

MyManager.register("get_dict")

if __name__ == "__main__":
    port_num = 4343
    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    manager.connect()
    my_dict = manager.get_dict()

    print("dict = %s" % (dir(my_dict)))
    keys = list(my_dict.keys())
    print(keys)
    for key in keys:
        print(my_dict[key])

Answer

The error comes from how multiprocessing.managers builds proxies. Because get_dict was registered without a proxytype, the manager falls back to AutoProxy, which exposes only the referent's public methods, that is, callables whose names do not start with an underscore (keys, get, items, and so on). Special methods such as __getitem__, which implements subscripting (my_dict[key]), are therefore left out.

Explanation of the issue:

An AutoProxy is a local stand-in for an object that lives in the manager's server process. Each method call on the proxy is sent to the server, executed on the real dict there, and the pickled result is sent back. The proxy class is generated from a list of exposed method names, and only those names can be called; anything else, including __getitem__ in this case, does not exist on the proxy.

That matches your dir() output: the proxy has keys(), items() and get(), but no __getitem__, so square-bracket access raises TypeError. As long as __getitem__ is not exposed, use get() to read values.
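The rule described above can be imitated without any multiprocessing at all. The following is a minimal sketch, not the library's actual implementation: public_method_names and make_toy_proxy_type are hypothetical helpers written for this illustration, and the forwarding happens in-process instead of over a connection.

```python
def public_method_names(obj):
    """Names of callable attributes that do not start with an underscore."""
    return [
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


def make_toy_proxy_type(referent):
    """Build a class whose methods forward to `referent` (hypothetical helper)."""
    def forwarder(name):
        def method(self, *args, **kwargs):
            # A real proxy would send (name, args, kwargs) to the server process;
            # this toy version simply calls the referent directly.
            return getattr(referent, name)(*args, **kwargs)
        return method

    methods = {name: forwarder(name) for name in public_method_names(referent)}
    return type("ToyProxy", (), methods)


source = {"item_1": 1, "item_2": 2}
exposed = public_method_names(source)
proxy = make_toy_proxy_type(source)()

print("get" in exposed, "__getitem__" in exposed)  # True False
print(proxy.get("item_1"))                          # 1

error_message = None
try:
    proxy["item_1"]  # the generated class has no __getitem__
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'ToyProxy' object is not subscriptable
```

The toy class fails in the same way as the AutoProxy in the question: subscripting looks up __getitem__ on the class, and the class was only given the public names.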

Solution:

You can use the get() method (which is available on the AutoProxy object) to retrieve values by key, instead of using subscripting (my_dict[key]). Here's how you can modify your client.py to work correctly:

Modify client.py:

from multiprocessing.managers import SyncManager

class MyManager(SyncManager):
    pass

MyManager.register("get_dict")

if __name__ == "__main__":
    port_num = 4343
    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    manager.connect()
    my_dict = manager.get_dict()

    print("dict = %s" % (dir(my_dict)))  # Show available methods

    # Use the `get()` method instead of subscripting
    keys = list(my_dict.keys())
    print("Keys:", keys)
    for key in keys:
        item = my_dict.get(key)  # Use get() to access items
        print(f"Item for {key}: {item}")

Explanation:

  • Instead of my_dict[key], call my_dict.get(key); get is a public method, so the AutoProxy exposes it.
  • This avoids the TypeError: 'AutoProxy[get_dict]' object is not subscriptable error. Note that get() returns None for a missing key instead of raising KeyError, so a missing key and a stored None look the same.
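If the difference between a missing key and a stored None matters, one option is a small wrapper that restores the KeyError behaviour of subscripting. This is only a sketch: get_required is a hypothetical helper, and a plain dict stands in for the proxy so the block runs on its own. It relies only on keys() and get(), both of which appear in the proxy's dir() output above.

```python
def get_required(mapping_like, key):
    """Return the value for `key`, raising KeyError if it is absent.

    Works with anything that offers keys() and get(), such as the proxy
    in client.py. Through a proxy this costs two round trips.
    """
    if key not in mapping_like.keys():
        raise KeyError(key)
    return mapping_like.get(key)


# A plain dict stands in for the proxy in this self-contained demo.
stand_in = {"item_1": 1, "nothing": None}

print(stand_in.get("absent"))             # None, same as for "nothing"
print(get_required(stand_in, "item_1"))   # 1
print(get_required(stand_in, "nothing"))  # None, but the key exists

missing_key = None
try:
    get_required(stand_in, "absent")
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)                        # absent
```

The usual trick of passing a unique object() as the default to get() and testing it with `is` would not work through a proxy, because the default is pickled on the way to the server and comes back as a different object.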

Other Notes:

  1. How AutoProxy works: AutoProxy is the default proxy type that register() uses when no proxytype is given. The real object stays in the server process; the proxy forwards each exposed method call to it and returns the result. By default only public methods are exposed, which is why operations backed by special methods (such as item access with []) are unavailable.

  2. Alternatives:

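    • If you need direct item access (my_dict[key]), use a proxy type that defines __getitem__. SyncManager's own dict() method returns such a proxy (DictProxy), and register() accepts a proxytype argument, so registering get_dict with proxytype=DictProxy (imported from multiprocessing.managers) in server.py, and for consistency in client.py, should make subscripting work.
    • Since you only read values, you can also call my_dict.copy() once to fetch a plain local dict and index that. Note that multiprocessing.Value and multiprocessing.Array are not a substitute here: they hold simple ctypes data in shared memory that child processes inherit, not arbitrary objects served over a socket.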
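The proxytype route can be tried out in a single file. The sketch below makes two assumptions purely to stay self-contained: the server runs in a background thread of the same process (via get_server() and serve_forever()) instead of a separate server.py, and port 0 is used so the operating system picks a free port. The names my_dict, get_my_dict and MyManager are taken from the question; read_all_by_subscript is a hypothetical helper.

```python
import threading
from multiprocessing.managers import DictProxy, SyncManager

my_dict = {"item_1": 1, "item_2": 2}


def get_my_dict():
    return my_dict


class MyManager(SyncManager):
    pass


# proxytype=DictProxy makes the server expose DictProxy's method list,
# which includes __getitem__, instead of only the public methods.
MyManager.register("get_dict", get_my_dict, proxytype=DictProxy)


def read_all_by_subscript():
    # Server side (demo only): port 0 asks the OS for any free port.
    server = MyManager(("127.0.0.1", 0), authkey=b"password").get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Client side: connect to the address the server actually bound.
    client = MyManager(server.address, authkey=b"password")
    client.connect()
    proxy = client.get_dict()

    # Subscripting now works because the proxy class defines __getitem__.
    return {key: proxy[key] for key in proxy.keys()}


values = read_all_by_subscript()
print(values)
```

In the two-file layout from the question, the equivalent change would be to add proxytype=DictProxy to the register() call in server.py, and optionally to the one in client.py.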

Summary:

  • The error occurs because AutoProxy, by default, exposes only public methods and therefore not __getitem__.
  • Instead of using my_dict[key], use my_dict.get(key) to read dictionary values.
  • This resolves the TypeError without any change to server.py.