I found a comparison of the Reactor and the Proactor pattern here. Both patterns deal with issues that crop up when building a concurrent network server. Both are related alternatives to thread-based concurrency (or could work as a complement to it).
Both revolve around the concepts of an IO de-multiplexer, event sources and event handlers. The driver program registers some event sources (e.g., sockets) with an IO de-multiplexer (e.g., select() or poll()). When an event occurs on a socket, a corresponding event handler is called. Of course, there must be some mapping from the events of an event source to event handlers.
I found that these patterns are more or less embodied in the Python asyncore and asynchat modules, and I want to discuss how the modules implement them.
The Basics
We'll first compare the terminology of the patterns with that of the Python modules.
- blocking IO: this would translate to a read()/write() on a blocking socket. The call would block until there was some data available to read or the socket was closed. The thread making the call cannot do anything else.
- non-blocking, synchronous IO: this would translate to a read()/write() on a non-blocking socket. The call would return immediately, either with the data read/written, or with a signal that the IO operation could not complete (e.g., read() returns -1 with errno set to EWOULDBLOCK/EAGAIN). It is then the caller's responsibility to retry the call until the operation succeeds.
- non-blocking, asynchronous IO: this would translate to Unix SIGIO mechanisms (unfortunately, I am not familiar with this), or POSIX aio_* functions (not familiar with these either). Essentially, these IO calls return immediately, and the OS starts doing the operation in a separate (kernel level) thread; when the operation completes, the user code is given some notification.
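To make the first two cases concrete, here is a minimal sketch in Python. It assumes a socket.socketpair() as a stand-in for a real client/server connection, and the variable names are only illustrative. Note that in Python the "could not complete" signal arrives as a BlockingIOError exception rather than as a -1 return value.

```python
import socket

# A connected pair of sockets stands in for a client/server connection.
a, b = socket.socketpair()

# Non-blocking, synchronous IO: recv() returns immediately. With nothing
# to read, Python raises BlockingIOError (errno EAGAIN/EWOULDBLOCK).
b.setblocking(False)
try:
    b.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

# Blocking IO: recv() would wait here until data arrived. The peer has
# already written, so in this sketch the call returns right away.
a.sendall(b"ping")
b.setblocking(True)
data = b.recv(1024)

a.close()
b.close()
```

In the non-blocking case the caller has to decide when to retry, which is exactly the job an IO de-multiplexer takes over.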
The Reactor Pattern: asyncore
According to the authors, here is how the Reactor pattern, which usually would use non-blocking synchronous IO, would work:
Here's a read in Reactor:
- An event handler declares interest in I/O events that indicate readiness for read on a particular socket
- The event de-multiplexer waits for events
- An event comes in and wakes-up the demultiplexor, and the demultiplexor calls the appropriate handler
- The event handler performs the actual read operation, handles the data read, declares renewed interest in I/O events, and returns control to the dispatcher
How does this work in Python? It's done using the asyncore module.
- The IO demux is the asyncore.loop() function; it listens for events on sockets using either the select() or poll() OS call. It uses a global or user supplied dictionary to map sockets to event handlers (see below). Event handlers are instances of asyncore.dispatcher (or its subclasses). A dispatcher contains a socket and registers itself in the global map, letting loop() know that its methods should be called in response to events on its sockets. It also, through its readable() and writable() methods, lets loop() know what events it is interested in handling.
- loop() uses select() or poll() to wait for events on the sockets it knows about.
- select()/poll() returns; loop() goes through each socket that has an event, finds the corresponding dispatcher object, determines the type of event, and calls a method corresponding to the event on the dispatcher object. In fact, loop() translates raw readable/writable events on sockets to slightly higher-level events using state information about the socket.
- The dispatcher object's method is supposed to perform the actual IO: for example, in handle_read() we would read() the data off the socket and process it. Control then returns to loop(). Of course, one problem is that we should not do lengthy tasks in our handler, because then our server would not behave very concurrently and would be unable to process other events in time. But what if we did need to do time-consuming tasks in response to the event? That's a subject for another post. For now we assume that our handlers can return quickly enough that as a whole the server behaves pretty concurrently.
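The steps above can be sketched without asyncore itself (the module was deprecated in Python 3.6 and removed in 3.12), using select() directly. This is only an illustration of the shape, not asyncore's actual code: the Collector class, the loop() function and the socket_map dictionary are hypothetical names that mirror asyncore.dispatcher, asyncore.loop() and its socket map.

```python
import select
import socket


class Collector:
    """Hypothetical event handler, shaped like an asyncore.dispatcher."""

    def __init__(self, sock, socket_map):
        self.sock = sock
        self.socket_map = socket_map
        self.received = []
        socket_map[sock.fileno()] = self  # register with the demultiplexer

    def readable(self):
        return True  # always interested in read-readiness events

    def handle_read(self):
        data = self.sock.recv(1024)  # the handler performs the actual IO
        if data:
            self.received.append(data)
        else:
            self.handle_close()  # empty read means the peer closed

    def handle_close(self):
        del self.socket_map[self.sock.fileno()]
        self.sock.close()


def loop(socket_map, timeout=1.0):
    """Hypothetical demultiplexer: run until no handlers remain."""
    while socket_map:
        readers = [fd for fd, h in socket_map.items() if h.readable()]
        ready, _, _ = select.select(readers, [], [], timeout)
        if not ready:
            break  # nothing happened within the timeout
        for fd in ready:
            handler = socket_map.get(fd)
            if handler is not None:
                handler.handle_read()


client, server = socket.socketpair()
handlers = {}
handler = Collector(server, handlers)
client.sendall(b"hello")
client.close()
loop(handlers)
```

The key Reactor trait is visible in handle_read(): the demultiplexer only reports readiness, and the handler itself calls recv().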
The Proactor pattern: a pseudo-implementation in asynchat
According to the authors, here is how the Proactor pattern, which would usually use true asynchronous IO operations provided by the OS, would work:
Here is a read operation in Proactor (true async):
- A handler initiates an asynchronous read operation (note: the OS must support asynchronous I/O). In this case, the handler does not care about I/O readiness events, but instead registers interest in receiving completion events.
- The event demultiplexor waits until the operation is completed
- While the event demultiplexor waits, the OS executes the read operation in a parallel kernel thread, puts data into a user-defined buffer, and notifies the event demultiplexor that the read is complete
- The event demultiplexor calls the appropriate handler;
- The event handler handles the data from user defined buffer, starts a new asynchronous operation, and returns control to the event demultiplexor.
How does this work in Python? Using the asynchat module.
- Event handlers are instances of asynchat.async_chat (or rather, its subclasses). Taking read as an example, the handler would register interest in reading data by providing a readable() method that returns True.
- loop() would then wait on the handler's socket until it was readable. When the socket becomes readable, the subclass does not call an OS function to read the data itself; instead, async_chat.handle_read() is called.
- This method will slurp up all available data.
- Then, handle_read() would call the collect_incoming_data() method of the subclass. From the subclass's point of view, someone else has done the job of doing the actual IO, and it is being signaled that the IO operation is complete.
- collect_incoming_data() processes the data, and by returning, implicitly starts a new async IO cycle.
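Here is a minimal sketch of that emulation, again written against select() rather than asynchat (which, like asyncore, was removed in Python 3.12). CompletionHandler, Collector and run() are hypothetical names: CompletionHandler plays the role of async_chat, and only the collect_incoming_data() method name is taken from the real module. The real async_chat also handles terminators and output buffering, which this sketch leaves out.

```python
import select
import socket


class CompletionHandler:
    """Hypothetical base class playing the role of asynchat.async_chat."""

    def __init__(self, sock):
        self.sock = sock
        self.closed = False

    def handle_read(self):
        # The framework, not the subclass, performs the actual recv().
        data = self.sock.recv(4096)
        if data:
            self.collect_incoming_data(data)  # "read completed" signal
        else:
            self.closed = True
            self.sock.close()

    def collect_incoming_data(self, data):
        raise NotImplementedError


class Collector(CompletionHandler):
    """Application code: sees completed reads, never touches the socket."""

    def __init__(self, sock):
        super().__init__(sock)
        self.chunks = []

    def collect_incoming_data(self, data):
        self.chunks.append(data)


def run(handler, timeout=1.0):
    """Wait for readiness, then let the base class do the IO."""
    while not handler.closed:
        ready, _, _ = select.select([handler.sock], [], [], timeout)
        if not ready:
            break  # nothing happened within the timeout
        handler.handle_read()


client, server = socket.socketpair()
app = Collector(server)
client.sendall(b"GET / HTTP/1.0\r\n")
client.close()
run(app)
```

Underneath, this is still readiness-based IO. The Proactor flavour comes only from where the recv() call lives: in the base class instead of in the application's handler.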
A Unified API
Basically, Python's asynchat provides an emulated Proactor interface to application writers. It would be good if asynchat could be redone so that it used true async IO operations on OSes that support them, and fell back to synchronous IO where they are not available.