Asynchronous Mailer Information
Peter Williams
8/4/2000

1. INTRODUCTION

It's pretty clear that the Evolution mailer needs to be asynchronous
in some manner. Blocking the UI completely on IMAP calls or large disk
reads is unacceptable, and it's really nice to be able to thread the
message view in the background, or do other things while a mailbox is
downloading.

The problem in making Evolution asynchronous is Camel. Camel is not
asynchronous, for a few reasons. All of its interfaces are synchronous
-- calls like camel_store_get_folder, camel_folder_get_message,
etc. can take a very long time if they're being performed over a
network or with a large mbox mailbox file. However, these functions
have no mechanism for specifying that the operation is in progress but
not complete, and no mechanism for signaling when the operation does
complete.

2. WHY I DIDN'T MAKE CAMEL ASYNCHRONOUS

It seems like it would be a good idea, then, to rewrite Camel to be
asynchronous. This presents several problems:

* Many interfaces must be rewritten to support "completed"
  callbacks, etc. Some of these interfaces are connected to
  synchronous CORBA calls.
* Everything must be rewritten to be asynchronous. This includes
  CamelStore, CamelFolder, CamelMimeMessage, CamelProvider, every
  subclass thereof, and all the code that touches these: the files
  in camel/, mail/, filter/, and composer/. The change would be a
  complete redesign for any provider implementation.
* All the work on providers (IMAP, mh, mbox, nntp) up to this point
  would be wasted. While they were being rewritten, evolution-mail
  would be useless.

However, it is worth noting that the solution I chose is not optimal,
and I think that it would be worthwhile to write a libcamel2 or some
such thing that was designed from the ground up to work
asynchronously. Starting fresh from such a design would work, but
trying to move the existing code over would be more trouble than it's
worth.

3. WHY I MADE CAMEL THREADED

If Camel was not going to be made asynchronous, really the only other
choice was to make it multithreaded. [1] No one has been particularly
excited by this plan, as writing and debugging MT-safe code is
hard. But there wasn't much of a choice, and I believed that a simple
thread wrapper could be written around Camel.

The important thing to know is that while Camel is multithreaded, we
DO NOT and CANNOT share objects between threads. Instead,
evolution-mail sends a request to a dispatching thread, which performs
the action or queues it to be performed. (See section 4 for details.)

The goal that I was working towards is that there should be no calls
to Camel made, ever, in the main thread. I didn't expect to reach that
goal, and I didn't, but that was the intent.

[1]. Well, we could fork off another process, but the two would share
so much data that this would be pretty impractical.

4. IMPLEMENTATION

a. CamelObject

Threading presented a major problem regarding Camel. Camel is based on
the GTK Object system, and uses signals to communicate events. This is
okay in a nonthreaded application, but the GTK Object system is not
thread-safe. Particularly, signals and object allocations use static
data. Using either one inside Camel would guarantee that we'd be stuck
with random crashes forevermore. That's Bad (TM).

There were two choices: make sure to limit our usage of GTK, or stop
using the GTK Object system. I decided to do the latter, as the former
would lead to a mess of "what GTK calls can we make" and
GDK_THREADS_ENTER and accidentally calling UI functions and upgrades
to GTK breaking everything.

So I wrote a very very simple CamelObject system. It had three goals:

* Be really straightforward: just encapsulate the type hierarchy
  without all that GtkArg silliness or anything.
* Be as compatible as possible with the GTK Object system to make
  porting easy
* Be threadsafe

It supports:

* Type inheritance
* Events (signals)
* Type checking
* Normal refcounting
* No unref/destroy messiness
* Threadsafety
* Class functions

The entire code to the object system is in camel/camel-object.c. It's
a naive implementation and not full of features, but intentionally
that way. The main differences between GTK Objects and Camel Objects
are:

* s,gtk,camel,i of course
* Finalize is no longer a GtkObjectClass function. You specify a
  finalize function along with an init function when declaring a
  type, and it is called automatically and chained automatically.
* Declaring a type is a slightly different API
* The signal system is replaced with a not-so-clever event
  system. Every event is equivalent to a NONE__POINTER signal. The
  syntax is slightly different: a class "declares" an event and
  specifies a name and a "prep func", which is called before the
  event is triggered and can cancel it.
* There is only one CamelXXXClass in existence for every type. All
  objects share it.

There is a shell script, tools/make-camel-object.sh, that will do all
of the common substitutions to make a file CamelObject-compatible.
Usually all that needs to be done is to move the implementation of the
finalize event out of the class init, modify the get_type function,
and replace signals with events.

Pitfalls in the transition that I ran into were:

* gtk_object_ref -> camel_object_ref or you coredump
* some files return 'guint' instead of GtkType and must be changed
* Remove the #include
* gtk_object_set_datas must be changed (This happened once; I added
  a static hashtable)
* signals have to be fudged a bit to match the gpointer input
* the BAST_CASTARD option is on, meaning failed typecasts will
  return NULL, almost guaranteeing a segfault -- gets those bugs
  fixed double-quick!

b. API -- mail_operation_spec

I worked by creating a very specific definition of a "mail operation"
and writing an engine to queue and dispatch such operations.

A mail operation is defined by a structure, mail_operation_spec,
prototyped in mail-threads.h. It comes in three logical parts -- a
"setup" phase, executed in the main thread; a "do" phase, executed in
the dispatch thread; and a "cleanup" phase, executed in the main
thread. These three phases are guaranteed to be performed in order and
atomically with respect to other mail operations.

Each of these phases is represented by a function pointer in the
mail_operation_spec structure. The function mail_operation_queue() is
called and passed a pointer to a mail_operation_spec and a
user_data-style pointer that fills in the operation's parameters. The
"setup" callback is called immediately, though that may change.

Each callback is passed three parameters: a pointer to the user_data,
a pointer to the "operation data", and a pointer to a
CamelException. The "operation data" is allocated automatically and
freed when the operation completes. Internal data that needs to be
shared between phases should be stored here. The size allocated is
specified in the mail_operation_spec structure.

Because all of the callbacks use Camel calls at some point, the
CamelException is provided as a convenience. The dispatcher will catch
exceptions and display error dialogs, unlike the synchronous code,
which lets exceptions fall through the cracks fairly easily.

I tried to implement all the operations following this convention.
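As a rough illustration of the three phases and the automatically
allocated "operation data", here is a minimal sketch that does not
depend on GLib or Camel. Everything in it is hypothetical: the real
structure is declared in mail-threads.h, the field names and the
sketch_exception type are stand-ins, both threads are collapsed into
one function call, and skipping later phases after a failure is an
assumption rather than documented behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CamelException: id 0 means "no error". */
typedef struct { int id; } sketch_exception;

typedef void (*sketch_phase_fn) (void *in_data, void *op_data,
				 sketch_exception *ex);

/* Assumed layout, not the real mail_operation_spec. */
typedef struct {
	size_t opdata_size;       /* bytes of operation data to allocate */
	sketch_phase_fn setup;    /* main thread in the real design */
	sketch_phase_fn do_op;    /* dispatch thread in the real design */
	sketch_phase_fn cleanup;  /* main thread in the real design */
} op_spec_sketch;

/* Run the phases in order, sharing one zeroed op_data block between
 * them. Returns 0 on success, otherwise the exception id (or -1 if
 * the operation data could not be allocated). */
static int run_operation_sketch (const op_spec_sketch *spec, void *in_data)
{
	sketch_exception ex = { 0 };
	void *op_data = NULL;

	assert (spec != NULL);
	if (spec->opdata_size > 0) {
		op_data = calloc (1, spec->opdata_size);
		if (op_data == NULL)
			return -1;
	}

	spec->setup (in_data, op_data, &ex);
	if (ex.id == 0)
		spec->do_op (in_data, op_data, &ex);
	if (ex.id == 0)
		spec->cleanup (in_data, op_data, &ex);

	free (op_data);
	return ex.id;
}

/* Example operation: measure a string. */
typedef struct { const char *text; size_t result; } count_input_sketch;
typedef struct { size_t length; } count_data_sketch;

static void setup_count (void *in_data, void *op_data, sketch_exception *ex)
{
	count_input_sketch *input = in_data;

	(void) op_data;
	if (input->text == NULL)
		ex->id = 1;  /* invalid parameters */
}

static void do_count (void *in_data, void *op_data, sketch_exception *ex)
{
	count_input_sketch *input = in_data;
	count_data_sketch *data = op_data;

	(void) ex;
	data->length = strlen (input->text);  /* the "slow" work */
}

static void cleanup_count (void *in_data, void *op_data, sketch_exception *ex)
{
	count_input_sketch *input = in_data;
	count_data_sketch *data = op_data;

	(void) ex;
	input->result = data->length;  /* publish the result */
}

static const op_spec_sketch op_count_sketch = {
	sizeof (count_data_sketch),
	setup_count,
	do_count,
	cleanup_count
};
```

The point of the split is that only do_count would ever run outside
the main thread; setup and cleanup are where parameter checks and UI
updates belong.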
Basically I used this skeleton code for all the operations, just
filling in the specifics:

===================================

typedef struct operation_name_input_s {
	/* parameters to operation */
} operation_name_input_t;

typedef struct operation_name_data_s {
	/* internal data to operation, if any
	 * (if none, omit the structure and set opdata_size to 0) */
} operation_name_data_t;

static gchar *describe_operation_name (gpointer in_data, gboolean gerund);
static void setup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex);
static void do_operation_name (gpointer in_data, gpointer op_data, CamelException *ex);
static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex);

static gchar *describe_operation_name (gpointer in_data, gboolean gerund)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;

	if (gerund) {
		/* return a g_strdup'ed string describing what we're doing */
	} else {
		/* return a g_strdup'ed string describing what we're about to do */
	}
}

static void setup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	/* verify that parameters are valid */
	/* initialize op_data */
	/* reference objects */
}

static void do_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	/* perform camel operations */
}

static void cleanup_operation_name (gpointer in_data, gpointer op_data, CamelException *ex)
{
	operation_name_input_t *input = (operation_name_input_t *) in_data;
	operation_name_data_t *data = (operation_name_data_t *) op_data;

	/* perform UI updates */
	/* free allocations */
	/* dereference objects */
}

static const mail_operation_spec op_operation_name =
{
	describe_operation_name,
	sizeof (operation_name_data_t),
	setup_operation_name,
	do_operation_name,
	cleanup_operation_name
};

void
mail_do_operation_name (parameters)
{
	operation_name_input_t *input;

	input = g_new (operation_name_input_t, 1);

	/* store parameters in input */

	mail_operation_queue (&op_operation_name, input, TRUE);
}

===========================================

c. mail-ops.c

mail-ops.c has been drawn and quartered. It has been split into:

* mail-callbacks.c: the UI callbacks
* mail-tools.c: useful sequences wrapping common Camel operations
* mail-ops.c: implementations of all the mail_operation_specs

An important part of mail-ops.c is the pair of global functions
mail_tool_camel_lock_{up,down}. These simulate a recursive mutex
around Camel. There are a very few, supposedly safe, calls to Camel
made in the main thread. The lock functions should go around every
call to Camel or group thereof. I don't think they're necessary, but
it's nice to know they're there.

If you look at mail-tools.c, you'll notice that all the Camel calls
are protected with these functions. Remember that a mail tool is
really just another Camel call, so don't use them in the main thread
either.

All the mail operations are implemented in mail-ops.c EXCEPT:

* filter-driver.c: the filter_mail operation
* message-list.c: the regenerate_messagelist operation
* message-thread.c: the thread_messages operation

d. Using the operations

The mail operations as implemented are very specific to
evolution-mail. I was thinking about leaving them mostly generic and
then allowing extra callbacks to be added to perform the more specific
UI touches, but this seemed kind of pointless.

I basically looked through the code, found references to Camel, and
split the code into three parts -- the bit before the Camel calls, the
bit after, and the Camel calls themselves. These were mapped onto the
template, given a name, and added to mail-ops.c. Additionally, I
simplified the common tasks that were taken care of in mail-tools.c,
making some functions much simpler.
Ninety-nine percent of the time, whatever operation is being done is
being done in a callback, so all that has to be done is this:

==================

void my_callback (GtkObject *obj, gchar *uid)
{
	camel_do_something (uid);
}

==== becomes ====

void my_callback (GtkObject *obj, gchar *uid)
{
	mail_do_do_something (uid);
}

=================

There are, however, a few belligerents. Particularly, the function
mail_uri_to_folder returns a CamelFolder and yet should really be
asynchronous. It is called in a CORBA call that is synchronous, and
is additionally used in the filter code.

I changed the first usage to return the folder immediately but still
fetch the CamelFolder asynchronously, and in the second case made
filtering asynchronous, so the fact that the call is synchronous
doesn't matter.

The function was renamed to mail_tool_uri_to_folder to emphasize that
it's a synchronous Camel call.

e. The dispatcher

mail_operation_queue () takes its parameters and assembles them in a
closure_t structure, which I abbreviate clur. It sets a timeout to
display a progress window if an operation is still running one second
later (we're not smart enough to check if it's the same operation, but
the issue is not a big deal). The dispatch thread and some
communication pipes are created.

The dispatcher thread sits in a loop reading from a pipe. Every time
the main thread queues an operation, it writes the closure_t into the
pipe. The dispatcher reads the closure, sends a STARTING message to
the main thread (see below for explanation), calls the callback
specified in the closure, and sends a FINISHED message. It then goes
back to reading from its pipe; it will either block until another
operation comes along, or find one right away and start it. Thus the
pipe takes care of queueing operations.
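The loop just described can be sketched with plain POSIX pipes. This
is only an illustration under stated assumptions: closure_sketch is a
much-reduced, hypothetical stand-in for closure_t, the one-byte
message encoding and all names are invented here, and thread creation
is left out so the block stays small (dispatcher_loop has the
signature a pthread start routine needs).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Messages sent back to the main thread; the encoding is invented. */
enum sketch_msg { SKETCH_STARTING = 1, SKETCH_FINISHED = 2 };

/* Hypothetical, much-reduced stand-in for closure_t. */
typedef struct {
	void (*callback) (void *user_data);
	void *user_data;
} closure_sketch;

typedef struct {
	int request_fd[2];  /* main thread -> dispatcher */
	int reply_fd[2];    /* dispatcher -> main thread */
} dispatcher_sketch;

static int dispatcher_init (dispatcher_sketch *d)
{
	if (pipe (d->request_fd) != 0)
		return -1;
	if (pipe (d->reply_fd) != 0) {
		close (d->request_fd[0]);
		close (d->request_fd[1]);
		return -1;
	}
	return 0;
}

/* Main-thread side: queueing is just writing the closure into the
 * pipe. A write this small is atomic on a pipe, so closures never
 * interleave. Returns 0 on success. */
static int queue_closure (dispatcher_sketch *d, const closure_sketch *clur)
{
	ssize_t n = write (d->request_fd[1], clur, sizeof *clur);

	return n == (ssize_t) sizeof *clur ? 0 : -1;
}

static void send_msg (dispatcher_sketch *d, enum sketch_msg kind)
{
	unsigned char byte = (unsigned char) kind;

	if (write (d->reply_fd[1], &byte, 1) != 1) {
		/* A real dispatcher would have to report this somehow. */
	}
}

/* Dispatcher side: block on the pipe, run one operation at a time,
 * and bracket it with STARTING/FINISHED. Returns once the write end
 * of the request pipe has been closed and the queue is drained. */
static void *dispatcher_loop (void *arg)
{
	dispatcher_sketch *d = arg;
	closure_sketch clur;

	assert (d != NULL);
	while (read (d->request_fd[0], &clur, sizeof clur)
	       == (ssize_t) sizeof clur) {
		send_msg (d, SKETCH_STARTING);
		clur.callback (clur.user_data);
		send_msg (d, SKETCH_FINISHED);
	}
	return NULL;
}

/* Example operation used to exercise the loop. */
static void add_one (void *user_data)
{
	++*(int *) user_data;
}
```

In a threaded program the loop would be started with something like
pthread_create (&tid, NULL, dispatcher_loop, &d), and the main thread
would watch reply_fd[0] from its main loop instead of blocking on it.
Because the kernel buffers the pipe, operations queued while one is
running simply wait their turn, with no extra queue data structure.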
The dispatch thread communicates with the main thread through another
pipe; however, the main thread has other things to do than read from
the pipe, so it registers a GIOReader that checks for messages in the
glib main loop.

In addition to starting and finishing messages, the other thread can
communicate to the user using messages and a progress bar. (This is
currently implemented but unused.)

5. ISSUES

* Operations are queued and dequeued stupidly. For example, if you
  click on one message and then click on another, the first will be
  retrieved and displayed, then overwritten by the second.
  Operations that could safely be performed at the same time
  aren't.
* The CamelObject system is workable, but it'd be nice to work with
  something established like the GtkObject
* The whole threading idea is not great. Consensus is that an
  asynchronous interface is the Right Thing, eventually.
* Care still needs to be taken when designing evolution-mail code to
  work with the asynchronous mail_do_ functions
* Some of the operations are extremely hacky.
* IMAP's timeout to send a NOOP had to be removed because we can't
  use GTK. We need an alternative for this.