Closing in on closures (23 Feb 2006)

The other day, I battled global variables in Lisp by using this construct:

(let ((globalFoo 42))
    (defun foobar1()
      (* globalFoo globalFoo))

    (defun foobar2(newVal)
      (setf globalFoo newVal))
  )

globalFoo is neither declared nor bound within the functions foobar1 or foobar2; it is a free variable. When Lisp encounters such a variable, it will search the enclosing code (the lexical environment) for a binding of the variable; in the above case, it will find the binding established by the let statement, and all is peachy.

globalFoo's scope is limited to the functions foobar1 and foobar2; functions outside of the let statement cannot refer to the variable. But we can call foobar1 and foobar2 even after returning from the let statement, and thereby read or modify globalFoo without causing a runtime errors.

Lisp accomplishes this by creating objects called closures. A closure is a function plus a set of bindings of free variables in the function. For instance, the function foobar1 plus the binding of globalFoo to a place in memory which stores "42" is such a closure.

To illustrate this:

> (load "closure.lsp")  ;; contains the code above
T
> globalFoo     ;; can we access the variable?
*** Variable GLOBALFOO is unbound
> (foobar1)     ;; we can't, but maybe foobar1 can
1764
> (foobar2 20)  ;; set new value for globalFoo
20
> (foobar1)
400

Hmmm - what does this remind you of? We've got a variable which is shared between two functions, and only those functions have access to the variable, while outside callers have not... he who has never tried to encapsulate data in an object shall cast the first pointer!

Proofreading this, I realize that the simple Lisp code example is probably not too instructive; I guess closures really start to shine when you let functions return anonymous functions with free variables in them. Hope to come up with better examples in the future.

So this is how closures might remind us of objects. But let's look at it from a different angle now - how would we implement closures in conventional languages?

Imagine that while we invoke a function, we'd keep its parameters and local variables on the heap rather than on the stack, so instead of stack frames we maintain heap frames. You could then think of a closure as:

  • A function pointer referring to the code to be executed
  • A set of references to frames on the heap, namely references to all bindings of any free variables which occur in the code of the function.

Because the "stack" frames are actually kept on the heap and we are therefore no longer obliged to follow the strict rules of the hardware stack, the contents of those frames can continue to live even beyond the scope of the executed function.

So we're actually storing a (partial) snapshot of the execution context of a function, along with the code of the function!

Let's see how we could implement this. The first obvious first-order approximation is in C++; it's a function object. A function object encapsulates a function pointer and maybe also copies of parameters needed for the function call:

  typedef bool (*fncptr)(int, float);
  fncptr foobar_fnc; // declaration

  class FunctionObject {
  private:
    int m_i;
    float m_f;
    fncptr m_fnc;
  public:
    FunctionObject(fncptr fnc, int i, float f) : m_fnc(fnc), m_f(f), m_i(i) {}
    bool operator() { m_fnc(m_i, m_f); }
  };

  FunctionObject fo(foobar_fnc, 42, 42.0);

FunctionObject captures a snapshot of a function call with its parameters. This is useful in a number of situations, as can be witnessed by trying to enumerate the many approaches to implement something like this in C++ libraries such as Boost; however, this is not a closure. We're "binding" function parameters in the function object - but those are, in the sense described earlier, not free variables anyway. On the other hand, if the code of the function referred to by the FunctionObject had any free variables, the FunctionObject wouldn't be able to bind them. So this approach won't cut it.

There are other approaches in C++, of course. For example, I recently found the Boost Lambda Library which covers at least parts of what I'm after. At first sight, however, I'm not too sure its syntax is for me. I also hear that GCC implements nested functions:

typedef void (*FNC)(void);

FNC getFNC(void)
{
  int x = 42;
  void foo(void)
  {
    printf("now in foo, x=%d\n", x);
  }
  return foo;
}

int main(void)
{
  FNC fnc = getFNC();
  fnc();
  return 0;
}

Unfortunately, extensions like this didn't make it into the standards so far. So let's move on to greener pastures. Next stop: How anonymous delegates in C# 2.0 implement closures.


When asked for a TWiki account, use your own or the default TWikiGuest account.


Fight globalization! (18 Feb 2006)

A few days ago, I talked everybody to sleep about special variables and dynamic bindings in Lisp. I somehow managed to avoid this topic for years, but then I finally had to understand it to fix subtle issues in our code when dealing with what I thought were simple global variables.

In Lisp, you usually declare a global variable using defvar and defparameter - but this way, the variable not only becomes global, but also special. They are probably called special because of the special effects that they display - see my blog entry for an idea of the confusion this caused to a simple-minded C++ programmer (me).

Most of the time, I would use defvar to emulate the effect of a "file-global" static variable in C++, and fortunately, this can be implemented in a much cleaner fashion using a let statement at the right spot. Example:

  // C++, file foobar.C
  static int globalFoo = 42;
  
  int foobar(void)
  {
    return globalFoo * globalFoo;
  }
  int foobar2(int newVal)
  {
    globalFoo = newVal;
  }

  ;; Lisp
  (let ((globalFoo 42))
    (defun foobar1()
      (* globalFoo globalFoo))

    (defun foobar2(newVal)
      (setf globalFoo newVal))
  )

The let statement establishes a binding for globalFoo which is only accessible within foobar1 and foobar2. This is even better than a static global variable in C++ at file level, because this way precisely the functions which actually have a business with globalFoo are able to use it; the functions foobar1 and foobar2 now share a variable. We don't have to declare a global variable anymore and thereby achieve better encapsulation and at the same time avoid special variables with their amusing special effects. Life is good!

This introduces another interesting concept in Lisp: Closures, i.e. functions with references to variables in their lexical context. More on this hopefully soon.


When asked for a TWiki account, use your own or the default TWikiGuest account.


I'm so special (11 Feb 2006)

The large application which I help to develop has an embedded Lisp interpreter and compiler, and over time I also left my marks in that subsystem. It was only after a considerable amount of tinkering with the innards of the interpreter that my insights into Lisp finally reached critical mass. I guess I understand now why Lispniks are so devoted to their language and why they regard all those other languages as mere Lisp wannabees.

While learning Lisp, bindings and closures were particularly strange to me. It took me way too long until I finally grokked lexical and dynamic binding in Lisp. Or at least I think I get it now.

Let us consider the following C code:

  int fortytwo = 42;

  int shatter_illusions(void)
  {
    return fortytwo;
  }

  void quelle_surprise(void)
  {
    int fortytwo = 4711;
    printf("shatter_illusions returns %d\n", shatter_illusions());
  }

A seasoned C or C++ programmer will parse this code with his eyes shut and tell you immediately that quelle_surprise will print "42" because shatter_illusions() refers to the global definition of fortytwo.

Meanwhile, back in the parentheses jungle:

  (defvar fortytwo 42)

  (defun shatter-illusions()
    fortytwo)

  (defun quelle-surprise()
    (let ((fortytwo 4711))
      (format t "shatter-illusions returns ~A~%" (shatter-illusions))))

To a C++ programmer, this looks like a verbatim transformation of the code above into Lisp syntax, and he will therefore assume that the code will still answer "42". But it doesn't: quelle-surprise thinks the right answer is "4711"!

Subtleties aside, the value of Lisp variables with lexical binding is determined by the lexical structure of the code, i.e. how forms are nested in each other. Most of the time, let is used to establish a lexical binding for a variable.

Variables which are dynamically bound lead a more interesting life: Their value is also determined by how forms call each other at runtime. The defvar statement above both binds fortytwo to a value of 42 and declares the variable as dynamic or special, i.e. as a variable with dynamic binding. Even if code is executed which usually would bind the variable lexically, such as a let form, the variable will in fact retain its dynamic binding.

"Huh? What did you say?"

  1. defvar declares fortytwo as dynamic and binds it to a value of 42.
  2. The let statement in quelle-surprise binds fortytwo to a value of 4711, but does not change the type of binding! Hence, fortytwo still has dynamic binding which was previously established by defvar. This is true even though let usually always creates a lexical binding.
  3. shatter-illusions, when called, inherits the dynamic bindings of the calling code; hence, fortytwo will still have a value of 4711!

Kyoto Common Lisp defines defvar as follows:

(defmacro defvar (var &optional (form nil form-sp) doc-string)
  `(progn (si:make-special ',var)
          ,(if (and doc-string *include-documentation*)
               `(si:putprop ',var ,doc-string 'variable-documentation))
          ,(if form-sp
               `(or (boundp ',var)
                    (setq ,var ,form)))
          ',var))

In the highlighted form, the variable name is declared as special, which is equivalent with dynamic binding in Lisp.

This effect is quite surprising for a C++ programmer. I work with both Lisp and C++, switching back and forth several times a day, so I try to minimize the number of surprises a much as I can. Hence, I usually stay away from special/dynamic Lisp variables, i.e. I tend to avoid defvar and friends and only use them where they are really required.

Unfortunately, defvar and defparameter are often recommended in Lisp tutorials to declare global variables. Even in these enlightened times, there's still an occasional need for a global variable, and if you follow the usual examples out there, you'll be tempted to quickly add a defvar to get the job done. Except that now you've got a dynamically bound variable without even really knowing it, and if you expected this variable to behave like a global variable in C++, you're in for a surprise:

  > (print fortytwo)
  42
  42
  > (quelle-surprise)
  shatter-illusions returns 4711
  NIL
  > (shatter-illusions)
  42
  > (print fortytwo)
  42
  42

So you call shatter-illusions once through quelle-surprise, and it tells you that the value of the variable fortytwo, which is supposedly global, is 4711. And then you call the same function again, only directly, and it will tell you that this time fortytwo is 42.

The above code violates a very useful convention in Lisp programming which suggests to mark global variables with asterisks (*fortytwo*). This, along with the guideline that global variables should only be modified using setq and setf rather than let, will avoid most puzzling situations like the above. Still, I have been confused by the dynamic "side-effect" of global variables declared by defvar often enough now that I made it a habit to question any defvar declarations I see in Lisp code.

More on avoiding global dynamic variables next time.


When asked for a TWiki account, use your own or the default TWikiGuest account.


Beware of unknown enumerators (26.1.2006)

The other day, I was testing COM clients which accessed a collection class via a COM-style enumerator (IEnumVARIANT). And those clients crashed as soon as they tried to do anything with the enumerator. Of course, the same code had worked just fine all the time before. What changed?

In COM, a collection interface often implements a function called GetEnumerator() which returns the actual enumerator interface (IEnumVARIANT), or rather, a pointer to the interface. In my case, the signature of that function was:

   HRESULT GetEnumerator(IUnknown **);

Didn't I say that GetEnumerator is supposed to return an IEnumVARIANT pointer? Yup, but for reasons which I may cover here in one of my next bonus lives, that signature was changed from IEnumVARIANT to IUnknown. This, however, is merely a syntactic change - the function actually still returned IEnumVARIANT pointers, so this alone didn't explain the crashes.

Well, I had been bitten before by smart pointers, and it happened again this time! The COM client code declared a smart pointer for the enumerator like this:

  CComPtr<IEnumVARIANT> enumerator = array->GetEnumerator();

This is perfectly correct code as far as I can tell, but it causes a fatal avalanche:

  • The compiler notices that GetEnumerator returns an IUnknown pointer. This doesn't match the constructor of this particular smart pointer which expects an argument of type IEnumVARIANT *.
  • So the compiler looks for other matching constructors.
  • It doesn't find a matching constructor in CComPtr itself, but CComPtr is derived from CComPtrBase which has an undocumented constructor CComPtrBase(int).
  • To match this constructor, the compiler converts the return value of GetEnumerator() into a bool value which compresses the 32 or 64 bits of the pointer into a single bit! (Ha! WinZip, can you beat that?)
  • The boolean value is then passed to the CComPtrBase(int) constructor.
  • To add insult to injury, this constructor doesn't even use its argument and instead resets the internally held interface pointer to 0.

Any subsequent attempt to access the interface through the smart pointer now crashes because the smart pointer tries to use its internal interface pointer - which is 0.

All this happens without a single compiler or runtime warning. Now, of course it was our own silly fault - the GetEnumerator declaration was bogus. But neither C++ nor ATL really helped to spot this issue. On the contrary, the C++ type system (and its implicit type conversions) and the design of the ATL smart pointer classes collaborated to hide the issue away from me until it was too late.


When asked for a TWiki account, use your own or the default TWikiGuest account.


It's official: Microsoft's compiler is twice as good as the HP-UX compiler! (24.1.2006)

A few days ago, I dissed good ol' aCC on the HP-UX platform, but for political correctness, here's an amusing quirk in Microsoft's compiler as well. Consider the following code:

typedef struct foobar gazonk;
struct gazonk;

The C++ compiler which ships with VS.NET 2003 is quite impressed with those two lines:

  fatal error C1001: INTERNAL COMPILER ERROR
  (compiler file 'msc1.cpp', line 2701)
  Please choose the Technical Support command on the Visual C++
  Help menu, or open the Technical Support help file for more information

What the compiler really wants to tell me is that it does not want me to redefine gazonk. The C++ compiler in VS 2005 gets this right.

If you refer back to the previous blog entry, you'll find that it took me only one line to crash aCC on HP-UX. It took two lines in the above example to crash Microsoft's compiler. Hence, I conclude that their compiler is only half as bad as the HP-UX compiler.

If you want to argue with my reasoning, let me tell you that out there in the wild, I have rarely seen a platform-vs-platform discussion based on facts which were any better than that. Ahem... big grin


When asked for a TWiki account, use your own or the default TWikiGuest account.


Should Lisp programmers be slapped in public? (17.1.2006)

After fixing a nasty bug today, I let off some steam by surfing the 'net for fun stuff and new developments. For instance, Bjarne Stroustrup recently reported on the plans for C++0x. I like most of the stuff he presents, but still was left disturbingly unimpressed with it. Maybe it's just a sign of age, but somehow I am not really thrilled anymore by a programming language standard scheduled for 2008 which, for the first time in the history of the language, includes something as basic as a hashtable.

Yes, I know that pretty much all the major STL implementations already have hashtable equivalents, so it's not a real issue in practice. And yes, there are other very interesting concepts in the standard which make a lot of sense. Still - I used to be a C++ bigot, but I feel the zeal is wearing off; is that love affair over?

Confused and bewildered, I surf some other direction, but only to have Sriram Krishnan explain to me that Lisp is sin. Oh great. I happen to like Lisp a lot - do I really deserve another slap in the face on the same day?

But Sriram doesn't really flame us Lisp geeks; quite to the contrary. He is a programmer at Microsoft and obviously strongly impressed by Lisp as a language. His blog entry illustrates how Lisp influenced recent developments in C# - and looks at reasons why Lisp isn't as successful as many people think it should be.

Meanwhile, back in the C++ jungle: Those concepts are actually quite clever, and solve an important problem in using C++ templates.

In a way, C++ templates use what elsewhere is called duck typing. Why do I say this? Because the types passed to a template are checked implicitly by the template implementation rather than its declaration. If the template implementation says f = 0 and f is a template parameter, then the template assumes that f provides an assignment operator - otherwise the code simply won't compile. (The difference to duck typing in its original sense is that we're talking about compile-time checks here, not dynamic function call resolution at run-time.)

Hence, templates do not require types to derive from certain classes or interfaces, which is particularly important when using templates for primitive types (such as int or float). However, when the type check fails, you'll drown in error messages which are cryptic enough to violate the Geneva convention. To fix the error, the user of a template often has to inspect the implementation of the template to understand what's going on. Not exactly what they call encapsulation.

Generics in .NET improve on this by specifying constraints explicitly:

  static void Foobar<T>(IFun<T> fun) where T : IFunny<T>
  {
    ... function definition ...
  }

T is required to implement IFunny. If it doesn't, the compiler will tell you that T ain't funny at all, and that's that. No need to dig into the implementation details of the generic function.

C++ concepts extend this idea: You can specify pretty arbitrary restrictions on the type. An example from Stroustrup's and Dos Reis' paper:

  concept Assignable<typename T, typename U=T> {
    Var<T> a;
    Var<const U> b;
    a = b;
  };

  ;; using this in a template definition:
  template <typename T, typename U>
    where Assignable<T, U>
  ... template definition ...

So if T and U fit into the Assignable concept, the compiler will accept them as parameters of the template. This is cute: In true C++ tradition, this provides maximum flexibility and performance, but solves the original problem.

Still, that C# code is much easier on the eye...


When asked for a TWiki account, use your own or the default TWikiGuest account.


Address of the union (11 Jan 2006)

The following C++ code will be rejected by both Visual C++ and gcc:

  class BOX {
  public:
    BOX() {}
  };

  union {
    void *pointers[8];
    BOX box;
  };

gcc says something like "error: member `BOX ::box' with constructor not allowed in union"; Visual C++ reports a compiler error C2620.

Now that is too bad, because in my particular case, I really needed both a union (to save memory in a critical area of the code) and that box member with a default constructor! Now I'm sure that all those CEOs around the world who are currently sacking people in the thousands would readily agree that union members aren't constructive enough, but why even turn this into a C++ language rule? big grin

I have a workaround for this now, but I'm still a little puzzled about the compiler restriction. My guess is that the compiler is trying to avoid intialization ambiguity in a scenario like this:

  class FOO {
    int foo;
  public:
    FOO() : foo(42) {}
  };

  class BAR {
    int bar;
  public:
    BAR() : bar(4711) {}
  };

  union {
    FOO foo;
    BAR bar;
  };

Which constructor "wins" here? But then, C++ isn't exactly over-protective in other areas, either, so if I want to shoot myself into the foot, get out of my way, please. Or is there another reason? Hints most welcome.


When asked for a TWiki account, use your own or the default TWikiGuest account.


A syntax error in a comment, or: The case of the vanishing parenthesis (10 Jan 2006)

The other day, Visual C++ would not compile this code, reporting lots and lots of errors:

  if (!strncmp(text, "FOO", 3)) 
  {
    foobar(text); //üüü 
  }
  else 
  {
    gazonk();
  }

Now this was funny because that code had not changed in ages, and so far had compiled just fine. At first, I couldn't explain what was going on. Hmmmm... note the funny u-umlauts in the comment. Why would someone use a comment like that? Well, the above code was inherited from source code originally written on an HP-UX system. For long years, the default character encoding on HP-UX systems has been Roman8. In that encoding, the above comment looked like this:

    foobar(text); //■■■

(If your browser cannot interpret the Unicode codepoint U+25A0, it represents a filled box.)

So the original programmer used this special character for graphically highlighting the line. In Roman8, the filled box has a character code of 0xFC. On a Windows system in the US or Europe, which defaults to displaying characters according to ISO8859-1 (aka Latin1), 0xFC will be interpreted as the German u-umlaut ü.

So far, so good, but why the compilation errors?

On the affected system, I ran the code through the C preprocessor (cpp), and ended up with this preprocessed version:

  if (!strncmp(text, "FOO", 3)) 
  {
    foobar(text);
  else 
  {
    gazonk();
  }

Wow - the preprocessor threw away the comment, as expected, but also the closing parenthesis } on the next line! Hence, the parentheses in the code are now unbalanced, which the compiler complains bitterly about.

But why would the preprocessor misbehave so badly on this system? Shortly before, I had installed the Windows multi-language UI pack (MUI) to run tests in Japanese; because of that, the system defaulted to a Japanese locale. In the default Japanese locale, Windows assumes that all strings are encoding according to the Shift-JIS standard, which is a multi-byte character set (MBCS).

Shift-JIS tries to tackle the problem of representing the several thousands of Japanese characters. The code positions 0-127 are identical with US ASCII. In the range from 128-255, some byte values indicate "first byte of a two-byte sequence" - and 0xFC is indeed one of those indicator bytes.

So the preprocessor reads the line until it finds the // comment indicators. The preprocessor changes into "comment mode" and reads all characters until the end of the line, only to discard them. (The compiler doesn't care about the comments, so why bother it with them?)

Now the preprocessor finds the first 0xFC character, and - according to the active Japanese locale - assumes that it is the first byte of a two-byte character. Hence, it reads the next byte (also 0xFC, the second "box"), converts the sequence 0xFC 0xFC into a Japanese Kanji character, and throws that character away. Then the next byte is read, which again is 0xFC (the third "box" in the comment), and so the preprocessor will slurp another byte, interpreting it as the second byte of a two-byte character.

But the next byte in the file after the third "box" is a 0x0A, i.e. the line-feed character which indicates the end of the line. The preprocessor reads that byte, forms a two-byte character from it and its predecessor (0xFC), discards the character - and misses the end of the line.

The preprocessor doesn't have a choice now but to continue searching for the next LF, which it finds in the next line, but only after the closing parenthesis. Which is why that closing parenthesis never makes it to the compiler. Hocus, pocus, leavenotracus.

So special characters in comments are not a particularly brilliant idea; not just because they might be misinterpreted (in our case, displayed as ü instead of the originally intended box), but because they can actually cause the compiler to fail.

If you think this could only happen in a Roman8 context, consider this variation of the original code:

  if (!strncmp(text, "MENU", 4)) 
  {
    display_ui(text); //Menü
  } 
  else 
  {
    gazonk();
  }

Here, we're simply using the German translation for menu in the comment; we're not even trying to be "graphical" and draw boxes in our comments. But even this is enough to cause the same compilation issue as with my original example.

Now, in my particular case, the affected code isn't likely to be compiled in Japan or China anytime soon, except in that non-standard situation when I performed my experiments with the MUI pack and a Japanese UI. But what if your next open-source project attracts hundreds of volunteers around the world who want to refine the code, and some of those volunteers happen to be from Japan? If you're trying to be too clever (or too patriotic) in your comments, they might have to spend more time on finding out why the code won't compile than on adding new features to your code.


When asked for a TWiki account, use your own or the default TWikiGuest account.


To cut a long filename short, I lost my mind (8.1.2006)

Yesterday, I explained how easy it is to inadvertedly load the same executable twice into the same process address space - you simply run it using its short DOS-ish filename (like Sample~1.exe) instead of its original long filename (such as SampleApplication.exe). For details, please consult the original blog entry.

I mentioned that one fine day I might report how exactly this happened to us, i.e. why in the world our app was started using its short filename. Seems like today is such a fine day smile

Said application registered itself as a COM server, and it does so using the services of the ATL Registrar. Upon calling RegisterServer, the registrar will kindly create all the required registry entries for a COM server, including the LocalServer entry which contains the path and filename of the server. Internally, this will call the following code in atlbase.h:

inline HRESULT WINAPI CComModule::UpdateRegistryFromResourceS(UINT nResID, 
  BOOL bRegister, struct _ATL_REGMAP_ENTRY* pMapEntries)
{
   USES_CONVERSION;
   ATL::CRegObject ro;
   TCHAR szModule[_MAX_PATH];
   GetModuleFileName(_pModule->GetModuleInstance(), szModule, _MAX_PATH);

   // Convert to short path to work around bug in NT4's CreateProcess
   TCHAR szModuleShort[_MAX_PATH];
   GetShortPathName(szModule, szModuleShort, _MAX_PATH);
   LPOLESTR pszModule = T2OLE(szModuleShort);
   ...

Aha! So ATL deliberately converts the module name (something like SampleApplication.exe) into its short-name equivalent (Sample~1.exe) to work around an issue in the CreateProcess implementation of Windows NT. MSKB:179690 describes this problem: CreateProcess could not always handle blanks in pathnames correctly, and so the ATL designers had to convert the path into its short-path version which converts everything into an 8+3 filename and hence guarantees that the filename contains no blanks.

Adding insult to injury, MSKB:201318 shows that this NT-specific bug fix in ATL has a bug itself... and, of course, our problem is, in fact, caused by yet another bug in the bug fix (see earlier blog entry).

For my application, the first workaround was to use a modified version of atlbase.h which checks the OS version; if it is Windows 2000 or later, no short-path conversion takes place. Under Windows NT, however, we're caught in a pickle: Either we use the original ATL version of the registration code and thus map the executable twice into the address space, or we apply the same fix as for Windows 2000, and will suffer from the bug in CreateProcess if the application is installed in a path which has blanks in the pathname.

In my case, this was not a showstopper issue because the application is targeting Windows 2000 and XP only, so I simply left it at that.

Another approach is to use the AddReplacement and ClearReplacements APIs of the ATL registrar to set our own conversion rules for the module name and thereby override ATL's own rules for the module name:

#include <atlbase.h>
#include <statreg.h>

void RegisterServer(wchar_t *widePath, bool reg)
{
  ATL::CRegObject ro;
  ro.AddReplacement(L"Module", widePath);
  reg ? ro.ResourceRegister(widePath, IDR_REGISTRY, L"REGISTRY") :
        ro.ResourceUnregister(widePath, IDR_REGISTRY, L"REGISTRY");
}


When asked for a TWiki account, use your own or the default TWikiGuest account.


Fingerpointing at smart pointers (31 Dec 2005)

In an ATL COM client which uses #import to generate wrapper code for objects, I recently tracked down a subtle reference-counting issue down to this single line:

  IComponentArray *compArray = app->ILoadComponents();

This code calls a method ILoadComponents on an application object which returns an array of components. Innocent-looking as it is, this one-liner caused me quite a bit of grief. If you can already explain what the reference counting issue is, you shouldn't be wasting your time reading this blog. For the rest of us, I'll try to dissect the problem.

(And for those who don't want to rely on my explanation: After I had learnt enough about the problem so that I could intelligently feed Google with search terms, I discovered a Microsoft Knowledge Base article on this very topic. However, even after reading the article, some details were still unclear to me, especially since I don't live and breathe ATL all day.)

The #import statement automatically generates COM wrapper functions. For ILoadComponents, the wrapper looks like this:

inline IComponentArrayPtr IApplication::ILoadComponents () {
    struct IComponentArray * _result = 0;
    HRESULT _hr = raw_ILoadComponents(&_result);
    if (FAILED(_hr)) _com_issue_errorex(_hr, this, __uuidof(this));
    return IComponentArrayPtr(_result, false);
}

IComponentArrayPtr is a typedef-ed template instance of _com_ptr_t. The constructor used in the code snippet above will only call AddRef on the interface pointer if its second argument is true. In our case, however, the second arg is false, so AddRef will not be called. The IComponentArrayPtr destructor, however, always calls Release().

Feeling uneasy already? Yeah, me too. But let's follow the course of action a little bit longer. When returning from the wrapper function, the copy constructor of the class will be called, and intermediate IComponentArrayPtr objects will be created. As those intermediate objects are destroyed, Release() is called.

Now let us assume that the caller looks like above, i.e. we assign the return value of the wrapper function to a CComPtr<IComponentArray> type. The sequence of events is as follows:

  • Wrapper function for ILoadComponents is called.
  • Wrapper function calls into the COM server. The server returns an interface pointer for which AddRef() was called (at least) once inside the server. The reference count is 1.
  • Wrapper function constructs an IComponentArrayPtr smart pointer object which simply copies the interface pointer value, but does not call AddRef(). The refcount is still 1.

Now we return from the wrapper function. In C++, temporary objects are destroyed at the end of the "full expression" which creates them. See also section 6.3.2 in Stroustrup's "Design and Evolution of C++". This means that the following assignment is safe:

  CComPtr<IComponentArray> components = app->ILoadComponents();

ILoadComponents returns an object of type IComponentArrayPtr. At this point, the reference count for the interface is 1 (see above). The The compiler casts IComponentArrayPtr to IComponentArray*, then calls the CComPtr assignment operator which copies the pointer and calls AddRef on it. The refcount is now 2. At the completion of the statement, the temporary IComponentArrayPtr is destroyed and calls Release on the interface. The refcount is 1. Just perfect.

Now back to the original client code:

  IComponentArray *compArray = app->ILoadComponents();

Here, we assign to a "raw" interface pointer, rather than to a CComPtr, When returning from the wrapper function, the refcount for the interface is 1. The compiler casts IComponentArrayPtr to IComponentArray* and directly assigns the pointer. At the end of the statement (i.e. the end of the "full expression"), the temporary IComponentArrayPtr is destroyed and calls Release, decrementing the refcount is 0. The object behind the interface pointer disappears, and subsequent method calls on compArray will fail miserably or crash!

So while ATL, in conjunction with the compiler's #import support, is doing its best to shield us from the perils of reference counting bugs, it won't help us if someone pulls the plug from the ATL force-field generator by incorrectly mixing smart and raw pointers.

This kind of reference counting bug would not have occurred if I had used raw interface pointers throughout; the mismatch in calls to AddRef and Release would be readily apparent in such code. However, those smart pointers are indeed really convenient in practice because they make C++ COM code so much simpler to read. However, they do not alleviate the programmer from learning about the intricacies of reference counting. You better learn your IUnknown before you do CComPtr.

This reminds me of Joel Spolsky's The Perils of JavaSchools, which is soooo 1990 (just like myself), but good fun to read.


When asked for a TWiki account, use your own or the default TWikiGuest account.


Revision: r1.1 - 27 Feb 2006 - 06:01 - ClausBrod
Blog > WebLeftBar > BlogOnSoftwareCPlusPlus
Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback