Skip to content

Building a Generic Statistics Library, Part 1: Interface

Let me start this post with an apology: I know that generics are a new technology in Delphi for Win32 and that many people might be looking for a simple, gentle introduction to using them.  However, I’m not going to write that sort of post, at least today.  Instead, I’m going to probe the limitations of Delphi 2009 generics, and work around them when possible.

You can do a lot with Delphi 2009’s generic types, but one thing that you can’t do is specify an "operator constraint." This means that I cannot perform operations like +, -, implicit type conversions, etc., on a type passed as a type parameter. In my last post on this subject, I suggested one possible solution, and experimented enough to show that it does work.  But, as I noted in the post, I suspected it might not be the best way. In particular, I didn’t like the fact that my previous approach used virtual methods, and it required creating a new "adder" type for each value type you wanted to use as a type parameter to the generic type.  That’s just too much work.

In the comments to another post on the subject of Delphi 2009 generics, Raymond asked for an implementation of MapReduce.  I thought about how I might demonstrate that in the context of generics, and decided that implementing the higher order functions Fold and Map would be a slightly better demonstration than MapReduce.  But since higher order functions can be a pretty abstract concept, I wanted to show an example of how you might use them, so I will use these functions to build a library that does some simple statistical operations, namely, computing the average and the standard deviation of a list of numbers.

By using generics, I can write the algorithms for average and standard deviation once and reuse them for any type which supports certain primitive operations, like addition and taking the square root.

I’m going to cover a lot of ground in this demo, so I’ll be breaking it down into several posts.  In today’s post, I’m going to examine how data will be supplied to the generic statistical routines.

Most statistical functions take a list of numbers as input and supply one or more numbers as output.  For example, the arithmetic mean, or average, function takes a list of numbers and returns a single number.  A function which returns the mode or modes of a list would take as input a list of numbers and return a different list, which might have one entry, or many.

So right away, I have a problem: I’d like to take a list of numbers as an invalid argument to a function, but I don’t know what type the numbers in the list will be, never mind the type of the list.  Ideally, I would be able to write something like this:

function Average<A, R>(AData: IEnumerable<A>): R;

This declaration should mean that the function Average accepts any list of values of type A which can be enumerated in a for…in loop and returns a value of type R. However, there are a few roadblocks in the Delphi 2009 RTL which make this impossible.

While the Delphi 2009 RTL does define an IEnumerable<T> interface type, neither TList<T> nor its abstract parent type TEnumerable<T> implement this interface.  If a trivial filesystem search can be believed, nothing else does, either. I’ve tried, unsuccessfully, to implement it myself; that’s one of many things that I haven’t figured out how to do in Delphi 2009 yet, and is perhaps the subject for another post.

My wild guess is that the most common type users of my statistics library will want to pass as the argument AData will be TList<A>. So I have to support that.  But I’d like to support other types of lists, as well.  It seems to me that the best I can do today is to specify TEnumerable<T> as the list argument type.  Then, at least, a user of the library can use other subtypes of TEnumerable<T>, such as TStack<T>, TQueue<T>, etc.

So my function declaration now looks like this:

function Average<A, R>(AData: TEnumerable<A>): R;

This, unfortunately, means that the user of the library cannot supply input data in the form of an array, although it’s trivial to write a class which exposes array data in the form of a TEnumerable<T> (using the facade pattern), so this isn’t a huge loss.

In order to make the code I’m going to show simpler, I’ll use the same type argument for the input and output.  In other words, if the function is passed a list of integers, it will return the average as an integer.  If it’s passed a list of doubles, it will return the average as a double.  This is probably not ideal for a general-purpose statistical library, but, for the purposes of this demo, it makes the code quite a bit easier to understand.

That makes the function prototype look like this:

function Average<T>(AData: TEnumerable<T>): T;

However, since I have no way of specifying an "operator constraint" on T, and because specifying an interface constraint on T would prevent using my functions with primitive types, at least in Delphi 2009, and because I suspect that primitive types are the most common use case for a statistical library, I have to tell the function how to perform the mathematical operations it needs to do its work.  In order to compute an average, I need to be able to add two values and divide two values.  For reasons that I’ll cover in a future post, it will also be useful if I can map an integer value into the type of the type argument, so I’m going to require a function for that, as well.

Here’s the final version of the function’s interface:

type
  TBinaryOp<T> = reference to function(ALeft, ARight: T): T
  TStatistics<T> = class
    public
      class function Average(const AData: TEnumerable<T>;
                             AAdder, ADivider: TBinaryOp<T>;
                             AMapper: TFunc<integer, T>): T; overload;

Calling methods with long lists of arguments can be kind of annoying, so I’ve marked a function as overloadable.  This allows me to subtype the class if I want to, implementing the primitive operations, and introducing an overloaded call to Average that passes those implementations, like this:

  TIntStatistics = class(TStatistics<integer>)
  private
    class function Add(ALeft, ARight: integer): integer;
    class function Divide(ALeft, ARight: integer): integer;
    class function Mapper(AValue: integer): integer;
  public
    class function Average(const AData: TEnumerable<integer>): integer; overload;
  end;

That gives me much closer to the "ideal" function prototype I designed above, but requires a new subtype to be implemented.  The nice thing about this current approach is that you can pick whichever solution you like: Implement a subtype if you want a simpler call to Average, or just pass the appropriate functions if you’d like to use a type parameter for which you don’t have a subtype available.

Since I’ve covered quite a bit of material just on the subject of the interface for one function, I’m going to take a break for today.  Stay tuned for part 2!

{ 4 } Comments

  1. Jolyon Smith | September 10, 2008 at 9:45 pm | Permalink

    I am told that .NET has operator constraints. If so, it’s a real shame that didn’t find it’s way into the Delphi Win32 generics implementation.

    Too many concrete applications of generics, outside of simple containers, would seem to require a greater richness than Delphi 2009 generics currently provide.

    I’m beginning to think that an approach based more on C++ templates would have been preferable (or at least a useful addition, if not instead of).

    A slight tweak to the syntax might yet enable this:

    Average(aData: IEnumerable): Double;

    For which:

    Average(..);

    would be fine but

    Average(..);

    would fail to compile (when the compiler attempts to emit : sum of strings divided by number of strings).

    The operator constraints approach could deal with things in this case more cleanly though:

    Average(aData: IEnumerable): Double;

    This is an off-the-cuff speculation at the sort of contraints specification that might be needed (i.e. T supports Addition of T yielding T and division by T yielding Double).

    Assuming that Addition and Division indicate the calculation of a mean average, the result is surely a floating point of one form or another, not T. I’ve simply opted for Double in my decls above.

  2. Jolyon Smith | September 10, 2008 at 10:01 pm | Permalink

    AAARRGH! Why did no one realise that using < and > in generics syntax would make comments in technical blogs a nightmare!!!!

  3. Raymond | September 11, 2008 at 2:55 pm | Permalink

    Craig,

    Thanks for taking the time out to work up this series of posts.

    You mention that neither Tlist nor TEnumerable implement IEnumerable in the RTL.

    Would this be hard to add as a modification to a local version of the RTL source?

    Yes, I know everyone will say that will break things in deployed and third party packages etc, and they’d be right. But we don’t use packages, and we recompile from source all our third party components so, so it’s not a problem for us :-)

    Raymond.

  4. Jolyon Smith | September 12, 2008 at 8:13 pm | Permalink

    Fair enough - for some reason that explanation of the return type didn’t register.

    As far as op constraints in generics are concerned, I’ll need to have word with my .NET colleagues all of whom seem to be labouring under a misunderstanding of .NET generics.

    (I don’t doubt you - a quick google seems to confirm that op constraints are also not supported in .NET)

Post a Comment

Your email is never published nor shared. Required fields are marked *

Bad Behavior has blocked 713 access attempts in the last 7 days.

Close