Latest Posts


Don’t Depend Upon the ASP.NET Membership Tables

One very popular option for implementing user security in ASP.NET is to use Forms Authentication with the SQL Server membership provider. This provider creates several database tables to store user-related information, as well as a number of stored procedures.

From time to time, a developer will attempt to add the ASP.NET Membership/Forms Authentication tables to their Entity Framework (or LINQ to SQL, NHibernate, etc.) model. Before doing this, they will often have created referential constraints against these tables. When the mapping doesn’t work out quite the way they planned, they will ask how to make the mapping work.

There’s only one correct answer to this question: Don’t do it at all!

There are a number of good reasons why you should not make your database and code depend upon the SQL Membership Provider database schema. In this post, I will focus on a few of the most important:

  • Separation of concerns
  • Membership and authentication providers are supposed to be interchangeable
  • The SQL Membership Provider database schema is an implementation detail

Separation of Concerns

Your application’s data model is designed to fit your application domain. It will change based upon the needs of the end users. It should not have to change because Microsoft decides to update the SQL membership provider, as well. That would violate the single responsibility principle. It is often dangerous to combine data which is not closely related into a single data model. This danger is compounded when data from two separate domains, written by entirely different companies, and designed for orthogonal purposes is shoehorned into a single entity model.

Membership providers are supposed to be interchangeable

One of the most important design intentions of the ASP.NET authentication and membership provider model is to make it easy to interchange providers. If you decide to move from the SQL membership provider to OpenID, domain authentication, Facebook authentication, etc., this should be a matter of, at most, a couple of days’ work to migrate data from one provider to the other, rather than a complete rewrite of your application, starting with the database and moving out from there.

The SQL Membership Provider database schema is an implementation detail

The publicly-documented interfaces to membership and forms authentication are the Membership and FormsAuthentication types, respectively, as well as the relevant sections of the Web.config file. If you write your code around these types, you have a reasonable expectation that your code will continue to work when the .NET framework is next updated. On the other hand, if you query the database directly, there is no guarantee that the schema will not change when the next version of .NET ships. If Microsoft makes a security-related change to the SQL membership provider, then it is conceivable that the schema could even change with a service pack. The cost of relying on an implementation detail is that you never really know.


posted @ Fri, 05 Mar 2010 20:48:27 +0000 by Craig Stuntz


jqGrid and XSS Security

Version 3.5.2 of jqGrid included an important new feature:

Now when autoencode is set to true we encode the data coming from server and not only when we post it (secutity fix)

Prior to this, you were required to encode the data yourself.

Now personally, I think that should be the default. But it would have been a breaking change for the grid, since there are a few cases where you want to display unencoded data (I’ll discuss these exceptional cases in a second).

It’s really easy to make this the default for your application. Set the grid’s defaults before you create any grids:

    $.jgrid.defaults = $.extend($.jgrid.defaults, {
            autoencode: true,
            datatype: 'json',
            // etc....

You can override this in the rare cases when you don’t want encoded data by setting autoencode false when setting up your grid:

    $("#grid").jqGrid({
                    autoencode: false,
                    url: "/Some/Path",
                    // etc....,

So why wouldn’t you want to use this setting? I can think of two reasons:

  1. The server is returning markup which you don’t want to encode. If you have a server method which is returning markup (e.g., "<img src=…"), then you won’t want the grid to encode it. It then becomes your responsibility to sanitize the rest of the grid data on the server.
  2. There is a performance cost to doing the encoding in JavaScript. If you are absolutely certain that the server method you’re calling can return only sanitized data, then you can turn off the auto encoding and save this cost. Note that many JSON encoders, such as the "Json()" method in ASP.NET MVC, will not encode HTML by default.

Note that you can only set autoencode at the grid level, not at the column level.

If you use custom formatters in your grid, then you should note that the strings they return will not be HTML encoded, even if you set autoencode true. This is probably a good thing, since it is common to use custom formatters to return HTML. However, it means that if you write a custom formatter, then you must ensure that any user data contained in the string it returns is encoded. You can use the grid’s encoder, which is little more than a string replace on angle brackets, in a custom formatter as follows:

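    // Custom formatter: encode user data before handing it back to the grid.
    myFormatter: function(cellval, opts, action) {
        if (cellval) {
            // The + "" coerces the value to a string before encoding.
            return $.jgrid.htmlEncode(cellval.Name + "");
        }
        return "";
    }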
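To make the formatter's obligation concrete, here is a minimal, stand-alone sketch of encoding user data before embedding it in markup. The `encodeHtml` and `nameFormatter` names are hypothetical and not part of jqGrid. The sketch also escapes ampersands and quotes, which goes a little beyond the angle-bracket replacement described above.

```javascript
// Hypothetical helper (not part of jqGrid): escape the characters that
// let a string break out of an HTML text or attribute context.
function encodeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")   // must run first so later entities are not double-escaped
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Hypothetical custom formatter: the markup is ours, the user data is encoded.
function nameFormatter(cellval) {
    if (!cellval) {
        return "";
    }
    return "<strong>" + encodeHtml(cellval.Name) + "</strong>";
}
```

The point of the design is that only the formatter's own markup reaches the page unencoded. Anything that originated with a user passes through the encoder first.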

posted @ Mon, 08 Feb 2010 20:50:22 +0000 by Craig Stuntz


Entity Framework Models and Source Control

As you’re probably aware, an Entity Framework model is stored in a single XML file, with the extension EDMX. Developers occasionally ask if this means that two people cannot work on the entity model concurrently. My answer to this is, "It depends." But I can give you some tips to make it easier.

Obviously, if you use a source control tool which locks files on check out, then working concurrently on just about anything will be impossible. So I’m going to presume that your source control tool supports working concurrently, without locks, and has a decent merge tool to handle conflicts at check-in.

We tend to think of the EDMX files as having three sections:

  • CSDL, which describes the client schema. This will be used when generating code for your entity classes.
  • SSDL, which describes the storage schema, more commonly known as your database metadata.
  • MSL, which describes the relationship between the first two.

In reality, however, the EDMX file has two main sections. The first, edmx:Runtime, contains the three sub-sections described above. The second, edmx:Designer, contains a couple of important properties, such as MetadataArtifactProcessing, plus a lot of information about the position of objects on the GUI designer, in a sub-node called edmx:Diagrams. The illustration below should make this clear:

Why is this important? My experience is that any decent merge tool doesn’t tend to have any problem merging what we typically think of as the "EDMX", namely the CSDL, SSDL, and MSL. It is well-formed XML with descriptive tags; it’s the sort of thing that text merge tools tend to do very well with. There are a couple of tips worth knowing, but before I describe them, I’d like to focus on the more common issue with EDMX merges.

In my experience, if two people change the layout of entities in the designer and then attempt to merge, the merge will probably fail. Moreover, you will find it difficult to impossible to fix this up manually, something which is generally fairly easy with merges of source code or other parts of the EDMX. When you look at what is in the edmx:Diagrams node, you will quickly understand why:

I strongly suspect that this "feature" is not specific to the Entity Framework designer, but rather is shared amongst all of Visual Studio’s various diagramming tools. At any rate, I think you will have a difficult time merging XML like this, so I have two recommendations regarding merging and EDMX files:

  • If you are going to have two or more developers work on an EDMX file concurrently, then don’t change anything in the designer. If two or more people do this, it almost guarantees a conflict on check-in.
  • The other side of the coin is that if you must change something in the designer, then you should give a heads-up to other developers, and exclusively lock the EDMX file while you work.

That said, we don’t generally use the GUI designer at all. It is too unwieldy to have more than a dozen or so types on a single design surface. On the other hand, the tree-style Model Browser is quite useful. I suggest using the Model Browser instead of the GUI designer to navigate your entity model.

Perhaps a future version of the Entity Framework designer will allow for multiple diagrams within a single entity model, somewhat like SQL Server diagrams. If each diagram had its own node in the XML, you might even be able to merge them.

If you steer clear of issues with the designer, then you will typically find that EDMX changes can be automatically merged by any decent merge tool (unless, of course, they actually conflict). In the unlikely event, however, that your merge tool reports a conflict, and you don’t think it’s an actual conflict between your changes and those made by the other developer, it does help to understand how the "Update Model from Database" wizard updates existing EDMX:

  1. The wizard will generate new SSDL from scratch. It will then go through the newly generated SSDL, and replace each matching node in the existing SSDL with the newly generated version. There are a few SSDL features which are not supported by the designer/wizard; these will be left alone if they exist in your model.
  2. Any new entities or new properties of existing entities will have corresponding nodes added to the CSDL and MSL. But existing entities and properties will not have their CSDL and MSL updated, because the wizard presumes that you want to keep any changes you may have made to them.

Therefore, when multiple developers are updating an EDMX file concurrently, they should use the same database. This will tend to make SSDL changes merge without conflict, and allows you to simply overwrite any false conflicts (merge tool failures) in the SSDL, because you can be confident that it was generated from the same object in the same database. But if you’ve manually customized your SSDL (i.e., edited the SSDL section of the EDMX as text), then you should keep an eye on your changes, to make sure that the wizard does not overwrite them. Most people never do this, though.

On the other hand, changes to the CSDL and MSL must be reviewed carefully in the unlikely event that there are conflicts in the merge, because you, or the other developer, may have made customizations to the mapping or entity types which you want to keep.

Finally, it is worth mentioning that the Entity Framework version 4 supports "code only" models which do not use EDMX files at all. If you prefer designing your entity model via source code, you can choose this option.


posted @ Wed, 03 Feb 2010 19:58:19 +0000 by Craig Stuntz


join in LINQ to SQL and LINQ to Entities Considered Messy, Redundant

In this post I will demonstrate that use of the join keyword in LINQ to SQL and LINQ to Entities is nearly always wrong. LINQ queries which you write with the join keyword are harder to read and write than queries you write using associations, and they require knowledge of database metadata which is not required otherwise. This introduces the potential for errors and makes maintenance harder.

Many people ask how to do a "left join" in LINQ to SQL, and unfortunately, the answer they nearly always get — "Use DefaultIfEmpty!" — is, in my opinion, terrible advice. Let’s implement the same query with and without the join keyword, and then compare the readability of the queries, the functionality, the knowledge required to write the query, and the maintainability of the code. I think you will find that using associations wins on every single criterion I examine.

I’m going to use the Northwind demo database for this example, since many people are familiar with its structure. I created a LINQ to SQL model for Northwind by simply dragging all of the tables in the database onto the LINQ to SQL designer. The only change I made to the model generated by the designer is to rename the "Employee1" property on the generated "Employee" type to the more descriptive name "Supervisor." I’m using LINQ to SQL for this demo, but everything I’m saying here applies equally to LINQ to Entities.

"Left" Joins

Let’s imagine that I am asked to produce a web page listing all employees in a company, along with their supervisor, if any. This requires a "left join," in SQL terms, because not all employees have a supervisor. I’ll project onto a presentation model, just like I do in LINQ to Entities. Using the association properties generated by LINQ to SQL, this is quite simple:

This is fairly readable. The one thing that you need to know is that both LINQ to SQL and LINQ to Entities coalesce nulls. This means that on a row where e.Supervisor is null, you will not get a NullReferenceException in the assignment to SupervisorName and SupervisorBirthDate, as you would with LINQ to Objects. Instead, null will be assigned. Therefore, it is important that EmployeeListItem.SupervisorBirthDate is of type DateTime? (a.k.a. Nullable<DateTime>) instead of the non-nullable DateTime.

Let’s compare that to the equivalent query using the join syntax, using the mysteriously popular DefaultIfEmpty trick:

Yuck! This is far less readable than the query above. Yet these two queries produce exactly the same results. If you don’t believe me, download the sample project attached to this post and try it yourself. Perhaps even worse than the general unreadability of the "join version" is the fact that this query requires knowledge of the structure of the database which is already present in the DBML (or EDMX, in the case of the Entity Framework) model. This is a problem for two reasons. First, it’s an opportunity for programmers to make a mistake, which the first query eliminates. Second, it’s a potential maintenance issue if the foreign key definition ever changes in the database.
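The contrast can be sketched outside of LINQ as well. The following JavaScript is only an in-memory analogy, with made-up sample data and hypothetical names (`employees`, `listViaAssociation`, `listViaJoin`); it is not the code from the demo project. The association version navigates a reference, while the join version has to restate which key matches which.

```javascript
// Made-up sample data: reportsTo plays the role of the foreign key.
const employees = [
    { id: 1, name: "Andrew", reportsTo: null },
    { id: 2, name: "Nancy", reportsTo: 1 },
    { id: 3, name: "Janet", reportsTo: 1 }
];

// Wire up the "association" once, the way a generated model would.
const byId = new Map(employees.map(e => [e.id, e]));
for (const e of employees) {
    e.supervisor = e.reportsTo === null ? null : byId.get(e.reportsTo);
}

// Association style: navigate the reference; a missing supervisor yields null.
function listViaAssociation(rows) {
    return rows.map(e => ({
        name: e.name,
        supervisorName: e.supervisor ? e.supervisor.name : null
    }));
}

// Join style: the query itself must know that reportsTo matches id.
function listViaJoin(rows) {
    return rows.map(e => {
        const match = rows.find(s => s.id === e.reportsTo);
        return { name: e.name, supervisorName: match ? match.name : null };
    });
}
```

Both functions return the same rows, but only the second one repeats the key relationship in every query. Note that plain JavaScript needs the explicit null check, whereas LINQ to SQL and LINQ to Entities coalesce the null for you.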

"Inner" Joins

Now let’s compare an example of an "inner join" using both my recommended method of associations and the LINQ join keyword. Here’s the association version. It’s so readable that there is very little to say about it:

Here’s the join version:

Again, these two queries do exactly the same thing, as you can confirm for yourself by running the demo project. The join version here shares all of the faults of the "left join" version above.

API Consistency

Thus far, I’ve been discussing LINQ to SQL. But what if I have an Employee instance and would like to examine the employee’s supervisor? I might write code like this:

Now compare that code with the "association" and "join" query forms. You will see that using the associations makes the LINQ to SQL queries much more closely resemble how you work with the materialized entity objects in "regular" code. Again, I think this makes your code easier to read.

What About Performance?

Not surprisingly, the SQL generated by the equivalent association and join versions is close to identical. I would not expect to ever see a performance difference between the two syntaxes in terms of query execution time.

Is It Ever Correct to Use join?

The advantages of using associations are so strong that you may wonder why join exists in LINQ at all. Associations, however, are only helpful when they actually exist. There may be times when you need to "join" based on values which are not actually foreign keys. Or you may need to join between LINQ to SQL and LINQ to Objects.

Running the Demo Project

Here’s the demo project. To build and run it, you’ll need Visual Studio 2010 Beta, SQL Server (Express is fine), and the Northwind demo database I linked above. Open the project, find the Web.config file in Solution Explorer and open it. Change the connectionString to point to your SQL Server. Now you should be able to run the application.


posted @ Wed, 13 Jan 2010 20:24:20 +0000 by Craig Stuntz


Crossword No. 2

DotNetSlackers just published a crossword puzzle I created; you’ll see the grid below. The puzzle is focused on .NET and programming themes, including a Delphi reference here and there. The site editors wanted an article to go with the puzzle, so I wrote an article explaining how I created the puzzle. The article is full of spoilers, so if you’d like to try to solve the puzzle yourself, scroll down to the bottom of the article to find the grid and clues before reading the article at the top.


posted @ Mon, 11 Jan 2010 19:07:06 +0000 by Craig Stuntz


Projecting Onto a Presentation Model with the Entity Framework and ASP.NET MVC

In this post, I will demonstrate how to map entity models to views in an ASP.NET MVC application without worrying about implementation details like eager loading, lazy loading, or having to manually optimize SQL for the task at hand. I will argue that expressing the relationship between an entity model and the presentation model in a LINQ projection is far simpler than other methods of doing this mapping.

Imagine that you’ve been asked to write a new web application to track employees for a customer, Chotchkies restaurant. The application must use ASP.NET MVC and the ADO.NET Entity Framework. The user interaction designers have mocked up the following interface for editing an employee:

Employee editor

Upon seeing this mockup, your first recommendation is to fire the user interface designers. But management declines to follow your recommendation. So how should you implement this? Since this is a new application, you have no database, no entity model, nothing. Where to begin?

Presentation Models

For a variety of reasons, I always use strongly-typed views in my ASP.NET MVC applications. Also, I use presentation models instead of using entity types directly as the model type for my views. This allows the user interface and data models to evolve independently.

Because I am likely to want to build user interfaces using ASP.NET MVC 2’s Dynamic Templated Views, I need to decorate the view model with presentation concerns, like noting that the "Flair count" field is read-only in this particular view.

Especially in view of the less-than-compelling user interface mockup, it’s important to realize that the design of the user interface is likely to change wildly over the course of implementing the application. When actual users begin to test the application, they will request changes to the user interface, and you need to be able to adapt to this.

This has two important implications: You must get a prototype to testers and end-users as fast as possible, and your user interface must not be deeply coupled to the rest of the application, given the high likelihood of change.

So based upon the user interface prototype above, the following presentation model might be reasonable:

It’s now trivial to write an action which can be used to develop a view for editing the employee, and which matches the UI prototype above. Such an action might look like:

Demo employee action code

I’ll spare you the HTML. At this point, I have a running application which the user interaction designer can approve, and testers can try out. My total effort, thus far, is a few minutes of work. If I had started by designing a database and the data model, I would still have nothing to show for my efforts.

Entity Model

At some point, however, you need to create a database and an entity model. That was, after all, part of the requirements you were given for the application. Knowing that there might be a need to reference people who are not employees, the entity model is going to have to look very different than the presentation model above. After some discussions with the business analyst, you learn that Chotchkies management is very serious about tracking employee flair, so a first attempt at an entity model might look like this:

Now you can generate a database. At this point, you have a presentation model and an entity model, and need only wire them together.

Projection

Taking a collection of instances of one type, and mapping their properties onto a collection of instances of another type is often called mapping or projection. With older ORMs, which have either no or very limited LINQ support, this can be quite tedious, requiring manual code, the use of tools like AutoMapper, and a good deal of thinking about eager loading, lazy loading, and optimizing situations like the fact that we don’t actually want to load all of the flair for an employee here; we just need the count.

The Entity Framework, on the other hand, makes this very easy, as we can just express the relationship between the entity model and the presentation model with a LINQ expression:

I’m glossing over some details here. In a real-world application, for example, we would use the repository pattern rather than grabbing the context directly in the controller. But the point of this post is shown in the code above:

By expressing the relationship between the entity model and the presentation model as a LINQ query, it is no longer necessary to worry about implementation details like eager loading versus lazy loading, optimizing the SQL for the Count and avoiding loading big properties not actually used here. All of this can be — and is — derived from the query above, automatically.


posted @ Thu, 31 Dec 2009 20:23:59 +0000 by Craig Stuntz


Delphi Developers: Go Buy CodeHealer

If you’re doing commercial Delphi development and you’re not already doing static analysis in your automated build, go buy CodeHealer now. Nick Hodges has arranged a 1/2 price special offer.

There is no good reason not to use static analysis. If you are the sort of person who doesn’t allow hints and warnings in your code, and has configured your build to fail on any hint or warning (and I hope that you are), then you’ll love static analysis; it takes this kind of discipline to the next level. Sure, it’s useful to find existing errors in your code, but the most important benefit is that it prevents new errors from being introduced.

The only remotely valid argument I’ve heard against using static analysis is that when you first use a static analysis tool against a legacy code base, you will typically see hundreds of rules violations, many of them incorrect. Cleaning all this up will be a lot of work, which you usually can’t take on all at once.

Unlike compiler hints and warnings, static analysis rule violations often happen when there is no underlying bug. When you develop new code, you can analyze any newly introduced rule violation and exclude it if necessary. But that’s not practical to do with legacy code. My approach to this problem is very simple: Turn off rules in your static analysis tool until all the rule violations go away.

What’s the point of buying a static analysis tool and then turning off half the rules? Well, the other half are still turned on, and they are now a part of your automated build, making it impossible to introduce new code which violates them. Moreover, you can go back and turn on the other rules at a reasonable pace, say, one or two a week, fixing real issues that you find or excluding the particular violation if it turns out to be a non-issue, until they are all turned back on.

Once you have even a few static analysis rules turned on in your automated build, you’re getting some degree of quality control for very little expense. Integration testing costs money, every time you do it. Unit testing costs money every time you write new tests. Static analysis just works and keeps working, and costs you nothing more than a software upgrade once every couple years or so.

So static analysis in general is a good thing. What about CodeHealer in particular? Well, we have been using it for around three years now. It has been stable, and I’ve seen new versions released regularly. Also, I have never seen anything better for Delphi.

Digression: Attributes

When Delphi 2010 was released with the enhanced RTTI and custom attributes feature, many Delphi developers wondered what this feature would be useful for. In general, your source code describes the behavior of your application, and attributes describe your mechanisms (thanks to Eric Lippert for this concise description). Static analysis is an excellent example. Imagine that I want to exclude a static analysis rule in a particular case. This has nothing to do with the desired functionality of the application; rather, it’s more like a parameter to the automated build process.

That may sound a bit abstract, so let’s look at a specific example. FxCop, a static analysis tool for .NET code, has a rule that you should not introduce private types which are never used. But FxCop cannot detect the instantiation of types inside of a LINQ query. So in this case the rule is making a mistake; it is failing to detect that a type is, in fact, instantiated in my code. One possible solution to this issue would be to try and write a better rule. But that’s a lot of work, and there are only about two cases of this false positive in our entire source tree. So it’s easier to just suppress this message in the case of a false positive, which you can do with an attribute:

[System.Diagnostics.CodeAnalysis.SuppressMessage(
    "Microsoft.Performance",
    "CA1812:AvoidUninstantiatedInternalClasses",
    Justification = "It is instantiated in a query, which FxCop can't see.")]
private class ConsumerRow //...

This is far easier (and, I think, a better design) than what CodeHealer currently offers. I hope that future versions of CodeHealer will use this method, now that Delphi has attribute support.


posted @ Tue, 22 Dec 2009 11:56:00 +0000 by Craig Stuntz


Interview With Me At Delphi.org

Jim McKeeth interviewed me for Episode 34 of The Podcast At Delphi.org.


posted @ Wed, 21 Oct 2009 12:35:39 +0000 by Craig Stuntz


Updating to ASP.NET MVC 2 Preview 2

Last week, I updated our main development branch to ASP.NET MVC 2 preview 2 (from preview 1). In this post, I’ll list some of the features I’ve found, and also issues I encountered and how I resolved them.

New Features

Some of the new features of preview 2 have been discussed elsewhere, so I won’t rehash them. But I’ve also noticed that there is a new attribute, [RequireHttps], which does what you would expect, when added to an action, and a new HTML helper, Html.HttpMethodOverride, which makes it easy to take a POST request and code as if the request were actually PUT or DELETE, by adding a hidden input containing a special value on the HTML form. This allows you to write your server in a more RESTful style, which will be suitable for user agents which know about the PUT and DELETE verbs, while maintaining compatibility with those (like browsers) which do not.
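As a rough illustration of the idea, the sketch below shows how server-side code might decide which verb a request really represents. It is written in JavaScript with hypothetical names (`effectiveMethod`, and a `request` object with `method` and `form` properties) and is not the ASP.NET MVC implementation. The hidden field name `X-HTTP-Method-Override` is an assumption here.

```javascript
// Assumed name of the hidden form field carrying the intended verb.
const OVERRIDE_FIELD = "X-HTTP-Method-Override";

// Hypothetical helper: returns the verb the server should act on.
// Only POST requests may be overridden, and only to PUT or DELETE.
function effectiveMethod(request) {
    const actual = request.method.toUpperCase();
    if (actual !== "POST") {
        return actual;
    }
    const requested = String((request.form || {})[OVERRIDE_FIELD] || "").toUpperCase();
    return requested === "PUT" || requested === "DELETE" ? requested : actual;
}
```

Restricting the override to POST keeps a plain GET link from being turned into a DELETE, which is why the sketch checks the real verb before it looks at the form.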

MvcHtmlString

After installing the new assembly into the GAC, I attempted to compile our existing projects. I got a compilation error on some of our custom HTML helpers, as the MVC extension methods like Html.RouteLink have been changed to return an instance of type MvcHtmlString instead of String. In many cases, I could just change the return type. However, the clear intention of the framework designers is that any HTML helpers which return HTML (with angle brackets) instead of "plain" text should return a MvcHtmlString instead of a String. So, in addition to changing the return types of methods where necessary to get those methods to compile, I also wanted to change the return types of any custom HTML helper which returned HTML containing angle brackets. It makes no difference today, but in ASP.NET 4, there will be a new syntax to guard against XSS attacks, and it makes sense to get ready for this by returning the correct type.

The trick here is that MvcHtmlString does not have a public constructor, so it’s not immediately obvious how to create one of these things. Perusing the MVC source code, I noticed that it does have a static Create method, and comments elsewhere in the source code indicate that this method should be used instead of the protected constructor. So you can now write a helper like this:

public static MvcHtmlString Fud(this HtmlHelper helper)
{
    return MvcHtmlString.Create("<acronym title=\"Fear, Uncertainty, and Doubt\">FUD</acronym>");
}

JsonRequestBehavior

ASP.NET MVC will now, by default, throw an exception when an action attempts to return JSON in response to a GET request. Unless you read the release notes carefully, you won’t discover this until runtime. This is in order to proactively defend against a particular cross-site attack. It is good to be safe by default, but you are only vulnerable to this attack under the following combination of circumstances:

  • The data you’re returning is worth stealing.
  • The root data object in the JSON response is an array.
  • The requesting browser is not IE 8, or some other browser which doesn’t allow __defineSetter__

It turns out that in our application we almost never return JSON results with the root object as an array in a GET. So the best fix was generally just to tell the framework that we have examined the risk and determined that returning JSON is acceptable in this case, by changing code like:

return Json(model);

…to:

return Json(model, JsonRequestBehavior.AllowGet);

Unfortunately, I had to make this change in a lot of places. Still, I agree with the framework designers: Better safe than sorry.
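Where a GET action really does need to return a list, one common way to avoid the second condition above is to make the root of the response an object rather than an array. A minimal JavaScript sketch of that idea, using a hypothetical `toSafeJson` helper and a made-up `items` property name:

```javascript
// Hypothetical helper: serialize data so that the JSON root is never an array.
function toSafeJson(data) {
    const root = Array.isArray(data) ? { items: data } : data;
    return JSON.stringify(root);
}
```

The client then reads the list from the `items` property of the parsed response instead of treating the response itself as the array.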

JavaScript Files

The MicrosoftAjax.js, MicrosoftAjax.debug.js, MicrosoftMvcAjax.js, and MicrosoftMvcAjax.debug.js files have all changed for ASP.NET MVC 2. Existing ASP.NET MVC 1.0 projects will have old versions of these files. The upgrade instructions in the Release Notes do mention this (I just missed it at first): you should replace these files with the newer versions when upgrading to MVC 2. To do this, create a new MVC 2 project, copy the JavaScript files from the Scripts folder in this new project, and paste them over the existing files in the project you are upgrading.

ModelBindingContext Changes

One of the new features of preview 2 is client-side validation and custom metadata providers for validations. In order to implement this feature, some of the implementation details of model binding have changed. This won’t affect most people, but the fix was a little tricky, and required a good bit of examination of the MVC source code, so I’ll list it anyway. Prior to preview 2, I had code in a unit test for a custom model binder like this:

        internal static T Bind<T>(string prefix, FormCollection collection, ModelStateDictionary modelState) where T:class
        {
            var mbc = new ModelBindingContext()
            {
                ModelName = prefix,
                ModelState = modelState,
                ModelType = typeof(T),
                ValueProvider = collection.ToValueProvider()
            };
            IModelBinder binder = new MyModelBinder();
            var cc = new ControllerContext();
            return binder.BindModel(cc, mbc) as T;
        }

With Preview 2, on the other hand, I have to do this:

        internal static T Bind<T>(string prefix, FormCollection collection, ModelStateDictionary modelState) where T:class
        {
            var mbc = new ModelBindingContext()
            {
                ModelMetadata = ModelMetadataProviders.Current.GetMetadataForType(null, typeof(T)),
                ModelName = prefix,
                ModelState = modelState,
                ValueProvider = collection.ToValueProvider()
            };
            IModelBinder binder = new MyModelBinder();
            var cc = new ControllerContext();
            return binder.BindModel(cc, mbc) as T;
        }

Not hard once you know the trick.


posted @ Mon, 05 Oct 2009 19:35:16 +0000 by Craig Stuntz


Comparing C#, C++, and Delphi (Win32) Generics

C#, C++, and Delphi all have language support for generic types and methods. Although all three languages are statically typed, they implement generics in very different ways. I’m going to give a brief overview of the differences, both in terms of language features and implementation. I presume that Delphi Prism generics work essentially the same as C# generics, which, as you’ll see, are different from Delphi/Win32 generics.

Let me say at the outset that although all three systems work somewhat differently, I don’t see an overwhelming advantage to any one design. Generally, you can do what you need to do in all three environments. I’m writing this article not to claim that any one system is better than the others, but to point out some of the subtleties in the implementations.

Before I get started, I’d like to thank Barry Kelly for his useful feedback on my first draft of this article.

Compiling Instantiations

Every implementation of generic types works via a two-step process. First, you define a generic type or method with a "placeholder" for a specific type, which will be substituted later on. Later (exactly when depends upon the language), the type is "instantiated." Note that instantiating a generic type is very different from instantiating an object. The former happens within the compiler, whereas the latter happens at runtime.

Instantiation is triggered when some code uses a generic type or method with a specific type parameter, and means that based upon the generic definition and the types or values passed when the generic is used, a specific implementation is substituted in order to allow the generation of machine code. Instantiation is one of the most important differences between real generic types and using non-generic types with casts. In the end, different machine code is generated for instantiations for different type parameters.
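To make the distinction between a generic definition and its instantiations concrete, here is a minimal C++ sketch. The names Box, payload_size, and box_demo are made up for illustration; nothing here is specific to any one compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The generic definition: T is a placeholder for a type supplied later.
template <typename T>
struct Box {
    T value;
    // Size in bytes of the stored value; it differs from one instantiation to the next.
    static std::size_t payload_size() { return sizeof(T); }
};

// Using Box with a specific type argument triggers instantiation.
// Each instantiation is a distinct type as far as the compiler is concerned.
static_assert(!std::is_same<Box<char>, Box<double>>::value,
              "Box<char> and Box<double> are different types");

inline void box_demo() {
    Box<char> small{'x'};    // an object (instance) of the Box<char> instantiation
    Box<double> large{3.5};  // an object (instance) of the Box<double> instantiation
    assert(small.value == 'x');
    assert(large.value == 3.5);
    assert(Box<char>::payload_size() == 1);
    assert(Box<double>::payload_size() == sizeof(double));
}
```

Box<char> and Box<double> come from the same source text, but the compiler treats them as separate types and generates separate code for each.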

In C# and Delphi, there is a language feature which is solely dedicated to implementing generic types and methods. In C++, on the other hand, the "templates" language feature can be used to implement generic types and methods, among many, many other things. It is even possible to do general-purpose programming using templates, which C++ programmers call "metaprogramming."

C++ templates require the template source code to be available when the code using the template is compiled. This is because the compiler does not actually compile the template as a separate entity, but rather instantiates it "in-place" and only compiles the instantiation. The C++ compiler is effectively doing code generation, substituting the type parameter (or value) for the placeholder for the type, and generating new code for the instantiation. Update: Moritz Beutel elaborates on this in his excellent comment on this post. You should read the full comment, but the short version is that the manner in which templates are compiled can result in errors in the code which uses the template appearing (from compiler error messages), incorrectly, to be errors in the template itself. Moreover, the implementation of most C++ compilers makes this problem even worse than what is necessary in order to implement the C++ standard.

In Delphi and C#, on the other hand, the generic type or method and the code which uses it can be compiled separately. Therefore, you can compile a library which contains a generic type, and later on compile an executable which uses an instantiation of that type and has a reference to the binary library, rather than to the source code for the library.

Another way to think of this difference is that in C++, a template will not be compiled at all until it is used. In Delphi and C#, on the other hand, a generic type or method must be compiled before it can be used.
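A small C++ sketch shows what "not compiled until it is used" looks like in practice, at least for code which depends on the type parameter. Wrapper and lazy_demo are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename T>
struct Wrapper {
    T value;
    // Only valid when T has a size() member. The body is not compiled
    // for a given T until length() is actually called on a Wrapper<T>.
    std::size_t length() const { return value.size(); }
};

inline void lazy_demo() {
    Wrapper<int> number{42};             // compiles: Wrapper<int>::length is never instantiated
    Wrapper<std::string> text{"hello"};  // length() is instantiated for std::string below
    assert(number.value == 42);
    assert(text.length() == 5);
    // number.length();  // uncommenting this is a compile error, reported inside the template
}
```

Note that int has no size() member, yet Wrapper<int> is perfectly usable so long as nobody calls length() on it. The commented-out line also illustrates the error-reporting problem: the mistake is in the calling code, but the compiler points at the template.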

In Delphi, the compiler uses a feature closely related to the method inlining feature. This causes the compiler to store the relevant bits of the abstract syntax tree for the generic type in the compiled DCU. When the code which uses the generic type is compiled, this bit of the abstract syntax tree is read and included in the abstract syntax tree for the code which uses the generic type, so that when machine code is produced, based on the new, "compound" abstract syntax tree, it looks, to the code emitter, like the type was defined with the type parameter "hard coded." Instead of linking to compiled code in the DCU, the code which uses the generic type emits new code for the instantiation into its own DCU.

Because generic instantiation is performed in the same area of the Delphi compiler which does method inlining, there are some limitations on what you can do in a generic method, or a method of a generic type. Like inlined methods, these methods cannot contain ASM. Also, calls to these methods cannot be inlined. These restrictions are limitations of the implementation, not of the language design, and could theoretically be removed in a future version of the compiler.

C# generics use the .NET Framework 2.0+, which has native support for generic types. The C# compiler emits IL which specifies that a generic type should be used, with certain type parameters. The .NET framework implements these types using one instantiation for any reference type, and custom instantiations for value types. (Don’t confuse “instantiation” with “instance” in the preceding sentence; they mean entirely different things in this context. There are usually many instances of one instantiation.) This is because a reference to a reference type is always the same size, whereas value types can be many different sizes. Later, the IL will be JITted into machine code, and, as with compiled C++ or Delphi code, types don’t really exist at the machine code level. In .NET, generic type instantiation and JITting are two distinct operations.

So one important difference in generics implementations is when the instantiation occurs. It occurs very early in C++ compilation, somewhat later for Delphi compilation, and as late as possible for .NET compilation.

Custom Specializations

Another very important difference is that C++ allows custom instantiations, called specializations, including specializations by value. With C# and Delphi, on the other hand, the only way to instantiate a generic type is to use that type with an explicit type parameter. The implementation will always be the same, with the exception of the type of the type parameter. Because C++ allows custom instantiations, it is easy for a programmer to write different implementations of a method, for example, for different integer values. Like operator overloading, this is a powerful feature which requires considerable self-restraint to avoid abuse.
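Here is a short C++ sketch of both kinds of specialization, by value and by type. The names Factorial, TypeName, and specialization_demo are illustrative, and I am assuming a C++11 compiler for constexpr and static_assert.

```cpp
#include <cassert>
#include <string>

// Primary template, parameterized by an integer value rather than a type.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Specialization by value: a different implementation for N == 0,
// which also ends the compile-time recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Primary template, parameterized by a type.
template <typename T>
struct TypeName {
    static std::string get() { return "unknown"; }
};

// Specialization by type: a custom implementation used only for int.
template <>
struct TypeName<int> {
    static std::string get() { return "int"; }
};

static_assert(Factorial<5>::value == 120, "5! is computed at compile time");

inline void specialization_demo() {
    assert(Factorial<0>::value == 1);
    assert(TypeName<int>::get() == "int");
    assert(TypeName<double>::get() == "unknown");
}
```

The Factorial example is also a tiny taste of template metaprogramming: the whole computation happens inside the compiler.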

Constraints

Delphi and C# both have a generic constraint feature, which allows/requires the developer of a generic type or method to limit which type parameter values can be passed. For example, a generic type which needs to iterate over some list of data could require that the type parameter support IEnumerable, in either language. This allows the developer of the generic type to make her intentions for the use of the type very clear. It also allows the IDE to provide code completion/IntelliSense on the type parameter, within the definition of the generic type. Also, it allows a user of the generic type to be confident that they are passing a legal value for the type parameter without having to compile their code to find out.

In C++, on the other hand, there is not presently any such feature. A more powerful/complex feature called "concepts" was considered for, but ultimately removed from, C++0x.

An implication of the lack of constraints is that C++ templates are duck typed. If a generic method calls some method, Foo, on a type passed as the generic type parameter, then the template is going to compile just fine so long as the type parameter passed contains some method called Foo with the appropriate signature, no matter where or how it is defined.
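The duck typing described above can be sketched in a few lines of C++. Duck, Robot, Rock, CallFoo, and duck_demo are invented names for this example; the two types with a Foo method share no base class or interface.

```cpp
#include <cassert>
#include <string>

struct Duck  { std::string Foo() const { return "quack"; } };
struct Robot { std::string Foo() const { return "beep"; } };  // unrelated to Duck
struct Rock  { };                                             // has no Foo at all

// No constraint is declared anywhere: any T with a suitable Foo() will do.
template <typename T>
std::string CallFoo(const T& item) {
    return item.Foo();
}

inline void duck_demo() {
    Duck duck;
    Robot robot;
    assert(CallFoo(duck) == "quack");
    assert(CallFoo(robot) == "beep");
    // Rock rock;
    // CallFoo(rock);  // would not compile: Rock has no member named Foo
}
```

Nothing in the signature of CallFoo tells a caller that Foo is required; you find out by reading the template body or by compiling.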

Covariance and Contravariance

Let’s say I have a function which takes an argument of type IEnumerable<TParent>. Can I pass an argument of type IEnumerable<TChild> to that function? What if the argument type were List<TParent> instead of IEnumerable<TParent>? Or what if the generic type was the function result rather than the function argument? The formal names for these problems are covariance and contravariance. The precise details are too complicated to explain in this article, but the examples above summarize the most common times you run into the problem.

Delphi generics and C++ templates do not support covariance and contravariance. So the answers to the questions above are no, no, and no, although there are, of course, workarounds, like copying the data into a new list. In C# 4.0, the type parameters of generic interfaces and delegates can be declared covariant or contravariant, so the examples above can be made to work where appropriate. "Where appropriate" involves non-trivial subtleties hinted at above, and exemplified by the fact that arrays in .NET have (intentionally) broken covariance. However, the BCL routines in the .NET Framework 4.0 have been annotated to support covariance and contravariance when appropriate, so developers will benefit from the feature without having to fully understand it.
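For the C++ side of this, a rough sketch using std::vector shows both the "no" answer and the copy workaround. Parent, Child, SumIds, ToParents, and variance_demo are hypothetical names, and I am assuming C++11.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Parent {
    virtual ~Parent() = default;
    virtual int id() const { return 1; }
};
struct Child : Parent {
    int id() const override { return 2; }
};

// A pointer to Child converts implicitly to a pointer to Parent...
static_assert(std::is_convertible<Child*, Parent*>::value,
              "Child* converts to Parent*");
// ...but the two vector instantiations are unrelated types.
static_assert(!std::is_convertible<std::vector<Child*>, std::vector<Parent*>>::value,
              "vector<Child*> does not convert to vector<Parent*>");

int SumIds(const std::vector<Parent*>& items) {
    int total = 0;
    for (const Parent* p : items) total += p->id();
    return total;
}

// Workaround: copy the pointers into a new list of the expected type.
std::vector<Parent*> ToParents(const std::vector<Child*>& children) {
    return std::vector<Parent*>(children.begin(), children.end());
}

inline void variance_demo() {
    Child a, b;
    std::vector<Child*> children{&a, &b};
    // SumIds(children);  // does not compile: no conversion between the vector types
    assert(SumIds(ToParents(children)) == 4);
}
```

The copy costs time and memory in proportion to the size of the list, which is the price of not having variance built into the type system.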


posted @ Thu, 01 Oct 2009 19:13:29 +0000 by Craig Stuntz



Copyright© 1994 - 2009 Embarcadero Technologies, Inc. All rights reserved.