Skip to content

You Already Know a Lazy Language

In lazy languages such as Haskell, it can be difficult-to-impossible to predict the order in which certain statements will execute. That’s fine, since in a purely functional language the order of execution doesn’t matter, but people who are used to imperative languages such as Delphi sometimes find the concept difficult to get their heads around.

But there’s a very common lazy language which I’d guess just about everyone reading this blog already knows, namely SQL’s SELECT statement. Perhaps many people don’t realize that it’s lazy since most interfaces to SQL databases abstract the laziness away, but under the hood the DB server will execute your SELECT statement lazily. Let’s take a closer look.

When you execute a query against an SQL database server, you do a few things, from the point of view of the client application (I’ll ignore transactions, parameters, and other stuff which aren’t relevant to this discussion):

  1. Prepare the statement
  2. Execute the statement
  3. Fetch one or more rows
  4. Release the statement

When you prepare the statement, the statement is parsed, checked for validity (e.g., all tables referenced must exist), and optimized. Executing the statement binds params and returns a cursor handle. Most Delphi programmers probably expect to get a row back at this point, but that’s something that Delphi’s TDataSet class does; the server won’t return any rows until you explicitly fetch them. You can fetch zero, some, or all of the rows in the cursor, and then release the statement, telling the server you don’t intend to use it again.

With this in mind, let’s consider a simple SQL statement:

SELECT
  *
FROM
  LEFT_TABLE LT
  INNER JOIN RIGHT_TABLE RT
    ON RT.LT_ID = LT.ID

When you prepare this statement, the server will optimize it. Depending upon the indices available, the server may decide to sort both LEFT_TABLE and RIGHT_TABLE and merge the two resulting streams, or it might select rows from LEFT_TABLE sequentially and then use an index to look up matching rows in RIGHT_TABLE. The client neither knows nor cares about this, for the most part; it just needs the correct result.

The interesting thing here is that the order of evaluation is very different depending upon the option chosen. With the sort/merge optimization strategy, the server must perform the entire JOIN before it can return the first row. With the sequential/indexed lookup strategy, the server can perform little bits of the JOIN as each row of the result set is returned. However, if you only want the 20th row, you must first fetch (and the server must JOIN) the first 19 rows before you can see the one you care about. Your SELECT statement, meanwhile, only expresses some truths about the result set you want (namely, that RT.LT_ID = LT.ID) rather than stating precisely how and in what order the server should ensure that this is true. In other words, the JOIN conditions (and, similarly, conditions in a WHERE clause) are not imperative instructions to be executed the instant they are encountered, but predicates which express facts about the result set that you need, and will be examined only when and if you ask for a record which requires its evaluation. From the point of view of the client app, you just want correct results; you don’t care precisely when they’re calculated, at least until it comes time to optimize an already-correct program.

But if you think about trying to "debug through" a SELECT statement inside the server, things get very complicated quickly. If you tried to "step into" fetching the first row of the result set, you might find that you had to "step through" the JOIN of every row of the result set before the first fetch operation returned, or you might find that only the first JOINed row would be evaluated. It would depend on both the query itself and the underlying data. The reason why this isn’t a huge productivity problem in the real world is because SELECTs nearly always work. If it ain’t broke, we don’t have to debug it.

If you understood all of the discussion here about SQL, then you already understand lazy evaluation in a programming language. Just substitute, for example, "list" in place of "SQL result set" and "function" in place of "WHERE clause," and you have it.

Post a Comment

Your email is never published nor shared. Required fields are marked *

Bad Behavior has blocked 1846 access attempts in the last 7 days.

Close