Latest Posts


In Delphi, is addition commutative?

Answer: no, not necessarily. Despite the existence of operator precedence, i.e. the fact that the following

  X := 3 + 4 * 5;

results in 23 and not in 35, the order of operands can still have an effect.

In my BigInteger code, I discovered an odd error that only happened in some very rare cases, only in PUREPASCAL code, i.e. code that did not use assembler, and only in 64 bit.

It took me several hours to find out that this was the problematic expression:

  Value := NormDividend[I + J] + NormDivisor[I] + Value shr CDivLimbBits;

CDivLimbBits is a constant with value 32, Value is an Int64, and NormDividend[] and NormDivisor[] are arrays of UInt32 (Cardinal). Only in some very special circumstances did this cause an error.

What happened?

In this unit, which does lots of odd and ugly things to UInt32s, I turned off range and overflow checks to speed things up, so it went unnoticed that NormDividend[I + J] + NormDivisor[I] caused an overflow. Since overflow and range checks were off, the 33rd bit simply got cut off.

But you might say: "Hey, the third operand is an Int64, so why were these two operands not promoted to 64 bit?" It turns out that this only happens once it is required, so what the compiler actually compiles is:

  UInt32(Intermediate) := UInt32(NormDividend[I + J]) + UInt32(NormDivisor[I]);
  Value := Int64(Intermediate) + Value shr 32;

while I expected:

  Value := Int64(NormDividend[I + J]) + Int64(NormDivisor[I]) + Value shr 32;

Now, if you rearrange the Int64 to come first, like:

  Value := Value shr 32 + NormDividend[I + J] + NormDivisor[I];

then all is well. The first operand is an Int64, so all following operands are promoted too and you really get:

  Value := Value shr 32 + Int64(NormDividend[I + J]) + Int64(NormDivisor[I]);

Note that this error did not happen in 32 bit code. There, NormDividend[] and NormDivisor[] are arrays of UInt16, and Value is an Int32. In other words, in 32 bit code (and even in 64 bit code on Windows), everything seems to be promoted to Int32 (signed 32 bit integer) anyway, probably because that type is somehow the preferred type for integer expressions (most integer code uses 32 bit registers, in 32 bit as well as in 64 bit).
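
For comparison, a minimal sketch (with my own variable names) of why the 16 bit limbs used by the 32 bit code are safe: the sum of two UInt16 values always fits in a 32 bit intermediate, so nothing can be cut off before the assignment.

program SixteenBitLimbs;

{$APPTYPE CONSOLE}

var
  A16, B16: UInt16;
  V32: Int32;
begin
  A16 := $FFF0;
  B16 := $20;
  V32 := A16 + B16;   // sum is $10010 (65552), which easily fits in the 32 bit intermediate
  Writeln(V32);       // 65552
end.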

So take care to either cast to the required type, or to put the largest operand first, otherwise you might be in for a surprise. It certainly was a surprise to me, especially because the
data I had used for unit testing had not caught this.
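
To make the surprise reproducible, here is a minimal sketch (the variable names are mine, and the exact promotion behaviour may depend on the compiler version): with overflow and range checks off, the first addition is performed in 32 bits and silently wraps, while casting one operand to Int64 keeps all 33 bits.

program PromotionSurprise;

{$APPTYPE CONSOLE}
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}

var
  A, B: UInt32;
  V: Int64;
begin
  A := $FFFFFFF0;
  B := $20;
  V := A + B;          // the addition is done in 32 bits first: the 33rd bit is cut off
  Writeln(V);          // 16 ($10)
  V := Int64(A) + B;   // the first operand is an Int64, so the addition is done in 64 bits
  Writeln(V);          // 4294967312 ($100000010)
end.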

Only the fact that I wanted to improve the speed of ToString (converting the largest known prime to a string of 22 million decimal digits still takes approx. 2′30", while converting it to a string of — much more easily convertible — hex digits only takes 135 ms), and the coincidence that in one test I had to divide by exactly 10^57, made me catch this error. Note that the assembler code did not suffer from this. There I can control exactly what gets promoted and when.

This also made me aware again of the fact that testing can only show the presence of errors, never their absence, and that it is extremely hard to find test cases that cover everything. The fact that I had to divide by a number that caused the error was sheer coincidence.


posted @ Mon, 01 Feb 2016 23:01:15 +0000 by Rudy Velthuis


Designing for Problems Too Big to Test

In this post, I will show an example of where using unit testing as a design methodology does not work, and how to produce a design for correct code anyway. There is no single design methodology which works for all problems, so it’s useful to have a variety of tools at your disposal.

This post is my contribution to the 2015 F# Advent Calendar.

I’m implementing a compiler for a tiny language without use of external libraries for things like parsing and code generation. The idea is to produce a minimal example of a purely functional compiler. This is an ongoing project, and some parts are further along than others, but you can see the source code as I work, and it does produce working EXEs today.

Designing a compiler is harder than many problems in programming, because compilers do something difficult: a compiler must be able to accept any string and either produce an equivalent program or explain clearly to a human being why this string is not a valid program. And there are a lot of possible strings!

Designing a compiler is also easier than many problems in programming, because there exist formal methods for solving many of the harder sub-problems in the design. You can think of "formal methods," here, as recipes for a solution, but very special recipes which guarantee that you’ll touch all possible cases in the problem space.

Design Methodologies

Over the years, programmers have used a number of different methodologies when approaching program design. These include:

  • The Big Ball of Mud. Arguably the most common methodology, this involves just poking at the problem until things seem to work for the most common case, maybe.
  • Big Design Up Front. In this methodology, a full specification for the implementation of the system is developed before coding begins. Many people consider this obsolete or, at best, wildly inefficient.
  • Test Driven Design. I’m going to distinguish this from test driven development, because tests are useful both as a methodology for program design and for implementing a program design. In practice, people tend to combine these. As a design methodology, the general idea is that you start with either high or low level simple cases, and continue working until a design evolves. Some people divide this into sub-schools of test driven design.

Despite its ubiquity, few defend the big ball of mud as a design practice. Big design up front is widely ridiculed. That leaves TDD as the most prevalent design methodology that people are willing to publicly defend. Unfortunately, testing, while useful, is fundamentally limited.

"…program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence." Edsger Dijkstra

In cases where "the happy path" is far more prevalent than "edge cases," this might not seem to be a show-stopping limitation, and test driven design works OK in many cases.

There Are No Edge Cases In Programming Languages

As noted above, a compiler must be able to accept any string and either produce an equivalent program or explain clearly to a human being why this string is not a valid program. A compiler designer cannot predict the valid programs people may write, nor the errors they may create.

For example, let’s consider Duff’s Device. It’s safe to presume that Brian Kernighan and Dennis Ritchie did not have this in mind when they designed the C programming language. For the uninitiated, Duff’s Device nests a do-while loop inside of a switch statement:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

This is unreadable to the point that it borders on obfuscation, but is legal C, per the specification, and does perform a useful optimization on a particular case. I bring it up because, as a language implementer, I think it drives home the point that there is no possibility of creating (anywhere near) all of the possible unit tests for all of the possible ways someone might choose to use your language.

Different Tasks, Different Design Methodologies

In many programming tasks, the number of "happy path" cases is similar to the number of edge and error cases, at least within the same order of magnitude. In these cases it’s probably possible to exhaustively test the system, even if people don’t usually bother to do so.

For other tasks, however, the number of "edge cases" is uncountably large. I gave a programming language example above, but there are others, such as designing an API for a cloud service. In these cases, we need a design methodology which gives us some assurance that our design will work with cases that we did not and could not possibly produce tests for, because real-world use cases will vastly outnumber our test cases.

Formal Methods

The solution to this problem is to break the problem space into a countable number of conditions. This is only effective if those countable conditions represent all possible states in the problem space. For example, for a programming language, we divide the task of compilation into small phases such as lexing, parsing, etc., and within each phase we use a formalism which guarantees that we cover the entire possible range of conditions within that phase.

This will make more sense if I give an example.

Lexing and Regular Expressions

In many compilers, the first phase of compilation is lexing, where the string representing the program source code is split into tokens. The token list will be passed to the parser, which attempts to match them up with the grammar of the programming language. As a practical example, consider the following expression from a Lisp-like language, which increments the number 1, resulting in the value 2.

(inc 1)

Lexing divides this source code into a stream of tokens, like this:

LeftParenthesis
Identifier "inc"
LiteralInt 1
RightParenthesis

These tokens will be consumed by the parser to produce an abstract syntax tree, which is then type checked, optimized, etc., but let’s just look at lexing for now.

Substrings of the input source code are mapped to tokens using regular expressions. Not the PCRE type with zillions of features you might be familiar with, but a far simpler version with only a few rules. The lexical grammar for this language looks something like this:

leftParenthesis  = '('
rightParenthesis = ')'
letter           = 'A' | 'B' | 'C' | …
digit            = '0' | '1' | '2' | …
number           = ('+'digit|'-'digit|digit) digit*
alphanumeric     = letter | number
// …

You don’t use System.Text.RegularExpressions.Regex for this; it’s slow, and has features you won’t need.

How can we guarantee that we can tokenize any possible string? We don’t need to; as long as we explicitly handle the case of strings we can’t tokenize, we’re covered. I do this by having an extra token type for unrecognized characters. These are eventually mapped into errors the user sees.

How can we guarantee that we can tokenize any string representing a valid program without seeing an unrecognizable character? Because the parser is designed around a formalism (a context free grammar) which maps lexemes to abstract syntax trees, and the only valid programs are those which can be constructed from repeated applications of the production rules in the parser’s grammar. We have changed the scope of the problem from "any possible string" to "any possible sequence of lexemes."

Right away we have a big win in the number of test cases. Any "character" in a string could be one of 2^16 UTF-16 code points, but the number of possible lexemes is considerably smaller. A real language would have maybe 10 times more, but that’s still better than testing an input of any possible Unicode code point:

type Lexeme =
    | LeftParenthesis
    | RightParenthesis
    | Identifier    of string
    | LiteralInt    of int
    | LiteralString of string
    | Unrecognized  of char

We can test the lexer in isolation with a much smaller number of test cases.

The example I gave was a very simple expression, but real-world programs obviously contain more complicated expressions. Also, real-world code is often invalid and must be rejected by the compiler. Some coding errors cannot be detected until further on in the compilation pipeline, but there are possible errors at the lexing stage. For example, in my language, identifiers must begin with a letter, so the expression

(| 1)

…maps to:

LeftParenthesis
Unrecognized '|'
LiteralInt 1
RightParenthesis

Importantly, we should be able to examine any character of a real-world string, and map it into one of these types. The Unrecognized type serves as a kind of fall through for characters which do not fit into the types in the union.

F#’s exhaustiveness checking ensures that we cannot forget to handle a particular case even if we add additional lexemes to the language specification later. As a simple example, consider this pretty print function which takes a list of lexemes and produces a string similar to the originally parsed source code:

let private prettyPrintLexeme = function
| LeftParenthesis          -> "("
| RightParenthesis         -> ")"
| Identifier    identifier -> identifier
| LiteralInt    num        -> num.ToString()
| LiteralString str        -> sprintf "\"%s\"" str
| Unrecognized  ch         -> ch.ToString() 

let prettyPrint =
    List.map prettyPrintLexeme
    >> String.concat " "

If we were to, after the fact, add an additional type of lexeme, but forgot to modify the prettyPrint function, we would receive a compiler warning since the function would no longer be exhaustive.

The Rest of the Pipeline

What about parsing, type checking, and the rest of the compilation pipeline? Formalisms exist for those, as well.

Compilation phase    Formalism
Parsing              Context free grammar
Optimization         Algebra
Type checking        Logical inference rules
Code generation      Denotational semantics

Isn’t This Just Big Design Up Front?

The idea of basing your implementation design around an exhaustive specification might sound like big design up front, but there is an important difference. The formalisms used in implementing a compiler are "off the shelf." Nobody has to sit down and create them, because they have been refined over decades of compiler implementations. You simply need to know that they exist.

If "formal methods" sounds too snobby for your taste, you can simply call this "programming with proven recipes."

One downside of this methodology is that it becomes big design up front in those cases where there is no off-the-shelf formalism available for your use. That’s OK! The important thing is to know when these formalisms exist and how to use them, when necessary.

The full source code for this post is available.


posted @ Thu, 24 Dec 2015 04:08:14 +0000 by Craig Stuntz


New: BigIntegers

I wrote a new implementation of BigIntegers for Delphi. It uses assembler for the low level stuff, although there is a PUREPASCAL version for each of these routines too. Instead of using the GPL-licensed GNU Multi-Precision (GMP) library, I wrote everything from scratch, without using any code from others.

I think it is pretty fast, especially because I optimized many of the routines, either by techniques like loop unrolling or using 64 bits at once, or by using bitwise tricks, or by high-level algorithms like Burnikel-Ziegler, Karatsuba, Toom-Cook 3-way, etc. Results are checked with results generated by .NET. The C# test program is included.

In the course of writing this, I read an enormous number of publications on these subjects and I learned a lot, not only about the maths involved, but also about many other mathematical notions. Not so easy for a dentist. I also had some good help on StackOverflow.

Go take a look at my website. I think every Delphi user should have BigIntegers.


posted @ Thu, 19 Nov 2015 11:29:41 +0000 by Rudy Velthuis


Rudy’s Delphi Corner is back online

Because of problems with my old domain and my provider, my old Delphi Corner website http://www.rvelthuis.de was offline for a while, and was parked as a subdirectory of another domain of mine.

I am very pleased to announce that it is back online, at the old address: http://rvelthuis.de

YAY!


posted @ Sun, 15 Nov 2015 01:07:50 +0000 by Rudy Velthuis


F# Presentations at CodeMash 2016

I’ve been scouring the CodeMash accepted session list for talks featuring F#, and there are quite a few!

Bonus! Other talks of possible interest to functional programmers:

Double-Bonus! Interesting looking Pre-Compiler:

Triple-Bonus! Employment offer:

Improving is hiring developers and talented QAs in the Columbus area. It’s a nice place to work. We do interesting stuff. If that sounds good to you, talk to me! We don’t have remote positions at this time, but I’m trying to make that happen. Stay tuned!


posted @ Mon, 09 Nov 2015 20:41:59 +0000 by Craig Stuntz


Speed problems caused by code that never ran

Let me start out by telling you that I am currently writing a (simple) BigInteger implementation for Delphi. This implementation tries to be compatible with the BigIntegers used in .NET, so I can verify that my results are the same as the results of my tests in C#.

For this, RightShift must use two’s complement semantics, and shifting a negative integer must shift in 1-bits from the top, so -100 (hex $FFFFFF9C) shifted right by one becomes -50 (hex $FFFFFFCE). For normal Integer values, the result in Delphi would be 2147483598 (hex $7FFFFFCE), because Delphi does a simple logical shift and does not preserve the sign bit. But my BigIntegers should, because most other BigInteger implementations (including the .NET one, my reference) do that too.
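
A tiny console sketch of the difference (the program and variable names are mine; it merely illustrates the values mentioned above):

program ShiftSemantics;

{$APPTYPE CONSOLE}

var
  I: Integer;
begin
  I := -100;                                         // hex $FFFFFF9C
  Writeln(I shr 1);                                  // 2147483598 (hex $7FFFFFCE): shr shifts in a 0 bit
  Writeln((I shr 1) or (I and Integer($80000000)));  // -50 (hex $FFFFFFCE): the sign bit restored by hand
                                                     // (enough for a shift by 1); this is the two's
                                                     // complement result the BigInteger shift must produce
end.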

To achieve this, in my routine, I did something like this:

class operator BigInteger.RightShift(const Value: BigInteger; Shift: Integer): BigInteger;
var
  LSize: Integer;
  ShiftOffset: Integer;
  RSize: Integer;
  P: PLimb;
begin
  if Value.IsZero then
    Exit(BigInteger.Zero);

  if Value.IsPositive then
  begin
    // Call internal methods that allocate the result, shift the value right, etc.
    // Works fine, no problem.
  end
  else
  begin
    // Recursive call
    Result := MinusOne - ((MinusOne - Value) shr Shift);
  end;
end;

That gave me the right results, so I was happy until I started benchmarking the new routine. I noticed that the right shift was more than twice as slow as the corresponding left shift. I did not understand this, because a right shift actually has to move fewer bits and the result is smaller. Even if I only passed in positive values, it would still be slow. The code was almost the same as the code for left shift, though. Well, except for the else clause.

People cleverer than I am probably already see the problem: the "negative" branch (the else clause) of the if statement. When I removed it, the code was indeed faster than that for left shifts. But when I put it back in, even though it would never run, it slowed down everything. Only after some hard thinking and some debugging did I notice, in the CPU view, that there were calls to InitializeArray and FinalizeArray in the compiled routine, and that everything was surrounded by a try-finally block. The expression in the "negative" branch had some intermediate results, and records for these were allocated and managed by the runtime. That made the entire code slow.

First solution

My first solution was to put the code for the "negative" branch into a nested procedure of its own. The hidden local BigIntegers were now confined in that procedure and would not cause the entire routine to be slow. The "positive" part was indeed up to speed now.
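
In code, that first workaround looked roughly like this (the nested procedure and its name are my reconstruction of the idea, not the actual library code):

class operator BigInteger.RightShift(const Value: BigInteger; Shift: Integer): BigInteger;

  // The expression for negative values, with its hidden temporary BigIntegers
  // (and the implicit try-finally that manages them), is confined to this
  // nested procedure, so the fast positive path no longer pays for it.
  procedure ShiftNegative(const Value: BigInteger; Shift: Integer; var Res: BigInteger);
  begin
    Res := MinusOne - ((MinusOne - Value) shr Shift);
  end;

begin
  if Value.IsZero then
    Exit(BigInteger.Zero);

  if Value.IsPositive then
  begin
    // Call internal methods that allocate the result, shift the value right, etc.
  end
  else
    ShiftNegative(Value, Shift, Result);
end;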

I finally deconstructed Result := MinusOne - ((MinusOne - Value) shr Shift); into internal calls entirely, so no hidden BigIntegers were allocated anymore. Now I could put the "negative" code back from the nested procedure to the negative branch, and it was quite a bit faster as well.

Conclusion

I learned two things from this:

  • Beware of hidden code. Here, it was caused by the expression with intermediate results.
  • Inline with caution. If I had inlined the nested routine, the hidden BigIntegers would have been put back in the outer scope, and things would have been slow again.

posted @ Sun, 04 Oct 2015 14:13:06 +0000 by Rudy Velthuis


Speaking at Dog Food Conference, CloudDevelop, and CodeMash

I’ll be speaking about compilers, testing, and machine learning at a conference near you! Abstracts for all of these sessions are on my Presentations page.

  • Programs that Write Programs: How Compilers Work (Dog Food Conference and CodeMash)
  • What Testing Can Never Do and How to Do It Anyway (Dog Food Conference)
  • Machine Learning with Azure ML (CloudDevelop)

posted @ Wed, 30 Sep 2015 13:47:37 +0000 by Craig Stuntz


Delphi Corner new URL

Because of problems with my old domain and my provider, my old Delphi Corner website http://www.rvelthuis.de is now in the hands of a domain grabber who asks a horrendous amount of money for the page.

Until I get this cleared up (if at all), I parked my pages at http://www.praxis-velthuis.de/rdc/ .

You can find the articles and components and other pieces of software there.


posted @ Thu, 10 Sep 2015 20:46:44 +0000 by Rudy Velthuis


Adding a [FixedLength] Attribute in Code-First Entity Framework

In Code First Entity Framework models, you can define the length of a string field with StringLengthAttribute, but you have to write code in OnModelCreating to indicate a CHAR/NCHAR fixed length field:

public class MyEntity
{
	[Key]
	public int Id { get; set; }

	[StringLength(2)]
	public string FixedLengthColumn { get; set; }
}

public partial class MyContext : DbContext
{
	public virtual DbSet<MyEntity> MyEntities { get; set; }

	protected override void OnModelCreating(DbModelBuilder modelBuilder)
	{
		if (modelBuilder == null) throw new ArgumentNullException("modelBuilder");

		modelBuilder.Entity<MyEntity>()
			.Property(e => e.FixedLengthColumn)
			.IsFixedLength();
	}
}

I find it a bit confusing to split configuration like this, especially with a real model containing lots of fields and not this trivial example. Fortunately, you can fix it! Add these:

/// <summary>
/// Used to mark entity properties that are fixed length strings (CHAR(n) DB fields).
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field, AllowMultiple = false, Inherited = true)]
public sealed class FixedLengthAttribute : Attribute {}

public class FixedLengthAttributeConvention : PrimitivePropertyAttributeConfigurationConvention<FixedLengthAttribute>
{
	public override void Apply(ConventionPrimitivePropertyConfiguration configuration, FixedLengthAttribute attribute)
	{
		configuration.IsFixedLength();
	}
}

And change the model configuration to:

public class MyEntity
{
	[Key]
	public int Id { get; set; }

	[StringLength(2),
	 FixedLength]
	public string FixedLengthColumn { get; set; }
}

public partial class MyContext : DbContext
{
	public virtual DbSet<MyEntity> MyEntities { get; set; }

	protected override void OnModelCreating(DbModelBuilder modelBuilder)
	{
		if (modelBuilder == null) throw new ArgumentNullException("modelBuilder");

		modelBuilder.Conventions.Add<FixedLengthAttributeConvention>();
	}
}

posted @ Mon, 17 Aug 2015 18:26:27 +0000 by Craig Stuntz


How To Use Real Computer Science in Your Day Job

I’ll be speaking at Lambda Jam next week. Here’s the synopsis:

When you leave Lambda Jam and return to work, do you expect to apply what you’ve learned here to hard problems, or is there just never time or permission to venture outside of fixing “undefined is not a function” in JavaScript? Many of us do use functional languages, machine learning, proof assistants, parsing, and formal methods in our day jobs, and employment by a CS research department is not a prerequisite. As a consultant who wants to choose the most effective tool for the job and keep my customers happy in the process, I’ve developed a structured approach to finding ways to use the tools of the future (plus a few from the 70s!) in the enterprises of today. I’ll share that with you and examine research into the use of formal methods in other companies. I hope you will leave the talk excited about your job!

For Columbus friends who can’t make it to Chicago, I’ll be rehearsing the presentation this coming Saturday at 3.


posted @ Thu, 09 Jul 2015 19:20:54 +0000 by Craig Stuntz


