More String Comparisons

Tuesday, June 7, 2005

Just out of college, I used to spend half my time writing Windows UI code in C++, and half my time writing firmware for 8 bit Intel and Hitachi CPUs using C and assembly language. The firmware drove some infrared LEDs and took readings from an ADC to determine the percentage of protein in a wheat sample. “How exciting”, I can hear you say.

The C compiler for the 8 bit Hitachi chip was terrible, but fortunately the compiler generated files of assembly language instructions instead of binaries. The assembly files would then pass through an assembler to produce the final firmware bits. When I hit a spot that was slow, I could take the assembly files and begin to hand tweak the code. The C compiler generated hideous instruction sequences, particularly when it came to using the floating point libraries, so sometimes it was easy to get 5x to 10x performance increases with hand optimizations – not something you’ll do in a day with today’s mainstream compilers.

Ah, well. Different place, different time, different life.

The reason I bring up this background is that I still like to look at assembly code now and then. Particularly when I see a post like Geoff’s – it just makes me want to see what is happening after the JIT optimizations kick in.

I asked Geoff for the code, compiled it, ran it, and attached WinDbg. I wanted to see what the JIT was producing for the following C# and VB.NET methods. Note: both methods implement ITester.StringTest.

public void StringTest()
{
    string s1 = "foo";

    string s2 = "bar";

    bool bRet = s1.Equals(s2);
}

Public Sub StringTest() Implements ClassLibrary3.ITester.StringTest

    Dim s1 As String = "foo"

    Dim s2 As String = "bar"

    Dim bRet As Boolean = s1.Equals(s2)

End Sub

To view a disassembly with WinDbg:

0:005> .load E:\WIN2003\Microsoft.NET\Framework\v1.1.4322\sos.dll

0:005> !name2ee ClassLibrary2.dll ClassLibrary2.Class1.StringTest


MethodDesc: 927308

Name: [DEFAULT] [hasThis] Void ClassLibrary2.Class1.StringTest()

0:005> !name2ee ClassLibrary1.dll ClassLibrary1.Class1.StringTest


MethodDesc: 927280

Name: [DEFAULT] [hasThis] Void ClassLibrary1.Class1.StringTest()

name2ee can get us the MethodDesc addresses given a module and qualified method name. Given a MethodDesc address we can ask for a disassembly. Here is the VB.NET method. I’ve modified the output slightly to make it easier on the eyes.

0:005> !u 927280

Normal JIT generated code

[DEFAULT] [hasThis] Void ClassLibrary1.Class1.StringTest()

Begin 00ce14c0, size 14

1    mov    ecx,[0206617c] ("foo")

2    mov    edx,[02066180] ("bar")

3    cmp    [ecx],ecx

4    call    mscorwks!COMString::EqualsString (791ef42d)

5    ret

The VB.NET version is short and sweet. The code moves string references into the ecx and edx registers. The cmp instruction is a compare instruction, but nothing ever looks at the result of the comparison. What this instruction really does is check that the ecx register does not contain a null pointer by dereferencing the value held in the register (the brackets indicate an indirect addressing mode). If you look at the IL listing Geoff shows in his post, the method call s1.Equals produces a callvirt IL instruction, and callvirt guarantees the “this” / “Me” reference will not be null. If s1 were null, the CPU would trap this instruction and ultimately force a NullReferenceException to bubble out of the CLR. Once the check passes, the Equals method gets invoked.
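To see the effect of that null check from the managed side, here is a minimal sketch. The NullCheckDemo class and its ThrowsOnEquals helper are names I made up for illustration; they are not part of Geoff's code. The sketch only shows the observable behavior (an exception when the receiver is null), not how any particular JIT implements the check.

```csharp
using System;

public static class NullCheckDemo
{
    // Hypothetical helper: reports whether calling Equals on s1
    // raises a NullReferenceException.
    public static bool ThrowsOnEquals(string s1, string s2)
    {
        try
        {
            // An instance call on s1; with a null receiver the runtime
            // must fail before String.Equals ever runs.
            s1.Equals(s2);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnEquals("foo", "bar")); // prints False
        Console.WriteLine(ThrowsOnEquals(null, "bar"));  // prints True
    }
}
```

The first call passes the check and simply compares the strings. The second never reaches Equals.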

For some reason, the C# compiler and JIT did not work together to produce code quite as good as the VB.NET version. Mind you, we are talking about nanoseconds here, so as Geoff pointed out, it is nothing to get too excited about. Just makes one curious, is all.

0:005> !u 927308

Normal JIT generated code

[DEFAULT] [hasThis] Void ClassLibrary2.Class1.StringTest()

Begin 00ce1500, size 1d

1    push    esi

2    mov    esi,[0206617c] ("foo")

3    mov    edx,[02066180] ("bar")

4    mov    ecx,esi

5    cmp    [ecx],ecx

6    call    mscorwks!COMString::EqualsString (791ef42d)

7    and    eax,0xff

8    pop    esi

9    ret

For some reason the instructions for the C# version bounce the first string reference from the esi register to the ecx register, which forces a push and then a pop to preserve the value in esi. (The value in esi was a reference to the Class1 instance; you can find this out by putting a breakpoint on the first instruction (bp 00ce1500) and dumping stack objects (!DumpStackObjects).) The C# version also seems obsessed with the return value of the Equals method, even though we never make use of the value! Return values generally appear in the eax register, and I’m assuming the and operation on line 7 is masking off the high 24 bits so the one byte bool result ends up zero-extended into a clean 32 bit value. Weird.
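The masking on line 7 is easy to model in C#. This is only a sketch of what an and with 0xff does to a 32 bit value. The MaskDemo class, the MaskLowByte helper, and the sample register contents are all invented for illustration.

```csharp
using System;

public static class MaskDemo
{
    // Models "and eax,0xff": keep the low 8 bits, clear the upper 24.
    public static uint MaskLowByte(uint eax)
    {
        return eax & 0xFF;
    }

    public static void Main()
    {
        // Hypothetical register contents: a true result (1) in the low
        // byte with leftover bits sitting above it.
        uint eax = 0xDEADBE01;
        Console.WriteLine("0x{0:X8}", MaskLowByte(eax)); // prints 0x00000001
    }
}
```

Whatever was in the upper bytes is gone after the mask, and only the low byte survives.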

There you have it. It’s VB.NET by 9 bytes and a few clock ticks.
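The 9 bytes come from the sizes WinDbg reported, which are in hex: size 14 for the VB.NET method and size 1d for the C# method. A quick sketch of the arithmetic (SizeDemo and SizeDifference are made-up names):

```csharp
using System;

public static class SizeDemo
{
    // Parses two hex sizes, as printed on the "Begin ..., size ..." lines,
    // and returns the difference in bytes.
    public static int SizeDifference(string largerHex, string smallerHex)
    {
        return Convert.ToInt32(largerHex, 16) - Convert.ToInt32(smallerHex, 16);
    }

    public static void Main()
    {
        // 0x1d = 29 bytes (C#), 0x14 = 20 bytes (VB.NET)
        Console.WriteLine(SizeDifference("1d", "14")); // prints 9
    }
}
```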

“How exciting”, I can hear you say again.

Geoff Appleby Tuesday, June 7, 2005
Well, I definitely am saying it :)
Stuart Tuesday, June 7, 2005
+1 :)
I don't care so much about VB vs. C#, but I love the assembly overview.
I started programming professionally in the late '90s using scripting languages and VB6, and I moved to .NET when it first came out. In other words, I always had these magical engines and runtimes underlying my code.
Over the past two or three years, though, I've been descending ever further in my spare time, moving from C# to C++ to C and finally to assembly. Some people call this regressing; I call it progressing retrochronologically. ;)
So anyway, I'm currently in the process of going through K&R C and this NASM book [1] and it's actually *fun*.
Yeah, so... I liked this post!
Scott Tuesday, June 7, 2005
That's a good looking book, thanks Stuart.
Another one I've liked that is free is "The Art Of Assembly Language Programming". There is a version here:
Andy Wednesday, June 8, 2005
Scott, I have the PDFs of that book. I got them here. This zip file should have the whole book:
The actual tree killing copy can be bought here:
I love that book. Could you tell?
Milan Negovan Friday, June 10, 2005
Aaaaah, good ol' assembler.
Anyway, the use of esi here *is* strange, although returning the result in ax is "by the book". It's also a little strange that the unused return value wasn't "optimized away" like you mentioned.
What gives me shivers is seeing "COMString::". I plead ignorance here, but seeing COM involved in such a rudimentary operation as string comparison is troubling.
K.Scott Allen OdeToCode by K. Scott Allen