What's Wrong With This Code? (#8)

Tuesday, October 24, 2006

Joe Developer is working with a simple struct:

struct Point
{

public int x;

public int y;
}

Joe's tech lead asked him to write a method that will return an array of 10,000 initialized points. Joe wrote following code.

Point[] CreatePoints()
{

Point[] points = new Point[10000];

for (int i = 0; i < points.Length; i++)
{
points[i] =
new Point();
}

return points;
}

The code doesn't create any runtime errors, but Joe is worried because his tech lead looked at the code and frowned. What could provoke such a reaction?

Hint: Joe's lead is a performance nut.

Alan Dean Tuesday, October 24, 2006
The iteration is unnecessary - structs are value types.

The declaration:

Point[] points = new Point[10000];

will call the default ctor.
Anthony Mills Tuesday, October 24, 2006
Given that he's working with structs, the memory is zeroed anyway and "return new Point[10000];" is sufficient?
Geoff Appleby Tuesday, October 24, 2006
Ha! That's a simple one :)

Because Point is a struct, it's a valuetype, which means it ALWAYS exist, it doesn't need to be new'ed.

As a result, all he needed was:

Point[] points = new Point[10000];
return points;

Jeff Tuesday, October 24, 2006
Unless I'm missing a shorter way, each element of the array must be created via "new" so the structure is valid. This means we can't escape touching each index of the array. It feels like a naive answer to me, but just unroll the loop. Perhaps mod the length of the array by 10, allocate ten points per loop iteration and handle any extra indices after the loop (incase the length isn't evenly divisible by 10).
Eber Irigoyen Tuesday, October 24, 2006
Joe's lead was pissed because Joe didn't initialize the structures (instances were created but not initialized)
Haacked Tuesday, October 24, 2006
Joe should have done this instead, since it accomplishes the same thing. Instead he creates 10,000 points twice.

Point[] CreatePoints()
{
return new Point[10000];
}
stej Tuesday, October 24, 2006
Hi, is the problem in "return points;"? I think the whole array will be copied (twice?)since the Point[] is array of structs.
Phil Weber Tuesday, October 24, 2006
The loop is superfluous. Structs are value types, and as such do not need to be initialized with 'new'.
Caffenero Tuesday, October 24, 2006
Has it something to do with the fact that Structs and arrays are passed by value? :)

So, maybe:

void CreatePoints(out Point[] result)
{
result = new Point[10000];

for (int i = 0; i < points.Length; i++)
{
result[i] = new Point();
}
}
Wilhelm Svenselius Tuesday, October 24, 2006
Perhaps calling 'points.Length' on each iteration is a bit of a waste, since we know the size from the start. We could just put '10000' as the loop limit or put it in a separate const int and use it for both the array creation and loop limiter.
I would have said that in this case initialising the structs in the for loop is redundant.

Eric Tuesday, October 24, 2006
... Because those 10,000 comments will be physically copied in memory when returned, instead of referenced?
Laurent Tuesday, October 24, 2006
Because Point is a structure, considered as a value type, an instance exists as soon it is declared, so the 1-10000 loop is unusefull ?
JP Tuesday, October 24, 2006
When the Array is created, it is already initialized with a point in each slot, with x=0 and y=0 so the loop has no use.
Although, it takes 10 000 000 points in the array to waist 125 ms on my system.
fromaustria Tuesday, October 24, 2006
just a wild guess:
array of points gets created on the heap.
new Point() creates a value type on the stack.
points[i]=... copies the Point from the stack to the heap.
better: return new Point[10000]; // done.

Duckboy Tuesday, October 24, 2006
Dude, you're allocating and initializing all your points twice!

Point is a value type, so the new Point[10000] will actually allocate them all and set their x and y fields to 0. Then, the for loop allocates and initialializes them again, unnecessarily.

So, you're wasting valuable CPU time, causing the GC to have a minor anurism and ultimately using more electricity, therefore hurting the environment.

No sir, I don't like it!
-- Mr. Horse (from Ren & Stimpy)
James Newton-King Tuesday, October 24, 2006
The Points are automatically initialized when the array is created. The loop is unnecessary.
Alexey Tuesday, October 24, 2006
The only thing I can think of is calling points.Length 10000 times, but won't the compiler inline it? Oh, Point is a struct, not a class, so we don't need to call the constructor 10000 times.
fireping Tuesday, October 24, 2006
Property value points.Length is calculated every iteration while you know exact array length.
jayson knight Tuesday, October 24, 2006
Point[] points should be initialized to 10,000 outside the scope of the CreatePoints method, i.e. Point[] CreatePoints(ref Points[])
Martin Tuesday, October 24, 2006
Value types (structs) are 'auto allocated'.
Andrew Tuesday, October 24, 2006
I'd guess it's to do with the use of a for loop instead of do until
Saamorim Tuesday, October 24, 2006
Well, it simple.

Why use the for?
The simple fact that the Joe’s Developer as created the array of Point structs, doesn't required him to create each of the Points. This is because the compiler already allocates the necessary space to accommodate the structs. This, however, didn't work if the Point object was a class. In that case it was necessary to create each of the point objects.
This is basically the biggest difference between a class and a struct?

Am I correct?

Regards
Mitch Wheat Tuesday, October 24, 2006
Because Point is a defined as a struct it is a value type. Therefore, the line

Point[] points = new Point[10000];

does not just allocate space for an array of pointers to points, it actually allocates the space for them. The loop and subsequent allocations are unnecessary.
Wesley Tuesday, October 24, 2006
Theres no need to construct a new point in every loop.

Since the point is a struct (=value type) Joe could create one new Point() before the start of the loop and asign it to every point in the array. Since it's a value type the Point will be copied to the array.

Cheers,
Wes
Wesley Tuesday, October 24, 2006
Second thought: There's no need for the loop at all. Structs(value types) are initialized at the moment they're created.

Cheers,
Wes
Skup Tuesday, October 24, 2006
Arrays are automaticaly filled whith 0's when initialized. Calling new Point() on each element of the array just fill it again with 0's which is useless.

The developper did that because when initializing a struct on the stack, it's not automatically filled with 0's (contrary to reference types), so you have to call at least the parameterless constructor to initialize it.
Christiaan Tuesday, October 24, 2006
Well my gueuss is that Joe doens't need to initialize 10000 points as Point is a value type and must not be initialized.
Sahil Malik Tuesday, October 24, 2006
Joe should be worried because his lead cares more about performance than anything else.

:)
Scott Tuesday, October 24, 2006
I apologize that I have to moderate comments. There were lots of good comments overnight but they all appear at once- in the morning when I catch up.

A lot of people pointed out that the for loop isn't needed. Since Point is a struct (a value type), creating the array is enough to give us 10,000 zero-initialized points.

The array is created on the heap, so it is ok to return a reference to the calling method.
Geoff Appleby Tuesday, October 24, 2006
Yeah, what's with the moderation dude? Maybe you need Gaptcha! :)
scott Tuesday, October 24, 2006
Yeah Geoff - I need something. If I don't moderate this place would be filled with spam. I did have a captcha control installed previously but got some negative comments about it.

Perhaps Gaptcha is better than what I was using. I'll have to give it a look.
Geoff Appleby Wednesday, October 25, 2006
Well, the source is available for you now, although it looks like you're running CS 1.1 (i'm guessing) and I wrote it as a 2.1 addin for .net 2. Is an upgrade worth it? (Hell yeah! :)
Tin Bui Tuesday, November 14, 2006
omg,
Guys, look at the hint from the author. Regardless of whether or not you want to do a loop for initialization, just say we want to do the loop, then 10,000 would make a performance nut really goes nut.
However, against the author's intention, I would not be thinking about making the looping any faster, but to asynchronously do the loop on another thread and go do something else.

Hi Geoff, how are you?
AJ Monday, November 20, 2006
The (entry level example) author wants an array of 10,000 points, 2 integers each. The best way is to malloc the block, (initialized to 0) and use a pointer to a struct of type point to access the array.

It does however, make sense, especially on complex projects, to make the first write of the code as clear as possible (as this is), with little regard to performance, and then increase performance as needed.

Code "sevicablility" is a paramount issue to support people, however, lack of clarity as we all know, adds to R&D job security.

Performance, unfortunately, often goes hand-in-hand with lack of clarity.
Gaurav Wednesday, December 13, 2006
I think calling points.Length 10000 times is not a good idea and it will have a performance hit. Rather than code should be something like this.

Point[] CreatePoints()
{
Point[] points = new Point[10000];
int length = points.Length;

for (int i = 0; i < length; i++)
{
points[i] = new Point();
}

return points;
}