Keith Dahlby started a discussion with his post “Is Functional Abstraction Too Clever?” Read Keith’s post for the background, but I would agree with all the comments I’ve read so far and I say the functional approach is a good solution to the problem.
Keith’s post did get me thinking of things that could go wrong in the hands of a developer who doesn’t understand how in-memory LINQ is working.
As an example, let’s use Keith’s Values extension method …
public static IEnumerable<int> Values( this Random random, int minValue, int maxValue) { while (true) yield return random.Next(minValue, maxValue); }
… and let’s say a developer is using the above to satisfy the following requirements.
What’s wrong with the following code?
var values = new Random().Values(1, 100) .Distinct() .OrderBy(value => value) .Take(10) .ToArray();
Do you think it’s a subtle problem?
What’s an easy fix?
Comments
This is a halting problem.
At some point, the internal list Distinct uses is going to contain every number between 1 and 100, and all of those were relayed to OrderBy. But then OrderBy requests the next integer in the enumeration. Try as it might, the Random routine is not going to generate an integer that satisfies the Distinct clause, causing an infinite loop.
And even my solution isn't flawless, because it takes a nondeterministic amount of time to evaluate. The random number generation enumeration could keep returning the same nondistinct numbers for an indefinite amount of time. It would probably even pass simple unit tests and cause random and hard to diagnose slowdowns in production. And if the piece of code is ever parametrized and the Take value is larger than the integer range, we would be right back where we started.
For small random integer ranges, it would be better to generate a list that contains all values in the range. Then you pick a number at a random spot in the list, remove it from the list and return it. At no point would more than ten random numbers be generated.
I don't think there's a simple fix, because no matter how many numbers you take, there's always the possibility that you don't have ten distinct.
Practically, you could keep taking numbers until you have ten distinct and this should complete within milliseconds, but theoretically it may never complete.
I guess you could ask for the first 10 numbers, from a random shuffle of numbers 1..100.
var values = new Random()
.Values(1, 100)
.Distinct()
.Take(10)
.OrderBy(value => value)
.ToArray();
should do the trick.
var values = new Random()
.Values(1, 100)
.Take(20) //take enough to make 10 distict numbers
.Distinct()
.Take(10)
.OrderBy(value => value)
.ToArray();
this is of course not entirely correct as few than 10 numbers could result. a correct solution would be to implement a new extension TakeDistinct that takes until N distinct numbers have arrived.
The solution is to make things pure, possibly by using IEnumerable.Let():
community.bartdesmet.net/...
(Yes, that should make your brain hurt.)
The result:
var values = new Random()
.Values(1, 100)
.Let(src => src.Take(20)
.Distinct()
.OrderBy(value => value))
.Take(10)
.ToArray();
You guys have raised some other interesting points, though, and yes @Jonathon - anything pointing to Bart's blog makes my head hurt. :)
One implementation of this "shuffling" problem is:
var randGen = new Random();
var source = Enumerable.Range( 1, 100 ).ToList();
var randList = Enumerable.Repeat( 1, 10 )
.Select( i =>
{
var index = randGen.Next( 101 - i );
var value = source[index];
source.RemoveAt( index );
return value;
} );
I'm trying to wrap my head around a "pure" linq solution, but I'm just not grasping it.
@tobi, There is no guarantee that Take(20), that is, 20 random numbers would yield at least 10 distinct values. It's possible (yet improbable) that all twenty values are in fact the same.
var values =
new Random().Shuffle( 1, 100 )
.Take( 10 )
.OrderBy( value => value )
.ToArray();
Where,
public static IEnumerable<int> Shuffle(
this Random random,
int minValue, int maxValue )
{
var source = Enumerable.Range( minValue, maxValue ).ToList();
while(source.Count > 1)
{
var index = random.Next( source.Count );
var value = source[index];
source.RemoveAt( index );
yield return value;
}
yield return source[0];
}
In fact:
var values =
new Random().Shuffle( 1, 100 )
.OrderBy( value => value )
.Take( 10 )
.ToArray();
would still be okay, because the Shuffle extension would never iterate more than the maxValue (100).
List<int> seenValues = new List<int>();
var values =
new Random().Values(1, 100)
.TakeWhile(i => {
if (!seenValues.Contains(i)) {
seenValues.Add(i);
}
return seenValues.Count < 10;
})
.Distinct()
.OrderBy(value => value)
.ToArray();
I'd probably combine the TakeWhile and Distinct into a TakeDistinct(int count) extension method just for cleanness, but that gives the idea.
Which, yeah - is subtle. But, debugging the hang would spot it immediately - and it'd fail unit testing immediately (well, at least it'd never return).
Actually, you can. Distinct just filters list using knowledge about previously encountered elements. It does not try to enumerate infinite sequence itself. OrderBy is of course different story.
@Jonathan
Thanks for pointer, actually examples are very approachable.
@All
I tried to chew a little more on the topic:
blog.casualdev.net/...