Click here to Skip to main content
15,868,141 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Some background - and why I avoid yield in certain library code I'm writing:

All About Iterators – Yet Another Language Geek[^]

The problem is this:

Say I need to do simple set combinations using loops. In LINQ-ish fashion I'll simply be treating a forward iterable sequence as a collection or a col, below.

so for a union of col a and col b I do a|b

to yield that would look like
Union(a,b) implemented as Figure 1, whose performance has now decreased significantly due to the internals of how this works.

it's bad enough that it has been suggested by brighter minds than mine that a feature be added to C#: yield foreach, as in Figure 2, to both increase readability and sidestep the performance hit required (because it can compose the iterator classes into one)

Right now I hand roll a lot of enumeration stuff, because LINQ performance is unacceptable for extensive use in things like GLR parsers, and part of it is this limitation of yield.

I've nearly wanted to move off of .NET altogether because F# is an odd duck and c# just is causing me a lot of work. To the point where I've considered moving back to my good old C++

This isn't about bit twiddling. I'm not looking to shave clock cycles. This is a design issue that impacts where I can even use LINQ and yield within C# specifically - it limits where I am able to practically use it, because of the nature of the projects I work on which deal in a lot of parsing, decision trees and pattern recognition, and use these with wild abandon everywhere, and nesting is the typical case.

I work around this right now by hand coding quite a bit.

Is anyone else familiar with this problem, and if so do they have any good suggestions where I can continue to use C# with or without LINQ and the modern C# paradigms without hitting performance barriers once I want to employ set based functional programming?

What I have tried:

// Figure 1: union a|b PERFORMANCE SUCKS
IEnumerable Union(IEnumerable a,IEnumerable b) {
  foreach(var x in a) yield return x;
  foreach(var y in b) yield return b;
}


// Figure 2: union a|b basically as good as
// hand rolled, but requires something C# doesn't have
IEnumerable Union(IEnumerable a,IEnumerable b) {
  yield foreach a;
  yield foreach b; 
}
Posted
Updated 17-Jan-18 15:07pm
v3
Comments
PIEBALDconsult 17-Jan-18 21:27pm    
Don't blame the framework or the language. I avoid Linq, foreach, and many other wasteful things that are designed to keep inexperienced practitioners from becoming proficient.
You can't beat a for loop -- particularly for the kind of stuff you mention.
honey the codewitch 17-Jan-18 22:25pm    
this isn't about blaming anything.

it's about not writing enumerator classes by hand anymore if there's a better way to avoid it.

consider me a nihilist when it comes to frameworks and blame.

i don't care who is at fault. i'm just looking for a way to keep using C# even as i basically have to hand-roll a significant subset of what linq and yield provides from scratch an awful lot more than I'd like to.
Alex Schunk 19-Jan-18 8:03am    
if you look at the IL code you can see that foreach has an overhead. If you want performance, use for(;;) because it is basically just jumps.
Everything that makes your code nice, clean and easy to read usually comes with overhead.
You have to always decide between maintainability, reusability and performance. You can't have one without impacting the other.
yield shows its power on big collections where you are not going through all of its elements.
You may find what you want in the "unsafe" world of C#.
honey the codewitch 19-Jan-18 11:00am    
I don't care about bit twiddling overhead.

Look at the article in the link. This isn't about a little bit of overhead.

It's about time complexity of O(m+n) where m is the number of items in the first sequence and n is the number of items in the second sequence. And it gets worse as you nest, the outermost call is O(m+1). The next call has O((m-1)+1), then O((m-2)+1), ... O(1+1). There are m of these calls so the running time should be O(m^2). Essentially, composing concats together like this causes O(m^2) yield returns to be executed.

(from the article, with performance graphs)
Alex Schunk 19-Jan-18 11:43am    
Well... The implementation of the yield looks like this.. This is the overhead I am talking about...
bool IEnumerator.MoveNext()
{
	bool result;
	try
	{
		switch (this.<>1__state)
		{
		case 0:
			this.<>1__state = -1;
			this.<>s__1 = this.sequence1.GetEnumerator();
			this.<>1__state = -3;
			break;
		case 1:
			this.<>1__state = -3;
			this.<item>5__2 = default(T);
			break;
		case 2:
			this.<>1__state = -4;
			this.<item>5__4 = default(T);
			goto IL_101;
		default:
			return false;
		}
		if (this.<>s__1.MoveNext())
		{
			this.<item>5__2 = this.<>s__1.Current;
			this.<>2__current = this.<item>5__2;
			this.<>1__state = 1;
			return true;
		}
		this.<>m__Finally1();
		this.<>s__1 = null;
		this.<>s__3 = this.sequence2.GetEnumerator();
		this.<>1__state = -4;
		IL_101:
		if (!this.<>s__3.MoveNext())
		{
			this.<>m__Finally2();
			this.<>s__3 = null;
			result = false;
		}
		else
		{
			this.<item>5__4 = this.<>s__3.Current;
			this.<>2__current = this.<item>5__4;
			this.<>1__state = 2;
			result = true;
		}
	}
	catch
	{
		this.System.IDisposable.Dispose();
		throw;
	}
	return result;
}


But from looking at this code I can't really explain where the O(m²) come from.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900