Open Source LINQ providers

by Miguel de Icaza

Microsoft's upcoming .NET 3.5 and C# 3.0 have support for Language Integrated Query (LINQ). This gives C# and VB users a SQL-like syntax right in the language to query data, including databases, directly. For example, this is now valid C#:

public void Linq1() {
    int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

    var lowNums =
        from n in numbers
        where n < 5
        select n;

    Console.WriteLine("Numbers < 5:");
    foreach (var x in lowNums) {
        Console.WriteLine(x);
    }
}

For more samples, check the 101 LINQ Samples.

There are a number of LINQ providers for different kinds of sources. For example, there is a built-in provider for in-memory data such as arrays, collections and anything else that implements IEnumerable.

A more useful sample comes from Don Box. This is a program to extract the top ten most-read articles from an Apache log file:

        var regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)");

        var grouped = from line in ReadLinesFromFile("logfile.txt")
                      let match = regex.Match(line)
                      where match.Success
                      let url = match.Value
                      group url by url;

        var ordered = from g in grouped
                      let count = g.Count()
                      orderby count descending
                      select new { Count = count, Key = g.Key };

        foreach (var item in ordered.Take(10))
            Console.WriteLine("{0}: {1}", item.Count, item.Key);
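
The sample above calls a ReadLinesFromFile helper that is not shown. A minimal sketch of what it might look like, assuming it lazily yields one line at a time (the LogReport class name and the stand-in log lines are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LogReport
{
    // Hypothetical helper: an iterator that yields lines lazily, so the
    // query can stream through a large log without loading it all at once.
    public static IEnumerable<string> ReadLinesFromFile(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    public static void Main()
    {
        // Write a tiny stand-in log so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] {
            "GET /ongoing/When/200x/2007/10/01/First HTTP/1.1",
            "GET /ongoing/When/200x/2007/10/02/Second HTTP/1.1",
            "GET /ongoing/When/200x/2007/10/01/First HTTP/1.1",
            "GET /favicon.ico HTTP/1.1"
        });

        var regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)");

        // Count the lines that look like article requests.
        int hits = ReadLinesFromFile(path).Count(l => regex.IsMatch(l));
        Console.WriteLine("Article requests: {0}", hits);

        File.Delete(path);
    }
}
```

Because the helper is an iterator, nothing is read from disk until the query is enumerated, which is what lets the grouping query be written as if the file were an ordinary collection.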

There is another provider for creating and consuming XML data: LINQ to XML. The LINQ distribution comes with a nice sample in which many of the examples from the XQuery specification have been rewritten in C# with LINQ, with code that is roughly the same size (sadly, I cannot find it online to point to).
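
To give a flavor of what LINQ to XML code looks like, here is a small sketch. It is not taken from the XQuery sample mentioned above; the element names, attribute names and titles are all invented for illustration, and it only uses the XElement and XAttribute types from System.Xml.Linq:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlSketch
{
    // Creating XML: build a small document in memory (all names made up).
    public static XElement BuildCatalog()
    {
        return new XElement("books",
            new XElement("book", new XAttribute("year", 1994),
                new XElement("title", "Gamma")),
            new XElement("book", new XAttribute("year", 2000),
                new XElement("title", "Beta")),
            new XElement("book", new XAttribute("year", 2005),
                new XElement("title", "Alpha")));
    }

    // Consuming XML: titles of books published after the given year,
    // sorted alphabetically.
    public static string[] TitlesAfter(XElement catalog, int year)
    {
        var titles = from b in catalog.Elements("book")
                     where (int)b.Attribute("year") > year
                     orderby (string)b.Element("title")
                     select (string)b.Element("title");
        return titles.ToArray();
    }

    public static void Main()
    {
        foreach (var title in TitlesAfter(BuildCatalog(), 1999))
            Console.WriteLine(title);
    }
}
```

The query has the same from/where/orderby/select shape as the in-memory samples; only the source (the elements of an XML tree) is different.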

Luckily, we have implementations for both of these in Mono.

And finally there are providers for databases. Microsoft ships one that integrates natively with Microsoft SQL Server.

Unlike the XML and in-memory providers, the SQL providers are more complicated, as they need to turn high-level operations into optimized SQL statements. And unlike many object-to-database mapping systems, which produce very verbose SQL statements, LINQ is designed to generate a very efficient representation.
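
To make "turning high-level operations into SQL" a bit more concrete: a database provider receives the query as an expression tree rather than as compiled code, and walks that tree to produce SQL text. The following is only a toy sketch of that idea, not how DbLinq or Microsoft's provider actually work. The ToySqlTranslator and Customer names are hypothetical, it handles just a handful of node types, and it inlines constants where a real provider would use parameters:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Customer
{
    public int Age { get; set; }
    public string City { get; set; }
}

public static class ToySqlTranslator
{
    // Turns a small subset of predicates into a SQL-like WHERE clause.
    public static string ToWhereClause<T>(Expression<Func<T, bool>> predicate)
    {
        return Visit(predicate.Body);
    }

    static string Visit(Expression e)
    {
        switch (e.NodeType)
        {
            case ExpressionType.AndAlso:     return Binary(e, "AND");
            case ExpressionType.OrElse:      return Binary(e, "OR");
            case ExpressionType.LessThan:    return Binary(e, "<");
            case ExpressionType.GreaterThan: return Binary(e, ">");
            case ExpressionType.Equal:       return Binary(e, "=");
            case ExpressionType.MemberAccess:
                // Assumes the member is a property of the row, e.g. c.Age.
                return ((MemberExpression)e).Member.Name;
            case ExpressionType.Constant:
                object value = ((ConstantExpression)e).Value;
                // Toy quoting only; no escaping is attempted here.
                return value is string ? "'" + value + "'" : Convert.ToString(value);
            default:
                throw new NotSupportedException(e.NodeType.ToString());
        }
    }

    static string Binary(Expression e, string op)
    {
        var b = (BinaryExpression)e;
        return "(" + Visit(b.Left) + " " + op + " " + Visit(b.Right) + ")";
    }

    public static void Main()
    {
        string where = ToWhereClause<Customer>(c => c.Age < 30 && c.City == "Boston");
        Console.WriteLine("WHERE " + where);
        // prints: WHERE ((Age < 30) AND (City = 'Boston'))
    }
}
```

The lambda passed to ToWhereClause is never executed. The compiler hands it over as data, which is what gives a provider the chance to emit one compact SQL statement for the whole query.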

DbLinq is an open source project to create a LINQ provider for other databases. The project is led by George Moudry and so far has providers for PostgreSQL, Oracle and MySQL.

George keeps a blog where you can track the development of DbLinq.

Thanks to Bryan for pointing me to this fantastic piece of code.

Mono users on Linux will now be able to use LINQ with open source databases from C# (in addition to our in-memory and XML providers).

Currently we are still missing some support in our compiler and our class libraries for this to work in Mono, but this will be a great test case and help us deliver this sooner to developers.

Update: A nice blog entry talks about Parallel LINQ, a version of LINQ that can be used to parallelize operations across multiple CPUs:

	IEnumerable<T> data = ...;

	// Regular code:
	var q = data.Where(x => p(x)).
		OrderBy(x => k(x)).Select(x => f(x));
	foreach (var e in q) a(e);

	// Parallelized version, add the "AsParallel" method:
	var q = data.AsParallel().Where(x => p(x)).
		OrderBy(x => k(x)).Select(x => f(x));
	foreach (var e in q) a(e);

See more details about the above in the Running Queries On Multi-Core Processors article.
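
For something that can be compiled and run, here is a small sketch of the same pattern. It assumes the AsParallel extension method as it exists in System.Linq in later .NET releases, so the preview bits may differ in detail. The IsPrime predicate and the ParallelSketch class are made up for the example:

```csharp
using System;
using System.Linq;

public static class ParallelSketch
{
    // Stand-in for an expensive predicate, the p(x) of the snippet above.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; (long)i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // Runs the same Where/OrderBy/Select pipeline, sequentially or in parallel.
    public static int[] Run(bool parallel)
    {
        var data = Enumerable.Range(1, 100000);

        if (parallel)
            return data.AsParallel()
                       .Where(x => IsPrime(x))
                       .OrderBy(x => x)
                       .Select(x => x * 2)
                       .ToArray();

        return data.Where(x => IsPrime(x))
                   .OrderBy(x => x)
                   .Select(x => x * 2)
                   .ToArray();
    }

    public static void Main()
    {
        int[] sequential = Run(false);
        int[] parallel = Run(true);

        // Both versions should produce the same ordered results.
        Console.WriteLine(sequential.SequenceEqual(parallel));
    }
}
```

The only change between the two branches is the AsParallel call; the rest of the query is written exactly the same way.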

Update 2: I am not up to speed on the databases vs. object databases wars, but I am told that there is also a LINQ binding for NHibernate, and that a sample is available.

Posted on 24 Oct 2007