Building an F# powered indexing system (part 2)

This is the second in a series of posts documenting a from-scratch indexing application I'm building in F#.  The first post in the series gives a good overview of the application. I've gone ahead and put the source up as well, so you can skip straight to that if you think I'm a jackass.

Now that I have a basic set of types for defining what goes into an index (a schema), it's time to have a look at types for the actual data going in and out.  I'll call these documents, where the document type looks like this:

type DocField = string * string
type Document = { Schema : IndexSchema; Fields : DocField list }

Documents are associated with our IndexSchema type, and contain a list of DocFields.  A document field is a simple two-string tuple: the first string is a field name, the second is some text to cram into the index.  Given the sample schema from the previous post, a basic document declaration might look like this:

let sampleDoc = {
    Schema = defaultSchema;
    Fields =
        [
            ("ID", "ASDF");
            ("PublishDate", DateTime.Now.ToString());
            ("Title", "This is a title");
            ("Excerpt", "My title was non-obvious");
            ("Body", "Lorem ipsum ad infinitum");
        ] }

Pretty basic, really, but it's nice and easy to read.

At this point, I need to do a little groundwork for the Lucene.net library.  So, I'll add a reference!  Right clicking on an F# project in Visual Studio brings up an options dialog with a field for additional compiler flags.  I wanted to reference a DLL in a "Vendor" directory, so I put in something like this: -R "..\Vendor\Lucene.net 2.1\Lucene.Net.dll" (with "all configurations" selected up top, it wouldn't make much sense to only reference it for debug builds).  This is admittedly a warty way of adding a reference to a project, but I expect we'll get a real "add reference" dialog when the F# CTP is released later this year.  On the bright side, I didn't have to wait 8 minutes for the dialog to show up.

There are a couple of different steps to turning the Document type into something Lucene can handle.  Lucene lives in an imperative world, so the process of creating an indexed document goes something like: create a Lucene document, append fields (with options) one by one.  For this particular library, indexing instructions for each field live in the Schema, and I'll grab those as necessary when I'm preparing Lucene compatible fields from my Document.

Rather than writing any real logic to search my Schema's field list, I decided to change it to store a Map of fields (where a Map is a HashTable, Dictionary, or whatever else you want to call it).  This turned out to be incredibly easy since there's a Map.of_list function that takes a list of tuples (which I was using before), and breaks each one up into a key/value pair:

type IndexSchema = {
    Name : string;
    Version : float;
    Fields : Map<string, FieldOption list> }
 
let defaultSchema =
    let options = [ Indexed; Tokenized; Type(String) ]
    {   Name = "Mubble.Content";
        Version = 1.0;
        Fields = Map.of_list
                  [ ("ID", [ Unique; Indexed; Stored; Required; ] );
                    ( "PublishDate", [ Indexed; Type(Date Minute) ] );
                    ( "Title", options );
                    ( "Excerpt", options );
                    ( "Body", MultiValue :: options ) ]
    }

It's pretty cool that my defaultSchema declaration only required a minor modification to handle the new Map.
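The lookup itself isn't shown in this post, so here is a minimal sketch of what pulling a field's options out of the Map might look like.  The lookupOptions name and the string-based options are stand-ins for illustration only, and the sketch uses Map.ofList, the name current F# releases use for Map.of_list.

```fsharp
// Stand-in schema fields: option names are plain strings here, not FieldOption values.
let fields =
    Map.ofList
        [ ("ID", [ "Unique"; "Indexed"; "Stored"; "Required" ])
          ("Title", [ "Indexed"; "Tokenized" ]) ]

// Hypothetical lookup: unknown field names fall back to an empty option list.
let lookupOptions name =
    match Map.tryFind name fields with
    | Some options -> options
    | None -> []

printfn "%A" (lookupOptions "Title")
printfn "%A" (lookupOptions "Missing")
```

Falling back to an empty list is just one choice; raising an error for unknown fields would be equally reasonable.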

Each field in the schema has a number of associated options that need to be translated to their Lucene equivalents.  The Lucene document field constructor (at least, one of the overloads) takes four parameters: name, value, store and index.  I wrote a small function that takes a DocField and gives back a corresponding Lucene field.  My first hack looks something like this:

open Lucene.Net
let buildField (schema : IndexSchema) (field : DocField) =
    let name = ref (fst field)
    let value = ref (snd field)
    let store = ref Documents.Field.Store.NO
    let index = ref Documents.Field.Index.TOKENIZED
 
    let options = getSchemaField schema !name
 
    let setOption o =
        match o with
        | FieldOption.Stored -> store := Documents.Field.Store.YES
        | FieldOption.Compressed -> store := Documents.Field.Store.COMPRESS
        | FieldOption.Indexed -> index := Documents.Field.Index.UN_TOKENIZED
        | FieldOption.Tokenized -> index := Documents.Field.Index.TOKENIZED
        | FieldOption.Type(t) -> value := format t !value
        // The following aren't relevant here
        | FieldOption.Required | FieldOption.Unique
        | FieldOption.MultiValue -> ()
 
    options |> List.iter setOption
 
    let lfield = new Documents.Field(!name, !value, !store, !index)
    lfield

There are a couple of new bits of F# in that function, starting with the ref function calls.  let name = ref (fst field) says "bind a new reference cell to name using the first value in the tuple field."  The reference cell stuff in the F# library is defined like this:

type 'a ref = { mutable contents: 'a }
let (!) r = r.contents
let (:=) r v = r.contents <- v
let ref v = { contents = v }
 

The first line creates the ref type, with a mutable contents label.  "Mutable" means exactly that... the value in contents can change within the scope of ref.  The next two lines are some helper operators, which I've used in my buildField function above.  ! is a bit of a non-intuitive choice, but it returns the contents of the passed in reference.  The assignment operator := sets the value of contents for the passed in reference.  The last line is a helper function for creating reference cell instances.  I could actually create an instance directly if I were so inclined: let name = { contents = "something" }, but the ref function call is "the standard".
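Here's a tiny, self-contained sketch of those operators in action.  The total name and the list are made up for the example, and it assumes a compiler that still accepts ! and := (recent F# releases steer you toward the .Value property instead).

```fsharp
// Hypothetical example: sum a list imperatively using a reference cell.
let total = ref 0

// Read the cell with !, write it with :=.
[ 1; 2; 3 ] |> List.iter (fun n -> total := !total + n)

printfn "%d" !total
```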

My buildField function uses a few of these reference cells which will ultimately be passed into the Lucene field constructor, based on the options getSchemaField gives us.  I defined a local function named setOption that sets the value of the appropriate reference cell (based on the passed in option). The setOption function is just a basic pattern match over a discriminated union, though the FieldOption.Type(t) case is somewhat special. Assuming the value of o is a FieldOption.Type, it binds the actual type to t and passes it and the raw string value to a helper format function:

let format (t : FieldType) raw =
    match t with
    | FieldType.Date(r) ->
        let d = DateTime.Parse(raw)
        DateTools.DateToString(d, (convertResolution r))
    | String -> raw

This function is pretty straightforward (since I'm only dealing with strings and dates at the moment). If the FieldType is a Date, it uses the Lucene DateTools utility to format the string according to the specified resolution. Yet another helper function, convertResolution, turns our pretty DateResolution value into a janky Lucene value:

let convertResolution r =
    match r with
    | Day -> DateTools.Resolution.DAY
    | Hour -> DateTools.Resolution.HOUR
    | Millisecond -> DateTools.Resolution.MILLISECOND
    | Minute -> DateTools.Resolution.MINUTE
    | Month -> DateTools.Resolution.MONTH
    | Second -> DateTools.Resolution.SECOND
    | Year -> DateTools.Resolution.YEAR

Pattern matching is a nice, succinct way of doing something like that, which is great!  It does, however, have another compelling ability that I've fallen in love with.  The F# compiler understands the patterns and will give an "incomplete pattern match" warning at compile time if every possible option isn't accounted for.  So if I decided to add a Microsecond option to my DateResolution type and didn't account for it in my convertResolution function, the compiler would let me know.  I've actually taken to making that an error rather than a warning using the --warn-as-error 25 compiler directive.  It's sweet.
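To make that concrete, here's a small sketch with a cut-down, hypothetical DateResolution type where I've added a Microsecond case and "forgotten" to handle it.  The describe function is invented for the example; compiling it should produce the incomplete pattern match warning (number 25) on the match expression.

```fsharp
// Cut-down stand-in for DateResolution, with a newly added Microsecond case.
type DateResolution = Day | Hour | Minute | Microsecond

// Hypothetical function that has not been updated for the new case.
let describe r =
    match r with
    | Day -> "day"
    | Hour -> "hour"
    | Minute -> "minute"
    // No Microsecond case here, so the compiler flags this match as incomplete.

printfn "%s" (describe Minute)
```

The handled cases still work at runtime; only a Microsecond value would fail, which is exactly why promoting the warning to an error is appealing.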

With the buildField function as a starting point, it's pretty easy to create a function that takes one of my Document values and converts it to its final Lucene representation:

let convert (doc : Document) =
    let ldoc = new Documents.Document()
 
    let lFields = doc.Fields |> List.map (buildField doc.Schema)
 
    lFields |> List.iter ldoc.Add
    ldoc

This is a particularly good example of what I like about this style of programming. Given the right supporting functions, the entire document conversion process is incredibly expressive. This function creates a new Lucene document to work with. It then uses the built in List.map function to create a list of fields that the Lucene document can cope with, and iterates through them, appending each one.

I actually used something called "partial function application" with the List.map function.  This basically means "I'll give the compiler a function and only one of its two arguments, the compiler will then create a new function that takes the remaining argument for me".  I could have been a bit more explicit and used a lambda (or anonymous function, if you want to call it that) like so:

let lFields = doc.Fields |> List.map (fun x -> buildField doc.Schema x)

The partial function version is a little less encumbered by syntax, and I've found myself using those wherever possible.

There are two other bits of function coolness going on here as well.  The first is the pipeline operator, |>, which works just like a pipeline from the command shell.  Everything to the left gets passed to the right side as a parameter.  I'm also passing the ldoc.Add function from the Lucene library as a first class function to the built in List.iter function.  List.iter takes a single argument function and passes each element of the list to it, so the ldoc.Add function works fine there.
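If you'd like to play with these ideas without pulling in Lucene, here's a minimal sketch.  The tag function and the words list are hypothetical stand-ins for buildField and doc.Fields.

```fsharp
// Hypothetical two-argument function standing in for buildField.
let tag (prefix : string) (text : string) = prefix + ":" + text

let words = [ "a"; "b"; "c" ]

// Partial application: `tag "x"` is a new function still waiting for `text`.
let viaPartial = words |> List.map (tag "x")

// The same thing written with an explicit lambda.
let viaLambda = words |> List.map (fun w -> tag "x" w)

// A function value handed straight to List.iter, much like ldoc.Add.
viaPartial |> List.iter (printfn "%s")
```

Both lists come out identical; the partial application version just skips naming the last argument.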

That's basically it for converting documents (with the appropriate schema information) to something we can ultimately index. It's interesting that it takes me about 8x as long to write these posts as the code, though.

As an aside, my buildField function is very much imperative. If I wanted to, I could rewrite it like this:

let buildField2 (schema : IndexSchema) (field : DocField) =
    // Walk the option list recursively, threading the accumulated
    // (name, value, store, index) tuple through every option.
    let rec getOptions l init =
        let name, value, store, index = init
        match l with
        | h :: t ->
            let next =
                match h with
                | FieldOption.Stored ->
                    (name, value, Documents.Field.Store.YES, index)
                | FieldOption.Compressed ->
                    (name, value, Documents.Field.Store.COMPRESS, index)
                | FieldOption.Indexed ->
                    (name, value, store, Documents.Field.Index.UN_TOKENIZED)
                | FieldOption.Tokenized ->
                    (name, value, store, Documents.Field.Index.TOKENIZED)
                | FieldOption.Type(t) ->
                    (name, format t value, store, index)
                // The following aren't relevant here
                | FieldOption.Required | FieldOption.Unique
                | FieldOption.MultiValue -> init
            // Carry on with the rest of the options
            getOptions t next
        | [] -> init
 
    let name, value, store, index =
        (
            fst field,
            snd field,
            Documents.Field.Store.NO,
            Documents.Field.Index.TOKENIZED
        )
            |> getOptions (getSchemaField schema (fst field))
 
    let lfield = new Documents.Field(name, value, store, index)
    lfield

I find that ugly. Given that the imperative bits are well encapsulated in a function, I'm sticking with the first version for now... although it will probably bother me that I can't come up with a pretty, functional way of doing that.
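One direction that might read better is a fold over the option list.  This is only a sketch under some loud assumptions: Store and Index are made-up stand-ins for the Lucene enums so it compiles on its own, FieldOption is trimmed down, and the Type(t) value formatting is left out entirely.

```fsharp
// Stand-in types so the sketch compiles without Lucene.NET (all hypothetical).
type Store = StoreNo | StoreYes | StoreCompress
type Index = IndexTokenized | IndexUnTokenized
type FieldOption = Stored | Compressed | Indexed | Tokenized | Required | Unique | MultiValue

// Apply a single option to the accumulated (store, index) pair.
let applyOption (store, index) option =
    match option with
    | Stored -> (StoreYes, index)
    | Compressed -> (StoreCompress, index)
    | Indexed -> (store, IndexUnTokenized)
    | Tokenized -> (store, IndexTokenized)
    | Required | Unique | MultiValue -> (store, index)

// List.fold threads the pair through every option, starting from the defaults.
let settings options =
    options |> List.fold applyOption (StoreNo, IndexTokenized)

let store, index = settings [ Unique; Indexed; Stored; Required ]
printfn "%A %A" store index
```

The fold replaces both the reference cells and the hand-written recursion; whether it counts as pretty is a matter of taste.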

The source code to this point is available on Google Code.  It's revision 2 in the Subversion repository, and there's also a zipped up snapshot.


Building an F# powered indexing system

When I first started dabbling in F#, I really struggled to understand how someone (in particular, me) would sit down and start writing an application from scratch.  Project Euler puzzles are a great way to learn syntax (and probably the best place to start), but I would have loved to see a real application's source with a sort of "here's how it was built" narrative.  So that's what I'm going to do!

I sat down this morning to start moving one of the tools I use to F#.  Doing a rewrite is partially a learning exercise, but there's quite a bit of work that I'd have to do even if I were keeping it in C# and various bits of it really lend themselves to my newfound functional abilities. The application is a utility for managing the content index for my publishing system. It's based on Lucene, and every single page on a Mubble powered site uses it to generate lists of content.

In theory, this will be a series of posts covering these steps (part two is now available):

  • Writing an indexer
  • Building queries and running them
  • Testing the application
  • Turning it into a useful .NET library (most of my code is in C#, after all)

Lucene.NET is a really good place to start for something like this, but I really want to divorce the exposed functionality from Lucene.NET itself. It's a near direct port of Java Lucene, which is good in some ways. It *is* relatively easy to find how-tos and such for the Java version that apply directly to the .NET version. It's bad because many Java idioms don't match up real well to .NET idioms, leaving me with a dirty feeling when I deal directly with it.

Lucene's API is also pretty basic, and in a server side context has a number of gotchas that can make things difficult. I need to account for concurrency, in particular, since only one writer can update any given index at a time.

Indexing applications are divided into roughly two areas: maintaining the index (indexing) and querying the index. I chose to start with the "indexing" portion of that equation, mainly because I have a working "prototype" to base it on.

My first few hours were spent building the structure of an "update" instruction. I chose to break the instruction into two areas: a loose schema defining how various fields were treated, and a document consisting of field/value combinations. The schema structure is relatively straightforward, and an F# record type seemed appropriate. A schema needs a name, a version and a collection of fields (with options):

type IndexSchema = { Name : string; Version : float; Fields : SchemaField list }

The SchemaField type needs two basic pieces of information, a label and a set of options. I don't know about you, but when I read "two pieces of information", I think tuple:

type SchemaField = string * FieldOption list

Field options map almost directly to the Lucene underneath, and are usually just a flag indicating whether something should be tokenized, stored, etc. So, basically an enum:

type FieldOption =
    Unique | Indexed | Stored | Compressed | MultiValue
    | Required | Tokenized

In addition to the various option flags listed above, we need some way of controlling how values are formatted on their way into Lucene. Dates, in particular, need some special treatment before indexing. Discriminated unions are a great way of accounting for this type of requirement, making the FieldOption declaration look something like this:

type FieldType = String | Date
 
type FieldOption =
    Unique | Indexed | Stored | Compressed | MultiValue
    | Required | Tokenized | Type of FieldType

A discriminated union also comes in handy for the FieldType type. Lucene will vary date formatting based on a precision specifier, and that information can actually be encoded as part of the date option for the type:

type DateResolution = Day | Hour | Millisecond | Minute | Month | Second | Year
 
type FieldType = String | Date of DateResolution

I was relatively impressed with the succinctness of these definitions, especially compared to my corresponding C# where I ended up solving the problem with a base FieldType and a date subtype.

At this point, I can actually create a schema. I just need a set of default options, which I can append to or replace as needed. It looks something like this:

let defaultSchema =
    let options = [ Indexed; Tokenized; Type(String) ]
    {   Name = "Mubble.Content";
        Version = 1.0;
        Fields =
            [
                ("ID", [ Unique; Indexed; Stored; Required; ] );
                ( "PublishDate", [ Indexed; Type(Date Minute) ] );
                ( "Title", options );
                ( "Excerpt", options );
                ( "Body", MultiValue :: options );
            ]
    }

Note that the compiler was able to infer that defaultSchema should have type IndexSchema based on the labels I used. If necessary, I could have told it explicitly that I wanted an IndexSchema, but I didn't have to.

It's always therapeutic to "see" something after I've written code, so at this point I worked up a small SchemaField printer function. The goal was to run through the fields defined in a particular schema, printing the name and options for each. The quick and dirty version looks like this:

let printFields (schema : IndexSchema) =
    let optionToString o =
        match o with
        | FieldOption.Type(t) ->
            match t with
            | Date(p) -> sprintf "Type=Date:%A" p
            | _ -> sprintf "Type=%A" t
        | _ -> sprintf "%A" o
 
    schema.Fields |> List.iter (fun (n,f) ->
        printfn "Field %s" n
        f |> List.iter (fun o -> printfn "\t%s" (optionToString o)))
 
defaultSchema |> printFields

My printFields function takes a schema, defines an inner optionToString function, then iterates over each field in the schema. optionToString is a decent example of pattern matching over discriminated unions. It looks to see if the value passed to it is a FieldOption.Type, assigning it to t if it is. If t is a FieldType.Date, it extracts the precision to p and prints out something like Type=Date:Minute. The underscores catch anything not previously specified, sorta like an "else" block.

On to part two!


C# vs F#: some parallel refactoring (and generalization)

So, shortly after adding the more in depth examples in my last post, I started playing around with the TryParse method in C# to see how "nice" I could make the example code:

public class User
{
    public int Age { get; set; }
    public DateTime SignupDate { get; set; }
    public Double Weight { get; set; }
 
    static string Post(string key)
    {
        return key;
    }
 
    static User Build()
    {
        var age = 0;
        var signupDate = DateTime.MinValue;
        var weight = 0.0;
 
        int.TryParse(Post("age"), out age);
        double.TryParse(Post("weight"), out weight);
        DateTime.TryParse(Post("signupDate"), out signupDate);
 
        return new User
        {
            Age = age,
            SignupDate = signupDate,
            Weight = weight
        };
    }
}

The first attempt I made involved a generic ParseOrDefault function and a corresponding TryFunc delegate:

delegate bool TryFunc<T>(string raw, out T value);
static T ParseOrDefault<T>(TryFunc<T> method, string raw)
{
    var local = default(T);
    method(raw, out local);
    return local;
}

Sadly, using that function in the most natural way possible caused type inferencing errors, so the end result of that little bit of work was this code:

var user = new User
{
    Age = ParseOrDefault<int>(int.TryParse, Post("age")),
    SignupDate = ParseOrDefault<DateTime>(DateTime.TryParse, Post("signupDate")),
    Weight = ParseOrDefault<double>(Double.TryParse, Post("weight"))
};

It's not terrible, but the redundant type specifiers really chafe. The code could be much prettier with a smarter C# compiler.

As a fun little exercise, I also spent time making the F# from the previous example something more appealing. The original F# looked like this:

type User = { Age : int; SignupDate : DateTime; Weight : Double; }
 
let post key = key
 
let u =
    {
        Age = Int32.TryParse(post "age") |> snd;
        SignupDate = DateTime.TryParse(post "signupDate") |> snd;
        Weight = Double.TryParse(post "weight") |> snd;
    }

I started down the same path, with the F# equivalent of ParseOrDefault from above:

let parseOrDefault f v = f v |> snd

parseOrDefault takes a function (named f) and a value as parameters. It pipes the result of f(v) to the built in snd function, which returns the second value in a two-value tuple.

Using that function lets me change my F# to something like this:

 
let u =
    {
        Age = post "age" |> parseOrDefault Int32.TryParse;
        SignupDate = post "signupDate" |> parseOrDefault DateTime.TryParse;
        Weight = post "weight" |> parseOrDefault Double.TryParse;
    }
 

It's actually a little bit more text than the original, but it seems a bit more straightforward to me. "Send the result of the post function to my parseOrDefault function, which should use Int32.TryParse to do its thing."
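Here's parseOrDefault on its own, as a self-contained sketch.  I'm assuming a current .NET runtime, where TryParse has picked up extra overloads, so the explicit lambdas are there only to pin down the string version; the age and bad names are invented for the example.

```fsharp
open System

let parseOrDefault f v = f v |> snd

// Explicit lambdas make it unambiguous which TryParse overload is meant.
let age = parseOrDefault (fun (s : string) -> Int32.TryParse s) "42"

// A failed parse leaves the out value at its default, which is what snd hands back.
let bad = parseOrDefault (fun (s : string) -> Int32.TryParse s) "not a number"

printfn "%d %d" age bad
```

Note that the success flag is thrown away, so a failed parse and a genuine zero are indistinguishable here.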

The level of repetition in that last bit of code annoys me, though. Fortunately for me, creating new functions in F# is such an easy task that I can do this instead:

 
let u =
    let parsePosted f key = parseOrDefault f (post key)
    {
        Age = parsePosted Int32.TryParse "age";
        SignupDate = parsePosted DateTime.TryParse "signupDate";
        Weight = parsePosted Double.TryParse "weight";
    }

Doing the equivalent in C# is just not worth it, at least in this case. A generic ParsePosted function would have to be a static member somewhere, since you can't do this in C#:

Func<TryFunc<T>,string,T> ParsePosted =
    (method, key) => ParseOrDefault<T>(method, Post(key));
var user = new User
{
    Age = ParsePosted(int.TryParse, "age"),
    SignupDate = ParsePosted(DateTime.TryParse, "signupDate"),
    Weight = ParsePosted(Double.TryParse, "weight")
};

Local, one-off functions are possible in C#, but I can't find any way to create a generic one. That really sucks at times like this when we're dealing with types that aren't anywhere in the same inheritance chain.

There's another interesting aspect to the F# version of the parseOrDefault method. F# automatically generalizes it to the equivalent of this in C#:

 
delegate bool TryFunc<T, K>(K raw, out T value);
static T ParseOrDefault<T,K>(TryFunc<T,K> method, K raw)
{
    var local = default(T);
    method(raw, out local);
    return local;
}
 

This version of the function might come in handy if I ever need the TryGetValue function of a dictionary with some-type-other-than-string as a key. The downside in C# is that it gets terribly verbose to use that kind of function, due to the limits in type inferencing:

 
return new User
{
    Age = ParseOrDefault<int,string>(int.TryParse, Post("age")),
    SignupDate = ParseOrDefault<DateTime,string>(DateTime.TryParse, Post("signupDate")),
    Weight = ParseOrDefault<Double,string>(Double.TryParse, Post("weight"))
};
 

As I get deeper into F#, I'm beginning to realize just how empowering it is for function creation and composition to be such a cheap (from a development time perspective) operation.


Tuples rock my world

Value types in the .NET framework tend to have a static TryParse method for converting a string to a typed value.  Well, most of them do, the Guid type doesn't for some reason.  The method takes a string and an output parameter for the result.

Output parameters suck and require a local variable to be passed in, resulting in a ton of useless supporting code:

int val = 0;
string raw = "1234";
bool success = int.TryParse(raw, out val);
// success is true
// val is 1234
 

As I was working through the Expert F# book, I came across a neat little F# trick for dealing with .NET output parameters. When you leave the output parameter off of the method call, the function returns a tuple. Here's some overly explicit F# to demonstrate:

let raw = "1234"
let result = Int32.TryParse(raw)
(* result is (true, 1234) *)
 
let success, value = result
(* success is true *)
(* value is 1234 *)
 

The result value is particularly interesting to me, since I've not really used a language with the concept of tuples before. A tuple is basically a grouping of several values into a single value. The "let success, value..." line above decomposes the result tuple into two separate values. You can read more about tuples in .NET if you'd like.

If we wanted to do it all as one assignment:

let success, value = Int32.TryParse("1234")
(* success is true *)
(* value is 1234 *)
 

This feature is really very small, but I've been annoyed enough about having to initialize extra variables for out parameters in the past that I really got a kick out of it.
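The same tuple also plays nicely with pattern matching.  As a rough sketch (the tryParseInt helper is a name I made up for illustration), the success flag can be turned into an option so the caller can't forget to check it:

```fsharp
open System

// Hypothetical helper: turn the (success, value) tuple into an option.
let tryParseInt (raw : string) =
    match Int32.TryParse raw with
    | true, value -> Some value
    | false, _ -> None

printfn "%A" (tryParseInt "1234")
printfn "%A" (tryParseInt "oops")
```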

Update: Just to clarify what I mean by "tons of useless supporting code", here's a slightly longer (and equally contrived!) example to illustrate the type of bloat I've run across:

If I were to build a User object from post data, the code might look something like this in C#:

public class User
{
    public int Age { get; set; }
    public DateTime SignupDate { get; set; }
    public Double Weight { get; set; }
 
    static string Post(string key)
    {
        return key;
    }
 
    static User Build()
    {
        var age = 0;
        var signupDate = DateTime.MinValue;
        var weight = 0.0;
 
        int.TryParse(Post("age"), out age);
        double.TryParse(Post("weight"), out weight);
        DateTime.TryParse(Post("signupDate"), out signupDate);
 
        return new User
        {
            Age = age,
            SignupDate = signupDate,
            Weight = weight
        };
    }
}

The alternative F#, however, is quite a bit more concise. Even better, it's much more declarative:

type User = { Age : int; SignupDate : DateTime; Weight : Double; }
 
let post key = key
 
let u =
    {
        Age = Int32.TryParse(post "age") |> snd;
        SignupDate = DateTime.TryParse(post "signupDate") |> snd;
        Weight = Double.TryParse(post "weight") |> snd;
    }

Now, there are a lot of other niceties in the F# version as well. But the important bit is that I can take the resulting tuples from the TryParse operations, pipe them into the snd function, and suddenly I have valid values for my fields. snd does just what it's short for: it returns the second value in a two value tuple.


Slimy sales guy tactic #972

Pretend like you know something you don't.

Sales guy: So are you a programmer or what?

Me: Yep.

SG: What languages do you program in?

Me: Mostly .NET languages

SG: Like C and C++?

Me: Nope, C# mostly, with a smattering of F#

SG: <Bluescreen>

Extreme makeover: routing edition

This was going to be a full on post, but it appears the next preview of the MVC bits will do a good chunk of what I was writing up.  So for posterity, here's the general routing flow in the current ASP.NET MVC release:

  1. Define your routes in RouteTable
  2. PostResolveRequestCache event in the UrlRoutingModule passes every request to the RouteTable to determine the relevant route
  3. If a route matches, an IRouteHandler is created
  4. The IRouteHandler gets the appropriate IHttpHandler for the specified route, passing along a RequestContext object (which contains RouteData)
    1. Default IRouteHandler uses ControllerName + Action format and reflection
  5. IHttpHandler and original path are stored in an internal RequestData object, which is appended to the context.Items collection
  6. Path is rewritten to ~/Mvc.axd
  7. PostMapRequestHandler event in the UrlRoutingModule grabs the RequestData from the Items collection, rewrites the path back to the original path and sets the IHttpHandler from step 4 to the context.Handler property
  8. Profit!

A basic NamedLock class

Here's a bit of code I extracted from my own special baby of an application.  It allows me to lock on a specific "name", rather than having to use an object instance and the built in lock construct. 

Usage

NamedLock<string> locker = new NamedLock<string>();
var url = "http://services.digg.com...";
using (locker.Lock(url))
{
    //Do something synchronized
    var xml = new XmlDocument();
    xml.Load(url);
}

Why?

There are a few problems I've come across where synchronizing a particular "name" might be useful.  One of the apps I work on makes heavy use of Lucene.NET; each page shows the results of a couple of queries.  It doesn't make a whole ton of sense to run multiple, identical queries against Lucene at the same time, so I generate a key for each query, lock against that key, and let the first thread do the actual work while the others sit around sipping coffee.

The meat and potatoes

Have a look at the class itself.  It's relatively small, and consists of three things:

  • The primary class with Lock and Unlock functions
  • An internal Token class that implements IDisposable
  • An internal ReferenceCount class

NamedLock is pretty simple. It contains a Dictionary<T, ReferenceCount> to keep track of which names are currently locked and provides utility functions for acquiring and releasing locks. Lock looks like this:

public IDisposable Lock(T name, int timeout)
{
    Monitor.Enter(lockCollection);
    ReferenceCount obj = null;
    lockCollection.TryGetValue(name, out obj);
    if (obj == null)
    {
        obj = new ReferenceCount();
        Monitor.Enter(obj);
        lockCollection.Add(name, obj);
        Monitor.Exit(lockCollection);
    }
    else
    {
        obj.AddRef();
        Monitor.Exit(lockCollection);
        if (!Monitor.TryEnter(obj, timeout))
        {
            throw new TimeoutException(
                string.Format(
                "Timeout while waiting for lock on {0}",
                name)
                );
        }
    }

    return new Token<T>(this, name);
}

This function locks the lockCollection, checks for an existing lock with the same name, adds one if it's the first, then locks and returns a token. There's a good reason it uses Monitor.Enter instead of a simple lock statement: you'll notice that if there's no current lock in the collection, we actually lock the sync object (named obj) before releasing the lock on the collection. If the lock does exist in the collection, we increment a reference counter, release the collection lock, and then lock on the sync object. Doing it this way lets us avoid deadlocks on the lock collection (bad juju).

The Unlock function is also relatively simple:

public void Unlock(T name)
{
    lock (lockCollection)
    {
        ReferenceCount obj = null;
        lockCollection.TryGetValue(name, out obj);
        if (obj != null)
        {
            Monitor.Exit(obj);
            if (0 == obj.Release())
            {
                lockCollection.Remove(name);
            }
        }
    }
}

It locks the lockCollection, grabs the sync object, releases the named lock, then removes the sync object if there aren't any other threads holding references to it. This code is a bit more straightforward, since we don't have to do anything janky to avoid deadlocks on the lock collection.

The token class is so shockingly simple I'm not even going to paste it here. It takes a reference to the parent NamedLock, and then calls parent.Unlock(name) when disposed.

Conclusión

I can't take full credit for this little class. I'm basically dumb, and needed lots of help from Peter Bright. He's a jackass, but he knows it so it's OK for me to say that.

There are some other things I'd like to be able to do that I currently can't with this class. Primarily, I'd like ReaderWriterLock type functionality. That would greatly complexify things, though, and it performs perfectly well for me at the moment.

ASP.NET MVC routing limitations

The ASP.NET MVC stuff is pretty exciting to me, since I've been rolling my own pseudo-MVC-with-pretty-URL applications since the dawn of time (time being defined as "the release of ASP.NET").  The new MVC tools should help me do all the same things, but in a more standard way.

Last night, I set about trying to make the ASP.NET MVC stuff bend to my URL routing needs.  Unfortunately, the built-in route parsing stuff is pretty basic, and it's not at all easy to replace in the current CTP.

Note: In the first public MVC preview the Route class isn't extensible (instead it is a data class). For the next preview release we are looking to make it extensible and enable developers to add scenario specific route classes (for example: a RestRoute sub-class) to cleanly add additional semantics and functionality.

My most visible home-grown implementation has URLs like these:

These URLs have a few basic components. Take the first:

/journals/apple
The content "path". Content is arranged in a hierarchy, and this path is used to locate it. Various content types have specific controllers like "Article", "Blog", etc.
.ars
This initially started out as an extension I could map to the ASP.NET handler in IIS. Since then, it's become a way of identifying the expected format for content. There are a number of those -- .rssx, for instance, will give back an RSS feed (if enabled).
/p2
/2007/12/13
/2007/12/13/final-office-2008-preview-highlights-excel-features
These chunks are basically action parameters.  The first is a page number: a "p" followed by an integer.  The second is a set of three parameters: year, month and day.  Year is a four-digit integer, while month and day are two-digit integers. The third example adds a slug, which is just a string (and used to uniquely identify a post).
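
To make that breakdown concrete, here's a small sketch that pulls the pieces apart with a regular expression. It assumes the components are simply glued together in the order listed (path, extension, then parameters), as in /journals/apple.ars/p2. The ContentUrl class and the group names are invented for illustration; this is not the code my application actually runs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContentUrl
{
    // path (may contain slashes) + .format, then an optional tail that is
    // either /pN or /yyyy/mm/dd with an optional /slug.
    private static readonly Regex Pattern = new Regex(
        @"^(?<path>/[^.]+)\.(?<format>[a-z]+)" +
        @"(?:/p(?<page>\d+)" +
        @"|/(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})(?:/(?<slug>[a-z0-9-]+))?)?$",
        RegexOptions.IgnoreCase);

    // Returns the Match; check Success, then read the named groups.
    public static Match Parse(string url)
    {
        return Pattern.Match(url);
    }
}
```

Groups that don't appear in a given URL (the page number in a dated URL, say) come back with Success set to false, so a caller can tell which kind of URL it got.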

Ugh, parameters can't include forward slashes...

My example application above has a bunch of hierarchical data, and the standard /path/to/whatever seems like a reasonable way of representing it in a URL.  Also, it's attractive. The built-in route parsing logic is dead simple, always breaks on /, and makes parameters like my "path" impossible.  I'd have to use some alternate path separator character to make it work.

...and no custom delimiters

My extensions won't work either, as they're delimited by a period.  While the file extensions started as a legacy hack, I really like using them to represent the expected content type.  It provides a bit more information about the type of URL you're requesting... or could, if you knew ".ars" was the same as ".html". In much the same vein, the "page" parameter (/p2) won't work.

Stuck with it

I've investigated techniques for hijacking the built-in routing stuff, but it appears to be a pretty daunting task. The current route parsing stuff is kind of a mess (yay for CTPs). Here's a pretty chart:

 Routing diagram

The RouteTable has a RouteCollection comprised of Route objects. Route parsing and matching logic is mostly in the RouteCollection, with only a basic URL splitting function in the Route class itself. Yuck.

Ideally, the RouteCollection would hold objects that implement an IRoute interface. URL matching and RouteData extraction would be handled by those objects, leaving the RouteCollection responsible only for iterating through them and passing in URLs. It would be awesome. I suspect there are quite a few other people who might think so too.
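
Here's roughly the shape I have in mind, as a standalone sketch. None of this exists in the MVC preview: IRoute, RouteList and ExtensionRoute are names I made up, and a plain string dictionary stands in for RouteData.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: each route owns its own matching logic.
public interface IRoute
{
    // Returns the extracted values, or null when the URL doesn't match.
    IDictionary<string, string> Match(string url);
}

// Hypothetical collection: it only walks the routes in order.
public class RouteList
{
    private readonly List<IRoute> routes = new List<IRoute>();

    public void Add(IRoute route)
    {
        routes.Add(route);
    }

    public IDictionary<string, string> Match(string url)
    {
        foreach (IRoute route in routes)
        {
            IDictionary<string, string> data = route.Match(url);
            if (data != null) return data;   // first match wins
        }
        return null;
    }
}

// Example route keyed on a file extension rather than on slashes.
public class ExtensionRoute : IRoute
{
    private readonly string extension;
    private readonly string controller;

    public ExtensionRoute(string extension, string controller)
    {
        this.extension = extension;
        this.controller = controller;
    }

    public IDictionary<string, string> Match(string url)
    {
        if (!url.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            return null;

        Dictionary<string, string> data = new Dictionary<string, string>();
        data["controller"] = controller;
        // Everything before the extension is the path, slashes and all.
        data["path"] = url.Substring(0, url.Length - extension.Length);
        return data;
    }
}
```

With something like this, a route for ".rssx" feeds or for my slash-laden content paths becomes one more class that implements the interface, and the collection never needs to know how any of them split a URL.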

Hijacking the SubSonic relationship load process

In my quest to make SubSonic work for me, I've been stumbling through the source code and making improvements on my own super special branch. Of primary interest to me was the mechanism for loading objects related through a foreign key.  A Review object might have a parent Author, accessible like so:

Review review = new Review(1);
Author author = review.Author;

The default SubSonic templates generate an accessor like this:

public Test.Author Author
{
    get { return Test.Author.FetchByID(this.AuthorID); }
    set { SetColumnValue("AuthorID", value.Id); }
}

This gets unwieldy in a hurry.  Every single reference to the Author property on my Review class results in a database call.  Code like this is scary:

for(int i = 0; i < 1000; i++)
{
    Author a = review.Author;
}

There is a LazyLoad option which changes the generated code slightly, persisting foreign key relationships in a private variable when they're referenced. It's a good start, but doesn't help with something like this:

foreach(Review review in someReviewCollection)
{
    Author a = review.Author;
}

In many applications, you might end up with thousands of reviews that share a small set of authors.  Not only would that previous loop hit the database for every iteration, it might be reloading authors that have been seen before.  This can quickly become a huge amount of load on a database server.

Yay delegates

The easiest fix for this problem would be to cache all the authors, and retrieve the cached instances when they're referenced through the Author property. Wouldn't it be nice to tell SubSonic "Hey, I want you to check the cache for all foreign key relationships in this object". Something like that would be possible with a foreign key property that looks like this:

public Test.Item Item
{
    get
    {
        return LoadSingleObject<Item, int>(
            this.ItemID,
            Test.Item.FetchByID);
    }

    set { SetColumnValue("ItemID", value.Id); }
}

The above getter uses a new LoadSingleObject function to retrieve the related object. This function takes the related object's primary key value and a default function for retrieving that object.

For this to work, SubSonic needs some mechanism that allows us to hijack its relationship loading process. At a super high level, I attempted to make this possible by creating delegates for getting/storing individual objects, then keeping a list of those delegate pairs in each ActiveRecord object. When a foreign key property is referenced, it will run through that list looking for a hit with the "get" delegate (i.e. a result that's not null). Assuming there are no hits, it will use the standard ActiveRecord.FetchByID function to return the object. At that point, it iterates the list again and passes the found object back to each "store" delegate.

This has proven to be pretty flexible in my applications, and even provides a foundation for putting in eager loading at some point. When I cache my objects, I now use the RegisterSingleObjectLoader function on each of them to ensure that the cache is the first place they'll look when I reference one of their related objects.
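
Stripped of everything SubSonic-specific, the mechanism looks something like the sketch below. The delegate signatures, the RecordBase class and the public visibility of LoadSingleObject are simplifications I've invented for illustration; the real code in my branch differs in the details.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate shapes for the "get" and "store" halves of a pair.
public delegate object GetObject(Type type, object key);
public delegate void StoreObject(Type type, object key, object value);
public delegate TObject FetchObject<TObject, TKey>(TKey key);

// Stand-in for an ActiveRecord base class.
public class RecordBase
{
    private class LoaderPair
    {
        public GetObject Get;
        public StoreObject Store;
    }

    private readonly List<LoaderPair> loaders = new List<LoaderPair>();

    public void RegisterSingleObjectLoader(GetObject get, StoreObject store)
    {
        LoaderPair pair = new LoaderPair();
        pair.Get = get;
        pair.Store = store;
        loaders.Add(pair);
    }

    public TObject LoadSingleObject<TObject, TKey>(
        TKey key, FetchObject<TObject, TKey> fetch) where TObject : class
    {
        // 1. Ask each registered "get" delegate; the first non-null hit wins.
        foreach (LoaderPair pair in loaders)
        {
            TObject hit = pair.Get(typeof(TObject), key) as TObject;
            if (hit != null) return hit;
        }

        // 2. No hits: fall back to the default fetch (the database call).
        TObject loaded = fetch(key);

        // 3. Hand the result to every "store" delegate for next time.
        if (loaded != null)
        {
            foreach (LoaderPair pair in loaders)
            {
                pair.Store(typeof(TObject), key, loaded);
            }
        }
        return loaded;
    }
}
```

Register a pair of delegates backed by a dictionary and the second request for the same key never reaches the fetch function, which is exactly the behavior I want for thousands of reviews sharing a handful of authors.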

You can have a look at my branch (/branches/kurt) to see exactly what I did. The lion's share of the changes are contained in these two files, which I've stuck on monoport so you can see them in all their syntax highlighted glory:

Installing VS 2008 on Vista makes me want to kill myself

After numerous attempts at installing VS 2008 on Vista machines, I'm about ready to switch to vim. At least I can use apt-get for vim.

The process has basically gone something like this:

  1. Fire up the installer
  2. Choose my installation options
  3. Watch the progress bar do really strange things during the .NET Runtime 3.5 step
  4. Notice the "your computer requires a reboot to complete the Windows update process" indicator in the taskbar
  5. Watch the progress bar do absolutely nothing during the .NET Runtime 3.5 step
  6. Go eat dinner
  7. Come back to see that installation has failed
  8. Oh, and that setup.exe crashed
  9. Reboot, switch to Macbook for a while
  10. Start installation again (clench sphincter*)
  11. Select "Normal" install because I'm too lazy to pick my options again
  12. Oh look, it worked

At first, I figured this was just a problem with my desktop (and potentially Vista 64). Sadly, I was wrong. I just stumbled through exactly the same process in a 32-bit Vista VM on my Macbook Pro. Yar!

* As an aside, "clench sphincter" is apparently more correct than "clinch sphincter", although it looks wrong to me.