ldapjs: A reprise of LDAP

This post has been about 10 years in the making. My first job out of college was at IBM working on the Tivoli Directory Server, and at the time I had a preconceived notion that working on anything related to Internet RFCs was about as hot as you could get. I spent a lot of time back then getting “down and dirty” with everything about LDAP: the protocol, performance, storage engines, indexing and querying, caching, customer use cases and patterns, general network server patterns, etc. Basically, I soaked up as much as I possibly could while I was there. On top of that, I listened to all the “gray beards” tell me about the history of LDAP, which was a bizarre marriage of telecommunications conglomerates and graduate students. The point of this blog post is to give you a crash course in LDAP, and explain what makes ldapjs different. Allow me to be the gray beard for a bit…

What is LDAP and where did it come from?

Directory services were largely pioneered by the telecommunications companies (e.g., AT&T) to allow fast retrieval of all the crap you’d expect to find in a telephone book and directory. That is, given a name, or an address, or an area code, or a number, or a foo, support looking up customer records, billing information, routing information, etc. The efforts of several telcos culminated in the X.500 standard(s). An X.500 directory is one of the most complicated beasts you can possibly imagine, but on a high note, there’s probably not a thing you can imagine in a directory service that wasn’t thought of in there. It is literally the kitchen sink. Oh, and it doesn’t run over IP (it’s built on the OSI protocol stack).

Several years after X.500 had been deployed (at telcos, academic institutions, etc.), it became clear that the Internet was “for real.” LDAP, the “Lightweight Directory Access Protocol,” was invented to act purely as an IP-accessible gateway to an X.500 directory.

At some point in the early 90’s, a graduate student at the University of Michigan (with some help) cooked up the “grandfather” implementation of the LDAP protocol, which wasn’t actually a “gateway,” but rather a stand-alone implementation of LDAP. Like many things at the time, that implementation used a process-per-connection concurrency model, and it had “backends” (aka storage engines) for the file system and the Unix DB API. At some point the Berkeley Database (BDB) was added, and it remains the de facto storage engine for most LDAP directories.

Ok, so a graduate student at UM wrote an LDAP server that wasn’t a gateway. So what? Well, that UM code base turns out to be the thing that pretty much every vendor did a source license for. Those graduate students went off to Netscape later in the 90’s, and largely dominated the market of LDAP middleware until Active Directory came along many years later (as far as I know, Active Directory is “from scratch”: while it’s “almost” LDAP, it’s different in a lot of ways). That Netscape code base was further bought and sold over the years to iPlanet, Sun Microsystems, and Red Hat (I’m probably missing somebody in that chain). It now lives under the Fedora umbrella as ‘389 Directory Server.’ Probably the most popular fork of the original UM code base today is OpenLDAP.

IBM did the same thing, and the Directory Server I worked on was a fork of the UM code too, but it heavily diverged from the Netscape branches. The divergence was primarily due to: (1) using DB2 as the backing store as opposed to BDB, and (2) needing to run on IBM’s big iron like OS/400 and zSeries mainframes.

The macro point is that there have actually been very few “fresh” implementations of LDAP, and it gets a pretty bad reputation because at the end of the day you’ve got 20 years of “bolt-ons” to grad student code. Oh, and it was born out of ginormous telcos, so of course the protocol is overly complex.

That said, while there certainly is some wacky stuff in the LDAP protocol itself, LDAP suffered far more from poor, buggy implementations than from any fundamental flaw in its design. As Engine Yard pointed out a few years back, you can think of LDAP as the original NoSQL store.

LDAP: The Good Parts

So what’s awesome about LDAP? Since it’s a directory system, it maintains a hierarchy of your data, which as an information management pattern aligns with _a lot_ of use cases (the quintessential example is white pages for people in your company, but subscriptions to SaaS applications, “host groups” for tracking machines/instances, physical goods tracking, etc., all have use cases that fit that organization scheme). For example, presumably at your job you have a “reporting chain.” Let’s say a given record in LDAP (I’ll use myself as a guinea pig here) looks like:

    firstName: Mark
    lastName: Cavage
    city: Seattle
    uid: markc
    state: Washington
    mail: mcavage@gmail.com
    phone: (206) 555-1212
    title: Software Engineer
    department: 123456
    objectclass: joyentPerson

My record would live in the tree under the people I report to; that tree (shown with, as an example, some other popular engineers in the same reporting chain) would look like:

                   uid=david
                    /
               uid=bryan
            /      |      \
      uid=markc  uid=ryah  uid=isaacs
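Each box in that tree is an entry named by a distinguished name (DN): the entry’s own RDN followed by its ancestors, so my entry would be `uid=markc, uid=bryan, uid=david`. Because of that, “everything under uid=bryan” boils down to a suffix test on DNs. The minimal JavaScript sketch below (JavaScript because ldapjs is a node.js library, although nothing here uses ldapjs) models the tree as a flat list of DNs and implements the three standard search scopes, called `base`, `one`, and `sub` here (RFC 4511 names them baseObject, singleLevel, and wholeSubtree). The helper names are hypothetical, and DN handling is deliberately simplified.

```javascript
// Sketch: the example tree as a flat list of DNs plus the three LDAP search scopes.
// Simplifications (assumptions): DNs contain no escaped commas, and RDNs are
// compared case-insensitively, which real servers do only for attributes whose
// matching rules are case-insensitive.

// Split "uid=markc, uid=bryan, uid=david" into ["uid=markc", "uid=bryan", "uid=david"].
function parseDN(dn) {
  return dn.split(",").map((rdn) => rdn.trim().toLowerCase());
}

// True if `dn` is `base` itself or sits anywhere beneath it (base is a suffix of dn).
function isUnder(dn, base) {
  const d = parseDN(dn);
  const b = parseDN(base);
  if (b.length > d.length) return false;
  const offset = d.length - b.length;
  return b.every((rdn, i) => d[offset + i] === rdn);
}

// Each DN names the entry first, then its ancestors up to the root of the tree.
const entries = [
  "uid=david",
  "uid=bryan, uid=david",
  "uid=markc, uid=bryan, uid=david",
  "uid=ryah, uid=bryan, uid=david",
  "uid=isaacs, uid=bryan, uid=david",
];

// Return the DNs visible from `base` for a given scope:
//   "base" -> only the base entry, "one" -> its direct children,
//   "sub"  -> the base entry and everything below it.
function searchScope(base, scope) {
  const depth = parseDN(base).length;
  return entries.filter((dn) => {
    if (!isUnder(dn, base)) return false;
    const extra = parseDN(dn).length - depth;
    if (scope === "base") return extra === 0;
    if (scope === "one") return extra === 1;
    return true;
  });
}

console.log(searchScope("uid=bryan, uid=david", "one"));
```

With `uid=bryan, uid=david` as the base and scope `one`, the call returns the three engineer DNs; scope `sub` starting from `uid=david` returns all five entries.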

Ok, so we’ve got a tree. It’s not tremendously different from your filesystem, but how do we find people? LDAP has a rich search filter syntax that makes a lot of sense for key/value data (far more than tacking MapReduce jobs on does, imo), and every search query takes a “start point” in the tree. Here’s an example: let’s say I wanted to find all “Software Engineers” in the entire company. The filter would look like:

     (title=Software Engineer)

And I’d just start my search from ‘uid=david’ in the example above. Let’s say I wanted to find all software engineers who worked in Seattle:

     (&(title=Software Engineer)(city=Seattle))

I could keep going, but the gist is that LDAP has “full” boolean predicate logic, wildcard filters, etc. It’s really rich.
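Here is a minimal JavaScript sketch of how such a filter can be parsed into a tree and evaluated against the record above. It is hand-rolled for illustration (`parseFilter` and `matches` are hypothetical names, not ldapjs calls) and covers only equality, presence (`attr=*`), substring wildcards, and the `&`, `|`, and `!` operators. RFC 4515 writes values without quotes, so the sketch parses `(title=Software Engineer)`. It also uses plain true/false logic, whereas real servers evaluate filters to TRUE, FALSE, or Undefined.

```javascript
// Sketch: parse and evaluate a subset of RFC 4515 string filters.
// Assumptions/simplifications: no \XX escapes, no >=, <=, ~= or extensible
// matches, case-insensitive comparisons, and two-valued logic.
function parseFilter(str) {
  let pos = 0;
  function parse() {
    if (str[pos] !== "(") throw new Error(`expected "(" at ${pos}`);
    pos++;
    let node;
    const op = str[pos];
    if (op === "&" || op === "|") {
      pos++;
      const children = [];
      while (str[pos] === "(") children.push(parse());
      node = { type: op === "&" ? "and" : "or", children };
    } else if (op === "!") {
      pos++;
      node = { type: "not", child: parse() };
    } else {
      // Simple item: attr=value, attr=* (presence) or attr=sub*str*ing.
      const end = str.indexOf(")", pos);
      const eq = str.indexOf("=", pos);
      if (end < 0 || eq < 0 || eq > end) throw new Error(`bad item at ${pos}`);
      const attr = str.slice(pos, eq).trim().toLowerCase();
      const value = str.slice(eq + 1, end);
      if (value === "*") node = { type: "present", attr };
      else if (value.includes("*")) node = { type: "substring", attr, parts: value.split("*") };
      else node = { type: "equal", attr, value };
      pos = end;
    }
    if (str[pos] !== ")") throw new Error(`expected ")" at ${pos}`);
    pos++;
    return node;
  }
  const tree = parse();
  if (pos !== str.length) throw new Error("trailing characters after filter");
  return tree;
}

const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Attributes are multi-valued: an item matches if ANY value of the attribute matches.
function matches(node, entry) {
  const values = (attr) => {
    const key = Object.keys(entry).find((k) => k.toLowerCase() === attr);
    return key ? entry[key].map((v) => String(v).toLowerCase()) : [];
  };
  switch (node.type) {
    case "and": return node.children.every((c) => matches(c, entry));
    case "or": return node.children.some((c) => matches(c, entry));
    case "not": return !matches(node.child, entry);
    case "present": return values(node.attr).length > 0;
    case "equal": return values(node.attr).includes(node.value.toLowerCase());
    case "substring": {
      const re = new RegExp("^" + node.parts.map(escapeRegExp).join(".*") + "$", "i");
      return values(node.attr).some((v) => re.test(v));
    }
    default: throw new Error(`unknown filter type: ${node.type}`);
  }
}

// The example record from above, with each attribute as an array of values.
const markc = {
  firstName: ["Mark"], lastName: ["Cavage"], city: ["Seattle"], uid: ["markc"],
  state: ["Washington"], title: ["Software Engineer"], department: ["123456"],
  objectclass: ["joyentPerson"],
};

console.log(matches(parseFilter("(&(title=Software Engineer)(city=Seattle))"), markc)); // true
```

Representing each attribute as an array mirrors LDAP’s multi-valued attributes, which is why equality and substring checks use “any value matches” rather than a single comparison.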

Oh, and on top of the technical merits, for better or worse, it’s an established standard for both administrators and applications (i.e., most “shipped” intranet software has either a local user repository or the ability to leverage an LDAP server somewhere). So there are a lot of compelling reasons to look at leveraging LDAP.

ldapjs: Why do I care?

As I said earlier, I spent a lot of time at IBM observing how customers used LDAP, and the real items I took away from that experience were:

  • LDAP implementations have suffered a lot from never having been designed from the ground up for a large number of concurrent connections with asynchronous operations.
  • There are use cases for LDAP that just don’t always fit the traditional “here’s my server and storage engine” model. A lot of customers with simple use cases wanted an LDAP access point without being forced to take on the heavy backends that came with it (they wanted the original gateway model!). There was an entire “sub” industry for this known as “meta directories” back in the late 90’s and early 2000’s.
  • Replication was always a sticking point. LDAP vendors all tried to offer a big multi-master, multi-site replication model. It was a lot of “bolt-on” complexity, done before the CAP theorem was formulated, and certainly before it was accepted as “truth.”
  • Nobody uses all of the protocol. In fact, 20% of the features solve 80% of the use cases (I’m making that number up, but you get the idea).

For all the good parts of LDAP, those are really damned big failing points, and even I eventually abandoned LDAP for the greener pastures of NoSQL somewhere along the way. But it always nagged at me that LDAP didn’t get its due because of a lot of implementation problems (to be clear, if I could, I’d change some aspects of the protocol itself too, but that’s a lot harder).

Well, in the last year, I went to work for Joyent, and like everyone else, we have several problems that are classic directory service problems. If you break down the list I outlined above:

  • Connection-oriented and asynchronous: Holy smokes batman, node.js is a completely kick-ass event-driven asynchronous server platform that manages connections like a boss. Check!
  • Lots of use cases: Yeah, we’ve got some. Man, the sinatra/express paradigm is so easy to slap over anything. How about we just do that and leave as many use cases open as we can. Check!
  • Replication is hard. CAP is right: There are a lot of distributed databases out there vying to solve exactly this problem. At Joyent we went with Riak. Check!
  • Don’t need all of the protocol: I’m lazy. Let’s just skip the stupid things most people don’t need. Check!
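The “sinatra/express paradigm” bullet is the least obvious one if you have only seen it applied to HTTP. Below is a dependency-free JavaScript sketch of the idea under stated assumptions: handlers are registered per LDAP operation and DN suffix, and a request runs through a chain of handlers that each call `next()`, the way express middleware does. This is not ldapjs’s API (the ldapjs guide documents that); `createToyServer`, `use`, `handle`, and `requireBound` are hypothetical names, and everything runs synchronously to keep the example small. The numeric result codes are the real RFC 4511 values.

```javascript
// Toy sketch of express-style routing for LDAP operations. NOT the ldapjs API:
// all names are hypothetical, and handlers run synchronously.
function createToyServer() {
  const routes = [];

  // Lowercased RDNs; an empty DN (the root) yields an empty list.
  const rdns = (dn) => dn.split(",").map((r) => r.trim().toLowerCase()).filter(Boolean);

  // True if `dn` equals `suffix` or lies beneath it in the tree.
  function under(dn, suffix) {
    const d = rdns(dn);
    const s = rdns(suffix);
    return s.length <= d.length && s.every((r, i) => d[d.length - s.length + i] === r);
  }

  return {
    // Register a chain of handlers for one operation ("bind", "search", ...)
    // on every DN at or below `suffix`, similar in spirit to app.get(path, ...handlers).
    use(op, suffix, ...handlers) {
      routes.push({ op, suffix, handlers });
    },
    // Run the first matching route's chain; each handler calls next() to pass
    // control on. Result codes: 0 success, 32 noSuchObject (RFC 4511).
    handle(op, req) {
      const res = { status: 32, entries: [] };
      const route = routes.find((r) => r.op === op && under(req.dn, r.suffix));
      if (!route) return res;
      res.status = 0;
      let i = 0;
      const next = () => {
        const handler = route.handlers[i++];
        if (handler) handler(req, res, next);
      };
      next();
      return res;
    },
  };
}

const server = createToyServer();

// Hypothetical first handler: refuse anonymous requests with
// 50 (insufficientAccessRights) and stop the chain by not calling next().
function requireBound(req, res, next) {
  if (!req.boundAs) {
    res.status = 50;
    return;
  }
  next();
}

// Second handler: answer from a hard-coded entry instead of a real backend.
server.use("search", "uid=david", requireBound, (req, res, next) => {
  res.entries.push({ dn: "uid=markc, uid=bryan, uid=david", uid: ["markc"] });
  next();
});

console.log(server.handle("search", { dn: "uid=bryan, uid=david", boundAs: "uid=markc, uid=bryan, uid=david" }));
```

The appeal of the pattern is that the “backend” is just whatever the last handler does, so the same front end can sit over a database, another service, or a hard-coded list, which is the old gateway model in a new coat.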

So that’s the crux of ldapjs right there: giving you the ability to put LDAP back into your application while nailing the 4 fundamental problems that plague most existing LDAP deployments.

The obvious question is how it turned out, and the answer is, honestly, better than I thought it would. When I set out to do this, I assumed I’d be shipping a much smaller percentage of the RFC than is there now; about 95% of the core RFC is implemented. I wasn’t sure if the marriage of this protocol to node/JavaScript would work out, but if you’ve ever used express, this should be _really_ familiar. And I tried to make it as natural as possible to use “pure” JavaScript objects, rather than requiring the developer to understand ASN.1/BER (the notation and binary encoding behind the wire protocol) or the LDAP RFC in detail (this one mostly worked out; ldap_modify is still kind of a PITA).
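Part of why modify resists a “plain object” treatment is that an LDAP modify request (RFC 4511, section 4.6) is not “here is the new entry.” It is an ordered list of changes, each an `add`, `delete`, or `replace` of specific values of one attribute, and the whole list must succeed or fail atomically. A `delete` with no values removes the whole attribute, and a `replace` with no values removes it too. The sketch below applies such a change list to a plain JavaScript object; `applyModify` and the `{ operation, type, values }` change shape are hypothetical, not ldapjs’s API.

```javascript
// Sketch of RFC 4511 modify semantics on an in-memory entry whose attributes
// map to arrays of strings. applyModify and the { operation, type, values }
// change shape are hypothetical. Simplifications: case-sensitive value
// comparison and no schema checking.
function applyModify(entry, changes) {
  // Copy first: a modify must succeed or fail as a whole, so an error part-way
  // through must leave the original entry untouched.
  const result = {};
  for (const [type, vals] of Object.entries(entry)) result[type] = [...vals];

  for (const { operation, type, values = [] } of changes) {
    const current = result[type] || [];
    if (operation === "add") {
      // Add the listed values, creating the attribute if needed.
      // This sketch rejects an add with no values.
      if (values.length === 0) throw new Error(`add with no values: ${type}`);
      for (const v of values) {
        if (current.includes(v)) throw new Error(`attributeOrValueExists: ${type}=${v}`);
        current.push(v);
      }
      result[type] = current;
    } else if (operation === "delete") {
      if (!(type in result)) throw new Error(`noSuchAttribute: ${type}`);
      if (values.length === 0) {
        delete result[type]; // no values listed: drop the whole attribute
        continue;
      }
      for (const v of values) {
        const i = current.indexOf(v);
        if (i < 0) throw new Error(`noSuchAttribute: ${type}=${v}`);
        current.splice(i, 1);
      }
      if (current.length === 0) delete result[type]; // last value removed
    } else if (operation === "replace") {
      // Replace every value; an empty list removes the attribute if present.
      if (values.length === 0) delete result[type];
      else result[type] = [...values];
    } else {
      throw new Error(`unknown operation: ${operation}`);
    }
  }
  return result;
}

const before = { uid: ["markc"], title: ["Software Engineer"], city: ["Seattle"] };
const after = applyModify(before, [
  { operation: "add", type: "description", values: ["ldapjs author"] },
  { operation: "delete", type: "city" },
]);
console.log(after);
```

Here `after` keeps `uid` and `title`, gains `description`, and loses `city`, while `before` is left unchanged; a change list that fails part-way (say, deleting an attribute that isn’t there) throws without touching the original.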

Within 24 hours of announcing ldapjs on Twitter, there was an implementation of an address book that works with Thunderbird/Evolution, by the end of that weekend there was some slick integration with CouchDB, and ldapjs even got used in one of the Node Knockout apps. Off to a pretty good start!

The Road Ahead

Hopefully you’ve been motivated to learn a little bit more about LDAP and try out ldapjs. The best place to start is probably the guide. After that you’ll probably need to pick up a book from back in the day. ldapjs itself is still in its infancy; there’s quite a bit of room to add some slick client-side logic (e.g., connection pools, automatic reconnects), easy-to-use schema validation, backends, etc. By the time this post is live, there will be experimental DTrace support if you’re running on Mac OS X or, preferably, Joyent’s SmartOS (shameless plug). And that nagging percentage of the protocol I didn’t do will get filled in over time, I suspect. If you’ve got an interest in any of this, send me some pull requests, but most importantly, I just want to see LDAP stop being a skeleton in the closet and get used in the places where you should be using it. So get out there and write you some LDAP.


25 Responses to ldapjs: A reprise of LDAP

  1. ldap learner says:

    awesome article – thanks

  2. That’s *really* cool !

    Now, the next big part is to build a more powerful request language for LDAP (i.e., one with server-side joins), add some transactions to the mix, make it a pair of RFCs, and now we have a really powerful, portable, standardized, high-performance NoSQL datastore 🙂

    I will follow your experiment, it’s really nice !

    Good luck

  3. Awesome. Great work.
    Maybe I have something to play with for our next internal “Fed Ex” day here at http://semanticweb-company.at

    Authentication with LDAP, backed by an RDF triple store.

  4. Let the Erlang trolls in now.

  5. Howard Chu says:

    Looks like fun. I’ve always missed the LDAP javascript module that Netscape shipped, which disappeared when it became Mozilla. (I took a few stabs at resurrecting it in my own Mozilla builds, but never got it working well.)

    By the way, the newest OpenLDAP backend is super-lightweight and ultra-fast. (The core of its database is less than 32KB of x86-64 object code, and it is typically 5-25x faster than BerkeleyDB for read accesses.) You’ll be hearing more about it at LDAPCon 2011 next month.

    • S.A. says:

      Is it possible to have a non-blocking async client module interfacing with the latest OpenLDAP for use with node.js?

      • Howard Chu says:

        I’m not sure I understood the question, but since ldapjs is written purely in javascript I don’t see how it has any dependency at all on how recent a version of OpenLDAP you talk to.

  6. Charlie says:

    Awesome work! Great to see fresh attention on a protocol that is still so core to management of accounts, something practically every network service needs.

    Mark Cavage and Howard Chu (and anyone else with LDAP expertise), I would be very interested in your take on the merits of the following idea:

    -Develop a RESTful translation of the LDAP protocol and data model that can run over HTTP, HTTPS, and perhaps Google’s recent SPDY tweak of HTTP/S. Something that would make people like Roy Fielding proud.

    -Support JSON payloads in addition to (or instead of) ASN.1 (an obvious fit for node, and you, Mark, have probably already done much of the legwork of translating the data model to JSON).

    -During or after this translation, take the opportunity to fix rough spots in the protocol alluded to in this post.

    -Adapt modern LDAP servers like OpenLDAP and ldapjs to support this new RESTful API alongside LDAP, with the same back-end. Treat them as two languages over which the same sorts of requests may be expressed.

    -Implement NSS modules for popular operating systems which use the new REST/JSON take on LDAP.

    Is this a reasonable upgrade path for transitioning to a directory protocol that developers will find simple and intuitive? Is there some fundamental CS reason why this is a bad or unworkable idea? Are patches welcome on ldapjs and OpenLDAP to help make this a reality?

    • mcavage says:

      Thanks Charlie,

      No, there’s nothing unreasonable about what you described. It’s not a technical barrier; it’s an adoption/political problem across server vendors and client implementors. Once upon a time there was a thing called DSML, which was around the same time as SOAP, so the industry has given some thought to different access methods (HTTP) for a directory service. It was horrible for a lot of different reasons and never caught on, but the point is, it’s definitely doable.

      m

      • Charlie says:

        Mark,

        Thanks for the quick reply!

        Just checked out DSML. The last line of its Wikipedia entry seems to sum it up:

        ‘DSML is often pronounced “dismal”.’

        It can be depressing to read about well-intentioned projects that went absolutely nowhere.

        So after coming up with that idea, I had a “this is so obvious somebody has likely done it” moment, and sure enough, there is this:

        http://nimbusds.com/json2ldap-spec.html

        Possibly a great fit in the world of node, I think 🙂

  7. James says:

    Back when Microsoft proposed Web3S I wondered why they didn’t just use LDIF for synchronising contacts. In the end they went for APP, and CardDAV is also becoming popular. JSON is definitely a better choice though.

    Incidentally, Sun’s http://www.opends.org/ is a from-scratch LDAP server implementation in Java. It even supports DSML, but hasn’t had a release in over a year.

  8. kevmcmanus says:

    So here’s the newbie question. I have an application I want to authenticate against Sun LDAP, Windows AD, and other LDAP servers. Is ldapjs a lighter (aka “easier”) way to integrate LDAP authentication than a Kerberos integration?

    • Charlie says:

      LDAP authentication is simpler, and weaker, than Kerberos.

      You can think of Kerberos as strong authentication which may be combined with LDAP for extra security. Active Directory does this.

      • Howard Chu says:

        You seem to imply that the two are mutually exclusive, but they’re not. LDAP uses SASL for strong authentication, and you can use GSSAPI/Kerberos 5 as the SASL mechanism if you wish.

    • mcavage says:

      ldapjs does not yet have support for SASL or GSSAPI; it’s simple bind only. That’s part of what I was alluding to that it’s not 100% of the protocol. I suspect that Kerberos will come up enough times that it will eventually get done, but the core focus of ldapjs right now isn’t to play the legacy integration game, it’s to enable new LDAP apps that people previously didn’t have a good mechanism to implement with.

  9. Charlie says:

    Apologies if I misstated something. I didn’t mean to imply that anything was mutually exclusive; I was making the point that LDAP and Kerberos can be (and often are) used together.

  10. Yuri Negocio says:

    Nice article and job. I’m very interested in IAM solutions for node.js.

  11. It’s good to know facts about directory services. Actually, I am one of the beneficiaries of this kind of service because I post ads on directories. It is a bit pricey, but very worth it. A lot of people use directories every now and then, hence there is a great chance that my ads will be viewed by a large number of people.

  12. Joseph Gordon says:

    Getting an error trying to “require” ldapjs on the windows version of node…

    TypeError: Object # has no method ‘dlopen’

    Any way around it?

  13. Thanks mcavage for this wonderful course on LDAP. You are doing a great job, please don’t stop.

  14. Depending on the selected model, an LDAP directory tree usually reflects boundaries in terms of politics, geography and organizations. You may find people or groups of people, documents, printers and organizational units deeper within the directory.
