An Easy Way to Build Scalable Network Programs

Suppose you’re writing a web server which does video encoding on each file upload. Video encoding is very much compute bound. Some recent blog posts suggest that Node.js would fail miserably at this.

Using Node does not mean that you have to write a video encoding algorithm in JavaScript (a language without even 64-bit integers) and crunch away in the main server event loop. The suggested approach is to separate the I/O-bound task of receiving uploads and serving downloads from the compute-bound task of video encoding. In the case of video encoding this is accomplished by forking out to ffmpeg. Node provides advanced means of asynchronously controlling subprocesses for work like this.
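As a rough illustration of forking out to ffmpeg, here is a minimal sketch built on `spawn` from Node's `child_process` module. The helper names `ffmpegArgs` and `encode` are made up for this example, and it assumes an `ffmpeg` binary is on the PATH. The call returns immediately, and the callback runs later from the event loop once the encoder has finished.

```javascript
const { spawn } = require('child_process');

// Build the argument list for a plain "ffmpeg -i <input> <output>" run.
// -y overwrites an existing output file without prompting.
function ffmpegArgs(inputPath, outputPath) {
  return ['-y', '-i', inputPath, outputPath];
}

// Start the encoder as a child process and return at once.
// `command` defaults to "ffmpeg" (assumed to be installed and on PATH).
function encode(inputPath, outputPath, callback, command = 'ffmpeg') {
  const child = spawn(command, ffmpegArgs(inputPath, outputPath), {
    stdio: 'ignore', // discard encoder output so a full pipe cannot stall it
  });

  // Make sure the callback runs only once, whichever event arrives first.
  let done = false;
  const finish = (err) => {
    if (done) return;
    done = true;
    callback(err);
  };

  child.once('error', finish); // for example, the binary was not found
  child.once('exit', (code, signal) => {
    finish(code === 0 ? null : new Error(`encoder exited with ${code ?? signal}`));
  });
  return child;
}
```

An upload handler could call something like `encode('/tmp/upload.mov', '/tmp/upload.mp4', onDone)` (hypothetical paths) once the file is on disk. The server process keeps accepting connections while the kernel schedules the encoder on whatever core is free.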

It has also been suggested that Node does not take advantage of multicore machines. Node has long supported load-balancing connections over multiple processes in just a few lines of code – in this way a Node server will use the available cores. In coming releases we’ll make it even easier: just pass --balance on the command line and Node will manage the cluster of processes.
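For a sense of what "a few lines of code" can look like, the following sketch uses the `cluster` module that ships with current Node releases. It does not use the `--balance` flag mentioned above. The function names `workerCount` and `startBalanced` are invented for this example, and the one-worker-per-core policy is an assumption rather than a recommendation from the post.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// One worker per core unless the caller asks for a specific number.
function workerCount(requested) {
  if (Number.isInteger(requested) && requested > 0) return requested;
  return Math.max(1, os.cpus().length);
}

// Call once at startup. The primary process only forks; every worker
// listens on the same port and incoming connections are shared among them.
function startBalanced(port, requested) {
  // Newer Node releases expose isPrimary, older ones isMaster.
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    const n = workerCount(requested);
    for (let i = 0; i < n; i++) cluster.fork();
    return;
  }
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(port);
}
```

Calling `startBalanced(8000)` at the bottom of the script (the port is arbitrary) would start one worker per core. Each worker is an ordinary operating system process, so the kernel decides where it runs.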

Node has a clear purpose: provide an easy way to build scalable network programs. It is not a tool for every problem. Do not write a ray tracer with Node. Do not write a web browser with Node. Do, however, reach for Node if tasked with writing a DNS server, DHCP server, or even a video encoding server.

By relying on the kernel to schedule and preempt computationally expensive tasks and to load balance incoming connections, Node appears less magical than server platforms that employ userland scheduling. So far, our focus on simplicity and transparency has paid off: the number of success stories from developers and corporations who are adopting the technology continues to grow.

29 Responses to An Easy Way to Build Scalable Network Programs

  1. The question is, how does it perform when computing Fibonacci numbers?

    Kidding aside, you said it. If anyone thinks Node.js is here to replace all available platforms/languages, then something is off.

    Node.js is an excellent tool to solve a lot of problems. And it does a pretty darn good job.

    PS: Love the --balance option!

  2. I sign this 110%!

    It’s unbelievable how many people, out of fear I think, try to “demolish” the use of Node because “it won’t do USE CASE X well”. People don’t really understand that “the right tool for the job” and “what you feel more comfortable with” do not diminish anyone else’s right to use, say, Bash to write an OGG encoder, or whatever.

  3. well said … and keep up the great work!

  4. Adam Jones says:

    Ugh, this is missing the point. Well, at the least it’s missing the target audience.

    Most of the people that are doing the “suggesting” here know the limitations of evented servers. They’ve been around for ages, and “blocking the event loop with CPU usage” is like the hello world of evented server bugs.

    The response here is to mindless advocacy of Node as some sort of scalability silver bullet. I know the points under “debate” here are obvious, you know they’re obvious, anyone with half a clue knows that they are obvious.

    We should be straightening out the people that are getting lost in hype, instead of bickering about the details of an (at least) twenty-year-old server design or having pointless arguments about the best way to implement various math functions.

  5. A nice succinct response. One aspect that I think is being glossed over is the very small number of problems that are compute bound where you would even consider using a language like JavaScript. If you are working on something that is truly compute bound, you’ll likely want to be working in a language that gives you fine-grained control over multiple threads, the size of your working set for cache optimization, possibly access to the GPU, etc. But as I mentioned in my post at http://blog.cull.tv/post/10557539851/nodejs-crawling-culling-machinelearning there are a growing number of areas that have been traditionally thought of as compute-bound but are actually I/O-bound due to large distributed data sets.

  6. Zach says:

    I applaud this blog post for blatantly stating the actual use cases for Node, rather than marketing it as a technological Swiss Army knife capable of turning mediocre programmers into super programmers. To be honest, I’m pretty sure that’s where a lot of the recent backlash originated.

  7. Generally in agreement here on the usefulness of Node.js.

    But, having *actually* written a video encoding service that sees a fair amount of use, I need to make it clear that just forking an ffmpeg process for each video upload is a terrible approach. Unless you really don’t care about your site being up or fast or surviving more than about two users, you really need to start with a job queue (RabbitMQ, beanstalk, or something similar). Even if your Node part is well written and returns quickly, if 20 (or 200 or 2000) users all upload a video at the same time, your server will get crushed by that many ffmpeg processes all trying to run at the same time, each consuming memory, CPU, and disk I/O. Starting with a job queue lets you at the very least limit the number of parallel jobs so your web server can keep running smoothly, and it probably sets you up nicely for having other “worker” machines so you can do the encoding elsewhere and scale that out horizontally. I can speak for at least RabbitMQ being so easy to get started with and build upon that there’s no reason not to go that way. I’m sure someone could write (and probably has written) a job queue with Node and it would be fine. But please don’t assume or lead anyone else to assume that just “forking out to ffmpeg” is a reasonable approach to scaling a video encoding service.

    Node is great and has many useful applications, but I think you picked a particularly bad example in this case.

    • ryandahl says:

      Sure. The point isn’t that video encoding services should fork ffmpeg on each request; the point is that people shouldn’t put computationally heavy things into their server event loop. Dispatching to RabbitMQ illustrates the point just as well.

    • Why do you need a message queue? A simple deque is easy to build and manage. You can simply run a LoopingTask which keeps track of each of the current ffmpeg subprocesses and refuses to start new ones as long as the maximum number of subprocesses is running, and then wrap each subprocess with a tracking object to report back on its status. You could even get rid of the LoopingTask timer and be totally event-driven: whenever a video is uploaded, queue a new subprocess and flush the queue until the maximum number of subprocesses is reached; when a subprocess exits, do that queue-flush again. Now you’re web-scaling with portals. :3

      • Corbin, at a simple level, that’s what the message queue is doing. RabbitMQ, e.g., happens to be already written, insanely reliable, faster than you would need for a video encoding job queue, has client libraries for just about any language you might want to use, and provides a ready path to growing out to a cluster with more complex routing and persistence. So, yeah, you might be able to roll your own queuing system fairly quickly, but I can install RabbitMQ more quickly, know that it won’t deadlock or lose jobs on me, and when the system inevitably grows out to include multiple worker machines that need to be coordinated (since it really only takes a couple of parallel encodings of 2GB HD videos to bring a typical web server to its knees), I won’t have to go back to the drawing board.

  8. NegativeK says:

    I fear you may be feeding the trolls.

    However, I did learn about --balance, which I look forward to playing with, so perhaps it’s all worth it. 😉

  9. Einthusan says:

    Can you guys tell me what’s wrong with this node.js performance test? It seems something is seriously wrong. Can anyone recreate this situation and confirm whether this is correct?

    Performance testing Node.js using Apache JMeter against Lighttpd (MySQL queries & 5000 concurrent)

    http://stackoverflow.com/questions/7658333/benchmarking-node-js-cluster-with-mysql-queries-is-it-really-that-great-compa

  10. Emir says:

    I’m still not sure about node. For scalable network architectures it doesn’t provide much in the way of a backbone; however, it is a decent way to receive web requests.

    At the moment I think ZeroMQ provides a decent distributed network layer. Further, it’s language-agnostic for the most part. node.js could be used as an input and endpoint into a ZeroMQ processing network; however, so could Mongrel 2 or one of the Python event-based web servers.

    Otherwise, what are the advantages? I prefer to code in Python. I like to be able to drop down into C when necessary. If there is some great library for search indexing or document parsing that I need then I like to be able to code in Java. ZeroMQ provides a way that these different solutions can communicate. Forking processes is often not an efficient solution.

    It seems to me that the most workable use case for node is to receive requests and orchestrate responses with as little processing as possible being performed in JavaScript. However, it’s missing the multitude of features that a mature web server like nginx, Apache or lighttpd might have. If node is just going to serve as an input and endpoint into a processing network, then why not use nginx with an SCGI adapter and your favourite dynamic language instead? Why not use Mongrel 2?

  11. Jorge says:

    Is this a joke?

    Who’s pretending to transcode videos in the main thread?

    You know very well that the OP was about smallish CPU-bound tasks that block node’s event loop. And a process for each of these is far far far from an ideal solution.

  12. --balance? Does this mean the end of cluster/multi-node and friends?

  13. TJGodel says:

    I don’t understand why people are attacking node.js. It’s a tool with a purpose, and it solves certain problems; otherwise it would not have the huge adoption curve. I haven’t seen any marketing of node.js as a Swiss Army knife.

    • Jorge says:

      Nobody is attacking node. I’m listed in the AUTHORS file because I have contributed several times because I love it. But it has this flaw. It does not help to deny it. Recognize it and devise a good solution. That’s what would help.

  14. toni tabak says:

    I just discovered Node because of CoffeeScript and I am really excited about it. I think it is mind-blowing technology. I love .js, and now I love it much more. Thanks, @ryah, for this and for the great presentation.

  15. marcitolopes says:

    Isn’t a better solution to use nginx directly for asynchronous I/O-bound tasks over the HTTP protocol?

  16. Thanks, Ryan, for a well-reasoned post on what Node is and is not good for.

    Adam Jones wrote, “we should be straightening out the people that are getting lost in hype.” One of my concerns about Node is that some of this hype comes from Joyent itself. For example, see:

    http://www.joyent.com/products/node-js/

    Perhaps some amount of hype is inevitable when marketing a software development tool to non-developers. And maybe Node.js is useful for such a large subset of today’s web applications that Joyent’s hype is justified. I’m just concerned that it has the potential to mislead. I have nothing against Node or Joyent in general; I’m just expressing a concern.

  17. $ telnet nodejs.org 80
    Trying 8.12.44.238...
    Connected to nodejs.org.
    Escape character is '^]'.
    GET / HTTP/1.0
    Host: NodeJs.org

    HTTP/1.1 200 OK
    Server: nginx

  18. Arindam says:

    Can we get some more info on “In the case of video encoding this is accomplished by forking out to ffmpeg”? How is a separate process spawned from node.js? Is there a detailed discussion somewhere?

    In which release will we have the “--balance” option on the command line?
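The event-driven queue described in the reply under comment 7 (start jobs until a maximum is reached, and flush again whenever one exits) can be sketched in a few lines. This is only an illustration under stated assumptions: `createQueue` and `startEncoder` are hypothetical names, an `ffmpeg` binary is assumed to be on the PATH, and jobs live only in memory, so the sketch does not address the persistence and multi-machine concerns raised in the follow-up reply.

```javascript
const { spawn } = require('child_process');

// Run at most `limit` jobs at a time; the rest wait in FIFO order.
// `start(job, done)` begins a job and must call `done` once when it ends.
function createQueue(limit, start) {
  const waiting = [];
  let running = 0;

  function flush() {
    while (running < limit && waiting.length > 0) {
      const job = waiting.shift();
      running += 1;
      start(job, () => {
        running -= 1;
        flush(); // a slot opened up, so start the next waiting job
      });
    }
  }

  return {
    push(job) { waiting.push(job); flush(); },
    stats() { return { running, waiting: waiting.length }; },
  };
}

// Hypothetical starter: one ffmpeg process per job ({ input, output } paths).
function startEncoder(job, done) {
  const child = spawn('ffmpeg', ['-y', '-i', job.input, job.output], { stdio: 'ignore' });
  let finished = false;
  const once = () => { if (!finished) { finished = true; done(); } };
  child.once('error', once); // the binary could not be started
  child.once('exit', once);  // the encoder ended, successfully or not
}
```

A server might create `const queue = createQueue(2, startEncoder)` at startup and call `queue.push({ input, output })` from its upload handler. The limit of two is arbitrary here; the right value depends on the machine.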
