The Nodesource Blog


Why Asynchronous?

Defining "Asynchronous Programming"

Asynchronous programming refers to a style of structuring a program whereby a call to some unit of functionality triggers an action that is allowed to continue outside of the ongoing flow of the program.

You commonly find asynchronous programming in event-based systems, particularly user-interfaces such as the browser. This style of computing is not new and has evolved as an essential method for dealing with complex systems interacting with the messiness of the real world. Increasingly common, however, is asynchronous programming in the form of non-blocking evented I/O as witnessed in platforms such as Node.js.
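A minimal sketch of the idea in JavaScript (the addLater function here is purely illustrative): an asynchronous call returns immediately, and the action it triggers completes outside the ongoing flow of the program.

```javascript
// A synchronous call: the program waits for the result before moving on.
function add(a, b) {
  return a + b;
}

// An asynchronous call: it returns immediately and the action continues
// outside the ongoing flow of the program, delivering its result later.
function addLater(a, b, callback) {
  setTimeout(function () {
    callback(a + b);
  }, 10);
}

console.log('sync result:', add(1, 2)); // printed first
addLater(1, 2, function (result) {
  console.log('async result:', result); // printed last
});
console.log('addLater has been called, but the program flow continues'); // printed second
```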

The Journey to I/O and Beyond

Programming was originally a way of providing instructions to a computer system about how to execute a sequence of computations, from step 1 to step N where data is transformed along the sequence. The ability to store and retrieve persistent data became an important part of the process and this brought in the first forms of I/O, Input and Output. I/O refers to mechanisms that are used for sending and receiving data outside of the core processing and memory block. New forms of I/O were added as computers became interactive and needed to read from human interaction tools such as mice and keyboards. Networking added yet another kind of I/O but the defining characteristic of I/O is latency.

Throughout the journey from pure CPU and memory-based computing through to complex forms of I/O, programming stayed roughly the same in its sequential nature. I/O merely represented additional steps in the sequence of operations. Even in relatively modern programming languages there remains no distinction between I/O and non-I/O operations:

// A Java file reader
// (imports assumed: java.io.BufferedReader, java.io.FileReader)
BufferedReader br = new BufferedReader(new FileReader("input.txt"));
try {
  StringBuilder sb = new StringBuilder();
  String line;

  while ((line = br.readLine()) != null)
    sb.append(line).append("\n");

  System.out.print(sb.toString());
} finally {
  br.close();
}

Note how the operations that block to wait for I/O in the above code are mixed in with ordinary processing to such a degree that it's only the class and method naming that gives you a clue about what's actually going on.

The lack of distinction, while likely seen as a positive achievement by language designers, ends up reinforcing to the programmer the fallacy that I/O is simply another processing operation, in the same league as reading from memory, or performing an in-CPU calculation. But this is obviously false, and we've abstracted away a very important piece of information.

This sequential style of programming, where expensive off-CPU operations are mixed up with cheap on-CPU operations, is a remnant of the origins of computing. But it's also due to our strong natural desire to think of computer instructions as a step-by-step series of operations. It is much easier for us to conceptualise what our programs are doing if they are simply a list of operations from 1 to N, with occasional loops and jumps in between, rather than a jumble of many sub-sequences rolled into one.

But I/O is different; the costs are enormous:

Class    Operation                        Time cost
-------  -------------------------------  ---------------
Memory   L1 cache reference               1 ns
         L2 cache reference               4 ns
         Main memory reference            100 ns
I/O      SSD random-read                  16,000 ns
         Round-trip in same datacenter    500,000 ns
         Physical disk seek               4,000,000 ns
         Round-trip from US to EU         150,000,000 ns
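A quick back-of-the-envelope calculation using the figures from the table makes the gulf concrete: count how many main-memory references fit inside a single I/O wait.

```javascript
// Time costs in nanoseconds, taken from the table above
const mainMemoryRef = 100;
const ssdRandomRead = 16000;
const diskSeek = 4000000;
const usToEuRoundTrip = 150000000;

// Main-memory references the CPU could perform in the time
// it takes one I/O operation to complete:
console.log(ssdRandomRead / mainMemoryRef);   // 160
console.log(diskSeek / mainMemoryRef);        // 40000
console.log(usToEuRoundTrip / mainMemoryRef); // 1500000
```

A single trans-Atlantic round-trip costs as much as a million and a half trips to main memory, yet most languages write both as a plain, indistinguishable statement.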

Even for the experienced programmer who fully understands the nature of the costs of I/O, most languages and platforms present these two classes of operations as one. In fact, they then introduce additional programming and cognitive overhead when you try to deal with I/O as a special-case; just consider the complexity that threads add to a program.

Enter Node.js

Node.js is one of the first platforms to clearly expose the distinction between I/O and everything else. I/O is treated differently under the hood, and that difference is exposed all the way up through the platform's primitives. When working with Node.js, the programmer is rightly forced to see I/O as a separate class of operation.

In other languages and platforms, you can pass off I/O to separate threads or perform non-blocking I/O, but you have to go out of your way to do this. In Node.js this is a core platform feature and should be embraced.

When it comes to speed, compiled languages may have the benefit of being "closer to the metal" and the JVM may be a highly performant beast, but Node.js is fast because programmers are forced to write fast programs by keeping blocking I/O out of the program flow.

How Node.js Does I/O

JavaScript is single-threaded, but Node.js is not. By default, a Node.js process will spin up four "worker threads" (libuv's threadpool, whose size can be tuned with the UV_THREADPOOL_SIZE environment variable) for performing filesystem I/O. Reads and writes are distributed across the worker threads and these operations may block, but they never block the JavaScript thread unless you use one of the fs.*Sync() methods.

Socket I/O is performed on the main JavaScript thread, but only via non-blocking system calls. In this way, Node.js can handle a large number of active sockets in a single thread and never block while waiting for data on the other end of a socket: it simply polls for available data and deals with it when it's there.

Embrace It

There is always a strong push towards serial-style programming. Even with Node.js this can be seen in the popularity of heavyweight abstractions designed to undo the asynchronous nature of the platform. Our brains want things to be sequential because that makes it easier to reason about them. But the world is not synchronous, and attempting to keep programming in a synchronous box is a gross form of abstraction that denies the realities our programs must deal with.

It can be difficult to make the mental shift to Node.js. But if you can accept that dealing with I/O as a special case is a worthy goal, then you'll ultimately end up with a powerful tool for writing highly performant code, where the performance comes from the way you write your code and not just from the platform itself.