Node.js Performance Monitoring - Part 3: Debugging the Event Loop
In this article, we’re going to dig a bit deeper into how to diagnose and resolve common Node.js bottlenecks in the event loop by understanding some of the key metrics around the event loop.
Understanding Parts of the Event Loop
A "task" is a unit of work to perform after completing an asynchronous operation. This should not be confused with event emitted on the
EventEmitter; which only counts for a portion of where time is spent in task execution. For example, processing an http request also includes parsing the headers; which is done before the
'request' event is called.
The event loop cannot process multiple tasks simultaneously. Instead tasks are placed in a queue and processed sequentially. This means one one task can be held up because the one before it is taking a long time to complete. A bottleneck can be identified by observing an abnormally large amount of lag between processing asynchronous tasks. Because all task execution originates from the event loop it is the best location for initial bottleneck measurements.
An event loop without lag (we'll consider "healthy") will show that no task has been queued to wait for the completion of another task. Instead they are all processed the moment they are received.
In Node.js it is impossible to determine the moment a task is received. Only when it is ready to be processed. This is because Node.js relies on the operating system's kernel to handle the task queue. Node.js itself has no insight into this. Thus if 2 tasks are ready to be processed at the same time Node can only know the second task is available after the first has been completed. So, for example, if your application requires low latency responding to TCP requests it is important to know that no two tasks have overlapped.
High event loop lag, low event loop idle
When an application is simultaneously experiencing both high lag and low idle time in the event loop, you can expect to find a synchronous operation of some form as the source of the bottleneck in your application.
There are different types of synchronous operations that need to be considered.
The more your application needs to execute in CPU in a single cycle of the event loop, the more you are going to occupy the event loop and therefore prevent other tasks from being processed in a timely manner. You may want to consider offloading your Monte Carlo simulations to a separate process.
Combing through your code and guessing what might be causing your event loop delay is a tough chore without proper insight. By utilizing N|Solid, you can set an event loop threshold to trigger an alert when your event loop is blocked for a given period of time, in addition to triggering a CPU profile that can be examined later. From within the N|Solid console, you can then use the CPU profiler to explore and identify where that synchronous execution is taking place.
Loading your CPU profile as a flame graph lets you dig down into your call stacks to see which calls are occupying the most time in your event loop. Here’s a flame graph as an example:
You will likely find that there is a tradeoff to be made between responsiveness and throughput when you are CPU-constrained. If you break CPU work up into pieces to be executed separately, you free up your event loop to be more responsive to all of the additional work it needs to attend to, but you will likely pay a cost in the overhead of "chunked" work.
It’s essential that you establish benchmarks for your application that measure the metrics that are important to you and then tune your application to maximise these metrics. Perhaps throughput is not as essential as responsiveness. Perhaps you care more about first-byte-to-client than the total amount of time it takes to deliver the entire content. Perhaps you simply want to scale the number of clients using the smallest amount of hardware and some reduction responsiveness is acceptable to achieve that scale. Measure what is important before you begin to tune arbitrarily because there are always trade-offs.
Low event loop lag, High event loop idle
Typically you’re going to want your Node.js application to perform with low lag and high idle time in the event loop—this is usually a sign of an efficient application. Unfortunately, it can also mask efficiency issues, giving you a false sense of confidence that is only exposed during unexpected peak usage.
If you are experiencing timeouts, or general performance degradation in an application that’s demonstrating low lag, high idle time in the event loop, a likely cause of the issue is that there’s an I/O or downstream resource that’s acting as a bottleneck. In this case, your application isn’t maximizing the use of the local resources available to it, particularly the CPU.
The I/O resource is likely to have a performance constraint of its own. This may either be an expected responsiveness issue, i.e. the resource simply isn’t designed to respond as quickly as you need, or it could be that the service is overloaded and your Node.js application is able to throw much more traffic at it than it can reasonably handle.
This is a very common scenario when Node.js is inserted into an existing environment and has to make external calls to existing services. An experience that NodeSource sees repeatedly when assisting companies migrate to Node.js is that their new application can handle much more throughput than their existing stack can, thereby exposing bottlenecks that have always existed but were never noticed.
In a services-based architecture, the performance of your application is dependent on every service involved in the handling of each transaction. Diagnosing problems of this kind requires insight into each service that is part of the transaction change and comparing the event loop and resource usage of each. When you have the kind of side-by-side comparison afforded by the N|Solid console, it is generally fairly easy to spot the lowest-hanging-fruit in your path to optimization.
Or, maybe you just have a slow disk and it’s time to upgrade to an SSD. Either way, N|Solid should be able to help you pinpoint the source of your performance constraint and give you a clue to solving it.
Monitoring with N|Solid
Instrumenting scripts to gain greater insight into the event loop is incredibly inaccurate or impossible without a native C++ extension to extend the core functionality of Node.js.
Currently, N|Solid is the easiest way of debugging and gaining insight into your Node.js applications, a complete drop-in replacement with no code modifications.
With the N|Solid StatsD interface you can easily view real-time metrics into the health and performance of your Node.js event loop in your preferred APM tool. N|Solid provides nearly 50 application metrics with the following focused specifically on providing insight into the event loop.
- loopIdlePercent - Percent of time that Node.js is waiting for I/O ("idle")
- loopEstimatedLag - The amount of time a I/O response may have to wait before being processed
- loopsPerSecond - The amount of event loop "turns" elapsed in the last second
To learn more about your applications’ event loop health and performance, you can try deploying with N|Solid instead of the vanilla Node.js runtime.