N|Solid OSS Release
Origin of N|Solid
In November of 2014, when NodeSource was still a small consulting group, my teammates Dan Shaw, Rod Vagg, and I were having dinner after a customer engagement, discussing how to bring Node.js production deployments to the same level of polish and tooling capability as the other runtimes our customers were already employing. The power and flexibility of Node.js helped it take off like a rocket, but the tooling to make sure that it was behaving properly had (and has) been a lot slower to keep up, or has relied on jury-rigged tooling designed for completely different runtime paradigms, effectively trying to shove a hexagonal peg into a square hole. There was a general lack of quality information, guidance, or practices around putting Node.js into production at all.
Achieving this sort of parity and filling these holes in the community required solving several problems simultaneously: fitting Node.js into modern production infrastructures, having reliable deployments and meaningful success metrics for expanding and internally evangelizing Node.js adoption, and sometimes simply having any idea of what is going on in these Node.js production systems at all.
Between our existing expertise with distributing Node.js builds and our internal Node.js expertise itself, we realized that if anyone was going to provide something like this for the Node.js Enterprise community, it would have to be us.
We have always believed and seen firsthand that there is a large number of teams and organizations that could benefit from an augmented set of tooling. Over the years, many of the people working on N|Solid were also core team members of the Node.js project, keeping an eye on industry needs that were often being deferred by the project. Foremost of these people is Trevor Norris, who has been our expert with his hands deepest in the V8 and Node C++ internals and continues to drive the vision and details of the N|Solid runtime. The broader community shares many of the same values when it comes to performance, the flexibility of JavaScript, and the power of the community and its resources such as npm, but we wanted to focus our attention and efforts on supporting the needs of those of us running important, secure, high-throughput production environments.
So we took it upon ourselves to tailor Node.js a business suit and help it work well with others. Part of what we came up with became N|Solid: an instrumented Node.js runtime and a purpose-built inspection console, a tool both to guide teams into a well-structured production environment and to provide a devtools-like introspection and analysis interface to work with it. We wanted it to provide out-of-the-box compatibility with industry-standard monitoring and other infrastructure tooling and to harden its security profile, for example by disabling potentially unsafe features. Essentially, we wanted the ability to make decisions about the runtime that might make it less effective for small projects or experimental work in favor of a hardened runtime with guard rails, specifically designed to slot into best-practice production infrastructures.
An example of this was the ability to override core Buffer allocation to zero-fill allocated memory. It took two more Node.js LTS releases after the initial N|Solid release with this feature for Node.js core to come up with an upstream permanent solution to the problem. We were able to provide protection for our clients immediately and seamlessly transition them to the upstream solution when it became available.
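To make the zero-fill concern concrete, here is a minimal sketch using the Buffer API as it exists in current Node.js core. It does not show how N|Solid implemented its override; the isZeroFilled helper is a hypothetical name used only for illustration.

```javascript
// Returns true when every byte in the buffer is 0.
// (Hypothetical helper for illustration; not part of Node.js or N|Solid.)
function isZeroFilled(buffer) {
  return buffer.every((byte) => byte === 0);
}

// Buffer.alloc() always hands back zero-filled memory.
const safe = Buffer.alloc(16);
console.log(isZeroFilled(safe)); // true

// Buffer.allocUnsafe() skips the zero-fill for speed, so the memory may
// still hold whatever was there before. Clear it before exposing it.
const scrubbed = Buffer.allocUnsafe(16).fill(0);
console.log(isZeroFilled(scrubbed)); // true
```

The design trade-off is the usual one: the unsafe variant is faster, but uninitialized memory can leak old data if it is ever sent to a client before being overwritten.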
The concept of N|Solid originated from our collective experience running Node.js applications in production and helping our customers with theirs. In addition, a good chunk of our business is helping people productionize and stabilize their Node.js environments, so we needed these tools in order to adequately research and analyze these customer issues. Much like any good product, its origin is building tools to solve our own problems first.
The Challenge of Node.js Observability
The fundamental nature of application performance is that everything going on in your code breaks down to CPU instructions and work that must be done. Instrumentation is also work, and the way it's implemented can be in heavy contention with your own business logic, especially on platforms such as Node.js that have single-threaded bottlenecks such as the event loop. Without a separate agent thread like the N|Solid agent, at some point the event loop must stop doing application work to collect metrics, crunch numbers, and send them over the wire to the monitoring endpoint. This is your only option if your instrumentation is written in JavaScript and runs on the same event loop as the application.
N|Solid intentionally sequesters as much of this as possible to its own execution thread that works in parallel to Node.js. The work is still being done, but no longer in a way that is in contention with your own application for its single-threaded resources. This also enables us to detect and interact with a stuck Event Loop in a way that no other tool can.
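The contention described above is easy to observe. The sketch below is not N|Solid code: it uses a synchronous busy loop as a stand-in for in-process instrumentation work and measures how long a 0 ms timer has to wait behind it. The measureTimerDelay name is hypothetical.

```javascript
const { performance } = require('node:perf_hooks');

// Schedules a 0 ms timer, then blocks the event loop for `busyMs`
// milliseconds (a stand-in for metrics crunching done in JavaScript on
// the application's own thread). Resolves with how long the timer
// actually waited before it could run.
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);
    while (performance.now() - scheduledAt < busyMs) {
      // Spin: nothing else on this thread can run until we return.
    }
  });
}

measureTimerDelay(50).then((delayMs) => {
  console.log(`0 ms timer fired after ${delayMs.toFixed(1)} ms`);
});
```

The timer cannot fire until the synchronous work finishes, so the reported delay is always at least the busy time. Moving that work to another thread removes it from the application's critical path.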
Every tool you add to your platform to improve observability and capture information adds overhead. N|Solid aims to provide a single, low-overhead agent that can be shared across all of your tooling needs and that is extremely tightly bound to the specifics of Node.js.
Check out our benchmark tool to see how N|Solid compares when it comes to the cost of observability.
The Technical Details
Node.js is a small engine of amazingness, combining the V8 JavaScript engine with a core suite of libraries to provide an extremely fast and flexible runtime environment for JavaScript on the server. Developers generally approach runtime engines like little black boxes: as long as it runs their code the way they expect, what is actually going on doesn’t matter all that much. The reality is that the asynchronous model Node.js uses is radically different from most other established platforms, and that complexity often results in confusion about what’s going on.
Considering all of our goals around what data we needed access to for both introspection and hardening and the additional goal of near-zero contention with application performance, we decided the only solution would be to build our own version of the runtime with our additional changes patched in. This also allows users to use N|Solid by simply using the nsolid binary as if it were the node binary, because it is! To your application, N|Solid is an environment change only, and can be tried without changing a single line of your application code.
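Because the swap is invisible to application code, about the only way a script can tell which binary launched it is to ask the runtime. The sketch below assumes that N|Solid adds an nsolid entry to process.versions. Treat that property as an assumption to verify against the N|Solid documentation. The describeRuntime helper is hypothetical.

```javascript
// ASSUMPTION: N|Solid exposes its own version as process.versions.nsolid.
// On stock Node.js the property is simply undefined.
function describeRuntime(versions = process.versions) {
  return versions.nsolid
    ? `N|Solid ${versions.nsolid} (Node.js ${versions.node})`
    : `Node.js ${versions.node}`;
}

// Same script, same output format, whichever binary started it.
console.log(describeRuntime());
```

Nothing else in the application needs to branch on the result, since the rest of the code runs unchanged under either binary.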
There’s one small added complexity of doing it this way, though: we need to make builds of N|Solid for every supported version of Node.js on every platform that our customers might require. This meant our changeset needed to be consistently applied across multiple changing upstream branches and built on a build farm with every possible supported architecture. Fortunately, NodeSource was and remains the top community resource for making and distributing builds of Node.js. The odds are extremely good that if you’re using Node.js, we built it for you on the same servers where we build N|Solid.
The rough architecture of N|Solid is a native C++ thread and a matching JavaScript module built directly into Node.js that can access internal hooks and send the results upstream in a variety of ways, such as OpenTelemetry or StatsD. Foremost among these is the N|Solid Console, which provides fully wired access to all of the runtime features by making use of the bi-directional N|Solid Agent API. This bi-directional communication layer with the agent thread is what enables something akin to devtools, allowing limited interaction with a live Node.js process, even one potentially running in production environments.
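To give a sense of what "sending results upstream" over StatsD involves, here is a minimal sketch of the StatsD plain-text line format sent over UDP with Node.js core's dgram module. It is not how the N|Solid agent is implemented (the agent does this natively, off the main thread). The helper names, the metric name in the usage comment, and the default host and port are assumptions for illustration.

```javascript
const dgram = require('node:dgram');

// StatsD metrics are plain-text lines of the form <name>:<value>|<type>,
// where the type is e.g. "g" (gauge), "c" (counter), or "ms" (timer).
function formatStatsd(name, value, type = 'g') {
  return `${name}:${value}|${type}`;
}

// Hypothetical helper: fire-and-forget a single gauge over UDP.
// The host and port defaults are assumptions; 8125 is the conventional
// StatsD port, but use whatever your backend listens on.
function sendGauge(name, value, { host = '127.0.0.1', port = 8125 } = {}) {
  const socket = dgram.createSocket('udp4');
  const payload = Buffer.from(formatStatsd(name, value, 'g'));
  socket.send(payload, port, host, () => socket.close());
}

// Example usage (not executed here):
// sendGauge('myapp.heap_used_bytes', process.memoryUsage().heapUsed);
```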
It is vital that N|Solid retains 100% compatibility with Node.js, including the entire npm ecosystem. The community was and still is seeing a significant amount of framework churn, and we wanted to sit outside of the framework discussions because we understand just how many different frameworks are being used in production right now. We wanted to make sure we can support these frameworks in what they do, but also provide a tool for comparing and selecting between frameworks.
We want N|Solid to play well with others, so we made it aware of community practices and standards, such as package.json and common Node.js environments. As the project adopts new features and standards, N|Solid also adapts.
Our tooling is built around the runtime engine itself, treating each process (and potentially worker thread) as an individual unit. It collects a wide set of metrics and supports interactive introspection, such as CPU profiling or Heap Snapshot collection, from live processes without having to stop them or start a canary process and hope that it reproduces the observable behavior. We found it essential to provide the ability to identify and inspect a suspect process while it is still alive, enabling you to interrogate the rogue process itself instead of going through the frustrating process of trying to reproduce the same behavior in a lab environment.
Node applications are often large microservice installations, sharing state across potentially thousands of processes. We wanted the N|Solid Console to be a tool to expose the information to a central repository that could manage and inspect the results and let you do some limited interactive introspection remotely. This central location for your entire production installation lets you see everything at a glance, but still dig into the details of individual processes. This coordination aspect of the N|Solid Console also allows it to compare different processes. Read more about anomaly detection and snapshot diffing in our documentation.
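For comparison, this is roughly what on-demand CPU profiling looks like using only the inspector module that ships with Node.js core. It is a sketch of the stock approach, not the N|Solid API. It has to be wired into the application ahead of time and is driven from the application's own JavaScript. The captureCpuProfile helper name is hypothetical.

```javascript
const inspector = require('node:inspector');

// Profiles the current process for `durationMs` milliseconds using the
// in-process inspector session, then resolves with the CPU profile object.
function captureCpuProfile(durationMs) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session();
    session.connect();

    const fail = (err) => {
      session.disconnect();
      reject(err);
    };

    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return fail(enableErr);
      session.post('Profiler.start', (startErr) => {
        if (startErr) return fail(startErr);
        setTimeout(() => {
          session.post('Profiler.stop', (stopErr, result) => {
            if (stopErr) return fail(stopErr);
            session.disconnect();
            resolve(result.profile);
          });
        }, durationMs);
      });
    });
  });
}

captureCpuProfile(100).then((profile) => {
  console.log(`captured ${profile.nodes.length} call-tree nodes`);
});
```

The resulting object can be serialized with JSON.stringify and saved with a .cpuprofile extension for loading into Chrome DevTools.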
N|Solid Features
- Robust APIs: Benefit from JavaScript and C++ APIs’ flexibility and power.
- Monitoring Data: N|Solid allows for the transmission of a wide array of monitoring data, encompassing system metrics, Event Loop Utilization, worker threads, and numerous specialized Node.js metrics to third-party providers such as Datadog, New Relic, and Dynatrace.
- OpenTelemetry and Tracing: Send OpenTelemetry-compatible traces to supported third-party providers, ensuring comprehensive observability.
- StatsD Compatibility: Transmit monitoring information using StatsD to any compatible backend.
- Environment Variable Utilization: Use all available environment variables at runtime.
- Manual Control over CPU Profiles and Heap Snapshots: Gain the ability to manually capture CPU profiles and heap snapshots using the JS or C++ API.
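As one illustration of the metrics listed above, Event Loop Utilization can be sampled in stock Node.js with the perf_hooks module. The sketch below uses only that core API, not the N|Solid API, and sampleEventLoopUtilization is a hypothetical helper name.

```javascript
const { performance } = require('node:perf_hooks');

// Resolves with the event loop utilization over the next `intervalMs`
// milliseconds: { idle, active, utilization }, where utilization is the
// fraction of the interval the loop spent doing work rather than idling.
function sampleEventLoopUtilization(intervalMs) {
  return new Promise((resolve) => {
    const start = performance.eventLoopUtilization();
    setTimeout(() => {
      // Passing the earlier reading returns the delta for the interval.
      resolve(performance.eventLoopUtilization(start));
    }, intervalMs);
  });
}

sampleEventLoopUtilization(100).then(({ utilization }) => {
  console.log(`ELU over 100 ms: ${(utilization * 100).toFixed(1)}%`);
});
```

A value near 1 means the loop had almost no idle time during the interval, which is usually a sign that latency is about to suffer.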
The N|Solid release schedule is tied directly to the Node.js LTS release schedule. Because development on what’s called the Current Node.js line is so flexible, we wait until the release has been solidified into its LTS form before creating an N|Solid version. This means that all active LTS lines of Node.js have a corresponding N|Solid release, and we aim to release new versions of N|Solid within 24 hours of the upstream Node.js LTS release. If you are stuck on a legacy version of Node.js, let our support team help you update to a current LTS version to ensure you are still getting vital security patches.
In summary, the N|Solid Runtime is the Node.js runtime, augmented with additional capabilities to enable what we saw as operational best practices. The N|Solid Console is the coordinated monitoring and introspection tool designed to fully leverage the N|Solid runtime and the combined experience of encountering and solving our own and our customers’ actual problems in production environments.
Why Open Source the N|Solid Runtime?
This is something we’ve considered for many years. We have always been strong supporters of the community and believe in the immense value and impact of open source. We have remained committed to the Node.js ecosystem as active contributors and as a leading distributor of the OSS binary packages.
Earlier this year we came to the conclusion that the timing was right: our development roadmap had reached a point where we had something meaningful to provide to the community, and we could continue to deliver the value and support our Enterprise and SaaS customers expect from our commercial offering. Further, we envision that collaboration with the global developer community will create a brighter and more innovative future for N|Solid and set a new standard for enterprise needs.
We think everyone should be running N|Solid on their business platforms where they are using Node.js. Throughout its existence, we’ve focused on compatibility with other production tooling, even tools we compete with, because most of these are not tightly coupled to Node.js. Usually they are polyglot and must cater to the lowest common denominator across platforms. We want to encourage the proliferation of N|Solid and the advancement of Node-paradigm-specific tooling by putting the runtime directly into the hands of the Open Source community. We see an opportunity for developers to build new connectors and integrations with other tools and to support the collective creativity of the community. We get the chance to foster even greater collaboration and partnerships with other providers that want to add the value of N|Solid to their own platforms and tools.
We’re open source engineers at heart. We believe in the power of community code and that having the source available creates an environment of trust and empowerment. We feel like we’ve only been able to scratch the surface of what’s possible here, and we want to bring the community into the project and hope that we can get you all excited about it too.
Read more about how to get involved in our contribution guidelines!
The Future of N|Solid
We have a lot of plans already for N|Solid and welcome you to participate in their development. These are some of our upcoming initiatives:
- Custom Metrics: Capture and transport your own application-specific metrics via the N|Solid API
- Heap Profiling: Locate memory leaks by profiling memory allocation over time
- Async Stack Traces: Connect stack traces across the libuv boundary
- Improved APM Integrations: Allow APM vendors to use the N|Solid agent thread for metrics calculation and transport to move overhead off of the main process
- Implement OpenTelemetry standard for metrics
- Implement OpenTelemetry standard for logging
These aren’t our only ideas, and we’re interested to see what the community comes up with as well. Expect to hear more about our plans as we continue to work through the open source release and documentation process. There are so many potentially valuable integrations throughout the development process, from IDEs through CI/CD through production tooling. We can’t wait to see where we can take this together!
Backed by the NodeSource Team
N|Solid is backed by the entire NodeSource team, and for those who want a hand in adopting N|Solid or Node.js, we are here to help. From installation and configuration to upgrades, troubleshooting, and performance tuning, our engineers can support your team at every stage in the application development lifecycle.