Error handling is a pain, and it’s easy to get by for a long time in Node.js without dealing with errors correctly. However, building robust Node.js applications requires dealing with errors properly, and it’s not hard to learn how. If you’re really impatient, skip down to the “Summary” section for a tl;dr.
This document will answer several questions that programmers new to Node.js often ask:
This document is divided into several parts that build on one another:
This document assumes:
function myApiFunc(callback) {
    /*
     * This pattern does NOT work!
     */
    try {
        doSomeAsynchronousOperation((err) => {
            if (err) {
                throw (err);
            }
            /* continue as normal */
        });
    } catch (ex) {
        callback(ex);
    }
}
does not work to handle errors.1
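For contrast, here is a minimal sketch of a variant that does deliver the error. It assumes a hypothetical `doSomeAsynchronousOperation` stand-in (not from any library) that fails on a later tick. The error is passed to the caller's callback from inside the asynchronous callback, rather than thrown after the surrounding try/catch has already returned.

```javascript
// Hypothetical stand-in for an asynchronous operation that fails later.
function doSomeAsynchronousOperation(done) {
    setImmediate(() => done(new Error('operation failed')));
}

function myApiFunc(callback) {
    doSomeAsynchronousOperation((err) => {
        if (err) {
            // Hand the error to the caller instead of throwing it: by the
            // time this runs, the caller's stack (and any try/catch on it)
            // is long gone.
            callback(err);
            return;
        }
        /* continue as normal */
        callback(null);
    });
}

myApiFunc((err) => {
    console.log(err ? 'failed: ' + err.message : 'ok');
});
```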
You should also be familiar with the four main ways to deliver an error in Node.js:
We’ll discuss when to use each of these patterns below. This document does not assume that you know anything about domains.
Finally, you should know that in JavaScript (and Node.js especially), there’s a difference between an error and an exception. An error is any instance of the Error class. Errors may be constructed and then passed directly to another function or thrown. When you throw an error, it becomes an exception.2 Here’s an example of using an error as an exception:
throw new Error('something bad happened');
but you can just as well create an Error without throwing it:
callback(new Error('something bad happened'));
and this is much more common in Node.js because most errors are asynchronous. As we’ll see, it’s very uncommon to need to catch an error from a synchronous function. This is very different from Java, C++, and other languages that make heavy use of exceptions.
It’s helpful to divide all errors into two broad categories:3
People use the term “errors” to talk about both operational and programmer errors, but they’re really quite different. Operational errors are error conditions that all correct programs must deal with, and as long as they’re dealt with, they don’t necessarily indicate a bug or even a serious problem. “File not found” is an operational error, but it doesn’t necessarily mean anything’s wrong. It might just mean the program has to create the file it’s looking for first.
By contrast, programmer errors are bugs. They’re cases where you made a mistake, maybe by forgetting to validate user input, mistyping a variable name, or something like that. By definition there’s no way to handle those. If there were, you would have just used the error handling code in place of the code that caused the error!
This distinction is very important: operational errors are part of the normal operation of a program. Programmer errors are bugs.
Sometimes, you have both operational and programming errors as part of the same root problem. If an HTTP server tries to use an undefined variable and crashes, that’s a programmer error. Any clients with requests in flight at the time of the crash will see an ECONNRESET error, typically reported in Node as a “socket hang-up”. For the client, that’s a separate operational error. That’s because a correct client must handle a server that crashes or a network that flakes out.
Similarly, failure to handle an operational error is itself a programmer error. For example, if a program tries to connect to a server but it gets an ECONNREFUSED error, and it hasn’t registered a handler for the socket’s 'error' event, then the program will crash, and that’s a programmer error. The connection failure is an operational error (since that’s something any correct program can experience when the network or other components in the system have failed), but the failure to handle it is a programmer error.
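The crash described above can be illustrated without a network. The sketch below uses a plain EventEmitter as a stand-in for the socket; the `fakeSocket` name and the hand-built error object are assumptions for illustration only. Emitting 'error' with no listener throws, while the same error is delivered normally once a listener is registered.

```javascript
const { EventEmitter } = require('events');

// Build an error shaped like a refused connection (illustrative only).
function makeRefusedError() {
    const err = new Error('connect ECONNREFUSED 127.0.0.1:1234');
    err.code = 'ECONNREFUSED';
    return err;
}

// Without an 'error' listener, emit() throws the error. In a real program
// nothing would catch it and the process would crash; the try/catch here
// exists only so the demonstration can continue.
const unhandled = new EventEmitter();
let crashed = false;
try {
    unhandled.emit('error', makeRefusedError());
} catch (ex) {
    crashed = true;
}

// With a listener registered, the operational error is simply handled.
const fakeSocket = new EventEmitter();
let handledCode = null;
fakeSocket.on('error', (err) => { handledCode = err.code; });
fakeSocket.emit('error', makeRefusedError());

console.log(crashed, handledCode);
```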
The distinction between operational errors and programmer errors is the foundation for figuring out how to deliver errors and how to handle them. Make sure you understand this before reading on.
Just like performance and security, error handling isn’t something that can be bolted onto a program that has no error handling already. Nor can you centralize all error handling in one part of the program, the same way you can’t centralize “performance” in one part of the program. Any code that does anything which might possibly fail (opening a file, connecting to a server, forking a child process, and so on) has to consider what happens when that operation fails. That includes knowing how it may fail (the failure mode) and what such a failure would indicate. More on this later, but the key point here is that error handling has to be done in a fine-grained way because the impact and response depend on exactly what failed and why.
You may end up handling the same error at several levels of the stack. This happens when lower levels can’t do anything useful except propagate the error to their caller, which propagates the error to its caller, and so on. Often, only the top-level caller knows what the appropriate response is, whether that’s to retry the operation, report an error to the user, or something else. But that doesn’t mean you should try to report all errors to a single top-level callback, because that callback itself can’t know in what context the error occurred, what pieces of an operation have successfully completed, and which ones actually failed.
Let’s make this concrete. For any given error, there are a few things you might do:
There’s nothing you can do to handle a programmer error. By definition, the code that was supposed to do something was broken (e.g., had a mistyped variable name), so you can’t fix the problem with more code. If you could, you’d just use the error handling code in place of the broken code.
Some people advocate attempting to recover from programmer errors — that is, allow the current operation to fail, but keep handling requests. This is not recommended. Consider that a programmer error is a case that you didn’t think about when you wrote the original code. How can you be sure that the problem won’t affect other requests? If other requests share any common state (a server, a socket, a pool of database connections, etc.), it’s very possible that the other requests will do the wrong thing.
A typical example is a REST server (e.g., using restify) where one of the request handlers throws a ReferenceError (e.g., used a mistyped variable name). There are a lot of ways that continuing on can lead to serious bugs that are extremely difficult to track down. For a few examples:
The best way to recover from programmer errors is to crash immediately. You should run your programs using a restarter that will automatically restart the program in the event of a crash. With a restarter in place, crashing is the fastest way to restore reliable service in the face of a transient programmer error.
The only downside to crashing on programmer errors is that connected clients may be temporarily disrupted, but remember:
If disconnecting clients is a frequent problem because a server crashes so often, you should focus on the bugs that cause the service to crash (and make those exceptional) rather than trying to avoid crashing in cases where the code is obviously wrong. The best way to debug these problems is to configure Node to dump core on an uncaught exception. On both GNU/Linux and illumos-based systems, you can use these core files to see not only the stack trace where the program crashed, but the arguments to each of these functions and most other JavaScript objects as well, even those only referenced in closures. Even without core dumps configured, you can use the stack information and logs to make a start at the problem.
Finally, remember that a programmer error on a server just becomes an operational error on a client. Clients have to deal with servers crashing and network blips. That’s not just theoretical — both really do happen in production systems.
We’ve talked about how to handle errors, but when you’re writing a new function, how do you deliver errors to the code that called your function?
The single most important thing to do is document what your function does, including what arguments it takes (including their types and any other constraints), what it returns, what errors can happen, and what those errors mean. If you don’t know what errors can happen or don’t know what they mean, then your program cannot be correct except by accident. So if you’re writing a new function, you have to tell your callers what errors can happen and what they mean.
There are three basic patterns for a function to deliver errors.
For the most part, we’ll lump callbacks and event emitters in the same bucket of “asynchronous error delivery”. If you want to deliver an error asynchronously, you generally want to use one or the other of these (callback or event emitter), but not both.
So, when do you use throw, and when do you use callbacks or event emitters? It depends on two things:
By far, the most common case is an operational error in an asynchronous function. For the majority of these, you’ll want to have your function take a callback as an argument, and you’ll just pass the error to the callback. This works very well, and is widely used. See the Node fs module for examples. If you’ve got a more complicated case like the ones described above, you may want to use an event emitter instead, but you’ll still deliver the error asynchronously.
The next most common case is an operational error in a synchronous function like JSON.parse. For these functions, if you encounter an operational error (like invalid user input), you have to deliver the error synchronously. You can either throw it (much more common) or return it.
For a given function, if any operational error can be delivered asynchronously, then all operational errors should be delivered asynchronously. There may be cases when you know immediately that the request will fail, but not because of a programmer error. Maybe the function caches the results of recent requests and there’s a cache entry with an error that you’ll return to the caller. Even though you know right away that the request will fail, you should deliver that error asynchronously.
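As a rough sketch of the cached-error case, assuming a hypothetical `lookup` function and an in-memory `failureCache` (neither is from a real library): even when the failure is already known, the callback is deferred with `process.nextTick` so that callers always see it fire asynchronously.

```javascript
// Hypothetical cache of recent lookup failures, keyed by name.
const failureCache = new Map();
failureCache.set('bad-host', new Error('lookup failed for "bad-host"'));

function lookup(name, callback) {
    const cached = failureCache.get(name);
    if (cached) {
        // The outcome is already known, but it is still delivered on a
        // later tick so the callback never fires before lookup() returns.
        process.nextTick(callback, cached);
        return;
    }
    // Stand-in for the real asynchronous work.
    setImmediate(callback, null, 'address-for-' + name);
}

const order = [];
lookup('bad-host', (err) => { order.push('callback: ' + err.message); });
order.push('lookup returned');
```

With this shape, `order` records "lookup returned" before the callback entry, which is the same ordering a caller would see for an uncached request.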
The general rule is that a function may deliver operational errors synchronously (e.g., by throwing) or asynchronously (by passing them to a callback or emitting error on an EventEmitter), but it should not do both. This way, a user can handle errors by either handling them in the callback or using try/catch, but they never need to do both. Which one they use depends on how the function delivers its errors, and that should be specified with its documentation.
We’ve left out programmer errors. Recall that these are always bugs. They can also usually be identified immediately by checking the types (and other constraints) on arguments at the start of the function. A degenerate case is where someone calls an asynchronous function but doesn’t pass a callback. You should throw these errors immediately, since the program is broken and the best chance of debugging it involves getting at least a stack trace and ideally a core file at the point of the error. To do this, we recommend validating the types of all arguments at the start of the function.
Since programmer errors should never be handled, this recommendation doesn’t change our conclusion above that a caller can use try/catch or a callback (or event emitter) to handle errors but never needs to use both. For more, see “(Not) handling programmer errors” above.
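A small sketch of how the two kinds of error can coexist in one function, assuming a hypothetical `readConfig` helper: bad argument types are programmer errors and are thrown immediately, while failures of the underlying read are operational and go to the callback.

```javascript
const fs = require('fs');

function readConfig(path, callback) {
    // Programmer errors: the caller passed the wrong types. Throw now,
    // while the stack still points at the offending call.
    if (typeof path !== 'string') {
        throw new TypeError("argument 'path' must be a string");
    }
    if (typeof callback !== 'function') {
        throw new TypeError("argument 'callback' must be a function");
    }

    fs.readFile(path, 'utf8', (err, contents) => {
        if (err) {
            // Operational error (e.g., the file does not exist):
            // deliver it asynchronously through the callback.
            callback(err);
            return;
        }
        callback(null, contents);
    });
}
```

A caller only ever handles errors in the callback; the thrown TypeErrors are not meant to be caught.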
Here’s a summary of these recommendations with some example functions in Node’s core libraries, in rough order of the frequency that each kind of problem comes up:
Example func   Kind of func   Example error       Kind of error   How to deliver   Caller uses
fs.stat        asynchronous   file not found      operational     callback         handle callback error
JSON.parse     synchronous    bad user input      operational     throw            try/catch
fs.stat        asynchronous   null for filename   programmer      throw            none (crash)
Operational errors in an asynchronous function (row 1) are by far the most common case. Use of synchronous functions that report operational errors (row 2) is very rare in Node.js except for user input validation. However, with the release of Node.js version 8, people are starting to promisify these asynchronous functions and use await inside of a try/catch. Programmer errors (row 3) should never happen except in development.
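The promisify-and-await style might look like the sketch below. The `fileSize` helper and the choice to treat a missing file as `null` are assumptions made for illustration; `util.promisify` and `fs.stat` are Node core APIs.

```javascript
const fs = require('fs');
const { promisify } = require('util');

const stat = promisify(fs.stat);

async function fileSize(path) {
    try {
        const info = await stat(path);
        return info.size;
    } catch (err) {
        if (err.code === 'ENOENT') {
            // An operational error this function knows how to handle:
            // report "no such file" as null.
            return null;
        }
        // Anything else propagates to the caller as a rejection.
        throw err;
    }
}

fileSize('/no/such/file/for/this/example').then((size) => {
    console.log(size);
});
```

The try/catch here plays the role that the `if (err)` check plays in a callback; the function still delivers its errors through exactly one channel (the returned promise).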
How do you know what’s a programmer error vs. an operational error? Quite simply: it’s up to you to define and document what types your function will allow and how you’ll try to interpret them. If you get something other than what you’ve documented to accept, that’s a programmer error. If the input is something you’ve documented to accept but you can’t process right now, that’s an operational error.
You have to use your judgment to decide how strict you want to be, but we can make some suggestions. To get specific, imagine a function called “connect” that takes an IP address and a callback and invokes the callback asynchronously after either succeeding or failing. Suppose the user passes something that’s obviously not a valid IP address, like 'bob'. In this case, you have a few options:
Both of these are consistent with the guidelines about operational errors and programmer errors. You’re really deciding whether to consider such input to be a programmer error or an operational error. In general, user input validation functions are very loose. Date.parse, for example, accepts a variety of inputs — that’s basically the point. But for most other functions, we strongly recommend biasing towards being stricter rather than looser. The more your function tries to guess what the caller meant (using implied coercions, either as part of JavaScript or doing it explicitly in your function), the more likely it’ll guess wrong. Instead of saving developers the effort required to be more explicit, you may well do something that wastes hours of the developer’s time to debug. Besides, you can always make the function less strict in future versions if you decide that’s a good idea, but if you discover that your attempt to guess what people meant leads to really nasty bugs, you can’t fix it without breaking compatibility.
So if a value cannot possibly be valid (e.g., undefined for a required string, or a string that’s supposed to be an IP address but obviously isn’t), you should document that it isn’t allowed and throw an error immediately if you see it. As long as you document it, then these are programmer errors, not operational errors. By throwing immediately, you minimize the damage caused by the bug and preserve the information the developer would want to debug the problem (e.g., the call stack, and if you’re using core dumps, the arguments and all of memory as well).
Operational errors can always be handled through an explicit mechanism: catching an exception, processing the error in a callback, handling an “error” event on an EventEmitter, and so on. Domains and the process-wide 'uncaughtException' event are primarily useful to attempt to handle or recover from unanticipated programmer errors. For all the reasons described above, this is strongly discouraged.
We’ve talked about a lot of guiding principles, so now let’s get specific.
This is the single most important thing to do. The documentation for every interface function should be very clear about:
If any of these are wrong or missing, that’s a programmer error, and you should throw immediately.
You’ll also want to document:
All of your errors should either use the Error class or a subclass of it. You should provide name and message properties, and stack should work too (and be accurate).
When you need to figure out what kind of error this is, use the name property. Built-in JavaScript names you may want to reuse include “RangeError” (an argument is outside of its valid range) and “TypeError” (an argument has the wrong type). For HTTP errors, it’s common to use the RFC-given status text to name the error, like “BadRequestError” or “ServiceUnavailableError”.
Don’t feel the need to create new names for everything. You don’t need separate InvalidHostnameError, InvalidIpAddressError, InvalidDnsServerError, and so on, when you could just have a single InvalidArgumentError and augment it with properties that say what’s wrong (see below).
For example, if an argument was invalid, set propertyName to the name of the property that was invalid and propertyValue to the value that was passed. If you failed to connect to a server, use remoteIp to say which IP you tried to connect to. If you got a system error, include the syscall property to say which syscall failed, and the errno property to say which system errno you got back. See the appendix for example property names to use.
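One way this could look in practice is sketched below. The `InvalidArgumentError` class and the `parsePort` function are hypothetical; the property names follow the conventions described in this document.

```javascript
class InvalidArgumentError extends Error {
    constructor(propertyName, propertyValue, reason) {
        super(`invalid value for "${propertyName}": ${reason}`);
        this.name = 'InvalidArgumentError';
        // Details go in properties so callers need not parse the message.
        this.propertyName = propertyName;
        this.propertyValue = propertyValue;
    }
}

function parsePort(value) {
    const port = Number(value);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new InvalidArgumentError('port', value,
            'must be an integer between 1 and 65535');
    }
    return port;
}

try {
    parsePort('http');
} catch (err) {
    // Branch on the name, then read the details from the properties.
    console.log(err.name, err.propertyName, err.propertyValue);
}
```

A single error class covers every bad argument; what differs between cases is carried in `propertyName` and `propertyValue`.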
At the very least, you want:
You should also include enough information in the error message for the caller to construct their own error message without having to parse yours. They may want to localize the error message, or aggregate a large number of errors together, or display the error message differently (e.g., in a table on a web site, or by highlighting a bad user-input form field).
Often you’ll find that your asynchronous function funcA calls some other asynchronous function funcB, and that if funcB emits an Error, you’ll want funcA to emit the same Error. (Note that the second part doesn’t always follow from the first. Sometimes funcA will retry instead. Or sometimes you’ll have funcA ignore the error because it may just mean there’s nothing to do. But we’re just considering the simple case where funcA wants to directly return funcB’s error here.)
In this case, consider wrapping the Error instead of returning it directly. By wrapping, we mean propagating a new Error that includes all of the information from the lower level, plus additional helpful context based on the current level. The verror module provides an easy way to do this.
For example, suppose you have a function called fetchConfig, which fetches a server’s configuration from a remote database. Maybe you call this function when your server starts up. The whole path at startup looks like this:
1. Load the configuration
    1. Connect to the database server. This in turn will:
        1. Resolve the DNS hostname of the database server.
        2. Make a TCP connection to the database server.
        3. Authenticate to the database server
    2. Make the DB request
    3. Decode the response
    4. Load the configuration
2. Start handling requests
Suppose at runtime there’s a problem connecting to the database server. If the connection step at 1.1.2 fails because the connection is refused, and each level propagates the error to the caller (as it should), but doesn’t wrap the error first, then you might get an error message like this:
myserver: Error: connect ECONNREFUSED
This is obviously not very helpful.
On the other hand, if each level wraps the Error returned from the lower level, you can get a much more informative message:
myserver: failed to start up: failed to load configuration: failed to connect to
database server: failed to connect to 127.0.0.1 port 1234: connect ECONNREFUSED
You may want to skip wrapping in some levels and get a less pedantic message:
myserver: failed to load configuration: connection refused from database at
127.0.0.1 port 1234.
On the other hand, it’s better to err on the side of including more information rather than less.
If you decide to wrap an Error, there are a few things to consider:
At Joyent, we use the verror module to wrap errors since it’s syntactically concise. As of this writing, it doesn’t quite do all of this yet, but it will be extended to do so.
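As a rough illustration of wrapping that does not depend on verror, the sketch below uses the standard `cause` option of the `Error` constructor instead. This is a swapped-in technique, not the verror API, and it assumes a runtime that supports `cause` (Node.js 16.9 or later). The `wrapError` helper is hypothetical.

```javascript
// Wrap a lower-level error: prepend context to the message and keep the
// original error reachable through the standard `cause` property.
function wrapError(message, cause) {
    return new Error(`${message}: ${cause.message}`, { cause });
}

// Lowest level: the error as the connection step might have produced it.
const low = new Error('connect ECONNREFUSED');
low.remoteIp = '127.0.0.1';
low.remotePort = 1234;

// Each level adds its own context on the way up.
const mid = wrapError('failed to connect to database server', low);
const top = wrapError('failed to load configuration', mid);

console.log(top.message);

// Walk the chain to recover the original error and its properties.
let root = top;
while (root.cause) {
    root = root.cause;
}
console.log(root.remoteIp, root.remotePort);
```

The top-level message accumulates context from every level, while the original error and its informational properties stay available to code that wants to inspect them.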
Consider a function that asynchronously connects to a TCP port at an IPv4 address. Here’s an example of how we might document it:
const assert = require('assert');
const net = require('net');

/*
 * Make a TCP connection to the given IPv4 address.  Arguments:
 *
 *    ip4addr        a string representing a valid IPv4 address
 *
 *    tcpPort        a positive integer representing a valid TCP port
 *
 *    timeout        a positive integer denoting the number of milliseconds
 *                   to wait for a response from the remote server before
 *                   considering the connection to have failed.
 *
 *    callback       invoked when the connection succeeds or fails.  Upon
 *                   success, callback is invoked as callback(null, socket),
 *                   where `socket` is a Node net.Socket object.  Upon failure,
 *                   callback is invoked as callback(err) instead.
 *
 * This function may fail for several reasons:
 *
 *    SystemError    For "connection refused" and "host unreachable" and other
 *                   errors returned by the connect(2) system call.  For these
 *                   errors, err.errno will be set to the actual errno symbolic
 *                   name.
 *
 *    TimeoutError   Emitted if "timeout" milliseconds elapse without
 *                   successfully completing the connection.
 *
 * All errors will have the conventional "remoteIp" and "remotePort" properties.
 * After any error, any socket that was created will be closed.
 */
function connect(ip4addr, tcpPort, timeout, callback) {
    /* Programmer errors: validate every argument up front and throw. */
    assert.strictEqual(typeof (ip4addr), 'string',
        "argument 'ip4addr' must be a string");
    assert.ok(net.isIPv4(ip4addr),
        "argument 'ip4addr' must be a valid IPv4 address");
    assert.strictEqual(typeof (tcpPort), 'number',
        "argument 'tcpPort' must be a number");
    assert.ok(Number.isInteger(tcpPort) && tcpPort > 0 && tcpPort < 65536,
        "argument 'tcpPort' must be a positive integer between 1 and 65535");
    assert.strictEqual(typeof (timeout), 'number',
        "argument 'timeout' must be a number");
    assert.ok(Number.isInteger(timeout) && timeout > 0,
        "argument 'timeout' must be a positive integer");
    assert.strictEqual(typeof (callback), 'function',
        "argument 'callback' must be a function");

    /* Only the argument validation is shown; the connection logic is omitted. */
}
This example is conceptually simple, but demonstrates a bunch of the suggestions we talked about:
This may seem like more work than people usually put into writing what should be a well-understood function, but most functions aren’t so universally well-understood. All advice should be shrink-to-fit, and you should use your judgment if something truly is simple, but remember: ten minutes documenting expectations now may save hours for you or someone else later.
It’s strongly recommended that you use these names to stay consistent with the Errors delivered by Node core and Node add-ons. Most of these won’t apply to any given error, but when in doubt, you should include any information that seems useful, both programmatically and for a custom error message.
Property name    Intended use
localHostname    the local DNS hostname (e.g., that you’re accepting connections at)
localIp          the local IP address (e.g., that you’re accepting connections at)
localPort        the local TCP port (e.g., that you’re accepting connections at)
remoteHostname   the DNS hostname of some other service (e.g., that you tried to connect to)
remoteIp         the IP address of some other service (e.g., that you tried to connect to)
remotePort       the port of some other service (e.g., that you tried to connect to)
path             the name of a file, directory, or Unix Domain Socket (e.g., that you tried to open)
srcpath          the name of a path used as a source (e.g., for a rename or copy)
dstpath          the name of a path used as a destination (e.g., for a rename or copy)
hostname         a DNS hostname (e.g., that you tried to resolve)
ip               an IP address (e.g., that you tried to reverse-resolve)
propertyName     an object property name, or an argument name (e.g., for a validation error)
propertyValue    an object property value (e.g., for a validation error)
syscall          the name of a system call that failed
errno            the symbolic value of errno (e.g., "ENOENT"). Do not use this for errors that don’t actually set the C value of errno.
Use “name” to distinguish between types of errors
const func = () => {
    return new Promise((resolve, reject) => {
        setImmediate(() => {
            throw new Error('foo');
        });
    });
};
const main = async () => {
    try {
        await func();
    } catch (ex) {
        console.log('will not execute');
    }
};
main();
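For contrast, here is a sketch of a variant in which the catch block does run. The only change is that the failure is passed to `reject` instead of being thrown from inside the `setImmediate` callback, where no promise machinery is watching for exceptions.

```javascript
const func = () => {
    return new Promise((resolve, reject) => {
        setImmediate(() => {
            // Reject instead of throwing: a throw here happens outside the
            // promise executor, so it would not reject the promise.
            reject(new Error('foo'));
        });
    });
};

const main = async () => {
    try {
        await func();
        return 'no error';
    } catch (ex) {
        return 'caught: ' + ex.message;
    }
};

main().then((result) => console.log(result));
```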