Node.js multithreading with worker threads series: worker_threads tutorial
Node.js runs your JavaScript on a single main thread, so by default your code can’t perform multiple operations in parallel. If your application does long-running synchronous work, everything else waits until that work finishes.
However, Node.js itself is a multi-threaded application. This is evident when you use one of the standard library’s asynchronous methods to perform I/O operations, such as reading a file or making a network request. Some of these tasks, including filesystem operations, are delegated to a separate pool of threads that Node creates and maintains using the libuv C library. Although it feels like multi-threading, your own JavaScript still runs on the main thread, including the body of an async function, so it can still block the event loop.
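As a rough illustration of the event loop being blocked, the sketch below schedules a timer and then keeps the main thread busy with a synchronous loop. The helper name measureTimerDelay is made up for this example, and the exact delay you see will vary by machine.

```javascript
// Measure how late a timer fires when synchronous work hogs the main thread.
const measureTimerDelay = (busyMs) => new Promise((resolve) => {
    const scheduledAt = Date.now();

    // Ask for the callback to run as soon as possible.
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);

    // Busy-wait: nothing else, including the timer above, can run until this loop ends.
    const stopAt = Date.now() + busyMs;
    while (Date.now() < stopAt) { /* burn CPU */ }
});

measureTimerDelay(200).then((delay) => {
    console.log(`Timer asked for 0ms, fired after ${delay}ms`);
});
```

The timer cannot fire until the loop has finished, which is the kind of stall that moving CPU-heavy work off the main thread avoids.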
But what if you want to create your own independent threads for your Node.js application? Multi-threading can offer substantial performance improvements for CPU-bound workflows by allowing arbitrary work to be performed in parallel. Node.js doesn’t run your JavaScript on multiple threads by default, but you can opt in with the worker_threads module. This article will explain what it does, and show how to use it in a few real-world applications.
What are worker threads?
The worker_threads module implements a form of threading that lets you add parallelism to your own application. Code executed in a worker runs on a separate thread inside the same process, with its own JavaScript engine instance and event loop, so it doesn’t block your main application.
Worker threads are real threads, but each one runs in an isolated JavaScript context, which means it can’t directly access the variables of its parent. Communication between worker threads and your application is facilitated by an event-based messaging system. Memory is shared only when you explicitly opt in, for example by passing a SharedArrayBuffer.
Worker threads don’t share variables the way threads do in languages such as Java or C++, but the difference is academic in many real-world scenarios. They implement a convenient mechanism for running several execution threads concurrently, letting you take intensive work out of the main loop.
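A minimal sketch of the messaging model: values passed to a worker, or posted back from it, are copied rather than shared, so changes made on one side are not visible on the other. The worker source is passed inline with the eval option purely to keep the example in one file; the counter object is a made-up value for illustration.

```javascript
const { Worker } = require("worker_threads");

// Worker source is passed inline (eval: true) only to keep the sketch in one file.
const workerSource = `
    const { parentPort, workerData } = require("worker_threads");
    workerData.counter += 1;            // mutates the worker's private copy
    parentPort.postMessage(workerData); // sends another copy back
`;

const original = { counter: 0 };

const roundTrip = new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: original });
    worker.once("message", resolve);
    worker.once("error", reject);
});

roundTrip.then((received) => {
    console.log(original.counter); // 0: the parent's object is untouched
    console.log(received.counter); // 1: the worker changed its own copy
});
```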
Worker thread use cases
Worker threads can be employed anywhere you’re using expensive CPU-bound operations. They’re not suitable for accelerating I/O work because of the overhead associated with each thread. Node.js’ built-in async I/O utilities will be quicker and more efficient for filesystem and network tasks.
While there’s no shortage of situations where worker threads can help, here are some especially common use cases where you could benefit from them:
- Image resizing: Resizing large images can take several seconds, and the delays add up quickly if you need to generate multiple sizes. This is common in applications that convert uploaded photos into thumbnails, as well as small and large formats. You could use three worker threads to start generating all the sizes at the same time, reducing the total duration of the process.
- Video compression: Video compression is one of the most taxing compute tasks around. Worker threads can accelerate it by processing multiple frames in parallel, then posting the results back to the main thread.
- File encryption and other cryptography: Cryptographic operations are intentionally complex. Encrypting and decrypting files, generating secret keys, and performing signature verification can all create perceptible delays in a program when these tasks are run on the main thread.
- Sorting and searching large amounts of data: Filtering and sorting data requires extensive iteration to compare each value. It can be accelerated by using worker threads to look at multiple pieces of data in parallel.
- Complex mathematical operations: Mathematical computation — such as generating primes, factorizing large numbers, and complex data analysis — is inherently CPU-intensive. Performing some of the work in a separate thread can free up the main loop to work on other tasks.
The slowdown in all these operations is caused by the CPU spending a lot of time executing code, as opposed to reading data from disk or the network. They’re iterative tasks, so increasing the number of passes performed in parallel is the best route to a performance improvement. Worker threads are a mechanism for achieving this.
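To make the idea of splitting iterative work concrete, here is a sketch that counts primes by giving each worker its own slice of the range. The function names (countInWorker, countPrimes) and the default of four workers are assumptions for illustration; a sensible worker count depends on how many CPU cores you have.

```javascript
const { Worker } = require("worker_threads");

// Inline worker source (eval: true) keeps the sketch in one file.
const workerSource = `
    const { parentPort, workerData } = require("worker_threads");
    const isPrime = (n) => {
        if (n < 2) return false;
        for (let i = 2; i * i <= n; i++) {
            if (n % i === 0) return false;
        }
        return true;
    };
    let count = 0;
    for (let n = workerData.from; n < workerData.to; n++) {
        if (isPrime(n)) count++;
    }
    parentPort.postMessage(count);
`;

// Count the primes in [from, to) inside one worker.
const countInWorker = (from, to) => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { from, to } });
    worker.once("message", resolve);
    worker.once("error", reject);
});

// Split [0, limit) into equal chunks and count each chunk in parallel.
const countPrimes = async (limit, workers = 4) => {
    const chunk = Math.ceil(limit / workers);
    const jobs = [];
    for (let i = 0; i < workers; i++) {
        jobs.push(countInWorker(i * chunk, Math.min((i + 1) * chunk, limit)));
    }
    const counts = await Promise.all(jobs);
    return counts.reduce((sum, c) => sum + c, 0);
};

countPrimes(1000000).then((total) => console.log(`${total} primes below 1,000,000`));
```

The main thread only adds up the partial counts, so it stays free to handle other events while the workers iterate.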
Using the worker threads module
Worker threads are created by importing Node’s worker_threads module and calling the new Worker() constructor. A new thread will be spawned to execute a JavaScript file that you specify. You can exchange messages with the worker to get notified when events occur, data is ready for processing, or an error occurs. If you’ve used Web Workers in a browser, the worker thread concept will feel familiar.
Here’s the simplest example of running a JavaScript file in a worker thread:
javascript
const {
    Worker,
    isMainThread,
    parentPort,
    workerData
} = require("worker_threads");

if (isMainThread) {
    // Main thread: start a worker that runs this same file
    const worker = new Worker(__filename, {workerData: "hello"});
    worker.on("message", msg => console.log(`Worker message received: ${msg}`));
    worker.on("error", err => console.error(err));
    worker.on("exit", code => console.log(`Worker exited with code ${code}.`));
}
else {
    // Worker thread: read the data passed in and reply to the main thread
    const data = workerData;
    parentPort.postMessage(`You said "${data}".`);
}
Copy this code snippet and save it to a file called worker-demo.js in your working directory. When you run the code with Node.js, you should get the following output:
$ node worker-demo.js
Worker message received: You said "hello".
Worker exited with code 0.
This simple code demonstrates all the fundamentals of worker threads. The source code can act either as the main thread or as a worker thread. The worker_threads module provides an isMainThread export that lets you check whether the code is running as the main thread.
When the file runs as the main thread, meaning you’ve launched it from your terminal, the first branch of the if statement runs. This creates a new Worker instance.
The first parameter given to the Worker constructor is the path to the JavaScript file to execute in the worker. The __filename global can be used for this example, because the same file provides both the main thread and worker thread code. The Worker constructor’s second parameter takes an options object. Within this object, the workerData property lets you pass values through to the worker thread.
The remaining code in the main thread section sets up event listeners on the Worker instance. These let you react to messages and errors coming from the worker, and to the worker exiting.
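The sketch below shows what those listeners receive when a worker fails. It is an illustration rather than part of worker-demo.js: the observe helper is a made-up name, and the worker source is passed inline with the eval option so the example fits in one file.

```javascript
const { Worker } = require("worker_threads");

// Run inline worker source and record which events the parent sees.
const observe = (source) => new Promise((resolve) => {
    const events = [];
    const worker = new Worker(source, { eval: true });
    worker.on("message", (msg) => events.push(`message: ${msg}`));
    worker.on("error", (err) => events.push(`error: ${err.message}`));
    worker.on("exit", (code) => {
        events.push(`exit: ${code}`);
        resolve(events);
    });
});

// A worker that throws an uncaught exception as soon as it starts.
const failing = `
    throw new Error("something broke");
`;

observe(failing).then((events) => console.log(events));
```

An uncaught exception in the worker arrives in the main thread as an error event, followed by an exit event with a non-zero code, so the main thread can log the failure or start a replacement worker.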
When the sample code is run by a worker thread, the isMainThread variable will be false and the else branch will execute. This uses the worker_threads module’s workerData export to access the data passed in from the main thread. The parentPort export provides an interface to the main thread, while the postMessage() function lets you send data back to the main thread, where it will be reported as a message event.
This means that running worker-demo.js results in the following sequence of actions:
- A new worker is constructed, and hello is passed in as its data.
- The code running in the worker accesses its data and passes a new friendly message back to the main thread.
- The main thread’s event listener picks up the message sent from the worker thread and emits it to the console.
- The worker thread has no more code to run, so it exits. The main thread is notified of this by the exit event.
Now you can try some real-world worker thread examples.
Using worker threads to resize images
This code will convert an image to three different sizes in parallel. It’s much quicker than processing each size sequentially.
First, create the code that will resize the image. Save this to resize-worker.js:
javascript
const {workerData} = require("worker_threads");
const sharp = require("sharp");

const {src, width, height} = workerData;
// Split "image.jpg" into its base name and extension
const [filename, ext] = src.split(".");

console.log(`Resizing ${src} to ${width}px wide`);

const resize = async () => {
    // height is undefined here, so sharp scales it to match the width
    await sharp(src)
        .resize(width, height, {fit: "cover"})
        .toFile(`${filename}-${width}.${ext}`);
};

resize();
Create the main thread code in a new file and name it resize-main.js:
javascript
const {Worker} = require("worker_threads");

// The image to resize is given as the first command-line argument
const src = process.argv[2];

const sizes = [
    {width: 1920},
    {width: 1280},
    {width: 640}
];

// Start one worker per target size
for (const size of sizes) {
    const worker = new Worker(
        __dirname + "/resize-worker.js",
        {
            workerData: {
                src,
                ...size
            }
        }
    );
}
Use npm to install the sharp module, which provides the image resize function:
$ npm install sharp
Next, place a large image in your working directory and name it image.jpg. You could use this colorful photo of threads from Unsplash. Run your code with the following command:
$ node resize-main.js image.jpg
You should see all three “Resizing” messages appear almost immediately. The main thread iterates over the requested sizes and creates a new thread for each one, and Node.js keeps the process alive until they have all completed. This ensures the sizes are generated in parallel, without blocking the main thread.
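If the main thread needs to know when every size has been written, one option is to wrap each worker in a promise that settles on its exit event. The sketch below is an assumption-laden illustration: it uses an inline stand-in worker instead of resize-worker.js so that it runs without sharp, and runToCompletion is a made-up helper name.

```javascript
const { Worker } = require("worker_threads");

// Stand-in for resize-worker.js: pretends to work for a moment, then finishes.
const standInWorker = `
    const { workerData } = require("worker_threads");
    setTimeout(() => console.log("Finished " + workerData.width + "px"), 50);
`;

// Resolve when the worker exits cleanly, reject on an error or non-zero exit code.
const runToCompletion = (workerData) => new Promise((resolve, reject) => {
    const worker = new Worker(standInWorker, { eval: true, workerData });
    worker.once("error", reject);
    worker.once("exit", (code) => {
        if (code === 0) resolve(workerData.width);
        else reject(new Error(`Worker exited with code ${code}`));
    });
});

const sizes = [{ width: 1920 }, { width: 1280 }, { width: 640 }];

// All three workers start immediately; Promise.all settles once the last one exits.
const allDone = Promise.all(sizes.map((size) => runToCompletion(size)));
allDone.then((widths) => console.log(`All sizes done: ${widths.join(", ")}`));
```

In the real script you would pass the path to resize-worker.js instead of the inline source and drop the eval option.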
Using worker threads to resize video
This code snippet demonstrates how video can be resized in a worker thread with FFmpeg. It frees up the main thread to continue performing other tasks while the intensive video resize completes.
Save the worker code to a new file with the name video-worker.js:
javascript
const {parentPort} = require("worker_threads");
const ffmpeg = require("fluent-ffmpeg");

const resizeVideo = (src, size) => {
    const [filename, ext] = src.split(".");
    const output = `${__dirname}/${filename}-${size}.${ext}`;
    ffmpeg(`${__dirname}/${src}`)
        .size(size)
        // Tell the main thread when FFmpeg has finished writing the file
        .on("end", () => parentPort.postMessage({output, input: src, type: "done"}))
        .save(output);
};

// Each message from the main thread describes one resize job
parentPort.on("message", msg => {
    const {file, size} = msg;
    resizeVideo(file, size);
});
Now save the main thread code to a new file named video-main.js:
javascript
const {StaticPool} = require("node-worker-threads-pool");

const pool = new StaticPool({
    size: 4,
    task: __dirname + "/video-worker.js"
});

const videoToResize = process.argv[2];
const videoTargetSize = process.argv[3];

const resize = async () => {
    // exec() resolves with the message the worker posts back
    const msg = await pool.exec({file: videoToResize, size: videoTargetSize});
    if (msg?.type === "done") {
        console.log(`Saved ${videoToResize} to ${msg.output}`);
    }
};

resize();
Install the npm dependencies required by the code:
$ npm install fluent-ffmpeg node-worker-threads-pool
The example relies on the popular ffmpeg encoder. The fluent-ffmpeg package is a Node.js wrapper around the ffmpeg binary installed on your system. You’ll need to install ffmpeg before you can run your code. Most Linux environments will already include it, but Windows and Mac users may need to get it manually.
You can check whether ffmpeg is available by running its command:
$ ffmpeg
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
Seeing the version number means it’s already installed. If you get a message similar to “command ‘ffmpeg’ not found”, follow the guidance on the ffmpeg website to download the correct build for your system. FFmpeg is also available in the package repositories of most popular Linux distributions, as well as Homebrew for macOS:
$ brew install ffmpeg
Now you can try resizing a video file in the background. You could download this clip from Pexels if you don’t have your own file available. Save it to video.mp4 in your working directory.
Next, run your script to downscale the clip to 1280×720:
$ node video-main.js video.mp4 1280x720
Saved video.mp4 to /path-to-video-1280x720.mp4
The example works a little differently to the last one. It uses the concept of a “worker pool” to improve efficiency. Unlike the previous example, this one won’t automatically terminate the process after the resize completes. The worker thread is still active and listening for messages, so you’ll need to terminate by pressing Ctrl+C in your terminal.
Worker pools can help reduce resource consumption. Creating a worker adds an overhead each time, so it’s good practice to spawn a set number of threads and reuse them where possible. Poolifier, Piscina, and other similar libraries make it easier to use worker threads in this way. They handle the creation of threads, up to a user-specified cap, to serve new tasks. Further tasks are then allocated to existing threads as they become available.
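To show the reuse idea in isolation, here is a deliberately simplified pool. It is a sketch only and does not reflect how Poolifier, Piscina, or node-worker-threads-pool are implemented: TinyPool is a made-up class, the inline worker just squares numbers, and a crashed worker is dropped rather than replaced.

```javascript
const { Worker } = require("worker_threads");

// Stand-in task: each worker squares the numbers it is sent.
const workerSource = `
    const { parentPort } = require("worker_threads");
    parentPort.on("message", (n) => parentPort.postMessage(n * n));
`;

class TinyPool {
    constructor(size) {
        // Create a fixed number of workers up front
        this.idle = Array.from({ length: size }, () => new Worker(workerSource, { eval: true }));
        this.queue = []; // tasks waiting for a free worker
    }

    // Queue a task and return a promise for its result.
    exec(input) {
        return new Promise((resolve, reject) => {
            this.queue.push({ input, resolve, reject });
            this.dispatch();
        });
    }

    // Hand queued tasks to idle workers until one of the two runs out.
    dispatch() {
        while (this.idle.length > 0 && this.queue.length > 0) {
            const worker = this.idle.pop();
            const { input, resolve, reject } = this.queue.shift();
            const onMessage = (result) => {
                worker.off("error", onError);
                this.idle.push(worker); // the worker is reused, not recreated
                resolve(result);
                this.dispatch();
            };
            const onError = (err) => {
                worker.off("message", onMessage);
                reject(err); // a crashed worker is dropped from the pool
            };
            worker.once("message", onMessage);
            worker.once("error", onError);
            worker.postMessage(input);
        }
    }

    // Terminate every idle worker so the process can exit.
    destroy() {
        return Promise.all(this.idle.map((worker) => worker.terminate()));
    }
}

const pool = new TinyPool(2);
const results = Promise.all([1, 2, 3, 4, 5].map((n) => pool.exec(n)));
results
    .then((squares) => console.log(squares))
    .finally(() => pool.destroy());
```

Five tasks are served by only two threads here, and calling destroy() once the work is done lets the process exit without pressing Ctrl+C.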
Using workers to encrypt files
This example demonstrates how to combine worker_threads with the Node.js crypto module to encrypt a file in a separate thread.
First, save the worker thread code in a new file named encrypt-worker.js:
javascript
const {parentPort, workerData} = require("worker_threads");
const crypto = require("crypto");
const fs = require("fs");

const {file} = workerData;
const output = `${file}.encrypted`;

// Generate a random 256-bit key and 128-bit IV for this file
const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);
const cipher = crypto.createCipheriv("aes-256-ctr", key, iv);

const readStream = fs.createReadStream(file);
const writeStream = fs.createWriteStream(output);
readStream.pipe(cipher).pipe(writeStream);

// Report the key and IV once the file is written; both are needed to decrypt it
writeStream.on("close", () => parentPort.postMessage({
    key: key.toString("hex"),
    iv: iv.toString("hex"),
    output,
    type: "done"
}));
Copy this main thread code to a new file named encrypt-main.js:
javascript
const {Worker} = require("worker_threads");

// Perform some other tasks here

const fileToEncrypt = process.argv[2];

const worker = new Worker(
    __dirname + "/encrypt-worker.js",
    {
        workerData: {
            file: fileToEncrypt
        }
    }
);

worker.on("message", msg => {
    if (msg?.type === "done") {
        console.log(`File encrypted to ${msg.output}`);
        console.log(`The key is ${msg.key} - don't lose it!`);
    }
});

// Perform some other tasks that don't need to wait for the encryption here
Create a simple text file in your working directory, ready to encrypt, using the following command in your terminal:
$ echo foobar > demo.txt
Now execute your code to encrypt the file:
$ node encrypt-main.js demo.txt
File encrypted to demo.txt.encrypted
The key is 20eb1974c95553ff4f16638192da8b4cbfd780cec8544579f04cde181b78bd7c - don't lose it!
This example uses a message sent from the worker thread to tell the user the output file path and the secret key that was generated.
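Decrypting the file later needs the same key and the same IV that the worker generated. The sketch below shows that round trip in memory with the same aes-256-ctr cipher. It is an illustration under simplifying assumptions: the encrypt and decrypt helpers are made-up names, they work on buffers rather than file streams, and they run on the main thread.

```javascript
const crypto = require("crypto");

const algorithm = "aes-256-ctr";

// Encrypt a buffer, returning everything needed to decrypt it later.
const encrypt = (plain) => {
    const key = crypto.randomBytes(32);
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv(algorithm, key, iv);
    const data = Buffer.concat([cipher.update(plain), cipher.final()]);
    return { key, iv, data };
};

// Reverse the operation with the same key and IV.
const decrypt = ({ key, iv, data }) => {
    const decipher = crypto.createDecipheriv(algorithm, key, iv);
    return Buffer.concat([decipher.update(data), decipher.final()]);
};

const encrypted = encrypt(Buffer.from("foobar"));
console.log(decrypt(encrypted).toString());
```

The IV does not need to be kept secret, so a common approach is to store it alongside the encrypted output while keeping the key somewhere safe.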
Your worker threads crash course
Node.js runs your JavaScript on a single thread by default. This derives from JavaScript’s execution model, which runs one piece of code at a time. Node.js’s async standard library components are non-blocking because they hand work to the operating system or to a separate pool of threads.
The worker_threads module offers something similar for inclusion in your own code. Each worker is isolated and communicates through messages, so it isn’t the shared-state threading found in some other languages, but it lets you execute code in parallel outside the main thread.
This article has explained what worker threads are, when they can be used, and how you can get started with them in your project. The next installment in this series will dive into the pros and cons of worker threads and compare them to the multi-threading implementations found in other languages.