Why do we need Streams in Node.js?

Streams are sequences of data made available over time. The difference with other types of data like strings or arrays is that streams might not be available all at once, and they don’t have to fit in memory.

Many of Node's built-in modules implement the streaming interface: HTTP responses and requests, fs read and write streams, process.stdin and process.stdout, and so on.
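
For instance, in an HTTP server the request object is a readable stream and the response object is a writable stream, so a tiny echo server is just one pipe call (a minimal sketch; port 3000 is an arbitrary choice):

const http = require("http");

// req is a readable stream and res is a writable stream,
// so echoing the request body back is a single pipe call.
http
  .createServer((req, res) => {
    req.pipe(res);
  })
  .listen(3000);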

Let's see how streams solve a common HTTP server problem.

Assume we need to serve a big file using a Node web server.

mkdir blog-why-node-streams
cd blog-why-node-streams
npm init -y
echo "" > index.js

Update index.js with the following code:

const fs = require("fs");
const http = require("http");

// Utility: generate a big (~600 MB) file on the fly if it does not already exist
fs.stat("big.file", function (err, stat) {
  if (err == null) {
    console.log("File exists");
  } else if (err.code === "ENOENT") {
    const file = fs.createWriteStream("./big.file");
    for (let i = 0; i <= 1e6; i++) {
      file.write(
        `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur tempus id metus a sodales. Maecenas faucibus bibendum mauris elementum ultrices. In hac habitasse platea dictumst. Pellentesque consequat augue nec urna interdum, a sagittis arcu ornare. Duis pulvinar odio vitae velit euismod, nec pretium nisi tempus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras ante lorem, suscipit non lobortis venenatis, interdum a dui. Donec rhoncus magna lectus, ut vestibulum eros rutrum gravida. Aenean sit amet fringilla erat. In varius fermentum justo, in maximus sapien tempus non. Sed malesuada tempor erat eget tristique. Pellentesque diam nulla, pharetra sed luctus nec, euismod non tortor.`
      );
    }
    console.log("big.file created");
    file.end();
  } else {
    console.log("Some other error: ", err.code);
  }
});

const server = http.createServer();

server.on("request", (req, res) => {
  fs.readFile("./big.file", (err, data) => {
    if (err) throw err;

    res.end(data);
  });
});

server.listen(8000, () => console.log("The server is running at localhost:8000"));

The above code does two things:

  1. The first part generates a huge (~600 MB) file if one does not already exist. This is just a utility step.
  2. The second part is a simple web server with one endpoint that serves the big.file file.

Let's run the server.

> node index.js

The server is running at localhost:8000
big.file created

After starting the Node server, let's check the memory usage in the Windows Task Manager. Our server consumes about 5.8 MB of memory.

[Image: stream-http-1.png, Task Manager showing the Node process using ~5.8 MB of memory]
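
If you prefer to measure from inside the process rather than the Task Manager, one option is to log process.memoryUsage().rss from index.js (a small sketch; the 5-second interval is an arbitrary choice):

// Optional: log the resident set size (RSS) every 5 seconds
// instead of (or alongside) watching the Task Manager.
setInterval(() => {
  const rssMb = process.memoryUsage().rss / 1024 / 1024;
  console.log(`RSS: ${rssMb.toFixed(1)} MB`);
}, 5000);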

Now let's curl the endpoint.

> curl localhost:8000

Lorem ipsum dolor sit amet, consectetur adipiscing e......
............................
.......................

Now, look at the memory consumption of the server in the Task Manager.

[Image: stream-http-2.png, Task Manager showing the Node process using ~684 MB of memory after the request]

When we ran the server, it started out with a normal amount of memory, ~5.8 MB. Then we connected to the server. Note what happened to the memory consumed: it jumped to ~684 MB.

Why does this happen?

fs.readFile loads the whole content of big.file into memory before we write it out to the response object. This is very inefficient.

Solution.

The HTTP response object (res in the code above) is also a writable stream. This means that if we have a readable stream that represents the content of big.file, we can simply pipe those two together and get nearly the same result without consuming ~ 684 MB of memory.

Node’s fs module can give us a readable stream for any file using the createReadStream method. We can pipe that to the response object.

So, replace the request handler with the code snippet below and measure the memory consumption again.

server.on("request", (req, res) => {
  const src = fs.createReadStream("./big.file");
  src.pipe(res);
});
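
As an aside, src.pipe(res) does not forward errors or clean up the file stream if the client disconnects mid-transfer. In newer Node versions you could use stream.pipeline for that instead; a sketch, not part of the original example:

const { pipeline } = require("stream");

server.on("request", (req, res) => {
  // pipeline pipes the streams and destroys them on error or early close
  pipeline(fs.createReadStream("./big.file"), res, (err) => {
    if (err) console.error("Pipeline failed:", err.message);
  });
});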

Restart the server with the updated handler.

> node index.js

The server is running at localhost:8000
File exists

Now, let's curl the endpoint again.

> curl localhost:8000

Lorem ipsum dolor sit amet, consectetur adipiscing e......
............................
.......................

Again, look at the memory consumption of the server in the Task Manager.

[Image: stream-http-3.png, Task Manager showing the Node process using only ~8 MB of memory while streaming]

When we ran the server, it started out with the same normal amount of memory, ~5.8 MB. Then we connected to the server with curl. Note what happened to the memory consumed: it grew to only ~8 MB.

So, what changed, and how does it work?

When a client asks for that big file, we stream it one chunk at a time, which means we never hold the whole file in memory. The memory usage grew by only about 8 MB, and that's it.
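
If you want to see the chunking for yourself, you can count the chunks a read stream emits (a small sketch; file read streams use a default highWaterMark of 64 KB, so expect chunks of roughly that size):

const fs = require("fs");

let chunks = 0;
let bytes = 0;

fs.createReadStream("./big.file")
  .on("data", (chunk) => {
    chunks += 1;
    bytes += chunk.length;
  })
  .on("end", () => {
    console.log(`Read ${bytes} bytes in ${chunks} chunks`);
  });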

These scenarios are not limited to an HTTP server. The same idea applies to file content manipulation, creating big files, uploading files from a client to a server, sending big audio or video files to a client, and more.
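
For instance, compressing that big file without ever loading it fully into memory is just a matter of piping three streams together (a minimal sketch using the built-in zlib module; the output name big.file.gz is just an example):

const fs = require("fs");
const zlib = require("zlib");
const { pipeline } = require("stream");

// Read big.file, gzip it chunk by chunk, and write big.file.gz,
// without ever holding the whole file in memory.
pipeline(
  fs.createReadStream("./big.file"),
  zlib.createGzip(),
  fs.createWriteStream("./big.file.gz"),
  (err) => {
    if (err) console.error("Compression failed:", err.message);
    else console.log("big.file.gz created");
  }
);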