
Shrink node_modules with refining

Alternative universe title: Oops, I shrunk node_modules!

It's easy to end up with a bloated node_modules directory when developing a Node.js application. You can try to trim your dependencies or be clever about them with caching or symlinks, but that will only get you so far. What if you could shrink the modules themselves?

In this article I introduce refining, a technique to shrink node_modules by deleting unnecessary files within the module itself. It's an experimental technique. You still have to download your full node_modules but you'll be able to shrink it before packaging it into an artifact or Docker image.

I also present two proof-of-concept experiments (with Express and React) using refining, as well as a discussion on limitations and applicability to a production build pipeline.

Why should I care about node_modules weight?

The first victim of a bloated node_modules is your developer feedback loop – unless your build system can cache dependencies, your feedback loop just gets longer and longer. Developers should not have to wait more than 10 minutes for build feedback; waits like that are terrible for your DevOps journey.

Your entire pipeline also suffers: you get big Docker images, overloaded image registry servers, and slow cold-starts in a container cluster.

This meme is instantly relatable to anyone who tries to maintain a healthy delivery pipeline:

[Image: a meme about node_modules weight]

What is refining?

You don't need every file in node_modules! It's filled with fluff that shouldn't have been published, such as documentation, build files, and test code. It may even have unused feature code – for example you might only use part of a modularized package. If you can detect exactly which files your application needs then the rest can be deleted safely.

Refining is a method to separate a useful substance from impurities. In this case the useful substance is a set of required node_modules files and the impurities are the remaining files which can be safely deleted.

Refining is a black-box approach: instead of analyzing the code the way webpack does, it uses the operating system to list every file your application opens. Any node_modules file that is not on that list can then be deleted.

This technique requires running the application first without any modifications to collect data on its behaviour. Consider the V8 optimizing compiler as an analogy: V8 first runs code in its bytecode interpreter (Ignition) while it collects data on code behaviour and hotspots. After analysis V8 hands the hot code to its optimizing machine-code compiler (TurboFan). For safety V8 can also fall back from TurboFan to Ignition if needed. A full application of refining might work in a similar way.

How to refine node_modules

I followed these steps to refine applications:

  1. Start file access tracing
  2. Run application and tests
  3. Stop file access tracing
  4. Delete unused node_modules files
  5. Rerun application and tests
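Steps 4 and 5 can be made reversible. The sketch below is not what I ran for the experiments: it is a cautious variant that moves the unused files into a quarantine directory instead of deleting them, reruns the checks, and moves everything back if they fail. The function names, the quarantine directory, and the runChecks callback are all hypothetical, and the quarantine directory is assumed to be on the same filesystem as node_modules so that a plain rename works.

```javascript
const fs = require('fs');
const path = require('path');

// Move each file (given relative to fromDir) into toDir, keeping its
// relative path so it can be moved back later.
function moveFiles(files, fromDir, toDir) {
  for (const file of files) {
    const target = path.join(toDir, file);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.renameSync(path.join(fromDir, file), target);
  }
}

// Step 4: set the unused files aside. Step 5: rerun the checks.
// runChecks is a caller-supplied function that returns true on success.
// On failure the files are restored, so node_modules is left as it was.
function refineWithRollback(files, baseDir, quarantineDir, runChecks) {
  moveFiles(files, baseDir, quarantineDir);
  if (runChecks()) {
    // Checks passed: the quarantined files really were unnecessary.
    fs.rmSync(quarantineDir, { recursive: true, force: true });
    return true;
  }
  moveFiles(files, quarantineDir, baseDir);
  return false;
}
```

In a real pipeline runChecks would start the application and run its tests, for example through child_process. Here it is left to the caller.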

File access tracing

I used opensnoop on macOS to trace file access by all node processes. On Linux you can use strace instead. I extracted only filepaths within node_modules (to exclude system files that Node.js reads on startup), and I printed them relative to my base directory for easy diffing later:

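# Trace files opened by processes named "node" (needs root, runs until killed).
# Column 5 of opensnoop's output is the path; paths with spaces would be cut off.
sudo opensnoop -n node \
| grep --color=never "$(pwd)/*node_modules" \
| awk '{print $5}' \
| sed -e "s:^$(pwd):.:" \
| tee nm-trace.log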

opensnoop stays running until stopped. My test script killed opensnoop after running my application and its tests.
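I did not run the Linux variant, so the following is only a sketch of how strace output could be reduced to the same kind of list. It assumes the trace was captured with something like strace -f -e trace=open,openat -o trace.log node app.js, where each line looks like 1234 openat(AT_FDCWD, "/path", O_RDONLY) = 17. The function name parseStraceLog and the /app base directory are made up for illustration.

```javascript
// Hypothetical helper: extract the node_modules files that were opened
// successfully from strace output, relative to baseDir and prefixed with "."
// so the result lines up with the output of `find .`.
function parseStraceLog(logText, baseDir) {
  // Matches e.g.: 1234 openat(AT_FDCWD, "/app/node_modules/x.js", O_RDONLY) = 17
  const openCall = /\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)/;
  const prefix = baseDir.replace(/\/+$/, '') + '/';
  const seen = new Set();
  for (const line of logText.split('\n')) {
    const match = openCall.exec(line);
    if (!match) continue;
    const [, filePath, result] = match;
    if (Number(result) < 0) continue;                 // failed open, e.g. ENOENT
    if (!filePath.startsWith(prefix)) continue;       // outside the project
    if (!filePath.includes('/node_modules/')) continue;
    seen.add('./' + filePath.slice(prefix.length));
  }
  return [...seen].sort();
}

const sample = [
  '1234 openat(AT_FDCWD, "/app/node_modules/express/index.js", O_RDONLY|O_CLOEXEC) = 17',
  '1234 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 18',
].join('\n');
console.log(parseStraceLog(sample, '/app'));
```

Failed opens are skipped because Node.js probes several candidate paths while resolving a module, and only the ones that exist matter for refining.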

Deleting unused files

I deleted the unused files with:

diff \
  --new-line-format="" \
  --unchanged-line-format="" \
  <(find . -path "*/node_modules/*" -type f | sort) \
  <(sort --unique nm-trace.log) \
| xargs -L 1 rm

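Both inputs are sorted so that diff can line them up. With the new-line and unchanged-line formats set to empty strings, diff prints only the lines that appear in the first input alone, which are the files that were never opened.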
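The same selection can be written as a set difference. This is a minimal Node.js sketch of the idea, not the command I used; the function name unusedFiles is hypothetical, and both lists are assumed to use the same relative ./node_modules/... form.

```javascript
// Given every file under node_modules and the traced (opened) paths,
// return the files that were never opened: the deletion candidates.
function unusedFiles(allFiles, tracedFiles) {
  const used = new Set(tracedFiles); // duplicates in the trace collapse here
  return allFiles.filter((file) => !used.has(file)).sort();
}

const allFiles = [
  './node_modules/qs/lib/index.js',
  './node_modules/qs/dist/qs.js',
  './node_modules/qs/README.md',
];
const traced = ['./node_modules/qs/lib/index.js', './node_modules/qs/lib/index.js'];
console.log(unusedFiles(allFiles, traced));
```

Traced paths that do not correspond to a file on disk, such as directories, are simply ignored because only entries from the first list are ever returned.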

Disk space measurement

I used du on macOS to measure disk space. All sizes reported below are disk usage (allocated space), not apparent size. Because the filesystem allocates space in whole blocks, every measurement is a multiple of 4 KB.
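To illustrate the difference between apparent size and disk usage, here is a rough model that rounds a file's byte count up to whole blocks. The 4096-byte block size is an assumption taken from the 4 KB granularity above, and the function name diskUsage is hypothetical. Real filesystems can deviate from this model, so treat it as an estimate rather than what du computes.

```javascript
// Estimate on-disk size by rounding the apparent size up to whole blocks.
// blockSize = 4096 is an assumption; check your own filesystem.
function diskUsage(apparentBytes, blockSize = 4096) {
  return Math.ceil(apparentBytes / blockSize) * blockSize;
}

console.log(diskUsage(1));    // a 1-byte file still occupies one full block
console.log(diskUsage(5000)); // just over one block rounds up to two
```

This is why a package made of many tiny files can weigh far more on disk than the sum of its file contents.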

Experiment 1: a simple Express server

Refining a simple Express server reduced disk usage by 59% (from 2.4 MB to 984 KB) while keeping the server working:

Metric           Raw     Refined  Reduction
Disk usage       2.4 MB  984 KB   59%
Number of files  314     123      61%
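The reduction column is the share of the raw value that refining removed, rounded to a whole percent. A small sketch of that arithmetic, using the file counts from this article (the function name reductionPercent is hypothetical):

```javascript
// Percentage of the raw value removed by refining, rounded to an integer.
function reductionPercent(raw, refined) {
  return Math.round(((raw - refined) / raw) * 100);
}

console.log(reductionPercent(314, 123)); // file count, Express experiment
console.log(reductionPercent(103, 10));  // file count, React experiment
```

The disk usage percentages cannot be reproduced exactly this way because the raw sizes are quoted as rounded megabytes.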

Here were the top 3 packages in terms of size reduction:

Package     Raw size (KB)  Refined size (KB)  Reduction (KB)
iconv-lite  400            32                 368
qs          164            36                 128
express     240            120                120

What got deleted from these modules:

This experiment shows that refining is not just about deleting Markdown documents inside a module directory. In addition to deleting documentation, refining safely deleted JavaScript files that the application didn't need: a qs browser bundle, qs test code, and iconv-lite optional encodings.

See Appendix 1 for setup details and a list of files that were kept or deleted by refining.

Experiment 2: React with server-side rendering

Refining a simple React application reduced disk usage by 94% (from 4.9 MB to 284 KB) while keeping the application working:

Metric           Raw     Refined  Reduction
Disk usage       4.9 MB  284 KB   94%
Number of files  103     10       90%

Here were the top 3 packages in terms of size reduction:

Package    Raw size (KB)  Refined size (KB)  Reduction (KB)
react-dom  4500           132                4368
react      220            72                 148
scheduler  104            0                  104

What got deleted from these modules:

This experiment has a much larger reduction than the Express experiment, likely because server-side rendering only uses a fraction of the React code. Only 10 dependency files were actually needed to run this application.

This experiment shows the potential of refining to trim packages based on the features used. Since the experiment focused on server-side rendering, refining deleted all browser bundles and even the CommonJS code that supports client-side rendering.

npm packages meant for both browsers and Node.js contain bundles to target both environments. However the application artifact only needs to have the Node.js-related code. Refining helped delete irrelevant code safely.

See Appendix 2 for setup details and a list of files that were kept or deleted by refining.

Limitations

Refining needs a complete list of all node_modules files your application needs. The application start and testing steps should ensure that Node.js loads all the node_modules files it needs in production.

Of course, tracing a test run also captures the test framework's own files. To avoid this you would have to pull your test framework out of your node_modules and supply it from another location (maybe a higher-level folder).

Ideally an application would require all packages up front. However, some applications lazy-load modules. Your dependencies might also lazy-load modules (though I would personally consider that bad behaviour from a library package).

We need a generalized way to quickly determine if an application has finished file access for node_modules. Maybe if you monitor file access in production on your canary for a while, then you can build a reliable profile for later refining and release to broader traffic.
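One heuristic for "finished" is a quiet period: treat the profile as settled once no new node_modules path has shown up for some time. The sketch below is only that heuristic, with hypothetical names (AccessProfile, record, isSettled). A quiet period cannot prove that a rarely used code path won't load something later, so the threshold would need tuning against real traffic.

```javascript
// Hypothetical profile: collects observed node_modules paths and reports
// when no *new* path has appeared for a given quiet period.
class AccessProfile {
  constructor() {
    this.paths = new Set();
    this.lastNewPathAt = null; // timestamp (ms) of the most recent new path
  }

  // Record one observed path. Returns true if it had not been seen before.
  record(filePath, nowMs = Date.now()) {
    if (this.paths.has(filePath)) return false;
    this.paths.add(filePath);
    this.lastNewPathAt = nowMs;
    return true;
  }

  // True once at least one path was seen and the last new one is at least
  // quietMs old.
  isSettled(quietMs, nowMs = Date.now()) {
    return this.lastNewPathAt !== null && nowMs - this.lastNewPathAt >= quietMs;
  }
}

const profile = new AccessProfile();
profile.record('./node_modules/express/index.js', 1000);
console.log(profile.isSettled(10000, 5000));  // too early to tell
console.log(profile.isSettled(10000, 60000)); // quiet for long enough
```

Repeated opens of an already known file do not reset the clock, since only new paths change what refining would keep.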

Once you have a complete list you could check it into your codebase as a snapshot, similar to React snapshot testing. Then you would update it periodically as your dependency usage changes.
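A snapshot check could look roughly like this: compare a fresh trace against the checked-in list and report what changed. The function name compareToSnapshot and the shape of its result are assumptions for illustration. A build would probably fail on any added file, because a stale snapshot would cause refining to delete something the application now needs.

```javascript
// Compare a checked-in snapshot of required files with a fresh trace.
// added:   required now but missing from the snapshot (unsafe to ignore)
// removed: in the snapshot but no longer opened (safe, just stale)
function compareToSnapshot(snapshot, current) {
  const snap = new Set(snapshot);
  const cur = new Set(current);
  return {
    added: [...cur].filter((file) => !snap.has(file)).sort(),
    removed: [...snap].filter((file) => !cur.has(file)).sort(),
  };
}

const result = compareToSnapshot(
  ['./node_modules/react/index.js', './node_modules/react/cjs/react.development.js'],
  ['./node_modules/react/index.js', './node_modules/react/cjs/react.production.min.js']
);
console.log(result);
```

When both lists come back empty the snapshot is still accurate and no update is needed.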

Conclusion

Refining has potential and works well for toy applications. The next step is to try it on larger and more complex applications.

Refining solves a problem that ideally shouldn't exist but unfortunately does. Even battle-tested and stable packages such as express ship unnecessary files. Package maintainers could reduce size by explicitly listing the files to publish in the files field of package.json. This would prevent files such as .travis.yml from being published. However, it wouldn't be feasible to coordinate an effort to trim even the most popular npm packages. I also can't see package maintainers going out of their way to trim their modules because they don't pay the cost themselves.

Refining isn't code-intelligent. It just keeps track of which files are accessed. It can't eliminate dead code within a file or perform tree-shaking. As long as a node_modules file gets loaded when you start from your application entrypoint, refining will keep it. This works well for modules like react-dom, which have multiple entrypoints depending on the usage mode (e.g. react-dom/server). If the usage mode is selected through a parameter then likely all files would be retained, even the useless ones.

Packages already exist to delete node_modules files based on pattern matching (such as deleting all Markdown documents), for example Modclean. Refining goes further and deletes actual code files. It is agnostic to the type of a file. It only cares whether it is required or not.

This article gave some ideas about how to apply refining to a build pipeline (by storing snapshots) but doesn't have a full answer yet.


Appendix 1: detailed results from Experiment 1 (simple Express server)

Setup

I created a simple Express server application. The only direct dependency was express. This was the entire application:

const express = require('express');
const app = express();
app.use(express.json());
app.get('/', (req, res) => res.send('Hello world!'));
app.listen(8080, () => console.log('Listening on port 8080.'));

I tested the application with an HTTP request:

curl http://localhost:8080/

Files deleted or kept

Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:

iconv-lite

File Size (KB) Status
lib/bom-handling.js 4 Keep
lib/extend-node.js 12 Keep
lib/index.js 8 Keep
lib/streams.js 4 Keep
package.json 4 Keep
.travis.yml 4 Delete
Changelog.md 8 Delete
LICENSE 4 Delete
README.md 8 Delete
encodings/dbcs-codec.js 24 Delete
encodings/dbcs-data.js 12 Delete
encodings/index.js 4 Delete
encodings/internal.js 8 Delete
encodings/sbcs-codec.js 4 Delete
encodings/sbcs-data-generated.js 32 Delete
encodings/sbcs-data.js 8 Delete
encodings/tables/big5-added.json 20 Delete
encodings/tables/cp936.json 48 Delete
encodings/tables/cp949.json 40 Delete
encodings/tables/cp950.json 44 Delete
encodings/tables/eucjp.json 44 Delete
encodings/tables/gb18030-ranges.json 4 Delete
encodings/tables/gbk-added.json 4 Delete
encodings/tables/shiftjis.json 24 Delete
encodings/utf16.js 8 Delete
encodings/utf7.js 12 Delete
lib/index.d.ts 4 Delete

qs

File Size (KB) Status
lib/formats.js 4 Keep
lib/index.js 4 Keep
lib/parse.js 8 Keep
lib/stringify.js 8 Keep
lib/utils.js 8 Keep
package.json 4 Keep
.editorconfig 4 Delete
.eslintignore 4 Delete
.eslintrc 4 Delete
CHANGELOG.md 16 Delete
LICENSE 4 Delete
README.md 16 Delete
dist/qs.js 20 Delete
test/.eslintrc 4 Delete
test/index.js 4 Delete
test/parse.js 24 Delete
test/stringify.js 24 Delete
test/utils.js 4 Delete

express

File Size (KB) Status
index.js 4 Keep
lib/application.js 16 Keep
lib/express.js 4 Keep
lib/middleware/init.js 4 Keep
lib/middleware/query.js 4 Keep
lib/request.js 16 Keep
lib/response.js 28 Keep
lib/router/index.js 16 Keep
lib/router/layer.js 4 Keep
lib/router/route.js 8 Keep
lib/utils.js 8 Keep
lib/view.js 4 Keep
package.json 4 Keep
History.md 108 Delete
LICENSE 4 Delete
Readme.md 8 Delete

Appendix 2: detailed results from Experiment 2 (React with server-side rendering)

Setup

I created a simple React application with server-side rendering. The only direct dependencies were react and react-dom. This was the entire application:

const React = require('react');
const ReactDOMServer = require('react-dom/server');

class Hello extends React.Component {
  render() {
    return React.createElement('div', null, `Hello ${this.props.toWhat}`);
  }
}

const html = ReactDOMServer.renderToString(
  React.createElement(Hello, {toWhat: 'World'}, null)
);

console.log(html);

Files deleted or kept

Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:

react-dom

File Size (KB) Status
cjs/react-dom-server.node.development.js 124 Keep
server.js 4 Keep
server.node.js 4 Keep
LICENSE 4 Delete
README.md 4 Delete
build-info.json 4 Delete
cjs/react-dom-server.browser.development.js 120 Delete
cjs/react-dom-server.browser.production.min.js 20 Delete
cjs/react-dom-server.node.production.min.js 20 Delete
cjs/react-dom-test-utils.development.js 48 Delete
cjs/react-dom-test-utils.production.min.js 12 Delete
cjs/react-dom-unstable-fire.development.js 724 Delete
cjs/react-dom-unstable-fire.production.min.js 100 Delete
cjs/react-dom-unstable-fire.profiling.min.js 104 Delete
cjs/react-dom-unstable-fizz.browser.development.js 4 Delete
cjs/react-dom-unstable-fizz.browser.production.min.js 4 Delete
cjs/react-dom-unstable-fizz.node.development.js 4 Delete
cjs/react-dom-unstable-fizz.node.production.min.js 4 Delete
cjs/react-dom-unstable-native-dependencies.development.js 64 Delete
cjs/react-dom-unstable-native-dependencies.production.min.js 12 Delete
cjs/react-dom.development.js 724 Delete
cjs/react-dom.production.min.js 100 Delete
cjs/react-dom.profiling.min.js 104 Delete
index.js 4 Delete
package.json 4 Delete
profiling.js 4 Delete
server.browser.js 4 Delete
test-utils.js 4 Delete
umd/react-dom-server.browser.development.js 124 Delete
umd/react-dom-server.browser.production.min.js 20 Delete
umd/react-dom-test-utils.development.js 48 Delete
umd/react-dom-test-utils.production.min.js 12 Delete
umd/react-dom-unstable-fire.development.js 728 Delete
umd/react-dom-unstable-fire.production.min.js 100 Delete
umd/react-dom-unstable-fire.profiling.min.js 104 Delete
umd/react-dom-unstable-fizz.browser.development.js 4 Delete
umd/react-dom-unstable-fizz.browser.production.min.js 4 Delete
umd/react-dom-unstable-native-dependencies.development.js 64 Delete
umd/react-dom-unstable-native-dependencies.production.min.js 12 Delete
umd/react-dom.development.js 728 Delete
umd/react-dom.production.min.js 100 Delete
umd/react-dom.profiling.min.js 104 Delete
unstable-fizz.browser.js 4 Delete
unstable-fizz.js 4 Delete
unstable-fizz.node.js 4 Delete
unstable-native-dependencies.js 4 Delete

react

File Size (KB) Status
cjs/react.development.js 64 Keep
index.js 4 Keep
package.json 4 Keep
LICENSE 4 Delete
README.md 4 Delete
build-info.json 4 Delete
cjs/react.production.min.js 8 Delete
umd/react.development.js 100 Delete
umd/react.production.min.js 12 Delete
umd/react.profiling.min.js 16 Delete

scheduler

Refining deleted this package entirely.

