Shrink node_modules with refining
Alternative universe title: Oops, I shrunk node_modules!
It's easy to end up with a bloated node_modules directory when developing a Node.js application. You can try to trim your dependencies or be clever about them with caching or symlinks, but that will only get you so far. What if you could shrink the modules themselves?
In this article I introduce refining, a technique to shrink node_modules by deleting unnecessary files within the modules themselves. It's an experimental technique. You still have to download your full node_modules, but you'll be able to shrink it before packaging it into an artifact or Docker image.
I also present two proof-of-concept experiments (with Express and React) using refining, as well as a discussion on limitations and applicability to a production build pipeline.
Why should I care about node_modules weight?
The first victim of a bloated node_modules is your developer feedback loop: unless your build system can cache dependencies, the loop just gets longer and longer. Developers should not have to wait more than 10 minutes for build feedback. Waits like that are terrible for your DevOps journey.
Your entire pipeline also suffers: you get big Docker images, overloaded image registry servers, and slow cold-starts in a container cluster.
This meme is instantly relatable to anyone that tries to maintain a healthy delivery pipeline:
What is refining?
You don't need every file in node_modules! It's filled with fluff that shouldn't have been published, such as documentation, build files, and test code. It may even have unused feature code: for example, you might only use part of a modularized package. If you can detect exactly which files your application needs then the rest can be deleted safely.
Refining is a method to separate a useful substance from impurities. In this case the useful substance is the set of required node_modules files and the impurities are the remaining files, which can be safely deleted.
Refining is a black-box approach: instead of analyzing the code (like webpack) it uses the operating system to list all the files your application opens. Every node_modules file that is not on that list (the complement of the opened files) can then be deleted to refine your node_modules.
This technique requires running the application first without any modifications to collect data on its behaviour. Consider the V8 optimizing compiler as an analogy: V8 first runs code in its bytecode interpreter (Ignition) while it collects data on code behaviour and hotspots. After analysis V8 hands some of the code to its optimizing machine-code compiler (TurboFan). For safety V8 can also fall back from TurboFan to Ignition if needed. A full application of refining might work in a similar way.
How to refine node_modules
I followed these steps to refine applications:
- Start file access tracing
- Run application and tests
- Stop file access tracing
- Delete unused node_modules files
- Rerun application and tests
File access tracing
I used opensnoop on macOS to trace file access by all node processes. On Linux you can use strace instead. I extracted only filepaths within node_modules (to exclude system files that Node.js reads on startup), and I printed them relative to my base directory for easy diffing later:
sudo opensnoop -n node \
| grep --color=never "$(pwd)/*node_modules" \
| awk '{print $5}' \
| sed -e "s:^$(pwd)::" \
| xargs -L 1 -I %e echo ".%e" \
| tee nm-trace.log
opensnoop stays running until stopped. My test script killed opensnoop after running my application and its tests.
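For the Linux route, a rough Node.js sketch of the same filtering step could look like the following. It was not part of the experiments. It assumes strace-style lines such as `openat(AT_FDCWD, "/path", O_RDONLY) = 3`, and the function name `extractNodeModulesPaths` is hypothetical.

```javascript
'use strict';
const path = require('path');

// Matches successful open/openat calls in strace-style output, for example:
//   [pid 42] openat(AT_FDCWD, "/app/node_modules/x/index.js", O_RDONLY|O_CLOEXEC) = 17
// Failed calls end in "= -1 ENOENT ..." and do not match, because a
// non-negative return value is required.
const OPEN_CALL = /\bopen(?:at)?\(.*?"([^"]+)".*\)\s+=\s+(\d+)/;

// Returns the unique, sorted node_modules paths found in the trace,
// written relative to baseDir with a leading "./".
function extractNodeModulesPaths(straceOutput, baseDir) {
  const seen = new Set();
  for (const line of straceOutput.split('\n')) {
    const match = OPEN_CALL.exec(line);
    if (!match) continue;
    const absolute = path.resolve(baseDir, match[1]);
    const relative = path.relative(baseDir, absolute);
    // Skip anything outside baseDir (system files, global modules).
    if (relative.startsWith('..') || path.isAbsolute(relative)) continue;
    // Skip project files that are not under a node_modules directory.
    if (!relative.split(path.sep).includes('node_modules')) continue;
    seen.add('./' + relative.split(path.sep).join('/'));
  }
  return [...seen].sort();
}

const sample = [
  'openat(AT_FDCWD, "/app/node_modules/express/index.js", O_RDONLY|O_CLOEXEC) = 17',
  'openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 18',
  'openat(AT_FDCWD, "/app/node_modules/nope.js", O_RDONLY) = -1 ENOENT (No such file or directory)',
].join('\n');

console.log(extractNodeModulesPaths(sample, '/app'));
```

The result has the same shape as nm-trace.log: one relative path per accessed file, ready to compare against a full file listing.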
Deleting unused files
I deleted the unused files with:
diff \
--new-line-format="" \
--unchanged-line-format="" \
<(find . -path "*/node_modules/*" -type f | sort) \
<(sort --unique nm-trace.log) \
| xargs -L 1 rm
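I sorted both diff inputs so that matching lines line up. With the new-line and unchanged-line formats emptied, diff prints only the files that exist on disk but never appeared in the trace.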
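The same set difference can be sketched in Node.js. This is an illustrative equivalent, not the script used in the experiments, and `filesToDelete` is a hypothetical name. Unlike the diff approach it does not depend on the inputs being sorted.

```javascript
'use strict';

// Returns every file from the full listing that never appeared in the trace.
function filesToDelete(allFiles, tracedFiles) {
  const needed = new Set(tracedFiles);
  return allFiles.filter((file) => !needed.has(file)).sort();
}

// Example with a few paths from the Express experiment.
const allFiles = [
  './node_modules/express/History.md',
  './node_modules/express/index.js',
  './node_modules/express/lib/express.js',
  './node_modules/express/LICENSE',
];
const tracedFiles = [
  './node_modules/express/index.js',
  './node_modules/express/lib/express.js',
];

console.log(filesToDelete(allFiles, tracedFiles));
```

Feeding the returned list to a delete step (for example fs.unlinkSync per entry) would complete the refining pass.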
Disk space measurement
I used du on macOS to measure disk space. Any sizes reported below are disk space measurements, not apparent size. That's why all measurements are in multiples of 4 kb.
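To illustrate why the numbers come in multiples of 4 kb, here is a small sketch of the rounding. It assumes a 4096-byte allocation block, which is inferred from the measurements rather than checked against the filesystem, and `diskUsageKb` is a hypothetical helper.

```javascript
'use strict';

// Assumption: the filesystem allocates space in 4096-byte blocks.
const BLOCK_SIZE = 4096;

// Converts an apparent size in bytes to the disk usage in kb,
// rounding up to a whole number of blocks.
function diskUsageKb(apparentBytes) {
  const blocks = Math.ceil(apparentBytes / BLOCK_SIZE);
  return (blocks * BLOCK_SIZE) / 1024;
}

console.log(diskUsageKb(1));    // 4: even a 1-byte file occupies a full block
console.log(diskUsageKb(4097)); // 8: one byte over a block spills into a second
```

This is why a tiny file such as a LICENSE shows up as 4 kb in the tables below.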
Experiment 1: a simple Express server
Refining a simple Express server reduced disk usage by 59% (from 2.4 mb to 984 kb) while keeping the server working:
Metric | Raw | Refined | Reduction |
---|---|---|---|
Disk usage | 2.4 mb | 984 kb | 59% |
Number of files | 314 | 123 | 61% |
Here were the top 3 packages in terms of size reduction:
Package | Raw size (kb) | Refined size (kb) | Reduction (kb) |
---|---|---|---|
iconv-lite | 400 | 32 | 368 |
qs | 164 | 36 | 128 |
express | 240 | 120 | 120 |
What got deleted from these modules:
- iconv-lite is used for parsing HTTP bodies. The bulk of the 92% size reduction was due to encoding files that are lazy loaded if needed. The remaining reduction came from deleting non-code files and a typings file.
- qs is used for parsing URL query parameters. The bulk of the 78% size reduction was due to a browser bundle in dist/ and test code. The remaining reduction came from deleting non-code files.
- express was the sole direct dependency in this experiment and provides a web server framework. The majority of the 50% size reduction comes from deleting a 108 kb History.md document.
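The percentages quoted for these three packages follow directly from the raw and refined sizes in the table. A minimal sketch of the arithmetic (`reductionPercent` is a hypothetical helper):

```javascript
'use strict';

// Percentage of the raw size that refining removed, rounded to a whole number.
function reductionPercent(raw, refined) {
  return Math.round(((raw - refined) / raw) * 100);
}

// Sizes in kb, copied from the table above.
const packages = [
  { name: 'iconv-lite', raw: 400, refined: 32 },
  { name: 'qs', raw: 164, refined: 36 },
  { name: 'express', raw: 240, refined: 120 },
];

for (const { name, raw, refined } of packages) {
  console.log(`${name}: ${reductionPercent(raw, refined)}%`);
}
// iconv-lite: 92%
// qs: 78%
// express: 50%
```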
This experiment shows that refining is not just about deleting Markdown documents inside a module directory. In addition to deleting documentation, refining safely deleted JavaScript files that weren't needed for the application: a qs browser bundle, qs test code, and iconv-lite optional encodings.
See Appendix 1 for setup details and a list of files that were kept or deleted by refining.
Experiment 2: React with server-side rendering
Refining a simple React application reduced disk usage by 94% (from 4.9 mb to 284 kb) while keeping the application working:
Metric | Raw | Refined | Reduction |
---|---|---|---|
Disk usage | 4.9 mb | 284 kb | 94% |
Number of files | 103 | 10 | 90% |
Here were the top 3 packages in terms of size reduction:
Package | Raw size (kb) | Refined size (kb) | Reduction (kb) |
---|---|---|---|
react-dom | 4500 | 132 | 4368 |
react | 220 | 72 | 148 |
scheduler | 104 | 0 | 104 |
What got deleted from these modules:
- react-dom is used to render React applications onto an HTML document object model (DOM) and is a direct dependency. The 97% size reduction was mostly due to deleting all browser (UMD) bundles, as well as any CommonJS code in cjs/ that is only needed for client-side rendering. Any code needed for server-side rendering was kept.
- react is the main direct dependency and is used for creating the view layer of an application. The bulk of the 67% size reduction came from deleting all browser (UMD) bundles. The CommonJS production code was also deleted as this experiment was run in development mode. All essential code was kept.
- scheduler is used by React internally for supporting browser environments. The package was deleted entirely as the application only uses server-side rendering.
This experiment has a much larger reduction than the Express experiment, likely because server-side rendering only uses a fraction of the React code. Only 10 dependency files were actually needed to run this application.
This experiment shows the potential of refining to trim packages based on the features used. Since the experiment focused on server-side rendering, refining deleted all browser bundles and even the CommonJS code that supports client-side rendering.
npm packages meant for both browsers and Node.js contain bundles to target both environments. However the application artifact only needs to have the Node.js-related code. Refining helped delete irrelevant code safely.
See Appendix 2 for setup details and a list of files that were kept or deleted by refining.
Limitations
Refining needs a complete list of all node_modules files your application needs. The application start and testing steps should ensure that Node.js loads all the node_modules files it needs in production.
Of course this would also catch test framework code. To avoid this you would have to pull your test framework out of your node_modules and supply it from another location (maybe a higher-level folder).
Ideally an application would require all packages up front; however, some applications may lazy load other modules. Your dependencies might also lazy load modules (though I would personally consider that bad behaviour from a library package).
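A small sketch shows why lazy loading is a problem for tracing. A require call inside a function only touches the file when that function runs, so a trace that never exercises the code path never sees the file. The temporary module below is a hypothetical stand-in for an optional file inside a dependency.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway module in a temp directory (hypothetical stand-in
// for an optional file shipped inside a dependency).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'refine-lazy-'));
const modulePath = path.join(dir, 'optional-encoding.js');
fs.writeFileSync(modulePath, "module.exports = 'loaded';\n");

// require.resolve only computes the cache key; it does not load the module.
const cacheKey = require.resolve(modulePath);

// The module is required lazily, only when this function is called.
function decodeRareEncoding() {
  return require(modulePath);
}

const loadedBefore = cacheKey in require.cache;
const value = decodeRareEncoding();
const loadedAfter = cacheKey in require.cache;

console.log({ loadedBefore, value, loadedAfter });
```

If the tests never call the lazy function, refining would delete the file and the first production call would fail with a module-not-found error.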
We need a generalized way to quickly determine if an application has finished file access for node_modules. Maybe if you monitor file access in production on your canary for a while, then you can build a reliable profile for later refining and release to broader traffic.
Once you have a complete list you could check it into your codebase as a snapshot, similar to React snapshot testing. Then you would update it periodically as your dependency usage changes.
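One possible shape for that snapshot check is sketched below, purely as an illustration of the idea. The function name and the in-memory arrays are assumptions, and a real pipeline would read both lists from files.

```javascript
'use strict';

// Compares a checked-in snapshot of required files with a fresh trace.
// "added" lists files the application now needs that the snapshot lacks;
// "removed" lists files the snapshot keeps that are no longer accessed.
function compareWithSnapshot(snapshot, trace) {
  const before = new Set(snapshot);
  const after = new Set(trace);
  return {
    added: [...after].filter((file) => !before.has(file)).sort(),
    removed: [...before].filter((file) => !after.has(file)).sort(),
  };
}

const snapshot = [
  './node_modules/express/index.js',
  './node_modules/qs/lib/index.js',
];
const trace = [
  './node_modules/express/index.js',
  './node_modules/iconv-lite/encodings/index.js',
];

console.log(compareWithSnapshot(snapshot, trace));
```

The added entries are the dangerous ones: refining with the stale snapshot would delete a file the application now needs, so a build could fail whenever added is non-empty.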
Conclusion
Refining has potential and works well for toy applications. The next step is to try it on larger and more complex applications.
Refining solves a problem that ideally shouldn't exist but unfortunately does. Even battle-tested and stable packages such as express have unnecessary files. Package maintainers could reduce size by actively including files to publish through the files keyword in package.json. This would prevent files such as .travis.yml from being published. However it wouldn't be feasible to coordinate an effort to trim even the most popular npm packages. I also can't see package maintainers going out of their way to exclude files to trim their modules because they don't pay the cost themselves.
Refining isn't code-intelligent. It's just keeping track of what files are accessed. It can't eliminate dead code within a file or perform tree-shaking. As long as a node_modules file is loaded, directly or indirectly, from your application entrypoint, refining will keep it. This works well for modules like react-dom which has multiple entrypoints depending on the usage mode (e.g. react-dom/server). If the usage mode is selected through a parameter instead, then likely all files would be retained, even the useless ones.
Packages already exist to delete node_modules files based on pattern matching (such as deleting all Markdown documents), for example Modclean. Refining goes further and deletes actual code files. It is agnostic to the type of a file. It only cares whether it is required or not.
This article gave some ideas about how to apply refining to a build pipeline (by storing snapshots) but doesn't have a full answer yet.
Appendix 1: detailed results from Experiment 1 (simple Express server)
Setup
I created a simple Express server application. The only direct dependency was express. This was the entire application:
const express = require('express');
const app = express();
app.use(express.json());
app.get('/', (req, res) => res.send('Hello world!'));
app.listen(8080, () => console.log('Listening on port 8080.'));
I tested the application with an HTTP request:
curl http://localhost:8080/
Files deleted or kept
Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:
iconv-lite
File | Size (kb) | Status |
---|---|---|
lib/bom-handling.js | 4 | Keep |
lib/extend-node.js | 12 | Keep |
lib/index.js | 8 | Keep |
lib/streams.js | 4 | Keep |
package.json | 4 | Keep |
.travis.yml | 4 | Delete |
Changelog.md | 8 | Delete |
LICENSE | 4 | Delete |
README.md | 8 | Delete |
encodings/dbcs-codec.js | 24 | Delete |
encodings/dbcs-data.js | 12 | Delete |
encodings/index.js | 4 | Delete |
encodings/internal.js | 8 | Delete |
encodings/sbcs-codec.js | 4 | Delete |
encodings/sbcs-data-generated.js | 32 | Delete |
encodings/sbcs-data.js | 8 | Delete |
encodings/tables/big5-added.json | 20 | Delete |
encodings/tables/cp936.json | 48 | Delete |
encodings/tables/cp949.json | 40 | Delete |
encodings/tables/cp950.json | 44 | Delete |
encodings/tables/eucjp.json | 44 | Delete |
encodings/tables/gb18030-ranges.json | 4 | Delete |
encodings/tables/gbk-added.json | 4 | Delete |
encodings/tables/shiftjis.json | 24 | Delete |
encodings/utf16.js | 8 | Delete |
encodings/utf7.js | 12 | Delete |
lib/index.d.ts | 4 | Delete |
qs
File | Size (kb) | Status |
---|---|---|
lib/formats.js | 4 | Keep |
lib/index.js | 4 | Keep |
lib/parse.js | 8 | Keep |
lib/stringify.js | 8 | Keep |
lib/utils.js | 8 | Keep |
package.json | 4 | Keep |
.editorconfig | 4 | Delete |
.eslintignore | 4 | Delete |
.eslintrc | 4 | Delete |
CHANGELOG.md | 16 | Delete |
LICENSE | 4 | Delete |
README.md | 16 | Delete |
dist/qs.js | 20 | Delete |
test/.eslintrc | 4 | Delete |
test/index.js | 4 | Delete |
test/parse.js | 24 | Delete |
test/stringify.js | 24 | Delete |
test/utils.js | 4 | Delete |
express
File | Size (kb) | Status |
---|---|---|
index.js | 4 | Keep |
lib/application.js | 16 | Keep |
lib/express.js | 4 | Keep |
lib/middleware/init.js | 4 | Keep |
lib/middleware/query.js | 4 | Keep |
lib/request.js | 16 | Keep |
lib/response.js | 28 | Keep |
lib/router/index.js | 16 | Keep |
lib/router/layer.js | 4 | Keep |
lib/router/route.js | 8 | Keep |
lib/utils.js | 8 | Keep |
lib/view.js | 4 | Keep |
package.json | 4 | Keep |
History.md | 108 | Delete |
LICENSE | 4 | Delete |
Readme.md | 8 | Delete |
Appendix 2: detailed results from Experiment 2 (React with server-side rendering)
Setup
I created a simple React application with server-side rendering. The only direct dependencies were react and react-dom. This was the entire application:
const React = require('react');
const ReactDOMServer = require('react-dom/server');
class Hello extends React.Component {
  render() {
    return React.createElement('div', null, `Hello ${this.props.toWhat}`);
  }
}
const html = ReactDOMServer.renderToString(
  React.createElement(Hello, {toWhat: 'World'}, null)
);
console.log(html);
Files deleted or kept
Here are the files that were either kept or deleted by refining from the top 3 packages in terms of size reduction:
react-dom
File | Size (kb) | Status |
---|---|---|
cjs/react-dom-server.node.development.js | 124 | Keep |
server.js | 4 | Keep |
server.node.js | 4 | Keep |
LICENSE | 4 | Delete |
README.md | 4 | Delete |
build-info.json | 4 | Delete |
cjs/react-dom-server.browser.development.js | 120 | Delete |
cjs/react-dom-server.browser.production.min.js | 20 | Delete |
cjs/react-dom-server.node.production.min.js | 20 | Delete |
cjs/react-dom-test-utils.development.js | 48 | Delete |
cjs/react-dom-test-utils.production.min.js | 12 | Delete |
cjs/react-dom-unstable-fire.development.js | 724 | Delete |
cjs/react-dom-unstable-fire.production.min.js | 100 | Delete |
cjs/react-dom-unstable-fire.profiling.min.js | 104 | Delete |
cjs/react-dom-unstable-fizz.browser.development.js | 4 | Delete |
cjs/react-dom-unstable-fizz.browser.production.min.js | 4 | Delete |
cjs/react-dom-unstable-fizz.node.development.js | 4 | Delete |
cjs/react-dom-unstable-fizz.node.production.min.js | 4 | Delete |
cjs/react-dom-unstable-native-dependencies.development.js | 64 | Delete |
cjs/react-dom-unstable-native-dependencies.production.min.js | 12 | Delete |
cjs/react-dom.development.js | 724 | Delete |
cjs/react-dom.production.min.js | 100 | Delete |
cjs/react-dom.profiling.min.js | 104 | Delete |
index.js | 4 | Delete |
package.json | 4 | Delete |
profiling.js | 4 | Delete |
server.browser.js | 4 | Delete |
test-utils.js | 4 | Delete |
umd/react-dom-server.browser.development.js | 124 | Delete |
umd/react-dom-server.browser.production.min.js | 20 | Delete |
umd/react-dom-test-utils.development.js | 48 | Delete |
umd/react-dom-test-utils.production.min.js | 12 | Delete |
umd/react-dom-unstable-fire.development.js | 728 | Delete |
umd/react-dom-unstable-fire.production.min.js | 100 | Delete |
umd/react-dom-unstable-fire.profiling.min.js | 104 | Delete |
umd/react-dom-unstable-fizz.browser.development.js | 4 | Delete |
umd/react-dom-unstable-fizz.browser.production.min.js | 4 | Delete |
umd/react-dom-unstable-native-dependencies.development.js | 64 | Delete |
umd/react-dom-unstable-native-dependencies.production.min.js | 12 | Delete |
umd/react-dom.development.js | 728 | Delete |
umd/react-dom.production.min.js | 100 | Delete |
umd/react-dom.profiling.min.js | 104 | Delete |
unstable-fizz.browser.js | 4 | Delete |
unstable-fizz.js | 4 | Delete |
unstable-fizz.node.js | 4 | Delete |
unstable-native-dependencies.js | 4 | Delete |
react
File | Size (kb) | Status |
---|---|---|
cjs/react.development.js | 64 | Keep |
index.js | 4 | Keep |
package.json | 4 | Keep |
LICENSE | 4 | Delete |
README.md | 4 | Delete |
build-info.json | 4 | Delete |
cjs/react.production.min.js | 8 | Delete |
umd/react.development.js | 100 | Delete |
umd/react.production.min.js | 12 | Delete |
umd/react.profiling.min.js | 16 | Delete |
scheduler
Refining deleted this package entirely.