NPM registry in numbers
Often in development you have to decide between a DIY solution and a few ready-made packages. Sometimes you might justify the choice by relying on the community's preference, and usually that is a safe bet.
For example, have you ever been asked to choose between underscore and lodash? Right on the front page of the npm registry there are some stats:
- Total Packages: 94 767
- Most depended upon: underscore (7071), async (6475), request (5604), lodash (4960), commander (3644), express (3555), optimist (2717), coffee-script (2639), colors (2612), mkdirp (2253)
It looks like underscore is more popular than lodash. I think this table is flawed.
Maturity of packages
One of the first ideas is to check how mature these 100k packages are. We can't analyze the code quality of all packages, but we can check the project age (time interval between first and last version), version count and time since the last update.
[Figure: time since the last version, distribution of days since the last release]
[Figure: version counts]
From these graphs we could come up with some reasonable boundaries. Packages that:
- have more than one version: 71853
- are over two weeks old: 42106
- had a new release in the last 360 days: 71277
There are 31888 packages satisfying all three conditions above. Though it's only one third of all packages there, the amount is still enormous.
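The three conditions above amount to a simple filter over release dates. The following is a minimal sketch in Python, assuming each package is described by a hypothetical `Package` record holding the release date of every version. The record, the default thresholds as parameters and the sample data are illustrative and not taken from the registry.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Package:
    # Hypothetical record: a name plus the release date of every version.
    name: str
    releases: list[date]


def is_mature(pkg: Package, today: date,
              min_versions: int = 2,
              min_age_days: int = 14,
              max_idle_days: int = 360) -> bool:
    """Return True if pkg meets all three maturity conditions."""
    if len(pkg.releases) < min_versions:
        return False
    first, last = min(pkg.releases), max(pkg.releases)
    age = last - first      # project age: first to last version
    idle = today - last     # time since the last release
    return (age > timedelta(days=min_age_days)
            and idle <= timedelta(days=max_idle_days))


# Made-up sample data; the dates are arbitrary.
today = date(2014, 10, 1)
packages = [
    Package("fresh-one-off", [date(2014, 9, 20)]),
    Package("abandoned", [date(2012, 1, 1), date(2012, 6, 1)]),
    Package("active", [date(2013, 1, 1), date(2014, 8, 1)]),
]
mature = [p.name for p in packages if is_mature(p, today)]
print(mature)  # ['active']
```

Stricter boundaries only change the parameters, for example `min_versions=6, min_age_days=30, max_idle_days=180`, where treating one month as 30 days is an assumption.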
There are some time-proven, excellent packages that do not meet the above definition of maturity. Yet we can compute the dependency closure of the mature package set. More precisely, we take the dependencies of all packages in the set and add them to the set, then the dependencies of those dependencies, and so on. In fact this procedure terminates after five rounds, resulting in a set of 37466 packages.
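The closure procedure can be sketched as a round-by-round expansion. This is an illustration in Python under stated assumptions: `dependencies` is a hypothetical mapping from a package name to the names it depends on, and the tiny graph is invented for the example.

```python
def dependency_closure(seed, dependencies):
    """Expand seed with dependencies round by round until nothing new appears.

    Returns the closed set and the number of rounds that added packages.
    """
    closure = set(seed)
    frontier = set(seed)
    rounds = 0
    while frontier:
        found = set()
        for name in frontier:
            found.update(dependencies.get(name, ()))
        frontier = found - closure   # keep only packages not seen before
        if frontier:
            closure |= frontier
            rounds += 1
    return closure, rounds


# Made-up dependency graph, for illustration only.
deps = {
    "app": ["web", "log"],
    "web": ["http"],
    "http": ["log"],
    "log": [],
    "unrelated": ["app"],
}
closure, rounds = dependency_closure({"app"}, deps)
print(sorted(closure), rounds)  # ['app', 'http', 'log', 'web'] 2
```

Because each round only follows packages that were new in the previous round, the loop always terminates, even if the dependency graph contains cycles.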
We can make our boundaries even stricter: packages should have more than five versions, be over one month old and have a new release within the last 180 days. The closure of this set has only 19857 packages.
For the following statistics we consider both sets, with values for the stricter set in parentheses.
Package families
There are lots of different package families in the npm registry: grunt-, gulp-, karma- and so on. Which are the biggest ones?
The top five npm package families are:
- grunt: 1673 (742)
- generator: 756 (338↓)
- node: 712 (359↑)
- gulp: 620 (352↑)
- express: 243 (124)
There are nearly three times as many grunt packages as gulp packages, though that raises the question of what those extra thousand grunt packages do.
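How the families were determined is not spelled out here. One simple approach, assumed for this sketch, is to treat the text before the first hyphen in a package name as its family. The package names in the example are only sample input.

```python
from collections import Counter


def family_counts(names):
    """Count packages per family, using the text before the first hyphen."""
    counts = Counter()
    for name in names:
        prefix, sep, _rest = name.partition("-")
        if sep and prefix:          # skip names without a hyphen
            counts[prefix] += 1
    return counts


names = ["grunt-contrib-clean", "grunt-contrib-watch", "gulp-util",
         "generator-angular", "express", "node-uuid"]
print(family_counts(names).most_common(1))  # [('grunt', 2)]
```

A prefix rule like this misses packages that belong to a family but are named differently, so the counts are a rough estimate at best.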
Reverse dependencies
The npmjs.org site doesn't show reverse development dependencies. As there are many build-process-related packages in the registry, it's interesting to know how often they are depended upon.
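Counting reverse development dependencies boils down to tallying the keys of every package's `devDependencies` field. Here is a minimal sketch in Python, assuming the registry metadata is already available as a hypothetical mapping from package name to a package.json-like dictionary. The sample manifests are invented.

```python
from collections import Counter


def reverse_dependency_counts(packages, field="devDependencies"):
    """Count, for each package name, how many packages list it under field."""
    counts = Counter()
    for manifest in packages.values():
        counts.update((manifest.get(field) or {}).keys())
    return counts


# Made-up manifests, for illustration only.
packages = {
    "a": {"devDependencies": {"mocha": "*", "chai": "*"}},
    "b": {"devDependencies": {"mocha": "*"}},
    "c": {"dependencies": {"lodash": "*"}},
}
counts = reverse_dependency_counts(packages)
print(counts.most_common(2))  # [('mocha', 2), ('chai', 1)]
```

Passing `field="dependencies"` gives the ordinary reverse dependency counts instead.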
Most depended upon during development:
- mocha: 12063 (6912)
- grunt: 5069 (2640)
- chai: 4220 (2400)
- should: 3553 (3061)
- grunt-contrib-jshint: 3403 (1731)
- grunt-contrib-clean: 1923 (925↓)
- tape: 1844 (1154↑)
- grunt-contrib-watch: 1823 (970↑)
- coffee-script: 1704 (921↓)
- jshint: 1648 (1032↑)
- sinon: 1573 (857↓)
- istanbul: 1546 (972↑)
As you can see from the list above, more people depend on jshint through grunt-contrib-jshint than directly. So we made a slightly more sophisticated table:
Packages depended upon during development, either directly or through a development dependency:
- mocha: 12579 (7171)
- jshint: 5497 (3040)
- istanbul: 1924 (1176)
- nodeunit: 1985 (985)
- jasmine-node: 623 (356)
- jscs: 403 (247)
- eslint: 161 (97)
Mocha is clearly the most popular test framework, while nodeunit is way behind and Jasmine comes third. Community interest in code coverage is increasing, and it is indeed easy with istanbul. Also, more people are interested in checking style issues in their code with jscs than in performing static analysis with ESLint. And it's surprising how low jshint usage is. People probably (hopefully!) use a globally installed jshint rather than using it through some other package.
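How the second table was computed is not described in detail. The sketch below shows one plausible reading, assumed for illustration: a package counts as used during development if it is a development dependency of a package or a regular dependency of one of that package's development dependencies. The manifests are invented, and a single level of indirection is an assumption.

```python
from collections import Counter


def dev_usage_counts(packages):
    """Count packages used in development directly or via one dev dependency."""
    counts = Counter()
    for manifest in packages.values():
        direct = set(manifest.get("devDependencies", {}))
        indirect = set()
        for dev in direct:
            indirect.update(packages.get(dev, {}).get("dependencies", {}))
        counts.update(direct | indirect)   # each package counts once per target
    return counts


# Made-up manifests, for illustration only.
packages = {
    "grunt-contrib-jshint": {"dependencies": {"jshint": "*"}},
    "jshint": {},
    "a": {"devDependencies": {"grunt-contrib-jshint": "*"}},
    "b": {"devDependencies": {"grunt-contrib-jshint": "*", "jshint": "*"}},
}
counts = dev_usage_counts(packages)
print(counts["jshint"], counts["grunt-contrib-jshint"])  # 2 2
```

Taking the union of direct and indirect names ensures that a package such as "b", which reaches jshint both ways, is counted only once.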
It was also easy to remake the most depended upon list using only our mature package set:
- async: 3215↑ (1867)
- underscore: 2885↓ (1507↓)
- lodash: 2650↑ (1555↑)
- request: 2286↑ (1191)
- commander: 1514 (805)
- mkdirp: 1192↑ (752)
- debug: 1150↑ (722)
- express: 1147↓ (646↑)
- colors: 1106↓ (661↓)
- q: 1096? (597)
- optimist: 1063↓ (562)
- coffee-script: 924↓ (482↓)
- chalk: 892? (533↑)
The list has almost the same entries, but the order and relative counts have changed. The difference between underscore and lodash is negligible in this table. It looks like underscore was used for many projects that are now abandoned. With stricter maturity boundaries lodash is even more popular than underscore!
The chalk command-line colors library might soon be more popular than colors: the trend between our refined datasets points that way. Additionally, commander, for parsing command-line arguments, is becoming clearly more popular than the deprecated optimist.
It is surprising to see CoffeeScript in this list as a language compiler should be mainly a dev-dependency. You compile the CoffeeScript source to JavaScript for distribution, so you don't need coffee-script to be a dependency. The packages that depend upon coffee-script include among others: grunt, jasmine-node, jscoverage, cucumber and hubot. They all allow you to use CoffeeScript sources.
Licenses
License issues are highly important, but they usually aren't stressed enough. If you try to look up license information about packages in the npm registry, you'll be frustrated quite quickly. The npm package.json documentation suggests picking a license from the SPDX License List. Many packages do not follow that advice.
After mangling and cleaning up the packages' metadata, we found the following numbers for the smaller package set (19857 packages):
- MIT: 12018
- unknown: 4099
- BSD: 1120
- Apache-2.0: 687
- ISC: 442
- not recognized or not OSI license: 428
- BSD-2-Clause: 301
- GPL-3.0: 188
- BSD-3-Clause: 101
- Apache: 93
- GPL: 92
- GPL-2.0: 47
- LGPL-3.0: 29
- LGPL: 27
- AGPL-3.0: 26
The license field is missing in one fifth of packages. We hope that the license is mentioned in a readme field. The license field should be mandatory, or at least there should be a warning if it's missing, in the same way as for the readme field.
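Part of the cleanup is that license metadata comes in several shapes: a plain `license` string, an object with a `type` key, or a `licenses` array of such objects. The sketch below normalizes these into a single string. It is an illustration, not the procedure used for the numbers above, and taking only the first entry of an array is a simplification.

```python
def normalize_license(manifest):
    """Return a single license string for a package.json-like dict, or 'unknown'."""
    value = manifest.get("license") or manifest.get("licenses")
    if isinstance(value, list):          # array form: [{"type": ..., "url": ...}]
        value = value[0] if value else None
    if isinstance(value, dict):          # object form: {"type": ..., "url": ...}
        value = value.get("type")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return "unknown"


print(normalize_license({"license": "MIT"}))                 # MIT
print(normalize_license({"licenses": [{"type": "BSD"}]}))    # BSD
print(normalize_license({"name": "no-license-here"}))        # unknown
```

Mapping the resulting strings onto SPDX identifiers (for example deciding what a bare "BSD" or "Apache" means) would be a further step that this sketch does not attempt.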
Always remember to check the licenses of transitive dependencies. There are packages which say they are licensed under MIT, yet they depend on an (A)GPL package! That might or might not be an issue for you.
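Such a check is easy to automate once dependency and license data are at hand. The following is a minimal sketch, assuming two hypothetical mappings: `dependencies` (package name to the names it depends on) and `licenses` (package name to an already normalized license string). The package names are invented.

```python
def copyleft_dependencies(name, dependencies, licenses,
                          flagged=("GPL", "AGPL")):
    """Return transitive dependencies of name whose license starts with a flagged prefix."""
    seen, stack, hits = set(), [name], set()
    while stack:
        current = stack.pop()
        for dep in dependencies.get(current, ()):
            if dep in seen:              # already visited, also guards against cycles
                continue
            seen.add(dep)
            stack.append(dep)
            if licenses.get(dep, "unknown").startswith(tuple(flagged)):
                hits.add(dep)
    return hits


# Made-up data, for illustration only.
deps = {"my-app": ["helper"], "helper": ["crypto-lib"], "crypto-lib": []}
licenses = {"my-app": "MIT", "helper": "MIT", "crypto-lib": "AGPL-3.0"}
print(copyleft_dependencies("my-app", deps, licenses))  # {'crypto-lib'}
```

The prefix test is deliberately crude: it flags "GPL-3.0" and "AGPL-3.0" but not "LGPL", and whether any of these is a problem depends on your own project.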
More stats
Of the 37466 (19857) packages, 33327 (18004), that is 89% (91%), are hosted on GitHub. Bitbucket users are in the minority, with 180 (84) packages. Surprisingly, there are five (one) packages using Subversion. GitHub is definitely one of the cornerstones of open source software nowadays.
13% (18%) of packages have more than one maintainer. This metric suggests that our definition of maturity is reasonable. Yet we would all like this number to be much higher.
Conclusion
The npm registry is full of packages, and though our definition of maturity wasn't very strict, only a third of packages are even close to production quality. The total package count doesn't tell the whole truth, either for the npm registry or for other package repositories.
- Oleg Grenrus, Software Developer