
NPM registry in numbers

Often in development you have to choose between a DIY solution and one of several ready-made packages. You might justify your choice by relying on what the community uses, and usually that is a safe bet. For example, have you ever been asked to choose between underscore and lodash? Right on the front page of the npm registry there are some stats:

  • Total Packages: 94 767
  • Most dependent upon: underscore (7071), async (6475), request (5604), lodash (4960), commander (3644), express (3555), optimist (2717), coffee-script (2639), colors (2612), mkdirp (2253)

It looks like underscore is more popular than lodash, but I think this table is flawed: it counts every package in the registry, including long-abandoned ones.

Maturity of packages

One of the first ideas is to check how mature these nearly 100,000 packages are. We can't analyze the code quality of every package, but we can check the project age (the time between the first and the last version), the version count and the time since the last update.
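The three metrics can be computed from registry metadata. Below is a minimal Python sketch, assuming each package's registry document carries a `time` object that maps version strings to ISO 8601 timestamps next to `created` and `modified` keys, and assuming a fixed snapshot date. `parse_ts` and `maturity_metrics` are hypothetical names, not the method actually used for this post.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    # Registry timestamps end in "Z"; fromisoformat() before Python 3.11
    # does not accept that suffix, so swap it for an explicit UTC offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def maturity_metrics(time_field, now):
    """Return version count, project age and days since the last release.

    `time_field` is assumed to look like a registry `time` object:
    {"created": ..., "modified": ..., "1.0.0": ..., "1.1.0": ...}.
    """
    releases = sorted(
        parse_ts(ts)
        for key, ts in time_field.items()
        if key not in ("created", "modified")
    )
    if not releases:
        return {"versions": 0, "age_days": 0, "days_since_last": None}
    return {
        "versions": len(releases),
        # Project age: interval between the first and the last version.
        "age_days": (releases[-1] - releases[0]).days,
        "days_since_last": (now - releases[-1]).days,
    }


snapshot = datetime(2014, 1, 1, tzinfo=timezone.utc)
example = {
    "created": "2013-01-01T00:00:00.000Z",
    "modified": "2013-03-02T00:00:00.000Z",
    "0.1.0": "2013-01-01T00:00:00.000Z",
    "0.2.0": "2013-03-02T00:00:00.000Z",
}
print(maturity_metrics(example, snapshot))
# {'versions': 2, 'age_days': 60, 'days_since_last': 305}
```

The snapshot date matters: "days since the last release" is only meaningful relative to the moment the registry data was collected.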

Figure: Time since the last version
Figure: Distribution of days since the last release
Figure: Version counts

From these graphs we could come up with some reasonable boundaries. Packages that:

  • have more than one version: 71853
  • are over two weeks old: 42106
  • had a new release in the last 360 days: 71277

There are 31888 packages satisfying all three conditions. Though that is only one third of all packages, the number is still enormous.
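The three conditions combine into a simple predicate. A sketch, assuming per-package metrics (version count, age in days, days since the last release) have already been computed; `is_mature` and `mature_packages` are hypothetical names, and "over one month" is taken here as more than 30 days.

```python
def is_mature(versions, age_days, days_since_last,
              min_versions=2, min_age_days=15, max_idle_days=360):
    """Apply the three maturity conditions with configurable thresholds.

    Defaults mirror the loose boundaries: more than one version,
    over two weeks old, a release within the last 360 days.
    """
    return (
        versions >= min_versions
        and age_days >= min_age_days
        and days_since_last <= max_idle_days
    )


def mature_packages(metrics, **thresholds):
    # `metrics` maps package name -> (versions, age_days, days_since_last).
    return {name for name, m in metrics.items() if is_mature(*m, **thresholds)}


metrics = {
    "steady-lib": (12, 400, 30),
    "one-shot": (1, 0, 200),
    "abandoned": (8, 300, 500),
}
print(mature_packages(metrics))  # {'steady-lib'}
# Stricter boundaries: more than five versions, over a month old,
# a release within the last 180 days.
print(mature_packages(metrics, min_versions=6, min_age_days=31, max_idle_days=180))
```

Keeping the thresholds as parameters makes it easy to rerun the same filter with the stricter boundaries described below.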

Some excellent, time-proven packages do not meet the above definition of maturity. To include them, we take the dependency closure of the mature package set: we add the dependencies of all mature packages to the set, then the dependencies of those dependencies, and so on. This procedure terminates after five rounds, resulting in a set of 37466 packages.
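A minimal sketch of this fixed-point iteration, assuming a hypothetical `deps` mapping from each package name to the names of its runtime dependencies (for example, the keys of the latest version's `dependencies` field). Each round adds only packages not yet in the set, so the loop terminates even when dependencies are cyclic.

```python
def dependency_closure(seeds, deps):
    """Grow `seeds` with dependencies until nothing new is added.

    `deps` maps a package name to the names of its runtime dependencies.
    Returns the closed set and the number of rounds that added packages.
    """
    closure = set(seeds)
    frontier = set(seeds)
    rounds = 0
    while frontier:
        # Dependencies of the packages added in the previous round,
        # minus everything already in the set.
        new = {d for name in frontier for d in deps.get(name, ())} - closure
        if new:
            rounds += 1
        closure |= new
        frontier = new
    return closure, rounds


# "lib" depends back on "app": the cycle does not stop termination.
deps = {"app": ["lib"], "lib": ["util", "app"], "util": []}
print(dependency_closure({"app"}, deps))
```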

We can make our boundaries even stricter: packages should have more than five versions, be over one month old and have a new release within the last 180 days. The closure of this set has only 19857 packages.

For the following statistics we consider both sets, with values for the stricter set in parentheses.

Package families

There are many package families in the NPM registry, such as grunt-, gulp- and karma-. Which are the biggest ones?

The top 5 is:

  • grunt: 1673 (742)
  • generator: 756 (338↓)
  • node: 712 (359↑)
  • gulp: 620 (352↑)
  • express: 243 (124)

Figure: NPM package families

There are almost three times as many grunt packages as gulp packages, which raises the question of what those extra thousand grunt packages do.
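A rough way to reproduce such a family count, assuming a family is simply the name prefix before the first hyphen. The post does not state its exact rule, so this heuristic is a guess; note that it lumps `grunt-contrib-*` and other `grunt-*` packages into one family.

```python
from collections import Counter


def family_counts(names, top=5):
    """Count package families by the prefix before the first hyphen.

    Names without a hyphen (e.g. "grunt" itself) are not counted.
    """
    families = Counter(name.split("-", 1)[0] for name in names if "-" in name)
    return families.most_common(top)


names = [
    "grunt-contrib-clean", "grunt-contrib-watch", "grunt-karma",
    "gulp-sass", "gulp-less", "express-session", "grunt", "lodash",
]
print(family_counts(names, top=2))  # [('grunt', 3), ('gulp', 2)]
```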

Reverse dependencies

The npmjs.org site doesn't show reverse development dependencies. As there are many build-process-related packages in the registry, it's interesting to know how heavily they are depended upon.

Most depended upon during development:

  • mocha: 12063 (6912)
  • grunt: 5069 (2640)
  • chai: 4220 (2400)
  • should: 3553 (3061)
  • grunt-contrib-jshint: 3403 (1731)
  • grunt-contrib-clean: 1923 (925↓)
  • tape: 1844 (1154↑)
  • grunt-contrib-watch: 1823 (970↑)
  • coffee-script: 1704 (921↓)
  • jshint: 1648 (1032↑)
  • sinon: 1573 (857↓)
  • istanbul: 1546 (972↑)
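Reverse dependency counts like these come from inverting the dependency lists. A sketch, assuming `packages` is a hypothetical dict of package name to package.json-style metadata (for example, taken from each package's latest version). Passing `field="dependencies"` gives the runtime counterpart, and passing only the mature subset of `packages` restricts the count to it.

```python
from collections import Counter


def reverse_dependency_counts(packages, field="devDependencies"):
    """Count how many packages list each name under `field`.

    `packages` maps a package name to its package.json-style metadata.
    """
    counts = Counter()
    for meta in packages.values():
        # Pass the keys, not the dict: Counter.update() would treat a
        # mapping as name -> count and try to add the version strings.
        counts.update(meta.get(field, {}).keys())
    return counts


packages = {
    "app-a": {"devDependencies": {"mocha": "*", "chai": "*"}},
    "app-b": {"devDependencies": {"mocha": "~1.20.0"},
              "dependencies": {"lodash": "^2.4.1"}},
}
print(reverse_dependency_counts(packages).most_common(1))  # [('mocha', 2)]
print(reverse_dependency_counts(packages, "dependencies"))  # Counter({'lodash': 1})
```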

As you can see from the list above, more packages use jshint through grunt-contrib-jshint than directly. So we built a slightly more sophisticated table, which also counts packages that depend on a tool indirectly, through one of their development dependencies:

Depended upon during development, directly or through another package:

  • mocha: 12579 (7171)
  • jshint: 5497 (3040)
  • istanbul: 1924 (1176)
  • nodeunit: 1985 (985)
  • jasmine-node: 623 (356)
  • jscs: 403 (247)
  • eslint: 161 (97)

Mocha is clearly the most popular test framework, with nodeunit far behind and Jasmine third. Community interest in code coverage is increasing, and istanbul makes it easy. Also, more people check style issues in their code with jscs than perform static analysis with ESLint. It's surprising how low jshint usage is; people probably (hopefully!) use a globally installed jshint rather than running it through some other package.
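One possible reading of this indirect count, as a sketch: a package is counted for a tool if the tool is one of its devDependencies, or a runtime dependency of one of its devDependencies. The post doesn't spell out the exact rule, so the one-level depth and the name `dev_usage_counts` are assumptions.

```python
from collections import Counter


def dev_usage_counts(packages):
    """Count packages that use a tool during development, directly or
    through one of their devDependencies.

    Assumption: only one level of indirection is followed, i.e. a
    devDependency's own runtime `dependencies`.
    """
    counts = Counter()
    for meta in packages.values():
        dev = set(meta.get("devDependencies", {}))
        reached = set(dev)
        for name in dev:
            reached.update(packages.get(name, {}).get("dependencies", {}))
        # A set, so each package contributes at most once per tool it reaches.
        counts.update(reached)
    return counts


packages = {
    "grunt-contrib-jshint": {"dependencies": {"jshint": "~2.5.0"}},
    "app-a": {"devDependencies": {"grunt-contrib-jshint": "*"}},
    "app-b": {"devDependencies": {"jshint": "*", "grunt-contrib-jshint": "*"}},
}
print(dev_usage_counts(packages)["jshint"])  # 2
```

Here app-b reaches jshint both directly and through grunt-contrib-jshint, but is counted only once.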

It was also easy to redo the most-depended-upon list using only our mature package set:

  • async: 3215↑ (1867)
  • underscore: 2885↓ (1507↓)
  • lodash: 2650↑ (1555↑)
  • request: 2286↑ (1191)
  • commander: 1514 (805)
  • mkdirp: 1192↑ (752)
  • debug: 1150↑ (722)
  • express: 1147↓ (646↑)
  • colors: 1106↓ (661↓)
  • q: 1096? (597)
  • optimist: 1063↓ (562)
  • coffee-script: 924↓ (482↓)
  • chalk: 892? (533↑)

The list has almost the same entries, but the order and relative counts have changed. The gap between underscore and lodash is much smaller in this table, which suggests that underscore was used by many projects that are now abandoned. With the stricter maturity boundaries, lodash is even more popular than underscore!

The Chalk command-line colors library might soon be more popular than colors; the numbers in our refined datasets point that way. Additionally, commander, for parsing command-line arguments, is now clearly more popular than the deprecated optimist.

It is surprising to see CoffeeScript in this list, as a language compiler should mainly be a dev-dependency: you compile the CoffeeScript source to JavaScript for distribution, so you don't need coffee-script as a runtime dependency. Packages that depend on coffee-script include grunt, jasmine-node, jscoverage, cucumber and hubot. They all let you write your sources in CoffeeScript.

Licenses

Licensing is highly important, but it usually doesn't get enough attention. If you try to look up license information for packages in the npm registry, you'll be frustrated quite quickly. The npm package.json documentation suggests picking a license from the SPDX License List, but many packages don't follow this advice.

After normalizing and cleaning up the packages' metadata, we found the following numbers for the smaller package set (19857 packages):

  • MIT: 12018
  • unknown: 4099
  • BSD: 1120
  • Apache-2.0: 687
  • ISC: 442
  • not recognized or not OSI license: 428
  • BSD-2-Clause: 301
  • GPL-3.0: 188
  • BSD-3-Clause: 101
  • Apache: 93
  • GPL: 92
  • GPL-2.0: 47
  • LGPL-3.0: 29
  • LGPL: 27
  • AGPL-3.0: 26
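The cleanup step could look roughly like this. It's a sketch, assuming the three shapes of license metadata that package.json files have used (a `license` string, a `license` object with a `type`, and the older `licenses` array); the alias table is hypothetical and deliberately short.

```python
from collections import Counter

# Hypothetical alias table; the actual cleanup rules behind the numbers
# above are not published, and a real table would be much longer.
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "bsd": "BSD",
    "isc": "ISC",
    "apache": "Apache",
    "apache-2.0": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "bsd-2-clause": "BSD-2-Clause",
    "bsd-3-clause": "BSD-3-Clause",
    "gpl-3.0": "GPL-3.0",
    "gplv3": "GPL-3.0",
}


def license_ids(meta):
    """Normalize the license metadata of one package.

    Handles a `license` string, a `license` object with a `type` key and
    a legacy `licenses` array. Returns ["unknown"] if nothing is declared.
    """
    raw = meta.get("license", meta.get("licenses"))
    if isinstance(raw, list):
        entries = raw
    elif raw:
        entries = [raw]
    else:
        entries = []
    ids = []
    for entry in entries:
        name = entry.get("type", "") if isinstance(entry, dict) else str(entry)
        ids.append(ALIASES.get(name.strip().lower(), "not recognized"))
    return ids or ["unknown"]


packages = {
    "a": {"license": "MIT"},
    "b": {"license": {"type": "MIT License"}},
    "c": {},
}
# Count each package by its first declared license.
print(Counter(license_ids(m)[0] for m in packages.values()))
# Counter({'MIT': 2, 'unknown': 1})
```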

The license field is missing in one fifth of the packages; hopefully the license is at least mentioned in the readme. The license field should be mandatory, or at least there should be a warning when it's missing, as there is for the readme field.

Always remember to check the licenses of transitive dependencies. Some packages say they are licensed under MIT, yet they depend on an (A)GPL package! That might or might not be an issue for you.
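Such a check walks the whole dependency tree, not just the direct dependencies. A minimal sketch, assuming hypothetical `deps` (package to runtime dependency names) and `licenses` (package to normalized license id) mappings; which licenses count as a problem is a policy decision, and this example flags only GPL and AGPL variants.

```python
# Licenses treated as strong copyleft here; LGPL is left out on purpose.
COPYLEFT = {"GPL", "GPL-2.0", "GPL-3.0", "AGPL-3.0"}


def copyleft_in_tree(root, deps, licenses):
    """Return the transitive dependencies of `root` with a copyleft license.

    `deps` maps a package to its runtime dependency names and `licenses`
    maps a package to a normalized license id (both hypothetical inputs).
    """
    seen = set()
    stack = list(deps.get(root, ()))
    flagged = set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also protects against cycles
        seen.add(name)
        if licenses.get(name) in COPYLEFT:
            flagged.add(name)
        stack.extend(deps.get(name, ()))
    return sorted(flagged)


deps = {"my-lib": ["helper"], "helper": ["gpl-tool", "util"], "util": ["helper"]}
licenses = {"my-lib": "MIT", "helper": "MIT", "gpl-tool": "GPL-3.0", "util": "ISC"}
print(copyleft_in_tree("my-lib", deps, licenses))  # ['gpl-tool']
```

The MIT-licensed `my-lib` never mentions `gpl-tool` directly; it only shows up two levels down.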

More stats

Of the 37466 (19857) packages, 33327 (18004), that is about 89% (91%), are hosted on GitHub. Bitbucket users are in the minority, with 180 (84) packages. Surprisingly, five (one) packages still use Subversion. GitHub is definitely one of the cornerstones of open source software nowadays.
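Classifying hosts from the `repository` field could be done along these lines. A sketch, assuming the field is either a string (a URL or npm's `user/repo` GitHub shorthand) or an object with `type` and `url`; real metadata is messier, and `repo_host` and `host_share` are hypothetical names.

```python
import re


def repo_host(meta):
    """Classify a package's `repository` field by hosting service."""
    repo = meta.get("repository")
    if not repo:
        return "none"
    if isinstance(repo, dict):
        if repo.get("type") == "svn":
            return "subversion"
        url = str(repo.get("url", ""))
    else:
        url = str(repo)
    url = url.strip().lower()
    # npm treats a bare "user/repo" shorthand as a GitHub repository.
    shorthand = re.fullmatch(r"[\w.-]+/[\w.-]+", url)
    if url.startswith("github:") or "github.com" in url or shorthand:
        return "github"
    if url.startswith("bitbucket:") or "bitbucket.org" in url:
        return "bitbucket"
    return "other"


def host_share(packages, host):
    # Fraction of packages whose repository is classified as `host`.
    hosts = [repo_host(meta) for meta in packages.values()]
    return hosts.count(host) / len(hosts)


packages = {
    "a": {"repository": {"type": "git", "url": "git://github.com/user/a.git"}},
    "b": {"repository": "user/b"},
    "c": {"repository": {"type": "git", "url": "https://bitbucket.org/user/c"}},
    "d": {"repository": {"type": "svn", "url": "http://svn.example.org/d"}},
}
print(host_share(packages, "github"))  # 0.5
```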

13% (18%) of packages have more than one maintainer. The higher share in the stricter set suggests that our definition of maturity is reasonable. Yet we would all like this number to be much higher.

Conclusion

The npm registry is full of packages, and though our definition of maturity wasn't very strict, only a third of packages are even close to production quality. The total package count doesn't tell the whole story, either for the npm registry or for other package repositories.