NPM registry in numbers
Often in development you have to decide between a DIY solution or a few ready-made packages. Sometimes you might try to justify the answer by relying on the community choice, and usually it is a safe bet.
For example, have you ever been asked to choose between underscore and lodash? Right on the front page of the npm registry there are some stats:
- Total Packages: 94 767
- Most dependent upon 7071 underscore 6475 async 5604 request 4960 lodash 3644 commander 3555 express 2717 optimist 2639 coffee-script 2612 colors 2253 mkdirp
It looks like underscore is more popular than lodash. I think this table is flawed.
Maturity of packages
One of the first ideas is to check how mature these 100k packages are. We can't analyze the code quality of all packages, but we can check the project age (time interval between first and last version), version count and time since the last update.
Time since the last version
Distribution of days since the last release
From these graphs we could come up with some reasonable boundaries. Packages that:
- have more than one version: 71853
- are over two weeks old: 42106
- had a new release in the last 360 days: 71277
There are 31888 packages satisfying all three conditions above. Though it's only one third of all packages there, the amount is still enormous.
There are some time-proven excellent packages that do not meet the above definition of maturity. Yet we can find a dependency closure of mature package set. More precisely, let’s take the dependencies of all packages and add them to the set of packages, then dependencies of dependencies and so on. In fact this procedure terminates after five rounds, resulting in a set of 37466 packages.
We can make our boundaries even stricter: packages should have more than five versions, be over one month old and have a new release within the last 180 days. The closure of this set has only 19857 packages.
For the following statistics we consider both sets, with values for the stricter set in parentheses.
There are lots of different package families in the NPM registry, grunt-, gulp-, karma- etc. Which are the biggest ones?
The top 5 is:
NPM package families
There are three times more grunt packages than gulp packages, though it begs the question of what those extra thousand grunt packages do.
The npmjs.org site doesn't show reverse development dependencies. As there are many build process -related packages in the registry, it's interesting to know how much they are depended upon.
Most dependent on during development:
As you can see from the list above, more people depend on jshint through grunt-contrib-jshint rather than directly. So we did a slightly more sophisticated table:
Depended during development upon packages depended upon:
Mocha is clearly the most popular test framework, while nodeunit is way behind. And Jasmine comes third. Community interest in code coverage is increasing, and it's indeed so easy with istanbul Also, more people are interested in checking style issues in their code with jscs than performing static analysis with ESLint. And it's surprising how low jshint usage is. People probably (hopefully!) use a globally installed jshint rather than using it through some other package.
And it was easy to remake the most dependent upon list using only our mature package set.
The list has almost the same entries, but the order and relative counts have changed. The difference between underscore and lodash is negligible in this table. It looks like underscore was used for many projects that are now abandoned. With stricter maturity boundaries lodash is even more popular than underscore!
The Chalk command-line colors library might soon be more popular than colors : the trend between our refined datasets justifies this. Additionally, commander for parsing command line arguments starts to be clearly more popular than the deprecated optimist.
License issues are highly important, but aren’t usually stressed enough. If you try to look up license information about packages in the npm registry, you'll be frustrated quite quickly. The npm package.json document suggests picking a license from the SPDX License List. That's not true for many packages.
After mangling and cleaning up the packages' metadata, we found the following numbers for the smaller package set (19857 packages):
not recognized or not OSI license
The license field is missing in one fifth of packages. We hope that the license is mentioned in a readme field. The license field should be mandatory, or at least there should be a warning if it's missing, in the same way as for the readme field.
Always remember to check the licenses of transitive dependencies. There are packages which say they are licensed under MIT, yet they depend on an (A)GPL package! That might or might not to be an issue for you.
About 33327 (18004) out of 37466 (19857) packages (that is, 89% (91%)) are hosted on GitHub. The Bitbucket users are in the minority, with 180 (84) packages. What is surprising is that there are five (one) packages using Subversion. GitHub is definitely one of the cornerstones for open source software nowadays.
13% (18%) of packages have more than one maintainer. This metric suggests that our definition of maturity is reasonable. Yet we would all like this number to be much higher.
The npm registry is full of packages, and though our definition of maturity wasn't very strict, only a third of packages are even close to production quality. The total package count doesn’t represent the whole truth, neither for the npm registry, nor for other package repositories.
- Oleg GrenrusSoftware Developer