Why GitHub Stars Lie About MCP Server Quality

Filesystem MCP has 128 stars on GitHub. Redis MCP has 847. We ran both through the same benchmark: 5 production tasks, same machine, same MCP SDK version. Filesystem passed every test at about 2 milliseconds per call. Redis timed out on 4 of the 5 tasks, with an average latency of 11 seconds. The server with nearly 7x as many stars was the one that failed. This isn't an anomaly. It's the norm.

The Star Count Trap

GitHub stars measure visibility, not reliability. They're a bookmark with public confirmation. A developer stars a repo because the README looks polished, the demo GIF is slick, or the project got retweeted by someone they follow. None of that tells you whether the server will actually start when you run npx in a clean environment.

We learned this the hard way. When we started MCP Select, we assumed the most-starred servers would be the safest bets. We were wrong. The first server we tested, a browser automation tool with over 600 stars, crashed on initialization because it expected a global Chrome install that wasn't documented anywhere. The second, a database connector with 400+ stars, silently swallowed connection errors and returned empty arrays instead of failing loudly.

Both had immaculate READMEs and hundreds of stars. Both were unusable in production.
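To make the silent-failure pattern concrete, here is a minimal Python sketch. It is not code from either server: `query_silent`, `query_loud`, `ConnectionFailed`, and the `connect` callable are hypothetical names chosen for illustration. The first function hides a connection error behind an empty result, and the second surfaces it.

```python
class ConnectionFailed(Exception):
    """Hypothetical error type raised when the backend cannot be reached."""


def query_silent(connect, sql):
    """Anti-pattern: a connection error is indistinguishable from 'no rows'."""
    try:
        conn = connect()  # `connect` is a hypothetical zero-argument callable
    except OSError:
        return []  # the caller cannot tell that the database was never reached
    return conn.execute(sql)


def query_loud(connect, sql):
    """Preferred: re-raise with context so the caller sees the real failure."""
    try:
        conn = connect()
    except OSError as exc:
        raise ConnectionFailed(f"could not connect: {exc}") from exc
    return conn.execute(sql)


def refused():
    """Stand-in for a connector whose backend is down."""
    raise ConnectionRefusedError("connection refused")
```

With `refused` as the connector, `query_silent(refused, "SELECT 1")` returns an empty list, while `query_loud(refused, "SELECT 1")` raises `ConnectionFailed`. Only the second gives a client enough information to stop and report the problem.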

What 13 Servers Actually Look Like Under Test

We tested 13 MCP servers across 4 categories using a uniform harness: macOS, Node.js v25.7.0, MCP SDK 1.0.0. Each server got 5 real tasks. No mocks. No cached credentials. Here's what happened.

- **The 100% club.** Playwright MCP (23 tools, 607ms p50), Puppeteer MCP (7 tools, 25ms p50), SQLite MCP (8 tools, 2ms p50), Filesystem MCP (14 tools, 2ms p50), and Shell MCP (1 tool, 118ms p50) passed every test. Three of these have fewer than 200 stars.
- **The middle.** Git MCP passed 3 out of 5 tests. Tavily MCP passed 1 out of 2. Both are useful but have edge cases that matter in production.
- **The failures.** Redis MCP timed out on 4/5 tasks with an 11-second latency. Browserbase MCP couldn't authenticate without an undocumented API key flow. PostgreSQL and Brave Search MCPs failed to connect entirely. Firecrawl MCP returned zero tools on discovery.

Average pass rate across all 13 servers: 66%. And here's the thing: in our results, pass rate did not track GitHub stars. SQLite, tied with Filesystem for the fastest response time at 2ms, has 89 stars. The server that hung for 11 seconds has 847.
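A harness of this kind needs, at minimum, a per-call timeout and a latency summary. The following is a minimal Python sketch of that idea, not the harness that produced the numbers above. `TaskResult`, `run_task`, and `summarize` are hypothetical names, each task is any zero-argument callable standing in for a real MCP tool call, and computing p50 over successful calls only is an assumption.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TaskResult:
    name: str
    passed: bool
    latency_ms: Optional[float]  # None when the call timed out or raised


def run_task(name: str, call: Callable[[], object], timeout_s: float) -> TaskResult:
    """Run one task once; a timeout or any exception counts as a failure."""
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = pool.submit(call)
    try:
        future.result(timeout=timeout_s)  # raises on timeout or task error
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return TaskResult(name, True, elapsed_ms)
    except Exception:
        return TaskResult(name, False, None)
    finally:
        # Do not block on a hung call; the worker thread is left to finish.
        pool.shutdown(wait=False)


def summarize(results: List[TaskResult]) -> dict:
    """Pass rate over all tasks, p50 latency over successful tasks only."""
    latencies = [r.latency_ms for r in results if r.latency_ms is not None]
    passed = sum(1 for r in results if r.passed)
    return {
        "pass_rate": passed / len(results) if results else 0.0,
        "p50_ms": statistics.median(latencies) if latencies else None,
    }
```

One caveat with this design: a call that times out keeps running in its worker thread. A real harness would more likely run each server as a subprocess and terminate it when the deadline passes.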

We're Not the Only Ones Who Noticed

Agent Tool Intelligence scored 39,752 MCP servers in early 2026. Their first model used static analysis only. Result? 85.7% of all tools scored "Grade B", completely useless for telling good from bad. They rebuilt their model to include live execution data. After the rebuild, only 0.4% of servers reached "Verified" status (70+ score). 69.9% scored below 30, "Experimental" tier. Stars alone would have hidden that bottom 70%.

PT-Edge's MCP Quality Index tracks 12,653+ servers daily across four dimensions: maintenance, adoption, maturity, and community. Their finding: a server needs a composite score of 50+ to be "Established." Only 5.4% make it. The rest look fine on paper but haven't been exercised.

Even the official Anthropic servers aren't immune. The mcp-quality-gate project tested server-everything (the reference implementation) and found it leaks environment variables through a get-env tool. server-filesystem, which we confirmed passes our tests, still has 72% of parameters undocumented in its schema. Stars don't catch that either.

What Stars Actually Measure

Stars measure three things, none of which is quality:

1. **Marketing surface area.** A good demo video beats a robust error-handling strategy for star accumulation. Every time.
2. **Network effects.** The first 50 stars come from the author's Twitter followers. The next 200 come from Hacker News. Neither group has installed the server.
3. **Recency bias.** A server launched last week can out-star a two-year-old workhorse because it's riding a trend wave.

What stars don't measure: timeout handling, schema correctness, auth flow clarity, error message quality, or whether the server still works with the current MCP SDK version.

What We Do Instead

At MCP Select, we don't look at stars until after we've tested a server. The ranking process is simple:

1. Install it in a clean environment. No pre-existing config, no cached credentials.
2. Run 5 production tasks. Real tool calls, real latency measurement.
3. Score it. Pass rate + p50 latency + tool discovery success.
4. Document the auth requirements. Free-to-test? API key needed? Paid subscription?
5. Publish the raw results. Every benchmark script is on GitHub.

This is slow. We test about one server per hour when everything goes right. But it's the only way to know if a server actually works before you commit to it.
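The scoring step combines three signals, but the text does not say how they are weighted. The Python sketch below shows one possible ordering, stated as an assumption: tool discovery acts as a gate, pass rate comes next, and p50 latency breaks ties. `Benchmark`, `rank_key`, and `rank` are hypothetical names. The first three example rows reuse figures reported above, and the last two rows are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Benchmark:
    server: str
    tasks_passed: int
    tasks_run: int
    p50_ms: Optional[float]  # None when no call completed
    tools_discovered: int


def rank_key(b: Benchmark):
    """Assumed ordering: discovery first, then pass rate, then p50 latency."""
    pass_rate = b.tasks_passed / b.tasks_run if b.tasks_run else 0.0
    discovery_failed = b.tools_discovered == 0
    latency = b.p50_ms if b.p50_ms is not None else float("inf")
    # Tuples sort element by element; False sorts before True.
    return (discovery_failed, -pass_rate, latency)


def rank(benchmarks: List[Benchmark]) -> List[Benchmark]:
    """Best server first. Star count is deliberately not an input."""
    return sorted(benchmarks, key=rank_key)


examples = [
    Benchmark("Playwright MCP", 5, 5, 607.0, 23),   # figures from the results above
    Benchmark("Shell MCP", 5, 5, 118.0, 1),         # figures from the results above
    Benchmark("Filesystem MCP", 5, 5, 2.0, 14),     # figures from the results above
    Benchmark("example-flaky", 3, 5, 40.0, 4),      # hypothetical placeholder
    Benchmark("example-no-tools", 0, 5, None, 0),   # hypothetical placeholder
]

for b in rank(examples):
    print(b.server)
```

Under these assumptions the order is Filesystem MCP, Shell MCP, Playwright MCP, example-flaky, example-no-tools. A weighted numeric score would work as well; the tuple ordering just keeps a fast but failing server from outranking a slower one that passes everything.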

The Bottom Line

If you're choosing an MCP server for production use, ignore the star count. Ask three questions instead:

- Has anyone run real tasks on this server and published the results?
- What's the p50 latency under load, not just the README claim?
- Does it fail loudly and clearly, or silently and dangerously?

We publish answers to all three for every server we test. Not because we're better than the community. Because we got burned by the alternative.

Try it yourself. Browse our benchmarked servers or run our open-source test harness against any server you're evaluating. The results might surprise you.

*Last updated: June 11, 2026. Tested on macOS with Node.js v25.7.0 and MCP SDK 1.0.0.*