The People's Code: An analysis of public engagement with the US Federal Government's Open Source Pilot Program

Analyzes public engagement with the Federal Source Code Policy Pilot, finding limited overall impact but identifying key factors for successful projects.

Updated: Apr 3, 2025
By Jake Rashbass and Mairi Robertson

This report analyzes public engagement with the US Federal Government’s Open Source Pilot Program, part of the 2016 Federal Source Code Policy. It aims to inform Code.Gov’s decision on whether and how to continue the program beyond its August 2019 expiration date. The research provides empirical data and qualitative insights into what drives successful public engagement with government open source code.

Core Arguments & Findings

The Case for Federal Open Source Software (OSS)

The report outlines several key arguments favoring the use of OSS within the federal government (pp. 9-11):

  • Cost Savings: OSS can be cheaper than proprietary software, avoiding high license fees and recurring maintenance costs. It can reduce duplicative spending across agencies.
  • Avoiding Vendor Lock-in: OSS allows agencies to avoid being tied to a single vendor for modifications, support, and updates, providing flexibility and leverage.
  • Improved Reliability & Security: The ‘peer review’ nature of OSS, with many eyes on the code, generally improves quality, reduces bugs, and makes security verification easier compared to closed-source alternatives.
  • Enhanced Sharing & Reuse: OSS facilitates sharing code between federal agencies and reuse by state/local governments, non-profits, and civil society (e.g., Data.gov, analytics.usa.gov code).
  • Faster Development: Open development models can lead to shorter, more agile development cycles with quicker bug fixes and feature additions from the community.

However, the report cautions that OSS is not a universal solution, acknowledging cases where proprietary software might be necessary due to national security, privacy risks, specific support needs, or legal restrictions (p. 11).

The Federal Source Code Policy & Pilot Program

Launched in August 2016, the policy had two main pillars (p. 12):

  1. Interagency Requirement: New custom code must be shared internally across all federal agencies.
  2. Public Requirement (Pilot Program): At least 20% of new custom code must be released publicly as OSS for three years.

The report highlights ambiguities and challenges with the 20% requirement, including the lack of a standard measurement metric (lines of code, projects, cost?) and the potential incentive for agencies to release code that is easy to share rather than code that offers the most public benefit (p. 12, p. 35). The Pilot Program was designed specifically to test this public-facing component.

Pilot Program Impact on Engagement (Quantitative Findings)

Analysis of GitHub data from late 2009 to early 2019 revealed (pp. 17, 23-26):

  • No Sustained Increase: The Pilot Program did not lead to a sustained increase in the rate of new federal repository creation or the rate of public engagement (stars, forks, issues, pull requests). While absolute numbers grew, the growth rate slowed post-policy compared to the two years prior.
  • Highly Skewed Distribution: Engagement is heavily concentrated. The top 1% of repositories accounted for 51% of all engagement, and the top 20 ‘Superstar’ repositories captured over 40% (p. 17, p. 25). The median repository received only 6 engagements over the entire period.
  • Dominant Engagement Types: Users primarily engage by ‘starring’ (bookmarking/appreciating, 54%) and ‘forking’ (copying the code, 40%), suggesting interest in monitoring or reusing the code (p. 19, p. 26). Issues (feedback/bugs, 4%) and Pull Requests (code contributions, 2%) were much less common.
  • Agency Performance: NASA and the Department of Defense (DOD) contributed the most ‘Superstar’ repositories. However, when considering average engagement per repository, the General Services Administration (GSA) and the Department of the Interior performed more consistently (p. 18, p. 25).

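The skew described above can be illustrated with a short calculation. A minimal sketch, using a synthetic heavy-tailed distribution of engagement counts (the numbers below are illustrative stand-ins, not the report's data):

```python
# Illustrative sketch: measuring how concentrated engagement is across
# repositories. The counts are synthetic stand-ins, not the report's data.
import statistics

def concentration(counts, top_fraction):
    """Share of total engagement captured by the top `top_fraction` of repos."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# A heavy-tailed toy distribution: a few 'superstar' repos, many quiet ones.
counts = [40000, 12000, 8000] + [50] * 97 + [5] * 900

top_1pct_share = concentration(counts, 0.01)   # share held by the top 1%
median_engagement = statistics.median(counts)  # the 'typical' repository
```

Even in this toy example, the top 1% of repositories captures most of the total engagement while the median repository sees almost none, which is the shape of the distribution the report observed.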
Factors Driving Engagement (Qualitative Findings - DREAM CODE Framework)

Qualitative research identified nine key characteristics, grouped into the ‘DREAM CODE’ framework, that drive higher user engagement with government OSS projects (pp. 25, 31-39):

  1. Discoverability: Repositories must be easy to find via clear, relevant names, user-friendly home pages (with contact info, avatars), and active promotion (website links, SEO).
  2. Reusability: Code should be complete, self-contained, usable with minimal recoding, easy to initiate, and modular where possible.
  3. End user: Projects should target specific user populations, considering both intrinsic (direct use) and extrinsic (reputation) motivations for engagement. Avoid being too tightly tied to one specific end user if broad reach is desired.
  4. Applicability elsewhere: Code should be relevant beyond its original purpose, prioritizing library/infrastructure code over highly specific application code. Cross-platform compatibility helps.
  5. Maintenance: Code requires regular updates, bug fixes, and feature additions post-release. Encouraging maintainer diversity and using continuous integration practices improves quality. Development status should be clearly communicated.
  6. Community building: Active outreach is crucial. This includes engaging proactively (e.g., conferences), having dedicated community managers, providing structured forums for interaction (calls, webinars), and being responsive on platforms like GitHub.
  7. Open origins: Projects developed ‘in the open’ from Day 1 are often more successful, as the code is designed with OSS users in mind, avoiding late-stage security or cultural hurdles.
  8. Documentation: Clear, comprehensive documentation is essential. This includes ReadMe files, Wikis, mission statements, feature lists, setup instructions, examples, and clear licensing information. Using badges can signal quality and standards.
  9. Explicit licensing: Clear, permissive open source licenses should be chosen upfront and stated explicitly to avoid deterring potential contributors concerned about intellectual property.

Key Statistics & Data

  • Dataset Scope: 191,719 engagements across 5,672 repositories from 130 sub-agencies/organizations within 23 major federal agencies (Dec 16, 2009 - Jan 26, 2019) (p. 21).
  • Engagement Growth Rate: Averaged 8% monthly increase in the two years before the Pilot (Aug 2014-16), compared to 4% monthly after (Aug 2016-18) (p. 17, p. 24).
  • Repository Growth Rate: Averaged 5% monthly increase before the Pilot, compared to 2% monthly after (p. 17, p. 24).
  • Engagement Distribution: Top 1% of repositories = 51% of engagement; Top 20 repositories = 41% of engagement (p. 17, p. 25). Median engagement per repository = 6 (p. 17).
  • Top Repositories: NASA’s openmct (mission control framework) was the most engaged-with repository (~39.7k engagements), followed by DOD’s Dshell (network forensics) and SIMP (system integrity) (p. 28).
  • Agency Responsiveness: On average, agencies acted on 96% of pull requests but closed only 63% of issues, with significant variation between agencies (p. 19, p. 26, pp. 61-62).

Methodology

The study employed a mixed-methods approach (pp. 7, 14-20):

  • Quantitative Analysis: Developed an original dataset by scraping GitHub’s API for information on nearly 200,000 interactions (stars, forks, issues, pull requests) with over 5,000 federal repositories since late 2009. Analyzed trends in repository creation and engagement over time, distribution of engagement, and agency performance.
  • Qualitative Analysis: Conducted 10 expert interviews (public, private, non-profit sectors), 2 focus groups with 12 federal employees involved in OSS, and an extensive literature review to understand the factors driving success and identify best practices, leading to the DREAM CODE framework.
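The scraping step can be sketched as below. The endpoint and field names (`GET /repos/{owner}/{repo}`, `stargazers_count`, `forks_count`, `open_issues_count`) are GitHub's real REST API, but this is a simplified snapshot of current counts, not the event-level history the authors assembled; note also that GitHub's `open_issues_count` includes open pull requests.

```python
# Simplified sketch of collecting per-repository engagement counts from the
# GitHub REST API. Snapshot counts only, not the report's event-level data.
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}"

def fetch_repo(owner, repo):
    """Fetch one repository's metadata from the GitHub API (network call)."""
    with urllib.request.urlopen(API.format(owner=owner, repo=repo)) as resp:
        return json.load(resp)

def engagement(repo_json):
    """Total stars + forks + open issues from a /repos payload."""
    return (repo_json["stargazers_count"]
            + repo_json["forks_count"]
            + repo_json["open_issues_count"])

# Example (requires network access):
#   engagement(fetch_repo("nasa", "openmct"))
```

In practice a study like this would also paginate through the per-event endpoints (stargazers, forks, issues) to recover timestamps, and authenticate to raise API rate limits.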

Limitations: The dataset, while substantial, did not capture all federal repositories (estimated >90% coverage of GitHub-hosted ones). Establishing direct causality between the policy and engagement trends is difficult. The analysis could not determine user identities/types due to API/privacy restrictions, nor could it quantify internal government engagement. The authors, not being professional coders, relied on expert input for assessing code characteristics like modularity (pp. 18-20).

Key Conclusions & Recommendations

The Pilot Program showed mixed results, failing to significantly boost overall public engagement but highlighting the potential of well-managed projects (pp. 5, 33-53). Key conclusions informed four main recommendations for designing ‘Open Source Policy 2.0’:

  1. Define the Policy Objective Clearly: Code.Gov and GSA must articulate whether the primary goal of public OSS release is to solicit user contributions (improving code quality), enable third-party reuse (public value), support internal procurement/innovation goals, or a combination. This clarity is needed to guide implementation and evaluation (p. 34).
  2. Amend the “20% Requirement”: The 20% rule is flawed. The report strongly recommends adopting a ‘Default to Open’ approach (100% release with clear exceptions for security, privacy, etc.), similar to the UK model. This simplifies enforcement, aligns incentives with releasing valuable code, and maximizes opportunities for engagement and reuse, regardless of the chosen policy objective (pp. 35-38, 45).
  3. Provide Programmatic Support for Agencies: Recognizing that policy changes alone are insufficient, agencies need institutional support. The report recommends, at minimum:
    • Providing training for federal acquisition employees on OSS licensing and procurement.
    • Pushing for the Federal CIO Council to engage actively in the policy redesign.
    • Considering the creation of an ‘OSS Parachute Team’ within Code.Gov/GSA to provide targeted expertise and support to agencies lacking internal capacity (pp. 40-43, 50).
  4. Investigate Open Questions: Prioritize further research to fill knowledge gaps crucial for effective policy design:
    • Who is engaging? (User demographics, affiliations, motivations). Collaboration with efforts like the LISH/Linux Foundation census is suggested.
    • Impact of Licensing: Empirically analyze how different license types correlate with engagement levels.
    • Reusability Correlation: Test the hypothesis that code modularity and reusability directly correlate with higher engagement.
    • Financial Savings: Quantify any cost savings generated by the Pilot Program (pp. 44, 51-52).

Stated or Implied Applications

The report emphasizes the potential benefits derived from public engagement with federal OSS (p. 10, p. 12, p. 34):

  • Code Improvement: Contributions from the public can enhance code quality, fix bugs, and add features.
  • Reuse: Publicly released code can be reused by other federal agencies, state and local governments, non-profits, academic institutions, and the private sector, reducing redundant effort and cost.
  • Innovation: Fostering OSS communities can spur innovation both within and outside government.
  • Transparency & Public Value: Releasing code honors public ownership (“The People’s Code”) and allows taxpayers to benefit from federally funded software development.

Key Questions Addressed or Raised

  • Questions Addressed (p. 4):
    • To what extent did the Pilot Program increase user engagement with federal source code? (Answer: Not significantly at the aggregate level).
    • How did users engage? (Answer: Primarily Stars & Forks).
    • Which factors drove user engagement? (Answer: DREAM CODE framework).
    • What changes should Code.Gov make to boost engagement? (Answer: See Recommendations).
  • Questions Raised for Future Research (pp. 5, 44, 51-52):
    • Who are the users engaging with federal source code (demographics, type, motivation)?
    • How does the type of license used correlate with engagement levels?
    • How does code reusability/modularity correlate with engagement?
    • What financial savings, if any, has the Pilot Program generated?
    • How can agencies best be supported to build institutional capacity for OSS?

Key Points

  • The Pilot Program did not increase the aggregate rate of federal open source project creation or public engagement since its 2016 launch.
  • A small number of 'Superstar' projects (the top 20 repositories, roughly 0.4% of the dataset) accounted for over 40% of public engagement, indicating a highly skewed distribution.
  • Nine characteristics (DREAM CODE framework) drive user engagement: Discoverability, Reusability, End user focus, Applicability elsewhere, Maintenance, Community building, Open origins, Documentation, Explicit licensing.
  • Confusion exists among agencies and industry regarding the policy's primary objective (user engagement vs. internal reuse/savings).
  • The 20% public release requirement is difficult to measure and enforce, potentially incentivizing the release of less useful code.
  • Many federal agencies lack the institutional capacity, expertise, and resources to effectively manage open source projects and drive engagement.
  • Key questions remain regarding user demographics, the impact of licensing choices, and the correlation between code reusability and engagement.