Replication package

This page provides instructions on executing the replication package to produce the final results in the paper.

Download

Download the replication package (20.1 GB) containing all unrestricted datasets employed in the study.

Note that the Compustat dataset has restricted access and is therefore not part of the replication package. Obtain Compustat separately, rename the file “compustat_annual2.csv”, and place it in the “input_data” directory.
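
The renaming and placement step can also be scripted. The following is a minimal sketch, not part of the package: the helper name place_compustat and the example source path are hypothetical, and it assumes "input_data" sits directly under the project root.

```r
# Hypothetical helper: copy a downloaded Compustat extract to the file name
# and folder given in the instructions above.
place_compustat <- function(source_csv, project_dir = ".") {
  if (!file.exists(source_csv)) {
    stop("Source file not found: ", source_csv)
  }
  # Assumption: "input_data" is located directly under the project root.
  target_dir <- file.path(project_dir, "input_data")
  dir.create(target_dir, showWarnings = FALSE, recursive = TRUE)
  target <- file.path(target_dir, "compustat_annual2.csv")
  copied <- file.copy(source_csv, target, overwrite = TRUE)
  if (!copied) {
    stop("Could not copy the file to ", target)
  }
  invisible(target)
}

# Example call (the source path is a placeholder):
# place_compustat("~/Downloads/my_compustat_extract.csv")
```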

Environment setup

If you have both R and RStudio installed, begin by opening the “run_project.Rproj” file. The R project uses renv to keep the project portable and reproducible: renv ensures that the environment (libraries and their versions) is consistent across systems. If you encounter any issues with the renv environment, try the following command:


renv::repair()

If you prefer to run the project with the latest versions of the libraries rather than the ones used by the author, deactivate renv with “renv::deactivate()”.

Direct replication

Running the "07_tables_3_4_RUNALL.R" script alone produces Tables 1, 3, and 4 and Figures 1 through 5 in the “output” folder. To produce Table 5, run "08_table5_gmm.do" in Stata.
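
As a rough sketch of the R part of this step, the wrapper below checks that the Compustat file is in place before sourcing the main script and then lists what was written to “output”. The function name run_replication is hypothetical, and the sketch assumes that the scripts sit in the project root and resolve their paths relative to it; adjust if the package is laid out differently. Table 5 still has to be produced separately in Stata.

```r
# Hypothetical wrapper around the direct-replication step.
run_replication <- function(project_dir = ".") {
  compustat <- file.path(project_dir, "input_data", "compustat_annual2.csv")
  script <- file.path(project_dir, "07_tables_3_4_RUNALL.R")

  if (!file.exists(compustat)) {
    stop("Compustat file not found: ", compustat)
  }
  if (!file.exists(script)) {
    stop("Script not found: ", script)
  }

  # Assumption: the scripts use paths relative to the project root,
  # so the working directory is switched for the duration of the run.
  old_wd <- setwd(project_dir)
  on.exit(setwd(old_wd), add = TRUE)

  source(basename(script))

  # Return the names of the files found in the output folder.
  list.files("output")
}

# Example call from the project root:
# run_replication(".")
```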

Datasets

input_data/202208_OECD_PATENT_QUALITY_USPTO

OECD Patent Quality Indicators Database. Refer to the codebook and the OECD working paper located in the designated folder.

input_data/compustat

Compustat, obtained through Wharton Research Data Services (WRDS). A firm-level balance sheet dataset.

input_data/favotetal_2023

Favot et al. (2023). See the source file. IPC patent categories associated with the ENVTECH classification of climate change mitigation technologies.

input_data/koganetal_2017

Kogan et al. (2017). See the source file. Compustat gvkeys linked to patent identifiers to facilitate matching PatentsView patents with the firm-level dataset.

input_data/patentsView

PatentsView. “PatentsView is an award-winning visualization, data dissemination, and analysis platform that focuses on intellectual property (IP) data. Support for the site and the team that works on it comes from the Office of the Chief Economist at the U.S. Patent & Trademark Office (USPTO).”

Codes

00_setup.R

Sets up the environment by loading necessary libraries for data manipulation, visualization, and analysis.

01_build_patent_data.R

Cleans and curates patent data.
Identifies green patents.
Associates patent assignee information with patents for further analysis.
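
To illustrate how green patents can be flagged from IPC categories, here is a toy sketch. It is not the author's code: the column names, the patent identifiers, and the two IPC prefixes are assumptions chosen for illustration, and the actual category list comes from the Favot et al. (2023) source file.

```r
# Toy patent-to-IPC table (one row per patent and IPC code); all values are made up.
patent_ipc <- data.frame(
  patent_id = c("P1", "P1", "P2", "P3"),
  ipc_code  = c("H01M10/052", "G06F17/30", "A61K31/00", "F03D1/06"),
  stringsAsFactors = FALSE
)

# Illustrative prefixes only; the real list is taken from the source file.
green_prefixes <- c("H01M", "F03D")

# TRUE when an IPC code starts with any of the listed prefixes.
is_green_ipc <- function(ipc, prefixes) {
  vapply(
    ipc,
    function(code) any(startsWith(code, prefixes)),
    logical(1),
    USE.NAMES = FALSE
  )
}

patent_ipc$green_class <- is_green_ipc(patent_ipc$ipc_code, green_prefixes)

# A patent counts as green if at least one of its IPC codes is in the list.
green_by_patent <- aggregate(green_class ~ patent_id, data = patent_ipc, FUN = any)
print(green_by_patent)
```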

02_build_firm_data.R

Imports Compustat data, keeps the relevant years, cleans industry codes, and removes duplicates.
Calculates financial ratios and metrics from Compustat data, covering valuation, investment, profitability, and financing.
Merges Compustat data with patent data, using the Kogan et al. (2017) crosswalk to match patent and Compustat identifiers.
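
The three steps above can be sketched on toy data as follows. This is an illustration under assumptions, not the script itself: the two ratios shown are examples rather than the author's variable list, the crosswalk layout is hypothetical, and all values are made up. The Compustat mnemonics (gvkey, fyear, at, ni, capx) are standard.

```r
# Toy Compustat extract with one duplicated firm-year.
compustat <- data.frame(
  gvkey = c("001", "001", "001", "002", "002"),
  fyear = c(2010, 2010, 2011, 2010, 2011),
  at    = c(100, 100, 120, 50, 0),    # total assets
  ni    = c(10, 10, 12, -5, 1),       # net income
  capx  = c(8, 8, 9, 2, 1),           # capital expenditures
  stringsAsFactors = FALSE
)

# Step 1: keep one row per firm-year.
compustat <- compustat[!duplicated(compustat[c("gvkey", "fyear")]), ]

# Step 2: ratios, set to NA when the denominator is missing or non-positive.
safe_ratio <- function(num, den) {
  ifelse(is.na(den) | den <= 0, NA_real_, num / den)
}
compustat$roa      <- safe_ratio(compustat$ni, compustat$at)
compustat$inv_rate <- safe_ratio(compustat$capx, compustat$at)

# Step 3: link patents to firms through a crosswalk (hypothetical layout).
patents <- data.frame(
  patent_id   = c("P1", "P2", "P3"),
  patent_year = c(2010, 2011, 2010),
  stringsAsFactors = FALSE
)
crosswalk <- data.frame(
  patent_id = c("P1", "P2", "P3"),
  gvkey     = c("001", "001", "003"),
  stringsAsFactors = FALSE
)
linked <- merge(patents, crosswalk, by = "patent_id")

# Count patents per firm-year, then attach the counts to the firm panel.
counts <- aggregate(
  list(n_patents = rep(1L, nrow(linked))),
  by = list(gvkey = linked$gvkey, fyear = linked$patent_year),
  FUN = sum
)
firm_panel <- merge(compustat, counts, by = c("gvkey", "fyear"), all.x = TRUE)
firm_panel$n_patents[is.na(firm_panel$n_patents)] <- 0L
print(firm_panel)
```

Firm-years without a matched patent are kept with a count of zero, while patents whose gvkey does not appear in the Compustat extract drop out of the merged panel.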

03_build_patent_quality_data

Integrates OECD quality data with existing patent data.

04_build_sample_table1

Generates the sample and descriptive statistics tables (Table 2).

05_figures_1_2_4_5.R

Generates Figures 1, 2, 4, and 5, which cover the evolution of clean technology innovation and patent concentration.

06_figure3.R

Calculates the cumulative number of CCMT patents per firm for the years 2009 to 2020 and, separately, for the years 1976 to 2008. The results are joined, and additional firm information (such as organization name and Compustat status) is merged in.
Filters CCMT patents and groups the data by patent year. Creates quantile variables for financial indicators such as sales, return on assets (ROA), and cash flow. The data are then categorized into labeled periods, quartiles are calculated for each financial indicator in each period, and the results are combined and reshaped for visualization.
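
The two calculations described above, cumulative counts per firm and quartile groups for a financial indicator, can be sketched in base R. The data, column names, and firm labels below are made up for illustration and do not reproduce the script.

```r
# Toy CCMT patent list: one row per patent.
ccmt <- data.frame(
  firm        = c("A", "A", "A", "B", "B", "C"),
  patent_year = c(2009, 2010, 2010, 2009, 2012, 2011),
  stringsAsFactors = FALSE
)

# Patents per firm and year.
yearly <- aggregate(
  list(n = rep(1L, nrow(ccmt))),
  by = list(firm = ccmt$firm, patent_year = ccmt$patent_year),
  FUN = sum
)

# Running total within each firm, in chronological order.
yearly <- yearly[order(yearly$firm, yearly$patent_year), ]
yearly$cum_n <- ave(yearly$n, yearly$firm, FUN = cumsum)
print(yearly)

# Quartile groups for a financial indicator (toy sales figures).
sales <- c(5, 20, 35, 50, 80, 120, 200, 400)
breaks <- quantile(sales, probs = seq(0, 1, by = 0.25), na.rm = TRUE)
sales_quartile <- cut(
  sales,
  breaks = breaks,
  include.lowest = TRUE,
  labels = paste0("Q", 1:4)
)
print(table(sales_quartile))
```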

07_tables_3_4_RUNALL.R

Conducts a comprehensive econometric analysis of firm-level data to investigate the determinants of patenting activity. It covers cross-sectional, Poisson regression, and panel data models to capture different aspects of the data.
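
For readers unfamiliar with count models, the sketch below fits a Poisson regression of a patent count on firm characteristics using glm from base R. The data are simulated, and the variable names and specification are assumptions for illustration; they are not the paper's regressors or results.

```r
# Simulated toy data; not the paper's sample.
set.seed(1)
n <- 200
firms <- data.frame(
  log_sales = rnorm(n),
  roa       = rnorm(n, sd = 0.1)
)
# Patent counts drawn from a Poisson distribution with a log-linear mean.
firms$n_patents <- rpois(n, lambda = exp(0.5 + 0.3 * firms$log_sales))

# Poisson regression with a log link.
fit <- glm(
  n_patents ~ log_sales + roa,
  family = poisson(link = "log"),
  data = firms
)
print(summary(fit)$coefficients)
```

With a log link, each coefficient is read as the approximate proportional change in the expected patent count for a one-unit change in the regressor.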

08_table5_gmm.do

Estimates dynamic panel data models by GMM using the Stata xtabond2 command.