In the course of my research, I have gathered and cleaned data that are not otherwise publicly available. I am committed to making these datasets available for public download, alongside the code required to generate them from the original sources. I intend to do this once I have finished the round of research that derives from these data (ca. Summer 2018). If you would like to take a look before then, please email me.
The workhorse of my dissertation is an original dataset measuring the proportion of a country's working-age population employed in high-capacity industries (manufacturing, mining, construction and transport). These data come from three different sources and required significant care to combine. Past research on the causes and consequences of labor strength has been handicapped by its reliance on data that are not available for most countries and most years, and that are very likely to be causally entangled with the outcomes of interest. My data offer a large advantage in coverage: they are available for the vast majority of countries in the world, and for most of the modern period. And while the use of these data cannot substitute for strategies to identify causal relationships, for many political and social outcomes, concerns about the endogeneity of the measure are probably less acute.
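Written as a formula, the measure is roughly the following. The notation is introduced here only for illustration: i indexes countries, t indexes years, E denotes employment in each of the four sectors, and W denotes the working-age population. How the three sources are harmonized, and the exact age bounds used for the working-age population, are not specified in this sketch.

```latex
s_{it} = \frac{E^{\mathrm{man}}_{it} + E^{\mathrm{min}}_{it} + E^{\mathrm{con}}_{it} + E^{\mathrm{tra}}_{it}}{W_{it}}
```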
Most existing research on strikes focuses on a handful of countries and, more importantly, on trends in the post-WWII era. I compiled a more extensive dataset. The raw data come almost entirely from the International Labour Organization, which has been collecting data on industrial disputes in member countries for almost a century. Data for the post-1969 period are available online, though I had to make some 'editorial' decisions about how to clean and organize the constituent series. Data prior to 1969 are available in various volumes of Brian Mitchell's International Historical Statistics. Most of these data are drawn from old ILO yearbooks, though some are attributed to other sources.
Barbieri and Keshk curate a dataset of imports and exports valued in nominal dollars. These data are very difficult to combine with GDP data, which are not available in nominal dollars before 1948. For the pre-1948 period, I combined GDP data with trade-volume data, all denominated in nominal dollars. This yields ratios of trade to GDP, a conventional measure of the intensity of globalization, for the years before 1948. Based on correspondence with other researchers, I believe these data are unique.
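The ratio itself is simple arithmetic once imports, exports and GDP are expressed in the same nominal-dollar units. The sketch below is illustrative only: the `CountryYear` record, its field names and the example figures are hypothetical and do not reflect the layout of the Barbieri and Keshk files or of my own dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountryYear:
    """One country-year observation (hypothetical schema); all values in nominal dollars."""
    country: str
    year: int
    imports: Optional[float]
    exports: Optional[float]
    gdp: Optional[float]


def trade_openness(obs: CountryYear) -> Optional[float]:
    """Return (imports + exports) / GDP, or None if a component is missing or GDP is non-positive."""
    if obs.imports is None or obs.exports is None or obs.gdp is None:
        return None
    if obs.gdp <= 0:
        return None
    return (obs.imports + obs.exports) / obs.gdp


if __name__ == "__main__":
    # Made-up figures purely for illustration, not actual data.
    example = CountryYear("Example", 1930, imports=120.0, exports=80.0, gdp=1000.0)
    print(trade_openness(example))  # 0.2
```

Returning `None` rather than zero for incomplete country-years keeps missing observations distinguishable from genuinely closed economies.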
COUNTY- AND CZ-LEVEL INCARCERATION RATES
Quantitative work on incarceration in the US typically measures its outcome variable at the state level, but, as is well known, there is substantial intra-state variation. With John Clegg, I use restricted-access data available through the National Corrections Reporting Program (NCRP) to generate incarceration rates at the county and commuting-zone levels between the mid-1980s and 2014. Because the reliability of these data varies, we construct samples of differing reliability based on how much information the NCRP reports and on how closely these data match the state-level counts released through the National Prisoner Statistics program.
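A minimal sketch of the general idea, under loudly stated assumptions: the function names, the plain dictionaries standing in for NCRP counts and population figures, the county-to-commuting-zone crosswalk, and the 10 percent tolerance are all hypothetical and are not the procedure or thresholds we actually use. It computes rates per 100,000 residents, sums county counts into commuting zones, and flags whether county counts add up to roughly the state-level total.

```python
from collections import defaultdict
from typing import Dict, Mapping


def rates_per_100k(prisoners: Mapping[str, int], population: Mapping[str, int]) -> Dict[str, float]:
    """Incarceration rate per 100,000 residents for each unit with a positive population."""
    return {
        unit: 100_000 * prisoners[unit] / population[unit]
        for unit in prisoners
        if population.get(unit, 0) > 0
    }


def aggregate(counts: Mapping[str, int], crosswalk: Mapping[str, str]) -> Dict[str, int]:
    """Sum county-level counts into larger units (e.g. commuting zones) via a county -> unit crosswalk."""
    totals: Dict[str, int] = defaultdict(int)
    for county, n in counts.items():
        totals[crosswalk[county]] += n
    return dict(totals)


def matches_state_count(county_counts: Mapping[str, int], state_total: int, tolerance: float = 0.1) -> bool:
    """True if county counts sum to within `tolerance` (a proportion) of the state-level total.

    The 0.1 default is a hypothetical threshold chosen for illustration.
    """
    if state_total <= 0:
        return False
    return abs(sum(county_counts.values()) - state_total) / state_total <= tolerance


if __name__ == "__main__":
    # Made-up counts for three counties in two commuting zones.
    prisoners = {"A": 50, "B": 150, "C": 100}
    population = {"A": 10_000, "B": 90_000, "C": 50_000}
    crosswalk = {"A": "cz1", "B": "cz1", "C": "cz2"}
    cz_rates = rates_per_100k(aggregate(prisoners, crosswalk), aggregate(population, crosswalk))
    print(cz_rates)  # {'cz1': 200.0, 'cz2': 200.0}
    print(matches_state_count(prisoners, 310))  # True
```

Aggregating counts before dividing, rather than averaging county rates, weights each county by its population when forming commuting-zone rates.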