The multitalented Matt Pearce has created a fantastic new Stata tool called “UCI Merge”, which facilitates the merging of cross-national datasets.
Check it out: https://github.com/mpearce/UCIMerge/releases/latest
As many know first-hand, assembling and organizing large cross-national datasets is time-consuming, frustrating, and often error-prone. Matt’s new tool will help automate the process and save countless hours of tedious work.
As an aside: One of the reasons the Stanford world society/world polity research group has been so productive over many decades is the cumulation of expertise with cross-national datasets (as well as the sharing of data more generally). Knowledge about data was passed down through many generations of graduate students. I personally benefitted tremendously from the generosity of Marc Ventresca. Marc taught me all about how to organize datasets, and helped me figure out the big datasets that had been previously assembled at Stanford. (I remember asking “What is this variable called newid3?”) Anyhow, Matt has continued this important tradition of generosity with his knowledge and expertise.
An excerpt from the README file is below. This is new, so please report bugs or problems to Matt so they can be fixed. And, consider buying Matt a beer at ASA… He has earned it.
# UCIMerge – a framework for harmonizing cross national time series data
## Read Me
UCIMerge is a framework in STATA to standardize the merging of international comparative datasets. This project creates conventions and a library of functions so that it becomes easier and faster to merge time series datasets, incorporate updates, make sure observations are consistent across years, conserve N and encourage reproducible research.
This framework came about from conversations at the [UC Irvine International Comparative Workshop](http://sites.uci.edu/icsw/).
Download the [latest release](https://github.com/mpearce/UCIMerge/releases/latest). Join the [announcement list](http://eepurl.com/btU40r) to receive notifications of updates.
## How to use UCIMerge
The first time you run the scripts, it will take an extremely long time to update the datasets from the web. If you would like to jumpstart this, you can use this [starter pack](http://mattpearce.name/files/UCIMergeStarterPack.zip) by dropping its files into the /source directory. If you want to force the system to refresh a dataset, just delete that dataset file from /source.
1. Set the UCIMerge folder as the working directory for STATA (`cd ~/UCIMerge`)
2. Edit the Master.do file with the configuration that you would like.
3. Run `do master`. Your new dataset will be opened and saved within the UCIMerge folder.
UCIMerge requires STATA 13. The .csv files which link countries across datasets can be used independently.
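For readers who want to use the linking files outside STATA, here is a minimal Python sketch of the general idea: a crosswalk table maps each dataset's own country code to a shared country name, and observations are then joined on country and year. Everything in it is an assumption made for illustration. The column names (`country`, `code_a`, `code_b`), the codes, the indicator names, and the values are invented, and the real UCIMerge .csv files may be laid out differently, so check the files in the release before adapting it.

```python
import csv
import io

# Hypothetical crosswalk: one row per country, one column per dataset's own code.
# The real UCIMerge .csv files may use different column names and layouts.
CROSSWALK_CSV = """country,code_a,code_b
Country A,AAA,A1
Country B,BBB,B1
"""

# Toy country-year observations from two made-up datasets (placeholder values).
DATASET_A = [
    {"code_a": "AAA", "year": 2010, "indicator_x": 1.0},
    {"code_a": "BBB", "year": 2010, "indicator_x": 2.0},
]
DATASET_B = [
    {"code_b": "A1", "year": 2010, "indicator_y": 10},
    {"code_b": "B1", "year": 2010, "indicator_y": 20},
]


def load_crosswalk(text, from_col, to_col="country"):
    """Return a dict mapping one dataset's country code to the shared name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[from_col]: row[to_col] for row in reader}


def merge_country_years(left, left_col, right, right_col, crosswalk_text):
    """Join two lists of country-year rows on (shared country name, year)."""
    sources = (
        (left, left_col, load_crosswalk(crosswalk_text, left_col)),
        (right, right_col, load_crosswalk(crosswalk_text, right_col)),
    )
    merged = {}
    for rows, code_col, mapping in sources:
        for row in rows:
            country = mapping.get(row[code_col])
            if country is None:
                continue  # code not in the crosswalk: skip rather than guess
            key = (country, row["year"])
            record = merged.setdefault(key, {"country": country, "year": row["year"]})
            # Copy every variable except the dataset-specific code and the year.
            record.update({k: v for k, v in row.items() if k not in (code_col, "year")})
    return [merged[key] for key in sorted(merged)]


merged_rows = merge_country_years(DATASET_A, "code_a", DATASET_B, "code_b", CROSSWALK_CSV)
for row in merged_rows:
    print(row)
```

Rows whose code is missing from the crosswalk are skipped here. In real work you would probably want to log them instead, since silently dropped countries are a common way to lose N.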
## Currently Supported Datasets
* [Norris 2009](https://sites.google.com/site/pippanorris3/research/data#TOC-Democracy-Time-series-Data-Release-3.0-January-2009)
* [Freedom House 2015](https://freedomhouse.org/report/freedom-world/freedom-world-2015)
* [Polity IV](http://www.systemicpeace.org/polityproject.html)
* Polity IV Coups
* [World Development Indicators](http://data.worldbank.org)
* [KOF Index of Globalization](http://globalization.kof.ethz.ch)
* [The Lexical Index of Electoral Democracy (LIED)](http://ps.au.dk/forskning/forskningsprojekter/dedere/datasets/)
* [CIRI Human Rights Dataset](http://www.humanrightsdata.com)
* [Quality of Government Standard dataset](http://qog.pol.gu.se/data/datadownloads/qogstandarddata)
* [Cross National Time Series](http://www.databanksinternational.com)
* [Penn World Table version 8.1](http://www.rug.nl/research/ggdc/data/pwt/pwt-8.1)