Data and analysis code for "The Labor Market Integration of Refugee Migrants in High-Income Countries"
Courtney Brell, Christian Dustmann, and Ian Preston
Journal of Economic Perspectives 2020

All code and data files are provided in Stata 15 format.

----------------------------------------------------

Main data file: refugees_combined.dta
This file contains labor market outcomes for the countries we study, separately for natives, refugees, and other immigrants.
The outcomes are disaggregated by gender and by the number of years since arrival in the country.

Variables are as follows:
country - The host country
migranttype - 0=Native, 1=Refugee, 2=Other Immigrant
female - Boolean variable indicating recorded sex. A missing value indicates values aggregated over gender
yearssincearrive - The number of years since a migrant arrived in the country (natives are always recorded as 0)
employment - The employment rate in the sample
avg_income - The mean wage in the sample
Nemp - The number of observations used for calculating employment
Ninc - The number of observations used for calculating average wages
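
As a rough illustration of how these variables fit together, the following sketch loads the file and tabulates refugee employment by years since arrival. It assumes that the file is in the working directory and that migranttype is stored numerically with the coding listed above. Pooling across countries with Nemp as weights is an illustrative choice only, not necessarily the procedure used in analysis.do.

```stata
* Load the combined data file (assumed to be in the working directory)
use "refugees_combined.dta", clear

* Keep refugee rows that are aggregated over gender
* (female is missing for these rows, per the variable list above)
keep if migranttype == 1 & missing(female)

* Mean employment rate by years since arrival, pooled across countries
* and weighted by the number of observations behind each cell
* (illustrative weighting, not necessarily the authors' method)
collapse (mean) employment [aweight = Nemp], by(yearssincearrive)

list, clean
```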

Some of the data in this file can be generated from public surveys using the code provided in the folder "Public survey data codebase" (described below). The remaining data were provided to us by other authors as moments from administrative data sets, as described in the online appendix.

----------------------------------------------------

Main analysis file: analysis.do
This file is used to generate Figures 2, 4, and A1, and the statistics shown in Tables 1, 2, 3, and A1.

----------------------------------------------------

For those countries where we generated the data in refugees_combined.dta from public survey data sources, the corresponding code is provided in the folder "Public survey data codebase". Each of these files should be run in the folder containing the corresponding survey dataset. Each file (apart from EU-LFS.do) produces a .dta file containing a subset of the data in refugees_combined.dta. The files also output the descriptive statistics needed to produce Tables A2 and A4. In addition, US-ACS.do outputs Figure A4, and EU-LFS.do outputs Figures 3, A2, and A3.
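
Running one of these files from the folder of its survey dataset might look like the sketch below. Both paths are hypothetical placeholders and should be replaced with the actual locations of the survey data and of this codebase.

```stata
* Hypothetical paths: replace with the actual locations on your machine
cd "C:/data/UK-LFS"

* Run the survey-specific file from within the data folder
do "C:/replication/Public survey data codebase/UK-LFS.do"
```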

These files are as follows:

AU-BNLA.do
- Analysis of the "Building a New Life in Australia" survey (doi:10.26193/ZQHBPW). This data set is available (upon approval) for researchers at government, academic, and not-for-profit organizations from the National Centre for Longitudinal Data's Dataverse (https://dataverse.ada.edu.au/dataverse/bnla). Descriptions of the variables used in this file may be found in the documentation for the dataset.

AU-HILDA.do
- Analysis of the "Household, Income and Labour Dynamics in Australia" survey (doi:10.26193/PTKLYP). This data set is available (upon approval) for researchers at government, academic, and not-for-profit organizations from the National Centre for Longitudinal Data's Dataverse (https://dataverse.ada.edu.au/dataverse/hilda). Descriptions of the variables used in this file may be found in the documentation for the dataset.

DE-SOEP.do
- Analysis of the German Socio-Economic Panel (doi:10.5684/soep.v34). This data set is available (upon approval) to researchers at independent, non-profit research institutions for research purposes from the German Institute for Economic Research (DIW) (https://www.diw.de/en/diw_02.c.222829.en). Descriptions of the variables used in this file may be found in the documentation for the dataset.

EU-LFS.do
- Analysis of the EU Labour Force Survey (doi:10.2907/LFS1983-2018V.1). This data set is available (upon approval) to research organizations for scientific purposes from Eurostat (https://ec.europa.eu/eurostat/web/microdata/european-union-labour-force-survey). This code requires that the data for all countries have previously been appended into a single file "allcountries20xx_y.dta", and that datafileinfo_ahm_20xx has also previously been converted to Stata dta format. Descriptions of the variables used in this file may be found in the documentation for the dataset.
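
One possible way to build the appended file is sketched below. It assumes that the per-country files for a given year have already been converted to Stata format and follow a naming pattern of a two-letter country code, the year, and the suffix "_y". That pattern and the year used are assumptions for illustration, so adjust them to match the files actually supplied by Eurostat.

```stata
* Hypothetical year and input naming pattern (e.g. a two-letter country
* code followed by the year and "_y"); adjust to the files you received
local year 2016
local files : dir "." files "??`year'_y.dta"

* Start from an empty dataset and append each country file in turn
clear
foreach f of local files {
    append using "`f'"
}

save "allcountries`year'_y.dta", replace
```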

UK-LFS.do
- Analysis of the UK Labour Force Survey (doi:10.5255/UKDA-SN-6770-1). This dataset is available (upon registration) for non-commercial use from the UK Data Service (https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=6770). Descriptions of the variables used in this file may be found in the documentation for the dataset.

UK-SNR.do
- Analysis of the UK "Survey of New Refugees" (doi:10.5255/UKDA-SN-6556-1). This dataset is available (upon registration) for non-commercial use from the UK Data Service (https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=6556). Descriptions of the variables used in this file may be found in the documentation for the dataset.

US-ACS.do
- Analysis of the American Community Survey (ACS). Specifically, this makes use of the 2017 5-year release available (upon registration) from the Integrated Public Use Microdata Series (https://usa.ipums.org/usa/). As well as being run in the folder containing the ACS data, this code requires that the US-YISrefugees.dta file also be included in the same folder. Descriptions of the variables used in this file may be found in the documentation for the dataset.

US-YISrefugees.dta
- A compilation of refugee numbers corresponding to origin country-year of arrival pairs for the USA. These have been compiled from the Department of Homeland Security's Yearbook of Immigration Statistics (https://www.dhs.gov/immigration-statistics/yearbook), and include all origin country-year of arrival pairs that record more than 50 refugee arrivals. The variables in this file are as follows:
country - The origin country of the refugees
bpld - Codes indicating the corresponding birthplace, in the format used by the ACS
refxxxx - The number of refugees recorded by the Yearbook of Immigration Statistics as arriving in the US from this origin country in the year xxxx
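
For readers who prefer one row per origin country-year of arrival pair, the wide refxxxx layout can be reshaped as in the sketch below. It assumes that bpld uniquely identifies each row and that pairs outside the compilation are stored as missing values. The variable name arrivalyear is a hypothetical label introduced here and is not used in US-ACS.do.

```stata
* Load the refugee arrival counts (assumed to be in the working directory)
use "US-YISrefugees.dta", clear

* Turn ref1990, ref1991, ... into a single variable "ref" with the year
* stored in "arrivalyear" (hypothetical name); assumes bpld is unique per row
reshape long ref, i(bpld) j(arrivalyear)

* Drop country-year pairs with no recorded count (assumed to be missing)
drop if missing(ref)

list country bpld arrivalyear ref in 1/10, clean
```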