PEC: Ted Enamorado

Oct 4, 2018, 12:15 pm
Fisher Hall 200



Event Description

Integrating information from multiple sources plays a key role in social science research. However, when no unique identifier is available to link records unambiguously, merging datasets can be a difficult and error-prone endeavor. In this paper, I propose an active learning algorithm for Probabilistic Record Linkage (PRL). The algorithm efficiently incorporates human judgement into the linkage process and significantly improves PRL's performance, at the cost of manually labeling only a small number of records. Using data on local politicians in Brazil and from a recent vote validation study conducted for the ANES, I show that the proposed method recovers estimates that are indistinguishable from those obtained through a more extensive, expensive, and time-consuming clerical review.