PE Research Seminar: Xuancheng Qian

AdroitRA: A Deep Learning Model for Information Extraction
Apr 19, 2022, 12:15 pm – 1:30 pm
JRR 217


Event Description

For social science research, high-quality text data has become an important data source. However, extracting information from documents is time-consuming, which often makes using text for theory development and empirical testing too expensive for researchers. The AdroitRA model, which builds on state-of-the-art deep neural network research, is a low-cost, semi-automated alternative to human coding. Our model allows researchers to extract information from documents by formulating queries and asking questions. We also present an adversarial-network-based active learning approach for improving the model's performance across diverse corpus domains. We evaluate the model on three popular and challenging question-answering datasets: SQuAD 2.0, NewsQA, and MASH-QA. The results demonstrate our model's effectiveness and versatility. Lastly, we apply our model to U.S. firms' annual reports, from which we extract information about the regulatory barriers firms face. We believe that our method helps open up new avenues for future research by lowering the cost of extracting information from documents.