Publications
* indicates equal contribution
2024
- Bridging or Breaking: Impact of Intergroup Interactions on Religious Polarization. Rochana Chaturvedi*, Sugat Chaturvedi*, and Elena Zheleva. In Proceedings of the ACM on Web Conference 2024, 2024
While exposure to diverse viewpoints may reduce polarization, it can also have a backfire effect and exacerbate polarization when the discussion is adversarial. Here, we examine whether intergroup interactions around important events affect polarization between majority and minority groups in social networks. We compile data on the religious identity of nearly 700,000 Indian Twitter users engaging in COVID-19-related discourse during 2020. We introduce a new measure for an individual’s group conformity based on contextualized embeddings of tweet text, which helps us assess polarization between religious groups. We then use a meta-learning framework to examine heterogeneous treatment effects of intergroup interactions on an individual’s group conformity in the light of communal, political, and socio-economic events. We find that for political and social events, intergroup interactions reduce polarization. This decline is weaker for individuals at the extreme who already exhibit high conformity to their group. In contrast, during communal events, intergroup interactions can increase group conformity. Finally, we decompose the differential effects across religious groups in terms of emotions and topics of discussion. The results show that the dynamics of religious polarization are sensitive to the context and have important implications for understanding the role of intergroup interactions.
- Temporal Knowledge Graph Extraction and Modeling across Multiple Documents for Health Risk Prediction. Rochana Chaturvedi. In Companion Proceedings of the ACM on Web Conference 2024, 2024
Clinical text in electronic health records (EHR) holds vital cues to a patient’s journey that are often absent from structured EHR data. Evidence-based healthcare decisions demand accurate extraction and modeling of these cues. The goal of our study is to predict Type-II Diabetes by utilizing concept-based models of visit sequences from longitudinal EHR data. We undertake the challenging task of fine-grained temporal information extraction from clinical text using a recent span-based approach with pre-trained transformers. We achieve a new state of the art in end-to-end relation extraction on the 2012 clinical temporal relations corpus. We propose to apply our model to a new dataset and extract patient-centric temporal knowledge graphs from patients’ visits, fusing temporal orderings within documents and across visits. Beyond the current focus of our work on Type-II Diabetes risk prediction from EHR, our versatile framework can be extended to other domains, including web-based healthcare systems for personalized medicine. It can model not only health outcomes with long progression timelines but also socio-economic outcomes such as conflict, natural disasters, and financial markets, by extracting and modeling irregular time series from news, reports, and social-media text, and can help inform a variety of web-based applications and policies.
- It’s All in the Name: A Character Based Approach to Infer Religion. Rochana Chaturvedi* and Sugat Chaturvedi*. Political Analysis, 2024
Large-scale microdata on group identity are critical for studies on identity politics and violence but remain largely unavailable for developing countries. We use personal names to infer religion in South Asia—where religion is a salient social division, and yet, disaggregated data on it are scarce. Existing work predicts religion using a dictionary-based method, and therefore, cannot classify unseen names. We provide character-based machine learning models that can classify unseen names too with high accuracy. Our models are also much faster, and hence, scalable to large datasets. We explain the classification decisions of one of our models using the layer-wise relevance propagation technique. The character patterns learned by the classifier are rooted in the linguistic origins of names. We apply these to infer the religion of electoral candidates using historical data on Indian elections and observe a trend of declining Muslim representation. Our approach can be used to detect identity groups across the world for whom the underlying names might have different linguistic roots.
2023
- Sequential Representation of Sparse Heterogeneous Data for Diabetes Risk Prediction. Rochana Chaturvedi, Mudassir Rashid, Brian T. Layden, and 3 more authors. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec 2023
Type 2 diabetes (T2D) is a major public health problem, and opportunistic screening to detect T2D at an early stage can help initiate interventions that delay or prevent the disease and its complications. In this study, we use electronic health records (EHR) and concepts extracted from clinical notes to predict future T2D risk. Our deep neural network-based model captures the temporal sequence of patient visits. We use explainable AI algorithms to assess the model decisions and observe alignment with the domain knowledge of clinical experts.
2020
- Divide and conquer: From complexity to simplicity for lay summarization. Rochana Chaturvedi, Saachi*, Jaspreet Singh Dhani*, and 7 more authors. In Proceedings of the First Workshop on Scholarly Document Processing, Dec 2020
We describe our approach for the 1st Computational Linguistics Lay Summary Shared Task (CL-LaySumm20). The task is to produce non-technical summaries of scholarly documents. The summary should be within easy grasp of a layperson who may not be well versed in the domain of the research article. We propose a two-step divide-and-conquer approach. First, we judiciously select segments of the documents that are not overly pedantic and are likely to be of interest to lay readers, and over-extract sentences from each segment using an unsupervised network-based method. Next, we perform abstractive summarization on these extractions and systematically merge the abstractions. We run ablation studies to establish that each step in our pipeline is critical for improving the quality of the lay summary. Our approach leverages state-of-the-art pre-trained deep neural network-based models as zero-shot learners to achieve high scores on the task.