Our published AI research

Bloomberg contributes back to academia whenever we can by attending and speaking at conferences in AI, ML, NLP, and IR, awarding the Bloomberg Data Science Research Grant, hosting the Bloomberg Data Science Ph.D. Fellows, and serving as committee members for conferences.

Select research papers

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models. Bang An, Shiyue Zhang and Mark Dredze. 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics. | arXiv Preprint.

Adapting Sentence-Level Metrics for Document-Level Simplification. Mounica Maddela and Fernando Alva-Manchego (Cardiff University). 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics.

Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs. David Ifeoluwa Adelani, A. Seza Doğruöz, Iyanuoluwa Shode and Anuoluwapo Aremu. 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics. | arXiv Preprint.

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines. [BEST THEME PAPER AWARD]. Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, Ching Lam Cheng, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh and Chong-Wah Ngo. 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics. | arXiv Preprint.

Intent Classification on Low-Resource Languages with Query Similarity Search. Arjun Bhalla and Qi Huang. arXiv.

Understanding and Mitigating Risks of Generative AI in Financial Services. Sebastian Gehrmann, Claire Huang, Xian Teng, Sergei Yurovski, Iyanuoluwa Shode, Chirag S. Patel, Arjun Bhorkar, Naveen Thomas, John Doucette, David Rosenberg, Mark Dredze and David Rabinowitz. Published at the ACM Conference on Fairness, Accountability, and Transparency. | arXiv Preprint.

Evaluating the Retrieval Robustness of Large Language Models. Shuyang Cao, Karthik Radhakrishnan, David Rosenberg, Steven Lu, Pengxiang Cheng, Lu Wang and Shiyue Zhang. arXiv.

Benchmark Granularity and Model Robustness for Image-Text Retrieval: A Reproducibility Study. Mariya Hendriksen, Shuo Zhang, Ridho Reinanda, Mohamed Yahya, Edgar Meij and Maarten de Rijke. SIGIR 2025. | arXiv Preprint.

An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc. Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preoţiuc-Pietro, Sean MacAvaney and Pengxiang Cheng. SIGIR 2025; also presented at the co-located ReNeuIR’25: The 4th Workshop on Reaching Efficiency in Neural Information Retrieval. | arXiv Preprint.

FinIR: The 2nd Workshop on Financial Information Retrieval in the Era of Generative AI. Workshop Organizers: Fengbin Zhu, Yunshan Ma, Fuli Feng, Chao Wang, Huanbo Luan, Guangnan Ye, Shuo Zhang, Dhagash Mehta, Pingping Chen, Bing Xiang and Tat-Seng Chua. Co-located at SIGIR 2025.

Agentic Retrieval of Topics and Insights from Earnings Calls. Anant Gupta, Rajarshi Bhowmik and Geoffrey Gunow. FinIR: The 2nd Workshop on Financial Information Retrieval in the Era of Generative AI (co-located at SIGIR 2025).

A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents. Bin Wu, Edgar Meij and Emine Yilmaz. Findings of the Association for Computational Linguistics: ACL 2025.

GEM2 Workshop: Generation, Evaluation & Metrics. Workshop Organizers: Sebastian Gehrmann, Gabriel Stanovsky, Simon Mille, Enrico Santus, Miruna Clinciu, Kaustubh Dhole, Yotam Perlitz, Rotem Dror, Itay Itzhak, Ofir Arviv, Eliya Habba, Michal Shmueli Scheuer, João Sedoc and Oyvind Tafjord. Co-located at ACL 2025.

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding. Haoran Zhou, Xingchen Song, Brendan Fahy, Qiaochu Song, Binbin Zhang, Zhendong Peng, Anshul Wadhawan, Denglin Jiang, Apurv Verma, Vinay Ramesh, Srivas Prasad and Michele M. Franceschini. Interspeech 2025. | arXiv Preprint.

LLM-as-a-Judge: Rapid Evaluation of Legal Document Recommendation via Retrieval-Augmented Generation. Anu Pradhan, Alexandra Ortan, Apurv Verma and Madhavan Seshadri. To be published at 2nd EARL Workshop on Evaluating and Applying Recommender Systems with Large Language Models at RecSys 2025.

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding. Jaemin Cho, Debanjan Mahata, Ozan İrsoy, Yujie He and Mohit Bansal. 1st Workshop on the Findings of ICCV. | arXiv preprint.

Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies. Terrance Liu (Carnegie Mellon University), Shuyi Wang, Daniel Preoţiuc-Pietro, Yash Chandarana and Chirag Gupta. To be published at EMNLP 2025. | arXiv preprint

Improving Instruct Models for Free: A Study on Partial Adaptation. Ozan İrsoy, Pengxiang Cheng, Jennifer L. Chen (Independent Researcher), Daniel Preoţiuc-Pietro, Shiyue Zhang and Duccio Pappadopulo. EMNLP 2025. | arXiv preprint

STARQA: A Question Answering Dataset for Complex Analytical Reasoning over Structured Databases. Mounica Maddela, Lingjue Xie, Daniel Preoţiuc-Pietro and Mausam. EMNLP 2025.

Can LLMs Find a Needle in a Haystack? A Look at Anomaly Detection. Leslie Barrett, Vikram Sunil Bajaj and Robert Kingan. Findings of the Association for Computational Linguistics: EMNLP 2025.

Can LLMs Be Efficient Predictors of Conversational Derailment? Kaustubh Olpadkar, Vikram Sunil Bajaj and Leslie Barrett. Findings of the Association for Computational Linguistics: EMNLP 2025.

Enhancing RAG Efficiency with Adaptive Context Compression. Shuyu Guo (Shandong University), Shuo Zhang and Zhaochun Ren (Leiden University). Findings of the Association for Computational Linguistics: EMNLP 2025.

LEXTIME: A Benchmark for Temporal Ordering of Legal Events. Claire Barale (University of Edinburgh), Leslie Barrett, Vikram Sunil Bajaj and Michael Rovatsos (University of Edinburgh). Findings of the Association for Computational Linguistics: EMNLP 2025.

Learning to Trade with Preferences: Interpretable Execution via Mixture-of-Experts. Haohan Xu (Stony Brook University), Jason Bohne, Pawel Polak (Stony Brook University), David Byrd (Bowdoin College), David Rosenberg and Gary Kazantsev. ICAIF 2025.

NeuralBeta: Estimating Beta Using Deep Learning. Yuxin Liu, Jimin Lin and Achintya Gopal. ICAIF 2025.

Robust Pricing of To-Be-Announced (TBA) Securities. Matias Altamirano, Saher Esmeir, Sophia Sosnina, Jan Szopinski, Miha Torkar, Nathan Visagan and Min Wang. Workshop on Rethinking Financial Time-Series (co-located with ICAIF ’25).

From News to Forecasts: Early Detection of Dividend Regime Shifts with AI Agents. Duc Nguyen, Ching-Yu Lin, Thomas Fonlladosa, Jan Szopinski, Min Wang and Saher Esmeir. 2nd Workshop on LLMs and Generative AI for Finance (co-located with ICAIF ’25).

Learning from Interval Targets. Rattana Pukdee, Ziqi Ke and Chirag Gupta. To be published at NeurIPS 2025. | arXiv preprint

DELPHYNE: A Pre-Trained Model for General and Financial Time Series. Xueying Ding, Aakriti Mittal and Achintya Gopal. To be published at the Generative AI in Finance Workshop (co-located at NeurIPS 2025). | arXiv preprint

Teachable Facets: A Framework of Interactive Machine Teaching for Information Filtering. Swati Mishra, Matt Ryerkerk, Yitzchak D. Lockerman, David Eis and Jeffrey M. Rzeszotarski. ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2024).

Unsupervised Contrast-Consistent Ranking with Language Models. Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preoţiuc-Pietro and Rajarshi Bhowmik. 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024).

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks. Danae Sánchez Villegas, Daniel Preoţiuc-Pietro and Nikolaos Aletras. Findings of the Association for Computational Linguistics: EACL 2024.

Towards Efficient Active Learning in NLP via Pretrained Representations. Artem Vysogorets and Achintya Gopal. Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024.

A new era of AI-assisted journalism at Bloomberg. Claudia Quinonez and Edgar Meij. AI Magazine (Volume 45, Issue 2 – Summer 2024). DOI: 10.1002/aaai.12181

Modeling and Detecting Company Risks from News. Jiaxin Pei, Soumya Vadlamannati, Liang-Kang Huang, Daniel Preoţiuc-Pietro and Xinyu Hua. NAACL 2024 Industry Track.

Leveraging Contextual Information for Effective Entity Salience Detection. Rajarshi Bhowmik, Marco Ponza, Atharva Tendle, Anant Gupta, Rebecca Jiang, Xingyu Lu, Qian Zhao and Daniel Preoţiuc-Pietro. Findings of the Association for Computational Linguistics: NAACL 2024.

Non-contrastive sentence representations via self-supervision. Marco Farina and Duccio Pappadopulo. Findings of the Association for Computational Linguistics: NAACL 2024.

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs). Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette and NhatHai Phan. arXiv.

Investigating Flexible Role Binding in AI Agents. Brian Pennisi, Rheza Budiono, Todd M. Gureckis and Mark K. Ho. COGSCI 2024.

Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering. Zhengliang Shi, Shuo Zhang, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen and Zhaochun Ren. ACL 2024.

ULTRA: Unleash LLMs’ Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Self-Refinement. Xinliang Frederick Zhang, Carter Blum, Temma Choji, Shalin Shah and Alakananda Vempala. Findings of the ACL: ACL 2024. | arXiv preprint.

Online Nonconvex Bilevel Optimization with Bregman Divergences. Jason Bohne, David Rosenberg, Gary Kazantsev and Pawel Polak. arXiv preprint.

Academics Can Contribute to Domain-Specialized Language Models. Mark Dredze, Genta Indra Winata, Prabhanjan Kambadur, Shijie Wu, Ozan İrsoy, Steven Lu, Vadim Dabravolski, David S Rosenberg and Sebastian Gehrmann. EMNLP 2024.

Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs. [OUTSTANDING PAPER AWARD]. Alexander Spangher, Nanyun Peng, Sebastian Gehrmann and Mark Dredze. EMNLP 2024.

Enhancing Question Answering on Charts Through Effective Pre-training Tasks. Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang and Shalin Shah. To be published during the BlackboxNLP Workshop at EMNLP 2024.

Can We Statically Locate Knowledge in Large Language Models? Financial Domain and Toxicity Reduction Case Studies. Jordi Armengol-Estapé, Lingyu Li, Sebastian Gehrmann, Achintya Gopal, David S Rosenberg, Gideon Mann and Mark Dredze. BlackboxNLP Workshop at EMNLP 2024.

Neural Term Structures of Additive Process for Option Pricing. Jimin Lin and Guixin Liu. 5th ACM International Conference on AI in Finance (ICAIF 2024).

Augmenting Equity Factor Investing with Global Macro Regimes. Dmitriy Nuriyev, Lingjie Ye, and Songyun Duan. 5th ACM International Conference on AI in Finance (ICAIF 2024).

Generative Machine Learning for Multivariate Equity Returns. Ruslan Tepelyan and Achintya Gopal. 5th ACM International Conference on AI in Finance (ICAIF 2024).

NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities. Achintya Gopal. 5th ACM International Conference on AI in Finance (ICAIF 2024).

Online Nonconvex Bilevel Optimization with Bregman Divergences. Jason Bohne, David S Rosenberg, Gary Kazantsev and Pawel Polak. 16th International OPT Workshop on Optimization for Machine Learning (OPT2024) at NeurIPS 2024.

BloombergGPT: A Large Language Model for Finance. Shijie Wu, Ozan İrsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg and Gideon Mann. arXiv.

Preserving Fairness in Artificial Intelligence–Based Travel Demand Forecasting Models. Xiaojian Zhang (University of Florida), Qian Ke (Bloomberg) and Xilei Zhao (University of Florida). Poster presented at the Transportation Research Board (TRB) 102nd Annual Meeting.

Interactive Convolutional Network for Forecasting Travel Demand of Shared Micromobility. Yiming Xu (University of Florida), Qian Ke (Bloomberg) and Xilei Zhao (University of Florida). Poster presented at the Transportation Research Board (TRB) 102nd Annual Meeting.

UserSimCRS: A User Simulation Toolkit for Evaluating Conversational Recommender Systems. Jafar Afzali, Aleksander Mark Drzewiecki, Krisztian Balog, and Shuo Zhang. WSDM 2023.

Dataless Knowledge Fusion by Merging Weights of Language Models. Xisen Jin, Xiang Ren, Daniel Preoţiuc-Pietro and Pengxiang Cheng. ICLR 2023.

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages [OUTSTANDING PAPER AWARD]. Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich and Sebastian Ruder. EACL 2023.

Towards a Unified Multi-Domain Multilingual Named Entity Recognition Model. Mayank Kulkarni, Daniel Preoţiuc-Pietro, Karthik Radhakrishnan, Genta Indra Winata, Shijie Wu, Jane Xie and Shaohua Yang. EACL 2023.

Distillation of encoder-decoder transformers for sequence labelling. Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan İrsoy and Thamar Solorio. Findings of the ACL: EACL 2023.

Evaluating Paraphrastic Robustness in Textual Entailment Models. Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme and Adam Poliak. ACL 2023.

InfoSync: Information Synchronization across Multilingual Semi-structured Tables. Siddharth Hemant Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria and Shuo Zhang. Findings of the ACL: ACL 2023.

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies. Shiyue Zhang, Shijie Wu, Ozan İrsoy, Steven Lu, Mohit Bansal, Mark Dredze and David Rosenberg. ACL 2023.

On “Scientific Debt” in NLP: A Case for More Rigour in Language Model Pre-Training Research. Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom and Adhiguna Kuncoro. ACL 2023.

NusaCrowd: Open Source Initiative for Indonesian NLP Resources. Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, et al. Findings of the ACL: ACL 2023.

Joint End-to-end Semantic Proto-role Labeling. Elizabeth Spaulding, Gary Kazantsev and Mark Dredze. ACL 2023.

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue. Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Pengjie Ren and Zhaochun Ren. ACL 2023.

Don’t Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text. Ashim Gupta, Carter Wood Blum, Temma Choji, Yingjie Fei, Shalin Shah, Alakananda Vempala and Vivek Srikumar. ACL 2023.

Multi-lingual and Multi-cultural Figurative Language Understanding. Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo and Graham Neubig. Findings of the ACL: ACL 2023.

The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Genta Indra Winata, Alham Fikri Aji, Zheng Xin Yong and Thamar Solorio. Findings of the ACL: ACL 2023.

Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning. Genta Indra Winata, Jane Xie, Karthik Radhakrishnan, Shijie Wu, Xisen Jin, Pengxiang Cheng, Mayank Kulkarni and Daniel Preoţiuc-Pietro. Findings of the ACL: ACL 2023.

Dense Retrieval Adaptation using Target Domain Description. Helia Hashemi, Yong Zhuang, Sachith Sri Ram Kothur, Srivas Prasad, Edgar Meij and W. Bruce Croft. ICTIR 2023.

Towards Explainable Conversational Recommender Systems. Shuyu Guo, Shuo Zhang, Weiwei Sun, Pengjie Ren, Zhumin Chen and Zhaochun Ren. SIGIR 2023.

Unsupervised Contrast-Consistent Ranking with Language Models. Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preoţiuc-Pietro and Rajarshi Bhowmik. arXiv.

Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation. Xumei Xi, Yuke Zhao, Quan Liu, Liwen Ouyang, and Yang Wu. RecSys 2023.

DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation. Rodrigo Castellon, Achintya Gopal, Brian Bloniarz and David Rosenberg. 2023 Theory and Practice of Differential Privacy Workshop (TPDP 2023).

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems [BEST PAPER AWARD]. Muhammad Dehan Al Kautsar, Rahmah Nurdini, Samuel Cahyawijaya, Genta Indra Winata and Ayu Purwarianti. First Workshop in South East Asian Language Processing (SEALP), co-located at IJCNLP-AACL 2023.

Efficient Zero-Shot Cross-lingual Inference via Retrieval. Genta Indra Winata, Lingjue Xie, Karthik Radhakrishnan, Yifan Gao and Daniel Preoţiuc-Pietro. IJCNLP-AACL 2023.

Analyzing and Predicting Persistence of News Tweets. Maggie Liu, Jing Wang and Daniel Preoţiuc-Pietro. IJCNLP-AACL 2023.

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti and Pascale Fung. IJCNLP-AACL 2023.

Generative Machine Learning for Multivariate Equity Returns. Ruslan Tepelyan and Achintya Gopal. 4th ACM International Conference on AI in Finance (ICAIF 2023).

EntSUMV2: Data, Models and Evaluation for More Abstractive Entity-Centric Summarization. Dhruv Mehra, Lingjue Xie, Ella Hofmann-Coyle, Mayank Kulkarni and Daniel Preoţiuc-Pietro. EMNLP 2023.

Multilingual Large Language Models Are Not (Yet) Code-Switchers. Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata and Alham Fikri Aji. EMNLP 2023.

TempTabQA: Temporal Question Answering for Semi-Structured Tables. Vivek Gupta, Pranshu Kandoi, Mahek Bhavesh Vora, Shuo Zhang, Yujie He, Ridho Reinanda and Vivek Srikumar. EMNLP 2023.

Semantic Similarity Covariance Matrix Shrinkage. Guillaume Becquin and Saher Esmeir. Findings of EMNLP 2023.

Novelty Controlled Paraphrase Generation with Retrieval Augmented Conditional Prompt Tuning. Jishnu Ray Chowdhury, Yong Zhuang and Shuyi Wang. AAAI 2022. (video)

Scientific Chart Summarization: Datasets and Improved Text Modeling. Hao Tan, Chen-Tse Tsai, Yujie He and Mohit Bansal. Workshop on Scientific Document Understanding at AAAI 2022.

Keynote – Search and Discovery in News and Research. Anju Kambadur. AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services (KDF 2022).

StruBERT: Structure-aware BERT for Table Search and Matching. Mohamed Trabelsi, Zhiyu Chen, Shuo Zhang, Brian D. Davison and Jeff Heflin. The ACM Web Conference 2022.

Similarity-based Multi-Domain Dialogue State Tracking with Copy Mechanisms for Task-based Virtual Personal Assistants. Jarana Manotumruksa, Jeff Dalton, Edgar Meij and Emine Yilmaz. The ACM Web Conference 2022.

Understanding Financial Information Seeking Behavior from User Interactions with Company Filings. Mozhdeh Ariannezhad, Mohamed Yahya, Edgar Meij, Sebastian Schelter, Maarten de Rijke. FinWeb: 2nd International Workshop on Financial Technology on the Web (at the ACM Web Conference 2022).

Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning. Vivek Gupta, Shuo Zhang, Alakananda Vempala, Yujie He, Temma Choji, Vivek Srikumar. ACL 2022.

IMPLI: Investigating NLI Models’ Performance on Figurative Language. Kevin Stowe, Ajie Utama and Iryna Gurevych. ACL 2022.

EntSUM: A Data Set for Entity-Centric Extractive Summarization. Mounica Maddela, Mayank Kulkarni and Daniel Preoţiuc-Pietro. ACL 2022.

Automatic Identification and Classification of Bragging in Social Media. Mali Jin, A. Seza Doğruöz, Daniel Preoţiuc-Pietro, Nikolaos Aletras. ACL 2022.

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder. ACL 2022.

Updated Headline Generation: Creating Updated Summaries for Evolving News Stories. Sheena Panthaplackel, Adrian Benton and Mark Dredze. ACL 2022.

Improving Argument Structure Extraction Efficacy with Transfer Learning and Active Learning. Xinyu Hua and Lu Wang. Findings of the Association for Computational Linguistics: ACL 2022.

XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference. Bhavnick Singh Minhas, Anant Shankhdhar, Vivek Gupta and Shuo Zhang. Fifth Workshop on Fact Extraction and VERification (FEVER) during ACL 2022.

Enhanced Distant Supervision with State-Change Information for Relation Extraction. Jui Shah, Dongxu Zhang, Sam Brody and Andrew McCallum. LREC 2022.

Combining Humor and Sarcasm for Improving Political Parody Detection. Xiao Ao, Danae Sánchez Villegas, Daniel Preoţiuc-Pietro and Nikolaos Aletras. NAACL 2022.

Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization. Ajie Utama, Joshua Bambrick, Nafise Sadat Moosavi and Iryna Gurevych. NAACL 2022.

Learning Rich Representation of Keyphrases from Text. Mayank Kulkarni, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. Findings of NAACL 2022.

Analyzing and Simulating User Utterance Reformulation in Conversational Recommender Systems. Shuo Zhang, Mu Chun Wang and Krisztian Balog. To be published at SIGIR 2022.

Cross-Domain Graph Learning for Multivariate Time Series. Difan Zou, Ni Ma and Saher Esmeir. 8th SIGKDD International Workshop on Mining and Learning from Time Series (MiLeTS) — Deep Forecasting: Models, Interpretability, and Applications.

Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders. Ella Hofmann-Coyle, Mayank Kulkarni, Jane Xie, Mounica Maddela and Daniel Preoţiuc-Pietro. AACL-IJCNLP 2022.

Cross-lingual Few-Shot Learning on Unseen Languages. Genta Indra Winata, Shijie Wu, Mayank Kulkarni, Thamar Solorio and Daniel Preoţiuc-Pietro. AACL-IJCNLP 2022.

Enhancing Tabular Reasoning with Pattern Exploiting Training. Abhilash Reddy Shankarampeta, Vivek Gupta and Shuo Zhang. AACL-IJCNLP 2022.

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages. Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata and Ayu Purwarianti. First Workshop on Scaling Up Multilingual Evaluation (SUMEval) at AACL-IJCNLP 2022.

Sequentially Controlled Text Generation. Alexander Spangher, Yao Ming, Xinyu Hua and Nanyun Peng. Findings of EMNLP 2022.

Realistic Data Augmentation Framework for Enhancing Tabular Reasoning. Dibyakanti Kumar, Vivek Gupta, Soumya Sharma, and Shuo Zhang. Findings of EMNLP 2022.

Weakly Supervised Headline Dependency Parsing. Adrian Benton, Tianze Shi, Ozan İrsoy and Igor Malioutov. Findings of EMNLP 2022.

Entity Retrieval from Multilingual Knowledge Graphs. Saher Esmeir, Arthur Câmara and Edgar Meij. 2nd Workshop on Multilingual Representation Learning (MRL) at EMNLP 2022.

What Makes Data-to-Text Generation Hard for Pre-trained Language Models? Moniba Keymanesh, Adrian Benton and Mark Dredze. Natural Language Generation, Evaluation & Metrics (GEM) Workshop 2022 at EMNLP 2022.

The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Genta Indra Winata (Bloomberg), Alham Fikri Aji (Independent Researcher), Zheng-Xin Yong (Brown University) and Thamar Solorio (Bloomberg). Published on arXiv.

kōan: A Corrected CBOW Implementation. Ozan İrsoy, Adrian Benton and Karl Stratos. arXiv. (Code Repository)

Keynote – Information in Context: Financial Conversations & News Flows. Gideon Mann. Workshop on Knowledge Discovery from Unstructured Data in Financial Services at AAAI 2021. (Video)

Dual Reinforcement-Based Specification Generation for Image De-Rendering. Ramakanth Pasunuru, David Rosenberg, Gideon Mann and Mohit Bansal. Workshop on Scientific Document Understanding at AAAI 2021. (Video)

Contextualizing Trending Entities in News Stories. Marco Ponza, Diego Ceccarelli, Paolo Ferragina, Edgar Meij and Sambhav Kothari. WSDM 2021.

Identifying Named Entities as they are Typed. Ravneet Arora, Chen-Tse Tsai and Daniel Preoţiuc-Pietro. EACL 2021.

SERAG: Semantic Entity Retrieval from Arabic knowledge Graphs. Saher Esmeir. The Sixth Arabic Natural Language Processing Workshop (WANLP 2021) at EACL 2021.

Semantic Table Retrieval using Keyword and Table Queries. Shuo Zhang, Krisztian Balog. Transactions on the Web (TWEB), May 2021.

On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles. Rakesh Gosangi, Ravneet Arora, Mohsen Gheisarieha, Debanjan Mahata, Haimin (Raymond) Zhang. NAACL-HLT 2021.

Diversity-Aware Batch Active Learning for Dependency Parsing. Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy. NAACL-HLT 2021.

Learning Syntax from Naturally-Occurring Bracketings. Tianze Shi, Ozan İrsoy, Igor Malioutov, Lillian Lee. NAACL-HLT 2021.

ERNIE-NLI: Analyzing the Impact of Domain-Specific External Knowledge on Enhanced Representations for NLI. Lisa Bauer, Lingjia Deng, Mohit Bansal. Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures at NAACL-HLT 2021.

News Article Retrieval in Context for Event-centric Narrative Creation. Nikos Voskarides, Sabrina Sauer, Maarten de Rijke, Edgar Meij. ICTIR 2021 (co-located at SIGIR 2021).

Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems. Weiwei Sun*, Shuo Zhang*, Krisztian Balog, Zhaochun Ren, Pengjie Ren, Zhumin Chen and Maarten de Rijke. SIGIR 2021. (*Equal contribution)

WTR: A Test Collection for Web Table Retrieval. Zhiyu Chen, Shuo Zhang, Brian D. Davison. SIGIR 2021.

Estimation of Corporate Greenhouse Gas Emissions via Machine Learning. You Han, Achintya Gopal, Liwen Ouyang and Aaron Key. Tackling Climate Change with Machine Learning Workshop at ICML 2021.

A Practical Two-step Approach to Assist Enterprise Question-Answering Live Chat. Ling-Yen Liao and Tarec Fares. SIGDIAL 2021.

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance Domain. Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng and Tat-Seng Chua. ACL-IJCNLP 2021.

Generic Oracles for Structured Prediction. Christoph Teichmann and Antoine Venant. 17th International Conference on Parsing Technologies (IWPT 2021); co-located at ACL-IJCNLP 2021.

Disentangling Online Chats with DAG-Structured LSTMs. Duccio Pappadopulo, Lisa Bauer, Marco Farina, Ozan İrsoy, Mohit Bansal. 10th Joint Conference on Lexical and Computational Semantics (*SEM 2021); co-located at ACL-IJCNLP 2021.

Keynote – Knowledge Graphs in Practice. Edgar Meij. CIKM 2021.

POSHAN: Cardinal POS Pattern Guided Attention for News Headline Incongruence. Rahul Mishra and Shuo Zhang. CIKM 2021.

Discovering Supply Chain Links with Augmented Intelligence. Achintya Gopal and Chunho Chang. ICAIF’21 Workshop on NL and Network Analysis in Financial Applications.

Cross-Register Projection for Headline Part of Speech Tagging. Adrian Benton, Hanyang Li and Igor Malioutov. EMNLP 2021.

Towards Realistic Few-Shot Relation Extraction. Sam Brody, Sichao Wu and Adrian Benton. EMNLP 2021.

Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification. Alexander Spangher, Sz-rung Shiang, Lingjia Deng and Jonathan May. EMNLP 2021.

GupShup: Summarizing Open-Domain Code-Switched Conversations. Laiba Mehnaz, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle G. Lee, Anish Acharya and Rajiv Ratn Shah. EMNLP 2021.

Improving Dialogue State Tracking with Turn-based Loss Function & Sequential Data Augmentation. Jarana Manotumruksa, Edgar Meij, Emine Yilmaz and Jeff Dalton. Published in Findings of the Association for Computational Linguistics: EMNLP 2021; presented at Workshop on NLP for Conversational AI (NLP4ConvAI), co-located with EMNLP 2021.

FANATIC: FAst Noise-Aware TopIc Clustering. Ari Silburt, Anja Subasic, Evan Thompson, Carmeline Dsilva and Tarec Fares. Published in Findings of the Association for Computational Linguistics: EMNLP 2021.

Comparing Euclidean and Hyperbolic Embeddings on the WordNet Nouns Hypernymy Graph. Sameer Bansal and Adrian Benton. Workshop on Insights from Negative Results in NLP (co-located with EMNLP 2021).

Corrected CBOW Performs as well as Skip-gram. Ozan İrsoy, Adrian Benton, and Karl Stratos. Workshop on Insights from Negative Results in NLP (co-located with EMNLP 2021).

Keynote – Temporal drift & low-resource information extraction. Thamar Solorio. 7th Workshop on Noisy User-generated Text (W-NUT); co-located with EMNLP 2021.

Maximum Mean Discrepancy for Generalization in the Presence of Distribution and Missingness Shift. Liwen Ouyang and Aaron Key. NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods & Applications (DistShift).

Keyphrase Generation for Scientific Articles using GANs. Avinash Swaminathan, Raj Kuwar Gupta, Haimin (Raymond) Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah. AAAI 2020.

Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition. Yaman Kumar, Dhruva Sahrawat, Shubham Maheshwari, Debanjan Mahata, Amanda Stent, Yifang Yin, Rajiv Ratn Shah and Roger Zimmermann. AAAI 2020.

An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms. Gyanesh Anand, Akash Gautam, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah, and Ramit Sawhney. AAAI 2020.

Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings. Dhruva Sahrawat, Debanjan Mahata, Haimin (Raymond) Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah and Roger Zimmermann. ECIR 2020. (watch the conference presentation)

Identifying Notable News Stories. Antonia Saravanou, Edgar Meij and Giorgio Stefanoni. ECIR 2020. (watch the conference presentation)

Novel Entity Discovery from Web Tables. Shuo Zhang, Edgar Meij, Ridho Reinanda and Krisztian Balog. WWW 2020.

Bias in Automatic Knowledge Graph Construction: A Workshop (KG-BIAS 2020). Edgar Meij, Tara Safavi, Chenyan Xiong, Gianluca Demartini, Miriam Redi and Fatma Ozcan. AKBC 2020.

Multi-Domain Named Entity Recognition with Genre-Aware and Agnostic Inference. Jing Wang, Mayank Kulkarni and Daniel Preoţiuc-Pietro. ACL 2020.

Temporally-informed Analysis of Named Entity Recognition. Shruti Rijhwani and Daniel Preoţiuc-Pietro. ACL 2020.

The Summary Loop: Learning to Write Abstractive Summaries Without Examples. Philippe Laban, Andrew Hsi, John Canny, and Marti Hearst. ACL 2020.

Analyzing Political Parody in Social Media. Antonis Maronikolakis, Danae Sanchez-Villegas, Daniel Preoţiuc-Pietro and Nikolaos Aletras. ACL 2020.

NSTM: Real-Time Query-Driven News Overview Composition at Bloomberg (System Demo). Joshua Bambrick, Minjie Xu, Guim Perarnau, Igor Malioutov, Andy Almonte, Vittorio Sello and Iat Chong Chan. ACL 2020.

Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media. Akash Gautam, Debanjan Mahata, Rakesh Gosangi and Rajiv Ratn Shah. The 3rd Workshop on e-Commerce and NLP (ECNLP 3) at ACL 2020.

Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification. Hongyuan Mei, Guanghui Qin, Minjie Xu and Jason Eisner. ICML 2020.

CrossBERT with Triplet Neural Architecture for Entity Property Ranking. Jarana Manotumruksa, Jeff Dalton, Edgar Meij and Emine Yilmaz. SIGIR 2020.

International Workshop on Knowledge Graphs: Mining Knowledge Graphs for Deep Insights. Ying Ding, Benjamin Glicksberg, Jim Hendler, Edgar Meij, Francois Scharffe, Jie Tang and Fei Wang. KDD 2020.

Global and Local Differential Privacy for Collaborative Bandits. Huazheng Wang, Qian Zhao, Qingyun Wu, Shubham Chopra, Abhinav Khaitan and Hongning Wang. RecSys 2020.

Cascading Hybrid Bandits: Online Learning to Rank for Relevance and Diversity. Chang Li, Haoyun Feng and Maarten de Rijke. RecSys 2020.

The 1st Wikidata Workshop. Lucie-Aimée Kaffee, Oana Tifrea-Marciuska, Elena Simperl and Denny Vrandečić. ISWC 2020.

Evaluating the Calibration of Knowledge Graph Embeddings for Trustworthy Link Prediction. Tara Safavi, Edgar Meij and Danai Koutra. EMNLP 2020.

Semantic Role Labeling as Syntactic Dependency Parsing. Tianze Shi, Igor Malioutov and Ozan İrsoy. EMNLP 2020.

A Preliminary Exploration of GANs for Keyphrase Generation. Avinash Swaminathan, Haimin (Raymond) Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah and Amanda Stent. EMNLP 2020.

Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty. Katherine Keith, Christoph Teichmann, Brendan O’Connor and Edgar Meij. NLP+CSS 2020 Workshop at EMNLP 2020.

Two-Step Classification using Recasted Data for Low Resource Settings. Shagun Uppal, Vivek Gupta, Avinash Swaminathan, Debanjan Mahata, Rakesh Gosangi, Haimin (Raymond) Zhang, Rajiv Ratn Shah, Amanda Stent. AACL 2020.

Point-of-Interest Type Inference from Social Media Text. Danae Sánchez Villegas, Daniel Preoţiuc-Pietro and Nikolaos Aletras. AACL 2020.

Keynote – The Computational Linguistics of Conversation Modeling. Amanda Stent. COLING’2020.

Fact vs. Opinion: The Role of Argumentative Features in News Classification. Tariq Alhindi, Smaranda Muresan, Daniel Preoţiuc-Pietro. COLING’2020.

Get IT Scored using AutoSAS – An Automated System for Scoring Short Answers. Yaman Kumar, Swati Aggarwal, Debanjan Mahata, Rajiv Shah, Ponnurangam Kumaraguru and Roger Zimmermann. EAAI-2019.

Predicting and Analyzing Language Specificity in Social Media Posts. Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li. AAAI-2019.

Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning. Ran Zhao, Yuntian Deng, Mark Dredze, Arun Verma, David Rosenberg, Amanda Stent. FLAIRS 2019.

Improving Grey-Box Fuzzing by Modeling Program Control Flow. Siddharth Karamcheti, Gideon Mann and David Rosenberg. ML4SE 2019.

SemEval-2019 Task 6: Identifying Offensive Posts and Targeted Offense from Twitter. Haimin (Raymond) Zhang, Debanjan Mahata, Simra Shahid, Laiba Mehnaz, Sarthak Anand, Yaman Singla, Rajiv Ratn Shah, and Karan Uppal. International Workshop on Semantic Evaluation 2019 at NAACL-HLT 2019.

SemEval-2019 Task 9: Suggestion Mining from Online Reviews using ULMFiT. Sarthak Anand, Debanjan Mahata, Kartik Aggarwal, Laiba Mehnaz, Simra Shahid, Haimin (Raymond) Zhang, Yaman Singla, Rajiv Ratn Shah, Karan Uppal. International Workshop on Semantic Evaluation 2019 at NAACL-HLT 2019.

SNAP-BATNET: Cascading Author Profiling and Social Network Graphs for Suicide Ideation Detection on Social Media. Rohan Mishra, Pradyumn Prakhar Sinha, Ramit Sawhney, Debanjan Mahata, Puneet Mathur and Rajiv Ratn Shah. NAACL Student Research Workshop (SRW) 2019 at NAACL-HLT 2019.

Speak Up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment. Arijit Ghosh Chowdhury, Ramit Sawhney, Puneet Mathur, Debanjan Mahata and Rajiv Ratn Shah. NAACL Student Research Workshop (SRW) 2019 at NAACL-HLT 2019.

Decoding the Style and Bias of Song Lyrics. Manash Pratim Barman, Amit Awekar, and Sambhav Kothari. SIGIR 2019.

Dialogue Act Classification in Group Chats with DAG-LSTMs. Ozan İrsoy, Rakesh Gosangi, Haimin (Raymond) Zhang, Mu-Hsin Wei, Peter Lund, Duccio Pappadopulo, Brendan Fahy, Neophytos Nephytou, and Camilo Ortiz. 1st Workshop on Conversational Interaction Systems (WCIS) at SIGIR 2019.

Modeling financial analysts’ decision making via the pragmatics and semantics of earnings calls. Katherine Keith and Amanda Stent. ACL 2019.

Multi-task Pairwise Neural Ranking for Hashtag Segmentation. Mounica Maddela, Wei Xu, and Daniel Preoţiuc-Pietro. ACL 2019.

Categorizing and Inferring the Relationship between the Text and Image of Twitter Posts. Alakananda Vempala and Daniel Preoţiuc-Pietro. ACL 2019.

Analyzing Linguistic Differences between Owner and Staff Attributed Tweets. Daniel Preoţiuc-Pietro and Rita Devlin Marier. ACL 2019.

Automatically Identifying Complaints in Social Media. Daniel Preoţiuc-Pietro, Mihaela Găman, and Nikolaos Aletras. ACL 2019.

A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition. Ravneet Arora, Chen-Tse Tsai, Ketevan Tsereteli, Anju Kambadur, and Yi Yang. ACL 2019.

Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing. Chunyang Xiao, Christoph Teichmann, and Konstantine Arkoudas. Deep Learning & Formal Languages: Building Bridges Workshop @ ACL 2019.

Hush-Hush Speak: Speech Reconstruction Using Silent Videos. Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata and Amanda Stent. INTERSPEECH 2019.

MobiVSR: Efficient and Light-weight Neural Network for Visual Speech Recognition on Mobile Devices. Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Amanda Stent, Debanjan Mahata, Preeti Kaur, Roger Zimmermann. INTERSPEECH 2019.

Challenges in end-to-end neural scientific document OCR. Yuntian Deng, David Rosenberg, and Gideon Mann. ICDAR 2019.

Semantically Driven Auto-completion. Konstantine Arkoudas and Mohamed Yahya. CIKM 2019.

Understanding Goal-Oriented Active Learning via Influence Functions. Minjie Xu and Gary Kazantsev. Machine Learning with Guarantees Workshop @ NeurIPS 2019.

Learning Better Name Translation for Cross-Lingual Wikification. Chen-Tse Tsai and Dan Roth. AAAI-18.

Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. Giorgio Stefanoni, Boris Motik, Egor V. Kostylev. WWW 2018.

Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases. Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum. WWW 2018.

RIDDL at SemEval-2018 Task 1: Rage Intensity Detection with Deep Learning. Venkatesh Elango and Karan Uppal. SemEval-2018 (at NAACL-HLT 2018).

Key2Vec: Automated Ranked Keyphrase Extraction from Scientific Articles Using Phrase Embeddings. Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann. NAACL-HLT 2018.

Collective Entity Disambiguation with Structured Gradient Tree Boosting. Yi Yang, Ozan İrsoy, Shefaet Rahman. NAACL-HLT 2018.

Weakly-supervised Contextualization of Knowledge Graph Facts. Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Anju Kambadur and Maarten de Rijke. SIGIR 2018.

Trends in the Adoption of Corporate Child Labor Policies: An Analysis with Bloomberg Terminal ESG Data. Vedran Sekara, Alex Rutherford, Gideon Mann, Mark Dredze, Natalia Adler, Manuel Garcia-Herranz. Data for Good Exchange 2018.

Adaptive Grey-Box Fuzz-Testing with Thompson Sampling. Siddharth Karamcheti, Gideon Mann and David Rosenberg. AISec 2018.

Predicting Good Twitter Conversations. Zach Wood-Doughty, Anju Kambadur and Gideon Mann. W-NUT 2018 (at EMNLP 2018).

The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-level Predictions. Salvatore Giorgi, Daniel Preoţiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar and H. Andrew Schwartz. EMNLP 2018.

Zero-Shot Open Entity Typing as Type-Compatible Grounding. Ben Zhou, Daniel Khashabi, Chen-Tse Tsai and Dan Roth. EMNLP 2018.

Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions. Eric Holgate, Isabel Cachola, Daniel Preoţiuc-Pietro and Junyi Jessy Li. EMNLP 2018.

Improving Grey-Box Fuzzing by Modeling Program Behavior. Siddharth Karamcheti, Gideon Mann and David Rosenberg. arXiv.

Generating descriptions of entity relationships. Nikos Voskarides, Edgar Meij, and Maarten de Rijke. ECIR 2017.

Automated Template Generation for Question Answering over Knowledge Graphs. Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald and Gerhard Weikum. WWW 2017.

Adaptive Submodular Ranking. Anju Kambadur and Fatemeh Navidi with Viswanath Nagarajan. IPCO 2017.

Beyond Binary Labels: Political Ideology Prediction of Twitter Users. Daniel Preoţiuc-Pietro, Ye Liu, Daniel Hopkins and Lyle Ungar. ACL 2017.

Faster Greedy MAP Inference for Determinantal Point Processes. Anju Kambadur with Insu Han, Kyoungsoo Park, Jinwoo Shin. ICML 2017.

A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix. Christos Boutsidis, Petros Drineas, Anju Kambadur, Eugenia-Maria Kontopoulou and Anastasios Zouzias. ICML 2017.

Boosting Information Extraction Systems with Character-level Neural Networks and Free Noisy Supervision. Philipp Meerkamp and Zhengyi Zhou. Structured Predictions Workshop (at EMNLP 2017).

Cheap Translation for Cross-Lingual Named Entity Recognition. Stephen Mayhew, Chen-Tse Tsai and Dan Roth. EMNLP 2017.

Controlling Human Perception of Basic User Traits. Daniel Preoţiuc-Pietro, Sharath Chandra Guntuku and Lyle Ungar. EMNLP 2017.

Scatteract: Automated extraction of data from scatter plots. Mathieu Cliche, David Rosenberg, Dhruv Madeka and Connie Yee. ECML PKDD 2017.

Civil Asset Forfeiture: A Judicial Perspective. Leslie Barrett, Alexandra Ortan, Ryon Smey, Michael W. Sherman, Zefu Lu, Wayne Krug, Roberto Martin, Anu Pradhan, Trent Wenzel, Alexander Sherman, Karin D. Martin. Data for Good Exchange 2017.

Knowledge Questions from Knowledge Graphs. Dominic Seyler, Mohamed Yahya and Klaus Berberich. ICTIR 2017.

Camera Based Two Factor Authentication Through Mobile and Wearable Devices. Mozhgan Azimpourkivi, Umut Topkara, Bogdan Carbunar. UbiComp 2017.