{"id":34072,"date":"2023-12-08T08:26:26","date_gmt":"2023-12-08T13:26:26","guid":{"rendered":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/"},"modified":"2024-04-12T16:08:38","modified_gmt":"2024-04-12T20:08:38","slug":"bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023","status":"publish","type":"post","link":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/","title":{"rendered":"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023"},"content":{"rendered":"<div class='bbg-row bbg-bg--white  bbg-row--margin-top-none bbg-row--margin-bottom-none' data-anchor='row-6a007324a9f28'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<div class='bb-wysiwyg'>\n    \n    <p>During the <a href=\"https:\/\/2023.emnlp.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)<\/a> in Singapore this week, researchers from Bloomberg\u2019s <a href=\"https:\/\/www.TechAtBloomberg.com\/AI\" target=\"_blank\" rel=\"noopener noreferrer\">AI Engineering Group<\/a> are showcasing their expertise in natural language processing (NLP) by publishing four papers, one of which will appear in Findings of EMNLP 2023.<\/p>\n<p>Through these papers, the authors and their collaborators highlight a variety of NLP applications, novel approaches and improved models used in key tasks, and other advances to the state-of-the-art in the field of computational linguistics.<\/p>\n<p>We asked some of the authors to summarize their research and explain why the results were notable:<\/p>\n<hr \/>\n<h3 style=\"text-align: center;\"><\/h3>\n<h3><a href=\"https:\/\/aclanthology.org\/2023.emnlp-main.337\/\" 
target=\"_blank\" rel=\"noopener noreferrer\"><strong>EntSUMv2: Data, Models and Evaluation for More Abstractive Entity-Centric Summarization<\/strong><\/a><\/h3>\n<p>Dhruv Mehra (Bloomberg), Lingjue Xie (Bloomberg), Ella Hofmann-Coyle (Bloomberg), Mayank Kulkarni (work done while at Bloomberg) and Daniel Preo\u0163iuc-Pietro (Bloomberg)<\/p>\n<p><em>Poster Session 5 (Saturday, December 9, 2023 @ 11:00 AM SGT)<\/em><em><br \/>\n<\/em><\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <a class='image-figure__link' href='https:\/\/aclanthology.org\/2023.emnlp-main.337.pdf' target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1086\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Front page of EMNLP 2023 paper &quot;ENTSUMV2: Data, Models and Evaluation for More Abstractive Entity-Centric Summarization&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1086w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 134w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1654w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1654\" height=\"2339\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Front page of EMNLP 2023 paper &quot;ENTSUMV2: Data, Models and Evaluation for More Abstractive Entity-Centric Summarization&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1654w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1086w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.337-EntSUMv2_Page_01.png 134w\" sizes=\"(max-width: 1654px) 100vw, 1654px\" \/><\/a>\n    \n<\/figure>\n<div class=\"bb-separator\" data-color=\"\">\n\t<hr class=\"bb-separator__rule\">\n<\/div>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Please summarize your research. Why are your results notable?<\/strong><\/p>\n<p><strong>Ella:<\/strong> Entity-centric summarization is a form of controllable summarization that requires producing a synopsis of a text document with respect to a specific entity. Our research focuses on abstractive summarization, which involves generating a new summary from scratch. This is in contrast to our <a href=\"https:\/\/aclanthology.org\/2022.aacl-short.40\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous work on extractive summarization<\/a>, where the summary was constructed using only text that is present in the original document.<\/p>\n<p>Exploration of this entity-centric summarization task was enabled by our <a href=\"https:\/\/aclanthology.org\/2022.acl-long.237.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">past work at ACL 2022<\/a>, where we introduced the <a href=\"https:\/\/github.com\/bloomberg\/entsum\" target=\"_blank\" rel=\"noopener noreferrer\">EntSUM dataset<\/a>. 
In this paper, we release the EntSUMv2 dataset, which builds upon the original EntSUM dataset to include new annotated abstractive summaries that are intentionally shortened to aid in generating more specific and useful entity-centric summaries.<\/p>\n<p>In addition to releasing EntSUMv2, we explore supervised fine-tuning and instruction tuning of large language models to generate entity-specific abstractive summaries and perform evaluation against EntSUMv2.<\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"331\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Table 1. Automated metrics for the different fine-tuned and instruction-tuned summarization models on the EntSUMv2 dataset (bold typeface denotes the best performance overall and underlined numbers represent best performance within a class of methods).\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1536w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 280w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1762w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1762\" height=\"760\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Table 1. Automated metrics for the different fine-tuned and instruction-tuned summarization models on the EntSUMv2 dataset (bold typeface denotes the best performance overall and underlined numbers represent best performance within a class of methods).\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1762w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 1536w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image5.png 280w\" sizes=\"(max-width: 1762px) 100vw, 1762px\" \/>\n    <figcaption class='image-figure__caption'><span style=\"font-weight: 400\">Table 1. 
Automated metrics for the different fine-tuned and instruction-tuned summarization models on the EntSUMv2 dataset (bold typeface denotes the best performance overall and underlined numbers represent best performance within a class of methods).<\/span><\/figcaption>\n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Dhruv:<\/strong> As the table shows, fine-tuned models (the middle section) fare much better than instruction-tuned models (the last section), but it is not clear how these models differ from one another. Are they producing short and relevant summaries about an entity that are incomplete? Or are they producing verbose and complete summaries about an entity that contain extra, yet irrelevant, information?<\/p>\n<p><strong>Ella:<\/strong> To answer these questions, we propose a new method of qualitative human evaluation that evaluates each model across five crucial facets that high-quality entity-centric summaries possess: Entity-Specificity, Factuality, Completeness, Fluency and Quality. These qualitative metrics provide a more fine-grained interpretation of the current state-of-the-art systems.<\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"96\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Table 2. 
Human evaluation results of three types of summarization models on a subset of the ENTSUMv2 dataset (bold typeface denotes the best performance).\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1536w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 280w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1670w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1670\" height=\"208\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Table 2. 
Human evaluation results of three types of summarization models on a subset of the ENTSUMv2 dataset (bold typeface denotes the best performance).\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1670w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 1536w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image3.png 280w\" sizes=\"(max-width: 1670px) 100vw, 1670px\" \/>\n    <figcaption class='image-figure__caption'>Table 2. Human evaluation results of three types of summarization models on a subset of the EntSUMv2 dataset (bold typeface denotes the best performance).<\/figcaption>\n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Dhruv:<\/strong> We evaluated the best performing models in each category along these metrics, which reveal some insights. 
For example, GSum models give more relevant and complete summaries that are less fluent, while the T5-based models provide more fluent summaries that are less complete and less factually accurate.<\/p>\n<p><strong>How does your research advance the state-of-the-art in the field of natural language processing?<\/strong><\/p>\n<p><strong>Dhruv:<\/strong> Our research provides a new dataset for evaluating models on the generative entity-centric summarization task, as well as a new human evaluation framework that captures a more holistic view of the summaries than industry-standard automated metrics.<\/p>\n\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n<div class='bbg-row bbg-bg--white ' data-anchor='row-6a007324b9bdc'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column'>\n\t<div class='interstitial' data-element='interstitial-component'>\n\t<div class='interstitial-blue_border_design interstitial-bg-white'>\n\t\t<div class='interstitial-blue_border_design__rest bbg-column--width-7'>\n\t\t\t<p class='interstitial-blue_border_design__the_title interstitial_title'>Make it happen here.<\/p>\n\t\t\t<p class='interstitial-blue_border_design__text interstitial_text'><\/p>\n\t\t<\/div>\n\t\t<a class='interstitial-blue_border_design__link interstitial__link bbg-column--width-3 bbg-button bbg-button--size-large' href='https:\/\/bloomberg.avature.net\/en_US\/careers\/SearchJobs?utm_medium=mktg_site&utm_content=company_interstitial&utm_source=website' target='_blank' rel='noopener noreferrer' data-element='interstitial' data-description='' data-label='SEARCH NOW' data-element-position='[@data-element-position]'>SEARCH NOW<\/a>\n\t<\/div>\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n<div class='bbg-row bbg-bg--grey ' data-anchor='row-6a007324baf90'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<div 
class='bb-wysiwyg'>\n    \n    <h3><a href=\"https:\/\/aclanthology.org\/2023.emnlp-main.774\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Multilingual Large Language Models Are Not (Yet) Code-Switchers<\/strong><\/a><\/h3>\n<p>Ruochen Zhang (Brown University), Samuel Cahyawijaya (HKUST), Jan Christian Blaise Cruz (Samsung R&amp;D Institute Philippines), Genta Indra Winata (Bloomberg) and Alham Fikri Aji (MBZUAI)<\/p>\n<p><em>Multilinguality and Linguistic Diversity 2 (Saturday, December 9, 2023 @ 11:00 AM SGT)<\/em><\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <a class='image-figure__link' href='https:\/\/aclanthology.org\/2023.emnlp-main.774.pdf' target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1086\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Front page of EMNLP 2023 paper &quot;Multilingual Large Language Models Are Not (Yet) Code-Switchers&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 212w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 134w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1654w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1654\" height=\"2339\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Front page of EMNLP 2023 paper &quot;Multilingual Large Language Models Are Not (Yet) Code-Switchers&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1654w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.774-Multilingual-Large-Language-Models-Are-Not-Yet-Code-Switchers_Page_01.png 134w\" sizes=\"(max-width: 1654px) 100vw, 1654px\" \/><\/a>\n    \n<\/figure>\n<div class=\"bb-separator\" data-color=\"\">\n\t<hr class=\"bb-separator__rule\">\n<\/div>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Please summarize your research. Why are your results notable?<\/strong><\/p>\n<p><strong>Genta:<\/strong> Large Language Models (LLMs) have shown their potential in the context of zero-shot and few-shot prompting. 
These successes have extended to multilingual settings, where models are specifically trained to learn individual languages, which has proven highly beneficial for monolingual tasks. However, in multilingual communities, people do not confine themselves to speaking only a single language; instead, they use two or more languages interchangeably during a conversation \u2013 a phenomenon known as code-switching. This allows individuals to communicate culture-specific concepts more effectively, signaling their group identity and reinforcing their social connection.<\/p>\n<p>The main challenge of developing multilingual LLMs optimized for code-switching lies in data scarcity. Given the highly colloquial nature of code-switching, existing resources dedicated to code-switching are rare, and large-scale collection requires considerable annotation effort.<\/p>\n<p>In this paper, we benchmark the ability of LLMs to understand and generate code-switching on existing code-switching datasets to gauge the limitations of LLMs on four different tasks and a variety of language pairs. Figure 1 illustrates the tasks included in our benchmark study.<\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure--has-small-image\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"1216\" height=\"616\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png\" class=\"attachment-full size-full image-figure__image image-figure__image--primary\" alt=\"Figure 1. 
Illustration of NLP tasks included in the study.\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 1216w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 280w\" sizes=\"(max-width: 1216px) 100vw, 1216px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1216\" height=\"616\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Figure 1. 
Illustration of NLP tasks included in the study.\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 1216w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image2.png 280w\" sizes=\"(max-width: 1216px) 100vw, 1216px\" \/>\n    <figcaption class='image-figure__caption'>Figure 1. Illustration of NLP tasks included in the study.<\/figcaption>\n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>How does your research advance the state-of-the-art in the field of natural language processing?<\/strong><\/p>\n<p>Our results suggest that the scaling law is applicable to multilingual LLMs across diverse code-switching tasks and model architectures. However, smaller-scale, fine-tuned models substantially outperform the largest multilingual LLM with prompting methods. In addition, while hosted LLMs achieve scores comparable to our fine-tuned models, such performance remains uninterpretable due to their closed natures. 
We argue that existing multilingual LLMs exhibit limited proficiency in code-switching contexts, highlighting future research opportunities to transform them into true polyglots.<\/p>\n\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n<div class='bbg-row bbg-bg--white ' data-anchor='row-6a007324c3b55'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<div class='bb-wysiwyg'>\n    \n    <h3><a href=\"https:\/\/aclanthology.org\/2023.emnlp-main.149\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>TempTabQA: Temporal Question Answering for Semi-Structured Tables<\/strong><\/a><\/h3>\n<p>Vivek Gupta (University of Pennsylvania), Pranshu Kandoi<br \/>\n(IIT Guwahati), Mahek Bhavesh Vora (IIT Guwahati), Shuo Zhang (Bloomberg), Yujie He (Bloomberg), Ridho Reinanda (Bloomberg) and Vivek Srikumar (University of Utah)<\/p>\n<p><em>Resources and Evaluation 2 (Sunday, December 10, 2023 @ 12:00 PM SGT)<\/em><\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <a class='image-figure__link' href='https:\/\/aclanthology.org\/2023.emnlp-main.149.pdf' target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1086\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Front page of EMNLP 2023 paper &quot;TEMPTABQA: Temporal Question Answering for Semi-Structured Tables&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 768w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 134w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"3721\" height=\"5262\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Front page of EMNLP 2023 paper &quot;TEMPTABQA: Temporal Question Answering for Semi-Structured Tables&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 3721w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 724w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.emnlp-main.149-TempTabQA_Page_01.png 134w\" sizes=\"(max-width: 3721px) 100vw, 3721px\" \/><\/a>\n    \n<\/figure>\n<div class=\"bb-separator\" data-color=\"\">\n\t<hr class=\"bb-separator__rule\">\n<\/div>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Please summarize your research. Why are your results notable?<\/strong><\/p>\n<p><strong>Shuo:<\/strong> Factual information pertaining to a particular entity often undergoes temporal changes, necessitating a thorough comprehension of the scope of knowledge and temporal intervals. This factual data is typically dispersed across semi-structured formats, such as tables, and includes both implicit and explicit representations (see Figure 2 for an example). The extensive presence of these characteristics presents significant challenges for NLP models, requiring them to manage temporal changes proficiently and extract meaningful insights from time-sensitive data.<\/p>\n<p>To address this issue effectively, we introduce a new task, referred to as \u201ctemporal question answering on entity-centric semi-structured tables,\u201d demonstrated in Figure 2. 
Furthermore, we have curated a comprehensive, temporally-aligned dataset (<a href=\"https:\/\/zenodo.org\/records\/10022927\" target=\"_blank\" rel=\"noopener noreferrer\">TempTabQA<\/a>), which covers a variety of domains and has undergone human verification. We conducted extensive experiments and found that temporal reasoning in TempTabQA presents a greater challenge compared to non-temporal reasoning in preceding tabular datasets.<\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"738\" height=\"1084\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Figure 2. A semi-structured table of women badminton players (source: Wikipedia), along with accompanying temporal questions and their respective answers from TempTabQA.\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 738w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 204w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 697w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 129w\" sizes=\"(max-width: 738px) 100vw, 738px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"738\" height=\"1084\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png\" class=\"attachment-full size-full 
image-figure__image image-figure__image--small\" alt=\"Figure 2. A semi-structured table of women badminton players (source: Wikipedia), along with accompanying temporal questions and their respective answers from TempTabQA.\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 738w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 204w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 697w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image4.png 129w\" sizes=\"(max-width: 738px) 100vw, 738px\" \/>\n    <figcaption class='image-figure__caption'><span style=\"font-weight: 400\">Figure 2. A semi-structured table of women badminton players (source: Wikipedia), along with accompanying temporal questions and their respective answers from TempTabQA.<\/span><\/figcaption>\n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>How does your research advance the state-of-the-art in the field of natural language processing?<\/strong><\/p>\n<p>Our paper is a significant step forward because it&#8217;s the first to develop complex datasets for answering time-based questions that are specifically designed for tables focused on specific topics or entities. Our main goal was to introduce a new challenge \u2013 answering complex questions about time within this context.<\/p>\n<p>The TempTabQA dataset requires not only high-level reasoning but also a solid understanding of how time works, as well as good math skills. Our work highlights how unique this dataset is because of its focus on time, making it different from existing models. 
We dig deep into this difference, providing a detailed set of statistics and analyses that show the many challenges of reasoning about time that the dataset presents. These findings help us better understand how to reason about time in tables and encourage more research in this area.<\/p>\n\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n<div class='bbg-row bbg-bg--grey ' data-anchor='row-6a007324cc7f1'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<div class='bb-wysiwyg'>\n    \n    <h3><a href=\"https:\/\/aclanthology.org\/2023.findings-emnlp.668\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Semantic Similarity Covariance Matrix Shrinkage<\/strong><\/a><\/h3>\n<p>Guillaume Becquin (Bloomberg) and Saher Esmeir (Bloomberg)<\/p>\n<p><em>Findings of EMNLP 2023<\/em><\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center image-figure--has-small-image\" data-animation=\"\">\n    <a class='image-figure__link' href='https:\/\/aclanthology.org\/2023.findings-emnlp.668.pdf' target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1086\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png\" class=\"attachment-medium_large size-medium_large image-figure__image image-figure__image--primary\" alt=\"Front page of Findings of EMNLP 2023 paper &quot;Semantic Similarity Covariance Matrix Shrinkage&quot;\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 768w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 134w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1654w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><img loading=\"lazy\" decoding=\"async\" width=\"1654\" height=\"2339\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png\" class=\"attachment-full size-full image-figure__image image-figure__image--small\" alt=\"Front page of Findings of EMNLP 2023 paper &quot;Semantic Similarity Covariance Matrix Shrinkage&quot;\" 
srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1654w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 212w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 724w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1086w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 1448w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/2023.findings-emnlp.668-Semantic-Similarity-Covariance-Matrix-Shrinkage_Page_01.png 134w\" sizes=\"(max-width: 1654px) 100vw, 1654px\" \/><\/a>\n    \n<\/figure>\n<div class=\"bb-separator\" data-color=\"\">\n\t<hr class=\"bb-separator__rule\">\n<\/div>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>Please summarize your research. 
Why are your results notable?<\/strong><\/p>\n<p><strong>Guillaume:<\/strong> When building an investment portfolio, asset managers often aim to maximize the expected returns while minimizing the expected volatility (a proxy for the portfolio&#8217;s level of risk). A common technique to reduce volatility is to build a diversified portfolio \u2013 that is, to find uncorrelated assets so that the volatility of the portfolio is significantly lower than that of its individual components. Unfortunately, estimating the degree of correlation between assets in a portfolio (the covariance matrix) is very challenging, since the number of random variables (the components of the portfolio) is typically larger than the number of historical price observations.<\/p>\n<p>Covariance shrinkage is an established method in quantitative finance for regularizing the estimate of the covariance matrix. Our work extends the idea of shrinkage by using additional information from company fundamentals to regularize the covariance matrix. Embeddings (vector representations) of portfolio components (e.g., company stocks) can be generated with modern NLP techniques via sentence encoders or knowledge graphs. These embeddings are used to compute a similarity matrix for the portfolio assets that captures fundamental information about them, and this matrix serves as an effective regularization target within the well-established shrinkage framework.<\/p>\n\n<\/div>\n<figure class=\"image-figure image-figure__center\" data-animation=\"\">\n    <a class='image-figure__link' href='https:\/\/aclanthology.org\/2022.findings-acl.36.pdf' target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"874\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png\" class=\"attachment-full size-full image-figure__image image-figure__image--primary\" alt=\"Figure 3. 
The semantic similarity between companies is used as a target to shrink (regularize) the sample covariance matrix.\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 1999w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 1536w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/image1.png 280w\" sizes=\"(max-width: 1999px) 100vw, 1999px\" \/><\/a>\n    <figcaption class='image-figure__caption'>Figure 3. The semantic similarity between companies is used as a target to shrink (regularize) the sample covariance matrix.<\/figcaption>\n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p><strong>How does your research advance the state-of-the-art in the field of natural language processing?<\/strong><\/p>\n<p>Natural language processing approaches are increasingly being adopted in the fields of finance and portfolio management. Previous work has mainly focused on improving the prediction of future returns to maximize expected profit. 
However, estimating portfolio volatility is also critical for finding the optimal portfolio at a given level of acceptable risk (the risk-return trade-off).<\/p>\n<p>Our research extends established methods in quantitative finance to provide a framework that uses the output of NLP models to produce robust estimates of the portfolio covariance matrix. Implemented as a simple post-processing step, it is widely applicable to any semantic model (including sentence embeddings and knowledge graph embeddings).<\/p>\n\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Bloomberg&#8217;s four EMNLP 2023 research papers highlight a variety of state-of-the-art applications, novel approaches, and improved models used in key NLP tasks<\/p>\n","protected":false},"author":184,"featured_media":34097,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1466],"tags":[1498,1578,1637,1472,1485,1624,1486,1638,1580,1591],"class_list":["post-34072","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-at-bloomberg","tag-ai","tag-artificial-intelligence","tag-computational-linguistics","tag-data-science","tag-machine-learning","tag-ml","tag-natural-language-processing","tag-neural-ranking","tag-nlp","tag-sentiment"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.11 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023<\/title>\n<meta name=\"description\" content=\"Bloomberg&#039;s four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023\" \/>\n<meta property=\"og:description\" content=\"Bloomberg&#039;s four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/\" \/>\n<meta property=\"og:site_name\" content=\"Bloomberg L.P.\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bloomberglp\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-08T13:26:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T20:08:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1323\" \/>\n\t<meta property=\"og:image:height\" content=\"465\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"chaas30\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png\" \/>\n<meta name=\"twitter:creator\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:site\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"chaas30\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/\",\"url\":\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/\",\"name\":\"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023\",\"isPartOf\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\"},\"datePublished\":\"2023-12-08T13:26:26+00:00\",\"dateModified\":\"2024-04-12T20:08:38+00:00\",\"author\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/4d4a18aae79d6fcc1ea98181a906905e\"},\"description\":\"Bloomberg's four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":\"1\",\"name\":\"Home\",\"item\":\"https:\/\/www.bloomberg.com\/company\/\"},{\"@type\":\"ListItem\",\"position\":\"2\",\"name\":\"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 
2023\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\",\"url\":\"https:\/\/www.bloomberg.com\/company\/\",\"name\":\"Bloomberg L.P.\",\"description\":\"Bloomberg L.P. is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/4d4a18aae79d6fcc1ea98181a906905e\",\"name\":\"Bloomberg L.P.\",\"url\":\"https:\/\/www.bloomberg.com\/company\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023","description":"Bloomberg's four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/","og_locale":"en_US","og_type":"article","og_title":"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023","og_description":"Bloomberg's four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP tasks.","og_url":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/","og_site_name":"Bloomberg 
L.P.","article_publisher":"https:\/\/www.facebook.com\/bloomberglp\/","article_published_time":"2023-12-08T13:26:26+00:00","article_modified_time":"2024-04-12T20:08:38+00:00","og_image":[{"width":1323,"height":465,"url":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png","type":"image\/png"}],"author":"chaas30","twitter_card":"summary_large_image","twitter_image":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png","twitter_creator":"@bloomberg","twitter_site":"@bloomberg","twitter_misc":{"Written by":"chaas30","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/","url":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/","name":"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023","isPartOf":{"@id":"https:\/\/www.bloomberg.com\/company\/#website"},"datePublished":"2023-12-08T13:26:26+00:00","dateModified":"2024-04-12T20:08:38+00:00","author":{"@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/4d4a18aae79d6fcc1ea98181a906905e"},"description":"Bloomberg's four EMNLP 2023 research papers highlight a variety of applications, novel approaches, and improved models used in key NLP 
tasks.","breadcrumb":{"@id":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.bloomberg.com\/company\/stories\/bloombergs-ai-engineering-group-cto-office-publish-4-nlp-research-papers-at-emnlp-2023\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":"1","name":"Home","item":"https:\/\/www.bloomberg.com\/company\/"},{"@type":"ListItem","position":"2","name":"Bloomberg\u2019s AI Engineering Group Publishes 4 NLP Research Papers at EMNLP 2023"}]},{"@type":"WebSite","@id":"https:\/\/www.bloomberg.com\/company\/#website","url":"https:\/\/www.bloomberg.com\/company\/","name":"Bloomberg L.P.","description":"Bloomberg L.P. is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/4d4a18aae79d6fcc1ea98181a906905e","name":"Bloomberg L.P.","url":"https:\/\/www.bloomberg.com\/company"}]}},"featured_image_rendered":"<img srcset='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png 280w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png 300w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png 1323w' src='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2023\/12\/emnlp-2023-logo.png' alt='EMNLP 2023 logo' \/>","category_info":{"name":"Tech At Bloomberg","blog_landing_name":"Tech At Bloomberg"},"_links":{"self":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/34072","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/users\/184"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/comments?post=34072"}],"version-history":[{"count":10,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/34072\/revisions"}],"predecessor-version":[{"id":35686,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/34072\/revisions\/35686"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media\/34097"}],"wp:attachment":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media?parent=34072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/categories?post=34072"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/tags?post=34072"}],"curies":[{
"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}