Tech At Bloomberg

Bloomberg’s AI Engineers Publish 3 Information Retrieval Research Papers at SIGIR 2025

July 13, 2025

During the 48th International ACM SIGIR Conference (SIGIR 2025) in Padua, Italy this week (July 13-17, 2025), researchers from Bloomberg’s AI Engineering Group are showcasing their expertise in information retrieval (IR) by presenting three papers.

IR is the process of finding and accessing relevant information from a collection of data, such as documents, images, or databases, in response to a user’s query. IR is critical for combating information overload and facilitating access to knowledge in a world full of data. AI is revolutionizing the field, and research presented at SIGIR will be applied toward making information retrieval systems more robust, effective, and efficient.

In addition, Bloomberg AI Research Scientist Shuo Zhang is one of the organizers of the 2nd Workshop on Financial Information Retrieval in the Era of Generative AI (FinIR) on July 17, 2025. The workshop serves as a forum for researchers and practitioners to explore potential approaches and research directions to address the challenges of using generative AI in finance through the use of advanced IR techniques. Its goal is to deepen understanding, accelerate progress, and advance IR technology that helps generative models address financial challenges.

We asked some of the authors to summarize their research and explain why the results were notable:


Monday, July 14, 2025

Reproducibility: Domain-Specific, Multimodal, and Multilingual Retrieval
10:30-12:30 CEST

Benchmark Granularity and Model Robustness for Image-Text Retrieval: A Reproducibility Study
Mariya Hendriksen (University of Amsterdam), Shuo Zhang (Bloomberg), Ridho Reinanda (Bloomberg), Mohamed Yahya (Bloomberg), Edgar Meij (Bloomberg), Maarten de Rijke (University of Amsterdam)

Click to read "Benchmark Granularity and Model Robustness for Image-Text Retrieval," published July 13, 2025 at SIGIR 2025

Please summarize your research. Why are your results notable?

Shuo Zhang: In our study, we examined how well today’s AI models can match images with text descriptions — a technology behind many modern search and recommendation systems. While these systems are widely used, most are only tested with short, simple captions that don’t reflect how people actually search in real life. Our work looked at two often-overlooked factors: the level of detail in descriptions (caption “granularity”) and how well these systems handle errors or unexpected input (“robustness”). See Figure 1 as an illustration.

Granularity refers to how much detail is present in a caption. A simple, coarse caption might be “a red rose is sitting next to a couple of mugs,” while a fine-grained version could be “a single red rose in a glass vase sits on a dining table, next to two ceramic mugs with floral designs.”

Robustness (illustrated below), on the other hand, shows how even a small typo (“couple” vs. “coupel,” or “motorcycles” vs. “omtorcycles”) can increase or decrease the quality of retrieved images. This highlights the importance of evaluating models under realistic, imperfect conditions, not just with idealized queries.

Examples of perturbation effects on R@1 (Recall@1) for image retrieval

We evaluated four of the most advanced vision-language models (CLIP, ALIGN, AltCLIP, and GroupViT) on both standard datasets and new versions we created, which include richer, more descriptive captions. To test robustness, we introduced a wide range of common mistakes (like word order changes, typos, and synonyms) into the search queries and measured how each model performed.
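
To make this setup concrete, here is a minimal sketch of such a robustness probe in Python, using the publicly available CLIP checkpoint from Hugging Face. The perturbation functions, the Recall@1 loop, and the tiny in-memory image pool are illustrative stand-ins rather than the evaluation framework released with the paper.

```python
# A minimal robustness probe: score image-text pairs with CLIP before
# and after perturbing the captions. The perturbations and the tiny
# in-memory "dataset" are illustrative, not the paper's toolkit.
import random

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def swap_adjacent_words(caption: str) -> str:
    """Word-order perturbation: swap one random adjacent word pair."""
    words = caption.split()
    if len(words) < 2:
        return caption
    i = random.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def typo(caption: str) -> str:
    """Typo perturbation: transpose two adjacent characters."""
    i = random.randrange(len(caption) - 1)
    chars = list(caption)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

@torch.no_grad()
def recall_at_1(captions, images) -> float:
    """Fraction of captions whose top-scoring image is their paired one."""
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    logits_per_text = model(**inputs).logits_per_text  # (texts, images)
    top1 = logits_per_text.argmax(dim=-1)
    return (top1 == torch.arange(len(captions))).float().mean().item()

# Real evaluations would draw captions and images from a benchmark;
# the solid-color images below only keep the sketch runnable.
captions = ["a red rose is sitting next to a couple of mugs",
            "two motorcycles parked on the side of the road"]
images = [Image.new("RGB", (224, 224), c) for c in ("red", "gray")]

print("clean R@1:", recall_at_1(captions, images))
print("typo  R@1:", recall_at_1([typo(c) for c in captions], images))
print("order R@1:", recall_at_1([swap_adjacent_words(c) for c in captions],
                                images))
```

In a real evaluation, the captions and images would come from a standard retrieval benchmark, and results would be averaged over many perturbation samples.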

Key Findings

  • Detailed descriptions lead to better results: Models found correct images much more often when given detailed, specific captions — up to 16% better in some cases.
  • Varying robustness: Models responded differently to different kinds of input “noise.” Some errors, like rearranged word order, had a major impact; others, like small word changes, were less disruptive or even occasionally improved performance.
  • Order matters: Contrary to previous beliefs, these AI models are sensitive to the order of words in a description, especially with longer or more detailed captions.
  • A new evaluation toolkit: We built a new testing framework that allows others to measure model performance using more realistic, challenging scenarios that reflect actual user behavior.

How does your research advance the state-of-the-art in the field of information retrieval?

Most current image search systems are tested only under ideal conditions, with simple, short, and general captions — for example, “a person walking a dog.” These “coarse” captions mention only the main objects or actions and leave out specifics like the dog’s breed, what the person is wearing, or the location. In contrast, real users often type much more detailed and sometimes imperfect queries, such as “a woman in a yellow raincoat walking a black labrador on a leash through a city park in autumn, with leaves on the ground.” This mismatch means existing AI systems may not be as reliable as we think when deployed in real-world situations. Our research directly addresses this gap.

  • More accurate search: For industries that rely on finding the right image quickly, such as newsrooms, digital asset management, and e-commerce, improved search accuracy saves time and reduces frustration.
  • Greater reliability: Our findings help ensure AI tools don’t break down when faced with everyday user input, improving user trust and satisfaction.
  • Higher standards: By introducing more realistic testing, we set a new bar for how image-text AI should be evaluated before being put into production.

For example, if someone needs to find a specific photo for a story and enters a long, detailed caption, even one with a typo (“a woman in a yellow raincaot walking a black labrador…”), our research helps ensure that the right image can still be found. This is crucial in fast-paced or high-stakes environments. To support progress across the industry, we are making our tools publicly available, so others can continue improving the reliability and usefulness of AI search technology.


Tuesday, July 15, 2025

Short Paper Posters 2
14:00-15:30 CEST
An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc
Aldo Porco* (Bloomberg), Dhruv Mehra* (Bloomberg), Igor Malioutov* (Bloomberg), Karthik Radhakrishnan* (Bloomberg), Moniba Keymanesh* (Bloomberg), Daniel Preoţiuc-Pietro (Bloomberg), Sean MacAvaney (University of Glasgow), Pengxiang Cheng (Bloomberg).
(* equal contributions)

This paper will also be presented on Thursday, July 17, 2025 at the Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR’25).

Click to read "An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc," published July 13, 2025 at SIGIR 2025

Please summarize your research. Why are your results notable?

Karthik Radhakrishnan: Our research deals with improving the latency of retrieval models from the SPLADE-Doc family. Typically, these models are used to generate representations for passages indexed into search engines, such as Apache Solr. A big part of their retrieval latency is due to high-frequency tokens that are assigned high weights in passage representations. For queries containing these high-frequency tokens, a large number of documents are matched from the index, which slows retrieval. The current training process for these models cannot handle this issue without a substantial drop in performance.
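
The effect is easy to see in a toy sketch (Python, with made-up token weights): an inverted index has to score every document whose representation shares a token with the query, so one stopword-like token can pull nearly the whole collection into scoring.

```python
# Toy illustration (made-up weights): a sparse index scores every
# document whose representation shares any token with the query, so a
# single high-frequency, stopword-like token drags almost the whole
# collection into the candidate set.
from collections import defaultdict

docs = {
    0: {"splade": 1.4, "is": 0.9, "sparse": 1.1},
    1: {"bm25": 1.2, "is": 0.8, "ranking": 1.0},
    2: {"rose": 1.3, "is": 0.7, "mugs": 1.2},
}

# Inverted index: token -> posting list of (doc_id, weight).
index = defaultdict(list)
for doc_id, rep in docs.items():
    for token, weight in rep.items():
        index[token].append((doc_id, weight))

def candidates(query_tokens):
    """Documents the engine must score: the union of posting lists."""
    return {doc_id for t in query_tokens for doc_id, _ in index.get(t, [])}

print(candidates(["splade"]))        # {0}: one document scored
print(candidates(["splade", "is"]))  # {0, 1, 2}: "is" matches everything
```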

We propose a new regularization method called “DF-FLOPS.” The method enforces sparsity over both documents and terms, unlike the popular FLOPS regularization, which enforces sparsity only within documents.
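
As a rough illustration of the idea, the PyTorch sketch below shows the standard FLOPS regularizer (Paria et al., 2020) together with a hypothetical document-frequency-aware penalty in the spirit of DF-FLOPS; the exact objective in the paper may differ, so treat the second function as an illustration rather than the published formulation.

```python
# Sketch of the standard FLOPS regularizer (Paria et al., 2020) used by
# SPLADE, plus a *hypothetical* document-frequency-aware variant in the
# spirit of DF-FLOPS. The paper's actual DF-FLOPS objective may differ;
# treat df_aware_loss as an illustration, not the authors' method.
import torch

def flops_loss(weights: torch.Tensor) -> torch.Tensor:
    """weights: (batch, vocab) non-negative SPLADE term weights.
    Penalizes each term's squared mean activation over the batch."""
    return (weights.mean(dim=0) ** 2).sum()

def df_aware_loss(weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical variant: scale each term's penalty by the fraction
    of documents that activate it, pushing the model away from tokens
    that would produce long posting lists."""
    df = (weights > 0).float().mean(dim=0)  # per-term document frequency
    return (df * weights.mean(dim=0) ** 2).sum()

batch = (torch.rand(32, 30522) - 0.8).clamp(min=0)  # fake sparse outputs
print(flops_loss(batch).item(), df_aware_loss(batch).item())
```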

Let’s consider the representations produced with both FLOPS and DF-FLOPS for a sample document below. When using FLOPS, the expansions generated by the model often contain high-frequency tokens that are semantically irrelevant (tokens like “is,” “with,” etc.). These high-frequency tokens, highlighted in red below, result in large search indices and slow retrieval.

Representation with FLOPS

With DF-FLOPS, the model learns to avoid relying on high-frequency tokens while still using them when appropriate. For example, in the text below, the word “who” is used to represent the “World Health Organization,” and keeping the token would be crucial for retrieval, whereas irrelevant tokens like “is” and “with” are dropped.

Representation with DF-FLOPS

With our method, we can lower the retrieval latency of SPLADE-Doc by 10x, to the level of BM25, making learned sparse retrieval practical for deployment in production-grade search engines.

The performance of the retrieval system decreases slightly when tested in the same domain as training (-2.2 MRR@10 on MS MARCO), but it improves on 12 of 13 cross-domain datasets from the BEIR benchmark.

How does your research advance the state-of-the-art in the field of information retrieval?

Semantic similarity applications are widespread and include search engines and retrieval-augmented generation (RAG)-based chatbots. These rely on representing documents as numerical vectors (“embeddings”) from which document similarity is computed. Learned sparse embeddings, such as those obtained from SPLADE, are a more practical alternative to dense embeddings, as they enable faster computation of similarities and can make use of existing sparse index solutions and infrastructure.

The method described in our paper shows that the retrieval latency of SPLADE can be sped up by an order of magnitude, while preserving most of the quality improvements. These practical improvements are critical for real production retrieval systems.


Thursday, July 17, 2025

FinIR: The 2nd Workshop on Financial Information Retrieval in the Era of Generative AI
Fengbin Zhu (National University of Singapore), Yunshan Ma (Singapore Management University), Fuli Feng (University of Science and Technology of China), Chao Wang (6Estates Pte Ltd), Huanbo Luan (6Estates Pte Ltd), Guangnan Ye (Fudan University), Shuo Zhang (Bloomberg), Dhagash Mehta (BlackRock), Pingping Chen (Goldman Sachs), Bing Xiang (Goldman Sachs), Tat-Seng Chua (National University of Singapore)

Please explain the goal of this workshop. Why are you helping to organize it?

Shuo Zhang: The 2nd Workshop on Financial Information Retrieval in the Era of Generative AI is designed to explore and address the emerging challenges at the intersection of generative AI and financial information retrieval. As generative models — particularly large language models (LLMs) — continue to revolutionize information access, their limitations become increasingly evident in fast-paced domains like finance. Issues like hallucination, outdated knowledge, and data sparsity make it clear that generative models cannot operate in isolation.

This is where information retrieval (IR) plays a critical role. The workshop aims to advance the integration of IR technologies — such as retrieval-augmented generation (RAG), multimodal and real-time retrieval, and domain-specific query understanding — into generative systems for finance. It also tackles core concerns around benchmarking, privacy, trustworthiness, and evaluation frameworks unique to financial applications.

I am helping to organize this workshop because I believe the financial domain offers a uniquely demanding and richly complex environment for IR research. The combination of heterogeneous data, temporal sensitivity, and regulatory constraints requires sophisticated solutions that push the boundaries of both retrieval and generation. With my research focus on IR and text analytics, I’m especially excited to help bridge the research gap and promote cross-pollination between academia and industry practitioners.

How do you expect or hope that this workshop will help advance the state-of-the-art in the field of financial information retrieval?

We hope this workshop will drive new progress in how IR is used in finance, especially when combined with generative AI models like LLMs. Financial data is complex, constantly changing, and comes in many forms — like tables, charts, reports, and news. Traditional generative models often struggle with this because they rely only on what they learned during training and can’t always access the latest or most specific information.

This is where IR becomes essential. The workshop encourages new methods to improve how we find and use relevant financial information — for example, retrieving updated data in real time, handling multiple types of data (like text and images), and refining user queries in financial contexts. These advances can make generative models more accurate, up-to-date, and useful in real-world financial applications.

We’re also focused on creating better ways to evaluate these systems, since current benchmarks don’t reflect the real demands of financial tasks. The workshop supports building new benchmarks and testing tools that help researchers measure system performance more realistically.

Another goal is to support the design of complete systems that combine retrieval and generation, such as financial assistants or tools for analysts. Finally, we want to tackle trust and privacy issues, which are especially important in finance. This includes improving data security, avoiding misinformation, and ensuring systems are explainable and reliable.

By bringing together researchers and professionals from both academia and industry, we hope this workshop sparks ideas, collaborations, and new research directions that raise the standard for how AI is applied in finance.


Agentic Retrieval of Topics and Insights from Earnings Calls
Anant Gupta (Bloomberg), Rajarshi Bhowmik (Bloomberg), Geoffrey Gunow (Bloomberg).

Click to read "Agentic Retrieval of Topics and Insights from Earnings Calls," published July 17, 2025 at the 2nd Workshop on Financial Information Retrieval in the Era of Generative AI (FinIR 2025)

Please summarize your research. Why are your results notable?

Anant Gupta: In this work that we’re presenting during the 2nd Workshop on Financial Information Retrieval in the Era of Generative AI (FinIR 2025), we showcase a generative AI-driven, agentic framework for dynamically extracting and organizing financial topics from corporate earnings call transcripts. Our system utilizes LLMs to autonomously identify, structure, and evolve a hierarchical topic ontology that captures emerging themes and their relationships over time. Unlike traditional topic models such as Latent Dirichlet Allocation (LDA), which rely on static, unsupervised distributions or pre-defined labels, our approach enables fine-grained, contextual tracking of financially relevant narratives across companies and sectors.
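
As a rough illustration of the agentic pattern (not the system described in the paper), the sketch below grows a topic ontology one transcript chunk at a time; the call_llm helper, the prompt wording, and the JSON schema are invented placeholders.

```python
# High-level sketch of the agentic pattern described above, not the
# authors' system. `call_llm`, the prompt, and the JSON schema are
# invented placeholders for illustration.
import json

ontology: dict[str, list[str]] = {}  # parent topic -> child topics

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM endpoint, return its reply."""
    raise NotImplementedError("wire up an LLM client here")

def update_ontology(transcript_chunk: str) -> None:
    """Ask the LLM to map extracted topics into the evolving ontology."""
    prompt = (
        "Existing topic ontology (JSON):\n"
        f"{json.dumps(ontology)}\n\n"
        "Extract financial topics from the earnings-call excerpt below. "
        "Reuse existing topics where they fit; otherwise propose new "
        "ones with a parent. Reply as JSON {topic: parent_or_null}.\n\n"
        + transcript_chunk
    )
    for topic, parent in json.loads(call_llm(prompt)).items():
        key = parent or topic
        ontology.setdefault(key, [])
        if parent and topic not in ontology[key]:
            ontology[key].append(topic)

# for chunk in split_transcript(transcript):  # hypothetical chunking step
#     update_ontology(chunk)
```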

Our results are notable in three ways:

  1. Dynamic topic discovery: The agentic system can surface emerging financial topics without requiring prior labeling, capturing evolving language in real time (e.g., the rise of “generative AI” or “cost reduction” post-2022).
  2. Ontology-grounded insights: We construct and validate a coherent multi-level topic ontology using semantic similarity and embedding-based coherence metrics, showing improved structural integrity compared to LDA baselines (a minimal sketch of such a coherence check appears after this list).
  3. Actionable financial analytics: We demonstrate downstream applications like trend detection, competitor benchmarking, and identification of strategic differentiators, providing equity analysts with early, interpretable signals directly from unstructured text.
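
As a minimal illustration of the embedding-based coherence check mentioned in the second point above, the sketch below scores a group of sibling topics by their average pairwise cosine similarity using the sentence-transformers library; the model choice and example topics are assumptions, not details from the paper.

```python
# Minimal sketch of an embedding-based coherence check for a group of
# sibling topics: coherent groups should score a higher average pairwise
# cosine similarity. Model choice and example topics are illustrative,
# not taken from the paper.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def topic_coherence(topics: list[str]) -> float:
    """Mean pairwise cosine similarity of the topics' embeddings."""
    emb = model.encode(topics, convert_to_tensor=True)
    sims = [util.cos_sim(emb[i], emb[j]).item()
            for i, j in combinations(range(len(topics)), 2)]
    return sum(sims) / len(sims)

print(topic_coherence(["generative AI", "large language models",
                       "AI copilots"]))      # related themes: higher
print(topic_coherence(["generative AI", "cost reduction",
                       "share buybacks"]))   # mixed themes: lower
```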

How does your research advance the state-of-the-art in the field of information retrieval?

Our work advances IR by shifting from static, keyword-based topic detection toward dynamic, context-aware retrieval using agentic LLMs. While existing IR systems often rely on pre-indexed vocabularies or supervised classifiers, our approach uses semantic reasoning to identify novel concepts as they appear, and self-updating structures (ontologies) to organize them over time.

This agentic system acts not just as a retriever but as a semantic curator — it validates, contextualizes, and links topics across documents in a way that mirrors how a human analyst might build mental models over quarters of financial reporting. In doing so, we bridge the gap between deep retrieval and interpretability, enabling richer question-answering, longitudinal trend analysis, and sector-wide benchmarking.