ipydatagrid adds interactive data grids to the Jupyter ecosystem

December 08, 2021

Software engineers at Bloomberg often create tools that solve problems related to cloud infrastructure, information retrieval, data science, natural language processing, mobile application development, and more. A number of these tools have been published on GitHub as open source projects for others across the tech industry to use in solving real-world problems. Several of the open source projects born at Bloomberg have also been spun out and are now supported by communities with their own governance.

In this article, we talk with Itay Dafna, one of the developers of ipydatagrid, an interactive data table widget for the Jupyter ecosystem, to learn more about this open source project:

Briefly tell us about yourself.

My name is Itay Dafna and I am a Software Engineer in Bloomberg’s San Francisco Engineering Office, where I work on the BQuant Visualizations team. I joined Bloomberg eight years ago after completing my master’s degree in management. I have had the privilege to move between different departments whilst at Bloomberg – starting in Analytics, then Financial Engineering, where I tackled complex inquiries from our clients, and now Engineering. I also earned my Certificate in Quantitative Finance while working for the company.

Most of my work is “full stack” and typically revolves around the Jupyter ecosystem, specifically Jupyter-widgets. I am a maintainer of the popular ipywidgets package, as well as bqplot and, of course, ipydatagrid. In some of the other projects I work on, I also get to apply my finance knowledge and leverage bleeding-edge technologies like WebAssembly (WASM).

Tell us about the new open source project that was published.

As the name suggests, ipydatagrid is a data grid Jupyter-widget. A data grid is essentially a grid of cells, much like Microsoft Excel, that holds tabular information (hierarchical, in some cases) and allows for quick analysis with filtering, sorting, and conditional formatting. ipydatagrid is a complete data grid solution and the first one based on Jupyter-widgets that allows for two-way data binding: any edits made in the UI are propagated to the Python model, and vice versa.
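
To make the idea of two-way data binding concrete, here is a minimal, illustrative sketch in plain Python. It is not ipydatagrid's implementation or API: the GridModel and GridView classes, their methods, and the sample data are hypothetical names invented for this example, and the "view" is only a stand-in for the grid rendered in the browser.

```python
from typing import Any, Callable, Dict, List, Tuple

Cell = Tuple[int, str]  # (row index, column name)


class GridModel:
    """Hypothetical kernel-side table: row dicts plus change listeners."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = [dict(row) for row in rows]
        self._listeners: List[Callable[[int, str, Any], None]] = []

    def observe(self, callback: Callable[[int, str, Any], None]) -> None:
        """Register a callback that fires whenever a cell value changes."""
        self._listeners.append(callback)

    def set_cell(self, row: int, column: str, value: Any) -> None:
        if self.rows[row][column] == value:
            return  # nothing changed, so do not notify (avoids update loops)
        self.rows[row][column] = value
        for callback in self._listeners:
            callback(row, column, value)


class GridView:
    """Hypothetical stand-in for the grid as displayed to the user."""

    def __init__(self, model: GridModel) -> None:
        self.model = model
        self.cells: Dict[Cell, Any] = {
            (i, column): value
            for i, row in enumerate(model.rows)
            for column, value in row.items()
        }
        # Model -> view: mirror every model change into the displayed cells.
        model.observe(self._on_model_change)

    def _on_model_change(self, row: int, column: str, value: Any) -> None:
        self.cells[(row, column)] = value

    def edit(self, row: int, column: str, value: Any) -> None:
        """View -> model: simulate a user typing a new value into a cell."""
        self.model.set_cell(row, column, value)


model = GridModel([
    {"ticker": "AAA", "weight": 0.6},
    {"ticker": "BBB", "weight": 0.4},
])
view = GridView(model)

view.edit(0, "weight", 0.5)       # an edit in the "UI" reaches the model
model.set_cell(1, "weight", 0.5)  # a change in Python reaches the "UI"
```

After both calls, the model's rows and the view's cells hold the same values, whichever side the change started on. In a real widget the two sides live in different processes (the Python kernel and the browser), so the synchronization runs over a message channel rather than direct method calls.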

How did you come to work on this project? Who else has contributed?

The project was initiated by Kaia Young, my Team Lead, and Mehmet Bektas, a Jupyter core developer, with help from QuantStack’s Martin Renou. By the time I joined the team, the project was released to Bloomberg clients only via the BQuant platform. As an early BQuant user, I always relied on data grids when building apps, so you can imagine my excitement when I joined the team and was offered the opportunity to contribute to ipydatagrid.

What did you individually contribute to the project?

In preparing to release the project to the OSS community, I worked with Bloomberg’s Python Guild, JavaScript Guild, and Open Source Program Office (OSPO) to refactor the code base so that it was ready for publication. I also added numerous enhancements to the product, with notable examples being nested hierarchies of columns and rows, conditional formatting based on sibling cells, styling and themes, text wrapping, and automatic column-width fitting. Adding a new feature to ipydatagrid is trickier than in other products because it is built on top of other packages from the Jupyter ecosystem. So, a typical feature request in ipydatagrid requires enhancements to happen upstream before they can be added to ipydatagrid itself.

How did your experience/background prepare you to make this contribution?

I have a strong affinity for mathematics, especially quantitative finance and machine learning. The first stage of working with ML models is data exploration – understanding the data is crucial for determining the correct model to use. Being able to quickly filter, transform, and format the data can be tremendously helpful in that phase. ipydatagrid excels at making it easy to explore tabular data sets. Plus, because ipydatagrid is a Jupyter-widget, it is easy to link it with a plotting library like bqplot so that, for example, a histogram is rendered when you click on a numerical data column. ipydatagrid can also be used to display portfolio weights and trading strategy backtest results, and to visualize asset scoring models.

In my previous role with our Financial Engineering team, I was an early BQuant user and regularly used an older, less-capable data grid, which ipydatagrid has now replaced. So I benefitted not only from knowing how to make the most out of data grids, but also from knowing what ipydatagrid needs to reach feature parity (and beyond) with other data grid solutions.

What problems does this solution solve inside Bloomberg?

A large base of our clients is used to working with tabular data. A vast majority of those who are using BQuant will have migrated their analyses from Excel. Having that familiar view of cells containing data inside BQuant has helped us bridge that large user experience gap and ease adoption of our new analytics platform for quantitative analysts and data scientists in the financial markets. ipydatagrid also introduces enhanced functionality such as two-way data binding, different cell renderers, and conditional formatting based on the Vega Expression grammar, allowing for the creation of really sophisticated data grids with a user experience similar to Excel. In fact, before ipydatagrid existed, many of our clients resorted to constructing their own data grids using HTML tables and CSS. ipydatagrid ensures standardization when it comes to data grids in the Jupyter ecosystem.

How does this solution benefit the broader open source community?

There were a few attempts at creating a free-to-use Jupyter-widget data grid, but most efforts either lacked basic functionality, such as two-way data binding, or were abandoned by the community due to a lack of maintenance. ipydatagrid provides the broader Jupyter community with a fast, robust and feature-complete data grid solution. Because ipydatagrid is based on the Lumino package — a core package used in many JupyterLab extensions — most enhancements we add to ipydatagrid are first added to that package before they’re exposed in ipydatagrid. So you have this ripple effect.

What do you hope the community contributes to the project?

I hope ipydatagrid becomes the de facto data grid solution in the Jupyter ecosystem and that users get involved by contributing enhancements and fixes. We work to engage the community by responding quickly to issues raised on the repo and by helping community members who author pull requests.