{"id":20649,"date":"2017-09-13T10:47:11","date_gmt":"2017-09-13T14:47:11","guid":{"rendered":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/"},"modified":"2022-03-04T23:11:51","modified_gmt":"2022-03-05T04:11:51","slug":"the-search-for-solr-analytics","status":"publish","type":"post","link":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/","title":{"rendered":"The search for Solr analytics"},"content":{"rendered":"<div class='bbg-row bbg-bg--white  bbg-row--margin-top-none bbg-row--margin-bottom-none' data-anchor='row-6a0408d2e59c1'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<div class='bb-wysiwyg'>\n    \n    <p>In the constant search to extract meaningful insight from data, one team of Bloomberg programmers is having an outsized impact, both inside and outside the company.<\/p>\n<p><a href=\"https:\/\/www.linkedin.com\/in\/stevenbower\/\" target=\"_blank\" rel=\"noopener noreferrer\">Steven Bower<\/a> and <a href=\"https:\/\/www.linkedin.com\/in\/houston-putman-3b662361\/\" target=\"_blank\" rel=\"noopener noreferrer\">Houston Putman<\/a> are contributing a new version of their Analytics Component to a project called Apache Solr, and their work is benefitting programmers and data scientists all over the world. Solr (pronounced like \u201csolar\u201d) is an <a href=\"https:\/\/lucene.apache.org\/solr\/\" target=\"_blank\" rel=\"noopener noreferrer\">open source platform<\/a> designed for indexing and searching data. Created in 2004, it\u2019s widely used as a data search tool at a variety of high-profile companies, including Best Buy, eBay and Netflix.<\/p>\n<p>Bower said Solr also serves as the technical foundation for more than 300 functions on the Bloomberg Terminal. \u201cPretty much anytime you search on the Terminal, Solr is there,\u201d he said. 
\u201cIt forms the underpinning for things like News Search, Bloomberg Unified Search, Fixed Income Search (SRCH&lt;GO&gt;), and even jobs listed on <a href=\"https:\/\/bloomberg.avature.net\/en_US\/careers\/SearchJobs\" target=\"_blank\" rel=\"noopener noreferrer\">Bloomberg.com\/careers<\/a>. It\u2019s all over the place.\u201d<\/p>\n<p>Bloomberg\u2019s reliance on Solr led the company\u2019s software engineers to contribute substantially to the Solr project. Three members of the Bloomberg search and news teams are \u201c<a href=\"https:\/\/www.bloomberg.com\/company\/announcements\/open-source-at-bloomberg-expanding-our-engagement-with-solr\/\" target=\"_blank\" rel=\"noopener noreferrer\">committers<\/a>\u201d to the project, meaning they can <a href=\"https:\/\/en.wikipedia.org\/wiki\/Committer\" target=\"_blank\" rel=\"noopener noreferrer\">modify the code directly<\/a>. Another 20 Bloomberg engineers have submitted code that has ultimately been added to Solr. And one is a member of the Apache Lucene\/Solr Project Management Committee (PMC), which provides oversight of the project for the Apache Software Foundation (ASF), decides the release strategy, appoints new committers and sets community and technical direction for the project.<\/p>\n<p>The original idea for adding the <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-10123\" target=\"_blank\" rel=\"noopener noreferrer\">Analytics Component<\/a> to Solr dates back four years, to when Putman, then a 19-year-old intern at Bloomberg, started work on an earlier data analysis project.<\/p>\n<p>\u201cBasically people were asking us for ways they could do simple rollups of their data generated by Solr,\u201d Putman said. \u201cThe requests were pretty basic, like how many records were in the system every week. 
Over time, the requests got more advanced.\u201d<\/p>\n<p>Solr already included a component called <a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/solr\/The+Stats+Component\" target=\"_blank\" rel=\"noopener noreferrer\">Stats<\/a>, but it proved too inflexible for these requests. When Putman released the first version of the Analytics Component, it became popular in the Solr community because it expanded the range of analyses Solr could perform. Inside Bloomberg, the new functionality helped the company reduce its use of some commercial database products.<\/p>\n<p>Because the Analytics Component can run directly on live data, it removes the need for a separate data warehouse database and the cost that comes with one. That speeds up and simplifies the process, which also reduces the cost of conducting the analysis. \u201cIt\u2019s pretty intense what it can do,\u201d Bower said. Version 2.0 of the Analytics Component (<a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-10123\" target=\"_blank\" rel=\"noopener noreferrer\">SOLR-10123<\/a>) is a standard part of Solr 7.0, the platform\u2019s soon-to-be-released major revision.<\/p>\n<p>The new version addresses several challenges that emerged as Solr usage grew at Bloomberg.<\/p>\n<p>First, to handle ever-growing document stores (called \u2018collections\u2019 in Solr) while keeping performance consistent, Solr users commonly rely on \u2018sharding\u2019: breaking the document store into chunks called shards. Those shards can then be distributed across a number of Solr instances, either on a single machine (to take advantage of multiple CPUs, for example) or across multiple machines (when a single machine can no longer provide sufficient performance for the application). Unfortunately, the first version of the Analytics Component could not be applied to sharded collections. 
The new version directly supports sharded collections, which allows it to be used on collections with tens (or even hundreds) of billions of documents.<\/p>\n<p>This enhancement does more than support larger collections, though. Sharding can also make fuller use of the processing power of machines running Solr, which are often limited by how quickly they can read and write data on their storage devices rather than by their CPUs. Running analytics on a collection organized this way follows the \u2018<a href=\"https:\/\/en.wikipedia.org\/wiki\/MapReduce\" target=\"_blank\" rel=\"noopener noreferrer\">MapReduce<\/a>\u2019 pattern: each shard computes partial results over its own documents, and those partial results are then merged into a final answer. This yields near-linear performance gains as the number of shards grows. When the shards are spread across multiple machines, the computation is spread across those machines too, so the parallelism benefit carries over.<\/p>\n<p>Second, in the years since the Analytics Component became available to teams at Bloomberg, their usage revealed a need to support hundreds (or even thousands) of categories in multi-level grouped searches against large collections. The new version of the component has a completely rewritten grouping algorithm that handles these searches without a significant loss in performance.<\/p>\n<p>Why put so much effort into an open source project? It is in keeping with Bloomberg\u2019s cultural value of making a difference in the communities where we live and work. Other companies, even potential competitors, benefit from the work that Putman, Bower and their Bloomberg colleagues have done. 
In turn, those external users help improve the software by putting it through their own real-world deployments, finding its flaws and fixing them.<\/p>\n\n<\/div>\n<div class=\"bb-separator\" data-color=\"\">\n\t<hr class=\"bb-separator__rule\">\n<\/div>\n<div class='bb-wysiwyg'>\n    \n    <p><em>Attending <\/em><a href=\"https:\/\/lucenerevolution.org\" target=\"_blank\" rel=\"noopener noreferrer\"><em>Lucene\/Solr Revolution 2017<\/em><\/a><em> in Las Vegas? Learn more about the next iteration of the Solr Analytics Component from Houston Putman during \u201c<\/em><a href=\"https:\/\/sched.co\/BAwk\" target=\"_blank\" rel=\"noopener noreferrer\"><em>Analytics at Scale with the Analytics Component 2.0<\/em><\/a><em>\u201d from 10:00-10:40 AM PT on Thursday, September 14, 2017.<\/em><\/p>\n\n<\/div>\n<figure class=\"image-figure\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png\" class=\"attachment-large size-large image-figure__image image-figure__image--primary\" alt=\"\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png 170w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-rev-2017-Solr-Analytics-2.0.png 140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\n    \n<\/figure>\n<\/p>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Open source effort extends analytics to sharded collections in Solr<\/p>\n","protected":false},"author":313,"featured_media":19636,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1466],"tags":[1788,1485,1464,1678,1693,1787,1793,1777],"class_list":["post-20649","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-at-bloomberg","tag-bloomberg-terminal","tag-machine-learning","tag-open-source","tag-programmer","tag-programming","tag-termina","tag-vc","tag-venture-capital"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.11 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The search for Solr analytics | Bloomberg LP<\/title>\n<meta name=\"description\" content=\"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The search for Solr analytics | Bloomberg LP\" \/>\n<meta property=\"og:description\" content=\"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/\" \/>\n<meta 
property=\"og:site_name\" content=\"Bloomberg L.P.\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bloomberglp\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-09-13T14:47:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-05T04:11:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"725\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"akelber5\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:site\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"akelber5\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/\",\"url\":\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/\",\"name\":\"The search for Solr analytics | Bloomberg LP\",\"isPartOf\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\"},\"datePublished\":\"2017-09-13T14:47:11+00:00\",\"dateModified\":\"2022-03-05T04:11:51+00:00\",\"author\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6\"},\"description\":\"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr\",\"breadcrumb\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":\"1\",\"name\":\"Home\",\"item\":\"https:\/\/www.bloomberg.com\/company\/\"},{\"@type\":\"ListItem\",\"position\":\"2\",\"name\":\"The search for Solr analytics\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\",\"url\":\"https:\/\/www.bloomberg.com\/company\/\",\"name\":\"Bloomberg L.P.\",\"description\":\"Bloomberg L.P. 
is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6\",\"name\":\"Bloomberg L.P.\",\"url\":\"https:\/\/www.bloomberg.com\/company\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"The search for Solr analytics | Bloomberg LP","description":"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/","og_locale":"en_US","og_type":"article","og_title":"The search for Solr analytics | Bloomberg LP","og_description":"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr","og_url":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/","og_site_name":"Bloomberg L.P.","article_publisher":"https:\/\/www.facebook.com\/bloomberglp\/","article_published_time":"2017-09-13T14:47:11+00:00","article_modified_time":"2022-03-05T04:11:51+00:00","og_image":[{"width":725,"height":450,"url":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg","type":"image\/jpeg"}],"author":"akelber5","twitter_card":"summary_large_image","twitter_image":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg","twitter_creator":"@bloomberg","twitter_site":"@bloomberg","twitter_misc":{"Written by":"akelber5","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/","url":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/","name":"The search for Solr analytics | Bloomberg LP","isPartOf":{"@id":"https:\/\/www.bloomberg.com\/company\/#website"},"datePublished":"2017-09-13T14:47:11+00:00","dateModified":"2022-03-05T04:11:51+00:00","author":{"@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6"},"description":"Bloomberg engineers lead open source effort to extend analytics to sharded collections in Solr","breadcrumb":{"@id":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.bloomberg.com\/company\/stories\/the-search-for-solr-analytics\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":"1","name":"Home","item":"https:\/\/www.bloomberg.com\/company\/"},{"@type":"ListItem","position":"2","name":"The search for Solr analytics"}]},{"@type":"WebSite","@id":"https:\/\/www.bloomberg.com\/company\/#website","url":"https:\/\/www.bloomberg.com\/company\/","name":"Bloomberg L.P.","description":"Bloomberg L.P. 
is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6","name":"Bloomberg L.P.","url":"https:\/\/www.bloomberg.com\/company"}]}},"featured_image_rendered":"<img srcset='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg 725w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg 170w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg 140w' src='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2017\/09\/solr-image.jpg' alt='' \/>","category_info":{"name":"Tech At Bloomberg","blog_landing_name":"Tech At 
Bloomberg"},"_links":{"self":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/users\/313"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/comments?post=20649"}],"version-history":[{"count":2,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20649\/revisions"}],"predecessor-version":[{"id":38498,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20649\/revisions\/38498"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media\/19636"}],"wp:attachment":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media?parent=20649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/categories?post=20649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/tags?post=20649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}