{"id":20602,"date":"2018-02-06T10:29:06","date_gmt":"2018-02-06T15:29:06","guid":{"rendered":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/"},"modified":"2022-03-04T23:13:30","modified_gmt":"2022-03-05T04:13:30","slug":"name-translation-wikipedia-work-chen-tse-tsai","status":"publish","type":"post","link":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/","title":{"rendered":"Name Translation and Wikipedia: The Work of Chen-Tse Tsai"},"content":{"rendered":"<div class='bbg-row bbg-bg--white  bbg-row--margin-top-none bbg-row--margin-bottom-none' data-anchor='row-6a08e4ad9a037'>\n  \n\t\n\t\n\t<div class=\"bbg-row--content\">\n\t\t\n\t\t\t<div class='bbg-column bbg-column--width-8 bbg-column--offset-2'>\n\t<p><figure class=\"image-figure\" data-animation=\"\">\n    <img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"735\" src=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg\" class=\"attachment-large size-large image-figure__image image-figure__image--primary\" alt=\"\" srcset=\"https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 170w, 
https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 140w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&amp;type=webp&amp;url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 1672w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\n    \n<\/figure>\n<div class='bb-wysiwyg'>\n    \n    <p>Chen-Tse Tsai, a research scientist in the artificial intelligence group at Bloomberg, has been working on a problem that seems simple enough: Given a name in a text, find the correct Wikipedia page.<\/p>\n<p>But consider that many names appear on dozens of Wikipedia pages. The word \u201cChicago,\u201d for example, could refer to the city, the University of Chicago, or even the band \u2013 and that\u2019s just for starters. Plus, what if the name is not in English? What if it is usually transliterated, so that the phonetic sound, rather than any literal meaning, is preserved in the new language?<\/p>\n<p>Name translation is an especially helpful step in general translation, says Tsai. If the names in the text can be identified, it\u2019s often possible to get an idea of what the text is about. \u201cSome facts are only stated in foreign-language texts. By grounding them to the English-language Wikipedia, we can get more information,\u201d says Tsai. \u201cIt\u2019s a pretty huge problem.\u201d The English-language Wikipedia, says Tsai, is the most useful Wikipedia for this sort of research, simply because it\u2019s the largest.<\/p>\n<p>Tsai says a three-step process can be used to untangle cross-lingual wikification. The first step is to successfully identify named entities (people, locations, organizations, etc.) in a foreign text. That might sound simple, but, he says, \u201cThis is still a very challenging problem.\u201d<\/p>\n<p>The second step is to identify possible English Wikipedia pages for each foreign-language name. 
This was the subject of Tsai\u2019s work. His goal was to pick the 30 most likely Wikipedia candidates for any name, hoping that the best match would be among them.<\/p>\n<p>The third step is a ranking problem, which Tsai did not attempt in this particular research.<\/p>\n<p>In his research, Tsai devised a new way to surface likely Wikipedia matches. He showed that his model outperformed six others, sometimes by impressive margins. Tsai presented <a href=\"https:\/\/cogcomp.org\/papers\/TsaiRo18.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">his work<\/a> on Tuesday, February 6, 2018 at the <a href=\"https:\/\/aaai.org\/Conferences\/AAAI-18\/aaai-18-technical-program\/\" target=\"_blank\" rel=\"noopener noreferrer\">Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)<\/a>, a conference in New Orleans sponsored by the Association for the Advancement of Artificial Intelligence (AAAI).<\/p>\n<p>To identify possible Wikipedia pages for a given foreign-language name, Tsai says most researchers have tried to use a dictionary. If a name exists in multiple languages, Wikipedia will often link between them. This doesn\u2019t work as well as one might suspect, though: Tsai points out that for Spanish-language mentions, this method yields the correct English name in about 40 percent of cases. The smaller the non-English Wikipedia, the harder it is to consistently find matches.<\/p>\n<p>Tsai\u2019s methodology is more advanced. Instead of simply looking up a name, his model tries to generalize from the entire dictionary. It looks at all the title pairs joined by inter-language links, and then tries to learn how to translate them. In other words, the links themselves are used as training data. 
In Spanish, Tsai says, there are about 10,000 title pairs that he used to train the model, while there are only about 1,000 in Tagalog.<\/p>\n<p>Tsai\u2019s model also pays special attention to the order of the words in a name, which can vary between different languages. While a transliterated word may be most likely to show up as the third word in a foreign phrase, for instance, it might more commonly be found as the first word in English. The key idea in the proposed model is to consider word alignment and word transliteration jointly. Tsai says a better understanding of word order helps the model better handle transliterations, and vice versa.<\/p>\n<p>Most research of this type focuses on the biggest languages, says Tsai \u2013 English, Spanish, and Chinese. But a more generalized model, such as his, can be extended to other languages with less Wikipedia coverage. Notably, Tsai\u2019s method achieves impressive candidate generation coverage with Tagalog (73 percent); Italian (66 percent); and Bengali (65 percent). (Arabic and Hebrew proved more difficult, at 37 percent and 46 percent, respectively.)<\/p>\n<p>Tsai said his research relates to the work he does at Bloomberg in the field of information extraction and disambiguation. He has worked on several multilingual and cross-lingual problems. It would be ideal to create something specific to each language, says Tsai, but that can be difficult and slow, and each tool is of limited utility. \u201cThere is a need to cover more languages using the existing data we have,\u201d he says. 
\u201cWe want this model to cover as many as possible.\u201d<\/p>\n\n<\/div>\n\n<\/div>\n\n\n\t\t\n\t<\/div>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Untangling cross-lingual wikification using NLP<\/p>\n","protected":false},"author":313,"featured_media":19554,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1466],"tags":[1498,1578,1472,1486,1580],"class_list":["post-20602","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-at-bloomberg","tag-ai","tag-artificial-intelligence","tag-data-science","tag-natural-language-processing","tag-nlp"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.11 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP<\/title>\n<meta name=\"description\" content=\"Chen-Tse Tsai, a research scientist in Bloomberg&#039;s artificial intelligence group, has been working on untangling cross-lingual wikification.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP\" \/>\n<meta property=\"og:description\" content=\"Chen-Tse Tsai, a research scientist in Bloomberg&#039;s artificial intelligence group, has been working on untangling cross-lingual wikification.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/\" \/>\n<meta property=\"og:site_name\" content=\"Bloomberg L.P.\" \/>\n<meta 
property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/bloomberglp\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-02-06T15:29:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-05T04:13:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1672\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"akelber5\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:site\" content=\"@bloomberg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"akelber5\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/\",\"url\":\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/\",\"name\":\"Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP\",\"isPartOf\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\"},\"datePublished\":\"2018-02-06T15:29:06+00:00\",\"dateModified\":\"2022-03-05T04:13:30+00:00\",\"author\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6\"},\"description\":\"Chen-Tse Tsai, a research scientist in Bloomberg's artificial intelligence group, has been working on untangling cross-lingual wikification.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":\"1\",\"name\":\"Home\",\"item\":\"https:\/\/www.bloomberg.com\/company\/\"},{\"@type\":\"ListItem\",\"position\":\"2\",\"name\":\"Name Translation and Wikipedia: The Work of Chen-Tse Tsai\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#website\",\"url\":\"https:\/\/www.bloomberg.com\/company\/\",\"name\":\"Bloomberg L.P.\",\"description\":\"Bloomberg L.P. 
is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6\",\"name\":\"Bloomberg L.P.\",\"url\":\"https:\/\/www.bloomberg.com\/company\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP","description":"Chen-Tse Tsai, a research scientist in Bloomberg's artificial intelligence group, has been working on untangling cross-lingual wikification.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/","og_locale":"en_US","og_type":"article","og_title":"Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP","og_description":"Chen-Tse Tsai, a research scientist in Bloomberg's artificial intelligence group, has been working on untangling cross-lingual wikification.","og_url":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/","og_site_name":"Bloomberg 
L.P.","article_publisher":"https:\/\/www.facebook.com\/bloomberglp\/","article_published_time":"2018-02-06T15:29:06+00:00","article_modified_time":"2022-03-05T04:13:30+00:00","og_image":[{"width":1672,"height":1200,"url":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg","type":"image\/jpeg"}],"author":"akelber5","twitter_card":"summary_large_image","twitter_image":"https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg","twitter_creator":"@bloomberg","twitter_site":"@bloomberg","twitter_misc":{"Written by":"akelber5","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/","url":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/","name":"Name Translation and Wikipedia: The Work of Chen-Tse Tsai | Bloomberg LP","isPartOf":{"@id":"https:\/\/www.bloomberg.com\/company\/#website"},"datePublished":"2018-02-06T15:29:06+00:00","dateModified":"2022-03-05T04:13:30+00:00","author":{"@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6"},"description":"Chen-Tse Tsai, a research scientist in Bloomberg's artificial intelligence group, has been working on untangling cross-lingual wikification.","breadcrumb":{"@id":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.bloomberg.com\/company\/stories\/name-translation-wikipedia-work-chen-tse-tsai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":"1","name":"Home","item":"https:\/\/www.bloomberg.com\/company\/"},{"@type":"ListItem","position":"2","name":"Name 
Translation and Wikipedia: The Work of Chen-Tse Tsai"}]},{"@type":"WebSite","@id":"https:\/\/www.bloomberg.com\/company\/#website","url":"https:\/\/www.bloomberg.com\/company\/","name":"Bloomberg L.P.","description":"Bloomberg L.P. is the leader in global business and financial information, enabling customers to make smarter, faster, more informed business decisions.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.bloomberg.com\/company\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.bloomberg.com\/company\/#\/schema\/person\/09c9e1a38b7f345ce5c0b4bbde1656a6","name":"Bloomberg L.P.","url":"https:\/\/www.bloomberg.com\/company"}]}},"featured_image_rendered":"<img srcset='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 1672w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 300w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 768w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 1024w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 170w, https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg 140w' src='https:\/\/assets.bbhub.io\/image\/v1\/resize?width=auto&type=webp&url=https:\/\/assets.bbhub.io\/company\/sites\/51\/2018\/02\/IMG_1907-edited.jpg' alt='' \/>","category_info":{"name":"Tech At Bloomberg","blog_landing_name":"Tech At 
Bloomberg"},"_links":{"self":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/users\/313"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/comments?post=20602"}],"version-history":[{"count":1,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20602\/revisions"}],"predecessor-version":[{"id":21849,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/posts\/20602\/revisions\/21849"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media\/19554"}],"wp:attachment":[{"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/media?parent=20602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/categories?post=20602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bloomberg.com\/company\/wp-json\/wp\/v2\/tags?post=20602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}