Starting-up within Microsoft: Bing BigIndex System (Maguro)

You might be thinking, what does “Starting-up within Microsoft” mean?

Though it is not very common, at Microsoft we occasionally incubate products or technologies from the ground up. Typically, any group at Microsoft has a short-term, mid-term, and long-term focus. Incubation of new technologies can serve any of these focuses and needs, but it is mostly considered a long-term bet.

Maguro ~ The Big Tuna

Source: channel.nationalgeographic.com

Maguro means tuna in Japanese. But why did we incubate Maguro? We were not Google: our index was only a fraction of what Google had in 2010, and we did not have as big a budget as Google.

We realized that one way to stay alive was to increase our index size, so our goal was to scale the index from the low tens to the high tens of billions of documents, with technology that could support up to 1 trillion documents.

Using the then-current architecture was not financially feasible: it would have been too expensive, and it would have hit a performance bottleneck at that scale. Hence we incubated Maguro, a system for efficiently searching very large collections of text content of up to 1 trillion documents at low cost.

PERSPECTIVE: CAREER AT STARTUP VS LARGE TECH CORPORATION (PART 2)

In the first part of this two-part piece, Henry Tan talked about his experience working in a large corporation and an early-stage venture. This time, he discusses careers at mid- to late-stage startups.

Late Stage Startup

After about 8 years navigating inside a large tech company, and presented with an opportunity to work for a mid- to late-stage startup, I considered a position at a Big Data for marketing company, BlueKai.

Silicon Valley startups can offer a very generous package, exceeding the packages offered by typical large tech companies. Talent acquisition is one of the core strategies of many Silicon Valley startups. Only by offering a very competitive package can they win this talent acquisition war against the large tech companies with deep pockets.

Perspective: Startup vs Large Tech Corporation (part I)

PERSPECTIVE: Career @ Startup VS large Tech Corporation (Part 1)

Why do some people prefer joining a startup over a large corporation, and vice versa? What are the important factors to consider?

What is it like working for an early-stage startup, a late-stage startup, and a large tech company?

This article aims to share tidbits and perspectives on working in each of those three environments, based on my own experience.

TOKOPEDIA – 5th TECH A BREAK

We are very excited to invite you to come and join the 5th Tech a Break @tokopedia. Tech enthusiasts and practitioners will gather in this warm knowledge-sharing meetup with awesome guest speakers from a variety of tech backgrounds.

Speaker’s Profile:

1. Gautam Chakravarthy & Yonathan Sebhastian – Test Engineers at Tokopedia
Topic: Dexter 1.0 – QA Automation Tool
Bio: A seasoned software test engineer who led testing and product development teams towards successful product releases at Zynga and YmediaLabs, Gautam currently leads the QA team at Tokopedia with an ambition to make it “the best test engineering team”. On a personal front, one of his hobbies is nature and wildlife photography.
Bio: As a Tokopedia test engineer, Yonathan works on crafting a quality test automation system that produces results and catches bugs. Aside from automating, he also likes tinkering with web CMSs and reading. Currently he is reading The Personal MBA by Josh Kaufman.

2. Henry Tan Setiawan – Principal Software Design Engineer at Microsoft Research
Topic: Starting-up within Microsoft: Bing BigIndex system (Maguro)
Bio: Henry focuses on research and development of Deep Learning infrastructure and services (a.k.a. Project Adam). He previously led the data classification tech team at BlueKai, a leading AdTech Big Data company, prior to its acquisition by Oracle in 2013. He holds a PhD in Computer Science from the University of Technology, Sydney. He joined Microsoft in Redmond, USA, in 2006 and helped develop the Bing BigIndex search backend infrastructure and services (a.k.a. Maguro), and contributed to the R&D of other large-scale distributed services at Microsoft including, but not limited to, Azure Machine Learning, Azure Cloud Storage, and Messenger Server backend services.

Kindly RSVP by Sunday, September 13th, 2015

We’re looking forward to seeing you at Tech a Break @tokopedia!

Henry Tan Setiawan on Machine Learning @ UPH

FaST Science-Tech Colloquium: Machine Learning in Daily Life and Career as a Software Developer in the USA, by Henry Tan Setiawan from Microsoft Research
Dr. Henri P. Uranus, colloquium coordinator, gave a certificate to Dr. Henry Tan Setiawan
The Science-Tech Colloquium is a monthly event of the UPH Faculty of Science and Technology (FaST) where lecturers and students from across departments gather to discuss scientific topics in a serious but fun way. The colloquium has been running since 2011, started by the departments in the science and technology group. According to Dr. Henri P. Uranus, the colloquium coordinator, the event is expected to be a medium for sharing information and learning about each colleague’s research, so that it can lead to cross-department research partnerships and nourish the scientific culture among FaST academics.

The event, hosted in turn by the 6 departments in FaST, is open to lecturers, students, and the public. The topic discussed is usually a lecturer's research results. On Friday, September 11, 2015, at Building B UPH, the colloquium was hosted once again, and this time the Industrial Engineering Department invited a guest speaker, Dr. Henry Tan Setiawan, Principal RSDE at Microsoft Research. The topic discussed was Machine Learning in daily life and a career as a software developer in the USA. The event was attended by approximately 100 participants, consisting of students and lecturers, and was moderated by Dr. Jessica Hanafi, an Industrial Engineering lecturer.

Microsoft Researcher Watches Indonesia's 2014 Presidential Election Vote Count with a Crawler

JAKARTA, KOMPAS.com – An Indonesian citizen who works as a technology researcher at Microsoft headquarters in Redmond, Washington, USA, is taking part in overseeing the vote counting for the 2014 presidential election by creating an independent web site, www.pilpres2014.org.

The web site was created by Henry Tan Setiawan, taking advantage of the openly available scanned DA1 forms on the General Election Commission (KPU) web site. He worked on the site alone, unlike KawalPemilu.org, for example, which is managed by at least 600 volunteers through a crowdsourcing platform.

As of Sunday (20.07.2014) at 14:00, the count on www.pilpres2014.org showed the pair Joko Widodo-Jusuf Kalla ahead with 52.15 percent, while the pair Prabowo-Hatta Rajasa had 47.85 percent.

Henry's software is somewhat special because the vote count is computed automatically from the data on the Commission web site and updated every 2 hours. The site also provides data visualizations and a history of the uploaded data.
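As a rough illustration of the aggregation step described above, the sketch below sums per-district tallies for two candidate pairs and reports each pair's share of the total. It is not the code behind Pilpres2014.org: the `DistrictTally` type, the `vote_shares` function, and the numbers in the usage example are all hypothetical, and fetching and parsing the KPU data is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class DistrictTally:
    """Hypothetical record: votes for two candidate pairs in one district."""
    district: str
    votes_a: int
    votes_b: int


def vote_shares(tallies: Iterable[DistrictTally]) -> Tuple[float, float]:
    """Return each pair's share of all counted votes, as percentages."""
    tallies = list(tallies)
    total_a = sum(t.votes_a for t in tallies)
    total_b = sum(t.votes_b for t in tallies)
    total = total_a + total_b
    if total == 0:
        # Nothing counted yet, so avoid dividing by zero.
        return (0.0, 0.0)
    return (round(100 * total_a / total, 2), round(100 * total_b / total, 2))


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sample = [
        DistrictTally("District X", 600, 400),
        DistrictTally("District Y", 400, 600),
    ]
    print(vote_shares(sample))
```

A periodic job (every 2 hours, in the article's description) could rerun such an aggregation on freshly downloaded tallies and store each result to build up the history.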

Very Proud: The Spirit of Technology Collaboration in Indonesia's 2014 Presidential Election

KOMPAS.com – The participation of technology practitioners in the 2014 presidential election has added a new color to the democratic space.

We are very proud of those who selflessly took an active part in the count by making use of the open data from the General Election Commission (KPU), so that counting the votes is not the monopoly of government institutions.

Before the Election Commission announced the votes won by the presidential and vice presidential candidates, several web sites were already showing vote count results, among others Pilpres2014.org, KawalPemilu.org, Data-Pilpres.umm.ac.id (Institute of Information and Communications, University Malang), and Kawal-Suara.appspot.com.

Based on KompasTekno's observation, the vote counts produced by the web sites above came close to the 33-province recapitulation released by the Commission, with a maximum difference of 0.37 percent.

Meanwhile, the vote counts produced by KawalPemilu.org and Pilpres2014.org exactly matched the Commission's final recapitulation.

According to the final data released by the Commission on Tuesday (07/22/2014), the pair Joko Widodo and Jusuf Kalla won the most votes with 53.15 percent, while the pair Prabowo and Hatta Rajasa received 46.85 percent.

The sophistication of the vote counting systems on the sites above cannot be separated from the hard work of their creators in writing the program code and the role of the volunteers involved in managing them.

Defending Indonesia’s democracy with Technology

Tomorrow is a monumental day in Indonesia, when the General Election Commission (KPU) will announce who the next president of Indonesia is, based on the official voting tally. In the two weeks since the vote took place, both candidates have declared that they won based on different quick count results, and neither of them is backing down from his claim today. Because of this, many people in the country have turned to tech, creating initiatives such as online crowdsourced vote counts that aim to make the contested count more transparent. The most “open source” initiative of them all is Pilpres2014.org.

As with the other vote counting sites that have popped up since the July 9 general election, Pilpres2014 lets you see the counting results based on the vote tally documents released on KPU’s website. Furthermore, visitors can also see data visualizations based on the tallies, like bubble graphs and deep bar hierarchies (which I personally love; see the video below). The data is updated every two hours.
