About Me

SHORT-BIO

Henry Tan Setiawan is currently a Principal Software Design Engineer at Microsoft Research, where he focuses on research and development of Deep Learning infrastructure and services (a.k.a. Project Adam). He previously led the data classification tech team at BlueKai, a leading AdTech big data company, before it was acquired by Oracle in early 2014. He holds a PhD in Computer Science from the University of Technology, Sydney. He joined Microsoft in Redmond, USA, in 2006, where he helped develop the Bing BigIndex search backend infrastructure and services (a.k.a. Maguro) and contributed to the R&D of other large-scale distributed services at Microsoft including, but not limited to, Azure Machine Learning, Azure Cloud Storage, and Messenger Server backend services.

Microsoft Research, Redmond

Principal Software Design Engineer, summer 2014 – current

Acting as a tech lead to develop distributed computing infrastructure for training large-scale neural networks.

Recent industry highlights of our work:

#whatis, #demo, #osdi-presentation, #osdi-pdf

Tech Activist during the 2014 Indonesian Presidential Election

Tech Activist, July 2014

Volunteered to help monitor the vote counting process during the 2014 Indonesian General Election. Developed an automated crawler that constantly monitors the server of KPU, the General Election Commission. Since KPU does not officially provide any kind of data API, the crawler has to reverse-engineer the HTML5 pages to extract the vote counts for each province and district. The crawler code was committed to GitHub and can be accessed from https://github.com/ht4n/Pilpres2014. With this pipeline, all the raw data is periodically fetched, stored, published, and shared with the public immediately, promoting fair practice and transparency.
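The page-scraping step described above can be illustrated with a minimal sketch. It is not the code from the Pilpres2014 repository: the table layout, the dot-separated number format, the sample data, and the names VoteTableParser and parse_vote_counts are all assumptions made for illustration. A real crawler would also fetch the page over HTTP on a schedule and store each snapshot, which is omitted here.

```python
from html.parser import HTMLParser


class VoteTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_vote_counts(html):
    """Return {region: (candidate_1_votes, candidate_2_votes)}."""
    parser = VoteTableParser()
    parser.feed(html)
    parser.close()
    counts = {}
    for region, first, second in parser.rows:
        # Assumed number format: dots as thousands separators.
        counts[region] = (int(first.replace(".", "")),
                          int(second.replace(".", "")))
    return counts


# Hypothetical page content; the real KPU pages are structured differently.
SAMPLE_PAGE = """
<table>
  <tr><th>Region</th><th>Candidate 1</th><th>Candidate 2</th></tr>
  <tr><td>Region A</td><td>1.200</td><td>1.350</td></tr>
  <tr><td>Region B</td><td>980</td><td>875</td></tr>
</table>
"""

counts = parse_vote_counts(SAMPLE_PAGE)
# Sum each candidate's column across all regions.
totals = tuple(sum(column) for column in zip(*counts.values()))
print(counts)
print(totals)
```

Parsing into a plain dictionary keeps each snapshot easy to serialize and compare against the previous fetch, which is one plausible way to publish the raw counts as they change.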

Pilpres2014 website: link.

News coverage: #techinasia, #kompas1, #kompas2

BlueKai – BigData for Marketing

BlueKai is the industry’s leading cloud-based big data platform that enables companies to personalize online, offline and mobile marketing campaigns with richer and more actionable information about targeted audiences.

Senior Software Engineer, summer 2013 – summer 2014

Led the Taxonomy Technology project, scaling the existing classification technology by developing a Classification Management System and building a new classification stack on top of search engine technology (text mining, machine learning).

Microsoft Redmond, USA

Senior Software Design Engineer, 2006 – summer 2013

Windows Azure Storage (XStore), early 2013 – late summer 2013

Member of the technical team maintaining and optimizing the index management area, which covers traffic load balancing, index maintenance and distribution, and monitoring.

Served on the team's C++ committee, which promotes best practices in the use of C++. The committee actively monitors code quality through code reviews, checks compliance with the team guidelines, and follows industry trends (e.g. adoption of C++11).

Azure ML Incubation, July 2012 – early 2013

Member of a small team incubating a machine learning data pipeline inside the SQL organization; the project was later released as AzureML.
Wrote the AzureML execution flow editor web control. The control was developed using modern web technologies: HTML5, jQuery, KnockoutJS, and SVG.

Core Index Platform @ Bing.com, Oct 2008 – July 2012

Technical lead on a long-range incubation project, codenamed ‘Maguro’, under the supervision of Knut Magne Risvik and Trishul Chilimbi (MSR). The project rewrote the search engine index serving stack to improve tail query relevance by increasing the number of documents, and to significantly reduce cost by expanding serving capacity by 10-20x. Worked on many core pieces of the technology stack (Matching Engine, Network Subsystem, and Query Execution Engine) and drove many other critical areas including, but not limited to, early architectural and implementation design, E2E system integration, performance optimization, experimentation, monitoring, and UI/tools development. The technology has been successfully integrated into the Bing index serving stack and is serving live traffic.

Knut Risvik, Trishul Chilimbi, Henry Tan, Chris Anderson, Karthik Kalyanaraman, “Maguro, a System for Indexing and Searching over Very Large Text Collections”, in ACM International Conference on Web Search and Data Mining ’13 (WSDM), Feb. 2013.

Contributed to the development and improvement of a modern communication framework used internally at Bing.com. The library is now open source under the name Bond (comparable to Facebook's Thrift or Google's Protocol Buffers).

Led a few junior developers in shipping a time-critical, large-scale distributed Index Distribution & Management System with a tight Service Level Agreement. The system shares many similarities with Google’s GFS and manages and coordinates data transfer, regular index updates and maintenance, machine failover, and data partitioning.

Helped modernize Bing query pipeline components. These core components are used by hundreds of services, are executed millions of times every day, and are critical to the relevance quality of the service.

Messenger Server, Jan 2006 – Oct 2008

Led the manageability team in adopting a large-scale service monitoring and deployment infrastructure platform (a.k.a. Autopilot) for live-site management, improving system availability to 99.9%.

Contributed to geo-distributing a large-scale PubSub service to scale it worldwide.

Improved the monitoring automation and alerting system. Developed monitoring gadgets/controls in WPF/Silverlight.

Software Lead Engineer

Smart Car Technologies Pty. Ltd. 2004-2005

Designed and developed the full stack of the patented client software (MFC/OpenGL/C++) used to build a speed limit database in Australia. The software helped secure seed funding from venture capitalists and contracts from Sensis.

EDUCATION

PhD in Comp. Science – Graduated 2008

University of Technology Sydney, NSW, Australia

thesis: link, mirror

Bachelor of Computer System Engineering (Hons) – Graduated 2002

La Trobe University, Bundoora, VIC, Australia

thesis: link

AWARDS & ACHIEVEMENTS

Holder of the 2003 Australian Computer Society (ACS) Student Award
Nominated as the most outstanding Honours Student in Computer Science at La Trobe University (2003)
Nominated as the best 3rd year student in Computer System Engineering at La Trobe University (2002)
Listed on the 2001, 2002, and 2003 Dean’s Honours Lists at the Faculty of Science, Technology and Engineering
Holder of the 2001 Golden Key Honours Society Award

ACTIVITIES

University Talk: Machine Learning in Daily Life – link

Startup Tech Talk: Starting-up within Microsoft – link

ARTICLES

Perspectives: Startup vs Large Tech Corporations? – link

Defending Indonesia Democracy through technology – link

Visualization of High Dimensional data using T-SNE in R – link
Trajectory Motion with SVG/KnockoutJS – link
MVVM with HTML5, SVG, KnockoutJS – link
Speedometer control – link

Book Publications:
Mining of Data with Complex Structures

Series: Studies in Computational Intelligence, Vol. 333, February 2011

Authors: Fedja Hadzic, Henry Tan, Tharam S. Dillon

The primary audience is third- and fourth-year undergraduate students, Masters and PhD students, and academics. The book can be used for both teaching and research. The secondary audience is practitioners in industry, business, commerce, government, and consortiums, alliances and partnerships who want to learn how to introduce techniques for mining data with complex structures into their applications and use them efficiently. The scope of the book is both theoretical and practical, and as such it will reach a broad market within both academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for efficient analysis of knowledge stored in various domains.

Conference/Journal Publications:

0. Risvik, KM, Chilimbi, T, Tan, H, Anderson, C & Kalyanaraman, K 2013, ‘Maguro, a system for indexing and searching over very large text collections’, Proceedings of the 6th International Conference on Web Search and Data Mining (WSDM 2013), Rome, Feb 4-8, 2013.

1. Tan, H, Dillon, TS, Feng, L, Chang, E & Hadzic, F 2005, ‘X3-Miner: Mining patterns from XML database’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 6th International Conference on Data Mining (Data Mining’05), Skiathos, Greece, WIT Press, pp. 287-297.

2. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2005, ‘MB3-Miner: Mining eMBedded subTREEs using tree model guided candidate generation’, Proceedings of the 1st International Workshop on Mining Complex Data (MCD’05), Houston, TX, USA, pp. 103-110.

3. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2006, ‘IMB3-Miner: Mining induced/embedded subtrees by constraining the level of embedding’, In WK Ng, M Kitsuregawa & J Li (eds), Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore, pp. 450-461.

4. Tan, H, Dillon, TS & Hadzic, F 2006, ‘Razor: Distance constrained mining of embedded subtrees’, in Tsumota & Shusaku (eds), Proceedings of the International Conference on Data Mining (ICDM’06), Hong Kong, pp. 8-13.

5. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2007, ‘Tree model guided candidate generation for mining frequent subtrees from XML’, accepted for publication in Transactions on Knowledge Discovery from Data (TKDD).

6. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2007, ‘Mining induced/embedded subtrees using the level of embedding constraint’, submitted to Fundamenta Informaticae.

7. Tan, H, Hadzic, F, Dillon, TS & Chang, E 2008, ‘State of the art of data mining of tree structured information’, Computer System Science and Engineering, vol. 23, no. 4, July 2008 (pending publication).

8. Tan, H, Dillon, TS, Hadzic, F & Chang, E 2006, ‘SEQUEST: Mining frequent subsequences using DMA strips’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 7th International Conference on Data Mining and Information Engineering (Data Mining’06), Prague, Czech Republic, WIT Press, pp. 315-328.

9. Hadzic, F, Dillon, TS, Sidhu, AS, Chang, E & Tan, H 2006, ‘Mining substructures in protein data’, Proceedings of the 6th International Conference on Data Mining Workshop (ICDMW’06) – Invited, Hong Kong, pp. 213-217.

10. Hadzic, F, Tan, H & Dillon, TS 2007, ‘UNI3 – efficient algorithm for mining unordered induced subtrees using TMG candidate generation’, Proceedings of the Computational Intelligence and Data Mining (CIDM’07), Hawaii, USA, pp. 568-575.

11. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2008, ‘U3: Unordered subtree mining using TMG candidate generation and the level of embedding constraint’, (pending publication).

12. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2007, ‘Implications of frequent subtree mining using hybrid support definition’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 8th International Conference on Data Mining & Information Engineering (Data Mining’07), The New Forest, UK, WIT Press, pp. 13-24.

13. Hadzic, F, Dillon, TS & Tan, H 2007, ‘Outlier detection strategy using the self-organizing map’, in X Zhu & I Davidson (eds), Knowledge Discovery and Data Mining: Challenges and Realities, Information Science Reference, Hershey, PA, USA, pp. 224-243.

14. Hadzic, F, Dillon, TS, Tan, H, Feng, L & Chang, E 2007, ‘Mining frequent patterns using self-organizing map’, in D Taniar (ed.), Research and Trends in Data Mining Technologies and Applications: Advances in Data Warehousing and Mining, IGI Global, Hershey, PA, USA, pp. 121-135.

15. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘XML based semantic protein map’, in A Zanasi, NFF Ebecken & CA Brebbia (eds), Proceedings of 5th International Conference on Data Mining, Text Mining and their Business Applications (Data Mining’04), Malaga, Spain, WIT Press, pp. 51-60.

16. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘Comprehensive protein database representation’, in A Gramada & PE Bourne (eds), Proceedings of the 8th International Conference on Research in Computational Biology (RECOMB’04), ACM Press, San Diego, CA, USA, pp. 427-429.

17. Sidhu, AS, Dillon, TS, Sidhu, BS & Setiawan, H 2004, ‘Protein knowledge meta model’, Molecular & Cellular Proteomics, pp. 262-263.