SHORT-BIO
Microsoft Research, Redmond
Principal Software Design Engineer, summer 2014 – current
Acting as a tech lead to develop distributed computing infrastructure for training large-scale neural networks.
Recent industry highlight about what we do:
#whatis, #demo, #osdi-presentation, #osdi-pdf
Tech Activist during Indonesian Presidential Election 2014
Tech Activist, July 2014
Volunteered to help monitor the vote counting process during Indonesian General Election 2014. Developed automated crawler that constantly monitor KPU’s server. KPU stands for General Election Commission. Since KPU does not officially provide any kinds of Data API, the crawler has to reverse-engineer the html5 page to get the vote counting numbers for each province and district. The crawler code was committed to github and can be accessed from https://github.com/ht4n/Pilpres2014. With this pipeline, all the raw data is periodically fetched, stored, published and shared immediately to public hence promote fair practice and transparency.
Pilpres2014 website: link.
News coverage: #techinasia, #kompas1, #kompas2
#Indonesian-presidential-election
BlueKai – BigData for Marketing
BlueKai is the industry’s leading cloud-based big data platform that enables companies to personalize online, offline and mobile marketing campaigns with richer and more actionable information about targeted audiences.
Senior Software Engineer, Summer 2013 – summer 2014
Leading the Taxonomy Technology project scaling the current classification technology with development of Classification Management System and building a new classification stack on top of Search Engine technology (Text Mining, Machine Learning).
Microsoft Redmond, USA
Senior Software Design Engineer, 2006 – summer 2013
Windows Azure Storage (XStore), early 2013 – Late Summer 2013
- A member of technical team maintaining and optimizing the index management area. This area deals with traffic load balancing, index maintenance and distribution and monitoring.
- In the team C++ committee; ensuring good and best practice of using and applying C++. The committee actively watching code quality from Code Review and the compliance to the guideline; watching and following the trend in industry (e.g. adoption of C++11).
Azure ML Incubation, 2012 July – early 2013
- A member of small team incubating Machine Learning data pipeline inside SQL organization, the project was later released as AzureML.
- Wrote AzureML web control execution flow editor. The control is developed using modern web stack technologies: HTML5/jQuery/KnockoutJS/SVG.
Core Index Platform @ Bing.com, 2008 Oct – 2012 July
- A technical lead in a long range incubation project under the supervision of Knut Magne Risvik and Trishul Chilimbi (MSR), codenamed ‘Maguro’, rewriting the search engine index serving stack for improving tail query relevance by increasing the number of documents and significant reduction in cost by expanding serving capacity by 10-20x. Working on many core pieces of the technology stack (Matching Engine, Network Subsystem, and Query Execution Engine) and driving many other critical subsystems including, but not limited to, Early Architectural and Implementation Design, E2E System Integration, Perf optimization, Experimentation, Monitoring & UI/Tools development. The technology has been successfully integrated to Bing Index Serving technology stack and serving live traffics.
Knut Risvik, Trishul Chilimbi, Henry Tan, Chris Anderson, Karthik Kalyanaraman, “Maguro, a System for Indexing and Searching over Very Large Text Collections”in ACM International Conference on Web Search and Data Mining ’13 (WSDM), Feb. 2013.
- Contribution to the improvement and development of internal modern communication framework used internally in Bing.com. The library is now open-sourced under the name BOND (comparable to Facebook thrift or Google proto-buffer).
- Leading few junior developers on shipping a time-critical and tight Service Level Agreement large-scale distributed Index Distribution & Management System, that shares many similarities with underlying Google’s GFS technology. The system manages and coordinates data transfer, regular index updates and maintenance, machine fail over, and data partitioning.
- Help modernizing Bing query pipeline components. This core component is used by hundreds of services and executed millions times every day and critical for the relevance quality of the service.
Messenger Server, 2006 Jan – 2008 Oct
- Leading the manageability team adopting a large-scale service monitoring & deployment infrastructure platform for live-site management that improves system availability to 99.9% (a.k.a Autopilot).
- A team player in geo-distributing large-scale PubSub service to scale up the service world-wide.
- Improvement in monitoring automation & alerting system. Developed monitoring gadgets/controls in WPF/Silverlight.
Software Lead Engineer
Smart Car Technologies Pty. Ltd. 2004-2005
Developed and design a full stack of the patented client software (MFC/OpenGL/C++) to build speed limit database in Australia. The software helped securing the seed fund from Venture Capitalist contracts from Sensis.
EDUCATION
PhD in Comp. Science – Graduated 2008
University of Technology Sydney, NSW, Australia
Bachelor of Computer System Engineering (Hons) – Graduated 2002
La Trobe University, Bundoora, VIC, Australia
thesis: link
AWARDS & ACHIEVEMENTS
- Holder of 2003 Australian Computer Society (ACS) Student Award
- Nominated the most outstanding Honours Student in Computer Science at La Trobe University (2003)
- Nominated the best 3rd year student in Computer System Engineering at La Trobe University (2002)
- Listed in 2001, 2002, 2003 Dean’s Honours List at Faculty of Science, Technology and Engineering
- The Holder of 2001 Golden Key Honours Society Award
ACTIVITIES
- University Talk: Machine Learning in Daily Life – link
- Startup Tech Talk: Starting-up within Microsoft – link
ARTICLES
- Perspectives: Startup vs Large Tech Corporations? – link
- Defending Indonesia Democracy through technology – link
- Visualization of High Dimensional data using T-SNE in R – link
- Trajectory Motion with SVG/KnockoutJS – link
- MVVM with HTML5, SVG, KnockoutJS – link
- Speedometer control – link
Book Publications:
Mining of Data with Complex Structures
Series: Studies in Computational Intelligence, Vol. 333 February 2011
Authors: Fedja Hadzic, Henry Tan, Tharam S. Dillon
The primary audience is 3rd year, 4th year undergraduate students, Masters and PhD students and academics. The book can be used for both teaching and research. The secondary audiences are practitioners in industry, business, commerce, government and consortiums, alliances and partnerships to learn how to introduce and efficiently make use of the techniques for mining of data with complex structures into their applications. The scope of the book is both theoretical and practical and as such it will reach a broad market both within academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for efficient analysis of knowledge stored in various domains.
Conference/Journal Publications:
0. Risvik, KM, Chilimbi, T, Tan, H, Anderson, C, and Kalyanaraman, K. ‘Maguro, a system for indexing and searching over very large text collections’, Proceeding of the 6th International Conference on Web Search and Data Mining (WSDM 2013), Rome Feb 4-8, 2013.
1. Tan, H, Dillon, TS, Feng, L, Chang, E & Hadzic, F 2005, ‘X3-Miner: Mining patterns from XML database’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 6th International Conference on Data Mining (Data Mining’05), Skiathos, Greece, WIT Press, pp. 287-297.
2. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2005, ‘MB3-Miner: Mining eMBedded subTREEs using tree model guided candidate generation’, Proceedings of the 1st International Workshop on Mining Complex Data (MCD’05), Houston, TX, USA, pp. 103-110.
3. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2006, ‘IMB3-Miner: Mining induced/embedded subtrees by constraining the level of embedding’, In WK Ng, M Kitsuregawa & J Li (eds), Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore, pp. 450-461.
4. Tan, H, Dillon, TS & Hadzic, F 2006, ‘Razor: Distance constrained mining of embedded subtrees’, in Tsumota & Shusaku (eds), Proceedings of the International Conference on Data Mining (ICDM’06), Hongkong, pp. 8-13.
5. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2007, ‘Tree model guided candidate generation for mining frequent subtrees from XML’, accepted for publication in Transactions on Knowledge Discovery from Data (TKDD).
6. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2007, ‘Mining induced /embedded subtrees using the level of embedding constraint’, submitted to Fundamenta Informaticae.
7. Tan, H, Hadzic, F, Dillon, TS & Chang, E 2008, ‘State of the art of data mining of tree structured information’, Computer System Science and Engineering, vol. 23, no. 4, July 2008 (pending publication).
8. Tan, H, Dillon, TS, Hadzic, F & Chang, E 2006, ‘SEQUEST: Mining frequent subsequences using DMA strips’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 7th International Conference on Data Mining and Information Engineering (Data Mining’06), Prague, Czech Republic, WIT Press, pp. 315-328.
9. Hadzic, F, Dillon, TS, Sidhu, AS, Chang, E & Tan, H 2006, ‘Mining substructures in protein data’, Proceedings of the 6th International Conference on Data Mining Workshop (ICDMW’06) – Invited, Hong Kong, pp. 213-217.
10. Hadzic, F, Tan, H & Dillon, TS 2007, ‘UNI3 – efficient algorithm for mining unordered induced subtrees using TMG candidate generation’, Proceedings of the Computational Intelligence and Data Mining (CIDM’07), Hawaii, USA, pp. 568-575.
11. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2008, ‘U3: Unordered subtree mining using TMG candidate generation and the level of embedding constraint’, (pending publication).
12. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2007, ‘Implications of frequent subtree mining using hybrid support definition’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 8th International Conference on Data Mining & Information Engineering (Data Mining’07), The New Forest, UK, WIT Press, pp. 13-24.
13. Hadzic, F, Dillon, TS & Tan, H 2007, ‘Outlier detection strategy using the self-organizing map’, in X Zhu & I Davidson (eds), Knowledge Discovery and Data Mining: Challenges and Realities, Information Science Reference, Hershey, PA, USA, pp. 224-243.
14. Hadzic, F, Dillon, TS, Tan, H, Feng, L & Chang, E 2007, ‘Mining frequent patterns using self-organizing map’, in D Taniar (ed.), Research and Trends in Data Mining Technologies and Applications: Advances in Data Warehousing and Mining, IGI Global, Hershey, PA, USA, pp. 121-135.
15. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘XML based semantic protein map’, in A Zanasi, NFF Ebecken & CA Brebbia (eds), Proceedings of 5th International Conference on Data Mining, Text Mining and their Business Applications (Data Mining’04), Malaga, Spain, WIT Press, pp. 51-60.
16. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘Comprehensive protein database representation’, in A Gramada & PE Bourne (eds), Proceedings of the 8th International Conference on Research in Computational Biology (RECOMB’04), ACM Press, San Diego, CA, USA, pp. 427-429.
17. Sidhu, AS, Dillon, TS, Sidhu, BS & Setiawan, H 2004, ‘Protein knowledge meta model’, Molecular & Cellular Proteomics, pp. 262-263.