Mining and analyzing text helps organizations find valuable business insights in corporate data. Too often, however, common text mining techniques prove ineffective in real-world corporate settings, particularly when dealing with short snippets, boilerplate text collected in forms, and repeated text produced by cut-and-paste authoring. Choosing the appropriate combination of supervised (e.g., classification) and unsupervised (e.g., topic modeling or clustering) machine learning techniques depends on the end goal and the nature of the data (its content, size, sparsity, etc.). Even after deciding on a clustering approach, there are still various algorithms to consider, including the well-known K-means and hierarchical agglomerative clustering. In this presentation, I will illustrate a case study and explain how we tackled these challenges.
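To make the clustering option concrete, here is a minimal sketch of hierarchical agglomerative clustering applied to short snippets. It is not the method used in the case study. The Jaccard distance on word sets, the average-linkage rule, the 0.6 stopping threshold, the function names, and the sample snippets are all assumptions invented for illustration.

```python
from itertools import combinations


def tokens(text):
    """Lowercase a snippet and return its set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(snippets, threshold=0.6):
    """Average-linkage agglomerative clustering over snippet indices.

    Starts with one cluster per snippet and repeatedly merges the two
    closest clusters. Stops when the closest pair is farther apart than
    `threshold` (a hypothetical cutoff chosen for this example).
    """
    sets = [tokens(s) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j is guaranteed by combinations).
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Invented sample data: two near-duplicate pairs plus a boilerplate entry.
snippets = [
    "customer complained about unauthorized trade in account",
    "customer complained about unauthorized trade in retirement account",
    "branch office failed to file report on time",
    "branch office failed to file quarterly report on time",
    "n/a",
]

result = agglomerative(snippets)
print(result)
```

With this sample, the near-duplicate pairs end up grouped together and the boilerplate entry stays in a cluster of its own. Unlike K-means, this approach does not require fixing the number of clusters in advance, but the distance threshold still has to be tuned to the data.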
Yvonne Li joined The New York Stock Exchange (NYSE) Member Regulation Technology as a data architect in 2005 and later became a director at FINRA, the successor to The National Association of Securities Dealers (NASD) and to the member regulation, enforcement, and arbitration functions of NYSE. For the past 7 years, Yvonne has been working closely with the Office of Risk and Advanced Analytics Team to develop statistical regression and machine learning models that identify risks at the individual, branch, and firm levels of broker-dealers by leveraging both structured and unstructured datasets. Prior to joining FINRA, Yvonne was a consultant at JPMorgan Chase for 8 years, where she designed and built data warehouses for the Credit Risk and Credit Rating systems. Yvonne also worked at Salomon Brothers supporting the equity trading desk. Yvonne has a BS and MA in Computer Science. Outside of work, Yvonne is an avid runner and a passionate saltwater angler.