SQA/CFA Society NY Data Finance Conference, January 18, 2018
Summary notes prepared by Steven Bloom
On January 18, 2018, the CFA Society New York, together with the Society of Quantitative Analysts (SQA), hosted and co-sponsored the full-day conference “Data Science in Finance: The Final Frontier?” The annual event addressed recent progress in data science applications in the financial industry, bringing together leading academics and industry practitioners to discuss how they use data science in their work. Many thanks to the SQA, led by Inna Okounkova, and CFA Society NY’s Fintech Thought Leadership Group, led by Carole Crawford, for organizing the conference and recruiting the speakers.
When Should We Trust Autonomous Learning Systems with Decisions, Vasant Dhar, SCT Capital Management, New York University.
Vasant Dhar kicked off the day with a brief history of how data analysis evolved into such an important part of investment analysis and trading. Put simply, programs and algorithms can detect patterns before people become aware of them, especially when the patterns emerge only from parsing tremendous amounts of data. Ironically, once an algorithm detects an alpha-generating trading strategy, the anomaly it exploits is quickly arbitraged away, a process referred to as alpha decay.
One of the biggest challenges in deploying machine-modeled trading strategies is preventing people from overriding the models; it is difficult for humans to let go. At the same time, it is often hard to determine when a model is no longer of value, or when market conditions have changed enough that it can no longer make useful recommendations. Models are most useful when their predictive value is high and the cost of errors is low. Model transparency also helps: knowing why an algorithm identified an anomaly, and what drives that anomaly, helps people judge when the underlying thesis no longer holds. Still, the future is inherently uncertain and unknowable, so a model’s view can stop being relevant.
The Contextual Bandits Problem: Techniques for Learning to Make High Reward Decisions, Robert Schapire, Microsoft.
Mr. Schapire explained machine learning in simple terms before expanding on analytical techniques and algorithms for complicated questions and complex behaviors. Learning begins by identifying the desired outcome and noting which actions led to it, with the goal of maximizing reward. For example, what action gets web-page visitors to click on the greatest number of advertisements, or to click most often? The answer may depend on where the ad is placed on the screen, its message, or the user’s familiarity with the product. In its general form, learning can identify patterns that fit decision trees, with rules such as: if a condition holds or an action is taken, then the desired outcome follows; otherwise, a different or undesired outcome results.
Ultimately, the algorithm should parse the data and highlight the rules, or policies, that are most successful at producing the desired outcome. Data theory can help solve such problems by building on existing applications and increasing model speed and efficiency.
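Mr. Schapire’s talk covered more sophisticated techniques, but the basic learn-by-reward loop can be sketched with a simple epsilon-greedy contextual bandit. This is a minimal illustration under stated assumptions, not the algorithms from the talk: the contexts (user segments), actions (ad placements), and click probabilities are hypothetical values used only to simulate feedback. Each round, the learner observes a context, usually picks the action with the best observed click rate for that context, occasionally explores a random one, and sees the reward only for the action it actually took.

```python
import random
from collections import defaultdict

# Hypothetical setup: the context is a user segment, the actions are ad placements.
CONTEXTS = ["new_visitor", "returning_visitor"]
ACTIONS = ["top_banner", "sidebar", "in_article"]

# Hypothetical true click probabilities, hidden from the learner (used only to simulate clicks).
TRUE_CLICK_PROB = {
    ("new_visitor", "top_banner"): 0.04,
    ("new_visitor", "sidebar"): 0.02,
    ("new_visitor", "in_article"): 0.12,
    ("returning_visitor", "top_banner"): 0.10,
    ("returning_visitor", "sidebar"): 0.04,
    ("returning_visitor", "in_article"): 0.03,
}


def run_epsilon_greedy(rounds=50_000, epsilon=0.1, seed=0):
    """Learn a policy mapping each context to the action with the highest observed click rate."""
    rng = random.Random(seed)
    clicks = defaultdict(int)  # (context, action) -> total clicks observed
    shows = defaultdict(int)   # (context, action) -> times the action was taken

    def estimate(ctx, act):
        n = shows[(ctx, act)]
        return clicks[(ctx, act)] / n if n else 0.0

    for _ in range(rounds):
        ctx = rng.choice(CONTEXTS)  # observe the context
        if rng.random() < epsilon:
            act = rng.choice(ACTIONS)  # explore a random placement
        else:
            act = max(ACTIONS, key=lambda a: estimate(ctx, a))  # exploit the best estimate
        reward = 1 if rng.random() < TRUE_CLICK_PROB[(ctx, act)] else 0
        # Bandit feedback: only the chosen action's reward is ever observed.
        clicks[(ctx, act)] += reward
        shows[(ctx, act)] += 1

    # The learned policy: the best-estimated action for each context.
    return {ctx: max(ACTIONS, key=lambda a: estimate(ctx, a)) for ctx in CONTEXTS}


if __name__ == "__main__":
    print(run_epsilon_greedy())
```

The `epsilon` parameter sets the trade-off between exploiting the current best guess and exploring alternatives. With `epsilon=0`, this learner would keep choosing its initial default placement and never discover a better one.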
Text as Data, Bryan Kelly, Yale and AQR
Analyzing stock price action is a relatively manageable process given the speed and capability of today’s computers. The stock universe may be limited to the S&P 500, the Russell 2000, or the Russell 3000; even with prices measured minute by minute over decades, the analysis remains manageable. Analyzing text and words for patterns expands the data universe enormously. There are a great many words, and each can take on different meanings in different contexts. Text appears in newspapers, journals, web searches, Facebook postings, and tweets. Clearly, there is no shortage of words to analyze.
Machine learning has already made great advances in representing text as data; witness the success of the speech and voice recognition software used by Alexa, Cortana, Siri, and other digital assistants. Many in the finance industry are searching journals and chat rooms to determine whether mentions of stocks, the market, and the economy can be tied to changes in investor sentiment and stock performance. For example, does more frequent appearance of words like “inflation” before a Fed meeting correlate with a higher probability of rising interest rates? Topic modeling, which groups words into topics and topics into data sets, is one method for reducing the volume and complexity of the data. The objective of some of the searches Mr. Kelly described is to identify topics that signal increasing market volatility, potentially leading to new trading strategies.
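Before any topic modeling, text has to be turned into numbers. Here is a minimal sketch of the simplest version of the “inflation” example, assuming a crude regex tokenizer and two hypothetical batches of headlines (not data from the talk): count how often a term appears per 1,000 words in each batch, producing a series that could then be compared with rate decisions. Real pipelines add stemming, stop-word handling, and context, all of which this sketch ignores, and it is word counting rather than topic modeling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_rate(documents, term):
    """Occurrences of `term` per 1,000 tokens across a batch of documents."""
    counts = Counter()
    total_tokens = 0
    for doc in documents:
        tokens = tokenize(doc)
        counts.update(tokens)
        total_tokens += len(tokens)
    return 1000 * counts[term.lower()] / total_tokens if total_tokens else 0.0


# Hypothetical headlines from the weeks before two different Fed meetings.
before_meeting_a = [
    "Inflation fears rise as wages climb",
    "Analysts expect inflation to persist",
]
before_meeting_b = [
    "Markets rally on strong earnings",
    "Tech shares lead broad gains",
]

for label, docs in [("A", before_meeting_a), ("B", before_meeting_b)]:
    print(label, round(term_rate(docs, "inflation"), 1))
```

For these toy headlines, batch A scores about 181.8 mentions per 1,000 tokens and batch B scores 0.0; only across many meetings would such a series say anything about interest rates.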
Search Data and Finance, Paul Gao, University of Notre Dame
Mr. Gao continued the theme of how keyword searches could be used to predict securities prices. The Google Search Volume Index can measure retail investors’ interest in a stock, and Bloomberg also tracks changes in investor attention to specific issues. He has examined, retrospectively, how increases in the number of searches on a topic, and searches for favorable or unfavorable terms such as “job growth” or “bankruptcy,” correlate with changes in investor sentiment and company performance. For example, the number of searches for one company correlated with its quarterly revenue report.
Media and Its Interaction with Financial Markets, Ronnie Sadka, Boston College
Searching media for news on the market and on companies has been a fruitful area for research, and the universe is wide: Dr. Sadka and his research team receive feeds of 100,000 articles each day, including company press releases. Cleaning the data and accounting for biases is labor-intensive and time-consuming. His research has demonstrated relationships among positive and negative news stories, the frequency of mentions, sentiment, and volatility. Other areas for research include separating feedback loops from original stories (the more often a topic appears, the more often it is subsequently mentioned or searched) and establishing causality rather than mere association.
What Machine Learning Can and Cannot Do, Claudia Perlich, Dstillery
Machines are good at a number of things: winning at games, observing trends, and measuring probabilities. Machines have not developed intuition (yet!). It still takes a human to apply judgment to an algorithm’s output to determine whether the output makes sense, whether an association is valid, and whether a recommendation can be implemented. In online advertising, the cost of delivering a message is low, and so is the cost of a mistake such as delivering a message to the wrong person. In contrast, the wrong treatment for an ill patient can be very costly. Associations also need scrutiny: the fact that patients in hospice care have higher mortality rates than patients at walk-in clinics does not imply that one setting is safer, better, or more effective than the other, since hospice patients are already terminally ill. People still need the skills to recognize when programs mis-specify a result, to help others adapt the technology, and to decide which recommendations to put in place.