“When Should We Trust Autonomous Learning Systems with Decisions?”

by Joe Iagulli

Vasant Dhar is a Professor of Information Systems at NYU Stern School of Business and the founder of SCT Capital Management, a firm running an Adaptive Quant Trading (“AQT”) program that uses machine learning (“ML”) pattern-recognition algorithms (“algos”) to make investment decisions. Vasant began his ML-based education and career in the 1990s and 2000s with data mining projects at Morgan Stanley and proprietary trading at Deutsche Bank before establishing SCT.

One of the biggest questions raised was: when can you trust a machine with your investing decisions? Finding a consistently outperforming strategy is very difficult for humans. Algorithmic machine trading has proven effective for short-term trading, but not for longer-term investing. Algorithmic processes are being applied not only to investing but also to driverless cars, flying drones, early-life education, spam filtering, and disease detection, with varying levels of success. All of these applications depend on humans trusting machines enough for the automated processes to be widely adopted.

Vasant presented a chart he calls the “automation frontier,” which evaluates processes as candidates for machine learning applications based on two factors (an illustrative sketch of how such a chart might be drawn follows the list below).

  • Predictability (X axis), ranging from low-signal, highly random processes (e.g., short-term and high-frequency trading) to high-signal, highly certain processes (e.g., early-life education or driverless cars).
  • Cost per error or mistake (Y axis), the risk to humans if the machine makes a wrong decision. There is no physical harm from mistakes in trading or advertising, but mistakes by driverless cars or in disease diagnosis are certainly very harmful.
  • Processes with low signal and low cost per error sit near the origin of the chart; higher signal moves a process to the right, and higher cost per error moves it toward the top.
  • High predictability and a low cost per error indicate high suitability for automation.
  • Low predictability and a high cost per error indicate low suitability for automation.
  • Generally, more data and improved algorithms can help move processes to the right.
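To make the frontier concrete, here is a minimal illustrative sketch of how such a chart could be drawn in Python. The example processes and their coordinates are placeholder guesses based on the talk’s examples, not values Vasant presented.

```python
import matplotlib.pyplot as plt

# Hypothetical (predictability, cost-per-error) placements; values are illustrative only.
processes = {
    "High-frequency trading": (0.2, 0.1),
    "Ad targeting":           (0.5, 0.1),
    "Spam filtering":         (0.8, 0.2),
    "Disease detection":      (0.6, 0.9),
    "Driverless cars":        (0.8, 0.9),
}

fig, ax = plt.subplots(figsize=(6, 5))
for name, (pred, cost) in processes.items():
    ax.scatter(pred, cost)
    ax.annotate(name, (pred, cost), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("Predictability (signal strength)")
ax.set_ylabel("Cost per error")
ax.set_title("Automation frontier (illustrative)")
plt.show()
```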

With investment management, Vasant has generally observed that model complexity can kill an efficient research process, that data availability and scope are highly impactful, and that models which back-test well often do poorly in live trading because the model overfit the data or expectations were set too high.

A few insights from Q&A session:

  • People have certainly asked about the automation of jobs across industries. Employment in financial services has held up relatively well, with a changing set of opportunities in the industry, mostly on the technology side: many new roles in quantitative analysis and, more recently, in cryptocurrencies.
  • The overall financial services industry is experiencing immense change in how customers pay, invest, and save money. Automation and technology will reach most industries. Younger people are better positioned to pivot and retrain through these disruptions, while older workers are less willing to adapt.
  • Shorter-term decision-making processes are more suited to machines, while longer-term decisions will take longer to be disrupted.
  • Unstructured data could be interesting, but it contains a lot of noise and bad data points (bots, fake information), and Vasant hasn’t incorporated it yet.

“The Contextual Bandits Problem: Techniques for Learning to Make High-Reward Decisions”

Robert Schapire is a Principal Researcher at Microsoft Research and a former professor of computer science at Princeton University who has researched theoretical and applied machine learning extensively over his career. Robert detailed the Contextual Bandits Problem (“CBP”), a framework in which an algorithm learns to choose actions based on observed rewards, with the goal of learning to maximize those rewards. Two examples of a CBP include a website choosing which advertisements are best suited to each user and a doctor choosing the optimal medical treatment for each patient.

The general outline of the CBP follows a process of formalizing the learning problem (what needs to be solved), determining which algorithms are best suited to it, and applying them to relevant applications. A formal model of the CBP involves gathering the relevant context for a decision, determining the reward values for the available actions (reward vectors), and running through many rounds of choosing policies and observing the rewards they produce. There is a “best policy” given the rewards of the chosen policies, and the goal is to optimize the algorithm so that its decisions come as close to this best policy as possible while minimizing the cost of trying low-reward policies.
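As a rough illustration of this loop (observe a context, choose an action, see the reward only for that action, update), here is a minimal epsilon-greedy sketch with a linear reward estimate per action. The synthetic environment and the update rule are illustrative assumptions, not the algorithms Robert covered.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_features, epsilon = 3, 5, 0.1

# One linear reward model per action; we only ever observe the reward of the action we chose.
weights = np.zeros((n_actions, n_features))
counts = np.ones((n_actions, 1))  # simple per-action step-size control

def choose_action(context):
    """Epsilon-greedy: mostly exploit the current reward estimates, sometimes explore."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(weights @ context))

def update(action, context, reward):
    """Move the chosen action's estimate toward the observed reward."""
    pred = weights[action] @ context
    weights[action] += (reward - pred) * context / counts[action]
    counts[action] += 1

# Simulated environment with hidden "true" reward weights (for demonstration only).
true_weights = rng.normal(size=(n_actions, n_features))
for t in range(5000):
    context = rng.normal(size=n_features)
    action = choose_action(context)
    reward = true_weights[action] @ context + rng.normal(scale=0.1)
    update(action, context, reward)
```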

Some of the challenges inherent in the formal model include the very large space of possibilities (decision trees), the need for the algorithm to learn about all policies while also converging on the best one, and the fact that rewards are observed only for the actions actually chosen within the policy space.

“Text as Data”

Bryan Kelly is a Professor of Finance at the Yale School of Management whose primary research is in asset pricing and financial econometrics.

In the 1930s, the American economist Alfred Cowles III researched whether professional stock forecasters could accurately forecast markets. He read Wall Street Journal articles from the previous 30 years, judged each as Bull, Bear, or Neutral in sentiment, and created back-tested portfolios based on these market calls. Cowles discovered that a passive Dow Jones portfolio performed better over that time horizon.

Today, there are almost infinitely more sources of information than in the early 20th century, and it’s practically impossible for humans to make informed decisions on every written piece. In the past two years alone, humans have created more data than in all previous human history. Computers can now read, analyze, and score text by “tokenizing” it into numerical data and assigning values to these observations. Research on text and other unstructured data is exploding, since computers can observe text as numerical matrices in a way humans cannot.

Financial markets inherently have low signal and a lot of noise in the available data. The more predictability someone finds, the more that advantage gets priced into markets, as described by the efficient markets hypothesis. Text differs from traditional quantitative data in two ways:

  • Text is extremely high dimensional. A big challenge is reducing this dimensionality to a manageable level. This can be achieved by restricting attention to base words or phrases, cutting out very common words (is, the, and, etc.) and very rarely used words, and dropping numbers and punctuation. The result is a “bag of words” that can be placed in a time-series dataset for frequency observations (see the sketch after this list).
  • Text is inherently sparse, with a few words making up most of a recorded language. A study of the Wall Street Journal from 1979 to 2017 found 927K unique words, 742K of which appear fewer than 10 times across all records for the period. Just 208 words made up 50% of the total dataset, and roughly 77K words accounted for 99% of all WSJ content over the period.
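As a concrete illustration of the “bag of words” preprocessing above, here is a minimal sketch using scikit-learn’s CountVectorizer. The toy documents and the frequency cutoffs are assumptions for illustration, not Bryan’s actual pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy "articles"; in practice each row would be one dated document (e.g., a WSJ article).
docs = [
    "Oil prices rallied as reserves fell and demand for energy rose.",
    "The central bank held rates steady while inflation expectations eased.",
    "Solar and other renewable energy stocks climbed on new subsidies.",
]

vectorizer = CountVectorizer(
    stop_words="english",          # drop very common words (is, the, and, ...)
    token_pattern=r"[a-zA-Z]{2,}", # keep alphabetic tokens only: drops numbers and punctuation
    min_df=1,                      # on a real corpus, raise this to cut very rare words
    max_df=0.9,                    # and lower this to cut near-ubiquitous words
)

counts = vectorizer.fit_transform(docs)   # documents x vocabulary (sparse matrix)
vocab = vectorizer.get_feature_names_out()
print(counts.shape, vocab[:10])
```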

Penalized regression is a method for compressing or reducing the dimensionality of text data, throwing out meaningless terms while assigning high coefficients to the most relevant predictors.
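A minimal sketch of what a penalized (lasso) regression on a document-term matrix might look like; the synthetic data, target, and penalty strength are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# X: documents x word counts (e.g., a bag-of-words matrix),
# y: the outcome to be predicted (e.g., a next-period return).  Both synthetic here.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 500)).astype(float)
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.05)  # the L1 penalty drives most coefficients exactly to zero
model.fit(X, y)

kept = np.flatnonzero(model.coef_)
print(f"{kept.size} of {X.shape[1]} words kept as predictors:", kept[:10])
```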

Topic modeling is a factor model for word counts that clusters certain words into associated topics. For example, energy is a topic, and “oil,” “reserves,” and “solar” are associated words. The output is often visualized as a time series (usually a line chart) of the frequency of the word clusters related to a given topic.
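One common way to implement such a factor model is latent Dirichlet allocation. The sketch below, with its toy corpus and two topics, is an illustration under those assumptions rather than the specific model Bryan uses.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "oil reserves fell while solar output and energy demand rose",
    "rates inflation and the central bank dominated markets",
    "energy stocks rallied on oil supply news",
    "the bank raised rates as inflation stayed high",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # documents x topics: a time series if docs are dated

# Top words per topic, e.g. an "energy" topic clustering oil / reserves / solar.
for k, topic in enumerate(lda.components_):
    top = vocab[np.argsort(topic)[::-1][:5]]
    print(f"topic {k}:", ", ".join(top))
```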

Conclusions from Bryan’s work and research:

  • Digital text is a rich repository of information on economic and social activity.
  • Modern computational tools make it possible to harness text into quantitative form.
  • Availability of text data and frontier of methods are expanding rapidly.
  • Text analysis in empirical economics will only accelerate in years to come.
  • These algorithms require increasing amounts of data to remain stable and efficient.
  • Text-based predictors can be used alongside known consensus indicators to compare results and evaluate indications of efficiency.

“Search Data and Finance”

Paul Gao is an Associate Professor of Finance at the University of Notre Dame.

Traditional asset pricing models assume that information is incorporated into prices instantaneously, but this usually holds only for information that is already recognized as relevant and important. Information needs to attract investor attention; otherwise, the result can be slow information diffusion and underreaction to potentially relevant news.

When examining the impact of investor attention, participants are split into retail and institutional investor bases. Retail investors are less likely to act on information immediately and are triggered by salient events, resulting in delayed, positive, and temporary price pressure. Institutional investors have greater resources and are more likely to transact on news immediately, producing permanent price impacts.

Given the speed and availability of news releases in the modern era, it’s difficult to measure investor attention. Google’s Search Volume Index (“SVI”), accessible at trends.google.com, is a direct measure of attention and is mostly attributable to retail investors. SVI is a tool provided by Google that lets any user visually model and track the terms searched on Google across a variety of time-series and cross-sectional cuts.
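For readers who want to pull an SVI-style series programmatically, one option is the unofficial third-party pytrends wrapper around trends.google.com (not a Google-supported API and not a tool mentioned in the talk); the search term below is a placeholder.

```python
from pytrends.request import TrendReq

# pytrends is an unofficial wrapper around trends.google.com (pip install pytrends).
pytrends = TrendReq(hl="en-US", tz=360)
pytrends.build_payload(kw_list=["AAPL stock"], timeframe="today 5-y", geo="US")

svi = pytrends.interest_over_time()  # weekly search-volume index, scaled 0-100
print(svi.head())
```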

An example is the impact of retail investor attention before an initial public offering (“IPO”). Companies with high pre-IPO SVI experience better returns on the IPO date than companies with low pre-IPO SVI (median first-day return of 12% vs. 6%). But over the following 4-12 months, the low pre-IPO SVI companies outperform the high pre-IPO SVI companies (industry-adjusted median post-IPO return of +1.4% vs. -19.5%).

Recently, a similar measure was developed for institutional investors. Abnormal Institutional Attention (“AIA”) is based on the number of times Bloomberg users read articles about a firm or search for news on a firm. Bloomberg calculates the measure by averaging the number of reads or searches on a firm over an 8-hour period and comparing it to a moving average of 8-hour periods over the past 30 days. The measure is scored 0 if below the 80th percentile of the moving average and scores higher with each step up the percentile range (maximum score of 4 if above the 95th percentile).
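A rough sketch of how such a percentile-based attention score could be computed for a generic count series. The description above specifies only the endpoints (0 below the 80th percentile, 4 above the 95th), so the intermediate cutoffs, window length, and synthetic data below are assumptions.

```python
import numpy as np
import pandas as pd

def aia_score(counts_8h: pd.Series) -> pd.Series:
    """Bucket each 8-hour count against its trailing ~30-day distribution.

    Only the 80th- and 95th-percentile endpoints come from the talk; the
    intermediate cutoffs here are an illustrative assumption.
    """
    window = 90  # roughly 30 days of 8-hour periods
    trailing = counts_8h.rolling(window, min_periods=window)
    scores = pd.Series(0, index=counts_8h.index)
    for points, q in enumerate([0.80, 0.85, 0.90, 0.95], start=1):
        cutoff = trailing.quantile(q).fillna(np.inf)  # no score during the warm-up window
        scores = scores.where(counts_8h < cutoff, points)
    return scores

# Synthetic example: steady reading activity with an attention spike at the end.
idx = pd.date_range("2024-01-01", periods=400, freq="8h")
counts = pd.Series(np.random.default_rng(0).poisson(20, size=400), index=idx)
counts.iloc[-3:] = 80
print(aia_score(counts).tail())
```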

Measuring the attention of institutional investors can reveal the speed of information diffusion: slow diffusion can lead to large underreactions, while fast diffusion leads to small underreactions.

“Media and Its Interaction with Financial Markets”

Ronnie Sadka is a Professor of Finance at Boston College.

Media is an avenue through which information is gathered, processed, and disseminated to the public. Research on media data typically focuses on the direct effects of media coverage in predicting returns for individual stocks or the US aggregate equity market. Ronnie’s research studies the interaction of media with individual equities, US aggregate equity, and currencies.

Media Types were separated into these sources:

1) General media (WSJ, USA Today, NYT)

2) Specialized media (e.g., P&I, aimed at a financial services audience)

3) Company press releases

4) Social Media

“Negative news sells”: with general and specialized media, writers are typically compensated to produce information whether or not it’s new and relevant; they simply need to produce regular reports. Information from specialized media and company press releases generally carries positive sentiment and may reflect opinion bias. Distance from a source can reveal bias as well: a reporter who is closer to and more associated with a firm is less likely to express negative sentiment than an outside reporter.

Examples of recent media sentiment measures include the following:

  • Brexit vote (June 23, 2016): global equity sentiment. Early on the day of the Brexit vote, global sentiment was mostly positive. When the result was announced, the majority of sentiment turned negative, while Russia and China were noticeably the inverse.
  • US presidential election (November 8, 2016): excluded the USA and examined foreign exchange sentiment. It produced a global reaction similar to the Brexit vote earlier in the year.
  • French presidential election (April 23, 2017): European equity sentiment. Sentiment was negative across most countries ahead of the vote and reversed after the Macron victory.

A sample study was constructed using 10-day laddered portfolios, while also examining relative autocorrelation in different markets, based on weekly sentiment data and price returns for foreign exchange, individual equities, and aggregate country equities. Examining the results of these sample portfolios over the sample period from March 2013 to April 2014, a reinforcement effect emerged.

Measured from five days prior to a sentiment event, positive returns with high sentiment lead to a reversal, while positive returns with low sentiment produce a neutral pattern over the following ten days. Negative returns with low sentiment lead to higher returns over the same period than negative returns with high sentiment.

“What Machine Learning Can and Cannot Do”

Claudia Perlich is SVP of Data Science at Two Sigma Investments.

Predictive modeling can be defined as gathering data and trying to build a machine learning (“ML”) model with useful predictive power. Machine learning and artificial intelligence (“AI”) have certainly advanced in several ways:

1) Beating human players in games like Go, Poker, and Jeopardy.

2) Exploiting music theory to produce very interesting compositions.

3) Recognizing objects and signals in images, which is imperative for driverless vehicles.

4) Showing internet users relevant advertisements in their internet browsing.

Professionals will need to evolve with the changes in society driven by increasing amounts of data and the capabilities of AI technology. Some of the desired skills of data analytics professionals are detailed below.

  1. Knowing when the problem is not a machine learning problem, usually due to an issue with the data. Training a model to recognize a unicorn (which doesn’t exist) or to predict the election of Donald Trump (who was not previously a politician) are both instances where machine learning would fail because there is no historical dataset.
  2. Stepping back from defining solutions before you have the right tools. Users need to examine data applicability before deciding what insights or conclusions can be drawn.
  3. Determining when the algorithms aren’t appropriate and whether results from certain tests are being stretched to draw unrelated conclusions.
  4. Knowing which “wrong” problem is the right one to solve. A model trained on a different dataset can still be an effective predictor for a related problem. With investment data, some datasets might be too old to be applicable given changing regimes or market structures.
  5. Knowing when the right model is predicting the wrong things. Many clicks on online advertisements (“ads”) are accidental, and programmed bots automatically click on ads. Although these bots are usually predictable, they can skew a dataset if not excluded.
  6. Knowing when your model is too good to be true and finding out why. You shouldn’t be able to predict human behavior with too much certainty; humans aren’t as predictable as bots and will take random actions that can’t be modeled or anticipated.

Good intuition, experience, and dedication are all necessary to be effective at developing models. How we use these models will greatly impact our actions: they need to be tools that help make decisions, used as a secondary input rather than as automated decision-makers.