Jeff Bacidore

Applications of Machine Learning in Trading: Part 1

In this and future posts, I thought it might be useful to provide some real-world applications of machine learning in a trading context. Because machine learning is such a broad topic, I am going to break this into a few different parts, where each post will discuss a specific class of machine learning. Each post will describe that class of machine learning on a very high level to gain intuition before delving into specific examples. But before jumping into our first set of machine learning applications in this post, I should probably justify why I think machine learning holds such promise for improving trading-related processes and analyses.[1]

Why is Machine Learning Relevant to Trading?

Simply put, trading costs and liquidity are forecastable. For example, trading cost models can be developed for use in forecasting the expected trading cost for a given order using factors like trading volume, spread, volatility, etc. Similarly, past patterns in intraday volume, i.e., volume profiles, can be used to forecast the expected volume patterns for future trading days. Such forecastability of liquidity patterns stands in stark contrast to alpha, which is notoriously challenging to forecast. And to make matters worse, any patterns the researcher identifies are constantly at risk of disappearing as more and more traders move to exploit those patterns. Liquidity costs, on the other hand, cannot be “arbitraged away” because they reflect a fundamental cost of trading, specifically the cost traders must pay to liquidity providers for immediacy. So, while trading costs can vary across time, across assets, etc., they tend to do so consistently and in forecastable ways.[2]

Quantitative techniques therefore can be used to predict liquidity patterns and trading costs, and these forecasts can be used to optimize trading decisions. Prior to machine learning, the most common techniques used in trading applications involved calculating summary statistics, like the mean or median of the variable of interest, perhaps “bucketed” by factors that correlate to liquidity, e.g., market capitalization, average daily trading volume (ADV), etc. These statistics were then used as forecasts going forward.

For example, consider volume profiles used in VWAP algorithms. Such profiles are often constructed by grouping stocks by liquidity, security type (e.g., ADR, ETF), etc. and then averaging the profiles of all the stocks in that group. The VWAP algorithm then uses the group’s average profile as an estimate of the future volume profile for every stock in that group.
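
To make the bucketing approach concrete, here is a minimal Python sketch. The tickers, bucket names, and three-bin profiles are all made up for illustration; real profiles would have many more intraday bins and would be built from historical data.

```python
from collections import defaultdict

# Hypothetical inputs: each stock's historical volume profile (fraction of
# daily volume in each intraday bin) and the bucket the researcher assigned it to.
profiles = {
    "AAA": [0.30, 0.40, 0.30],
    "BBB": [0.20, 0.40, 0.40],
    "CCC": [0.10, 0.30, 0.60],
}
bucket_of = {"AAA": "High ADV", "BBB": "High ADV", "CCC": "ETF"}


def average_profile_by_bucket(profiles, bucket_of):
    """Average the profiles of all stocks that share a bucket, bin by bin."""
    grouped = defaultdict(list)
    for ticker, profile in profiles.items():
        grouped[bucket_of[ticker]].append(profile)
    return {
        bucket: [sum(bin_values) / len(rows) for bin_values in zip(*rows)]
        for bucket, rows in grouped.items()
    }


bucket_profiles = average_profile_by_bucket(profiles, bucket_of)

# Every stock in a bucket is forecast with that bucket's average profile.
forecast = {ticker: bucket_profiles[bucket_of[ticker]] for ticker in profiles}
```

In this toy setup, "AAA" and "BBB" receive the same forecast profile because they share the "High ADV" bucket, which is exactly the behavior described above.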

Another widely used statistical technique is regression analysis. Regression analysis involves establishing a statistical relationship between the variable you are trying to model and the factors that influence that variable. For example, regression analysis is often used in the estimation of trading cost models, like those mentioned above. Factors such as ADV, spread, volatility, etc. are used to “explain” some trading cost measure, typically execution shortfall, i.e., the slippage between the average execution price and a pre-trade benchmark price. The resulting model built off of past data can then be used to forecast the trading costs of future orders. These regression techniques are often augmented by tools aimed at avoiding overfitting, reducing the impact of outliers, and so forth. But regardless of the exact approach, the most commonly used techniques have been part of the standard statistical toolkit for some time.
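
As a rough illustration of the regression approach, the sketch below fits a linear cost model by ordinary least squares using only the Python standard library. The factor names (spread, volatility, participation rate) and all the numbers are assumptions for the example. The "past orders" are synthetic and generated from a known linear rule with no noise, purely so the fitted coefficients can be checked; a production model would be estimated on real execution data and would typically include the overfitting and outlier safeguards mentioned above.

```python
def fit_ols(factor_rows, costs):
    """Least-squares fit via the normal equations (X'X) b = X'y, with an intercept."""
    X = [[1.0] + list(row) for row in factor_rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, costs))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_cost(coefs, factors):
    """Forecast the cost of a new order from its factor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], factors))


# Synthetic past orders: (spread in bps, volatility in %, participation rate)
past_orders = [
    (2.0, 1.0, 0.01), (5.0, 2.0, 0.05), (10.0, 1.5, 0.10),
    (3.0, 3.0, 0.02), (8.0, 2.5, 0.20), (4.0, 0.5, 0.15),
]
# Synthetic shortfall in bps, generated from an assumed linear rule
true_coefs = [1.0, 0.5, 2.0, 10.0]
shortfall = [predict_cost(true_coefs, order) for order in past_orders]

coefs = fit_ols(past_orders, shortfall)
forecast = predict_cost(coefs, (6.0, 2.0, 0.08))  # forecast for a new order
```

The fitted model is then applied to a future order simply by plugging that order's factor values into `predict_cost`.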

So what changed?

The rise of big data, along with the dramatic increase in computing power, has made it possible to move beyond these common techniques to more computationally intensive and data-intensive ones, with machine learning probably the most widely discussed. But while machine learning represents a sizeable leap forward in terms of modeling, the underpinnings of machine learning are typically not markedly different from those of conventional tools. For example, complex deep learning models often use the exact same objective function as conventional models.[3] Consequently, machine learning can often be applied to the same sorts of problems that standard statistical techniques have been applied to in the past.

Of course, that doesn’t necessarily mean that machine learning will significantly enhance our predictions, and certainly not in every application. However, I believe that given the nature of the problems we face in trading, machine learning holds tremendous promise.

Applications of Machine Learning in Trading, Part 1: Unsupervised Learning

Examples of Unsupervised Learning

To begin the discussion of machine learning in trading, I start with the most basic type of machine learning: Unsupervised Learning. Unsupervised learning can be thought of as simply a means to identify patterns in data without guidance from the researcher. For example, in regression analysis, the researcher has to decide what variable they are trying to predict (e.g., trading costs), which variables are being used to predict that variable (e.g., ADV, spread, etc.), and the functional form of the model (e.g., linear). In unsupervised learning, however, the researcher simply inputs variables into the learning algorithm and lets the algorithm identify patterns based solely on the data.

For example, one commonly used unsupervised learning technique is called k-means clustering. This technique finds the “best” k clusters or groupings of the data. For example, if you were to give a sample of data to the k-means algorithm and set k to 10, the algorithm would identify 10 groupings based solely on the input data itself. The k-means algorithm effectively provides two useful insights as outputs. First, the algorithm assigns each data point to one of the k clusters. Second, the algorithm characterizes the “central point” of each cluster.

To make this more concrete, consider a researcher who is trying to estimate the volume profiles to be used in a VWAP algorithm. The most common approach historically has been to identify a list of stock characteristics that may influence the shape of the volume profile and use those to “bucket” the data as discussed above. For example, a stock that is in the S&P 500 may have more volume in the closing auction than other stocks since more of its flow comes from passive funds executing around the close. Other factors might include things like ADV, special stock type (ETF, ADR), etc. These factors are then used to create groupings, like the “High ADV, S&P 500, non-ETF, non-ADR” bucket, and each stock is assigned to one of these groupings. The researcher then estimates the average volume profile for each group using data from all the stocks in that bucket. This average profile is then assigned to each stock in that bucket for use in the VWAP algorithm going forward (i.e., each stock in the “High ADV, S&P 500, non-ETF, non-ADR” bucket will use the exact same profile).

K-means provides a much simpler approach. Instead of the researcher identifying the groups and estimating the volume profile for each group, the k-means algorithm does both as part of its algorithm. It both identifies the groupings and provides an estimate of the volume profile for those groups as a natural output. Specifically, the estimated volume profile is simply the “central point” that each group’s data are clustered around, as noted above. So, with k-means, a researcher simply provides the volume profile of every stock in their universe, and k-means does the rest. The k-means approach is much simpler and less labor-intensive than the traditional approach, and my experience has been that it is also a more effective predictor of volume profiles (though it is impossible to say whether I simply am not very good at picking volume profile factors!).
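
A bare-bones version of this idea is sketched below in plain Python. It is an illustrative implementation of the standard k-means iteration (Lloyd's algorithm), not the code I actually use. The tickers and five-bin profiles are invented, and the deterministic "farthest point" seeding is just one convenient assumption that keeps the example reproducible; library implementations usually use randomized seeding with multiple restarts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) for k clusters of the given points."""
    # Deterministic seeding: start from the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical volume profiles: fraction of daily volume in five intraday bins
# (open, morning, midday, afternoon, close).
profiles = {
    "AAA": [0.30, 0.20, 0.15, 0.15, 0.20],
    "BBB": [0.28, 0.22, 0.15, 0.15, 0.20],
    "CCC": [0.15, 0.15, 0.15, 0.20, 0.35],
    "DDD": [0.14, 0.16, 0.15, 0.20, 0.35],
    "EEE": [0.15, 0.20, 0.35, 0.15, 0.15],
    "FFF": [0.16, 0.19, 0.35, 0.15, 0.15],
}

tickers = list(profiles)
labels, centroids = kmeans([profiles[t] for t in tickers], k=3)

# The two outputs: a cluster for each stock, and a "central" profile per
# cluster that serves as the forecast for every stock assigned to it.
forecast = {t: centroids[lab] for t, lab in zip(tickers, labels)}
```

Note that no stock characteristics (ADV, index membership, security type) are supplied anywhere: the groupings come entirely from the shapes of the profiles themselves.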

Another example of how k-means can be used in a trading context is provided in a paper I co-wrote a few years ago with some of my colleagues.[4] The paper resulted from our efforts to characterize the trading strategies used by a particular trading client who used a mix of manual trading and multiple algorithms to execute orders. Because they used multiple tools when executing a single ticket, their underlying strategies were not obvious to us simply by looking at their order submissions. To help us identify these strategies, we turned to k-means, which was able not only to identify the client’s most commonly used strategies, but also to classify each of their trades into one of these strategies.[5] At that point, we were able to analyze each of these trading strategies separately and provide feedback on the effectiveness of each.

Main Drawbacks of Unsupervised Learning

Perhaps the biggest drawback of unsupervised learning is that it doesn’t necessarily provide any intuition for the groupings, as they are purely statistical. But this doesn’t mean that there isn’t an intuition behind the bucketing or that the intuition is unknowable. A researcher could gain some intuition by looking at the characteristics of each bucket created by k-means to see if there are any obvious similarities across stocks in that bucket. For example, the researcher may notice that a cluster whose volume profile has a disproportionate amount of trading midday contains a disproportionate number of European ADRs. Or a cluster whose profile has a relatively large amount of trading around the close is skewed toward S&P 500 stocks. Effectively, this approach involves working backwards relative to the conventional method: first k-means is used to provide the best grouping and then the researcher looks to the groupings to gain intuition after the fact.[6]
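
One simple way to do this after-the-fact inspection is to tabulate, for each cluster, the share of its stocks that have a given characteristic. The sketch below assumes the clustering has already been run; the cluster assignments, tickers, and stock-type labels are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical output of a clustering step: ticker -> cluster id
cluster_of = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 1, "EEE": 1, "FFF": 0}

# Hypothetical characteristic the researcher wants to inspect
stock_type = {
    "AAA": "S&P 500", "BBB": "S&P 500", "FFF": "Other",
    "CCC": "European ADR", "DDD": "European ADR", "EEE": "Other",
}


def composition_by_cluster(cluster_of, attribute):
    """For each cluster, return the share of its stocks with each attribute value."""
    counts = defaultdict(Counter)
    for ticker, cluster in cluster_of.items():
        counts[cluster][attribute[ticker]] += 1
    return {
        cluster: {value: n / sum(counter.values()) for value, n in counter.items()}
        for cluster, counter in counts.items()
    }


shares = composition_by_cluster(cluster_of, stock_type)
```

Comparing these shares with the corresponding shares in the full universe shows whether a cluster is skewed toward a particular type of stock.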

Another significant drawback of k-means is that it requires the researcher to define how many groupings the learning process should find, i.e., the researcher must provide the k in k-means. Choosing too high a value will result in identifying superfluous splits of the data. Oversplitting results in each bucket having fewer observations on average, and therefore the center point (e.g., the estimated volume profile in our example) would be estimated with less precision. Choosing a k that is too low, on the other hand, will lead to some dissimilar stocks being clustered together, which in turn will lead to the central point being “biased”. But again, the researcher can investigate this issue to determine the best choice of k, for example, by looking at how the out-of-sample prediction error depends on k.
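
The sketch below illustrates that kind of check on a toy data set. Everything here is an assumption made for the example: there are three underlying profile shapes, two hypothetical stocks per shape, and the in-sample profiles are the true shape plus or minus a fixed perturbation, so the noise cancels exactly when the two stocks are averaged together. The held-out profile for each stock is the underlying shape itself. Real data will never be this clean, but the pattern (too few clusters gives biased centers, too many gives noisy ones) is the point.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Minimal k-means with deterministic farthest-point seeding (illustrative)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(
            list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Three assumed underlying shapes over five intraday bins.
shapes = [
    [0.30, 0.20, 0.15, 0.15, 0.20],  # heavier at the open
    [0.15, 0.20, 0.35, 0.15, 0.15],  # heavier midday
    [0.15, 0.15, 0.15, 0.20, 0.35],  # heavier at the close
]
noise = [0.02, -0.02, 0.0, 0.01, -0.01]  # sums to zero, so profiles still sum to 1

train, test = [], []
for shape in shapes:
    for sign in (1, -1):  # two stocks per shape, with opposite perturbations
        train.append([x + sign * e for x, e in zip(shape, noise)])
        test.append(list(shape))


def out_of_sample_mse(train, test, k):
    """Cluster the in-sample profiles, then score each stock's cluster center
    against that stock's held-out profile."""
    labels, centroids = kmeans(train, k)
    return sum(sq_dist(centroids[lab], t) for lab, t in zip(labels, test)) / len(test)


errors = {k: out_of_sample_mse(train, test, k) for k in range(1, len(train) + 1)}
best_k = min(errors, key=errors.get)
```

In this constructed example the held-out error is lowest at k = 3, the number of shapes used to generate the data, and is higher both for smaller k (dissimilar stocks merged) and for larger k (each center estimated from fewer stocks).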

Going forward

Hopefully, this post has provided some useful insight into how even the most basic learning techniques can add value over the more traditional methods. But to be clear, the machine learning techniques with the greatest potential to disrupt trading, especially algorithmic and quantitative trading, are those that we will discuss in future blog posts. I personally think these techniques will someday become commonplace, providing models and insights that more traditional models simply cannot. And I think that “someday” may be sooner than we think. (Though to be fair, I once wrote something like “someday, algorithmic trading will be so pervasive that we won’t be calling it ‘algorithmic trading’; we will just be calling it ‘trading’”. That was over 10 years ago, yet I am still writing about “algorithmic trading”!)

The author is the Founder and President of The Bacidore Group, LLC. For more information on how the Bacidore Group can help improve trading performance as well as measure that performance, please feel free to contact us at or via our webpage


Copyright 2019, The Bacidore Group, LLC. All Rights Reserved.


[1] In speaking with colleagues at different firms, I have found that views on the value of machine learning are quite varied. At one firm, the business leaders and quants said that they were pessimistic about the value of machine learning and were not devoting many resources to it. At the other extreme, another firm described how they wanted machine learning embedded in all of their algos. Others, not surprisingly, fell somewhere in between.

[2] For a more detailed discussion, see our previous blog post:

[3] For example, deep neural networks use the same objective function as logistic regression (categorical variable prediction) and least squares regression (real-valued prediction).

[4] See Cluster Analysis for Evaluating Trading Strategies, Journal of Trading, Summer 2012, Vol. 7, No. 3, pp. 6-11.

[5] For example, one strategy involved back-loaded trading into the close, while another was heavily front-loaded after the open.

[6] To be clear, I am not suggesting that one can conclude that the characteristics of a grouping caused the grouping to happen, or that one should necessarily see an intuitive grouping to conclude that the process “worked”. Rather, the point of this exercise is simply to extend what k-means started, specifically, to gain insights into the data that weren’t accessible using traditional methods. After all, the whole point of unsupervised learning is to allow the data full license to find the groupings and to define each grouping’s central point.


Phone: 914-296-4311

