VWAP Profiles: A Machine Learning Application

Jeff Bacidore
May 10, 2022
Updated: May 11, 2022

The primary input to the VWAP algorithm is the volume profile. In fact, other algorithms, like arrival price algorithms and open/close algorithms, often use volume profiles as well. Volume profiles give an algorithm information on historical volume patterns. Algorithms generally use this information to trade more when volume is typically high (e.g., near the close) and less when it is typically light (e.g., mid-day). As you can imagine, the key input to these profiles is past volume data. But the methods used to construct these profiles vary across algo providers. In this article, we discuss the most common ways these profiles are developed and provide a simple example of how machine learning can be used not only to generate better volume profiles, but also to do so more efficiently.
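To make the idea concrete, here is a minimal sketch of how a VWAP-style schedule might consume a volume profile: the parent order is split across time bins in proportion to each bin's expected fraction of daily volume. The function name `vwap_schedule` and the toy four-bin profile are hypothetical illustrations, not any provider's actual logic.

```python
def vwap_schedule(order_shares, profile):
    """Split a parent order across time bins in proportion to a volume profile.

    profile: per-bin fractions of daily volume (hypothetical input). It is
    renormalized to sum to 1, each bin gets the floor of its share, and the
    rounding residue goes to the last bin so the pieces add up exactly.
    """
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive volume")
    schedule = [int(order_shares * p / total) for p in profile]
    schedule[-1] += order_shares - sum(schedule)
    return schedule


# Toy 4-bin profile: busy open, quiet middle, busiest close.
toy_profile = [0.30, 0.10, 0.15, 0.45]
print(vwap_schedule(10_000, toy_profile))
```

Bins where volume is typically heavy (here, the close) receive more shares, which is the behavior described above. Pushing the rounding residue into the final bin is just one simple choice for keeping the child orders summing to the parent order.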

Most algorithm providers use one of two general approaches to volume profile estimation. The first estimates volume profiles at the symbol level, using only that symbol's past data. Given the noise in volume data, these profiles are typically created by averaging over a relatively long period (e.g., several months). The benefit of this approach is that it can pick up symbol-specific nuances. The downside, though, is that even over a long period, the number of observations is relatively small. A year’s worth of data, for example, provides only about 252 data points (one per trading day) per stock. And extending the window further risks using “stale” data, that is, data that may no longer represent current volume patterns. The problem is even more severe for less actively traded stocks, whose daily volume patterns are especially noisy and whose profiles are therefore particularly hard to estimate symbol by symbol.
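A minimal sketch of the symbol-level approach, assuming each day's volume has already been bucketed into a fixed set of bins. The input layout, the function name `symbol_profile`, and the simulated Poisson data are hypothetical. Each day is normalized to fractions of that day's total, and those fractions are averaged over a trailing window; the `lookback` parameter is where the trade-off between too few observations and stale data shows up.

```python
import numpy as np


def symbol_profile(daily_bin_volumes, lookback=252):
    """Average one symbol's own daily volume fractions over a trailing window.

    daily_bin_volumes: array of shape (n_days, n_bins) with shares traded per
    bin per day, oldest day first (hypothetical layout). Days with zero total
    volume are skipped so they do not cause a division by zero.
    """
    v = np.asarray(daily_bin_volumes, dtype=float)[-lookback:]
    v = v[v.sum(axis=1) > 0]
    if len(v) == 0:
        raise ValueError("no days with positive volume in the window")
    fractions = v / v.sum(axis=1, keepdims=True)  # each row now sums to 1
    return fractions.mean(axis=0)


# Simulated history: one underlying pattern plus Poisson noise, so any single
# day looks jagged while the long-window average settles down.
rng = np.random.default_rng(0)
true_profile = np.array([0.10, 0.30, 0.20, 0.40])
history = rng.poisson(lam=true_profile * 200, size=(500, 4))
short_window = symbol_profile(history, lookback=20)
long_window = symbol_profile(history, lookback=252)
print(np.round(short_window, 3), np.round(long_window, 3))
```

Averaging daily fractions weights every day equally; pooling raw volume across days instead would let unusually heavy days dominate. Which is preferable is a design choice, not something the approach itself dictates.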

The second approach uses group-level rather than symbol-level profiles. For example, the profiles of all low-volume Nasdaq stocks could be averaged together to create a common volume profile that is applied to every low-volume Nasdaq stock. The benefit of this approach is that it reduces variability in volume profiles by averaging over the profiles of similar symbols. Variability is reduced by incorporating more data, but without having to reach back to potentially outdated data, as the first approach does. This grouping methodology is particularly effective for less actively traded assets, whose profiles can be quite jagged from day to day. In fact, few (if any) VWAP algorithms rely on the symbol-level approach alone. Algo providers who use symbol-level profiles typically use them only for the most actively traded symbols and rely on grouped profiles for less actively traded ones. The downside of this method, though, is defining what we mean by “similar” stocks. Should we group by market cap? By industry? By liquidity? By all of the above? And how many groups should we have? Answering these questions requires considerable research to ensure that stocks within a group are, in fact, similar. And even then, any group may contain stocks with distinct symbol-level patterns that make the group-level profile inappropriate for them.
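The group-level approach can be sketched as a plain average of symbol-level profiles within pre-defined groups. The tickers, group labels, and profile values below are hypothetical; in practice the grouping rule is the product of the research described above.

```python
import numpy as np


def group_profiles(symbol_profiles, group_of):
    """Average symbol-level profiles within pre-defined groups.

    symbol_profiles: dict of symbol -> 1-D array of per-bin volume fractions.
    group_of: dict of symbol -> group label. The labels (e.g. exchange plus a
    volume bucket) are hypothetical stand-ins for a research-driven grouping.
    Returns a dict of group label -> averaged group profile.
    """
    members = {}
    for sym, prof in symbol_profiles.items():
        members.setdefault(group_of[sym], []).append(np.asarray(prof, dtype=float))
    return {g: np.mean(profs, axis=0) for g, profs in members.items()}


# Hypothetical tickers and 3-bin profiles.
profiles = {
    "AAA": [0.12, 0.28, 0.60],
    "BBB": [0.08, 0.32, 0.60],
    "CCC": [0.30, 0.40, 0.30],
}
groups = {"AAA": "nasdaq_low_vol", "BBB": "nasdaq_low_vol", "CCC": "nyse_high_vol"}
shared = group_profiles(profiles, groups)
# Every member of a group trades off the same profile.
print(shared[groups["AAA"]])
```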

But there is a third approach that aims to capture symbol-level variation while exploiting the efficiency of group-level estimates, and without having to pre-define the groups. Specifically, machine learning tools can group together stocks with similar profiles and provide a representative profile for each group. And this can be done using exactly the same data as the traditional methods. In fact, since the process defines the groupings using only the volume data itself, it doesn’t require other background data, such as market cap, listing exchange, or stock type (ADR, etc.).[1]

As an illustration, I used a technique called k-means clustering on a (dated) sample of Russell 3000 stocks. Since volume is basically flat through the middle of the day, I created truncated profiles containing the opening auction, the first 5 minutes of the trading day, the last 5 minutes of the trading day (one bin per minute), and the closing auction, for a total of 12 “bins”. I then had the k-means process determine 10 group-level profiles for the stocks in my sample. The results are shown in the figure below, where each line represents the group-level volume profile across the 12 volume bins described above.
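The clustering step can be sketched with a plain implementation of Lloyd's algorithm (the standard k-means iteration), assuming each stock has already been summarized as a row of 12 average volume fractions. This is an illustrative sketch, not the code behind the figure; the function names, the random initialization, and the synthetic data are assumptions. In practice a library implementation such as scikit-learn's `KMeans`, which runs several random restarts, would normally be used.

```python
import numpy as np


def nearest(X, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_profiles(X, k, n_iter=100, seed=0):
    """Cluster stock profiles with Lloyd's k-means.

    X: array of shape (n_stocks, n_bins); each row is one stock's volume profile.
    Returns (centroids, labels): the k group-level profiles and, for each stock,
    the index of the group it belongs to.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct stocks' own profiles.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        updated = centroids.copy()
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old centroid
                updated[j] = X[labels == j].mean(axis=0)
        converged = np.allclose(updated, centroids)
        centroids = updated
        if converged:
            break
    return centroids, nearest(X, centroids)


# Synthetic 12-bin profiles (open auction, 5 opening minutes, 5 closing minutes,
# closing auction) for two made-up kinds of stocks. Hypothetically, these 12
# bins carry 25% of each stock's daily volume.
rng = np.random.default_rng(1)
open_heavy = np.array([8, 6, 5, 4, 3, 2, 1, 1, 1, 1, 1, 2], dtype=float)
close_heavy = open_heavy[::-1]
X = 0.25 * np.vstack([
    rng.dirichlet(open_heavy * 20, size=50),
    rng.dirichlet(close_heavy * 20, size=50),
])
centroids, labels = kmeans_profiles(X, k=2)
print(np.round(centroids, 3))
print(np.bincount(labels))
```

Because each centroid is either one stock's row or an average of rows, it keeps the rows' normalization: here every group-level profile still sums to 25% of daily volume, just as each stock's truncated profile does.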

The x-axis denotes the opening auction, the minute of the day since the open, and the closing auction. The y-axis is the fraction of the day’s volume in that bin (e.g., 0.02 means 2% of the day’s volume is traded in that bin). How can we interpret these profiles? The results show one group of stocks that trades a relatively large fraction of its volume in the opening auction, along with significant trading just after the open (the profile labeled P6, in green). By contrast, profile P2 shows relatively little trading at the open but a substantial amount of trading in and around the close. Just as importantly, k-means identifies which stocks are in each group, so we know which profile to apply to each stock going forward. To reiterate, the machine did not know any underlying characteristics of the stocks (exchange, liquidity, etc.) when generating its profiles. Rather, the process uses only historical volume data to determine which stocks have similar profiles and what a representative profile for each group would be.
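Applying the results going forward can be sketched as picking, for each stock, the group-level profile nearest to its own history (the same rule k-means uses to assign stocks to groups) and reading each y-axis value as a fraction of the day's volume. The group profiles, the stock's profile, and the daily volume forecast below are made-up values, not figures read off the chart.

```python
import numpy as np


def assign_profile(stock_profile, group_profiles):
    """Pick the group-level profile closest (squared Euclidean distance) to a
    stock's own historical profile. Returns (group index, group profile)."""
    p = np.asarray(stock_profile, dtype=float)
    G = np.asarray(group_profiles, dtype=float)
    j = int(((G - p) ** 2).sum(axis=1).argmin())
    return j, G[j]


# Two made-up 4-bin group profiles: open-heavy ("P6-like") and close-heavy
# ("P2-like"). Each value is a fraction of the full day's volume.
groups = [[0.06, 0.03, 0.01, 0.05], [0.01, 0.01, 0.02, 0.12]]
idx, prof = assign_profile([0.05, 0.02, 0.01, 0.06], groups)
expected_daily_volume = 1_000_000  # hypothetical forecast for this stock
# A fraction of 0.06 means 6% of the day's volume is expected in that bin.
print(idx, np.round(prof * expected_daily_volume))
```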

One thing I should point out: in this example, I arbitrarily asked k-means to generate 10 profiles (and groups), as the example is just for illustrative purposes. There are techniques that can be used to determine the “optimal” number of groups and profiles. But regardless, the broader point here is that machine learning tools can be brought to bear in algorithmic trading. Indeed, some firms have noted that they employ similar technology in other applications.[2] Going forward, other forms of machine learning will be applied to other areas of algorithmic trading. (I will provide further examples in future blogs.) Machine learning tools are easily accessible and cost-effective (typically free!), and the computing power they require is now readily available (either by purchase or by rental via Amazon, Google, Microsoft, etc.). Expect to see increased reliance on machine learning in algorithm development as firms start to combine the wealth of data thrown off by algorithms every day with modern machine learning tools. And with it, a new level of sophistication and performance.

References

[1] Researchers could incorporate this background information into the process as well (e.g., if they want to explicitly have unique volume profiles for ETFs), but it is not entirely necessary.
[2] See “Deciphering global execution dynamics for optimal trading,” The Trade, March 2020.

The author is the Founder and President of The Bacidore Group, LLC and author of the new book Algorithmic Trading: A Practitioner's Guide. For more information on how the Bacidore Group can help improve trading performance as well as measure that performance, please feel free to contact us at info@bacidore.com or via our webpage www.bacidore.com.

Please check out our new book Algorithmic Trading: A Practitioner's Guide, available now on Amazon.
