Statistical Arbitrage with Python and Quantopian
- Sydwell Rammala

- Jul 13, 2020
- 5 min read
Updated: Oct 1, 2020
There are many ways to interact with financial markets. Some of these methods are complex, while others are simple. Between these two extremes lies a range of strategies that can result in either profit or loss, depending on how they are used. In this article, we discuss a type of arbitrage and the advantages and pitfalls associated with it, and we conclude with two simulations. Let us begin!

Arbitrage
According to the Merriam-Webster dictionary, arbitrage is the nearly simultaneous purchase and sale of foreign exchange or securities in different markets in order to profit from differences in price. As a process, arbitrage pushes the price of a good (or the real exchange rate) in two different markets toward the same level. If the good is worth more in one market than in the other, then an arbitrage opportunity exists.
One intuitive advantage of arbitrage is that it keeps the price of a good in line across different markets. It would not make sense for Unilever, as a dually-listed company, to be worth more in the Netherlands than in the United Kingdom. Irrespective of where they are bought, the securities represent ownership in the same company and should therefore be worth the same on both exchanges.
One disadvantage of arbitrage would come from the transaction costs and taxes associated with the buying and selling of the security in question. So even though it might work from a theoretical perspective, the actual implementation thereof may be impractical.
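To make the cost argument concrete, here is a minimal sketch of the arithmetic. The function name, the prices, and the proportional cost model (one rate charged on each leg) are illustrative assumptions, not figures from any real market.

```python
def net_arbitrage_profit(buy_price, sell_price, quantity, cost_rate):
    """Profit from buying in the cheaper market and selling in the dearer one,
    after a proportional cost charged on both legs (hypothetical cost model)."""
    gross = (sell_price - buy_price) * quantity
    costs = cost_rate * (buy_price + sell_price) * quantity
    return gross - costs

# Illustrative numbers only: a 0.20 price gap on 1,000 shares.
profit_without_costs = net_arbitrage_profit(50.00, 50.20, 1000, 0.0)
profit_with_costs = net_arbitrage_profit(50.00, 50.20, 1000, 0.0025)

print(f"No costs:          {profit_without_costs:.2f}")
print(f"0.25% cost per leg: {profit_with_costs:.2f}")
```

With these made-up inputs, a cost of a quarter of a percent per leg is already larger than the price gap, so the trade that looks profitable on paper loses money in practice.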
Regarding Statistical Arbitrage
According to an article published on Nasdaq.com, statistical arbitrage is nothing more than a fancy name for pair trading, which is the purchase and sale of a pair of stocks based on their association with one another. A pair trader observes the association between two stocks and buys one and sells the other whenever the association is out of sync. The trader makes these transactions on the assumption that the association will continue in future.
This method is not foolproof, but rather another tactic to use when interacting with the financial markets. Comparing the definition of statistical arbitrage to the definition of arbitrage provided above reveals that statistical arbitrage, even though not arbitrage in the strictest sense of the word, does involve a lot of the mechanics rooted in the plain definition of arbitrage. Also, individuals in mathematics want to sound fancy sometimes, and I have no reason to not encourage them to do their thing.

The association between securities
In order to apply statistical arbitrage to trade a pair of securities, one needs to find a relationship between the securities. In particular, the pair of securities must be cointegrated. According to Quantopian, if two securities' time series are cointegrated, there exists a linear combination of the two securities that varies around a mean.
In order to check for such a relationship, one must search through a group of stocks that are plausibly related. The shares might be related because they belong to the same industry, because the companies procure their inventory from the same supplier, or because of a similar economic link.
In order to illustrate cointegration, we will look at the data for alternative energy securities. In particular, the securities being analyzed are Abengoa, S.A. - American Deposi, Ascent Solar Technologies Incorporated, China Sun Energy Co, Daqo New Energy Corp, First Solar Incorporated, and the S&P 500. Because these securities belong to the same sector, it can be expected that they are affected by similar variables. However, we will not be guessing or assuming whether the securities are cointegrated. We run Python code that does the cointegration test for us, and the result of that test is shown below.

Remember, pairs trading depends on the existence of a linear combination of the securities that varies around a mean, which we tested for using cointegration. Even though other linear combinations probably exist between the securities, we are interested in the spread between the securities. In order to calculate the spread, we will make use of linear regression. We run Python code to obtain the spread, and thereafter visualize the spread.
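As a rough illustration of the regression step, the sketch below fits a straight line with `numpy.polyfit` and defines the spread as the second series minus beta times the first. The data is synthetic and the variable names are assumptions; the article's own code may compute the regression differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pair: y follows x with a slope of 2 plus stationary noise.
x = 50 + np.cumsum(rng.normal(0, 1, 500))
y = 2.0 * x + 5.0 + rng.normal(0, 1, 500)

# Ordinary least squares fit of y on x; polyfit returns the slope first.
beta, alpha = np.polyfit(x, y, 1)

# The spread is what remains of y after removing its exposure to x.
spread = y - beta * x

print(f"estimated beta: {beta:.3f}")
print(f"spread mean: {spread.mean():.3f}, spread std: {spread.std():.3f}")
```

Here beta plays the role of the hedge ratio: it says how many units of the first security offset one unit of the second.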

We will normalize the spread calculated above and treat it as a z-score. Understanding what a z-score is forms an important part of the type of pairs trading presented in this article. However, for the sake of brevity, an in-depth discussion of z-scores will not be given here. For a thorough explanation, I recommend checking out the following Quantopian Lecture.
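For readers who want the short version, a z-score simply subtracts the mean and divides by the standard deviation. A minimal sketch, with a made-up spread and a hypothetical helper name:

```python
import numpy as np

def zscore(series):
    """Express each value as a number of standard deviations from the mean."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()

# Made-up spread values, purely for illustration.
spread = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = zscore(spread)
print(z)
```

After normalization the series has a mean of zero and a standard deviation of one, which is what lets us use fixed thresholds such as 1.0 regardless of the price level of the securities.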
After conducting our analysis, we can come up with the following strategy:
Go long whenever the z-score is below –1.0;
Go short whenever the z-score is above 1.0; and
Exit positions whenever the z-score approaches 0.
The strategy above shows what statistical arbitrage is in a nutshell. An advantage of this particular method is that it provides the benefits of hedging. In this strategy we match a long position with a short position whenever we trade, which offers some protection against losses from broad market moves, although not against the spread itself failing to revert.
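The three rules above can be sketched as a small state machine. The function name is hypothetical, and `exit_band` is an assumption: the rules say to exit when the z-score "approaches 0", and here that is interpreted as an absolute value below 0.25.

```python
import numpy as np

def positions_from_zscore(z, entry=1.0, exit_band=0.25):
    """Return +1 (long the spread), -1 (short the spread) or 0 (flat) per step."""
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for i, value in enumerate(z):
        if current == 0:
            if value < -entry:
                current = 1       # spread unusually low: go long
            elif value > entry:
                current = -1      # spread unusually high: go short
        elif abs(value) < exit_band:
            current = 0           # spread back near its mean: exit
        positions[i] = current
    return positions

# Made-up z-scores, purely for illustration.
z = np.array([0.0, -1.2, -0.8, -0.1, 0.3, 1.5, 0.9, 0.2, 0.0])
print(positions_from_zscore(z))  # [ 0  1  1  0  0 -1 -1  0  0]
```

Being long the spread means buying the second security and shorting beta units of the first; being short the spread is the reverse.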
Including Updating Statistics
Assuming that the spread calculated above will remain constant during the entire time we are long or short the securities in the strategy does not make practical sense, because the factors used to calculate the spread are constantly changing. Therefore, in order to improve how the spread is calculated, we will base its calculation on a rolling mean, which is also known as a moving average.
A moving average analyzes data points by creating a series of averages of different subsets of the data. A shorter moving average will fluctuate a lot, while a longer moving average will appear smoother. We will also need a rolling estimate of the beta, in order to keep all the variables used in our calculation up to date.
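One way to keep beta up to date is to re-estimate it over a sliding window. The sketch below uses pandas rolling covariance and variance, which gives the same slope as a least squares fit on each window. The 30-step window and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic pair: y follows x with a slope of 2 plus noise.
x = pd.Series(50 + np.cumsum(rng.normal(0, 1, 300)))
y = 2.0 * x + rng.normal(0, 0.5, 300)

window = 30  # hypothetical look-back length

# Regression slope on each window: cov(x, y) / var(x).
rolling_beta = x.rolling(window).cov(y) / x.rolling(window).var()

# Spread based on the beta that was current at each step.
rolling_spread = y - rolling_beta * x

# Moving average of that spread; needs a further full window of data.
spread_mavg = rolling_spread.rolling(window).mean()

print(rolling_beta.dropna().describe())
```

The first `window - 1` values of `rolling_beta` are missing because a full window of data is needed before the first estimate can be made.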

Having computed the moving averages, we can compute a z-score of the spread for each day that we analyze the data, based on the calculated moving averages. This tells us how far the spread has moved from its moving average relative to its recent volatility, and gives us an indication as to whether it is a profitable idea to enter into a position. Also, this time around we plot the spread against the two securities that satisfy the cointegration condition.
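A rolling z-score along these lines might be computed as in the sketch below. It compares a short moving average of the spread with a long one and scales the difference by the long-window standard deviation. The window lengths of 1 and 30 and the function name are assumptions, and the spread is random noise used only to show the mechanics.

```python
import numpy as np
import pandas as pd

def rolling_zscore(spread, short_window=1, long_window=30):
    """Z-score of the recent spread relative to its longer-run behaviour."""
    short_mavg = spread.rolling(short_window).mean()
    long_mavg = spread.rolling(long_window).mean()
    long_std = spread.rolling(long_window).std()
    return (short_mavg - long_mavg) / long_std

rng = np.random.default_rng(2)
spread = pd.Series(rng.normal(0, 1, 200))  # placeholder spread, not real data

z = rolling_zscore(spread)
print(z.dropna().describe())
```

Because every statistic is computed from past and current values only, the z-score on a given day does not peek at future data, which matters when the result feeds a backtest.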

Backtesting the proposed strategy
The pair trading strategy was developed with data from 2013-11-01 to 2014-09-13. During this period, the strategy was profitable.
However, when it is implemented outside of the interval mentioned above, we see that it is no longer profitable. This is because cointegration no longer holds between the two securities. As unpleasant as these results may be, they underline the importance of verifying the performance of a proposed strategy on market data it was not developed on before using it for live trading.
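The in-sample versus out-of-sample comparison can be sketched with a toy backtest. This is not the Quantopian backtester and it ignores transaction costs. The spread is synthetic: it is mean-reverting for the first 300 steps and then becomes a random walk, to mimic cointegration breaking down. The function name, thresholds, and split point are all assumptions.

```python
import numpy as np

def backtest_spread(spread, mean, std, entry=1.0, exit_band=0.25):
    """Cumulative P&L of the z-score rule on a spread, with no costs."""
    z = (spread - mean) / std
    position, pnl = 0, 0.0
    for i in range(1, len(spread)):
        # The position held since the previous step earns the spread change.
        pnl += position * (spread[i] - spread[i - 1])
        if position == 0:
            if z[i] < -entry:
                position = 1
            elif z[i] > entry:
                position = -1
        elif abs(z[i]) < exit_band:
            position = 0
    return pnl

rng = np.random.default_rng(3)
split, n = 300, 600
spread = np.zeros(n)
for t in range(1, n):
    # Mean-reverting before the split, a plain random walk after it.
    persistence = 0.9 if t < split else 1.0
    spread[t] = persistence * spread[t - 1] + rng.normal(0, 1)

in_sample, out_of_sample = spread[:split], spread[split:]

# Thresholds are calibrated on the in-sample period only.
mean, std = in_sample.mean(), in_sample.std()

print(f"in-sample P&L:     {backtest_spread(in_sample, mean, std):.2f}")
print(f"out-of-sample P&L: {backtest_spread(out_of_sample, mean, std):.2f}")
```

The point of the split is that the mean and standard deviation are fixed using the development period and then applied unchanged to later data, which is the situation a live trader would actually face.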
Important Note:
This article was based extensively on the Quantopian Lecture on Pairs Trading. For more information on pairs trading and the risks associated with it, I recommend the Quantopian lecture series on pairs trading.
References
Arbitrage, Merriam-Webster.com
Dual-Listed, Oxford.com
Don't be fooled by the fancy name: statistical arbitrage is a simple way to profit, Nasdaq.com
Introduction to Pairs Trading, Quantopian.com
Confidence Intervals, Quantopian.com
What is Hedging, thestreet.com
Giphy.com, Giphy.com



