Self Ramblings

Stuff that matters to me

Sharpe Ratio with Pandas

written on Saturday, November 17, 2012

Encouraged by a friend, I have recently enrolled in an online course on cousera to learn about investment computation. The course is being given by the amazing Prof. Tucker Balch at Georgia Tech Institute. The course aims to teach building equities portfolios using python, and it does make a heavy use of numpy and pandas.


The first assignment we had is to select four stock equities using data from last year, 2011 and to build a portfolio out of those four. The portfolio should have as high as possible of a Sharpe Ratio. Sharpe ratio is a standard formula to measure the volatility of a stock performance, introduced by Nobel laureate William F. Sharpe and is calculated as

Sharpe \Rightarrow \frac{{\sqrt{n}\times{AVG(d)}}}{{STDEV(d)}}

Where n is the number of business days, in a US trading year that is 250. d is the daily return as a vector for the given period.

Now, the idea is to build a portfolio were when one equity declines, another equity in the same portfolio inclines immediately to cancel the change in the overall total performance. This is pretty hard to implement given that we cannot change the percentage of how to divide the total investment over the equities once we make it.

The simplest solution is to get equities with high Sharpe ratios, meaning they are of low volatility anyway, and hoping the total of them would also yield a low volatility portfolio, aka a high Sharpe ratio.

Here is my approach anyway ...

I managed to get a list of all NYSE stocks from this site which I have saved into a file symbols.txt, each line contains a single ticker.

I used the fantastic request library to download all the data I needed from Yahoo.

import requests
import os

tickers_file = '/path/to/tickers/file.txt'
with open(tickers_file) as f:
    tickers = [l.strip() for l in f.readlines()]

# path is where I save downloaded tickers
path = '/path/to/tickers'
# Yahoo API URL
url = ""
# Yahoo get parameters where date range 2011/01/01 - 2012/01/01
# although 2012/01/01 is an international holiday and won't be in the data
get_params = "s=%(t)s&a=00&b=01&c=2011&d=00&e=01&f=2012&g=d&ignore=.csv"
url = '?'.join([url, get_params])
for t in tickers:
    r = requests.get(url % {'t': t})
    if r.status_code != 200:
    fname = os.path.join(path, '%(t)s.csv' % {'t': t})
    with open(fname, 'w') as f:

The snippet above has downloaded around 4000 files on my machine. Unfortunately not all were useful. Some files didn't have data for the whole year, either because the companies closed, or were not public for the whole year. I had to keep the largest group of files with the maximum number of rows, i.e. similar number of days.

import glob
from collections import Counter

c = Counter()
for fname in glob.glob(os.path.join(path, '*.csv')):
    fpath = os.path.join(path, fname)
    with open(fpath) as f:
        c.update({len(f.readlines()): 1})

Having done the above, I checked my counter and I've found that I had 2441 files with 253 rows in them. I thought it is a good idea to delete other files that I did not need.

for fname in glob.glob(os.path.join(path, '*.csv')):
    fpath = os.path.join(path, fname)
    with open(fpath) as f:
        if len(f.readlines()) == 253:

Now the interesting part starts, pandas comes here really useful.

import pandas as pd
import numpy as np
from random import randint
from glob import glob
import os
import math
import operator

path = '/path/to/tickers/*.csv'
s = dict()
dataframes = dict()
for fpath in glob(path):
    ticker, _ = os.path.splitext(os.path.basename(fpath))
    #df = pd.DataFrame.from_csv(path, parse_dates=False)
    df = pd.read_table(fpath, sep=',')
    df.sort(columns=['Date'], inplace=True)
    df.index = df.index.order()
    df['Cumlative'] = df['Adj Close'] / df['Adj Close'][0]
    daily = df['Adj Close'][1:].values / df['Adj Close'][:-1].values - 1
    s[ticker] = math.sqrt(250) * np.average(daily) / np.std(daily)
    dataframes[ticker] = df

sharpes = list(reversed(sorted(s.iteritems(), key=operator.itemgetter(1))))

Having sharpes as a list of all the sharpes sorted from higher to lower. It is a good idea to see what the top five are:

>>> sharpes[:5]
[('AETI', 2.7101310277193726),
 ('NETE', 2.648418249389036),
 ('ASPS', 2.6116854762593866),
 ('EGAN', 2.4914765781893822),
 ('GLNG', 2.3528848646673186)]

I selected four out of the top five best Sharpe ratios, a limitation set by the assignment. These four equities formed my portfolio, built in Microsoft Excel as a spreadsheet.

Now this is interesting, we need to diversify our fund across the four equities in a way that guarantees the best Sharpe ratio. Microsoft Excel comes with a plugin called Solver that does exactly that.

Here is a screenshot for how it works.

Excel Solver Screenshot

This approach has given me a Sharpe ratio of 3.98, that's one of the highest in the class :)

My next plan is to implement the Solver in Python. I will of course keep this post updated.

This entry was tagged finance, numpy, pandas and python