People learning to program often struggle with how to decompose a problem into the steps necessary to write a program to solve that problem. This is one of a series of posts in which I take a problem and go through my decision-making process that leads to a program.
Problem: Download stock prices from a website, extract the closing prices for each day, and then plot the stock prices as well as a moving average of the prices. I assume that you understand statements, conditionals, loops, lists, strings, functions, and file input/output.
You can find a video with more details at https://www.youtube.com/watch?v=55piV5_cFeA
Many programs that I write read a file that I have created. In this case, the program will download data from a website and therefore I am not completely sure of the form of the data. I start by downloading the data and looking at it; the first couple of lines are
Date,Open,High,Low,Close,Volume
2019-08-20,9.46,9.64,9.46,9.56,58377
2019-08-21,9.62,9.77,9.62,9.76,71992
This shows me that
- the file format is CSV (comma separated values), so it will be easy to extract the desired information.
- there is a header line, which I will need to skip, and the prices I want are in the 5th column (index 4).
- the data arrives as a single string, so I need to split it on newlines to create ‘lines’ from it.
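To check these observations without touching the network, here is a small sketch that uses the sample lines above as a hard-coded string. The names `sample` and `lines` are my own for illustration, and I am assuming the real download ends with a newline the way this string does.

```python
# A hard-coded stand-in for the downloaded text (an assumption for
# illustration; the real data comes from the website).
sample = (
    "Date,Open,High,Low,Close,Volume\n"
    "2019-08-20,9.46,9.64,9.46,9.56,58377\n"
    "2019-08-21,9.62,9.77,9.62,9.76,71992\n"
)

# The data is one long string, so split it on newlines to get lines.
lines = sample.split('\n')

# The first entry is the header line, which holds no prices.
print(lines[0])

# Because the string ends with a newline, the last entry is empty.
print(lines[-1] == '')
```

The empty string at the end is easy to miss, and it would break any code that tries to pull a price out of every line.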
After downloading the data, we need to extract the closing prices. This consists of going through each line, tokenizing it, and getting the closing price on that line. The price is added to a list that will ultimately be plotted.
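As a quick sketch of that step on a single line (the line is hard-coded from the sample above, and the variable names are just for illustration):

```python
# One data line in the format shown earlier (hard-coded for illustration).
line = "2019-08-20,9.46,9.64,9.46,9.56,58377"

# Tokenize on commas; every token is still a string at this point.
tokens = line.strip().split(',')

# The closing price is the 5th column, so index 4; convert it to a number.
close = float(tokens[4])

# Collect the price in the list that will eventually be plotted.
prices = []
prices.append(close)
print(prices)   # [9.56]
```

The conversion with `float` matters: without it the list would hold strings, which cannot be averaged or plotted as numbers.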
Now I want to produce the moving average. This is actually the most difficult part of the program. Let’s start with what a moving average is. A moving average of a set of values consists of many averages, each of which is based on a subset of the data. For example, if the data consists of daily prices and we want a 3-day moving average, then we find the average of the first three prices, then the average of days 2, 3, and 4, then the average of days 3, 4, and 5, and so forth. The purpose of the moving average is to smooth out the prices in order to see the general trend.
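To see the smoothing effect, here is the arithmetic for five made-up prices (10, 13, 16, 13, 10; not real stock data) and a 3-day moving average:

```python
# Prices for days 1 through 5 (made up): 10, 13, 16, 13, 10

first = (10 + 13 + 16) / 3    # days 1, 2, and 3
second = (13 + 16 + 13) / 3   # days 2, 3, and 4
third = (16 + 13 + 10) / 3    # days 3, 4, and 5

print(first, second, third)   # 13.0 14.0 13.0
```

The prices swing between 10 and 16, but the averages only move between 13 and 14. Note also that five prices produce only three averages.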
So how do we do this? Let’s say we have a list of prices
p = [1, 2, 3, 4, 5, 6]
and we want the 3-day moving average of it. The indices of the list begin at 0, so the first average in our moving average is (p[0] + p[1] + p[2])/3. The second average will be (p[1] + p[2] + p[3])/3. The third average will be (p[2] + p[3] + p[4])/3. So what is the pattern? If the index of the first number is i, then the current average is (p[i] + p[i+1] + p[i+2])/3. If I generalize to allow moving averages of sizes other than 3, then an average as part of a w-day average is (p[i] + p[i+1] + … + p[i+w-1])/w. In some languages I would have to use an inner loop to add up these w values for each i. In Python, I can use slicing instead: p[i : i+w] is exactly those w values.
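Here is a minimal sketch of that pattern on the list above. The names `chunk` and `averages` are my own for illustration.

```python
p = [1, 2, 3, 4, 5, 6]
w = 3

averages = []
i = 0
# The last full window starts at index len(p) - w.
while i < len(p) - w + 1:
    # p[i : i+w] is the sub-list p[i], p[i+1], ..., p[i+w-1]
    chunk = p[i : i+w]
    averages.append(sum(chunk) / w)
    i += 1

print(averages)   # [2.0, 3.0, 4.0, 5.0]
```

A list of n values gives n - w + 1 averages, which is why the loop stops before i reaches the end of the list.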
Finally, how will we plot the data? There is a library for Python called matplotlib that is not installed by default, so I need to install it myself. This library requires numpy, which I also need to install.
matplotlib plots pairs of coordinates using
plot( X, Y )
where X is a list of the x-coordinates of the points and Y is a list of the y-coordinates of the points. The numbers in the moving average are the y-coordinates; their indices in the list are the x-coordinates.
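A small sketch of how the two lists line up. The helper names `xCoordinates` and `plotSketch` are hypothetical, and the actual plotting call is left commented out so the example runs even before matplotlib is installed.

```python
def xCoordinates( values ) :
    # the x-coordinate of each value is simply its index in the list
    return list( range( len(values) ) )

def plotSketch( values ) :
    # imported inside the function so the rest of this example
    # works without matplotlib
    import matplotlib.pyplot as plt
    plt.plot( xCoordinates(values), values )
    plt.show()

Y = [2.0, 3.0, 4.0, 5.0]   # made-up moving average values
X = xCoordinates( Y )
print( X )   # [0, 1, 2, 3]

# plotSketch( Y )   # uncomment once matplotlib is installed
```

X and Y must have the same length, since each point needs one x-coordinate and one y-coordinate.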
import urllib.request

def getWebpage( src ) :
    # download the page and decode the bytes into a string
    fp = urllib.request.urlopen(src)
    webpage = fp.read().decode('utf-8')
    fp.close()
    # webpage is a string, so tokenize
    # delimiter is a newline
    tokens = webpage.split('\n')
    return tokens

def getPrices( d ) :
    # header = Date,Open,High,Low,Close,Volume
    # 2019-08-20,9.46,9.64,9.46,9.56,58377
    prices = [ ]
    size = len(d)
    i = 1   # skip header line
    while i < size :
        t = d[i].strip().split(',')
        # closing price is the 5th column (index 4)
        prices.append( float(t[4]) )
        i += 1
    return prices

def produceAvg( d, windowSize ) :
    # purpose: produce running average of stock prices
    size = len( d )
    movavg = [ ]
    i = 0
    # the last full window starts at index size - windowSize
    while i < size - windowSize + 1 :
        subTotal = sum(d[i : i+windowSize])
        movavg.append( subTotal/windowSize )
        i += 1
    return movavg

def plotStocks( p, m ) :
    # plotting requires matplotlib and numpy, which
    # are not installed as part of the default
    # Python installation.
    import matplotlib.pyplot as plt
    x = range( len(p) )
    plt.plot( x, p )
    x = range( len(m) )
    plt.plot( x, m )
    # window is the global variable set in main below
    avgTitle = "%d day moving average" % window
    plt.legend(["daily closing price", avgTitle], loc = "lower left")
    plt.title( 'stock prices' )
    plt.show()

##### main #####
# Note that at some point this link may no longer work
url = "https://stooq.com/q/d/l/?s=googl.us&d1=20190820&d2=20200820&i=d"
data = getWebpage( url )

# I noticed that the list contains an extra blank string
# at the end, so remove with pop
data.pop()

# extract just the daily closing prices
prices = getPrices( data )

# ask user how many numbers to use in moving average
window = int(input("Enter window size: "))
movavg = produceAvg( prices, window )
plotStocks( prices, movavg )