Functions and Modules


  • Function definitions
  • Function documentation
  • Functions within functions
  • Modules
  • Modules of Interest: sys, math, collections
  • Making your own libraries


This afternoon we'll concentrate on our last fundamental programming concept for the course. To date, we've been writing all of our program logic in the main body of our scripts. And we've seen how built-in python functions like raw_input() are used to operate on variables and their values. In this session, we'll learn how to write functions of our own, how to properly document them for ourselves and other users, and how to collect them into modules, and make our own local repositories, or libraries.

If you properly leverage a well-designed function, writing the main logic of your programs becomes almost-too-easy. Instead of writing out meticulous logical statements and loops for every task, you just call forth your previously-crafted logic, which you've vested in well-made functions.


Functions are the basic means to manage complexity in your programs, allowing you to avoid nesting and repeating large chunks of code that could otherwise make your tasks unmanageable. They allow you to bundle code with a defined input and output into single lines, and you should use them frequently from now on.

We will start with the syntax:

#!/usr/bin/env python
# define the function
def hello(name):
 greeting = "Hello %s!" % (name)
 return greeting
# use the function
functionInput = 'Zaphod Beeblebrox'
functionOutput = hello(functionInput)
print functionOutput

To define a function, you use the keyword def. Then comes the function name, in this case hello, with parentheses containing any input arguments the function might need. In this case, we need a name to form a proper greeting, so we're giving the hello() function a variable argument called name. After that, the function does its thing, executing the indented block of code immediately below. In this case, it creates a greeting "Hello <name>!". The last thing that it does is return that greeting to the rest of the program.

Technically speaking, a function does not need to explicitly return something, although it's uncommon that you'll write any that don't. If you don't return something explicitly, Python will nevertheless return the special object None. None is logically false (for if statements), and printing it prints the word None (None is not the empty string). It's easy to forget to return a value, so this is an easy first thing to check if your functions don't work as expected.
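
Here is a minimal sketch of the forgotten-return pitfall. The function forgetful_hello is a made-up name for illustration, not one of the course examples, and the print calls use parentheses only so that the snippet runs under both Python 2 and Python 3.

```python
#!/usr/bin/env python
# forgetful_hello is a hypothetical example: it builds a greeting
# but never returns it
def forgetful_hello(name):
    greeting = "Hello %s!" % name

result = forgetful_hello('Zaphod Beeblebrox')
print(result)          # prints the word None
print(result is None)  # prints True
if not result:
    print('None counts as false in an if statement')
```

If you see None where you expected a value, check for a missing return first.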

Note that the variable names are different on the inside and the outside of the function: I give it functionInput, although it takes name, and it returns greeting, although that return value is fed into functionOutput. I did this on purpose, as I want to emphasize that the function only knows to expect something, which it internally refers to as name, and then to give something else back. In fact, there is some insulation against the outside world, as you can see in this example:

#!/usr/bin/env python
def hello(name):
 greeting = "Hello %s!" % (name)
 testVariable = """The hotel room is a mess, there's a chicken hangin'
                   out, somebody's baby is in the closet, there's a
                   tiger in the bathroom that Mike Tyson wants back, Stu
                   lost a tooth and eloped, and Doug is missing."""
 print 'Inside of the function:', testVariable
 return greeting
testVariable = "What happens in Vegas stays in Vegas."
grt = hello("Stu Price")
print 'Outside of the function:', testVariable

Even though the epic story of a bachelor party gone horrifically awry was assigned to a variable called testVariable inside the function, nothing happened to that variable outside the function. Variables created inside a function occupy their own namespace in memory distinct from variables outside of the function, and so reusing names between the two can be done without you having to keep track of it. (Refer to this article about namespace for more information.) That means you can use functions written by other people without having to keep track of what variables those functions are using internally. Just like a sleazy town in Nevada, what happens in the function stays in the function. (An important exception lies with lists and dictionaries, which you will examine in the exercises.)

What happens if you try to print testVariable outside the function and you don't assign anything to it?

Let's have another example, returning to a more pressing subject:

#!/usr/bin/env python
def whichFood(balance):
    if balance < 10:
        return 'ramen'
    elif balance < 100:
        return 'good ramen'
    elif balance < 200:
        return 'better ramen'
    else:
        return 'ramen that is truly profound in its goodness'
print whichFood(14)

Here we've made a slightly more complicated function-- it contains some control statements, and there is more than one way for it to return. We also never explicitly create an input variable (as we did with functionInput in the first example), and we don't store the output to a variable either (as we did with functionOutput).

Finally, we've shown examples with one input variable and one return value, but functions can accept zero input variables, one input variable, or multiple input variables, and functions don't necessarily need to return variables back to the program, but they are also capable of returning multiple variables. They can even have other functions nested inside them!
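
As a rough sketch of two of those possibilities, the snippet below returns two values at once and nests one function inside another. The names gcContent and countBase are invented for this illustration. The triple-quoted string on the first line of the function is a docstring, which is the text that help(gcContent) displays.

```python
#!/usr/bin/env python
# gcContent and countBase are hypothetical names used only for illustration
def gcContent(seq):
    """Return the fraction of G and C bases in seq, plus the length of seq."""
    def countBase(base):
        # a nested function can read seq from the function that encloses it
        return seq.upper().count(base)
    gc = countBase('G') + countBase('C')
    return (float(gc) / len(seq), len(seq))

(fraction, length) = gcContent('ATGGTGGG')
print(fraction)  # 0.625
print(length)    # 8
```

The nested countBase function only exists inside gcContent, so it cannot be called from the main body of the script.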

Here are a few more examples of the syntax used with functions:

#!/usr/bin/env python
# functions can do their thing without taking input or returning output
def useless():
    print 'What was the point of that?'
def countToTen():
    for i in range(10):
        print i

Notice that what you print inside the function gets printed when you call the function, even if you don't return anything.

#!/usr/bin/env python
# functions can also take multiple items in and return multiple items out
def doLaundry(amtDetergent, dirtyClothes):
    cleanClothes = []
    for load in dirtyClothes:
        amtDetergent -= 1
        cleanClothes.append(load) # each washed load joins the clean pile
    return (amtDetergent, cleanClothes)
amtTide = 5
print "Starting amount of Tide:",amtTide
print "Let's do some laundry!"
dirtyLaundry = ['socks','shirts','pants']
(amtTide, cleanLaundry) = doLaundry(amtTide, dirtyLaundry)
print "Amount of Tide left:", amtTide
print cleanLaundry
#What happens if you only give this function one argument, or more than two arguments?

Above, in doLaundry, I returned a tuple of the two variables enclosed in parentheses. You could also return a list, which works much the same way. (Some more information on the distinctions between tuples and lists can be found at this link.) You could return other objects as well, like dictionaries.

#!/usr/bin/env python
def returnStuff():
    a = '>Gene1'
    b = 'ATGGTGGG'
    return [a,b] # returns the output as a list
print type(returnStuff())
# We can index the output the same as any list
print returnStuff()[0]
print returnStuff()[1]
(name, seq) = returnStuff() # stores output to the variables name & seq,
                            # so you can access name and seq directly
print name
print seq
both = returnStuff() # stores the output to the variable both
                     # which will be a list
print both
dictOfStuff = {}
dictOfStuff[returnStuff()[0][1:]] = returnStuff()[1]
print dictOfStuff

So how do functions make our lives easier? We can exploit functions to break difficult tasks into a number of easier tasks, and then these easier tasks into ones easier still, and so on. Large code blocks, with a few function calls, are only tens of lines long, and many functions are only a handful of lines. This allows us to program in large, structural sweeps, rather than getting lost in the details. This makes programs both easier to write and easier to read:

def publishAPaper(authors,topic,journal):
 data = doWork(topic)
 figures = analyze(data)
 paper = writePaper(data,figures)

And a big part of that ease comes with the use of:

Modules


In all of the examples above, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without tiresomely writing them all out again. But wouldn't it be nice to share functions across programs, too? For example, working with genomic data means lots of time getting sequence out of FASTA files, and shuttling that sequence from program to program. Many of the programs we work with overlap to a significant degree, as they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example -- all of which means that many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.

Modules solve these problems. In short, they're collections of functions and variables (and often objects, which we'll get to towards the end of the course) that are kept together in a single file that can be read and imported by any number of programs.

Using a module: the basics

To illustrate the basics, we'll go through the use of two modules, sys and math, one of which we use almost all the time. In fact, it's a very rare program indeed that doesn't use the sys module. sys contains a lot of really esoteric functions, but it also contains a simple, everyday thing: what you typed on the command line. To illustrate, suppose we create a new program and run it from Terminal with the following command:

$ ./ argument1 argument2 argument3

then the sys module would contain a list of strings called argv composed of the following: ['./', 'argument1', 'argument2', 'argument3']. We can access the list argv from our program by importing the module sys.

Copy the following into a new script:

#!/usr/bin/env python
import sys # gaining access to the module
# you can access variables stored in the module by using a dot
# to get at the variable 'argv' which is stored in 'sys', type:
commandLine = sys.argv
print commandLine

$ ./ hi world
['', 'hi', 'world']

Conveniently, we can access functions stored inside modules. To demonstrate this, I'll use the module math.

#!/usr/bin/env python
import sys
import math
# sys.argv contains only strings, even if you type integers.
# And, remember, the first element is the command itself-- usually
# not very useful.
x = float(sys.argv[1]) # argv stores the command line arguments as
                       # strings, but python isn't especially clever,
                       # so we can't do math with strings
logX = math.log(x)
print logX

And to run it:

$ ./ 3
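
To see why the float() conversion matters, here is a small sketch that uses a stand-in list instead of the real command line. The name fake_argv and the script name inside it are assumptions made up for this illustration.

```python
#!/usr/bin/env python
import math

# fake_argv is a hypothetical stand-in for sys.argv
fake_argv = ['my_script.py', '3']

as_string = fake_argv[1]
as_number = float(fake_argv[1])

print(as_string * 2)  # string repetition: prints 33
print(as_number * 2)  # arithmetic: prints 6.0

try:
    math.log(as_string)  # math functions refuse strings
except TypeError:
    print('math.log needs a number, not a string')

print(math.log(as_number))
```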

There's actually a really great module that lets you call your program really easily from the command line, without having to manually parse out what each of the arguments does. I'll show you how to use that next week.

Great! Not so hard.

Modules have more than just functions: The collections module

We already knew this: sys.argv is a list. Another thing that modules often contain is datatypes. Just as Python has some built-in datatypes (like int, list, str, and dict), it's also possible (although outside the scope of this course) to create full-fledged data types of your own.

One of the more useful of these is the collections module. It has a bunch of new data types that are, as you might guess from the name, collections of other things. There are two of them that I use with some regularity: Counter and defaultdict. Let's start with Counter, which counts things.

#!/usr/bin/env python
import collections
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus', 'Oryza',
 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus', 'Drosophila']
c = collections.Counter(my_genera)
print c
# or
d = collections.Counter()
for genus in my_genera:
    d[genus] += 1
print d

Now, we could do this same thing with a dictionary:

e = {}
for genus in my_genera:
    if genus not in e:
        e[genus] = 0
    e[genus] += 1
print e

But using a Counter is faster to write, shorter to read, and makes it more obvious that we are counting, as opposed to a dictionary, which could be used for almost anything. Another big advantage of the Counter type is that it makes it really easy to sort by frequency:

c = collections.Counter(my_genera)
print c
print c.most_common()
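
As a small sketch of what you can do with the result, most_common() also accepts a number, so most_common(1) gives only the top entry. The variable names top_genus and top_count are my own, and the print calls use parentheses so the snippet runs under both Python 2 and Python 3.

```python
#!/usr/bin/env python
import collections

my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus', 'Oryza',
             'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus', 'Drosophila']
c = collections.Counter(my_genera)

# most_common(1) returns a list holding one (item, count) tuple
top_genus, top_count = c.most_common(1)[0]
print(top_genus)   # Lactobacillus
print(top_count)   # 3
print(c['Oryza'])  # 2
print(c['Homo'])   # 0: a Counter reports zero for things it has never seen
```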

The other collections type I really like is the defaultdict, which is like a dictionary, but has a default type for a key that we haven't seen before (with a normal dict, if you try to read something where the key isn't in the dict, then you get an error). Let's think about how we'd make a dictionary where each key is a genus, and the value is a list of species in that genus:

import collections
my_species_list = [('Helicobacter','pylori'), ('Escherichia','coli'),
              ('Lactobacillus', 'helveticus'), ('Lactobacillus', 'acidophilus'),
              ('Oryza', 'sativa'), ('Wolbachia', 'pipientis'), ('Oryza', 'glaberrima'),
              ('Rattus', 'norvegicus'), ('Lactobacillus','casei')]
d1 = {}
for genus, species in my_species_list:
    if genus not in d1:
        d1[genus] = []
    d1[genus].append(species)
print d1

With a defaultdict, we can once again save the lines in the for loop where we check for a non-existent key:

d2 = collections.defaultdict(list)
for genus, species in my_species_list:
    d2[genus].append(species)
print d2

One thing to look at is the line where we actually declare the defaultdict: here we've given it another type, and if we use a key that's not in the dictionary already, it will initialize it to be an empty variable of that type. Most often, this will be a list, but you could imagine uses for other types, like a string, an integer (here "empty" actually would mean 0), or even another dict. It's possible to even have a defaultdict of defaultdicts, but doing so would require covering more than we have time for.
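
Here is a minimal sketch of the integer case, next to the error a normal dict gives for a missing key. The variable names plain and counts are invented for this illustration.

```python
#!/usr/bin/env python
import collections

plain = {}
try:
    plain['Oryza'] += 1  # the key does not exist yet
except KeyError:
    print('a normal dict raises KeyError for a missing key')

# int() returns 0, so every new key starts out at 0
counts = collections.defaultdict(int)
for genus in ['Oryza', 'Rattus', 'Oryza']:
    counts[genus] += 1

print(counts['Oryza'])   # 2
print(counts['Homo'])    # 0
print('Homo' in counts)  # True: just reading a missing key creates it
```

That last line is worth remembering: looking up a key in a defaultdict adds it, so use the in operator if you only want to test whether a key is present.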

It turns out that it's easy to write our own modules too:

Making a module

Any file of Python code with a .py extension can be imported as a module from your script. When you invoke an import operation from a program, all the statements in the imported module are executed immediately. The program also gains access to names assigned in the file (names can be functions, variables, classes, etc.), which can be invoked in the program using the syntax <module>.<name>. Go ahead and make your first module by pasting the following code into your text editor and saving it as greeting_module.py (the file name has to match the import greeting_module line used further down):

#!/usr/bin/env python
print 'The top of the greeting_module has been read.'
def hello(name):
 greeting = "Hello %s!" % name
 return greeting
def ahoy(name):
 greeting = "Ahoy-hoy %s!" % name
 return greeting
x = 5
print 'The bottom of the greeting_module has been read.'

Now make a new program with the following code, and include your first name as an argument on the Terminal command line when you execute it:

#!/usr/bin/env python
import greeting_module
hi = greeting_module.hello('Aisha')
print hi
print greeting_module.x
# What happens if you try 'print x' here?
# Remember how to access argv?
import sys
print greeting_module.hello(sys.argv[1])
# This will take your Terminal argument as input for the greeting
# module's hello function

And that's it! See-- no more messy function declarations at the beginning of your script. Now if you need any other program to say hi to you, all you need to do is import the greeting module.

Using modules: slightly more than just 'import'

Although creating a basic module is easy, sometimes you want more than just the basics. And although using a module in the most basic manner is easy, it's best to get a more thorough picture of how modules behave.

First, what if you only want one function from a given module? Let's say, as an Alexander Graham Bell loyalist, you really only dealt in 'ahoys' rather than 'hellos.' We need to use a modified syntax for retrieving only the ahoy function from the module, without cluttering things up by loading the newfangled hello function preferred by T.A. Edison's entourage.

Change the code in your program to the following:

#!/usr/bin/env python
from greeting_module import ahoy
hi = ahoy('everybody')
# if you grab a function from a module with a 'from' statement,
# you don't need to use the <module>.<function> syntax
print hi

We see that we can now write ahoy('everybody') directly, instead of having to write greeting_module.ahoy('everybody'). And if we wanted to access both functions this way, we could import them both in one statement by changing the import line in your program to the following:

#!/usr/bin/env python
from greeting_module import ahoy, hello

Or, what if there were a lot of functions from the greeting_module we wanted to use, but didn't want to write out the full name? Rather than writing out all of the function names to import individually (there could be a lot of them), we can use the asterisk wildcard (*) symbol to refer to them.

#!/usr/bin/env python
from greeting_module import *
hi = ahoy('everybody')
hi2 = hello('everybody')
print hi
print hi2

While this may be useful if we are familiar with the contents of the module, including all of the names inside, there are a few reasons to be careful about using the from modulename import * syntax. First, if the module contains a lot of variables that we don't need to use, we will needlessly allocate memory to storing the information. Second, and perhaps more importantly, if the module being imported contains variables with the same names as those inside your program, you will lose access to the original values of those variables. For example, you might have a problem if your program and the imported module each define distinct functions called hello(). If instead you use the syntax import yourmodule, then you can call the function in your program using hello() and you can call the function in the module using yourmodule.hello(). If you want to import a whole module, but don't want to type out its full name every time, you can use the syntax: import a_long_module_name as mname.
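
To make the name-collision problem concrete, here is a small sketch using the standard math module. It imports one name explicitly rather than using the asterisk, but a wildcard import rebinds names in the same way. The local sqrt function and the variables first_answer and second_answer are made up for this illustration.

```python
#!/usr/bin/env python
import math

# our own function, which happens to share a name with math.sqrt
def sqrt(x):
    return 'my own sqrt was called with %s' % x

first_answer = sqrt(16)
print(first_answer)       # our function answers

from math import sqrt     # the name sqrt now points at math's version
second_answer = sqrt(16)
print(second_answer)      # 4.0: our own sqrt is no longer reachable

print(math.sqrt(16))      # 4.0: the module prefix is never ambiguous
```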

Finally, you can also import variables from modules and assign them new names in your program using the syntax from modulename import variablename as newvariablename.
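
A quick sketch of both renaming forms, using standard library modules. The short names col and circle_constant are arbitrary choices for this example.

```python
#!/usr/bin/env python
# give a whole module a shorter name
import collections as col
# give a single imported name a new name
from math import pi as circle_constant

c = col.Counter('GATTACA')
print(c['A'])           # 3
print(circle_constant)
```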

Where to Store Your Modules: using PYTHONPATH

Over time, you'll end up accumulating lots of these modules, and they'll tend to fall together in meaningful collections. For example, you might have one module for all your functions related to reading and parsing files and another for common sequence-related tasks. Python keeps its own modules installed in a system directory that you may or may not have access to on a remote server. Therefore, it's simpler to create your own Python modules directory and then let your operating system environment know about it. Here, I accomplish this by placing my modules in /home/mel/python_modules and then adding a few lines to the .bash_profile file in my home directory with the following terminal commands:

$ echo 'PYTHONPATH=~/python_modules' >> ~/.bash_profile
$ echo 'export PYTHONPATH' >> ~/.bash_profile
$ source ~/.bash_profile

NOTE: ~ is a shortcut to your own full home path: e.g. '/Users/mel/'. (Remember, you can see what your home path is called in Terminal by typing pwd from your home folder.)

And with that, any .py file that ends up in this directory can be imported as a module from any of your programs. And though this is a good final resting place for your polished modules, you can also prototype them by simply saving them in your current working directory, and moving them over when you're happy with them.
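
One way to check that the setting took effect is to look at sys.path, the list of directories Python searches when it imports a module. Directories named in PYTHONPATH are added to it. This is only a sketch: it assumes the ~/python_modules directory from the commands above, and the variable name modules_dir is my own.

```python
#!/usr/bin/env python
import os
import sys

# expand ~ into the full path of your home directory
modules_dir = os.path.expanduser('~/python_modules')
print(modules_dir)

# True if Python will search that directory for modules
print(modules_dir in sys.path)
```

If the second line prints False, open a new Terminal window or run source ~/.bash_profile again and retry.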


Exercises

1. Practice with functions

Make a function that:

A) Takes an integer x as input and prints x * 2.

B) Takes integers x and y as input and prints x * y.

C) Takes a list xs as input and prints xs[0] * xs[1].

D) Modify the above programs so that the function returns the result instead of printing it, and the output is then printed by the program that called the function.

2. What happens in functions doesn't always stay in functions

As promised, most things that happen in functions stay in the functions, but there are important exceptions. Make the following functions, which should illustrate this property:

A) The function takes an integer as input and increments the integer by one using the '+=' operator. Print the value of the integer before and after the function is called.

B) The function takes a list as input and changes the first element of the list to the string 'x'. Print the value of the list before and after the function is called.

C) The function takes a dictionary as input and adds the key 'x' with value 'y' to this dictionary. Print the dictionary before and after the function is called.

3. Reverse Complement

A) Write a function that takes a DNA sequence as an argument, ensures that the sequence is in capital letters, and then returns the reverse complement of the sequence.

B) Modify the function to ensure that only the characters A, T, G, C and N (for unknown nucleotide) are in the input sequence.

4. Making a module

Create a directory in your PythonCourse directory called pylib, then add it to your PYTHONPATH. Create a module in this directory called exercises.py (so that it can be imported as exercises). Put your functions from Exercise 1 into this module. Now write two programs that import and call all of the functions in the module, one for each of these ways:

A) A program that uses the line import exercises.

B) A program that uses the line from exercises import *. What happens if you have print statements in the module? Are they printed when you use the from statement?

C) Add your reverse complement function from Exercise 3 to this module.

5. Make a FASTA parser

Starting with your script from this morning, make a function that takes a FASTA file as input, reads through the file using open(), distinguishes between ID-containing lines and sequence-containing lines, and returns a dictionary with gene IDs as keys and sequences as values. Put this function along with your reverse complement function into a module.

Copy and paste the following lines into a file called testFasta.fa. Create a program that imports the module and prints the sequence corresponding to the gene ID 'gene3.'

6. (Bonus) Create an ORF finder

For our purposes, we will define an open reading frame (ORF) as a start codon followed at some distance by a stop codon in the same frame. This program should take a dictionary from a parsed FASTA file, as in (5), as input and output a dictionary of gene name->ORF sequence key-value pairs. If the sequence does not contain an ORF, then the gene name should not be in the dictionary.

7. For This and Giggles.

Try out the following code:

#!/usr/bin/env python
import this

#!/usr/bin/env python
import antigravity


Solutions

1. Practice with Functions

#!/usr/bin/env python
# a) Takes an integer x as input, prints x*2 (x multiplied by 2)
def timestwo(x):
 print '%.0f multiplied by 2 is %.0f' % (x, x * 2)
num = float(raw_input('Input number to multiply by 2: '))
x = timestwo(num)
# b) Takes integers x and y as input, prints x * y
x=int(raw_input('First number: '))
y=int(raw_input('Second number: '))
print 'You entered the numbers %i and %i.' %(x,y)
def product(x,y):
 print "The product of the first two numbers is %.0f." % (x*y)
multiplied = product(x,y)
# c) Takes a list x as input, prints x[0] * x[1]
listOfNumbers = [2,3,3,4]
def product(x):
 result = x[0] * x[1]
 print 'You supplied the list: %s' % (x)
 print 'The product of the first two numbers in the list is %.0f.' % (result)
multipliedNumbers = product(listOfNumbers)
print multipliedNumbers # returns None
# d) Modify the above programs so that the function returns
# the result instead of printing it. This result is then
# printed by the program that called the function.
listOfNumbers = [2,3,3,4]
def product(xs):
 result = xs[0] * xs[1]
 print 'You supplied the list: %s' % (xs)
 return result
multipliedNumbers = product(listOfNumbers)
print 'The product of the first two numbers in the list is %.0f, but this time we returned the result from the function.' % (multipliedNumbers)

2. What happens in functions doesn't always stay in functions

# a) The function takes an integer as input, and it increments that integer
#    by one using the '+=' operator. Print the value of the integer before
#    and after the function is called.
def increment(numToIncrement):
    numToIncrement += 1
numberToIncrement = 5
print 'The number to increment was', numberToIncrement
increment(numberToIncrement)
print 'The number is still', numberToIncrement
# b) The function takes a list as input, and it changes the first element
#    of the list to the string 'x'.
#    Print the value of the list before and after the function is called.
def modifyList(x):
    x[0] = 'overwrite'
stringlist = ['1', '33', '5', 'dog'] # could have used list of integers,
                                     #  or any type of list
print 'The list was', stringlist
modifyList(stringlist)
print 'Now the list is', stringlist
# c) The function takes a dictionary as input, and it adds the key 'x'
#    with value 'y' to this dictionary.
#    Print the dictionary before and after the function is called.
def appendToDict(Dict_with_a_new_name):
    Dict_with_a_new_name['x'] = 'y'
Dict = {}
Dict['0'] = 'zero'
Dict['1'] = 'one'
Dict['2'] = 'two'
print 'Before:', Dict
appendToDict(Dict)
print 'After:', Dict

3. Reverse Complement

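# 3 Reverse Complement
def revComp(seq):
    seq=seq.upper() # Makes seq uppercase
    seq=seq[::-1] # Reverses seq
    seq=seq.replace('A','t') # Replace ACGT with lowercase complement
    seq=seq.replace('T','a') #  (lowercase, so that a base we already
    seq=seq.replace('C','g') #  swapped does not get swapped back)
    seq=seq.replace('G','c')
    seq=seq.upper() # Make seq uppercase again
    # Part B: remove every valid character; anything left over is improper
    isitempty = seq
    for nt in 'ATGCN':
        isitempty = isitempty.replace(nt, '')
    if isitempty != "":
        print "Careful, improper characters!"
    return seq
##Let's try it out!
seq = 'agct' # example input
print 'Input sequence is',seq
print 'The reverse complement is',revComp(seq)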
def revCompIterative(watson):
    complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C', 'N':'N'}
    watson = watson.upper()
    watsonrev = watson[::-1]
    crick = ""
    for nt in watsonrev:
        crick += complements[nt]
    return crick
print revComp("aTNrg")

4. Make a module

# Make a directory in /Users/[username]/PythonCourse/pylib
# Open a new terminal window and type the following:
# cd ~
# echo "PYTHONPATH=~/PythonCourse/pylib" >>.bash_profile
# echo "export PYTHONPATH" >>.bash_profile
# source .bash_profile
# Create a file called exercises.py in the pylib folder,
# copy in your timestwo() function
# To verify it worked, try part a
#Part a
import exercises
print exercises.timestwo(4) # or whatever your function was called
#Part b --note, this should be run separately from part a
from exercises import timestwo
print timestwo(6)
#Part c
#Copy the reverse complement function from problem 3 to
# PythonCourse/pylib/
import exercises
print exercises.revComp('agct')

5. Make a FASTA parser

Here is the module:
def fastaParser(filename):
    current_gene = ""
    genes = {}
    fh = open(filename, 'r')
    for line in fh:
        line = line.strip()
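        if line.startswith('>'):
            # ID line: start a new, empty entry for this gene
            current_gene = line[1:]
            genes[current_gene] = ''
        else:
            # sequence line: add it to the gene we are currently reading
            genes[current_gene] += line
    fh.close()
    return genes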
And here is where I call the FASTA parser from a different script:
##Use the fastaParser to parse seq.FASTA
from sequence_tools import fastaParser
x = fastaParser('seq.FASTA')
print x

6. BONUS-Create an ORF Finder

import sequence_tools
import sys
def find_orfs(sequence):
 """ Finds all valid open reading frames in the string 'sequence',
 and returns them as a list"""
 starts = find_all(sequence, 'ATG')
 stop_amber = find_all(sequence, 'TAG')
 stop_ochre = find_all(sequence, 'TAA')
 stop_umber = find_all(sequence, 'TGA')
 print starts, stop_umber
 # sort the stops so that the first in-frame stop we meet is the nearest one
 stops = sorted(stop_amber + stop_ochre + stop_umber)
 ##print stops
 orfs = []
 for start in starts:
    for stop in stops:
       if start < stop and (start - stop) % 3 == 0: # Stop is in-frame
          # the +3 includes the stop codon
          orfs.append(sequence[start:stop+3])
          # break out of the inner for loop
          # when we hit the first stop codon
          break
 return orfs
def find_all(sequence, subsequence):
 ###Returns a list of indexes within sequence that are the start of subseq
 start = 0
 idxs = []
 next_idx = sequence.find(subsequence, start)
 while next_idx != -1:
    idxs.append(next_idx) # Record where this match starts
    start = next_idx + 1 # Move past this on the next time around
    next_idx = sequence.find(subsequence, start)
 return idxs
fname = sys.argv[1] # Read in from the first command-line argument
genedict = sequence_tools.fastaParser(fname)
orfdict = {}
for gene in genedict:
   gene_seq = genedict[gene]
   orfs = find_orfs(gene_seq)
   if len(orfs) > 0:
     orfdict[gene] = orfs
print orfdict

7. For This and Giggles

import this should print 'The Zen of Python' to the Terminal.

import antigravity opens your web browser and points it to the xkcd comic about import antigravity (a bit meta, no?)