A hands-on Python tutorial

This tutorial guides you through the basics of Python. You will solve three problems that prepare you for tackling computer vision tasks with this language.

Part 1. Python basics

Python is a very simple language. To define a variable, simply use the equals sign:

x = 3

Then, you can print the value of the variable with the print() function:

print(x)

Standard mathematical operations are done with the +, -, * and / operators.
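
As a small illustration of these operators and of round(), the sketch below uses made-up numbers for a hypothetical town (the variable names are our own), not the figures from the task that follows.

```python
# Hypothetical numbers chosen only for illustration
population = 1200
share_of_children = 0.175  # 17.5 %

# Multiplying an integer by a float gives a float
children = population * share_of_children

# round() returns the nearest whole number
children_rounded = round(children)
print(children_rounded)

# Parentheses control the order of evaluation
average = (3 + 5 + 10) / 3.0
print(average)
```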

Task 1.1. In the cell below, compute how many foreign residents Switzerland had in 2015, given that the total number of inhabitants was 8327126 and the fraction of foreigners was 24.6%. Hint: use the round() function to remove decimals from your result.

In [ ]:

Result: 2048473

To define a function, use syntax as in the following example:

def compute_number_of_days(age):
    # this function roughly computes the number of days a person has lived
    days = age * 365
    return days

The function returns the value written after the return keyword and stops executing at that point. To add a comment to your code (text that is not executed and only serves to make the code more readable), use the hash sign (#). To call a function after its definition, type its name and pass its arguments in parentheses, as in the example below.

days = compute_number_of_days(22)

Functions can also be called from inside another function:

def print_congratulations(age, name):
    days = compute_number_of_days(age)
    print('Hello ' + name + '! You have already lived on this planet for ' + str(days) + ' days!')

In this example, we use + to concatenate strings. days is not a string but a number, so we use the str function to convert it to a string before concatenation.

If you call

print_congratulations(22, 'Nikolay')

the function will print

Hello Nikolay! You have already lived on this planet for 8030 days!
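
As an alternative to concatenating with + and str(), the same message can be built with the string method format(), which converts numbers for you. The sketch below is a hypothetical variant (make_congratulations is our own name) that returns the message instead of printing it, which makes it easier to reuse.

```python
def compute_number_of_days(age):
    # Same rough estimate as in the tutorial: leap years are ignored
    return age * 365

def make_congratulations(age, name):
    # Hypothetical variant that returns the message instead of printing it
    days = compute_number_of_days(age)
    # Each {} is replaced by the corresponding argument of format()
    return 'Hello {}! You have already lived on this planet for {} days!'.format(name, days)

message = make_congratulations(22, 'Nikolay')
print(message)
```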

Another important concept of Python is lists. Lists are similar to arrays in other languages, and are used to aggregate multiple (dis)similar objects in a sequence. It is very easy to define them as in the following example:

ages_parents = [51, 52]
ages_children = [2, 4, 10]

This code creates two lists. The first list stores the ages of the parents, and the second stores the ages of their kids. You can easily concatenate lists with the plus sign:

ages_family = ages_parents + ages_children
print(ages_family)

This code will print

[51, 52, 2, 4, 10]
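
It may help to know that + builds a new list and leaves the two original lists untouched. A minimal check:

```python
ages_parents = [51, 52]
ages_children = [2, 4, 10]

# Concatenation creates a third list
ages_family = ages_parents + ages_children

print(ages_family)    # the combined list
print(ages_parents)   # the original list is unchanged
```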

Another important structure in Python is the for loop, which is used to repeatedly perform the same operation. You can run a for loop over each element of a list in the way shown below:

total_age = 0
for age in ages_family:
    total_age = total_age + age
print(total_age)

This code will print 119, which is the total age of the people in the family. Note that you must indent the commands inside the for loop. The end of the loop body is defined implicitly by returning to the previous indentation level. This is also the case for other Python structures, such as function definitions, while loops etc.
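
For this particular pattern, Python also offers the built-in sum() function. The sketch below runs the loop and compares its result with sum(); it is meant only as a cross-check, not as a replacement for understanding the loop.

```python
ages_family = [51, 52, 2, 4, 10]

total_age = 0
for age in ages_family:
    # The indented line is the loop body and runs once per element
    total_age = total_age + age

# sum() is a Python built-in that adds up the elements of a list
print(total_age == sum(ages_family))
```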

However, you can also iterate with an index instead of over the list elements. This is useful in many cases, for example, when you have a second list that corresponds to the first one. Consider the following example:

ages_family = [51, 52, 2, 4, 10]
names = ['Patrick', 'Maria', 'Emma', 'Jordi', 'Vasiliy']

We have two lists, the first containing the ages of family members and the second containing their names. To find the length of a list, you can use the len() function. For example,

print(len(names))

will print 5. Knowing the length of the list, you can iterate over it by using indexing. Consider the following example:

for i in range(0, len(names)):
    current_name = names[i]
    current_age = ages_family[i]
    print_congratulations(current_age, current_name)

Here, we use [i] to access the i-th element of the list. The above code will print the following:

Hello Patrick! You have already lived on this planet for 18615 days!
Hello Maria! You have already lived on this planet for 18980 days!
Hello Emma! You have already lived on this planet for 730 days!
Hello Jordi! You have already lived on this planet for 1460 days!
Hello Vasiliy! You have already lived on this planet for 3650 days!
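
When two lists correspond element by element, the built-in zip() function is a common alternative to indexing. The sketch below collects (name, days) pairs in a new list using the list method append(); the name days_by_name is our own and not part of the tutorial.

```python
ages_family = [51, 52, 2, 4, 10]
names = ['Patrick', 'Maria', 'Emma', 'Jordi', 'Vasiliy']

days_by_name = []
# zip() pairs up the first elements, then the second elements, and so on
for name, age in zip(names, ages_family):
    days_by_name.append((name, age * 365))

print(days_by_name)
```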

You can also nest loops inside each other, but be sure to indent properly and to use a different index variable for each loop. Have a look at the next example:

for i in range(3, 7):
    for j in range(100, 103):
        mul = i * j
        print('If you multiply ' + str(i) + ' by ' + str(j) + ', you get ' + str(mul))

It will print

If you multiply 3 by 100, you get 300
If you multiply 3 by 101, you get 303
If you multiply 3 by 102, you get 306
If you multiply 4 by 100, you get 400
If you multiply 4 by 101, you get 404
If you multiply 4 by 102, you get 408
If you multiply 5 by 100, you get 500
If you multiply 5 by 101, you get 505
If you multiply 5 by 102, you get 510
If you multiply 6 by 100, you get 600
If you multiply 6 by 101, you get 606
If you multiply 6 by 102, you get 612

Task 1.2. For a triangle with base $b$ and height $h_b$, the area is computed as $A=\frac{h_b b}{2}$. You have 100 triangles with base sizes growing linearly from 1 to 100 and heights decreasing from 150 to 51. Write code to compute the total area of all the triangles. You need to define a function that computes the area of one triangle, and then use a for loop to compute the total sum. A hint: when defining the function for area calculation, divide not by 2 but by 2.0. This triggers type coercion so that a floating-point division is performed instead of an integer division. E.g. in Python 2, 5 / 2 = 2, but 5 / 2.0 = 2.5. In Python 3, / always performs floating-point division, so both expressions give 2.5.

In [ ]:

Result: 212100
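
The division hint in Task 1.2 can be checked directly. The two expressions below give the same results in Python 2 and Python 3, because they state the intended kind of division explicitly.

```python
# Dividing by a float gives a float in both Python 2 and Python 3
half = 5 / 2.0

# The // operator performs floor division on integers in both versions
floored = 5 // 2

print(half)
print(floored)
```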

Part 2. Working with matrices

The real power of Python is revealed when it comes to packages. Packages can be thought of as libraries and provide code and functions for specific applications. One of the most important and popular packages in Python is numpy. It is designed for scientific computing and has excellent support for matrix operations that are essential in Computer Vision. It is also well-documented online (e.g. https://docs.scipy.org/doc/numpy-1.13.0/reference/), with detailed explanations of functions, arguments, concepts etc.

To get started with numpy, you first have to import this package. This is done as follows:

import numpy as np

This will import the numpy package under the name 'np'. For example, to generate a random number, one has to call

x = np.random.rand()

This calls the rand() function from the random subpackage, which returns a random number drawn uniformly from the interval [0, 1).
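
Random numbers change on every run, which can make results hard to compare. One common option, shown here as a sketch, is to fix the seed with np.random.seed() before drawing; the seed value 0 is arbitrary.

```python
import numpy as np

# Fixing the seed makes the pseudo-random sequence reproducible
np.random.seed(0)
x = np.random.rand()

# Resetting the same seed restarts the same sequence
np.random.seed(0)
y = np.random.rand()

print(x == y)
```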

To define a numpy matrix, the following syntax can be used:

A = np.matrix([[1, 2], [3, 4], [5, 6]])

This will create the following matrix: $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$.

If you are interested in the dimensions of the matrix, use the shape() function. For example,

print(np.shape(A))

will print (3, 2).

To define a vector, use the array constructor, as follows:

b = np.array([1, 2])

To left-multiply a vector by a matrix, use the dot() operation:

r = A.dot(b)
print(r)

will print

[[ 5]
 [11]
 [17]]

Please note that this results in a matrix of shape (3, 1). To transpose a matrix, use the .T attribute:

print(r.T)

will print

[[ 5 11 17]]

To find the maximal value in the array, you can use the max() function from numpy:

print(np.max(r))

will print

17

To find the index of the element that contains the maximal value, use the argmax() function:

print(np.argmax(r))

will print

2

This is because indices in Python are zero-based, and the last element of a vector of length 3 has index 2.
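
The operations of this part can also be tried with plain np.array objects. This is our own variation, not the tutorial's code: NumPy's documentation no longer recommends np.matrix for new code, and writing the vector explicitly as a 2-D column makes the (3, 1) shape of the result unambiguous.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
# Column vector written explicitly as a 2-D array of shape (2, 1)
b = np.array([[1], [2]])

r = A.dot(b)          # matrix-vector product, shape (3, 1)
print(r.shape)
print(r.T)            # transposed result, shape (1, 3)
print(np.max(r))      # largest value in r
print(np.argmax(r))   # zero-based index of the largest value
```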

Task 2.1. You have 50 warehouses that store 3 products with the prices CHF 3, 5, and 1 per product item. The quantity of each product in each warehouse is a random variable uniformly sampled between 1 and 10. Find the warehouse that has the highest value of goods in total. We suggest the following steps to solve the problem:

  • Define a random matrix of product quantities. For that, use the numpy function np.random.randint(vmin, vmax, (rows, cols)), where vmin is the smallest value that can be drawn, vmax is the exclusive upper bound (the largest possible value is vmax - 1), and rows and cols define the dimensions of the matrix.
  • Define the vector of prices.
  • Compute the dot product between the matrix and the vector.
  • Use argmax() to compute the index of the warehouse with the highest value of goods.
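
Before solving the task, it may help to try np.random.randint on a small, unrelated example. The sketch below draws a hypothetical 4 x 2 matrix; the seed is arbitrary, and the upper bound 11 is excluded, so the values lie between 1 and 10.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only to make the run repeatable

# Integers from 1 up to and including 10: the upper bound is exclusive
quantities = np.random.randint(1, 11, (4, 2))

print(quantities.shape)
print(quantities.min() >= 1, quantities.max() <= 10)
```
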
In [ ]:

Part 3. Working with images

This part will be rather passive. We will guide you through steps for building a naive object detector. Your goal is to follow the code and understand what is happening in every line. Ask a TA if you do not understand something.

We start by loading the needed packages.

In [ ]:
import numpy as np
# This is another way of loading a package. You can use it when you don't need
# to load the whole package, but only some parts of it. We need misc for
# reading an image file
from scipy import misc
# This is a plotting tool for python. We will use it for image visualization
import matplotlib.pyplot as plt
# This is a special command for Jupyter notebook. It forces plots to be refreshed
# when you recompute a cell
%matplotlib inline 

Now we are ready to load the picture and plot it! A loaded image is a matrix of size (height, width, 3), where the 3 comes from the three color channels (red, green and blue). By default, the values are integers between 0 and 255. We will divide them by 255 to obtain a matrix with values in the range [0, 1].
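
The scaling step can be tried on a small synthetic array first. The 2 x 2 image below is made up for illustration and stands in for a loaded picture.

```python
import numpy as np

# Synthetic 2 x 2 RGB image with 8-bit integer values
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(image.shape)   # (height, width, 3)

# Dividing by 255.0 converts the values to floats between 0 and 1
scaled = image / 255.0
print(scaled.min(), scaled.max())
```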

In [ ]:
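# Note: misc.imread was removed in SciPy 1.2; with newer versions, plt.imread is a possible replacement
clown = misc.imread('/home/cvcourse/pics/clown.jpg')
clown = clown / 255.0
plt.imshow(clown)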

We will implement a naive red nose detector. It just detects the red-most pixel in the image. During the course, you will learn more sophisticated ways of solving similar tasks. We present this instructional method to show you the pythonic way of solving matrix-related problems.

We will introduce another image that is just filled with red color. Then, we will compute the difference between the clown picture and the red one and will take the pixel with the smallest difference as the point where the nose is.

First, let us define an image that is just red. As images are encoded with red, green, and blue channels, we create a matrix of the same size as the clown picture. We set the red channel to 1, and green and blue to 0 (that happens by default as we initialize the whole matrix with zeros).

In [ ]:
red = np.zeros(np.shape(clown))
# Setting red channel (that has index 0) to 1
red[:,:,0] = 1

Now we want to compare pixels between the two images. We will use the standard Euclidean distance for that:

$d_{ij} = \sqrt{\sum_{k=1}^{3}(s_{ijk} - t_{ijk})^2}$

Every element $d_{ij}$ of the distance matrix $D$ is just a Euclidean distance between pixel values for all 3 color channels in two compared images $S$ and $T$.

To compute this in a pythonic way, we first simplify it by splitting it into two steps:

$r_{ijk} = s_{ijk} - t_{ijk}$

$d_{ij} = \sqrt{\sum_{k=1}^{3}r_{ijk}^2}$

This is exactly the same math, but the beauty of this approach is that one can compute both values with very simple matrix operations. To compute $r$, one simply has to subtract the matrices from each other. The definition of $d_{ij}$ is just a definition of Euclidean norm, for which numpy has a function. So the whole computation can be done with 2 lines of code.

In [ ]:
r = clown - red
# The np.linalg.norm function computes the norm of a vector. We are giving it a tensor of size (200, 185, 3).
# By default, it will give one number that will be the norm of all items in this tensor. However, if we provide
# the axis argument, the function will only compute norm in the given dimension. If we set axis to 2, we get a
# matrix of size (200, 185), every element of which is a norm of (r, g, b) values of a corresponding pixel.
d = np.linalg.norm(r, axis=2)
# When given a matrix with values, the imshow function color-maps them.
plt.imshow(d)
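
To see what the axis argument does, the following sketch applies the same two steps to a hypothetical 1 x 2 "image" and compares np.linalg.norm with the formula written out by hand. The names s, t and d_manual are our own.

```python
import numpy as np

# Hypothetical 1 x 2 image: one pure red pixel and one pure green pixel
s = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])

# Reference image: pure red everywhere
t = np.zeros(s.shape)
t[:, :, 0] = 1

r = s - t
# One Euclidean distance per pixel, computed over the color axis
d = np.linalg.norm(r, axis=2)
# The same formula written out: square, sum over colors, square root
d_manual = np.sqrt(np.sum(r ** 2, axis=2))

print(d)
print(np.allclose(d, d_manual))
```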

Now we can detect the pixel with the lowest value in the matrix (that will be the pixel that corresponds to the red-most pixel in the original image). The argmin function returns the index of the matrix element that has the lowest value. By default this is an index into the flattened matrix, a number from 0 to $N-1$, where $N$ is the number of elements in the matrix. To convert it to (row, column) coordinates, that is (y, x), use the unravel_index function.

In [ ]:
# Flat index of the smallest distance, i.e. of the red-most pixel
minind = np.argmin(d)
# We obtain the height and width of the image
clown_hw = np.shape(clown)[0:2]
(y, x) = np.unravel_index(minind, clown_hw)
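
The relation between the flat index and the (row, column) pair is easiest to see on a small matrix. The values below are made up for illustration.

```python
import numpy as np

d = np.array([[0.9, 0.8, 0.7],
              [0.6, 0.1, 0.5]])

# Index of the smallest value in the flattened (row by row) matrix
flat_index = np.argmin(d)
# Converted back to (row, column), i.e. (y, x)
(y, x) = np.unravel_index(flat_index, d.shape)

print(flat_index, y, x)
```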

The scatter function can be used to draw a blue point on the picture that marks the pixel our detector has found.

In [ ]:
plt.imshow(clown)
plt.scatter(x, y)

As you can see, the detector we have built is far from perfect: it just returns an arbitrary point on the clown's nose. If there were other red objects in the picture, we could no longer guarantee that the point would lie on the nose. During the course you will learn much more principled ways of image processing.
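
This limitation can be reproduced on a synthetic image. The sketch below wraps the steps of this part in a hypothetical helper (find_reddest_pixel is our own name) and applies it to a made-up 3 x 3 grey image that contains an intended target and a second, slightly redder pixel. The detector picks the redder pixel, wherever it is.

```python
import numpy as np

def find_reddest_pixel(image):
    # Hypothetical helper: distance of every pixel to pure red, then argmin
    red = np.zeros(image.shape)
    red[:, :, 0] = 1
    d = np.linalg.norm(image - red, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)  # (y, x)

# Synthetic 3 x 3 grey image
image = np.full((3, 3, 3), 0.5)
image[1, 1] = [0.9, 0.1, 0.1]   # intended target, the "nose"
image[0, 2] = [1.0, 0.0, 0.0]   # another, redder object

y, x = find_reddest_pixel(image)
print(y, x)
```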

Good luck!