A hands-on Python tutorial

This tutorial will guide you through the basics of Python. We will present 3 exercises that will leave you well-equipped for solving computer vision tasks with this language.

Exercise 1. Python basics

Python is a very simple language. To define a variable, simply use the equals sign:

x = 3

Then, you can print the value of the variable with the print() function:

print(x)

Standard mathematical operations are done with the +, -, / and * operators.
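
As a quick illustration of these operators, here is a minimal sketch with made-up numbers (a hypothetical class of 24 students, not part of the tutorial data). It assumes Python 3 and also shows round(), which the next task relies on.

```python
# Hypothetical numbers chosen only for illustration
students = 24
absent_fraction = 0.125

present = students - students * absent_fraction  # subtraction and multiplication
groups = students / 4                            # division
total = students + 6                             # addition

print(present)         # 21.0
print(round(present))  # 21
print(total)           # 30
```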

Task 1.1. In the cell below, compute how many foreign residents Switzerland had in 2015 if the total number of inhabitants was 8327126 and the fraction of foreigners was 24.6%. Hint: use the round() function to remove decimals from your result.

In [2]:
# SOLUTION HERE

Result: 2048473

To define a function, use syntax as in the following example:

def compute_number_of_days(age):
    # this function roughly computes the number of days a person has lived
    days = age * 365
    return days

The function returns the value written next to the return keyword and stops executing afterwards. To add a comment to your code (text that is not executed and only serves to make the code more readable), use the hash sign (#). To call a function, write its name followed by the arguments in parentheses, as in the example below.

days = compute_number_of_days(22)

Functions can also be called from inside of another function:

def print_congratulations(age, name):
    days = compute_number_of_days(age)
    print('Hello ' + name + '! You have already lived on this planet for ' + str(days) + ' days!')

In this example, we use + to concatenate strings. days is not a string but a number, so we first convert it to a string with the str() function.

If you call

print_congratulations(22, 'Nikolay')

The function will print

Hello Nikolay! You have already lived on this planet for 8030 days!

Another important concept in Python is the list. A list is similar to an array in other languages and is used to store multiple items of similar information. Lists are easy to define, as you can see in the following example:

ages_parents = [51, 52]
ages_children = [2, 4, 10]

This code creates 2 lists. In our example, the first list stores the ages of the parents, and the second stores the ages of their kids. You can easily concatenate lists with the plus sign:

ages_family = ages_parents + ages_children
print(ages_family)

This code will print

[51, 52, 2, 4, 10]

The last important thing that you need to know about Python is the for loop, which is used to repeat the same operation. You can run a for loop over each element of a list as shown in the example below:

total_age = 0
for age in ages_family:
    total_age = total_age + age
print(total_age)

This code will print 119, which is the total age of the people in the family.
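
For this particular pattern, Python also offers the built-in sum() function. The sketch below repeats the loop and checks that both approaches agree; the list is redefined so that the snippet runs on its own.

```python
ages_family = [51, 52, 2, 4, 10]

# Explicit loop, as in the example above
total_age = 0
for age in ages_family:
    total_age = total_age + age

# Built-in alternative that adds up all elements of a list
total_age_builtin = sum(ages_family)

print(total_age, total_age_builtin)  # 119 119
```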

However, you can also iterate over indices instead of list elements. This is useful in many cases, for example, when you have a second list that corresponds to the first one. Consider the following example:

ages_family = [51, 52, 2, 4, 10]
names = ['Patrick', 'Maria', 'Emma', 'Jordi', 'Vasiliy']

We have two lists, the first containing the ages of family members and the second containing their names. To find out the length of a list, you can use the len() function. For example,

print(len(names))

will print 5. Knowing the length of the list, you can iterate over it by using indexing. Consider the following example:

for i in range(0, len(names)):
    current_name = names[i]
    current_age = ages_family[i]
    print_congratulations(current_age, current_name)

Here, we use [i] to access the i-th element of the list. The loop will print the following:

Hello Patrick! You have already lived on this planet for 18615 days!
Hello Maria! You have already lived on this planet for 18980 days!
Hello Emma! You have already lived on this planet for 730 days!
Hello Jordi! You have already lived on this planet for 1460 days!
Hello Vasiliy! You have already lived on this planet for 3650 days!
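
When the index itself is not needed, the built-in zip() function is a common alternative that pairs up the elements of two lists. The sketch below is self-contained, so it redefines the helper from earlier; make_congratulations is a hypothetical variant (not part of the tutorial) that returns the message instead of printing it.

```python
def compute_number_of_days(age):
    # Roughly computes the number of days a person has lived
    return age * 365

def make_congratulations(age, name):
    # Hypothetical helper: builds the message instead of printing it
    days = compute_number_of_days(age)
    return 'Hello ' + name + '! You have already lived on this planet for ' + str(days) + ' days!'

ages_family = [51, 52, 2, 4, 10]
names = ['Patrick', 'Maria', 'Emma', 'Jordi', 'Vasiliy']

messages = []
# zip() yields (name, age) pairs, one pair per position in the lists
for current_name, current_age in zip(names, ages_family):
    messages.append(make_congratulations(current_age, current_name))

for message in messages:
    print(message)
```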

You can also nest loops inside each other, but be sure to use proper indentation for each loop. Have a look at the next example:

for i in range(3, 7):
    for j in range(100, 103):
        mul = i * j
        print('If you multiply ' + str(i) + ' by ' + str(j) + ', you get ' + str(mul))

It will print

If you multiply 3 by 100, you get 300
If you multiply 3 by 101, you get 303
If you multiply 3 by 102, you get 306
If you multiply 4 by 100, you get 400
If you multiply 4 by 101, you get 404
If you multiply 4 by 102, you get 408
If you multiply 5 by 100, you get 500
If you multiply 5 by 101, you get 505
If you multiply 5 by 102, you get 510
If you multiply 6 by 100, you get 600
If you multiply 6 by 101, you get 606
If you multiply 6 by 102, you get 612

Task 1.2. For a triangle with base $b$ and height $h_b$, the area is computed as $A=\frac{h_b b}{2}$. You have 100 triangles with base sizes growing linearly from 1 to 100 and heights decreasing from 150 to 51. Write code to compute the total area of all the triangles. You need to define a function that computes the area of one triangle, and then use a for loop to compute the total sum. A hint: when defining the function for the area calculation, divide by 2.0 rather than by 2. In Python 2, this forces floating-point division: 5 / 2 = 2, but 5 / 2.0 = 2.5. In Python 3, the / operator always performs floating-point division, so 5 / 2 = 2.5 as well.

In [4]:
# SOLUTION HERE

Result: 212100
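
The division behaviour mentioned in the hint of Task 1.2 is that of Python 2. Assuming Python 3 is used instead, the short sketch below shows how the / and // operators behave there; it does not solve the task itself.

```python
# In Python 3, "/" is true division and "//" is floor division
true_div = 5 / 2
floor_div = 5 // 2
float_div = 5 / 2.0

print(true_div)   # 2.5
print(floor_div)  # 2
print(float_div)  # 2.5
```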

Exercise 2. Working with matrices

The real power of Python is revealed when it comes to packages. Packages can contain code for any specific problem. One of the most important and popular packages in Python is numpy. It is designed for scientific computing and has excellent support for matrix operations, which are essential in Computer Vision.

To get started with numpy, you first have to import this package. This is done as follows:

import numpy as np

This will import the numpy package under the name 'np'. For example, to generate a random number, one has to call

x = np.random.rand()

This calls the rand() function from the random namespace of numpy, which returns a random number uniformly sampled from the interval [0, 1).

To define a numpy matrix, the following syntax can be used:

A = np.matrix([[1, 2], [3, 4], [5, 6]])

This will create the following matrix: $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$.

If you are interested in the dimensions of the matrix, use the shape() function. For example,

print(np.shape(A))

will print (3, 2).

To define a vector, use the array constructor, as follows:

b = np.array([1, 2])

To multiply a matrix by a vector, use the dot() operation:

r = A.dot(b)
print(r)

will print

[[ 5]
 [11]
 [17]]

Please note that this results in a matrix of shape (3, 1). To transpose a matrix, use the .T attribute:

print(r.T)

will print

[[ 5 11 17]]

To find a maximal value in the array, you can use the max() function from numpy:

print(np.max(r))

will print

17

To find the index of the element that contains the maximal value, use the argmax() function:

print(np.argmax(r))

will print

2

This is because indexing in Python starts from 0, and the last element of the vector of length 3 has index 2.
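
The same steps can also be sketched with plain np.array objects instead of np.matrix. This is an alternative to the code above, not a replacement for it: with a 2-D array and a 1-D vector, dot() returns a 1-D array of shape (3,) rather than a matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 2-D array with 3 rows and 2 columns
b = np.array([1, 2])                    # 1-D vector of length 2

r = A.dot(b)         # matrix-vector product
print(r)             # [ 5 11 17]
print(np.shape(r))   # (3,)
print(np.max(r))     # 17
print(np.argmax(r))  # 2
```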

Task 2.1. You have 50 warehouses that store 3 items with the prices CHF 3, 5, and 1 per item. The quantity of each good in each warehouse is a random variable uniformly sampled between 1 and 10. Find the warehouse that has the highest total value of goods. We suggest the following steps to solve the problem:

  • Define a random matrix of quantities. For that, use the numpy function np.random.randint(vmin, vmax, (rows, cols)), where vmin is the smallest value that can be drawn, vmax is the exclusive upper bound (the largest value that can be drawn is vmax - 1), and rows and cols define the dimensions of the matrix.
  • Define the vector of prices.
  • Compute the dot product between the matrix and the vector.
  • Use argmax() to compute the index of the warehouse with the highest value of goods.

In [5]:
# SOLUTION HERE
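
One detail of np.random.randint is worth checking before solving the task: its upper bound is exclusive. The sketch below uses a hypothetical 4x2 matrix of dice rolls (unrelated to the warehouse data) and a fixed seed so that the check is repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only to make the run repeatable

# Values are drawn from {1, ..., 6}: the lower bound 1 is included,
# the upper bound 7 is not
rolls = np.random.randint(1, 7, (4, 2))

print(np.shape(rolls))   # (4, 2)
print(rolls.min() >= 1)  # True
print(rolls.max() <= 6)  # True
```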

Exercise 3. Working with images

This exercise will be rather passive. We will guide you through the steps for building a really simple object detector. Your goal is to follow the code and understand what is happening in every line. Ask a TA if you do not understand something.

We start by loading the needed packages.

In [5]:
import numpy as np
# This is another way of loading a package. You can use it when you don't need
# to load the whole package, but only some parts of it. We need misc for
# reading an image file
from scipy import misc
# This is a plotting tool for python. We will use it for image visualization
import matplotlib.pyplot as plt
# This is a special command for Jupyter notebook. It makes plots appear inline,
# directly below the cell that produces them
%matplotlib inline

Now we are ready to load the picture and plot it! When an image is loaded, it is a matrix of size (height, width, 3). The 3 comes from the fact that there are 3 color channels (red, green and blue). By default, the numbers are in the range between 0 and 255. We will divide them by 255 to obtain a matrix with values in the range [0, 1].
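
The scaling step can be tried on a tiny synthetic array first, without needing the image file. The 1x2 "image" below is a made-up example, not the clown picture.

```python
import numpy as np

# Hypothetical 1x2 image: one pure red pixel and one mid-grey pixel
tiny = np.array([[[255, 0, 0], [128, 128, 128]]], dtype=np.uint8)
tiny_scaled = tiny / 255.0  # values now lie between 0 and 1

print(np.shape(tiny_scaled))  # (1, 2, 3)
print(tiny_scaled[0, 0])      # [1. 0. 0.]
```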

In [8]:
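# Note: misc.imread was removed in SciPy 1.2. On newer installations,
# plt.imread is a possible substitute (an assumption about your setup).
clown = misc.imread('/home/cvcourse/images/clown.jpg')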
clown = clown / 255.0
plt.imshow(clown)
Out[8]:
<matplotlib.image.AxesImage at 0x107cd9250>

We will implement an extremely simple red nose detector. It is extremely naive and just detects the red-most pixel in the image. During the course, you will learn more efficient ways of solving such tasks. We present this method to show you the pythonic way of solving matrix-related problems.

We will introduce another image that is just filled with red color. Then, we will compute the difference between the clown picture and the red one and take the pixel with the smallest difference as the point where the nose is.

First, let us define an image that is just red. As images are encoded with red, green, and blue channels, we create a matrix of the same size as the clown picture. We set the red channel to 1, and green and blue to 0 (that happens by default as we initialize the whole matrix with zeros).

In [9]:
red = np.zeros(np.shape(clown))
# Setting red channel (that has index 0) to 1
red[:,:,0] = 1
plt.imshow(red)
Out[9]:
<matplotlib.image.AxesImage at 0x10830af50>

Now we want to compare pixels between the two images. We will use the standard Euclidean distance for that:

$d_{ij} = \sqrt{\sum_{k=1}^{3}(s_{ijk} - t_{ijk})^2}$

Every element $d_{ij}$ of the distance matrix $D$ is just the Euclidean distance between the pixel values across all 3 color channels in the two compared images $S$ and $T$.

To solve this equation in a pythonic way, we first simplify it by splitting it into two tasks:

$r = s - t$

$d_{ij} = \sqrt{\sum_{k=1}^{3}r_{ijk}^2}$

This is exactly the same math, but the beauty of this approach is that one can compute both values with very simple matrix operations. To compute $r$, one simply has to subtract the matrices from each other. The definition of $d_{ij}$ is just the definition of the Euclidean norm, for which numpy has a function. So the whole computation can be done with 2 lines of code.
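
Before applying this to the full image, the equivalence between the explicit formula and numpy's norm function can be checked on a tiny synthetic image. The 2x2 image below is a made-up example; np.linalg.norm with axis=2 computes the norm across the color channels of each pixel.

```python
import numpy as np

# Hypothetical 2x2 image with values already scaled to [0, 1]
s = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
              [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]])
t = np.zeros(np.shape(s))
t[:, :, 0] = 1  # pure red reference image

r = s - t
d_norm = np.linalg.norm(r, axis=2)            # norm over the color axis
d_explicit = np.sqrt(np.sum(r ** 2, axis=2))  # the formula written out

print(np.shape(d_norm))                 # (2, 2)
print(np.allclose(d_norm, d_explicit))  # True
print(d_norm[0, 0])                     # 0.0
```

The red pixel at position (0, 0) has distance 0 to the reference image, so it would be the one picked as the "red-most" pixel.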

In [10]:
r = clown - red
# The np.linalg.norm function computes the norm of a vector. We are giving it a tensor of size (200, 185, 3).
# By default, it will give one number that will be the norm of all items in this tensor. However, if we provide
# the axis argument, the function will only compute norm in the given dimension. If we set axis to 2, we get a
# matrix of size (200, 185), every element of which is a norm of (r, g, b) values of a corresponding pixel.
d = np.linalg.norm(r, axis=2)
# When given a matrix with values, the imshow function color-maps them.
plt.imshow(d)
Out[10]:
<matplotlib.image.AxesImage at 0x108782090>

Now we can detect the pixel with the lowest value in the matrix (this will be the pixel that corresponds to the red-most pixel in the original image). The argmin function returns the index of the matrix element that has the lowest value. This index refers to the flattened matrix, so it is a number from 0 to $N-1$, where $N$ is the number of elements in the matrix. To convert it to (x, y) coordinates, the unravel_index function needs to be used.

In [12]:
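# Flat index of the smallest distance, i.e. the red-most pixel
minind = np.argmin(d)
# We obtain the height and width of the image
clown_hw = np.shape(clown)[0:2]
# Convert the flat index to (row, column) coordinates
(y, x) = np.unravel_index(minind, clown_hw)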
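
To see what argmin and unravel_index return, here is a minimal sketch on a made-up 2x3 distance matrix (d_small is a hypothetical name, not part of the detector).

```python
import numpy as np

# Hypothetical distance matrix with 2 rows and 3 columns
d_small = np.array([[0.9, 0.7, 0.8],
                    [0.6, 0.5, 0.1]])

flat_index = np.argmin(d_small)  # index into the flattened matrix
(row, col) = np.unravel_index(flat_index, np.shape(d_small))

print(flat_index)  # 5
print(row, col)    # 1 2
```

The row index corresponds to y and the column index to x, which is why the result is unpacked as (y, x) in the cell above.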

The scatter function can be used for drawing a blue point in the picture to mark the nose pixel that our detector has identified.

In [13]:
plt.scatter(x, y)
plt.imshow(clown)
Out[13]:
<matplotlib.image.AxesImage at 0x1087908d0>

As you see, the detector we have built is not exactly perfect: it just returns a random point on the clown's nose. If there were other red objects in the picture, we could no longer guarantee that the point would be on the nose. During the course you will learn much more efficient ways of image processing.

Good luck!