In this course, we will review some mathematical concepts wich are very useful in computer science research.
To illustrate these concepts, we will use the power of (source) code.
As part of the course evaluation, you will also write some code (Luke, use the source), and obviously do some math exercises
The course language is Python. It could be matlab, R, or even Java, C, (or the language of preference), but I choose Python because:
If you don't known Python, don't worry. It's a nice opportunity to learn. There are a lot of very good material freely available on the interner.
I won't teach you Python (it's out of the scope of this course), but I can eventually help you with some language issues. We will review some import things before we startd
The course handouts are written using jupyter notebooks.
I'll make them available on the internet.
The course handouts (these notebooks) are written in English. If you can't read English, do be worried. You must so, thus start studing English now.
The intended audience are graduate students pursuing studies towards conclusind a M.Sc. or Ph.D. degree, which have a good backgroung in programming, but wish (or need) to revisit some math concepts.
The student’s prior programming experience can be in pretty much any programming language. Although the course is based in Python, no deep knowledge of the language is necessary.
The topics will not (intentionally) be covered in depth. Most of the content is already covered in undergraduate courses of Calculus, Linear Algebra, Discrete Mathmatics and Statistics. However, the intention is also to show a different, and more practical perspective.
Some python advantages for scientific computing
Consider the formula for the height y of a ball in vertical motion, with initial upward velocity $v_0$: $ y = v_0 t + \frac{1}{2}g t^2$, where $g$ is the acceleration of gravity and $t$ is time.
How long time does it take for the ball to reach the height $y_c$?
As this is a quadratic function, we can make $y_c = y$ and solve for $t$, geting:
$t_1 = \frac{v_0 - \sqrt{v_0^2 - 2 g y_c}}{g}$, $t_2 = \frac{v_0 + \sqrt{v_0^2 - 2 g y_c}}{g}$
A possible program would be
v0 = 5 //arbitrarily chosen
g = 9.81 // constant
yc = 0.2 // arbitrailiry choson
import math
t1 = (v0 - math.sqrt(v0**2 - 2*g*yc))/g
t2 = (v0 + math.sqrt(v0**2 - 2*g*yc))/g
print 'At t=%g s and %g s, the height is %g m.' % (t1, t2, yc)
import math
print math.sqrt(2)
from math import sqrt
print sqrt(2)
Imported modules and functions can be given new names in the import statement
import math as m
print m.sqrt(2)
from math import sqrt as square_root
print square_root(2)
Let's say we want to convert from Celsius to Fahrenheit. The formula is:
$ F = \frac{9}{5}C + 32$
where $C$ is the temperature in Celsius and $F$ is the temperature in Fahrenheit. A possible program is
C = 21 #some arbitrary value
F = (9/5)*C + 32
print F
In python 2.X, this program will print 53. However, in python 3.X, this same program will print 69.8 (which is the correct value)
What happened? The formula in our program looks pretty much correct (and indeed is).
The error in our program above is one of the most common errors in mathematical software and is not at all obvious for a newcomer to programming.
In many computer languages, there are two types of divisions: float division and integer division.
Float division is what you know from mathematics: 9/5 becomes 1.8 in decimal notation
Integer division a/b with integers (whole numbers) a and b results in an integer that is truncated (or mathematically, rounded down. This implies that 9/5 becomes 1.
The version 2 of python (and many other programming languages) interprets a/b as an integer division, when a and b are integers
Other languages, such as Matlab and the version 3 of python interpret a/b as float division even if both operands are integers, or complex division if one of the operands is a complex number
It would be interesting to avoid integer division when programming mathematical formulas
Python 3.X has no problem with integer division, so the problem arises when we are using python 2.X. To overcome this, we can use at the very begining of our program:
from __future__ import division
In the rare cases when a mathematical algorithm does make use of integer division, one should use a double forward slash, //
, as division operator, because this is Python’s way of explicitly indicating integer division
An alternative is to use the cast operation:
C = 21 #some arbitrary value
F = (float(9)/5)*C + 32 # it could be (9/float(5))*C + 32
print F
or alternativelly, rewrite the formula using floats
C = 21 #some arbitrary value
F = (9.0/5)*C + 32 # it could be (9/5.0)*C + 32
print F
Let's say we want to compute the hiporbolic sine
$sinh = \frac{1}{2}(e^x - e{-x})$
using three different ways:
python math.sinh
procedure from math packagepython math.exp
procedure from mathpython **
from math import sinh, exp, e, pi
x = 2*pi
r1 = sinh(x)
r2 = 0.5*(exp(x) - exp(-x))
r3 = 0.5*(e**x - e**(-x))
print r1, r2, r3
267.744894041 267.744894041 267.744894041
At a first glance, the three results seems to be the equal
Nevertheless, this is not the whole story ...
Let's print the results using 16 decimals
print '%.16f %.16f %.16f' % (r1,r2,r3)
267.7448940410164369 267.7448940410164369 267.7448940410163232
Now r1
and r2
are equal, but r3
is different!
Why is this so?
- Our program computes with real numbers, and real numbers need in general an infinite number of decimals to be represented exactly.
- The computer truncates the sequence of decimals because the storage is finite
We won't get into details, but you can read further here
We must be aware that real numbers on a computer often have a small error.
Only a few real numbers can be represented exactly within the quantity of reserved digits, the rest of the real numbers are only approximations.
For this reason, most arithmetic operations involve inaccurate real numbers, resulting in inaccurate calculations
Let's consider the following two calculations: $\frac{1}{49}\cdot 49$ and $\frac{1}{51}\cdot 51$
Both expressions are identical to 1, but when we perform the calculations in Python
print '%.16f %.16f' % (1/49.0*49, 1/51.0*51)
0.9999999999999999 1.0000000000000000
The two results are different!
The reason why we do not get exactly 1.0 as answer in the first case is because $\frac{1}{49}$ is not correctly represented in the computer. Also $\frac{1}{51}$ has an inexact representation, but the error does not propagate to the final answer.
To summarize, errors in floating-point numbers may propagate through mathematical calculations.
As a result, answers are only approximations to the exact underlying mathematical values.
The errors in the answers are commonly known as round-off errors.
Python has a special module decimal
which allows real numbers to be represented with adjustable accuracy so that round-off errors can be made as small as desired.
A set is a collection of mathematical objects in which each object is considered to occur at most once.
The objects belonging to a set are its elements.
Curly braces are used to explicit enumerate the elements of a set. For example, $\{\heartsuit, \clubsuit, \spadesuit, \diamondsuit\}$ is the set of suits in a traditional deck of cards.
Python has a built in set structure
deck = set(["heart", "club", "spade", "diadmond"])
print (deck)
set(['club', 'heart', 'diadmond', 'spade'])
The symbol $\in$ is used to indicate that an object belongs to a set (equivalently, that the set contains the object).
For example, $\heartsuit \in \{\heartsuit, \clubsuit, \spadesuit, \diamondsuit\}$.
We can test if an element is in a set using the keyword in
element = "heart"
if element in deck:
print(element+" belongs to set")
else:
print(element+" does not belong to set")
heart belongs to set
One set $S_1$ is contained in another set $S_2$ (written $S_1 \subseteq S_2$ ) if every element of $S_1$ belongs to $S_2$.
Two sets are equal if they contain exactly the same elements. A convenient way to prove that two sets are equal consists of two steps:
In python, you can use the equality opertar to test set equality
A = set([1, 2, 3])
B = set([3, 2, 1])
if (A==B):
print("sets are equal")
else:
print("sets are different")
sets are equal
A set can be infinite. The set of real numbers $\Re$, which contains all real numbers, is infinite.
If a set S is not infinite, we use $|S|$ to denote its cardinality, the number of elements it contains. For example, the set of suits has cardinality 4.
The len
command returns the size of a set.
len(deck)
4
An empty set, denoted as $\emptyset$, no contains any element
We can create an empty set in Python by using set()
without arguments
emptyset = set()
Let $A$ and $B$ be two sets. The following are set operations
The union is a set of all elements from both sets
# initialize A and B
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
# use | operator
print(A | B)
# use union function
print(A.union(B))
# use union function on B
print(B.union(A))
set([1, 2, 3, 4, 5, 6, 7, 8]) set([1, 2, 3, 4, 5, 6, 7, 8]) set([1, 2, 3, 4, 5, 6, 7, 8])
The intersection is a set of elements that are common in both sets.
# initialize A and B
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
# use & operator
print(A & B)
# use intersection function
print(A.intersection(B))
# use intersection function on B
print(B.intersection(A))
set([4, 5]) set([4, 5]) set([4, 5])
The difference of A and B (A - B) is a set of elements that are only in A but not in B. Similarly, B - A is a set of element in B but not in A.
# initialize A and B
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
# use - operator
print(A-B)
# use the difference function on A
print(A.difference(B))
# use - operator
print(B-A)
# use the difference function on A
print(B.difference(A))
set([1, 2, 3]) set([1, 2, 3]) set([8, 6, 7]) set([8, 6, 7])
The Cartesian product of two sets $A$ and $B$, denoted by $A \times B$ is the set of all pairs $(a,b)$ where $a \in A$ and $b \in B$.
Example For the sets $A=\{1,2,3\}$ and $B=\{\heartsuit, \clubsuit, \spadesuit, \diamondsuit\}$, the cartesian product is $\{(1,\heartsuit), (1,\clubsuit), (1,\spadesuit), (1,\diamondsuit), (2,\heartsuit), (2,\clubsuit), (2,\spadesuit), (2,\diamondsuit), (3,\heartsuit), (3,\clubsuit), (3,\spadesuit), (3,\diamondsuit)\}$
For finite sets $A$ and $B$, $|A \times B| = |A|\cdot |B|$
Loosely speaking, a function is a rule that, for each element in some set $D$ of possible inputs, assigns a possible output.
The output is said to be the image of the input under the function and the input is a pre-image of the output.
The set D of possible inputs is called the domain of the function.
Formally, a function is a (possibly infinite) set of pairs $(a, b)$ no two of which share the same first entry.
Example
The doubling function with domain $\{1, 2, 3, \dots\}$ is $\{(1, 2), (2, 4), (3, 6), (4, 8), \dots\}$
Example
The multiplication function with domain $\{1, 2, 3, \dots\} × \{1, 2, 3, \dots\}$ looks something like this: $\{((1, 1), 1), ((1, 2), 2), \dots, ((2, 1), 2), ((2, 2), 4), ((2, 3), 6), \dots\}$
For a function named $f$, the image of $q$ under $f$ is denoted by $f (q)$. If $r = f (q)$, we say that $q$ maps to $r$ under $f$.
The notation for “$q$ maps to $r$” is $q \rightarrow r$.
It is convenient when specifying a function to specify a co-domain for the function.
The co-domain is a set from which the function’s output values are chosen. Note that one has some leeway in choosing the co-domain since not all of its members need be outputs.
The notation $f : D \rightarrow F$ means that $f$ is a function whose domain is the set $D$ and whose co-domain (the set of possible outputs) is the set $F$ . (More briefly: "a function from $D$ to $F$”, or “a function that maps $D$ to $F$”).
Example
Caesar’s cryptosystem replace each letter with the one three steps forward in the alphabet (wrapping around for $X$,$Y$, and $Z$). Thus the plaintext $MATRIX$ would be encrypted as the cyphertext $PDWULA$. The function that maps each plaintext letter to its cyphertext replacement could be written as $A \rightarrow D, B \rightarrow E, C \rightarrow F, D \rightarrow G, \dots, W \rightarrow Z, X \rightarrow A, Y \rightarrow B, Z \rightarrow C$ This function’s domain and co-domain are both the alphabet $\{A, B, \dots, Z\}$.
Example
The $cosine$ function, $\cos$, maps from the set of real numbers (indicated by $R$) to the set of real numbers. We would therefore write $\cos : R \rightarrow R$. Of course, the outputs of the cos function do not include all real numbers, only those between -1 and 1.
The image of a function $f$ is the set of images of all domain elements. That is, the image of $f$ is the set of elements of the co-domain that actually occur as outputs.
For example, the image of Caesar’s encryption function is the entire alphabet, and the image of the cosine function is the set of numbers between -1 and 1.
'''
compute the multiplication between two integers p and q
'''
def mult(p,q):
return p*q
Sometimes the arguments of procedures are sets or other container with several elements, and we want to compute and return somethig for (a combination of) all elements.
We can do the entire computation for all elements and return a container, but this may be time and memory consuming
Python provides the generator/iterator concept to help in this case
Let's say we want to implement a procedure to compute the cross product of two sets.
Without a generator we have:
def cross_prod_no_generator(A,B):
result = []
for element_a in A:
for element_b in B:
result.append((element_a,element_b))
return result
Let's say we want to implement a procedure to compute the cross product of two sets.
With a generator we have:
def cross_prod(A, B):
for element_a in A:
for element_b in B:
yield (element_a,element_b)
Observe that, instead of the return
statement, we used the yield
statement
return
is used when we want a normal procedure, and the "status" of the call is destroied after the return
is executed
yield
, on the other hand, produces a generator. The "status" of the call is frozen and return the next result in the next call
Each time we call the iterator, it will return a new result until all results are produced
A = set(['1','2','3'])
B = set(["heart", "club", "spade", "diadmond"])
for pair_element in cross_prod(A,B):
print pair_element
('1', 'club') ('1', 'heart') ('1', 'diadmond') ('1', 'spade') ('3', 'club') ('3', 'heart') ('3', 'diadmond') ('3', 'spade') ('2', 'club') ('2', 'heart') ('2', 'diadmond') ('2', 'spade')
We can also use without a loop, and explicit call the next element
A = set(['1','2','3'])
B = set(["heart", "club", "spade", "diadmond"])
cross_product_iterator = cross_prod(A,B)
print(cross_product_iterator.next())
print(cross_product_iterator.next())
print(cross_product_iterator.next())
('1', 'club') ('1', 'heart') ('1', 'diadmond')
As only one item is generated/consumed in each iteration, it is not necessary to allocate memory for all retornable items.
In a loop, all elements are consumed.
Outside the loop, elements could be consumed only once. Calling next after all elements where consumed will raise an error.
If not all elements are consumed, the frozen state is kept in memory (eventually, it may be garbage collected)
Example
input: a pair $(p, q)$ of integers greater than 1 output: the product $p\times q$
Example
input: an integer $m$ greater than 1 output: a pair $(p, q)$ of integers whose product is $m$
Unlike a procedure, a function or computational problem does not give us any idea how to compute the output from the input.
For integer multiplication, we can use:
- ordinary long multiplication (you learned this in elementary school)
- Karatsuba algorithm (used by Python for long-integer multiplication)
- the faster Schönhage-Strassen algorithm (which uses the Fast Fourier Transform)
- and the even faster Fürer algorithm, discovered in 2007.
There are often many different procedures that satisfy the same input-output specification or that implement the same function.
The
python muly
procedure definie before can be used for multiplying negative integers and numbers that are not integers.
A computational problem need not specify a unique output for every input.
Unlike a function, a computational problem need not specify a unique output for every input; for Example 0.3.8 (Page 4), if the input is 12, the output could be (2, 6) or (3, 4) or (4, 3) or (6, 2).
Although function and computational problem are defined differently, they are related
For each function $f$, there is a corresponding computational problem: Given an element a of $f$’s domain, compute $f(a)$, the image of $a$ under $f$.
This is known as the "forward problem"
However, there is another computational problem associated with a function: Given an element $r$ of the co-domain of the function, compute any pre-image (or report that none exists).
This is known as the "backward problem"
Suppose there is a procedure $P(x)$ for computing the image under $f$ of any element $x$ of the domain.
An obvious procedure for computing the pre-image of $r$ is to iterate through each of the domain elements $q$, and, one by one, apply the procedure $P(x) on $q$ to see if the output matches $r$.
This approach seems very slow - even if the domain is finite, it might be so large that the time required for solving the pre-image problem would be much more than that for $P(x)$ - and yet there is no better approach that works for all functions.
Sometimes, this behaviour is desirable: think about a good cryptographic function. One would like to compute the cyphred message easyly, but the decyphing process should be as hard as possible, unless we know the key
However, in some cases, we want to compute both problems efficiently. To this end, we must consider the pre-image problem not for arbitrary functions but for specific families of functions (as we have just seen, there is no general efficient procedure for all functions).
Furthermore, the family of functions must have applicability. If the family of functions is too restrictive, the existence of fast procedures for solving the pre-image problem will have no relevance to real-world problems.
Let us take the perspective of a lieutenant of Caesar who has received a cyphertext: PDWULA.
To obtain the plaintext, the lieutenant must find for each letter in the cyphertext the letter that maps to it under the encryption function.
That is, he must find the letter that maps to $P$ (namely $M$), the letter that maps to $D$ (namely $A$), and so on.
In doing so, he can be seen to be applying another function to each of the letters of the cyphertext, specifically the function that reverses the effect of the encryption function. This function is said to be the functional inverse of the encryption function.
The operation functional composition combines two functions to get a new function.
Given two functions $f : A \rightarrow B$ and $g : B \rightarrow C$, the function $g\circ f$, called the composition of $g$ and $f$, is a function whose domain is $A$ and its co-domain is $C$.
It is defined by the rule $$(g \circ f)(x) = g(f(x))$$
For any domain $D$, there is a function $id_D : D \rightarrow D$ called the identity function for $D$, defined by
$id_D(d) = d$
$\forall d \in D$.
We say that functions $f$ and $g$ are functional inverses of each other if
Not every function has an inverse. A function that has an inverse is said to be invertible.
Consider a function $f : D \rightarrow F$.
We say that $f$ is one-to-one if for $\forall x,y \in D,f(x)=f(y)$ implies $x=y$ .
We say that $f$ is onto if, $\forall y,z \in F$, there exists $x \in D$ such that $f(x) = z$.
Suppose $f$ is not one-to-one, and let $x_1$ and $x_2$ be distinct elements of the domain such that $f(x_1) = f(x_2)$.
Let $y = f(x_1)$. Assume for a contradiction that $f$ is invertible.
The definition of inverse implies that $f^{−1}(y) = x1$ and also $f^{−1}(y) = x2$, but both cannot be true.
Suppose $f$ is not onto, and let $\hat{y}$ be an element of the co-domain such that $\hat{y}$ is not the image of any domain element.
Assume for a contradiction that $f$ is invertible.
Then $\hat{y}$ has an image $\hat{x}$ under $f^{−1}$. The definition of inverse implies that $f(\hat{x}) = \hat{y}$, a contradiction.
Proof:
For each element $\hat{y}$ of the co-domain of $f$, since $f$ is onto, $f$’s domain contains some element $\hat{x}$ for which $f(\hat{x}) = \hat{y}$; we define $g(\hat{y}) = \hat{x}$.
$g \circ f$ must bee the identity function on $f$’s domain. Because $f$ is one-to-one, $\hat{x}$ is the only element of $f$’s domain whose image under $f$ is $\hat{y}$, so $g(\hat{y}) = \hat{x}$. This shows $g \circ f$ is the identity function.
Furthermore $f \circ g$ is the identity function on $g$’s domain. Let $\hat{y}$ be any element of $g$’s domain. By the definition of $g$, $f(g(\hat{y})) = \hat{y}$.
Proof:
Let $f : U \rightarrow V$ be an invertible function. Suppose that $g_1$ and $g_2$ are inverses of $f$.
We show that, for every element $v \in V$ , $g_1(v) = g_2(v)$, so $g_1$ and $g_2$ are the same function.
Let $v \in V$ be any element of the co-domain of $f$. Since $f$ is onto, there is some element $u \in U$ such that $v = f(u)$. By definition of inverse, $g_1(v) = u$ and $g_2(v) = u$. Thus $g_1(v) = g_2(v)$.
If $f$ and $g$ are invertible functions and $f \circ g$ exists then $f \circ g$ is invertible and $(f \circ g)^{-1} = g^{-1} \circ f^{-1}$.