Data processing with Python

Alexander Sapozhnikov, Tatyana Vasilieva

Python

Part 3

Control flow and loops

See also docs.python.org / Tutorial / More Control Flow Tools

Let’s imagine a song

Parse first line

Rewrite in Python-like pseudocode

if pianist.is_here:          # ← condition
    play_an_introduction()   # ← action

Conditional statement

A colon and indentation, instead of curly braces, mark a block

if (x < 5):
    # four spaces indent is recommended
    print(x)

Conditional statement

Parentheses around the condition aren’t necessary

if x < 5:
    # parentheses omitted
    print(x)

Conditional statement

Parentheses around the condition aren’t necessary

if x < 5 and y > 7:
    # omit parentheses when possible
    print(x)

See also docs.python.org / Language Reference / Expressions / Operator precedence

Chained comparison

# instead of
if temperature > 21 and temperature < 26:
    print('Comfortable')
# write
if 21 < temperature < 26:
    print('Comfortable')

Chained comparison

# or even
if 21 < outdoor_temperature < indoor_temperature < 26:
    # comfortable, and it is a bit cooler outside than inside
    print('Comfortable')
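A small sketch to check that the chained form and the explicit `and` agree. The function names `is_comfortable` and `is_comfortable_long` are made up for illustration; per the language reference, `a < b < c` means `a < b and b < c`, except that `b` is evaluated only once.

```python
def is_comfortable(temperature):
    # 21 < temperature < 26 is evaluated like
    # (21 < temperature) and (temperature < 26),
    # but temperature itself is evaluated only once
    return 21 < temperature < 26


def is_comfortable_long(temperature):
    # the same check written with an explicit "and"
    return temperature > 21 and temperature < 26


for t in (20, 21, 22, 25, 26, 30):
    assert is_comfortable(t) == is_comfortable_long(t)

print([t for t in (20, 22, 25, 26) if is_comfortable(t)])  # [22, 25]
```

Both bounds are strict here, so 21 and 26 themselves are not "comfortable".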

Else

if x < 5:
    print(x)
else:
    # otherwise
    print(something_else)

else if → elif

if x < 5:
    print('Few')
elif x > 9:
    # second condition
    print('Many')
else:
    print(something_else)

else if → elif

if x < 5:
    print('Few')
elif x > 9:
    print('Many')
elif x > 7:
    print('Not so many')
elif z == 42:
    print(z)
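A runnable version of the `elif` chain as a function. The name `describe` and the `'Some'` label for the final `else` branch are hypothetical placeholders; the point is that conditions are tested top to bottom and only the first true branch runs.

```python
def describe(x):
    # Conditions are checked top to bottom; only the first true branch runs
    if x < 5:
        return 'Few'
    elif x > 9:
        return 'Many'
    elif x > 7:
        # reached only for 7 < x <= 9, because x > 9 was handled above
        return 'Not so many'
    else:
        return 'Some'


print([describe(x) for x in (3, 6, 8, 12)])
# ['Few', 'Some', 'Not so many', 'Many']
```

The order matters: if `x > 7` came before `x > 9`, the `'Many'` branch could never be reached.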

= is not a comparison operator

>>> if z = 7:
  File "<stdin>", line 1
    if z = 7:
         ^
SyntaxError: invalid syntax

= and ==

Assignment

z = 7

Comparison

if z == 42:
    print(z)

Assignment with if/else

# instead of
if x < 5:
    z = 'Few'
else:
    z = 'Many'
# write
z = 'Few' if x < 5 else 'Many'
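A short sketch of the conditional expression `value_if_true if condition else value_if_false`; the helper name `amount` is hypothetical. Because it is an expression, it can also be used inside other expressions such as a list comprehension.

```python
def amount(x):
    # value_if_true if condition else value_if_false
    return 'Few' if x < 5 else 'Many'


print(amount(3), amount(7))  # Few Many

# A conditional expression can sit inside a bigger expression
labels = [amount(x) for x in range(3, 8)]
print(labels)  # ['Few', 'Few', 'Many', 'Many', 'Many']
```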

Let’s return to the song example

if pianist.is_here:
    play_an_introduction()

Try to play immediately!

play_an_introduction()

SomeError: we have no pianist to play anything

Error handling

>>> stuff = ['hydrogen', 'helium', 'lithium']
>>> number = input('Enter number of element ')
Enter number of element 42
>>> print(stuff[int(number)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Handle exceptions with the try statement

>>> try:
...     print(stuff[int(number)])
... except IndexError:
...     print(f'Wrong index. Use number less than {len(stuff)}')
...
Wrong index. Use number less than 3

What if number == 'z'?

>>> try:
...     print(stuff[int(number)])
... except IndexError:
...     print(f'Wrong index. Use number less than {len(stuff)}')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: invalid literal for int() with base 10: 'z'

Add new exception handler

>>> try:
...     print(stuff[int(number)])
... except IndexError:
...     print(f'Wrong index. Use number less than {len(stuff)}')
... except ValueError:
...     print('Index must be an integer number')

Add else for code that runs only when no exception was raised

>>> try: # number == 2
...     print(stuff[int(number)])
... # skipped
... else:
...     print('OK')
...
lithium
OK

finally runs after all checks, whether or not an exception occurred

... # skipped
... else:
...     print('OK')
... finally:
...     print("That's all, folks!")
...

output:

lithium
OK
That's all, folks!

finally runs after all checks, whether or not an exception occurred

>>> try:
...     print(stuff[int(number)])
... except IndexError:
...     print(f'Wrong index...
... # skipped
... finally:
...     print("That's all, folks!")

output when the number is out of range:

Wrong index. Use number less than 3
That's all, folks!
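A self-contained sketch that records which parts of a `try` statement actually run. The function `lookup` and its `log` list are hypothetical; `stuff` is the list from the slides above.

```python
stuff = ['hydrogen', 'helium', 'lithium']


def lookup(number, log):
    # log is a list that records which parts of the try statement ran
    try:
        name = stuff[int(number)]
    except IndexError:
        log.append('IndexError')
        name = None
    except ValueError:
        log.append('ValueError')
        name = None
    else:
        log.append('else')     # only when no exception was raised
    finally:
        log.append('finally')  # always runs
    return name


for number in ('2', '42', 'z'):
    log = []
    print(repr(lookup(number, log)), log)
# 'lithium' ['else', 'finally']
# None ['IndexError', 'finally']
# None ['ValueError', 'finally']
```

Each input triggers a different path, but `finally` appears in every log.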

with

with open('/etc/timezone', 'r') as f:
    for line in f:
        print(line)
# roughly the same as
f = open('/etc/timezone', 'r')
try:
    for line in f:
        print(line)
finally:
    f.close()  # with closes the file even if an error occurs

with implicitly calls methods

with open('/etc/timezone', 'r') as f:  # calls __enter__ of the file object
    for line in f:
        print(line)
# leaving the block calls __exit__, which closes the file
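class Writer:
    def __init__(self, file_name):
        self.file_name = file_name

    def __enter__(self):
        # runs at the start of the with block; the result is bound by "as"
        self.file = open(self.file_name, 'w')
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # runs when the with block ends, even after an exception
        self.file.close()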
with Writer('file.txt') as f:
    f.write('hello world')
# leaving the block calls Writer.__exit__, which closes the file
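A minimal sketch showing when `__enter__` and `__exit__` are called, including when the block raises. The `Tracer` class is hypothetical and only records events; returning `False` from `__exit__` means the exception is not suppressed.

```python
class Tracer:
    """Hypothetical context manager that records when its methods run."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this object is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception
        if exc_type is None:
            self.events.append('exit')
        else:
            self.events.append(f'exit after {exc_type.__name__}')
        return False  # False means: do not suppress the exception


tracer = Tracer()
with tracer as t:
    t.events.append('body')
print(tracer.events)  # ['enter', 'body', 'exit']

failing = Tracer()
try:
    with failing:
        raise ValueError('boom')
except ValueError:
    pass  # the exception still reaches us because __exit__ returned False
print(failing.events)  # ['enter', 'exit after ValueError']
```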

Loops

Structure of the song


for loop


for item in sequence:
    ...  # do something with item

Iterate through list


people = ['Alice', 'Bob', 'Charlie']
for person in people:
    print(person)

Iterate over range


teen = range(13, 20)
for age in teen:
    print(f'Age is {age}')

for i in range(10):
    do_something() # ten times
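A quick sketch of what `range` produces: the start is included, the stop is excluded, and `range(10)` starts at 0. The optional third argument (the step) is standard `range` behaviour not shown on the slides.

```python
teen = list(range(13, 20))
print(teen)  # [13, 14, 15, 16, 17, 18, 19]

# range(stop) starts at 0: range(10) gives ten numbers, 0 to 9
ten = list(range(10))
print(ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# an optional third argument sets the step; it may be negative
countdown = list(range(10, 0, -3))
print(countdown)  # [10, 7, 4, 1]
```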

Counter name

i, j, k (and x, y, z) are good names for counter variables


Nested loops

Put a loop inside another one


# width, height and depth are assumed to be integer sizes
for x in range(width):
    for y in range(height):
        for z in range(depth):
            do_something(x, y, z)
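A runnable sketch of the triple loop with hypothetical sizes (`width, height, depth = 2, 3, 4`). It collects the visited coordinates to show that the innermost loop changes fastest, and compares the result with `itertools.product`, which produces the same tuples in the same order.

```python
from itertools import product

# Hypothetical box sizes; width, height and depth are plain integers here
width, height, depth = 2, 3, 4

cells = []
for x in range(width):
    for y in range(height):
        for z in range(depth):
            cells.append((x, y, z))  # innermost loop runs fastest

print(len(cells))  # 24 == 2 * 3 * 4
print(cells[:3])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]

# itertools.product yields the same tuples in the same order
assert cells == list(product(range(width), range(height), range(depth)))
```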

How to iterate over several sequences simultaneously?


colors = ['red',   'orange', 'yellow'  ]
people = ['Alice', 'Bob',    'Charlie' ]
fruits = ['apple', 'banana', 'cucumber']

zip

>>> for color, name, fruit in zip(colors, people, fruits):
...     print(f'{name} has {color} {fruit}')
...
Alice has red apple
Bob has orange banana
Charlie has yellow cucumber
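A small sketch of two properties of `zip` worth knowing: it stops at the shortest input without any error, and `zip(*pairs)` reverses the operation. The shortened `fruits` list is deliberate and only for illustration.

```python
colors = ['red', 'orange', 'yellow']
people = ['Alice', 'Bob', 'Charlie']
fruits = ['apple', 'banana']  # one item shorter on purpose

# zip pairs up items by position and stops at the shortest sequence,
# so yellow and Charlie are silently dropped here
triples = list(zip(colors, people, fruits))
print(triples)  # [('red', 'Alice', 'apple'), ('orange', 'Bob', 'banana')]

# zip(*...) "unzips" a list of tuples back into separate tuples
colors_back, people_back, fruits_back = zip(*triples)
print(people_back)  # ('Alice', 'Bob')
```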

How to enumerate items?

  1. First
  2. Second
  3. Third
  4. Fourth
  5. Fifth

Example: chemical elements

>>> stuff = ['hydrogen', 'helium', 'lithium']

name → value

Classic way

>>> stuff = ['hydrogen', 'helium', 'lithium']
>>> for i in range(len(stuff)):
...     print(i + 1, stuff[i])
...
1 hydrogen
2 helium
3 lithium

Use zip and range for numbering

>>> for number, name in zip(range(1, len(stuff) + 1), stuff):
...     print(number, name)
...
1 hydrogen
2 helium
3 lithium

enumerate

>>> for number, name in enumerate(stuff):
...     print(number, name)
...
0 hydrogen
1 helium
2 lithium

enumerate

>>> for number, name in enumerate(stuff, start=1):
...     print(number, name)
...
1 hydrogen
2 helium
3 lithium
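A short sketch confirming that `enumerate(stuff, start=1)` yields the same pairs as the `zip(range(...), stuff)` version above, and using it to build numbered lines of text.

```python
stuff = ['hydrogen', 'helium', 'lithium']

# enumerate yields (index, item) pairs; start only shifts the numbers
print(list(enumerate(stuff)))           # [(0, 'hydrogen'), (1, 'helium'), (2, 'lithium')]
print(list(enumerate(stuff, start=1)))  # [(1, 'hydrogen'), (2, 'helium'), (3, 'lithium')]

# It gives the same pairs as the zip(range(...)) version
assert list(enumerate(stuff, start=1)) == list(zip(range(1, len(stuff) + 1), stuff))

# Handy for building a numbered list as text
lines = [f'{number}. {name}' for number, name in enumerate(stuff, start=1)]
print('\n'.join(lines))
```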

Iterate over dict

>>> fruits = {
...     'apple': 'red',
...     'banana': 'yellow',
...     'cucumber': 'green',
... }

Iterate over dict — see its methods

>>> fruits = {'apple': 'red', 'banana': 'yellow', 'cucumber': 'green'}
>>> fruits.items()
dict_items([('apple', 'red'), ('banana', 'yellow'), ('cucumber', 'green')])
>>> fruits.keys()
dict_keys(['apple', 'banana', 'cucumber'])
>>> fruits.values()
dict_values(['red', 'yellow', 'green'])

Iterate over dict: items() gives (key, value) tuples

>>> list(fruits.items())
[('apple', 'red'), ('banana', 'yellow'), ('cucumber', 'green')]

Iterate over whole dict

>>> for fruit, color in fruits.items():  # unpack each tuple
...     print(f'{fruit} is {color}')
...
apple is red
banana is yellow
cucumber is green

Iterate over keys of dict

>>> for fruit in fruits.keys():
...     print(fruits[fruit], fruit)
...
red apple
yellow banana
green cucumber
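A sketch of common dict iteration patterns: looping over the dict itself gives the keys (the same as `.keys()`), and unpacking `.items()` makes it easy to build new collections. The `by_color` inversion assumes the colors are unique, which holds for this sample data.

```python
fruits = {'apple': 'red', 'banana': 'yellow', 'cucumber': 'green'}

# Looping over the dict itself gives its keys, like fruits.keys()
print(list(fruits) == list(fruits.keys()))  # True

# items() yields (key, value) tuples that can be unpacked in the loop header
sentences = [f'{fruit} is {color}' for fruit, color in fruits.items()]
print(sentences)  # ['apple is red', 'banana is yellow', 'cucumber is green']

# Swap keys and values to look fruits up by color (colors must be unique)
by_color = {color: fruit for fruit, color in fruits.items()}
print(by_color['yellow'])  # banana
```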

while

while condition:
    ...  # do something while condition is true

while

>>> rest = 3
>>> while rest > 0:
...     print(f'Rest is {rest}')
...     rest -= 1
...
Rest is 3
Rest is 2
Rest is 1

while

>>> rest = 3
>>> while rest:
...     print(f'Rest is {rest}')
...     rest -= 1
...
Rest is 3
Rest is 2
Rest is 1
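The `while rest:` form works because 0 is falsy and every other integer is truthy. A small sketch (the function name `countdown` is hypothetical) that collects the values instead of printing them:

```python
def countdown(rest):
    # for integers, "while rest:" behaves like "while rest != 0:"
    seen = []
    while rest:
        seen.append(rest)
        rest -= 1
    return seen


print(countdown(3))  # [3, 2, 1]
print(countdown(0))  # [] (the body never runs)
```

One caveat: with a negative start value, `while rest:` never reaches 0 and loops forever, whereas `while rest > 0:` simply does not run.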

Increment and decrement

variable += delta  # increase
variable -= delta  # decrease

Unlike C, C++, Java, JavaScript, Perl, PHP etc.,

Python has no ++ and -- operators

Python has no ++ and -- operators

>>> 3++2  # 3 + +2
5
>>> 4--5  # 4 − (−5) = 4 + 5
9

Python has no ++ and -- operators

>>> 7++
  File "<stdin>", line 1
    7++
      ^
SyntaxError: invalid syntax

Loop

Loop

Skip rest of loop with continue

>>> for i in range(1, 5):
...     if i < 3: continue
...     print(i)
...
3
4

Loop

Leave the loop early with break

>>> for i in range(1, 55):
...     print(i)
...     if i > 2: break
...
1
2
3

Loop

What about a postconditional loop?

do:
    # do something
until condition

Python has no postconditional loop: there is no do … until statement

Use break to emulate it

>>> while True: # infinite loop
...     amount = input('How many? Or type q to quit ')
...     if amount == 'q':
...         break
...
How many? Or type q to quit 4
How many? Or type q to quit q
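A runnable sketch combining `continue`, `break` and the `while True` emulation of a postconditional loop. Both function names are hypothetical, and `read_until_quit` takes a list of answers in place of repeated `input()` calls so it can run without a keyboard.

```python
def first_large_evens(numbers, limit):
    # Collect even numbers greater than 2, stop after `limit` of them
    found = []
    for n in numbers:
        if n % 2 or n <= 2:
            continue          # skip the rest of this iteration
        found.append(n)
        if len(found) == limit:
            break             # leave the loop completely
    return found


print(first_large_evens(range(1, 100), 3))  # [4, 6, 8]


def read_until_quit(answers):
    # Emulates "do ... until": the body always runs at least once.
    # answers stands in for repeated input() calls
    answers = iter(answers)
    collected = []
    while True:
        amount = next(answers)
        if amount == 'q':
            break
        collected.append(int(amount))
    return collected


print(read_until_quit(['4', '2', 'q']))  # [4, 2]
```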

Code reuse

Structure of the song

Some refrains are exactly the same, others are almost the same:

  • Introduction
  • First verse
  • Refrain
  • Second verse
  • Refrain
  • Refrain (shifted pitch)
  • Refrain (silent)

Refrain is part of code

Such parts are called

  • subroutine
  • procedure
  • function

Let’s separate the refrain into a function

Define a function; () are required even when it takes no arguments

>>> def refrain():
...     print('Chorus')
...

Call function

>>> refrain()
Chorus

() are required to call a function

>>> refrain()
Chorus

>>> refrain  # without parentheses
<function refrain at 0x7faf21a710d0>

Postpone implementation with pass

>>> def do_nothing():
...     pass
...
>>> do_nothing()

A function can take arguments

>>> refrain()
Chorus

>>> duration = sing('Quick brown fox jumps')

and return values

Let’s define function and call it

>>> def refrain(text, count):
...     print(str(text) * int(count))
... 
>>> refrain('Yeah! ', 3)
Yeah! Yeah! Yeah!

Does type conversion work?

>>> def refrain(text, count):
...     print(str(text) * int(count))
... 
>>> refrain(42, 3)
424242
>>> refrain(42, '7')
42424242424242

Keyword arguments can be passed in any order

>>> def refrain(text, count):
...     print(str(text) * int(count))
... 
>>> refrain(count=5, text='Five! ')
Five! Five! Five! Five! Five!
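A sketch of the same `refrain` changed to return the text instead of printing it (a deliberate variation, not the slides’ version), so that positional, keyword and mixed calls can be compared directly.

```python
def refrain(text, count):
    # returns the repeated text instead of printing it, so it can be reused
    return str(text) * int(count)


# Positional, keyword, and mixed calls give the same result
a = refrain('Yeah! ', 3)
b = refrain(count=3, text='Yeah! ')
c = refrain('Yeah! ', count=3)
print(a == b == c)  # True
print(repr(a))      # 'Yeah! Yeah! Yeah! '

# A positional argument cannot follow a keyword one:
# refrain(text='Yeah! ', 3) is a SyntaxError
```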

Variable number of positional arguments

Define function
def production(*args):
    result = 1
    for number in args:
        result *= number
    return result

What about arguments that are not numbers?

>>> production('🐍', 2, 5)
'🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍'
>>> production(range(1, 70))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in production
TypeError: unsupported operand type(s) for *=: 'int' and 'range'

Expand arguments with asterisk *

>>> production(range(1, 7))
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for *=: 'int' and 'range'
>>> production(list(range(1, 7)))
[1, 2, 3, 4, 5, 6]
>>> production(*range(1, 7))
720 #  = 1 × 2 × 3 × 4 × 5 × 6 = 6!

Cannot use * as a standalone expression

>>> *range(1, 7)
  File "<stdin>", line 1
SyntaxError: can't use starred expression here
>>> print(*range(1, 7))
1 2 3 4 5 6

* works with any iterable

>>> print(*range(1, 7))
1 2 3 4 5 6
>>> print(*zip(['apple', 'banana'], ['red', 'yellow']))
('apple', 'red') ('banana', 'yellow')
>>> print(*enumerate(['H', 'He', 'Li', 'Be', 'B'], start=1))
(1, 'H') (2, 'He') (3, 'Li') (4, 'Be') (5, 'B')
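A self-contained sketch of `production` from the slides, checking a few calls including the `*range(...)` expansion. The string example relies on `*` turning each character into a separate argument.

```python
def production(*args):
    # args is a tuple holding every positional argument
    result = 1
    for number in args:
        result *= number
    return result


print(production())              # 1, the empty product
print(production(2, 3, 4))       # 24
print(production(*range(1, 7)))  # 720, same as production(1, 2, 3, 4, 5, 6)

# * unpacks any iterable into separate arguments, even a string
print(*'abc', sep='-')           # a-b-c
```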

Keyword arguments

>>> def sing(**kwargs):
...     print(f'We sing a song named {kwargs["name"]} '
...         + f'in key {kwargs["key"]} using tempo '
...         + kwargs["tempo"])
...
>>> sing(name='Yesterday', tempo='96 bpm', key='F dur')
We sing a song named Yesterday in key F dur using tempo 96 bpm

Let’s try to pass a dict as arguments

>>> sos = {'name': 'S. O. S.', 'key': 'A moll',
...        'tempo': 'Allegro'}
>>> sing(sos)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sing() takes 0 positional arguments but 1 was given

Use double asterisk ** to expand dict

>>> sing(**sos)
We sing a song named S. O. S. in key A moll using tempo Allegro

Cannot use ** as a standalone expression

>>> **sos
  File "<stdin>", line 1
    **sos
     ^
SyntaxError: invalid syntax

Unknown keyword names cause errors

>>> print(**sos)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'name' is an invalid keyword argument for print()
>>> args = {'sep': '/', 'end': '!\n'}
>>> print('Some', 'sequence', 'here', **args)
Some/sequence/here!

Use * and ** together

>>> args = {'sep': '/', 'end': '!\n'}
>>> stuff = ['H', 'He', 'Li', 'Be', 'B']
>>> print(stuff, **args)
['H', 'He', 'Li', 'Be', 'B']!
>>> print(*stuff, **args)
H/He/Li/Be/B!
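A sketch of a function that accepts both `*args` and `**kwargs`, plus a wrapper that forwards them to `print`. Both `show_call` and `shout` are hypothetical names used only for illustration.

```python
def show_call(*args, **kwargs):
    # args collects extra positional arguments into a tuple,
    # kwargs collects extra keyword arguments into a dict
    return args, kwargs


positional, keywords = show_call(1, 2, sep='/', end='!')
print(positional)  # (1, 2)
print(keywords)    # {'sep': '/', 'end': '!'}


def shout(*args, **kwargs):
    # hypothetical wrapper: upper-case every argument, then forward
    # the positional and keyword arguments to print unchanged
    print(*(str(a).upper() for a in args), **kwargs)


shout('h', 'he', 'li', sep='/', end='!\n')  # prints H/HE/LI!
```

This forwarding pattern is how wrappers can pass along options such as `sep` and `end` without listing them explicitly.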

Long files are inconvenient

def do_something():
    # something
    return 1000  # placeholder value

def do_something_else():
    # something else
    return 42  # placeholder value

result = do_something() + do_something_else()

Move functions to a separate file

import something

result = something.do() + something.do_more()

print(result) # 1042

Side effect of import

# something.py
def do():
    ...  # skipped

print('Oops!')  # runs on every import of something

Prevent unexpected code execution

# something.py
def do():
    ...  # skipped

if __name__ == '__main__':
    # runs only when something.py is executed as a script
    print('Oops!')
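A self-contained sketch of the `__name__` guard. It writes a hypothetical module `something_demo.py` to a temporary directory, imports it with `importlib.import_module`, then runs the same file as a script with `runpy.run_path(..., run_name='__main__')`. Only the second run executes the guarded block.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, similar to something.py above
MODULE_SOURCE = """
calls = []

def do():
    return 1000

if __name__ == '__main__':
    calls.append('main block ran')
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, 'something_demo.py')
    path.write_text(MODULE_SOURCE)
    importlib.invalidate_caches()  # make sure the new file can be found
    sys.path.insert(0, tmp)
    try:
        # Imported: __name__ is 'something_demo', so the guard is skipped
        imported = importlib.import_module('something_demo')
    finally:
        sys.path.remove(tmp)
    # Run the same file as a script: __name__ is '__main__' this time
    as_script = runpy.run_path(str(path), run_name='__main__')

print(imported.__name__, imported.calls)  # something_demo []
print(as_script['calls'])                 # ['main block ran']
```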

Use short alias

import something as s

result = s.do() + s.do_more()

print(result)  # 1042

# something.py
def do():
    return 1000

def do_more():
    return 42

List the imported names after import: from module import name

#    import something
from something import do, do_more

result = do() + do_more()

print(result)

dir lists the names defined in an imported module

>>> import something
>>> dir(something)
['__builtins__', '__cached__', '__doc__', '__file__',
 '__loader__', '__name__', '__package__', '__spec__',
 'do', 'do_more']

After import … as, use the alias, not the original module name

>>> import something as s
>>> dir(something)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'something' is not defined
>>> dir(s)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'do', 'do_more']

Import modules from libraries the same way

>>> from datetime import date
>>> date.today()
datetime.date(2020, 11, 22)

Use aliases for module and object names

>>> from datetime import date as d
>>> d.fromtimestamp(1555444333)  # date in the local time zone
datetime.date(2019, 4, 17)

>>> import math as m
>>> m.sqrt(65536)  # square root
256.0
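A sketch showing why the `fromtimestamp` result above can differ between machines: `date.fromtimestamp` uses the local time zone, while passing an explicit `tz` to `datetime.fromtimestamp` gives a reproducible date. The UTC+5 offset is an arbitrary example; `math.isqrt` requires Python 3.8 or later.

```python
import math
from datetime import datetime, timedelta, timezone

timestamp = 1555444333

# An explicit time zone makes the conversion independent of the machine
utc_day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
plus_five = timezone(timedelta(hours=5))  # arbitrary example offset, UTC+5
plus_five_day = datetime.fromtimestamp(timestamp, tz=plus_five).date()
print(utc_day)        # 2019-04-16
print(plus_five_day)  # 2019-04-17

# math.sqrt always returns a float; math.isqrt returns an exact int
print(math.sqrt(65536), math.isqrt(65536))  # 256.0 256
```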

Standard library

docs.python.org / The Python Standard Library

More than 200 modules that are already installed and usually ready to use

A module in the script’s directory takes precedence over the standard library

# import-re.py
import re

print('Who is there?')
m = re.match(r'\w', 'S')

# re.py
def some():
    return 'thing'

print('Local re module')

python3 import-re.py

Local re module
Who is there?
Traceback (most recent call last):
  File "import-re.py", line 4, in <module>
    m = re.match(r'\w', 'S')
AttributeError: module 're' has no attribute 'match'

Search for a module in the standard library

Each module also comes with a one-line summary of what it does; to list the modules whose name or summary contain a given string such as "spam", type "modules spam".

>>> help('modules graph')
Here is a list of modules whose name or summary contains 'graph'.
If there are any, enter a module name to get more help.

secrets - Generate cryptographically strong pseudo-random numbers suitable for
turtle - Turtle graphics is a popular way for introducing programming to
Crypto - Python Cryptography Toolkit

Use a more precise search query

>>> help('modules statistic')
Here is a list of modules whose name or summary contains 'statistic'.
If there are any, enter a module name to get more help.

statistics - Basic statistics module.

Get help on a module

>>> import statistics
>>> help(statistics)
Help on module statistics:

NAME
    statistics - Basic statistics module.
...
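Once the module is found, a first use can be as simple as the sketch below. The temperature values are made up for illustration; `mean`, `median` and `mode` are functions of the standard `statistics` module.

```python
import statistics

# Made-up sample data, for illustration only
temperatures = [20, 22, 23, 23, 26]

print(statistics.mean(temperatures))    # 22.8
print(statistics.median(temperatures))  # 23
print(statistics.mode(temperatures))    # 23
```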

How to use modules outside of the standard library

docs.python.org / Installing Python Modules

Install MODULENAME system-wide for Debian GNU/Linux and its derivatives (Ubuntu, Mint)

sudo apt install python3-MODULENAME

There are many ways to do it

Install a local copy of modules in a virtual environment

sudo apt install python3-venv
python3 -m venv venv
. venv/bin/activate
pip install MODULENAME

Read the documentation for your system.

Alexander Sapozhnikov, Tatyana Vasilieva

https://as.susu.ru