Data processing with Python

Alexander Sapozhnikov, Tatyana Vasilieva

Part 2

Documentation

Interactive programming environments

repl.it

Interactive mode

>>>
⮬ prompt

Interactive mode

>>> 42 + 24
our input ⮭

Interactive mode

>>> 42 + 24
66 ⬅ result

Oops

>>> 42 + 'e'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
⮬ error messages

Syntax peculiarities

No semicolon ; after a single statement

counter = 42

Syntax peculiarities

Colon and indent instead of curly braces for blocks

for fruit in basket:
    # four spaces are recommended
    print(fruit)

Python is case sensitive

A ≠ a

Variables

Assignment

name = 'value'

name → value

Assign a new value

name = 'value'
name = 42

name → value

Multiple assignment

>>> mice = cats = dogs = 3
>>> cats
3
>>> cats = 15
>>> dogs
3
name → value
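
A chained assignment binds every name to the same object. With an immutable value such as 3 this is harmless, because `cats = 15` only rebinds one name. A minimal sketch of the contrast with a mutable list follows; the names `shelf_a` and `shelf_b` are illustrative and do not come from the slides.

```python
# Chained assignment binds all three names to one object.
mice = cats = dogs = 3
cats = 15                 # rebinds only `cats`; `dogs` still refers to 3

# With a mutable object, a change made through one name
# is visible through the other name as well.
shelf_a = shelf_b = []    # hypothetical names for illustration
shelf_a.append('book')

same_object = shelf_a is shelf_b
print(dogs, cats)         # 3 15
print(shelf_b)            # ['book']
print(same_object)        # True
```

If two independent lists are needed, assign them separately: `shelf_a = []` and `shelf_b = []`.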

Variable naming

There are only two hard things in Computer Science:
cache invalidation and naming things.

Phil Karlton

Variable naming

Available characters are letters, digits, and the underscore _ (a name cannot start with a digit):

>>> theSun_and_8_planets = 'solar'

Variable naming

b (single lowercase letter)
B (single uppercase letter)
CapitalizedWords
or CamelCase 🐪

lowercase
lower_case_with_underscores
UPPERCASE
UPPER_CASE_WITH_UNDERSCORES

See PEP8

Variable naming

The convention is to use lower_case_with_underscores (🐍 snake case) for variables and functions

You cannot use a keyword as a variable name

>>> global = 'World'
  File "<stdin>", line 1
    global = 'World'
           ^
SyntaxError: invalid syntax

You cannot use a keyword as a variable name

>>> help("keywords")

Here is a list of the Python keywords.  Enter any keyword to get more help.

False       class       from        or
None        continue    global      pass
True        def         if          raise

Keywords

False     None      True      and       as
assert    async     await     break     class
continue  def       del       elif      else
except    finally   for       from      global
if        import    in        is        lambda
nonlocal  not       or        pass      raise
return    try       while     with      yield
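
The keyword list can also be checked from code with the standard `keyword` module. The sketch below is only an illustration; the helper `is_valid_name` is a hypothetical name, not part of the standard library.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
reserved = keyword.kwlist

# keyword.iskeyword() tells whether a string is a reserved word.
global_is_reserved = keyword.iskeyword('global')
counter_is_reserved = keyword.iskeyword('counter')


def is_valid_name(candidate):
    """Hypothetical helper: True if `candidate` can be used as a variable name."""
    # str.isidentifier() checks only the character rules,
    # so the keyword check has to be added separately.
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


print(global_is_reserved, counter_is_reserved)    # True False
print(is_valid_name('theSun_and_8_planets'))      # True
print(is_valid_name('8_planets'))                 # False
print(is_valid_name('global'))                    # False
```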

Variable naming

Python 3 allows some non-ASCII letters in names, but using them is a bad idea

>>> Öl = 'Barrel.'
>>> print(Öl * 3)
Barrel.Barrel.Barrel.
>>> Ø = 0
>>> Ж = 8
>>> Зима = 'Winter'

Don’t do that

>>> ქ = 'khar'
>>> ձ = 'ja'
>>> ж = 'zhe'
>>> ξ = 'xi'
>>> ש = 'shin'
>>> ش = 'sheen'

Don’t do that

>>> o = 'Latin'
>>> ο = 'Greek'
>>> о = 'Cyrillic'
>>> օ = 'Armenian'
>>> ჿ = 'Georgian'
>>> print(o, ο, о, օ, ჿ)
Latin Greek Cyrillic Armenian Georgian

Non-letters are forbidden

>>> × = 'multiply'
  File "<stdin>", line 1
    × = 'multiply'
    ^
SyntaxError: invalid character in identifier

Non-letters are forbidden

>>> ⼤ = 'big'
  File "<stdin>", line 1
    ⼤ = 'big'
    ^
SyntaxError: invalid character in identifier

Process and output variable

>>> some = 'thing'
>>> len(some)
5
>>> print(some)
thing

Process and output variable

>>> print(some)
thing
>>> print('Any' + some)
Anything

Objects and classes

Let's imagine some animal

Properties

  • Size

Methods

  • Run
  • Sleep

cat is an Object

height = cat.size

result = cat.run(42)
is_relaxed = cat.sleep()

Object has properties

Object has methods

Use dot . to call properties and methods

Class vs Object

Class →

class Animal:
    def __init__(self, name):
        self.name = name

Objects →

cat   = Animal('Tom')
mouse = Animal('Jerry')
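
The cat example can be combined with the Animal class. This is a minimal sketch: the `size` attribute and the bodies of `run` and `sleep` are invented for illustration, since the slides only show how they are called.

```python
class Animal:
    def __init__(self, name, size):
        self.name = name      # attribute shown on the slides
        self.size = size      # assumed here to be a plain number

    def run(self, distance):
        # Hypothetical behaviour: report how far the animal ran.
        return f'{self.name} ran {distance} m'

    def sleep(self):
        # Hypothetical behaviour: a sleeping animal is relaxed.
        return True


cat = Animal('Tom', 25)

height = cat.size             # read a property with a dot
result = cat.run(42)          # call a method with a dot and parentheses
is_relaxed = cat.sleep()

print(height)                 # 25
print(result)                 # Tom ran 42 m
print(is_relaxed)             # True
```

Every object created from `Animal` gets its own `name` and `size`, while `run` and `sleep` are defined once in the class.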

Why does this matter?

Every data type is a class

Every piece of data is an object

Data types

Built-in data types

See also docs.python.org/3.7/library/stdtypes

Detect type of data

>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>
>>> type('3.14')
<class 'str'>

Detect type of variables and expressions

>>> type(some)    # variable
<class 'str'>
>>> type(5 + 0.5) # expression
<class 'float'>

List methods with dir()

>>> some = 'Thing'
>>> dir(some)     # or
>>> dir('Thing')  # or
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
... 'swapcase', 'title', 'translate', 'upper', 'zfill']

List methods with dir()

>>> some = 'Thing'
>>> dir(some)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
... 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> some.swapcase()
'tHING'

bool

The Boolean type is used for logical data: True and False

Note the capitalization: True and False, not true and false

Logic

Comparison results have boolean type

>>> type(5 < 2)
<class 'bool'>
>>> apple = 'fruit'
>>> type(apple == 'fruit')
<class 'bool'>

Convert to boolean

>>> bool(7)
True
>>> bool('non empty')
True
>>> bool([2020, 11, 22])
True

Convert to boolean

>>> bool(0)
False
>>> bool('')
False
>>> bool([])
False

Implicit conversion to boolean

>>> if list_name:
...    # do something with list_name

False value testing
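
The implicit conversion can be tried with a small function. This is a sketch only; the function name `describe` is made up for the example.

```python
def describe(items):
    """Hypothetical helper that relies on the truth value of a list."""
    # An empty list is falsy, so no explicit len(items) == 0 check is needed.
    if items:
        return f'{len(items)} item(s)'
    return 'nothing to process'


print(describe([2020, 11, 22]))   # 3 item(s)
print(describe([]))               # nothing to process

# Values that convert to False: zero, empty string, empty containers, None.
falsy_values = [0, 0.0, '', [], (), {}, set(), None]
all_falsy = not any(falsy_values)
print(all_falsy)                  # True
```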

int

integer number

>>> type(42)
<class 'int'>

Underscores for long numbers

>>> 4_294_967_296     # 2**32
4294967296
>>> +7_800_775_00_00  # even phone numbers
78007750000

int can use various bases

>>> 0xC0FFEE  # hexadecimal
12648430
>>> 0o777     # octal
511
>>> 0b1111    # binary
15
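
The same bases work in the other direction too. A short sketch using only built-in functions: `int()` with a base argument parses a string, and `hex()`, `oct()` and `bin()` produce the prefixed text form.

```python
# int() accepts a string and a base; the prefix is optional when the base is given.
from_hex = int('C0FFEE', 16)
from_octal = int('777', 8)
from_binary = int('0b1111', 2)
print(from_hex, from_octal, from_binary)   # 12648430 511 15

# hex(), oct() and bin() convert an int back to a prefixed string.
print(hex(12648430))                       # 0xc0ffee
print(oct(511))                            # 0o777
print(bin(15))                             # 0b1111
```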

float

Floating point number

>>> 3.1415926
3.1415926
>>> 9.
9.0
>>> 3e8
300000000.0

float

3e8 = 3 × 10⁸ = 300000000.0 # approximate speed of light, meters per second

125e-3 = 125 × 10⁻³ = 0.125

>>> 6.022e23 # Avogadro constant, mol⁻¹
6.022e+23

Change type

Use the name of a type as a function to convert data

>>> int(3.1415926)
3
>>> float(42)
42.0
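
A few more conversions are worth trying, because `int()` truncates rather than rounds and because strings can be converted too. The sketch below uses only built-ins; the variable `conversion_failed` is introduced just for the demonstration.

```python
# int() drops the fractional part (truncates toward zero); it does not round.
print(int(3.99))      # 3
print(int(-3.99))     # -3
print(round(3.99))    # 4

# Strings can be converted as well, if they look like a number.
print(int('42') + 1)        # 43
print(float('3.14') * 2)    # 6.28

# A string that is not a number raises ValueError.
conversion_failed = False
try:
    int('forty-two')
except ValueError as error:
    conversion_failed = True
    print('cannot convert:', error)
```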

Implicit type changing

>>> 3. + 2
5.0
>>> 3 + 2.
5.0

str

A string is a sequence of characters

Individual characters are accessible

>>> 'string'[3]
'i'

as well as the whole string

>>> 'string'.upper()
'STRING'

String parts are strings too

>>> 'string'[3].upper()
'I'

Methods of str

  • capitalize() → string
  • center(width[, fillchar]) → string
  • count(sub[, start[, end]]) → int
  • decode([encoding[,errors]]) → object (Python 2 only; in Python 3, decode() is a method of bytes)
  • encode([encoding[,errors]]) → bytes
  • endswith(suffix[, start[, end]]) → bool
  • expandtabs([tabsize]) → string
  • find(sub [,start [,end]]) → int
  • format(*args, **kwargs) → string
  • index(sub [,start [,end]]) → int

  • isalnum() → bool
  • isalpha() → bool
  • isdigit() → bool
  • islower() → bool
  • isspace() → bool
  • istitle() → bool
  • isupper() → bool
  • join(iterable) → string
  • ljust(width[, fillchar]) → string
  • lower() → string
  • lstrip([chars]) → string
  • partition(sep) → (head, sep, tail)
  • replace(old, new[, count]) → string
  • rfind(sub [,start [,end]]) → int

  • rindex(sub [,start [,end]]) → int
  • rjust(width[, fillchar]) → string
  • rpartition(sep) → (head, sep, tail)
  • rsplit([sep [,maxsplit]]) → list of strings
  • rstrip([chars]) → string
  • split([sep [,maxsplit]]) → list of strings
  • splitlines(keepends=False) → list of strings
  • startswith(prefix[, start[, end]]) → bool
  • strip([chars]) → string

  • swapcase() → string
  • title() → string
  • translate(table) → string
  • upper() → string
  • zfill(width) → string

38 methods!
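
A handful of these methods cover most everyday text cleaning. The following sketch chains a few of them; the sample text and variable names are chosen only for illustration.

```python
raw = '  data processing with python  '

cleaned = raw.strip()                 # remove surrounding whitespace
words = cleaned.split()               # split on whitespace into a list
title = cleaned.title()               # capitalize every word
slug = '-'.join(words)                # glue the words back with a separator

print(words)                          # ['data', 'processing', 'with', 'python']
print(title)                          # Data Processing With Python
print(slug)                           # data-processing-with-python
print(cleaned.count('p'))             # 2
print(cleaned.find('python'))         # 21
print(cleaned.startswith('data'))     # True
print(cleaned.replace('python', 'Python'))
```

String methods return new strings; `raw` itself is never modified.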

String subtypes

>>> 'Generic' or "common"
'Generic'

Special characters

>>> 'B letter \x42'
'B letter B'
>>> "\x53ame behavior with double quotation marks"
'Same behavior with double quotation marks'
>>> 'Unicode: питон — это змея 🐍 蛇'
'Unicode: питон — это змея 🐍 蛇'

Use """triple delimiters""" to write multi-line strings

>>> """
... Multiline strings
... often used as comments
... """
'\nMultiline strings\noften used as comments\n'

'''Triple delimiters'''

>>> '''
... These strings can contain
... 'single' or "double" quotation marks
... '''
'\nThese strings can contain\n\'single\' or "double" quotation marks\n'

Special characters

C-like notation

>>> '\xAB'
'«'

Special characters

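What about the small Russian letter “ef” (ф)? Its hexadecimal code is 444.

>>> '\x444'
'D4'

\x takes exactly two hexadecimal digits, so this is '\x44' (the letter D) followed by the character '4'.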

Unicode characters

>>> '\u444'
SyntaxError: (unicode error) 'unicodeescape' codec
can't decode bytes in position 0-4: truncated \uXXXX escape
>>> '\u0444'
'ф'
>>> '\u1f41b'
'ὁb'

Unicode characters above U+FFFF

>>> '\U1f41b'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec
can't decode bytes in position 0-6: truncated \UXXXXXXXX escape
>>> '\U0001f41b'
'🐛'
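
The code points used in these escapes can be looked up with the built-in functions `ord()` and `chr()`. A small sketch (the variable name `ef_code` is illustrative):

```python
# ord() returns the code point of a character, chr() does the reverse.
ef_code = ord('ф')
print(hex(ef_code))          # 0x444
print(chr(0x444))            # ф
print(chr(0x1f41b))          # 🐛

# The three escape forms differ only in how many hex digits they take:
# \xhh (2 digits), \uhhhh (4 digits), \Uhhhhhhhh (8 digits).
print('\x44' == 'D')                    # True
print('\u0444' == chr(0x444))           # True
print('\U0001f41b' == chr(0x1f41b))     # True
```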

Unicode strings — u''

>>> u'日'
'日'

Byte strings — b''

>>> b'Byte'
b'Byte'
>>> b'Жи-ши'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

Escape backslash

Just double it

>>> print('\\back')
\back

Raw strings — r''

A raw string keeps backslashes as ordinary characters

>>> r'\back\slash'
'\\back\\slash'

>>> r'^\S+ome\regular\expr\e\s\Sio\n{7}'
'^\\S+ome\\regular\\expr\\e\\s\\Sio\\n{7}'

See also re — Regular expression operations

>>> r'C:\Windows\system32\drivers\hosts.txt'
'C:\\Windows\\system32\\drivers\\hosts.txt'
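
The difference between a normal and a raw literal is easiest to see by counting characters. The sketch below also passes a raw pattern to the standard `re` module; the sample text 'Part 2 of 3' is made up for the example.

```python
import re

# '\n' is one character (a newline); r'\n' is two: a backslash and the letter n.
print(len('\n'), len(r'\n'))              # 1 2

# Doubling the backslash and using a raw string give the same result.
print('\\back' == r'\back')               # True

# In a pattern, \d must reach the re module intact, so a raw string is convenient.
numbers = re.findall(r'\d+', 'Part 2 of 3')
print(numbers)                            # ['2', '3']
```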

Raw strings with triple delimiters

>>> r'''
... TenorI = \context Voice = TenorI {
...     \global
...     \dynamicUp \stemUp \slurUp \tieUp
...     \tempo Moderato
... '''
'\nTenorI = \\context Voice = TenorI {\n    \\global\n    \\dynamicUp \\stemUp \\slurUp \\tieUp\n    \\tempo Moderato\n'

Frescobaldi

Format strings — f''

>>> pi = 3.14159265358
>>> f'π is {pi}'
'π is 3.14159265358'

Available since Python 3.6 (released in December 2016). See also realpython.com/python-f-strings
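
Inside the braces, an optional format specification after a colon controls how the value is printed. A brief sketch; the variable `radius` is an assumption added for the example.

```python
pi = 3.14159265358
radius = 2    # hypothetical value for illustration

# A format specification follows the colon inside the braces.
print(f'π is about {pi:.2f}')                 # π is about 3.14
print(f'area = {pi * radius ** 2:.1f}')       # area = 12.6

# Thousands separators and other bases are available too.
print(f'{4294967296:,}')                      # 4,294,967,296
print(f'{255:#x} {255:08b}')                  # 0xff 11111111
```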

Concatenate strings with +

>>> 'head ' + 'and' + ' tail'
'head and tail'

Concatenate strings with +

>>> 3 + ' is three'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> str(3) + ' is three'
'3 is three'

list

A list is a sequence of values

List items can have various types

>>> [1, 2, 3, 5, 7, 11, 'numbers']

Empty list

>>> empty = []

List items are numbered from 0

The index of an item is its offset from the left edge of the list

>>> prime_numbers = [1, 2, 3, 5, 7, 11]
>>> prime_numbers[3]
5

Use negative indices to access the last items

>>> prime_numbers[-1]
11
>>> prime_numbers[-2]
7

Slice with first:last (the item at index last is not included)

>>> prime_numbers[1:4]
[2, 3, 5]

Slice with first: (no right bound)

>>> prime_numbers[1:]
[2, 3, 5, 7, 11]

Slice with :last (no left bound)

>>> prime_numbers[:3]
[1, 2, 3]

Slice with ::step after the second colon

>>> prime_numbers[1:6:2]
[2, 5, 11]

Without bounds but with ::step

>>> prime_numbers[::2]
[1, 3, 7]
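
A few more slice forms follow from the same rules. The sketch reuses the `prime_numbers` list from the slides; the name `head` is introduced only for this example.

```python
prime_numbers = [1, 2, 3, 5, 7, 11]   # the list used on the slides

# The right bound is excluded, so first:last yields last - first items.
print(prime_numbers[1:4])     # [2, 3, 5]

# Negative bounds count from the right edge.
print(prime_numbers[-3:])     # [5, 7, 11]

# A negative step walks the list backwards.
print(prime_numbers[::-1])    # [11, 7, 5, 3, 2, 1]

# A slice is a new list; changing it leaves the original untouched.
head = prime_numbers[:3]
head.append(99)
print(head)                   # [1, 2, 3, 99]
print(prime_numbers)          # [1, 2, 3, 5, 7, 11]
```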

Assign a new value to a certain item

>>> prime_numbers[3] = 'R'
>>> prime_numbers
[1, 2, 3, 'R', 7, 11]

String parts are accessible the same way

>>> line = 'abcdefghi'
>>> line[3]
'd'

Get a substring

>>> line[:3]
'abc'
>>> line[::2]
'acegi'

Access a single character

>>> names = ['Alice', 'Bob', 'Charlie']
>>> names[2][0]
'C'

Methods of list

>>> help(list)
Help on class list in module builtins:
...
 |  append(...)
 |      L.append(object) -- append object to end
 |
 |  count(...)

Methods of list

  • append(object)
  • count(value) → integer
  • extend(iterable)
  • index(value, [start, [stop]]) → integer
  • insert(index, object)
  • pop([index]) → item
  • remove(value)
  • reverse()
  • sort(key=None, reverse=False)

Add items to list

>>> abc = ['a', 'b', 'c']
>>> abc.append('e')
>>> abc.extend(['f', 'g'])
>>> abc.insert(3, 'd')
>>> abc
['a', 'b', 'c', 'd', 'e', 'f', 'g']

Remove items from list

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> abc.pop(3) # returns the deleted item
'd'
>>> del abc[1] # returns nothing
>>> abc.remove('e')
>>> abc
['a', 'c', 'f', 'g']

tuple

A tuple is a read-only list

>>> (1, 2, 3, 5, 7, 11)
(1, 2, 3, 5, 7, 11)
>>> (1, 2, 3, 5, 7, 11)[3]
5

A tuple is a read-only list

>>> wheels = (2, 3, 4, 6, 8)
>>> wheels[2] = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Make tuple

Use parentheses and commas to make a tuple

>>> (2)
2        # oops! It's an integer
>>> (2,)
(2,)     # tuple has one item
>>> ()
()       # tuple is empty

Convert a tuple to a list to make it writable

>>> list((1, 2, 3, 5, 7, 11))
[1, 2, 3, 5, 7, 11]
>>> tuple([1, 2, 3, 5, 7, 11])
(1, 2, 3, 5, 7, 11)
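
Put together, changing one item of a tuple means making a list copy, editing it, and converting back. A minimal sketch using the `wheels` tuple from the earlier slide; `editable` and `updated` are illustrative names.

```python
wheels = (2, 3, 4, 6, 8)

# A tuple cannot be changed in place, so edit a list copy and convert back.
editable = list(wheels)
editable[2] = 7
updated = tuple(editable)

print(wheels)     # (2, 3, 4, 6, 8)
print(updated)    # (2, 3, 7, 6, 8)
```

The original tuple stays as it was; `updated` is a new object.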

dict

A dictionary is a collection of key: value pairs

>>> apple = {'color': 'red', 'weight': 7, 'shape': 'ball'}
>>> apple['color']
'red'
>>> apple['shape']
'ball'

Change values and add new ones

>>> apple['color'] = 'yellow'
>>> apple['origin'] = 'Normandy'
>>> f"{apple['color']} apple came from {apple['origin']}"
'yellow apple came from Normandy'

Get nonexistent value

>>> apple['nonexistent']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'nonexistent'

>>> apple.get('nonexistent', 'none')
'none'

Get any value safely

>>> apple.get('nonexistent', 'none')
'none'

>>> apple.get('color', 'none')
'yellow'
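
Besides `get()`, a key can be tested with `in`, and all pairs can be visited with `items()`. The sketch rebuilds the `apple` dictionary in the state it has after the changes above; the key 'taste' is a made-up missing key.

```python
apple = {'color': 'yellow', 'weight': 7, 'shape': 'ball', 'origin': 'Normandy'}

# `in` checks for a key without raising KeyError.
print('color' in apple)            # True
print('taste' in apple)            # False

# get() without a second argument returns None for a missing key.
print(apple.get('taste'))          # None

# items() yields key/value pairs in insertion order (Python 3.7+).
for key, value in apple.items():
    print(key, '->', value)
```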

Make an empty dict

>>> empty = dict()  # possible but ugly
>>> empty
{}

>>> hollow = {}     # better

set

>>> even = {0, 2, 4, 6, 2, 0, 0}
>>> even
{0, 2, 4, 6}

set

>>> 5 in even
False
>>> 2 in even
True

Methods of set

>>> dir(set)
['__and__', '__class__', '__contains__', '__delattr__', '__dir__',
... 'add', 'clear', 'copy', 'difference', 'difference_update',
'discard', 'intersection', 'intersection_update', 'isdisjoint',
'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference',
'symmetric_difference_update', 'union', 'update']

docs.python.org / Built-in Types #set

Methods of set

>>> help(set)
 |  add(...)
 |      Add an element to a set.
 |      This has no effect if the element is already present.
 |
 |  clear(...)
 |      Remove all elements from this set.

Call some methods of set type

>>> threes = {3, 6, 9, 12, 15, 18}
>>> fives  = {5, 10, 15, 20}
>>> threes.union(fives)
{3, 5, 6, 9, 10, 12, 15, 18, 20}
>>> threes.difference(fives)
{3, 6, 9, 12, 18}

Convert range to set

>>> threes = set(range(3, 21, 3))
>>> fives  = set(range(5, 25, 5))
>>> threes.union(fives)
{3, 5, 6, 9, 10, 12, 15, 18, 20}
>>> threes.difference(fives)
{3, 6, 9, 12, 18}

Convert set to list

>>> threes[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable
>>> list(threes)[2]
9
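
A set has no defined order, so the position of an item in `list(threes)` is not guaranteed. When a predictable order matters, `sorted()` is a safer choice. A short sketch with the same sets (`ordered` is an illustrative name):

```python
threes = set(range(3, 21, 3))
fives = set(range(5, 25, 5))

# sorted() returns a list in ascending order, which makes indexing predictable.
ordered = sorted(threes)
print(ordered)                    # [3, 6, 9, 12, 15, 18]
print(ordered[2])                 # 9

# The operators | and & are shorthand for union() and intersection().
print(sorted(threes | fives))     # [3, 5, 6, 9, 10, 12, 15, 18, 20]
print(sorted(threes & fives))     # [15]
```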

range

A range is a sequence of integers that change by a constant step: an arithmetic progression

Range

>>> teen = range(13, 20)

Mathematically, this is the half-open interval [13, 20): it includes 13 and excludes 20

>>> teen
range(13, 20)
>>> for age in teen: print(age, end=', ')
...
13, 14, 15, 16, 17, 18, 19,

Get an arbitrary item from a range by its index

>>> teen[3]
16

Get a part of a range

>>> teen[3:]
range(16, 20)
>>> teen[:3]
range(13, 16)

Get a part of a range with a step as the third parameter

>>> teen[3::2]
range(16, 20, 2)

Range with step

>>> for item in range(0, 100, 9):
...     print(item, end=', ')
...
0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99,

Range with negative step

>>> for item in range(3, 0, -1):
...     print(item, end=', ')
...
3, 2, 1,

No zero here

Convert range to list

>>> before_ten = list(range(0, 10))
>>> before_ten
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
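
A range does not need to be turned into a list for every task. The following sketch shows a few operations that work on the range itself, plus the countdown that does reach zero.

```python
teen = range(13, 20)

# A range knows its length and supports membership tests without building a list.
print(len(teen))              # 7
print(16 in teen)             # True
print(20 in teen)             # False

# To include zero in a countdown, stop one step past it.
print(list(range(3, -1, -1)))     # [3, 2, 1, 0]

# A single argument means "from 0 up to, but not including, stop".
print(list(range(5)))             # [0, 1, 2, 3, 4]
```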

Conclusion

Next part

Part 3. Control flow and loops

Alexander Sapozhnikov, Tatyana Vasilieva

https://as.susu.ru