A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed. The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:
tup = (4, 5, 6)
tup
(4, 5, 6)
In many contexts, the parentheses can be omitted:
tup = 4, 5, 6
tup
(4, 5, 6)
You can convert any sequence or iterator to a tuple by invoking tuple:
tuple([4, 0, 2])
tup = tuple('string')
tup
('s', 't', 'r', 'i', 'n', 'g')
Elements can be accessed with square brackets []. Note the zero-based indexing:
tup[0]
's'
Tuples can themselves contain tuples:
nested_tup = (4, 5, 6), (7, 8)
nested_tup
((4, 5, 6), (7, 8))
nested_tup[0]
(4, 5, 6)
nested_tup[1]
(7, 8)
While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot:
tup = tuple(['foo', [1, 2], True])
tup[2]
True
```{python}
tup[2] = False
```
TypeError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 tup[2] = False
TypeError: 'tuple' object does not support item assignment
If an object inside a tuple is mutable, such as a list, you can modify it in place:
tup[1].append(3)
tup
('foo', [1, 2, 3], True)
You can concatenate tuples using the + operator to produce longer tuples:
(4, None, 'foo') + (6, 0) + ('bar',)
(4, None, 'foo', 6, 0, 'bar')
If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the righthand side of the equals sign:
tup = (4, 5, 6)
tup
(4, 5, 6)
a, b, c = tup
c
6
Even sequences with nested tuples can be unpacked:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d
7
Unpacking makes it easy to swap variable names:
a, b = 1, 4
a
1
b
4
b, a = a, b
a
4
b
1
A common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
seq
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')
a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9
The *rest syntax plucks a few elements from the beginning of a tuple and collects the remainder in a list:
values = 1, 2, 3, 4, 5
a, b, *rest = values
rest
[3, 4, 5]
As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables:
a, b, *_ = values
Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is count, which counts the number of occurrences of a value:
a = (1, 2, 2, 2, 2, 3, 4, 5, 7, 8, 9)
a.count(2)
4
In contrast with tuples, lists are variable length and their contents can be modified in place; that is, lists are mutable. You can define them using square brackets [] or the list function:
a_list = [2, 3, 7, None]
tup = ("foo", "bar", "baz")
b_list = list(tup)
b_list
['foo', 'bar', 'baz']
b_list[1] = "peekaboo"
b_list
['foo', 'peekaboo', 'baz']
Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions. The list function is also frequently used to materialize an iterator or generator:
gen = range(10)
gen
range(0, 10)
list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The append method adds an element to the end of the list:
b_list.append("dwarf")
b_list
['foo', 'peekaboo', 'baz', 'dwarf']
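The insert method inserts an element at a specific location in the list:
b_list.insert(1, "red")
b_list
['foo', 'red', 'peekaboo', 'baz', 'dwarf']
insert is computationally more expensive than append, because references to the subsequent elements have to be shifted internally to make room for the new element.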
The pop method, the inverse of insert, removes and returns the element at a particular index:
b_list.pop(2)
'peekaboo'
b_list
['foo', 'red', 'baz', 'dwarf']
The remove method locates the first occurrence of a value and removes it from the list:
b_list.append("foo")
b_list
['foo', 'red', 'baz', 'dwarf', 'foo']
b_list.remove("foo")
b_list
['red', 'baz', 'dwarf', 'foo']
Check if a list contains a value using the in keyword:
"dwarf" in b_list
True
The keyword not can be used to negate in:
"dwarf" not in b_list
False
As with tuples, use + to concatenate two lists:
[4, None, "foo"] + [7, 8, (2, 3)]
[4, None, 'foo', 7, 8, (2, 3)]
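The extend method appends multiple elements to an existing list:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x
[4, None, 'foo', 7, 8, (2, 3)]
List concatenation by addition is a comparatively expensive operation, since a new list must be created and the objects copied over. Using extend to append elements to an existing list is usually preferable, especially when building up a large list. Thus,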
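```{python}
list_of_lists = [[1, 2], [3, 4], [5]]  # hypothetical sample data
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)  # grows the existing list in place
```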
is generally faster than
```{python}
everything = []
for chunk in list_of_lists:
    everything = everything + chunk  # builds a brand-new list on every pass
```
The sort method sorts a list in place (without creating a new object):
a = [7, 2, 5, 1, 3]
a.sort()
a
[1, 2, 3, 5, 7]
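sort has a few options that occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces a value to use to sort the objects. For example, we can sort a collection of strings by their lengths:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b
['He', 'saw', 'six', 'small', 'foxes']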
Slicing semantics takes a bit of getting used to, especially if you’re coming from R or MATLAB.
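Sections of most sequence types can be selected with slice notation, start:stop, passed to the indexing operator []. The element at the start index is included, while the element at the stop index is not:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[3:5]
[7, 5]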
Slices can also be assigned with a sequence:
seq[3:5] = [6, 3]
seq
[7, 2, 3, 6, 3, 6, 0, 1]
Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively:
seq[:5]
[7, 2, 3, 6, 3]
seq[3:]
[6, 3, 6, 0, 1]
Negative indices slice the sequence relative to the end:
seq[-4:]
[3, 6, 0, 1]
A step can also be used after a second colon to, say, take every other element:
seq[::2]
[7, 3, 3, 0]
A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:
seq[::-1]
[1, 0, 6, 3, 6, 3, 2, 7]
The dictionary or dict may be the most important built-in Python data structure.
One approach for creating a dictionary is to use curly braces {} and colons to separate keys and values:
empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1
{'a': 'some value', 'b': [1, 2, 3, 4]}
You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:
d1[7] = "an integer"
d1
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
As with lists and tuples, you can check whether a dictionary contains a key using in:
"b" in d1
True
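You can delete values using either the del keyword or the pop method (which simultaneously returns the value and deletes the key):
del d1[7]
d1
{'a': 'some value', 'b': [1, 2, 3, 4]}
ret = d1.pop("a")
ret
'some value'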
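The keys and values methods give you iterators over the keys and values of the dictionary, respectively:
list(d1.keys())
['b']
list(d1.values())
[[1, 2, 3, 4]]
The items method iterates over the keys and values as 2-tuples:
list(d1.items())
[('b', [1, 2, 3, 4])]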
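Use the update method to merge one dictionary into another. It changes the dictionary in place, so any keys that already exist have their old values replaced:
d1.update({"b": "foo", "c": 12})
d1
{'b': 'foo', 'c': 12}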
### Creating dictionaries from sequences
list(range(5))
[0, 1, 2, 3, 4]
tuples = zip(range(5), reversed(range(5)))
tuples
mapping = dict(tuples)
mapping
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
Imagine categorizing a list of words by their first letters as a dictionary of lists:
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter
{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
The setdefault dictionary method can be used to simplify this workflow. The preceding for loop can be rewritten as:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter
{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
The built-in collections module has a useful class, defaultdict, which makes this even easier.
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter
defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
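While the values of a dictionary can be any Python object, the keys generally have to be immutable objects like scalars or tuples (whose elements must be immutable too). The technical term for this requirement is hashability.
To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can be:
d = {}
d[tuple([1, 2, 3])] = 5
d
{(1, 2, 3): 5}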
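One way to check whether an object can serve as a dictionary key is to try hashing it with the built-in hash function. The helper below is a minimal sketch; the name is_hashable is made up for illustration and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (illustrative helper)."""
    try:
        hash(obj)
    except TypeError:
        # Mutable containers such as lists raise TypeError when hashed
        return False
    return True


print(is_hashable("a string"))       # True
print(is_hashable((1, 2, (2, 3))))   # True: a tuple of hashable elements
print(is_hashable((1, 2, [2, 3])))   # False: the tuple contains a list
```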
A set is an unordered collection of unique elements. A set can be created in two ways: via the set function or via a set literal with curly braces:
set([2, 2, 2, 1, 3, 3])
{2, 2, 1, 3, 3}
{1, 2, 3}
Sets support mathematical set operations like union, intersection, difference, and symmetric difference.
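The union of two sets is the set of distinct elements occurring in either set. It can be computed with either the union method or the | binary operator:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
a.union(b)
a | b
{1, 2, 3, 4, 5, 6, 7, 8}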
The intersection contains the elements occurring in both sets. Use the & operator or the intersection method:
a.intersection(b)
a & b
{3, 4, 5}
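Difference and symmetric difference, mentioned above but not demonstrated, follow the same pattern of a method plus an operator. A short sketch reusing the same two sets (the result variable names are illustrative):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements in a that are not in b; a - b is the operator form
only_in_a = a.difference(b)

# Elements in exactly one of the two sets; a ^ b is the operator form
in_exactly_one = a.symmetric_difference(b)

print(sorted(only_in_a))       # [1, 2]
print(sorted(in_exactly_one))  # [1, 2, 6, 7, 8]
```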
A table of commonly used set
methods
All of the logical set operations have in-place counterparts, which enable you to replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:
c = a.copy()
c |= b
c
{1, 2, 3, 4, 5, 6, 7, 8}
d = a.copy()
d &= b
d
{3, 4, 5}
Like dictionary keys, set elements generally must be immutable, and they must be hashable. To store list-like elements (or other mutable sequences) in a set, you can convert them to tuples.
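As an illustration of the tuple conversion, the sketch below removes duplicate rows from a list of lists. The sample data and variable names are hypothetical.

```python
rows = [[1, 2], [3, 4], [1, 2]]  # hypothetical data with one duplicate row

# Lists are unhashable, so each row is converted to a tuple first
unique_rows = {tuple(row) for row in rows}

print(len(unique_rows))  # 2
```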
You can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)
True
a_set.issuperset({1, 2, 3})
True
enumerate returns a sequence of (i, value) tuples, which is handy when you need the index of each item while iterating over a sequence.
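A small sketch of enumerate in use, with a made-up list. The optional start argument shown at the end is a standard enumerate parameter, although these notes do not cover it.

```python
collection = ["foo", "bar", "baz"]  # hypothetical sample data

# Materialize the (index, value) pairs to see what enumerate yields
pairs = list(enumerate(collection))
print(pairs)  # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

# Typical use: unpack the index and value directly in the loop header
numbered = []
for i, value in enumerate(collection, start=1):
    numbered.append(f"{i}. {value}")
print(numbered)  # ['1. foo', '2. bar', '3. baz']
```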
sorted returns a new sorted list from the elements of any sequence:
sorted([7, 1, 2, 9, 3, 6, 5, 0, 22])
[0, 1, 2, 3, 5, 6, 7, 9, 22]
zip “pairs” up the elements of a number of lists, tuples, or other sequences to create a sequence of tuples; wrap the result in list to materialize them:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)
[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))
[('foo', 'one', False), ('bar', 'two', True)]
A common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")
0: foo, one
1: bar, two
2: baz, three
reversed iterates over the elements of a sequence in reverse order:
list(reversed(range(10)))
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
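List comprehensions let you concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter. They take the basic form:
[expr for value in collection if condition]
For example, given a list of strings, we could filter out strings with length 2 or less and convert the rest to uppercase like this:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]
['BAT', 'CAR', 'DOVE', 'PYTHON']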
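For comparison, the comprehension above can be spelled out as an explicit for loop. This is only a sketch of the equivalence, and result is an illustrative name.

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

# Long-hand equivalent of [x.upper() for x in strings if len(x) > 2]
result = []
for x in strings:
    if len(x) > 2:                # the filter condition
        result.append(x.upper())  # the output expression

print(result)  # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```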
A dictionary comprehension looks like this:
dict_comp = {key-expr: value-expr for value in collection
             if condition}
A set comprehension looks like the equivalent list comprehension, except with curly braces instead of square brackets. Suppose we wanted a set containing just the lengths of the strings contained in the collection:
unique_lengths = {len(x) for x in strings}
unique_lengths
{1, 2, 3, 4, 6}
As a simple dictionary comprehension example, we could create a lookup map of these strings for their locations in the list:
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping
{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}
Suppose we have a list of lists containing some English and Spanish names. We want to get a single list containing all names with two or more a’s in them:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]
result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result
['Maria', 'Natalia']
Here is another example where we “flatten” a list of tuples of integers into a simple list of integers:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Functions are the primary and most important method of code organization and reuse in Python. They are declared with the def keyword.
Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. Here we will define a function with an optional z argument with the default value 1.5:
def my_function(x, y, z=1.5):
    return (x + y) * z
my_function(4, 25)
43.5
The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any). Keyword arguments themselves can be specified in any order.
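The calls below sketch the valid ways of mixing positional and keyword arguments with the my_function defined earlier. The argument values are arbitrary.

```python
def my_function(x, y, z=1.5):
    return (x + y) * z


positional_only = my_function(10, 20)         # z falls back to its default of 1.5
keyword_for_z = my_function(5, 6, z=2)        # positional first, then keyword
all_keywords = my_function(y=6, x=5, z=7)     # keywords can come in any order

print(positional_only)  # 45.0
print(keyword_for_z)    # 22
print(all_keywords)     # 77

# my_function(z=2, 5, 6) would be a SyntaxError:
# a positional argument cannot follow a keyword argument.
```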
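Functions can access variables created inside the function as well as those outside the function in higher (or even global) scopes. A more descriptive name describing a variable scope in Python is a namespace.
Consider the following function:
a = []
def func():
    for i in range(5):
        a.append(i)
Here a is defined outside the function, so each call to func() appends to the same list. Had a = [] been created inside func, it would be a local variable, created on each call and destroyed when the function exits.
func()
func()
a
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]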
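The difference between modifying an outer variable and creating a local one can be sketched as follows. The function names are made up for illustration.

```python
a = []


def append_to_outer():
    # No assignment to `a` inside the function, so the name resolves
    # to the list defined in the enclosing namespace.
    a.append("changed")


def rebind_local():
    # Assignment creates a new local variable named `a`; the outer list
    # is untouched, and the local name disappears when the function returns.
    a = ["local only"]
    return a


append_to_outer()
local_result = rebind_local()

print(a)             # ['changed']
print(local_result)  # ['local only']
```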
A function can return multiple values:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c
a, b, c = f()
a
5
What’s happening here is that the function is actually just returning one object, a tuple, which is then being unpacked into the result variables.
Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings:
states = [" Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south carolina##", "West virginia?"]
import re
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result
clean_strings(states)
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
Another approach is to make a list of the operations you want to apply to a particular set of strings:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)
clean_ops = [str.strip, remove_punctuation, str.title]
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result
clean_strings(states, clean_ops)
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
You can use functions as arguments to other functions like the built-in map function, which applies a function to a sequence of some kind:
for x in map(remove_punctuation, states):
    print(x)
Alabama
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia
Anonymous (lambda) functions are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword.
For example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:
strings = ["foo", "card", "bar", "aaaaaaa", "ababdo"]
strings.sort(key=lambda x: len(set(x)))
strings
['aaaaaaa', 'foo', 'bar', 'card', 'ababdo']
Many objects in Python support iteration, such as over objects in a list or lines in a file. For example, iterating over a dictionary yields the dictionary keys:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)
a
b
c
Most methods expecting a list or list-like object will also accept any iterable object. This includes built-in methods such as min, max, and sum, and type constructors like list and tuple.
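What "iterable" means in practice can be sketched with the built-in iter and next functions, which is roughly what a for loop does behind the scenes. The variable names are illustrative.

```python
some_dict = {"a": 1, "b": 2, "c": 3}

# iter() asks the object for an iterator; next() pulls one item at a time
dict_iterator = iter(some_dict)
first_key = next(dict_iterator)

# list() accepts any iterable and consumes whatever is left
remaining_keys = list(dict_iterator)

# Built-ins such as sum and max accept iterables too, not just lists
total = sum(some_dict.values())
largest_key = max(some_dict)

print(first_key, remaining_keys, total, largest_key)  # a ['b', 'c'] 6 c
```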
A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. To create a generator, use the yield keyword instead of return in a function:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2
gen = squares()
for x in gen:
    print(x, end=" ")
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100
Since generators produce output one element at a time versus an entire list all at once, it can help your program use less memory.
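A rough way to see the memory difference is to compare the size of a fully built list with that of a generator object producing the same values. This is only a sketch: sys.getsizeof reports the size of the container itself, not of the integers it refers to, and the names below are illustrative.

```python
import sys


def squares_up_to(n):
    # Generator function: values are produced one at a time on demand
    for i in range(n):
        yield i ** 2


n = 100_000
squares_list = []
for i in range(n):
    squares_list.append(i ** 2)   # every value is held in memory at once

squares_gen = squares_up_to(n)    # nothing has been computed yet

list_is_larger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)
print(list_is_larger)  # True

# Both produce the same values when consumed
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)  # True
```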
Another way to make a generator is by using a generator expression. This is a generator analogue to list, dictionary, and set comprehensions. To create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:
gen = (x ** 2 for x in range(100))
gen
<generator object <genexpr> at 0x000001A3294B9C10>
Generator expressions can be used instead of list comprehensions as function arguments in some cases:
sum(x ** 2 for x in range(100))
328350
dict((i, i ** 2) for i in range(5))
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by return value of the function:
import itertools
def first_letter(x):
    return x[0]
names = ["Alan", "Adam", "Jackie", "Lily", "Katie", "Molly"]
for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names))  # names is a generator
A ['Alan', 'Adam']
J ['Jackie']
L ['Lily']
K ['Katie']
M ['Molly']
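Because groupby only groups consecutive elements, the same key can appear more than once when the input is not ordered by that key. The sketch below, using made-up names, shows the effect and the usual remedy of sorting by the same key function first.

```python
import itertools


def first_letter(x):
    return x[0]


names = ["Alan", "Lily", "Adam", "Liam"]  # hypothetical, not ordered by first letter

# Without sorting, each run of equal keys becomes its own group
unsorted_groups = [(key, list(group))
                   for key, group in itertools.groupby(names, first_letter)]
print(unsorted_groups)
# [('A', ['Alan']), ('L', ['Lily']), ('A', ['Adam']), ('L', ['Liam'])]

# Sorting by the same key first brings equal keys together
sorted_names = sorted(names, key=first_letter)
sorted_groups = [(key, list(group))
                 for key, group in itertools.groupby(sorted_names, first_letter)]
print(sorted_groups)
# [('A', ['Alan', 'Adam']), ('L', ['Lily', 'Liam'])]
```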
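Handling errors or exceptions gracefully is an important part of building robust programs. For example, float raises a ValueError on improper input, so we can write a version that returns the input argument unchanged when the conversion fails:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x
attempt_float("1.2345")
1.2345
attempt_float("something")
'something'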
You might want to suppress only ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program. To do that, write the exception type after except:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x
attempt_float((1, 2))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
d:\packages\bookclub-py4da\03_notes.qmd in <cell line: 1>()
----> 1001 attempt_float((1, 2))
Input In [114], in attempt_float(x)
1 def attempt_float(x):
2 try:
----> 3 return float(x)
4 except ValueError:
5 return x
TypeError: float() argument must be a string or a real number, not 'tuple'
You can catch multiple exception types by writing a tuple of exception types instead (the parentheses are required):
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
attempt_float((1, 2))
(1, 2)
In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether or not the code in the try block succeeds. To do this, use finally:
f = open(path, mode="w")
try:
    write_to_file(f)
finally:
    f.close()
Here, the file object f will always get closed.
Similarly, you can have code that executes only if the try: block succeeds using else:
f = open(path, mode="w")
try:
    write_to_file(f)
except:
    print("Failed")
else:
    print("Succeeded")
finally:
    f.close()
If an exception is raised while you are %run-ing a script or executing any statement, IPython will by default print a full call stack trace. Having additional context by itself is a big advantage over the standard Python interpreter.
To open a file for reading or writing, use the built-in open function with either a relative or absolute file path and an optional file encoding.
path = "examples/segismundo.txt"
f = open(path, encoding="utf-8")
We can then treat the file object f like a list and iterate over the lines. The lines come out of the file with the end-of-line (EOL) markers intact, so it is common to strip them:
lines = [x.rstrip() for x in open(path, encoding="utf-8")]
lines
When you use open to create file objects, it is recommended to close the file when you are finished with it:
f.close()
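An alternative to calling close by hand is the with statement, which closes the file automatically when the block exits, even if an exception is raised. The sketch below writes a small temporary file so that it can run anywhere; the file name and contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file

    # Create a small file to read back
    with open(path, mode="w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # The file is closed automatically when the with block exits
    with open(path, encoding="utf-8") as f:
        lines = [x.rstrip() for x in f]

    closed_after_block = f.closed

print(lines)               # ['first line', 'second line']
print(closed_after_block)  # True
```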
Some of the most commonly used file methods are read, seek, and tell.
read(10) returns 10 characters from the file when it is opened in text mode (in binary mode it returns 10 bytes).
The read method advances the file object position by the number of bytes read.
tell() gives you the current position in the file.
To get consistent behavior across platforms, it is best to pass an encoding (such as encoding="utf-8") when opening files.
seek(3) changes the file position to the indicated byte.
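The difference between characters and bytes shows up as soon as the text contains a non-ASCII character. The following sketch uses a temporary file with made-up contents; tell and seek are demonstrated in binary mode, where positions are plain byte offsets.

```python
import os
import tempfile

text = "Sueña el rico en su riqueza"  # "ñ" takes two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file
    with open(path, mode="w", encoding="utf-8") as f:
        f.write(text)

    # Text mode: read(10) returns 10 characters
    with open(path, encoding="utf-8") as f:
        first_chars = f.read(10)

    # Binary mode: read(10) returns 10 bytes
    with open(path, mode="rb") as f:
        first_bytes = f.read(10)
        position_after_read = f.tell()  # advanced by the 10 bytes read
        f.seek(3)                       # jump to byte offset 3
        two_bytes = f.read(2)           # the two bytes that encode "ñ"

print(first_chars)                  # Sueña el r
print(len(first_bytes.decode("utf-8")))  # 9: ten bytes cover only nine characters
print(position_after_read)          # 10
print(two_bytes.decode("utf-8"))    # ñ
```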
To write text to a file, you can use the file’s write or writelines methods.
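A minimal sketch of both methods, writing to a temporary file with made-up contents. Note that writelines does not add line endings, so they are appended explicitly here.

```python
import os
import tempfile

lines = ["alpha", "beta", "gamma"]  # hypothetical data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file

    with open(path, mode="w", encoding="utf-8") as handle:
        handle.write("header\n")                           # a single string
        handle.writelines(line + "\n" for line in lines)   # any iterable of strings

    with open(path, encoding="utf-8") as handle:
        written = handle.read()

print(written.splitlines())  # ['header', 'alpha', 'beta', 'gamma']
```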
The default behavior for Python files (whether readable or writable) is text mode, which means that you intend to work with Python strings (i.e., Unicode).