Introduction to Loops
Suppose we have the following outcomes of an experiment:
x = ['success', 'failure', 'failure', 'success', 'failure', 'failure', 'failure', 'success']
How might we write code to count the number of successes in this list of outcomes? There are several different ways to accomplish this: one involves the main topic of today’s lab, which is a for loop.
Here’s the general idea: we would like to perform an element-wise comparison; that is, we would like to iteratively check whether each element of x is a success or a failure. The “brute-force” way would be to check each element individually, using comparisons:
x[0] == 'success'
True
x[1] == 'success'
False
x[2] == 'success'
False
As you can imagine, though, this would get incredibly tedious, especially if x were large! This is where for loops become useful: they allow us to automate this iterative process.
Before returning to this success/failure problem, let’s look at an example to see how for loops work.
for fruit in ['apple', 'banana', 'pear']:
    print(fruit)
apple
banana
pear
Here is how the different components work:
- The for keyword signifies the beginning of the for loop.
- The name fruit is the variable.
- The list following the in keyword contains all of the different values the variable will take during the execution of the for loop.
- The code after the initial colon : is called the body of the loop. (Note that the body of a for loop must be indented properly!) Here is how the body is executed:
  - First, the variable fruit is assigned the first value in the list of possible values specified in the first line of the loop.
  - Then, after assigning fruit this value, the code in the body is executed once.
  - Next, the variable fruit is assigned the second value of the list of values, and the body is run again.
  - This continues until the list of all possible values is exhausted.
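One way to check your understanding of these steps is to unroll the loop by hand. The sketch below runs the loop and then performs the same assignments manually, one at a time. The names fruits, looped, and unrolled are illustrative only and are not part of the lab.

```python
fruits = ['apple', 'banana', 'pear']

# Loop version: Python assigns each list element to `fruit` in turn.
looped = []
for fruit in fruits:
    looped.append(fruit)

# Hand-unrolled version: the same assignments written out explicitly.
unrolled = []
fruit = fruits[0]        # first iteration
unrolled.append(fruit)
fruit = fruits[1]        # second iteration
unrolled.append(fruit)
fruit = fruits[2]        # third iteration
unrolled.append(fruit)

print(looped == unrolled)  # True: both versions do the same work
```

The unrolled version needs one pair of lines per element, which is exactly the repetition the loop removes.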
Sometimes, it may be useful to sketch a diagram/table to keep track of the code at each iteration of the loop:
FIRST ITERATION |
Start of Iteration | fruit is 'apple'
End of Iteration | fruit is 'apple'
SECOND ITERATION |
Start of Iteration | fruit is 'banana'
End of Iteration | fruit is 'banana'
THIRD ITERATION |
Start of Iteration | fruit is 'pear'
End of Iteration | fruit is 'pear'
It may seem strange to keep track of the values of the variables at the end of each iteration. We do so because sometimes the body of the loop will actually change the value of a variable! For example, consider the code
for n in [1, 2, 3]:
    n += 2
    print(n)
3
4
5
The associated diagram would look like
FIRST ITERATION |
Start of Iteration | n is 1
End of Iteration | n is 3
SECOND ITERATION |
Start of Iteration | n is 2
End of Iteration | n is 4
THIRD ITERATION |
Start of Iteration | n is 3
End of Iteration | n is 5
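If you would rather let Python fill in such a diagram, a small sketch like the following records the value of n at the start and at the end of each iteration. The names values, start, and trace are illustrative additions, not part of the lab.

```python
values = [1, 2, 3]
trace = []  # one (start, end) pair per iteration

for n in values:
    start = n                 # value of n at the start of the iteration
    n += 2                    # the body changes n
    trace.append((start, n))  # value of n at the end of the iteration

print(trace)   # [(1, 3), (2, 4), (3, 5)]
print(values)  # [1, 2, 3]: the original list is unchanged
```

Note that changing n inside the body does not change the list being iterated over; at the next iteration, n is simply assigned the next element.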
By the way, notice the shorthand notation += that was used above: the code x += y is equivalent to x = x + y.
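As a quick check of this equivalence for numbers, the sketch below updates two variables (total and other, both illustrative names) in the two different styles and shows that they end up equal. Python also offers the analogous shorthands -= and *=.

```python
total = 10
total += 5           # shorthand
print(total)         # 15

other = 10
other = other + 5    # long form
print(other)         # 15

total -= 3           # same as total = total - 3
total *= 2           # same as total = total * 2
print(total)         # 24
```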
Finally, one thing that should be mentioned is that you can call the variable in a loop whatever you like!
for yummy in ['apple', 'banana', 'pear']:
    print(yummy)
apple
banana
pear
Task 1
Copy-paste the code
x = ['success', 'failure', 'failure', 'success', 'failure', 'failure', 'failure', 'success']
into a cell, and run it. Then, create a for loop that iterates through the elements of x and at each iteration prints True if the corresponding element of x is a 'success' and False if the corresponding element of x is a 'failure'. Your final output should look like:
True
False
False
True
False
False
False
True
By the way, the set of values a variable will take during a for loop doesn’t have to be a list; it could also be an array! This is particularly useful when there are multiple things we would like to iterate over. For example:
import datascience as ds
credit_scores = ds.make_array(
    ["Anne", 750],
    ["Barbara", 755],
    ["Cassandra", 745]
)
for k in credit_scores:
    print(k[0], "has a credit score of", k[1])
Anne has a credit score of 750
Barbara has a credit score of 755
Cassandra has a credit score of 745
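One subtlety is worth a quick sketch. A NumPy array stores a single data type, so when names and numbers are mixed, the numbers are stored as text. The example below builds the array with numpy.array directly, on the assumption that make_array produces a comparable NumPy array; the 10-point increase is made up purely for illustration.

```python
import numpy as np

# Assumption: this mirrors the array built with ds.make_array above.
credit_scores = np.array([
    ["Anne", 750],
    ["Barbara", 755],
    ["Cassandra", 745],
])

for k in credit_scores:
    name = k[0]
    score = int(k[1])  # k[1] is text such as '750', so convert before doing arithmetic
    print(name, "would have a credit score of", score + 10, "after a 10-point increase")
```

Printing k[1] directly works either way, which is why the loop above gives the expected output without any conversion.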
Task 2
Make a table like the one above that keeps track of the variables and their values in the above loop. You do not need to turn this in; do it on a separate sheet of paper and in your .ipynb file simply state “I have done Task 2 on a separate sheet of paper.”
Now, we never quite finished our problem of counting the number of successes in the variable x. We were able to iterate through the elements of x to determine which were successes and which were failures, but we never counted the number of successes.
Here is the general idea:
- We initialize a counter variable, called count, which starts off with the value 0.
- Then, we iterate through the elements of x as we did in Task 1 above. Instead of printing True or False, however, we use a conditional statement to add 1 to count if the corresponding element of x (i.e. the element of x under consideration in the current iteration of the loop) is a 'success'.
- Finally, we see what the value of our counter variable is; this will be exactly the number of successes in x!
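The same counting pattern can be sketched on a different problem. The example below counts how many values in a made-up list of temperatures exceed 70; the data, the threshold, and the names temperatures and hot_days are all illustrative and unrelated to x.

```python
temperatures = [61, 75, 58, 80, 72, 66]  # made-up data for illustration

hot_days = 0                  # initialize the counter
for temp in temperatures:     # look at one element per iteration
    if temp > 70:             # conditional: does this element qualify?
        hot_days += 1         # if so, add 1 to the counter

print(hot_days)  # 3
```

The three ingredients (initialize, test inside the loop, read the counter afterwards) stay the same no matter what condition is being counted.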
Task 3
Combine everything we’ve learned so far to count the number of successes in x. Here is a rough template of how your code should look:
count = 0   # initialize the counter variable
<for loop code here, containing a conditional and a 'count += 1'>
count   # display the final value of our counter variable
There is another way to iterate through the elements in a list, and this is to use indexing. Before talking about how this works, we should quickly introduce another function: the arange() function from the numpy module. Here is how a general call to numpy.arange() works:
numpy.arange(a, b, s)
This code returns the array of evenly spaced numbers between a and b, including a but excluding b, where each element is s more than the previous element. That is, the code above is equivalent to array([a, a+s, a+2*s, ...]), stopping before b. If s is omitted, the step defaults to 1.
As a concrete example:
import numpy as np
np.arange(0, 5, 2)
array([0, 2, 4])
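A few more calls may help fix the conventions. The sketch below shows the one-, two-, and three-argument forms of arange(), with each result converted to a plain list only to make it easier to read. The variable names are illustrative.

```python
import numpy as np

one_arg = np.arange(5)           # start defaults to 0, step defaults to 1
two_args = np.arange(2, 6)       # step defaults to 1
three_args = np.arange(2, 10, 3)

print(one_arg.tolist())     # [0, 1, 2, 3, 4]
print(two_args.tolist())    # [2, 3, 4, 5]
print(three_args.tolist())  # [2, 5, 8]
```

In every case the stop value itself is left out of the result.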
The arange() function is particularly useful when we are iterating using indices. For example, given a list x = [1, 2, 3, 4, 5], we can loop through the entries of x using:
for k in np.arange(0, len(x)):
    print(x[k])
1
2
3
4
5
Note that this is equivalent to
for k in x:
    print(k)
1
2
3
4
5
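Iterating over indices pays off when the same position is needed in more than one list. As a sketch, suppose we had two parallel lists (names and ages, both made up for this example) and wanted to pair up their entries:

```python
import numpy as np

names = ['Anne', 'Barbara', 'Cassandra']  # hypothetical data
ages = [31, 45, 27]                       # hypothetical data, same order as names

lines = []
for k in np.arange(0, len(names)):
    # The index k picks out matching entries from both lists.
    lines.append(names[k] + " is " + str(ages[k]) + " years old")
    print(lines[-1])
```

Looping directly over names would give each name but no way to look up the matching age, which is why the index is useful here.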
Task 4
Rewrite your loop from Task 3, except now iterate through the indices of x. Check that your output is the same as in Task 3.
Quick Aside: arange() vs linspace()
Some of you may recall that we previously used the numpy.linspace() function to generate an array of evenly spaced numbers between two specified endpoints. The key difference between these two functions is that:
- arange() allows you to specify the step size
- linspace() allows you to specify the final number of elements
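To make the contrast concrete, here is a small sketch on the interval from 0 to 1 (the endpoints, step, and variable names are chosen only for illustration). Note that, by default, linspace() includes the right endpoint while arange() excludes it.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)   # step size 0.25, stops before 1
by_count = np.linspace(0, 1, 5)   # exactly 5 points, from 0 to 1 inclusive

print(by_step.tolist())   # [0.0, 0.25, 0.5, 0.75]
print(by_count.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With step sizes that cannot be represented exactly in floating point (such as 0.1), results may differ from the intended values by tiny rounding errors, so it is worth inspecting the output.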
Task 5
Generate the list of numbers [1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2] in two ways: one using arange() and the other using linspace().
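Sampling from a Population
To sample k numbers from a list of numbers called y, we can use the choices() function from the module called random. Specifically, if we import random as rnd, the command
rnd.choices(y, k=k)
generates a list of k elements, all sampled from y with replacement (so the same element may be drawn more than once). Note that k must be passed by keyword: the second positional argument of choices() is a list of weights, not the sample size.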
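Here is a minimal sketch of choices() in action, passing the sample size through the keyword argument k. The list colors and the seed value are made up for illustration; calling rnd.seed() is optional and only makes the draw repeatable from run to run.

```python
import random as rnd

rnd.seed(0)  # optional: fixes the random draws so reruns give the same sample

colors = ['red', 'green', 'blue']   # hypothetical population
sample = rnd.choices(colors, k=5)   # 5 draws, with replacement

print(sample)       # a list of 5 colors, possibly with repeats
print(len(sample))  # 5
```

Because sampling is done with replacement, k is allowed to be larger than the number of elements in the population, as it is here.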
Task 6
Simulate rolling a fair 6-sided die 100 times, and store the results of these rolls in a variable called x. (Hint: Think how you can use the choices() function to do this.)
Looking Ahead
On the upcoming homework, you will work toward recreating the simulation we did back in Lecture 10 to construct the sampling distribution of \(\widehat{P}\). This will involve using loops, so please make sure you understand the above material well!