PSTAT 5A: Archives

Introduction to Loops

Suppose we have the following outcomes of an experiment:

x = ['success', 'failure', 'failure', 'success', 'failure', 'failure', 'failure', 'success']

How might we write code to count the number of successes in this list of outcomes? There are several ways to accomplish this; one involves the main topic of today’s lab: the for loop.

figure source: https://i.kym-cdn.com/photos/images/newsfeed/001/393/656/da7.jpg

Here’s the general idea: we would like to perform an element-wise comparison; that is, we would like to iteratively check whether each element of x is a success or a failure. The “brute-force” way would be to check each element individually, using comparisons:

x[0] == 'success'

True

x[1] == 'success'

False

x[2] == 'success'

False

As you can imagine, though, this would get incredibly tedious, especially if x were large! This is where for loops become useful: they allow us to automate this iterative process.

Before returning to this success/failure problem, let’s look at an example to see how for loops work.

for fruit in ['apple', 'banana', 'pear']:
  print(fruit)

apple
banana
pear

Here is how the different components work:

The for keyword signifies the beginning of the for loop.
The name fruit is the variable.
The list following the in keyword contains all of the different values the variable will take during the execution of the for loop.
The code after the initial colon : is called the body of the loop. (Note that the body of a for loop must be indented properly!) Here is how the body is executed:
- First, the variable fruit is assigned the first value in the list of possible values specified in the first line of the loop.
- Then, after assigning fruit this value, the code in the body is executed once.
- Next, the variable fruit is assigned the second value of the list of values, and the body is run again.
- This continues until the list of all possible values is exhausted.
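
To make these steps concrete, here is a small sketch that writes out by hand what the loop does at each iteration and checks that both versions visit the same values in the same order. The names loop_output and manual_output are introduced here purely for illustration, and the list method append() is used only to record each value.

```python
fruits = ['apple', 'banana', 'pear']

# Version 1: let the for loop handle the assignments
loop_output = []
for fruit in fruits:
  loop_output.append(fruit)

# Version 2: do by hand what the loop does automatically
manual_output = []
fruit = fruits[0]             # first iteration: fruit is 'apple'
manual_output.append(fruit)
fruit = fruits[1]             # second iteration: fruit is 'banana'
manual_output.append(fruit)
fruit = fruits[2]             # third iteration: fruit is 'pear'
manual_output.append(fruit)

print(loop_output == manual_output)
```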

Sometimes, it may be useful to sketch a diagram/table to keep track of the code at each iteration of the loop:

FIRST ITERATION
Start of Iteration	`fruit`: `'apple'`
End of Iteration	`fruit`: `'apple'`
SECOND ITERATION
Start of Iteration	`fruit`: `'banana'`
End of Iteration	`fruit`: `'banana'`
THIRD ITERATION
Start of Iteration	`fruit`: `'pear'`
End of Iteration	`fruit`: `'pear'`

It may seem strange to keep track of the values of the variables at the end of each iteration. We do so because the body of the loop sometimes changes the value of a variable! For example, consider the code

for n in [1, 2, 3]:
  n += 2
  print(n)

3
4
5

The associated diagram would look like:

FIRST ITERATION
Start of Iteration	`n`: `1`
End of Iteration	`n`: `3`
SECOND ITERATION
Start of Iteration	`n`: `2`
End of Iteration	`n`: `4`
THIRD ITERATION
Start of Iteration	`n`: `3`
End of Iteration	`n`: `5`
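
If you would rather have Python build such a diagram for you, one possible approach is to save the value of the variable at the start and at the end of the body. This is only a sketch: the names start, end, and trace are hypothetical helpers that are not part of the original loop.

```python
trace = []    # hypothetical helper: one (start, end) pair per iteration

for n in [1, 2, 3]:
  start = n     # value of n at the start of the iteration
  n += 2
  end = n       # value of n at the end of the iteration
  trace.append((start, end))
  print("Start of Iteration:", start, "| End of Iteration:", end)
```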

By the way, notice the shorthand notation += that was used above:

Tip

The code x += y is equivalent to x = x + y.
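
As a quick illustration of the tip, the sketch below uses += once on its own and once inside a loop to build up a running total. The variable names a, b, and total are chosen only for this example.

```python
a = 10
b = 3

a += b        # same as a = a + b
print(a)      # 13

total = 0
for value in [1, 2, 3]:
  total += value    # add the current value to the running total
print(total)  # 6
```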

Finally, note that you can name the variable in a loop whatever you like!

for yummy in ['apple', 'banana', 'pear']:
  print(yummy)

apple
banana
pear

Task 1

Copy-paste the code

x = ['success', 'failure', 'failure', 'success', 'failure', 'failure', 'failure', 'success']

into a cell, and run it. Then, create a for loop that iterates through the elements of x and at each iteration prints True if the corresponding element of x is a 'success' and False if the corresponding element of x is a 'failure'. Your final output should look like:

True
False
False
True
False
False
False
True

By the way, the set of values a variable will take during a for loop doesn’t have to be a list; it could also be an array! This is particularly useful when there are multiple things we would like to iterate over. For example:

import datascience as ds
credit_scores = ds.make_array(
  ["Anne", 750],
  ["Barbara", 755],
  ["Cassandra", 745]
)

for k in credit_scores:
  print(k[0], "has a credit score of", k[1])

Anne has a credit score of 750
Barbara has a credit score of 755
Cassandra has a credit score of 745

Task 2

Make a table like the one above that keeps track of the variables and their values in the above loop. You do not need to turn this in; do it on a separate sheet of paper and in your .ipynb file simply state “I have done Task 2 on a separate sheet of paper.”

Now, we never quite finished our problem of counting the number of successes in the variable x. We were able to iterate through the elements of x to determine which were successes and which were failures, but we never counted the number of successes.

Here is the general idea:

We initialize a counter variable, which starts off with the value of 0.
Then, we iterate through the elements of x as we did in Task 1 above. Instead of printing True or False, however, we use a conditional statement to add 1 to count if the corresponding element of x (i.e. the element of x under consideration in the current iteration of the loop) is a 'success'.
Finally, we see what the value of our counter variable is; this will be exactly the number of successes in x!

Task 3

Combine everything we’ve learned so far to count the number of successes in x. Here is a rough template of how your code should look:

count = 0     # initialize the counter variable

<for loop code here, containing a conditional and a 'count += 1'>

count       # display the final value of our counter variable

There is another way to iterate through the elements in a list, and this is to use indexing. Before talking about how this works, we should quickly introduce another function: the arange() function from the numpy module. Here is how a general call to numpy.arange() works:

numpy.arange(a, b, s)

This code returns the array of evenly spaced values starting at a and stopping before b (a is included, b is excluded), where each element is s more than the previous element. That is, the code above is equivalent to array([a, a+s, a+2*s, ...]). If s is omitted, the step size defaults to 1. As a concrete example:

import numpy as np
np.arange(0, 5, 2)

array([0, 2, 4])
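
A few more calls may help fix the roles of the three arguments. The numbers and variable names below are arbitrary choices made for illustration.

```python
import numpy as np

by_threes = np.arange(2, 11, 3)     # start at 2, stop before 11, step 3
print(by_threes)                    # values 2, 5, 8

default_step = np.arange(0, 4)      # step omitted, so it defaults to 1
print(default_step)                 # values 0, 1, 2, 3

stop_excluded = np.arange(0, 6, 2)  # 6 would be the next value, but the stop is excluded
print(stop_excluded)                # values 0, 2, 4
```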

The arange() function is particularly useful when we are iterating using indices. For example, given a list x = [1, 2, 3, 4, 5], we can loop through the entries of x using:

for k in np.arange(0, len(x)):
  print(x[k])

Note that this is equivalent to

for k in x:
  print(k)
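
One way to convince yourself that the two loops are equivalent is to record what each one visits and compare the results. In this sketch, by_index and by_value are hypothetical names, and append() is used only to store the visited values.

```python
import numpy as np

x = [1, 2, 3, 4, 5]

by_index = []
for k in np.arange(0, len(x)):
  by_index.append(x[k])    # here k is a position: 0, 1, 2, 3, 4

by_value = []
for k in x:
  by_value.append(k)       # here k is the element itself: 1, 2, 3, 4, 5

print(by_index == by_value)
```

Note the different role of k in the two loops: in the first it is an index into x, while in the second it is an element of x.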

Task 4

Rewrite your loop from Task 3, except now iterate through the indices of x. Check that your output is the same as in Task 3.

Quick Aside: `arange()` vs `linspace()`

Some of you may recall that we previously used the numpy.linspace() function to generate a list of numbers between two specified endpoints. The key difference between these two functions is that:

arange() allows you to specify the step size
linspace() allows you to specify the final number of elements
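
The sketch below contrasts the two functions on the interval from 0 to 1; the endpoints and variable names are arbitrary. One further difference worth noticing is that linspace() includes the right endpoint by default, whereas arange() excludes it.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # step size 0.25; the endpoint 1 is excluded
print(by_step)                     # values 0, 0.25, 0.5, 0.75

by_count = np.linspace(0, 1, 5)    # exactly 5 values; the endpoint 1 is included
print(by_count)                    # values 0, 0.25, 0.5, 0.75, 1
```

With a non-integer step size, floating-point rounding can occasionally make the last element of an arange() result differ from what you expect, so it is worth checking the output when the step is a decimal.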

Task 5

Generate the list of numbers [1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2] in two ways: one using arange() and the other using linspace().

Sampling from a Population

To sample k numbers from a list of numbers called y, we can use the choices() function from the module called random. Specifically, if we import random as rnd, the command

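rnd.choices(y, k=k)

generates a list of k elements, each sampled from y with replacement. (The number of draws must be passed by keyword, as k=...; a second positional argument would be interpreted as sampling weights.)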
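
Here is a minimal sketch of choices() in action, assuming a hypothetical population of three colors. It draws more elements than the population contains, which works because the same element may be drawn more than once. The call to rnd.seed() is optional and only makes repeated runs produce the same sample.

```python
import random as rnd

rnd.seed(5)    # optional: fix the seed so the sample is repeatable

colors = ['red', 'green', 'blue']     # hypothetical population
sample = rnd.choices(colors, k=10)    # 10 draws from a population of 3

print(sample)
print(len(sample))    # 10
```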

Task 6

Simulate rolling a fair 6-sided die 100 times, and store the results of these rolls in a variable called x. (Hint: Think how you can use the choices() function to do this.)

Looking Ahead

On the upcoming homework, you will work toward recreating the simulation we did back in Lecture 10 to construct the sampling distribution of \(\widehat{P}\). This will involve using loops, so please make sure you understand the above material well!
