<item 1>, <item 2>, ..., <item n>] [
Data Classes
Last week, we were introduced to the notion of data types. Recall that “data type” can be thought of as the category (or type) of data- i.e. integer, float, character, etc.
In Python, however, we often need to aggregate data into larger structures, often referred to as data classes.
Lists
Perhaps the most fundamental data structure in Python is that of a list. Just like lists in real life or in mathematics, Python lists are just collections of items enclosed in square brackets:
Again, the items in a list can be of any data type; we can even mix and match data types!
Create a list containing the elements 1
, "hi"
, 3.4
, and "PSTAT 5A"
. Assign this list to a variable called list1
.
Just as we were able to use a Python function (type()
) to check the type of a particular piece of data, we can also use Python to check the structure or class of a piece of data. It turns out that we use the same function as before- namely, type()
!
Run the code type(list1)
.
Indexing
Alright, now that we can store data in lists, how can we access elements in a list? The answer is to use what is known as indexing.
Given a list x
, we access the i
th element using the code
x[i]
The reason we call this “indexing” is because the number that goes between the brackets is the index of the element that we want.
Python begins indexing at 0.
What does this mean? Well, let’s see by way of an example.
Create a list with the numbers
1
through10
, inclusive.Run the code
x[1]
.Run the code
x[0]
.
So, what we would colloquially call the first element of a list, Python calls the zeroeth element.
Alright, let’s put together some of the concepts we just learned.
Create a list called x
that contains the elements 1
, "two"
, 3.5
, "four"
, and "five five"
. Answer the following questions WITHOUT running any code, writing your answers as a comment in a code cell:
- What would be the output of
type(x)
? - What would be the output of
type(x[1])
? - What would be the output of
x[0]
?
Now, run code to verify your answers to the above three questions.
Tables
Another very useful data structure in Python is that of a table. Python tables behave pretty much the same as the tables we’ve used in, say, math- they are a grid of values arranged sequentially.
Tables can be created using the Table()
function in Python, which itself comes from the datascience
module. The general syntax of creating a table with the Table()
function is:
Table().with_columns("<col 1 name>", [<col 1, val 1>, <col 1, val 2>, ... ],
"<col 2 name>", [<col 2, val 1>, <col 2, val 2>, ... ],
... )
For example,
Table().with_columns("Name", ["Ethan", "Morgan", "Amy"],
"ID", [12345, 10394, 20343],
"Office", ["South Hall", "South Hall", "North Hall"]
)
Name | ID | Office |
---|---|---|
Ethan | 12345 | South Hall |
Morgan | 10394 | South Hall |
Amy | 20343 | North Hall |
There is nothing stopping us from assigning a table to a variable! For example, after running
= Table().with_columns(
table1 "Name", ["Ethan", "Morgan", "Amy"],
"ID", [12345, 10394, 20343],
"Fav_Drink", ["Iced Tea", "Coffee", "Sprite"]
)
the variable table1
is equivalent to the table displayed above:
table1
Name | ID | Fav_Drink |
---|---|---|
Ethan | 12345 | Iced Tea |
Morgan | 10394 | Coffee |
Amy | 20343 | Sprite |
Sometimes in Python we will encounter expressions of the form
<object type>.<function name>()
In this syntax, the function <function name>
is said to be a method. For example, the function with_columns()
is a method for the Table
object.
The datascience
module contains a plethora of methods we can use to manage tables. For example, the select()
method can be used to select columns by name:
"ID") table1.select(
ID |
---|
12345 |
10394 |
20343 |
Methods are always appended to either a function that creates a blank object type (like Table()
) or a variable of the correct type.
Read the list of methods for Table objects at http://data8.org/datascience/tables.html, and write down (in a code cell, using comments) at least three different methods, including a short description of what each method does. For example:
# .with_columns(): adds specified columns to a table.
- Create the following table, and assign it to a variable called
profs
:
Professor | Office | Course |
---|---|---|
Dr. Swenson | South Hall | PSTAT 130 |
Dr. Wainwright | Old Gym | PSTAT 120A |
Dr. Mouti | Old Gym | PSTAT 126 |
Run a cell containing only the code profs
to make sure (visually) that your table looks correct.
Select the column called
Course
fromprofs
.Append (i.e. add) a new row to the
profs
table, containing the following information:
Professor | Office | Course |
---|---|---|
Dr. Ravat | South Hall | PSTAT 120B |
Run a cell containing only the code profs
to make sure (visually) that the appending was successful.
Suppose we want to select rows of a table that satisfy a given condition. For example, if we wanted to find the information of only people who like Sprite in the table1
table above, we would call
"Fav_Drink", "Sprite") table1.where(
Name | ID | Fav_Drink |
---|---|---|
Amy | 20343 | Sprite |
What would happen if we tried to select the rows of table1
with Coke
in the Fav_Drink
column? Well, since there is nobody in table1
that has coke as their favorite drink, we should hope that Python returns an empty table.
"Fav_Drink", "Coke") table1.where(
Name | ID | Fav_Drink |
---|
Sure enough, Python has returned an empty table!
Arrays
The final Data Structure we will examine in this class is that of an array. Arrays behave very similarly to Tables, with a few differences. For one, the syntax used to create an array is slightly different:
<item 1>, <item 2>, <item 3>, ...) make_array(
For example,
"Spring", "Summer", "Autumn", "Winter") make_array(
array(['Spring', 'Summer', 'Autumn', 'Winter'],
dtype='<U6')
You may ask- what’s that dtype='<U6'
symbol at the end of the output? For now, don’t worry about it, as we will revisit this later.
Lists vs. Arrays
So, we now know about three different data classes in Python: lists, tables, and arrays. At first glance, lists and arrays may seem somewhat similar. However, there are a few key differences between them:
Make a list called my_list
containing the elements 1
, 2
, and 3
, and make an array called my_array
also containing the elements 1
, 2
, and 3
. Run the following commands in separate code cells:
sum(my_list)
sum(my_array)
my_list + 2
my_array + 2
What the previous Task illustrates is the fact that arrays lend themselves to element-wise operations, whereas lists do not. One important limitation about arrays, though, is that the elements in an array must all be of the same data type. If you try to make an array consisting of elements that are different data types Python will still run, however it will not run in the way you expect it to!