Without classes, extracting a header, grabbing all the columns, and counting the values in a dataset would mean writing each of these functions in a module and importing them for every new dataset we use. This gets tedious over time, because we work with datasets constantly and want a consistent set of behaviors and attributes to use with them.
With classes, we bundle all of that data and behavior together in one location. An instance of the Dataset class is all we need to count unique terms in a dataset or get a file's header. Once we add behavior to a class, every instance of the class will be able to perform that behavior. As we develop our application, we can add more properties to classes to extend their functionality. Using classes and instances helps organize our code, and allows us to represent real-world concepts in well-defined code constructs.
Create a class called Dataset.
- Inside the class, create a type attribute. Assign the value "csv" to it.
Create an instance of the Dataset class, and assign it to the variable dataset.
Print the type attribute of the dataset instance.
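The steps above can be sketched as follows. This is a minimal example, not the only way to write it: type is defined as a class attribute, so every instance shares the same value.

```python
class Dataset:
    # Class attribute: shared by every instance of Dataset.
    type = "csv"


# Create an instance and read the attribute through it.
dataset = Dataset()
print(dataset.type)  # prints: csv
```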
Add a data parameter to the __init__() method of the Dataset class, and assign its value to the self.data attribute.
Read the data from nfl.csv and assign it to the variable nfl_data.
Make an instance of the class, passing in nfl_data to the __init__() method (when you call Dataset(...)).
- Assign the result to the variable nfl_dataset.
Use the data attribute to access the underlying data for nfl_dataset and assign the result to the variable dataset_data.
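One possible way to put these steps together is sketched below. Since the real nfl.csv is not available here, the example first writes a tiny made-up file with the same name to a temporary directory. The sample rows and the path variable are assumptions for illustration only.

```python
import csv
import os
import tempfile


class Dataset:
    type = "csv"

    def __init__(self, data):
        # Store the rows passed in, so each instance carries its own data.
        self.data = data


# Stand-in for the real nfl.csv: made-up rows written to a temporary
# directory so that this sketch runs without the actual dataset.
sample_rows = [
    ["year", "player", "team"],
    ["2009", "Player A", "Team X"],
    ["2010", "Player B", "Team Y"],
]
path = os.path.join(tempfile.mkdtemp(), "nfl.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(sample_rows)

# Read the file into a list of rows (each row is a list of strings).
with open(path, newline="") as f:
    nfl_data = list(csv.reader(f))

# Calling Dataset(...) passes nfl_data to __init__() as the data argument.
nfl_dataset = Dataset(nfl_data)
dataset_data = nfl_dataset.data
```

With the real file, only the reading step is needed; the block that writes sample_rows exists purely to keep the example runnable.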
Add an instance method print_data() that takes in a num_rows argument.
- This method should print out data up to the given number of rows.
Create an instance of the Dataset class and initialize it with nfl_data. nfl_data is already loaded for you.
- Assign it to the variable nfl_dataset.
Call the print_data() method, setting the num_rows parameter to 5.
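A minimal sketch of print_data() follows, assuming the data is a list of rows. The rows in nfl_data are made up and stand in for the preloaded contents of nfl.csv.

```python
class Dataset:
    def __init__(self, data):
        self.data = data

    def print_data(self, num_rows):
        # Slicing is safe even when num_rows exceeds the number of rows.
        print(self.data[:num_rows])


# Made-up sample rows standing in for the preloaded nfl_data.
nfl_data = [
    ["year", "player"],
    ["2009", "Player A"],
    ["2010", "Player B"],
    ["2010", "Player C"],
    ["2011", "Player D"],
    ["2012", "Player E"],
]

nfl_dataset = Dataset(nfl_data)
nfl_dataset.print_data(5)  # prints the first five rows as one list
```

Because the method takes self as its first parameter, it can reach the instance's own data without the caller passing it in.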
Add the extract_header() code to the initializer and set the header data to self.header.
Create a variable called nfl_header and set it to the header attribute.
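The sketch below shows one way the initializer might look. It makes two assumptions that the exercise does not spell out: the header is the first row of the data, and self.data keeps only the rows after it. The sample rows are made up.

```python
class Dataset:
    def __init__(self, data):
        # Assumption: the first row holds the column names.
        self.header = data[0]
        # Assumption: self.data keeps only the rows after the header.
        self.data = data[1:]


# Made-up sample rows standing in for the contents of nfl.csv.
nfl_data = [
    ["year", "player"],
    ["2009", "Player A"],
    ["2010", "Player B"],
]

nfl_dataset = Dataset(nfl_data)
nfl_header = nfl_dataset.header
print(nfl_header)  # prints: ['year', 'player']
```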
Add a method named column() that takes in a label argument, finds the index of that label in the header, and returns a list of the column data.
- If the label is not in the header, you should return None.
Create a variable called year_column and set it to the return value of column('year').
Create a variable called player_column and set it to the return value of column('player').
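As a rough illustration, column() could be written as below. The sketch assumes the initializer stores the first row in self.header and the remaining rows in self.data. The sample rows are made up.

```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]

    def column(self, label):
        # Unknown labels return None, as the exercise asks.
        if label not in self.header:
            return None
        # Position of the label in the header row.
        index = self.header.index(label)
        # Collect the value at that position from every data row.
        return [row[index] for row in self.data]


# Made-up sample rows standing in for the contents of nfl.csv.
nfl_data = [
    ["year", "player"],
    ["2009", "Player A"],
    ["2010", "Player B"],
    ["2010", "Player C"],
]

nfl_dataset = Dataset(nfl_data)
year_column = nfl_dataset.column("year")
player_column = nfl_dataset.column("player")
print(year_column)  # prints: ['2009', '2010', '2010']
```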
Add a method to the Dataset class called count_unique() that takes in a label argument.
Get the unique set of items from the column() method and return the total count.
Use the instance method to assign the number of unique values in the year column to total_years.
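Here is a hedged sketch of count_unique() that builds on column(). The guard for an unknown label is our own addition, since the exercise does not say what should happen in that case. The sample rows are made up.

```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]

    def column(self, label):
        if label not in self.header:
            return None
        index = self.header.index(label)
        return [row[index] for row in self.data]

    def count_unique(self, label):
        values = self.column(label)
        # Assumption: pass None through for an unknown label.
        if values is None:
            return None
        # A set keeps one copy of each value, so its length is the count.
        return len(set(values))


# Made-up sample rows standing in for the contents of nfl.csv.
nfl_data = [
    ["year", "player"],
    ["2009", "Player A"],
    ["2010", "Player B"],
    ["2010", "Player C"],
]

nfl_dataset = Dataset(nfl_data)
total_years = nfl_dataset.count_unique("year")
print(total_years)  # prints: 2
```

Note that count_unique() calls self.column() instead of repeating the lookup logic, which is the kind of reuse that bundling behavior in a class makes easy.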
Add a method to the Dataset class called __str__().
- Convert the first 10 rows of self.data to a string and set it as the return value.
Create an instance of the class called nfl_dataset and call print() on it.
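A minimal sketch of __str__() is shown below. It assumes, as before, that self.data excludes the header row. The generated rows are made up, with enough of them to show that only the first 10 are included.

```python
class Dataset:
    def __init__(self, data):
        self.header = data[0]
        self.data = data[1:]

    def __str__(self):
        # print() and str() call this method to get the text to display.
        return str(self.data[:10])


# Made-up data: one header row followed by 12 generated rows.
nfl_data = [["year", "player"]] + [
    [str(2000 + i), f"Player {i}"] for i in range(12)
]

nfl_dataset = Dataset(nfl_data)
print(nfl_dataset)  # shows the first 10 data rows instead of a default object repr
```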