Pandas tutorial: Data Structures

The name pandas is derived from "panel data": pan(el)-da(ta)-s.

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

The basic method to create a Series is to call:

s = pd.Series(data, index=index)

Here, data can be many different things:

a Python dict
an ndarray
a scalar value
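
A short sketch of the three kinds of input listed above. The variable names and values are illustrative only. In recent pandas (0.23+ on Python 3.6+), a Series built from a dict keeps the dict's insertion order. Older versions sorted the keys.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels
s_dict = pd.Series({"b": 1, "a": 0, "c": 2})

# From an ndarray: if index is given it must have the same length as the data;
# if index is omitted, a default integer index 0..n-1 is created
s_arr = pd.Series(np.array([0.5, 1.5, 2.5]), index=["x", "y", "z"])
s_default = pd.Series(np.array([10, 20, 30]))

# From a scalar: an index must be provided, and the value is repeated
# to match the length of the index
s_scalar = pd.Series(5.0, index=["a", "b", "c"])

print(s_dict)
print(s_arr.index.tolist())
print(s_default.index.tolist())
print(s_scalar.tolist())
```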

Series is ndarray-like
Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.
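
A minimal sketch of the ndarray-like behaviour. It shows a NumPy ufunc applied to a Series, positional slicing, and a boolean mask. In each case the index labels travel with the values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# NumPy ufuncs accept a Series and return a Series with the same index
exp_s = np.exp(s)

# Positional slicing slices the index along with the values
head = s[:2]                # labels 'a' and 'b'

# Boolean masks work as with ndarrays; the surviving labels are kept
big = s[s > s.median()]     # median is 2.5, so labels 'c' and 'd' remain

print(type(exp_s).__name__, list(exp_s.index))
print(list(head.index))
print(list(big.index))
```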

Series is dict-like
A Series is like a fixed-size dict in that you can get and set values by index label.
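
The dict-like side can be sketched as follows. Labels act as keys for getting and setting, `in` tests the labels rather than the values, and `Series.get` returns a default for a missing label instead of raising `KeyError`.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3], index=["a", "b", "c"])

# Get and set values by label, like dict keys
first = s["a"]
s["b"] = 2.0

# Membership tests check the index labels, not the values
has_c = "c" in s
has_z = "z" in s

# .get() returns a default instead of raising KeyError for a missing label
missing = s.get("z", np.nan)

print(first, s["b"], has_c, has_z, missing)
```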

Align the data based on label
A key difference between Series and ndarray is that operations between Series automatically align the data based on label.

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN).
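
A small sketch of label alignment with two overlapping Series. The second part uses the `add` method with `fill_value`, which treats a label missing from one side as 0 instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are matched by label, not by position; the result's index
# is the union of both indexes
total = s1 + s2
# 'a' and 'd' appear in only one Series, so they are NaN;
# 'b' -> 2 + 10 = 12, 'c' -> 3 + 20 = 23

# fill_value substitutes 0 for a label missing on one side
total_filled = s1.add(s2, fill_value=0)

print(total)
print(total_filled)
```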

Some attributes

s.index             <-- the Series' index (axis labels)

s.values            <-- the Series' data (usually an ndarray)
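
A quick check of the two attributes on a small Series. `to_numpy()` was added in pandas 0.24 and is the accessor the pandas docs now recommend over `.values`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

labels = s.index        # an Index object holding 'x', 'y', 'z'
data = s.values         # the underlying data, here a NumPy array

# Newer pandas (0.24+): explicit conversion to a NumPy array
arr = s.to_numpy()

print(list(labels), data, arr)
```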

Name attribute
Series can also have a name attribute.

s = pd.Series(data, name='..')
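
A minimal sketch of the name attribute, using made-up names. `Series.rename` with a scalar returns a copy with a new name. When a named Series becomes a DataFrame, its name is used as the column label.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="something")

# rename() with a scalar returns a new Series; the original keeps its name
s2 = s.rename("different")

# Converting a named Series to a DataFrame uses the name as the column label
df = s.to_frame()

print(s.name, s2.name, df.columns.tolist())
```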

DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.

The basic method to create a DataFrame is to call:

df = pd.DataFrame(data, index=.., columns=..)

Like Series, DataFrame accepts many different kinds of input:

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame
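
The input kinds listed above can be sketched as follows. The data, labels and field names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Dict of Series: the row index is the union of the Series' indexes
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df_series = pd.DataFrame(d)      # 'one' is NaN in row 'd'

# Dict of lists: all lists must have the same length
df_lists = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# 2-D ndarray: column labels are optional (default 0..n-1)
df_2d = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["p", "q", "r"])

# Structured (record) ndarray: field names become column labels
rec = np.array([(1, 2.0), (2, 3.0)], dtype=[("id", "i4"), ("score", "f8")])
df_rec = pd.DataFrame(rec)

# A Series: becomes a single column named after the Series
df_from_s = pd.DataFrame(pd.Series([7, 8], name="col"))

# Another DataFrame
df_again = pd.DataFrame(df_lists)

print(df_series)
```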

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.

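Note: When a particular set of columns is passed along with a dict of data, the passed columns override the keys in the dict. Only the listed columns are kept, in the given order, and a listed column that is not a key in the dict is filled with NaN.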
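
A sketch of passing `index` and `columns` together with a dict of Series. `index` picks and orders the rows. `columns` picks and orders the columns, so `'one'` is dropped and `'three'`, which is not a key, comes out all NaN.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# Rows d, b, a in that order; columns two and three only
df = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
print(df)
```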

Alternate Constructors

DataFrame.from_dict
DataFrame.from_records
DataFrame.from_items (deprecated in 0.23.0 and removed in 1.0; pass dict(items) to DataFrame.from_dict instead)
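
A sketch of the alternate constructors with toy data. `orient="index"` makes the dict keys row labels. Passing `columns` together with that orient requires pandas 0.23 or later. `DataFrame.from_items` no longer exists in current pandas, so the last line shows the `from_dict(dict(items))` substitute.

```python
import pandas as pd

data = {"A": [1, 2, 3], "B": [4, 5, 6]}

# Default orient='columns': dict keys become column labels
df_cols = pd.DataFrame.from_dict(data)

# orient='index': dict keys become row labels
df_rows = pd.DataFrame.from_dict(data, orient="index",
                                 columns=["one", "two", "three"])

# from_records: one row per record (here a list of tuples)
df_rec = pd.DataFrame.from_records([(1, "x"), (2, "y")],
                                   columns=["id", "label"])

# Substitute for the removed from_items: (label, values) pairs -> columns
items = [("A", [1, 2]), ("B", [3, 4])]
df_items = pd.DataFrame.from_dict(dict(items))
```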

Accessing attributes

df.index        <-- the DataFrame's row labels

df.columns      <-- the DataFrame's column labels

df.values       <-- the DataFrame's data as a 2-D array
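
A small check of these attributes. With mixed integer and float columns, `.values` upcasts to a common dtype, which is float64 here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]}, index=["r1", "r2"])

print(df.index.tolist())     # ['r1', 'r2']
print(df.columns.tolist())   # ['A', 'B']

# int64 column A and float64 column B are upcast to float64
print(df.values)
print(df.values.dtype)       # float64
```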

Column selection, addition, deletion
You can treat a DataFrame semantically like a dict of like-indexed Series objects.
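
The dict analogy in practice, sketched on a toy frame. Indexing with a key selects a column, assigning to a new key adds one, and `del` or `pop` remove one. `DataFrame.insert` is used where the new column should not go at the end.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
                  index=["a", "b", "c"])

# Selection: returns the column as a Series
col = df["one"]

# Addition: assigning to a new key appends a column (aligned on the index)
df["three"] = df["one"] * df["two"]
df["flag"] = df["one"] > 1.5
df["foo"] = "bar"              # a scalar is broadcast to every row

# Deletion: del removes a column in place; pop removes and returns it
del df["two"]
three = df.pop("three")

# insert() places a new column at a given position instead of at the end
df.insert(1, "bar", df["one"])

print(df.columns.tolist())     # ['one', 'bar', 'flag', 'foo']
```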

Assigning New Columns in Method Chains
DataFrame has an assign() method that allows you to easily create new columns that are potentially derived from existing columns.
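
A minimal sketch of `assign` with made-up columns. It returns a new DataFrame and leaves the original untouched. A callable argument receives the frame it is called on, which is useful in a chain where the intermediate result has no variable name.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# assign() returns a NEW DataFrame; df itself is not modified
with_ratio = df.assign(ratio=df["A"] / df["B"])

# A callable is evaluated against the filtered frame in the chain
chained = (df[df["A"] > 1]
           .assign(total=lambda x: x["A"] + x["B"]))

print("ratio" in df.columns)        # False
print(chained["total"].tolist())    # [22, 33]
```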

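NOTE In pandas versions before 0.23 (or on Python older than 3.6), all expressions are computed first and then assigned, so you can't refer to another column being assigned in the same call to assign. Since pandas 0.23 on Python 3.6+, keyword arguments are evaluated in order, so a later expression may refer to a column created earlier in the same call. For example:

In [1]: # Fails on older pandas: `C` does not exist yet when `D` is computed
        df.assign(C=lambda x: x['A'] + x['B'],
                  D=lambda x: x['A'] + x['C'])
In [2]: # Works on every version: break it into two assigns
        (df.assign(C=lambda x: x['A'] + x['B'])
           .assign(D=lambda x: x['A'] + x['C']))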
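
The restriction on referring to a column created in the same call applies only to older pandas. On pandas 0.23+ with Python 3.6+, keyword arguments to `assign` are evaluated in order. This sketch, with a toy frame, checks that the one-call and two-call forms then give the same result.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Dependent assignment in one call: D refers to C, created just before it
one_call = df.assign(C=lambda x: x["A"] + x["B"],
                     D=lambda x: x["A"] + x["C"])

# Two-step version, portable to older pandas
two_calls = (df.assign(C=lambda x: x["A"] + x["B"])
               .assign(D=lambda x: x["A"] + x["C"]))

print(one_call.equals(two_calls))   # True
print(one_call["D"].tolist())       # [6, 9, 12]
```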

Indexing / Selection

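
The most common selection operations on a DataFrame, sketched on a toy frame. A column, a single row by label, or a single row by integer position each return a Series. A row slice or a boolean row mask returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
                  index=["w", "x", "y", "z"])

col = df["A"]                  # select column -> Series
row_by_label = df.loc["x"]     # select row by label -> Series
row_by_pos = df.iloc[2]        # select row by integer location -> Series
rows = df[1:3]                 # slice rows (positions 1 and 2) -> DataFrame
masked = df[df["A"] > 2]       # select rows by boolean vector -> DataFrame
```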

DataFrame column attribute access
If a DataFrame column label is a valid Python variable name, the column can be accessed like an attribute:

df.column_name
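
A sketch of where attribute access works and where it does not, using made-up column names. A column whose name collides with an existing DataFrame method or attribute, here `count`, is shadowed by that method. Bracket indexing always works. Attribute assignment cannot create a new column; pandas warns instead.

```python
import pandas as pd

df = pd.DataFrame({"price": [1.0, 2.0], "count": [3, 4]})

# 'price' is a valid identifier and not an existing attribute,
# so df.price is the same column as df['price']
same = df.price.equals(df["price"])

# 'count' collides with the DataFrame.count method, so df.count
# is the method, not the column; use df['count'] instead
is_method = callable(df.count)
count_col = df["count"]
```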

Panel

Panel was deprecated in 0.20.0 and removed in 0.25.0. The recommended way to represent 3-D data is a DataFrame with a MultiIndex.

In addition, the xarray package was built from the ground up specifically to support the multi-dimensional analysis that is one of Panel's main use cases.

See the xarray documentation for details.
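
A common replacement for Panel is a DataFrame with a row MultiIndex. Panel's axes were items, major_axis and minor_axis. The sketch below stores 3-D data of that shape in 2-D form. The item names, dates and field names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 items x 3 dates x 2 fields
items = ["Item1", "Item2"]
dates = pd.date_range("2000-01-01", periods=3)
values = np.arange(12, dtype=float).reshape(2, 3, 2)

# Flatten the first two axes into a two-level row MultiIndex
# (item-major, matching the C-order reshape); fields become columns
index = pd.MultiIndex.from_product([items, dates], names=["item", "date"])
df = pd.DataFrame(values.reshape(6, 2), index=index, columns=["open", "close"])

# One item -> a 2-D DataFrame indexed by date (one "slice" of the old Panel)
item1 = df.loc["Item1"]

# One field across all items and dates, with items pivoted into columns
close_wide = df["close"].unstack("item")
```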

Reference

Intro to Data Structures (pandas documentation)