pandas: pan(el)-da(ta)-s.
Series
Series is a one-dimensional labeled array capable of holding any data type. The axis labels are collectively referred to as the index.
The basic method to create a Series is to call:
s = pd.Series(data, index=index)
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value
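As a minimal sketch of the three input kinds listed above (the labels and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
s_dict = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0})

# From an ndarray: the index, if given, must have the same length as the data.
s_arr = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

# From a scalar: the value is repeated once for every label in the index.
s_scalar = pd.Series(5.0, index=["a", "b", "c"])
```

If no index is passed with an ndarray, pandas falls back to the integer labels 0 to n-1.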
Series is ndarray-like
Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index, so each value keeps its label.
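A short illustration of the ndarray-like behaviour, using made-up values. It uses `.iloc` for the positional slice so the intent is explicit:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0], index=["a", "b", "c", "d"])

roots = np.sqrt(s)       # NumPy function applied elementwise; result is a Series
head = s.iloc[:2]        # positional slice; the labels "a" and "b" come along
big = s[s > s.mean()]    # boolean mask, as with an ndarray
```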
Series is dict-like
A Series is like a fixed-size dict in that you can get and set values by index label.
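For example, with a small Series built for illustration, the dict-style operations look like this:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

value = s["a"]         # get a value by label
s["c"] = 30            # set the value for an existing label
has_b = "b" in s       # membership tests the index labels, not the values
missing = s.get("z")   # like dict.get: returns None instead of raising KeyError
```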
Align the data based on label
A key difference between Series and ndarray is that operations between Series automatically align the data based on label.
The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result will be marked as missing (NaN).
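A small sketch of label alignment, using two illustrative Series whose indexes only partly overlap:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Values are matched by label, not by position.
total = s1 + s2

# "a" and "d" appear in only one operand, so they are NaN in the result.
cleaned = total.dropna()
```

Here `total` is indexed by the union of the labels, and `dropna()` is one way to discard the labels that had no partner.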
Some attributes
s.index <-- Series obj's index
s.values <-- Series obj's data
Name attribute
Series can also have a name attribute.
s = pd.Series(data, name = '..')
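The attributes above can be inspected as in the following sketch. The data and the names "counts" and "totals" are placeholders:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="counts")

labels = s.index               # the index labels
data = s.values                # the underlying data as a NumPy array
renamed = s.rename("totals")   # returns a new Series; s keeps its own name
```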
DataFrame
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects.
The basic method to create a DataFrame is to call:
df = pd.DataFrame(data, index=.., columns=..)
Like Series, DataFrame accepts many different kinds of input:
- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame
Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.
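Note: When a particular set of columns is passed along with a dict of data, the passed columns override the keys in the dict. Only the dict entries named in columns are kept, and any passed column that is missing from the dict is filled with NaN.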
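The sketch below builds a DataFrame from a dict of Series and then passes explicit index and columns arguments. The column name "three" is deliberately absent from the dict to show what happens to it:

```python
import pandas as pd

data = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# The row index is the union of the Series indexes; gaps become NaN.
df = pd.DataFrame(data)

# Explicit index and columns select and reorder; "one" is dropped and
# "three" (not in the dict) becomes a column of NaN.
picked = pd.DataFrame(data, index=["d", "b", "a"], columns=["two", "three"])
```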
Alternate Constructors
- DataFrame.from_dict
- DataFrame.from_records
- DataFrame.from_items (deprecated in pandas 0.23 and removed in 1.0)
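A brief sketch of two of these constructors with made-up data. `from_items` is left out because it is not available in recent pandas releases, and the `columns` argument to `from_dict` assumes pandas 0.23 or later:

```python
import pandas as pd

# from_dict with orient="index": each dict key becomes a row label.
by_row = pd.DataFrame.from_dict(
    {"r1": [1, 2], "r2": [3, 4]}, orient="index", columns=["A", "B"]
)

# from_records: a sequence of tuples, one tuple per row.
records = [(1, "x"), (2, "y")]
from_rec = pd.DataFrame.from_records(records, columns=["num", "letter"])
```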
Accessing attributes
df.index <-- DataFrame obj's index
df.columns <-- DataFrame obj's columns
df.values <-- DataFrame obj's data
Column selection, addition, deletion
You can treat a DataFrame semantically like a dict of like-indexed Series objects.
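The dict-like column operations can be sketched as follows. The column names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

col = df["one"]                        # select a column (returns a Series)
df["three"] = df["one"] * df["two"]    # add a column derived from others
df["flag"] = df["one"] > 2             # add a boolean column
del df["two"]                          # delete a column, as with a dict
three = df.pop("three")                # remove a column and return it
df.insert(1, "bar", df["one"])         # insert at a specific position
```

New columns are appended at the end by default; `insert` is the way to place one at a chosen position.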
Assigning New Columns in Method Chains
DataFrame has an assign() method that allows you to easily create new columns that are potentially derived from existing columns.
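NOTE: In the pandas version these notes describe (0.20), all expressions are computed first, and then assigned, so you can't refer to another column being assigned in the same call to assign. (From pandas 0.23 on Python 3.6 or later, later expressions in one call may refer to columns created earlier in that call.) For example: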
In [1]: # Don't do this, bad reference to `C`
df.assign(C=lambda x: x['A'] + x['B'],
          D=lambda x: x['A'] + x['C'])
In [2]: # Instead, break it into two assigns
(df.assign(C=lambda x: x['A'] + x['B'])
   .assign(D=lambda x: x['A'] + x['C']))
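A self-contained version of the chained form, with a small made-up frame so the result can be checked. The chained form does not depend on the evaluation order inside a single assign call:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Each assign returns a new DataFrame, so the second call can see C.
result = (
    df.assign(C=lambda x: x["A"] + x["B"])
      .assign(D=lambda x: x["A"] + x["C"])
)
```

Note that `assign` leaves the original `df` unchanged; only `result` has the new columns.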
Indexing / Selection
DataFrame column attribute access
If a DataFrame column label is a valid Python variable name, the column can be accessed like an attribute:
df.column_name
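For instance, with two illustrative columns, one of which is not a valid identifier:

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.5], "unit cost": [1.0, 2.0]})

via_attr = df.price          # same column as df["price"]
via_key = df["unit cost"]    # contains a space, so bracket access is needed
```

Bracket access works for every column label, so it is the safer default when labels come from external data.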
Panel
Panel was deprecated in pandas 0.20.0 and removed in 0.25.0.
In addition, the xarray package was built from the ground up specifically to support the multi-dimensional analysis that is one of Panel's main use cases.
See the xarray package documentation for details.