# S!T!A!R!T```python%matplotlib inlineimport numpy as npimport pandas as pdimport matplotlib.pyplot as plt```## 1.define a Dataframe data```pythondates = pd.date_range('20170129', periods = 5, freq = 'd')df = pd.DataFrame(np.random.randn(5,4), columns = list('ABCD'), index = dates)``````pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythondf.plot(kind = 'barh')```![png](output_5_1.png)## 2.having data types in 'df'```pythondf.dtypes``` A float64 B float64 C float64 D float64 dtype: object## 3. Choose indexs&cloumns&values```pythonprint('-_'*33)print('The index of df is :\n{0}'.format(df.index))print('-_'*33)print('The values of df is :\n{0}'.format(df.values))print('-_'*33)print('The columns of df is :\n{0}'.format(df.columns))print('-_'*33)print(df.describe())print('-_'*33)``` -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ The index of df is : DatetimeIndex(['2017-01-29', '2017-01-30', '2017-01-31', '2017-02-01', '2017-02-02'], dtype='datetime64[ns]', freq='D') -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ The values of df is : [[-0.87679455 0.62570401 0.8394737 1.23924262] [ 0.35142547 0.61910446 0.69015155 0.89032261] [ 0.48453389 2.18840161 -0.2584778 -0.83329982] [ 1.49215272 -1.739632 -1.86894142 -0.01332174] [ 0.67477899 0.31703409 -0.37920271 -0.07616798]] -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ The columns of df is : Index(['A', 'B', 'C', 'D'], dtype='object') -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A B C D count 5.000000 5.000000 5.000000 5.000000 mean 0.425219 0.402122 -0.195399 0.241355 std 0.852178 1.403764 1.083244 0.827270 min -0.876795 -1.739632 -1.868941 -0.833300 25% 0.351425 0.317034 -0.379203 -0.076168 50% 0.484534 0.619104 -0.258478 -0.013322 75% 0.674779 0.625704 0.690152 0.890323 max 1.492153 2.188402 0.839474 1.239243 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_## 4.Transposing your data```pythonprint(df.T)``` 2017-01-29 2017-01-30 2017-01-31 2017-02-01 2017-02-02 A -0.876795 0.351425 0.484534 1.492153 0.674779 B 0.625704 0.619104 2.188402 -1.739632 0.317034 C 0.839474 0.690152 -0.258478 -1.868941 -0.379203 D 1.239243 0.890323 -0.833300 -0.013322 -0.076168## 5.Sort```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)print(df.sort_index(axis = 0, ascending = False))print('-_'*33)print(df.sort_index(axis = 1, ascending = False))print('-_'*33)print(df.sort_values(axis = 0, ascending = True , by = 'A'))print('-_'*33)``` -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A B C D 2017-02-02 0.674779 0.317034 -0.379203 -0.076168 2017-02-01 1.492153 -1.739632 -1.868941 -0.013322 2017-01-31 0.484534 2.188402 -0.258478 -0.833300 2017-01-30 0.351425 0.619104 0.690152 0.890323 2017-01-29 -0.876795 0.625704 0.839474 1.239243 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ D C B A 2017-01-29 1.239243 0.839474 0.625704 -0.876795 2017-01-30 0.890323 0.690152 0.619104 0.351425 2017-01-31 -0.833300 -0.258478 2.188402 0.484534 2017-02-01 -0.013322 -1.868941 -1.739632 1.492153 2017-02-02 -0.076168 -0.379203 0.317034 0.674779 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A B C D 2017-01-29 -0.876795 0.625704 0.839474 1.239243 2017-01-30 0.351425 0.619104 0.690152 0.890323 2017-01-31 0.484534 2.188402 -0.258478 -0.833300 2017-02-02 0.674779 0.317034 -0.379203 -0.076168 2017-02-01 1.492153 -1.739632 -1.868941 -0.013322 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_## 6.Choose data### 6.1.Choose columns by single label```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)print(df.A)print('-_'*33)print(df['B'])print('-_'*33)``` -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ 2017-01-29 -0.876795 2017-01-30 0.351425 2017-01-31 0.484534 2017-02-01 1.492153 2017-02-02 0.674779 Freq: D, Name: A, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ 2017-01-29 0.625704 2017-01-30 0.619104 2017-01-31 2.188402 2017-02-01 -1.739632 2017-02-02 0.317034 Freq: D, Name: B, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_### 6.2.Choose rows by slice via'[ : ]'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```python#not include the second numberdf[ :3]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
### 6.3. Choose rows by index name via'[:]'```python# include the second labledf['20170129': '2017-1-31']```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
### 6.4. Choose complex sections using '.loc( )'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint(df.loc[dates[0]])print('-_'*33)print(df.loc['20170201'])print('-_'*33)print(df.loc[dates[0]:dates[len(df)-1], list('AB')])print('-_'*33)print(df.loc['20170129': '20170202', ['A', 'C', 'D']])print('-_'*33)print(df.loc[:, list('CBA')])print('-_'*33)print(df.loc[dates[0], 'A'])print('-_'*33)print(df.loc[dates[0], ['A']])print('-_'*33)##A' can not be a listprint(df.at[dates[0], 'A'])``` A -0.876795 B 0.625704 C 0.839474 D 1.239243 Name: 2017-01-29 00:00:00, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A 1.492153 B -1.739632 C -1.868941 D -0.013322 Name: 2017-02-01 00:00:00, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A B 2017-01-29 -0.876795 0.625704 2017-01-30 0.351425 0.619104 2017-01-31 0.484534 2.188402 2017-02-01 1.492153 -1.739632 2017-02-02 0.674779 0.317034 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A C D 2017-01-29 -0.876795 0.839474 1.239243 2017-01-30 0.351425 0.690152 0.890323 2017-01-31 0.484534 -0.258478 -0.833300 2017-02-01 1.492153 -1.868941 -0.013322 2017-02-02 0.674779 -0.379203 -0.076168 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ C B A 2017-01-29 0.839474 0.625704 -0.876795 2017-01-30 0.690152 0.619104 0.351425 2017-01-31 -0.258478 2.188402 0.484534 2017-02-01 -1.868941 -1.739632 1.492153 2017-02-02 -0.379203 0.317034 0.674779 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ -0.876794550096 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A -0.876795 Name: 2017-01-29 00:00:00, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ -0.876794550096### 6.5. Choose complex sections using '.lioc( )'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)#single elementprint(df.iloc[0,0])print('-_'*33)#complex sectionsprint(df.iloc[:3,2:])print('-_'*33)#single rowprint(df.iloc[0])print('-_'*33)#single columnprint(df.iloc[:,0])print('-_'*33)``` -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ -0.876794550096 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ C D 2017-01-29 0.839474 1.239243 2017-01-30 0.690152 0.890323 2017-01-31 -0.258478 -0.833300 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ A -0.876795 B 0.625704 C 0.839474 D 1.239243 Name: 2017-01-29 00:00:00, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_ 2017-01-29 -0.876795 2017-01-30 0.351425 2017-01-31 0.484534 2017-02-01 1.492153 2017-02-02 0.674779 Freq: D, Name: A, dtype: float64 -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_### 7. Boolean indexing```python#element in column 'A' > 0, return the DataFramedf[df.A > 0]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```python#all element in 'df' which <=0 is np.nandf[df > 0]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29NaN0.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402NaNNaN
2017-02-011.492153NaNNaNNaN
2017-02-020.6747790.317034NaNNaN
```pythondf2=df.copy()df2['E'] = ['fuck', 'shit', ] * 2 + ['suck']df2[df2.E.isin(['fuck', 'suck'])]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDE
2017-01-29-0.8767950.6257040.8394741.239243fuck
2017-01-310.4845342.188402-0.258478-0.833300fuck
2017-02-020.6747790.317034-0.379203-0.076168suck
### 8.Missing Data```pythondf3 = df[df > 0]df3```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29NaN0.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402NaNNaN
2017-02-011.492153NaNNaNNaN
2017-02-020.6747790.317034NaNNaN
```pythondf3.dropna()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-300.3514250.6191040.6901520.890323
```pythondf3.fillna(value = 'Missing')```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29Missing0.6257040.8394741.23924
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.1884MissingMissing
2017-02-011.49215MissingMissingMissing
2017-02-020.6747790.317034MissingMissing
```pythondf4 = pd.isnull(df3)df4```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29TrueFalseFalseFalse
2017-01-30FalseFalseFalseFalse
2017-01-31FalseFalseTrueTrue
2017-02-01FalseTrueTrueTrue
2017-02-02FalseFalseTrueTrue
### 9. Merge```pythona = df[ : 2]b = df[2 : 4]c = df[4 : 5]pieces = [a, b, c]pd.concat(pieces)```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
### 10. Time Series### 10.1. define time series based 'df'```pythontime = pd.date_range('20170129', periods = 1000, freq = 'd')col = list('ABCDEFG')df_time = pd.DataFrame(np.random.randn(len(time), len(col)), index = time, columns = col)df_time.head(5)```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-29-2.442170-1.423526-0.2833060.4749680.2284170.8108171.918797
2017-01-300.3414900.611065-0.6841250.9758580.595433-1.2319040.027245
2017-01-31-0.448969-0.416562-1.026428-1.529416-0.5672462.305233-0.186806
2017-02-010.3858290.0555172.3483670.419194-0.3857280.0490750.613855
2017-02-02-0.2629490.3337900.750670-0.7366430.0635610.6028150.328664
### 10.2. resample 'df' by month```pythondf_time2 = df_time.resample('m').mean()df_time2.head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-31-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-28-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-310.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-30-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-31-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
```pythondf_time3 = df_time2.tz_localize('utc')df_time3.head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-31 00:00:00+00:00-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-28 00:00:00+00:00-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-31 00:00:00+00:000.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-30 00:00:00+00:00-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-31 00:00:00+00:00-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
```pythondf_time3.tz_convert('US/Eastern').head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-30 19:00:00-05:00-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-27 19:00:00-05:00-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-30 20:00:00-04:000.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-29 20:00:00-04:00-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-30 20:00:00-04:00-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
### 11. Plotting```pythondf.plot()```![png](output_49_1.png)
### 12. Data I/O
```python
df_time.to_csv('DataFrame_time_data.csv')
```
```python
df_time.to_excel('DataFrame_time_data.xlsx', sheet_name = 'data_1')
```
```python
df_time.to_html('DataFrame_time_data.html')
```
```python
df_time.to_json('DataFrame_time_data.txt')
```
```python
```