Reading and writing data

Reader functions

Function Description
read_csv Load delimited data from a file, URL, or file-like object; use comma as default delimiter
read_table Load delimited data from a file, URL, or file-like object; use tab ('\t') as default delimiter
read_fwf Read data in fixed-width column format (i.e., no delimiters)
read_clipboard Version of read_table that reads data from the clipboard; useful for converting tables from web pages
read_excel Read tabular data from an Excel XLS or XLSX file
read_hdf Read HDF5 files written by pandas
read_html Read all tables found in the given HTML document
read_json Read data from a JSON (JavaScript Object Notation) string representation
read_msgpack Read pandas data encoded using the MessagePack binary format
read_pickle Read an arbitrary object stored in Python pickle format
read_sas Read a SAS dataset stored in one of the SAS system’s custom storage formats
read_sql Read the results of a SQL query (using SQLAlchemy) as a pandas DataFrame
read_stata Read a dataset from Stata file format
read_feather Read the Feather binary file format

import pandas as pd
df = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex1.csv')

df

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>a</th>
<th>b</th>
<th>c</th>
<th>d</th>
<th>message</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>hello</td>
</tr>
<tr>
<th>1</th>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>world</td>
</tr>
<tr>
<th>2</th>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>foo</td>
</tr>
</tbody>
</table>
</div>

pd.read_table('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex1.csv',sep = ',') # specify the delimiter explicitly

Files without a header row

pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',header=None)

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>hello</td>
</tr>
<tr>
<th>1</th>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>world</td>
</tr>
<tr>
<th>2</th>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>foo</td>
</tr>
</tbody>
</table>
</div>

Supplying column names

pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',names=['a','b','c','d','message'])

Choosing the index column

names = ['a','b','c','d','message']

pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',
            names = names,index_col = 'message')

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>a</th>
<th>b</th>
<th>c</th>
<th>d</th>
</tr>
<tr>
<th>message</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>hello</th>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<th>world</th>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th>foo</th>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
</tr>
</tbody>
</table>
</div>

Hierarchical index

parsed = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/csv_mindex.csv',
                     index_col = ['key1','key2'])

parsed

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th></th>
<th>value1</th>
<th>value2</th>
</tr>
<tr>
<th>key1</th>
<th>key2</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="4" valign="top">one</th>
<th>a</th>
<td>1</td>
<td>2</td>
</tr>
<tr>
<th>b</th>
<td>3</td>
<td>4</td>
</tr>
<tr>
<th>c</th>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>d</th>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th rowspan="4" valign="top">two</th>
<th>a</th>
<td>9</td>
<td>10</td>
</tr>
<tr>
<th>b</th>
<td>11</td>
<td>12</td>
</tr>
<tr>
<th>c</th>
<td>13</td>
<td>14</td>
</tr>
<tr>
<th>d</th>
<td>15</td>
<td>16</td>
</tr>
</tbody>
</table>
</div>

Special cases

list(open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex3.txt'))

['            A         B         C\n',
 'aaa -0.264438 -1.026059 -0.619500\n',
 'bbb  0.927272  0.302904 -0.032399\n',
 'ccc -0.264273 -0.386314 -0.217601\n',
 'ddd -0.871858 -0.348382  1.100491\n']

result = pd.read_table('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex3.txt',
                       sep = r'\s+')  # raw string so the regex backslash is not treated as an escape

result

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>A</th>
<th>B</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<th>aaa</th>
<td>-0.264438</td>
<td>-1.026059</td>
<td>-0.619500</td>
</tr>
<tr>
<th>bbb</th>
<td>0.927272</td>
<td>0.302904</td>
<td>-0.032399</td>
</tr>
<tr>
<th>ccc</th>
<td>-0.264273</td>
<td>-0.386314</td>
<td>-0.217601</td>
</tr>
<tr>
<th>ddd</th>
<td>-0.871858</td>
<td>-0.348382</td>
<td>1.100491</td>
</tr>
</tbody>
</table>
</div>
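
As a self-contained sketch of the whitespace-delimited case, the snippet below passes the same text as the ex3.txt listing to read_csv through io.StringIO. The in-memory buffer is an assumption made so the example runs without the file on disk.

```python
import io

import pandas as pd

# Same text as the ex3.txt listing, held in memory instead of on disk.
raw = (
    "            A         B         C\n"
    "aaa -0.264438 -1.026059 -0.619500\n"
    "bbb  0.927272  0.302904 -0.032399\n"
    "ccc -0.264273 -0.386314 -0.217601\n"
    "ddd -0.871858 -0.348382  1.100491\n"
)

# \s+ matches one or more whitespace characters; the raw string keeps the backslash intact.
frame = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The header row has one field fewer than the data rows,
# so pandas uses the first column as the index.
print(frame.index.tolist())
print(frame.columns.tolist())
```

Because the header has three names and each data row has four fields, no index_col argument is needed here.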

Skipping rows

pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex4.csv',skiprows=[0,2,3])
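
A minimal sketch of how skiprows behaves, using made-up text rather than the real ex4.csv (the content of `raw` is hypothetical and only mimics a file with note lines around the header):

```python
import io

import pandas as pd

# Hypothetical file content: note lines sit at positions 0, 2 and 3.
raw = (
    "# note line\n"
    "a,b,c,d,message\n"
    "# another note\n"
    "# yet another note\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
)

# Skip the listed line numbers (counted from 0) before parsing.
skipped = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])

# For this particular text, ignoring lines that start with '#' gives the same frame.
commented = pd.read_csv(io.StringIO(raw), comment="#")

print(skipped)
```

skiprows is positional, so it only works when the unwanted lines are at known positions; the comment option is shown as an alternative for files where those lines share a marker character.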

!cat /Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv

something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo

r_l = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv')
r_l

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>something</th>
<th>a</th>
<th>b</th>
<th>c</th>
<th>d</th>
<th>message</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>one</td>
<td>1</td>
<td>2</td>
<td>3.0</td>
<td>4</td>
<td>NaN</td>
</tr>
<tr>
<th>1</th>
<td>two</td>
<td>5</td>
<td>6</td>
<td>NaN</td>
<td>8</td>
<td>world</td>
</tr>
<tr>
<th>2</th>
<td>three</td>
<td>9</td>
<td>10</td>
<td>11.0</td>
<td>12</td>
<td>foo</td>
</tr>
</tbody>
</table>
</div>

Missing values

pd.isnull(r_l)

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>something</th>
<th>a</th>
<th>b</th>
<th>c</th>
<th>d</th>
<th>message</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>True</td>
</tr>
<tr>
<th>1</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>True</td>
<td>False</td>
<td>False</td>
</tr>
<tr>
<th>2</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
</tr>
</tbody>
</table>
</div>

result = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv',na_values=['NULL'])

result

Marking specific values as NaN by column

sentinels = {'message':['foo','NA'],
             'something':['two']}

import pandas as pd
pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv',na_values=sentinels)
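
To make the effect of per-column sentinels visible without the file on disk, here is a small sketch that parses the ex5.csv text shown earlier from an in-memory buffer (the use of io.StringIO is an assumption for the sake of a runnable example):

```python
import io

import pandas as pd

# Same text as the ex5.csv listing.
raw = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# Keys are column names; values are the strings to treat as missing in that column.
sentinels = {"message": ["foo", "NA"], "something": ["two"]}

frame = pd.read_csv(io.StringIO(raw), na_values=sentinels)
missing = frame.isna()

print(frame)
print(missing)
```

The empty field in column c is still read as NaN, because the default missing-value markers remain active alongside the per-column ones.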

Argument Description
path String indicating filesystem location, URL, or file-like object
sep or delimiter Character sequence or regular expression to use to split fields in each row
header Row number to use as column names; defaults to 0 (first row), but should be None if there is no header row
index_col Column numbers or names to use as the row index in the result; can be a single name/number or a list of them for a hierarchical index
names List of column names for result, combine with header=None
skiprows Number of rows at beginning of file to ignore or list of row numbers (starting from 0) to skip.
na_values Sequence of values to replace with NA.
comment Character(s) to split comments off the end of lines.
parse_dates Attempt to parse data to datetime; False by default. If True, will attempt to parse all columns. Otherwise can specify a list of column numbers or names to parse. If element of list is tuple or list, will combine multiple columns together and parse to date (e.g., if date/time split across two columns).
keep_date_col If joining columns to parse date, keep the joined columns; False by default.
converters Dict mapping column number or name to functions (e.g., {'foo': f} would apply the function f to all values in the 'foo' column).
dayfirst When parsing potentially ambiguous dates, treat as international format (e.g., 7/6/2012 -> June 7, 2012); False by default.
date_parser Function to use to parse dates.
nrows Number of rows to read from beginning of file.
iterator Return a TextParser object for reading file piecemeal.
chunksize For iteration, size of file chunks.
skipfooter Number of lines to ignore at end of file.
verbose Print various parser output information, like the number of missing values placed in non-numeric columns.
encoding Text encoding for Unicode (e.g., 'utf-8' for UTF-8 encoded text).
squeeze If the parsed data only contains one column, return a Series.
thousands Separator for thousands (e.g., ',' or '.').
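
A short sketch combining several of the arguments from the table. The sample text and its column names (date, amount, label) are invented for illustration and do not come from the example files.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text with thousands separators and padded labels.
raw = (
    "date;amount;label\n"
    "2012-06-07;1,234; x \n"
    "2012-06-08;2,500; y \n"
)

frame = pd.read_csv(
    io.StringIO(raw),
    sep=";",                         # field delimiter
    thousands=",",                   # "1,234" is read as the integer 1234
    parse_dates=["date"],            # convert the date column to datetime
    converters={"label": str.strip}, # apply str.strip to every value in 'label'
    index_col="date",                # use the parsed dates as the row index
)

print(frame)
```

Each keyword handles one concern, so they can be mixed freely; converters receive the raw string of each cell, which is why str.strip works here.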

Reading a file in pieces

pd.options.display.max_rows = 10

result = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv')

result

<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>one</th>
<th>two</th>
<th>three</th>
<th>four</th>
<th>key</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.467976</td>
<td>-0.038649</td>
<td>-0.295344</td>
<td>-1.824726</td>
<td>L</td>
</tr>
<tr>
<th>1</th>
<td>-0.358893</td>
<td>1.404453</td>
<td>0.704965</td>
<td>-0.200638</td>
<td>B</td>
</tr>
<tr>
<th>2</th>
<td>-0.501840</td>
<td>0.659254</td>
<td>-0.421691</td>
<td>-0.057688</td>
<td>G</td>
</tr>
<tr>
<th>3</th>
<td>0.204886</td>
<td>1.074134</td>
<td>1.388361</td>
<td>-0.982404</td>
<td>R</td>
</tr>
<tr>
<th>4</th>
<td>0.354628</td>
<td>-0.133116</td>
<td>0.283763</td>
<td>-0.837063</td>
<td>Q</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>9995</th>
<td>2.311896</td>
<td>-0.417070</td>
<td>-1.409599</td>
<td>-0.515821</td>
<td>L</td>
</tr>
<tr>
<th>9996</th>
<td>-0.479893</td>
<td>-0.650419</td>
<td>0.745152</td>
<td>-0.646038</td>
<td>E</td>
</tr>
<tr>
<th>9997</th>
<td>0.523331</td>
<td>0.787112</td>
<td>0.486066</td>
<td>1.093156</td>
<td>K</td>
</tr>
<tr>
<th>9998</th>
<td>-0.362559</td>
<td>0.598894</td>
<td>-1.843201</td>
<td>0.887292</td>
<td>G</td>
</tr>
<tr>
<th>9999</th>
<td>-0.096376</td>
<td>-1.012999</td>
<td>-0.657431</td>
<td>-0.573315</td>
<td>0</td>
</tr>
</tbody>
</table>
<p>10000 rows × 5 columns</p>
</div>

Reading only the first rows

pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv',nrows=5)

The chunksize option

chunker = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv',chunksize=1000)

chunker

<pandas.io.parsers.TextFileReader at 0x110ad76a0>

tot = pd.Series([], dtype='float64')  # explicit dtype for the empty accumulator
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(),fill_value=0)

tot = tot.sort_values(ascending = False)

tot[:10]

E    368.0
X    364.0
L    346.0
O    343.0
Q    340.0
M    338.0
J    337.0
F    335.0
K    334.0
H    330.0
dtype: float64
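
The same accumulation pattern can be tried on a tiny in-memory input. This is only a sketch: the six-row `key` column below is made up, and a chunk size of 2 is chosen so that several chunks are produced.

```python
import io

import pandas as pd

# Hypothetical data: one 'key' column with six rows.
raw = "key\n" + "\n".join(["A", "B", "A", "C", "A", "B"]) + "\n"

# With chunksize set, read_csv returns an iterator of DataFrames instead of one DataFrame.
chunker = pd.read_csv(io.StringIO(raw), chunksize=2)

total = pd.Series(dtype="float64")
for piece in chunker:
    # Add this chunk's counts; keys missing on either side count as 0.
    total = total.add(piece["key"].value_counts(), fill_value=0)

total = total.sort_values(ascending=False)
print(total)
```

Only one chunk is held in memory at a time, which is the point of the technique for files too large to load whole.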

Writing files

data = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv')

data

data.to_csv('/Users/meininghang/Desktop/out.csv')

!cat /Users/meininghang/Desktop/out.csv

,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo

import sys
data.to_csv(sys.stdout,sep='|') # write with a custom delimiter

|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo

Representing missing values

data.to_csv(sys.stdout,na_rep = 'NULL')

,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo

Dropping the index and header

data.to_csv(sys.stdout,index = False,header=False)

one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo

data.to_csv(sys.stdout,index=False,columns=['a','b','c'])

a,b,c
1,2,3.0
5,6,
9,10,11.0
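
As a rough check that these writer options round-trip, the sketch below writes a small frame to an in-memory buffer and reads it back. The frame is a made-up two-row stand-in, not the ex5.csv data.

```python
import io

import pandas as pd

# Hypothetical frame with one missing value in each of two columns.
data = pd.DataFrame({"a": [1, 5], "c": [3.0, None], "message": [None, "world"]})

buffer = io.StringIO()
# Leave out the row index and write missing values as the string NULL.
data.to_csv(buffer, index=False, na_rep="NULL")
text = buffer.getvalue()
print(text)

# Read the text back, telling the parser that NULL marks a missing value.
buffer.seek(0)
restored = pd.read_csv(buffer, na_values=["NULL"])
print(restored.isna())
```

Writing with index=False avoids an extra unnamed column when the file is read back.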

Exporting a Series

dates = pd.date_range('1/1/2000',periods=7)

import numpy as np
ts = pd.Series(np.arange(7),index = dates)

ts.to_csv('/Users/meininghang/Desktop/tse.csv')

!cat /Users/meininghang/Desktop/tse.csv

2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6
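
One possible way to read such a file back into a Series is sketched below. It assumes the file has no header row, as in the listing above, so header=False is passed explicitly when writing; an in-memory buffer stands in for the file.

```python
import io

import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=7)
ts = pd.Series(np.arange(7), index=dates)

buffer = io.StringIO()
# header=False is an assumption made to match the headerless listing.
ts.to_csv(buffer, header=False)
buffer.seek(0)

# First column becomes the index and is parsed as dates;
# squeeze turns the remaining single-column frame into a Series.
restored = pd.read_csv(
    buffer, header=None, index_col=0, parse_dates=True
).squeeze("columns")

print(restored)
```

read_csv always returns a DataFrame, so the squeeze step is what recovers a Series here.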

Custom delimited formats

!cat /Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv

"a","b","c"
"1","2","3"
"1","2","3"

import csv

# Open the file in a with block so the handle is closed after reading
with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
    reader = csv.reader(f)
    for li in reader:
        print(li)

['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3']

# Step 1: read the file into a list of rows
with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
    li = list(csv.reader(f))

# Step 2: split the header row from the data rows
header,values = li[0],li[1:]

# Step 3: build a dict of columns
data_dict = {h:v for h,v in zip(header,zip(*values))} # zip(*values) transposes rows into columns
data_dict

{'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
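
The dict of columns can be handed straight to pandas. The sketch below repeats the steps on the ex7.csv lines held in a list (csv.reader accepts any iterable of strings), then builds a DataFrame; converting to int is an extra step added here for illustration.

```python
import csv

import pandas as pd

# Same lines as the ex7.csv listing, kept in memory.
lines = ['"a","b","c"', '"1","2","3"', '"1","2","3"']

rows = list(csv.reader(lines))
header, values = rows[0], rows[1:]

# zip(*values) transposes the rows into columns.
data_dict = {h: v for h, v in zip(header, zip(*values))}

# csv.reader yields strings, so convert the columns to integers explicitly.
frame = pd.DataFrame(data_dict).astype(int)
print(frame)
```

The csv module never infers types, which is one reason read_csv is usually more convenient for tabular data.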

# Step 4: define a custom dialect (delimiter, quoting, line terminator)
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'  # must be a single character, without surrounding spaces
    quoting = csv.QUOTE_MINIMAL

# The handle from the earlier with block is closed, so reopen the file before reading
with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
    reader = csv.reader(f, dialect=my_dialect)
    rows = list(reader)

Argument Description
delimiter One-character string to separate fields; defaults to ','.
lineterminator Line terminator for writing; defaults to '\r\n'. Reader ignores this and recognizes cross-platform line terminators.
quotechar Quote character for fields with special characters (like a delimiter); default is '"'.
quoting Quoting convention. Options include csv.QUOTE_ALL (quote all fields), csv.QUOTE_MINIMAL (only fields with special characters like the delimiter), csv.QUOTE_NONNUMERIC, and csv.QUOTE_NONE (no quoting). See Python’s documentation for full details. Defaults to QUOTE_MINIMAL.
skipinitialspace Ignore whitespace after each delimiter; default is False.
doublequote How to handle quoting character inside a field; if True, it is doubled (see online documentation for full detail and behavior).
escapechar String to escape the delimiter if quoting is set to csv.QUOTE_NONE; disabled by default.


The writing counterpart is csv.writer.
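
A minimal sketch of writing with csv.writer and a custom dialect, then reading the result back. The class name SemicolonDialect and the sample rows are made up for this example, and an in-memory buffer replaces a real file.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect: semicolon-delimited, quotes only where needed."""
    lineterminator = "\n"
    delimiter = ";"
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    doublequote = True
    skipinitialspace = False


buffer = io.StringIO()
writer = csv.writer(buffer, dialect=SemicolonDialect)
writer.writerow(("one", "two", "three"))
# The middle field contains the delimiter, so QUOTE_MINIMAL wraps it in quotes.
writer.writerow(("1", "2;5", "3"))
written = buffer.getvalue()
print(written)

# Reading with the same dialect recovers the original fields.
rows = list(csv.reader(io.StringIO(written), dialect=SemicolonDialect))
print(rows)
```

When writing to a real file, the csv documentation recommends opening it with newline='' so the writer controls line endings.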
