Tables and Schema
Tables
Data is organized into named tables, or relations
Table is used to store the data
Every column has its data type:
- INTEGER, SMALLINT, TINYINT, BIGINT
- DECIMAL(n,m) : float type
- VARCHAR: string type
- BOOLEAN: logic
Schemas
The names and types of the data in a table's columns are called the table's schema.
Log in shell
mysql -u root -p psw
SQL
The most common features of SQL are:
- Projection - selecting some columns from table
- Function application - applying a function to a value
- Filtering - selecting rows based on some criterion
- Grouping - aggregating rows based on the values in a column
- Joins - combining two tables
(回想在big data中学习的实现各类功能的完备数学运算集)
Query syntax
SQL queries have the form:
SELECT expression[, expression, ...]
FROM table
[... additional clauses ...]
Projection
Projection is when you ask for some(all) of the columns of a table.
也就是抽取相应的列,keyword 负责对抽取的列中的行进行过滤操作,
比如DISTINCT
就是抽取出某一列的不重复的所有元素(类似于数学里面的集合)。
SELECT col1, col2
FROM table1
SELECT has many keywords like:
- DISTINCT - return unique results
- BETWEEN a AND b - limit the range, the values can be numbers, text, or dates
- LIKE - pattern search within the column text
- IN (a, b, c) - check if the value is contained among given.
SELECT DISTINCT Facility
FROM Admissions;
SELECT *
FROM table1
Function application
在列上施加函数,可以构造新的列名
SELECT Facility,
Lengthofstay * 24
FROM Admissions;
SELECT UPPER(Facility),
Lengthofstay * IllnessCode
FROM Admissions;
SELECT Facility,
Lengthofstay * 24 AS stayinhours
FROM Admissions;
Filtering
相当于对行施加函数,抽取出部分的行, 关键词为WHERE
.多个条件语句之间用逻辑符OR AND
隔开
SELECT *
FROM Admissions
WHERE Lengthofstay > 4;
SELECT Facility, IllnessCode
FROM Admissions
WHERE IllnessCode = 141 OR
IllnessCode = 94;
SELECT Facility, Lengthofstay
FROM Admissions
WHERE Lengthofstay > 2 AND
Lengthofstay < 6;
Aggregation function
Aggregate the rows according to a column, and perform operarions on the aggregated rows. 使用keyword :GROUP BY
Grouping 操作也就是汇总通常与聚合函数(共有40多种聚合函数)一起使用,常见的聚合函数:
- COUNT - return the number of rows
- SUM - cumulate the values
- AVG - return the average for the group
- MIN / MAX - smallest / largest value
SELECT Facility,
MAX(Lengthofstay)
FROM Admissions
GROUP BY Facility;
SELECT Facility,
MIN(Lengthofstay),
MAX(Lengthofstay),
AVG(Lengthofstay)
FROM Admissions
GROUP BY Facility;
SELECT COUNT(*) FROM Admissions;
可以将过滤操作和GROUP BY 操作放到一起同时执行,但是注意过滤操作要放到前面。
Sort the result
对结果排序,关键词 :ORDER BY
SELECT Facility, Lengthofstay
FROM Admissions
ORDER BY Lengthofstay;
SELECT Facility, Lengthofstay
FROM Admissions
ORDER BY Facility, Lengthofstay
Joins
链接两个或者多个表
SELECT * FROM Admissions JOIN Illnesses
ON Admissions.IllnessCode = Illnesses.Code;
Join 操作可以写的非常的复杂,同时join分为 Left Outer Join
, Inner join
and right Outer join
UPDATE t1 SET a = 1
FROM table1 t1 JOIN table2 t2 ON t1.id = t2.t1_id WHERE t1.col1 = 0 AND t2.col2 IS NULL;
LIMIT clause
从返回的结果中显示有限的元素
SELECT Facility, IllnessCode
FROM Admissions
LIMIT 2
Database and tables
Basic operations
CREATE DATABASE database name;
DROP DATABASE database name;
SHOW DATABASES;
USE database name;
SHOW TABLES;
DESCRIBE table-name;
tables
DROP TABLE IF EXISTS plate_types;
CREATE TABLE plate_types(
code CHAR(3) NOT NULL,
description VARCHAR(27),
PRIMARY KEY(code)
);
You may notice that the return of a SELECT statement is actually another table. So you may use it as a subqueries anywhere you need a table
-- create a table from another select-statement
CREATE TABLE longstays
AS SELECT facility, MAX(lengthofstay)
FROM hospital
GROUP BY facility;
Deleting rows and tables
DROP TABLE table-name;
# just delete the data but leave the table(as a schema)
TRUNCATE TABLE table-name;
# Delete rows
DELETE FROM table-name
WHERE conditions;
add or update tables
-- update speci c data with the WHERE clause
UPDATE table1 SET col1 = 1
WHERE col2 = 2
-- insert values manually
INSERT INTO table1 (ID, FIRST_NAME, LAST_NAME)
VALUES (1, ‘Rebel’, ‘Labs’);
-- or by using the results of a query
INSERT INTO table1 (ID, FIRST_NAME, LAST_NAME)
SELECT id, last_name, rst_name FROM table2
-- load data
LOAD DATA LOCAL INFILE '/Users/xiaodiu/Google/Big Data/assignment2/data/plate-types.csv'
INTO TABLE plate_types
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n';