SQL基礎筆記_ZenDei技術網路在線

Codecademy中Learn SQL, SQL: Table Transformaton和SQL: Analyzing Business Metrics三門課程的筆記，以及補充的附加筆記。 Codecademy的課程以SQLite編寫，筆記中改成了MySQL語句。 I. Learn SQL 1. ...

Codecademy中Learn SQL, SQL: Table Transformaton和SQL: Analyzing Business Metrics三門課程的筆記，以及補充的附加筆記。 Codecademy的課程以SQLite編寫，筆記中改成了MySQL語句。 I. Learn SQL 1. Manipulation - Create, edit, delete data 1.4 Create 創建資料庫或資料庫中的表

CREATE TABLE celebs 
    (
    id INTEGER, 
    name TEXT, 
    age INTEGER 
    ); # 第一列id，數據類型整數；第二列name，數據類型文本；第三列age，數據類型整數

1.5 Insert 向表中插入行

INSERT INTO celebs ( id, name, age)
    VALUES ( 1, 'Alan Mathison Turing', 42); # 在celebs表最下方插入數據：id列為1，name列為Alan Mathion Turing，age列為42

1.6 Select 選取數據

SELECT 
    *
FROM
    celebs; # 顯示celebs表所有數據

1.7 Update 修改數據

UPDATE celebs 
SET 
    age = 22
WHERE
    id = 1; # 將celebs表中id=1的行的age改為22

1.8 Alert 更改表結構或數據類型

ALERT TABLE celebs
ADD COLUMN twitter_handle TEXT; # 在celebs表增加twitter_handle列

ALERT TABLE 'test'.'data'
CHANGE COLUMN 'Mobile' 'Mobile' BLOB NULL DEFAULT NULL; # 將表test.data的Mobile列的數據類型改為BLOB，該列數據預設為NULL

1.9 DELETE 刪除行

DELETE FROM celebs 
WHERE
    twitter_handle IS NULL; # 刪除表celebs中twitter_handle為NULL的行

2. Queries - Retrieve data 2.3 Select Distinct 返回唯一不同的值

SELECT DISTINCT
    genre
FROM
    movies; # 查詢movies表中genre列的所有不重覆值

2.4 Where 規定選擇的標準

SELECT 
    *
FROM
    movies
WHERE
    imdb_rating > 8; # 查詢movies表中imdb_rating大於8的行

= equals != not equals > greater than < less than >= greater than or equal to <= less than or equal to 2.5 Like I 在 WHERE 子句中搜索列中的指定模式

SELECT 
    *
FROM
    movies
WHERE
    name LIKE ' Se_en';

2.6 Like II

SELECT 
    *
FROM
    movies
WHERE
    name LIKE 'a%';

SELECT 
    *
FROM
    movies
WHERE
    name LIKE '%man%';

NB 通配符 '_' substitutes any individual character '%' matches zero or more missing characters '[charlist]%' any individual character in string: WHERE city LIKE '[ALN]%' 以“A"或”L“或”N“開頭的城市 '[!charlist]%' any individual character not in string: WHERE city LIKE '[!ALN]%' 不以“A"或”L“或”N“開頭的城市 2.7 Between 在 WHERE 子句中使用，選取介於兩個值之間的數據範圍 The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates.

SELECT 
    *
FROM
    movies
WHERE
    name BETWEEN 'A' AND 'J'; # 查詢movies中name以A至J開頭的所有行

NB: names that begin with letter "A" up to but not including "J". 不同的資料庫對 BETWEEN...AND 操作符的處理方式是有差異的，有開區間、閉區間，也有半開半閉區間。

SELECT 
    *
FROM
    movies
WHERE
    year BETWEEN 1990 AND 2000; # 查詢movies中year在1990至2000年間的行

NB: years between 1990 up to and including 2000 2.8 And 且運算符 AND is an operator that combines two conditions. Both conditions must be true for the row to be included in the result set.

SELECT 
    *
FROM
    movies
WHERE
    year BETWEEN 1990 AND 2000
        AND genre = 'comedy'; # 查詢movies中year在1990至2000間，且genre為comedy的行

2.9 Or 或運算符 OR is used to combine more than one condition in WHERE clause. It evaluates each condition separately and if any of the conditions are true than the row is added to the result set. OR is an operator that filters the result set to only include rows where either condition is true.

SELECT 
    *
FROM
    movies
WHERE
    genre = 'comedy' OR year < 1980; # 查詢movies中genre為comedy，或year小於1980的行

2.10 Order By 對結果集進行排序

SELECT 
    *
FROM
    movies
ORDER BY imdb_rating DESC; # 查詢movies中的行，結果以imdb_rating降序排列

DESC sorts the result by a particular column in descending order (high to low or Z - A). ASC ascending order (low to high or A - Z). 2.11 Limit 規定返回的記錄的數目 LIMIT is a clause that lets you specify the maximum number of rows the result set will have.

SELECT 
    *
FROM
    movies
ORDER BY imdb_rating ASC
LIMIT 3;  # 查詢movies中的行，結果以imdb_rating升序排列，僅返回前3行

MS SQL Server中使用SELECT TOP 3，Oracle中使用WHERE ROWNUM <= 5（？） 3. Aggregate Function 3.2 Count 返回匹配指定條件的行數 COUNT( ) is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL.

SELECT 
    COUNT(*)
FROM
    fake_apps
WHERE 
    price = 0; # 返回fake_apps中price=0的行數

3.3 Group By 合計函數

SELECT 
    price, COUNT(*)
FROM
    fake_apps
WHERE
    downloads > 2000
GROUP BY price; # 查詢fake_apps表中downloads大於2000的行，將結果集根據price分組，返回price和行數

Here, our aggregate function is COUNT( ) and we are passing price as an argument（參數） to GROUP BY. SQL will count the total number of apps for each price in the table. It is usually helpful to SELECT the column you pass as an argument to GROUP BY. Here we SELECT price and COUNT(*). 3.4 Sum 返回數值列的總數（總額） SUM is a function that takes the name of a column as an argument and returns the sum of all the values in that column.

SELECT 
    category, SUM(downloads)
FROM
    fake_apps
GROUP BY category;

3.5 Max 返回一列中的最大值（NULL 值不包括在計算中） MAX( ) is a function that takes the name of a column as an argument and returns the largest value in that column.

SELECT 
    name, category, MAX(downloads)
FROM
    fake_apps
GROUP BY category;

3.6 Min 返回一列中的最小值（NULL 值不包括在計算中） MIN( ) is a function that takes the name of a column as an argument and returns the smallest value in that column.

SELECT 
    name, category, MIN(downloads)
FROM
    fake_apps
GROUP BY category;

3.7 Average 返回數值列的平均值（NULL 值不包括在計算中）

SELECT 
    price, AVG(downloads)
FROM
    fake_apps
GROUP BY price;

3.8 Round 把數值欄位舍入為指定的小數位數 ROUND( ) is a function that takes a column name and an integer as an argument. It rounds the values in the column to the number of decimal places specified by the integer.

SELECT 
    price, ROUND(AVG(downloads), 2)
FROM
    fake_apps
GROUP BY price;

4. Multiple Tables 4.2 Primary Key 主鍵 A primary key serves as a unique identifier for each row or record in a given table. The primary key is literally an "id" value for a record. We could use this value to connect the table to other tables.

CREATE TABLE  artists
    (
    id INTEGER PRIMARY KET,
    name TEXT
    );

NB By specifying that the "id" column is the "PRIMARY KEY", SQL make sure that: 1. None of the values in this column are "NULL"; 2. Each value in this column is unique. A table can not have more than one "PRIMARY KEY" column. 4.3 Foreign Key 外鍵

SELECT 
    *
FROM
    albums
WHERE
    artist_id = 3;

A foreign key is a column that contains the primary key of another table in the database. We use foreign keys and primary keys to connect rows in two different tables. One table's foreign key holds the value of another table's primary key. Unlike primary keys, foreign keys do not need to be unique and can be NULL. Here, artist_id is a foreign key in the "albums" table. The relationship between the "artists" table and the "albums" table is the "id" value of the artists. 4.4 Cross Join 用於生成兩張表的笛卡爾集

SELECT 
    albums.name, albums.year, artists.name
FROM
    albums,
    artists;

One way to query multiple tables is to write a SELECT statement with multiple table names seperated by a comma. This is also known as a "cross join". When querying more than one table, column names need to be specified by table_name.column_name. Unfortunately, the result of this cross join is not very useful. It combines every row of the "artists" table with every row of the "albums" table. It would be more useful to only combine the rows where the album was created by the artist. 4.5 Inner Join 內連接：在表中存在至少一個匹配時，INNER JOIN 關鍵字返回行

SELECT 
    *
FROM
    albums
        JOIN
    artists ON albums.artist_id = artists.id; # INNER JOIN等價於JOIN，寫JOIN預設為INNER JOIN

In SQL, joins are used to combine rows from two or more tables. The most common type of join in SQL is an inner join. An inner join will combine rows from different tables if the join condition is true. 1. SELECT *: specifies the columns our result set will have. Here * refers to every column in both tables; 2. FROM albums: specifies first table we are querying; 3. JOIN artists ON: specifies the type of join as well as the second table; 4. albums.artist_id = artists.id: is the join condition that describes how the two tables are related to each other. Here, SQL uses the foreign key column "artist_id" in the "albums" table to match it with exactly one row in the "artists" table with the same value in the "id" column. It will only match one row in the "artists" table because "id" is the PRIMARY KEY of "artists". 4.6 Left Outer Join 左外連接：即使右表中沒有匹配，也從左表返回所有的行

SELECT 
    *
FROM
    albums
        LEFT JOIN
    artists ON albums.artist_id = artists.id;

Outer joins also combine rows from two or more tables, but unlike inner joins, they do not require the join condition to be met. Instead, every row in the left table is returned in the result set, and if the join condition is not met, the NULL values are used to fill in the columns from the right table. RIGHT JOIN 右外鏈接：即使左表中沒有匹配，也從右表返回所有的行 FULL JOIN 全鏈接：只要其中一個表中存在匹配，就返回行 4.7 Aliases 為列名稱和表名稱指定別名 AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name can be anything you want as long as you put it inside of single quotes.

SELECT 
    albums.name AS 'Album',
    albums.year,
    artists.name AS 'Artist'
FROM
    albums
        JOIN
    artists ON albums.artist_id = artists.id
WHERE
    albums.year > 1980;

NB The columns have not been renamed in either table. The aliases only appear in the result set. II. SQL: Table Transformation 1. Subqueries 子查詢 1.2 Non-Correlated Subqueries I 不相關子查詢

SELECT 
    *
FROM
    flights
WHERE
    origin IN (SELECT 
            code
        FROM
            airports
        WHERE
            elevation > 2000);

1.4 Non-Correlated Subqueries III

SELECT 
    a.dep_month,
    a.dep_day_of_week,
    AVG(a.flight_count) AS average_flights
FROM
    (SELECT 
        dep_month,
            dep_day_of_week,
            dep_date,
            COUNT(*) AS flight_count
    FROM
        flights
    GROUP BY 1 , 2 , 3) a
WHERE
    a.dep_day_of_week = 'Friday'
GROUP BY 1 , 2
ORDER BY 1 , 2; # 返回每個月中，每個星期五的平均航班數量

結構 [outer query] FROM [inner query] a WHERE GROUP BY ORDER BY NB "a": With the inner query, we create a virtual table. In the outer query, we can refer to the inner query as "a". "1,2,3" in inner query: refer to the first, second and third columns selected for display DBMS SELECT dep_month, (1) dep_day_of_week, (2) dep_date, (3) COUNT(*) AS flight_count (4) FROM flights

SELECT 
    a.dep_month,
    a.dep_day_of_week,
    AVG(a.flight_distance) AS average_distance
FROM
    (SELECT 
        dep_month,
    dep_day_of_week,
    dep_date,
    SUM(distance) AS flight_distance
    FROM
        flights
    GROUP BY 1 , 2 , 3) a
GROUP BY 1 , 2
ORDER BY 1 , 2; # 返回每個月中，每個周一、周二……至周日的平均飛行距離

1.5 Correlated Subqueries I 相關子查詢 NB In a correlated subquery, the subquery can not be run independently of the outer query. The order of operations is important in a correlated subquery: 1. A row is processed in the outer query; 2. Then, for that particular row in the outer query, the subquery is executed. This means that for each row processed by the outer query, the subquery will also be processed for that row.

SELECT 
    id
FROM
    flights AS f
WHERE
    distance > (SELECT 
            AVG(distance)
        FROM
            flights
        WHERE
            carrier = f.carrier); # the list of all flights whose distance is above average for their carrier

1.6 Correlated Subqueries II In the above query, the inner query has to be reexecuted for each flight. Correlated subqueries may appear elsewhere besides the WHERE clause, they can also appear in the SELECT.

SELECT 
    carrier,
    id,
    (SELECT 
            COUNT(*)
        FROM
            flights f
        WHERE
            f.id < flights.id
                AND f.carrier = flights.carrier) + 1 AS flight_sequence_number
FROM
    flights; # 結果集為航空公司，航班id以及序號。相同航空公司的航班，id越大則序號越大

相關子查詢中，對於外查詢執行的每一行，子查詢都會為這一行執行一次。在這段代碼中，每當外查詢提取一行數據中的carrier和id，子查詢就會COUNT表中有多少行的carrier與外查詢中的行的carrier相同，且id小於外查詢中的行，併在COUNT結果上+1，這一結果列別名為flight_sequence_number。於是，id越大的航班，序號就越大。如果將"<"改為">"，則id越大的航班，序號越小。 2. Set Operation 2.2 Union 並集 (only distinct values) Sometimes, we need to merge two tables together and then query the merged result. There are two ways of doing this: 1) Merge the rows, called a join. 2) Merge the columns, called a union.

SELECT 
    item_name
FROM
    legacy_products 
UNION SELECT 
    item_name
FROM
    new_products;

Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values. 2.3 Union All 並集 (allows duplicate values)

SELECT 
    AVG(sale_price) 
FROM
    (SELECT 
        id, sale_price
    FROM
        order_items UNION ALL SELECT 
        id, sale_price
    FROM
        order_items_historic) AS a;

2.4 Intersect 交集 Microsoft SQL Server's INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

SELECT category FROM new_products
INTERSECT
SELECT category FROM legacy_products;

NB MySQL不滋瓷INTERSECT，但可以用INNER JOIN+DISTINCT或WHERE...IN+DISTINCT或WHERE EXISTS實現：

SELECT DISTINCT
    category
FROM
    new_products
        INNER JOIN
    legacy_products USING (category);

或

SELECT DISTINCT
    category
FROM
    new_products
WHERE
    category IN (SELECT 
            category
        FROM
            legacy_products);

http://stackoverflow.com/questions/2621382/alternative-to-intersect-in-mysql 網上很多通過UNION ALL 實現的辦法（如下）是錯誤的，可能會返回僅在一個表中出現且COUNT(*) > 1的值：

SELECT 
    category, COUNT(*)
FROM
    (SELECT 
        category
    FROM
        new_products UNION ALL SELECT 
        category
    FROM
        legacy_products) a
GROUP BY category
HAVING COUNT(*) > 1;

2.5 Except (MS SQL Server) / Minus (Oracle) 差集

SELECT category FROM legacy_products
EXCEPT # 在Oracle中為MINUS
SELECT category FROM new_products;

NB MySQL不滋瓷差集，但可以用WHERE...IS NULL+DISTINCT或WHERE...NOT IN+DISTINCT或WHERE EXISTS實現：

SELECT DISTINCT
    category
FROM
    legacy_products
        LEFT JOIN
    new_products USING (category)
WHERE
    new_products.category IS NULL;

或

SELECT DISTINCT
    category
FROM
    legacy_products
WHERE
    category NOT IN (SELECT 
            category
        FROM
            new_products);

3. Conditional Aggregates 3.2 NULL use IS NULL or IS NOT NULL in the WHERE clause to test whether a value is or is not null.

SELECT 
    COUNT(*)
FROM
    flights
WHERE
    arr_time IS NOT NULL
        AND destination = 'ATL';

3.3 CASE WHEN "if, then, else"

SELECT
    CASE
        WHEN elevation < 250 THEN 'Low'
        WHEN elevation BETWEEN 250 AND 1749 THEN 'Medium'
        WHEN elevation >= 1750 THEN 'High'
        ELSE 'Unknown'
    END AS elevation_tier
    , COUNT(*)
FROM airports
GROUP BY 1;

END is required to terminate the statement, but ELSE is optional. If ELSE is not included, the result will be NULL. 3.4 COUNT(CASE WHEN) count the number of low elevation airports by state where low elevation is defined as less than 1000 ft.

SELECT 
    state,
    COUNT(CASE
        WHEN elevation < 1000 THEN 1
        ELSE NULL
    END) AS count_low_elevaton_airports
FROM
    airports
GROUP BY state;

3.5 SUM(CASE WHEN) sum the total flight distance and compare that to the sum of flight distance from a particular airline (in this case, Delta) by origin airport.

SELECT 
    origin,
    SUM(distance) AS total_flight_distance,
    SUM(CASE
        WHEN carrier = 'DL' THEN distance
        ELSE 0
    END) AS total_delta_flight_distance
FROM
    flights
GROUP BY origin;

3.6 Combining aggregates find out the percentage of flight distance that is from Delta by origin airport.