Codecademy中Learn SQL, SQL: Table Transformaton和SQL: Analyzing Business Metrics三門課程的筆記,以及補充的附加筆記。 Codecademy的課程以SQLite編寫,筆記中改成了MySQL語句。 I. Learn SQL 1. ...
Codecademy中Learn SQL, SQL: Table Transformaton和SQL: Analyzing Business Metrics三門課程的筆記,以及補充的附加筆記。 Codecademy的課程以SQLite編寫,筆記中改成了MySQL語句。 I. Learn SQL 1. Manipulation - Create, edit, delete data 1.4 Create 創建資料庫或資料庫中的表
CREATE TABLE celebs ( id INTEGER, name TEXT, age INTEGER ); # 第一列id,數據類型整數;第二列name,數據類型文本;第三列age,數據類型整數
1.5 Insert 向表中插入行
INSERT INTO celebs ( id, name, age) VALUES ( 1, 'Alan Mathison Turing', 42); # 在celebs表最下方插入數據:id列為1,name列為Alan Mathion Turing,age列為42
1.6 Select 選取數據
SELECT * FROM celebs; # 顯示celebs表所有數據
1.7 Update 修改數據
UPDATE celebs SET age = 22 WHERE id = 1; # 將celebs表中id=1的行的age改為22
1.8 Alert 更改表結構或數據類型
ALERT TABLE celebs ADD COLUMN twitter_handle TEXT; # 在celebs表增加twitter_handle列
ALERT TABLE 'test'.'data' CHANGE COLUMN 'Mobile' 'Mobile' BLOB NULL DEFAULT NULL; # 將表test.data的Mobile列的數據類型改為BLOB,該列數據預設為NULL
1.9 DELETE 刪除行
DELETE FROM celebs WHERE twitter_handle IS NULL; # 刪除表celebs中twitter_handle為NULL的行
2. Queries - Retrieve data 2.3 Select Distinct 返回唯一不同的值
SELECT DISTINCT genre FROM movies; # 查詢movies表中genre列的所有不重覆值
2.4 Where 規定選擇的標準
SELECT * FROM movies WHERE imdb_rating > 8; # 查詢movies表中imdb_rating大於8的行
= equals != not equals > greater than < less than >= greater than or equal to <= less than or equal to 2.5 Like I 在 WHERE 子句中搜索列中的指定模式
SELECT * FROM movies WHERE name LIKE ' Se_en';
2.6 Like II
SELECT * FROM movies WHERE name LIKE 'a%';
SELECT * FROM movies WHERE name LIKE '%man%';
NB 通配符 '_' substitutes any individual character '%' matches zero or more missing characters '[charlist]%' any individual character in string: WHERE city LIKE '[ALN]%' 以“A"或”L“或”N“開頭的城市 '[!charlist]%' any individual character not in string: WHERE city LIKE '[!ALN]%' 不以“A"或”L“或”N“開頭的城市 2.7 Between 在 WHERE 子句中使用,選取介於兩個值之間的數據範圍 The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates.
SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J'; # 查詢movies中name以A至J開頭的所有行
NB: names that begin with letter "A" up to but not including "J". 不同的資料庫對 BETWEEN...AND 操作符的處理方式是有差異的,有開區間、閉區間,也有半開半閉區間。
SELECT * FROM movies WHERE year BETWEEN 1990 AND 2000; # 查詢movies中year在1990至2000年間的行
NB: years between 1990 up to and including 2000 2.8 And 且運算符 AND is an operator that combines two conditions. Both conditions must be true for the row to be included in the result set.
SELECT * FROM movies WHERE year BETWEEN 1990 AND 2000 AND genre = 'comedy'; # 查詢movies中year在1990至2000間,且genre為comedy的行
2.9 Or 或運算符 OR is used to combine more than one condition in WHERE clause. It evaluates each condition separately and if any of the conditions are true than the row is added to the result set. OR is an operator that filters the result set to only include rows where either condition is true.
SELECT * FROM movies WHERE genre = 'comedy' OR year < 1980; # 查詢movies中genre為comedy,或year小於1980的行
2.10 Order By 對結果集進行排序
SELECT * FROM movies ORDER BY imdb_rating DESC; # 查詢movies中的行,結果以imdb_rating降序排列
DESC sorts the result by a particular column in descending order (high to low or Z - A). ASC ascending order (low to high or A - Z). 2.11 Limit 規定返回的記錄的數目 LIMIT is a clause that lets you specify the maximum number of rows the result set will have.
SELECT * FROM movies ORDER BY imdb_rating ASC LIMIT 3; # 查詢movies中的行,結果以imdb_rating升序排列,僅返回前3行
MS SQL Server中使用SELECT TOP 3,Oracle中使用WHERE ROWNUM <= 5(?) 3. Aggregate Function 3.2 Count 返回匹配指定條件的行數 COUNT( ) is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL.
SELECT COUNT(*) FROM fake_apps WHERE price = 0; # 返回fake_apps中price=0的行數
3.3 Group By 合計函數
SELECT price, COUNT(*) FROM fake_apps WHERE downloads > 2000 GROUP BY price; # 查詢fake_apps表中downloads大於2000的行,將結果集根據price分組,返回price和行數
Here, our aggregate function is COUNT( ) and we are passing price as an argument(參數) to GROUP BY. SQL will count the total number of apps for each price in the table. It is usually helpful to SELECT the column you pass as an argument to GROUP BY. Here we SELECT price and COUNT(*). 3.4 Sum 返回數值列的總數(總額) SUM is a function that takes the name of a column as an argument and returns the sum of all the values in that column.
SELECT category, SUM(downloads) FROM fake_apps GROUP BY category;3.5 Max 返回一列中的最大值(NULL 值不包括在計算中) MAX( ) is a function that takes the name of a column as an argument and returns the largest value in that column.
SELECT name, category, MAX(downloads) FROM fake_apps GROUP BY category;3.6 Min 返回一列中的最小值(NULL 值不包括在計算中) MIN( ) is a function that takes the name of a column as an argument and returns the smallest value in that column.
SELECT name, category, MIN(downloads) FROM fake_apps GROUP BY category;3.7 Average 返回數值列的平均值(NULL 值不包括在計算中)
SELECT price, AVG(downloads) FROM fake_apps GROUP BY price;
3.8 Round 把數值欄位舍入為指定的小數位數 ROUND( ) is a function that takes a column name and an integer as an argument. It rounds the values in the column to the number of decimal places specified by the integer.
SELECT price, ROUND(AVG(downloads), 2) FROM fake_apps GROUP BY price;4. Multiple Tables 4.2 Primary Key 主鍵 A primary key serves as a unique identifier for each row or record in a given table. The primary key is literally an "id" value for a record. We could use this value to connect the table to other tables.
CREATE TABLE artists ( id INTEGER PRIMARY KET, name TEXT );NB By specifying that the "id" column is the "PRIMARY KEY", SQL make sure that: 1. None of the values in this column are "NULL"; 2. Each value in this column is unique. A table can not have more than one "PRIMARY KEY" column. 4.3 Foreign Key 外鍵
SELECT * FROM albums WHERE artist_id = 3;
A foreign key is a column that contains the primary key of another table in the database. We use foreign keys and primary keys to connect rows in two different tables. One table's foreign key holds the value of another table's primary key. Unlike primary keys, foreign keys do not need to be unique and can be NULL. Here, artist_id is a foreign key in the "albums" table. The relationship between the "artists" table and the "albums" table is the "id" value of the artists. 4.4 Cross Join 用於生成兩張表的笛卡爾集
SELECT albums.name, albums.year, artists.name FROM albums, artists;
One way to query multiple tables is to write a SELECT statement with multiple table names seperated by a comma. This is also known as a "cross join". When querying more than one table, column names need to be specified by table_name.column_name. Unfortunately, the result of this cross join is not very useful. It combines every row of the "artists" table with every row of the "albums" table. It would be more useful to only combine the rows where the album was created by the artist. 4.5 Inner Join 內連接:在表中存在至少一個匹配時,INNER JOIN 關鍵字返回行
SELECT * FROM albums JOIN artists ON albums.artist_id = artists.id; # INNER JOIN等價於JOIN,寫JOIN預設為INNER JOIN
In SQL, joins are used to combine rows from two or more tables. The most common type of join in SQL is an inner join. An inner join will combine rows from different tables if the join condition is true. 1. SELECT *: specifies the columns our result set will have. Here * refers to every column in both tables; 2. FROM albums: specifies first table we are querying; 3. JOIN artists ON: specifies the type of join as well as the second table; 4. albums.artist_id = artists.id: is the join condition that describes how the two tables are related to each other. Here, SQL uses the foreign key column "artist_id" in the "albums" table to match it with exactly one row in the "artists" table with the same value in the "id" column. It will only match one row in the "artists" table because "id" is the PRIMARY KEY of "artists". 4.6 Left Outer Join 左外連接:即使右表中沒有匹配,也從左表返回所有的行
SELECT * FROM albums LEFT JOIN artists ON albums.artist_id = artists.id;
Outer joins also combine rows from two or more tables, but unlike inner joins, they do not require the join condition to be met. Instead, every row in the left table is returned in the result set, and if the join condition is not met, the NULL values are used to fill in the columns from the right table. RIGHT JOIN 右外鏈接:即使左表中沒有匹配,也從右表返回所有的行 FULL JOIN 全鏈接:只要其中一個表中存在匹配,就返回行 4.7 Aliases 為列名稱和表名稱指定別名 AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name can be anything you want as long as you put it inside of single quotes.
SELECT albums.name AS 'Album', albums.year, artists.name AS 'Artist' FROM albums JOIN artists ON albums.artist_id = artists.id WHERE albums.year > 1980;NB The columns have not been renamed in either table. The aliases only appear in the result set. II. SQL: Table Transformation 1. Subqueries 子查詢 1.2 Non-Correlated Subqueries I 不相關子查詢
SELECT * FROM flights WHERE origin IN (SELECT code FROM airports WHERE elevation > 2000);
1.4 Non-Correlated Subqueries III
SELECT a.dep_month, a.dep_day_of_week, AVG(a.flight_count) AS average_flights FROM (SELECT dep_month, dep_day_of_week, dep_date, COUNT(*) AS flight_count FROM flights GROUP BY 1 , 2 , 3) a WHERE a.dep_day_of_week = 'Friday' GROUP BY 1 , 2 ORDER BY 1 , 2; # 返回每個月中,每個星期五的平均航班數量
結構 [outer query] FROM [inner query] a WHERE GROUP BY ORDER BY NB "a": With the inner query, we create a virtual table. In the outer query, we can refer to the inner query as "a". "1,2,3" in inner query: refer to the first, second and third columns selected for display DBMS SELECT dep_month, (1) dep_day_of_week, (2) dep_date, (3) COUNT(*) AS flight_count (4) FROM flights
SELECT a.dep_month, a.dep_day_of_week, AVG(a.flight_distance) AS average_distance FROM (SELECT dep_month, dep_day_of_week, dep_date, SUM(distance) AS flight_distance FROM flights GROUP BY 1 , 2 , 3) a GROUP BY 1 , 2 ORDER BY 1 , 2; # 返回每個月中,每個周一、周二……至周日的平均飛行距離
1.5 Correlated Subqueries I 相關子查詢 NB In a correlated subquery, the subquery can not be run independently of the outer query. The order of operations is important in a correlated subquery: 1. A row is processed in the outer query; 2. Then, for that particular row in the outer query, the subquery is executed. This means that for each row processed by the outer query, the subquery will also be processed for that row.
SELECT id FROM flights AS f WHERE distance > (SELECT AVG(distance) FROM flights WHERE carrier = f.carrier); # the list of all flights whose distance is above average for their carrier
1.6 Correlated Subqueries II In the above query, the inner query has to be reexecuted for each flight. Correlated subqueries may appear elsewhere besides the WHERE clause, they can also appear in the SELECT.
SELECT carrier, id, (SELECT COUNT(*) FROM flights f WHERE f.id < flights.id AND f.carrier = flights.carrier) + 1 AS flight_sequence_number FROM flights; # 結果集為航空公司,航班id以及序號。相同航空公司的航班,id越大則序號越大
相關子查詢中,對於外查詢執行的每一行,子查詢都會為這一行執行一次。在這段代碼中,每當外查詢提取一行數據中的carrier和id,子查詢就會COUNT表中有多少行的carrier與外查詢中的行的carrier相同,且id小於外查詢中的行,併在COUNT結果上+1,這一結果列別名為flight_sequence_number。於是,id越大的航班,序號就越大。 如果將"<"改為">",則id越大的航班,序號越小。 2. Set Operation 2.2 Union 並集 (only distinct values) Sometimes, we need to merge two tables together and then query the merged result. There are two ways of doing this: 1) Merge the rows, called a join. 2) Merge the columns, called a union.
SELECT item_name FROM legacy_products UNION SELECT item_name FROM new_products;
Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values. 2.3 Union All 並集 (allows duplicate values)
SELECT AVG(sale_price) FROM (SELECT id, sale_price FROM order_items UNION ALL SELECT id, sale_price FROM order_items_historic) AS a;
2.4 Intersect 交集 Microsoft SQL Server's INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.
SELECT category FROM new_products INTERSECT SELECT category FROM legacy_products;
NB MySQL不滋瓷INTERSECT,但可以用INNER JOIN+DISTINCT或WHERE...IN+DISTINCT或WHERE EXISTS實現:
SELECT DISTINCT category FROM new_products INNER JOIN legacy_products USING (category);
或
SELECT DISTINCT category FROM new_products WHERE category IN (SELECT category FROM legacy_products);
http://stackoverflow.com/questions/2621382/alternative-to-intersect-in-mysql 網上很多通過UNION ALL 實現的辦法(如下)是錯誤的,可能會返回僅在一個表中出現且COUNT(*) > 1的值:
SELECT category, COUNT(*) FROM (SELECT category FROM new_products UNION ALL SELECT category FROM legacy_products) a GROUP BY category HAVING COUNT(*) > 1;
2.5 Except (MS SQL Server) / Minus (Oracle) 差集
SELECT category FROM legacy_products EXCEPT # 在Oracle中為MINUS SELECT category FROM new_products;
NB MySQL不滋瓷差集,但可以用WHERE...IS NULL+DISTINCT或WHERE...NOT IN+DISTINCT或WHERE EXISTS實現:
SELECT DISTINCT category FROM legacy_products LEFT JOIN new_products USING (category) WHERE new_products.category IS NULL;
或
SELECT DISTINCT category FROM legacy_products WHERE category NOT IN (SELECT category FROM new_products);
3. Conditional Aggregates 3.2 NULL use IS NULL or IS NOT NULL in the WHERE clause to test whether a value is or is not null.
SELECT COUNT(*) FROM flights WHERE arr_time IS NOT NULL AND destination = 'ATL';
3.3 CASE WHEN "if, then, else"
SELECT CASE WHEN elevation < 250 THEN 'Low' WHEN elevation BETWEEN 250 AND 1749 THEN 'Medium' WHEN elevation >= 1750 THEN 'High' ELSE 'Unknown' END AS elevation_tier , COUNT(*) FROM airports GROUP BY 1;
END is required to terminate the statement, but ELSE is optional. If ELSE is not included, the result will be NULL. 3.4 COUNT(CASE WHEN) count the number of low elevation airports by state where low elevation is defined as less than 1000 ft.
SELECT state, COUNT(CASE WHEN elevation < 1000 THEN 1 ELSE NULL END) AS count_low_elevaton_airports FROM airports GROUP BY state;
3.5 SUM(CASE WHEN) sum the total flight distance and compare that to the sum of flight distance from a particular airline (in this case, Delta) by origin airport.
SELECT origin, SUM(distance) AS total_flight_distance, SUM(CASE WHEN carrier = 'DL' THEN distance ELSE 0 END) AS total_delta_flight_distance FROM flights GROUP BY origin;
3.6 Combining aggregates find out the percentage of flight distance that is from Delta by origin airport.