Formatting partitions larger than 16 TB

Source: http://www.cnblogs.com/similarface/archive/2016/03/02/5235555.html

How do you format a disk or partition whose capacity exceeds 16 TB? The stock mkfs cannot do it.


File systems do have limits. That's no surprise. ext3 had a limit of 16 TB file system size. If you needed more space you had to use another file system, for instance XFS or JFS, or split the capacity across multiple mount points.

ext4 was designed to allow far larger file systems than ext3. According to Wikipedia, ext4 has a maximum file system size of 1 EiB (one exbibyte, i.e. 1024 PiB or 1,048,576 TiB).

Now if you try to create a single large file system with ext4 on any Linux distribution out there (including OEL 6.1; as of 18th August 2011) you will end up with:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

This post is about how to solve the issue.

 

The demo system

My demo system consists of one large LUN of 18 TB encapsulated in LVM with a logical volume of 17 TB on Oracle Enterprise Linux (OEL 5.5):

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@localhost ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 19791.2 GB, 19791209299968 bytes
255 heads, 63 sectors/track, 2406144 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdb doesn't contain a valid partition table 

[root@localhost ~]# vgdisplay iscsi
--- Volume group ---
VG Name               iscsi
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  2
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                1
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               18.00 TB
PE Size               4.00 MB
Total PE              4718591
Alloc PE / Size       4456448 / 17.00 TB
Free  PE / Size       262143 / 1024.00 GB
VG UUID               tdi4f2-3ZYr-c1P0-NuSl-i3w2-5qQl-K75guj
[root@localhost ~]# lvdisplay iscsi
--- Logical volume ---
LV Name                /dev/iscsi/test
VG Name                iscsi
LV UUID                8q1UrT-ludC-FEkT-NExO-4Gzd-cn5H-FYJcB1
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                17.00 TB
Current LE             4456448
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     256
Block device           253:2
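
The extent counts in the vgdisplay and lvdisplay output can be cross-checked against the reported sizes. The following is a small sketch that assumes LVM's "MB" and "TB" are binary units (MiB and TiB); the helper name extents_to_tib is made up for this illustration.

```python
# Cross-check the LVM figures: size = number of extents * extent size.
# Assumption: LVM's "MB"/"TB" here mean MiB/TiB (binary units).
PE_SIZE = 4 * 1024**2   # "PE Size 4.00 MB"
TIB = 1024**4


def extents_to_tib(extents: int, pe_size: int = PE_SIZE) -> float:
    """Convert a number of LVM extents to TiB (hypothetical helper)."""
    return extents * pe_size / TIB


total_pe = 4718591      # "Total PE"
alloc_pe = 4456448      # "Alloc PE" and "Current LE"
free_pe = 262143        # "Free PE"

print(f"VG size:   {extents_to_tib(total_pe):.2f} TiB")
print(f"LV size:   {extents_to_tib(alloc_pe):.2f} TiB")
print(f"Free:      {extents_to_tib(free_pe) * 1024:.2f} GiB")
```

The allocated and free extents add up to the total, and the logical volume comes out at exactly 17 TiB, which is just past the 16 TiB boundary discussed in this post.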

Creating file systems larger than 16 TB with ext4:

If you try to create an ext4 file system on the 17 TB logical volume:

[root@localhost ~]# mkfs.ext4 /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

OK. Maybe with ext4dev:

[root@localhost ~]# mkfs.ext4dev /dev/iscsi/test
mke4fs 1.41.9 (22-Aug-2009)
mkfs.ext4dev: Size of device /dev/iscsi/test too big to be expressed in 32 bits using a blocksize of 4096.

Nope, no success. The reason is that e2fsprogs (packaged as e4fsprogs on OEL) cannot handle file systems larger than about 16 TB.
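
The "32 bits" in the error message explains where the 16 TB boundary comes from: the number of blocks must fit into a 32-bit integer. Here is a minimal sketch of that arithmetic. The function names are made up for this illustration, and the 48-bit figure for ext4 with the 64bit feature is background knowledge rather than something stated in this post.

```python
TIB = 1024**4


def max_fs_bytes(block_size: int, block_number_bits: int) -> int:
    """Upper bound on file system size when block numbers have the given width."""
    return (2 ** block_number_bits) * block_size


def fits(device_bytes: int, block_size: int = 4096, bits: int = 32) -> bool:
    """True if the block count of the device fits into `bits` bits."""
    return device_bytes // block_size <= 2 ** bits - 1


# Limit with 32-bit block numbers for common block sizes.
for bs in (1024, 2048, 4096):
    print(f"block size {bs}: {max_fs_bytes(bs, 32) // TIB} TiB")

# Assumption: with the 64bit feature, ext4 extents address 48-bit block numbers.
print(f"48-bit, 4096-byte blocks: {max_fs_bytes(4096, 48) // TIB} TiB")

# The 17 TiB logical volume from the demo system does not fit.
print(fits(17 * TIB))
```

With 4096-byte blocks the 32-bit bound is 16 TiB, so a device of exactly 16 TiB or more is rejected by the older tools.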

To be specific: Even with the most recent e2fsprogs 1.41.14 there is no way to create file systems larger than 16 TB.

But: According to this post it should work since June:

It’s taken way too long, but I’ve finally finished integrating the 64-bit patches into e2fsprogs’s mainline repository. All of the necessary patches should now be in the master branch for e2fsprogs. The big change from before is that I replaced Val’s changes for fixing up how mke2fs picked the correct fs-type profile from mke2fs.conf with something that I think works much better and leaves the code much cleaner. With this change you need to add the following to your /etc/mke2fs.conf file if you want to enable the 64-bit feature flag automatically for a big disk:

[fs_types]
    ext4 = {
        features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
        auto_64-bit_support = 1 # <-- add this line
        inode_size = 256
    }

Alternatively you can change the features line to include the feature “64bit”; this will force the use of the 64-bit fields, and double the size of the block group descriptors, even for smaller file systems that don’t require the 64-bit support. (This was one of my problems with Val’s implementation; it forced the mke2fs.conf file to always enable the 64-bit feature flag, which would cause backwards compatibility issues.) This might be a good thing to do for debugging purposes, though, so this is an option which I left open, but the better way of doing things is to use the auto_64-bit-support flag.
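
The trade-off described in the quote can be put into numbers. The sketch below is an illustration under assumptions that are not in the post: group descriptors take 32 bytes without the 64bit feature and 64 bytes with it, and a block group holds 8 * block_size blocks. The function names are hypothetical.

```python
TIB = 1024**4


def needs_64bit(device_bytes: int, block_size: int = 4096) -> bool:
    """Roughly what an automatic 64-bit switch has to decide:
    does the block count exceed what 32 bits can express?"""
    return device_bytes // block_size >= 2 ** 32


def descriptor_table_bytes(device_bytes: int, use_64bit: bool,
                           block_size: int = 4096) -> int:
    """Approximate size of one copy of the block group descriptor table."""
    blocks = device_bytes // block_size
    blocks_per_group = 8 * block_size          # one bitmap block covers a group
    groups = -(-blocks // blocks_per_group)    # ceiling division
    desc_size = 64 if use_64bit else 32        # assumption: 32 vs 64 bytes
    return groups * desc_size


for size_tib in (1, 17):
    size = size_tib * TIB
    print(size_tib, "TiB:",
          "needs 64bit" if needs_64bit(size) else "32 bits suffice",
          "| table with 64bit:", descriptor_table_bytes(size, True), "bytes",
          "| without:", descriptor_table_bytes(size, False), "bytes")
```

For a 1 TiB file system the 64bit flag is not required, and forcing it only doubles the descriptor table, which is why enabling it automatically for large devices is the gentler option.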

So the change must be there. A short look at the ‘WIP’ (work-in-progress) branch of e2fsprogs confirmed the integration.

So I tried to build the most recent e2fsprogs (Remember: these are *development* tools, use them at your OWN RISK):

[root@vm-mkmoel ~]# git clone git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
[root@vm-mkmoel ~]# cd e2fsprogs
[root@vm-mkmoel e2fsprogs]# mkdir build ; cd build/
[root@vm-mkmoel build]# ../configure
[root@vm-mkmoel build]# make
[root@vm-mkmoel build]# make install

So let's try to create a file system:

[root@vm-mkmoel misc]# ./mke2fs -O 64bit,has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize \
    -i 4194304 /dev/iscsi/test

mke2fs 1.42-WIP (02-Jul-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
4456448 inodes, 4563402752 blocks
228170137 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=6710886400
139264 block groups
32768 blocks per group, 32768 fragments per group
32 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 0 mounts or 0 days,
whichever comes first.  Use tune2fs -c or -i to override.
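
The figures in this output follow from the device size and block size. The sketch below reproduces them, assuming the usual sparse_super placement of backup superblocks (group 1 plus powers of 3, 5 and 7) and a first data block of 0 as shown above. It is a simplified model, not mke2fs's actual code, and the function name is invented.

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
DEVICE_BYTES = 17 * 1024**4          # the 17 TB logical volume

blocks = DEVICE_BYTES // BLOCK_SIZE
groups = -(-blocks // BLOCKS_PER_GROUP)      # ceiling division
reserved = blocks * 5 // 100                 # 5% reserved for the super user


def backup_superblock_blocks(group_count: int) -> list:
    """Block numbers of backup superblocks under the sparse_super scheme
    (assumed: group 1 and every group that is a power of 3, 5 or 7)."""
    chosen = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            chosen.add(g)
            g *= base
    return [g * BLOCKS_PER_GROUP for g in sorted(chosen)]


backups = backup_superblock_blocks(groups)
print(blocks, "blocks in", groups, "block groups")
print(reserved, "blocks reserved")
print(len(backups), "backup superblocks, last at block", backups[-1])
```

The result matches the list printed by mke2fs, including the last backup at block 3855122432, a block number that would still fit into 32 bits even though the total block count does not.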

OK. Seems to have worked. Let's check it:

[root@vm-mkmoel misc]# mount /dev/iscsi/test /mnt
[root@vm-mkmoel misc]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00      18G  2.6G   14G  16% /
/dev/sda1                            99M   13M   82M  14% /boot
tmpfs                               502M     0  502M   0% /dev/shm
/dev/mapper/iscsi-test               17T  229M   17T   1% /mnt
[root@vm-mkmoel misc]# mount | grep mnt
/dev/mapper/iscsi-test on /mnt type ext4 (rw)

As you can see: With the most recent development e2fsprogs it is possible to create ext4 file systems larger than 16 TB.

I even tried it with a 50 TB file system (because that's what I needed in my use case):

[root@vm-mkmoel misc]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/iscsi-test               50T  237M   48T   1% /mnt

Update:

Today I tested some more user space tools.

fsck

Maybe the most important tool in case the journaling fails. I copied some data to the file system (roughly about 2 TB) and had 73% of my 6.5 million inodes (one inode per 8 MB) allocated. Running fsck on my demo system with 1 GB memory yields:

[root@vm-mkmoel ~]# fsck.ext4 -f /dev/iscsi/test
e2fsck 1.42-WIP (02-Jul-2011)
Pass 1: Checking inodes, blocks, and sizes
Error allocating block bitmap (4): Memory allocation failed

fsck is rather greedy with memory. Increasing the memory to 8 GB did it. While running fsck I noticed a memory consumption of up to 3.4 GB! So large file systems require a lot of memory for fsck, and even more memory with more inodes!
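
A rough estimate shows why 1 GB of memory was not enough. The sketch below assumes that a block bitmap is held in memory as one flat bit per block; that is an assumption for illustration only, and the real memory use of e2fsck depends on how many bitmaps and other structures it keeps. The block count is the one tune2fs reports for the 50 TB file system further down.

```python
GIB = 1024**3


def flat_bitmap_bytes(block_count: int) -> int:
    """Bytes needed for one flat bitmap with a single bit per block."""
    return -(-block_count // 8)   # ceiling division


block_count = 13414400000         # "Block count" of the 50 TB file system
one_bitmap = flat_bitmap_bytes(block_count)

print(f"one block bitmap: {one_bitmap / GIB:.2f} GiB")
print(f"two such bitmaps: {2 * one_bitmap / GIB:.2f} GiB")
```

Under this assumption a single bitmap already exceeds 1 GiB, and two of them come close to the 3.4 GB observed during the successful run.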

resize2fs

After running fsck on my file system I tried to resize it:

[root@localhost sbin]# lvresize -l +7199 /dev/iscsi/test
  Extending logical volume test to 50.00 TB
  Logical volume test successfully resized
[root@localhost sbin]# resize2fs /dev/iscsi/test
resize2fs 1.42-WIP (02-Jul-2011)
resize2fs: New size too large to be expressed in 32 bits

As you can see, resizing the file system is not yet supported/implemented. So it would be wise to create the file system with its final size from the start, since growing is NOT possible!

tune2fs

tune2fs seems to work; at least it dumps the superblock contents:

[root@localhost sbin]# tune2fs -l /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   <none>
Last mounted on:          /mnt/mnt
Filesystem UUID:          a754e947-8b89-415d-909d-000e6c95c44a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6550000
Block count:              13414400000
Reserved block count:     670720000
Free blocks:              13394134177
Free inodes:              1484526
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16
Inode blocks per group:   1
Flex block group size:    16
Filesystem created:       Wed Oct 19 17:09:06 2011
Last mount time:          Wed Oct 19 18:45:47 2011
Last write time:          Wed Oct 19 18:45:47 2011
Mount count:              1
Maximum mount count:      20
Last checked:             Wed Oct 19 18:35:36 2011
Check interval:           0 (<none>)
Lifetime writes:          2511 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ea117174-a04a-412e-a067-7972804f83d7
Journal backup:           inode blocks

Setting properties works as well:

[root@localhost sbin]# tune2fs -L test /dev/iscsi/test
tune2fs 1.42-WIP (02-Jul-2011)
[root@localhost sbin]# tune2fs -l /dev/iscsi/test | head -10
tune2fs 1.42-WIP (02-Jul-2011)
Filesystem volume name:   test
Last mounted on:          /mnt/mnt
[...]

e4defrag

e4defrag is a new tool to defragment the ext4 file system. According to the man page:

e4defrag  reduces  fragmentation of extent based file. The file targeted by e4defrag is created on ext4 filesystem made with “-O extent” option (see  mke2fs(8)).   The  targeted  file gets more contiguous blocks and improves the file access speed.

I am not yet sure how this affects file systems used for Oracle datafiles. All I can say is that e4defrag seems to work with >16 TB file systems:

 

[root@localhost sbin]# e4defrag /mnt/
ext4 defragmentation for directory(/mnt/)
[....]
        Success:                        [ 4772040/5065465 ]
        Failure:                        [ 293425/5065465 ]

The failures are from directories which cannot be defragmented.

 

Conclusion

With the most recent e2fsprogs (1.42-WIP) it is possible to create ext4 file systems larger than 16 TB.

If you do so, remember the following:

  • the tools are still in development, so use them at your own risk!
  • tune the values for autocheck (after x mounts / after y days)
  • adjust the “-i” switch, which defines the bytes-per-inode ratio; the 17 TB example above uses one inode per 4 MB, the 50 TB file system one inode per 8 MB
  • the more inodes you create, the longer fsck takes and the more memory it needs
  • resizing the file system (growing / shrinking) is NOT possible at the moment
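
To get a feeling for the “-i” ratio before formatting, the inode count can be estimated up front. This is an approximation that simply divides the file system size by the ratio; mke2fs itself rounds per block group. The 16384 ratio in the last line is a hypothetical comparison value, not one used in this post.

```python
TIB = 1024**4
INODE_SIZE = 256   # "Inode size" reported by tune2fs


def estimated_inodes(fs_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count for a given bytes-per-inode ratio (-i)."""
    return fs_bytes // bytes_per_inode


fs_17tb = 17 * TIB
fs_50tb = 13414400000 * 4096   # block count * block size from tune2fs

print(estimated_inodes(fs_17tb, 4 * 1024**2))   # the 17 TB example, -i 4194304
print(estimated_inodes(fs_50tb, 8 * 1024**2))   # the 50 TB file system

# Hypothetical comparison: a much denser ratio on the 50 TB file system.
dense = estimated_inodes(fs_50tb, 16384)
print(dense, "inodes,", dense * INODE_SIZE // 1024**3, "GiB of inode tables")
```

The first two values agree with the inode counts reported by mke2fs and tune2fs above, while the denser ratio would create billions of inodes and a correspondingly larger workload for fsck.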

http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/

 

