Just got done working on a friend’s Nexenta platform, and boy howdy was it tired. NFS was slow, iSCSI was slow, and for all the slowness, we couldn’t see a problem with the system. The GUI was reporting free RAM, and iostat showed the disks were not being thrashed. We didn’t see anything really out of the ordinary at first glance.
After some digging, we figured out that we were running out of RAM for the Dedupe tables. It goes a little something like this.
Nexenta by default allocates 1/4 of your ARC (RAM) to metadata caching. Your L2ARC map is considered metadata. When you turn on dedupe, all of that dedupe information is stored in metadata too. The more you dedupe, the more RAM you use; the more L2ARC you use, the more RAM you use.
The system in question is a 48GB system, and it reported that it had free memory (6GB or so), so we were baffled. If it’s got free RAM, what’s the holdup? It seems that between the dedupe tables and the L2ARC, we had outstripped the capacity of the ARC to hold all of the metadata. This caused _everything_ to be slow.
The solution? You can increase the percentage of RAM that can be used for metadata, increase the total RAM (thereby increasing the amount you can use for metadata caching), or turn off dedupe, copy everything off of the volume, then copy it back. Since there’s currently no way to “undedupe” a volume, once the dedupe data has been created, you’re stuck with it until you remove the files.
So, without further ado, here’s how to figure out what’s going on in your system.
echo ::arc|mdb -k
This will display some interesting stats. The most important in this situation is the last three lines :
arc_meta_used  = 11476 MB
arc_meta_limit = 12014 MB
arc_meta_max   = 12351 MB
These numbers will change. Things will get evicted, things will come back. You don’t want to see the meta_used and meta_limit numbers this close. You definitely don’t want to see the meta_max exceed the limit. That is a great indicator that you’re out of RAM for metadata.
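If you want to check those three numbers without eyeballing them every time, here is a rough Python sketch. It assumes you have saved the text from the mdb command above into a string. The function names and the 90% warning threshold are my own inventions for illustration, not anything Nexenta ships.

```python
import re

def parse_arc_meta(text):
    """Pull the arc_meta_* values (in MB) out of `::arc` output text."""
    values = {}
    for name, mb in re.findall(r"(arc_meta_\w+)\s*=\s*(\d+)\s*MB", text):
        values[name] = int(mb)
    return values

def check_arc_meta(values, warn_fraction=0.9):
    """Return warning strings. warn_fraction is an arbitrary, assumed threshold."""
    used = values["arc_meta_used"]
    limit = values["arc_meta_limit"]
    peak = values["arc_meta_max"]
    warnings = []
    if used >= warn_fraction * limit:
        warnings.append(f"meta_used is {used / limit:.0%} of meta_limit")
    if peak > limit:
        warnings.append(f"meta_max exceeded meta_limit by {peak - limit} MB")
    return warnings

# Sample values taken from the output shown in this post.
sample = """arc_meta_used = 11476 MB
arc_meta_limit = 12014 MB
arc_meta_max = 12351 MB"""

for warning in check_arc_meta(parse_arc_meta(sample)):
    print(warning)
```

With the numbers from this system, both checks fire: meta_used is sitting right at the limit, and meta_max has gone past it.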
After quite a bit of futzing around, disabling dedupe, and shuffling data off of, then back onto, the pool, things look better :
arc_meta_used  =  7442 MB
arc_meta_limit = 12014 MB
arc_meta_max   = 12351 MB
Just by disabling dedupe and blowing away the dedupe tables, we freed up about 4GB of metadata RAM (arc_meta_used dropped from 11476MB to 7442MB). Who knows how much was being swapped in and out of RAM.
Other things to check :
zpool status -D <volumename>
This gives you your standard volume status, but it also prints out the dedupe information. This is good to figure out how much dedupe data there is. Here’s an example :
DDT entries 7102900, size 997 on disk, 531 in core
bucket             allocated                        referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    6.41M    820G    818G    817G    6.41M    820G    818G    817G
     2     298K   37.3G   37.3G   37.3G     656K   82.0G   82.0G   81.9G
     4    30.5K   3.82G   3.82G   3.81G     140K   17.5G   17.5G   17.5G
     8    43.9K   5.49G   5.49G   5.49G     566K   70.7G   70.7G   70.6G
    16      968    121M    121M    121M    19.1K   2.38G   2.38G   2.38G
    32      765   95.6M   95.6M   95.5M    33.4K   4.17G   4.17G   4.17G
    64       33   4.12M   4.12M   4.12M    2.77K    354M    354M    354M
   128        5    640K    640K    639K      943    118M    118M    118M
   256        2    256K    256K    256K      676   84.5M   84.5M   84.4M
    1K        1    128K    128K    128K    1.29K    164M    164M    164M
    4K        1    128K    128K    128K    5.85K    749M    749M    749M
   32K        1    128K    128K    128K    37.0K   4.63G   4.63G   4.62G
 Total    6.77M    867G    865G    864G    7.84M   1003G   1001G   1000G
This tells us that there are 7 million entries, with each entry taking up 997 bytes on disk, and 531 bytes in memory. Simple math tells us how much space that takes up.
7102900 * 531 = 3771639900 bytes, and 3771639900 / 1024 / 1024 = about 3597MB used in RAM
The same math tells us that there’s 6753MB used on disk, just to hold the dedupe tables.
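The same arithmetic is easy to script. Below is a small Python sketch that takes the "DDT entries" summary line from zpool status -D and works out both footprints. The function name is made up for this example, and the regular expression assumes the line looks exactly like the one shown above.

```python
import re

def ddt_footprint(summary_line):
    """Estimate DDT size in MB from the 'DDT entries ...' summary line."""
    match = re.search(
        r"DDT entries (\d+), size (\d+) on disk, (\d+) in core", summary_line
    )
    if match is None:
        raise ValueError("not a DDT summary line")
    entries, bytes_on_disk, bytes_in_core = map(int, match.groups())
    mb = 1024 * 1024
    return {
        "entries": entries,
        "disk_mb": entries * bytes_on_disk / mb,  # entries * per-entry size on disk
        "core_mb": entries * bytes_in_core / mb,  # entries * per-entry size in RAM
    }

fp = ddt_footprint("DDT entries 7102900, size 997 on disk, 531 in core")
print(f"{fp['core_mb']:.1f} MB in RAM, {fp['disk_mb']:.1f} MB on disk")
```

Compare the in-RAM figure against your arc_meta_limit to get a feel for how much of the metadata budget the dedupe tables alone would eat if they were fully cached.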
The dedupe ratio on this system wasn’t even worth it. Overall dedupe ratio was something like 1.15x. Compression on that volume (which has nearly no overhead), after shuffling the data around, is at 1.42x. So at the cost of CPU time (which there is plenty of), we get a better over-subscription ratio from compression than from deduplication.
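You can sanity-check the dedupe ratio straight from the Total row of the table above. As far as I understand it, the ratio is the referenced DSIZE divided by the allocated DSIZE, so treat the Python sketch below as an approximation under that assumption. The helper names are hypothetical, and the column positions assume the nine-column layout shown in this post.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a zpool-style size such as '864G' or '640K' to bytes."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)

def dedup_ratio(total_row):
    """Referenced DSIZE / allocated DSIZE from the DDT histogram 'Total' row."""
    fields = total_row.split()
    # Columns: label, then blocks/LSIZE/PSIZE/DSIZE for allocated,
    # then blocks/LSIZE/PSIZE/DSIZE for referenced.
    allocated_dsize = to_bytes(fields[4])
    referenced_dsize = to_bytes(fields[8])
    return referenced_dsize / allocated_dsize

ratio = dedup_ratio("Total 6.77M 867G 865G 864G 7.84M 1003G 1001G 1000G")
print(f"dedupe ratio is roughly {ratio:.2f}x")
```

For this pool that works out to 1000G / 864G, which lands right around the ratio quoted above and well short of what compression delivered.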
There are definitely use-cases for deduplication, but his generic VM storage pool is not one of them (in my experience).