
Sizer is a capacity planning tool that allows you scope hardware resources (RAM, DISK).
| Download | Installation | Documentation | Features | Contributors |
sizer is the most accurate capacity planning tool for MySQL Cluster and has been used to scope many large scale cluster deployments. You create the data model in MySQL Cluster and then run sizer on each table. At the end you will have a csv file. The content of the csv file is pasted into the supplied excel spreadsheet. There you can change the number of records for the tables (to anticipate growth), and you can scope how many data nodes you need and how much RAM and DISK storage that is required.
For the most accurate analysis it is recommended to have sample data in the tables you want to dimension. When dimensioning e.g a subscriber database you can provision a small set of subscribers (for example using the Sandbox), run sizer, and then see how much resources are needed to handle 1M, 5M, 10M etc subscribers, by changing the number of records in the excel spreadsheet.
sizer is more accurate than e.g, ndb_size.pl, because sizer calculates:
Version 0.30 - released 2nd of June 2010 - download source dist - Incorporated fixes from community for a bug in BLOB calculation.
In order to make it simple it is also recommended to have:
Make sure you have the correct mysql_config on the path. If you have used the config tool (and the default install path):
Make sure you have: g++ and gcc installed (run those two commands from the command line). If they don't exists you have to install them!!
or ..another popular PATH is
Proceed with the following when you can execute the command mysql_config (then your PATH is good):
If you don't set the LD_LIBRARY_PATH to where libmysqlclient_r.so.16 is, then you get the following error message when starting sizer:
on Solaris (you need Sun studio) run the following configure:
Running sizer is easy:
$ sizer -h will print out the arguments: -? --help prints this information
-c --ndb-connectstring=<hostname> hostname where ndb_mgmd is running, default 'localhost'
-d --database=<db> Database to analyse. Default is '', ALL databases will be analyzed
-t --tablename=<t> Table to analyse. Default is '', ALL tables will be analyzed
-a --analyze-full-table Performs a full table scan to calculate the average data usage for variable size columns. Otherwise 1024 random records are analyzed.
-l --loadfactor=<x> x is the percentage (0-100) how much of the VAR* columns you expect to fill up (on average). This is if you don't have any sample data to base the average on. 0 (default) means that this is disabled.
If you have the management server on the localhost the only necessary argument is --tablename. For example:
sizer --tablename=t1
will analyse table t1 in database test. Omitting the tablename will make sizer analyze all tables in a particular database (by default test).:
sizer - analyze all tables in test
sizer -d xyz - analyze all tables in database xyz
sizer analyzes a given table or all tables in a database. The output of running sizer is written to a text file.
If you want to analyze all ndb tables in all database you can (from sizer-0.26) run sizer :
sizer
and the result will be stored in all_databases.csv:
$cat all_databases.csv
mysql,ndb_schema,1,1,0,0,9,20,341,0
mysql,ndb_apply_status,0,1,0,0,0,5,20,331,0
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0
Consider that we have two tables t1 and t2:
sizer -d test -t t1
sizer -d test -t t2
The result is stored in test_t1.csv and test_t2.csv and the files are written to the current working directory:
$cat test_t1.csv
test,t1,4096,1,0,0,20,68,0
$cat test_t2.csv
test,t2,65536,2,1,0,40,152,0
If you want to analyze all tables in a database you can use: sizer -d test
The result will be stored in test.csv:
$cat test.csv
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0
Each column in the csv file corresponds to:
If we want to make a more extensive analysis we have to start sizer with -a. This will calculate the average size of variable sized columns over all records in the table. Most often, this is not needed.
It is higly recommend that you load the tables with some reference data. This will make sizing more accurate. If you don't have reference data you can start with sizer --loadfactor=[0-100] which is a percentage saying how much the variable sized columns (VARCHAR, VARBINARY..) in average will be filled up. E.g, if you have a column username VARCHAR(128) and have a --loadfactor=50, then sizer will calculate that the username will have 64 characters.
UTF8 and other CHARACTER SETS are handled correctly.
Non-indexed TINYINTs/SMALLINTs can be replaced by BIT(8) and BIT(16) respectively. sizer will suggest this to you:
--- WARNING! tiny/small/mediumint found. If the column is not indexed consider replacing it with BIT(8)!!
Best practice is to put sample data in the tables representing average data sizes. This will get better accuracy since otherwise the --loadfactor parameter is used. sizer will calculate an average over (by default) up to 1024 random records and print out:
--- VAR* attribute is 14 bytes averaged over 1024 rows
If you don't have any sample data run sizer with:
sizer --loadfactor=X where X is a percentage (0-100) representing how much you expect your VAR* columns to be filled.
If there are 0 records in the table, and loadfactor is not specified, it will default to 100%:
--- WARNING! No reference data found for VAR*. Defaulting to max size (loadfactor=100%)..256 bytes
sizer prints a warning for each UNIQUE index it finds. This is so that you can judge if you really need it, because they can be very costly:
--- WARNING! Unique Index found named (b$unique): Cost DataMemory per record = 48 and IndexMemory = 20
sizer calculates storage also for disk data columns.
sizer correctly handles UTF-8 and other Character sets.
Are correctly calculated using NdbBlob::getLength().
Many thanks to the following individuals/organizations: