Sizer is a capacity planning tool that allows you scope hardware resources (RAM, DISK).

Download Installation Documentation Features Contributors

sizer is the most accurate capacity planning tool for MySQL Cluster and has been used to scope many large scale cluster deployments. You create the data model in MySQL Cluster and then run sizer on each table. At the end you will have a csv file. The content of the csv file is pasted into the supplied excel spreadsheet. There you can change the number of records for the tables (to anticipate growth), and you can scope how many data nodes you need and how much RAM and DISK storage that is required.

For the most accurate analysis it is recommended to have sample data in the tables you want to dimension. When dimensioning e.g a subscriber database you can provision a small set of subscribers (for example using the Sandbox), run sizer, and then see how much resources are needed to handle 1M, 5M, 10M etc subscribers, by changing the number of records in the excel spreadsheet.

sizer is more accurate than e.g, ndb_size.pl, because sizer calculates:

  • correct record overheads
  • calculates cost for unique indexes
  • averages storage costs for VAR* columns (user specified by either estimation (loadfactor) or actual data)
  • calculates cost for BLOB/TEXT

Download

Sizer can be downloaded from GitHub.


Installation

Latest installation instructions are available on GitHub.

Running sizer

Running sizer is easy:

$ sizer -h will print out the arguments: -? --help prints this information
-c --ndb-connectstring=<hostname> hostname where ndb_mgmd is running, default 'localhost'
-d --database=<db> Database to analyse. Default is '', ALL databases will be analyzed
-t --tablename=<t> Table to analyse. Default is '', ALL tables will be analyzed
-a --analyze-full-table Performs a full table scan to calculate the average data usage for variable size columns. Otherwise 1024 random records are analyzed.
-l --loadfactor=<x> x is the percentage (0-100) how much of the VAR* columns you expect to fill up (on average). This is if you don't have any sample data to base the average on. 0 (default) means that this is disabled.

If you have the management server on the localhost the only necessary argument is --tablename. For example:

sizer --tablename=t1

will analyse table t1 in database test. Omitting the tablename will make sizer analyze all tables in a particular database (by default test).:
sizer - analyze all tables in test
sizer -d xyz - analyze all tables in database xyz


Documentation

Quick Start Guide

Step 1 - Get the data

sizer analyzes a given table or all tables in a database. The output of running sizer is written to a text file.

If you want to analyze all ndb tables in all database you can (from sizer-0.26) run sizer :

sizer
and the result will be stored in all_databases.csv:
$cat all_databases.csv
mysql,ndb_schema,1,1,0,0,9,20,341,0
mysql,ndb_apply_status,0,1,0,0,0,5,20,331,0
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0

Consider that we have two tables t1 and t2:

sizer -d test -t t1
sizer -d test -t t2

The result is stored in test_t1.csv and test_t2.csv and the files are written to the current working directory:

$cat test_t1.csv
test,t1,4096,1,0,0,20,68,0
$cat test_t2.csv
test,t2,65536,2,1,0,40,152,0

If you want to analyze all tables in a database you can use: sizer -d test
The result will be stored in test.csv:
$cat test.csv
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0

Each column in the csv file corresponds to:

  • Database name
  • Table name
  • Number of records in table found=4096 records
  • Table object it is always 1 and just there to be able to count the tables in the excel spreadsheet.
  • Number of orderedIndexes=1
  • Number of uniqueHashIndexes=0
  • Number of blob/text columns=0
  • Number of attributes
  • IndexMemory=20 bytes
  • DataMemory=68 bytes
  • Diskspace=0 bytes

If we want to make a more extensive analysis we have to start sizer with -a. This will calculate the average size of variable sized columns over all records in the table. Most often, this is not needed.

It is higly recommend that you load the tables with some reference data. This will make sizing more accurate. If you don't have reference data you can start with sizer --loadfactor=[0-100] which is a percentage saying how much the variable sized columns (VARCHAR, VARBINARY..) in average will be filled up. E.g, if you have a column username VARCHAR(128) and have a --loadfactor=50, then sizer will calculate that the username will have 64 characters.

UTF8 and other CHARACTER SETS are handled correctly.

Step 2 - Load it into the excel (or ooffice) spreadsheet

  • Load the csv file into a excel spreadsheet (you have to do some kind of import which works different on ooffice and excel).
  • Copy the data to the tab 'TableObjects' starting at cell A4. Now you can change the number of rows in the table and see how that affects storage (see next bullet)
  • In the tab 'Data node configuration' you will be able to see how much resources you need to support your data model.
  • In the tab 'Data node configuration' you can change
      number of nodes

      number of backups saved

      compressedLCP and compressedBackup

Step 3 - Get a configuration

  • The parameters Number of nodes,DataMemory, IndexMemory can now be used in the Configurator to generate a production class cluster configuration!

Features

TINYINT/SMALLINT data type

Non-indexed TINYINTs/SMALLINTs can be replaced by BIT(8) and BIT(16) respectively. sizer will suggest this to you:
--- WARNING! tiny/small/mediumint found. If the column is not indexed consider replacing it with BIT(8)!!

VARCHAR/VARBINARY data type

Best practice is to put sample data in the tables representing average data sizes. This will get better accuracy since otherwise the --loadfactor parameter is used. sizer will calculate an average over (by default) up to 1024 random records and print out:
--- VAR* attribute is 14 bytes averaged over 1024 rows

If you don't have any sample data run sizer with:
sizer --loadfactor=X where X is a percentage (0-100) representing how much you expect your VAR* columns to be filled.

If there are 0 records in the table, and loadfactor is not specified, it will default to 100%:
--- WARNING! No reference data found for VAR*. Defaulting to max size (loadfactor=100%)..256 bytes

UNIQUE INDEX

sizer prints a warning for each UNIQUE index it finds. This is so that you can judge if you really need it, because they can be very costly:
--- WARNING! Unique Index found named (b$unique): Cost DataMemory per record = 48 and IndexMemory = 20

Disk data columns/tables

sizer calculates storage also for disk data columns.

UTF-8 and Character Sets

sizer correctly handles UTF-8 and other Character sets.

BLOB/TEXT

Are correctly calculated using NdbBlob::getLength().


Contributors

Many thanks to the following individuals/organizations:

  • Alexander Fisher
  • A very large anonymous telco company