Sizer is a capacity planning tool that allows you scope hardware resources (RAM, DISK).

Download Installation Documentation Features Contributors

sizer is the most accurate capacity planning tool for MySQL Cluster and has been used to scope many large scale cluster deployments. You create the data model in MySQL Cluster and then run sizer on each table. At the end you will have a csv file. The content of the csv file is pasted into the supplied excel spreadsheet. There you can change the number of records for the tables (to anticipate growth), and you can scope how many data nodes you need and how much RAM and DISK storage that is required.

For the most accurate analysis it is recommended to have sample data in the tables you want to dimension. When dimensioning e.g a subscriber database you can provision a small set of subscribers (for example using the Sandbox), run sizer, and then see how much resources are needed to handle 1M, 5M, 10M etc subscribers, by changing the number of records in the excel spreadsheet.

sizer is more accurate than e.g, ndb_size.pl, because sizer calculates:

  • correct record overheads
  • calculates cost for unique indexes
  • averages storage costs for VAR* columns (user specified by either estimation (loadfactor) or actual data)
  • calculates cost for BLOB/TEXT

Download - source dist only (latest first) - Community edition (GPL)

Version 0.30 - released 2nd of June 2010 - download source dist - Incorporated fixes from community for a bug in BLOB calculation.


Installation

Pre-requisites

  • MySQL Cluster 6.3 or MySQL Cluster 7.0 (or later) installed and started

In order to make it simple it is also recommended to have:

  • ndb_mgmd running on localhost (where sizer is going to be installed)
  • data nodes can be somewhere else or also on localhost
  • mysql_config on the PATH
  • autoconf, automake, libtool (e.g apt-get install autoconf apt-get install automake apt-get install libtool

Compile sizer

Make sure you have the correct mysql_config on the path. If you have used the config tool (and the default install path):

Make sure you have: g++ and gcc installed (run those two commands from the command line). If they don't exists you have to install them!!

    export PATH=/usr/local/mysql/mysql/bin:$PATH

or ..another popular PATH is

    export PATH=/usr/local/mysql/bin:$PATH

Proceed with the following when you can execute the command mysql_config (then your PATH is good):

    tar xvfz sizer-0.26.tgz
    cd sizer-0.26
    sh autogen.sh
    ./configure
    make
    sudo make install
    export LD_LIBRARY_PATH=/usr/local/mysql/mysql/lib:/usr/local/mysql/mysql/lib/mysql

If you don't set the LD_LIBRARY_PATH to where libmysqlclient_r.so.16 is, then you get the following error message when starting sizer:

    sizer: error while loading shared libraries: libmysqlclient_r.so.16: cannot open shared object file: No such file or directory

on Solaris (you need Sun studio) run the following configure:

    CC=cc-5.0 CXX=CC ./configure

Running sizer

Running sizer is easy:

$ sizer -h will print out the arguments: -? --help prints this information
-c --ndb-connectstring=<hostname> hostname where ndb_mgmd is running, default 'localhost'
-d --database=<db> Database to analyse. Default is '', ALL databases will be analyzed
-t --tablename=<t> Table to analyse. Default is '', ALL tables will be analyzed
-a --analyze-full-table Performs a full table scan to calculate the average data usage for variable size columns. Otherwise 1024 random records are analyzed.
-l --loadfactor=<x> x is the percentage (0-100) how much of the VAR* columns you expect to fill up (on average). This is if you don't have any sample data to base the average on. 0 (default) means that this is disabled.

If you have the management server on the localhost the only necessary argument is --tablename. For example:

sizer --tablename=t1

will analyse table t1 in database test. Omitting the tablename will make sizer analyze all tables in a particular database (by default test).:
sizer - analyze all tables in test
sizer -d xyz - analyze all tables in database xyz


Documentation

Quick Start Guide

Step 1 - Get the data

sizer analyzes a given table or all tables in a database. The output of running sizer is written to a text file.

If you want to analyze all ndb tables in all database you can (from sizer-0.26) run sizer :

sizer
and the result will be stored in all_databases.csv:
$cat all_databases.csv
mysql,ndb_schema,1,1,0,0,9,20,341,0
mysql,ndb_apply_status,0,1,0,0,0,5,20,331,0
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0

Consider that we have two tables t1 and t2:

sizer -d test -t t1
sizer -d test -t t2

The result is stored in test_t1.csv and test_t2.csv and the files are written to the current working directory:

$cat test_t1.csv
test,t1,4096,1,0,0,20,68,0
$cat test_t2.csv
test,t2,65536,2,1,0,40,152,0

If you want to analyze all tables in a database you can use: sizer -d test
The result will be stored in test.csv:
$cat test.csv
test,t1,4096,1,1,0,0,5,20,68,0
test,t2,65536,1,2,1,0,6,40,152,0

Each column in the csv file corresponds to:

  • Database name
  • Table name
  • Number of records in table found=4096 records
  • Table object it is always 1 and just there to be able to count the tables in the excel spreadsheet.
  • Number of orderedIndexes=1
  • Number of uniqueHashIndexes=0
  • Number of blob/text columns=0
  • Number of attributes
  • IndexMemory=20 bytes
  • DataMemory=68 bytes
  • Diskspace=0 bytes

If we want to make a more extensive analysis we have to start sizer with -a. This will calculate the average size of variable sized columns over all records in the table. Most often, this is not needed.

It is higly recommend that you load the tables with some reference data. This will make sizing more accurate. If you don't have reference data you can start with sizer --loadfactor=[0-100] which is a percentage saying how much the variable sized columns (VARCHAR, VARBINARY..) in average will be filled up. E.g, if you have a column username VARCHAR(128) and have a --loadfactor=50, then sizer will calculate that the username will have 64 characters.

UTF8 and other CHARACTER SETS are handled correctly.

Step 2 - Load it into the excel (or ooffice) spreadsheet

  • Load the csv file into a excel spreadsheet (you have to do some kind of import which works different on ooffice and excel).
  • Copy the data to the tab 'TableObjects' starting at cell A4. Now you can change the number of rows in the table and see how that affects storage (see next bullet)
  • In the tab 'Data node configuration' you will be able to see how much resources you need to support your data model.
  • In the tab 'Data node configuration' you can change
      number of nodes
      number of backups saved
      compressedLCP and compressedBackup

Step 3 - Get a configuration

  • The parameters Number of nodes,DataMemory, IndexMemory can now be used in the Configurator to generate a production class cluster configuration!

Features

TINYINT/SMALLINT data type

Non-indexed TINYINTs/SMALLINTs can be replaced by BIT(8) and BIT(16) respectively. sizer will suggest this to you:
--- WARNING! tiny/small/mediumint found. If the column is not indexed consider replacing it with BIT(8)!!

VARCHAR/VARBINARY data type

Best practice is to put sample data in the tables representing average data sizes. This will get better accuracy since otherwise the --loadfactor parameter is used. sizer will calculate an average over (by default) up to 1024 random records and print out:
--- VAR* attribute is 14 bytes averaged over 1024 rows

If you don't have any sample data run sizer with:
sizer --loadfactor=X where X is a percentage (0-100) representing how much you expect your VAR* columns to be filled.

If there are 0 records in the table, and loadfactor is not specified, it will default to 100%:
--- WARNING! No reference data found for VAR*. Defaulting to max size (loadfactor=100%)..256 bytes

UNIQUE INDEX

sizer prints a warning for each UNIQUE index it finds. This is so that you can judge if you really need it, because they can be very costly:
--- WARNING! Unique Index found named (b$unique): Cost DataMemory per record = 48 and IndexMemory = 20

Disk data columns/tables

sizer calculates storage also for disk data columns.

UTF-8 and Character Sets

sizer correctly handles UTF-8 and other Character sets.

BLOB/TEXT

Are correctly calculated using NdbBlob::getLength().


Contributors

Many thanks to the following individuals/organizations:

  • Alexander Fisher
  • A very large anonymous telco company