Thursday, July 14, 2011

Comparing Databases with mysqldbcompare

If you have two or more database servers containing the same data, how do you know if the objects are identical. Furthermore, how can you be sure the data is the same on all of the servers? What is needed is a way to determine if the databases are in synch - all objects are present, the object definitions are the same, and the tables contain the same data. Synchronizing data can become a nightmare without the proper tools to quickly identify differences among objects and data in two databases. Perhaps a worst case (and more daunting) is trying find data that you suspect may be different but you don’t have any way of finding out.

This is where the new 'mysqldbcompare' utility comes in handy. The mysqldbcompare utility uses the mysqldiff functionality (mysqldiff allows you to find the differences in object definitions for two objects or a list of objects in two databases) and permits you to compare the object definitions and the data among two databases. Not only will it find the differences among database objects and their definitions, it will also find differences in the data!

The databases can reside on the same server or different servers. The utility performs a consistency check to ensure two databases are the same defined as having the same list of objects, identical object definitions (including object names), and for tables the same row counts and the same data.

Some scenarios where mysqldbcompare can be employed include:

  • checking master and slave for consistency
  • checking production and development databases for consistency
  • generating a difference report for expected differences among new and old data
  • comparing backups for differences
Running the Utility

Let us take a look at the utility in action. Below are two examples of the utility comparing what should be the same database on two servers. I am using a simple detail shop inventory database used to manage supplies. It consists of two tables (supplier, supplies) and three views (cleaning, finishing_up, and tools).

In the first example, we see an example where the databases are consistent. When you examine the output, you will see each object is inspected in three passes. First, the object definitions are compared. Any discrepancies would be displayed as a difference in their CREATE statements. If the object is a table, a row count test is performed followed by a comparison of the data. Surely, if the row count test fails we know the data check will fail.

Note: This is the same output (and indeed the same code) that is used for mysqldiff. The mysqldbcompare utility has all of the same features with respect to difference type presented (unified, differ, and context) which you can select with the same option named ‘--difftype’.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass pass pass
TABLE supplies pass pass pass
VIEW cleaning pass - -
VIEW finishing_up pass - -
VIEW tools pass - -


Databases are consistent.


# ...done

Normally, the mysqldbcompare utility will stop on the first failed test. This default means you can run the utility as a safeguard on data that you expect to be consistent. However, if you suspect or know there will be differences in the database objects or data and want to run all of the checks, you can use the ‘--run-all-tests’ option. This option will run the tests on all objects even if some tests fail. Note that this does not include system or access errors such as a down server or incorrect login - those errors will cause the utility to fail with an appropriate error message.

In the second example, we expect the databases to be different and we want to know which data is different. As you can see, the utility found differences in the object definitions as well as differences in the data. Both are reported.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory --run-all-tests
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


WARNING: Objects in server1:inventory but not in server2:inventory:
VIEW: finishing_up
VIEW: cleaning


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass FAIL FAIL


Row counts are not the same among inventory.supplier and inventory.supplier.


Data differences found among rows:
--- inventory.supplier
+++ inventory.supplier
@@ -1,2 +1,2 @@
code,name
-2,Never Enough Inc.
+2,Wesayso Corporation


Rows in inventory.supplier not in inventory.supplier
code,name
3,Never Enough Inc.


TABLE supplies pass FAIL FAIL


Row counts are not the same among inventory.supplies and inventory.supplies.


Data differences found among rows:
--- inventory.supplies
+++ inventory.supplies
@@ -1,4 +1,4 @@
stock_number,description,qty,cost,type,notes,supplier
-11040,Leather care,1,9.99,other,,1
-11186,Plastic polish,1,9.99,polishing,,1
-11146,Speed shine,1,9.99,repair,,1
+11040,Leather care,1,10.00,other,,1
+11186,Plastic polish,1,10.00,polishing,,1
+11146,Speed shine,1,10.00,repair,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11104,Interior cleaner,1,9.99,cleaning,,1
11056,Microfiber and foam pad cleaner,1,9.99,cleaning,,1
11136,Rubber cleaner,1,9.99,cleaning,,1
11173,Vinyl and rubber dressing,1,9.99,cleaning,,1
11106,Wheel cleaner,1,9.99,cleaning,,1
11270,Carpet cleaner,1,9.99,cleaning,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11269,Microfiber spray on car wash towel,3,16.99,cleaning,,1
11116,Microfiber wax removal towel,3,16.99,waxing,,1
10665,Glass polish pad,3,10.00,polishing,,1


VIEW tools FAIL - -


--- inventory.tools
+++ inventory.tools
@@ -1,1 +1,1 @@
-CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` = 'tool')
+CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` in ('tool','other'))


Database consistency check failed.

Take a moment to read through the report above. At the top of the report (the first object tested), we see a critical error in the suppliers table. Here we can see that there are two different names for the same supplier_id. All relational database theory aside, that could spell trouble when it comes time to reorder supplies.

Notice the report for the supplies table. In this example, the utility identified three rows that were different among the two databases. It also identified rows that were missing from either table. Clearly, that could help you diagnose what went wrong where in your application (or your data entry).

Lastly, we see a possible issue with the tools view. Here the view definition differs slightly. Depending the use of the view this may be acceptable but it is nice to know nonetheless.

What Does This All Mean?

If data consistency is important to you or if you need a way to quickly determine the differences among data in two databases, the mysqldbcompare utility is a great addition to your toolset. But don’t take my word for it - try it out yourself.

Download and Try It Out!

The MySQL Utilities project is written in Python and is a component of the MySQL Workbench tool. You can download the latest release of the MySQL Workbench here:

http://dev.mysql.com/downloads/workbench/

There are some limitations in this first release. Currently, if the storage engines differ among the tables in the compare, the object definitions will show this difference – which is exactly what you would expect. However, the utility will stop and report the object definition test as a failure. You can still run the data consistency check by using the --force option which instructs the mysqldbcompare utility to run all tests unless they fail from an internal exception (for example, the table files are corrupt).

Got Some Ideas or Want to See New Features?

One really cool feature would be if the utility generate SQL statements for synchronizing the data and objects. Would this be something you would want to see incorporated?

If you’re intrigued by this new utility and in your course of use find new features or new uses that you would like to see incorporated in future revisions, please email me and let me know!

Related Material

Th latest MySQL Utilities development tree (including mysqldbcompare) can be found here:

https://launchpad.net/mysql-utilities