Thursday, July 14, 2011

Comparing Databases with mysqldbcompare

If you have two or more database servers containing the same data, how do you know if the objects are identical. Furthermore, how can you be sure the data is the same on all of the servers? What is needed is a way to determine if the databases are in synch - all objects are present, the object definitions are the same, and the tables contain the same data. Synchronizing data can become a nightmare without the proper tools to quickly identify differences among objects and data in two databases. Perhaps a worst case (and more daunting) is trying find data that you suspect may be different but you don’t have any way of finding out.

This is where the new 'mysqldbcompare' utility comes in handy. The mysqldbcompare utility uses the mysqldiff functionality (mysqldiff allows you to find the differences in object definitions for two objects or a list of objects in two databases) and permits you to compare the object definitions and the data among two databases. Not only will it find the differences among database objects and their definitions, it will also find differences in the data!

The databases can reside on the same server or different servers. The utility performs a consistency check to ensure two databases are the same defined as having the same list of objects, identical object definitions (including object names), and for tables the same row counts and the same data.

Some scenarios where mysqldbcompare can be employed include:

  • checking master and slave for consistency
  • checking production and development databases for consistency
  • generating a difference report for expected differences among new and old data
  • comparing backups for differences
Running the Utility

Let us take a look at the utility in action. Below are two examples of the utility comparing what should be the same database on two servers. I am using a simple detail shop inventory database used to manage supplies. It consists of two tables (supplier, supplies) and three views (cleaning, finishing_up, and tools).

In the first example, we see an example where the databases are consistent. When you examine the output, you will see each object is inspected in three passes. First, the object definitions are compared. Any discrepancies would be displayed as a difference in their CREATE statements. If the object is a table, a row count test is performed followed by a comparison of the data. Surely, if the row count test fails we know the data check will fail.

Note: This is the same output (and indeed the same code) that is used for mysqldiff. The mysqldbcompare utility has all of the same features with respect to difference type presented (unified, differ, and context) which you can select with the same option named ‘--difftype’.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass pass pass
TABLE supplies pass pass pass
VIEW cleaning pass - -
VIEW finishing_up pass - -
VIEW tools pass - -


Databases are consistent.


# ...done

Normally, the mysqldbcompare utility will stop on the first failed test. This default means you can run the utility as a safeguard on data that you expect to be consistent. However, if you suspect or know there will be differences in the database objects or data and want to run all of the checks, you can use the ‘--run-all-tests’ option. This option will run the tests on all objects even if some tests fail. Note that this does not include system or access errors such as a down server or incorrect login - those errors will cause the utility to fail with an appropriate error message.

In the second example, we expect the databases to be different and we want to know which data is different. As you can see, the utility found differences in the object definitions as well as differences in the data. Both are reported.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory --run-all-tests
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


WARNING: Objects in server1:inventory but not in server2:inventory:
VIEW: finishing_up
VIEW: cleaning


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass FAIL FAIL


Row counts are not the same among inventory.supplier and inventory.supplier.


Data differences found among rows:
--- inventory.supplier
+++ inventory.supplier
@@ -1,2 +1,2 @@
code,name
-2,Never Enough Inc.
+2,Wesayso Corporation


Rows in inventory.supplier not in inventory.supplier
code,name
3,Never Enough Inc.


TABLE supplies pass FAIL FAIL


Row counts are not the same among inventory.supplies and inventory.supplies.


Data differences found among rows:
--- inventory.supplies
+++ inventory.supplies
@@ -1,4 +1,4 @@
stock_number,description,qty,cost,type,notes,supplier
-11040,Leather care,1,9.99,other,,1
-11186,Plastic polish,1,9.99,polishing,,1
-11146,Speed shine,1,9.99,repair,,1
+11040,Leather care,1,10.00,other,,1
+11186,Plastic polish,1,10.00,polishing,,1
+11146,Speed shine,1,10.00,repair,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11104,Interior cleaner,1,9.99,cleaning,,1
11056,Microfiber and foam pad cleaner,1,9.99,cleaning,,1
11136,Rubber cleaner,1,9.99,cleaning,,1
11173,Vinyl and rubber dressing,1,9.99,cleaning,,1
11106,Wheel cleaner,1,9.99,cleaning,,1
11270,Carpet cleaner,1,9.99,cleaning,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11269,Microfiber spray on car wash towel,3,16.99,cleaning,,1
11116,Microfiber wax removal towel,3,16.99,waxing,,1
10665,Glass polish pad,3,10.00,polishing,,1


VIEW tools FAIL - -


--- inventory.tools
+++ inventory.tools
@@ -1,1 +1,1 @@
-CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` = 'tool')
+CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` in ('tool','other'))


Database consistency check failed.

Take a moment to read through the report above. At the top of the report (the first object tested), we see a critical error in the suppliers table. Here we can see that there are two different names for the same supplier_id. All relational database theory aside, that could spell trouble when it comes time to reorder supplies.

Notice the report for the supplies table. In this example, the utility identified three rows that were different among the two databases. It also identified rows that were missing from either table. Clearly, that could help you diagnose what went wrong where in your application (or your data entry).

Lastly, we see a possible issue with the tools view. Here the view definition differs slightly. Depending the use of the view this may be acceptable but it is nice to know nonetheless.

What Does This All Mean?

If data consistency is important to you or if you need a way to quickly determine the differences among data in two databases, the mysqldbcompare utility is a great addition to your toolset. But don’t take my word for it - try it out yourself.

Download and Try It Out!

The MySQL Utilities project is written in Python and is a component of the MySQL Workbench tool. You can download the latest release of the MySQL Workbench here:

http://dev.mysql.com/downloads/workbench/

There are some limitations in this first release. Currently, if the storage engines differ among the tables in the compare, the object definitions will show this difference – which is exactly what you would expect. However, the utility will stop and report the object definition test as a failure. You can still run the data consistency check by using the --force option which instructs the mysqldbcompare utility to run all tests unless they fail from an internal exception (for example, the table files are corrupt).

Got Some Ideas or Want to See New Features?

One really cool feature would be if the utility generate SQL statements for synchronizing the data and objects. Would this be something you would want to see incorporated?

If you’re intrigued by this new utility and in your course of use find new features or new uses that you would like to see incorporated in future revisions, please email me and let me know!

Related Material

Th latest MySQL Utilities development tree (including mysqldbcompare) can be found here:

https://launchpad.net/mysql-utilities

15 comments:

  1. Hi Chuck,
    this sounds like a great tool, we were in great need for something like that lately when we migrated our database in order to get rid of a large and historical innodb file...
    One thing missing here is the run times of the utility. Our database has some pretty large entities and all the different tools we tried out took a few hours per entity...
    Another thing is that if there is a discrepancy in the data of a given table and there are many many rows differing then it would be hard to read the file.
    Again, great tool, we'll definitely find a way to use it, even on an ongoing basis if it runs really fast :-)

    ReplyDelete
  2. How well does this tool scale? If you have millions of rows to compare, are you locking the table the entire time during the comparison on both sides? Does the tool take in to account replication lag and make sure both sides are caught up?

    ReplyDelete
  3. You can download the latest code from: https://launchpad.net/mysql-utilities.

    ReplyDelete
  4. I have tested the tool up to about 350k records. If you have a larger dataset, please post your experiences with the speed of the utility.

    The lock is held only for the first phase of the operation which calculates checksums for each row. Once that is done, the locks are released.

    Currently the utility does not synchronize a master and slave, but that would be a cool feature to include I will put that on the wish list.

    ReplyDelete
  5. Printing run times can be done - but I wonder if you (Gabi D) would like a progress indicator as well. I know I feel uncomfortable watching a blank command line cursor blink seemingly forever wondering if something is really happening. Would this be helpful too?

    Yes, for comparisons with many rows differing the output could get messy. In those cases it might be a good idea to pipe the output to a file to examine later. Or would a write-to-file or limit output feature be more useful?

    ReplyDelete
  6. How to run this tool Chuck on a Linux machine which has mysql.

    ReplyDelete
  7. Once you install Connector/Python then MySQL Utilities, you should be able to simply open a terminal and enter the command mysqldbcompare [...] (with appropriate options specified).

    If you need a more specific example, please post it so I can comment.

    C/Py: http://dev.mysql.com/downloads/connector/python/
    Utilities: http://dev.mysql.com/downloads/tools/utilities/

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Hello Chuck,

    but if my database has password?

    #imNewbie

    ReplyDelete
    Replies
    1. Check the documentation. You provide it on the command line with --server* options. Let's say your password is SECRET:

      mysqldbcompare --server1=root:SECRET@localhost --server2=root:SECRET@backup_host:3310 inventory:inventory --run-all-tests

      Delete
    2. I'm so grateful man!
      I found together a friend this answer.
      I confess that I also lost me with term "inventory:inventory", but now all was resolved.


      Thanks for All.

      Delete
  10. I tried to use mysqldbcompare but returns
    ERROR: Cannot connect to the server2 server.
    Error Authentication with old (insecure) passwords is not supported. For more information, lookup Password Hashing in the latest MySQL manual

    What can I do?

    ReplyDelete
    Replies
    1. What version of MySQL are you using? What version of MySQL Utilities are you using? Can you paste the actual command so that I can see how you are attempting to connect (you can mask out the user id/passwd)?

      Finally, is your MySQL server setup to use the default login protocol? For example, can you connect using the mysql client like this:

      > mysql -u -p

      Delete
    2. Thanks for your quick reply

      Server1 is my localhost with MySQL 5.1, Server2 is my hosting with MySQL 5.0
      I use HeidiSQL to connect them, but it hasnt the db compare feature, so I downloaded mysql utilities 1.3.5

      This is my command
      mysqldbcompare --server1=root:secret@localhost --server2=cyberese_2rsv3:secret@server2 cyberese_2rsv3:cyberese_2rsv3

      Thanks for your help

      Delete
    3. Ok, it is clear there is nothing wrong with how you provide the parameters and the server(s) are sufficiently new(ish). The problem is on your server side.

      Please refer to the MySQL Manual to disable old-style logins:
      http://dev.mysql.com/doc/refman/5.1/en/password-hashing.html
      http://dev.mysql.com/doc/refman/5.1/en/old-client.html

      Delete