Thursday, July 14, 2011

Comparing Databases with mysqldbcompare

If you have two or more database servers containing the same data, how do you know if the objects are identical. Furthermore, how can you be sure the data is the same on all of the servers? What is needed is a way to determine if the databases are in synch - all objects are present, the object definitions are the same, and the tables contain the same data. Synchronizing data can become a nightmare without the proper tools to quickly identify differences among objects and data in two databases. Perhaps a worst case (and more daunting) is trying find data that you suspect may be different but you don’t have any way of finding out.

This is where the new 'mysqldbcompare' utility comes in handy. The mysqldbcompare utility uses the mysqldiff functionality (mysqldiff allows you to find the differences in object definitions for two objects or a list of objects in two databases) and permits you to compare the object definitions and the data among two databases. Not only will it find the differences among database objects and their definitions, it will also find differences in the data!

The databases can reside on the same server or different servers. The utility performs a consistency check to ensure two databases are the same defined as having the same list of objects, identical object definitions (including object names), and for tables the same row counts and the same data.

Some scenarios where mysqldbcompare can be employed include:

  • checking master and slave for consistency
  • checking production and development databases for consistency
  • generating a difference report for expected differences among new and old data
  • comparing backups for differences
Running the Utility

Let us take a look at the utility in action. Below are two examples of the utility comparing what should be the same database on two servers. I am using a simple detail shop inventory database used to manage supplies. It consists of two tables (supplier, supplies) and three views (cleaning, finishing_up, and tools).

In the first example, we see an example where the databases are consistent. When you examine the output, you will see each object is inspected in three passes. First, the object definitions are compared. Any discrepancies would be displayed as a difference in their CREATE statements. If the object is a table, a row count test is performed followed by a comparison of the data. Surely, if the row count test fails we know the data check will fail.

Note: This is the same output (and indeed the same code) that is used for mysqldiff. The mysqldbcompare utility has all of the same features with respect to difference type presented (unified, differ, and context) which you can select with the same option named ‘--difftype’.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass pass pass
TABLE supplies pass pass pass
VIEW cleaning pass - -
VIEW finishing_up pass - -
VIEW tools pass - -


Databases are consistent.


# ...done

Normally, the mysqldbcompare utility will stop on the first failed test. This default means you can run the utility as a safeguard on data that you expect to be consistent. However, if you suspect or know there will be differences in the database objects or data and want to run all of the checks, you can use the ‘--run-all-tests’ option. This option will run the tests on all objects even if some tests fail. Note that this does not include system or access errors such as a down server or incorrect login - those errors will cause the utility to fail with an appropriate error message.

In the second example, we expect the databases to be different and we want to know which data is different. As you can see, the utility found differences in the object definitions as well as differences in the data. Both are reported.

mysqldbcompare --server1=root@localhost --server2=root@backup_host:3310 inventory:inventory --run-all-tests
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases inventory on server1 and inventory on server2


WARNING: Objects in server1:inventory but not in server2:inventory:
VIEW: finishing_up
VIEW: cleaning


Defn Row Data
Type Object Name Diff Count Check
---------------------------------------------------------------------------
TABLE supplier pass FAIL FAIL


Row counts are not the same among inventory.supplier and inventory.supplier.


Data differences found among rows:
--- inventory.supplier
+++ inventory.supplier
@@ -1,2 +1,2 @@
code,name
-2,Never Enough Inc.
+2,Wesayso Corporation


Rows in inventory.supplier not in inventory.supplier
code,name
3,Never Enough Inc.


TABLE supplies pass FAIL FAIL


Row counts are not the same among inventory.supplies and inventory.supplies.


Data differences found among rows:
--- inventory.supplies
+++ inventory.supplies
@@ -1,4 +1,4 @@
stock_number,description,qty,cost,type,notes,supplier
-11040,Leather care,1,9.99,other,,1
-11186,Plastic polish,1,9.99,polishing,,1
-11146,Speed shine,1,9.99,repair,,1
+11040,Leather care,1,10.00,other,,1
+11186,Plastic polish,1,10.00,polishing,,1
+11146,Speed shine,1,10.00,repair,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11104,Interior cleaner,1,9.99,cleaning,,1
11056,Microfiber and foam pad cleaner,1,9.99,cleaning,,1
11136,Rubber cleaner,1,9.99,cleaning,,1
11173,Vinyl and rubber dressing,1,9.99,cleaning,,1
11106,Wheel cleaner,1,9.99,cleaning,,1
11270,Carpet cleaner,1,9.99,cleaning,,1


Rows in inventory.supplies not in inventory.supplies
stock_number,description,qty,cost,type,notes,supplier
11269,Microfiber spray on car wash towel,3,16.99,cleaning,,1
11116,Microfiber wax removal towel,3,16.99,waxing,,1
10665,Glass polish pad,3,10.00,polishing,,1


VIEW tools FAIL - -


--- inventory.tools
+++ inventory.tools
@@ -1,1 +1,1 @@
-CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` = 'tool')
+CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory`.`tools` AS select `inventory`.`supplies`.`stock_number` AS `stock_number`,`inventory`.`supplies`.`description` AS `description`,`inventory`.`supplies`.`qty` AS `qty`,`inventory`.`supplies`.`cost` AS `cost`,`inventory`.`supplies`.`type` AS `type`,`inventory`.`supplies`.`notes` AS `notes`,`inventory`.`supplies`.`supplier` AS `supplier` from `inventory`.`supplies` where (`inventory`.`supplies`.`type` in ('tool','other'))


Database consistency check failed.

Take a moment to read through the report above. At the top of the report (the first object tested), we see a critical error in the suppliers table. Here we can see that there are two different names for the same supplier_id. All relational database theory aside, that could spell trouble when it comes time to reorder supplies.

Notice the report for the supplies table. In this example, the utility identified three rows that were different among the two databases. It also identified rows that were missing from either table. Clearly, that could help you diagnose what went wrong where in your application (or your data entry).

Lastly, we see a possible issue with the tools view. Here the view definition differs slightly. Depending the use of the view this may be acceptable but it is nice to know nonetheless.

What Does This All Mean?

If data consistency is important to you or if you need a way to quickly determine the differences among data in two databases, the mysqldbcompare utility is a great addition to your toolset. But don’t take my word for it - try it out yourself.

Download and Try It Out!

The MySQL Utilities project is written in Python and is a component of the MySQL Workbench tool. You can download the latest release of the MySQL Workbench here:

http://dev.mysql.com/downloads/workbench/

There are some limitations in this first release. Currently, if the storage engines differ among the tables in the compare, the object definitions will show this difference – which is exactly what you would expect. However, the utility will stop and report the object definition test as a failure. You can still run the data consistency check by using the --force option which instructs the mysqldbcompare utility to run all tests unless they fail from an internal exception (for example, the table files are corrupt).

Got Some Ideas or Want to See New Features?

One really cool feature would be if the utility generate SQL statements for synchronizing the data and objects. Would this be something you would want to see incorporated?

If you’re intrigued by this new utility and in your course of use find new features or new uses that you would like to see incorporated in future revisions, please email me and let me know!

Related Material

Th latest MySQL Utilities development tree (including mysqldbcompare) can be found here:

https://launchpad.net/mysql-utilities

34 comments:

  1. Hi Chuck,
    this sounds like a great tool, we were in great need for something like that lately when we migrated our database in order to get rid of a large and historical innodb file...
    One thing missing here is the run times of the utility. Our database has some pretty large entities and all the different tools we tried out took a few hours per entity...
    Another thing is that if there is a discrepancy in the data of a given table and there are many many rows differing then it would be hard to read the file.
    Again, great tool, we'll definitely find a way to use it, even on an ongoing basis if it runs really fast :-)

    ReplyDelete
  2. How well does this tool scale? If you have millions of rows to compare, are you locking the table the entire time during the comparison on both sides? Does the tool take in to account replication lag and make sure both sides are caught up?

    ReplyDelete
  3. You can download the latest code from: https://launchpad.net/mysql-utilities.

    ReplyDelete
  4. I have tested the tool up to about 350k records. If you have a larger dataset, please post your experiences with the speed of the utility.

    The lock is held only for the first phase of the operation which calculates checksums for each row. Once that is done, the locks are released.

    Currently the utility does not synchronize a master and slave, but that would be a cool feature to include I will put that on the wish list.

    ReplyDelete
  5. Printing run times can be done - but I wonder if you (Gabi D) would like a progress indicator as well. I know I feel uncomfortable watching a blank command line cursor blink seemingly forever wondering if something is really happening. Would this be helpful too?

    Yes, for comparisons with many rows differing the output could get messy. In those cases it might be a good idea to pipe the output to a file to examine later. Or would a write-to-file or limit output feature be more useful?

    ReplyDelete
  6. How to run this tool Chuck on a Linux machine which has mysql.

    ReplyDelete
  7. Once you install Connector/Python then MySQL Utilities, you should be able to simply open a terminal and enter the command mysqldbcompare [...] (with appropriate options specified).

    If you need a more specific example, please post it so I can comment.

    C/Py: http://dev.mysql.com/downloads/connector/python/
    Utilities: http://dev.mysql.com/downloads/tools/utilities/

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Hello Chuck,

    but if my database has password?

    #imNewbie

    ReplyDelete
    Replies
    1. Check the documentation. You provide it on the command line with --server* options. Let's say your password is SECRET:

      mysqldbcompare --server1=root:SECRET@localhost --server2=root:SECRET@backup_host:3310 inventory:inventory --run-all-tests

      Delete
    2. I'm so grateful man!
      I found together a friend this answer.
      I confess that I also lost me with term "inventory:inventory", but now all was resolved.


      Thanks for All.

      Delete
  10. I tried to use mysqldbcompare but returns
    ERROR: Cannot connect to the server2 server.
    Error Authentication with old (insecure) passwords is not supported. For more information, lookup Password Hashing in the latest MySQL manual

    What can I do?

    ReplyDelete
    Replies
    1. What version of MySQL are you using? What version of MySQL Utilities are you using? Can you paste the actual command so that I can see how you are attempting to connect (you can mask out the user id/passwd)?

      Finally, is your MySQL server setup to use the default login protocol? For example, can you connect using the mysql client like this:

      > mysql -u -p

      Delete
    2. Thanks for your quick reply

      Server1 is my localhost with MySQL 5.1, Server2 is my hosting with MySQL 5.0
      I use HeidiSQL to connect them, but it hasnt the db compare feature, so I downloaded mysql utilities 1.3.5

      This is my command
      mysqldbcompare --server1=root:secret@localhost --server2=cyberese_2rsv3:secret@server2 cyberese_2rsv3:cyberese_2rsv3

      Thanks for your help

      Delete
    3. Ok, it is clear there is nothing wrong with how you provide the parameters and the server(s) are sufficiently new(ish). The problem is on your server side.

      Please refer to the MySQL Manual to disable old-style logins:
      http://dev.mysql.com/doc/refman/5.1/en/password-hashing.html
      http://dev.mysql.com/doc/refman/5.1/en/old-client.html

      Delete
  11. pompeyomex:
    Hi I had a similar problem - I found that old-style logins were switched off, but since I had migrated an earlier (pre 4.1) mysql installation, some of the passwords were still in the old (15 character) hash format, rather than the newer (41 char). By resetting the password for the user this was created in the new format. Unfortunately this still doesn't help since the next error message says:
    The server2 version is incompatible. Utility requires version 5.1.30 or higher.
    ... stumped. I now need to upgrade the older installation to 5.1 in order to use this...
    Hope you got there in the end... :-)

    ReplyDelete
    Replies
    1. Hello. We have that restriction (server version) for several reasons but mainly it is because there is some feature the utility is exploiting that is only available in a certain version and later.

      For example, the 5.1.30 requirement for mysqldbcompare is because it uses CREATE TABLE ... SELECT which was improved in 5.1.30 to fix a bug for binary logged tables.

      Delete
  12. Chuck, you mentioned:

    "One really cool feature would be if the utility generate SQL statements for synchronizing the data and objects. Would this be something you would want to see incorporated?"

    Yes! That would be absolutely fantastic - I know this post is really old but I'm currently experimenting with putting a DB under version control so looking for ways to produce an SQL diff between two versions of a DB. Right now this tool is almost perfect, and all that would be needed is to be able to output a file of the diff in SQL format (ideally with data and schema separately) which I can then add to the db migration tool I'm using (https://github.com/guilhermechapiewski/simple-db-migrate)!!

    Could you please please please add this? :D?

    ReplyDelete
    Replies
    1. I just realised that you did already incorporate this with the "--difftype=sql" parameter! Right now I'm the happiest man alive!!! :D

      Delete
  13. Thanks for this great introduction of mysqldbcompare, just what I needed to get started!

    ReplyDelete
  14. I'm having trouble getting this to work. I'm comparing two databases on the same MAMP server. My command is:
    mysqldbcompare --server1=root:secret@localhost:8888 sim_ebus:sim_mamp

    I get the warning about using a password on the command line, and then it times out after a minute or two. From the examples I've seen, there should be a message about connecting to the server, but it never appears.

    ReplyDelete
    Replies
    1. I've not seen this issue. You can try specifying the --server1 and --server2 options even if they are the same.

      Do the tables have lots of data? It is possible the utility is still running trying to build checksums for the rows (that's how it works) and has known to take a while for large tables.

      Also, use verbosity (-vvv) to see if there are any additional statements.

      Finally, use mysqldiff instead to just check the structures.

      If you are still unable to get it to work, please file a bug on bugs.mysql.com with as much detail as you can including any output, structure of the tables, and sample data (if possible).

      Delete
    2. Thanks for the suggestions.

      Specifying server2 didn't make any difference, and neither did the -vvv flag. mysqldiff wouldn't work either.

      My database has 53 tables with a total of 158K records. Would that be large enough to cause a timeout?

      Delete
    3. Ok, that is strange. mysqdiff has nothing in it which can cause a hang like that.

      Let's take this off the blog so we can communicate in more detail.

      Please email me the version number of Utilities you are using, the version of MySQL, and the version of Connector/Python and Python along with an example of your execution (command).

      Also, do any of the other utilities work? Are you using Utilities via Workbench or standalone?

      Send this to my work email.

      c h u c k dot b e l l at o r a c l e dot c o m

      Delete
    4. Chuck,

      I’m sure I’m making some obvious mistake, since my experience with the MySQL Workbench and Utilities is zero. I’ve always used phpMyAdmin to manage databases, but it doesn’t have a built-in function for comparing databases.

      I’ve wasted enough time trying to get mysqlbdcompare to work. I’ll go back to creating exports from phpMyAdmin and then comparing them in Text Wrangler.

      Thanks for taking an interest in my problem.

      Lorin

      Delete
  15. Possibly I'm missing a trick here, but I can't get mysqldbcompare to CREATE TABLEs (using the sql difftype) when required.
    I'm putting a db under version control and am hoping to automate it, but I'm kinda stuck if mysqldbcompare won't pick up new tables.

    $ mysqldbcompare --quiet --compact --difftype=sql --run-all-tests -vv --server1=script:password@localhost:3306 a:b
    [..]
    # WARNING: Objects in server1.a but not in server1.b
    # TABLE: foo

    ReplyDelete
    Replies
    1. Hi. The utilities only identifies objects that are not in both. It doesn't currently dump the CREATE for missing objects. But that would be a nice feature. I invite you to open a feature request on bugs.mysql.com. :)

      Delete
    2. Thanks, I was vaugely holding out for some undocumented option - oh well.
      http://bugs.mysql.com/bug.php?id=80593

      Delete
  16. Hi,

    I ran this utility on my 1TB data. the process got struck but process is running when it reached to to my 200GB table which has 18Million rows and in utf8 format. And there is no error.

    Checked the tmpdir space and has disk space. Not understanding why the process got hanged.

    ReplyDelete
  17. Hi. Sorry you are having trouble. Can you tell me which version of MySQL Utilities, the MySQL server, and possibly the schema of your table in question? You could try narrowing the scope by creating copies of the table partitioned into smaller chunks. That may help determine what row(s) are causing the problem.

    That said, with large data the utility can sometimes appear to be hung but is actually processing too many rows in memory. We don't have a solution for this yet, but it can become memory bound for large tables, especially if the primary keys are large, have wide distribution, or contain blob fields.

    If you would like, we can continue the conversation via email.

    c h u c k dot b e l l at o r a c l e dot c o m

    ReplyDelete
  18. I Hi Chuck,

    I was looking at your blog because I am searching a solution for my problem when I do mysqldbcompare to a big database MySQL goes down and I have to start the service again. any succession.

    NOTE: I am not a DB ADMIN :)

    ReplyDelete
  19. I'm seeing an odd result. My compare is telling me that three tables are not present in the target when they definitely are there. this makes me concerned about what else it might be getting wrong.

    ReplyDelete
    Replies
    1. Hi, Can you send me more details about the problem you’re seeing? Feel free to contact me at c h u c k dot b e l l at o r a c l e dot c o m.

      Delete
  20. Nice & useful information...thanks

    -Satya
    http://satya-dba.blogspot.com/2018/11/mysql-mysqldbcompare-compare.html

    ReplyDelete