Tip 78 - Introduction - Replication Step By Step

Breck Carter
Last modified: February 23, 1998
mail to: bcarter@bcarter.com

Tip 78: Replication Step By Step

Introduction

Some people have no trouble understanding replication. That's because they're very smart. They don't have any difficulty dealing with sophisticated arrangements of multiple databases on multiple computers all exchanging data with each other. Those, of course, are the people who should be in charge of replication.

Then there's the rest of us. Folks like me who do find it hard to keep track of complex environments that keep changing all the time. And that can spell trouble when we're the ones put in charge.

SQL Anywhere replication looks straightforward on paper. The product is well designed and implemented, each step is simple, and each command and program is easy to understand. It's just that there are so many steps and so many commands and programs to run that real life gets very complicated very fast.

And error prone too. Each step offers an opportunity for something to go wrong, a chance for you to make a mistake. Replication errors can have profound implications, even something as simple as "Oops, I forgot to do that" or running commands in the wrong order. Mistakes can cascade and cause other symptoms, eventually destabilizing the replication network and causing lost or inconsistent data. You can tell this has happened when you hear yourself groan, "Oh, no, I gotta run DBXTRACT again!"

This document is an attempt to fill the gap between understanding the basics of SQL Anywhere replication and dealing with a production system. It is a detailed examination of the steps required to run a simple replication setup. Unlike some tutorials, this demonstration actually uses more than one computer.

The purpose is three-fold:

To make it clear there really are a lot of steps,
To show exactly how these steps work and how they are related to one another,
To present some tips and techniques for dealing with the complexity.

The important word above is "detailed". You can follow along and run the commands or just read the step-by-step instructions, either way you won't be missing much. If you think it's too detailed, well, you're probably one of those bright folks who don't need to read it anyway.

Here's a list of what this document is not:

It's not a replication manual. The SQL Anywhere manuals describe the syntax and semantics in complete detail. That's where you should look for documentation of individual commands and programs. Either there, or in the highly-recommended help file DBENG50W.HLP which contains all the material from the manuals. Of particular interest is the new 411 page manual "Data Replication with SQL Remote" which was published in July 1997 and is included in the help file dated September 10, 1997.
It's not an introduction to replication. The manuals and help file contain a basic tutorial as well as many discussions about replication philosophy and design.
It's not a simulation of a real world business system. The columns and tables all have artificial names like "replication1" and "string_data2". This is done to divide and conquer, to separate technical topics from business issues, and to concentrate entirely on replication techniques.

This is the kind of article that once you put it down, you can't pick it up again.

Well, maybe it's not that bad. But it did cause one early reviewer to coin the phrase "publish subscribe repitation".

A Replication Refresher

Figure i1 lists the basic characteristics of publish subscribe replication with SQL Anywhere.

Figure i1 - A Replication Refresher

Replication - The copying of data (insertions, deletions, updates) among multiple databases.

Central administration - Oriented towards a mobile workforce.

Small footprint - Disk and RAM requirements are reasonable.

Large fan-out - Optimized for a large number of remote databases (often standalone) for each central database.

Occasional connections - Designed for users that are only occasionally connected to the central database.

High latency - Characterized by long time lags between data entry and replication to other databases.

Low volume - Best suited for small amounts of replicated data per database.

Homogenous databases - Designed for SQL Anywhere and SQL Server databases with very similar schema.

Local availability - Unaffected by communications problems or a failure of the central server.

Local performance - Response times and transfer rates are better than across a WAN.

Server offloaded - Only updates are replicated, not repeated queries.

Transaction-log based architecture - Based on replication of committed transactions, .LOG file required.

Full DBENG50*.EXE required - The royalty-free RTDSK50*.EXE Desktop Runtime engine is not sufficient.

Tight integrity - Transactions are replicated atomically.

Loose consistency - Consistent replication over time with differences in copies of data caused by time lags.

Hierarchical configuration - As opposed to a peer-to-peer relationship among databases.

Consolidated database - Contains all the data to be replicated, plus other data.

Remote database - Contains part or all of the data to be replicated, plus other data.

No system-wide locking - Conflicts must be designed out of the system or resolved at the consolidated database.

Remote user - A single user id represents the remote database for replication purposes.

Message-based data transfer - Via MAPI, VIM, SMTP or "message files".

Publish Subscribe - The send-and-receive replication method uses publications and subscriptions.

Publication - Database object describing the subset of data to be replicated; e.g.:

CREATE PUBLICATION publication-name ( TABLE table-name SUBSCRIBE BY column-name )

Subscription - Identifies which user (database) is subscribing to which publication; e.g.:

CREATE SUBSCRIPTION TO publication-name ( 'column-value' ) FOR target-database-user-id

Article - How a single table participates in a publication.

Update - When changes to data are replicated to all subscribers to a publication.

Publisher - The user id which sends output messages from a consolidated or remote database.

Subscriber - The user id which receives input messages sent to a remote or consolidated database.

Two-way replication - Updates are replicated up and down in the consolidated-remote hierarchy by default.

Replication between remote sites - This is possible by replicating up and down via the consolidated database.

Synchronization - Making a remote database consistent with the consolidated database.

Initialization utility - DBINIT creates the .DB and .LOG files for each database.

Extraction utility - DBXTRACT creates SQL and data files to load a synchronized remote database from scratch.

Message Agent - DBREMOTE runs on each database to gather and apply changes and to exchange messages.

SSXTRACT, SSREMOTE - Extraction utility and message agent for SQL Server.

SQL Remote - Another name for DBREMOTE, or the replication product as a whole.

Batch mode - Where DBREMOTE starts and then shuts down after processing all current messages.

Continuous mode - Where DBREMOTE periodically sends and continuously receives messages.

SYNCHRONIZE SUBSCRIPTION - Synchronizes a subscription for one remote user via DBREMOTE.

PASSTHROUGH - Facility for sending SQL commands via DBREMOTE to execute on remote databases.

Sybase Central - SCVIEW is a graphical tool for managing databases.

ISQL - An interactive tool for executing SQL commands.

Importance of database design - To avoid errors, conflicts and complexity.

Data Replication is not Data Distribution - Life is good within the limits of publish subscribe!

Figure i2 presents some basic assumptions and options in effect for this demonstration.

Figure i2 - Let's Make Some Assumptions

Windows NT 4 is running on the central server.

Windows For Workgroups 3.11 is running on the remote computers.

SQL Anywhere 5.5.03 is used on all the computers.

"FILE" replication is being used instead of mail-based "MAPI", "SMTP" or "VIM".

A peer-to-peer shared drive setup is used to send the replication files.

C: is the local drive used on the central server.

C: is the local drive used on the first remote computer.

F: is the shared drive letter used on the central server to refer to the first remote C: drive.

Directory, file and user ids are limited to 8 characters for simplicity when dealing with Windows 3.x.

\TEST is the name of the base directory used on both the central server and the remote computers.

SITEnnnn is the naming convention for remote database user ids and their associated subdirectories.

CONSOL is the name of the consolidated database user id and its associated subdirectories.

DBREMOTE replication files are placed in the \TEST\CONSOL and \TEST\SITEnnnn subdirectories .

DBXTRACT files are also placed in \TEST\SITEnnnn.

C:\TEST\CONSOL.DB and .LOG are the file names for the consolidated database.

C:\TEST\REMOTE.DB and .LOG are the file names for the remote databases.

ISQL is used for administrative tasks instead of Sybase Central.

ISQL and ISQLW are used to simulate application programs making database updates.

Only SQL Anywhere and DBREMOTE/DBREMOTW are shown, not SQL Server or SSREMOTE.

This document is divided into five sections:

Figure i3 shows the Sections and Steps "at a glance": what programs are run, when they're run and in what order, and what files are passed back and forth. This list may make more sense after you've had a chance to read some of the document... come back here when you need a road map.

Figure i3 - Steps And Sections At A Glance

Central File Remote See Section Server Transfer Computer Step ------- -------------- ------------- -------------- ---- One DBINIT, DBSRV50 2 ISQL CREATE 2 Two DBXTRACT Reload 5 Reload --> 8 DBINITW 9 ISQLW Reload 9 Three ISQL Update A 14 DBREMOTE A 17 ISQLW Update B 19 A --> 22 DBREMOTW A, B 23 ISQL Update C 26 <-- B 29 DBREMOTE B, C 31 ISQLW Update D 34 C --> 37 DBREMOTW C, D 40 Four ISQL Update E 43 ISQLW Update F 46 ISQL Synchronize 49 DBREMOTE Synch 52 Synch --> 54 DBREMOTW Synch 56 Five ISQL Change DDL 59 ISQL Passthrough 61 DBREMOTE Passthrough 64 ISQL Start Replication 66 DBREMOTE Start Rep 69 Pass, Start --> 71 DBREMOTW Pass, Start 73

In the body of this document some of the steps are marked "required". That means these steps are necessary if this demonstration is going to make any sense at all. They may or may not be needed for real-world replication setups, and even if they are they may look different.

Other steps are marked "suggested" and "optional" which means they aren't absolutely necessary even for this demonstration.

Figure i4 is an alphabetic list of the various batch, SQL and other files provided with this demonstration. It also shows where each file is used: on the central server, remote computer, or both.

Figure i4 - Files Provided With This Demonstration

(Download rep.zip (33K) to get these files.)
Used on Used on Central Remote File Name Fig. Req'd Server Computer Description CHECKDDL.SQL 67a Yes Yes Check that the passthrough processing worked CONSDDL.SQL 58a Yes Yes DDL commands for the consolidated database CONSREMO.BAT 17a Yes Yes Run DBREMOTE for consolidated-to-remote replication CREDATA.SQL 2d Yes Yes Load some data on the consolidated database CREPUBS.SQL 2c Yes Yes Set up the publications and subscriptions CRETABS.SQL 2a Yes Yes Create the tables CREUSERS.SQL 2b Yes Yes Set up replication-related user ids MAKECBAK.BAT 3a Yes Back up the consolidated database in C:\TEST\CBAKnn MAKECDIR.BAT 1a Yes Yes Make replication subdirectories on the central server MAKERBAK.BAT 3b Yes Back up a remote database in C:\TEST\RBAKnn MAKERDIR.BAT 6a Yes Yes Make replication subdirectories on a remote computer PASS.SQL 60a Yes Passthrough commands REPSTAT.EXE 4e Yes Replication status program REPSTAT.PBL 4f Yes Replication status source code RESTCBAK.BAT 3c Yes Restore the consolidated database from C:\TEST\CBAKnn RESTRBAK.BAT 3d Yes Restore a remote database from C:\TEST\RBAKnn SELDATA.SQL 4a Yes Yes Summarize the current database status STARTREP.SQL 65a Yes Yes Start replication of the new replication4 table SYNC0001.SQL 48a Yes Yes Synchronize subscriptions for SITE0001 SYSREMO.SQL 4c Yes Yes Display sys.sysremoteusers UPDATE_A.SQL 14a Yes Yes "Update A" for the consolidated database UPDATE_B.SQL 19a Yes Yes "Update B" for the remote database UPDATE_C.SQL 26a Yes Yes "Update C" for the consolidated database UPDATE_D.SQL 34a Yes Yes "Update D" for the remote database UPDATE_E.SQL 42a Yes Yes "Update E" for the consolidated database UPDATE_F.SQL 45a Yes Yes "Update F" for the remote database XTRACT.BAT 5a Yes Yes Run DBXTRACT on the consolidated database

These files are all placed in the C:\TEST directory on the central server and remote computers. Complete listings appear elsewhere in this document, and the files are also available in machine readable form.

Batch files and command lines are used in this document instead of Sybase Central for the following reasons:

Sybase Central is only available on 32-bit operating systems,
Command lines and batch files are explicit and concise, and
Batch files help automate repetitive tasks.

Figure i5 lists some of the other documents and resources that contain information about replication with SQL Anywhere.

Figure i5 - Other Resources

Manual: Data Replication with SQL Remote - July 1997

Help file: \sqlany50\win32\dbeng50w.hlp - 2,521,989 bytes, 1997 09 10 12:00p

Manuals: Sybase SQL Anywhere User's Guide Volumes I and II - 1995

Manual: Sybase SQL Anywhere Network Guide - 1995

Web site: http://www.sybase.com/products/system11/workplace/sqlindex.html

Web site: http://support.sybase.com/

Web site: http://techinfo.sybase.com/

Web site: http://sybooks.sybase.com/

FTP site: ftp://ftp.powersoft.com/pub2/sqlany/maintfls/

[Home] [Table of Contents] [Previous Section] [Next Section]