Breck Carter
Last modified: February 23, 1998
mail to: bcarter@bcarter.com
Tip 78: Replication Step By Step
Some people have no trouble understanding replication. That's because
they're very smart. They don't have any difficulty dealing with
sophisticated arrangements of multiple databases on multiple computers
all exchanging data with each other. Those, of course, are the people
who should be in charge of replication.
Then there's the rest of us. Folks like me who do find it hard
to keep track of complex environments that keep changing all the time.
And that can spell trouble when we're the ones put in charge.
SQL Anywhere replication looks straightforward on paper. The product
is well designed and implemented, each step is simple, and each command
and program is easy to understand. It's just that there are so many
steps and so many commands and programs to run that real life gets very
complicated very fast.
And error prone too. Each step offers an opportunity for something to
go wrong, a chance for you to make a mistake. Replication errors can
have profound implications, even something as simple as "Oops, I forgot
to do that" or running commands in the wrong order. Mistakes can cascade
and cause other symptoms, eventually destabilizing the replication
network and causing lost or inconsistent data. You can tell this has
happened when you hear yourself groan, "Oh, no, I gotta run DBXTRACT
again!"
This document is an attempt to fill the gap between understanding the
basics of SQL Anywhere replication and dealing with a production system.
It is a detailed examination of the steps required to run a simple
replication setup. Unlike some tutorials, this demonstration actually
uses more than one computer.
The purpose is three-fold:
- To make it clear there really are a lot of steps,
- To show exactly how these steps work and how they are related to one
another,
- To present some tips and techniques for dealing with the
complexity.
The important word above is "detailed". You can follow along and run
the commands or just read the step-by-step instructions, either way you
won't be missing much. If you think it's too detailed, well, you're
probably one of those bright folks who don't need to read it anyway.
Here's a list of what this document is not:
- It's not a replication manual. The SQL Anywhere manuals describe the
syntax and semantics in complete detail. That's where you should look
for documentation of individual commands and programs. Either there, or
in the highly-recommended help file DBENG50W.HLP which contains all the
material from the manuals. Of particular interest is the new 411 page
manual "Data Replication with SQL Remote" which was published in July
1997 and is included in the help file dated September 10, 1997.
- It's not an introduction to replication. The manuals and help file
contain a basic tutorial as well as many discussions about replication
philosophy and design.
- It's not a simulation of a real world business system. The columns
and tables all have artificial names like "replication1" and
"string_data2". This is done to divide and conquer, to separate
technical topics from business issues, and to concentrate entirely on
replication techniques.
This is the kind of article that once you put it down, you can't
pick it up again.
Well, maybe it's not that bad. But it did cause one early reviewer to
coin the phrase "publish subscribe repitation".
A Replication Refresher
Figure i1 lists the basic characteristics of publish subscribe
replication with SQL Anywhere.
Figure i1 - A Replication Refresher
- Replication -
The copying of data (insertions, deletions,
updates) among multiple databases.
- Central administration -
Oriented towards a mobile
workforce.
- Small footprint -
Disk and RAM requirements are
reasonable.
- Large fan-out -
Optimized for a large number of remote
databases (often standalone) for each central database.
- Occasional connections -
Designed for users that are only
occasionally connected to the central database.
- High latency -
Characterized by long time lags between data
entry and replication to other databases.
- Low volume -
Best suited for small amounts of replicated data
per database.
- Homogenous databases -
Designed for SQL Anywhere and SQL
Server databases with very similar schema.
- Local availability -
Unaffected by communications problems or
a failure of the central server.
- Local performance -
Response times and transfer rates are
better than across a WAN.
- Server offloaded -
Only updates are replicated, not repeated
queries.
- Transaction-log based architecture -
Based on replication of
committed transactions, .LOG file required.
- Full DBENG50*.EXE required -
The royalty-free RTDSK50*.EXE
Desktop Runtime engine is not sufficient.
- Tight integrity -
Transactions are replicated
atomically.
- Loose consistency -
Consistent replication over time with
differences in copies of data caused by time lags.
- Hierarchical configuration -
As opposed to a peer-to-peer
relationship among databases.
- Consolidated database -
Contains all the data to be
replicated, plus other data.
- Remote database -
Contains part or all of the data to be
replicated, plus other data.
- No system-wide locking -
Conflicts must be designed out of
the system or resolved at the consolidated database.
- Remote user -
A single user id represents the remote database
for replication purposes.
- Message-based data transfer -
Via MAPI, VIM, SMTP or "message
files".
- Publish Subscribe -
The send-and-receive replication method
uses publications and subscriptions.
- Publication -
Database object describing the subset of data
to be replicated; e.g.:
CREATE PUBLICATION publication-name
( TABLE table-name SUBSCRIBE BY column-name )
- Subscription -
Identifies which user (database) is
subscribing to which publication; e.g.:
CREATE SUBSCRIPTION TO publication-name
( 'column-value' ) FOR target-database-user-id
- Article -
How a single table participates in a
publication.
- Update -
When changes to data are replicated to all
subscribers to a publication.
- Publisher -
The user id which sends output messages from a
consolidated or remote database.
- Subscriber -
The user id which receives input messages sent
to a remote or consolidated database.
- Two-way replication -
Updates are replicated up and down in
the consolidated-remote hierarchy by default.
- Replication between remote sites -
This is possible by
replicating up and down via the consolidated database.
- Synchronization -
Making a remote database consistent with
the consolidated database.
- Initialization utility -
DBINIT creates the .DB and .LOG
files for each database.
- Extraction utility -
DBXTRACT creates SQL and data files to
load a synchronized remote database from scratch.
- Message Agent -
DBREMOTE runs on each database to gather and
apply changes and to exchange messages.
- SSXTRACT, SSREMOTE -
Extraction utility and message agent for
SQL Server.
- SQL Remote -
Another name for DBREMOTE, or the replication
product as a whole.
- Batch mode -
Where DBREMOTE starts and then shuts down after
processing all current messages.
- Continuous mode -
Where DBREMOTE periodically sends and
continuously receives messages.
- SYNCHRONIZE SUBSCRIPTION -
Synchronizes a subscription for
one remote user via DBREMOTE.
- PASSTHROUGH -
Facility for sending SQL commands via DBREMOTE
to execute on remote databases.
- Sybase Central -
SCVIEW is a graphical tool for managing
databases.
- ISQL -
An interactive tool for executing SQL commands.
- Importance of database design -
To avoid errors, conflicts
and complexity.
- Data Replication is not Data Distribution -
Life is good
within the limits of publish subscribe!
|
Figure i2 presents some basic assumptions and options in effect for
this demonstration.
Figure i2 - Let's Make Some Assumptions
- Windows NT 4 is running on the central server.
- Windows For Workgroups 3.11 is running on the remote computers.
- SQL Anywhere 5.5.03 is used on all the computers.
- "FILE" replication is being used instead of mail-based "MAPI",
"SMTP" or "VIM".
- A peer-to-peer shared drive setup is used to send the replication
files.
- C: is the local drive used on the central server.
- C: is the local drive used on the first remote computer.
- F: is the shared drive letter used on the central server to refer to
the first remote C: drive.
- Directory, file and user ids are limited to 8 characters for
simplicity when dealing with Windows 3.x.
- \TEST is the name of the base directory used on both the central
server and the remote computers.
- SITEnnnn is the naming convention for remote database user ids and
their associated subdirectories.
- CONSOL is the name of the consolidated database user id and its
associated subdirectories.
- DBREMOTE replication files are placed in the \TEST\CONSOL and
\TEST\SITEnnnn subdirectories .
- DBXTRACT files are also placed in \TEST\SITEnnnn.
- C:\TEST\CONSOL.DB and .LOG are the file names for the consolidated
database.
- C:\TEST\REMOTE.DB and .LOG are the file names for the remote
databases.
- ISQL is used for administrative tasks instead of Sybase
Central.
- ISQL and ISQLW are used to simulate application programs making
database updates.
- Only SQL Anywhere and DBREMOTE/DBREMOTW are shown, not SQL Server or
SSREMOTE.
|
This document is divided into five sections:
Figure i3 shows the Sections and Steps "at a glance": what programs
are run, when they're run and in what order, and what files are passed
back and forth. This list may make more sense after you've had a chance
to read some of the document... come back here when you need a road map.
Figure i3 - Steps And Sections At A Glance
Central File Remote See
Section Server Transfer Computer Step
------- -------------- ------------- -------------- ----
One DBINIT, DBSRV50 2
ISQL CREATE 2
Two DBXTRACT Reload 5
Reload --> 8
DBINITW 9
ISQLW Reload 9
Three ISQL Update A 14
DBREMOTE A 17
ISQLW Update B 19
A --> 22
DBREMOTW A, B 23
ISQL Update C 26
<-- B 29
DBREMOTE B, C 31
ISQLW Update D 34
C --> 37
DBREMOTW C, D 40
Four ISQL Update E 43
ISQLW Update F 46
ISQL Synchronize 49
DBREMOTE Synch 52
Synch --> 54
DBREMOTW Synch 56
Five ISQL Change DDL 59
ISQL Passthrough 61
DBREMOTE Passthrough 64
ISQL Start Replication 66
DBREMOTE Start Rep 69
Pass, Start --> 71
DBREMOTW Pass, Start 73
|
In the body of this document some of the steps are marked "required".
That means these steps are necessary if this demonstration is going to
make any sense at all. They may or may not be needed for real-world
replication setups, and even if they are they may look different.
Other steps are marked "suggested" and "optional" which means they
aren't absolutely necessary even for this demonstration.
Figure i4 is an alphabetic list of the various batch, SQL and other
files provided with this demonstration. It also shows where each file is
used: on the central server, remote computer, or both.
Figure i4 - Files Provided With This Demonstration
These files are all placed in the C:\TEST directory on the central
server and remote computers. Complete listings appear elsewhere in this
document, and the files are also available in machine readable form.
Batch files and command lines are used in this document instead of
Sybase Central for the following reasons:
- Sybase Central is only available on 32-bit operating systems,
- Command lines and batch files are explicit and concise, and
- Batch files help automate repetitive tasks.
Figure i5 lists some of the other documents and resources that
contain information about replication with SQL Anywhere.
Figure i5 - Other Resources