A Simple Data Dictionary

Breck Carter
Last modified: August 8, 1996
mail to: bcarter@bcarter.com

A Simple Data Dictionary

While there was indeed life Before The Personal Computer, it wasn't much of a life. Almost everything about creating an application was hard to do, from designing the files and programs to developing and testing the code and even writing documentation. Because it was hard to do, extreme specialization became the order of the day. Different people assumed control of different tasks and it was not uncommon for a whole department to grow from a single job description.

Nowhere was this more obvious and more unproductive than in the area of database design. In the 1970's it became a common mantra that Data was a company asset, just as important as Money and People, and that it was not to be trusted to mere programmers. When the Relational theory of data became popular, followed quickly by the Entity Relationship scheme, whole projects grew up with the sole aim of designing and documenting something called The Corporate Data Model. These projects inevitably collapsed under their own weight but their legacy remained: database design and implementation were beyond the control of the application development team.

Down at ground level it could take weeks to get a field changed, and months or even years to get a new database. The record, in my experience, was two years to implement what could have been a simple name and address table but instead became inflated into a grandiose "business relationship" database.

The Personal Computer Revolution brought not just power but freedom from tyranny. Because new tools like PowerBuilder and ERwin simplified so many tasks it was now possible to build applications as a series of rapidly developed prototypes. At first the central database design bureaucracies ignored the PC and then discovered they just couldn't cope. This made an ally of the end user, enabling programmers to take back control of the process and develop the databases as well as the programs.

As with any revolution, grotesque mistakes were made. Not everything about The Old Way was bad in principle, some of it was just poorly implemented. In the anarchy of the early 1990s several babies were thrown out with the bath water. One of these was the data dictionary, and that's what this column is about.

In its simplest form a dictionary is an alphabetic list of words and their definitions together with details of usage and pronunciation. It is a working tool for writers.

Similarly, a data dictionary is a working tool for programmers and power users who must deal directly with the database. A simple data dictionary is an alphabetic list of tables and columns and their descriptions, and some technical details like data types and default values. Programmers use it to look stuff up, to answer questions about the data. End users do the same thing when they're trying to construct ad-hoc reports.

Figure 1 shows a screen shot of a simple data dictionary. The tables are arranged alphabetically together with concise definitions and some other details. In this example, SQL Anywhere 5 is the DBMS and publish subscribe replication is in use, so these details include the disposition of each table; i.e., where it is created, how it is replicated and so on.

Figure 1: A Simple Data Dictionary

Also included are the names of files to be used for the new SQL Anywhere 5 unload and load commands. These file names start with the letters tn, where n is a number representing hierarchical dependencies. In other words, t2contac means the contact table must be loaded after and deleted before other tables marked t1. This makes it possible to write a simple PowerBuilder application to dump and reload an entire database to and from text files, and have that program driven from the dictionary.

The last line of each table definition describes the end user access privileges, making it possible to automatically generate grant commands from the dictionary.

Each table definition is immediately followed by an alphabetic list of its columns, together with their data types, null versus not null settings, default values and an indication of whether it is part of a primary, foreign and/or alternate key (PK, FK, AK). As with the tables, a concise free-form description appears directly beneath each column name.

Some of these details are specific to the needs of one particular project. Their primary importance is to the human reader, not computer programs. Many of these details are available elsewhere; for example, the data types are visible in the database diagram shown in Figure 2, and the default values and access privileges can be discovered by querying the system catalog. The dictionary serves to gather the important information together in one place and, of course, to provide the table and column descriptions that make a dictionary what it is.

Figure 2: The Contact Table

In this example, ERwin/ERX for PowerBuilder was used to create the database. The ERwin Entity Definition and Attribute Definition editors were used to store the table and column descriptions respectively. As well, the three-part Entity Note/Query/Sample Editor was used to hold specific details about each table as shown in Figure 3. The text entered here may not fit the purpose for which these fields were originally intended, but that doesn't matter; it was easy, it worked, and information entered was useful to the project team.

Figure 3: Using And Abusing ERwin Editors

The main reason to store dictionary information in the database diagram itself is ease of data entry. While it might seem like an oxymoron to claim that ERwin's MultiLineEdit text fields are easy to use when compared with, say, Microsoft Word, the truly difficult data entry task comes when the database changes. It is quite hard to ensure correctness and completeness if the dictionary data is stored separately from the diagram.

If the wrong table and column names appear, if new entries are missing and old entries remain, if the technical details are incorrect the audience will come to mistrust the dictionary. When that happens, as with any documentation that is grossly out of date, it becomes useless and should be abandoned as a waste of time. It is still possible for "garbage in, garbage out" errors to creep in to the text itself but the really big mistakes are avoided by storing it in the diagram.

When it's time to print or display the dictionary, of course, Microsoft Word would be nice. It might be possible to automate a linkage between the diagram and a document via ERwin's DDE facility, or some fancy arrangement involving OLE, but that's where the rubber meets the sky and this is ground level. The dictionary is in the diagram, let's just print it.

ERwin's Entity and Attribute Report facilities provide a way to export the dictionary data to tab-delimited text files. PowerBuilder offers a way to import these files into a database, and the ubiquitous DataWindow can be used to produce final report. Figure 4 shows the layout of two dictionary tables that can be used to provide the link between ERwin and the report. These tables can be stored together with the main database or separately. If you choose the separate database route, you can standardize your dictionary in, say, Watcom SQL or SQL Anywhere even if you're using multiple DBMS's for your applications.

Figure 4: The Dictionary Tables

The trick to exporting ERwin data is to blank out all the fields in the Entity and Attribute Report windows, then check the options in the same order as the columns appear in Figure 4, and finally check the fields labeled Remove Newline and Tab Delimited. To verify your work, have a look at the resulting text files and make sure the column titles on the first line agree with the names in Figure 4.

You can then import these files by hand using the Database Painter or write a PowerBuilder program to automate the task via ImportFile(). In either case, the first line of imported text must be skipped or deleted before the data is saved to the entity and attribute tables.

Once the dictionary data is loaded into the entity and attribute tables, it must be flattened out into a single result set for reporting purposes. This can be accomplished with a DataWindow using the SQL union of two selects shown in Listing 1. Syntax mode is used because it's easier than graphics mode in cases like this. Each column in the first select is given an alias name to force the DataWindow painter to use that as the column name.

Listing 1: The Dictionary Selects

   select table_name as table_name,
      '' as column_name,
      entity_definition as entity_definition,
      entity_note as entity_note,
      '' as column_datatype,
      '' as null_option,
      '' as default_value,
      '' as pk, '' as fk, '' as ak,
      '' as attribute_definition,
      sample_instance_data
        as sample_instance_data,
      query as query
   from entity
   union
   select table_usage,
      column_name,
      '',
      '',
      column_datatype,
      null_option,
      default_value,
      pk, fk, ak,
      attribute_definition,
      '',
      ''
   from attribute
   order by 1, 2

With the columns from Listing 1 it is possible to produce the report in Figure 1 with a DataWindow that requires no extra PowerScript code. In other words, it can be run from the DataWindow Painter in Preview mode. Figure 5 shows such a DataWindow in all its glory, complete with computed fields and overlapping columns.

Figure 5: The Dictionary DataWindow

It might not be apparent from Figure 5 but the detail band contains two database columns and 3 computed fields. The columns are entity_definition and attribute_definition and the three computed field expressions are shown in Listing 2. The // comments are actually included in the expressions to make it easier to figure out which field is which when working in the DataWindow painter.

Listing 2: Computed Field Expressions

   // table_details
   if ( column_name = '',
        entity_note
           + '~r~n'
           + 'Unload/load data: '
           + sample_instance_data
           + '~r~n'
           + query,
       '' )

   // column_title
   ' ' +  WordCap ( column_name )

   // column_type
   '  ' + lower ( column_datatype ) + ' '
      + lower ( null_option ) + ' '
      + if ( default_value <> '',
             'default ' + default_value,
             '' )
      + ' ' + pk + ' ' + fk + ' ' + ak

At runtime each detail row displays either table data, or column data, but not both. In other words, either entity_definition and table_details are filled in and column_title, column_type and attribute_definition are blank, or vice versa. That's why they can be overlapped as shown in Figure 5 to produce the effect in Figure 1.

All the Edit Style options like Auto H Scroll and Auto V Scroll are turned off for the two columns entity_definition and attribute_definition. Autosize Height is turned on for these two columns, and for the detail band as well, so they can grow vertically when they contain a lot of text. The table_details field also has Autosize Height turned on, as well as Slide - Up - All Above so it can both grow vertically and slide up to meet the varying height entity_definition column.

Both column_title and column_type have Autosize Height turned off because they each contain only a single line of text. The column_type field has Slide - Left turned on so it will appear close to the column name. The attribute_definition column has Autosize Height turned on but not Slide - Up because although it varies in height, nothing above it does.

When you put all this together, you get a report that has some of the niceties available in Microsoft Word and is definitely readable. And the DataWindow is self-contained: you can wrap a PowerBuilder program around it if you want but that's not really necessary.

As mentioned earlier, the query column in the entity table is used to hold information about end user access rights. When the SQL command in Listing 3 is executed in the Database Administration painter, it creates a single-column result set containing a long series of Watcom SQL grant commands. This result set can be saved to an SQL file by using File - Save Rows As - Text and then loaded back into the Database Administration painter as a command file. It can then be executed to grant all privileges to the "administration" group and specific privileges to "endusers."

Listing 3: Generating Grant Commands

   select string ( 'grant all on ',
          table_name, ' to administration;' )
          as "GRANT COMMAND" from entity
   union select string ( 'grant delete on ',
         table_name, ' to endusers;' )
    from entity where query like '%delete%'
   union select string ( 'grant insert on ',
         table_name, ' to endusers;' )
    from entity where query like '%insert%'
   union select string ( 'grant select on ',
         table_name, ' to endusers;' )
    from entity where query like '%select%'
   union select string ( 'grant update on ',
         table_name, ' to endusers;' )
    from entity where query like '%update%'
   order by 1;

The text stored in the query column has been written to be both readable by human beings and usable by the SQL command in Listing 3. Precise positive statements are used, such as "End users have select, insert and update rights" rather than vague or negative sentences like "End users can't remove any rows from this table."

A data dictionary is a valuable tool if it contains practical and useful information, and if it is kept up to date and accurate. That implies it must reflect the physical reality of the database rather than some nebulous logical view, and that it must be easy to update and generate. The techniques described in this article are only one way to achieve these goals. You can just as easily use some other database design tool to hold the dictionary information, or even a combination of the DBMS system catalog tables plus PowerBuilder's own PBCATxxx extended catalog tables to store it.

See also: Code For A Simple Data Dictionary.

Breck Carter can be reached by phone at (416) 763-5200 or via email at bcarter@bcarter.com.