Zen and the Art of Systems Analysis

Meditations on Computer Systems Development

by Patrick McDermott

8. MEDITATIONS ON A MODEL

The Opposite of a Profound Truth
is Another Profound Truth.

Good modeling often requires the Right Meditation. Chapter 8 delves into the most important analytical skill: the ability to abstract complex systems and build models to understand, explain and design systems. We’ll look primarily at the nuts and bolts (actually the boxes & lines) of data models as we ponder the quintessence of Classes, Entities, Attributes and Relationships. In the process we’ll answer some koans presented as guides to the truth.

Zen meditates on the relationship of perception to reality and concludes life is an illusion. This is true of models, as Craig Larman illustrates on the cover of his book Applying UML and Patterns, which echoes René Magritte’s painting Ceci n’est pas une pipe. Magritte shows us a painting of a pipe that is so realistic as to appear from a distance to be a three-dimensional model, and warns us: “This is not a pipe”. Larman shows us a data model of a sailboat and warns us: “This is not a sailboat”. The data model is not the computer system, and the goal is not to build a model but to build a system, so don’t mimic Pygmalion of Greek mythology. Pygmalion sculpted a model of his ideal woman. Unfortunately, he then found himself in love with the model, and no real woman could please him. Don’t make Pygmalion your role model, unless you have a goddess friend called Aphrodite who can turn the model into reality for you as she did for Pygmalion.

The Sound of One Hand Clapping

In Zen Buddhism, an acolyte is given an essentially unanswerable puzzle, called a kōan (pronounced Koe AHN), to ponder on the way to enlightenment. One of the most famous koans is one created by Rinzai Zen master Hakuin (1686-1769) that goes like this: “You know the sound of two hands clapping, but what is the sound of one hand clapping?” Many questions in systems analysis are like Zen koans. Some of the conundrums we analysts face every day:

1. How Can You See What Cannot Be Seen?
2. How Many Words Is One Picture Worth?
3. What is The Ultimate Data Model?
4. What Should One Model?
5. When Is a Fact About a Thing In Fact a Thing?
6. Is There Organization Object Ontology?
7. What Is the Metaphysics of Metadata?
8. This can become so confusing you might even ask:
   “Who Am I?”

In many cases analysts talk a lot to answer little, like an incandescent bulb generating much more heat than light, but there is often more at stake than minutiae and esoterica. Simply putting an attribute at the wrong level of a hierarchy of entities can be an incredibly expensive error, and these seemingly endless and random ruminations can prevent that. One of the intriguing aspects of analysis is this necessity to understand the quintessence of the situation, so we’ll look at these analysis koans as a way of enlightenment.

Oh, yes, the sound of one hand clapping. If you’re a systems analyst, you’re probably already familiar with it without even realizing it. Here’s how to hear it: First, if you’re wearing glasses, remove them. Then extend your hand out, palm up. Vigorously hit your forehead with the palm of your hand. The smack you hear is the sound of one hand clapping.

1. How Can You See What Cannot Be Seen?

Musashi’s seventh principle tells us we must perceive what cannot be seen. The primary method used by analysts to see the unseen is the model. So before we understand our koan, we need to know what a model is. Models are a critical component of the Systems Analyst’s toolkit. I can’t imagine how to design a database without data models to visualize the relationships, although some people still do it without them. A model is just an abstraction of a part of the world. Models serve many purposes outside systems analysis. Fashion models allow consumers to visualize, or perhaps fantasize, what they might look like in designer clothes. Budgets are models that allow managers to visualize, or perhaps fantasize, how project spending might unfold. Business models allow entrepreneurs to visualize, or perhaps fantasize, how they might make money.

A model can be one of three things. It can be a representation of something. Data models help understand the structure and connections of data so a useful database can be conceptualized. A model can be a pattern for something to be made. Prototypes are models that allow users to describe how the eventual computer system should function so programmers can use them as a pattern for the system. A model can be an analogy or description used to help see something not normally seen. Rutherford developed a model to describe atomic structure. Mathematical models let economists and meteorologists predict events. Okay, green tea leaves are usually better predictors than econometric and meteorological models, but you get the idea.

Many of our models will be pictures or diagrams, allowing us to visualize the process and experiment with it before actually implementing it. But models are not always visual. Econometric models are systems of mathematical equations. And the use case models used in UML are more text-based than visual, but they are models nonetheless—models of interactions between people and computers. In fact, a computer program is a model, and a database is, too.


* * *


Some caveats are in order. An important point: most models simplify the object they portray, and will not be exact representations of the real world. A good model will highlight what we are interested in, but hide what we are not. It is usually cheaper and safer to manipulate a model than it is to manipulate real world objects.

Models are only valuable for the purpose they serve; they have no intrinsic worth. I once took over a project that had been in analysis (not psychoanalysis, though that might have been appropriate) for about two years, and had produced many pages of circles, boxes and arrows. There were circles, boxes and arrows on boards; circles, boxes and arrows on the walls; and especially circles, boxes and arrows in binders, but no system, and not even one program. It was a classic case of analysis paralysis; the time to just do it had long since passed. Once again, you must seek the Middle Way. The model should receive enough effort to fulfill its purpose, and no more. If your goal is to explain, cut unnecessary detail. If the model is ephemeral, don’t put too much effort into it.

2. How Many Words is One Picture Worth?

The saying is “A picture’s worth a thousand words”, but that is not true. It can be a gross understatement. The file that contains a typical picture is about 500,000 bytes, which means it could easily hold a hundred thousand English words. Try to write a “word picture”: describe, using words alone, a woodblock print from Hokusai’s 36 Views of Mount Fuji. For someone to actually reproduce the print from words without having seen it, even 100,000 words would not suffice. And a scan of a page usually requires many more bytes than the text on it. (Incidentally, Hokusai is an early case of scope creep: there are actually more than 36 views in the series 36 Views of Mount Fuji.)

Which better conveys an idea, words or pictures? Embrace contradiction and choose the right tool: the answer is “It depends”. For a high-level overview, you can’t beat a diagram, but eventually diagrams aren’t appropriate. They become too complex and don’t illustrate desired system functions. For example, trying to show every decision point and path in a program will simply result in a tangle of lines similar to a photo of the wires and cables behind your desk.

Flowcharting detailed program logic is an example of a picture of little value. Assume a system in which we need to process all the Items for a given Order. A flowchart of a loop statement requires a diagram with four boxes, a diamond and five lines, but the code itself looks like this:

int Counter = 0;
while (Counter++ < ItemCount)   /* once for each Item on the Order */
    ValidateItem();

The flowchart diagram is much larger than the statements themselves, and doesn’t translate directly into code in any event. The flowchart can be a teaching tool to get the concept of the loop or if statement across to novices, but after that it is of no value.

Surprisingly enough, the Microsoft Access® Query builder doesn’t even work well for teaching. At first I thought this feature would help students with no programming background grasp Boolean concepts. Logical ANDs, ORs and NOTs are difficult to grasp and to explain, and I thought a visual approach would help. Instead of grappling with the actual SQL code, which looks like this:

SELECT Name, City, State, CreditLimit
FROM Customer
WHERE (City="Oakland" OR City="Redmond") OR
      (State="CA" AND CreditLimit>1000) OR
      CreditLimit>10000;

they could visualize it on the Select Query screen, with the criteria neatly arranged in order and no clumsy syntax to confuse them.

Surprisingly, neither of these representations was any easier to grasp than the other. Any student who could understand the Access query could understand the SQL and vice versa. If they didn’t understand it one way, they didn’t understand it the other way, either. The complexity was in understanding the concepts, not visualizing them. In models, attributes are best conveyed as lists, and even visual color-coding doesn’t seem to help. And so with flow charts, decision trees and decision tables, the picture is not worth any words.
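To see that the difficulty lies in the Boolean concept rather than in its notation, here is the same WHERE clause written a third way, as a small Python predicate. It is only a sketch: the sample customers are invented for illustration, and only the column names come from the query above.

```python
# Hypothetical sample rows; only the column names come from the query above.
customers = [
    {"Name": "Ann", "City": "Oakland", "State": "CA", "CreditLimit": 500},
    {"Name": "Bob", "City": "Fresno", "State": "CA", "CreditLimit": 2000},
    {"Name": "Cy", "City": "Austin", "State": "TX", "CreditLimit": 5000},
    {"Name": "Di", "City": "Boston", "State": "MA", "CreditLimit": 20000},
]


def matches(row):
    """Mirror the WHERE clause: three conditions joined by OR."""
    in_city = row["City"] == "Oakland" or row["City"] == "Redmond"
    californian_with_credit = row["State"] == "CA" and row["CreditLimit"] > 1000
    very_high_limit = row["CreditLimit"] > 10000
    return in_city or californian_with_credit or very_high_limit


selected = [row["Name"] for row in customers if matches(row)]
print(selected)
```

A student who cannot predict which names are printed will probably not be helped by the SQL or by the query grid either; the three notations ask the same question.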

A Picture is Worth 100,000 Words
—or Zero


3. What is The Ultimate Data Model?

The ultimate universal data model is a box labeled “Thing” with a many-to-many optional recursive relationship. Every thing is a thing, and so only one entity type is needed. And as the First Law of Ecology tells us, everything is related to everything else.

That’s one extreme. At the other extreme, everything is unique, and thus needs to be a unique entity unto itself. One of the realizations of the Tao is that no day is like any other: each day, although similar in many ways to those that went before and will come after, is nonetheless absolutely unique. Every day is unique, every snowflake is unique, and especially every person is unique. We must, however, be able to classify things based on similarities and differences if we are ever to use a computer effectively. Everything is the same, yet everything is distinct. The best answer lies somewhere between these extremes. And so once again we must choose The Middle Way.
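The “Thing” extreme can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a recommended design: the class name Thing comes from the text, while the relate method and the sample data are invented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compared by identity: every Thing is distinct
class Thing:
    name: str
    # Optional, many-to-many, recursive: a Thing may relate to any other Things.
    related: set = field(default_factory=set)

    def relate(self, other):
        """Link two Things in both directions."""
        self.related.add(other)
        other.related.add(self)


customer = Thing("Customer: Ann")
order = Thing("Order 1001")
item = Thing("Item: teapot")
customer.relate(order)
order.relate(item)

print(sorted(t.name for t in order.related))
```

The sketch can hold anything, which is exactly its weakness: nothing in the structure says that an order belongs to a customer or contains items. That knowledge has been pushed into the name strings, where the computer cannot use it.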

4. What Should One Model?

This is probably the most important tip in this book. If it works in the world, it has to work in your system, but if it does not work that way in the world, it probably won’t work in the system. Remember the Third Law of Ecology: Nature knows Best. Zen honors Nature. Your representation is most likely to be right if it models the working real world analog as closely as possible. It will also help you and others understand it and find it when you’re looking for it months later. All of the indexing, cross-referencing and searching won’t be as good as having it be in its logical place. It’s also the easiest representation to understand, assuming you understand the world, which you have to do anyway.

Model the World


Use this as your analysis tool: Go look at the world. The computer system is a model of the business. So, for example, if the business is recording and tracking certain data, then your computer system needs to do so, too. For an example of a real world/machine analog, there must be an Entity (or group of entities) if:

There’s a form. Forms are expensive to design and make. So if the business designed a form, it is definitely recording information that should be kept in your machine analog. A good analyst always collects forms that are being used by anyone she interviews. Be sure to get some completed examples. Look for stray marks indicating status or type. The name of the entity might not be obvious: A packing slip represents a shipment; a receipt represents a payment.

Sections and boxes on forms also often enclose an entity. There is often a bold line marking off sections; each section could hide an entity.

There are several copies. Look for multiple part forms, especially with copies of different colors. Go to an office supply store and look at some of the forms. Carbon or carbonless paper is used to make multiple copies of invoices, receipts, orders, etc.

There’s a file. If there is a Claims File, the odds are there will be a Claim entity. If there are filing cabinets in the office ask, “What’s in here?” The answer will be the name of an entity you need in your system.

There’s a serial number. Objects have identities. If they’ve given something an identity, it is an entity. Serial numbers and pre-numbered forms indicate an entity.

It’s important. Things the business considers important must be tracked.

In addition to its use as an analysis tool, “Model the World” is important as a design rule. If it doesn’t work that way in the world, watch out! It can be dangerous to ignore the rule. The underlying problem is demonstrated by an address change after a move. The last time I moved, I sent my bank a change of address notice, and soon enough my checking account statements began arriving like clockwork at the new address. But the statements for my savings account, credit card, IRA, and CD continued to be sent to the old address. It took three change-of-address forms to actually get all of my accounts going to the new address. This was because I was apparently in their system three different times. Since there is only one of me, they didn’t model the world.
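The bank story can be sketched as a design rule. The following is a minimal illustration, not the bank’s actual system: the class names, the customer and the addresses are all invented. The point is only that each account refers to the one customer instead of carrying its own copy of the address.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    address: str


@dataclass
class Account:
    kind: str
    owner: Customer  # a reference to the one Customer, not a copy

    def statement_address(self):
        return self.owner.address


# Made-up customer and addresses, for illustration only.
me = Customer("A. Customer", "1 Old Street")
kinds = ("checking", "savings", "credit card", "IRA", "CD")
accounts = [Account(kind, me) for kind in kinds]

me.address = "2 New Avenue"  # one change-of-address notice

print({account.statement_address() for account in accounts})
```

There is one of me in the world and one of me in the model, so a single change reaches every statement.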

I’m encouraged if I have a conflict in my model and discover it also holds in the world, because it means the model is accurate. There is often a conflict in the model between people and the roles they play. This mirrors the real world, as anyone involved in an office romance knows, or anyone juggling parenthood and employment. The roles conflict. It is complex in the real world, so a model of that world may contain the same conflicts, and that shouldn’t make you uncomfortable.

O/O Analysis sometimes goes off the track with its emphasis on messages sent between objects. There are no messages. One thing making O/O not helpful is this confusion, especially in Java where everything is an object. A number format, for example, is not an object in the real world, but it is in Java. C++ is superior in this regard because it is a hybrid, and supports both objects and procedures. Arthur J. Riel in Object-Oriented Design Heuristics has a rule: “Model the real world whenever possible”. (Italics added). He then allows: “This heuristic is often violated for reasons of system intelligence distribution, avoidance of god classes, and the keeping of related behavior and data in one place.” I would argue even these are rarely good reasons to ignore the rule:

Systems Intelligence? You’re more intelligent than Mother Nature? It’s not nice to fool Mother Nature…

Avoiding god classes? If the real gods thought there was a god class, why shouldn’t you?

Keeping related behavior in one place? Related in whose mind? Not the gods, not the world, not the users…

What if the business does not faithfully reflect the real world? The Dot.coms showed this was a real possibility. But that’s a different problem—Don’t update the model, update your résumé.

5. When is a Fact About a Thing In Fact a Thing?

A computer system’s most important function is to keep track of things: customers, bills and payments; students, teachers and courses; citizens, taxes and refunds. Computers keep track of all these things. But you need a place to keep the data, so how do you organize it? There are three categories of data we keep in traditional data modeling: entities, relationships and attributes (ERA). Object-oriented analysis adds behavior, referred to as methods, and uses different terminology: classes, relationships, attributes and methods (CRAM), with classes corresponding to entities, at least during business analysis. An entity is just “A Thing the Business needs to know about”. Since that’s not too helpful, I’ve found “Entity” is best defined by examples. Entities can be people, places, or things. They can be events, roles, or organizations. They can even be other systems. Entities can be tangible or intangible, or even conceptual, such as a cost center or an account. And they can be collections of other objects. So they can be just about anything the business cares about. A relationship is a connection of some kind between two objects that is significant to the business. And an attribute is a fact about an entity. One of the trickiest problems in analysis is determining which of these categories a piece of data falls into. For example: “Is X an entity or an attribute?” Is it a fact about a thing (and thus an attribute) or in fact a thing (in which case it’s an entity)?

Try this one. Is a telephone number an attribute, an entity, or a relationship? By now you can guess the answer is: It Depends. For most applications, it will be an attribute, which is what most people answer. But to the phone company, a number not only isn’t a mere attribute, it isn’t even just an entity—it’s at least three entities. In North America, phone numbers have three distinct parts: my number is (510)893-2943. The first part (510), as you probably know, is called the area code. Everyone knows what the other two parts are, but not many know what they’re called. They are the Exchange (893) and the Subscriber number (2943). To the phone company, the area code, exchange, and subscriber number are each important entities in themselves. For the area code, they need to track the location, whether it is public or private, whether it is toll, toll-free, or regular, and so on. For the exchange, they need its location, the PBX if any, and so forth. And for the subscriber number, they need the address, billing information, etc.
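The two views of a telephone number can be put side by side in a short Python sketch. The class and field names are my own assumptions, chosen to follow the description above, and the location and billing values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Most applications: the phone number is just a fact about the customer."""
    name: str
    phone: str


@dataclass
class AreaCode:
    """Phone company: each part of the number has facts of its own."""
    code: str
    location: str
    toll_free: bool


@dataclass
class Exchange:
    area_code: AreaCode
    code: str
    location: str


@dataclass
class SubscriberNumber:
    exchange: Exchange
    number: str
    billing_address: str

    def dialable(self):
        exchange = self.exchange
        return f"({exchange.area_code.code}){exchange.code}-{self.number}"


# Placeholder facts, invented for illustration.
area = AreaCode("510", "East Bay", toll_free=False)
exchange = Exchange(area, "893", "Oakland")
line = SubscriberNumber(exchange, "2943", "billing address on file")

print(line.dialable())
```

The same thirteen characters are a single attribute in the first model and three related entities in the second. Which is right depends on who needs to know what.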

Telephone numbers in Japan are one of those quaint things Japanese. I say “things Japanese” as an allusion to Basil Hall Chamberlain’s 1905 book Things Japanese, which was inexplicably renamed Japanese Things by Charles E. Tuttle Company in their 1971 reprint. Tuttle is usually very simpatico with their specialized audience and authors, but this is an example of a failure to understand the intent of an author: “Japanese Things” ain’t got the ring. Despite Tuttle’s rare lapse, it’s a great book. Read it! But I digress.

Japan has a surprisingly irrational phone system. To get a telephone connection, you have to tell the telephone company what the telephone number will be. No kidding, the telephone company has no telephone numbers. You have to buy a number from someone; some numbers are very expensive, and others very cheap. For example, the number four can be pronounced “shi”, which also means “death”. So a number with a lot of fours is a bad number indeed, and cheap. Eight, on the other hand, means “wealth”, so only a wealthy person could afford all eights. The Japanese language has a limited number of possible syllables, and multiple pronunciations for most words, including numbers, one from aboriginal Japanese and one from Chinese, so it’s usually easy to make a phrase out of a number, like vanity license plates in the US. Clever phrases are worth a lot. This system apparently dates back to the original phone system in the 1920s. Politicians were given blocks of phone numbers in return for favorable votes, bureaucrats for favorable decisions; since no money or property was involved it didn’t meet the definition of a bribe. But now no one who owns an expensive number wants to end the system.


* * *


There is no easy rule to determine whether to use an attribute or an entity, but I’ll give some guidelines that I’ve found helpful. After a while, you’ll get a Zen-like ability to know instantly, but it’s a hard distinction to explain.

We sometimes use Grammatical Analysis to determine which category information belongs in. Entities or Classes are usually Nouns; Relationships are Verbs; Attributes are Adjectives; and Methods are Phrases, including an action and a class, relationship, attribute or a combination thereof.

Next is the “Of Test”. Attributes are “OF” a class. If you can’t make sense of the data without an ‘of’, it’s an attribute. Name? Name of what? Name of the Employee, so Name is an attribute of Employee. Date? Date of what? Date of Receipt of Invoice, so Date is an attribute of Invoice.

Attributes have no attributes. When is an Attribute not an Attribute? When it has an attribute itself. Attributes do not have attributes. If there’s a fact about it, it’s an entity; if not, it’s an attribute. Watch out for the multi-value attribute, particularly for a status (e.g. customer status and invoice status).
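The “attributes have no attributes” rule is easiest to see with a status. In this hedged Python sketch, all names (InvoiceV1, InvoiceV2, StatusChange and their fields) are hypothetical: a status is first a plain attribute, and then, once the business wants to know when it changed and who changed it, it has facts of its own and becomes an entity.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class InvoiceV1:
    number: str
    status: str  # a plain fact about the invoice: an attribute


@dataclass
class StatusChange:
    """The status now has facts of its own, so it has become an entity."""
    status: str
    effective: date
    changed_by: str


@dataclass
class InvoiceV2:
    number: str
    history: list = field(default_factory=list)  # many values over time

    def current_status(self):
        return self.history[-1].status if self.history else None


invoice = InvoiceV2("INV-001")
invoice.history.append(StatusChange("Sent", date(2009, 1, 5), "clerk"))
invoice.history.append(StatusChange("Paid", date(2009, 1, 31), "clerk"))

print(invoice.current_status())
```

The second version also shows why the multi-value attribute deserves suspicion: as soon as a status can have several values over time, the questions “when?” and “by whom?” are close behind.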

6. Is there Organization Object Ontology?

What is the ontology of organizations? Ontology is said to be “the study of the nature of being and existence”, but my extensive research has convinced me it’s actually Greek for “You’ll get a headache if you think about this very long”. So in this section we’ll study the nature of being an organization until we understand, or get a headache, whichever comes first. Our organizations deal with many external organizations and they must be represented in our data models. The rule for data models is simple, based on Albert Einstein’s dictum concerning models in Physics:

Make it as simple as it can be,
but no simpler.


You’d think that would be easy, but it is surprisingly difficult to get a model of organizations that has as much detail as is needed, but no more than can be reasonably maintained. There are actually three parts to this problem. (Actually there is a fourth, but I can only remember three of them at any one time.) First, we deal with both organizations and people; second, they play many roles; and third, the computer cannot resolve variations in addresses.

First, a customer or a vendor might be an organization such as a corporation, firm, or government agency, or it could be a living, breathing human being. You need to keep different data, or at least structure it differently, depending on which it is. An organization name is just a name, but a person’s name has parts: first, middle and last name. It is difficult to tease these out of an unstructured field. You will also keep different attributes and relationships for organizations and people. Silverston, Inmon & Graziano in The Data Model Resource Book, and David C. Hay in Data Model Patterns, both recommend the approach of subtyping Party into Organization and Person.
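The subtyping approach can be sketched as a small class hierarchy. This is my own minimal illustration in Python, not the model from either book: only the names Party, Organization and Person come from the text, and the fields and the display_name method are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Party:
    party_id: int

    def display_name(self):
        raise NotImplementedError


@dataclass
class Organization(Party):
    name: str  # an organization name is just a name

    def display_name(self):
        return self.name


@dataclass
class Person(Party):
    first: str  # a person's name has parts
    last: str
    middle: str = ""

    def display_name(self):
        parts = [self.first, self.middle, self.last]
        return " ".join(part for part in parts if part)


# A customer list may mix both kinds of party.
customers = [Organization(1, "XYZ Company"), Person(2, "Ann", "Pollywog")]

print([customer.display_name() for customer in customers])
```

Code that only needs “a customer” works with Party, while code that needs the parts of a name works with Person, so the name never has to be teased out of an unstructured field.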


* * *


The next problem is that of roles. The same party, to use the terminology of the two books, can play many roles. A vendor or an employee might also be a customer, for example. Consider the baffling variety of roles outside organizations can have with Amazon.com. One person or company might play many of these roles at the same or different times. Let’s take a high-level look at the parties involved in Amazon.com’s business:

Associate
An associate links eyeballs from an outside website, and gets paid a commission when a purchase is made by the customer who followed the link.
Authorized Merchant
Includes zShops and auctioneers. Amazon serves as an agent, and collects and transmits the money, less a fee for the service.
Author
In addition to writing the book, the author provides content, such as reviews and interviews.
Bank
Makes ePayments.
Carrier
UPS, USPS, FedEx. Responsible for the delivery; Amazon sometimes will receive a tracking number which is given to the customer via Email and recorded as part of the order status.
Credit Card
MasterCard, Visa, AmEx, etc. Credit card companies are the main source of money income, as most purchases are paid for with credit cards.
Customer
The consumer. Gets order fulfillment information and promotions (ads and coupons). Customers write reviews, and provide other content such as recommendation lists.
Distributor
Supplies the product, but doesn’t make it; either imports or warehouses it. Might ship direct in the case of trusted distributors, but usually sends the items to Amazon for re-shipment. For books, traditional book distributors such as Ingram are crucial to Amazon’s business.
Honor System Member
Visitors to the member’s website decide to pay the member on the honor system. Amazon collects and transmits the payments. The honor system member is charged 15¢ per transaction plus 15% of the payment amount.
Manufacturer
Makes an item Amazon.com sells. Might or might not supply it to Amazon.com, could co-brand or co-advertise, or pay for placement at the top of the search list.
Publisher
Publishes the books. Note Manufacturer and Publisher fill the same role.
“Trusted Partner”
Amazon.com sends eyeballs, and must track them. In addition to the products they sell, Amazon has teamed with Toys R Us and CarsDirect.com to sell toys and automobiles, for example. They can apparently cover some of the costs of advertising and website development by steering customers to these other companies, turning technology into a revenue-producing product.
Vendor/Supplier
Supplies items that are needed for business, not for re-sale.
Warehouse
Amazon warehouses, the critical component in order fulfillment. Fills the order, sends shipping information to the carrier for pickup. Some book distributors effectively fill this role.

In a traditional design, each role would be represented as a different entity type and eventually database table, so a party filling many roles would appear in the database many times. We can separate the parties from the roles to avoid this redundancy, but at a cost in complexity.
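Separating parties from roles can be sketched as follows. This is a deliberately small Python illustration with invented names and addresses; the assign function and the parties dictionary are hypothetical helpers, not part of any published model.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    name: str
    address: str
    roles: set = field(default_factory=set)  # one party, many roles


parties = {}  # keyed by name for simplicity; a real system needs a better key


def assign(name, address, role):
    """Record that a party plays a role, creating the party only once."""
    party = parties.setdefault(name, Party(name, address))
    party.roles.add(role)
    return party


# Made-up parties, using a few of the roles listed above.
assign("XYZ Company", "1 Main St", "Distributor")
assign("XYZ Company", "1 Main St", "Customer")
assign("Ann Pollywog", "2 Elm St", "Author")
assign("Ann Pollywog", "2 Elm St", "Customer")

customers = [p.name for p in parties.values() if "Customer" in p.roles]
print(len(parties), customers)
```

Each party appears once however many roles it plays, which removes the redundancy. The cost shows up in the last step: there is no longer a Customer table to read, so even the simplest question has to go through the roles.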


* * *


The third problem is that of address variations. You will easily recognize that XYZ Company is both a vendor and a customer even though the attention line on one address is “Ann Pollywog, Accounts Payable” and on the other is “Alvin Richards, Accounts Receivable”, but to the computer they are different addresses and have to be tracked differently. Say a company is both a vendor and a customer. In all probability the two records will differ at the bit and byte level: you’ll be dealing with the accounts payable department on the customer side and accounts receivable on the vendor side. The ATTN: line will be different, the department name will be different; in some companies even the city will be different. So are they the same organization and thus the same entity? Since one of the goals of a single entity is to change both addresses at the same time, this might not work, because one address might change and the other not. This goes a step further than my bank problem, where in a single role I want all addresses the same (maybe): I want checking, savings, CD, and IRA all to change at the same time and be kept in sync.

Now unfortunately this is indeed an inherently complex problem. If you’re not confused, you haven’t been paying attention. Silverston et al. solve it with a data model that requires 12 entity types plus 5 subtypes and 18 relationships, for something that is sometimes tracked as a single entity type! Is it really worth it? It probably won’t save space, because the overhead of the entities and relationships will probably outweigh the redundant data. And it’s pretty complex, so programmers will make errors and users will make errors, and you’ll probably be worse off anyway.

Since Silverston is indeed correct in his model, why doesn’t it work well?

Computers aren’t smart enough. Even a bored distracted clerk will identify two similar but slightly different addresses as the same; computers cannot yet make these simple distinctions, requiring manual intervention to make the judgment. And programmers aren’t smart enough. Even though most humans deal with these problems every day and don’t even see them as a problem, programming them has been impossible up until now. Perhaps someday AI will advance to a point this will no longer be true, but until then, the Systems Analyst is the key player in making systems successful.
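The clerk’s advantage over the computer can be demonstrated in a few lines of Python. The normalize function is a naive cleanup of my own invention, and the street address is made up; only the two attention lines come from the example above.

```python
import re


def normalize(address):
    """Naive cleanup: lowercase, drop punctuation, squeeze whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


payable = "XYZ Company, ATTN: Ann Pollywog, Accounts Payable, 1 Main St."
receivable = "XYZ Company, ATTN: Alvin Richards, Accounts Receivable, 1 Main St."
sloppy = "xyz company  attn ann pollywog accounts payable 1 main st"

# Byte-for-byte comparison: the clerk sees one address, the computer sees two.
exact_match = payable == sloppy

# Cleanup rescues differences in case, punctuation and spacing...
cleaned_match = normalize(payable) == normalize(sloppy)

# ...but cannot tell that two departments belong to the same company.
same_company = normalize(payable) == normalize(receivable)

print(exact_match, cleaned_match, same_company)
```

Mechanical cleanup handles the trivial variations. Deciding that Accounts Payable and Accounts Receivable at the same street address are one organization is a judgment, and that is the part that still needs a person.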

7. What is the Metaphysics of Metadata?

I never Metadata I didn’t Like

Metadata is surely a Zen-like concept, truly metaphysical. The meta- part of metaphysics is from the Greek, meaning “beyond” or “transcending”, and the physics as in physical. In a 1569 collection of Aristotle, the works on the nature of being (ontology), space (cosmology) and knowing (epistemology) were sequenced after the works on physical topics, so were beyond physics, or “metaphysics”. It was apt, since these subjects were also beyond physics intellectually, and the name stuck.

The meta prefix has been applied to subjects like linguistics, where a metalanguage is a language about languages. Metadata uses this sense, in that metadata is “data about data”: definitional data about tables, records, and other data structures, attributes or fields and their types, domains, etc. It’s information about the nature of the data to be kept, rather than the data itself. The distinction between data and metadata is especially important because most IT Departments are organized so that the applications programmers and DBAs (Database Administrators) are in different reporting lines, often even reporting up to different Vice Presidents. The data is in the realm of programmers and users, while the Data Administrators and DBAs are responsible for the metadata. So if the customer’s name is misspelled, it’s a data problem and the user should fix it. If the customer’s name is missing, it’s a data integrity problem and the programmer should fix the program to require the name. If the field will only accept a two-character name, it’s a metadata problem and the DBA should fix it. If the customer’s name is kept in incompatible formats in different records, it’s the data administrator’s problem.

Metaphysical can be a derogatory term meaning abstract, abstruse, or subtle, and surely many data administrators have been justly accused of that. Metadata, like metaphysics, involves exploring the relation between mind and matter, substance and attribute, fact and value. It is concerned with the fundamental nature of reality and being; some metadata and metaphysical discussions will sometimes be indistinguishable.

I was once at a DAMA conference where someone had a sheet of music and someone jokingly referred to it as “metamusic”. On reflection, though, I realized that was not the appropriate name, since the sheet music was about music whereas metamusic should be music about music. So The Tennessee Waltz would be a metasong. The words tell of another dance at another time when the singer “was dancing with my darling to the Tennessee Waltz”, so this is a song about another song. Begin the Beguine is even clearer, since the song is a swing and the Beguine is a Tropical Waltz. Other metasongs are The Land of 1,000 Dances and Blame it on the Bossa Nova.


* * *


Database languages, such as SQL, recognize this distinction by having a DDL (Data Definition Language) to CREATE and DROP TABLES and other objects; and DML (Data Manipulation Language) to SELECT, INSERT, UPDATE, and DELETE rows in tables. Most DBAs will jealously defend their right to exclusively maintain the metadata. Observe what is covered in SQL.

DDL—Data Definition Language

-- One row per teacher, identified by SSN.
CREATE TABLE TEACHER
(
    SSN         char(11) NOT NULL,
    LastName    char(20),
    FirstName   char(15),
    Street      char(35),
    City        char(20),
    State       char(2),
    Zip         char(5),
    Phone       char(13),
    Rate        INT,

    PRIMARY KEY (SSN)
);

-- One row per course in the catalog.
CREATE TABLE COURSE
(
    ID          char(11) NOT NULL,
    Title       char(20),
    Units       INT,        -- a number, to match the value inserted below
    HSorCollege char(2),    -- high school or college; the sample row uses 'C'

    PRIMARY KEY (ID)
);

-- One row per scheduled offering of a course, taught by one teacher.
CREATE TABLE SESSION
(
    ID          char(11) NOT NULL,
    StartDate   char(8),
    EndDate     char(8),
    Time        char(25),
    Site        char(12),
    TeacherSSN  char(11),
    CourseID    char(11),

    PRIMARY KEY (ID),
    FOREIGN KEY (TeacherSSN) REFERENCES TEACHER (SSN),
    FOREIGN KEY (CourseID) REFERENCES COURSE (ID)
);
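The DDL does more than build empty boxes: the database files the definitions away in its own catalog, where they can be read back like any other data. Here is a minimal sketch of that idea, assuming Python's built-in sqlite3 module and a cut-down TEACHER table with only three of the columns above. The PRAGMA used is specific to SQLite; other databases expose their catalogs differently.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")

# DDL: a cut-down version of the TEACHER table (three columns only).
conn.execute("""
    CREATE TABLE TEACHER (
        SSN      char(11) NOT NULL,
        LastName char(20),
        Rate     INT,
        PRIMARY KEY (SSN)
    )
""")

# Metadata: rows that describe the TEACHER table, not any teacher.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(TEACHER)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "notnull:", notnull, "pk:", pk)

# Data: no DML has run yet, so the table is fully described but empty.
row_count = conn.execute("SELECT COUNT(*) FROM TEACHER").fetchone()[0]
print("rows:", row_count)
```

The table exists, complete with names, types and a key, before a single teacher has been recorded. That is the metadata standing on its own.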

DML—Data Manipulation Language

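-- Naming the columns keeps each INSERT valid even if the table later gains a column.
INSERT INTO TEACHER
    (SSN, LastName, FirstName, Street, City, State, Zip, Phone, Rate)
    VALUES ('123-45-6789',
        'McDermott', 'Patrick',
        'PO Box 20689',
        'Oakland', 'CA', '94620',
        '(510)893-1234',
        125);
INSERT INTO COURSE
    (ID, Title, Units, HSorCollege)
    VALUES ('CIS 234A',
        'PHP, SQL & HTML',
        3,
        'C');
INSERT INTO SESSION
    (ID, StartDate, EndDate, Time, Site, TeacherSSN, CourseID)
    VALUES ('SS8833B1',
        '1/1/09',
        '1/31/09',
        'M - F 9:00 - 12:00',
        'Bernal',
        '123-45-6789',
        'CIS 234A');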
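The listing above exercises only INSERT. The other three DML verbs can be sketched the same way, again assuming Python's sqlite3 module and cut-down versions of the TEACHER and SESSION tables. The new rate of 150 is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down TEACHER and SESSION tables, loaded with the sample rows.
conn.executescript("""
    CREATE TABLE TEACHER (
        SSN char(11) NOT NULL, LastName char(20), Rate INT,
        PRIMARY KEY (SSN));
    CREATE TABLE SESSION (
        ID char(11) NOT NULL, Site char(12), TeacherSSN char(11),
        PRIMARY KEY (ID),
        FOREIGN KEY (TeacherSSN) REFERENCES TEACHER (SSN));
    INSERT INTO TEACHER (SSN, LastName, Rate)
        VALUES ('123-45-6789', 'McDermott', 125);
    INSERT INTO SESSION (ID, Site, TeacherSSN)
        VALUES ('SS8833B1', 'Bernal', '123-45-6789');
""")

# SELECT: read rows, following the foreign key from SESSION to TEACHER.
joined = conn.execute("""
    SELECT s.ID, s.Site, t.LastName
    FROM SESSION AS s
    JOIN TEACHER AS t ON t.SSN = s.TeacherSSN
""").fetchall()
print(joined)

# UPDATE: change a value in an existing row (150 is a hypothetical rate).
conn.execute("UPDATE TEACHER SET Rate = ? WHERE SSN = ?", (150, "123-45-6789"))
new_rate = conn.execute("SELECT Rate FROM TEACHER").fetchone()[0]
print(new_rate)

# DELETE: remove a row.
conn.execute("DELETE FROM SESSION WHERE ID = ?", ("SS8833B1",))
remaining = conn.execute("SELECT COUNT(*) FROM SESSION").fetchone()[0]
print(remaining)

# The metadata is untouched by any of this: both tables are still defined.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Rows come and go, but the tables themselves stay put until someone reaches for the DDL again.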

HTML meta tags also illustrate the definition. The tags carry meta-information, that is, information about the information on the webpage. Common meta names are keywords, author and description. The keywords tag, for example, says what categories the information on the page belongs in.
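To see meta tags at work, here is a small sketch that pulls them out of a page using Python's standard html.parser module. The page is invented for the illustration: the title and author come from this book, but the keywords and description values are hypothetical.

```python
from html.parser import HTMLParser

# A made-up page; the keywords and description values are hypothetical.
PAGE = """<html><head>
<title>Zen and the Art of Systems Analysis</title>
<meta name="author" content="Patrick McDermott">
<meta name="keywords" content="systems analysis, data modeling, metadata">
<meta name="description" content="Meditations on computer systems development.">
</head><body><p>The information itself goes here.</p></body></html>"""


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                self.meta[attributes["name"]] = attributes.get("content", "")


collector = MetaCollector()
collector.feed(PAGE)

# Only the information about the page is collected, never its body text.
for name, content in collector.meta.items():
    print(f"{name}: {content}")
```

A search engine or catalog can read these three lines without ever looking at the body of the page.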

A breakthrough in metadata would be a metaphysical advance in computing. XML, the Extensible Markup Language, is designed to do just that, but progress has been slow. Despite the family resemblance, XML is not an extension of HTML: both descend from SGML, and XML lets authors define their own tags to describe their own data. Other advances are XMI, which is XML Metadata Interchange, and CWM, the Common Warehouse Metamodel.
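What XML brings is metadata that travels with the data. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a teacher record whose tag names I have made up for the purpose. They do not come from any published schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical teacher record; the tag names are invented for this sketch.
DOC = """<teacher ssn="123-45-6789">
  <lastName>McDermott</lastName>
  <firstName>Patrick</firstName>
  <rate>125</rate>
</teacher>"""

root = ET.fromstring(DOC)

# The tag names are the metadata: they say what each value is.
field_names = [child.tag for child in root]
print(field_names)

# The element text is the data itself.
values = {child.tag: child.text for child in root}
print(values)

# The receiver can discover the structure without any prior CREATE TABLE.
print(root.tag, root.attrib["ssn"])
```

Note what the document does not tell the receiver: whether rate is per hour or per day, or in what currency. The tags name the fields, but the two parties still have to agree on what the names mean.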

In The Death of “e” and the Birth of the Real New Economy, Fingar & Aronica note the need to communicate metadata if e-business is to reach its potential: “Metadata interchange is essential to the continued growth of e-business, as data from increasingly diverse sources and applications needs to be exchanged both within the enterprise, and more importantly outside the enterprise across the value chain.” So far, we exchange just text. We need a common semantics between companies, and we can’t even manage that within a company, so the breakthrough is not yet on the horizon.

8. Who am I?

Who am I? Sounds pretty straightforward, but it can be mind-numbingly complex. The complexity is not analytical in origin, psycho- or systems, but rather legal. Here’s my situation. I am a person, as you probably guessed, and I also have a small business that I started in 1996. In 1997, I incorporated my business, but I had a book contract and so I left the book with the small business and the rest of the business (consulting) shifted to the corporation. The corporation is called McDermott Computer Decisions, Inc., but I call it MCD, Inc. for short. The corporate stockholders are my girlfriend and I, and we are also corporate officers: I’m CEO, she’s CFO, I’m Treasurer, she’s General Secretary, I drive, she reads the map, she makes the pasta, I eat it, etc., etc.

Not that complicated yet, until the IRS gets involved. I, Patrick the person, file a 1040 that represents me the human being. My royalties and book expenses go on a Schedule C or E, representing me the author. And the corporation files an 1120S. So every time I spend any money I have to go through the “Who am I?” routine. For example, when me the person is running a little short of cash, I can ask me the President of MCD, Inc. for an advance, which I invariably approve. Or I can suggest to me the President that we declare a dividend and then vote as me the stockholder to ratify it. I direct me the Treasurer to write a check to me the stockholder, which I the person endorse and deposit into the bank account of me the person, withdraw some money and am on my way. I have to write or sign my name a dozen times to complete one of these transactions!

And of course, when I buy something, I have to figure out who (as in which me) bought it. Could it have been me the person? I hope not, since then it isn’t tax deductible. What about a book? Am I using it as a reference for a book I’m writing, or does the corporation want it as a consultant’s reference? Often, it’s both. But if I bought it with my money, I have to get me to reimburse me. I haven’t gone psycho yet, but I’m afraid one day I might disapprove one of my requests and start a big fight between me.

Of course the IRS also defines a person a bit differently from most people. For example, there is no such thing as an impersonal corporation because every corporation is a “Legal Person”. And a married couple filing a joint return is treated as one taxpayer, not two. That’s why each spouse is responsible for the whole tax bill on that return: as far as the IRS is concerned, they are practically the same person.

These distinctions may seem silly, but in fact they are important in data modeling and computer systems, not to mention in a tax audit. If your database is not designed to carry both spouses’ names in a tax system, or doesn’t recognize that corporations are persons, your system might not be able to fulfill its requirements, or might be unnecessarily difficult to program. And you’ll find Zen is probably the only way to understand the complexity. So watch for the sequel to this book.
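One common way out of the “Who am I?” tangle is to model a Party that can be a human, a corporation or a couple filing jointly, and to treat Author, President and Stockholder as roles a party plays rather than as separate people. What follows is only a sketch in Python under that assumption. The class names, the two helper functions and the placeholder spouses are all hypothetical. The form numbers for the human and the corporation are the ones mentioned above, and the joint entry assumes a joint return also goes on a 1040.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PartyKind(Enum):
    HUMAN = "human"
    CORPORATION = "corporation"      # a "Legal Person"
    JOINT_FILERS = "joint filers"    # a couple treated as one taxpayer


# Which return each kind of party files.
TAX_FORM = {
    PartyKind.HUMAN: "1040",
    PartyKind.CORPORATION: "1120S",
    PartyKind.JOINT_FILERS: "1040",
}


@dataclass
class Party:
    name: str
    kind: PartyKind
    members: List["Party"] = field(default_factory=list)  # used by joint filers


@dataclass
class Role:
    title: str
    played_by: Party     # who is acting
    on_behalf_of: Party  # which party the act belongs to


def roles_of(party: Party, roles: List[Role]) -> List[str]:
    """Every hat the given party wears."""
    return [role.title for role in roles if role.played_by is party]


def names_on_return(party: Party) -> List[str]:
    """A joint return carries every member's name; anyone else just their own."""
    if party.kind is PartyKind.JOINT_FILERS:
        return [member.name for member in party.members]
    return [party.name]


patrick = Party("Patrick McDermott", PartyKind.HUMAN)
mcd = Party("MCD, Inc.", PartyKind.CORPORATION)
roles = [
    Role("Author", patrick, patrick),
    Role("President", patrick, mcd),
    Role("Treasurer", patrick, mcd),
    Role("Stockholder", patrick, mcd),
]

# Placeholder spouses, purely to show a joint return carrying two names.
couple = Party("Spouse A & Spouse B", PartyKind.JOINT_FILERS,
               [Party("Spouse A", PartyKind.HUMAN),
                Party("Spouse B", PartyKind.HUMAN)])

print(roles_of(patrick, roles))
print(TAX_FORM[mcd.kind], names_on_return(mcd))
print(TAX_FORM[couple.kind], names_on_return(couple))
```

With roles held apart from parties, “which me bought it?” becomes a question the data can answer: record the purchase against a role, and the party that gets the deduction follows from it.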

The End



 Zen & the Art of Systems Analysis 


Copyright ©2003, 2008 Patrick McDermott