What is XML - RDBMS vs. Native XML Storage

RDBMS vs. Native XML Storage

Where To Store XML-Data ?

• File Systems
• Cumbersome search and retrieval
• Entire documents only - or- Doc./Content Mgmt. System required
• Object DBMSs
• Practical hierarchical approach for a few users, but not scalable for Web applications with thousands of users
• XML-Enabled RDBMSs, Object-Relational DBMS
• Suitable for flat structured content, integrity lost due to decomposition
• Storing and retrieving XML-docs -> modelling & programming effort
• XML DBMS
• Dedicated to extensible XML structures -> fastest query responses

Relational database management systems (RDBMS) with their table-based data model are very poorly suited to the hierarchical, interconnected nature of structured XML content. Relational databases disassemble the XML objects in order to fit them into their tabular architecture. As a result the XML object's structure and semantics are either lost, minimising its value, or they must be duplicated in the design of the database. Duplicating the structure and semantics of complex XML objects in the design of the database is very difficult, particularly if the structure of the XML data is variable, as it almost always is. The rigidity of the relational design is a poor fit with the dynamic
assembly and manipulation of XML data. Relational databases also cannot handle object-level locking; the best they can provide is row-level locking. Since relational databases decompose XML elements into various tables, linked via keys, it is very difficult to implement an effective locking scheme that doesn't dramatically hinder concurrent use and scalability. In concurrent editing environments, there will be an increase in demand for related objects from disparate users. Relational databases respond by locking entire rows across multiple tables. This can cause unacceptable performance degradation if multiple users are requesting different objects that are locked via this broad locking scheme. If the DBA responds by separating the information into a larger number of more granular tables, the performance is degraded by the number of joins required to model the richly linked structure of
structured XML content. In addition, relational databases are typically too heavyweight to form an infrastructure for
embeddable storage and require substantial development to adapt to the complex structure of structured XML
content. Quite simply, relational databases, while excellent for many purposes, are not architecturally compatible with the storage needs of XML data.

Analyst Quote on XML Storage in RDBMS
DB2 UDB (Universal Database) has now implemented support for XML that provides automatic mapping to
and from a normalized relational construct, or alternatively permits XML messages to be stored intact as ”BLOBs".
While this is a leading implementation of XML in the relational world, it does not match up to the native XML processing capabilities of a specific XML database such as Software AG's Tamino XML Database.
Source: IDC - IT Integration Strategies, Dec. 2000

Relational DBMSs and XML

Data storage

The data is stored as BLOB (no / very limited queries w/ extra tools)
The data must be normalized and stored in multiple tables
Separately programmed trigger required for every doc-subtree
Recreating XML docs from stored content needs separate stylesheet for appropriate display of every subtree.

Relational DBMSs do not understand XML or XML Query Language

Additional processing layer required
Continuous bi-directional format & structure conversion overhead
XSQL allows to retrieve (but not store) relational data in XML format
SQL-based queries have a huge overhead
XML doc tree hierarchies can only be queried with complex joins
XML information is held outside the database kernel
Potential referential integrity exposure
Transformation is a Slow Expensive Process

First we’ll look at the start up process. Each different type of XML object or document has a DTD. For the sake of discussion whenever I say DTD it can mean either DTD or schema. The sample DTD is a very generic one for any kind of tech document or manual consisting of chapters, sections, subsections, paragraphs, etc.
Each end node of this DTD has to be mapped to a row and column in a set of tables that is a relational database. This process is called creating a data model and the result is an entity relationship diagram or ERD.

The ERD defines the parent/child nature of the nodes up through every level of the DTD Header data from the DTD also has to be mapped into the tables.
Decisions will have to be made on which fields should be indexed to improve retrieval times. Any filed not indexed has to be searched sequentially to find a specific record. However indexing more fields increases the size and overhead of a relational database.
Finally the table structure has to be normalized to eliminate any duplication of fields.