ChernoWiki.ru: web collaboraion (powered by CoWiki)
Current user:   guest    Change     Preferences 
   List directory   History   Similar   Print version 
root 
dev 
   PostgreSQL 
   PHP 
   XML 
   DHTML 
   Design Patterns 
   Graph in RDBMS 
   Social networks 
   Adaptive Web 
   Wiki 
   Philosophy 
os 
   Unix, Linux, FreeBSD 
other 
   Hello 
   searchers 



dev > PostgreSQL > [SoC Proposal] Initial support of XMLType for PostgreSQL

 
rw-rw-r--   ns   wheel

[SoC Proposal] Initial support of XMLType for PostgreSQL

Summary

Primary goal is introduction of special type support for storing XML data in ORDBMS PostgreSQL, querying this data and modifying it. This project is intended to develop manipulation abilities rather than special storage engine (VARCHAR as initial storage implicit type).

At the moment there is no good general vision of most suitable storage for XMLType. Moreover, from my point of view, DBMS should have support of different index types for XMLType - every for its special purpose. And which is more important is an open question. That's why I propose to work on 'external' things rather than 'internals' (data structure for index) and strictly follow standards. But anyway, I've included path index (#7 in the list of Deliverables), because now I suppose that it is most expectative type of index (this item is optional, because here community's feedback is highly needed).

Deliverables

  1. Ability to define any column as of XMLType. Initially, this means that only well-formed XML documents could be stored in such a column.
  2. Automatic validation of XML documents being inserted/modified against XML schema, if definition of column contains it (reference to it). DTD and/or XML Schema could be used for this.
  3. Subset of SQL/XML standard [1] for mixing relational and XML data in queries. This includes at least following: XMLELEMENT, XMLAGG, XMLFOREST, XMLCONCAT expressions; implementation of mapping rules for basic types. (See Project Details for more details).
  4. XML domains support: possibility to define domain based on XMLType, using XML schema (DTD / XML Schema).
  5. Basic XPath support (existing experience - contrib/xml2 module - should be taken into account).
  6. Basic XSLT support (existing experience - contrib/xml2 module - should be taken into account).
  7. Path indexes for fast retrieval of XML documents (queries with XPath expressions in WHERE clause). [OPTIONAL]
  8. Documentation (definition rules for XMLType, SQL/XML expressions, etc).

Project Details

XML support in relational database is rather difficult but very promising task. Now PostgreSQL has following abilities:

- Contrib module xml2 (contrib/xml2), which gives possibility to work with VARCHAR records, using XPath and XSLT expressions (in other words, it is a kind of emulation of XML support) [2].

- Patch for basic SQL/XML support, developed by Pavel Stehule [3].

- Several dead projects outside CVS tree.

In other words, there is no native support of special type for XML storage.

First of all, it's worth to say that major vendors of commercial (O)RDBMSes have been working hard on XML support for several years. There is a great experience, which could be taken by analysing ways that they follow and the results. Currently I work on such analysis (as a part of my PhD research activity). The goal is to compare XML support implementations in latest versions of SQL Server, Oracle and DB2. This research consits of three parts (articles), and the first one is scheduled on the end of May. (These comments don't explicitly concern the SoC contest, but this research would potentially produce useful knowledge to be used in this work.)

Surely, at the moment storage engine for XML in new version of DB2 ('Viper', currently is beta) seems very attractive [10]. But such an implementation is very huge problem and definitely couldn't be done by one person in few months (actually, this is almost as huge task as cretation of new DBMS). Instead of this, I propose to implement basic features, to obtain some initial support of standard language constructions and to obtain basic possibilities for work with XML data. This could be good basis for future work and attract more people (developers) to work in this direction.

Basic idea is to concentrate efforts on syntax and on things that PostgreSQL already has and to create a good foundation by implementing / improving / combining these things. It's worth to say, that contrib/xml2 uses Expat for parsing documents, which have its pros and cons, comparing with libxml2 ( e.g., better restrictions for document size [9], but absence of validation abilities). So, at the moment I'd prefer to not restrict myself choosing libraries to be used.

It's worth to say, that this project could not be performed as contrib module. The reason lies in SQL/XML expressions ('XMLELEMENT(NAME ..., ...)') - changes in grammar are needed [13].

So, only a strict approach could be accepted by PostgreSQL community. I hope that analysis of other projects experience, intensive communication with PostgreSQL Community, knowledge of SQL:200n standard papers and Oleg's supervision will help me to achieve a good results.

Path indexes support could be implemented as table of pairs (path, value) and intended to improve performance of statements with XPath expressions in WHERE clause. Different paths labeling techniques could be used (for example, improved prefix labeling scheme, which is better than MASS [6] and was developed in my institute). Syntax for definition of such index would be as following:

CREATE INDEX name ON table USING XPATH (column);

Project Schedule

  • End of May - review of XML support in commercial DBMSes (the article mentioned above; doesn't below to the SoC and will be created anyway), talks in PostgreSQL Community mailing lists (for points of view have being established and details have been understood).
  • End of June - alpha code (strictly: items #1-4 from the list of Deliverables; optionally: items #5-7).
  • End of August - improved code (fixed bugs, left items from the list). Documentation.

Benefits to PostgreSQL community

  1. Support of special type for XML storage, according standards.
  2. Extention of the list of SQL:200n features supported by PostgreSQL.
  3. Good basis for future work on XML type support (major benefit).

Future work

I see two ways to continue the work on XMLType support. The first one is 'MS-like' [11, 12], when DBMS stores XML document as is and, additionally, represents the data in parsed format ('primary' index , which could be with shredding of XML document into relational tables - actually, this is redundant rather useful for many purposes storage). Moreover, many traditional (for native XML DBMSes) and innovatory techniques would be implemented for better perfomance [5, 6, 7, 8]. The second one is 'native storage' way - as descibed above, seems to be the hardest way, but may lead to better results.

Two comments here. First of all, Oleg Bartunov and Teodor Sigaev just committed inverted index (Gin) support, which looks very attractive. So, despite the way we go, GiST and Gin will allow to implement a lot of index types and will help a lot. Also, it would very useful to have full text search abilities for XML documents (again, according standard and using Gin and GiST). The second comment is a little idea that it would be useful to make a similar research, comparing native XML DBMSes (actually, several papers on this theme already exist) and to try to catch some benefits from them (initial crazy idea: to work with XML DBMS using two-phase commit protocol gaining benefits from native XML storage; open question is mutual execution plan).

Other activities would include XQuery support (would be a great target point for following versions), XML views / updatable XML views support (I have my own idea on this), and many things that inevitably follow by such a great project (encoding issues, namespaces, better support of standard expressions, etc).

About me

Name: Nikolay Samokhvalov.

Contact information: pgsql@samokhvalov.com; ICQ: 50000252; skype: samokhvalov

I am a PhD student of Moscow Institute of Physics and Technology (MIPT, http://phystech.edu), Moscow, Russia.

Scientific mentor: Dr. Sergey Kuznetsov , Institute for System Programming, Moscow, Russia (http://ispras.ru).

Fields of interests: Relational Theory, SQL:200n standard, XML. Current research activity includes investigation of methods for parallel work with relational and XML data, analysis of approaches and current implementations for XML support in (O)RDBMSes. My undergoing PhD thesis work is devoted to 'XML support in (O)RDBMSes' and this work on PostgreSQL's XML support could help and (vice versa) gain a lot from my PhD activity.

References

[1] Fresh draft of SQL:200n: http://www.wiscorp.com/sql200n.zip

[2] 'PostgreSQL and XML', presentation of contrib/xml2:

http://www.mtsu.edu/~csal/spring.06/cs6560/PostgreSQLandXML.ppt

[3] Patch for initial SQL/XML support by Pavel Stehule: http://www.pgsql.ru/db/mw/msg.html?mid=2096818

[4] SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards

[5] XISS system (Li, Moon - advanced interval indexes): http://www.cs.arizona.edu/xiss/

[6] MASS (prefix indexes): http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html

[7] Ctree: A Compact Tree for Indexing XML Data: http://www.cs.ucla.edu/~sliu/papers/WIDM04.pdf

[8] Staircase joins for PostgreSQL (accelerating XPath Evaluation): http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf

[9] XML parsers benchmark: http://xmlbench.sourceforge.net/results/benchmark200402/

[10] XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf

[11] XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf

[12] Labeling schema in SQL Server (ORDPATHs): http://portal.acm.org/citation.cfm?id=1007686

[13] Research 'SQL/XML in PostgreSQL': http://www.cs.utwente.nl/~hiemstra/courses/reports/sqlxml.pdf

[14] Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo

5056



Reference [SoC Proposal] Initial support of XMLType for PostgreSQL
http://chernowiki.ru/Dev/SoCProposalInitialSupportOfXMLTypeForPostgreSQL


Prev. Regexps in PostgreSQL   [SoC Proposal] Full Text Search support in PostgreSQL GUI Tools Next


Russian Internet News | AJAX news | avianews

Driven by coWiki 0.3.4 (Boron) web collaboration tool. Processed in 0.078 sec.