This document describes the design of the POIFS system. It is organized as follows:
This document is written as part of an iterative process. As that process is not yet complete, neither is this document.
The design of POIFS is not dependent on the code written for the proof-of-concept prototype POIFS package.
As usual, the primary considerations in the design of POIFS involve the classic space-time tradeoff. In this case, the main consideration is minimizing the memory footprint of POIFS. POIFS may be called upon to create relatively large documents, and in a web application server it may be called upon to create several documents simultaneously. It will also likely co-exist with other Serializer systems, competing with those systems for space on the server.
We've addressed the risk of being too slow through a proof-of-concept prototype. This prototype for POIFS read an existing file, decomposed it into its constituent documents, composed a new POIFS from those documents, and wrote the POIFS file back to disk. The output file, while not necessarily a byte-for-byte image of the input file, was then verified to be readable by the application that generated the input file. This prototype proved to be quite fast, reading, decomposing, and re-generating a large (300K) file in 2 to 2.5 seconds.
While the POIFS format allows great flexibility in laying out the documents and the other internal data structures, the layout of the filesystem will be kept as simple as possible.
The design of the POIFS is broken down into two parts: discussion of the classes and interfaces, and discussion of how these classes and interfaces will be used to convert an appropriate Java InputStream (such as an XML stream) to a POIFS output stream containing an HSSF document.
Classes and Interfaces
The classes and interfaces used in the POIFS are broken down as follows:
| Package | Contents | 
|---|---|
| net.sourceforge.poi.poifs.storage | Block classes and interfaces | 
| net.sourceforge.poi.poifs.property | Property classes and interfaces | 
| net.sourceforge.poi.poifs.filesystem | Filesystem classes and interfaces | 
| net.sourceforge.poi.util | Utility classes and interfaces | 
The block classes and interfaces are shown in the following class diagram.
                             
                        
| Class/Interface | Description | 
|---|---|
| BATBlock | The BATBlock class represents a single big block containing 128 BAT entries. Its _fields array is used to read and write the BAT entries into the _data array. Its createBATBlocks method is used to create an array of BATBlock instances from an array of int BAT entries. Its calculateStorageRequirements method calculates the number of BAT blocks necessary to hold the specified number of BAT entries. | 
| BigBlock | The BigBlock class is an abstract class representing the common big block of 512 bytes. It implements BlockWritable, trivially delegating the writeBlocks method of BlockWritable to its own abstract writeData method. | 
| BlockWritable | The BlockWritable interface defines a single method, writeBlocks, that is used to write an implementation's block data to an OutputStream. | 
| DocumentBlock | The DocumentBlock class is used by a Document to hold its raw data. It also retains the number of bytes read, as this is used by the Document class to determine the total size of the data, and is also used internally to determine whether the block was filled by the InputStream or not. The DocumentBlock constructor is passed an InputStream from which to fill its _data array. The size method returns the number of bytes read (_bytes_read) when the instance was constructed. The partiallyRead method returns true if the _data array was not completely filled, which may be interpreted by the Document as having reached the end of file point. Typical use of the DocumentBlock class is like this: while (true) | 
| HeaderBlock | The HeaderBlock class is used to contain the data found in a POIFS header. Its IntegerField members are used to read and write the appropriate entries into the _data array. Its setBATBlocks, setPropertyStart, and setXBATStart methods are used to set the appropriate fields in the _data array. The calculateXBATStorageRequirements method is used to determine how many XBAT blocks are necessary to accommodate the specified number of BAT blocks. | 
| PropertyBlock | The PropertyBlock class is used to contain Property instances for the PropertyTable class. It contains an array, _properties, of 4 Property instances, which together comprise the 512 bytes of a BigBlock. The createPropertyBlockArray method is used to convert a List of Property instances into an array of PropertyBlock instances. The number of Property instances is rounded up to a multiple of 4 by creating empty anonymous inner class extensions of Property. | 
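
The read loop described for DocumentBlock can be sketched as follows. This is a minimal illustration under stated assumptions, not the POIFS source: DocumentBlockSketch is a simplified stand-in written only from the description above (a 512-byte _data array, a size method, and a partiallyRead method), and the readAll helper and the choice to skip a block that read zero bytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for DocumentBlock, written only from the description above.
class DocumentBlockSketch {
    static final int BIG_BLOCK_SIZE = 512; // a big block is 512 bytes

    private final byte[] _data = new byte[BIG_BLOCK_SIZE];
    private final int _bytes_read;

    // Fill _data from the stream; stop early if the stream runs out.
    DocumentBlockSketch(InputStream stream) throws IOException {
        int total = 0;
        while (total < BIG_BLOCK_SIZE) {
            int count = stream.read(_data, total, BIG_BLOCK_SIZE - total);
            if (count < 0) {
                break; // end of stream
            }
            total += count;
        }
        _bytes_read = total;
    }

    // Number of bytes read when the instance was constructed.
    int size() {
        return _bytes_read;
    }

    // True if the _data array was not completely filled.
    boolean partiallyRead() {
        return _bytes_read != BIG_BLOCK_SIZE;
    }

    // Hypothetical helper: the loop a Document might run over its InputStream.
    static List<DocumentBlockSketch> readAll(InputStream stream) {
        List<DocumentBlockSketch> blocks = new ArrayList<>();
        try {
            while (true) {
                DocumentBlockSketch block = new DocumentBlockSketch(stream);
                if (block.size() > 0) {
                    blocks.add(block); // assumption: an empty trailing block is not kept
                }
                if (block.partiallyRead()) {
                    break; // a short block means end of file was reached
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<DocumentBlockSketch> blocks = readAll(new ByteArrayInputStream(new byte[1300]));
        int total = 0;
        for (DocumentBlockSketch block : blocks) {
            total += block.size();
        }
        System.out.println(blocks.size() + " blocks, " + total + " bytes");
    }
}
```

Summing size() over the blocks gives the Document the total byte count it later stores in its DocumentProperty.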
The property classes and interfaces are shown in the following class diagram.
                             
                        
| Class/Interface | Description | 
|---|---|
| Directory | The Directory interface is implemented by the RootProperty class. It is not strictly necessary for the initial POIFS implementation, but when the POIFS supports directory elements, this interface will be more widely implemented, and so is included in the design at this point to ease the eventual support of directory elements. Its methods are a getter/setter pair: getChildren, which returns an Iterator of Property instances, and addChild, which allows the caller to add another Property instance to the Directory's children. | 
| DocumentProperty | The DocumentProperty class is a trivial extension of Property and is used by Document to keep track of its associated entry in the PropertyTable. Its constructor takes a name and the document size, on the assumption that the Document will not create a DocumentProperty until after it has created the storage for the document data and therefore knows how much data there is. | 
| File | The File interface specifies the behavior of reading and writing the next and previous child fields of a Property. | 
| Property | The Property class is an abstract class that defines the basic data structure of an element of the Property Table. Its ByteField, ShortField, and IntegerField members are used to read and write data into the appropriate locations in the _raw_data array. The _index member is used to hold a Property instance's index in the List of Property instances maintained by PropertyTable, which is used to populate the child property of parent Directory properties and the next property and previous property of sibling File properties. The _name, _next_file, and _previous_file members are used to help fill the appropriate fields of the _raw_data array. Setters are provided for some of the fields (name, property type, node color, child property, size, index, start block), as well as a few getters (index, child property). The preWrite method is abstract and is used by the owning PropertyTable to iterate through its Property instances and prepare each for writing. The shouldUseSmallBlocks method returns true if the Property's size is sufficiently small; how small is none of the caller's business. | 
| PropertyBlock | See the description in PropertyBlock. | 
| PropertyTable | The PropertyTable class holds all of the DocumentProperty instances and the RootProperty instance for a Filesystem instance. It maintains a List of its Property instances (_properties), and when prepared to write its data by a call to preWrite, it gets and holds an array of PropertyBlock instances (_blocks). It also maintains its start block in its _start_block member. It has three methods: getRoot, which returns the RootProperty as an implementation of Directory; addProperty, which adds a Property; and getStartBlock, which returns its start block. | 
| RootProperty | The RootProperty class acts as the Directory for all of the DocumentProperty instances. As such, it is more of a pure directory entry than a proper root entry in the Property Table, but the initial POIFS implementation does not warrant the additional complexity of a full-blown root entry, and so it is not modeled in this design. It maintains a List of its children, _children, in order to perform its directory-oriented duties. | 
The filesystem classes and interfaces are shown in the following class diagram.
                             
                        
| Class/Interface | Description | 
|---|---|
| Filesystem | The Filesystem class is the top-level class that manages the creation of a POIFS document. It maintains a PropertyTable instance in its _property_table member, a HeaderBlock instance in its _header_block member, and a List of its Document instances in its _documents member. It provides a method for a client to create a document (createDocument), and a method to write the Filesystem to an OutputStream (writeFilesystem). | 
| BATBlock | See the description in BATBlock | 
| BATManaged | The BATManaged interface defines common behavior for objects whose location in the written file is managed by the Block Allocation Table. It defines methods to get a count of the implementation's BigBlock instances (countBlocks), and to set an implementation's start block (setStartBlock). | 
| BlockAllocationTable | The BlockAllocationTable is an implementation of the POIFS Block Allocation Table. It is only created when the Filesystem is about to be written to an OutputStream. It contains an IntList of block numbers for all of the BATManaged implementations owned by the Filesystem, _entries, which is filled by calls to allocateSpace. It fills its array, _blocks, of BATBlock instances when its createBATBlocks method is called. This method has to take into account its own storage requirements, as well as those of the XBAT blocks, and so calls BATBlock.calculateStorageRequirements and HeaderBlock.calculateXBATStorageRequirements repeatedly until the counts returned by those methods stabilize. The countBlocks method returns the number of BATBlock instances created by the preceding call to createBATBlocks. | 
| BlockWritable | See the description in BlockWritable | 
| Document | The Document class is used to contain a document, such as an HSSF workbook. It has its own DocumentProperty (_property) and stores its data in a collection of DocumentBlock instances (_blocks). It has a method, getDocumentProperty, to get its DocumentProperty. | 
| DocumentBlock | See the description in DocumentBlock | 
| DocumentProperty | See the description in DocumentProperty | 
| HeaderBlock | See the description in HeaderBlock | 
| PropertyTable | See the description in PropertyTable | 
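
The "repeat until the counts stabilize" calculation described for createBATBlocks can be sketched as a small fixed-point loop. This is an illustration under stated assumptions rather than the POIFS implementation: the figure of 128 entries per BAT block comes from the BATBlock description, while the 109 BAT slots in the header and the 127 BAT slots per XBAT block are taken from the OLE2 compound file layout and are not stated in this document. The class name and the stabilize method are hypothetical.

```java
// Sketch of the fixed-point block count calculation; names are illustrative.
class BatSizingSketch {
    static final int ENTRIES_PER_BAT_BLOCK = 128;    // from the BATBlock description
    static final int BAT_SLOTS_IN_HEADER = 109;      // assumption: OLE2 header capacity
    static final int BAT_SLOTS_PER_XBAT_BLOCK = 127; // assumption: last slot chains to the next XBAT block

    // Number of BAT blocks needed to hold entryCount BAT entries.
    static int calculateStorageRequirements(int entryCount) {
        return (entryCount + ENTRIES_PER_BAT_BLOCK - 1) / ENTRIES_PER_BAT_BLOCK;
    }

    // Number of XBAT blocks needed for the BAT blocks the header cannot list.
    static int calculateXBATStorageRequirements(int batBlockCount) {
        if (batBlockCount <= BAT_SLOTS_IN_HEADER) {
            return 0;
        }
        int overflow = batBlockCount - BAT_SLOTS_IN_HEADER;
        return (overflow + BAT_SLOTS_PER_XBAT_BLOCK - 1) / BAT_SLOTS_PER_XBAT_BLOCK;
    }

    // Returns {batBlocks, xbatBlocks} for the given number of non-BAT blocks.
    // The BAT must also describe its own blocks and the XBAT blocks, so the
    // counts are recomputed until neither changes.
    static int[] stabilize(int managedBlockCount) {
        int bat = 0;
        int xbat = 0;
        while (true) {
            int newBat = calculateStorageRequirements(managedBlockCount + bat + xbat);
            int newXbat = calculateXBATStorageRequirements(newBat);
            if (newBat == bat && newXbat == xbat) {
                return new int[] { bat, xbat };
            }
            bat = newBat;
            xbat = newXbat;
        }
    }

    public static void main(String[] args) {
        int[] counts = stabilize(14000);
        System.out.println("BAT blocks: " + counts[0] + ", XBAT blocks: " + counts[1]);
    }
}
```

The loop terminates because both counts only grow from one pass to the next, and each additional BAT block adds far more capacity (128 entries) than it consumes (1 entry).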
The utility classes and interfaces are shown in the following class diagram.
                             
                        
| Class/Interface | Description | 
|---|---|
| BitField | The BitField class is used primarily by HSSF code to manage bit-mapped fields of HSSF records. It is not likely to be used in the POIFS code itself and is only included here for the sake of complete documentation of the POI utility classes. | 
| ByteField | The ByteField class is an implementation of FixedField for the purpose of managing reading and writing to a byte-wide field in an array of bytes. | 
| FixedField | The FixedField interface defines a set of methods for reading a field from an array of bytes or from an InputStream, and for writing a field to an array of bytes. Implementations typically require an offset in their constructors that, for the purposes of reading and writing to an array of bytes, makes sure that the correct bytes in the array are read or written. | 
| HexDump | The HexDump class is a debugging class that can be used to dump an array of bytes to an OutputStream. The static method dump takes an array of bytes, a long offset that is used to label the output, an open OutputStream, and an int index that specifies the starting index within the array of bytes. The data is displayed 16 bytes per line, with each byte displayed in hexadecimal format and again in printable form, if possible (a byte is considered printable if its value is in the range of 32 to 126). Here is an example of a small array of bytes with an offset of 0x110: 00000110 C8 00 00 00 FF 7F 90 01 00 00 00 00 00 00 05 01 ................ | 
| IntegerField | The IntegerField class is an implementation of FixedField for the purpose of managing reading and writing to an integer-wide field in an array of bytes. | 
| IntList | The IntList class is a work-around for functionality missing in Java (see http://developer.java.sun.com/developer/bugParade/bugs/4487555.html for details); it is a simple growable array of ints that gets around the requirement of wrapping and unwrapping ints in Integer instances in order to use the java.util.List interface. IntList mimics the functionality of the java.util.List interface as much as possible. | 
| LittleEndian | The LittleEndian class provides a set of static methods for reading and writing shorts, ints, longs, and doubles in and out of byte arrays, and out of InputStreams, preserving the Intel byte ordering and encoding of these values. | 
| LittleEndianConsts | The LittleEndianConsts interface defines the width of a short, int, long, and double as stored by Intel processors. | 
| LongField | The LongField class is an implementation of FixedField for the purpose of managing reading and writing to a long-wide field in an array of bytes. | 
| ShortField | The ShortField class is an implementation of FixedField for the purpose of managing reading and writing to a short-wide field in an array of bytes. | 
| ShortList | The ShortList class is a work-around for functionality missing in Java (see http://developer.java.sun.com/developer/bugParade/bugs/4487555.html for details); it is a simple growable array of shorts that gets around the requirement of wrapping and unwrapping shorts in Short instances in order to use the java.util.List interface. ShortList mimics the functionality of the java.util.List interface as much as possible. | 
| StringUtil | The StringUtil class manages the processing of Unicode strings. | 
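
The output format described for HexDump can be sketched as follows. This is a minimal illustration, assuming a hypothetical dump variant that returns a String rather than writing to an OutputStream; the column spacing is a guess based on the example line in the table and may differ from the real class.

```java
// Sketch of the HexDump line format; class and method shape are illustrative.
class HexDumpSketch {
    static final int BYTES_PER_LINE = 16;

    // Format data starting at index, labelling the first line with offset.
    static String dump(byte[] data, long offset, int index) {
        StringBuilder out = new StringBuilder();
        for (int start = index; start < data.length; start += BYTES_PER_LINE) {
            int end = Math.min(start + BYTES_PER_LINE, data.length);
            // Eight hex digits for the label of this line.
            out.append(String.format("%08X", offset + (start - index)));
            StringBuilder text = new StringBuilder();
            for (int i = start; i < start + BYTES_PER_LINE; i++) {
                if (i < end) {
                    int value = data[i] & 0xFF; // treat the byte as unsigned
                    out.append(String.format(" %02X", value));
                    // Printable means 32 to 126; anything else becomes a dot.
                    text.append(value >= 32 && value <= 126 ? (char) value : '.');
                } else {
                    out.append("   "); // pad a short final line
                }
            }
            out.append(' ').append(text).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {
            (byte) 0xC8, 0, 0, 0, (byte) 0xFF, 0x7F, (byte) 0x90, 1,
            0, 0, 0, 0, 0, 0, 5, 1
        };
        System.out.print(dump(sample, 0x110L, 0));
    }
}
```

None of the sample bytes fall in the printable range, so the text column for this input consists entirely of dots, matching the example in the table.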
This section describes the scenarios of how the POIFS classes and interfaces will be used to convert an appropriate XML stream to a POIFS output stream containing an HSSF document.
It is broken down as suggested by the following scenario diagram:
		     
		
| Step | Description | 
|---|---|
| 1 | The Filesystem is created by the client application. | 
| 2 | The client application tells the Filesystem to create a document, providing an InputStream and the name of the document. This may be repeated several times. | 
| 3 | The client application asks the Filesystem to write its data to an OutputStream. | 
Initialization of the POIFS system is shown in the following scenario diagram:
                             
                        
| Step | Description | 
|---|---|
| 1 | The Filesystem object, which is created for each request to convert an appropriate XML stream to a POIFS output stream containing an HSSF document, creates its PropertyTable. | 
| 2 | The PropertyTable creates its RootProperty instance, making the RootProperty the first Property in its List of Property instances. | 
| 3 | The Filesystem creates its HeaderBlock instance. It should be noted that the decision to create the HeaderBlock at Filesystem initialization is arbitrary; creation of the HeaderBlock could easily and harmlessly be postponed to the appropriate moment in writing the filesystem. | 
Creating and adding a document to a POIFS system is shown in the following scenario diagram:
                             
                        
| Step | Description | 
|---|---|
| 1 | The Filesystem instance creates a new Document instance. It will store the newly created Document in a List of BATManaged instances. | 
| 2 | The Document reads data from the provided InputStream, storing the data in DocumentBlock instances. It keeps track of the byte count as it reads the data. | 
| 3 | The Document creates a DocumentProperty to keep track of its property data. The byte count is stored in the newly created DocumentProperty instance. | 
| 4 | The Filesystem requests the newly created DocumentProperty from the newly created Document instance. | 
| 5 | The Filesystem sends the newly created DocumentProperty to the Filesystem's PropertyTable so that the PropertyTable can add the DocumentProperty to its List of Property instances. | 
| 6 | The Filesystem gets the RootProperty from its PropertyTable. | 
| 7 | The Filesystem adds the newly created DocumentProperty to the RootProperty. | 
Although typical deployment of the POIFS system will only entail adding a single Document (the workbook) to the Filesystem, there is nothing in the design to prevent multiple Documents from being added to the Filesystem. This flexibility can be employed to write summary information document(s) in addition to the workbook.
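
The seven steps above can be sketched with a toy object model. Everything here is a simplified stand-in written from the step descriptions, not the POIFS classes themselves: the nested classes carry only the fields the scenario mentions, the root name "root" and the document name "Workbook" are placeholders, and raw byte arrays stand in for DocumentBlock instances.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the "create and add a document" scenario; all names are illustrative.
class AddDocumentSketch {
    static class Property {
        final String name;
        final int size;
        Property(String name, int size) {
            this.name = name;
            this.size = size;
        }
    }

    static class RootProperty extends Property {
        final List<Property> children = new ArrayList<>();
        RootProperty() {
            super("root", 0); // placeholder name
        }
        void addChild(Property child) {
            children.add(child);
        }
    }

    static class PropertyTable {
        final List<Property> properties = new ArrayList<>();
        final RootProperty root = new RootProperty();
        PropertyTable() {
            properties.add(root); // the RootProperty is the first Property
        }
        void addProperty(Property property) {
            properties.add(property);
        }
        RootProperty getRoot() {
            return root;
        }
    }

    static class Document {
        final List<byte[]> blocks = new ArrayList<>(); // stand-in for DocumentBlock instances
        final Property property;
        Document(String name, InputStream stream) {
            int size = 0;
            try {
                while (true) {
                    byte[] block = new byte[512];
                    int count = stream.readNBytes(block, 0, block.length); // step 2
                    if (count > 0) {
                        blocks.add(block);
                        size += count;
                    }
                    if (count < block.length) {
                        break;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            property = new Property(name, size); // step 3: byte count stored in the property
        }
        Property getDocumentProperty() {
            return property;
        }
    }

    static class Filesystem {
        final PropertyTable propertyTable = new PropertyTable();
        final List<Document> documents = new ArrayList<>();
        Document createDocument(InputStream stream, String name) {
            Document document = new Document(name, stream);   // steps 1 to 3
            documents.add(document);
            Property property = document.getDocumentProperty(); // step 4
            propertyTable.addProperty(property);                // step 5
            propertyTable.getRoot().addChild(property);         // steps 6 and 7
            return document;
        }
    }

    public static void main(String[] args) {
        Filesystem filesystem = new Filesystem();
        filesystem.createDocument(new ByteArrayInputStream(new byte[1300]), "Workbook");
        System.out.println(filesystem.propertyTable.properties.size() + " properties");
    }
}
```

Calling createDocument a second time with a different name models the multiple-document case: each call appends one more Property to the table and one more child to the root.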
Writing the filesystem is shown in the following scenario diagram:
                             
                        
| Step | Description | Notes |
|---|---|---|
| 1 | The Filesystem adds the PropertyTable to its List of BATManaged instances and calls the PropertyTable's preWrite method. The action taken by the PropertyTable is shown in the PropertyTable preWrite scenario diagram. | |
| 2 | The Filesystem creates the BlockAllocationTable. | |
| 3 | The Filesystem gets the block count from the BATManaged instance. | Steps 3 through 5 are repeated for each BATManaged instance in the Filesystem's List of BATManaged instances (i.e., the Documents, in order of their addition to the Filesystem, followed by the PropertyTable). | 
| 4 | The Filesystem sends the block count to the BlockAllocationTable, which adds the appropriate entries to its IntList of entries, returning the starting block for the newly added entries. | |
| 5 | The Filesystem gives the start block number to the BATManaged instance. If the BATManaged instance is a Document, it sets the start block field in its DocumentProperty. | |
| 6 | The Filesystem tells the BlockAllocationTable to create its BATBlocks. | |
| 7 | The Filesystem gives the BAT information to the HeaderBlock so that it can set its BAT fields and, if necessary, create XBAT blocks. | |
| 8 | If the filesystem is unusually large (over 7MB), the HeaderBlock will create XBAT blocks to contain the BAT data that it cannot hold directly. In this case, the Filesystem tells the HeaderBlock where those additional blocks will be stored. | |
| 9 | The Filesystem gives the PropertyTable start block to the HeaderBlock. | |
| 10 | The Filesystem tells the BlockWritable instance to write its blocks to the provided OutputStream. This step is repeated for each BlockWritable instance, in a fixed order. | |
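
Steps 3 through 5 of the table above can be sketched as a loop over the BATManaged instances. This is a minimal sketch under stated assumptions: a java.util.List of Integer stands in for the IntList, the Managed class is a hypothetical implementation used only for the demonstration, and the end-of-chain marker value of -2 is taken from the OLE2 compound file layout rather than from this document.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of block allocation; names other than the BATManaged methods are illustrative.
class AllocationSketch {
    static final int END_OF_CHAIN = -2; // assumption: OLE2 end-of-chain marker

    interface BATManaged {
        int countBlocks();
        void setStartBlock(int index);
    }

    static class BlockAllocationTable {
        final List<Integer> entries = new ArrayList<>(); // the design uses an IntList

        // Append a chain of blockCount entries and return the index of the first.
        // Each entry holds the index of the next block in the chain.
        int allocateSpace(int blockCount) {
            int start = entries.size();
            for (int k = 0; k < blockCount; k++) {
                boolean last = (k == blockCount - 1);
                entries.add(last ? END_OF_CHAIN : start + k + 1);
            }
            return start;
        }
    }

    // Hypothetical BATManaged implementation with a fixed block count.
    static class Managed implements BATManaged {
        final int blocks;
        int startBlock = -1;
        Managed(int blocks) {
            this.blocks = blocks;
        }
        public int countBlocks() {
            return blocks;
        }
        public void setStartBlock(int index) {
            startBlock = index;
        }
    }

    static void allocateAll(List<? extends BATManaged> managed, BlockAllocationTable table) {
        for (BATManaged item : managed) {
            int count = item.countBlocks();         // step 3
            int start = table.allocateSpace(count); // step 4
            item.setStartBlock(start);              // step 5
        }
    }

    public static void main(String[] args) {
        List<Managed> managed = new ArrayList<>();
        managed.add(new Managed(3)); // e.g. a Document
        managed.add(new Managed(2)); // e.g. the PropertyTable
        BlockAllocationTable table = new BlockAllocationTable();
        allocateAll(managed, table);
        System.out.println(table.entries);
    }
}
```

Because the instances are processed in List order, the Documents receive the lowest block numbers and the PropertyTable follows them, as the table describes.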
PropertyTable preWrite scenario diagram
                             
                        
| Step | Description | 
|---|---|
| 1 | The PropertyTable calls setIndex for each of its Property instances, so that each Property now knows its index within the PropertyTable's List of Property instances. | 
| 2 | The PropertyTable requests the PropertyBlock class to create an array of PropertyBlock instances. | 
| 3 | The PropertyBlock calculates the number of empty Property instances it needs to create and creates them. The number of blocks is block_count = (properties.size() + 3) / 4; the number of empty Property instances to create is then (block_count * 4) - properties.size(). | 
| 4 | The PropertyBlock creates the required number of PropertyBlock instances from the List of Property instances, including the newly created empty Property instances. | 
| 5 | The PropertyTable calls preWrite on each of its Property instances. For DocumentProperty instances, this call is a no-op. For the RootProperty, the action taken is shown in the RootProperty preWrite scenario diagram. | 
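
The rounding in steps 3 and 4 can be sketched as follows. This is an illustration only: Strings stand in for Property instances, null stands in for an empty Property, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of padding a property list to a multiple of 4; names are illustrative.
class PropertyPaddingSketch {
    static final int PROPERTIES_PER_BLOCK = 4; // 4 properties fill one 512-byte block

    // Integer division rounds down, so adding 3 first rounds up to whole blocks.
    static int blockCount(int propertyCount) {
        return (propertyCount + 3) / 4;
    }

    static int emptyPropertiesNeeded(int propertyCount) {
        return blockCount(propertyCount) * PROPERTIES_PER_BLOCK - propertyCount;
    }

    // Group the properties into blocks of 4, padding the last block with nulls.
    static List<List<String>> toBlocks(List<String> properties) {
        List<String> padded = new ArrayList<>(properties);
        for (int i = emptyPropertiesNeeded(properties.size()); i > 0; i--) {
            padded.add(null); // stand-in for an empty Property
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < padded.size(); start += PROPERTIES_PER_BLOCK) {
            blocks.add(new ArrayList<>(padded.subList(start, start + PROPERTIES_PER_BLOCK)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("root", "a", "b", "c", "d");
        System.out.println(toBlocks(names).size() + " blocks");
    }
}
```

With five properties the list is padded with three empty entries, giving two full blocks.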
RootProperty preWrite scenario diagram
                             
                        
| Step | Description | Notes |
|---|---|---|
| 1 | The RootProperty sets its child property with the index of the child Property that is first in its List of children. | |
| 2 | The RootProperty sets its child's next property field with the index of the child's next sibling in the RootProperty's List of children. If the child is the last in the List, its next property field is set to -1. | Steps 2 and 3 are repeated for each File in the RootProperty's List of children. | 
| 3 | The RootProperty sets its child's previous property field with a value of -1. | |
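
The linking performed by the RootProperty can be sketched as follows. This is a minimal illustration, assuming a hypothetical Node class that carries only the index, child, next, and previous fields mentioned in the steps; leaving the root's child field at -1 when there are no children is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the RootProperty preWrite linking; names are illustrative.
class RootLinkSketch {
    static final int NO_INDEX = -1;

    static class Node {
        final int index; // position in the PropertyTable's List
        int child = NO_INDEX;
        int next = NO_INDEX;
        int previous = NO_INDEX;
        Node(int index) {
            this.index = index;
        }
    }

    static void preWrite(Node root, List<Node> children) {
        if (children.isEmpty()) {
            return; // assumption: child stays at -1 when there is nothing to link
        }
        root.child = children.get(0).index; // step 1
        for (int i = 0; i < children.size(); i++) {
            Node current = children.get(i);
            boolean last = (i == children.size() - 1);
            current.next = last ? NO_INDEX : children.get(i + 1).index; // step 2
            current.previous = NO_INDEX;                                // step 3
        }
    }

    public static void main(String[] args) {
        Node root = new Node(0);
        List<Node> children = new ArrayList<>();
        children.add(new Node(1));
        children.add(new Node(2));
        children.add(new Node(3));
        preWrite(root, children);
        System.out.println("root child = " + root.child);
    }
}
```

The result is a simple forward chain: the root points at its first child, each child points at the next, and no previous links are used.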