One question that arises time and time again pertains to the manner in which OpenDJ stores it entry data and how this differs from the Oracle Directory Server Enterprise Edition (previously known as Sun Directory Server Enterprise Edition).
The following information is from ForgeRock’s OpenDJ Administration, Maintenance and Tuning Class and has been used with the permission of ForgeRock.
OpenDJ includes the Berkeley DB Java Edition database as the backend repository for user data. The Java version is quite different from the Berkeley C version which is used by the Sun Directory Server Enterprise Edition.
The Berkeley DB Java Edition is a Java implementation of a raw database using the B-Tree technology. A Berkeley DB JE environment can be composed of multiple databases, each of which is stored in a single folder on the file system. Rather than having separate files for records and transaction logs, Berkeley DB JE uses a rolling log file to store everything; this includes the B-Tree structure, the user provided records and the indexes. Write operations append entries as the last items in the log file. When a certain size is reached (10MB by default), a new log file is created. This results in consistent write performance regardless of the database size.
Note: Initial log files are located beneath the db/userRoot folder in the installation directory. The initial log file is 00000000.jdb. When that file reaches a size of 10MB, a new file is created as 00000001.jdb.
Over time records are deleted or modified in the log. OpenDJ performs periodic cleanup of log files and rewrites them to new log files. This task is performed without action by a system administrator and ensures consistency of the data contained in the log files.
You can see a list of all entries contained in the database with the dbtest utility. This command returns the entry specific information that can also be used to debug the backend.
The following diagram demonstrates the periodic processing of the Berkeley DB Java Edition database over time.
Log (Entry) Processing
The log shown at the far left (row 1, column 1) contains entries after an immediate population. You will note that it contains five data entries (Entry 1 through Entry 5) as well as the associated index entries. To keep things simple, only the common name and object class attributes have been indexed (as shown by the cn and OC keys).
Note: The data does not appear in this actual format. This notation is used for demonstration purposes.
As time goes by, data in the database changes. New entries are added, attribute values for existing entries are modified, and some entries are deleted. This has an effect on both the data entries in the log as well as any associated index entries.
The second log (row 1, column 2) demonstrates the effect on the database after Entry 3 has been modified. The modified data entry is written to the end of the log and the original entry is marked for deletion. The modification was made to an attribute that that was not indexed so the log does not contain any modification to the index entries.
Changes made to data entries containing indexed attributes would not only appear at the end of the log, but the modifications to the indexes would appear there as well. This can be seen in the third log (row 1, column 3). Entry 2 was modified and the change involved a modification to the common name (cn) attribute. Note how the previous index value for this entry is marked for deletion and the new index entry is written to the end of the log.
The logs found at row 1, column 4 and row 2, column 4 demonstrate how new log files are created as previous ones reach their limit. Write operations are appended to the log file in a linear fashion until it reaches a maximum size of 10 MB at which time a new log is created. This can be seen by the 00000001.jdb log in row 2, column 4.
Note: The maximum log size of 10 MB is defined in the ds-cfg-db-log-file-max attribute contained in the backend definition.
Database Cleanup
The Berkeley C Database simply purged data from the database. This led to fragmentation (“holes”) in the database and required periodic cleanup by system administrators to eliminate the holes (similar to defragmenting your hard drive). This is not the case with the Berkeley DB Java Edition.
The Berkeley DB Java Edition has a number of threads that periodically check the occupancy of each log. If it detects that the size associated with active entries falls below a certain threshold (50% of its maximum size, or 5MB by default), it rewrites the active records to the end of the latest log file and deletes the old log altogether. This can be seen in the log found at row 2, column 5.
Note: Maximum occupancy is defined in the ds-cfg-db-cleaner-min-utilization attribute contained in the backend definition.
The default occupancy is 50% so at its maximum size, the log will be twice as big as its sum of records. Increasing the occupancy % will reduce the log’s size, but induce more copying, thus increasing CPU utilization.
This process of always appending data to the end of the log and periodically rewriting the log as entries are obsoleted allows OpenDJ to maintain a fairly consistent size – even if entries are heavily modified. It does, however, allow the database to shrink in size if many entries are deleted.
Check out ForgeRock’s website for more information on OpenDJ or click here if you are interested in attending one of ForgeRock’s upcoming training classes.