Data stratification describes how data resides at different layers within the architecture. Each layer of the stratification has its own purpose, scope, nature, content, granularity, and retention time.

Figure A: High-Level Data Stratification Architecture
Data originates either automatically, collected from the equipment, or manually, entered by a person through shop floor interfaces. The data at this layer of the stratification is the most granular and represents the current state of information within a specific function space. Typically, this data is first either persistently stored on the processing equipment (for later storage within an operational database) or immediately fed into an operational database. As data moves up the stratification, it is aggregated and stored in common schemas so that it can be correlated more readily with other data.
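The aggregation that happens as data moves up the stratification can be sketched in a few lines. This is only an illustration: the tuple layout (lot, equipment, measured value), the identifiers, and the choice of summary statistics are assumptions, not part of any particular factory system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical granular readings as they might leave the equipment layer:
# (lot_id, equipment_id, measured_value). All names and values are illustrative.
raw_readings = [
    ("LOT-001", "ETCH-01", 10.0),
    ("LOT-001", "ETCH-01", 12.0),
    ("LOT-001", "ETCH-01", 14.0),
    ("LOT-002", "ETCH-01", 20.0),
    ("LOT-002", "ETCH-01", 22.0),
]


def aggregate_by_lot(readings):
    """Roll granular readings up to one summary row per (lot, equipment)."""
    grouped = defaultdict(list)
    for lot_id, equipment_id, value in readings:
        grouped[(lot_id, equipment_id)].append(value)
    return {
        key: {
            "count": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in grouped.items()
    }


summary = aggregate_by_lot(raw_readings)
for key, stats in summary.items():
    print(key, stats)
```

The summary rows keep the manufacturing context (lot and equipment) while dropping the individual readings, which is the trade-off each higher layer makes.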
Typically, common data stratification layers are:
- Equipment (Local Storage)
- Operational Databases
- Operational Shadows
- Operational Data Store
- Data Warehouse
- Data Marts
By implementing a true operational reporting layer, the operational systems can be better tuned to drive better execution system performance, common reporting tools can be deployed at multiple levels within the stratification, and critical factory data is retained so that the right level of information is available.
Equipment (Local Storage)
Equipment local storage is simply that: local (or possibly networked) drives that are available to the equipment. This is the lowest level of raw data, logs, and information about the equipment. In previous generations of factory systems (with limited FDC integration, limited RS232 communication, and less sophisticated equipment integration), this data was typically accessed directly off the equipment for uncontrolled usage by the equipment and process engineering teams. Today, with the more advanced features of the shop floor architecture, access to this data off the tool should be highly discouraged or possibly even restricted.
The key characteristics of the equipment local storage are as follows:
Usage: Internal tool usage and reporting
Data Retention: Dependent on the configuration of the specific tool and on equipment engineer administration.
Aggregation: Typically only raw or derived data from the tool software
Access: Should be restricted to the equipment automation
While it is common for data (processing data and instructions) to be fed down from execution systems to the tool, for the purposes of the overall data stratification the information flow should be thought of as flowing from the tools upwards into the operational databases.
The data in equipment local storage is not typically backed up with the intent of later retrieving it for some use. Where this level of data is backed up, which is common, it is not usually done with a common methodology.
Within most published information about manufacturing data stratification, the equipment local storage is not typically included. However, it is included in this paper to show where the data originates and to state a principle: data should not be used directly off the tool; instead, manufacturing systems should access it and persistently store it within another layer.
Operational Databases
Operational databases contain data that is entered via either automated or manual interfaces. The content should be current information only and is used for both execution tracking and control. Operational systems are typically purchased off the shelf from commercial providers, so a common characteristic is that each has its own proprietary schema, optimized for the application's use and not usually easy to correlate with the others. Access to the data within the operational systems should be highly restricted to applications and clients; reporting should never be done within this layer of the stratification.
Obviously, the manufacturing execution system (MES) is the most common operational database and a key part of the overall data stratification.
The key characteristics of the operational databases are as follows:
Usage: WIP Tracking, Equipment Tracking, Engineering Data Collection, etc.
Data Retention: Typically only current WIP within the factory
Aggregation: Raw data, only aggregation is for application efficiency
Access: [Real Time] Application Level, user interface via controlled services
Information commonly flows between the various operational systems for tracking and execution control purposes, but again the information flow from the data stratification perspective should be thought of as flowing up towards the operational data stores. Typically, only object and historical data is fed up into the operational data store; however, there is usually a considerable amount of information that resides only on the operational database because it is specific to the application usage at the operational level.
The data of the operational database is ideally purged and not archived. It is a best practice to not use this level of the stratification for de-archiving information, unless that information physically represents objects that have re-entered the factory.
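A purge of this kind might look like the following minimal sketch. It uses an in-memory SQLite database purely for illustration; the `wip_lot` table, its columns, and the `SHIPPED` status value are hypothetical, and the sketch assumes the relevant history has already been fed up to the operational data store before rows are deleted.

```python
import sqlite3

# Hypothetical operational table holding only work in progress (WIP).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wip_lot (lot_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO wip_lot (lot_id, status) VALUES (?, ?)",
    [
        ("LOT-001", "IN_PROCESS"),
        ("LOT-002", "SHIPPED"),
        ("LOT-003", "ON_HOLD"),
        ("LOT-004", "SHIPPED"),
    ],
)


def purge_completed_lots(connection):
    """Delete lots that are no longer WIP and return how many rows were purged.

    Assumption: history for these lots was already loaded into the
    operational data store, so nothing is archived here.
    """
    cursor = connection.execute("DELETE FROM wip_lot WHERE status = ?", ("SHIPPED",))
    connection.commit()
    return cursor.rowcount


purged = purge_completed_lots(conn)
remaining = [row[0] for row in conn.execute("SELECT lot_id FROM wip_lot ORDER BY lot_id")]
print(purged, remaining)
```

Deleting rather than archiving keeps the operational database small and tuned for execution, which is the reason the purge is preferred at this layer.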
Operational Shadows
While not really part of the overall data stratification, operational shadows are worth mentioning within its context, as they are critical for the availability of the operational systems and, depending on the replication methodology implemented, can sometimes be leveraged as a data source.
The key characteristics of the operational shadows are as follows:
Usage: Operational Systems High Availability and Partial Disaster Recovery
Data Retention: Same as the operational system it is shadowing
Aggregation: Same as the operational system it is shadowing
Access: When online synchronization is on, access should be restricted. However, depending on the shadow methodology, a good practice is to take operational statistics and metrics from the shadow database weekly, if it can be taken offline for a specified period.
The information flow to the operational shadow is intended strictly for the higher availability of the operational systems. There is typically no common backup methodology at this level, as the shadow is itself the backup for the operational systems.
Operational Data Store
Operational data stores contain the first level of common, correlated data within the stratification. A critical aspect of the operational data store is a well-architected common schema that can be loaded from the various operational databases. This level of the stratification should be leveraged both as the primary reporting information store for the factory and as the data feed for various data marts, as required for specific purposes such as engineering data analysis. Access to this level is typically controlled through production reporting tools and is based on well-designed and optimized data access. Data within an operational data store typically comes from only one facility or factory, but the enterprise should share one common schema (even if different operational systems feed it) to ensure that data can be easily shared across the enterprise at the next level of the data stratification.
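Loading a common schema from operational systems that each have a proprietary schema can be sketched as a set of field mappings. Everything named here is hypothetical: the `CommonLotEvent` record, the two source systems (`mes` and `edc`), and their field names are stand-ins chosen for illustration, not the schema of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonLotEvent:
    """Hypothetical common-schema record in the operational data store."""
    lot_id: str
    equipment_id: str
    event_type: str
    source_system: str


# Hypothetical mappings: common field name -> proprietary field name.
FIELD_MAPS = {
    "mes": {"lot_id": "LotNumber", "equipment_id": "ToolId", "event_type": "TxnType"},
    "edc": {"lot_id": "lot", "equipment_id": "eqp", "event_type": "kind"},
}


def to_common(source_system, record):
    """Translate one proprietary record into the common schema."""
    mapping = FIELD_MAPS[source_system]
    fields = {common: str(record[source]) for common, source in mapping.items()}
    return CommonLotEvent(source_system=source_system, **fields)


mes_event = to_common(
    "mes", {"LotNumber": "LOT-001", "ToolId": "ETCH-01", "TxnType": "TRACK_IN"}
)
edc_event = to_common(
    "edc", {"lot": "LOT-001", "eqp": "ETCH-01", "kind": "MEASUREMENT"}
)
print(mes_event)
print(edc_event)
```

Because both records share `lot_id` and `equipment_id` after translation, they can be correlated directly, which is what the proprietary schemas do not allow.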
The key characteristics of the operational data store are as follows:
Usage: Production reporting, part traceability & tracking
Data Retention: Typically 12-18 months
Aggregation: Some raw data, tending towards more aggregation for more efficient operational reporting
Access: [Near Real Time – minutes delayed] Web-based reporting, reporting tools (with already optimized data access) and applications
Information flows up the stratification to the enterprise data warehouses, but also typically flows through the operational data store to feed data marts. It is a very good practice to ensure that data flows through the operational data stores to the data marts, instead of using direct data feeds to the data marts, so that information can be commonly controlled through one channel.
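The single-channel practice can be expressed as a simple check over configured data feeds. The layer names and the `ALLOWED_FLOWS` set below are an assumed encoding of the flows described in this paper, not a standard; a real deployment would derive them from its own integration inventory.

```python
# Assumed encoding of the information flows described in this paper,
# as (source layer, target layer) pairs.
ALLOWED_FLOWS = {
    ("equipment", "operational_database"),
    ("operational_database", "operational_shadow"),
    ("operational_database", "operational_data_store"),
    ("operational_data_store", "data_warehouse"),
    ("operational_data_store", "data_mart"),
    # Aggregated or summarized data is sometimes fed back from a data mart.
    ("data_mart", "operational_data_store"),
}


def disallowed_feeds(feeds):
    """Return the configured feeds that bypass the allowed channels."""
    return [feed for feed in feeds if feed not in ALLOWED_FLOWS]


# Hypothetical feed configuration: the second entry feeds a data mart
# directly from an operational database, bypassing the data store.
configured_feeds = [
    ("operational_data_store", "data_mart"),
    ("operational_database", "data_mart"),
    ("operational_data_store", "data_warehouse"),
]

violations = disallowed_feeds(configured_feeds)
print(violations)
```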
The operational data store should be the first level of the data stratification implemented as it has the most immediate benefit to the enterprise.
The data of the operational data store is typically backed up using common database archiving methodologies. The backup should be implemented in such a way that the data can later be retrieved to meet manufacturing use cases (for offline data retrieval) as well as for availability purposes. Corporations typically have legal requirements for archival and retention. The design should satisfy these requirements at this level of the data stratification, as it typically holds enough of the raw data to meet them without burdening the operational systems.
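A retention window for the operational data store could be applied along the following lines. This is a sketch under stated assumptions: the eighteen-month default is taken from the upper end of the typical range given above, the record layout (identifier, date) is hypothetical, and age is counted in whole calendar months, which is only one of several reasonable conventions.

```python
from datetime import date


def months_old(record_date, today):
    """Whole calendar months elapsed between record_date and today."""
    months = (today.year - record_date.year) * 12 + (today.month - record_date.month)
    if today.day < record_date.day:
        months -= 1  # the current month has not fully elapsed yet
    return months


def split_for_archive(records, today, retention_months=18):
    """Split (record_id, record_date) pairs into those kept online and
    those due to be moved to the archive."""
    keep, archive = [], []
    for record_id, record_date in records:
        if months_old(record_date, today) >= retention_months:
            archive.append(record_id)
        else:
            keep.append(record_id)
    return keep, archive


# Hypothetical records and a fixed reference date so the result is repeatable.
records = [
    ("A", date(2024, 1, 10)),
    ("B", date(2023, 1, 15)),
    ("C", date(2023, 1, 20)),
    ("D", date(2021, 3, 1)),
]
keep, archive = split_for_archive(records, today=date(2024, 7, 15))
print(keep, archive)
```

Archived records would still need to be retrievable for offline use and for any legal retention requirement, so the archive step moves data rather than discarding it.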
Data Warehouse
The data warehouse contains information for the entire enterprise and consists of the highest level of aggregated data. The data retention of the data warehouse is the longest of any level of the data stratification. This information is typically used for operational and manufacturing performance metrics based on common reporting tools. Access to this level should not be ad hoc, due to the very large amount of history stored within the data warehouse. The schema of this database should be common across the enterprise, which is often achieved by having only one data warehouse. This common schema should be based on that of the operational data store, but typically cannot be the same, as much of the data is aggregated and filtered to meet the specific needs of enterprise metrics.
The key characteristics of the data warehouse are as follows:
Usage: Cross-Factory Reporting, Metrics, and Information Sharing
Data Retention: Longest of any level of the data stratification
Aggregation: Limited raw data, mostly aggregated data
Access: [Delayed – typically an hour old] Corporate reporting tools and applications.
While ultimately this level is very beneficial to the enterprise, it is typically the last level implemented within the overall stratification, as its purpose can be served in the interim through data channels from the operational data store. In addition, since the data retention of the operational data store is typically twelve to eighteen (12-18) months, the urgency of implementation is less than for other levels.
The data within the data warehouse is commonly backed up using standard database methodologies and it is not a common practice to move this data or bring information that has been archived back into the data warehouse.
Data Marts
The data marts are usually designed to cover specific part traceability, financial or engineering data analysis purposes. The schemas of these data marts are not typically common, but suited to the specific needs of the data mart. The data retention should be optimized for the use of the data mart, but is typically greater than a year. It is common that data marts only contain data for specific uses along with the critical manufacturing contexts (lots, equipment, etc.).
The key characteristics of data marts are as follows:
Usage: Specific to data mart type (engineering analysis, part tracking, etc.)
Data Retention: Specific to data mart
Aggregation: Specific to data mart
Access: [Offline] Specific to the data mart, capabilities to export own data for further analysis
The information flow of the data mart should come from the operational data store to ensure a common data channel to each of the specific data marts and ensure that they contain common data. It is not uncommon to have some data (typically aggregated or summarized) fed back to the operational data store.
The data within the data marts is backed up using either application-specific or common database procedures. The specific data marts typically do not share a common backup methodology.