Relational Database vs. Process Historian

Relational Database vs. Process Historian for process data; use BOTH!

July 5, 2010 Rich Winslow 4 comments

We frequently get asked to compare a relational database for storing time series data versus a true Process Historian, so I wanted capture our viewpoint.

The first hurdle is to dispell any rumors that I’m against Relational Databases; I’ve been pushing for the use of relational databases for process data since the mid 1990s because I was dealing with Batch records, thus I’m for relational databases. Back then I was looking to run a OSIsoft PI Data Historian alongside a Microsoft SQL Server.

I’ve run into enough cases where a relational database will not work for time series data based on a combination of the following:

– I/O rate
– # of tags
– Volume of data
– Uncompressed or poor exception spec tuning of input data

People often ask what the compression savings is between a relational database and a properly tuned Swinging Door Algorithm in a Process Historian. We see between 1:1000 and 1:5000 difference, thus a 1 gigabyte Process Historian database is a 1 to 5 terabyte database in a relational database, which is significant.

One might argue that this is unusually large and that most implementations don’t get that large. In January, I moved process data from a small lab with just 200 tags in a relational database to a Process Historian for a customer because the relational database was choked after 2 years and 5 terabytes of data which reduced down to 5 gigabytes. The reason it choked was that the relational database was not properly indexed and because the control system didn’t define realistic exceptions to filter insignificant changes (ex. system noise). Of course you say that it isn’t the fault of the relational database that the input was poor and the database wasn’t tuned and you are right. However, we run into that all the time:

1) Customers don’t have experienced to tune the exception specs in the control system or the ‘data people’ are not allowed touch the control system (ex. validated environment).
2) Customers have basic database, but not a true DBA that knows how to tune. They ‘never thought’ it would get that big.

So compression is significant and having the right tool with the right expertise is important, but it’s not the whole picture…

It is true that storage is cheap, but a relational database is spending too much time storing AND it is taking too much time retrieving as well. When you have a continuous process feeding new data AND you want to view data, having a 1000 to 1 difference in system noise data vs. useful information is significant. It’s a classic calculus problem of trying to fill the cylinder while you are emptying it… can you keep up. You may be able to keep up in the beginning, but as time goes on the Relational database is strangling itself even with a good index; the index will keep growing. A process historian stores data in individual files by time range and thus the file size is fixed (ex. 500 mb for 10,000 tags) and the performance is consistent regardless of how many years of data. As a result, a 100,000 tag system can request 365 days of data for 20 tags and have the same response time with 1 yr or 10 yrs of data; you won’t get the same response even with a well tuned relational database.

A Process Historian is not just a niche time series database, it is an application that offers functionality beyond the just data storage:

1) Time series trending with the intelligence to not display every value but sufficient inflection points for it to be accurate and readable, the ability to zoom in/out and scroll; normal x-y chart will not do.
2) Historical Process Graphics displays
3) Real-time screen updates
4) Statistical Process Control
5) Accurate totalization
6) Access to multiple data historians
7) Interfaces that can record data from 250+ different control systems (OPC and non-OPC)
8) N-way redundancy and high availability (redundant data collection and redundant servers)

Relational databases can be used for storing time series data in select cases, but it has not been difficult to justify the cost of a process historian when you consider the additional benefits of process historian as application and not just a database. If you cannot justify the cost of a process historian and a relational database makes sense for now and the long-term, then by all means use the relational database… it is an awesome tool!!!

But Process Historians like OSIsoft PI are learning to provide BOTH types of database in their application offering. OSIsoft is using Microsoft SQL server for its Analysis Framework that holds the asset management hierarchy and relational type batch history data AND they have shifted from a tag-centric application to an asset-centric application with a relational database at the core.

So consider using both in a best of breed model and it becomes a win-win scenario!

—————————————————————–

Bio: Rich Winslow, co-owner of Automated Results Computer Consulting

Rich has been working with Hierarchial, Network, and Relational Databases since 1980, Data Historians since 1990, and has been a process historian consultant since 2000 working with Oil Refineries, Pharamaceuticals, Paper Mills, Power Plants, and Chemical Manufacturers of all sizes. He was a member of the S88 Batch standards committee and one of the original members of the S95 Enterprise Data standards committee.

Categories: Data Acquisition, OSI PI, Process Historian Tags: Compression ratio of a Process Historian, Relational Database for process data, Relational Database vs. Process Historian, Swinging Door Algorithm, Time S, Time Series Database

OSI PI Blog

Archive