Home > Data Acquisition, OSI PI, Process Historian > Relational Database vs. Process Historian for process data; use BOTH!

Relational Database vs. Process Historian for process data; use BOTH!

We frequently get asked to compare a relational database for storing time series data versus a true Process Historian, so I wanted capture our viewpoint.

The first hurdle is to dispell any rumors that I’m against Relational Databases; I’ve been pushing for the use of relational databases for process data since the mid 1990s because I was dealing with Batch records, thus I’m for relational databases. Back then I was looking to run a OSIsoft PI Data Historian alongside a Microsoft SQL Server.

I’ve run into enough cases where a relational database will not work for time series data based on a combination of the following:

– I/O rate
– # of tags
– Volume of data
– Uncompressed or poor exception spec tuning of input data

People often ask what the compression savings is between a relational database and a properly tuned Swinging Door Algorithm in a Process Historian. We see between 1:1000 and 1:5000 difference, thus a 1 gigabyte Process Historian database is a 1 to 5 terabyte database in a relational database, which is significant.

One might argue that this is unusually large and that most implementations don’t get that large. In January, I moved process data from a small lab with just 200 tags in a relational database to a Process Historian for a customer because the relational database was choked after 2 years and 5 terabytes of data which reduced down to 5 gigabytes. The reason it choked was that the relational database was not properly indexed and because the control system didn’t define realistic exceptions to filter insignificant changes (ex. system noise). Of course you say that it isn’t the fault of the relational database that the input was poor and the database wasn’t tuned and you are right. However, we run into that all the time:

1) Customers don’t have experienced to tune the exception specs in the control system or the ‘data people’ are not allowed touch the control system (ex. validated environment).
2) Customers have basic database, but not a true DBA that knows how to tune. They ‘never thought’ it would get that big.

So compression is significant and having the right tool with the right expertise is important, but it’s not the whole picture…

It is true that storage is cheap, but a relational database is spending too much time storing AND it is taking too much time retrieving as well. When you have a continuous process feeding new data AND you want to view data, having a 1000 to 1 difference in system noise data vs. useful information is significant. It’s a classic calculus problem of trying to fill the cylinder while you are emptying it… can you keep up. You may be able to keep up in the beginning, but as time goes on the Relational database is strangling itself even with a good index; the index will keep growing. A process historian stores data in individual files by time range and thus the file size is fixed (ex. 500 mb for 10,000 tags) and the performance is consistent regardless of how many years of data. As a result, a 100,000 tag system can request 365 days of data for 20 tags and have the same response time with 1 yr or 10 yrs of data; you won’t get the same response even with a well tuned relational database.

A Process Historian is not just a niche time series database, it is an application that offers functionality beyond the just data storage:

1) Time series trending with the intelligence to not display every value but sufficient inflection points for it to be accurate and readable, the ability to zoom in/out and scroll; normal x-y chart will not do.
2) Historical Process Graphics displays
3) Real-time screen updates
4) Statistical Process Control
5) Accurate totalization
6) Access to multiple data historians
7) Interfaces that can record data from 250+ different control systems (OPC and non-OPC)
8) N-way redundancy and high availability (redundant data collection and redundant servers)

Relational databases can be used for storing time series data in select cases, but it has not been difficult to justify the cost of a process historian when you consider the additional benefits of process historian as application and not just a database. If you cannot justify the cost of a process historian and a relational database makes sense for now and the long-term, then by all means use the relational database… it is an awesome tool!!!

But Process Historians like OSIsoft PI are learning to provide BOTH types of database in their application offering. OSIsoft is using Microsoft SQL server for its Analysis Framework that holds the asset management hierarchy and relational type batch history data AND they have shifted from a tag-centric application to an asset-centric application with a relational database at the core.

So consider using both in a best of breed model and it becomes a win-win scenario!

—————————————————————–

Bio: Rich Winslow, co-owner of Automated Results Computer Consulting

Rich has been working with Hierarchial, Network, and Relational Databases since 1980, Data Historians since 1990, and has been a process historian consultant since 2000 working with Oil Refineries, Pharamaceuticals, Paper Mills, Power Plants, and Chemical Manufacturers of all sizes. He was a member of the S88 Batch standards committee and one of the original members of the S95 Enterprise Data standards committee.

Advertisements
  1. Gum
    January 6, 2011 at 1:08 pm

    Hi,

    How does PI implement, if at all, the ‘D’ (durability) portion of ACID (http://en.wikipedia.org/wiki/ACID) that SQL DBs do?

    From reading the PI manual it appears that clients send data to the snapshot subsystem which pushes it to a queue from which the archive subsystem reads, stores in a cache, then writes to disk periodically.

    Can a client get a guarantee from PI that the event it sent to PI has been flushed to disk?

    What happens if clients send a bunch of events and the PI server gets shut down before these events work their way through the various subsystems and get flushed to disk?

    thanks

    • Rich Winslow
      January 6, 2011 at 2:27 pm

      We’ve worked with the PI database since 1996 on VMS, Windows NT, and Windows Server. All of these systems used the disk caching technique and even after power failures, we’ve never lost information.

      You can watch all the queues dynamically using the SMT (System Management Tool) to see how much information is backed up in the queues. We look at this following a server or network outage and you can see an influx of data and watch it work down to real-time. Based on these observations, the caching mechanism is only caching for 1 or 2 seconds and is able to keep up nicely. We have observed this on systems with 60,000 – 500,000 data points running at 15,000 data events / second.

      Hopefully this will help, but feel free to ask further.

      Rich Winslow
      Automated Results
      828-862-6667

  2. Vaibhav Dantale
    April 26, 2011 at 5:42 am

    How can I estimate disk space required. Suppose I have to manage 1 million devices, and collect data every 15min and the information I am collecting is may be 3 different decimal values?

  3. Ger Otten
    November 22, 2011 at 7:42 am

    Hi,

    A lot of process information is relational. But how can I query this information in a historian which is time related. Must I combine this data in a historian with a relational database?

    For instance I have defined a KPI for instance for the OEE and I want to get to know for a certain production line what the main factors are that contribute to a drop in this OEE. And what about adhoc queries were a lot of realitions are involved. Does a relational database use the data in the historian?

    And why do not use this model for all databases? In other words besides all the advantages of a historian what are the disadvantages?

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: