Bridging the big data chasm

James Willis, Technical Marketing Engineer at National Instruments, examines how improving test data storage and analysis can help overcome the big data chasm.

National Instruments (NI) works with engineers and scientists focused on automated test, and provides insights into the current outlook and future trends affecting the market. This article looks at the changes innovative technology companies need to make to streamline and improve the storage and analysis of valuable test data.

Automated test systems create vast amounts of data, and with the growing complexity of the consumer devices being tested, this volume will only increase. With test engineers under growing pressure to use this data to increase productivity, how do test departments ensure they have a system in place to store, mine and analyse it?

Typically, engineering teams work independently of IT within an organisation. This is a major source of frustration for IT administrators, who may not understand why the engineering department needs so much storage for test data. Working separately from engineering, IT departments often haven’t considered managing engineers’ and scientists’ test data as part of their role. However, it is important to recognise the role that IT can play in helping to manage this data. The sheer amount of data being created by engineering departments is opening a chasm between IT and engineering teams, and unless these groups work together to develop tools and methods to make better use of the data, the chasm will only grow wider as automated test systems become ever more complex.

Data Analytics

To stay competitive in the marketplace, a company needs to dive into the data produced by its test systems to search for important trends and correlations. Kimberly Madia, IBM’s Worldwide Data Security Strategist, recently said: “The shift to an increasingly flexible and dynamic development process requires rapid access to the appropriate test data”.

There is compelling business value in applying analytics and algorithms to the mountain of data produced. Identifying a long-term recurring data spike, which could look like an isolated anomaly over a short time frame, could prevent a defective product from being supplied. Uncovering problems like this through data mining could help improve production throughput and overall productivity.
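As a minimal sketch of that idea, assuming one measurement per tested unit held in a plain Python list, the code below flags samples more than `k` standard deviations above the mean and then checks whether the flagged positions recur at a regular interval. The function names, the 3-sigma threshold and the `min_repeats` rule are illustrative assumptions, not a method described by NI or IBM.

```python
from collections import Counter
from statistics import mean, stdev


def find_spikes(values, k=3.0):
    """Return indices of samples more than k standard deviations above the mean."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if v > mu + k * sigma]


def recurring_period(spike_indices, min_repeats=3):
    """Return the most common gap between consecutive spikes if that gap
    occurs at least min_repeats times; otherwise return None."""
    gaps = [b - a for a, b in zip(spike_indices, spike_indices[1:])]
    if not gaps:
        return None
    gap, count = Counter(gaps).most_common(1)[0]
    return gap if count >= min_repeats else None


# Hypothetical data: a stable reading with a spike on every 50th unit.
readings = [1.0 + 0.01 * (i % 5) for i in range(200)]
for i in range(10, 200, 50):
    readings[i] = 5.0

print(find_spikes(readings))                         # [10, 60, 110, 160]
print(recurring_period(find_spikes(readings)))       # 50
print(recurring_period(find_spikes(readings[:50])))  # None
```

Looking only at the first 50 units, the same data shows a single spike and no repeating pattern, which is exactly the short-time-frame view in which a recurring fault can pass for a one-off anomaly.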

Cross-functional teams

How do we ensure test departments work with IT to develop a data analytics solution that helps define a process for data management? One effective way to evaluate new data solutions is to form a cross-functional team drawn from both the IT and test departments, and to also involve a data scientist and a manager with a high-level view of the company.

Integrating the two departments in this way starts with reviewing questions such as:

* Is more than half of your analysis manual?
* Does your team spend more than 10% of their week searching for data trends?
* Are you analysing less than 80% of the data you’re collecting?
* Within your organisation – are teams using different data management tools?
* Is your data management tool used flexibly?
* Will the tools work with different data formats and rates?
* Can you easily share data with colleagues?
* Is your storage format flexible enough for future needs?

If two or more of these questions expose a gap in your current process, it is worth reconsidering how you work with your test data.

The traditional method of analysis is time-consuming: importing data into a spreadsheet program, adding formulas for processing, and finally displaying the results on a chart or graph. If testing has been automated, shouldn’t you also automate the analysis? Because manual analysis is so slow, only a small amount of data can be examined in a reasonable time frame, opening the door for key data trends to be missed.
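A minimal sketch of what automating that spreadsheet step can look like, assuming results are exported as CSV with hypothetical `test`, `value`, `low_limit` and `high_limit` columns: the script groups results by test, computes the mean and counts limit failures, so every row is processed the same way rather than a hand-picked subset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarise(csv_text):
    """Group results by test name and report count, mean value and limit failures.

    Assumes hypothetical columns: test, value, low_limit, high_limit.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row["value"])
        passed = float(row["low_limit"]) <= value <= float(row["high_limit"])
        groups[row["test"]].append((value, passed))
    return {
        test: {
            "count": len(results),
            "mean": mean(v for v, _ in results),
            "failures": sum(1 for _, ok in results if not ok),
        }
        for test, results in groups.items()
    }


# Hypothetical export from an automated test station.
sample = """test,value,low_limit,high_limit
supply_voltage,5.01,4.9,5.1
supply_voltage,5.20,4.9,5.1
supply_current,0.48,0.4,0.6
"""
for test, stats in summarise(sample).items():
    print(test, stats)
```

For files on disk, the same function can be applied to the text of each file in a results folder; the point is that the processing runs identically every time, with no manual formula entry.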

Data Storage Process

One of the most important steps in integrating these departments is agreeing on a standard process for data storage and presentation. For test data this can be particularly challenging, because it involves multiple measurement types from multiple sources and requires real-world analogue measurements to be digitised. This is termed Big Analogue Data and, just like traditional big data, it is characterised by five Vs:

* Volume – systems generating large volumes of data
* Variety – data that changes in structure and format
* Velocity – data that changes in sample rates
* Value – significant value is derived from the analysis of data
* Visibility – data is accessed or visible from disparate or multiple geographic locations.

Siemens – NI DIAdem

Siemens faced the challenge of implementing a Big Analogue Data management system. Its engineers needed to investigate an issue that was causing high-voltage transient signals from the overhead lines to the pantograph on one of its light-rail transit vehicles.

Siemens engineer Ryan Parkinson explained: “Recording data with various rates and formats is only half the challenge; making sense of the data and effectively analysing it is the other half”.

The measurement system used at Siemens creates 16 GB of data per day, and the team typically runs it for over three months. This generates over 1440 GB of test data (16 GB × 90 days), not including the video recordings captured at the same time. When evaluating data management systems, Siemens considered factors such as synchronisation across multiple channels, which may run at different sample rates.
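Synchronising channels recorded at different rates usually means bringing them onto a common timebase. The sketch below does this with plain linear interpolation, an assumption chosen for simplicity: the article does not say which method Siemens or DIAdem uses, and dedicated analysis software may offer more sophisticated resampling. The channel names and rates are hypothetical.

```python
from bisect import bisect_right


def resample(times, values, new_times):
    """Linearly interpolate a channel onto new_times.

    times must be sorted ascending; requests outside the recorded range
    are clamped to the first or last sample.
    """
    out = []
    for t in new_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def align(channels, rate_hz, start, stop):
    """Resample every channel in {name: (times, values)} onto one shared timebase."""
    n = int(round((stop - start) * rate_hz)) + 1
    timebase = [start + i / rate_hz for i in range(n)]
    return timebase, {
        name: resample(times, values, timebase)
        for name, (times, values) in channels.items()
    }


# Hypothetical channels: voltage logged at 10 Hz, current at 2 Hz.
channels = {
    "voltage": ([i / 10 for i in range(11)], [float(i) for i in range(11)]),
    "current": ([0.0, 0.5, 1.0], [0.0, 1.0, 2.0]),
}
timebase, aligned = align(channels, rate_hz=4, start=0.0, stop=1.0)
print(timebase)            # [0.0, 0.25, 0.5, 0.75, 1.0]
print(aligned["current"])  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Once every channel shares the same timestamps, values can be compared sample by sample, which is what makes cross-channel correlation possible.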

To address these challenges, it is necessary to use software that is optimised for data management. In this case, Siemens selected NI DIAdem. “DIAdem uses automatically stored metadata to open, navigate, zoom and perform computations on extremely large files very quickly”, explained Parkinson.
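DIAdem’s internals are not described in the article, but the general principle behind the quote can be sketched: if a file records metadata such as the sample rate and sample count, a reader can compute where any time window lives and seek straight to it instead of loading the whole file. The byte layout below is a deliberately simplified, hypothetical one and is not the TDMS format.

```python
import os
import struct
import tempfile

# Hypothetical layout (NOT the TDMS format): a fixed header holding the
# sample rate and sample count, followed by little-endian float64 samples.
HEADER = struct.Struct("<dQ")
SAMPLE = struct.Struct("<d")


def write_channel(path, rate_hz, samples):
    """Write the metadata header followed by the raw samples."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(rate_hz, len(samples)))
        f.write(struct.pack(f"<{len(samples)}d", *samples))


def read_window(path, t_start, t_stop):
    """Return samples in [t_start, t_stop) seconds without reading the rest
    of the file: the header metadata tells us exactly where to seek."""
    with open(path, "rb") as f:
        rate_hz, count = HEADER.unpack(f.read(HEADER.size))
        first = max(0, round(t_start * rate_hz))
        last = min(count, round(t_stop * rate_hz))
        if last <= first:
            return []
        f.seek(HEADER.size + first * SAMPLE.size)
        data = f.read((last - first) * SAMPLE.size)
        return list(struct.unpack(f"<{last - first}d", data))


# Usage: 10 s of a hypothetical channel at 100 Hz, then zoom into 50 ms.
path = os.path.join(tempfile.mkdtemp(), "channel.bin")
write_channel(path, 100.0, [float(i) for i in range(1000)])
print(read_window(path, 2.0, 2.05))  # [200.0, 201.0, 202.0, 203.0, 204.0]
```

The cost of reading a window depends on the window size, not the file size, which is why metadata-driven navigation stays fast even on very large files.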

Technical Data Management Streaming

Using the Technical Data Management Streaming (TDMS) file format, Siemens realised advantages over more traditional formats. These included the ability to store many channels of data of different types and rates, synchronise them and correlate the data. After gathering this data, the next challenge was automating the analysis. Parkinson used one of DIAdem’s key features to solve this challenge:

“DIAdem also supports scripting. Because we ran our monitoring system for more than three months, we generated hundreds of gigabytes of data and it was not feasible to open each file and manually analyse it”.

After determining the critical data needed to establish the root cause of the transients, the team at Siemens created a script that would open each file, look for critical events, and summarise the findings.
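The article does not reproduce Siemens’ DIAdem script, so the following is only a standalone Python sketch of the same open, scan and summarise pattern, not DIAdem’s scripting interface. It assumes each file is a CSV with hypothetical `time` and `voltage` columns and treats an “event” as the absolute voltage rising above a chosen threshold; both the layout and the event definition are illustrative assumptions.

```python
import csv
from pathlib import Path


def scan_file(path, threshold):
    """Count transient events in one file: an event starts each time |voltage|
    rises above threshold after having been at or below it.

    Assumes a hypothetical CSV layout with 'time' and 'voltage' columns.
    Returns (event_count, peak_abs_voltage, time_of_first_event).
    """
    events, peak, first, above = 0, 0.0, None, False
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            v = abs(float(row["voltage"]))
            peak = max(peak, v)
            is_above = v > threshold
            if is_above and not above:  # rising edge: a new event begins
                events += 1
                if first is None:
                    first = float(row["time"])
            above = is_above
    return events, peak, first


def summarise_directory(folder, threshold, pattern="*.csv"):
    """Scan every matching file and report only the files that contain events."""
    findings = {}
    for path in sorted(Path(folder).glob(pattern)):
        events, peak, first = scan_file(path, threshold)
        if events:
            findings[path.name] = {"events": events, "peak": peak, "first_time": first}
    return findings
```

Calling `summarise_directory("logs", threshold=800.0)` on a hypothetical `logs` folder would return one entry per file that contains a transient. Counting rising edges rather than individual samples means a transient lasting many samples is reported once, which keeps the summary readable when hundreds of files are scanned.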

Developing a solution

Building a bridge between IT and engineering teams to create a complete data analytics solution takes time. Many companies make the mistake of expecting a full data analytics solution within an unrealistic time frame. Developing a solution requires a dedicated planning period, which stops teams from proposing solutions before they fully understand their true data needs. An approach that has succeeded in many leading companies is to run an internal pilot to identify data analysis requirements. These internal pilots define an end-to-end process for analysing data, integrate test data into existing IT infrastructure and trial data analytics software packages to see which one fits best. The trial period gives IT time to learn the differences between traditional big data and test data, and to pinpoint strategies for a company-wide implementation of a test data solution.

A successful company-wide roll-out of a test data solution will help address production bottlenecks, improve quality and reduce time to market by providing a more comprehensive picture of a product’s performance. These benefits will help increase overall profitability – what more could you want from your data?

Find out more

Complete Guide to Building a Measurement System
