The deployment of smart meters in Europe and worldwide produces unstructured energy data that need to be stored, integrated, retrieved and analysed. This requires novel high-performance computer systems and algorithms to be designed and evaluated.
The energy data collected by the metering infrastructure represent the largest share of smart grid energy-related data. In the big data domain, storing and managing such large volumes is a major challenge. In general, data are cheap, but the value-added information extracted from them is a commodity that has to be processed and stored. Big data architectures help machines understand and unveil the hidden association rules between data items. To support this process, we typically store and manage large-scale structured or unstructured data in NoSQL databases, process them on an HDFS-based architecture, or use cluster storage management.
Nevertheless, energy consumption is not just a simple measurement taken by a smart meter. It reflects a behavior, habit or attitude that can be recognized in consumption records. Processing behavioral patterns raises several challenges related to the variety of such information.
In this area, the principal goals of big data technologies apply:
- To collect and manage large-scale unstructured data sets that cannot be handled by classical RDBMSs: unstructured data are far more abundant than structured data
- To provide a compact description of data contents and detect trends in the evolution of business data
- To identify non-linear associations between different objects in order to infer laws and reveal relationships and behaviors
Data appear in different forms and are generated at different frequencies. Looking at these variables, we see that their values change at every timestamp. To deal with that, we use high-performance computing technologies (for high-volume, high-velocity data processing): MapReduce, GraphX, GPU parallel computing, multi-threading libraries and so on. The data we handle include:
- Smart meter data as time series of real-world data or data from publicly available datasets
- Temperature data
- Energy market data time series from residential pricing programs
- General information
- Other data
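As an illustration of the MapReduce style of processing mentioned above, here is a minimal single-machine sketch that sums smart meter readings per meter and per day. The sample readings, the `(meter_id, timestamp, kWh)` layout and the function names are assumptions made for this example; a real deployment would run the same two steps on a distributed engine rather than in one Python process.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample readings: (meter_id, ISO timestamp, energy in kWh).
READINGS = [
    ("meter-1", "2016-01-01T00:00", 0.5),
    ("meter-1", "2016-01-01T00:30", 0.25),
    ("meter-2", "2016-01-01T00:00", 1.0),
    ("meter-1", "2016-01-02T00:00", 0.75),
    ("meter-2", "2016-01-01T00:30", 1.5),
]


def map_reading(reading):
    """Map step: emit ((meter_id, day), kWh) for one reading."""
    meter_id, timestamp, kwh = reading
    day = timestamp.split("T")[0]
    return (meter_id, day), kwh


def reduce_totals(totals, pair):
    """Reduce step: add one mapped value to the running total of its key."""
    key, kwh = pair
    totals[key] += kwh
    return totals


def daily_totals(readings):
    """Apply the map step to every reading, then fold the results by key."""
    mapped = map(map_reading, readings)
    return dict(reduce(reduce_totals, mapped, defaultdict(float)))


if __name__ == "__main__":
    for (meter_id, day), kwh in sorted(daily_totals(READINGS).items()):
        print(meter_id, day, kwh)
```

Because the map step treats each reading independently and the reduce step only combines values that share a key, the work can be split across many machines, which is what makes this pattern suitable for high-volume meter data.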
Big data veracity refers to the biases, noise and abnormalities in data. Compared with volume and velocity, minimizing noise is the biggest challenge in the world of big data. It requires solid data-cleaning skills to limit the accumulation of dirty data in the system.
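A simple cleaning step for meter readings might look like the sketch below. It is one possible approach, not a description of any particular platform: the function name, the threshold value and the choice of the median absolute deviation (MAD) as the outlier rule are all assumptions made for illustration.

```python
from statistics import median


def clean_readings(values, threshold=5.0):
    """Drop missing, negative and outlying readings (kWh).

    A reading is treated as an outlier when its distance to the median
    exceeds `threshold` times the median absolute deviation (MAD).
    """
    # Missing values are represented as None; negative energy is invalid.
    valid = [v for v in values if v is not None and v >= 0]
    if not valid:
        return []
    med = median(valid)
    mad = median(abs(v - med) for v in valid)
    if mad == 0:
        # At least half of the readings equal the median, so the MAD
        # gives no usable scale; keep every valid reading.
        return valid
    return [v for v in valid if abs(v - med) <= threshold * mad]


if __name__ == "__main__":
    raw = [0.5, 0.6, None, -1.0, 0.55, 42.0, 0.45]
    print(clean_readings(raw))
```

The median and the MAD are used here instead of the mean and standard deviation because a single extreme reading shifts them far less, so the rule stays stable on noisy data.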
To understand data, it is necessary to program computers to optimize a performance criterion using example data or past experience. This methodology is called machine learning. Learning is used when human expertise does not exist or when people are unable to explain their expertise. It also accounts for the fact that a solution can change over time and needs to be adapted to particular cases.
GridPocket addresses this situation and provides energy-related value-added service solutions. Our scalable platform uses the Hadoop ecosystem for collecting, manipulating and analyzing smart grid data. The platform is generic because it allows several types of end users to interact with it concurrently, according to their specific needs:
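To make "optimizing a performance criterion using example data" concrete, the sketch below fits a straight line relating outdoor temperature to daily consumption by minimizing the mean squared error. The observations are invented for the example, and a linear model is only an assumption chosen for brevity; real consumption behavior is usually more complex.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Performance criterion: average squared prediction error."""
    errors = ((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return sum(errors) / len(xs)


# Hypothetical past observations: temperature (deg C) and daily use (kWh).
TEMPERATURES = [0.0, 5.0, 10.0, 15.0, 20.0]
CONSUMPTION = [30.0, 25.0, 20.0, 15.0, 10.0]

slope, intercept = fit_line(TEMPERATURES, CONSUMPTION)
# Predict consumption for a day at 12 deg C with the learned parameters.
predicted = slope * 12.0 + intercept

if __name__ == "__main__":
    print(slope, intercept, predicted)
    print(mean_squared_error(TEMPERATURES, CONSUMPTION, slope, intercept))
```

When new observations arrive, calling `fit_line` again on the updated data adapts the model, which reflects the point that a learned solution can change over time.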
- Energy utilities, which have extended access to the data analysis
- Residential customers, who can manipulate their own data
- Companies and public institutions, which can exploit their data
- Data scientists, who can perform further analysis and research