A SOFTWARE PACKAGE TO AUTOMATE THE OPERATION OF A MODEL-BASED SYSTEM FOR AIR POLLUTION IDENTIFICATION, MONITORING AND FORECASTING
Building and deploying a software solution to streamline the operation of a system for air pollution identification, monitoring and forecasting, combined with expert consulting to improve the system's computation performance
The Institute for Ecology of Industrial Areas operates a system for air pollution identification, monitoring and forecasting. The system collects and monitors data about air pollution emission sources, models the emissions from those sources, measures pollutant concentration levels, and tracks and predicts air pollution diffusion in an area under scrutiny. The system models air pollutant emission and diffusion in the Voivodeship of Silesia, with a population of about 4.5 million. The modelling is based on imported weather forecast data, pollutant concentration data recorded in the monitoring stations and estimations of air pollution influx from outside the monitored area. Among other things, the system makes it possible to forecast air pollution levels under changing weather conditions, to identify specific pollutants and emission sources in selected locations, and to establish the quality of air in a location without the need to build and maintain a dedicated monitoring station. The monitoring services are delivered to local, regional and national government bodies as well as public and private research agencies.
To deliver the service, the client operated a system built around a set of constituent numerical models: WRF (Weather Research and Forecasting) - a numerical weather prediction system, CALPUFF - an air quality dispersion modeling system developed by Exponent, and CALMET - the meteorological model that prepares the wind and temperature fields CALPUFF consumes. The client's system was manually operated, heavily dependent on the expert knowledge of a single individual, and fairly inflexible. The client wanted to depart from manual procedures, automate the operation of the system, capture the required expertise in the software itself, and increase the operational flexibility of the system.
The CPU-intensive model programs (WRF, CALPUFF, CALMET) and network-distributed computing (InfiniBand, MPI) required the client to set up and manage a fairly complicated system, including configuration files, environment variables and command-line switches. To depart from manual procedures, we set up a technical environment for the deployment of the constituent models and developed a software package (a set of more than 250 scripts) to automate the compilation, configuration and parameterization of the models and to run the system's HPC (High Performance Computing) computations more efficiently. The solution also supports geographic map format conversions, i.e. the translation of geolocated data between the models involved, e.g. the export of WRF weather-related data into the CALPUFF model.
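The orchestration described above can be pictured as a thin driver that generates each model's configuration and runs the executables in dependency order. The sketch below is a minimal illustration, not the actual package: the model names, `.exe`/`.inp` naming convention and directory layout are assumptions for the example.

```python
# Hypothetical sketch of a pipeline driver for the constituent models.
# Names, paths and file conventions are illustrative assumptions.
import subprocess
from pathlib import Path

# Order matters: WRF produces meteorology, CALMET refines it for CALPUFF,
# CALPUFF then models pollutant dispersion.
PIPELINE = ["wrf", "calmet", "calpuff"]

def build_command(model: str, config_dir: Path) -> list[str]:
    """Assemble the command line for one model from its generated config file."""
    config_file = config_dir / f"{model}.inp"  # assumed naming convention
    return [f"./{model}.exe", str(config_file)]

def run_pipeline(config_dir: Path) -> None:
    """Run each model in order, stopping on the first failure."""
    for model in PIPELINE:
        subprocess.run(build_command(model, config_dir), check=True)
```

Driving the models from one script like this is what removes the manual bookkeeping of configuration files and command-line switches mentioned above.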
For the WRF model specifically, we implemented WRF geographic domains for the system and wrote a script to semi-automatically re-configure and compile the WRF model whenever a new version of the WRF software becomes available. We also developed scripts to visualize emission data from an ArcGIS-based tool and to export the data into an MS SQL database for further processing.
In designing and implementing the software, we aimed to synchronize the operation of the system's constituent models and to capture the expert's knowledge and deploy it coherently across the whole system. All system parameters are now generated from the settings in the software package, i.e. the parameters set in the scripts are propagated automatically across the whole system. Importantly, the new set-up increased the flexibility of the system and made it much easier for the user to experiment with varied parameters.
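Single-source parameterization of this kind can be sketched as one central settings store from which each model's configuration is rendered, so a value changed once propagates everywhere. The keys and the namelist-like output below are assumptions for illustration, not the system's real parameter names.

```python
# Hypothetical sketch of single-source parameterization: per-model
# configs are rendered from one central settings dict. Keys and the
# Fortran-namelist-style layout are illustrative assumptions.
SETTINGS = {
    "grid_nx": 120,   # grid cells, west-east (example value)
    "grid_ny": 90,    # grid cells, south-north (example value)
    "cell_km": 1.0,   # grid cell size in km (example value)
}

def render_namelist(section: str, keys: list[str]) -> str:
    """Emit a namelist-style config section from the central settings."""
    lines = [f"&{section}"]
    lines += [f"  {key} = {SETTINGS[key]}" for key in keys]
    lines.append("/")
    return "\n".join(lines)
```

Because every model's input file is generated from `SETTINGS`, an experiment with a different grid is a one-line change rather than a hunt through several hand-edited configuration files.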
On the computation side, we delivered expert consulting, conducting research to arrive at an efficient configuration of the high-performance computing (HPC) cluster. We also implemented a diagnostic tool to monitor NIC usage on the HPC cluster, identify computation bottlenecks, and take measures to streamline the computation process by reducing system latency - InfiniBand replaced Gigabit Ethernet. Finally, we built and deployed a tool for analyzing the system codebase and locating specific functionalities, which facilitates optimization, refactoring and codebase maintenance work.
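A NIC-usage diagnostic of the kind mentioned above can be as simple as periodically sampling per-interface byte counters from Linux's `/proc/net/dev` and comparing deltas to spot a saturated link on a cluster node. The case study does not show the real tool's interface, so the sketch below is only a plausible minimal version.

```python
# Hypothetical sketch of a NIC usage monitor: parse per-interface
# rx/tx byte counters from Linux's /proc/net/dev. The real diagnostic
# tool's design is not described in the case study.
def parse_proc_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        name, _, fields = line.partition(":")
        parts = fields.split()
        if len(parts) >= 9:
            # field 0 is rx_bytes; field 8 is tx_bytes
            counters[name.strip()] = (int(parts[0]), int(parts[8]))
    return counters

def sample() -> dict[str, tuple[int, int]]:
    """Take one snapshot of the live counters (Linux only)."""
    with open("/proc/net/dev") as f:
        return parse_proc_net_dev(f.read())
```

Sampling twice and dividing the byte deltas by the interval gives per-interface throughput, which is enough to see whether MPI traffic is pinned at the limits of a Gigabit Ethernet link.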
- software package with a set of over 250 scripts for automating the system operation
- expert knowledge captured in the software system
- network analysis and recommendations for high performance computation optimization
- diagnostic tools for system reconfiguration, maintenance and upgrades
- more efficient system compilation, configuration, parameterization and operation
- more efficient high performance computations
- greater system flexibility for running experiments
- reduced risk / reduced dependency on an individual expert’s knowledge