
Regular Expressions

 

My semester class project is a program that ingests tabular text output from the NSSL Mesocyclone Detection Algorithm (MDA) and performs user-selectable filtering.

The MDA processes WSR-88D level-II data and attempts to detect and classify vortices. During a typical severe weather event, a single radar may produce thousands of individual mesocyclone detections, many of which are weak circulations of little interest to me. A large number are also false detections, identified while the radar is in clear-air mode. The purpose of this program is to remove unwanted detections in order to create a more meaningful data set. The MDA output file (fort.40) is in tab-delimited format. Each line contains 120 columns of information and represents a single mesocyclone detection. Click here for a complete description of a fort.40 file.
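
As a rough illustration of the format described above, one tab-delimited line could be split into its columns in C as sketched below. The names `split_tabs` and `MDA_NUM_COLS` are hypothetical and are not taken from the actual program; the only facts assumed from the description are the tab delimiter and the 120-column count.

```c
#include <stddef.h>
#include <string.h>

#define MDA_NUM_COLS 120  /* columns per fort.40 line, per the description */

/*
 * Split a tab-delimited line in place (hypothetical helper).
 * Each tab and the line ending are overwritten with '\0', and fields[i]
 * is pointed at the start of column i + 1. Returns the number of fields
 * stored, which is never more than max_fields.
 */
size_t split_tabs(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */

    while (n < max_fields) {
        fields[n++] = p;                  /* record the start of this column */
        p = strchr(p, '\t');              /* find the next delimiter */
        if (p == NULL)
            break;                        /* last column reached */
        *p++ = '\0';                      /* terminate this column */
    }
    return n;
}
```

`strchr` is used here rather than `strtok` because `strtok` merges consecutive delimiters, which would shift every later column if a field were empty. A caller could treat a line as malformed when fewer than `MDA_NUM_COLS` fields come back.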

The filtering is a multi-step process, as described below:

  • Step 1: Prompt the user to enter the desired mode:
    • Mode 1: (1) Add labels to the columns of the output file and (2) rescale the latitude and longitude values in columns 94 - 97. The numerical format of these values is incorrect: for example, a latitude of 35.54 degrees is originally formatted as 35540. To remedy this, the values are divided by 1000.
    • Mode 2: Same as Mode 1, and also remove mesocyclone detections that have a range of 147 km or more, or 5 km or less. The detection range is located in column 6.
    • Mode 3: Same as Mode 1, and also remove detections that have a Mesocyclone Strength Rank (MSr) of 0. This is done by checking column 10. The MSr is a measure of the relative intensity of a mesocyclone detection and is provided as a non-dimensional index ranging from 0 (very weak) to 25 (exceptionally intense).
    • Mode 4: Same as Mode 1, and also remove detections made while the radar is in clear-air mode (Volume Coverage Pattern 31 or 32). The VCP attribute is found in column 100.
    • Mode 5: The combined functionality of Modes 1 through 4.
    • Mode 6: Same as Mode 5, except that the user can enter the MSr threshold to use instead of the standard value of 0.
  • Step 2: Prompt user to enter the name of the input file containing the pre-filtered data.
  • Step 3: Print header/column labels to output file.
  • Step 4: Ingest each line of data from the input file.
  • Step 5: Quality-check column 97. Occasionally the MDA writes a value of 1E7 to column 98. Because this number is so large, it runs into adjacent column 97, which contains mesocyclone longitudes. For example, a longitude of -98765 might appear as -9876510000000 instead. The program tests whether each value in column 97 is less than -180000 (corresponding to a longitude of -180 degrees), which only occurs if column 98 has "bled" into column 97. To correct this, the value in column 97 is divided by 1E8 and 0.1 is then added: -9876510000000 / 1E8 = -98765.1, and -98765.1 + 0.1 = -98765.
  • Step 6: Test each detection to see if it passes the requirements set by the user in step 1. If so, write the data for that detection to an output file.
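
The longitude repair in step 5 and the per-detection test in step 6 can be sketched in C roughly as follows. This is not the code from either version of the program: the `detection` struct and all function names are hypothetical, the range limits are assumed to be inclusive (147 km or more, 5 km or less), and a detection is assumed to be removed when its MSr is less than or equal to the threshold.

```c
#include <stdbool.h>

/* Hypothetical record holding only the attributes the filter inspects. */
struct detection {
    double range_km;  /* column 6 */
    double msr;       /* column 10 */
    int    vcp;       /* column 100 */
};

/* Step 5: undo the bleed of column 98 (1E7) into column 97. */
double repair_longitude(double raw)
{
    if (raw < -180000.0)          /* beyond -180 degrees: value is corrupted */
        raw = raw / 1e8 + 0.1;    /* e.g. -9876510000000 -> about -98765 */
    return raw;
}

/* Mode 1: convert a raw latitude or longitude value to degrees. */
double to_degrees(double raw)
{
    return raw / 1000.0;
}

/* Step 6: return true if the detection should be written to the output. */
bool passes_filter(const struct detection *d, int mode, double msr_threshold)
{
    bool check_range = (mode == 2 || mode == 5 || mode == 6);
    bool check_msr   = (mode == 3 || mode == 5 || mode == 6);
    bool check_vcp   = (mode == 4 || mode == 5 || mode == 6);

    if (mode != 6)
        msr_threshold = 0.0;      /* standard threshold outside mode 6 */

    /* Assumption: both range limits are inclusive. */
    if (check_range && (d->range_km <= 5.0 || d->range_km >= 147.0))
        return false;
    /* Assumption: MSr at or below the threshold is removed. */
    if (check_msr && d->msr <= msr_threshold)
        return false;
    /* Clear-air mode is VCP 31 or 32. */
    if (check_vcp && (d->vcp == 31 || d->vcp == 32))
        return false;
    return true;
}
```

Because the MSr ranges from 0 to 25, testing "less than or equal to 0" removes exactly the detections with an MSr of 0, so the same comparison serves modes 3, 5 and 6.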

I wrote two versions of this program, one in Perl and one in C, and compared them. The functionality of the two versions is essentially the same. However, subtle differences make the Perl version more attractive, as summarized below:

  • Perl is an extremely portable programming language. I run the filtering program on both Linux and Sun UNIX machines. The Perl version is very handy because, unlike the C version, it does not have to be compiled on each machine. The Perl Journal states it best:
    "Perl is compiled on-the-fly. This means that as soon as you write your program, you can run it - you don't have to wait for your compiler to generate object code. Since Perl programs needn't be compiled for a particular type of computer, they can run on all of them without modification. The same Perl program can run on Unix, Windows, NT, Macs, DOS, Plan 9, OS/2, VMS, and AmigaOS."

  • In the C version, the data type of each fort.40 attribute must be declared in advance. While only a handful of attributes have floating-point precision (the rest being integers), the C version ingests and outputs all of them as floats. The result is an output file that includes a large amount of unnecessary precision. The Perl version, on the other hand, stores numbers as signed integers if possible, or as double-precision floating-point values in the machine's native format otherwise (see below for examples). As a result, the filtered data file generated by the Perl version is quite a bit smaller than the one generated by the C version, usually by roughly 50%.
| Attribute | Value in C Version | Value in Perl Version |
| Range     | 152.000000         | 152                   |
| Latitude  | 36.1220000         | 36.122                |
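
The size difference comes down to how each number is printed. The C sketch below assumes values are written with printf-style conversions; the format strings actually used by the C version are not shown on this page, and the helper names are hypothetical. `%f` always prints six decimal places, while `%.15g` drops trailing zeros, which approximates the compact form Perl typically produces when it prints a number.

```c
#include <stdio.h>

/* Format with a fixed number of decimals: 152 becomes "152.000000". */
int format_fixed(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%f", value);
}

/* Format with trailing zeros dropped: 152 becomes "152". */
int format_compact(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.15g", value);
}
```

Both helpers return the number of characters written, so the per-value saving can be measured directly: 10 characters against 3 for a range of 152.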

 

Both versions are available to download through the links below. A sample fort.40 file is also provided to test their functionality. This file contains MDA output from the KAMA (Amarillo, Texas) radar from February 19, 2002.

Files:

  • Filtering program (C version, Perl version)
  • Sample fort.40 file


Last updated: Saturday 26 October 2002
All material Copyright © 2002 Kevin M. McGrath. All rights reserved.
Please use only with permission. Violators will be prosecuted.