13.2.1 Impact of Feature Selection
APPLICATIONS IV: ALGAE POPULATION ESTIMATION
TABLE 13.1 Features selected: FRFS
Species   Subset
1         {season, size, flow, 1, 2, 3, 7}
2         {season, size, flow, 1, 2, 3, 7, 8}
3         {season, size, flow, 1, 3, 4, 5, 7}
4         {season, size, flow, 1, 2, 5}
5         {season, size, flow, 1, 2, 4, 7, 8}
6         {season, size, flow, 1, 2, 4, 5, 8}
7         {season, size, flow, 1, 7, 8}
To investigate the impact of feature selection on predictor performance, the experimentation was carried out both with and without FRFS. The unreduced data for
each species of alga were supplied to each predictor and used in evaluation via cross-validation. Then the same data were processed by FRFS to reduce dimensionality and evaluated in an identical fashion. This resulted in, on average, a 7-attribute dataset selected from the original 11-attribute one. The exact selected attributes were different for each alga species (as can be seen in Table 13.1), although certain attributes were present in all 7 reduct sets, namely the season, the size of the river, the flow rate of the water, and chemical concentration 1. The obtained reducts could not be verified based on empirical evidence because the dataset documentation mentions the names of the concentration attributes but not their ordering in the data; hence the chemical concentrations must be referred to by number rather than by name. However, based on previous experience with FRFS [163], it is expected that the selected feature subsets would overall make sense to an expert.
It must also be noted that the quality of the selected attributes is difficult to verify directly in the absence of a suitable quality metric. The most accessible way is therefore to use the reduced and unreduced data to train a learning system and compare the results. This gives an indirect measure of subset quality.
The results of experimentation using linear regression can be found in Figure 13.2. It can be seen that both approaches perform similarly in terms of RMSE and MAE, with FRFS-based predictions somewhat more accurate in general. This trend is reflected in the results for M5Prime (presented in Figure 13.3) and Pace (Figure 13.5). For SMOreg (Figure 13.6) the results for both methods are very similar, which is to be expected as SVM methods are not sensitive to feature selection. It is worth reiterating that the task of the system is to reduce the number of measurements that must be obtained while maintaining prediction performance. This is clearly the case in these experiments.
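The two summary claims about Table 13.1 (the attributes shared by all seven reducts and the average reduct size) can be checked with a few lines of Python. This is only a sketch: the sets below are a transcription of the table as read here, the integers stand for the numbered chemical concentration attributes, and the names `REDUCTS`, `common_attributes`, and `mean_reduct_size` are illustrative rather than taken from the source.

```python
# Reducts transcribed from Table 13.1 (assumed reading of the table).
# Integers denote the numbered chemical concentration attributes.
REDUCTS = {
    1: {"season", "size", "flow", 1, 2, 3, 7},
    2: {"season", "size", "flow", 1, 2, 3, 7, 8},
    3: {"season", "size", "flow", 1, 3, 4, 5, 7},
    4: {"season", "size", "flow", 1, 2, 5},
    5: {"season", "size", "flow", 1, 2, 4, 7, 8},
    6: {"season", "size", "flow", 1, 2, 4, 5, 8},
    7: {"season", "size", "flow", 1, 7, 8},
}


def common_attributes(reducts):
    """Return the attributes that appear in every reduct."""
    return set.intersection(*reducts.values())


def mean_reduct_size(reducts):
    """Return the average number of attributes per reduct."""
    return sum(len(r) for r in reducts.values()) / len(reducts)


common = common_attributes(REDUCTS)
mean_size = mean_reduct_size(REDUCTS)

# Sort with key=str because the sets mix strings and integers.
print("Shared by all species:", sorted(common, key=str))
print("Mean reduct size:", round(mean_size, 2))
```

Under this transcription the shared attributes are season, size, flow, and concentration 1, and the mean reduct size is 51/7, which rounds to the 7 attributes quoted in the text.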
EXPERIMENTATION
Figure 13.2 Unreduced and reduced data RMSEs and MAEs with linear regression (two panels, RMSE and MAE; x-axis: Algae; series: Unreduced, Reduced)
Figure 13.4 shows the results for the BPNN-based predictor. Here a small difference in performance can be seen between the two approaches. The method that incorporates FRFS produces some improvement in accuracy for each algae estimation problem. Again, note that the improvements in accuracy are obtained with fewer measured variables, which is important for dynamic systems where observables are often restricted, or where the cost of obtaining more measurements is high. In the river algae domain, for instance, providing different measurements has different
costs attached. It is trivial to give the time of year and size of river, but flow rate may need extra equipment. Additionally, each of the measurements of concentration of chemicals may need its own process, requiring time, well-trained personnel, and money. Reducing the number of measurements to be made significantly enhances the potential of the estimator system.
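The evaluation protocol used in this subsection (the same learner, the same cross-validation, once on all attributes and once on a selected subset, compared by RMSE and MAE) can be sketched as follows. Everything specific here is an assumption made for illustration: the data are synthetic and are not the algae dataset, the k-nearest-neighbour regressor is a stand-in for the predictors named in the text, and the "reduced" column list is chosen by hand rather than produced by FRFS.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in learner: mean target of the k nearest training rows."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cross_validate(X, y, columns, folds=5, k=3):
    """Return (RMSE, MAE) over all held-out predictions, using only `columns`."""
    Xs = [[row[c] for c in columns] for row in X]
    actual, predicted = [], []
    for f in range(folds):
        # Row i is held out in fold i % folds; all other rows are training data.
        train_idx = [i for i in range(len(Xs)) if i % folds != f]
        test_idx = [i for i in range(len(Xs)) if i % folds == f]
        train_X = [Xs[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            actual.append(y[i])
            predicted.append(knn_predict(train_X, train_y, Xs[i], k))
    return rmse(actual, predicted), mae(actual, predicted)


# Synthetic data: 11 attributes, but the target depends on the first 3 only.
rng = random.Random(0)
X = [[rng.random() for _ in range(11)] for _ in range(200)]
y = [3 * r[0] + 2 * r[1] - r[2] + rng.gauss(0, 0.05) for r in X]

unreduced = cross_validate(X, y, columns=list(range(11)))
reduced = cross_validate(X, y, columns=[0, 1, 2])  # hand-picked subset, not FRFS output

print("Unreduced RMSE, MAE:", unreduced)
print("Reduced   RMSE, MAE:", reduced)
```

Because both runs share the folds and the learner, any difference in RMSE or MAE can be attributed to the attribute subset alone, which is what makes the comparison an indirect measure of subset quality.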
Figure 13.3 Unreduced and reduced data RMSEs and MAEs with M5Prime (two panels, RMSE and MAE; x-axis: Algae; series: Unreduced, Reduced)
Figure 13.4 Unreduced and reduced data RMSEs and MAEs with BPNN (two panels, RMSE and MAE; x-axis: Algae; series: Unreduced, Reduced)
Comparison with RELIEF
To further show the utility of feature selection, and in particular the benefits of using FRFS, a well-established FS algorithm was chosen for experimental comparisons: Relief (see Section 4.2.1.1). Unlike most FS methods, both FRFS and Relief can handle continuous decision features. For the experimentation
presented here, only those features that result in a final positive weight are selected (see Table 13.2). Figures 13.7 to 13.13 show the results for the unreduced, FRFS-reduced, and Relief-reduced data for algae species 1 to 7. It is clear that estimators trained using data reduced by FRFS generally outperform those trained using