USING KNOWLEDGE DISCOVERY FOR ONTOLOGY LEARNING

of the most important and demanding, in the remainder of this subsection we briefly describe both methods (clustering and LSI) for suggesting concepts. Turning to the second approach, naming of the concepts is based on proposing labels comprising the most common keywords (describing a subset of documents belonging to the topic), and alternatively on providing the most discriminative keywords (enabling classification of documents into the topic relative to the neighboring topics). Methods for document classification are briefly described in subsection 2.6.2.

Document clustering (Steinbach et al., 2000) is based on a general data clustering algorithm adapted for textual data by representing each document as a word-vector, which for each word contains a weight proportional to the number of occurrences of the word (usually the TFIDF weight given in Equation (2.1)).

$$d^{(i)} = TF(W_i, d) \cdot IDF(W_i), \quad \text{where } IDF(W_i) = \log \frac{D}{DF(W_i)} \qquad (2.1)$$

where D is the number of documents; the document frequency DF(W) is the number of documents in which the word W occurs at least once; and TF(W, d) is the number of times the word W occurs in document d. The exact formula varies somewhat between approaches, but the basic idea remains the same: the weight measures how frequently the given word occurs in the document at hand and how common (or otherwise) the word is in the entire document collection.

The similarity of two documents is commonly measured by the cosine-similarity between the word-vector representations of the documents (see Equation (2.2)). The clustering algorithm groups documents based on their similarity, putting similar documents in the same group. Cosine-similarity is also commonly used by some supervised learning algorithms for document categorization, which can be useful in populating topic ontologies (ontology learning scenario 3 in Section 2.5). Given a new document, cosine-similarity is used to find the most similar documents (e.g., using the k-Nearest Neighbor algorithm (Mitchell, 1997)): the cosine-similarity between the new document and every document in the collection is computed to find the k most similar documents, whose categories (topics) are then used to assign categories to the new document. For documents d_i and d_j, the similarity is calculated as given in Equation (2.2).

$$\cos(d_i, d_j) = \frac{\sum_k d_{ik} \, d_{jk}}{\sqrt{\sum_l d_{il}^2 \, \sum_m d_{jm}^2}} \qquad (2.2)$$

Note that the cosine-similarity between two identical documents is 1 and between two documents that share no words is 0.
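As an illustration, the weighting of Equation (2.1), the similarity of Equation (2.2), and the k-Nearest Neighbor assignment of topics can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used by any particular system: the toy documents, the topic labels, the function names, the choice to ignore query words unseen in the collection, and the majority vote among the k neighbors are all assumptions made for the example.

```python
import math
from collections import Counter

def idf_table(docs):
    """IDF(W) = log(D / DF(W)) for every word in the tokenized collection."""
    n_docs = len(docs)
    # DF(W): number of documents in which W occurs at least once.
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df[word]) for word in df}

def tfidf(doc, idf):
    """Sparse word-vector of one document: TF(W, d) * IDF(W), as in Equation (2.1)."""
    tf = Counter(doc)
    # Words unseen in the collection have no IDF and are ignored (an assumption).
    return {word: tf[word] * idf[word] for word in tf if word in idf}

def cosine(u, v):
    """Cosine-similarity of two sparse word-vectors, as in Equation (2.2)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def knn_topic(query, vectors, labels, k=3):
    """Assign the majority topic among the k documents most similar to the query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy collection with hand-assigned topics.
docs = [
    ["ontology", "concept", "relation", "concept"],
    ["ontology", "instance", "concept"],
    ["cluster", "document", "similarity"],
    ["cluster", "vector", "document", "cluster"],
]
labels = ["semantics", "semantics", "mining", "mining"]

idf = idf_table(docs)
vectors = [tfidf(doc, idf) for doc in docs]
new_doc = tfidf(["document", "cluster", "topic"], idf)
predicted = knn_topic(new_doc, vectors, labels, k=3)
```

On this toy collection the new document shares words only with the two "mining" documents, so they rank first and determine the assigned topic. A clustering algorithm would use the same `cosine` function to decide which documents belong in the same group.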

Latent Semantic Indexing is a linear dimensionality reduction technique based on a technique from linear algebra called Singular Value Decomposition. It uses a word-vector representation of text documents for extracting words with similar meanings (Deerwester et al., 2001). It relies on the fact that two words related to the same topic co-occur more often than words describing different topics. This can also be viewed as extraction of hidden semantic concepts or topics from text documents. The result of applying Latent Semantic Indexing to a document collection is a set of fuzzy clusters of words, each describing a topic.

More precisely, in the process of extracting the hidden concepts, first a term-document matrix A is constructed from a given set of text documents. This is a matrix having the word-vectors of the documents as columns. This matrix is decomposed using singular value decomposition so that A = USV^T, where the matrices U and V are orthogonal and S is a diagonal matrix with ordered singular values on the diagonal. The columns of the matrix U form an orthogonal basis of a subspace of the original space, where vectors with higher singular values carry more information (by truncating the singular values to only the k biggest values, we get the best approximation of matrix A with rank k). Because of this, the vectors that form this basis can also be viewed as concepts or topics. Geometrically, each basis vector splits the original space into two halves. By taking just the words with the highest positive or the highest negative weight in this basis vector, we get a set of words which best describe a concept generated by this vector. Note that each vector can generate two concepts: one is generated by the positive weights and one by the negative weights.
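The extraction of concepts from the columns of U can be illustrated on a small term-document matrix. The sketch below is written under stated assumptions: the five terms and four documents are invented for the example, and instead of a full singular value decomposition routine it uses power iteration with deflation to obtain only the two leading singular triplets, which is enough to show how positive and negative weights of one basis vector yield two concepts. In practice a library SVD routine would normally be used.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def leading_triplet(A, iters=200):
    """Power iteration on A A^T: largest singular value with its vectors u, v."""
    At = transpose(A)
    # Arbitrary deterministic starting vector (an assumption of this sketch).
    u = normalize([float(i + 1) for i in range(len(A))])
    for _ in range(iters):
        u = normalize(matvec(A, matvec(At, u)))
    w = matvec(At, u)                      # A^T u = sigma * v
    sigma = math.sqrt(sum(x * x for x in w))
    v = [x / sigma for x in w]
    return sigma, u, v

def deflate(A, sigma, u, v):
    """Remove the rank-1 component sigma * u * v^T from A."""
    return [[A[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]

# Hypothetical term-document matrix: rows are terms, columns are the
# word-vectors (raw counts) of four documents.
terms = ["ontology", "concept", "cluster", "vector", "document"]
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
]

s1, u1, v1 = leading_triplet(A)
s2, u2, v2 = leading_triplet(deflate(A, s1, u1, v1))

# The second basis vector contrasts two groups of words by sign.
positive = [t for t, w in zip(terms, u2) if w > 0.1]
negative = [t for t, w in zip(terms, u2) if w < -0.1]
```

For this matrix the first basis vector has weights of a single sign on all terms, while the second separates "ontology" and "concept" from "cluster" and "vector", with "document" (which occurs equally in every document) receiving a weight close to zero. Which of the two groups ends up with the positive sign is arbitrary, since a singular vector is determined only up to sign.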
