Sentiment Analysis and a Deep Understanding of Support Vector Machine Kernels
Introduction:
Text analysis using machine learning is a method of categorizing and extracting information from unstructured text data. In Information Extraction and Sentiment Analysis, an SVM's Polynomial and Radial kernels can be used to classify text into categories such as positive, negative, or neutral. These kernel functions handle non-linear patterns in the data effectively, which makes them well suited to text classification problems.
Preparation:
Before we can classify the text data, we need to convert the text inputs into numerical values and vectors. The process of converting text data into numerical representations is known as Feature Extraction. There are several ways to perform Feature Extraction, including graph-based models and statistical models.
- In graph-based models, words can be represented as symbolic nodes, and the relationships between them can be represented using resources such as "WordNet".
- In statistical models, we need numerical representations. Depending on the end goal, we may use either document-level representations, such as "Bag-of-Words" or "Doc2Vec", or word-level representations, such as "Word2Vec" or "GloVe".
The choice of representation depends on the specific task, and selecting the best method often requires experimentation.
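To make the Bag-of-Words idea concrete, here is a minimal pure-Python sketch; libraries such as scikit-learn provide production-ready versions (e.g. `CountVectorizer`), and the two-document corpus below is made up for illustration:

```python
# Minimal Bag-of-Words sketch (pure Python, illustration only).
# The corpus is a hypothetical two-document example.

def bag_of_words(corpus):
    # Build a sorted vocabulary over every document.
    vocab = sorted({word for doc in corpus for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    # Each document becomes a vector of word counts over that vocabulary.
    vectors = []
    for doc in corpus:
        vec = [0] * len(vocab)
        for word in doc.lower().split():
            vec[index[word]] += 1
        vectors.append(vec)
    return vocab, vectors

corpus = ["the movie was great", "the movie was terrible"]
vocab, vectors = bag_of_words(corpus)
print(vocab)    # ['great', 'movie', 'terrible', 'the', 'was']
print(vectors)  # [[1, 1, 0, 1, 1], [0, 1, 1, 1, 1]]
```

The resulting count vectors are exactly the kind of numerical input an SVM kernel can consume.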
Classification:
Once the text data is prepared, it can be classified using SVM models. Two commonly used kernels in text analysis are the Polynomial and Radial Basis Function (RBF) kernels.
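As a sketch of how this looks in practice, the snippet below assumes scikit-learn is installed and uses a tiny made-up corpus; the labels and hyperparameter values are illustrative, not a tuned setup:

```python
# Sketch: text classification with SVM kernels via scikit-learn (assumed
# installed). Corpus, labels, and hyperparameters are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

corpus = [
    "great movie loved it", "wonderful acting great fun",
    "terrible movie hated it", "awful plot boring acting",
]
labels = ["positive", "positive", "negative", "negative"]

vec = CountVectorizer()
X = vec.fit_transform(corpus)  # Bag-of-Words feature matrix

for kernel in ("poly", "rbf"):
    # degree/coef0 apply to the polynomial kernel; gamma to the RBF kernel.
    clf = SVC(kernel=kernel, degree=2, coef0=0.5, gamma="scale")
    clf.fit(X, labels)
    print(kernel, clf.predict(vec.transform(["loved the acting"])))
```

In a real task, `degree`, `coef0`, and `gamma` would be chosen by cross-validation, as discussed below.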
Polynomial Kernel
The Polynomial kernel is defined by the equation (a ⋅ b + r)^d, where:
- a, b: The two input samples (one-dimensional values in the examples below).
- r: Represents the coefficient (offset) of the polynomial.
- d: Represents the degree of the polynomial.
These parameters are typically determined through cross-validation.
To illustrate, let’s consider an example where r=1/2 and d=2. Plugging these values into the formula:
(a ⋅ b + 1/2)^2 = (a ⋅ b + 1/2) ⋅ (a ⋅ b + 1/2) = a^2 ⋅ b^2 + ab + 1/4
Here, the dot product between each pair of points is calculated as:
(a, a^2, 1/2) ⋅ (b, b^2, 1/2) = ab + (a^2 ⋅ b^2) + 1/4
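This identity can be checked numerically. The sketch below, using the r = 1/2, d = 2 values from the example above, computes the kernel directly and then via the explicit mapping x → (x, x², 1/2); the two agree, which is the point of the kernel trick:

```python
def poly_kernel(a, b, r=0.5, d=2):
    # Polynomial kernel on 1-D inputs: (a*b + r)**d
    return (a * b + r) ** d

def mapped_dot(a, b):
    # Explicit mapping x -> (x, x**2, 1/2), then an ordinary dot product.
    phi_a = (a, a ** 2, 0.5)
    phi_b = (b, b ** 2, 0.5)
    return sum(x * y for x, y in zip(phi_a, phi_b))

a, b = 1.5, -2.0
print(poly_kernel(a, b))  # 6.25 -- computed without ever mapping the points
print(mapped_dot(a, b))   # 6.25 -- identical, via the explicit mapping
```

The kernel never constructs the higher-dimensional coordinates, yet it returns exactly the dot product those coordinates would give.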
In this example, "a" and "b" represent the "x-axis" coordinates, "a²" and "b²" the "y-axis" coordinates, and "r" supplies a constant "z-axis" coordinate, here set to 1/2. Adjusting the value of "r" affects the distance between sample pairs in the transformed space.
Radial Basis Function (RBF) Kernel
The RBF kernel, also known as the Gaussian kernel, is based on a radial basis function. It’s particularly useful when there’s significant overlap in the data, such as in cases of varying drug dosages in patients. In such scenarios, finding an optimal Support Vector Classifier (SVC) with linear kernels may be challenging, and an SVM with an RBF kernel might be more appropriate.
The RBF kernel can operate in infinite dimensions, allowing it to function like a Weighted Nearest Neighbour model in a two-dimensional space. For instance, if the closest neighbours to a new observation are classified as “not cured,” the RBF kernel will likely classify the new observation similarly.
The RBF kernel formula: e^(−γ(a−b)²)
Where “γ” determines the influence of each observation in the training dataset on classifying new observations.
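The weighted-nearest-neighbour intuition above can be sketched in pure Python. This is not an actual SVM, just an RBF-weighted vote that shows how γ controls each training observation's influence; the dosage data echoing the drug example is hypothetical:

```python
import math

def rbf(a, b, gamma):
    # RBF kernel on 1-D inputs: influence decays with squared distance.
    return math.exp(-gamma * (a - b) ** 2)

def weighted_vote(train, new_x, gamma):
    # Each training point votes with weight rbf(x, new_x, gamma), so the
    # nearest observations dominate -- a weighted nearest-neighbour rule.
    scores = {}
    for x, label in train:
        scores[label] = scores.get(label, 0.0) + rbf(x, new_x, gamma)
    return max(scores, key=scores.get)

# Hypothetical dosages: mid-range doses cured, extreme doses did not.
train = [(1.0, "not cured"), (4.0, "cured"), (5.0, "cured"), (9.0, "not cured")]
print(weighted_vote(train, 4.6, gamma=1.0))  # -> cured
print(weighted_vote(train, 8.5, gamma=1.0))  # -> not cured
```

With a larger γ, distant points' weights shrink toward zero even faster, so classification depends almost entirely on the closest neighbours.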
Comparing Polynomial and RBF Kernels:
In text analysis, Linear, Polynomial, and RBF kernels are employed based on the nature of the data and the classification task. The Polynomial kernel can be considered an extension of the Linear kernel, while the RBF kernel operates in potentially infinite dimensions.
For example, recall the Polynomial kernel formula (a ⋅ b + r)^d. With r = 0, it reduces to a plain dot product of powers:
- r=0, d=1 → dot product = (a¹) ⋅ (b¹)
- r=0, d=2 → dot product = (a²) ⋅ (b²)
Summing such terms over every degree extends the Polynomial kernel to infinite dimensions:
(a⁰)(b⁰) + (a¹)(b¹) + (a²)(b²) + … = (1, a, a², …) ⋅ (1, b, b², …)
Radial kernel formula (taking γ = 1/2):
e^(−γ(a−b)²) = e^(−½(a−b)²) = e^(−½(a² + b²)) × e^(ab)
Applying the Taylor series expansion to e^(ab) turns it into an infinite sum of polynomial terms, e^(ab) = 1 + ab + (ab)²/2! + (ab)³/3! + …, which is itself a dot product between two infinite-dimensional feature vectors. The RBF kernel therefore behaves like a Polynomial kernel of infinite degree, yet the kernel trick computes its value directly, without ever constructing those infinite coordinates or spending additional machine resources.
Conclusion:
In general, text mining algorithms used in Information Extraction and Sentiment Analysis include Support Vector Machines (SVM) with various kernel functions, such as the Polynomial and Radial kernels. These algorithms classify text data into different categories and extract relevant information from it. The Linear, Polynomial, and Radial kernels can be seen as different routes to the same behind-the-scenes computation: finding the most accurate decision boundary, in the Radial case effectively in an infinite-dimensional space, each with its own approach and formula.