Traditional machine learning

Research in traditional machine learning mainly covers Bayesian learning, artificial neural networks, random forests, and decision trees.

Decision trees are a common method of machine learning. At the end of the 20th century, the machine learning researcher J. Ross Quinlan applied Shannon's information theory to decision tree induction and proposed the ID3 algorithm. In 1984, I. Kononenko, I. Bratko, and E. Roskar proposed the ASSISTANT algorithm based on the ID3 algorithm, which allows the values of the categories to overlap. In the same year, A. Hart proposed the Chi-Squared statistical algorithm, which uses a statistic based on the degree of association between categories and attributes. Also in 1984, L. Breiman, C. Stone, R. Olshen, and J. Friedman proposed the concept of decision tree pruning, which greatly improved the performance of decision trees.
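ID3's split criterion, information gain derived from Shannon entropy, can be sketched in a few lines of Python. This is a toy illustration of the criterion, not Quinlan's original implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction from splitting the rows on one attribute (ID3's criterion)."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    weighted = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted

# Toy data: attribute 0 perfectly separates the classes, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```

ID3 greedily chooses the attribute with the highest gain at each node, here attribute 0.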

In 1993, Quinlan proposed an improved algorithm based on ID3, namely the C4.5 algorithm. C4.5 controls ID3's bias toward many-valued attributes and improves the handling of continuous attributes, and its pruning avoids overfitting to a certain extent. However, when the algorithm discretizes a continuous attribute, it must traverse all of that attribute's values, which reduces efficiency, and it requires the training sample set to reside in memory, making it unsuitable for large-scale data sets.
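The attribute-bias correction mentioned above is C4.5's gain ratio, which divides the information gain by the "split information" of the attribute itself, so an attribute with a unique value per sample is penalized. A minimal sketch on toy data, not Quinlan's code:

```python
from collections import Counter
from math import log2

def split_info(values):
    """Entropy of the attribute's own value distribution (penalizes many-valued attributes)."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    """C4.5's selection criterion: information gain normalized by split information."""
    si = split_info(values)
    return gain / si if si > 0 else 0.0

# An ID-like attribute with one value per row gets a large split_info,
# so its gain ratio is deflated even when its raw gain is maximal.
values_id = ["a", "b", "c", "d"]       # unique value per row
values_binary = ["x", "x", "y", "y"]   # two balanced values
print(gain_ratio(1.0, values_id))      # 0.5  (1.0 / log2(4))
print(gain_ratio(1.0, values_binary))  # 1.0  (1.0 / 1.0)
```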

In 2010, Xie described the CART algorithm, a flexible procedure for modeling the conditional distribution of a variable Y given a prediction vector X, which has been applied in many areas. The CART algorithm can handle disordered data and uses the Gini coefficient as the selection criterion for test attributes. The decision trees generated by CART have high accuracy, but once a tree's complexity exceeds a certain level, classification accuracy decreases as complexity grows, so trees built with the algorithm should not be too complicated.
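The Gini coefficient CART uses as its split criterion measures how mixed a node's class labels are; it is zero for a pure node and maximal for an even mix. A short sketch of the measure itself, not the CART implementation:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels, CART's split criterion."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))    # 0.5 (maximally mixed, two classes)
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
```

CART picks the split whose weighted child impurities are lowest, analogous to ID3's use of entropy.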

In 2007, Fang Xiangfei described an algorithm called SLIQ (a decision tree classifier). Its classification accuracy is comparable to that of other decision tree algorithms, but it executes faster and places no limit on the number of samples or attributes.

SLIQ Algorithm

The SLIQ algorithm can handle large-scale training sample sets, scales well, executes quickly, and generates smaller binary decision trees. It enables parallelism by letting multiple processors process attribute lists simultaneously. However, SLIQ still cannot escape the limitation of main-memory space.

In 2000, Rajeev Rastogi and others proposed the PUBLIC algorithm, which prunes a decision tree before it is fully grown, thus improving efficiency. Fuzzy decision trees have also flourished in recent years. Considering the correlation between attributes, researchers have proposed a hierarchical regression algorithm, a constrained hierarchical induction algorithm, and a functional tree algorithm.

These three algorithms are all decision tree algorithms based on combining multiple classifiers. Some experiments and studies have examined attribute correlation, but they have not generally explained how the correlation between attributes affects decision tree performance. In addition, there are many other algorithms, such as an optimization algorithm based on rough sets proposed by J. Zhang in 2014, and an algorithm model based on extreme learning trees proposed by R. Wang in 2015.

Random Forest (RF)

As one of the important algorithms in machine learning, random forest (RF) uses multiple tree classifiers for classification and prediction. In recent years, research on the random forest algorithm has developed rapidly, and applied research has been carried out in many fields such as genetics, medicine, ecology, bioinformatics, and remote sensing geography.
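The core idea, bootstrap sampling plus majority voting over many trees, can be sketched in miniature. The "trees" here are one-level threshold stumps, a deliberate simplification of the full trees (with random feature subsets) a real random forest grows:

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Draw a bootstrap sample (sampling with replacement) of the training set."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def train_stump(rows, labels, rng):
    """A one-level decision tree on a random threshold; a real random forest
    would grow a full tree on a random subset of features at each split."""
    t = rng.choice([r[0] for r in rows])
    left = [l for r, l in zip(rows, labels) if r[0] <= t]
    right = [l for r, l in zip(rows, labels) if r[0] > t]
    left_maj = Counter(left).most_common(1)[0][0] if left else labels[0]
    right_maj = Counter(right).most_common(1)[0][0] if right else left_maj
    return lambda row: left_maj if row[0] <= t else right_maj

def forest_predict(forest, row):
    """Majority vote over all trees in the ensemble."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
rows = [[0], [1], [2], [3]]
labels = ["a", "a", "b", "b"]
forest = [train_stump(*bootstrap_sample(rows, labels, rng), rng) for _ in range(25)]
print(forest_predict(forest, [0]), forest_predict(forest, [3]))
```

Averaging many weak, decorrelated trees is what gives the ensemble its variance reduction.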

Artificial Neural Networks (ANN)

Artificial neural networks (ANNs) are algorithms with nonlinear, adaptive information processing ability, which can overcome the shortcomings of traditional artificial intelligence methods on intuitive tasks such as pattern recognition, speech recognition, and unstructured information processing. Artificial neural networks received attention as early as the 1940s and have developed rapidly since then.
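The earliest artificial neural networks were single neurons of this kind. A minimal perceptron learning the linearly separable OR function illustrates the idea; this is a textbook sketch, not a modern ANN:

```python
def train_perceptron(samples, targets, lr=0.1, epochs=20):
    """One neuron with a step activation, trained with the perceptron rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), t in zip(samples, targets):
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = t - y  # perceptron update: shift weights toward the target
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Learn the (linearly separable) OR function.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
w, b = train_perceptron(samples, targets)
print([predict(w, b, s) for s in samples])  # [0, 1, 1, 1]
```

Modern networks stack many such units and replace the step with differentiable activations so the whole stack can be trained by backpropagation.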

Bayesian Learning

Bayesian learning is one of the earlier research directions of machine learning; its method originates with the British mathematician Thomas Bayes, who proved a special case of Bayes' theorem in 1763. Through the joint efforts of many statisticians, Bayesian statistics was gradually established after the 1950s and became an important part of statistics.
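Bayes' theorem itself, P(H|E) = P(E|H)P(H)/P(E), is simple to compute. A short numeric sketch with hypothetical test-accuracy and prevalence numbers (chosen for illustration only):

```python
# Hypothetical numbers: a test with 99% sensitivity and a 5% false-positive
# rate, for a condition with 1% prevalence.
p_h = 0.01              # prior P(H)
p_e_given_h = 0.99      # sensitivity P(E|H)
p_e_given_not_h = 0.05  # false-positive rate P(E|not H)

# Law of total probability for the evidence, then Bayes' theorem.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))  # 0.167
```

Despite the accurate test, the low prior keeps the posterior modest, the kind of prior/posterior trade-off Bayesian learning formalizes.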
