1. What is KNN
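KNN, short for K Nearest Neighbor (the k-nearest-neighbor method), is a commonly used classifier (it can also be used for regression). To decide the class of a point, we pick its K nearest neighbors and let their classes vote.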
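The test sample (green circle) must be assigned either to the first class (blue squares) or to the second class (red triangles). With k=3 (solid circle) it is assigned to the second class, because the inner circle contains 2 triangles and only 1 square. With k=5 (dashed circle) it is assigned to the first class, because the outer circle contains 3 squares and 2 triangles.
KNN is a classification algorithm. An old saying describes it well: "Stay near vermilion and you turn red; stay near ink and you turn black."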
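The k=3 versus k=5 example can be reproduced as a small majority vote. This is only a sketch: the list `neighbors_by_distance` and the function `majority_vote` are hypothetical names, and the ordering of the five neighbors is an assumption chosen to match the counts described above (2 triangles and 1 square among the nearest three, 3 squares and 2 triangles among the nearest five).

```python
from collections import Counter

# Assumed labels of the five nearest neighbors, ordered from nearest to farthest.
# Only the counts per circle come from the text; the exact order is illustrative.
neighbors_by_distance = ["triangle", "square", "triangle", "square", "square"]


def majority_vote(sorted_labels, k):
    """Return the most common label among the first k (nearest) entries."""
    return Counter(sorted_labels[:k]).most_common(1)[0][0]


print(majority_vote(neighbors_by_distance, 3))  # triangle (second class)
print(majority_vote(neighbors_by_distance, 5))  # square (first class)
```

The same test sample changes class when k changes, which is why the choice of k matters.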
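2. Euclidean distance
Two dimensions, for the points $(x_1, y_1)$ and $(x_2, y_2)$: $dist=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$
n dimensions, for the points $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$: $dist=\sqrt{\sum_{i=1}^n(x_i-y_i)^2}$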
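The n-dimensional formula translates almost directly into code. The following is a minimal sketch using only the standard library; `euclidean_distance` is a hypothetical helper name, not something taken from the original post.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("both points must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

With two coordinates per point this reduces to the two-dimensional formula.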
3. Basic implementation
4. Partial code
import numpy as np


def createDataSet():
    # Four 2-D training samples and their class labels
    group = np.array([[1.0, 2.0], [1.2, 0.1], [0.1, 1.4], [0.3, 3.5]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


def classify(inputs, dataSet, label, k):
    """
    :param inputs: the sample to classify
    :param dataSet: the training data
    :param label: the class labels of the training data
    :param k: the number of nearest neighbors that vote
    """
    dataSize = dataSet.shape[0]
    print("dataSize = %d" % dataSize)

    # Euclidean distance from inputs to every training sample
    diff = np.tile(inputs, (dataSize, 1)) - dataSet
    sqdiff = diff ** 2
    print(diff)
    print(sqdiff)
    squareDist = np.sum(sqdiff, axis=1)
    dist = squareDist ** 0.5
    print(dist)

    # Indices of the training samples, nearest first
    sortedDistIndex = np.argsort(dist)
    print(sortedDistIndex)

    # Count the labels of the k nearest samples
    classCount = {}
    for i in range(k):
        print("i = %d" % i)
        voteLabel = label[sortedDistIndex[i]]
        print("voteLabel = %s" % voteLabel)
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

    # Return the label with the most votes
    maxCount = 0
    for key, value in classCount.items():
        if value > maxCount:
            maxCount = value
            classes = key

    return classes
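To see the classifier run on the four sample points, here is a compact, self-contained variant that follows the same steps (distance, sort, vote) without NumPy. It is a sketch, not the code from the post: `knn_classify` and the two query points are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k):
    """Classify query by majority vote among its k nearest samples."""
    # Sort sample indices by Euclidean distance to the query, nearest first.
    order = sorted(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    # Count the labels of the k nearest samples and return the most common one.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Same training data as createDataSet()
samples = [(1.0, 2.0), (1.2, 0.1), (0.1, 1.4), (0.3, 3.5)]
labels = ["A", "A", "B", "B"]

print(knn_classify((1.1, 0.3), samples, labels, 3))  # A
print(knn_classify((0.1, 3.0), samples, labels, 3))  # B
```

For the first query the three nearest samples are labeled A, B, A, so the vote goes to A; for the second they are B, A, B, so the vote goes to B.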