Ⅰ How to get the x-axis values of a normalized histogram in Python
a = plt.hist()
a[0] holds the bin heights and a[1] holds the list of bin edges.
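For example (a minimal sketch; the data and bin count are made up), the edges in a[1] can be turned into bin-center x coordinates like this:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)                                    # example data
counts, edges, patches = plt.hist(x, bins=20, density=True)  # normalized histogram
centers = (edges[:-1] + edges[1:]) / 2                       # x value at each bin's midpoint
print(centers)                                               # the x coordinates matching counts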
Ⅱ Is there a data normalization function in Python?
At a guess, the problem is line 17 of autonorm.py, normdataset = zeros(shape(dataset)). The original answer says shape(dataset) returns a tuple while zeros() needs integer arguments and suggests a type conversion; note, though, that numpy's zeros accepts a shape tuple directly, so if this line raises an error it is likely because a different zeros is in scope.
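For reference, a minimal demonstration (the dataset values are made up) showing that numpy's zeros takes a shape tuple as-is:

import numpy as np

dataset = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
normdataset = np.zeros(np.shape(dataset))  # a shape tuple is a valid argument
print(normdataset.shape)                   # (3, 2)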
Ⅲ Normalization of data
The purpose of normalization is to bring the statistical distributions of different samples onto a common footing. Normalizing into 0-1 gives a probability-like distribution; normalizing onto some other interval gives a coordinate distribution on that interval. "Normalization" carries the sense of making data identical in scale, unified, and comparable.
1. (0,1) standardization:
This is the simplest and most obvious method: traverse every value in the feature vector, record the Max and Min, and use Max-Min as the base (i.e. Min maps to 0 and Max maps to 1) to normalize the data:
LaTeX: x_{\text{normalization}} = \frac{x - \text{Min}}{\text{Max} - \text{Min}}
Python implementation:
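A minimal sketch of the formula above (the function and variable names are my own):

import numpy as np

def min_max_normalize(x):
    # map Min to 0 and Max to 1 via (x - Min) / (Max - Min)
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_normalize([1, 5, 3, 9]))  # [0.   0.5  0.25 1.  ]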
Ⅳ How to do mean-variance normalization in Python
You can use linear normalization: find the maximum and the minimum values.
The mean measures the central tendency of a set of data: the sum of all the values divided by how many there are. The key to solving mean problems is identifying the "total quantity" and the number of shares it corresponds to. In statistical work, the mean and the standard deviation are the two most important measures describing the central tendency and the dispersion of data.
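Since the question asks about mean-variance normalization specifically, here is a minimal numpy sketch of the standard z-score transform x' = (x - mean) / std (the names are my own):

import numpy as np

def zscore_normalize(x):
    # shift to zero mean, scale to unit standard deviation
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = zscore_normalize(data)
print(z.mean(), z.std())  # approximately 0.0 and 1.0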
Ⅴ In Python, can data be normalized without using the sklearn package?
Just use the definition:
take data.max() and data.min(),
then map every element x to (x - data.min()) / (data.max() - data.min()).
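With numpy this is a one-liner; a minimal sketch assuming data is already a numpy array:

import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])  # example values
normalized = (data - data.min()) / (data.max() - data.min())
print(normalized)  # [0.         0.33333333 0.66666667 1.        ]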
Ⅵ How to interpret the output of logistic regression in Python
The Python code follows. Because the training set is small, batch gradient descent is used here instead of incremental (stochastic) gradient descent.
## author: lijiayan  ## date: 2016/10/27  ## name: logReg.py
from numpy import *
import matplotlib.pyplot as plt

def loadData(filename):
    data = loadtxt(filename)
    m, n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n-1)
    x = data[:, 0:n-1]
    y = data[:, n-1:n]
    return x, y

# the sigmoid function
def sigmoid(z):
    return 1.0 / (1 + exp(-z))

# the cost function (the log-likelihood)
def costfunction(y, h):
    y = array(y)
    h = array(h)
    J = sum(y*log(h)) + sum((1-y)*log(1-h))
    return J

# the batch gradient descent algorithm
def gradescent(x, y):
    m, n = shape(x)          # m: number of training examples; n: number of features
    x = c_[ones(m), x]       # add the intercept term x0
    x = mat(x)               # to matrix
    y = mat(y)
    a = 0.0000025            # learning rate
    maxcycle = 4000
    theta = zeros((n+1, 1))  # initial theta
    J = []
    for i in range(maxcycle):
        h = sigmoid(x*theta)
        theta = theta + a * (x.T)*(y-h)  # ascent step on the log-likelihood
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost

# stochastic gradient descent (m should be large if you want a good result)
def stocGraddescent(x, y):
    m, n = shape(x)          # m: number of training examples; n: number of features
    x = c_[ones(m), x]       # add the intercept term x0
    x = mat(x)               # to matrix
    y = mat(y)
    a = 0.01                 # learning rate
    theta = ones((n+1, 1))   # initial theta
    J = []
    for i in range(m):
        h = sigmoid(x[i]*theta)
        theta = theta + a * x[i].transpose()*(y[i]-h)
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost

# plot the decision boundary
def plotbestfit(x, y, theta):
    plt.plot(x[:, 0:1][where(y==1)], x[:, 1:2][where(y==1)], 'ro')
    plt.plot(x[:, 0:1][where(y!=1)], x[:, 1:2][where(y!=1)], 'bx')
    x1 = arange(-4, 4, 0.1)
    x2 = (-float(theta[0]) - float(theta[1])*x1) / float(theta[2])
    plt.plot(x1, x2)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

def classifyVector(inX, theta):
    prob = sigmoid((inX*theta).sum(1))
    return where(prob >= 0.5, 1, 0)

def accuracy(x, y, theta):
    m = shape(y)[0]
    x = c_[ones(m), x]       # add the intercept term, as in training
    y_p = classifyVector(x, theta)
    return sum(y_p == y) / float(m)
Calling the code above:

from logReg import *

x, y = loadData("horseColicTraining.txt")
theta, cost = gradescent(x, y)
print('J:', cost)
ac_train = accuracy(x, y, theta)
print('accuracy of the training examples:', ac_train)

x_test, y_test = loadData('horseColicTest.txt')
ac_test = accuracy(x_test, y_test, theta)
print('accuracy of the test examples:', ac_test)
Results with learning rate = 0.0000025 and 4000 iterations:
Trend of the likelihood function J = sum(y*log(h)) + sum((1-y)*log(1-h)): since the likelihood is being maximized, training is generally considered good once the curve has stabilized.
This example shows the importance of normalizing the features first.
Ⅶ How to normalize a set of data into (0,1) with MATLAB
It is simple: use the function mapminmax. The documentation is long, so instead of translating all of it, here are the key points:
1. The default mapping range is [-1, 1], so if you need [0, 1], pass the parameters in this form:
MappedData = mapminmax(OriginalData, 0, 1);
2. It only normalizes along rows. Given a matrix, each row is normalized independently; to normalize the whole matrix as one data set, do the following:
FlattenedData = OriginalData(:)'; % flatten the matrix into a single column, then transpose it into a row
MappedFlattened = mapminmax(FlattenedData, 0, 1); % normalize
MappedData = reshape(MappedFlattened, size(OriginalData)); % restore the original matrix shape; no transpose back is needed, because reshape refills column by column
The full documentation follows:
mapminmax
Process matrices by mapping row minimum and maximum values to [-1 1]
Syntax
[Y,PS] = mapminmax(X,YMIN,YMAX)
[Y,PS] = mapminmax(X,FP)
Y = mapminmax('apply',X,PS)
X = mapminmax('reverse',Y,PS)
dx_dy = mapminmax('dx',X,Y,PS)
dx_dy = mapminmax('dx',X,[],PS)
name = mapminmax('name');
fp = mapminmax('pdefaults');
names = mapminmax('pnames');
mapminmax('pcheck',FP);
Description
mapminmax processes matrices by normalizing the minimum and maximum values of each row to [YMIN, YMAX].
mapminmax(X,YMIN,YMAX) takes X and optional parameters
X
N x Q matrix or a 1 x TS row cell array of N x Q matrices
YMIN
Minimum value for each row of Y (default is -1)
YMAX
Maximum value for each row of Y (default is +1)
and returns
Y
Each M x Q matrix (where M == N) (optional)
PS
Process settings that allow consistent processing of values
mapminmax(X,FP) takes parameters as a struct: FP.ymin, FP.ymax.
mapminmax('apply',X,PS) returns Y, given X and settings PS.
mapminmax('reverse',Y,PS) returns X, given Y and settings PS.
mapminmax('dx',X,Y,PS) returns the M x N x Q derivative of Y with respect to X.
mapminmax('dx',X,[],PS) returns the derivative, less efficiently.
mapminmax('name') returns the name of this process method.
mapminmax('pdefaults') returns the default process parameter structure.
mapminmax('pdesc') returns the process parameter descriptions.
mapminmax('pcheck',FP) throws an error if any parameter is illegal.
Examples
Here is how to format a matrix so that the minimum and maximum values of each row are mapped to default interval [-1,+1].
x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
[y1,PS] = mapminmax(x1)
Next, apply the same processing settings to new values.
x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
y2 = mapminmax('apply',x2,PS)
Reverse the processing of y1 to get x1 again.
x1_again = mapminmax('reverse',y1,PS)
Algorithm
It is assumed that X has only finite real values, and that the elements of each row are not all equal.
y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin;
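For readers following along in Python, the same row-wise mapping is straightforward to reproduce with numpy (a sketch under the same assumption that no row is constant; the function name is mine):

import numpy as np

def map_min_max(X, ymin=-1.0, ymax=1.0):
    # map each row's min/max to [ymin, ymax], like MATLAB's mapminmax
    X = np.asarray(X, dtype=float)
    xmin = X.min(axis=1, keepdims=True)
    xmax = X.max(axis=1, keepdims=True)  # assumes no row has all-equal values
    return (ymax - ymin) * (X - xmin) / (xmax - xmin) + ymin

x1 = np.array([[1, 2, 4], [3, 2, 2]])
print(map_min_max(x1, 0, 1))  # each row scaled into [0, 1]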
Ⅷ Normalizing multi-dimensional data with different units from a CSV file in Python
1) Linear normalization
This method suits values that are fairly concentrated. Its weakness is that when max and min are unstable, the normalized results are unstable too, and so is everything downstream; in practice, empirical constants can stand in for max and min.
2) Standard-deviation standardization
The processed data follow the standard normal distribution, i.e. mean 0 and standard deviation 1.
3) Nonlinear normalization
Often used when the data are widely spread, with some values very large and others very small. The original values are mapped through a mathematical function such as log, an exponential, or arctangent; choose the curve according to how the data are distributed.
log function: x = lg(x) / lg(max)
arctangent function: x = atan(x) * 2 / pi
Python implementation
Linear normalization
Define the array: x = numpy.array(x)
Column-wise maximum of a 2-D array: x.max(axis=0)
Column-wise minimum of a 2-D array: x.min(axis=0)
Linear normalization of a 2-D array:
def max_min_normalization(data_value, data_col_max_values, data_col_min_values):
    """ Data normalization using max value and min value

    Args:
        data_value: The data to be normalized
        data_col_max_values: The maximum value of data's columns
        data_col_min_values: The minimum value of data's columns
    """
    data_shape = data_value.shape
    data_rows = data_shape[0]
    data_cols = data_shape[1]

    for i in range(0, data_rows, 1):
        for j in range(0, data_cols, 1):
            data_value[i][j] = \
                (data_value[i][j] - data_col_min_values[j]) / \
                (data_col_max_values[j] - data_col_min_values[j])
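A usage sketch (the sample array is made up; note that the function modifies its argument in place and expects a float array):

import numpy as np

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
max_min_normalization(x, x.max(axis=0), x.min(axis=0))
print(x)  # each column now spans [0, 1]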
Standard-deviation normalization
Define the array: x = numpy.array(x)
Column-wise mean of a 2-D array: x.mean(axis=0)
Column-wise standard deviation of a 2-D array: x.std(axis=0)
Standard-deviation normalization of a 2-D array:
def standard_deviation_normalization(data_value, data_col_means,
                                     data_col_standard_deviation):
    """ Data normalization using standard deviation

    Args:
        data_value: The data to be normalized
        data_col_means: The means of data's columns
        data_col_standard_deviation: The standard deviation of data's columns
    """
    data_shape = data_value.shape
    data_rows = data_shape[0]
    data_cols = data_shape[1]

    for i in range(0, data_rows, 1):
        for j in range(0, data_cols, 1):
            data_value[i][j] = \
                (data_value[i][j] - data_col_means[j]) / \
                data_col_standard_deviation[j]
Nonlinear normalization (using lg as the example)
Define the array: x = numpy.array(x)
Column-wise maximum of a 2-D array: x.max(axis=0)
lg of every element of a 2-D array: numpy.log10(x)
lg of the column-wise maxima: numpy.log10(x.max(axis=0))
Nonlinear lg normalization of a 2-D array:
def nonlinearity_normalization_lg(data_value_after_lg,
                                  data_col_max_values_after_lg):
    """ Data normalization using lg

    Args:
        data_value_after_lg: The data to be normalized
        data_col_max_values_after_lg: The maximum value of data's columns
    """
    data_shape = data_value_after_lg.shape
    data_rows = data_shape[0]
    data_cols = data_shape[1]

    for i in range(0, data_rows, 1):
        for j in range(0, data_cols, 1):
            data_value_after_lg[i][j] = \
                data_value_after_lg[i][j] / data_col_max_values_after_lg[j]
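Tying this back to the question, a sketch of applying the functions above to a CSV file (the filename and the assumption of an all-numeric, comma-separated file are mine):

import numpy as np

data = np.loadtxt('data.csv', delimiter=',')  # hypothetical all-numeric CSV
max_min_normalization(data, data.max(axis=0), data.min(axis=0))
# or: standard_deviation_normalization(data, data.mean(axis=0), data.std(axis=0))
np.savetxt('data_normalized.csv', data, delimiter=',')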
Ⅸ Does the data need to be normalized before PCA dimensionality reduction with python sklearn?
No. sklearn's PCA centers the data (subtracts each feature's mean) on its own; whether to also scale features to unit variance beforehand depends on whether they are on comparable scales.
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
newData = pca.fit_transform(data)
There is a detailed explanation here:
http://doc.okbase.net/u012162613/archive/120946.html
Ⅹ How to reverse normalized data in Python
I see all the experts normalize their raw data before processing it, but nobody explains how to map the normalized results back.
The only method I have found so far is this MATLAB function:
xtt = mapminmax('reverse',y1,ps)
In Python, everyone recommends sklearn for normalization, but there seems to be no way back.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Why do I want to reverse it?
I fed date and temperature data into a model, let it run for ages hoping to see the next day's temperature, and got back something like 0.837.
Answer: after normalizing with a sklearn transform, you can use inverse_transform to restore the original scale.
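A minimal sketch with sklearn's MinMaxScaler (the temperature values are made up for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

temps = np.array([[18.0], [21.0], [25.0], [30.0]])  # made-up temperatures
scaler = MinMaxScaler()
scaled = scaler.fit_transform(temps)                # normalize to [0, 1]

prediction = np.array([[0.837]])                    # model output on the normalized scale
print(scaler.inverse_transform(prediction))         # back to degrees: [[28.044]]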