Home > wafo > kdetools > kdebin.m

kdebin

PURPOSE ^

Binned Kernel Density Estimator.

SYNOPSIS ^

f = kdebin(A,options,xlo,xup)

DESCRIPTION ^

 KDEBIN Binned Kernel Density Estimator. 
  
  CALL:  f = kdebin(data,options,xlo,xup) 
  
    f      = pdf structure containing: 
             kernel density estimate evaluated in meshgrid(x1,x2,..). 
    data   = data matrix, size N x D (D = # dimensions ) 
   options = kdeoptions-structure or cellvector of named parameters with 
             corresponding values, see kdeoptset for details. 
  xlo,xup  = vectors specifying the range of f. Must include the range of the 
             data. (default min(data)-range(data)/4, max(data)-range(data)/4) 
             If a single value of xlo or xup is given then the range is 
             the is the same in all directions. 
  
   KDEBIN gives very fast and accurate kernel density estimate. 
   Notice that densities close to normality appear to be the easiest for 
   the kernel estimator to estimate and that the degree of estimation 
   difficulty increases with skewness, kurtosis and multimodality. 
  
   If D>1 KDEBIN calculates quantile levels by integration. An alternative is  
   to calculate them by ranking the kernel density estimate obtained at the  
   points given in DATA i.e. use the commands  
  
    f    = kdebin( data , kernel) 
    tmp  = num2cell(data,1); 
    r    = evalpdf(f,tmp{:}) 
    f.cl = qlevels2(r,f.pl);  
  
   The first is probably best when estimating the pdf and the latter is the 
   easiest and most robust for 2D data when only a contour visualization of  
   the data is needed.   
  
  Examples:  
   data = wraylrnd(1,500,1); 
                                %Box-Cox transform data before estimation 
   f = kdebin(data,{'L2',.5,'inc',64});  
   pdfplot(f) 
                                %Non-parametric transformation 
   g   = cdf2tr(empdistr(data,[],0),mean(data),std(data)); 
   opt = kdeoptset('L2',{g},'inc',64);   
   f1  = kdebin(data,opt); 
   hold on, pdfplot(f1,'r'), hold off 
  
  See also  kde, kdeoptset, mkernel

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function f = kdebin(A,options,xlo,xup) 
0002 %KDEBIN Binned Kernel Density Estimator. 
0003 % 
0004 % CALL:  f = kdebin(data,options,xlo,xup) 
0005 % 
0006 %   f      = pdf structure containing: 
0007 %            kernel density estimate evaluated in meshgrid(x1,x2,..). 
0008 %   data   = data matrix, size N x D (D = # dimensions ) 
0009 %  options = kdeoptions-structure or cellvector of named parameters with 
0010 %            corresponding values, see kdeoptset for details. 
0011 % xlo,xup  = vectors specifying the range of f. Must include the range of the 
0012 %            data. (default min(data)-range(data)/4, max(data)-range(data)/4) 
0013 %            If a single value of xlo or xup is given then the range is 
0014 %            the is the same in all directions. 
0015 % 
0016 %  KDEBIN gives very fast and accurate kernel density estimate. 
0017 %  Notice that densities close to normality appear to be the easiest for 
0018 %  the kernel estimator to estimate and that the degree of estimation 
0019 %  difficulty increases with skewness, kurtosis and multimodality. 
0020 % 
0021 %  If D>1 KDEBIN calculates quantile levels by integration. An alternative is  
0022 %  to calculate them by ranking the kernel density estimate obtained at the  
0023 %  points given in DATA i.e. use the commands  
0024 % 
0025 %   f    = kdebin( data , kernel) 
0026 %   tmp  = num2cell(data,1); 
0027 %   r    = evalpdf(f,tmp{:}) 
0028 %   f.cl = qlevels2(r,f.pl);  
0029 % 
0030 %  The first is probably best when estimating the pdf and the latter is the 
0031 %  easiest and most robust for 2D data when only a contour visualization of  
0032 %  the data is needed.   
0033 % 
0034 % Examples:  
0035 %  data = wraylrnd(1,500,1); 
0036 %                               %Box-Cox transform data before estimation 
0037 %  f = kdebin(data,{'L2',.5,'inc',64});  
0038 %  pdfplot(f) 
0039 %                               %Non-parametric transformation 
0040 %  g   = cdf2tr(empdistr(data,[],0),mean(data),std(data)); 
0041 %  opt = kdeoptset('L2',{g},'inc',64);   
0042 %  f1  = kdebin(data,opt); 
0043 %  hold on, pdfplot(f1,'r'), hold off 
0044 % 
0045 % See also  kde, kdeoptset, mkernel 
0046  
0047 % Reference:   
0048 %  Wand,M.P. and Jones, M.C. (1995)  
0049 % 'Kernel smoothing' 
0050 %  Chapman and Hall, pp 182-192 
0051 % 
0052 %  B. W. Silverman (1986)  
0053 % 'Density estimation for statistics and data analysis'   
0054 %  Chapman and Hall pp 61--66 
0055  
0056  
0057  
0058 %Tested on: matlab 5.2, 5.3 
0059 % History: 
0060 % revised pab Feb 2005 
0061 %  -moved options into a options structure   
0062 % revised pab Dec2003 
0063 % -removed the binning to a separate function, gridcount.   
0064 % revised pab 05.08.2001 
0065 % - fixed a bug in the binning of c. 
0066 % - made the binning of c even faster using sparse and binc 
0067 % revised pab 27.04.2001 
0068 % -added call to mkernel2 
0069 % revised pab 19.12.2000 
0070 % - added the possibility that L2 is a cellarray of parametric 
0071 %   or non-parametric transformations (secret option) 
0072 % - fixed a bug in the calculation of xlo and xup 
0073 % revised pab 05.01.2000 
0074 %  - fixed a bug in back transformation 
0075 % revised pab 17.12.99 
0076 %  - fixed a bug in back transformation 
0077 % revised pab 09.12.99 
0078 %  -added alpha,L2 
0079 % revised pab 5.11.99 
0080 %  - fixed a bug: changed fftshift to ifftshift 
0081 % revised pab 21.10.99 
0082 %  - added the possibility that Hs is a smoothing matrix  
0083 % revised pab 15.10.99 
0084 %  updated documentation 
0085 % revised pab 21.09.99   
0086 %  - made it fully general for d dimensions 
0087 %  - improoved gridding 
0088 % adapted from kdfft1 and kdfft2 from kdetools by Christian C. Beardah 1994 
0089  
0090 defaultoptions = kdeoptset;   
0091 % If just 'defaults' passed in, return the default options in g 
0092 if ((nargin==1) & (nargout <= 1) &  isequal(A,'defaults')), 
0093   f = defaultoptions; 
0094   return 
0095 end   
0096 error(nargchk(1,4, nargin)) 
0097  
0098 [n, d]=size(A); % Find dimensions of A,  
0099                % n=number of data points, 
0100                % d=dimension of the data.   
0101 if (nargin<2 | isempty(options)) 
0102   options  = defaultoptions; 
0103 else 
0104   switch lower(class(options)) 
0105    case {'char','struct'}, 
0106     options = kdeoptset(defaultoptions,options); 
0107    case {'cell'} 
0108     
0109       options = kdeoptset(defaultoptions,options{:}); 
0110    otherwise 
0111     error('Invalid options') 
0112   end 
0113 end 
0114 kernel   = options.kernel; 
0115 h        = options.hs; 
0116 alpha    = options.alpha; 
0117 inc      = options.inc; 
0118 L2       = options.L2; 
0119 hsMethod = options.hsMethod; 
0120 if isempty(h),      h      = zeros(1,d);    end 
0121  
0122  
0123  
0124 L22 = cell(1,d); 
0125 k3  = []; 
0126 if isempty(L2) 
0127   L2 = ones(1,d);     % default no transformation 
0128 elseif iscell(L2)   % cellarray of non-parametric and parametric transformations 
0129   Nl2 = length(L2); 
0130   if ~(Nl2==1|Nl2==d), error('Wrong size of L2'), end 
0131   [L22{1:d}] = deal(L2{1:min(Nl2,d)}); 
0132   L2 = ones(1,d); % default no transformation 
0133   for ix=1:d, 
0134     if length(L22{ix})>1, 
0135       k3=[k3 ix];       % Non-parametric transformation 
0136     else  
0137      L2(ix) = L22{ix};  % Parameter to the Box-Cox transformation 
0138     end 
0139   end 
0140 elseif length(L2)==1 
0141   L2=L2(:,ones(1,d)); 
0142 end 
0143  
0144  
0145  
0146  
0147 amin = min(A); 
0148 if any((amin(L2~=1)<=0))  , 
0149   f=[]; 
0150   error('DATA cannot be negative or zero when L2~=1') 
0151 end 
0152  
0153 %new call 
0154 lA = A;   
0155  
0156 k1 = find(L2==0); % logaritmic transformation 
0157 if any(k1) 
0158   lA(:,k1)=log(A(:,k1)); 
0159 end 
0160 k2 = find(L2~=0 & L2~=1); % power transformation 
0161 if any(k2) 
0162   lA(:,k2)=sign(L2(ones(n,1),k2)).*A(:,k2).^L2(ones(n,1),k2); 
0163 end 
0164 % Non-parametric transformation 
0165 for ix = k3, 
0166   lA(:,ix) = tranproc(A(:,ix),L22{ix}); 
0167 end 
0168  
0169 amax = max(lA); 
0170 amin = min(lA); 
0171 xyzrange = amax-amin; 
0172  
0173  
0174  
0175 if nargin<3|isempty(xlo) 
0176   xlo=amin-xyzrange/4; 
0177   if any(k1) 
0178       xlo(k1)=max(min(.5*log(eps),amin(k1)-eps),xlo(k1)); 
0179   end       
0180   if any(k2)      
0181     %xlo(k2)=max(min(sign(L2(k2)).*(eps).^(L2(k2)/2),amin(k2)-eps),xlo(k2) );   
0182     ki=find(xlo(k2)<=0); 
0183     xlo(k2(ki)) = max(sign(L2(k2(ki))).*(eps).^(L2(k2(ki))/2),amin(k2(ki))-eps);  
0184     %xlo(k2(ki)) = amin(k2(ki))/2; 
0185   end 
0186   for ix = k3, 
0187     xlo(ix) = max(tranproc(sqrt(eps),L22{ix}),amin(ix)-eps); 
0188   end 
0189   xlo = min(xlo,amin); % make sure xlo<=amin pab 19.12.2000 
0190 else 
0191   if length(xlo)<d 
0192     xlo=xlo(1)*ones(1,d); 
0193   end 
0194   if any(k1), xlo(k1)=log(xlo(k1));  end 
0195   if any(k2), xlo(k2)=sign(L2(k2)).*xlo(k2).^L2(k2);  end 
0196   for ix = k3, 
0197     xlo(ix) = tranproc(xlo(ix),L22{ix}); 
0198   end 
0199   xlo = min(xlo,amin-eps); % make sure the range of the data is included 
0200 end 
0201  
0202 if nargin<4|isempty(xup) 
0203   xup=amax+xyzrange/4; 
0204 else 
0205   if length(xup)<d 
0206     xup=xup(1)*ones(1,d); 
0207   end 
0208   if any(k1), xup(k1) = log(xup(k1));  end 
0209   if any(k2), xup(k2) = sign(L2(k2)).*xup(k2).^L2(k2);  end 
0210   for ix = k3, 
0211     xup(ix) = tranproc(xup(ix),L22{ix}); 
0212   end 
0213   xup = max(xup,amax+eps); % make sure the range of the data is included 
0214 end 
0215  
0216  
0217 f = createpdf(d); 
0218 X = zeros(inc,d); 
0219 for ix=1:d 
0220   X(:,ix)=transpose(linspace(xlo(ix),xup(ix),inc)); 
0221   f.x{ix}=X(:,ix); 
0222 end 
0223  
0224 hsiz=size(h); 
0225 if (min(hsiz)==1)|(d==1) 
0226   if max(hsiz)==1, 
0227     h=h*ones(1,d); 
0228   else 
0229     h = reshape(h,[1 d]); % make sure it has the correct shape 
0230   end 
0231  
0232   ind=find(h<=0); 
0233   if any(ind)      % If no value of h has been specified by the user then  
0234     h(ind)=feval(hsMethod,lA(:,ind),kernel); % calculate optimal values.   
0235   end 
0236   deth = prod(h); 
0237   hvec = h; 
0238   HG   = 0; 
0239 else 
0240   deth=det(h); 
0241   if deth<=0, 
0242     error('bandwidth matrix h must be positive definit') 
0243   end 
0244   hvec=diag(h).'; 
0245   h1=inv(h); 
0246   HG=1; 
0247 end 
0248 options.hs = h; 
0249  
0250 % The kernel must be symmetric and compactly supported on [-tau tau] 
0251 % if the kernel has infinite support then the kernel must have  
0252 % the effective support in [-tau tau], i.e., be negligible outside the range 
0253 tau=1; 
0254 switch lower(kernel(1:4)) 
0255   case 'epan', tstr= 'Epanechnikov';%  - Epanechnikov kernel. (default) 
0256   case 'biwe', tstr= 'Biweight'; %     - Bi-weight kernel. 
0257   case 'triw', tstr= 'Triweight';%     - Tri-weight kernel.   
0258   case 'tria', tstr= 'Triangular';%    - Triangular kernel. 
0259   case 'gaus', tstr= 'Gaussian';%      - Gaussian kernel 
0260     tau=4; 
0261   case 'rect', tstr= 'Rectangular';%   - Rectangular kernel.  
0262   case 'lapl', tstr= 'Laplace';%       - Laplace kernel. 
0263     tau=7; 
0264   case 'logi', tstr= 'Logistic';%      - Logistic kernel. 
0265     tau=7; 
0266 end 
0267  
0268 L1 = max(floor(tau*hvec.*(inc-1)./(xyzrange))); 
0269 L  = min(L1,inc-1); 
0270 if d<2 
0271   fsiz=[inc,1]; 
0272   nfft=[2*inc,1]; 
0273 else 
0274   fsiz=inc*ones(1,d); 
0275   nfft=2*inc.*ones(1,d); 
0276 end 
0277  
0278 dx = (xup-xlo)./(inc-1); 
0279 X1 = cell(d,1); 
0280 if HG, 
0281     % new call 
0282     X1=num2cell([-L:L]'*dx,1); 
0283     % X1=num2cell([0:L]'*dx,1); 
0284     if d<=3, 
0285       [X1{:}]=meshgrid(X1{:}); 
0286     else   
0287       disp('Dimension of data large, this will take a while.') 
0288       [X1{:}]=ndgrid(X1{:}); 
0289     end 
0290      
0291     for ix=1:d 
0292       X1{ix}=X1{ix}(:); 
0293     end 
0294     X1=num2cell([X1{:}]*h1,1); 
0295     for ix=1:d 
0296       X1{ix}=reshape(X1{ix},(2*L+1)*ones(1,d)); 
0297     end   
0298 else 
0299   switch d  
0300     case 1, X1{1}=transpose(dx/h*[-L:L]);   
0301     case {2,3}  
0302       X1      = num2cell([-L:L]'*(dx./h),1); 
0303       [X1{:}] = meshgrid(X1{:}); 
0304     otherwise ,   
0305       disp('Dimension of data large, this will take a while.') 
0306       % new call 
0307       X1      = num2cell([-L:L]'*(dx./h),1); 
0308       [X1{:}] = ndgrid(X1{:}); 
0309   end 
0310 end 
0311 % Obtain the kernel weights. 
0312  
0313 if 1 %HG, % 
0314   kw = zeros(nfft); 
0315   indk(1:d) = {(inc-L+1):(inc+L+1)}; 
0316   kw(indk{:}) = mkernel(X1{:},kernel)/(n*deth); 
0317   % Apply 'ifftshift' to the kernel weights, kw. 
0318   kw = ifftshift(kw); 
0319 else 
0320   kw1 = zeros(floor(nfft/2)+1);   
0321   indk(1:d)  = {1:(L+1)}; 
0322   kw1(indk{:}) = mkernel(X1{:},kernel)/(n*deth); 
0323   kw = fftce(kw1); % circulant embedding 
0324   clear kw1  
0325 end 
0326  
0327 % Find the binned kernel weights, c. 
0328 c = gridcount(lA,X); 
0329  
0330 % Perform the convolution. 
0331 z = real(ifftn(fftn(c,nfft).*fftn(kw))); 
0332 clear kw 
0333  
0334 % New call pab 19.04.2001 
0335 %indk = repmat({1:inc}, d, 1); % initialize subscripts 
0336 indk(1:d) = {1:inc};  % alternatively 
0337 f.f = z(indk{:}).*(z(indk{:})>0); 
0338 clear z indk 
0339  
0340 %f.kernel=tstr; 
0341 %f.hs=h; 
0342  
0343 if (alpha>0), % adaptive kde 
0344   f.f=f.f(:);  
0345   indc=transpose(find(c>0)); 
0346  
0347   if 0 % clipping to make sure lambda do not get too large 
0348     minf=sqrt(eps) % make sure we sum over f.f(xi)>0 
0349     ind=find(f.f(:)>minf);  
0350   else 
0351     ind=indc; 
0352   end 
0353   % this gives smaller lambda than kdefun! Why??? 
0354   g=exp(sum(log(f.f(ind)))/length(ind(:))); % geometric mean of f 
0355   lambda=ones(fsiz); 
0356   
0357   lambda(ind)=(f.f(ind)/g).^(-alpha); %  local bandwidth factor  
0358    
0359   
0360   Nx=inc^d; 
0361   f.f=zeros(Nx,1); 
0362   switch d  
0363     case 1, X1{1}=transpose(dx*[1:inc]);   
0364     case {2,3}  
0365       X1=num2cell([1:inc]'*dx,1); 
0366       [X1{:}]=meshgrid(X1{:}); 
0367     otherwise ,   
0368       %disp('Dimension of data large, this will take a while.') 
0369       % new call 
0370       X1=num2cell([1:inc]'*dx,1); 
0371       [X1{:}]=ndgrid(X1{:}); 
0372     end 
0373     for ix=1:d 
0374       X1{ix}=X1{ix}(:); 
0375     end 
0376     lX=[X1{:}]; 
0377  
0378   if ~HG ,%(min(hsiz)==1)|(d==1) 
0379     lX=lX./h(ones(Nx,1),:);  
0380     for ix=indc,%Sum over all data points 
0381       Avec=lX(ix,:); 
0382       Xnn=(lX-Avec(ones(Nx,1),:))/(lambda(ix));   
0383       f.f=f.f+mkernel2(Xnn,kernel)*c(ix)/(lambda(ix)^d);% pab 27.04.01 
0384     end 
0385   else % adaptive kde % fully general 
0386     for ix=indc,%  Sum over all data points 
0387      Avec=lX(ix,:); 
0388      Xnn=(lX-Avec(ones(Nx,1),:))*(h1/lambda(ix)); 
0389      f.f=f.f+mkernel2(Xnn,kernel)*c(ix)/(lambda(ix)^d);% pab 27.04.01 
0390     end     
0391   end 
0392   f.f=reshape(f.f,fsiz)/(n*deth); 
0393   %f.alpha=alpha; 
0394   f.lambda=lambda; 
0395   f.title=['Adaptive Binned Kernel density estimate ( ',tstr,' )']; 
0396 else 
0397   f.title=['Binned Kernel density estimate ( ',tstr,' )']; 
0398 end   
0399  
0400 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
0401 %        transforming back         % 
0402 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
0403  
0404  
0405 if any(k1)|any(k2)|any(k3), 
0406   X1=cell(d,1); 
0407   switch d 
0408     case 1,X1{1}=f.x{1}; 
0409     case {2 3}, [X1{:}]=meshgrid(f.x{:}); 
0410     otherwise, [X1{:}]=ndgrid(f.x{:}); 
0411   end 
0412   if any(k1), % L2=0 i.e. logaritmic transformation 
0413     for ix=k1 
0414       f.f=f.f./exp(X1{ix}); 
0415       f.x{ix}=exp(f.x{ix}); 
0416     end 
0417     if any(max(abs(diff(f.f)))>10) 
0418       disp('Warning: Numerical problems may have occured due to the logaritmic') 
0419       disp('transformation. Check the KDE for spurious spikes') 
0420     end 
0421   end 
0422   if any(k2) % L2~=0 i.e. power transformation 
0423     for ix=k2 
0424       f.f=f.f.*((sign(L2(ix)).*X1{ix}.^(1/L2(ix))).... 
0425       .^(L2(ix)-1))*L2(ix)*sign(L2(ix)); 
0426       f.x{ix}=(sign(L2(ix)).*f.x{ix}).^(1/L2(ix)); 
0427     end 
0428     if any(max(abs(diff(f.f)))>10) 
0429       disp('Warning: Numerical problems may have occured due to the power') 
0430       disp('transformation. Check the KDE for spurious spikes') 
0431     end 
0432   end 
0433   if any(k3), % non-parametric transformation 
0434     oneC = ones(inc,1); 
0435     oneD = ones(1,d); 
0436     for ix=k3 
0437       gn  = L22{ix};  
0438       Gn  = fliplr(L22{ix}); 
0439       x0  = tranproc(f.x{ix},Gn); 
0440       if any(isnan(x0)), 
0441     error('The transformation does not have a strictly positive derivative.') 
0442       end 
0443       hg1  = tranproc([x0 oneC],gn); 
0444       der1 = abs(hg1(:,2)); % dg(X)/dX = 1/(dG(Y)/dY) 
0445       % alternative 2 
0446       %pp  = smooth(Gn(:,1),Gn(:,2),1,[],1); 
0447       %dpp = diffpp(pp); 
0448       %der1 = 1./abs(ppval(dpp,f.x{ix})); 
0449       % Alternative 3 
0450       %pp  = smooth(gn(:,1),gn(:,2),1,[],1); 
0451       %dpp = diffpp(pp); 
0452       %%plot(hg1(:,1),der1-abs(ppval(dpp,x0))) 
0453       %der1 = abs(ppval(dpp,x0)); 
0454       if any(der1<=0),  
0455     error(['The transformation must have a strictly positive derivative']) 
0456       end 
0457      
0458       switch d, 
0459     case 1,  f.f = f.f.*der1; 
0460     case {2,3}, 
0461       oneD(ix)  = inc;  
0462       oneD(1:2) = oneD(2:-1:1);       
0463       f.f = f.f.*repmat(reshape(der1,oneD),inc+1-oneD); 
0464       oneD(1:2) = oneD(2:-1:1); 
0465       oneD(ix)  = 1; 
0466     otherwise 
0467       oneD(ix) = inc; 
0468       f.f = f.f.*repmat(reshape(der1,oneD),inc+1-oneD) 
0469        oneD(ix) = 1; 
0470       end 
0471        
0472       f.x{ix} = x0; 
0473     end 
0474     if any(max(abs(diff(f.f)))>10) 
0475       disp('Warning: Numerical problems may have occured due to the power') 
0476       disp('transformation. Check the KDE for spurious spikes') 
0477     end 
0478   end 
0479   if 1 
0480   tmp=f; 
0481  % pdfplot(f) 
0482   for ix=[k1 k2 k3] 
0483     f.x{ix}=transpose(linspace(f.x{ix}(1),f.x{ix}(end),inc)); 
0484   end 
0485   switch d 
0486     case 1,X1{1}=f.x{1}; 
0487     case {2 3}, [X1{:}] = meshgrid(f.x{:}); 
0488     otherwise,  [X1{:}] = ndgrid(f.x{:}); 
0489   end 
0490   % interpolating to obtain equidistant grid 
0491   if 1, 
0492     f.f = evalpdf(tmp,X1{:},'linear'); 
0493   else 
0494     tmp.f = log(tmp.f+eps); 
0495     f.f=max(exp(evalpdf(tmp,X1{:},'spline'))-eps,0); 
0496   end 
0497   clear tmp X1 
0498   end 
0499   %who 
0500 end 
0501 f.note   = f.title; 
0502 %f.kernel = tstr; 
0503 %f.alpha  = alpha; 
0504 %f.l2     = L2; 
0505 f.options = options; 
0506 f.n      = n; 
0507 if d>1 
0508   [ql PL] = qlevels(f.f); 
0509   f.cl = ql; 
0510   f.pl = PL; 
0511 end 
0512  
0513  
0514  
0515  
0516  
0517  
0518  
0519

Mathematical Statistics
Centre for Mathematical Sciences
Lund University with Lund Institute of Technology

Comments or corrections to the WAFO group


Generated on Thu 06-Oct-2005 02:21:16 for WAFO by m2html © 2003