Title: | Vector Summaries of Persistence Diagrams |
---|---|
Description: | Tools for computing various vector summaries of persistence diagrams studied in Topological Data Analysis. For improved computational efficiency, all code for the vector summaries is written in 'C++' using the 'Rcpp' package. |
Authors: | Umar Islambekov [aut], Alexey Luchinsky [aut, cre], Hasani Pathirana [ctb] |
Maintainer: | Alexey Luchinsky <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.3 |
Built: | 2024-11-23 05:49:37 UTC |
Source: | https://github.com/aluchinsky/tdavec |
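Vectorizes the Euler characteristic curve
$$\chi(t)=\sum_{k=0}^{d}(-1)^k\beta_k(t),$$
where $\beta_0,\beta_1,\ldots,\beta_d$ are the Betti curves corresponding to persistence diagrams $D_0,D_1,\ldots,D_d$ of dimensions $0,1,\ldots,d$ respectively, all computed from the same filtration.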
computeECC(D, maxhomDim, scaleSeq)
D |
matrix with three columns containing the dimension, birth and death values respectively |
maxhomDim |
maximum homological dimension considered (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
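A numeric vector whose elements are the average values of the Euler characteristic curve computed between each pair of consecutive scale points of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\chi(t)\,dt,\ \frac{1}{\Delta t_2}\int_{t_2}^{t_3}\chi(t)\,dt,\ \ldots,\ \frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\chi(t)\,dt\Big),$$
where $\Delta t_k=t_{k+1}-t_k$.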
Umar Islambekov
1. Richardson, E., & Werman, M. (2014). Efficient classification using the Euler characteristic. Pattern Recognition Letters, 49, 99-106.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute ECC
    computeECC(D, maxhomDim = 1, scaleSeq)
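To make the averaged Euler characteristic curve concrete, here is a minimal base-R sketch on a hand-made diagram. It is not the package's 'C++' implementation; the helper names `avg_indicator` and `avg_betti` and the toy values are assumptions chosen for illustration. It relies on the fact that the average of an interval indicator over a grid cell equals the overlap length divided by the cell width.

```r
# Toy diagram (illustrative values): columns are dimension, birth, death
D <- rbind(c(0, 0.0, 2.0),
           c(0, 0.0, 0.5),
           c(1, 0.5, 1.5))
scaleSeq <- c(0, 1, 2)

# Average of the indicator of [b, d) over [lo, hi]: overlap length / cell width
avg_indicator <- function(b, d, lo, hi) {
  pmax(pmin(d, hi) - pmax(b, lo), 0) / (hi - lo)
}

# Averaged Betti curve of one homological dimension over each grid cell
avg_betti <- function(D, homDim, scaleSeq) {
  pts <- D[D[, 1] == homDim, , drop = FALSE]
  lo <- head(scaleSeq, -1)
  hi <- tail(scaleSeq, -1)
  sapply(seq_along(lo), function(k) {
    sum(avg_indicator(pts[, 2], pts[, 3], lo[k], hi[k]))
  })
}

# Alternating sum of the averaged Betti curves (dimensions 0 and 1)
ecc <- avg_betti(D, 0, scaleSeq) - avg_betti(D, 1, scaleSeq)
print(ecc)
```

On this toy input the first cell averages 1.5 components minus 0.5 loops, and the second cell 1 component minus 0.5 loops.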
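For a given persistence diagram $D=\{(b_i,d_i)\}_{i=1}^N$, computeNL() vectorizes the normalized life (NL) curve
$$sl(t)=\sum_{i=1}^N \frac{d_i-b_i}{L}\,\mathbf{1}_{[b_i,d_i)}(t),$$
where $L=\sum_{i=1}^N (d_i-b_i)$. Points of $D$ with infinite death value are ignored.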
computeNL(D, homDim, scaleSeq)
D |
matrix with three columns containing the dimension, birth and death values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
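A numeric vector whose elements are the average values of the normalized life curve computed between each pair of consecutive scale points of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}sl(t)\,dt,\ \frac{1}{\Delta t_2}\int_{t_2}^{t_3}sl(t)\,dt,\ \ldots,\ \frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}sl(t)\,dt\Big),$$
where $\Delta t_k=t_{k+1}-t_k$.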
Umar Islambekov
Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute NL for homological dimension H_0
    computeNL(D, homDim = 0, scaleSeq)
    # compute NL for homological dimension H_1
    computeNL(D, homDim = 1, scaleSeq)
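The normalized life curve weights each interval indicator by its share of the total lifespan. The base-R sketch below works through that on a toy diagram; the variable names and values are illustrative assumptions, and it is not the package's own code. The point with infinite death is dropped before the weights are formed.

```r
# Toy diagram (illustrative values): columns are dimension, birth, death
D <- rbind(c(0, 0, 1),
           c(0, 1, 3),
           c(0, 0, Inf))   # infinite death: ignored below
scaleSeq <- c(0, 2, 4)

# Keep finite H_0 points and form lifespan weights (d_i - b_i) / L
pts  <- D[D[, 1] == 0 & is.finite(D[, 3]), , drop = FALSE]
life <- pts[, 3] - pts[, 2]
w    <- life / sum(life)

lo <- head(scaleSeq, -1)
hi <- tail(scaleSeq, -1)

# Cell average = sum of weight * overlap length, divided by cell width
nl <- sapply(seq_along(lo), function(k) {
  overlap <- pmax(pmin(pts[, 3], hi[k]) - pmax(pts[, 2], lo[k]), 0)
  sum(w * overlap) / (hi[k] - lo[k])
})
print(nl)
```

Here the lifespans are 1 and 2, so the weights are 1/3 and 2/3; both intervals overlap the first cell by length 1, and only the second overlaps the second cell.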
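For a given persistence diagram $D=\{(b_i,d_i)\}_{i=1}^N$, computePES() vectorizes the persistent entropy summary (PES) function
$$S(t)=-\sum_{i=1}^N \frac{l_i}{L}\log_2\Big(\frac{l_i}{L}\Big)\mathbf{1}_{[b_i,d_i]}(t),$$
where $l_i=d_i-b_i$ and $L=\sum_{i=1}^N l_i$. Points of $D$ with infinite death value are ignored.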
computePES(D, homDim, scaleSeq)
D |
matrix with three columns containing the dimension, birth and death values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
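A numeric vector whose elements are the average values of the persistent entropy summary function computed between each pair of consecutive scale points of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}S(t)\,dt,\ \frac{1}{\Delta t_2}\int_{t_2}^{t_3}S(t)\,dt,\ \ldots,\ \frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}S(t)\,dt\Big),$$
where $\Delta t_k=t_{k+1}-t_k$.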
Umar Islambekov
1. Atienza, N., Gonzalez-Díaz, R., & Soriano-Trigueros, M. (2020). On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognition, 107, 107509.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute PES for homological dimension H_0
    computePES(D, homDim = 0, scaleSeq)
    # compute PES for homological dimension H_1
    computePES(D, homDim = 1, scaleSeq)
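As a rough illustration of the persistent entropy summary function, the base-R sketch below replaces the lifespan weights with entropy terms of the form $-(l_i/L)\log_2(l_i/L)$. The base-2 logarithm, the toy diagram and all variable names are assumptions made for this sketch; it does not reproduce the package internals.

```r
# Toy diagram (illustrative values): two H_0 intervals of equal length
D <- rbind(c(0, 0, 1),
           c(0, 1, 2))
scaleSeq <- c(0, 1, 2)

pts  <- D[D[, 1] == 0 & is.finite(D[, 3]), , drop = FALSE]
life <- pts[, 3] - pts[, 2]
prob <- life / sum(life)

# Entropy contribution of each interval (base-2 logarithm assumed)
ent <- -prob * log2(prob)

lo <- head(scaleSeq, -1)
hi <- tail(scaleSeq, -1)

# Cell average = sum of entropy term * overlap length, divided by cell width
pes <- sapply(seq_along(lo), function(k) {
  overlap <- pmax(pmin(pts[, 3], hi[k]) - pmax(pts[, 2], lo[k]), 0)
  sum(ent * overlap) / (hi[k] - lo[k])
})
print(pes)
```

With two equally long intervals each entropy term is 0.5, and each interval fills exactly one cell.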
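For a given persistence diagram $D=\{(b_i,p_i)\}_{i=1}^N$, computePI() computes the persistence image (PI), a vector summary of the persistence surface
$$\rho(x,y)=\sum_{i=1}^N f(b_i,p_i)\,\phi_{(b_i,p_i)}(x,y),$$
where $\phi_{(b_i,p_i)}(x,y)$ is the Gaussian distribution with mean $(b_i,p_i)$ and covariance matrix $\sigma^2 I_{2\times 2}$, and
$$f(b,p)=w(p)=\begin{cases}0 & p\le 0\\ p/p_{max} & 0<p<p_{max}\\ 1 & p\ge p_{max}\end{cases}$$
is the weighting function with $p_{max}$ being the maximum persistence value among all persistence diagrams considered in the experiment. Points of $D$ with infinite persistence value are ignored.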
computePI(D, homDim, xSeq, ySeq, sigma)
D |
matrix with three columns containing the dimension, birth and persistence values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
xSeq |
numeric vector of increasing x (birth) values used for vectorization |
ySeq |
numeric vector of increasing y (persistence) values used for vectorization |
sigma |
standard deviation of the Gaussian |
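A numeric vector whose elements are the average values of the persistence surface computed over each cell of the two-dimensional grid constructed from xSeq $=\{x_1,x_2,\ldots,x_n\}$ and ySeq $=\{y_1,y_2,\ldots,y_m\}$:
$$\Big(\frac{1}{\Delta x_1\Delta y_1}\int_{[x_1,x_2]\times[y_1,y_2]}\rho(x,y)\,dA,\ \ldots,\ \frac{1}{\Delta x_{n-1}\Delta y_{m-1}}\int_{[x_{n-1},x_n]\times[y_{m-1},y_m]}\rho(x,y)\,dA\Big),$$
where $dA=dx\,dy$, $\Delta x_k=x_{k+1}-x_k$ and $\Delta y_j=y_{j+1}-y_j$.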
Umar Islambekov
1. Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., ... & Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    # switch from the birth-death to the birth-persistence coordinates
    D[, 3] <- D[, 3] - D[, 2]
    colnames(D)[3] <- "Persistence"
    resB <- 5 # resolution (or grid size) along the birth axis
    resP <- 5 # resolution (or grid size) along the persistence axis
    # compute PI for homological dimension H_0
    minPH0 <- min(D[D[, 1] == 0, 3]); maxPH0 <- max(D[D[, 1] == 0, 3])
    ySeqH0 <- seq(minPH0, maxPH0, length.out = resP + 1)
    sigma <- 0.5 * (maxPH0 - minPH0) / resP
    computePI(D, homDim = 0, xSeq = NA, ySeqH0, sigma)
    # compute PI for homological dimension H_1
    minBH1 <- min(D[D[, 1] == 1, 2]); maxBH1 <- max(D[D[, 1] == 1, 2])
    minPH1 <- min(D[D[, 1] == 1, 3]); maxPH1 <- max(D[D[, 1] == 1, 3])
    xSeqH1 <- seq(minBH1, maxBH1, length.out = resB + 1)
    ySeqH1 <- seq(minPH1, maxPH1, length.out = resP + 1)
    sigma <- 0.5 * (maxPH1 - minPH1) / resP
    computePI(D, homDim = 1, xSeqH1, ySeqH1, sigma)
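Because the Gaussian has an isotropic covariance, its integral over a rectangular cell factorizes into a product of two one-dimensional normal probabilities. The base-R sketch below uses that fact to average the persistence surface over a small grid. The helper names `weight` and `cell_mass`, the value of `p_max` and the toy diagram are assumptions for illustration, and the matrix layout (rows for birth cells, columns for persistence cells) need not match the ordering of the vector returned by computePI().

```r
# Toy diagram in birth-persistence coordinates (illustrative values)
D <- rbind(c(1, 0.5, 0.5))        # dimension, birth, persistence
xSeq  <- c(0, 0.5, 1)
ySeq  <- c(0, 0.5, 1)
sigma <- 0.1
p_max <- 1                        # assumed maximum persistence

# Piecewise-linear weight: 0 for p <= 0, p / p_max in between, 1 for p >= p_max
weight <- function(p, p_max) pmin(pmax(p, 0) / p_max, 1)

# Probability mass of N(mu, sigma^2) inside each cell of a 1-D grid
cell_mass <- function(grid, mu, sigma) diff(pnorm(grid, mean = mu, sd = sigma))

pts <- D[D[, 1] == 1 & is.finite(D[, 3]), , drop = FALSE]
surface <- matrix(0, length(xSeq) - 1, length(ySeq) - 1)
for (i in seq_len(nrow(pts))) {
  mass <- outer(cell_mass(xSeq, pts[i, 2], sigma),
                cell_mass(ySeq, pts[i, 3], sigma))
  surface <- surface + weight(pts[i, 3], p_max) * mass
}

# Divide the integrated surface by each cell's area to get cell averages
pi_cells <- surface / outer(diff(xSeq), diff(ySeq))
print(pi_cells)
```

The single point sits on the shared corner of the four cells, so each cell receives close to a quarter of the weighted mass.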
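Vectorizes the persistence landscape (PL) function constructed from a given persistence diagram. The $k$th order landscape function of a persistence diagram $D=\{(b_i,d_i)\}_{i=1}^N$ is defined as
$$\lambda_k(t)=k\max_{1\le i\le N}\Lambda_i(t),\quad k\in\mathbb{N},$$
where $k\max$ returns the $k$th largest value and
$$\Lambda_i(t)=\begin{cases}\min(t-b_i,\,d_i-t) & t\in[b_i,d_i]\\ 0 & \text{otherwise.}\end{cases}$$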
computePL(D, homDim, scaleSeq, k=1)
D |
matrix with three columns containing the dimension, birth and death values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
k |
order of landscape function. By default, $k=1$ |
A numeric vector whose elements are the values of the $k$th order landscape function evaluated at each point of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\big(\lambda_k(t_1),\lambda_k(t_2),\ldots,\lambda_k(t_n)\big).$$
Umar Islambekov
1. Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16(1), 77-102.
2. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014, June). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute persistence landscape (PL) for homological dimension H_0 with order of landscape k=1
    computePL(D, homDim = 0, scaleSeq, k = 1)
    # compute persistence landscape (PL) for homological dimension H_1 with order of landscape k=1
    computePL(D, homDim = 1, scaleSeq, k = 1)
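The landscape definition can be checked by hand on a small diagram: evaluate every tent function at a scale value, sort, and pick the $k$th largest. The following base-R sketch does exactly that; `tent` and `landscape` are hypothetical helper names and the diagram is made up for illustration, so treat it as a reading aid rather than the package's implementation.

```r
# Toy diagram (illustrative values): columns are dimension, birth, death
D <- rbind(c(0, 0, 2),
           c(0, 1, 2))
scaleSeq <- c(0, 0.5, 1, 1.5, 2)

# Tent function: min(t - b, d - t) on [b, d], zero elsewhere
tent <- function(t, b, d) pmax(pmin(t - b, d - t), 0)

# k-th largest tent value at every scale point (zero if fewer than k points)
landscape <- function(D, homDim, scaleSeq, k = 1) {
  pts <- D[D[, 1] == homDim, , drop = FALSE]
  if (nrow(pts) < k) return(numeric(length(scaleSeq)))
  sapply(scaleSeq, function(t) {
    sort(tent(t, pts[, 2], pts[, 3]), decreasing = TRUE)[k]
  })
}

lambda1 <- landscape(D, homDim = 0, scaleSeq, k = 1)
lambda2 <- landscape(D, homDim = 0, scaleSeq, k = 2)
print(rbind(lambda1, lambda2))
```

The first-order landscape follows the larger tent over $[0,2]$, while the second-order landscape is nonzero only where both tents are positive.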
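Vectorizes the persistence silhouette (PS) function constructed from a given persistence diagram. The $p$th power silhouette function of a persistence diagram $D=\{(b_i,d_i)\}_{i=1}^N$ is defined as
$$\phi_p(t)=\frac{\sum_{i=1}^N |d_i-b_i|^p\,\Lambda_i(t)}{\sum_{i=1}^N |d_i-b_i|^p},$$
where
$$\Lambda_i(t)=\begin{cases}\min(t-b_i,\,d_i-t) & t\in[b_i,d_i]\\ 0 & \text{otherwise.}\end{cases}$$
Points of $D$ with infinite death value are ignored.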
computePS(D, homDim, scaleSeq, p=1)
D |
matrix with three columns containing the dimension, birth and death values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
p |
power of the weights for the silhouette function. By default, $p=1$ |
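A numeric vector whose elements are the average values of the $p$th power silhouette function computed between each pair of consecutive scale points of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\phi_p(t)\,dt,\ \frac{1}{\Delta t_2}\int_{t_2}^{t_3}\phi_p(t)\,dt,\ \ldots,\ \frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\phi_p(t)\,dt\Big),$$
where $\Delta t_k=t_{k+1}-t_k$.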
Umar Islambekov
1. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute persistence silhouette (PS) for homological dimension H_0
    computePS(D, homDim = 0, scaleSeq, p = 1)
    # compute persistence silhouette (PS) for homological dimension H_1
    computePS(D, homDim = 1, scaleSeq, p = 1)
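Since each tent function is piecewise linear, its integral over a cell has a closed form, and the averaged silhouette is a weighted combination of those integrals. The base-R sketch below illustrates this under stated assumptions: `tent_integral` is a hypothetical helper giving the area under a tent from minus infinity up to `t`, and the toy diagram is invented for the example. It is not the package's 'C++' code.

```r
# Toy diagram (illustrative values): columns are dimension, birth, death
D <- rbind(c(0, 0, 2),
           c(0, 0, 4))
scaleSeq <- c(0, 2, 4)
p <- 1

# Area under the tent function of (b, d) to the left of t
tent_integral <- function(t, b, d) {
  h    <- (d - b) / 2                 # tent height = half the lifespan
  up   <- pmin(pmax(t - b, 0), h)     # progress along the rising side
  down <- pmin(pmax(d - t, 0), h)     # remainder of the falling side
  up^2 / 2 + (h^2 - down^2) / 2
}

pts <- D[D[, 1] == 0 & is.finite(D[, 3]), , drop = FALSE]
w   <- abs(pts[, 3] - pts[, 2])^p     # silhouette weights |d_i - b_i|^p

lo <- head(scaleSeq, -1)
hi <- tail(scaleSeq, -1)

# Weighted mean of tent areas in each cell, divided by the cell width
ps <- sapply(seq_along(lo), function(k) {
  area <- tent_integral(hi[k], pts[, 2], pts[, 3]) -
          tent_integral(lo[k], pts[, 2], pts[, 3])
  sum(w * area) / sum(w) / (hi[k] - lo[k])
})
print(ps)
```

On this input the tent areas per cell are 1 and 2 in the first cell and 0 and 2 in the second, combined with weights 2 and 4.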
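For a given persistence diagram $D=\{(b_i,d_i)\}_{i=1}^N$, computeVAB() vectorizes the Betti curve
$$\beta(t)=\sum_{i=1}^N w(b_i,d_i)\,\mathbf{1}_{[b_i,d_i)}(t),$$
where the weight function $w(b,d)\equiv 1$.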
computeVAB(D, homDim, scaleSeq)
D |
matrix with three columns containing the dimension, birth and death values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
scaleSeq |
numeric vector of increasing scale values used for vectorization |
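A numeric vector whose elements are the average values of the Betti curve computed between each pair of consecutive scale points of scaleSeq $=\{t_1,t_2,\ldots,t_n\}$:
$$\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\beta(t)\,dt,\ \frac{1}{\Delta t_2}\int_{t_2}^{t_3}\beta(t)\,dt,\ \ldots,\ \frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\beta(t)\,dt\Big),$$
where $\Delta t_k=t_{k+1}-t_k$.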
Umar Islambekov, Hasani Pathirana
1. Chazal, F., & Michel, B. (2021). An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. Frontiers in Artificial Intelligence, 108.
2. Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    scaleSeq <- seq(0, 2, length.out = 11) # sequence of scale values
    # compute vector of averaged Bettis (VAB) for homological dimension H_0
    computeVAB(D, homDim = 0, scaleSeq)
    # compute vector of averaged Bettis (VAB) for homological dimension H_1
    computeVAB(D, homDim = 1, scaleSeq)
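One way to see what "averaged Betti" means is to compare the exact cell average (total overlap length divided by cell width) with a brute-force average of the pointwise Betti curve on a fine grid. The base-R sketch below does both on a toy diagram; the names `betti`, `vab_exact` and `vab_numeric` and the data are illustrative assumptions rather than package code.

```r
# Toy diagram (illustrative values): columns are dimension, birth, death
D <- rbind(c(0, 0.2, 0.7),
           c(0, 0.5, 1.5))
scaleSeq <- c(0, 1, 2)
pts <- D[D[, 1] == 0, , drop = FALSE]

# Pointwise Betti curve: number of intervals [b, d) containing t
betti <- function(t) sum(pts[, 2] <= t & t < pts[, 3])

lo <- head(scaleSeq, -1)
hi <- tail(scaleSeq, -1)

# Exact cell average: total overlap length divided by cell width
vab_exact <- sapply(seq_along(lo), function(k) {
  sum(pmax(pmin(pts[, 3], hi[k]) - pmax(pts[, 2], lo[k]), 0)) / (hi[k] - lo[k])
})

# Numerical check: mean of the Betti curve at 1000 midpoints per cell
vab_numeric <- sapply(seq_along(lo), function(k) {
  mids <- lo[k] + (hi[k] - lo[k]) * (seq_len(1000) - 0.5) / 1000
  mean(sapply(mids, betti))
})

print(rbind(vab_exact, vab_numeric))
```

The two rows agree, which is the sense in which the returned vector summarizes the Betti curve cell by cell instead of sampling it at single scale values.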
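For a given persistence diagram $D=\{(b_i,p_i)\}_{i=1}^N$, computeVPB() vectorizes the persistence block
$$f(x,y)=\sum_{i=1}^N \mathbf{1}_{E(b_i,p_i)}(x,y),$$
where $E(b_i,p_i)=[b_i-\frac{\lambda_i}{2},\,b_i+\frac{\lambda_i}{2}]\times[p_i-\frac{\lambda_i}{2},\,p_i+\frac{\lambda_i}{2}]$ and $\lambda_i=2\tau p_i$ with $\tau\in(0,1]$. Points of $D$ with infinite persistence value are ignored.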
computeVPB(D, homDim, xSeq, ySeq, tau)
D |
matrix with three columns containing the dimension, birth and persistence values respectively |
homDim |
homological dimension (0 for $H_0$, 1 for $H_1$, etc.) |
xSeq |
numeric vector of increasing x (birth) values used for vectorization |
ySeq |
numeric vector of increasing y (persistence) values used for vectorization |
tau |
parameter (between 0 and 1) controlling block size. By default, |
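A numeric vector whose elements are the weighted averages of the persistence block computed over each cell of the two-dimensional grid constructed from xSeq $=\{x_1,x_2,\ldots,x_n\}$ and ySeq $=\{y_1,y_2,\ldots,y_m\}$:
$$\Big(\frac{1}{\Delta x_1\Delta y_1}\int_{[x_1,x_2]\times[y_1,y_2]}f(x,y)\,w\,dA,\ \ldots,\ \frac{1}{\Delta x_{n-1}\Delta y_{m-1}}\int_{[x_{n-1},x_n]\times[y_{m-1},y_m]}f(x,y)\,w\,dA\Big),$$
where $w\,dA=(x+y)\,dx\,dy$, $\Delta x_k=x_{k+1}-x_k$ and $\Delta y_j=y_{j+1}-y_j$.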
Umar Islambekov, Aleksei Luchinsky
1. Chan, K. C., Islambekov, U., Luchinsky, A., & Sanders, R. (2022). A computationally efficient framework for vector representation of persistence diagrams. Journal of Machine Learning Research 23, 1-33.
    library(TDAvec)
    N <- 100
    set.seed(123)
    # sample N points uniformly from unit circle and add Gaussian noise
    X <- TDA::circleUnif(N, r = 1) + rnorm(2 * N, mean = 0, sd = 0.2)
    # compute a persistence diagram using the Rips filtration built on top of X
    D <- TDA::ripsDiag(X, maxdimension = 1, maxscale = 2)$diagram
    # switch from the birth-death to the birth-persistence coordinates
    D[, 3] <- D[, 3] - D[, 2]
    colnames(D)[3] <- "Persistence"
    # construct one-dimensional grid of scale values
    ySeqH0 <- unique(quantile(D[D[, 1] == 0, 3], probs = seq(0, 1, by = 0.2)))
    tau <- 0.3 # parameter in [0,1] which controls the size of blocks around each point of the diagram
    # compute VPB for homological dimension H_0
    computeVPB(D, homDim = 0, xSeq = NA, ySeqH0, tau)
    xSeqH1 <- unique(quantile(D[D[, 1] == 1, 2], probs = seq(0, 1, by = 0.2)))
    ySeqH1 <- unique(quantile(D[D[, 1] == 1, 3], probs = seq(0, 1, by = 0.2)))
    # compute VPB for homological dimension H_1
    computeVPB(D, homDim = 1, xSeqH1, ySeqH1, tau)
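The persistence block places a square of side $2\tau p_i$ around each diagram point, and the vectorization integrates these squares against a weight over every grid cell. The base-R sketch below works through a single point, assuming the weight $x+y$ and using the closed form $\int\!\!\int (x+y)\,dx\,dy = \frac{1}{2}\,\Delta x\,\Delta y\,(x_{lo}+x_{hi}+y_{lo}+y_{hi})$ over a rectangle. The variable names, the toy diagram and the matrix layout are assumptions for illustration and may differ from the package's output ordering.

```r
# Toy diagram in birth-persistence coordinates (illustrative values)
D <- rbind(c(1, 1, 1))            # dimension, birth, persistence
xSeq <- c(0, 1, 2)
ySeq <- c(0, 1, 2)
tau  <- 0.5

pts  <- D[D[, 1] == 1 & is.finite(D[, 3]), , drop = FALSE]
half <- tau * pts[, 3]            # half of the block side, lambda_i / 2

vpb <- matrix(0, length(xSeq) - 1, length(ySeq) - 1)
for (k in seq_len(nrow(vpb))) {
  for (j in seq_len(ncol(vpb))) {
    # Intersection of each block with the cell [x_k, x_{k+1}] x [y_j, y_{j+1}]
    x_lo <- pmax(xSeq[k], pts[, 2] - half)
    x_hi <- pmin(xSeq[k + 1], pts[, 2] + half)
    y_lo <- pmax(ySeq[j], pts[, 3] - half)
    y_hi <- pmin(ySeq[j + 1], pts[, 3] + half)
    dx <- pmax(x_hi - x_lo, 0)    # zero width means no overlap
    dy <- pmax(y_hi - y_lo, 0)
    # Integral of (x + y) over the intersection, then divide by the cell area
    cell_area <- (xSeq[k + 1] - xSeq[k]) * (ySeq[j + 1] - ySeq[j])
    vpb[k, j] <- sum(0.5 * dx * dy * (x_lo + x_hi + y_lo + y_hi)) / cell_area
  }
}
print(vpb)
```

The block here is $[0.5,1.5]\times[0.5,1.5]$, split evenly across the four unit cells; the cell values differ only because the weight $x+y$ grows toward the upper right.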