3.3 Complexity control in overlapping stochastic block models (Pierre Latouche)
Listen now
Description
Networks are highly used to represent complex systems as sets of interactions between units of interest. For instance, regulatory networks can describe the regulation of genes with transcriptional factors while metabolic networks focus on representing pathways of biochemical reactions. In social sciences, networks are commonly used to represent relational ties between actors. Numerous graph clustering algorithms have been proposed since the earlier work of Moreno [2]. Most of them partition the vertices into disjoint clusters depending on their connection profiles. However, recent studies showed that these techniques were too restrictive since most existing networks contained overlapping clusters. To tackle this issue, we proposed the Overlapping Stochastic Block Model (OSBM) in [1]. This approach allows the vertices of a network to belong to multiple classes and can be seen as a generalization of the stochastic block model [3]. In [1], we developed a variational method to cluster the vertices of networks and showed that the algorithm had good clustering performances on both simulated and real data. However, no criterion was proposed to estimate the number of classes from the data, which is a major issue in practice. Here, we tackle this limit using a Bayesian framework. Thus, we introduce some priors over the model parameters and consider variational Bayes methods to approximate the full posterior distribution. We show how a model selection criterion can be obtained in order to estimate the number of (overlapping) clusters in a network. On both simulated and real data, we compare our work with other approaches.
More Episodes
Functional data are becoming increasingly common in a variety of fields. Many studies underline the importance to consider the representation of data as functions. This has sparked a growing attention in the development of adapted statistical tools that allow to analyze such kind of data :...
Published 12/03/14
The idea of selecting a model via penalizing a log-likelihood type criterion goes back to the early seventies with the pioneering works of Mallows and Akaike. One can find many consistency results in the literature for such criteria. These results are asymptotic in the sense that one deals with a...
Published 12/03/14