Publications by model

Arithmetic circuits [ acs ]
  • Probabilistic Graphical Models 2020
    A New Perspective on Learning Context-Specific Independence
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Local structure such as context-specific independence (CSI) has received much attention in the probabilistic graphical model (PGM) literature, as it facilitates the modeling of large complex systems, as well as reasoning with them. In this paper, we provide a new perspective on how to learn CSIs from data. We propose to first learn a functional and parameterized representation of a conditional probability table (CPT), such as a neural network. Next, we quantize this continuous function into an arithmetic circuit representation that facilitates efficient inference. In the first step, we can leverage the many powerful tools that have been developed in the machine learning literature. In the second step, we exploit more recently-developed analytic tools from explainable AI, for the purposes of learning CSIs. Finally, we contrast our approach, empirically and conceptually, with more traditional variable-splitting approaches that search for CSIs more explicitly.
          
    
              @inproceedings{shen2020new,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {A New Perspective on Learning Context-Specific Independence},
    booktitle = {Probabilistic Graphical Models},
    year = {2020}
    }
  • ICML 2019
    Conditional Independence in Testing Bayesian Networks
    Shen, Yujia and Huang, Haiying and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Testing Bayesian Networks (TBNs) were introduced recently to represent a set of distributions, one of which is selected based on the given evidence and used for reasoning. TBNs are more expressive than classical Bayesian Networks (BNs): Marginal queries correspond to multi-linear functions in BNs and to piecewise multi-linear functions in TBNs. Moreover, TBN queries are universal approximators, like neural networks. In this paper, we study conditional independence in TBNs, showing that it can be inferred from d-separation as in BNs. We also study the role of TBN expressiveness and independence in dealing with the problem of learning with incomplete models (i.e., ones that miss nodes or edges from the data-generating model). Finally, we illustrate our results on a number of concrete examples, including a case study on Hidden Markov Models.
          
    
              @inproceedings{shen2019conditional,
    author = {Shen, Yujia and Huang, Haiying and Choi, Arthur and Darwiche, Adnan},
    title = {Conditional Independence in Testing Bayesian Networks},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {97},
    pages = {5701--5709},
    publisher = {{PMLR}},
    year = {2019}
    }
  • ISIPTA 2019
    Robust Analysis of MAP Inference in Selective Sum-Product Networks
    Llerena, Julissa Giuliana Villanueva and Mauá, Denis Deratani
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPN) are deep probabilistic models that have been shown to achieve state-of-the-art performance in several machine learning tasks. As with many other probabilistic models, performing Maximum-A-Posteriori (MAP) inference is NP-hard in SPNs. A notable exception is selective SPNs, which constrain the network so as to allow MAP inference to be performed in linear time. Due to the high number of parameters, SPNs learned from data can produce unreliable and overconfident inference; this phenomenon can be partially mitigated by performing a Robustness Analysis of the model predictions to changes in the parameters. In this work, we address the problem of assessing the robustness of MAP inferences produced with Selective SPNs to global perturbations of the parameters. We present efficient algorithms and an empirical analysis with realistic problems.
          
    
              @inproceedings{llerena2019robust,
author = {Llerena, Julissa Giuliana Villanueva and Mau{\'{a}}, Denis Deratani},
    title = {Robust Analysis of {MAP} Inference in Selective Sum-Product Networks},
    booktitle = {{ISIPTA}},
    series = {Proceedings of Machine Learning Research},
    volume = {103},
    pages = {430--440},
    publisher = {{PMLR}},
    year = {2019}
    }
  • Int. J. Approx. Reason. 2019
    On the relative expressiveness of Bayesian and neural networks
    Choi, Arthur and Wang, Ruocheng and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              A neural network computes a function. A central property of neural networks is that they are ''universal approximators:'' for a given continuous function, there exists a neural network that can approximate it arbitrarily well, given enough neurons (and some additional assumptions). In contrast, a Bayesian network is a model, but each of its queries can be viewed as computing a function. In this paper, we identify some key distinctions between the functions computed by neural networks and those by marginal Bayesian network queries, showing that the former are more expressive than the latter. Moreover, we propose a simple augmentation to Bayesian networks (a testing operator), which enables their marginal queries to become ''universal approximators.''
          
    
              @article{choi2019relative,
    author = {Choi, Arthur and Wang, Ruocheng and Darwiche, Adnan},
    title = {On the relative expressiveness of Bayesian and neural networks},
    journal = {Int. J. Approx. Reason.},
    volume = {113},
    pages = {303--323},
    year = {2019}
    }
  • DAC 2019
    ProbLP: A framework for low-precision probabilistic inference
    Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian
    [ paper | bibtex | abstract ]
    
              Bayesian reasoning is a powerful mechanism for probabilistic inference in smart edge-devices. During such inferences, a low-precision arithmetic representation can enable improved energy efficiency. However, its impact on inference accuracy is not yet understood. Furthermore, general-purpose hardware does not natively support low-precision representation. To address this, we propose ProbLP, a framework that automates the analysis and design of low-precision probabilistic inference hardware. It automatically chooses an appropriate energy-efficient representation based on worst-case error-bounds and hardware energy-models. It generates custom hardware for the resulting inference network exploiting parallelism, pipelining and low-precision operation. The framework is validated on several embedded-sensing benchmarks.
          
    
              @inproceedings{shah2019problp,
author = {Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian},
    title = {ProbLP: A framework for low-precision probabilistic inference},
booktitle = {{DAC}},
    pages = {190},
    publisher = {ACM},
    year = {2019},
    url = {https://doi.org/10.1145/3316781.3317885},
    doi = {10.1145/3316781.3317885}
    }
  • NeurIPS 2018
    Approximate Knowledge Compilation by Online Collapsed Importance Sampling
    Friedman, Tal and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              We introduce collapsed compilation, a novel approximate inference algorithm for discrete probabilistic graphical models. It is a collapsed sampling algorithm that incrementally selects which variable to sample next based on the partial compilation obtained so far. This online collapsing, together with knowledge compilation inference on the remaining variables, naturally exploits local structure and context-specific independence in the distribution. These properties are used implicitly in exact inference, but are difficult to harness for approximate inference. Moreover, by having a partially compiled circuit available during sampling, collapsed compilation has access to a highly effective proposal distribution for importance sampling. Our experimental evaluation shows that collapsed compilation performs well on standard benchmarks. In particular, when the amount of exact inference is equally limited, collapsed compilation is competitive with the state of the art, and outperforms it on several benchmarks.
          
    
              @inproceedings{friedman2018approximate,
    author = {Friedman, Tal and Van den Broeck, Guy},
    title = {Approximate Knowledge Compilation by Online Collapsed Importance Sampling},
    booktitle = {NeurIPS},
    pages = {8035--8045},
    year = {2018}
    }
  • ICML 2017
    On Relaxing Determinism in Arithmetic Circuits
    Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              The past decade has seen a significant interest in learning tractable probabilistic representations. Arithmetic circuits (ACs) were among the first proposed tractable representations, with some subsequent representations being instances of ACs with weaker or stronger properties. In this paper, we provide a formal basis under which variants on ACs can be compared, and where the precise roles and semantics of their various properties can be made more transparent. This allows us to place some recent developments on ACs in a clearer perspective and to also derive new results for ACs. This includes an exponential separation between ACs with and without determinism; completeness and incompleteness results; and tractability results (or lack thereof) when computing most probable explanations (MPEs).
          
    
              @inproceedings{choi2017relaxing,
    author = {Choi, Arthur and Darwiche, Adnan},
    title = {On Relaxing Determinism in Arithmetic Circuits},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {70},
    pages = {825--833},
    publisher = {{PMLR}},
    year = {2017}
    }
  • AISTATS 2016
    Discriminative Structure Learning of Arithmetic Circuits
    Rooshenas, Amirmohammad and Lowd, Daniel
    [ paper | bibtex | abstract ]
    
              The biggest limitation of probabilistic graphical models is the complexity of inference, which is often intractable. An appealing alternative is to use tractable probabilistic models, such as arithmetic circuits (ACs) and sum-product networks (SPNs), in which marginal and conditional queries can be answered efficiently. In this paper, we present the first discriminative structure learning algorithm for ACs, DACLearn (Discriminative AC Learner). Like previous work on generative structure learning, DACLearn finds a log-linear model with conjunctive features, using the size of an equivalent AC representation as a learning bias. Unlike previous work, DACLearn optimizes conditional likelihood, resulting in a more accurate conditional distribution. DACLearn also learns much more compact ACs than generative methods, since it does not need to represent a consistent distribution over the evidence variables. To ensure efficiency, DACLearn uses novel initialization and search heuristics to drastically reduce the number of feature evaluations required to learn an accurate model. In experiments on 20 benchmark domains, we find that DACLearn learns models that are more accurate and compact than other tractable generative and discriminative methods.
          
    
              @inproceedings{rooshenas2016discriminative,
    author = {Rooshenas, Amirmohammad and Lowd, Daniel},
    title = {Discriminative Structure Learning of Arithmetic Circuits},
    booktitle = {{AISTATS}},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {51},
    pages = {1506--1514},
    publisher = {JMLR.org},
    year = {2016}
    }
  • NIPS 2016
    Tractable Operations for Arithmetic Circuits of Probabilistic Models
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDDs support a polytime multiplication operator, while they do not support a polytime operator for summing-out variables. A polytime multiplication operator makes PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.
          
    
              @inproceedings{shen2016tractable,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {Tractable Operations for Arithmetic Circuits of Probabilistic Models},
    booktitle = {{NIPS}},
    pages = {3936--3944},
    year = {2016}
    }
  • NIPS 2015
    Tractable Learning for Complex Probability Queries
    Bekker, Jessa and Davis, Jesse and Choi, Arthur and Darwiche, Adnan and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Tractable learning aims to learn probabilistic models where inference is guaranteed to be efficient. However, the particular class of queries that is tractable depends on the model and underlying representation. Usually this class is MPE or conditional probabilities Pr(x|y) for joint assignments x, y. We propose a tractable learner that guarantees efficient inference for a broader class of queries. It simultaneously learns a Markov network and its tractable circuit representation, in order to guarantee and measure tractability. Our approach differs from earlier work by using Sentential Decision Diagrams (SDD) as the tractable language instead of Arithmetic Circuits (AC). SDDs have desirable properties, which more general representations such as ACs lack, that enable basic primitives for Boolean circuit compilation. This allows us to support a broader class of complex probability queries, including counting, threshold, and parity, in polytime.
          
    
              @inproceedings{bekker2015tractable,
    author = {Bekker, Jessa and Davis, Jesse and Choi, Arthur and Darwiche, Adnan and Van den Broeck, Guy},
    title = {Tractable Learning for Complex Probability Queries},
    booktitle = {{NIPS}},
    pages = {2242--2250},
    year = {2015}
    }
  • ICML 2015
    On the Relationship between Sum-Product Networks and Bayesian Networks
    Zhao, Han and Melibari, Mazen and Poupart, Pascal
    [ paper | supplemental | bibtex | abstract ]
    
              In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, we introduce the notion of normal SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound of the tree-width of its corresponding BN.
          
    
              @inproceedings{zhao2015relationship,
author = {Zhao, Han and Melibari, Mazen and Poupart, Pascal},
title = {On the Relationship between Sum-Product Networks and Bayesian Networks},
booktitle = {{ICML}},
pages = {116--124},
year = {2015}
}
  • AISTATS 2015
    On Theoretical Properties of Sum-Product Networks
    Peharz, Robert and Tschiatschek, Sebastian and Pernkopf, Franz and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a promising avenue for probabilistic modeling and have been successfully applied to various tasks. However, some theoretic properties about SPNs are not yet well understood. In this paper we fill some gaps in the theoretic foundation of SPNs. First, we show that the weights of any complete and consistent SPN can be transformed into locally normalized weights without changing the SPN distribution. Second, we show that consistent SPNs cannot model distributions significantly (exponentially) more compactly than decomposable SPNs. As a third contribution, we extend the inference mechanisms known for SPNs with finite states to generalized SPNs with arbitrary input distributions.
          
    
              @inproceedings{peharz2015theoretical,
author = {Peharz, Robert and Tschiatschek, Sebastian and Pernkopf, Franz and Domingos, Pedro},
title = {On Theoretical Properties of Sum-Product Networks},
booktitle = {{AISTATS}},
pages = {744--752},
year = {2015}
}
  • LTPM Workshop 2014
    Learning Selective Sum-Product Networks
    Peharz, Robert and Gens, Robert and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              We consider the selectivity constraint on the structure of sum-product networks (SPNs), which allows each sum node to have at most one child with non-zero output for each possible input. This allows us to find globally optimal maximum likelihood parameters in closed form. Although being a constrained class of SPNs, these models still strictly generalize classical graphical models such as Bayesian networks. Closed form parameter estimation opens the door for structure learning using a principled scoring function, trading off training likelihood and model complexity. In experiments we show that these models are easy to learn and compete well with state of the art.
          
    
              @inproceedings{peharz2014learning,
author = {Peharz, Robert and Gens, Robert and Domingos, Pedro},
title = {Learning Selective Sum-Product Networks},
booktitle = {LTPM Workshop},
year = {2014}
}
  • ICML 2014
    Learning Sum-Product Networks with Direct and Indirect Interactions
    Rooshenas, Amirmohammad and Lowd, Daniel
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a deep probabilistic representation that allows for efficient, exact inference. SPNs generalize many other tractable models, including thin junction trees, latent tree models, and many types of mixtures. Previous work on learning SPN structure has mainly focused on using top-down or bottom-up clustering to find mixtures, which capture variable interactions indirectly through implicit latent variables. In contrast, most work on learning graphical models, thin junction trees, and arithmetic circuits has focused on finding direct interactions among variables. In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches. In experiments on 20 benchmark datasets, we find that the combination of direct and indirect interactions leads to significantly better accuracy than several state-of-the-art algorithms for learning SPNs and other tractable models.
          
    
              @inproceedings{rooshenas2014learning,
author = {Rooshenas, Amirmohammad and Lowd, Daniel},
title = {Learning Sum-Product Networks with Direct and Indirect Interactions},
booktitle = {{ICML}},
year = {2014}
}
  • AISTATS 2013
    Learning Markov Networks With Arithmetic Circuits
    Lowd, Daniel and Rooshenas, Amirmohammad
    [ paper | bibtex | abstract ]
    
              Markov networks are an effective way to represent complex probability distributions. However, learning their structure and parameters or using them to answer queries is typically intractable. One approach to making learning and inference tractable is to use approximations, such as pseudo-likelihood or approximate inference. An alternate approach is to use a restricted class of models where exact inference is always efficient. Previous work has explored low treewidth models, models with tree-structured features, and latent variable models. In this paper, we introduce ACMN, the first ever method for learning efficient Markov networks with arbitrary conjunctive features. The secret to ACMN’s greater flexibility is its use of arithmetic circuits, a linear-time inference representation that can handle many high treewidth models by exploiting local structure. ACMN uses the size of the corresponding arithmetic circuit as a learning bias, allowing it to trade off accuracy and inference complexity. In experiments on 12 standard datasets, the tractable models learned by ACMN are more accurate than both tractable models learned by other algorithms and approximate inference in intractable models.
          
    
              @inproceedings{lowd2013learning,
    author = {Lowd, Daniel and Rooshenas, Amirmohammad},
    title = {Learning Markov Networks With Arithmetic Circuits},
    booktitle = {{AISTATS}},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {31},
    pages = {406--414},
    publisher = {JMLR.org},
    year = {2013}
    }
  • NIPS 2010
    Approximate Inference by Compilation to Arithmetic Circuits
    Lowd, Daniel and Domingos, Pedro M.
    [ paper | bibtex | abstract ]
    
              Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2-F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2-G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines.
          
    
              @inproceedings{lowd2010approximate,
    author = {Lowd, Daniel and Domingos, Pedro M.},
    title = {Approximate Inference by Compilation to Arithmetic Circuits},
    booktitle = {{NIPS}},
    pages = {1477--1485},
    publisher = {Curran Associates Inc.},
    year = {2010}
    }
  • UAI 2008
    Learning Arithmetic Circuits
    Lowd, Daniel and Domingos, Pedro M.
    [ paper | bibtex | abstract ]
    
              Graphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we learn arithmetic circuits with a penalty on the number of edges in the circuit (in which the cost of inference is linear). Our algorithm is equivalent to learning a Bayesian network with context-specific independence by greedily splitting conditional distributions, at each step scoring the candidates by compiling the resulting network into an arithmetic circuit, and using its size as the penalty. We show how this can be done efficiently, without compiling a circuit from scratch for each candidate. Experiments on several real-world domains show that our algorithm is able to learn tractable models with very large treewidth, and yields more accurate predictions than a standard context-specific Bayesian network learner, in far less time.
          
    
              @inproceedings{lowd2008learning,
    author = {Lowd, Daniel and Domingos, Pedro M.},
    title = {Learning Arithmetic Circuits},
    booktitle = {{UAI}},
    pages = {383--392},
    publisher = {{AUAI} Press},
    year = {2008}
    }
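    The entries above repeatedly use the fact that once a model is compiled to an arithmetic circuit, answering a query is a single bottom-up pass, linear in the number of edges. Below is a minimal Python sketch of that pass (a hypothetical node representation, not code from any paper listed here); evidence is asserted by zeroing indicator leaves, and leaving all indicators at 1 yields the partition function.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        kind: str                       # "leaf", "sum", or "product"
        children: List["Node"] = field(default_factory=list)
        value: float = 1.0              # parameter or indicator value at leaves

    def evaluate(root: Node) -> float:
        """Bottom-up circuit evaluation; memoization visits each edge once."""
        cache = {}
        def walk(n: Node) -> float:
            if id(n) in cache:
                return cache[id(n)]
            if n.kind == "leaf":
                v = n.value
            elif n.kind == "sum":
                v = sum(walk(c) for c in n.children)
            else:  # product
                v = 1.0
                for c in n.children:
                    v *= walk(c)
            cache[id(n)] = v
            return v
        return walk(root)

    # Tiny circuit encoding Pr(A) with Pr(A = a) = 0.3.
    ind_a, ind_na = Node("leaf", value=1.0), Node("leaf", value=1.0)
    th_a, th_na = Node("leaf", value=0.3), Node("leaf", value=0.7)
    root = Node("sum", [Node("product", [ind_a, th_a]),
                        Node("product", [ind_na, th_na])])
    ind_na.value = 0.0      # condition on A = a
    print(evaluate(root))   # 0.3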
  • Int. J. Approx. Reason. 2006
    Compiling relational Bayesian networks for exact inference
    Chavira, Mark and Darwiche, Adnan and Jaeger, Manfred
    [ paper | bibtex | abstract ]
    
              We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available Primula tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating and differentiating these circuits in time linear in their size. We report on experimental results showing successful compilation and efficient inference on relational Bayesian networks, whose Primula-generated propositional instances have thousands of variables, and whose jointrees have clusters with hundreds of variables.
          
    
              @article{chavira2006compiling,
    author = {Chavira, Mark and Darwiche, Adnan and Jaeger, Manfred},
    title = {Compiling relational Bayesian networks for exact inference},
    journal = {Int. J. Approx. Reason.},
    volume = {42},
    number = {1-2},
    pages = {4--20},
    year = {2006}
    }
Cutset networks [ cnets ]
  • arXiv 2020
    Don't Explain without Verifying Veracity: An Evaluation of Explainable AI with Video Activity Recognition
    Nourani, Mahsan and Roy, Chiradeep and Rahman, Tahrima and Ragan, Eric D. and Ruozzi, Nicholas and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Explainable machine learning and artificial intelligence models have been used to justify a model's decision-making process. This added transparency aims to help improve user performance and understanding of the underlying model. However, in practice, explainable systems face many open questions and challenges. Specifically, designers might reduce the complexity of deep learning models in order to provide interpretability. The explanations generated by these simplified models, however, might not accurately justify and be truthful to the model. This can further add confusion to the users as they might not find the explanations meaningful with respect to the model predictions. Understanding how these explanations affect user behavior is an ongoing challenge. In this paper, we explore how explanation veracity affects user performance and agreement in intelligent systems. Through a controlled user study with an explainable activity recognition system, we compare variations in explanation veracity for a video review and querying task. The results suggest that low veracity explanations significantly decrease user performance and agreement compared to both accurate explanations and a system without explanations. These findings demonstrate the importance of accurate and understandable explanations and caution that poor explanations can sometimes be worse than no explanations with respect to their effect on user performance and reliance on an AI system.
          
    
              @article{nourani2020dont,
    author = {Nourani, Mahsan and Roy, Chiradeep and Rahman, Tahrima and Ragan, Eric D. and Ruozzi, Nicholas and Gogate, Vibhav},
    title = {Don't Explain without Verifying Veracity: An Evaluation of Explainable
    {AI} with Video Activity Recognition},
    journal = {CoRR},
    volume = {abs/2005.02335},
    year = {2020}
    }
  • IJCAI 2019
    Cutset Bayesian Networks: A New Representation for Learning Rao-Blackwellised Graphical Models
    Rahman, Tahrima and Jin, Shasha and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Recently there has been growing interest in learning probabilistic models that admit poly-time inference, called tractable probabilistic models, from data. Although they generalize poorly as compared to intractable models, they often yield more accurate estimates at prediction time. In this paper, we seek to further explore this trade-off between generalization performance and inference accuracy by proposing a novel, partially tractable representation called cutset Bayesian networks (CBNs). The main idea in CBNs is to partition the variables into two subsets X and Y, learn an (intractable) Bayesian network that represents P(X) and a tractable conditional model that represents P(Y|X). The hope is that the intractable model will help improve generalization while the tractable model, by leveraging Rao-Blackwellised sampling which combines exact inference and sampling, will help improve the prediction accuracy. To compactly model P(Y|X), we introduce a novel tractable representation called conditional cutset networks (CCNs) in which all conditional probability distributions are represented using calibrated classifiers—classifiers which typically yield higher quality probability estimates than conventional classifiers. We show via a rigorous experimental evaluation that CBNs and CCNs yield more accurate posterior estimates than their tractable as well as intractable counterparts.
          
    
              @inproceedings{rahman2019cutset,
    author = {Rahman, Tahrima and Jin, Shasha and Gogate, Vibhav},
    title = {Cutset Bayesian Networks: {A} New Representation for Learning Rao-Blackwellised
    Graphical Models},
    booktitle = {{IJCAI}},
    pages = {5751--5757},
    publisher = {ijcai.org},
    year = {2019}
    }
  • Thesis
    Scalable Learning Approaches for Sum-Product-Cutset Networks
    Rahman, Tahrima
    [ paper | bibtex | abstract ]
    
              Tractable models are a subclass of probabilistic graphical models (PGMs) in which exact inference can be performed tractably – a very desirable property missing from arbitrary PGMs like Bayesian and Markov networks – exact inference is a #P-complete problem in general PGMs. Thin junction trees, mixtures of trees, tractable Markov networks, and sum-product networks are some of the well-studied and popular tractable PGMs deployed in many practical application domains. In spite of their success in performing fast exact inference, these models often suffer from low expressivity and very slow learning time. Thus inventing new and improved tractable PGMs as well as developing efficient algorithms for learning them from data has become one of the core research topics in unsupervised density estimation using probabilistic graphical models. In this dissertation, we propose cutset networks (CNs) – a new class of tractable probabilistic models for representing multidimensional joint distributions. CNs are rooted OR search trees, in which each internal node (OR node) represents conditioning of a variable in the model, with tree distributions (tree Bayesian or Markov networks) at the leaves – a simple tractable PGM that can be learned in polynomial time using the Chow-Liu algorithm. Leveraging a vast amount of research on decision tree induction, we present efficient algorithms for learning CNs and their ensembles from data. Sum-product networks (SPNs) have recently emerged as deep tractable probabilistic architectures which combine latent variable mixtures and model decomposition in an alternating fashion with univariate distributions at the leaves and have shown impressive performances in several important application domains including computer vision and natural language processing. Inspired by the expressive power of SPNs and the efficient learnability of CNs, we combine the two and propose sum-product-cutset networks (SPCNs). Our experimental evaluations on a large variety of high-dimensional real world benchmark datasets proved that SPCNs are not only faster to learn than SPNs but also superior to CNs and SPNs in accurately modeling the underlying distribution. We proposed several novel and efficient post-processing optimization approaches like pruning and merging which not only reduce the variance of SPCNs that overfit to the training data but also increase the speed of prediction by learning a more compact model. Finally, we propose hybrid SPCNs (HSPCNs) – SPCNs with the ability to model distributions over continuous domains. HSPCNs model the sum-product space using discrete variables (latent or observed) and use tree structured conditional linear Gaussians to model distributions at the leaves with continuous variables. Most real world application domains contain both discrete and continuous variables, and HSPCNs enable an application designer to develop rich density estimators in such domains.
          
    
              @phdthesis{rahman2016scalable, 
    title={Scalable Learning Approaches for Sum-Product-Cutset Networks},
    author={Rahman, Tahrima},
    year={2016}
    }
  • ICML 2019
    Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation
    Rahman, Tahrima and Jin, Shasha and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Tractable probabilistic models obviate the need for unreliable approximate inference approaches and as a result often yield accurate query answers in practice. However, most tractable models that achieve state-of-the-art generalization performance (measured using test set likelihood score) use latent variables. Such models admit poly-time marginal (MAR) inference but do not admit poly-time (full) maximum-a-posteriori (MAP) inference. To address this problem, in this paper, we propose a novel approach for inducing cutset networks, a well-known tractable, highly interpretable representation that does not use latent variables and admits linear time MAR as well as MAP inference. Our approach addresses a major limitation of existing techniques that learn cutset networks from data in that their accuracy is quite low as compared to latent variable models such as ensembles of cutset networks and sum-product networks. The key idea in our approach is to construct deep cutset networks by not only learning them from data but also compiling them from a more accurate latent tractable model. We show experimentally that our new approach yields more accurate MAP estimates as compared with existing approaches and significantly improves the test set log-likelihood score of cutset networks bringing them closer in terms of generalization performance to latent variable models.
          
    
              @inproceedings{rahman2019look,
    author = {Rahman, Tahrima and Jin, Shasha and Gogate, Vibhav},
    title = {Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {97},
    pages = {5311--5320},
    publisher = {{PMLR}},
    year = {2019}
    }
  • AI*IA 2018
    Extremely Randomized CNets for Multi-label Classification
    Basile, Teresa Maria Altomare and Di Mauro, Nicola and Esposito, Floriana
    [ paper | bibtex | abstract ]
    
              Multi-label classification (MLC) is a challenging task in machine learning consisting in the prediction of multiple labels associated with a single instance. Promising approaches for MLC are those able to capture label dependencies by learning a single probabilistic model—differently from other competitive approaches that require learning many models. The model is then exploited to compute the most probable label configuration given the observed attributes. Cutset Networks (CNets) are density estimators leveraging context-specific independencies providing exact inference in polynomial time. The recently introduced Extremely Randomized CNets (XCNets) reduce the structure learning complexity, making it possible to learn ensembles of XCNets that outperform state-of-the-art density estimators. In this paper we employ XCNets for MLC by exploiting efficient Most Probable Explanations (MPE). An experimental evaluation on real-world datasets shows how the proposed approach is competitive w.r.t. other sophisticated methods for MLC.
          
    
              @inproceedings{basile2018extremely,
author = {Basile, Teresa Maria Altomare and Di Mauro, Nicola and Esposito, Floriana},
    title = {Extremely Randomized CNets for Multi-label Classification},
    booktitle = {AI*IA},
    series = {Lecture Notes in Computer Science},
    volume = {11298},
    pages = {334--347},
    publisher = {Springer},
    year = {2018}
    }
  • ECML/PKDD 2017
    Fast and Accurate Density Estimation with Extremely Randomized Cutset Networks
    Di Mauro, Nicola and Vergari, Antonio and Basile, Teresa Maria Altomare and Esposito, Floriana
    [ paper | code | bibtex | abstract ]
    
              Cutset Networks (CNets) are density estimators leveraging context-specific independencies recently introduced to provide exact inference in polynomial time. Learning a CNet is done by firstly building a weighted probabilistic OR tree and then estimating tractable distributions as its leaves. Specifically, selecting an optimal OR split node requires cubic time in the number of the data features, and even approximate heuristics still scale in quadratic time. We introduce Extremely Randomized Cutset Networks (XCNets), CNets whose OR tree is learned by performing random conditioning. This simple yet surprisingly effective approach reduces the complexity of OR node selection to constant time. While the likelihood of an XCNet is slightly worse than an optimally learned CNet, ensembles of XCNets outperform state-of-the-art density estimators on a series of standard benchmark datasets, yet employing only a fraction of the time needed to learn the competitors. Code and data related to this chapter are available at: https://github.com/nicoladimauro/cnet.
          
    
              @inproceedings{dimauro2017fast,
author = {Di Mauro, Nicola and Vergari, Antonio and Basile, Teresa Maria Altomare and Esposito, Floriana},
    title = {Fast and Accurate Density Estimation with Extremely Randomized Cutset
    Networks},
    booktitle = {{ECML/PKDD} {(1)}},
    series = {Lecture Notes in Computer Science},
    volume = {10534},
    pages = {203--219},
    publisher = {Springer},
    year = {2017}
    }
  • Probabilistic Graphical Models 2016
    Multi-Label Classification with Cutset Networks
    Di Mauro, Nicola and Vergari, Antonio and Esposito, Floriana
    [ paper | bibtex | abstract ]
    
              In this work, we tackle the problem of Multi-Label Classification (MLC) by using Cutset Networks (CNets), weighted probabilistic model trees, recently proposed as tractable probabilistic models for discrete distributions. We employ CNets to perform Most Probable Explanation (MPE) inference exactly and efficiently and we improve a state-of-the-art structure learning algorithm for CNets by explicitly taking advantage of label dependencies. We achieve this by forcing the tree inner nodes to represent only feature variables and by exploiting structural heuristics while learning the leaf models. A thorough experimental evaluation on ten real-world datasets shows how the proposed approach improves several metrics for MLC, proving it to be competitive with problem transformation methods like classifier chains.
          
    
              @inproceedings{dimauro2016multi-label,
    author = {Di Mauro, Nicola and Vergari, Antonio and Esposito, Floriana},
    title = {Multi-Label Classification with Cutset Networks},
    booktitle = {Probabilistic Graphical Models},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {52},
    pages = {147--158},
    publisher = {JMLR.org},
    year = {2016}
    }
  • AAAI 2016
    Learning Ensembles of Cutset Networks
    Rahman, Tahrima and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Cutset networks — OR (decision) trees that have Bayesian networks whose treewidth is bounded by one at each leaf — are a new class of tractable probabilistic models that admit fast, polynomial-time inference and learning algorithms. This is unlike other state-of-the-art tractable models such as thin junction trees, arithmetic circuits and sum-product networks in which inference is fast and efficient but learning can be notoriously slow. In this paper, we take advantage of this unique property to develop fast algorithms for learning ensembles of cutset networks. Specifically, we consider generalized additive mixtures of cutset networks and develop sequential boosting-based and parallel bagging-based approaches for learning them from data. We demonstrate, via a thorough experimental evaluation, that our new algorithms are superior to competing approaches in terms of test-set log-likelihood score and learning time.
          
    
              @inproceedings{rahman2016learning,
    author = {Rahman, Tahrima and Gogate, Vibhav},
    title = {Learning Ensembles of Cutset Networks},
    booktitle = {{AAAI}},
    pages = {3301--3307},
    publisher = {{AAAI} Press},
    year = {2016}
    }
  • UAI 2016
    Merging Strategies for Sum-Product Networks: From Trees to Graphs
    Rahman, Tahrima and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Learning the structure of sum-product networks (SPNs) - arithmetic circuits over latent and observed variables - has been the subject of much recent research. These networks admit linear time exact inference, and thus help alleviate one of the chief disadvantages of probabilistic graphical models: accurate probabilistic inference algorithms are often computationally expensive. Although algorithms for inducing their structure from data have come quite far and often outperform algorithms that induce probabilistic graphical models, a key issue with existing approaches is that they induce tree SPNs, a small, inefficient sub-class of SPNs. In this paper, we address this limitation by developing post-processing approaches that induce graph SPNs from tree SPNs by merging similar sub-structures. The key benefits of graph SPNs over tree SPNs include smaller computational complexity which facilitates faster online inference, and better generalization accuracy because of reduced variance, at the cost of a slight increase in the learning time. We demonstrate experimentally that our merging techniques significantly improve the accuracy of tree SPNs, achieving state-of-the-art performance on several real world benchmark datasets.
          
    
              @inproceedings{rahman2016merging,
    author = {Rahman, Tahrima and Gogate, Vibhav},
    title = {Merging Strategies for Sum-Product Networks: From Trees to Graphs},
    booktitle = {{UAI}},
    publisher = {{AUAI} Press},
    year = {2016}
    }
  • AI*IA 2015
    Learning Accurate Cutset Networks by Exploiting Decomposability
    Di Mauro, Nicola and Vergari, Antonio and Esposito, Floriana
    [ paper | code | bibtex | abstract ]
    
              The rising interest around tractable Probabilistic Graphical Models is due to the guarantees on inference feasibility they provide. Among them, Cutset Networks (CNets) have recently been introduced as models embedding Pearl’s cutset conditioning algorithm in the form of weighted probabilistic model trees with tree-structured models as leaves. Learning the structure of CNets has been tackled as a greedy search leveraging heuristics from decision tree learning. Even if efficient, the learned models are far from being accurate in terms of likelihood. Here, we exploit the decomposable score of CNets to learn their structure and parameters by directly maximizing the likelihood, including the BIC criterion and informative priors on smoothing parameters. In addition, we show how to create mixtures of CNets by adopting a well known bagging method from the discriminative framework as an effective and cheap alternative to the classical EM. We compare our algorithms against the original variants on a set of standard benchmarks for graphical model structure learning, empirically proving our claims.
          
    
              @inproceedings{dimauro2015learning,
    author = {Di Mauro, Nicola and Vergari, Antonio and Esposito, Floriana},
    title = {Learning Accurate Cutset Networks by Exploiting Decomposability},
    booktitle = {AI*IA},
    series = {Lecture Notes in Computer Science},
    volume = {9336},
    pages = {221--232},
    publisher = {Springer},
    year = {2015}
    }
  • ECML/PKDD 2014
    Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees
    Rahman, Tahrima and Kothalkar, Prasanna and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              In this paper, we present cutset networks, a new tractable probabilistic model for representing multi-dimensional discrete distributions. Cutset networks are rooted OR search trees, in which each OR node represents conditioning of a variable in the model, with tree Bayesian networks (Chow-Liu trees) at the leaves. From an inference point of view, cutset networks model the mechanics of Pearl’s cutset conditioning algorithm, a popular exact inference method for probabilistic graphical models. We present efficient algorithms, which leverage and adopt a vast amount of research on decision tree induction for learning cutset networks from data. We also present an expectation-maximization (EM) algorithm for learning mixtures of cutset networks. Our experiments on a wide variety of benchmark datasets clearly demonstrate that compared to approaches for learning other tractable models such as thin-junction trees, latent tree models, arithmetic circuits and sum-product networks, our approach is significantly more scalable, and provides similar or better accuracy.
          
    
              @inproceedings{rahman2014cutset,
    author = {Rahman, Tahrima and Kothalkar, Prasanna and Gogate, Vibhav},
    title = {Cutset Networks: {A} Simple, Tractable, and Scalable Approach for
    Improving the Accuracy of Chow-Liu Trees},
    booktitle = {{ECML/PKDD} {(2)}},
    series = {Lecture Notes in Computer Science},
    volume = {8725},
    pages = {630--645},
    publisher = {Springer},
    year = {2014}
    }
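    To make the structure shared by the entries in this section concrete: a cutset network is a rooted OR tree whose internal nodes each condition on one variable with branch weights, and whose leaves hold tractable models (Chow-Liu trees in the papers) over the remaining variables. The minimal Python sketch below uses hypothetical names and bare callables as stand-ins for the leaf tree models; evaluating a complete assignment follows a single root-to-leaf path, hence the linear-time inference the abstracts cite.

    from typing import Callable, Dict, List, Union

    Assignment = Dict[str, int]
    Leaf = Callable[[Assignment], float]

    class OrNode:
        def __init__(self, var: str, weights: List[float],
                     children: List[Union["OrNode", Leaf]]):
            self.var, self.weights, self.children = var, weights, children

    def prob(node, x: Assignment) -> float:
        # OR node: weight the branch selected by x; leaf: delegate to the
        # tractable model over the remaining variables.
        if isinstance(node, OrNode):
            v = x[node.var]
            return node.weights[v] * prob(node.children[v], x)
        return node(x)

    # Toy model: condition on binary A; each leaf is a distribution over B
    # (a degenerate one-variable "Chow-Liu tree").
    leaf0 = lambda x: [0.9, 0.1][x["B"]]
    leaf1 = lambda x: [0.2, 0.8][x["B"]]
    cn = OrNode("A", [0.6, 0.4], [leaf0, leaf1])

    print(prob(cn, {"A": 1, "B": 1}))   # 0.4 * 0.8 = 0.32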
Sum-Product networks [ spns ]
  • PLDI 2021
    SPPL: Probabilistic Programming With Fast Exact Symbolic Inference
    Saad, Feras A. and Rinard, Martin C. and Mansinghka, Vikash K.
    [ paper | code | bibtex | abstract ]
    
              We present the Sum-Product Probabilistic Language (SPPL), a new probabilistic programming language that automatically delivers exact solutions to a broad range of probabilistic inference queries. SPPL translates probabilistic programs into sum-product expressions, a new symbolic representation and associated semantic domain that extends standard sum-product networks to support mixed-type distributions, numeric transformations, logical formulas, and pointwise and set-valued constraints. We formalize SPPL via a novel translation strategy from probabilistic programs to sum-product expressions and give sound exact algorithms for conditioning on and computing probabilities of events. SPPL imposes a collection of restrictions on probabilistic programs to ensure they can be translated into sum-product expressions, which allow the system to leverage new techniques for improving the scalability of translation and inference by automatically exploiting probabilistic structure. We implement a prototype of SPPL with a modular architecture and evaluate it on benchmarks the system targets, showing that it obtains up to 3500x speedups over state-of-the-art symbolic systems on tasks such as verifying the fairness of decision tree classifiers, smoothing hidden Markov models, conditioning transformed random variables, and computing rare event probabilities.
          
    
              @inproceedings{saad2021sppl,
    title = {{SPPL:} Probabilistic Programming with Fast Exact Symbolic Inference},
    author = {Saad, Feras A. and Rinard, Martin C. and Mansinghka, Vikash K.},
    booktitle = {Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation},
    pages = {804--819},
    numpages = {16},
    year = {2021},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    doi = {10.1145/3453483.3454078},
    isbn = {9781450383912},
    keywords = {probabilistic programming, symbolic execution},
    location = {Virtual, Canada},
    }
  • Thesis
    Sum-Product Networks for Complex Modelling Scenarios
    Trapp, Martin
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are flexible general-purpose probabilistic models that have received increasing attention due to their attractive inference properties. Even though there exists a large body of work on parameter and structure learning in SPNs, many of the existing approaches focus on rather simple modelling scenarios. For example, in the case of discriminative parameter learning, the labelled training examples are assumed to be abundant, and we generally consider SPNs to be defined only over a finite set of random variables. Moreover, most approaches to construct SPNs in a data-agnostic way rely on heuristic and ad-hoc strategies rather than proposing a principled solution. In this thesis, we examine SPNs for complex modelling scenarios. We are particularly interested in: i) principled semi-supervised parameter learning in SPNs, which guarantees that the learner cannot deteriorate in performance when adding additional unlabelled data, ii) principled structure learning in SPNs that is mathematically sound, protects us from overfitting and enables learning under missing data, and iii) extending the framework of SPNs to model possibly infinitely many random variables, and thus, establishing SPNs as a stochastic process model. As a first main contribution, we introduce an extension of the contrastive pessimistic likelihood for safe semi-supervised parameter learning in SPNs. Our approach is the first semi-supervised learning technique for SPNs, and often obtains a performance that is similar to an SPN trained on a fully labelled dataset. We first derive an objective for generative learning and later extend the approach to discriminative parameter learning. Lastly, we show empirical evidence that safe semi-supervised SPNs perform favourably compared to existing semi-supervised techniques on various classification tasks. The second main contribution of this thesis is the introduction of principled structure learning in SPNs. While there exists a large body of work on structure learning, none of the approaches asks either of the two essential questions: “What is a good structure?” or “What is a principle to derive a good structure?”. We aim to change this practice and introduce a sound, Bayesian formulation for joint parameter and structure learning in SPNs. Our experiments show that this principled approach competes well with the prior art and that we gain several benefits, such as automatic protection against overfitting, robustness under missing data and a natural extension to nonparametric formulations. As a third main contribution, we introduce deep structured mixtures of Gaussian processes, which combine tractable inference in SPNs with exact posterior inference in Gaussian processes. Our approach directly extends SPNs to the stochastic process case by equipping SPNs with Gaussian measures, which correspond to Gaussian processes, as leaves. We show that the resulting model allows a natural interpretation as exact Bayesian model averaging over a rich collection of naive-local expert models. In a series of experiments, we show that the proposed technique outperforms existing expert-based approaches and provides low approximation errors when used as an approximation to a Gaussian process. In addition to the main contributions, we show that gradient-based optimisation in overparameterised SPNs results in intrinsic acceleration effects, which depend directly on the depth of the network. Furthermore, we introduce two formulations for nonparametric SPNs and discuss their advantages and limitations.
          
    
              @phdthesis{trapp2020thesis, 
    title={Sum-Product Networks for Complex Modelling Scenarios},
    author={Trapp, Martin},
    year={2020},
school={Graz University of Technology}
    }
  • NeurIPS 2020
    Probabilistic Circuits for Variational Inference in Discrete Graphical Models
    Shih, Andy and Ermon, Stefano
    [ paper | bibtex | abstract ]
    
              Inference in discrete graphical models with variational methods is difficult because of the inability to re-parameterize gradients of the Evidence Lower Bound (ELBO). Many sampling-based methods have been proposed for estimating these gradients, but they suffer from high bias or variance. In this paper, we propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN), to compute ELBO gradients exactly (without sampling) for a certain class of densities. In particular, we show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically. To scale to graphical models with thousands of variables, we develop an efficient and effective construction of selective-SPNs with size O(kn), where n is the number of variables and k is an adjustable hyperparameter. We demonstrate our approach on three types of graphical models -- Ising models, Latent Dirichlet Allocation, and factor graphs from the UAI Inference Competition. Selective-SPNs give a better lower bound than mean-field and structured mean-field, and are competitive with approximations that do not provide a lower bound, such as Loopy Belief Propagation and Tree-Reweighted Belief Propagation. Our results show that probabilistic circuits are promising tools for variational inference in discrete graphical models as they combine tractability and expressivity.
          
    
              @inproceedings{shih2020probabilistic,
    author = {Shih, Andy and Ermon, Stefano},
    title = {Probabilistic Circuits for Variational Inference in Discrete Graphical
    Models},
    booktitle = {NeurIPS},
    year = {2020}
    }
  • AISTATS 2020
    Deep Structured Mixtures of Gaussian Processes
    Trapp, Martin and Peharz, Robert and Pernkopf, Franz and Rasmussen, Carl E.
    [ paper | bibtex | abstract ]
    
              Gaussian Processes (GPs) are powerful non-parametric Bayesian regression models that allow exact posterior inference, but exhibit high computational and memory costs. In order to improve scalability of GPs, approximate posterior inference is frequently employed, where a prominent class of approximation techniques is based on local GP experts. However, local-expert techniques proposed so far are either not well-principled, come with limited approximation guarantees, or lead to intractable models. In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii) when used as GP approximation, captures predictive uncertainties consistently better than previous expert-based approximations. In a variety of experiments, we show that deep structured mixtures have a low approximation error and often perform competitively with or outperform prior work.
          
    
              @inproceedings{trapp2020dsmgp,
    author = {Trapp, Martin and Peharz, Robert and Pernkopf, Franz and Rasmussen, Carl E.},
    title = {Deep Structured Mixtures of Gaussian Processes},
    booktitle = {AISTATS},
    year = {2020}
    }
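    The exact-posterior claim in the entry above rests on a simple fact: for a mixture over GP leaves, the posterior is again a mixture of the leaf posteriors, with weights reweighted by each leaf's marginal likelihood. Below is a minimal sketch of that weight update for a flat mixture of three GP experts; hyperparameters and helper names are made up for illustration, and a DSMGP arranges many such experts in a sum-product structure rather than a flat mixture.

      import numpy as np
      from scipy.stats import multivariate_normal

      def rbf(X1, X2, ls, var=1.0):
          d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
          return var * np.exp(-0.5 * d2 / ls**2)

      def gp_log_marglik(X, y, ls, noise=0.01):
          # log evidence of a zero-mean GP with RBF kernel and Gaussian noise
          K = rbf(X, X, ls) + noise * np.eye(len(X))
          return multivariate_normal(mean=np.zeros(len(X)), cov=K).logpdf(y)

      rng = np.random.default_rng(1)
      X = rng.uniform(-3, 3, size=(30, 1))
      y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

      lengthscales = [0.3, 1.0, 3.0]                   # one GP expert per value
      log_prior = np.log(np.full(3, 1.0 / 3.0))        # sum-node weights
      log_ml = np.array([gp_log_marglik(X, y, ls) for ls in lengthscales])
      log_post = log_prior + log_ml
      log_post -= np.logaddexp.reduce(log_post)        # exact posterior weights
      print("posterior expert weights:", np.exp(log_post))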
  • PGM 2020
    Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations
    Pevny, Tomas and Smidl, Vasek and Trapp, Martin and Polacek, Ondrej and Oberhuber, Tomas
    [ paper | bibtex | abstract ]
    
              In this work, we propose Sum-Product-Transform Networks (SPTN), an extension of sum-product networks that uses invertible transformations as additional internal nodes. The type and placement of transformations determine properties of the resulting SPTN with many interesting special cases. Importantly, SPTNs with Gaussian leaves and affine transformations keep tractable the same inference tasks that can be computed efficiently in SPNs. We propose to store affine transformations in their SVD decompositions using an efficient parametrization of unitary matrices by a set of Givens rotations. Last but not least, we demonstrate that G-SPTNs achieve state-of-the-art results on the density estimation task and are competitive with state-of-the-art methods for anomaly detection.
          
    
              @inproceedings{pevny2020sptn,
    author = {Pevny, Tomas and Smidl, Vasek and Trapp, Martin and Polacek, Ondrej and Oberhuber, Tomas},
    title = {Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations},
    booktitle = {PGM},
    year = {2020}
    }
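    The affine nodes above stay cheap precisely because of the SVD storage the abstract mentions: with W = U diag(s) V^T and U, V built from Givens rotations, log|det W| is just the sum of log singular values and W^{-1} is applied by transposed rotations. A small illustrative sketch; the function and parameter names are ours, not the paper's:

      import numpy as np

      def givens(n, i, j, theta):
          # n x n rotation in the (i, j) plane; its determinant is exactly 1
          G = np.eye(n)
          c, s = np.cos(theta), np.sin(theta)
          G[i, i] = c; G[j, j] = c; G[i, j] = -s; G[j, i] = s
          return G

      def affine_gaussian_logpdf(x, rots_u, rots_v, log_s, b):
          """log density of x = U diag(exp(log_s)) V^T z + b,  z ~ N(0, I)."""
          n = len(x)
          U = np.eye(n); V = np.eye(n)
          for (i, j, t) in rots_u: U = U @ givens(n, i, j, t)
          for (i, j, t) in rots_v: V = V @ givens(n, i, j, t)
          z = V @ ((U.T @ (x - b)) / np.exp(log_s))    # W^{-1}(x - b), no linear solve
          log_det = log_s.sum()                        # log|det W| for free
          return -0.5 * (z @ z + n * np.log(2 * np.pi)) - log_det

      x = np.array([0.5, -1.0])
      print(affine_gaussian_logpdf(x, [(0, 1, 0.3)], [(0, 1, -0.7)],
                                   np.array([0.1, -0.2]), np.zeros(2)))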
  • IEEE Trans. Neural Networks Learn. Syst. 2020
    Deep Model Compression and Inference Speedup of Sum-Product Networks on Tensor Trains
    Ko, Ching-Yun and Chen, Cong and He, Zhuolun and Zhang, Yuke and Batselier, Kim and Wong, Ngai
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) represent an emerging class of neural networks with clear probabilistic semantics and superior inference speed over graphical models. This work reveals a strikingly intimate connection between SPNs and tensor networks, thus leading to a highly efficient representation that we call tensor SPNs (tSPNs). For the first time, through mapping an SPN onto a tSPN and employing novel optimization techniques, we demonstrate remarkable parameter compression with negligible loss in accuracy.
          
    
              @article{ko2020deep,
    author = {Ko, Ching{-}Yun and Chen, Cong and He, Zhuolun and Zhang, Yuke and Batselier, Kim and Wong, Ngai},
    title = {Deep Model Compression and Inference Speedup of Sum-Product Networks
    on Tensor Trains},
    journal = {{IEEE} Trans. Neural Networks Learn. Syst.},
    volume = {31},
    number = {7},
    pages = {2665--2671},
    year = {2020}
    }
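    The SPN-to-tensor-network mapping above builds on the tensor-train format. As a self-contained illustration of the compression primitive (not the paper's actual pipeline), the following decomposes a dense tensor into TT cores by sequential truncated SVDs; lowering max_rank is what trades parameters for accuracy:

      import numpy as np

      def tensor_train(T, max_rank):
          """Split a d-way tensor into d TT cores via sequential truncated SVDs."""
          dims = T.shape
          cores, r = [], 1
          M = T.reshape(r * dims[0], -1)
          for k in range(len(dims) - 1):
              U, s, Vt = np.linalg.svd(M, full_matrices=False)
              rk = min(max_rank, len(s))               # truncation = compression
              cores.append(U[:, :rk].reshape(r, dims[k], rk))
              M = (s[:rk, None] * Vt[:rk]).reshape(rk * dims[k + 1], -1)
              r = rk
          cores.append(M.reshape(r, dims[-1], 1))
          return cores

      T = np.random.default_rng(2).random((2, 3, 4))
      cores = tensor_train(T, max_rank=8)              # ranks large enough => exact
      R = cores[0]
      for C in cores[1:]:
          R = np.einsum('...a,abc->...bc', R, C)       # contract the cores back
      print(np.allclose(R.squeeze(), T))               # True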
  • Int. J. Approx. Reason. 2020
    Discriminative training of feed-forward and recurrent sum-product networks by extended Baum-Welch
    Duan, Haonan and Rashwan, Abdullah and Poupart, Pascal and Chen, Zhitang
    [ paper | bibtex | abstract ]
    
              We present a discriminative learning algorithm for feed-forward Sum-Product Networks (SPNs) [42] and recurrent SPNs [31] based on the Extended Baum-Welch (EBW) algorithm [4]. We formulate the conditional data likelihood in the SPN framework as a rational function, and we use EBW to monotonically maximize it. We derive the algorithm for SPNs and RSPNs with both discrete and continuous variables. The experiments show that this algorithm performs better than both generative Expectation-Maximization, and discriminative gradient descent on a wide variety of applications. We also demonstrate the robustness of the algorithm in the case of missing features by comparing its performance to Support Vector Machines and Neural Networks.
          
    
              @article{duan2020discriminative,
    author = {Duan, Haonan and Rashwan, Abdullah and Poupart, Pascal and Chen, Zhitang},
    title = {Discriminative training of feed-forward and recurrent sum-product
    networks by extended Baum-Welch},
    journal = {Int. J. Approx. Reason.},
    volume = {124},
    pages = {66--81},
    year = {2020}
    }
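    For intuition, the Extended Baum-Welch reestimate that the entry above builds on has a compact form at a single sum node: multiply each weight by (its gradient plus a damping constant D) and renormalize, which keeps the weights on the simplex. A toy sketch of one such update with a made-up gradient; how D is chosen and how gradients flow through the circuit is the substance of the paper.

      import numpy as np

      def ebw_step(weights, grad, D):
          # grad = dF/dw at the current weights, for a rational objective F such
          # as the conditional likelihood; D must be large enough to keep the
          # numerator positive, which is what yields a monotone increase of F
          num = weights * (grad + D)
          return num / num.sum()

      w = np.array([0.5, 0.3, 0.2])
      grad = np.array([1.2, -0.4, 0.1])   # toy gradient, for illustration only
      print(ebw_step(w, grad, D=1.0))     # still a distribution over children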
  • arXiv 2020
    Sum-product networks: A survey
    Paris, Iago and Sanchez-Cauce, Raquel and Javier Diez, Francisco
    [ paper | bibtex | abstract ]
    
              A sum-product network (SPN) is a probabilistic model, based on a rooted acyclic directed graph, in which terminal nodes represent univariate probability distributions and non-terminal nodes represent convex combinations (weighted sums) and products of probability functions. They are closely related to probabilistic graphical models, in particular to Bayesian networks with multiple context-specific independencies. Their main advantage is the possibility of building tractable models from data, i.e., models that can perform several inference tasks in time proportional to the number of links in the graph. They are somewhat similar to neural networks and can address the same kinds of problems, such as image processing and natural language understanding. This paper offers a survey of SPNs, including their definition, the main algorithms for inference and learning from data, the main applications, a brief review of software libraries, and a comparison with related models.
          
    
              @article{paris2020spn,
    author = {Par{\'{i}}s, Iago and S{\'{a}}nchez{-}Cauce, Raquel and Javier D{\'{i}}ez, Francisco},
    title = {Sum-product networks: {A} survey},
    journal = {CoRR},
    volume = {abs/2004.01167},
    year = {2020}
    }
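    To make the survey's tractability claim concrete: one bottom-up pass over the graph answers a marginal query, simply by setting marginalized-out leaves to 1. A self-contained toy example with our own minimal classes, not any particular library's API:

      import numpy as np

      class Leaf:                          # Bernoulli leaf over one variable
          def __init__(self, var, p): self.var, self.p = var, p
          def value(self, x):              # x[var] is None => marginalized out
              v = x[self.var]
              return 1.0 if v is None else (self.p if v == 1 else 1.0 - self.p)

      class Product:
          def __init__(self, *ch): self.ch = ch
          def value(self, x): return float(np.prod([c.value(x) for c in self.ch]))

      class Sum:
          def __init__(self, ws, *ch): self.ws, self.ch = ws, ch
          def value(self, x): return sum(w * c.value(x) for w, c in zip(self.ws, self.ch))

      spn = Sum([0.3, 0.7],
                Product(Leaf(0, 0.9), Leaf(1, 0.2)),
                Product(Leaf(0, 0.1), Leaf(1, 0.6)))
      print(spn.value({0: 1, 1: 0}))       # joint P(X0=1, X1=0) = 0.244
      print(spn.value({0: 1, 1: None}))    # marginal P(X0=1)    = 0.34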
  • Machine Learning Journal (MLJ) 2020
    Sum-product graphical models
    Desana, Mattia and Schnoerr, Christoph
    [ paper | code | bibtex | abstract ]
    
              This paper introduces a probabilistic architecture called sum–product graphical model (SPGM). SPGMs represent a class of probability distributions that combines, for the first time, the semantics of probabilistic graphical models (GMs) with the evaluation efficiency of sum–product networks (SPNs): Like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, this approach provides new connections between the fields of SPNs and GMs, and enables a high-level interpretation of the family of distributions encoded by SPNs. We provide two applications of SPGMs in density estimation with empirical results close to or surpassing state-of-the-art models. The theoretical and practical results demonstrate that jointly exploiting properties of SPNs and GMs is an interesting direction of future research.
          
    
              @article{desana2020sum-product,
    author = {Desana, Mattia and Schnoerr, Christoph},
    title = {Sum-product graphical models},
    journal = {Mach. Learn.},
    volume = {109},
    number = {1},
    pages = {135--173},
    year = {2020}
    }
  • ICML 2020
    Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
    Peharz, Robert and Lang, Steven and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Van den Broeck, Guy and Kersting, Kristian and Ghahramani, Zoubin
    [ paper | bibtex | abstract ]
    
              Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent ``deep-learning-style'' implementations of PCs strive for a better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
          
    
              @inproceedings{peharz2020einsum,
    author = {Peharz, Robert and Lang, Steven and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Van den Broeck, Guy and Kersting, Kristian and Ghahramani, Zoubin},
    title = {Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
    Circuits},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {119},
    pages = {7563--7574},
    publisher = {{PMLR}},
    year = {2020}
    }
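    The "single monolithic einsum" above can be pictured with one layer: K left and K right input distributions are multiplied pairwise (a sum in log-space) and mixed into K_out outputs, all in one einsum call. A NumPy sketch with the usual log-sum-exp shift for numerical stability; shapes and names are ours, not the paper's code:

      import numpy as np

      def einsum_layer(logL, logR, W):
          # logL, logR: (batch, K) log-densities of the two child regions
          # W:          (K_out, K, K) nonnegative weights, each output slice sums to 1
          mL = logL.max(axis=1, keepdims=True); mR = logR.max(axis=1, keepdims=True)
          pL = np.exp(logL - mL); pR = np.exp(logR - mR)
          mixed = np.einsum('ojk,bj,bk->bo', W, pL, pR)  # products + weighted sums at once
          return np.log(mixed) + mL + mR                 # back to log-space

      rng = np.random.default_rng(3)
      K, K_out, batch = 4, 5, 2
      W = rng.random((K_out, K, K)); W /= W.sum(axis=(1, 2), keepdims=True)
      logL = np.log(rng.random((batch, K))); logR = np.log(rng.random((batch, K)))
      print(einsum_layer(logL, logR, W).shape)           # (2, 5)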
  • PGM 2020
    Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures
    Shao, Xiaoting and Molina, Alejandro and Vergari, Antonio and Stelzner, Karl and Peharz, Robert and Liebig, Thomas and Kersting, Kristian
    [ paper | bibtex | abstract ]
    
              Probabilistic graphical models are a central tool in AI; however, they are generally not as expressive as deep neural models, and inference is notoriously hard and slow. In contrast, deep probabilistic models such as sum-product networks (SPNs) capture joint distributions in a tractable fashion, but still lack the expressive power of intractable models based on deep neural networks. Therefore, we introduce conditional SPNs (CSPNs), conditional density estimators for multivariate and potentially hybrid domains which allow harnessing the expressive power of neural networks while still maintaining tractability guarantees. One way to implement CSPNs is to use an existing SPN structure and condition its parameters on the input, e.g., via a deep neural network. This approach, however, might misrepresent the conditional independence structure present in data. Consequently, we also develop a structure-learning approach that derives both the structure and parameters of CSPNs from data. Our experimental evidence demonstrates that CSPNs are competitive with other probabilistic models and yield superior performance on multilabel image classification compared to mean field and mixture density networks. Furthermore, they can successfully be employed as building blocks for structured probabilistic models, such as autoregressive image models.
          
    
              @inproceedings{shao2020conditional,
    author = {Shao, Xiaoting and Molina, Alejandro and Vergari, Antonio and Stelzner, Karl and Peharz, Robert and Liebig, Thomas and Kersting, Kristian},
    title = {Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic
    Architectures},
    booktitle = {PGM},
    series = {Proceedings of Machine Learning Research},
    year = {2020}
    }
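    The first CSPN variant above (keep a fixed circuit, let a neural network emit its parameters) is easy to picture: a small gating network maps the conditioning input to normalized sum-node weights. A minimal sketch for one sum node; layer sizes and names are arbitrary choices for illustration:

      import numpy as np

      def softmax(z):
          e = np.exp(z - z.max())
          return e / e.sum()

      def gating_weights(x, W1, b1, W2, b2):
          # tiny MLP: conditioning input x -> valid sum-node weights for this x,
          # so the SPN's parameters become functions of x while the circuit
          # itself (and hence tractability) is unchanged
          h = np.tanh(W1 @ x + b1)
          return softmax(W2 @ h + b2)

      rng = np.random.default_rng(4)
      x = rng.normal(size=3)                             # conditioning input
      W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
      W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)
      print(gating_weights(x, W1, b1, W2, b2))           # nonnegative, sums to 1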
  • Workshop on Tractable Probabilistic Modeling (ICML 2019)
    Optimisation of Overparametrized Sum-Product Networks
    Trapp, Martin and Peharz, Robert and Pernkopf, Franz
    [ paper | bibtex | abstract ]
    
              It seems to be a pearl of conventional wisdom that parameter learning in deep sum-product networks is surprisingly fast compared to shallow mixture models. This paper examines the effects of overparameterization in sum-product networks on the speed of parameter optimisation. Using theoretical analysis and empirical experiments, we show that deep sum-product networks exhibit an implicit acceleration compared to their shallow counterpart. In fact, gradient-based optimisation in deep tree-structured sum-product networks is equal to gradient ascent with adaptive and time-varying learning rates and additional momentum terms.
          
    
              @inproceedings{trapp2019overparam,
    author = {Trapp, Martin and Peharz, Robert and Pernkopf, Franz},
    title = {Optimisation of Overparametrized Sum-Product Networks},
    booktitle = {Workshop on Tractable Probabilistic Modeling (ICML 2019)},
    year = {2019}
    }
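    The implicit acceleration above admits a one-line caricature: overparameterize a single weight w as a product w = w_1 w_2 and follow gradient flow on (w_1, w_2); the collapsed weight then moves along the same gradient, but with an adaptive, weight-dependent rate (the discrete-time analysis and the momentum terms are in the paper):

      \dot{w}_1 = w_2 \frac{\partial \mathcal{L}}{\partial w}, \qquad
      \dot{w}_2 = w_1 \frac{\partial \mathcal{L}}{\partial w}
      \quad\Longrightarrow\quad
      \frac{\mathrm{d}}{\mathrm{d}t}(w_1 w_2) = \dot{w}_1 w_2 + w_1 \dot{w}_2
        = \left(w_1^2 + w_2^2\right) \frac{\partial \mathcal{L}}{\partial w}.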
  • ISIPTA 2019
    Robustness in Sum-Product Networks with Continuous and Categorical Data
    Wit, Rob and de Campos, Cassio P. and Conaty, Diarmaid and Martinez del Rincon, Jesus
    [ paper | bibtex | abstract ]
    
              Sum-product networks are a popular family of probabilistic graphical models for which marginal inference can be performed in polynomial time. After learning sum-product networks from scarce data, small variations of parameters could lead to different conclusions. We adapt the robustness measure created for categorical credal sum-product networks to domains with both continuous and categorical variables. We apply this approach to a real-world dataset of online purchases where the goal is to identify fraudulent cases. We empirically show that such credal models can better discriminate between easy and hard instances than simply using the probability of the most probable class.
          
    
              @inproceedings{wit2019robustness,
    author = {Wit, Rob and de Campos, Cassio P. and Conaty, Diarmaid and Mart{\'{i}}nez del Rinc{\'{o}}n, Jes{\'{u}}s},
    title = {Robustness in Sum-Product Networks with Continuous and Categorical
    Data},
    booktitle = {{ISIPTA}},
    series = {Proceedings of Machine Learning Research},
    volume = {103},
    pages = {156--158},
    publisher = {{PMLR}},
    year = {2019}
    }
  • ISIPTA 2019
    Robust Analysis of MAP Inference in Selective Sum-Product Networks
    Llerena, Julissa Giuliana Villanueva and Maua, Denis Deratani
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPN) are deep probabilistic models that have shown to achieve state-of-the-art performance in several machine learning tasks. As with many other probabilistic models, performing Maximum-A-Posteriori (MAP) inference is NP-hard in SPNs. A notable exception is selective SPNs, that constrain the network so as to allow MAP inference to be performed in linear time. Due to the high number of parameters, SPNs learned from data can produce unreliable and overconfident inference; this phenomenon can be partially mitigated by performing a Robustness Analysis of the model predictions to changes in the parameters. In this work, we address the problem of assessing the robustness of MAP inferences produced with Selective SPNs to global perturbations of the parameters. We present efficient algorithms and an empirical analysis with realistic problems.
          
    
              @inproceedings{llerena2019robust,
    author = {Llerena, Julissa Giuliana Villanueva and Mau{\'{a}}, Denis Deratani},
    title = {Robust Analysis of {MAP} Inference in Selective Sum-Product Networks},
    booktitle = {{ISIPTA}},
    series = {Proceedings of Machine Learning Research},
    volume = {103},
    pages = {430--440},
    publisher = {{PMLR}},
    year = {2019}
    }
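    The linear-time MAP routine that the entry above relies on is a Viterbi-style pass: replace every sum node by a weighted max and backtrack the maximizing children. For general SPNs this pass only gives a lower bound (MAP is NP-hard, as the abstract notes); selectivity is precisely the condition that makes it exact. A tiny sketch over nested tuples, with leaves already holding the best attainable value of their variable:

      import numpy as np

      def map_value(node):
          # nodes: ('leaf', p) / ('prod', children) / ('sum', weights, children)
          kind = node[0]
          if kind == 'leaf':
              return node[1]
          if kind == 'prod':
              return float(np.prod([map_value(c) for c in node[1]]))
          weights, children = node[1], node[2]           # sum node -> weighted max
          return max(w * map_value(c) for w, c in zip(weights, children))

      spn = ('sum', [0.6, 0.4],
             [('prod', [('leaf', 0.9), ('leaf', 0.2)]),
              ('prod', [('leaf', 0.1), ('leaf', 0.6)])])
      print(map_value(spn))   # 0.6 * 0.9 * 0.2 = 0.108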
  • NeurIPS 2019
    Bayesian Learning of Sum-Product Networks
    Trapp, Martin and Peharz, Robert and Ge, Hong and Pernkopf, Franz and Ghahramani, Zoubin
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: Even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc and based on intuition rather than a clear learning principle. In this paper, we introduce a well-principled Bayesian framework for SPN structure learning. First, we decompose the problem into i) laying out a computational graph, and ii) learning the so-called scope function over the graph. The first is rather unproblematic and akin to neural network architecture validation. The second represents the effective structure of the SPN and needs to respect the usual structural constraints in SPNs, i.e. completeness and decomposability. While representing and learning the scope function is somewhat involved in general, in this paper, we propose a natural parametrisation for an important and widely used special case of SPNs. These structural parameters are incorporated into a Bayesian model, such that simultaneous structure and parameter learning is cast into monolithic Bayesian posterior inference. In various experiments, our Bayesian SPNs often improve test likelihoods over greedy SPN learners. Further, since the Bayesian framework protects against overfitting, we can evaluate hyper-parameters directly on the Bayesian model score, waiving the need for a separate validation set, which is especially beneficial in low data regimes. Bayesian SPNs can be applied to heterogeneous domains and can easily be extended to nonparametric formulations. Moreover, our Bayesian approach is the first which consistently and robustly learns SPN structures under missing data.
          
    
              @inproceedings{trapp2019bayesian,
    author = {Trapp, Martin and Peharz, Robert and Ge, Hong and Pernkopf, Franz and Ghahramani, Zoubin},
    title = {Bayesian Learning of Sum-Product Networks},
    booktitle = {NeurIPS},
    pages = {6344--6355},
    year = {2019}
    }
  • UAI 2019
    Random Sum-Product Networks: A Simple and Effective Approach to Probabilistic Deep Learning
    Peharz, Robert and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Shao, Xiaoting and Kersting, Kristian and Ghahramani, Zoubin
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are expressive probabilistic models with a rich set of exact and efficient inference routines. However, in order to guarantee exact inference, they require specific structural constraints, which complicate learning SPNs from data. Thereby, most SPN structure learners proposed so far are tedious to tune, do not scale easily, and are not easily integrated with deep learning frameworks. In this paper, we follow a simple “deep learning” approach, by generating unspecialized random structures, scalable to millions of parameters, and subsequently applying GPU-based optimization. Somewhat surprisingly, our models often perform on par with state-of-the-art SPN structure learners and deep neural networks on a diverse range of generative and discriminative scenarios. At the same time, our models yield well-calibrated uncertainties, and stand out among most deep generative and discriminative models in being robust to missing features and being able to detect anomalies.
          
    
              @inproceedings{peharz2019random,
    author = {Peharz, Robert and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Shao, Xiaoting and Kersting, Kristian and Ghahramani, Zoubin},
    title = {Random Sum-Product Networks: {A} Simple and Effective Approach to
    Probabilistic Deep Learning},
    booktitle = {{UAI}},
    series = {Proceedings of Machine Learning Research},
    volume = {115},
    pages = {334--344},
    publisher = {{AUAI} Press},
    year = {2019}
    }
  • Machine Learning Journal (MLJ) 2019
    Visualizing and understanding Sum-Product Networks
    Vergari, Antonio and Di Mauro, Nicola and Esposito, Floriana
    [ paper | code | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are deep tractable probabilistic models by which several kinds of inference queries can be answered exactly and in a tractable time. They have been largely used as black box density estimators, assessed by comparing their likelihood scores on different tasks. In this paper we explore and exploit the inner representations learned by SPNs. By taking a closer look at the inner workings of SPNs, we aim to better understand what and how meaningful the representations they learn are, as in a classic Representation Learning framework. We firstly propose an interpretation of SPNs as Multi-Layer Perceptrons, we then devise several criteria to extract representations from SPNs and finally we empirically evaluate them in several (semi-)supervised tasks showing they are competitive against classical feature extractors like RBMs, DBNs and deep probabilistic autoencoders, like MADEs and VAEs.
          
    
              @article{vergari2019visualizing,
    author = {Vergari, Antonio and Di Mauro, Nicola and Esposito, Floriana},
    title = {Visualizing and understanding Sum-Product Networks},
    journal = {Mach. Learn.},
    volume = {108},
    number = {4},
    pages = {551--573},
    year = {2019}
    }
  • ICML 2019
    Hierarchical Decompositional Mixtures of Variational Autoencoders
    Tan, Ping Liang and Peharz, Robert
    [ paper | bibtex | abstract ]
    
              Variational autoencoders (VAEs) have received considerable attention, since they allow us to learn expressive neural density estimators effectively and efficiently. However, learning and inference in VAEs is still problematic due to the sensitive interplay between the generative model and the inference network. Since these problems become generally more severe in high dimensions, we propose a novel hierarchical mixture model over low-dimensional VAE experts. Our model decomposes the overall learning problem into many smaller problems, which are coordinated by the hierarchical mixture, represented by a sum-product network. In experiments we show that our models outperform classical VAEs on almost all of our experimental benchmarks. Moreover, we show that our model is highly data efficient and degrades very gracefully in extremely low data regimes.
          
    
              @inproceedings{tan2019hierarchical,
    author = {Tan, Ping Liang and Peharz, Robert},
    title = {Hierarchical Decompositional Mixtures of Variational Autoencoders},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {97},
    pages = {6115--6124},
    publisher = {{PMLR}},
    year = {2019}
    }
  • ICML 2019
    Faster Attend-Infer-Repeat with Tractable Probabilistic Models
    Stelzner, Karl and Peharz, Robert and Kersting, Kristian
    [ paper | bibtex | abstract ]
    
              The recent Attend-Infer-Repeat (AIR) framework marks a milestone in structured probabilistic modeling, as it tackles the challenging problem of unsupervised scene understanding via Bayesian inference. AIR expresses the composition of visual scenes from individual objects, and uses variational autoencoders to model the appearance of those objects. However, inference in the overall model is highly intractable, which hampers its learning speed and makes it prone to suboptimal solutions. In this paper, we show that the speed and robustness of learning in AIR can be considerably improved by replacing the intractable object representations with tractable probabilistic models. In particular, we opt for sum-product networks (SPNs), expressive deep probabilistic models with a rich set of tractable inference routines. The resulting model, called SuPAIR, learns an order of magnitude faster than AIR, treats object occlusions in a consistent manner, and allows for the inclusion of a background noise model, improving the robustness of Bayesian scene understanding.
          
    
              @inproceedings{stelzner2019faster,
    author = {Stelzner, Karl and Peharz, Robert and Kersting, Kristian},
    title = {Faster Attend-Infer-Repeat with Tractable Probabilistic Models},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {97},
    pages = {5966--5975},
    publisher = {{PMLR}},
    year = {2019}
    }
  • AAAI 2019
    Automatic Bayesian Density Analysis
    Vergari, Antonio and Molina, Alejandro and Peharz, Robert and Ghahramani, Zoubin and Kersting, Kristian and Valera, Isabel
    [ paper | bibtex | abstract ]
    
              Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.
          
    
              @inproceedings{vergari2019automatic,
    author = {Vergari, Antonio and Molina, Alejandro and Peharz, Robert and Ghahramani, Zoubin and Kersting, Kristian and Valera, Isabel},
    title = {Automatic Bayesian Density Analysis},
    booktitle = {{AAAI}},
    pages = {5207--5215},
    publisher = {{AAAI} Press},
    year = {2019}
    }
  • AAAI 2019
    Deep Convolutional Sum-Product Networks
    Butz, Cory J. and de S. Oliveira, Jhonatan and dos Santos, Andre E. and Teixeira, Andre L.
    [ paper | bibtex | abstract ]
    
              We give conditions under which convolutional neural networks (CNNs) define valid sum-product networks (SPNs). One subclass, called convolutional SPNs (CSPNs), can be implemented using tensors, but also can suffer from being too shallow. Fortunately, tensors can be augmented while maintaining valid SPNs. This yields a larger subclass of CNNs, which we call deep convolutional SPNs (DCSPNs), where the convolutional and sum-pooling layers form rich directed acyclic graph structures. One salient feature of DCSPNs is that they are a rigorous probabilistic model. As such, they can exploit multiple kinds of probabilistic reasoning, including marginal inference and most probable explanation (MPE) inference. This allows an alternative method for learning DCSPNs using vectorized differentiable MPE, which plays a similar role to the generator in generative adversarial networks (GANs). Image sampling is yet another application demonstrating the robustness of DCSPNs. Our preliminary results on image sampling are encouraging, since the DCSPN sampled images exhibit variability. Experiments on image completion show that DCSPNs significantly outperform competing methods by achieving several state-of-the-art mean squared error (MSE) scores in both left-completion and bottom-completion in benchmark datasets.
          
    
              @inproceedings{butz2019deep,
    author = {Butz, Cory J. and de S. Oliveira, Jhonatan and dos Santos, Andr{\'{e}} E. and Teixeira, Andr{\'{e}} L.},
    title = {Deep Convolutional Sum-Product Networks},
    booktitle = {{AAAI}},
    pages = {3248--3255},
    publisher = {{AAAI} Press},
    year = {2019}
    }
  • PhD Thesis 2016
    Scalable Learning Approaches for Sum-Product-Cutset Networks
    Rahman, Tahrima
    [ paper | bibtex | abstract ]
    
              Tractable models are a subclass of probabilistic graphical models (PGMs) in which exact inference can be performed tractably – a very desirable property missing from arbitrary PGMs like Bayesian and Markov networks – exact inference is a #P-complete problem in general PGMs. Thin junction trees, mixtures of trees, tractable Markov networks, and sum-product networks are some of the well-studied and popular tractable PGMs deployed in many practical application domains. In spite of their success in performing fast exact inference, these models often suffer from low expressivity and very slow learning time. Thus inventing new and improved tractable PGMs as well as developing efficient algorithms for learning them from data has become one of the core research topics in unsupervised density estimation using probabilistic graphical models. In this dissertation, we propose cutset networks (CNs) – a new class of tractable probabilistic models for representing multidimensional joint distributions. CNs are rooted OR search trees, in which each internal node (OR node) represents conditioning of a variable in the model, with tree distributions (tree Bayesian or Markov networks) at the leaves – a simple tractable PGM that can be learned in polynomial time using the Chow-Liu algorithm. Leveraging the vast amount of research on decision tree induction, we present efficient algorithms for learning CNs and their ensembles from data. Sum-product networks (SPNs) have recently emerged as deep tractable probabilistic architectures which combine latent variable mixtures and model decomposition in an alternating fashion with univariate distributions at the leaves and have shown impressive performances in several important application domains including computer vision and natural language processing. Inspired by the expressive power of SPNs and the efficient learnability of CNs, we combine the two and propose sum-product-cutset networks (SPCNs). Our experimental evaluations on a large variety of high-dimensional real world benchmark datasets proved that SPCNs are not only faster to learn than SPNs but also superior to CNs and SPNs in accurately modeling the underlying distribution. We proposed several novel and efficient post-processing optimization approaches like pruning and merging which not only reduce the variance of SPCNs that overfit to the training data but also increase the speed of prediction by learning a more compact model. Finally, we propose hybrid SPCNs (HSPCNs) – SPCNs with the ability to model distributions over continuous domains. HSPCNs model the sum-product space using discrete variables (latent or observed) and use tree structured conditional linear Gaussians to model distributions at the leaves with continuous variables. Most real world application domains contain both discrete and continuous variables, and HSPCNs enable an application designer to develop rich density estimators in such domains.
          
    
              @phdthesis{rahman2016scalable, 
    title={Scalable Learning Approaches for Sum-Product-Cutset Networks},
    author={Rahman, Tahrima},
    year={2016}
    }
  • arXiv 2019
    SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks
    Molina, Alejandro and Vergari, Antonio and Stelzner, Karl and Peharz, Robert and Subramani, Pranav and Di Mauro, Nicola and Poupart, Pascal and Kersting, Kristian
    [ paper | bibtex | abstract ]
    
              We introduce SPFlow, an open-source Python library providing a simple interface to inference, learning and manipulation routines for deep and tractable probabilistic models called Sum-Product Networks (SPNs). The library allows one to quickly create SPNs both from data and through a domain specific language (DSL). It efficiently implements several probabilistic inference routines like computing marginals, conditionals and (approximate) most probable explanations (MPEs) along with sampling as well as utilities for serializing, plotting and structure statistics on an SPN. Moreover, many of the algorithms proposed in the literature to learn the structure and parameters of SPNs are readily available in SPFlow. Furthermore, SPFlow is extremely extensible and customizable, allowing users to promptly distill new inference and learning routines by injecting custom code into a lightweight functional-oriented API framework. This is achieved in SPFlow by keeping an internal Python representation of the graph structure that also enables practical compilation of an SPN into a TensorFlow graph, C, CUDA or FPGA custom code, significantly speeding-up computations.
          
    
              @article{molina2019spflow,
    author = {Molina, Alejandro and Vergari, Antonio and Stelzner, Karl and Peharz, Robert and Subramani, Pranav and Di Mauro, Nicola and Poupart, Pascal and Kersting, Kristian},
    title = {SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning
    using Sum-Product Networks},
    journal = {CoRR},
    volume = {abs/1901.03704},
    year = {2019}
    }
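    A small usage sketch in the spirit of the library's documented examples; the module paths and helper names follow SPFlow's README at the time of writing and should be treated as assumptions, since the API may have evolved:

      import numpy as np
      from spn.structure.Base import Sum, Product, assign_ids, rebuild_scopes_bottom_up
      from spn.structure.leaves.parametric.Parametric import Categorical
      from spn.algorithms.Inference import log_likelihood

      # hand-built SPN over two binary variables (scopes 0 and 1)
      spn = Sum(weights=[0.4, 0.6],
                children=[Product(children=[Categorical(p=[0.2, 0.8], scope=0),
                                            Categorical(p=[0.3, 0.7], scope=1)]),
                          Product(children=[Categorical(p=[0.6, 0.4], scope=0),
                                            Categorical(p=[0.5, 0.5], scope=1)])])
      assign_ids(spn)
      rebuild_scopes_bottom_up(spn)

      data = np.array([[1, 0], [0, 1]], dtype=np.float64)
      print(log_likelihood(spn, data))    # one log-density per row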
  • arXiv 2019
    Deep Generalized Convolutional Sum-Product Networks
    van de Wolfshaar, Jos and Pronobis, Andrzej
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are hierarchical, graphical models that combine benefits of deep learning and probabilistic modeling. SPNs offer unique advantages to applications demanding exact probabilistic inference over high-dimensional, noisy inputs. Yet, compared to convolutional neural nets, they struggle with capturing complex spatial relationships in image data. To alleviate this issue, we introduce Deep Generalized Convolutional Sum-Product Networks (DGC-SPNs), which encode spatial features in a way similar to CNNs, while preserving the validity of the probabilistic SPN model. As opposed to existing SPN-based image representations, DGC-SPNs allow for overlapping convolution patches through a novel parameterization of dilations and strides, resulting in significantly improved feature coverage and feature resolution. DGC-SPNs substantially outperform other SPN architectures across several visual datasets and for both generative and discriminative tasks, including image inpainting and classification. These contributions are reinforced by the first simple, scalable, and GPU-optimized implementation of SPNs, integrated with the widely used Keras/TensorFlow framework. The resulting model is fully probabilistic and versatile, yet efficient and straightforward to apply in practical applications in place of traditional deep nets.
          
    
              @article{vandewolfshaar2019deep, 
    title={Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations},
    author={van de Wolfshaar, Jos and Pronobis, Andrzej},
    journal={arXiv preprint arXiv:1902.06155},
    year={2019}
    }
  • Int. J. Approx. Reason. 2018
    Robustifying sum-product networks
    Maua, Denis Deratani and Conaty, Diarmaid and Gagliardi Cozman, Fabio and Poppenhaeger, Katja and de Campos, Cassio Polpo
    [ paper | bibtex | abstract ]
    
              Sum-product networks are a relatively new and increasingly popular family of probabilistic graphical models that allow for marginal inference with polynomial effort. They have been shown to achieve state-of-the-art performance in several tasks involving density estimation. Sum-product networks are typically learned from data; as such, inferences produced with them are prone to be unreliable and overconfident when data is scarce. In this work, we develop the credal sum-product networks, a generalization of sum-product networks that uses set-valued parameters. We present algorithms and complexity results for common inference tasks with this class of models. We also present an approach for assessing the reliability of classifications made with sum-product networks. We apply this approach on benchmark classification tasks as well as a new application in predicting the age of stars. Our experiments show that the use of credal sum-product networks allows us to distinguish between reliable and unreliable classifications with higher accuracy than standard approaches based on (precise) probability values.
          
    
              @article{maua2018robustifying,
    author = {Mau{\'{a}}, Denis Deratani and Conaty, Diarmaid and Gagliardi Cozman, F{\'{a}}bio and Poppenhaeger, Katja and de Campos, Cassio Polpo},
    title = {Robustifying sum-product networks},
    journal = {Int. J. Approx. Reason.},
    volume = {101},
    pages = {163--180},
    year = {2018}
    }
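    At a single sum node, the set-valued parameters above turn the local computation into a tiny linear program: optimize the weighted sum of child values over the simplex intersected with the weight intervals, which a greedy pass solves exactly. A sketch of the upper bound at one node (the lower bound is symmetric); propagating these local optima bottom-up over nonnegative child values is the basis of the credal algorithms the paper makes precise:

      import numpy as np

      def upper_sum(values, lo, hi):
          # max_w  sum_i w_i * values_i   s.t.  lo <= w <= hi,  sum(w) = 1
          # greedy: start at the lower bounds, spend the remaining mass on the
          # children with the largest values (a fractional-knapsack argument)
          w = lo.copy()
          budget = 1.0 - lo.sum()
          for i in np.argsort(-values):
              add = min(hi[i] - lo[i], budget)
              w[i] += add; budget -= add
          return float(w @ values)

      vals = np.array([0.9, 0.5, 0.1])
      lo = np.array([0.1, 0.2, 0.1]); hi = np.array([0.7, 0.6, 0.5])
      print(upper_sum(vals, lo, hi))      # 0.74: as much mass as allowed on 0.9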
  • NeurIPS 2018
    Deep Homogeneous Mixture Models: Representation, Separation, and Approximation
    Jaini, Priyank and Poupart, Pascal and Yu, Yaoliang
    [ paper | bibtex | abstract ]
    
              At their core, many unsupervised learning models provide a compact representation of homogeneous density mixtures, but their similarities and differences are not always clearly understood. In this work, we formally establish the relationships among latent tree graphical models (including special cases such as hidden Markov models and tensorial mixture models), hierarchical tensor formats and sum-product networks. Based on this connection, we then give a unified treatment of exponential separation in exact representation size between deep mixture architectures and shallow ones. In contrast, for approximate representation, we show that the conditional gradient algorithm can approximate any homogeneous mixture within ϵ accuracy by combining O(1/ϵ²) "shallow" architectures, where the hidden constant may decrease (exponentially) with respect to the depth. Our experiments on both synthetic and real datasets confirm the benefits of depth in density estimation.
          
    
              @inproceedings{jaini2018deep,
    author = {Jaini, Priyank and Poupart, Pascal and Yu, Yaoliang},
    title = {Deep Homogeneous Mixture Models: Representation, Separation, and Approximation},
    booktitle = {NeurIPS},
    pages = {7136--7145},
    year = {2018}
    }
  • ICCD 2018
    Automatic Mapping of the Sum-Product Network Inference Problem to FPGA-Based Accelerators
    Sommer, Lukas and Oppermann, Julian and Molina, Alejandro and Binnig, Carsten and Kersting, Kristian and Koch, Andreas
    [ paper | bibtex | abstract ]
    
              In recent years, FPGAs have been successfully employed for the implementation of efficient, application-specific accelerators for a wide range of machine learning tasks. In this work, we consider probabilistic models, namely, (Mixed) Sum-Product Networks (SPN), a deep architecture that can provide tractable inference for multivariate distributions over mixed data-sources. We develop a fully pipelined FPGA accelerator architecture, including a pipelined interface to external memory, for the inference in (mixed) SPNs. To meet the precision constraints of SPNs, all computations are conducted using double-precision floating point arithmetic. Starting from an input description, the custom FPGA-accelerator is synthesized fully automatically by our tool flow. To the best of our knowledge, this work is the first approach to offload the SPN inference problem to FPGA-based accelerators. Our evaluation shows that the SPN inference problem benefits from offloading to our pipelined FPGA accelerator architecture.
          
    
              @inproceedings{sommer2018automatic,
    author = {Sommer, Lukas and Oppermann, Julian and Molina, Alejandro and Binnig, Carsten and Kersting, Kristian and Koch, Andreas},
    title = {Automatic Mapping of the Sum-Product Network Inference Problem to
    FPGA-Based Accelerators},
    booktitle = {{ICCD}},
    pages = {350--357},
    publisher = {{IEEE} Computer Society},
    year = {2018}
    }
  • arXiv 2018
    Tractable Querying and Learning in Hybrid Domains via Sum-Product Networks
    Bueff, Andreas and Speichert, Stefanie and Belle, Vaishak
    [ paper | bibtex | abstract ]
    
              Probabilistic representations, such as Bayesian and Markov networks, are fundamental to much of statistical machine learning. Thus, learning probabilistic representations directly from data is a deep challenge, the main computational bottleneck being inference that is intractable. Tractable learning is a powerful new paradigm that attempts to learn distributions that support efficient probabilistic querying. By leveraging local structure, representations such as sum-product networks (SPNs) can capture high tree-width models with many hidden layers, essentially a deep architecture, while still admitting a range of probabilistic queries to be computable in time polynomial in the network size. The leaf nodes in SPNs, from which more intricate mixtures are formed, are tractable univariate distributions, and so the literature has focused on Bernoulli and Gaussian random variables. This is clearly a restriction for handling mixed discrete-continuous data, especially if the continuous features are generated from non-parametric and non-Gaussian distribution families. In this work, we present a framework that systematically integrates SPN structure learning with weighted model integration, a recently introduced computational abstraction for performing inference in hybrid domains, by means of piecewise polynomial approximations of density functions of arbitrary shape. Our framework is instantiated by exploiting the notion of propositional abstractions, thus minimally interfering with the SPN structure learning module, and supports a powerful query interface for conditioning on interval constraints. Our empirical results show that our approach is effective, and allows a study of the trade-off between the granularity of the learned model and its predictive power.
          
    
              @article{bueff2018tractable,
    author = {Bueff, Andreas and Speichert, Stefanie and Belle, Vaishak},
    title = {Tractable Querying and Learning in Hybrid Domains via Sum-Product
    Networks},
    journal = {CoRR},
    volume = {abs/1807.05464},
    year = {2018}
    }
  • NeurIPS 2018
    Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks
    Kalra, Agastya and Rashwan, Abdullah and Hsu, Wei-Shou and Poupart, Pascal and Doshi, Prashant and Trimponias, Georgios
    [ paper | bibtex | abstract ]
    
              Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes a new online structure learning technique for feed-forward and recurrent SPNs. The algorithm is demonstrated on real-world datasets with continuous features for which it is not clear what network architecture might be best, including sequence datasets of varying length.
          
    
              @inproceedings{kalra2018online,
    author = {Kalra, Agastya and Rashwan, Abdullah and Hsu, Wei{-}Shou and Poupart, Pascal and Doshi, Prashant and Trimponias, Georgios},
    title = {Online Structure Learning for Feed-Forward and Recurrent Sum-Product
    Networks},
    booktitle = {NeurIPS},
    pages = {6944--6954},
    year = {2018}
    }
  • PGM 2018
    Prometheus : Directly Learning Acyclic Directed Graph Structures for Sum-Product Networks
    Jaini, Priyank and Ghose, Amur and Poupart, Pascal
    [ paper | bibtex | abstract ]
    
              In this paper, we present Prometheus, a graph partitioning based algorithm that creates multiple variable decompositions efficiently for learning Sum-Product Network structures across both continuous and discrete domains. Prometheus proceeds by creating multiple candidate decompositions that are represented compactly with an acyclic directed graph in which common parts of different decompositions are shared. It eliminates the correlation threshold hyperparameter often used in other structure learning techniques, allowing Prometheus to learn structures that are robust in low data regimes. Prometheus outperforms other structure learning techniques in 30 discrete and continuous domains. We also describe a sampling based approximation of Prometheus that scales to high-dimensional domains such as images.
          
    
              @inproceedings{jaini2018prometheus,
    author = {Jaini, Priyank and Ghose, Amur and Poupart, Pascal},
    title = {Prometheus : Directly Learning Acyclic Directed Graph Structures for
    Sum-Product Networks},
    booktitle = {{PGM}},
    series = {Proceedings of Machine Learning Research},
    volume = {72},
    pages = {181--192},
    publisher = {{PMLR}},
    year = {2018}
    }
  • PGM 2018
    Cascading Sum-Product Networks using Robustness
    Conaty, Diarmaid and Martinez del Rincon, Jesus and de Campos, Cassio P.
    [ paper | bibtex | abstract ]
    
              Sum-product networks are an increasingly popular family of probabilistic graphical models for which marginal inference can be performed in polynomial time. They have been shown to achieve state-of-the-art performance in several tasks. When learning sum-product networks from scarce data, the obtained model may be prone to robustness issues. In particular, small variations of parameters could lead to different conclusions. We discuss the characteristics of sum-product networks as classifiers and study their robustness with respect to their parameters. Using a robustness measure to identify (possibly) unreliable decisions, we build a hierarchical approach where the classification task is deferred to another model if the outcome is deemed unreliable. We apply this approach on benchmark classification tasks and experiments show that the robustness measure can be a meaningful manner to improve classification accuracy.
          
    
              @inproceedings{conaty2018cascading,
    author = {Conaty, Diarmaid and Mart{\'{i}}nez del Rinc{\'{o}}n, Jes{\'{u}}s and de Campos, Cassio P.},
    title = {Cascading Sum-Product Networks using Robustness},
    booktitle = {{PGM}},
    series = {Proceedings of Machine Learning Research},
    volume = {72},
    pages = {73--84},
    publisher = {{PMLR}},
    year = {2018}
    }
  • Advances in Cognitive Systems 2018
    Exact, tractable inference in the Sigma cognitive architecture via sum-product networks
    Joshi, Himanshu and Rosenbloom, Paul S. and Ustun, Volkan
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a new kind of deep architecture that support exact, tractable inference over a large class of problems for which traditional graphical models cannot. The Sigma cognitive architecture is based on graphical models, posing a challenge for it to handle problems within this class, such as parsing with probabilistic grammars, a potentially important aspect of language processing. This work proves that an early unidirectional extension to Sigma's graphical architecture, originally added in service of rule-like behavior but later also shown to support neural networks, can be leveraged to yield exact, tractable computations across this class of problems, and further demonstrates this tractability experimentally for probabilistic parsing. It thus shows that Sigma is able to specify any valid SPN and, despite its grounding in graphical models, retain the desirable inference properties of SPNs when solving them.
          
    
              @article{joshi2018exact, 
    title={Exact, tractable inference in the Sigma cognitive architecture via sum-product networks},
    author={Joshi, Himanshu and Rosenbloom, Paul S and Ustun, Volkan},
    journal={Advances in Cognitive Systems},
    volume={7},
    pages={31--47},
    year={2018}
    }
  • arXiv 2018
    Sum-Product Networks for Sequence Labeling
    Ratajczak, Martin and Tschiatschek, Sebastian and Pernkopf, Franz
    [ paper | bibtex | abstract ]
    
              We consider higher-order linear-chain conditional random fields (HO-LC-CRFs) for sequence modelling, and use sum-product networks (SPNs) for representing higher-order input- and output-dependent factors. SPNs are a recently introduced class of deep models for which exact and efficient inference can be performed. By combining HO-LC-CRFs with SPNs, expressive models over both the output labels and the hidden variables are instantiated while still enabling efficient exact inference. Furthermore, the use of higher-order factors allows us to capture relations of multiple input segments and multiple output labels as often present in real-world data. These relations cannot be modelled by the commonly used first-order models and higher-order models with local factors including only a single output label. We demonstrate the effectiveness of our proposed models for sequence labeling. In extensive experiments, we outperform other state-of-the-art methods in optical character recognition and achieve competitive results in phone classification.
          
    
              @article{ratajczak2018sum-product,
    author = {Ratajczak, Martin and Tschiatschek, Sebastian and Pernkopf, Franz},
    title = {Sum-Product Networks for Sequence Labeling},
    journal = {CoRR},
    volume = {abs/1807.02324},
    year = {2018}
    }
  • PGM 2018
    An Empirical Study of Methods for SPN Learning and Inference
    Butz, Cory J. and de S. Oliveira, Jhonatan and dos Santos, Andre E. and Teixeira, Andre L. and Poupart, Pascal and Kalra, Agastya
    [ paper | bibtex | abstract ]
    
              In this study, we provide an empirical comparison of methods for sum-product network (SPN) learning and inference. LearnSPN is a popular algorithm for learning SPNs that utilizes chop and slice operations. As g-test is a standard chopping method and Gaussian mixture models (GMM) using expectation-maximization is a common slicing method, it seems to have been assumed in the literature that this is the best pair in LearnSPN. On the contrary, our results show that g-test for chopping and k-means for slicing yields SPNs that are just as accurate. Moreover, it has been shown that implementing SPN leaf nodes as Chow-Liu Trees (CLTs) yields more accurate SPNs for the former pair. Our experiments show the same for the latter pair, and that neither pair dominates the other. Lastly, we report an analysis of SPN topology for unstudied pairs. With respect to inference, we derive partial propagation (PP), which performs SPN exact inference without requiring a full propagation over all nodes in the SPN as currently done. Experimental results on SPN datasets demonstrate that PP has several advantages over full propagation in SPNs, including relative time savings, absolute time savings in large SPNs, and scalability.
          
    
              @inproceedings{butz2018empirical,
    author = {Butz, Cory J. and de S. Oliveira, Jhonatan and dos Santos, Andr{\'{e}} E. and Teixeira, Andr{\'{e}} L. and Poupart, Pascal and Kalra, Agastya},
    title = {An Empirical Study of Methods for {SPN} Learning and Inference},
    booktitle = {{PGM}},
    series = {Proceedings of Machine Learning Research},
    volume = {72},
    pages = {49--60},
    publisher = {{PMLR}},
    year = {2018}
    }
  • IEA/AIE 2018
    Efficient Examination of Soil Bacteria Using Probabilistic Graphical Models
    Butz, Cory J. and dos Santos, Andre E. and de S. Oliveira, Jhonatan and Stavrinides, John
    [ paper | bibtex | abstract ]
    
              This paper describes a novel approach to study bacterial relationships in soil datasets using probabilistic graphical models. We demonstrate how to access and reformat publicly available datasets in order to apply machine learning techniques. We first learn a Bayesian network in order to read independencies in linear time between bacterial community characteristics. These independencies are useful in understanding the semantic relationships between bacteria within communities. Next, we learn a Sum-Product network in order to perform inference in linear time. Here, inference can be conducted to answer traditional queries, involving posterior probabilities, or MPE queries, requesting the most likely values of the non-evidence variables given evidence. Our results extend the literature by showing that known relationships between soil bacteria holding in one or a few datasets in fact hold across at least 3500 diverse datasets. This study paves the way for future large-scale studies of agricultural, health, and environmental applications, for which data are publicly available.
          
    
              @inproceedings{butz2018efficient,
    author = {Butz, Cory J. and dos Santos, Andr{\'{e}} E. and de S. Oliveira, Jhonatan and Stavrinides, John},
    title = {Efficient Examination of Soil Bacteria Using Probabilistic Graphical
    Models},
    booktitle = {{IEA/AIE}},
    series = {Lecture Notes in Computer Science},
    volume = {10868},
    pages = {315--326},
    publisher = {Springer},
    year = {2018}
    }
  • AISTATS 2018
    Sum-Product-Quotient Networks
    Sharir, Or and Shashua, Amnon
    [ paper | bibtex | abstract ]
    
              We present a novel tractable generative model that extends Sum-Product Networks (SPNs) and significantly boosts their power. We call it Sum-Product-Quotient Networks (SPQNs), whose core concept is to incorporate conditional distributions into the model by direct computation using quotient nodes, e.g. P(A|B) = P(A,B)/P(B). We provide sufficient conditions for the tractability of SPQNs that generalize and relax the decomposable and complete tractability conditions of SPNs. These relaxed conditions give rise to an exponential boost to the expressive efficiency of our model, i.e. we prove that there are distributions which SPQNs can compute efficiently but require SPNs to be of exponential size. Thus, we narrow the gap in expressivity between tractable graphical models and other Neural Network-based generative models.
          
    
              @inproceedings{sharir2018sum-product-quotient,
    author = {Sharir, Or and Shashua, Amnon},
    title = {Sum-Product-Quotient Networks},
    booktitle = {{AISTATS}},
    series = {Proceedings of Machine Learning Research},
    volume = {84},
    pages = {529--537},
    publisher = {{PMLR}},
    year = {2018}
    }
  • AAAI 2018
    Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps
    Zheng, Kaiyu and Pronobis, Andrzej and Rao, Rajesh P. N.
    [ paper | bibtex | abstract ]
    
              We introduce Graph-Structured Sum-Product Networks (GraphSPNs), a probabilistic approach to structured prediction for problems where dependencies between latent variables are expressed in terms of arbitrary, dynamic graphs. While many approaches to structured prediction place strict constraints on the interactions between inferred variables, many real-world problems can be only characterized using complex graph structures of varying size, often contaminated with noise when obtained from real data. Here, we focus on one such problem in the domain of robotics. We demonstrate how GraphSPNs can be used to bolster inference about semantic, conceptual place descriptions using noisy topological relations discovered by a robot exploring large-scale office spaces. Through experiments, we show that GraphSPNs consistently outperform the traditional approach based on undirected graphical models, successfully disambiguating information in global semantic maps built from uncertain, noisy local evidence. We further exploit the probabilistic nature of the model to infer marginal distributions over semantic descriptions of as yet unexplored places and detect spatial environment configurations that are novel and incongruent with the known evidence.
          
    
              @inproceedings{zheng2018learning,
    author = {Zheng, Kaiyu and Pronobis, Andrzej and Rao, Rajesh P. N.},
    title = {Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic
    Maps},
    booktitle = {{AAAI}},
    pages = {4547--4555},
    publisher = {{AAAI} Press},
    year = {2018}
    }
  • AAAI 2018
    Maximum A Posteriori Inference in Sum-Product Networks
    Mei, Jun and Jiang, Yong and Tu, Kewei
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a class of probabilistic graphical models that allow tractable marginal inference. However, the maximum a posteriori (MAP) inference in SPNs is NP-hard. We investigate MAP inference in SPNs from both theoretical and algorithmic perspectives. For the theoretical part, we reduce general MAP inference to its special case without evidence and hidden variables; we also show that it is NP-hard to approximate the MAP problem to within a factor 2^(n^ϵ) for fixed 0 ≤ ϵ < 1, where n is the input size. For the algorithmic part, we first present an exact MAP solver that runs reasonably fast and could handle SPNs with up to 1k variables and 150k arcs in our experiments. We then present a new approximate MAP solver with a good balance between speed and accuracy, and our comprehensive experiments on real-world datasets show that it has better overall performance than existing approximate solvers.
          
    
              @inproceedings{mei2018maximum,
    author = {Mei, Jun and Jiang, Yong and Tu, Kewei},
    title = {Maximum {A} Posteriori Inference in Sum-Product Networks},
    booktitle = {{AAAI}},
    pages = {1923--1930},
    publisher = {{AAAI} Press},
    year = {2018}
    }
  • AAAI 2018
    Sum-Product Autoencoding: Encoding and Decoding Representations Using Sum-Product Networks
    Vergari, Antonio and Peharz, Robert and Di Mauro, Nicola and Molina, Alejandro and Kersting, Kristian and Esposito, Floriana
    [ paper | code | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are a deep probabilistic architecture that up to now has been successfully employed for tractable inference. Here, we extend their scope towards unsupervised representation learning: we encode samples into continuous and categorical embeddings and show that they can also be decoded back into the original input space by leveraging MPE inference. We characterize when this Sum-Product Autoencoding (SPAE) leads to equivalent reconstructions and extend it towards dealing with missing embedding information. Our experimental results on several multi-label classification problems demonstrate that SPAE is competitive with state-of-the-art autoencoder architectures, even if the SPNs were never trained to reconstruct their inputs.
          
    
              @inproceedings{vergari2018sum-product,
    author = {Vergari, Antonio and Peharz, Robert and Di Mauro, Nicola and Molina, Alejandro and Kersting, Kristian and Esposito, Floriana},
    title = {Sum-Product Autoencoding: Encoding and Decoding Representations Using
    Sum-Product Networks},
    booktitle = {{AAAI}},
    pages = {4163--4170},
    publisher = {{AAAI} Press},
    year = {2018}
    }
  • AAAI 2018
    Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains
    Molina, Alejandro and Vergari, Antonio and Di Mauro, Nicola and Natarajan, Sriraam and Esposito, Floriana and Kersting, Kristian
    [ paper | code | bibtex | abstract ]
    
              While all kinds of mixed data---from personal data, through panel and scientific data, to public and commercial data---are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Renyi Maximum Correlation Coefficient. This relieves the user from deciding a priori the parametric form of the random variables but is still expressive enough to effectively approximate any distribution and permits efficient learning and inference. Our experiments show that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.
          
    
              @inproceedings{molina2018mixed,
    author = {Molina, Alejandro and Vergari, Antonio and Di Mauro, Nicola and Natarajan, Sriraam and Esposito, Floriana and Kersting, Kristian},
    title = {Mixed Sum-Product Networks: {A} Deep Architecture for Hybrid Domains},
    booktitle = {{AAAI}},
    pages = {3828--3835},
    publisher = {{AAAI} Press},
    year = {2018}
    }
  • BRACIS 2017
    On Using Sum-Product Networks for Multi-label Classification
    Llerena, Julissa Villanueva and Maua, Denis Deratani
    [ paper | bibtex | abstract ]
    
              Multi-Label classification consists of mapping each object to a set of relevant labels. A successful approach to constructing Multi-Label classifiers is to obtain a probabilistic model of the relation between object attributes and labels. This model can then be used to classify objects by computing the most probable configuration of the label relevance variables conditioned on the attributes. Sum-Product Networks (SPN) are deep probabilistic models that have shown promising results in many tasks. As with many other probabilistic models, performing Most Probable Explanation (MPE) inference is NP-hard in SPNs. In this work we investigate the use of SPNs for Multi-Label classification. We compare different approaches for learning SPNs and computing MPE. We show that SPN-based Multi-Label Classifiers are competitive against state-of-the-art classifiers in a collection of real-world experiments.
          
    
              @inproceedings{villanuevallerena2017using,
    author = {Llerena, Julissa Villanueva and Maua, Denis Deratani},
    title = {On Using Sum-Product Networks for Multi-label Classification},
    booktitle = {{BRACIS}},
    pages = {25--30},
    publisher = {{IEEE} Computer Society},
    year = {2017}
    }
  • ICML 2017
    On Relaxing Determinism in Arithmetic Circuits
    Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              The past decade has seen a significant interest in learning tractable probabilistic representations. Arithmetic circuits (ACs) were among the first proposed tractable representations, with some subsequent representations being instances of ACs with weaker or stronger properties. In this paper, we provide a formal basis under which variants on ACs can be compared, and where the precise roles and semantics of their various properties can be made more transparent. This allows us to place some recent developments on ACs in a clearer perspective and to also derive new results for ACs. This includes an exponential separation between ACs with and without determinism; completeness and incompleteness results; and tractability results (or lack thereof) when computing most probable explanations (MPEs).
          
    
              @inproceedings{choi2017relaxing,
    author = {Choi, Arthur and Darwiche, Adnan},
    title = {On Relaxing Determinism in Arithmetic Circuits},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {70},
    pages = {825--833},
    publisher = {{PMLR}},
    year = {2017}
    }
  • ICMLA 2017
    Autoencoder-Enhanced Sum-Product Networks
    Dennis, Aaron W. and Ventura, Dan
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are probabilistic models that guarantee exact inference in time linear in the size of the network. We use autoencoders in concert with SPNs to model high-dimensional, high-arity random vectors (e.g., image data). Experiments show that our proposed model, the autoencoder-SPN (AESPN), which combines two SPNs and an autoencoder, produces better samples than an SPN alone. This is true whether we sample all variables, or whether a set of unknown query variables is sampled, given a set of known evidence variables.
          
    
              @inproceedings{dennis2017autoencoder-enhanced,
    author = {Dennis, Aaron W. and Ventura, Dan},
    title = {Autoencoder-Enhanced Sum-Product Networks},
    booktitle = {{ICMLA}},
    pages = {1041--1044},
    publisher = {{IEEE}},
    year = {2017}
    }
  • ICMLA 2017
    Online Structure-Search for Sum-Product Networks
    Dennis, Aaron W. and Ventura, Dan
    [ paper | bibtex | abstract ]
    
              A variety of algorithms exist for learning both the structure and parameters of sum-product networks (SPNs), a class of probabilistic model in which exact inference can be done quickly. The vast majority of them are batch learners, including a recently proposed algorithm, SEARCHSPN. However, SEARCHSPN has properties that make it particularly suited for adaptation to the online setting. In this paper we introduce the ONLINESEARCHSPN algorithm which does just that. We compare it to two general methods that build online learners from batch learners; one learns poor models quickly and the other learns good models slowly. Our experiments show that ONLINESEARCHSPN achieves the best of both methods. The test likelihood values of the models it learns are as good as the slow learner, while the training times needed to learn the models are much closer to the fast learner.
          
    
              @inproceedings{dennis2017online,
    author = {Dennis, Aaron W. and Ventura, Dan},
    title = {Online Structure-Search for Sum-Product Networks},
    booktitle = {{ICMLA}},
    pages = {155--160},
    publisher = {{IEEE}},
    year = {2017}
    }
  • AI*IA 2017
    Alternative Variable Splitting Methods to Learn Sum-Product Networks
    Di Mauro, Nicola and Esposito, Floriana and Ventola, Fabrizio G. and Vergari, Antonio
    [ paper | code | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are recent deep probabilistic models providing exact and tractable inference. SPNs have been successfully employed as density estimators in several application domains. However, learning an SPN from high dimensional data still poses a challenge in terms of time complexity. This is due to the high cost of determining independencies among random variables (RVs) and sub-populations among samples, two operations that are repeated several times. Even one of the simplest greedy structure learners, LearnSPN, scales quadratically in the number of variables to determine RV independencies. In this work we investigate approximate but fast procedures to determine independencies among RVs whose complexity scales in sub-quadratic time. We propose two procedures: a random subspace approach and one that adopts entropy as a criterion to split RVs in linear time. Experimental results prove that LearnSPN equipped with our splitting procedures is able to reduce learning and/or inference times while preserving comparable inference accuracy.
          
    
              @inproceedings{dimauro2017alternative,
    author = {Di Mauro, Nicola and Esposito, Floriana and Ventola, Fabrizio G. and Vergari, Antonio},
    title = {Alternative Variable Splitting Methods to Learn Sum-Product Networks},
    booktitle = {AI*IA},
    series = {Lecture Notes in Computer Science},
    volume = {10640},
    pages = {334--346},
    publisher = {Springer},
    year = {2017}
    }
  • PADL 2017
    LibSPN: A library for learning and inference with Sum-Product Networks and TensorFlow
    Pronobis, Andrzej and Ranganath, Avinash and Rao, Rajesh PN
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are a probabilistic deep architecture with solid theoretical foundations, which demonstrated state-of-the-art performance in several domains. Yet, surprisingly, there are no mature, general-purpose SPN implementations that would serve as a platform for the community of machine learning researchers centered around SPNs. Here, we present a new general-purpose Python library called LIBSPN, which aims to become such a platform. The library is designed to make it straightforward and effortless to apply various SPN architectures to large-scale datasets and problems. The library achieves scalability and efficiency, thanks to a tight coupling with TensorFlow, a framework already used by a large community of researchers and developers in multiple domains. We describe the design and benefits of LIBSPN, give several use-case examples, and demonstrate the applicability of the library to real-world problems on the example of spatial understanding in mobile robotics.
          
    
              @inproceedings{pronobis2017libspn, 
    title={LibSPN: A library for learning and inference with Sum-Product Networks and TensorFlow},
    author={Pronobis, Andrzej and Ranganath, Avinash and Rao, Rajesh PN},
    booktitle={Principled Approaches to Deep Learning Workshop},
    year={2017}
    }
  • SSRR (ICAPS) 2017
    Deep spatial affordance hierarchy: Spatial knowledge representation for planning in large-scale environments
    Pronobis, Andrzej and Riccio, Francesco and Rao, Rajesh PN
    [ paper | bibtex | abstract ]
    
              Domain-specific state representations are a fundamental component that enables planning of robot actions in unstructured human environments. In the case of mobile robots, it is the spatial knowledge that constitutes the core of the state, and directly affects the performance of the planning algorithm. Here, we propose Deep Spatial Affordance Hierarchy (DASH), a probabilistic representation of spatial knowledge, spanning multiple levels of abstraction from geometry and appearance to semantics, and leveraging a deep model of generic spatial concepts. DASH is designed to represent space from the perspective of a mobile robot executing complex behaviors in the environment, and directly encodes gaps in knowledge and spatial affordances. In this paper, we explain the principles behind DASH, and present its initial realization for a robot equipped with a laser-range sensor. We demonstrate the ability of our implementation to successfully build representations of large-scale environments, and leverage the deep model of generic spatial concepts to infer latent and missing information at all abstraction levels.
          
    
              @inproceedings{pronobis2017deep, 
    title={Deep spatial affordance hierarchy: Spatial knowledge representation for planning in large-scale environments},
    author={Pronobis, Andrzej and Riccio, Francesco and Rao, Rajesh PN},
    booktitle={ICAPS 2017 Workshop on Planning and Robotics},
    year={2017}
    }
  • MICCAI (1) 2017
    Locally Adaptive Probabilistic Models for Global Segmentation of Pathological OCT Scans
    Rathke, Fabian and Desana, Mattia and Schnoerr, Christoph
    [ paper | bibtex | abstract ]
    
              Segmenting retinal tissue deformed by pathologies can be challenging. Segmentation approaches are often constructed with a certain pathology in mind and may require a large set of labeled pathological scans, and therefore are tailored to that particular pathology. We present an approach that can be easily transferred to new pathologies, as it is designed with no particular pathology in mind and requires no pathological ground truth. The approach is based on a graphical model trained for healthy scans, which is modified locally by adding pathology-specific shape modifications. We use the framework of sum-product networks (SPN) to find the best combination of modified and unmodified local models that globally yield the best segmentation. The approach further allows us to localize and quantify the pathology. We demonstrate the flexibility and the robustness of our approach, by presenting results for three different pathologies: diabetic macular edema (DME), age-related macular degeneration (AMD) and non-proliferative diabetic retinopathy.
          
    
              @inproceedings{rathke2017locally,
    author = {Rathke, Fabian and Desana, Mattia and Schn{\"o}rr, Christoph},
    title = {Locally Adaptive Probabilistic Models for Global Segmentation of Pathological
    {OCT} Scans},
    booktitle = {{MICCAI} {(1)}},
    series = {Lecture Notes in Computer Science},
    volume = {10433},
    pages = {177--184},
    publisher = {Springer},
    year = {2017}
    }
  • ISIPTA 2017
    Credal Sum-Product Networks
    Maua, Denis Deratani and Cozman, Fabio Gagliardi and Conaty, Diarmaid and de Campos, Cassio P.
    [ paper | bibtex | abstract ]
    
              Sum-product networks are a relatively new and increasingly popular class of (precise) probabilistic graphical models that allow for marginal inference with polynomial effort. As with other probabilistic models, sum-product networks are often learned from data and used to perform classification. Hence, their results are prone to be unreliable and overconfident. In this work, we develop credal sum-product networks, an imprecise extension of sum-product networks. We present algorithms and complexity results for common inference tasks. We apply our algorithms to a realistic classification task using images of digits and show that credal sum-product networks obtained by a perturbation of the parameters of learned sum-product networks are able to distinguish between reliable and unreliable classifications with high accuracy.
          
    
              @inproceedings{maua2017credal,
    author = {Mau{\'{a}}, Denis Deratani and Cozman, F{\'{a}}bio Gagliardi and Conaty, Diarmaid and Campos, Cassio P. de},
    title = {Credal Sum-Product Networks},
    booktitle = {{ISIPTA}},
    series = {Proceedings of Machine Learning Research},
    volume = {62},
    pages = {205--216},
    publisher = {{PMLR}},
    year = {2017}
    }
  • UAI 2017
    Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks
    Conaty, Diarmaid and Campos, Cassio P. de and Maua, Denis Deratani
    [ paper | bibtex | abstract ]
    
              We discuss the computational complexity of approximating maximum a posteriori inference in sum-product networks. We first show NP-hardness in trees of height two by a reduction from maximum independent set; this implies non-approximability within a sublinear factor. We show that this is a tight bound, as we can find an approximation within a linear factor in networks of height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor 2^{f(n)} for any sublinear function f of the size of the input n. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor 2^{c·n} for some constant c < 1. Last, we present a simple algorithm, and show that it provably produces solutions at least as good as, and potentially much better than, the max-product algorithm. We empirically analyze the proposed algorithm against max-product using synthetic and realistic networks.
          
    
              @inproceedings{conaty2017approximation,
    author = {Conaty, Diarmaid and Campos, Cassio P. de and Mau{\'{a}}, Denis Deratani},
    title = {Approximation Complexity of Maximum {A} Posteriori Inference in Sum-Product
    Networks},
    booktitle = {{UAI}},
    publisher = {{AUAI} Press},
    year = {2017}
    }
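    The max-product pass analyzed in the abstract above admits a minimal Python sketch (an assumed tree representation, not the paper's solver): every weighted sum is replaced by a weighted max, and remembering the winning child at each sum node decodes an approximate MAP state.

        def max_product(node):
            """Return (value, chosen sub-state) of a max-product evaluation."""
            if node["type"] == "leaf":
                return node["value"], node["name"]
            results = [max_product(c) for c in node["children"]]
            if node["type"] == "product":
                value, parts = 1.0, []
                for v, p in results:
                    value *= v
                    parts.append(p)
                return value, parts
            # sum node: keep only the best weighted child instead of summing
            scored = [(w * v, p) for w, (v, p) in zip(node["weights"], results)]
            return max(scored, key=lambda t: t[0])

        spn = {"type": "sum", "weights": [0.3, 0.7],
               "children": [{"type": "leaf", "value": 0.9, "name": "X=1"},
                            {"type": "leaf", "value": 0.2, "name": "X=0"}]}
        print(max_product(spn))  # (0.27, 'X=1'): approximate MAP value and state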
  • NIPS 2017
    Linear Time Computation of Moments in Sum-Product Networks
    Zhao, Han and Gordon, Geoffrey J.
    [ paper | bibtex | abstract ]
    
              Bayesian online algorithms for Sum-Product Networks (SPNs) need to update their posterior distribution after seeing a single additional instance. To do so, they must compute moments of the model parameters under this distribution. The best existing method for computing such moments scales quadratically in the size of the SPN, although it scales linearly for trees. This unfortunate scaling makes Bayesian online algorithms prohibitively expensive, except for small or tree-structured SPNs. We propose an optimal linear-time algorithm that works even when the SPN is a general directed acyclic graph (DAG), which significantly broadens the applicability of Bayesian online algorithms for SPNs. There are three key ingredients in the design and analysis of our algorithm: (1) For each edge in the graph, we construct a linear time reduction from the moment computation problem to a joint inference problem in SPNs. (2) Using the property that each SPN computes a multilinear polynomial, we give an efficient procedure for polynomial evaluation by differentiation without expanding the network that may contain exponentially many monomials. (3) We propose a dynamic programming method to further reduce the computation of the moments of all the edges in the graph from quadratic to linear. We demonstrate the usefulness of our linear time algorithm by applying it to develop a linear time assumed density filter (ADF) for SPNs.
          
    
              @inproceedings{zhao2017linear,
    author = {Zhao, Han and Gordon, Geoffrey J.},
    title = {Linear Time Computation of Moments in Sum-Product Networks},
    booktitle = {{NIPS}},
    pages = {6894--6903},
    year = {2017}
    }
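    The differentiation ingredient described above can be sketched in a few lines of Python (hypothetical data structures, not the paper's implementation): one bottom-up pass evaluates the network polynomial and one top-down pass accumulates every partial derivative dS/d(node), in time linear in the number of edges.

        def value_and_derivatives(nodes):
            """nodes: topologically ordered list, children before parents."""
            val = {}
            for n in nodes:  # bottom-up: evaluate the network polynomial
                if n["type"] == "leaf":
                    val[id(n)] = n["value"]
                elif n["type"] == "product":
                    v = 1.0
                    for c in n["children"]:
                        v *= val[id(c)]
                    val[id(n)] = v
                else:  # sum node
                    val[id(n)] = sum(w * val[id(c)]
                                     for w, c in zip(n["weights"], n["children"]))
            root = nodes[-1]
            d = {id(n): 0.0 for n in nodes}
            d[id(root)] = 1.0
            for n in reversed(nodes):  # top-down: accumulate dS/d(child)
                if n["type"] == "sum":
                    for w, c in zip(n["weights"], n["children"]):
                        d[id(c)] += w * d[id(n)]
                elif n["type"] == "product":
                    for c in n["children"]:
                        # product of the other children; assumes nonzero child values
                        d[id(c)] += d[id(n)] * (val[id(n)] / val[id(c)] if val[id(c)] else 0.0)
            return val[id(root)], d

        leaf_a = {"type": "leaf", "value": 0.9}
        leaf_b = {"type": "leaf", "value": 0.2}
        root = {"type": "sum", "weights": [0.3, 0.7], "children": [leaf_a, leaf_b]}
        S, d = value_and_derivatives([leaf_a, leaf_b, root])
        print(S, d[id(leaf_a)])  # 0.41 and 0.3 = dS/d(leaf_a)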
  • UAI 2017
    Safe Semi-Supervised Learning of Sum-Product Networks
    Trapp, Martin and Madl, Tamas and Peharz, Robert and Pernkopf, Franz and Trappl, Robert
    [ paper | bibtex | abstract ]
    
              In several domains obtaining class annotations is expensive while at the same time unlabelled data are abundant. While most semi-supervised approaches enforce restrictive assumptions on the data distribution, recent work has managed to learn semi-supervised models in a non-restrictive regime. However, so far such approaches have only been proposed for linear models. In this work, we introduce semi-supervised parameter learning for Sum-Product Networks (SPNs). SPNs are deep probabilistic models admitting inference in linear time in number of network edges. Our approach has several advantages, as it (1) allows generative and discriminative semi-supervised learning, (2) guarantees that adding unlabelled data can increase, but not degrade, the performance (safe), and (3) is computationally efficient and does not enforce restrictive assumptions on the data distribution. We show on a variety of data sets that safe semi-supervised learning with SPNs is competitive compared to state-of-the-art and can lead to a better generative and discriminative objective value than a purely supervised approach.
          
    
              @inproceedings{trapp2017safe,
    author = {Trapp, Martin and Madl, Tamas and Peharz, Robert and Pernkopf, Franz and Trappl, Robert},
    title = {Safe Semi-Supervised Learning of Sum-Product Networks},
    booktitle = {{UAI}},
    publisher = {{AUAI} Press},
    year = {2017}
    }
  • ICLR (Workshop) 2017
    Online Structure Learning for Sum-Product Networks with Gaussian Leaves
    Hsu, Wilson and Kalra, Agastya and Poupart, Pascal
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes the first online structure learning technique for continuous SPNs with Gaussian leaves. We also introduce an accompanying new parameter learning technique.
          
    
              @inproceedings{hsu2017online,
    author = {Hsu, Wilson and Kalra, Agastya and Poupart, Pascal},
    title = {Online Structure Learning for Sum-Product Networks with Gaussian Leaves},
    booktitle = {{ICLR} (Workshop)},
    publisher = {OpenReview.net},
    year = {2017}
    }
  • ICLR (Workshop) 2017
    Encoding and Decoding Representations with Sum- and Max-Product Networks
    Vergari, Antonio and Peharz, Robert and Mauro, Nicola Di and Esposito, Floriana
    [ paper | bibtex | abstract ]
    
              Sum-Product networks (SPNs) are expressive deep architectures for representing probability distributions, yet allowing exact and efficient inference. SPNs have been successfully applied in several domains, however always as black-box distribution estimators. In this paper, we argue that due to their recursive definition, SPNs can also be naturally employed as hierarchical feature extractors and thus for unsupervised representation learning. Moreover, when converted into Max-Product Networks (MPNs), it is possible to decode such representations back into the original input space. In this way, MPNs can be interpreted as a kind of generative autoencoder, even if they were never trained to reconstruct the input data. We show how these learned representations, if visualized, indeed correspond to meaningful parts of the training data. They also yield a large improvement when used in structured prediction tasks. As shown in extensive experiments, SPN and MPN encoding and decoding schemes prove very competitive against the ones employing RBMs and other stacked autoencoder architectures.
          
    
              @inproceedings{vergari2017encoding,
    author = {Vergari, Antonio and Peharz, Robert and Mauro, Nicola Di and Esposito, Floriana},
    title = {Encoding and Decoding Representations with Sum- and Max-Product Networks},
    booktitle = {{ICLR} (Workshop)},
    publisher = {OpenReview.net},
    year = {2017}
    }
  • IEEE Trans. Pattern Anal. Mach. Intell. 2017
    On the Latent Variable Interpretation in Sum-Product Networks
    Peharz, Robert and Gens, Robert and Pernkopf, Franz and Domingos, Pedro M.
    [ paper | code | bibtex | abstract ]
    
              One of the central themes in Sum-Product networks (SPNs) is the interpretation of sum nodes as marginalized latent variables (LVs). This interpretation yields an increased syntactic or semantic structure, allows the application of the EM algorithm and to efficiently perform MPE inference. In literature, the LV interpretation was justified by explicitly introducing the indicator variables corresponding to the LVs’ states. However, as pointed out in this paper, this approach is in conflict with the completeness condition in SPNs and does not fully specify the probabilistic model. We propose a remedy for this problem by modifying the original approach for introducing the LVs, which we call SPN augmentation. We discuss conditional independencies in augmented SPNs, formally establish the probabilistic interpretation of the sum-weights and give an interpretation of augmented SPNs as Bayesian networks. Based on these results, we find a sound derivation of the EM algorithm for SPNs. Furthermore, the Viterbi-style algorithm for MPE proposed in literature was never proven to be correct. We show that this is indeed a correct algorithm, when applied to selective SPNs, and in particular when applied to augmented SPNs. Our theoretical results are confirmed in experiments on synthetic data and 103 real-world datasets.
          
    
              @article{peharz2017latent,
    author = {Peharz, Robert and Gens, Robert and Pernkopf, Franz and Domingos, Pedro M.},
    title = {On the Latent Variable Interpretation in Sum-Product Networks},
    journal = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
    volume = {39},
    number = {10},
    pages = {2030--2044},
    year = {2017}
    }
  • AAAI 2017
    Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions
    Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian
    [ paper | bibtex | abstract ]
    
              Multivariate count data are pervasive in science in the form of histograms, contingency tables and others. Previous work on modeling this type of distribution does not allow for fast and tractable inference. In this paper we present a novel Poisson graphical model, the first based on sum product networks, called PSPN, allowing for positive as well as negative dependencies. We present algorithms for learning tree PSPNs from data as well as for tractable inference via symbolic evaluation. With these, information-theoretic measures such as entropy, mutual information, and distances among count variables can be computed without resorting to approximations. Additionally, we show a connection between PSPNs and LDA, linking the structure of tree PSPNs to a hierarchy of topics. The experimental results on several synthetic and real world data-sets demonstrate that PSPNs often outperform the state of the art while remaining tractable.
          
    
              @inproceedings{molina2017poisson,
    author = {Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian},
    title = {Poisson Sum-Product Networks: {A} Deep Architecture for Tractable
    Multivariate Poisson Distributions},
    booktitle = {{AAAI}},
    pages = {2357--2363},
    publisher = {{AAAI} Press},
    year = {2017}
    }
  • BRACIS 2016
    Image Classification Using Sum-Product Networks for Autonomous Flight of Micro Aerial Vehicles
    Sguerra, Bruno Massoni and Cozman, Fabio Gagliardi
    [ paper | bibtex | abstract ]
    
              Flying autonomous micro aerial vehicles (MAVs) in indoor environments is still a challenging task, as MAVs are not capable of carrying heavy sensors such as Lidar or RGB-D, and GPS signals are not reliable indoors. We investigate a strategy where image classification is used to guide a MAV; one of the main requirements then is to have a classifier that can produce results quickly during operation. The goal here is to explore the performance of Sum-Product Networks and Arithmetic Circuits as image classifiers, because these formalisms lead to deep probabilistic models that are tractable during operation. We have trained and tested our classifiers using the Libra toolkit and real images. We describe our approach and report the results of our experiments in the paper.
          
    
              @inproceedings{sguerra2016image,
    author = {Sguerra, Bruno Massoni and Cozman, F{\'{a}}bio Gagliardi},
    title = {Image Classification Using Sum-Product Networks for Autonomous Flight
    of Micro Aerial Vehicles},
    booktitle = {{BRACIS}},
    pages = {139--144},
    publisher = {{IEEE} Computer Society},
    year = {2016}
    }
  • Bayesian Nonparametrics Workshop (NIPS 2016)
    Structure inference in sum-product networks using infinite sum-product trees
    Trapp, Martin and Peharz, Robert and Skowron, Marcin and Madl, Tamas and Pernkopf, Franz and Trappl, Robert
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are a highly efficient type of deep probabilistic model that allows exact inference in time linear in the size of the network. In previous work, several heuristic structure learning approaches for SPNs have been developed, which are prone to overfitting compared to a purely Bayesian model. In this work, we propose a principled approach to structure learning in SPNs by introducing infinite Sum-Product Trees (SPTs). Our approach is the first correct and successful extension of SPNs to a Bayesian nonparametric model. We show that infinite SPTs can be used successfully to discover SPN structures and outperform infinite Gaussian mixture models in the task of density estimation.
          
    
              @inproceedings{trapp2016structure, 
    title={Structure inference in sum-product networks using infinite sum-product trees},
    author={Trapp, Martin and Peharz, Robert and Skowron, Marcin and Madl, Tamas and Pernkopf, Franz and Trappl, Robert},
    booktitle={NIPS Workshop on Practical Bayesian Nonparametrics},
    year={2016}
    }
  • Probabilistic Graphical Models 2016
    Dynamic Sum Product Networks for Tractable Inference on Sequence Data
    Melibari, Mazen and Poupart, Pascal and Doshi, Prashant and Trimponias, George
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPN) have recently emerged as a new class of tractable probabilistic models. Unlike Bayesian networks and Markov networks, where inference may be exponential in the size of the network, inference in SPNs takes time linear in the size of the network. Since SPNs represent distributions over a fixed set of variables only, we propose dynamic sum product networks (DSPNs) as a generalization of SPNs for sequence data of varying length. A DSPN consists of a template network that is repeated as many times as needed to model data sequences of any length. We present a local search technique to learn the structure of the template network. In contrast to dynamic Bayesian networks for which inference is generally exponential in the number of variables per time slice, DSPNs inherit the linear inference complexity of SPNs. We demonstrate the advantages of DSPNs over DBNs and other models on several datasets of sequence data.
          
    
              @inproceedings{melibari2016dynamic,
    author = {Melibari, Mazen and Poupart, Pascal and Doshi, Prashant and Trimponias, George},
    title = {Dynamic Sum Product Networks for Tractable Inference on Sequence Data},
    booktitle = {Probabilistic Graphical Models},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {52},
    pages = {345--355},
    publisher = {JMLR.org},
    year = {2016}
    }
  • Probabilistic Graphical Models 2016
    Online Algorithms for Sum-Product Networks with Continuous Variables
    Jaini, Priyank and Rashwan, Abdullah and Zhao, Han and Liu, Yue and Banijamali, Ershad and Chen, Zhitang and Poupart, Pascal
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) have recently emerged as an attractive representation due to their dual interpretation as a special type of deep neural network with clear semantics and a tractable probabilistic graphical model. We explore online algorithms for parameter learning in SPNs with continuous variables. More specifically, we consider SPNs with Gaussian leaf distributions and show how to derive an online Bayesian moment matching algorithm to learn from streaming data. We compare the resulting generative models to stacked restricted Boltzmann machines and generative moment matching networks on real-world datasets.
          
    
              @inproceedings{jaini2016online,
    author = {Jaini, Priyank and Rashwan, Abdullah and Zhao, Han and Liu, Yue and Banijamali, Ershad and Chen, Zhitang and Poupart, Pascal},
    title = {Online Algorithms for Sum-Product Networks with Continuous Variables},
    booktitle = {Probabilistic Graphical Models},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {52},
    pages = {228--239},
    publisher = {JMLR.org},
    year = {2016}
    }
  • arXiv preprint
    Learning arbitrary sum-product network leaves with expectation-maximization
    Desana, Mattia and Schnoerr, Christoph
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks with complex probability distributions at the leaves have been shown to be powerful tractable-inference probabilistic models. However, while learning the internal parameters has been amply studied, learning complex leaf distributions is an open problem with only a few results available in special cases. In this paper we derive an efficient method to learn a very large class of leaf distributions with Expectation-Maximization. The EM updates have the form of simple weighted maximum likelihood problems, allowing the use of any distribution that can be learned with maximum likelihood, even approximately. The algorithm has cost linear in the model size and converges even if only partial optimizations are performed. We demonstrate this approach with experiments on twenty real-life datasets for density estimation, using tree graphical models as leaves. Our model outperforms state-of-the-art methods for parameter learning despite using SPNs with far fewer parameters.
          
    
              @article{desana2016learning,
    title={Learning arbitrary sum-product network leaves with expectation-maximization},
    author={Desana, Mattia and Schn{\"o}rr, Christoph},
    journal={arXiv preprint arXiv:1604.07243},
    year={2016}
    }
  • NIPS 2016
    A Unified Approach for Learning the Parameters of Sum-Product Networks
    Zhao, Han and Poupart, Pascal and Gordon, Geoffrey J.
    [ paper | code | bibtex | abstract ]
    
              We present a unified approach for learning the parameters of Sum-Product networks (SPNs). We prove that any complete and decomposable SPN is equivalent to a mixture of trees where each tree corresponds to a product of univariate distributions. Based on the mixture model perspective, we characterize the objective function when learning SPNs based on the maximum likelihood estimation (MLE) principle and show that the optimization problem can be formulated as a signomial program. We construct two parameter learning algorithms for SPNs by using sequential monomial approximations (SMA) and the concave-convex procedure (CCCP), respectively. The two proposed methods naturally admit multiplicative updates, hence effectively avoiding the projection operation. With the help of the unified framework, we also show that, in the case of SPNs, CCCP leads to the same algorithm as Expectation Maximization (EM) despite the fact that they are different in general.
          
    
              @inproceedings{zhao2016unified,
    author = {Zhao, Han and Poupart, Pascal and Gordon, Geoffrey J.},
    title = {A Unified Approach for Learning the Parameters of Sum-Product Networks},
    booktitle = {{NIPS}},
    pages = {433--441},
    year = {2016}
    }
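    The multiplicative updates mentioned above can be sketched for a single sum node in Python (an illustrative reading of the CCCP/EM update under assumed inputs, not the paper's code): each child weight is re-scaled by its expected share of the likelihood and then renormalized, so no projection back onto the simplex is needed.

        def cccp_update(weights, child_vals, node_derivs, root_vals):
            """weights: w_j; child_vals[x][j], node_derivs[x] = dS/dS_i(x), root_vals[x] = S(x)."""
            scores = [0.0] * len(weights)
            for cv, dS, S in zip(child_vals, node_derivs, root_vals):
                for j, w in enumerate(weights):
                    scores[j] += w * cv[j] * dS / S   # soft count for child j
            total = sum(scores)
            return [s / total for s in scores]        # stays on the simplex

        # Toy run: one sum node with two children, two data instances
        print(cccp_update([0.5, 0.5],
                          [[0.9, 0.1], [0.2, 0.8]],   # child values per instance
                          [1.0, 1.0],                  # dS/dS_i per instance
                          [0.5, 0.5]))                 # S(x) per instance -> [0.55, 0.45]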
  • Expert Syst. Appl. 2016
    Modeling spatial layout for scene image understanding via a novel multiscale sum-product network
    Yuan, Ze{-}Huan and Wang, Hao and Wang, Limin and Lu, Tong and Palaiahnakote, Shivakumara and Tan, Chew Lim
    [ paper | bibtex | abstract ]
    
              Semantic image segmentation is challenging due to the large intra-class variations and the complex spatial layouts inside natural scenes. This paper investigates this problem by designing a new deep architecture, called multiscale sum-product network (MSPN), which utilizes multiscale unary potentials as the inputs and models the spatial layouts of image content in a hierarchical manner. That is, the proposed MSPN models the joint distribution of multiscale unary potentials and object classes instead of single unary potentials in popular settings. Besides, MSPN characterizes scene spatial layouts in a fine-to-coarse manner to enforce consistency in labeling. Multiscale unary potentials at different scales can thus help overcome semantic ambiguities caused by only evaluating single local regions, while long-range spatial correlations can further refine image labeling. In addition, higher-order terms are able to impose constraints among labels. In this way, multiscale unary potentials, long-range spatial correlations, and higher-order priors are all modeled under a uniform framework in MSPN. We conduct experiments on two challenging benchmarks consisting of the MSRC-21 dataset and the SIFT FLOW dataset. The results demonstrate the superior performance of our method compared with previous graphical models for understanding scene images.
          
    
              @article{yuan2016modeling,
    author = {Yuan, Ze{-}Huan and Wang, Hao and Wang, Limin and Lu, Tong and Palaiahnakote, Shivakumara and Tan, Chew Lim},
    title = {Modeling spatial layout for scene image understanding via a novel
    multiscale sum-product network},
    journal = {Expert Syst. Appl.},
    volume = {63},
    pages = {231--240},
    year = {2016}
    }
  • UAI 2016
    Merging Strategies for Sum-Product Networks: From Trees to Graphs
    Rahman, Tahrima and Gogate, Vibhav
    [ paper | bibtex | abstract ]
    
              Learning the structure of sum-product networks (SPNs) - arithmetic circuits over latent and observed variables - has been the subject of much recent research. These networks admit linear time exact inference, and thus help alleviate one of the chief disadvantages of probabilistic graphical models: accurate probabilistic inference algorithms are often computationally expensive. Although algorithms for inducing their structure from data have come quite far and often outperform algorithms that induce probabilistic graphical models, a key issue with existing approaches is that they induce tree SPNs, a small, inefficient sub-class of SPNs. In this paper, we address this limitation by developing post-processing approaches that induce graph SPNs from tree SPNs by merging similar sub-structures. The key benefits of graph SPNs over tree SPNs include smaller computational complexity, which facilitates faster online inference, and better generalization accuracy because of reduced variance, at the cost of a slight increase in learning time. We demonstrate experimentally that our merging techniques significantly improve the accuracy of tree SPNs, achieving state-of-the-art performance on several real world benchmark datasets.
          
    
              @inproceedings{rahman2016merging,
    author = {Rahman, Tahrima and Gogate, Vibhav},
    title = {Merging Strategies for Sum-Product Networks: From Trees to Graphs},
    booktitle = {{UAI}},
    publisher = {{AUAI} Press},
    year = {2016}
    }
  • ICML 2016
    The Sum-Product Theorem: A Foundation for Learning Tractable Models
    Friesen, Abram L. and Domingos, Pedro M.
    [ paper | bibtex | abstract ]
    
              Inference in expressive probabilistic models is generally intractable, which makes them difficult to learn and limits their applicability. Sum-product networks are a class of deep models where, surprisingly, inference remains tractable even when an arbitrary number of hidden layers are present. In this paper, we generalize this result to a much broader set of learning problems: all those where inference consists of summing a function over a semiring. This includes satisfiability, constraint satisfaction, optimization, integration, and others. In any semiring, for summation to be tractable it suffices that the factors of every product have disjoint scopes. This unifies and extends many previous results in the literature. Enforcing this condition at learning time thus ensures that the learned models are tractable. We illustrate the power and generality of this approach by applying it to a new type of structured prediction problem: learning a nonconvex function that can be globally optimized in polynomial time. We show empirically that this greatly outperforms the standard approach of learning without regard to the cost of optimization.
          
    
              @inproceedings{friesen2016sum-product,
    author = {Friesen, Abram L. and Domingos, Pedro M.},
    title = {The Sum-Product Theorem: {A} Foundation for Learning Tractable Models},
    booktitle = {{ICML}},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {48},
    pages = {1909--1918},
    publisher = {JMLR.org},
    year = {2016}
    }
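    The semiring view above can be made concrete with a minimal Python sketch (hypothetical circuit representation): the very same decomposable circuit is evaluated under two (sum, product) pairs, yielding the probability of evidence in the ordinary semiring and the best weighted assignment in the max-product semiring.

        import operator

        def evaluate(node, plus, times, one):
            """Evaluate a circuit under an arbitrary commutative semiring."""
            if node["type"] == "leaf":
                return node["value"]
            vals = [evaluate(c, plus, times, one) for c in node["children"]]
            if node["type"] == "product":
                out = one
                for v in vals:
                    out = times(out, v)
                return out
            acc = None  # sum node
            for w, v in zip(node["weights"], vals):
                term = times(w, v)
                acc = term if acc is None else plus(acc, term)
            return acc

        circuit = {"type": "sum", "weights": [0.4, 0.6],
                   "children": [{"type": "leaf", "value": 0.5},
                                {"type": "leaf", "value": 0.25}]}
        print(evaluate(circuit, operator.add, operator.mul, 1.0))  # 0.35 (marginal)
        print(evaluate(circuit, max, operator.mul, 1.0))           # 0.20 (max-product)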
  • ICML 2016
    Collapsed Variational Inference for Sum-Product Networks
    Zhao, Han and Adel, Tameem and Gordon, Geoffrey J. and Amos, Brandon
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are probabilistic inference machines that admit exact inference in linear time in the size of the network. Existing parameter learning approaches for SPNs are largely based on the maximum likelihood principle and are subject to overfitting compared to more Bayesian approaches. Exact Bayesian posterior inference for SPNs is computationally intractable. Even approximation techniques such as standard variational inference and posterior sampling for SPNs are computationally infeasible for networks of moderate size due to the large number of local latent variables per instance. In this work, we propose a novel deterministic collapsed variational inference algorithm for SPNs that is computationally efficient, easy to implement and at the same time allows us to incorporate prior information into the optimization formulation. Extensive experiments show a significant improvement in accuracy compared with a maximum likelihood based approach.
          
    
              @inproceedings{zhao2016collapsed,
    author = {Zhao, Han and Adel, Tameem and Gordon, Geoffrey J. and Amos, Brandon},
    title = {Collapsed Variational Inference for Sum-Product Networks},
    booktitle = {{ICML}},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {48},
    pages = {1310--1318},
    publisher = {JMLR.org},
    year = {2016}
    }
  • AISTATS 2016
    Online and Distributed Bayesian Moment Matching for Parameter Learning in Sum-Product Networks
    Rashwan, Abdullah and Zhao, Han and Poupart, Pascal
    [ paper | bibtex | abstract ]
    
              Probabilistic graphical models provide a general and flexible framework for reasoning about complex dependencies in noisy domains with many variables. Among the various types of probabilistic graphical models, sum-product networks (SPNs) have recently generated some interest because exact inference can always be done in linear time with respect to the size of the network. This is particularly attractive since it means that learning an SPN from data always yields a tractable model for inference. However, existing parameter learning algorithms for SPNs operate in batch mode and do not scale easily to large datasets. In this work, we explore online algorithms to ensure that parameter learning can also be done tractably with respect to the amount of data. More specifically, we propose a new Bayesian moment matching (BMM) algorithm that operates naturally in an online fashion and that can be easily distributed. We demonstrate the effectiveness and scalability of BMM in comparison to online extensions of gradient descent, exponentiated gradient and expectation maximization on 20 classic benchmarks and 4 large scale datasets.
          
    
              @inproceedings{rashwan2016online,
    author = {Rashwan, Abdullah and Zhao, Han and Poupart, Pascal},
    title = {Online and Distributed Bayesian Moment Matching for Parameter Learning
    in Sum-Product Networks},
    booktitle = {{AISTATS}},
    series = {{JMLR} Workshop and Conference Proceedings},
    volume = {51},
    pages = {1469--1477},
    publisher = {JMLR.org},
    year = {2016}
    }
  • ICLR (Workshop) 2016
    A Minimalistic Approach to Sum-Product Network Learning for Real Applications
    Krakovna, Viktoriya and Looks, Moshe
    [ paper | bibtex | abstract ]
    
              Sum-Product Networks (SPNs) are a class of expressive yet tractable hierarchical graphical models. LearnSPN is a structure learning algorithm for SPNs that uses hierarchical co-clustering to simultaneously identify similar entities and similar features. The original LearnSPN algorithm assumes that all the variables are discrete and there is no missing data. We introduce a practical, simplified version of LearnSPN, MiniSPN, that runs faster and can handle missing data and heterogeneous features common in real applications. We demonstrate the performance of MiniSPN on standard benchmark datasets and on two datasets from Google's Knowledge Graph exhibiting high missingness rates and a mix of discrete and continuous features.
          
    
              @inproceedings{krakovna2016minimalistic, 
    title={A minimalistic approach to sum-product network learning for real applications},
    author={Krakovna, Viktoriya and Looks, Moshe},
    booktitle={Proceedings of ICLR Workshop},
    year={2016}
    }
  • AAMAS 2016
    Sum-product-max networks for tractable decision making
    Melibari, Mazen A and Poupart, Pascal and Doshi, Prashant
    [ paper | bibtex | abstract ]
    
              Investigations into probabilistic graphical models for decision making have predominantly centered on influence diagrams (IDs) and decision circuits (DCs) for representation and computation of decision rules that maximize expected utility. Since IDs are typically handcrafted and DCs are compiled from IDs, in this paper we propose an approach to learn the structure and parameters of decision-making problems directly from data. We present a new representation called sum-product-max network (SPMN) that generalizes a sum-product network (SPN) to the class of decision-making problems and whose solution, analogous to DCs, scales linearly in the size of the network. We show that SPMNs may be reduced to DCs linearly and present a first method for learning SPMNs from data. This approach is significant because it facilitates a novel paradigm of tractable decision making driven by data.
          
    
              @inproceedings{melibari2016sum, 
    title={Sum-product-max networks for tractable decision making},
    author={Melibari, Mazen A and Poupart, Pascal and Doshi, Prashant},
    booktitle={Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems},
    pages={1419--1420},
    year={2016}
    }
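    The evaluation semantics described above can be sketched in Python for chance and decision nodes only (hypothetical structures; product nodes and information constraints are omitted, so this is not the paper's algorithm): sum nodes take expectations over chance outcomes and max nodes pick the act with the highest expected utility, in one bottom-up pass.

        def meu(node):
            """Return the maximum expected utility of a tree of sum/max/utility nodes."""
            if node["type"] == "utility":
                return node["value"]
            vals = [meu(c) for c in node["children"]]
            if node["type"] == "sum":   # chance node: expectation over outcomes
                return sum(w * v for w, v in zip(node["weights"], vals))
            return max(vals)            # decision node: best available act

        # Toy decision: act A yields 0.6*10 + 0.4*0 = 6, act B yields a sure 5
        spmn = {"type": "max", "children": [
            {"type": "sum", "weights": [0.6, 0.4],
             "children": [{"type": "utility", "value": 10},
                          {"type": "utility", "value": 0}]},
            {"type": "utility", "value": 5}]}
        print(meu(spmn))  # 6.0 -> choose act A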
  • AAAI 2016
    Learning Tractable Probabilistic Models for Fault Localization
    Nath, Aniruddh and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models, and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs) that can be learned from data, and probabilistically infer the location of the bug. While most previous statistical debugging methods generalize over many executions of a single program, TFLMs are trained on a corpus of previously seen buggy programs, and learn to identify recurring patterns of bugs. Widely-used fault localization techniques such as TARANTULA evaluate the suspiciousness of each line in isolation; in contrast, a TFLM defines a joint probability distribution over buggy indicator variables for each line. Joint distributions with rich dependency structure are often computationally intractable; TFLMs avoid this by exploiting recent developments in tractable probabilistic models (specifically, Relational SPNs). Further, TFLMs can incorporate additional sources of information, including coverage-based features such as TARANTULA. We evaluate the fault localization performance of TFLMs that include TARANTULA scores as features in the probabilistic model. Our study shows that the learned TFLMs isolate bugs more effectively than previous statistical methods or using TARANTULA directly.
          
    
              @inproceedings{nath2016learning, 
    title={Learning Tractable Probabilistic Models for Fault Localization},
    author={Nath, Aniruddh and Domingos, Pedro M},
    booktitle={Thirtieth AAAI Conference on Artificial Intelligence},
    year={2016}
    }
  • TCSVT 2016
    Hierarchical Spatial Sum-Product Networks for Action Recognition in Still Images
    Wang, Jinghua and Wang, Gang
    [ paper | bibtex | abstract ]
    
              Recognizing actions from still images has been widely studied recently. In this paper, we model an action class as a flexible number of spatial configurations of body parts by proposing a new spatial sum-product network (SPN). First, we discover a set of parts in image collections via unsupervised learning. Then, our new spatial SPN is applied to model the spatial relationship and also the high-order correlations of parts. To learn robust networks, we further develop a hierarchical spatial SPN method, which models pairwise spatial relationships between parts inside subimages and models the correlation of subimages via extra layers of SPN. Our method is shown to be effective on two benchmark data sets.
          
    
              @article{wang2016hierarchical, 
    title={Hierarchical spatial sum--product networks for action recognition in still images},
    author={Wang, Jinghua and Wang, Gang},
    journal={IEEE Transactions on Circuits and Systems for Video Technology},
    volume={28},
    number={1},
    pages={90--100},
    year={2016},
    publisher={IEEE}
    }
  • Thesis
    Foundations of Sum-Product Networks for Probabilistic Modeling
    Peharz, Robert
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a promising and novel type of probabilistic model, which has been receiving significant attention in recent years. There are, however, several open questions regarding the foundations of SPNs and also some misconceptions in literature. In this thesis we provide new theoretical insights and aim to establish a more coherent picture of the principles of SPNs. As a first main contribution we show that, without loss of generality, SPN parameters can be assumed to be locally normalized, i.e. normalized for each individual sum node. Furthermore, we provide an algorithm which transforms unnormalized SPN-weights into locally normalized weights, without changing the represented distribution. Our second main contribution is concerned with the two notions of consistency and decomposability. Consistency, together with completeness, is required to enable efficient inference in SPNs over random variables with finitely many states. Instead of consistency, usually the conceptually easier and stronger condition of decomposability is used. As our second main contribution we show that consistent SPNs cannot represent distributions exponentially more compactly than decomposable SPNs. Furthermore, we point out that consistent SPNs are not amenable to Darwiche’s differential approach to inference. SPNs were originally defined over random variables with finitely many states, but can be easily generalized to SPNs over random variables with infinitely many states. As a third main contribution, we formally derive two inference mechanisms for generalized SPNs, which so far have been derived only for finite-state SPNs. These two inference scenarios are marginalization and the so-called evaluation of modified evidence. Our fourth main contribution is to make the latent variable interpretation of sum nodes in SPNs more precise. We point out a weak point about this interpretation in literature and propose a remedy for it. Furthermore, using the latent variable interpretation, we concretize the inference task of finding the most probable explanation (MPE) in the corresponding latent variable model. We show that the MPE inference algorithm proposed in literature suffers from a phenomenon which we call low-depth bias. We propose a corrected algorithm and show that it indeed delivers an MPE solution. While an MPE solution can be efficiently found in the augmented latent variable model, we show that MPE inference in SPNs is generally NP-hard. As another contribution, we point out that the EM algorithm for SPN-weights presented in literature is incorrect, and propose a corrected version. Furthermore, we discuss some learning approaches for SPNs and a rather powerful sub-class of SPNs, called selective SPNs, for which maximum-likelihood parameters can be found in closed form.
          
    
              @phdthesis{peharz2015foundations, 
    title={Foundations of sum-product networks for probabilistic modeling},
    author={Peharz, Robert},
    year={2015},
    school={Medical University of Graz}
    }
  • OM 2015
    Combining Sum-Product Network and Noisy-Or Model for Ontology Matching
    Li, Weizhuo
    [ paper | bibtex | abstract ]
    
              Ontology matching is the key challenge to achieve semantic interoperability in building the Semantic Web. We present an alternative probabilistic scheme, called GMap, which combines the sum-product network and the noisy-or model. More precisely, we employ the sum-product network to encode the similarities based on individuals and disjointness axioms across ontologies and calculate the contributions by maximum a posteriori inference. The noisy-or model is used to encode the probabilistic matching rules, which are independent of each other as well as of the value calculated by the sum-product network. Experiments show that GMap is competitive with many OAEI top-ranked systems. Furthermore, GMap, benefiting from these two graphical models, can keep inference tractable in the whole matching process.
          
    
              @inproceedings{li2015combining, 
    title={Combining sum-product network and noisy-or model for ontology matching.},
    author={Li, Weizhuo},
    booktitle={OM},
    pages={35--39},
    year={2015}
    }
  • ECML-PKDD 2015
    Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning
    Vergari, Antonio and Di Mauro, Nicola and Esposito, Floriana
    [ paper | supplemental | code | bibtex | abstract ]
    
              The need for feasible inference in Probabilistic Graphical Models (PGMs) has led to tractable models like Sum-Product Networks (SPNs). Their highly expressive power and their ability to provide exact and tractable inference make them very attractive for several real world applications, from computer vision to NLP. Recently, great attention around SPNs has focused on structure learning, leading to different algorithms being able to learn both the network and its parameters from data. Here, we enhance one of the best structure learners, LearnSPN, aiming to improve both the structural quality of the learned networks and their achieved likelihoods. Our algorithmic variations are able to learn simpler, deeper and more robust networks. These results have been obtained by exploiting some insights in the building process done by LearnSPN, by hybridizing the network adopting tree-structured models as leaves, and by blending bagging estimations into mixture creation. We prove our claims by empirically evaluating the learned SPNs on several benchmark datasets against other competitive SPN and PGM structure learners.
          
    
              @inproceedings{vergari2015simplifying, 
    title={Simplifying, regularizing and strengthening sum-product network structure learning},
    author={Vergari, Antonio and Di Mauro, Nicola and Esposito, Floriana},
    booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
    pages={343--358},
    year={2015},
    organization={Springer}
    }
  • UAI 2015
    Learning and Inference in Tractable Probabilistic Knowledge Bases
    Niepert, Mathias and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Building efficient large-scale knowledge bases (KBs) is a longstanding goal of AI. KBs need to be first-order to be sufficiently expressive, and probabilistic to handle uncertainty, but these lead to intractable inference. Recently, tractable Markov logic (TML) was proposed as a non-trivial tractable first-order probabilistic representation. This paper describes the first inference and learning algorithms for TML, and its application to real-world problems. Inference takes time per query sublinear in the size of the KB, and supports very large KBs via parallelization and a disk-based implementation using a relational database engine. Query answering is fast enough for interactive and real-time use. We show that, despite the data being non-i.i.d. in general, maximum likelihood parameters for TML knowledge bases can be computed in closed form. We use our algorithms to build a very large tractable probabilistic KB from numerous heterogeneous data sets. The KB includes millions of objects and billions of parameters. Our experiments show that the learned KB outperforms existing specialized approaches on challenging tasks in information extraction and integration.
          
    
              @inproceedings{niepert2015learning, 
    title={Learning and Inference in Tractable Probabilistic Knowledge Bases},
    author={Niepert, Mathias and Domingos, Pedro M},
    booktitle={UAI},
    pages={632--641},
    year={2015}
    }
  • UAI 2015
    Learning the Structure of Sum-Product Networks via an SVD-based Algorithm
    Adel, Tameem and Balduzzi, David and Ghodsi, Ali
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a recently developed class of deep probabilistic models where inference is tractable. We present two new structure learning algorithms for sum-product networks, in the generative and discriminative settings, that are based on recursively extracting rank-one submatrices from data. The proposed algorithms find the subSPNs that are the most coherent jointly in the instances and variables - that is, whose instances are most strongly correlated over the given variables. Experimental results show that SPNs learned using the proposed generative algorithm have better likelihood and inference results - while also being much faster - than previous approaches. Finally, we apply the discriminative SPN structure learning algorithm to handwritten digit recognition tasks, where it achieves state-of-the-art performance for an SPN.
          
    
              @inproceedings{adel2015learning, 
    title={Learning the Structure of Sum-Product Networks via an SVD-based Algorithm},
    author={Adel, Tameem and Balduzzi, David and Ghodsi, Ali},
    booktitle={UAI},
    pages={32--41},
    year={2015}
    }
  • ICML 2015
    On the Relationship between Sum-Product Networks and Bayesian Networks
    Zhao, Han and Melibari, Mazen and Poupart, Pascal
    [ paper | supplemental | bibtex | abstract ]
    
              In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, we introduce the notion of normal SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound of the tree-width of its corresponding BN.
          
    
              @inproceedings{zhao2015relationship, 
    title={On the relationship between sum-product networks and Bayesian networks},
    author={Zhao, Han and Melibari, Mazen and Poupart, Pascal},
    booktitle={International Conference on Machine Learning},
    pages={116--124},
    year={2015}
    }
  • AAAI 2015
    Learning Relational Sum-Product Networks
    Nath, Aniruddh and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a recently-proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning high-treewidth tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, based on Gens and Domingos' LearnSPN algorithm for propositional SPN learning. We evaluate the algorithm on three datasets; the RSPN learning algorithm outperforms Markov Logic Networks in both running time and predictive accuracy.
          
    
              @inproceedings{nath2015learning, 
    title={Learning relational sum-product networks},
    author={Nath, Aniruddh and Domingos, Pedro M},
    booktitle={Twenty-Ninth AAAI Conference on Artificial Intelligence},
    year={2015}
    }
  • AISTATS 2015
    On Theoretical Properties of Sum-Product Networks
    Peharz, Robert and Tschiatschek, Sebastian and Pernkopf, Franz and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a promising avenue for probabilistic modeling and have been successfully applied to various tasks. However, some theoretic properties about SPNs are not yet well understood. In this paper we fill some gaps in the theoretic foundation of SPNs. First, we show that the weights of any complete and consistent SPN can be transformed into locally normalized weights without changing the SPN distribution. Second, we show that consistent SPNs cannot model distributions significantly (exponentially) more compactly than decomposable SPNs. As a third contribution, we extend the inference mechanisms known for SPNs with finite states to generalized SPNs with arbitrary input distributions.
          
    
              @inproceedings{peharz2015theoretical, 
    title={On theoretical properties of sum-product networks},
    author={Peharz, Robert and Tschiatschek, Sebastian and Pernkopf, Franz and Domingos, Pedro},
    booktitle={Artificial Intelligence and Statistics},
    pages={744--752},
    year={2015}
    }
  • TPAMI 2015
    Sum Product Networks for Activity Recognition
    Amer, Mohamed R. and Todorovic, Sinisa
    [ paper | bibtex | abstract ]
    
              This paper addresses detection and localization of human activities in videos. We focus on activities that may have variable spatiotemporal arrangements of parts, and numbers of actors. Such activities are represented by a sum-product network (SPN). A product node in SPN represents a particular arrangement of parts, and a sum node represents alternative arrangements. The sums and products are hierarchically organized, and grounded onto space-time windows covering the video. The windows provide evidence about the activity classes based on the Counting Grid (CG) model of visual words. This evidence is propagated bottom-up and top-down to parse the SPN graph for the explanation of the video. The node connectivity and model parameters of SPN and CG are jointly learned under two settings, weakly supervised, and supervised. For evaluation, we use our new Volleyball dataset, along with the benchmark datasets VIRAT, UT-Interactions, KTH, and TRECVID MED 2011. Our video classification and activity localization are superior to those of the state of the art on these datasets.
          
    
              @article{amer2015sum, 
    title={Sum product networks for activity recognition},
    author={Amer, Mohamed R and Todorovic, Sinisa},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    volume={38},
    number={4},
    pages={800--813},
    year={2015},
    publisher={IEEE}
    }
  • IJCAI 2015
    Greedy Structure Search for Sum-Product Networks
    Dennis, Aaron and Ventura, Dan
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are rooted, directed acyclic graphs (DAGs) of sum and product nodes with well-defined probabilistic semantics. Moreover, exact inference in the distribution represented by an SPN is guaranteed to take linear time in the size of the DAG. In this paper we introduce an algorithm that learns the structure of an SPN using a greedy search approach. It incorporates methods used in a previous SPN structure-learning algorithm, but, unlike the previous algorithm, is not limited to learning tree-structured SPNs. Several proven ideas from circuit complexity theory along with our experimental results provide evidence for the advantages of SPNs with less-restrictive, nontree structures.
          
    
              @inproceedings{dennis2015greedy, 
    title={Greedy structure search for sum-product networks},
    author={Dennis, Aaron and Ventura, Dan},
    booktitle={Twenty-Fourth International Joint Conference on Artificial Intelligence},
    year={2015}
    }
  • IJCAI 2015
    Recursive Decomposition for Nonconvex Optimization
    Friesen, Abram L. and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent sub-functions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding.
          
    
              @inproceedings{friesen2015recursive, 
    title={Recursive decomposition for nonconvex optimization},
    author={Friesen, Abram L and Domingos, Pedro},
    booktitle={Twenty-Fourth International Joint Conference on Artificial Intelligence},
    year={2015}
    }
  • Preprint arXiv
    On the Expressive Efficiency of Sum Product Networks
    Martens, James and Medabalimi, Venkatesh
    [ paper | bibtex | abstract ]
    
              Sum Product Networks (SPNs) are a recently developed class of deep generative models which compute their associated unnormalized density functions using a special type of arithmetic circuit. When certain sufficient conditions, called the decomposability and completeness conditions (or 'D&C' conditions), are imposed on the structure of these circuits, marginal densities and other useful quantities, which are typically intractable for other deep generative models, can be computed by what amounts to a single evaluation of the network (which is a property known as 'validity'). However, the effect that the D&C conditions have on the capabilities of D&C SPNs is not well understood. In this work we analyze the D&C conditions, expose the various connections that D&C SPNs have with multilinear arithmetic circuits, and consider the question of how well they can capture various distributions as a function of their size and depth. Among our various contributions is a result which establishes the existence of a relatively simple distribution with fully tractable marginal densities which cannot be efficiently captured by D&C SPNs of any depth, but which can be efficiently captured by various other deep generative models. We also show that with each additional layer of depth permitted, the set of distributions which can be efficiently captured by D&C SPNs grows in size. This kind of 'depth hierarchy' property has been widely conjectured to hold for various deep models, but has never been proven for any of them. Some of our other contributions include a new characterization of the D&C conditions as sufficient and necessary ones for a slightly strengthened notion of validity, and various state-machine characterizations of the types of computations that can be performed efficiently by D&C SPNs.
          
    
              @article{martens2014expressive, 
    title={On the expressive efficiency of sum product networks},
    author={Martens, James and Medabalimi, Venkatesh},
    journal={arXiv preprint arXiv:1411.7717},
    year={2014}
    }
  • INTERSPEECH 2014
    Language Modeling with Sum-Product Networks
    Cheng, Wei-Chen and Kok, Stanley and Pham, Hoai Vu and Chieu, Hai Leong and Chai, Kian Ming A
    [ paper | code | bibtex | abstract ]
    
              Sum product networks (SPNs) are a new class of deep probabilistic models. They can contain multiple hidden layers while keeping their inference and training times tractable. An SPN consists of interleaving layers of sum nodes and product nodes. A sum node can be interpreted as a hidden variable, and a product node can be viewed as a feature capturing rich interactions among an SPN's inputs. We show that the ability of SPN to use hidden layers to model complex dependencies among words, and its tractable inference and learning times, make it a suitable framework for a language model. Even though SPNs have been applied to a variety of vision problems, we are the first to use it for language modeling. Our empirical comparisons with six previous language models indicate that our SPN has superior performance.
          
    
              @inproceedings{cheng2014language, 
    title={Language modeling with sum-product networks},
    author={Cheng, Wei-Chen and Kok, Stanley and Pham, Hoai Vu and Chieu, Hai Leong and Chai, Kian Ming A.},
    booktitle={Fifteenth Annual Conference of the International Speech Communication Association},
    year={2014}
    }
  • ICASSP 2014
    Modeling Speech with Sum-Product Networks: Application to Bandwidth Extension
    Peharz, Robert and Kapeller, Georg and Mowlaee, Pejman and Pernkopf, Franz
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a recently proposed type of probabilistic graphical models allowing complex variable interactions while still granting efficient inference. In this paper we demonstrate the suitability of SPNs for modeling log-spectra of speech signals using the application of artificial bandwidth extension, i.e. artificially replacing the high-frequency content which is lost in telephone signals. We use SPNs as observation models in hidden Markov models (HMMs), which model the temporal evolution of log short-time spectra. Missing frequency bins are replaced by the SPNs using most-probable-explanation inference, where the state-dependent reconstructions are weighted with the HMM state posterior.
          
    
              @inproceedings{peharz2014modeling, 
    title={Modeling speech with sum-product networks: Application to bandwidth extension},
    author={Peharz, Robert and Kapeller, Georg and Mowlaee, Pejman and Pernkopf, Franz},
    booktitle={2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    pages={3699--3703},
    year={2014},
    organization={IEEE}
    }
  • LTPM Workshop 2014
    Non-Parametric Bayesian Sum-Product Networks
    Lee, Sang-Woo and Watkins, Christopher and Zhang, Byoung-Tak
    [ paper | bibtex | abstract ]
    
              We define two non-parametric models for Sum-Product Networks (SPNs) (Poon & Domingos, 2011). The first is a tree structure of Dirichlet Processes; the second is a DAG of hierarchical Dirichlet Processes. These generative models for data implicitly define a prior distribution on SPNs of tree and of DAG structure. They allow MCMC fitting of data to SPN models, and the learning of SPN structure from data.
          
    
              @inproceedings{lee2014non, 
    title={Non-parametric Bayesian sum-product networks},
    author={Lee, Sang-Woo and Watkins, Christopher and Zhang, Byoung-Tak},
    booktitle={ICML Workshop on Learning Tractable Probabilistic Models},
    volume={32},
    year={2014},
    organization={Citeseer}
    }
  • LTPM Workshop 2014
    Sum-Product Networks for Structured Prediction: Context-Specific Deep Conditional Random Fields
    Ratajczak, Martin and Tschiatschek, Sebastian and Pernkopf, Franz
    [ paper | bibtex | abstract ]
    
              Linear-chain conditional random fields (LC-CRFs) have been successfully applied in many structured prediction tasks. Many previous extensions, e.g. replacing local factors by neural networks, are computationally demanding. In this paper, we extend conventional LC-CRFs by replacing the local factors with sum-product networks, i.e. a promising new deep architecture allowing for exact and efficient inference. The proposed local factors can be interpreted as an extension of Gaussian mixture models (GMMs). Thus, we provide a powerful alternative to LC-CRFs extended by GMMs. In extensive experiments, we achieved performance competitive to state-of-the-art methods in phone classification and optical character recognition tasks.
          
    
              @inproceedings{ratajczak2014sum, 
    title={Sum-product networks for structured prediction: Context-specific deep conditional random fields},
    author={Ratajczak, Martin and Tschiatschek, Sebastian and Pernkopf, Franz},
    booktitle={International Conference on Machine Learning (ICML) Workshop on Learning Tractable Probabilistic Models},
    year={2014}
    }
  • StarAI Workshop 2014
    Learning Tractable Statistical Relational Models
    Nath, Aniruddh and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a recently proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling the probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, similar to the LearnSPN algorithm for propositional SPN learning. In preliminary experiments, RSPNs outperform Markov Logic Networks in both running time and predictive accuracy.
          
    
              @inproceedings{nath2014learning, 
    title={Learning Tractable Statistical Relational Models},
    author={Nath, Aniruddh and Domingos, Pedro M},
    booktitle={AAAI Workshop: Statistical Relational Artificial Intelligence},
    year={2014},
    organization={Citeseer}
    }
  • LTPM Workshop 2014
    Learning Selective Sum-Product Networks
    Peharz, Robert and Gens, Robert and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              We consider the selectivity constraint on the structure of sum-product networks (SPNs), which allows each sum node to have at most one child with non-zero output for each possible input. This allows us to find globally optimal maximum likelihood parameters in closed form. Although being a constrained class of SPNs, these models still strictly generalize classical graphical models such as Bayesian networks. Closed form parameter estimation opens the door for structure learning using a principled scoring function, trading off training likelihood and model complexity. In experiments we show that these models are easy to learn and compete well with state of the art.
          
    
              @inproceedings{peharz2014learning, 
    title={Learning selective sum-product networks},
    author={Peharz, Robert and Gens, Robert and Domingos, Pedro},
    booktitle={LTPM workshop},
    year={2014}
    }
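
    Selectivity turns maximum-likelihood estimation into counting: for each complete input at most one child of a sum node is non-zero, so each weight is just the fraction of training examples routed through that child. A hedged sketch of this closed form, assuming a caller-supplied active_child(sum_node, x) oracle derived from the selective structure (a hypothetical helper, not the paper's API):

      from collections import Counter, defaultdict

      def ml_weights(sum_nodes, data, active_child):
          # Count how often each child is the unique non-zero one per sum node;
          # the relative frequencies are the globally optimal ML weights.
          counts = defaultdict(Counter)
          for x in data:
              for node in sum_nodes:
                  child = active_child(node, x)   # None if node is inactive on x
                  if child is not None:
                      counts[node][child] += 1
          return {node: {child: n / sum(ctr.values()) for child, n in ctr.items()}
                  for node, ctr in counts.items()}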
  • ICML 2014
    Learning Sum-Product Networks with Direct and Indirect Interactions
    Rooshenas, Amirmohammad and Lowd, Daniel
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a deep probabilistic representation that allows for efficient, exact inference. SPNs generalize many other tractable models, including thin junction trees, latent tree models, and many types of mixtures. Previous work on learning SPN structure has mainly focused on using top-down or bottom-up clustering to find mixtures, which capture variable interactions indirectly through implicit latent variables. In contrast, most work on learning graphical models, thin junction trees, and arithmetic circuits has focused on finding direct interactions among variables. In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches. In experiments on 20 benchmark datasets, we find that the combination of direct and indirect interactions leads to significantly better accuracy than several state-of-the-art algorithms for learning SPNs and other tractable models.
          
    
              @inproceedings{rooshenas2014learning, 
    title={Learning sum-product networks with direct and indirect interactions},
    author={Rooshenas, Amirmohammad and Lowd, Daniel},
    booktitle={Proceedings of the Thirty-First International Conference on Machine Learning},
    year={2014},
    organization={Citeseer}
    }
  • ICONIP 2013
    Online Incremental Structure Learning of Sum-Product Networks
    Lee, Sang-Woo and Heo, Min-Oh and Zhang, Byoung-Tak
    [ paper | bibtex | abstract ]
    
              Sum-product networks (SPNs) are deep architectures that can learn and infer at low computational costs. The structure of SPNs is especially important for their performance; however, structure learning for SPNs has until now been introduced only for batch-type datasets. In this study, we propose a new online incremental structure learning method for SPNs. We note that SPNs can be represented by mixtures of basis distributions. Online learning of SPNs can then be formulated as an online clustering problem, in which locally assigning an instance corresponds to incrementally modifying the tree structure of the SPN. In the method, the number of hidden units, and even the number of layers, evolves dynamically with incoming data. The experimental results show that the proposed method outperforms the online version of the previous method. In addition, it achieves the performance of batch structure learning.
          
    
              @inproceedings{lee2013online, 
    title={Online incremental structure learning of sum-product networks},
    author={Lee, Sang-Woo and Heo, Min-Oh and Zhang, Byoung-Tak},
    booktitle={International Conference on Neural Information Processing},
    pages={220--227},
    year={2013},
    organization={Springer}
    }
  • ECML-PKDD 2013
    Greedy Part-wise Learning of Sum-Product Networks
    Peharz, Robert and Geiger, Bernhard C. and Pernkopf, Franz
    [ paper | bibtex | abstract ]
    
              Sum-product networks allow modeling of complex variable interactions while still granting efficient inference. However, most learning algorithms proposed so far are explicitly or implicitly restricted to the image domain, either by assuming variable neighborhood or by assuming that dependent variables are related by their magnitudes over the training set. In this paper, we introduce a novel algorithm that learns the structure and parameters of sum-product networks in a greedy bottom-up manner. Our algorithm iteratively merges probabilistic models of small variable scope into larger and more complex models. These merges are guided by statistical dependence tests, and parameters are learned using a maximum mutual information principle. In experiments our method competes well with the existing learning algorithms for sum-product networks on the task of reconstructing covered image regions, and outperforms these when neither neighborhood nor correlations by magnitude can be assumed.
          
    
              @inproceedings{peharz2013greedy, 
    title={Greedy part-wise learning of sum-product networks},
    author={Peharz, Robert and Geiger, Bernhard C and Pernkopf, Franz},
    booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
    pages={612--627},
    year={2013},
    organization={Springer}
    }
  • ICML 2013
    Learning the Structure of Sum-Product Networks
    Gens, Robert and Domingos, Pedro
    [ paper | code | bibtex | abstract ]
    
              Sum-product networks (SPNs) are a new class of deep probabilistic models. SPNs can have unbounded treewidth but inference in them is always tractable. An SPN is either a univariate distribution, a product of SPNs over disjoint variables, or a weighted sum of SPNs over the same variables. We propose the first algorithm for learning the structure of SPNs that takes full advantage of their expressiveness. At each step, the algorithm attempts to divide the current variables into approximately independent subsets. If successful, it returns the product of recursive calls on the subsets; otherwise it returns the sum of recursive calls on subsets of similar instances from the current training set. A comprehensive empirical study shows that the learned SPNs are typically comparable to graphical models in likelihood but superior in inference speed and accuracy.
          
    
              @inproceedings{gens2013learning, 
    title={Learning the structure of sum-product networks},
    author={Gens, Robert and Domingos, Pedro},
    booktitle={International Conference on Machine Learning},
    pages={873--880},
    year={2013}
    }
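
    The recursion sketched in this abstract is short enough to state as code. Below is a skeletal, hedged rendering of LearnSPN, where fit_univariate, independent_subsets, cluster_instances and project are placeholders standing in for the paper's independence testing, instance clustering and data bookkeeping (Sum and Product as in the earlier sketch):

      def learn_spn(data, variables):
          # Base case: a single variable becomes a univariate leaf distribution.
          if len(variables) == 1:
              return fit_univariate(data, variables[0])
          # Try to partition the variables into approximately independent groups.
          groups = independent_subsets(data, variables)
          if len(groups) > 1:
              return Product([learn_spn(project(data, g), g) for g in groups])
          # Otherwise cluster the instances and mix the recursive models,
          # weighting each child by its cluster's share of the training data.
          clusters = cluster_instances(data)
          return Sum([learn_spn(c, variables) for c in clusters],
                     [len(c) / len(data) for c in clusters])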
  • NIPS 2012
    Discriminative Learning of Sum-Product Networks
    Gens, Robert and Domingos, Pedro
    [ paper | bibtex | abstract ]
    
              Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using 'hard' gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.
          
    
              @inproceedings{gens2012discriminative, 
    title={Discriminative learning of sum-product networks},
    author={Gens, Robert and Domingos, Pedro},
    booktitle={Advances in Neural Information Processing Systems},
    pages={3239--3247},
    year={2012}
    }
  • NIPS 2012
    Learning the Architecture of Sum-Product Networks using Clustering on Variables
    Dennis, Aaron and Ventura, Dan
    [ paper | bibtex | abstract ]
    
              The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture.
          
    
              @inproceedings{dennis2012learning, 
    title={Learning the architecture of sum-product networks using clustering on variables},
    author={Dennis, Aaron and Ventura, Dan},
    booktitle={Advances in Neural Information Processing Systems},
    pages={2033--2041},
    year={2012}
    }
  • StarAI 2012
    A Dynamic Programming Algorithm for Inference in Recursive Probabilistic Programs
    Stuhlmueller, Andreas and Goodman, Noah D.
    [ paper | bibtex | abstract ]
    
              We describe a dynamic programming algorithm for computing the marginal distribution of discrete probabilistic programs. This algorithm takes a functional interpreter for an arbitrary probabilistic programming language and turns it into an efficient marginalizer. Because direct caching of sub-distributions is impossible in the presence of recursion, we build a graph of dependencies between sub-distributions. This factored sum-product network makes (potentially cyclic) dependencies between subproblems explicit, and corresponds to a system of equations for the marginal distribution. We solve these equations by fixed-point iteration in topological order. We illustrate this algorithm on examples used in teaching probabilistic models, computational cognitive science research, and game theory.
          
    
              @article{stuhlmuller2012dynamic, 
    title={A dynamic programming algorithm for inference in recursive probabilistic programs},
    author={Stuhlm{\"u}ller, Andreas and Goodman, Noah D},
    journal={Workshop on Statistical Relational AI (StarAI)},
    year={2012}
    }
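
    A toy instance of the idea (my example, not one from the paper): for the recursive program f() = True with probability p, else f() and f(), the marginal q = P(f() = True) cannot be obtained by caching sub-calls, but it satisfies the cyclic equation q = p + (1 - p) q^2, which fixed-point iteration solves from below:

      def marginal_true(p, iters=200):
          # Iterate q <- p + (1 - p) * q**2 from q = 0; this converges to the
          # least fixed point, which is the marginal defined by the program.
          q = 0.0
          for _ in range(iters):
              q = p + (1 - p) * q * q
          return q

      print(marginal_true(0.4))   # ~0.6667, the least root of 0.6*q**2 - q + 0.4 = 0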
  • CVPR 2012
    Sum-Product Networks for Modeling Activities with Stochastic Structure
    Amer, Mohamed R. and Todorovic, Sinisa
    [ paper | bibtex | abstract ]
    
              This paper addresses recognition of human activities with stochastic structure, characterized by variable spacetime arrangements of primitive actions, and conducted by a variable number of actors. We demonstrate that modeling aggregate counts of visual words is surprisingly expressive enough for such a challenging recognition task. An activity is represented by a sum-product network (SPN). SPN is a mixture of bags-of-words (BoWs) with exponentially many mixture components, where subcomponents are reused by larger ones. SPN consists of terminal nodes representing BoWs, and product and sum nodes organized in a number of layers. The products are aimed at encoding particular configurations of primitive actions, and the sums serve to capture their alternative configurations. The connectivity of SPN and parameters of BoW distributions are learned under weak supervision using the EM algorithm. SPN inference amounts to parsing the SPN graph, which yields the most probable explanation (MPE) of the video in terms of activity detection and localization. SPN inference has linear complexity in the number of nodes, under fairly general conditions, enabling fast and scalable recognition. A new Volleyball dataset is compiled and annotated for evaluation. Our classification accuracy and localization precision and recall are superior to those of the state-of-the-art on the benchmark and our Volleyball datasets.
          
    
              @inproceedings{amer2012sum, 
    title={Sum-product networks for modeling activities with stochastic structure},
    author={Amer, Mohamed R and Todorovic, Sinisa},
    booktitle={2012 IEEE Conference on Computer Vision and Pattern Recognition},
    pages={1314--1321},
    year={2012},
    organization={IEEE}
    }
  • NIPS 2011
    Shallow vs. Deep Sum-Product Networks
    Delalleau, Olivier and Bengio, Yoshua
    [ paper | bibtex | abstract ]
    
              We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures. We prove there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units. Such results were not available until now, and contribute to motivate recent research involving learning of deep sum-product networks, and more generally motivate research in Deep Learning.
          
    
              @inproceedings{delalleau2011shallow, 
    title={Shallow vs. deep sum-product networks},
    author={Delalleau, Olivier and Bengio, Yoshua},
    booktitle={Advances in Neural Information Processing Systems},
    pages={666--674},
    year={2011}
    }
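
    The flavor of such separations can be seen in a standard example (an illustration in the spirit of the paper, not its exact construction): a deep network keeps a product of sums factorized, while a shallow sum-of-products layer must multiply it out.

      % Deep (product-of-sums) form: O(n) sum and product units suffice.
      \[
        F_n(x_1,\dots,x_{2n}) \;=\; \prod_{i=1}^{n} \left( x_{2i-1} + x_{2i} \right)
      \]
      % Shallow (sum-of-products) form: one product unit per monomial, and every
      % monomial picks one variable from each factor, giving 2^n product units.
      \[
        F_n \;=\; \sum_{s \in \{0,1\}^n} \prod_{i=1}^{n} x_{2i - s_i}
      \]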
  • UAI 2011
    Sum-Product Networks: a New Deep Architecture
    Poon, Hoifung and Domingos, Pedro
    [ paper | code | bibtex | abstract ]
    
              The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. We show that if an SPN is complete and consistent it represents the partition function and all marginals of some graphical model, and give semantics to its nodes. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. We then propose learning algorithms for SPNs, based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform image completion better than state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex.
          
    
              @inproceedings{poon2011sum, 
    title={Sum-product networks: A new deep architecture},
    author={Poon, Hoifung and Domingos, Pedro},
    booktitle={UAI},
    year={2011}
    }
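
    The single bottom-up pass that makes SPNs tractable is worth seeing concretely. The sketch below is a minimal illustration with indicator leaves (classes restated for self-containment; a Leaf(var, state) outputs 1 whenever its variable is unobserved, so marginalization is just evidence omission). For DAG-shaped SPNs the recursive calls should be memoized to keep evaluation linear in the circuit size.

      class Leaf:
          def __init__(self, var, state):
              self.var, self.state = var, state

      class Product:
          def __init__(self, children):
              self.children = children

      class Sum:
          def __init__(self, children, weights):
              self.children, self.weights = children, weights

      def value(node, evidence):
          # One bottom-up pass evaluates the network polynomial at the evidence;
          # variables missing from `evidence` have all their indicators at 1.
          if isinstance(node, Leaf):
              observed = evidence.get(node.var)
              return 1.0 if observed is None or observed == node.state else 0.0
          if isinstance(node, Product):
              result = 1.0
              for child in node.children:
                  result *= value(child, evidence)
              return result
          return sum(w * value(c, evidence)
                     for w, c in zip(node.weights, node.children))

      def probability(root, evidence):
          # value(root, {}) is the partition function of the underlying model.
          return value(root, evidence) / value(root, {})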
AND/OR graphs [ aogs ]
  • J. Artif. Intell. Res. 2008
    AND/OR Multi-Valued Decision Diagrams (AOMDDs) for Graphical Models
    Mateescu, Robert and Dechter, Rina and Marinescu, Radu
    [ paper | bibtex | abstract ]
    
              Inspired by the recently introduced framework of AND/OR search spaces for graphical models, we propose to augment Multi-Valued Decision Diagrams (MDD) with AND nodes, in order to capture function decomposition structure and to extend these compiled data structures to general weighted graphical models (e.g., probabilistic models). We present the AND/OR Multi-Valued Decision Diagram (AOMDD) which compiles a graphical model into a canonical form that supports polynomial (e.g., solution counting, belief updating) or constant time (e.g. equivalence of graphical models) queries. We provide two algorithms for compiling the AOMDD of a graphical model. The first is search-based, and works by applying reduction rules to the trace of the memory-intensive AND/OR search algorithm. The second is inference-based and uses a Bucket Elimination schedule to combine the AOMDDs of the input functions via the APPLY operator. For both algorithms, the compilation time and the size of the AOMDD are, in the worst case, exponential in the treewidth of the graphical model, rather than pathwidth as is known for ordered binary decision diagrams (OBDDs). We introduce the concept of semantic treewidth, which helps explain why the size of a decision diagram is often much smaller than the worst case bound. We provide an experimental evaluation that demonstrates the potential of AOMDDs.
          
    
              @article{mateescu2008andor,
    author = {Mateescu, Robert and Dechter, Rina and Marinescu, Radu},
    title = {{AND/OR} Multi-Valued Decision Diagrams (AOMDDs) for Graphical Models},
    journal = {J. Artif. Intell. Res.},
    volume = {33},
    pages = {465--519},
    year = {2008}
    }
  • Artif. Intell. 2007
    AND/OR search spaces for graphical models
    Dechter, Rina and Mateescu, Robert
    [ paper | bibtex | abstract ]
    
              The paper introduces an AND/OR search space perspective for graphical models that include probabilistic networks (directed or undirected) and constraint networks. In contrast to the traditional (OR) search space view, the AND/OR search tree displays some of the independencies present in the graphical model explicitly and may sometimes reduce the search space exponentially. Indeed, most algorithmic advances in search-based constraint processing and probabilistic inference can be viewed as searching an AND/OR search tree or graph. Familiar parameters such as the depth of a spanning tree, treewidth and pathwidth are shown to play a key role in characterizing the effect of AND/OR search graphs vs. the traditional OR search graphs. We compare memory intensive AND/OR graph search with inference methods, and place various existing algorithms within the AND/OR search space.
          
    
              @article{dechter2007andor,
    author = {Dechter, Rina and Mateescu, Robert},
    title = {{AND/OR} search spaces for graphical models},
    journal = {Artif. Intell.},
    volume = {171},
    number = {2-3},
    pages = {73--106},
    year = {2007}
    }
Probabilistic decision graphs [ pdgs ]
  • Probabilistic Graphical Models 2014
    Supervised Classification Using Hybrid Probabilistic Decision Graphs
    Fernandez, Antonio and Rumi, Rafael and del Sagrado, Jose and Salmeron, Antonio
    [ paper | bibtex | abstract ]
    
              In this paper we analyse the use of probabilistic decision graphs in supervised classification problems. We enhance existing models with the ability to operate in hybrid domains, where discrete and continuous variables coexist. Our proposal is based on the use of mixtures of truncated basis functions. We first introduce a new type of probabilistic graphical model, namely probabilistic decision graphs with mixture of truncated basis functions distribution, and then present an initial experimental evaluation where our proposal is compared with state-of-the-art Bayesian classifiers, showing promising behaviour.
          
    
              @inproceedings{fernandez2014supervised,
    author = {Fern{\'{a}}ndez, Antonio and Rum{\'{i}}, Rafael and del Sagrado, Jos{\'{e}} and Salmer{\'{o}}n, Antonio},
    title = {Supervised Classification Using Hybrid Probabilistic Decision Graphs},
    booktitle = {Probabilistic Graphical Models},
    series = {Lecture Notes in Computer Science},
    volume = {8754},
    pages = {206--221},
    publisher = {Springer},
    year = {2014}
    }
  • Int. J. Approx. Reason. 2012
    Modelling and inference with Conditional Gaussian Probabilistic Decision Graphs
    Dalgaard Nielsen, Jens and Gamez, Jose A. and Salmeron, Antonio
    [ paper | bibtex | abstract ]
    
              Probabilistic Decision Graphs (PDGs) are probabilistic graphical models that represent a factorisation of a discrete joint probability distribution using a “decision graph”-like structure over local marginal parameters. The structure of a PDG enables the model to capture some context specific independence relations that are not representable in the structure of more commonly used graphical models such as Bayesian networks and Markov networks. This sometimes makes operations in PDGs more efficient than in alternative models. PDGs have previously been defined only in the discrete case, assuming a multinomial joint distribution over the variables in the model. We extend PDGs to incorporate continuous variables, by assuming a Conditional Gaussian (CG) joint distribution. We also show how inference can be carried out in an efficient way.
          
    
              @article{nielsen2012modelling,
    author = {Dalgaard Nielsen, Jens and G{'{a}}mez, Jos{'{e}} A. and Salmer{'{o}}n, Antonio},
    title = {Modelling and inference with Conditional Gaussian Probabilistic Decision
    Graphs},
    journal = {Int. J. Approx. Reason.},
    volume = {53},
    number = {7},
    pages = {929--945},
    year = {2012}
    }
  • Comput. Stat. Data Anal. 2009
    Supervised classification using probabilistic decision graphs
    Nielsen, Jens Dalgaard and Rumi, Rafael and Salmeron, Antonio
    [ paper | bibtex | abstract ]
    
              A new model for supervised classification based on probabilistic decision graphs is introduced. A probabilistic decision graph (PDG) is a graphical model that efficiently captures certain context specific independencies that are not easily represented by other graphical models traditionally used for classification, such as the Naïve Bayes (NB) or Classification Trees (CT). This means that the PDG model can capture some distributions using fewer parameters than classical models. Two approaches for constructing a PDG for classification are proposed. The first is to directly construct the model from a dataset of labelled data, while the second is to transform a previously obtained Bayesian classifier into a PDG model that can then be refined. These two approaches are compared with a wide range of classical approaches to the supervised classification problem on a number of both real world databases and artificially generated data.
          
    
              @article{nielsen2009supervised,
    author = {Dalgaard Nielsen, Jens and Rum{\'{i}}, Rafael and Salmer{\'{o}}n, Antonio},
    title = {Supervised classification using probabilistic decision graphs},
    journal = {Comput. Stat. Data Anal.},
    volume = {53},
    number = {4},
    pages = {1299--1311},
    year = {2009}
    }
  • Int. J. Approx. Reason. 2006
    Learning probabilistic decision graphs
    Jaeger, Manfred and Dalgaard Nielsen, Jens and Silander, Tomi
    [ paper | bibtex | abstract ]
    
              Probabilistic decision graphs (PDGs) are a representation language for probability distributions based on binary decision diagrams. PDGs can encode (context-specific) independence relations that cannot be captured in a Bayesian network structure, and can sometimes provide computationally more efficient representations than Bayesian networks. In this paper we present an algorithm for learning PDGs from data. First experiments show that the algorithm is capable of learning optimal PDG representations in some cases, and that the computational efficiency of PDG models learned from real-life data is very close to the computational efficiency of Bayesian network models.
          
    
              @article{jaeger2006learning,
    author = {Jaeger, Manfred and Dalgaard Nielsen, Jens and Silander, Tomi},
    title = {Learning probabilistic decision graphs},
    journal = {Int. J. Approx. Reason.},
    volume = {42},
    number = {1-2},
    pages = {84--100},
    year = {2006}
    }
  • Int. J. Uncertain. Fuzziness Knowl. Based Syst. 2004
    Probabilistic Decision Graphs - Combining Verification and AI Techniques for Probabilistic Inference
    Jaeger, Manfred
    [ paper | bibtex | abstract ]
    
              We adopt probabilistic decision graphs developed in the field of automated verification as a tool for probabilistic model representation and inference. We show that probabilistic inference has linear time complexity in the size of the probabilistic decision graph, that the smallest probabilistic decision graph for a given distribution is at most as large as the smallest junction tree for the same distribution, and that in some cases it can in fact be much smaller. Behind these very promising features of probabilistic decision graphs lies the fact that they integrate into a single coherent framework a number of representational and algorithmic optimizations developed for Bayesian networks (use of hidden variables, context-specific independence, structured representation of conditional probability tables).
          
    
              @article{jaeger2004probabilistic,
    author = {Jaeger, Manfred},
    title = {Probabilistic Decision Graphs - Combining Verification and AI Techniques for Probabilistic Inference},
    journal = {Int. J. Uncertain. Fuzziness Knowl. Based Syst.},
    volume = {12},
    number = {Supplement-1},
    pages = {19--42},
    year = {2004}
    }
Probabilistic sentential decision diagrams [ psdds ]
  • IDA 2020
    Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams
    Olascoaga, Laura Isabel Galindez and Meert, Wannes and Shah, Nimish and Van den Broeck, Guy and Verhelst, Marian
    [ paper | bibtex | abstract ]
    
              Methods that learn the structure of Probabilistic Sentential Decision Diagrams (PSDD) from data have achieved state-of-the-art performance in tractable learning tasks. These methods learn PSDDs incrementally by optimizing the likelihood of the induced probability distribution given available data and are thus robust against missing values, a relevant trait to address the challenges of embedded applications, such as failing sensors and resource constraints. However, PSDDs are outperformed by discriminatively trained models in classification tasks. In this work, we introduce D-LearnPSDD, a learner that improves the classification performance of the LearnPSDD algorithm by introducing a discriminative bias that encodes the conditional relation between the class and feature variables.
          
    
              @inproceedings{olascoaga2020discriminative,
    author = {Olascoaga, Laura Isabel Galindez and Meert, Wannes and Shah, Nimish and Van den Broeck, Guy and Verhelst, Marian},
    title = {Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams},
    booktitle = {{IDA}},
    series = {Lecture Notes in Computer Science},
    volume = {12080},
    pages = {184--196},
    publisher = {Springer},
    year = {2020}
    }
  • Probabilistic Graphical Models 2020
    A New Perspective on Learning Context-Specific Independence
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Local structure such as context-specific independence (CSI) has received much attention in the probabilistic graphical model (PGM) literature, as it facilitates the modeling of large complex systems, as well as for reasoning with them. In this paper, we provide a new perspective on how to learn CSIs from data. We propose to first learn a functional and parameterized representation of a conditional probability table (CPT), such as a neural network. Next, we quantize this continuous function, into an arithmetic circuit representation that facilitates efficient inference. In the first step, we can leverage the many powerful tools that have been developed in the machine learning literature. In the second step, we exploit more recently-developed analytic tools from explainable AI, for the purposes of learning CSIs. Finally, we contrast our approach, empirically and conceptually, with more traditional variable-splitting approaches, that search for CSIs more explicitly.
          
    
              @inproceedings{shen2020new,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {A New Perspective on Learning Context-Specific Independence},
    booktitle = {Probabilistic Graphical Models},
    year = {2020}
    }
  • Int. J. Approx. Reason. 2020
    Tractable inference in credal sentential decision diagrams
    Mattei, Lilith and Antonucci, Alessandro and Deratani Maua, Denis and Facchini, Alessandro and Llerena, Julissa Villanueva
    [ paper | bibtex | abstract ]
    
              Probabilistic sentential decision diagrams are logic circuits where the inputs of disjunctive gates are annotated by probability values. They allow for a compact representation of joint probability mass functions defined over sets of Boolean variables, that are also consistent with the logical constraints defined by the circuit. The probabilities in such a model are usually “learned” from a set of observations. This leads to overconfident and prior-dependent inferences when data are scarce, unreliable or conflicting. In this work, we develop the credal sentential decision diagrams, a generalisation of their probabilistic counterpart that allows for replacing the local probabilities with (so-called credal) sets of mass functions. These models induce a joint credal set over the set of Boolean variables, that sharply assigns probability zero to states inconsistent with the logical constraints. Three inference algorithms are derived for these models. These allow one to compute: (i) the lower and upper probabilities of an observation for an arbitrary number of variables; (ii) the lower and upper conditional probabilities for the state of a single variable given an observation; (iii) whether or not all the probabilistic sentential decision diagrams compatible with the credal specification have the same most probable explanation of a given set of variables given an observation of the other variables. These inferences are tractable, as all the three algorithms, based on bottom-up traversal with local linear programming tasks on the disjunctive gates, can be solved in polynomial time with respect to the circuit size. The first algorithm is always exact, while the remaining two might induce a conservative (outer) approximation in the case of multiply connected circuits. A semantics for this approximation, together with an auxiliary algorithm able to decide whether or not the result is exact, is also provided, along with a brute-force characterization of exact inference in these cases. For a first empirical validation, we consider a simple application based on noisy seven-segment display images. The credal models are observed to properly distinguish between easy and hard-to-detect instances and outperform other generative models not able to cope with logical constraints.
          
    
              @article{mattei2020tractable,
    author = {Mattei, Lilith and Antonucci, Alessandro and Deratani Mau{\'{a}}, Denis and Facchini, Alessandro and Llerena, Julissa Villanueva},
    title = {Tractable inference in credal sentential decision diagrams},
    journal = {Int. J. Approx. Reason.},
    volume = {125},
    pages = {26--48},
    year = {2020}
    }
  • NeurIPS 2019
    Towards Hardware-Aware Tractable Learning of Probabilistic Models
    Olascoaga, Laura Isabel Galindez and Meert, Wannes and Shah, Nimish and Verhelst, Marian and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Smart portable applications increasingly rely on edge computing due to privacy and latency concerns. But guaranteeing always-on functionality comes with two major challenges: heavily resource-constrained hardware and dynamic application conditions. Probabilistic models present an ideal solution to these challenges: they are robust to missing data, allow for joint predictions and have small data needs. In addition, ongoing efforts in the field of tractable learning have resulted in probabilistic models with strict inference efficiency guarantees. However, the current notions of tractability are often limited to model complexity, disregarding the hardware's specifications and constraints. We propose a novel resource-aware cost metric that takes into consideration the hardware's properties in determining whether the inference task can be efficiently deployed. We use this metric to evaluate the performance versus resource trade-off relevant to the application of interest, and we propose a strategy that selects the device-settings that can optimally meet users' requirements. We showcase our framework on a mobile activity recognition scenario, and on a variety of benchmark datasets representative of the field of tractable learning and of the applications of interest.
          
    
              @inproceedings{olascoaga2019towards,
    author = {Olascoaga, Laura Isabel Galindez and Meert, Wannes and Shah, Nimish and Verhelst, Marian and Van den Broeck, Guy},
    title = {Towards Hardware-Aware Tractable Learning of Probabilistic Models},
    booktitle = {NeurIPS},
    pages = {13726--13736},
    year = {2019}
    }
  • AAAI 2019
    Structured Bayesian Networks: From Inference to Learning with Routes
    Shen, Yujia and Goyanka, Anchal and Darwiche, Adnan and Choi, Arthur
    [ paper | bibtex | abstract ]
    
              Structured Bayesian networks (SBNs) are a recently proposed class of probabilistic graphical models which integrate background knowledge in two forms: conditional independence constraints and Boolean domain constraints. In this paper, we propose the first exact inference algorithm for SBNs, based on compiling a given SBN to a Probabilistic Sentential Decision Diagram (PSDD). We further identify a tractable subclass of SBNs, which have PSDDs of polynomial size. These SBNs yield a tractable model of route distributions, whose structure can be learned from GPS data, using a simple algorithm that we propose. Empirically, we demonstrate the utility of our inference algorithm, showing that it can be an order-of-magnitude more efficient than more traditional approaches to exact inference. We demonstrate the utility of our learning algorithm, showing that it can learn more accurate models and classifiers from GPS data.
          
    
              @inproceedings{shen2019structured,
    author = {Shen, Yujia and Goyanka, Anchal and Darwiche, Adnan and Choi, Arthur},
    title = {Structured Bayesian Networks: From Inference to Learning with Routes},
    booktitle = {{AAAI}},
    pages = {7957--7965},
    publisher = {{AAAI} Press},
    year = {2019}
    }
  • ISIPTA 2019
    Credal Sentential Decision Diagrams
    Antonucci, Alessandro and Facchini, Alessandro and Mattei, Lilith
    [ paper | bibtex | abstract ]
    
              Probabilistic sentential decision diagrams are logical circuits annotated by probability mass functions on the disjunctive gates. This allows for a compact representation of joint mass functions consistent with logical constraints. We propose a credal generalisation of the probabilistic quantification of these models, that allows to replace the local probabilities with (credal) sets of mass functions specified by linear constraints. This induces a joint credal set, that sharply assigns probability zero to states inconsistent with the constraints. These models can support cautious estimates of the local parameters when only small amounts of training data are available. Algorithmic strategies to compute lower and upper bounds of marginal and conditional queries are provided. The task can be achieved in linear time with respect to the diagram size for marginal queries. The same can be done for conditional queries if the topology of the circuit is singly connected.
          
    
              @inproceedings{antonucci2019credal,
    author = {Antonucci, Alessandro and Facchini, Alessandro and Mattei, Lilith},
    title = {Credal Sentential Decision Diagrams},
    booktitle = {{ISIPTA}},
    series = {Proceedings of Machine Learning Research},
    volume = {103},
    pages = {14--22},
    publisher = {{PMLR}},
    year = {2019}
    }
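
    The linear-time marginal bounds in the abstract above can be illustrated by the local computation at a single disjunctive (sum) gate. This is a minimal sketch, assuming interval-valued weights that must sum to one; the greedy step solves the small linear program at one gate exactly, but it is only an illustration of the idea, not the paper's algorithm:

      def sum_gate_lower(child_lb, w_lo, w_hi):
          # Lower bound of a credal sum gate: minimize sum_i w_i * child_lb[i]
          # subject to w_lo[i] <= w_i <= w_hi[i] and sum(w) == 1.
          # Greedy exact solution: start every weight at its lower bound and
          # hand the remaining mass to the smallest child values first.
          w = list(w_lo)
          remaining = 1.0 - sum(w_lo)
          for i in sorted(range(len(w)), key=lambda i: child_lb[i]):
              step = min(w_hi[i] - w[i], remaining)
              w[i] += step
              remaining -= step
          return sum(wi * v for wi, v in zip(w, child_lb))

      # Two children valued 0.2 and 0.8, each weight constrained to [0.3, 0.7]:
      print(sum_gate_lower([0.2, 0.8], [0.3, 0.3], [0.7, 0.7]))  # 0.38
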
  • AAAI 2018
    Conditional PSDDs: Modeling and Learning With Modular Knowledge
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Probabilistic Sentential Decision Diagrams (PSDDs) have been proposed for learning tractable probability distributions from a combination of data and background knowledge (in the form of Boolean constraints). In this paper, we propose a variant on PSDDs, called conditional PSDDs, for representing a family of distributions that are conditioned on the same set of variables. Conditional PSDDs can also be learned from a combination of data and (modular) background knowledge. We use conditional PSDDs to define a more structured version of Bayesian networks, in which nodes can have an exponential number of states, hence expanding the scope of domains where Bayesian networks can be applied. Compared to classical PSDDs, the new representation exploits the independencies captured by a Bayesian network to decompose the learning process into localized learning tasks, which enables the learning of better models while using less computation. We illustrate the promise of conditional PSDDs and structured Bayesian networks empirically, and by providing a case study to the modeling of distributions over routes on a map.
          
    
              @inproceedings{shen2018conditional,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {Conditional PSDDs: Modeling and Learning With Modular Knowledge},
    booktitle = {{AAAI}},
    pages = {6433--6442},
    publisher = {{AAAI} Press},
    year = {2018}
    }
  • UAI 2017
    A Tractable Probabilistic Model for Subset Selection
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Subset selection tasks, such as top-k ranking, induce datasets where examples have cardinalities that are known a priori. In this paper, we propose a tractable probabilistic model for subset selection and show how it can be learned from data. Our proposed model is interpretable and subsumes a previously introduced model based on logistic regression. We show how the parameters of our model can be estimated in closed form given complete data, and propose an algorithm for learning its structure in an interpretable space. We highlight the intuitive structures that we learn via case studies. We finally show how our proposed model can be viewed as an instance of the recently proposed Probabilistic Sentential Decision Diagram.
          
    
              @inproceedings{shen2017tractable,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {A Tractable Probabilistic Model for Subset Selection},
    booktitle = {{UAI}},
    publisher = {{AUAI} Press},
    year = {2017}
    }
  • NIPS 2017
    Tractability in Structured Probability Spaces
    Choi, Arthur and Shen, Yujia and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Recently, the Probabilistic Sentential Decision Diagram (PSDD) has been proposed as a framework for systematically inducing and learning distributions over structured objects, including combinatorial objects such as permutations and rankings, paths and matchings on a graph, etc. In this paper, we study the scalability of such models in the context of representing and learning distributions over routes on a map. In particular, we introduce the notion of a hierarchical route distribution and show how they can be leveraged to construct tractable PSDDs over route distributions, allowing them to scale to larger maps. We illustrate the utility of our model empirically, in a route prediction task, showing how accuracy can be increased significantly compared to Markov models.
          
    
              @inproceedings{choi2017tractability,
    author = {Choi, Arthur and Shen, Yujia and Darwiche, Adnan},
    title = {Tractability in Structured Probability Spaces},
    booktitle = {{NIPS}},
    pages = {3477--3485},
    year = {2017}
    }
  • NIPS 2016
    Tractable Operations for Arithmetic Circuits of Probabilistic Models
    Shen, Yujia and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDDs support a polytime multiplication operator, while they do not support a polytime operator for summing-out variables. A polytime multiplication operator makes PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.
          
    
              @inproceedings{shen2016tractable,
    author = {Shen, Yujia and Choi, Arthur and Darwiche, Adnan},
    title = {Tractable Operations for Arithmetic Circuits of Probabilistic Models},
    booktitle = {{NIPS}},
    pages = {3936--3944},
    year = {2016}
    }
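
    A heavily simplified sketch of the multiplication idea above, assuming the two circuits share exactly the same structure so that decision branches align and have disjoint supports. The general algorithm also conjoins primes and caches subproblems, and the node encoding here is ad hoc:

      def multiply(a, b):
          # Returns (product_node, local_normalizer).
          kind = a[0]
          assert kind == b[0], "sketch assumes structurally identical circuits"
          if kind == "lit":                     # ("lit", var, value): indicator
              assert a == b
              return a, 1.0
          if kind == "bern":                    # ("bern", var, Pr[X=1])
              p, q = a[2], b[2]
              hi, lo = p * q, (1 - p) * (1 - q)
              return ("bern", a[1], hi / (hi + lo)), hi + lo
          if kind == "prod":                    # ("prod", [children])
              kids, z = [], 1.0
              for ka, kb in zip(a[1], b[1]):
                  kid, zk = multiply(ka, kb)
                  kids.append(kid)
                  z *= zk
              return ("prod", kids), z
          # ("sum", [(weight, child), ...]): aligned, disjoint-support branches,
          # so cross terms between different branches vanish.
          elems, z = [], 0.0
          for (wa, ca), (wb, cb) in zip(a[1], b[1]):
              kid, zk = multiply(ca, cb)
              elems.append((wa * wb * zk, kid))
              z += wa * wb * zk
          return ("sum", [(w / z, kid) for w, kid in elems]), z

      # Two tiny PSDD-like circuits: decide on X, then a Bernoulli over Y.
      p1 = ("sum", [(0.6, ("prod", [("lit", "X", 1), ("bern", "Y", 0.9)])),
                    (0.4, ("prod", [("lit", "X", 0), ("bern", "Y", 0.3)]))])
      p2 = ("sum", [(0.5, ("prod", [("lit", "X", 1), ("bern", "Y", 0.8)])),
                    (0.5, ("prod", [("lit", "X", 0), ("bern", "Y", 0.2)]))])
      print(multiply(p1, p2))
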
  • IJCAI 2015
    Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions
    Choi, Arthur and Van den Broeck, Guy and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              Probabilistic sentential decision diagrams (PSDDs) are a tractable representation of structured probability spaces, which are characterized by complex logical constraints on what constitutes a possible world. We develop general-purpose techniques for probabilistic reasoning and learning with PSDDs, allowing one to compute the probabilities of arbitrary logical formulas and to learn PSDDs from incomplete data. We illustrate the effectiveness of these techniques in the context of learning preference distributions, to which considerable work has been devoted in the past. We show, analytically and empirically, that our proposed framework is general enough to support diverse and complex data and query types. In particular, we show that it can learn maximum-likelihood models from partial rankings, pairwise preferences, and arbitrary preference constraints. Moreover, we show that it can efficiently answer many queries exactly, from expected and most likely rankings, to the probability of pairwise preferences, and diversified recommendations. This case study illustrates the effectiveness and flexibility of the developed PSDD framework as a domain-independent tool for learning and reasoning with structured probability spaces.
          
    
              @inproceedings{choi2015tractable,
    author = {Choi, Arthur and Van den Broeck, Guy and Darwiche, Adnan},
    title = {Tractable Learning for Structured Probability Spaces: {A} Case Study
    in Learning Preference Distributions},
    booktitle = {{IJCAI}},
    pages = {2861--2868},
    publisher = {{AAAI} Press},
    year = {2015}
    }
  • NIPS 2015
    Tractable Learning for Complex Probability Queries
    Bekker, Jessa and Davis, Jesse and Choi, Arthur and Darwiche, Adnan and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Tractable learning aims to learn probabilistic models where inference is guaranteed to be efficient. However, the particular class of queries that is tractable depends on the model and underlying representation. Usually this class is MPE or conditional probabilities Pr(x|y) for joint assignments x, y. We propose a tractable learner that guarantees efficient inference for a broader class of queries. It simultaneously learns a Markov network and its tractable circuit representation, in order to guarantee and measure tractability. Our approach differs from earlier work by using Sentential Decision Diagrams (SDD) as the tractable language instead of Arithmetic Circuits (AC). SDDs have desirable properties, which more general representations such as ACs lack, that enable basic primitives for Boolean circuit compilation. This allows us to support a broader class of complex probability queries, including counting, threshold, and parity, in polytime.
          
    
              @inproceedings{bekker2015tractable,
    author = {Bekker, Jessa and Davis, Jesse and Choi, Arthur and Darwiche, Adnan and Van den Broeck, Guy},
    title = {Tractable Learning for Complex Probability Queries},
    booktitle = {{NIPS}},
    pages = {2242--2250},
    year = {2015}
    }
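
    For intuition on what a "complex probability query" means here, the sketch below answers a threshold query for the special case of a fully factorized distribution, using a dynamic program over counts instead of 2^n enumeration. The paper's contribution is supporting such queries on learned SDD-based models; this toy case only illustrates the query type:

      def threshold_probability(probs, k):
          # Pr[sum_i X_i >= k] for independent Bernoulli X_i, via a dynamic
          # program over the running count: O(n^2) instead of 2^n enumeration.
          dist = [1.0]                        # dist[c] = Pr[count == c] so far
          for p in probs:
              nxt = [0.0] * (len(dist) + 1)
              for c, mass in enumerate(dist):
                  nxt[c] += mass * (1 - p)    # this X_i is 0
                  nxt[c + 1] += mass * p      # this X_i is 1
              dist = nxt
          return sum(dist[k:])

      print(threshold_probability([0.5, 0.5, 0.5], 2))  # 0.5 for three fair coins
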
  • KR 2014
    Probabilistic Sentential Decision Diagrams
    Kisa, Doga and Van den Broeck, Guy and Choi, Arthur and Darwiche, Adnan
    [ paper | bibtex | abstract ]
    
              We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.
          
    
              @inproceedings{kisa2014probabilistic,
    author = {Kisa, Doga and Van den Broeck, Guy and Choi, Arthur and Darwiche, Adnan},
    title = {Probabilistic Sentential Decision Diagrams},
    booktitle = {{KR}},
    publisher = {{AAAI} Press},
    year = {2014}
    }
Other probabilistic circuits [ pcs ]
  • IEEE International Solid-State Circuits Conference (ISSCC) 2021
    PIU: A 248GOPS/W Stream-Based Processor for Irregular Probabilistic Inference Networks Using Precision-Scalable Posit Arithmetic in 28nm
    Shah, Nimish and Galindez Olascoaga, Laura Isabel and Zhao, Shirui and Meert, Wannes and Verhelst, Marian
    [ paper | bibtex | abstract ]
    
              
          
    
              @inproceedings{shah2021piu,
    author= {Shah, Nimish and Galindez Olascoaga, Laura Isabel and Zhao, Shirui and Meert, Wannes and Verhelst, Marian},
    booktitle={2021 IEEE International Solid-State Circuits Conference (ISSCC)},
    title={PIU: A 248GOPS/W Stream-Based Processor for Irregular Probabilistic Inference Networks Using Precision-Scalable Posit Arithmetic in 28nm},
    year={2021},
    volume={64},
    number={},
    pages={150-152},
    doi={10.1109/ISSCC42613.2021.9366061}}
  • AAAI Demo track 2021
    Juice: A Julia Package for Logic and Probabilistic Circuits
    Dang, Meihua and Khosravi, Pasha and Liang, Yitao and Vergari, Antonio and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              JUICE is an open-source Julia package providing tools for logic and probabilistic reasoning and learning based on logic circuits (LCs) and probabilistic circuits (PCs). It provides a range of efficient algorithms for probabilistic inference queries, such as computing marginal probabilities (MAR), as well as many more advanced queries. Certain structural circuit properties are needed to achieve this tractability, which JUICE helps validate. Additionally, it supports several parameter and structure learning algorithms proposed in the recent literature. By leveraging parallelism (on both CPU and GPU), JUICE provides a fast implementation of circuit-based algorithms, which makes it suitable for tackling large-scale datasets and models.
          
    
              @inproceedings{dang2021juice, 
    title = {Juice: A Julia Package for Logic and Probabilistic Circuits},
    author = {Dang, Meihua and Khosravi, Pasha and Liang, Yitao and Vergari, Antonio and Van den Broeck, Guy},
    booktitle = {Proceedings of the 35th AAAI Conference on Artificial Intelligence (Demo Track)},
    month = {Feb},
    year = {2021}
    }
  • AAAI 2021
    On the Tractability of SHAP Explanations
    Van den Broeck, Guy and Lykov, Anton and Schleich, Maximilian and Suciu, Dan
    [ paper | bibtex | abstract ]
    
              SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
          
    
              @inproceedings{vandenbroeck2020tractability,
    author = {Van den Broeck, Guy and Lykov, Anton and Schleich, Maximilian and Suciu, Dan},
    title = {On the Tractability of {SHAP} Explanations},
    booktitle = {AAAI},
    year = {2021}
    }
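
    To make the quantity being analyzed above concrete, here is an exponential-time reference implementation of SHAP values under a fully factorized distribution over binary features. It is brute force by design, since the paper's subject is exactly when this computation can be done efficiently:

      from itertools import combinations
      from math import factorial

      def coalition_value(f, x, S, marginals):
          # v(S) = E[f(X) | X_S = x_S] under a fully factorized distribution:
          # average f over the features outside S, drawn from their marginals.
          free = [j for j in range(len(x)) if j not in S]
          total = 0.0
          for bits in range(2 ** len(free)):
              z, w = list(x), 1.0
              for idx, j in enumerate(free):
                  z[j] = (bits >> idx) & 1
                  w *= marginals[j] if z[j] else 1 - marginals[j]
              total += w * f(z)
          return total

      def shap_value(f, x, i, marginals):
          # Exact SHAP value of feature i: enumerate every coalition S.
          n = len(x)
          others = [j for j in range(n) if j != i]
          phi = 0.0
          for size in range(n):
              for S in combinations(others, size):
                  weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                  phi += weight * (coalition_value(f, x, set(S) | {i}, marginals)
                                   - coalition_value(f, x, set(S), marginals))
          return phi

      model = lambda z: float(z[0] and z[1])           # a tiny conjunction "model"
      print(shap_value(model, [1, 1], 0, [0.5, 0.5]))  # 0.375
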
  • AAAI 2021
    Group Fairness by Probabilistic Modeling with Latent Fair Decisions
    Choi, YooJung and Dang, Meihua and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved. Moreover, we show on real-world datasets that our approach not only is a better model than existing methods of how the data was generated but also achieves competitive accuracy.
          
    
              @inproceedings{choi2020group,
    author = {Choi, YooJung and Dang, Meihua and Van den Broeck, Guy},
    title = {Group Fairness by Probabilistic Modeling with Latent Fair Decisions},
    booktitle = {AAAI},
    year = {2021}
    }
  • UAI 2020
    On the Relationship Between Probabilistic Circuits and Determinantal Point Processes
    Zhang, Honghua and Holtzen, Steven and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these tractable models.
          
    
              @inproceedings{zhang2020relationship,
    author = {Zhang, Honghua and Holtzen, Steven and Van den Broeck, Guy},
    title = {On the Relationship Between Probabilistic Circuits and Determinantal
    Point Processes},
    booktitle = {{UAI}},
    series = {Proceedings of Machine Learning Research},
    volume = {124},
    pages = {1188--1197},
    publisher = {{AUAI} Press},
    year = {2020}
    }
  • DATE 2020
    Acceleration of probabilistic reasoning through custom processor architecture
    Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian
    [ paper | bibtex | abstract ]
    
              Probabilistic reasoning is an essential tool for robust decision-making systems because of its ability to explicitly handle real-world uncertainty, constraints and causal relations. Consequently, researchers are developing hybrid models by combining Deep Learning with probabilistic reasoning for safety-critical applications like self-driving vehicles, autonomous drones, etc. However, probabilistic reasoning kernels do not execute efficiently on CPUs or GPUs. This paper, therefore, proposes a custom programmable processor to accelerate sum-product networks, an important probabilistic reasoning execution kernel. The processor has a datapath architecture and memory hierarchy optimized for sum-product network execution. Experimental results show that the processor, while requiring fewer computational and memory units, achieves a 12x throughput benefit over the Nvidia Jetson TX2 embedded GPU platform.
          
    
              @inproceedings{shah2020acceleration,
    author = {Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian},
    title = {Acceleration of probabilistic reasoning through custom processor architecture},
    booktitle = {{DATE}},
    pages = {322--325},
    publisher = {{IEEE}},
    year = {2020}
    }
  • PGM 2020
    Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations
    Pevny, Tomas and Smidl, Vasek and Trapp, Martin and Polacek, Ondrej and Oberhuber, Tomas
    [ paper | bibtex | abstract ]
    
              In this work, we propose Sum-Product-Transform Networks (SPTN), an extension of sum-product networks that uses invertible transformations as additional internal nodes. The type and placement of transformations determine properties of the resulting SPTN with many interesting special cases. Importantly, SPTNs with Gaussian leaves and affine transformations keep tractable the same inference tasks that can be computed efficiently in SPNs. We propose to store affine transformations in their SVD decompositions, using an efficient parametrization of unitary matrices by a set of Givens rotations. Last but not least, we demonstrate that G-SPTNs achieve state-of-the-art results on the density estimation task and are competitive with state-of-the-art methods for anomaly detection.
          
    
              @inproceedings{pevny2020sptn,
    author = {Pevny, Tomas and Smidl, Vasek and Trapp, Martin and Polacek, Ondrej and Oberhuber, Tomas},
    title = {Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations},
    booktitle = {PGM},
    year = {2020}
    }
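
    The Givens parametrization mentioned in the abstract above is easy to make concrete: any n x n rotation can be composed from n(n-1)/2 planar rotations, each defined by one angle. A minimal numpy sketch (the paper combines such factors with singular values in an SVD-style parametrization):

      import numpy as np

      def givens(n, i, j, theta):
          # Planar rotation by angle theta in the (i, j) coordinate plane.
          G = np.eye(n)
          c, s = np.cos(theta), np.sin(theta)
          G[i, i] = G[j, j] = c
          G[i, j], G[j, i] = -s, s
          return G

      def orthogonal_from_angles(n, angles):
          # Compose n*(n-1)/2 Givens rotations into one orthogonal matrix.
          U, k = np.eye(n), 0
          for i in range(n):
              for j in range(i + 1, n):
                  U = U @ givens(n, i, j, angles[k])
                  k += 1
          return U

      rng = np.random.default_rng(0)
      n = 4
      U = orthogonal_from_angles(n, rng.uniform(0, 2 * np.pi, n * (n - 1) // 2))
      print(np.allclose(U @ U.T, np.eye(n)))   # True: U is orthogonal by construction
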
  • Machine Learning Journal (MLJ) 2020
    Sum-product graphical models
    Desana, Mattia and Schnoerr, Christoph
    [ paper | code | bibtex | abstract ]
    
              This paper introduces a probabilistic architecture called sum–product graphical model (SPGM). SPGMs represent a class of probability distributions that combines, for the first time, the semantics of probabilistic graphical models (GMs) with the evaluation efficiency of sum–product networks (SPNs): Like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, this approach provides new connections between the fields of SPNs and GMs, and enables a high-level interpretation of the family of distributions encoded by SPNs. We provide two applications of SPGMs in density estimation with empirical results close to or surpassing state-of-the-art models. The theoretical and practical results demonstrate that jointly exploiting properties of SPNs and GMs is an interesting direction of future research.
          
    
              @article{desana2020sum-product,
    author = {Desana, Mattia and Schnoerr, Christoph},
    title = {Sum-product graphical models},
    journal = {Mach. Learn.},
    volume = {109},
    number = {1},
    pages = {135--173},
    year = {2020}
    }
  • ICML 2020
    Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
    Peharz, Robert and Lang, Steven and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Van den Broeck, Guy and Kersting, Kristian and Ghahramani, Zoubin
    [ paper | bibtex | abstract ]
    
              Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implementations of PCs strive for better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
          
    
              @inproceedings{peharz2020einsum,
    author = {Peharz, Robert and Lang, Steven and Vergari, Antonio and Stelzner, Karl and Molina, Alejandro and Trapp, Martin and Van den Broeck, Guy and Kersting, Kristian and Ghahramani, Zoubin},
    title = {Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic
    Circuits},
    booktitle = {{ICML}},
    series = {Proceedings of Machine Learning Research},
    volume = {119},
    pages = {7563--7574},
    publisher = {{PMLR}},
    year = {2020}
    }
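
    The core implementation idea above, folding an entire layer of product-and-mixture computations into one einsum, can be sketched in a few lines of numpy. All shapes are hypothetical, and the log-domain arithmetic that real EiNets use for numerical stability is omitted:

      import numpy as np

      # One einsum layer: B batch items, N nodes; each node mixes K left-child
      # densities with K right-child densities into K output components.
      B, N, K = 32, 10, 4
      rng = np.random.default_rng(0)
      left = rng.random((B, N, K))             # densities of left children
      right = rng.random((B, N, K))            # densities of right children
      W = rng.random((N, K, K, K))             # mixing weights per node
      W /= W.sum(axis=(1, 2), keepdims=True)   # each output mixture sums to 1

      # All pairwise child products and their weighted sums in ONE einsum:
      out = np.einsum("bni,bnj,nijk->bnk", left, right, W)
      print(out.shape)                         # (32, 10, 4)
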
  • arXiv 2020
    Handling Missing Data in Decision Trees: A Probabilistic Approach
    Khosravi, Pasha and Vergari, Antonio and Choi, YooJung and Liang, Yitao and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Decision trees are a popular family of models due to their attractive properties such as interpretability and the ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the 'expected prediction' of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their 'expected prediction loss' w.r.t. our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
          
    
              @article{khosravi2020handling,
    author = {Khosravi, Pasha and Vergari, Antonio and Choi, YooJung and Liang, Yitao and Van den Broeck, Guy},
    title = {Handling Missing Data in Decision Trees: {A} Probabilistic Approach},
    journal = {CoRR},
    volume = {abs/2006.16341},
    year = {2020}
    }
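
    The 'expected prediction' above has a particularly simple form when the density estimator factorizes: traverse the tree and, whenever the split feature is missing, average both branches under that feature's marginal. A minimal sketch for binary features, assuming independent features (the paper uses richer tractable density estimators):

      def expected_prediction(node, x, marginals):
          # Expected tree output when entries of x may be None (missing),
          # assuming missing features are independent Bernoullis.
          if "value" in node:                       # leaf
              return node["value"]
          j = node["feature"]
          if x[j] is not None:                      # observed: follow one branch
              return expected_prediction(node["right" if x[j] else "left"], x, marginals)
          p = marginals[j]                          # missing: average both branches
          return ((1 - p) * expected_prediction(node["left"], x, marginals)
                  + p * expected_prediction(node["right"], x, marginals))

      tree = {"feature": 0,
              "left": {"value": 0.1},
              "right": {"feature": 1,
                        "left": {"value": 0.4}, "right": {"value": 0.9}}}
      print(expected_prediction(tree, [None, 1], [0.5, 0.5]))  # 0.5*0.1 + 0.5*0.9 = 0.5
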
  • PGM 2020
    Strudel: Learning Structured-Decomposable Probabilistic Circuits
    Dang, Meihua and Vergari, Antonio and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Probabilistic circuits (PCs) represent a probability distribution as a computational graph. Enforcing structural properties on these graphs guarantees that several inference scenarios become tractable. Among these properties, structured decomposability is a particularly appealing one: it enables the efficient and exact computations of the probability of complex logical formulas, and can be used to reason about the expected output of certain predictive models under missing data. This paper proposes Strudel, a simple, fast and accurate learning algorithm for structured-decomposable PCs. Compared to prior work for learning structured-decomposable PCs, Strudel delivers more accurate single PC models in fewer iterations, and dramatically scales learning when building ensembles of PCs. It achieves this scalability by exploiting another structural property of PCs, called determinism, and by sharing the same computational graph across mixture components. We show these advantages on standard density estimation benchmarks and challenging inference scenarios.
          
    
              @inproceedings{dang2020strudel,
    author = {Dang, Meihua and Vergari, Antonio and Van den Broeck, Guy},
    title = {Strudel: Learning Structured-Decomposable Probabilistic Circuits},
    booktitle = {PGM},
    series = {Proceedings of Machine Learning Research},
    year = {2020}
    }
  • NeurIPS 2019
    On Tractable Computation of Expected Predictions
    Khosravi, Pasha and Choi, YooJung and Liang, Yitao and Vergari, Antonio and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              Computing expected predictions of discriminative models is a fundamental task in machine learning that appears in many interesting applications such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations for classification scenarios in which exact computations are intractable. Our framework to compute expected predictions allows for handling of missing data during prediction time in a principled and accurate way and enables reasoning about the behavior of discriminative models. We empirically show our algorithm to consistently outperform standard imputation techniques on a variety of datasets. Finally, we illustrate how our framework can be used for exploratory data analysis.
          
    
              @inproceedings{khosravi2019tractable,
    author = {Khosravi, Pasha and Choi, YooJung and Liang, Yitao and Vergari, Antonio and Van den Broeck, Guy},
    title = {On Tractable Computation of Expected Predictions},
    booktitle = {NeurIPS},
    pages = {11167--11178},
    year = {2019}
    }
  • NeurIPS 2019
    Smoothing Structured Decomposable Circuits
    Shih, Andy and Van den Broeck, Guy and Beame, Paul and Amarilli, Antoine
    [ paper | bibtex | abstract ]
    
              We study the task of smoothing a circuit, i.e., ensuring that all children of a plus-gate mention the same variables. Circuits serve as the building blocks of state-of-the-art inference algorithms on discrete probabilistic graphical models and probabilistic programs. They are also important for discrete density estimation algorithms. Many of these tasks require the input circuit to be smooth. However, smoothing has not been studied in its own right yet, and only a trivial quadratic algorithm is known. This paper studies efficient smoothing for structured decomposable circuits. We propose a near-linear time algorithm for this task and explore lower bounds for smoothing decomposable circuits, using existing results on range-sum queries. Further, for the important case of All-Marginals, we show a more efficient linear-time algorithm. We validate experimentally the performance of our methods.
          
    
              @inproceedings{shih2019smoothing,
    author = {Shih, Andy and Van den Broeck, Guy and Beame, Paul and Amarilli, Antoine},
    title = {Smoothing Structured Decomposable Circuits},
    booktitle = {NeurIPS},
    pages = {11412--11422},
    year = {2019}
    }
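
    The trivial quadratic algorithm that the abstract above improves on is easy to state: for every child of a sum (plus) gate, multiply in a (v or not-v) gadget for each variable of the gate's scope that the child does not mention. A minimal sketch with an ad hoc circuit encoding:

      def scope(node):
          # Variables a sub-circuit mentions (recomputed naively on purpose:
          # this is exactly what makes the baseline quadratic).
          if node[0] == "lit":
              return {node[1]}
          return set().union(*(scope(c) for c in node[1]))

      def smooth(node):
          # Make every child of each sum gate mention the gate's full scope
          # by multiplying in a (v OR not-v) gadget per missing variable.
          if node[0] == "lit":
              return node
          kids = [smooth(c) for c in node[1]]
          if node[0] == "prod":
              return ("prod", kids)
          full = set().union(*(scope(k) for k in kids))
          padded = []
          for k in kids:
              gadgets = [("sum", [("lit", v, True), ("lit", v, False)])
                         for v in sorted(full - scope(k))]
              padded.append(("prod", [k] + gadgets) if gadgets else k)
          return ("sum", padded)

      c = ("sum", [("lit", "A", True),
                   ("prod", [("lit", "A", False), ("lit", "B", True)])])
      print(smooth(c))   # the first branch gets a (B or not-B) gadget
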
  • AAAI 2019
    Learning Logistic Circuits
    Liang, Yitao and Van den Broeck, Guy
    [ paper | bibtex | abstract ]
    
              This paper proposes a new classification model called logistic circuits. On MNIST and Fashion datasets, our learning algorithm outperforms neural networks that have an order of magnitude more parameters. Yet, logistic circuits have a distinct origin in symbolic AI, forming a discriminative counterpart to probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that parameter learning for logistic circuits is convex optimization, and that a simple local search algorithm can induce strong model structures from data.
          
    
              @inproceedings{liang2019learning,
    author = {Liang, Yitao and Van den Broeck, Guy},
    title = {Learning Logistic Circuits},
    booktitle = {{AAAI}},
    pages = {4277--4286},
    publisher = {{AAAI} Press},
    year = {2019}
    }
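
    The convexity result above has a familiar reading: with the circuit structure fixed, each example induces a binary feature vector (which circuit wires are active for that example), and parameter learning reduces to ordinary logistic regression over those features. A hedged sketch in which a random matrix stands in for the per-example circuit flows:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      # Hypothetical stand-in for circuit "flows": flows[i, j] = 1 iff wire j
      # is active when example i is fed through the (fixed) circuit structure.
      flows = rng.integers(0, 2, size=(200, 16))
      labels = flows[:, 0] | flows[:, 3]       # toy target (linearly separable)

      # With the structure fixed, learning the circuit parameters is exactly
      # a convex logistic-regression problem over the flow features.
      clf = LogisticRegression(max_iter=1000).fit(flows, labels)
      print(clf.score(flows, labels))          # expect 1.0 on this toy data
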
  • DAC 2019
    ProbLP: A framework for low-precision probabilistic inference
    Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian
    [ paper | bibtex | abstract ]
    
              Bayesian reasoning is a powerful mechanism for probabilistic inference in smart edge-devices. During such inferences, a low-precision arithmetic representation can enable improved energy efficiency. However, its impact on inference accuracy is not yet understood. Furthermore, general-purpose hardware does not natively support low-precision representation. To address this, we propose ProbLP, a framework that automates the analysis and design of low-precision probabilistic inference hardware. It automatically chooses an appropriate energy-efficient representation based on worst-case error-bounds and hardware energy-models. It generates custom hardware for the resulting inference network exploiting parallelism, pipelining and low-precision operation. The framework is validated on several embedded-sensing benchmarks.
          
    
              @inproceedings{shah2019problp,
    author = {Shah, Nimish and Galindez Olascoaga, Laura Isabel and Meert, Wannes and Verhelst, Marian},
    title = {ProbLP: A framework for low-precision probabilistic inference},
    booktitle = {{DAC}},
    pages = {190},
    publisher = {ACM},
    year = {2019},
    url = {https://doi.org/10.1145/3316781.3317885},
    doi = {10.1145/3316781.3317885}
    }
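
    The worst-case analysis above can be pictured as interval arithmetic over the inference network: every quantized parameter carries an interval that certainly contains its exact value, and sums and products combine intervals. A minimal sketch, not ProbLP's actual error model or hardware energy model:

      def quantize(p, frac_bits):
          # Round to a fixed-point grid; return an interval certain to
          # contain the exact parameter (half a step on each side).
          step = 2.0 ** -frac_bits
          q = round(p / step) * step
          return (q - step / 2, q + step / 2)

      def mul(a, b):   # intervals of non-negative values
          return (a[0] * b[0], a[1] * b[1])

      def add(a, b):
          return (a[0] + b[0], a[1] + b[1])

      # Worst-case bracket for p1*p2 + p3*p4 at 4 fractional bits:
      p = [quantize(x, 4) for x in (0.7, 0.2, 0.3, 0.9)]
      lo, hi = add(mul(p[0], p[1]), mul(p[2], p[3]))
      print(lo, hi)    # brackets the exact value 0.7*0.2 + 0.3*0.9 = 0.41
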
  • NeurIPS 2018
    Submodular Field Grammars: Representation, Inference, and Application to Image Parsing
    Friesen, Abram L. and Domingos, Pedro M.
    [ paper | bibtex | abstract ]
    
              Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. Unfortunately, in contrast to language grammars, where the number of possible split points for a production A→BC is linear in the length of A, in an image there are an exponential number of ways to split a region into subregions. This makes parsing intractable and requires image grammars to be severely restricted in practice, for example by allowing only rectangular regions. In this paper, we address this problem by associating with each production a submodular Markov random field whose labels are the subparts and whose labeling segments the current object into these subparts. We call the result a submodular field grammar (SFG). Finding the MAP split of a region into subregions is now tractable, and by exploiting this we develop an efficient approximate algorithm for MAP parsing of images with SFGs. Empirically, we present promising improvements in accuracy when using SFGs for scene understanding, and show exponential improvements in inference time compared to traditional methods, while returning comparable minima.
          
    
              @inproceedings{friesen2018submodular,
    author = {Friesen, Abram L. and Domingos, Pedro M.},
    title = {Submodular Field Grammars: Representation, Inference, and Application
    to Image Parsing},
    booktitle = {NeurIPS},
    pages = {4312--4322},
    year = {2018}
    }
  • ICLR (Workshop) 2017
    Compositional Kernel Machines
    Gens, Robert and Domingos, Pedro M.
    [ paper | bibtex | abstract ]
    
              Convolutional neural networks (convnets) have achieved impressive results on recent computer vision benchmarks. While they benefit from multiple layers that encode nonlinear decision boundaries and a degree of translation invariance, training convnets is a lengthy procedure fraught with local optima. Alternatively, a kernel method that incorporates the compositionality and symmetry of convnets could learn similar nonlinear concepts yet with easier training and architecture selection. We propose compositional kernel machines (CKMs), which effectively create an exponential number of virtual training instances by composing transformed sub-regions of the original ones. Despite this, CKM discriminant functions can be computed efficiently using ideas from sum-product networks. The ability to compose virtual instances in this way gives CKMs invariance to translations and other symmetries, and combats the curse of dimensionality. Just as support vector machines (SVMs) provided a compelling alternative to multilayer perceptrons when they were introduced, CKMs could become an attractive approach for object recognition and other vision problems. In this paper we define CKMs, explore their properties, and present promising results on NORB datasets. Experiments show that CKMs can outperform SVMs and be competitive with convnets in a number of dimensions, by learning symmetries and compositional concepts from fewer samples without data augmentation.
          
    
              @inproceedings{gens2017compositional,
    author = {Gens, Robert and Domingos, Pedro M.},
    title = {Compositional Kernel Machines},
    booktitle = {{ICLR} (Workshop)},
    publisher = {OpenReview.net},
    year = {2017}
    }