Though output space sampling is novel, there are various sampling algorithms in frequent pattern mining (FPM) that focus on sampling the input space of an FPM algorithm. Mining interesting subgraphs by output space sampling.

A sampling algorithm that uses Markov Chain Monte Carlo to uniformly sample maximal frequent subgraphs, though proposed as a frequent subgraph summarization algorithm, is one of the first algorithms that aims to sample the output space of patterns; however, the sampling is limited to the maximal patterns.

By sampling the output space, we mean sampling one feasible subgraph from a user-specified discrete distribution. When the user wants to sample in proportion to the interestingness score, the discrete distribution can be constructed from the interestingness values of all the feasible subgraphs in the output space.
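As a minimal sketch of this idea, the following Python snippet normalizes interestingness scores into a discrete distribution and draws one subgraph from it. The names `build_distribution` and `sample_subgraph` and the score values are hypothetical, and the sketch assumes the output space is small enough to list explicitly, which is usually not the case in practice.

```python
import random

def build_distribution(scores):
    """Normalize non-negative interestingness scores into probabilities."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one subgraph needs a positive score")
    return {g: s / total for g, s in scores.items()}

def sample_subgraph(scores, rng=random):
    """Draw one subgraph id with probability proportional to its score."""
    dist = build_distribution(scores)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[g] for g in ids], k=1)[0]

# Hypothetical scores for three feasible subgraphs.
scores = {"g1": 3.0, "g2": 1.0, "g3": 0.0}
dist = build_distribution(scores)
picked = sample_subgraph(scores, random.Random(0))
```

Here `g1` is drawn three times as often as `g2`, and `g3` is never drawn because its score is zero.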

Uniform Sampling of k Maximal Patterns. Efficient Mining of Top-K Frequent Subgraphs edges to explore the search space. Keep a queue Q_k that contains the top-k subgraphs found so far. When k patterns are found, raise minsup to the support of the least frequent subgraph in Q_k.
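The queue and threshold update can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the published algorithm: candidates arrive as a flat stream of hypothetical `(pattern, support)` pairs, whereas a real miner would generate them by pattern growth and use the raised threshold to prune the search.

```python
import heapq

def top_k_by_support(candidates, k, minsup=1):
    """Keep the k most frequent patterns, raising minsup as the queue fills."""
    queue = []  # min-heap of (support, arrival order, pattern)
    for order, (pattern, support) in enumerate(candidates):
        if support < minsup:
            continue  # pruned by the current threshold
        heapq.heappush(queue, (support, order, pattern))
        if len(queue) > k:
            heapq.heappop(queue)  # drop the least frequent pattern
        if len(queue) == k:
            minsup = queue[0][0]  # support of the least frequent kept pattern
    ranked = [(p, s) for s, _, p in sorted(queue, reverse=True)]
    return ranked, minsup

# Hypothetical candidate stream.
stream = [("a", 5), ("b", 2), ("c", 7), ("d", 3), ("e", 1)]
top, final_minsup = top_k_by_support(stream, k=2)
```

A min-heap is used so that the least frequent subgraph in the queue, which determines the new minsup, is always available at the root.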

For each subgraph added to Q_k, raise the minsup threshold. When the algorithm terminates, the top-k subgraphs have been found.

FANMOD is a tool that implements the RAND-ESU algorithm for enumerating and sampling subgraphs, as well as the full enumeration algorithm. RAND-ESU was designed to address the bias of the sampling method for subgraph counting implemented in Mfinder, as Mfinder's sampling method is prone to sampling certain subgraphs more often than others. RAND-ESU fixes this bias and is faster. It enumerates all subgraphs of a certain size, although during the execution it will ignore some of.
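For illustration, here is a compact Python sketch in the spirit of RAND-ESU. It assumes the commonly described ESU scheme, in which connected vertex sets of size k are grown from a root vertex using only higher-labelled vertices, and each recursion depth is entered with a fixed probability. The function name `rand_esu`, the `probs` parameter, and the adjacency-dictionary input are assumptions of this sketch, not FANMOD's interface.

```python
import random

def rand_esu(adj, k, probs, rng=random):
    """Sample connected k-vertex sets; probs[d] is the keep-probability at depth d."""
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            if rng.random() >= probs[len(sub)]:
                continue  # skip this branch of the enumeration tree
            # Vertices already in the subgraph or adjacent to it.
            covered = sub | {n for u in sub for n in adj[u]}
            # Exclusive neighbours of w with a label above the root.
            new = {u for u in adj[w] if u > root and u not in covered}
            extend(sub | {w}, ext | new, root)

    for v in adj:
        if rng.random() < probs[0]:
            extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# Path graph 0-1-2-3; all probabilities 1.0 gives full enumeration.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
all_triples = rand_esu(path, 3, [1.0, 1.0, 1.0])
```

With every probability set to 1.0 the sketch enumerates each connected vertex set exactly once; lowering the probabilities skips branches, so each set is kept with the product of the probabilities along its path.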

A reverse search algorithm for mining maximal. An algorithm for mining all maximal frequent subgraphs is to traverse the enumeration tree of frequent subgraphs and to report the subgraphs that do not have any frequent extension, valid or invalid. This algorithm is a straightforward extension of Algorithm 2. To decide locally whether a subgraph is a maximal frequent subgraph, we need to switch lines 8 and 9 in Algorithm 2. We also need a flag before line 7 that is set to.

Definition of correlation. One definition targets the graph database scenario, where there are multiple graphs: two subgraphs A and B are called correlated if the containment of A within a data graph increases the likelihood of that graph containing B as well. When we have only one large data graph, subgraphs A and B are defined to be correlated if the instances of A are frequently located in close proximity to the instances of B.
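The graph database definition can be made concrete with a small Python sketch that compares P(B | A) with P(B). The helper names and the graph identifiers are hypothetical, and the sketch assumes the sets of data graphs containing A and B have already been computed, for example by subgraph isomorphism tests.

```python
def containment_probabilities(graphs_with_a, graphs_with_b, n_graphs):
    """Return (P(B | A), P(B)) over a database of n_graphs data graphs."""
    a, b = set(graphs_with_a), set(graphs_with_b)
    if n_graphs <= 0 or not a:
        raise ValueError("need a non-empty database and at least one graph with A")
    p_b = len(b) / n_graphs
    p_b_given_a = len(a & b) / len(a)
    return p_b_given_a, p_b

def is_correlated(graphs_with_a, graphs_with_b, n_graphs):
    """A and B are correlated if containing A raises the chance of containing B."""
    p_b_given_a, p_b = containment_probabilities(
        graphs_with_a, graphs_with_b, n_graphs
    )
    return p_b_given_a > p_b

# Hypothetical database of 10 graphs, identified by integers.
p_cond, p_base = containment_probabilities({0, 1, 2, 3}, {2, 3, 4}, 10)
correlated = is_correlated({0, 1, 2, 3}, {2, 3, 4}, 10)
```

In this example B occurs in 30% of all graphs but in 50% of the graphs that contain A, so the pair counts as correlated under this reading of the definition.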

