PhD Thesis

My thesis investigates the mining of sequential data in order to (1) extract interesting patterns by taking contextual information into account, and (2) exploit such patterns for other tasks such as classification, prediction or anomaly detection.

The first part of this thesis aims to take into account the contextual information associated with the data during the frequent pattern mining process, in order to provide the expert with patterns that are representative of a given context. Before this thesis, existing approaches could not reveal that some patterns strongly depend on their context. We therefore introduce the notion of contextual frequent pattern, in which a pattern is associated with a context. In addition, we generalize the notion of contextual pattern to interestingness measures other than frequency, such as information gain and growth rate. In both cases, we identify and exploit essential theoretical properties of contextual patterns and provide efficient algorithms to mine them.
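To make the idea of a context-dependent pattern concrete, here is a minimal Python sketch. It assumes a toy database of (context, sequence) pairs, counts the relative support of a sequential pattern separately within each context, lists the contexts in which the pattern reaches a minimum support, and computes a growth rate between two contexts (support in one context divided by support in the other, as in emerging-pattern mining). The data layout, the sensor-event names and every function name (`is_subsequence`, `support_by_context`, `contextual_frequent_contexts`, `growth_rate`) are hypothetical illustrations. They are not the formal definitions or the efficient algorithms of the thesis, which are more general; this is a brute-force check of a single pattern.

```python
from collections import defaultdict
from typing import Hashable, Sequence

# Hypothetical toy representation: each record is (context, sequence of events).
Record = tuple[Hashable, Sequence[str]]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern's items occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the first match, preserving order.
    return all(item in it for item in pattern)


def support_by_context(pattern: Sequence[str], database: list[Record]) -> dict:
    """Relative support of the pattern within each context's sub-database."""
    total: dict = defaultdict(int)
    hits: dict = defaultdict(int)
    for context, sequence in database:
        total[context] += 1
        if is_subsequence(pattern, sequence):
            hits[context] += 1
    return {c: hits[c] / total[c] for c in total}


def contextual_frequent_contexts(pattern, database, minsup: float) -> set:
    """Contexts in which the pattern reaches the minimum relative support."""
    supports = support_by_context(pattern, database)
    return {c for c, s in supports.items() if s >= minsup}


def growth_rate(pattern, database, target, other) -> float:
    """Support in the target context divided by support in the other context."""
    supports = support_by_context(pattern, database)
    s_target = supports.get(target, 0.0)
    s_other = supports.get(other, 0.0)
    if s_other == 0.0:
        # Pattern absent from the other context: infinite growth if present in target.
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_other


# Hypothetical sensor-event sequences labelled with a seasonal context.
EXAMPLE_DB: list[Record] = [
    ("winter", ["heat_on", "door_open", "heat_off"]),
    ("winter", ["heat_on", "heat_off"]),
    ("summer", ["door_open", "window_open"]),
    ("summer", ["heat_on", "door_open"]),
]

if __name__ == "__main__":
    print(support_by_context(["heat_on", "heat_off"], EXAMPLE_DB))
    # {'winter': 1.0, 'summer': 0.0}
    print(contextual_frequent_contexts(["heat_on", "heat_off"], EXAMPLE_DB, 0.5))
    # {'winter'}
    print(growth_rate(["heat_on"], EXAMPLE_DB, "winter", "summer"))
    # 2.0
```

In this toy data, the pattern `heat_on` followed by `heat_off` is frequent in the winter context but never occurs in summer, which is the kind of context dependence that a single global support value would hide.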

The second part of this work concerns the use of contextual patterns to address various data mining tasks. We mainly focus on sequential data in order to perform pattern-based classification, prediction and anomaly detection. Taking contextual information into account is particularly helpful here: for instance, contextual patterns can highlight that a behavior considered anomalous in summer may be considered normal in winter. It is therefore essential to understand what changes depending on the context. Our approaches have been evaluated on various real-world datasets and have proven efficient in practical applications.

Current version of the PhD thesis (in French)

Topics

  • Frequent pattern mining
  • Context-aware patterns
  • Pattern-based applications (classification, prediction, anomaly detection)
  • Sensor data mining
  • Sequential data mining