Modelling and classifying time series derived from visual workflows is challenging because of the inherent complexity of the activity patterns involved and the difficulty of tracking moving targets. In this paper, we propose a framework for classifying visual tasks in industrial environments. We introduce a novel method that automatically segments the input stream and classifies the resulting segments using prior knowledge and hidden Markov models (HMMs), combined through a genetic algorithm. We compare this method with an echo state network (ESN) approach, which is suited to general-purpose time-series classification. In addition, we explore the applicability of several fusion schemes for multicamera configurations to mitigate limited visibility and occlusions. The performance of the proposed approaches is evaluated on real-world visual behaviour scenarios.
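
To make the HMM-based classification and multicamera fusion ideas concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes discrete observation symbols, one pre-trained HMM per task, and a simple sum-of-log-likelihoods fusion rule across cameras. The function names (`log_likelihood`, `classify_fused`), the task labels, and all model parameters are hypothetical and chosen only for illustration. Segmentation, prior knowledge, and the genetic algorithm are not covered.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM."""
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()
        log_p += np.log(scale)          # accumulate the scaling factors
        alpha = (alpha / scale) @ trans * emit[:, obs[t]]
    return log_p + np.log(alpha.sum())

def classify_fused(camera_obs, models):
    """Pick the task whose HMM maximises the summed log-likelihood over cameras.

    Summing log-likelihoods is an assumed fusion rule (cameras treated as
    independent); it is one possible scheme among many.
    """
    scores = {
        task: sum(log_likelihood(obs, *params) for obs in camera_obs)
        for task, params in models.items()
    }
    return max(scores, key=scores.get)

# Illustrative two-state models sharing dynamics but differing in emissions.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
models = {
    "task_a": (start, trans, np.array([[0.9, 0.1], [0.8, 0.2]])),  # mostly symbol 0
    "task_b": (start, trans, np.array([[0.1, 0.9], [0.2, 0.8]])),  # mostly symbol 1
}

# One quantised observation sequence per camera for the same segment.
camera_obs = [[0, 0, 1, 0], [0, 0, 0, 0]]
label = classify_fused(camera_obs, models)
```

Working in the log domain with per-step scaling keeps the forward recursion numerically stable on long segments. Other fusion schemes, such as weighting each camera by an estimate of its visibility, would replace only the summation inside `classify_fused`.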