Mismatch String Kernels for SVM Protein Classification

Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)

Bibtex Metadata Paper


Eleazar Eskin, Jason Weston, William Noble, Christina Leslie


We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence sim- ilarity based on shared occurrences of  -length subsequences, counted with up to mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most success- ful method for remote homology detection, while achieving considerable computational savings.