This paper proposes a new approach to text-to-speech based on Gaussian processes, which are widely used for non-parametric Bayesian regression and classification. A Gaussian process regression model is designed to predict frame-level acoustic features from the corresponding frame information. The frame information includes the relative position of the frame within the phone and the preceding and succeeding phoneme information obtained from linguistic information. A frame context kernel is proposed as a similarity measure between frames. Experimental results on a small data set show the potential of the proposed approach, which uses neither the state-dependent dynamic features nor the decision-tree clustering of a conventional HMM-based approach.
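
To make the idea concrete, the following is a minimal sketch of Gaussian process regression over frames. It is not the frame context kernel proposed in this paper: the kernel form (a squared-exponential term on relative position multiplied by match/mismatch factors on the neighbouring phonemes), the names `frame_kernel`, `gram`, and `gp_predict`, the parameters `length_scale`, `mismatch`, and `noise`, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def frame_kernel(a, b, length_scale=0.2, mismatch=0.5):
    """Hypothetical similarity between two frames (an assumption, not the paper's kernel).

    Each frame is a tuple (relative_position, preceding_phoneme, succeeding_phoneme).
    """
    pos_a, prev_a, next_a = a
    pos_b, prev_b, next_b = b
    # Squared-exponential term on the relative position within the phone.
    k = np.exp(-0.5 * ((pos_a - pos_b) / length_scale) ** 2)
    # Down-weight the similarity when the phoneme contexts differ.
    k *= 1.0 if prev_a == prev_b else mismatch
    k *= 1.0 if next_a == next_b else mismatch
    return k

def gram(xs, ys):
    """Kernel matrix between two lists of frames."""
    return np.array([[frame_kernel(x, y) for y in ys] for x in xs])

def gp_predict(train_x, train_y, test_x, noise=1e-2):
    """Standard GP regression posterior mean and variance (zero prior mean)."""
    K = gram(train_x, train_x) + noise * np.eye(len(train_x))
    K_star = gram(test_x, train_x)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed via two solves with the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    prior_var = np.array([frame_kernel(x, x) for x in test_x])
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var

# Toy frames and one made-up acoustic feature value per frame.
train_x = [(0.0, "k", "i"), (0.5, "k", "i"), (1.0, "k", "i"), (0.5, "s", "a")]
train_y = np.array([1.0, 2.0, 1.5, -1.0])
test_x = [(0.5, "k", "i"), (0.5, "t", "o")]

mean, var = gp_predict(train_x, train_y, test_x)
print(mean, var)
```

In this sketch the prediction for a frame is driven by training frames that share its position and phoneme context, so no explicit clustering of contexts is needed; a frame whose context was never observed simply receives a higher predictive variance.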