In this paper, a novel analysis technique is applied to investigate the trial-and-error learning process of human operators controlling a nonholonomic system, a 2-link planar underactuated manipulator (2PUAM). The core of the technique is the use of a value function from the reinforcement learning scheme to reveal how the operators find a control strategy. An advantage of the proposed technique over others is that transitions of the value function may explain the changes in the operators' strategies during the learning process. According to the results of the analysis, the operators tended to explore for an objective trajectory first and then shift to tracking control of that trajectory. The tracking was accompanied by acceleration to reach the goal faster. Interestingly, the acceleration disturbed the objective trajectory because of the complex dynamics of the target, and induced another exploration for better trajectories. The fact that this phase transition structure, observed in an unsupervised learning environment, is consistent with previously reported results for a supervised case implies that the structure may be a general property of the human learning process.