Consider a grid environment consisting of a linear corridor of n cells, where the rightmost cell contains the goal. Suppose that the only action available in every non-goal cell is to move right. Consider a passive TD-learning agent that always starts in the leftmost cell and keeps executing the right action until it reaches the goal. Several such trials may be needed for the values to converge. Assume that the learning rate alpha is 1 and that the reward is -1 for every step.

(a) What will be the final value of the starting state after the algorithm converges?
(b) How many trials are needed for convergence?
(c) Estimate the number of actions the agent takes before convergence.
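A small simulation can help check answers to (a)-(c). It is a minimal sketch under assumptions the problem does not state explicitly: all values start at 0, the goal is a terminal cell whose value stays 0, the discount factor gamma is 1, and each step applies the TD(0) update U(s) <- U(s) + alpha * (r + gamma * U(s') - U(s)). The function name `run_passive_td` is hypothetical.

```python
def run_passive_td(n, alpha=1.0, gamma=1.0, step_reward=-1.0, max_trials=1000):
    """Simulate passive TD(0) on a corridor of n cells (0..n-1); cell n-1 is the goal.

    Assumptions (not given in the problem): values start at 0, the goal is
    terminal with value 0, and gamma defaults to 1.
    Returns (values, trials until values stop changing, actions taken in those trials).
    """
    U = [0.0] * n  # utility estimates; U[n-1] is the terminal goal and is never updated
    actions = 0
    for trial in range(1, max_trials + 1):
        changed = False
        # One trial: the agent walks right from cell 0 until it reaches the goal.
        for s in range(n - 1):
            s_next = s + 1
            actions += 1
            target = step_reward + gamma * U[s_next]  # uses the current (possibly old) U[s_next]
            new_value = U[s] + alpha * (target - U[s])
            if new_value != U[s]:
                changed = True
            U[s] = new_value
        if not changed:
            # This trial changed nothing, so the previous trial was the last one
            # that updated a value; the confirming trial's actions are not counted.
            return U, trial - 1, actions - (n - 1)
    raise RuntimeError("values did not converge within max_trials")


values, trials, actions = run_passive_td(5)
print(values[0], trials, actions)  # -4.0 4 16
```

Because the agent moves left to right and alpha is 1, each update reads the successor's value from the previous trial, so information from the goal travels back only one cell per trial. A different initialization or discount factor would change these counts.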