Tagged: AI

  • trung 6:05 pm on March 22, 2011
    Tags: AI   

    An introduction to Q-learning 

    Q-learning [Watkins, 1989] is one of the most popular reinforcement learning methods. One of the advantages of Q-learning is its ability to compare the expected utility of the available actions without requiring a model of the environment.

    The core of Q-learning is the following update rule:

    Q_{t+1}(a, s)=(1-\alpha_{t})Q_{t}(a,s)+\alpha_{t}[r_{t}(s)+\gamma\max_{a'}{Q_{t}(a',s')}]

    Where:

    • Q_{t}(a,s) is the Q-value at time t for taking action a in state s.
    • s' is the state reached after taking action a in state s, and a' ranges over the actions available in s'.
    • r_{t}(s) is the reward received at time t.
    • \alpha_{t} is the learning rate. It determines how strongly new information overrides the old estimate. If \alpha is 0, the agent does not learn anything. If \alpha is 1, only the new information is considered and all old information is discarded.
    • \gamma is the discount factor. It lies in the range [0, 1] and is used to weight near-term reinforcement more heavily than distant future reinforcement. The closer \gamma is to 1, the greater the weight of future reinforcement.
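
    The update rule above can be sketched as a single function. This is a minimal illustration, not a reference implementation: the name q_update, the table Q keyed by (a, s), the actions list and the default values of alpha and gamma are all assumptions made for this example.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to Q in place and return the new Q(a, s).

    Q is a table keyed by (action, state), matching the Q(a, s) ordering
    used in the text. All names and defaults here are hypothetical.
    """
    # Best Q-value obtainable from the next state s'.
    best_next = max(Q[(a2, s_next)] for a2 in actions)
    # Blend the old estimate with the new information, weighted by alpha.
    Q[(a, s)] = (1 - alpha) * Q[(a, s)] + alpha * (r + gamma * best_next)
    return Q[(a, s)]


Q = defaultdict(float)          # unseen (action, state) pairs start at 0
actions = ["left", "right"]     # hypothetical action set
new_value = q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
print(new_value)                # 0.5
```

    With alpha=0 the call returns the old value unchanged, and with alpha=1 it returns r + gamma * best_next, which matches the two extreme cases of the learning rate described above.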

    So what does the equation mean? Assume \alpha=1 and \gamma=1; the equation then becomes:

    Q_{t+1}(a, s)=r_{t}(s)+\max_{a'}{Q_{t}(a',s')}

    It is now easy to see that the Q-value of the state-action pair (a,s) equals the reward for taking action a plus the maximum Q-value over all actions a' in the next state s'. The update is therefore a form of dynamic programming: each Q-value is computed from the Q-values of the successor state, and repeating the update over all state-action pairs drives the estimates toward the optimal Q-values.
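
    To make the \alpha=1, \gamma=1 case concrete, here is a small sketch on a made-up deterministic environment. The 4-state chain, its rewards (10 for reaching the last state, -1 for every other move) and the names step, ACTIONS and TERMINAL are assumptions invented for this illustration; they do not come from the applet or the references.

```python
# Hypothetical chain of states 0..3; state 3 is terminal.
ACTIONS = ("left", "right")
TERMINAL = 3


def step(s, a):
    """Deterministic toy transition: return (reward, next_state)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    reward = 10.0 if s_next == TERMINAL else -1.0
    return reward, s_next


# Q-table keyed by (action, state), all values initialised to 0.
Q = {(a, s): 0.0 for s in range(TERMINAL + 1) for a in ACTIONS}

# Sweep over every non-terminal state-action pair a few times,
# applying Q(a, s) = r + max_a' Q(a', s') each time.
for _ in range(10):
    for s in range(TERMINAL):
        for a in ACTIONS:
            r, s_next = step(s, a)
            Q[(a, s)] = r + max(Q[(a2, s_next)] for a2 in ACTIONS)

# Greedy policy: pick the action with the highest Q-value in each state.
greedy = {s: max(ACTIONS, key=lambda a: Q[(a, s)]) for s in range(TERMINAL)}
print(Q[("right", 0)], Q[("right", 1)], Q[("right", 2)])  # 8.0 9.0 10.0
print(greedy)  # {0: 'right', 1: 'right', 2: 'right'}
```

    After a few sweeps the values stop changing: moving right from state 2 is worth 10, from state 1 it is worth 9, and from state 0 it is worth 8, so each Q-value is the reward plus the best Q-value of the next state.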

    When the discount factor is less than 1, a reward counts for less the further in the future it is received, and the total discounted reward from time t is given by:

    R_{t}=r_{t}+\gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^n r_{t+n} + \dots
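
    For a finite list of rewards, the sum above can be computed as in the following sketch. The function name discounted_return and the sample rewards are assumptions for illustration; the infinite sum in the formula is truncated to the rewards actually supplied.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma
    # to everything that comes later in time.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], 1.0))  # no discounting: 3.0
```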

    The Java applet below is a very good illustration of Q-learning (thanks to Vander B. Frank):

    For details of how the applet works, see the PDF document by Vander B. Frank [2].



    Bibliography

    1. Wikipedia: Q-learning [http://en.wikipedia.org/wiki/Q-learning].
    2. Vander B. Frank: Q-learning. IRIDIA, Université Libre de Bruxelles. 7, 2003. [PDF]
    3. Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.

     
  • trung 5:04 pm on December 10, 2010
    Tags: AI, genetic, music   

    Have a look at my project at Music Hackday, London, 2010: “Generating Music Using Genetic Algorithms in CUDA”.

     