AWS DeepRacer 獎勵函數範例

以下列出 AWS DeepRacer 獎勵功能的一些範例。

主題

範例 1：遵循時間試驗中的中心線
示例 2：在時間試驗中留在兩個邊界內
示例 3：在時間試驗中防止鋸齒形
例子 4：停留在一條車道上，而不會撞到固定的障礙物或移動車輛

範例 1：遵循時間試驗中的中心線

此範例會判斷代理程式與中心線距離多遠，並在代理程式與賽道中心較接近時給予較高的獎勵，鼓勵代理程式緊跟著中心線。


def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    return reward

示例 2：在時間試驗中留在兩個邊界內

這個例子簡單地給出了很高的回報，如果代理停留在邊界內，並讓代理找出完成一圈的最佳路徑。編程和理解很容易，但可能需要更長的時間才能收斂。


def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    
    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and 
    # the car is somewhere in between the track borders 
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return reward

示例 3：在時間試驗中防止鋸齒形

此範例會鼓勵代理程式依循中心線，但會在其轉向過大時以較低的獎勵進行懲罰，有助於防止蛇行行為。代理程式學會在模擬器中順利駕駛，並且在部署到實體車輛時可能會保持相同的行為。


def reward_function(params):
    '''
    Example of penalize steering, which helps mitigate zig-zag behaviors
    '''
    
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 marks that are farther and father away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    # Steering penality threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15 

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

例子 4：停留在一條車道上，而不會撞到固定的障礙物或移動車輛

此獎勵功能獎勵留在軌道邊界內的代理商，並懲罰代理人因太靠近它前面的物體而受到懲罰。代理程式可以變換車道來避免衝撞。總獎勵是獎勵和懲罰的加權總和。這個例子給了更多的權重，以避免崩潰的懲罰。試驗不同的平均權重，以針對不同的行為結果進行訓練。


import math
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_location = params['objects_location']
    agent_x = params['x']
    agent_y = params['y']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']
    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3
    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3
    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0
    # Distance to the next object
    next_object_loc = objects_location[next_object_index]
    distance_closest_object = math.sqrt((agent_x - next_object_loc[0])**2 + (agent_y - next_object_loc[1])**2)
    # Decide if the agent and the next object is on the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center
    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed
    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid
    return reward

您的瀏覽器已停用或無法使用 Javascript。

您必須啟用 Javascript，才能使用 AWS 文件。請參閱您的瀏覽器說明頁以取得說明。

文件慣用形式

獎勵函數輸入參數

此頁面是否有幫助？

提供意見回饋

下一個主題：

上一個主題：

獎勵函數輸入參數

需要協助？

隱私權網站條款 Cookie 偏好設定

選取您的 Cookie 偏好設定

自訂 Cookie 偏好設定

必要

效能

功能

廣告

無法儲存 Cookie 偏好設定