
VM450
Evaluation and Improvement of Vehicle Energy Optimization System in Open-ended Simulation Environment
Team Members:
Fang, Han; Xu, Zeheng; Xu, Yihang; Lu, Dongyun; Zhang, Siyuan
Instructor:
Chengbin Ma
Project Description
Problem
With the rise of autonomous driving, the energy consumption of autonomous vehicles has become the next focus of the industry. Companies such as Newrizon are trying to use driver-assistance algorithms to optimize energy consumption. However, testing such optimizations on real roads costs considerable time and money, so implementing energy-consumption optimization in a simulation platform is a promising approach.
Figure 1: Newrizon Vehicle Company [1]
Concept Generation
Among calibration, imitation learning, and reinforcement learning, we chose the Deep Deterministic Policy Gradient (DDPG) as the optimization method, and MetaDrive was selected over CARLA as the simulation platform. DDPG is a reinforcement learning algorithm that enables the system to optimize its behavior in the process of seeking a higher reward. MetaDrive is a simulation platform that can efficiently generate an unlimited number of maps with various traffic and environmental settings.
Figure 2: Method & Platform Selection
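As a concrete illustration, setting up a MetaDrive environment with procedurally generated maps takes only a few lines. The sketch below assumes MetaDrive's documented Python interface; configuration keys vary between versions, and the values shown are illustrative, not the team's actual settings.

# Minimal MetaDrive setup sketch (illustrative configuration values).
from metadrive import MetaDriveEnv

env = MetaDriveEnv(dict(
    num_scenarios=100,    # number of procedurally generated maps
    start_seed=0,         # seed of the first generated map
    traffic_density=0.1,  # density of surrounding traffic
    use_render=False,     # headless mode for faster training
))
obs = env.reset()         # observation vector fed to the agent
                          # (newer gymnasium-style versions return (obs, info))
print(env.action_space)   # continuous [steering, throttle/brake]
env.close()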
Design Description
MetaDrive generates an environment for the agent to explore, handling the physical interactions between the vehicle and the map. The agent generates actions based on its observations and interactions with the environment, receiving feedback in the form of rewards. Based on the DDPG algorithm, the agent learns from these experiences by exploring and exploiting the possible outcomes of different actions. These back-and-forth interactions form a loop, the training process, which does not terminate until convergence.
Figure 3: Concept diagram
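The observe-act-learn loop described above can be sketched as follows. Here, DDPG-specific details live inside a hypothetical agent object with act/store/learn methods; this is a placeholder for an actor-critic implementation with a replay buffer, not the team's actual code.

# Sketch of the agent-environment training loop (hypothetical agent API).
def train(env, agent, max_episodes=1000):
    for episode in range(max_episodes):
        obs = env.reset()
        done = False
        while not done:
            # The actor proposes an action; typical DDPG implementations
            # add exploration noise inside agent.act().
            action = agent.act(obs)
            next_obs, reward, done, info = env.step(action)
            # Store the transition, then update the actor and critic
            # from a minibatch sampled out of the replay buffer.
            agent.store(obs, action, reward, next_obs, done)
            agent.learn()
            obs = next_obs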
Validation
Validation Process:
For the energy optimization rate, the energy consumption of the baseline IDM (Intelligent Driver Model) policy and of the DDPG agent were compared. For the energy consumption itself, a vehicle energy estimation model [2], which considers friction, wind drag, acceleration, and the steering process, was applied.
For the response time and process delay, the average over multiple time slots within one process was calculated.
Some other specifications can also be verified with simple experiments.
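A per-step energy estimate in the spirit of the model in [2] (rolling friction, aerodynamic drag, and acceleration; the steering term of the full model is omitted here), together with a wall-clock measurement of the process delay, can be sketched as follows. All coefficients below are illustrative assumptions, not the project's calibrated values.

import time

M, G = 1500.0, 9.81               # vehicle mass (kg), gravity (m/s^2)
C_RR = 0.01                       # rolling-resistance coefficient
RHO, CD, AREA = 1.225, 0.30, 2.2  # air density, drag coeff., frontal area (m^2)

def step_energy(v, a, dt):
    """Traction energy (J) spent over one simulation step."""
    f_roll = C_RR * M * G                   # rolling friction force
    f_drag = 0.5 * RHO * CD * AREA * v * v  # aerodynamic drag force
    f_acc = M * a                           # force needed to accelerate
    power = (f_roll + f_drag + f_acc) * v   # traction power (W)
    return max(power, 0.0) * dt             # assume no regeneration

def timed_step(env, action):
    """Run one control step and return its wall-clock delay (s)."""
    t0 = time.perf_counter()
    result = env.step(action)
    return result, time.perf_counter() - t0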
Validation Results:
According to the validation results, most of the specifications are met:
Energy optimization rate >= 5%
Response time <= 0.02 s
Process delay <= 0.016 s
Braking distance at 50 km/h <= 21 m
Development time <= 200 h
Cost <= $1500
Average speed >= 81 km/h
(Some values are to be further determined and subject to change.)
Modeling and Analysis
At each step in which the agent interacts with the environment, a reward is given as a weighted sum of stability, speed, and energy consumption. As can be seen in the figure below, at the beginning of training the agent only occasionally obtains a high reward. As training goes on, however, the agent consistently obtains higher rewards, which means it is learning to behave better. A peak even appears later in training, meaning that the vehicle successfully reaches the terminal state.
Figure 4: Reward with training step
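A reward of this weighted-sum form can be sketched as below. The weights and the definitions of the individual terms are illustrative assumptions, since the exact formulation is not given here.

# Illustrative weighted-sum reward over stability, speed, and energy.
W_STAB, W_SPEED, W_ENERGY = 0.3, 0.5, 0.2  # assumed weights

def reward(lateral_offset, speed, target_speed, step_energy_j):
    stability = -abs(lateral_offset)    # penalize lane-center deviation
    speed_term = speed / target_speed   # encourage driving near target speed
    energy_term = -step_energy_j / 1e4  # penalize per-step energy (scaled)
    return W_STAB * stability + W_SPEED * speed_term + W_ENERGY * energy_term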
Conclusion
Reinforcement learning can be used to optimize vehicle energy consumption in a simulation platform. A well-designed reward is essential for successful training. In addition, the energy optimization rate and the response time are vital metrics for evaluating the optimization method.
Acknowledgement
Faculty Advisor: Chengbin Ma from UM-SJTU Joint Institute
Sponsor: Binjian Xin from Newrizon
References
[1] Newrizon Vehicle Company. https://www.newrizon.com/. Accessed October 24, 2021.
[2] Charles Mendler. "Equations for Estimating and Optimizing the Fuel Economy of Future Automobiles". 1993.