Video showing the project working with the camera in view:
Video showing the best run of the project:
https://youtube.com/shorts/0SzdhBfsPbw?feature=share
The aim of this project was to understand how AI and VLMs (Vision-Language Models) can be integrated into modern robotics, especially in path planning for manipulators.
The task was to integrate pre-trained AI models such as YOLO-World and use them to detect an object the user specifies, so that the manipulator can move toward that object.
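To illustrate the core idea, here is a minimal sketch of the "detect, then target" step. Note that the `Detection` format and the `target_pixel` helper are assumptions for illustration, not the project's actual code: given detections for a user-specified label, pick the highest-confidence box and return its center as the pixel target the manipulator should move toward.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detection from an open-vocabulary detector (hypothetical format)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels

def target_pixel(detections, query):
    """Return the center of the highest-confidence box whose label matches
    `query`, or None if the requested object was not detected."""
    matches = [d for d in detections if d.label == query]
    if not matches:
        return None
    best = max(matches, key=lambda d: d.confidence)
    x1, y1, x2, y2 = best.box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

if __name__ == "__main__":
    dets = [
        Detection("cup", 0.91, (100, 50, 180, 150)),
        Detection("cup", 0.40, (300, 300, 340, 360)),
        Detection("bottle", 0.88, (10, 10, 60, 120)),
    ]
    print(target_pixel(dets, "cup"))  # → (140.0, 100.0)
```

In the real system this pixel target would still need to be converted into a 3D goal pose for the manipulator (e.g. via camera calibration and depth), which is where the path-planning side of the project comes in.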
I was inspired to do this project after coming across Stanford University's VoxPoser project during my undergraduate years.
VoxPoser: https://voxposer.github.io/
Note: this project was done at home, so I used whatever equipment was available to me at the time. Even so, it required only two pieces of hardware plus my computer.