This is a template portfolio for the students attending the BlueStamp Engineering program.

Project maintained by KujoJotaro99 Hosted on GitHub Pages — Theme by mattgraham

Object Detection on Raspberry Pi

This is my chosen project for the BlueStamp summer program. My aim is to build an object detection program on the Raspberry Pi using my own machine learning model. I am focusing my project on detecting vehicles because I am fascinated by cars, and I also want to learn more about AI and its uses in the real world.


Engineer | School | Area of Interest | Grade
Subrahmanian Hari | Dougherty Valley High School | Engineering | Incoming Senior

Setting up the Raspberry Pi

Here are the materials you will need

(asterisk: optional)

I first set up my Raspberry Pi by using the Raspberry Pi Imager, provided on the Raspberry Pi downloads page, to write the Raspbian OS onto my micro SD card via an adapter and a separate reader. Then, I inserted the SD card into the slot underneath the Raspberry Pi’s board. After peeling off the adhesive backing, I stuck the two heatsinks onto the RAM and CPU of the Raspberry Pi. I attached the accompanying ribbon cable to the Pi Camera and then inserted the other end of the cable into the camera port. Once everything was secure, I plugged in the power brick (warning: since the Raspberry Pi doesn’t have a dedicated power button, it remains on until the plug is pulled). The Raspberry Pi then prompted me to install updates and set up my account. Once my setup was complete, I could start my project.

Before setup After setup

First Milestone

My first milestone was getting a pre-made object detection model to work on the Raspberry Pi using the TensorFlow library. The model I used had been gathered and trained by Google. The setup was quite tedious, as there were many configurations to make and many dependencies and packages to install. This milestone was meant to test my Raspberry Pi and camera, to ensure everything was set up correctly and that the Raspberry Pi was capable of handling object detection (considering it has only 1 GB of RAM and is bound to run into one issue or another). Of course, I am not satisfied with stopping here. I intend to gather my own data and create annotations in order to train my own model on a service called Nanonets.

Second Milestone

This milestone involved a few steps: gathering data, annotating the data, uploading both the gathered images and their annotations onto Nanonets, training a new model, and finally making a program to detect cars. Initially, I went out to capture images of passing vehicles and parked cars and manually annotated them on Nanonets. That was the plan until I realized that a hundred images was nowhere near enough to train a reliable model. This was a bit upsetting, but I discovered that there were plenty of datasets online whose images could be used to train a model. The dataset I used was the Stanford Cars Dataset. It contains over sixteen thousand images, but I only used 8,144 of them. Unfortunately, the annotations were in a .mat format, which I was unable to open. However, my instructor provided me with a readable version that allowed me to use them in my model.

Uploading the images and annotations onto Nanonets manually was nearly impossible. The website frequently glitched and did not allow more than 10 images to be uploaded at a time, and there was a limit of 5 label types (I had 196 types of cars). To work around this, I decided to upload the images via API calls. I made a script that opens and parses each XML file, finds the corresponding image, and calls the API to upload them together. Due to the label limit, I replaced the category/name value in each XML file with simply ‘cars’. It took over 2 hours to upload every image and XML file onto Nanonets, but that was far quicker than doing it manually. I then trained the model successfully. The final accuracy was approximately 99%, although the result changed every time I tested it with a new image.

On my Raspberry Pi, I wrote a Python script that used OpenCV to show a live preview from the camera module. Each frame was sent as an image to Nanonets, which returned a prediction that was printed in the console. The script then drew a box on the image using the bounding box coordinates from the JSON response. Although this worked, the API response was far too slow to display a true live stream, so I removed the preview and had the program update an image file stored on the Desktop every few seconds instead. It was still slow, but significantly faster than before.
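
The relabeling step mentioned above (collapsing every car type into a single ‘cars’ label) could look roughly like the sketch below. It is only an illustration, not the exact script I ran: it assumes the annotations follow a Pascal VOC style layout with an object element containing a name element, and the function name relabel_annotation and the sample annotation values are made up for the example.

```python
import xml.etree.ElementTree as ET


def relabel_annotation(xml_text, new_label="cars"):
    """Return the annotation XML with every object's <name> set to new_label.

    Assumes a Pascal VOC style layout: <annotation><object><name>...</name>.
    """
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.find("name")
        if name is not None:
            name.text = new_label
    return ET.tostring(root, encoding="unicode")


# Hypothetical annotation, only for illustration.
sample = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>Some Car Model 2012</name>
    <bndbox><xmin>39</xmin><ymin>116</ymin><xmax>569</xmax><ymax>375</ymax></bndbox>
  </object>
</annotation>"""

relabeled = relabel_annotation(sample)
print(relabeled)
```

Only the label text changes; the filename and the bounding box coordinates are left untouched, so the annotation still lines up with its image.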

Here is the code I wrote to upload the images and annotations to Nanonets (in case you are using this as a guide):

import os
import json
import requests
from tqdm import tqdm
from xml.dom import minidom

model_id = 'model ID shown after model generation'
api_key = 'the API key assigned to you when you create an account'
url = 'Put the URL nanonets generates for you here'
directory = 'where your image files are located in this project'
directory2 = 'where your xml files are located in this project'

# go through every image (tqdm shows a progress bar)
for image in tqdm(sorted(os.listdir(directory))):
    if image.endswith(".jpg"):
        # get your corresponding xml file
        xmlName = os.path.join(directory2, image[:-4] + '.xml')
        # if it cannot find the xml file, it will simply continue
        # in order to prevent mismatched annotations
        if not os.path.exists(xmlName):
            continue
        mydoc = minidom.parse(xmlName)
        # get category, this is if you have more than one category.
        # For reference I simply had 'cars' so I technically did not
        # even need this line, but for your convenience.
        name = mydoc.getElementsByTagName('name')
        category = name[0].firstChild.data
        # get xmin
        x1 = mydoc.getElementsByTagName('xmin')
        xmin = int(x1[0].firstChild.data)
        # get xmax
        x2 = mydoc.getElementsByTagName('xmax')
        xmax = int(x2[0].firstChild.data)
        # get ymin
        y1 = mydoc.getElementsByTagName('ymin')
        ymin = int(y1[0].firstChild.data)
        # get ymax
        y2 = mydoc.getElementsByTagName('ymax')
        ymax = int(y2[0].firstChild.data)
        # build the annotation in the JSON layout sent along with the image
        info = json.dumps([{"filename": image, "object": [{"name": category, "bndbox": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}}]}])
        # upload the image and its annotation together; the API key comes
        # from the api_key variable above instead of being hard-coded here
        with open(os.path.join(directory, image), 'rb') as image_file:
            data = {'file': image_file, 'data': ('', info), 'modelId': model_id}
            response = requests.post(url, auth=requests.auth.HTTPBasicAuth(api_key, ''), files=data)
        #print(image, xmlName, xmin, xmax, ymin, ymax, response.status_code)
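
For the detection side of this milestone (reading the bounding box out of the JSON response and drawing it on the frame), a minimal sketch might look like the one below. The response layout here is an assumption on my part: a top-level "result" list whose entries hold a "prediction" list with label, xmin, ymin, xmax, ymax and score fields. Check the JSON your own model returns before relying on it. The helper name extract_boxes, the min_score threshold, and the sample response are all hypothetical.

```python
import json


def extract_boxes(response_text, min_score=0.5):
    """Collect (label, xmin, ymin, xmax, ymax) tuples from a prediction response.

    Assumed layout: {"result": [{"prediction": [{"label": ..., "xmin": ...,
    "ymin": ..., "xmax": ..., "ymax": ..., "score": ...}]}]}
    """
    payload = json.loads(response_text)
    boxes = []
    for result in payload.get("result", []):
        for pred in result.get("prediction", []):
            # skip low-confidence predictions
            if pred.get("score", 0.0) >= min_score:
                boxes.append((pred["label"], pred["xmin"], pred["ymin"],
                              pred["xmax"], pred["ymax"]))
    return boxes


# Made-up response, only for illustration.
sample_response = json.dumps({"result": [{"prediction": [
    {"label": "cars", "xmin": 40, "ymin": 60, "xmax": 300, "ymax": 220, "score": 0.97},
    {"label": "cars", "xmin": 5, "ymin": 5, "xmax": 20, "ymax": 20, "score": 0.12},
]}]})

boxes = extract_boxes(sample_response)
print(boxes)

# With OpenCV installed, each box could then be drawn on the frame like this:
# for label, xmin, ymin, xmax, ymax in boxes:
#     cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
```

Keeping the parsing separate from the drawing makes it easy to test the parsing on a saved response without a camera attached.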

Third Milestone

This was the final milestone for my Raspberry Pi object detection project. It involved setting up a small user interface with tkinter in Python. I added a button that lets the user take a snapshot whenever they want, rather than automatically taking a snapshot at fixed intervals. This improved two aspects of my project: speed and API usage. When I initially tried the live stream feature, I set up my program to take a picture at a fixed interval (set by my time.sleep(n) line, where n is the number of seconds to wait). However, this constant cycle of taking snapshots and updating the image frame caused heavy lag. My Nanonets model would be receiving images almost every second, and the response time was very slow. It would often take up to 10 seconds for the frame to load after the camera started taking images, and sometimes it would not load at all. I decided it was best to throw out the live stream idea and simply let the user decide when to detect an object. This also reduced my API usage. I am not particularly enthusiastic about paying for a subscription, so I had to be careful about the number of images I captured. These two changes really helped my program run more efficiently.
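
The button-driven approach can be sketched as follows. This is a simplified illustration rather than my actual program: SnapshotController, build_window, and the capture_and_detect callback are hypothetical names, and the optional cooldown between presses is an extra assumption that is not part of my original design. The callback stands in for whatever function captures a frame and sends it to the model.

```python
import time


class SnapshotController:
    """Runs detection only when the user asks, with an optional cooldown."""

    def __init__(self, capture_and_detect, cooldown_seconds=0.0, clock=time.monotonic):
        self.capture_and_detect = capture_and_detect
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.last_call = None
        self.api_calls = 0  # how many requests have actually been made

    def on_button_press(self):
        now = self.clock()
        # ignore presses that arrive before the cooldown has passed
        if self.last_call is not None and now - self.last_call < self.cooldown_seconds:
            return None
        self.last_call = now
        self.api_calls += 1
        return self.capture_and_detect()


def build_window(controller):
    """Build a small tkinter window with one snapshot button."""
    import tkinter as tk  # imported here so the controller works without a display

    root = tk.Tk()
    root.title("Car detector")
    status = tk.Label(root, text="Press the button to detect")
    status.pack()

    def handle_click():
        result = controller.on_button_press()
        status.config(text=str(result) if result is not None else "Please wait...")

    tk.Button(root, text="Take snapshot", command=handle_click).pack()
    return root
```

On the Pi, this would be started with something like build_window(SnapshotController(my_detect_function)).mainloop(), where my_detect_function is your own capture-and-predict function. Because one request is made per press instead of one per interval, the number of API calls is exactly the number of snapshots the user asked for.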

My Demo Night


I was able to produce a working demo for my project, but I still believe it is far from perfect, as I was not able to accomplish everything I wished for. I originally intended to design a more capable interface and also show the user a live preview of their camera. To do this, I created a Node.js web server that allows the user to take an image directly on a page and view it inside a canvas. My current plan is to implement a feature that lets the user run a prediction on the captured image in real time and get a result from my model. In addition, rather than simply predicting whether a car exists, I aim to implement a feature to detect the year and make of the car. That will require extending my model further, but thanks to the experience I have gained here at BSE, I feel like I can accomplish it!

About Me


Hello! My name is Mani and I am a 16-year-old programmer living in San Ramon. I have been programming since 2018, and my primary interests lie in game and web design. When I am not in school or with my friends, you can find me making sprites in Photoshop, trying to debug a random TypeScript error, or doing something creative. Recently, I started getting into cars and making them the focus of my programming and hobbies. I love to draw and model cars and also make 2D racing games in Unity. Prior to this program, I was unaware of the many tools available to build AI. At BlueStamp, I was able to familiarize myself with the world of machine learning, and I am motivated to continue my path of learning.