Sure, here's a line-by-line explanation:
def convert2original(image, bbox):
    x, y, w, h = convert2relative(bbox)
    image_h, image_w, __ = image.shape
    orig_x = int(x * image_w)
    orig_y = int(y * image_h)
    orig_width = int(w * image_w)
    orig_height = int(h * image_h)
    bbox_converted = (orig_x, orig_y, orig_width, orig_height)
    return bbox_converted
This function takes an image and a bounding box bbox, a tuple of (x, y, width, height). It first calls convert2relative() to express the box in relative coordinates, then multiplies those values by the width and height of the original image and returns the box in absolute pixel coordinates, again as (x, y, width, height). Judging by convert4cropping() below, x and y are the center of the box rather than its top-left corner.
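To check the arithmetic by hand, here is a minimal standalone sketch of the same scaling step. The helper name relative_to_pixels and the sample numbers are made up for illustration, and the input is assumed to be already in relative (0 to 1) form.

```python
def relative_to_pixels(rel_bbox, image_w, image_h):
    """Scale a relative (x, y, w, h) box to pixel units.

    Hypothetical helper for illustration only: it mirrors the
    multiplications in convert2original() but takes the image size
    directly instead of reading image.shape.
    """
    x, y, w, h = rel_bbox
    return (int(x * image_w), int(y * image_h),
            int(w * image_w), int(h * image_h))


# A box centered in a 640x480 frame, a quarter as wide and half as tall.
print(relative_to_pixels((0.5, 0.5, 0.25, 0.5), 640, 480))  # (320, 240, 160, 240)
```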
def convert4cropping(image, bbox):
    x, y, w, h = convert2relative(bbox)
    image_h, image_w, __ = image.shape
    orig_left = int((x - w / 2.) * image_w)
    orig_right = int((x + w / 2.) * image_w)
    orig_top = int((y - h / 2.) * image_h)
    orig_bottom = int((y + h / 2.) * image_h)
    if (orig_left < 0): orig_left = 0
    if (orig_right > image_w - 1): orig_right = image_w - 1
    if (orig_top < 0): orig_top = 0
    if (orig_bottom > image_h - 1): orig_bottom = image_h - 1
    bbox_cropping = (orig_left, orig_top, orig_right, orig_bottom)
    return bbox_cropping
This function takes the same inputs and also converts bbox to relative coordinates with convert2relative(). Treating (x, y) as the center of the box, it computes the absolute pixel positions of the left, right, top and bottom edges, clamps each one to the image bounds (0 to width - 1 and 0 to height - 1), and returns them as (left, top, right, bottom). That corner format makes it easy to crop the detected object out of the image later with array slicing.
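The center-to-corners step and the clamping can be tried without an image. This is only a sketch: center_to_corners is a hypothetical name, and the box is assumed to be in relative coordinates already.

```python
def center_to_corners(rel_bbox, image_w, image_h):
    """Turn a relative center-format box into clamped pixel corners.

    Hypothetical helper for illustration; it follows the same formulas as
    convert4cropping() but uses max()/min() for the clamping.
    """
    x, y, w, h = rel_bbox
    left = max(int((x - w / 2) * image_w), 0)
    right = min(int((x + w / 2) * image_w), image_w - 1)
    top = max(int((y - h / 2) * image_h), 0)
    bottom = min(int((y + h / 2) * image_h), image_h - 1)
    return left, top, right, bottom


# Fully inside a 640x480 frame: nothing is clamped.
print(center_to_corners((0.5, 0.5, 0.5, 0.5), 640, 480))    # (160, 120, 480, 360)
# Spills past the right edge: right is clamped to image_w - 1.
print(center_to_corners((0.875, 0.5, 0.5, 0.5), 640, 480))  # (400, 120, 639, 360)
```

The returned tuple can be used directly as frame[top:bottom, left:right] on a NumPy image.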
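def video_capture(frame_queue, darknet_image_queue):
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break  # no more frames to read
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = cv2.resize(frame_rgb, (darknet_width, darknet_height),
                                   interpolation=cv2.INTER_LINEAR)
        frame_queue.put(frame)  # original frame, used later for drawing
        img_for_detect = darknet.make_image(darknet_width, darknet_height, 3)
        darknet.copy_image_from_bytes(img_for_detect, frame_resized.tobytes())
        darknet_image_queue.put(img_for_detect)  # resized copy, used for inference
    cap.release()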
This function is run in a separate thread, and reads successive frames from the capture device (e.g. webcam or video file). For each frame read, it first converts the color space from BGR to RGB and resizes it to match the input size expected by the YOLOv3 model. It then adds the original frame to a queue for processing by subsequent threads, and creates a Darknet image object in memory for inference by the YOLOv3 algorithm, adding this to another queue for processing by subsequent threads. Once all frames are processed, the capture device is released.
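The hand-off between threads can be seen in isolation with the standard library alone. In this sketch the frames are plain strings, and the STOP sentinel is an addition for the example (the script above relies on cap.isOpened() instead), so treat it as one possible shutdown scheme rather than what the original code does.

```python
from queue import Queue
from threading import Thread

STOP = None  # sentinel (an assumption of this sketch): no more frames will arrive


def capture(frames, frame_queue):
    """Stand-in for video_capture(): push every frame, then the sentinel."""
    for frame in frames:
        frame_queue.put(frame)
    frame_queue.put(STOP)


def consume(frame_queue, seen):
    """Stand-in for a downstream stage: drain the queue until the sentinel."""
    while True:
        frame = frame_queue.get()  # blocks until the producer has put an item
        if frame is STOP:
            break
        seen.append(frame)


frame_queue = Queue(maxsize=1)  # a small bound stops the producer racing ahead
seen = []
producer = Thread(target=capture, args=(["frame0", "frame1", "frame2"], frame_queue))
consumer = Thread(target=consume, args=(frame_queue, seen))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(seen)  # ['frame0', 'frame1', 'frame2']
```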
def inference(darknet_image_queue, detections_queue, fps_queue):
    while cap.isOpened():
        darknet_image = darknet_image_queue.get()
        prev_time = time.time()
        detections = darknet.detect_image(network, class_names, darknet_image, thresh=args.thresh)
        detections_queue.put(detections)  # hand the results to the drawing stage
        fps = int(1/(time.time() - prev_time))
        fps_queue.put(fps)
        print("FPS: {}".format(fps))
        darknet.print_detections(detections, args.ext_output)
        darknet.free_image(darknet_image)  # release the Darknet image buffer
    cap.release()
This function is also run in a separate thread, and processes each Darknet image object passed to it from the queue by running inference with the YOLOv3 algorithm on that image. It then adds the detected objects and their associated bounding boxes to a separate queue for processing by subsequent threads, calculates the frames per second (FPS) of inference, prints this to the console, and releases the memory allocated to the Darknet image object.
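The FPS figure is just the reciprocal of the time one inference call takes. Here is a small sketch of that measurement with a fake detector standing in for darknet.detect_image(); measure_fps and fake_detect are hypothetical names, and the guard against a zero elapsed time is an addition that the original line does not have.

```python
import time


def measure_fps(fn, *args):
    """Call fn(*args) and return (result, whole frames per second)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval, which would raise ZeroDivisionError.
    fps = int(1 / elapsed) if elapsed > 0 else 0
    return result, fps


def fake_detect(delay):
    """Pretend detector: sleeps, then returns one (label, confidence, bbox)."""
    time.sleep(delay)
    return [("plate", "97.5", (208.0, 208.0, 100.0, 40.0))]


detections, fps = measure_fps(fake_detect, 0.05)
print("FPS: {}".format(fps))
```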
def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
To perform character recognition on the detected bounding boxes using PaddleOCR and output the recognition result, you can modify the `drawing()` function to include another thread that takes in detections from the queue and performs OCR on the cropped images within those bounding boxes. Here's an example of what this modified function might look like:
from queue import Queue
from threading import Thread

from paddleocr import PaddleOCR

# Create the OCR engine once; building it on every call reloads the models.
# lang='ch' selects the Chinese models.
ocr_engine = PaddleOCR(lang='ch')


def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
    # Start OCR thread. It becomes the only reader of frame_queue and
    # detections_queue, so each frame stays paired with its own detections.
    ocr_queue = Queue()
    Thread(target=perform_ocr, args=(detections_queue, ocr_queue)).start()
    while cap.isOpened():
        # The OCR thread forwards the frame and detections with its results
        frame, detections, ocr_results = ocr_queue.get()
        fps = fps_queue.get()
        detections_adjusted = []
        if frame is not None:
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                detections_adjusted.append((str(label), confidence, bbox_adjusted))
                # Extract cropped image and perform OCR
                bbox_cropped = convert4cropping(frame, bbox)
                cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                ocr_text = ocr_results.get(str(bbox_adjusted))
                if ocr_text is None:
                    ocr_text = perform_paddleocr(cropped_image)
                    ocr_results[str(bbox_adjusted)] = ocr_text
            image = darknet.draw_boxes(detections_adjusted, frame, class_colors)
            if not args.dont_show:
                cv2.imshow('Inference', image)
            if args.out_filename is not None:
                video.write(image)
            if cv2.waitKey(fps) == 27:
                break
    cap.release()
    video.release()
    cv2.destroyAllWindows()


def perform_ocr(detections_queue, ocr_queue):
    # frame_queue is read as a module-level global here (an assumption,
    # in the same way cap and args are used throughout the script).
    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        ocr_results = {}
        if frame is not None:
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                # Extract cropped image and perform OCR
                bbox_cropped = convert4cropping(frame, bbox)
                cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                ocr_text = perform_paddleocr(cropped_image)
                ocr_results[str(bbox_adjusted)] = ocr_text
        # Hand the frame, its detections and the OCR text to drawing()
        ocr_queue.put((frame, detections, ocr_results))


def perform_paddleocr(image):
    ocr_result = ''
    if image.size == 0:  # empty crop, nothing to recognize
        return ocr_result
    result = ocr_engine.ocr(image)
    # Assumes the PaddleOCR 2.x layout: one list per image, and each
    # entry is [box, (text, score)]. Check this against your version.
    for line in result or []:
        for word in line or []:
            ocr_result += word[1][0]
    return ocr_result
In this modified `drawing()` function, we start a new thread by calling `Thread(target=perform_ocr, args=(detections_queue, ocr_queue)).start()` right after initializing the `ocr_queue`. The `perform_ocr()` function takes in the same `detections_queue` as the `drawing()` function, but it also has an additional queue called `ocr_queue` that is used to store OCR results for each detected bounding box.
Inside the `drawing()` function's main while loop, we add code to extract the cropped image from each bounding box using the `convert4cropping()` function (similar to the code for drawing bounding boxes). We then check if there is already an OCR result stored for this bounding box in the `ocr_results` dictionary. If not, we use PaddleOCR to perform character recognition on the cropped image and save the result. We then add this result back to the `ocr_results` dictionary with the bounding box coordinates as the key.
The OCR thread takes detections from the `detections_queue`, crops each detected bounding box, runs PaddleOCR on the crop, and stores the text in the `ocr_results` dictionary, keyed by the bounding box coordinates. It then puts this dictionary on the `ocr_queue`, and the main loop in `drawing()` reads it from there, so the OCR text for each bounding box is available when the frame is drawn. Because `Queue.get()` removes the item it returns, each queue should have a single consumer: if both threads call `detections_queue.get()`, each one sees only some of the detections. The OCR thread also needs access to the frame that matches each set of detections.
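Note that this example assumes you have already installed PaddleOCR and want a pre-trained model for Chinese character recognition. In PaddleOCR's Python API the language is normally chosen with the `lang` argument (for example `PaddleOCR(lang='ch')`); `det_model_dir` and `rec_model_dir` expect paths to model directories rather than a language code. You may need to adjust this code if you are using a different model or language.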
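Since the nested result is the easiest part to get wrong, here is a small sketch of how the recognized text can be pulled out of it. It runs on plain lists, so PaddleOCR itself is not needed; the layout shown (one list per image, each entry `[box, (text, score)]`) is an assumption based on PaddleOCR 2.x and should be checked against the version you install. `join_ocr_text` and `min_score` are hypothetical names.

```python
def join_ocr_text(result, min_score=0.0):
    """Concatenate recognized strings from a PaddleOCR-style result.

    Assumed layout: result is a list with one entry per image, and each
    entry is either None (no text found) or a list of [box, (text, score)].
    """
    texts = []
    for page in result or []:
        for box, (text, score) in page or []:
            if score >= min_score:
                texts.append(text)
    return "".join(texts)


# Hand-written sample in the assumed layout: two text boxes in one image.
sample = [[
    [[[0, 0], [50, 0], [50, 20], [0, 20]], ("ABC", 0.98)],
    [[[60, 0], [90, 0], [90, 20], [60, 20]], ("123", 0.40)],
]]

print(join_ocr_text(sample))                 # ABC123
print(join_ocr_text(sample, min_score=0.5))  # ABC
print(repr(join_ocr_text([None])))           # ''
```

Filtering on the score is optional, but it keeps low-confidence fragments out of the final string.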
To perform character recognition on the detected bounding boxes, you can modify the `drawing()` function to include another thread that takes in detections from the queue and performs optical character recognition (OCR) on the cropped images within those bounding boxes. Here's an example of what this modified function might look like:
from queue import Queue
from threading import Thread

import pytesseract


def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
    # Start OCR thread. It becomes the only reader of frame_queue and
    # detections_queue, so each frame stays paired with its own detections.
    ocr_queue = Queue()
    Thread(target=perform_ocr, args=(detections_queue, ocr_queue)).start()
    while cap.isOpened():
        # The OCR thread forwards the frame and detections with its results
        frame, detections, ocr_results = ocr_queue.get()
        fps = fps_queue.get()
        detections_adjusted = []
        if frame is not None:
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                detections_adjusted.append((str(label), confidence, bbox_adjusted))
                # Extract cropped image and perform OCR
                bbox_cropped = convert4cropping(frame, bbox)
                cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                ocr_text = ocr_results.get(str(bbox_adjusted))
                if ocr_text is None:
                    ocr_text = pytesseract.image_to_string(cropped_image)
                    ocr_results[str(bbox_adjusted)] = ocr_text
            image = darknet.draw_boxes(detections_adjusted, frame, class_colors)
            if not args.dont_show:
                cv2.imshow('Inference', image)
            if args.out_filename is not None:
                video.write(image)
            if cv2.waitKey(fps) == 27:
                break
    cap.release()
    video.release()
    cv2.destroyAllWindows()


def perform_ocr(detections_queue, ocr_queue):
    # frame_queue is read as a module-level global here (an assumption,
    # in the same way cap and args are used throughout the script).
    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        ocr_results = {}
        if frame is not None:
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                # Extract cropped image and perform OCR
                bbox_cropped = convert4cropping(frame, bbox)
                cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                ocr_text = pytesseract.image_to_string(cropped_image)
                ocr_results[str(bbox_adjusted)] = ocr_text
        # Hand the frame, its detections and the OCR text to drawing()
        ocr_queue.put((frame, detections, ocr_results))
In this modified `drawing()` function, we start a new thread by calling `Thread(target=perform_ocr, args=(detections_queue, ocr_queue)).start()` right after initializing the `ocr_queue`. The `perform_ocr()` function takes in the same `detections_queue` as the `drawing()` function, but it also has an additional queue called `ocr_queue` that is used to store OCR results for each detected bounding box.
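Inside the `drawing()` function's main while loop, we add code to extract the cropped image from each bounding box using the `convert4cropping()` function, as in the PaddleOCR version above; the only difference is that the text comes from `pytesseract.image_to_string()`.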
Section 1
To draw a box around the detections and print OCR results in the box, you can modify the `perform_ocr` function as follows:
def perform_ocr(detections_queue, ocr_queue):
    # This version also displays and saves the frames, so run it in place of
    # drawing(). frame_queue and fps_queue are read as module-level globals
    # here (an assumption, in the same way cap and args are used).
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        fps = fps_queue.get()
        ocr_results = {}
        if frame is None:
            continue
        for label, confidence, bbox in detections:
            bbox_adjusted = convert2original(frame, bbox)
            # Extract cropped image and perform OCR
            left, top, right, bottom = convert4cropping(frame, bbox)
            cropped_image = frame[top:bottom, left:right]
            ocr_text = perform_paddleocr(cropped_image)
            ocr_results[str(bbox_adjusted)] = ocr_text
            # Draw box and print OCR results. The corners come from
            # convert4cropping(); bbox_adjusted holds (x, y, width, height).
            cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2)
            cv2.putText(frame, ocr_text, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA)
        ocr_queue.put(ocr_results)  # publish the results for any other consumer
        if not args.dont_show:
            cv2.imshow('Inference', frame)
        if args.out_filename is not None:
            video.write(frame)
        if cv2.waitKey(fps) == 27:
            break
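This modified code adds two lines that draw a rectangle around the detected object and print the OCR text just above it, 10 pixels above the top-left corner. The cv2.rectangle function takes the frame, the top-left and bottom-right corners of the rectangle, the color, and the thickness as arguments. Both corners must be pixel coordinates such as those returned by convert4cropping(); the tuple from convert2original() holds a width and a height rather than a second corner. The cv2.putText function takes the frame, the text to display, the position of the text (its bottom-left corner), the font face and scale, the color, the thickness, and the line type as arguments. OpenCV's built-in Hershey fonts cover only ASCII characters, so Chinese text is drawn as question marks; to show it, render the text with another library such as Pillow.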
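One practical detail is where the label goes when the box touches the top of the frame, since top - 10 can then be negative and the text is drawn off screen. Here is a minimal sketch of one way to pick the position; label_origin, text_height and gap are hypothetical names and values (in real code cv2.getTextSize() can supply the actual text height).

```python
def label_origin(left, top, text_height=12, gap=4):
    """Return the bottom-left corner for a label so it stays in the frame.

    Hypothetical helper: puts the label above the box when there is room,
    otherwise just inside the top edge of the box.
    """
    y = top - gap
    if y - text_height < 0:          # not enough room above the box
        y = top + gap + text_height  # draw inside the box instead
    return left, y


print(label_origin(100, 50))  # (100, 46): fits above the box
print(label_origin(100, 5))   # (100, 21): moved inside the box
```

The returned tuple can be passed as the position argument of cv2.putText.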
Section 2
The line ocr_text = ocr_results.get(str(bbox_adjusted)) retrieves the OCR results for the current detection bbox from the ocr_results dictionary.
str(bbox_adjusted) is used as the dictionary key because it turns the box coordinates into a hashable string that ties each OCR result to its detection within a frame. bbox_adjusted is the tuple returned by convert2original(): the (x, y, width, height) of the box in pixel coordinates of the original frame, after undoing the resize that was applied for detection. The key is only as stable as the box itself: if the detector reports the same object one pixel away in the next frame, the key changes and the lookup misses.
The get() method is called on the ocr_results dictionary with str(bbox_adjusted) as its argument. If a matching key exists in the dictionary, the corresponding OCR text value is returned and assigned to the ocr_text variable. If no matching key is found, the get() method returns None.
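The lookup behaviour is easy to see with a plain dictionary. The sketch below also shows one optional way to make keys tolerate small box jitter by snapping coordinates to a grid; that snapping is a suggestion of this example, not something the code above does, and bbox_key and grid are hypothetical names.

```python
def bbox_key(bbox, grid=1):
    """Build a dictionary key from an (x, y, w, h) pixel box.

    With grid=1 this equals str(bbox). With grid > 1 every value is
    rounded down to a multiple of grid, so nearby boxes share a key.
    """
    return str(tuple((v // grid) * grid for v in bbox))


ocr_results = {bbox_key((320, 240, 160, 80)): "ABC123"}

# Exact same box: the key matches and the cached text is returned.
print(ocr_results.get(bbox_key((320, 240, 160, 80))))  # ABC123
# Box shifted by one pixel: different key, so get() returns None.
print(ocr_results.get(bbox_key((321, 240, 160, 80))))  # None

# Coarser keys: boxes within the same 8-pixel cells map to one entry.
coarse = {bbox_key((320, 240, 160, 80), grid=8): "ABC123"}
print(coarse.get(bbox_key((321, 243, 162, 85), grid=8)))  # ABC123
```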
The line ocr_results = ocr_queue.get() retrieves the OCR results dictionary from the ocr_queue.
The ocr_queue is a Python Queue object that is used to pass data between different threads in a synchronized way. In this code, it is used to pass the OCR results dictionary from the thread that performs OCR on each detection frame to the main thread that displays the detected objects with their OCR results.
The get() method of the ocr_queue blocks until an item is available in the queue. Once an item is available, it removes and returns it from the queue. When the OCR thread puts the OCR results dictionary into the queue using ocr_queue.put(ocr_results), the get() method of the main thread can retrieve it for further processing.
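Here is a small standard-library sketch of that blocking behaviour. The worker, its delay and the sample dictionary are made up for the example; the timeout variant is shown because a plain get() waits forever if nothing is ever put on the queue.

```python
import queue
import threading
import time

ocr_queue = queue.Queue()


def ocr_worker():
    """Stand-in for the OCR thread: works for a while, then publishes."""
    time.sleep(0.2)
    ocr_queue.put({"(320, 240, 160, 80)": "ABC123"})


threading.Thread(target=ocr_worker).start()

# get() with a timeout raises queue.Empty if nothing arrives in time.
try:
    ocr_queue.get(timeout=0.01)
    timed_out = False
except queue.Empty:
    timed_out = True

# A plain get() blocks until the worker has called put().
ocr_results = ocr_queue.get()
print(timed_out, ocr_results)
```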