darknet_video modifications

Explain line by line
Sure, here's a line-by-line explanation:

def convert2original(image, bbox):
    x, y, w, h = convert2relative(bbox)

    image_h, image_w, __ = image.shape

    orig_x       = int(x * image_w)
    orig_y       = int(y * image_h)
    orig_width   = int(w * image_w)
    orig_height  = int(h * image_h)

    bbox_converted = (orig_x, orig_y, orig_width, orig_height)

    return bbox_converted

This function takes an image and a bounding box bbox, a tuple of (x, y, width, height) where x and y are the center of the box. It first normalizes the box to relative coordinates (fractions of the frame size) with the convert2relative() function. It then multiplies the relative values by the width and height of the original image and returns the new bounding box as a tuple of (x, y, width, height) in absolute pixel coordinates.
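As a quick check of the scaling described above, here is a minimal sketch with made-up numbers. The 416x416 network input size and the body of convert2relative() are assumptions (the notes do not show that function), and the frame is a stand-in object, since only its .shape is read.

```python
from types import SimpleNamespace

# Assumed network input size; the real script takes it from the loaded network.
darknet_width, darknet_height = 416, 416

def convert2relative(bbox):
    """Assumed helper: normalize a (center_x, center_y, w, h) box by the input size."""
    x, y, w, h = bbox
    return x / darknet_width, y / darknet_height, w / darknet_width, h / darknet_height

def convert2original(image, bbox):
    x, y, w, h = convert2relative(bbox)
    image_h, image_w, __ = image.shape
    # Scale the relative values up to the original frame
    return (int(x * image_w), int(y * image_h), int(w * image_w), int(h * image_h))

# Stand-in for a 1280x720 frame: only .shape is used here.
frame = SimpleNamespace(shape=(720, 1280, 3))
result = convert2original(frame, (208, 208, 104, 52))
print(result)  # (640, 360, 320, 90)
```

The box keeps its meaning (center plus size); only the units change from network-input pixels to original-frame pixels.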

def convert4cropping(image, bbox):
    x, y, w, h = convert2relative(bbox)

    image_h, image_w, __ = image.shape

    orig_left    = int((x - w / 2.) * image_w)
    orig_right   = int((x + w / 2.) * image_w)
    orig_top     = int((y - h / 2.) * image_h)
    orig_bottom  = int((y + h / 2.) * image_h)

    if (orig_left < 0): orig_left = 0
    if (orig_right > image_w - 1): orig_right = image_w - 1
    if (orig_top < 0): orig_top = 0
    if (orig_bottom > image_h - 1): orig_bottom = image_h - 1

    bbox_cropping = (orig_left, orig_top, orig_right, orig_bottom)

    return bbox_cropping

This function takes an image and a bounding box bbox, a tuple of (x, y, width, height) where x and y are the center of the box, and normalizes it to relative coordinates with the convert2relative() function. It then calculates the absolute pixel coordinates of the left, top, right, and bottom edges of the box in the original image and clamps them to the image bounds (0 to width - 1 and 0 to height - 1). No padding is added. The returned tuple (left, top, right, bottom) can be used directly to crop the detected object from the image later.
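To see what the bounds checks do to a box that hangs over the edge of the frame, here is a small sketch. The 416x416 input size and the convert2relative() body are assumptions, and min()/max() replace the four if statements without changing the result.

```python
from types import SimpleNamespace

darknet_width, darknet_height = 416, 416  # assumed network input size

def convert2relative(bbox):
    """Assumed helper: normalize a (center_x, center_y, w, h) box by the input size."""
    x, y, w, h = bbox
    return x / darknet_width, y / darknet_height, w / darknet_width, h / darknet_height

def convert4cropping(image, bbox):
    x, y, w, h = convert2relative(bbox)
    image_h, image_w, __ = image.shape
    # Edges of the center-based box, scaled to the original frame
    left = int((x - w / 2.) * image_w)
    right = int((x + w / 2.) * image_w)
    top = int((y - h / 2.) * image_h)
    bottom = int((y + h / 2.) * image_h)
    # Clamp every edge to a valid pixel index
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, image_w - 1), min(bottom, image_h - 1)
    return left, top, right, bottom

frame = SimpleNamespace(shape=(720, 1280, 3))  # stand-in: only .shape is read
# A box near the top-right corner: it overflows on the right and at the top
crop_box = convert4cropping(frame, (400, 20, 104, 104))
print(crop_box)  # (1070, 0, 1279, 124)
```

With a real NumPy frame the crop would be frame[top:bottom, left:right]; slicing excludes the bottom row and right column of the clamped box.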

def video_capture(frame_queue, darknet_image_queue):
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = cv2.resize(frame_rgb, (darknet_width, darknet_height),
                                   interpolation=cv2.INTER_LINEAR)
        frame_queue.put(frame)
        img_for_detect = darknet.make_image(darknet_width, darknet_height, 3)
        darknet.copy_image_from_bytes(img_for_detect, frame_resized.tobytes())
        darknet_image_queue.put(img_for_detect)
    cap.release()

This function is run in a separate thread, and reads successive frames from the capture device (e.g. webcam or video file). For each frame read, it first converts the color space from BGR to RGB and resizes the result to the input size expected by the YOLOv3 model. It then adds the original, unmodified frame to a queue for the drawing thread, creates a Darknet image object in memory, copies the resized frame into it, and adds that object to another queue for the inference thread. When a read fails (for example at the end of a video file), the loop exits and the capture device is released.
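The two-queue hand-off can be sketched with the standard library alone. This is an illustration of the pattern, not the script's code: the strings stand in for frames, lower() stands in for the resize and color conversion, and the STOP sentinel is an assumption of this sketch (the original loops end when cap.isOpened() becomes false).

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream (assumption of this sketch)

def capture(frames, frame_queue, model_input_queue):
    """Producer: push each raw frame and its model-ready copy, then signal the end."""
    for frame in frames:
        frame_queue.put(frame)
        model_input_queue.put(frame.lower())  # stand-in for resize + BGR-to-RGB
    frame_queue.put(STOP)
    model_input_queue.put(STOP)

def consume(q, out):
    """Consumer: collect items until the sentinel arrives."""
    while True:
        item = q.get()  # blocks until the producer has put something
        if item is STOP:
            break
        out.append(item)

frame_queue = queue.Queue(maxsize=1)
model_input_queue = queue.Queue(maxsize=1)
raw, prepared = [], []
threads = [
    threading.Thread(target=capture, args=(["A", "B", "C"], frame_queue, model_input_queue)),
    threading.Thread(target=consume, args=(frame_queue, raw)),
    threading.Thread(target=consume, args=(model_input_queue, prepared)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(raw, prepared)  # ['A', 'B', 'C'] ['a', 'b', 'c']
```

A bounded queue (maxsize=1) makes the producer wait for its consumers, so a fast capture loop cannot pile up frames in memory.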

def inference(darknet_image_queue, detections_queue, fps_queue):
    while cap.isOpened():
        darknet_image = darknet_image_queue.get()
        prev_time = time.time()
        detections = darknet.detect_image(network, class_names, darknet_image, thresh=args.thresh)
        detections_queue.put(detections)
        fps = int(1/(time.time() - prev_time))
        fps_queue.put(fps)
        print("FPS: {}".format(fps))
        darknet.print_detections(detections, args.ext_output)
        darknet.free_image(darknet_image)
    cap.release()

This function is also run in a separate thread, and processes each Darknet image object passed to it from the queue by running inference with the YOLOv3 algorithm on that image. It then adds the detected objects and their associated bounding boxes to a separate queue for processing by subsequent threads, calculates the frames per second (FPS) of inference, prints this to the console, and releases the memory allocated to the Darknet image object.
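The FPS figure is just the reciprocal of the time one detection call takes. A minimal sketch of that measurement follows; timed_fps() and fake_detect() are hypothetical names, and the sleep stands in for darknet.detect_image.

```python
import time

def timed_fps(fn, *args):
    """Run fn(*args) once and return (result, fps) for that single call."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval, which would make 1/elapsed raise ZeroDivisionError
    fps = int(1 / elapsed) if elapsed > 0 else 0
    return result, fps

def fake_detect(image):
    """Hypothetical stand-in for darknet.detect_image: takes about 20 ms."""
    time.sleep(0.02)
    return [("plate", "98.0", (208, 208, 104, 52))]

detections, fps = timed_fps(fake_detect, None)
print(detections, fps)
```

Because the value is truncated with int(), any call slower than one second reports 0 FPS. That matters later, since the drawing loop passes fps to cv2.waitKey(), and cv2.waitKey(0) waits for a key press indefinitely.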

def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
while cap.is




PaddleOCR
To perform character recognition on the detected bounding boxes using PaddleOCR and output the recognition result, you can modify the `drawing()` function to include another thread that takes in detections from the queue and performs OCR on the cropped images within those bounding boxes. Here's an example of what this modified function might look like:

import paddleocr
from queue import Queue
from threading import Thread

# Build the OCR engine once; constructing it for every crop would reload the models each time.
# Assumes PaddleOCR 2.x, where the class is PaddleOCR and lang='ch' selects the Chinese models.
ocr_engine = paddleocr.PaddleOCR(lang='ch')

def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))

    # Start OCR thread: it receives (frame, detections) pairs and returns a dict of results
    ocr_input_queue = Queue()
    ocr_queue = Queue()
    Thread(target=perform_ocr, args=(ocr_input_queue, ocr_queue), daemon=True).start()

    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        fps = fps_queue.get()
        detections_adjusted = []
        if frame is not None:
            ocr_input_queue.put((frame, detections))
            ocr_results = ocr_queue.get()
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                detections_adjusted.append((str(label), confidence, bbox_adjusted))

                # Look up the OCR text; fall back to running OCR here if it is missing
                ocr_text = ocr_results.get(str(bbox_adjusted))
                if ocr_text is None:
                    bbox_cropped = convert4cropping(frame, bbox)
                    cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                    ocr_text = perform_paddleocr(cropped_image)
                    ocr_results[str(bbox_adjusted)] = ocr_text
                print("OCR: {}".format(ocr_text))

            image = darknet.draw_boxes(detections_adjusted, frame, class_colors)
            if not args.dont_show:
                cv2.imshow('Inference', image)
            if args.out_filename is not None:
                video.write(image)
            if cv2.waitKey(fps) == 27:
                break
    cap.release()
    video.release()
    cv2.destroyAllWindows()

def perform_ocr(ocr_input_queue, ocr_queue):
    while cap.isOpened():
        frame, detections = ocr_input_queue.get()
        ocr_results = {}
        for label, confidence, bbox in detections:
            bbox_adjusted = convert2original(frame, bbox)

            # Extract cropped image and perform OCR
            bbox_cropped = convert4cropping(frame, bbox)
            cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
            ocr_results[str(bbox_adjusted)] = perform_paddleocr(cropped_image)

        ocr_queue.put(ocr_results)

def perform_paddleocr(image):
    ocr_result = ''
    if image.size == 0:  # empty crop, nothing to read
        return ocr_result
    result = ocr_engine.ocr(image)
    # Assumed PaddleOCR 2.x layout: one list per image, each entry [box, (text, score)]
    for line in result:
        for word in line or []:
            ocr_result += word[1][0]
    return ocr_result


In this modified `drawing()` function, we create two queues and start a new thread by calling `Thread(target=perform_ocr, args=(ocr_input_queue, ocr_queue), daemon=True).start()`. `drawing()` remains the only consumer of `detections_queue`: if `perform_ocr()` read from that queue as well, each detection list would reach only one of the two threads. Instead, `drawing()` forwards every `(frame, detections)` pair through `ocr_input_queue`, which also gives the OCR thread the frame it needs for cropping.

The OCR thread crops each bounding box using the `convert4cropping()` function, runs PaddleOCR on the crop, and stores the text in the `ocr_results` dictionary with the string form of the adjusted bounding box as the key. It then passes the dictionary back through the `ocr_queue`.

Inside the `drawing()` function's main while loop, we check whether an OCR result is stored for each bounding box in the `ocr_results` dictionary. If not, we crop the box and run PaddleOCR directly, then save the result under the same key. The recognized text is printed to the console. Because `drawing()` waits on `ocr_queue.get()` for every frame, the display rate is bounded by the OCR speed.

Note that this example assumes PaddleOCR 2.x is installed. `PaddleOCR(lang='ch')` selects the Chinese models, which PaddleOCR downloads on first use. `det_model_dir` and `rec_model_dir` take paths to local model directories, not language codes, so set them only if you keep the models on disk. You may need to adjust `lang` if you are using a different model or language.
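The nested structure that PaddleOCR returns is easy to misread, so here is a small sketch that flattens it into one string. The layout shown (one list per image, each entry a box followed by a (text, score) pair, and None for an image with no text) is an assumption based on PaddleOCR 2.x; `ocr_text_from_result()` is a hypothetical helper and the sample data is made up.

```python
def ocr_text_from_result(result):
    """Join the recognized text from a PaddleOCR-2.x-style result (layout assumed)."""
    texts = []
    for page in result or []:
        # A page is None when nothing was detected in that image
        for box, (text, score) in page or []:
            texts.append(text)
    return "".join(texts)

# Hand-written sample in the assumed layout: two text regions in one image
sample = [[
    [[[10, 5], [90, 5], [90, 30], [10, 30]], ("京A", 0.97)],
    [[[95, 5], [200, 5], [200, 30], [95, 30]], ("12345", 0.95)],
]]
plate_text = ocr_text_from_result(sample)
print(plate_text)  # 京A12345
print(repr(ocr_text_from_result([None])))  # ''
```

If the installed version returns a different structure, print one raw result first and adapt the unpacking to it.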








tesseract

To perform character recognition on the detected bounding boxes, you can modify the `drawing()` function to include another thread that takes in detections from the queue and performs optical character recognition (OCR) on the cropped images within those bounding boxes. Here's an example of what this modified function might look like:

import pytesseract
from queue import Queue
from threading import Thread

def drawing(frame_queue, detections_queue, fps_queue):
    random.seed(3)  # deterministic bbox colors
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))

    # Start OCR thread: it receives (frame, detections) pairs and returns a dict of results
    ocr_input_queue = Queue()
    ocr_queue = Queue()
    Thread(target=perform_ocr, args=(ocr_input_queue, ocr_queue), daemon=True).start()

    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        fps = fps_queue.get()
        detections_adjusted = []
        if frame is not None:
            ocr_input_queue.put((frame, detections))
            ocr_results = ocr_queue.get()
            for label, confidence, bbox in detections:
                bbox_adjusted = convert2original(frame, bbox)
                detections_adjusted.append((str(label), confidence, bbox_adjusted))

                # Look up the OCR text; fall back to running OCR here if it is missing
                ocr_text = ocr_results.get(str(bbox_adjusted))
                if ocr_text is None:
                    bbox_cropped = convert4cropping(frame, bbox)
                    cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
                    ocr_text = perform_tesseract(cropped_image)
                    ocr_results[str(bbox_adjusted)] = ocr_text
                print("OCR: {}".format(ocr_text))

            image = darknet.draw_boxes(detections_adjusted, frame, class_colors)
            if not args.dont_show:
                cv2.imshow('Inference', image)
            if args.out_filename is not None:
                video.write(image)
            if cv2.waitKey(fps) == 27:
                break
    cap.release()
    video.release()
    cv2.destroyAllWindows()

def perform_ocr(ocr_input_queue, ocr_queue):
    while cap.isOpened():
        frame, detections = ocr_input_queue.get()
        ocr_results = {}
        for label, confidence, bbox in detections:
            bbox_adjusted = convert2original(frame, bbox)

            # Extract cropped image and perform OCR
            bbox_cropped = convert4cropping(frame, bbox)
            cropped_image = frame[bbox_cropped[1]:bbox_cropped[3], bbox_cropped[0]:bbox_cropped[2]]
            ocr_results[str(bbox_adjusted)] = perform_tesseract(cropped_image)

        ocr_queue.put(ocr_results)

# Helper added here, mirroring perform_paddleocr() from the PaddleOCR version
def perform_tesseract(image):
    if image.size == 0:  # empty crop, nothing to read
        return ''
    # OpenCV frames are BGR, while pytesseract assumes RGB
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(image_rgb).strip()


In this modified `drawing()` function, we create two queues and start a new thread by calling `Thread(target=perform_ocr, args=(ocr_input_queue, ocr_queue), daemon=True).start()`. `drawing()` remains the only consumer of `detections_queue` and forwards every `(frame, detections)` pair to the OCR thread through `ocr_input_queue`. The `perform_ocr()` function crops each bounding box, runs Tesseract on the crop, and returns a dictionary of OCR results for the detected bounding boxes through the `ocr_queue`.

Inside the `drawing()` function's main while loop, we look up each bounding box in the `ocr_results` dictionary. If no text is stored for it, we extract the cropped image using the `convert4cropping()` function, run Tesseract directly, and save the result under the same key.


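In terms of political thought, I believe I have consistently taken Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era as my guide, conscientiously fulfilled my responsibilities as a citizen, and actively practiced the core socialist values in daily life. At the same time, I am keenly aware that my ideological awareness still needs to improve, and that I need to study and understand Marxism-Leninism, Mao Zedong Thought, and Deng Xiaoping Theory in greater depth.
In study and research, I have always maintained a good attitude toward learning and a thirst for knowledge. I listen carefully to my teachers, take an active part in discussions, and push myself to broaden my thinking and consider problems from multiple angles. During university I also took an active part in various research projects, which improved my hands-on skills and my ability to do scientific research.
In social activities, I have always put serving others first and taken an active part in volunteer work, such as organizing students to join off-campus charity events and taking part in community cultural development. These activities not only exercised my organizational, management, communication, and coordination skills but also strengthened my sense of responsibility and mission toward society.
In terms of abilities and strengths, I have some programming skills and can express myself well. I love computer programming, am proficient in several programming languages, and have some understanding of and practical ability in areas such as data analysis. I also frequently take part in English speech contests and debates, which have exercised my spoken English and improved my logical thinking and analytical skills.
In terms of achievements, I have won second prize in the National English Competition for College Students, third prize in the innovation and entrepreneurship contest of the campus science and technology festival, and the title of Outstanding Volunteer, among other honors. These achievements are the result of sustained effort, and they give me more confidence and motivation to pursue higher goals.
As for shortcomings, I believe I still have many weaknesses and areas to improve. For example, my time management needs work, and I sometimes fall into procrastination and waste time. My social skills also need improvement: I need to handle interpersonal relationships better and try to interact with more people from different backgrounds.
Going forward, I will keep expanding my knowledge, study and research various fields in depth, and improve my overall competence. I will also pay more attention to practical ability, sharpening my hands-on and teamwork skills through project work. In social activities, I will take an active part in volunteer work and try to organize meaningful events that give back to society. Finally, I hope to better develop my potential, find a career direction that suits me as soon as possible, and contribute to my country and society.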




Day2
Section 1
To draw a box around the detections and print OCR results in the box, you can modify the `perform_ocr` function as follows:

```python
def perform_ocr(frame_queue, detections_queue, fps_queue):
    # Runs in place of drawing() as the display thread, so it takes the same three queues.
    # Assumes cap, args and perform_paddleocr() from the earlier listing.
    video = set_saved_video(cap, args.out_filename, (video_width, video_height))
    while cap.isOpened():
        frame = frame_queue.get()
        detections = detections_queue.get()
        fps = fps_queue.get()
        if frame is None:
            continue
        ocr_results = {}  # OCR text for this frame, keyed by bounding box
        for label, confidence, bbox in detections:
            bbox_adjusted = convert2original(frame, bbox)
            # Extract cropped image and perform OCR
            left, top, right, bottom = convert4cropping(frame, bbox)
            cropped_image = frame[top:bottom, left:right]
            ocr_text = perform_paddleocr(cropped_image)
            ocr_results[str(bbox_adjusted)] = ocr_text

            # Draw box and print OCR results; cv2.rectangle needs corner points,
            # so use the cropping corners instead of the center-based bbox_adjusted
            cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2)
            cv2.putText(frame, ocr_text, (left, max(top - 10, 0)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA)

        if not args.dont_show:
            cv2.imshow('Inference', frame)
        if args.out_filename is not None:
            video.write(frame)
        if cv2.waitKey(fps) == 27:
            break

    cap.release()
    video.release()
    cv2.destroyAllWindows()
```

This modified code adds two lines to draw a rectangle around the detected object and print the OCR result just above its top-left corner. The cv2.rectangle function takes the frame, the top-left and bottom-right coordinates of the rectangle, the color, and the thickness as arguments. The cv2.putText function takes the frame, the text to display, the position of the text (its bottom-left corner), the font face and scale, the color, the thickness, and the line type as arguments.
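Since cv2.rectangle wants two corner points while convert2original() returns a center-based (x, y, width, height) box, a conversion step is needed in between. A minimal sketch, where `center_to_corners()` and `label_position()` are hypothetical helper names:

```python
def center_to_corners(bbox):
    """Turn a (center_x, center_y, w, h) box into ((xmin, ymin), (xmax, ymax))."""
    x, y, w, h = bbox
    xmin = int(round(x - w / 2))
    ymin = int(round(y - h / 2))
    xmax = int(round(x + w / 2))
    ymax = int(round(y + h / 2))
    return (xmin, ymin), (xmax, ymax)

def label_position(top_left, offset=10):
    """Place the text baseline a few pixels above the box, without leaving the frame."""
    x, y = top_left
    return x, max(y - offset, 0)

top_left, bottom_right = center_to_corners((640, 360, 320, 90))
print(top_left, bottom_right)        # (480, 315) (800, 405)
print(label_position(top_left))      # (480, 305)
print(label_position((480, 4)))      # (480, 0)
```

The two points can then be passed as the second and third arguments of cv2.rectangle, and the label position as the third argument of cv2.putText. Note that cv2.putText with the Hershey fonts cannot render Chinese characters, so Chinese OCR text would need a different drawing route.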

Section 2
The line ocr_text = ocr_results.get(str(bbox_adjusted)) retrieves the OCR results for the current detection bbox from the ocr_results dictionary.
The str(bbox_adjusted) is used as the dictionary key because it identifies each detection bbox within a frame, which makes it possible to associate the correct OCR result with the corresponding detection bbox. The bbox_adjusted contains the center x, center y, width, and height of the detection bbox in pixels of the original frame, that is, after undoing the resizing that was applied for the detection process. Two detections with identical coordinates would share one key, and a box that moves by a single pixel gets a new key.
The get() method is called on the ocr_results dictionary with str(bbox_adjusted) as its argument. If a matching key exists in the dictionary, the corresponding OCR text value is returned and assigned to the ocr_text variable. If no matching key is found, the get() method returns None.
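The lookup behavior is easy to try in isolation. A short sketch with made-up coordinates and text:

```python
ocr_results = {}
bbox_adjusted = (640, 360, 320, 90)
ocr_results[str(bbox_adjusted)] = "ABC123"

hit = ocr_results.get(str((640, 360, 320, 90)))
miss = ocr_results.get(str((641, 360, 320, 90)))          # one pixel off: different key
miss_default = ocr_results.get(str((641, 360, 320, 90)), "")  # explicit default instead of None

print(hit)                 # ABC123
print(miss)                # None
print(repr(miss_default))  # ''
```

Tuples of integers are hashable, so bbox_adjusted could also be used as the key directly. The string form is not required, only convenient for printing.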

Section 3
The line ocr_results = ocr_queue.get() retrieves the OCR results dictionary from the ocr_queue.
The ocr_queue is a Python queue.Queue object, which is used to pass data between threads in a synchronized way. In this code, it is used to pass the OCR results dictionary from the thread that performs OCR on the detections of each frame to the drawing loop that displays the detected objects with their OCR results.
The get() method of the ocr_queue blocks until an item is available in the queue. Once an item is available, it removes and returns it from the queue. When the OCR thread puts the OCR results dictionary into the queue using ocr_queue.put(ocr_results), the get() method of the main thread can retrieve it for further processing.
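The blocking behavior, and the timeout that avoids waiting forever, can be shown with a short sketch. The worker function and the sample dictionary are made up for illustration.

```python
import queue
import threading
import time

ocr_queue = queue.Queue()

# Nothing has been put yet, so a get() with a timeout gives up and raises queue.Empty
try:
    ocr_queue.get(timeout=0.05)
    first_attempt = "got a result"
except queue.Empty:
    first_attempt = "timed out"

def ocr_worker():
    """Stand-in for the OCR thread: finishes after a short delay."""
    time.sleep(0.05)
    ocr_queue.put({"(640, 360, 320, 90)": "ABC123"})

threading.Thread(target=ocr_worker).start()
ocr_results = ocr_queue.get()  # blocks until the worker calls put()

print(first_attempt)  # timed out
print(ocr_results)    # {'(640, 360, 320, 90)': 'ABC123'}
```

A plain get() with no timeout is fine as long as the producer is guaranteed to put one item per frame. If the OCR thread can exit early, a timeout keeps the consumer from hanging.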
