The extension:
bag-of-visual-words recognition framework
multi-label & multi-class classification
LBP (local binary patterns)
LLC coding: locality-constrained linear coding
Gabor features (Gabor filters)
rotation-invariant pooling scheme
support vector regression
1、The initial motivation: make food logging easy by taking a picture of the meal, to help deal with obesity.
1、Databases: Menu-Match, Pittsburgh food dataset
2、Transformation: recast calorie estimation as an identification problem
3、Challenges: occlusions; visual information alone can miss some details of food preparation (e.g., oil, fat content of meats); accurate volume estimation.
4、related methods:
a novel feature descriptor
local and global features in a voting scheme
bag-of-features model
multiple kernel learning
5、Ideal flow:
Ingredient-based database: short and accurate, but it is hard to map an image to it. Built on fundamental nutritional building blocks such as oils, fats, proteins and minerals (e.g., one gram of oil contains 8.8 calories).
Category-based database: coarser, but the visual mapping is easier to solve. Contains food categories or atomic food items.
6、Restaurant-specific recognition:
Recognize the exact menu item, e.g. “the cheeseburger at Joe's at Solo Grill in Toronto”, and then read accurate nutritional statistics from the database.
This means the database is linked to specific restaurants and the items are tied to each meal, which makes the problem easier because the challenges of occlusion and volume estimation are avoided.
7、The pipeline
First localize the restaurant, then recognize every food item in the image among the items linked to that restaurant, and finally look up the database to estimate the calories.
# the portion size and ingredients can vary by customer
a、A single image with several food items
b、Use GPS to localize the restaurant where the customer is dining
c、The Menu-Match database contains 646 images with 1386 tagged food items across 41 categories.
d、Identification: map the image to several food items
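A minimal sketch of step b, to make the "localize first, then recognize" idea concrete: pick the nearest restaurant from a GPS fix and restrict the candidate labels to its menu. The restaurant names, coordinates, menus, and the function names `haversine_km`, `nearest_restaurant` and `candidate_items` are all made up for illustration; the paper's actual localization code is not described in these notes.

```python
import math

# Hypothetical restaurant database: name -> location (lat, lon) and menu items.
# All entries are invented for illustration.
RESTAURANTS = {
    "Restaurant A": {"location": (43.6532, -79.3832),
                     "menu": ["cheeseburger", "fries", "salad"]},
    "Restaurant B": {"location": (43.7000, -79.4000),
                     "menu": ["pizza slice", "soda"]},
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_restaurant(lat, lon):
    """Return the name of the restaurant closest to the GPS fix."""
    return min(
        RESTAURANTS,
        key=lambda name: haversine_km(lat, lon, *RESTAURANTS[name]["location"]),
    )

def candidate_items(lat, lon):
    """Restrict the label space to the menu of the nearest restaurant."""
    name = nearest_restaurant(lat, lon)
    return name, RESTAURANTS[name]["menu"]

name, menu = candidate_items(43.6530, -79.3830)
print(name, menu)
```

The point of the restriction is that the classifier only has to choose among a handful of menu items instead of all food categories.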
8、Implementation Details
# Features are extracted with a publicly available vision library.
a、Pre-processing: rescale the image so that its largest dimension is 500 pixels.
b、Semi-automated food item identification: train one-vs-rest linear SVMs on the concatenated features, then predict multiple labels for the image and sort the items for the user to choose from.
c、Fully automated estimation of food statistics: based on support vector regression.
d、Rotationally invariant pooling: increases the mean average precision of the joint feature. Because food images are usually captured top-down rather than sideways like ordinary images, it works better than traditional spatial pyramid pooling. In addition, the food is usually in the center of the image.
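A rough sketch of item d, assuming the pooling regions are concentric rings around the image center (these notes do not record the exact region layout, so that is my assumption). Each local descriptor code is max-pooled into the ring its position falls in, so rotating the image about its center leaves the pooled vector unchanged. The function names and the `(x, y, code)` input format are hypothetical.

```python
import math

def ring_index(x, y, width, height, n_rings):
    """Map a pixel position to the index of the concentric ring containing it."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    max_radius = math.hypot(cx, cy)  # distance from the center to a corner
    if max_radius == 0:
        return 0
    radius = math.hypot(x - cx, y - cy)
    return min(int(n_rings * radius / max_radius), n_rings - 1)

def ring_max_pool(codes, width, height, n_rings=3):
    """Max-pool coded descriptors over concentric rings.

    codes: list of (x, y, code_vector) with non-negative code values
           (assumption: empty rings are left at zero).
    Returns the per-ring maxima concatenated, innermost ring first.
    """
    dim = len(codes[0][2])
    pooled = [[0.0] * dim for _ in range(n_rings)]
    for x, y, code in codes:
        ring = pooled[ring_index(x, y, width, height, n_rings)]
        for k, value in enumerate(code):
            if value > ring[k]:
                ring[k] = value
    return [value for ring in pooled for value in ring]

# Toy 5x5 image with three coded descriptors.
codes = [(2, 2, [1.0, 0.0]), (4, 2, [0.0, 2.0]), (0, 0, [3.0, 1.0])]
# The same descriptors after rotating the image by 90 degrees about its center.
rotated = [(4 - y, x, code) for x, y, code in codes]
print(ring_max_pool(codes, 5, 5) == ring_max_pool(rotated, 5, 5))
```

Spatial pyramid pooling uses a rectangular grid instead, so the same rotation would move descriptors into different cells and change the pooled vector.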
Recognition Framework
A、Extract the features
Based on the bag-of-words approach: five kinds of features are extracted (color, HOG, SIFT, LBP, MR8), then encoded with LLC and max-pooled in a rotation-invariant pooling scheme.
B、Identification
refer to 8(b, c)
C、Estimate the calories according to the nutritional table
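Steps B and C in the semi-automated setting can be sketched as follows: sort the items by their one-vs-rest classifier scores, let the user confirm from the top of the list, then sum the calories from the nutritional table. The scores, the table values, and the names `rank_items` and `total_calories` are invented for illustration; real scores would come from the trained SVMs.

```python
# Hypothetical nutritional table: item -> calories (values are invented).
NUTRITION_TABLE = {"cheeseburger": 550, "fries": 365, "salad": 150, "soda": 140}

def rank_items(scores, top_k=3):
    """Sort items by classifier score, highest first, and keep the top_k."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:top_k]]

def total_calories(confirmed_items, table=NUTRITION_TABLE):
    """Sum the table calories of the items the user confirmed."""
    return sum(table[item] for item in confirmed_items)

# Made-up one-vs-rest decision values for one meal image.
scores = {"salad": -0.4, "cheeseburger": 1.3, "soda": -1.1, "fries": 0.7}
suggestions = rank_items(scores, top_k=2)
print(suggestions, total_calories(suggestions))
```

In the fully automated variant (8c) the regression predicts the statistics directly, so no ranking or user confirmation is involved.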
Summary:
Pros: A new food database is created for training the one-vs-rest SVMs, and this method avoids some challenges (such as occlusion and volume estimation), which relaxes the image-analysis techniques required.
Cons: The feature extraction is tedious; why not use deep learning methods?
Restricting recognition to the food items of a specific restaurant may not be feasible in some situations.