Wanli committed
Commit c2ded1d · 0 parent(s)

add pose estimation model (#152)

Files changed (4):
  1. LICENSE +202 -0
  2. README.md +34 -0
  3. demo.py +252 -0
  4. mp_pose.py +179 -0
LICENSE ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,34 @@
1
+ # Pose estimation from MediaPipe Pose
2
+
3
+ This model estimates 33 pose keypoints and a person segmentation mask for each person detected by the [person detector](../person_detection_mediapipe). (The image below is from [MediaPipe Pose Keypoints](https://github.com/tensorflow/tfjs-models/tree/master/pose-detection#blazepose-keypoints-used-in-mediapipe-blazepose).)
4
+
5
+ ![MediaPipe Pose Landmark](examples/pose_landmarks.png)
6
+
7
+ This model is converted from TFLite to ONNX with the following tools (a sketch of the typical commands follows the list):
8
+ - TFLite model to ONNX: https://github.com/onnx/tensorflow-onnx
9
+ - simplified by [onnx-simplifier](https://github.com/daquexian/onnx-simplifier)
10
+
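+ For reference, a rough sketch of the typical conversion steps, driven from Python. The TFLite filename below is a placeholder and the exact options are assumptions, not the ones used for this release:
+
+ ```python
+ import subprocess
+
+ tflite_model = "pose_landmark.tflite"                   # hypothetical input file
+ onnx_model = "pose_estimation_mediapipe_2023mar.onnx"   # output name the demo expects
+
+ # TFLite -> ONNX with tf2onnx
+ subprocess.run(["python", "-m", "tf2onnx.convert",
+                 "--tflite", tflite_model,
+                 "--output", onnx_model], check=True)
+
+ # simplify the exported graph with onnx-simplifier
+ subprocess.run(["python", "-m", "onnxsim", onnx_model, onnx_model], check=True)
+ ```
+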
11
+ **Note**:
12
+ - Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose for larger-scale models.
13
+ ## Demo
14
+
15
+ Run the following commands to try the demo:
16
+ ```bash
17
+ # detect on camera input
18
+ python demo.py
19
+ # detect on an image
20
+ python demo.py -i /path/to/image -v
21
+ ```
22
+
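+ The demo chains the MediaPipe person detector with this pose estimator. A minimal Python sketch of the same pipeline, following demo.py (model paths and thresholds are the defaults the demo assumes):
+
+ ```python
+ import sys
+ import cv2 as cv
+
+ from mp_pose import MPPose
+ sys.path.append('../person_detection_mediapipe')
+ from mp_persondet import MPPersonDet
+
+ backend_id = cv.dnn.DNN_BACKEND_OPENCV
+ target_id = cv.dnn.DNN_TARGET_CPU
+ person_detector = MPPersonDet(modelPath='../person_detection_mediapipe/person_detection_mediapipe_2023mar.onnx',
+                               nmsThreshold=0.3, scoreThreshold=0.5, topK=5000,
+                               backendId=backend_id, targetId=target_id)
+ pose_estimator = MPPose(modelPath='./pose_estimation_mediapipe_2023mar.onnx',
+                         confThreshold=0.8, backendId=backend_id, targetId=target_id)
+
+ image = cv.imread('/path/to/image')
+ for person in person_detector.infer(image):
+     pose = pose_estimator.infer(image, person)
+     if pose is not None:
+         bbox, landmarks_screen, landmarks_world, mask, heatmap, conf = pose
+         print(bbox, conf)
+ ```
+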
23
+ ### Example outputs
24
+
25
+ ![webcam demo](examples/mpposeest_demo.webp)
26
+
27
+ ## License
28
+
29
+ All files in this directory are licensed under the [Apache 2.0 License](LICENSE).
30
+
31
+ ## Reference
32
+ - MediaPipe Pose: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker
33
+ - MediaPipe pose model and model card: https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose
34
+ - BlazePose TFJS: https://github.com/tensorflow/tfjs-models/tree/master/pose-detection/src/blazepose_tfjs
demo.py ADDED
@@ -0,0 +1,252 @@
1
+ import sys
2
+ import argparse
3
+
4
+ import numpy as np
5
+ import cv2 as cv
6
+
7
+ from mp_pose import MPPose
8
+
9
+ sys.path.append('../person_detection_mediapipe')
10
+ from mp_persondet import MPPersonDet
11
+
12
+ # Check OpenCV version
13
+ assert cv.__version__ >= "4.7.0", \
14
+ "Please install latest opencv-python to try this demo: python3 -m pip install --upgrade opencv-python"
15
+
16
+ # Valid combinations of backends and targets
17
+ backend_target_pairs = [
18
+ [cv.dnn.DNN_BACKEND_OPENCV, cv.dnn.DNN_TARGET_CPU],
19
+ [cv.dnn.DNN_BACKEND_CUDA, cv.dnn.DNN_TARGET_CUDA],
20
+ [cv.dnn.DNN_BACKEND_CUDA, cv.dnn.DNN_TARGET_CUDA_FP16],
21
+ [cv.dnn.DNN_BACKEND_TIMVX, cv.dnn.DNN_TARGET_NPU],
22
+ [cv.dnn.DNN_BACKEND_CANN, cv.dnn.DNN_TARGET_NPU]
23
+ ]
24
+
25
+ parser = argparse.ArgumentParser(description='Pose Estimation from MediaPipe')
26
+ parser.add_argument('--input', '-i', type=str,
27
+ help='Path to the input image. Omit for using default camera.')
28
+ parser.add_argument('--model', '-m', type=str, default='./pose_estimation_mediapipe_2023mar.onnx',
29
+ help='Path to the model.')
30
+ parser.add_argument('--backend_target', '-bt', type=int, default=0,
31
+ help='''Choose one of the backend-target pair to run this demo:
32
+ {:d}: (default) OpenCV implementation + CPU,
33
+ {:d}: CUDA + GPU (CUDA),
34
+ {:d}: CUDA + GPU (CUDA FP16),
35
+ {:d}: TIM-VX + NPU,
36
+ {:d}: CANN + NPU
37
+ '''.format(*[x for x in range(len(backend_target_pairs))]))
38
+ parser.add_argument('--conf_threshold', type=float, default=0.8,
39
+ help='Filter out poses with confidence < conf_threshold.')
40
+ parser.add_argument('--save', '-s', action='store_true',
41
+ help='Specify to save results. This flag is invalid when using camera.')
42
+ parser.add_argument('--vis', '-v', action='store_true',
43
+ help='Specify to open a window for result visualization. This flag is invalid when using camera.')
44
+ args = parser.parse_args()
45
+
46
+ def visualize(image, poses):
47
+ display_screen = image.copy()
48
+ display_3d = np.zeros((400, 400, 3), np.uint8)
49
+ cv.line(display_3d, (200, 0), (200, 400), (255, 255, 255), 2)
50
+ cv.line(display_3d, (0, 200), (400, 200), (255, 255, 255), 2)
51
+ cv.putText(display_3d, 'Main View', (0, 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
52
+ cv.putText(display_3d, 'Top View', (200, 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
53
+ cv.putText(display_3d, 'Left View', (0, 212), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
54
+ cv.putText(display_3d, 'Right View', (200, 212), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
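+ # display_3d is a 400x400 canvas split into four 200x200 quadrants (Main, Top,
+ # Left, Right views); the world landmarks below are scaled by 100 and offset so
+ # each projection is centred in its own quadrant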
55
+ is_draw = False # draw the 3D views for only one person
56
+
57
+ def _draw_lines(image, landmarks, keep_landmarks, is_draw_point=True, thickness=2):
58
+
59
+ def _draw_by_presence(idx1, idx2):
60
+ if keep_landmarks[idx1] and keep_landmarks[idx2]:
61
+ cv.line(image, landmarks[idx1], landmarks[idx2], (255, 255, 255), thickness)
62
+
63
+ _draw_by_presence(0, 1)
64
+ _draw_by_presence(1, 2)
65
+ _draw_by_presence(2, 3)
66
+ _draw_by_presence(3, 7)
67
+ _draw_by_presence(0, 4)
68
+ _draw_by_presence(4, 5)
69
+ _draw_by_presence(5, 6)
70
+ _draw_by_presence(6, 8)
71
+
72
+ _draw_by_presence(9, 10)
73
+
74
+ _draw_by_presence(12, 14)
75
+ _draw_by_presence(14, 16)
76
+ _draw_by_presence(16, 22)
77
+ _draw_by_presence(16, 18)
78
+ _draw_by_presence(16, 20)
79
+ _draw_by_presence(18, 20)
80
+
81
+ _draw_by_presence(11, 13)
82
+ _draw_by_presence(13, 15)
83
+ _draw_by_presence(15, 21)
84
+ _draw_by_presence(15, 19)
85
+ _draw_by_presence(15, 17)
86
+ _draw_by_presence(17, 19)
87
+
88
+ _draw_by_presence(11, 12)
89
+ _draw_by_presence(11, 23)
90
+ _draw_by_presence(23, 24)
91
+ _draw_by_presence(24, 12)
92
+
93
+ _draw_by_presence(24, 26)
94
+ _draw_by_presence(26, 28)
95
+ _draw_by_presence(28, 30)
96
+ _draw_by_presence(28, 32)
97
+ _draw_by_presence(30, 32)
98
+
99
+ _draw_by_presence(23, 25)
100
+ _draw_by_presence(25, 27)
101
+ _draw_by_presence(27, 31)
102
+ _draw_by_presence(27, 29)
103
+ _draw_by_presence(29, 31)
104
+
105
+ if is_draw_point:
106
+ for i, p in enumerate(landmarks):
107
+ if keep_landmarks[i]:
108
+ cv.circle(image, p, thickness, (0, 0, 255), -1)
109
+
110
+ for idx, pose in enumerate(poses):
111
+ bbox, landmarks_screen, landmarks_word, mask, heatmap, conf = pose
112
+
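+ # outline the segmentation mask: Canny extracts the mask boundary, dilation
+ # thickens it to roughly 2 pixels, and the green edges are overlaid on the frame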
113
+ edges = cv.Canny(mask, 100, 200)
114
+ kernel = np.ones((2, 2), np.uint8) # dilate the edges to about 2 pixels wide
115
+ edges = cv.dilate(edges, kernel, iterations=1)
116
+ edges_bgr = cv.cvtColor(edges, cv.COLOR_GRAY2BGR)
117
+ edges_bgr[edges == 255] = [0, 255, 0]
118
+ display_screen = cv.add(edges_bgr, display_screen)
119
+
120
+
121
+ # draw box
122
+ bbox = bbox.astype(np.int32)
123
+ cv.rectangle(display_screen, bbox[0], bbox[1], (0, 255, 0), 2)
124
+ cv.putText(display_screen, '{:.4f}'.format(conf), (bbox[0][0], bbox[0][1] + 12), cv.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 255))
125
+ # Draw line between each key points
126
+ landmarks_screen = landmarks_screen[:-6, :]
127
+ landmarks_word = landmarks_word[:-6, :]
128
+
129
+ keep_landmarks = landmarks_screen[:, 4] > 0.8 # only show keypoints whose presence score is greater than 0.8
130
+
131
+ landmarks_screen = landmarks_screen
132
+ landmarks_word = landmarks_word
133
+
134
+ landmarks_xy = landmarks_screen[:, 0: 2].astype(np.int32)
135
+ _draw_lines(display_screen, landmarks_xy, keep_landmarks, is_draw_point=False)
136
+
137
+ # z value is relative to the hips, but a constant circle radius is used for drawing instead
138
+ for i, p in enumerate(landmarks_screen[:, 0: 3].astype(np.int32)):
139
+ if keep_landmarks[i]:
140
+ cv.circle(display_screen, np.array([p[0], p[1]]), 2, (0, 0, 255), -1)
141
+
142
+ if is_draw is False:
143
+ is_draw = True
144
+ # Main view
145
+ landmarks_xy = landmarks_word[:, [0, 1]]
146
+ landmarks_xy = (landmarks_xy * 100 + 100).astype(np.int32)
147
+ _draw_lines(display_3d, landmarks_xy, keep_landmarks, thickness=2)
148
+
149
+ # Top view
150
+ landmarks_xz = landmarks_word[:, [0, 2]]
151
+ landmarks_xz[:, 1] = -landmarks_xz[:, 1]
152
+ landmarks_xz = (landmarks_xz * 100 + np.array([300, 100])).astype(np.int32)
153
+ _draw_lines(display_3d, landmarks_xz, keep_landmarks, thickness=2)
154
+
155
+ # Left view
156
+ landmarks_yz = landmarks_word[:, [2, 1]]
157
+ landmarks_yz[:, 0] = -landmarks_yz[:, 0]
158
+ landmarks_yz = (landmarks_yz * 100 + np.array([100, 300])).astype(np.int32)
159
+ _draw_lines(display_3d, landmarks_yz, keep_landmarks, thickness=2)
160
+
161
+ # Right view
162
+ landmarks_zy = landmarks_word[:, [2, 1]]
163
+ landmarks_zy = (landmarks_zy * 100 + np.array([300, 300])).astype(np.int32)
164
+ _draw_lines(display_3d, landmarks_zy, keep_landmarks, thickness=2)
165
+
166
+ return display_screen, display_3d
167
+
168
+ if __name__ == '__main__':
169
+ backend_id = backend_target_pairs[args.backend_target][0]
170
+ target_id = backend_target_pairs[args.backend_target][1]
171
+
172
+ # person detector
173
+ person_detector = MPPersonDet(modelPath='../person_detection_mediapipe/person_detection_mediapipe_2023mar.onnx',
174
+ nmsThreshold=0.3,
175
+ scoreThreshold=0.5,
176
+ topK=5000, # the demo usually performs best with only one person in view
177
+ backendId=backend_id,
178
+ targetId=target_id)
179
+ # pose estimator
180
+ pose_estimator = MPPose(modelPath=args.model,
181
+ confThreshold=args.conf_threshold,
182
+ backendId=backend_id,
183
+ targetId=target_id)
184
+
185
+ # If input is an image
186
+ if args.input is not None:
187
+ image = cv.imread(args.input)
188
+
189
+ # person detector inference
190
+ persons = person_detector.infer(image)
191
+ poses = []
192
+
193
+ # Estimate the pose of each person
194
+ for person in persons:
195
+ # pose estimator inference
196
+ pose = pose_estimator.infer(image, person)
197
+ if pose is not None:
198
+ poses.append(pose)
199
+ # Draw results on the input image
200
+ image, view_3d = visualize(image, poses)
201
+
202
+ if len(persons) == 0:
203
+ print('No person detected!')
204
+ else:
205
+ print('Person detected!')
206
+
207
+ # Save results
208
+ if args.save:
209
+ cv.imwrite('result.jpg', image)
210
+ print('Results saved to result.jpg\n')
211
+
212
+ # Visualize results in a new window
213
+ if args.vis:
214
+ cv.namedWindow(args.input, cv.WINDOW_AUTOSIZE)
215
+ cv.imshow(args.input, image)
216
+ cv.imshow('3D Pose Demo', view_3d)
217
+ cv.waitKey(0)
218
+ else: # no --input given, use the default camera
219
+ deviceId = 0
220
+ cap = cv.VideoCapture(deviceId)
221
+
222
+ tm = cv.TickMeter()
223
+ while cv.waitKey(1) < 0:
224
+ hasFrame, frame = cap.read()
225
+ if not hasFrame:
226
+ print('No frames grabbed!')
227
+ break
228
+
229
+ # person detector inference
230
+ persons = person_detector.infer(frame)
231
+ poses = []
232
+
233
+ tm.start()
234
+ # Estimate the pose of each person
235
+ for person in persons:
236
+ # pose detector inference
237
+ pose = pose_estimator.infer(frame, person)
238
+ if pose is not None:
239
+ poses.append(pose)
240
+ tm.stop()
241
+ # Draw results on the input image
242
+ frame, view_3d = visualize(frame, poses)
243
+
244
+ if len(persons) == 0:
245
+ print('No person detected!')
246
+ else:
247
+ print('Person detected!')
248
+ cv.putText(frame, 'FPS: {:.2f}'.format(tm.getFPS()), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
249
+
250
+ cv.imshow('MediaPipe Pose Detection Demo', frame)
251
+ cv.imshow('3D Pose Demo', view_3d)
252
+ tm.reset()
mp_pose.py ADDED
@@ -0,0 +1,179 @@
1
+ import numpy as np
2
+ import cv2 as cv
3
+
4
+ class MPPose:
5
+ def __init__(self, modelPath, confThreshold=0.5, backendId=0, targetId=0):
6
+ self.model_path = modelPath
7
+ self.conf_threshold = confThreshold
8
+ self.backend_id = backendId
9
+ self.target_id = targetId
10
+
11
+ self.input_size = np.array([256, 256]) # wh
12
+ # Enlarging the RoI can improve results, but preprocessing gets slower. Default to 1.
13
+ self.PERSON_BOX_PRE_ENLARGE_FACTOR = 1
14
+ self.PERSON_BOX_ENLARGE_FACTOR = 1.25
15
+
16
+ self.model = cv.dnn.readNet(self.model_path)
17
+ self.model.setPreferableBackend(self.backend_id)
18
+ self.model.setPreferableTarget(self.target_id)
19
+
20
+ @property
21
+ def name(self):
22
+ return self.__class__.__name__
23
+
24
+ def setBackendAndTarget(self, backendId, targetId):
25
+ self.backend_id = backendId
26
+ self.target_id = targetId
27
+ self.model.setPreferableBackend(self.backend_id)
28
+ self.model.setPreferableTarget(self.target_id)
29
+
30
+ def _preprocess(self, image, person):
31
+ '''
32
+ Crop, pad and rotate the detected person region for inference.
33
+ Parameters:
34
+ image - input image of BGR channel order
35
+ person - one person detection result from the person detector; indices 4:12 hold
36
+ 4 landmarks (2 full-body points, 2 upper-body points) of shape [4, 2]
37
+ Returns:
38
+ blob - preprocessed NHWC input blob (1x256x256x3, RGB, values in [0, 1]) of the rotated person
39
+ rotated_person_bbox - bounding box of the rotated person region of interest
40
+ angle - rotation angle applied to the person
41
+ rotation_matrix - matrix used for rotation and de-rotation
42
+ pad_bias - left/top offset of the padded region of interest in the original image
43
+ '''
44
+ # crop and pad the image to the region of interest
45
+ pad_bias = np.array([0, 0], dtype=np.int32) # left, top
46
+ person_keypoints = person[4: 12].reshape(-1, 2)
47
+ mid_hip_point = person_keypoints[0]
48
+ full_body_point = person_keypoints[1]
49
+ # get RoI
50
+ full_dist = np.linalg.norm(mid_hip_point - full_body_point)
51
+ full_bbox = np.array([mid_hip_point - full_dist, mid_hip_point + full_dist], np.int32)
52
+ # enlarge to make sure the full body is covered
53
+ center_bbox = np.sum(full_bbox, axis=0) / 2
54
+ wh_bbox = full_bbox[1] - full_bbox[0]
55
+ new_half_size = wh_bbox * self.PERSON_BOX_PRE_ENLARGE_FACTOR / 2
56
+ full_bbox = np.array([
57
+ center_bbox - new_half_size,
58
+ center_bbox + new_half_size], np.int32)
59
+
60
+ person_bbox = full_bbox.copy()
61
+ # refine person bbox
62
+ person_bbox[:, 0] = np.clip(person_bbox[:, 0], 0, image.shape[1])
63
+ person_bbox[:, 1] = np.clip(person_bbox[:, 1], 0, image.shape[0])
64
+ # crop to the size of interest
65
+ image = image[person_bbox[0][1]:person_bbox[1][1], person_bbox[0][0]:person_bbox[1][0], :]
66
+ # pad to square
67
+ left, top = person_bbox[0] - full_bbox[0]
68
+ right, bottom = full_bbox[1] - person_bbox[1]
69
+ image = cv.copyMakeBorder(image, top, bottom, left, right, cv.BORDER_CONSTANT, None, (0, 0, 0))
70
+ pad_bias += person_bbox[0] - [left, top]
71
+ # compute rotation
72
+ mid_hip_point -= pad_bias
73
+ full_body_point -= pad_bias
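+ # rotation angle that makes the mid-hip -> full-body vector point straight up
+ # (image y axis points down); the wrap below keeps radians within [-pi, pi)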
74
+ radians = np.pi / 2 - np.arctan2(-(full_body_point[1] - mid_hip_point[1]), full_body_point[0] - mid_hip_point[0])
75
+ radians = radians - 2 * np.pi * np.floor((radians + np.pi) / (2 * np.pi))
76
+ angle = np.rad2deg(radians)
77
+ # get rotation matrix
78
+ rotation_matrix = cv.getRotationMatrix2D(mid_hip_point, angle, 1.0)
79
+ # get rotated image
80
+ rotated_image = cv.warpAffine(image, rotation_matrix, (image.shape[1], image.shape[0]))
81
+ # get landmark bounding box
82
+ blob = cv.resize(rotated_image, dsize=self.input_size, interpolation=cv.INTER_AREA).astype(np.float32)
83
+ rotated_person_bbox = np.array([[0, 0], [image.shape[1], image.shape[0]]], dtype=np.int32)
84
+ blob = cv.cvtColor(blob, cv.COLOR_BGR2RGB)
85
+ blob = blob / 255. # [0, 1]
86
+ return blob[np.newaxis, :, :, :], rotated_person_bbox, angle, rotation_matrix, pad_bias
87
+
88
+ def infer(self, image, person):
89
+ h, w, _ = image.shape
90
+ # Preprocess
91
+ input_blob, rotated_person_bbox, angle, rotation_matrix, pad_bias = self._preprocess(image, person)
92
+
93
+ # Forward
94
+ self.model.setInput(input_blob)
95
+ output_blob = self.model.forward(self.model.getUnconnectedOutLayersNames())
96
+
97
+ # Postprocess
98
+ results = self._postprocess(output_blob, rotated_person_bbox, angle, rotation_matrix, pad_bias, np.array([w, h]))
99
+ return results # [bbox_coords, landmarks_coords, conf]
100
+
101
+ def _postprocess(self, blob, rotated_person_bbox, angle, rotation_matrix, pad_bias, img_size):
102
+ landmarks, conf, mask, heatmap, landmarks_word = blob
103
+
104
+ conf = conf[0][0]
105
+ if conf < self.conf_threshold:
106
+ return None
107
+
108
+ landmarks = landmarks[0].reshape(-1, 5) # shape: (1, 195) -> (39, 5)
109
+ landmarks_word = landmarks_word[0].reshape(-1, 3) # shape: (1, 117) -> (39, 3)
110
+
111
+ # recover sigmoid score
112
+ landmarks[:, 3:] = 1 / (1 + np.exp(-landmarks[:, 3:]))
113
+ # TODO: refine landmarks with heatmap. reference: https://github.com/tensorflow/tfjs-models/blob/master/pose-detection/src/blazepose_tfjs/detector.ts#L577-L582
114
+ heatmap = heatmap[0]
115
+
116
+ # transform coords back to the input coords
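+ # (landmarks are predicted in the 256x256 blob space; re-centre them and scale
+ # by the rotated-RoI-to-input-size ratio before undoing the rotation)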
117
+ wh_rotated_person_bbox = rotated_person_bbox[1] - rotated_person_bbox[0]
118
+ scale_factor = wh_rotated_person_bbox / self.input_size
119
+ landmarks[:, :2] = (landmarks[:, :2] - self.input_size / 2) * scale_factor
120
+ landmarks[:, 2] = landmarks[:, 2] * max(scale_factor) # depth scaling
121
+ coords_rotation_matrix = cv.getRotationMatrix2D((0, 0), angle, 1.0)
122
+ rotated_landmarks = np.dot(landmarks[:, :2], coords_rotation_matrix[:, :2])
123
+ rotated_landmarks = np.c_[rotated_landmarks, landmarks[:, 2:]]
124
+ rotated_landmarks_world = np.dot(landmarks_word[:, :2], coords_rotation_matrix[:, :2])
125
+ rotated_landmarks_world = np.c_[rotated_landmarks_world, landmarks_word[:, 2]]
126
+ # invert rotation
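+ # the 2x3 affine returned by getRotationMatrix2D is inverted analytically:
+ # the rotation part is orthonormal, so its inverse is the transpose, and the
+ # inverse translation is -R^T * t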
127
+ rotation_component = np.array([
128
+ [rotation_matrix[0][0], rotation_matrix[1][0]],
129
+ [rotation_matrix[0][1], rotation_matrix[1][1]]])
130
+ translation_component = np.array([
131
+ rotation_matrix[0][2], rotation_matrix[1][2]])
132
+ inverted_translation = np.array([
133
+ -np.dot(rotation_component[0], translation_component),
134
+ -np.dot(rotation_component[1], translation_component)])
135
+ inverse_rotation_matrix = np.c_[rotation_component, inverted_translation]
136
+ # get box center
137
+ center = np.append(np.sum(rotated_person_bbox, axis=0) / 2, 1)
138
+ original_center = np.array([
139
+ np.dot(center, inverse_rotation_matrix[0]),
140
+ np.dot(center, inverse_rotation_matrix[1])])
141
+ landmarks[:, :2] = rotated_landmarks[:, :2] + original_center + pad_bias
142
+
143
+ # get bounding box from rotated_landmarks
144
+ bbox = np.array([
145
+ np.amin(landmarks[:, :2], axis=0),
146
+ np.amax(landmarks[:, :2], axis=0)]) # [top-left, bottom-right]
147
+ center_bbox = np.sum(bbox, axis=0) / 2
148
+ wh_bbox = bbox[1] - bbox[0]
149
+ new_half_size = wh_bbox * self.PERSON_BOX_ENLARGE_FACTOR / 2
150
+ bbox = np.array([
151
+ center_bbox - new_half_size,
152
+ center_bbox + new_half_size])
153
+
154
+ # invert rotation for mask
155
+ mask = mask[0].reshape(256, 256) # shape: (1, 256, 256, 1) -> (256, 256)
156
+ invert_rotation_matrix = cv.getRotationMatrix2D((mask.shape[1]/2, mask.shape[0]/2), -angle, 1.0)
157
+ invert_rotation_mask = cv.warpAffine(mask, invert_rotation_matrix, (mask.shape[1], mask.shape[0]))
158
+ # enlarge mask
159
+ invert_rotation_mask = cv.resize(invert_rotation_mask, wh_rotated_person_bbox)
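+ # map the RoI mask back to full-image coordinates: pad_bias marks where the
+ # square RoI starts in the original image (negative if it extends past the
+ # border), so the mask is cropped where the RoI leaves the image and
+ # zero-padded elsewhere until it matches img_size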
160
+ # crop and pad mask
161
+ min_w, min_h = -np.minimum(pad_bias, 0)
162
+ left, top = np.maximum(pad_bias, 0)
163
+ pad_over = img_size - [invert_rotation_mask.shape[1], invert_rotation_mask.shape[0]] - pad_bias
164
+ max_w, max_h = np.minimum(pad_over, 0) + [invert_rotation_mask.shape[1], invert_rotation_mask.shape[0]]
165
+ right, bottom = np.maximum(pad_over, 0)
166
+ invert_rotation_mask = invert_rotation_mask[min_h:max_h, min_w:max_w]
167
+ invert_rotation_mask = cv.copyMakeBorder(invert_rotation_mask, top, bottom, left, right, cv.BORDER_CONSTANT, None, 0)
168
+ # binarize mask
169
+ invert_rotation_mask = np.where(invert_rotation_mask > 0, 255, 0).astype(np.uint8)
170
+
171
+ # 2*2 person bbox: [[x1, y1], [x2, y2]]
172
+ # 39*5 screen landmarks: 33 keypoints and 6 auxiliary points with [x, y, z, visibility, presence], z value is relative to HIP
173
+ # Visibility is the probability that a keypoint is located within the frame and not occluded by a bigger body part or another object
174
+ # Presence is the probability that a keypoint is located within the frame
175
+ # 39*3 world landmarks: 33 keypoints and 6 auxiliary points with metric [x, y, z] coordinates
176
+ # img_height*img_width mask: gray mask, where 255 indicates the full body of a person and 0 means background
177
+ # 64*64*39 heatmap: intended for refining landmarks; requires sigmoid processing before use
178
+ # conf: confidence of prediction
179
+ return [bbox, landmarks, rotated_landmarks_world, invert_rotation_mask, heatmap, conf]