prithivMLmods committed
Commit dc1bc05 · verified · 1 Parent(s): bc58bda

Update README.md

Files changed (1):
  1. README.md +99 -1
README.md CHANGED

---

![zfdggzdrg.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/DPx-7s4BmG_XocnPQ4TR9.png)

# **Human-Action-Recognition**

> **Human-Action-Recognition** is an image classification model fine-tuned from the vision-language encoder **google/siglip2-base-patch16-224** for multi-class human action recognition. It uses the **SiglipForImageClassification** architecture to predict human activities from still images.
 
```py
Classification Report:
                    precision    recall  f1-score   support
...
listening_to_music     0.8494    0.7988    0.8233       840
...
      weighted avg     0.8421    0.8327    0.8339     12600
```
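
The report follows scikit-learn's `classification_report` layout (per-class precision, recall, F1-score, and support). A minimal sketch of how a report in this format is produced; the label names mirror the class list further down, and the random predictions are stand-ins, not the model's actual test outputs:

```python
import random

from sklearn.metrics import classification_report

labels = [
    "calling", "clapping", "cycling", "dancing", "drinking",
    "eating", "fighting", "hugging", "laughing", "listening_to_music",
    "running", "sitting", "sleeping", "texting", "using_laptop",
]

# Stand-in data only: replace with the true and predicted label indices
# for the 12,600-image test split to reproduce the numbers above.
random.seed(0)
y_true = [random.randrange(len(labels)) for _ in range(12_600)]
y_pred = [random.randrange(len(labels)) for _ in range(12_600)]

print(classification_report(y_true, y_pred, target_names=labels, digits=4))
```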
 
![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9ir2VwHirB-T75ABCP7m.png)

The model categorizes images into 15 action classes (a minimal single-image check follows the list):

- **0:** calling
- **1:** clapping
- **2:** cycling
- **3:** dancing
- **4:** drinking
- **5:** eating
- **6:** fighting
- **7:** hugging
- **8:** laughing
- **9:** listening_to_music
- **10:** running
- **11:** sitting
- **12:** sleeping
- **13:** texting
- **14:** using_laptop
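
For a quick check on one image without the Gradio app below, here is a minimal sketch; `example.jpg` is a placeholder path, and it assumes the hosted checkpoint's config carries the `id2label` mapping above:

```python
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

model_name = "prithivMLmods/Human-Action-Recognition"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# "example.jpg" is a placeholder; use any still image of a person.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-1 class, read from the label mapping stored in the model config.
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```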

---

# **Run with Transformers 🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Human-Action-Recognition"  # Change to your updated model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# ID to Label mapping
id2label = {
    0: "calling",
    1: "clapping",
    2: "cycling",
    3: "dancing",
    4: "drinking",
    5: "eating",
    6: "fighting",
    7: "hugging",
    8: "laughing",
    9: "listening_to_music",
    10: "running",
    11: "sitting",
    12: "sleeping",
    13: "texting",
    14: "using_laptop"
}

def classify_action(image):
    """Predicts the human action in the image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    predictions = {id2label[i]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Gradio interface
iface = gr.Interface(
    fn=classify_action,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Action Prediction Scores"),
    title="Human Action Recognition",
    description="Upload an image to recognize the human action (e.g., dancing, calling, sitting, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
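
`iface.launch()` serves the demo locally (Gradio prints the URL, typically http://127.0.0.1:7860), and `gr.Label` renders the returned dictionary as ranked class scores.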

---

# **Intended Use**

The **Human-Action-Recognition** model is designed to detect and classify human actions from still images. Example applications:

- **Surveillance & Monitoring:** Recognizing suspicious or specific activities in public spaces.
- **Sports Analytics:** Identifying player activities or movements.
- **Social Media Insights:** Understanding trends in user-posted visuals.
- **Healthcare:** Monitoring elderly people or patients for activity patterns.
- **Robotics & Automation:** Enabling context-aware AI systems with visual understanding.