[Code]
###########################################
# 2-1. Zero-shot evaluation (English)
## Tasks: hellaswag, copa, boolq, mmlu
!lm_eval --model hf \
--model_args pretrained=[...Custom_LLM...] \
--tasks hellaswag,copa,boolq,mmlu \
--device cuda:0 \
--batch_size 8 \
--num_fewshot 0
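
The `!` prefix only works inside a notebook cell. As a minimal sketch of running the same evaluation from a plain Python script, the flags above can be assembled into an argument list. The helper `build_lm_eval_command` is a hypothetical name introduced here for illustration, and the model ID is left as the original placeholder rather than filled in.

```python
import shlex


def build_lm_eval_command(pretrained, tasks, device="cuda:0",
                          batch_size=8, num_fewshot=0):
    """Return an argv list mirroring the flags in the notebook cell above.

    Hypothetical helper: it only assembles the arguments, it does not run anything.
    """
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
        "--num_fewshot", str(num_fewshot),
    ]


cmd = build_lm_eval_command(
    pretrained="[...Custom_LLM...]",  # placeholder kept from the original; substitute your model ID
    tasks=["hellaswag", "copa", "boolq", "mmlu"],
)

# Show the shell-quoted command for inspection.
print(shlex.join(cmd))

# To actually launch it (assumes lm_eval is installed and a GPU is available):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems if the model ID or task list contains special characters.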
[Results]
hf (pretrained=cashbook/SOLAR-Platypus-10.7B-v1-kjw), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 8
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------------------------------------|-------|------|-----:|--------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6304|± |0.0038|
| - humanities |N/A |none | 0|acc |0.5626|± |0.0066|
| - formal_logic | 0|none | 0|acc |0.3413|± |0.0424|
| - high_school_european_history | 0|none | 0|acc |0.7818|± |0.0323|
| - high_school_us_history | 0|none | 0|acc |0.8284|± |0.0265|
| - high_school_world_history | 0|none | 0|acc |0.8101|± |0.0255|
| - international_law | 0|none | 0|acc |0.8099|± |0.0358|
| - jurisprudence | 0|none | 0|acc |0.7407|± |0.0424|
| - logical_fallacies | 0|none | 0|acc |0.7607|± |0.0335|
| - moral_disputes | 0|none | 0|acc |0.7312|± |0.0239|
| - moral_scenarios | 0|none | 0|acc |0.2413|± |0.0143|
| - philosophy | 0|none | 0|acc |0.7074|± |0.0258|
| - prehistory | 0|none | 0|acc |0.7500|± |0.0241|
| - professional_law | 0|none | 0|acc |0.4831|± |0.0128|
| - world_religions | 0|none | 0|acc |0.8129|± |0.0299|
| - other |N/A |none | 0|acc |0.7219|± |0.0077|
| - business_ethics | 0|none | 0|acc |0.7000|± |0.0461|
| - clinical_knowledge | 0|none | 0|acc |0.6981|± |0.0283|
| - college_medicine | 0|none | 0|acc |0.6474|± |0.0364|
| - global_facts | 0|none | 0|acc |0.3600|± |0.0482|
| - human_aging | 0|none | 0|acc |0.7175|± |0.0302|
| - management | 0|none | 0|acc |0.7961|± |0.0399|
| - marketing | 0|none | 0|acc |0.8932|± |0.0202|
| - medical_genetics | 0|none | 0|acc |0.7800|± |0.0416|
| - miscellaneous | 0|none | 0|acc |0.8340|± |0.0133|
| - nutrition | 0|none | 0|acc |0.7516|± |0.0247|
| - professional_accounting | 0|none | 0|acc |0.5319|± |0.0298|
| - professional_medicine | 0|none | 0|acc |0.7022|± |0.0278|
| - virology | 0|none | 0|acc |0.5241|± |0.0389|
| - social_sciences |N/A |none | 0|acc |0.7423|± |0.0077|
| - econometrics | 0|none | 0|acc |0.4737|± |0.0470|
| - high_school_geography | 0|none | 0|acc |0.8131|± |0.0278|
| - high_school_government_and_politics| 0|none | 0|acc |0.8756|± |0.0238|
| - high_school_macroeconomics | 0|none | 0|acc |0.6308|± |0.0245|
| - high_school_microeconomics | 0|none | 0|acc |0.7269|± |0.0289|
| - high_school_psychology | 0|none | 0|acc |0.8367|± |0.0158|
| - human_sexuality | 0|none | 0|acc |0.7786|± |0.0364|
| - professional_psychology | 0|none | 0|acc |0.6667|± |0.0191|
| - public_relations | 0|none | 0|acc |0.7000|± |0.0439|
| - security_studies | 0|none | 0|acc |0.7388|± |0.0281|
| - sociology | 0|none | 0|acc |0.8507|± |0.0252|
| - us_foreign_policy | 0|none | 0|acc |0.8600|± |0.0349|
| - stem |N/A |none | 0|acc |0.5322|± |0.0086|
| - abstract_algebra | 0|none | 0|acc |0.3300|± |0.0473|
| - anatomy | 0|none | 0|acc |0.5926|± |0.0424|
| - astronomy | 0|none | 0|acc |0.7039|± |0.0372|
| - college_biology | 0|none | 0|acc |0.7708|± |0.0351|
| - college_chemistry | 0|none | 0|acc |0.4100|± |0.0494|
| - college_computer_science | 0|none | 0|acc |0.5400|± |0.0501|
| - college_mathematics | 0|none | 0|acc |0.3800|± |0.0488|
| - college_physics | 0|none | 0|acc |0.4118|± |0.0490|
| - computer_security | 0|none | 0|acc |0.7300|± |0.0446|
| - conceptual_physics | 0|none | 0|acc |0.5362|± |0.0326|
| - electrical_engineering | 0|none | 0|acc |0.5655|± |0.0413|
| - elementary_mathematics | 0|none | 0|acc |0.4339|± |0.0255|
| - high_school_biology | 0|none | 0|acc |0.7742|± |0.0238|
| - high_school_chemistry | 0|none | 0|acc |0.4975|± |0.0352|
| - high_school_computer_science | 0|none | 0|acc |0.6200|± |0.0488|
| - high_school_mathematics | 0|none | 0|acc |0.3593|± |0.0293|
| - high_school_physics | 0|none | 0|acc |0.3974|± |0.0400|
| - high_school_statistics | 0|none | 0|acc |0.5509|± |0.0339|
| - machine_learning | 0|none | 0|acc |0.4286|± |0.0470|
|hellaswag | 1|none | 0|acc |0.6396|± |0.0048|
| | |none | 0|acc_norm|0.8310|± |0.0037|
|copa | 1|none | 0|acc |0.8700|± |0.0338|
|boolq | 2|none | 0|acc |0.8260|± |0.0066|

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6304|± |0.0038|
| - humanities |N/A |none | 0|acc |0.5626|± |0.0066|
| - other |N/A |none | 0|acc |0.7219|± |0.0077|
| - social_sciences|N/A |none | 0|acc |0.7423|± |0.0077|
| - stem |N/A |none | 0|acc |0.5322|± |0.0086|
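
To make the `Value ± Stderr` columns easier to compare, the group rows can be parsed and turned into approximate 95% intervals. This is a rough sketch, not part of the original evaluation. It assumes a normal approximation (value ± 1.96 × stderr), and `parse_row` and `confidence_interval` are hypothetical helper names. The rows are copied verbatim from the Groups table above.

```python
# Rows copied verbatim from the Groups table above.
GROUP_ROWS = """\
|mmlu |N/A |none | 0|acc |0.6304|± |0.0038|
| - humanities |N/A |none | 0|acc |0.5626|± |0.0066|
| - other |N/A |none | 0|acc |0.7219|± |0.0077|
| - social_sciences|N/A |none | 0|acc |0.7423|± |0.0077|
| - stem |N/A |none | 0|acc |0.5322|± |0.0086|
"""


def parse_row(line):
    """Split one pipe-delimited result row into (task, metric, value, stderr)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Column order: task, version, filter, n-shot, metric, value, "±", stderr
    task = cells[0].lstrip("- ").strip()
    return task, cells[4], float(cells[5]), float(cells[7])


def confidence_interval(value, stderr, z=1.96):
    """Normal-approximation interval (assumption): value ± z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


for line in GROUP_ROWS.splitlines():
    task, metric, value, stderr = parse_row(line)
    low, high = confidence_interval(value, stderr)
    print(f"{task:<16} {metric} {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Under this approximation the intervals for `stem` and `humanities` sit well below those for `other` and `social_sciences`, so the gap between those group pairs is larger than the reported sampling error.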