Google pivots to open AI models… a 'head-to-head match' with closed OpenAI

https://v.daum.net/v/20240223050328046

 


On the 21st (local time), Google released its large language model (LLM) 'Gemma' as open source. Individual researchers, developers, companies, and research institutions can now use Gemma freely. The easiest way to think of Gemma is as a lightweight version of Google's AI model 'Gemini'.

 

>

Using Gemini a lot; it works great.

Next target: Gemma!

(After that, Llama and BERT. SOLAR is already done.)

Posted by 캬웃

[Code]

###########################################
# 2-1. Zero-shot evaluation (English)
## Tasks: hellaswag, copa, boolq, mmlu

!lm_eval --model hf \
    --model_args pretrained=[...Custom_LLM...] \
    --tasks hellaswag,copa,boolq,mmlu \
    --device cuda:0 \
    --batch_size 8 \
    --num_fewshot 0

 

 

[Results]

hf (pretrained=cashbook/SOLAR-Platypus-10.7B-v1-kjw), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 8
|                 Tasks                 |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------------------------------------|-------|------|-----:|--------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc     |0.6304|±  |0.0038|
| - humanities                          |N/A    |none  |     0|acc     |0.5626|±  |0.0066|
|  - formal_logic                       |      0|none  |     0|acc     |0.3413|±  |0.0424|
|  - high_school_european_history       |      0|none  |     0|acc     |0.7818|±  |0.0323|
|  - high_school_us_history             |      0|none  |     0|acc     |0.8284|±  |0.0265|
|  - high_school_world_history          |      0|none  |     0|acc     |0.8101|±  |0.0255|
|  - international_law                  |      0|none  |     0|acc     |0.8099|±  |0.0358|
|  - jurisprudence                      |      0|none  |     0|acc     |0.7407|±  |0.0424|
|  - logical_fallacies                  |      0|none  |     0|acc     |0.7607|±  |0.0335|
|  - moral_disputes                     |      0|none  |     0|acc     |0.7312|±  |0.0239|
|  - moral_scenarios                    |      0|none  |     0|acc     |0.2413|±  |0.0143|
|  - philosophy                         |      0|none  |     0|acc     |0.7074|±  |0.0258|
|  - prehistory                         |      0|none  |     0|acc     |0.7500|±  |0.0241|
|  - professional_law                   |      0|none  |     0|acc     |0.4831|±  |0.0128|
|  - world_religions                    |      0|none  |     0|acc     |0.8129|±  |0.0299|
| - other                               |N/A    |none  |     0|acc     |0.7219|±  |0.0077|
|  - business_ethics                    |      0|none  |     0|acc     |0.7000|±  |0.0461|
|  - clinical_knowledge                 |      0|none  |     0|acc     |0.6981|±  |0.0283|
|  - college_medicine                   |      0|none  |     0|acc     |0.6474|±  |0.0364|
|  - global_facts                       |      0|none  |     0|acc     |0.3600|±  |0.0482|
|  - human_aging                        |      0|none  |     0|acc     |0.7175|±  |0.0302|
|  - management                         |      0|none  |     0|acc     |0.7961|±  |0.0399|
|  - marketing                          |      0|none  |     0|acc     |0.8932|±  |0.0202|
|  - medical_genetics                   |      0|none  |     0|acc     |0.7800|±  |0.0416|
|  - miscellaneous                      |      0|none  |     0|acc     |0.8340|±  |0.0133|
|  - nutrition                          |      0|none  |     0|acc     |0.7516|±  |0.0247|
|  - professional_accounting            |      0|none  |     0|acc     |0.5319|±  |0.0298|
|  - professional_medicine              |      0|none  |     0|acc     |0.7022|±  |0.0278|
|  - virology                           |      0|none  |     0|acc     |0.5241|±  |0.0389|
| - social_sciences                     |N/A    |none  |     0|acc     |0.7423|±  |0.0077|
|  - econometrics                       |      0|none  |     0|acc     |0.4737|±  |0.0470|
|  - high_school_geography              |      0|none  |     0|acc     |0.8131|±  |0.0278|
|  - high_school_government_and_politics|      0|none  |     0|acc     |0.8756|±  |0.0238|
|  - high_school_macroeconomics         |      0|none  |     0|acc     |0.6308|±  |0.0245|
|  - high_school_microeconomics         |      0|none  |     0|acc     |0.7269|±  |0.0289|
|  - high_school_psychology             |      0|none  |     0|acc     |0.8367|±  |0.0158|
|  - human_sexuality                    |      0|none  |     0|acc     |0.7786|±  |0.0364|
|  - professional_psychology            |      0|none  |     0|acc     |0.6667|±  |0.0191|
|  - public_relations                   |      0|none  |     0|acc     |0.7000|±  |0.0439|
|  - security_studies                   |      0|none  |     0|acc     |0.7388|±  |0.0281|
|  - sociology                          |      0|none  |     0|acc     |0.8507|±  |0.0252|
|  - us_foreign_policy                  |      0|none  |     0|acc     |0.8600|±  |0.0349|
| - stem                                |N/A    |none  |     0|acc     |0.5322|±  |0.0086|
|  - abstract_algebra                   |      0|none  |     0|acc     |0.3300|±  |0.0473|
|  - anatomy                            |      0|none  |     0|acc     |0.5926|±  |0.0424|
|  - astronomy                          |      0|none  |     0|acc     |0.7039|±  |0.0372|
|  - college_biology                    |      0|none  |     0|acc     |0.7708|±  |0.0351|
|  - college_chemistry                  |      0|none  |     0|acc     |0.4100|±  |0.0494|
|  - college_computer_science           |      0|none  |     0|acc     |0.5400|±  |0.0501|
|  - college_mathematics                |      0|none  |     0|acc     |0.3800|±  |0.0488|
|  - college_physics                    |      0|none  |     0|acc     |0.4118|±  |0.0490|
|  - computer_security                  |      0|none  |     0|acc     |0.7300|±  |0.0446|
|  - conceptual_physics                 |      0|none  |     0|acc     |0.5362|±  |0.0326|
|  - electrical_engineering             |      0|none  |     0|acc     |0.5655|±  |0.0413|
|  - elementary_mathematics             |      0|none  |     0|acc     |0.4339|±  |0.0255|
|  - high_school_biology                |      0|none  |     0|acc     |0.7742|±  |0.0238|
|  - high_school_chemistry              |      0|none  |     0|acc     |0.4975|±  |0.0352|
|  - high_school_computer_science       |      0|none  |     0|acc     |0.6200|±  |0.0488|
|  - high_school_mathematics            |      0|none  |     0|acc     |0.3593|±  |0.0293|
|  - high_school_physics                |      0|none  |     0|acc     |0.3974|±  |0.0400|
|  - high_school_statistics             |      0|none  |     0|acc     |0.5509|±  |0.0339|
|  - machine_learning                   |      0|none  |     0|acc     |0.4286|±  |0.0470|
|hellaswag                              |      1|none  |     0|acc     |0.6396|±  |0.0048|
|                                       |       |none  |     0|acc_norm|0.8310|±  |0.0037|
|copa                                   |      1|none  |     0|acc     |0.8700|±  |0.0338|
|boolq                                  |      2|none  |     0|acc     |0.8260|±  |0.0066|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6304|±  |0.0038|
| - humanities     |N/A    |none  |     0|acc   |0.5626|±  |0.0066|
| - other          |N/A    |none  |     0|acc   |0.7219|±  |0.0077|
| - social_sciences|N/A    |none  |     0|acc   |0.7423|±  |0.0077|
| - stem           |N/A    |none  |     0|acc   |0.5322|±  |0.0086|

 


BERT (Bidirectional Encoder Representations from Transformers) is a family of masked language models introduced by Google researchers in 2018.


# Developed by Facebook

# BART combines BERT and GPT into one

(a model that unifies the standard sequence-to-sequence Transformer by training it with a new pre-training objective)

 

# Key advantage:

flexibility of noising

Any arbitrary corruption can be applied directly to the original text, even one that changes its length.

 

# Model

Now let's look at the model architecture. BART is a denoising autoencoder that restores a corrupted document to the original. It is implemented as a seq2seq model: a bidirectional encoder (as in BERT) encodes the corrupted text, and a left-to-right autoregressive decoder (as in GPT) consumes the encoding. For pre-training, the standard negative log-likelihood is optimized.
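The text-infilling corruption described above can be sketched in plain Python. The helper name, span lengths, and mask ratio below are illustrative choices (the BART paper samples span lengths from a Poisson distribution with λ=3); this is not BART's actual implementation:

```python
import random

def text_infilling(tokens, mask_ratio=0.3, mask_token="<mask>", seed=0):
    """Corrupt a token list BART-style: replace random spans with a single
    <mask> token. Because a whole span collapses into one mask, the corrupted
    sequence can be shorter than the original."""
    rng = random.Random(seed)
    n_to_mask = int(len(tokens) * mask_ratio)
    out, i, masked = [], 0, 0
    while i < len(tokens):
        if masked < n_to_mask and rng.random() < mask_ratio:
            span = rng.randint(1, 3)   # toy span length (paper: Poisson(3))
            out.append(mask_token)     # the whole span becomes one mask token
            i += span
            masked += span
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The decoder is then trained to reconstruct the original `tokens` from the corrupted sequence.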

 

# A note on paper writing:

How could a model that simply combines existing ones, whose architecture is moreover identical to the Transformer proposed by Vaswani et al., become a paper? Partly because BART reached SOTA on several NLP benchmarks, but its thorough analysis of various pre-training tasks also played a big part.

If you write papers, take this as "so this is how papers get made"; if you are studying NLP, the takeaway is the importance of pre-training.

 

# On pre-training:

If you want to learn more about pre-training, I recommend reading the T5 paper. Highly recommended!!

 

 

[References]

https://chloelab.tistory.com/34


When I submitted my model for evaluation on Hugging Face, it displayed:
Citation
Copy the following snippet to cite these results

@misc{open-llm-leaderboard,
  author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
  title = {Open LLM Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}"
}
@software{eval-harness,
  author       = {Gao, Leo and
                  Tow, Jonathan and
                  Biderman, Stella and
                  Black, Sid and
                  DiPofi, Anthony and
                  Foster, Charles and
                  Golding, Laurence and
                  Hsu, Jeffrey and
                  McDonell, Kyle and
                  Muennighoff, Niklas and
                  Phang, Jason and
                  Reynolds, Laria and
                  Tang, Eric and
                  Thite, Anish and
                  Wang, Ben and
                  Wang, Kevin and
                  Zou, Andy},
  title        = {A framework for few-shot language model evaluation},
  month        = sep,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.0.1},
  doi          = {10.5281/zenodo.5371628},
  url          = {https://doi.org/10.5281/zenodo.5371628}
}
@misc{clark2018think,
      title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
      author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
      year={2018},
      eprint={1803.05457},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
@misc{zellers2019hellaswag,
      title={HellaSwag: Can a Machine Really Finish Your Sentence?},
      author={Rowan Zellers and Ari Holtzman and Yonatan Bisk and Ali Farhadi and Yejin Choi},
      year={2019},
      eprint={1905.07830},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{hendrycks2021measuring,
      title={Measuring Massive Multitask Language Understanding},
      author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
      year={2021},
      eprint={2009.03300},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}
@misc{lin2022truthfulqa,
      title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
      author={Stephanie Lin and Jacob Hilton and Owain Evans},
      year={2022},
      eprint={2109.07958},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{DBLP:journals/corr/abs-1907-10641,
      title={{WINOGRANDE:} An Adversarial Winograd Schema Challenge at Scale},
      author={Keisuke Sakaguchi and Ronan Le Bras and Chandra Bhagavatula and Yejin Choi},
      year={2019},
      eprint={1907.10641},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{DBLP:journals/corr/abs-2110-14168,
      title={Training Verifiers to Solve Math Word Problems},
      author={Karl Cobbe and
                  Vineet Kosaraju and
                  Mohammad Bavarian and
                  Mark Chen and
                  Heewoo Jun and
                  Lukasz Kaiser and
                  Matthias Plappert and
                  Jerry Tworek and
                  Jacob Hilton and
                  Reiichiro Nakano and
                  Christopher Hesse and
                  John Schulman},
      year={2021},
      eprint={2110.14168},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

That's what it shows. What is it even for?


[Error] config.json

LLM 2024. 2. 20. 13:55

[Symptom]

Running the evaluation on Colab failed with the error below.

 

File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 296, in hf_raise_for_status
    raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-65d2f58b-6bbe66047d0d87c019aa16ba;557e17ef-bc31-490e-bd8b-52d4921f268b)

Entry Not Found for url: https://huggingface.co/cashbook/SOLAR-Platypus-10.7B-v1-kjw/resolve/main/config.json.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/__main__.py", line 279, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/evaluator.py", line 123, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/api/model.py", line 134, in create_from_arg_string
    return cls(**args, **args2)
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/models/huggingface.py", line 187, in __init__
    self._get_config(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/models/huggingface.py", line 444, in _get_config
    self._config = transformers.AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 622, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 677, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 481, in cached_file
    raise EnvironmentError(
OSError: cashbook/SOLAR-Platypus-10.7B-v1-kjw does not appear to have a file named config.json. Checkout 'https://huggingface.co/cashbook/SOLAR-Platypus-10.7B-v1-kjw/main' for available files.

 

 

[Fix]

The repo was missing config.json.
Hugging Face won't register the model at all without this file.

Of the files needed for a model upload, only config.json was missing,

so downloading the copy from Google Drive and manually uploading just that one file fixed it.


Time-out while running Colab

colab 2024. 2. 20. 13:44

Google Colab can drop the connection and abort training if there is no interaction for 90 minutes.

The way to prevent this is simply to generate some interaction within those 90 minutes.

Press F12 (or Ctrl+Shift+I) to open the developer console.

At the bottom of the console is an input field. Paste the JavaScript code below there.

 

function ClickConnect() {
    console.log("Keeping the Colab connection alive");
    // Simulate a click on the Colab toolbar's Connect button
    document.querySelector("colab-toolbar-button#connect").click();
}
// Re-run ClickConnect every 60 seconds
setInterval(ClickConnect, 60 * 1000);



This runs ClickConnect() once a minute.

 

 

 

===================

 

https://research.google.com/colaboratory/faq.html#idle-timeouts
The page says runtimes last up to 12 hours, but in practice the connection drops after only about 3 hours.

Safer to assume 90 minutes.


Day five.

AI Diary 2024. 2. 19. 16:45

Uploaded my fine-tuned model to Hugging Face.

The leaderboard evaluation errored, saying the model could not be found.

Ran the evaluation on Colab as well, and it failed with the error below.

 

2024-02-19 06:30:21.446937: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 06:30:21.446989: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 06:30:21.448322: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 06:30:22.563854: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-19:06:30:24,734 INFO     [__main__.py:200] Verbosity set to INFO
2024-02-19:06:30:24,734 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-19:06:30:35,622 INFO     [__main__.py:276] Selected Tasks: ['boolq', 'copa', 'hellaswag', 'mmlu']
2024-02-19:06:30:35,623 INFO     [__main__.py:277] Loading selected tasks...
2024-02-19:06:30:35,623 INFO     [evaluator.py:95] Setting random seed to 0
2024-02-19:06:30:35,623 INFO     [evaluator.py:99] Setting numpy seed to 1234
2024-02-19:06:30:35,623 INFO     [evaluator.py:103] Setting torch manual seed to 1234
2024-02-19:06:30:35,648 INFO     [huggingface.py:161] Using device 'cuda:0'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/cashbook/SOLAR-Platypus-10.7B-v1-kjw/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 430, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 296, in hf_raise_for_status
    raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-65d2f58b-6bbe66047d0d87c019aa16ba;557e17ef-bc31-490e-bd8b-52d4921f268b)

Entry Not Found for url: https://huggingface.co/cashbook/SOLAR-Platypus-10.7B-v1-kjw/resolve/main/config.json.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/__main__.py", line 279, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/evaluator.py", line 123, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/api/model.py", line 134, in create_from_arg_string
    return cls(**args, **args2)
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/models/huggingface.py", line 187, in __init__
    self._get_config(
  File "/content/drive/MyDrive/FastCampus-LLM/lm-evaluation-harness/lm_eval/models/huggingface.py", line 444, in _get_config
    self._config = transformers.AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 622, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 677, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 481, in cached_file
    raise EnvironmentError(
OSError: cashbook/SOLAR-Platypus-10.7B-v1-kjw does not appear to have a file named config.json. Checkout 'https://huggingface.co/cashbook/SOLAR-Platypus-10.7B-v1-kjw/main' for available files.

 

> The message says config.json cannot be found.

> Drive does have config.json. Comparing the two, everything except this file is already uploaded to Hugging Face.

> The thought of re-running everything from scratch is daunting.

> Can I upload it manually? Checked. Yes, I can!

> Uploaded just config.json, re-ran, and now both Colab and Hugging Face run without issues.

 

After submitting to Hugging Face, it again displayed the same Citation snippet (the BibTeX shown in the earlier post). What is it even for?

 

 

Google contacted me asking for an English résumé.

Over the weekend I wrote a one-page cover letter and a two-page résumé in English.

I'll revise it at home today and send it off.


Day four.

AI Diary 2024. 2. 7. 21:11

Techniques for LLM training:

 - Quantization

 - Pruning

 - Distillation (knowledge distillation: LLM > sLLM)
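As a rough illustration of the first item, here is a minimal symmetric int8 quantization sketch in plain Python. Real libraries use per-channel scales and calibration; this only shows the map-floats-to-integers idea:

```python
def quantize_int8(values):
    """Symmetric int8 quantization sketch: map floats to [-127, 127] with a
    single scale, then dequantize to observe the rounding error."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [round(v / scale) for v in values]            # int8-range codes
    deq = [x * scale for x in q]                      # reconstructed floats
    return q, deq, scale
```

Each dequantized value differs from the original by at most half a quantization step, which is the storage/accuracy trade-off quantization exploits.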

 

Fine-tuning techniques:

 - Instruction-tuning: provide an instruction describing the task so the LLM understands what to do. (Training is difficult without this technique.)
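A minimal sketch of what one instruction-formatted training example might look like. The Alpaca-style field names and wording are illustrative, not a fixed standard:

```python
# Hypothetical instruction-tuning prompt template (Alpaca-style).
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_example(instruction, input_text, output):
    """Render one (instruction, input, output) triple as a training string."""
    return TEMPLATE.format(instruction=instruction, input=input_text, output=output)
```

The model is then fine-tuned on many such strings so it learns to follow the instruction section at inference time.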

 

Look over a few fine-tuning techniques,

then write the fine-tuning code in Colab.

 

Apply to a few AI developer positions before this Lunar New Year.

The job switch has to be settled within February.

I'm throwing myself into AI.

Anyone out there willing to take me?

( k2mj5ngw55@gmail.com )


Day three.

AI Diary 2024. 2. 5. 00:58

Studied instruction-tuning, which assists fine-tuning.

Taking coding tests.

Reviewing computer networking.

Working through a deep-learning math book.

Finished the sorting section of algorithms.
