
ABEJA's LLM development project proposal selected for the “Post-5G Information Communication System Infrastructure Enhancement Research and Development Project / Post-5G Information Communication System Development” publicly solicited by NEDO

ABEJA · Feb 1, 10:00

ABEJA Co., Ltd. (Headquarters: Minato-ku, Tokyo; Representative Director and CEO: Yosuke Okada; hereinafter “ABEJA”) is pleased to announce that its proposal to develop a general-purpose LLM*2 that “implements a rich world” through collaboration between humans and AI has been selected for the “Post-5G Information Communication System Infrastructure Enhancement Research and Development Project*1 / Post-5G Information Communication System Development” publicly solicited by the New Energy and Industrial Technology Development Organization (NEDO).

ABEJA plans to receive grants of 700 million yen, mainly for computational resources required to build an LLM.

In this project, ABEJA will conduct research and development on a Japanese LLM and peripheral technologies (RAG*3, Agent*4), aiming to dramatically improve accuracy and computational cost performance, both of which are essential for the social implementation of LLMs.

We will also release the developed LLM, source code, development know-how, and other results as appropriate, in order to promote the utilization of LLMs, accelerate AI technology innovation across society, and contribute to developing the next generation of researchers and engineers.

For commercialization, we plan to offer the results widely together with the “ABEJA LLM Series,” which has been provided on the digital EMS “ABEJA Platform” since 2023. The business model assumes a distribution model*6 for open source software (OSS)*5, in which the support needed to utilize the released LLM is provided for a fee.

ABEJA has been conducting research and development on LLMs, a type of generative AI, since 2018, and since March 2023 has provided the “ABEJA LLM Series” to client companies on the ABEJA Platform. To realize LLM implementation at client companies, we have broadened our support to cover everything from strategy formulation to business process design and in-process operation, and we continue to expand these services while advancing our LLM research and development.

We believe that this newly adopted project is a meaningful initiative toward realizing our management philosophy of “implementing a rich world” and will help accelerate the implementation of LLMs across society as a whole.

Currently, companies around the world are launching a variety of initiatives aimed at capturing the enormous value generated by generative AI, centered on LLMs. The LLM market is indeed expected to expand rapidly: the dialogue AI business in Japan is projected to grow from 14 billion yen in fiscal 2023 to 690.5 billion yen under an optimistic scenario (average annual growth rate of 165.0%, CAGR 2023-2027; source: Seed Planning Co., Ltd., “Current Status and Future Prospects of the 2023 Dialogue AI Business”), and ABEJA’s own scenario anticipates a market of 200 billion yen.

While LLM utilization is expected to bring major changes in industrial structure, using LLMs currently requires large-scale computational resources, so the scope of application becomes restricted once return on investment is taken into account; this is one factor hindering the social implementation of LLMs. Typical issues faced by LLMs are the “knowledge cutoff,” which prevents them from reflecting the latest or updated information, and “hallucination,” the generation of inaccurate information not based on facts. These arise because an LLM’s knowledge is built from vast amounts of “existing” data and because of the LLM’s characteristic of learning even the incompleteness and misinformation present in its training data. To improve LLM accuracy, it is essential to eliminate data containing incorrect or biased information and to train on accurate, reliable data. One countermeasure is “fine-tuning,” in which an already trained LLM is further trained on a new dataset, but each run consumes large computational resources and requires considerable cost and time. For this reason, in practice its application has been limited to a subset of enterprise companies. OpenAI announced a fine-tuning feature for “GPT-3.5 Turbo” in 2023, but the data it can handle is limited to 4,096 tokens and files of 50 MB or less, leaving issues with practicality.

“RAG (Retrieval-Augmented Generation)” is a method regarded as promising for solving these issues. RAG is a technology that links an LLM with external databases and information sources (hereinafter “external data”), allowing the LLM to generate answers that incorporate knowledge from the external data. Highly accurate answers about external data become possible simply by replacing the external data, without fine-tuning each time. In addition, by optimizing the “Agent,” the LLM becomes able to autonomously plan and execute necessary actions, such as calling APIs and tools, based on the input.
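For illustration, the RAG flow described above can be sketched as follows. This is a minimal sketch, not ABEJA's implementation: the embed() and generate() functions are placeholders standing in for a real sentence encoder and the developed LLM.

```python
# Minimal RAG sketch: retrieve the external documents most similar to the
# question, then have the LLM answer using them as context.
from typing import List
import numpy as np

def embed(texts: List[str]) -> np.ndarray:
    """Placeholder embedding: hashed bag-of-words vectors. Swap in a real encoder."""
    vecs = np.zeros((len(texts), 256))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % 256] += 1.0
    return vecs

def generate(prompt: str) -> str:
    """Placeholder for the LLM call (local model or API)."""
    return f"[LLM answer grounded in the prompt below]\n{prompt}"

def rag_answer(question: str, documents: List[str], top_k: int = 2) -> str:
    # 1. Retrieve: rank external documents by cosine similarity to the question.
    doc_vecs = embed(documents)
    q_vec = embed([question])[0]
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    context = "\n".join(documents[i] for i in np.argsort(-sims)[:top_k])
    # 2. Augment and generate: the LLM answers using only the retrieved context,
    #    so updating the external data updates the answers without fine-tuning.
    prompt = (f"Answer using only the references below.\n"
              f"References:\n{context}\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)

print(rag_answer("When was the adoption announced?",
                 ["The adoption was announced in February 2024.",
                  "The company was founded in 2012."]))
```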

ABEJA believes that improving accuracy through RAG and optimizing the Agent will raise computational cost performance, bring economic rationality and an expandable scope of application, and strongly promote the social implementation of LLMs. We also believe there is still room for technological progress in RAG as currently practiced, and we will realize pioneering, highly practical methods by working on the LLM and its peripheral technologies (RAG, Agent) in an integrated manner. For standalone LLM research and development, existing open-source LLMs will be used as benchmarks, with the goal of achieving top scores on all JGLUE*7 items at the time of release.
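The “Agent” behavior referred to above, in which the LLM autonomously decides on and executes actions such as tool and API calls, can likewise be pictured as a minimal loop. The JSON reply format, the tool set, and the scripted llm() stub below are illustrative assumptions rather than ABEJA’s actual design.

```python
# Minimal agent loop sketch: the LLM repeatedly chooses an action (a tool call
# or a final answer), the program executes it, and the observation is fed back.
import json
from typing import Callable, Dict

_scripted_replies = iter([
    '{"action": "calculator", "input": "700 * 1000000"}',
    '{"action": "final", "input": "The grant of 700 million yen is 700,000,000 yen."}',
])

def llm(prompt: str) -> str:
    """Placeholder for the LLM: replays a canned plan, one JSON action per call."""
    return next(_scripted_replies)

# Hypothetical tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(stub) search results for: {query}",
    "calculator": lambda expr: str(eval(expr)),  # illustration only; never eval untrusted input
}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        step = json.loads(llm(history + "Reply with the next action as JSON."))
        if step["action"] == "final":
            return step["input"]                             # the model decided it can answer
        observation = TOOLS[step["action"]](step["input"])   # execute the chosen tool
        history += f"Action: {step['action']}({step['input']}) -> {observation}\n"
    return "Stopped: step limit reached."

print(run_agent("Express the 700 million yen grant in yen."))
```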

Through these efforts, ABEJA also aims for Japan to play an important role in the international AI field and to establish a new standard for information processing technology in the international community.

By providing society with the LLM, source code, development know-how, and other results obtained through this research and development, ABEJA will promote the social implementation of LLMs, increase the number of companies and organizations utilizing generative AI, dramatically accelerate AI technology innovation across society, and help develop the next generation of researchers and engineers, striving to realize its corporate philosophy of “implementing a rich world.”

Business Overview

Publicly solicited project name

Post-5G Information Communication System Infrastructure Enhancement Research and Development Project/Post-5G Information Communication System Development

Name of the proposed project

Research and development of a general-purpose LLM as the basis for specialized models, toward the social implementation of LLMs

Implementation period

February 2024 to August 2024

Purpose

・Research and development of a Japanese LLM and peripheral technologies (RAG, Agent), with an eye on general-purpose use, for the social implementation of LLMs

・Disclosure of the deliverables obtained through research and development (LLM, source code, development know-how, etc.) to promote the utilization of generative AI, accelerate AI technology innovation in society, and develop the next generation of researchers and engineers

・Enabling Japan to play an important role in the international AI field and establish a new standard for information processing technology in the international community

Overview

・Research and development of a general-purpose LLM as the basis for specialization

- Achieve top scores in evaluations benchmarked against existing open-source LLMs

- Improve the accuracy of peripheral technologies (RAG, Agent) and promote data utilization

・For social implementation, pursue development tied to our own business while also disclosing and providing deliverables such as selected models and know-how

- Broadly offer the LLM and peripheral technologies (RAG, Agent) developed through this research together with the services we currently provide

- Publish deliverables (source code, models, development know-how) obtained through research and development

NEDO publication details

Adoption results publication page URL: https://www.nedo.go.jp/koubo/IT3_100304.html

■ Overall Overview Diagram (image)

■ Implementation schedule

About terms

*1 Post-5G Information Communication System Infrastructure Enhancement Research and Development Project: A project to develop core technologies aimed at strengthening the development and manufacturing base of post-5G information communication systems in Japan. A post-5G information communication system refers to a communication system compatible with post-5G, with further enhanced functions such as ultra-low latency and massive simultaneous connections compared to 5th-generation mobile communication systems (5G). https://www.meti.go.jp/policy/mono_info_service/joho/post5g/index.html

*2 LLM: An abbreviation of Large Language Model. Large language models are one area of generative AI.

*3 RAG: An abbreviation of Retrieval-Augmented Generation. A technology that links an LLM with external databases and information sources. By using this technology, the LLM can generate highly accurate answers that incorporate knowledge from those external databases and information sources.

*4 Agent: A technology that enables autonomous planning and execution of actions. By using this technology, the LLM can autonomously make decisions and plan and execute actions such as calling APIs and tools based on the input. This makes it possible to autonomously produce answers using external data not included in the training data.

*5 Open source software (OSS): A general term for software whose source code can be used, examined, reused, modified, extended, and redistributed free of charge, regardless of the user's purpose.

*6 Distribution model: A business model in which an OSS provider or other community provides, for a fee, support for the maintenance, bug fixes, security, and other updates required by products incorporating the OSS. For this commercialization, ABEJA assumes a “Red Hat Enterprise Linux (RHEL)”-style approach.

*7 JGLUE: A set of datasets for measuring general language understanding ability in Japanese. LLMs are evaluated from a variety of perspectives.

■ About ABEJA Co., Ltd.

Under its management philosophy of “implementing a rich world,” ABEJA develops a “digital platform business” that transforms the core business processes of client companies based on the “ABEJA Platform,” and continues to achieve growth in business profits. We have promoted research and development on the ABEJA Platform since our founding in 2012, and to date have realized digital transformation for more than 300 companies across a wide variety of industries and business categories. Furthermore, using advanced know-how and approaches such as “Human in the Loop,” we realize the “human-AI collaboration” essential to digital transformation, transform customers’ core operations strategically and efficiently, and also work on innovating business models.

Headquarters: 2nd Floor, Bizflex Azabujuban, 1-14 Mita, Minato-ku, Tokyo

Established: September 10, 2012

Representative: Representative Director and CEO Yosuke Okada

Business: Digital platform business

URL: https://abejainc.com
