"Web scraping is illegal" — Many people still believe this.
The government has officially responded. "It is not."
If your company relies on web scraping, you may have postponed data collection projects because of legal risk. While competitors are already collecting the same data, it can feel as though you are the only one hesitating.
On February 26, 2026, the Ministry of Culture, Sports and Tourism, the Ministry of Science and ICT, the National AI Strategy Commission, and the Korea Copyright Commission jointly published the "Fair Use Guidelines for Learning Copyrighted Works through Generative Artificial Intelligence."
The key message of these guidelines is clear:
"Even learning for commercial purposes or through automated web collection (web scraping) is not excluded from fair use."
In this article, we will summarize the key points of these guidelines from the perspective of web scraping practitioners.
Table of Contents
- Background of the Guidelines
- What is Fair Use?
- Four Criteria for Fair Use
- Key Points for Web Scraping Operators to Know
- Cases Not Covered by Fair Use
- Self-Assessment of Fair Use in 5 Questions
- Government Policies Being Implemented Together
- FAQ
Background of the Guidelines
With the arrival of generative AI, there is a heated global legal debate over AI learning from copyrighted works. In the United States, The New York Times has sued OpenAI, while Japan maintains a relatively lenient stance on AI learning.
In Korea, AI companies, content creators, and data collection operators all needed clear standards on "how far is permissible."
The guidelines were led by the Ministry of Culture, Sports and Tourism and the Korea Copyright Commission, with joint review by the Ministry of Science and ICT and the National AI Strategy Commission. They carry weight as a reference because they were published after input from field experts and the related ministries.
What is Fair Use?
Fair use, as stipulated in Article 35-5 of Korea's Copyright Act, is a legal exception that allows copyrighted works to be used without the copyright holder's permission. It was introduced in December 2011.
In simple terms, it means that not all use of copyrighted works constitutes copyright infringement.
To be recognized as fair use, the following four elements are considered comprehensively. A conclusion is not reached based on a single element.
Four Criteria for Fair Use
Criterion 1: Purpose and Nature of the Use
This criterion considers whether the use is commercial or nonprofit.
The guidelines provide an important clue here:
"Even for commercial purposes, commercial use alone does not necessarily negate fair use."
In other words, the equation "scraping for profit = illegal" does not hold.
What matters more is whether the use replaces the original work or creates new value through transformative use.
This criterion also considers whether measures are in place to prevent illegal reproduction and unauthorized access, as well as the method and manner of use.
Criterion 2: Type and Purpose of the Work
This criterion distinguishes between factual information and creative expression.
- Factual information (facts in news articles, product prices, review ratings, etc.) → Higher likelihood of fair use recognition
- Highly creative expression (novels, movies, art, music, etc.) → Fair use recognition is stricter
- Unpublished works → Considered less favorably than published works
If the data you collect through scraping consists mainly of factual information such as prices, reviews, and product specifications, you are in a favorable position under this criterion.
Criterion 3: Amount and Substantiality of the Portion Used
This criterion looks at how much of the original work was used.
Replicating a work in its entirety is viewed less favorably. However, if the amount used stays within what is necessary or unavoidable for the intended purpose, there is room for favorable consideration.
Criterion 4: Effect on the Market
This is the most important criterion. It assesses whether the use replaces or damages the market value of the original work.
- Using scraped data for analytical purposes → Does not replace the market of the original work
- Reposting scraped content as it is → Directly replaces the market of the original work → Higher possibility of fair use denial
Considerations include damage to sales of the work, economic losses, and loss of licensing opportunities.
Key Points for Web Scraping Operators to Know
The scraping method itself is not a negative factor in fair use determination.
- The guidelines explicitly state that "even learning through web automated collection (web scraping) is not excluded from fair use."
- What matters is not the collection method, but how the collected data is utilized.
Commercial purposes alone are not problematic.
- Even if data is collected for business purposes, fair use may be recognized if it is a transformative use that does not replace the market of the original work.
robots.txt and terms of use are also considerations.
- Collecting data while ignoring technical protection measures (such as robots.txt) can count against you under the first criterion (method and manner of use).
- Respecting a site's access restrictions is a basic principle to reduce legal risks.
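One way to put the robots.txt point into practice is to check a site's rules before requesting a page. The following is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `example-bot` user agent, the example.com URLs, and the helper names `build_parser` and `may_fetch` are illustrative assumptions and do not come from the guidelines. It shows a technical courtesy check only, not a fair use determination.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "example-bot", "https://example.com/products/1"))      # True
print(may_fetch(parser, "example-bot", "https://example.com/private/report"))  # False
print(parser.crawl_delay("example-bot"))                                       # 5
```

In a real crawler you would load the live file with `set_url()` and `read()` instead of an inline string, skip any URL for which the check returns `False`, and wait at least the reported crawl delay between requests.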
Cases Not Covered by Fair Use
It is important to clearly understand cases where fair use is unlikely to be recognized.
- Reposting collected content as it is
- Creating services that directly replace the market of the original work
- Evading technical protection measures to collect data
- Replicating highly creative works in bulk
The key point is that the outcome depends not on the act of collecting but on how the collected data is used.
Self-Assessment of Fair Use in 5 Questions
If you are using web scraping for business purposes, count how many of the following items apply to you.
- You are using the collected data for analysis or processing purposes.
- You do not repost the original content as it is.
- You check the robots.txt policy of the target site.
- The collected data mainly consists of factual information (prices, specs, statistics).
- Your scraping activities do not disrupt the normal operation of the target site.
4-5 applicable: You are likely within the fair use scope based on these guidelines.
2-3 applicable: We recommend reviewing your data usage practices.
0-1 applicable: We recommend consulting a legal expert.
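The scoring above can be expressed as a small function, which may be handy if you want to record the self-assessment per project. This is only a sketch of the checklist arithmetic in this article. The key names in `CHECKLIST` and the `assess` function are hypothetical labels chosen for illustration, and the result is a rough indicator rather than a legal conclusion.

```python
# Hypothetical keys, one per checklist item in this article.
CHECKLIST = (
    "uses_data_for_analysis",    # data is used for analysis or processing
    "does_not_repost_original",  # original content is not reposted as-is
    "checks_robots_txt",         # robots.txt of the target site is checked
    "mostly_factual_data",       # data is mainly prices, specs, statistics
    "does_not_disrupt_site",     # scraping does not disrupt the target site
)


def assess(answers: dict) -> tuple:
    """Count the applicable items and map the count to the article's three bands."""
    score = sum(1 for item in CHECKLIST if answers.get(item, False))
    if score >= 4:
        verdict = "likely within the fair use scope"
    elif score >= 2:
        verdict = "review your data usage practices"
    else:
        verdict = "consult a legal expert"
    return score, verdict


example = {
    "uses_data_for_analysis": True,
    "does_not_repost_original": True,
    "checks_robots_txt": False,
    "mostly_factual_data": True,
    "does_not_disrupt_site": True,
}
print(assess(example))  # (4, 'likely within the fair use scope')
```

Items missing from the dictionary are treated as not applicable, so an incomplete answer set errs toward the more cautious band.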
Government Policies Being Implemented Together
These guidelines were not published in isolation. It is also important to pay attention to practical support policies being jointly implemented by related ministries.
- Introduction of a new 'Public License AI Learning' type (January 28, 2026): establishes clear standards for using public works in AI learning
- R&D tax deduction for AI learning data purchase costs: reduces the cost burden of acquiring learning data
- Establishment of a specialized AI dispute resolution center at the Copyright Commission: provides expert consultation, advice, and mediation for copyright disputes related to AI learning
- Establishment of a system for integrated provision of learning data: reduces transaction costs for verifying rights information
It is significant that the government is not only issuing guidelines but also pursuing substantive institutional improvements to balance the AI industry and copyright.
FAQ
Q. Are these guidelines legally binding?
No. The guidelines are advisory and do not replace court decisions. The final judgment will be made by a court based on the specific facts. However, because they are an official standard jointly published by four bodies (Ministry of Culture, Sports and Tourism, Ministry of Science and ICT, AI Strategy Commission, Copyright Commission), they are significant as reference material in future disputes.
Q. Is using data collected through scraping for AI learning legal?
There is no blanket answer of "legal" or "illegal." The four criteria for fair use must be reviewed comprehensively, with particular emphasis on the nature of the collected data and how it is used.
Q. Does this apply to data collected through HashScraper?
HashScraper is a service that structures and delivers publicly available web data specified by customers. Customers are responsible for the final use of the collected data, and most use it for analysis such as market analysis, price monitoring, and trend identification. This kind of use aligns well with the fair use criteria outlined in these guidelines.
Q. Where can I view the original guidelines?
You can view the full text on the official website of the Korea Copyright Commission (www.copyright.or.kr). The formal title of the guidelines is "Fair Use Guidelines for Learning Copyrighted Works through Generative Artificial Intelligence."
The vague perception that "web scraping is illegal" is now officially being corrected at the government level.
What matters is not how the data is collected but how it is used.
If you collect publicly available web data for analysis purposes and use it in a way that does not replace the market of the original work, there is a high likelihood that you are within the scope of fair use.
If you have legal questions regarding data collection, please feel free to contact the HashScraper team at any time.




