Adding Web Crawling to an AI Agent - A Complete Tutorial on Setting Up an MCP Server

A step-by-step tutorial on adding crawling to AI agents with an MCP server. With MCP, AI can retrieve and analyze data from websites. See what crawling through MCP makes possible.


What if AI could fetch data directly from the web?

Reading Time: 10 minutes | As of January 2026


Key Summary

AI tools like ChatGPT, Claude, and Cursor are powerful, but on their own they cannot directly access real-time web data. Even if you ask, "Tell me the price of this product on Coupang," the AI does not actually visit Coupang to check the price.

By using MCP (Model Context Protocol), you can overcome this limitation. By connecting a crawling MCP server to an AI agent, AI can directly fetch and analyze data from websites.

Topics covered in this article:
- What is MCP (explained in a non-technical way)
- Why AI needs web crawling (real use cases)
- How to connect (setting up Claude, Cursor)
- Real-life examples (things you can do with crawling MCP)


Table of Contents

  1. What is MCP
  2. Why AI Needs Crawling
  3. How Crawling MCP Servers Work
  4. Setup Guide: Claude Desktop
  5. Setup Guide: Cursor
  6. Setup Guide: Windsurf
  7. Practical Examples
  8. Comparison of Crawling MCP Servers
  9. Frequently Asked Questions

1. What is MCP

Understanding through analogy

When you install an app on your smartphone, new features are added. Installing KakaoMap gives you navigation, and KakaoBank allows you to transfer money.

MCP is like an app store for AI agents.

Connecting an MCP server to AI gives AI new abilities. By connecting a crawling MCP server, AI can fetch data from websites, and by connecting a database MCP server, AI can query databases.

Technical explanation

MCP (Model Context Protocol) is an open protocol announced by Anthropic in November 2024. It allows AI models to access external tools and data sources in a standardized way.

[AI agent]  ←→  [MCP protocol]  ←→  [MCP server (tools)]
                                    ├── Crawling server
                                    ├── DB server
                                    ├── File system
                                    └── API integration

The key is standardization. Before MCP, each AI tool had its own plugin mechanism. With MCP, a server built once works with every MCP-compatible client, such as Claude, Cursor, and Windsurf.

Components of MCP

| Component | Role | Example |
|------|------|------|
| MCP Client | AI agent (requesting side) | Claude Desktop, Cursor, Windsurf |
| MCP Server | Tool provider (executing side) | Crawling server, DB server, GitHub server |
| MCP Protocol | Communication protocol | JSON-RPC based standard message format |
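
To make the "JSON-RPC based" message format concrete, here is a minimal sketch of the two requests a client typically sends: `tools/list` to discover what a server offers, and `tools/call` to invoke one tool. The method names come from the MCP specification. The tool name `scrape` and its `url` argument are assumptions for illustration, not the documented interface of any particular server.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # each JSON-RPC request carries an id so replies can be matched


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope of the kind MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. "scrape" and "url" are hypothetical names.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "scrape", "arguments": {"url": "https://example.com/article"}},
)

print(json.dumps(call_tool, indent=2))
```

The client application builds and sends these messages for you. They are shown here only to clarify what "standardized" means in practice.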

2. Why AI Needs Crawling

No matter how capable an AI model is, it cannot know real-time information that is not in its training data.

Scenario 1: Market Research

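You: "What's the lowest price for 'AirPods Pro' on Coupang?"

AI without MCP: "I can't check real-time prices, but
                 they are generally in the 300,000 KRW range..." (inaccurate)

AI with MCP: "I checked on Coupang.
              AirPods Pro 2: lowest price 289,000 KRW,
              Rocket Delivery available, rated 4.8 (12,340 reviews)" (real-time)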

Scenario 2: Competitor Monitoring

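You: "Compare the Coupang prices of our product and our competitors' products"

AI without MCP: "Sorry, I can't access real-time data."

AI with MCP: [crawls 5 products on Coupang]
             "Comparison results:
              - Our product: 45,900 KRW (rated 4.5)
              - Competitor A: 42,000 KRW (rated 4.3)
              - Competitor B: 48,500 KRW (rated 4.7)
              Our price is mid-range, but the rating..."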

Scenario 3: Content Research

You: "Summarize the main points of this blog post" [URL provided]

AI without MCP: (cannot access the URL)

AI with MCP: [crawls the page → extracts the body text]
             "Three key points from this post:
              1. ...
              2. ...
              3. ..."

Doing these tasks manually takes a lot of time. By connecting crawling through MCP, AI can perform these tasks on your behalf.


3. How Crawling MCP Servers Work

Crawling MCP servers work as follows:

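1. The user makes a request to the AI
   "Check the price of this product on Coupang"

2. The AI sends a crawl request to the MCP server
   → tools/call "scrape" { "url": "https://coupang.com/..." }

3. The MCP server runs the crawl
   → Anti-bot bypass
   → JavaScript rendering
   → Data extraction

4. The MCP server returns the result
   → { "title": "...", "price": 29900, "rating": 4.8 }

5. The AI interprets the result and answers the user
   "This product costs 29,900 KRW and is rated 4.8."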

Key point: Users do not need to know about crawling. Just ask AI in natural language, and the MCP server handles all the technical work behind the scenes.
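
The five steps above can be sketched from the server's side. This is a minimal, offline illustration under stated assumptions: `fake_scrape` stands in for the real crawler and returns canned data, and the `TOOLS` registry and the tool name `scrape` are hypothetical. The response envelope (a `content` list of typed items plus `isError`) follows the shape the MCP specification defines for tool results.

```python
import json


def fake_scrape(url: str) -> dict:
    """Stand-in for the real crawler (anti-bot handling, rendering, extraction).

    Returns canned data so the example runs without network access.
    """
    return {"title": "Sample product", "price": 29900, "rating": 4.8}


# Hypothetical tool registry: tool name -> Python function.
TOOLS = {"scrape": fake_scrape}


def handle_tool_call(request: dict) -> dict:
    """Handle a JSON-RPC `tools/call` request and build the response."""
    params = request["params"]
    tool = TOOLS.get(params["name"])
    if tool is None:
        # Unknown tools are reported as a JSON-RPC "invalid params" error.
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"},
        }
    data = tool(**params["arguments"])
    # Tool results carry a list of content items; here, a single text item
    # holding the extracted data serialized as JSON.
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "content": [{"type": "text", "text": json.dumps(data)}],
            "isError": False,
        },
    }


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "scrape", "arguments": {"url": "https://coupang.com/..."}},
}
response = handle_tool_call(request)

# Step 5: the AI reads the structured result and phrases an answer.
extracted = json.loads(response["result"]["content"][0]["text"])
print(f"Price: {extracted['price']:,} KRW, rating {extracted['rating']}")
```

A real server would replace `fake_scrape` with actual fetching and parsing. The request and response envelopes stay the same.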


4. Setup Guide: Claude Desktop

Here's how to connect an MCP server in Claude Desktop.

Step 1: Install Claude Desktop

Download the desktop app from claude.ai/download.

Step 2: Open MCP Settings File

macOS:
```bash
# Open the settings file (assumes the VS Code `code` command is on your PATH)
code ~/Library/Application\ Support/Claude/claude_desktop_config.json
```

Windows:
```bat
REM Open the settings file from Command Prompt (assumes the VS Code `code` command is on your PATH)
code "%APPDATA%\Claude\claude_desktop_config.json"
```

Step 3: Add Crawling MCP Server

The example below shows how to connect to the HashScraper MCP server. You can add other MCP servers in the same format.

{
  "mcpServers": {
    "hashscraper": {
      "command": "npx",
      "args": ["-y", "@hashscraper/mcp-server"],
      "env": {
        "HASHSCRAPER_API_KEY": "your-api-key-here"
      }
    }
  }
}

You can get an API key by signing up for free at hashscraper.com/mcp.
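
If the settings file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that. The `add_mcp_server` helper is hypothetical, and the pre-existing `filesystem` entry is only an example of a server that might already be configured.

```python
import copy
import json


def add_mcp_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Existing servers are preserved; an entry with the same name is replaced.
    """
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# Example: a config that already has another server registered.
existing = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        }
    }
}

hashscraper_entry = {
    "command": "npx",
    "args": ["-y", "@hashscraper/mcp-server"],
    "env": {"HASHSCRAPER_API_KEY": "your-api-key-here"},
}

updated = add_mcp_server(existing, "hashscraper", hashscraper_entry)
print(json.dumps(updated, indent=2))
```

To apply this to the real file, load it with `json.load`, pass the result through `add_mcp_server`, and write it back with `json.dump`. Keep a backup copy first.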

Step 4: Restart Claude Desktop

After saving the settings, quit Claude Desktop completely and relaunch it. If the tools icon appears at the bottom left, the connection succeeded.
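
If the icon does not appear, the usual culprits are a JSON syntax error or a placeholder left in the file. The sketch below is a small pre-flight check, not an official tool: the `check_config` helper and the rules it applies are assumptions based on the config format shown in Step 3.

```python
import json

PLACEHOLDER = "your-api-key-here"  # placeholder value used in this guide


def check_config(text: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not servers:
        return ['No servers found under the "mcpServers" key']

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict) or not entry.get("command"):
            problems.append(f'{name}: missing "command"')
            continue
        for key, value in (entry.get("env") or {}).items():
            if value == PLACEHOLDER:
                problems.append(f"{name}: {key} still holds the placeholder value")
    return problems


sample = """
{
  "mcpServers": {
    "hashscraper": {
      "command": "npx",
      "args": ["-y", "@hashscraper/mcp-server"],
      "env": { "HASHSCRAPER_API_KEY": "your-api-key-here" }
    }
  }
}
"""
for problem in check_config(sample):
    print(problem)
```

Running it against the contents of your own settings file before restarting the client can catch trailing commas and forgotten keys.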

Step 5: How to Use

Now, just ask Claude questions that involve web data:

"https://www.coupang.com/vp/products/12345678 Tell me about this product"
"Get the top 5 search results for 'wireless earphones' on Naver Shopping"
"Summarize the body text of this URL: https://example.com/article"

5. Setup Guide: Cursor

By connecting an MCP server in Cursor IDE, you can directly utilize web data while coding.

Step 1: Open Cursor Settings

Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows) → Search for "Cursor Settings" → Go to the MCP tab

Step 2: Add MCP Server

Click "Add new MCP server". Cursor opens its mcp.json file, where the entry goes under the "mcpServers" key:

{
  "mcpServers": {
    "hashscraper": {
      "command": "npx",
      "args": ["-y", "@hashscraper/mcp-server"],
      "env": {
        "HASHSCRAPER_API_KEY": "your-api-key-here"
      }
    }
  }
}

Step 3: Example Usage

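# In Cursor's AI Chat:
"I'd like to understand the Coupang API response format.
 Crawl this URL and analyze its HTML structure"

→ The AI crawls the page through the MCP server,
   analyzes the HTML structure, and writes the parsing code.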

6. Setup Guide: Windsurf

Step 1: Open MCP Settings File

In Windsurf, open the settings file by pressing Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows) → Search for "Open MCP Config."

Step 2: Add MCP Server

{
  "mcpServers": {
    "hashscraper": {
      "command": "npx",
      "args": ["-y", "@hashscraper/mcp-server"],
      "env": {
        "HASHSCRAPER_API_KEY": "your-api-key-here"
      }
    }
  }
}

Step 3: Utilize in Cascade

Windsurf's Cascade (AI agent) automatically recognizes the MCP server. When you ask Cascade for a task that needs web data, it calls the crawling MCP server.

# In Cascade:
"Analyze this web page's API response structure
 and create TypeScript interfaces for it"

→ Cascade crawls the page → analyzes the HTML →
   generates the TypeScript interface automatically

7. Practical Examples

Example 1: Automating Price Monitoring

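Prompt: "Fetch the product prices from the following 5 URLs and organize them in a table"

AI response:
| Product | Price | Shipping | Rating |
|------|------|------|------|
| AirPods Pro 2 | 289,000 KRW | Rocket Delivery | 4.8 |
| Galaxy Buds 3 | 179,000 KRW | Standard delivery | 4.6 |
| ...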
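
If you would rather build such a table yourself from structured results, the formatting step is small. This is a minimal sketch, assuming each scraped product arrives as a dictionary. The field names and the `to_markdown_table` and `format_krw` helpers are hypothetical, and the two rows reuse the sample values above.

```python
def format_krw(amount: int) -> str:
    """Format an integer amount of won with thousands separators."""
    return f"{amount:,} KRW"


def to_markdown_table(rows: list[dict], columns: list[str]) -> str:
    """Render dictionaries as a pipe table, one row per dictionary."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join("------" for _ in columns) + "|"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


products = [
    {"Product": "AirPods Pro 2", "Price": format_krw(289000),
     "Shipping": "Rocket Delivery", "Rating": 4.8},
    {"Product": "Galaxy Buds 3", "Price": format_krw(179000),
     "Shipping": "Standard delivery", "Rating": 4.6},
]

table = to_markdown_table(products, ["Product", "Price", "Shipping", "Rating"])
print(table)
```

Keeping prices as integers until the final formatting step makes it easy to sort or compare them before rendering.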

Example 2: Review Analysis

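Prompt: "Collect the 20 most recent reviews from this product page
         and analyze the positive and negative keywords"

AI response:
Positive keywords: sound quality (12), noise cancelling (8), design (6)
Negative keywords: price (5), battery (3), fit (2)
Overall satisfaction: 87% positive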
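
In practice the model does this analysis itself, but it helps to see what a keyword tally amounts to. The sketch below counts how many reviews mention each keyword. The keyword lists and the three sample reviews are made up for illustration, and plain substring matching is a deliberate simplification.

```python
from collections import Counter

# Hypothetical keyword lists; a real analysis would derive these from the data.
POSITIVE = {"sound quality", "noise cancelling", "design"}
NEGATIVE = {"price", "battery", "fit"}


def keyword_counts(reviews: list[str], keywords: set[str]) -> Counter:
    """Count the number of reviews that mention each keyword."""
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for keyword in keywords:
            # Substring match: simple, but "fit" would also match "benefit".
            if keyword in text:
                counts[keyword] += 1
    return counts


reviews = [
    "Great sound quality and the noise cancelling is excellent",
    "Love the design, sound quality is solid",
    "Battery drains fast and the price is too high",
]

positive = keyword_counts(reviews, POSITIVE)
negative = keyword_counts(reviews, NEGATIVE)
print("Positive:", positive.most_common())
print("Negative:", negative.most_common())
```

Spot-checking the model's summary against a tally like this is a quick way to confirm that its counts are plausible.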

Example 3: Competitor Content Analysis

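Prompt: "Crawl three competitor blogs and analyze
         which topics they covered over the past month"

AI response:
Competitor A: AI use cases (3 posts), product updates (2 posts)
Competitor B: SEO guides (4 posts), customer success stories (1 post)
...
Trend: AI-related content is increasing across the board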

8. Comparison of Crawling MCP Servers

Here are the main crawling MCP servers available:

| Service | Anti-bot Bypass | Price | Features |
|------|------|------|------|
| Firecrawl MCP | Basic | Free 500 requests, $16/month~ | Suitable for general sites, blocked by Akamai |
| Bright Data MCP | Advanced | Free 5,000 requests/month, additional paid | Global coverage, generous free tier |
| HashScraper MCP | Advanced (including Akamai) | Free 100 requests, $35/month~ | Specialized in anti-bot, returns parsed JSON |
| Crawl4AI | Basic | Open-source (free) | Requires self-hosting, does not support advanced anti-bot |

Selection Criteria:
- Crawling only general websites, starting for free → Firecrawl or Crawl4AI
- Crawling global sites, generous free usage → Bright Data MCP (5,000 requests free per month)
- Strong anti-bot sites (Akamai, Cloudflare, etc.) → HashScraper MCP


9. Frequently Asked Questions

Q: Do I need to know programming to use MCP?

Some technical familiarity helps, because setup involves editing a JSON file. If you follow this guide, it takes about 5 minutes. Once setup is done, you make requests in natural language with no programming.

Q: How much does crawling cost?

It depends on the MCP server provider. Costs range from free (self-hosted Crawl4AI) to tens or hundreds of dollars per month. HashScraper MCP starts at $35/month after a free trial of 100 requests.

Q: Can ChatGPT use MCP?

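ChatGPT's MCP support works differently from the clients covered here. OpenAI has adopted MCP, and ChatGPT can connect to remote MCP servers through its connector and developer-mode features, but it does not launch local servers from a config file the way Claude Desktop, Cursor, and Windsurf do. Check OpenAI's documentation for current availability.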

Q: Can I connect multiple MCP servers simultaneously?

Yes. You can connect crawling servers, database servers, GitHub servers, etc., simultaneously. AI automatically selects the appropriate tool for the situation.

Q: How accurate is the crawling data?

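The data an MCP server returns is extracted from the live page, so it reflects what the site showed at the time of the request. It can still be incomplete or wrong if the page layout changed, the content was personalized, or the request was blocked, and the AI can also misread correct data. Spot-check important figures against the source page.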
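
Whatever the source of an error, a cheap sanity check on returned records can catch obviously bad values before they reach a report. This is a minimal sketch, assuming records shaped like the `title` / `price` / `rating` example in section 3. The `validate_product` helper and its thresholds (a positive price, a rating between 0 and 5) are assumptions, not part of any server's API.

```python
def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_product(record: dict) -> list[str]:
    """Return a list of issues found in one scraped product record."""
    issues = []
    if not str(record.get("title") or "").strip():
        issues.append("title is missing or empty")
    price = record.get("price")
    if not _is_number(price) or price <= 0:
        issues.append(f"price is not a positive number: {price!r}")
    rating = record.get("rating")
    if not _is_number(rating) or not 0 <= rating <= 5:
        issues.append(f"rating is outside 0-5: {rating!r}")
    return issues


good = {"title": "Sample product", "price": 29900, "rating": 4.8}
bad = {"title": " ", "price": "29,900", "rating": 48}

print(validate_product(good))
print(validate_product(bad))
```

Records that fail a check like this are worth re-fetching or reviewing by hand rather than passing straight to the model.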


Conclusion

MCP greatly expands the capabilities of AI agents. By connecting a crawling MCP server, AI can provide more accurate and useful answers based on real-time web data.

Especially in a business environment, delegating tasks like price monitoring, market research, and review analysis to AI can save a significant amount of time.


Add Crawling to Your AI Agent

HashScraper MCP is a crawling MCP server with built-in anti-bot bypass. It automatically handles strong bot blocking like Akamai, Cloudflare, and more.

Start with 100 requests for free →

5-minute setup, immediate use.

