Real estate transaction data crawling guide - Automatic collection of apartment and officetel prices

A guide to collecting real estate transaction data through public APIs, web scraping, and specialized services. Covers official data from the Ministry of Land, Infrastructure and Transport, as well as information from Naver Real Estate, Zigbang, and the Korea Real Estate Board (formerly the Korea Appraisal Board).


"Can I buy this apartment right now?"

To answer this question, you ultimately need data. The decision should rest on numbers rather than intuition: actual transaction price trends, surrounding market prices, the rental price ratio, and changes in transaction volume. Real estate investors, PropTech startups, real estate agencies, and academic researchers all share the same concern: "How can we collect this data automatically?"

This article covers three methods of collecting real estate transaction data:

  1. Public API — Ministry of Land, Infrastructure and Transport's Real Transaction Price Public API (Free, Safest)
  2. Direct Web Scraping — Scraping websites like Naver Real Estate (High technical difficulty)
  3. Professional Services — Automatic data collection using scraping services (Most convenient)

It summarizes the advantages and disadvantages of each method, practical Python code, troubleshooting, and legal precautions.


Table of Contents

  1. Where is Real Estate Data Located?
  2. Method 1: Utilizing Ministry of Land Public API
  3. Method 2: Naver Real Estate Scraping
  4. Method 3: Utilizing Professional Scraping Services
  5. Utilizing Collected Data — Analysis and Visualization
  6. Troubleshooting — Common Problems and Solutions
  7. Legal Precautions
  8. Frequently Asked Questions (FAQ)

Where is Real Estate Data Located?

Real estate transaction data in Korea is provided from various sources, depending on the purpose.

1. Ministry of Land Real Transaction Price Public System

URL: rt.molit.go.kr

This is the official real transaction price data operated by the Ministry of Land, Infrastructure and Transport. When a real estate transaction is reported, it is made public in this system.

Provided Data:
- Apartments, townhouses/multi-family houses, and detached/multi-household houses: sales and jeonse/monthly rent
- Officetels: sales and jeonse/monthly rent
- Land, pre-sale/occupancy rights, and commercial/business property transactions
- Transaction date, transaction amount, area, floor, construction year, etc.

Public Data Portal API:
You can use the Ministry of Land Real Transaction Price API for free on the Public Data Portal. With an API key, you can collect the data automatically from your own code.

Pros: Official data, free, no legal risks
Cons: Only provides real transaction prices (no market price or property information), delayed updates

2. Naver Real Estate

URL: land.naver.com

It provides not only real transaction price data but also the most abundant additional information such as property information, market prices, school districts, and surrounding infrastructure.

Provided Data:
- Real transaction prices (based on Ministry of Land data)
- Current property information (listing price, property type)
- Market price information (KB price, Naver's own estimated price)
- Complex information (number of units, parking, maintenance fees, etc.)
- School districts and surrounding facility information

Pros: Rich data, user-friendly
Cons: High difficulty in scraping, subject to terms of use restrictions

3. Zigbang / Dabang

URL: zigbang.com / dabangapp.com

They are strongest in listings for small homes such as studios and two-room units, and have especially rich data for the monthly rental market.

Pros: Abundant small house/rental data
Cons: Less apartment sales data compared to Naver Real Estate

4. Korea Real Estate Board (formerly Korea Appraisal Board)

URL: reb.or.kr

It provides statistical data such as real estate price trends, jeonse-to-monthly-rent conversion rates, and sales/lease price indices. These statistics are available through the R-ONE Real Estate Statistics System.

Pros: Official statistics, price indices provided
Cons: Aggregate statistics rather than individual transaction data


Method 1: Utilizing Ministry of Land Public API

This is the most orthodox and stable method. It is safe as it uses public data.

Step 1: Obtaining API Key

  1. Sign up on the Data Portal
  2. Search for "Ministry of Land Apartment Sales Real Transaction Data"
  3. Click "Apply for Use" → Enter purpose of use
  4. Obtain the API key (usually issued immediately, or within one day)

Tip: Two types of keys are issued: an encoded key and a decoded key. It is convenient to use the decoded key in Python. Using the encoded key may cause errors due to double encoding by the requests library.
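The double-encoding problem can be reproduced without calling the API. The sketch below uses a made-up key (`abc+def/ghi==` is a placeholder, not a real key) and the standard library's `urlencode`, which percent-encodes query values in much the same way `requests` does for `params`. Treat it as an illustration of the mechanism rather than a trace of what the portal actually receives.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical key for illustration only; real keys are longer.
decoded_key = "abc+def/ghi=="

# The "encoded" key is the percent-encoded form of the decoded key.
encoded_key = quote(decoded_key, safe="")

# Passing the decoded key: special characters are encoded exactly once.
query_ok = urlencode({"serviceKey": decoded_key})

# Passing the encoded key: every "%" is encoded again as "%25".
query_bad = urlencode({"serviceKey": encoded_key})

print(encoded_key)  # abc%2Bdef%2Fghi%3D%3D
print(query_ok)     # serviceKey=abc%2Bdef%2Fghi%3D%3D
print(query_bad)    # serviceKey=abc%252Bdef%252Fghi%253D%253D
```

The server decodes the query once, so the double-encoded form arrives as the encoded key rather than the real one, and the key is rejected.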

Key API List

| API Name | Description | Service Code |
|---|---|---|
| Apartment Sales Real Transaction Data | Apartment sales real transaction prices | getRTMSDataSvcAptTrade |
| Apartment Monthly Rent Data | Apartment jeonse and monthly rent prices | getRTMSDataSvcAptRent |
| Townhouse/Multi-family Sales Real Transaction Data | Villa/townhouse sales prices | getRTMSDataSvcRHTrade |
| Officetel Sales Real Transaction Data | Officetel sales prices | getRTMSDataSvcOffiTrade |
| Detached/Multi-household Sales Real Transaction Data | Detached house sales prices | getRTMSDataSvcSHTrade |
| Land Sales Real Transaction Data | Land transaction prices | getRTMSDataSvcLandTrade |

Python Code — Querying Apartment Sales Real Transaction Prices

import requests
import xml.etree.ElementTree as ET
import pandas as pd

# API settings: use the decoded key
SERVICE_KEY = "YOUR_DECODED_API_KEY"
BASE_URL = "http://openapi.molit.go.kr/OpenAPI_ToolInstall/service/rest/RTMSOBJSvc/getRTMSDataSvcAptTrade"

def get_apt_trade(lawd_cd: str, deal_ymd: str) -> pd.DataFrame:
    """
    Fetch apartment sales real transaction prices.

    Args:
        lawd_cd: First 5 digits of the legal district code (e.g. "11680" = Gangnam-gu, Seoul)
        deal_ymd: Contract year and month (e.g. "202601")

    Returns:
        DataFrame containing the transaction records
    """
    params = {
        "serviceKey": SERVICE_KEY,
        "LAWD_CD": lawd_cd,
        "DEAL_YMD": deal_ymd,
        "numOfRows": "9999",
        "pageNo": "1"
    }

    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()

    # Parse the raw bytes so the encoding declared in the XML is honored
    root = ET.fromstring(response.content)

    # Error check
    result_code = root.find(".//resultCode")
    if result_code is not None and result_code.text != "00":
        result_msg = root.find(".//resultMsg")
        raise RuntimeError(f"API error: {result_msg.text if result_msg is not None else 'Unknown'}")

    items = root.findall(".//item")
    if not items:
        return pd.DataFrame()

    # One dict per <item>, keyed by the XML tag name
    data = []
    for item in items:
        row = {}
        for child in item:
            text = child.text.strip() if child.text else ""
            row[child.tag] = text
        data.append(row)

    return pd.DataFrame(data)


# Example: transactions in Gangnam-gu, Seoul (11680) for January 2026
df = get_apt_trade("11680", "202601")
print(f"Transactions found: {len(df)}")
if not df.empty:
    # Convert the amount (unit: 10,000 KRW) to an integer, removing commas and spaces
    df["거래금액_만원"] = df["거래금액"].str.replace(",", "").str.strip().astype(int)
    print(df[["거래금액_만원", "아파트", "전용면적"]].head(10))
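To check the parsing and amount-conversion logic without spending API calls, you can feed a hand-written response to the same kind of parser. The sample XML below is invented for illustration: its tag names follow the ones the code above relies on (`resultCode`, `item`, `거래금액`, `아파트`, `전용면적`), while the surrounding structure and all values are assumptions, not a captured response.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like the response the code above expects.
# All values are invented for illustration.
SAMPLE_XML = """<response>
  <header><resultCode>00</resultCode><resultMsg>NORMAL SERVICE.</resultMsg></header>
  <body><items>
    <item><거래금액> 250,000</거래금액><아파트>샘플아파트</아파트><전용면적>84.97</전용면적></item>
    <item><거래금액> 98,500</거래금액><아파트>예시아파트</아파트><전용면적>59.5</전용면적></item>
  </items></body>
</response>"""


def parse_items(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per <item>, with whitespace stripped from each value."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.findall(".//item")
    ]


def to_manwon(amount: str) -> int:
    """Convert an amount such as ' 250,000' (unit: 10,000 KRW) to an int."""
    return int(amount.replace(",", "").strip())


rows = parse_items(SAMPLE_XML)
amounts = [to_manwon(row["거래금액"]) for row in rows]
print(amounts)  # [250000, 98500]
```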

Collecting Annual Data for Entire Seoul by District

import time

# First 5 digits of the legal district code for each Seoul district
SEOUL_CODES = {
    "강남구": "11680", "서초구": "11650", "송파구": "11710",
    "강동구": "11740", "마포구": "11440", "용산구": "11170",
    "성동구": "11200", "광진구": "11215", "중구": "11140",
    "종로구": "11110", "영등포구": "11560", "동작구": "11590",
    "관악구": "11620", "강서구": "11500", "양천구": "11470",
    "구로구": "11530", "금천구": "11545", "노원구": "11350",
    "도봉구": "11320", "강북구": "11305", "성북구": "11290",
    "동대문구": "11230", "중랑구": "11260", "은평구": "11380",
    "서대문구": "11410"
}

def collect_seoul_yearly(year: int) -> pd.DataFrame:
    """Collect one year of apartment sales data for every district in Seoul."""
    all_data = []
    months = [f"{year}{m:02d}" for m in range(1, 13)]
    total = len(SEOUL_CODES) * len(months)
    count = 0

    for gu_name, code in SEOUL_CODES.items():
        for month in months:
            count += 1
            try:
                df = get_apt_trade(code, month)
                if not df.empty:
                    df["구"] = gu_name  # district name column, used later for grouping
                    all_data.append(df)
                print(f"[{count}/{total}] OK {gu_name} {month}: {len(df)} records")
            except Exception as e:
                print(f"[{count}/{total}] FAILED {gu_name} {month}: {e}")
            time.sleep(0.3)  # pause between API calls

    if all_data:
        return pd.concat(all_data, ignore_index=True)
    return pd.DataFrame()

# Collect all data for 2025
result = collect_seoul_yearly(2025)
result.to_csv("서울_아파트_실거래가_2025.csv", index=False, encoding="utf-8-sig")
print(f"\nCollected {len(result)} records.")

Limitations of Public API

Public APIs are stable but have limitations:

| Limitation | Description |
|---|---|
| Data Scope | Provides only real transaction prices and basic information. No additional information such as market prices, listings, or school districts. |
| Delayed Updates | Not real-time; transactions appear several days to weeks after they are reported. |
| Call Limits | Daily API call limits (usually 1,000/day, expandable upon request). |
| Data Format | Default response is in XML format (some APIs also support JSON). |
| Region Codes | Legal district codes must be managed separately. |
| API URL Changes | Public Data Portal API URLs change periodically. |
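A quick calculation shows how the call limit interacts with the collection loop used in this article. The sketch assumes one call per district per month (as in the code above, which requests up to 9,999 rows per call) and the "usually 1,000/day" quota mentioned in the table; your own quota may differ, so check the API's detail page.

```python
import math

DISTRICTS = 25         # number of Seoul districts in SEOUL_CODES
MONTHS_PER_YEAR = 12
DAILY_LIMIT = 1_000    # assumed default quota


def calls_needed(years: int, api_count: int = 1) -> int:
    """One call per district, per month, per API (sales, rent, ...)."""
    return DISTRICTS * MONTHS_PER_YEAR * years * api_count


def days_needed(calls: int, daily_limit: int = DAILY_LIMIT) -> int:
    """Days required to stay within the daily quota."""
    return math.ceil(calls / daily_limit)


one_year = calls_needed(1)
five_years_two_apis = calls_needed(5, api_count=2)
print(one_year, days_needed(one_year))                        # 300 1
print(five_years_two_apis, days_needed(five_years_two_apis))  # 3000 3
```

One year of Seoul sales data fits in a single day's quota, while five years of both sales and rent data would need to be spread over three days or collected after a quota increase.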

Method 2: Naver Real Estate Scraping

Consider this method when you need more data than the public API provides. It lets you collect market prices, listings, and complex information that the public API does not offer.

Naver Real Estate's Internal API

Naver Real Estate renders data by calling internal APIs on the frontend. You can check these API calls in the browser developer tools (F12 → Network tab).

Key Endpoints (subject to change):

# Basic complex information
GET https://fin.land.naver.com/complexes/{complexNo}

# Listings for a complex
GET https://fin.land.naver.com/complexes/{complexNo}/articles

# Real transaction prices for a complex
GET https://fin.land.naver.com/complexes/{complexNo}/real-estates/trades

# Complexes by region
GET https://fin.land.naver.com/front-api/v1/complex/marker

Python Code Example — Naver Real Estate Complex Information

import requests
import time

def get_naver_complex(complex_no: str) -> dict | None:
    """
    Fetch apartment complex information from Naver Real Estate.

    Note: Naver's internal API can change at any time.
    If this code stops working, check the latest API endpoint
    in the browser developer tools.
    """
    url = f"https://fin.land.naver.com/complexes/{complex_no}"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36",
        "Referer": "https://land.naver.com/",
        "Accept": "application/json"
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.json()
        else:
            print(f"HTTP {response.status_code}: {complex_no}")
            return None
    except requests.exceptions.JSONDecodeError:
        # HTML was returned instead of JSON (the API may have changed)
        print("JSON parsing failed: the API endpoint may have changed")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

Realistic Issues Faced in Scraping

While theoretically simple, Naver Real Estate scraping requires continuous maintenance:

  1. API Endpoint Changes: Naver frequently changes internal API URLs. Code that worked yesterday may break today.
  2. Authentication Tokens: Some data require session tokens or separate authentication, necessitating token renewal logic.
  3. IP Blocking: Sending many requests in a short time can lead to IP blocking, requiring proxy rotation.
  4. JavaScript Rendering: Some pages use client-side rendering instead of server-side rendering, requiring browser automation tools like Selenium or Playwright.
  5. Terms of Use: Naver's terms of use prohibit automated data collection. Refer to a separate article for more details on legal issues related to scraping.

Method 3: Utilizing Professional Scraping Services

If you want to overcome the limitations of public APIs without the maintenance burden of direct scraping, using professional scraping services is a practical choice.

Comparison of Three Methods

| Aspect | Public API | Direct Scraping | Professional Service |
|---|---|---|---|
| Initial Setup | Several days | Weeks to months | Immediate |
| Data Scope | Only real transaction prices | Market prices + listings + additional info | Market prices + listings + additional info |
| Maintenance | Almost none | High (adapting to API changes) | Managed by service |
| Legal Risk | None | Present | Managed by service |
| Cost | Free | Development/operational costs | Service usage fee |
| IP Blocking Risk | None | High | None |
| Data Quality | High (official data) | Variable | High |
| Scalability | Limited | Direct implementation needed | High |

Data Collection with HashScraper for Real Estate

HashScraper has extensive experience in collecting real estate-related data:

  • Naver Real Estate Data: Automatically collect property listings by region/type, listing price, area, floors, complex information
  • Integrated Real Transaction Prices: Simplify complex public API calls with an easy interface
  • Monitoring Price Changes: Track price changes regularly and provide alerts
  • Excel/API Integration: Download collected data to Excel or integrate with existing systems via REST API

While building your own scraper requires time for adapting to API changes, handling IP blocking, and data cleaning, using a professional service allows you to focus solely on data analysis.

Interested in learning more about real estate data collection? You can request a consultation through HashScraper.


Utilizing Collected Data — Analysis and Visualization

Once you have collected data, it's time to extract value through analysis.

Basic Analysis: Price Trends by District

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

# Korean font settings (district names in the legend are Korean)
matplotlib.rcParams['font.family'] = 'AppleGothic'  # macOS
# matplotlib.rcParams['font.family'] = 'Malgun Gothic'  # Windows
matplotlib.rcParams['axes.unicode_minus'] = False

# Load and preprocess the data
df = pd.read_csv("서울_아파트_실거래가_2025.csv")
df["거래금액_만원"] = (
    df["거래금액"].astype(str).str.replace(",", "").str.strip().astype(int)
)
# Build a "YYYY-MM" key from the year ("년") and month ("월") columns
df["거래월"] = df["년"].astype(str) + "-" + df["월"].astype(str).str.zfill(2)

# Monthly average sale price by district ("구" is the column added during collection)
monthly = df.groupby(["구", "거래월"])["거래금액_만원"].mean().reset_index()

# Comparison chart for the three Gangnam districts
fig, ax = plt.subplots(figsize=(12, 6))
for gu in ["강남구", "서초구", "송파구"]:
    data = monthly[monthly["구"] == gu].sort_values("거래월")
    # Divide by 10,000 to convert from 10,000 KRW to 100 million KRW
    ax.plot(data["거래월"], data["거래금액_만원"] / 10000, marker="o", label=gu)

ax.set_title("Average apartment sale price in the three Gangnam districts (2025)", fontsize=14)
ax.set_xlabel("Transaction month")
ax.set_ylabel("Average sale price (100 million KRW)")
ax.legend()
ax.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("강남3구_매매가_추이.png", dpi=150)
plt.show()

Advanced Analysis: Calculating Rental Price Ratio

The rental price ratio (jeonse deposit ÷ sale price × 100) is a key indicator in real estate investment. A higher ratio means a smaller gap between the sale price and the deposit, which makes a property more attractive for gap investment, while a lower ratio can signal an overheated sales market.
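As a worked example of the formula, the sketch below computes the ratio and the remaining gap for a single made-up property. The figures are invented, and amounts are in units of 10,000 KRW to match the API's convention.

```python
def jeonse_ratio(deposit_manwon: int, sale_price_manwon: int) -> float:
    """Rental price ratio in percent: deposit / sale price * 100."""
    if sale_price_manwon <= 0:
        raise ValueError("sale price must be positive")
    return round(deposit_manwon / sale_price_manwon * 100, 1)


sale = 100_000    # 1 billion KRW (invented figure)
deposit = 60_000  # 600 million KRW (invented figure)

ratio = jeonse_ratio(deposit, sale)
gap = sale - deposit  # the part of the price not covered by the deposit

print(ratio)  # 60.0
print(gap)    # 40000
```

At a 60% ratio the deposit covers 600 million KRW of the price, leaving a gap of 400 million KRW; at an 80% ratio on the same price the gap would shrink to 200 million KRW.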

def analyze_jeonse_rate(trade_file: str, rent_file: str, gu: str) -> pd.DataFrame:
    """
    Calculate the rental price ratio per apartment for a given district.
    Rental price ratio = jeonse deposit / sale price × 100
    """
    trade = pd.read_csv(trade_file)
    rent = pd.read_csv(rent_file)

    # Filter by district (assumes both files carry the "구" column added during collection)
    trade = trade[trade["구"] == gu].copy()
    rent = rent[rent["구"] == gu].copy()

    # Convert amounts to integers
    trade["매매가"] = trade["거래금액"].astype(str).str.replace(",", "").str.strip().astype(int)
    rent["보증금"] = rent["보증금액"].astype(str).str.replace(",", "").str.strip().astype(int)

    # Keep jeonse only (exclude monthly rent: rows where the monthly rent is 0 or blank)
    monthly_rent = pd.to_numeric(
        rent["월세금액"].astype(str).str.replace(",", "").str.strip(), errors="coerce"
    ).fillna(0)
    rent = rent[monthly_rent == 0]

    # Average price per apartment
    avg_trade = trade.groupby("아파트")["매매가"].mean()
    avg_rent = rent.groupby("아파트")["보증금"].mean()

    # Calculate the rental price ratio for apartments present in both datasets
    common = avg_trade.index.intersection(avg_rent.index)
    rates = pd.DataFrame({
        "아파트": common,
        "평균매매가_만원": avg_trade[common].values,
        "평균전세가_만원": avg_rent[common].values,
    })
    rates["전세가율"] = (rates["평균전세가_만원"] / rates["평균매매가_만원"] * 100).round(1)
    rates = rates.sort_values("전세가율", ascending=False)

    return rates

# Rental price ratio analysis for Gangnam-gu
rates = analyze_jeonse_rate(
    "서울_아파트_매매_2025.csv",
    "서울_아파트_전세_2025.csv",
    "강남구"
)
print(rates.head(20))

Use Cases of Data Utilization

| User | Purpose | Required Data |
|---|---|---|
| Individual Investors | Determine buying timing, select investment areas | Real transaction trends, rental price ratio, transaction volume |
| PropTech Startups | Price prediction models, automated price evaluation | Bulk real transaction prices, market prices, complex information |
| Real Estate Agencies | Property comparison, customer consultation material | Current listings, real transaction prices, market prices |
| Academic Research | Analyze housing price determinants | Real transaction prices + population/economic data |
| Financial Institutions | Collateral assessment, risk management | Market prices, real transaction prices, price indices |

Troubleshooting — Common Problems and Solutions

Related to Public API

Issue 1: "SERVICE_KEY_IS_NOT_REGISTERED_ERROR"
→ This error typically occurs when the encoded key is passed to requests, which then encodes it a second time. Use the decoded key instead.

Issue 2: "LIMITED_NUMBER_OF_SERVICE_REQUESTS_EXCEEDS_ERROR"
→ Exceeded daily call limit. Apply for increased traffic on the Public Data Portal or try again the next day.

Issue 3: Data returns as 0 records
→ Check if the region code (LAWD_CD) is correct. Use only the first 5 digits of the legal district code. Also, there may be no transactions in that month.

Issue 4: API URL changes
→ Check the detailed page of the API on the Public Data Portal. The URL structure may change periodically. Checking the "Request Address (Endpoint)" on the API detailed page is the most accurate.

Related to Naver Real Estate Scraping

Issue 1: 403 Forbidden response
→ User-Agent header is missing or IP is blocked. Set appropriate headers and increase request intervals.
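One way to "increase request intervals" is to wait longer after each rejected request. The helper below is a minimal sketch of exponential backoff, not something taken from Naver's documentation: `fetch_with_backoff`, its parameters, and the choice to retry only on 403 and 429 are all assumptions made for illustration. The fetcher and the sleep function are injected so the logic can be tried without any network traffic.

```python
import time
from typing import Any, Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fetch() until it returns HTTP 200, doubling the wait after each 403/429."""
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status == 200:
            return payload
        if status not in (403, 429):
            return None  # unrelated error: do not retry blindly
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return None


# Demo with a fake fetcher so nothing is sent over the network.
responses = iter([(403, None), (403, None), (200, {"ok": True})])
waits: list[float] = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)  # {'ok': True} [2.0, 4.0]
```

Backing off reduces the chance of being blocked in the first place; it will not lift a block that has already been applied to your IP.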

Issue 2: JSON parsing failure (HTML returned)
→ The API endpoint may have changed. Check the latest URL in the browser developer tools.

Issue 3: Partial data returned
→ It may be an API requiring pagination. Check the page and size parameters.
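If the endpoint is paginated, the usual pattern is to keep requesting pages until one comes back short. The sketch below assumes the endpoint accepts `page` and `size` values as mentioned above; the real parameter names and the stopping condition depend on the endpoint, so confirm them in the developer tools. `fetch_page` is a hypothetical callable that you would implement around your own request code.

```python
from typing import Callable, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]],
    size: int = 20,
    max_pages: int = 100,
) -> List[dict]:
    """Request page 1, 2, 3, ... until a page comes back shorter than `size`."""
    results: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, size)
        results.extend(batch)
        if len(batch) < size:  # short or empty page means there is no more data
            break
    return results


# Demo with an in-memory "API" holding 45 fake listings.
FAKE_LISTINGS = [{"id": i} for i in range(45)]
requested_pages: List[int] = []


def fake_fetch_page(page: int, size: int) -> List[dict]:
    requested_pages.append(page)
    start = (page - 1) * size
    return FAKE_LISTINGS[start:start + size]


listings = fetch_all_pages(fake_fetch_page, size=20)
print(len(listings), requested_pages)  # 45 [1, 2, 3]
```

The `max_pages` cap is a safety net so that a misbehaving endpoint cannot keep the loop running indefinitely.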


Legal Precautions

Legal issues must be considered when collecting real estate data.

Public API — Safe

Real transaction price data collected through the Ministry of Land public API is public data. Under the Act on Promotion of the Provision and Use of Public Data (Public Data Act), anyone can freely use it for commercial or non-commercial purposes. You still need to stay within the API usage limits.

Naver Real Estate Scraping — Caution Required

  • Terms of Use: Naver's terms of use prohibit automated data collection. Violation may lead to civil risks.
  • Server Load: Excessive requests causing server load may violate the Information and Communication Network Act.
  • Data Copyright: Price estimates or analysis data that Naver generates itself may be protected as copyrighted works. However, where Naver simply republishes Ministry of Land real transaction data, that data keeps its original public-data status.
  • Personal Information: Property information may include broker contact details, so be cautious about collecting and using personal information.

Recommended Approach

  1. Prioritize Public API: If real transaction data analysis is the main goal, use the public API.
  2. Additional Information from Professional Service: For market prices, listings, etc., not provided by the public API, consider using professional scraping services.
  3. Minimize Direct Scraping: If you choose direct scraping, limit request speed and comply with robots.txt.
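Checking robots.txt can be automated with Python's standard `urllib.robotparser`. The rules below are a made-up sample used only to show the calls; they are not the rules of any real site, and `my-research-bot` and the `example.com` URLs are placeholders. In practice you would load the target site's actual robots.txt before crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

allowed = parser.can_fetch("my-research-bot", "https://example.com/public/page")
blocked = parser.can_fetch("my-research-bot", "https://example.com/private/data")
delay = parser.crawl_delay("my-research-bot")

print(allowed)  # True
print(blocked)  # False
print(delay)    # 5
```

When a crawl delay is declared, using it as the minimum pause between requests is a simple way to limit request speed.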

Frequently Asked Questions (FAQ)

Q. Is the public API really free?

Yes, the Ministry of Land Real Transaction Price API on the Data Portal is free. Once you sign up and obtain an API key, you can start using it immediately. There is a daily call limit (usually 1,000/day), and you can request increased traffic for large data needs.

Q. Where can I check the region code (LAWD_CD)?

Legal district codes can be downloaded from the Administrative Standard Code Management System. Downloading the "Legal District Code Full Data" file gives you the 10-digit codes for all legal districts nationwide. The LAWD_CD used in the API is the first 5 digits of this code (the si/gun/gu level).
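Deriving LAWD_CD from a full code is a matter of taking the first five digits. The helper below is a small sketch with basic validation; `to_lawd_cd` is a name chosen for this example, and the sample input is the 10-digit form of the Gangnam-gu code (11680) used earlier in this article.

```python
def to_lawd_cd(legal_district_code: str) -> str:
    """Return the first 5 digits (si/gun/gu level) of a 10-digit legal district code."""
    code = legal_district_code.strip()
    if len(code) != 10 or not code.isdigit():
        raise ValueError(f"expected a 10-digit code, got {legal_district_code!r}")
    return code[:5]


print(to_lawd_cd("1168000000"))  # 11680
```

Validating the length up front catches the common mistake of passing a full 10-digit code, or a code with a stray character, straight to the API.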

Q. Are Naver Real Estate market prices accurate?

Naver Real Estate's market prices are reference values compiled from sources such as KB Kookmin Bank prices and Korea Real Estate Board prices. They may differ from real transaction prices, so treat them as one reference rather than the sole basis for a decision. The most objective data is the real transaction price itself.

Q. Can I use the data collected in an app or service?

Real transaction price data collected via the public API can be used freely, including for commercial purposes (attribution recommended). However, commercially redistributing data scraped from private services such as Naver Real Estate may violate their terms of use and infringe copyright.

Q. What data is needed for rental price ratio and gap investment analysis?

Rental price ratio analysis requires both the sale price and the jeonse deposit for the same apartment. You can get both by combining the "Apartment Sales Real Transaction Data" and "Apartment Monthly Rent Data" APIs. For an accurate analysis, compare transactions of the same floor area from the most recent 3 to 6 months, and exclude outliers (special transactions).
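The same-area comparison and outlier exclusion can be sketched with the standard library alone. Everything in this example is an assumption made for illustration: the records are invented, the 1 m² area tolerance is arbitrary, and "within 30% of the median" is just one simple outlier rule among many. Filtering to the last 3 to 6 months is left out to keep the sketch short.

```python
from statistics import mean, median


def same_area(records: list[dict], area: float, tol: float = 1.0) -> list[int]:
    """Prices (unit: 10,000 KRW) of records whose area is within `tol` m² of `area`."""
    return [r["price"] for r in records if abs(r["area"] - area) <= tol]


def drop_outliers(prices: list[int], tolerance: float = 0.3) -> list[int]:
    """Keep prices within ±tolerance of the median (a simple, assumed rule)."""
    mid = median(prices)
    return [p for p in prices if abs(p - mid) <= tolerance * mid]


# Invented records for one complex; 60,000 stands in for a special transaction.
sales = [
    {"area": 84.97, "price": 100_000},
    {"area": 84.97, "price": 102_000},
    {"area": 84.93, "price": 98_000},
    {"area": 84.97, "price": 101_000},
    {"area": 84.97, "price": 60_000},
    {"area": 59.50, "price": 75_000},
]
rents = [
    {"area": 84.97, "price": 60_000},
    {"area": 84.93, "price": 61_000},
    {"area": 84.97, "price": 59_000},
]

avg_sale = mean(drop_outliers(same_area(sales, 84.97)))
avg_rent = mean(drop_outliers(same_area(rents, 84.97)))
ratio = round(avg_rent / avg_sale * 100, 1)
print(ratio)  # 59.9
```

Without the outlier filter, the single low sale would pull the average sale price down and push the ratio up to about 65%, which is why special transactions are worth excluding.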

Q. Can officetel data be collected in the same way?

Yes. Apply separately for the "Ministry of Land_Officetel Sales Real Transaction Data" API on the Public Data Portal. The code structure is the same as for the apartment API; just change the service code in the BASE_URL to getRTMSDataSvcOffiTrade.
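Swapping the service code can be done by replacing the last path segment of the URL. This sketch follows the statement above that only the service code changes; the helper name `swap_service_code` is made up for this example, and the resulting URL is not verified here, so confirm the actual request address on the API's detail page before using it.

```python
APT_URL = (
    "http://openapi.molit.go.kr/OpenAPI_ToolInstall/service/rest/"
    "RTMSOBJSvc/getRTMSDataSvcAptTrade"
)


def swap_service_code(base_url: str, service_code: str) -> str:
    """Replace the last path segment of base_url with service_code."""
    prefix, _, _old_code = base_url.rpartition("/")
    return f"{prefix}/{service_code}"


# Assumed officetel endpoint, derived only by swapping the service code.
OFFI_URL = swap_service_code(APT_URL, "getRTMSDataSvcOffiTrade")
print(OFFI_URL.rsplit("/", 1)[-1])  # getRTMSDataSvcOffiTrade
```

Note that the response fields may differ between APIs (for example, the column holding the building name), so check the column names before reusing the apartment analysis code.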


Conclusion

Real estate data collection methods vary depending on the purpose:

  • If only real transaction analysis is needed → Ministry of Land Public API is the best choice (free, legal, stable)
  • If market price and property information are also needed → Consider using professional scraping services
  • For small-scale learning/testing purposes → Direct scraping is also possible (check legal precautions)

Whichever method you choose, the key is a steady supply of reliable data. Practical insights come from a continuous data pipeline rather than from one-time collection.


Need automated real estate data collection? HashScraper can collect data from various real estate platforms like Naver Real Estate, Zigbang, Dabang without IP blocking. Request a free consultation
