0. Web Crawling, Manual Execution Too Troublesome?
Have you found it cumbersome to manually execute web crawling code? We introduce a method for Python code to run automatically at desired times and intervals. Let's start automating together!
1. Utilizing Python Scheduler
If you have written web scraping code in Python, one of the easiest ways to automate it is by utilizing the 'schedule' library in Python.
1.1. Installing the Library
pip install schedule
1.2. Automation Code
import schedule
import time
def job():
print("크롤링 시작!")
# 여기에 웹 크롤링 코드를 넣으세요.
schedule.every(10).minutes.do(job) # 10분마다
# schedule.every().hour.do(job) # 1시간마다
# schedule.every().day.at("10:30").do(job) # 매일 10:30에
while True:
schedule.run_pending()
time.sleep(1)
2. Utilizing System Scheduler
A system scheduler is a tool provided by the operating system that allows users to automatically execute desired tasks at preset times or intervals. It is used not only for web crawling scripts but also for various tasks such as backups, system updates, etc. 'Task Scheduler' in Windows and 'cron' in Mac and Linux are prime examples.
2.1. Windows: Task Scheduler
Search for 'Task Scheduler' in the Start menu.
Select 'Create Task'.
Enter the task name and description.
In the 'Triggers' tab, add a new trigger to set the execution time and interval.
In the 'Actions' tab, add a new action to input the command to execute the Python script. Example: python.exe path\to\script.py
Once configured, click 'OK' to save the task.
※ Note: If there are spaces in the Python script path or Python executable path, they should be enclosed in double quotation marks (").
2.2. Mac & Linux: cron
Open the terminal and enter the command crontab -e to edit the cron job.
Add the task to be scheduled following the format below.
분 시 일 월 요일 /파이썬의_절대경로/python3 /크롤링_파이썬_스크립트의_절대경로/script.py
For example, to run at 3:30 PM daily:
30 15 * * * /usr/local/bin/python3 /your/path/to/script.py
※ Log Checking: By default, the output of cron jobs is sent via email, but in most systems, this email system is not enabled. Therefore, you can set it to directly output to a log file.
※ Note: Since cron requires absolute paths, make sure to enter the absolute paths of Python and the script accurately. It is recommended to set the necessary environment variables directly in the script as environment variables may not be set.
3. Conclusion
We have explored how to automate web crawling code using Python scheduler and system scheduler. Utilizing these automation tools is crucial to maximize the efficiency of web crawling tasks.
Furthermore, besides the methods mentioned earlier, there are various cloud services and specialized automation tools available, so it is good to explore different methods to find the optimal solution.
Automation liberates us from simple repetitive tasks and enables us to focus on more valuable tasks. We encourage you to actively utilize automation!
Also, Check Out:
Data Collection, Automate Now
Start in 5 minutes without coding · Experience crawling 5,000+ websites




