Creating a Campuspick contest and extracurricular activity crawler with Python - Contest & extracurricular activity automatic crawling project: Part 2

A detailed guide to automatically crawling Campuspick contests and extracurricular activities with Python. Automate it with Crontab and Task Scheduler!

Following on from Part 1, "Creating a Campuspick Crawler with Python", this time we'll look at how to run that crawler automatically on the day, date, and time we want.

There are many kinds of schedulers, but this time we will automate the crawler with Crontab on Unix-based systems (Mac/Linux) and with Task Scheduler on Windows. Follow the section that matches the operating system you are using.

0. Precautions before starting

Crontab and Task Scheduler only work while the computer is turned on.
If you set up the job and then shut the computer down, the crawler will not run!

1. Automating with Crontab

Open a terminal and run crontab -e to edit your crontab (the list of scheduled cron jobs).

Add the task you want to schedule following the format below:

minute hour day_of_month month day_of_week /absolute_path_to_python /absolute_path_to_crawling_python_script
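To make the five time fields concrete, here is a minimal sketch of a helper that assembles a "run every day at HH:MM" entry. The function name cron_line is hypothetical (it is not part of any library), and it only covers the daily case; cron itself also accepts ranges, lists, and step values, which this sketch does not handle.

```python
from pathlib import PurePosixPath


def cron_line(minute: int, hour: int, python_path: str, script_path: str) -> str:
    """Build a daily crontab entry: minute hour day_of_month month day_of_week command."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # cron does not start jobs from your interactive shell, so require absolute paths
    for path in (python_path, script_path):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"{path!r} is not an absolute path")
    # '*' in day_of_month, month and day_of_week means "every"
    return f"{minute} {hour} * * * {python_path} {script_path}"


if __name__ == "__main__":
    print(cron_line(30, 15, "/usr/local/bin/python3", "/your/path/to/script.py"))
```

The printed line can be pasted into the editor opened by crontab -e.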

2. Finding the Absolute Path to Python

which python3
/usr/local/bin/python3

The which command prints the full path of the python3 executable your shell would run (here /usr/local/bin/python3; yours may differ). Use this path as the absolute path to Python in the crontab entry.
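If you prefer to check from Python itself, the standard library can do the same lookup: shutil.which searches PATH the way the which command does, and sys.executable is the interpreter currently running. The sketch below (python_path is a hypothetical name) tries python3 on PATH first and falls back to the running interpreter. Whichever path you pick, the crawler's packages must be installed for that interpreter.

```python
import os
import shutil
import sys


def python_path() -> str:
    """Return an absolute path to a Python 3 interpreter, similar to `which python3`."""
    found = shutil.which("python3")  # searches PATH, as `which` does
    # Fall back to the interpreter running this code if python3 is not on PATH
    return os.path.abspath(found or sys.executable)


if __name__ == "__main__":
    print(python_path())
```

Running this with the same interpreter you use to run the crawler by hand is a quick way to confirm which path belongs in the crontab.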

3. Finding the Absolute Path to the Script

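find . -name "filename"

This prints every matching file, but the paths are relative to the directory you ran the command from (for example ./crawler/script.py). Replace the leading . with that directory (pwd shows it), or search with find "$PWD" -name "filename" so the output is already absolute. Use the resulting path as the absolute path to the Python script.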
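The same search can be sketched in Python with pathlib, which sidesteps the relative-path issue by resolving the starting directory first. find_script is a hypothetical helper name; it also works on Windows, so it can stand in for the dir /s search in section 4.2.

```python
from pathlib import Path


def find_script(name: str, root: str = ".") -> list[Path]:
    """Search `root` and its subdirectories for files called `name`.

    Unlike `find . -name ...`, every result is returned as an absolute path.
    """
    base = Path(root).resolve()  # turn "." into an absolute directory first
    return sorted(p for p in base.rglob(name) if p.is_file())


if __name__ == "__main__":
    for path in find_script("campuspickcrawling.py", "."):
        print(path)
```

Searching a large directory such as your home folder can take a while, so start from the folder where you keep the crawler if you know it.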

For example, to run it every day at 3:30 PM:

30 15 * * * /usr/local/bin/python3 /your/path/to/script.py
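As a sanity check on what 30 15 * * * means, here is a small sketch (next_daily_run is a hypothetical helper) that computes when such an entry fires next. cron normally reads the schedule in the system's local time, and standard cron skips a run if the computer is off at that moment rather than catching up later, which is why the precaution at the top matters.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 15, minute: int = 30) -> datetime:
    """Return when a `minute hour * * *` cron entry will next fire after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now()))
```

For example, at 10:00 the next run is 15:30 the same day, while at 16:00 it is 15:30 the following day.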

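※ Note: cron runs jobs in a minimal environment: the PATH is short and your shell profile is not loaded, so bare commands like python3 and relative paths may not resolve. Enter the absolute paths of Python and the script exactly. For the same reason, environment variables you normally set in your shell may be missing, so it is safer to set any variables the crawler needs directly in the script.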

This way, our crawler will run every day at 3:30 PM.
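Following the note above about environment variables, one way to make a cron-launched crawler behave as it does in a terminal is to fix its working directory and fill in missing variables at startup. This is a minimal sketch under the assumption that the crawler reads or writes files with relative paths; prepare_environment and the variable name CAMPUSPICK_OUTPUT are hypothetical, not part of the original crawler.

```python
import os
from pathlib import Path


def prepare_environment(script_file: str, defaults: dict[str, str]) -> Path:
    """Make a cron-launched script behave as if started from its own folder."""
    script_dir = Path(script_file).resolve().parent
    # cron typically starts jobs in your home directory, so relative paths
    # such as "results.csv" would otherwise be resolved against it
    os.chdir(script_dir)
    # Only fill in variables that the environment did not already provide
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
    return script_dir
```

At the top of the crawler you would then call something like prepare_environment(__file__, {"CAMPUSPICK_OUTPUT": "results.csv"}) before any file is opened.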

4. Automating with Task Scheduler

4.1. Finding the Absolute Path to Python

Finding the path to Python works differently on Windows. Open a Command Prompt (CMD) window and run:

where python3

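Use the path that appears as the path to Python. If several python3 paths are listed, work out which Python you actually use (the one where your crawler's packages are installed) and select that path. On Windows the interpreter is often named python.exe rather than python3.exe, so if where python3 finds nothing, try where python.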
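When where returns several interpreters, one practical way to tell them apart is to ask each one whether it can import the crawler's libraries. In this sketch, has_modules is a hypothetical helper, and the module names requests and bs4 are assumptions about what the Part 1 crawler imports; substitute your own.

```python
import subprocess
import sys


def has_modules(python_exe: str, modules: list[str]) -> bool:
    """Return True if the interpreter at `python_exe` can import every module listed."""
    if not modules:
        return True
    code = "import " + ", ".join(modules)
    try:
        result = subprocess.run([python_exe, "-c", code], capture_output=True)
    except OSError:
        # The path does not exist or is not executable
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Paste the paths printed by `where python3` here; sys.executable is just a placeholder
    candidates = [sys.executable]
    for exe in candidates:
        print(exe, has_modules(exe, ["requests", "bs4"]))
```

The interpreter that prints True is the one to use in Task Scheduler.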

4.2. Finding the Absolute Path to the Script

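In the same Command Prompt window, use the dir command to find the path.

dir campuspickcrawling.py /s

Replace campuspickcrawling.py with the name and extension of your own crawler file. dir /s searches the current directory and all of its subdirectories, so run it from a folder that contains the script (for example, your user folder). The "Directory of ..." line in the output shows the folder that holds the file; that folder plus the file name is the absolute path to use.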

4.3. Setting Up

  • Search for 'Task Scheduler' in the Start menu and open it

  • Select 'Create Task' in the Actions pane

  • Enter task name and description

  • In the 'Triggers' tab, add a new trigger to set the execution time and frequency

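  • In the 'Actions' tab, add a new action to run the Python script

    (*Enter the Python path from 4.1 in 'Program/script' and the script path from 4.2 in 'Add arguments'. Setting 'Start in' to the script's folder helps if the crawler uses relative file paths.)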

  • Once the setup is complete, click 'OK' to save the task.
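The same daily task can also be created from the command line with Windows' built-in schtasks tool instead of the GUI. The sketch below only builds the command; schtasks_command is a hypothetical helper, the task name and paths are placeholders, and it has not been tested against every Windows version, so treat it as a starting point.

```python
import subprocess
import sys


def schtasks_command(task_name: str, python_exe: str, script_path: str,
                     start_time: str = "15:30") -> list[str]:
    """Build a schtasks command that runs the crawler every day at start_time (HH:MM)."""
    # /TR holds the whole command line; quote both paths in case they contain spaces
    task_run = f'"{python_exe}" "{script_path}"'
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",      # schedule type: once a day
        "/TN", task_name,    # name shown in Task Scheduler
        "/TR", task_run,     # program and arguments to run
        "/ST", start_time,   # start time, 24-hour HH:MM
        "/F",                # overwrite an existing task with the same name
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Placeholder paths: replace them with the results from 4.1 and 4.2
    cmd = schtasks_command("CampuspickCrawler",
                           r"C:\Path\To\python.exe",
                           r"C:\Path\To\campuspickcrawling.py")
    subprocess.run(cmd, check=True)
```

Returning the argument list instead of running it immediately makes it easy to print and check the command before creating the task.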

With this setup, you can automate tasks on Windows as well.

5. Conclusion

In this post, we automated the crawler we built, using cron on Unix-based systems (Mac/Linux) and Task Scheduler on Windows. Now that it runs on its own, the next post will cover sending the crawled data by email on a regular schedule.

Get notified of new posts

We'll email you when 해시스크래퍼 기술 블로그 publishes new content.