개발캡슐

2022.07.08~13_Week 3_Python, gitbash, mongoDB - Installation and crawling/scraping_Lectures 8-15

DevGreeny 2022. 7. 18. 19:39

Lecture 3-8. Quiz: web scraping (crawling) practice

2022.07.13
file_name: hello.py
What to scrape: rank, movie title, rating

import requests  # requires the requests library to be installed

r = requests.get('http://spartacodingclub.shop/sparta_api/seoulair')
rjson = r.json()

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

#old_content > table > tbody > tr:nth-child(3) > td.title > div > a
#old_content > table > tbody > tr:nth-child(4) > td.title > div > a

#old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img
#old_content > table > tbody > tr:nth-child(3) > td:nth-child(1) > img

#old_content > table > tbody > tr:nth-child(2) > td.point
#old_content > table > tbody > tr:nth-child(3) > td.point

movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a = movie.select_one('td.title > div > a')
    if a is not None:  # skip rows that have no title link
        title = a.text
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        star = movie.select_one('td.point').text
        print(rank, title, star)

=> Next, we will put these scraped results into a database (DB).

Lecture 3-9. DB overview

2022.07.13
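Why use a DB?
- It is data organized so it can be found easily later.
- It lets you look things up through a sorted order (an index).
- Because entries are kept in that order, you can pull out what you need in one step instead of checking every entry one by one.
What is a DB?
- There are two main kinds.
- SQL: you define the columns first and then fill them in. E.g. an Excel sheet with name, phone number, and email columns.
  - Because the shape is fixed, malformed data cannot get in, and the more orderly layout tends to make lookups faster.
- NoSQL (Not only SQL): data is simply stored as it arrives.
  - Because each record is stored as-is, it is easier to adapt when the product plan or the business changes and grows.
  - Mostly used by early-stage services and early-stage startups.
- A representative NoSQL DB: mongoDB
- What a DB really is: just a program.
  - A program that stores data well and retrieves data well.
- The current trend is to use a DB hosted by a cloud provider,
  - because that makes it much easier to cope with a rush of users, take backups, and monitor the DB.
- A recent cloud service for this: mongoDB Atlas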
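
To make the SQL versus NoSQL contrast above concrete, here is a minimal sketch using only the Python standard library. The in-memory SQLite table stands in for "columns defined first", and a plain list of dictionaries stands in for "stored as it arrives". The table name, column names, and sample values are made up for illustration, and the list of dictionaries only imitates the document model; it is not mongoDB itself.

```python
import sqlite3

# SQL style: the columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, phone TEXT, email TEXT)")
conn.execute(
    "INSERT INTO users VALUES (?, ?, ?)",
    ("bob", "010-0000-0000", "bob@example.com"),
)

# A row that does not fit the declared columns is rejected.
try:
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        ("ann", "010-1111-1111", "ann@example.com", "extra"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# NoSQL (document) style: each record is a free-form dictionary,
# so two records may carry different fields.
documents = []
documents.append({"name": "bob", "age": 27})
documents.append({"name": "ann", "email": "ann@example.com", "hobby": "chess"})

sql_rows = conn.execute("SELECT name FROM users").fetchall()
print(rejected, sql_rows, len(documents))
```

The fixed table refuses the four-value row, while the document list accepts records of different shapes. That is the trade-off the notes describe: stricter structure on one side, flexibility on the other.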

Lecture 3-10. Getting started with mongoDB

2022.07.13

Installing mongoDB

ID: test, PW: (omitted)

Connecting to mongoDB requires two packages:
- pymongo, dnspython
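
Before moving on, it can help to confirm that both packages are importable in the interpreter you are using. The sketch below is one way to check with the standard library; the helper name `missing_packages` is made up for this example. Note that dnspython is imported under the name `dns`.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the names from import_names that cannot be imported here."""
    return [name for name in import_names if find_spec(name) is None]


# pymongo is imported as "pymongo"; dnspython is imported as "dns".
missing = missing_packages(["pymongo", "dns"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("Both packages are available.")
```

If either name is reported as missing, install it through the package settings described in the next lecture.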

Lecture 3-11. Connecting to mongoDB

2022.07.13
In the Python editor window: File - Settings - Packages - "+", then search for pymongo and dnspython and install each one.
In mongoDB, open project 'sparta' - Database - Connect - Connect your application
- Connect to Cluster0 - DRIVER: Python, VERSION: 3.6 or later (any 3.x version is fine)
- Copy the address shown under "Add your connection string into your application code"

mongodb+srv://test:<password>@cluster0.^^.mongodb.net/?retryWrites=true&w=majority

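After copying:
Paste the basic pymongo code snippet into the Python file.
Put the copied connection string into the address argument of MongoClient.
Then edit the address:
- <password> => sparta
- Insert Cluster0 right before the ?retryWrites in ?retryWrites=true&w=majority

mongodb+srv://test:sparta@cluster0.^^.mongodb.net/Cluster0?retryWrites=true&w=majority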
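
The two manual edits to the connection string can also be sketched as a small helper. This is only an illustration: `build_uri` is a made-up name, and the host `cluster0.example.mongodb.net` is a placeholder, so substitute the host from your own Atlas connection string. The password goes through `quote_plus` so that one containing characters such as `@` or `/` would not break the address.

```python
from urllib.parse import quote_plus


def build_uri(template, password, db_name):
    """Fill the <password> slot and put db_name in front of the query string."""
    filled = template.replace("<password>", quote_plus(password))
    # "/?" occurs only once, right after the host, so replace that one.
    return filled.replace("/?", "/" + db_name + "?", 1)


# Placeholder host; use the one from your own Atlas connection string.
template = (
    "mongodb+srv://test:<password>@cluster0.example.mongodb.net/"
    "?retryWrites=true&w=majority"
)
uri = build_uri(template, "sparta", "Cluster0")
print(uri)
```

The resulting string is what would be passed to `MongoClient(...)` in the code that follows.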

- The code that inserts one piece of data into the DB:

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

# Insert one piece of data.

doc = {
    'name' : 'bob',
    'age' : 27
}

db.users.insert_one(doc)

=> Result:

Afterwards, in mongoDB - Collections, the inserted data shows up in a collection named users (image below).

![image-20220712235218807](C:\Users\user\AppData\Roaming\Typora\typora-user-images\image-20220712235218807.png)

=> Even when data is thrown into the DB freely, similar items should be kept together, and such a group is called a collection.

Lecture 3-12. Manipulating the DB with pymongo

2022.07.13
file_name: dbprac.py
Use a Python program to insert, fetch, update, and delete data in the DB.

1_Adding data

1. Basics

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.^^.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

# Insert a few pieces of data, one at a time.


db.users.insert_one({'name':'bobby','age':27})
db.users.insert_one({'name':'john','age':20})
db.users.insert_one({'name':'ann','age':20})

# After running this, the data is in the mongoDB "users" collection.

2. The usual way

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.^^.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

doc = {'name':'bobby','age':27}  # a dictionary
db.users.insert_one(doc)

=> Build a dictionary first, then insert it.

2_Finding data

1. Fetch and look at all the data.

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.^^.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

all_users = list(db.users.find({}))  # the "users" in db.users is the collection name

for user in all_users:
    print(user)

=> Output

![image-20220713002235788](C:\Users\user\AppData\Roaming\Typora\typora-user-images\image-20220713002235788.png)

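=> The **"_id : ..."** field in the output usually does not need to be looked at. mongoDB **generates it automatically** whenever something is inserted.

So, if you add **{'_id':False}** as a second argument, as in list(db.users.find({},{'_id':False})),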

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

all_users = list(db.users.find({},{'_id':False}))  # the "users" in db.users is the collection name

for user in all_users:
    print(user)

=> As in the output below, **_id and its value are gone.**

![image-20220713002400045](C:\Users\user\AppData\Roaming\Typora\typora-user-images\image-20220713002400045.png)

==> This is the form to use when fetching multiple documents.
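
To see what the `{'_id':False}` argument does to each document, here is a plain-Python imitation that needs no database. The helper name `without_id` and the sample `_id` values are made up for illustration; in real code pymongo does this filtering on the server side.

```python
def without_id(documents):
    """Return copies of the documents with the '_id' key left out."""
    return [
        {key: value for key, value in doc.items() if key != "_id"}
        for doc in documents
    ]


# Made-up documents shaped like those in the users collection.
raw = [
    {"_id": "id-1", "name": "bobby", "age": 27},
    {"_id": "id-2", "name": "john", "age": 20},
]
for user in without_id(raw):
    print(user)
```

Only the returned copies lose the field. The stored documents still have their `_id`, just as in mongoDB.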

2. Suppose you want to find just `one`: the one whose name is bobby.

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

user = db.users.find_one({'name':'bobby'})
print(user)

=> Output

{'_id': ObjectId('62cd8dcfc84ddcb47d824ce3'), 'name': 'bobby', 'age': 27}

2-1. Suppose you want bobby's age.

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

user = db.users.find_one({'name':'bobby'})
print(user['age'])

=> Output: 27
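
One thing to watch for: `find_one` returns `None` when no document matches, so `user['age']` would raise an error for a name that is not in the collection. The sketch below guards against that. `get_age` and `StubUsers` are made-up names, and `StubUsers` is only a stand-in for `db.users` so the example runs without a database.

```python
def get_age(collection, name):
    """Return the age stored for name, or None when no document matches."""
    user = collection.find_one({"name": name})
    if user is None:  # find_one returns None when nothing matches
        return None
    return user.get("age")


class StubUsers:
    """Made-up stand-in for db.users so the sketch runs without a DB."""

    def __init__(self, documents):
        self.documents = documents

    def find_one(self, query):
        # Return the first document whose fields equal every query field.
        for doc in self.documents:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


users = StubUsers([{"name": "bobby", "age": 27}])
print(get_age(users, "bobby"))
print(get_age(users, "nobody"))
```

With a real connection, the call would be `get_age(db.users, 'bobby')`.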

3_Update (modify)

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

db.users.update_one({'name':'bobby'},{'$set':{'age':19}})

=> Meaning: go to users and update one document. The condition is that name is bobby, and for that document age is set to 19.

=> Result: the data in the "users" collection is **modified**. **bobby's age** goes **27 -> 19**

![image-20220713003356986](C:\Users\user\AppData\Roaming\Typora\typora-user-images\image-20220713003356986.png)

=> Update command: .update_one

=> Condition: {'name':'bobby'} / Change: {'$set':{'age':19}}
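
The condition-plus-`$set` pattern can be imitated on a plain list of dictionaries to see exactly what changes. This is a sketch of the semantics only, not pymongo itself. The function below is written for illustration and takes the list as its first argument. It also shows that only one matching document is touched, even when several match.

```python
def update_one(documents, condition, change):
    """Apply change['$set'] to the first document matching condition.

    Returns 1 if a document was updated and 0 otherwise.
    """
    for doc in documents:
        if all(doc.get(key) == value for key, value in condition.items()):
            doc.update(change["$set"])
            return 1
    return 0


users = [
    {"name": "bobby", "age": 27},
    {"name": "john", "age": 20},
    {"name": "ann", "age": 20},
]
update_one(users, {"name": "bobby"}, {"$set": {"age": 19}})
# Two documents have age 20, but only the first match (john) is changed.
update_one(users, {"age": 20}, {"$set": {"age": 21}})
print(users)
```

Fields that are not named inside `$set` (here, `name`) are left as they were.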

4_Delete - rarely used.

1) Condition: 'name' is 'bobby'

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

db.users.delete_one({'name':'bobby'})

=> Result: in mongoDB - **collection 'users'**, the **'bobby' document is gone.**

![image-20220713004737597](C:\Users\user\AppData\Roaming\Typora\typora-user-images\image-20220713004737597.png)

5_Summary

1. Adding data
2. Finding data
3. Updating (modifying)
4. Deleting

=> Summary of all the code above

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

# Insert - example
doc = {'name':'bobby','age':21}
db.users.insert_one(doc)

# Find one - example
user = db.users.find_one({'name':'bobby'})

# Find many - example (the _id field is excluded from the result)
all_users = list(db.users.find({},{'_id':False}))

# Update - example
db.users.update_one({'name':'bobby'},{'$set':{'age':19}})

# Delete - example
db.users.delete_one({'name':'bobby'})

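Insert: build a dictionary and pass it to insert_one as-is.
Find one: use find_one.
Find many: use find and wrap the result in list().
An empty condition {} matches everything, since we want it all. A condition can also be given to narrow the results.
update_one: give the condition, then how to change the match.
delete_one: give only the condition, and the matching document is removed.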

Lecture 3-13. Saving web scraping results

2022.07.13

Put the movies and their ranks into the database

**file_name: hello.py , dbprac.py**
Copy the top 3 lines of dbprac.py into hello.py, somewhere just below the BeautifulSoup import.

import requests
from bs4 import BeautifulSoup

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

#old_content > table > tbody > tr:nth-child(3) > td.title > div > a
#old_content > table > tbody > tr:nth-child(4) > td.title > div > a

#old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img
#old_content > table > tbody > tr:nth-child(3) > td:nth-child(1) > img

#old_content > table > tbody > tr:nth-child(2) > td.point
#old_content > table > tbody > tr:nth-child(3) > td.point

movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a = movie.select_one('td.title > div > a')
    if a is not None:
        title = a.text
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        star = movie.select_one('td.point').text
        print(rank, title, star)

Save the content above to the database

import requests
from bs4 import BeautifulSoup

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver?sel=pnt&date=20210829',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

#old_content > table > tbody > tr:nth-child(3) > td.title > div > a
#old_content > table > tbody > tr:nth-child(4) > td.title > div > a

#old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img
#old_content > table > tbody > tr:nth-child(3) > td:nth-child(1) > img

#old_content > table > tbody > tr:nth-child(2) > td.point
#old_content > table > tbody > tr:nth-child(3) > td.point

movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a = movie.select_one('td.title > div > a')
    if a is not None:
        title = a.text
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        star = movie.select_one('td.point').text
        doc = {
            'title':title,
            'rank':rank,
            'star':star
        }
        db.movies.insert_one(doc)
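
One side effect of `insert_one` in a loop is that running the script twice stores every movie twice. A possible variation, not part of the lecture code, is to call `update_one` with `upsert=True`, which updates the document with the same title if it exists and inserts it otherwise. In the sketch below, `save_movie` and `RecordingCollection` are made-up names, the sample rank and rating are invented, and `RecordingCollection` only records the call so the example runs without a database.

```python
def save_movie(collection, doc):
    """Insert doc, or overwrite the stored document that has the same title."""
    collection.update_one({"title": doc["title"]}, {"$set": doc}, upsert=True)


class RecordingCollection:
    """Made-up stand-in for db.movies that only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def update_one(self, condition, change, upsert=False):
        self.calls.append((condition, change, upsert))


movies = RecordingCollection()
# Invented values for illustration; not the scraped rank or rating.
save_movie(movies, {"title": "가버나움", "rank": "1", "star": "9.0"})
print(movies.calls)
```

Inside the scraping loop, this would mean replacing `db.movies.insert_one(doc)` with `save_movie(db.movies, doc)`.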

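Lecture 3-14. Quiz: using the web scraping results

2022.07.14

1. Fetching and finding data

Get the rating of 가버나움 (Capernaum),
then find the titles of the movies that have the same rating.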

from pymongo import MongoClient
client = MongoClient('mongodb+srv://test:sparta@cluster0.oqxmexp.mongodb.net/Cluster0?retryWrites=true&w=majority')
db = client.dbsparta

movie = db.movies.find_one({'title':'가버나움'})
star = movie['star']

all_movies = list(db.movies.find({'star':star},{'_id':False}))
for m in all_movies:
    print(m['title'])
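
The two-step logic of this quiz (look up one movie's rating, then filter by that rating) can be tried on a plain list first. The sketch below uses made-up ratings and placeholder titles, and the helper name `titles_with_same_star` is invented for this example. It also handles the case where the title is not found, which the database version would hit as `None` from `find_one`.

```python
def titles_with_same_star(movies, title):
    """Return the titles of all movies whose star equals that of title."""
    target = next((m for m in movies if m["title"] == title), None)
    if target is None:
        return []
    return [m["title"] for m in movies if m["star"] == target["star"]]


# Made-up ratings for illustration only; not the scraped values.
sample = [
    {"title": "가버나움", "star": "9.59"},
    {"title": "Movie A", "star": "9.59"},
    {"title": "Movie B", "star": "9.40"},
]
print(titles_with_same_star(sample, "가버나움"))
```

Note that the result includes 가버나움 itself, because it trivially has the same rating as itself. The database query above behaves the same way.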

2. Update (modify)

Change the rating of 가버나움 to the **string '0'**

db.movies.update_one({'title':'가버나움'},{'$set':{'star':'0'}})