How can I use BeautifulSoup to get all IMDb user reviews of a movie?
I am working on a school project and want to collect all user reviews of superhero movies on IMDb.
As a first step, I am trying to get all user reviews of a single movie.
The user reviews page shows 25 reviews and a 'load more' button. I have already managed to write code that clicks the 'load more' button until everything is loaded, but I am stuck on the second part: getting all user reviews into a list.
I already tried to use BeautifulSoup to find all 'content' parts on the page. However, my list remains empty.
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver

    testurl = "https://www.imdb.com/title/tt0357277/reviews?ref_=tt_urv"
    patience_time1 = 60
    XPATH_loadmore = "//*[@id='load-more-trigger']"
    XPATH_grade = "//*[@class='review-container']/div"
    list_grades = []

    driver = webdriver.Firefox()
    driver.get(testurl)

    # This is the part in which I open all 'load more' buttons.
    while True:
        try:
            loadmore = driver.find_element_by_id("load-more-trigger")
            time.sleep(2)
            loadmore.click()
            time.sleep(5)
        except Exception as e:
            print(e)
            break

    print("Complete")
    time.sleep(10)

    # When the whole page is loaded, I want to get all 'content' parts.
    soup = BeautifulSoup(driver.page_source)
    content = soup.findAll("content")
    list_content = [c.text_content() for c in content]

    driver.quit()
I expect to get a list of all content of the review-containers on the website. However, my list remains empty.
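You are using BeautifulSoup 4, correct?
Method names changed between BeautifulSoup 3 and 4 (see the documentation): findAll became find_all.
find_all takes the tag name as its first argument and an optional class_ parameter for the CSS class (see this SO answer). Passing "content" as the first argument searches for <content> tags, which is why your list stays empty.
So your code should use the new name and search for div tags by class: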
    # content = soup.findAll("content")
    content = soup.find_all('div', class_=['text', 'show-more__control'])
Next, use get_text() in your list comprehension, because BeautifulSoup tags have no text_content() method:
    # list_content = [c.text_content() for c in content]
    list_content = [tag.get_text() for tag in content]
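To see the difference without starting a browser, here is a minimal sketch that runs both lookups against a small hand-written HTML string. The markup is an assumption: a simplified stand-in for driver.page_source, built from the class names mentioned above, not a copy of the real IMDb page.

```python
from bs4 import BeautifulSoup

# Assumed, simplified stand-in for driver.page_source (not the real IMDb markup).
SAMPLE_HTML = """
<div class="review-container">
  <div class="content">
    <div class="text show-more__control">Great movie.</div>
  </div>
</div>
<div class="review-container">
  <div class="content">
    <div class="text show-more__control">Too long, but fun.</div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

# Looks for <content> tags. There are none, so the result is empty.
by_tag_name = soup.find_all("content")

# Looks for <div> tags whose class list includes "text".
review_divs = soup.find_all("div", class_="text")
list_content = [tag.get_text(strip=True) for tag in review_divs]

print(len(by_tag_name))  # 0
print(list_content)      # ['Great movie.', 'Too long, but fun.']
```

If the real page uses different class names, only the arguments to find_all need to change.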
Lastly, specify a parser when creating the soup (see the documentation):
    soup = BeautifulSoup(driver.page_source, features="html.parser")
Otherwise you will encounter this UserWarning:
SO56261323.py:36: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
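If you want to check this yourself, the following sketch records warnings while building the soup with and without an explicit parser. The HTML string and the helper name parser_warnings are made up for illustration, and the filter relies on the warning text quoted above.

```python
import warnings

from bs4 import BeautifulSoup

html = "<div class='text'>A review</div>"  # made-up markup for illustration


def parser_warnings(caught):
    """Keep only the 'no parser specified' warnings (hypothetical helper)."""
    return [w for w in caught
            if "No parser was explicitly specified" in str(w.message)]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # No parser given: BeautifulSoup picks one and warns about it.
    BeautifulSoup(html)
    n_without = len(parser_warnings(caught))

    # Parser named explicitly: no additional warning is expected.
    BeautifulSoup(html, features="html.parser")
    n_with = len(parser_warnings(caught)) - n_without

print("warnings without parser:", n_without)
print("warnings with parser:", n_with)
```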