I am trying to pull all of the text and links by date in a table, but so far I can only get one entry (and not correctly, since the link is not labeled properly). I think next_sibling might work here, but perhaps that's not the right solution.
Here's the HTML:
    <ul class="indented">
      <br>
      <strong>May 15, 2019</strong>
      <ul>
        Sign up for more insight into FERC with our monthly news email, The FERC insight
        <a href="/media/insight.asp">Read More</a>
      </ul>
      <br><br>
      <strong>May 15, 2019</strong>
      <ul>
        FERC To Convene a Technical Conference regarding Columbia Gas Transmission, LLC on July 10, 2019
        <a href="/CalendarFiles/20190515104556-RP19-763-000%20TC.pdf">Notice</a>
        <img src="/images/icon_pdf.gif" alt="PDF"> |
        <a href="/EventCalendar/EventDetails.aspx?ID=13414&CalType=%20&CalendarID=116&Date=07/10/2019&View=Listview">Event Details</a>
      </ul>
      <br><br>
Here's my code:
    import requests
    from bs4 import BeautifulSoup

    url1 = ('https://www.ferc.gov/media/headlines.asp')
    r = requests.get(url1)

    # Create a BeautifulSoup object
    soup = BeautifulSoup(r.content, 'lxml')

    # Pull headline text from the ul class indented
    headlines = soup.find_all("ul", class_="indented")
    # find_all returns a list-like ResultSet, so take the first match
    headline = headlines[0]

    date = headline.select_one('strong').text.strip()
    print(date)
    headline_text = headline.select_one('ul').text.strip()
    print(headline_text)
    headline_link = headline.select_one('ul a')["href"]
    headline_link = 'https://www.ferc.gov' + headline_link
    print(headline_link)
I only get the first date, text, and link because I'm using select_one. I need to get all of the links and label them properly for each date. Would find_next work here, or find_next_sibling?
I believe this is what you are looking for; it gets the date, announcement and related links:
    # [start same as your code, through the soup declaration]
    dates = soup.find_all("strong")
    for date in dates:
        if '2019' in date.text:
            print(date.text)
            # The first next_sibling is the whitespace after </strong>;
            # the second is the <ul> that holds the announcement.
            entry = date.next_sibling.next_sibling
            print(entry.text)
            for ref in entry.find_all('a'):
                new_link = "https://www.ferc.gov" + ref['href']
                print(new_link)
            print('=============================')
Random part of the output:
    May 15, 2019
    FERC To Convene a Technical Conference regarding Columbia Gas Transmission, LLC on July 10, 2019 Notice | Event Details
    https://www.ferc.gov/CalendarFiles/20190515104556-RP19-763-000%20TC.pdf
    https://www.ferc.gov/EventCalendar/EventDetails.aspx?ID=13414&CalType=%20&CalendarID=116&Date=07/10/2019&View=Listview
    =============================
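If you also want each link labeled with its anchor text (for example "Notice" or "Event Details"), here is a rough sketch of a variation that may be a bit less dependent on the exact whitespace between tags. It uses find_next_sibling("ul") rather than chaining two sibling hops, and urljoin rather than string concatenation. The function name parse_headlines, the BASE_URL constant, and the dict layout are my own choices for illustration, and the sample markup is copied from the question (with the outer list closed) so the sketch runs without hitting the live site. I have only checked it against this snippet, so the real page may need adjustments.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ferc.gov"  # assumed site root, taken from the question

# Sample markup copied from the question so the sketch runs offline.
SAMPLE_HTML = """
<ul class="indented">
<br>
<strong>May 15, 2019</strong>
<ul>
Sign up for more insight into FERC with our monthly news email, The FERC insight
<a href="/media/insight.asp">Read More</a>
</ul>
<br><br>
<strong>May 15, 2019</strong>
<ul>
FERC To Convene a Technical Conference regarding Columbia Gas Transmission, LLC on July 10, 2019
<a href="/CalendarFiles/20190515104556-RP19-763-000%20TC.pdf">Notice</a>
<img src="/images/icon_pdf.gif" alt="PDF"> |
<a href="/EventCalendar/EventDetails.aspx?ID=13414&CalType=%20&CalendarID=116&Date=07/10/2019&View=Listview">Event Details</a>
</ul>
<br><br>
</ul>
"""


def parse_headlines(html, base_url=BASE_URL):
    """Return one dict per dated entry: date, text, and named links."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul", class_="indented")
    if container is None:
        return []

    entries = []
    for date_tag in container.find_all("strong"):
        # The entry body is the first <ul> sibling that follows the date.
        body = date_tag.find_next_sibling("ul")
        if body is None:
            continue
        links = [
            {"name": a.get_text(strip=True), "url": urljoin(base_url, a["href"])}
            for a in body.find_all("a", href=True)
        ]
        entries.append({
            "date": date_tag.get_text(strip=True),
            # Collapse runs of whitespace and newlines into single spaces.
            "text": " ".join(body.get_text().split()),
            "links": links,
        })
    return entries


entries = parse_headlines(SAMPLE_HTML)
for entry in entries:
    print(entry["date"])
    print(entry["text"])
    for link in entry["links"]:
        print(f'  {link["name"]}: {link["url"]}')
    print("=" * 29)
```

To run it against the live page, you would pass r.content from your requests call in place of SAMPLE_HTML. Keeping the links as name/url pairs should make it easier to write them out per date later, for instance to a CSV.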