I am learning web scraping and I am scraping a site.
I am able to select a value from the dropdown using Selenium. For example, I can select from मौजा का नाम चुने ("choose the mauza name").
Afterwards, I am able to click on the खाता खोजें ("search khata") button. As a result, a table is populated at the bottom of the page by JavaScript.
The button's markup is:

    <input type="submit" name="ctl00$ContentPlaceHolder1$BtnSearch" value="खाता खोजें" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$ContentPlaceHolder1$BtnSearch", "", true, "S", "", false, false))" id="ctl00_ContentPlaceHolder1_BtnSearch" style="width:146px;">
Pagination is done by:

    javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$11')
I am not able to scrape this table.
What I have tried:
- PhantomJS is no longer supported by Selenium.
- The table's id, ctl00_ContentPlaceHolder1_GridView1, is not in the HTML source code. I tried the following approaches, with no luck so far:
    # Attempt 1: the table's id is not in the static source, so a direct lookup fails
    # p_element = driver.find_element_by_id(id_='ctl00_ContentPlaceHolder1_GridView1')
    p_element = driver.find_element_by_xpath('//*[@id="aspnetForm"]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr[4]')
    print(p_element.text)

    # Attempt 2: wait for the table to appear before reading it
    from selenium.webdriver.support.ui import WebDriverWait

    path_for_table = '//*[@id="aspnetForm"]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr[4]'
    table_list = WebDriverWait(driver, 2).until(lambda driver: driver.find_element_by_xpath(path_for_table))
    print(table_list)
Pages I have looked at:
First, let's get the site. I am using BeautifulSoup along with Selenium to scrape it.
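Something like this can serve as the setup; the URL below is only a placeholder, since the site's address is not shown here, and any currently supported driver will do:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    # Placeholder URL -- substitute the actual address of the site being scraped.
    url = "http://<site-url>"

    driver = webdriver.Chrome()   # PhantomJS is no longer supported, so use Chrome/Firefox
    driver.get(url)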
Then select a village name from the मौजा का नाम चुने dropdown (change it according to your need).
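A sketch of that selection; the dropdown id below is hypothetical (inspect the मौजा का नाम चुने element for its real id), and the village name is just an example:

    from selenium.webdriver.support.ui import Select

    # Hypothetical id -- check the actual id of the मौजा का नाम चुने dropdown in the page source.
    village_dropdown = Select(driver.find_element_by_id("ctl00_ContentPlaceHolder1_ddlMauja"))
    village_dropdown.select_by_visible_text("<village name>")   # change according to your need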
Click on the "खाता खोजें" button:
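For example, using the button id from the markup above, and then waiting for the grid (ctl00_ContentPlaceHolder1_GridView1) that the postback renders:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver.find_element_by_id("ctl00_ContentPlaceHolder1_BtnSearch").click()

    # The grid is injected by an AJAX postback, so wait for it before reading the page.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "ctl00_ContentPlaceHolder1_GridView1"))
    )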
Get the page's source and parse it with BeautifulSoup:
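For example:

    soup = BeautifulSoup(driver.page_source, "html.parser")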
Find the element with the id ctl00_ContentPlaceHolder1_UpdatePanel2 and find all the td elements in it. Then take the header cells, get the text out of them, and build columns and column_names:
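A sketch of that step; it assumes the first six td cells are the header row of the six-column grid, so adjust the slicing if the header actually uses th cells:

    panel = soup.find(id="ctl00_ContentPlaceHolder1_UpdatePanel2")
    cells = panel.find_all("td")

    # Assumption: the first 6 cells are the header row of the 6-column grid.
    columns = cells[:6]
    column_names = [c.text.strip() for c in columns]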
Next, get the body of the table: create chunks of 6 columns for each entry and get the text out to build data:
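Continuing the same sketch:

    # The remaining cells form the body; each record spans 6 consecutive cells.
    body = cells[6:]
    data = [
        [cell.text.strip() for cell in body[i:i + 6]]
        for i in range(0, len(body), 6)
    ]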
Now import pandas and use it to create a DataFrame out of this list of lists:
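For example:

    import pandas as pd

    df = pd.DataFrame(data, columns=column_names)
    print(df)

Final result: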
…
To scrape the next pages, trigger the same __doPostBack call that the pager links use and re-parse the grid:
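A sketch, using page 2 as an example; the fixed sleep is a crude stand-in for a proper wait on the refreshed grid:

    import time

    def go_to_page(page_number):
        # Fire the pager's postback (the same __doPostBack call seen in the pagination links).
        driver.execute_script(
            "__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page${}')".format(page_number)
        )
        time.sleep(2)   # crude wait; a WebDriverWait on the refreshed grid is more robust

    go_to_page(2)
    soup = BeautifulSoup(driver.page_source, "html.parser")   # re-parse and repeat the extraction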