python - Retain hyperlinks in Pandas - Excel to dataframe
I have a large Excel file with several sheets of data that I need to convert to HTML. I was excited to try pandas to streamline the conversion, rather than keep saving Excel sheets as HTML and spending a day removing horrific MS tags.
I was able to read the Excel file and its sheets and load them into a DataFrame. The problem is that the hyperlinks are stripped out of the cells. I've looked around but cannot find an answer on retaining hyperlinks. This is the first time I've used pandas, so it may be inexperience. Below is the code and a screenshot of the output. Thank you for any help.
    In [2]: import pandas as pd
    In [3]: xls_file = pd.ExcelFile('desktop/cfec-temp/blackbook/blackbook.xlsx')
    In [4]: xls_file
    Out[4]: <pandas.io.excel.ExcelFile at 0x1132ce4e0>
    In [5]: xls_file.sheet_names
    Out[5]: ['sheet1', 'sheet2', 'sheet3', 'sheet4', 'sheet5', 'sheet6', 'sheet7']
    In [6]: df = xls_file.parse('sheet1')
    In [7]: df
What you can do is use openpyxl to get the hyperlinks: iterate through the rows in a loop, creating a list of hyperlinks, and then add that list as a new column in the pandas DataFrame:
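    import openpyxl
    import pandas as pd

    # Read the same workbook twice: pandas for the values, openpyxl for the links
    df = pd.read_excel('yourfile.xlsm')
    wb = openpyxl.load_workbook('yourfile.xlsm')
    ws = wb['sheet1']  # wb.get_sheet_by_name('sheet1') is deprecated
    print(ws.cell(row=2, column=1).hyperlink.target)

    links = []
    for i in range(2, ws.max_row + 1):  # 2nd arg in range() is not inclusive, so add 1
        cell = ws.cell(row=i, column=1)
        # cell.hyperlink is None when the cell has no link
        links.append(cell.hyperlink.target if cell.hyperlink else None)
    df['link'] = links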
This is the same as the first bit of code I gave in my answer to another question.
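For anyone who wants to reuse this, the loop can be wrapped in a small helper. This is only a sketch: `column_hyperlinks` and `demo` are names made up for illustration, and it assumes the links sit in a single column with a header in row 1. Cells without a hyperlink give `None` rather than raising an error.

```python
from typing import Any, List, Optional


def column_hyperlinks(ws: Any, column: int = 1, first_row: int = 2) -> List[Optional[str]]:
    """Return the hyperlink target of each cell in `column`, or None where no link is set.

    `ws` is expected to be an openpyxl worksheet; anything exposing
    `max_row` and `cell(row=..., column=...)` will work.
    """
    links = []
    # range() excludes its second argument, so add 1 to include the last row
    for row in range(first_row, ws.max_row + 1):
        hyperlink = ws.cell(row=row, column=column).hyperlink
        links.append(hyperlink.target if hyperlink is not None else None)
    return links


def demo() -> None:
    """Build a tiny in-memory workbook and attach its links to a DataFrame.

    Requires openpyxl and pandas; the sheet contents are made-up sample data.
    """
    import openpyxl
    import pandas as pd

    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["name"])      # header row
    ws.append(["pandas"])    # row 2, gets a hyperlink below
    ws.append(["no link"])   # row 3, left without a hyperlink
    ws["A2"].hyperlink = "https://pandas.pydata.org"

    df = pd.DataFrame({"name": ["pandas", "no link"]})
    df["link"] = column_hyperlinks(ws)
    print(df)
```

Calling `demo()` prints the two-row DataFrame with the extra `link` column. On a real file, the list only lines up with the DataFrame if the sheet has exactly one header row and no trailing blank rows, so it is worth checking `len(df)` against the length of the returned list before assigning.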