python - Retain hyperlinks in Pandas - Excel to dataframe


I have a large Excel file with several sheets of data that I need to convert to HTML. I was excited to try pandas to streamline the conversion, rather than keep saving the Excel sheets as HTML and spending a day removing the horrific MS tags.

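I was able to read the Excel file and its sheets and load them into a DataFrame. The problem is that pandas is stripping the hyperlinks out of the cells. I've looked around but cannot find an answer on retaining hyperlinks. This is the first time I've used pandas, so it may be inexperience on my part. Below is my code and a screenshot of the output. Thank you for any help.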

    In [2]: import pandas as pd

    In [3]: xls_file = pd.ExcelFile('desktop/cfec-temp/blackbook/blackbook.xlsx')

    In [4]: xls_file
    Out[4]: <pandas.io.excel.ExcelFile at 0x1132ce4e0>

    In [5]: xls_file.sheet_names
    Out[5]: ['sheet1', 'sheet2', 'sheet3', 'sheet4', 'sheet5', 'sheet6', 'sheet7']

    In [6]: df = xls_file.parse('sheet1')

    In [7]: df

[screenshot of the DataFrame output]

What you can do is use openpyxl to get the hyperlinks: iterate through the rows in a loop, build a list of the hyperlink targets, and add it as a new column in the pandas DataFrame:

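    import openpyxl
    import pandas as pd

    df = pd.read_excel('yourfile.xlsm')

    # Open the same workbook with openpyxl to reach the hyperlink objects
    wb = openpyxl.load_workbook('yourfile.xlsm')
    ws = wb['sheet1']  # wb.get_sheet_by_name() is deprecated
    print(ws.cell(row=2, column=1).hyperlink.target)

    links = []
    for i in range(2, ws.max_row + 1):  # 2nd arg in range() is not inclusive, so add 1
        cell_link = ws.cell(row=i, column=1).hyperlink
        # Cells without a hyperlink return None, so guard before reading .target
        links.append(cell_link.target if cell_link else None)

    df['link'] = links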

This is the same as the first bit of code in the answer to the other question.
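Since the end goal in the question is HTML, here is a minimal sketch of how the extracted targets might be carried through to `DataFrame.to_html`. It is not part of the original answer. The helper names `hyperlink_targets` and `as_anchor`, the column name `site`, and the in-memory workbook are all hypothetical stand-ins so the example runs without a file; with a real workbook you would use `openpyxl.load_workbook(...)` as shown above.

```python
import pandas as pd
from openpyxl import Workbook


def hyperlink_targets(ws, column=1, first_row=2):
    """Return one entry per data row: the link target, or None if the cell has no hyperlink."""
    targets = []
    for row in range(first_row, ws.max_row + 1):
        link = ws.cell(row=row, column=column).hyperlink
        targets.append(link.target if link is not None else None)
    return targets


def as_anchor(text, url):
    """Wrap text in an <a> tag when a URL is present; otherwise return the text unchanged."""
    if pd.isna(url):
        return text
    return f'<a href="{url}">{text}</a>'


# Hypothetical stand-in data: a header row, one linked cell, one plain cell.
wb = Workbook()
ws = wb.active
ws.append(["site"])
ws.append(["pandas"])
ws.append(["no link here"])
ws["A2"].hyperlink = "https://pandas.pydata.org"

# Build the DataFrame from the same sheet so rows and links stay aligned.
rows = list(ws.values)
df = pd.DataFrame(rows[1:], columns=list(rows[0]))
df["link"] = hyperlink_targets(ws)

# Replace the visible text with an anchor tag wherever a link was found.
df["site"] = [as_anchor(text, url) for text, url in zip(df["site"], df["link"])]

# escape=False keeps the <a> tags as markup instead of escaping them.
html = df[["site"]].to_html(escape=False, index=False)
print(html)
```

Note that `escape=False` turns off HTML escaping for every cell, so this is only reasonable when the spreadsheet contents are trusted.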

