My data looks like below, and it is in excel file with extension xlsx, so I am using openpyxl library to read the file in python and extract the data from row 6 onwards for 2 columns,and I am appending the extracted data into dictionary with keys "Column_name" and "Column_Type". The reason for doing this way is because, I would like to add extra string between the extracted data. The string I would like to add is CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE();
So far my code and output looks like below.My expected output in txt file is also mentioned below. If you notice, I would like to enumerate value from both keys at same time, I am not sure how to achive this in python so any help is appreciated ?
Thanks in advance for your time and effort!
Data
File Name: Employee
Sheet Name: Employee
File Type: csv
Field Name Type
Name String
Salary Numeric
Date Date
Phone Int
Code so far
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Access Sheet
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
for i in range(6, ws.max_row 1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
theString = " "
# This relies on the lists 'Column_name' and 'Column_Type' always being the same length
# I.e., there should always be a value for each key
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < ws.max_row:
theString = ", "
outputFile.close()
Output in txt file with above Code:
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE([['Name', 'Salary', 'Date', 'Phone'], ['String', 'Numeric', 'Date', 'Int']]);
Expected output in Txt file
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE( Name String, Salary Numeric, Date Date, Phone Int);
CodePudding user response:
You could take advantage of the fact that there should always be the same number of items in the "Column_name" and "Column_Type" lists to reduce the problem to iteration using a single variable:
# Simulated Excel Data
ws = {
'max_row': 3,
'cell': [
['Name', 'String'],
['Salary', 'Numeric'],
['Date', 'Date'],
['Phone', 'Int']
]
}
mylines={"Column_name":[],"Column_Type":[]}
for i in range(0, ws['max_row'] 1):
name = ws['cell'][i][0]
name1=ws['cell'][i][1]
mylines["Column_name"].append(name)
mylines["Column_Type"].append(name1)
theString = " "
# This relies on the lists 'Column_name' and 'Column_Type' always being the same length
# I.e., there should always be a value for each key
for i in range(len(mylines['Column_name'])):
theString = mylines['Column_name'][i] " " mylines['Column_Type'][i]
if i < ws['max_row']:
theString = ", "
# OLD
print("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format([(mylines[k]) for k,v in mylines.items()]) "\n")
# NEW
print("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) "\n")
When I ran this, I got the following output (the first line from the old version and the second line from the new version):
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE([['Name', 'Salary', 'Date', 'Phone'], ['String', 'Numeric', 'Date', 'Int']]);
CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE( Name String, Salary Numeric, Date Date, Phone Int);
Of course, if you have actual numeric and other non-string data in your spreadsheet, you'll also want to cast the variable values to strings using str() before appending them to theString.
The only difference from your code (other than the changes I made to have the code work as a standalone without an actual spreadsheet) is replacing the format parameter in the last line with a string built by appending the values from both lists in mylines.
