How to open a CSV file as a pandas DataFrame

by Antenna_   Last Updated April 15, 2019 10:26 AM

I have a CSV file with three columns; the third column contains long text. When I tried to open the file with pandas.read_csv, I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte

But there is no problem opening the file with

with open('file.csv', 'r', encoding='utf-8', errors = "ignore") as csvfile:

I don't know how to convert this data to a DataFrame, and I don't think pandas.read_csv handles this error properly.

So, how can I open this file and get a DataFrame?



Answers 3


Try this:

Open the CSV file in a text editor and save it with UTF-8 encoding.

Then read the file as normal:

import pandas
csvfile = pandas.read_csv('file.csv', encoding='utf-8')
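
If re-saving by hand in a text editor is inconvenient, the same conversion can be sketched in Python. This is only an illustration: `reencode_to_utf8` is a hypothetical helper name, and `latin-1` as the source encoding is an assumption, since the question does not say how the file was actually written.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="latin-1"):
    """Rewrite the file at src as UTF-8 at dst.

    source_encoding is a guess (an assumption of this sketch). A wrong
    guess produces garbled characters rather than an error, because
    latin-1 maps every possible byte to some character.
    """
    # Work on raw bytes so line endings are carried over unchanged
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

After `reencode_to_utf8('file.csv', 'file_utf8.csv')`, the new file should load with `pandas.read_csv('file_utf8.csv', encoding='utf-8')`, provided the guessed source encoding was right.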
user8579087
January 22, 2018 15:12 PM

I would try reading the file with the built-in csv module first, then put the data into pandas.

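import csv
import pandas

# errors='ignore' silently drops bytes that are not valid UTF-8
with open('file.csv', newline='', encoding='utf-8', errors='ignore') as csvfile:
    rows = list(csv.reader(csvfile))

# Assumes the first row holds the column names
df = pandas.DataFrame(rows[1:], columns=rows[0])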

If this doesn't work, then at least you can confirm that it is a csv issue and not a pandas issue choking on encodings.
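
Another route, sketched here under assumptions, is to hand an already opened file object to `pandas.read_csv`, which accepts file-like objects as well as paths. That lets the `errors` argument from the question's own `open(...)` call do the work. `read_csv_lenient` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def read_csv_lenient(path, encoding="utf-8", errors="replace"):
    """Read a CSV into a DataFrame, tolerating undecodable bytes.

    Hypothetical helper for illustration. errors="replace" swaps each
    bad byte for U+FFFD so the damage stays visible; errors="ignore"
    would drop the byte silently instead.
    """
    # newline="" leaves line-ending handling to the CSV parser
    with open(path, "r", encoding=encoding, errors=errors, newline="") as handle:
        return pd.read_csv(handle)
```

Either error mode loses information, so this is best treated as a way to get at the data quickly rather than a fix for the underlying encoding mismatch. Recent pandas versions (1.3 and later) also accept an `encoding_errors` argument on `read_csv` for the same purpose.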

The other recommendation is to make sure you are using Python 3.x, which handles encoding issues much better than 2.7.

If you can provide your sample, I can test it myself and update my answer accordingly.

jamescampbell
January 22, 2018 16:20 PM

You can try a different encoding, such as "ISO-8859-1".

In your case:

with open('file.csv', 'r', encoding='ISO-8859-1', errors='ignore') as csvfile:

Or try this:

import pandas as pd
data_file = pd.read_csv("file.csv", encoding="ISO-8859-1")
print(data_file)
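
When the right encoding is unknown, a short list of candidates can be tried in order. The sketch below uses only the standard library; `first_working_encoding` and the candidate list are assumptions for illustration, not something taken from the question. Note that `latin-1` accepts every byte, so it only makes sense as the last fallback, and a clean decode does not prove the guess is correct.

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper. The candidate order matters: strict encodings
    go first, and latin-1 (which never fails) goes last.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    raise ValueError(f"none of {candidates!r} could decode {path!r}")
```

The result can then be passed along, for example `pd.read_csv('file.csv', encoding=first_working_encoding('file.csv'))`. It is still worth checking a few rows of the long text column by eye for garbled characters.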

Shubham Yadav
April 15, 2019 10:25 AM
