Reading Excel 2003 and 2007 Files Using OLEDB

| January 30, 2009 | 37 Comments

excelreader
One of the feature of the application I’m working on these days is allowing the user to import data from Excel 2003 and 2007 files. After some research I found out that OLEDB is the only way to read both Excel formats without any problems.

For this example I split the code into two functions. The first is OpenExcelFile which opens the Excel file and sets the string array data field workSheetNames to the names of the sheets in the file. This is done by reading the names of the tables in the schema, because OLEDB opens the whole file as a schema and the sheets in the file as tables in the schema. The function takes one argument called isOpenXMLFormat, that is set to true to open a 2007 file and false to open a 2003 or previous file. The only difference between the two formats is the connection string used.

        public void OpenExcelFile(bool isOpenXMLFormat)
        {
            //open the excel file using OLEDB
            OleDbConnection con;

            if (isOpenXMLFormat)
                //read a 2007 file
                connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" +
                    fileName + ";Extended Properties=\"Excel 8.0;HDR=YES;\"";
            else
                //read a 97-2003 file
                connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
                    fileName + ";Extended Properties=Excel 8.0;";

            con = new OleDbConnection(connectionString);
            con.Open();

            //get all the available sheets
            System.Data.DataTable dataSet = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

            //get the number of sheets in the file
            workSheetNames = new String[dataSet.Rows.Count];
            int i = 0;
            foreach (DataRow row in dataSet.Rows)
            {
                //insert the sheet's name in the current element of the array
                //and remove the $ sign at the end
                workSheetNames[i] = row["TABLE_NAME"].ToString().Trim(new[] { '$' });
                i++;
            }

            if (con != null)
            {
                con.Close();
                con.Dispose();
            }

            if (dataSet != null)
                dataSet.Dispose();
        }

The second function is GetWorksheet, and it takes the name of the sheet as an argument. Here the file is opened again using the connection string set by OpenExcelFile. And executes a SQL SELECT on the name of the sheet passed in the argument, and returns the result as a DataTable.

        public System.Data.DataTable GetWorksheet(string worksheetName)
        {
            OleDbConnection con = new System.Data.OleDb.OleDbConnection(connectionString);
            OleDbDataAdapter cmd = new System.Data.OleDb.OleDbDataAdapter(
                "select * from [" + worksheetName + "$]", con);

            con.Open();
            System.Data.DataSet excelDataSet = new DataSet();
            cmd.Fill(excelDataSet);
            con.Close();

            return excelDataSet.Tables[0];
        }

 

Download Source Code

 


Related Posts:

Filed Under: ASP.NET, WinForms

Follow us on Twitter

Comments

  1. Dagu says:

    This is what I’m looking for. Thank you very much.

  2. You are welcome Dagu. I appreciate the comment.

  3. Reading Excel 2003 and 2007 Files Using OLEDB | CodeHill…

    Thank you for submitting this cool story – Trackback from DotNetShoutout…

  4. Hi.

    I have been successful in reading all the sheets in an excel book. But after I read the excel, it becomes locked. So I can only open it in the read only mode.

    I have googled. But very thin voices heared over there. Any idea would be much useful.

    With appreciation. Thanks.

  5. ivan says:

    you can use only
    connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” +
    fileName + “;Extended Properties=\”Excel 8.0;HDR=YES;\”";

    for both types of excel file

  6. Hi ivan,
    I used:
    connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” +
    fileName + “;Extended Properties=\”Excel 8.0;HDR=YES;\””;

    And now there is no problem of locking excel files when text is extracted from them.

    For, if they works for both, i have saved one excel file with xls format and this has worked. I haven’t tested from excel 2003 iteself. However, I hope this should work. Thank you!

  7. Sandeep Jaiswal says:

    Gr8 post. this is what I was looking for, exactly according to my need. Thank you very much. but… what I want othe than this is, I want to read the comments from each cell, is it possible ?

  8. MaheshKumar says:

    Hai, It’s solve my all excel issues. Thank you very
    much.

  9. suresh says:

    thanks its solved my issue… good post keep it up.

  10. Tessy says:

    i wanna run a “complex” query like the following:

    SELECT *
    FROM ( SELECT customerNo, rate, dateApproved
    FROM ['worksheet$']
    WHERE dateCreate = (SELECT MAX(dateCreate) FROM ['worksheet$'] )
    AND status = ‘Approved’) AS T1
    WHERE dateApproved = SELECT MAX(dateApproved) FROM T1

    this way, it does not recognise the T1 object.Is there a way to achieve this?

  11. prashant says:

    I am getting error in this line…
    workSheetNames[i] = row["TABLE_NAME"].ToString().Trim(new[] { ‘$’ });

    Please guide me how to put sheet name.

  12. Beto says:

    prashant, the posting is missing a string array declaration. Add “String[]” before “workSheetNames[i].

    amgad, thanks for the excellent post.
    Beto

  13. Thank YOU Beto for dropping by and commenting

  14. Dane says:

    This code is most of what I need but would it be possible to take a dataset or worksheet from one and write to a worksheet in another file as a whole unit? I would like to avoid insert as it would require name value pairs and their is a lot of formatting and images in the orginal.
    Thanks,

  15. Peterhs says:

    When I read this I see here are people who know a lot more then about locking and connecting problems. Maybe it is “not done” to place my question (sorry), but I’m a little out off options and couldn’t get answers on other forums. If it’s a problem let me know I delete the question.

    We have a Excel 2007 database with information about suppliers. When we make a new document in Word, with mail-merge the information from a supplier is put in the document (it uses ODBC connection).

    This worked fine in Office 97, but not anymore in 2007.
    * I like to make the Excel file a template, everyone can see it (excel file made from template) and the original file is good protected against changes. Sadly it is not possible to use a template as a data source in mail merge.
    * When I use a xlsx (2007) file and one user mail merge, the excel file is locked for all other users, this doesn’t work.
    * When I save the file in “old format” (xls) and give it “read only” it works fine for the users.

    The problem with the last solution is with editing the Excel file. When no one is using it (no active mail merge) it can be opened for editing and still others can then mail merge with it. But when someone has then created a open mail merge, the original file (with changes) can’t be saved in Excel.

    A year ago I spend a couple of day’s with this problem and asked in forum’s, but no luck. The past year we got more and more frustrated with it, people must stay at work to make changes to the Excel file after working hours. And we are working with XLS for this file, it all sucks.

    So I looked at it the last 2 day’s again and tried many things (share, read only, csv, different connection types, etc.) but no luck.

    Does anyone have a solution or a hint to solve this problem?

    Many thanks

    Peter

  16. Hi Peter,
    I understand your problem, unfortunately I don’t use Office that much so I don’t know how to solve it. But I hope one of my readers would take the time to answer it.

  17. Jonathan Maes says:

    Hi,
    Love you’re article, and I use it for reading .xls and .xlsx files to a dataset.

    I tried to turn it around because I want to write also. Do you have some tips? maybe some code :)

    Greetz Jonathan

  18. [...] in the Excel file, those can be different each time. First I read an excel file with this code: http://codehill.com/2009/01/reading-…s-using-oledb/ I change some things in the DataTable and then I want to write to a copy of the first excel file. [...]

  19. Bob says:

    Thanks for the Post . works perfectly. Just what I needed in my code.

  20. Sekar says:

    thanks for the post. Really saved my day

  21. Subramanian says:

    GOOD WORK.. wAS MORE HELPFULL TO ME

  22. SLaks says:

    You need to use an OleDbConnectionStringBuilder:

    OleDbConnectionStringBuilder builder = new OleDbConnectionStringBuilder();

    if (isOpenXML)
    builder.Provider = “Microsoft.ACE.OLEDB.12.0″;
    else
    builder.Provider = “Microsoft.Jet.OLEDB.4.0″;

    builder.DataSource = fileName;
    builder["Extended Properties"] = “Extended Properties=\”Excel 8.0;HDR=YES;\”"

    con = new OleDbConnection(builder.ToString());

  23. Excelent post, thanks, but I have problems with hyperlink column in excel, I get only link text in datatable, but no URL ? Is there a way to get whole hyperlink ?

  24. Mohammed says:

    Really helpful!!!!
    good work

  25. Hari says:

    Does this work on 64bit machines as well?

  26. Amgad says:

    @Hari It should work on 64bit machines. But if didn’t please let me know.

  27. Ernie says:

    Very nice. Had my own version half done then came across yours. Rather than re-invent the wheel ….

    Thanks.

  28. Amgad says:

    You’re welcome Ernie, glad you found it useful.

  29. Dev says:

    @Antonio Bakula

    Unfortunately there is no way to grab an underlying hyperlink from the cell itself. Connecting to an Excel file via OLEDB treats all rows/columns like a flat database. Excel stores hyperlinks and other items in metadata native to the .XLS/.XLSX format. As of this posting the only way to grab an underlying hyperlink is via Interop. Doing a google search will get the results you need for grabbing a hyperlink via Interop.

    By the way .. Interop is SLOW (technically speaking it should be), so if you have a lot of records and need to merge data so you can incorporate the links in a personalized DataTable, be prepared to wait …

  30. Majd says:

    Thanx for this great post, But i have a problem. i have an excel sheet with many columns and i dont want all of these columns to be imported. also, the sheet has no header row and the first row includes merged fields including logo , etc. how can i import some of the colums . i tried select A, B ,C form [sheet1$] but i got “need parameters” error. does any one have a soulition to my problem using the posted code or any other ways?
    thanks in advance

  31. kumar says:

    Hi,

    I want to retrieve the excel sheet names. i used the above mentioned code. But my excel sheet contains few macros. The below code
    System.Data.DataTable dataSet = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

    am getting macros names also. Please help me to retrieve the sheet names alone.

    Thanks,
    Happy coding

  32. kumar says:

    HI Friends,

    Before loading the data into datatable, i want to remove the empty rows from worksheet. Please help me

    Thanks,
    Happy Coding.

  33. Amit Verma says:

    Great article… Provided me with good insight when I was consistently getting “External table not in the correct format” error while reading Excel 2007 using Oledb. Thanks guys!

  34. ajoka says:

    Yeah, it really works.

    I’ve tried ExcelDataReader on codeplex, but it had some odd behaviour (some numeric/currency values were converted to date-time strings).

    So this OLEDB for OpenXml is really cool.

    I’ve tested it on 64 bit. It’s only a matter of oledb providers, but these 2 should be in windows already.

    Big thanx!

    Now i have to solve:
    - opening from stream and not from file
    - converting to dataset

    :-D

  35. Mars says:

    @Majd

    select A, B, C
    change to
    select F1, F2, F3

  36. mike says:

    Great code,

    but I am having problems with a particular spreadsheet

    it has the text “n/a” in a lot of cells and i think maybe the provider may have decided that the particular column is numerical and because “n/a” isnt numerical its getting dropped

    any help appreciated

Leave a Reply