Reading Excel 2003 and 2007 Files Using OLEDB

January 30, 2009 | By | 45 Comments

excelreader
One of the feature of the application I’m working on these days is allowing the user to import data from Excel 2003 and 2007 files. After some research I found out that OLEDB is the only way to read both Excel formats without any problems.

For this example I split the code into two functions. The first is OpenExcelFile which opens the Excel file and sets the string array data field workSheetNames to the names of the sheets in the file. This is done by reading the names of the tables in the schema, because OLEDB opens the whole file as a schema and the sheets in the file as tables in the schema. The function takes one argument called isOpenXMLFormat, that is set to true to open a 2007 file and false to open a 2003 or previous file. The only difference between the two formats is the connection string used.

The second function is GetWorksheet, and it takes the name of the sheet as an argument. Here the file is opened again using the connection string set by OpenExcelFile. And executes a SQL SELECT on the name of the sheet passed in the argument, and returns the result as a DataTable.

 

Download Source Code

 

Filed in: ASP.NET, WinForms | Tags: ,

Comments (45)

  1. This is what I’m looking for. Thank you very much.

  2. You are welcome Dagu. I appreciate the comment.

  3. Hi.

    I have been successful in reading all the sheets in an excel book. But after I read the excel, it becomes locked. So I can only open it in the read only mode.

    I have googled. But very thin voices heared over there. Any idea would be much useful.

    With appreciation. Thanks.

  4. ivan

    you can use only
    connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” +
    fileName + “;Extended Properties=\”Excel 8.0;HDR=YES;\””;

    for both types of excel file

  5. Hi ivan,
    I used:
    connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” +
    fileName + “;Extended Properties=\”Excel 8.0;HDR=YES;\””;

    And now there is no problem of locking excel files when text is extracted from them.

    For, if they works for both, i have saved one excel file with xls format and this has worked. I haven’t tested from excel 2003 iteself. However, I hope this should work. Thank you!

  6. Sandeep Jaiswal

    Gr8 post. this is what I was looking for, exactly according to my need. Thank you very much. but… what I want othe than this is, I want to read the comments from each cell, is it possible ?

  7. MaheshKumar

    Hai, It’s solve my all excel issues. Thank you very
    much.

  8. suresh

    thanks its solved my issue… good post keep it up.

  9. Tessy

    i wanna run a “complex” query like the following:

    SELECT *
    FROM ( SELECT customerNo, rate, dateApproved
    FROM ['worksheet$']
    WHERE dateCreate = (SELECT MAX(dateCreate) FROM ['worksheet$'] )
    AND status = ‘Approved’) AS T1
    WHERE dateApproved = SELECT MAX(dateApproved) FROM T1

    this way, it does not recognise the T1 object.Is there a way to achieve this?

  10. prashant

    I am getting error in this line…
    workSheetNames[i] = row["TABLE_NAME"].ToString().Trim(new[] { ‘$’ });

    Please guide me how to put sheet name.

  11. Beto

    prashant, the posting is missing a string array declaration. Add “String[]” before “workSheetNames[i].

    amgad, thanks for the excellent post.
    Beto

  12. Thank YOU Beto for dropping by and commenting

  13. Dane

    This code is most of what I need but would it be possible to take a dataset or worksheet from one and write to a worksheet in another file as a whole unit? I would like to avoid insert as it would require name value pairs and their is a lot of formatting and images in the orginal.
    Thanks,

  14. Peterhs

    When I read this I see here are people who know a lot more then about locking and connecting problems. Maybe it is “not done” to place my question (sorry), but I’m a little out off options and couldn’t get answers on other forums. If it’s a problem let me know I delete the question.

    We have a Excel 2007 database with information about suppliers. When we make a new document in Word, with mail-merge the information from a supplier is put in the document (it uses ODBC connection).

    This worked fine in Office 97, but not anymore in 2007.
    * I like to make the Excel file a template, everyone can see it (excel file made from template) and the original file is good protected against changes. Sadly it is not possible to use a template as a data source in mail merge.
    * When I use a xlsx (2007) file and one user mail merge, the excel file is locked for all other users, this doesn’t work.
    * When I save the file in “old format” (xls) and give it “read only” it works fine for the users.

    The problem with the last solution is with editing the Excel file. When no one is using it (no active mail merge) it can be opened for editing and still others can then mail merge with it. But when someone has then created a open mail merge, the original file (with changes) can’t be saved in Excel.

    A year ago I spend a couple of day’s with this problem and asked in forum’s, but no luck. The past year we got more and more frustrated with it, people must stay at work to make changes to the Excel file after working hours. And we are working with XLS for this file, it all sucks.

    So I looked at it the last 2 day’s again and tried many things (share, read only, csv, different connection types, etc.) but no luck.

    Does anyone have a solution or a hint to solve this problem?

    Many thanks

    Peter

  15. Hi Peter,
    I understand your problem, unfortunately I don’t use Office that much so I don’t know how to solve it. But I hope one of my readers would take the time to answer it.

  16. Jonathan Maes

    Hi,
    Love you’re article, and I use it for reading .xls and .xlsx files to a dataset.

    I tried to turn it around because I want to write also. Do you have some tips? maybe some code :)

    Greetz Jonathan

  17. Bob

    Thanks for the Post . works perfectly. Just what I needed in my code.

  18. Sekar

    thanks for the post. Really saved my day

  19. Subramanian

    GOOD WORK.. wAS MORE HELPFULL TO ME

  20. You need to use an OleDbConnectionStringBuilder:

    OleDbConnectionStringBuilder builder = new OleDbConnectionStringBuilder();

    if (isOpenXML)
    builder.Provider = “Microsoft.ACE.OLEDB.12.0″;
    else
    builder.Provider = “Microsoft.Jet.OLEDB.4.0″;

    builder.DataSource = fileName;
    builder["Extended Properties"] = “Extended Properties=\”Excel 8.0;HDR=YES;\””

    con = new OleDbConnection(builder.ToString());

  21. Excelent post, thanks, but I have problems with hyperlink column in excel, I get only link text in datatable, but no URL ? Is there a way to get whole hyperlink ?

  22. Mohammed

    Really helpful!!!!
    good work

  23. Hari

    Does this work on 64bit machines as well?

  24. @Hari It should work on 64bit machines. But if didn’t please let me know.

  25. Ernie

    Very nice. Had my own version half done then came across yours. Rather than re-invent the wheel ….

    Thanks.

  26. You’re welcome Ernie, glad you found it useful.

  27. Dev

    @Antonio Bakula

    Unfortunately there is no way to grab an underlying hyperlink from the cell itself. Connecting to an Excel file via OLEDB treats all rows/columns like a flat database. Excel stores hyperlinks and other items in metadata native to the .XLS/.XLSX format. As of this posting the only way to grab an underlying hyperlink is via Interop. Doing a google search will get the results you need for grabbing a hyperlink via Interop.

    By the way .. Interop is SLOW (technically speaking it should be), so if you have a lot of records and need to merge data so you can incorporate the links in a personalized DataTable, be prepared to wait …

  28. Majd

    Thanx for this great post, But i have a problem. i have an excel sheet with many columns and i dont want all of these columns to be imported. also, the sheet has no header row and the first row includes merged fields including logo , etc. how can i import some of the colums . i tried select A, B ,C form [sheet1$] but i got “need parameters” error. does any one have a soulition to my problem using the posted code or any other ways?
    thanks in advance

  29. kumar

    Hi,

    I want to retrieve the excel sheet names. i used the above mentioned code. But my excel sheet contains few macros. The below code
    System.Data.DataTable dataSet = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

    am getting macros names also. Please help me to retrieve the sheet names alone.

    Thanks,
    Happy coding

  30. kumar

    HI Friends,

    Before loading the data into datatable, i want to remove the empty rows from worksheet. Please help me

    Thanks,
    Happy Coding.

  31. Amit Verma

    Great article… Provided me with good insight when I was consistently getting “External table not in the correct format” error while reading Excel 2007 using Oledb. Thanks guys!

  32. ajoka

    Yeah, it really works.

    I’ve tried ExcelDataReader on codeplex, but it had some odd behaviour (some numeric/currency values were converted to date-time strings).

    So this OLEDB for OpenXml is really cool.

    I’ve tested it on 64 bit. It’s only a matter of oledb providers, but these 2 should be in windows already.

    Big thanx!

    Now i have to solve:
    - opening from stream and not from file
    - converting to dataset

    :-D

  33. Mars

    @Majd

    select A, B, C
    change to
    select F1, F2, F3

  34. mike

    Great code,

    but I am having problems with a particular spreadsheet

    it has the text “n/a” in a lot of cells and i think maybe the provider may have decided that the particular column is numerical and because “n/a” isnt numerical its getting dropped

    any help appreciated

  35. Anoop

    Is it possible to read the cell comments using oledb ?

  36. Erwin

    Nice article thanks a lot ;-)

    Is there a way I can get something more about the styles that are used within the excell sheet? Like font size, font, etc.

    any help is appreciated

  37. biju

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.Data.OleDb;
    using System.IO;

    namespace simple
    {
    public partial class Form1 : Form
    {

    public Form1()
    {
    InitializeComponent();
    }

    //string path = “C:\\File\\Book1.xlsx”;
    string path = “C:\\File\\Book1.xls”;
    string Outputpath = “C:\\File\\Output.txt”;
    string Errorpath = “C:\\File\\error.txt”;
    private void button1_Click(object sender, EventArgs e)
    {
    try
    {

    // string connString = string.Format(“Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” + path + “;Extended Properties=\”Excel 12.0 Xml;HDR=YES\”;”);
    string connString = string.Format(“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=” + path + “;Extended Properties=\”Excel 8.0; IMEX=1\”;”);
    OleDbConnection cn = new OleDbConnection(connString);
    cn.Open();
    OleDbDataAdapter adapter = new OleDbDataAdapter(“Select * From [Sheet1$] where EmpId is not null and Salary is not null”, cn);
    DataTable dt = new DataTable();
    adapter.Fill(dt);
    Write(dt, Outputpath);
    }
    catch(Exception ex)
    {
    ErrorText(Errorpath, “error : ” + ex.Message + ex.StackTrace);
    }

    }
    public static void Write(DataTable dt, string filePath)
    {
    int i = 0;
    StreamWriter sw = null;
    sw = new StreamWriter(filePath, false);
    string header = string.Empty;
    header = “Date & Time : ” + DateTime.Now + Environment.NewLine;
    sw.Write(header);
    for (i = 0; i < dt.Columns.Count – 1; i++)
    {
    sw.Write(dt.Columns[i].ColumnName + " ");
    }
    sw.Write(dt.Columns[i].ColumnName);
    sw.WriteLine();
    foreach (DataRow row in dt.Rows)
    {
    object[] array = row.ItemArray;
    for (i = 0; i < array.Length – 1; i++)
    {
    sw.Write(array[i] + " ");
    }
    sw.Write(array[i].ToString());
    sw.WriteLine();
    }
    sw.Close();
    }
    private void ErrorText(string fileName, string msg)
    {
    FileStream fs;
    StreamWriter sw;
    string header = string.Empty;
    if (!File.Exists(fileName))
    {
    //Create file and text file Header
    fs = File.Create(fileName);
    sw = new StreamWriter(fs);
    sw.Close();
    fs.Close();
    }
    //error writing to file
    fs = File.Open(fileName, FileMode.Append);
    sw = new StreamWriter(fs);
    header = "Date & Time : " + DateTime.Now + Environment.NewLine;
    if (msg != string.Empty)
    header = header + "Message : – " + msg + Environment.NewLine;
    sw.Write(header);
    sw.Close();
    fs.Close();
    }

    }
    }

  38. Stefan
  39. sa

    Hi,

    i am getting an excel sheet from vendor, it has numerical columns formatted to show only 2 decimal values in the excel sheet. But, when i load, i want to all the decimal values (i.e. actual value without formatting).
    Is there any way to do so ?

    Thanks…

  40. mohammad reza pushideh

    Thank you very much.

  41. midix

    You don’t need JET.OLEDB. ACE.OLEDB.12 is able to load also the old formats, and also it has another advantage – it has both 32 bit and 64 bit versions but JET.OLEDB does not.

    // try determining from extension
    bool isOldFormat =
    string.Compare(Path.GetExtension(sourceExcel), “xls”, true) == 0;

    bool isBinary =
    string.Compare(Path.GetExtension(sourceExcel), “xlsb”, true) == 0;

    // IMEX=1 – to force strings on mixed data
    // HDR=NO – to process all the available data
    string connectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\”{1};READONLY=TRUE;HDR=NO;IMEX=1;\””;

    string sourceType = null;

    // source type according to the
    // http://www.microsoft.com/en-us/download/details.aspx?id=13255
    if (isOldFormat)
    sourceType = “Excel 8.0″;
    else
    sourceType = “Excel 12.0″;

    if (!isBinary)
    sourceType += ” Xml”;

    connectionString = string.Format(connectionString, sourceExcel, sourceType);

    With this code I was able to read old xls and new xlsx and xlsb.

  42. Vishwadeep

    It’s work , Thank you very much.

Leave a Reply

Trackback URL | RSS Feed for This Entry