Data Type Issues

I am facing an issue importing data from a CSV file and storing it in SQL Server using a C# StreamReader. All the columns in the CSV file are read as strings, so everything is inserted into the database table as varchar. How do I insert the data properly so that columns with numerical values are stored as int/float, strings as varchar, etc.?

Can anyone suggest a proper way to do this task?

Can anyone help with a robust and generic code?

Thanks in advance.

2 answers

  • answered 2021-01-19 18:03 MarkPflug

    I maintain a library that can handle this quite easily: Sylvan.Data.Csv. The key is that you need to provide a schema for the incoming data so that the SqlClient library knows how to deal with it.

    // the "Schema" type comes from Sylvan.Data package
    var schema =
        new Schema
        .Builder()
        .Add<int>("Id")
        .Add<string>("Name")
        .Add<DateTime?>("ModifiedDate")
        .Build();
    
    
    var opts = new CsvDataReaderOptions
    {
        Schema = new CsvSchema(schema)
    };
    
    // Create (or CreateAsync) can be passed the name of a file or a TextReader.
    using var csv = CsvDataReader.Create(csvFileName, opts);
    
    SqlConnection conn = ...;
    
    var bcp = new SqlBulkCopy(conn);
    bcp.BulkCopyTimeout = 0; // no timeout.
    bcp.DestinationTableName = "MyTable";
    bcp.BatchSize = 50000;
    bcp.WriteToServer(csv);

    The Schema type is defined in the Sylvan.Data package, which is currently pre-release only. If you don't feel comfortable taking a dependency on that (understandable), you can implement your own ICsvSchemaProvider, which is pretty easy. This answer has an example of implementing your own typed schema provider.

    If you have any questions or issues, feel free to open an issue over at https://github.com/MarkPflug/Sylvan
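
    If you go the ICsvSchemaProvider route, a minimal sketch might look like the following. This assumes the interface exposes a single GetColumn(name, ordinal) method returning a System.Data.Common.DbColumn (check the library's docs for the exact signature in your version); names like TypedColumn and MySchemaProvider are illustrative:

    using System;
    using System.Data.Common;
    using Sylvan.Data.Csv;
    
    // DbColumn's property setters are protected, so a small subclass is needed.
    class TypedColumn : DbColumn
    {
        public TypedColumn(string name, Type type, bool allowNull)
        {
            ColumnName = name;
            DataType = type;
            AllowDBNull = allowNull;
        }
    }
    
    class MySchemaProvider : ICsvSchemaProvider
    {
        public DbColumn? GetColumn(string? name, int ordinal)
        {
            return name switch
            {
                "Id" => new TypedColumn("Id", typeof(int), false),
                "ModifiedDate" => new TypedColumn("ModifiedDate", typeof(DateTime), true),
                _ => null, // null falls back to the default (string) column
            };
        }
    }

    An instance of this provider would then be assigned to CsvDataReaderOptions.Schema in place of the CsvSchema shown above.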

  • answered 2021-01-19 19:31 DRapp

    What I have done in the past is load the data into a temporary table with all columns as character-based types at the maximum allowed/expected length. Then, after all the input columns, I add the columns in their final data types so they have no impact on the comma/tab-delimited import stream.

    Once the data is imported into the temp table, run an update query to convert the text into the final formats: int, decimal, float, bit, date/time, etc.

    Finally, if all goes well, you can run your queries or do an INSERT INTO ... SELECT from the temp table as needed for the final import.

    Ex:

    temp table has

    FirstName       varchar(20),
    LastName        varchar(20),
    BirthDateText   varchar(10),
    SalaryText      varchar(10),
    AnyOtherColumns varchar(10),
    RealBirthDate   date,          -- note: SQL Server's "timestamp" is rowversion, not a date type
    Salary          decimal(10,2)
    

    So, the import fills FirstName through AnyOtherColumns, and you can then run UPDATE commands to properly convert the text into the respective RealBirthDate and Salary columns.
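
    The conversion step might be sketched like this. The table and column names follow the example above, and StagingTable/FinalTable are hypothetical; TRY_CONVERT requires SQL Server 2012+ and returns NULL for values that fail to parse, which makes the validation step easy:

    -- Convert the imported text into the typed columns.
    UPDATE StagingTable
    SET RealBirthDate = TRY_CONVERT(date, BirthDateText, 101),      -- 101 = mm/dd/yyyy
        Salary        = TRY_CONVERT(decimal(10,2), SalaryText);
    
    -- Inspect rows where conversion failed before the final insert:
    -- SELECT * FROM StagingTable WHERE RealBirthDate IS NULL OR Salary IS NULL;
    
    -- Then move the cleansed rows into the final table.
    INSERT INTO FinalTable (FirstName, LastName, BirthDate, Salary)
    SELECT FirstName, LastName, RealBirthDate, Salary
    FROM StagingTable
    WHERE RealBirthDate IS NOT NULL
      AND Salary IS NOT NULL;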

    No third-party dependency, it is completely under your control, and data cleansing/validation can be done before the data reaches its final destination.