Using Dialects when reading and writing CSV in Python

Posted on 08th April 2019

When reading and writing CSV files in Python using the csv module, you can pass an optional dialect parameter to the reader and writer functions. So what is a dialect?

A dialect is a group of parameters used to define the format of a CSV file. These parameters are:

  1. delimiter: The character used to separate the fields in a CSV file. Default is ,
  2. quotechar: A single character used to quote fields containing special characters. Default is "
  3. doublequote: The value of this parameter can be True or False. When it is True, the quotechar is doubled if it appears in a field. When False, the escapechar is used to escape the quotechar.
  4. escapechar: A single character used to escape the delimiter (when quoting is set to QUOTE_NONE) or the quotechar (when doublequote is False). Default is None.
  5. quoting: Controls when the writer adds quotes and how the reader interprets them. Default is QUOTE_MINIMAL. The possible values and their meanings are listed below.
    • QUOTE_ALL: The writer quotes all fields.
    • QUOTE_MINIMAL: The writer quotes only fields that contain special characters such as the delimiter, the quotechar or a line terminator character.
    • QUOTE_NONNUMERIC: The writer quotes all fields except numeric fields. The reader converts all non-quoted fields to the float data type.
    • QUOTE_NONE: The writer does not quote any fields, and the reader performs no special processing of quote characters.
  6. lineterminator: String used by the writer to indicate the end of a line. Default is '\r\n'
  7. skipinitialspace: When this attribute is set to True, whitespace immediately following the delimiter is ignored. Default is False.
  8. strict: When set to True, a csv.Error exception is raised when the input CSV is incorrectly formatted. Default is False.
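To see how a few of these parameters change what gets written, here is a minimal sketch that writes the same rows to an in-memory buffer with different settings. The sample rows and the render() helper are made up for illustration and are not part of the csv module.

```python
import csv
import io

# Hypothetical sample data: the second row contains both a quote and a comma.
rows = [["id", "note"], [1, 'say "hi", then leave']]

def render(**fmtparams):
    """Write the sample rows to an in-memory buffer with the given format parameters."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerows(rows)
    return buf.getvalue()

minimal = render()                                    # module defaults
quote_all = render(quoting=csv.QUOTE_ALL)             # every field is quoted
escaped = render(doublequote=False, escapechar="\\")  # quotechar is escaped, not doubled
piped = render(delimiter="|", lineterminator="\n")    # custom separator and line ending

for label, text in [("minimal", minimal), ("quote_all", quote_all),
                    ("escaped", escaped), ("piped", piped)]:
    print(label, repr(text))

# On the reading side, QUOTE_NONNUMERIC converts unquoted fields to float.
numeric = list(csv.reader(io.StringIO('"a",1.5\r\n'), quoting=csv.QUOTE_NONNUMERIC))
print(numeric)
```

Printing with repr() makes the line terminators visible, which helps when comparing the variants.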

Functions to manage dialects

Here is a list of functions used to manipulate dialects.

  • csv.list_dialects()

    The list_dialects() function of the csv module lists all the registered dialects.

    import csv
    print(csv.list_dialects())
    

    Output:

    ['excel', 'excel-tab', 'unix']
    
  • csv.get_dialect(name)

    Returns the dialect object registered under name, containing all of its format parameters. A csv.Error is raised if name is not a registered dialect.

    import csv
    
    d = csv.get_dialect('excel')
    print("Delimiter: ", d.delimiter)
    print("Doublequote: ", d.doublequote)
    print("Escapechar: ", d.escapechar)
    print("lineterminator: ", repr(d.lineterminator))
    print("quotechar: ", d.quotechar)
    print("Quoting: ", d.quoting)
    print("skipinitialspace: ", d.skipinitialspace)
    print("strict: ", d.strict)
    

    Output:

    Delimiter:  ,
    Doublequote:  1
    Escapechar:  None
    lineterminator:  '\r\n'
    quotechar:  "
    Quoting:  0
    skipinitialspace:  0
    strict:  0
    
  • csv.register_dialect(name[, dialect[, **fmtparams]])

    The register_dialect function registers a new dialect under the specified name. You can pass an existing dialect on which to base the new one, pass the format parameters as keyword arguments, or do both, in which case the keyword arguments override the parameters of the dialect.

    import csv
    
    csv.register_dialect('otg', 'excel', delimiter='|')
    print(csv.list_dialects())
    
    d = csv.get_dialect('otg')
    print("Delimiter: ", repr(d.delimiter))
    print("Doublequote: ", d.doublequote)
    print("Escapechar: ", d.escapechar)
    print("lineterminator: ", repr(d.lineterminator))
    print("quotechar: ", d.quotechar)
    print("Quoting: ", d.quoting)
    print("skipinitialspace: ", d.skipinitialspace)
    print("strict: ", d.strict)
    

    Output:

    ['excel', 'excel-tab', 'unix', 'otg']
    Delimiter:  '|'
    Doublequote:  1
    Escapechar:  None
    lineterminator:  '\r\n'
    quotechar:  "
    Quoting:  0
    skipinitialspace:  0
    strict:  0
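Once a dialect is registered, its name can be passed to csv.writer and csv.reader. The following is a minimal sketch of a round trip through an in-memory buffer; the dialect name 'pipe_demo' and the sample rows are hypothetical and chosen only for this example.

```python
import csv
import io

# 'pipe_demo' is a made-up name; it copies 'excel' but uses | as the delimiter.
csv.register_dialect("pipe_demo", "excel", delimiter="|")

buf = io.StringIO()
writer = csv.writer(buf, dialect="pipe_demo")
writer.writerow(["name", "city"])
writer.writerow(["Ann", "Oslo|Bergen"])  # contains the delimiter, so it gets quoted

written = buf.getvalue()
print(repr(written))

# Reading back with the same dialect restores the original fields.
parsed = list(csv.reader(io.StringIO(written), dialect="pipe_demo"))
print(parsed)

# Clean up so the example leaves the registry as it found it.
csv.unregister_dialect("pipe_demo")
```

Using the same dialect name on both sides keeps the writer and reader in agreement about the delimiter and quoting rules.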
    
  • csv.unregister_dialect(name)

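    The unregister_dialect function removes the dialect registered under name from the dialect registry. A csv.Error is raised if name is not a registered dialect.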

    import csv
    csv.register_dialect('otg', 'excel', delimiter='|')
    print(csv.list_dialects())
    csv.unregister_dialect('otg')
    print(csv.list_dialects())
    

    Output:

    ['excel', 'excel-tab', 'unix', 'otg']
    ['excel', 'excel-tab', 'unix']
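After a dialect has been unregistered, looking it up or unregistering it again fails with csv.Error. Here is a small sketch of that behaviour, using the hypothetical dialect name 'tmp_demo':

```python
import csv

# 'tmp_demo' is a made-up name used only for this demonstration.
csv.register_dialect("tmp_demo", delimiter=";")
csv.unregister_dialect("tmp_demo")

try:
    csv.get_dialect("tmp_demo")
    lookup_failed = False
except csv.Error:
    lookup_failed = True

try:
    csv.unregister_dialect("tmp_demo")
    second_unregister_failed = False
except csv.Error:
    second_unregister_failed = True

print(lookup_failed, second_unregister_failed)
```

Catching csv.Error is therefore a reasonable way to handle names that may or may not be registered.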
    
  • csv.Sniffer

    The Sniffer class of the csv module has methods that help infer the format of a CSV file. It has two methods: sniff() and has_header().

    sniff(sample, delimiters=None) - the sniff method analyses the sample text provided and returns a dialect class reflecting the parameters it found. Optionally, you can pass a string containing the candidate delimiter characters. If the format cannot be determined, a csv.Error is raised.

    import csv
    with open('mycsvfile.csv', 'r', newline='') as csv_file:
        dialect = csv.Sniffer().sniff(csv_file.read(1024))
        print(dialect.delimiter)
    

    Output:

    ,
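The sniffer can also be tried on text that is already in memory. The sketch below restricts the candidate delimiters and falls back to the excel dialect if sniffing fails; the sample text and the fallback choice are assumptions made for this example.

```python
import csv
import io

# Hypothetical semicolon-separated sample.
sample = "name;age;city\nAnn;34;Oslo\nBob;41;Bergen\n"

try:
    # Only consider ; , and | as possible delimiters.
    dialect = csv.Sniffer().sniff(sample, delimiters=";,|")
except csv.Error:
    # Sniffing is a heuristic and may fail; fall back to a known dialect.
    dialect = csv.excel

print(repr(dialect.delimiter))

# The detected dialect can be passed straight to the reader.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows)
```

Restricting the delimiters is useful when short samples contain other characters that happen to repeat consistently on every line.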
    

    has_header(sample) - the has_header method analyses the sample text and returns True if the first row appears to be a header row, or False otherwise. The result is a heuristic guess, so it can be wrong for some files.

    import csv
    with open('mycsvfile.csv', 'r', newline='') as csv_file:
        print(csv.Sniffer().has_header(csv_file.read(1024)))
    

    Output:

    False
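To illustrate the difference, here is a minimal sketch that runs has_header() on two small in-memory samples, one with a header row and one without. Both samples are made up for this example.

```python
import csv

# Hypothetical samples: identical data, with and without a header row.
with_header = "name,age\nAnn,34\nBob,41\nCid,29\n"
without_header = "Ann,34\nBob,41\nCid,29\n"

sniffer = csv.Sniffer()
has_header_first = sniffer.has_header(with_header)
has_header_second = sniffer.has_header(without_header)

print(has_header_first)
print(has_header_second)
```

In the first sample, the first row differs from the rows below it: the text "age" sits above a numeric column, and "name" has a different length from the values under it. That difference is the kind of signal the heuristic looks for. A header whose cells look just like the data beneath them may go undetected.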
    
