MySQL import (or: How I found the LazyWeb)

9:24 am work

O Lazyweb, o lazyweb…

This is the first time I have called on you, please don’t let me down.

I have a bunch of data files (dumped from an Oracle database) that look like this:

     17;    12,43;Data dump of integer and float fields
     18;    -4,10;The decimal separator is ,

I’d like to load the data into a MySQL database with

mysqlimport -Lr --fields-terminated-by=';' --fields-optionally-enclosed-by='"' database input.txt

but those pesky floating point fields are getting truncated at the , and no amount of messing with LC_NUMERIC gets it to use the comma as a decimal separator.

Anyone know how to massage mysql to do this?

7 Responses

  1. Stuart Langridge Says:

    I assume that chucking the file through sed ‘s/\([0-9]*; *-*[0-9]*\),\([0-9]*;\)/\1.\2/’ is unavailable for some reason?

  2. Raphaël Quinet Says:

    The mysqlimport tool is only sending the SQL command LOAD DATA INFILE… to the MySQL server. Changing LC_NUMERIC on the client side should have no effect. And you probably do not want to change LC_NUMERIC on the server side, so the best way is probably to transform the input file as suggested in the previous comment.

    If floating-point numbers can occur in any column, the substitution can be done with: ‘s/\([0-9]+\),\([0-9]+\)/\1.\2/’

  3. Dave Neary Says:

    the s/,/. trick doesn’t work because the database has a bucketload of numeric CSV fields like “12,15,32,18” to represent lists of numbers (don’t get me started).

    It looks like the easiest thing to do is to redo the dump (ech!) with

    alter session set NLS_NUMERIC_CHARACTERS=”.,”

    I also needed to add –fields-escaped-by=” since Oracle doesn’t quote fields or escape characters in there, and some text fields used ; as a separator.

  4. Dave Neary Says:

    Alternatively, starting the server in French mode, doing the import, then restarting it in English mode is an option. Will try that (thanks for the idea, Raphaël).

  5. Jared Sulem Says:

    You could try the following program to convert floating fields seperated with commas, into fields seperating with decimal points. Input and output is from/to standard input and standard output.

    —-
    #!/usr/bin/env python

    import sys
    import re
    import csv

    if __name__ == “__main__”:
    reader = csv.reader(sys.stdin, delimiter = “;”)
    writer = csv.writer(sys.stdout, delimiter = “;”)

    fre = re.compile(“^(\s*[0-9]*),([0-9]*\s*)$”)

    for row in reader:
    newrow = ()
    for field in row:
    match = fre.match(str(field))
    if match:
    newrow = newrow + (match.group(1) + “.” + match.group(2), )
    else:
    newrow = newrow + (field, )

    writer.writerow(newrow)

  6. Jared Sulem Says:

    It might be best to copy the program in the previous comment from the source HTML, since the tabs are lost in the browser rendering.

  7. Raphaël Quinet Says:

    If the database output uses commas as decimal separators and uses the same character for separating list elements, then it is asking for trouble.

    But these two meanings for the comma are almost certainly occurring in different columns, so the s/,/./ trick can still be used as suggested by Stuart Langridge: make sure that your regexp only matches the column in which the replacements should be made.