MySQL import (or: How I found the LazyWeb)
December 13, 2005 9:24 am workO Lazyweb, o lazyweb…
This is the first time I have called on you, please don’t let me down.
I have a bunch of data files (dumped from an Oracle database) that look like this:
17; 12,43;Data dump of integer and float fields 18; -4,10;The decimal separator is ,
I’d like to load the data into a MySQL database with
mysqlimport -Lr --fields-terminated-by=';' --fields-optionally-enclosed-by='"' database input.txt
but those pesky floating point fields are getting truncated at the , and no amount of messing with LC_NUMERIC gets it to use the comma as a decimal separator.
Anyone know how to massage mysql to do this?
December 13th, 2005 at 7:59 pm
I assume that chucking the file through sed ‘s/\([0-9]*; *-*[0-9]*\),\([0-9]*;\)/\1.\2/’ is unavailable for some reason?
December 13th, 2005 at 11:32 pm
The mysqlimport tool is only sending the SQL command LOAD DATA INFILE… to the MySQL server. Changing LC_NUMERIC on the client side should have no effect. And you probably do not want to change LC_NUMERIC on the server side, so the best way is probably to transform the input file as suggested in the previous comment.
If floating-point numbers can occur in any column, the substitution can be done with: ‘s/\([0-9]+\),\([0-9]+\)/\1.\2/’
December 14th, 2005 at 2:24 am
the s/,/. trick doesn’t work because the database has a bucketload of numeric CSV fields like “12,15,32,18” to represent lists of numbers (don’t get me started).
It looks like the easiest thing to do is to redo the dump (ech!) with
alter session set NLS_NUMERIC_CHARACTERS=”.,”
I also needed to add –fields-escaped-by=” since Oracle doesn’t quote fields or escape characters in there, and some text fields used ; as a separator.
December 14th, 2005 at 2:25 am
Alternatively, starting the server in French mode, doing the import, then restarting it in English mode is an option. Will try that (thanks for the idea, Raphaël).
December 14th, 2005 at 3:11 am
You could try the following program to convert floating fields seperated with commas, into fields seperating with decimal points. Input and output is from/to standard input and standard output.
—-
#!/usr/bin/env python
import sys
import re
import csv
if __name__ == “__main__”:
reader = csv.reader(sys.stdin, delimiter = “;”)
writer = csv.writer(sys.stdout, delimiter = “;”)
fre = re.compile(“^(\s*[0-9]*),([0-9]*\s*)$”)
for row in reader:
newrow = ()
for field in row:
match = fre.match(str(field))
if match:
newrow = newrow + (match.group(1) + “.” + match.group(2), )
else:
newrow = newrow + (field, )
writer.writerow(newrow)
December 14th, 2005 at 3:14 am
It might be best to copy the program in the previous comment from the source HTML, since the tabs are lost in the browser rendering.
December 14th, 2005 at 3:42 am
If the database output uses commas as decimal separators and uses the same character for separating list elements, then it is asking for trouble.
But these two meanings for the comma are almost certainly occurring in different columns, so the s/,/./ trick can still be used as suggested by Stuart Langridge: make sure that your regexp only matches the column in which the replacements should be made.