MySQL import (or: How I found the LazyWeb)

December 13, 2005 9:24 am Dave Neary work

O Lazyweb, o lazyweb…

This is the first time I have called on you, please don’t let me down.

I have a bunch of data files (dumped from an Oracle database) that look like this:

     17;    12,43;Data dump of integer and float fields
     18;    -4,10;The decimal separator is ,

I’d like to load the data into a MySQL database with

mysqlimport -Lr --fields-terminated-by=';' --fields-optionally-enclosed-by='"' database input.txt

but those pesky floating point fields are getting truncated at the , and no amount of messing with LC_NUMERIC gets it to use the comma as a decimal separator.

Anyone know how to massage mysql to do this?

7 Responses

Stuart Langridge Says:
December 13th, 2005 at 7:59 pm
I assume that chucking the file through sed ‘s/$[0-9]*; *-*[0-9]*$,$[0-9]*;$/\1.\2/’ is unavailable for some reason?
Raphaël Quinet Says:
December 13th, 2005 at 11:32 pm
The mysqlimport tool is only sending the SQL command LOAD DATA INFILE… to the MySQL server. Changing LC_NUMERIC on the client side should have no effect. And you probably do not want to change LC_NUMERIC on the server side, so the best way is probably to transform the input file as suggested in the previous comment.

If floating-point numbers can occur in any column, the substitution can be done with: ‘s/$[0-9]+$,$[0-9]+$/\1.\2/’
Dave Neary Says:
December 14th, 2005 at 2:24 am
the s/,/. trick doesn’t work because the database has a bucketload of numeric CSV fields like “12,15,32,18” to represent lists of numbers (don’t get me started).

It looks like the easiest thing to do is to redo the dump (ech!) with

alter session set NLS_NUMERIC_CHARACTERS=”.,”

I also needed to add –fields-escaped-by=” since Oracle doesn’t quote fields or escape characters in there, and some text fields used ; as a separator.
Dave Neary Says:
December 14th, 2005 at 2:25 am
Alternatively, starting the server in French mode, doing the import, then restarting it in English mode is an option. Will try that (thanks for the idea, Raphaël).
Jared Sulem Says:
December 14th, 2005 at 3:11 am
You could try the following program to convert floating fields seperated with commas, into fields seperating with decimal points. Input and output is from/to standard input and standard output.

—-
#!/usr/bin/env python

import sys
import re
import csv

if __name__ == “__main__”:
reader = csv.reader(sys.stdin, delimiter = “;”)
writer = csv.writer(sys.stdout, delimiter = “;”)

fre = re.compile(“^(\s*[0-9]*),([0-9]*\s*)$”)

for row in reader:
newrow = ()
for field in row:
match = fre.match(str(field))
if match:
newrow = newrow + (match.group(1) + “.” + match.group(2), )
else:
newrow = newrow + (field, )

writer.writerow(newrow)
Jared Sulem Says:
December 14th, 2005 at 3:14 am
It might be best to copy the program in the previous comment from the source HTML, since the tabs are lost in the browser rendering.
Raphaël Quinet Says:
December 14th, 2005 at 3:42 am
If the database output uses commas as decimal separators and uses the same character for separating list elements, then it is asking for trouble.

But these two meanings for the comma are almost certainly occurring in different columns, so the s/,/./ trick can still be used as suggested by Stuart Langridge: make sure that your regexp only matches the column in which the replacements should be made.

Safe as Milk

MySQL import (or: How I found the LazyWeb)

7 Responses

Search Posts

Recent Posts

Recent Comments

Archives

GNOME News