Page 1 of 1

Very slow Recipient Import

Posted: Thu Nov 02, 2006 10:24 pm
by quasimotoca
Hi:
The import of recipient lists is very very slow. Even on moderately sized lists of around 5K. I tried a list of 185K and I got a heap error. Is this something that you are aware of and are looking into? Is there a quick and dirty way I can bulk load a .csv file into the database? Maybe a simple SQL query from the command-line? Other than that, killer system folks, well done!!
Cheers,
Dave Cook

Index

Posted: Fri May 18, 2007 4:19 pm
by ramnel
I was having the same problem and a friend found that the file was not index. It now will upload 5,000 emails in 4 seconds. I didn't do it myself but I know any "MYSQl how to book" will tell you how to index a file.

Posted: Wed Sep 12, 2007 6:32 am
by carlshark
Same problem here.

Yes, but the databases have default indexes already. customer_1_tbl has customer_id as index. To what column did you change your index to?

yes.

Posted: Wed Nov 07, 2007 4:41 pm
by gcarr
please share your solutions as it has been frustrating for a non-uber-techie - but I have gotten this far...
thanks for your help

Posted: Tue Nov 20, 2007 1:14 am
by carlshark
Aside from logging in through SSH, starting top, and renice-ing the mysqld process, I don't have any fast solution.

To optimize MySQL settings using mysqladmin, I had to look up the man page and play around with max settings.

Even when optimized, I still cannot reliably upload more than 6000+ email addresses (minimum of 5 fields only, CSV). It takes me about an hour to upload a list of this size, and you have to babysit it.

import performance

Posted: Tue Nov 20, 2007 8:37 am
by maschoff
Could you please post the OpenEMM and the MySQL version you are using?

Posted: Wed Nov 28, 2007 2:06 am
by austin
I have seen this problem and the single issue is that none of the tables have more than one index on them (primary key) which causes everything to be very slow.

Rule of thumb: If you are looking up a field (column) in WHERE or ORDER BY that field needs to have an index. Yes you can end up with a lot of extra space used but not having any keys is absolutely stupid. Whoever came up with the schema for these tables needs to get a book on SQL and read the sections on Indexing and Optimizations.

I will hopefully post some sql for you guys in a few days with proper indexes created on them.

Example: customer_1_tbl has no keys other than primary.
Add indexes on the following: email, mailtype, datasourrce_id, and any custom defined fields. When you run an import and verify email address against all records it will take 1 day to complete if you have several thousand records.

Does any one have a list of most of the sql queries used in the import and sending sections? I am not good with Java so it will take me a lot longer to find the queries than for someone just to list them all out.

Whoever is in charge of the SQL please, please go read a manual then go back an optimize the table Schemas to add the proper indexes.

-Austin

Posted: Wed Nov 28, 2007 2:23 am
by austin
On second glance it also appears that there is no indexing on the tmp tables created during the import process which slows things considerably as well.

-Austin

Posted: Wed Nov 28, 2007 12:26 pm
by maschoff
Field email is indexed. If you plan to use a different field (like customer ID) for check of duplicates or synchronization you should put an index on this field.

If a MySQL expert can provide an improvement for our code we are happy to accept it since we are more familiar with Oracle.

Posted: Thu Dec 06, 2007 4:14 am
by yiyihui
I have same problem. I try to monitor mysql server and find the following sql query could cause this problem:
select count(*) from ( SELECT distinct cust.customer_id, cust.gender, cust.firstname, cust.lastname, cust.email FROM customer_1_tbl cust WHERE 1 ) as tmp_tbl

I don't know why using this query to count the total number of email. when I run this query, it needs more 4 minutes to get the result (I have 462464 emails), but if I use "select count(*) from customer_1_tbl", it will get same result and just spend less than 1 second.

Posted: Fri Dec 07, 2007 12:23 pm
by duke
I've had the same problem. I tried to import ~50k user and it was awfully slow. After 1/2 an hour I stopped the process and looked deeper into the database. Here is my solution:

- connect to the mysql-server an change to the openemm-db
- add indexes to 'email', 'gender', 'firstname' and 'lastname' within the customer_1_tbl-table:

Code: Select all

mysql> ALTER TABLE customer_1_tbl ADD INDEX(email);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(gender);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(firstname);
mysql> ALTER TABLE customer_1_tbl ADD INDEX(lastname);
After that I was able to import the ~50k users within ~30 seconds.

I added some more indexes to customer_1_tbl and customer_1_binding_tbl which will hopefully speed up mail creation, too (well, I actually can't compare ... but indexes are usefull most of the time ;))

Posted: Fri Dec 07, 2007 1:33 pm
by duke
Just for the record:
with growing addresses the import slows down again. I needed ~1 minute for the last 9k users of my ~340k list.

The recipient overview slows down, too. I think the problem is the creation of temp-tables, I don't know why they are used (this needs a lot of diskspace, too ... ~600MB in my case).

Posted: Thu Dec 13, 2007 3:38 pm
by maschoff
From release 5.3.2 on field email is indexed by default. Thanks for your input!