Saturday, February 21, 2009

PHP: how to make amendments to a CSV file

in my third article for today i want to write about a problem i had with a massive csv file the other night:

i got this big CSV file which contains over 3300 lines of entries. those entries include a date, but i can't use that date for sorting. first of all it's written among a string of other details; it's written as 20 February 2000 or February 20, 2000 or February 20 2000. that's 3 different ways. i decided that i needed another column in my CSV file where i'd store the date as a sortable value. i found that a leading id-number was followed by 3 whitespaces, and that those 3 whitespaces were otherwise not used in the document. so i decided to open my CSV file with WordPad and change the 3 whitespaces to 2 pipes thus: ||. inbetween those pipes i would store the date value i want to use for sorting. i spent about 2 nights entering the date by copying and pasting it right there when on the second night my screen got stuck again and i had to re-start my computer. i was just paging down the document when it happened; and i thought that i had just saved the document.

after re-starting my laptop i found out that the document had been in the process of saving when the screen got stuck. only a third of this document was still available to me. of course i had some kind of a backup copy, but that was the copy without the pipes. now i had to decide: do i start from scratch or am i going to look around for a script that could maybe do it with PHP which i wanted to do all the while anyway?.

i was a bit upset about all this since it was such a boring job to do and i knew i would be glad once i'd reached the end of the file.

alright, let me search the internet for some kind of a solution that may come close to what i want to achieve. now, what am i going to search for? how about php amend csv file?

in a german forum i found someone who tried to search for results in a CSV document: http://forum.de.selfhtml.org/archiv/2008/5/t171322/.

i started playing with that one. the person couldn't get his script to work.

here's my version of the script:

<pre style="font-family:'Courier New', Courier; font-size:11px; color:blue;">
<?php
define('SEARCH', '1961');
$s = (isset($_GET['s'])) ? $_GET['s'] : SEARCH;
$csv_file_open = fopen("worksheet.csv", "r");
$row = 1;
while ($data = fgetcsv ($csv_file_open, 1000, "|")) {
if (stristr ( $data[2], $s )) {
echo $data[0] . "|" . $s . "-";
echo $data[1] . "|";
echo $data[2] . "\n";
}
$row++;
}
fclose($csv_file_open);
?>
</pre>


this code snippet lets me search my document for the date 1961 (as a default value) and squeezes it between the id number and the other string devided by the pipe symbol, provided it finds the date value i've given. if i want another search result i could add ?s=1980, for instance, as a url parameter after the file name. the problem is, if i search for New York City, it would enter New York City, where found, between those 2 pipe symbols ... hm ... not exactly what i had in mind.

i gave the idea some more thought and came up with a solution that uses $y as a variable for the year, $m as a variable for the month and $d as a variable for the day. hm, that looks pretty good for starters.

as usual, i made a copy of my first example for keepsake and decided to work with a new php file i named csv-add-date.php.

i kept some of the old code, but for the moment i wouldn't need the $s variable.

here's my code in the csv-add-date.php file:

<pre style="font-family:'Courier New', Courier; font-size:11px; color:blue;">
<?php
define('SEARCH', '1961');
$s = (isset($_GET['s'])) ? $_GET['s'] : SEARCH;
$csv_file_open = fopen("clean_sheet.csv", "r");
$row = 1;
while ($data = fgetcsv ($csv_file_open, 1000, "|")) {
include ( 'csv-calculate-year.php' );
include ( 'csv-calculate-01-jan-day.php' );
include ( 'csv-calculate-02-feb-day.php' );
include ( 'csv-calculate-03-mar-day.php' );
include ( 'csv-calculate-04-apr-day.php' );
include ( 'csv-calculate-05-may-day.php' );
include ( 'csv-calculate-06-jun-day.php' );
include ( 'csv-calculate-07-jul-day.php' );
include ( 'csv-calculate-08-aug-day.php' );
include ( 'csv-calculate-09-sep-day.php' );
include ( 'csv-calculate-10-oct-day.php' );
include ( 'csv-calculate-11-nov-day.php' );
include ( 'csv-calculate-12-dec-day.php' );
echo $data[0] . "|" . $y . "-" . $m . "-" . $d;
echo $data[1] . "|";
echo $data[2] . "\n";
$row++;
}
fclose($csv_file_open);
?>
</pre>


i'm also using a new CSV file which i named clean_sheet.csv.

... and here is an excerpt from my csv-calculate-year.php:

<?php
if (stristr ( $data[2], '1958' )) { $y = '1958'; }
elseif (stristr ( $data[2], '1960' )) { $y = '1960'; }
elseif (stristr ( $data[2], '1961' )) { $y = '1961'; }
elseif (stristr ( $data[2], '1962' )) { $y = '1962'; }
elseif (stristr ( $data[2], '1963' )) { $y = '1963'; }
elseif (stristr ( $data[2], '1964' )) { $y = '1964'; }
elseif (stristr ( $data[2], '1965' )) { $y = '1965'; }
elseif (stristr ( $data[2], '1966' )) { $y = '1966'; }
elseif (stristr ( $data[2], '1967' )) { $y = '1967'; }
elseif (stristr ( $data[2], '1968' )) { $y = '1968'; }
elseif (stristr ( $data[2], '1969' )) { $y = '1969'; }
elseif (stristr ( $data[2], '1970' )) { $y = '1970'; }
...
?>


another example from one of the files that calculate the month and day:

<?php
if ( (stristr ( $data[2], ' 1 January ' )) || (stristr ( $data[2], ' January 1 ' )) || (stristr ( $data[2], ' January 1,' )) ) { $m = '01'; $d = '01'; }
elseif ( (stristr ( $data[2], ' 2 January ' )) || (stristr ( $data[2], ' January 2 ' )) || (stristr ( $data[2], ' January 2,' )) ) { $m = '01'; $d = '02'; }
elseif ( (stristr ( $data[2], ' 3 January ' )) || (stristr ( $data[2], ' January 3 ' )) || (stristr ( $data[2], ' January 3,' )) ) { $m = '01'; $d = '03'; }
elseif ( (stristr ( $data[2], ' 4 January ' )) || (stristr ( $data[2], ' January 4 ' )) || (stristr ( $data[2], ' January 4,' )) ) { $m = '01'; $d = '04'; }
elseif ( (stristr ( $data[2], ' 5 January ' )) || (stristr ( $data[2], ' January 5 ' )) || (stristr ( $data[2], ' January 5,' )) ) { $m = '01'; $d = '05'; }
elseif ( (stristr ( $data[2], ' 6 January ' )) || (stristr ( $data[2], ' January 6 ' )) || (stristr ( $data[2], ' January 6,' )) ) { $m = '01'; $d = '06'; }
elseif ( (stristr ( $data[2], ' 7 January ' )) || (stristr ( $data[2], ' January 7 ' )) || (stristr ( $data[2], ' January 7,' )) ) { $m = '01'; $d = '07'; }
elseif ( (stristr ( $data[2], ' 8 January ' )) || (stristr ( $data[2], ' January 8 ' )) || (stristr ( $data[2], ' January 8,' )) ) { $m = '01'; $d = '08'; }
elseif ( (stristr ( $data[2], ' 9 January ' )) || (stristr ( $data[2], ' January 9 ' )) || (stristr ( $data[2], ' January 9,' )) ) { $m = '01'; $d = '09'; }
elseif ( (stristr ( $data[2], ' 10 January ' )) || (stristr ( $data[2], ' January 10 ' )) || (stristr ( $data[2], ' January 10,' )) ) { $m = '01'; $d = '10'; }
...
?>


that last code is included in 12 files one for each month. in each one i had to replace the name of the month, say January with March for instance; and $m = '01 ought to be replaced with $m = '02 in the case of February.

those year and month/day files are included in the csv-add-date.php file using PHP's include function.

this csv-add-date.php file basically loads the clean_sheet.csv file and adds the year, month and day before it displays the content on the screen. it does not write the content to the clean_sheet.csv file. i think i'm going to get that done in another step to another new CSV file ... in case something should go wrong again.

now i can save myself a lot of time ... and boredom ... and repeat the process if necessary. this exercise has also taught me how to search through CSV files and display the results. that may be the topic of another article ...

i hope that this has been of some help to PHP coders.

Good Luck!

tom.paine


No comments: