How to read a CSV file using Perl?

Reading and processing text files is one of the common tasks done by Perl. For example, you often encounter a CSV file (where CSV stands for Comma-separated values) and need to extract some information from it. Here is an example with three solutions: Good, Better, Best. The first is a reasonable solution for simple CSV files and requires nothing beyond Perl. The second fixes some problems caused by slightly more complex CSV files. The third is probably the best solution. The price is that these solutions depend on a module from CPAN. Pick the one that matches your needs.

I got a CSV file that looked like this:

    Tudor,Vidor,10,Hapci
    Szundi,Morgo,7,Szende
    Kuka,Hofeherke,1...,Kiralyno
    Boszorkany,Herceg,9,Meselo

This is a CSV file. In each row there are fields separated by commas. Of course the separator can be any character as long as it is the same in the whole file. The most common separators are the comma (CSV) and TAB (TSV), but people often use the semicolon or the pipe | as well.

Anyway, the task was to sum the numbers in the 3rd column. The process should go like this:

1. Read in the file line by line.
2. For each line, extract the 3rd column.
3. Add the value to a central variable where we accumulate the sum.
We have already learned earlier how to read a file line by line, so the remaining question is how to extract the 3rd column from each row. I cannot easily use substr(), as the position of the 3rd field changes from row to row. What is fixed is that it sits between the 2nd and the 3rd comma. I could call index() 3 times on each row to locate the 2nd and the 3rd comma, but Perl has a much easier way for this.

Using split

split() usually gets two parameters. The first is a knife, the second is the string that needs to be cut into pieces. The knife is actually a regular expression, but for now we can stick to simple strings. If I have a string such as $str = "Tudor:Vidor:10:Hapci", I can call @fields = split(":", $str); and the array @fields will be filled with 4 values: "Tudor", "Vidor", "10", "Hapci". If I print $fields[2] I'll see 10 on the screen, as the indexing of the array starts from 0.

In our case the field separator character is a comma and not a colon. We can write our script like this: take the file name from $ARGV[0], or die with "Need to get CSV file on the command line\n"; open the file, or die with "Could not open '$file' $!\n"; then, for each line, split it on the comma and add $fields[2] to $sum. If you save this as a script, you can run it with the name of the CSV file on the command line.

Comma in the field

Every time you get a CSV file you can use this script to add up the values in the 3rd column. Unfortunately, at some point you get warnings while running your script:

    Argument " alma"" isn't numeric in addition (+)

You open the CSV file and it looks like this:

    Tudor,Vidor,10,Hapci
    Szundi,Morgo,7,Szende
    Kuka,"Hofeherke, alma",1...,Kiralyno
    Boszorkany,Herceg,9,Meselo
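To make the failure concrete, here is a minimal sketch of the split-based approach described above, run on in-memory copies of the two sample files rather than on a file named on the command line. The helper name sum_third_column is hypothetical, and because the number in the third row is cut off in the sample data, 100 is used as a stand-in value.

```perl
use strict;
use warnings;

# Sum the 3rd column (index 2) of comma-separated lines using plain split.
# sum_third_column is a hypothetical helper name used only in this sketch.
sub sum_third_column {
    my @lines = @_;
    my $sum = 0;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /,/, $line;
        $sum += $fields[2];
    }
    return $sum;
}

# Simple data: no field contains a comma. The 100 is a stand-in value.
my @simple = (
    'Tudor,Vidor,10,Hapci',
    'Szundi,Morgo,7,Szende',
    'Kuka,Hofeherke,100,Kiralyno',
    'Boszorkany,Herceg,9,Meselo',
);
print sum_third_column(@simple), "\n";

# The same third row, but with a quoted field that contains a comma.
my @fields = split /,/, 'Kuka,"Hofeherke, alma",100,Kiralyno';
print scalar(@fields), " fields; index 2 holds: $fields[2]\n";
```

With the simple data the sum is 126. The quoted row is cut into 5 pieces instead of 4, and index 2 holds ` alma"` rather than the number, which is exactly the value the warning complains about when it is added to the sum.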
In the CSV file above, the 2nd field in the 3rd row has a comma in its value, so the people who wrote the file put the field in quotes: "Hofeherke, alma". This is totally normal within the "standard" of CSV, but our script cannot properly handle the situation. split() knows nothing about CSV quoting. It just cuts wherever it finds the separator character.

We need a more robust solution to read CSV files. Luckily we can find a module on CPAN called Text::CSV that is a full CSV reader and writer. This module is written using Object Oriented Programming (OOP) principles. Even if you don't know what OOP is, you don't have to worry. We won't really learn OOP at this point. We only need a little more syntax and a few expressions.

The new version of the script creates a parser with Text::CSV->new({ sep_char => ',' }). As before, it takes the file name from $ARGV[0] or dies with "Need to get CSV file on the command line\n", and opens the file or dies with "Could not open '$file' $!\n". For a row that cannot be parsed it prints the warning "Line could not be parsed: $line\n".

Text::CSV is a 3rd-party extension to Perl. It provides a set of new functionality for reading and writing CSV files. Perl programmers call these 3rd-party extensions modules, though people coming from other languages may know them by other names. At this point I assume you already have the module installed on your computer. Installation is discussed separately.

First we need to load the module using use Text::CSV;. We don't need to say what to import. It works in an object oriented way: you need to create an instance and then use that instance. The module itself, Text::CSV, is the class, and you can create an instance, also called an object, by calling the constructor. In Perl there is no strict rule on how to name the constructor; here it is called new. The way to call the constructor on the class is using the arrow ->:

    Text::CSV->new({ sep_char => ',' })

This call creates an object, setting the separator character to be a comma (,). An object is just a scalar value. Actually, comma is the default separator character, but it seems clearer to set it explicitly.

Most of the other code is the same, but the 2 lines that called split and added to $sum are replaced. The Text::CSV module does not have a split function. In order to split the line you need to call, to use the OOP phrase, the "parse method".
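A minimal sketch of the constructor call and a single parse, assuming Text::CSV is installed from CPAN. The variable names $csv, $line and @fields are illustrative choices, and the 100 in the sample row is a stand-in because that number is cut off in the sample data.

```perl
use strict;
use warnings;
use Text::CSV;

# Create the parser object. Comma is already the default separator;
# it is spelled out here only for clarity.
my $csv = Text::CSV->new({ sep_char => ',' });

# A row whose 2nd field is quoted and contains a comma.
my $line = 'Kuka,"Hofeherke, alma",100,Kiralyno';

my @fields;
if ($csv->parse($line)) {
    # parse() only reports success or failure; fields() returns the pieces.
    @fields = $csv->fields();
    print "3rd field: $fields[2]\n";
} else {
    warn "Line could not be parsed: $line\n";
}
```

Unlike plain split, the parser keeps "Hofeherke, alma" together as one field, so index 2 is the number again.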
Again we use the arrow (->) to call the method. The parse call will try to parse the current line and split it up into pieces. It does not return the pieces. It returns true or false depending on its success or failure. One common case when it fails is when there is only a single quotation mark in the line:

    Kuka,"Hofeherke, alma,1...,Kiralyno

If it fails we fall into the else part, print a warning, and go to the next line. If it succeeds we call the fields method, which returns the pieces. Then we can fetch the 3rd element (index 2), which is the number we want to add to the sum.

Multi-line fields

There can be further "problems" with the CSV file. For example, some fields might contain embedded newlines:

    Tudor,Vidor,10,Hapci
    Szundi,Morgo,7,Szende
    Kuka,"Hofeherke
    alma",1...,Kiralyno
    Boszorkany,Herceg,9,Meselo

The way we currently handle the CSV file cannot solve this problem, but the Text::CSV module can. This example is based on a comment by H. Merijn Brand, the maintainer of the Text::CSV_XS module. The script again takes the file name from $ARGV[0] or dies with "Need to get CSV file on the command line\n", creates the parser with Text::CSV->new, and opens the file or dies with "Could not open '$file' $!\n".

This changes the whole way we handle the file. Instead of reading manually line-by-line, we ask the Text::CSV module to read what it considers a line. This lets it handle fields with embedded newlines. We also turned on a couple of other flags in the module, and the file is opened in a way that handles UTF-8 characters correctly.

In addition, in this example the getline method returns a reference to an array, so the elements are reached through that reference rather than through a plain array.

Lastly, after the loop finishes we still need to know why it stopped, because getline returns false both at the end of the file and on an error. So we check whether we reached the end-of-file (eof). If not, we print the error message.

BTW, in case you were wondering, the values in the CSV file are the Hungarian names of the characters from Snow White and the Seven Dwarfs.
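The getline-based approach described above can be sketched as follows, assuming Text::CSV is installed from CPAN. To keep the sketch self-contained it reads from an in-memory filehandle instead of a file named on the command line. The binary and auto_diag options are one reasonable choice of flags, not necessarily the ones the original script used, and the 100 in the third record is a stand-in for the value that is cut off in the sample data.

```perl
use strict;
use warnings;
use Text::CSV;

# CSV text with a newline embedded in the quoted 2nd field of the 3rd record.
# The 100 is a stand-in value for this sketch.
my $content = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

# binary => 1 permits embedded newlines inside quoted fields;
# auto_diag => 1 makes the module report parse errors by itself.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, sep_char => ',' });

# Read from an in-memory filehandle so no external file is needed.
open(my $data, '<', \$content) or die "Could not open in-memory data: $!\n";

my $sum     = 0;
my $records = 0;
# getline() returns an array reference per record, or false at the end
# of the input or on a parse error.
while (my $fields = $csv->getline($data)) {
    $sum += $fields->[2];
    $records++;
}
# If the loop stopped before end-of-file, a parse error was the cause.
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;

print "$records records, sum $sum\n";
```

The five physical lines are read as 4 records, because the module treats the quoted newline as part of the field, and the sum comes out as 126.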