Thursday, February 17, 2022

[SOLVED] Print lines that are not numbers

Issue

Quite simply, I have a CSV file with one column that should contain only integers. However, not all of the values are integers, and I want to check this file (over 5 GB) and capture the line numbers and, preferably, the values that are not integers. I've tried a number of things, such as using masks, but to no avail.

For example, given the following CSV table:

ID
5342
76375
sdfg23
2342lslf
jfijfojwo
395-34425
abc-24523
afhfhue3224

I would want to know that rows 3, 4, 5, 6, 7, and 8 are not integers (counting data rows from 1, so the ID header line is not counted). The output would look like this (as a dataframe/table equivalent):

+-------------+------+
| ID          | Row  |
+-------------+------+
| sdfg23      | 3    |
| 2342lslf    | 4    |
| jfijfojwo   | 5    |
| 395-34425   | 6    |
| abc-24523   | 7    |
| afhfhue3224 | 8    |
+-------------+------+

Or even just printing the line numbers to standard output would be really helpful.

I've tried things like sed, for example:

sed -n '/?![[:digit:]]=' csvfile.csv


Solution

You can check whether each line contains any non-digit character.

$ # the -n option prefixes each matching line with its line number
$ grep -n '[^0-9]' ip.txt
1:ID
4:sdfg23
5:2342lslf
6:jfijfojwo
7:395-34425
8:abc-24523
9:afhfhue3224
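The sed attempt in the question fails because sed regular expressions have no lookahead syntax such as ?!. A minimal sketch of what it was probably aiming for: select lines that consist entirely of digits, negate that address with !, and let sed's = command print the line number of every other line. Unlike the [^0-9] pattern, this also reports empty lines. The helper name non_integer_lines and the temporary sample file are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Sample data from the question, written to a temporary file (illustrative only).
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
ID
5342
76375
sdfg23
2342lslf
jfijfojwo
395-34425
abc-24523
afhfhue3224
EOF

# Hypothetical helper: print the number of every line that is NOT made up
# solely of one or more digits (empty lines are reported too).
#   -n                      suppress sed's normal output
#   /^[[:digit:]]\{1,\}$/   address: the whole line is digits
#   !=                      run "=" (print line number) on lines NOT matching
non_integer_lines() {
  sed -n '/^[[:digit:]]\{1,\}$/!=' "$1"
}

non_integer_lines "$f"
```

Each line number is printed on its own line, and the header counts as line 1, just as in the grep output.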

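If you need further processing, awk is a good fit. Below is just an example; modify it as per your needs. NR holds the current line number, so NR-1 numbers the rows without the header line, as in the expected output from the question.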

$ awk 'NR==1{print "ID Row"; next} /[^0-9]/{print $0, NR-1}' ip.txt
ID Row
sdfg23 3
2342lslf 4
jfijfojwo 5
395-34425 6
abc-24523 7
afhfhue3224 8
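The awk example above tests the whole line, which works for the single-column sample. If the integers live in one column of a wider CSV, a minimal sketch is to split on commas with -F, and test only that field. Assumptions, all illustrative: the helper name report_non_integers, the column number passed as col (2 here), fields that never contain quoted commas, and an optional leading minus sign so negative integers are not reported (drop -? from the pattern to accept only non-negative values). Because awk reads one line at a time, this also works on a 5 GB file without loading it into memory.

```shell
#!/bin/sh
# Hypothetical three-column sample; the integer column is column 2.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
name,ID,city
a,5342,x
b,76375,y
c,sdfg23,z
d,,w
e,-17,v
f,395-34425,u
EOF

# Hypothetical helper: report_non_integers FILE COLUMN
# Prints a header, then the data-row number and value for each row whose
# COLUMN field is not an integer (empty fields are reported too).
# Assumes plain comma separation (no quoted fields containing commas).
report_non_integers() {
  awk -F, -v col="$2" '
    NR == 1 { print "Row\tID"; next }   # skip header; data rows start at 1
    $col !~ /^-?[0-9]+$/ { printf "%d\t%s\n", NR - 1, $col }
  ' "$1"
}

report_non_integers "$f" 2
```

Here rows 3 (sdfg23), 4 (empty field) and 6 (395-34425) are reported, while -17 is accepted as an integer.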


Answered By - Sundeep
Answer Checked By - Robin (WPSolving Admin)