Issue
I have a problem similar to the one in https://superuser.com/questions/1245094/merging-multiple-files-based-on-the-common-column (merging multiple files based on a common column). I am close to a solution, but I am new to Python and need help tweaking the code so that it joins multiple files. The IDs and columns of the individual files look like this:
File1.txt
id SRR1071717
chr1:15039:-::chr1:15795:- 2
chr1:15948:-::chr1:16606:- 6
File2.txt
id SRR1079830
chr1:11672:+::chr1:12009:+ 10
chr1:11845:+::chr1:12009:+ 7
chrY:9756574:+::chrY:9757796:+ 0
My desired output
id SRR1071717 SRR1079830
chr1:15039:-::chr1:15795:- 2 0
chr1:15948:-::chr1:16606:- 6 0
chr1:11672:+::chr1:12009:+ 0 10
chr1:11845:+::chr1:12009:+ 0 7
chrY:9756574:+::chrY:9757796:+ 0 0
My code: Matrix.py
import sys

columns = []
data = {}
ids = set()

for filename in sys.argv[1:]:
    with open(filename, 'rU') as f:
        key = next(f).strip().split()[1]
        columns.append(key)
        data[key] = {}
        for line in f:
            if line.strip():
                id, value = line.strip().split()
                try:
                    data[key][int(id)] = value
                except ValueError as exc:
                    raise ValueError(
                        "Problem in line: '{}' '{}' '{}'".format(
                            id, value, line.rstrip()))
                ids.add(int(id))

print('\t'.join(['ID'] + columns))

for id in sorted(ids):
    line = []
    for column in columns:
        line.append(data[column].get(id, '0'))
    print('\t'.join([str(id)] + line))
I ran the Python code shown above, but it does not work correctly: the output contains only two lines.
python3 matrix.py File*.txt
Current output
id SRR1071717 SRR1079830
chrY:9756574:+::chrY:9757796:+ 0 0
Solution
Using any awk:
$ cat tst.awk
FNR == 1 { ++numCols }
{
    if ( !($1 in ids2rows) ) {
        rows2ids[++numRows] = $1
        ids2rows[$1] = numRows
    }
    rowNr = ids2rows[$1]
    vals[rowNr,numCols] = $2
}
END {
    for ( rowNr=1; rowNr<=numRows; rowNr++ ) {
        id = rows2ids[rowNr]
        printf "%s", id
        for ( colNr=1; colNr<=numCols; colNr++ ) {
            val = ( (rowNr,colNr) in vals ? vals[rowNr,colNr] : 0 )
            printf "%s%s", OFS, val
        }
        print ""
    }
}
$ awk -f tst.awk File1.txt File2.txt
id SRR1071717 SRR1079830
chr1:15039:-::chr1:15795:- 2 0
chr1:15948:-::chr1:16606:- 6 0
chr1:11672:+::chr1:12009:+ 0 10
chr1:11845:+::chr1:12009:+ 0 7
chrY:9756574:+::chrY:9757796:+ 0 0
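If you would rather stay in Python, the same idea can be sketched as below. This is not part of the awk answer above, only a rough equivalent under a few assumptions: every file has exactly two whitespace-separated columns, the first line of each file is the "id <sample>" header, and the IDs are kept as plain strings (the original script calls int(id), which cannot work for IDs like chr1:15039:-::chr1:15795:-). The function name merge_counts is made up for this example.

```python
import sys


def merge_counts(filenames):
    """Merge two-column files on the first column, keeping first-seen row order."""
    columns = []  # sample names, one per input file
    rows = {}     # id -> {sample name: value}; dicts keep insertion order (Python 3.7+)
    for filename in filenames:
        with open(filename) as f:
            header = next(f, "").split()
            if len(header) < 2:
                continue  # skip empty files or files without an "id <sample>" header
            sample = header[1]
            columns.append(sample)
            for line in f:
                fields = line.split()
                if len(fields) != 2:
                    continue  # skip blank or malformed lines
                row_id, value = fields
                # IDs stay strings; no int() conversion
                rows.setdefault(row_id, {})[sample] = value
    lines = [" ".join(["id"] + columns)]
    for row_id, values in rows.items():
        # fill in "0" where a file has no entry for this id
        lines.append(" ".join([row_id] + [values.get(c, "0") for c in columns]))
    return lines


if __name__ == "__main__":
    print("\n".join(merge_counts(sys.argv[1:])))
```

Saved as, say, merge.py (a hypothetical name), it would be run as python3 merge.py File1.txt File2.txt. Rows come out in the order the IDs are first seen rather than sorted, which matches the desired output in the question.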
Answered By - Ed Morton