Sunday, February 27, 2022

[SOLVED] Combine the contents of pairs of files that share part of their file name

Issue

I have a number of files named in this way:

  1223_1_myCount.txt   
  1223_1_myCount2.txt       
  1223_2_myStatistic.txt
  1223_2_myDiscarded.txt  
  1223_3_myExample.txt    
  1223_3_myStatistic.txt   
  ................     

There are 1000 pairs of files in total. Is there a way to combine the contents of the files two by two, matching on the index in the name (1, then 2, then 3, and so on), which is the only part of the file name that the two files in a pair have in common? I would also like to write each combined result to a file.


Solution

There may be more elegant ways to do this, but the following approach should work:

library(readtext)
library(dplyr)
library(purrr)
library(stringr)

Get the file names from your working directory. The following code assumes that only the files of interest exist there. Otherwise, you can filter the names, e.g. with grep("\\.txt$", file_names, value = TRUE).

file_names <- list.files(full.names = FALSE)
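
If the directory also holds other files, a regular-expression pattern passed to list.files can restrict the listing at the source. The following is a minimal sketch, assuming the names follow the digits_digits_label.txt shape shown in the question; the temporary directory and the extra file names are made up for illustration.

```r
# Hypothetical demo directory so the example does not touch real files
demo_dir <- file.path(tempdir(), "list_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("1223_1_myCount.txt", "1223_1_myCount2.txt",
                                  "notes.md", "txt_readme")))

# Keep only names shaped like <digits>_<digits>_<anything>.txt
file_names <- list.files(demo_dir, pattern = "^[0-9]+_[0-9]+_.*\\.txt$")
print(file_names)
```

Escaping the dot and anchoring with ^ and $ keeps names such as txt_readme out of the result.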

Read the files listed in file_names into a data frame df, one row per file, using map_dfr from purrr.

df <- map_dfr(file_names, readtext)

Create a new variable file_index, taken from the second underscore-separated field of the file name, and use it with group_by to concatenate the text from files that have the same file_index value (1, 2, 3, ...). Use str_c to collapse the strings. You can change the pattern for combo_file_name in the paste call if you want to name the combined files differently.

combo_data <- df %>%
  mutate(file_index = sapply(strsplit(doc_id, "_"), "[", 2)) %>%  # second field of the file name
  group_by(file_index) %>%
  summarize(combo_file_name = paste("combo_file", unique(file_index), sep = "_"),
            combo_text = str_c(text, collapse = ", ")) %>%
  ungroup() %>% select(combo_file_name, combo_text)
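
To see what the index extraction and the grouping do, here is a small sketch in base R only, so it runs without the packages above. The data frame is a made-up stand-in for what readtext returns (columns doc_id and text), and tapply is swapped in for group_by plus summarize.

```r
# Made-up stand-in for the readtext result: one row per file
df <- data.frame(
  doc_id = c("1223_1_myCount.txt", "1223_1_myCount2.txt",
             "1223_2_myStatistic.txt", "1223_2_myDiscarded.txt"),
  text = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# The second underscore-separated field of each name is the pair index
file_index <- sapply(strsplit(df$doc_id, "_"), "[", 2)

# Collapse the texts that share an index into one string per index
combo_text <- tapply(df$text, file_index, paste, collapse = ", ")
print(combo_text)
```

Each element of combo_text is named by its index, so combo_text[["1"]] holds the joined text of the two files with index 1.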

Create a function to write the files, using combo_data as input, and save them as combo_file_1.txt, combo_file_2.txt, etc.

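# Write one row of combo_data (file name, combined text) to <file name>.txt
write_file <- function(df_in) {
  fileConn <- file(paste0(df_in[1], ".txt"), open = "w")
  writeLines(df_in[2], fileConn)
  close(fileConn)
}

# apply() passes each row as a character vector; invisible() hides the returned list of NULLs
invisible(apply(combo_data, 1, write_file))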

Use getwd() to find the working directory, which is where the combined files are saved.
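
Because the combined files land next to the inputs, a second run of list.files would pick them up as well. One way around that is to write to a separate folder and check the result there. This is only a sketch: the folder name combo_demo and the sample text are assumptions for illustration, not part of the answer above.

```r
# Hypothetical output folder, kept apart from the input files
out_dir <- file.path(tempdir(), "combo_demo")
dir.create(out_dir, showWarnings = FALSE)

# Write one combined text to <out_dir>/combo_file_1.txt
out_path <- file.path(out_dir, "combo_file_1.txt")
writeLines("a, b", out_path)

# List the combined files and read one back to check its content
written <- list.files(out_dir, pattern = "^combo_file_.*\\.txt$")
content <- readLines(out_path)
print(written)
print(content)
```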



Answered By - sachin2014
Answer Checked By - Marilyn (WPSolving Volunteer)