Issue
I have a large (>500GB) Postgres dump in GCS that I would like to strip COPY commands from. Given the size of the dump, I would like to perform the substitution on a gsutil cat stream rather than storing the file locally:
gsutil cat gs://mybucket.mydomain.com/path/to/mydump.sql | some_command > mydump-commands.sql
The COPY command can span multiple lines but always ends with \. on its own line.
I have tried with perl:
# some_command =
perl -pe 'BEGIN{undef $/;} s/COPY.*\\.//smg'
This works for a small sample local file (below) but does not seem to work when streaming from stdin.
--
-- PostgreSQL database dump
--
-- Dumped from database version 14.1
-- Dumped by pg_dump version 14.1
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_table_access_method = heap;
--
-- Name: mytable; Type: TABLE; Schema: dummy; Owner: bchrobot
--
CREATE TABLE dummy.mytable (
id integer,
title text
);
ALTER TABLE dummy.mytable OWNER TO bchrobot;
--
-- Data for Name: mytable; Type: TABLE DATA; Schema: dummy; Owner: bchrobot
--
COPY dummy.mytable (id, title) FROM stdin;
1 my first title
2 my second title
\.
--
-- PostgreSQL database dump complete
--
Any suggestions for this, specifically for working with streams?
Solution
This sed command, used in place of some_command, deletes all lines between a line beginning with COPY and a line consisting of \. (including those two lines).
sed '/^COPY/,/^\\\.$/d'
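To check the behaviour before running it against the full dump, here is a minimal sketch that applies the same sed expression to a small stand-in file. The sample contents and the use of mktemp are assumptions for illustration only; they are not part of the original dump or answer.

```shell
#!/bin/sh
# Hypothetical sample that mimics the layout of the dump in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CREATE TABLE dummy.mytable (
id integer,
title text
);
COPY dummy.mytable (id, title) FROM stdin;
1	my first title
2	my second title
\.
ALTER TABLE dummy.mytable OWNER TO bchrobot;
EOF

# Range delete: from a line starting with COPY through a line that is exactly \.
# sed processes one line at a time, so it never needs the whole input in memory.
filtered=$(sed '/^COPY/,/^\\\.$/d' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

Because sed works line by line, the same expression should slot straight into the pipeline from the question: gsutil cat gs://mybucket.mydomain.com/path/to/mydump.sql | sed '/^COPY/,/^\\\.$/d' > mydump-commands.sql. By contrast, the perl attempt sets $/ to undef, which makes perl read the entire input before substituting; that is a likely reason it struggles on a stream of this size. One caveat: the pattern ^COPY matches any line that begins with COPY, so if the dump could contain such a line outside a data block (for example inside a function body), a tighter pattern may be needed.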
Answered By - M. Nejat Aydin Answer Checked By - Clifford M. (WPSolving Volunteer)