## A related story: Anonymous hacker’s “revenge”
On December 10, 2010, the hacker group Anonymous posted a press release explaining the motivation behind its latest attack, code-named “Operation Payback” (Prefect, 2010). Angered by the companies that had dropped their support for WikiLeaks, Anonymous called for retaliation by launching distributed denial-of-service (DDoS) attacks on some of the organisations involved. The press release was unsigned and unattributed, and was published as a PDF (Portable Document Format) file.
This is the document from that time; I dug it up to satisfy my curiosity…
Although the document is not signed, a short script can quickly extract its metadata (the anonops_the_press_release.pdf shown here is the actual original file, and its metadata is still intact…).
A few days later, Greek police arrested Mr. Alex Tapanaris…
Mr. Alex Tapanaris’s revenge mission was cut short
The lesson of this example: even if your technical skills are lacking, at least don’t let others find out that you were the one who made the file…
Even today, files on a large number of domestic resource websites still carry sensitive metadata.
Take the technical e-books that I downloaded from major domestic resource websites as an example:
(Don’t ask me where the resources come from; as a programmer, I know a thing or two about getting hold of them…)
To avoid becoming the second Mr. Alex Tapanaris when posting “resources” to “a certain document library” to earn points, here is the script I just finished for deleting PDF metadata in bulk, along with how to use it:
Quickly clear PDF metadata
The effect after clearing
#### Get document metadata in bulk (check others):
    import os
    import re
    import sys

    # Note: PdfFileReader is the PyPDF2 1.x/2.x API;
    # PyPDF2 3.0 and later removed it in favour of PdfReader.
    from PyPDF2 import PdfFileReader


    def getFiles():
        """Return the PDF files to inspect."""
        if len(sys.argv) > 1:
            # If a single PDF file is given on the command line, only inspect that file
            files = [sys.argv[1]]
        else:
            # Otherwise take every file in the current directory
            files = os.listdir()
        # Keep only the names that end in .pdf
        return [name for name in files if re.match(r".*\.pdf$", name)]


    def printMeta(files):
        """Print the meta information of each file."""
        for filename in files:
            try:
                with open(filename, "rb") as f:
                    docInfo = PdfFileReader(f).getDocumentInfo()
                    print("=== meta information for file %s is :" % filename)
                    for metaItem in docInfo:
                        print(metaItem, ":", docInfo[metaItem])
            except Exception:
                print("-- file %s metadata cannot be read, skipped!" % filename)


    if __name__ == "__main__":
        printMeta(getFiles())
#### Clear source information (hide yourself):
    import os
    import re

    # Note: PdfFileReader/PdfFileWriter are the PyPDF2 1.x/2.x API;
    # PyPDF2 3.0 and later removed them in favour of PdfReader/PdfWriter.
    from PyPDF2 import PdfFileReader, PdfFileWriter


    def getFiles():
        """Return all PDF files in the current directory."""
        return [name for name in os.listdir() if re.match(r".*\.pdf$", name)]


    def get_page_num(file_name):
        """Return the number of pages in the PDF file."""
        with open(file_name, "rb") as f:
            page_num = PdfFileReader(f).getNumPages()
        print("PDF file %s has %s pages" % (file_name, page_num))
        return page_num


    def create_new_pdf(file_names):
        """Copy the pages of each PDF into a new file under ./pure."""
        os.makedirs("./pure", exist_ok=True)
        for file_name in file_names:
            try:
                with open(file_name, "rb") as src:
                    # Read the original PDF
                    my_pdf = PdfFileReader(src)
                    # Create a PdfFileWriter object and copy only the pages into it,
                    # so the original document information is left behind
                    new_pdf = PdfFileWriter()
                    for i in range(get_page_num(file_name)):
                        new_pdf.addPage(my_pdf.getPage(i))
                    with open("./pure/%s" % file_name, "wb") as dst:
                        new_pdf.write(dst)
                print("File %s has cleared metadata!" % file_name)
            except Exception:
                print("There is a problem with file %s encoding, it has been skipped automatically!" % file_name)


    if __name__ == "__main__":
        create_new_pdf(getFiles())
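After cleaning, it is worth double-checking that nothing identifying is left in the output files. The following is only a rough sketch that uses the standard library instead of PyPDF2: it scans the raw bytes of a PDF for common document-information keys such as `/Author` and `/Creator`. The names `find_info_entries` and `leaked_fields` are my own and hypothetical, and the check assumes the entries are stored as plain literal strings like `/Author (Jane Doe)`; values that are hex-encoded, contain nested parentheses, or sit inside compressed object streams will not be found, so an empty result is a hint rather than proof.

```python
import re
import sys

# Keys commonly found in a PDF document information dictionary.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title", b"Subject", b"Keywords")

# Matches entries such as "/Author (Jane Doe)" whose value is a literal string.
# Assumption: no nested parentheses inside the value.
_INFO_ENTRY = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def find_info_entries(raw):
    """Return {key: value} for literal-string info entries found in raw PDF bytes."""
    found = {}
    for key, value in _INFO_ENTRY.findall(raw):
        found[key.decode("ascii")] = value.decode("latin-1")
    return found


def leaked_fields(raw, allowed=("Producer",)):
    """Return the sorted names of info entries that are not in the allowed set."""
    return sorted(k for k in find_info_entries(raw) if k not in allowed)


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_meta.py ./pure/book.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, leaked_fields(f.read()))
```

`Producer` is allowed by default because the library that writes the new file may record its own name there; pass `allowed=()` to flag that field as well.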