Compare ERP Systems — the methodology that actually works
Comparing ERP systems is the part of the buying process where most mid-market projects either find their footing or lose it. The feature-table approach — rows for capabilities, columns for vendors, ticks and crosses in the cells — reliably misleads buyers because it weights every capability equally and rewards the vendor with the broadest marketing claims rather than the best fit. This page describes the methodology that experienced selection advisors actually use, with the criteria catalogue, the weighted scoring approach and the demo-day mechanics that produce defensible mid-market decisions.
Three comparison layers rather than one
Credible ERP comparison runs on three layers, each evaluated separately and rolled up at the end. Functional fit asks whether the product covers the functional requirements; it is evaluated against the requirements document with MoSCoW prioritisation. Non-functional fit covers performance, availability, security, data residency, language coverage, accessibility and deployment flexibility; it is evaluated against the non-functional requirements, with binary pass/fail on the must-haves and weighted scores on the rest. Commercial and partner fit covers total cost over the contract horizon, contract terms, partner-ecosystem density in DACH, the quality of implementation-partner references and vendor financial stability.
Comparing on functional fit alone is the most common buyer mistake; the non-functional and commercial layers often eliminate the vendors that score highest on functions. Comparing on commercial terms alone is the second most common mistake; a cheap implementation of a poorly fitting product is a reliable route to a slow, expensive project failure.
Criteria catalogue and weighting
A mature mid-market comparison criteria catalogue has 80–150 line items split across the three layers. Functional criteria (50–80 items) trace back to the requirements document with MoSCoW priorities. Non-functional criteria (15–30 items) cover the technical dimensions. Commercial and partner criteria (15–30 items) cover contract, pricing, partner and vendor due-diligence questions.
The weighting matters more than the criteria themselves. A typical mid-market weight distribution: functional fit 50–60 % of the total score, non-functional 15–25 %, commercial and partner 20–30 %. Within functional fit, the Must-criteria carry a multiplier or are treated as pass/fail gates; the Should and Could criteria carry their nominal weights.
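The roll-up described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical layer weights picked from the ranges above (55 / 20 / 25 %), and using the pass/fail-gate variant for Must criteria rather than a multiplier. The gate threshold (a Must scored 0, "not supported", eliminates the vendor) is also an assumption; a buyer might equally treat a workaround score of 1 as a failure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical layer weights picked from the ranges above; they must sum to 1.0.
LAYER_WEIGHTS = {"functional": 0.55, "non_functional": 0.20, "commercial": 0.25}
assert abs(sum(LAYER_WEIGHTS.values()) - 1.0) < 1e-9

MAX_SCORE = 4  # top of the 0-4 scale


@dataclass(frozen=True)
class Criterion:
    layer: str     # a key of LAYER_WEIGHTS
    priority: str  # "must", "should", "could" (or "other" outside functional fit)
    weight: float  # nominal weight of the criterion within its layer
    score: int     # 0-4 evaluation score


def vendor_total(criteria: List[Criterion]) -> Optional[float]:
    """Weighted total on a 0-100 scale, or None if a Must gate fails."""
    # Gate (assumption): any Must criterion scored 0 eliminates the vendor.
    if any(c.priority == "must" and c.score == 0 for c in criteria):
        return None
    total = 0.0
    for layer, layer_weight in LAYER_WEIGHTS.items():
        items = [c for c in criteria if c.layer == layer]
        possible = sum(c.weight * MAX_SCORE for c in items)
        achieved = sum(c.weight * c.score for c in items)
        if possible > 0:
            # Each layer contributes its share of the maximum achievable score.
            total += layer_weight * achieved / possible
    return round(100 * total, 1)
```

Normalising each layer against its own maximum keeps the layer weights meaningful regardless of how many line items each layer contains, so a 70-item functional catalogue does not drown out a 20-item commercial one.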
Weights should be set during the requirements phase and locked before the long list is issued. Adjusting weights after vendor responses arrive is the surest way to turn a structured comparison into a post-hoc rationalisation.
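One lightweight way to make the lock auditable is to record a fingerprint of the weights when they are approved and check it again before scoring. The sketch below is a hypothetical helper, not part of any standard selection tooling: it hashes a canonical JSON form of the weights with SHA-256, so reordering keys does not change the fingerprint but changing any value does.

```python
import hashlib
import json
from typing import Dict


def weights_fingerprint(weights: Dict[str, float]) -> str:
    """SHA-256 digest of the weights in canonical JSON (sorted keys, no spaces)."""
    canonical = json.dumps(weights, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_weights_locked(weights: Dict[str, float], locked: str) -> None:
    """Raise if the weights no longer match the fingerprint recorded at lock time."""
    if weights_fingerprint(weights) != locked:
        raise ValueError("weights changed after they were locked; re-approval required")
```

The fingerprint can go into the steering-committee minutes at the moment the weights are approved; any later change then shows up as a mismatch rather than a quiet edit to a spreadsheet.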
Scoring discipline
Each criterion is scored on a 0–4 scale: 0 not supported, 1 partially supported with a workaround, 2 supported as standard, 3 supported well with clear evidence, 4 exceeds the requirement and demonstrates a clear advantage. Finer scales such as 0–10 or percentages produce false precision; three-point scales lose discrimination in the middle of the range. The five-point 0–4 scale is the practical compromise.
Two scoring patterns avoid the typical pitfalls. Evidence-based scoring: every score above 2 requires documentary evidence (live demo, customer reference, technical documentation), not a vendor sales claim. Independent dual scoring: two evaluators score independently and reconcile discrepancies through evidence rather than discussion. Both patterns reduce the rating-game effect that vendors with strong sales teams otherwise exploit.
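Both patterns are easy to enforce mechanically in whatever sheet or tool holds the scores. The sketch below assumes hypothetical evidence labels matching the three evidence types named above; a vendor sales claim is deliberately not an accepted type. The default tolerance of 0 (every disagreement between evaluators goes to reconciliation) is also an assumption; some teams only reconcile gaps of more than one point.

```python
from dataclasses import dataclass
from typing import List, Optional

# Evidence accepted for scores above 2; "sales_claim" is intentionally absent.
ACCEPTED_EVIDENCE = {"live_demo", "customer_reference", "technical_documentation"}


@dataclass(frozen=True)
class Score:
    criterion_id: str
    value: int                      # 0-4 scale
    evidence: Optional[str] = None  # e.g. "live_demo"


def validate(score: Score) -> None:
    """Reject out-of-range scores and scores above 2 without documentary evidence."""
    if not 0 <= score.value <= 4:
        raise ValueError(f"{score.criterion_id}: score {score.value} outside 0-4")
    if score.value > 2 and score.evidence not in ACCEPTED_EVIDENCE:
        raise ValueError(f"{score.criterion_id}: score {score.value} needs documentary evidence")


def discrepancies(a: List[Score], b: List[Score], tolerance: int = 0) -> List[str]:
    """Criterion ids where two independent evaluators differ by more than `tolerance`."""
    b_by_id = {s.criterion_id: s.value for s in b}
    return [
        s.criterion_id
        for s in a
        if s.criterion_id in b_by_id and abs(s.value - b_by_id[s.criterion_id]) > tolerance
    ]
```

The list returned by discrepancies is the reconciliation agenda: each entry is settled by pointing at evidence, not by averaging the two scores.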
Demo days against the customer's own use cases
The short-list demos are the single most decisive evaluation event. Standard vendor demos are sales theatre that rewards the vendor with the slickest presenters; scripted demos against the customer's own use cases are the antidote. The customer supplies a demo script of 8–15 use cases drawn from the actual business, and each vendor prepares its demo against that script in a sandbox tenant.
The demo day runs for 4–8 hours per vendor, with a structured evaluation team from the customer side covering each functional area. The customer scores the demo against the same 0–4 scale immediately after each use case, with notes that reference specific evidence shown. Vendors that hedge or skip use cases score zero on those points; the temptation to forgive a missed use case in the name of fairness is the most expensive mistake the evaluation team can make.
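The zero-for-skipped rule can be built into the demo score itself so that nobody has to apply it by hand. In this minimal sketch, the use-case identifiers are hypothetical; a use case with no recorded score counts as 0, and a hedged use case is assumed to be entered by the evaluator as an explicit 0.

```python
from typing import Dict, List


def demo_score(script: List[str], scores: Dict[str, int]) -> float:
    """Mean 0-4 score across the scripted use cases; an unscored use case counts as 0."""
    if not script:
        raise ValueError("the demo script has no use cases")
    total = 0
    for use_case in script:
        value = scores.get(use_case, 0)  # skipped (no score recorded) -> 0
        if not 0 <= value <= 4:
            raise ValueError(f"{use_case}: score {value} outside the 0-4 scale")
        total += value
    # Scores for anything not in the script are never read, so an unscripted
    # showpiece cannot offset a skipped use case.
    return total / len(script)
```

Dividing by the length of the script rather than by the number of scores recorded is the design choice that matters: a vendor that demonstrates three of four use cases well still loses a quarter of the available points.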
Reference checks and on-site visits
Reference customers supplied by the vendor have been screened by the vendor; reference customers found independently are the credible ones. Mid-market buyers should reach out via LinkedIn or industry networks to actual project leads at the vendor's recent customers (last 24 months, similar size, similar industry) and ask the questions that vendor-supplied references will not answer: scope creep, escalation behaviour, key-personnel turnover, realised versus planned go-live date, and satisfaction in year two.
For the top 1–2 finalists, an on-site visit to a reference customer is the most valuable due-diligence investment a buyer can make. The visit reveals operational reality that no slide deck or video call can convey — whether key users actually use the system, whether the implementation-partner relationship is healthy, whether the customisations are maintainable.
From score to decision
The weighted score is an input to the decision, not a substitute for it. In a mature mid-market selection the top two finalists are typically within 5–10 % of each other in the total score, and the choice between them is rarely decided by the score alone. The qualitative factors that decide it are implementation-partner availability and quality, contract terms (term length, exit clauses, price-escalation caps), the cultural fit between the vendor's and the customer's teams, and the strategic alignment of the product roadmap with the customer's direction.
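A quick check on the final totals shows whether the selection has reached the point where the qualitative factors take over. This sketch assumes the gap is measured relative to the leader's total (not in absolute points), and the 10 % threshold and vendor names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical threshold: inside this gap, the qualitative factors decide.
CLOSE_CALL_PCT = 10.0


def top_two_gap(totals: Dict[str, float]) -> Tuple[str, str, float]:
    """Leader, runner-up and the runner-up's shortfall as a percentage of the leader's total."""
    if len(totals) < 2:
        raise ValueError("need at least two finalists")
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    (leader, top), (runner_up, second) = ranked[0], ranked[1]
    if top <= 0:
        raise ValueError("the leader's total must be positive")
    return leader, runner_up, 100 * (top - second) / top
```

For example, totals of 78 and 72 give a gap of roughly 7.7 %, inside the threshold, which tells the steering committee that partner quality, contract terms, cultural fit and roadmap alignment now carry the decision.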
The score discipline still matters because it produces a defensible decision and forces the qualitative trade-offs into the open rather than letting them happen by drift. Mid-market buyers who skip the structured score routinely end up with decisions that the project sponsor cannot explain at the steering-committee meeting, which is a procurement-governance failure independent of the technical outcome.
