Compare ERP Systems — the methodology that actually works

Comparing ERP systems is the part of the buying process where most mid-market projects either find their footing or lose it. The feature-table approach — rows for capabilities, columns for vendors, ticks and crosses in the cells — reliably misleads buyers because it weights every capability equally and rewards the vendor with the broadest marketing claims rather than the best fit. This page describes the methodology that experienced selection advisors actually use, with the criteria catalogue, the weighted scoring approach and the demo-day mechanics that produce defensible mid-market decisions.

Three comparison layers rather than one

Credible ERP comparison runs on three layers, each evaluated separately and rolled up at the end. Functional fit asks whether the product covers the functional requirements; it is evaluated against the requirements document with MoSCoW prioritisation. Non-functional fit covers performance, availability, security, data residency, language coverage, accessibility and deployment flexibility; it is evaluated against the non-functional requirements, with binary pass/fail on the must-haves and weighted scores on the rest. Commercial and partner fit covers total cost over the contract horizon, contract terms, partner ecosystem density in DACH, implementation-partner reference quality and vendor financial stability.

Comparing on functional fit alone is the most common buyer mistake; the non-functional and commercial layers often eliminate vendors that score highest on functions. Comparing on commercial terms alone is the second most common mistake; cheap implementations of underfitted products are the slowest path to project failure.

Criteria catalogue and weighting

A mature mid-market comparison criteria catalogue has 80–140 line items split across the three layers. Functional criteria (50–80 items) trace back to the requirements document with MoSCoW priorities. Non-functional criteria (15–30 items) cover the technical dimensions. Commercial and partner criteria (15–30 items) cover contract, pricing, partner and vendor due-diligence questions.

The weighting matters more than the criteria themselves. A typical mid-market weight distribution: functional fit 50–60 % of the total score, non-functional 15–25 %, commercial and partner 20–30 %. Within functional fit, the Must-criteria carry a multiplier or are treated as pass/fail gates; the Should and Could criteria carry their nominal weights.
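
The roll-up described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed tool: the `Criterion` class, the function names, the specific layer weights (picked from inside the quoted ranges) and the gate threshold of 2 ("supported standard") are all assumptions made for the example. Musts are treated here as a pass/fail gate and still count in the weighted total.

```python
from dataclasses import dataclass

MAX_SCORE = 4  # top of the 0-4 scoring scale

# Assumed layer weights, chosen inside the ranges quoted above; they sum to 1.
LAYER_WEIGHTS = {"functional": 0.55, "non_functional": 0.20, "commercial": 0.25}


@dataclass
class Criterion:
    layer: str      # one of the LAYER_WEIGHTS keys
    priority: str   # "must", "should" or "could"
    weight: float   # nominal weight within its layer
    score: int      # 0-4


def passes_must_gate(criteria, threshold=2):
    """Pass/fail gate: every Must criterion needs at least `threshold` (assumed: 2)."""
    return all(c.score >= threshold for c in criteria if c.priority == "must")


def weighted_total(criteria):
    """Return the overall score as a fraction of the maximum (0.0 to 1.0)."""
    total = 0.0
    for layer, layer_weight in LAYER_WEIGHTS.items():
        items = [c for c in criteria if c.layer == layer]
        weight_sum = sum(c.weight for c in items)
        if weight_sum == 0:
            continue  # a layer without criteria contributes nothing
        layer_fraction = sum(c.weight * c.score for c in items) / (weight_sum * MAX_SCORE)
        total += layer_weight * layer_fraction
    return total


# Hypothetical vendor with four criteria.
vendor = [
    Criterion("functional", "must", 3.0, 3),
    Criterion("functional", "should", 2.0, 2),
    Criterion("non_functional", "should", 1.0, 4),
    Criterion("commercial", "should", 1.0, 2),
]
print(passes_must_gate(vendor), round(weighted_total(vendor), 4))
```

Normalising each layer to a 0–1 fraction before applying the layer weight keeps a layer with many line items from dominating the total simply because it is longer.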

Weights should be set during the requirements phase and locked before the long-list is issued. Adjusting weights after vendor responses arrive is the surest way to turn a structured comparison into a post-hoc rationalisation.

Scoring discipline

Each criterion is scored on a 0–4 scale: 0 not supported, 1 partially supported with workaround, 2 supported as standard, 3 supported well with clear evidence, 4 exceeds requirement and demonstrates clear advantage. Ten-point scales produce false precision; three-point scales lose discrimination in the middle of the range. The five-level 0–4 scale is the practical compromise.

Two scoring patterns avoid the typical pitfalls. Evidence-based scoring: every score above 2 requires documentary evidence (live demo, customer reference, technical documentation), not a vendor sales claim. Independent dual scoring: two evaluators score independently and reconcile discrepancies through evidence rather than discussion. Both patterns reduce the rating-game effect that vendors with strong sales teams otherwise exploit.
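
Both patterns are easy to enforce mechanically in a scoring sheet. The sketch below is one possible way to do it, with hypothetical function names. Capping an unevidenced score at 2, rather than rejecting it outright, is an assumption of this example.

```python
EVIDENCE_THRESHOLD = 2  # scores above this need documentary evidence


def apply_evidence_rule(score, evidence):
    """Cap a score at the threshold when no evidence is recorded.

    `evidence` is a list of references (demo notes, reference calls,
    technical documentation). An empty list means only a sales claim exists.
    """
    if score > EVIDENCE_THRESHOLD and not evidence:
        return EVIDENCE_THRESHOLD
    return score


def find_discrepancies(scores_a, scores_b):
    """Return {criterion_id: (score_a, score_b)} where two evaluators disagree.

    Only criteria scored by both evaluators are compared.
    """
    shared = scores_a.keys() & scores_b.keys()
    return {
        cid: (scores_a[cid], scores_b[cid])
        for cid in sorted(shared)
        if scores_a[cid] != scores_b[cid]
    }


# Hypothetical criterion ids and scores from two independent evaluators.
evaluator_a = {"F-01": 3, "F-02": 2, "F-03": 4}
evaluator_b = {"F-01": 2, "F-02": 2, "F-03": 4}
print(find_discrepancies(evaluator_a, evaluator_b))
```

The discrepancy list is the agenda for reconciliation: each entry is settled by going back to the evidence, not by averaging the two scores.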

Demo days against the customer's own use cases

The short-list demos are the single most decisive evaluation event. Standard vendor demos are sales theatre that rewards the vendor with the slickest presenters; scripted demos against the customer's own use cases are the antidote. The customer supplies a demo script with 8–15 use cases drawn from the actual business, with the vendor preparing the demo against that script using a sandbox tenant.

The demo day runs for 4–8 hours per vendor, with a structured evaluation team from the customer side covering each functional area. The customer scores the demo against the same 0–4 scale immediately after each use case, with notes that reference specific evidence shown. Vendors that hedge or skip use cases score zero on those points; the temptation to forgive a missed use case in the name of fairness is the most expensive mistake the evaluation team can make.
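
The zero-for-skipped rule is simplest to keep when the score sheet is generated from the demo script and not from what the vendor chose to show. A minimal sketch, assuming hypothetical use-case ids and function names:

```python
MAX_SCORE = 4  # top of the 0-4 scoring scale


def score_demo(script, observed):
    """Return a score for every scripted use case.

    `script` is the ordered list of use-case ids the customer supplied;
    `observed` maps the use cases actually demonstrated to their 0-4 score.
    Anything on the script that was skipped scores 0.
    """
    return {use_case: observed.get(use_case, 0) for use_case in script}


def demo_fraction(scores):
    """Share of the maximum achievable demo score (0.0 to 1.0)."""
    if not scores:
        return 0.0
    return sum(scores.values()) / (MAX_SCORE * len(scores))


script = ["UC-01", "UC-02", "UC-03"]
observed = {"UC-01": 3, "UC-03": 4}  # the vendor skipped UC-02
scores = score_demo(script, observed)
print(scores, round(demo_fraction(scores), 3))
```

Because the denominator is the full script, a skipped use case lowers the vendor's result exactly as much as a demonstrated failure would.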

Reference checks and on-site visits

Reference customers supplied by the vendor have been screened by the vendor; reference customers found independently are the credible ones. Mid-market buyers should reach out via LinkedIn or industry networks to actual project leads at the vendor's recent customers (last 24 months, similar size, similar industry) and ask the questions that vendor-supplied references will not answer: scope creep, escalation behaviour, key-personnel turnover, realised-vs-planned go-live date, year-two satisfaction.

For the top 1–2 finalists, an on-site visit to a reference customer is the most valuable due-diligence investment a buyer can make. The visit reveals operational reality that no slide deck or video call can convey — whether key users actually use the system, whether the implementation-partner relationship is healthy, whether the customisations are maintainable.

From score to decision

The weighted score is an input to the decision, not a substitute for it. In a mature mid-market selection the top two finalists are typically within 5–10 % of each other in the total score; the choice between them is rarely won on the score alone. The qualitative factors that decide it: implementation-partner availability and quality, contract terms (term length, exit clauses, price-escalation caps), the cultural fit between the vendor's and customer's teams, and the strategic alignment of the product roadmap with the customer's direction.
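
A small check can flag when the score alone should not decide. The sketch below assumes the gap is measured relative to the higher of the two totals and uses the upper end of the 5–10 % range as the default band. Both choices, and the function name, are assumptions of this example.

```python
def is_close_call(score_a, score_b, band=0.10):
    """True when two total scores lie within `band` of each other.

    The gap is measured relative to the higher score (an assumption).
    """
    top = max(score_a, score_b)
    if top == 0:
        return True  # both scored zero, so the score cannot separate them
    return (top - min(score_a, score_b)) / top <= band


print(is_close_call(0.70, 0.66))  # relative gap of roughly 6 %
print(is_close_call(0.70, 0.50))  # relative gap of roughly 29 %
```

When the check returns true, the qualitative factors listed above carry the decision and should be documented alongside the score.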

The score discipline still matters because it produces a defensible decision and forces the qualitative trade-offs into the open rather than letting them happen by drift. Mid-market buyers who skip the structured score routinely end up with decisions that the project sponsor cannot explain at the steering-committee meeting, which is a procurement-governance failure independent of the technical outcome.


Frequently Asked Questions

How long does the comparison phase take?

For a mature mid-market selection: 8–14 weeks from issuing the requirements document to the contract-award decision. Two to four weeks for vendor response, four to six weeks for short-list demos and reference checks, two to four weeks for final scoring and decision. Faster cycles tend to skip the demo discipline; slower cycles tend to lose vendor engagement and miss commercial terms.

Can we compare cloud and on-premises options in the same shortlist?

Possible but requires care. The non-functional scoring will diverge significantly (deployment model, upgrade cadence, customisation freedom), and the commercial models do not compare line-for-line. The pragmatic approach is to make the deployment-model preference explicit during the requirements phase and either commit to a single model on the short-list or accept that the cross-model comparison requires an extra reconciliation step at the end.

Should we ask vendors to score themselves first?

Yes — the RFP response asks vendors to self-score against the requirements document with evidence. The customer-side evaluation team then re-scores independently, with the vendor self-score as one input but not the decisive one. Discrepancies above one point on the 0–4 scale trigger a clarification request to the vendor before final scoring. This pattern produces better evidence quality than purely customer-side scoring against vendor slide decks.

What if no vendor scores above the threshold on the Must criteria?

Reopen the Must list. A scenario where no available vendor meets all Musts usually indicates that the requirements document is over-specified, not that the market is empty. A workshop with the requirements-document authors and the project sponsor — revisiting the Musts with the explicit question ‘is this really a non-negotiable’ — typically demotes 3–5 Musts to Should and resolves the deadlock. The alternative (proceeding with a vendor that misses Musts) usually produces a project failure 12 months later.
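
Before the workshop it helps to know which Musts are causing the deadlock. The sketch below counts, per Must criterion, how many vendors fall short of it; the criteria that most vendors miss are the natural first candidates for the 'is this really a non-negotiable' question. The function name, the data layout and the pass threshold of 2 are assumptions of this example.

```python
from collections import Counter


def must_failure_counts(must_scores_by_vendor, threshold=2):
    """Return [(must_id, vendors_failing)] sorted by failures, highest first.

    `must_scores_by_vendor` maps vendor name -> {must_id: score on the 0-4 scale}.
    A vendor fails a Must when its score is below `threshold` (assumed: 2).
    Musts that every vendor meets do not appear in the result.
    """
    counts = Counter()
    for scores in must_scores_by_vendor.values():
        for must_id, score in scores.items():
            if score < threshold:
                counts[must_id] += 1
    return counts.most_common()


# Hypothetical vendors and Must ids.
vendors = {
    "Vendor A": {"M-01": 3, "M-02": 1, "M-03": 1},
    "Vendor B": {"M-01": 2, "M-02": 0, "M-03": 3},
}
print(must_failure_counts(vendors))
```

The count only ranks candidates for review; whether a Must is demoted remains a decision for the requirements authors and the project sponsor.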