Preventing the Database from Consuming Duplicate Events

To prevent the database from consuming duplicate events, reset the mf-event-avro-enriched topic so that it no longer contains events that the database has already consumed.

  1. Log in to the Transformation Hub node.
  2. To reset the offset record for the mf-event-avro-enriched topic, run the following commands:
    Use the offset values stored in the db_scheduler_offset.csv file.
    NS=$(kubectl get namespaces | awk '/arcsight/{print $1}')
    db_offsets=db_scheduler_offset.csv
    n=$(grep -c "^[0-9]" "$db_offsets")
    fmt='{"topic": "mf-event-avro-enriched", "partition": %s, "offset": %s}%s\n'
    awk -v FS="," -v n="$n" -v f="$fmt" '
      BEGIN { print "{\"partitions\": [ "; c="," }
      /^[0-9]/ { if (++r == n) { c = "" }; o = $2 + 1; printf(f, $1, o, c) }
      END { print "], \"version\":1 }" }
    ' "$db_offsets" | tee /tmp/offsets.json
    kubectl cp /tmp/offsets.json $NS/th-kafka-0:/tmp/offsets.json
    kubectl exec -n $NS th-kafka-0 -- kafka-delete-records --bootstrap-server localhost:9092 --offset-json-file /tmp/offsets.json

    The output from these commands should be similar to the following content:
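    As an illustration of what the generated /tmp/offsets.json looks like, the sketch below runs the same awk pipeline against a hypothetical db_scheduler_offset.csv (the partition numbers and offsets are sample values, not output from a real deployment). Note that the script writes each CSV offset plus one, because kafka-delete-records removes all records below the given offset, so adding one also removes the last event the database already consumed.

```shell
# Hypothetical sample CSV: partition,last-consumed-offset (illustrative values).
cat > /tmp/db_scheduler_offset.csv <<'EOF'
0,1500
1,2300
EOF

# Same pipeline as in Step 2, pointed at the sample file.
db_offsets=/tmp/db_scheduler_offset.csv
n=$(grep -c "^[0-9]" "$db_offsets")
fmt='{"topic": "mf-event-avro-enriched", "partition": %s, "offset": %s}%s\n'
awk -v FS="," -v n="$n" -v f="$fmt" '
  BEGIN { print "{\"partitions\": [ "; c="," }
  /^[0-9]/ { if (++r == n) { c = "" }; o = $2 + 1; printf(f, $1, o, c) }
  END { print "], \"version\":1 }" }
' "$db_offsets" | tee /tmp/offsets_sample.json
```

    With these sample values, each JSON entry carries the CSV offset plus one (for example, partition 0 gets offset 1501), and the last entry has no trailing comma so the result is valid JSON.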

  3. (Optional) To verify that the offsets have been correctly updated, run the following command (the --time -2 option returns the earliest available offset for each partition):
    kubectl exec -n $NS th-kafka-0 -- kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mf-event-avro-enriched  --time -2

    The output for this command should be similar to the output described in Step 2.
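    If you want to check the result mechanically rather than by eye, the sketch below compares GetOffsetShell-style output (lines of the form topic:partition:offset) against the CSV: after the delete, each partition's earliest offset should equal the CSV offset plus one. The file names and values here are illustrative samples, not part of the documented procedure.

```shell
# Hypothetical expected offsets (partition,last-consumed-offset).
cat > /tmp/db_offsets_check.csv <<'EOF'
0,1500
1,2300
EOF

# Simulated GetOffsetShell output captured after the delete has taken effect.
cat > /tmp/earliest.txt <<'EOF'
mf-event-avro-enriched:0:1501
mf-event-avro-enriched:1:2301
EOF

awk -F, '
  NR==FNR { want[$1] = $2 + 1; next }     # first file: expected earliest offsets
  { split($0, a, ":")                     # second file: topic:partition:offset
    if (a[2] in want && want[a[2]] != a[3]) { print "MISMATCH partition " a[2]; bad = 1 } }
  END { exit bad }
' /tmp/db_offsets_check.csv /tmp/earliest.txt && echo "offsets verified"
```

    The script exits nonzero and names the partition on any mismatch, which makes it easy to spot a partition whose records were not deleted.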

  4. Continue to Upgrading the ArcSight Database.