Analytics/Cluster/Camus

From Wikitech


This page is for members of the Analytics team. So far it is only a cut-and-paste collection of the commands that helped us test Avro decoding via Camus on 1002.


How to produce messages to Kafka

cat test_message.txt | kafkacat -b kafka1012.eqiad.wmnet:9092 -t test

test_message.txt is a file with one JSON message per line, like:

{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "1"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "2"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "3"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "4"}}
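
A file like the one above can be generated rather than typed by hand. The following is a minimal sketch that writes the same four messages to test_message.txt in the current directory; only the value of "a" changes between lines.

```shell
#!/bin/sh
# Write four sample messages, one JSON object per line, to test_message.txt.
# The fields and values mirror the example messages shown above.
for i in 1 2 3 4; do
  printf '{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "%s"}}\n' "$i"
done > test_message.txt
```

After producing, the messages can be read back with kafkacat in consumer mode, for example kafkacat -C -b kafka1012.eqiad.wmnet:9092 -t test -o beginning -e, to confirm they reached the topic.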


How to validate your data against your Avro schema

We have found that the PHP Avro bindings behave differently from the Java ones, so please validate messages using this Java jar:

java -jar avro-tools-1.7.6.jar jsontofrag --schema-file CirrusSearchRequestSet.avsc searchmessage.json
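
As an illustration of the same check applied to the test messages from the previous section, here is a sketch under several assumptions. TestMessage.avsc is a hypothetical schema written only for this example (it is not one of our real schemas), validate_messages is a hypothetical helper name, avro-tools-1.7.6.jar is assumed to be in the current directory, and jsontofrag is assumed to exit non-zero when a message does not match the schema.

```shell
#!/bin/sh
# Hypothetical schema matching the sample messages in test_message.txt.
cat > TestMessage.avsc <<'EOF'
{
  "type": "record",
  "name": "TestMessage",
  "namespace": "org.example.test",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "muchoStuff", "type": {"type": "map", "values": "string"}}
  ]
}
EOF

# Hypothetical helper: validate each line of a messages file on its own,
# so that a failure can be traced back to a line number.
# Assumes jsontofrag exits non-zero on a schema mismatch.
validate_messages() {
  schema=$1
  messages=$2
  tmp=$(mktemp)
  status=0
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    [ -n "$line" ] || continue            # skip empty lines
    printf '%s\n' "$line" > "$tmp"
    if ! java -jar avro-tools-1.7.6.jar jsontofrag \
           --schema-file "$schema" "$tmp" > /dev/null < /dev/null; then
      echo "line $n of $messages does not match $schema" >&2
      status=1
    fi
  done < "$messages"
  rm -f "$tmp"
  return $status
}
```

It would be called as validate_messages TestMessage.avsc test_message.txt. Feeding one message at a time is slower than a single run over the whole file, but it makes it obvious which message is the bad one.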

How to run a Camus job to decode Avro from a Kafka topic

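Camus is our MapReduce job, but its jar also contains some of the code we depend on, so the Camus jar appears twice: once as the job jar passed to hadoop jar and once in the libjars and classpath lists.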

Note that you need a local properties file to pass to Camus; see -P /home/user/avro-kafka/camus.avro.json.properties below.

"Real" properties files live on puppet: [1]

#!/bin/sh
export LIBJARS=/home/user/avro-kafka/camus-wmf-0.1.0-wmf6.jar,/home/user/avro-kafka/camus-etl-kafka-0.1.0-wmf6.jar,/home/user/avro-kafka/camus-api-0.1.0-wmf6.jar,/home/user/avro-kafka/camus-kafka-coders-0.1.0-wmf6.jar,/home/user/avro-kafka/camus-schema-registry-0.1.0-wmf6.jar,/home/user/avro-kafka/camus-parent-0.1.0-wmf6-tests.jar,/home/user/avro-kafka/refinery-camus-0.0.20-SNAPSHOT.jar

export HADOOP_CLASSPATH=/home/user/avro-kafka/camus-wmf-0.1.0-wmf6.jar:/home/user/avro-kafka/camus-etl-kafka-0.1.0-wmf6.jar:/home/user/avro-kafka/camus-api-0.1.0-wmf6.jar:/home/user/avro-kafka/camus-kafka-coders-0.1.0-wmf6.jar:/home/user/avro-kafka/camus-schema-registry-0.1.0-wmf6.jar:/home/user/avro-kafka/camus-parent-0.1.0-wmf6-tests.jar:/home/user/avro-kafka/refinery-camus-0.0.20-SNAPSHOT.jar

/usr/bin/hadoop jar /home/user/avro-kafka/camus-wmf-0.1.0-wmf6.jar com.linkedin.camus.etl.kafka.CamusJob \
  -libjars "${LIBJARS}" \
  -Dcamus.job.name="some_avro_test" \
  -P /home/user/avro-kafka/camus.avro.json.properties \
  >> ./log_camus_avro_test.txt 2>&1
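
The script above lists the same seven jars twice, comma-separated for -libjars and colon-separated for HADOOP_CLASSPATH, which makes it easy for the two lists to drift apart when a version changes. One possible alternative, shown here only as a sketch, is to build both values from a single list. JAR_DIR and JARS are variable names introduced for this example; the directory and jar names are the ones used above.

```shell
#!/bin/sh
# Directory holding the jars, and the jar file names, as used in the script above.
JAR_DIR=/home/user/avro-kafka
JARS="camus-wmf-0.1.0-wmf6.jar
camus-etl-kafka-0.1.0-wmf6.jar
camus-api-0.1.0-wmf6.jar
camus-kafka-coders-0.1.0-wmf6.jar
camus-schema-registry-0.1.0-wmf6.jar
camus-parent-0.1.0-wmf6-tests.jar
refinery-camus-0.0.20-SNAPSHOT.jar"

# Build the comma-separated list expected by -libjars.
LIBJARS=""
for jar in $JARS; do
  # Prefix with a comma only when LIBJARS is already non-empty.
  LIBJARS="${LIBJARS:+$LIBJARS,}$JAR_DIR/$jar"
done

# HADOOP_CLASSPATH is the same list with ':' as the separator.
HADOOP_CLASSPATH=$(printf '%s' "$LIBJARS" | tr ',' ':')

export LIBJARS HADOOP_CLASSPATH
```

The hadoop jar invocation itself stays the same; only the two export lines are replaced.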