Week 3 – Day 2

Today was not fun 😩

Yeah, it was not fun, but I did learn something. As I said yesterday, my main task now is to clearly define my goal, that is, the anomalies I will be looking for when analyzing the logs. Beyond defining them, I have to answer the “how” question: how am I going to use the information available in the logs to detect my target anomalies?

Alright, now why was the day not fun? Well, I spent most of the day trying to understand the text in the logs. I can assure you it was not fun, and it was my first time doing this kind of work.

However, things got better once I started understanding some of the output. 🙂 I was able to:

  • distinguish the disk and system checks, which are done periodically (every minute)
  • distinguish what is recorded in the log file when a request is sent to either Thor or Roxie
  • find the log file which registers the info about Roxie queries run on WsECL (ESP.log)

With what I have learned today, I developed some ideas about anomalies I can target. For example, Roxie is said to be the Rapid Data Delivery Engine. Hence, I may look for queries which take too much time by analyzing the running times of queries in the log file. My analysis should help me set a baseline for what is considered an appropriate running time and use that to single out “anomalous queries”.
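A minimal sketch of that baseline idea in Python, assuming the running times have already been extracted from the log as (query, elapsed milliseconds) pairs. The function name, the sample values and the cutoff factor `k` are all hypothetical, and the median plus median absolute deviation is just one possible way to define a baseline, not a decision taken for the project.

```python
from statistics import median


def flag_slow_queries(durations_ms, k=3.0):
    """Flag queries whose running time is far above the typical one.

    durations_ms: list of (query_id, elapsed_ms) pairs (hypothetical input,
    assumed to be already parsed out of the log).
    k: how many median absolute deviations above the median counts as slow.
    """
    times = [t for _, t in durations_ms]
    baseline = median(times)
    # Median absolute deviation: a spread measure that one very slow
    # query cannot inflate the way it would inflate a standard deviation.
    mad = median([abs(t - baseline) for t in times])
    threshold = baseline + k * mad
    anomalies = [(q, t) for q, t in durations_ms if t > threshold]
    return baseline, threshold, anomalies


# Made-up running times, for illustration only.
sample = [("q1", 12), ("q2", 15), ("q3", 11), ("q4", 14), ("q5", 13), ("q6", 250)]
baseline, threshold, slow = flag_slow_queries(sample)
print(baseline, threshold, slow)
```

With these made-up values only `q6` ends up above the threshold. If every query takes almost the same time, the deviation is close to zero and the threshold collapses onto the baseline, so a real version would probably need a minimum margin as well.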

That is currently one of my ideas. I have to clarify everything tomorrow and format it well in a document which I will submit to my mentors for review. I do feel better than yesterday.

Hopefully, I am heading the right way 😉

Week 3 – Day 1

Hi there,

Hope you started your week well 🙂

As far as I am concerned, I am making little progress towards an unclear destination. In fact, that was the main topic of my meeting today with my mentors about the project's progress. I did work last week on an approach to produce a log-based anomaly detection system. However, I realized my approach had no specific target (yet). My target anomalies are not yet well defined.

To recall, I intend to build an anomaly detection system which would autonomously detect fraud in a log file, more precisely a Roxie log file. Before starting the project, I did some research on the topic and found many approaches where the developers did not have to understand the logs at all. However, to build their systems, they did get some insights from subject matter experts or, simply put, people who had some understanding of the logs.

That is where my task for this week comes from. Now, I have to turn myself into a subject matter expert. I have to understand the Roxie logs. My advisor did tell me about that requirement during my first meeting, but I did not understand what he meant at the time. Now, I am reading the documentation on Roxie and re-watching the online classes on Roxie, and tomorrow I will run queries on Roxie and try to understand the output in the log file. Hopefully, I will be able to get some heuristics and pinpoint which anomalies I can detect through Roxie log analysis.

See you tomorrow and stay blessed 😉

Week 2 – Day 5

Last day of the week!

Overall, the week has been fruitful. I am now done with the online Roxie classes (Yeah). Moreover, I made significant progress on both the solution design document and the log parsing code. However, those two will be reviewed tomorrow morning during an audio conference call with my mentors.

To sum up, I achieved about 90% of my objectives for the week. Tomorrow, after the meeting, I will have a better perspective on the next steps.

See you tomorrow and stay blessed as always 🙂

Week 2 – Day 4

Today was an even “more interesting day”.

Again, I moved forward with the online lessons. I now have four lessons left in part 2 of the Roxie online class. After that, I think I will be working fully on the project implementation.

Concerning the solution design, I produced the document and improved it several times with the precious and remarkable help of my mentor. I can say the document is 90% done. My supervisor still has to make remarks or suggestions on the last update of the document, which I sent her today.

Another important achievement of the day was the log parsing code, which I got done. My mentors still have to check it.

I am slowly moving from the learning and design phase to the implementation phase. I cannot wait to see the concrete solution working using HPCC Systems. I will share the Git repo link here as soon as I get all approvals (solution design and code).

See you tomorrow for another day with LexisNexis 😊

Week 2 – Day 3

I would qualify this day as “interesting”.

I did finish the Roxie Course part 1 and started the Roxie course part 2. Part 2 has 12 lessons and I finished the first 4. I think I am on track to finish the online classes this week as scheduled.

Moreover, I have started the parsing code. I should have a kind of final version tomorrow which I will share with my mentors through the Git Repo I created today as well.

Also, I discussed the solution design with my mentor. I sent her my approach and my understanding of it. Just as a reminder, I am trying to find anomalies in a log file using unsupervised learning methods. In the approach I proposed, I used a random log sequence to demonstrate the idea. However, she told me to demonstrate it using Roxie logs, which are the ones I will officially be using for the project and which I am collecting daily. I went into one of those logs and made some “interesting” observations. I cannot say I understand those logs, but I have found some patterns which could be very useful for my solution design. I sent my observations to my mentor along with some questions. I am waiting for her reply to improve and finalize my solution design.

Stay blessed and see you tomorrow 🙂

Week 2 – Day 2

Welcome to day 2 of week 2!

Today was nice, cool!

I did continue with the Roxie Course part 1. Part 1 is made up of 14 lessons and I have finished the first 10, so I should be done with Part 1 tomorrow. Moreover, I almost finished the solution design document. I will revise it tomorrow and most probably send it to my mentors in the evening. The “cool” part of the day was when I got my little gift from LexisNexis as an intern. Just to recall, I am working remotely from Kennesaw State University. They sent me a cool bag and much more. Awesome 🙂

Tomorrow’s plan is to

  • Finish part 1 of the Roxie classes
  • Finalize the design document
  • Create a Git Repo for the code and share the link with my mentors.
  • Start writing the parsing code as soon as the solution strategy and parsing approach are approved

Stay blessed 🙂

Week 2 – Day 1

Welcome to Week 2!

Here are my weekly objectives:

1- Parse Roxie Logs
2- Create the design Document
3- Finish the Online Classes

Today, I started the Roxie classes and recorded the generated logs. Moreover, I looked into the collected Roxie logs to see how to go about parsing them. I have seen some interesting patterns already. Also, I have thought about the solution design, which is not yet fully clear.

Tomorrow, I will start writing the design document, hoping it will make things clearer. The design document will also help me validate my parsing approach. I am avoiding writing code without having a reasonably clear idea of my target. I will also obviously continue with the online classes.

See you tomorrow and stay blessed 🙂

Week 1 – Day 5

Last working day of the week!

It was nice, as I fulfilled my objective, which was to finish the Advanced ECL part 2 course. I dealt with Superfiles on day 4, and today was about handling XML and JSON files as well as parsing XML and free-form text. It was not that difficult as long as you are able to clearly define your patterns and rules (I think that is the critical part).

Overall, the week was enriching. I can say I met about 90% of my objectives. Moreover, I met 4 other interns, and 5 more will be coming. I discussed the project’s next steps with my mentors so as to have my tasks set for the coming week. Here they are:

  • Parse Roxie Logs
  • Finish the Online classes (Roxie courses)
  • Create a design document for the project

Hope you all had a wonderful week. Wish the best for the coming week 🙂

Week 1 – Day 4

Today was better than yesterday.

I finished the Advanced ECL Part 1 course, which dealt with normalization and denormalization. In summary, that course module teaches how to switch between flat data and relational data. I started Advanced ECL part 2 and finished its first section, which discusses “Superfiles”. They are handy for establishing a folder structure on your cluster. So far, the lessons in part 2 are shorter and lighter, even the lab exercises.

Apart from continuing the online classes and collecting my data, I wrote and sent my first weekly report. Even though I sent the report today, I will be working again tomorrow to make sure I fulfill the task requirements on time.
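As an illustration of switching between the two shapes, here is a minimal sketch in Python rather than ECL. It does not reproduce the syntax of ECL's own NORMALIZE and DENORMALIZE, and the field names (`id`, `name`, `phones`, `person_id`) and sample records are hypothetical.

```python
def normalize(nested):
    """Split records with an embedded child list into parent and child rows."""
    parents, children = [], []
    for rec in nested:
        parents.append({"id": rec["id"], "name": rec["name"]})
        for phone in rec["phones"]:
            # Each child row keeps a link back to its parent record.
            children.append({"person_id": rec["id"], "phone": phone})
    return parents, children


def denormalize(parents, children):
    """Fold the child rows back into their parent records."""
    by_parent = {}
    for child in children:
        by_parent.setdefault(child["person_id"], []).append(child["phone"])
    return [{**p, "phones": by_parent.get(p["id"], [])} for p in parents]


# Made-up records, for illustration only.
people = [
    {"id": 1, "name": "Ada", "phones": ["555-0100", "555-0101"]},
    {"id": 2, "name": "Bob", "phones": []},
]
parents, children = normalize(people)
restored = denormalize(parents, children)
print(parents, children, restored == people)
```

Going one way and then back returns the original records, which is a quick way to check that nothing was lost in either direction.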

Concerning tomorrow, I should easily be able to finish the Advanced ECL part 2 course. See you then 🙂

Week 1 – Day 3

Today was a bit tougher.

Now I am starting to see the difference between the introductory ECL courses and the Advanced ECL ones. Today, I went over DENORMALIZE along with all the query exercises. During the introductory courses, I think I never looked at a lab solution without being completely certain that I had done the lab well. Today was different. For some of the exercises, I was not getting the expected answer. Anyway, with the lab solutions, I understood more about ECL which, I have to confess, challenges me to think differently than in Java, for example. At the end of the lessons, I recorded the Roxie logs of the day, which I will use as my dataset for the project.

Apart from having some little headaches with the online lessons, I had the pleasure to see other hard working interns as well as their blogs. It feels nice to be in the group.

Moreover, I discussed my thoughts about future plans with my mentors. For now, my tasks for next week include:

  • Parse Roxie Logs
  • Create a design document
  • Finish the online lessons.

See you tomorrow for, I hope, a less headache-inducing day 🙂