Post subject: Vectored Judging System
Posted: Tue Jun 14, 2016 8:01 am

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
I have spoken to many of you before I stepped up as Regent about my "new judging system". I spoke again at length at endreign. I wanted to let you know that I will be posting in this thread my write-ups and discussion points. Alpha testing of the system has already happened and more widespread conversations are happening around this concept so soon we'll move into the beta testing.

The next post will be my first write-up on this subject. This system is still being developed and tweaked. So far every question about it has been good and helpful. Understanding all the reasoning behind it might ultimately be best left for in-person discussions, so hopefully you'll understand if my answer to your question is "let's talk about that in person".

I personally feel very excited about this system and I know many others I've spoken with have become excited about it as well. With that excitement can come strong emotions; let's try to keep them all positive and, if that fails, keep it constructive and civil.

thanks!
Regent Solithan


Posted: Tue Jun 14, 2016 8:03 am

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
A Multi-Vector Approach to Judging
Over the many years of Arts and Sciences in Amtgard at the Emerald Hills there have been many complaints about how judging and scoring work. Specifically the complaints are about how the current system is very subjective and inconsistent from contest to contest and even inconsistent between judges. Ask any judge what it takes to get a 5 on an entry and you’ll get multiple answers. The root of this problem is in the scoring itself. The judges are asked to present a single number on a page for the entry that represents the final score of the item. This single vector is not enough to convey how good the item actually is. It also doesn’t tell you very much about the item, especially when the item is somewhere in the middle of the range.

A multi-vector approach will add the level of detail the judges need to effectively score the item. This will bring consistency among judges and events. It will also provide valuable feedback to the contestants. In this new system 4 vectors will be used: Execution, Complexity, Relevance/Period, and Aesthetic. Items will receive a score for each of the 4 vectors and each of those will be weighted separately. The weighted scores will combine to make the final overall score for the item. The weights of each vector can be set per competition or kept static across the whole reign, at the regent's discretion.

Execution is the category typically weighted the heaviest, at 40 to 45%. In most competitions the top concern is how well the item was executed. The bulk of the item's score should come from this category. If there are no flaws in the execution of the item, then it should get a score of 5 in this category.

The complexity of the item is the second largest vector an item should be judged on. It should be weighted just under execution. Many factors go into the complexity of the entry, and every entry has a complexity to it. Many items will not be able to earn an overall perfect score of 5 because they lack the complexity; for example, a macramé belt would likely not be complex enough to warrant it. A decent general rule would be: if you can think of a way the entry could have been more complex, then it didn't max the complexity vector.

Relevance and Period are concepts that often come up in judging. If someone were to enter a computer program, it might be executed perfectly and could be very complex, but it certainly wouldn't be period and it might not be Amtgard relevant. While this category should be weighted far less than Execution and Complexity by default, it could be an interesting focus challenge for some competitions to have this value raised.

Aesthetic value is very subjective, as beauty is said to be in the eye of the beholder, but that doesn't mean it doesn't deserve a vector to be scored on. This vector is likely to have the highest bias among judges, so it should have the lowest weighted value. This is the category for expressing whether you think an item is "pretty" or "ugly", but an item's overall score should not be too greatly impacted by this.

Once all the vectors have a value, the contestant has immediate feedback before the judge has written down a single word. If the judge thought the item was too simple, this will be shown by a low complexity score. If the contestant bit off more than they could chew, this would be represented in a high complexity score followed by a low execution score. If the judge just didn’t like the color you might see a low Aesthetic score or if the relevance was low that might indicate not to use plastic.
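
To make the weighting concrete, here is a minimal sketch of how the four vector scores could combine into one overall score. The exact weights are assumptions: the write-up above only gives 40 to 45% for Execution and a relative ordering for the rest, so the numbers in `WEIGHTS`, the vector names used as keys, and the function name `overall_score` are all hypothetical.

```python
# Hypothetical weights: only "40 to 45% for Execution" and the relative
# ordering of the other vectors come from the write-up; the exact values
# below are assumptions chosen so that they sum to 1.
WEIGHTS = {
    "execution": 0.40,
    "complexity": 0.35,
    "relevance_period": 0.15,
    "aesthetic": 0.10,
}


def overall_score(scores, weights=WEIGHTS):
    """Combine per-vector scores (5 is the top score) into one weighted total."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted vectors")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


# A well-made but simple entry: flawless execution, low complexity.
belt = {"execution": 5, "complexity": 2, "relevance_period": 4, "aesthetic": 4}
print(round(overall_score(belt), 2))  # 3.7
```

With these assumed weights, a flawlessly executed but simple item lands at 3.7 overall, and the per-vector numbers show the contestant exactly where the points were lost.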


Posted: Thu Jun 16, 2016 3:19 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
One comment I've heard is, "it's slower".
That's fine, but I ask you to be more clear. What is slower? The math? Or the critical thinking?
When you take out the 30+ entries that folks didn't want to submit in the first place, you will have more time to judge the rest right.

If your answer was "because math", that's OK; we will work on that.

I'll try and post a sample scoring sheet later.


Posted: Sun Jun 19, 2016 7:26 pm

Joined: Mon Mar 25, 2013 7:07 pm
Posts: 71
You mentioned pictures, database, phone app, things of that nature. Can you please expand on that?

_________________
Countess Dame Mezzie
Kingdom Prime Minister


Posted: Sun Jun 19, 2016 9:35 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
Sure. Everything is going to come out in phases and projects as the labor gets divided among those who are good at, or want to do, those things. Much of this will likely be developed concurrently, but some things are going to take more time than others.

Goal #1: Get the concept out there and get people talking about it.
Goal #2: Get some baselines set and start training actual people on the system.
Goal #3: Certify judges in the new system; use the new system exclusively.
Goal #4: Make the system more accessible. This is the phase where we are going to try to get a phone app to make judging easier.
Goal #5: Log the data. This is where we start tracking all this data and pulling it together into a website to track and house it. Upload pictures, upload scores.
Goal #6: Expand. Loop these goals back onto themselves to get more out of them. By this I mean make an app or webpage to train and certify new judges.
Goal #7: Multi-kingdom. Get other kingdoms on board.


Posted: Tue Jun 21, 2016 9:37 am

Joined: Wed Aug 12, 2015 8:29 pm
Posts: 4
I really like breaking it down into categories for scoring. After our emergency park quals at Midnight Sun, we implemented a similar system that's been used for our park quals since. It gives everyone an idea of what to look for. I think the 4 categories you have simplify it a lot and make it easier to provide feedback forms to the judges.


Posted: Tue Jun 21, 2016 10:58 am

Joined: Mon Mar 25, 2013 7:07 pm
Posts: 71
What do you mean, certify the judges? It's hard enough at times to get a number of judges to do park anyway, now there will be a certification requirement? The proposed changes aren't rocket science or even accounting. Why make judges certified, and how?

_________________
Countess Dame Mezzie
Kingdom Prime Minister


Posted: Wed Jun 22, 2016 4:30 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
Thanks for the feedback, and yes, this concept is not unfamiliar to Amtgard. We as a whole want to move in this direction; I see it every time I talk to folks or visit their parks.

Mezzie, fantastic question, and I certainly see and share your concern. Am I accurately characterizing your concerns as follows?
1) Getting people to volunteer to be a judge is hard, and adding an extra layer might reduce my available pool of judges.
2) How will the certification process take place?

I see this as a phased approach. For now, the certification process will simply be that the regent, or their designated appointee, is satisfied with the potential judge's knowledge of the vectored judging system. (Yes, they understand the new system. Congrats, you pass.)

Mezzie, I watched a vid where you explained the system to your park. This demonstrated to me your knowledge of the system. That alone would certify you at this phase.

What I would like to do is eventually come up with a test similar to the reeve's test. Not something long that takes ages to do, but something you could do in 10 minutes that proves you know the system.

There is a final phase I would love to eventually get to, but I'll go into that later.

Hopefully that addresses your concerns. If not, let me know.


Posted: Wed Jun 22, 2016 5:12 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
OK, this was touched on in the other thread, so let me bring it up here. "But Soli, this isn't set up like an Althing item!"
Currently our system allows the regent to run things in the A&S world how they want, with the slight exception of one event. The kingdom regent sets the tone for all the parks. I would like to eventually propose an Althing item to codify this proposed vectored judging system and get all the parks on the same page for at least some elements of the larger concept I have. For now I'm just going to use my powers as regent and say we are going to try this, and if it seems like everyone is loving it I will write up the formal Althing proposal so it will continue.

If the quals proposal goes forward, then all A&S tournaments can follow whatever format we set forth. It would be somewhat confusing to have to continue to do a single event one way and all the others a different way, and that is probably why most regents just default to having all tourneys run like that one, but I'd love to change that. We need to take back our A&S tourneys.


Posted: Thu Jun 23, 2016 12:07 am

Joined: Mon Mar 25, 2013 7:07 pm
Posts: 71
I'd love to hear your further ideas about how you see this developing in the future. Could you PM me so I can give you my number and we can talk? I honestly see this as a hugely positive thing in judging, but I am curious now.

_________________
Countess Dame Mezzie
Kingdom Prime Minister


Posted: Fri Jun 24, 2016 3:38 pm

Joined: Tue Oct 20, 2015 9:39 pm
Posts: 12
I like it. I can see this going hand in hand with the new quals proposal. If both become "law" in a sense, the quals proposal could help give you more judges AND more diverse opinions, along with a more defined scoring system which could help the player refine his/her work.

I probably know the answer to this question, but do you think this new vector system will stop judges who are going to judge for the wrong reasons?
For example, at a recent A&S competition I overheard a judge say "Well, I'm going to be a dick". Now, was he that way the entire competition? I don't think so. But there is that mindset.


Posted: Thu Jun 30, 2016 12:13 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
So let me address the question on the table first before I do a deep dive into the future.

Will my changes currently being discussed prevent someone from being a dick? No.
Now for a slight spin, will my future changes detect someone being a dick? Yes.


So for the deep dive, consider most of what I’m about to go into theoretical. It’s not all fleshed out and ready for public debate, but since you want to know how this could play out and the “Ultimate” goal, hold your breath, here we go!

Ok, so once we have all collectively agreed to use the new scoring method we will have structured data points. Once we have apps to help, we’ll have automatic uploads of this data. With a fancy website and the ability to attach pictures from the app we will have even more data points. These data points could be used as a training tool or a learning tool and most importantly as a historical reference.

For example, a judge could take a picture of an execution fail and it would get uploaded via the app, along with their score and notes about how to correct it. This becomes feedback for the person entering it and for anyone else wanting to learn the craft. It doesn’t have to be one-sided either; I expect execution successes to be shown as well.

Now, everyone’s skill level and knowledge vary greatly. When we grab random people and make them judge random things you naturally get random results. These results become less random when you structure the scoring system for them to use but they are still random based on the knowledge level of the judge. You might get a judge that’s just being a dick or you might get “the Russian judge”.

This concept comes up in the judging circles where you have a judge that is an expert in their field and they get to judge an item in their field. They can get the perception that they are “being mean” when in fact they are just being very critical. Yes someone could just be a jerk and randomly score someone super low to “be mean” but there is a key difference here. One comes from spite the other comes from knowledge. Someone being mean won’t have the justification for the low scores and it will become obvious, more on that shortly.

I’ve posted how I think we should start with certified judges, but that’s not where I want to stop. I want to have multiple levels of judge certification. We level test to allow you to play a certain class, so why not extend the concept and level test to allow you to judge at a certain competition level? My goal is to have level tests for all the major groupings and another set of tests for all the categories. The end result will be area experts and subject matter experts. These will be voluntary tests folks can take to “prove” their worth. Yes, you could take and pass them all and be an expert in everything, but it’s unlikely that there are many folks out there with that level of knowledge. These folks are likely to be the people you expect them to be: your Masters and folks nearly at Masterhood. This step is needed in part to show who has knowledge. (It’s important for the previous paragraph. More soon, I promise.)

Generally we assume that with knowledgeable people judging, their scores will cluster around a point. We don’t expect them to all be the same, but we expect them to be close, within an expected range. Many previous Regents put schemes in place to enforce this, such as dropping the lowest and the highest, averaging them together, or whatnot. They all did this to find the center point we “know” to exist. Outliers exist for many different reasons, but typically they exist because someone might know more than the others or, worse, they don’t know enough (or anything) about what they are judging, so they assign scores randomly. This problem is now detectable with the framework I’ve laid out.
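
For reference, the "drop the lowest and the highest, then average" scheme mentioned above could look something like this. It is only a sketch of that older approach, not part of the vectored system itself, and the function name `trimmed_mean` and the sample scores are made up for illustration.

```python
def trimmed_mean(scores):
    """Drop one lowest and one highest score, then average what is left."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the extremes")
    kept = sorted(scores)[1:-1]  # discard the single lowest and highest
    return sum(kept) / len(kept)


# Four judges, one of whom scored far above the rest (hypothetical numbers).
print(trimmed_mean([2.5, 3.0, 3.5, 5.0]))  # 3.25
```

Note that this finds a center point but throws the outlier away silently; it says nothing about why that score was different.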


If you’re still holding your breath you’re likely about to pass out but before you do, here comes the MATH!

Yes “Because Math”!

I can take all those data points, then I can factor in expertise and I can tell you if a score is statistically significant or not. In other words, I can find out if a score is an outlier.

Let me use a non-math example to help illustrate the point. I have 3 judges: 2 master garbers and a random stick jock. The item entered is a dress. The stick jock thinks it’s amazing and gives it all 5s. The masters don’t, not even close.

In the above example it’s easy to see the outlier and the scores would show it as well with some math. To make things even more elegant you can ‘weight’ your formula by factoring in those expertise levels. This helps assign a relevance to the score.
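
Here is a rough sketch of what "some math" could look like for the dress example. To be clear about the assumptions: this is not a formal significance test, just a simple stand-in that compares each judge's score to an expertise-weighted average. The expertise levels (3 for a master, 1 for a non-expert), the scores, the 1.0-point threshold, and the function name `flag_outliers` are all hypothetical.

```python
def flag_outliers(entries, max_gap=1.0):
    """Return judges whose score is far from the expertise-weighted mean.

    entries: list of (judge, score, expertise) tuples, where a higher
    expertise number means that judge's score counts for more.
    max_gap: how far (in score points) a score may sit from the weighted
    mean before it is flagged. This threshold is an assumption.
    """
    total_weight = sum(expertise for _, _, expertise in entries)
    center = sum(score * expertise for _, score, expertise in entries) / total_weight
    return [judge for judge, score, _ in entries if abs(score - center) > max_gap]


# The dress example: two master garbers and a stick jock (made-up numbers).
dress = [
    ("master garber 1", 2.6, 3),
    ("master garber 2", 2.9, 3),
    ("stick jock", 5.0, 1),
]
print(flag_outliers(dress))  # ['stick jock']
```

Because the masters carry more weight, the center point sits close to their scores (about 3.07 here), so the all-5s score stands out while the two expert scores do not.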

Another anecdote: when I receive a score from a subject matter expert, on their subject, it means more to me than the other scores because I feel it’s a truer reflection of my work. A high score is nice, but it doesn’t mean as much if I don’t think I really earned it. I don’t suspect I’m the only one that feels this way; in fact, I suspect most of Amtgard holds this belief to some degree.


Should I crack the smelling salts to revive you?


TLDR
The system can detect bad scores and bad judges and structure this aspect of our game in a way few ever thought possible.


Posted: Thu Jun 30, 2016 7:10 pm

Joined: Tue Jan 15, 2008 3:50 am
Posts: 150
Liking the ideas from what I've heard so far.

One thing I've heard talked about over time, and which I still think would be good, is a gallery of past entries. The gallery could mention what was notable about the piece and why it received the scores it did. It would also help with training.


Posted: Thu Jun 30, 2016 8:43 pm

Joined: Mon Oct 15, 2012 9:14 pm
Posts: 87
Clu Da'Bard wrote:
Liking the ideas from what I've heard so far.

One thing I've heard talked about over time, and which I still think would be good, is a gallery of past entries. The gallery could mention what was notable about the piece and why it received the scores it did. It would also help with training.

Yep, part of the end game concept.


Posted: Mon Jul 03, 2017 12:38 pm

Joined: Wed Aug 12, 2015 8:29 pm
Posts: 4
Have the vector score sheets been posted somewhere? If not, could you? Or share them with me? I'm Emily Foster on Facebook.

